Results for "direct preference optimization"
Direct preference optimization (DPO): trains a policy directly from pairwise preference comparisons, without fitting a separate reward model or running an explicit RL loop.
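As a minimal sketch of the pairwise objective behind this entry (the helper name `dpo_loss` and the scalar per-response log-probabilities are illustrative assumptions, not a library API):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # Pairwise DPO loss for one (chosen, rejected) pair:
    # -log sigmoid(beta * (policy log-ratio margin - reference log-ratio margin)).
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A policy that prefers the chosen response more strongly than the
# reference model gets a loss below log(2), the value at zero margin.
loss = dpo_loss(logp_w=-1.0, logp_l=-3.0, ref_logp_w=-2.0, ref_logp_l=-2.5, beta=0.5)
```

Note that only log-probabilities under the current policy and a frozen reference model appear; no reward model or rollout is needed, which is the point of the method.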
Reinforcement learning from human feedback (RLHF): uses preference data to train a reward model, then optimizes the policy against it.
Optimizing continuous action sequences.
Inverse reinforcement learning: infers a reward function from observed behavior.
Inferring and aligning with human preferences.
Directly optimizing control policies.
Diffusion model: a generative model that learns to reverse a gradual noising process.
Latent diffusion: diffusion performed in a latent space for efficiency.
Formal model linking causal mechanisms and variables.
Assigning AI costs to business units.
System returns to equilibrium after disturbance.
Teleoperation: a human operator controlling a robot remotely.
Mechanism design: designing systems in which rational agents behave as desired.
Non-convex optimization: optimization with multiple local minima and saddle points; typical of neural-network training.
Convex optimization: optimization problems in which any local minimum is also a global minimum.
Constrained optimization: optimization subject to equality and/or inequality constraints.
Saddle point: a point where the gradient is zero but which is neither a maximum nor a minimum; common in deep networks.
Second-order optimization: uses curvature (Hessian) information; often expensive at scale.
Visualization of optimization landscape.
Trust-region methods: restrict updates to a region where the local model is trusted.
Adaptive optimizers: methods like Adam that adjust per-parameter learning rates dynamically.
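A sketch of the Adam-style update this entry refers to, reduced to a single scalar parameter (the function name `adam_step` and the hand-rolled state passing are illustrative, not any framework's API):

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update: exponential moving averages of the gradient and its
    # square give a per-parameter effective step lr * m_hat / sqrt(v_hat).
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad * grad   # second-moment (uncentered) estimate
    m_hat = m / (1 - b1 ** t)             # bias correction for t steps
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

The division by `sqrt(v_hat)` is what makes the learning rate "adaptive": parameters with consistently large gradients take proportionally smaller steps.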
Loss function: a function measuring prediction error (and sometimes calibration) that guides gradient-based optimization.
Loss landscape: the shape of the loss function over parameter space.
Learning-rate scheduling: adjusting the learning rate over training to improve convergence.
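One common concrete schedule is cosine annealing; a minimal sketch (the helper name `cosine_lr` and the default values are illustrative assumptions):

```python
import math

def cosine_lr(step, total_steps, lr_max=1e-3, lr_min=0.0):
    # Cosine-annealed learning rate: lr_max at step 0, decaying smoothly
    # to lr_min at the final step along half a cosine period.
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))
```

The schedule spends more steps near both endpoints than a linear decay would, which is one reason it is a popular default for neural-network training.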
Local minimum: a minimum relative to nearby points.
Swarm intelligence: distributed agents producing emergent collective intelligence.
Plateaus: flat, high-dimensional regions of the loss landscape that slow training.
Line search: choosing the step size along a descent direction.
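A minimal backtracking line search for a scalar problem, sketched under the Armijo sufficient-decrease condition (the function name and parameter defaults are illustrative assumptions):

```python
def backtracking_line_search(f, grad, x, direction, alpha=1.0, rho=0.5, c=1e-4):
    # Shrink the step size alpha by factor rho until the Armijo condition
    # f(x + alpha*d) <= f(x) + c * alpha * f'(x) * d is satisfied.
    fx, gx = f(x), grad(x)
    while f(x + alpha * direction) > fx + c * alpha * gx * direction:
        alpha *= rho
    return alpha

# Minimize f(x) = x^2 from x = 1, searching along the negative gradient.
f = lambda x: x * x
grad = lambda x: 2 * x
step = backtracking_line_search(f, grad, x=1.0, direction=-grad(1.0))
```

Here the full step `alpha = 1.0` overshoots past the minimum without decreasing `f`, so the search halves it once and accepts `alpha = 0.5`, which lands exactly at the minimizer `x = 0`.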
Penalty and Lagrangian methods: convert a constrained problem into an unconstrained form.
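A quadratic penalty is the simplest such conversion; a sketch (the helper name `quadratic_penalty` is an illustrative assumption):

```python
def quadratic_penalty(f, g, mu):
    # Unconstrained surrogate for: minimize f(x) subject to g(x) = 0.
    # Constraint violations are charged mu * g(x)^2; as mu grows, minimizers
    # of the surrogate approach feasible minimizers of f.
    return lambda x: f(x) + mu * g(x) ** 2

# Example: minimize x^2 subject to x = 1, i.e. g(x) = x - 1.
surrogate = quadratic_penalty(lambda x: x * x, lambda x: x - 1.0, mu=100.0)
```

At any finite `mu` the surrogate's minimizer sits slightly on the infeasible side (here near `x ≈ 0.99` rather than exactly `x = 1`), which is why penalty methods typically solve a sequence of problems with increasing `mu`.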
Duality: an alternative formulation that provides bounds on the original (primal) problem.