Results for "direct preference optimization"
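Reinforcement learning from human feedback (RLHF): trains a reward model on human preference data, then optimizes the policy against that reward model with RL.
Direct preference optimization (DPO): a preference-based training method that optimizes the policy directly from pairwise comparisons, without a separate reward model or an explicit RL loop.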
Optimization problems where any local minimum is global.
Optimization with multiple local minima/saddle points; typical in neural networks.
Optimization under equality/inequality constraints.
A function measuring prediction error (and sometimes calibration), guiding gradient-based optimization.
Optimization using curvature information; often expensive at scale.
Measure of vector magnitude; used in regularization and optimization.
Visualization of the optimization (loss) landscape over parameter space.
Optimization under uncertainty.
Methods such as Adam that adapt per-parameter learning rates during training.
Optimizing continuous action sequences.
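
For the query term itself, the preference-based entry above (optimizing directly from pairwise comparisons) can be illustrated with a minimal sketch of the per-pair loss commonly associated with direct preference optimization. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions, not taken from these results; the inputs are assumed to be summed log-probabilities of a full response under the trained policy and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair preference loss: -log(sigmoid(margin)).

    All names and the default beta are hypothetical, chosen for illustration.
    Each *_logp is the log-probability of a whole response (chosen or rejected)
    under either the policy being trained or the frozen reference model.
    """
    # How much more the policy favors each response than the reference does.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp

    # Scaled margin between the chosen and rejected responses.
    margin = beta * (chosen_shift - rejected_shift)

    # -log(sigmoid(x)) equals softplus(-x); branch to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, so the loss is log(2).
    print(dpo_loss(-1.0, -2.0, -1.0, -2.0))
    # Policy shifted toward the chosen response: the loss drops below log(2).
    print(dpo_loss(-0.5, -3.0, -1.0, -2.0))
```

The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model, so no reward model or sampling loop appears anywhere in the computation.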