Search: preference optimization

DPO Intermediate

A preference-based training method that optimizes a policy directly from pairwise comparisons, without an explicit RL loop.

Optimization
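
Example (sketch): the DPO entry above can be illustrated with the commonly cited per-pair DPO objective, written here in plain Python. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions for illustration; the inputs are summed log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model.

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss from sequence log-probabilities (illustrative)."""
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the reference model.
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    z = beta * margin
    # -log(sigmoid(z)) = log(1 + exp(-z)), evaluated in a numerically stable way.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy matches the reference, the margin is zero and the loss equals log 2; the loss falls as the policy shifts probability toward the chosen response. No reward model or sampling loop appears, which is the sense of "without an explicit RL loop".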

RLHF Intermediate

Reinforcement learning from human feedback: uses preference data to train a reward model, then optimizes the policy against that reward model.

Optimization

Inverse Reinforcement Learning Advanced

Inferring a reward function from observed behavior.

Reinforcement Learning

Value Learning Intermediate

Inferring human preferences and aligning system behavior with them.

Governance & Ethics

Non-Convex Optimization Intermediate

Optimization of objectives with multiple local minima and saddle points; typical in neural networks.

AI Economics & Strategy

Convex Optimization Intermediate

Optimization problems where any local minimum is global.

AI Economics & Strategy

Constrained Optimization Intermediate

Optimization under equality/inequality constraints.

Foundations & Theory

Trajectory Optimization Advanced

Optimizing continuous action sequences.

Reinforcement Learning

Saddle Point Intermediate

A point where the gradient is zero but which is neither a local maximum nor a local minimum; common in deep nets.

AI Economics & Strategy

Second-Order Methods Intermediate

Optimization using curvature information; often expensive at scale.

AI Economics & Strategy

Objective Surface Intermediate

A visualization of the optimization landscape.

Foundations & Theory

Trust Region Intermediate

Restricting each update to a region around the current point where the local model of the objective is trusted.

Foundations & Theory

Adaptive Optimization Intermediate

Methods such as Adam that adjust per-parameter learning rates dynamically during training.

Foundations & Theory
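
Example (sketch): the Adaptive Optimization entry above names Adam, whose standard update can be written in a few lines. The function name `adam_step` and the list-of-floats representation are assumptions for illustration; the default hyperparameters are the commonly used ones.

```python
import math

def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update over lists of floats; t is the 1-based step count."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Exponential moving averages of the gradient and the squared gradient.
        m_i = beta1 * m_i + (1.0 - beta1) * g
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        # Bias correction compensates for the zero initialization of m and v.
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        # Each parameter gets its own effective step size lr / (sqrt(v_hat) + eps).
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so every parameter moves by roughly `lr` regardless of gradient scale.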

Loss Function Intermediate

A function measuring prediction error (and sometimes calibration), guiding gradient-based optimization.

Foundations & Theory

Loss Landscape Intermediate

The shape of the loss function over parameter space.

AI Economics & Strategy

Learning Rate Schedule Intermediate

Adjusting learning rate over training to improve convergence.

AI Economics & Strategy
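
Example (sketch): one widely used learning rate schedule is linear warmup followed by cosine decay. The function name `lr_at` and its parameters are assumptions for illustration, not part of any particular library.

```python
import math

def lr_at(step, base_lr, warmup_steps, total_steps):
    """Learning rate at a 0-based step: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        # Ramp linearly up to base_lr over the warmup phase.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Cosine curve from base_lr (progress = 0) down to 0 (progress = 1).
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

For instance, `lr_at(s, 3e-4, 100, 1000)` rises over the first 100 steps and then decays smoothly toward zero by step 1000.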

Swarm Intelligence Advanced

Distributed agents producing emergent intelligence.

Agents & Autonomy

Local Minimum Intermediate

A point whose value is no higher than that of any nearby point.

Foundations & Theory

Saddle Plateau Intermediate

Flat regions of a high-dimensional loss surface, often around saddle points, that slow training.

Foundations & Theory

Line Search Intermediate

Choosing the step size along a search direction, often the negative gradient.

Foundations & Theory
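
Example (sketch): the Line Search entry above can be illustrated with backtracking line search under the Armijo sufficient-decrease condition, one common variant among several. The function name and defaults are assumptions for illustration; vectors are plain lists of floats and the search direction is the negative gradient.

```python
def backtracking_line_search(f, x, grad, alpha=1.0, shrink=0.5, c=1e-4, max_iter=50):
    """Return a step size along -grad that satisfies the Armijo condition."""
    direction = [-g for g in grad]
    # Directional derivative along the search direction (negative for descent).
    slope = sum(g * d for g, d in zip(grad, direction))
    fx = f(x)
    for _ in range(max_iter):
        candidate = [xi + alpha * di for xi, di in zip(x, direction)]
        # Accept once the objective decreases by at least a fraction c
        # of what the linear model predicts.
        if f(candidate) <= fx + c * alpha * slope:
            return alpha
        alpha *= shrink  # otherwise shrink the step and retry
    return alpha
```

Usage: for `f = lambda x: x[0] ** 2` at `x = [1.0]` with gradient `[2.0]`, the full step overshoots to -1.0, so the search halves the step once and returns 0.5.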

Lagrangian Intermediate

A function that folds constraints into the objective via multipliers, converting a constrained problem to an unconstrained form.

Foundations & Theory

Dual Problem Intermediate

An alternative formulation of an optimization problem whose optimal value bounds that of the original (primal) problem.

Foundations & Theory

Surrogate Model Advanced

Fast approximation of costly simulations.

AI in Science

Hyperparameters Intermediate

Configuration choices that are typically set rather than learned and that govern training or architecture.

Optimization

Momentum Intermediate

Uses an exponential moving average of gradients to speed convergence and reduce oscillation.

Optimization
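
Example (sketch): the Momentum entry above describes an exponential moving average of gradients, which the snippet below follows. The function name `momentum_step` is an assumption for illustration; note that the classic heavy-ball form uses `v = beta * v + g`, without the `(1 - beta)` factor.

```python
def momentum_step(params, grads, velocity, lr=0.01, beta=0.9):
    """One SGD-with-momentum update using the EMA form of the velocity."""
    new_params, new_velocity = [], []
    for p, g, v in zip(params, grads, velocity):
        # Blend the previous velocity with the current gradient.
        v = beta * v + (1.0 - beta) * g
        # Step along the smoothed gradient rather than the raw one.
        new_params.append(p - lr * v)
        new_velocity.append(v)
    return new_params, new_velocity
```

Gradients that alternate in sign largely cancel in the velocity, which damps oscillation, while gradients that keep the same sign accumulate and speed progress.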

Learning Rate Intermediate

Controls the size of parameter updates; too high and training can diverge, too low and it proceeds slowly or gets stuck.

Foundations & Theory

Gradient Noise Intermediate

Variability introduced by minibatch sampling during SGD.

AI Economics & Strategy

Gradient Clipping Intermediate

Limiting gradient magnitude to prevent exploding gradients.

AI Economics & Strategy
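
Example (sketch): the Gradient Clipping entry above can be illustrated by clipping on the global L2 norm, one common variant (clipping each value independently is another). The function name `clip_by_global_norm` is an assumption for illustration.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient values so their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)  # already within the limit, leave unchanged
    # Scale every component by the same factor so the direction is preserved.
    scale = max_norm / norm
    return [g * scale for g in grads]
```

Because all components share one scale factor, only the magnitude of the update changes, not its direction.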

Hessian Matrix Intermediate

The matrix of second partial derivatives of the loss, describing its local curvature.

AI Economics & Strategy

Hessian Advanced

Matrix of curvature information.

Mathematics
