Results for "direct preference optimization"
Direct preference optimization (DPO): trains a policy directly from pairwise preference comparisons, without fitting a separate reward model or running an explicit RL loop.
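As a minimal sketch of the pairwise objective behind this entry (the helper name `dpo_loss` and the scalar per-response log-probabilities are illustrative assumptions, not a library API):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # Pairwise DPO loss for one (chosen, rejected) pair:
    # -log sigmoid(beta * (policy log-ratio margin - reference log-ratio margin)).
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A policy that prefers the chosen response more strongly than the
# reference model gets a loss below log(2), the value at zero margin.
loss = dpo_loss(logp_w=-1.0, logp_l=-3.0, ref_logp_w=-2.0, ref_logp_l=-2.5, beta=0.5)
```

Note that only log-probabilities under the current policy and a frozen reference model appear; no reward model or rollout is needed, which is the point of the method.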
Reinforcement learning from human feedback (RLHF): uses preference data to train a reward model, then optimizes the policy against it.
Optimizing continuous action sequences.
Inverse reinforcement learning: infers a reward function from observed behavior.
Inferring and aligning with human preferences.
Directly optimizing control policies.
Diffusion model: a generative model that learns to reverse a gradual noising process.
Latent diffusion: diffusion performed in a latent space for efficiency.
Formal model linking causal mechanisms and variables.
Assigning AI costs to business units.
System returns to equilibrium after disturbance.
Teleoperation: a human operator controlling a robot remotely.
Mechanism design: designing systems in which rational agents behave as desired.
Non-convex optimization: optimization with multiple local minima and saddle points; typical of neural-network training.
Convex optimization: optimization problems in which any local minimum is also a global minimum.
Constrained optimization: optimization subject to equality and/or inequality constraints.
Saddle point: a point where the gradient is zero but which is neither a maximum nor a minimum; common in deep networks.
Second-order optimization: uses curvature (Hessian) information; often expensive at scale.
Visualization of optimization landscape.
Trust-region methods: restrict updates to a region where the local model is trusted.
Adaptive optimizers: methods like Adam that adjust per-parameter learning rates dynamically.
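A sketch of the Adam-style update this entry refers to, reduced to a single scalar parameter (the function name `adam_step` and the hand-rolled state passing are illustrative, not any framework's API):

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update: exponential moving averages of the gradient and its
    # square give a per-parameter effective step lr * m_hat / sqrt(v_hat).
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad * grad   # second-moment (uncentered) estimate
    m_hat = m / (1 - b1 ** t)             # bias correction for t steps
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

The division by `sqrt(v_hat)` is what makes the learning rate "adaptive": parameters with consistently large gradients take proportionally smaller steps.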
Loss function: a function measuring prediction error (and sometimes calibration) that guides gradient-based optimization.
Loss landscape: the shape of the loss function over parameter space.
Learning-rate scheduling: adjusting the learning rate over training to improve convergence.
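One common concrete schedule is cosine annealing; a minimal sketch (the helper name `cosine_lr` and the default values are illustrative assumptions):

```python
import math

def cosine_lr(step, total_steps, lr_max=1e-3, lr_min=0.0):
    # Cosine-annealed learning rate: lr_max at step 0, decaying smoothly
    # to lr_min at the final step along half a cosine period.
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))
```

The schedule spends more steps near both endpoints than a linear decay would, which is one reason it is a popular default for neural-network training.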
Local minimum: a minimum relative to nearby points.
Swarm intelligence: distributed agents producing emergent collective intelligence.
Plateaus: flat, high-dimensional regions of the loss landscape that slow training.
Line search: choosing the step size along a descent direction.
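A minimal backtracking line search for a scalar problem, sketched under the Armijo sufficient-decrease condition (the function name and parameter defaults are illustrative assumptions):

```python
def backtracking_line_search(f, grad, x, direction, alpha=1.0, rho=0.5, c=1e-4):
    # Shrink the step size alpha by factor rho until the Armijo condition
    # f(x + alpha*d) <= f(x) + c * alpha * f'(x) * d is satisfied.
    fx, gx = f(x), grad(x)
    while f(x + alpha * direction) > fx + c * alpha * gx * direction:
        alpha *= rho
    return alpha

# Minimize f(x) = x^2 from x = 1, searching along the negative gradient.
f = lambda x: x * x
grad = lambda x: 2 * x
step = backtracking_line_search(f, grad, x=1.0, direction=-grad(1.0))
```

Here the full step `alpha = 1.0` overshoots past the minimum without decreasing `f`, so the search halves it once and accepts `alpha = 0.5`, which lands exactly at the minimizer `x = 0`.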
Penalty and Lagrangian methods: convert a constrained problem into an unconstrained form.
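A quadratic penalty is the simplest such conversion; a sketch (the helper name `quadratic_penalty` is an illustrative assumption):

```python
def quadratic_penalty(f, g, mu):
    # Unconstrained surrogate for: minimize f(x) subject to g(x) = 0.
    # Constraint violations are charged mu * g(x)^2; as mu grows, minimizers
    # of the surrogate approach feasible minimizers of f.
    return lambda x: f(x) + mu * g(x) ** 2

# Example: minimize x^2 subject to x = 1, i.e. g(x) = x - 1.
surrogate = quadratic_penalty(lambda x: x * x, lambda x: x - 1.0, mu=100.0)
```

At any finite `mu` the surrogate's minimizer sits slightly on the infeasible side (here near `x ≈ 0.99` rather than exactly `x = 1`), which is why penalty methods typically solve a sequence of problems with increasing `mu`.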
Duality: an alternative formulation that provides bounds on the original (primal) problem.