Results for "direct preference optimization"

123 results

DPO Intermediate

A preference-based training method that optimizes a policy directly from pairwise comparisons, with no separate reward model and no explicit RL loop.

Optimization
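
To make the DPO definition above concrete, here is a minimal sketch of the per-pair loss, assuming the summed log-probabilities of the chosen and rejected responses under both the policy and a frozen reference model are already available. The function name `dpo_loss` and the default `beta=0.1` are illustrative choices, not part of the glossary entry.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for a single preference pair (hypothetical helper).

    All inputs are log-probabilities of full responses; beta scales
    how strongly the policy is pushed away from the reference model.
    """
    # Margin: how much more the policy prefers the chosen response over
    # the rejected one, measured relative to the reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    x = beta * margin
    # -log(sigmoid(x)) == log(1 + exp(-x)); fall back to -x when x is
    # very negative so exp() cannot overflow.
    return math.log1p(math.exp(-x)) if x > -30 else -x

# A policy identical to the reference has zero margin, so the loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
```

The loss shrinks as the policy assigns relatively more probability to the chosen response, which is how the pairwise comparisons shape the policy without a separate reward model.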
RLHF Intermediate

Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.

Optimization
Trajectory Optimization Advanced

Optimizing continuous action sequences.

Reinforcement Learning
Inverse Reinforcement Learning Advanced

Inferring a reward function from observed behavior.

Reinforcement Learning
Value Learning Intermediate

Inferring and aligning with human preferences.

Governance & Ethics
Policy Search Advanced

Directly optimizing control policies.

Reinforcement Learning
Diffusion Model Advanced

Generative model that learns to reverse a gradual noise process.

Diffusion & Generative Models
Latent Diffusion Advanced

Diffusion performed in latent space for efficiency.

Diffusion & Generative Models
Structural Causal Model Advanced

Formal model linking causal mechanisms and variables.

Causal AI & Interpretability
Cost Attribution Intermediate

Assigning AI costs to business units.

AI Economics & Strategy
Stability Intermediate

Property of a system that returns to equilibrium after a disturbance.

Foundations & Theory
Teleoperation Frontier

A human operator controlling a robot remotely.

World Models & Cognition
Mechanism Design Advanced

Designing systems where rational agents behave as desired.

Agents & Autonomy
Non-Convex Optimization Intermediate

Optimization with multiple local minima and saddle points; typical in neural networks.

AI Economics & Strategy
Convex Optimization Intermediate

Optimization problems where any local minimum is global.

AI Economics & Strategy
Constrained Optimization Intermediate

Optimization under equality/inequality constraints.

Foundations & Theory
Saddle Point Intermediate

A point where the gradient is zero but which is neither a local maximum nor a local minimum; common in deep networks.

AI Economics & Strategy
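
The saddle-point definition above can be illustrated with the textbook function f(x, y) = x^2 - y^2, chosen here only as an example. A small sketch checks that the gradient vanishes at the origin while the function rises along one axis and falls along the other.

```python
def f(x, y):
    # Classic saddle: curves upward along x, downward along y.
    return x**2 - y**2

def grad(x, y):
    # Analytic gradient of f.
    return (2 * x, -2 * y)

g = grad(0.0, 0.0)                        # zero gradient at the origin
eps = 1e-3
rises_along_x = f(eps, 0.0) > f(0.0, 0.0)  # looks like a minimum along x
falls_along_y = f(0.0, eps) < f(0.0, 0.0)  # looks like a maximum along y
```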
Second-Order Methods Intermediate

Optimization using curvature information; often expensive at scale.

AI Economics & Strategy
Objective Surface Intermediate

Visualization of optimization landscape.

Foundations & Theory
Trust Region Intermediate

Restricting each update to a region around the current point where the local model of the objective is trusted.

Foundations & Theory
Adaptive Optimization Intermediate

Methods such as Adam that adjust per-parameter learning rates dynamically during training.

Foundations & Theory
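
As a rough illustration of the adaptive-optimization entry above, the sketch below applies the Adam update to a single scalar parameter. The helper name `adam_minimize`, the step count, and the learning rate of 0.1 are assumptions made for this example; the decay rates are the commonly used defaults.

```python
import math

def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Minimize a 1-D function with Adam, given its gradient (sketch)."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g       # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g   # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)          # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        # Step size shrinks where recent gradients have been large.
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Minimize (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
```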
Loss Function Intermediate

A function measuring prediction error (and sometimes calibration), guiding gradient-based optimization.

Foundations & Theory
Loss Landscape Intermediate

The shape of the loss function over parameter space.

AI Economics & Strategy
Learning Rate Schedule Intermediate

Adjusting the learning rate over the course of training to improve convergence.

AI Economics & Strategy
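
One common shape for a learning rate schedule is linear warmup followed by cosine decay. The sketch below is one possible version; the function name `lr_at` and all default values are illustrative assumptions rather than anything prescribed by the entry above.

```python
import math

def lr_at(step, total_steps, base_lr=1e-3, warmup_steps=100, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

schedule = [lr_at(s, total_steps=1000) for s in range(1001)]
```

In this sketch the rate peaks at `base_lr` when warmup ends and falls smoothly to `min_lr` at the final step.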
Local Minimum Intermediate

A point whose objective value is no greater than that of any nearby point.

Foundations & Theory
Swarm Intelligence Advanced

Distributed agents producing emergent intelligence.

Agents & Autonomy
Saddle Plateau Intermediate

Flat high-dimensional regions slowing training.

Foundations & Theory
Line Search Intermediate

Choosing the step size along a search direction, such as the negative gradient.

Foundations & Theory
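
A simple instance of line search is backtracking with the Armijo sufficient-decrease condition. The following is a minimal sketch on plain Python lists; the function name and the constants `shrink=0.5` and `c=1e-4` are conventional but assumed here for illustration.

```python
def backtracking_line_search(f, grad_fx, x, direction,
                             alpha=1.0, shrink=0.5, c=1e-4, max_iter=50):
    """Shrink the step size until the Armijo condition holds (sketch)."""
    fx = f(x)
    # Directional derivative; negative when direction is a descent direction.
    slope = sum(g * d for g, d in zip(grad_fx, direction))
    for _ in range(max_iter):
        candidate = [xi + alpha * di for xi, di in zip(x, direction)]
        if f(candidate) <= fx + c * alpha * slope:
            break  # sufficient decrease achieved
        alpha *= shrink
    return alpha

# f(x) = x1^2 + x2^2, searched from (1, 1) along the negative gradient.
f = lambda p: sum(pi * pi for pi in p)
step = backtracking_line_search(f, grad_fx=[2.0, 2.0], x=[1.0, 1.0],
                                direction=[-2.0, -2.0])
```

Here the full step of 1.0 overshoots to (-1, -1) with no decrease, so the search halves it once and accepts 0.5, which lands on the minimum.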
Lagrangian Intermediate

Function that folds constraints into the objective using multipliers, converting a constrained problem into an unconstrained one.

Foundations & Theory
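
As a small worked example of the Lagrangian entry above, consider minimizing x^2 + y^2 subject to x + y = 1 (a problem chosen purely for illustration). With the sign convention L(x, y, lam) = x^2 + y^2 + lam * (x + y - 1), setting all partial derivatives to zero gives x = y = 0.5 and lam = -1, which the sketch below verifies.

```python
def lagrangian(x, y, lam):
    # Objective plus multiplier times the constraint residual.
    return x**2 + y**2 + lam * (x + y - 1)

def lagrangian_grad(x, y, lam):
    # Partial derivatives with respect to x, y, and lam.
    # The lam-derivative is the constraint itself.
    return (2 * x + lam, 2 * y + lam, x + y - 1)

# Candidate stationary point obtained by solving the three equations by hand.
stationary = lagrangian_grad(0.5, 0.5, -1.0)
```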
Dual Problem Intermediate

Alternative formulation of an optimization problem whose optimal value bounds that of the original (primal) problem.

Foundations & Theory
