Search: preference optimization

Policy Gradient Intermediate

Optimizing policies directly via gradient ascent on expected reward.

AI Economics & Strategy

Stochastic Approximation Intermediate

Optimization under uncertainty.

Foundations & Theory

Norm Advanced

Measure of vector magnitude; used in regularization and optimization.

Mathematics

Value Misalignment Advanced

Model optimizes objectives misaligned with human values.

AI Safety & Alignment

Global Minimum Intermediate

Lowest possible loss.

Foundations & Theory

Machine Learning Intermediate

A subfield of AI where models learn patterns from data to make predictions or decisions, improving with experience rather than explicit rule-coding.

Machine Learning

Artificial Intelligence Intermediate

The field of building systems that perform tasks associated with human intelligence—perception, reasoning, language, planning, and decision-making—via algori...

Foundations & Theory

Semi-Supervised Learning Intermediate

Training with a small labeled dataset plus a larger unlabeled dataset, leveraging assumptions like smoothness/cluster structure.

Machine Learning

Deep Learning Intermediate

A branch of ML using multi-layer neural networks to learn hierarchical representations, often excelling in vision, speech, and language.

Deep Learning

Meta-Learning Intermediate

Methods that learn training procedures or initializations so models can adapt quickly to new tasks with little data.

Machine Learning

Online Learning Intermediate

Learning where data arrives sequentially and the model updates continuously, often under changing distributions.

Machine Learning

Parameters Intermediate

The learned numeric values of a model adjusted during training to minimize a loss function.

Foundations & Theory

Representation Learning Intermediate

Automatically learning useful internal features (latent variables) that capture salient structure for downstream tasks.

Machine Learning

Objective Function Intermediate

A scalar measure optimized during training, typically expected loss over data, sometimes with regularization terms.

Optimization

Mean Squared Error Intermediate

Average of squared residuals; common regression objective.

Optimization

Gradient Descent Intermediate

Iterative method that updates parameters in the direction of negative gradient to minimize loss.

Optimization

Stochastic Gradient Descent Intermediate

A gradient method using random minibatches for efficient training on large datasets.

Foundations & Theory

Adam Intermediate

Popular optimizer combining momentum and per-parameter adaptive step sizes via first/second moment estimates.

Optimization

Epoch Intermediate

One complete traversal of the training dataset during training.

Foundations & Theory

Neural Network Intermediate

A parameterized function composed of interconnected units organized in layers with nonlinear activations.

Neural Networks

Activation Function Intermediate

Nonlinear functions enabling networks to approximate complex mappings; ReLU variants dominate modern DL.

Foundations & Theory

Exploding Gradient Intermediate

Gradients grow too large, causing divergence; mitigated by clipping, normalization, careful init.

Foundations & Theory

LSTM Intermediate

An RNN variant using gates to mitigate vanishing gradients and capture longer context.

Foundations & Theory

Grounding Intermediate

Constraining outputs to retrieved or provided sources, often with citation, to improve factual reliability.

Foundations & Theory

Fine-Tuning Intermediate

Updating a pretrained model’s weights on task-specific data to improve performance or adapt style/behavior.

Large Language Models

Reward Model Intermediate

Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.

Foundations & Theory

Alignment Intermediate

Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.

Foundations & Theory

Curriculum Learning Intermediate

Ordering training samples from easier to harder to improve convergence or generalization.

Foundations & Theory

Federated Learning Intermediate

Training across many devices/silos without centralizing raw data; aggregates updates, not data.

Foundations & Theory

Softmax Intermediate

Converts logits to probabilities by exponentiation and normalization; common in classification and LMs.

Foundations & Theory

Results for "preference optimization"

Welcome to AI Glossary

Search

Browse

3D WordGraph