Results for "convergence"
Instrumental Convergence
Tendency for agents to pursue resources regardless of their final goal.
Instrumental convergence is like a student who wants good grades and realizes that studying hard and doing homework will help achieve that goal. No matter what subject they are studying, they know that being organized and having enough time to study will help them succeed. In AI, this...
Uses an exponential moving average of gradients to speed convergence and reduce oscillation.
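The moving-average idea above can be sketched as classical momentum; the learning rate, decay factor, and toy quadratic loss here are illustrative assumptions, not any library's defaults:

```python
# Sketch of momentum: an exponential moving average of gradients.
# lr and beta values are illustrative, not prescriptions.
def momentum_step(w, grad, velocity, lr=0.1, beta=0.9):
    velocity = beta * velocity + (1 - beta) * grad  # EMA of gradients
    w = w - lr * velocity                           # step along the smoothed gradient
    return w, velocity

w, v = 5.0, 0.0
for _ in range(100):
    g = 2 * w            # gradient of the toy loss f(w) = w**2
    w, v = momentum_step(w, g, v)
```

Averaging successive gradients damps the oscillation that a raw gradient step would show on ill-conditioned or noisy losses.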
Number of samples per gradient update; impacts compute efficiency, generalization, and stability.
Adjusting learning rate over training to improve convergence.
Iterative method that updates parameters in the direction of negative gradient to minimize loss.
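The update rule fits in a few lines; the quadratic loss, learning rate, and step count below are illustrative assumptions:

```python
# Minimal gradient-descent sketch on the toy loss f(w) = (w - 3)**2.
def grad(w):
    return 2 * (w - 3)

w = 0.0
lr = 0.1
for _ in range(100):
    w -= lr * grad(w)  # move in the direction of the negative gradient
```

Each step shrinks the distance to the minimizer (here w = 3) by a constant factor, so the iterates converge geometrically.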
Ordering training samples from easier to harder to improve convergence or generalization.
A gradient method using random minibatches for efficient training on large datasets.
Optimization using curvature information; often expensive at scale.
Methods such as Adam that adjust learning rates dynamically for each parameter.
Configuration choices, typically not learned directly, that govern training or architecture.
Controls the size of parameter updates; too high diverges, too low trains slowly or gets stuck.
Activation max(0, x); improves gradient flow and training speed in deep nets.
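The definition translates directly to code; this is a scalar sketch (frameworks apply it elementwise over tensors):

```python
def relu(x):
    # max(0, x): passes positive inputs through, zeroes out negatives.
    # Its gradient is 1 for x > 0 and 0 for x < 0, which avoids the
    # saturation that squashing activations like sigmoid suffer from.
    return max(0.0, x)
```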
Nonlinear functions enabling networks to approximate complex mappings; ReLU variants dominate modern DL.
Methods to set starting weights to preserve signal/gradient scales across layers.
Gradients shrink through layers, slowing learning in early layers; mitigated by ReLU, residuals, normalization.
Techniques that stabilize and speed training by normalizing activations; LayerNorm is common in Transformers.
Training across many devices/silos without centralizing raw data; aggregates updates, not data.
A point where gradient is zero but is neither a max nor min; common in deep nets.
Variability introduced by minibatch sampling during SGD.
The shape of the loss function over parameter space.
Gradually increasing learning rate at training start to avoid divergence.
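A linear warmup schedule can be sketched as follows; the function name, `base_lr`, and `warmup_steps` are hypothetical illustrative choices, not a particular library's API:

```python
# Linear warmup sketch: ramp the learning rate from near zero up to
# base_lr over warmup_steps, then hold it constant.
def warmup_lr(step, base_lr=1e-3, warmup_steps=1000):
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

In practice the constant tail is often replaced by a decay schedule (cosine, inverse square root) once warmup ends.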
Controls amount of noise added at each diffusion step.
Vectors with zero inner product; nonzero orthogonal vectors are linearly independent.
Sensitivity of a function to input perturbations.
Approximating expectations via random sampling.
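A minimal sketch of the idea, estimating E[X²] for X ~ Uniform(0, 1), whose true value is 1/3; the sample count is an illustrative assumption:

```python
import random

# Monte Carlo estimate of E[X**2], X ~ Uniform(0, 1).
random.seed(0)            # fixed seed for reproducibility
n = 100_000
estimate = sum(random.random() ** 2 for _ in range(n)) / n
```

The error of such an estimate shrinks at rate O(1/sqrt(n)), independent of dimension, which is why the method scales to high-dimensional expectations.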
Visualization of optimization landscape.
Minimum relative to nearby points.
Flat high-dimensional regions slowing training.
Choosing step size along gradient direction.
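One common instance is backtracking line search with the Armijo sufficient-decrease condition; this scalar sketch uses conventional illustrative constants and a toy quadratic loss:

```python
# Backtracking line search sketch (Armijo condition): shrink the step
# until the loss decreases enough relative to the gradient magnitude.
def backtracking_step(f, grad_f, w, step=1.0, shrink=0.5, c=1e-4):
    g = grad_f(w)
    # Armijo: accept step t when f(w - t*g) <= f(w) - c * t * g**2.
    while f(w - step * g) > f(w) - c * step * g * g:
        step *= shrink  # halve the step until sufficient decrease holds
    return w - step * g, step

f = lambda w: (w - 2) ** 2
grad_f = lambda w: 2 * (w - 2)
w, step = backtracking_step(f, grad_f, 10.0)
```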
Restricting updates to safe regions.