Results for "training loss"

54 results (30 shown below)

Loss Landscape Intermediate

The shape of the loss function over parameter space.

Optimization
Parameters Intermediate

The numeric values of a model that are adjusted during training to minimize a loss function.

Foundations & Theory
Objective Function Intermediate

A scalar measure optimized during training, typically expected loss over data, sometimes with regularization terms.

Optimization
Empirical Risk Minimization Intermediate

Minimizing average loss on training data; can overfit when data is limited or biased.

Optimization
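
In symbols (standard notation, assumed here rather than taken from this site), the objective function above is the empirical risk, optionally plus a regularizer; empirical risk minimization picks the parameters minimizing that average:

```latex
% Standard notation (an assumption, not this site's): empirical risk over n
% training pairs (x_i, y_i), with an optional regularizer of strength \lambda.
\hat{\theta} = \arg\min_{\theta}\; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_{\theta}(x_i),\, y_i\big) \;+\; \lambda\, \Omega(\theta)
```
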
Gradient Descent Intermediate

Iterative method that updates parameters in the direction of negative gradient to minimize loss.

Optimization
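
A minimal NumPy sketch of the update rule on a toy least-squares problem; all data and constants are illustrative, not a production recipe:

```python
import numpy as np

# Full-batch gradient descent on a toy least-squares loss
# L(w) = mean((Xw - y)^2); all data here is synthetic.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)
lr = 0.1                                    # learning rate (a hyperparameter)
for _ in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
    w -= lr * grad                          # step along the negative gradient
print(w)                                    # converges toward w_true
```
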
Quantization Intermediate

Reducing numeric precision of weights/activations to speed inference and reduce memory with acceptable accuracy loss.

Foundations & Theory
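
A minimal sketch of symmetric int8 weight quantization, assuming NumPy; `quantize_int8` and `dequantize` are hypothetical names, not any library's API:

```python
import numpy as np

# Symmetric int8 quantization sketch (illustrative, not a library API).
def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0           # map the largest weight to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale       # approximate reconstruction

w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(w - dequantize(q, s)).max())     # small quantization error
```
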
Hessian Matrix Intermediate

Matrix of second partial derivatives of the loss with respect to the parameters, describing local curvature.

Optimization
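
In symbols (the standard definition):

```latex
% Entry (i, j) is the second partial derivative of the loss L with
% respect to parameters \theta_i and \theta_j.
H_{ij} = \frac{\partial^2 L(\theta)}{\partial \theta_i \,\partial \theta_j}
```
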
Global Minimum Intermediate

Lowest loss value attainable over all of parameter space, as opposed to a merely local minimum.

Foundations & Theory
Catastrophic Forgetting Intermediate

Loss of old knowledge when learning new tasks.

Model Failure Modes
Value at Risk Intermediate

Maximum loss not expected to be exceeded at a given confidence level over a set horizon, under normal market conditions.

AI Economics & Strategy
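
A sketch of one common estimator, historical simulation, using simulated returns; the distribution and confidence level here are assumptions for illustration:

```python
import numpy as np

# Historical-simulation VaR sketch; the returns are simulated, not real data.
rng = np.random.default_rng(0)
returns = rng.normal(loc=0.0, scale=0.02, size=10_000)  # assumed daily P&L fractions

confidence = 0.95
var_95 = -np.percentile(returns, 100 * (1 - confidence))  # loss not exceeded on 95% of days
print(f"95% one-day VaR: {var_95:.4f}")
```
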
Epoch Intermediate

One complete pass through the training dataset.

Foundations & Theory
Privacy Attack Intermediate

Attacks that infer whether specific records were in training data, or reconstruct sensitive training examples.

Foundations & Theory
Training Pipeline Intermediate

End-to-end sequence of steps (data preparation, training, evaluation, packaging) that produces a trained model.

MLOps & Infrastructure
Training Cost Intermediate

Total expense of training a model, typically dominated by compute (accelerator hours) plus data, energy, and engineering time.

AI Economics & Strategy
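
A back-of-envelope sketch of the dominant compute term; every number below is a hypothetical assumption, not a quoted price:

```python
# Back-of-envelope training cost arithmetic; all figures are assumptions.
gpus = 8
hours = 72
price_per_gpu_hour = 2.0                     # assumed cloud rate in USD
compute_cost = gpus * hours * price_per_gpu_hour
print(f"${compute_cost:,.0f}")               # $1,152 for this hypothetical run
```
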
Loss Function Intermediate

A function measuring prediction error (and sometimes calibration), guiding gradient-based optimization.

Foundations & Theory
Log Loss Intermediate

Penalizes confident wrong predictions heavily; standard for classification and language modeling.

Optimization
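
A minimal NumPy sketch of binary log loss; the example values show how a single confident wrong prediction dominates the average:

```python
import numpy as np

# Binary log loss (cross-entropy), shown as a sketch of the standard formula.
def log_loss(y_true: np.ndarray, p: np.ndarray, eps: float = 1e-12) -> float:
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

# A confident wrong prediction (p=0.99 for a negative) dominates the mean:
print(log_loss(np.array([1, 0]), np.array([0.9, 0.99])))  # ~2.36
```
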
Semi-Supervised Learning Intermediate

Training with a small labeled dataset plus a larger unlabeled dataset, leveraging assumptions like smoothness/cluster structure.

Machine Learning
Multitask Learning Intermediate

Training one model on multiple tasks simultaneously to improve generalization through shared structure.

Machine Learning
Meta-Learning Intermediate

Methods that learn training procedures or initializations so models can adapt quickly to new tasks with little data.

Machine Learning
Domain Shift Intermediate

A mismatch between training and deployment data distributions that can degrade model performance.

MLOps & Infrastructure
Hyperparameters Intermediate

Configuration choices not learned directly (or not typically learned) that govern training or architecture.

Optimization
Overfitting Intermediate

When a model fits noise/idiosyncrasies of training data and performs poorly on unseen data.

Foundations & Theory
Underfitting Intermediate

When a model cannot capture underlying structure, performing poorly on both training and test data.

Foundations & Theory
Generalization Intermediate

How well a model performs on new data drawn from the same (or similar) distribution as training.

Foundations & Theory
Train/Validation/Test Split Intermediate

Separating data into training (fit), validation (tune), and test (final estimate) to avoid leakage and optimism bias.

Evaluation & Benchmarking
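
A minimal index-shuffling sketch, assuming NumPy; the 70/15/15 proportions are an illustrative choice, not a rule:

```python
import numpy as np

# Plain shuffled-index split sketch; sizes are illustrative.
rng = np.random.default_rng(0)
n = 1000
idx = rng.permutation(n)
train_idx = idx[:700]     # 70% for fitting parameters
val_idx = idx[700:850]    # 15% for tuning hyperparameters / early stopping
test_idx = idx[850:]      # 15% held out for the final, one-time estimate
```
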
Data Leakage Intermediate

When information from evaluation data improperly influences training, inflating reported performance.

Foundations & Theory
Stochastic Gradient Descent Intermediate

A gradient method using random minibatches for efficient training on large datasets.

Foundations & Theory
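
A minimal NumPy sketch of minibatch SGD on a toy least-squares problem; the data, batch size, and learning rate are illustrative:

```python
import numpy as np

# Minibatch SGD sketch on a synthetic least-squares problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])

w, lr, batch = np.zeros(3), 0.05, 16
for _ in range(2000):
    i = rng.choice(len(y), size=batch, replace=False)  # random minibatch
    grad = 2 * X[i].T @ (X[i] @ w - y[i]) / batch      # noisy gradient estimate
    w -= lr * grad
print(w)                                               # near the true weights
```
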
Early Stopping Intermediate

Halting training when validation performance stops improving to reduce overfitting.

Foundations & Theory
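
A patience-based sketch; `train_one_epoch` and `validate` are hypothetical stubs standing in for a real training loop:

```python
import random

# Patience-based early stopping sketch; the stubs below are hypothetical
# placeholders, not functions from any particular library.
def train_one_epoch() -> None: ...
def validate() -> float: return random.random()

best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(100):
    train_one_epoch()
    val_loss = validate()
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0   # improvement: reset the counter
    else:
        bad_epochs += 1
        if bad_epochs >= patience:           # plateaued for `patience` epochs
            break                            # halt before overfitting worsens
print(epoch, best_val)
```
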
ReLU Intermediate

Activation max(0, x); improves gradient flow and training speed in deep nets.

Foundations & Theory
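
A minimal NumPy sketch of ReLU and its derivative; the zero gradient for negative inputs is why inactive ("dead") units can occur:

```python
import numpy as np

# ReLU and its derivative.
def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(0.0, x)

def relu_grad(x: np.ndarray) -> np.ndarray:
    return (x > 0).astype(x.dtype)           # 1 where active, 0 where inactive

print(relu(np.array([-2.0, 0.0, 3.0])))      # [0. 0. 3.]
```
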
Normalization Intermediate

Techniques that stabilize and speed training by normalizing activations; LayerNorm is common in Transformers.

Foundations & Theory
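
A minimal NumPy sketch of LayerNorm over the feature axis; `gamma` and `beta` stand for the learned scale and shift:

```python
import numpy as np

# LayerNorm sketch: normalize each example's features to zero mean and unit
# variance, then apply a learned scale (gamma) and shift (beta).
def layer_norm(x: np.ndarray, gamma: np.ndarray, beta: np.ndarray,
               eps: float = 1e-5) -> np.ndarray:
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

x = np.random.default_rng(0).normal(size=(4, 8))   # batch of 4, 8 features
out = layer_norm(x, gamma=np.ones(8), beta=np.zeros(8))
print(out.mean(axis=-1))                           # ~0 per example
```
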