Results for "training loss"
The shape of the loss function over parameter space.
Penalizes confident wrong predictions heavily; standard for classification and language modeling.
A function measuring prediction error (and sometimes calibration), guiding gradient-based optimization.
A scalar measure optimized during training, typically expected loss over data, sometimes with regularization terms.
Iterative method that updates parameters in the direction of the negative gradient to minimize loss.
Minimizing average loss on training data; can overfit when data is limited or biased.
The learned numeric values of a model adjusted during training to minimize a loss function.
Halting training when validation performance stops improving to reduce overfitting.
Assigning labels per pixel (semantic segmentation) or per object instance (instance segmentation) to map object boundaries.
Visualization of optimization landscape.
The lowest loss attainable over the entire parameter space.
Maximum expected loss under normal conditions.
One complete traversal of the training dataset during training.
Cost of model training.
A wide basin often correlated with better generalization.
A point where the loss is lower than at all nearby points, though not necessarily the lowest overall.
Two-network setup in which a generator tries to fool a discriminator.
Applying learned patterns incorrectly.
Measures divergence between true and predicted probability distributions.
A narrow minimum often associated with poorer generalization.
Uses an exponential moving average of gradients to speed convergence and reduce oscillation.
Matrix of second derivatives describing local curvature of loss.
Pixel-wise classification of image regions.
Loss of old knowledge when learning new tasks.
Learning policies from expert demonstrations.
A preference-based training method optimizing policies directly from pairwise comparisons without explicit RL loops.
Ordering training samples from easier to harder to improve convergence or generalization.
Combining simulation and real-world data.
Training objective where the model predicts the next token given previous tokens (causal modeling).
Empirical laws linking model size, data, and compute to performance.
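
Several entries above describe pieces of one training loop: a loss that penalizes confident wrong predictions, parameter updates along the negative gradient, an exponential moving average of gradients, one pass over the training data per epoch, and halting when validation performance stops improving. The sketch below is one possible way to combine them, not a reference implementation. It assumes a one-feature logistic model; the toy data, the function names, and the hyperparameters (lr, beta, patience) are all made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(w, b, data):
    """Average binary cross-entropy of a 1-D logistic model on (x, y) pairs."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def gradients(w, b, data):
    """Gradient of the average cross-entropy with respect to w and b."""
    gw = gb = 0.0
    for x, y in data:
        err = sigmoid(w * x + b) - y
        gw += err * x
        gb += err
    return gw / len(data), gb / len(data)

def train(train_data, val_data, lr=0.5, beta=0.9, max_epochs=200, patience=5):
    w = b = 0.0          # model parameters
    vw = vb = 0.0        # exponential moving averages of the gradients
    best_val, best_params, bad_epochs = float("inf"), (w, b), 0
    history = []         # validation loss after each epoch
    for _ in range(max_epochs):
        # One epoch: a full-batch pass over the training data.
        gw, gb = gradients(w, b, train_data)
        vw = beta * vw + (1 - beta) * gw
        vb = beta * vb + (1 - beta) * gb
        # Step in the direction of the (smoothed) negative gradient.
        w -= lr * vw
        b -= lr * vb
        val = cross_entropy(w, b, val_data)
        history.append(val)
        if val < best_val - 1e-6:
            best_val, best_params, bad_epochs = val, (w, b), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break    # early stopping: validation loss stopped improving
    return best_params, best_val, history

# Hypothetical toy data. The validation set contains one deliberately
# mislabeled point (-0.3, 1), so validation loss eventually rises while
# training loss keeps falling.
TRAIN = [(-2.0, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (2.0, 1)]
VAL = [(-1.5, 0), (-0.3, 1), (0.4, 1), (1.5, 1)]

params, best_val, history = train(TRAIN, VAL)
print(f"stopped after {len(history)} epochs, best validation loss {best_val:.3f}")
```

The parameters returned are those from the epoch with the best validation loss, not the last epoch, which is a common way to apply early stopping.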