Results for "loss geometry"
Loss landscape visualization: Visualization of the optimization landscape.
Cross-entropy loss: Penalizes confident wrong predictions heavily; standard for classification and language modeling.
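As a toy sketch (function and variable names are illustrative, not from any library), cross-entropy on a single example is the negative log-probability the model assigns to the true class:

```python
import math

def cross_entropy(true_idx, probs):
    # Negative log-probability assigned to the correct class.
    return -math.log(probs[true_idx])

# A confident wrong prediction costs far more than an unsure one.
confident_wrong = cross_entropy(0, [0.01, 0.99])  # high loss
unsure = cross_entropy(0, [0.5, 0.5])             # moderate loss
```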
Loss landscape: The shape of the loss function over parameter space.
Loss function: A function measuring prediction error (and sometimes calibration) that guides gradient-based optimization.
Objective function: A scalar measure optimized during training, typically the expected loss over data, sometimes with added regularization terms.
Gradient descent: An iterative method that updates parameters in the direction of the negative gradient to minimize the loss.
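A minimal sketch of this update rule on a one-dimensional quadratic (toy code; names are illustrative):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    # Step against the gradient; each step shrinks the loss for small lr.
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)  # converges near 3
```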
Model parameters (weights): The learned numeric values of a model, adjusted during training to minimize a loss function.
Empirical risk minimization: Minimizing the average loss on training data; can overfit when data is limited or biased.
Image segmentation: Assigning labels per pixel (semantic) or per instance (instance segmentation) to map object boundaries.
Value at risk (VaR): The maximum expected loss under normal conditions.
Global minimum: The lowest possible loss over the whole parameter space.
Momentum: Uses an exponential moving average of gradients to speed convergence and reduce oscillation.
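A sketch of the exponential-moving-average form of this update (heavy-ball style; toy code, names illustrative):

```python
def momentum_descent(grad, x0, lr=0.1, beta=0.9, steps=500):
    # v tracks an exponential moving average of gradients: consistent
    # gradient directions accumulate, oscillating ones cancel out.
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v + (1 - beta) * grad(x)
        x -= lr * v
    return x

# Same quadratic as plain gradient descent: f(x) = (x - 3)^2.
x_min = momentum_descent(lambda x: 2 * (x - 3), x0=0.0)
```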
Early stopping: Halting training when validation performance stops improving, to reduce overfitting.
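A sketch of the stopping rule given a per-epoch validation-loss history (toy code; a real training loop would also checkpoint the best weights):

```python
def early_stop_epoch(val_losses, patience=3):
    # Return the epoch with the best validation loss, stopping once
    # `patience` consecutive epochs pass without improvement.
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Loss improves, then rises as the model overfits; training halts early.
best = early_stop_epoch([1.0, 0.6, 0.4, 0.45, 0.5, 0.6, 0.7])  # epoch 2
```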
KL divergence: Measures the divergence between the true and predicted probability distributions.
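As a sketch, KL divergence D_KL(P || Q) sums p_i * log(p_i / q_i) over outcomes; it is nonnegative and zero only when the distributions match (toy code, names illustrative):

```python
import math

def kl_divergence(p, q):
    # D_KL(P || Q) = sum_i p_i * log(p_i / q_i); >= 0, and 0 iff P == Q.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

matched = kl_divergence([0.5, 0.5], [0.5, 0.5])     # 0.0
mismatched = kl_divergence([0.9, 0.1], [0.5, 0.5])  # > 0
```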
Sharp minimum: A narrow minimum, often associated with poorer generalization.
Flat minimum: A wide basin, often correlated with better generalization.
Hessian: The matrix of second derivatives describing the local curvature of the loss.
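In one dimension the Hessian reduces to the second derivative, which a central finite difference approximates (toy code; the step size `h` is an illustrative choice):

```python
def second_derivative(f, x, h=1e-4):
    # Central finite difference: estimates the curvature of f at x.
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

# f(x) = x^2 has constant curvature 2; sharper bowls give larger values.
curv = second_derivative(lambda x: x * x, x=1.0)
```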
Semantic segmentation: Pixel-wise classification of image regions.
Generative adversarial network (GAN): A two-network setup in which a generator learns to fool a discriminator.
Local minimum: A point where the loss is lowest relative to nearby points, but not necessarily overall.
Negative transfer: Applying learned patterns incorrectly in a new task or domain.
Catastrophic forgetting: Loss of old knowledge when learning new tasks.
Imitation learning: Learning policies from expert demonstrations.
Supervised learning: Learning a function from input-output pairs (labeled data), optimizing performance at predicting outputs for unseen inputs.
Online learning: Learning where data arrives sequentially and the model updates continuously, often under changing distributions.
Transfer learning: Reusing knowledge from a source task/domain to improve learning on a target task/domain, typically via pretrained models.
Multi-task learning: Training one model on multiple tasks simultaneously to improve generalization through shared structure.
Meta-learning: Methods that learn training procedures or initializations so models can adapt quickly to new tasks with little data.
Representation learning: Automatically learning useful internal features (latent variables) that capture salient structure for downstream tasks.
Regularization: Techniques that discourage overly complex solutions to improve generalization (reduce overfitting).
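A sketch with L2 (ridge) regularization on a one-weight linear model y = w * x, where the closed-form fit shrinks toward zero as the penalty grows (toy code; names illustrative):

```python
def ridge_fit_1d(xs, ys, lam):
    # Minimizes mean squared error + lam * w^2 for y = w * x;
    # setting the derivative to zero gives this closed form.
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + n * lam)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
w_plain = ridge_fit_1d(xs, ys, lam=0.0)  # unregularized fit: 2.0
w_ridge = ridge_fit_1d(xs, ys, lam=1.0)  # shrunk toward zero
```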