Results for "training loss"
Optimization with multiple local minima/saddle points; typical in neural networks.
Optimization using curvature information; often expensive at scale.
Joint vision-language model aligning images and text.
Generates audio waveforms from spectrograms.
Using production outcomes to improve models.
Direction of steepest ascent of a function.
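As an illustrative sketch (not from the source; `numerical_gradient` is a name of my choosing), the gradient of a one-dimensional function can be estimated by central differences:

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of df/dx at the point x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# For f(x) = x^2 the gradient at x = 3 is 6; stepping in the
# positive-gradient direction increases f fastest.
g = numerical_gradient(lambda x: x * x, 3.0)
```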
Choosing step size along gradient direction.
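A minimal sketch of the step-size idea, assuming plain fixed-step gradient descent (the helper name `gradient_descent` is mine):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient; lr is the step size."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimizing f(x) = (x - 2)^2 with grad f(x) = 2(x - 2) converges toward x = 2.
x_min = gradient_descent(lambda x: 2.0 * (x - 2.0), x0=10.0)
```

Too large a step size diverges; too small converges slowly, which is why the choice matters.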
Asking a model to review and improve its own output.
Learning by minimizing prediction error.
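One concrete instance of a prediction-error objective is mean squared error; this sketch (function name `mse` is my own) shows the quantity a training loop would minimize:

```python
def mse(preds, targets):
    """Mean squared error: the average squared prediction error."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

loss = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # (0 + 0 + 4) / 3
```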
Systems where failure causes physical harm.
AI giving legal advice without authorization.
Setting in which agents have opposing objectives.
Updating a pretrained model’s weights on task-specific data to improve performance or adapt style/behavior.
A formal privacy framework ensuring outputs do not reveal much about any single individual’s data contribution.
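A minimal sketch of one standard mechanism from that framework, the Laplace mechanism for a numeric query (the function name and parameters here are my own illustration, not a production implementation):

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon, seed=0):
    """Release true_value plus Laplace(sensitivity/epsilon) noise,
    the classic mechanism for epsilon-differentially-private numeric queries."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    u = rng.random() - 0.5  # uniform in [-0.5, 0.5)
    # Inverse-CDF sample from Laplace(0, scale).
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise
```

Smaller epsilon means larger noise and stronger privacy; sensitivity is how much one individual's data can change the query result.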
Configuration choices not learned directly (or not typically learned) that govern training or architecture.
Number of samples per gradient update; impacts compute efficiency, generalization, and stability.
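An illustrative sketch of how a dataset is carved into batches of that size (the helper `minibatches` is my own name):

```python
def minibatches(data, batch_size):
    """Yield consecutive batches of at most batch_size samples;
    each batch drives one gradient update."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

batches = list(minibatches(list(range(10)), batch_size=4))
# The last batch may be smaller: [[0,1,2,3], [4,5,6,7], [8,9]]
```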
Gradually increasing learning rate at training start to avoid divergence.
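A sketch of the simplest (linear) form of this schedule; names and defaults here are illustrative assumptions:

```python
def warmup_lr(step, base_lr=1e-3, warmup_steps=100):
    """Linear warmup: ramp the learning rate from near zero to base_lr
    over the first warmup_steps updates, then hold it constant."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```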
Recovering training data from gradients.
Inferring sensitive features of training data.
Differences between training and inference conditions.
Model behaves well during training but not deployment.
Training with a small labeled dataset plus a larger unlabeled dataset, leveraging assumptions like smoothness/cluster structure.
A parameterized mapping from inputs to outputs; includes architecture + learned parameters.
When a model cannot capture underlying structure, performing poorly on both training and test data.
Separating data into training (fit), validation (tune), and test (final estimate) to avoid leakage and optimism bias.
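A minimal sketch of such a split (function name and fractions are my own choices; real pipelines often use library utilities instead):

```python
import random

def train_val_test_split(data, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once with a fixed seed, then carve off test and
    validation partitions; the remainder is the training set."""
    rng = random.Random(seed)
    items = list(data)
    rng.shuffle(items)
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test
```

The single up-front shuffle matters: splitting before any tuning is what prevents test-set leakage.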
Activation max(0, x); improves gradient flow and training speed in deep nets.
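The max(0, x) rule is short enough to show directly (a toy scalar version, not a framework implementation):

```python
def relu(x):
    """ReLU activation: identity for positives, zero otherwise."""
    return x if x > 0 else 0.0

activations = [relu(v) for v in [-2.0, -0.5, 0.0, 1.5]]  # -> [0.0, 0.0, 0.0, 1.5]
```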
Gradients grow too large, causing divergence; mitigated by clipping, normalization, careful init.
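A sketch of the clipping mitigation mentioned above, assuming global-norm clipping over a flat gradient vector (the function name is mine):

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

clipped = clip_by_global_norm([30.0, 40.0], max_norm=5.0)  # norm 50 scaled to 5
```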
Methods to set starting weights to preserve signal/gradient scales across layers.
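One common such method is Glorot/Xavier uniform initialization; this is an illustrative pure-Python sketch rather than a framework's implementation:

```python
import math
import random

def xavier_uniform(fan_in, fan_out, seed=0):
    """Glorot/Xavier uniform init: sample weights in [-limit, limit] with
    limit = sqrt(6 / (fan_in + fan_out)), keeping activation and
    gradient variance roughly stable across layers."""
    rng = random.Random(seed)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]
```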
Randomly zeroing activations during training to reduce co-adaptation and overfitting.
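A sketch of the standard "inverted dropout" formulation (names and the seeded interface are my own; frameworks handle the randomness internally):

```python
import random

def dropout(xs, p=0.5, training=True, seed=0):
    """Inverted dropout: during training, zero each activation with
    probability p and scale survivors by 1/(1-p) so the expected
    activation is unchanged; at inference, pass values through."""
    if not training or p == 0.0:
        return list(xs)
    rng = random.Random(seed)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]
```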
Expanding training data via transformations (flips, noise, paraphrases) to improve robustness.
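One of the listed transformations, a horizontal flip, sketched on a toy 2D image represented as a list of rows (helper name is mine):

```python
def augment_flip(image):
    """Horizontal flip: reverse each row of a 2D image."""
    return [row[::-1] for row in image]

img = [[1, 2, 3],
       [4, 5, 6]]
flipped = augment_flip(img)  # -> [[3, 2, 1], [6, 5, 4]]
```

Each transformed copy keeps its original label, effectively enlarging the training set for free.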