Results for "generalization risk"
Restricting the distribution of powerful models.
Techniques that discourage overly complex solutions to improve generalization (reduce overfitting).
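One common instance of this idea is an L2 weight penalty added to the training loss. A minimal sketch, assuming a squared-error data loss; the function name `l2_regularized_loss` and the strength `lam` are illustrative, not from the source:

```python
def l2_regularized_loss(weights, predictions, targets, lam=0.01):
    # Mean squared error on the data...
    data_loss = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
    # ...plus a penalty that grows with the squared magnitude of the weights,
    # discouraging overly complex (large-weight) solutions.
    penalty = lam * sum(w ** 2 for w in weights)
    return data_loss + penalty
```

With perfect predictions and zero weights the loss is zero; larger weights raise the loss even when predictions are unchanged.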
When information from evaluation data improperly influences training, inflating reported performance.
The set of tokens a model can represent; impacts efficiency, multilinguality, and handling of rare strings.
A high-capacity language model trained on massive corpora, exhibiting broad generalization and emergent behaviors.
Expanding training data via transformations (flips, noise, paraphrases) to improve robustness.
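Two of the transformations mentioned (a flip and additive noise) can be sketched on a toy "image" represented as a list of rows; `augment` and the noise scale are illustrative assumptions:

```python
import random

def augment(example, rng):
    # Horizontal flip: reverse each row of the 2D example.
    flipped = [list(reversed(row)) for row in example]
    # Add small Gaussian noise to every pixel for robustness.
    return [[px + rng.gauss(0, 0.01) for px in row] for row in flipped]
```

Each call yields a slightly different variant of the same example, effectively enlarging the training set.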
Artificially created data used to train/test models; helpful for privacy and coverage, risky if unrealistic.
Automated testing and deployment processes for models and data workflows, extending DevOps to ML artifacts.
A theoretical framework analyzing what classes of functions can be learned, how efficiently, and with what guarantees.
Error due to sensitivity to fluctuations in the training dataset.
Using the same parameters across different parts of a model (weight sharing).
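A well-known case is tying a language model's input embedding table to its output projection. A minimal sketch under that assumption, with illustrative helper names:

```python
def embed(table, token_id):
    # Input side: look up the token's vector in the shared table.
    return table[token_id]

def output_logits(table, hidden):
    # Output side: score the hidden state against every row of the SAME table,
    # so one parameter matrix serves both roles.
    return [sum(h * w for h, w in zip(hidden, row)) for row in table]
```

Sharing the table halves the parameter count of these two layers and couples their representations.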
The range of functions a model can represent.
Encodes positional information via rotation in embedding space.
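The rotation can be sketched on pairs of vector components, each rotated by an angle that depends on the token's position. A simplified sketch of the rotary idea (the frequency schedule below is an approximation of the usual formulation, and `base` is the conventional constant):

```python
import math

def rope(vec, pos, base=10000.0):
    # Rotate each consecutive (x, y) pair by a position-dependent angle;
    # lower-indexed pairs rotate faster than higher-indexed ones.
    out = []
    for i in range(0, len(vec), 2):
        x, y = vec[i], vec[i + 1]
        theta = pos / (base ** (i / len(vec)))
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x * c - y * s, x * s + y * c])
    return out
```

Because each step is a pure rotation, vector norms are preserved and relative positions show up as angle differences between tokens.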
Improving model performance by training on more data.
Measure of spread around the mean.
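The computation behind this definition, as a population-variance sketch (`variance` is an illustrative name):

```python
def variance(xs):
    # Average squared deviation from the mean.
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)
```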
Ensuring learned behavior matches intended objective.
Applying patterns learned in training to inputs or situations where they do not hold.
Loss of old knowledge when learning new tasks.
Train/test environment mismatch.
Randomizing simulation parameters to improve real-world transfer.
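A minimal sketch of what "randomizing simulation parameters" can look like in practice; the specific parameters and ranges below are illustrative assumptions:

```python
import random

def randomized_sim_params(rng):
    # Sample physics and sensor parameters from wide ranges so a policy
    # trained in simulation sees enough variation to transfer to reality.
    return {
        "friction": rng.uniform(0.2, 1.2),
        "mass_kg": rng.uniform(0.5, 2.0),
        "sensor_noise_std": rng.uniform(0.0, 0.05),
    }
```

Each training episode draws fresh parameters, so no single simulated world is memorized.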
Performance drop when moving from simulation to reality.
Combining simulation and real-world data.
Learned model of environment dynamics.
Modeling environment evolution in latent space.
A function measuring prediction error (and sometimes calibration), guiding gradient-based optimization.
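A concrete example of such a function is cross-entropy for classification, shown here as a sketch (assuming the model already outputs normalized probabilities):

```python
import math

def cross_entropy(probs, target_index):
    # Negative log-likelihood of the correct class: 0 for a perfect
    # prediction, growing without bound as the model's probability
    # on the true class approaches zero.
    return -math.log(probs[target_index])
```

Gradient-based optimization then adjusts parameters to push this value down.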
Breaking documents into pieces for retrieval; chunk size/overlap strongly affect RAG quality.
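A minimal character-based sketch of chunking with overlap; real pipelines often split on tokens or sentences instead, and the default sizes here are illustrative:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    # Fixed-size windows that overlap by `overlap` characters, so context
    # spanning a boundary still appears intact in at least one chunk.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

Larger chunks keep more context per retrieval hit; more overlap reduces boundary loss at the cost of index size.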
Training across many devices/silos without centralizing raw data; aggregates updates, not data.
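The "aggregates updates, not data" step can be sketched as FedAvg-style weighted averaging; the function below is an illustrative simplification that treats each client's model as a flat parameter list:

```python
def federated_average(client_updates, client_sizes):
    # Weight each client's parameter vector by its local dataset size;
    # only these vectors are shared with the server, never raw examples.
    total = sum(client_sizes)
    dim = len(client_updates[0])
    return [sum(u[d] * n for u, n in zip(client_updates, client_sizes)) / total
            for d in range(dim)]
```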
Samples from the k highest-probability tokens to limit unlikely outputs.
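A minimal sketch of the sampling step, assuming raw logits as input; the softmax-over-top-k renormalization is the standard trick, but the function name is illustrative:

```python
import math
import random

def top_k_sample(logits, k, rng):
    # Keep only the k highest-scoring tokens, renormalize their
    # probabilities, and sample among them; everything else gets
    # probability zero, cutting off unlikely outputs.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = [math.exp(logits[i]) for i in top]
    r = rng.random() * sum(weights)
    for i, w in zip(top, weights):
        r -= w
        if r <= 0:
            return i
    return top[-1]
```

With k=1 this degenerates to greedy decoding; larger k trades determinism for diversity.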
Systematic review of model/data processes to ensure performance, fairness, security, and policy compliance.
Constraining model outputs into a schema used to call external APIs/tools safely and deterministically.
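A minimal validation sketch for the "safely and deterministically" part: parse the model's output as JSON and reject anything outside the schema before any API is invoked. The field names `tool` and `args` are illustrative assumptions, not a fixed standard:

```python
import json

def validate_tool_call(raw, allowed_tools):
    # Parse the model's raw output and enforce the expected shape:
    # a known tool name plus an object of arguments.
    call = json.loads(raw)
    if call.get("tool") not in allowed_tools:
        raise ValueError("unknown tool")
    if not isinstance(call.get("args"), dict):
        raise ValueError("args must be an object")
    return call
```

Only outputs that survive this gate ever reach an external API.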