Results for "out-of-sample performance"
The sample mean converges to the expected value as the number of samples grows.
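A minimal simulation sketch of this convergence (the law of large numbers); the uniform distribution and sample sizes are illustrative choices, not from the source:

```python
import numpy as np

# Draw i.i.d. samples and watch the running sample mean approach E[X] = 0.5.
rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=100_000)
running_mean = np.cumsum(samples) / np.arange(1, samples.size + 1)

for n in (10, 1_000, 100_000):
    print(f"n={n:>7}: sample mean = {running_mean[n - 1]:.4f} (expected value = 0.5)")
```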
Learning a function from input-output pairs (labeled data), with the goal of accurately predicting outputs for unseen inputs.
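A minimal supervised-learning sketch (illustrative dataset and model); the point is that performance is measured on inputs held out from training:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Labeled input-output pairs; split off a test set the model never sees during training.
X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
print("out-of-sample accuracy:", accuracy_score(y_test, model.predict(X_test)))
```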
A mismatch between training and deployment data distributions that can degrade model performance.
When information from evaluation data improperly influences training, inflating reported performance.
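A hedged sketch of one common leakage pattern and how to avoid it: preprocessing fit on the full dataset lets evaluation-set statistics influence training, while a pipeline fit only on the training split does not (dataset and model are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Leaky variant (avoided here): StandardScaler().fit(X) before splitting would
# estimate scaling statistics from the evaluation data as well.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1_000))
clf.fit(X_train, y_train)          # scaler and model see training data only
print("held-out accuracy:", clf.score(X_test, y_test))
```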
A robust evaluation technique that trains/evaluates across multiple splits to estimate performance variability.
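A minimal k-fold cross-validation sketch (illustrative data, model, and fold count), reporting the mean and spread of scores rather than a single split:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1_000, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

scores = cross_val_score(LogisticRegression(max_iter=1_000), X, y, cv=cv)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f} over {len(scores)} folds")
```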
Often more informative than ROC on imbalanced datasets because it focuses on positive-class performance.
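This entry reads like a description of precision-recall analysis; a hedged sketch on an artificially imbalanced dataset, with average precision as the single-number summary:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, average_precision_score

# ~5% positives, where ROC can look deceptively good while PR stays honest.
X, y = make_classification(n_samples=5_000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

scores = LogisticRegression(max_iter=1_000).fit(X_train, y_train).predict_proba(X_test)[:, 1]
precision, recall, _ = precision_recall_curve(y_test, scores)
print("average precision:", average_precision_score(y_test, scores))
```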
Halting training when validation performance stops improving to reduce overfitting.
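A framework-agnostic sketch of the stopping rule; `train_epoch` and `validate` are hypothetical callbacks, and the patience value is an illustrative choice:

```python
def train_with_early_stopping(train_epoch, validate, max_epochs=100, patience=5):
    """Stop once validation loss has not improved for `patience` epochs."""
    best_loss, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_epoch()                      # one pass over the training data
        val_loss = validate()              # loss on the held-out validation set
        if val_loss < best_loss:
            best_loss, best_epoch = val_loss, epoch
            # in practice, also checkpoint the model weights here
        elif epoch - best_epoch >= patience:
            break                          # no improvement for `patience` epochs
    return best_epoch, best_loss
```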
Achieving task performance by providing a small number of examples inside the prompt without weight updates.
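A minimal few-shot prompt sketch; the task, examples, and label set are invented for illustration, and the assembled string would be sent to any instruction-following model with its weights left unchanged:

```python
examples = [
    ("I loved this film!", "positive"),
    ("Terrible acting and a dull plot.", "negative"),
    ("An instant classic.", "positive"),
]
query = "The pacing dragged and I nearly fell asleep."

# Demonstrations live inside the prompt itself; the model's weights stay fixed.
prompt = "Classify the sentiment of each review.\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

print(prompt)
```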
Updating a pretrained model’s weights on task-specific data to improve performance or adapt style/behavior.
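A hedged PyTorch sketch of the idea; `pretrained_model` and `task_loader` are placeholders for whatever pretrained model and task-specific dataset are being adapted:

```python
import torch

def fine_tune(pretrained_model, task_loader, epochs=3, lr=1e-5):
    optimizer = torch.optim.AdamW(pretrained_model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    pretrained_model.train()
    for _ in range(epochs):
        for inputs, labels in task_loader:
            optimizer.zero_grad()
            loss = loss_fn(pretrained_model(inputs), labels)
            loss.backward()                # gradients w.r.t. the pretrained weights
            optimizer.step()               # update those weights on task-specific data
    return pretrained_model
```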
Standardized documentation describing intended use, performance, limitations, data, and ethical considerations.
Systematic review of model/data processes to ensure performance, fairness, security, and policy compliance.
Training a smaller “student” model to mimic a larger “teacher,” often improving efficiency while retaining performance.
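A hedged sketch of a standard distillation loss: the student matches the teacher's softened output distribution in addition to the hard labels; the temperature and weighting are illustrative choices:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # KL term: student log-probs vs. teacher log-probs, both softened by temperature T.
    soft_teacher = F.log_softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean", log_target=True) * (T * T)
    # Hard-label term: ordinary cross-entropy on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```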
Maliciously inserting or altering training data to implant backdoors or degrade performance.
Empirical laws linking model size, dataset size, and compute to performance.
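One commonly cited functional form is a power law in the style of the Kaplan et al. scaling-law analyses; the constants are empirical and vary across studies, so treat this as a sketch of the shape rather than exact values:

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}
```

Here L is the loss, N the parameter count, and D the dataset size, with the other resource assumed non-limiting in each case.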
Improving model performance by training on more data.
Performance drop when moving from simulation to reality.
Unequal performance across demographic groups.
Tradeoff between safety and performance.