Results for "out-of-sample performance"
Law of large numbers: the sample mean converges to the expected value as the sample size grows.
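A minimal Python illustration of this entry (the fair-die distribution and the sample sizes are arbitrary choices):

```python
import random

def sample_mean(n, seed=0):
    """Average of n fair-die rolls; the expected value is 3.5."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(n)) / n

mean_small = sample_mean(100)
mean_large = sample_mean(100_000)
# As n grows, the sample mean clusters ever more tightly around 3.5.
```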
Dataset: a structured collection of examples used to train and evaluate models; quality, bias, and coverage often dominate outcomes.
PAC learning: a concept class is PAC-learnable if, with high probability, an approximately correct hypothesis can be learned from finitely many samples.
Importance sampling: sampling from an easier proposal distribution and reweighting each draw by the ratio of target to proposal density.
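A minimal sketch of the idea, assuming a standard normal target and a Uniform(-5, 5) proposal (both chosen purely for illustration):

```python
import math
import random

def normal_pdf(x):
    """Density of the standard normal target distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def importance_estimate(f, n=200_000, seed=0):
    """Estimate E_p[f(X)] for p = N(0, 1) by sampling from Uniform(-5, 5)
    and reweighting each draw by p(x) / q(x)."""
    rng = random.Random(seed)
    q = 1 / 10  # proposal density on [-5, 5]
    total = 0.0
    for _ in range(n):
        x = rng.uniform(-5, 5)
        total += f(x) * normal_pdf(x) / q  # the weight corrects the mismatch
    return total / n

second_moment = importance_estimate(lambda x: x * x)  # true value: 1.0
```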
Random variable: a variable whose values depend on the outcome of a random process.
Gradual rollout (canary deployment): incrementally deploying new models to a subset of traffic to reduce risk.
Caching: storing previously computed results to avoid repeating the compute.
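In Python this is often a one-line decorator; a small sketch using functools.lru_cache (Fibonacci stands in for any expensive pure function):

```python
from functools import lru_cache

calls = 0

@lru_cache(maxsize=None)
def fib(n):
    """Naive recursion, but the cache computes each distinct n only once."""
    global calls
    calls += 1
    return n if n < 2 else fib(n - 1) + fib(n - 2)

value = fib(30)
# Only 31 bodies run (n = 0..30); without the cache it would take ~2.7 million calls.
```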
Momentum: uses an exponential moving average of gradients to speed convergence and damp oscillation.
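A minimal heavy-ball sketch on a 1-D quadratic (the learning rate, beta, and objective are illustrative choices):

```python
def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    """One update: v accumulates an exponentially decaying average of gradients."""
    v = beta * v + grad
    return w - lr * v, v

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(200):
    w, v = momentum_step(w, v, 2 * (w - 3))
# w spirals in toward the minimizer at 3 with damped oscillation.
```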
Vocabulary: the set of tokens a model can represent; affects efficiency, multilinguality, and handling of rare strings.
Causal masking: prevents attention to future tokens during training and inference.
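A tiny sketch of what such a mask looks like (boolean lists stand in for a tensor):

```python
def causal_mask(n):
    """n x n mask: entry [q][k] is True iff query position q may attend to key k."""
    return [[k <= q for k in range(n)] for q in range(n)]

mask = causal_mask(3)
# Row 0 sees only itself; row 2 sees all three positions.
```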
Planner-executor pattern: separates planning from execution in agent architectures; a planner proposes steps and a separate executor carries them out.
False negative: a failure to detect a disease (or other positive condition) that is actually present.
Learning theory: a theoretical framework analyzing which classes of functions can be learned, how efficiently, and with what guarantees.
VC dimension: a measure of a model class's expressive capacity, defined by the largest dataset it can shatter.
Rademacher complexity: measures a model class's ability to fit random noise; used to bound generalization error.
Empirical risk minimization: minimizing average loss on the training data; can overfit when data is limited or biased.
Stochastic gradient descent: a gradient method that uses random minibatches for efficient training on large datasets.
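A minimal sketch fitting a line with handwritten minibatch updates (the data, learning rate, and batch size are illustrative):

```python
import random

def sgd_linear(xs, ys, lr=0.01, epochs=200, batch=4, seed=0):
    """Fit y ~ w*x + b by minibatch gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)  # the "stochastic" part: random minibatch order
        for i in range(0, len(data), batch):
            mb = data[i:i + batch]
            gw = sum(2 * (w * x + b - y) * x for x, y in mb) / len(mb)
            gb = sum(2 * (w * x + b - y) for x, y in mb) / len(mb)
            w -= lr * gw
            b -= lr * gb
    return w, b

xs = [i / 10 for i in range(40)]   # x in [0, 3.9]
ys = [2 * x + 1 for x in xs]       # noiseless line y = 2x + 1
w, b = sgd_linear(xs, ys)
```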
Data augmentation: expanding training data via transformations (flips, noise, paraphrases) to improve robustness.
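For numeric features the simplest version is jitter; a sketch (the noise scale and copy count are arbitrary):

```python
import random

def augment_with_noise(xs, copies=3, scale=0.1, seed=0):
    """Expand a 1-D dataset by appending jittered copies of each point."""
    rng = random.Random(seed)
    out = list(xs)
    for _ in range(copies):
        out.extend(x + rng.gauss(0, scale) for x in xs)
    return out

augmented = augment_with_noise([1.0, 2.0])
# 2 originals + 3 noisy copies of each -> 8 examples.
```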
Maximum likelihood estimation: estimating parameters by maximizing the likelihood of the observed data.
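The Gaussian case has a closed form; a sketch (note the variance MLE divides by n, not n - 1):

```python
def gaussian_mle(xs):
    """MLE for a Gaussian: mu is the sample mean, sigma^2 the mean squared deviation."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # biased: divides by n, not n - 1
    return mu, var

mu_hat, var_hat = gaussian_mle([2.0, 4.0, 6.0])
# mu_hat = 4.0, var_hat = 8/3
```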
Score-based generative modeling: learns the score of the data distribution, ∇x log p(x), which sampling procedures such as Langevin dynamics or diffusion then follow to generate data.
Central limit theorem: the standardized sum of independent variables converges to a normal distribution.
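A sketch with uniform summands (the term and sample counts are arbitrary):

```python
import random

def standardized_sums(n_terms=100, n_samples=2_000, seed=0):
    """Sums of Uniform(0, 1) draws, centered and scaled; the CLT says
    these look approximately N(0, 1)."""
    rng = random.Random(seed)
    mean, std = n_terms * 0.5, (n_terms / 12) ** 0.5
    return [(sum(rng.random() for _ in range(n_terms)) - mean) / std
            for _ in range(n_samples)]

zs = standardized_sums()
z_mean = sum(zs) / len(zs)
z_std = (sum(z * z for z in zs) / len(zs) - z_mean ** 2) ** 0.5
# z_mean is near 0 and z_std near 1, as a standard normal requires.
```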
Monte Carlo estimation: approximating expectations by averaging over random samples.
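The classic sketch estimates pi from the area of a quarter circle (the sample count is arbitrary):

```python
import random

def estimate_pi(n=200_000, seed=0):
    """Fraction of uniform points in the unit square that land inside the
    quarter circle, scaled by 4."""
    rng = random.Random(seed)
    hits = sum(rng.random() ** 2 + rng.random() ** 2 <= 1 for _ in range(n))
    return 4 * hits / n

pi_hat = estimate_pi()
# pi_hat lands within a few hundredths of 3.14159.
```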
Data leakage: when information from evaluation data improperly influences training, inflating reported performance.
Scaling laws: empirical laws linking model size, data, and compute to performance.
Model monitoring: observing model inputs, outputs, latency, cost, and quality over time to catch regressions and drift.
Train/validation/test split: separating data into training (fitting), validation (tuning), and test (final estimate) sets to avoid leakage and optimism bias.
Cross-validation: a robust evaluation technique that trains and evaluates across multiple splits to estimate performance and its variability.
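A sketch of generating k-fold index splits by hand (contiguous folds for simplicity; in practice, shuffle first):

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

folds = list(k_fold_indices(10, 5))
# 5 folds; each index appears in exactly one test set.
```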
Confusion matrix: a table summarizing classification outcomes (true/false positives and negatives), foundational for metrics such as precision, recall, and specificity.
F1 score: the harmonic mean of precision and recall; useful when balancing false positives and false negatives matters.
Precision-recall curve: often more informative than ROC on imbalanced datasets; focuses on performance on the positive class.
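The classification-metric entries above fit in a few lines; a sketch with a toy label vector:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Counts underlying the confusion matrix for a binary problem."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

tp, fp, fn, tn = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
score = f1_score(tp, fp, fn)  # precision = recall = 2/3, so F1 = 2/3
```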