Results for "data distribution"
How well a model performs on new data drawn from the same (or a similar) distribution as the training data.
Shift in feature distribution over time.
A measure of randomness or uncertainty in a probability distribution.
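A minimal sketch of this definition in plain Python (toy distributions, base-2 logs, so the units are bits):

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i * log2(p_i), in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# A fair coin is maximally uncertain for two outcomes: 1 bit.
print(entropy([0.5, 0.5]))   # → 1.0
# Four equally likely outcomes need 2 bits.
print(entropy([0.25] * 4))   # → 2.0
```

A skewed distribution such as `[0.9, 0.1]` falls strictly between 0 and 1 bit, reflecting reduced but nonzero uncertainty.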
Measures how one probability distribution diverges from another.
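This is the Kullback–Leibler divergence; a small self-contained sketch over discrete distributions (natural log, so units are nats) shows its two defining properties, non-negativity and asymmetry:

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i); >= 0 and asymmetric."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, p))  # identical distributions → 0.0
print(kl_divergence(p, q))  # positive, and not equal to kl_divergence(q, p)
```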
Bayesian parameter estimation using the mode of the posterior distribution.
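A worked toy case of maximum a posteriori (MAP) estimation, assuming a Beta(a, b) prior on a coin's heads probability and Bernoulli observations; the posterior is Beta(a + heads, b + tails), whose mode has a closed form:

```python
# MAP estimate for a coin's heads probability under a Beta(a, b) prior.
# Posterior: Beta(a + heads, b + tails); its mode is the MAP estimate.
def map_estimate(heads, tails, a=2.0, b=2.0):
    return (a + heads - 1) / (a + b + heads + tails - 2)

# 7 heads in 10 flips; the prior pulls the estimate toward 0.5,
# so MAP gives 8/12 ≈ 0.667 rather than the raw frequency 0.7.
print(map_estimate(7, 3))
```

With no data at all, the estimate is just the prior mode (0.5 here), which is the intended behavior of incorporating a prior belief.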
Graphical model expressing factorization of a probability distribution.
Average value under a distribution.
The standardized sum (or mean) of many independent, identically distributed variables converges in distribution to a normal distribution.
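A quick simulation of the central limit theorem using only the standard library: individual uniform(0, 1) draws are far from Gaussian, but means of 100 draws cluster tightly and symmetrically around the true mean 0.5:

```python
import random
import statistics

random.seed(0)

# 2000 sample means, each over 100 uniform(0, 1) draws.
sample_means = [statistics.mean(random.random() for _ in range(100))
                for _ in range(2000)]

print(round(statistics.mean(sample_means), 3))   # close to 0.5
print(round(statistics.stdev(sample_means), 3))  # close to sqrt(1/12)/10 ≈ 0.029
```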
Sampling from easier distribution with reweighting.
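This is importance sampling; a minimal sketch with a toy target, estimating E_p[x^2] for p(x) = 2x on [0, 1] (true value 0.5) by drawing from the easier uniform proposal q(x) = 1 and reweighting each sample by p(x) / q(x):

```python
import random

random.seed(0)

def importance_estimate(n=100_000):
    """Monte Carlo estimate of E_p[x^2] using samples from q = Uniform(0, 1)."""
    total = 0.0
    for _ in range(n):
        x = random.random()      # x ~ q
        weight = 2 * x / 1.0     # p(x) / q(x)
        total += weight * x ** 2
    return total / n

print(importance_estimate())  # close to the true value 0.5
```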
Restricting how powerful models are distributed or released, to limit misuse.
Updated belief after observing data.
Belief before observing data.
Learning structure from unlabeled data, such as discovering groups, compressing representations, or modeling data distributions.
Minimizing average loss on training data; can overfit when data is limited or biased.
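A minimal empirical-risk-minimization sketch: with squared loss and a constant predictor c, the empirical risk (1/n) · Σ (y_i − c)^2 is minimized at the sample mean, and an outlier in the limited training data drags the minimizer with it:

```python
def empirical_risk(c, ys):
    """Average squared loss of the constant predictor c on data ys."""
    return sum((y - c) ** 2 for y in ys) / len(ys)

ys = [1.0, 2.0, 3.0, 10.0]  # the outlier (10.0) drags the minimizer up
candidates = [i / 10 for i in range(0, 101)]
best = min(candidates, key=lambda c: empirical_risk(c, ys))
print(best)  # → 4.0, the sample mean, far above the typical value ~2
```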
When a model fits noise/idiosyncrasies of training data and performs poorly on unseen data.
Training across many devices/silos without centralizing raw data; aggregates updates, not data.
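A minimal sketch of the aggregation step (in the style of federated averaging; the setup and names here are illustrative): each client sends back updated model weights and its example count, and the server averages the weights by dataset size without ever seeing raw data:

```python
def federated_average(client_updates):
    """client_updates: list of (weights, num_examples) pairs from clients.

    Returns the element-wise average of the weight vectors,
    weighted by each client's number of training examples.
    """
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [sum(w[i] * n for w, n in client_updates) / total
            for i in range(dim)]

# Two clients; the second has 3x the data, so its update dominates.
clients = [([1.0, 0.0], 10), ([0.0, 1.0], 30)]
print(federated_average(clients))  # → [0.25, 0.75]
```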
Protecting data during network transfer and while stored; essential for ML pipelines handling sensitive data.
When information from evaluation data improperly influences training, inflating reported performance.
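A toy numeric example of one common leakage pattern: computing a preprocessing statistic (here a mean for normalization) on all data, train and test together, lets test-set information leak into training. Statistics should be fit on the train split only:

```python
data = [1.0, 2.0, 3.0, 100.0]   # last point is the held-out test example
train, test = data[:3], data[3:]

leaky_mean = sum(data) / len(data)    # uses the test point: leakage
clean_mean = sum(train) / len(train)  # computed from the train split only

print(leaky_mean, clean_mean)  # → 26.5 2.0
```

The extreme test point has silently shifted the "training" statistic, which is exactly the kind of contamination that inflates reported performance.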
Expanding training data via transformations (flips, noise, paraphrases) to improve robustness.
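A minimal sketch on a numeric feature vector, producing two common augmentations, a flip and a small-Gaussian-noise copy (the helper name and noise scale are illustrative):

```python
import random

random.seed(0)

def augment(example, noise_std=0.01):
    """Return augmented variants of one example: a flip and a noisy copy."""
    flipped = list(reversed(example))
    noisy = [x + random.gauss(0.0, noise_std) for x in example]
    return [flipped, noisy]

sample = [0.1, 0.2, 0.3]
for variant in augment(sample):
    print(variant)
```

For images the flip would be spatial and for text a paraphrase, but the principle is the same: label-preserving transformations multiply the effective training set.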
Artificially created data used to train/test models; helpful for privacy and coverage, risky if unrealistic.
Processes and controls for data quality, access, lineage, retention, and compliance across the AI lifecycle.
Tracking where data came from and how it was transformed; key for debugging and compliance.
Maliciously inserting or altering training data to implant backdoors or degrade performance.
Improving model performance by training on more data.
A function describing the likelihood of each possible outcome of a random variable.
A mismatch between the environment or distribution seen in training and the one encountered at test or deployment time.
The field of building systems that perform tasks associated with human intelligence, such as perception, reasoning, language, planning, and decision-making, via algorithms.
A subfield of AI where models learn patterns from data to make predictions or decisions, improving with experience rather than explicit rule-coding.
Learning a function from input-output pairs (labeled data), optimizing performance on predicting outputs for unseen inputs.
Learning from data by constructing “pseudo-labels” (e.g., next-token prediction, masked modeling) without manual annotation.