Results for "data → model"
When a model fits noise/idiosyncrasies of training data and performs poorly on unseen data.
Learning structure from unlabeled data, such as discovering groups, compressing representations, or modeling data distributions.
Minimizing average loss on training data; can overfit when data is limited or biased.
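In standard textbook notation (a general formulation, not taken from this entry), this is empirical risk minimization over n training pairs (x_i, y_i) with loss ℓ and hypothesis class F:

```latex
\hat{R}(f) \;=\; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f(x_i),\, y_i\bigr),
\qquad
\hat{f} \;=\; \arg\min_{f \in \mathcal{F}} \hat{R}(f)
```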
Training across many devices/silos without centralizing raw data; aggregates updates, not data.
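A minimal sketch of one federated-averaging round, assuming clients hold private data and a hypothetical local_train helper returns locally updated weight arrays plus example counts; only the weights, never the raw data, are aggregated.

```python
import numpy as np

def federated_averaging_round(global_weights, clients, local_train):
    """One round: each client trains locally; only weight updates are sent back."""
    updates, sizes = [], []
    for client_data in clients:
        # local_train is a hypothetical helper that fine-tunes a copy of the
        # global weights on this client's private data and returns new weights
        local_weights, n_examples = local_train(global_weights, client_data)
        updates.append(local_weights)
        sizes.append(n_examples)
    # size-weighted average of client weights; raw data never leaves the clients
    total = sum(sizes)
    return sum(w * (n / total) for w, n in zip(updates, sizes))
```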
Protecting data during network transfer and while stored; essential for ML pipelines handling sensitive data.
A probabilistic model for sequential data in which hidden (latent) states evolve over time and generate the observations.
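The standard factorization of such a model over hidden states z_t and observations x_t (general hidden Markov model notation, not specific to this entry):

```latex
p(x_{1:T}, z_{1:T}) \;=\; p(z_1)\, p(x_1 \mid z_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1})\, p(x_t \mid z_t)
```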
Learning where data arrives sequentially and the model updates continuously, often under changing distributions.
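A minimal sketch of this setting, assuming a linear model updated one example at a time with stochastic gradient descent; the data stream and learning rate are illustrative.

```python
import numpy as np

def online_sgd(stream, n_features, lr=0.01):
    """Update a linear regressor one (x, y) pair at a time as data arrives."""
    w = np.zeros(n_features)
    for x, y in stream:          # stream yields examples sequentially
        pred = w @ x
        grad = (pred - y) * x    # gradient of squared error for this example
        w -= lr * grad           # immediate update; no full retraining pass
    return w
```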
A mismatch between training and deployment data distributions that can degrade model performance.
When a model cannot capture underlying structure, performing poorly on both training and test data.
How well a model performs on new data drawn from the same (or similar) distribution as training.
Updating a pretrained model’s weights on task-specific data to improve performance or adapt style/behavior.
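A minimal PyTorch-style sketch of this process, assuming pretrained_model returns logits and task_loader yields (inputs, labels) batches; both names are hypothetical.

```python
import torch

def fine_tune(pretrained_model, task_loader, epochs=3, lr=1e-5):
    """Continue training a pretrained model's weights on task-specific data."""
    optimizer = torch.optim.AdamW(pretrained_model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    pretrained_model.train()
    for _ in range(epochs):
        for inputs, labels in task_loader:
            optimizer.zero_grad()
            logits = pretrained_model(inputs)   # assumes the model returns logits
            loss = loss_fn(logits, labels)
            loss.backward()
            optimizer.step()
    return pretrained_model
```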
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
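One common ingredient is the pairwise preference loss used to train the reward model; a sketch assuming a hypothetical reward_model that scores a (prompt, response) pair.

```python
import torch.nn.functional as F

def preference_loss(reward_model, prompt, chosen, rejected):
    """Bradley-Terry style loss: push the reward of the preferred response
    above the reward of the rejected one."""
    r_chosen = reward_model(prompt, chosen)      # hypothetical scoring call
    r_rejected = reward_model(prompt, rejected)
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```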
Systematic differences in model outcomes across groups; arises from data, labels, and deployment context.
Systematic review of model/data processes to ensure performance, fairness, security, and policy compliance.
Attacks that manipulate model instructions (especially via retrieved content) to override system goals or exfiltrate data.
Methods for protecting models and data during inference from operators or attackers, e.g., trusted execution environments.
Empirical laws linking model size, data, and compute to performance.
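A commonly cited functional form (in the spirit of Chinchilla-style fits), where N is parameter count, D is training tokens, and E, A, B, α, β are fitted constants; the symbols here are placeholders, not values from this glossary.

```latex
L(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```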
Detecting unauthorized model outputs or data leaks.
When information from evaluation data improperly influences training, inflating reported performance.
Expanding training data via transformations (flips, noise, paraphrases) to improve robustness.
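A minimal sketch for image data, assuming images are NumPy arrays with pixel values in [0, 1]; the specific transforms shown are illustrative.

```python
import numpy as np

def augment(image, rng=None):
    """Return a randomly transformed copy of an image to expand the training set."""
    if rng is None:
        rng = np.random.default_rng()
    out = image.copy()
    if rng.random() < 0.5:
        out = np.fliplr(out)                        # horizontal flip
    noise = rng.normal(0.0, 0.02, size=out.shape)   # small Gaussian pixel noise
    return np.clip(out + noise, 0.0, 1.0)           # assumes pixel values in [0, 1]
```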
Artificially created data used to train/test models; helpful for privacy and coverage, risky if unrealistic.
Processes and controls for data quality, access, lineage, retention, and compliance across the AI lifecycle.
Tracking where data came from and how it was transformed; key for debugging and compliance.
Maliciously inserting or altering training data to implant backdoors or degrade performance.
Improving performance by training on more data.
Standardized documentation describing intended use, performance, limitations, data, and ethical considerations.
Inferring sensitive attributes of the training data from a model's outputs or parameters.
Models that learn to generate samples resembling training data.
Competitive advantage from proprietary models/data.
The field of building systems that perform tasks associated with human intelligence—perception, reasoning, language, planning, and decision-making—via algorithms.