Results for "data-driven"
Dataset: A structured collection of examples used to train and evaluate models; quality, bias, and coverage often dominate outcomes.
Feature: A measurable property or attribute used as model input (raw or engineered), such as age, pixel intensity, or token ID.
Model: A parameterized mapping from inputs to outputs, comprising both the architecture and the learned parameters.
Parameters: The learned numeric values of a model, adjusted during training to minimize a loss function.
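A minimal sketch of that adjustment process, using plain gradient descent on a made-up one-parameter linear model (the data, learning rate, and step count are all invented for illustration):

```python
# One parameter w of the model y = w * x, adjusted by gradient
# descent to minimize mean squared error on toy data.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by the "true" w = 2

w = 0.0
lr = 0.05
for _ in range(200):
    # gradient of MSE with respect to w: mean(2 * (w*x - y) * x)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad
# w has converged close to 2.0
```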
Underfitting: When a model cannot capture the underlying structure of the data, performing poorly on both training and test data.
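To make "poor on both splits" concrete, a toy sketch in which a deliberately too-simple model (a constant predictor) underfits clearly linear data; all numbers are invented:

```python
# A constant predictor cannot express a linear trend, so its squared
# error stays high on the training split and the test split alike.
train_x, train_y = [1, 2, 3, 4], [2, 4, 6, 8]
test_x, test_y = [5, 6], [10, 12]

c = sum(train_y) / len(train_y)   # best constant under squared error

def mse(ys):
    return sum((y - c) ** 2 for y in ys) / len(ys)

train_err = mse(train_y)   # high: the model cannot fit the trend
test_err = mse(test_y)     # also high: the failure is not overfitting
```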
Generalization: How well a model performs on new data drawn from the same (or a similar) distribution as the training data.
Algorithmic bias: Systematic differences in model outcomes across groups; arises from data, labels, and deployment context.
Inter-annotator agreement: A measure of labeling consistency across labelers; low agreement indicates ambiguous tasks or poor guidelines.
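One common agreement measure is Cohen's kappa, which corrects raw agreement for chance; a minimal sketch for two labelers, with invented label sequences:

```python
# Cohen's kappa for two labelers over the same six items.
a = ["cat", "cat", "dog", "dog", "cat", "dog"]
b = ["cat", "dog", "dog", "dog", "cat", "cat"]

n = len(a)
observed = sum(x == y for x, y in zip(a, b)) / n  # raw agreement

# chance agreement: sum over labels of the product of each
# labeler's marginal frequency for that label
labels = set(a) | set(b)
expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)

kappa = (observed - expected) / (1 - expected)
```

Kappa of 0 means agreement no better than chance; 1 means perfect agreement.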
Active learning: Selecting the most informative samples to label (e.g., by uncertainty sampling) to reduce labeling cost.
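A minimal sketch of uncertainty sampling: rank unlabeled items by the entropy of the model's predicted class probabilities and label the most uncertain first (item names and probabilities here are invented):

```python
import math

# Hypothetical predicted class probabilities for unlabeled items.
predictions = {
    "doc1": [0.98, 0.02],   # confident
    "doc2": [0.55, 0.45],   # uncertain
    "doc3": [0.80, 0.20],
}

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

# items the model is least sure about come first in the labeling queue
to_label = sorted(predictions, key=lambda k: entropy(predictions[k]),
                  reverse=True)
```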
Datasheet: Structured dataset documentation covering collection, composition, recommended uses, biases, and maintenance.
Experiment tracking: Logging hyperparameters, code versions, data snapshots, and results to reproduce and compare experiments.
Model audit: A systematic review of model and data processes to ensure performance, fairness, security, and policy compliance.
Membership inference and reconstruction attacks: Attacks that infer whether specific records were in the training data, or that reconstruct sensitive training examples.
Secure inference: Methods to protect the model and data during inference (e.g., trusted execution environments) from operators or attackers.
Sharp minimum: A narrow minimum of the loss landscape, often associated with poorer generalization.
Bias (statistical): Systematic error introduced by the simplifying assumptions of a learning algorithm.
Inductive bias: The built-in assumptions of a learning algorithm that guide learning efficiency and generalization.
Maximum likelihood estimation: Estimating parameters by maximizing the likelihood of the observed data.
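A minimal sketch, assuming a Bernoulli (coin-flip) model, for which the maximum likelihood estimate has a closed form: the sample mean.

```python
import math

# Observed coin flips (1 = heads). For a Bernoulli model the MLE of
# the heads probability is simply the fraction of heads observed.
flips = [1, 0, 1, 1, 0, 1, 1, 0]
p_hat = sum(flips) / len(flips)

def log_likelihood(p, data):
    return sum(math.log(p if x else 1 - p) for x in data)

# sanity check: the MLE beats a few other candidate values of p
assert all(log_likelihood(p_hat, flips) >= log_likelihood(p, flips)
           for p in (0.3, 0.5, 0.7, 0.9))
```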
Leakage detection: Detecting unauthorized model outputs or leaks of sensitive data.
Graph neural network (GNN): Neural networks that operate on graph-structured data by propagating information along edges.
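A minimal sketch of the propagation idea, assuming a toy graph with scalar node features and a simple mean-of-neighbors aggregation (a real GNN would use learned weight matrices and nonlinearities):

```python
# One message-passing step: each node's new feature blends its own
# feature with the mean of its neighbors' features.
edges = {0: [1, 2], 1: [0], 2: [0]}   # adjacency list (invented graph)
feats = {0: 1.0, 1: 2.0, 2: 4.0}      # invented scalar node features

def propagate(feats, edges):
    new = {}
    for node, nbrs in edges.items():
        msg = sum(feats[n] for n in nbrs) / len(nbrs)  # aggregate
        new[node] = 0.5 * feats[node] + 0.5 * msg      # update
    return new

feats = propagate(feats, edges)
```

Stacking several such steps lets information flow between nodes that are multiple edges apart.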
Energy-based model: Models that define an energy landscape over inputs rather than explicit, normalized probabilities.
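A minimal sketch over a tiny discrete space, where the normalizer Z is tractable by direct summation (states and energies are invented; in realistic models Z is intractable and is the core difficulty):

```python
import math

# p(x) = exp(-E(x)) / Z over a three-state space.
states = ["A", "B", "C"]
energy = {"A": 0.0, "B": 1.0, "C": 2.0}   # lower energy = more likely

Z = sum(math.exp(-energy[s]) for s in states)        # partition function
probs = {s: math.exp(-energy[s]) / Z for s in states}
```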
Score-based model: Learns the score function ∇ log p(x) of the data distribution and uses it for generative sampling.
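For a 1-D Gaussian the score has a closed form, which makes the idea easy to sketch; real score-based models learn a neural approximation of this quantity for arbitrary data:

```python
# For N(mu, sigma^2), the score d/dx log p(x) = -(x - mu) / sigma**2.
def gaussian_score(x, mu=0.0, sigma=1.0):
    return -(x - mu) / sigma**2

# following the score moves a sample toward high-density regions
x = 3.0
for _ in range(100):
    x += 0.1 * gaussian_score(x)   # plain ascent on log p(x)
# x has moved close to mu = 0
```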
Normalizing flow: Exact-likelihood generative models built from invertible transformations with tractable Jacobians.
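A minimal sketch, assuming a single affine transform x = a·z + b with a standard-normal base distribution; the change-of-variables formula then gives the exact log-likelihood log p(x) = log p_z(z) − log|a|:

```python
import math

a, b = 2.0, 1.0   # invented flow parameters

def log_prob(x):
    z = (x - b) / a                                   # invert the flow
    log_pz = -0.5 * (z**2 + math.log(2 * math.pi))    # standard normal
    return log_pz - math.log(abs(a))                  # Jacobian term

# equivalently, x ~ N(b, a^2); check against that density directly
def normal_logpdf(x, mu, sigma):
    return (-0.5 * (((x - mu) / sigma)**2 + math.log(2 * math.pi))
            - math.log(sigma))

assert abs(log_prob(3.0) - normal_logpdf(3.0, b, a)) < 1e-12
```

Deep flows chain many such invertible layers, summing the log-Jacobian terms.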
Generative adversarial network (GAN): A two-network setup in which a generator learns to fool a discriminator.
Temporal convolutional network: Convolutional networks applied along the time axis to model time series.
Cross-attention: Attention computed between two sequences or modalities, with queries drawn from one and keys and values from the other.
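A minimal sketch with NumPy, assuming 2 query tokens from one modality and 3 key/value tokens from another (all values are random; learned projection matrices are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
Q = rng.standard_normal((2, d))   # e.g., 2 text tokens (queries)
K = rng.standard_normal((3, d))   # e.g., 3 image patches (keys)
V = rng.standard_normal((3, d))   # values from the same patches

scores = Q @ K.T / np.sqrt(d)                 # (2, 3) similarities
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True) # softmax over patches
out = weights @ V                             # (2, d) attended output
```

Each output row is a weighted mix of the other modality's values, weighted by query-key similarity.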
Training pipeline: The end-to-end process for producing a trained model, from data preparation through training and validation.
Batch inference: Running predictions over large datasets on a schedule rather than per request.
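A minimal sketch of chunked scoring; `model` here is a hypothetical stand-in predictor:

```python
# Score a large dataset in fixed-size chunks rather than one item
# (or one request) at a time.
def model(batch):
    return [x * 2 for x in batch]   # hypothetical predictor

def batch_predict(items, batch_size=3):
    preds = []
    for i in range(0, len(items), batch_size):
        preds.extend(model(items[i:i + batch_size]))
    return preds

preds = batch_predict(list(range(10)))
```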
Feature store: A centralized repository for curated, reusable features shared across models.
Compute-optimal scaling (Chinchilla): A scaling law for jointly choosing model size and training-data size under a fixed compute budget.
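A minimal sketch, assuming the commonly cited rule of thumb from the Chinchilla work (training compute C ≈ 6·N·D FLOPs for N parameters and D tokens, with roughly 20 tokens per parameter at the optimum); these are approximations, not exact fitted constants:

```python
# Given a FLOP budget, split it between parameters and tokens under
# C = 6 * N * D with D = k * N, which gives N = sqrt(C / (6 * k)).
def compute_optimal(flops, tokens_per_param=20.0):
    n_params = (flops / (6 * tokens_per_param)) ** 0.5
    return n_params, tokens_per_param * n_params

n, d = compute_optimal(1e21)   # hypothetical budget of 1e21 FLOPs
```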