Results for "data → model"
Systematic review of model/data processes to ensure performance, fairness, security, and policy compliance.
Detecting unauthorized model outputs or data leaks.
Exact likelihood generative models using invertible transforms.
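This family of models computes exact log-likelihoods via the change-of-variables formula. A minimal sketch with a single invertible affine transform of a standard normal base distribution (the function name `affine_flow_logpdf` is illustrative, not from any library):

```python
import math

def affine_flow_logpdf(x, scale, shift):
    """Exact log-likelihood under an invertible affine transform of a
    standard normal base: log p(x) = log N(z; 0, 1) + log |dz/dx|,
    where z = (x - shift) / scale is the inverse transform."""
    z = (x - shift) / scale
    log_base = -0.5 * (z * z + math.log(2 * math.pi))  # standard normal log-pdf
    log_det = -math.log(abs(scale))                    # log |dz/dx|
    return log_base + log_det

# Equals the log-pdf of N(mean=1, std=2) evaluated at x = 1.
print(affine_flow_logpdf(1.0, 2.0, 1.0))
```

Real flows stack many such invertible layers, but each layer contributes the same two terms: the base log-density and the log-determinant of its Jacobian.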
Running model predictions over large datasets on a schedule (e.g., nightly) rather than per request.
A measurable property or attribute used as model input (raw or engineered), such as age, pixel intensity, or token ID.
Systematic differences in model outcomes across groups; arises from data, labels, and deployment context.
Selecting the most informative samples to label (e.g., uncertainty sampling) to reduce labeling cost.
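Uncertainty sampling, mentioned above, can be sketched in a few lines: rank unlabeled samples by the entropy of the model's predicted class distribution and label the top-k. The helper names here are illustrative:

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(pred_probs, k):
    """Return indices of the k samples whose predictions are most
    uncertain (highest entropy) -- the best candidates to label next."""
    ranked = sorted(range(len(pred_probs)),
                    key=lambda i: entropy(pred_probs[i]),
                    reverse=True)
    return ranked[:k]

# Sample 0 is maximally uncertain, sample 1 is near-certain.
preds = [[0.5, 0.5], [0.99, 0.01], [0.7, 0.3]]
print(select_most_uncertain(preds, 2))  # → [0, 2]
```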
Attacks that infer whether specific records were in training data, or reconstruct sensitive training examples.
Methods to protect model/data during inference (e.g., trusted execution environments) from operators/attackers.
A narrow, sharp minimum of the loss surface, often associated with poorer generalization than flat minima.
Systematic error introduced by simplifying assumptions in a learning algorithm.
Models that define an unnormalized energy landscape over configurations rather than explicit probabilities; lower energy corresponds to more plausible inputs.
The startup latency a service incurs before it can handle its first request (e.g., while loading model weights).
Automatically searching for closed-form mathematical expressions that fit observed data.
Logging hyperparameters, code versions, data snapshots, and results to reproduce and compare experiments.
A centralized repository for storing, versioning, and serving curated features consistently across training and inference.
Estimating parameters by maximizing the likelihood of the observed data.
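In the simplest cases this maximization has a closed form. A minimal sketch for a univariate Gaussian, where the maximum-likelihood estimates are the sample mean and the biased (1/n) sample variance (the function name is illustrative):

```python
def gaussian_mle(xs):
    """Closed-form MLE for a univariate Gaussian: the sample mean and
    the biased (1/n) sample variance jointly maximize the likelihood."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

mu, var = gaussian_mle([1.0, 2.0, 3.0, 4.0])
print(mu, var)  # → 2.5 1.25
```

For models without a closed form, the same objective is typically maximized numerically (e.g., by gradient ascent on the log-likelihood).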
An empirical scaling law for trading off model size against training-data size under a fixed compute budget.
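A rough sketch of how such a trade-off is applied in practice, under two commonly cited approximations that are assumptions here: training compute C ≈ 6·N·D FLOPs (N parameters, D tokens) and a compute-optimal ratio of roughly 20 tokens per parameter. The function name is illustrative:

```python
import math

def compute_optimal(c_flops, tokens_per_param=20.0):
    """Split a FLOP budget between parameter count N and training
    tokens D, assuming C = 6*N*D and D = tokens_per_param * N."""
    n = math.sqrt(c_flops / (6.0 * tokens_per_param))
    d = tokens_per_param * n
    return n, d

n, d = compute_optimal(1e21)
print(f"params ≈ {n:.2e}, tokens ≈ {d:.2e}")
```

The exact exponents and ratio depend on the study and the data distribution; the point of the sketch is only the shape of the calculation.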
Attention in which queries come from one modality or sequence and keys/values from another, letting one representation condition on the other.
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
Model-generated content that is fluent but unsupported by evidence or incorrect; mitigated by grounding and verification.
Incorrectly applying patterns learned in one context to inputs or settings where they do not hold.
A central catalog of deployed and experimental models, tracking versions, lineage, and approval status.
A structured collection of examples used to train/evaluate models; quality, bias, and coverage often dominate outcomes.
Measure of consistency across labelers; low agreement indicates ambiguous tasks or poor guidelines.
Neural networks that operate on graph-structured data by propagating information along edges.
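The propagation step can be sketched without a framework: each node aggregates features from itself and its neighbors. This minimal version uses mean aggregation and omits learned weights and nonlinearities (the function name is illustrative):

```python
def message_pass(adj, feats):
    """One propagation step of a simple GNN: each node's new feature
    vector is the mean of its own and its neighbors' current features.
    adj[i] lists the neighbor indices of node i."""
    out = []
    for i, neighbors in enumerate(adj):
        group = [feats[i]] + [feats[j] for j in neighbors]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out

# Path graph 0 - 1 - 2 with 2-dimensional node features.
adj = [[1], [0, 2], [1]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(message_pass(adj, feats))
```

Real GNN layers apply a learned linear map and a nonlinearity to the aggregated features, and stack several such steps so information travels across multiple hops.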
Structured dataset documentation covering collection, composition, recommended uses, biases, and maintenance.
Convolutional networks applied to time series, typically using causal (and often dilated) convolutions so outputs depend only on past inputs.
A two-network setup in which a generator learns to produce samples that fool a discriminator trained to distinguish real data from generated data.
A probability distribution expressing belief about parameters before observing data; combined with the likelihood via Bayes' rule, it yields the posterior.
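The update from prior belief to posterior is simplest with a conjugate pair. A minimal sketch using the Beta-Binomial case, where a Beta prior over a coin's bias plus observed heads/tails yields a Beta posterior in closed form (the function name is illustrative):

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Conjugate Bayesian update: Beta(alpha, beta) prior over a
    Bernoulli success probability, combined with binomial observations,
    gives a Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures

# Uniform prior Beta(1, 1); observe 7 heads and 3 tails.
a, b = beta_binomial_update(1.0, 1.0, 7, 3)
print(a, b, a / (a + b))  # posterior parameters and posterior mean
```

The posterior mean a/(a+b) shifts from the prior's 0.5 toward the observed frequency 0.7 as data accumulates.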