Results for "full pass through data"
Maximum number of tokens the model can attend to in one forward pass; constrains long-document reasoning.
One complete traversal of the training dataset during training.
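A minimal sketch of this idea: the outer loop counts epochs, and one epoch means visiting every training example exactly once. The `train_step` function here is a hypothetical placeholder for a real parameter update.

```python
# Minimal sketch of epochs: one epoch = one complete pass over the dataset.
# `train_step` is a hypothetical stand-in for a real gradient update.
dataset = [(x, 2 * x) for x in range(8)]   # 8 toy (input, target) pairs

steps_taken = 0

def train_step(example):
    global steps_taken
    steps_taken += 1           # placeholder for adjusting model weights

num_epochs = 3
for epoch in range(num_epochs):
    for example in dataset:    # visiting every example once = one epoch
        train_step(example)

print(steps_taken)             # 3 epochs x 8 examples = 24 update steps
```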
The generator of a generative model produces only a limited variety of outputs instead of covering the full data distribution.
Maximizing a reward signal without fulfilling the real goal the reward was meant to measure.
A gradient method using random minibatches for efficient training on large datasets.
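The definition above can be sketched in a few lines: instead of computing the gradient over the whole dataset, each step samples a small random minibatch. This toy example fits a one-parameter model y = w*x to noiseless data with true slope 3; the learning rate and batch size are illustrative choices, not tuned values.

```python
import random

# Sketch of minibatch SGD on a 1-parameter linear model y = w * x.
# Each step uses a random minibatch instead of the full dataset.
random.seed(0)
data = [(x, 3.0 * x) for x in range(1, 21)]   # true slope w = 3

w = 0.0
lr = 0.001
batch_size = 4
for step in range(500):
    batch = random.sample(data, batch_size)    # random minibatch
    # gradient of mean squared error wrt w: mean of 2 * (w*x - y) * x
    grad = sum(2 * (w * x - y) * x for x, y in batch) / batch_size
    w -= lr * grad

print(w)   # converges near the true slope 3.0
```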
Incrementally deploying new models to reduce risk.
Gradients shrink through layers, slowing learning in early layers; mitigated by ReLU, residuals, normalization.
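A small numeric illustration of why this happens with saturating activations: backpropagation multiplies one derivative factor per layer, and the sigmoid's derivative never exceeds 0.25, so the product shrinks geometrically with depth (ReLU's derivative of 1 on its active region avoids this).

```python
import math

# Sketch: backprop multiplies one activation-derivative factor per layer.
# sigmoid'(z) <= 0.25, so the product decays geometrically with depth.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_deriv(z):
    s = sigmoid(z)
    return s * (1.0 - s)

z = 0.0   # the point where sigmoid' is largest (0.25)
for depth in (1, 5, 10, 20):
    print(depth, sigmoid_deriv(z) ** depth)
# even in this best case, 20 layers shrink the gradient below 1e-12
```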
Competitive advantage from proprietary models/data.
Running predictions over large datasets on a schedule, rather than serving them one at a time on demand.
Processes and controls for data quality, access, lineage, retention, and compliance across the AI lifecycle.
Hidden behavior activated by specific triggers, causing targeted mispredictions or undesired outputs.
Generating human-like speech from text.
Models whose weights are publicly available.
Asking a model to review and improve its own output.
AI systems that perceive and act in the physical world through sensors and actuators.
Spontaneous emergence of shared conventions among interacting agents.
Artificially created data used to train/test models; helpful for privacy and coverage, risky if unrealistic.
Protecting data during network transfer and while stored; essential for ML pipelines handling sensitive data.
When information from evaluation data improperly influences training, inflating reported performance.
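One common concrete instance of this: computing preprocessing statistics (here, a simple mean for centering) over the full dataset before splitting, which lets test-set values influence what the model trains on. The data values are purely illustrative.

```python
# Sketch of a classic leakage pattern: preprocessing statistics computed
# on train + test let evaluation data influence training.
data = [1.0, 2.0, 3.0, 4.0, 100.0]    # the last value is a test-set outlier
train, test = data[:4], data[4:]

leaky_mean = sum(data) / len(data)     # WRONG: uses the test split too
clean_mean = sum(train) / len(train)   # correct: training split only

print(leaky_mean, clean_mean)  # the outlier in test shifted the leaky mean
```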
Human or automated process of assigning targets; quality, consistency, and guidelines matter heavily.
Maliciously inserting or altering training data to implant backdoors or degrade performance.
When a model fits noise/idiosyncrasies of training data and performs poorly on unseen data.
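An extreme toy version of this failure: a "model" that memorizes its training pairs achieves zero training error, noise included, yet is useless on unseen inputs, while the simple underlying rule generalizes. The data and models here are illustrative.

```python
import random

# Sketch: memorization fits training noise perfectly but fails to generalize.
random.seed(1)
train = [(x, x + random.gauss(0, 0.1)) for x in range(10)]  # y = x + noise
test = [(x, x) for x in range(10, 15)]                      # unseen inputs

memorizer = dict(train)                # stores every training pair verbatim
def memorizer_predict(x):
    return memorizer.get(x, 0.0)       # clueless on anything unseen

def simple_rule(x):                    # the true underlying pattern
    return x

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

print(mse(memorizer_predict, train))   # 0: perfect fit, noise and all
print(mse(simple_rule, test))          # 0: the simple rule generalizes
print(mse(memorizer_predict, test))    # large: memorization does not
```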
Learning from data by constructing “pseudo-labels” (e.g., next-token prediction, masked modeling) without manual annotation.
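The next-token case mentioned above is easy to make concrete: the pseudo-labels are just each token paired with the tokens before it, so raw text yields supervised training pairs with no human annotation.

```python
# Sketch: next-token prediction turns raw text into (context, target) pairs.
# The "label" for each position is simply the following token.
tokens = "the cat sat on the mat".split()

pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for context, target in pairs:
    print(context, "->", target)
# first pair: ['the'] -> cat, then ['the', 'cat'] -> sat, and so on
```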
Empirical laws linking model size, data, compute to performance.
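A sketch of the functional form these laws typically take: loss falls as a power law in scale (here, parameter count) toward an irreducible floor. The constants below are purely illustrative, not fitted values from any published study.

```python
# Sketch of a power-law scaling curve: L(N) = a * N^-alpha + L0,
# where N is model size and L0 is an irreducible loss floor.
# All constants here are illustrative, not fitted from real data.
def loss(n_params, a=2.0, alpha=0.08, l0=1.7):
    return a * n_params ** -alpha + l0

for n in (1e6, 1e8, 1e10):
    print(int(n), loss(n))
# loss decreases smoothly and predictably as model size grows
```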
Generative model that learns to reverse a gradual noise process.
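The "gradual noise process" can be sketched for a single scalar: each forward step slightly attenuates the signal and mixes in Gaussian noise, and after many steps the original value is essentially destroyed. The generative model is trained to reverse these steps; the noise schedule here is a single illustrative constant.

```python
import math
import random

# Sketch of the forward (noising) process of a diffusion model:
#   x_t = sqrt(1 - beta) * x_{t-1} + sqrt(beta) * noise
# repeated for many steps; beta is an illustrative constant schedule.
random.seed(0)
beta = 0.05
x = 1.0                          # a single scalar "data point"
for t in range(200):
    noise = random.gauss(0.0, 1.0)
    x = math.sqrt(1 - beta) * x + math.sqrt(beta) * noise

# The surviving fraction of the original signal is sqrt(1-beta)^200,
# which is under 1%: x is now essentially pure Gaussian noise.
print(x)
```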
A formal privacy framework ensuring outputs do not reveal much about any single individual’s data contribution.
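One standard mechanism satisfying this framework is worth sketching: for a count query (whose answer changes by at most 1 when any one person is added or removed), adding Laplace noise with scale sensitivity/epsilon yields epsilon-differential privacy. The function names below are illustrative.

```python
import math
import random

# Sketch of the Laplace mechanism for epsilon-differential privacy.
random.seed(0)

def laplace_noise(scale):
    # Inverse-CDF sampling of a Laplace(0, scale) variable
    u = random.random() - 0.5
    sign = 1 if u >= 0 else -1
    return -scale * sign * math.log(1 - 2 * abs(u))

def private_count(true_count, epsilon, sensitivity=1.0):
    # A count changes by at most 1 per individual, so sensitivity = 1.
    # Smaller epsilon => larger noise scale => stronger privacy.
    return true_count + laplace_noise(sensitivity / epsilon)

print(private_count(100, epsilon=1.0))  # noisy answer near 100
```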
Lets a model invoke external computation or lookup, e.g., a calculator, code interpreter, or search.
The learned numeric values of a model adjusted during training to minimize a loss function.
How well a model performs on new data drawn from the same (or similar) distribution as training.
Selecting the most informative samples to label (e.g., uncertainty sampling) to reduce labeling cost.
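The uncertainty-sampling variant named above can be sketched directly: from an unlabeled pool, pick the examples whose predicted probability sits closest to 0.5, where the model is least sure. The pool and its probabilities are a hypothetical stand-in for a real model's predictions.

```python
# Sketch of uncertainty sampling: label the examples whose predicted
# probability is nearest 0.5. The probabilities below are hypothetical
# model outputs P(class = 1) for five unlabeled examples.
pool = {"a": 0.95, "b": 0.52, "c": 0.10, "d": 0.45, "e": 0.70}

def uncertainty(p):
    return 1.0 - 2.0 * abs(p - 0.5)    # 1.0 = maximally unsure, 0.0 = certain

# Send the two most uncertain examples to annotators first.
to_label = sorted(pool, key=lambda k: uncertainty(pool[k]), reverse=True)[:2]
print(to_label)   # the examples closest to p = 0.5
```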