Autoregressive generation: Generates a sequence one token at a time, conditioning each prediction on the tokens produced so far.
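The loop can be sketched with a toy, hypothetical bigram "model" standing in for a real network; only the one-token-at-a-time, condition-on-the-past structure is the point here.

```python
import random

# Hypothetical bigram "model": next-token probabilities given the previous token.
BIGRAMS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.5, "dog": 0.5},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def generate(max_len=10, seed=0):
    """Sample one token at a time, conditioning on the most recent token."""
    rng = random.Random(seed)
    tokens = ["<s>"]
    for _ in range(max_len):
        dist = BIGRAMS[tokens[-1]]
        choices, probs = zip(*dist.items())
        tokens.append(rng.choices(choices, weights=probs)[0])
        if tokens[-1] == "</s>":
            break
    return tokens
```

A real language model conditions on the full prefix rather than one token, but the sampling loop is the same shape.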
Language model: A model that assigns probabilities to sequences of tokens; typically trained by next-token prediction.
Masked language model: Predicts masked tokens in a sequence using bidirectional context; often used for embeddings rather than generation.
Hallucination: Model-generated content that is fluent but incorrect or unsupported by evidence; mitigated by grounding and verification.
Reinforcement learning from human feedback (RLHF): Uses human preference data to train a reward model, then optimizes the policy against it.
Guardrails: Rules and controls around generation (filters, validators, structured outputs) that reduce unsafe or invalid behavior.
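A minimal validator-style guardrail can be sketched as follows; the required keys are hypothetical, and real systems would layer schema checks, content filters, and retries on top of this.

```python
import json

def validate_output(text, required_keys=("answer", "confidence")):
    """Minimal guardrail sketch: accept model output only if it parses as
    JSON and contains the expected keys; return None to signal rejection."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not all(k in obj for k in required_keys):
        return None
    return obj
```

A rejected output would typically trigger a retry or a fallback response rather than being shown to the user.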
Bias: Systematic differences in model outcomes across groups; arises from training data, labels, and deployment context.
Labeling (annotation): The human or automated process of assigning targets to examples; quality, consistency, and clear guidelines matter heavily.
Inter-annotator agreement: A measure of consistency across labelers; low agreement indicates ambiguous tasks or poor guidelines.
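One standard chance-corrected agreement statistic is Cohen's kappa for two labelers; a minimal sketch (ignoring the degenerate case where expected agreement is 1):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items where both labelers agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each labeler assigned labels independently
    # according to their own marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a)
    return (p_o - p_e) / (1 - p_e)
```

Kappa near 1 indicates strong agreement; values near 0 mean agreement is no better than chance.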
Model card: Standardized documentation describing a model's intended use, performance, limitations, training data, and ethical considerations.
MLOps: Practices for operationalizing ML: versioning, CI/CD, monitoring, retraining, and reliable production management.
Softmax: Converts logits to probabilities by exponentiation and normalization; ubiquitous in classification and language models.
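In code, the transformation is a few lines; subtracting the maximum logit before exponentiating is the standard trick to avoid overflow and does not change the result:

```python
import math

def softmax(logits):
    """Exponentiate and normalize; subtracting max(logits) keeps exp() stable."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The output is a valid probability distribution: non-negative entries summing to 1, with larger logits mapped to larger probabilities.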
Evaluation harness: A system for running consistent evaluations across tasks, model versions, prompts, and settings.
Backdoor attack: Hidden behavior activated by specific triggers, causing targeted mispredictions or undesired outputs.
Human-in-the-loop: System design in which humans validate or guide model outputs, especially for high-stakes decisions.
Multimodal models: Models that process or generate multiple modalities, enabling vision-language tasks, speech, video understanding, and more.
Computer vision: AI focused on interpreting images and video: classification, detection, segmentation, tracking, and 3D understanding.
Segmentation: Assigning labels per pixel (semantic segmentation) or per object instance (instance segmentation) to map object boundaries.
PAC learning: A hypothesis class is PAC-learnable if, with high probability, a learner can find an approximately correct hypothesis from finitely many samples.
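For a finite hypothesis class $H$ in the realizable setting, a standard textbook sample-complexity bound makes "approximately correct with high probability" concrete:

```latex
m \;\ge\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)
```

With this many i.i.d. samples, a consistent learner outputs, with probability at least $1-\delta$, a hypothesis whose error is at most $\epsilon$.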
Speech recognition: Converting spoken audio into text, often with encoder-decoder or transducer architectures.
Information gain: The reduction in uncertainty (entropy) achieved by observing a variable; used in decision trees and active learning.
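A minimal sketch of the decision-tree usage: entropy of the labels before a split, minus the size-weighted entropy of the groups the split produces.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy of `labels` minus the size-weighted entropy after splitting
    into `groups` (e.g. by a candidate decision-tree test)."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - remainder
```

A split that perfectly separates the classes yields gain equal to the original entropy; an uninformative split yields gain near zero.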
Computational learning theory: A theoretical framework analyzing which classes of functions can be learned, how efficiently, and with what guarantees.
Cross-entropy: Measures the divergence between true and predicted probability distributions; the standard loss for classification and language modeling.
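For discrete distributions the quantity is $H(p, q) = -\sum_i p_i \log q_i$; a minimal sketch, with a small epsilon guarding against $\log 0$:

```python
import math

def cross_entropy(p_true, q_pred, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i); eps avoids log(0) for zero predictions."""
    return -sum(p * math.log(q + eps) for p, q in zip(p_true, q_pred))
```

With a one-hot true distribution this reduces to the negative log-probability the model assigns to the correct class, which is why it doubles as the next-token-prediction loss.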
Gradient noise: Variability in gradient estimates introduced by minibatch sampling during SGD.
Sharp minimum: A narrow minimum of the loss surface, often associated with poorer generalization than flat minima.
Highway network: An early deep architecture using learned gates to control skip connections.
Inductive bias: Built-in assumptions that guide a learner toward certain solutions, shaping efficiency and generalization.
Positional encoding: Encodes token position explicitly, often via sinusoids, so that order information reaches an otherwise order-agnostic attention mechanism.
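The sinusoidal scheme from the original Transformer paper alternates sines and cosines at geometrically spaced frequencies; a minimal per-position sketch:

```python
import math

def sinusoidal_position(pos, d_model):
    """Sinusoidal positional encoding: even dims get sin, odd dims get cos,
    at frequencies 1 / 10000^(2i / d_model) for dimension pair i."""
    enc = []
    for i in range(d_model):
        freq = 1.0 / (10000 ** (2 * (i // 2) / d_model))
        enc.append(math.sin(pos * freq) if i % 2 == 0 else math.cos(pos * freq))
    return enc
```

Because each frequency is fixed, nearby positions get similar encodings and relative offsets correspond to phase shifts, which is what lets attention recover order.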
Watermarking: Embedding detectable signals in a model or its outputs to prove ownership.
Tool use (function calling): Models trained to decide when and how to call external tools.