Results for "modeling mismatch"
Differences between training and inference conditions.
A mismatch between training and deployment data distributions that can degrade model performance.
Ensuring AI systems pursue intended human goals.
A mismatch between the environment used during training and the environment encountered at test time.
Modeling how an environment evolves over time within a learned latent space, rather than in raw observation space.
Penalizes confident wrong predictions heavily; standard for classification and language modeling.
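The entry above can be illustrated with a minimal plain-Python sketch (a hypothetical two-class example; `cross_entropy` and the probability vectors are illustrative, not from any library):

```python
import math

def cross_entropy(probs, true_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[true_index])

# Confidently wrong: 0.01 on the true class -> large loss.
confident_wrong = cross_entropy([0.01, 0.99], 0)
# Mildly wrong: 0.4 on the true class -> moderate loss.
mildly_wrong = cross_entropy([0.4, 0.6], 0)
# Confidently right: 0.99 on the true class -> near-zero loss.
confident_right = cross_entropy([0.99, 0.01], 0)
```

Because the loss is the negative log of the probability on the true class, it grows without bound as that probability approaches zero, which is what makes confident wrong predictions so costly.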
Networks with recurrent connections for sequences; largely supplanted by Transformers for many tasks.
Converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.
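A toy sketch of the entry above, using greedy longest-match over a small hand-made vocabulary (real subword tokenizers such as BPE or WordPiece learn their vocabularies and merge rules from data; this is only the matching step):

```python
def tokenize(text, vocab):
    """Greedy longest-match subword tokenization over a fixed vocabulary.
    Single characters are always accepted, so any input is covered."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest span starting at i first; fall back to one char.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab or len(piece) == 1:
                tokens.append(piece)
                i = j
                break
    return tokens

vocab = {"model", "ing", "mis", "match", " "}
pieces = tokenize("modeling mismatch", vocab)
```

The character fallback is the "coverage" half of the trade-off in the definition: a small vocabulary still tokenizes every string, at the cost of longer token sequences.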
An RNN variant using gates to mitigate vanishing gradients and capture longer context.
Training objective where the model predicts the next token given previous tokens (causal modeling).
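The objective in the entry above reduces to an average of per-position losses; a sketch with a hypothetical `prob_fn` standing in for a trained model:

```python
import math

def next_token_loss(token_ids, prob_fn):
    """Average negative log-likelihood of each token given its prefix."""
    losses = []
    for t in range(1, len(token_ids)):
        prefix, target = token_ids[:t], token_ids[t]
        losses.append(-math.log(prob_fn(prefix, target)))
    return sum(losses) / len(losses)

# Hypothetical model: uniform over a 4-token vocabulary.
uniform = lambda prefix, target: 0.25
loss = next_token_loss([0, 1, 2, 3], uniform)
```

A uniform model over V tokens scores log V per position, which is the baseline any trained language model must beat.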
Models how a system evolves over time by maintaining hidden states that summarize past observations.
Convolutional neural networks applied along the time axis of sequential data, often with causal convolutions so outputs depend only on past inputs.
Modeling an agent's interactions with its environment, typically in terms of states, actions, and feedback signals.
Differences between simulated and real-world physics that cause behavior learned in simulation to degrade when transferred to reality.
Learning from data by constructing “pseudo-labels” (e.g., next-token prediction, masked modeling) without manual annotation.
Learning structure from unlabeled data, such as discovering groups, compressing representations, or modeling data distributions.
The degree to which predicted probabilities match true frequencies (e.g., 0.8 means ~80% correct).
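The entry above can be quantified with a binned estimator; this sketch follows the common expected-calibration-error recipe (bin count and binning scheme are conventional choices, not fixed by the definition):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence, then average |accuracy - confidence|
    per bin, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / total) * abs(avg_conf - accuracy)
    return ece
```

A model that says 0.8 and is right 80% of the time scores zero; a model that says 0.9 but is right half the time accumulates a gap of 0.4.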
Iterative method that updates parameters in the direction of negative gradient to minimize loss.
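A minimal one-dimensional sketch of the update rule in the entry above (the learning rate and step count are arbitrary illustrative choices):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step in the direction of the negative gradient."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

On this convex quadratic the iterate contracts toward the minimizer x = 3 at a fixed geometric rate; in deep learning the same update is applied to millions of parameters with gradients from backpropagation.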
A parameterized function composed of interconnected units organized in layers with nonlinear activations.
Attention where queries/keys/values come from the same sequence, enabling token-to-token interactions.
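A stripped-down sketch of the entry above, using identity projections (real attention layers apply learned query/key/value weight matrices; those are omitted here to show just the score-softmax-mix pattern):

```python
import math

def self_attention(X):
    """Single-head self-attention over a list of vectors: each position
    scores every position (scaled dot product), softmaxes the scores,
    and returns the weighted mix of values."""
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in X]
        m = max(scores)  # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        Z = sum(exps)
        weights = [e / Z for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, X))
                    for j in range(d)])
    return out
```

Each output row is a convex combination of the input rows, which is how every token can draw information from every other token in one layer.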
Injects sequence order into Transformers, since attention alone is permutation-invariant.
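One concrete scheme for the entry above is the sinusoidal encoding from the original Transformer paper; a sketch (assumes an even model dimension):

```python
import math

def sinusoidal_positions(seq_len, d_model):
    """Each position maps to sines and cosines at geometrically spaced
    frequencies, giving every position a distinct, order-aware vector
    that is added to the token embeddings."""
    enc = []
    for pos in range(seq_len):
        row = []
        for i in range(0, d_model, 2):
            freq = pos / (10000 ** (i / d_model))
            row.append(math.sin(freq))
            row.append(math.cos(freq))
        enc.append(row)
    return enc

enc = sinusoidal_positions(4, 8)
```

Without some such injection, a pure attention stack would produce the same outputs for any permutation of its input tokens.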
Generates sequences one token at a time, conditioning on past tokens.
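The loop in the entry above, sketched with greedy decoding and a hypothetical toy next-token distribution (`next_probs` stands in for a real model; sampling strategies like temperature or top-k would replace the `max`):

```python
def generate(next_probs, prompt, max_new_tokens):
    """Greedy autoregressive decoding: repeatedly append the most likely
    next token given everything generated so far."""
    seq = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_probs(seq)  # dict: token -> probability
        seq.append(max(probs, key=probs.get))
    return seq

# Hypothetical toy model that prefers alternating "a" and "b".
toy = lambda seq: ({"a": 0.9, "b": 0.1} if seq[-1] == "b"
                   else {"a": 0.1, "b": 0.9})
out = generate(toy, ["a"], 3)
```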
A model that assigns probabilities to sequences of tokens; often trained by next-token prediction.
Stepwise reasoning patterns that can improve multi-step tasks; often handled implicitly or summarized for safety/privacy.
Framework for reasoning about cause-effect relationships beyond correlation, often using structural assumptions and experiments.
Tracking where data came from and how it was transformed; key for debugging and compliance.
Generating speech audio from text, with control over prosody, speaker identity, and style.
Updating beliefs about parameters using observed evidence and prior distributions.
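The simplest closed-form case of the entry above is the conjugate Beta-binomial update; a sketch (the Beta(1, 1) starting prior below is just the uniform choice for illustration):

```python
def beta_update(alpha, beta, successes, failures):
    """Conjugate Bayesian update: a Beta(alpha, beta) prior over a success
    probability, combined with binomial evidence, is again a Beta."""
    return alpha + successes, beta + failures

def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Uniform prior, then observe 7 successes and 3 failures.
a, b = beta_update(1, 1, 7, 3)
```

The posterior mean 8/12 sits between the prior mean (1/2) and the empirical rate (7/10), with the data pulling harder as more evidence accumulates.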
Using the same parameters across different parts of a model, e.g., a convolution filter reused at every spatial position or embedding weights tied between input and output layers.
A mask that prevents attention to future tokens during training and inference, preserving the model's autoregressive (left-to-right) structure.
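The mask in the entry above is lower-triangular; a minimal sketch (frameworks typically express the blocked positions as -inf added to attention scores, but the allowed/blocked pattern is the same):

```python
def causal_mask(seq_len):
    """Lower-triangular mask: position i may attend to positions j <= i.
    True marks allowed attention; future positions are blocked."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

mask = causal_mask(3)
```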