Results for "autoregressive training"
Adam: Popular optimizer combining momentum and per-parameter adaptive step sizes via first/second moment estimates.
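The moment estimates behind Adam can be sketched in a few lines of NumPy; `adam_step` is an illustrative name, not a library API, and the hyperparameter defaults follow common convention:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: momentum (m) plus per-parameter scaling (v)."""
    m = b1 * m + (1 - b1) * grad        # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - b1 ** t)           # bias correction for zero init
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# toy usage: minimize f(x) = x^2, whose gradient is 2x
theta, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for t in range(1, 501):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
```

The bias correction matters early in training, when the zero-initialized moments underestimate the true averages.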
Self-attention: Attention where queries, keys, and values come from the same sequence, enabling token-to-token interactions.
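A minimal single-head self-attention sketch in NumPy, assuming learned projection matrices `Wq`, `Wk`, `Wv` (names are illustrative):

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention: Q, K, V all derive from the same sequence x."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)       # token-to-token similarity, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                  # each output token mixes all tokens

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))             # 4 tokens, embedding dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
```

The 1/sqrt(d) scaling keeps the dot products from saturating the softmax as the head dimension grows.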
Activation functions: Nonlinear functions enabling networks to approximate complex mappings; ReLU variants dominate modern deep learning.
Tokenization: Converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.
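To illustrate the subword idea, here is a sketch of one merge step of byte-pair encoding (BPE), one common subword scheme; the corpus format (space-separated symbols with word frequencies) is an assumption for illustration:

```python
from collections import Counter

def bpe_merge_step(words):
    """One BPE merge: find the most frequent adjacent symbol pair and fuse it."""
    pairs = Counter()
    for word, freq in words.items():
        syms = word.split()
        for a, b in zip(syms, syms[1:]):
            pairs[(a, b)] += freq
    if not pairs:
        return words, None
    best = max(pairs, key=pairs.get)
    merged = {}
    for word, freq in words.items():
        syms = word.split()
        out, i = [], 0
        while i < len(syms):
            if i + 1 < len(syms) and (syms[i], syms[i + 1]) == best:
                out.append(syms[i] + syms[i + 1])  # fuse the winning pair
                i += 2
            else:
                out.append(syms[i])
                i += 1
        merged[" ".join(out)] = freq
    return merged, best

corpus = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6}
corpus, pair = bpe_merge_step(corpus)
```

Repeating this step grows a vocabulary of frequent subwords, which is how BPE trades vocabulary size against coverage of rare strings.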
LSTM: An RNN variant using gates to mitigate vanishing gradients and capture longer context.
Vocabulary: The set of tokens a model can represent; affects efficiency, multilinguality, and handling of rare strings.
Hallucination: Model-generated content that is fluent but unsupported by evidence or incorrect; mitigated by grounding and verification.
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
Guardrails: Rules and controls around generation (filters, validators, structured outputs) that reduce unsafe or invalid behavior.
Bias: Systematic differences in model outcomes across groups; arises from data, labels, and deployment context.
Data labeling: Human or automated process of assigning targets; quality, consistency, and guidelines matter heavily.
Inter-annotator agreement: Measure of consistency across labelers; low agreement indicates ambiguous tasks or poor guidelines.
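Cohen's kappa is one standard agreement measure for two labelers, correcting raw agreement for what chance alone would produce; a minimal sketch (function name illustrative):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Observed agreement between two labelers, corrected for chance agreement."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # chance agreement: probability both raters pick the same label independently
    chance = sum((ca[lbl] / n) * (cb[lbl] / n) for lbl in set(ca) | set(cb))
    return (observed - chance) / (1 - chance)

rater1 = ["pos", "pos", "neg", "neg", "pos"]
rater2 = ["pos", "neg", "neg", "neg", "pos"]
kappa = cohens_kappa(rater1, rater2)
```

Kappa is 1 for perfect agreement and near 0 when agreement is no better than chance, which is why it is preferred over raw percent agreement.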
Model card: Standardized documentation describing intended use, performance, limitations, data, and ethical considerations.
MLOps: Practices for operationalizing machine learning, including versioning, CI/CD, monitoring, retraining, and reliable production management.
Softmax: Converts logits to probabilities by exponentiation and normalization; common in classification and language models.
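A minimal softmax sketch; subtracting the max before exponentiating is a standard numerical-stability trick, not part of the mathematical definition:

```python
import numpy as np

def softmax(logits):
    """Exponentiate and normalize; subtracting the max avoids overflow."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))  # largest logit -> largest probability
```

The outputs sum to 1 and preserve the ordering of the logits.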
Evaluation harness: System for running consistent evaluations across tasks, versions, prompts, and model settings.
Backdoor attack: Hidden behavior activated by specific triggers, causing targeted mispredictions or undesired outputs.
Human-in-the-loop: System design where humans validate or guide model outputs, especially for high-stakes decisions.
Multimodal models: Models that process or generate multiple modalities, enabling vision-language tasks, speech, and video understanding.
Computer vision: AI focused on interpreting images and video, spanning classification, detection, segmentation, tracking, and 3D understanding.
Image segmentation: Assigning labels per pixel (semantic segmentation) or per object instance (instance segmentation) to delineate object boundaries.
PAC learning: A concept class is PAC-learnable if, with high probability, a learner can output an approximately correct hypothesis from a polynomial number of samples.
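Stated formally, as a sketch of the standard definition (the symbols below are assumed for illustration, not taken from this entry):

```latex
% A hypothesis class H is PAC-learnable if there is an algorithm A such that,
% for every distribution D and every eps, delta in (0,1), given
% m >= poly(1/eps, 1/delta) i.i.d. samples S ~ D^m, the output A(S) satisfies:
\Pr_{S \sim D^{m}}\bigl[\operatorname{err}_{D}(A(S)) \le \varepsilon\bigr] \ge 1 - \delta .
```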
Speech recognition (ASR): Converting spoken audio into text, often using encoder-decoder or transducer architectures.
Information gain: Reduction in uncertainty (entropy) achieved by observing a variable; used in decision trees and active learning.
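A minimal information-gain sketch for a decision-tree split, using Shannon entropy in bits (function names are illustrative):

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, groups):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    n = len(parent)
    return entropy(parent) - sum(len(g) / n * entropy(g) for g in groups)

labels = ["a", "a", "b", "b"]
gain = information_gain(labels, [["a", "a"], ["b", "b"]])  # a perfect split
```

A split that produces pure child groups recovers the full parent entropy as gain; a split that changes nothing yields zero.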
Computational learning theory: A theoretical framework analyzing which classes of functions can be learned, how efficiently, and with what guarantees.
Cross-entropy: Measures the mismatch between true and predicted probability distributions; the standard loss for classification and language models.
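Assuming this entry refers to the cross-entropy loss, a minimal sketch; the small `eps` guards against log(0) and is an implementation convenience, not part of the definition:

```python
import numpy as np

def cross_entropy(p_true, q_pred, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i); large when confidently wrong."""
    return -np.sum(p_true * np.log(q_pred + eps))

target = np.array([0.0, 1.0, 0.0])  # one-hot true distribution
good = np.array([0.1, 0.8, 0.1])    # most mass on the correct class
bad = np.array([0.8, 0.1, 0.1])     # most mass on a wrong class
```

The confident wrong prediction incurs a much larger loss than the mostly correct one, which is what makes cross-entropy a useful training signal.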
Gradient noise: Variability in gradient estimates introduced by minibatch sampling during SGD.
Sharp minimum: A narrow minimum in the loss landscape, often associated with poorer generalization.
Highway network: Early deep architecture using learned gates to control skip connections.
Inductive bias: Built-in assumptions that guide learning efficiency and generalization.