Online learning: learning where data arrives sequentially and the model updates continuously, often under changing distributions.
Multi-task learning: training one model on multiple tasks simultaneously to improve generalization through shared structure.
Distribution shift: a mismatch between training and deployment data distributions that can degrade model performance.
Dataset: a structured collection of examples used to train and evaluate models; quality, bias, and coverage often dominate outcomes.
Latent space: the internal space where learned representations live; operations here often correlate with semantics or generative factors.
Dropout: randomly zeroing activations during training to reduce co-adaptation and overfitting.
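The dropout entry above can be sketched in a few lines; a minimal inverted-dropout implementation in NumPy (function name, shapes, and the fixed seed are illustrative, not any library's API):

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each activation with probability p,
    scale survivors by 1/(1-p) so the expected activation is unchanged."""
    if not training or p == 0.0:
        return x
    if rng is None:
        rng = np.random.default_rng(0)  # fixed seed only for reproducibility here
    mask = rng.random(x.shape) >= p     # True where the unit survives
    return x * mask / (1.0 - p)

acts = np.ones((4, 8))
out = dropout(acts, p=0.5)  # surviving units are scaled to 2.0, the rest are 0.0
```

At inference time (`training=False`) the function is the identity, which is why the scaling is done during training rather than at test time.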
Recurrent neural networks (RNNs): networks with recurrent connections for processing sequences; largely supplanted by Transformers for many tasks.
Tokenization: converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.
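A toy greedy longest-match tokenizer illustrates the subword idea from the entry above (the vocabulary and function name are made up for illustration; real subword tokenizers such as BPE or WordPiece learn their vocabularies from data):

```python
def subword_tokenize(text, vocab):
    """Greedy longest-match subword tokenization over a fixed vocabulary."""
    tokens = []
    for word in text.split():
        i = 0
        while i < len(word):
            # Try the longest remaining substring first, shrinking until a match.
            for j in range(len(word), i, -1):
                piece = word[i:j]
                if piece in vocab:
                    tokens.append(piece)
                    i = j
                    break
            else:
                tokens.append("<unk>")  # no piece matched: fall back to unknown
                i += 1
    return tokens

vocab = {"un", "believ", "able", "token", "ize"}
pieces = subword_tokenize("unbelievable tokenize", vocab)
# → ["un", "believ", "able", "token", "ize"]
```

A larger vocabulary yields fewer tokens per word but a bigger embedding table; this trade-off is the "vocabulary size vs. coverage" balance the entry mentions.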
Masked language model: predicts masked tokens in a sequence, enabling bidirectional context; often used for embeddings rather than generation.
Context window: the maximum number of tokens the model can attend to in one forward pass; constrains long-document reasoning.
Prompt: the text (and possibly other modalities) given to an LLM to condition its output behavior.
System prompt: a high-priority instruction layer setting overarching behavior constraints for a chat model.
Few-shot prompting: achieving task performance by providing a small number of examples inside the prompt, without weight updates.
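A few-shot prompt is ordinary text assembled from worked examples; a minimal sketch of the idea (the template, task, and function name are illustrative, not any particular provider's API):

```python
def few_shot_prompt(task, examples, query):
    """Assemble a few-shot prompt: task description, worked examples, then the query."""
    lines = [task, ""]
    for inp, out in examples:
        lines += [f"Input: {inp}", f"Output: {out}", ""]
    lines += [f"Input: {query}", "Output:"]  # the model completes after "Output:"
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Classify sentiment as positive or negative.",
    [("great movie", "positive"), ("waste of time", "negative")],
    "loved every minute",
)
```

The model's weights never change; the examples condition its next-token predictions, which is why this is also called in-context learning.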
Tool use: letting an LLM call external functions/APIs to fetch data, compute, or take actions, improving reliability.
Interpretability: techniques to understand model decisions (global or local); important in high-stakes and regulated settings.
Logits: raw model outputs before conversion to probabilities; manipulated during decoding and calibration.
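Logits become probabilities via softmax, and decoding knobs such as temperature act on the logits before that conversion; a minimal NumPy sketch:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution.
    Lower temperature sharpens the distribution; higher flattens it."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

probs = softmax([2.0, 1.0, 0.0])  # ordering of logits is preserved
```

Temperature scaling is also the standard post-hoc calibration trick the entry alludes to: the same division by a fitted constant, applied at evaluation time.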
Model monitoring: observing model inputs/outputs, latency, cost, and quality over time to catch regressions and drift.
Human-in-the-loop: system design where humans validate or guide model outputs, especially for high-stakes decisions.
Text-to-speech: generating speech audio from text, with control over prosody, speaker identity, and style.
Residual connections: skip connections that allow gradients to bypass layers, enabling very deep networks.
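The identity path of a residual connection is one line of code; a minimal sketch (the lambda stands in for any layer):

```python
import numpy as np

def residual_block(x, f):
    """Residual connection: output = x + f(x).
    The identity term gives gradients a direct path around f."""
    return x + f(x)

x = np.array([1.0, 2.0, 3.0])
# Even if the layer outputs zeros (e.g. at initialization), the input passes through.
out = residual_block(x, lambda v: np.zeros_like(v))  # → [1.0, 2.0, 3.0]
```

Because the block computes x + f(x), the layer f only needs to learn a correction to the identity, which is what makes networks hundreds of layers deep trainable.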
Highway networks: an early architecture using learned gates to control skip connections.
Weight sharing: using the same parameters across different parts of a model.
Model capacity: the range of functions a model can represent.
Efficient attention: techniques to handle longer documents without the quadratic cost of full self-attention.
Attribute inference: inferring sensitive features of training data from a trained model.
Energy-based models: models that define an energy landscape rather than explicit probabilities.
Generative models: models that learn to generate samples resembling the training data.
Denoising diffusion model: a diffusion model trained to remove noise step by step.
Latent diffusion: diffusion performed in a learned latent space for efficiency.
Variational autoencoder (VAE): an autoencoder using probabilistic latent variables and KL regularization.
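The KL regularizer mentioned in the VAE entry has a closed form when the posterior is a diagonal Gaussian and the prior is standard normal; a NumPy sketch (function and variable names are illustrative):

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), the per-example VAE regularizer:
    0.5 * sum( exp(logvar) + mu^2 - 1 - logvar ) over latent dimensions."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

kl = gaussian_kl(np.zeros(4), np.zeros(4))  # posterior equals prior, so KL = 0
```

During training this term is added to the reconstruction loss, pulling each latent posterior toward the prior so the latent space stays smooth enough to sample from.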