Results for "tokens set"
Samples from the smallest set of tokens whose probabilities sum to p, adapting set size by context.
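This nucleus (top-p) selection can be sketched in a few lines; the function below is a toy illustration over a token→probability dict (the dict-based interface is an assumption for readability, real decoders operate on logit tensors):

```python
import random

def top_p_sample(probs, p=0.9, rng=random):
    """Sample from the smallest set of tokens whose probabilities sum to >= p.

    `probs` maps token -> probability (assumed to sum to ~1).
    Toy sketch of nucleus sampling, not a production decoder.
    """
    # Sort by descending probability and take the minimal prefix whose
    # cumulative mass reaches p -- the "nucleus" adapts to the context:
    # a peaked distribution yields a small set, a flat one a large set.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cum = [], 0.0
    for tok, pr in ranked:
        nucleus.append((tok, pr))
        cum += pr
        if cum >= p:
            break
    # Renormalize within the nucleus and draw one token.
    total = sum(pr for _, pr in nucleus)
    r, acc = rng.random() * total, 0.0
    for tok, pr in nucleus:
        acc += pr
        if r <= acc:
            return tok
    return nucleus[-1][0]
```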
Predicts masked tokens in a sequence, enabling bidirectional context; often used for embeddings rather than generation.
Detecting unauthorized model outputs or data leaks, e.g., by watermarking generated text or monitoring outputs for regurgitated training data.
Samples from the k highest-probability tokens to limit unlikely outputs.
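Top-k sampling differs from top-p only in how the candidate set is cut: a fixed count k rather than a probability mass. A minimal sketch under the same toy dict interface as assumed above:

```python
import random

def top_k_sample(probs, k=2, rng=random):
    """Sample among the k highest-probability tokens.

    `probs` maps token -> probability; illustrative sketch only.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(pr for _, pr in ranked)   # renormalize over the top k
    r, acc = rng.random() * total, 0.0
    for tok, pr in ranked:
        acc += pr
        if r <= acc:
            return tok
    return ranked[-1][0]
```

With k=1 this reduces to greedy decoding; larger k admits more diversity at the cost of occasionally picking unlikely tokens.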
A model that assigns probabilities to sequences of tokens; often trained by next-token prediction.
Prevents attention to future tokens during training/inference.
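The causal (look-ahead) mask is just a lower-triangular boolean matrix; a minimal sketch:

```python
def causal_mask(n):
    """Return an n x n mask where mask[i][j] is True iff position i may
    attend to position j (j <= i): no peeking at future tokens."""
    return [[j <= i for j in range(n)] for i in range(n)]
```

In practice the mask is applied by setting disallowed attention scores to -inf before the softmax, so they receive zero weight.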
The set of tokens a model can represent; impacts efficiency, multilinguality, and handling of rare strings.
Injects sequence order into Transformers, since attention alone is permutation-invariant.
Training objective where the model predicts the next token given previous tokens (causal modeling).
Generates sequences one token at a time, conditioning on past tokens.
Maximum number of tokens the model can attend to in one forward pass; constrains long-document reasoning.
How many requests or tokens can be processed per unit time; affects scalability and cost.
Encodes positional information via rotation in embedding space.
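Concretely, rotary position embeddings (RoPE) rotate each (even, odd) pair of embedding dimensions by an angle that grows with position, so relative offsets show up as phase differences in query–key dot products. A stdlib sketch (the `base` constant 10000 follows the common convention):

```python
import math

def rope(vec, pos, base=10000.0):
    """Apply a rotary position embedding to `vec` at position `pos`.

    Each (even, odd) dimension pair is rotated by pos / base**(i/d);
    lower pairs rotate fast, higher pairs slowly. Norm is preserved.
    """
    d = len(vec)
    out = list(vec)
    for i in range(0, d, 2):
        theta = pos / (base ** (i / d))
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s       # 2-D rotation of the pair
        out[i + 1] = x * s + y * c
    return out
```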
Encodes token position explicitly, often via sinusoids.
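The classic sinusoidal scheme assigns each position a vector of sines and cosines at geometrically spaced wavelengths; a compact sketch:

```python
import math

def sinusoidal_pe(pos, d_model, base=10000.0):
    """Sinusoidal positional encoding: even dims get sin, odd dims cos,
    with each sin/cos pair sharing one geometrically spaced frequency,
    so every position maps to a distinct, smoothly varying pattern."""
    pe = []
    for i in range(d_model):
        angle = pos / (base ** ((i // 2 * 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe
```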
Attention variants (e.g., sparse or linearized attention) that reduce the quadratic time and memory cost of full self-attention in sequence length.
Stochastic generation strategies that trade determinism for diversity; key knobs include temperature and nucleus sampling.
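The temperature knob can be shown directly: divide logits by T before the softmax, then sample. T < 1 sharpens the distribution (more deterministic), T > 1 flattens it (more diverse). A toy sketch over a token→logit dict (the interface is an assumption for readability):

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Softmax the logits at the given temperature and draw one token."""
    m = max(logits.values())                      # subtract max for stability
    weights = {t: math.exp((l - m) / temperature)
               for t, l in logits.items()}
    total = sum(weights.values())
    r, acc = rng.random() * total, 0.0
    for tok, w in weights.items():
        acc += w
        if r <= acc:
            return tok
    return tok
```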
Set of vectors closed under addition and scalar multiplication.
Mechanism that computes context-aware mixtures of representations; scales well and captures long-range dependencies.
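Scaled dot-product attention fits in a few lines: score each query against every key, softmax the scores, and mix the values by those weights. A minimal sketch over plain lists, with no batching, masking, or learned projections:

```python
import math

def attention(queries, keys, values):
    """For each query, return a softmax-weighted mixture of `values`,
    weighted by scaled query-key dot products."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)                      # stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        w = [x / z for x in w]
        outputs.append([sum(wi * v[j] for wi, v in zip(w, values))
                        for j in range(len(values[0]))])
    return outputs
```

Note the context-aware mixing: identical keys give uniform weights, so the output is simply the mean of the values.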
Converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.
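The subword idea can be illustrated with greedy longest-match segmentation against a fixed vocabulary; real tokenizers (BPE, WordPiece) learn that vocabulary from data, so this is only a sketch with a hand-picked `vocab`:

```python
def greedy_tokenize(text, vocab):
    """Split `text` into subword tokens by repeatedly taking the longest
    vocabulary entry that prefixes the remaining text; unknown
    characters fall back to single-character tokens."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):    # try longest match first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])           # no match: emit one char
            i += 1
    return tokens
```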
Techniques to handle longer documents without quadratic cost.
Limiting inference usage, typically by capping requests or tokens per client per unit time, to control cost and abuse.
Separating data into training (fit), validation (tune), and test (final estimate) to avoid leakage and optimism bias.
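The split itself is mechanical: shuffle once with a fixed seed, then carve out disjoint slices so no example appears in more than one split. A minimal sketch (fraction defaults are illustrative):

```python
import random

def split_dataset(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Partition `items` into disjoint train/validation/test splits.

    Shuffling once up front (with a fixed seed for reproducibility)
    before slicing guards against ordering artifacts and leakage.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test
```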
Probabilistic graphical model for structured prediction.
Search algorithm for generation that keeps top-k partial sequences; can improve likelihood but reduce diversity.
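A toy beam search over an arbitrary next-token scorer makes the "keep top-k partial sequences" idea concrete; the `next_scores` callback interface is an illustrative assumption:

```python
import math

def beam_search(next_scores, start, steps, beam_width=2):
    """Toy beam search.

    `next_scores(seq)` returns a {token: probability} dict for the next
    step. At every step, expand each surviving sequence with every
    candidate token, then prune to the `beam_width` partial sequences
    with the highest total log-probability.
    """
    beams = [([start], 0.0)]                 # (sequence, log-prob)
    for _ in range(steps):
        candidates = []
        for seq, lp in beams:
            for tok, p in next_scores(seq).items():
                candidates.append((seq + [tok], lp + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]      # prune to the top k beams
    return beams
```

Because every beam tracks the same objective, the survivors tend to be near-duplicates, which is the diversity loss the definition mentions.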
Techniques that discourage overly complex solutions to improve generalization (reduce overfitting).
Set of all actions available to the agent.
When information from evaluation data improperly influences training, inflating reported performance.
Structured graph encoding facts as entity–relation–entity triples.
Rules and controls around generation (filters, validators, structured outputs) to reduce unsafe or invalid behavior.
A measure of a model class’s expressive capacity based on its ability to shatter datasets.