Results for "max tokens"
ReLU: activation function max(0, x); mitigates vanishing gradients and speeds up training in deep networks.
Saddle point: a point where the gradient is zero but which is neither a maximum nor a minimum; common in deep-network loss landscapes.
Tokenization: converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size against coverage.
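The subword idea can be sketched as a greedy longest-match tokenizer; the toy vocabulary and `tokenize` helper below are illustrative assumptions, not any real tokenizer's API:

```python
# Hypothetical toy vocabulary; real subword vocabularies hold tens of thousands of pieces.
VOCAB = {"un", "break", "able", "a", "b", "e"}

def tokenize(word, vocab):
    """Greedy longest-match: at each position, take the longest vocabulary piece."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest candidate first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append("<unk>")  # no piece matched: fall back to an unknown marker
            i += 1
    return tokens

print(tokenize("unbreakable", VOCAB))  # → ['un', 'break', 'able']
```

A larger vocabulary yields fewer tokens per word (better coverage) at the cost of a bigger embedding table.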
Vocabulary: the set of tokens a model can represent; affects efficiency, multilinguality, and handling of rare strings.
Language model: a model that assigns probabilities to sequences of tokens; often trained by next-token prediction.
Next-token prediction: training objective in which the model predicts the next token given the previous tokens (causal language modeling).
Autoregressive model: generates sequences one token at a time, conditioning on previously generated tokens.
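The token-at-a-time loop can be sketched with a toy bigram table standing in for a neural model; the `BIGRAM` table and `generate` helper are hypothetical, for illustration only:

```python
import random

# Toy bigram "model": the next-token distribution depends only on the last token.
BIGRAM = {
    "<s>":  {"the": 1.0},
    "the":  {"cat": 0.6, "dog": 0.4},
    "cat":  {"</s>": 1.0},
    "dog":  {"</s>": 1.0},
}

def generate(max_tokens=10, seed=0):
    """Sample one token at a time, conditioning on the previous token."""
    rng = random.Random(seed)
    out = ["<s>"]
    for _ in range(max_tokens):
        dist = BIGRAM[out[-1]]
        token = rng.choices(list(dist), weights=list(dist.values()))[0]
        if token == "</s>":  # end-of-sequence: stop decoding
            break
        out.append(token)
    return out[1:]  # drop the start symbol
```

A real decoder conditions on the whole prefix via the model's forward pass, but the loop structure is the same.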
Masked language model: predicts masked-out tokens using bidirectional context; often used for embeddings rather than generation.
Context window (max tokens): the maximum number of tokens the model can attend to in one forward pass; constrains long-document reasoning.
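Staying within the window is usually the caller's job; a common sketch is to truncate from the head so the most recent tokens survive (the `fit_to_context` helper is an assumption for illustration, not a library API):

```python
def fit_to_context(tokens, max_tokens, keep="tail"):
    """Truncate a token list to at most max_tokens entries.

    keep="tail" drops the oldest tokens, which suits chat-style generation;
    keep="head" suits tasks where the document opening matters most.
    """
    if len(tokens) <= max_tokens:
        return tokens
    return tokens[-max_tokens:] if keep == "tail" else tokens[:max_tokens]

print(fit_to_context(list(range(6)), 4))  # → [2, 3, 4, 5]
```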
Throughput: how many requests or tokens can be processed per unit time; affects scalability and cost.
Top-k sampling: samples from the k highest-probability tokens to exclude unlikely outputs.
Top-p (nucleus) sampling: samples from the smallest set of tokens whose probabilities sum to at least p, adapting the candidate set size to the context.
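Both truncation strategies can be sketched over a plain probability list; the `top_k_filter` and `top_p_filter` names are illustrative, not a library API:

```python
def top_k_filter(probs, k):
    """Zero out everything below the k-th highest probability, then renormalize.

    Ties at the threshold may keep slightly more than k tokens.
    """
    threshold = sorted(probs, reverse=True)[k - 1]
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

def top_p_filter(probs, p):
    """Keep the smallest high-probability set whose mass reaches p, then renormalize."""
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, cum = [0.0] * len(probs), 0.0
    for i in order:
        kept[i] = probs[i]
        cum += probs[i]
        if cum >= p:
            break
    total = sum(kept)
    return [q / total for q in kept]

probs = [0.5, 0.3, 0.1, 0.1]
print(top_k_filter(probs, 2))    # keeps only the two most likely tokens
print(top_p_filter(probs, 0.8))  # here the same two tokens already reach mass 0.8
```

Unlike top-k's fixed cutoff, top-p keeps one token under a peaked distribution and many under a flat one.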
Causal mask: prevents attention to future tokens during training and inference.
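The mask can be sketched as a lower-triangular boolean matrix (pure-Python sketch; attention implementations typically realize it by adding -inf to the disallowed logits instead):

```python
def causal_mask(n):
    """n x n boolean mask: entry [i][j] is True when query i may attend to key j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

for row in causal_mask(3):
    print(row)
```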
Detecting unauthorized model outputs or data leaks.