Results for "tokens/sec"
Tokenization: Converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.
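A minimal sketch of greedy longest-match subword tokenization, in the spirit of WordPiece; the tiny VOCAB and the tokenize helper are hypothetical, and real tokenizers learn their subword inventory from a corpus rather than hard-coding it:

```python
# Greedy longest-match subword tokenization (illustrative vocabulary).
VOCAB = {"un", "believ", "able", "token", "ize", "r"}

def tokenize(word: str) -> list[str]:
    """Split one word into the longest matching vocabulary pieces, left to right."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start and word[start:end] not in VOCAB:
            end -= 1
        if end == start:              # nothing in the vocabulary matches here
            return ["<unk>"]
        pieces.append(word[start:end])
        start = end
    return pieces

print(tokenize("unbelievable"))       # ['un', 'believ', 'able']
print(tokenize("tokenizer"))          # ['token', 'ize', 'r']
```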
Vocabulary: The set of tokens a model can represent; impacts efficiency, multilinguality, and handling of rare strings.
Language model: A model that assigns probabilities to sequences of tokens; often trained by next-token prediction.
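Concretely, the sequence probability factors by the chain rule, P(x_1..x_T) = prod_t P(x_t | x_<t). A toy illustration where the conditional table P is a made-up bigram model over a 3-token vocabulary:

```python
import numpy as np

# Made-up bigram conditionals: row = previous token, column = next token.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.3, 0.3, 0.4]])

sequence = [0, 1, 1, 2]
# Chain rule: sum the log-probabilities of each token given its predecessor.
log_prob = sum(np.log(P[prev, nxt]) for prev, nxt in zip(sequence, sequence[1:]))
print(f"log P(sequence) = {log_prob:.3f}")
```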
Next-token prediction: Training objective where the model predicts the next token given all previous tokens (causal language modeling).
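A NumPy sketch of the objective as a shifted cross-entropy; the random logits stand in for a model's forward pass, and the shapes are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, seq_len = 50, 8
tokens = rng.integers(0, vocab_size, size=seq_len + 1)    # x_0 .. x_T
logits = rng.normal(size=(seq_len, vocab_size))           # stand-in for model(tokens[:-1])

targets = tokens[1:]                                      # position t predicts token t+1
shifted = logits - logits.max(axis=-1, keepdims=True)     # numerically stable log-softmax
log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
loss = -log_probs[np.arange(seq_len), targets].mean()     # average negative log-likelihood
print(f"next-token cross-entropy: {loss:.3f}")
```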
Autoregressive generation: Generates sequences one token at a time, conditioning on past tokens.
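A skeleton of the decoding loop; next_token_logits is a hypothetical stub standing in for a real model call, and EOS is an assumed end-of-sequence id:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE, EOS = 100, 0

def next_token_logits(context: list[int]) -> np.ndarray:
    """Stub: ignores the context and returns random scores."""
    return rng.normal(size=VOCAB_SIZE)

def generate(prompt: list[int], max_new_tokens: int = 20) -> list[int]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)    # condition on all tokens so far
        token = int(np.argmax(logits))        # greedy choice; sampling also works
        tokens.append(token)
        if token == EOS:
            break
    return tokens

print(generate([5, 17, 42]))
```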
Masked language modeling: Predicts masked tokens in a sequence, enabling bidirectional context; often used for embeddings rather than generation.
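An illustrative corruption step in the style of BERT's 15% masking recipe; the MASK_ID value and the -100 ignore-index convention (borrowed from PyTorch's cross-entropy) are assumptions, not requirements:

```python
import numpy as np

rng = np.random.default_rng(0)
MASK_ID, MASK_RATE = 103, 0.15        # assumed [MASK] id and masking rate

tokens = rng.integers(5, 1000, size=20)
is_masked = rng.random(tokens.shape) < MASK_RATE
corrupted = np.where(is_masked, MASK_ID, tokens)   # model input with masks
targets = np.where(is_masked, tokens, -100)        # -100: skip position in the loss

print(corrupted)
print(targets)
```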
Context window: Maximum number of tokens the model can attend to in one forward pass; constrains long-document reasoning.
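One common, if blunt, way to respect the limit is to truncate from the left; a sketch with an assumed window of 8 tokens (real systems often summarize history or pin a system prompt instead):

```python
CONTEXT_LENGTH = 8                    # assumed window size for the example

def fit_to_window(token_ids: list[int], limit: int = CONTEXT_LENGTH) -> list[int]:
    """Drop the oldest tokens so the sequence fits the context window."""
    return token_ids[-limit:]

history = list(range(12))             # 12 tokens, but the window holds 8
print(fit_to_window(history))         # [4, 5, 6, 7, 8, 9, 10, 11]
```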
Throughput: How many requests or tokens can be processed per unit time; affects scalability and cost.
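Measuring decode throughput reduces to tokens generated divided by wall-clock time; the generate stub below fakes a 5 ms per-token latency so the sketch runs without a model:

```python
import time

def generate(n_tokens: int) -> list[int]:
    """Stub decoder: pretend each token takes 5 ms to produce."""
    for _ in range(n_tokens):
        time.sleep(0.005)
    return list(range(n_tokens))

start = time.perf_counter()
tokens = generate(100)
elapsed = time.perf_counter() - start
print(f"{len(tokens) / elapsed:.1f} tokens/sec")   # roughly 200 with the 5 ms stub
```

Batching more requests together usually raises aggregate tokens/sec, up to the point where compute or memory bandwidth becomes the bottleneck.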
Top-k sampling: Samples from the k highest-probability tokens to limit unlikely outputs.
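A NumPy sketch: keep the k largest logits, renormalize, and sample; k=50 here is a common default, not a rule:

```python
import numpy as np

def top_k_sample(logits: np.ndarray, k: int, rng: np.random.Generator) -> int:
    top = np.argpartition(logits, -k)[-k:]           # indices of the k largest logits
    probs = np.exp(logits[top] - logits[top].max())  # softmax over the kept logits
    probs /= probs.sum()
    return int(rng.choice(top, p=probs))

rng = np.random.default_rng(0)
logits = rng.normal(size=1000)
print(top_k_sample(logits, k=50, rng=rng))
```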
Top-p (nucleus) sampling: Samples from the smallest set of tokens whose cumulative probability reaches p, adapting the candidate set size to the context.
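A companion sketch for the nucleus: sort by probability, cut where the cumulative mass first reaches p, and sample from that set; p=0.9 is a common default:

```python
import numpy as np

def top_p_sample(logits: np.ndarray, p: float, rng: np.random.Generator) -> int:
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                   # highest probability first
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1  # size of the nucleus
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))

rng = np.random.default_rng(0)
logits = rng.normal(size=1000)
print(top_p_sample(logits, p=0.9, rng=rng))
```

Unlike top-k, the number of candidates grows when the distribution is flat and shrinks when one token dominates.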
Causal mask: Prevents attention to future tokens during training and inference.
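A NumPy sketch that builds the mask as a strictly upper-triangular boolean matrix and applies it before the softmax; the sequence length of 4 and the random scores are arbitrary:

```python
import numpy as np

seq_len = 4
scores = np.random.default_rng(0).normal(size=(seq_len, seq_len))
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)   # True above the diagonal
scores = np.where(mask, -np.inf, scores)          # -inf kills future positions in softmax

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(np.round(weights, 3))   # lower-triangular: position t attends only to <= t
```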
Detecting unauthorized model outputs or data leaks.