Domain: Transformers & LLMs
Attention
Intermediate
Mechanism that computes context-aware weighted mixtures of token representations; parallelizes well across a sequence and captures long-range dependencies.
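As an illustration, here is a minimal NumPy sketch of scaled dot-product attention, one common form of this mechanism; the shapes, seed, and function name are assumptions chosen for the example, not a reference implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Mix value vectors V using the similarity of queries Q to keys K."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # pairwise query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # context-aware mixture of values

# Toy usage: 3 queries attending over 5 key/value pairs of width 4.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 4)), rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
print(scaled_dot_product_attention(Q, K, V).shape)   # (3, 4)
```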
Context Window
Intermediate
Maximum number of tokens the model can attend to in one forward pass; constrains long-document reasoning.
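A minimal sketch of how a serving layer might truncate a long prompt so that prompt plus generation fits the window; the 4,096-token limit and the `fit_to_window` helper are hypothetical.

```python
# Hypothetical numbers: a model with a 4,096-token context window.
CONTEXT_WINDOW = 4096

def fit_to_window(token_ids: list[int], max_new_tokens: int) -> list[int]:
    """Keep only the most recent tokens so prompt + generation fits the window."""
    budget = CONTEXT_WINDOW - max_new_tokens
    return token_ids[-budget:] if len(token_ids) > budget else token_ids

prompt = list(range(10_000))            # a long document, already tokenized
print(len(fit_to_window(prompt, 512)))  # 3584 tokens survive truncation
```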
Self-Attention
Intermediate
Attention where queries/keys/values come from the same sequence, enabling token-to-token interactions.
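A small NumPy sketch, assuming random projection weights and toy dimensions, showing that queries, keys, and values are all projections of the same sequence X, so every token can attend to every other token.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 8                      # sequence length and model width (illustrative)
X = rng.normal(size=(n, d))      # one sequence of token representations

# Learned projections (random here): all three views come from the SAME X.
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

scores = Q @ K.T / np.sqrt(d)                             # (n, n) token-to-token scores
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)                 # softmax over the sequence
out = weights @ V                                         # each token mixes all the others
print(out.shape)  # (6, 8)
```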
Throughput
Intermediate
Number of requests or tokens a system can process per unit time; determines serving scalability and cost.
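A back-of-the-envelope calculation with made-up numbers, showing how request and token throughput translate into a rough serving cost; the prices and counts are purely illustrative.

```python
# Hypothetical measurements over a 60-second window.
requests = 120                 # requests completed in the window
tokens_generated = 48_000      # output tokens across those requests
window_seconds = 60.0

req_per_s = requests / window_seconds           # 2.0 requests/s
tok_per_s = tokens_generated / window_seconds   # 800.0 tokens/s
cost_per_1k_tokens = 0.002                      # hypothetical price
hourly_cost = tok_per_s * 3600 / 1000 * cost_per_1k_tokens
print(req_per_s, tok_per_s, hourly_cost)        # 2.0 800.0 5.76
```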
Transformer
Intermediate
Architecture based on self-attention and feedforward layers; foundation of modern LLMs and many multimodal models.
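A minimal single-head, pre-norm transformer block in NumPy, assuming toy dimensions and omitting masking, biases, and the multi-head split; it only shows how a self-attention sub-layer and a feedforward sub-layer are wired with residual connections so blocks can stack.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu, var = x.mean(-1, keepdims=True), x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def self_attention(x, W_q, W_k, W_v):
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ V

def transformer_block(x, params):
    # Pre-norm residual wiring: attention sub-layer, then feedforward sub-layer.
    x = x + self_attention(layer_norm(x), params["W_q"], params["W_k"], params["W_v"])
    h = np.maximum(layer_norm(x) @ params["W_1"], 0.0)   # feedforward with ReLU
    return x + h @ params["W_2"]

rng = np.random.default_rng(0)
n, d, d_ff = 5, 16, 64
params = {k: rng.normal(size=s) * 0.1 for k, s in
          {"W_q": (d, d), "W_k": (d, d), "W_v": (d, d),
           "W_1": (d, d_ff), "W_2": (d_ff, d)}.items()}
x = rng.normal(size=(n, d))
print(transformer_block(x, params).shape)  # (5, 16) -- shape preserved, so blocks stack
```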
Vocabulary
Intermediate
The set of tokens a model can represent; impacts efficiency, multilinguality, and handling of rare strings.
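A toy sketch of why vocabulary choice affects efficiency and rare-string handling; the vocabulary entries and the byte-fallback scheme here are illustrative assumptions, not any particular tokenizer.

```python
# Toy vocabulary: rare strings fall back to byte-level tokens so nothing is unrepresentable.
vocab = {"hello": 0, "world": 1, "<byte>": 2}   # made-up entries

def encode(word: str) -> list[int]:
    if word in vocab:
        return [vocab[word]]
    # Fallback: one byte-level token per UTF-8 byte of the rare string.
    return [vocab["<byte>"]] * len(word.encode("utf-8"))

print(encode("hello"))   # [0]                  -- one token, cheap
print(encode("héllo"))   # [2, 2, 2, 2, 2, 2]   -- six byte tokens, expensive
```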