Results for "attention"
Attention
Intermediate
Mechanism that computes context-aware mixtures of representations; scales well and captures long-range dependencies.
Attention is like a spotlight that helps a model focus on the most important parts of the input when making predictions. For example, when translating a sentence, attention lets the model weight the words that are crucial to the meaning more heavily. Instead of trea...
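The "spotlight" described above is usually implemented as scaled dot-product attention: each query scores every key, the scores are softmaxed into weights, and the output is a weighted mixture of the values. A minimal NumPy sketch (function and variable names here are illustrative, not from any particular library):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Return softmax(Q K^T / sqrt(d)) V and the attention weights."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V, weights

# Toy example: 3 tokens with 4-dimensional representations.
# Self-attention means the same sequence supplies Q, K, and V.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(Q, K, V)
```

Each row of `w` sums to 1, so every output token is a convex mixture of the value vectors.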
A single attention mechanism within multi-head attention.
Attention mechanisms that reduce quadratic complexity.
Attention where queries/keys/values come from the same sequence, enabling token-to-token interactions.
GNN using attention to weight neighbor contributions dynamically.
Attention in which queries come from one modality and keys/values from another (e.g., text attending to image features).
Architecture based on self-attention and feedforward layers; foundation of modern LLMs and many multimodal models.
Allows model to attend to information from different subspaces simultaneously.
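The subspace idea above can be sketched by splitting the model dimension into `h` heads, running attention independently in each slice, and concatenating the results. This is a simplified sketch without the learned projection matrices a real model would apply:

```python
import numpy as np

def multi_head(X, h):
    """Toy multi-head self-attention over X of shape (n, d), d divisible by h."""
    n, d = X.shape
    assert d % h == 0
    outs = []
    for head in np.split(X, h, axis=1):        # one (n, d/h) subspace per head
        s = head @ head.T / np.sqrt(d // h)
        w = np.exp(s - s.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)     # softmax per head
        outs.append(w @ head)
    return np.concatenate(outs, axis=1)        # back to (n, d)

Y = multi_head(np.random.default_rng(2).normal(size=(3, 8)), h=2)
```

Because each head softmaxes its own scores, two heads can attend to different tokens for the same query position.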
Prevents attention to future tokens during training/inference.
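The masking described above is typically done by setting future positions' scores to negative infinity before the softmax, which drives their weights to exactly zero. A small sketch:

```python
import numpy as np

# Causal (look-ahead) mask for 4 tokens: position i may attend only to j <= i.
n = 4
mask = np.triu(np.ones((n, n), dtype=bool), k=1)  # True above diagonal = future
scores = np.zeros((n, n))                         # uniform scores for the demo
scores[mask] = -np.inf                            # exp(-inf) = 0 after softmax
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
```

The first token can only attend to itself, so its weight row is `[1, 0, 0, 0]`; the last token attends uniformly over all four positions.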
Techniques to handle longer documents without quadratic cost.
Injects sequence order into Transformers, since attention alone is permutation-invariant.
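One standard way to inject order is the fixed sinusoidal encoding from the original Transformer paper: each position gets a vector of sines and cosines at geometrically spaced frequencies, added to the token embeddings. A sketch:

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Fixed sinusoidal positional encoding, shape (seq_len, d_model)."""
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]           # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))    # frequency per dimension pair
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dims: sine
    pe[:, 1::2] = np.cos(angles)                   # odd dims: cosine
    return pe

pe = sinusoidal_positions(8, 16)
```

Position 0 comes out as alternating 0s and 1s (sin 0 and cos 0), and nearby positions get similar vectors, which lets attention recover relative order.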
Studying internal mechanisms or input influence on outputs (e.g., saliency maps, SHAP, attention analysis).
Stores past key/value states to speed up autoregressive decoding.
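The caching idea can be pictured with a toy decode loop: keys and values for past tokens are appended once and reused, so each step attends over the cache instead of recomputing the whole prefix. This is a sketch only; a real model would apply learned W_k/W_v projections, which identity projections stand in for here:

```python
import numpy as np

d = 4
cache_k, cache_v = [], []  # grow by one entry per decoded token

def decode_step(x):
    """Attend the newest token x (shape (d,)) over all cached keys/values."""
    cache_k.append(x)                       # real models cache x @ W_k, x @ W_v
    cache_v.append(x)
    K = np.stack(cache_k)                   # (t, d) — reused, never recomputed
    V = np.stack(cache_v)
    scores = K @ x / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(1)
for _ in range(5):
    out = decode_step(rng.normal(size=d))
```

Each step costs O(t·d) against the cache rather than re-running attention over the full prefix from scratch.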
Transformer applied to image patches.
Using markers to isolate context segments.
Generates sequences one token at a time, conditioning on past tokens.
Networks with recurrent connections for sequences; largely supplanted by Transformers for many tasks.
Maximum number of tokens the model can attend to in one forward pass; constrains long-document reasoning.
A high-capacity language model trained on massive corpora, exhibiting broad generalization and emergent behaviors.
Extending agents with long-term memory stores.
AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.
Extracting system prompts or hidden instructions.
Converting audio speech into text, often using encoder-decoder or transducer architectures.
Maximizing reward without fulfilling the real goal.
Models trained to decide when to call tools.
Assigning a role or identity to the model.
Breaking tasks into sub-steps.
Temporary reasoning space (often hidden).
Small prompt changes cause large output changes.
Controlling robots via language.