Results for "parallel attention"
Attention head: A single attention mechanism within multi-head attention.
Self-attention: Attention where the queries, keys, and values all come from the same sequence, enabling token-to-token interactions.
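To make the "same sequence" point concrete, here is a minimal NumPy sketch of single-head self-attention; the projection matrices Wq/Wk/Wv and the toy shapes are assumptions for the example, not any particular library's API.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention: queries, keys, and values are all
    projections of the same input sequence x (shape: seq_len x d_model)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv           # three views of the same tokens
    scores = q @ k.T / np.sqrt(k.shape[-1])    # pairwise token-to-token affinities
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)  # softmax over the key axis
    return weights @ v                         # each output mixes all tokens' values

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))                   # 5 tokens, d_model = 16
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)            # shape (5, 16)
```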
Efficient attention: Attention variants that reduce the quadratic cost (in sequence length) of full attention, e.g. via sparsity, local windows, or low-rank approximations.
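One representative way to cut the quadratic cost is to restrict each query to a local window of keys; the sketch below assumes that sliding-window variant (other families use learned sparsity patterns or low-rank/kernel approximations).

```python
import numpy as np

def sliding_window_attention(q, k, v, window=2):
    """Local attention: each position attends only to keys within `window`
    positions of itself, so cost grows as seq_len * window, not seq_len ** 2."""
    seq_len, d = q.shape
    out = np.zeros_like(v)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)   # only a small slice of keys
        w = np.exp(scores - scores.max())
        out[i] = (w / w.sum()) @ v[lo:hi]
    return out
```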
Graph attention network (GAT): A GNN that uses attention to weight neighbor contributions dynamically.
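A simplified single-head sketch of the idea, roughly following the GAT-style edge scoring; the shapes and the score vector `a` are assumptions for the example.

```python
import numpy as np

def graph_attention_layer(h, adj, W, a, slope=0.2):
    """Graph attention: score each edge, softmax over a node's neighbors,
    then mix neighbor features with those learned weights.
    h: (n, f_in) node features, adj: (n, n) adjacency with 1 where an edge
    exists (self-loops included), W: (f_in, f_out), a: (2 * f_out,)."""
    z = h @ W                                                   # project node features
    n = z.shape[0]
    e = np.array([[a @ np.concatenate([z[i], z[j]])             # pairwise edge scores
                   for j in range(n)] for i in range(n)])
    e = np.where(e > 0, e, slope * e)                           # LeakyReLU
    e = np.where(adj > 0, e, -1e9)                              # ignore non-neighbors
    w = np.exp(e - e.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                           # softmax per node
    return w @ z                                                # attention-weighted neighborhood mix
```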
Cross-modal attention: Attention between representations of different modalities (e.g. text tokens attending to image patches).
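A minimal sketch of the pattern, assuming text tokens querying image patch features; the names and shapes are illustrative only.

```python
import numpy as np

def cross_attention(text_tokens, image_patches, Wq, Wk, Wv):
    """Cross-attention across modalities: queries come from the text sequence,
    keys and values come from the image patches."""
    q = text_tokens @ Wq
    k = image_patches @ Wk
    v = image_patches @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])        # (n_text, n_patches) affinities
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ v                                    # one image-derived context vector per text token
```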
Transformer: Architecture based on self-attention and feedforward layers; the foundation of modern LLMs and many multimodal models.
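A minimal sketch of one pre-norm Transformer block (single-head attention, no dropout); the parameter names and shapes are assumptions for illustration.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def transformer_block(x, Wq, Wk, Wv, Wo, W1, W2):
    """One pre-norm Transformer block: self-attention then a position-wise
    feedforward network, each wrapped in a residual connection."""
    h = layer_norm(x)
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v
    x = x + attn @ Wo                        # residual around attention
    h = layer_norm(x)
    return x + np.maximum(h @ W1, 0) @ W2    # residual around ReLU feedforward
```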
Positional encoding: Injects sequence order into Transformers, since attention alone is permutation-invariant.
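For concreteness, here is the sinusoidal scheme from the original Transformer paper, which gives each position a distinct pattern that is added to the token embeddings (an even d_model is assumed).

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: each position gets a unique pattern of
    sines and cosines, giving attention a way to tell token order apart.
    Assumes an even d_model."""
    pos = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]              # even embedding dimensions
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)                        # even dims get sine
    pe[:, 1::2] = np.cos(angle)                        # odd dims get cosine
    return pe                                          # added to token embeddings before the first block
```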
Interpretability: Studying a model's internal mechanisms or how inputs influence its outputs (e.g., saliency maps, SHAP, attention analysis).
Causal masking: Prevents attention to future tokens during training and inference.
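A small sketch of how this is typically done: build a lower-triangular mask and set disallowed positions to -inf before the softmax.

```python
import numpy as np

def causal_mask(seq_len):
    """True where attention is allowed: position i may look at positions <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def masked_softmax(scores, mask):
    """Masked positions get -inf, so they receive exactly zero attention weight."""
    scores = np.where(mask, scores, -np.inf)
    e = np.exp(scores - scores.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

scores = np.random.default_rng(0).normal(size=(4, 4))   # raw query-key scores
weights = masked_softmax(scores, causal_mask(4))         # upper triangle is all zeros
```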
KV cache: Stores the key and value projections of past tokens so they are not recomputed, speeding up autoregressive decoding.
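A toy sketch of the idea: cache each new token's key and value projections so later decoding steps only project the newest token (the class and parameter names are made up for the example).

```python
import numpy as np

class KVCache:
    """Append-only cache of key/value projections for autoregressive decoding."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, x_new, Wq, Wk, Wv):
        """Process one new token: project it once, cache its key/value,
        and attend against everything cached so far."""
        q = x_new @ Wq
        self.keys.append(x_new @ Wk)              # cached, never recomputed
        self.values.append(x_new @ Wv)
        K, V = np.stack(self.keys), np.stack(self.values)
        scores = K @ q / np.sqrt(K.shape[-1])
        w = np.exp(scores - scores.max())
        return (w / w.sum()) @ V                  # attention output for the new token

rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
cache = KVCache()
for _ in range(3):                                # three decoding steps
    out = cache.step(rng.normal(size=8), Wq, Wk, Wv)
```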
Attention: Mechanism that computes context-aware mixtures of representations; parallelizes well and captures long-range dependencies.
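The canonical form of such a mechanism is scaled dot-product attention (as defined in the original Transformer paper), where the value vectors V are mixed with weights derived from query/key similarity:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
```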
Multi-head attention: Allows the model to attend to information from different representation subspaces simultaneously.
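A compact NumPy sketch: split the model dimension into heads, attend independently in each subspace, then concatenate and project; the shapes and parameter names are assumptions for the example.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """Multi-head attention: run scaled dot-product attention independently in
    n_heads subspaces so the heads can specialize, then recombine the results."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # reshape to (n_heads, seq_len, d_head): one attention per subspace
    split = lambda t: t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)      # (n_heads, seq, seq)
    heads = softmax(scores) @ v                              # (n_heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo                                       # mix the heads back together

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))
Wq, Wk, Wv, Wo = (rng.normal(size=(16, 16)) for _ in range(4))
out = multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads=4)     # shape (5, 16)
```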