Results for "parallel attention"
A single attention mechanism within multi-head attention.
Attention mechanisms that reduce quadratic complexity.
Architecture based on self-attention and feedforward layers; foundation of modern LLMs and many multimodal models.
Mechanism that computes context-aware mixtures of representations; scales well and captures long-range dependencies.
Attention where queries/keys/values come from the same sequence, enabling token-to-token interactions.
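The self-attention entry above describes queries, keys, and values all coming from one sequence; a minimal pure-Python sketch of scaled dot-product self-attention (the weight matrices `Wq`/`Wk`/`Wv` and helper names are illustrative assumptions, not from the source):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over one sequence.
    X: list of token vectors; Wq/Wk/Wv: weight matrices as lists of rows."""
    def matvec(W, x):  # W @ x
        return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in W]
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    d = len(Q[0])
    out = []
    for q in Q:  # every token's query scores every token's key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out
```

With identity weight matrices, each output row is a convex mixture of the input token vectors, which is the "context-aware mixture" idea in the mechanism entry above.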
GNN using attention to weight neighbor contributions dynamically.
Attention between different modalities.
Allows the model to attend to information from different representation subspaces simultaneously.
Prevents attention to future tokens during training/inference.
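The masking entry above is usually implemented by setting future positions' scores to negative infinity before the softmax; a minimal sketch (names are illustrative assumptions):

```python
import math

def causal_softmax(scores, i):
    """Softmax over attention scores for query position i,
    masking future positions (j > i) with -inf so they get zero weight."""
    masked = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
    m = max(masked[: i + 1])  # max over the visible prefix
    es = [math.exp(s - m) if s != float("-inf") else 0.0 for s in masked]
    tot = sum(es)
    return [e / tot for e in es]
```

Position 0 can only attend to itself; position i sees tokens 0..i, which is what makes autoregressive training and decoding consistent.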
Techniques to handle longer documents without quadratic cost.
Ability to replicate results given the same code and data; harder with distributed training and nondeterministic ops.
How many requests or tokens can be processed per unit time; affects scalability and cost.
Hardware resources used for training/inference; constrained by memory bandwidth, FLOPs, and parallelism.
Running a new model alongside production traffic without user impact.
Running predictions on large datasets periodically.
Injects sequence order into Transformers, since attention alone is permutation-invariant.
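The positional-encoding entry above is commonly realized with the sinusoidal scheme from the original Transformer; a short sketch (assuming that sinusoidal variant, which the entry does not itself specify):

```python
import math

def sinusoidal_pe(pos, d_model):
    """Sinusoidal positional encoding: sin on even dims, cos on odd dims,
    with wavelengths forming a geometric progression up to 10000."""
    pe = []
    for i in range(0, d_model, 2):
        angle = pos / (10000 ** (i / d_model))
        pe.append(math.sin(angle))
        if i + 1 < d_model:
            pe.append(math.cos(angle))
    return pe
```

Adding this vector to each token embedding gives attention a way to distinguish positions, since attention itself is permutation-invariant.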
Studying internal mechanisms or input influence on outputs (e.g., saliency maps, SHAP, attention analysis).
Stores past key/value states to speed up autoregressive decoding.
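The caching entry above can be sketched as a toy class that appends each new token's key/value pair instead of recomputing all past ones on every decoding step (class and method names are illustrative assumptions):

```python
import math

class KVCache:
    """Toy key/value cache for autoregressive decoding: keys/values for
    past tokens are stored once and reused by every later query."""
    def __init__(self):
        self.keys, self.values = [], []

    def append(self, k, v):
        # one new token per decoding step
        self.keys.append(k)
        self.values.append(v)

    def attend(self, q):
        # dot-product scores against all cached keys, softmax-weighted values
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in self.keys]
        m = max(scores)
        es = [math.exp(s - m) for s in scores]
        tot = sum(es)
        w = [e / tot for e in es]
        return [sum(wi * v[j] for wi, v in zip(w, self.values))
                for j in range(len(self.values[0]))]
```

This turns per-step attention cost from quadratic recomputation into a single query against the cache, which is why it matters for decoding throughput.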
Transformer applied to image patches.
Using delimiter markers to isolate context segments.
Networks with recurrent connections for sequences; largely supplanted by Transformers for many tasks.
Generates sequences one token at a time, conditioning on past tokens.
A high-capacity language model trained on massive corpora, exhibiting broad generalization and emergent behaviors.
Maximum number of tokens the model can attend to in one forward pass; constrains long-document reasoning.
AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.
Extending agents with long-term memory stores.
Converting audio speech into text, often using encoder-decoder or transducer architectures.
Extracting system prompts or hidden instructions.
Models trained to decide when to call tools.
Maximizing the reward signal without fulfilling the real goal.