Mechanism that computes context-aware mixtures of representations; scales well and captures long-range dependencies.
Why It Matters
Attention mechanisms are essential for improving the performance of AI models in tasks that require understanding context and relationships within data. They have transformed natural language processing and are integral to modern architectures like Transformers, enabling advancements in machine translation, text generation, and beyond.
Attention mechanisms are computational strategies that enable models to focus on specific parts of the input data when generating outputs, effectively allowing for context-aware processing. The fundamental operation computes a weighted sum of values (V), where the weights reflect the relevance of queries (Q) to keys (K), expressed mathematically as Attention(Q, K, V) = softmax(QK^T / √d_k)V, where d_k is the dimensionality of the keys and the division by √d_k keeps the dot products in a range where the softmax is well-behaved. This mechanism allows for dynamic weighting of input features, facilitating the capture of long-range dependencies and contextual relationships within sequences. Attention mechanisms can be categorized into various forms, including global attention, where all input tokens are considered, and local attention, which restricts the focus to a subset of tokens. The introduction of attention has significantly improved model performance on tasks such as machine translation and text summarization, leading to architectures like the Transformer that use attention as a core component.
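The formula above can be sketched directly in NumPy. This is a minimal illustration of scaled dot-product attention, not a production implementation: the array shapes, the toy inputs, and the helper names (`softmax`, `attention`) are assumptions chosen for clarity, and real systems add batching, masking, and multiple heads on top of this core operation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the per-row max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (num_queries, num_keys) relevance scores
    weights = softmax(scores, axis=-1)   # each query's weights over keys sum to 1
    return weights @ V, weights          # context vectors and the attention map

# Toy example: 2 queries attending over 3 key/value pairs, d_k = 4.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = attention(Q, K, V)
# out has shape (2, 4): one context-aware mixture of the value rows per query.
```

Each row of `w` is a probability distribution over the keys, which is exactly the "dynamic weighting of input features" described above; restricting which keys a query may attend to (e.g. by masking some scores) is how local attention variants are obtained.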
Attention is like a spotlight that helps a model focus on the most important parts of the input data when making predictions. For example, when translating a sentence, attention lets the model weight heavily the source words that are crucial for understanding the meaning. Instead of treating every word equally, the model decides which words to focus on based on their relevance to the task at hand. This ability to prioritize information makes attention mechanisms powerful tools in AI, especially in applications like language translation and text summarization.