Attention

Intermediate

Mechanism that computes context-aware mixtures of representations; parallelizes well across sequence positions and captures long-range dependencies.


Why It Matters

Attention mechanisms are essential for improving the performance of AI models in tasks that require understanding context and relationships within data. They have transformed natural language processing and are integral to modern architectures like Transformers, enabling advancements in machine translation, text generation, and beyond.

Attention mechanisms are computational strategies that enable models to focus on the most relevant parts of the input when generating outputs, allowing for context-aware processing. The fundamental operation computes a weighted sum of values (V) based on the relevance of queries (Q) to keys (K), expressed mathematically as Attention(Q, K, V) = softmax(QK^T / √d_k)V, where d_k is the key dimension and the √d_k scaling keeps the dot products in a range where the softmax produces useful gradients.

This dynamic weighting of input features facilitates the capture of long-range dependencies and contextual relationships within sequences. Attention mechanisms come in several forms, including global attention, where all input tokens are considered, and local attention, which restricts the focus to a subset of tokens. The introduction of attention significantly improved model performance on tasks such as machine translation and text summarization, and led to the development of architectures like the Transformer that use attention as a core component.
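The formula above can be sketched in a few lines of NumPy. This is a minimal illustration of scaled dot-product attention, not a production implementation; the function and variable names are chosen for this example, and batching, masking, and multiple heads are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # relevance of each query to each key
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted sum of values per query

# Toy example: 2 queries attending over 3 key/value pairs, d_k = 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = attention(Q, K, V)
print(out.shape)  # (2, 4): one context-aware mixture of V rows per query
```

Each output row is a convex combination of the value vectors, with weights determined by how strongly that query matches each key; this is the "dynamic weighting" the definition describes.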

