Attention Head

Intermediate

A single attention mechanism within multi-head attention.


Why It Matters

Attention heads are essential for enhancing the performance of AI models, particularly in natural language processing tasks. By allowing models to focus on multiple aspects of input data simultaneously, they improve understanding and context awareness. This capability is crucial for applications such as machine translation, text summarization, and sentiment analysis, making them foundational to modern AI systems.

An attention head is a fundamental component of the multi-head attention mechanism, a core element of transformer architectures. Each attention head independently computes a weighted sum of input representations based on learned attention scores, allowing the model to focus on different parts of the input sequence.

Mathematically, for an input sequence represented as a matrix X, the attention mechanism is defined as

Attention(Q, K, V) = softmax(QKᵀ / √d_k) V

where Q, K, and V are the query, key, and value matrices derived from X via learned projections, and d_k is the dimensionality of the key vectors.

The outputs of all attention heads are then concatenated and linearly transformed to produce the final output of the multi-head attention layer. By attending to different representation subspaces in parallel, the heads capture diverse contextual information, improving the model's expressiveness and its ability to learn complex relationships within the data.
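The computation above can be sketched in a few lines of NumPy. This is a minimal illustration of a single attention head; the input dimensions, head size, and random projection matrices are arbitrary choices for the example, not from any particular model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(X, W_q, W_k, W_v):
    """One head: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)    # (seq_len, seq_len) attention scores
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted sum of value vectors

# Toy example: 4 tokens, model dimension 8, head dimension 4.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
W_q, W_k, W_v = (rng.standard_normal((8, 4)) for _ in range(3))
out = attention_head(X, W_q, W_k, W_v)
print(out.shape)  # (4, 4): one d_k-dimensional output per input token
```

In a full multi-head layer, several such heads (each with its own W_q, W_k, W_v) run in parallel, and their outputs are concatenated and passed through a final linear projection.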
