Multi-Head Attention

Intermediate

Allows the model to attend to information from different representation subspaces simultaneously.


Why It Matters

The multi-head attention mechanism is pivotal in modern AI, particularly in natural language processing and computer vision. By enabling models to consider multiple perspectives simultaneously, it significantly enhances their ability to understand complex data. This capability is crucial for applications such as chatbots, language translation, and image recognition, making multi-head attention a cornerstone of state-of-the-art AI systems.

Multi-head attention is a mechanism in transformer architectures that allows the model to attend to different representation subspaces of the input simultaneously. It consists of multiple attention heads, each computing its own attention scores and output. Formally, given input matrices Q (queries), K (keys), and V (values):

MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O, where head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V) for i = 1, ..., h.

Here, W_i^Q, W_i^K, and W_i^V are learned projection matrices for each head, and W^O is a final linear transformation applied to the concatenated heads. By aggregating information from multiple perspectives, the model captures diverse contextual relationships, which enhances its representational capacity and improves performance on tasks such as language modeling and image processing.
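The formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the dimensions, the random projection matrices, and the use of per-head weight lists are assumptions chosen for clarity (real implementations fuse the heads into single batched matrix multiplies).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

def multi_head_attention(Q, K, V, W_q, W_k, W_v, W_o):
    # head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V);
    # concatenate heads along the feature axis, then apply W^O
    heads = [attention(Q @ wq, K @ wk, V @ wv)
             for wq, wk, wv in zip(W_q, W_k, W_v)]
    return np.concatenate(heads, axis=-1) @ W_o

# Illustrative sizes: 2 heads, model dim 8, head dim 4, sequence length 3
rng = np.random.default_rng(0)
h, d_model, d_head, seq = 2, 8, 4, 3
W_q = [rng.normal(size=(d_model, d_head)) for _ in range(h)]
W_k = [rng.normal(size=(d_model, d_head)) for _ in range(h)]
W_v = [rng.normal(size=(d_model, d_head)) for _ in range(h)]
W_o = rng.normal(size=(h * d_head, d_model))

X = rng.normal(size=(seq, d_model))       # self-attention: Q = K = V = X
out = multi_head_attention(X, X, X, W_q, W_k, W_v, W_o)
print(out.shape)  # (3, 8) — same sequence length and model dimension as the input
```

Note that each head projects the input down to a smaller subspace (d_head = d_model / h), so the total cost is comparable to a single full-width attention head while allowing each head to specialize.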

