Self-Attention

Intermediate

Attention where queries/keys/values come from the same sequence, enabling token-to-token interactions.


Why It Matters

Self-attention is crucial for enabling models to understand complex relationships within data, particularly in natural language processing. Its integration into architectures like Transformers has led to significant improvements in performance across various applications, including text generation, sentiment analysis, and more.

Self-attention is a specific form of attention mechanism in which the queries, keys, and values all originate from the same input sequence. This allows the model to compute intra-sequence relationships, assessing the importance of each token relative to every other token in the sequence. Mathematically, SelfAttention(X) = softmax(QK^T / √d_k)V, where Q = XW^Q, K = XW^K, and V = XW^V are learned linear projections of the same input matrix X, and d_k is the dimensionality of the keys. Because every token attends directly to every other token, self-attention models dependencies regardless of their distance in the sequence, addressing a key limitation of traditional RNNs. It is a core component of the Transformer architecture, enabling parallel computation across the sequence during training and significantly enhancing the model's ability to capture context. Self-attention has been pivotal in advancing applications such as language understanding and generation.
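The formulation above can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not a production implementation: the projection matrices W_q, W_k, W_v are randomly initialized stand-ins for learned weights, and batching and masking are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    # Q, K, V are all projections of the SAME input X (hence "self"-attention).
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # (seq_len, seq_len) token-to-token scores
    weights = softmax(scores, axis=-1)    # each row is a distribution over tokens
    return weights @ V                    # context-aware token representations

# Toy example: 4 tokens, model dimension 8.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.standard_normal((seq_len, d_model))
W_q, W_k, W_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))

out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8): one updated vector per input token
```

Note that the output has the same shape as the input sequence of vectors: each token's new representation is a weighted mixture of all tokens' value vectors, which is exactly the token-to-token interaction the definition describes.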

