Key-Value Cache

Intermediate

Stores past attention states to speed up autoregressive decoding.


Why It Matters

The key-value cache is crucial for efficient AI models during text generation and other sequential tasks. By letting a model reuse previously computed attention states instead of recomputing them, it significantly reduces inference latency and makes it practical to maintain context over long interactions. This is vital for applications like chatbots and virtual assistants, where timely, context-aware responses are essential.

A key-value cache is a mechanism used in transformer models, particularly during autoregressive decoding, to store previously computed attention states. The cache lets the model reuse past key and value representations rather than recomputing them for every new token. Mathematically, if K_t and V_t are the key and value for the t-th token, the cache is updated by concatenation: K_cache = [K_cache; K_t] and V_cache = [V_cache; V_t]. With the cache, generating each new token requires only attention between the new query and the cached keys and values, reducing the per-token attention cost from O(n^2) (recomputing attention over the entire prefix) to O(n), where n is the current sequence length. This accelerates inference and lets the model handle longer sequences effectively, which is particularly beneficial in applications such as text generation and dialogue systems, where maintaining context over long interactions is essential.
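The update rule above can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not a production implementation: the names `KVCache` and `attend_latest` are hypothetical, and batching, multiple heads, and causal masking across a full sequence are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class KVCache:
    """Accumulates key/value rows as tokens are decoded (hypothetical helper)."""
    def __init__(self, d_model):
        self.K = np.empty((0, d_model))  # cached keys, one row per past token
        self.V = np.empty((0, d_model))  # cached values, one row per past token

    def append(self, k_t, v_t):
        # K_cache = [K_cache; K_t] and V_cache = [V_cache; V_t]
        self.K = np.vstack([self.K, k_t])
        self.V = np.vstack([self.V, v_t])

def attend_latest(q_t, cache):
    # Attention for the newest token only: one query against all cached
    # keys/values, i.e. O(n) work instead of recomputing the full O(n^2) map.
    d = cache.K.shape[1]
    scores = (q_t @ cache.K.T) / np.sqrt(d)   # shape: (n,)
    return softmax(scores) @ cache.V          # shape: (d,)

cache = KVCache(d_model=4)
cache.append(np.array([1.0, 0.0, 0.0, 0.0]), np.array([1.0, 2.0, 3.0, 4.0]))
cache.append(np.array([0.0, 1.0, 0.0, 0.0]), np.array([4.0, 3.0, 2.0, 1.0]))
out = attend_latest(np.array([0.5, 0.5, 0.0, 0.0]), cache)
```

Each decoding step does one `append` and one `attend_latest`; the quadratic cost of recomputing attention over the whole prefix at every step never appears.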

