Masked Language Model

Intermediate

Predicts masked tokens in a sequence, enabling bidirectional context; often used for embeddings rather than generation.

Why It Matters

Masked language models are crucial for improving natural language understanding, enabling applications like search engines and virtual assistants to better comprehend user queries. Their ability to leverage context from both directions enhances the quality of AI interactions, making them more intuitive and effective.

A masked language model (MLM) is a neural language model trained to predict masked tokens within a sequence, allowing it to leverage bidirectional context. In contrast to autoregressive models, which predict the next token in a left-to-right, unidirectional manner, MLMs are trained by randomly masking a portion of the input tokens (15% in the original BERT) and requiring the model to reconstruct them from the surrounding context. Formally, training maximizes the conditional probability P(w_i | context), where w_i is a masked token and the context includes both preceding and following tokens; the training objective uses cross-entropy loss to score the model's predictions against the actual masked tokens. This objective is foundational for models such as BERT (Bidirectional Encoder Representations from Transformers), which demonstrated significant improvements across natural language understanding tasks, including sentiment analysis, question answering, and named entity recognition.
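The objective described above can be sketched in a few lines. This is a toy illustration, not a real model: the token ids, vocabulary size, and logits are made-up values standing in for what an actual encoder like BERT would produce, and `MASK_ID` is a hypothetical id for the [MASK] token.

```python
# Toy sketch of the MLM training objective: mask ~15% of tokens, then
# compute cross-entropy only at the masked positions. Logits are random
# placeholders for a real model's output.
import numpy as np

rng = np.random.default_rng(0)

MASK_ID = 0                              # hypothetical [MASK] token id
VOCAB_SIZE = 10
tokens = np.array([5, 2, 7, 3, 9, 1])    # toy token ids

# 1) Randomly choose ~15% of positions to mask (BERT's masking rate).
mask = rng.random(tokens.shape) < 0.15
mask[0] = True                           # ensure at least one masked position
inputs = np.where(mask, MASK_ID, tokens) # model sees [MASK] at masked slots

# 2) Stand-in for model output: one logit vector per position.
logits = rng.normal(size=(len(tokens), VOCAB_SIZE))

# 3) Cross-entropy at masked positions only:
#    loss = -mean over masked i of log P(w_i | context).
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -log_probs[mask, tokens[mask]].mean()
print(float(loss) > 0)
```

Note that the loss ignores unmasked positions entirely; in practice frameworks implement this by setting the labels of unmasked tokens to an ignore index so they contribute nothing to the gradient.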
