Predicts masked tokens in a sequence, enabling bidirectional context; often used for embeddings rather than generation.
Why It Matters
Masked language models are crucial for improving natural language understanding, enabling applications like search engines and virtual assistants to better comprehend user queries. Their ability to leverage context from both directions enhances the quality of AI interactions, making them more intuitive and effective.
A masked language model (MLM) is a neural language model trained to predict masked tokens within a sequence, allowing it to leverage bidirectional context. In contrast to autoregressive models, which predict the next token in a unidirectional manner, MLMs are trained by randomly masking a portion of the input tokens and requiring the model to predict these masked tokens from the surrounding context. Mathematically, this involves maximizing the conditional probability P(w_i | context), where w_i is the masked token and 'context' includes both preceding and following tokens. The training objective typically employs cross-entropy loss to evaluate the model's predictions against the actual masked tokens. This objective is foundational for models such as BERT (Bidirectional Encoder Representations from Transformers), which has demonstrated significant improvements across natural language understanding tasks, including sentiment analysis, question answering, and named entity recognition.
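The masking step and the cross-entropy objective described above can be sketched in a few lines of plain Python. This is a simplified illustration, not BERT's full recipe: real implementations also sometimes keep or randomly replace selected tokens instead of always inserting [MASK], and the probabilities would come from a trained network rather than the toy distribution assumed here.

```python
import math
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Randomly replace a fraction of tokens with [MASK], BERT-style.

    Returns the corrupted sequence and a dict mapping each masked
    position to the original token the model must recover.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok            # label for this masked position
            masked.append(mask_token)
        else:
            masked.append(tok)
    return masked, targets

def cross_entropy(predicted_probs, targets):
    """Average -log P(w_i | context) over the masked positions only."""
    losses = [-math.log(predicted_probs[i][tok]) for i, tok in targets.items()]
    return sum(losses) / len(losses) if losses else 0.0

tokens = "the cat sat on the mat".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3, seed=1)

# Toy stand-in for a model: assign probability 0.8 to the correct token
# at each masked position (a hypothetical, fixed distribution).
probs = {i: {tok: 0.8} for i, tok in targets.items()}
loss = cross_entropy(probs, targets)
```

Note that the loss is computed only at the masked positions; the unmasked tokens serve purely as bidirectional context, which is exactly what distinguishes this objective from next-token prediction.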
A masked language model is like a fill-in-the-blank game: some words in a sentence are hidden, and the model has to guess what they are. For example, given the sentence 'The cat sat on the ___', the model looks at the words around the blank to figure out that 'mat' is a good guess. This teaches the model how words relate to each other in a sentence, making it better at understanding text. It differs from other models in that it can look at the entire sentence, not just the words before the blank.