Results for "language"
Language Model
Intermediate
A model that assigns probabilities to sequences of tokens; often trained by next-token prediction.
A language model is like a smart assistant that predicts what word comes next in a sentence based on the words that came before it. Imagine you’re playing a word game where you have to guess the next word in a sentence. The model learns from a huge amount of text, like books and articles, to unde...
Recurrent Neural Network (RNN)
Networks with recurrent connections for processing sequences; largely supplanted by Transformers for many tasks.
Positional Encoding
Injects sequence order into Transformers, since attention alone is permutation-invariant.
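As a sketch of one common scheme (the sinusoidal encoding; the function name here is illustrative, not from any particular library):

```python
import math

def sinusoidal_positions(seq_len, d_model):
    # Classic sinusoidal encoding: even dimensions use sine, odd use cosine,
    # at geometrically spaced frequencies, so each position gets a unique
    # pattern that the model can use to recover token order.
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** ((i // 2 * 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table
```

These values are added to the token embeddings before the first attention layer.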
LSTM (Long Short-Term Memory)
An RNN variant using gates to mitigate vanishing gradients and capture longer context.
Tokenization
Converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.
Transformer
Architecture based on self-attention and feedforward layers; the foundation of modern LLMs and many multimodal models.
Next-Token Prediction
Training objective where the model predicts the next token given previous tokens (causal modeling).
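The objective reduces to the average negative log-likelihood of each token given its prefix. A minimal sketch, where `log_prob(prefix, token)` is a toy stand-in for a trained model, not a real API:

```python
import math

def next_token_loss(log_prob, tokens):
    # Average negative log-likelihood of each token given everything before it.
    # Assumes len(tokens) >= 2 so there is at least one prediction to score.
    losses = [-log_prob(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
    return sum(losses) / len(losses)
```

For example, a model that assigns uniform probability 1/4 to every token incurs a loss of log 4 per position.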
Autoregressive Generation
Generates sequences one token at a time, conditioning on past tokens.
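The decoding loop itself is simple; all the work is in the model. A sketch with a hypothetical `next_token(seq)` callable standing in for the model:

```python
def generate(next_token, prompt, max_new_tokens):
    # Greedy autoregressive loop: each new token is chosen by conditioning
    # on everything generated so far. next_token(seq) -> token is a toy
    # stand-in for a real model; None signals end-of-sequence.
    seq = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(seq)
        if tok is None:
            break
        seq.append(tok)
    return seq
```

Real decoders replace the greedy choice with sampling strategies such as temperature, top-k, or top-p.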
Vector Database
A datastore optimized for similarity search over embeddings, enabling semantic retrieval at scale.
Semantic Search
Retrieval based on embedding similarity rather than keyword overlap, capturing paraphrases and related concepts.
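The core ranking step can be sketched with cosine similarity over precomputed embedding vectors (the function names here are illustrative; production systems use approximate nearest-neighbor indexes instead of a linear scan):

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product normalized by vector lengths.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def semantic_search(query_vec, doc_vecs, top_n=3):
    # Rank documents by embedding similarity, not keyword overlap.
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:top_n]
```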
Data Augmentation
Expanding training data via transformations (flips, noise, paraphrases) to improve robustness.
Beam Search
Search algorithm for generation that keeps the top-k partial sequences at each step; can improve likelihood but reduce diversity.
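A minimal sketch of the expand-then-prune loop, assuming a toy interface `next_log_probs(seq) -> {token: log_prob}` that stands in for a real model:

```python
import math

def beam_search(next_log_probs, beam_width, length):
    # Each beam is (partial_sequence, cumulative_log_probability).
    beams = [((), 0.0)]
    for _ in range(length):
        candidates = []
        for seq, score in beams:
            # Expand every beam with every possible next token.
            for tok, lp in next_log_probs(seq).items():
                candidates.append((seq + (tok,), score + lp))
        # Prune: keep only the beam_width best candidates by total score.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams
```

Unlike greedy decoding, a lower-probability token can survive if it leads to a higher-probability continuation later.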
Temperature
Scales logits before sampling; higher values increase randomness/diversity, lower values increase determinism.
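Concretely, the logits are divided by the temperature before the softmax. A small sketch (function names are illustrative):

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_temperature(logits, temperature):
    # T > 1 flattens the distribution (more random);
    # T < 1 sharpens it (more deterministic).
    return softmax([x / temperature for x in logits])
```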
Top-k Sampling
Samples from the k highest-probability tokens to limit unlikely outputs.
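A sketch of the filter-and-renormalize step (illustrative, stdlib only):

```python
import random

def top_k_sample(probs, k, rng=random):
    # Keep the indices of the k highest-probability tokens,
    # renormalize their mass to 1, then sample among them.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    weights = [probs[i] / total for i in top]
    return rng.choices(top, weights=weights, k=1)[0]
```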
Top-p (Nucleus) Sampling
Samples from the smallest set of tokens whose probabilities sum to p, adapting the set size to the context.
Softmax
Converts logits to probabilities by exponentiation and normalization; common in classification and language models.
Benchmark
A dataset plus metric suite for comparing models; can be gamed or misaligned with real-world goals.
Prompt Injection
Attacks that manipulate model instructions (especially via retrieved content) to override system goals or exfiltrate data.
Planning
Methods for breaking goals into steps; can be classical (A*, STRIPS) or LLM-driven with tool calls.
Text-to-Speech (TTS)
Generating speech audio from text, with control over prosody, speaker identity, and style.
Causal Masking
Prevents attention to future tokens during training and inference.
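In practice this is a lower-triangular mask over the attention score matrix; disallowed positions are set to negative infinity so the softmax assigns them zero weight. A sketch (function names are illustrative):

```python
def causal_mask(seq_len):
    # True where attention is allowed: position i may attend to j only if j <= i.
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

def masked_scores(scores):
    # Set future (disallowed) positions to -inf so a subsequent
    # softmax gives them exactly zero probability.
    mask = causal_mask(len(scores))
    return [[s if allowed else float("-inf")
             for s, allowed in zip(row, mrow)]
            for row, mrow in zip(scores, mask)]
```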
Multi-Head Attention
Allows the model to attend to information from different representation subspaces simultaneously.
Efficient Attention
Attention variants that reduce the quadratic complexity of full self-attention.
Mixture of Experts (MoE)
Routes inputs to subsets of parameters for scalable capacity.
Emergent Abilities
Capabilities that appear only beyond certain model sizes.
Knowledge Graph
Structured graph encoding facts as entity–relation–entity triples.
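The triple structure makes pattern queries straightforward. A toy sketch (the facts and the `query` helper are illustrative assumptions, not a real knowledge-graph API):

```python
# A tiny knowledge graph stored as (head, relation, tail) triples.
triples = {
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
    ("France", "located_in", "Europe"),
}

def query(head=None, relation=None, tail=None):
    # Return all triples matching the given (possibly partial) pattern;
    # None acts as a wildcard for that slot.
    return [
        (h, r, t)
        for (h, r, t) in triples
        if (head is None or h == head)
        and (relation is None or r == relation)
        and (tail is None or t == tail)
    ]
```

Real systems express such patterns in query languages like SPARQL over much larger stores.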
Conditional Random Field (CRF)
Probabilistic graphical model for structured prediction.
Vision Transformer (ViT)
Transformer applied to image patches.
Speech Synthesis
Generating human-like speech from text.
Speech Recognition (ASR)
Maps audio signals to linguistic units.
Speaker Identification
Identifying speakers in audio.