Results for "text"
Context Window
Intermediate
Maximum number of tokens the model can attend to in one forward pass; constrains long-document reasoning.
The context window is like a pair of glasses that lets a language model see only a fixed number of words at a time. If the model can see just a few words, it may miss important details that come later in a long sentence or paragraph. For example, if you’re reading a book and can only see one page...
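The constraint above can be sketched in a few lines: when a sequence outgrows the window, a common (illustrative) strategy is to keep only the most recent tokens. The function name and window size here are assumptions for demonstration, not any particular library's API.

```python
# Minimal sketch: enforcing a context window by keeping only the most
# recent tokens. The window size of 8 is illustrative.

def truncate_to_window(tokens, window=8):
    """Keep only the last `window` tokens the model can attend to."""
    return tokens[-window:]

history = list(range(20))                    # 20 token ids
visible = truncate_to_window(history, window=8)
print(visible)                               # the model "sees" only the last 8
```

Real systems may instead summarize or retrieve older context rather than simply dropping it.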
Generating speech audio from text, with control over prosody, speaker identity, and style.
Generating human-like speech from text.
Converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.
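The subword idea in this definition can be illustrated with a toy greedy longest-match tokenizer. The vocabulary here is invented for the example; production tokenizers (BPE, WordPiece, SentencePiece) learn their vocabularies from data.

```python
# Toy greedy longest-match subword tokenizer (illustrative vocabulary).

VOCAB = {"token", "tok", "iz", "ation", "er", "a", "t", "i", "o", "n", "z"}

def tokenize(word):
    pieces, i = [], 0
    while i < len(word):
        # take the longest vocabulary entry matching at position i
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # fall back to a single character
            i += 1
    return pieces

print(tokenize("tokenization"))  # ['token', 'iz', 'ation']
```

A larger vocabulary means fewer, longer pieces per word but a bigger embedding table; that is the size/coverage balance the definition mentions.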
Joint vision-language model aligning images and text.
A model that assigns probabilities to sequences of tokens; often trained by next-token prediction.
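"Assigns probabilities to sequences" usually means factoring the joint probability by the chain rule, P(w1..wn) = Π P(wi | w1..wi-1). A sketch with invented conditional probabilities:

```python
# Sketch: sequence probability as a product of next-token conditionals.
# The three probabilities below are made up for illustration.
import math

cond_probs = [0.5, 0.2, 0.1]   # P(w1), P(w2|w1), P(w3|w1,w2)
log_prob = sum(math.log(p) for p in cond_probs)   # sum logs for stability
print(math.exp(log_prob))      # joint probability: 0.5 * 0.2 * 0.1 = 0.01
```

Working in log space, as here, is standard practice because products of many small probabilities underflow.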
A high-capacity language model trained on massive corpora, exhibiting broad generalization and emergent behaviors.
Generates sequences one token at a time, conditioning on past tokens.
The text (and possibly other modalities) given to an LLM to condition its output behavior.
Converting audio speech into text, often using encoder-decoder or transducer architectures.
Mechanism that computes context-aware mixtures of representations; scales well and captures long-range dependencies.
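The "context-aware mixture" in this definition is concretely a softmax over scaled dot products. A minimal single-head sketch on toy 2-d vectors, with no learned projections (those are omitted for brevity):

```python
# Minimal scaled dot-product self-attention (single head, toy vectors).
import math

def attention(q, k, v):
    d = len(q[0])
    out = []
    for qi in q:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        weights = [e / sum(exps) for e in exps]          # softmax
        # output = attention-weighted mixture of value rows
        out.append([sum(w * vj[t] for w, vj in zip(weights, v))
                    for t in range(len(v[0]))])
    return out

x = [[1.0, 0.0], [0.0, 1.0]]
print(attention(x, x, x))   # each row is a weighted mix of all value rows
```

Because every position attends to every other in one step, long-range dependencies cost the same as short-range ones, at the price of quadratic work in sequence length.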
Architecture based on self-attention and feedforward layers; foundation of modern LLMs and many multimodal models.
Training objective where the model predicts the next token given previous tokens (causal modeling).
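This objective is typically optimized as cross-entropy on the true next token at each position. A sketch with invented per-position probabilities:

```python
# Sketch of the next-token objective: average negative log-probability
# the model assigns to each true "next" token. Probabilities are invented.
import math

# P(correct next token | prefix) at each of 3 positions
p_true = [0.9, 0.6, 0.3]
loss = -sum(math.log(p) for p in p_true) / len(p_true)
print(round(loss, 4))
```

Driving this loss down pushes each conditional probability toward 1, which is exactly "predict the next token given the previous ones."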
Human or automated process of assigning targets; quality, consistency, and guidelines matter heavily.
Expanding training data via transformations (flips, noise, paraphrases) to improve robustness.
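For text, one of the paraphrase-style transformations mentioned above can be sketched as random synonym substitution. The synonym table and swap probability are invented for illustration; real pipelines use paraphrasing models, noise injection, and back-translation.

```python
# Toy text augmentation: randomly swap words for listed synonyms.
import random

SYNONYMS = {"quick": "fast", "happy": "glad"}

def augment(sentence, rng):
    words = sentence.split()
    return " ".join(
        SYNONYMS[w] if w in SYNONYMS and rng.random() < 0.5 else w
        for w in words
    )

rng = random.Random(0)          # seeded for reproducibility
print(augment("the quick happy fox", rng))
```

Each call can yield a slightly different sentence with the same meaning, which is what makes the technique useful for robustness.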
Inputs crafted to cause model errors or unsafe behavior, often imperceptible in vision or subtle in text.
Attacks that manipulate model instructions (especially via retrieved content) to override system goals or exfiltrate data.
Models that process or generate multiple modalities, enabling vision-language tasks, speech, video understanding, etc.
Prevents attention to future tokens during training/inference.
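The mask described above is just a lower-triangular boolean matrix: position i may attend only to positions j ≤ i.

```python
# Causal (look-ahead) mask: True means attention is allowed.

def causal_mask(n):
    return [[j <= i for j in range(n)] for i in range(n)]

for row in causal_mask(4):
    print(row)
```

In practice the disallowed positions get their attention scores set to -inf before the softmax, so they receive zero weight.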
AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.
Stores past attention states to speed up autoregressive decoding.
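The speedup comes from computing each token's key and value once and reusing them at every later decoding step. A structural sketch (the class and toy key/value shapes are illustrative, not a real framework's cache):

```python
# Sketch of a key/value cache for autoregressive decoding.

class KVCache:
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, k, v):
        # append this step's key/value; attention then reads the full cache
        self.keys.append(k)
        self.values.append(v)
        return self.keys, self.values

cache = KVCache()
for t in range(3):
    ks, vs = cache.step(k=[t], v=[t * 2])
print(ks)   # keys for all 3 past tokens, each computed only once
```

Without the cache, every step would recompute keys and values for the entire prefix, making generation quadratic instead of linear per token.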
Models that learn to generate samples resembling training data.
Combining signals from multiple modalities.
Attention between different modalities.
Generates audio waveforms from spectrograms.
AI supporting legal research, drafting, and analysis.