Results for "probability over text"
Generating speech audio from text, with control over prosody, speaker identity, and style.
Generating human-like speech from text.
A model that assigns probabilities to sequences of tokens; often trained by next-token prediction.
Converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.
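A minimal sketch of subword tokenization via greedy longest-match; the toy vocabulary below is an assumption for illustration, not any real tokenizer's vocabulary.

```python
# Toy subword vocabulary (illustrative assumption, not a real tokenizer's).
VOCAB = {"un", "break", "able", "b", "r", "e", "a", "k", "u", "n", "l"}

def tokenize(word, vocab=VOCAB):
    """Split `word` into the longest matching vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest possible piece starting at position i.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # Unknown character: fall back to a single-character token.
            tokens.append(word[i])
            i += 1
    return tokens
```

Greedy longest-match is only one scheme; real subword tokenizers (BPE, unigram) learn their vocabularies from data to balance size against coverage.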
Joint vision-language model aligning images and text.
Assigns probabilities to the possible outcomes of a random variable.
A high-capacity language model trained on massive corpora, exhibiting broad generalization and emergent behaviors.
Samples from the k highest-probability tokens to limit unlikely outputs.
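A minimal sketch of the top-k truncation step, assuming the distribution is given as a token-to-probability dict (an illustrative interface):

```python
def top_k_filter(probs, k):
    """Keep only the k highest-probability tokens and renormalize,
    discarding the low-probability tail before sampling."""
    top = sorted(probs, key=probs.get, reverse=True)[:k]
    total = sum(probs[t] for t in top)
    return {t: probs[t] / total for t in top}
```

Sampling then proceeds from the filtered, renormalized distribution.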
Measures how one probability distribution diverges from another.
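For discrete distributions the divergence is a weighted sum of log-ratios; a minimal sketch, with the two distributions given as aligned probability lists:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); zero iff p == q,
    and asymmetric in its two arguments."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```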
The probability of the observed data, viewed as a function of the model parameters.
Models that define an energy landscape rather than explicit probabilities.
Generates sequences one token at a time, conditioning on past tokens.
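The generation loop can be sketched with a toy bigram table standing in for a learned model (the table and token names are made up for illustration):

```python
import random

# Toy "model": next-token distribution conditioned on the previous token.
BIGRAM = {
    "<s>": {"the": 0.7, "a": 0.3},
    "the": {"cat": 0.6, "dog": 0.4},
    "a":   {"cat": 0.5, "dog": 0.5},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def generate(model, rng, max_len=10):
    """Sample one token at a time, conditioning on the token just produced."""
    tokens = ["<s>"]
    while tokens[-1] != "</s>" and len(tokens) < max_len:
        dist = model[tokens[-1]]
        choices, weights = zip(*dist.items())
        tokens.append(rng.choices(choices, weights=weights)[0])
    return tokens
```

A real model conditions on the whole prefix rather than only the last token, but the sample-append-repeat loop is the same.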
The text (and possibly other modalities) given to an LLM to condition its output behavior.
Converting audio speech into text, often using encoder-decoder or transducer architectures.
Stochastic generation strategies that trade determinism for diversity; key knobs include temperature and nucleus sampling.
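Temperature is applied by dividing logits before the softmax; a minimal sketch:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities. Temperature < 1 sharpens the
    distribution (more deterministic); > 1 flattens it (more diverse)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```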
Models that learn to generate samples resembling training data.
Average value under a distribution.
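For a discrete distribution this is a probability-weighted sum; a one-line sketch, with the distribution as a value-to-probability dict:

```python
def expectation(dist):
    """E[X] = sum over outcomes of value * probability."""
    return sum(x * p for x, p in dist.items())
```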
Eliminating unwanted variables by summing or integrating them out of a joint distribution.
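In the discrete case this is just summing a joint table along one axis; a minimal sketch with the joint as a dict keyed by (x, y) pairs:

```python
def marginalize(joint, axis):
    """Sum a joint table P(x, y) over one variable.
    axis=0 sums out x (keeps y); axis=1 sums out y (keeps x)."""
    marginal = {}
    for (x, y), p in joint.items():
        key = y if axis == 0 else x
        marginal[key] = marginal.get(key, 0.0) + p
    return marginal
```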
Training objective where the model predicts the next token given previous tokens (causal modeling).
Expanding training data via transformations (flips, noise, paraphrases) to improve robustness.
Penalizes confident wrong predictions heavily; standard for classification and language modeling.
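For a single classification example the loss reduces to the negative log-probability assigned to the correct class; a minimal sketch:

```python
import math

def cross_entropy_loss(probs, target):
    """-log of the probability the model puts on the correct class.
    A confident wrong prediction (tiny probability on `target`)
    yields a very large loss."""
    return -math.log(probs[target])
```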
Updating beliefs about parameters using observed evidence and prior distributions.
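A classic concrete case is the conjugate Beta-Bernoulli update for a coin's bias; a minimal sketch:

```python
def update_beta(alpha, beta, heads, tails):
    """Beta(alpha, beta) prior plus observed coin flips gives a
    Beta(alpha + heads, beta + tails) posterior."""
    return alpha + heads, beta + tails

def beta_mean(alpha, beta):
    """Posterior mean estimate of the heads-probability."""
    return alpha / (alpha + beta)
```

Starting from a uniform Beta(1, 1) prior, each observation shifts the posterior toward the empirical frequency.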
Estimating parameters by maximizing likelihood of observed data.
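For Bernoulli observations the maximizer has a closed form, the sample mean; a minimal sketch that also checks it against the log-likelihood directly:

```python
import math

def bernoulli_mle(samples):
    """The MLE for a coin's heads-probability is the sample mean."""
    return sum(samples) / len(samples)

def log_likelihood(p, samples):
    """Log of prod p^x * (1 - p)^(1 - x) over 0/1 observations."""
    return sum(math.log(p if x else 1 - p) for x in samples)
```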
Graphical model expressing factorization of a probability distribution.
The relationship between inputs and outputs changes over time, requiring monitoring and model updates.
Caches the key and value projections of past tokens so they are not recomputed at each autoregressive decoding step.
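The mechanism can be sketched as an append-only store of per-step key/value vectors; the class and its interface here are illustrative, not any framework's actual API:

```python
class KVCache:
    """Minimal sketch of a key/value cache for autoregressive decoding:
    each new token appends its key and value projections instead of
    recomputing them for the entire prefix at every step."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        """Store the current step's key and value vectors."""
        self.keys.append(k)
        self.values.append(v)

    def get(self):
        """Attention at the current step reads all cached keys/values."""
        return self.keys, self.values
```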
Samples from the smallest set of tokens whose probabilities sum to p, adapting set size by context.
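A minimal sketch of the nucleus truncation step, using the same illustrative token-to-probability dict interface as above:

```python
def top_p_filter(probs, p):
    """Keep the smallest set of highest-probability tokens whose total
    mass reaches p, then renormalize. The kept set grows or shrinks
    with how peaked the distribution is."""
    kept, mass = [], 0.0
    for tok in sorted(probs, key=probs.get, reverse=True):
        kept.append(tok)
        mass += probs[tok]
        if mass >= p:
            break
    return {t: probs[t] / mass for t in kept}
```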
A measure of randomness or uncertainty in a probability distribution.
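For a discrete distribution, Shannon entropy in bits is a short sum; a minimal sketch:

```python
import math

def entropy(probs):
    """H = -sum p * log2(p); zero for a deterministic outcome,
    maximal for a uniform distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```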
Measures divergence between true and predicted probability distributions.
Measures how much information an observable random variable carries about unknown parameters.
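For a single Bernoulli(p) observation the quantity can be computed exactly as the expected squared score, which matches the known closed form 1 / (p(1 - p)); a minimal sketch:

```python
def bernoulli_fisher_information(p):
    """E[(d/dp log f(x; p))^2] over the two Bernoulli outcomes:
    score is 1/p when x = 1 and -1/(1-p) when x = 0."""
    score_heads = 1.0 / p
    score_tails = -1.0 / (1.0 - p)
    return p * score_heads ** 2 + (1.0 - p) * score_tails ** 2
```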