Results for "cumulative probability"
Stochastic generation strategies that trade determinism for diversity; key knobs include temperature and nucleus sampling.
Describes likelihoods of random variable outcomes.
A learning paradigm where an agent interacts with an environment and learns to choose actions to maximize cumulative reward.
Expected cumulative reward from a state or state-action pair.
Samples from the smallest set of tokens whose cumulative probability reaches at least p, so the set size adapts to context.
Samples from the k highest-probability tokens to limit unlikely outputs.
Probability of data given parameters.
Strategy mapping states to actions.
Formal framework for sequential decision-making under uncertainty.
Identifying abrupt changes in the data-generating process.
Expected return of taking a given action in a given state.
Measures how one probability distribution diverges from another.
Models that define an energy landscape rather than explicit probabilities.
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
Search algorithm for generation that keeps the k highest-scoring partial sequences at each step; can improve likelihood but reduce diversity.
Set of all actions available to the agent.
Finding control policies minimizing cumulative cost.
Incremental capability growth.
Penalizes confident wrong predictions heavily; standard for classification and language modeling.
Estimating parameters by maximizing likelihood of observed data.
Updating beliefs about parameters using observed evidence and prior distributions.
Graphical model expressing factorization of a probability distribution.
Average value under a distribution.
A model that assigns probabilities to sequences of tokens; often trained by next-token prediction.
A concept class is PAC-learnable if a learner can, with high probability, output an approximately correct hypothesis from finitely many samples.
A measure of randomness or uncertainty in a probability distribution.
Measures divergence between true and predicted probability distributions.
Measures how much information an observable random variable carries about unknown parameters.
Probability of treatment assignment given covariates.
Two-network setup in which a generator learns to fool a discriminator.
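
Example (illustrative only): the nucleus and top-k entries above both restrict sampling to a candidate set, and the difference is easiest to see in code. The sketch below is a minimal pure-Python version, assuming a next-token distribution given as a plain list of probabilities. The function names (nucleus_candidates, top_k_candidates, sample_from) and the toy distribution are hypothetical and do not come from any particular library.

```python
import random


def nucleus_candidates(probs, p):
    """Smallest set of token indices whose cumulative probability is at least p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Rank token indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:  # stop as soon as the threshold is reached
            break
    return kept


def top_k_candidates(probs, k):
    """Indices of the k highest-probability tokens (fixed set size)."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:k]


def sample_from(probs, candidates, rng=random):
    """Draw one index from the candidates, in proportion to their probabilities."""
    weights = [probs[i] for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Toy next-token distribution (assumed for illustration).
probs = [0.5, 0.3, 0.1, 0.06, 0.04]
nucleus = nucleus_candidates(probs, 0.75)  # set size depends on the distribution
top2 = top_k_candidates(probs, 2)          # set size is always k
token = sample_from(probs, nucleus)
```

With a flatter distribution the nucleus set grows while the top-k set stays at k tokens, which is the "adapting set size by context" behavior the nucleus entry describes.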