Results for "expected return"
Optimizing a policy directly via gradient ascent on the expected return.
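A minimal sketch of the idea in the entry above, on a hypothetical two-armed bandit (the bandit, payouts, and learning rate are illustrative assumptions, not from these entries): a softmax policy is improved by gradient ascent on expected reward using the REINFORCE gradient.

```python
import math
import random

# Hypothetical two-armed bandit: arm 1 pays ~1.0 on average, arm 0 pays ~0.0.
def pull(arm):
    return random.gauss(1.0 if arm == 1 else 0.0, 0.1)

def softmax(t):
    m = max(t)
    e = [math.exp(x - m) for x in t]
    z = sum(e)
    return [x / z for x in e]

theta = [0.0, 0.0]  # one policy parameter per arm
lr = 0.1            # learning rate (assumed)

random.seed(0)
for _ in range(2000):
    p = softmax(theta)
    arm = 0 if random.random() < p[0] else 1
    r = pull(arm)
    # REINFORCE: grad of log pi(arm) w.r.t. theta[a] is one_hot(arm)[a] - p[a]
    for a in range(2):
        grad = (1.0 if a == arm else 0.0) - p[a]
        theta[a] += lr * r * grad

print(softmax(theta))  # probability mass concentrates on the better arm
```

Because arm 1's rewards are consistently higher, the gradient updates steadily shift probability toward it; with a baseline subtracted from `r` the same update has lower variance.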
Returns in excess of a benchmark.
Expected cumulative reward from a state or state-action pair.
Expected return of taking an action in a given state.
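A compact way to pin down the two entries above, written in standard discounted-MDP notation (the symbols are the conventional ones, not taken from these entries):

```latex
% Return, state value, and action value; \gamma \in [0,1) is the discount
% factor and R_{t+k+1} the reward received k+1 steps after time t.
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\, G_t \mid S_t = s \,\right]
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[\, G_t \mid S_t = s,\, A_t = a \,\right]
```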
Strategy mapping states to actions.
A learning paradigm where an agent interacts with an environment and learns to choose actions to maximize cumulative reward.
Formal framework for sequential decision-making under uncertainty.
Balancing exploration of new actions against exploitation of known rewards.
Directly optimizing control policies.
Learning only from data generated by the current policy.
Sample mean converges to expected value.
Centralized AI expertise group.
Assigning AI costs to business units.
System returns to equilibrium after disturbance.
Stability proven via monotonic decrease of Lyapunov function.
Quantifying financial risk.
Error due to sensitivity to fluctuations in the training dataset.
Expected causal effect of a treatment.
Approximating expectations via random sampling.
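The entry above (and the law-of-large-numbers entry earlier in these results) can be shown in a few lines: averaging random samples approximates an expectation, here E[X²] for X uniform on (0, 1), whose exact value is 1/3. The target quantity and sample count are illustrative choices.

```python
import random

random.seed(42)

# Monte Carlo estimate of E[X^2] for X ~ Uniform(0, 1); exact value is 1/3.
# By the law of large numbers the sample mean converges as n grows.
n = 100_000
estimate = sum(random.random() ** 2 for _ in range(n)) / n
print(estimate)  # close to 1/3
```

The standard error shrinks as 1/sqrt(n), so quadrupling the sample count roughly halves the typical error.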
Maximum loss not expected to be exceeded at a given confidence level under normal market conditions.
Methods that learn training procedures or initializations so models can adapt quickly to new tasks with little data.
A scalar measure optimized during training, typically expected loss over data, sometimes with regularization terms.
How well a model performs on new data drawn from the same (or a similar) distribution as the training data.
Randomly zeroing activations during training to reduce co-adaptation and overfitting.
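A minimal sketch of the technique in the entry above, using the common "inverted dropout" convention (the function name and drop rate are illustrative, not from these entries): each unit is zeroed with probability `p_drop` during training, and survivors are rescaled so the expected activation is unchanged at inference time.

```python
import random

def dropout(activations, p_drop, training=True, rng=random):
    """Inverted dropout: zero each unit with probability p_drop and scale
    survivors by 1 / (1 - p_drop) so the expected value is preserved."""
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

random.seed(0)
out = dropout([1.0, 2.0, 3.0, 4.0], p_drop=0.5)
print(out)  # each entry is either 0.0 or the original value doubled
```

At evaluation time (`training=False`) the activations pass through untouched, which is why the rescaling is done during training rather than at inference.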
A high-priority instruction layer setting overarching behavior constraints for a chat model.
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
Feature attribution method grounded in cooperative game theory for explaining predictions in tabular settings.
Constraining model outputs into a schema used to call external APIs/tools safely and deterministically.
Separates planning from execution in agent architectures.
Systematic error introduced by simplifying assumptions in a learning algorithm.