Results for "recursive value"
Fundamental recursive relationship defining optimal value functions.
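For reference, the Bellman optimality equation for the state-value function, in standard notation for a discounted MDP with transitions P, rewards R, and discount factor γ:

```latex
V^*(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\left[ R(s, a, s') + \gamma V^*(s') \right]
```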
Expected cumulative reward from a state or state-action pair.
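In standard notation (assuming a discounted setting with policy π and discount factor γ), the state-value and action-value functions are:

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_0 = s \right],
\qquad
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_0 = s,\, a_0 = a \right]
```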
Training a model on its own generated outputs progressively degrades quality.
Stores previously computed attention keys and values so autoregressive decoding avoids recomputing them at every step.
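A minimal single-head sketch of the idea in plain NumPy (shapes and names here are illustrative, not any particular library's API): at each decoding step, the current token's key and value are appended to a cache instead of recomputing them for the whole prefix.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decode_step(x_t, W_q, W_k, W_v, cache):
    """One autoregressive step with a KV cache.

    x_t: (d,) embedding of the current token.
    cache: dict with growing 'K' and 'V' arrays of shape (t, d).
    """
    q = x_t @ W_q                                     # query for current token only
    cache["K"] = np.vstack([cache["K"], x_t @ W_k])   # append, don't recompute
    cache["V"] = np.vstack([cache["V"], x_t @ W_v])
    scores = cache["K"] @ q / np.sqrt(len(q))         # attend over all cached tokens
    return softmax(scores) @ cache["V"]

d = 8
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
cache = {"K": np.empty((0, d)), "V": np.empty((0, d))}
for token_emb in rng.standard_normal((5, d)):         # 5 decoding steps
    out = decode_step(token_emb, W_q, W_k, W_v, cache)
```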
Combines value estimation (critic) with policy learning (actor).
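A common instantiation (one of several; this sketch assumes a TD(0) critic): the critic's TD error both trains the value estimate and serves as the advantage signal for the actor.

```latex
\delta_t = r_t + \gamma V_w(s_{t+1}) - V_w(s_t), \qquad
w \leftarrow w + \beta\, \delta_t \nabla_w V_w(s_t), \qquad
\theta \leftarrow \theta + \alpha\, \delta_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)
```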
Largest loss not expected to be exceeded at a given confidence level over a set horizon; a quantile of the loss distribution rather than an expectation.
Expected return from taking a given action in a state and following the policy thereafter.
Sample mean converges to expected value.
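Formally, in its strong form for i.i.d. samples with finite mean μ:

```latex
\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i \xrightarrow{\text{a.s.}} \mathbb{E}[X] = \mu \quad \text{as } n \to \infty
```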
Optimizing policies directly via gradient ascent on expected reward.
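The REINFORCE form of the gradient (one standard estimator; G_t denotes the return from step t):

```latex
\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[ \sum_{t} G_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t) \right]
```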
Approximating expectations via random sampling.
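A minimal sketch: estimating E[f(X)] for f(x) = x² with X ~ N(0, 1) (true value 1) by averaging over random draws. The names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(f, sampler, n):
    """Estimate E[f(X)] as the sample mean of f over n random draws."""
    return f(sampler(n)).mean()

# E[X^2] for X ~ N(0, 1) is exactly 1; the estimate approaches it as n grows.
for n in (100, 10_000, 1_000_000):
    print(n, mc_estimate(lambda x: x**2, rng.standard_normal, n))
```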
Model optimizes a proxy objective that diverges from intended human values.
Directly searching over policy parameters rather than deriving a policy from a learned value function.
Inferring human preferences from behavior or feedback and aligning the system to them.
Average of squared residuals; common regression objective.
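In symbols, for targets y_i and predictions ŷ_i:

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```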
Iterative method that updates parameters in the direction of the negative gradient to minimize a loss.
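A minimal sketch on a least-squares objective (the step size and synthetic data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(100)

w = np.zeros(3)
lr = 0.1
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
    w -= lr * grad                          # step against the gradient
print(w)                                    # close to w_true
```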
Feature attribution method grounded in cooperative game theory for explaining predictions in tabular settings.
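A minimal sketch of exact Shapley values for a tiny model, computed directly from the cooperative-game definition. This is exponential in the number of features, which is why practical tools like SHAP rely on approximations; the toy linear model below is illustrative.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values: each feature's average marginal contribution,
    via the subset-weighted formula from cooperative game theory."""
    phi = [0.0] * n_features
    for i in range(n_features):
        rest = [j for j in range(n_features) if j != i]
        for k in range(len(rest) + 1):
            for S in combinations(rest, k):
                weight = factorial(k) * factorial(n_features - k - 1) / factorial(n_features)
                phi[i] += weight * (value_fn(set(S) | {i}) - value_fn(set(S)))
    return phi

# Toy "model": v(S) = prediction when only the features in S are present.
x = [1.0, 2.0, 3.0]
coef = [0.5, -1.0, 2.0]
def v(S):
    return sum(coef[j] * x[j] for j in S)   # linear model, absent features zeroed

print(shapley_values(v, 3))   # for a linear model: [0.5, -2.0, 6.0]
```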
Limiting gradient magnitude to prevent exploding gradients.
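A minimal sketch of clipping by global norm (the threshold is illustrative; deep-learning frameworks ship equivalents, e.g. PyTorch's clip_grad_norm_):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm
    does not exceed max_norm; leave them unchanged otherwise."""
    total = np.sqrt(sum((g ** 2).sum() for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads]

grads = [np.array([3.0, 4.0]), np.array([12.0])]    # global norm = 13
clipped = clip_by_global_norm(grads, max_norm=5.0)  # rescaled to norm 5
```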
Models that define an energy landscape over configurations rather than explicitly normalized probabilities.
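The defining relation (the Boltzmann form; Z is the usually intractable normalizer):

```latex
p_\theta(x) = \frac{\exp\!\left(-E_\theta(x)\right)}{Z(\theta)}, \qquad
Z(\theta) = \int \exp\!\left(-E_\theta(x)\right) dx
```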
Attention in which queries come from one sequence and keys and values from another, e.g. across modalities or from decoder to encoder.
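In scaled dot-product form: queries are projected from one sequence X, while keys and values are projected from another sequence Y, such as a different modality or an encoder's output:

```latex
\mathrm{CrossAttn}(X, Y) = \mathrm{softmax}\!\left( \frac{(X W_Q)(Y W_K)^{\top}}{\sqrt{d_k}} \right) Y W_V
```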
Factorizes a matrix into orthogonal matrices and a diagonal matrix of singular values; used in embeddings and compression.
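A minimal sketch of rank-k compression with NumPy (the rank and matrix here are illustrative); by the Eckart-Young theorem the truncated SVD is the best rank-k approximation in Frobenius norm:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 30))

U, s, Vt = np.linalg.svd(M, full_matrices=False)   # M = U @ diag(s) @ Vt
k = 5
M_k = U[:, :k] * s[:k] @ Vt[:k]                    # best rank-k approximation
err = np.linalg.norm(M - M_k) / np.linalg.norm(M)  # relative Frobenius error
```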
Average value under a distribution.
Multiple agents coordinate to optimize a shared collective objective rather than purely individual rewards.
A learning paradigm where an agent interacts with an environment and learns to choose actions to maximize cumulative reward.
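The canonical interaction loop, sketched against a hypothetical environment interface (the reset/step names follow the common Gym-style convention but are assumptions here, as is the random policy):

```python
import random

def run_episode(env, policy, max_steps=1000):
    """Roll out one episode: observe state, act, receive reward, repeat."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)  # hypothetical (state, reward, done) interface
        total_reward += reward                  # the cumulative reward to maximize
        if done:
            break
    return total_reward

# e.g. a uniformly random policy over a discrete action set:
random_policy = lambda state: random.choice([0, 1])
```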
Of predicted positives, the fraction that are truly positive; sensitive to false positives.
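In symbols, with TP, FP, FN the counts of true positives, false positives, and false negatives (recall shown for contrast):

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}
```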
Scalar summary of the ROC curve; measures ranking ability, not calibration.
A wide, flat basin in the loss landscape, often correlated with better generalization.
Systematic error introduced by simplifying assumptions in a learning algorithm.
Gradually increasing learning rate at training start to avoid divergence.
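A minimal sketch of a linear warmup ramp (the base rate and step count are illustrative; real schedules usually compose this with a decay phase):

```python
def lr_at_step(step, base_lr=3e-4, warmup_steps=1000):
    """Scale the learning rate linearly from ~0 to base_lr during warmup."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

# steps 0..999 ramp up; from step 1000 on, the full base rate applies.
```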
Estimating parameters by maximizing likelihood of observed data.
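In symbols, for i.i.d. observations x_1, ..., x_n (the log turns the product of likelihoods into a sum; for a Gaussian mean with known variance, this recovers the sample average):

```latex
\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \sum_{i=1}^{n} \log p(x_i \mid \theta)
```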
Strategy mapping states to actions.