Results for "state-action value"
Value function: Expected cumulative reward from a state or state-action pair.
Q-function (state-action value): Expected return of taking an action in a given state.
Action space: Set of all actions available to the agent.
Bellman optimality equation: Fundamental recursive relationship defining optimal value functions (equations and a value-iteration sketch below).
Agent loop: Continuous cycle of observation, reasoning, action, and feedback.
Markov decision process (MDP): Formal framework for sequential decision-making under uncertainty.
Transition model: Predicts the next state given the current state and action.
State-space model: Models time evolution via hidden states.
Policy: Strategy mapping states to actions.
State estimation: Inferring the agent's underlying state from noisy sensor data.
Policy gradient methods: Optimizing policies directly via gradient ascent on expected reward (REINFORCE sketch below).
Closed-loop (feedback) control: Continuous loop adjusting actions based on state feedback.
Actor-critic: Combines value estimation (critic) with policy learning (actor).
State space: All possible configurations an agent may encounter.
Direct policy search: Directly optimizing control policies.
Reactive agent: Simple agent responding directly to inputs.
Behavior cloning: Learning an action mapping directly from demonstrations.
On-policy learning: Learning only from the current policy's data.
KV cache: Stores past attention keys and values to speed up autoregressive decoding (sketch below).
Value at Risk (VaR): Maximum expected loss at a given confidence level under normal market conditions.
Reinforcement learning: A learning paradigm where an agent interacts with an environment and learns to choose actions that maximize cumulative reward.
Kalman filter: Optimal estimator for linear dynamic systems with Gaussian noise (sketch below).
ReAct: Interleaving reasoning and tool use.
Law of large numbers: The sample mean converges to the expected value.
Monte Carlo estimation: Approximating expectations via random sampling (sketch below).
Misalignment: The model optimizes objectives misaligned with human values.
Preference learning: Inferring and aligning with human preferences.
Particle filter: Monte Carlo method for state estimation (sketch below).
Model-based RL: RL using learned or known environment models.
Sparse reward: Reward given only upon task completion.
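
The value-function entries above are tied together by standard formulas. In the usual MDP notation (states s, actions a, reward r, discount factor gamma), the state value, the state-action value, and the Bellman optimality equation read:

```latex
\begin{align*}
V^{\pi}(s)   &= \mathbb{E}_{\pi}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \;\Big|\; s_0 = s\Big] \\
Q^{\pi}(s,a) &= \mathbb{E}_{\pi}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \;\Big|\; s_0 = s,\ a_0 = a\Big] \\
Q^{*}(s,a)   &= \mathbb{E}\big[\, r_1 + \gamma \max_{a'} Q^{*}(s_1, a') \;\big|\; s_0 = s,\ a_0 = a \,\big]
\end{align*}
```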
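A minimal sketch of how the Bellman optimality equation is solved in practice: value iteration on a toy two-state, two-action MDP. The transition probabilities, rewards, and discount factor are made-up illustrative numbers, not from the source.

```python
import numpy as np

# Toy MDP (illustrative numbers): P[s, a, s'] = transition probability,
# R[s, a] = expected immediate reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.95

V = np.zeros(2)
while True:
    # Bellman backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s,a,s') V(s')
    Q = R + gamma * (P @ V)
    V_new = Q.max(axis=1)          # V(s) = max_a Q(s,a)
    if np.abs(V_new - V).max() < 1e-9:
        break
    V = V_new

print("V* ~", V, "greedy policy:", Q.argmax(axis=1))
```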
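For the policy-gradient entry, a minimal REINFORCE sketch on a two-armed bandit. The arm payoffs, noise level, and learning rate are assumptions chosen for illustration; the point is the update theta += alpha * r * grad log pi(a).

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])   # assumed arm payoffs (illustrative)
theta = np.zeros(2)                 # parameters of a softmax policy
alpha = 0.1                         # assumed learning rate

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)                 # sample an action from pi
    r = rng.normal(true_means[a], 0.1)         # observe a noisy reward
    grad_log_pi = -probs                       # d/dtheta log softmax(theta)[a]
    grad_log_pi[a] += 1.0                      #   = one_hot(a) - probs
    theta += alpha * r * grad_log_pi           # gradient ascent on E[r]

print("learned action probabilities:", softmax(theta))
```

An actor-critic method has the same shape, but replaces the raw reward r with a critic's value estimate to reduce the variance of the update.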
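The Monte Carlo and law-of-large-numbers entries fit in a few lines: approximate an expectation by a sample mean and watch it converge as the sample grows. Estimating pi from uniform points on the unit square is a standard illustrative choice, not something from the source.

```python
import numpy as np

rng = np.random.default_rng(1)

# P(x^2 + y^2 <= 1) = pi/4 for (x, y) uniform on the unit square.
for n in (100, 10_000, 1_000_000):
    pts = rng.random((n, 2))
    inside = (pts ** 2).sum(axis=1) <= 1.0
    # Law of large numbers: the sample mean approaches the expectation pi/4.
    print(f"n={n:>9,}  pi ~ {4 * inside.mean():.4f}")
```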
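For the Kalman filter entry, a scalar example tracking a constant signal from noisy measurements; the process and measurement noise variances are assumed values. Each step predicts, then corrects by the Kalman gain.

```python
import numpy as np

rng = np.random.default_rng(2)
true_x = 1.0
Q, R = 1e-5, 0.1 ** 2        # assumed process / measurement noise variances

x_hat, P = 0.0, 1.0          # initial estimate and its variance
for _ in range(50):
    z = true_x + rng.normal(0.0, 0.1)   # noisy measurement
    P = P + Q                           # predict: uncertainty grows
    K = P / (P + R)                     # Kalman gain
    x_hat = x_hat + K * (z - x_hat)     # correct toward the measurement
    P = (1.0 - K) * P                   # updated uncertainty

print(f"estimate {x_hat:.3f}, variance {P:.5f}")
```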
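The particle-filter entry is the nonlinear, non-Gaussian counterpart: a bootstrap filter on the same scalar setup (noise levels again assumed) that propagates particles, weights them by measurement likelihood, and resamples.

```python
import numpy as np

rng = np.random.default_rng(3)
N, true_x = 1000, 1.0
particles = rng.normal(0.0, 1.0, size=N)        # initial belief

for _ in range(50):
    z = true_x + rng.normal(0.0, 0.1)           # noisy measurement
    particles += rng.normal(0.0, 0.01, size=N)  # propagate (random-walk dynamics)
    w = np.exp(-0.5 * ((z - particles) / 0.1) ** 2)    # measurement likelihood
    w /= w.sum()
    particles = particles[rng.choice(N, size=N, p=w)]  # resample by weight

print("estimate:", particles.mean())
```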
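The KV-cache entry (matched via "state") describes a decoding optimization: at each autoregressive step, keys and values for past tokens are reused rather than recomputed, so only the newest token is projected. A toy single-head sketch; the dimensions and random projection matrices are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

K_cache, V_cache = [], []            # one cached row per decoded token

def decode_step(x):
    """Attention for the newest token x, reusing cached keys/values."""
    q = x @ Wq
    K_cache.append(x @ Wk)           # project only the new token
    V_cache.append(x @ Wv)
    K, V = np.stack(K_cache), np.stack(V_cache)
    w = np.exp(q @ K.T / np.sqrt(d)) # attention scores over all past tokens
    return (w / w.sum()) @ V         # softmax-weighted mix of cached values

out = None
for _ in range(5):                   # decode 5 tokens
    out = decode_step(rng.normal(size=d))
print("output shape:", out.shape, "cached tokens:", len(K_cache))
```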