Results for "state-action value"
Expected cumulative reward from a state or state-action pair.
Predicts next state given current state and action.
Expected return of taking an action in a given state.
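One common way to estimate this quantity is a tabular temporal-difference (Q-learning) update. The sketch below is illustrative only: the states "s0"/"s1", the action names, and the hyperparameters alpha and gamma are assumptions, not part of the entry above.

```python
from collections import defaultdict

# Tabular state-action values: Q[(state, action)] -> running estimate of expected return.
Q = defaultdict(float)

def td_update(state, action, reward, next_state, next_actions,
              alpha=0.1, gamma=0.99):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])

# Illustrative transition: in state "s0", taking "right" yields reward 1.0
# and leads to state "s1", where "left" and "right" are available.
td_update("s0", "right", 1.0, "s1", ["left", "right"])
print(Q[("s0", "right")])  # 0.1 after one update (alpha times the TD error)
```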
Inferring the agent’s internal state from noisy sensor data.
Continuous cycle of observation, reasoning, action, and feedback.
Optimizing continuous action sequences.
Learning action mapping directly from demonstrations.
Fundamental recursive relationship defining optimal value functions.
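Written out for the optimal state-action value, the recursion is the Bellman optimality equation, with discount factor \(\gamma\) and transition distribution \(P\):

```latex
Q^{*}(s, a) \;=\; \mathbb{E}_{s' \sim P(\cdot \mid s, a)}\!\left[\, r(s, a) + \gamma \max_{a'} Q^{*}(s', a') \,\right]
```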
Combines value estimation (critic) with policy learning (actor).
Average value under a distribution.
Sample mean converges to the expected value as the number of samples grows.
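Written out, the two statements above amount to the definition of expectation and the law-of-large-numbers convergence of the sample mean:

```latex
\mathbb{E}[X] = \sum_{x} x\, p(x) \ \text{(discrete)}, \qquad
\mathbb{E}[X] = \int x\, p(x)\, dx \ \text{(continuous)}, \qquad
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{\ n \to \infty\ } \mathbb{E}[X].
```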
Central system to store model versions, metadata, approvals, and deployment state.
A broader capability to infer internal system state from telemetry, crucial for AI services and agents.
A system that perceives state, selects actions, and pursues goals—often combining LLM reasoning with tools and memory.
Monte Carlo method for state estimation.
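A minimal sketch of one such method, a bootstrap particle filter for a one-dimensional random-walk state with Gaussian observation noise; the dynamics, noise scales, and the helper name particle_filter_step are illustrative assumptions.

```python
import numpy as np

def particle_filter_step(particles, weights, observation,
                         transition_std=0.5, obs_std=1.0, rng=None):
    """One bootstrap particle-filter step: propagate, reweight, resample."""
    rng = rng or np.random.default_rng()
    # Propagate particles through the assumed random-walk dynamics.
    particles = particles + rng.normal(0.0, transition_std, size=particles.shape)
    # Reweight by the Gaussian observation likelihood p(y_t | x_t).
    weights = weights * np.exp(-0.5 * ((observation - particles) / obs_std) ** 2)
    weights /= weights.sum()
    # Resample to avoid weight degeneracy.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Illustrative usage: track a hidden scalar state from a few noisy observations.
rng = np.random.default_rng(0)
particles = rng.normal(0.0, 1.0, size=1000)
weights = np.full(1000, 1.0 / 1000)
for y in [0.2, 0.5, 0.9]:
    particles, weights = particle_filter_step(particles, weights, y, rng=rng)
print(particles.mean())  # posterior mean estimate of the hidden state
```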
Agents communicate via shared state.
Continuous loop adjusting actions based on state feedback.
Set of all actions available to the agent.
Stores past attention states to speed up autoregressive decoding.
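A minimal NumPy sketch of the idea for a single attention head; the KVCache class and its append/attend methods are hypothetical names, not any particular library's API.

```python
import numpy as np

class KVCache:
    """Minimal key/value cache for one attention head.

    During autoregressive decoding, each new token's key and value are
    appended, so attention at step t reuses the cached entries for earlier
    steps instead of recomputing them.
    """
    def __init__(self, d_head):
        self.keys = np.empty((0, d_head))
        self.values = np.empty((0, d_head))

    def append(self, k, v):
        self.keys = np.vstack([self.keys, k[None, :]])
        self.values = np.vstack([self.values, v[None, :]])

    def attend(self, q):
        # Scaled dot-product attention of the current query over all cached steps.
        scores = self.keys @ q / np.sqrt(self.keys.shape[1])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ self.values

# Illustrative decoding loop: one new token per step, cache grows by one row.
rng = np.random.default_rng(0)
cache = KVCache(d_head=8)
for _ in range(4):
    q, k, v = rng.normal(size=(3, 8))
    cache.append(k, v)
    context = cache.attend(q)
print(context.shape)  # (8,)
```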
Decomposes a matrix into orthogonal components; used in embeddings and compression.
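A short NumPy illustration of the compression use: truncate the decomposition to the top-k singular directions and measure the reconstruction error. The matrix and the rank k here are arbitrary choices for the example.

```python
import numpy as np

# Low-rank compression of a matrix via truncated singular value decomposition.
A = np.random.default_rng(0).normal(size=(6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U @ diag(s) @ Vt

k = 2  # keep only the top-k singular directions
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.linalg.norm(A - A_k))  # reconstruction error of the rank-2 approximation
```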
Model optimizes objectives that are misaligned with human values.
Maximum expected loss under normal conditions.
Inferring and aligning with human preferences.
All possible configurations an agent may encounter.
Models time evolution via hidden states.