Results for "action possibilities"
Action space: the set of all actions available to the agent.
Agent loop: the continuous cycle of observation, reasoning, action, and feedback through which an agent operates.
Value function: the expected cumulative reward from a state or state-action pair under a given policy.
Q-function (action-value function): the expected return of taking a given action in a given state and following the policy thereafter.
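Written out in standard notation (the discount factor γ and reward subscripting below are conventional assumptions, not taken from these entries):

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s\right]
\qquad
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s,\ A_t = a\right]
```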
Transition model (world model): predicts the next state given the current state and action.
Markov decision process (MDP): the formal framework for sequential decision-making under uncertainty.
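In the usual tuple notation (conventional symbols, assumed here rather than taken from the entry):

```latex
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma),
\quad P(s' \mid s, a)\ \text{transition probabilities},
\quad R(s, a)\ \text{reward function},
\quad \gamma \in [0, 1)\ \text{discount factor}
```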
Policy: a strategy mapping states to actions.
Bellman equation: the fundamental recursive relationship defining optimal value functions.
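For the optimal state-value and action-value functions, the recursion takes its standard form (assuming the MDP notation above):

```latex
V^{*}(s) = \max_{a}\left[R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{*}(s')\right]
\qquad
Q^{*}(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, \max_{a'} Q^{*}(s', a')
```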
Policy gradient methods: optimize the policy directly via gradient ascent on expected reward.
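The simplest instance is the REINFORCE estimator, sketched here in standard notation (with G_t the return from step t, an assumption carried over from the value-function entries):

```latex
\nabla_{\theta} J(\theta) = \mathbb{E}_{\tau \sim \pi_{\theta}}\left[\sum_{t} \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)\, G_t\right]
```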
Exploration-exploitation trade-off: balancing trying new behaviors to learn more against exploiting actions already known to yield reward.
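One common way to strike this balance is epsilon-greedy action selection; a minimal sketch (function and parameter names are illustrative, not from the entry):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon, try a random action (explore);
    otherwise take the action with the highest estimated value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```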
ReAct: an agent pattern that interleaves reasoning steps with tool use.
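A minimal sketch of such a loop, assuming a hypothetical `llm` callable that returns either a tool call or a final answer, and a `tools` dict of callables (none of these names come from the entry):

```python
def react_loop(task, llm, tools, max_steps=10):
    """Alternate reasoning ("Thought") with tool use ("Action"),
    feeding each tool result ("Observation") back into the context."""
    context = f"Task: {task}"
    for _ in range(max_steps):
        step = llm(context)  # hypothetical: returns a dict with either
                             # an "answer" key or thought/action/args keys
        if "answer" in step:
            return step["answer"]
        observation = tools[step["action"]](**step["args"])
        context += (f"\nThought: {step['thought']}"
                    f"\nAction: {step['action']}{step['args']}"
                    f"\nObservation: {observation}")
    return None
```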
Reactive (reflex) agent: a simple agent that responds directly to inputs, without deliberation or internal state.
Closed-loop control: a continuous feedback loop that adjusts actions based on the observed state.
Imitation learning: learning policies from expert demonstrations.
Behavioral cloning: learning the state-to-action mapping directly from demonstrations, as supervised learning.
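Reduced to code, behavioral cloning is supervised learning on (state, action) pairs; here a nearest-neighbor lookup stands in for whatever function approximator would actually be trained (an illustrative choice, not part of the entry):

```python
def clone_policy(demos):
    """Fit a policy to expert (state, action) pairs; states are
    assumed to be numeric vectors."""
    states, actions = zip(*demos)

    def policy(state):
        # act as the expert did in the most similar recorded state
        nearest = min(range(len(states)),
                      key=lambda i: sum((a - b) ** 2
                                        for a, b in zip(states[i], state)))
        return actions[nearest]

    return policy
```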
On-policy learning: learning only from data generated by the current policy.
Active inference: acting to minimize surprise, as measured by (variational) free energy.
Reinforcement learning: a learning paradigm in which an agent interacts with an environment and learns to choose actions that maximize cumulative reward.
System prompt: a high-priority instruction layer setting overarching behavioral constraints for a chat model.
Actor-critic methods: combine value estimation (the critic) with policy learning (the actor).
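The canonical update pair, in standard notation (value weights w, policy parameters θ, learning rates α; all conventional assumptions):

```latex
\delta_t = r_{t+1} + \gamma V_w(s_{t+1}) - V_w(s_t) \quad \text{(TD error, computed by the critic)}
```
```latex
w \leftarrow w + \alpha_w\, \delta_t\, \nabla_w V_w(s_t),
\qquad
\theta \leftarrow \theta + \alpha_\theta\, \delta_t\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)
```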
Planning: methods for breaking goals into steps; can be classical (A*, STRIPS) or LLM-driven with tool calls.
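The classical end of this spectrum is easy to make concrete; a compact A* sketch, where the `neighbors` and `heuristic` callables are caller-supplied assumptions:

```python
import heapq
import itertools

def a_star(start, goal, neighbors, heuristic):
    """Expand the node with the lowest f = g (cost so far) + h
    (heuristic estimate to the goal); neighbors(n) yields
    (next_node, step_cost) pairs."""
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    frontier = [(heuristic(start, goal), next(tie), 0, start, [start])]
    visited = set()
    while frontier:
        _, _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, cost in neighbors(node):
            if nxt not in visited:
                heapq.heappush(frontier, (g + cost + heuristic(nxt, goal),
                                          next(tie), g + cost, nxt, path + [nxt]))
    return None
```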
Plan-and-execute: an agent architecture that separates planning from execution.
Counterfactual: what would have happened under different conditions or actions.
Autonomous agent: a system that independently pursues goals over time.
Embodied AI: AI systems that perceive and act in the physical world through sensors and actuators.
Model-based RL: reinforcement learning that uses a learned or known model of the environment.
Direct policy search: optimizing control policies directly, without first estimating value functions.
Trajectory optimization: optimizing a continuous sequence of actions (a trajectory) against an objective.
Sparse reward: a reward signal given only upon task completion, with no intermediate feedback.
Imagined rollouts: future trajectories simulated with a learned model rather than collected from the real environment.
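A sketch of how such trajectories are produced, assuming hypothetical `model` and `policy` callables (a learned one-step model returning (next_state, reward), and an action-selecting policy; both are assumptions for illustration):

```python
def imagine_rollout(model, policy, state, horizon=15):
    """Unroll a learned model instead of the real environment and
    return the imagined trajectory for planning or training."""
    trajectory = []
    for _ in range(horizon):
        action = policy(state)
        next_state, reward = model(state, action)
        trajectory.append((state, action, reward, next_state))
        state = next_state
    return trajectory
```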