Set of all actions available to the agent.
Continuous cycle of observation, reasoning, action, and feedback.
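The observe, reason, act, feedback cycle above can be sketched as a short loop. This is a minimal Python sketch assuming a hypothetical `LineWorld` environment and a hand-written `decide` rule. Neither comes from any library. A real agent would replace `decide` with a learned policy or a model call.

```python
class LineWorld:
    """Hypothetical toy environment (assumption): an agent on an integer line must reach `goal`."""

    def __init__(self, start: int, goal: int) -> None:
        self.position = start
        self.goal = goal

    def observe(self) -> int:
        return self.position

    def step(self, action: int) -> float:
        """Apply a move of -1 or +1 and return a reward as feedback."""
        self.position += action
        return 1.0 if self.position == self.goal else -0.1


def decide(observation: int, goal: int) -> int:
    """Reasoning step (hand-written rule, assumption): move toward the goal."""
    return 1 if observation < goal else -1


def run_agent(env: LineWorld, max_steps: int = 20) -> list[float]:
    rewards: list[float] = []
    for _ in range(max_steps):
        obs = env.observe()              # 1. observation
        if obs == env.goal:
            break
        action = decide(obs, env.goal)   # 2. reasoning
        reward = env.step(action)        # 3. action, 4. feedback
        rewards.append(reward)           # a learning agent would update itself from this
    return rewards
```

Here the feedback is only recorded. In a learning agent, the same loop would also update the decision rule from each reward.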
Interleaving reasoning steps with tool calls, so that each tool result informs the next reasoning step.
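A minimal Python sketch of the interleaving described above is shown here. It assumes a hypothetical `scripted_reasoner` that stands in for a language model and returns a (thought, action, argument) triple, plus a hypothetical `add` tool. Each tool result is appended to the history that the next reasoning step sees.

```python
from typing import Callable

# Hypothetical tool registry (assumption); a real agent would call external APIs here.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split(","))),
}


def scripted_reasoner(question: str, history: list[tuple[str, str, str]]) -> tuple[str, str, str]:
    """Stand-in for a model call (assumption): returns (thought, action, argument).
    `action` is either a tool name or "finish"."""
    if not history:
        return ("I need to add the numbers.", "add", "2,3,4")
    _, _, observation = history[-1]
    return (f"The tool returned {observation}.", "finish", observation)


def react(question: str, max_turns: int = 5) -> str:
    history: list[tuple[str, str, str]] = []  # (thought, action, observation)
    for _ in range(max_turns):
        thought, action, arg = scripted_reasoner(question, history)  # reason
        if action == "finish":
            return arg
        observation = TOOLS[action](arg)                  # act with a tool
        history.append((thought, action, observation))    # feed the result back
    raise RuntimeError("no answer within the turn limit")
```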
Stepwise reasoning patterns that can improve multi-step tasks; often handled implicitly or summarized for safety/privacy.
Temporary reasoning space (often hidden).
Expected cumulative (typically discounted) reward obtainable from a state or state-action pair under a given policy.
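The expected cumulative reward from the entry above can be estimated by averaging the returns of sampled episodes that start in the same state, which is a Monte Carlo estimate. This Python sketch assumes a discount factor `gamma`. Setting `gamma = 1.0` gives the plain, undiscounted cumulative sum.

```python
def discounted_return(rewards: list[float], gamma: float) -> float:
    """G = r_0 + gamma * r_1 + gamma^2 * r_2 + ..., computed backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def mc_value_estimate(episodes: list[list[float]], gamma: float) -> float:
    """Average discounted return over episodes that all start in the same state."""
    return sum(discounted_return(ep, gamma) for ep in episodes) / len(episodes)
```

For example, with `gamma = 0.5` the reward sequence `[1, 1, 1]` has return `1 + 0.5 * (1 + 0.5 * 1) = 1.75`.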
Expected return of taking a given action in a given state and following the policy thereafter.
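When the transition probabilities are known, the action value above can be computed from state values with a one-step lookahead: Q(s, a) = sum over s' of P(s' | s, a) * (r + gamma * V(s')). This Python sketch assumes a hypothetical tabular representation in which `P[state][action]` is a list of `(probability, next_state, reward)` outcomes.

```python
# P[state][action] -> list of (probability, next_state, reward)  (representation is an assumption)
Transitions = dict[str, dict[str, list[tuple[float, str, float]]]]


def q_from_v(P: Transitions, V: dict[str, float], state: str, action: str, gamma: float) -> float:
    """One-step lookahead: expected immediate reward plus discounted value of the next state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[state][action])


P: Transitions = {
    "s": {"go": [(0.8, "goal", 1.0), (0.2, "s", 0.0)], "stay": [(1.0, "s", 0.0)]},
    "goal": {},
}
V = {"s": 0.5, "goal": 0.0}
```

With `gamma = 0.9`, this gives Q(s, go) = 0.8 * 1.0 + 0.2 * 0.9 * 0.5 = 0.89 and Q(s, stay) = 0.9 * 0.5 = 0.45.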
Simple agent that maps its current input directly to an action, without memory or planning.
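A reflex agent like the one above is just a set of condition-action rules applied to the current percept. This Python sketch assumes a toy two-square vacuum world with locations `"A"` and `"B"`. The agent keeps no state between calls.

```python
def reflex_vacuum_agent(percept: tuple[str, str]) -> str:
    """Condition-action rules on the current percept only (toy world is an assumption)."""
    location, status = percept
    if status == "dirty":
        return "suck"
    return "right" if location == "A" else "left"
```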
Model that predicts the next state given the current state and action.
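One simple way to obtain the model above for small, discrete problems is to count observed transitions and normalize them into probabilities. This Python sketch assumes hashable states and actions and a hypothetical list of `(state, action, next_state)` tuples. Larger problems usually use a learned function approximator instead.

```python
from collections import Counter, defaultdict


def fit_transition_model(transitions):
    """Estimate P(next_state | state, action) from observed (s, a, s') tuples by counting."""
    counts = defaultdict(Counter)
    for s, a, s2 in transitions:
        counts[(s, a)][s2] += 1
    model = {}
    for key, c in counts.items():
        total = sum(c.values())
        model[key] = {s2: n / total for s2, n in c.items()}
    return model
```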
Formal framework for sequential decision-making under uncertainty.
The field of building systems that perform tasks associated with human intelligence—perception, reasoning, language, planning, and decision-making—via algori...
Strategy mapping states to actions.
Fundamental recursive relationship defining optimal value functions.
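The recursive relationship above, in its optimality form V*(s) = max over a of sum over s' of P(s' | s, a) * (r + gamma * V*(s')), can be solved by value iteration. Value iteration applies this update repeatedly until the values stop changing. This Python sketch assumes the same hypothetical tabular layout as before (`P[state][action]` is a list of `(probability, next_state, reward)`). States with no actions are treated as terminal with value 0.

```python
Transitions = dict[str, dict[str, list[tuple[float, str, float]]]]


def value_iteration(P: Transitions, gamma: float, tol: float = 1e-10) -> dict[str, float]:
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s, actions in P.items():
            if not actions:  # terminal state (assumption): value stays 0
                continue
            # Bellman optimality backup: best expected one-step return
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


chain: Transitions = {
    "s0": {"right": [(1.0, "s1", 0.0)]},
    "s1": {"right": [(1.0, "goal", 1.0)]},
    "goal": {},
}
```

On this three-state chain with `gamma = 0.9`, the values converge to V(s1) = 1.0 and V(s0) = 0.9.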
Trade-off between exploring untried actions to gather information and exploiting actions already known to yield high reward.
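Epsilon-greedy is one common way to strike the balance above. With probability `epsilon` the agent picks a random action (explore), and otherwise it picks the action with the highest current estimate (exploit). This Python sketch assumes a hypothetical multi-armed bandit with Gaussian rewards of unit variance.

```python
import random


def epsilon_greedy(q_values: list[float], epsilon: float, rng: random.Random) -> int:
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit


def run_bandit(true_means: list[float], steps: int, epsilon: float, seed: int = 0):
    """Toy bandit (assumption): reward ~ Normal(true_means[a], 1)."""
    rng = random.Random(seed)
    q = [0.0] * len(true_means)   # estimated value per arm
    n = [0] * len(true_means)     # pull counts per arm
    for _ in range(steps):
        a = epsilon_greedy(q, epsilon, rng)
        reward = rng.gauss(true_means[a], 1.0)
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]  # incremental sample mean
    return q, n
```

Setting `epsilon = 0` gives pure exploitation, which can lock onto whichever arm looked good first. A small positive `epsilon` keeps sampling the alternatives.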
Optimizing a parameterized policy directly via gradient ascent on expected return.
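REINFORCE is one specific policy-gradient method. It nudges the parameters along reward * grad log pi(a). The Python sketch here applies it to a softmax policy over a hypothetical bandit with fixed per-arm rewards and uses no baseline. For a softmax policy, the gradient of log pi(a) with respect to preference k is 1[k == a] - pi(k).

```python
import math
import random


def softmax(prefs: list[float]) -> list[float]:
    m = max(prefs)  # subtract max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_rewards: list[float], steps: int = 2000, lr: float = 0.1, seed: int = 0) -> list[float]:
    """Toy setting (assumption): deterministic reward per arm, no baseline."""
    rng = random.Random(seed)
    theta = [0.0] * len(arm_rewards)  # policy parameters (action preferences)
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]  # sample an action from the policy
        r = arm_rewards[a]
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += lr * r * grad_log_pi  # gradient ascent step
    return softmax(theta)
```

With `arm_rewards = [0.2, 1.0]`, the returned policy concentrates most of its probability on the second arm. Subtracting a baseline from `r` is a common variance-reduction refinement.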
Continuous loop adjusting actions based on state feedback.
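A proportional controller is about the smallest example of the feedback loop above. At each step it measures the error between the setpoint and the current state and applies an action proportional to that error. This Python sketch assumes a hypothetical plant in which the action is simply added to the state.

```python
def proportional_control(setpoint: float, initial: float, gain: float, steps: int) -> list[float]:
    state = initial
    trajectory = [state]
    for _ in range(steps):
        error = setpoint - state   # feedback: measure the deviation
        action = gain * error      # control law: proportional response
        state = state + action     # hypothetical plant (assumption)
        trajectory.append(state)
    return trajectory
```

With `gain = 0.5`, the error halves on every step, so starting at 0 with a setpoint of 10 produces 5.0, 7.5, 8.75, and so on.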
Learning policies from expert demonstrations.
Learning a state-to-action mapping directly from demonstrations, typically via supervised learning.
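In its simplest form, the supervised mapping above can be learned for discrete states by picking the action the expert chose most often in each state. This Python sketch assumes hypothetical `(state, action)` demonstration pairs. Practical systems use a function approximator such as a neural network so that the policy generalizes to unseen states, which this lookup table cannot do.

```python
from collections import Counter, defaultdict


def behavior_clone(demos):
    """Majority-vote policy per discrete state (a deliberately minimal learner)."""
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}
```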
Acting to minimize surprise or free energy.
Learning only from data collected by the current policy.
Methods for breaking goals into steps; can be classical (A*, STRIPS) or LLM-driven with tool calls.
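A* is one of the classical planners named above. This Python sketch runs it on a hypothetical 4-connected grid given as a list of strings, with `#` marking walls. It uses Manhattan distance as the heuristic, which is admissible for unit-cost moves, and returns the shortest path length, or `None` if the goal is unreachable.

```python
import heapq


def astar(grid: list[str], start: tuple[int, int], goal: tuple[int, int]):
    rows, cols = len(grid), len(grid[0])

    def h(cell: tuple[int, int]) -> int:
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])  # Manhattan distance

    open_heap = [(h(start), 0, start)]  # entries: (f = g + h, g, cell)
    best_g = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None
```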
What would have happened under different conditions.
System that independently pursues goals over time.
AI systems that perceive and act in the physical world through sensors and actuators.
Maximum number of tokens the model can attend to in one forward pass; constrains long-document reasoning.
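A common, lossy way to live within the limit above is to keep only the most recent tokens that fit, after reserving room for the model's output. This Python sketch assumes tokens are already available as a list of integer IDs. It is one strategy among several; summarization or retrieval are alternatives.

```python
def fit_to_context(tokens: list[int], max_context: int, reserve_for_output: int = 0) -> list[int]:
    """Keep the most recent tokens that fit in the window (older tokens are dropped)."""
    budget = max_context - reserve_for_output
    if budget <= 0:
        raise ValueError("no room left for input tokens")
    return tokens[-budget:]
```

For example, a 10-token input with `max_context = 6` and `reserve_for_output = 2` keeps only the last 4 tokens. Anything earlier is invisible to the model.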
Framework for reasoning about cause-effect relationships beyond correlation, often using structural assumptions and experiments.
Agent that reasons about future outcomes before choosing actions.
AI capable of performing most intellectual tasks that humans can.
A learning paradigm where an agent interacts with an environment and learns to choose actions to maximize cumulative reward.
A high-priority instruction layer setting overarching behavior constraints for a chat model.