Domain: Reinforcement Learning
Learning a mapping from states to actions directly from demonstrations, treated as supervised learning.
A model that predicts the next state given the current state and action.
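The entry above treats imitation as plain supervised learning on (state, action) pairs. A minimal sketch, assuming a small discrete state space: the hypothetical `fit_policy` builds a lookup-table policy that, for each demonstrated state, returns the action the demonstrator chose most often. No reward signal is used.

```python
from collections import Counter, defaultdict

def fit_policy(demos):
    """Hypothetical helper: fit a lookup-table policy from (state, action) pairs.

    For each demonstrated state, pick the action the demonstrator chose most
    often. This is supervised learning only; no reward is involved.
    """
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

# Illustrative demonstrations (made-up data, not from any real dataset).
demos = [("low", "accelerate"), ("low", "accelerate"), ("low", "brake"),
         ("high", "brake"), ("high", "brake")]
policy = fit_policy(demos)
print(policy)  # {'low': 'accelerate', 'high': 'brake'}
```

A real system would replace the lookup table with a function approximator (e.g. a neural network) so the policy generalizes to unseen states.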
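A dynamics model can be fit from logged transitions. A minimal sketch, assuming 1-D linear dynamics with no intercept, s' = a·s + b·u: the hypothetical `fit_linear_dynamics` solves the 2x2 least-squares normal equations, and `predict` applies the fitted model.

```python
def fit_linear_dynamics(transitions):
    """Least-squares fit of s_next ~ a*s + b*u from (s, u, s_next) tuples.

    Assumes 1-D linear dynamics with no intercept (an illustrative choice).
    """
    sss = sum(s * s for s, _, _ in transitions)
    suu = sum(u * u for _, u, _ in transitions)
    ssu = sum(s * u for s, u, _ in transitions)
    ssn = sum(s * n for s, _, n in transitions)
    sun = sum(u * n for _, u, n in transitions)
    # Normal equations: [[sss, ssu], [ssu, suu]] @ [a, b] = [ssn, sun]
    det = sss * suu - ssu * ssu
    a = (ssn * suu - ssu * sun) / det
    b = (sss * sun - ssu * ssn) / det
    return a, b

def predict(model, s, u):
    """Predict the next state from the current state s and action u."""
    a, b = model
    return a * s + b * u

# Noise-free data generated with a = 0.9, b = 0.5 (made up for illustration).
pairs = [(1.0, 0.0), (2.0, 1.0), (0.0, 1.0), (3.0, -1.0)]
transitions = [(s, u, 0.9 * s + 0.5 * u) for s, u in pairs]
model = fit_linear_dynamics(transitions)
print(model)  # approximately (0.9, 0.5)
```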
Rules and controls around generation (filters, validators, structured outputs) to reduce unsafe or invalid behavior.
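One kind of control named above is a validator for structured outputs. A minimal sketch, assuming the model is asked to return JSON with an `answer` string and a `confidence` float in [0, 1]; the schema `REQUIRED` and the function `validate_output` are hypothetical. Output that fails validation is rejected (returned as `None`) so the caller can retry or fall back.

```python
import json

# Hypothetical output schema: key -> required Python type.
REQUIRED = {"answer": str, "confidence": float}

def validate_output(raw):
    """Return the parsed dict if raw text is JSON matching the schema, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not valid JSON at all
    if not isinstance(data, dict):
        return None
    for key, typ in REQUIRED.items():
        if not isinstance(data.get(key), typ):
            return None  # missing key or wrong type
    if not 0.0 <= data["confidence"] <= 1.0:
        return None  # value outside the allowed range
    return data

print(validate_output('{"answer": "yes", "confidence": 0.8}'))
print(validate_output("Sure! Here is my answer"))  # None
```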
Learning policies from expert demonstrations.
Inferring the reward function that explains observed behavior.
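A toy illustration of the idea, not a standard inverse-RL algorithm: in a one-step setting with hypothetical options described by features (speed, safety), search a grid of linear reward weights for one under which the demonstrated choices are optimal. All names (`OPTIONS`, `reward`, `infer_weights`) and feature values are assumptions for illustration.

```python
import itertools

# Hypothetical options described by (speed, safety) features.
OPTIONS = {"fast": (1.0, 0.2), "safe": (0.3, 1.0), "balanced": (0.7, 0.7)}

def reward(w, option):
    """Linear reward: dot product of weights and the option's features."""
    return sum(wi * fi for wi, fi in zip(w, OPTIONS[option]))

def infer_weights(demos, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return the first grid weight vector that best explains the demonstrations,
    i.e. makes the demonstrated choice the reward-maximizing option most often."""
    best_w, best_score = None, -1
    for w in itertools.product(grid, repeat=2):
        if w == (0.0, 0.0):
            continue  # the all-zero reward explains nothing
        score = sum(1 for choice in demos
                    if max(OPTIONS, key=lambda o: reward(w, o)) == choice)
        if score > best_score:
            best_w, best_score = w, score
    return best_w

w = infer_weights(["safe", "safe", "safe"])
print(w)  # (0.0, 0.25)
```

Many weight vectors explain the same demonstrations equally well; the search simply returns the first one found. This non-uniqueness of the inferred reward is a known difficulty of the problem.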
RL that uses a learned or known model of the environment.
RL without an explicit model of the environment's dynamics.
Directly optimizing the parameters of a control policy to maximize expected return.
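With a known model, an agent can plan instead of learning purely by trial and error. A minimal sketch, assuming a hypothetical 3-state chain (state 2 is terminal, stepping into it pays 1) whose deterministic model `step_model` is given: value iteration computes state values and a greedy policy from the model alone.

```python
GAMMA = 0.9
STATES = [0, 1, 2]            # state 2 is terminal
ACTIONS = ["left", "right"]

def step_model(s, a):
    """Known deterministic model (assumed): returns (next_state, reward)."""
    s_next = max(0, s - 1) if a == "left" else min(2, s + 1)
    return s_next, (1.0 if s_next == 2 else 0.0)

def q_value(V, s, a):
    """One-step lookahead using the model: r + gamma * V(s')."""
    s_next, r = step_model(s, a)
    return r + GAMMA * V[s_next]

def value_iteration(n_iters=100):
    V = {s: 0.0 for s in STATES}
    for _ in range(n_iters):
        for s in STATES:
            if s == 2:
                continue  # the terminal state keeps value 0
            V[s] = max(q_value(V, s, a) for a in ACTIONS)
    policy = {s: max(ACTIONS, key=lambda a: q_value(V, s, a))
              for s in STATES if s != 2}
    return V, policy

V, policy = value_iteration()
print(V, policy)
```

If the model is not known, the same planner can be run on a model estimated from experience (for example, one fit as in the dynamics-model sketch).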
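Model-free methods learn only from sampled transitions. A minimal sketch using tabular Q-learning (one well-known model-free method) on the same hypothetical 3-state chain: the agent calls `env_step` to sample transitions but never reads the dynamics. The hyperparameters are illustrative assumptions.

```python
import random

N_STATES, TERMINAL = 3, 2
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # illustrative hyperparameters

def env_step(s, a):
    """Environment treated as a black box: a=0 moves left, a=1 moves right."""
    s_next = max(0, s - 1) if a == 0 else min(TERMINAL, s + 1)
    return s_next, (1.0 if s_next == TERMINAL else 0.0), s_next == TERMINAL

def q_learning(episodes=200, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection.
            if rng.random() < EPSILON:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda act: Q[s][act])
            s_next, r, done = env_step(s, a)
            # TD target: no bootstrapping from a terminal state.
            target = r if done else r + GAMMA * max(Q[s_next])
            Q[s][a] += ALPHA * (target - Q[s][a])
            s = s_next
    return Q

Q = q_learning()
print(Q)  # Q[0][1] near 0.9 and Q[1][1] near 1.0
```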
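A minimal sketch of direct policy optimization, using REINFORCE (a likelihood-ratio policy-gradient method) on a hypothetical two-armed bandit with a softmax policy over parameters `theta`. Each update follows theta += lr · R · ∇ log π(a); the arm rewards and step size are assumptions.

```python
import math
import random

def softmax(prefs):
    """Numerically stable softmax over a list of preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_bandit(arm_rewards=(0.2, 1.0), steps=2000, lr=0.1, seed=0):
    """REINFORCE on a 2-armed bandit with deterministic per-arm rewards."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(steps):
        probs = softmax(theta)
        a = 0 if rng.random() < probs[0] else 1   # sample from the policy
        r = arm_rewards[a]
        # Gradient of log softmax w.r.t. theta_k is 1[k == a] - pi_k.
        for k in range(2):
            theta[k] += lr * r * ((1.0 if k == a else 0.0) - probs[k])
    return softmax(theta)

probs = reinforce_bandit()
print(probs)  # most probability mass on arm 1, the higher-reward arm
```

No baseline is subtracted here to keep the sketch short; in practice a baseline is usually added to reduce the variance of the gradient estimate.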
A learning paradigm where an agent interacts with an environment and learns to choose actions to maximize cumulative reward.
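The interaction loop in this definition can be sketched as follows: the agent picks an action, the environment returns the next state and a reward, and the discounted cumulative reward G = Σ γ^t r_t is accumulated. The chain environment, `run_episode`, and the discount value are hypothetical choices.

```python
def env_step(s, a):
    """Hypothetical 3-state chain: 'right' from state 1 reaches the goal (reward 1)."""
    s_next = max(0, s - 1) if a == "left" else min(2, s + 1)
    return s_next, (1.0 if s_next == 2 else 0.0), s_next == 2

def run_episode(policy, start_state=0, gamma=0.9, max_steps=100):
    """Agent-environment loop; returns the discounted cumulative reward."""
    s, G, discount = start_state, 0.0, 1.0
    for _ in range(max_steps):
        a = policy(s)                 # agent chooses an action
        s, r, done = env_step(s, a)   # environment responds
        G += discount * r
        discount *= gamma
        if done:
            break
    return G

print(run_episode(lambda s: "right"))  # 0.9: reward 1 arrives on the second step
print(run_episode(lambda s: "left"))   # 0.0: the goal is never reached
```

Learning then amounts to changing `policy` so that this return is as large as possible.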
Modifying the reward signal to accelerate learning.
Reward given only upon task completion.
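One well-known form is potential-based shaping, which adds F(s, s') = γΦ(s') − Φ(s) to each reward and is known to leave the optimal policy unchanged. A minimal sketch, assuming a hypothetical potential Φ(s) = −|goal − s|: it checks numerically that the shaped discounted return telescopes to the original return plus γ^T Φ(s_T) − Φ(s_0).

```python
GAMMA = 0.9

def potential(s, goal=10):
    """Hypothetical potential: negative distance to the goal state."""
    return -abs(goal - s)

def shaped_reward(r, s, s_next, gamma=GAMMA):
    # Potential-based shaping term: F(s, s') = gamma * phi(s') - phi(s).
    return r + gamma * potential(s_next) - potential(s)

def discounted_return(rewards, gamma=GAMMA):
    return sum(gamma ** t * r for t, r in enumerate(rewards))

states = [0, 1, 2, 3]        # a short made-up trajectory s_0..s_3
rewards = [0.0, 0.0, 1.0]    # original reward for each transition
shaped = [shaped_reward(r, s, s2) for r, s, s2 in zip(rewards, states, states[1:])]

# Telescoping: shaped return = original return + gamma^T * phi(s_T) - phi(s_0).
T = len(rewards)
lhs = discounted_return(shaped)
rhs = discounted_return(rewards) + GAMMA ** T * potential(states[-1]) - potential(states[0])
print(lhs, rhs)  # equal up to floating-point rounding
```

Because the extra terms depend only on the start and end states, the shaping gives denser feedback along the way without changing which policies are optimal.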
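To make the contrast concrete, a minimal sketch comparing a sparse reward, which pays only when a hypothetical goal state is reached, with a dense progress-based reward on the same made-up trajectory. Both reward functions are illustrative assumptions.

```python
GOAL = 3

def sparse_reward(s_next):
    """Reward 1 only when the task is completed (goal reached), else 0."""
    return 1.0 if s_next == GOAL else 0.0

def dense_reward(s, s_next):
    """Contrast (hypothetical): reward each step for reducing distance to the goal."""
    return float(abs(GOAL - s) - abs(GOAL - s_next))

trajectory = [0, 1, 2, 3]
sparse = [sparse_reward(s2) for s2 in trajectory[1:]]
dense = [dense_reward(s, s2) for s, s2 in zip(trajectory, trajectory[1:])]
print(sparse)  # [0.0, 0.0, 1.0]
print(dense)   # [1.0, 1.0, 1.0]
```

Under the sparse reward, every step before completion returns zero, so an agent that never reaches the goal receives no learning signal at all.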
A high-priority instruction layer setting overarching behavior constraints for a chat model.
Optimizing a sequence of continuous actions over a time horizon.
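A minimal sketch using the cross-entropy method (CEM), one common sampling-based optimizer swapped in for illustration. It assumes a hypothetical 1-D point mass (position x, velocity v, acceleration u) driven toward `TARGET`, with a quadratic cost. CEM samples action sequences, keeps the lowest-cost elites, and refits a per-step Gaussian to them.

```python
import random

DT, HORIZON, TARGET = 0.1, 20, 1.0   # illustrative constants

def rollout_cost(actions, x0=0.0, v0=0.0):
    """Simulate the assumed point-mass dynamics and return the total cost."""
    x, v, cost = x0, v0, 0.0
    for u in actions:
        v += u * DT
        x += v * DT
        cost += (x - TARGET) ** 2 + 0.01 * u ** 2   # tracking + control effort
    return cost

def cem(iters=10, pop=100, n_elite=10, seed=0):
    rng = random.Random(seed)
    mean = [0.0] * HORIZON
    std = [1.0] * HORIZON
    for _ in range(iters):
        # Sample candidate action sequences from independent per-step Gaussians.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(pop)]
        samples.sort(key=rollout_cost)
        elites = samples[:n_elite]
        # Refit the sampling distribution to the elites (small floor on std).
        mean = [sum(e[t] for e in elites) / n_elite for t in range(HORIZON)]
        std = [(sum((e[t] - mean[t]) ** 2 for e in elites) / n_elite) ** 0.5 + 1e-3
               for t in range(HORIZON)]
    return mean, rollout_cost(mean)

plan, cost = cem()
print(cost, rollout_cost([0.0] * HORIZON))  # optimized cost vs. doing nothing (20.0)
```

In model-predictive control, only the first action of `plan` would be executed before re-optimizing from the new state.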