Results for "reward hacking"
Reward Hacking
Advanced
Maximizing the specified reward without fulfilling the intended goal, typically by exploiting flaws in the reward function.
Reward Shaping
Advanced
Modifying the reward signal, typically by adding intermediate rewards, to accelerate learning.
Sparse Reward
Advanced
Reward signal that is nonzero only rarely, often only upon task completion.
Reinforcement Learning
Intermediate
A learning paradigm where an agent interacts with an environment and learns to choose actions to maximize cumulative reward.
RLHF
Intermediate
Reinforcement learning from human feedback: uses human preference data to train a reward model, which is then used to optimize the policy.
Value Function
Intermediate
Expected cumulative (typically discounted) reward from a state or state-action pair when following a given policy.
Policy Gradient
Intermediate
Optimizing policies directly via gradient ascent on expected reward.
Inverse Reinforcement Learning
Advanced
Inferring the reward function from observed (typically expert) behavior.
Reward Model
Intermediate
Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.