Results for "shared reward"
Control is shared between a human and an agent (shared autonomy).
Maximizing the reward signal without fulfilling the intended goal (reward hacking).
Modifying the reward function to accelerate learning (reward shaping).
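A minimal sketch of potential-based shaping, the variant that provably preserves the optimal policy; the potential function, goal position, and discount factor below are illustrative assumptions, not from this glossary:

```python
GAMMA = 0.99

def phi(state):
    # Hypothetical potential: negative distance to an assumed goal at 10.
    return -abs(10 - state)

def shaped_reward(env_reward, state, next_state, gamma=GAMMA):
    # Adds gamma * phi(s') - phi(s) on top of the environment reward.
    return env_reward + gamma * phi(next_state) - phi(state)
```

Moving toward the goal yields a positive shaping bonus, giving the learner denser feedback without changing which policy is optimal.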
Reward is given only upon task completion (sparse reward).
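The simplest illustration is a reward function that pays out only at an assumed goal state (the grid and goal here are hypothetical):

```python
def sparse_reward(state, goal=(3, 3)):
    # 1.0 only when the task is complete (agent at the goal), else 0.0.
    return 1.0 if state == goal else 0.0
```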
Training one model on multiple tasks simultaneously to improve generalization through shared structure.
Quantifies the information shared between two random variables (mutual information).
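A small sketch computing mutual information from a discrete joint distribution, given as a dict of probabilities (the representation is an assumption for illustration):

```python
import math

def mutual_information(joint):
    # joint: dict mapping (x, y) -> p(x, y); returns I(X; Y) in nats.
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    # I(X;Y) = sum p(x,y) * log( p(x,y) / (p(x) p(y)) )
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)
```

Independent variables give 0; perfectly correlated bits give log 2 nats (one bit).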
Agents communicate via shared state.
A learning paradigm where an agent interacts with an environment and learns to choose actions to maximize cumulative reward.
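The interaction loop can be sketched with a toy chain environment (the environment, policy, and episode length below are illustrative assumptions):

```python
def env_reset():
    # Start at position 0 of a 5-state chain.
    return 0

def env_step(state, action):
    # action is -1 or +1; reward 1.0 on reaching the terminal state 4.
    next_state = max(0, min(4, state + action))
    done = next_state == 4
    return next_state, (1.0 if done else 0.0), done

def run_episode(policy, max_steps=100):
    # Agent-environment loop: observe state, act, collect reward.
    state = env_reset()
    total_reward = 0.0
    for _ in range(max_steps):
        state, reward, done = env_step(state, policy(state))
        total_reward += reward
        if done:
            break
    return total_reward
```

A policy that always moves right collects the terminal reward; learning algorithms adjust the policy to maximize this cumulative return.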
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
Expected cumulative reward from a state or state-action pair (value function).
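For a fixed policy, values satisfy the Bellman expectation equation and can be found by iterating the backup; this sketch evaluates an assumed "always move right" policy on a deterministic 5-state chain with reward 1 on entering the terminal state:

```python
def evaluate_chain(n=5, gamma=0.9, iters=100):
    # Iterates V(s) <- r(s, s+1) + gamma * V(s+1); terminal state has value 0.
    V = [0.0] * n
    for _ in range(iters):
        for s in range(n - 1):
            r = 1.0 if s + 1 == n - 1 else 0.0
            V[s] = r + gamma * (0.0 if s + 1 == n - 1 else V[s + 1])
    return V
```

States farther from the goal have geometrically discounted values (1, 0.9, 0.81, ...).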
Optimizing policies directly via gradient ascent on expected reward (policy gradient methods).
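A minimal sketch of the idea via REINFORCE on a hypothetical two-armed bandit (arm payoffs, learning rate, and step count are illustrative assumptions):

```python
import math
import random

def reinforce_bandit(steps=2000, lr=0.1, seed=0):
    # 2-armed bandit: arm 1 pays 1.0, arm 0 pays 0.2. The policy is a
    # sigmoid over a single logit theta; the score-function estimator
    # gives the gradient of expected reward.
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        p1 = 1.0 / (1.0 + math.exp(-theta))   # probability of arm 1
        a = 1 if rng.random() < p1 else 0
        r = 1.0 if a == 1 else 0.2
        theta += lr * (a - p1) * r            # d log pi(a) / d theta * reward
    return 1.0 / (1.0 + math.exp(-theta))     # final probability of arm 1
```

Gradient ascent shifts probability mass toward the higher-paying arm without ever modeling the environment.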
Inferring a reward function from observed behavior (inverse reinforcement learning).
Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.
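Such models are commonly trained with a Bradley-Terry pairwise loss on chosen/rejected pairs; a minimal sketch of that loss (the scalar-score interface is an assumption, standing in for a learned model's output):

```python
import math

def preference_loss(score_chosen, score_rejected):
    # Bradley-Terry pairwise objective: -log sigmoid(r_chosen - r_rejected).
    diff = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))
```

The loss is log 2 when the model cannot distinguish the pair and shrinks as the chosen output's score pulls ahead, pushing the model to rank preferred outputs higher.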