Results for "shared reward"
Maximizing reward without fulfilling the real goal.
Modifying reward to accelerate learning.
Inferring reward function from observed behavior.
Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.
Reward only given upon task completion.
Training one model on multiple tasks simultaneously to improve generalization through shared structure.
Agents communicate via shared state.
Control shared between human and agent.
Designing AI to cooperate with humans and each other.
Model exploits poorly specified objectives.
Expected cumulative reward from a state or state-action pair.
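In standard notation, with policy $\pi$ and discount factor $\gamma$, the state-value and action-value functions are:

```latex
V^\pi(s) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^t r_t \,\middle|\, s_0 = s\right],
\qquad
Q^\pi(s,a) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^t r_t \,\middle|\, s_0 = s,\, a_0 = a\right]
```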
A learning paradigm where an agent interacts with an environment and learns to choose actions to maximize cumulative reward.
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
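One common sketch of the reward-modeling step, assuming the Bradley–Terry preference model typically used here: given a prompt $x$ with preferred and rejected responses $y_w, y_l$, the reward model $r_\phi$ is trained by minimizing

```latex
\mathcal{L}(\phi) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\left[\log \sigma\big(r_\phi(x, y_w) - r_\phi(x, y_l)\big)\right]
```

where $\sigma$ is the logistic function; the policy is then optimized against $r_\phi$.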
Model optimizes objectives misaligned with human values.
Rules and controls around generation (filters, validators, structured outputs) to reduce unsafe or invalid behavior.
Correctly specifying goals so that optimizing them yields the intended behavior.
Optimizing policies directly via gradient ascent on expected reward.
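A minimal sketch of this idea: REINFORCE (the score-function gradient estimator) on a hypothetical two-armed bandit. Function names and hyperparameters are illustrative, not from any particular library.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def reinforce_bandit(rewards=(0.0, 1.0), steps=500, lr=0.1, seed=0):
    """REINFORCE on a toy 2-armed bandit: ascend E[reward] using the
    score-function gradient  grad log pi(a) * r."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)              # one logit per arm
    for _ in range(steps):
        p = softmax(theta)
        a = rng.choice(2, p=p)
        r = rewards[a]
        grad_log_pi = -p             # d/dtheta_j log softmax(theta)[a] = 1[j=a] - p_j
        grad_log_pi[a] += 1.0
        theta += lr * grad_log_pi * r   # gradient ascent on expected reward
    return softmax(theta)

p = reinforce_bandit()
```

With these settings the policy concentrates almost all probability on the higher-reward arm.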
Quantifies shared information between random variables.
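As a sketch, mutual information for discrete variables can be computed directly from a joint probability table (the function name is illustrative):

```python
import numpy as np

def mutual_information(joint):
    """I(X;Y) = sum_{x,y} p(x,y) * log[ p(x,y) / (p(x) p(y)) ], in nats."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()                 # normalize to a distribution
    px = joint.sum(axis=1, keepdims=True)       # marginal over rows
    py = joint.sum(axis=0, keepdims=True)       # marginal over columns
    mask = joint > 0                            # 0 * log 0 treated as 0
    return float((joint[mask] * np.log(joint[mask] / (px @ py)[mask])).sum())
```

Independent variables give 0; a perfectly correlated pair of fair bits gives log 2 nats.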
GNN using attention to weight neighbor contributions dynamically.
Joint vision-language model aligning images and text.
Decomposing goals into sub-tasks.
Humans assist or override autonomous behavior.
Human controlling robot remotely.
Agents optimize collective outcomes.
Emergence of conventions among agents.
A preference-based training method optimizing policies directly from pairwise comparisons without explicit RL loops.
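For reference, the DPO objective on preference pairs ($y_w$ preferred over $y_l$), with temperature $\beta$ and frozen reference policy $\pi_{\mathrm{ref}}$:

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
```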
Formal framework for sequential decision-making under uncertainty.
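In standard notation, such a process is a tuple of states, actions, transition dynamics, rewards, and a discount factor:

```latex
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad P(s' \mid s, a), \quad R(s, a), \quad \gamma \in [0, 1)
```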
Strategy mapping states to actions.
Expected return of taking an action in a given state.
Fundamental recursive relationship defining optimal value functions.
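A minimal sketch of how this recursion is used in practice: value iteration repeatedly applies the Bellman optimality operator until it reaches its fixed point $V^*$. The toy MDP below is hypothetical.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-10):
    """Solve V*(s) = max_a [ R(s,a) + gamma * sum_s' P[a,s,s'] V*(s') ]
    by fixed-point iteration on the Bellman optimality operator."""
    V = np.zeros(P.shape[1])
    while True:
        Q = R.T + gamma * (P @ V)     # Q[a, s] = R[s, a] + gamma * E[V(s') | s, a]
        V_new = Q.max(axis=0)         # greedy backup over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Hypothetical toy MDP: action 0 stays put, action 1 switches state;
# occupying state 1 yields reward 1 regardless of action.
P = np.array([[[1., 0.], [0., 1.]],   # action 0: stay
              [[0., 1.], [1., 0.]]])  # action 1: switch
R = np.array([[0., 0.],               # state 0: no reward
              [1., 1.]])              # state 1: reward 1
V = value_iteration(P, R)
```

Here $V^*(1) = 1/(1-\gamma) = 10$ (stay in the rewarding state) and $V^*(0) = \gamma \cdot V^*(1) = 9$ (switch immediately), which the iteration recovers.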