Results for "human values"
A model optimizing objectives that are misaligned with human values.
The learned numeric values of a model, adjusted during training to minimize a loss function (a minimal sketch follows these results).
Attention where queries/keys/values come from the same sequence, enabling token-to-token interactions (sketched after these results).
Predicting future values from past observations (sketched after these results).
Variable whose values depend on chance.
Required human review for high-risk decisions.
The field of building systems that perform tasks associated with human intelligence—perception, reasoning, language, planning, and decision-making—via algori...
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines (a reward-model sketch follows these results).
Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.
Human or automated process of assigning target labels; label quality, consistency, and clear annotation guidelines strongly affect downstream model quality.
AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.
Generating human-like speech from text.
Ensuring AI systems pursue intended human goals.
Using limited human feedback to guide large models.
Human-like understanding of how physical objects move and interact.
A human operator controlling a robot remotely.
Control shared between human and agent.
Inferring human goals from behavior.
Interpreting human gestures.
Inferring and aligning with human preferences.
System design where humans validate or guide model outputs, especially for high-stakes decisions.
Humans assist or override autonomous behavior.
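
For the "learned numeric values adjusted to minimize a loss function" entry above, a minimal NumPy sketch of gradient descent on a toy least-squares problem; the data, learning rate, and single weight are illustrative assumptions, not part of any specific framework:

```python
import numpy as np

# Toy data: y = 3*x plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=100)

w = np.zeros(1)   # the model's learned numeric values (here, a single weight)
lr = 0.1          # learning rate

for step in range(200):
    pred = X @ w                          # model prediction
    grad = 2 * X.T @ (pred - y) / len(y)  # gradient of the mean squared error
    w -= lr * grad                        # adjust the values to reduce the loss

print(w)  # close to the true coefficient, ~3.0
```

The same loop structure, with automatic differentiation in place of the hand-derived gradient, underlies training in standard deep learning frameworks.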
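For the self-attention entry above, a minimal NumPy sketch: queries, keys, and values are all projections of the same token sequence, and softmaxed query-key scores mix the value vectors across positions. The random projection matrices and dimensions are placeholders, not trained weights:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    # Queries, keys, and values all come from the same sequence.
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # token-to-token interaction scores
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V                       # mix value vectors across positions

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 8, 4
tokens = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out = self_attention(tokens, Wq, Wk, Wv)
print(out.shape)  # (5, 4): one attended vector per token
```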
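For the "predicting future values from past observations" entry, a minimal sketch that fits a lag-1 autoregressive coefficient by least squares on a synthetic series; the series and the AR(1) form are illustrative assumptions:

```python
import numpy as np

# Synthetic series: each value depends on the previous one plus noise.
rng = np.random.default_rng(0)
series = [0.0]
for _ in range(199):
    series.append(0.8 * series[-1] + rng.normal(scale=0.1))
series = np.array(series)

# Fit y[t] ~ a * y[t-1] by least squares on the observed history.
past, future = series[:-1], series[1:]
a = (past @ future) / (past @ past)

next_value = a * series[-1]  # forecast the next (unseen) value
print(round(a, 3), round(next_value, 3))
```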
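For the RLHF and reward-model entries above, a minimal sketch of the reward-model step only: a linear reward model is fit to pairwise preference data with a Bradley-Terry (logistic) loss so that preferred outputs score higher than rejected ones. The features, synthetic preferences, and linear form are illustrative assumptions; real pipelines train a neural reward model over text and then optimize the policy against it (e.g. with PPO):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Each preference pair: feature vectors for a chosen and a rejected output.
rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.5])   # hidden "true" preference direction
chosen = rng.normal(size=(200, 3))
rejected = rng.normal(size=(200, 3))
# Swap pairs where needed so the chosen output always has higher true reward.
keep = (chosen @ true_w) > (rejected @ true_w)
chosen, rejected = (np.where(keep[:, None], chosen, rejected),
                    np.where(keep[:, None], rejected, chosen))

w = np.zeros(3)   # reward-model parameters
lr = 0.5
for step in range(500):
    margin = (chosen - rejected) @ w   # r(chosen) - r(rejected)
    p = sigmoid(margin)                # P(chosen preferred) under the model
    # Gradient of the average -log p (Bradley-Terry / logistic loss).
    grad = -((1 - p)[:, None] * (chosen - rejected)).mean(axis=0)
    w -= lr * grad

# The learned direction roughly matches the hidden preference direction.
print(w / np.linalg.norm(w), true_w / np.linalg.norm(true_w))
```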