Results for "human values"

23 results

Value Misalignment Advanced

A model optimizes objectives that are misaligned with human values.

AI Safety & Alignment
Parameters Intermediate

The learned numeric values of a model, adjusted during training to minimize a loss function.

Foundations & Theory
Self-Attention Intermediate

Attention where queries, keys, and values all come from the same sequence, enabling token-to-token interactions.

Transformers & LLMs
Forecasting Intermediate

Predicting future values from past observations.

Time Series
Random Variable Advanced

Variable whose values depend on chance.

Probability & Statistics
Human Oversight Intermediate

Required human review for high-risk decisions.

AI Economics & Strategy
Artificial Intelligence Intermediate

The field of building systems that perform tasks associated with human intelligence—perception, reasoning, language, planning, and decision-making—via algori...

Foundations & Theory
RLHF Intermediate

Reinforcement learning from human feedback: uses human preference data to train a reward model, then optimizes the policy against that reward model.

Optimization
Reward Model Intermediate

Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.

Foundations & Theory
Alignment Intermediate

Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.

Foundations & Theory
Data Labeling Intermediate

Human or automated process of assigning targets; quality, consistency, and guidelines matter heavily.

Foundations & Theory
NLP Intermediate

AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.

Foundations & Theory
Speech Synthesis Intermediate

Generating human-like speech from text.

Speech & Audio AI
Alignment Problem Advanced

Ensuring AI systems pursue intended human goals.

AI Safety & Alignment
Scalable Oversight Advanced

Using limited human feedback to guide large models.

AI Safety & Alignment
Commonsense Physics Frontier

Human-like understanding of physical behavior.

World Models & Cognition
Teleoperation Frontier

A human controlling a robot remotely.

World Models & Cognition
Shared Autonomy Frontier

Control shared between human and agent.

World Models & Cognition
Intent Recognition Frontier

Inferring human goals from behavior.

World Models & Cognition
Gesture Recognition Frontier

Interpreting human gestures.

World Models & Cognition
Value Learning Intermediate

Inferring and aligning with human preferences.

Governance & Ethics
Human-in-the-Loop Intermediate

System design where humans validate or guide model outputs, especially for high-stakes decisions.

Foundations & Theory
Human-in-the-Loop Control Frontier

Humans assist or override autonomous behavior.

World Models & Cognition