Results for "human values"


105 results

Value Misalignment Advanced

Model optimizes objectives misaligned with human values.

AI Safety & Alignment
Human Oversight Intermediate

Required human review for high-risk decisions.

AI Economics & Strategy
Alignment Problem Advanced

Ensuring AI systems pursue intended human goals.

AI Safety & Alignment
Value Learning Intermediate

Inferring and aligning with human preferences.

Governance & Ethics
Human-in-the-Loop Control Frontier

Humans assist or override autonomous behavior.

World Models & Cognition
Human-in-the-Loop Intermediate

System design where humans validate or guide model outputs, especially for high-stakes decisions.

Foundations & Theory
Shared Autonomy Frontier

Control shared between human and agent.

World Models & Cognition
Outer Alignment Advanced

Correctly specifying the training objective so that it captures the intended human goals.

AI Safety & Alignment
RLHF Intermediate

Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.

Optimization
Alignment Intermediate

Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.

Foundations & Theory
Scalable Oversight Advanced

Using limited human feedback to guide large models.

AI Safety & Alignment
Corrigibility Advanced

A system's willingness to accept correction or shutdown from its operators.

AI Safety & Alignment
Alignment Research Intermediate

Research aimed at keeping AI systems safe and aligned with human intent.

Governance & Ethics
x-Risk Advanced

Existential risk from AI systems.

AI Safety & Alignment
Existential Risk Advanced

Risk threatening humanity’s survival.

AI Safety & Alignment
Reward Model Intermediate

Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.

Foundations & Theory
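The pairwise training signal behind such reward models can be sketched with a Bradley–Terry loss. This is a minimal illustration, not any specific library's API; the function name and scalar scores are assumptions for the example.

```python
import math

def preference_loss(score_chosen, score_rejected):
    """Bradley-Terry pairwise loss for reward-model training on human
    preference pairs: -log sigmoid(r_chosen - r_rejected), which pushes
    the chosen output's score above the rejected one's."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A larger margin in favor of the chosen output yields a smaller loss.
print(preference_loss(2.0, 0.0) < preference_loss(0.5, 0.0))  # True
```

With equal scores the loss is log 2, the value of maximal uncertainty between the two outputs.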
Random Variable Advanced

Variable whose values depend on chance.

Probability & Statistics
Orthogonality Thesis Advanced

The thesis that an agent's level of intelligence and its final goals can vary independently.

AI Safety & Alignment
Deceptive Alignment Advanced

A model appears aligned during training while behaving differently once deployed.

AI Safety & Alignment
Gesture Recognition Frontier

Interpreting human gestures.

World Models & Cognition
NLP Intermediate

AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.

Foundations & Theory
Artificial Intelligence Intermediate

The field of building systems that perform tasks associated with human intelligence—perception, reasoning, language, planning, and decision-making—via algorithms.

Foundations & Theory
Parameters Intermediate

The learned numeric values of a model adjusted during training to minimize a loss function.

Foundations & Theory
Loss Function Intermediate

A function measuring prediction error (and sometimes calibration), guiding gradient-based optimization.

Foundations & Theory
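How a loss function guides gradient-based optimization of parameters can be sketched in a few lines. The one-weight model, learning rate, and target below are illustrative assumptions, not drawn from the glossary.

```python
def gradient_step(w, x, y, lr=0.1):
    """One gradient-descent step on the squared-error loss
    L(w) = (w*x - y)^2, whose gradient is dL/dw = 2*(w*x - y)*x."""
    grad = 2.0 * (w * x - y) * x
    return w - lr * grad

# Repeated steps drive the parameter toward the loss minimum (w = 3 here).
w = 0.0
for _ in range(50):
    w = gradient_step(w, x=1.0, y=3.0)
print(round(w, 3))  # converges toward 3.0
```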
AUC Intermediate

Scalar summary of ROC; measures ranking ability, not calibration.

Foundations & Theory
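The ranking interpretation of AUC (the probability that a random positive is scored above a random negative) can be computed directly; the O(n²) pairwise sketch below is for clarity, not efficiency.

```python
def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive outranks a
    randomly chosen negative (ties count half). A ranking measure: it says
    nothing about whether the scores are calibrated probabilities."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```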
Mean Squared Error Intermediate

Average of squared residuals; common regression objective.

Optimization
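The definition translates directly into code; a minimal sketch on plain lists:

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared residuals between targets and predictions."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return sum(r * r for r in residuals) / len(residuals)

# Residuals are 1 and -1, so MSE = (1 + 1) / 2 = 1.0
print(mean_squared_error([3.0, 5.0], [2.0, 6.0]))  # 1.0
```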
Self-Attention Intermediate

Attention where queries/keys/values come from the same sequence, enabling token-to-token interactions.

Transformers & LLMs
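A minimal single-head sketch in NumPy: queries, keys, and values are all derived from the same sequence. The identity projections are a simplifying assumption; real implementations use learned projection matrices.

```python
import numpy as np

def self_attention(x):
    """Single-head self-attention with identity projections: queries,
    keys, and values all come from the same sequence x."""
    d = x.shape[-1]
    q, k, v = x, x, x                      # identity W_q, W_k, W_v for brevity
    scores = q @ k.T / np.sqrt(d)          # token-to-token similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v                     # each output mixes all tokens

x = np.random.randn(4, 8)   # 4 tokens, model dimension 8
print(self_attention(x).shape)  # (4, 8)
```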
SHAP Intermediate

Feature attribution method grounded in cooperative game theory for explaining predictions in tabular settings.

Foundations & Theory
Q-Function Intermediate

Expected return of taking a given action in a given state and following the policy thereafter.

AI Economics & Strategy
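A tabular sketch of how such estimates are learned: one temporal-difference (Q-learning) update toward the bootstrapped return. The tiny two-state problem, step size, and discount factor are assumptions for illustration.

```python
def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """One Q-learning update: move Q[s][a] toward the target
    reward + gamma * max_a' Q(s_next, a')."""
    target = reward + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q

# Q[s][a] estimates the expected return of action a in state s.
Q = [[0.0, 0.0], [0.0, 0.0]]
q_update(Q, s=0, a=1, reward=1.0, s_next=1)
print(Q[0][1])  # 0.1, i.e. 0.1 * (1.0 + 0.9 * 0 - 0)
```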
Forecasting Intermediate

Predicting future values from past observations.

Time Series
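One of the simplest instances of predicting future values from past observations is a moving-average forecast; the window size below is an illustrative assumption.

```python
def moving_average_forecast(series, window=3):
    """Naive forecast: predict the next value as the mean of the
    last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)

print(moving_average_forecast([10, 12, 11, 13, 12]))  # (11 + 13 + 12) / 3 = 12.0
```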

Welcome to AI Glossary

The free, self-building AI dictionary. Help us keep it free—click an ad once in a while!

Search

Type any question or keyword into the search bar at the top.

Browse

Tap a letter in the A–Z bar to browse terms alphabetically, or filter by domain, industry, or difficulty level.

3D WordGraph

Fly around the interactive 3D graph to explore how AI concepts connect. Click any word to read its full definition.