Search: human values — AI Glossary

Likelihood Function Advanced

The probability (or density) of the observed data, viewed as a function of the model parameters.

Probability & Statistics
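
A minimal sketch of the idea, assuming a Bernoulli (coin-flip) model; the data and the function name `bernoulli_log_likelihood` are made up for illustration. The data stay fixed while the parameter `p` varies, and the best-scoring candidate is a rough maximum-likelihood estimate.

```python
import math

def bernoulli_log_likelihood(data, p):
    """Log-likelihood of 0/1 observations under a Bernoulli(p) model."""
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in data)

# Made-up observations: 5 ones and 3 zeros.
flips = [1, 1, 0, 1, 0, 1, 1, 0]

# Evaluate the likelihood on a coarse grid of parameter values 0.1 ... 0.9.
candidates = [0.1 * k for k in range(1, 10)]
best_p = max(candidates, key=lambda p: bernoulli_log_likelihood(flips, p))
```

On this grid the best candidate is the one closest to the sample frequency of ones (5/8).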

Instrumental Convergence Advanced

Tendency for agents to pursue similar subgoals, such as acquiring resources or preserving themselves, regardless of their final goal.

AI Safety & Alignment

Inner Alignment Advanced

Ensuring that the objective a trained model actually learns matches the objective it was trained on.

AI Safety & Alignment

Mesa-Optimizer Advanced

Learned subsystem that optimizes its own objective.

AI Safety & Alignment

Robust Alignment Advanced

Maintaining alignment under distribution shift, i.e. in conditions not seen during training.

AI Safety & Alignment

Alignment Tax Advanced

The extra cost in performance, compute, or development time of making a system aligned, relative to an unaligned alternative.

AI Safety & Alignment

AI Boxing Advanced

Confining an AI system to a restricted environment with limited channels to the outside world.

AI Safety & Alignment

Tripwire Advanced

Monitored signals or checks that flag dangerous behavior, typically triggering an alert or shutdown.

AI Safety & Alignment

Power-Seeking Behavior Advanced

Tendency of an agent to acquire control or resources.

AI Safety & Alignment

Cooperative AI Intermediate

Designing AI to cooperate with humans and each other.

Governance & Ethics

DPO Intermediate

A preference-based training method optimizing policies directly from pairwise comparisons without explicit RL loops.

Optimization
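
A minimal sketch of the per-pair DPO loss, assuming sequence log-probabilities have already been computed for the chosen and rejected responses under both the policy and a frozen reference model. The function name, argument names, and the default `beta` are illustrative assumptions.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Loss for one preference pair: -log(sigmoid(beta * margin))."""
    # How much more the policy prefers the chosen response than the reference does.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(z)) equals log(1 + exp(-z)).
    return math.log1p(math.exp(-beta * margin))

# Policy identical to the reference: margin is 0, loss is log(2).
neutral = dpo_loss(-1.0, -2.0, -1.0, -2.0)
# Policy has shifted probability toward the chosen response: loss drops.
improved = dpo_loss(-1.0, -3.0, -1.5, -2.0)
```

The loss is an ordinary supervised objective over comparison pairs, which is why no separate reward model or RL rollout loop is needed.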

Data Labeling Intermediate

Human or automated process of assigning targets; quality, consistency, and guidelines matter heavily.

Foundations & Theory

Speech Synthesis Intermediate

Generating human-like speech from text.

Speech & Audio AI

EU AI Act Intermediate

European regulation classifying AI systems by risk.

Governance & Ethics

Commonsense Physics Frontier

Human-like understanding of physical behavior.

World Models & Cognition

Teleoperation Frontier

A human operator controlling a robot remotely.

World Models & Cognition

Developmental Robotics Advanced

Robots learning via exploration and growth.

Agents & Autonomy

Artificial General Intelligence Frontier

AI capable of performing most intellectual tasks humans can.

AGI & General Intelligence

Shutdown Problem Advanced

Ensuring an AI system allows humans to shut it down and does not resist or manipulate that decision.

AI Safety & Alignment

Cognitive Architecture Frontier

System-level design for general intelligence.

AGI & General Intelligence

Feature Intermediate

A measurable property or attribute used as model input (raw or engineered), such as age, pixel intensity, or token ID.

Foundations & Theory

Objective Function Intermediate

A scalar measure optimized during training, typically expected loss over data, sometimes with regularization terms.

Optimization
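
As a sketch, assuming a linear model with mean squared error and an L2 penalty (one common choice among many); the function name and the `l2` strength are illustrative.

```python
def objective(weights, xs, ys, l2=0.01):
    """Mean squared error of a linear model plus an L2 regularization term."""
    preds = [sum(w * x for w, x in zip(weights, row)) for row in xs]
    data_loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)
    penalty = l2 * sum(w * w for w in weights)
    return data_loss + penalty  # a single scalar to minimize

# Made-up data that the weight 1.0 fits exactly, so only the penalty remains.
value = objective([1.0], [[1.0], [2.0]], [1.0, 2.0], l2=0.5)
```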

ROC Curve Intermediate

Plots true positive rate vs false positive rate across thresholds; summarizes separability.

Foundations & Theory
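
A small sketch of how the curve can be traced, assuming binary labels with both classes present and higher scores meaning "more likely positive". The helper names `roc_points` and `auc` and the data are illustrative.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    pos = sum(labels)
    neg = len(labels) - pos  # assumes both classes are present
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_points(labels, scores))
```

The area equals the fraction of positive/negative pairs in which the positive example receives the higher score.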

PR Curve Intermediate

Plots precision vs recall across thresholds; often more informative than ROC on imbalanced datasets because it focuses on positive-class performance.

Evaluation & Benchmarking
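
A sketch of one point on the curve, using made-up scores on an imbalanced set (2 positives, 6 negatives). The function name and the convention of returning precision 1.0 when nothing is predicted positive are assumptions.

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall when predicting positive for scores >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention when nothing is flagged
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

labels = [1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1, 0.05]

# Sweep thresholds from strict to lenient to trace the curve.
curve = [precision_recall(labels, scores, t)
         for t in sorted(set(scores), reverse=True)]
```

Neither quantity uses true negatives, so a large negative class cannot inflate the picture.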

Brier Score Intermediate

A proper scoring rule: the mean squared error between predicted probabilities and binary outcomes; lower is better.

Evaluation & Benchmarking
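
A minimal sketch of the computation for binary outcomes coded as 0 or 1; the function name is illustrative.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

perfect = brier_score([1.0, 0.0], [1, 0])     # confident and correct
uninformed = brier_score([0.5, 0.5], [1, 0])  # always predicts 50%
```

A perfect forecaster scores 0, and always predicting 0.5 scores 0.25.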

Weight Initialization Intermediate

Methods to set starting weights to preserve signal/gradient scales across layers.

Foundations & Theory
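
Two widely used schemes, sketched with the standard library only: Glorot (Xavier) uniform and He normal. The function names and the nested-list weight layout (`fan_in` rows by `fan_out` columns) are assumptions for illustration.

```python
import math
import random

def glorot_uniform(fan_in, fan_out, rng):
    """Uniform in [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

def he_normal(fan_in, fan_out, rng):
    """Zero-mean Gaussian with std = sqrt(2 / fan_in), often paired with ReLU."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
w_glorot = glorot_uniform(4, 3, rng)
w_he = he_normal(4, 3, rng)
```

Both scale the spread of the weights by layer width so that activations and gradients neither blow up nor vanish as depth grows.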

Activation Function Intermediate

Nonlinear functions enabling networks to approximate complex mappings; ReLU variants dominate modern DL.

Foundations & Theory
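
For illustration, ReLU and one of its variants written as scalar functions; the `negative_slope` default is an assumed, commonly seen value.

```python
def relu(x):
    """Pass positive inputs through unchanged and clamp negatives to zero."""
    return max(0.0, x)

def leaky_relu(x, negative_slope=0.01):
    """Like ReLU, but keeps a small slope for negative inputs."""
    return x if x > 0 else negative_slope * x

outputs = [relu(v) for v in (-2.0, 0.0, 3.0)]
```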

Attention Intermediate

Mechanism that computes context-aware weighted mixtures of representations; parallelizes well and captures long-range dependencies.

Transformers & LLMs
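
A dependency-free sketch of scaled dot-product attention, the form used in Transformers, assuming queries, keys, and values are given as lists of vectors. Single head, no masking; the function names are illustrative.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """For each query, return a softmax-weighted mixture of the value vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Two identical keys receive equal weight, so the output is the mean of the values.
mixed = attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[0.0], [2.0]])
```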

Transformer Intermediate

Architecture based on self-attention and feedforward layers; foundation of modern LLMs and many multimodal models.

Transformers & LLMs

Inter-Annotator Agreement Intermediate

Measure of consistency across labelers; low agreement indicates ambiguous tasks or poor guidelines.

Foundations & Theory
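
One common measure for two labelers is Cohen's kappa, which corrects raw agreement for agreement expected by chance. A minimal sketch with made-up labels; the function name is illustrative.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if each annotator labeled at random with their own label frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    # Undefined (division by zero) when both annotators use a single identical label.
    return (observed - expected) / (1 - expected)

annotator_a = ["x", "x", "y", "y"]
annotator_b = ["x", "y", "y", "y"]
kappa = cohens_kappa(annotator_a, annotator_b)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance.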
