Results for "human values"
Probability of data given parameters.
Tendency for agents to pursue resources regardless of final goal.
Ensuring learned behavior matches intended objective.
Learned subsystem that optimizes its own objective.
Maintaining alignment under new conditions.
Tradeoff between safety and performance.
Isolating AI systems.
Signals indicating dangerous behavior.
Tendency to gain control/resources.
Designing AI to cooperate with humans and each other.
A preference-based training method optimizing policies directly from pairwise comparisons without explicit RL loops.
Human or automated process of assigning targets; quality, consistency, and guidelines matter heavily.
Generating human-like speech from text.
European regulation classifying AI systems by risk.
Human-like understanding of physical behavior.
Human controlling robot remotely.
Robots learning via exploration and growth.
AI capable of performing most intellectual tasks humans can.
Ensuring AI allows shutdown.
System-level design for general intelligence.
A measurable property or attribute used as model input (raw or engineered), such as age, pixel intensity, or token ID.
A scalar measure optimized during training, typically expected loss over data, sometimes with regularization terms.
Plots true positive rate vs false positive rate across thresholds; summarizes separability.
Often more informative than ROC on imbalanced datasets; focuses on positive class performance.
A proper scoring rule measuring squared error of predicted probabilities for binary outcomes.
Methods to set starting weights to preserve signal/gradient scales across layers.
Nonlinear functions enabling networks to approximate complex mappings; ReLU variants dominate modern DL.
Mechanism that computes context-aware mixtures of representations; scales well and captures long-range dependencies.
Architecture based on self-attention and feedforward layers; foundation of modern LLMs and many multimodal models.
Measure of consistency across labelers; low agreement indicates ambiguous tasks or poor guidelines.