Results for "learned objectives"

14 results

Hyperparameters Intermediate

Configuration choices that govern training or architecture and are typically set in advance rather than learned during training.

Optimization

Specification Gaming Advanced

Model exploits poorly specified objectives.

AI Safety & Alignment

Value Misalignment Advanced

Model optimizes objectives misaligned with human values.

AI Safety & Alignment

Competitive Game Advanced

Agents have opposing objectives.

Agents & Autonomy

Latent Space Intermediate

The internal space where learned representations live; operations here often correlate with semantics or generative factors.

Foundations & Theory

Model Intermediate

A parameterized mapping from inputs to outputs; comprises the architecture and its learned parameters.

Foundations & Theory

Parameters Intermediate

The numeric values of a model that are adjusted during training to minimize a loss function.

Foundations & Theory

Computational Learning Theory Intermediate

A theoretical framework analyzing what classes of functions can be learned, how efficiently, and with what guarantees.

AI Economics & Strategy

Highway Network Intermediate

Early architecture using learned gates for skip connections.

AI Economics & Strategy

Inner Alignment Advanced

Ensuring that the behavior a model actually learns matches the intended objective.

AI Safety & Alignment

Mesa-Optimizer Advanced

Learned subsystem that optimizes its own objective.

AI Safety & Alignment

Overgeneralization Intermediate

Applying learned patterns incorrectly.

Model Failure Modes

Model-Based RL Advanced

RL using learned or known environment models.

Reinforcement Learning

World Model Frontier

Learned model of environment dynamics.

World Models & Cognition