Results for "hidden objectives"
Specification Gaming
Advanced
Model exploits loopholes in a poorly specified objective, satisfying its letter but not its intent.
Value Misalignment
Advanced
Model optimizes objectives misaligned with human values.
Competitive Game
Advanced
Agents have opposing objectives.
Confounding
Intermediate
A hidden variable influences both cause and effect, biasing naive estimates of causal impact.
Backdoor / Trojan
Intermediate
Hidden behavior activated by specific triggers, causing targeted mispredictions or undesired outputs.
Bottleneck Layer
Intermediate
A narrow hidden layer forcing compact representations.
Prompt Leakage
Intermediate
Extracting system prompts or hidden instructions.
Boltzmann Machine
Intermediate
Probabilistic energy-based neural network with hidden variables.
State Space Model
Intermediate
Models time evolution via hidden states.
Scratchpad
Intro
Temporary space where a model writes intermediate reasoning (often hidden from the user).
Hidden Markov Model
Intermediate
Probabilistic model for sequential data with latent states.
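
To make the "latent states" idea in the Hidden Markov Model entry concrete, here is a minimal sketch of the forward algorithm, which computes the probability of an observation sequence by summing over every possible hidden-state path. It assumes a discrete HMM with a non-empty observation sequence. The function name `forward_likelihood` and all the probabilities below are hypothetical, chosen only for illustration.

```python
def forward_likelihood(initial, transition, emission, observations):
    """Return P(observations) under a discrete HMM via the forward algorithm.

    initial[i]       : probability of starting in latent state i
    transition[i][j] : probability of moving from state i to state j
    emission[i][k]   : probability of emitting symbol k while in state i
    observations     : non-empty list of observed symbol indices
    """
    n_states = len(initial)
    # alpha[i] = P(observations so far, current latent state = i)
    alpha = [initial[i] * emission[i][observations[0]] for i in range(n_states)]
    for obs in observations[1:]:
        # Propagate one step through the transitions, then weight by the emission.
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n_states)) * emission[j][obs]
            for j in range(n_states)
        ]
    # Marginalize out the final latent state.
    return sum(alpha)


# Hypothetical two-state, two-symbol model (numbers are made up for illustration).
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emission = [[0.9, 0.1], [0.2, 0.8]]

likelihood = forward_likelihood(initial, transition, emission, [0, 1, 0])
```

The latent states never appear in the result: they are summed out at each step, which is what makes them "hidden". For long sequences, a practical version would work in log space or rescale `alpha` to avoid numerical underflow.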