Results for "hidden objectives"
Probabilistic energy-based neural network with hidden variables.
Probabilistic model for sequential data with latent states.
Model exploits poorly specified objectives.
Learned subsystem that optimizes its own objective.
Hidden behavior activated by specific triggers, causing targeted mispredictions or undesired outputs.
Extracting system prompts or hidden instructions.
Simplified Boltzmann Machine with bipartite structure.
Temporary reasoning space (often hidden).
Maximizing reward without fulfilling the real goal.
Tendency for agents to pursue resources regardless of their final goal.
Model optimizes objectives misaligned with human values.
Correctly specifying goals.
Willingness of a system to accept correction or shutdown.
Agents have opposing objectives.
Learning structure from unlabeled data, such as discovering groups, compressing representations, or modeling data distributions.
A parameterized function composed of interconnected units organized in layers with nonlinear activations.
Networks with recurrent connections for sequences; largely supplanted by Transformers for many tasks.
Neural networks with enough hidden units and a suitable nonlinear activation can approximate any continuous function on a compact domain to arbitrary accuracy.
A hidden variable influences both cause and effect, biasing naive estimates of causal impact.
A narrow hidden layer forcing compact representations.
Converting audio speech into text, often using encoder-decoder or transducer architectures.
Maps audio signals to linguistic units.
Temporal and pitch characteristics of speech.
Models time evolution via hidden states.
Inferring human goals from behavior.
Learning from data by constructing “pseudo-labels” (e.g., next-token prediction, masked modeling) without manual annotation.
Multiple agents interacting cooperatively or competitively.
Coordination arising without explicit programming.
Generator produces a limited variety of outputs.
Decomposing goals into sub-tasks.
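
Several entries above describe models that track latent or hidden states over a sequence. As a rough illustration only, the sketch below computes the likelihood of an observation sequence under a small discrete hidden-state model using the forward recursion. Reading those entries as a hidden Markov model is an assumption, and the function name, the two-state setup, and all probabilities are hypothetical values chosen for the example.

```python
def forward_likelihood(init, trans, emit, observations):
    """Return P(observations) for a discrete hidden-state sequence model.

    init[i]     : probability of starting in hidden state i
    trans[j][i] : probability of moving from hidden state j to state i
    emit[i][o]  : probability of observing symbol o while in state i
    """
    n_states = len(init)
    # alpha[i] = P(observations so far, current hidden state == i)
    alpha = [init[i] * emit[i][observations[0]] for i in range(n_states)]
    for obs in observations[1:]:
        # Sum over every possible previous state, then weight by the emission.
        alpha = [
            sum(alpha[j] * trans[j][i] for j in range(n_states)) * emit[i][obs]
            for i in range(n_states)
        ]
    # Marginalize out the final hidden state.
    return sum(alpha)


# Hypothetical two-state, two-symbol model (illustrative numbers only).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]

likelihood = forward_likelihood(init, trans, emit, [0, 1, 0])
```

The hidden states never appear in the output: they are summed out at each step, which is why the cost grows linearly with sequence length rather than exponentially.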