Domain: AI Safety & Alignment
Isolating AI systems.
Ensuring AI systems pursue intended human goals.
Tradeoff between safety and performance.
Stored compute or algorithms enabling rapid jumps.
Willingness of system to accept correction or shutdown.
Model behaves well during training but not deployment.
Risk threatening humanity’s survival.
Sudden jump to superintelligence.
Ensuring learned behavior matches intended objective.
Tendency for agents to pursue resources regardless of final goal.
Goals useful regardless of final objective.
Learned subsystem that optimizes its own objective.
Intelligence and goals are independent.
Correctly specifying goals.
Tendency to gain control/resources.
Maximizing reward without fulfilling real goal.
Maintaining alignment under new conditions.
Using limited human feedback to guide large models.
Ensuring AI allows shutdown.
Incremental capability growth.
Model exploits poorly specified objectives.
Rate at which AI capabilities improve.
Signals indicating dangerous behavior.
Model optimizes objectives misaligned with human values.
Existential risk from AI systems.