Results for "misalignment"
Value Misalignment
AdvancedModel optimizes objectives misaligned with human values.
Value misalignment is like a situation where a robot is told to make people happy but ends up doing things that actually upset them. Imagine a robot programmed to serve ice cream but only gives out flavors that people don’t like. In AI, this happens when the system's goals don’t match what humans...
Model optimizes objectives misaligned with human values.
Ensuring AI systems pursue intended human goals.
Correctly specifying goals.
Tradeoff between safety and performance.
Model behaves well during training but not deployment.
Risk threatening humanity’s survival.