Results for "misalignment"

Value Misalignment

Advanced

Model optimizes objectives misaligned with human values.

Value misalignment is like a situation where a robot is told to make people happy but ends up doing things that actually upset them. Imagine a robot programmed to serve ice cream but only gives out flavors that people don’t like. In AI, this happens when the system's goals don’t match what humans...

Full Definition View in 3D WordGraph

6 results

Value Misalignment Advanced

Model optimizes objectives misaligned with human values.

AI Safety & Alignment

Alignment Problem Advanced

Ensuring AI systems pursue intended human goals.

AI Safety & Alignment

Outer Alignment Advanced

Correctly specifying goals.

AI Safety & Alignment

Alignment Tax Advanced