Alignment

Intermediate

Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.

Full Definition

Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.

Keywords

Domains

Related Terms

Concept Map

See how Alignment connects to other concepts.

Open Knowledge Graph