DPO (Direct Preference Optimization)

Intermediate

A preference-based training method that optimizes a policy directly from pairwise comparisons, without an explicit RL loop.

Full Definition

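Direct Preference Optimization (DPO) is a preference-based training method that fine-tunes a policy, typically a language model, directly on pairwise comparisons: for each prompt, one response marked as preferred and one marked as dispreferred. Instead of fitting a separate reward model and then running a reinforcement learning loop such as PPO, DPO minimizes a classification-style loss that raises the policy's log-probability of the preferred response relative to the dispreferred one, with both measured against a frozen reference model. A coefficient β controls how strongly the policy is kept close to that reference.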
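
To make "directly from pairwise comparisons" concrete, here is a minimal sketch of the per-pair DPO loss in its commonly stated form, -log σ(β · [(log π(y_w|x) - log π_ref(y_w|x)) - (log π(y_l|x) - log π_ref(y_l|x))]). It assumes the summed log-probabilities of each response under the policy and under a frozen reference model have already been computed. The function names, argument names, and the default β = 0.1 are illustrative choices, not part of this entry.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed default; tuned per task in practice
) -> float:
    """DPO loss for one (chosen, rejected) response pair.

    Each argument is the log-probability of a full response given the prompt,
    under either the trainable policy or the frozen reference model.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled margin by which the policy favors the chosen response.
    margin = beta * (chosen_logratio - rejected_logratio)

    # Binary classification loss: push the margin to be positive.
    return -log_sigmoid(margin)


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, loss is log(2).
    print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
    # Policy has shifted probability toward the chosen response: lower loss.
    print(dpo_loss(-4.0, -8.0, -5.0, -7.0))
```

No reward model or sampling loop appears anywhere in this computation: the loss is an ordinary differentiable function of log-probabilities, which is why it can be minimized with standard supervised training machinery. In a real training setup the same expression would be evaluated on batched tensors and averaged.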
