DPO
Intermediate
A preference-based training method that optimizes policies directly from pairwise comparisons, without an explicit RL loop.
Full Definition
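Direct Preference Optimization (DPO), introduced by Rafailov et al. (2023), is a preference-based method for fine-tuning a language model policy directly on pairwise comparisons: each example is a prompt paired with a preferred and a dispreferred response. RLHF first fits a separate reward model and then runs a reinforcement learning loop such as PPO. DPO skips both steps and minimizes a classification-style loss. That loss raises the policy's log-probability ratio, measured against a frozen reference model, for preferred responses relative to dispreferred ones.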
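The loss can be sketched in plain Python. This is a minimal illustration, not a library API. It assumes each response's log-probability log p(y | x) has already been computed by summing its token log-probabilities under the trainable policy and under the frozen reference model. The function names and the default `beta = 0.1` (the strength of the implicit KL constraint) are illustrative assumptions.

```python
import math
from typing import Sequence


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,  # illustrative default, not a prescribed value
) -> float:
    """Mean DPO loss over a batch of preference pairs (hypothetical helper).

    Each argument holds one summed log-probability per pair: the preferred
    ("chosen") or dispreferred ("rejected") response, scored by the trainable
    policy or by the frozen reference model.
    """
    n = len(policy_chosen)
    if n == 0 or not (n == len(policy_rejected) == len(ref_chosen) == len(ref_rejected)):
        raise ValueError("all inputs must be non-empty and have the same length")

    total = 0.0
    for pc, pr, rc, rr in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
        # Log-ratio of policy to reference for each response.
        chosen_ratio = pc - rc
        rejected_ratio = pr - rr
        # Scaled margin between the implicit rewards of the two responses.
        margin = beta * (chosen_ratio - rejected_ratio)
        # -log sigmoid(margin): small when the chosen response is favored.
        total += -log_sigmoid(margin)
    return total / n


# When the policy equals the reference, the margin is 0 and the loss is log 2.
print(dpo_loss([-1.0], [-2.0], [-1.0], [-2.0]))
```

Because only log-probabilities enter the loss, no reward model or sampling loop is needed at training time. The reference log-probabilities can be precomputed once, and only the policy terms carry gradients in a real training setup.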