Reward Model

Intermediate

Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.

Full Definition

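A reward model is a model trained to predict human preferences (or utility) for candidate outputs. In RLHF-style pipelines it typically takes a prompt and a candidate response and returns a scalar score. It is usually fit on human comparisons between alternative responses, and its scores then serve as the reward signal when optimizing a policy model.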
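One way to make "trained to predict human preferences" concrete is a pairwise objective: given a preferred and a rejected output, push the preferred one's score higher. The sketch below assumes a Bradley-Terry style loss and a linear scorer over hand-made feature vectors. The feature meanings, the toy pairs, and the function names (`reward`, `pairwise_loss`, `train`) are all hypothetical illustrations, not part of any particular RLHF library; real reward models are usually neural networks scoring text directly.

```python
import math

def reward(weights, features):
    """Linear reward: dot product of weights and a feature vector."""
    return sum(w * x for w, x in zip(weights, features))

def pairwise_loss(weights, chosen, rejected):
    """Bradley-Terry negative log-likelihood: -log(sigmoid(r_chosen - r_rejected))."""
    margin = reward(weights, chosen) - reward(weights, rejected)
    # log(1 + exp(-margin)), split by sign to avoid overflow
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def train(pairs, dim, lr=0.1, epochs=200):
    """Fit weights by plain stochastic gradient descent on the pairwise loss."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Gradient of the loss w.r.t. weights is
            # -(1 - sigmoid(margin)) * (chosen - rejected), so we step the other way.
            coeff = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for i in range(dim):
                weights[i] += lr * coeff * (chosen[i] - rejected[i])
    return weights

# Hypothetical toy data: each output is [helpfulness, verbosity];
# each pair is (output the annotator preferred, output the annotator rejected).
PAIRS = [
    ([1.0, 0.2], [0.0, 0.8]),
    ([0.9, 0.5], [0.2, 0.5]),
    ([0.7, 0.1], [0.3, 0.9]),
]

if __name__ == "__main__":
    w = train(PAIRS, dim=2)
    for chosen, rejected in PAIRS:
        print(reward(w, chosen) > reward(w, rejected), pairwise_loss(w, chosen, rejected))
```

In an RLHF-style pipeline, the learned scoring function would then be called on new candidate outputs to provide the reward for policy optimization; the linear scorer here only stands in for that role.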
