Results for "shared reward"
Balancing exploring new behaviors against exploiting known rewards.
Ensuring AI systems pursue intended human goals.
Ensuring learned behavior matches intended objective.
Learning policies from expert demonstrations.
Model behaves well during training but not in deployment.
Tendency of an agent to gain control or resources.
Inferring and aligning with human preferences.
Learning only from data generated by the current policy.