Off-Policy Learning
Intermediate
Learning about one policy (the target policy) from data generated by a different policy (the behavior policy).
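Tabular Q-learning is a standard example of off-policy learning. The sketch below assumes a dictionary-backed Q-table keyed by (state, action) pairs. Its update bootstraps from the greedy action in the next state, whatever action the behavior policy (for instance an epsilon-greedy one) actually takes there. The state and action names and the step sizes are illustrative assumptions.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """Off-policy TD update (Q-learning).

    The target uses max over actions in s_next, i.e. the greedy target policy,
    independent of which action the behavior policy will choose next.
    """
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    td_target = r + gamma * best_next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]

# Illustrative Q-table; unseen (state, action) pairs default to 0.0.
Q = defaultdict(float)
Q[("s1", "left")] = 1.0
Q[("s1", "right")] = 5.0
```

With `alpha=0.5` and `gamma=0.9`, updating `("s0", "left")` after reward 1 and next state `"s1"` uses the target 1 + 0.9 × 5 = 5.5. That target holds even if the behavior policy goes on to pick `"left"` in `"s1"`.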
On-Policy Learning
Intermediate
Learning only from data generated by the current policy, i.e., the same policy that is being evaluated and improved.
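SARSA is a standard example of on-policy learning. The sketch below assumes a dictionary-backed Q-table and an epsilon-greedy policy. The same policy both generates the actions and supplies the bootstrap action a' used in the update. The state and action names and the step sizes are illustrative assumptions.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, s, actions, epsilon, rng):
    """Current policy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy TD update (SARSA).

    Bootstraps from Q(s_next, a_next), where a_next is the action the
    current policy actually selected in s_next.
    """
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]

# Illustrative Q-table; unseen (state, action) pairs default to 0.0.
Q = defaultdict(float)
Q[("s1", "left")] = 1.0
Q[("s1", "right")] = 5.0
```

If the current policy explores and picks `"left"` in `"s1"`, the target is r + γ·Q(s1, left), not the greedy value. The estimate therefore reflects the exploratory policy actually being followed.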
RLHF
Intermediate
Reinforcement learning from human feedback: uses human preference data to train a reward model, then optimizes the policy against that reward model.
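The reward-model step is often trained with a pairwise (Bradley-Terry style) loss, −log σ(r(chosen) − r(rejected)). The sketch below assumes a hypothetical linear reward model over hand-made feature vectors. Real systems use a neural network over model outputs. The second step, optimizing the policy against the learned reward (for example with PPO), is not shown.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reward(w, x):
    """Hypothetical linear reward model: r(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def preference_loss(w, chosen, rejected):
    """-log sigmoid(r(chosen) - r(rejected)) == softplus(-(r_c - r_r))."""
    return softplus(-(reward(w, chosen) - reward(w, rejected)))

def train_reward_model(pairs, dim, lr=0.5, epochs=100):
    """Gradient descent on the pairwise loss over (chosen, rejected) feature pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            diff = reward(w, chosen) - reward(w, rejected)
            scale = sigmoid(-diff)  # 1 - P(chosen preferred); shrinks as the model agrees
            for i in range(dim):
                w[i] += lr * scale * (chosen[i] - rejected[i])
    return w
```

With untrained weights the loss is log 2 (the model is indifferent). Training pushes the reward of preferred examples above that of rejected ones.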
Audit
Intermediate
Systematic review of model/data processes to ensure performance, fairness, security, and policy compliance.
Red Teaming
Intermediate
Stress-testing models for failures, vulnerabilities, policy violations, and harmful behaviors before release.
Actor-Critic
Intermediate
Combines value estimation (critic) with policy learning (actor): the critic's value estimates guide the actor's policy updates and reduce their variance.
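A minimal one-step actor-critic on a hypothetical single-state, three-armed bandit. Each episode ends after one action, so the critic's TD error reduces to r − V(s). The actor is a softmax over action preferences, and for that parameterization ∇ log π(a) with respect to preference i is 1[i = a] − π(i). The arm rewards, step sizes, and seed are assumptions for illustration.

```python
import math
import random

def softmax(prefs):
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def actor_critic_bandit(arm_rewards, steps=2000, alpha_actor=0.1, alpha_critic=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0] * len(arm_rewards)  # actor: softmax action preferences
    value = 0.0                       # critic: estimate of V(s) for the single state
    for _ in range(steps):
        probs = softmax(prefs)
        a = rng.choices(range(len(prefs)), weights=probs)[0]
        r = arm_rewards[a]
        td_error = r - value          # next state is terminal, so the TD target is just r
        value += alpha_critic * td_error
        for i in range(len(prefs)):
            grad_log_pi = (1.0 if i == a else 0.0) - probs[i]
            prefs[i] += alpha_actor * td_error * grad_log_pi
    return prefs, value
```

Scaling the actor update by the critic's TD error instead of the raw reward is what distinguishes this from plain REINFORCE. Actions that beat the critic's expectation are reinforced, and the rest are discouraged.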
Explainability Requirement
Intermediate
Legal or policy requirement to explain AI decisions.
Policy
Intermediate
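A strategy mapping states to actions, either deterministically (one action per state) or stochastically (a probability distribution over actions for each state).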
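A tabular sketch of both kinds of policy, using hypothetical gridworld state and action names. A deterministic policy is a plain state-to-action lookup. A stochastic policy stores a distribution per state and samples from it.

```python
import random

# Hypothetical states and actions, for illustration only.
deterministic_policy = {"start": "right", "middle": "right", "near_goal": "up"}

stochastic_policy = {
    "start": {"right": 0.8, "up": 0.2},
    "middle": {"right": 0.5, "up": 0.5},
    "near_goal": {"up": 1.0},
}

def act(policy, state, rng):
    """Return an action: look it up if deterministic, sample it if stochastic."""
    choice = policy[state]
    if isinstance(choice, str):
        return choice
    actions, probs = zip(*choice.items())
    return rng.choices(actions, weights=probs)[0]
```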
Policy Gradient
Intermediate
Optimizing a parameterized policy directly via gradient ascent on expected return (cumulative reward).
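REINFORCE (Monte Carlo policy gradient) is the basic instance. It ascends the estimate G ∇ log π(a), where G is the return that followed action a. The sketch below assumes a hypothetical three-armed bandit with a softmax policy over preferences. Every episode is one step, so G is just the reward, and no baseline is used. The rewards, step size, and seed are illustrative.

```python
import math
import random

def softmax(prefs):
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(arm_rewards, steps=2000, alpha=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0] * len(arm_rewards)  # policy parameters (softmax preferences)
    for _ in range(steps):
        probs = softmax(prefs)
        a = rng.choices(range(len(prefs)), weights=probs)[0]
        G = arm_rewards[a]            # one-step episode: return == reward
        for i in range(len(prefs)):
            # gradient of log softmax(prefs)[a] w.r.t. prefs[i]
            grad_log_pi = (1.0 if i == a else 0.0) - probs[i]
            prefs[i] += alpha * G * grad_log_pi
    return prefs
```

In expectation each update follows the true gradient of expected reward, so probability mass shifts toward the best arm. Individual updates are noisy, and a baseline is the usual way to reduce that variance.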
Policy Search
Advanced
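Directly optimizing the parameters of control policies to maximize expected return, using gradient-based or gradient-free (e.g., evolutionary or random-search) methods.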
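One simple gradient-free instance is hill climbing in parameter space: perturb the parameters, evaluate the policy, and keep the perturbation only if the return improves. The sketch below assumes a hypothetical deterministic task with a linear policy a = θ₀·s + θ₁ and reward −(a − (2s + 1))², so the optimum is θ = [2, 1]. The states, noise scale, and iteration count are illustrative.

```python
import random

STATES = [-1.0, 0.0, 1.0, 2.0]  # hypothetical evaluation states

def policy(theta, s):
    """Linear deterministic policy: action = theta[0] * s + theta[1]."""
    return theta[0] * s + theta[1]

def evaluate(theta):
    """Return of the policy on the hypothetical task (0 is the best possible)."""
    return sum(-(policy(theta, s) - (2.0 * s + 1.0)) ** 2 for s in STATES)

def hill_climb(iterations=3000, sigma=0.1, seed=0):
    """Random-perturbation hill climbing over policy parameters."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    best = evaluate(theta)
    for _ in range(iterations):
        candidate = [t + rng.gauss(0.0, sigma) for t in theta]
        score = evaluate(candidate)
        if score > best:  # keep only improvements
            theta, best = candidate, score
    return theta, best
```

No gradient of the return is needed, only the ability to evaluate a policy. That is why methods of this family also apply when the environment is a black box.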