Results for "policies"
Optimizing policies directly via gradient ascent on expected reward.
A preference-based training method optimizing policies directly from pairwise comparisons without explicit RL loops.
Policies and practices for approving, monitoring, auditing, and documenting models in production.
Strategy mapping states to actions.
Finding control policies minimizing cumulative cost.
Framework for reasoning about cause-effect relationships beyond correlation, often using structural assumptions and experiments.
Processes and controls for data quality, access, lineage, retention, and compliance across the AI lifecycle.
Measures how one probability distribution diverges from another.
Systematic review of model/data processes to ensure performance, fairness, security, and policy compliance.
Set of all actions available to the agent.
Formal framework for sequential decision-making under uncertainty.
Fundamental recursive relationship defining optimal value functions.
Expected cumulative reward from a state or state-action pair.
Balancing exploration of new behaviors against exploitation of known rewards.
Expected return of taking a given action in a given state, then following the policy.
Storing computed results for reuse to avoid redundant computation.
Learning from data generated by a different policy.
Algorithm computing control actions from system state or measurements.
Model performs well during training but fails to generalize at deployment.
RL without an explicit dynamics model.
RL using learned or known environment models.
Directly optimizing control policies.
Learning policies from expert demonstrations.
Regulating access to large-scale compute.
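Several of the entries above (Markov decision process, Bellman equation, value function, Q-function) fit together in one algorithm. The sketch below is a minimal value iteration on a toy MDP; the two-state transition matrix, rewards, and discount factor are invented for illustration, not taken from any entry:

```python
import numpy as np

# Minimal value-iteration sketch illustrating the Bellman optimality recursion:
#   V(s) = max_a [ R(s, a) + gamma * sum_s' P(s'|s, a) V(s') ]
# The MDP below (2 states, 2 actions) is a made-up toy example.

gamma = 0.9  # discount factor

# P[a, s, s'] = probability of moving from s to s' under action a.
P = np.array([
    [[0.8, 0.2], [0.1, 0.9]],   # action 0
    [[0.5, 0.5], [0.6, 0.4]],   # action 1
])
# R[s, a] = immediate reward for taking action a in state s.
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])

V = np.zeros(2)
for _ in range(500):
    # Q[s, a] = immediate reward plus discounted expected next-state value.
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    V_new = Q.max(axis=1)          # Bellman optimality backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        break                      # converged to a fixed point
    V = V_new

policy = Q.argmax(axis=1)          # greedy policy: best action per state
print("V* =", V, " policy =", policy)
```

The greedy policy extracted at the end is exactly the "strategy mapping states to actions" from the glossary, and the fixed point of the loop satisfies the recursive Bellman relationship.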