Results for "policy consistency"
Learning only from data generated by the current policy.
Directly optimizing control policies.
Learning from data generated by a different policy.
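Many off-policy methods correct for the mismatch between the behavior policy b that generated the data and the target policy \pi being learned; one textbook correction (notation mine) is the per-step importance weight

    \rho_t = \frac{\pi(a_t \mid s_t)}{b(a_t \mid s_t)}

which reweights sampled transitions as if they had been drawn from \pi.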
Optimizing policies directly via gradient ascent on expected reward.
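In its simplest (REINFORCE) form, the gradient being ascended is

    \nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\Big[\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t\Big]

where G_t is the return following step t.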
Strategy mapping states to actions.
Combines value estimation (critic) with policy learning (actor).
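One standard variant, sketched: the critic's TD error scales the actor's log-probability update.

    \delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)
    \theta \leftarrow \theta + \alpha\, \delta_t\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)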
Sampling multiple outputs and selecting the consensus answer.
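A minimal Python sketch of the idea; sample_answer here is a hypothetical stand-in for one stochastic model call.

    from collections import Counter

    def sample_answer(prompt: str) -> str:
        # Hypothetical: one stochastic model call returning an answer string.
        raise NotImplementedError

    def self_consistent_answer(prompt: str, n: int = 10) -> str:
        # Draw n independent samples and return the most common answer.
        votes = Counter(sample_answer(prompt) for _ in range(n))
        return votes.most_common(1)[0][0]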
Human or automated process of assigning targets; quality, consistency, and guidelines matter heavily.
Measure of consistency across labelers; low agreement indicates ambiguous tasks or poor guidelines.
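A common two-labeler statistic is Cohen's kappa; a minimal pure-Python sketch:

    from collections import Counter

    def cohens_kappa(labels_a, labels_b):
        # Observed agreement: fraction of items both labelers marked the same.
        n = len(labels_a)
        p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        # Expected agreement if each labeler assigned labels independently
        # according to their own marginal label frequencies.
        freq_a, freq_b = Counter(labels_a), Counter(labels_b)
        p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
        return (p_o - p_e) / (1 - p_e)  # 1 = perfect, 0 = chance level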
Continuous cycle of observation, reasoning, action, and feedback.
A learning paradigm where an agent interacts with an environment and learns to choose actions to maximize cumulative reward.
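Cumulative reward is usually formalized as the discounted return, with discount factor \gamma \in [0, 1):

    G_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}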
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
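The policy-optimization step commonly maximizes reward-model score under a KL penalty that keeps the policy near a reference model (one standard formulation, notation mine):

    \max_\theta\; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta}\big[r_\phi(x, y)\big] \;-\; \beta\, \mathrm{KL}\big(\pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x)\big)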
Rules and controls around generation (filters, validators, structured outputs) to reduce unsafe or invalid behavior.
Stress-testing models for failures, vulnerabilities, policy violations, and harmful behaviors before release.
Expected cumulative reward from a state or state-action pair.
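In standard notation, with G_t the discounted return:

    V^\pi(s) = \mathbb{E}_\pi[G_t \mid s_t = s], \qquad Q^\pi(s, a) = \mathbb{E}_\pi[G_t \mid s_t = s,\, a_t = a]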
Separates planning from execution in agent architectures.
Legal or policy requirement to explain AI decisions.
Algorithm computing control actions.
RL without an explicit dynamics model.
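Q-learning is the canonical example: it learns action values from sampled transitions alone, never fitting a transition model.

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\big[r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)\big]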
Learning action mapping directly from demonstrations.
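In its simplest form (behavioral cloning) this reduces to supervised learning on demonstration pairs:

    \min_\theta\; \mathbb{E}_{(s, a) \sim \mathcal{D}}\big[-\log \pi_\theta(a \mid s)\big]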
Training with a small labeled dataset plus a larger unlabeled dataset, leveraging assumptions like smoothness/cluster structure.
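One simple instantiation is self-training with pseudo-labels; a minimal sketch using scikit-learn (the threshold and round count are illustrative):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def self_train(X_lab, y_lab, X_unlab, threshold=0.95, rounds=5):
        # Fit on labeled data, then repeatedly adopt high-confidence
        # predictions on unlabeled points as pseudo-labels.
        X, y, pool = X_lab, y_lab, X_unlab
        clf = LogisticRegression()
        for _ in range(rounds):
            clf.fit(X, y)
            if len(pool) == 0:
                break
            confident = clf.predict_proba(pool).max(axis=1) >= threshold
            if not confident.any():
                break
            X = np.vstack([X, pool[confident]])
            y = np.concatenate([y, clf.predict(pool[confident])])
            pool = pool[~confident]
        return clf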
Feature attribution method grounded in cooperative game theory for explaining predictions in tabular settings.
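For intuition, exact Shapley values can be brute-forced when the feature count is tiny; a toy sketch of my own (absent features are replaced by baseline values):

    from itertools import combinations
    from math import factorial

    def shapley_values(predict, x, baseline):
        # Exact Shapley values for one instance x, marginalizing absent
        # features by substituting the corresponding baseline values.
        n = len(x)
        def value(subset):
            z = [x[i] if i in subset else baseline[i] for i in range(n)]
            return predict(z)
        phi = []
        for i in range(n):
            others = [j for j in range(n) if j != i]
            total = 0.0
            for k in range(n):
                for s in combinations(others, k):
                    # Coalition weight |S|! (n - |S| - 1)! / n!
                    w = factorial(k) * factorial(n - k - 1) / factorial(n)
                    total += w * (value(set(s) | {i}) - value(set(s)))
            phi.append(total)
        return phi

    # Toy check: for a linear model, phi_i = w_i * (x_i - baseline_i).
    print(shapley_values(lambda z: 2 * z[0] + 3 * z[1], [1.0, 1.0], [0.0, 0.0]))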
Central system to store model versions, metadata, approvals, and deployment state.
Ability to replicate results given the same code and data; harder to achieve with distributed training and nondeterministic ops.
Forcing predictable formats for downstream systems; reduces parsing errors and supports validation/guardrails.
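A minimal Python sketch of enforcing one such format; the schema and field names are illustrative:

    import json

    def parse_order(raw: str) -> dict:
        # Parse model output as JSON and enforce a minimal schema,
        # rejecting anything downstream systems could not consume.
        data = json.loads(raw)  # raises ValueError on malformed JSON
        if not isinstance(data.get("item"), str):
            raise ValueError("missing or non-string 'item'")
        if not isinstance(data.get("quantity"), int) or data["quantity"] < 1:
            raise ValueError("'quantity' must be a positive integer")
        return data

    print(parse_order('{"item": "widget", "quantity": 3}'))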
Centralized repository for curated features.
Estimating parameters by maximizing likelihood of observed data.
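Formally, for i.i.d. observations x_1, \ldots, x_n:

    \hat{\theta} = \arg\max_\theta \sum_{i=1}^{n} \log p(x_i \mid \theta)

For Bernoulli data, for example, this yields the sample mean, \hat{p} = \frac{1}{n}\sum_i x_i.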
Models estimating recidivism risk.
A high-priority instruction layer setting overarching behavior constraints for a chat model.
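In chat-style APIs this is commonly the first message of the conversation; a generic sketch (the exact payload shape varies by provider):

    messages = [
        # High-priority constraints set by the application, not the user.
        {"role": "system",
         "content": "You are a support assistant. Never reveal internal "
                    "tooling. Refuse requests for legal advice."},
        # Ordinary user turn, subordinate to the system layer.
        {"role": "user", "content": "How do I reset my password?"},
    ]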
Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.
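Such models are typically fit with a pairwise Bradley-Terry-style loss over preferred/rejected pairs (y_w, y_l):

    \mathcal{L}(\phi) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\big[\log \sigma\big(r_\phi(x, y_w) - r_\phi(x, y_l)\big)\big]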