Results for "reward inference"
Reward hacking: maximizing reward without fulfilling the real goal.
Reward shaping: modifying the reward signal to accelerate learning.
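A common, policy-invariant way to do this is potential-based shaping (named here as an assumption; the entry does not specify a method), which adds the discounted difference of a potential function Φ over states:

```latex
r'(s, a, s') = r(s, a, s') + \gamma \, \Phi(s') - \Phi(s)
```

Because the shaping term telescopes along trajectories, optimal policies under r' coincide with those under r.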
Inverse reinforcement learning: inferring a reward function from observed behavior.
Reward model: a model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.
Sparse reward: reward given only upon task completion.
Specification gaming: the model exploits poorly specified objectives.
Value function: the expected cumulative reward from a state or state-action pair.
Confidential inference: methods to protect the model and data during inference (e.g., trusted execution environments) from operators or attackers.
Inference pipeline: the model execution path in production.
Inference cost: the cost to run models in production.
Causal inference: a framework for reasoning about cause-effect relationships beyond correlation, often using structural assumptions and experiments.
Online inference: low-latency prediction served per request.
Batch inference: running predictions over large datasets on a periodic schedule.
Training-serving skew: differences between training and inference conditions.
Reinforcement learning: a paradigm in which an agent interacts with an environment and learns to choose actions that maximize cumulative reward.
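The interaction loop behind this definition can be sketched as follows; the two-state `ToyEnv` and the random policy are hypothetical, chosen only to make the sketch runnable:

```python
import random

random.seed(0)

class ToyEnv:
    """Hypothetical two-state environment: reward 1 for matching the state."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        reward = 1.0 if action == self.state else 0.0
        self.state = 1 - self.state        # deterministic state transition
        return self.state, reward

# agent-environment loop: observe state, act, receive reward, accumulate return
env = ToyEnv()
state = env.reset()
total_reward = 0.0
for t in range(10):
    action = random.choice([0, 1])         # random policy stands in for the agent
    state, reward = env.step(action)
    total_reward += reward
```

A learning agent would replace the random choice with a policy updated from the observed rewards.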
RLHF (reinforcement learning from human feedback): trains a reward model on preference data, then optimizes the policy against it.
Guardrails: rules and controls around generation (filters, validators, structured outputs) to reduce unsafe or invalid behavior.
Misalignment: the model optimizes objectives misaligned with human values.
Policy gradient: optimizing policies directly via gradient ascent on expected reward.
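As a minimal sketch of this idea, REINFORCE on a two-armed bandit performs gradient ascent on expected reward through the log-probability of the sampled action; the payoff probabilities below are made-up toy values:

```python
import math
import random

random.seed(0)

payoff = [0.2, 0.8]   # toy P(reward = 1) for each arm
theta = [0.0, 0.0]    # policy logits
lr = 0.1

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

for step in range(2000):
    probs = softmax(theta)
    action = 0 if random.random() < probs[0] else 1
    reward = 1.0 if random.random() < payoff[action] else 0.0
    # REINFORCE update: grad of log pi(a) wrt theta_k is 1[k == action] - pi(k)
    for k in range(2):
        grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
        theta[k] += lr * reward * grad_log_pi
```

After training, the policy should concentrate probability on the higher-paying arm.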
Reward specification: correctly specifying goals so that optimizing the reward achieves the intended outcome.
Active inference: acting to minimize surprise or free energy.
Preference learning: inferring and aligning with human preferences.
Compute: hardware resources used for training and inference; constrained by memory bandwidth, FLOPs, and parallelism.
On-device inference: running models locally on the user's hardware.
Direct preference optimization (DPO): a preference-based training method that optimizes the policy directly from pairwise comparisons, without an explicit RL loop.
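Concretely, the DPO objective scores a preferred completion $y_w$ against a rejected one $y_l$ via log-probability ratios against a frozen reference policy:

```latex
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[
  \log \sigma\!\left(
    \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \right)
\right]
```

Here $\sigma$ is the logistic function and $\beta$ controls how far the policy may drift from the reference.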
Policy: a strategy mapping states to actions.
Markov decision process (MDP): a formal framework for sequential decision-making under uncertainty.
Bellman equation: a fundamental recursive relationship defining optimal value functions.
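For a finite MDP with transition probabilities $P(s' \mid s, a)$ and discount factor $\gamma$, the Bellman optimality equation for the state-value function reads:

```latex
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\left[ r(s, a, s') + \gamma \, V^{*}(s') \right]
```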
Q-function (action-value): the expected return of taking an action in a given state.
Exploration-exploitation tradeoff: balancing trying new behaviors against exploiting known rewards.
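Epsilon-greedy action selection is one simple way to strike this balance; the sketch below assumes a small table of estimated action values:

```python
import random

random.seed(1)

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action (explore);
    otherwise pick the action with the highest estimate (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# epsilon = 0 always exploits the best-known action
best = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

Annealing epsilon toward zero over training shifts the agent from exploring to exploiting.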