Results for "reward inference"

25 results

Reward Hacking Advanced

Maximizing the specified reward without fulfilling the intended goal.

AI Safety & Alignment
Reward Shaping Advanced

Modifying the reward signal (e.g., adding intermediate rewards) to accelerate learning.

Reinforcement Learning
Sparse Reward Advanced

Reward given rarely, often only upon task completion.

Reinforcement Learning
Secure Inference Intermediate

Methods to protect model/data during inference (e.g., trusted execution environments) from operators/attackers.

Foundations & Theory
Reinforcement Learning Intermediate

A learning paradigm where an agent interacts with an environment and learns to choose actions to maximize cumulative reward.

Reinforcement Learning
RLHF Intermediate

Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.

Optimization
Value Function Intermediate

Expected cumulative (typically discounted) reward from a state or state-action pair under a given policy.

AI Economics & Strategy
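
As a rough illustration of the Value Function entry, the sketch below estimates a state's value by averaging discounted returns over sampled reward sequences (a Monte Carlo estimate). The function names, the example rewards, and the discount factor `gamma` are assumptions made for illustration; the entry itself does not specify them.

```python
def discounted_return(rewards, gamma=0.9):
    """Return sum_t gamma**t * rewards[t] for one trajectory."""
    total = 0.0
    # Work backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def monte_carlo_value(trajectories, gamma=0.9):
    """Average the discounted returns of trajectories that start in the same state."""
    returns = [discounted_return(t, gamma) for t in trajectories]
    return sum(returns) / len(returns)


# Hypothetical reward sequences, all starting from the same state.
sampled = [[1, 1, 1], [0, 0, 4]]
estimate = monte_carlo_value(sampled, gamma=0.5)
```

With more sampled trajectories the average tends toward the expected return that the definition describes.
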
Policy Gradient Intermediate

Optimizing policies directly via gradient ascent on expected reward.

AI Economics & Strategy
Inverse Reinforcement Learning Advanced

Inferring a reward function from observed behavior.

Reinforcement Learning
Latency Intermediate

Time from request to response; critical for real-time inference and UX.

Foundations & Theory
Compute Intermediate

Hardware resources used for training/inference; constrained by memory bandwidth, FLOPs, and parallelism.

Foundations & Theory
Quantization Intermediate

Reducing numeric precision of weights/activations to speed inference and reduce memory with acceptable accuracy loss.

Foundations & Theory
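
To make the Quantization entry concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. This is one common scheme among several, chosen here as an assumption; the helper names and sample values are hypothetical, and production libraries handle details (per-channel scales, zero points, calibration) that are omitted.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero input, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in quantized]


weights = [127.0, -64.0, 0.0, 10.4]       # hypothetical weight values
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
# The gap between weights and restored is the accuracy loss the entry mentions.
errors = [abs(w - r) for w, r in zip(weights, restored)]
```

Each value is stored in one byte instead of four (for float32), at the cost of a rounding error of at most half the scale per value.
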
Causal Mask Intermediate

Attention mask that prevents each position from attending to future tokens during training/inference.

AI Economics & Strategy
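
The Causal Mask entry can be sketched as a lower-triangular boolean matrix applied before the attention softmax. The code below is a plain-Python illustration under that assumption; `causal_mask` and `masked_softmax` are hypothetical helper names, not functions from any particular library.

```python
import math


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask):
    """Row-wise softmax that assigns zero weight to masked-out positions."""
    weights = []
    for row, allowed in zip(scores, mask):
        # Subtract the largest allowed score for numerical stability.
        peak = max(s for s, ok in zip(row, allowed) if ok)
        exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


mask = causal_mask(3)
# Uniform hypothetical scores make the effect of the mask easy to see.
attention = masked_softmax([[0.0, 0.0, 0.0]] * 3, mask)
```

Here the first position attends only to itself, the second splits its weight over the first two positions, and only the last position sees the whole sequence.
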
Instrumental Variable Advanced

Variable that influences the outcome only through the treatment, enabling causal inference despite unobserved confounding.

Causal AI & Interpretability
Exposure Bias Intermediate

Mismatch between training, where the model conditions on ground-truth prefixes, and inference, where it conditions on its own outputs.

Model Failure Modes
Token Budgeting Intermediate

Capping the tokens consumed per request, user, or task to control inference cost and latency.

AI Economics & Strategy
Reward Model Intermediate

Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.

Foundations & Theory
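
One common way to train a reward model on preference pairs is a Bradley-Terry style pairwise loss, which pushes the score of the preferred output above the score of the rejected one. The sketch below assumes that formulation; the entry does not say which loss is used, and `preference_loss` and the scalar scores are hypothetical.

```python
import math


def preference_loss(score_chosen, score_rejected):
    """Pairwise loss -log(sigmoid(score_chosen - score_rejected))."""
    diff = score_chosen - score_rejected
    # Stable form of log(1 + exp(-diff)), which equals -log(sigmoid(diff)).
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# Hypothetical reward-model scores for two candidate outputs.
loss_agrees = preference_loss(2.0, 0.0)      # model already prefers the chosen output
loss_disagrees = preference_loss(0.0, 2.0)   # model prefers the rejected output
loss_tied = preference_loss(0.0, 0.0)        # model is indifferent
```

The loss is small when the model ranks the pair the way the human labeler did and grows as the ranking is reversed.
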
Causal Inference Intermediate

Framework for reasoning about cause-effect relationships beyond correlation, often using structural assumptions and experiments.

Foundations & Theory
Bayesian Inference Intermediate

Updating beliefs about parameters using observed evidence and prior distributions.

AI Economics & Strategy
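
A small worked case of the Bayesian Inference entry is the Beta-Binomial update, where a Beta prior over a success probability is updated with observed successes and failures. The prior, the counts, and the function names below are assumptions picked for illustration.

```python
def beta_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data under a Beta prior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Uniform Beta(1, 1) prior, then 7 successes and 3 failures (hypothetical data).
prior = (1, 1)
posterior = beta_update(*prior, successes=7, failures=3)
prior_mean = beta_mean(*prior)
posterior_mean = beta_mean(*posterior)
```

The posterior mean moves from the prior's 0.5 toward the observed success rate, with the prior acting like two extra pseudo-observations.
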
Inference Pipeline Intermediate

End-to-end path a request takes through preprocessing, model execution, and postprocessing in production.

MLOps & Infrastructure
Batch Inference Intermediate

Running predictions on large datasets periodically.

MLOps & Infrastructure
Online Inference Intermediate

Low-latency prediction per request.

MLOps & Infrastructure
Inference Cost Intermediate

Cost to run models in production.

AI Economics & Strategy
Edge Inference Intermediate

Running models on-device or near the data source rather than on remote servers.

AI Economics & Strategy
Active Inference Frontier

Framework in which agents act and update beliefs to minimize surprise (variational free energy).

World Models & Cognition