Results for "reward inference"
Ensuring AI systems pursue intended human goals.
Ensuring learned behavior matches intended objective.
Model behaves well during training but not in deployment.
Learning policies from expert demonstrations.
Tendency of an agent to acquire control or resources.
Learning only from data generated by the current policy.
A hidden variable influences both cause and effect, biasing naive estimates of causal impact.
Time from request to response; critical for real-time inference and UX.
Reducing numeric precision of weights/activations to speed inference and reduce memory with acceptable accuracy loss.
Updating beliefs about parameters using observed evidence and prior distributions.
Prevents attention to future tokens during training/inference.
Autoencoder using probabilistic latent variables and KL regularization.
Variable enabling causal inference despite confounding.
Limiting how much inference a client can consume, e.g., requests or tokens per unit time.
Probability of data given parameters.
Converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.
Controlled experiment comparing variants by random assignment to estimate causal effects of changes.
Selecting the most informative samples to label (e.g., uncertainty sampling) to reduce labeling cost.
A broader capability to infer internal system state from telemetry, crucial for AI services and agents.
How many requests or tokens can be processed per unit time; affects scalability and cost.
Attacks that infer whether specific records were in training data, or reconstruct sensitive training examples.
System design where humans validate or guide model outputs, especially for high-stakes decisions.
Measures how one probability distribution diverges from another.
Caches attention keys and values from past tokens to speed up autoregressive decoding.
Estimating parameters by maximizing likelihood of observed data.
Ensuring decisions can be explained and traced.
Recovering training data from gradients.
Inferring sensitive features of training data.
Probabilistic graphical model for structured prediction.
Probabilistic model for sequential data with latent states.
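
As a rough illustration of the last entry, the sketch below computes the likelihood of an observation sequence under a discrete model with latent states, using the standard forward recursion. The function name `forward_likelihood`, the two-state setup, and all probabilities are made up for the example and do not come from the glossary. It assumes a non-empty observation sequence and skips the scaling or log-space arithmetic that long sequences would need.

```python
def forward_likelihood(initial, transition, emission, observations):
    """Return P(observations) for a discrete latent-state sequence model.

    initial[i]       : P(first state = i)
    transition[i][j] : P(next state = j | current state = i)
    emission[i][k]   : P(observation = k | state = i)
    observations     : non-empty list of observation indices
    """
    n_states = len(initial)
    # alpha[i] = P(observations so far, current latent state = i)
    alpha = [initial[i] * emission[i][observations[0]] for i in range(n_states)]
    for obs in observations[1:]:
        # Propagate one step through the transitions, then weight by the emission.
        alpha = [
            emission[j][obs]
            * sum(alpha[i] * transition[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    # Marginalize over the final latent state.
    return sum(alpha)


# Hypothetical two-state model with three possible observations.
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emission = [[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]]

likelihood = forward_likelihood(initial, transition, emission, [0, 1, 2])
```

The recursion costs time proportional to the sequence length times the square of the number of states, whereas summing over every latent path explicitly would grow exponentially with sequence length.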