Results for "human values"
Standardized documentation describing intended use, performance, limitations, data, and ethical considerations.
Allows the model to attend jointly to information from different representation subspaces.
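A minimal NumPy sketch of the idea: the input is split into per-head subspaces and each head runs scaled dot-product attention independently. Identity projections are used for brevity; real implementations learn separate Q/K/V projections per head, and all shapes here are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, n_heads):
    """Toy self-attention: x is split into n_heads subspaces (no learned projections)."""
    seq, d_model = x.shape
    d_head = d_model // n_heads
    # (n_heads, seq, d_head): each head attends within its own subspace of x
    q = k = v = x.reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, seq, seq)
    out = softmax(scores) @ v                            # attend per head
    return out.transpose(1, 0, 2).reshape(seq, d_model)  # concatenate heads

x = np.random.default_rng(0).normal(size=(4, 8))  # seq_len=4, d_model=8
y = multi_head_attention(x, n_heads=2)
print(y.shape)  # (4, 8)
```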
Systematic error introduced by simplifying assumptions in a learning algorithm.
Caches previously computed keys and values so they are not recomputed at each autoregressive decoding step.
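A toy picture of the mechanism, with scalar keys/values and made-up numbers: each decoding step appends only the new token's key and value to the cache, then attends over everything cached so far instead of recomputing past keys and values.

```python
import math

def attend(q, keys, values):
    """Single-query attention over cached keys/values (scalars for brevity)."""
    scores = [q * k for k in keys]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return sum(wi / z * v for wi, v in zip(w, values))

# Autoregressive decoding: per step, append the new key/value and reuse the rest.
kv_cache = {"keys": [], "values": []}
for step, (k, v, q) in enumerate([(0.1, 1.0, 0.5), (0.4, 2.0, 0.2), (0.3, 3.0, 0.9)]):
    kv_cache["keys"].append(k)
    kv_cache["values"].append(v)
    out = attend(q, kv_cache["keys"], kv_cache["values"])
    print(f"step {step}: attended output {out:.3f}")
```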
Estimating parameters by maximizing the likelihood of the observed data.
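A small sketch for a Gaussian with known variance: the grid search below is purely illustrative (the Gaussian MLE for the mean has the closed form of the sample mean, which the search recovers).

```python
import math, random

random.seed(0)
data = [random.gauss(5.0, 1.0) for _ in range(500)]

def log_likelihood(mu, xs, sigma=1.0):
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - mu) ** 2 / (2 * sigma**2) for x in xs)

# Grid-search the mean; for a Gaussian the MLE coincides with the sample mean.
candidates = [i / 100 for i in range(300, 701)]
mle_mu = max(candidates, key=lambda mu: log_likelihood(mu, data))
print(mle_mu, sum(data) / len(data))
```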
Set of all actions available to the agent.
Expected cumulative reward from a state or state-action pair.
Fundamental recursive relationship defining optimal value functions.
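The recursion can be applied iteratively as value iteration; the two-state MDP below is invented for illustration, with the Bellman optimality backup V(s) ← max_a Σ_s' P(s'|s,a)[R + γV(s')] applied until convergence.

```python
GAMMA = 0.9
# transitions[s][a] = list of (prob, next_state, reward); all values illustrative.
transitions = {
    "s0": {"stay": [(1.0, "s0", 0.0)], "go": [(1.0, "s1", 1.0)]},
    "s1": {"stay": [(1.0, "s1", 2.0)], "go": [(1.0, "s0", 0.0)]},
}
V = {s: 0.0 for s in transitions}
for _ in range(200):  # repeated Bellman backups converge to V*
    V = {
        s: max(
            sum(p * (r + GAMMA * V[nxt]) for p, nxt, r in outcomes)
            for outcomes in actions.values()
        )
        for s, actions in transitions.items()
    }
print(V)  # s1: stay forever -> 2/(1-0.9) = 20; s0: go then stay -> 1 + 0.9*20 = 19
```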
Learning from data generated by a different policy.
Sequential data indexed by time.
Persistent directional movement over time.
Decomposes a matrix into orthogonal components; used in embeddings and compression.
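A quick illustration with NumPy's `linalg.svd` on an arbitrary small matrix: the full factorization reconstructs the matrix exactly, and truncating to the top singular value gives the best rank-1 (compressed) approximation.

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-1 approximation: keep only the largest singular value and its vectors.
A1 = s[0] * np.outer(U[:, 0], Vt[0])
print(np.allclose(U @ np.diag(s) @ Vt, A))  # True: exact reconstruction
print(np.linalg.norm(A - A1))               # error of the compressed form
```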
Measure of vector magnitude; used in regularization and optimization.
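A minimal sketch with a hypothetical `norm` helper covering the common cases (L1, L2, and max norm):

```python
import math

def norm(v, p=2):
    """p-norm of a vector; p=float('inf') gives the max (infinity) norm."""
    if math.isinf(p):
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1 / p)

v = [3.0, -4.0]
print(norm(v, 1), norm(v, 2), norm(v, float("inf")))  # 7.0 5.0 4.0
```

In regularization, the L1 norm of the weights encourages sparsity while the L2 norm shrinks weights smoothly.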
Describes the probabilities of a random variable's possible outcomes.
Average value under a distribution.
Measure of spread around the mean.
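Both quantities can be computed directly for a discrete distribution; a fair six-sided die serves as the illustrative example here.

```python
# Fair six-sided die: each outcome 1..6 has probability 1/6.
dist = {x: 1 / 6 for x in range(1, 7)}

mean = sum(p * x for x, p in dist.items())                     # E[X]
variance = sum(p * (x - mean) ** 2 for x, p in dist.items())   # E[(X - E[X])^2]
print(mean, variance)  # 3.5 and 35/12 ≈ 2.9167
```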
Visualization of the loss surface over model parameters, used to study optimization behavior.
Alternative formulation of an optimization problem whose optimum bounds (and, under strong duality, equals) the original objective.
Structured review of a system's performance, safety, and compliance before deployment.
Evaluating an AI system under real clinical workflows and patient populations rather than retrospective benchmarks.
Risk of losses from decisions based on flawed, misused, or mis-specified financial models.
Property of mechanisms, such as second-price auctions, in which bidding one's true value is a dominant strategy.
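The claim can be checked empirically for a sealed-bid second-price (Vickrey) auction; the simulation below is a spot-check under random rival bids, not a proof, and all numbers are made up.

```python
import random

def second_price_auction(bids):
    """Highest bidder wins and pays the second-highest bid."""
    winner = max(range(len(bids)), key=lambda i: bids[i])
    price = max(b for i, b in enumerate(bids) if i != winner)
    return winner, price

def utility(value, my_bid, other_bids):
    bids = [my_bid] + other_bids
    winner, price = second_price_auction(bids)
    return value - price if winner == 0 else 0.0

# Bidding the true value never does worse than any deviation, for any rivals.
random.seed(1)
value = 10.0
for _ in range(1000):
    others = [random.uniform(0, 20) for _ in range(3)]
    truthful = utility(value, value, others)
    deviation = utility(value, random.uniform(0, 20), others)
    assert deviation <= truthful + 1e-9
print("truthful bidding was always at least as good")
```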
A formal privacy framework ensuring outputs do not reveal much about any single individual’s data contribution.
Measures a hypothesis class’s ability to fit random noise; used to bound generalization error.
Learning only from data generated by the current policy.
Inferring the agent’s internal state from noisy sensor data.
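A minimal one-dimensional Kalman-filter sketch of this idea, assuming a nearly constant scalar state; the noise levels and motion model are illustrative choices, not a general recipe.

```python
import random

def kalman_1d(measurements, process_var=1e-4, meas_var=0.25):
    """Estimate a (nearly constant) scalar state from noisy measurements."""
    x, p = 0.0, 1.0  # initial state estimate and its variance
    estimates = []
    for z in measurements:
        p += process_var        # predict: uncertainty grows between measurements
        k = p / (p + meas_var)  # Kalman gain: trust in the new measurement
        x += k * (z - x)        # update the estimate toward the measurement
        p *= (1 - k)            # update shrinks the uncertainty
        estimates.append(x)
    return estimates

rng = random.Random(0)
true_state = 1.0
zs = [true_state + rng.gauss(0, 0.5) for _ in range(200)]
est = kalman_1d(zs)
print(est[-1])  # final estimate near the true state
```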
Studying internal mechanisms or input influence on outputs (e.g., saliency maps, SHAP, attention analysis).
Ordering training samples from easier to harder to improve convergence or generalization.
A model that assigns probabilities to sequences of tokens; often trained by next-token prediction.
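The simplest instance of next-token prediction is a bigram model, which estimates P(next | current) from counts; the tiny corpus below is invented for illustration.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

# Count bigram transitions: how often each word follows each other word.
counts = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    counts[cur][nxt] += 1

def next_token_probs(token):
    c = counts[token]
    total = sum(c.values())
    return {w: n / total for w, n in c.items()}

print(next_token_probs("the"))  # {'cat': 0.666..., 'mat': 0.333...}
```

Modern language models replace the count table with a neural network, but the training objective is the same: assign high probability to the observed next token.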
Tendency to trust automated suggestions even when incorrect; mitigated by UI design, training, and checks.