Results for "transition function"
Optimal state estimator for linear dynamic systems with Gaussian noise.
Expected return from taking a given action in a given state.
Expected cumulative reward from a state or state-action pair.
Formal framework for sequential decision-making under uncertainty.
Fundamental recursive relationship defining optimal value functions.
Set of all actions available to the agent.
Probabilistic model for sequential data with latent states.
Temporary reasoning space (often hidden).
A scalar measure optimized during training, typically expected loss over data, sometimes with regularization terms.
Probability of data given parameters.
Converts logits to probabilities by exponentiation and normalization; common in classification and LMs.
Lowest possible loss.
A function measuring prediction error (and sometimes calibration), guiding gradient-based optimization.
A parameterized function composed of interconnected units organized in layers with nonlinear activations.
Nonlinear functions enabling networks to approximate complex mappings; ReLU variants dominate modern DL.
Direction of steepest ascent of a function.
Neural networks with enough hidden units and a suitable nonlinear activation can approximate any continuous function on a compact domain to arbitrary accuracy.
Inferring a reward function from observed behavior.
Learning a function from input-output pairs (labeled data), optimizing performance on predicting outputs for unseen inputs.
A parameterized mapping from inputs to outputs; includes architecture + learned parameters.
The learned numeric values of a model adjusted during training to minimize a loss function.
Iterative method that updates parameters in the direction of the negative gradient to minimize loss.
Constraining model outputs into a schema used to call external APIs/tools safely and deterministically.
The shape of the loss function over parameter space.
Matrix of second derivatives describing local curvature of loss.
Combines value estimation (critic) with policy learning (actor).
Learns the score function (∇_x log p(x)) of the data distribution for generative sampling.
Matrix of first-order derivatives for vector-valued functions.
Matrix of curvature information.
Describes the probabilities of a random variable's outcomes.
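
Several of the results above (the sequential decision-making framework, the expected-return entries, the recursive optimality relationship, and the action set) fit together around a transition function. The sketch below is one way to see that: value iteration on a tiny, entirely hypothetical two-state problem. The table P, the rewards, the discount GAMMA, and the function names are assumptions made up for illustration, not taken from any of the listed entries.

```python
# Hypothetical 2-state, 2-action decision problem; all numbers are illustrative.
GAMMA = 0.9  # assumed discount factor

# P[s][a] is a list of (probability, next_state, reward) triples.
# This table plays the role of the transition function plus reward model.
P = {
    0: {0: [(1.0, 0, 0.0)],   # state 0, action 0: stay in 0, reward 0
        1: [(1.0, 1, 1.0)]},  # state 0, action 1: move to 1, reward 1
    1: {0: [(1.0, 0, 0.0)],   # state 1, action 0: move to 0, reward 0
        1: [(1.0, 1, 2.0)]},  # state 1, action 1: stay in 1, reward 2
}


def q_value(V, s, a):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def value_iteration(tol=1e-10, max_iters=10_000):
    """Repeat the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    for _ in range(max_iters):
        new_V = {s: max(q_value(V, s, a) for a in P[s]) for s in P}
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V
        V = new_V
    return V


V = value_iteration()
# Greedy policy: in each state, pick the action with the highest expected return.
policy = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}
```

In this toy setup the values can be checked by hand: staying in state 1 forever earns 2 / (1 - 0.9) = 20, and moving there from state 0 earns 1 + 0.9 * 20 = 19, so action 1 is preferred in both states.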