Results for "gradient of density"
Gradient descent: iterative method that updates parameters in the direction of the negative gradient to minimize a loss.
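A minimal sketch of the update rule, on a hypothetical quadratic loss f(w) = (w - 3)^2 (the function names and step size are illustrative, not from any particular library):

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient to reduce the loss."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)  # move in the negative-gradient direction
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)  # converges toward 3
```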
Gradient noise: variability in gradient estimates introduced by minibatch sampling during SGD.
Stochastic gradient descent (SGD): a gradient method using random minibatches for efficient training on large datasets.
Gradient clipping: limiting gradient magnitude to prevent exploding gradients.
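Clipping by global norm can be sketched as follows (a toy list-of-floats version; real frameworks apply the same rescaling to tensors):

```python
def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so their combined L2 norm never exceeds max_norm."""
    norm = sum(g * g for g in grads) ** 0.5
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grads]
    return list(grads)

clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)  # norm 5.0, rescaled down
```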
Policy gradient: optimizing policies directly via gradient ascent on expected reward.
Gradient: vector of partial derivatives; points in the direction of steepest ascent of a function.
Diffusion model: generative model that learns to reverse a gradual noising process.
Score-based model: learns the score ∇ log p(x), the gradient of the log-density, for generative sampling.
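In the query's literal sense, the score of a standard normal works out to ∇ log p(x) = -x; a small numerical check (function names are illustrative):

```python
import math

def log_density(x):
    """Log-density of a standard normal distribution."""
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)

def score(x):
    """Gradient of the log-density; for a standard normal it is -x."""
    return -x

# A central finite difference of the log-density matches the analytic score.
h = 1e-5
numeric = (log_density(1.0 + h) - log_density(1.0 - h)) / (2 * h)
```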
Exploding gradients: gradients grow too large, causing divergence; mitigated by clipping, normalization, and careful initialization.
Gradient inversion: reconstructing training data from shared gradients, a privacy risk in settings such as federated learning.
Maximum likelihood estimation (MLE): estimating parameters by maximizing the likelihood of the observed data.
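For a Gaussian the MLE has a closed form, the sample mean and the biased sample variance; a sketch:

```python
def mle_gaussian(data):
    """Closed-form Gaussian MLE: sample mean and biased sample variance."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return mu, var

mu, var = mle_gaussian([1.0, 2.0, 3.0])
```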
Normalizing flows: exact-likelihood generative models built from invertible transforms.
Probability density function: describes the relative likelihood of a continuous random variable taking given values.
Expected value: the average value of a random variable under its distribution.
Likelihood: the probability (or density) of the data viewed as a function of the parameters.
Computational chemistry: modeling chemical systems computationally.
Batch size: number of samples per gradient update; impacts compute efficiency, generalization, and stability.
Vanishing gradients: gradients shrink through layers, slowing learning in early layers; mitigated by ReLU, residual connections, and normalization.
Momentum: uses an exponential moving average of gradients to speed convergence and reduce oscillation.
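One heavy-ball-style update, sketched on f(w) = w² (the hyperparameters are illustrative defaults):

```python
def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    """One momentum update: v accumulates a decaying sum of past gradients."""
    v = beta * v + grad(w)
    w = w - lr * v
    return w, v

# Minimize f(w) = w^2 (gradient 2w) starting from w = 5.
w, v = 5.0, 0.0
for _ in range(300):
    w, v = momentum_step(w, v, lambda w: 2 * w)
```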
ReLU: the activation max(0, x); improves gradient flow and training speed in deep nets.
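As code (trivially):

```python
def relu(x):
    """Rectified linear unit: identity for positive inputs, zero otherwise."""
    return x if x > 0 else 0.0

# Its gradient is 1 for x > 0 and 0 for x < 0, so positive
# activations pass gradients through unattenuated.
```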
Actor-critic: combines value estimation (the critic) with policy learning (the actor).
Plateau: flat, high-dimensional regions of the loss surface where gradients are near zero, slowing training.
Line search: choosing the step size along a descent direction.
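A backtracking line search with the Armijo sufficient-decrease condition is one common recipe (a 1-D sketch; the constants are conventional defaults, not from any specific library):

```python
def backtracking_line_search(f, grad, w, c=1e-4, shrink=0.5, t=1.0):
    """Halve the step size t until the Armijo sufficient-decrease test passes."""
    g = grad(w)
    d = -g  # search along the negative gradient
    while f(w + t * d) > f(w) + c * t * g * d:
        t *= shrink
    return t

# f(w) = w^2 at w = 1: the full step overshoots, the half step is accepted.
step = backtracking_line_search(lambda w: w * w, lambda w: 2 * w, w=1.0)
```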
Policy optimization: directly optimizing control policies.
Loss function: a function measuring prediction error (and sometimes calibration), guiding gradient-based optimization.
Saddle point: a point where the gradient is zero but which is neither a maximum nor a minimum; common in deep networks.
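The textbook example is f(x, y) = x² - y²: the gradient vanishes at the origin, yet f rises along x and falls along y:

```python
def f(x, y):
    return x * x - y * y  # saddle-shaped surface

def grad_f(x, y):
    return (2 * x, -2 * y)

# The gradient is (0, 0) at the origin, but the origin is not an extremum.
```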
LSTM: an RNN variant using gates to mitigate vanishing gradients and capture longer context.
Adaptive learning rate methods: optimizers such as Adam that adjust per-parameter learning rates dynamically.
Parameters: the learned numeric values of a model, adjusted during training to minimize a loss function.
Objective function: a scalar measure optimized during training, typically the expected loss over the data, sometimes with regularization terms.