Results for "gradient of density"
Gradient descent: Iterative method that updates parameters in the direction of the negative gradient to minimize a loss.
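The update rule above can be sketched as a minimal loop; the quadratic loss, learning rate, and step count below are illustrative assumptions, not from the source.

```python
# Minimize f(w) = (w - 3)**2 by stepping against its gradient f'(w) = 2*(w - 3).
def grad_descent(lr=0.1, steps=100):
    w = 0.0
    for _ in range(steps):
        grad = 2 * (w - 3)   # gradient of the loss at the current w
        w -= lr * grad       # step in the negative-gradient direction
    return w

w_opt = grad_descent()  # converges toward the minimizer w = 3
```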
Stochastic gradient descent (SGD): A gradient method that uses random minibatches for efficient training on large datasets.
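A hedged sketch of the minibatch idea for 1-D linear regression y ≈ w · x; the data, learning rate, and batch size are illustrative assumptions.

```python
import random

def sgd(data, lr=0.05, epochs=50, batch_size=4, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # random minibatches each epoch
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # average gradient of the squared error over the minibatch
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w

points = [(k / 10, 2.0 * k / 10) for k in range(1, 21)]  # exact line y = 2x
w_est = sgd(points)  # recovers w close to 2.0
```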
Gradient clipping: Limiting gradient magnitude to prevent exploding gradients.
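One common variant is clipping by global norm, sketched below with illustrative names; gradients whose norm exceeds the threshold are rescaled onto it, and smaller ones pass through unchanged.

```python
import math

def clip_by_norm(grads, max_norm=1.0):
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        scale = max_norm / norm
        grads = [g * scale for g in grads]  # rescale onto the norm ball
    return grads

clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)  # norm 5 rescaled to 1
```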
Policy gradient: Optimizing policies directly via gradient ascent on expected reward.
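As a hedged sketch, a score-function (REINFORCE-style) update on a two-armed bandit; the rewards, learning rate, and step count are illustrative assumptions.

```python
import math
import random

def softmax_probs(theta):
    e = [math.exp(t) for t in theta]
    s = sum(e)
    return [v / s for v in e]

def reinforce(rewards=(0.0, 1.0), lr=0.1, steps=2000, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(steps):
        p = softmax_probs(theta)
        a = 0 if rng.random() < p[0] else 1  # sample an action
        r = rewards[a]
        # grad of log pi(a) for a softmax policy: one-hot(a) - p
        for i in range(2):
            g = (1.0 if i == a else 0.0) - p[i]
            theta[i] += lr * r * g  # ascend expected reward
    return softmax_probs(theta)

probs = reinforce()  # policy shifts probability toward the rewarding arm
```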
Loss function: A function measuring prediction error (and sometimes calibration), guiding gradient-based optimization.
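Two common instances, sketched with illustrative inputs: mean squared error for regression and binary cross-entropy for classification.

```python
import math

def mse(preds, targets):
    # mean squared error over paired predictions and targets
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def bce(probs, labels, eps=1e-12):
    # binary cross-entropy; eps guards log(0)
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels)) / len(probs)

err = mse([1.0, 2.0], [1.0, 4.0])  # (0 + 4) / 2 = 2.0
```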
Batch size: Number of samples per gradient update; affects compute efficiency, generalization, and stability.
ReLU: The activation max(0, x); improves gradient flow and training speed in deep nets.
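The definition is one line of code; the sample inputs below are illustrative.

```python
def relu(x):
    # max(0, x): passes positives through, zeros out negatives
    return x if x > 0 else 0.0

outs = [relu(v) for v in (-2.0, 0.0, 3.5)]  # [0.0, 0.0, 3.5]
```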
Weight initialization: Methods to set starting weights so that signal and gradient scales are preserved across layers.
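As one hedged example, He initialization draws weights with standard deviation sqrt(2 / fan_in), which roughly preserves activation variance through ReLU layers; the layer sizes below are illustrative.

```python
import math
import random

def he_init(fan_in, fan_out, seed=0):
    rng = random.Random(seed)
    std = math.sqrt(2.0 / fan_in)  # He/Kaiming scaling for ReLU
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]

weights = he_init(256, 128)  # 256x128 weight matrix
```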
Saddle point: A point where the gradient is zero but that is neither a maximum nor a minimum; common in deep nets.
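A numeric illustration with the classic example f(x, y) = x² - y²: the gradient vanishes at the origin, yet f increases along x and decreases along y, so the origin is neither a max nor a min.

```python
f = lambda x, y: x**2 - y**2
grad = lambda x, y: (2 * x, -2 * y)

g0 = grad(0.0, 0.0)                  # (0.0, 0.0): a stationary point
up, down = f(0.1, 0.0), f(0.0, 0.1)  # positive along x, negative along y
```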
Line search: Choosing the step size along the gradient direction.
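A hedged sketch of one standard approach, backtracking under the Armijo sufficient-decrease condition; the test function, constants, and starting step are illustrative assumptions.

```python
def backtracking(f, grad, x, step=1.0, beta=0.5, c=1e-4):
    # shrink the step until f decreases enough along the -grad(x) direction
    g = grad(x)
    while f(x - step * g) > f(x) - c * step * g * g:
        step *= beta
    return step

f = lambda x: x ** 4
grad = lambda x: 4 * x ** 3
t = backtracking(f, grad, x=1.0)  # an accepted step size
```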
Vanishing gradients: Gradients shrink as they propagate through layers, slowing learning in early layers; mitigated by ReLU, residual connections, and normalization.
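A toy numeric illustration, not from the source: backprop through a chain of sigmoid units multiplies the gradient by sigmoid'(z) ≤ 0.25 at each layer, so it shrinks geometrically with depth.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def chain_gradient(depth, z=0.0, w=1.0):
    # gradient of a depth-layer chain x -> sigmoid(w*x) -> ... evaluated at z
    grad = 1.0
    x = z
    for _ in range(depth):
        s = sigmoid(w * x)
        grad *= w * s * (1.0 - s)  # chain rule through one sigmoid layer
        x = s
    return grad

shallow = chain_gradient(2)
deep = chain_gradient(20)  # many orders of magnitude smaller
```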
Exploding gradients: Gradients grow too large, causing divergence; mitigated by clipping, normalization, and careful initialization.
Gradient noise: Variability introduced by minibatch sampling during SGD.
Gradient inversion: Recovering training data from gradients.
Gradient: The direction of steepest ascent of a function.
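A small numeric check, with an illustrative function: a central finite difference recovers the analytic gradient, whose components point in the direction of steepest ascent.

```python
def grad_fd(f, x, y, h=1e-6):
    # central finite-difference approximation to the gradient of f at (x, y)
    gx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    gy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return gx, gy

f = lambda x, y: x**2 + 3 * y
g = grad_fd(f, 1.0, 2.0)  # analytic gradient here is (2, 3)
```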