Results for "noisy gradients"
Gradients grow too large, causing divergence; mitigated by clipping, normalization, careful init.
Limiting gradient magnitude to prevent exploding gradients.
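The clipping idea above can be sketched as a global-norm clip (a minimal NumPy version; the function name and the `1e-12` division guard are my own choices, not from the source):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together if their combined L2 norm exceeds max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-12))  # guard against division by zero
    return [g * scale for g in grads]
```

Clipping by the global norm (rather than per element) preserves the direction of the overall update while bounding its size.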
Recovering training data from gradients.
Methods, such as Adam, that adjust learning rates dynamically per parameter.
Inferring the agent’s internal state from noisy sensor data.
Gradients shrink through layers, slowing learning in early layers; mitigated by ReLU, residuals, normalization.
Diffusion model trained to remove noise step by step.
Optimization under uncertainty.
Using limited human feedback to guide large models.
Optimal estimator for linear dynamic systems.
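A minimal scalar sketch of such an estimator, tracking a nearly constant hidden state from noisy measurements (the parameter names `q` and `r` and the initial values are my own illustrative assumptions):

```python
import numpy as np

def kalman_1d(measurements, q=1e-4, r=0.1):
    """Scalar Kalman filter: q is process-noise variance, r is measurement-noise variance."""
    x, p = 0.0, 1.0              # state estimate and its variance
    estimates = []
    for z in measurements:
        p += q                   # predict: uncertainty grows by process noise
        k = p / (p + r)          # Kalman gain: trust in the new measurement
        x += k * (z - x)         # update estimate toward the measurement
        p *= (1.0 - k)           # update (shrink) the uncertainty
        estimates.append(x)
    return estimates
```

The gain `k` balances the current uncertainty against the measurement noise, which is what makes the filter optimal for linear-Gaussian dynamics.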
Uses an exponential moving average of gradients to speed convergence and reduce oscillation.
Methods to set starting weights to preserve signal/gradient scales across layers.
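One common scheme in this family is He initialization for ReLU layers, sketched below (a minimal version; the function name is my own):

```python
import numpy as np

def he_init(fan_in, fan_out, rng=None):
    """He initialization: weight variance 2/fan_in, which preserves
    activation scale through ReLU layers."""
    rng = rng or np.random.default_rng()
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))
```

Xavier/Glorot initialization is the analogous scheme for tanh/sigmoid layers, using variance on the order of 1/fan_in instead of 2/fan_in.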
Allows gradients to bypass layers, enabling very deep networks.
A learning paradigm where an agent interacts with an environment and learns to choose actions to maximize cumulative reward.
A function measuring prediction error (and sometimes calibration), guiding gradient-based optimization.
Popular optimizer combining momentum and per-parameter adaptive step sizes via first/second moment estimates.
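A single Adam step can be written out from those first/second moment estimates (a minimal sketch; the default hyperparameters shown are the commonly used ones, and the function signature is my own):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for parameters w given gradient g at step t (t starts at 1)."""
    m = b1 * m + (1 - b1) * g            # first moment: EMA of gradients (momentum)
    v = b2 * v + (1 - b2) * g ** 2       # second moment: EMA of squared gradients
    m_hat = m / (1 - b1 ** t)            # bias correction for zero initialization
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return w, m, v
```

Dividing by the square root of the second moment gives each parameter its own effective step size, which is the "adaptive" part.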
A gradient method using random minibatches for efficient training on large datasets.
One complete traversal of the training dataset during training.
Techniques that stabilize and speed training by normalizing activations; LayerNorm is common in Transformers.
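LayerNorm itself reduces to a few lines: normalize each feature vector to zero mean and unit variance, then apply a learnable scale and shift (a minimal sketch; `gamma` and `beta` are the standard learnable parameters, shown here as scalars for brevity):

```python
import numpy as np

def layer_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize over the last (feature) axis, then scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta
```

Unlike BatchNorm, this statistic is computed per example, so it behaves identically at train and inference time, one reason it is preferred in Transformers.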
Training across many devices/silos without centralizing raw data; aggregates updates, not data.
Networks with recurrent connections for sequences; largely supplanted by Transformers for many tasks.
Variability introduced by minibatch sampling during SGD.
An RNN variant using gates to mitigate vanishing gradients and capture longer context.
Tradeoffs between depth (many layers) and width (many neurons per layer).
Optimizing policies directly via gradient ascent on expected reward.
The generator produces only a limited variety of outputs (mode collapse).
Pixel motion estimation between frames.
Matrix of first-order derivatives for vector-valued functions.
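The Jacobian can be approximated numerically by perturbing one input at a time (a finite-difference sketch; the function name and step size `h` are my own illustrative choices):

```python
import numpy as np

def numerical_jacobian(f, x, h=1e-6):
    """Forward-difference Jacobian J[i, j] = d f_i / d x_j for vector-valued f."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += h                            # perturb the j-th input
        J[:, j] = (np.asarray(f(xp)) - fx) / h  # one column per input dimension
    return J
```

Automatic differentiation computes the same matrix exactly; the finite-difference version is mainly useful as a gradient check.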
An algorithm that computes control actions for a dynamic system.