Deep learning: A branch of machine learning that uses multi-layer neural networks to learn hierarchical representations; it excels in vision, speech, and language tasks.
Mean squared error (MSE): The average of squared residuals; a common regression objective.
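As a minimal sketch, the squared-residual average above can be computed directly; the function name `mse` and the sample arrays are illustrative, not from any particular library.

```python
def mse(y_true, y_pred):
    # Average of squared residuals between targets and predictions.
    residuals = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(residuals) / len(residuals)

# Perfect predictions give zero error; a constant offset of 1 gives an MSE of 1.
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
print(mse([0.0, 0.0], [1.0, 1.0]))            # 1.0
```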
Online learning: Learning in which data arrives sequentially and the model updates continuously, often under changing distributions.
Adam: A popular optimizer that combines momentum with per-parameter adaptive step sizes via first- and second-moment estimates.
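The moment-based update described above can be sketched as follows; this is a single-parameter illustration with assumed default hyperparameters, not a production implementation.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Update biased first/second moment estimates, then bias-correct them.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    # Per-parameter step: scaled by the inverse root of the second moment.
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2 starting from x = 3; the gradient is 2x.
x, m, v = 3.0, 0.0, 0.0
for t in range(1, 201):
    x, m, v = adam_step(x, 2 * x, m, v, t)
print(abs(x) < 0.5)  # True: x has moved close to the minimum at 0
```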
Learning rate: Controls the size of parameter updates; too high and training diverges, too low and it trains slowly or gets stuck.
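The diverge-versus-stall trade-off can be seen on the toy objective f(x) = x^2, whose gradient-descent update is x ← x(1 − 2·lr); the helper name `gd` is illustrative.

```python
def gd(lr, x=1.0, steps=20):
    # Plain gradient descent on f(x) = x^2, whose gradient is 2x.
    for _ in range(steps):
        x -= lr * 2 * x
    return x

print(abs(gd(0.1)))  # shrinks toward 0: 0.8**20 ≈ 0.0115
print(abs(gd(1.1)))  # blows up: 1.2**20 ≈ 38.3
```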
Epoch: One complete pass over the training dataset.
Neural network: A parameterized function composed of interconnected units organized in layers with nonlinear activations.
Weight initialization: Methods for setting starting weights so that signal and gradient scales are preserved across layers (e.g., Xavier, He).
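As one scale-preserving scheme, He initialization draws weights with variance 2 / fan_in; the function name and shapes below are illustrative.

```python
import numpy as np

def he_init(fan_in, fan_out, rng):
    # He initialization: variance 2 / fan_in keeps ReLU activations from
    # shrinking or exploding as depth grows.
    std = np.sqrt(2.0 / fan_in)
    return rng.standard_normal((fan_in, fan_out)) * std

rng = np.random.default_rng(0)
w = he_init(512, 512, rng)
print(float(w.std()))  # close to sqrt(2/512) ≈ 0.0625
```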
Activation functions: Nonlinear functions that let networks approximate complex mappings; ReLU variants dominate modern deep learning.
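Two common ReLU-family activations can be written in a line each; the leaky slope of 0.01 below is a conventional default, not a requirement.

```python
def relu(x):
    # Zero for negative inputs, identity for positive ones.
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    # Keeps a small gradient on the negative side to avoid "dead" units.
    return x if x > 0 else slope * x

print(relu(-2.0), relu(3.0))  # 0.0 3.0
print(leaky_relu(-2.0))       # -0.02
```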
Federated learning: Training across many devices or silos without centralizing raw data; only model updates are aggregated, never the data itself.
Softmax: Converts logits to probabilities by exponentiation and normalization; common in classification and language models.
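The exponentiate-and-normalize step can be sketched as below; subtracting the maximum logit first is a standard numerical-stability trick that cancels in the normalization.

```python
import math

def softmax(logits):
    # Shift by the max before exponentiating to avoid overflow;
    # the shift cancels out when we normalize.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(sum(probs))                      # ≈ 1.0
print(probs[0] > probs[1] > probs[2])  # True: larger logits, larger probabilities
```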
Convex optimization: Optimization problems in which any local minimum is also global.
Non-convex optimization: Optimization with multiple local minima and saddle points; typical of neural networks.
Learning rate schedule: Adjusting the learning rate over the course of training to improve convergence.
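One widely used schedule is cosine annealing; the function and its bounds below are an illustrative sketch, not tied to any framework's API.

```python
import math

def cosine_lr(step, total_steps, lr_max=0.1, lr_min=0.001):
    # Smoothly anneals from lr_max at step 0 down to lr_min at total_steps.
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

print(cosine_lr(0, 100))    # ≈ 0.1 (starts at lr_max)
print(cosine_lr(100, 100))  # ≈ 0.001 (ends at lr_min)
```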
Second-order optimization: Optimization that uses curvature information; often too expensive at scale.
Skip (residual) connections: Allow gradients to bypass layers, enabling very deep networks.
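The bypass can be sketched as output = input + transformation(input); the ReLU layer and shapes here are illustrative assumptions.

```python
import numpy as np

def residual_block(x, w):
    # The layer's transformation is added to the unchanged input,
    # so gradients can always flow through the identity path.
    return x + np.maximum(0.0, x @ w)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
# With zero weights the block reduces exactly to the identity mapping.
y = residual_block(x, np.zeros((4, 4)))
print(np.allclose(y, x))  # True
```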
Highway network: An early architecture that used learned gates for skip connections.
Agent loop: The continuous cycle of observation, reasoning, action, and feedback.
Markov decision process (MDP): A formal framework for sequential decision-making under uncertainty.
Attribute inference attack: Inferring sensitive attributes of examples in the training data.
Value function: The expected cumulative reward from a state or state-action pair.
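For a toy one-state MDP that pays a fixed reward each step, the value satisfies V = reward + γ·V, so iterating that recurrence converges to reward / (1 − γ); the function name is illustrative.

```python
def evaluate_chain(reward=1.0, gamma=0.9, iters=500):
    # Value iteration for a single-state MDP: apply the Bellman backup
    # V <- reward + gamma * V until it converges to its fixed point.
    v = 0.0
    for _ in range(iters):
        v = reward + gamma * v
    return v

print(evaluate_chain())  # ≈ 10.0, matching 1 / (1 - 0.9)
```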
Conditional random field (CRF): A probabilistic graphical model for structured prediction.
Optical flow: Estimation of pixel motion between frames.
Dot product: Measures similarity and projection between vectors.
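The similarity role is clearest when the dot product is normalized by vector lengths, giving cosine similarity; the helper names below are illustrative.

```python
import math

def dot(u, v):
    # Sum of elementwise products.
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    # Dot product normalized by lengths: 1 for parallel vectors,
    # 0 for orthogonal ones.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

print(dot([1.0, 2.0], [3.0, 4.0]))                # 11.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
print(cosine_similarity([2.0, 0.0], [5.0, 0.0]))  # 1.0
```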
Feedback loop: Using production outcomes to improve deployed models.
Sensitivity: The degree to which a function's output changes under input perturbations.
Loss landscape: A visualization of the optimization surface over the parameter space.
Local minimum: A point whose loss is lower than that of all nearby points.
Global minimum: The lowest loss attainable anywhere in the parameter space.
Trust region: Restricting each update to a region where the local approximation of the objective can be trusted.