Results for "adaptive learning rates"
Methods, such as Adam, that adjust learning rates dynamically on a per-parameter basis.
Popular optimizer combining momentum and per-parameter adaptive step sizes via first/second moment estimates.
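The update rule behind this entry can be sketched in plain Python. This is a minimal illustration, not a production optimizer; the hyperparameter defaults follow common practice for Adam, and all function and variable names are illustrative.

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update over a list of scalar parameters.

    m, v hold running first/second moment estimates; t is the 1-based step count.
    """
    new_theta, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(theta, grad, m, v):
        mi = beta1 * mi + (1 - beta1) * g       # first moment (momentum-like)
        vi = beta2 * vi + (1 - beta2) * g * g   # second moment (uncentered variance)
        m_hat = mi / (1 - beta1 ** t)           # bias correction for zero init
        v_hat = vi / (1 - beta2 ** t)
        new_theta.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v
```

Because the step is divided by the per-parameter second-moment estimate, each parameter effectively gets its own adaptive step size.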
Controls the size of parameter updates; too high diverges, too low trains slowly or gets stuck.
Flat, high-dimensional regions of the loss surface that slow training.
Learning where data arrives sequentially and the model updates continuously, often under changing distributions.
System-level design for general intelligence.
Adjusting learning rate over training to improve convergence.
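One common schedule of this kind is cosine annealing; a small sketch follows. The function name and defaults are illustrative, not from any particular library.

```python
import math

def cosine_lr(step, total_steps, base_lr=0.1, min_lr=0.0):
    """Cosine-annealed learning rate: base_lr at step 0, decaying to min_lr."""
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```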
Ordering training samples from easier to harder to improve convergence or generalization.
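A minimal sketch of the easy-to-hard ordering, assuming a user-supplied difficulty score; real curricula typically also pace how fast the pool grows, shown here in simplified form.

```python
def curriculum_order(samples, difficulty):
    """Sort samples easiest-first by a scalar difficulty score (lower = easier)."""
    return sorted(samples, key=difficulty)

def pace(ordered, epoch, epochs):
    """Expose a growing easy-first prefix of the data as training progresses."""
    frac = min(1.0, (epoch + 1) / epochs)
    k = max(1, int(frac * len(ordered)))
    return ordered[:k]
```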
Gradually increasing learning rate at training start to avoid divergence.
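The simplest form of this is linear warmup, sketched below; the function name and defaults are illustrative.

```python
def warmup_lr(step, warmup_steps, base_lr=1e-3):
    """Linearly ramp the learning rate up to base_lr over warmup_steps."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

Warmup is often composed with a decay schedule: ramp up first, then decay.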
Visualization of optimization landscape.
Techniques that stabilize and speed training by normalizing activations; LayerNorm is common in Transformers.
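LayerNorm over a single feature vector can be sketched in a few lines of plain Python; gamma and beta are learned in practice, with identity defaults here for illustration.

```python
import math

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize a feature vector to zero mean / unit variance, then scale+shift."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    gamma = gamma or [1.0] * n
    beta = beta or [0.0] * n
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gamma, beta)]
```

Unlike BatchNorm, the statistics are computed per example over the feature dimension, which is why it suits variable-length Transformer inputs.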
Matrix of first-order derivatives for vector-valued functions.
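A finite-difference approximation makes the definition concrete: entry (i, j) is the partial derivative of output i with respect to input j. The helper below is an illustrative sketch, not a library API.

```python
def numeric_jacobian(f, x, h=1e-6):
    """Finite-difference Jacobian of f: R^n -> R^m at point x.

    Returns an m x n list of lists with J[i][j] = d f_i / d x_j.
    """
    fx = f(x)
    m, n = len(fx), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x)
        xp[j] += h                      # perturb one input coordinate
        fxp = f(xp)
        for i in range(m):
            J[i][j] = (fxp[i] - fx[i]) / h
    return J
```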
Nonlinear functions enabling networks to approximate complex mappings; ReLU variants dominate modern DL.
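Three of the most common variants, sketched as scalar functions (the GELU here is the widely used tanh approximation):

```python
import math

def relu(x):
    """Standard ReLU: max(0, x)."""
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: a small negative slope avoids dead units."""
    return x if x > 0 else alpha * x

def gelu(x):
    """GELU (tanh approximation), common in Transformers."""
    return 0.5 * x * (1 + math.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * x ** 3)))
```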
Centralized AI expertise group.
Predicting borrower default risk.
Controlled experiment comparing variants by random assignment to estimate causal effects of changes.
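For binary outcomes (e.g., conversions), the comparison is often a two-proportion z-test; a minimal sketch, with illustrative names:

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z statistic comparing conversion rates of two randomized variants."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p = (success_a + success_b) / (n_a + n_b)           # pooled rate
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))   # standard error under H0
    return (p_b - p_a) / se
```

A |z| above roughly 1.96 corresponds to significance at the 5% level for a two-sided test.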
Optimization using curvature information; often expensive at scale.
Storing results to reduce compute.
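The classic illustration is memoizing a recursive function with Python's standard-library `functools.lru_cache`, so each subproblem is computed once:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    """Memoized Fibonacci: each value is computed once, then served from cache."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```

Without the cache this recursion is exponential; with it, linear.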
Simulating adverse scenarios.
Optimization under uncertainty.
The relationship between inputs and outputs changes over time, requiring monitoring and model updates.
Randomizing simulation parameters to improve real-world transfer.
Using production outcomes to improve models.
Coordination arising without explicit programming.
Shift in feature distribution over time.
Imagined future trajectories.
Closed loop linking sensing and acting.
Interleaving reasoning and tool use.
Acting to minimize surprise or free energy.
Selecting the most informative samples to label (e.g., uncertainty sampling) to reduce labeling cost.
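The uncertainty-sampling strategy mentioned here can be sketched as picking the pool examples on which the current model is least confident; names are illustrative.

```python
def uncertainty_sample(probs, k):
    """Pick the k pool indices with the least-confident predictions
    (smallest max class probability), i.e. classic uncertainty sampling.

    probs: list of per-example class-probability lists from the current model.
    """
    confidence = [max(p) for p in probs]
    return sorted(range(len(probs)), key=lambda i: confidence[i])[:k]
```

The selected indices would then be sent to an annotator, and the model retrained on the enlarged labeled set.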