Results for "updates"
Federated Learning: Training a shared model across many devices or silos without centralizing raw data; the server aggregates model updates, not data.
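A minimal sketch of the aggregation step (federated averaging): the server combines per-client parameter updates weighted by local dataset size, so raw data never leaves the clients. The names `fed_avg`, `client_weights`, and `n_examples` are illustrative, not from any specific library.

```python
# Federated averaging sketch: weighted average of client parameter vectors.
# Raw client data is never sent; only the parameter lists are.

def fed_avg(client_weights, n_examples):
    """Average per-client parameter lists, weighted by local dataset size."""
    total = sum(n_examples)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n / total for w, n in zip(client_weights, n_examples))
        for i in range(n_params)
    ]

# Two clients with 2-parameter models; the first client has twice the data.
global_params = fed_avg([[1.0, 0.0], [4.0, 3.0]], n_examples=[2, 1])
```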
Concept Drift: The relationship between inputs and outputs changes over time, requiring ongoing monitoring and model updates.
In-Context (Few-Shot) Learning: Achieving task performance by providing a small number of examples inside the prompt, without any weight updates.
Actor-Critic: Combines value estimation (the critic) with policy learning (the actor).
Trust Region Methods: Restricting each update to a region where the local approximation of the objective remains reliable.
On-Policy Learning: Learning only from data generated by the current policy.
Online Learning: Learning where data arrives sequentially and the model updates continuously, often under changing distributions.
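A minimal sketch of online learning under these assumptions: a one-parameter linear model updated one example at a time by plain SGD on squared error. The stream, learning rate, and function names are illustrative.

```python
# Online learning sketch: update the model on each example as it arrives,
# rather than training on a fixed batch.

def online_step(w, x, y, lr=0.1):
    """One SGD update on the loss (w*x - y)^2 for a single example."""
    grad = 2 * (w * x - y) * x
    return w - lr * grad

w = 0.0
stream = [(1.0, 2.0), (2.0, 4.0), (1.0, 2.0), (3.0, 6.0)]  # data follows y = 2x
for x, y in stream:
    w = online_step(w, x, y)  # w drifts toward 2.0 as examples stream in
```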
Parameters (Weights): The learned numeric values of a model, adjusted during training to minimize a loss function.
Loss Function: A function measuring prediction error (and sometimes calibration) that guides gradient-based optimization.
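A minimal sketch of one common loss, mean squared error, the scalar that gradient-based training drives down:

```python
# Mean squared error: average squared gap between predictions and targets.

def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

loss = mse([2.5, 0.0], [3.0, 1.0])  # (0.25 + 1.0) / 2 = 0.625
```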
Gradient Descent: Iterative method that updates parameters in the direction of the negative gradient to minimize the loss.
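A minimal sketch of the update rule on the toy objective f(x) = x², whose gradient is 2x; the step size and step count are illustrative:

```python
# Gradient descent sketch: repeatedly step against the gradient of
# f(x) = x^2 until x approaches the minimum at 0.

def gradient_descent(x0, lr=0.1, steps=50):
    x = x0
    for _ in range(steps):
        grad = 2 * x        # derivative of x^2
        x -= lr * grad      # step in the negative-gradient direction
    return x

x_min = gradient_descent(5.0)  # close to 0 after 50 steps
```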
Momentum: Uses an exponential moving average of gradients to speed convergence and reduce oscillation.
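A minimal sketch of the momentum (heavy-ball) update on the same toy objective f(x) = x²; the coefficient beta = 0.9 is a typical illustrative value:

```python
# Momentum sketch: the velocity term accumulates a decaying average of past
# gradients, smoothing the trajectory toward the minimum of f(x) = x^2.

def momentum_descent(x0, lr=0.1, beta=0.9, steps=200):
    x, v = x0, 0.0
    for _ in range(steps):
        grad = 2 * x
        v = beta * v + grad   # decaying accumulation of gradients
        x -= lr * v
    return x

x_min = momentum_descent(5.0)
```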
Learning Rate: Controls the size of parameter updates; too high and training diverges, too low and it trains slowly or gets stuck.
Vanishing Gradients: Gradients shrink as they propagate back through layers, slowing learning in early layers; mitigated by ReLU, residual connections, and normalization.
Exploding Gradients: Gradients grow too large, causing divergence; mitigated by gradient clipping, normalization, and careful initialization.
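A minimal sketch of one of those mitigations, gradient clipping by global norm: rescale the gradient vector whenever its norm exceeds a threshold. The function name and threshold are illustrative.

```python
# Gradient clipping sketch: cap the gradient's Euclidean norm so a single
# oversized update cannot blow up training.

import math

def clip_by_norm(grad, max_norm=1.0):
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:
        return [g * max_norm / norm for g in grad]
    return grad

clipped = clip_by_norm([30.0, 40.0], max_norm=5.0)  # norm 50 rescaled to 5
```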
Bayesian Updating: Updating beliefs about parameters by combining prior distributions with observed evidence.
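A minimal sketch using the conjugate Beta-binomial case: with a Beta prior on a coin's heads probability, updating on observed flips just adds the counts to the prior's parameters.

```python
# Bayesian updating sketch: Beta(alpha, beta) prior plus observed
# heads/tails gives a Beta posterior by simple addition.

def beta_update(alpha, beta, heads, tails):
    """Posterior Beta parameters after observing coin flips."""
    return alpha + heads, beta + tails

# Uniform prior Beta(1, 1), then observe 7 heads and 3 tails.
a, b = beta_update(1, 1, heads=7, tails=3)
posterior_mean = a / (a + b)  # 8 / 12
```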
Learning Rate Warmup: Gradually increasing the learning rate at the start of training to avoid early divergence.
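A minimal sketch of a linear warmup schedule: ramp from zero to the base rate over the first steps, then hold (real schedules often decay afterward). The base rate and step count are illustrative.

```python
# Linear warmup sketch: small steps early in training, full rate afterward.

def warmup_lr(step, base_lr=1e-3, warmup_steps=100):
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr

schedule = [warmup_lr(s) for s in (0, 50, 100, 500)]  # ramps up, then flat
```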
Second-Order Optimization: Optimization that uses curvature information; often too expensive at scale.
Hessian: The matrix of second derivatives describing the local curvature of the loss.
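A minimal sketch tying the Hessian to curvature-based optimization: for the quadratic f(x, y) = x² + 10y² the Hessian is the constant diagonal matrix diag(2, 20), and one Newton step x ← x − H⁻¹∇f lands exactly on the minimum at the origin. The toy function is an assumption for illustration.

```python
# Newton step sketch on f(x, y) = x^2 + 10*y^2, whose Hessian is diag(2, 20):
# scaling each gradient component by the inverse curvature solves the
# quadratic in a single step.

def newton_step(x, y):
    grad = (2 * x, 20 * y)        # first derivatives of f
    h_inv = (1 / 2, 1 / 20)       # inverse of the diagonal Hessian diag(2, 20)
    return x - h_inv[0] * grad[0], y - h_inv[1] * grad[1]

x_new, y_new = newton_step(3.0, -1.5)  # both land at the minimum, 0
```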
Particle Filter: A Monte Carlo method for state estimation that represents the posterior with weighted samples (particles).
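A minimal sketch of a bootstrap particle filter tracking a 1-D random walk from noisy observations: propagate particles through the process model, weight by observation likelihood, estimate, resample. The noise levels, particle count, and seed are all illustrative.

```python
# Bootstrap particle filter sketch: Monte Carlo state estimation for a
# 1-D random-walk state observed with Gaussian noise.

import math
import random

def particle_filter(observations, n=500, proc_std=0.5, obs_std=1.0, seed=0):
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 5.0) for _ in range(n)]  # broad prior
    estimates = []
    for z in observations:
        # Propagate through the random-walk process model.
        particles = [p + rng.gauss(0.0, proc_std) for p in particles]
        # Weight each particle by its Gaussian observation likelihood.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Posterior-mean estimate, then multinomial resampling.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        particles = rng.choices(particles, weights=weights, k=n)
    return estimates

# Observations of a state sitting near 2.0, corrupted by noise.
est = particle_filter([2.1, 1.8, 2.3, 2.0, 1.9])
```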
Shadow Deployment: Running a new model alongside production traffic without affecting users.
Scratchpad: Temporary reasoning space for intermediate steps, often hidden from the user.
Active Inference: Acting to minimize surprise (variational free energy).