Results for "learning rate"
Learning Rate
Intermediate · Controls the size of parameter updates; too high and training diverges, too low and it trains slowly or gets stuck.
Think of the learning rate as the size of your steps when walking towards a destination. If you take giant steps, you might overshoot and miss your goal, but if you take tiny steps, you might take forever to get there. In machine learning, the learning rate controls how big of a change we make to...
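That analogy can be made concrete with a toy gradient-descent run; a minimal sketch on a hypothetical one-parameter objective (the function and the three rates are illustrative, not recommendations):

```python
# Gradient descent on f(w) = w**2, whose gradient is 2*w and whose
# minimum is at w = 0. The three runs differ only in learning rate.

def descend(lr, steps=50, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w  # update: w <- w - lr * gradient
    return w

w_tiny = descend(lr=0.01)  # tiny steps: slow, still noticeably away from 0
w_good = descend(lr=0.3)   # moderate steps: converges close to 0
w_huge = descend(lr=1.1)   # oversized steps: overshoots, |w| grows each step
```

With lr=1.1 each update flips the sign of w and enlarges its magnitude, which is exactly the overshoot-and-diverge failure mode described above.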
Online Learning: Learning where data arrives sequentially and the model updates continuously, often under changing distributions.
Unsupervised Learning: Learning structure from unlabeled data, such as discovering groups, compressing representations, or modeling data distributions.
Deep Learning: A branch of ML using multi-layer neural networks to learn hierarchical representations, often excelling in vision, speech, and language.
Reinforcement Learning: A learning paradigm where an agent interacts with an environment and learns to choose actions to maximize cumulative reward.
Meta-Learning: Methods that learn training procedures or initializations so models can adapt quickly to new tasks with little data.
On-Policy Learning: Learning only from the current policy’s data.
Representation Learning: Automatically learning useful internal features (latent variables) that capture salient structure for downstream tasks.
Self-Supervised Learning: Learning from data by constructing “pseudo-labels” (e.g., next-token prediction, masked modeling) without manual annotation.
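The pseudo-label construction can be illustrated for next-token prediction; a minimal sketch with a hypothetical helper and toy tokens:

```python
# Self-supervision needs no manual labels: each training target is
# carved out of the raw sequence itself. For next-token prediction,
# the input is the prefix so far and the pseudo-label is the next token.

def next_token_pairs(tokens):
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = next_token_pairs(["the", "cat", "sat"])
# pairs == [(["the"], "cat"), (["the", "cat"], "sat")]
```

Masked modeling works the same way, except the pseudo-label is a token hidden from the input rather than the one after it.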
Learning Theory: A theoretical framework analyzing what classes of functions can be learned, how efficiently, and with what guarantees.
Semi-Supervised Learning: Training with a small labeled dataset plus a larger unlabeled dataset, leveraging assumptions like smoothness/cluster structure.
In-Context (Few-Shot) Learning: Achieving task performance by providing a small number of examples inside the prompt, without weight updates.
Federated Learning: Training across many devices/silos without centralizing raw data; aggregates updates, not data.
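The "aggregate updates, not data" step can be sketched as a FedAvg-style weighted average; a minimal illustration with hypothetical client parameters and dataset sizes:

```python
# Each client trains locally and sends only its model parameters;
# the server averages them weighted by local dataset size, never
# seeing any raw examples.

def fed_avg(client_params, client_sizes):
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(p[j] * n for p, n in zip(client_params, client_sizes)) / total
        for j in range(dim)
    ]

avg = fed_avg(client_params=[[1.0, 2.0], [3.0, 4.0]], client_sizes=[10, 30])
# avg == [2.5, 3.5]: the larger client contributes 3/4 of the average
```

Real systems add secure aggregation and client sampling on top of this core averaging step.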
Inductive Bias: Built-in assumptions that guide learning efficiency and generalization.
Reward Shaping: Modifying the reward signal to accelerate learning.
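One common variant is potential-based shaping, which adds gamma * phi(s') - phi(s) to the reward and leaves the optimal policy unchanged; a minimal sketch with an illustrative distance-to-goal potential:

```python
# Potential-based reward shaping on a 1-D toy task: states are
# integers, the goal is state 10, and phi rewards being nearer to it.

GOAL = 10

def phi(state):
    return -abs(GOAL - state)  # higher potential closer to the goal

def shaped_reward(state, next_state, reward, gamma=0.99):
    return reward + gamma * phi(next_state) - phi(state)

bonus = shaped_reward(5, 6, reward=0.0)    # moving toward the goal: positive
penalty = shaped_reward(6, 5, reward=0.0)  # moving away: negative
```

The agent gets dense feedback every step instead of waiting for the sparse goal reward.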
Multi-Task Learning: Training one model on multiple tasks simultaneously to improve generalization through shared structure.
Reinforcement Learning from Human Feedback (RLHF): Uses preference data to train a reward model, then optimizes the policy against it.
Inverse Reinforcement Learning: Inferring a reward function from observed behavior.
Predictive Coding: Learning by minimizing prediction error.
Shared Autonomy: Humans assist or override autonomous behavior.
Developmental Robotics: Robots learning via exploration and growth.
Preference Learning: Inferring and aligning with human preferences.
Machine Learning: A subfield of AI where models learn patterns from data to make predictions or decisions, improving with experience rather than explicit rule-coding.
Transfer Learning: Reusing knowledge from a source task/domain to improve learning on a target task/domain, typically via pretrained models.
Dataset: A structured collection of examples used to train/evaluate models; quality, bias, and coverage often dominate outcomes.
Feature Engineering: Designing input features to expose useful structure (e.g., ratios, lags, aggregations); often crucial outside deep learning.
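A minimal sketch of a ratio feature and a lag feature over toy rows (all column names are illustrative):

```python
# Hand-crafted features: a per-row ratio and a one-step lag, the kind
# of structure classical models often need exposed explicitly.

def engineer(rows):
    feats = []
    prev_sales = None
    for r in rows:
        feats.append({
            "spend_per_click": r["spend"] / r["clicks"],  # ratio feature
            "sales_lag_1": prev_sales,                    # lag feature
        })
        prev_sales = r["sales"]
    return feats

feats = engineer([
    {"spend": 100.0, "clicks": 50, "sales": 10},
    {"spend": 120.0, "clicks": 40, "sales": 12},
])
# feats[1] == {"spend_per_click": 3.0, "sales_lag_1": 10}
```

Deep models can sometimes learn such features from raw inputs, but tabular methods usually cannot.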
Data Labeling: Human or automated process of assigning targets; quality, consistency, and guidelines matter heavily.
Human-in-the-Loop: System design where humans validate or guide model outputs, especially for high-stakes decisions.
Policy: A strategy mapping states to actions.
Bias: Systematic error introduced by simplifying assumptions in a learning algorithm.
Actor-Critic: Combines value estimation (the critic) with policy learning (the actor).
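A single actor-critic update can be sketched on one toy transition (all numbers, names, and learning rates are illustrative):

```python
# The critic's TD error drives both updates: it corrects the critic's
# value estimate and reinforces the actor's preference for the action
# it just took.

alpha, beta, gamma = 0.1, 0.5, 0.9   # actor lr, critic lr, discount

value = {"s": 0.0, "s_next": 1.0}    # critic: state-value estimates
pref = {"a": 0.0}                    # actor: preference for action a in s

reward = 1.0                                               # observed reward
td_error = reward + gamma * value["s_next"] - value["s"]   # = 1.9
value["s"] += beta * td_error        # critic moves toward the TD target
pref["a"] += alpha * td_error        # actor reinforces action a
```

A positive TD error means the outcome beat the critic's estimate, so the actor makes that action more likely; a negative one would suppress it.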