Results for "learning rate"
Learning Rate
Intermediate
Controls the size of parameter updates; too high diverges, too low trains slowly or gets stuck.
Think of the learning rate as the size of your steps when walking toward a destination. If you take giant steps, you might overshoot and miss your goal, but if you take tiny steps, you might take forever to get there. In machine learning, the learning rate controls how big a change we make to...
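The step-size analogy can be made concrete with a toy gradient descent loop. This is a minimal sketch on the 1-D loss f(w) = w², chosen purely for illustration; the function name `descend` and the specific rates are assumptions, not any library's API.

```python
# Gradient descent on f(w) = w**2, whose gradient is 2*w.
def descend(lr, steps=20, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w   # each update is scaled by the learning rate
    return w

# A moderate rate walks toward the minimum at w = 0; a rate above 1.0
# overshoots so badly that |w| grows on every step and training diverges.
small = descend(0.1)   # ends close to 0
large = descend(1.5)   # ends astronomically far from 0
```

With lr = 0.1 each step multiplies w by 0.8, so w shrinks toward the minimum; with lr = 1.5 each step multiplies w by −2, so the iterate oscillates with growing magnitude.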
Model Registry
Central system to store model versions, metadata, approvals, and deployment state.
Model Audit
Systematic review of model/data processes to ensure performance, fairness, security, and policy compliance.
Experiment Tracking
Logging hyperparameters, code versions, data snapshots, and results to reproduce and compare experiments.
Reproducibility
Ability to replicate results given the same code and data; harder in distributed training and with nondeterministic ops.
Latency
Time from request to response; critical for real-time inference and UX.
Compute
Hardware resources used for training/inference; constrained by memory bandwidth, FLOPs, and parallelism.
Quantization
Reducing numeric precision of weights/activations to speed inference and reduce memory with acceptable accuracy loss.
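A sketch of symmetric int8 quantization, one common scheme. The per-tensor max-based scale used here is an illustrative assumption, not any particular library's default.

```python
# Symmetric int8 quantization sketch: map floats to integers in [-127, 127]
# with a single per-tensor scale factor.
def quantize(weights, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for int8
    scale = max(abs(w) for w in weights) / qmax    # per-tensor scale
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

w = [0.12, -0.5, 0.31, 0.02]
q, s = quantize(w)          # small integers plus one float scale
w_hat = dequantize(q, s)    # reconstruction is off by at most s/2 per weight
```

Storing 8-bit integers plus one scale is what saves memory; the rounding step is where the "acceptable accuracy loss" in the definition comes from.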
Pruning
Removing weights or neurons to shrink models and improve efficiency; can be structured or unstructured.
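Unstructured magnitude pruning can be sketched in a few lines; this illustrative version zeroes the smallest-magnitude fraction of weights (entries tied with the threshold are kept).

```python
# Unstructured magnitude pruning sketch: zero out the smallest-|w| fraction.
def prune(weights, fraction):
    k = int(len(weights) * fraction)                  # how many weights to drop
    threshold = sorted(abs(w) for w in weights)[k]    # k-th smallest magnitude
    return [0.0 if abs(w) < threshold else w for w in weights]

w = [0.9, -0.01, 0.4, 0.003, -0.7, 0.05]
sparse = prune(w, 0.5)   # the three smallest-magnitude entries become 0.0
```

Structured pruning differs only in what gets zeroed: whole neurons, channels, or heads rather than individual weights.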
Softmax
Converts logits to probabilities by exponentiation and normalization; common in classification and LMs.
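The softmax definition translates directly to code; subtracting the max logit before exponentiating is the standard numerical-stability trick and does not change the result.

```python
import math

# Softmax sketch: exponentiate and normalize.
def softmax(logits):
    m = max(logits)                           # for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])   # sums to 1 and preserves logit ordering
```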
Benchmark
A dataset + metric suite for comparing models; can be gamed or misaligned with real-world goals.
Evaluation Harness
System for running consistent evaluations across tasks, versions, prompts, and model settings.
Red Teaming
Stress-testing models for failures, vulnerabilities, policy violations, and harmful behaviors before release.
Confidential Computing
Methods to protect the model and data during inference (e.g., trusted execution environments) from operators or attackers.
Multimodal Models
Models that process or generate multiple modalities, enabling vision-language tasks, speech, video understanding, etc.
Object Detection
Identifying and localizing objects in images, often with confidence scores and bounding rectangles.
Image Segmentation
Assigning labels per pixel (semantic) or per instance (instance segmentation) to map object boundaries.
Variance
Error due to sensitivity to fluctuations in the training dataset.
Natural Language Processing (NLP)
AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.
Cross-Entropy
Measures divergence between true and predicted probability distributions.
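A minimal sketch of cross-entropy against a one-hot label; the small `eps` guard against log(0) is an illustrative choice, not a fixed convention.

```python
import math

# Cross-entropy sketch between a true distribution p and a prediction q:
# H(p, q) = -sum_i p_i * log(q_i).
def cross_entropy(p, q, eps=1e-12):
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))

one_hot = [1.0, 0.0, 0.0]
confident = cross_entropy(one_hot, [0.9, 0.05, 0.05])   # low loss, ~ -log(0.9)
wrong = cross_entropy(one_hot, [0.1, 0.8, 0.1])         # much higher loss
```

With a one-hot label only the true class's term survives, so the loss reduces to the negative log-probability assigned to the correct class.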
Text-to-Speech (TTS)
Generating speech audio from text, with control over prosody, speaker identity, and style.
KL Divergence
Measures how one probability distribution diverges from another.
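A sketch of D_KL(p ‖ q) for discrete distributions. Note it is zero exactly when the distributions match, and asymmetric in general, which is why "diverges from" rather than "distance between" is the right wording.

```python
import math

# KL divergence sketch for discrete distributions:
# D_KL(p || q) = sum_i p_i * log(p_i / q_i), with 0 * log(0 / q) taken as 0.
def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

uniform = [0.5, 0.5]
skewed = [0.9, 0.1]
forward = kl_divergence(uniform, skewed)   # positive
reverse = kl_divergence(skewed, uniform)   # positive, but a different value
```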
Maximum Likelihood Estimation (MLE)
Estimating parameters by maximizing the likelihood of the observed data.
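For a Bernoulli parameter the MLE has a closed form, the sample mean; this sketch (with made-up coin-flip data) checks that it scores a higher log-likelihood than nearby candidate values.

```python
import math

# MLE sketch for a Bernoulli parameter p: the log-likelihood of observed
# coin flips is maximized by the sample mean.
def log_likelihood(p, data):
    return sum(math.log(p if x == 1 else 1.0 - p) for x in data)

data = [1, 1, 0, 1, 0, 1, 1, 0]   # 5 heads in 8 flips
p_hat = sum(data) / len(data)     # closed-form MLE: 5/8 = 0.625
# log_likelihood(p_hat, data) beats log_likelihood(p, data) for any other p
```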
Bayesian Inference
Updating beliefs about parameters using observed evidence and prior distributions.
Maximum a Posteriori (MAP) Estimation
Bayesian parameter estimation using the mode of the posterior distribution.
Convex Optimization
Optimization problems in which any local minimum is also a global minimum.
Saddle Point
A point where the gradient is zero but which is neither a maximum nor a minimum; common in deep nets.
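The textbook example f(x, y) = x² − y² shows the idea: the gradient vanishes at the origin, yet the origin is a minimum along one axis and a maximum along the other.

```python
# Saddle point sketch: f(x, y) = x**2 - y**2 has zero gradient at the origin,
# but the origin is a minimum along the x-axis and a maximum along the y-axis.
def f(x, y):
    return x**2 - y**2

def grad(x, y):
    return (2 * x, -2 * y)

# grad(0, 0) == (0, 0), yet f rises along x and falls along y from the origin,
# so the origin is neither a local max nor a local min.
```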
Loss Landscape
The shape of the loss function over parameter space.
Flat Minimum
A wide basin in the loss landscape, often correlated with better generalization.
Gradient Clipping
Limiting gradient magnitude to prevent exploding gradients.
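A sketch of clipping by global norm, the same idea as PyTorch's `clip_grad_norm_`, reimplemented here in plain Python over a flat list of gradient values.

```python
import math

# Global-norm gradient clipping sketch: if the gradient's L2 norm exceeds
# max_norm, rescale it to norm max_norm while keeping its direction.
def clip_by_norm(grads, max_norm):
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        return [g * (max_norm / norm) for g in grads]
    return grads

clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)   # original norm is 5.0
# clipped keeps the 3:4 direction but now has norm 1.0
```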
Hessian
Matrix of second derivatives describing the local curvature of the loss.
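For a quadratic function the Hessian is constant, which makes a finite-difference sketch easy to check; the test function and the step size h = 1e-4 are illustrative choices.

```python
# Finite-difference Hessian sketch for f(x, y) = x**2 + 3*x*y + y**2,
# whose exact Hessian is [[2, 3], [3, 2]] at every point.
def hessian_2d(f, x, y, h=1e-4):
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return [[fxx, fxy], [fxy, fyy]]   # symmetric, like the true Hessian

H = hessian_2d(lambda x, y: x**2 + 3 * x * y + y**2, 1.0, 2.0)
# H is approximately [[2, 3], [3, 2]]
```

Positive eigenvalues of H indicate a local minimum, a mix of signs a saddle point; this is why the Hessian appears throughout second-order optimization and loss-landscape analysis.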