LoRA (Low-Rank Adaptation): a parameter-efficient fine-tuning (PEFT) method that injects trainable low-rank matrices into existing layers, enabling efficient fine-tuning of large models.
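The low-rank injection can be sketched in a few lines of pure Python: a frozen weight matrix W is augmented by a trainable product B·A of small rank r (all names, shapes, and values here are illustrative).

```python
# Minimal LoRA-style forward pass (illustrative, pure Python).
# Frozen weight W (d_out x d_in) is augmented by a trainable
# low-rank product B @ A, where A is (r x d_in) and B is (d_out x r).

def matvec(M, x):
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, scale=1.0):
    base = matvec(W, x)               # frozen path
    delta = matvec(B, matvec(A, x))   # trainable low-rank path
    return [b + scale * d for b, d in zip(base, delta)]

# Toy example: d_in = d_out = 2, rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen identity weight
A = [[1.0, 1.0]]               # (1 x 2), trainable
B = [[0.5], [0.5]]             # (2 x 1), trainable
y = lora_forward(W, A, B, [1.0, 2.0])
```

Only A and B (2·d·r parameters) are updated during fine-tuning, while W stays frozen; that is where the efficiency comes from.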
Quantization: reducing the numeric precision of weights and activations to speed up inference and cut memory use, usually with acceptable accuracy loss.
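A minimal symmetric int8 quantization sketch (illustrative, pure Python): values are scaled into the signed 8-bit range, rounded, and later mapped back with a small rounding error.

```python
# Symmetric int8 quantization sketch (illustrative).
def quantize(values, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1                    # 127 for int8
    scale = max(abs(v) for v in values) / qmax or 1.0  # guard all-zero input
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

weights = [0.1, -0.5, 0.25, 1.0]
q, scale = quantize(weights)
approx = dequantize(q, scale)   # close to the originals, small rounding error
```

The gap between `weights` and `approx` is the quantization error that determines how much accuracy is lost.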
Automation bias: the tendency to trust automated suggestions even when they are incorrect; mitigated by UI design, user training, and independent checks.
Saddle point: a point where the gradient is zero but which is neither a maximum nor a minimum; common in the loss surfaces of deep networks.
Loss landscape: the shape of the loss function over parameter space.
Flat minimum: a wide basin in the loss landscape, often correlated with better generalization.
Gradient clipping: limiting gradient magnitude to prevent exploding gradients.
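A minimal sketch of clipping by L2 norm (illustrative): if the gradient's norm exceeds a threshold, rescale it so it lies exactly on that threshold.

```python
import math

# Clip a gradient vector to a maximum L2 norm (illustrative).
def clip_by_norm(grad, max_norm=1.0):
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return grad                      # already small enough
    scale = max_norm / norm
    return [g * scale for g in grad]     # same direction, capped length

g = clip_by_norm([3.0, 4.0], max_norm=1.0)  # norm 5.0 rescaled to norm 1.0
```

Note the direction of the gradient is preserved; only its magnitude changes.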
Residual (skip) connections: connections that let gradients bypass layers, enabling very deep networks.
Causal masking: preventing attention to future tokens during training and inference.
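A minimal sketch of the mask itself (illustrative): a lower-triangular matrix in which position i may attend only to positions j ≤ i.

```python
# Build a causal (lower-triangular) attention mask for seq_len tokens:
# entry [i][j] is 1 if position i may attend to position j, else 0.
def causal_mask(seq_len):
    return [[1 if j <= i else 0 for j in range(seq_len)]
            for i in range(seq_len)]

mask = causal_mask(3)
# [[1, 0, 0],
#  [1, 1, 0],
#  [1, 1, 1]]
```

In practice the zeros are applied as large negative values added to attention scores before the softmax, which drives those weights to zero.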
Emergent abilities: capabilities that appear only beyond certain model sizes.
Noise schedule: controls the amount of noise added at each diffusion step.
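One common choice is a linear schedule; a minimal sketch, where the `beta_start`/`beta_end` defaults are illustrative values rather than a prescription:

```python
# Linear beta schedule (illustrative): the per-step noise variance
# beta_t increases linearly from beta_start to beta_end over T steps.
def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + t * step for t in range(T)]

betas = linear_beta_schedule(1000)
```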
GAN (Generative Adversarial Network): a two-network setup in which a generator learns to fool a discriminator.
Local minimum: a point that is a minimum relative to nearby points, though not necessarily globally.
Plateau: a flat, high-dimensional region of the loss landscape that slows training.
Alignment: ensuring learned behavior matches the intended objective.
Alignment robustness: maintaining alignment under new or shifted conditions.
Misgeneralization: applying learned patterns incorrectly in new contexts.
Model collapse: the quality degradation that occurs when a model is trained on its own outputs.
Shortcut learning: a model relying on spurious, irrelevant signals rather than the intended features.
Sim-to-real gap: the performance drop when moving from simulation to reality.
Distribution shift: differences between the training population and the deployed patient population.
Unsupervised learning: learning structure from unlabeled data, such as discovering groups, compressing representations, or modeling data distributions.
Multi-task learning: training one model on multiple tasks simultaneously to improve generalization through shared structure.
Deep learning: a branch of machine learning that uses multi-layer neural networks to learn hierarchical representations, often excelling in vision, speech, and language.
Dataset: a structured collection of examples used to train and evaluate models; its quality, bias, and coverage often dominate outcomes.
Loss function: a function measuring prediction error (and sometimes calibration) that guides gradient-based optimization.
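Mean squared error is one simple concrete instance; a minimal sketch:

```python
# Mean squared error: the average squared gap between predictions
# and targets. Lower is better; zero means a perfect fit.
def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

loss = mse([2.0, 3.0], [1.0, 5.0])  # (1 + 4) / 2 = 2.5
```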
Gradient descent: an iterative method that updates parameters in the direction of the negative gradient to minimize loss.
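A minimal sketch on the one-dimensional function f(x) = x², whose gradient is 2x; the iterate converges toward the minimum at x = 0.

```python
# Gradient descent on f(x) = x^2: repeatedly step against the
# gradient 2x, scaled by the learning rate.
def gradient_descent(x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        grad = 2 * x
        x -= lr * grad
    return x

x_min = gradient_descent(5.0)   # approaches 0
```

Each update multiplies x by (1 - 2·lr), so with lr = 0.1 the distance to the minimum shrinks by 20% per step.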
Adam: a popular optimizer combining momentum with per-parameter adaptive step sizes via first- and second-moment estimates.
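A one-dimensional sketch of the update (illustrative hyperparameters), again minimizing f(x) = x²:

```python
import math

# One-dimensional Adam sketch: exponential moving averages of the
# gradient (m) and squared gradient (v), with bias correction.
def adam(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # first moment (momentum)
        v = beta2 * v + (1 - beta2) * g * g    # second moment (scale)
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

x_min = adam(lambda x: 2 * x, x0=5.0)   # approaches the minimum at 0
```

The division by the square root of v_hat is what gives each parameter its own adaptive step size.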
Self-attention: attention in which queries, keys, and values all come from the same sequence, enabling token-to-token interactions.
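A minimal sketch of scaled dot-product self-attention without learned projections, so queries, keys, and values are the token vectors themselves (real layers add learned Q/K/V projection matrices):

```python
import math

# Scaled dot-product self-attention (single head, no projections):
# each token's output is a softmax-weighted average of all token
# vectors, with weights from query-key dot products.
def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    d = len(X[0])
    out = []
    for q in X:                                   # each token as query
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in X]                     # similarity to each key
        weights = softmax(scores)                 # attention distribution
        out.append([sum(w * v[j] for w, v in zip(weights, X))
                    for j in range(d)])           # weighted sum of values
    return out

Y = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

Each output row is a convex combination of the input rows, weighted toward the tokens most similar to the query.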
Activation functions: nonlinear functions that let networks approximate complex mappings; ReLU variants dominate modern deep learning.
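A minimal sketch of ReLU and a leaky variant, applied elementwise between linear layers:

```python
# ReLU zeroes out negatives; the leaky variant keeps a small
# negative slope so gradients never vanish entirely (alpha is an
# illustrative default).
def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x

vals = [relu(v) for v in [-2.0, 0.0, 3.0]]   # [0.0, 0.0, 3.0]
```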