Results for "autoregressive training"
Fine-tuning on (prompt, response) pairs to align a model with instruction-following behaviors.
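A common way to build such pairs is to supervise only the response tokens. A minimal sketch, assuming toy integer token ids and the widely used convention of an ignore index of -100 for positions excluded from the loss:

```python
IGNORE_INDEX = -100  # convention: labels with this value are excluded from the loss

def build_example(prompt_ids, response_ids):
    """Concatenate prompt and response token ids; supervise only the
    response so the model learns to produce it, not to echo the prompt."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

# Toy ids: prompt [1, 2], response [3, 4].
input_ids, labels = build_example([1, 2], [3, 4])
```

Masking the prompt keeps the gradient focused on instruction-following behavior rather than on reproducing the instruction itself.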
Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.
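Such reward models are often trained on pairwise preferences. A minimal sketch of the Bradley-Terry preference probability, with hypothetical scalar scores standing in for reward-model outputs:

```python
import math

def preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry probability that the 'chosen' response is preferred,
    given scalar scores from a reward model (hypothetical values here)."""
    return 1.0 / (1.0 + math.exp(reward_rejected - reward_chosen))

# Equal scores give 50/50; a higher 'chosen' score pushes probability toward 1.
p_equal = preference_probability(1.0, 1.0)
p_better = preference_probability(2.0, 0.0)
```

Training maximizes this probability over labeled preference pairs, which is how the model comes to rank candidate outputs.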
Training across many devices/silos without centralizing raw data; aggregates updates, not data.
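The aggregation step can be sketched as federated averaging: a weighted mean of client updates, weighted by local dataset size. A pure-Python illustration with updates as flat float vectors (the client values here are made up):

```python
def federated_average(client_updates, client_sizes):
    """Weighted average of client model updates (lists of floats),
    weighted by each client's local dataset size. Raw data never leaves
    the clients; only these update vectors are aggregated."""
    total = sum(client_sizes)
    dim = len(client_updates[0])
    return [
        sum(u[i] * n for u, n in zip(client_updates, client_sizes)) / total
        for i in range(dim)
    ]

# Two clients with unequal data volumes: the larger client dominates the average.
avg = federated_average([[1.0, 0.0], [0.0, 1.0]], [30, 10])
```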
Ability to replicate results given the same code and data; harder with distributed training and nondeterministic ops.
PEFT method injecting trainable low-rank matrices into layers, enabling efficient fine-tuning.
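The core computation can be sketched without any framework: the frozen weight's output plus a low-rank correction B(Ax). A toy pure-Python version with small hand-picked matrices:

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, scale=1.0):
    """h = W x + scale * B (A x): W is the frozen d_out x d_in weight;
    A (r x d_in) and B (d_out x r) are the trainable low-rank factors."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]

# Rank-1 update on a 2x2 identity weight (toy values).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0]]          # r = 1
B = [[0.0], [1.0]]
h = lora_forward([1.0, 2.0], W, A, B)
```

In practice B is initialized to zero so the low-rank update starts as a no-op; only A and B receive gradients while W stays frozen.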
Reducing numeric precision of weights/activations to speed inference and reduce memory with acceptable accuracy loss.
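A minimal sketch of symmetric int8 quantization, rounding floats into [-127, 127] with a per-tensor scale and then dequantizing to inspect the error (values here are illustrative):

```python
def quantize_int8(values):
    """Symmetric int8 quantization: map floats to [-127, 127] via a
    per-tensor scale derived from the largest magnitude."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # avoid zero scale
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map quantized integers back to approximate floats."""
    return [qi * scale for qi in q]

q, scale = quantize_int8([0.1, -0.5, 0.25])
restored = dequantize(q, scale)  # close to the originals, within ~scale/2
```

The rounding error is bounded by half the scale, which is why moderate bit reduction often costs little accuracy.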
Tendency to trust automated suggestions even when incorrect; mitigated by UI design, training, and checks.
A point where the gradient is zero but that is neither a maximum nor a minimum; common in deep nets.
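The textbook example is f(x, y) = x² − y²: the gradient vanishes at the origin, yet the function rises along one axis and falls along the other.

```python
def f(x, y):
    """Classic saddle: a minimum along x, a maximum along y."""
    return x * x - y * y

def grad(x, y):
    """Gradient of f: (2x, -2y)."""
    return (2 * x, -2 * y)

# At the origin the gradient vanishes...
assert grad(0.0, 0.0) == (0.0, 0.0)
# ...but it is not an extremum: f rises along x and falls along y.
assert f(0.1, 0.0) > f(0.0, 0.0) > f(0.0, 0.1)
```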
The shape of the loss function over parameter space.
A wide basin in the loss landscape, often correlated with better generalization.
Limiting gradient magnitude to prevent exploding gradients.
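A common variant clips by global L2 norm: if the gradient's norm exceeds a threshold, rescale it to lie on that threshold. A pure-Python sketch:

```python
import math

def clip_by_norm(grad, max_norm):
    """Rescale the gradient vector if its L2 norm exceeds max_norm;
    the direction is preserved, only the magnitude is limited."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return grad
    return [g * max_norm / norm for g in grad]

# A gradient of norm 5 is rescaled to norm 1; its direction is unchanged.
clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)
```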
Allows gradients to bypass layers, enabling very deep networks.
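The mechanism is just an identity shortcut added to a layer's output, y = x + f(x). A toy sketch on plain float vectors:

```python
def residual_block(x, f):
    """y = x + f(x): the identity path lets gradients (and activations)
    bypass f entirely, so stacking many blocks stays trainable."""
    return [xi + fi for xi, fi in zip(x, f(x))]

# If f outputs zeros (as residual branches often do at initialization),
# the block is exactly the identity.
out = residual_block([1.0, 2.0], lambda v: [0.0] * len(v))
```

Because the identity term contributes a direct gradient path, the derivative through each block never vanishes even when f's own gradient is small.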
Capabilities that appear only beyond certain model sizes.
Controls amount of noise added at each diffusion step.
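A sketch of the standard setup with a linear beta schedule: the cumulative product alpha-bar shrinks over timesteps, so a noised sample keeps less and less of the original signal. Schedule endpoints below are conventional illustrative values, not universal constants:

```python
import math

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced over T timesteps."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def alpha_bar(betas, t):
    """Cumulative product of (1 - beta) up to step t; the fraction of
    original signal variance remaining at that step."""
    prod = 1.0
    for b in betas[: t + 1]:
        prod *= 1.0 - b
    return prod

def noisy_sample(x0, betas, t, eps):
    """x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    ab = alpha_bar(betas, t)
    return math.sqrt(ab) * x0 + math.sqrt(1.0 - ab) * eps

betas = linear_betas(10)
early = noisy_sample(1.0, betas, 0, eps=0.0)  # nearly the clean signal
```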
Two-network setup in which a generator learns to fool a discriminator.
A minimum relative to nearby points, not necessarily the global one.
Flat, high-dimensional regions of the loss landscape where near-zero gradients slow training.
Ensuring learned behavior matches intended objective.
Maintaining alignment when conditions shift from those seen during training.
Applying learned patterns in contexts where they no longer hold.
A model repeatedly trained on its own outputs degrades in quality over generations.
A model relies on spurious signals that correlate with labels but are irrelevant to the task.
Performance drop when moving from simulation to reality.
Differences between training and deployed patient populations.
Learning structure from unlabeled data, such as discovering groups, compressing representations, or modeling data distributions.
Training one model on multiple tasks simultaneously to improve generalization through shared structure.
A branch of ML using multi-layer neural networks to learn hierarchical representations, often excelling in vision, speech, and language.
A structured collection of examples used to train/evaluate models; quality, bias, and coverage often dominate outcomes.
A function measuring prediction error (and sometimes calibration), guiding gradient-based optimization.
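Two standard instances, sketched in plain Python: mean squared error for regression and binary cross-entropy for probabilistic predictions, where confidence on wrong answers is penalized sharply:

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average squared gap between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of binary labels under predicted
    probabilities; eps guards against log(0)."""
    return -sum(
        t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        for t, p in zip(y_true, p_pred)
    ) / len(y_true)

# A confident correct prediction incurs far less loss than a confident wrong one.
good = binary_cross_entropy([1], [0.9])
bad = binary_cross_entropy([1], [0.1])
```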
Iterative method that updates parameters in the direction of negative gradient to minimize loss.
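The one-dimensional case fits in a few lines. A sketch minimizing the toy function f(x) = (x − 3)², whose gradient is 2(x − 3):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient to minimize a function of x."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2; the iterates converge toward the minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3.0), x0=0.0)
```

With this learning rate each step shrinks the distance to the minimum by a constant factor (here 0.8), so convergence is geometric; too large a rate would instead diverge.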