Results for "on-device training"
Software regulated as a medical device.
One complete pass through the entire training dataset.
Attacks that infer whether specific records were in training data, or reconstruct sensitive training examples.
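A minimal sketch of one classic variant, a loss-threshold membership test; the function name, threshold, and toy losses below are illustrative, and real attacks typically calibrate the threshold with shadow models.

```python
import numpy as np

def loss_threshold_attack(example_losses, threshold):
    """Flag examples with unusually low loss as likely training members.

    Trained models tend to assign lower loss to examples they fit
    closely; this simple test exploits that gap.
    """
    return np.asarray(example_losses) < threshold  # True => predicted member

# Toy illustration: training members tend to have lower loss.
member_losses = [0.05, 0.12, 0.08]
nonmember_losses = [0.9, 1.4, 0.7]
print(loss_threshold_attack(member_losses + nonmember_losses, threshold=0.5))
```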
End-to-end process for model training.
Cost of model training.
Training with a small labeled dataset plus a larger unlabeled dataset, leveraging assumptions like smoothness/cluster structure.
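One common concrete instance is self-training with pseudo-labels; this sketch assumes an illustrative `model_predict` callable that returns class probabilities.

```python
import numpy as np

def pseudo_label(model_predict, unlabeled_x, confidence=0.95):
    """Keep unlabeled examples the current model labels confidently.

    The confident predictions become pseudo-labels that can be mixed into
    the labeled set for another round of supervised training.
    """
    probs = model_predict(unlabeled_x)      # shape: (n_examples, n_classes)
    keep = probs.max(axis=1) >= confidence
    return unlabeled_x[keep], probs[keep].argmax(axis=1)

# Toy usage with a fake "model" returning fixed probabilities.
fake_predict = lambda x: np.array([[0.98, 0.02], [0.60, 0.40]])
x = np.array([[1.0], [2.0]])
print(pseudo_label(fake_predict, x))  # keeps only the confident example
```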
Training one model on multiple tasks simultaneously to improve generalization through shared structure.
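A minimal numpy sketch of the shared-trunk pattern; the two heads, labels, and task weighting are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W_shared = rng.normal(size=(8, 4))       # trunk shared by both tasks
head_a = rng.normal(size=(4, 3))         # task A: 3-class classification
head_b = rng.normal(size=(4, 1))         # task B: regression

x = rng.normal(size=(16, 8))
features = np.tanh(x @ W_shared)         # shared representation

logits_a = features @ head_a
pred_b = features @ head_b

y_a = rng.integers(0, 3, size=16)
y_b = rng.normal(size=(16, 1))

# Joint objective: a weighted sum of per-task losses, so gradients from
# both tasks shape the shared trunk.
log_probs = logits_a - np.log(np.exp(logits_a).sum(axis=1, keepdims=True))
loss_a = -log_probs[np.arange(16), y_a].mean()      # cross-entropy
loss_b = ((pred_b - y_b) ** 2).mean()               # squared error
print(loss_a + 0.5 * loss_b)   # the 0.5 task weight is a design choice
```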
Methods that learn training procedures or initializations so models can adapt quickly to new tasks with little data.
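As one concrete instance (an assumption, not named in the entry), a Reptile-style meta-update: adapt a copy of the shared initialization to a sampled task, then nudge the initialization toward the adapted weights.

```python
import numpy as np

def reptile_step(init_w, adapt, meta_lr=0.1):
    """One Reptile-style meta-update over a single sampled task."""
    adapted = adapt(init_w.copy())        # a few inner gradient steps
    return init_w + meta_lr * (adapted - init_w)

# Toy tasks: each task pulls the weights toward a task-specific target.
rng = np.random.default_rng(0)
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
adapt_to = lambda t: (lambda w: w + 0.5 * (t - w))

w = np.zeros(2)
for _ in range(200):
    w = reptile_step(w, adapt_to(targets[rng.integers(2)]))
print(w)  # settles between the task solutions, enabling fast adaptation
```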
A mismatch between training and deployment data distributions that can degrade model performance.
The learned numeric values of a model adjusted during training to minimize a loss function.
Configuration choices that govern training or architecture and are not typically learned from the data directly.
A scalar measure optimized during training, typically expected loss over data, sometimes with regularization terms.
Minimizing average loss on training data; can overfit when data is limited or biased.
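In symbols, using a standard formulation:

```latex
% Empirical risk minimization: replace the true risk
%   R(\theta) = \mathbb{E}_{(x,y) \sim \mathcal{D}}[\ell(f_\theta(x), y)]
% with its average over the n training examples, and minimize over \theta:
\hat{\theta} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f_\theta(x_i), y_i\bigr)
```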
When a model fits noise/idiosyncrasies of training data and performs poorly on unseen data.
When a model cannot capture underlying structure, performing poorly on both training and test data.
How well a model performs on new data drawn from the same (or a similar) distribution as the training data.
Separating data into training (fit), validation (tune), and test (final estimate) to avoid leakage and optimism bias.
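A minimal numpy sketch; the 80/10/10 ratios are a common convention, not a rule.

```python
import numpy as np

rng = np.random.default_rng(0)
indices = rng.permutation(1000)   # shuffle once, before any splitting

train_idx = indices[:800]     # fit model parameters here
val_idx = indices[800:900]    # tune hyperparameters / early stopping here
test_idx = indices[900:]      # touch once, for the final estimate
```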
When information from evaluation data improperly influences training, inflating reported performance.
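A common concrete case is fitting preprocessing statistics on all the data; the sketch below (illustrative arrays) shows the leak-free version.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.normal(loc=2.0, size=(80, 3))
x_test = rng.normal(loc=2.0, size=(20, 3))

# Leak-free: normalization statistics come from the training split only.
# Computing them over train+test would let test data shape the features
# the model trains on, inflating the reported performance.
mean, std = x_train.mean(axis=0), x_train.std(axis=0)
x_train_norm = (x_train - mean) / std
x_test_norm = (x_test - mean) / std
```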
A gradient method using random minibatches for efficient training on large datasets.
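A minimal numpy sketch on linear least squares; the data sizes, learning rate, and batch size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(512, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=512)

w = np.zeros(3)
lr, batch_size = 0.1, 32
for epoch in range(20):
    perm = rng.permutation(len(X))                  # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]        # one random minibatch
        xb, yb = X[idx], y[idx]
        grad = 2 * xb.T @ (xb @ w - yb) / len(idx)  # minibatch MSE gradient
        w -= lr * grad
print(w)  # close to true_w
```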
Halting training when validation performance stops improving to reduce overfitting.
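A sketch of the usual patience-based loop, run here against a simulated validation curve.

```python
# Simulated validation losses: improve, then degrade as overfitting sets in.
val_curve = [1.00, 0.80, 0.65, 0.60, 0.61, 0.63, 0.66, 0.70]

best_val, best_epoch, patience, bad_epochs = float("inf"), -1, 2, 0
for epoch, val_loss in enumerate(val_curve):
    if val_loss < best_val:
        best_val, best_epoch, bad_epochs = val_loss, epoch, 0  # checkpoint
    else:
        bad_epochs += 1
        if bad_epochs >= patience:   # no improvement for `patience` epochs
            break
print(f"stopped at epoch {epoch}; best was epoch {best_epoch} ({best_val})")
```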
Activation max(0, x); improves gradient flow and training speed in deep nets.
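In numpy:

```python
import numpy as np

def relu(x):
    """max(0, x): the gradient is 1 for positive inputs, so active units
    pass gradients through unchanged instead of saturating."""
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 3.0])))  # [0. 0. 0. 3.]
```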
Techniques that stabilize and speed training by normalizing activations; LayerNorm is common in Transformers.
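A minimal numpy LayerNorm, normalizing each example over its feature (last) axis.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize over the last axis, then apply learned scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.random.default_rng(0).normal(size=(2, 4))
out = layer_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=-1), out.std(axis=-1))  # ~0 mean, ~1 std per row
```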
Randomly zeroing activations during training to reduce co-adaptation and overfitting.
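A sketch of inverted dropout, the common formulation that rescales surviving activations at train time so inference needs no change.

```python
import numpy as np

def dropout(x, p, rng, training=True):
    """Zero each activation with probability p during training, scaling
    survivors by 1/(1-p) so the expected activation is unchanged."""
    if not training or p == 0.0:
        return x                       # inference: no-op
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

print(dropout(np.ones((2, 8)), p=0.5, rng=np.random.default_rng(0)))
# roughly half the entries zeroed, the rest doubled
```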
Training objective where the model predicts the next token given previous tokens (causal modeling).
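A sketch of how the targets are formed; the token IDs and per-position log-probabilities are made up.

```python
import numpy as np

tokens = np.array([5, 9, 2, 7, 1])   # one tokenized sequence

# Causal modeling: at each position, the target is the NEXT token.
inputs = tokens[:-1]     # [5, 9, 2, 7]
targets = tokens[1:]     # [9, 2, 7, 1]

# Given the model's log-probability of each target token (illustrative),
# the training loss is the average negative log-likelihood.
logp_of_targets = np.array([-0.3, -1.2, -0.7, -2.0])
print(inputs, targets, -logp_of_targets.mean())
```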
A preference-based training method optimizing policies directly from pairwise comparisons without explicit RL loops.
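This entry reads like direct preference optimization (DPO); below is a sketch of its per-pair loss, assuming the summed sequence log-probabilities under the policy and a frozen reference model are already computed (all values illustrative).

```python
import numpy as np

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one (chosen w, rejected l) pair: raise the policy's
    log-probability margin for the chosen response, measured relative to
    the reference model; beta limits drift from the reference."""
    margin = (pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l)
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))  # -log sigmoid

print(dpo_loss(pi_logp_w=-12.0, pi_logp_l=-15.0,
               ref_logp_w=-13.0, ref_logp_l=-14.0))
```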
Ordering training samples from easier to harder to improve convergence or generalization.
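A sketch using sequence length as an illustrative difficulty proxy; in practice difficulty might come from loss under a weaker model or hand-built heuristics.

```python
import numpy as np

examples = ["a b", "a b c d e f", "a", "a b c d"]
difficulty = np.array([len(s.split()) for s in examples])  # shorter = easier

for i in np.argsort(difficulty):     # easy-to-hard schedule
    print(examples[i])               # feed training batches in this order
```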
Expanding training data via transformations (flips, noise, paraphrases) to improve robustness.
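A minimal numpy sketch on a stand-in image.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((4, 4))                    # stand-in for an (H, W) image

flipped = image[:, ::-1]                      # horizontal flip
noisy = image + rng.normal(scale=0.05, size=image.shape)  # additive noise

# Each variant keeps the original label, multiplying effective training
# data and encouraging invariance to these transformations.
print(np.stack([image, flipped, noisy]).shape)  # (3, 4, 4)
```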
Training across many devices/silos without centralizing raw data; aggregates updates, not data.
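A sketch of one common aggregation scheme, federated averaging (FedAvg); the client-side "training" step here is a toy placeholder.

```python
import numpy as np

def fedavg_round(global_w, client_datasets, local_step):
    """One round: clients train locally and send back only model updates;
    the server averages them weighted by client dataset size. Raw data
    never leaves the clients."""
    updates, sizes = [], []
    for data in client_datasets:
        updates.append(local_step(global_w.copy(), data))
        sizes.append(len(data))
    weights = np.array(sizes, dtype=float) / sum(sizes)
    return sum(wt * u for wt, u in zip(weights, updates))

# Toy local step: nudge weights toward the client's data mean.
local_step = lambda w, data: w + 0.1 * (np.mean(data) - w)
clients = [np.array([1.0, 2.0]), np.array([10.0, 12.0, 14.0])]
print(fedavg_round(np.zeros(2), clients, local_step))
```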
Ability to replicate results given the same code and data; harder with distributed training and nondeterministic operations.
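A minimal seeding example, which covers the single-process case; distributed training and nondeterministic kernels need additional controls.

```python
import random
import numpy as np

random.seed(42)
np.random.seed(42)

print(np.random.rand(2))  # identical output on every run with this seed
```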
Hardware resources used for training/inference; constrained by memory bandwidth, FLOPs, and parallelism.