Results for "autoregressive training"
Hardware resources used for training and inference; throughput is bounded by memory bandwidth, compute (FLOP/s), and available parallelism.
Training a smaller “student” model to mimic a larger “teacher,” often improving efficiency while retaining performance.
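A minimal sketch of the soft-target loss commonly used for this, assuming Hinton-style temperature-softened distributions; the logits and temperature below are illustrative, and a real pipeline would compute this per batch inside a framework's autograd.

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by temperature; higher T yields a softer distribution.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Cross-entropy between the softened teacher and student distributions.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return -temperature ** 2 * sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

The loss is smallest when the student reproduces the teacher's distribution exactly, which is what drives the mimicry.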
Maliciously inserting or altering training data to implant backdoors or degrade performance.
Tendency to trust automated suggestions even when incorrect; mitigated by UI design, training, and checks.
Error due to sensitivity to fluctuations in the training dataset.
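A small sketch of that sensitivity, under illustrative assumptions: re-draw the training noise many times for the same underlying relationship (here y = 2x plus Gaussian noise) and watch how much the fitted parameter moves.

```python
import random
from statistics import pvariance

def fit_slope(xs, ys):
    # Least-squares slope of a line through the origin: sum(x*y) / sum(x*x).
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def slope_variance(n_datasets=200, n_points=10, noise=1.0, seed=0):
    # Fit the same model on many re-drawn training sets; the spread of the
    # fitted slopes is the variance component of the error.
    rng = random.Random(seed)
    xs = [float(i + 1) for i in range(n_points)]
    slopes = []
    for _ in range(n_datasets):
        ys = [2.0 * x + rng.gauss(0.0, noise) for x in xs]
        slopes.append(fit_slope(xs, ys))
    return pvariance(slopes)
```

Noisier training sets produce a wider spread of fitted slopes, i.e. higher variance, even though the underlying relationship never changes.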
Adjusting learning rate over training to improve convergence.
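One common choice is cosine decay; a minimal sketch, with illustrative `base_lr` and `min_lr` values:

```python
import math

def cosine_schedule(step, total_steps, base_lr=1e-3, min_lr=1e-5):
    # Decay smoothly from base_lr to min_lr over total_steps updates.
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```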
Gradually increasing learning rate at training start to avoid divergence.
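A linear warmup ramp is the simplest version; a sketch with an illustrative `base_lr`, typically composed with a decay schedule after the warmup phase:

```python
def warmup_lr(step, warmup_steps, base_lr=1e-3):
    # Ramp linearly from ~0 up to base_lr over the first warmup_steps updates,
    # then hold at base_lr (or hand off to a decay schedule).
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```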
Prevents attention to future tokens during training/inference.
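The mechanism can be sketched as a lower-triangular boolean mask applied to attention scores before the softmax; the helper names here are illustrative:

```python
def causal_mask(n):
    # True where attention is allowed: position i may attend only to j <= i.
    return [[j <= i for j in range(n)] for i in range(n)]

def apply_mask(scores, mask):
    # Disallowed positions get -inf so softmax assigns them zero weight.
    return [[s if m else float("-inf") for s, m in zip(row, mrow)]
            for row, mrow in zip(scores, mask)]
```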
Recovering training data from gradients.
Inferring sensitive features of training data.
Models that learn to generate samples resembling training data.
Flat regions of the high-dimensional loss surface where gradients are near zero, slowing training.
Model behaves well during training but not deployment.
Differences between training and inference conditions.
Artificial environment for training/testing agents.
Differences between training and deployed patient populations.
Combining simulation and real-world data.