Results for "insufficient capacity"
When a model lacks the capacity to capture the underlying structure of the data and therefore performs poorly on both training and test data.
One complete traversal of the training dataset during training.
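A minimal sketch of what "one epoch" means in a training loop, using a toy SGD fit of y = 2x in pure Python (the data, learning rate, and epoch count are illustrative assumptions, not from the source):

```python
# Toy SGD loop fitting y = 2x; each outer iteration is one epoch,
# i.e. one complete traversal of the training dataset.
data = [(x, 2 * x) for x in range(1, 6)]  # five (input, target) pairs

w, lr = 0.0, 0.01          # single weight, small learning rate (assumed values)
num_epochs = 50
for epoch in range(num_epochs):   # one epoch per outer iteration
    for x, y in data:             # one full pass over the training data
        grad = 2 * (w * x - y) * x    # d/dw of the squared error (w*x - y)^2
        w -= lr * grad

print(round(w, 2))  # → 2.0
```

Note that the number of epochs is distinct from the number of gradient updates: here each epoch performs five updates, one per training example.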
Routes each input to a subset of the parameters (experts), allowing capacity to scale without a proportional increase in per-input compute.
A measure of a model class’s expressive capacity based on its ability to shatter datasets.
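"Shattering" can be made concrete with a deliberately simple hypothesis class (chosen here for illustration): 1-D threshold classifiers h_t(x) = 1 if x ≥ t else 0. A set of points is shattered if every possible labeling of it is realized by some hypothesis; thresholds shatter any single point but no pair, so their VC dimension is 1:

```python
from itertools import product

def h(t, x):
    """Threshold classifier: label 1 iff x >= t."""
    return int(x >= t)

def shattered(points, thresholds):
    """True if every 0/1 labeling of `points` is realized by some threshold."""
    achievable = {tuple(h(t, x) for x in points) for t in thresholds}
    return achievable == set(product([0, 1], repeat=len(points)))

thresholds = [x + 0.5 for x in range(-10, 10)]  # dense grid of candidate thresholds

print(shattered([0.0], thresholds))       # → True: one point is shattered
print(shattered([0.0, 1.0], thresholds))  # → False: labeling (1, 0) is unrealizable
```

The labeling (1, 0) fails because no threshold can accept 0.0 while rejecting the larger point 1.0, which caps the VC dimension of this class at 1.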
Increasing model capacity via compute.
Maximum system processing rate.
A parameterized mapping from inputs to outputs; includes architecture + learned parameters.
A model's numeric values, adjusted during training to minimize a loss function.
When a model fits noise/idiosyncrasies of training data and performs poorly on unseen data.
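A small demonstration of this, assuming a synthetic linear dataset with Gaussian noise (all data and degrees here are illustrative): a degree-9 polynomial has enough capacity to fit the noise in 10 training points almost exactly, yet generalizes worse than a matched-capacity linear fit.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = x_train + 0.1 * rng.standard_normal(10)  # linear signal + noise
x_test = np.linspace(0.05, 0.95, 10)               # held-out points
y_test = x_test                                    # noiseless ground truth

def errors(deg):
    """Train/test mean squared error of a degree-`deg` polynomial fit."""
    p = np.poly1d(np.polyfit(x_train, y_train, deg))
    return (np.mean((p(x_train) - y_train) ** 2),
            np.mean((p(x_test) - y_test) ** 2))

train1, test1 = errors(1)  # capacity matched to the true signal
train9, test9 = errors(9)  # enough capacity to interpolate the noise

print(train9 < train1)  # → True: high capacity drives training error to ~0
print(test9 > train9)   # → True: but held-out error does not follow it down
```

The gap between `train9` and `test9` is the signature of overfitting: training error alone says nothing about performance on unseen data.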
The set of tokens a model can represent; impacts efficiency, multilinguality, and handling of rare strings.
A high-capacity language model trained on massive corpora, exhibiting broad generalization and emergent behaviors.
A hypothesis class is PAC-learnable if a learner can, with high probability, output an approximately correct hypothesis from a finite number of samples.
The range of functions a model can represent.
The tradeoff between depth (many layers) and width (many neurons per layer).
Allows model to attend to information from different subspaces simultaneously.
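A sketch of the mechanism in NumPy, under standard assumptions (per-head dimension d_head = d_model / num_heads, scaled dot-product attention, learned projections folded into random weight matrices for brevity): each head attends in its own subspace, and the heads' outputs are concatenated and projected back.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, num_heads):
    """x: (seq, d_model); all weight matrices: (d_model, d_model)."""
    seq, d_model = x.shape
    d_head = d_model // num_heads
    # Project, then split the model dimension into per-head subspaces.
    q = (x @ wq).reshape(seq, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ wk).reshape(seq, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ wv).reshape(seq, num_heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product attention, independently per head: (heads, seq, seq)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    out = softmax(scores) @ v
    # Concatenate heads back into (seq, d_model) and apply the output projection.
    out = out.transpose(1, 0, 2).reshape(seq, d_model)
    return out @ wo

rng = np.random.default_rng(0)
d_model, seq, heads = 8, 4, 2
weights = [0.1 * rng.standard_normal((d_model, d_model)) for _ in range(4)]
y = multi_head_attention(rng.standard_normal((seq, d_model)), *weights, heads)
print(y.shape)  # → (4, 8)
```

Because each head sees only a d_head-dimensional slice of the projections, the heads can specialize on different relationships in the sequence at the same cost as one full-width attention.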
Increasing performance via more data.
Number of linearly independent rows or columns.
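A quick illustration with NumPy: a 3×3 matrix whose second row is a scalar multiple of the first has only two linearly independent rows, so its rank is 2.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # 2 x row 0 -> linearly dependent
              [0.0, 1.0, 1.0]])  # independent of row 0

print(np.linalg.matrix_rank(a))  # → 2
```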
Measures a model’s ability to fit random noise; used to bound generalization error.
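The empirical version of this quantity can be estimated by Monte Carlo: draw random ±1 "noise" labels, find the hypothesis in the class that correlates best with them, and average that best correlation over many draws. The hypothesis class below (1-D thresholds over a small sample) and all sizes are illustrative assumptions:

```python
import random

random.seed(0)
xs = [i / 10 for i in range(10)]                 # sample of 10 points
thresholds = [i / 10 - 0.05 for i in range(12)]  # threshold classifiers

def h(t, x):
    """Threshold classifier with outputs in {-1, +1}."""
    return 1 if x >= t else -1

def empirical_rademacher(num_draws=2000):
    """Monte Carlo estimate of the empirical Rademacher complexity on xs."""
    total = 0.0
    for _ in range(num_draws):
        sigma = [random.choice([-1, 1]) for _ in xs]     # random noise labels
        # Best achievable correlation between the class and this noise draw.
        total += max(sum(s * h(t, x) for s, x in zip(sigma, xs)) / len(xs)
                     for t in thresholds)
    return total / num_draws

r = empirical_rademacher()
print(0.0 < r < 1.0)  # → True: a restricted class cannot fully fit the noise
```

A richer class (e.g. one that shatters the sample) would drive this estimate toward 1, which is exactly why Rademacher complexity appears in generalization bounds: the better a class fits pure noise, the weaker the guarantee.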