Domain: Foundations & Theory
Tokenization
Intermediate
Converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.
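A minimal sketch of subword tokenization, assuming a toy vocabulary, a greedy longest-match rule, and an "[UNK]" fallback (all illustrative, not any specific tokenizer's behavior):

```python
# Greedy longest-match subword tokenizer over a toy vocabulary (illustrative only).
TOY_VOCAB = {"un", "happi", "ness", "token", "ization"}

def tokenize(word: str, vocab: set[str]) -> list[str]:
    """Split a word into the longest vocabulary pieces, scanning left to right."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:            # no piece matched: fall back to an unknown marker
            return ["[UNK]"]
        pieces.append(word[start:end])
        start = end
    return pieces

print(tokenize("unhappiness", TOY_VOCAB))   # ['un', 'happi', 'ness']
print(tokenize("tokenization", TOY_VOCAB))  # ['token', 'ization']
```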
Top-k
Intermediate
Samples from the k highest-probability tokens to limit unlikely outputs.
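A minimal sketch of top-k sampling over a logit vector, assuming NumPy; the logits and k value are illustrative:

```python
import numpy as np

def top_k_sample(logits: np.ndarray, k: int, rng: np.random.Generator) -> int:
    """Sample a token id from the k highest-probability entries of `logits`."""
    top_ids = np.argsort(logits)[-k:]                  # indices of the k largest logits
    top_logits = logits[top_ids]
    probs = np.exp(top_logits - top_logits.max())      # softmax over the kept logits only
    probs /= probs.sum()
    return int(rng.choice(top_ids, p=probs))

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])
print(top_k_sample(logits, k=3, rng=rng))  # returns one of the ids 0, 1, or 2
```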
Top-p
Intermediate
Samples from the smallest set of tokens whose probabilities sum to p, adapting set size by context.
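A minimal sketch of top-p (nucleus) sampling, assuming NumPy; the logits and p value are illustrative:

```python
import numpy as np

def top_p_sample(logits: np.ndarray, p: float, rng: np.random.Generator) -> int:
    """Sample from the smallest set of tokens whose cumulative probability reaches p."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                    # token ids, most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1   # size of the nucleus
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])
print(top_p_sample(logits, p=0.9, rng=rng))  # nucleus size depends on how peaked the distribution is
```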
Trust Region
Intermediate
Restricting each optimization update to a region around the current parameters where the local approximation of the objective is trusted, e.g. by bounding the step norm or the KL divergence between old and new policies (as in TRPO).
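A minimal sketch of one simple trust-region mechanism (clipping the proposed update to a fixed radius); the toy quadratic objective and radius are illustrative, not TRPO's KL-based constraint:

```python
import numpy as np

def trust_region_step(grad: np.ndarray, radius: float) -> np.ndarray:
    """Propose a gradient-descent step, but clip it to lie within a trust radius."""
    step = -grad                         # unconstrained update direction
    norm = np.linalg.norm(step)
    if norm > radius:                    # outside the trusted region: scale the step back
        step *= radius / norm
    return step

# Toy quadratic objective f(x) = ||x||^2 / 2, so the gradient is x (radius is illustrative).
x = np.array([3.0, -4.0])
for _ in range(5):
    x = x + trust_region_step(grad=x, radius=1.0)
print(x)  # moves toward the minimum at the origin in steps of length <= 1
```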
Underfitting
Intermediate
When a model is too simple to capture the underlying structure of the data, performing poorly on both training and test data.
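A minimal sketch of underfitting, assuming NumPy: fitting a degree-1 polynomial to quadratic data (the data, noise level, and split are illustrative) leaves large error on both the training and test sets:

```python
import numpy as np

# Quadratic data that a straight line cannot capture (illustrative data and split).
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
y = x**2 + rng.normal(scale=0.1, size=x.shape)
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

# Fit a degree-1 polynomial: too simple for the underlying curve.
coeffs = np.polyfit(x_train, y_train, deg=1)
train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
print(f"train MSE: {train_mse:.2f}, test MSE: {test_mse:.2f}")  # both errors are large
```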
Vanishing Gradient
Intermediate
Gradients shrink as they propagate backward through layers, slowing learning in early layers; mitigated by ReLU activations, residual connections, and normalization.
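A minimal sketch of the effect, assuming a chain of sigmoid layers with unit weights (depth and input value are illustrative): each layer multiplies the backpropagated gradient by sigmoid'(z) <= 0.25, so it shrinks exponentially with depth:

```python
import numpy as np

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

# Backpropagate a gradient of 1.0 through a deep chain of sigmoid units (weights fixed at 1).
z, grad = 0.5, 1.0
for depth in range(1, 21):
    a = sigmoid(z)
    grad *= a * (1.0 - a)     # derivative of the sigmoid at this layer
    z = a                     # feed the activation forward
    if depth % 5 == 0:
        print(f"depth {depth:2d}: gradient magnitude {grad:.2e}")  # shrinks rapidly
```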
Weight Initialization
Intermediate
Methods to set starting weights to preserve signal/gradient scales across layers.
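A minimal sketch of He initialization for ReLU layers, assuming NumPy (width, depth, and batch size are illustrative): drawing weights from N(0, 2/fan_in) keeps the activation scale roughly constant from layer to layer instead of shrinking or exploding:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = fan_out = 512
x = rng.normal(size=(1000, fan_in))            # batch of unit-scale inputs

for layer in range(1, 6):
    # He initialization: variance 2 / fan_in compensates for ReLU zeroing half the units.
    w = rng.normal(scale=np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
    x = np.maximum(x @ w, 0.0)                 # linear layer followed by ReLU
    rms = np.sqrt(np.mean(x**2))
    print(f"layer {layer}: activation RMS {rms:.2f}")  # stays near 1 across layers
```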