Results for "weight reuse"
Feature store: Centralized repository for storing, versioning, and serving curated features.
Weight initialization: Methods for setting a network's starting weights so that signal and gradient scales are preserved across layers (e.g., Xavier/He initialization).
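He initialization illustrates the idea: the draw scale depends on fan-in so that ReLU activations keep roughly constant variance from layer to layer. A minimal pure-Python sketch (function name and layer shapes are illustrative):

```python
import math
import random

def he_init(fan_in, fan_out, seed=0):
    """He (Kaiming) initialization: draw weights with std = sqrt(2 / fan_in),
    chosen so ReLU activations keep roughly unit variance layer to layer."""
    rng = random.Random(seed)
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]

W = he_init(256, 128)  # a 256 -> 128 layer; shapes are illustrative
```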
Open-weight models: Models whose weights are publicly available for download and local use.
Exploding gradients: Gradients grow too large during backpropagation, causing divergence; mitigated by gradient clipping, normalization, and careful initialization.
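Clipping by global norm is the most common mitigation: if the gradient's L2 norm exceeds a threshold, the whole vector is rescaled, preserving direction. A pure-Python sketch (function name is illustrative):

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale the gradient vector if its global L2 norm exceeds max_norm;
    the direction is preserved, only the magnitude shrinks."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return grads
    scale = max_norm / norm
    return [g * scale for g in grads]

print(clip_by_global_norm([3.0, 4.0], 1.0))  # norm 5.0, rescaled down to 1.0
print(clip_by_global_norm([0.3, 0.4], 1.0))  # norm 0.5, left unchanged
```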
Convolutional neural networks (CNNs): Networks using convolution operations with weight sharing and locality, effective for images and other spatial signals.
In-context learning: Achieving task performance by providing a small number of examples inside the prompt, with no weight updates.
Vanishing gradients: Gradients shrink as they propagate backward through layers, slowing learning in early layers; mitigated by ReLU activations, residual connections, and normalization.
LoRA (Low-Rank Adaptation): Parameter-efficient fine-tuning (PEFT) method that injects trainable low-rank matrices into layers while the original weights stay frozen, enabling efficient fine-tuning.
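A minimal pure-Python sketch of the idea (class and variable names are illustrative): the pretrained weight W stays frozen, only the low-rank factors A and B would receive gradients, and because B starts at zero the adapted layer initially reproduces the frozen model exactly.

```python
import random

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

class LoRALinear:
    """Sketch of a LoRA layer: output = W @ x + (alpha / r) * B @ (A @ x)."""
    def __init__(self, W, r=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W                                                              # frozen pretrained weight
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]  # trainable, small random init
        self.B = [[0.0] * r for _ in range(d_out)]                              # trainable, zero init
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)
        low_rank = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * lr for b, lr in zip(base, low_rank)]

layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]])
print(layer.forward([3.0, 4.0]))  # B is zero, so output equals the frozen W @ x
```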
Recurrent neural networks (RNNs): Networks with recurrent connections for processing sequences; largely supplanted by Transformers for many tasks.
Logits: Raw model outputs before conversion to probabilities; manipulated during decoding (e.g., temperature scaling) and calibration.
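Softmax with temperature is the standard way logits become probabilities, and temperature is a common decoding-time manipulation. A pure-Python sketch:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities. Temperature rescales logits first:
    below 1.0 sharpens the distribution, above 1.0 flattens it."""
    z = [v / temperature for v in logits]
    m = max(z)                                # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # sums to 1 (up to float rounding)
```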
Pruning: Removing weights or neurons to shrink models and improve efficiency; can be structured (whole channels or heads) or unstructured (individual weights).
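Unstructured magnitude pruning is the simplest variant: zero out the smallest-magnitude fraction of weights. A pure-Python sketch (function name is illustrative):

```python
def magnitude_prune(weights, sparsity):
    """Unstructured pruning: zero out the smallest-magnitude `sparsity`
    fraction of a flat weight list, keeping the rest unchanged."""
    ranked = sorted(abs(w) for w in weights)
    k = int(len(ranked) * sparsity)           # number of weights to remove
    threshold = ranked[k - 1] if k > 0 else -1.0
    return [0.0 if abs(w) <= threshold else w for w in weights]

print(magnitude_prune([0.5, -0.1, 0.9, 0.05], 0.5))  # [0.5, 0.0, 0.9, 0.0]
```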
Bottleneck layer: A narrow hidden layer that forces the network to learn compact representations, as in autoencoders.
Weight sharing: Using the same parameters across different parts of a model (e.g., convolution filters, or tying input and output embeddings).
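In code, weight sharing often just means two modules referencing the same parameter object, as when input embeddings and the output projection are tied. A minimal Python illustration (variable names are illustrative):

```python
# Weight tying sketch: the input embedding and the output projection
# reference the same parameter object, so an update to one is seen by both.
embedding = [[0.1, 0.2], [0.3, 0.4]]   # vocab_size x d_model
output_projection = embedding           # same object, not a copy

embedding[0][0] = 9.9                   # a training update to the embedding...
print(output_projection[0][0])          # ...is visible through the projection
```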
Closed-weight models: Models accessible only via service APIs, with weights not released.
Graph attention network (GAT): GNN that uses attention to weight neighbor contributions dynamically during message passing.
Orthogonal vectors: Vectors with zero inner product; nonzero orthogonal vectors are linearly independent.
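A quick check in Python:

```python
def dot(u, v):
    """Standard inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

u, v = [1.0, 2.0], [2.0, -1.0]
print(dot(u, v))  # 0.0 -> u and v are orthogonal
```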
Catastrophic forgetting: Loss of previously learned knowledge when a model is trained on new tasks.
Continual learning: Training on a sequence of tasks while retaining performance on earlier ones, i.e., learning without catastrophic forgetting.