Results for "layers"
Vanishing Gradient (Intermediate)
Gradients shrink as they propagate backward through many layers, slowing learning in early layers; mitigated by ReLU activations, residual connections, and normalization.
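
A minimal sketch of the effect, assuming PyTorch; the depth, widths, and sigmoid activation are illustrative choices picked to make the shrinkage visible:

```python
import torch
import torch.nn as nn

# A deep stack of small sigmoid layers; sigmoid saturates easily,
# so gradients shrink multiplicatively as they flow backward.
layers = []
for _ in range(20):
    layers += [nn.Linear(32, 32), nn.Sigmoid()]
net = nn.Sequential(*layers)

loss = net(torch.randn(8, 32)).pow(2).mean()
loss.backward()

# Gradient norms layer by layer: early layers receive far smaller
# gradients than late layers, which is the vanishing-gradient effect.
for i, module in enumerate(net):
    if isinstance(module, nn.Linear):
        print(f"layer {i:2d}: grad norm = {module.weight.grad.norm():.3e}")
```
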
Neural Network (Intermediate)
A parameterized function composed of interconnected units organized in layers with nonlinear activations.
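
A minimal sketch, assuming PyTorch (any framework would do); the 784/256/10 layer sizes are illustrative:

```python
import torch
import torch.nn as nn

# Linear layers hold the parameters; ReLU supplies the nonlinearity.
mlp = nn.Sequential(
    nn.Linear(784, 256),  # input -> hidden layer
    nn.ReLU(),            # nonlinear activation
    nn.Linear(256, 10),   # hidden -> output layer
)

logits = mlp(torch.randn(1, 784))  # forward pass, output shape (1, 10)
```
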
Weight Initialization (Intermediate)
Methods for setting initial weights so that signal and gradient magnitudes are preserved across layers (e.g., Xavier/Glorot and He initialization).
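
A short sketch, assuming PyTorch; He (Kaiming) initialization is one standard choice for ReLU networks, and the layer sizes here are illustrative:

```python
import torch.nn as nn

def init_weights(module):
    # He initialization scales weights by fan-in so that activation
    # variance is roughly preserved across ReLU layers.
    if isinstance(module, nn.Linear):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        nn.init.zeros_(module.bias)

net = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 10))
net.apply(init_weights)  # runs the initializer on every submodule
```
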
Transformer (Intermediate)
Architecture based on self-attention and feedforward layers; the foundation of modern LLMs and many multimodal models.
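
A sketch of a single encoder block using PyTorch's built-in layer (an assumption; the dimensions are illustrative), showing the self-attention plus feedforward structure:

```python
import torch
import torch.nn as nn

# One encoder block: self-attention followed by a position-wise
# feedforward sublayer, each wrapped in a residual connection + LayerNorm.
block = nn.TransformerEncoderLayer(
    d_model=512, nhead=8, dim_feedforward=2048, batch_first=True
)

tokens = torch.randn(2, 16, 512)  # (batch, sequence length, embedding dim)
out = block(tokens)               # same shape as the input
```
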
LoRA (Intermediate)
Parameter-efficient fine-tuning (PEFT) method that injects trainable low-rank matrices into existing layers while the original weights stay frozen, enabling efficient fine-tuning.
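
A minimal sketch of the idea, assuming PyTorch; the class name, rank r, and scaling alpha are illustrative, not a reference implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Wraps a frozen linear layer and adds a trainable low-rank update:
    # y = base(x) + (alpha / r) * B(A(x)), with A and B much smaller than W.
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # update starts at zero
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))  # only A and B receive gradients
```
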
Residual Connection (Intermediate)
A skip connection that adds a layer's input to its output, allowing gradients to bypass layers and enabling very deep networks.
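
A minimal sketch, assuming PyTorch; the inner block F is an arbitrary two-layer MLP and the dimension is illustrative:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    # Computes y = x + F(x): the identity path gives gradients a direct
    # route around F, which keeps very deep stacks trainable.
    def __init__(self, dim: int = 256):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def forward(self, x):
        return x + self.f(x)  # the skip connection
```
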
Depth vs Width (Intermediate)
The tradeoff between stacking many layers (depth) and using many neurons per layer (width).
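
A small sketch of the comparison, assuming PyTorch; the depths and widths are illustrative values chosen so the two models have roughly equal parameter counts:

```python
import torch.nn as nn

def param_count(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters())

# Deep and narrow: 16 hidden layers of width 128.
deep = nn.Sequential(
    *[nn.Sequential(nn.Linear(128, 128), nn.ReLU()) for _ in range(16)]
)
# Shallow and wide: one hidden layer of width 1024.
wide = nn.Sequential(nn.Linear(128, 1024), nn.ReLU(), nn.Linear(1024, 128))

print(param_count(deep), param_count(wide))  # roughly comparable budgets
```
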