Results for "layers"
Gradients shrink as they propagate backward through layers, slowing learning in early layers; mitigated by ReLU activations, residual connections, and normalization.
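A minimal numeric sketch of why this happens, assuming sigmoid activations: the sigmoid's derivative is at most 0.25, and backpropagation multiplies one such factor per layer, so the gradient reaching early layers decays geometrically with depth.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# Backprop multiplies one activation derivative per layer; with sigmoid
# each factor is at most 0.25, so the product shrinks geometrically.
depth = 10
grad = 1.0
for _ in range(depth):
    grad *= sigmoid_grad(0.0)  # 0.25, the sigmoid's maximum derivative

print(grad)  # 0.25**10: early layers receive almost no gradient signal
```

ReLU helps because its derivative is 1 on the active half of its domain, so these per-layer factors stop shrinking the product.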
Allows gradients to bypass layers, enabling very deep networks.
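A sketch of the bypass effect, using a hypothetical layer `f` with a tiny gradient: a residual block computes `y = x + f(x)`, whose derivative is `1 + f'(x)`, so the identity path carries gradient even where `f` is nearly flat.

```python
def f(x):
    # A hypothetical layer whose gradient is tiny (near-saturated).
    return 1e-4 * x

def residual_block(x):
    # Skip connection: the identity path lets gradients bypass f.
    return x + f(x)

def num_grad(fn, x, eps=1e-6):
    # Central-difference estimate of the derivative.
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)

# Through f alone the gradient is ~1e-4; through the residual block
# it is ~1 + 1e-4, dominated by the identity path.
print(num_grad(f, 2.0))
print(num_grad(residual_block, 2.0))
```

Stacking many such blocks keeps the end-to-end gradient near 1 instead of a product of small factors.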
A parameterized function composed of interconnected units organized in layers with nonlinear activations.
Methods to set starting weights to preserve signal/gradient scales across layers.
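One such scheme is He initialization for ReLU layers, where each weight is drawn with standard deviation `sqrt(2 / fan_in)`; a sketch checking that the sampled variance matches the target:

```python
import math
import random

def he_init(fan_in, fan_out, rng):
    # He initialization: std = sqrt(2 / fan_in) keeps the variance of
    # activations roughly constant across ReLU layers.
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]

rng = random.Random(0)
w = he_init(512, 256, rng)
flat = [v for row in w for v in row]
mean = sum(flat) / len(flat)
var = sum((v - mean) ** 2 for v in flat) / len(flat)
print(var, 2.0 / 512)  # empirical variance close to the 2/fan_in target
```

Xavier/Glorot initialization is the analogous recipe for tanh/sigmoid layers, using `1 / fan_in` (or `2 / (fan_in + fan_out)`) in place of `2 / fan_in`.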
Networks using convolution operations with weight sharing and locality, effective for images and signals.
PEFT method injecting trainable low-rank matrices into layers, enabling efficient fine-tuning.
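A toy sketch of the low-rank update (illustrative sizes, plain-list matrices): the frozen weight `W` is augmented as `W + (alpha / r) * B @ A`, with `B` zero-initialized so the adapted layer starts out identical to the base layer.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

d, r, alpha = 4, 2, 8  # illustrative sizes: rank r much smaller than d
rng = random.Random(0)

W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]    # frozen
A = [[rng.gauss(0, 0.01) for _ in range(d)] for _ in range(r)]  # trainable, r x d
B = [[0.0] * r for _ in range(d)]                               # trainable, d x r, zeros

# Effective weight: W + (alpha / r) * B @ A. Only A and B (2*d*r values)
# are trained instead of all d*d entries of W.
delta = matmul(B, A)
scale = alpha / r
W_eff = [[W[i][j] + scale * delta[i][j] for j in range(d)] for i in range(d)]

# Because B starts at zero, the adapted layer initially matches the base layer.
print(W_eff == W)  # True
```

Training then updates only `A` and `B`, which cuts trainable parameters from `d*d` to `2*d*r`.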
A narrow hidden layer forcing compact representations.
Tradeoffs between depth (many layers) and width (many neurons per layer).
A branch of ML using multi-layer neural networks to learn hierarchical representations, often excelling in vision, speech, and language.
A parameterized mapping from inputs to outputs; comprises both the architecture and the learned parameters.
Configuration choices that govern training or architecture and are not (typically) learned directly.
Randomly zeroing activations during training to reduce co-adaptation and overfitting.
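A sketch of the common "inverted dropout" variant: each activation is zeroed with probability `p` during training, and survivors are scaled by `1/(1-p)` so the expected value is unchanged; at inference the layer is the identity.

```python
import random

def dropout(xs, p, rng, training=True):
    # Inverted dropout: zero each activation with probability p during
    # training, scale survivors by 1/(1-p) to preserve the expectation;
    # at inference, pass activations through unchanged.
    if not training:
        return list(xs)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]

rng = random.Random(0)
xs = [1.0] * 10000
out = dropout(xs, p=0.5, rng=rng)
print(sum(out) / len(out))  # close to 1.0: expectation preserved
print(dropout(xs, p=0.5, rng=rng, training=False) == xs)  # True at inference
```

Because units cannot rely on specific partners being present, co-adapted feature detectors are discouraged.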
Architecture based on self-attention and feedforward layers; foundation of modern LLMs and many multimodal models.
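The core self-attention step can be sketched in a few lines (toy numbers, using the input as queries, keys, and values for brevity): each query is dotted with every key, the scaled scores are softmaxed, and the result is a weighted average of the values.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(Q, K, V):
    # Scaled dot-product attention: each query attends to all keys,
    # producing a weighted average of the values.
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Three tokens with 2-dimensional embeddings (toy values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = self_attention(X, X, X)
print(Y)
```

Real Transformers first project the input through learned Q/K/V matrices, run several such heads in parallel, and interleave attention with feedforward layers.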
Early architecture using learned gates for skip connections.
Hidden behavior activated by specific triggers, causing targeted mispredictions or undesired outputs.
Using the same parameters across different parts of a model.
Capabilities that appear only beyond certain model sizes.
Empirical laws linking model size, data, and compute to performance.
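These laws typically take a power-law form such as `L(N) = c * N**(-alpha)`; a toy illustration with synthetic numbers (the constants here are hypothetical, not from any real fit) showing that a power law is a straight line in log-log space, whose slope recovers the exponent:

```python
import math

# Synthetic illustration: a power law L(N) = c * N**(-alpha) plotted in
# log-log space is a straight line with slope -alpha.
c, alpha = 10.0, 0.08  # hypothetical constants for illustration only
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [c * n ** (-alpha) for n in sizes]

x = [math.log(n) for n in sizes]
y = [math.log(l) for l in losses]
slope = (y[-1] - y[0]) / (x[-1] - x[0])
print(round(slope, 3))  # recovers -alpha
```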
A simplified Boltzmann machine with a bipartite structure.
Exact-likelihood generative models using invertible transforms.
A Transformer applied to image patches.
CNNs applied to time series.
A mechanism to disable an AI system.
Software pipeline converting raw sensor data into structured representations.