Results for "neural networks"
Neural networks can approximate any continuous function under certain conditions.
Networks with recurrent connections that process sequences step by step; largely supplanted by Transformers for many tasks.
Neural networks that operate on graph-structured data by propagating information along edges.
Generates audio waveforms from spectrograms.
A parameterized function composed of interconnected units organized in layers with nonlinear activations.
GNN framework where nodes iteratively exchange and aggregate messages from neighbors.
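A minimal sketch of one message-passing round in plain Python, assuming a dict of scalar node features and a list of directed edges; the fixed 0.5/0.5 average stands in for a learned update function:

```python
def message_passing_step(features, edges):
    # One round: each node mean-aggregates its neighbors' features,
    # then combines the result with its own feature (toy update rule).
    out = {}
    for node, feat in features.items():
        neighbors = [features[v] for (u, v) in edges if u == node]
        agg = sum(neighbors) / len(neighbors) if neighbors else 0.0
        out[node] = 0.5 * (feat + agg)
    return out
```

Stacking several such rounds lets information propagate across multi-hop neighborhoods.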
A branch of ML using multi-layer neural networks to learn hierarchical representations, often excelling in vision, speech, and language.
Automatically learning useful internal features (latent variables) that capture salient structure for downstream tasks.
Using the same parameters across different parts of a model (e.g., a convolution kernel reused at every spatial position).
Nonlinear functions enabling networks to approximate complex mappings; ReLU variants dominate modern DL.
Activation max(0, x); improves gradient flow and training speed in deep nets.
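A one-line sketch of ReLU in plain Python (framework implementations operate elementwise on tensors, but the rule is the same):

```python
def relu(xs):
    # ReLU applied elementwise: negatives become 0, positives pass through
    return [max(0.0, x) for x in xs]
```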
Gradients grow too large, causing divergence; mitigated by clipping, normalization, careful init.
Tradeoffs between depth (many layers) and width (many neurons per layer).
Loss of old knowledge when learning new tasks.
Techniques that stabilize and speed training by normalizing activations; LayerNorm is common in Transformers.
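A minimal LayerNorm sketch for a single feature vector, with the learnable gain and bias omitted for brevity:

```python
import math

def layer_norm(xs, eps=1e-5):
    # Normalize the feature vector to zero mean and (near) unit variance;
    # eps guards against division by zero for constant inputs
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]
```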
Networks using convolution operations with weight sharing and locality, effective for images and signals.
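A toy 1-D convolution (cross-correlation, as is conventional in deep learning) showing weight sharing: the same kernel slides over every position of the signal:

```python
def conv1d(signal, kernel):
    # Valid 1-D convolution: the shared kernel is applied at each offset
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]
```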
A model that assigns probabilities to sequences of tokens; often trained by next-token prediction.
Identifying and localizing objects in images, often with confidence scores and bounding rectangles.
Optimization with multiple local minima/saddle points; typical in neural networks.
Limiting gradient magnitude to prevent exploding gradients.
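A sketch of clipping by global norm, assuming gradients flattened into one list of floats; frameworks apply the same rescaling across all parameter tensors:

```python
import math

def clip_by_global_norm(grads, max_norm):
    # If the global L2 norm exceeds max_norm, rescale all gradients
    # uniformly so the norm equals max_norm; otherwise leave them unchanged
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grads]
    return list(grads)
```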
Maps audio signals to linguistic units such as phonemes or words.
Detects trigger phrases in audio streams.
Randomly zeroing activations during training to reduce co-adaptation and overfitting.
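A sketch of inverted dropout on a list of activations; scaling survivors by 1/(1-p) keeps the expected activation unchanged, so no rescaling is needed at inference:

```python
import random

def dropout(xs, p, training=True):
    # Inverted dropout: zero each activation with probability p during
    # training, scale survivors by 1/(1-p); identity at inference time
    if not training or p == 0.0:
        return list(xs)
    return [0.0 if random.random() < p else x / (1.0 - p) for x in xs]
```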
Allows gradients to bypass layers, enabling very deep networks.
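The skip connection reduces to one line: the block's output is its transformation plus the unchanged input, so the identity path carries gradients past the layer. A toy sketch with a scalar input and an arbitrary callable standing in for the layer:

```python
def residual_block(x, layer):
    # Output = layer(x) + x; the identity term lets gradients bypass the layer
    return layer(x) + x
```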
Gradients shrink through layers, slowing learning in early layers; mitigated by ReLU, residuals, normalization.
Early architecture using learned gates for skip connections.
Generates sequences one token at a time, conditioning on past tokens.
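The decoding loop can be sketched generically; here `model` is a hypothetical stand-in callable that maps a token history to the next token:

```python
def generate(model, prompt, steps):
    # Autoregressive decoding: feed the growing sequence back into the
    # model, appending one predicted token per step
    seq = list(prompt)
    for _ in range(steps):
        seq.append(model(seq))
    return seq
```

With a toy "model" that just increments the last token, `generate(lambda s: s[-1] + 1, [0], 3)` yields `[0, 1, 2, 3]`.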
Learning a sequence of tasks without catastrophic forgetting of earlier ones.
An RNN variant using gates to mitigate vanishing gradients and capture longer context.
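A toy single-step LSTM cell with scalar state and per-gate weights (real cells use weight matrices and bias vectors); the additive cell update `c = f*c_prev + i*g` is what eases gradient flow:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_cell(x, h_prev, c_prev, w):
    # w is a dict of scalar weights, one per gate (toy parameterization)
    f = sigmoid(w['f'] * (x + h_prev))    # forget gate: keep old cell state?
    i = sigmoid(w['i'] * (x + h_prev))    # input gate: write new candidate?
    g = math.tanh(w['g'] * (x + h_prev))  # candidate cell update
    o = sigmoid(w['o'] * (x + h_prev))    # output gate: expose cell state?
    c = f * c_prev + i * g                # additive update eases gradient flow
    h = o * math.tanh(c)
    return h, c
```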
Routes inputs to subsets of parameters for scalable capacity.
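A sketch of top-k expert routing, assuming gate scores are given externally (in practice a learned gating network produces them) and each expert is a callable:

```python
def moe_forward(x, experts, gate_scores, k=1):
    # Select the k highest-scoring experts, run only those on the input,
    # and mix their outputs weighted by renormalized gate scores
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i],
                 reverse=True)[:k]
    total = sum(gate_scores[i] for i in top)
    return sum(gate_scores[i] / total * experts[i](x) for i in top)
```

Because only k of the experts execute per input, total parameter count can grow without a proportional increase in per-example compute.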