Results for "layer norm"
Norm: A measure of a vector's magnitude, such as the Euclidean (L2) norm; used in regularization and optimization.
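As a minimal sketch, the Euclidean (L2) norm is the square root of the sum of squared components; the function name `l2_norm` below is illustrative, not from any particular library:

```python
import math

def l2_norm(v):
    # Euclidean (L2) norm: square root of the sum of squared components.
    return math.sqrt(sum(x * x for x in v))

# The vector (3, 4) has L2 norm 5 (a 3-4-5 right triangle).
print(l2_norm([3.0, 4.0]))  # 5.0
```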
Norm (multi-agent systems): The emergence of shared conventions among interacting agents.
Bottleneck layer: A narrow hidden layer that forces a network to learn compact representations.
Sensitivity: The degree to which a function's output changes under small input perturbations.
Deep learning: A branch of ML using multi-layer neural networks to learn hierarchical representations; often excels in vision, speech, and language.
Neural network: A parameterized function composed of interconnected units organized in layers with nonlinear activations.
Normalization: Techniques that stabilize and speed up training by normalizing activations; layer normalization (LayerNorm) is standard in Transformers.
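A rough sketch of layer normalization: standardize each feature vector to zero mean and unit variance, then optionally apply a learned per-feature scale and shift. This minimal pure-Python version (illustrative names, not a framework API) shows the core computation:

```python
import math

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    # Standardize one feature vector: subtract the mean, divide by the
    # standard deviation (eps avoids division by zero), then optionally
    # apply the learned per-feature scale (gamma) and shift (beta).
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / n
    y = [(xi - mean) / math.sqrt(var + eps) for xi in x]
    if gamma is not None and beta is not None:
        y = [g * yi + b for g, yi, b in zip(gamma, y, beta)]
    return y

out = layer_norm([1.0, 2.0, 3.0, 4.0])  # mean ~0, variance ~1
```

Unlike batch normalization, this statistic is computed per example, so it behaves identically at training and inference time.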
System prompt: A high-priority instruction layer that sets overarching behavior constraints for a chat model.
Depth vs. width: The trade-off between stacking many layers and widening each layer with more neurons.
Exploding gradients: Gradients that grow too large during backpropagation, causing divergence; mitigated by clipping, normalization, and careful initialization.
Gradient clipping: Limiting gradient magnitude to prevent exploding gradients.
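Clipping by global norm can be sketched as follows (hypothetical helper, not a framework API): if the gradient's L2 norm exceeds a threshold, rescale the whole vector down to that threshold:

```python
import math

def clip_by_global_norm(grads, max_norm):
    # Rescale the full gradient vector so its L2 norm is at most max_norm;
    # gradients already within the threshold pass through unchanged.
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        return [g * scale for g in grads]
    return list(grads)

clipped = clip_by_global_norm([3.0, 4.0], 1.0)  # norm 5 -> rescaled to norm 1
```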
Guardrails: Automated detection and prevention of disallowed outputs (toxicity, self-harm, illegal instructions, etc.).
Encryption in transit and at rest: Protecting data during network transfer and while stored; essential for ML pipelines that handle sensitive data.
Logits: Raw model outputs before conversion to probabilities; manipulated during decoding and calibration.
Softmax: Converts logits to probabilities by exponentiation and normalization; ubiquitous in classification and language models.
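A numerically stable softmax subtracts the maximum logit before exponentiating; the subtraction leaves the result mathematically unchanged but prevents overflow. A minimal sketch:

```python
import math

def softmax(logits):
    # Subtracting the max logit is mathematically a no-op (it cancels in
    # the ratio) but prevents overflow in exp() for large logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # probabilities summing to 1
```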
Universal approximation theorem: Neural networks can approximate any continuous function arbitrarily well under certain conditions (e.g., on a compact domain with a suitable activation).
Attention head: A single attention mechanism within multi-head attention.
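One head of scaled dot-product attention, for a single query vector, can be sketched as follows (pure Python with an illustrative function name; real implementations batch this with matrix multiplies):

```python
import math

def attention_head(query, keys, values):
    # Scores: dot product of the query with each key, scaled by sqrt(d).
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Softmax over the scores gives attention weights.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output: attention-weighted sum of the value vectors.
    dim_v = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(dim_v)]

# The query matches the first key more closely, so the output leans
# toward the first value.
out = attention_head([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0], [20.0]])
```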
Message passing: A GNN framework in which nodes iteratively exchange and aggregate messages from their neighbors.
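A single message-passing round with mean aggregation might look like this; the update rule (averaging a node's own feature with its neighbors' mean) is an illustrative choice, not a specific GNN's:

```python
def message_passing_step(features, adjacency):
    # features: node -> scalar feature; adjacency: node -> list of neighbors.
    new_features = {}
    for node, feat in features.items():
        neighbors = adjacency.get(node, [])
        if neighbors:
            # Aggregate: mean of the neighbors' features.
            agg = sum(features[n] for n in neighbors) / len(neighbors)
        else:
            agg = 0.0
        # Update: blend the node's own feature with the aggregate.
        new_features[node] = 0.5 * feat + 0.5 * agg
    return new_features

# Two connected nodes drift toward each other's values.
step = message_passing_step({0: 1.0, 1: 3.0}, {0: [1], 1: [0]})
```

Stacking k such rounds lets information propagate across k-hop neighborhoods.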
Tool use: Models trained to decide when and how to call external tools (e.g., search, code execution, APIs).
Restricted Boltzmann Machine: A simplified Boltzmann machine with a bipartite structure between visible and hidden units.