Results for "adaptive learning rates"
A wide basin often correlated with better generalization.
Limiting gradient magnitude to prevent exploding gradients.
Matrix of second derivatives describing local curvature of loss.
Allows gradients to bypass layers, enabling very deep networks.
The range of functions a model can represent.
Capabilities that appear only beyond certain model sizes.
Extending agents with long-term memory stores.
Optimizing policies directly via gradient ascent on expected reward.
Multiple agents interacting cooperatively or competitively.
Models trained to decide when to call tools.
Ensuring decisions can be explained and traced.
Legal or policy requirement to explain AI decisions.
Recovering training data from gradients.
Detecting unauthorized model outputs or data leaks.
Neural networks that operate on graph-structured data by propagating information along edges.
Extension of convolution to graph domains using adjacency structure.
Graphs containing multiple node or edge types with different semantics.
GNN using attention to weight neighbor contributions dynamically.
Probabilistic graphical model for structured prediction.
Graphical model expressing factorization of a probability distribution.
Diffusion performed in latent space for efficiency.
Model that compresses input into latent space and reconstructs it.
Exact likelihood generative models using invertible transforms.
Combining signals from multiple modalities.
Generating human-like speech from text.
Changing speaker characteristics while preserving content.
Maps audio signals to linguistic units.
Identifying speakers in audio.
Generates audio waveforms from spectrograms.
Model execution path in production.