Results for "compute-data-performance"
Methods to set starting weights to preserve signal/gradient scales across layers.
Techniques that stabilize and speed training by normalizing activations; LayerNorm is common in Transformers.
The set of tokens a model can represent; impacts efficiency, multilinguality, and handling of rare strings.
A high-priority instruction layer setting overarching behavior constraints for a chat model.
Breaking documents into pieces for retrieval; chunk size/overlap strongly affect RAG quality.
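A minimal sketch of fixed-size chunking with overlap, to make the chunk size/overlap tradeoff concrete. The function name `chunk_text` and the word-level splitting are assumptions for illustration; real pipelines often chunk by tokens or sentences instead.

```python
def chunk_text(words, chunk_size, overlap):
    """Split a list of words into windows of chunk_size that share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the document
    return chunks
```

A larger overlap repeats more context across neighboring chunks, at the cost of more chunks to index.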
Controlled experiment comparing variants by random assignment to estimate causal effects of changes.
When some classes are rare, requiring reweighting, resampling, or specialized metrics.
Policies and practices for approving, monitoring, auditing, and documenting models in production.
Reducing numeric precision of weights/activations to speed inference and reduce memory with acceptable accuracy loss.
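A minimal sketch of symmetric per-tensor int8 quantization, one common scheme among several. The helper names `quantize_int8` and `dequantize` are hypothetical, and plain Python lists stand in for tensors.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid division by zero
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats; the error per value is at most scale / 2."""
    return [x * scale for x in q]
```

The rounding error bounded by `scale / 2` is the "acceptable accuracy loss" side of the tradeoff; the memory saving comes from storing 8-bit integers plus one scale.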
Reduction in uncertainty achieved by observing a variable; used in decision trees and active learning.
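A minimal sketch of how this reduction can be computed for a discrete feature, as done when choosing decision-tree splits: entropy of the labels minus the weighted entropy after grouping by the feature. The function names are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, feature_values):
    """Entropy of labels minus the expected entropy after observing the feature."""
    n = len(labels)
    groups = {}
    for y, x in zip(labels, feature_values):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```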
Removing weights or neurons to shrink models and improve efficiency; can be structured or unstructured.
Measures how one probability distribution diverges from another.
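A minimal sketch, assuming this entry refers to the Kullback-Leibler divergence between two discrete distributions given as probability lists over the same outcomes. The function name `kl_divergence` is hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions over the same outcomes."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # terms with p(x) = 0 contribute nothing
        if qi == 0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total
```

Note that the measure is asymmetric: `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ.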
Hidden behavior activated by specific triggers, causing targeted mispredictions or undesired outputs.
Quantifies shared information between random variables.
Adjusting learning rate over training to improve convergence.
Gradually increasing learning rate at training start to avoid divergence.
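A minimal sketch of a schedule combining linear warmup with cosine decay, one common pairing among many. The function name `lr_at_step` and its parameters are assumptions for illustration, not a specific library's API.

```python
import math


def lr_at_step(step, base_lr, warmup_steps, total_steps):
    """Linear warmup to base_lr, then cosine decay toward zero."""
    if step < warmup_steps:
        # Ramp up from base_lr / warmup_steps to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))
```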
Tradeoffs between many layers (depth) and many neurons per layer (width).
Encodes positional information via rotation in embedding space.
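A minimal sketch of the rotation idea: consecutive pairs of embedding dimensions are rotated by an angle proportional to the token position. The frequency base of 10000 and the helper names are assumptions for illustration; production implementations operate on batched tensors.

```python
import math


def rotate(vec, position, base=10000.0):
    """Rotate each (even, odd) pair of vec by an angle that grows with position."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("embedding dimension must be even")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))
```

Because each pair is rotated, the dot product of a rotated query and key depends only on the difference of their positions, not on the absolute positions.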
Routes inputs to subsets of parameters for scalable capacity.
Extending agents with long-term memory stores.
Multiple agents interacting cooperatively or competitively.
Models evaluating and improving their own outputs.
Framework for identifying, measuring, and mitigating model risks.
Pixel-wise classification of image regions.
Transformer applied to image patches.
Maps audio signals to linguistic units.
Detects trigger phrases in audio streams.
System that independently pursues goals over time.
Number of steps considered in planning.
Average value under a distribution.