Results for "out-of-sample performance"
Techniques that stabilize and speed training by normalizing activations; LayerNorm is common in Transformers.
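A minimal NumPy sketch of layer normalization over the last axis (names and the tiny example are illustrative, not any particular library's API):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize each row of x to zero mean and unit variance,
    then apply a learned scale (gamma) and shift (beta)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.array([[1.0, 2.0, 3.0, 4.0]])
out = layer_norm(x, gamma=np.ones(4), beta=np.zeros(4))
```

Unlike BatchNorm, the statistics are computed per sample over the feature axis, so behavior is identical at train and inference time.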
Randomly zeroing activations during training to reduce co-adaptation and overfitting.
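This describes dropout; a sketch of the common "inverted" variant, where survivors are rescaled at train time so no correction is needed at inference (function name illustrative):

```python
import numpy as np

def dropout(x, p, rng, training=True):
    """Inverted dropout: zero each activation with probability p during
    training, scaling survivors by 1/(1-p) so the expected value is unchanged."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)
```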
Networks using convolution operations with weight sharing and locality, effective for images and signals.
Mechanism that computes context-aware mixtures of representations; scales well and captures long-range dependencies.
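A sketch of the core operation, single-head scaled dot-product attention, in NumPy (no masking or batching; names illustrative):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    Each output row is a weighted mixture of the rows of V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights
```

The weights form a full position-to-position matrix, which is what lets any token attend directly to any other, however far apart.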
Converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.
A high-priority instruction layer setting overarching behavior constraints for a chat model.
Achieving task performance by providing a small number of examples inside the prompt without weight updates.
Breaking documents into pieces for retrieval; chunk size/overlap strongly affect RAG quality.
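A minimal character-level chunker with overlap, as one sketch of the idea (token-based chunkers are more common in practice; parameters illustrative):

```python
def chunk(text, size=200, overlap=50):
    """Split text into fixed-size chunks; consecutive chunks share
    `overlap` characters so content cut at a boundary survives intact
    in at least one chunk."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Larger overlap improves recall at retrieval time at the cost of index size and duplicated context.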
Fine-tuning on (prompt, response) pairs to align a model with instruction-following behaviors.
Controlled experiment comparing variants by random assignment to estimate causal effects of changes.
Human or automated process of assigning targets; quality, consistency, and guidelines matter heavily.
Selecting the most informative samples to label (e.g., uncertainty sampling) to reduce labeling cost.
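A sketch of least-confidence uncertainty sampling: rank unlabeled samples by the model's top class probability and label the lowest first (names illustrative):

```python
import numpy as np

def least_confident(probs, k):
    """Given an (n_samples, n_classes) array of predicted probabilities,
    return indices of the k samples where the top class probability is
    lowest, i.e. where the model is least confident."""
    confidence = probs.max(axis=1)
    return np.argsort(confidence)[:k]
```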
Ordering training samples from easier to harder to improve convergence or generalization.
When some classes are rare, requiring reweighting, resampling, or specialized metrics.
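A sketch of one reweighting scheme, inverse-frequency class weights, under which each class contributes equal total loss regardless of its frequency (formula is the common `n / (k * count)` balancing heuristic):

```python
from collections import Counter

def inverse_freq_weights(labels):
    """Weight each class inversely to its frequency: with n samples and
    k classes, class c gets weight n / (k * count_c), so every class
    has the same total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}
```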
Processes and controls for data quality, access, lineage, retention, and compliance across the AI lifecycle.
Policies and practices for approving, monitoring, auditing, and documenting models in production.
Central system to store model versions, metadata, approvals, and deployment state.
Time from request to response; critical for real-time inference and UX.
How many requests or tokens can be processed per unit time; affects scalability and cost.
Techniques that fine-tune small additional components rather than all weights to reduce compute and storage.
Reducing numeric precision of weights/activations to speed inference and reduce memory with acceptable accuracy loss.
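A sketch of the simplest variant, symmetric per-tensor int8 quantization (real systems typically use per-channel scales, calibration, or quantization-aware training; names illustrative):

```python
import numpy as np

def quantize_int8(w):
    """Map float weights to int8 in [-127, 127] with one per-tensor scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; error is bounded by the scale."""
    return q.astype(np.float32) * scale
```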
Mechanisms for retaining context across turns/sessions: scratchpads, vector memories, structured stores.
Removing weights or neurons to shrink models and improve efficiency; can be structured or unstructured.
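A sketch of unstructured magnitude pruning, zeroing the smallest-magnitude weights (structured pruning would instead remove whole neurons or channels; names illustrative):

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero the fraction `sparsity` of weights with the smallest
    absolute value; ties at the threshold are also zeroed."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    pruned = w.copy()
    pruned[np.abs(w) <= threshold] = 0.0
    return pruned
```

Unstructured sparsity shrinks storage but needs sparse kernels or hardware support to translate into actual speedups.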
Reduction in uncertainty achieved by observing a variable; used in decision trees and active learning.
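A sketch of information gain as used in decision-tree splitting: parent entropy minus the size-weighted entropy of the child splits (names illustrative):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy in bits of a label sequence."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(parent, splits):
    """Entropy of the parent minus the size-weighted entropy of the splits."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in splits)
```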
Hidden behavior activated by specific triggers, causing targeted mispredictions or undesired outputs.
Measures how one probability distribution diverges from a reference distribution; asymmetric and non-negative, so not a true distance metric.
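This matches the Kullback-Leibler divergence; a sketch for discrete distributions (assumes q has support wherever p does):

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i). Zero iff p == q;
    terms with p_i == 0 contribute nothing by convention."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / q[mask])).sum())
```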
AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.
Quantifies shared information between random variables.
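This describes mutual information; a sketch computing I(X;Y) from a discrete joint probability table (assumes the table sums to 1):

```python
import numpy as np

def mutual_information(joint):
    """I(X; Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) ),
    in nats, from an (|X|, |Y|) joint probability table."""
    joint = np.asarray(joint, float)
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (px * py)[mask])).sum())
```

It is zero exactly when the variables are independent, and equals the entropy of either variable when one determines the other.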
Converting audio speech into text, often using encoder-decoder or transducer architectures.
A narrow, high-curvature minimum of the loss surface, often associated with poorer generalization than flat minima.