Results for "performance"
Objective function: a scalar measure optimized during training, typically expected loss over the data, sometimes with added regularization terms.
Regularization: techniques that discourage overly complex solutions to improve generalization (i.e., reduce overfitting).
Underfitting: when a model cannot capture the underlying structure of the data, performing poorly on both training and test sets.
Generalization: how well a model performs on new data drawn from the same (or a similar) distribution as the training data.
Precision: of predicted positives, the fraction that are truly positive; sensitive to false positives.
Recall: of actual positives, the fraction correctly identified; sensitive to false negatives.
Specificity: of actual negatives, the fraction correctly identified.
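As a concrete sketch of the three metrics above (function and variable names are my own, not from the source), all of them fall out of the four confusion-matrix counts:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, and specificity from confusion-matrix counts."""
    precision = tp / (tp + fp)    # of predicted positives, how many are truly positive
    recall = tp / (tp + fn)       # of actual positives, how many were found
    specificity = tn / (tn + fp)  # of actual negatives, how many were found
    return precision, recall, specificity

# Example: 8 true positives, 2 false positives, 85 true negatives, 5 false negatives
p, r, s = classification_metrics(tp=8, fp=2, tn=85, fn=5)
print(round(p, 3), round(r, 3), round(s, 3))  # → 0.8 0.615 0.977
```

Note how adding false positives lowers only precision and specificity, while adding false negatives lowers only recall.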
Cross-entropy loss: penalizes confident wrong predictions heavily; the standard loss for classification and language modeling.
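A minimal illustration of that penalty (names are mine): per example, the loss is the negative log-probability the model assigned to the true class, which explodes as that probability approaches zero.

```python
import math

def cross_entropy(probs, true_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[true_index])

probs = [0.9, 0.05, 0.05]
print(cross_entropy(probs, 0))  # confident and correct: small loss (~0.105)
print(cross_entropy(probs, 1))  # confident and wrong: large loss (~3.0)
```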
Momentum: uses an exponential moving average of gradients to speed convergence and reduce oscillation.
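A toy sketch of one such update, in the common heavy-ball formulation (names and hyperparameters are illustrative, not from the source):

```python
def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    """One SGD-with-momentum step: v is an exponentially decayed sum of past gradients."""
    v = beta * v + grad  # velocity smooths successive gradients
    w = w - lr * v       # step along the smoothed direction
    return w, v

# Minimize f(w) = w**2 (gradient 2*w); the velocity term damps the oscillation.
w, v = 5.0, 0.0
for _ in range(200):
    w, v = momentum_step(w, v, grad=2 * w)
# w has converged close to the minimum at 0
```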
Batch size: the number of samples per gradient update; impacts compute efficiency, generalization, and training stability.
Weight initialization: methods for setting starting weights so that signal and gradient scales are preserved across layers.
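One widely used scheme of this kind is He initialization, which scales the weight variance by fan-in so ReLU layers neither shrink nor amplify the signal; a minimal sketch (names mine):

```python
import math
import random

def he_init(fan_in, fan_out):
    """He initialization: zero-mean Gaussian weights with variance 2/fan_in."""
    std = math.sqrt(2.0 / fan_in)
    return [[random.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]

random.seed(42)
weights = he_init(fan_in=512, fan_out=256)  # per-weight std ≈ sqrt(2/512) ≈ 0.0625
```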
Activation functions: nonlinear functions that enable networks to approximate complex mappings; ReLU variants dominate modern deep learning.
Normalization layers: techniques that stabilize and speed up training by normalizing activations; LayerNorm is common in Transformers.
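A bare-bones sketch of LayerNorm over one activation vector, omitting the usual learned gain and bias (names mine):

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector of activations to zero mean and (near-)unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

out = layer_norm([2.0, 4.0, 6.0, 8.0])
# The output has mean ~0 and variance ~1 regardless of the input's scale.
```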
Dropout: randomly zeroing activations during training to reduce co-adaptation and overfitting.
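A sketch of the common "inverted dropout" formulation, which rescales the survivors so that inference needs no adjustment (names mine):

```python
import random

def dropout(x, p=0.5, training=True):
    """Inverted dropout: zero each activation with probability p,
    scale survivors by 1/(1-p) so the expected activation is unchanged."""
    if not training:
        return list(x)  # identity at inference time
    return [0.0 if random.random() < p else v / (1.0 - p) for v in x]

random.seed(0)
print(dropout([1.0, 1.0, 1.0, 1.0], p=0.5))  # → [2.0, 2.0, 0.0, 0.0]
```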
Convolutional neural networks: networks using convolution operations with weight sharing and locality, effective for images and other signals.
Attention: a mechanism that computes context-aware mixtures of representations; it scales well and captures long-range dependencies.
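A minimal scaled dot-product attention for a single query, in plain Python (names and the toy inputs are illustrative): each value vector is weighted by how well its key matches the query.

```python
import math

def attention(q, keys, values):
    """Scaled dot-product attention for one query: softmax(q·k / sqrt(d)) mixes the values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)                             # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    weights = [e / sum(exps) for e in exps]     # softmax over the keys
    # Output is a weighted average of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

# The query matches the first key most closely, so the first value dominates the mixture.
out = attention(q=[1.0, 0.0], keys=[[1.0, 0.0], [0.0, 1.0]], values=[[10.0, 0.0], [0.0, 10.0]])
```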
Tokenization: converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.
Vocabulary: the set of tokens a model can represent; impacts efficiency, multilinguality, and the handling of rare strings.
System prompt: a high-priority instruction layer setting overarching behavior constraints for a chat model.
Few-shot prompting: achieving task performance by providing a small number of examples inside the prompt, without weight updates.
Chunking: breaking documents into pieces for retrieval; chunk size and overlap strongly affect RAG quality.
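One simple chunking strategy is fixed-size windows with overlap, so a passage cut at one boundary also appears intact in a neighboring chunk (a sketch; sizes are illustrative, not prescriptive):

```python
def chunk_text(text, size=200, overlap=50):
    """Split text into fixed-size chunks; consecutive chunks share `overlap` characters."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("x" * 500, size=200, overlap=50)
print(len(chunks), [len(c) for c in chunks])  # → 3 [200, 200, 200]
```

Production chunkers usually split on semantic boundaries (sentences, headings) rather than raw character counts, but the size/overlap trade-off is the same.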
Instruction tuning: fine-tuning on (prompt, response) pairs to align a model with instruction-following behavior.
A/B testing: a controlled experiment comparing variants by random assignment to estimate the causal effects of changes.
Labeling: the human or automated process of assigning targets; quality, consistency, and guidelines matter heavily.
Active learning: selecting the most informative samples to label (e.g., by uncertainty sampling) to reduce labeling cost.
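One common uncertainty-sampling criterion is the margin between the top two predicted class probabilities: the smaller the margin, the less confident the model, and the more informative the label. A sketch (names mine):

```python
def most_uncertain(prob_lists, k=2):
    """Margin-based uncertainty sampling: return indices of the k samples
    whose top two class probabilities are closest (least confident)."""
    def margin(probs):
        top = sorted(probs, reverse=True)
        return top[0] - top[1]
    ranked = sorted(range(len(prob_lists)), key=lambda i: margin(prob_lists[i]))
    return ranked[:k]

preds = [[0.95, 0.05], [0.55, 0.45], [0.70, 0.30], [0.51, 0.49]]
print(most_uncertain(preds))  # → [3, 1]
```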
Curriculum learning: ordering training samples from easier to harder to improve convergence or generalization.
Class imbalance: when some classes are rare, requiring reweighting, resampling, or specialized metrics.
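A common reweighting scheme is inverse class frequency, so the loss contribution of a rare class is scaled up in proportion to its rarity; a minimal sketch (names mine):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class loss weights proportional to 1/frequency, normalized so a
    perfectly balanced dataset would give every class weight 1.0."""
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}

w = inverse_frequency_weights(["neg"] * 90 + ["pos"] * 10)
print(w)  # the rare "pos" class gets a 9x larger weight than "neg"
```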
Data governance: processes and controls for data quality, access, lineage, retention, and compliance across the AI lifecycle.
Model governance: policies and practices for approving, monitoring, auditing, and documenting models in production.
Model registry: a central system to store model versions, metadata, approvals, and deployment state.