Search: out-of-sample performance

Fine-Tuning Intermediate

Updating a pretrained model’s weights on task-specific data to improve performance or adapt style/behavior.

Large Language Models

CI/CD for ML Intermediate

Automated testing and deployment processes for models and data workflows, extending DevOps to ML artifacts.

MLOps & Infrastructure

Data Scaling Intermediate

Increasing performance via more data.

AI Economics & Strategy

Latency SLA Intermediate

Guaranteed response times.

AI Economics & Strategy

Robust Control Intermediate

Control that remains stable under model uncertainty.

Foundations & Theory

Capability Overhang Advanced

Stored compute or algorithms enabling rapid jumps.

AI Safety & Alignment

Alignment Tax Advanced

Tradeoff between safety and performance.

AI Safety & Alignment

Domain Shift Intermediate

A mismatch between training and deployment data distributions that can degrade model performance.

MLOps & Infrastructure

Overfitting Intermediate

When a model fits noise/idiosyncrasies of training data and performs poorly on unseen data.

Foundations & Theory

Accuracy Intermediate

Fraction of correct predictions; can be misleading on imbalanced datasets.

Foundations & Theory

ROC Curve Intermediate

Plots true positive rate vs false positive rate across thresholds; summarizes separability.

Foundations & Theory

AUC Intermediate

Scalar summary of ROC; measures ranking ability, not calibration.

Foundations & Theory

Mean Squared Error Intermediate

Average of squared residuals; common regression objective.

Optimization

Early Stopping Intermediate

Halting training when validation performance stops improving to reduce overfitting.

Foundations & Theory

Model Card Intermediate

Standardized documentation describing intended use, performance, limitations, data, and ethical considerations.

Foundations & Theory

Experiment Tracking Intermediate

Logging hyperparameters, code versions, data snapshots, and results to reproduce and compare experiments.

Evaluation & Benchmarking

Audit Intermediate

Systematic review of model/data processes to ensure performance, fairness, security, and policy compliance.

Governance & Ethics

Observability Intermediate

A broader capability to infer internal system state from telemetry, crucial for AI services and agents.

Evaluation & Benchmarking

MLOps Intermediate

Practices for operationalizing ML: versioning, CI/CD, monitoring, retraining, and reliable production management.

MLOps & Infrastructure

Compute Intermediate

Hardware resources used for training/inference; constrained by memory bandwidth, FLOPs, and parallelism.

Foundations & Theory

Distillation Intermediate

Training a smaller “student” model to mimic a larger “teacher,” often improving efficiency while retaining performance.

Foundations & Theory

Perplexity Intermediate

Exponential of average negative log-likelihood; lower means better predictive fit, not necessarily better utility.

Evaluation & Benchmarking

Benchmark Intermediate

A dataset + metric suite for comparing models; can be gamed or misaligned with real-world goals.

Evaluation & Benchmarking

Eval Harness Intermediate

System for running consistent evaluations across tasks, versions, prompts, and model settings.

Foundations & Theory

Data Poisoning Intermediate

Maliciously inserting or altering training data to implant backdoors or degrade performance.

Foundations & Theory

Emergent Abilities Intermediate

Capabilities that appear only beyond certain model sizes.

AI Economics & Strategy

Image Classification Intermediate

Assigning category labels to images.

Computer Vision

Training Pipeline Intermediate

End-to-end process for model training.

MLOps & Infrastructure

Shadow Deployment Intermediate

Running new model alongside production without user impact.

MLOps & Infrastructure

Compute Scaling Intermediate

Increasing model capacity via compute.

AI Economics & Strategy

Results for "out-of-sample performance"

Welcome to AI Glossary

Search

Browse

3D WordGraph