Results for "standardized evaluation"
Standardized documentation describing intended use, performance, limitations, data, and ethical considerations.
System for running consistent evaluations across tasks, versions, prompts, and model settings.
A dataset + metric suite for comparing models; can be gamed or misaligned with real-world goals.
Suitably normalized, the sum of many independent, identically distributed variables with finite variance converges to a normal distribution.
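A minimal simulation of this convergence, assuming Uniform(0, 1) draws (the sample sizes and seed are illustrative): averages of many draws cluster around 0.5 with spread shrinking like 1/sqrt(n).

```python
import random
import statistics

random.seed(0)

def sample_means(n_draws, n_samples):
    """Return n_samples means, each averaging n_draws Uniform(0, 1) draws."""
    return [statistics.mean(random.random() for _ in range(n_draws))
            for _ in range(n_samples)]

means = sample_means(n_draws=30, n_samples=2000)
# Mean of the means is close to 0.5; their spread is close to
# sqrt(1/12) / sqrt(30), roughly 0.053, as the theorem predicts.
print(statistics.mean(means), statistics.stdev(means))
```

A histogram of `means` would look approximately bell-shaped even though each underlying draw is uniform.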
Structured dataset documentation covering collection, composition, recommended uses, biases, and maintenance.
US framework for AI risk governance.
When information from evaluation data improperly influences training, inflating reported performance.
Separating data into training (fit), validation (tune), and test (final estimate) to avoid leakage and optimism bias.
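A minimal sketch of such a split in pure Python (the function name and fractions are illustrative; real pipelines often need stratified or grouped splits):

```python
import random

def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once with a fixed seed, then carve off test and validation
    partitions; the remainder is the training set."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

Fixing the seed keeps the split reproducible, and splitting once up front prevents test examples from leaking into tuning decisions.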
A robust evaluation technique that trains/evaluates across multiple splits to estimate performance variability.
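A sketch of k-fold index generation, assuming examples are indexed 0..n-1 (library implementations add shuffling and stratification on top of this):

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs: each of k folds serves once as the
    validation set while the remaining indices form the training set."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

folds = list(k_fold_indices(10, 3))
# Every index appears in exactly one validation fold across the 3 splits.
```

Averaging a metric over the k validation folds, and reporting its spread, gives the variability estimate the definition refers to.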
A table summarizing classification outcomes, foundational for metrics like precision, recall, specificity.
Fraction of correct predictions; can be misleading on imbalanced datasets.
Harmonic mean of precision and recall; useful when balancing false positives/negatives matters.
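The confusion counts and the metrics built on them can be computed in a few lines; a sketch for binary labels (1 = positive), with zero-division guarded:

```python
def binary_metrics(y_true, y_pred):
    """Confusion counts plus accuracy, precision, recall, and F1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}

m = binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
# tp=2, fp=1, fn=1, tn=2; precision = recall = f1 = 2/3
```

On a dataset that is 99% negatives, a model predicting "negative" everywhere scores 0.99 accuracy but 0.0 recall, which is why the imbalance caveat above matters.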
Plots true positive rate vs false positive rate across thresholds; summarizes separability.
Scalar summary of ROC; measures ranking ability, not calibration.
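The ranking interpretation gives a direct way to compute AUC without tracing the curve: it equals the probability that a randomly chosen positive outscores a randomly chosen negative (ties count half). A sketch:

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison: fraction of (positive, negative) pairs
    where the positive's score is higher, counting ties as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```

Because only the ordering of scores matters, AUC is unchanged by any monotone rescaling of the scores, which is exactly why it measures ranking ability rather than calibration.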
Crafting prompts to elicit desired behavior, often using role, structure, constraints, and examples.
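A hypothetical template assembling those elements (role, task, constraints, few-shot examples) into one prompt string; the field names and layout are illustrative, not tied to any particular model API:

```python
def build_prompt(role, task, constraints, examples):
    """Assemble a structured prompt: role, task, constraints, then
    input/output example pairs for few-shot guidance."""
    lines = [f"You are {role}.", f"Task: {task}", "Constraints:"]
    lines += [f"- {c}" for c in constraints]
    for inp, out in examples:
        lines += [f"Input: {inp}", f"Output: {out}"]
    return "\n".join(lines)

prompt = build_prompt(
    role="a careful technical summarizer",
    task="Summarize the passage in one sentence.",
    constraints=["No more than 25 words.", "Do not add new facts."],
    examples=[("Water boils at 100 C at sea level.",
               "Water's boiling point at sea level is 100 C.")],
)
```

Templating keeps the role, structure, and constraints fixed across evaluations so that only the input varies.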
When some classes are rare, requiring reweighting, resampling, or specialized metrics.
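One common reweighting heuristic is inverse-frequency class weights, so rare classes contribute comparably to the loss; a sketch (the exact normalization varies by library):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class as n / (k * count): classes rarer than average
    get weight > 1, common classes get weight < 1."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

w = inverse_frequency_weights([0] * 90 + [1] * 10)
# The minority class gets weight 5.0, the majority class 5/9.
```

Resampling (oversampling the rare class or undersampling the common one) and metrics like F1 or per-class recall are the complementary remedies the definition mentions.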
Practices for operationalizing ML: versioning, CI/CD, monitoring, retraining, and reliable production management.
Automated testing and deployment processes for models and data workflows, extending DevOps to ML artifacts.
Central system to store model versions, metadata, approvals, and deployment state.
Tendency to trust automated suggestions even when incorrect; mitigated by UI design, training, and checks.
Error due to sensitivity to fluctuations in the training dataset.
Quantifies shared information between random variables.
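For discrete variables this is computable directly from joint and marginal frequencies; a sketch in bits, assuming paired observations:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Discrete MI in bits: sum over (x, y) of
    p(x, y) * log2(p(x, y) / (p(x) * p(y)))."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

mi_identical = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])    # 1.0 bit
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])  # 0.0 bits
```

Identical balanced binary variables share exactly one bit, while variables whose joint distribution factorizes into the marginals share none.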
Tradeoff between network depth (more layers) and width (more neurons per layer).
All possible configurations an agent may encounter.
Strategy mapping states to actions.
Pixel-level separation of individual object instances.
Pixel-wise classification of image regions.
What would have happened under different conditions.
End-to-end process for model training.
A model exploits a poorly specified objective, scoring well on the stated metric without achieving the intended goal.