Results for "evaluation protocol"

AdvertisementAd space — search-top

38 results

Eval Harness Intermediate

System for running consistent evaluations across tasks, versions, prompts, and model settings.

Foundations & Theory
Data Leakage Intermediate

When information from evaluation data improperly influences training, inflating reported performance.

Foundations & Theory
Train/Validation/Test Split Intermediate

Separating data into training (fit), validation (tune), and test (final estimate) to avoid leakage and optimism bias.

Evaluation & Benchmarking
Cross-Validation Intermediate

A robust evaluation technique that trains/evaluates across multiple splits to estimate performance variability.

Foundations & Theory
Confusion Matrix Intermediate

A table summarizing classification outcomes, foundational for metrics like precision, recall, specificity.

Foundations & Theory
Accuracy Intermediate

Fraction of correct predictions; can be misleading on imbalanced datasets.

Foundations & Theory
F1 Score Intermediate

Harmonic mean of precision and recall; useful when balancing false positives/negatives matters.

Foundations & Theory
AUC Intermediate

Scalar summary of ROC; measures ranking ability, not calibration.

Foundations & Theory
ROC Curve Intermediate

Plots true positive rate vs false positive rate across thresholds; summarizes separability.

Foundations & Theory
Prompt Engineering Intermediate

Crafting prompts to elicit desired behavior, often using role, structure, constraints, and examples.

Prompting & Instructions
Class Imbalance Intermediate

When some classes are rare, requiring reweighting, resampling, or specialized metrics.

Machine Learning
Model Card Intermediate

Standardized documentation describing intended use, performance, limitations, data, and ethical considerations.

Foundations & Theory
MLOps Intermediate

Practices for operationalizing ML: versioning, CI/CD, monitoring, retraining, and reliable production management.

MLOps & Infrastructure
CI/CD for ML Intermediate

Automated testing and deployment processes for models and data workflows, extending DevOps to ML artifacts.

MLOps & Infrastructure
Model Registry Intermediate

Central system to store model versions, metadata, approvals, and deployment state.

Foundations & Theory
Benchmark Intermediate

A dataset + metric suite for comparing models; can be gamed or misaligned with real-world goals.

Evaluation & Benchmarking
Automation Bias Intermediate

Tendency to trust automated suggestions even when incorrect; mitigated by UI design, training, and checks.

Foundations & Theory
Variance Term Intermediate

Error due to sensitivity to fluctuations in the training dataset.

AI Economics & Strategy
Mutual Information Intermediate

Quantifies shared information between random variables.

AI Economics & Strategy
Depth vs Width Intermediate

Tradeoffs between many layers vs many neurons per layer.

AI Economics & Strategy
State Space Intermediate

All possible configurations an agent may encounter.

AI Economics & Strategy
Policy Intermediate

Strategy mapping states to actions.

AI Economics & Strategy
Instance Segmentation Intermediate

Pixel-level separation of individual object instances.

Computer Vision
Semantic Segmentation Intermediate

Pixel-wise classification of image regions.

Computer Vision
Counterfactual Advanced

What would have happened under different conditions.

Causal AI & Interpretability
Training Pipeline Intermediate

End-to-end process for model training.

MLOps & Infrastructure
Batch Inference Intermediate

Running predictions on large datasets periodically.

MLOps & Infrastructure
Specification Gaming Advanced

Model exploits poorly specified objectives.

AI Safety & Alignment
Value Misalignment Advanced

Model optimizes objectives misaligned with human values.

AI Safety & Alignment
Inner Alignment Advanced

Ensuring learned behavior matches intended objective.

AI Safety & Alignment

Welcome to AI Glossary

The free, self-building AI dictionary. Help us keep it free—click an ad once in a while!

Search

Type any question or keyword into the search bar at the top.

Browse

Tap a letter in the A–Z bar to browse terms alphabetically, or filter by domain, industry, or difficulty level.

3D WordGraph

Fly around the interactive 3D graph to explore how AI concepts connect. Click any word to read its full definition.