Results for "metrics"
A table summarizing classification outcomes, foundational for metrics like precision, recall, specificity.
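The metrics named above fall directly out of the four cells of a binary confusion matrix. A minimal sketch, with illustrative function and variable names (`tp`, `fp`, `fn`, `tn`):

```python
# Sketch: deriving precision, recall, and specificity from
# binary confusion-matrix counts (all names here are illustrative).
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

def specificity(tn, fp):
    return tn / (tn + fp) if tn + fp else 0.0
```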
A setting in which some classes are rare, requiring reweighting, resampling, or specialized metrics.
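One common reweighting scheme is inverse-frequency class weights. A sketch under that assumption (the formula is a standard heuristic, not prescribed by this entry):

```python
from collections import Counter

# Sketch: inverse-frequency class weights. Each class c gets
# weight n / (k * count_c), so rare classes are upweighted.
def class_weights(labels):
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * count) for c, count in counts.items()}
```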
Plots true positive rate vs false positive rate across thresholds; summarizes separability.
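Sweeping a decision threshold over predicted scores yields one (FPR, TPR) point per threshold. A minimal sketch, with illustrative names (`y_score` holds predicted probabilities):

```python
# Sketch: ROC points (FPR, TPR) swept over score thresholds.
def roc_points(y_true, y_score, thresholds):
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for thr in thresholds:
        tp = sum(1 for t, s in zip(y_true, y_score) if t == 1 and s >= thr)
        fp = sum(1 for t, s in zip(y_true, y_score) if t == 0 and s >= thr)
        points.append((fp / neg, tp / pos))
    return points
```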
Standardized documentation describing intended use, performance, limitations, data, and ethical considerations.
Systematic review of model/data processes to ensure performance, fairness, security, and policy compliance.
A dataset + metric suite for comparing models; can be gamed or misaligned with real-world goals.
Observing model inputs/outputs, latency, cost, and quality over time to catch regressions and drift.
A model exploiting poorly specified objectives, scoring well on the stated target without achieving the intended goal.
Required descriptions of model behavior and limits.
How well a model performs on new data drawn from the same (or similar) distribution as training.
Separating data into training (fit), validation (tune), and test (final estimate) to avoid leakage and optimism bias.
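A shuffled three-way split can be sketched as follows; the 70/15/15 fractions and function name are illustrative, and appropriate ratios depend on dataset size:

```python
import random

# Sketch: shuffled train/validation/test split with a fixed seed
# so the partition is reproducible across runs.
def three_way_split(items, val_frac=0.15, test_frac=0.15, seed=0):
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test
```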
Fraction of correct predictions; can be misleading on imbalanced datasets.
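The imbalance pitfall is easy to demonstrate: on a 95/5 split, a classifier that always predicts the majority class reaches 95% accuracy while recalling no positives. A toy sketch:

```python
# Sketch: accuracy under class imbalance.
# A majority-class baseline looks strong while missing every positive.
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # always predict the majority class
```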
A failure mode in which information from evaluation data improperly influences training, inflating reported performance.
The degree to which predicted probabilities match true frequencies (e.g., 0.8 means ~80% correct).
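One common way to check this is to bucket predictions by confidence and compare each bucket's mean predicted probability with its empirical positive rate. A sketch with illustrative names:

```python
# Sketch: calibration check by binning predicted probabilities.
# Each reported tuple is (mean confidence, fraction positive, count);
# for a calibrated model the first two should be close.
def calibration_bins(y_true, y_prob, n_bins=10):
    bins = [[] for _ in range(n_bins)]
    for t, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((t, p))
    report = []
    for bucket in bins:
        if bucket:
            mean_conf = sum(p for _, p in bucket) / len(bucket)
            frac_pos = sum(t for t, _ in bucket) / len(bucket)
            report.append((mean_conf, frac_pos, len(bucket)))
    return report
```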
Crafting prompts to elicit desired behavior, often using role, structure, constraints, and examples.
Stepwise reasoning patterns that can improve multi-step tasks; often handled implicitly or summarized for safety/privacy.
Retrieval based on embedding similarity rather than keyword overlap, capturing paraphrases and related concepts.
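At its core this is nearest-neighbor search under an embedding similarity such as cosine. A toy sketch (the 3-d vectors and corpus layout are illustrative; real systems use learned embeddings and approximate indexes):

```python
import math

# Sketch: cosine similarity over embedding vectors, plus a tiny
# top-k retrieval over a {doc_id: vector} corpus.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query, corpus, k=1):
    scored = sorted(corpus.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]
```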
Automated detection/prevention of disallowed outputs (toxicity, self-harm, illegal instructions, etc.).
Systematic differences in model outcomes across groups; arises from data, labels, and deployment context.
Controlled experiment comparing variants by random assignment to estimate causal effects of changes.
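For a conversion-rate experiment, the comparison is often a two-proportion z-test. A sketch under that assumption (`conv_a`/`conv_b` are conversion counts; names are illustrative):

```python
import math

# Sketch: two-proportion z-test for an A/B experiment.
# Returns the z statistic for H0: p_a == p_b, using a pooled
# proportion for the standard error.
def two_proportion_z(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se
```

A |z| above roughly 1.96 corresponds to significance at the 5% level for a two-sided test.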
Processes and controls for data quality, access, lineage, retention, and compliance across the AI lifecycle.
Practices for operationalizing ML: versioning, CI/CD, monitoring, retraining, and reliable production management.
Automated testing and deployment processes for models and data workflows, extending DevOps to ML artifacts.
Central system to store model versions, metadata, approvals, and deployment state.
Logging hyperparameters, code versions, data snapshots, and results to reproduce and compare experiments.
A broader capability to infer internal system state from telemetry, crucial for AI services and agents.
Hardware resources used for training/inference; constrained by memory bandwidth, FLOPs, and parallelism.
System for running consistent evaluations across tasks, versions, prompts, and model settings.
A discipline ensuring AI systems are fair, safe, transparent, privacy-preserving, and accountable throughout the lifecycle.
Capabilities that appear only beyond certain model sizes.