Results for "dataset documentation"
Inter-annotator agreement: Measure of consistency across labelers; low agreement indicates ambiguous tasks or poor guidelines.
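One common agreement statistic is Cohen's kappa, which corrects raw agreement for chance. A minimal sketch for two labelers (the function name is illustrative):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two labelers' annotations of the same items."""
    assert len(a) == len(b)
    n = len(a)
    # Observed fraction of items both labelers agree on.
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement if each labeler drew labels independently
    # from their own marginal distribution.
    ca, cb = Counter(a), Counter(b)
    p_exp = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)
```

Kappa of 1.0 means perfect agreement; values near 0 mean agreement no better than chance, a sign the task or guidelines need work.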
Data augmentation: Expanding training data via transformations (flips, noise, paraphrases) to improve robustness.
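For image data, the listed transformations can be sketched in a few lines of NumPy (the helper name and noise scale are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Yield simple augmented variants of a 2-D image array."""
    yield np.fliplr(image)                           # horizontal flip
    yield np.flipud(image)                           # vertical flip
    yield image + rng.normal(0, 0.05, image.shape)   # additive Gaussian noise
```

Each variant keeps the original label, effectively multiplying the dataset while exposing the model to nuisance variation.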
Synthetic data: Artificially created data used to train or test models; helpful for privacy and coverage, risky if unrealistic.
Data poisoning: Maliciously inserting or altering training data to implant backdoors or degrade performance.
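A toy sketch of the simplest variant, label flipping, where an attacker corrupts a fraction of one class's labels (the function name and parameters are illustrative):

```python
import random

def poison_labels(labels, target, flip_to, fraction=0.1, seed=0):
    """Flip a fraction of `target` labels to `flip_to` (label-flipping attack sketch)."""
    rng = random.Random(seed)
    poisoned = list(labels)
    idx = [i for i, y in enumerate(poisoned) if y == target]
    # Corrupt a deterministic random subset of the targeted class.
    for i in rng.sample(idx, max(1, int(fraction * len(idx)))):
        poisoned[i] = flip_to
    return poisoned
```

Even small poisoned fractions can measurably degrade a classifier's accuracy on the targeted class, which is why dataset provenance checks matter.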
Depth vs. width: Tradeoffs between many layers and many neurons per layer.
Model watermarking: Detecting unauthorized model outputs or data leaks.
Generative models: Models that learn to generate samples resembling the training data.
Image classification: Assigning category labels to images.
CLIP: Joint vision-language model aligning images and text.
Trend: Persistent directional movement in a time series.
Keyword spotting: Detects trigger phrases in audio streams.
Training cost: The compute, time, and money required to train a model.
Batch inference: Running predictions on large datasets periodically rather than on demand.
Open-weight models: Models whose weights are publicly available.
Overgeneralization: Applying learned patterns incorrectly in new contexts.
AlphaFold: Deep learning system for protein structure prediction.
Symbolic regression: Finding mathematical equations from data.
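At its core this is a search over candidate expressions scored by fit. A brute-force sketch over a tiny hand-picked basis (real systems search vastly larger expression spaces, often with genetic programming):

```python
import math

# Candidate expressions to try; names and choices here are illustrative.
CANDIDATES = {
    "x": lambda x: x,
    "x**2": lambda x: x * x,
    "x**3": lambda x: x ** 3,
    "sin(x)": math.sin,
    "exp(x)": math.exp,
}

def best_formula(xs, ys):
    """Return the candidate expression with the lowest squared error on (xs, ys)."""
    def sse(f):
        return sum((f(x) - y) ** 2 for x, y in zip(xs, ys))
    return min(CANDIDATES, key=lambda name: sse(CANDIDATES[name]))
```

Given noiseless samples of y = x², the search recovers the generating expression exactly.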
Fine-tuning: Updating a pretrained model’s weights on task-specific data to improve performance or adapt style/behavior.
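A toy NumPy sketch of the linear-probe variant: the pretrained feature extractor stays frozen, and only a logistic head's weights are updated on the new task's data (all names and hyperparameters are illustrative):

```python
import numpy as np

def finetune_head(features, labels, w, lr=0.1, epochs=200):
    """Gradient-descend only a linear head's weights on task-specific data,
    leaving the (frozen) feature extractor that produced `features` untouched."""
    w = np.array(w, dtype=float)
    for _ in range(epochs):
        probs = 1 / (1 + np.exp(-(features @ w)))           # sigmoid head
        grad = features.T @ (probs - labels) / len(labels)  # logistic-loss gradient
        w -= lr * grad
    return w
```

Full fine-tuning instead updates every layer's weights, which adapts the model more strongly but risks overfitting and forgetting the pretrained behavior.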
Differential privacy: A formal privacy framework ensuring outputs do not reveal much about any single individual’s data contribution.
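The classic building block is the Laplace mechanism: add noise scaled to the query's sensitivity divided by the privacy budget ε. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, seed=None):
    """Release true_value plus Laplace(sensitivity / epsilon) noise.

    Satisfies epsilon-differential privacy for a query whose output
    changes by at most `sensitivity` when one individual's record changes.
    """
    rng = np.random.default_rng(seed)
    return true_value + rng.laplace(0.0, sensitivity / epsilon)
```

For a counting query (sensitivity 1), smaller ε means more noise and stronger privacy; averaged over many releases the noise is centered on the true value.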
Knowledge distillation: Training a smaller “student” model to mimic a larger “teacher,” often improving efficiency while retaining performance.
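The student is typically trained to match the teacher's temperature-softened output distribution. A minimal NumPy sketch of that loss (function names and the temperature value are illustrative):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return -float(np.sum(p_t * np.log(p_s + 1e-12)))
```

The loss is minimized when the student's softened distribution matches the teacher's, so gradients push the student toward the teacher's full output behavior, not just its top-1 labels.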