Domain: Foundations & Theory

127 terms

Data Governance Intermediate

Processes and controls for data quality, access, lineage, retention, and compliance across the AI lifecycle.

Data Labeling Intermediate

Human or automated process of assigning target labels to examples; label quality, consistency, and clear guidelines matter heavily.

Data Leakage Intermediate

When information from evaluation data improperly influences training, inflating reported performance.
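
A minimal sketch of how leakage commonly arises in preprocessing, assuming scikit-learn: fitting a scaler on all rows before splitting lets test-set statistics influence training, while a pipeline fit only on the training fold avoids it.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X, y = np.random.randn(200, 5), np.random.randint(0, 2, 200)

# Leaky: the scaler sees test rows before the split, so test-set
# statistics quietly influence the training features.
X_scaled = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X_scaled, y, random_state=0)

# Safer: fit all preprocessing on the training fold only, via a pipeline.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))
```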

Data Lineage Intermediate

Tracking where data came from and how it was transformed; key for debugging and compliance.

Data Poisoning Intermediate

Maliciously inserting or altering training data to implant backdoors or degrade performance.

Datasheet for Datasets Intermediate

Structured dataset documentation covering collection, composition, recommended uses, biases, and maintenance.

Distillation Intermediate

Training a smaller “student” model to mimic a larger “teacher,” often improving efficiency while retaining performance.
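
A minimal sketch of a Hinton-style distillation loss, assuming PyTorch; the temperature T and mixing weight alpha are illustrative defaults.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # KL divergence between temperature-softened distributions,
    # scaled by T*T to keep gradient magnitudes comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```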

Dropout Intermediate

Randomly zeroing activations during training to reduce co-adaptation and overfitting.
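
A minimal sketch of inverted dropout, assuming NumPy; illustrative rather than any framework's exact implementation.

```python
import numpy as np

def dropout(x, p=0.5, training=True):
    if not training or p == 0.0:
        return x  # identity at inference time
    mask = np.random.rand(*x.shape) >= p
    # Rescale by 1/(1-p) so the expected activation is unchanged.
    return x * mask / (1.0 - p)
```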

Dual Problem Intermediate

Alternative formulation of an optimization problem, derived via the Lagrangian; its optimum bounds the primal optimum (weak duality) and matches it under strong duality.

Early Stopping Intermediate

Halting training when validation performance stops improving to reduce overfitting.
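
A minimal patience-based sketch; train_one_epoch, validate, and save_checkpoint are hypothetical placeholders for a real training loop.

```python
max_epochs = 100
best_val, patience, bad_epochs = float("inf"), 5, 0

for epoch in range(max_epochs):
    train_one_epoch(model)          # placeholder: one pass over training data
    val_loss = validate(model)      # placeholder: loss on held-out data
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        save_checkpoint(model)      # keep the best weights seen so far
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break                   # validation stopped improving
```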

Epoch Intermediate

One complete traversal of the training dataset during training.

Eval Harness Intermediate

System for running consistent evaluations across tasks, versions, prompts, and model settings.
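
A minimal sketch of the core loop, with models, tasks, and score() as hypothetical stand-ins for a real harness's registries and metrics; the point is that every model sees the same prompts and settings.

```python
results = {}
for model_name, model in models.items():
    for task_name, examples in tasks.items():
        # Identical prompts and scoring for every (model, task) pair.
        scores = [score(model(ex["prompt"]), ex["target"]) for ex in examples]
        results[(model_name, task_name)] = sum(scores) / len(scores)
```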

Explainability Intermediate

Techniques to understand model decisions (global or local), important in high-stakes and regulated settings.

Exploding Gradient Intermediate

Gradients grow too large during backpropagation, causing unstable updates or divergence; mitigated by gradient clipping, normalization, and careful initialization.
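
A minimal sketch of norm-based gradient clipping in a single training step, assuming PyTorch; the model and data are toy stand-ins.

```python
import torch

model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randn(32, 1)

loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
# Rescale gradients whose global norm exceeds max_norm to prevent blow-ups.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
opt.zero_grad()
```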

F1 Score Intermediate

Harmonic mean of precision and recall; useful when balancing false positives/negatives matters.
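
A worked sketch from confusion counts; the counts are illustrative.

```python
def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# precision = 0.8, recall ~ 0.667 -> F1 ~ 0.727
print(f1_score(tp=80, fp=20, fn=40))
```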

Feature Intermediate

A measurable property or attribute used as model input (raw or engineered), such as age, pixel intensity, or token ID.

Feature Engineering Intermediate

Designing input features to expose useful structure (e.g., ratios, lags, aggregations), often crucial outside deep learning.
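
A minimal sketch of ratio, lag, and aggregation features, assuming pandas; the toy frame and column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "date": pd.to_datetime(["2024-01-01", "2024-01-02",
                            "2024-01-01", "2024-01-02"]),
    "amount": [10.0, 30.0, 5.0, 7.0],
    "income": [100.0, 100.0, 50.0, 50.0],
})
df = df.sort_values(["user_id", "date"])
df["spend_ratio"] = df["amount"] / df["income"]                      # ratio
df["prev_amount"] = df.groupby("user_id")["amount"].shift(1)         # lag
df["user_mean"] = df.groupby("user_id")["amount"].transform("mean")  # aggregation
```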

Federated Learning Intermediate

Training across many devices/silos without centralizing raw data; aggregates updates, not data.
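
A minimal sketch of the FedAvg aggregation step, assuming NumPy; the client parameter vectors and sample counts are toy values.

```python
import numpy as np

def fedavg(client_weights, n_k):
    # Average locally trained parameters, weighted by client data size.
    total = sum(n_k)
    return sum(w * (n / total) for w, n in zip(client_weights, n_k))

clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
print(fedavg(clients, n_k=[100, 300]))  # weighted toward the larger client -> [2.5 3.5]
```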

Feedback Intermediate

Using a system’s outputs to adjust its future inputs or behavior, as in control loops, online learning, and human feedback signals used to refine models.

Few-Shot Learning Intermediate

Achieving task performance by providing a small number of examples inside the prompt without weight updates.
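
A minimal sketch of few-shot prompting; the sentiment task and examples are hypothetical.

```python
examples = [
    ("The movie was fantastic.", "positive"),
    ("I want my money back.", "negative"),
]
query = "The plot dragged but the acting was great."

prompt = "Classify the sentiment of each review.\n\n"
prompt += "".join(f"Review: {t}\nSentiment: {s}\n\n" for t, s in examples)
prompt += f"Review: {query}\nSentiment:"
# The model completes the pattern from in-context examples; no weights change.
```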

Function Calling Intermediate

Constraining model outputs to a structured schema used to call external APIs/tools safely and predictably.
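
A minimal sketch of the pattern: a JSON-Schema-style tool description plus a dispatcher for the structured call the model returns. The schema shape follows common APIs but is illustrative, not any vendor’s exact format.

```python
import json

tool_schema = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real API call

# Suppose the model returned this structured call instead of free text:
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
call = json.loads(model_output)
result = {"get_weather": get_weather}[call["name"]](**call["arguments"])
```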

Generalization Intermediate

How well a model performs on new data drawn from the same (or similar) distribution as training.

Global Minimum Intermediate

The point in parameter space with the lowest possible loss over the entire domain, in contrast to a local minimum, which is lowest only within a neighborhood.
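
In symbols, for a loss L over parameters θ:

```latex
% Global minimum: lowest everywhere.   Local minimum: lowest in a neighborhood N.
L(\theta^{*}) \le L(\theta) \quad \forall\, \theta
\qquad\text{vs.}\qquad
L(\hat{\theta}) \le L(\theta) \quad \forall\, \theta \in \mathcal{N}(\hat{\theta})
```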

Grounding Intermediate

Constraining outputs to retrieved or provided sources, often with citation, to improve factual reliability.

Human-in-the-Loop Intermediate

System design where humans validate or guide model outputs, especially for high-stakes decisions.

Inter-Annotator Agreement Intermediate

Measure of consistency across labelers; low agreement indicates ambiguous tasks or poor guidelines.
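
A minimal sketch of Cohen’s kappa for two annotators over the same items, in plain Python; the labels are toy values.

```python
from collections import Counter

def cohens_kappa(a, b):
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n                   # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)   # chance agreement
    return (p_o - p_e) / (1 - p_e)

print(cohens_kappa(["yes", "yes", "no", "no"],
                   ["yes", "no", "no", "no"]))  # -> 0.5
```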

Interpretability Intermediate

Studying internal mechanisms or input influence on outputs (e.g., saliency maps, SHAP, attention analysis).
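
A minimal sketch of an input-gradient saliency map, assuming PyTorch; the model and input are toy stand-ins.

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(8, 3))
x = torch.randn(1, 8, requires_grad=True)

score = model(x)[0].max()   # score of the top class
score.backward()
saliency = x.grad.abs()     # larger gradient -> more influence on the score
print(saliency)
```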

Lagrangian Intermediate

Function combining an objective with multiplier-weighted constraint terms, converting a constrained optimization problem into an unconstrained one whose saddle points characterize solutions.
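
In the standard convention, for min f(x) subject to g_i(x) ≤ 0 and h_j(x) = 0, with multipliers λ_i ≥ 0 and ν_j free:

```latex
\mathcal{L}(x, \lambda, \nu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x)
% The dual function lower-bounds the primal optimum (weak duality):
\qquad
d(\lambda, \nu) = \inf_x \mathcal{L}(x, \lambda, \nu) \le f(x^{*})
```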

Latency Intermediate

Time from request to response; critical for real-time inference and UX.

Latent Space Intermediate

The internal space where learned representations live; operations here often correlate with semantics or generative factors.