Difficulty: Intermediate
Learning only from current policy’s data.
Low-latency prediction per request.
Learning where data arrives sequentially and the model updates continuously, often under changing distributions.
Models whose weights are publicly available.
Pixel motion estimation between frames.
Finding control policies minimizing cumulative cost.
Coordinating tools, models, and steps (retrieval, calls, validation) to deliver reliable end-to-end behavior.
Probabilities do not reflect true correctness.
When a model fits noise/idiosyncrasies of training data and performs poorly on unseen data.
Applying learned patterns incorrectly.
A model is PAC-learnable if it can, with high probability, learn an approximately correct hypothesis from finite samples.
Using same parameters across different parts of a model.
Techniques that fine-tune small additional components rather than all weights to reduce compute and storage.
The learned numeric values of a model adjusted during training to minimize a loss function.
Monte Carlo method for state estimation.
Exponential of average negative log-likelihood; lower means better predictive fit, not necessarily better utility.
Classical controller balancing responsiveness and stability.
Information that can identify an individual (directly or indirectly); requires careful handling and compliance.
Separates planning from execution in agent architectures.
Methods for breaking goals into steps; can be classical (A*, STRIPS) or LLM-driven with tool calls.
The physical system being controlled.
Strategy mapping states to actions.
Optimizing policies directly via gradient ascent on expected reward.
Injects sequence order into Transformers, since attention alone is permutation-invariant.
Often more informative than ROC on imbalanced datasets; focuses on positive class performance.
Of predicted positives, the fraction that are truly positive; sensitive to false positives.
Shift in model outputs.
AI predicting crime patterns (highly controversial).
Attacks that infer whether specific records were in training data, or reconstruct sensitive training examples.
Predicting disease progression or survival.