Difficulty: Intermediate
Flat, high-dimensional regions of the loss surface where gradients are near zero, slowing training.
A point where the gradient is zero but which is neither a maximum nor a minimum; common in deep nets.
Automated detection/prevention of disallowed outputs (toxicity, self-harm, illegal instruction, etc.).
Software regulated as a medical device.
Stochastic generation strategies that trade determinism for diversity; key knobs include temperature and nucleus sampling.
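A minimal sketch of these two knobs, assuming NumPy and toy logits (the function name `sample` and the single-token setup are illustrative, not from any particular library):

```python
import numpy as np

def sample(logits, temperature=1.0, top_p=1.0, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    # Temperature scaling: <1 sharpens the distribution, >1 flattens it.
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    # Nucleus (top-p): keep the smallest set of tokens whose mass >= top_p.
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]
    mask = np.zeros_like(probs)
    mask[keep] = probs[keep]
    mask /= mask.sum()  # renormalize over the kept tokens
    return rng.choice(len(probs), p=mask)
```

With `temperature` near zero and a small `top_p`, this collapses toward greedy (deterministic) decoding; larger values trade that determinism for diversity.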
Empirical laws linking model size, data, and compute to performance.
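Such laws are typically power laws, which appear as straight lines in log-log space. A small sketch with hypothetical (model size, loss) pairs, assuming the form L = a · N^(-b):

```python
import numpy as np

# Hypothetical data generated from L = 5.0 * N**-0.07 (values are illustrative).
N = np.array([1e6, 1e7, 1e8, 1e9])
L = 5.0 * N ** -0.07

# A linear fit in log-log space recovers the exponent and prefactor:
# log L = log a + b * log N.
b, log_a = np.polyfit(np.log(N), np.log(L), 1)
# b ~= -0.07, exp(log_a) ~= 5.0
```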
Repeating temporal patterns.
Optimization using curvature information; often expensive at scale.
Methods to protect model/data during inference (e.g., trusted execution environments) from operators/attackers.
Assigning labels per pixel (semantic) or per instance (instance segmentation) to map object boundaries.
Attention where queries/keys/values come from the same sequence, enabling token-to-token interactions.
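A minimal single-head sketch in NumPy, assuming toy weight matrices (no masking, batching, or multiple heads):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Queries, keys, and values are all projections of the same sequence X.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # token-to-token similarities
    # Row-wise softmax turns scores into attention weights.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V  # each output token is a weighted mix of all tokens

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))  # 4 tokens, model dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
```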
Models evaluating and improving their own outputs.
Learning from data by constructing “pseudo-labels” (e.g., next-token prediction, masked modeling) without manual annotation.
Retrieval based on embedding similarity rather than keyword overlap, capturing paraphrases and related concepts.
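The core ranking step, sketched with NumPy and toy vectors (in practice the embeddings would come from a trained text encoder; the function name `cosine_search` is illustrative):

```python
import numpy as np

def cosine_search(query_vec, doc_vecs, k=3):
    # Rank documents by cosine similarity of their embeddings to the query.
    q = query_vec / np.linalg.norm(query_vec)
    D = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = D @ q
    top = np.argsort(sims)[::-1][:k]  # indices of the k most similar docs
    return top, sims[top]
```

Because similarity is computed in embedding space, a paraphrase of a document can rank highly even with zero keyword overlap.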
Pixel-wise classification of image regions.
Training with a small labeled dataset plus a larger unlabeled dataset, leveraging assumptions like smoothness/cluster structure.
Of actual positives, the fraction correctly identified; the ability to detect disease when present.
Models estimating recidivism risk.
Fine-tuning on (prompt, response) pairs to align a model with instruction-following behaviors.
AI used without governance approval.
Running a new model alongside production traffic without user impact.
Feature attribution method grounded in cooperative game theory for explaining predictions in tabular settings.
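The game-theoretic quantity being approximated is the Shapley value. A brute-force sketch that computes it exactly for a tiny model, assuming absent features are replaced by a baseline (real SHAP implementations use much faster approximations):

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    # Exact Shapley values: each feature's weighted average marginal
    # contribution over all subsets of the other features.
    n = len(x)
    def f(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for r in range(n):
            for S in combinations(others, r):
                w = factorial(r) * factorial(n - r - 1) / factorial(n)
                total += w * (f(set(S) | {i}) - f(set(S)))
        phi.append(total)
    return phi

# Toy linear model: for linear models, phi_i = w_i * (x_i - baseline_i).
phi = shapley_values(lambda z: 2.0 * z[0] + 3.0 * z[1], [1.0, 1.0], [0.0, 0.0])
print(phi)  # -> [2.0, 3.0]
```

The exact computation is exponential in the number of features, which is why practical tools rely on sampling or model-specific shortcuts.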
A narrow minimum often associated with poorer generalization.
Simultaneous Localization and Mapping for robotics.
Converts logits to probabilities by exponentiation and normalization; common in classification and LMs.
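A minimal NumPy sketch; subtracting the max before exponentiating is the standard numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(logits):
    # Shift by the max so exp() cannot overflow; the ratio is unchanged.
    z = logits - np.max(logits)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()  # normalize to a probability distribution

probs = softmax(np.array([2.0, 1.0, 0.1]))
```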
Attention mechanisms that reduce quadratic complexity.
Determining who spoke when in an audio recording.
Of actual negatives, the fraction correctly identified.
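Both this metric and its positive-class counterpart above fall out of the confusion-matrix counts; a plain-Python sketch with illustrative labels:

```python
def sensitivity_specificity(y_true, y_pred):
    # Sensitivity (recall): fraction of actual positives detected.
    # Specificity: fraction of actual negatives correctly identified.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```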
Converting audio speech into text, often using encoder-decoder or transducer architectures.
Generating human-like speech from text.