Difficulty: Intermediate
The text (and possibly other modalities) given to an LLM to condition its output behavior.
Crafting prompts to elicit desired behavior, often using role, structure, constraints, and examples.
Attacks that inject adversarial instructions into a model's input (especially via retrieved or user-supplied content) to override system goals or exfiltrate data.
Extracting system prompts or hidden instructions.
The tendency for small changes in prompt wording or formatting to cause large changes in model output.
Temporal and pitch characteristics of speech.
Removing weights or neurons to shrink models and improve efficiency; can be structured or unstructured.
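As an illustration of the unstructured case, a minimal NumPy sketch of magnitude pruning; the layer shape and sparsity level are hypothetical:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Unstructured pruning: zero out the smallest-magnitude weights.

    `sparsity` is the fraction of weights to remove (e.g., 0.9 drops 90%).
    """
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # Threshold chosen so that the k smallest |w| fall at or below it.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))          # hypothetical dense layer
w_pruned = magnitude_prune(w, sparsity=0.9)
print("fraction zeroed:", np.mean(w_pruned == 0))  # ~0.9
```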
The expected cumulative (discounted) return of taking a given action in a given state and then following the policy.
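In standard notation (a reference formula, not taken from the source; γ is the discount factor and r the reward):

```latex
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \,\middle|\, s_{0} = s,\ a_{0} = a\right]
```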
Reducing numeric precision of weights/activations to speed inference and reduce memory with acceptable accuracy loss.
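A minimal sketch of symmetric per-tensor int8 post-training quantization; the scale rule and the weight tensor are illustrative assumptions:

```python
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization of float weights to int8."""
    scale = np.max(np.abs(x)) / 127.0            # map the largest |value| to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from the int8 values."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.max(np.abs(w - w_hat)))  # bounded by roughly scale / 2
```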
Measures a hypothesis class's capacity to fit random sign labels (noise); used to bound generalization error.
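The standard empirical form for a function class F on a sample of n points, with independent random signs σ_i ∈ {−1, +1} (a reference definition, not from the source):

```latex
\hat{\mathfrak{R}}_S(\mathcal{F}) = \mathbb{E}_{\sigma}\left[\sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i)\right]
```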
Architecture that retrieves relevant documents (e.g., from a vector DB) and conditions generation on them to reduce hallucinations.
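A toy retrieve-then-generate sketch under stated assumptions: the in-memory corpus, the random-projection `embed` stand-in, and the cosine-similarity `retrieve` helper are all hypothetical; a real pipeline would use an embedding model, a vector database, and an LLM call on the built prompt:

```python
import numpy as np

# Toy in-memory corpus; a real system would embed documents with a model
# and store the vectors in a vector database.
corpus = [
    "Quantization reduces numeric precision to speed up inference.",
    "Pruning removes weights to shrink a model.",
    "RAG retrieves documents and conditions generation on them.",
]

rng = np.random.default_rng(0)
projection = rng.normal(size=(16, 128))   # fixed random projection (embedding stand-in)

def embed(text: str) -> np.ndarray:
    """Stand-in embedding: project character counts through a fixed random matrix."""
    counts = np.zeros(128)
    for byte in text.lower().encode("ascii", "ignore"):
        counts[byte] += 1
    return projection @ counts

doc_vecs = np.stack([embed(d) for d in corpus])

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by cosine similarity to the query and return the top k."""
    q = embed(query)
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    return [corpus[i] for i in np.argsort(-sims)[:k]]

def build_prompt(query: str) -> str:
    """Condition generation on retrieved context; the prompt would be sent to an LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What does quantization do?"))
```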
The fraction of actual positives that are correctly identified; penalized by false negatives.
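As a formula, recall = TP / (TP + FN); a tiny worked example with made-up labels:

```python
def recall(y_true: list[int], y_pred: list[int]) -> float:
    """TP / (TP + FN): fraction of actual positives that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)

# 3 actual positives, 2 found -> recall = 2/3; the miss (a false negative) lowers recall.
print(recall([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))  # 0.666...
```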
Networks with recurrent connections for sequences; largely supplanted by Transformers for many tasks.
Stress-testing models for failures, vulnerabilities, policy violations, and harmful behaviors before release.
Techniques that discourage overly complex solutions to improve generalization (reduce overfitting).
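One common instance is adding an L2 (weight decay) penalty to the training loss; a minimal NumPy sketch with fabricated data and a hypothetical `lam` strength:

```python
import numpy as np

def ridge_loss(w: np.ndarray, X: np.ndarray, y: np.ndarray, lam: float) -> float:
    """Mean squared error plus an L2 penalty that discourages large weights."""
    residual = X @ w - y
    return float(np.mean(residual**2) + lam * np.sum(w**2))

rng = np.random.default_rng(0)
X, y = rng.normal(size=(20, 5)), rng.normal(size=20)
w = rng.normal(size=5)
print(ridge_loss(w, X, y, lam=0.0))   # unregularized loss
print(ridge_loss(w, X, y, lam=0.1))   # same weights, larger penalized loss
```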
A learning paradigm where an agent interacts with an environment and learns to choose actions to maximize cumulative reward.
Activation max(0, x); improves gradient flow and training speed in deep nets.
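The function and its gradient in a minimal NumPy sketch (the input values are arbitrary); the constant unit gradient for positive inputs is what helps gradient flow:

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(0.0, x)        # elementwise max(0, x)

def relu_grad(x: np.ndarray) -> np.ndarray:
    return (x > 0).astype(x.dtype)   # gradient is 1 where x > 0, else 0

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x), relu_grad(x))
```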
Automatically learning useful internal features (latent variables) that capture salient structure for downstream tasks.
The ability to replicate results given the same code and data; harder with distributed training and nondeterministic operations.
An identity shortcut that lets gradients and activations bypass intermediate layers, enabling very deep networks.
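The core idea in code, where `f` stands in for an arbitrary layer (everything here is an illustrative assumption):

```python
import numpy as np

def residual_block(x: np.ndarray, f) -> np.ndarray:
    """Output is the layer's transformation plus an identity shortcut: y = x + f(x).

    If f contributes little (f(x) ~ 0), the block defaults to the identity,
    and gradients flow straight through the shortcut.
    """
    return x + f(x)

x = np.ones(4)
y = residual_block(x, lambda v: 0.1 * v)  # hypothetical "layer"
print(y)  # [1.1 1.1 1.1 1.1]
```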
A discipline ensuring AI systems are fair, safe, transparent, privacy-preserving, and accountable throughout their lifecycle.
A simplified Boltzmann Machine with a bipartite structure: connections run only between the visible and hidden layers, not within a layer.
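Its standard energy function over visible units v and hidden units h, with biases a, b and weights W (a reference formula; the bipartite structure is visible in the absence of v–v and h–h terms):

```latex
E(v, h) = -\sum_i a_i v_i \;-\; \sum_j b_j h_j \;-\; \sum_{i,j} v_i W_{ij} h_j
```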
Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.
Quantifying financial risk.
Central log of AI-related risks.
Grouping patients by predicted outcomes.
Reinforcement learning from human feedback: uses human preference data to train a reward model, then optimizes the policy against it.
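The reward model is typically fit with a pairwise (Bradley–Terry-style) preference loss; a minimal NumPy sketch with hypothetical reward scores:

```python
import numpy as np

def preference_loss(r_chosen: np.ndarray, r_rejected: np.ndarray) -> float:
    """Average of -log sigmoid(r_chosen - r_rejected) over preference pairs.

    Minimizing this pushes the reward model to score human-preferred responses
    above rejected ones; the trained reward then drives policy optimization.
    """
    margin = r_chosen - r_rejected
    return float(np.mean(np.logaddexp(0.0, -margin)))  # -log sigmoid(margin), computed stably

# Hypothetical reward-model scores for 3 preference pairs (chosen vs. rejected).
print(preference_loss(np.array([2.0, 1.5, 0.3]), np.array([0.5, 1.0, 0.4])))
```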
Control methods that maintain stability and performance despite model uncertainty.
Plots the true positive rate against the false positive rate across decision thresholds; summarizes class separability.
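A minimal sketch of sweeping thresholds to trace the curve; the labels and scores are fabricated for illustration:

```python
import numpy as np

def roc_points(y_true: np.ndarray, scores: np.ndarray) -> list[tuple[float, float]]:
    """Return (fpr, tpr) pairs, one per decision threshold."""
    points = []
    for threshold in np.unique(scores)[::-1]:       # sweep from strict to lenient
        y_pred = scores >= threshold
        tp = np.sum(y_pred & (y_true == 1))
        fp = np.sum(y_pred & (y_true == 0))
        tpr = tp / np.sum(y_true == 1)              # true positive rate (recall)
        fpr = fp / np.sum(y_true == 0)              # false positive rate
        points.append((float(fpr), float(tpr)))
    return points

y = np.array([0, 0, 1, 1, 1])
s = np.array([0.1, 0.4, 0.35, 0.8, 0.9])
print(roc_points(y, s))
```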
Encodes token positions by rotating query and key vectors in embedding space, so attention scores depend on relative position.
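A minimal sketch of the rotation applied to one query or key vector at a given position; the interleaved pairing, head dimension, and base frequency follow common convention but are assumptions here:

```python
import numpy as np

def rope(x: np.ndarray, position: int, base: float = 10000.0) -> np.ndarray:
    """Rotate consecutive dimension pairs by position-dependent angles.

    Because rotations compose, the dot product of two rotated vectors depends
    only on their relative positions.
    """
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-np.arange(half) * 2.0 / d)   # one frequency per dimension pair
    angles = position * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]                      # (even, odd) dimension pairs
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

q = np.random.default_rng(0).normal(size=8)        # hypothetical query vector
print(rope(q, position=3))
```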