Reward model: Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.
A/B testing: Controlled experiment comparing variants by random assignment to estimate causal effects of changes.
Curriculum learning: Ordering training samples from easier to harder to improve convergence or generalization.
Model registry: Central system to store model versions, metadata, approvals, and deployment state.
Synthetic data: Artificially created data used to train/test models; helpful for privacy and coverage, risky if unrealistic.
Beam search: Decoding algorithm for generation that keeps the top-k partial sequences at each step; can improve likelihood but reduce diversity.
Logits: Raw model outputs before conversion to probabilities; manipulated during decoding and calibration.
Model extraction: Reconstructing a model or its capabilities via API queries or leaked artifacts.
Planning: Methods for breaking goals into steps; can be classical (A*, STRIPS) or LLM-driven with tool calls.
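A minimal sketch of the idea (function names, the seed scheme, and the data are illustrative): users are assigned to variants at random, and the effect is estimated as the difference in mean outcomes.

```python
import random

def assign(user_id, seed="exp-1"):
    """Randomly but reproducibly assign a user to variant A or B."""
    return random.Random(f"{seed}:{user_id}").choice(["A", "B"])

def estimated_effect(outcomes):
    """Difference in mean outcome, variant B minus variant A."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(outcomes["B"]) - mean(outcomes["A"])
```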
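A toy sketch of the ordering step (the difficulty proxy here, sentence length, is an assumption for illustration):

```python
def curriculum_order(samples, difficulty):
    """Sort training samples from easiest to hardest by a difficulty score."""
    return sorted(samples, key=difficulty)

# sentence length as a crude proxy for difficulty
ordered = curriculum_order(["a bb ccc", "a", "a bb"], difficulty=len)
```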
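A minimal sketch of the top-k expansion (the toy scorer and token set are illustrative, not a real model):

```python
import math

def beam_search(step_scores, beam_width, length):
    """step_scores(prefix) -> {token: log_prob} for the next position.
    Keeps the beam_width highest-scoring partial sequences at each step."""
    beams = [((), 0.0)]  # (token sequence, cumulative log-prob)
    for _ in range(length):
        candidates = []
        for seq, score in beams:
            for tok, logp in step_scores(seq).items():
                candidates.append((seq + (tok,), score + logp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

# toy model: prefers "a" over "b" at every step
toy = lambda prefix: {"a": math.log(0.6), "b": math.log(0.4)}
best_seq, best_logp = beam_search(toy, beam_width=2, length=2)[0]
```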
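The standard logits-to-probabilities conversion, with temperature as one example of decoding-time manipulation (a plain-Python sketch):

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities; temperature rescales before softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.0])
```

Higher temperature flattens the distribution; lower temperature sharpens it.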
Function calling: Constraining model outputs to a schema used to call external APIs/tools safely and deterministically.
Multimodal models: Models that process or generate multiple modalities, enabling vision-language tasks, speech, video understanding, etc.
Gradient clipping: Limiting gradient magnitude to prevent exploding gradients.
Natural language processing: AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.
Inductive bias: Built-in assumptions that guide learning efficiency and generalization.
Causal masking: Prevents attention to future tokens during training/inference.
Positional encoding: Encodes token position explicitly, often via sinusoids.
Attention head: A single attention mechanism within multi-head attention.
Mixture of experts: Routes inputs to subsets of parameters for scalable capacity.
Policy: Strategy mapping states to actions.
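A sketch of clipping by global L2 norm, one common variant (plain Python; real training code would use the framework's built-in, e.g. PyTorch's `clip_grad_norm_`):

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale gradients down so their global L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return grads
    scale = max_norm / norm
    return [g * scale for g in grads]
```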
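A minimal sketch of the mechanism: a lower-triangular mask sets future positions' attention scores to -inf before the softmax (list-based for illustration; real code uses tensors):

```python
def causal_mask(n):
    """n x n mask: True where a query position may attend (keys at or before it)."""
    return [[k <= q for k in range(n)] for q in range(n)]

def apply_mask(scores, mask):
    """Set masked-out (future) positions to -inf so softmax gives them zero weight."""
    return [[s if ok else float("-inf") for s, ok in zip(row, mrow)]
            for row, mrow in zip(scores, mask)]

masked = apply_mask([[1.0, 2.0], [3.0, 4.0]], causal_mask(2))
```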
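The sinusoidal variant from "Attention Is All You Need", sketched per position: even dimensions use sin, odd dimensions cos, with geometrically spaced wavelengths.

```python
import math

def sinusoidal_encoding(position, d_model):
    """Sinusoidal positional encoding: pairs of (sin, cos) at wavelengths
    ranging geometrically from 2*pi up to 10000*2*pi."""
    pe = []
    for i in range(d_model):
        angle = position / (10000 ** ((i // 2 * 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

vec = sinusoidal_encoding(position=3, d_model=8)
```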
Memory augmentation: Extending agents with long-term memory stores.
Q-value: Expected return of taking an action in a state (and following the policy thereafter).
Emergent coordination: Coordination among agents arising without explicit programming.
Policy gradient: Optimizing policies directly via gradient ascent on expected reward.
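A tabular Q-learning update, one standard way such values are estimated (states, actions, and hyperparameters here are illustrative):

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[next_state].values()) if q.get(next_state) else 0.0
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q

q = {"s0": {"left": 0.0, "right": 0.0}, "s1": {"left": 1.0, "right": 0.0}}
q = q_update(q, "s0", "right", reward=0.0, next_state="s1")
```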
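A REINFORCE sketch on a two-armed bandit with a softmax policy (the reward values and learning rate are illustrative): sampled actions are reinforced in proportion to their reward via the score-function gradient.

```python
import math, random

def softmax_probs(theta):
    exps = [math.exp(t) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_step(theta, rewards, rng, lr=0.1):
    """One REINFORCE update on a softmax bandit policy:
    theta_i += lr * R * (1[i == a] - pi(i)), the score-function gradient."""
    probs = softmax_probs(theta)
    a = rng.choices(range(len(theta)), weights=probs)[0]
    r = rewards[a]
    return [t + lr * r * ((1.0 if i == a else 0.0) - probs[i])
            for i, t in enumerate(theta)]

rng = random.Random(0)
theta = [0.0, 0.0]
for _ in range(200):  # arm 0 pays 1.0, arm 1 pays 0.0
    theta = reinforce_step(theta, rewards=[1.0, 0.0], rng=rng)
```

After training, the policy should favor the rewarding arm.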
Risk tiering: Categorizing AI applications by impact and regulatory risk.
Off-policy learning: Learning from data generated by a policy other than the one being optimized.
Prompt extraction: Extracting system prompts or hidden instructions from a deployed model.
Tool use: Models trained to decide when and how to call external tools.
Graph neural networks: Neural networks that operate on graph-structured data by propagating information along edges.
Message passing: GNN framework in which nodes iteratively exchange and aggregate messages from their neighbors.
Denoising model: Diffusion model trained to remove noise step by step.
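One round of the exchange-and-aggregate step, sketched with mean aggregation over an undirected toy graph (the graph, features, and aggregator choice are illustrative):

```python
def message_passing_step(features, edges):
    """One round: each node averages its neighbors' feature vectors with its own."""
    neighbors = {n: [] for n in features}
    for u, v in edges:  # undirected edges contribute messages both ways
        neighbors[u].append(v)
        neighbors[v].append(u)
    updated = {}
    for n, feat in features.items():
        msgs = [features[m] for m in neighbors[n]] + [feat]  # include self
        updated[n] = [sum(vals) / len(msgs) for vals in zip(*msgs)]
    return updated

feats = {"a": [1.0], "b": [3.0], "c": [5.0]}
out = message_passing_step(feats, edges=[("a", "b"), ("b", "c")])
```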