Results for "task-specific"
Benchmark: A dataset plus metric suite for comparing models; results can be gamed or can misalign with real-world goals.
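A minimal sketch of benchmark scoring under an exact-match metric. The toy dataset and the `model` stub are hypothetical stand-ins; a real suite aggregates many tasks and metrics.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the gold labels."""
    assert len(predictions) == len(labels)
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical benchmark: (input, gold label) pairs.
dataset = [("2+2", "4"), ("3+3", "6"), ("5+5", "10")]

def model(prompt):
    # Stand-in for a real model call; always answers "4".
    return "4"

preds = [model(x) for x, _ in dataset]
gold = [y for _, y in dataset]
score = accuracy(preds, gold)  # 1 of 3 correct
```

The "can be gamed" caveat shows up directly here: a model that memorizes the three answers would score 1.0 without learning arithmetic.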
Prompt injection: Attacks that manipulate model instructions (especially via retrieved content) to override system goals or exfiltrate data.
Object detection: Identifying and localizing objects in images, typically with class labels, confidence scores, and bounding boxes.
Speaker identification: Determining which speakers are present in an audio recording.
Image segmentation: Assigning labels per pixel (semantic segmentation) or per object instance (instance segmentation) to delineate object boundaries.
Reward hacking: A model exploits a poorly specified objective, maximizing the stated reward while defeating the designer's intent.
Tool use: Models trained to decide when (and how) to call external tools.
One-shot prompting: A single example is included in the prompt to guide the output.
Graph attention network (GAT): A GNN that uses attention to weight neighbor contributions dynamically.
Few-shot prompting: Multiple examples are included in the prompt to demonstrate the desired behavior.
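A sketch of assembling a few-shot prompt as plain text. The instruction string, the example pairs, and the `Input:`/`Output:` layout are illustrative assumptions, not a fixed format.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a prompt: task instruction, worked examples, then the new query."""
    parts = [instruction]
    for x, y in examples:
        parts.append(f"Input: {x}\nOutput: {y}")
    # End with an open slot for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

examples = [("cat", "animal"), ("rose", "plant")]  # two shots
prompt = build_few_shot_prompt("Classify each input.", examples, "oak")
```

With one example this degenerates to one-shot prompting; with an empty list it is a zero-shot prompt.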
Localization: Estimating a robot's position (and orientation) within a map.
Feature: A measurable property or attribute used as model input (raw or engineered), such as age, pixel intensity, or token ID.
Latent space: The internal space where learned representations live; operations in this space often correlate with semantics or generative factors.
Confusion matrix: A table summarizing classification outcomes (true/false positives and negatives), foundational for metrics like precision, recall, and specificity.
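A small sketch of a binary confusion matrix and the metrics built on it, using made-up label vectors for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary classification run."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)    # of predicted positives, how many were right
recall = tp / (tp + fn)       # of actual positives, how many were found
specificity = tn / (tn + fp)  # of actual negatives, how many were found
```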
Dropout: Randomly zeroing activations during training to reduce co-adaptation and overfitting.
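A sketch of inverted dropout on a list of activations: each unit is zeroed with probability `p` during training, and survivors are scaled by `1/(1-p)` so the expected activation is unchanged; at inference the input passes through untouched.

```python
import random

def dropout(activations, p, training=True, rng=random):
    """Inverted dropout: zero each unit with prob p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(activations)  # identity at inference time
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # seeded for reproducibility
out = dropout([1.0, 1.0, 1.0, 1.0], p=0.5, rng=rng)
```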
Attention: A mechanism that computes context-aware mixtures of representations; it parallelizes well and captures long-range dependencies.
Self-attention: Attention where the queries, keys, and values all come from the same sequence, enabling token-to-token interactions.
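A bare-bones sketch of scaled dot-product self-attention on a short sequence of vectors. To keep it self-contained, the learned query/key/value projections are assumed to be the identity, which a real layer would not do.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(seq):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(seq[0])
    out = []
    for q in seq:  # every token attends to every token in the same sequence
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)  # sums to 1 over the sequence
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out

seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = self_attention(seq)  # each output is a convex mixture of the inputs
```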
Prompt: The text (and possibly other modalities) given to an LLM to condition its output.
Prompt engineering: Crafting prompts to elicit desired behavior, often using roles, structure, constraints, and examples.
Explainability: Techniques for understanding model decisions (globally or locally), important in high-stakes and regulated settings.
A/B testing: A controlled experiment comparing variants via random assignment to estimate the causal effect of a change.
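A simulated sketch of the core mechanic: random assignment to control (A) or treatment (B), then a difference in group means. The population, the additive `treatment_effect`, and the 50/50 split are all illustrative assumptions; real analyses add significance testing.

```python
import random

def ab_test(population, treatment_effect, rng):
    """Randomly assign units to A or B, apply the effect to B, compare means."""
    a, b = [], []
    for baseline in population:
        if rng.random() < 0.5:
            a.append(baseline)                     # control
        else:
            b.append(baseline + treatment_effect)  # treatment
    return sum(b) / len(b) - sum(a) / len(a)       # estimated lift

rng = random.Random(42)
population = [rng.gauss(10.0, 1.0) for _ in range(10_000)]
estimated_lift = ab_test(population, treatment_effect=0.5, rng=rng)
```

Because assignment is random, the estimated lift converges on the true effect (0.5 here) as the sample grows.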
Active learning: Selecting the most informative samples to label (e.g., by uncertainty sampling) to reduce labeling cost.
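A sketch of entropy-based uncertainty sampling: score each unlabeled example by the entropy of the model's predicted class distribution and label the most uncertain one first. The pool and the `fake_probs` lookup standing in for a trained model are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete distribution (nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def pick_most_uncertain(unlabeled, predict_proba):
    """Index of the sample whose predicted distribution has maximum entropy."""
    return max(range(len(unlabeled)),
               key=lambda i: entropy(predict_proba(unlabeled[i])))

# Hypothetical pool and per-sample class probabilities from some model.
pool = ["x1", "x2", "x3"]
fake_probs = {"x1": [0.9, 0.1], "x2": [0.5, 0.5], "x3": [0.8, 0.2]}
chosen = pick_most_uncertain(pool, lambda x: fake_probs[x])  # -> 1 ("x2")
```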
MLOps: Practices for operationalizing ML: versioning, CI/CD, monitoring, retraining, and reliable production management.
Evaluation harness: A system for running consistent evaluations across tasks, model versions, prompts, and settings.
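A toy harness that runs every model on every task with one shared metric, so scores are comparable across the grid. The models, tasks, and exact-match metric are illustrative placeholders.

```python
def run_harness(models, tasks, metric):
    """Run every model on every task; return {(model, task): score}."""
    results = {}
    for model_name, model_fn in models.items():
        for task_name, (inputs, gold) in tasks.items():
            preds = [model_fn(x) for x in inputs]
            results[(model_name, task_name)] = metric(preds, gold)
    return results

def exact_match(preds, gold):
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)

tasks = {"echo": (["a", "b"], ["a", "b"]), "upper": (["a", "b"], ["A", "B"])}
models = {"identity": lambda x: x, "upper": str.upper}
grid = run_harness(models, tasks, exact_match)
```

Swapping in a new model version or prompt template means adding one entry to `models`; the rest of the grid is unchanged, which is what makes comparisons consistent.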
Function calling: Constraining model outputs to a schema used to invoke external APIs or tools safely and deterministically.
Structured output: Forcing predictable output formats for downstream systems; reduces parsing errors and supports validation and guardrails.
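A sketch of the validation side of structured output: parse the model's raw text as JSON and check required keys before any downstream use, rejecting free-form replies. The `name`/`args` schema is a made-up example of a function-call payload.

```python
import json

def parse_structured(raw, required_keys):
    """Parse model output as JSON and verify required keys; None on failure."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None  # model produced free-form text, not the schema
    if not isinstance(obj, dict) or not required_keys.issubset(obj):
        return None  # valid JSON but missing required fields
    return obj

good = parse_structured('{"name": "search", "args": {"q": "weather"}}',
                        {"name", "args"})
bad = parse_structured('Sure! Here is the call: search(weather)',
                       {"name", "args"})
```

Rejecting `bad` instead of regex-scraping it is exactly the parsing-error reduction the definition describes.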
Bottleneck: A narrow hidden layer that forces compact representations, as in an autoencoder.
Policy: A strategy mapping states to actions.
Q-value: The expected return of taking a given action in a given state (and following the policy thereafter).
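The two previous entries can be sketched together: a tabular Q-learning update estimates Q-values, and a greedy policy maps each state to the action with the highest estimate. The grid of integer states, the action names, and the step values are illustrative.

```python
def q_update(q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((q.get((s_next, a2), 0.0) for a2 in actions), default=0.0)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)

def greedy_policy(q, s, actions):
    """A policy maps states to actions; here: pick the highest-Q action."""
    return max(actions, key=lambda a: q.get((s, a), 0.0))

q = {}
actions = ["left", "right"]
q_update(q, s=0, a="right", r=1.0, s_next=1, actions=actions)
# Q(0, "right") = 0 + 0.5 * (1.0 + 0.9 * 0 - 0) = 0.5
```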
Multi-agent system: Multiple agents interacting cooperatively or competitively.