Results for "trigger-based behavior"
Parameters: The learned numeric values of a model, adjusted during training to minimize a loss function.
Hyperparameters: Configuration choices that are not learned directly (or not typically learned) and that govern training or architecture.
Confusion matrix: A table summarizing classification outcomes; foundational for metrics such as precision, recall, and specificity.
Batch size: The number of samples per gradient update; affects compute efficiency, generalization, and stability.
Prompt engineering: Crafting prompts to elicit desired behavior, often using role, structure, constraints, and examples.
Instruction tuning: Fine-tuning on (prompt, response) pairs to align a model with instruction-following behavior.
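As a sketch, the metrics named above fall directly out of the four cells of a binary confusion matrix (the counts below are illustrative, not from any real model):

```python
# Illustrative counts for a binary classifier's confusion matrix.
tp, fp, fn, tn = 80, 10, 20, 90

precision = tp / (tp + fp)    # of predicted positives, fraction correct
recall = tp / (tp + fn)       # of actual positives, fraction found
specificity = tn / (tn + fp)  # of actual negatives, fraction found
```

Here precision is 80/90 ≈ 0.889, recall 0.80, and specificity 0.90.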
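A minimal sketch of where batch size enters training: each update averages the loss gradient over one minibatch (a toy 1-D least-squares fit with made-up data; the learning rate and batch size are illustrative):

```python
import random

def sgd_epoch(w, data, batch_size=4, lr=0.1):
    """One epoch of minibatch SGD on loss = mean((w*x - y)^2)."""
    random.shuffle(data)
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        # Gradient of the squared error, averaged over the minibatch.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad
    return w

# Synthetic data drawn from y = 3x; SGD should recover w ~= 3.
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0, -1.0, -0.5, 2.5, 0.25]]
w = 0.0
for _ in range(50):
    w = sgd_epoch(w, data)
```

Smaller batches mean noisier but more frequent updates; larger batches give smoother gradients at higher cost per step.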
Explainability: Techniques for understanding model decisions (global or local); important in high-stakes and regulated settings.
Interpretability: Studying internal mechanisms or input influence on outputs (e.g., saliency maps, SHAP, attention analysis).
Observability: The broader capability to infer internal system state from telemetry; crucial for AI services and agents.
Parameter-efficient fine-tuning (PEFT): Techniques that fine-tune small additional components rather than all weights, reducing compute and storage.
Prompt injection: Attacks that manipulate model instructions (especially via retrieved content) to override system goals or exfiltrate data.
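Back-of-envelope arithmetic behind adapter-style methods such as LoRA (the dimensions are illustrative): rather than updating a full d x d weight matrix, only two rank-r factors are trained.

```python
d, r = 512, 4                # illustrative layer width and adapter rank
full_params = d * d          # trainable values if fine-tuning W directly
adapter_params = 2 * d * r   # two low-rank factors: B (d x r) and A (r x d)
savings = full_params / adapter_params
print(full_params, adapter_params, savings)  # 262144 4096 64.0
```

The effective weight at inference is W + B @ A, so the frozen base model is shared while each task stores only the small factors.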
Orchestration: Coordinating tools, models, and steps (retrieval, calls, validation) to deliver reliable end-to-end behavior.
Loss landscape: The shape of the loss function over parameter space.
Learning rate schedule: Adjusting the learning rate over the course of training to improve convergence.
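One common shape, sketched here as cosine decay (the rate bounds and step counts are illustrative hyperparameters):

```python
import math

def cosine_schedule(step, total_steps, lr_max=0.1, lr_min=0.001):
    """Cosine decay from lr_max at step 0 down to lr_min at total_steps."""
    t = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))
```

The rate starts at `lr_max`, passes through the midpoint of the range halfway through training, and ends at `lr_min`.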
Audit log: A logged record of model inputs, outputs, and decisions.
Recurrent neural network (RNN): Models temporal evolution via hidden states.
Trend: Persistent directional movement over time.
Feature drift: A shift in the feature distribution over time.
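The hidden-state idea can be sketched as a single scalar recurrence: each step folds the new input into a running state (the weights below are illustrative constants, not trained values):

```python
import math

def rnn_step(h, x, w_h=0.5, w_x=1.0, b=0.0):
    """One recurrent update: the new hidden state mixes the previous state and the input."""
    return math.tanh(w_h * h + w_x * x + b)

# Run the recurrence over a short input sequence; h carries history forward.
h = 0.0
for x in [1.0, 0.5, -0.2]:
    h = rnn_step(h, x)
```

Because `h` at each step depends on all earlier inputs, the final state summarizes the whole sequence.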
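A crude drift check compares a current window of a feature against a reference window. The sketch below uses a standardized mean shift, a simple heuristic; production monitors typically use tests such as Kolmogorov-Smirnov or PSI instead:

```python
from statistics import mean, stdev

def drift_score(ref, cur):
    # Mean shift measured in units of the reference standard deviation.
    return abs(mean(cur) - mean(ref)) / (stdev(ref) + 1e-12)

ref = [1.0, 1.1, 0.9, 1.05, 0.95]  # illustrative reference window
cur = [2.0, 2.1, 1.9, 2.05, 1.95]  # same spread, shifted mean
score = drift_score(ref, cur)      # large score flags a distribution shift
```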
Jacobian: The matrix of first-order derivatives of a vector-valued function.
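A forward finite-difference approximation makes the definition concrete: entry J[i][j] is the sensitivity of output i to input j (the step size `eps` is an illustrative choice):

```python
def jacobian(f, x, eps=1e-6):
    """Finite-difference Jacobian J[i][j] = d f_i / d x_j for vector-valued f."""
    fx = f(x)
    J = [[0.0] * len(x) for _ in fx]
    for j in range(len(x)):
        xp = list(x)
        xp[j] += eps          # perturb one input coordinate at a time
        fxp = f(xp)
        for i in range(len(fx)):
            J[i][j] = (fxp[i] - fx[i]) / eps
    return J

# f(x, y) = (x*y, x + y) has analytic Jacobian [[y, x], [1, 1]].
J = jacobian(lambda v: [v[0] * v[1], v[0] + v[1]], [2.0, 3.0])
```

At (2, 3) the result approximates [[3, 2], [1, 1]].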
Loss landscape visualization: Visualizing the optimization landscape.
AI alignment: Ensuring AI systems pursue intended human goals.
Specification gaming: A model exploits poorly specified objectives.
Reward hacking: Maximizing reward without fulfilling the real goal.
Misalignment: A model optimizes objectives misaligned with human values.
Prompt brittleness: Small prompt changes cause large output changes.
Model card: Required descriptions of model behavior and limits.
Kill switch: A mechanism to disable an AI system.
Shadow AI: AI used without governance approval.
Model predictive control (MPC): Optimizes future actions using a model of system dynamics.
Robust control: Control that remains stable under model uncertainty.