Results for "collective behavior"
Research focused on ensuring AI systems remain safe.
Decisions whose outcomes depend on the actions of others.
A mismatch between training and deployment data distributions that can degrade model performance.
The learned numeric values of a model adjusted during training to minimize a loss function.
Configuration choices not learned directly (or not typically learned) that govern training or architecture.
A table summarizing classification outcomes, foundational for metrics like precision, recall, specificity.
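The confusion-matrix entry above can be made concrete with a small sketch. The four counts below are assumed values for illustration, not from any real classifier:

```python
# Hypothetical binary-classifier counts (assumed values, for illustration).
tp, fp, fn, tn = 40, 10, 5, 45

precision = tp / (tp + fp)    # of predicted positives, how many were correct
recall = tp / (tp + fn)       # of actual positives, how many were found
specificity = tn / (tn + fp)  # of actual negatives, how many were identified

print(f"precision={precision:.2f} recall={recall:.2f} specificity={specificity:.2f}")
```

Each metric reads off a different pair of cells, which is why the table is the common starting point for all three.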
Number of samples per gradient update; impacts compute efficiency, generalization, and stability.
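A minimal sketch of "samples per gradient update": a toy linear fit where each batch of `batch_size` samples produces exactly one weight update. The dataset, learning rate, and epoch count are all assumptions chosen so the loop converges:

```python
import random

random.seed(0)
# Toy dataset: y = 2x (assumed for illustration).
data = [(x, 2.0 * x) for x in range(1, 11)]

def minibatches(samples, batch_size):
    """Yield successive batches; one gradient update is made per batch."""
    random.shuffle(samples)
    for i in range(0, len(samples), batch_size):
        yield samples[i:i + batch_size]

w, lr = 0.0, 0.01
for epoch in range(50):
    for batch in minibatches(data, batch_size=4):
        # Gradient of mean squared error over the batch: mean(2*(w*x - y)*x)
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad

print(round(w, 3))  # approaches 2.0
```

Larger batches average the gradient over more samples (smoother but costlier per step); smaller batches give noisier, cheaper updates, which is the efficiency/stability trade-off the entry refers to.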
Crafting prompts to elicit desired behavior, often using role, structure, constraints, and examples.
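The role/structure/constraints/examples pattern can be sketched as a simple template builder. The section names and wording are illustrative, not a standard format:

```python
def build_prompt(role, task, constraints, examples):
    """Assemble a structured prompt from a role, constraints, and few-shot
    examples. (Hypothetical template; section labels are illustrative.)"""
    lines = [f"You are {role}.", "", f"Task: {task}", "", "Constraints:"]
    lines += [f"- {c}" for c in constraints]
    lines += ["", "Examples:"]
    lines += [f"Input: {i}\nOutput: {o}" for i, o in examples]
    return "\n".join(lines)

prompt = build_prompt(
    role="a concise technical editor",
    task="Fix grammar without changing meaning.",
    constraints=["Keep the original sentence order.", "Do not add new claims."],
    examples=[("teh model converge", "the model converges")],
)
print(prompt)
```

Keeping the pieces as separate arguments makes it easy to vary one element (say, the constraints) while holding the rest fixed.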
Fine-tuning on (prompt, response) pairs to align a model with instruction-following behavior.
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
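The reward-model step of RLHF is commonly trained with a Bradley-Terry-style loss on preference pairs; a minimal sketch, with the scalar rewards as assumed example values:

```python
import math

def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry preference loss for reward-model training:
    -log sigmoid(r_chosen - r_rejected)."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# Hypothetical scalar rewards for one (chosen, rejected) response pair.
low = preference_loss(2.0, -1.0)   # model already ranks the pair correctly
high = preference_loss(-1.0, 2.0)  # model ranks the pair the wrong way round
print(round(low, 4), round(high, 4))
```

Minimizing this loss pushes the reward model to score preferred responses above rejected ones; the trained reward model is then used to optimize the policy.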
Techniques to understand model decisions (global or local), important in high-stakes and regulated settings.
Studying internal mechanisms or input influence on outputs (e.g., saliency maps, SHAP, attention analysis).
A broader capability to infer internal system state from telemetry, crucial for AI services and agents.
Techniques that fine-tune small additional components rather than all weights to reduce compute and storage.
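One common parameter-efficient scheme adds a small low-rank update to a frozen weight matrix (as in LoRA). A pure-Python sketch with assumed toy values, showing that only the adapter factors would be trained:

```python
def matmul(A, B):
    """Plain list-of-lists matrix multiply."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

# Frozen base weight W (4x4) plus a rank-1 adapter: only A (4x1) and B (1x4) train.
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
A = [[0.1], [0.2], [0.3], [0.4]]   # hypothetical learned adapter values
B = [[1.0, 0.0, -1.0, 0.0]]

W_eff = add(W, matmul(A, B))       # effective weight: W + A @ B

full = sum(len(r) for r in W)                               # 16 params if fully tuned
adapter = sum(len(r) for r in A) + sum(len(r) for r in B)   # 8 trainable params
print(full, adapter)
```

At realistic sizes the gap is far larger (rank r against a d×d matrix gives 2·r·d trainable values instead of d²), which is where the compute and storage savings come from.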
Attacks that manipulate model instructions (especially via retrieved content) to override system goals or exfiltrate data.
Coordinating tools, models, and steps (retrieval, calls, validation) to deliver reliable end-to-end behavior.
The shape of the loss function over parameter space.
Adjusting learning rate over training to improve convergence.
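A common instance of this is cosine annealing, which smoothly decays the learning rate over training. The `lr_max`/`lr_min` values here are illustrative defaults, not recommendations:

```python
import math

def cosine_lr(step, total_steps, lr_max=0.1, lr_min=0.001):
    """Cosine annealing: decay the learning rate from lr_max to lr_min
    following half a cosine period. (Endpoint values are illustrative.)"""
    t = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

schedule = [cosine_lr(s, 100) for s in range(101)]
print(round(schedule[0], 4), round(schedule[50], 4), round(schedule[100], 4))
```

Early steps keep the rate near its maximum for fast progress, while the gentle tail lets the optimizer settle, which is how a schedule can improve convergence over a fixed rate.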
Strategy mapping states to actions.
Logged record of model inputs, outputs, and decisions.
Learning from data generated by a different policy.
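A standard tool for this off-policy setting is importance sampling: reweight data collected under one (behavior) policy to estimate a quantity under another (target) policy. The two policies and rewards below are assumed toy values:

```python
import random

random.seed(0)

# Behavior policy picks action 0 with prob 0.8; target policy with prob 0.3
# (assumed toy values). Rewards are hypothetical and deterministic.
behavior = {0: 0.8, 1: 0.2}
target = {0: 0.3, 1: 0.7}
reward = {0: 1.0, 1: 2.0}

# Data was generated by the behavior policy...
samples = [0 if random.random() < behavior[0] else 1 for _ in range(100_000)]

# ...but importance weights target(a)/behavior(a) recover the target
# policy's expected reward from that data.
est = sum(target[a] / behavior[a] * reward[a] for a in samples) / len(samples)
true = sum(target[a] * reward[a] for a in target)  # 0.3*1 + 0.7*2 = 1.7
print(round(est, 3), true)
```

The estimate is unbiased but its variance grows when the two policies disagree strongly, which is the central practical difficulty of off-policy learning.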
Extracting system prompts or hidden instructions.
Models temporal evolution through hidden states.
Persistent directional movement over time.
Shift in feature distribution over time.
Interleaving reasoning and tool use.
Matrix of first-order partial derivatives of a vector-valued function.
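A quick way to make the Jacobian concrete is a central-difference approximation, checked against a function whose Jacobian is known analytically:

```python
def jacobian(f, x, eps=1e-6):
    """Numerically approximate the Jacobian of a vector-valued f at x
    via central differences: J[i][j] = d f_i / d x_j."""
    n = len(x)
    m = len(f(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x); xp[j] += eps
        xm = list(x); xm[j] -= eps
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

# f(x, y) = (x*y, x + y); the analytic Jacobian at (2, 3) is [[3, 2], [1, 1]].
J = jacobian(lambda v: [v[0] * v[1], v[0] + v[1]], [2.0, 3.0])
print([[round(e, 4) for e in row] for row in J])
```

Row i of the result holds the gradient of output component f_i, which is exactly the "matrix of first-order partial derivatives" the definition describes.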
Visualization of the optimization landscape.
Ensuring AI systems pursue intended human goals.
A model exploits a poorly specified objective, satisfying the letter of the reward while missing its intent.