Results for "safety science"
Tradeoff between safety and performance.
Accelerating safety relative to capabilities.
The field of building systems that perform tasks associated with human intelligence—perception, reasoning, language, planning, and decision-making—via algorithms.
Automated detection/prevention of disallowed outputs (toxicity, self-harm, illegal instructions, etc.).
Systems where failure causes physical harm.
Restricting distribution of powerful models.
Research ensuring AI remains safe.
Mechanism to disable an AI system.
Hard constraints preventing unsafe actions.
Central system to store model versions, metadata, approvals, and deployment state.
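A registry like this can be sketched as a small in-memory store; a minimal sketch, assuming illustrative field names (version string, metadata dict, approval flag, deployment pointer), not a real registry API:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelVersion:
    """One registered model version with its metadata and approval state."""
    version: str
    metadata: dict
    approved: bool = False


@dataclass
class Registry:
    """In-memory registry: versions, approvals, and the deployed pointer."""
    versions: dict = field(default_factory=dict)
    deployed: Optional[str] = None

    def register(self, version: str, metadata: dict) -> None:
        self.versions[version] = ModelVersion(version, metadata)

    def approve(self, version: str) -> None:
        self.versions[version].approved = True

    def deploy(self, version: str) -> None:
        # Gate deployment on approval state, mirroring the approvals
        # tracked in the definition above.
        if not self.versions[version].approved:
            raise ValueError("cannot deploy an unapproved version")
        self.deployed = version


reg = Registry()
reg.register("v1", {"dataset": "demo"})
reg.approve("v1")
reg.deploy("v1")
print(reg.deployed)  # v1
```

The key design point is that deployment state is a pointer into the version store, so an unapproved version can never become the deployed one.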
A theoretical framework analyzing what classes of functions can be learned, how efficiently, and with what guarantees.
Sequential data indexed by time.
Field combining mechanics, control, perception, and AI to build autonomous machines.
Learning by minimizing prediction error.
Intelligence emerges from interaction with the physical world.
Closed loop linking sensing and acting.
Robots learning via exploration and growth.
AI applied to scientific problems.
AI discovering new compounds/materials.
Agents optimize collective outcomes.
No agent benefits from unilateral deviation.
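The "no unilateral deviation" condition can be checked directly over a small payoff table; a minimal sketch using a hypothetical two-player Prisoner's Dilemma (actions 0 = cooperate, 1 = defect):

```python
# payoffs[player][(a0, a1)] -> utility for `player` under that action profile.
payoffs = [
    {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1},  # player 0
    {(0, 0): 3, (0, 1): 5, (1, 0): 0, (1, 1): 1},  # player 1
]


def is_nash(profile):
    """True iff no player gains by unilaterally switching their own action."""
    for player in (0, 1):
        current = payoffs[player][profile]
        for alt in (0, 1):
            deviated = list(profile)
            deviated[player] = alt
            if payoffs[player][tuple(deviated)] > current:
                return False
    return True


print(is_nash((1, 1)))  # mutual defection: True (the equilibrium)
print(is_nash((0, 0)))  # mutual cooperation: False (defecting pays 5 > 3)
```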
Early signals disproportionately influence outcomes.
Groups adopting extreme positions.
Mathematical guarantees of system behavior.
Sudden jump to superintelligence.
Risk threatening humanity’s survival.
Stepwise reasoning patterns that can improve multi-step tasks; often handled implicitly or summarized for safety/privacy.
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
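The reward-model step of RLHF is commonly trained with a Bradley-Terry pairwise loss on preference data; a minimal sketch, where the two scalar rewards are hypothetical reward-model scores for the chosen and rejected responses:

```python
import math


def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    The loss is small when the reward model ranks the human-preferred
    (chosen) response above the rejected one, and grows as the ranking
    inverts.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))


print(preference_loss(2.0, 0.0))  # small: correct ranking with a wide margin
print(preference_loss(0.0, 2.0))  # large: the rejected response scored higher
```

Minimizing this loss over many preference pairs fits the reward model that the policy is then optimized against.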
Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.
Rules and controls around generation (filters, validators, structured outputs) to reduce unsafe or invalid behavior.
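A guardrail of this kind can be sketched as a validator chain run over candidate outputs; a minimal sketch, where the two rules (a blocklist regex and a JSON-shape check) are hypothetical examples, not a real policy:

```python
import json
import re


def blocklist_rule(text: str) -> bool:
    """Reject outputs matching a (hypothetical) disallowed-content pattern."""
    return not re.search(r"how to make a weapon", text, re.IGNORECASE)


def json_rule(text: str) -> bool:
    """Require the output to parse as JSON (a structured-output check)."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


def validate(text, rules):
    """Return (ok, failed_rule_names) for a candidate model output."""
    failed = [rule.__name__ for rule in rules if not rule(text)]
    return (not failed, failed)


print(validate('{"answer": 42}', [blocklist_rule, json_rule]))  # (True, [])
print(validate("not json", [blocklist_rule, json_rule]))  # (False, ['json_rule'])
```

Failed rule names can drive the downstream action: block the output, repair it, or re-prompt the model.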