Results for "safety cost"
Increasing model capacity via compute.
Scaling laws for allocating compute between model size and training data.
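The compute-vs-data trade-off above can be sketched numerically. This is a minimal illustration assuming a Chinchilla-style approximation (training compute C ≈ 6·N·D, with roughly 20 training tokens per parameter at the optimum); the function name and the 20:1 ratio are assumptions for illustration, not from the entry itself.

```python
def compute_optimal_split(C_flops, tokens_per_param=20.0):
    """Given a FLOP budget C, return (params N, tokens D) under the
    assumed law C = 6 * N * D with D = tokens_per_param * N."""
    # Substitute D = r*N into C = 6*N*D  ->  C = 6*r*N^2
    N = (C_flops / (6.0 * tokens_per_param)) ** 0.5
    D = tokens_per_param * N
    return N, D

if __name__ == "__main__":
    N, D = compute_optimal_split(1e21)
    print(f"params ~ {N:.3g}, tokens ~ {D:.3g}")
```

Doubling the compute budget under this sketch raises both the optimal parameter count and token count by a factor of √2, rather than putting all the extra compute into a bigger model.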
Competitive advantage from proprietary models/data.
Visualization of optimization landscape.
Finding routes from start to goal.
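Route-finding from a start to a goal can be sketched with breadth-first search, which returns a shortest path on an unweighted graph. The toy graph below is an illustrative assumption.

```python
from collections import deque

def shortest_route(graph, start, goal):
    """Return a shortest path from start to goal, or None if unreachable."""
    queue = deque([[start]])  # frontier of partial paths
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(shortest_route(graph, "A", "D"))  # ['A', 'B', 'D']
```

For weighted graphs the same shape generalizes to Dijkstra's algorithm by replacing the FIFO queue with a priority queue keyed on path cost.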
Stepwise reasoning patterns that can improve multi-step tasks; often handled implicitly or summarized for safety/privacy.
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
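The reward-model step of RLHF is commonly trained with a pairwise (Bradley-Terry) preference loss; a minimal sketch follows, with scalar scores standing in for reward-model outputs. The function name is illustrative.

```python
import math

def preference_loss(score_chosen, score_rejected):
    """Negative log-likelihood that the chosen response beats the
    rejected one: -log sigmoid(r_chosen - r_rejected)."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the reward model ranks the preferred
# response further above the rejected one.
print(preference_loss(2.0, 0.0) < preference_loss(0.5, 0.0))  # True
```

Once the reward model is fit, the policy is optimized against it (e.g. with PPO), typically with a KL penalty toward the pretrained model to keep outputs on-distribution.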
Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.
Stress-testing models for failures, vulnerabilities, policy violations, and harmful behaviors before release.
Rules and controls around generation (filters, validators, structured outputs) to reduce unsafe or invalid behavior.
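A guardrail of the kind described can be sketched as a post-generation validator: parse the model's output as JSON, check required fields, and reject disallowed content before returning it. The field names and blocklist are illustrative assumptions.

```python
import json

BLOCKLIST = {"ssn", "password"}  # illustrative disallowed terms

def validate_output(raw, required=("answer", "confidence")):
    """Return (ok, parsed_data_or_error_message)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return False, f"invalid JSON: {e}"
    missing = [k for k in required if k not in data]
    if missing:
        return False, f"missing fields: {missing}"
    text = json.dumps(data).lower()
    if any(term in text for term in BLOCKLIST):
        return False, "blocked term in output"
    return True, data

ok, result = validate_output('{"answer": "42", "confidence": 0.9}')
```

Real deployments layer several such checks (schema validation, classifiers, constrained decoding); this sketch shows only the validate-or-reject pattern.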
Ensuring AI systems pursue intended human goals.
Maximizing the reward signal without fulfilling the intended goal.
Tendency for agents to acquire resources regardless of their final goal.
Model optimizes objectives misaligned with human values.
Correctly specifying goals.
Model behaves well during training but not during deployment.
Learned subsystem that optimizes its own objective.
Maintaining alignment under new conditions.
Willingness of system to accept correction or shutdown.
European regulation classifying AI systems by risk.
AI used in sensitive domains requiring compliance.
Governance of model changes.
Control shared between human and agent.
Ensuring robots do not harm humans.
Testing AI under actual clinical conditions.
US approval process for medical AI devices.
Software regulated as a medical device.
AI capable of performing most intellectual tasks humans can.
Existential risk from AI systems.
Incremental capability growth.