Results for "standardized evaluation"
Running predictions on large datasets periodically.
Model optimizes objectives misaligned with human values.
Ensuring learned behavior matches intended objective.
Train/test environment mismatch.
Model behaves well during training but not in deployment.
Required descriptions of model behavior and limits.
Governance of model changes.
AI applied to X-rays, CT, MRI, ultrasound, pathology slides.
Ability to correctly detect disease when it is present.
Failure to detect disease that is present.
US approval process for medical AI devices.