Results for "real-world testing"
Combining simulation and real-world data.
Artificial environment for training/testing agents.
Differences between simulated and real physics.
Learned model of environment dynamics.
Controlled experiment comparing variants by random assignment to estimate causal effects of changes.
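Random assignment and a difference-in-means estimate can be sketched as follows (the helper names and two-arm setup are illustrative, not from any particular experimentation framework):

```python
import random

def ab_assign(user_ids, seed=0):
    # Randomly assign each user to one of two arms (hypothetical helper).
    rng = random.Random(seed)
    return {u: rng.choice(["control", "variant"]) for u in user_ids}

def estimate_lift(outcomes, groups):
    # Difference in mean outcome between variant and control arms.
    sums = {"control": [0.0, 0], "variant": [0.0, 0]}
    for user, y in outcomes.items():
        g = groups[user]
        sums[g][0] += y
        sums[g][1] += 1
    means = {g: s / n for g, (s, n) in sums.items()}
    return means["variant"] - means["control"]
```

Randomization is what licenses the causal reading: with enough users, the two arms differ only in the treatment.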
Performance drop when moving from simulation to reality.
Randomizing simulation parameters to improve real-world transfer.
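A minimal sketch of per-episode parameter randomization (the parameter names and ranges are illustrative assumptions, not tied to any specific simulator):

```python
import random

def sample_sim_params(rng):
    # Draw fresh physics parameters for one training episode so the
    # policy never overfits to a single simulated world.
    return {
        "friction": rng.uniform(0.5, 1.5),
        "mass_scale": rng.uniform(0.8, 1.2),
        "sensor_noise_std": rng.uniform(0.0, 0.05),
    }

rng = random.Random(42)
episodes = [sample_sim_params(rng) for _ in range(3)]
```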
Automated testing and deployment processes for models and data workflows, extending DevOps to ML artifacts.
Stress-testing models for failures, vulnerabilities, policy violations, and harmful behaviors before release.
A dataset + metric suite for comparing models; can be gamed or misaligned with real-world goals.
Testing AI under actual clinical conditions.
AI systems that perceive and act in the physical world through sensors and actuators.
Simulating adverse or rare scenarios to probe failure modes before deployment.
Artificial sensor data generated in simulation.
Artificially created data used to train/test models; helpful for privacy and coverage, risky if unrealistic.
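A crude synthetic-data sketch for a single numeric column, fitting a normal distribution to the real values and sampling from it (real generators also model joint structure and edge cases; this is only a toy):

```python
import random
import statistics

def synthesize(real, n, seed=0):
    # Fit mean/stddev to the real column, then sample synthetic values.
    rng = random.Random(seed)
    mu = statistics.mean(real)
    sd = statistics.stdev(real)
    return [rng.gauss(mu, sd) for _ in range(n)]
```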
Running a new model alongside the production model on live traffic, with no user-facing impact.
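A shadow-deployment sketch: the user always receives the production output, while the candidate runs on the same input and disagreements (or crashes) are logged for offline analysis (the function shape here is illustrative):

```python
def shadow_deploy(prod_model, shadow_model, request, log):
    # Serve production; run the shadow model on the same input and
    # record disagreements without ever affecting the response.
    prod_out = prod_model(request)
    try:
        shadow_out = shadow_model(request)
        if shadow_out != prod_out:
            log.append({"request": request, "prod": prod_out, "shadow": shadow_out})
    except Exception as exc:
        log.append({"request": request, "shadow_error": repr(exc)})
    return prod_out  # users only ever see the production output
```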
Guaranteed upper bounds on response time, as required by real-time and safety-critical systems.
Training with a small labeled dataset plus a larger unlabeled dataset, leveraging assumptions like smoothness/cluster structure.
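One common semi-supervised recipe is self-training: fit on the labeled data, pseudo-label the unlabeled points the model is confident about, and refit. A toy 1-D nearest-centroid sketch (data, threshold, and confidence score are all illustrative):

```python
def centroid(points):
    return sum(points) / len(points)

def self_train(labeled, unlabeled, threshold=0.8, rounds=3):
    # labeled: {class: [values]}; pseudo-label confident unlabeled
    # points by nearest centroid, then refit centroids and repeat.
    data = {c: list(xs) for c, xs in labeled.items()}
    pool = list(unlabeled)
    for _ in range(rounds):
        cents = {c: centroid(xs) for c, xs in data.items()}
        still = []
        for x in pool:
            dists = sorted((abs(x - m), c) for c, m in cents.items())
            (d1, c1), (d2, _) = dists[0], dists[1]
            conf = d2 / (d1 + d2 + 1e-12)  # near 1.0 = unambiguous
            if conf >= threshold:
                data[c1].append(x)  # adopt the pseudo-label
            else:
                still.append(x)  # too ambiguous; keep unlabeled
        pool = still
    return {c: centroid(xs) for c, xs in data.items()}, pool
```

Points near the decision boundary stay unlabeled, which is the smoothness/cluster assumption at work.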
A mismatch between training and deployment data distributions that can degrade model performance.
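A crude drift check for a single feature, flagging a live mean that drifts far from the training mean relative to the training standard error (real monitoring uses richer tests; this only illustrates the idea):

```python
import statistics

def mean_shift(train, live, n_std=3.0):
    # Flag when the live mean sits more than n_std training standard
    # errors away from the training mean.
    mu = statistics.mean(train)
    se = statistics.stdev(train) / len(train) ** 0.5
    return abs(statistics.mean(live) - mu) > n_std * se
```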
Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.
When information from evaluation data improperly influences training, inflating reported performance.
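A common source of this is preprocessing fitted on the full dataset before splitting. A sketch of the safe pattern, fitting normalization statistics on the training split only (helper name is illustrative):

```python
import statistics

def zscore_fit(train):
    # Fit mean/stddev on training data only; applying these frozen
    # statistics to test data avoids leaking test information.
    mu = statistics.mean(train)
    sd = statistics.stdev(train)
    return lambda xs: [(x - mu) / sd for x in xs]
```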
Systematic differences in model outcomes across groups; arises from data, labels, and deployment context.
Model-generated content that is fluent but unsupported by evidence or incorrect; mitigated by grounding and verification.
Models evaluating and improving their own outputs.
Rules and controls around generation (filters, validators, structured outputs) to reduce unsafe or invalid behavior.
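A minimal output-validation guardrail, accepting model output only if it parses as JSON matching a required schema (the schema and field names are hypothetical):

```python
import json

REQUIRED_KEYS = {"answer", "confidence"}  # hypothetical schema

def validate_output(raw):
    # Reject anything that is not JSON, misses required keys, or
    # carries an out-of-range confidence; callers can then retry.
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not REQUIRED_KEYS <= obj.keys():
        return None
    conf = obj["confidence"]
    if not (isinstance(conf, (int, float)) and 0 <= conf <= 1):
        return None
    return obj
```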
Simultaneous Localization and Mapping for robotics.
Systematic error introduced by simplifying assumptions in a learning algorithm.
An agent dynamically invoking external tools (search, code execution, APIs) while working on a task.
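A tool-dispatch sketch: the model emits a structured call, and a registry routes it to a real function, refusing unknown tool names rather than guessing (registry and call format are illustrative):

```python
TOOLS = {  # hypothetical tool registry
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

def dispatch(call):
    # Route a model-emitted call {"tool": name, "args": {...}} to a
    # registered function; unknown tools raise instead of guessing.
    name, args = call["tool"], call.get("args", {})
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**args)
```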
Optimization under equality/inequality constraints.
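One standard approach to inequality constraints is projected gradient descent: take a gradient step, then project back onto the feasible set. A 1-D box-constrained sketch (step size and iteration count are illustrative):

```python
def project(x, lo, hi):
    # Projection onto the feasible interval [lo, hi].
    return max(lo, min(hi, x))

def minimize_box(grad, x0, lo, hi, lr=0.1, steps=200):
    # Projected gradient descent for min f(x) s.t. lo <= x <= hi.
    x = x0
    for _ in range(steps):
        x = project(x - lr * grad(x), lo, hi)
    return x
```

For f(x) = (x - 3)^2 on [0, 2], the unconstrained minimum x = 3 is infeasible, so the method converges to the boundary x = 2.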
Maintaining aligned behavior as data, environments, or capabilities change.