Results for "robustness"
Expanding training data via transformations (flips, noise, paraphrases) to improve robustness.
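A minimal NumPy sketch of this idea; the batch shape, noise level, and choice of transforms (horizontal flip plus Gaussian noise) are illustrative assumptions, not a fixed recipe:

```python
import numpy as np

def augment(images, noise_std=0.05, seed=0):
    """Return the batch plus flipped and noise-perturbed copies.

    `images` is assumed to be a float array of shape (N, H, W) in [0, 1].
    """
    rng = np.random.default_rng(seed)
    flipped = images[:, :, ::-1]  # horizontal flip
    noisy = np.clip(images + rng.normal(0.0, noise_std, images.shape), 0.0, 1.0)
    return np.concatenate([images, flipped, noisy])  # 3x the original data

batch = np.zeros((4, 8, 8))
augmented = augment(batch)
print(augmented.shape)  # (12, 8, 8)
```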
Maintaining a model's alignment with its intended goals and constraints when deployment conditions differ from those seen in training.
A mismatch between training and deployment data distributions that can degrade model performance.
Techniques that discourage overly complex solutions to improve generalization (reduce overfitting).
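One concrete instance is ridge regression, where an L2 penalty on the weights discourages large coefficients. A minimal sketch (the data and penalty strength are illustrative):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: the L2 penalty shrinks weights toward zero."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

w_plain = ridge_fit(X, y, lam=0.0)
w_reg = ridge_fit(X, y, lam=10.0)
print(np.linalg.norm(w_reg) < np.linalg.norm(w_plain))  # True: penalty shrinks weights
```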
How well a model performs on new data drawn from the same (or similar) distribution as training.
Techniques that stabilize and speed training by normalizing activations; LayerNorm is common in Transformers.
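The LayerNorm computation itself is small enough to sketch directly; here `gamma` and `beta` are kept as scalars for brevity (in practice they are learned per-feature parameters):

```python
import numpy as np

def layer_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each row to zero mean / unit variance, then scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.array([[1.0, 2.0, 3.0, 4.0]])
out = layer_norm(x)
print(out.mean(), out.std())  # ~0, ~1
```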
Framework for reasoning about cause-effect relationships beyond correlation, often using structural assumptions and experiments.
Measure of consistency across labelers; low agreement indicates ambiguous tasks or poor guidelines.
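One common such measure for two labelers is Cohen's kappa, which corrects raw agreement for chance; a self-contained sketch with made-up labels:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

a = ["yes", "yes", "no", "no", "yes", "no"]
b = ["yes", "no", "no", "no", "yes", "yes"]
print(round(cohens_kappa(a, b), 3))  # 0.333
```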
Artificially created data used to train/test models; helpful for privacy and coverage, risky if unrealistic.
Stress-testing models for failures, vulnerabilities, policy violations, and harmful behaviors before release.
Inputs crafted to cause model errors or unsafe behavior, often imperceptible in vision or subtle in text.
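A classic construction is the Fast Gradient Sign Method; the sketch below applies it to a toy logistic-regression "model" whose weights the attacker is assumed to know (a white-box assumption):

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps=0.2):
    """Move the input a small step in the direction that increases the loss."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))  # predicted probability of class 1
    grad_x = (p - y) * w                    # d(cross-entropy)/dx
    return x + eps * np.sign(grad_x)

w = np.array([2.0, -1.0])
b = 0.0
x = np.array([0.2, 0.1])              # score 0.3 > 0 -> predicted class 1
x_adv = fgsm_perturb(x, w, b, y=1.0)
print(w @ x > 0, w @ x_adv > 0)       # True False: the tiny perturbation flips the prediction
```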
Forcing predictable formats for downstream systems; reduces parsing errors and supports validation/guardrails.
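A minimal validation sketch: parse the model's reply as JSON and reject anything that does not match the expected shape. The field names and types here are a hypothetical schema, not a standard:

```python
import json

# Hypothetical expected shape for a classifier reply.
REQUIRED = {"label": str, "confidence": float}

def parse_reply(reply: str):
    """Parse a model reply and check it matches the expected structure."""
    data = json.loads(reply)  # raises ValueError on malformed JSON
    for key, typ in REQUIRED.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"missing or mistyped field: {key}")
    return data

print(parse_reply('{"label": "spam", "confidence": 0.93}'))
```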
The range of functions a model can represent.
Framework for identifying, measuring, and mitigating model risks.
Embedding hidden signals in model weights or outputs to prove ownership or trace provenance.
Failure mode of generative models in which the generator produces only a narrow variety of outputs, ignoring much of the data distribution.
Feeding production outcomes back into training or evaluation to improve models over time.
Restricting updates to safe regions.
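The simplest version of this idea caps how far the parameters may move in a single step; a minimal sketch (learning rate and radius are arbitrary, and real trust-region methods use a model of the objective rather than a fixed cap):

```python
import numpy as np

def trust_region_step(w, grad, lr=0.5, radius=0.1):
    """Gradient step clipped so the update never leaves a ball of size `radius`."""
    step = -lr * grad
    norm = np.linalg.norm(step)
    if norm > radius:
        step = step * (radius / norm)  # rescale onto the trust boundary
    return w + step

w = np.zeros(3)
huge_grad = np.array([100.0, -50.0, 25.0])
w_new = trust_region_step(w, huge_grad)
print(np.linalg.norm(w_new - w))  # ~0.1: the huge gradient cannot move us further
```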
Methods like Adam adjusting learning rates dynamically.
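The Adam update itself fits in a few lines; a sketch with the standard default hyperparameters, applied to the toy objective f(w) = w² (whose gradient is 2w):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: per-parameter step sizes from gradient moment estimates."""
    m = b1 * m + (1 - b1) * grad     # first moment (running mean of gradients)
    v = b2 * v + (1 - b2) * grad**2  # second moment (running mean of squares)
    m_hat = m / (1 - b1**t)          # bias correction for the zero init
    v_hat = v / (1 - b2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for t in range(1, 201):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.05)
print(abs(w[0]) < 1.0)  # True: iterates moved toward the minimum at 0
```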
Sampling multiple outputs and selecting consensus.
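The voting step is just a majority count over the sampled answers; the samples below are hypothetical outputs from one prompt at nonzero temperature:

```python
from collections import Counter

def self_consistency(samples):
    """Return the most common answer among multiple sampled outputs."""
    answer, _ = Counter(samples).most_common(1)[0]
    return answer

samples = ["42", "42", "41", "42", "43"]
print(self_consistency(samples))  # 42
```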
Applying patterns learned during training to inputs where they no longer hold, producing confident but incorrect predictions.
Train/test environment mismatch.
Small prompt changes cause large output changes.
Control using real-time sensor feedback.
Randomizing simulation parameters to improve real-world transfer.
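A minimal sketch of the sampling side: draw fresh physics parameters for each simulated episode so the policy cannot overfit to one exact configuration. The parameter names and ranges are illustrative, not from any specific simulator:

```python
import numpy as np

def randomized_sim_params(rng):
    """Draw one illustrative set of physics parameters for a simulated episode."""
    return {
        "gravity": rng.uniform(9.0, 10.6),   # m/s^2, perturbed around 9.81
        "friction": rng.uniform(0.5, 1.5),
        "mass_scale": rng.uniform(0.8, 1.2),
        "sensor_noise": rng.uniform(0.0, 0.05),
    }

rng = np.random.default_rng(42)
for episode in range(3):
    print(episode, randomized_sim_params(rng))  # a different world each episode
```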
Performance drop when moving from simulation to reality.
Testing AI under actual clinical conditions.
Research aimed at keeping AI systems safe and controllable as their capabilities grow.
The risk that a financial model is wrong or misapplied, leading to poor decisions or losses.