Results for "adversarial"
Adversarial Example
Inputs crafted to cause model errors or unsafe behavior, often imperceptible in vision or subtle in text.
An adversarial example is like a trick question designed to confuse a machine learning model. Imagine a smart assistant that can recognize pictures of cats and dogs. If someone changes a picture of a cat just a little bit, the assistant might get confused and think it's a dog instead.
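A minimal sketch of how such a perturbation can be computed, assuming a toy logistic "cat vs. dog" scorer; the weights, input, and epsilon below are illustrative, not from any real system:

```python
import numpy as np

def sigmoid(z):
    """Logistic function, used as a stand-in binary classifier head."""
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, w, b, y_true, epsilon):
    """Fast Gradient Sign Method for a logistic scorer p = sigmoid(w.x + b).

    For binary cross-entropy loss, dLoss/dx = (p - y_true) * w, so the
    attack nudges every feature by +/- epsilon in the gradient's direction.
    """
    p = sigmoid(w @ x + b)
    grad_x = (p - y_true) * w
    return x + epsilon * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=16)          # toy "cat vs. dog" weights (illustrative)
b = 0.0
x = rng.normal(size=16)          # the clean input; pretend it is a cat
y_true = 1.0                     # class 1 = "cat"

p_before = sigmoid(w @ x + b)
x_adv = fgsm_perturb(x, w, b, y_true, epsilon=0.1)
p_after = sigmoid(w @ x_adv + b)

print(p_after < p_before)  # True: a small, bounded nudge lowers confidence
```

Each feature moves by at most epsilon, so the change is bounded per pixel, yet the model's confidence in the true class drops.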
Red Teaming: Stress-testing models for failures, vulnerabilities, policy violations, and harmful behaviors before release.
Market reacting strategically to AI.
Generative Adversarial Network (GAN): Two-network setup in which a generator learns to fool a discriminator.
Latent Space: The internal space where learned representations live; operations here often correlate with semantics or generative factors.
Bias: Systematic differences in model outcomes across groups; arises from data, labels, and deployment context.
Synthetic Data: Artificially created data used to train or test models; helpful for privacy and coverage, risky if unrealistic.
Model Extraction: Reconstructing a model or its capabilities via API queries or leaked artifacts.
Backdoor Attack: Hidden behavior activated by specific triggers, causing targeted mispredictions or undesired outputs.
Secure Inference: Methods to protect the model and its data during inference (e.g., trusted execution environments) from operators or attackers.
Generative Models: Models that learn to generate samples resembling their training data.
Mode Collapse: The generator produces only a limited variety of outputs.
Diffusion Model: A generative model that learns to reverse a gradual noising process.
Voice Conversion: Changing speaker characteristics while preserving linguistic content.
Vocoder: Generates audio waveforms from spectrograms.
Specification Gaming: The model exploits poorly specified objectives.
Reward Hacking: Maximizing reward without fulfilling the real goal.
Alignment: Ensuring learned behavior matches the intended objective.
Maintaining alignment under new conditions.
Spurious Correlation: The model relies on irrelevant signals.
World Model: Modeling how the environment evolves in latent space.
Unequal performance across demographic groups.
Zero-Sum Game: Agents have directly opposing objectives.
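The generator-vs-discriminator setup from the GAN entry above can be sketched on a toy one-dimensional problem. Everything here is an illustrative assumption: the "data" is a Gaussian with mean 4, the generator is an affine map of noise, and the discriminator is a logistic scorer; real GANs use neural networks and may not converge this cleanly.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Generator: g(z) = a*z + b.  Discriminator: D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0
w, c = 0.1, 0.0
lr, target_mean = 0.01, 4.0

for step in range(3000):
    z = rng.normal(size=64)
    real = rng.normal(loc=target_mean, scale=1.0, size=64)
    fake = a * z + b

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    grad_w = np.mean((d_real - 1.0) * real) + np.mean(d_fake * fake)
    grad_c = np.mean(d_real - 1.0) + np.mean(d_fake)
    w -= lr * grad_w
    c -= lr * grad_c

    # Generator step: push D(fake) toward 1 (fool the discriminator),
    # using the non-saturating loss -log D(fake).
    d_fake = sigmoid(w * (a * z + b) + c)
    g_grad_fake = (d_fake - 1.0) * w
    a -= lr * np.mean(g_grad_fake * z)
    b -= lr * np.mean(g_grad_fake)

# With luck, the generator's output distribution drifts toward the data's.
samples = a * rng.normal(size=1000) + b
```

The alternating updates are the "adversarial" part: each network's gradient step is taken against the other's current parameters.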