Results for "dataset documentation"
Inter-annotator agreement: Measure of consistency across labelers; low agreement indicates ambiguous tasks or poor guidelines.
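One common agreement statistic is Cohen's kappa, which corrects raw agreement for chance. A minimal sketch for two labelers (the function name is illustrative):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two labelers' annotations of the same items."""
    assert len(a) == len(b)
    n = len(a)
    # Observed fraction of items both labelers agree on.
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement if each labeler drew labels independently
    # from their own marginal distribution.
    ca, cb = Counter(a), Counter(b)
    p_exp = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)
```

Kappa of 1.0 means perfect agreement; values near 0 mean agreement no better than chance, a sign the task or guidelines need work.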
Data augmentation: Expanding training data via transformations (flips, noise, paraphrases) to improve robustness.
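For image data, the listed transformations can be sketched in a few lines of NumPy (the helper name and noise scale are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Yield simple augmented variants of a 2-D image array."""
    yield np.fliplr(image)                           # horizontal flip
    yield np.flipud(image)                           # vertical flip
    yield image + rng.normal(0, 0.05, image.shape)   # additive Gaussian noise
```

Each variant keeps the original label, effectively multiplying the dataset while exposing the model to nuisance variation.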
Synthetic data: Artificially created data used to train or test models; helpful for privacy and coverage, risky if unrealistic.
Data poisoning: Maliciously inserting or altering training data to implant backdoors or degrade performance.
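A toy sketch of the simplest variant, label flipping, where an attacker corrupts a fraction of one class's labels (the function name and parameters are illustrative):

```python
import random

def poison_labels(labels, target, flip_to, fraction=0.1, seed=0):
    """Flip a fraction of `target` labels to `flip_to` (label-flipping attack sketch)."""
    rng = random.Random(seed)
    poisoned = list(labels)
    idx = [i for i, y in enumerate(poisoned) if y == target]
    # Corrupt a deterministic random subset of the targeted class.
    for i in rng.sample(idx, max(1, int(fraction * len(idx)))):
        poisoned[i] = flip_to
    return poisoned
```

Even small poisoned fractions can measurably degrade a classifier's accuracy on the targeted class, which is why dataset provenance checks matter.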
Depth vs. width: Tradeoffs between many layers and many neurons per layer.
Model watermarking: Detecting unauthorized model outputs or data leaks.
Generative models: Models that learn to generate samples resembling the training data.
Image classification: Assigning category labels to images.
CLIP: Joint vision-language model aligning images and text.
Trend: Persistent directional movement in a time series.
Keyword spotting: Detects trigger phrases in audio streams.
Training cost: The compute, time, and money required to train a model.
Batch inference: Running predictions on large datasets periodically rather than on demand.
Open-weight models: Models whose weights are publicly available.
Overgeneralization: Applying learned patterns incorrectly in new contexts.
AlphaFold: Deep learning system for protein structure prediction.
Symbolic regression: Finding mathematical equations from data.
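At its core this is a search over candidate expressions scored by fit. A brute-force sketch over a tiny hand-picked basis (real systems search vastly larger expression spaces, often with genetic programming):

```python
import math

# Candidate expressions to try; names and choices here are illustrative.
CANDIDATES = {
    "x": lambda x: x,
    "x**2": lambda x: x * x,
    "x**3": lambda x: x ** 3,
    "sin(x)": math.sin,
    "exp(x)": math.exp,
}

def best_formula(xs, ys):
    """Return the candidate expression with the lowest squared error on (xs, ys)."""
    def sse(f):
        return sum((f(x) - y) ** 2 for x, y in zip(xs, ys))
    return min(CANDIDATES, key=lambda name: sse(CANDIDATES[name]))
```

Given noiseless samples of y = x², the search recovers the generating expression exactly.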
Fine-tuning: Updating a pretrained model’s weights on task-specific data to improve performance or adapt style/behavior.
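A toy NumPy sketch of the linear-probe variant: the pretrained feature extractor stays frozen, and only a logistic head's weights are updated on the new task's data (all names and hyperparameters are illustrative):

```python
import numpy as np

def finetune_head(features, labels, w, lr=0.1, epochs=200):
    """Gradient-descend only a linear head's weights on task-specific data,
    leaving the (frozen) feature extractor that produced `features` untouched."""
    w = np.array(w, dtype=float)
    for _ in range(epochs):
        probs = 1 / (1 + np.exp(-(features @ w)))           # sigmoid head
        grad = features.T @ (probs - labels) / len(labels)  # logistic-loss gradient
        w -= lr * grad
    return w
```

Full fine-tuning instead updates every layer's weights, which adapts the model more strongly but risks overfitting and forgetting the pretrained behavior.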
Differential privacy: A formal privacy framework ensuring outputs do not reveal much about any single individual’s data contribution.
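The classic building block is the Laplace mechanism: add noise scaled to the query's sensitivity divided by the privacy budget ε. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, seed=None):
    """Release true_value plus Laplace(sensitivity / epsilon) noise.

    Satisfies epsilon-differential privacy for a query whose output
    changes by at most `sensitivity` when one individual's record changes.
    """
    rng = np.random.default_rng(seed)
    return true_value + rng.laplace(0.0, sensitivity / epsilon)
```

For a counting query (sensitivity 1), smaller ε means more noise and stronger privacy; averaged over many releases the noise is centered on the true value.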
Knowledge distillation: Training a smaller “student” model to mimic a larger “teacher,” often improving efficiency while retaining performance.
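The student is typically trained to match the teacher's temperature-softened output distribution. A minimal NumPy sketch of that loss (function names and the temperature value are illustrative):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return -float(np.sum(p_t * np.log(p_s + 1e-12)))
```

The loss is minimized when the student's softened distribution matches the teacher's, so gradients push the student toward the teacher's full output behavior, not just its top-1 labels.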