Results for "instruction exposure"
Differences between training and inference conditions.
A high-priority instruction layer setting overarching behavior constraints for a chat model.
Fine-tuning on (prompt, response) pairs to align a model with instruction-following behaviors.
Task instruction without examples.
Attacks that infer whether specific records were in training data, or reconstruct sensitive training examples.
Methods that protect the model and its data during inference from operators and attackers (e.g., trusted execution environments).
Quantifying financial risk.
Automated detection and prevention of disallowed outputs (toxicity, self-harm, illegal instructions, etc.).
One example included to guide output.
Multiple examples included in prompt.
Controlling robots via language.
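
Three of the results above describe prompts that differ only in how many worked examples accompany the task instruction: none, one, or several. The sketch below illustrates that difference under stated assumptions. The `build_prompt` helper, the `Input:`/`Output:` layout, and the sentiment task are all hypothetical choices made for illustration; they do not come from the listing or from any particular library.

```python
def build_prompt(instruction, examples, query):
    """Join an instruction, zero or more (input, output) examples, and a query.

    Hypothetical helper: the "Input:/Output:" layout is an assumption,
    not a standard format.
    """
    parts = [instruction]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # The final block leaves the output empty for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


instruction = "Classify the sentiment as positive or negative."
demos = [
    ("I loved it.", "positive"),
    ("It was dull.", "negative"),
    ("Great value.", "positive"),
]
query = "The plot dragged."

zero_shot = build_prompt(instruction, [], query)        # instruction only
one_shot = build_prompt(instruction, demos[:1], query)  # one example
few_shot = build_prompt(instruction, demos, query)      # several examples
```

Only the `examples` argument changes between the three calls, so the instruction and the query stay identical across all three prompts.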