Results for "trial-and-error"
Corrigibility: Ensuring an AI system allows itself to be shut down or corrected.
Differential technological development: Accelerating safety research relative to capabilities research.
VC dimension: A measure of a model class’s expressive capacity based on its ability to shatter datasets.
Delimiters: Using markers (such as triple quotes or XML tags) to isolate context segments in a prompt.
Morphological computation: The idea that an agent’s physical form contributes to computation.
Feature: A measurable property or attribute used as model input (raw or engineered), such as age, pixel intensity, or token ID.
Confusion matrix: A table summarizing classification outcomes (true/false positives and negatives), foundational for metrics like precision, recall, and specificity.
Representation learning: Automatically learning useful internal features (latent variables) that capture salient structure for downstream tasks.
Accuracy: Fraction of correct predictions; can be misleading on imbalanced datasets.
Precision: Of predicted positives, the fraction that are truly positive; sensitive to false positives.
Recall (sensitivity): Of actual positives, the fraction correctly identified; sensitive to false negatives.
Specificity: Of actual negatives, the fraction correctly identified.
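A minimal sketch tying the four metrics above back to confusion-matrix counts; the label vectors below are made-up illustration values, not data from any real classifier.

```python
# Derive accuracy, precision, recall, and specificity from a 2x2
# confusion matrix over binary labels in {0, 1}.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

y_true = [1, 1, 1, 0, 0, 0, 0, 0]   # toy ground truth
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]   # toy predictions

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy    = (tp + tn) / (tp + fp + fn + tn)   # 6/8 = 0.75
precision   = tp / (tp + fp)                    # 2/3
recall      = tp / (tp + fn)                    # 2/3
specificity = tn / (tn + fp)                    # 4/5 = 0.8
```

Note how the denominators differ: precision divides by the *predicted*-positive column, while recall and specificity divide by the *actual*-positive and actual-negative rows.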
Precision-recall curve: Often more informative than ROC on imbalanced datasets; focuses on positive-class performance.
Calibration: The degree to which predicted probabilities match observed frequencies (e.g., among examples assigned probability 0.8, about 80% are actually positive).
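A simple way to check calibration is to bin predictions by confidence and compare each bin's mean predicted probability with its observed positive rate. The sketch below uses tiny made-up numbers purely for illustration.

```python
# Bin predicted probabilities and compare mean confidence with the
# observed frequency of the positive class in each non-empty bin.

def bin_calibration(probs, labels, n_bins=5):
    """Return a list of (mean_confidence, observed_frequency) per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)   # clamp p == 1.0 into last bin
        bins[idx].append((p, y))
    out = []
    for b in bins:
        if b:
            conf = sum(p for p, _ in b) / len(b)
            freq = sum(y for _, y in b) / len(b)
            out.append((conf, freq))
    return out

probs  = [0.9, 0.8, 0.85, 0.1, 0.2]   # toy predicted probabilities
labels = [1,   1,   0,    0,   0]     # toy outcomes
report = bin_calibration(probs, labels)
# The top bin has mean confidence 0.85 but only 2/3 observed positives,
# i.e. the toy model is overconfident there.
```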
Dropout: Randomly zeroing activations during training to reduce co-adaptation and overfitting.
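A minimal sketch of the common "inverted dropout" variant: surviving activations are rescaled at train time so no scaling is needed at test time. The values and seed are arbitrary.

```python
import random

def dropout(activations, p, training, rng):
    """Inverted dropout: zero each unit with probability p during training,
    scaling survivors by 1/(1-p) so the expected activation is unchanged;
    act as the identity at test time."""
    if not training:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
out = dropout([1.0, 1.0, 1.0, 1.0], p=0.5, training=True, rng=rng)
# Every entry of `out` is either 0.0 (dropped) or 2.0 (kept and rescaled).
```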
Vanishing gradients: Gradients shrink as they propagate backward through layers, slowing learning in early layers; mitigated by ReLU activations, residual connections, and normalization.
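The shrinkage is easy to see with sigmoid activations: the sigmoid derivative peaks at 0.25, so even in the best case a 20-layer chain multiplies the backpropagated signal by at most 0.25 per layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # maximized at z = 0, where it equals 0.25

grad = 1.0
for _ in range(20):
    grad *= sigmoid_grad(0.0)   # best-case factor per sigmoid layer
# grad is now 0.25**20 ~ 9.1e-13: almost no signal reaches early layers.
```

ReLU avoids this particular cap because its derivative is exactly 1 on the active side, and residual connections give gradients an additive shortcut path.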
Context window: Maximum number of tokens the model can attend to in one forward pass; constrains long-document reasoning.
Direct Preference Optimization (DPO): A preference-based training method that optimizes the policy directly from pairwise comparisons, without an explicit RL loop.
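A sketch of the per-pair DPO objective, assuming we already have summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model; all numeric inputs below are illustrative.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r)))."""
    margin = beta * ((logp_chosen - ref_chosen)
                     - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

loss = dpo_loss(-10.0, -14.0, -11.0, -13.0)
# Here the policy widens the chosen/rejected gap relative to the reference,
# so the margin is positive and the loss falls below log(2) ~ 0.693.
```

Minimizing this pushes the policy to prefer the chosen response more strongly than the reference model does, with beta controlling how far it may drift from the reference.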
LoRA (Low-Rank Adaptation): A parameter-efficient fine-tuning (PEFT) method that injects trainable low-rank matrices into layers while keeping the original weights frozen.
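A toy LoRA forward pass in pure Python: the frozen weight W is augmented with a trainable low-rank product scaled by alpha/r, and B is zero-initialized so training starts from the base model's behavior. Shapes and values here are arbitrary.

```python
def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, alpha=16.0, r=1):
    # y = W x + (alpha / r) * B (A x); only A and B would receive gradients.
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 base weight
A = [[0.5, -0.5]]              # trainable, shape r x d_in with r = 1
B = [[0.0], [0.0]]             # trainable, zero-initialized (d_out x r)

y = lora_forward([1.0, 2.0], W, A, B)
# With B at its zero init the adapter is a no-op: y == [1.0, 2.0].
```

The update only adds r * (d_in + d_out) trainable parameters per layer, which is why the method is considered parameter-efficient.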
Prompt injection: Attacks that manipulate model instructions (especially via retrieved content) to override system goals or exfiltrate data.
Data poisoning: Maliciously inserting or altering training data to implant backdoors or degrade performance.
Flat minimum: A wide basin in the loss landscape, often correlated with better generalization.
Learning rate warmup: Gradually increasing the learning rate at the start of training to avoid divergence.
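The most common form is a linear ramp followed by a constant (or decaying) rate; a minimal sketch, with `base_lr` and `warmup_steps` as illustrative hyperparameters:

```python
def lr_at(step, base_lr=1e-3, warmup_steps=100):
    """Linear warmup to base_lr over warmup_steps, then constant."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

# The rate climbs from base_lr/100 at step 0 to base_lr at step 99,
# then stays flat; a real schedule would usually decay afterwards.
```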
Hessian: The matrix of second derivatives, describing the local curvature of the loss.
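For intuition, the Hessian can be approximated by central finite differences. The sketch below uses a hand-picked quadratic f(x, y) = x^2 + 3xy + y^2, whose exact Hessian is [[2, 3], [3, 2]].

```python
def hessian(f, x, h=1e-4):
    """Central-difference approximation of the Hessian of scalar f at x."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                p = list(x)
                p[i] += di
                p[j] += dj
                return f(p)
            H[i][j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return H

f = lambda p: p[0] ** 2 + 3 * p[0] * p[1] + p[1] ** 2
H = hessian(f, [0.5, -0.5])
# For this quadratic, H is close to [[2, 3], [3, 2]] everywhere.
```

In deep learning the full Hessian is far too large to form explicitly, so curvature is usually probed through Hessian-vector products instead.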
Weight sharing: Using the same parameters across different parts of a model (e.g., the same convolutional filter applied at every spatial position).
Expressivity: The range of functions a model can represent.
Rotary position embedding (RoPE): Encodes positional information by rotating query and key vectors in embedding space, with the rotation angle proportional to token position.
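A sketch of the idea on a single 2-D pair (real RoPE applies a different rotation frequency to each pair of dimensions; the single `theta` here is a simplification):

```python
import math

def rope_pair(x0, x1, position, theta=1.0):
    """Rotate the 2-D pair (x0, x1) by position * theta radians."""
    angle = position * theta
    c, s = math.cos(angle), math.sin(angle)
    return (x0 * c - x1 * s, x0 * s + x1 * c)

# At position 0 the rotation is the identity, rotation preserves the norm,
# and the dot product between a rotated query and a rotated key depends
# only on their relative offset, which is what makes RoPE useful.
```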
Efficient attention: Techniques to handle longer documents without the quadratic cost of full self-attention (e.g., sparse or linear attention).
Emergent abilities: Capabilities that appear only beyond certain model sizes.
Multi-agent systems: Multiple agents interacting cooperatively or competitively.