Results for "data → model"
The degree to which predicted probabilities match true frequencies (e.g., 0.8 means ~80% correct).
Plots true positive rate vs false positive rate across thresholds; summarizes separability.
Iterative method that updates parameters in the direction of negative gradient to minimize loss.
Controls the size of parameter updates; too high diverges, too low trains slowly or gets stuck.
Injects sequence order into Transformers, since attention alone is permutation-invariant.
Training objective where the model predicts the next token given previous tokens (causal modeling).
Techniques to understand model decisions (global or local), important in high-stakes and regulated settings.
PEFT method injecting trainable low-rank matrices into layers, enabling efficient fine-tuning.
Reducing numeric precision of weights/activations to speed inference and reduce memory with acceptable accuracy loss.
Removing weights or neurons to shrink models and improve efficiency; can be structured or unstructured.
Raw model outputs before converting to probabilities; manipulated during decoding and calibration.
Stress-testing models for failures, vulnerabilities, policy violations, and harmful behaviors before release.
Measures divergence between true and predicted probability distributions.
Quantifies shared information between random variables.
Measures how much information an observable random variable carries about unknown parameters.
Gradually increasing learning rate at training start to avoid divergence.
Prevents attention to future tokens during training/inference.
Stores past attention states to speed up autoregressive decoding.
Encodes positional information via rotation in embedding space.
Legal or policy requirement to explain AI decisions.
Graphical model expressing factorization of a probability distribution.
GNN using attention to weight neighbor contributions dynamically.
Assigning category labels to images.
Aligns transcripts with audio timestamps.
Explicit output constraints (format, tone).
Mechanics of price formation.
Groups adopting extreme positions.
Emergence of conventions among agents.
Using markers to isolate context segments.
Agents copy others’ actions.