Specificity (true negative rate): the fraction of actual negatives correctly identified as negative.
Precision-recall curve: often more informative than ROC on imbalanced datasets because it focuses on positive-class performance.
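The two metrics above can be sketched directly from confusion-matrix counts; a minimal example, with toy labels chosen for illustration:

```python
# Computing specificity and precision/recall from raw confusion counts.
# The labels below are illustrative toy data.

def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)

specificity = tn / (tn + fp)  # fraction of actual negatives correctly rejected
precision = tp / (tp + fp)    # fraction of predicted positives that are correct
recall = tp / (tp + fn)       # fraction of actual positives that are found
```

Precision and recall together trace out the precision-recall curve as the decision threshold varies.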
Batch size: number of samples per gradient update; affects compute efficiency, generalization, and training stability.
Epoch: one complete pass over the training dataset during training.
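The batch/epoch relationship can be sketched with a toy loop (the dataset and the "update" are stand-ins, not a real training setup):

```python
# Toy sketch of epochs vs. batches: one epoch = one full pass over the data,
# one gradient update per batch. Dataset and counters are illustrative.
import random

dataset = list(range(10))   # 10 toy samples
BATCH_SIZE = 4
NUM_EPOCHS = 3

updates = 0
for epoch in range(NUM_EPOCHS):            # one full pass per epoch
    random.shuffle(dataset)                # reshuffle each epoch
    for i in range(0, len(dataset), BATCH_SIZE):
        batch = dataset[i:i + BATCH_SIZE]  # up to BATCH_SIZE samples
        updates += 1                       # one gradient update per batch

# 10 samples / batch size 4 -> 3 batches per epoch (last batch holds 2 samples)
```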
ReLU: the activation max(0, x); improves gradient flow and training speed in deep nets.
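A one-line NumPy sketch of the max(0, x) definition:

```python
# ReLU applied element-wise: zero out negatives, pass positives through.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
y = relu(x)   # -> [0.0, 0.0, 0.0, 1.5]
```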
Self-attention: attention where queries, keys, and values all come from the same sequence, enabling token-to-token interactions.
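A minimal single-head scaled dot-product self-attention sketch in NumPy; the shapes and the random projection matrices are illustrative assumptions, not a real trained model:

```python
# Single-head self-attention: Q, K, V are all projections of the same X.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))          # one toy input sequence

# Learned projections (random stand-ins here)
Wq = rng.normal(size=(d_model, d_model))
Wk = rng.normal(size=(d_model, d_model))
Wv = rng.normal(size=(d_model, d_model))

Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(d_model)              # token-to-token similarities
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
out = weights @ V                                # each token mixes all tokens
```

Multi-head attention runs several such heads in parallel on lower-dimensional projections and concatenates their outputs.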
Tokenization: converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size against coverage.
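The core loop of a subword tokenizer such as BPE can be sketched as repeated merging of the most frequent adjacent symbol pair; this toy version (with a made-up four-word corpus) omits everything a real tokenizer needs:

```python
# Toy byte-pair-encoding sketch: repeatedly merge the most frequent adjacent
# symbol pair, starting from characters. Corpus is illustrative.
from collections import Counter

corpus = ["low", "lower", "lowest", "low"]
words = [list(w) for w in corpus]        # start from character sequences

def most_frequent_pair(words):
    pairs = Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0] if pairs else None

def merge(words, pair):
    a, b = pair
    out = []
    for w in words:
        merged, i = [], 0
        while i < len(w):
            if i + 1 < len(w) and (w[i], w[i + 1]) == pair:
                merged.append(a + b)     # fuse the pair into one symbol
                i += 2
            else:
                merged.append(w[i])
                i += 1
        out.append(merged)
    return out

for _ in range(2):                       # two merge steps
    words = merge(words, most_frequent_pair(words))
# the frequent stem "low" becomes a single token; rarer suffixes stay split
```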
Context window: the maximum number of tokens the model can attend to in one forward pass; constrains long-document reasoning.
Chunking: breaking documents into pieces for retrieval; chunk size and overlap strongly affect RAG quality.
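A sliding-window chunker is a minimal sketch of the idea, counting characters for simplicity (real RAG pipelines usually count tokens):

```python
# Fixed-size chunking with overlap: consecutive chunks share `overlap`
# characters so that content near a boundary appears in both chunks.

def chunk_text(text, chunk_size, overlap):
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(doc, chunk_size=10, overlap=3)
# step = 7, so chunks start at positions 0, 7, 14, 21
```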
Grounding: constraining outputs to retrieved or provided sources, often with citations, to improve factual reliability.
Supervised fine-tuning (instruction tuning): fine-tuning on (prompt, response) pairs to align a model with instruction-following behaviors.
Reward model: a model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.
Content moderation (guardrails): automated detection and prevention of disallowed outputs (toxicity, self-harm, illegal instructions, etc.).
Data lineage (provenance): tracking where data came from and how it was transformed; key for debugging and compliance.
Residual (skip) connections: allow gradients to bypass layers, enabling very deep networks.
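The bypass is just an addition, output = x + f(x); a minimal sketch with a toy transformation:

```python
# Residual block sketch: the identity path x is added back to the layer
# output, giving gradients a direct route past the transformation.
import numpy as np

def layer(x, W):
    return np.maximum(0.0, x @ W)      # toy transformation (linear + ReLU)

def residual_block(x, W):
    return x + layer(x, W)             # skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))
W = rng.normal(size=(4, 4))
y = residual_block(x, W)

# If the layer output is zero, the block reduces to the identity:
assert np.allclose(residual_block(x, np.zeros((4, 4))), x)
```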
Learning theory: a theoretical framework analyzing which classes of functions can be learned, how efficiently, and with what guarantees.
Parameter sharing: using the same parameters across different parts of a model (e.g., convolutional filters reused across positions).
Inductive bias: built-in assumptions that guide learning, shaping sample efficiency and generalization.
Rotary position embedding (RoPE): encodes positional information via rotation in embedding space.
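A NumPy sketch of the rotation: each pair of embedding dimensions is rotated by an angle that grows with the token position. The frequency schedule follows the common 10000^(-i/half) form and the half-split pairing is one of several layouts in use; dimensions are illustrative:

```python
# Rotary position embedding sketch: rotate dimension pairs by
# position-dependent angles. Position 0 leaves the vector unchanged.
import numpy as np

def rope(x, position, base=10000.0):
    d = x.shape[-1]                       # embedding size, must be even
    half = d // 2
    freqs = base ** (-np.arange(half) / half)
    theta = position * freqs              # one angle per dimension pair
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

q = np.array([1.0, 0.0, 0.0, 1.0])
q_rot = rope(q, position=3)

# Rotations preserve the vector's norm
assert np.isclose(np.linalg.norm(q_rot), np.linalg.norm(q))
```

Because the rotation acts on queries and keys, the dot product between two rotated vectors depends only on their relative position, which is the property RoPE is designed for.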
Attention head: a single attention mechanism within multi-head attention.
Action space: the set of all actions available to the agent.
Router (gating network): chooses which experts process each token in a mixture-of-experts layer.
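A top-k routing sketch: a small gating projection scores the experts for each token and the top-k scores are softmax-normalized into mixture weights. Shapes and the gating matrix are illustrative assumptions:

```python
# Top-k mixture-of-experts routing: score experts per token, keep the best k,
# and normalize their scores into mixture weights.
import numpy as np

def top_k_route(x, Wg, k=2):
    logits = x @ Wg                              # (tokens, num_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]   # indices of the k best experts
    gate_logits = np.take_along_axis(logits, topk, axis=-1)
    e = np.exp(gate_logits - gate_logits.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)  # softmax over chosen experts
    return topk, weights

rng = np.random.default_rng(0)
tokens, d_model, num_experts = 4, 8, 6
x = rng.normal(size=(tokens, d_model))
Wg = rng.normal(size=(d_model, num_experts))     # toy gating projection
experts, weights = top_k_route(x, Wg, k=2)
```

Each token's output would then be the weighted sum of its chosen experts' outputs; real systems add load-balancing losses so tokens spread across experts.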
Graph neural networks (GNNs): neural networks that operate on graph-structured data by propagating information along edges.
Graph convolution: extension of convolution to graph domains using the adjacency structure.
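One propagation step in the widely used GCN form can be sketched in a few lines; the tiny graph, identity features, and identity weights are chosen purely for clarity:

```python
# One GCN-style propagation step: symmetrically normalize the adjacency
# (with self-loops), average neighbor features, then transform linearly.
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)        # toy 3-node path graph
X = np.eye(3)                                 # one-hot node features
W = np.eye(3)                                 # identity transform for clarity

A_hat = A + np.eye(3)                         # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
H = D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W   # propagate one hop
```

Stacking such steps (with nonlinearities and learned W) lets information flow across multi-hop neighborhoods.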
Heterogeneous graphs: graphs containing multiple node or edge types with different semantics.
Graph attention network (GAT): a GNN that uses attention to weight neighbor contributions dynamically.
Energy-based models: models that define an energy landscape rather than explicit probabilities; lower energy corresponds to more plausible configurations.
Mode collapse: in GANs, the generator produces only a limited variety of outputs, covering few modes of the data distribution.
Contrastive vision-language pretraining (e.g., CLIP): a joint vision-language model aligning images and text in a shared embedding space.
Acoustic model: maps audio signals to linguistic units (e.g., phonemes) in speech recognition.