Results for "cross-modal"
Models that process or generate multiple modalities (e.g., text, images, audio), enabling vision-language tasks, speech processing, video understanding, and similar applications.
A robust evaluation technique that trains/evaluates across multiple splits to estimate performance variability.
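A minimal k-fold sketch of this technique in plain Python (the function names here are illustrative, not from any particular library): the data is split into k disjoint folds, and the model is trained on k-1 folds and scored on the held-out fold, once per fold.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k disjoint, roughly equal folds (shuffled)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, k, train_and_score):
    """Train on k-1 folds, score on the held-out fold; return per-fold scores."""
    folds = k_fold_indices(len(xs), k)
    scores = []
    for test_idx in folds:
        train_idx = [j for f in folds if f is not test_idx for j in f]
        train = ([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        test = ([xs[j] for j in test_idx], [ys[j] for j in test_idx])
        scores.append(train_and_score(train, test))
    return scores
```

The spread of the per-fold scores (e.g., their standard deviation) is the performance-variability estimate the definition refers to.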
Measures divergence between true and predicted probability distributions.
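For discrete distributions this divergence (in its Kullback-Leibler form) can be computed directly; a small sketch:

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i).
    Assumes q_i > 0 wherever p_i > 0; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

It is zero when the two distributions are identical and strictly positive otherwise, and it is not symmetric in p and q.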
Attention between different modalities.
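A bare-bones scaled dot-product sketch of this idea in plain Python, assuming queries come from one modality and keys/values from another (the vectors and dimensions below are illustrative):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(queries, keys, values):
    """Each query (from modality A) attends over keys/values (from modality B).
    queries: list of d-dim vectors; keys and values: parallel lists of vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

Real implementations batch this as matrix multiplications and add learned projection matrices, but the information flow between the two modalities is the same.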
A scalar measure optimized during training, typically expected loss over data, sometimes with regularization terms.
A function measuring prediction error (and sometimes calibration), guiding gradient-based optimization.
Minimizing average loss on training data; can overfit when data is limited or biased.
When a model fits noise/idiosyncrasies of training data and performs poorly on unseen data.
How well a model performs on new data drawn from the same (or similar) distribution as training.
When information from evaluation data improperly influences training, inflating reported performance.
Penalizes confident wrong predictions heavily; standard for classification and language modeling.
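A minimal sketch of this loss for a single example, given predicted class probabilities and the true class index (the `eps` clamp is a common numerical-stability convention, not part of the definition):

```python
import math

def cross_entropy(probs, target_index, eps=1e-12):
    """Negative log-probability assigned to the true class.
    Small when the model is confidently right; very large when it is
    confidently wrong (probability near 0 on the true class)."""
    return -math.log(max(probs[target_index], eps))
```

A confident wrong prediction (e.g., 0.1 on the true class) costs far more than a confident right one, which is exactly the asymmetry the definition describes.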
Halting training when validation performance stops improving to reduce overfitting.
Training objective where the model predicts the next token given previous tokens (causal modeling).
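The training pairs this objective induces can be sketched directly: each position's input is the preceding context and its label is the current token (a toy illustration, not any particular library's API):

```python
def next_token_pairs(tokens):
    """For causal language modeling: at each position the model sees all
    previous tokens as input and is trained to predict the current one."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
```

For the sequence ["a", "b", "c"] this yields (["a"], "b") and (["a", "b"], "c"); in practice all positions of a sequence are trained in parallel with a causal attention mask.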
Predicts masked tokens in a sequence, enabling bidirectional context; often used for embeddings rather than generation.
Assigning labels per pixel (semantic) or per instance (instance segmentation) to map object boundaries.
Model that compresses input into latent space and reconstructs it.
Assigning category labels to images.
Pixel-wise classification of image regions.
The end-to-end sequence of steps for producing a trained model, typically spanning data preparation, training, and evaluation.
A centralized team that pools AI expertise and supports machine-learning work across an organization.
Learning policies from expert demonstrations.
Learning a direct mapping from observations (or states) to actions from expert demonstrations, typically via supervised learning.
A conceptual framework describing error as the sum of systematic error (bias) and sensitivity to data (variance).
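This decomposition can be estimated empirically by Monte Carlo: repeatedly draw noisy training sets, fit the learner, and look at its predictions at a fixed point. A sketch under illustrative assumptions (the function names, the uniform input distribution, and the deliberately rigid mean-predicting learner are all choices made for the example):

```python
import random
import statistics

def bias_variance_at(true_fn, fit, x0, n_datasets=200, n_points=20, noise=0.5, seed=0):
    """Monte Carlo estimate of squared bias and variance of a learner's
    prediction at x0: draw many noisy training sets, fit, and predict."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_datasets):
        xs = [rng.uniform(-1, 1) for _ in range(n_points)]
        ys = [true_fn(x) + rng.gauss(0, noise) for x in xs]
        preds.append(fit(xs, ys)(x0))
    mean_pred = statistics.fmean(preds)
    bias_sq = (mean_pred - true_fn(x0)) ** 2
    return bias_sq, statistics.pvariance(preds)

# A deliberately inflexible learner: ignore x, always predict the mean of y.
def fit_constant(xs, ys):
    mean = statistics.fmean(ys)
    return lambda x: mean
```

For true_fn(x) = x at x0 = 0.8, the constant learner shows large squared bias (it systematically predicts near 0) but small variance, the classic high-bias/low-variance corner of the tradeoff.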