Results for "pre-softmax scores"
Converts logits to probabilities by exponentiation and normalization; common in classification and LMs.
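A minimal sketch of this conversion (function name and inputs are illustrative):

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
# The outputs are non-negative, sum to 1, and preserve the ordering of the logits.
```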
Raw model outputs before converting to probabilities; manipulated during decoding and calibration.
Chooses which experts process each token.
A single attention mechanism within multi-head attention.
Attention between different modalities.
Identifying and localizing objects in images, often with confidence scores and bounding boxes.
Probability of treatment assignment given covariates.
Techniques that fine-tune small additional components rather than all weights to reduce compute and storage.
PEFT method injecting trainable low-rank matrices into layers, enabling efficient fine-tuning.
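A toy sketch of the low-rank injection idea, assuming a single frozen linear layer and illustrative sizes (not any specific library's API):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 8, 2          # r << d: the low-rank bottleneck

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-initialized

def lora_forward(x, alpha=16.0):
    # Frozen path plus scaled low-rank update (B @ A); only A and B are trained.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# Because B starts at zero, the adapter is initially a no-op: output equals W @ x.
```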
Assigning category labels to images.
Generating speech audio from text, with control over prosody, speaker identity, and style.
Generating human-like speech from text.
Prompting with a task instruction but no demonstration examples.
Assigning a role or identity to the model.
Robots acquiring skills incrementally through open-ended exploration and staged development.
Mechanism that computes context-aware mixtures of representations; scales well and captures long-range dependencies.
Architecture based on self-attention and feedforward layers; foundation of modern LLMs and many multimodal models.
Attention where queries/keys/values come from the same sequence, enabling token-to-token interactions.
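A compact sketch of this, assuming single-head scaled dot-product attention with illustrative toy weights:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Queries, keys, and values are all projections of the same sequence X.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise token-to-token interactions
    # Row-wise softmax over the scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V               # each output is a mixture of value vectors

X = np.eye(3)                        # toy 3-token sequence, one-hot embeddings
Wq = Wk = Wv = np.eye(3)             # identity projections for illustration
out = self_attention(X, Wq, Wk, Wv)
```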
Stochastic generation strategies that trade determinism for diversity; key knobs include temperature and nucleus sampling.
Scales logits before sampling; higher increases randomness/diversity, lower increases determinism.
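A sketch combining both knobs, assuming a toy vocabulary of logits (helper names are illustrative):

```python
import math
import random

def sample(logits, temperature=1.0, top_p=1.0, rng=random.Random(0)):
    # Temperature scaling: divide logits before the softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Nucleus (top-p): keep the smallest high-probability set whose mass >= top_p.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    kept_probs = [probs[i] / mass for i in kept]  # renormalize over the nucleus
    return rng.choices(kept, weights=kept_probs, k=1)[0]
```

With a low temperature and small top-p, sampling collapses toward the argmax token; raising either widens the set of plausible outputs.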
Studying internal mechanisms or input influence on outputs (e.g., saliency maps, SHAP, attention analysis).
Processes and controls for data quality, access, lineage, retention, and compliance across the AI lifecycle.
Attacks that infer whether specific records were in training data, or reconstruct sensitive training examples.
Prevents attention to future tokens during training/inference.
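A minimal sketch of the masking step (function names are illustrative): future positions get a score of negative infinity, so the subsequent softmax assigns them zero weight.

```python
import numpy as np

def causal_mask(seq_len):
    # True above the diagonal marks positions a query must not attend to.
    return np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)

def masked_scores(scores):
    # Setting future positions to -inf makes their softmax weight exactly zero.
    out = scores.copy()
    out[causal_mask(out.shape[0])] = -np.inf
    return out

m = masked_scores(np.zeros((3, 3)))
# Row i keeps scores only for positions 0..i; positions i+1.. are -inf.
```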
Allows the model to attend to information from different representation subspaces simultaneously.
Attention variants that reduce the standard mechanism's quadratic complexity in sequence length (e.g., sparse or linear attention).
Maximizing a reward signal without fulfilling the intended goal.
AI systems forecasting crime patterns; highly controversial due to bias and civil-liberties concerns.
Models estimating recidivism risk.
Identifying suspicious transactions.