Results for "query-key-value"
Attention between different modalities.
Expected cumulative reward from a state or state-action pair.
A single attention mechanism within multi-head attention.
Stores past attention states to speed up autoregressive decoding.
Fundamental recursive relationship defining optimal value functions.
Retrieval based on embedding similarity rather than keyword overlap, capturing paraphrases and related concepts.
Selecting the most informative samples to label (e.g., uncertainty sampling) to reduce labeling cost.
Reconstructing a model or its capabilities via API queries or leaked artifacts.
Combines value estimation (critic) with policy learning (actor).
Maximum loss not expected to be exceeded at a given confidence level under normal conditions.
Expected return of taking a given action in a given state and following the policy thereafter.
Sample mean converges to the expected value as the number of samples grows.
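A minimal sketch of the entry above, using fair-die rolls (the die, seed, and sample sizes are illustrative assumptions, not from the source): the sample mean approaches the true expected value of 3.5 as the sample grows.

```python
import random

random.seed(0)  # illustrative seed for reproducibility

def sample_mean(n):
    # Average of n fair six-sided die rolls; true expectation is 3.5.
    return sum(random.randint(1, 6) for _ in range(n)) / n

small = sample_mean(100)
large = sample_mean(100_000)
# With high probability, `large` lies much closer to 3.5 than `small`.
```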
Optimizing policies directly via gradient ascent on expected reward.
Approximating expectations via random sampling.
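A minimal sketch of the entry above, estimating E[X²] for X ~ Uniform(0, 1) by averaging random samples; the true value is 1/3. Function names and sample counts are illustrative assumptions.

```python
import random

random.seed(1)  # illustrative seed

def mc_estimate(f, n=200_000):
    # Approximate E[f(X)] for X ~ Uniform(0, 1) by a sample average.
    return sum(f(random.random()) for _ in range(n)) / n

estimate = mc_estimate(lambda x: x * x)  # should land close to 1/3
```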
Model optimizes objectives misaligned with human values.
Optimizing control policies directly, rather than deriving them from a learned model or value function.
Inferring and aligning with human preferences.
Average value under a distribution.
Multiple agents coordinate to optimize collective outcomes.
Average of squared residuals; common regression objective.
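A minimal sketch of the entry above; the sample targets and predictions are illustrative assumptions.

```python
def mse(y_true, y_pred):
    # Mean of squared residuals between targets and predictions.
    residuals = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(residuals) / len(residuals)

# Residuals 0.5, 0.0, -1.0 give (0.25 + 0 + 1) / 3.
error = mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
```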
Iterative method that updates parameters in the direction of negative gradient to minimize loss.
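A minimal sketch of the entry above, minimizing f(w) = (w − 3)² with gradient f′(w) = 2(w − 3); the learning rate and step count are illustrative assumptions.

```python
def gradient_descent(w=0.0, lr=0.1, steps=100):
    # Repeatedly step opposite the gradient to shrink the loss.
    for _ in range(steps):
        grad = 2 * (w - 3)  # derivative of (w - 3)^2
        w -= lr * grad
    return w

# Converges toward the minimizer w = 3 from any starting point.
```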
Feature attribution method grounded in cooperative game theory for explaining predictions in tabular settings.
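A minimal sketch of the game-theoretic quantity behind the entry above: exact Shapley values for a toy three-player cooperative game, computed by brute force over player orderings. The players, weights, and value function are illustrative assumptions; real SHAP implementations approximate this for model features.

```python
from itertools import permutations

def shapley_values(players, v):
    # Average each player's marginal contribution over all join orders.
    contrib = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = set()
        for p in order:
            before = v(frozenset(coalition))
            coalition.add(p)
            contrib[p] += v(frozenset(coalition)) - before
    return {p: c / len(perms) for p, c in contrib.items()}

# For an additive game, each player's Shapley value is its own weight.
weights = {"a": 1.0, "b": 2.0, "c": 3.0}
phi = shapley_values(list(weights), lambda s: sum(weights[p] for p in s))
```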
Limiting gradient magnitude to prevent exploding gradients.
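A minimal sketch of the entry above: global-norm clipping, where a gradient whose norm exceeds a threshold is rescaled to exactly that threshold. The threshold value and sample gradient are illustrative assumptions.

```python
import math

def clip_by_norm(grads, max_norm=1.0):
    # Rescale the whole gradient vector if its L2 norm exceeds max_norm.
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grads]
    return grads

clipped = clip_by_norm([3.0, 4.0])  # norm 5 rescaled down to 1
```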
Models that define an energy landscape rather than explicit probabilities.
Decomposes a matrix into orthogonal components; used in embeddings and compression.
Describes likelihoods of random variable outcomes.
Variable whose values depend on chance.
RL without explicit dynamics model.
Stochastic generation strategies that trade determinism for diversity; key knobs include temperature and nucleus sampling.
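A minimal sketch of the temperature knob from the entry above: dividing logits by a temperature before the softmax sharpens (T < 1) or flattens (T > 1) the sampling distribution. The logits shown are illustrative assumptions; nucleus (top-p) sampling would further restrict sampling to the smallest token set whose cumulative probability exceeds p.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Scale logits by 1/T, then apply a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

base = softmax_with_temperature([2.0, 1.0, 0.0])        # T = 1
sharp = softmax_with_temperature([2.0, 1.0, 0.0], 0.5)  # lower T, peakier
```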
A learning paradigm where an agent interacts with an environment and learns to choose actions to maximize cumulative reward.