Results for "contrastive vision-language"
Generates sequences one token at a time, conditioning on past tokens.
A datastore optimized for similarity search over embeddings, enabling semantic retrieval at scale.
Retrieval based on embedding similarity rather than keyword overlap, capturing paraphrases and related concepts.
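
The embedding-similarity retrieval described in the two entries above can be sketched as follows. This is a minimal illustration, not a real vector database: the `store` dictionary, the hand-written 2-D vectors, and the function names `cosine` and `search` are all assumptions made for the example. A real system would get its vectors from an embedding model and use an approximate nearest-neighbor index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, store, top_n=2):
    """Return the ids of the top_n stored vectors most similar to query_vec."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:top_n]]

# Hypothetical toy embeddings: "cat" and "kitten" point in similar directions.
store = {
    "cat": [1.0, 0.1],
    "kitten": [0.9, 0.2],
    "car": [0.0, 1.0],
}
nearest = search([1.0, 0.0], store, top_n=2)
```

Because ranking depends on vector direction rather than shared words, "kitten" ranks close to "cat" even though the strings have little keyword overlap.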
Search algorithm for generation that keeps top-k partial sequences; can improve likelihood but reduce diversity.
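
A small sketch of the top-k partial-sequence search described above. The `toy_model` table and its probabilities are hypothetical, chosen so that a width-1 (greedy) search misses the most likely full sequence; `next_log_probs` stands in for whatever a real model would return.

```python
import math

def beam_search(next_log_probs, start, beam_width, steps):
    """Keep the beam_width highest-scoring partial sequences at each step."""
    beams = [(0.0, [start])]  # (cumulative log-probability, token sequence)
    for _ in range(steps):
        candidates = []
        for score, seq in beams:
            for token, log_p in next_log_probs(seq).items():
                candidates.append((score + log_p, seq + [token]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams

# Hypothetical next-token distributions keyed by the last token.
TABLE = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"x": 0.9, "y": 0.1},
}

def toy_model(seq):
    """Return log-probabilities for the next token given the sequence so far."""
    return {tok: math.log(p) for tok, p in TABLE[seq[-1]].items()}

greedy = beam_search(toy_model, "<s>", beam_width=1, steps=2)
wide = beam_search(toy_model, "<s>", beam_width=2, steps=2)
```

With a width of 1 the search commits to "a" (probability 0.6) and ends with a sequence of probability 0.3. With a width of 2 it keeps "b" alive and finds "b x" at 0.36.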
Scales logits before sampling; higher increases randomness/diversity, lower increases determinism.
Samples from the k highest-probability tokens to limit unlikely outputs.
Samples from the smallest set of tokens whose probabilities sum to p, adapting set size by context.
Converts logits to probabilities by exponentiation and normalization; common in classification and LMs.
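
The four entries above (temperature scaling, top-k sampling, nucleus sampling, and softmax) fit together in a few lines. The following is a minimal sketch under simple assumptions: logits are a plain list of floats indexed by token id, and the function names are chosen for this example rather than taken from any library.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Divide logits by temperature, then exponentiate and normalize."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most probable token ids and renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    """Keep the smallest high-probability set whose mass reaches p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}

def sample(dist, rng):
    """Draw one token id from a {token_id: probability} mapping."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]

logits = [2.0, 1.0, 0.0]
sharp = softmax(logits, temperature=0.5)  # lower temperature: more peaked
flat = softmax(logits, temperature=2.0)   # higher temperature: more uniform
token = sample(top_k_filter(softmax(logits), k=2), random.Random(0))
```

Note that the top-p set grows or shrinks with the shape of the distribution, while the top-k set always has exactly k members.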
A dataset + metric suite for comparing models; can be gamed or misaligned with real-world goals.
Attacks that manipulate model instructions (especially via retrieved content) to override system goals or exfiltrate data.
Methods for breaking goals into steps; can be classical (A*, STRIPS) or LLM-driven with tool calls.
Prevents attention to future tokens during training/inference.
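
One common way to realize the masking described above is to set the scores for future positions to negative infinity before the softmax, so their attention weights become exactly zero. The sketch below assumes a square matrix of raw attention scores for a single head, given as nested lists; the function names are illustrative.

```python
import math

def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_attention_weights(scores):
    """Row-wise softmax over scores with future positions masked out."""
    n = len(scores)
    mask = causal_mask(n)
    weights = []
    for i in range(n):
        row = [scores[i][j] if mask[i][j] else float("-inf") for j in range(n)]
        m = max(row)  # finite, because the diagonal is always allowed
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# With all-zero scores, each position spreads its weight evenly over the past.
weights = masked_attention_weights([[0.0] * 3 for _ in range(3)])
```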
Generating speech audio from text, with control over prosody, speaker identity, and style.
Allows the model to attend to information from different representation subspaces simultaneously.
Attention mechanisms that reduce the quadratic cost of full attention in sequence length.
Routes inputs to subsets of parameters for scalable capacity.
Capabilities that appear only beyond certain model sizes.
Structured graph encoding facts as entity–relation–entity triples.
Generating human-like speech from text.
Maps audio signals to linguistic units.
Detects trigger phrases in audio streams.
Identifying speakers in audio.
Agent calls external tools dynamically.
Scaling law describing how to balance model size and training data for a given compute budget.
A single example included in the prompt to guide output.
Multiple examples included in the prompt to guide output.
Sampling multiple outputs and selecting the consensus answer.
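
The sample-and-vote idea above reduces to a majority vote once the final answers have been extracted. In this sketch the list of answers is hard-coded and hypothetical; in practice each entry would come from a separate sampled generation.

```python
from collections import Counter

def self_consistent_answer(answers):
    """Return the most frequent answer and the fraction of samples that agree."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

# Hypothetical final answers extracted from five sampled outputs.
sampled = ["42", "41", "42", "42", "17"]
answer, agreement = self_consistent_answer(sampled)
```

The agreement fraction can also serve as a rough confidence signal, under the assumption that the samples are drawn independently.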
Breaking tasks into sub-steps.
Temporary reasoning space (often hidden).
Prompt augmented with retrieved documents.