Results for "demonstration-based"
RL using learned or known environment models.
Exact-likelihood generative models using invertible transforms.
Combines value estimation (critic) with policy learning (actor).
Learns the score (∇ log p(x)) for generative sampling.
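A score model can be used for sampling via unadjusted Langevin dynamics: repeatedly step along the score and add Gaussian noise. The sketch below is illustrative, not a library API; `langevin_sample` is a hypothetical name, and the score function is hand-supplied rather than learned:

```python
import numpy as np

def langevin_sample(score, x0, step=1e-2, n=1000, rng=None):
    """Unadjusted Langevin dynamics: follow the score plus Gaussian noise.

    score: callable returning the (assumed known) score at x.
    """
    if rng is None:
        rng = np.random.default_rng()
    x = np.array(x0, dtype=float)
    for _ in range(n):
        # Gradient step toward high density, plus exploration noise.
        x = x + step * score(x) + np.sqrt(2 * step) * rng.normal(size=x.shape)
    return x
```

With the standard-normal score `score(x) = -x`, the chain mixes toward samples with mean near 0 and standard deviation near 1.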
Simple agent responding directly to inputs.
Dynamic resource allocation.
Continuous loop adjusting actions based on state feedback.
Algorithm computing control actions.
Architecture based on self-attention and feedforward layers; foundation of modern LLMs and many multimodal models.
Achieving task performance by providing a small number of examples inside the prompt without weight updates.
Retrieval based on embedding similarity rather than keyword overlap, capturing paraphrases and related concepts.
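The core of embedding-based retrieval is a cosine-similarity ranking over precomputed vectors. A minimal sketch, assuming embeddings already exist as NumPy arrays (`cosine_top_k` is a hypothetical helper name):

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=3):
    """Rank documents by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                        # cosine similarity per document
    top = np.argsort(scores)[::-1][:k]    # indices of the k best matches
    return top, scores[top]
```

In practice the embeddings come from a sentence-encoder model, and large corpora use an approximate-nearest-neighbor index instead of a brute-force dot product.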
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
A preference-based training method optimizing policies directly from pairwise comparisons without explicit RL loops.
Feature attribution method grounded in cooperative game theory for explaining predictions in tabular settings.
Removing weights or neurons to shrink models and improve efficiency; can be structured or unstructured.
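The simplest unstructured variant is magnitude pruning: zero out the fraction of weights with the smallest absolute value. A minimal sketch (`magnitude_prune` is an illustrative name, not a framework function):

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Unstructured magnitude pruning: zero the smallest-|w| fraction."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)         # number of weights to drop
    if k == 0:
        return weights.copy()
    thresh = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > thresh       # keep only weights above threshold
    return weights * mask
```

Structured pruning instead removes whole neurons, channels, or heads, which maps better onto dense hardware kernels.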
Samples from the smallest set of tokens whose cumulative probability reaches at least p, adapting the candidate-set size to the context.
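Nucleus (top-p) sampling can be sketched directly from that definition: sort by probability, keep the shortest prefix whose mass reaches p, renormalize, and sample. Function and argument names below are illustrative:

```python
import numpy as np

def top_p_sample(probs, p=0.9, rng=None):
    """Nucleus sampling over a probability vector."""
    if rng is None:
        rng = np.random.default_rng()
    order = np.argsort(probs)[::-1]          # tokens by descending probability
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1     # smallest prefix with mass >= p
    keep = order[:cutoff]
    kept = probs[keep] / probs[keep].sum()   # renormalize the nucleus
    return keep[rng.choice(len(keep), p=kept)]
```

Unlike top-k, the nucleus shrinks when the distribution is peaked and grows when it is flat.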
Identifying and localizing objects in images, often with confidence scores and bounding rectangles.
Continuous cycle of observation, reasoning, action, and feedback.
Generating speech audio from text, with control over prosody, speaker identity, and style.
Separates planning from execution in agent architectures.
Chooses which experts process each token.
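A common gating rule is top-k routing: each token goes to the k experts with the highest router logits, weighted by a softmax over those logits. A minimal sketch under that assumption (`route_tokens` is a hypothetical name):

```python
import numpy as np

def route_tokens(logits, k=2):
    """Top-k gating for a mixture-of-experts router.

    logits: (tokens, experts) router scores.
    Returns chosen expert indices and softmax mixing weights.
    """
    topk = np.argsort(logits, axis=-1)[:, ::-1][:, :k]   # best k experts
    sel = np.take_along_axis(logits, topk, axis=-1)
    w = np.exp(sel - sel.max(axis=-1, keepdims=True))    # stable softmax
    w /= w.sum(axis=-1, keepdims=True)
    return topk, w
```

Production routers add load-balancing losses and capacity limits so experts receive roughly equal traffic.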
Detecting unauthorized model outputs or data leaks.
Models that define an energy landscape rather than explicit probabilities.
Probabilistic energy-based neural network with hidden variables.
Simultaneous Localization and Mapping for robotics.
Monte Carlo method for state estimation.
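One step of the bootstrap particle filter illustrates the idea: propagate particles through the motion model, reweight by the observation likelihood, then resample to fight weight degeneracy. The function and its arguments are illustrative, assuming user-supplied transition and likelihood models:

```python
import numpy as np

def particle_filter_step(particles, weights, observation,
                         transition, likelihood, rng):
    """One bootstrap-filter step: predict, update, resample."""
    particles = transition(particles, rng)                   # predict
    weights = weights * likelihood(observation, particles)   # update
    weights /= weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    # Multinomial resampling: reset to uniform weights afterward.
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```

After a single informative observation, the particle cloud collapses around the observed state.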
Distributed agents producing emergent intelligence.
Flat, high-dimensional regions of the loss landscape that slow training.
Methods such as Adam that adapt per-parameter learning rates during training.
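A single Adam update makes the mechanism concrete: exponential moving averages of the gradient (m) and squared gradient (v), bias-corrected, yield a per-parameter step size. A minimal sketch of the published update rule:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update (t is the 1-based step count)."""
    m = b1 * m + (1 - b1) * grad           # first-moment EMA
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment EMA
    m_hat = m / (1 - b1 ** t)              # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Because the step is divided by the root of the squared-gradient average, parameters with consistently large gradients take smaller effective steps.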
Classifying models by impact level.