Model-Based RL
Advanced
RL using learned or known environment models.
Model-based reinforcement learning is like having a map while exploring a new city. Instead of wandering around aimlessly, you can look at the map to plan your route and make better decisions about where to go next. In this type of learning, an AI agent first learns how the environment works—like...
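The map-and-plan idea above can be sketched in a few lines. The toy corridor, rewards, and the choice of value iteration as the planner below are illustrative assumptions, not a fixed recipe; the point is that a known transition model lets the agent plan rather than learn purely by trial and error.

```python
import numpy as np

# Toy 1-D corridor: states 0..4, actions move left (-1) or right (+1),
# reward 1 for reaching state 4. The transition model is known, so the
# agent can plan with value iteration instead of blind exploration.
n_states, gamma = 5, 0.9

def step_model(s, a):
    """Known dynamics: returns (next_state, reward)."""
    s_next = min(max(s + a, 0), n_states - 1)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward

V = np.zeros(n_states)
for _ in range(100):  # value iteration sweeps using the model
    V = np.array([max(step_model(s, a)[1] + gamma * V[step_model(s, a)[0]]
                      for a in (-1, 1))
                  for s in range(n_states)])

# Greedy policy derived from the planned values
policy = [max((-1, 1),
              key=lambda a: step_model(s, a)[1] + gamma * V[step_model(s, a)[0]])
          for s in range(n_states)]
print(policy)  # → [1, 1, 1, 1, 1]: always move right, toward the reward
```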
Exact likelihood generative models using invertible transforms.
Learns the score (∇ log p(x)) for generative sampling.
Combines value estimation (critic) with policy learning (actor).
Simple agent responding directly to inputs.
Dynamic resource allocation.
Continuous loop adjusting actions based on state feedback.
Algorithm computing control actions.
Predicts next state given current state and action.
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
Classifying models by impact level.
Internal representation of the agent itself.
A preference-based training method optimizing policies directly from pairwise comparisons without explicit RL loops.
Feature attribution method grounded in cooperative game theory for explaining predictions in tabular settings.
Detecting unauthorized model outputs or data leaks.
Models that define an energy landscape rather than explicit probabilities.
Removing weights or neurons to shrink models and improve efficiency; can be structured or unstructured.
Probabilistic energy-based neural network with hidden variables.
Acting to minimize surprise or free energy.
Achieving task performance by providing a small number of examples inside the prompt without weight updates.
Architecture based on self-attention and feedforward layers; foundation of modern LLMs and many multimodal models.
Monte Carlo method for state estimation.
Generating speech audio from text, with control over prosody, speaker identity, and style.
Software simulating physical laws.
Chooses which experts process each token.
Deep learning system for protein structure prediction.
Methods like Adam adjusting learning rates dynamically.
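The Adam update mentioned above fits in a few lines: exponential moving averages of the gradient and its square, bias correction, then a scaled step. This is a minimal illustrative sketch (the hyperparameter defaults follow the original paper's suggestions), not a replacement for a library optimizer.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: moving averages of grad (m) and grad^2 (v),
    bias-corrected, then a step scaled by m_hat / (sqrt(v_hat) + eps)."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)   # bias correction matters early, when
    v_hat = v / (1 - b2 ** t)   # the averages are still near zero
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2 starting from x = 1.0; gradient is 2x
theta, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(theta)  # close to 0
```

Note how the effective step size is roughly `lr` regardless of gradient magnitude, which is the "adaptive" part: small, noisy gradients are not starved, and huge gradients do not blow up the update.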
Retrieval based on embedding similarity rather than keyword overlap, capturing paraphrases and related concepts.
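A toy sketch of that idea: rank documents by cosine similarity of dense vectors. The vectors below are hand-made stand-ins for what an embedding model would produce, so a paraphrase with zero keyword overlap can still rank highly.

```python
import numpy as np

# Hand-made "embeddings" (assumption: a real system would get these from
# an embedding model). Similar meanings get nearby vectors.
docs = {
    "a cat sat on the mat":        np.array([0.9, 0.1, 0.0]),
    "felines enjoy soft surfaces": np.array([0.8, 0.2, 0.1]),  # paraphrase, no shared keywords
    "stock prices fell sharply":   np.array([0.0, 0.1, 0.9]),
}
query_vec = np.array([0.85, 0.15, 0.05])  # pretend embedding of "kitty on a rug"

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
print(ranked)  # both cat sentences outrank the finance one
```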
Samples from the smallest set of tokens whose probabilities sum to p, adapting set size by context.
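That truncation rule can be sketched directly with NumPy (toy probabilities; not tied to any particular library's API): sort tokens by probability, keep the smallest prefix whose mass reaches p, renormalize, and sample.

```python
import numpy as np

def top_p_sample(probs, p=0.9, rng=np.random.default_rng(0)):
    """Nucleus (top-p) sampling over a token probability vector."""
    order = np.argsort(probs)[::-1]            # tokens by descending probability
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, p)) + 1  # smallest prefix with mass >= p
    keep = order[:cutoff]
    kept = probs[keep] / probs[keep].sum()     # renormalize within the nucleus
    return int(rng.choice(keep, p=kept))

probs = np.array([0.5, 0.3, 0.15, 0.04, 0.01])
token = top_p_sample(probs, p=0.9)
print(token)  # one of tokens 0-2; the 0.04 and 0.01 tail is cut off
```

With a peaked distribution the nucleus is small; with a flat one it is large, which is the "adapting set size by context" behavior the definition describes.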
Identifying and localizing objects in images, often with confidence scores and bounding rectangles.