Results for "model-based"
Model-Based RL
Advanced: RL using learned or known environment models.
Model-based reinforcement learning is like having a map while exploring a new city. Instead of wandering around aimlessly, you can look at the map to plan your route and make better decisions about where to go next. In this type of learning, an AI agent first learns how the environment works—like...
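To make the description above concrete, here is a minimal sketch in Python, assuming a toy chain environment invented for illustration (not from any library): the agent first records observed transitions as a learned model of "how the environment works", then plans inside that model with value iteration.

```python
import random

random.seed(0)

# Toy chain: states 0..4, actions 0 (left) / 1 (right); reward for arriving at state 4.
N_STATES, ACTIONS, GOAL = 5, (0, 1), 4

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, 1.0 if s2 == GOAL else 0.0

# 1) Learn the model: act randomly and record each observed transition.
model = {}  # (state, action) -> (next_state, reward)
s = 0
for _ in range(500):
    a = random.choice(ACTIONS)
    s2, r = step(s, a)
    model[(s, a)] = (s2, r)
    s = s2

# 2) Plan inside the learned model with value iteration (no further env access).
V = [0.0] * N_STATES
for _ in range(50):
    for st in range(N_STATES):
        V[st] = max(r + 0.9 * V[s2] for (s2, r) in
                    (model.get((st, a), (st, 0.0)) for a in ACTIONS))

# 3) Read off the greedy policy from the planned values.
policy = [max(ACTIONS, key=lambda a: model.get((st, a), (st, 0.0))[1]
              + 0.9 * V[model.get((st, a), (st, 0.0))[0]])
          for st in range(N_STATES)]
```

Here the "map" is the `model` dictionary: once it is learned, planning is pure computation over recorded transitions, and the resulting policy moves right toward the goal.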
Continuous cycle of observation, reasoning, action, and feedback.
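The cycle above can be sketched as a plain loop; `CounterEnv` and the lambda policy below are made-up stand-ins for a real environment and reasoning step.

```python
class CounterEnv:
    """Toy environment: state is an integer the agent can increment."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += action
        reward = 1.0 if self.state <= 10 else -1.0
        return self.state, reward

def agent_loop(env, policy, steps):
    obs = env.reset()                   # observation
    total = 0.0
    for _ in range(steps):
        action = policy(obs)            # reasoning: map observation to action
        obs, reward = env.step(action)  # action, then feedback from the environment
        total += reward                 # feedback accumulates and can steer behavior
    return total

total = agent_loop(CounterEnv(), lambda obs: 1 if obs < 10 else 0, steps=5)
```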
Separates planning from execution in agent architectures.
Simultaneous Localization and Mapping for robotics.
Flat high-dimensional regions slowing training.
Distributed agents producing emergent intelligence.
Guaranteed response times.
Artificial environment for training/testing agents.
Directly optimizing control policies.
Space of all possible robot configurations.
Sampling-based motion planner.
Learning by minimizing prediction error.
Software regulated as a medical device.
Learning only from current policy’s data.
Central system to store model versions, metadata, approvals, and deployment state.
Reconstructing a model or its capabilities via API queries or leaked artifacts.
Risk of incorrect financial models.
Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.
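A hedged sketch of how such a model is used downstream: `toy_reward_model` below is a hand-written stand-in (real reward models are learned neural networks), ranking candidate outputs best-of-n style as in RLHF pipelines.

```python
def toy_reward_model(prompt, candidate):
    """Stand-in scorer: prefers candidates that share words with the prompt
    and are concise. A real reward model is trained on human preference data."""
    overlap = len(set(prompt.lower().split()) & set(candidate.lower().split()))
    brevity_penalty = 0.01 * len(candidate)
    return overlap - brevity_penalty

def best_of_n(prompt, candidates):
    """Best-of-n sampling: keep the candidate the reward model scores highest."""
    return max(candidates, key=lambda c: toy_reward_model(prompt, c))

prompt = "explain model-based RL briefly"
candidates = [
    "Model-based RL learns a model of the environment and plans with it.",
    "I like turtles.",
]
choice = best_of_n(prompt, candidates)
```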
Diffusion model trained to remove noise step by step.
Assigning a role or identity to the model.
RL without explicit dynamics model.
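As a contrast with the model-based entry above, a minimal tabular Q-learning sketch on a toy chain (invented for illustration): action values are learned directly from sampled transitions, with no dynamics model anywhere.

```python
import random

random.seed(1)
N, GOAL, ALPHA, GAMMA = 5, 4, 0.5, 0.9

def step(s, a):  # a: 0 = left, 1 = right; reward only for arriving at the goal
    s2 = max(0, s - 1) if a == 0 else min(N - 1, s + 1)
    return s2, 1.0 if s2 == GOAL else 0.0

Q = [[0.0, 0.0] for _ in range(N)]  # action-value table; no model of dynamics
s = 0
for _ in range(5000):
    a = random.choice((0, 1))  # explore randomly; Q-learning is off-policy
    s2, r = step(s, a)
    # Update action values directly from the sampled transition.
    Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
    s = 0 if s2 == GOAL else s2  # episodic reset at the goal

policy = [max((0, 1), key=lambda a: Q[st][a]) for st in range(N)]
```

Compared with the model-based sketch, nothing here ever predicts a next state; the agent only estimates how good each action is.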
Predicts masked tokens in a sequence, enabling bidirectional context; often used for embeddings rather than generation.
The text (and possibly other modalities) given to an LLM to condition its output behavior.
Multiple worked examples included in the prompt.
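A small illustration of assembling such a prompt; the `Input:`/`Output:` layout below is one common convention, not a requirement of any particular model.

```python
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot prompt: worked examples first, then the new query."""
    shots = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{shots}\n\nInput: {query}\nOutput:"

prompt = build_few_shot_prompt(
    [("cheval", "horse"), ("chien", "dog")],  # demonstrations of the task
    "chat",
)
```

The trailing `Output:` leaves the completion slot open for the model to fill in.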
Asking the model to review and improve its own output.
Learned model of environment dynamics.
Credit models with interpretable logic.
Learning where data arrives sequentially and the model updates continuously, often under changing distributions.
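A minimal sketch of the idea: one-sample-at-a-time SGD on a synthetic stream whose distribution shifts midway (all data below is invented for illustration).

```python
def online_sgd(stream, lr=0.1):
    """Fit y ~ w * x one sample at a time; the model updates after every arrival,
    so it can track a drifting distribution without retraining from scratch."""
    w = 0.0
    for x, y in stream:
        pred = w * x
        w -= lr * (pred - y) * x  # gradient step on squared error for this sample
    return w

# First 50 samples follow y = 2x, the next 50 follow y = -x (distribution shift).
stream = [(1.0, 2.0)] * 50 + [(1.0, -1.0)] * 50
w = online_sgd(stream)
```

After the shift, the weight drifts away from 2 and settles near -1, illustrating continuous adaptation.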
A function measuring prediction error (and sometimes calibration), guiding gradient-based optimization.
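For example, cross-entropy is one common such function for classification; the probability vectors below are made up.

```python
import math

def cross_entropy(probs, target_index):
    """Negative log-likelihood of the true class; lower is better.
    Near zero when the model puts almost all mass on the right class."""
    return -math.log(probs[target_index])

confident_right = cross_entropy([0.9, 0.05, 0.05], 0)  # small loss
confident_wrong = cross_entropy([0.05, 0.9, 0.05], 0)  # large loss
```

Gradient-based optimization pushes parameters in whatever direction shrinks this number.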
A high-capacity language model trained on massive corpora, exhibiting broad generalization and emergent behaviors.
Maximum number of tokens the model can attend to in one forward pass; constrains long-document reasoning.
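A toy illustration of working under the constraint: keep only the most recent items when a sequence exceeds the window. Real systems count tokenizer tokens rather than list entries, and truncation-from-the-left is just one common strategy.

```python
def fit_to_context(tokens, max_tokens):
    """Drop the oldest tokens when the sequence exceeds the context window."""
    return tokens[-max_tokens:] if len(tokens) > max_tokens else tokens

history = ["turn%d" % i for i in range(10)]
window = fit_to_context(history, max_tokens=4)
```

Anything outside `window` is invisible to the model on this forward pass, which is why long-document reasoning is constrained.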