Results for "model-based"
Model-Based RL
Advanced: RL using learned or known environment models.
Model-based reinforcement learning is like having a map while exploring a new city. Instead of wandering around aimlessly, you can look at the map to plan your route and make better decisions about where to go next. In this type of learning, an AI agent first learns how the environment works—like...
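As a concrete sketch of "planning with a map": the toy below assumes a known four-state chain model (the states, transition function, and reward are all invented for illustration) and picks actions by simulating short rollouts ahead of time rather than acting blindly.

```python
# Minimal model-based planning sketch (illustrative, not from the entry):
# with a known transition/reward model, the agent evaluates actions by
# simulating their outcomes before acting, instead of by trial and error.

# Hypothetical 4-state chain: states 0..3, actions move left/right;
# reaching state 3 yields reward 1. The "model" is just these functions.
def transition(state, action):          # action: -1 = left, +1 = right
    return max(0, min(3, state + action))

def reward(state):
    return 1.0 if state == 3 else 0.0

def plan(state, depth=3):
    """Pick the action whose simulated rollout collects the most reward."""
    if depth == 0:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for action in (-1, +1):
        next_state = transition(state, action)
        future, _ = plan(next_state, depth - 1)
        value = reward(next_state) + future
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action

_, action = plan(state=1)
print(action)  # planning from state 1 chooses to move right (+1)
```

Deeper lookahead or learned (rather than given) transition/reward functions are the usual next steps, but the plan-by-simulation loop stays the same.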
Semi-Supervised Learning
Training with a small labeled dataset plus a larger unlabeled dataset, leveraging assumptions like smoothness/cluster structure.
Weight Initialization
Methods to set starting weights to preserve signal/gradient scales across layers.
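A common instance of this idea is Xavier/Glorot-style scaling; the sketch below (dimensions and seed are arbitrary) draws weights with variance 1/fan_in so a layer's output keeps roughly the input's scale instead of growing with layer width.

```python
import numpy as np

# Scaled initialization sketch: weights drawn with variance 1/fan_in keep
# the activation scale roughly constant from layer to layer.
rng = np.random.default_rng(0)

def init_layer(fan_in, fan_out):
    # std = sqrt(1/fan_in) preserves the scale of x @ W for unit-variance x
    return rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, fan_out))

x = rng.normal(size=(1000, 256))   # unit-variance inputs
h = x @ init_layer(256, 256)       # one linear layer
print(round(float(h.std()), 1))    # stays near 1.0 rather than sqrt(256)
```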
Curriculum Learning
Ordering training samples from easier to harder to improve convergence or generalization.
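One minimal way to realize this ordering, assuming sequence length is a usable difficulty proxy (an assumption for illustration, not a rule):

```python
# Curriculum ordering sketch: score each sample's difficulty, then present
# easier samples first. The difficulty proxy here is hypothetical.
samples = ["hi", "a longer sentence", "ok then",
           "the longest training sentence here"]

def difficulty(sample):
    return len(sample.split())  # assumption: shorter sequences are "easier"

curriculum = sorted(samples, key=difficulty)
print(curriculum[0])  # "hi" comes first
```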
Instance Segmentation
Pixel-level separation of individual object instances.
Synthetic Data
Artificially created data used to train/test models; helpful for privacy and coverage, risky if unrealistic.
Time-Series Forecasting
Predicting future values from past observations.
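A minimal instance: fit a hand-rolled AR(1) model, y[t] = a·y[t-1] + b, by least squares on a toy series, then predict the next value.

```python
# AR(1) forecasting sketch: regress each value on its predecessor,
# then extrapolate one step. The series below is invented.
ys = [2.0, 4.0, 8.0, 16.0]  # toy series (doubles each step)

xs, targets = ys[:-1], ys[1:]
n = len(xs)
mean_x, mean_t = sum(xs) / n, sum(targets) / n
a = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, targets)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_t - a * mean_x
forecast = a * ys[-1] + b
print(forecast)  # 32.0 for this perfectly geometric series
```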
Graph Convolution
Extension of convolution to graph domains using adjacency structure.
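A sketch of one common formulation: average each node's neighbours (plus a self-loop) via the adjacency matrix, then apply a shared linear map. The graph and weights below are invented.

```python
import numpy as np

# Graph-convolution sketch: aggregate neighbour features through the
# adjacency structure, then transform with a shared weight matrix W.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # path graph 0-1-2
X = np.eye(3)                            # one-hot node features
W = np.ones((3, 2))                      # hypothetical weight matrix

A_hat = A + np.eye(3)                    # add self-loops
deg = A_hat.sum(axis=1, keepdims=True)
H = (A_hat / deg) @ X @ W                # mean-aggregate, then transform
print(H.shape)  # (3, 2): one 2-d representation per node
```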
Agent reasoning about future outcomes.
Kalman Filter
Optimal estimator for linear dynamic systems.
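This is the textbook description of the Kalman filter. A one-dimensional sketch, estimating a constant from noisy measurements; the process/measurement variances and prior are chosen arbitrarily for illustration:

```python
# 1-D Kalman filter sketch. Model assumptions (not from the entry):
# the state x is constant, each measurement z = x + noise,
# q = process variance, r = measurement variance, p = estimate variance.
def kalman_1d(measurements, q=1e-4, r=1.0, x=0.0, p=1.0):
    for z in measurements:
        p = p + q                   # predict: uncertainty grows
        k = p / (p + r)             # Kalman gain
        x = x + k * (z - x)         # update estimate toward measurement
        p = (1 - k) * p             # uncertainty shrinks after update
    return x

estimate = kalman_1d([0.9, 1.1, 1.05, 0.95])
print(round(estimate, 2))  # moves from the prior 0.0 toward the true 1.0
```

The full filter generalizes x, p, and k to vectors and matrices, but the predict/update alternation is exactly this loop.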
Differences between training and inference conditions.
Open-Loop Control
Control without feedback after execution begins.
Imagined future trajectories.
Grouping patients by predicted outcomes.
Ensuring models comply with lending fairness laws.
Predicting borrower default risk.
Returns above benchmark.
Effect of trades on prices.
Early signals disproportionately influence outcomes.
Inferring the agent’s internal state from noisy sensor data.
Machine Learning
A subfield of AI where models learn patterns from data to make predictions or decisions, improving with experience rather than explicit rule-coding.
Embedding
A continuous vector encoding of an item (word, image, user) such that semantic similarity corresponds to geometric closeness.
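A toy illustration of "similarity as closeness", with hand-picked 3-d vectors (real embeddings are learned and far higher-dimensional):

```python
import math

# Hypothetical 3-d embeddings for three words; semantic similarity shows
# up as a larger cosine between the vectors.
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

print(cosine(vectors["cat"], vectors["dog"]) >
      cosine(vectors["cat"], vectors["car"]))  # True: cat is nearer dog
```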
Attention
Mechanism that computes context-aware mixtures of representations; scales well and captures long-range dependencies.
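A minimal scaled dot-product self-attention sketch in NumPy (shapes and data are arbitrary): each query position forms a softmax-weighted mixture of all value positions.

```python
import numpy as np

# Scaled dot-product attention: every position can draw on every other,
# which is how long-range dependencies are captured.
def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # context-aware mixture

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))   # 4 tokens, dim 8 (self-attention)
out = attention(Q, K, V)
print(out.shape)  # (4, 8): one mixed representation per token
```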
An RNN variant using gates to mitigate vanishing gradients and capture longer context.
Vector Database
A datastore optimized for similarity search over embeddings, enabling semantic retrieval at scale.
A/B Testing
Controlled experiment comparing variants by random assignment to estimate causal effects of changes.
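One standard analysis for such an experiment, sketched here as a two-proportion z-test under the normal approximation (the conversion counts are made up):

```python
import math

# A/B test sketch: compare the conversion rates of two randomly assigned
# variants with a two-proportion z-test.
def z_test(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)              # pooled rate under H0
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se                          # z-score of the lift

z = z_test(conv_a=200, n_a=1000, conv_b=250, n_b=1000)
print(round(z, 2))  # |z| > 1.96 suggests significance at the 5% level
```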
Beam Search
Search algorithm for generation that keeps top-k partial sequences; can improve likelihood but reduce diversity.
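A toy sketch, assuming a fixed (hypothetical) next-symbol distribution at every step: only the top-k partial sequences survive each extension round.

```python
import math

# Beam search over a toy character model: instead of expanding every
# partial sequence, keep only the k best-scoring ones per step.
probs = {"a": 0.5, "b": 0.4, "c": 0.1}   # hypothetical next-symbol model

def beam_search(steps, k=2):
    beams = [("", 0.0)]                   # (sequence, log-probability)
    for _ in range(steps):
        candidates = [(seq + sym, score + math.log(p))
                      for seq, score in beams
                      for sym, p in probs.items()]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams

print(beam_search(steps=2)[0][0])  # "aa": the highest-likelihood sequence
```

With k = 1 this degenerates to greedy decoding; larger k trades compute for likelihood, at some cost in diversity.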
Methods for breaking goals into steps; can be classical (A*, STRIPS) or LLM-driven with tool calls.
Gradient Clipping
Limiting gradient magnitude to prevent exploding gradients.
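A minimal sketch of the common by-norm variant: rescale the whole gradient when its L2 norm exceeds a threshold, so the update direction is kept but its size is capped.

```python
import math

# Clip-by-norm sketch: one bad batch cannot blow up the update, because
# gradients larger than max_norm are rescaled onto the norm ball.
def clip_by_norm(grad, max_norm=1.0):
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grad]
    return grad

clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)  # norm 5.0 -> rescaled
print(clipped)  # approximately [0.6, 0.8], norm 1.0
```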
Natural Language Processing
AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.
Policy
Strategy mapping states to actions.
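In the tabular case this mapping can literally be a lookup table; the states and actions below are invented for illustration.

```python
# Deterministic tabular policy sketch: one fixed action per state.
# Stochastic policies would map each state to a distribution instead.
policy = {
    "start": "right",
    "corridor": "right",
    "junction": "up",
    "goal": "stay",
}

def act(state):
    return policy[state]

print(act("junction"))  # "up"
```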