Results for "model-based"
Model-Based RL
Advanced: RL using learned or known environment models.
Model-based reinforcement learning is like having a map while exploring a new city. Instead of wandering around aimlessly, you can look at the map to plan your route and make better decisions about where to go next. In this type of learning, an AI agent first learns how the environment works—like...
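The "map" idea in the excerpt can be sketched in a few lines. This is a minimal, illustrative example (the toy world, function names, and exhaustive-lookahead planner are assumptions, not from the entry): the agent has a dynamics model that predicts next state and reward, and it plans by rolling candidate action sequences forward inside the model, then executing the best first action.

```python
import itertools

# Toy dynamics model ("the map"): predicts next state and reward.
# Illustrative 1-D world: positions on a line, goal at position 5.
def model(state, action):
    next_state = state + action              # action is -1 or +1
    reward = 1.0 if next_state == 5 else 0.0
    return next_state, reward

# Simulate an action sequence inside the model, never the real world.
def imagined_return(state, actions):
    total = 0.0
    for a in actions:
        state, r = model(state, a)
        total += r
    return total

# Plan by exhaustive lookahead over all 2^horizon action sequences,
# returning the first action of the best imagined trajectory.
def plan(state, horizon=5):
    best = max(
        itertools.product([-1, 1], repeat=horizon),
        key=lambda seq: imagined_return(state, seq),
    )
    return best[0]

print(plan(0))  # from position 0, the planner's first move is +1, toward the goal
```

Real systems replace the exhaustive search with sampled or gradient-based planners (e.g. random shooting or MPC), but the structure is the same: decisions come from simulated futures, not trial and error in the real environment.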
Coordinating models, tools, and logic.
Optimizes future actions using a model of dynamics.
Mathematical representation of friction forces.
Updating a pretrained model’s weights on task-specific data to improve performance or adapt style/behavior.
The relationship between inputs and outputs changes over time, requiring monitoring and model updates.
How well a model performs on new data drawn from the same (or a similar) distribution as the training data.
Fraction of correct predictions; can be misleading on imbalanced datasets.
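The pitfall in that definition is easy to demonstrate: on a heavily imbalanced dataset, a degenerate classifier can score high accuracy while being useless. A small self-contained sketch (the labels are made up for illustration):

```python
# 95 negatives, 5 positives: a classifier that always predicts the
# majority class scores 95% accuracy yet detects no positives at all.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100                 # always predict "negative"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(t == p == 1 for t, p in zip(y_true, y_pred)) / sum(y_true)

print(accuracy)  # → 0.95
print(recall)    # → 0.0
```

This is why metrics that focus on the positive class (precision, recall, PR curves) are preferred when classes are imbalanced.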
Often more informative than ROC on imbalanced datasets; focuses on positive class performance.
Generates sequences one token at a time, conditioning on past tokens.
Stepwise reasoning patterns that can improve multi-step tasks; often handled implicitly or summarized for safety/privacy.
Techniques that fine-tune small additional components rather than all weights to reduce compute and storage.
Systematic review of model/data processes to ensure performance, fairness, security, and policy compliance.
Exponential of average negative log-likelihood; lower means better predictive fit, not necessarily better utility.
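The formula in that definition is short enough to compute directly. A minimal sketch (the function name and inputs are illustrative): given the probability the model assigned to each observed token, perplexity is the exponential of the mean negative log-likelihood.

```python
import math

# perplexity = exp(mean negative log-likelihood of the observed tokens)
def perplexity(token_probs):
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A model spreading probability uniformly over 4 choices has
# perplexity 4: it is "as confused as" a fair 4-sided die.
print(round(perplexity([0.25, 0.25, 0.25, 0.25]), 6))  # → 4.0
```

Lower is better, but as the entry notes, a lower perplexity does not guarantee better downstream utility.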
Maliciously inserting or altering training data to implant backdoors or degrade performance.
System design where humans validate or guide model outputs, especially for high-stakes decisions.
Using the same parameters across different parts of a model.
Allows the model to attend to information from different subspaces simultaneously.
Logged record of model inputs, outputs, and decisions.
Probabilistic graphical model for structured prediction.
Controls amount of noise added at each diffusion step.
Formal model linking causal mechanisms and variables.
Running predictions on large datasets periodically.
Incrementally deploying new models to reduce risk.
Increasing model capacity via compute.
Increasing performance via more data.
Cost to run models in production.
Learned subsystem that optimizes its own objective.
Breaking tasks into sub-steps.
Prompt augmented with retrieved documents.
Small prompt changes cause large output changes.