Model-Based RL
Advanced RL using learned or known environment models.
Model-based reinforcement learning is like having a map while exploring a new city. Instead of wandering around aimlessly, you can look at the map to plan your route and make better decisions about where to go next. In this type of learning, an AI agent first learns how the environment works—like...
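The "map" analogy can be made concrete with a minimal sketch: an agent that holds a known model of a toy 1-D chain environment and plans by simulating short rollouts before acting. The environment, the `model` and `plan` functions, and all numbers are illustrative, not from any particular library.

```python
# Minimal model-based planning sketch: the agent consults a known
# deterministic model of a 1-D chain (states 0..GOAL) to simulate
# outcomes before acting, instead of acting blindly.

GOAL = 5

def model(state, action):
    """Known environment model: next state and reward."""
    next_state = max(0, min(GOAL, state + action))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def plan(state, depth=3):
    """Pick the action whose simulated rollout collects the most reward."""
    if depth == 0:
        return 0, 0.0
    best_action, best_return = 0, float("-inf")
    for action in (-1, +1):
        next_state, reward = model(state, action)
        _, future = plan(next_state, depth - 1)
        if reward + future > best_return:
            best_action, best_return = action, reward + future
    return best_action, best_return

state = 3
while state != GOAL:
    action, _ = plan(state)          # plan against the model ...
    state, _ = model(state, action)  # ... then act
print(state)  # → 5
```

In a realistic setting the model itself is learned from experience; the planning loop stays the same.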
Supervised Learning
Learning a function from input-output pairs (labeled data), optimizing performance at predicting outputs for unseen inputs.
Transfer Learning
Reusing knowledge from a source task/domain to improve learning on a target task/domain, typically via pretrained models.
Distribution Shift
A mismatch between training and deployment data distributions that can degrade model performance.
Feature
A measurable property or attribute used as model input (raw or engineered), such as age, pixel intensity, or token ID.
Hyperparameters
Configuration choices not learned directly (or not typically learned) that govern training or architecture.
Feature Engineering
Designing input features to expose useful structure (e.g., ratios, lags, aggregations), often crucial outside deep learning.
Regularization
Techniques that discourage overly complex solutions to improve generalization (reduce overfitting).
Confusion Matrix
A table summarizing classification outcomes, foundational for metrics like precision, recall, and specificity.
F1 Score
Harmonic mean of precision and recall; useful when the balance of false positives and false negatives matters.
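The two entries above fit together in a few lines of arithmetic. The counts below are made-up illustrative numbers:

```python
# Hypothetical binary confusion-matrix counts (illustrative only).
tp, fp, fn, tn = 40, 10, 5, 45

precision = tp / (tp + fp)              # 40 / 50 = 0.8
recall = tp / (tp + fn)                 # 40 / 45 ≈ 0.889
specificity = tn / (tn + fp)            # 45 / 55 ≈ 0.818
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(round(precision, 3), round(recall, 3), round(f1, 3))  # → 0.8 0.889 0.842
```

Note that the harmonic mean pulls F1 toward the smaller of precision and recall, which is exactly why it is preferred over a plain average when the two disagree.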
ROC Curve
Plots true positive rate vs. false positive rate across thresholds; summarizes class separability.
Calibration
The degree to which predicted probabilities match true frequencies (e.g., predictions made with confidence 0.8 are correct ~80% of the time).
Gradient Descent
Iterative method that updates parameters in the direction of the negative gradient to minimize a loss.
Learning Rate
Controls the size of parameter updates; too high and training diverges, too low and it trains slowly or gets stuck.
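The two entries above can be demonstrated on a one-dimensional toy loss; the function and learning rate here are illustrative choices:

```python
# Gradient descent on f(x) = (x - 3)^2, whose gradient is 2(x - 3).
def grad(x):
    return 2 * (x - 3)

x, lr = 0.0, 0.1          # lr (learning rate) controls the step size
for _ in range(100):
    x -= lr * grad(x)     # step in the negative-gradient direction

print(round(x, 4))  # → 3.0 (the minimizer)
```

With this loss, any `lr` above 1.0 makes the iterates overshoot and diverge, while `lr = 0.001` would still be far from 3 after 100 steps, matching the "too high / too low" trade-off in the definition.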
Self-Attention
Attention where queries, keys, and values come from the same sequence, enabling token-to-token interactions.
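A stripped-down numeric sketch of the mechanism: for clarity, the query/key/value projections are taken to be the identity (an assumption for this sketch; real layers learn those matrices), and the embeddings are toy values.

```python
import math

# Toy self-attention over 3 tokens with 2-dim embeddings.
X = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]

def softmax(scores):
    m = max(scores)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

d = len(X[0])
out = []
for q in X:  # each token's query attends over every token's key
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
    weights = softmax(scores)             # attention weights sum to 1
    # output = attention-weighted mix of the value vectors
    out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])

print([round(v, 3) for v in out[0]])
```

Each output row is a convex combination of the input rows, which is the "token-to-token interaction" the definition refers to.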
Tokenization
Converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.
Next-Token Prediction
Training objective where the model predicts the next token given the previous tokens (causal language modeling).
Prompt Engineering
Crafting prompts to elicit desired behavior, often using roles, structure, constraints, and examples.
Bias (Fairness)
Systematic differences in model outcomes across groups; arises from data, labels, and deployment context.
Interpretability
Techniques to understand model decisions (globally or locally), important in high-stakes and regulated settings.
Active Learning
Selecting the most informative samples to label (e.g., uncertainty sampling) to reduce labeling cost.
LoRA (Low-Rank Adaptation)
PEFT method injecting trainable low-rank matrices into layers, enabling efficient fine-tuning.
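The low-rank idea can be sketched with toy matrices: instead of updating a full d × d weight matrix, train a d × r and an r × d factor whose product is the update. Sizes and values below are illustrative.

```python
# LoRA sketch: effective weight is W + B @ A, where B (d x r) and
# A (r x d) are the only trainable parameters and r << d.
d, r = 4, 1

W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
B = [[0.5] for _ in range(d)]        # d x r, trainable
A = [[0.1, 0.2, 0.3, 0.4]]           # r x d, trainable

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

delta = matmul(B, A)   # rank-1 update: 2*d*r = 8 params vs d*d = 16
W_eff = [[W[i][j] + delta[i][j] for j in range(d)] for i in range(d)]
print(W_eff[0])  # → [1.05, 0.1, 0.15, 0.2]
```

The parameter savings grow quickly with scale: for d = 4096 and r = 8, the low-rank factors hold ~65k parameters versus ~16.8M for the full matrix.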
Quantization
Reducing the numeric precision of weights/activations to speed up inference and reduce memory, with acceptable accuracy loss.
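A minimal sketch of symmetric int8 quantization, one common scheme; the weight values are made up for illustration:

```python
# Symmetric int8 quantization: map floats to integers with a single
# scale factor, then dequantize and check the rounding error.
weights = [0.62, -1.30, 0.05, 0.98, -0.44]

scale = max(abs(w) for w in weights) / 127     # int8 range: -127..127
q = [round(w / scale) for w in weights]        # quantized integers
deq = [qi * scale for qi in q]                 # dequantized floats

max_err = max(abs(w - d) for w, d in zip(weights, deq))
assert max_err <= scale / 2                    # error ≤ half a quantization step
print(q)  # → [61, -127, 5, 96, -43]
```

The "acceptable accuracy loss" in the definition is exactly this bounded rounding error, traded for 4x smaller storage than float32 and faster integer arithmetic.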
Red Teaming
Stress-testing models for failures, vulnerabilities, policy violations, and harmful behaviors before release.
Membership Inference and Reconstruction Attacks
Attacks that infer whether specific records were in the training data, or reconstruct sensitive training examples.
Confidential Computing
Methods to protect the model/data during inference (e.g., trusted execution environments) from operators or attackers.
PAC Learning
A hypothesis class is PAC-learnable if an algorithm can, with high probability, learn an approximately correct hypothesis from finitely many samples.
Cross-Entropy
Measures the divergence between the true and predicted probability distributions.
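For a one-hot true label, cross-entropy reduces to the negative log-probability the model assigns to the correct class. The probabilities below are illustrative:

```python
import math

# Cross-entropy H(p, q) = -sum_i p_i * log(q_i) for a one-hot label.
p_true = [0.0, 1.0, 0.0]   # true distribution (class 1)
q_pred = [0.1, 0.7, 0.2]   # model's predicted probabilities

ce = -sum(p * math.log(q) for p, q in zip(p_true, q_pred) if p > 0)
print(round(ce, 4))  # → 0.3567  (= -ln 0.7)
```

The loss falls toward 0 as the model's probability on the true class approaches 1, and blows up as that probability approaches 0, which is why it is the standard training objective for classifiers and language models.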
Bias (Statistical)
Systematic error introduced by simplifying assumptions in a learning algorithm.
Mutual Information
Quantifies the information shared between two random variables.
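The quantity can be computed directly from a joint distribution via I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x)p(y))]. The small joint table over two binary variables below is illustrative:

```python
import math

# Mutual information from an illustrative joint distribution.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginals p(x) and p(y) by summing out the other variable.
px = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
py = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}

mi = sum(p * math.log(p / (px[x] * py[y]), 2)   # log base 2 → bits
         for (x, y), p in joint.items())
print(round(mi, 4))  # → 0.2781 bits
```

If X and Y were independent, every ratio inside the log would be 1 and the result would be exactly 0 bits; the positive value here reflects their correlation.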
Fisher Information
Measures how much information an observable random variable carries about an unknown parameter.
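In symbols, for a random variable X with density f(x; θ), the definition above is the expected squared score; for a Bernoulli(p) observation it has the familiar closed form:

```latex
I(\theta) = \mathbb{E}\left[\left(\frac{\partial}{\partial\theta} \log f(X;\theta)\right)^{2}\right],
\qquad
X \sim \mathrm{Bernoulli}(p) \;\Rightarrow\; I(p) = \frac{1}{p(1-p)}
```

The Bernoulli case shows the intuition: information peaks for extreme p (outcomes are near-deterministic, so each sample is very informative about p) and is smallest at p = 1/2.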