Results for "function approximation"
Modifying the reward signal to accelerate learning.
A learning paradigm where an agent interacts with an environment and learns to choose actions to maximize cumulative reward.
Activation function max(0, x); mitigates vanishing gradients and often speeds up training in deep networks compared with saturating activations.
Raw, unnormalized model outputs before conversion to probabilities (e.g., via softmax); adjusted during decoding and calibration.
Optimization problems where any local minimum is global.
Estimating parameters by maximizing the likelihood of the observed data.
A narrow minimum often associated with poorer generalization.
Formal framework for sequential decision-making under uncertainty.
Models that define an energy landscape rather than explicit probabilities.
Optimizing policies directly via gradient ascent on expected reward.
Probability-weighted average value of a random variable under a distribution.
A point where the objective is no larger than at any nearby point.
Optimization under equality/inequality constraints.
Converts a constrained problem into an unconstrained form.
Alternative formulation of an optimization problem whose optimal value provides a bound on that of the original problem.
Finding control policies that minimize cumulative cost.
Optimizes future actions using a model of the system dynamics.
Optimal control for linear systems with quadratic cost.
Learning policies from expert demonstrations.
Learning where data arrives sequentially and the model updates continuously, often under changing distributions.
Reusing knowledge from a source task/domain to improve learning on a target task/domain, typically via pretrained models.
Training one model on multiple tasks simultaneously to improve generalization through shared structure.
A continuous vector encoding of an item (word, image, user) such that semantic similarity corresponds to geometric closeness.
Designing input features to expose useful structure (e.g., ratios, lags, aggregations), often crucial outside deep learning.
Configuration choices that govern training or architecture and are set in advance rather than learned from data (e.g., learning rate, number of layers).
Automatically learning useful internal features (latent variables) that capture salient structure for downstream tasks.
Minimizing average loss on training data; can overfit when data is limited or biased.
Techniques that discourage overly complex solutions to improve generalization (reduce overfitting).
When a model fits noise/idiosyncrasies of training data and performs poorly on unseen data.
Average of squared residuals; common regression objective.
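
The last few entries (minimizing average training loss, discouraging overly complex solutions, and the average of squared residuals) can be tied together in one small sketch. The example below is illustrative only: it assumes a one-parameter model y_hat = w * x, and the names mse, fit_ridge_1d, and lam are hypothetical choices, not taken from any library. It minimizes the average squared residual plus an optional penalty lam * w**2, using the closed-form solution obtained by setting the derivative to zero.

```python
# Illustrative sketch: regularized empirical risk minimization for y ~ w * x.
# All names here (mse, fit_ridge_1d, lam) are assumptions made for this example.

def mse(w, xs, ys):
    """Average of squared residuals for the model y_hat = w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_ridge_1d(xs, ys, lam=0.0):
    """Return the w that minimizes mse(w, xs, ys) + lam * w**2.

    Setting the derivative of the objective to zero gives
    w = sum(x * y) / (sum(x * x) + lam * n), where n = len(xs).
    With lam = 0 this is plain least squares (no regularization).
    """
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam * n)


# Small made-up dataset, roughly following y = 2x with noise.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]

w_plain = fit_ridge_1d(xs, ys)            # minimizes training MSE only
w_reg = fit_ridge_1d(xs, ys, lam=1.0)     # penalty shrinks w toward zero

print("unregularized w:", w_plain, "train MSE:", mse(w_plain, xs, ys))
print("regularized w:  ", w_reg, "train MSE:", mse(w_reg, xs, ys))
```

The regularized fit accepts a somewhat higher training error in exchange for a smaller weight. Whether that trade improves performance on unseen data depends on the dataset and on the chosen lam, which would normally be tuned on held-out data.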