Results for "self-reinforcement"
Using production outcomes to improve models.
System that independently pursues goals over time.
Number of steps considered in planning.
Sampling from an easier distribution and reweighting the samples to correct for the mismatch.
Optimization under uncertainty.
Model optimizes objectives misaligned with human values.
Ensuring learned behavior matches intended objective.
Learned subsystem that optimizes its own objective.
Model behaves well during training but not in deployment.
Explicit output constraints (format, tone).
Algorithm computing control actions.
Artificial environment for training/testing agents.
Randomizing simulation parameters to improve real-world transfer.
Combining simulation and real-world data.
RL without an explicit dynamics model.
RL using learned or known environment models.
Optimizing continuous action sequences.
Modifying the reward signal to accelerate learning.
Learned model of environment dynamics.
Modeling environment evolution in latent space.
Learning by minimizing prediction error.
Future trajectories simulated by a model rather than observed.
Humans assist or override autonomous behavior.
Closed loop linking sensing and acting.
Robots learning via exploration and growth.
AI applied to scientific problems.
AI selecting next experiments.
Agents have opposing objectives.
AI tacitly coordinating prices.
AI limited to specific domains.
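
As an illustration of one entry above, sampling from an easier distribution with reweighting, here is a minimal Python sketch. It assumes a standard normal target and a wider normal proposal, both chosen only for the example. The helper names (importance_sampling_mean, normal_pdf) are hypothetical and do not come from the glossary.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def importance_sampling_mean(f, target_pdf, proposal_pdf, proposal_sampler, n, seed=0):
    """Estimate E_target[f(x)] using samples drawn from the proposal.

    Each sample is reweighted by target_pdf(x) / proposal_pdf(x), which
    corrects for having sampled from the wrong (easier) distribution.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = proposal_sampler(rng)
        weight = target_pdf(x) / proposal_pdf(x)
        total += weight * f(x)
    return total / n


# Assumed setup: target N(0, 1), proposal N(0, 2).
# The true value of E[x^2] under the target is 1.
estimate = importance_sampling_mean(
    f=lambda x: x * x,
    target_pdf=lambda x: normal_pdf(x, 0.0, 1.0),
    proposal_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
    proposal_sampler=lambda rng: rng.gauss(0.0, 2.0),
    n=20000,
)
print(estimate)
```

The proposal is deliberately wider than the target so that the weights stay bounded. A proposal narrower than the target can make the estimate unstable.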