Results for "delayed feedback"
Reward only given upon task completion.
Continuous cycle of observation, reasoning, action, and feedback.
Using production outcomes to improve models.
Continuous loop adjusting actions based on state feedback.
Control using real-time sensor feedback.
Using output to adjust future inputs.
AI reinforcing market trends.
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
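A minimal sketch of the pairwise objective commonly used to train such a reward model (a Bradley-Terry loss over preference pairs; the function name and scores are illustrative, not from the source):

```python
import math

def pairwise_reward_loss(score_chosen, score_rejected):
    """Bradley-Terry pairwise loss: drive the preferred output's
    score above the rejected output's score."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Loss shrinks as the reward model ranks the preferred output higher.
easy = pairwise_reward_loss(2.0, 0.0)   # preferred output already scores higher
hard = pairwise_reward_loss(0.0, 2.0)   # ranking is inverted, loss is large
```

The policy-optimization stage that follows (e.g. with a policy-gradient method against the learned scores) is omitted here.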
Using limited human feedback to guide large models.
Training a model on its own outputs degrades output quality.
Models evaluating and improving their own outputs.
Control without feedback after execution begins.
Equations governing how system states change over time.
Human controlling robot remotely.
Closed loop linking sensing and acting.
A learning paradigm where an agent interacts with an environment and learns to choose actions to maximize cumulative reward.
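As a concrete sketch of that interaction loop, assuming a toy corridor task where reward arrives only at the goal (tabular Q-learning; all names and parameters are illustrative):

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a corridor: reward is delayed until the goal."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            # epsilon-greedy action selection
            a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda x: q[s][x])
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0  # reward only at the goal
            bootstrap = 0.0 if s_next == n_states - 1 else max(q[s_next])
            q[s][a] += alpha * (r + gamma * bootstrap - q[s][a])
            s = s_next
    return q

q = q_learning()
# Greedy policy should point right (toward the delayed reward) in every state.
policy = [max((0, 1), key=lambda x: q[s][x]) for s in range(4)]
```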
Learning where data arrives sequentially and the model updates continuously, often under changing distributions.
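A minimal sketch of such a sequential update: a constant-step-size estimator that applies one update per arriving observation, so it keeps tracking under distribution shift (values and names are illustrative):

```python
def online_mean(stream, lr=0.05):
    """One constant-step-size update per arriving observation; the fixed
    step size lets the estimate adapt when the data distribution shifts."""
    est = 0.0
    for x in stream:
        est += lr * (x - est)  # gradient step on squared error toward x
    return est

# The stream's distribution shifts midway; the estimate tracks the new regime.
est = online_mean([1.0] * 200 + [5.0] * 200)
```

A decaying step size would converge more precisely on stationary data but would stop adapting after a shift, which is why a constant step is common in drifting streams.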
Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.
Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.
System design where humans validate or guide model outputs, especially for high-stakes decisions.
Combines value estimation (critic) with policy learning (actor).
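A minimal actor-critic sketch on a two-armed bandit, with the critic's value estimate serving as the actor's baseline (arm probabilities, step sizes, and names are illustrative assumptions):

```python
import math
import random

def actor_critic_bandit(p_reward=(0.2, 0.8), steps=2000,
                        lr_actor=0.1, lr_critic=0.1, seed=0):
    """Two-armed Bernoulli bandit: the critic tracks average reward,
    the actor takes a policy-gradient step weighted by the advantage."""
    rng = random.Random(seed)
    prefs = [0.0, 0.0]  # actor: softmax preferences over the two arms
    value = 0.0         # critic: running estimate of expected reward
    probs = [0.5, 0.5]
    for _ in range(steps):
        z = [math.exp(h) for h in prefs]
        probs = [e / sum(z) for e in z]
        a = 0 if rng.random() < probs[0] else 1
        r = 1.0 if rng.random() < p_reward[a] else 0.0
        advantage = r - value            # critic's estimate acts as baseline
        value += lr_critic * advantage   # critic update
        for i in (0, 1):                 # actor update (softmax policy gradient)
            grad = (1.0 - probs[i]) if i == a else -probs[i]
            prefs[i] += lr_actor * advantage * grad
    return probs

probs = actor_critic_bandit()
# The actor should concentrate probability on the better-paying arm.
```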
Coordination arising without explicit programming.
Running new model alongside production without user impact.
Incrementally deploying new models to reduce risk.
Shift in feature distribution over time.
Shift in the distribution of model outputs over time.
Willingness of a system to accept correction or shutdown.
Explicit output constraints (format, tone).
AI systems that perceive and act in the physical world through sensors and actuators.
Hardware components that execute physical actions.