Results for "direct preference optimization"
Measures joint variability between two variables.
Maximizing reward without fulfilling the real goal.
Learned subsystem that optimizes its own objective.
Using limited human feedback to guide large models.
Explicit output constraints (format, tone).
Asking model to review and improve its own output.
Breaking tasks into sub-steps.
Coordinating models, tools, and logic.
Requirement to provide explanations.
Limiting how much inference a system may use.
Maximum system processing rate.
Mathematical framework for controlling dynamic systems.
Optimizes future actions using a model of dynamics.
Control that remains stable under model uncertainty.
Computing joint angles for desired end-effector pose.
High-fidelity virtual model of a physical system.
Randomizing simulation parameters to improve real-world transfer.
Sampling-based motion planner.
Modeling environment evolution in latent space.
Ensuring robots do not harm humans.
Acting to minimize surprise or free energy.
Learning without catastrophic forgetting.
Fabrication of cases or statutes by LLMs.
AI-driven buying/selling of financial assets.
AI applied to scientific problems.
Finding mathematical equations from data.
Designing efficient marketplaces.
Collective behavior without central control.
Stored compute or algorithms enabling rapid capability jumps.
Tradeoff between safety and performance.