Results for "self-reinforcement"
Reinforcement learning from human feedback (RLHF): uses human preference data to train a reward model, then optimizes the policy against that reward.
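A minimal sketch of the pairwise (Bradley-Terry) loss commonly used for the reward-model step, assuming the model emits a scalar reward per response; the toy tensors stand in for real model outputs:

```python
import torch
import torch.nn.functional as F

def reward_model_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise preference loss: maximize log sigmoid(r_chosen - r_rejected),
    pushing the reward model to score preferred responses higher."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy scalar rewards assigned to the preferred vs. rejected response in each pair.
r_chosen = torch.tensor([1.2, 0.7, 2.1])
r_rejected = torch.tensor([0.3, 0.9, 1.0])
print(reward_model_loss(r_chosen, r_rejected))  # smaller when chosen outscores rejected
```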
Transformer: architecture based on self-attention and feedforward layers; the foundation of modern LLMs and many multimodal models.
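A minimal sketch of one such block in PyTorch (pre-norm variant; dimensions and layout are illustrative, not any particular model's):

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """One pre-norm block: self-attention followed by a feedforward layer,
    each wrapped in a residual connection."""
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # queries = keys = values
        return x + self.ff(self.norm2(x))

x = torch.randn(2, 10, 64)          # (batch, sequence, features)
print(TransformerBlock()(x).shape)  # torch.Size([2, 10, 64])
```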
Safety filtering: automated detection/prevention of disallowed outputs (toxicity, self-harm, illegal instructions, etc.).
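A deliberately simplified sketch of the control flow, assuming a hypothetical keyword blocklist; production systems use trained classifiers rather than regexes, but the score-threshold-block logic is similar:

```python
import re

# Hypothetical category -> pattern map, purely for illustration.
BLOCKLIST = {
    "toxicity": re.compile(r"\b(?:idiot|moron)\b", re.IGNORECASE),
    "weapons": re.compile(r"\bhow to build a bomb\b", re.IGNORECASE),
}

def check_output(text: str) -> tuple[bool, list[str]]:
    """Return (allowed, violated_categories) for a candidate model output."""
    violations = [cat for cat, pattern in BLOCKLIST.items() if pattern.search(text)]
    return (not violations, violations)

print(check_output("Here is a recipe for soup."))  # (True, [])
print(check_output("You absolute idiot."))         # (False, ['toxicity'])
```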
Reinforcement learning: a learning paradigm where an agent interacts with an environment and learns to choose actions that maximize cumulative reward.
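A minimal sketch: a tabular Q-learning agent on a toy corridor environment (the environment and hyperparameters are made up for illustration):

```python
import random

# Tabular Q-learning on a toy corridor (states 0..4, reward only at state 4).
N_STATES, GOAL, ACTIONS = 5, 4, (-1, +1)       # actions: step left / step right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma = 0.5, 0.9

for _ in range(500):                           # episodes
    s = 0
    while s != GOAL:
        a = random.choice(ACTIONS)             # explore at random (Q-learning is off-policy)
        s2 = min(max(s + a, 0), N_STATES - 1)  # walls at both ends
        r = 1.0 if s2 == GOAL else 0.0
        # Core update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)])  # [1, 1, 1, 1]
```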
Inverse reinforcement learning: inferring a reward function from observed behavior.
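An illustrative simplification, assuming a reward linear in hand-picked state features, r(s) = w · φ(s); full methods (e.g., apprenticeship learning) alternate this feature-matching step with re-solving for a policy under the current reward:

```python
import numpy as np

def feature_expectations(trajectories, phi):
    """Average summed features over a set of state trajectories."""
    return np.mean([np.sum([phi(s) for s in traj], axis=0) for traj in trajectories], axis=0)

phi = lambda s: np.array([s, s**2], dtype=float)  # toy state features
expert = [[0, 1, 2, 3, 4]]                        # expert heads for high states
novice = [[0, 1, 0, 1, 0]]                        # behavior of some baseline policy

# Point w toward features the expert visits more than the baseline does.
w = feature_expectations(expert, phi) - feature_expectations(novice, phi)
reward = lambda s: w @ phi(s)
print(reward(4) > reward(0))                      # True: inferred reward favors high states
```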
Self-supervised learning: learning from data by constructing “pseudo-labels” (e.g., next-token prediction, masked modeling) without manual annotation.
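A minimal sketch of the next-token variant: the targets are just the input sequence shifted by one position, so no manual labels are needed (the logits tensor here stands in for a real model's output):

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """logits: (batch, seq, vocab) model outputs; tokens: (batch, seq) input ids."""
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),  # predictions at positions 0..n-2
        tokens[:, 1:].reshape(-1),                    # targets are the next tokens
    )

tokens = torch.randint(0, 100, (2, 8))  # fake token ids, vocab size 100
logits = torch.randn(2, 8, 100)         # stand-in for a model's output
print(next_token_loss(logits, tokens))
```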
Self-attention: attention in which the queries, keys, and values all come from the same sequence, enabling token-to-token interactions.
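A minimal single-head sketch, assuming a sequence-first layout and learned projection matrices passed in explicitly:

```python
import torch

def self_attention(x: torch.Tensor, wq, wk, wv) -> torch.Tensor:
    """Scaled dot-product attention where Q, K, V are all projections of the
    same sequence x of shape (seq, d_model)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / k.size(-1) ** 0.5        # (seq, seq) token-to-token weights
    return torch.softmax(scores, dim=-1) @ v

d = 16
x = torch.randn(10, d)                          # one sequence of 10 tokens
wq, wk, wv = (torch.randn(d, d) for _ in range(3))
print(self_attention(x, wq, wk, wv).shape)      # torch.Size([10, 16])
```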
Self-refinement: models evaluating and improving their own outputs.
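A sketch of the generate-critique-revise loop; `generate`, `critique`, and `revise` are hypothetical stand-ins for calls to the same model under different prompts:

```python
def self_refine(prompt: str, generate, critique, revise, max_rounds: int = 3) -> str:
    draft = generate(prompt)
    for _ in range(max_rounds):
        feedback = critique(prompt, draft)       # model judges its own output
        if feedback == "OK":                     # stop once the critic is satisfied
            break
        draft = revise(prompt, draft, feedback)  # model rewrites using the feedback
    return draft

# Toy stand-ins that "fix" a draft the critic deems too short.
gen = lambda p: "hi"
crit = lambda p, d: "OK" if len(d) > 5 else "too short"
rev = lambda p, d, f: d + "!"
print(self_refine("greet", gen, crit, rev, max_rounds=10))  # hi!!!!
```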
Self-consistency: sampling multiple outputs and selecting the consensus answer.
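A sketch of the voting step, assuming a hypothetical `sample_answer` that runs one full generate-and-extract-answer pass at nonzero temperature:

```python
import random
from collections import Counter

def self_consistent_answer(sample_answer, n_samples: int = 11) -> str:
    """Sample several times and return the most common final answer."""
    votes = Counter(sample_answer() for _ in range(n_samples))
    return votes.most_common(1)[0][0]  # consensus answer

# Toy model that answers "42" 70% of the time and "41" otherwise.
noisy = lambda: "42" if random.random() < 0.7 else "41"
print(self_consistent_answer(noisy))   # usually "42"
```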
Self-model: an agent's internal representation of itself.