Results for "self-reinforcement"
Expected return of taking an action in a given state.
Fundamental recursive relationship defining optimal value functions.
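The recursion this describes can be written out for a discounted MDP. The symbols below (transition probabilities \(P\), reward \(R\), discount \(\gamma\)) are assumed standard notation, not taken from the source:

```latex
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[\, R(s, a, s') + \gamma\, V^{*}(s') \,\bigr]
```

The optimal value of a state equals the best one-step return plus the discounted optimal value of wherever that step leads.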
Learning from data generated by a different policy.
Balancing exploring new behaviors versus exploiting known rewards.
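A minimal sketch of this trade-off is the epsilon-greedy rule: with probability epsilon pick a random action (explore), otherwise pick the action with the highest estimated value (exploit). The function name and signature here are illustrative, not from the source.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one.

    q_values: list of estimated action values, indexed by action.
    """
    if rng.random() < epsilon:
        # Explore: uniform random action.
        return rng.randrange(len(q_values))
    # Exploit: action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Setting epsilon to 0 recovers pure exploitation; annealing it toward 0 over training is a common schedule.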
Continuous cycle of observation, reasoning, action, and feedback.
Ensuring AI systems pursue intended human goals.
AI systems that perceive and act in the physical world through sensors and actuators.
Predicts next state given current state and action.
Directly optimizing control policies.
Reward only given upon task completion.
Control shared between human and agent.
Research ensuring AI remains safe.
A subfield of AI where models learn patterns from data to make predictions or decisions, improving with experience rather than explicit rule-coding.
A high-priority instruction layer setting overarching behavior constraints for a chat model.
Learning where data arrives sequentially and the model updates continuously, often under changing distributions.
A preference-based training method optimizing policies directly from pairwise comparisons without explicit RL loops.
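This describes methods such as Direct Preference Optimization (DPO). Its pairwise loss has the following form, where \(y_w\) and \(y_l\) are the preferred and rejected responses, \(\pi_{\mathrm{ref}}\) is a frozen reference policy, and \(\beta\) controls deviation from it (notation assumed, not from the source):

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]
```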
Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.
Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.
Rules and controls around generation (filters, validators, structured outputs) to reduce unsafe or invalid behavior.
Selecting the most informative samples to label (e.g., uncertainty sampling) to reduce labeling cost.
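Uncertainty sampling, the example named above, can be sketched by ranking unlabeled samples by the entropy of the model's predicted class probabilities and labeling the top-k. The helper names below are illustrative assumptions:

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a discrete probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(predictions, k):
    """Return indices of the k samples whose predictions have highest entropy.

    predictions: list of per-sample class-probability vectors.
    """
    ranked = sorted(range(len(predictions)),
                    key=lambda i: entropy(predictions[i]),
                    reverse=True)
    return ranked[:k]
```

A near-uniform prediction like [0.5, 0.5] has maximal entropy and is selected first; a confident one like [0.99, 0.01] is deferred.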
Ordering training samples from easier to harder to improve convergence or generalization.
Measures how one probability distribution diverges from another.
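For discrete distributions this divergence has a direct closed form, sketched below (function name is an illustrative assumption):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions given as probability lists.

    Terms with p_i == 0 contribute 0 by convention; q_i must be > 0
    wherever p_i > 0 for the divergence to be finite.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Note that it is not symmetric: KL(p || q) generally differs from KL(q || p), which is why it measures how p diverges *from* q rather than a distance between them.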
Formal framework for sequential decision-making under uncertainty.
Set of all actions available to the agent.
Expected cumulative reward from a state or state-action pair.
Combines value estimation (critic) with policy learning (actor).
Optimizing policies directly via gradient ascent on expected reward.
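The gradient being ascended here is given by the policy gradient theorem, stated below in standard notation (symbols assumed, not from the source): the gradient of the expected return \(J(\theta)\) is an expectation of the score function weighted by the action value.

```latex
\nabla_\theta J(\theta) =
  \mathbb{E}_{\pi_\theta}\!\left[
    \nabla_\theta \log \pi_\theta(a \mid s)\; Q^{\pi_\theta}(s, a)
  \right]
```

Replacing \(Q^{\pi_\theta}\) with a sampled return gives REINFORCE; replacing it with a learned critic gives actor-critic methods like the entry below.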
Multiple agents interacting cooperatively or competitively.
Models trained to decide when and how to call external tools.
Models that learn to generate samples resembling training data.