Results for "self-reinforcement"


94 results

Q-Function Intermediate

Expected return of taking an action in a state and following a given policy thereafter.

AI Economics & Strategy
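As a rough illustration, Q(s, a) can be computed by a one-step lookahead when the transition probabilities and the state values V of the policy followed afterward are known. The tiny MDP, the table `V`, and the function name `q_value` below are hypothetical and exist only for this sketch.

```python
# Hypothetical tiny MDP: P[s][a] lists (probability, next_state, reward) outcomes.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)]},
}
# Assumed state values V(s) of the policy that is followed after the first action.
V = {0: 0.0, 1: 2.0}

def q_value(P, V, s, a, gamma=0.9):
    """One-step lookahead: Q(s, a) = sum over s' of P(s'|s,a) * (r + gamma * V(s'))."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

q_go = q_value(P, V, 0, "go")      # expected return of taking "go" in state 0
q_stay = q_value(P, V, 0, "stay")  # expected return of taking "stay" in state 0
```

Here "go" scores higher than "stay" because it usually leads to the more valuable state 1.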
Bellman Equation Intermediate

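Fundamental recursive relationship expressing a state's value as the immediate reward plus the discounted value of successor states; its optimality form defines the optimal value functions.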

AI Economics & Strategy
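One common way to see the Bellman optimality equation at work is value iteration, which applies the equation repeatedly as an update rule until the values stop changing. This is a minimal sketch using in-place sweeps; the two-state MDP and the function name `value_iteration` are illustrative assumptions.

```python
def value_iteration(P, gamma=0.9, tol=1e-8):
    """Apply the Bellman optimality backup
    V(s) <- max_a sum_{s'} P(s'|s,a) * (r + gamma * V(s'))
    in place until the largest change in one sweep falls below tol."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s, actions in P.items():
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

# Hypothetical MDP: from state 0, "go" moves to state 1 for reward 1;
# state 1 pays 1 per step forever, so V*(1) = 1 / (1 - 0.9) = 10.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 1.0)]},
}
V_star = value_iteration(P)
```

Because the backup is a contraction for gamma < 1, the loop converges to the unique fixed point of the equation.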
Off-Policy Learning Intermediate

Learning from data generated by a different policy.

AI Economics & Strategy
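Tabular Q-learning is a standard off-policy method: transitions come from a behavior policy (here uniformly random), while the update target assumes the greedy policy. The one-step task and the helper name `q_learning_update` are hypothetical, chosen to keep the sketch short.

```python
import random

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy TD update: the target bootstraps from the greedy action at s_next,
    whatever action the behavior policy actually takes there. s_next=None marks a terminal step."""
    future = 0.0 if s_next is None else max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * future - Q[(s, a)])

# Hypothetical one-step task: in state 0, action 1 pays 1.0, action 0 pays 0.0, then the episode ends.
rng = random.Random(0)
actions = [0, 1]
Q = {(0, 0): 0.0, (0, 1): 0.0}
for _ in range(2000):
    a = rng.choice(actions)              # behavior policy: uniformly random
    r = 1.0 if a == 1 else 0.0
    q_learning_update(Q, 0, a, r, None, actions)

greedy_action = max(actions, key=lambda b: Q[(0, b)])  # target policy: greedy in Q
```

The data never comes from the greedy policy, yet the learned Q-values identify it.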
Exploration-Exploitation Tradeoff Intermediate

Balancing exploration of untried actions (to gain information) against exploitation of actions already known to yield high reward.

AI Economics & Strategy
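Epsilon-greedy action selection is one simple way to strike this balance: with probability epsilon it explores a random arm, otherwise it exploits the arm with the best estimated reward. The Bernoulli bandit, its arm probabilities, and the function name are assumptions made for this sketch.

```python
import random

def epsilon_greedy_bandit(probs, steps=5000, epsilon=0.1, seed=0):
    """Play a hypothetical Bernoulli bandit (arm i pays 1 with probability probs[i])."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    values = [0.0] * len(probs)   # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(probs))                         # explore
        else:
            arm = max(range(len(probs)), key=lambda i: values[i])   # exploit
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]         # incremental mean
    return counts, values

counts, values = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

A larger epsilon learns the arm values faster but wastes more pulls on arms already known to be worse.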
Agent Loop Intermediate

Continuous cycle of observation, reasoning, action, and feedback.

AI Economics & Strategy
Alignment Problem Advanced

Ensuring AI systems pursue intended human goals.

AI Safety & Alignment
Embodied AI Advanced

AI systems that perceive and act in the physical world through sensors and actuators.

Robotics & Embodied AI
Dynamics Model Advanced

Predicts the next state given the current state and action.

Reinforcement Learning
Policy Search Advanced

Directly optimizing a policy's parameters to maximize expected return, rather than deriving the policy from a learned value function.

Reinforcement Learning
Sparse Reward Advanced

Reward given only rarely (e.g., only upon task completion), leaving most steps without a learning signal.

Reinforcement Learning
Shared Autonomy Frontier

Control shared between human and agent.

World Models & Cognition
Alignment Research Intermediate

Research aimed at ensuring AI systems remain safe and aligned with human intentions.

Governance & Ethics
Machine Learning Intermediate

A subfield of AI where models learn patterns from data to make predictions or decisions, improving with experience rather than explicit rule-coding.

Machine Learning
System Prompt Intermediate

A high-priority instruction layer setting overarching behavior constraints for a chat model.

Reinforcement Learning
Online Learning Intermediate

Learning where data arrives sequentially and the model updates continuously, often under changing distributions.

Machine Learning
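A minimal sketch of online learning is stochastic gradient descent that updates a model after each incoming example and never stores the dataset. The 1-D linear model, the synthetic stream from y = 2x + 1, and the name `online_sgd` are illustrative assumptions; a constant learning rate like this one is also what lets such a model keep tracking a slowly drifting distribution.

```python
import itertools

def online_sgd(stream, lr=0.05):
    """Update a 1-D linear model y ~ w*x + b after every (x, y) pair as it arrives,
    using one squared-error gradient step per sample (no stored dataset)."""
    w, b = 0.0, 0.0
    for x, y in stream:
        err = (w * x + b) - y
        w -= lr * err * x    # d/dw of 0.5 * err**2
        b -= lr * err        # d/db of 0.5 * err**2
    return w, b

# Hypothetical stream drawn from y = 2x + 1, arriving one pair at a time.
xs = itertools.islice(itertools.cycle([-1.0, 0.0, 1.0, 2.0]), 8000)
w, b = online_sgd((x, 2 * x + 1) for x in xs)
```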
DPO Intermediate

Direct Preference Optimization: a preference-based training method that optimizes a policy directly from pairwise comparisons, without training a separate reward model or running an explicit RL loop.

Optimization
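The per-pair DPO objective can be written as -log sigmoid(beta * margin), where the margin compares how much the trained policy raises the chosen response over a frozen reference model versus the rejected one. This sketch assumes the summed log-probabilities are already computed; the function names and the default beta are illustrative, not any particular library's API.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin), where margin is the
    policy-vs-reference log-ratio of the chosen response minus that of the rejected one.
    Inputs are summed token log-probabilities of each full response."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return softplus(-beta * margin)   # -log sigmoid(x) == softplus(-x)
```

When the policy equals the reference, the loss is log 2; it falls below that as the policy shifts probability toward the chosen response.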
Reward Model Intermediate

Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.

Foundations & Theory
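Reward models in RLHF-style pipelines are commonly trained on (chosen, rejected) pairs with a Bradley-Terry loss, -log sigmoid(r(chosen) - r(rejected)). The sketch below swaps a neural network for a linear reward over hypothetical hand-made features ([helpfulness, toxicity]) so the whole training loop fits in a few lines.

```python
import math

def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    """Fit a linear reward r(x) = w . x on (chosen, rejected) feature pairs
    by gradient descent on the Bradley-Terry loss -log sigmoid(r(chosen) - r(rejected))."""
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            diff = [c - r for c, r in zip(chosen, rejected)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Loss gradient w.r.t. w is -(1 - sigmoid(margin)) * diff, so descent adds it back.
            g = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            w = [wi + lr * g * di for wi, di in zip(w, diff)]
    return w

# Hypothetical preference data: features are [helpfulness, toxicity].
pairs = [([1.0, 0.0], [0.0, 0.0]), ([0.5, 0.0], [0.5, 1.0]), ([1.0, 0.2], [0.2, 0.8])]
w = train_reward_model(pairs, dim=2)
```

The learned weights reward helpfulness and penalize toxicity, matching the preferences in the data.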
Alignment Intermediate

Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.

Foundations & Theory
Guardrails Intermediate

Rules and controls around generation (filters, validators, structured outputs) to reduce unsafe or invalid behavior.

Reinforcement Learning
Active Learning Intermediate

Selecting the most informative samples to label (e.g., uncertainty sampling) to reduce labeling cost.

Foundations & Theory
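Uncertainty sampling can be sketched as ranking unlabeled items by the entropy of the model's predicted class probabilities and sending the most uncertain ones to annotators. The binary-classifier probabilities and function names below are hypothetical.

```python
import math

def entropy(p):
    """Binary entropy (in nats) of a predicted positive-class probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def select_for_labeling(pool_probs, k):
    """Uncertainty sampling: indices of the k pool items with the highest predictive entropy."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: entropy(pool_probs[i]), reverse=True)
    return ranked[:k]

probs = [0.98, 0.51, 0.10, 0.45, 0.70]   # hypothetical model predictions on unlabeled data
to_label = select_for_labeling(probs, 2)
```

Items the model is already confident about (0.98, 0.10) are skipped, since labeling them would teach it little.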
Curriculum Learning Intermediate

Ordering training samples from easier to harder to improve convergence or generalization.

Foundations & Theory
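One simple curriculum schedule sorts samples by a difficulty score and trains on a growing prefix of that ordering, stage by stage. Both the difficulty measure (here, sentence length) and the staging rule are assumptions for illustration; practical curricula vary widely.

```python
def curriculum_stages(samples, difficulty, n_stages):
    """Sort samples from easy to hard by a difficulty score and yield the
    training pool for each stage; each pool extends the previous one."""
    ordered = sorted(samples, key=difficulty)
    for stage in range(1, n_stages + 1):
        cutoff = round(len(ordered) * stage / n_stages)
        yield ordered[:cutoff]

# Hypothetical data, with difficulty measured as the number of words.
samples = ["a b c d", "a", "a b", "a b c"]
stages = list(curriculum_stages(samples, lambda s: len(s.split()), n_stages=2))
```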
KL Divergence Intermediate

Measures how one probability distribution diverges from another; it is asymmetric, so it is not a true distance metric.

AI Economics & Strategy
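For discrete distributions, D_KL(P || Q) = sum_i p_i log(p_i / q_i). A minimal sketch in nats, assuming both inputs are valid probability vectors of the same length:

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in nats for discrete distributions.
    Terms with p_i = 0 contribute 0; q_i must be > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
forward = kl_divergence(p, q)   # 0.5*ln(5/9) + 0.5*ln(5) = 0.5*ln(25/9)
reverse = kl_divergence(q, p)   # differs from forward: KL is asymmetric
```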
Markov Decision Process Intermediate

Formal framework for sequential decision-making under uncertainty.

AI Economics & Strategy
Action Space Intermediate

Set of all actions available to the agent.

AI Economics & Strategy
Value Function Intermediate

Expected cumulative (discounted) reward from a state or state-action pair under a given policy.

AI Economics & Strategy
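A state-value function can be estimated by Monte Carlo: average the discounted returns observed after each visit to a state. The episode format (lists of (state, reward) pairs) and the sample episodes are hypothetical.

```python
from collections import defaultdict

def mc_value_estimate(episodes, gamma=0.9):
    """Every-visit Monte Carlo: V(s) is the average discounted return observed after visiting s.
    Each episode is a list of (state, reward) pairs; reward is received on leaving that state."""
    totals, counts = defaultdict(float), defaultdict(int)
    for episode in episodes:
        G = 0.0
        for state, reward in reversed(episode):   # accumulate returns backward in time
            G = reward + gamma * G
            totals[state] += G
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}

episodes = [[("A", 0.0), ("B", 1.0)], [("A", 0.0), ("B", 0.0)]]
V = mc_value_estimate(episodes)
```

State A earns nothing itself but inherits discounted value from the reward that sometimes follows in B.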
Actor-Critic Intermediate

Combines value estimation (critic) with policy learning (actor).

AI Economics & Strategy
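A compact one-step actor-critic, reduced to a hypothetical single-state task so the whole loop stays small: the critic keeps a scalar estimate V of expected reward, and the actor's softmax policy is nudged along grad log pi(a), scaled by the critic's TD error. Learning rates, step count, and function names are illustrative assumptions.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def actor_critic_one_step(rewards, steps=3000, lr_actor=0.1, lr_critic=0.1, seed=0):
    """Critic: scalar V estimating expected reward. Actor: softmax policy over actions.
    rewards[a] is the (deterministic, hypothetical) reward of action a."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    V = 0.0
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]
        delta = rewards[a] - V          # TD error; the episode ends, so no bootstrap term
        V += lr_critic * delta          # critic: move V toward the observed reward
        for k in range(len(theta)):     # actor: d log pi(a) / d theta_k = 1[k == a] - pi(k)
            theta[k] += lr_actor * delta * ((1.0 if k == a else 0.0) - probs[k])
    return softmax(theta), V

probs, V = actor_critic_one_step([0.0, 1.0])
```

Using the critic's estimate as a baseline means choosing the worse action produces a negative TD error, which actively pushes the policy away from it.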
Policy Gradient Intermediate

Optimizing policies directly via gradient ascent on expected reward.

AI Economics & Strategy
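REINFORCE is the textbook policy-gradient estimator: sample an action, then step the policy parameters along R * grad log pi(a). This minimal sketch uses a softmax policy on a hypothetical two-armed task with deterministic rewards and no baseline.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_bandit(rewards, steps=2000, lr=0.1, seed=0):
    """Gradient ascent on expected reward: theta += lr * R * grad log pi(a).
    For a softmax policy, d log pi(a) / d theta_k = 1[k == a] - pi(k)."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]
        R = rewards[a]
        for k in range(len(theta)):
            theta[k] += lr * R * ((1.0 if k == a else 0.0) - probs[k])
    return softmax(theta)

final_probs = reinforce_bandit([0.0, 1.0])
```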
Multi-Agent System Intermediate

Multiple agents interacting cooperatively or competitively.

AI Economics & Strategy
Toolformer Intermediate

A language model trained, with self-supervision, to decide which external tools (APIs) to call, when to call them, and with what arguments.

AI Economics & Strategy
Generative Model Advanced

Models that learn to generate samples resembling training data.

Diffusion & Generative Models

Welcome to AI Glossary

The free, self-building AI dictionary.