AI Glossary

Actor-Critic

An RL architecture combining a policy network (actor) with a value network (critic) for stable training.

Overview

Actor-critic is a reinforcement learning architecture that combines two neural networks: the actor (policy network) decides which action to take, and the critic (value network) evaluates how good the chosen action was. The critic's value estimates reduce the variance of policy gradient updates, leading to more stable and efficient training.
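The interaction between the two networks can be sketched with a minimal one-step actor-critic loop. This is an illustrative example, not from the glossary: it uses a toy two-state, two-action environment, tabular parameters in place of neural networks, and the TD error as the advantage signal that scales the actor's policy-gradient update.

```python
import numpy as np

# Minimal one-step actor-critic sketch (illustrative; toy environment,
# tabular "networks"). In state s, action a == s yields reward 1, else 0.

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
gamma, actor_lr, critic_lr = 0.9, 0.1, 0.1

theta = np.zeros((n_states, n_actions))  # actor: softmax policy logits
V = np.zeros(n_states)                   # critic: state-value estimates

def policy(s):
    z = theta[s] - theta[s].max()        # numerically stable softmax
    p = np.exp(z)
    return p / p.sum()

def step(s, a):
    reward = 1.0 if a == s else 0.0      # toy reward function (assumption)
    next_s = int(rng.integers(n_states)) # random next state
    return reward, next_s

s = 0
for _ in range(10_000):
    p = policy(s)
    a = rng.choice(n_actions, p=p)       # actor chooses the action
    r, s_next = step(s, a)

    # Critic evaluates the transition; the TD error serves as a
    # low-variance advantage estimate for the actor's update.
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += critic_lr * td_error

    # Actor follows the policy gradient, scaled by the critic's TD error:
    # grad of log softmax(theta[s])[a] is (one_hot(a) - p).
    grad_log = -p
    grad_log[a] += 1.0
    theta[s] += actor_lr * td_error * grad_log

    s = s_next

# The learned policy should prefer the rewarding action in each state.
print(policy(0)[0], policy(1)[1])
```

Because the critic's value estimate replaces raw sampled returns in the gradient, updates fluctuate far less than in a plain REINFORCE-style policy gradient.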

Key Details

Variants include A2C (Advantage Actor-Critic), A3C (Asynchronous Advantage Actor-Critic), SAC (Soft Actor-Critic), and PPO (Proximal Policy Optimization). The actor-critic framework is the basis for most modern RL algorithms, including those used in RLHF for language model alignment, robotic control, and game-playing AI.

Related Concepts

policy gradient, reinforcement learning, proximal policy optimization


Last updated: March 5, 2026