Actor-Critic
An RL architecture combining a policy network (actor) with a value network (critic) for stable training.
Overview
Actor-critic is a reinforcement learning architecture that combines two neural networks: the actor (policy network) decides which action to take, and the critic (value network) evaluates how good the chosen action was. The critic's value estimates reduce the variance of policy gradient updates, leading to more stable and efficient training.
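The interplay described above can be sketched in a few lines. The following is a minimal, hypothetical example (not any particular library's API): a softmax actor and a scalar critic trained on a toy single-state, two-armed bandit, where the critic's value estimate serves as the baseline that reduces the variance of the actor's policy-gradient update.

```python
import numpy as np

# Toy one-step actor-critic sketch (illustrative assumptions throughout):
# a single-state environment with two actions, where action 1 pays reward 1
# and action 0 pays reward 0. The actor is a softmax over logits `theta`;
# the critic is a single scalar value estimate `v` for the lone state.

rng = np.random.default_rng(0)
n_actions = 2
theta = np.zeros(n_actions)      # actor parameters (action logits)
v = 0.0                          # critic's state-value estimate
alpha_actor, alpha_critic = 0.1, 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(n_actions, p=probs)
    r = 1.0 if a == 1 else 0.0   # environment step

    advantage = r - v            # critic baseline: "how much better than expected?"

    # Critic update: move the value estimate toward the observed return.
    v += alpha_critic * advantage

    # Actor update: policy gradient, scaled by the advantage rather than
    # the raw return, which is what lowers the variance of the update.
    grad_log = -probs
    grad_log[a] += 1.0           # gradient of log softmax-policy at action a
    theta += alpha_actor * advantage * grad_log

print(softmax(theta))            # policy should strongly prefer action 1
```

After training, the actor concentrates probability on the rewarding arm, and the critic's `v` settles near the policy's expected reward; subtracting it from the return is exactly the variance-reduction role the critic plays in full-scale actor-critic methods.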
Key Details
Variants include A2C (Advantage Actor-Critic), A3C (Asynchronous Advantage Actor-Critic), SAC (Soft Actor-Critic), and PPO (Proximal Policy Optimization). The actor-critic framework is the basis for most modern RL algorithms, including those used in RLHF for language model alignment, robotic control, and game-playing AI.
Related Concepts
policy gradient • reinforcement learning • proximal policy optimization