AI Glossary

SARSA

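SARSA (State-Action-Reward-State-Action) is an on-policy reinforcement learning algorithm that updates Q-values using the action actually taken in the next state. Its name lists the five quantities each update uses: the current state and action, the reward received, and the next state and action.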
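For reference, the standard tabular SARSA update for a transition from state $s_t$ and action $a_t$, with reward $r_{t+1}$, to $s_{t+1}$ and the next chosen action $a_{t+1}$ is shown below. Here $\alpha$ is a step size and $\gamma$ a discount factor; this is common textbook notation rather than anything specific to this entry.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]
```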

How It Differs from Q-Learning

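Q-learning (off-policy) updates toward the maximum Q-value over all actions in the next state, regardless of which action the agent will actually take there. SARSA (on-policy) updates toward the Q-value of the action the agent actually takes in the next state under its current policy, including exploratory actions (for example, under ε-greedy exploration). Because SARSA's estimates account for those occasional random actions, it learns the value of the policy it is actually following. This makes SARSA more conservative and safer in environments where exploratory mistakes are costly.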
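A minimal Python sketch of the two bootstrap targets, assuming a NumPy Q-table indexed as `Q[state, action]`. The function names `sarsa_target` and `q_learning_target` and the toy Q-values are hypothetical, chosen only to show how the targets diverge when exploration picks a poor action in the next state:

```python
import numpy as np


def sarsa_target(Q, reward, next_state, next_action, gamma, done):
    """On-policy TD target: bootstrap from the action the agent actually takes in next_state."""
    if done:
        return reward
    return reward + gamma * Q[next_state, next_action]


def q_learning_target(Q, reward, next_state, gamma, done):
    """Off-policy TD target: bootstrap from the best action value in next_state."""
    if done:
        return reward
    return reward + gamma * Q[next_state].max()


# Toy Q-table with 2 states x 2 actions (values invented for illustration).
# In state 1, action 0 looks good (10.0) and action 1 is dangerous (-100.0).
Q = np.array([[0.0, 0.0],
              [10.0, -100.0]])

# Suppose epsilon-greedy exploration picks the dangerous action 1 in state 1.
print(sarsa_target(Q, reward=-1.0, next_state=1, next_action=1, gamma=0.9, done=False))  # -91.0
print(q_learning_target(Q, reward=-1.0, next_state=1, gamma=0.9, done=False))            # 8.0
```

Q-learning's target ignores that the agent is about to take the -100 action, so its estimate stays optimistic; SARSA's target charges that exploratory cost back to the preceding state-action pair.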

Usage

SARSA is preferred when the agent's behavior during learning matters, for example in environments with cliffs or other dangerous states where an exploratory misstep next to a hazard could be catastrophic. Q-learning is preferred when the goal is to learn the optimal greedy policy regardless of how the agent explores.
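To make the cliff example concrete, the sketch below trains tabular SARSA and Q-learning with the same ε-greedy exploration on a cliff-walking gridworld modeled on the example in Sutton and Barto's Reinforcement Learning: An Introduction. The grid size, rewards (-1 per step, -100 for stepping into the cliff, which also sends the agent back to the start), hyperparameters (alpha=0.5, gamma=1.0, epsilon=0.1), and helper names are assumptions for illustration, not a reference implementation.

```python
import numpy as np

# Hypothetical cliff-walking setup, modeled on Sutton & Barto (Example 6.6):
# a 4 x 12 grid, start at bottom-left, goal at bottom-right, cliff in between.
ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Move one cell (clipped to the grid). Entering the cliff costs -100 and resets to START."""
    r = min(max(state[0] + ACTIONS[action][0], 0), ROWS - 1)
    c = min(max(state[1] + ACTIONS[action][1], 0), COLS - 1)
    if r == ROWS - 1 and 1 <= c <= COLS - 2:
        return START, -100.0, False
    return (r, c), -1.0, (r, c) == GOAL


def epsilon_greedy(Q, state, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else a greedy one (random tie-break)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(ACTIONS)))
    q = Q[state]
    return int(rng.choice(np.flatnonzero(q == q.max())))


def train(algorithm, episodes=500, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular TD control with an epsilon-greedy behavior policy.

    algorithm="sarsa" bootstraps from the next action actually chosen;
    anything else bootstraps from the greedy (max) next action, i.e. Q-learning.
    Returns the learned Q-table and the total reward of each episode.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((ROWS, COLS, len(ACTIONS)))
    returns = []
    for _ in range(episodes):
        state, total, done = START, 0.0, False
        action = epsilon_greedy(Q, state, epsilon, rng)
        while not done:
            next_state, reward, done = step(state, action)
            total += reward
            next_action = epsilon_greedy(Q, next_state, epsilon, rng)
            if done:
                target = reward  # terminal transition: nothing to bootstrap from
            elif algorithm == "sarsa":
                target = reward + gamma * Q[(*next_state, next_action)]
            else:
                target = reward + gamma * Q[next_state].max()
            sa = (*state, action)
            Q[sa] += alpha * (target - Q[sa])
            state, action = next_state, next_action
        returns.append(total)
    return Q, returns


def greedy_path_length(Q, max_steps=100):
    """Follow the greedy policy from START; return the number of steps to GOAL, or None."""
    state = START
    for t in range(1, max_steps + 1):
        state, _, done = step(state, int(np.argmax(Q[state])))
        if done:
            return t
    return None


for name in ("sarsa", "q_learning"):
    Q, returns = train(name)
    print(f"{name:>10}: mean return over last 100 episodes = {np.mean(returns[-100:]):.1f}, "
          f"greedy path length = {greedy_path_length(Q)}")
```

With exploration left on, SARSA typically earns higher returns per episode because it learns a route that keeps a buffer from the cliff, while Q-learning typically learns the shortest route along the edge (13 steps, the minimum here) and pays for occasional exploratory falls. Exact numbers vary with the seed and hyperparameters.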
