AI Glossary

SARSA

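State-Action-Reward-State-Action: an on-policy reinforcement learning algorithm that updates its Q-values (action-value estimates) using the action the agent actually takes in the next state, as chosen by its current policy. The name comes from the sequence (state, action, reward, next state, next action) used in each update.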
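In its standard tabular form, SARSA applies the following update after each transition (s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}). Here α is the step size and γ the discount factor; both are standard notation not otherwise defined in this entry.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]
```

Q-learning's update has the same form, with Q(s_{t+1}, a_{t+1}) replaced by the maximum of Q(s_{t+1}, a) over all actions a.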

How It Differs from Q-Learning

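Q-learning (off-policy) updates toward the maximum Q-value over all actions in the next state, regardless of which action the agent actually takes there. SARSA (on-policy) updates toward the Q-value of the action actually taken, including exploratory ones such as the random actions of an ε-greedy policy. Because SARSA's estimates therefore account for the cost of its own exploration, it tends to learn more conservative behavior that keeps away from states where a single exploratory mistake is expensive.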
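The difference comes down to which next-state value goes into the update target. A minimal sketch, assuming a NumPy Q-table indexed as q[state, action]; the helper names sarsa_target and q_learning_target and all the numbers are hypothetical, chosen so that the next action is an exploratory move toward a dangerous outcome:

```python
import numpy as np


def sarsa_target(q, reward, s_next, a_next, gamma):
    """On-policy target: bootstrap from the action the agent actually takes next."""
    return reward + gamma * q[s_next, a_next]


def q_learning_target(q, reward, s_next, gamma):
    """Off-policy target: bootstrap from the best action in the next state."""
    return reward + gamma * np.max(q[s_next])


# Hypothetical Q-table with 2 states x 2 actions. In state 1, action 0 is
# safe (value 1.0) and action 1 leads somewhere dangerous (value -10.0).
q = np.array([[0.0, 0.0],
              [1.0, -10.0]])
reward, gamma = -1.0, 0.9

# Suppose the epsilon-greedy policy explores and picks action 1 in state 1.
sarsa = sarsa_target(q, reward, s_next=1, a_next=1, gamma=gamma)  # -1 + 0.9 * (-10)
q_learn = q_learning_target(q, reward, s_next=1, gamma=gamma)     # -1 + 0.9 * 1
print(f"SARSA target: {sarsa:.2f}, Q-learning target: {q_learn:.2f}")
```

SARSA's target is dragged down by the bad exploratory action, while Q-learning's target ignores it because it always bootstraps from the best available action. Repeated over many updates, this is what makes SARSA's value estimates more cautious near dangerous states.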

Usage

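SARSA is preferred when the consequences of exploration matter, for example in environments with cliffs or other dangerous states, where a policy that ignores its own exploratory mistakes could be catastrophic during learning. Q-learning is preferred when the goal is to learn the optimal (greedy) policy regardless of the exploration policy used to gather experience, provided every state-action pair continues to be visited.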
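A minimal sketch of the cliff scenario, assuming a 4 x 12 gridworld modeled on the cliff-walking example from Sutton and Barto's Reinforcement Learning: An Introduction. Every step costs -1, stepping into the cliff along the bottom row costs -100 and sends the agent back to the start, and both agents explore ε-greedily. The grid layout, hyperparameters, and function names (step, epsilon_greedy, train, greedy_path) are illustrative choices, not a reference implementation.

```python
import numpy as np

# Illustrative cliff-walking gridworld (assumed layout): 4 rows x 12 columns,
# start at bottom-left, goal at bottom-right, cliff cells in between.
ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
ACTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left


def step(state, action):
    """Apply an action; walls clip movement, and the cliff costs -100 and resets to START."""
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), ROWS - 1)
    c = min(max(state[1] + dc, 0), COLS - 1)
    if r == ROWS - 1 and 1 <= c <= COLS - 2:
        return START, -100.0
    return (r, c), -1.0


def epsilon_greedy(q, state, epsilon, rng):
    """Random action with probability epsilon, otherwise the greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(q[state]))


def train(method, episodes=500, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular control with an epsilon-greedy behavior policy; method is "sarsa" or "q_learning"."""
    rng = np.random.default_rng(seed)
    q = np.zeros((ROWS, COLS, len(ACTIONS)))
    for _ in range(episodes):
        state = START
        action = epsilon_greedy(q, state, epsilon, rng)
        while state != GOAL:
            next_state, reward = step(state, action)
            # Both methods act with the same epsilon-greedy policy...
            next_action = epsilon_greedy(q, next_state, epsilon, rng)
            if next_state == GOAL:
                target = reward  # terminal state: nothing to bootstrap from
            elif method == "sarsa":
                # ...but SARSA bootstraps from the action it will actually take,
                target = reward + gamma * q[(*next_state, next_action)]
            else:
                # ...while Q-learning bootstraps from the best action.
                target = reward + gamma * np.max(q[next_state])
            q[(*state, action)] += alpha * (target - q[(*state, action)])
            state, action = next_state, next_action
    return q


def greedy_path(q, max_steps=100):
    """Follow the greedy policy from START (capped in case it never reaches GOAL)."""
    state, path = START, [START]
    for _ in range(max_steps):
        if state == GOAL:
            break
        state, _ = step(state, int(np.argmax(q[state])))
        path.append(state)
    return path


if __name__ == "__main__":
    print("SARSA greedy path:     ", greedy_path(train("sarsa")))
    print("Q-learning greedy path:", greedy_path(train("q_learning")))
```

With settings like these, SARSA typically settles on a path that detours away from the cliff edge, while Q-learning learns the shorter path right along it; during training, Q-learning's ε-greedy exploration occasionally steps off the cliff, so its per-episode returns tend to be worse. The exact paths depend on the seed and the number of episodes.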


Last updated: March 5, 2026