AI Glossary

Exploration vs Exploitation

The fundamental RL dilemma between trying new actions (exploration) and using known good actions (exploitation).

Overview

The exploration-exploitation tradeoff is a fundamental challenge in reinforcement learning and decision-making: should an agent try new, potentially better actions (explore) or stick with the best known action (exploit)? Too much exploration wastes time on suboptimal actions; too much exploitation misses potentially better strategies.

Key Details

Strategies for balancing this tradeoff include epsilon-greedy (random exploration with probability epsilon), UCB (Upper Confidence Bound, which explores uncertain actions), Thompson sampling (sampling from posterior distributions), and curiosity-driven exploration (rewarding novel states). This tradeoff appears in many real-world systems including A/B testing, recommendation systems, and clinical trials.
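Of the strategies above, epsilon-greedy is the simplest to sketch. The following is a minimal illustration on a hypothetical 3-armed bandit: the arm names, reward probabilities, and update rule are assumptions for the example, not part of any particular library.

```python
import random

def epsilon_greedy_action(q_values, epsilon, rng):
    """With probability epsilon pick a random arm (explore);
    otherwise pick the arm with the highest estimated value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))          # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

# Hypothetical bandit: three arms with fixed win probabilities
# that are unknown to the agent.
true_probs = [0.2, 0.5, 0.8]
q_values = [0.0] * 3   # running estimate of each arm's value
counts = [0] * 3       # number of pulls per arm

rng = random.Random(0)
for _ in range(5000):
    a = epsilon_greedy_action(q_values, epsilon=0.1, rng=rng)
    reward = 1.0 if rng.random() < true_probs[a] else 0.0
    counts[a] += 1
    # Incremental mean update: Q <- Q + (r - Q) / n
    q_values[a] += (reward - q_values[a]) / counts[a]
```

After enough pulls, the agent spends roughly a fraction epsilon of its steps exploring and the rest exploiting the arm whose estimated value is highest, so most pulls concentrate on the truly best arm.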

Related Concepts

reinforcement learning, multi-armed bandit, Q-learning


Last updated: March 5, 2026