AI Glossary

Multi-Armed Bandit

A simplified RL framework for making sequential decisions among multiple options with uncertain rewards.

Overview

The multi-armed bandit problem is a simplified reinforcement learning framework where an agent must choose between multiple actions (arms), each with an unknown reward distribution, to maximize cumulative reward over time. It captures the essential exploration-exploitation tradeoff without the complexity of state transitions.
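The setting above can be sketched in a few lines. This is a minimal illustration, not a canonical benchmark: the three arm probabilities in `true_probs` are made up for the demo, and the agent's task is to approach the best arm's payout rate without knowing them.

```python
import random

# Hypothetical 3-armed Bernoulli bandit: each arm pays 1 with a fixed
# probability that is unknown to the agent (fixed here only for the demo).
true_probs = [0.2, 0.5, 0.7]

def pull(arm):
    """Pull one arm and observe a stochastic 0/1 reward."""
    return 1 if random.random() < true_probs[arm] else 0

# An oracle that always pulls the best arm earns ~0.7 per step on average;
# a bandit algorithm must approach that rate while learning which arm it is.
random.seed(0)
rewards = [pull(2) for _ in range(1000)]
print(sum(rewards) / len(rewards))  # empirical mean, close to 0.7
```

Because there is a single repeated decision and no state transitions, the entire problem reduces to estimating these reward rates while still earning reward along the way.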

Key Details

Named after a gambler choosing among several slot machines ("one-armed bandits"), the problem is addressed by algorithms such as epsilon-greedy, UCB (Upper Confidence Bound), and Thompson sampling. Contextual bandits extend the framework by incorporating side information (context) about each decision. Applications include ad placement, website optimization, clinical trials, recommendation systems, and dynamic pricing.
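Of the algorithms named above, epsilon-greedy is the simplest: with probability epsilon pull a random arm (explore), otherwise pull the arm with the highest estimated mean reward (exploit). A minimal sketch, assuming made-up Bernoulli arm probabilities and an arbitrary epsilon of 0.1:

```python
import random

def epsilon_greedy(pull, n_arms, steps, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a bandit exposed via pull(arm) -> reward."""
    rng = random.Random(seed)
    counts = [0] * n_arms      # pulls per arm
    values = [0.0] * n_arms    # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        r = pull(arm)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]  # incremental mean
        total += r
    return values, total

# Demo on a hypothetical 3-armed Bernoulli bandit.
reward_rng = random.Random(1)
true_probs = [0.2, 0.5, 0.7]
values, total = epsilon_greedy(
    lambda a: 1 if reward_rng.random() < true_probs[a] else 0,
    n_arms=3, steps=5000)
print(values)  # estimates approach true_probs for well-sampled arms
```

UCB and Thompson sampling replace the fixed random-exploration rate with exploration driven by uncertainty: UCB adds a confidence bonus to each arm's estimate, while Thompson sampling draws arm parameters from a posterior and picks the argmax.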

Related Concepts

Exploration-exploitation, reinforcement learning, A/B testing (ML)


Last updated: March 5, 2026