Self-Play
A training approach where an AI agent improves by playing against copies of itself.
Overview
Self-play is a training paradigm where an AI agent improves by competing or interacting with copies of itself (or previous versions), rather than requiring human-generated training data or fixed opponents. The agent generates its own training data through these self-interactions, creating a curriculum that automatically adapts to its current skill level.
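The loop described above can be sketched concretely. The following is a minimal illustration (not from any specific system): a single tabular value function plays both sides of tic-tac-toe against itself, with each finished game used to update the values of every position visited. All names and hyperparameters here are illustrative choices, not a reference implementation.

```python
import random

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for i, j, k in LINES:
        if board[i] != " " and board[i] == board[j] == board[k]:
            return board[i]
    return None

def legal_moves(board):
    return [i for i, c in enumerate(board) if c == " "]

# One shared value table: both "players" are the same agent (self-play).
Q = {}  # state string -> {move index: estimated value for the player to move}

def choose(board, player, eps):
    state = "".join(board) + player
    q = Q.setdefault(state, {m: 0.0 for m in legal_moves(board)})
    if random.random() < eps:          # explore occasionally
        return random.choice(list(q))
    return max(q, key=q.get)           # otherwise play greedily

def self_play_episode(eps=0.2, alpha=0.3):
    """Play one game against ourself; update every visited (state, move)."""
    board, player, history = [" "] * 9, "X", []
    while True:
        move = choose(board, player, eps)
        history.append(("".join(board) + player, move, player))
        board[move] = player
        w = winner(board)
        if w or " " not in board:
            # Terminal reward: +1 to the winner's moves, -1 to the loser's.
            for state, m, p in history:
                reward = 0.0 if w is None else (1.0 if p == w else -1.0)
                Q[state][m] += alpha * (reward - Q[state][m])
            return w
        player = "O" if player == "X" else "X"

if __name__ == "__main__":
    random.seed(0)
    for _ in range(20000):
        self_play_episode()
```

The self-generated curriculum is visible here: early games are effectively random, and as the table improves, later games are played between stronger and stronger versions of the same policy.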
Key Details
Self-play drove breakthroughs in game playing: AlphaGo Zero mastered Go from scratch, and AlphaZero extended the approach to chess and shogi, using only self-play with no human game data. It is increasingly applied to language model training as well: LLMs can use self-play for debate, red-teaming, and generating training data to improve reasoning. The key insight is that agents can discover strategies and knowledge beyond what is present in human-generated data.
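A toy example of that key insight, not drawn from the source: in fictitious self-play, the agent repeatedly best-responds to the empirical mix of its own past play. On rock-paper-scissors (a zero-sum game), no human examples are needed for the agent's average strategy to approach the uniform Nash equilibrium; it is discovered purely through self-interaction. The function name and round count below are illustrative.

```python
# Fictitious self-play on rock-paper-scissors (0=rock, 1=paper, 2=scissors).

def payoff(a, b):
    """+1 if action a beats action b, -1 if it loses, 0 on a tie."""
    return 0 if a == b else (1 if (a - b) % 3 == 1 else -1)

def fictitious_self_play(rounds):
    counts = [1, 1, 1]  # pseudo-count prior: one play of each action
    for _ in range(rounds):
        # Expected payoff of each action against the empirical mixture so far.
        ev = [sum(counts[b] * payoff(a, b) for b in range(3)) for a in range(3)]
        counts[max(range(3), key=lambda a: ev[a])] += 1  # best response
    total = sum(counts)
    return [c / total for c in counts]

freqs = fictitious_self_play(3000)  # each frequency ends up near 1/3
```

Each individual round is a deterministic best response, yet the long-run average self-corrects: any overplayed action is punished by the agent's own counter-play, which is the same pressure that pushes self-play systems toward robust strategies in richer games.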