AI Glossary

Elo Rating (for LLMs)

A rating system adapted from chess that ranks AI models by the outcomes of head-to-head comparisons judged by human evaluators. It is best known from the Chatbot Arena leaderboard.

How It Works

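Human evaluators compare responses from two anonymous models to the same prompt and select the better one. After each comparison the winner gains Elo points and the loser gives them up. The size of the transfer depends on how surprising the result was: an upset (a lower-rated model beating a higher-rated one) moves the ratings more than an expected win.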
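
The update described above can be sketched with the classic chess Elo formula. This is a minimal illustration only: the function names, the K-factor of 32, and the 400-point scale are conventional chess defaults assumed here, not values confirmed for any particular leaderboard.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve.

    The 400-point scale is the chess convention (an assumption here).
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A's response was preferred, 0.0 if B's was.
    k (assumed 32 here) caps how many points a single result can move.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Points are transferred: whatever A gains, B loses, and vice versa.
    return rating_a + delta, rating_b - delta


# A 1000-rated model faces a 1200-rated model.
upset_a, upset_b = update_elo(1000.0, 1200.0, score_a=1.0)        # underdog wins
expected_a, expected_b = update_elo(1000.0, 1200.0, score_a=0.0)  # favorite wins

print(f"Underdog gains {upset_a - 1000.0:.1f} points for the upset")
print(f"Favorite gains {expected_b - 1200.0:.1f} points for the expected win")
```

In this sketch the underdog gains roughly three times as many points for the upset as the favorite gains for the expected win, which is the asymmetry the paragraph above describes.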

Chatbot Arena

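LMSYS's Chatbot Arena is the most widely cited LLM leaderboard built on Elo-style ratings. Many practitioners consider it a better indicator of real-world usefulness than automated benchmarks, because its rankings come from the preferences of real users on prompts they choose themselves rather than from a fixed test set.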
