AI Glossary

Constitutional AI

An alignment technique developed by Anthropic in which an AI system is trained to follow a set of written principles (a 'constitution') rather than relying solely on human feedback.

How It Works

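Training has two phases. In the supervised phase, a model that has already been trained to be helpful generates responses, critiques them against the principles, and revises them; the model is then fine-tuned on the revised responses. In the reinforcement learning phase, the model compares pairs of its own responses against the principles, and these AI-generated preferences replace much of the human labeling used in traditional RLHF.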
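The critique-and-revise step can be sketched in a few lines of Python. This is a minimal illustration, not Anthropic's implementation: the `generate` callable, the prompt wording, and `toy_model` are all hypothetical stand-ins for a real language model call.

```python
from typing import Callable, List

# Hypothetical stand-in for a language model call: prompt text in, text out.
Generate = Callable[[str], str]


def critique_and_revise(prompt: str, principles: List[str], generate: Generate) -> str:
    """Draft a response, then critique and revise it once per principle."""
    response = generate(prompt)
    for principle in principles:
        # Ask the model to critique its own response against one principle.
        critique = generate(
            f"Prompt: {prompt}\nResponse: {response}\n"
            f"Critique the response according to this principle: {principle}"
        )
        # Ask the model to rewrite the response in light of that critique.
        response = generate(
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    return response


def toy_model(text: str) -> str:
    """Toy model (an assumption) so the sketch runs without any external service."""
    if "Rewrite the response" in text:
        return "revised response"
    if "Critique the response" in text:
        return "critique"
    return "draft response"
```

Calling `critique_and_revise("Some question", ["Choose the response that is most helpful while being safe."], toy_model)` returns the final revision. In a real pipeline, the revised responses would be collected as fine-tuning data.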

The Constitution

The principles cover helpfulness, harmlessness, and honesty. Examples include: 'Choose the response that is least likely to be used for illegal purposes' and 'Choose the response that is most helpful while being safe.'
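Principles phrased as 'Choose the response that...' lend themselves to pairwise comparison, where a model is asked which of two responses better satisfies a principle. The sketch below shows one way this could look. The prompt template, the `judge` callable, and both function names are assumptions made for illustration, not a documented API.

```python
from typing import Callable, Tuple


def build_comparison_prompt(principle: str, prompt: str,
                            response_a: str, response_b: str) -> str:
    """Format one principle and two candidate responses as a multiple-choice question."""
    return (
        f"Consider the following conversation:\n{prompt}\n\n"
        f"{principle}\n"
        f"(A) {response_a}\n(B) {response_b}\n"
        "Answer with A or B."
    )


def label_preference(principle: str, prompt: str, response_a: str, response_b: str,
                     judge: Callable[[str], str]) -> Tuple[str, str]:
    """Return (chosen, rejected) based on the judge's answer.

    `judge` is a hypothetical stand-in for a language model call.
    """
    question = build_comparison_prompt(principle, prompt, response_a, response_b)
    answer = judge(question).strip().upper()
    if answer.startswith("A"):
        return response_a, response_b
    if answer.startswith("B"):
        return response_b, response_a
    raise ValueError(f"Unrecognized judge answer: {answer!r}")
```

Each (chosen, rejected) pair produced this way is an AI-generated preference label, taking the place of a comparison that a human annotator would otherwise provide.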

Advantages

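Constitutional AI reduces reliance on expensive human annotators, makes alignment criteria explicit and auditable, and scales better than pure RLHF because feedback is generated by a model rather than collected from people. The principles can also be updated without retraining from scratch, although applying new principles still requires regenerating feedback and fine-tuning again.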
