AI Glossary

Constitutional AI

An alignment technique developed by Anthropic in which an AI system is trained to follow a set of written principles (a 'constitution') instead of relying solely on human feedback.

How It Works

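Training starts from a model that has already been fine-tuned to be helpful. In a supervised phase, the model responds to potentially harmful prompts, is asked to critique and revise its own responses according to principles drawn from the constitution, and is then fine-tuned on the revised responses. In a reinforcement learning phase, the model compares pairs of its own responses against the principles, and these AI-generated preferences train a preference model that supplies the reward signal (reinforcement learning from AI feedback, or RLAIF). This self-improvement loop replaces much of the human labeling of harmful outputs that traditional RLHF requires.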
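A minimal Python sketch of the supervised critique-and-revise step follows. The `model` argument is a hypothetical stand-in for any text-generation function, and the prompt templates, the `rounds` parameter, and sampling one principle per round are illustrative assumptions, not a published implementation.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical interface: any function that maps a prompt string to generated text.
Model = Callable[[str], str]


def critique_and_revise(
    model: Model,
    prompt: str,
    principles: List[str],
    rounds: int = 2,
    seed: int = 0,
) -> str:
    """Draft a response, then repeatedly critique and revise it.

    Each round samples one principle from the constitution.
    Prompt wording below is illustrative only (an assumption).
    """
    rng = random.Random(seed)
    response = model(prompt)  # initial draft from the helpful model
    for _ in range(rounds):
        principle = rng.choice(principles)
        # Ask the model to critique its current response against the principle.
        critique = model(
            f"Prompt: {prompt}\nResponse: {response}\n"
            f"Critique this response using the principle: {principle}"
        )
        # Ask the model to rewrite the response so it addresses the critique.
        response = model(
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    return response


def build_sft_dataset(
    model: Model, prompts: List[str], principles: List[str]
) -> List[Tuple[str, str]]:
    """Pair each prompt with its final revision for supervised fine-tuning."""
    return [(p, critique_and_revise(model, p, principles)) for p in prompts]
```

The `(prompt, revision)` pairs returned by `build_sft_dataset` would then serve as fine-tuning data; each round costs two extra model calls, one for the critique and one for the rewrite.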

The Constitution

The principles cover helpfulness, harmlessness, and honesty. Examples include: 'Choose the response that is least likely to be used for illegal purposes' and 'Choose the response that is most helpful while being safe.'
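Principles phrased as "Choose the response that..." fit the comparison step of Constitutional AI, where a model judges which of two candidate responses better follows a principle and the resulting AI preference labels train a reward model. The Python sketch below assumes a hypothetical `judge` function that returns text beginning with "A" or "B"; the question template and the `Preference` record are illustrative. It uses a hard A/B choice for simplicity, whereas a real pipeline may instead use the judge's probabilities for each option as soft labels.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: a judge model that reads a question and returns text.
Judge = Callable[[str], str]


@dataclass
class Preference:
    prompt: str
    chosen: str
    rejected: str
    principle: str


def label_pair(
    judge: Judge,
    prompt: str,
    a: str,
    b: str,
    principles: List[str],
    rng: random.Random,
) -> Preference:
    """Ask the judge which of two responses better satisfies one sampled principle."""
    principle = rng.choice(principles)
    # Illustrative multiple-choice template (an assumption, not a published prompt).
    question = (
        f"Consider this request: {prompt}\n"
        f"{principle}\n"
        f"(A) {a}\n(B) {b}\n"
        "Answer with A or B."
    )
    answer = judge(question).strip().upper()
    if answer.startswith("A"):
        chosen, rejected = a, b
    elif answer.startswith("B"):
        chosen, rejected = b, a
    else:
        raise ValueError(f"unparseable judge answer: {answer!r}")
    return Preference(prompt, chosen, rejected, principle)
```

A collection of `Preference` records like these would be the training data for the preference model whose scores serve as the reward during reinforcement learning.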

Advantages

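It reduces reliance on expensive human annotators and makes the alignment criteria explicit and auditable. Because the feedback is generated by a model rather than collected from people, it scales more readily than pure RLHF. The principles can also be revised, and the AI feedback regenerated, without collecting a new round of human labels.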


Last updated: March 5, 2026