AI Glossary

DeBERTa

A BERT variant using disentangled attention that separates content and position information.

Overview

DeBERTa (Decoding-enhanced BERT with disentangled Attention), developed by Microsoft, improves the transformer architecture with disentangled attention: each token is represented by two separate vectors, one for content and one for (relative) position. Each attention score is then computed as a sum of content-to-content, content-to-position, and position-to-content interactions, rather than from a single mixed embedding.
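The score computation described above can be sketched numerically. This is a minimal illustration, not DeBERTa's actual implementation: the function name, weight matrices, and the simple clipping of relative distances are all assumptions made for clarity.

```python
import numpy as np

def disentangled_scores(H, R, Wq, Wk, Wq_r, Wk_r):
    """Toy sketch of DeBERTa-style disentangled attention scores.

    H:  (n, d)  content vectors, one per token
    R:  (2k, d) relative-position embeddings for distances in [-k, k)
    W*: (d, d)  separate projections for content and position
    (Illustrative names; not the real DeBERTa code.)
    """
    Qc, Kc = H @ Wq, H @ Wk        # content projections
    Qr, Kr = R @ Wq_r, R @ Wk_r    # position projections
    n, d = H.shape
    k = R.shape[0] // 2
    # relative distance index delta(i, j) = i - j, shifted and clipped to [0, 2k)
    idx = np.clip(np.arange(n)[:, None] - np.arange(n)[None, :] + k, 0, 2 * k - 1)
    c2c = Qc @ Kc.T                                        # content-to-content
    c2p = np.take_along_axis(Qc @ Kr.T, idx, axis=1)       # content-to-position
    p2c = np.take_along_axis(Kc @ Qr.T, idx.T, axis=1).T   # position-to-content
    # the paper scales by sqrt(3d) since three score terms are summed
    return (c2c + c2p + p2c) / np.sqrt(3 * d)
```

The position-to-position term is omitted, as in the paper, because relative-position embeddings alone carry no token content.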

Key Details

DeBERTa also introduces an enhanced mask decoder that injects absolute position information just before the masked-token prediction layer, complementing the relative positions used in attention. A scaled-up DeBERTa was among the first models to surpass the human baseline on the SuperGLUE benchmark, and DeBERTa V3, which swaps masked language modeling for ELECTRA-style replaced-token-detection pretraining, remains one of the strongest encoder models for natural language understanding tasks.

Related Concepts

BERT, attention mechanism, transformer
Last updated: March 5, 2026