AI Glossary

Decoder

In the transformer architecture, a decoder generates output tokens one at a time, using masked self-attention to prevent attending to future tokens and cross-attention to incorporate the encoder's output.

Decoder-Only Models

GPT, LLaMA, and most modern LLMs are decoder-only transformers. They generate text autoregressively: each new token is predicted from all previous tokens. This architecture excels at text generation tasks.

How Masked Attention Works

The decoder uses causal (masked) self-attention: each position can only attend to earlier positions. This prevents the model from 'cheating' by looking at tokens it hasn't generated yet.
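The causal mask can be made concrete with a small example. The sketch below uses plain Python and, for clarity, assumes uniform raw attention scores; a real model would compute scores from query-key dot products.

```python
import math

def causal_mask(n: int) -> list[list[bool]]:
    # mask[i][j] is True when position i may attend to position j,
    # i.e. only when j <= i (no looking at future tokens).
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores: list[float], allowed: list[bool]) -> list[float]:
    # Disallowed positions contribute zero weight, equivalent to
    # setting their scores to -infinity before the softmax.
    exps = [math.exp(s) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]

n = 4
mask = causal_mask(n)
scores = [0.0] * n  # uniform scores, purely for illustration
weights = [masked_softmax(scores, mask[i]) for i in range(n)]
```

With uniform scores, row 0 puts all its attention on position 0, while row 3 spreads attention evenly over positions 0 through 3; no row ever assigns weight to a later position.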

Encoder-Decoder vs Decoder-Only

Encoder-decoder models (T5, BART) process the input with an encoder and then generate the output with a decoder. Decoder-only models handle both the input and the output with a single decoder stack. The trend has shifted toward decoder-only for general-purpose LLMs.


Last updated: March 5, 2026