Decoder
In the transformer architecture, the decoder generates output tokens one at a time, using masked self-attention to keep each position from looking at future tokens, plus cross-attention over the encoder's output.
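To make the two attention steps concrete, here is a minimal sketch of one decoder layer in PyTorch. Layer norms, dropout, and feed-forward sizing are simplified, and the class and parameter names are illustrative, not taken from any particular library.

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, enc_out, causal_mask):
        # Masked self-attention: each position sees only itself and earlier positions.
        a, _ = self.self_attn(x, x, x, attn_mask=causal_mask)
        x = self.norm1(x + a)
        # Cross-attention: queries come from the decoder, keys/values from the encoder.
        a, _ = self.cross_attn(x, enc_out, enc_out)
        x = self.norm2(x + a)
        # Position-wise feed-forward network with a residual connection.
        return self.norm3(x + self.ff(x))
```

In a decoder-only model (discussed next), the cross-attention step is simply dropped, leaving only the masked self-attention and feed-forward sublayers.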
Decoder-Only Models
GPT, LLaMA, and most modern LLMs are decoder-only transformers. They generate text autoregressively: each new token is predicted from all previous tokens. This architecture excels at text generation tasks.
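The autoregressive loop itself is simple. Below is a minimal sketch of greedy decoding, assuming a `model` that maps a tensor of token ids to next-token logits; the function and argument names are illustrative.

```python
import torch

def generate(model, prompt_ids, max_new_tokens=20, eos_id=None):
    ids = prompt_ids  # shape (1, seq_len)
    for _ in range(max_new_tokens):
        logits = model(ids)                                   # (1, seq_len, vocab_size)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)  # greedy: pick the most likely token
        ids = torch.cat([ids, next_id], dim=1)                # feed the new token back in
        if eos_id is not None and next_id.item() == eos_id:
            break
    return ids
```

Real systems replace the `argmax` with sampling strategies (temperature, top-k, nucleus) and cache attention keys/values so each step does not reprocess the whole sequence.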
How Masked Attention Works
The decoder uses causal (masked) self-attention: each position can attend only to itself and earlier positions. This prevents the model from "cheating" by looking at tokens it has not yet generated, as the mask sketch below illustrates.
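The mask is just an upper-triangular matrix added to the attention scores, following PyTorch's additive-mask convention (the same `attn_mask` used in the decoder layer sketch above):

```python
import torch

seq_len = 5
# -inf above the diagonal blocks attention to future positions;
# 0 elsewhere leaves past and present positions unaffected.
mask = torch.triu(torch.full((seq_len, seq_len), float('-inf')), diagonal=1)
# mask[i, j] == -inf for j > i, so position i cannot attend to position j.
# Row 0 attends only to itself; row 4 attends to positions 0..4.
```

After the softmax, the `-inf` entries become zero attention weights, so future tokens contribute nothing.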
Encoder-Decoder vs Decoder-Only
Encoder-decoder models (T5, BART) process the input with an encoder and then generate output with a decoder that cross-attends to it. Decoder-only models instead treat the prompt and the generated text as one continuous sequence handled by a single causal stack. The trend has shifted toward decoder-only architectures for general-purpose LLMs.
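The difference shows up directly in how the two families are called. A sketch using Hugging Face Transformers, assuming the library is installed; the checkpoints are real (`t5-small`, `gpt2`) but the prompts are illustrative:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, AutoModelForCausalLM

# Encoder-decoder (T5): the input is encoded once, then the decoder generates.
t5_tok = AutoTokenizer.from_pretrained("t5-small")
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
t5_out = t5.generate(**t5_tok("translate English to German: Hello", return_tensors="pt"))

# Decoder-only (GPT-2): the prompt and the generation share one causal stack.
gpt_tok = AutoTokenizer.from_pretrained("gpt2")
gpt = AutoModelForCausalLM.from_pretrained("gpt2")
gpt_out = gpt.generate(**gpt_tok("Hello, my name is", return_tensors="pt"),
                       max_new_tokens=10)
```

In the seq2seq case the tokenized text is the encoder's input; in the causal case it is simply the prefix that the decoder continues.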