Decoder
In the transformer architecture, the decoder generates output tokens one at a time, using masked self-attention to keep each position from looking at future tokens, plus cross-attention over the encoder's output.
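To make the two attention steps concrete, here is a minimal sketch of one decoder layer in PyTorch. Layer norms, dropout, and feed-forward sizing are simplified, and the class and parameter names are illustrative, not taken from any particular library.

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, enc_out, causal_mask):
        # Masked self-attention: each position sees only itself and earlier positions.
        a, _ = self.self_attn(x, x, x, attn_mask=causal_mask)
        x = self.norm1(x + a)
        # Cross-attention: queries come from the decoder, keys/values from the encoder.
        a, _ = self.cross_attn(x, enc_out, enc_out)
        x = self.norm2(x + a)
        # Position-wise feed-forward network with a residual connection.
        return self.norm3(x + self.ff(x))
```

In a decoder-only model (discussed next), the cross-attention step is simply dropped, leaving only the masked self-attention and feed-forward sublayers.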
Decoder-Only Models
GPT, LLaMA, and most modern LLMs are decoder-only transformers. They generate text autoregressively: each new token is predicted from all previous tokens. This architecture excels at text generation tasks.
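The autoregressive loop itself is simple. Below is a minimal sketch of greedy decoding, assuming a `model` that maps a tensor of token ids to next-token logits; the function and argument names are illustrative.

```python
import torch

def generate(model, prompt_ids, max_new_tokens=20, eos_id=None):
    ids = prompt_ids  # shape (1, seq_len)
    for _ in range(max_new_tokens):
        logits = model(ids)                                   # (1, seq_len, vocab_size)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)  # greedy: pick the most likely token
        ids = torch.cat([ids, next_id], dim=1)                # feed the new token back in
        if eos_id is not None and next_id.item() == eos_id:
            break
    return ids
```

Real systems replace the `argmax` with sampling strategies (temperature, top-k, nucleus) and cache attention keys/values so each step does not reprocess the whole sequence.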
How Masked Attention Works
The decoder uses causal (masked) self-attention: each position can attend only to itself and earlier positions. This prevents the model from "cheating" by looking at tokens it has not yet generated, as the mask sketch below illustrates.
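The mask is just an upper-triangular matrix added to the attention scores, following PyTorch's additive-mask convention (the same `attn_mask` used in the decoder layer sketch above):

```python
import torch

seq_len = 5
# -inf above the diagonal blocks attention to future positions;
# 0 elsewhere leaves past and present positions unaffected.
mask = torch.triu(torch.full((seq_len, seq_len), float('-inf')), diagonal=1)
# mask[i, j] == -inf for j > i, so position i cannot attend to position j.
# Row 0 attends only to itself; row 4 attends to positions 0..4.
```

After the softmax, the `-inf` entries become zero attention weights, so future tokens contribute nothing.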
Encoder-Decoder vs Decoder-Only
Encoder-decoder models (T5, BART) process the input with an encoder and then generate output with a decoder that cross-attends to it. Decoder-only models instead treat the prompt and the generated text as one continuous sequence handled by a single causal stack. The trend has shifted toward decoder-only architectures for general-purpose LLMs.
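The difference shows up directly in how the two families are called. A sketch using Hugging Face Transformers, assuming the library is installed; the checkpoints are real (`t5-small`, `gpt2`) but the prompts are illustrative:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, AutoModelForCausalLM

# Encoder-decoder (T5): the input is encoded once, then the decoder generates.
t5_tok = AutoTokenizer.from_pretrained("t5-small")
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
t5_out = t5.generate(**t5_tok("translate English to German: Hello", return_tensors="pt"))

# Decoder-only (GPT-2): the prompt and the generation share one causal stack.
gpt_tok = AutoTokenizer.from_pretrained("gpt2")
gpt = AutoModelForCausalLM.from_pretrained("gpt2")
gpt_out = gpt.generate(**gpt_tok("Hello, my name is", return_tensors="pt"),
                       max_new_tokens=10)
```

In the seq2seq case the tokenized text is the encoder's input; in the causal case it is simply the prefix that the decoder continues.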