AI Glossary

Positional Encoding

A mechanism that injects information about token positions into transformer models, since the attention mechanism itself has no inherent sense of order.

Why It's Needed

Unlike RNNs, which process tokens sequentially, transformers process all tokens in parallel. Without positional encoding, 'the cat sat on the mat' and 'mat the on sat cat the' would produce the same set of token representations, just in a different order, because attention is permutation-equivariant.
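The permutation-equivariance claim above can be checked directly. The following is a minimal NumPy sketch of single-head self-attention (with identity projections for brevity, a simplifying assumption): shuffling the input tokens merely shuffles the output rows the same way, so the model sees no order.

```python
import numpy as np

def attention(x):
    # Single-head self-attention with identity Q/K/V projections (simplified).
    scores = x @ x.T / np.sqrt(x.shape[1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))   # 5 tokens, 8-dim embeddings
perm = rng.permutation(5)     # a shuffled token order

out = attention(x)
out_perm = attention(x[perm])

# Permuting the input only permutes the output rows: no positional signal.
assert np.allclose(out[perm], out_perm)
```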

Types

Sinusoidal (original): Fixed mathematical patterns of different frequencies.

Learned: Trained embeddings for each position (BERT, GPT-2).

RoPE (Rotary): Encodes relative positions through rotation, enabling length generalization (used in LLaMA, Mistral).
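The original sinusoidal scheme can be sketched in a few lines of NumPy. Each even dimension gets sin(pos / 10000^(2i/d)) and each odd dimension the matching cosine, giving every position a unique pattern across frequencies:

```python
import numpy as np

def sinusoidal_encoding(num_positions, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    positions = np.arange(num_positions)[:, None]
    freqs = 1.0 / (10000 ** (np.arange(0, d_model, 2) / d_model))
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(positions * freqs)
    pe[:, 1::2] = np.cos(positions * freqs)
    return pe

pe = sinusoidal_encoding(50, 16)
```

The resulting matrix is simply added to the token embeddings before the first attention layer.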

Context Length Extension

RoPE and ALiBi (Attention with Linear Biases) support generalizing to longer sequences than seen during training: ALiBi extrapolates out of the box by penalizing attention scores in proportion to distance, while RoPE-based models are typically extended with techniques such as position interpolation. This is a key technique behind extending LLM context windows beyond their training length.
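The relative-position property that makes RoPE amenable to context extension can be sketched as follows. This is an illustrative NumPy implementation for a single vector (real implementations operate on batched query/key tensors): consecutive dimension pairs are rotated by position-dependent angles, and the resulting query-key dot product depends only on the offset between positions, not their absolute values.

```python
import numpy as np

def rope(x, pos, base=10000):
    # Rotate consecutive dimension pairs of x by angles pos * theta_i.
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)
    angles = pos * theta
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(1)
q, k = rng.normal(size=(2, 8))

# The attention score depends only on the relative offset (here, 3):
# positions (10, 7) and (103, 100) give the same dot product.
s1 = rope(q, 10) @ rope(k, 7)
s2 = rope(q, 103) @ rope(k, 100)
assert np.isclose(s1, s2)
```

Because scores depend only on relative offsets, rescaling the rotation angles (as position interpolation does) stretches the model's usable position range without retraining from scratch.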

Last updated: March 5, 2026