Long Short-Term Memory (LSTM)
A recurrent neural network architecture designed to learn long-range dependencies in sequential data, using gating mechanisms to control information flow.
Architecture
LSTMs have three gates that regulate information flow. Forget gate: decides what to discard from the cell state. Input gate: decides what new information to store. Output gate: decides what part of the cell state to expose as the hidden state. The cell state acts as a highway for information across timesteps, mitigating the vanishing gradient problem that plagues plain recurrent networks.
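The gating described above can be sketched as a single LSTM step. This is a minimal NumPy illustration of the standard LSTM equations, not a production implementation; the weight names, dimensions, and initialization are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM timestep: gates decide what the cell state forgets, stores, and emits."""
    Wf, Wi, Wo, Wc, bf, bi, bo, bc = params
    z = np.concatenate([h_prev, x])   # combined input [h_{t-1}; x_t]
    f = sigmoid(Wf @ z + bf)          # forget gate: what to discard from cell state
    i = sigmoid(Wi @ z + bi)          # input gate: what new information to store
    o = sigmoid(Wo @ z + bo)          # output gate: what to output
    c_tilde = np.tanh(Wc @ z + bc)    # candidate cell update
    c = f * c_prev + i * c_tilde      # cell state "highway": mostly additive updates
    h = o * np.tanh(c)                # new hidden state
    return h, c

# Tiny usage example with random weights (hidden size 4, input size 3 are arbitrary).
rng = np.random.default_rng(0)
H, X = 4, 3
params = [rng.standard_normal((H, H + X)) * 0.1 for _ in range(4)] \
       + [np.zeros(H) for _ in range(4)]
h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((5, X)):  # run over a 5-step sequence
    h, c = lstm_step(x, h, c, params)
```

Note that the cell update `c = f * c_prev + i * c_tilde` is additive rather than a repeated matrix multiplication, which is why gradients can flow across many timesteps without vanishing.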
Historical Importance
Invented by Hochreiter & Schmidhuber (1997). Dominated sequence tasks (machine translation, speech recognition, text generation) from roughly 2014 to 2017. Google's Neural Machine Translation system used LSTMs for Google Translate before switching to Transformers.
Current Status
Largely superseded by Transformers for most NLP tasks. Still used in some time-series applications, embedded systems, and scenarios where training data or compute is limited. Transformers process all timesteps of a sequence in parallel during training, whereas LSTMs must step through time sequentially; this parallelism advantage is decisive for large-scale training.