Long Short-Term Memory (LSTM)
A recurrent neural network architecture designed to learn long-range dependencies in sequential data, using gating mechanisms to control information flow.
Architecture
LSTMs have three gates that regulate information flow. Forget gate: decides what to discard from the cell state. Input gate: decides what new information to store. Output gate: decides what part of the cell state to expose as the hidden state. The cell state acts as a highway for information across timesteps, mitigating the vanishing gradient problem that plagues plain recurrent networks.
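The gating described above can be sketched as a single LSTM step. This is a minimal NumPy illustration of the standard LSTM equations, not a production implementation; the weight names, dimensions, and initialization are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM timestep: gates decide what the cell state forgets, stores, and emits."""
    Wf, Wi, Wo, Wc, bf, bi, bo, bc = params
    z = np.concatenate([h_prev, x])   # combined input [h_{t-1}; x_t]
    f = sigmoid(Wf @ z + bf)          # forget gate: what to discard from cell state
    i = sigmoid(Wi @ z + bi)          # input gate: what new information to store
    o = sigmoid(Wo @ z + bo)          # output gate: what to output
    c_tilde = np.tanh(Wc @ z + bc)    # candidate cell update
    c = f * c_prev + i * c_tilde      # cell state "highway": mostly additive updates
    h = o * np.tanh(c)                # new hidden state
    return h, c

# Tiny usage example with random weights (hidden size 4, input size 3 are arbitrary).
rng = np.random.default_rng(0)
H, X = 4, 3
params = [rng.standard_normal((H, H + X)) * 0.1 for _ in range(4)] \
       + [np.zeros(H) for _ in range(4)]
h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((5, X)):  # run over a 5-step sequence
    h, c = lstm_step(x, h, c, params)
```

Note that the cell update `c = f * c_prev + i * c_tilde` is additive rather than a repeated matrix multiplication, which is why gradients can flow across many timesteps without vanishing.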
Historical Importance
Invented by Hochreiter & Schmidhuber (1997). Dominated sequence tasks (machine translation, speech recognition, text generation) from roughly 2014 to 2017. Google's Neural Machine Translation system used LSTMs for Google Translate before switching to Transformers.
Current Status
Largely superseded by Transformers for most NLP tasks. Still used in some time-series applications, embedded systems, and scenarios where training data or compute is limited. Transformers process all timesteps of a sequence in parallel during training, whereas LSTMs must step through time sequentially; this parallelism advantage is decisive for large-scale training.