Long Short-Term Memory (LSTM)

Long Short-Term Memory, or LSTM, is a kind of recurrent neural network built to handle sequences such as text, speech, or time series. Ordinary recurrent networks struggle to connect information that is far apart in a sequence because the learning signal (the gradient) tends to vanish or explode as it is passed back over many steps. LSTM addresses this with a memory cell and a set of gates that decide what to keep, what to forget, and what to output, so that useful information can persist across long gaps.
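
For readers who want to see the mechanism, the gating idea described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, assuming the common formulation with forget, input, and output gates. The function name lstm_step, the packing of all four weight blocks into a single matrix W, and the random values in the usage lines are assumptions made for this example, not details taken from the original paper or from any particular library.

```python
import numpy as np


def sigmoid(x):
    """Squash values into (0, 1) so they can act as soft on/off gates."""
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, b):
    """One time step of an LSTM cell (illustrative sketch).

    x      : current input, shape (input_size,)
    h_prev : previous output (hidden state), shape (hidden_size,)
    c_prev : previous memory cell contents, shape (hidden_size,)
    W      : stacked weights, shape (4 * hidden_size, input_size + hidden_size)
    b      : stacked biases, shape (4 * hidden_size,)
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b

    f = sigmoid(z[0:n])          # forget gate: how much old memory to keep
    i = sigmoid(z[n:2 * n])      # input gate: how much new information to store
    o = sigmoid(z[2 * n:3 * n])  # output gate: how much memory to reveal
    g = np.tanh(z[3 * n:4 * n])  # candidate values that could be stored

    c = f * c_prev + i * g       # updated memory cell
    h = o * np.tanh(c)           # output for this step
    return h, c


# Usage: pass a short random sequence through the cell (illustrative values only).
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden_size, input_size + hidden_size))
b = np.zeros(4 * hidden_size)
h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x in rng.normal(size=(10, input_size)):
    h, c = lstm_step(x, h, c, W, b)
```

The line that updates c is where the long-range memory comes from: when the forget gate is close to 1 and the input gate is close to 0, the cell contents are carried forward almost unchanged, however many steps pass.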

It was introduced by Sepp Hochreiter and Jürgen Schmidhuber in their 1997 paper “Long Short-Term Memory,” published in Neural Computation, volume 9. For roughly two decades it was the workhorse for sequence tasks, underpinning major advances in speech recognition, machine translation, and text prediction before transformers became dominant.

Why business readers should care: LSTM made it practical for software to process language and time-ordered data at scale. It powered the first generation of usable voice assistants, translation tools, and forecasting systems, and it set the stage for the language models that followed.
