Paper Summary: Learning to Forget: Continual Prediction with LSTM
Please note: This post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.

WHAT
This paper introduces the "forget gate" to LSTM cells, which learns when the cell should "reset" its state. This gate was not present in the original 1997 LSTM paper by Hochreiter and Schmidhuber.
WHY
If LSTMs are applied to continuous input streams (rather than pre-segmented training sequences), cells without forget gates let the internal state grow indefinitely in magnitude, which eventually saturates the output squashing function and makes the network stop learning.
HOW
The CEC (Constant Error Carousel) weight, a constant \(1.0\) in vanilla LSTM, is replaced by the activation of a learned forget gate, which decides at each time step how much of the cell state should be memorized and how much discarded.
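A rough sketch of the modified update (notation loosely follows the paper: \(s_c\) is the cell state, \(y^{in}\) and \(y^{\varphi}\) the input- and forget-gate activations, \(g\) the input squashing function, \(net_c\) the cell's net input):

\[
s_c(t) = y^{\varphi}(t)\, s_c(t-1) + y^{in}(t)\, g\big(net_c(t)\big)
\]

Vanilla LSTM is the special case \(y^{\varphi}(t) \equiv 1\), so the old state is always carried over unchanged; with the forget gate, the local error flow through the CEC scales by \(\partial s_c(t) / \partial s_c(t-1) = y^{\varphi}(t)\) instead of staying fixed at \(1\).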
CLAIMS/QUOTES
RNNs are bad: "[...] standard RNNs fail to learn in the presence of time lags greater than 5-10 discrete time steps between relevant input events and target signals."
Constant Error Carousel (CEC): "The CEC's solve the vanishing error problem: in the absence of new input or error signals to the cell, the CEC's local error back flow remains constant, neither growing nor decaying. [...] This is why LSTM can bridge arbitrary time lags between input events and target signals."
State size growth: "The internal states tend to grow linearly." (See the numerical sketch after this list.)
Learning rates: Using an exponentially decaying learning rate improves results in the continuous learning case.
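The state-growth claim is easy to see numerically. Below is a toy sketch (my own illustration, not the paper's code; the weights are arbitrary made-up constants): a single memory cell driven by a constant input stream accumulates state roughly linearly when the CEC weight is fixed at \(1.0\), but stays bounded once a forget gate scales the old state down.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def run_cell(steps: int, use_forget_gate: bool) -> float:
    """Run one memory cell over a constant input stream and return the final state s_c."""
    s_c = 0.0
    x = 1.0  # a never-ending, constant "continual" input stream
    for _ in range(steps):
        y_in = sigmoid(0.5 * x)                 # input gate activation
        g = math.tanh(0.5 * x)                  # squashed cell input
        if use_forget_gate:
            y_forget = sigmoid(-1.0 + 0.5 * x)  # forget gate (illustrative weights)
        else:
            y_forget = 1.0                      # vanilla LSTM: CEC weight fixed at 1.0
        s_c = y_forget * s_c + y_in * g         # cell state update
    return s_c

for steps in (10, 100, 1000):
    print(steps,
          round(run_cell(steps, use_forget_gate=False), 2),  # grows roughly linearly
          round(run_cell(steps, use_forget_gate=True), 2))   # stays bounded
```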
EXTENDS/USES
- Hochreiter and Schmidhuber, 1997: Long Short-Term Memory
NOTES
- What the authors call a "continual" problem is what other people refer to as "online" learning.1
References
IEEE Xplore: Learning to Forget: Continual Prediction with LSTM
1: Not to be confused with real-time ML.