GRU — The Simpler Gated Cell
An LSTM, streamlined
The Gated Recurrent Unit (GRU), introduced in 2014, is a lighter cousin of the LSTM. It keeps the gating idea that lets a recurrent network retain information over many time steps, but it uses two gates instead of three and has no separate cell state, so it has fewer parameters and usually trains faster.
Update gate
keep vs replace
Decides, for each hidden unit, how much of the previous hidden state to carry forward and how much to overwrite with the new candidate state. (It does the job of the LSTM's forget and input gates with a single gate.)
Reset gate
how much past to use
Controls how much of the past hidden state feeds into the new candidate.
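To make the two gates concrete, here is a minimal NumPy sketch of a single GRU time step. It assumes the convention in which an update gate near 1 keeps the old state (some papers and libraries swap the roles of z and 1 - z), one bias vector per gate, and small random weights. The names init_gru_params and gru_step are illustrative and do not come from any library.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru_params(input_size, hidden_size, seed=0):
    """Small random weights for the update (z), reset (r), and candidate (h) blocks."""
    rng = np.random.default_rng(seed)
    p = {}
    for name in ("z", "r", "h"):
        p["W" + name] = rng.normal(0.0, 0.1, (hidden_size, input_size))   # input weights
        p["U" + name] = rng.normal(0.0, 0.1, (hidden_size, hidden_size))  # recurrent weights
        p["b" + name] = np.zeros(hidden_size)                             # bias
    return p

def gru_step(x, h_prev, p):
    """One GRU step for a single input vector x and previous hidden state h_prev."""
    # Update gate: per unit, how much of the old state to keep (assumed: 1 = keep).
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev + p["bz"])
    # Reset gate: how much of the old state feeds into the candidate.
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev + p["br"])
    # Candidate state, built from the input and the reset-scaled old state.
    h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev) + p["bh"])
    # Blend old state and candidate; there is no separate cell state.
    return z * h_prev + (1.0 - z) * h_cand

# Usage: run a short random sequence through the cell.
params = init_gru_params(input_size=4, hidden_size=3)
h = np.zeros(3)
for x in np.random.default_rng(1).normal(size=(5, 4)):
    h = gru_step(x, h, params)
print(h.shape)  # (3,)
```

Because the new state is a per-unit blend of the old state and the candidate, pushing the update gate toward 1 simply copies the old state forward, which is how the cell can hold information across many steps.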
LSTM vs GRU at a glance
Side by side: the LSTM's three gates and cell state, versus the GRU's two gates and single hidden state.
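The size difference can be checked with a quick count. The sketch below assumes a single layer in which every gate or candidate block has an input weight matrix, a recurrent weight matrix, and one bias vector; some libraries keep two bias vectors per block, which shifts the totals slightly but not the ratio. The function names are illustrative.

```python
def gated_rnn_param_count(input_size, hidden_size, num_blocks):
    """Parameters in one layer: each block has W (h x d), U (h x h), and a bias (h)."""
    per_block = hidden_size * input_size + hidden_size * hidden_size + hidden_size
    return num_blocks * per_block

def gru_param_count(input_size, hidden_size):
    # Two gates (update, reset) plus the candidate state: 3 blocks.
    return gated_rnn_param_count(input_size, hidden_size, num_blocks=3)

def lstm_param_count(input_size, hidden_size):
    # Three gates (input, forget, output) plus the cell candidate: 4 blocks.
    return gated_rnn_param_count(input_size, hidden_size, num_blocks=4)

gru = gru_param_count(128, 256)
lstm = lstm_param_count(128, 256)
print(gru, lstm, gru / lstm)  # 295680 394240 0.75
```

Under these assumptions a GRU layer always has three quarters of the parameters of an LSTM layer with the same input and hidden sizes.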
LSTM or GRU?
GRU advantages
- Fewer parameters (about three quarters of a same-sized LSTM), so faster to train and often less data-hungry
- Often matches LSTM accuracy
- Simpler to implement and tune
LSTM advantages
- The explicit cell state can help on very long sequences
- Slightly more expressive: a separate output gate controls how much of the memory is exposed at each step
- The older and more extensively studied default
Practical advice
The difference is rarely large: try both and let validation performance decide. GRU is a good first choice when training speed matters. For modern NLP, especially tasks with long-range dependencies, transformers have largely replaced both.