Recurrent Neural Networks (RNNs)
Networks with memory
Dense nets and CNNs take a fixed-size input all at once. But language, audio, and time series arrive as sequences, where order and context matter. RNNs handle them by processing one element at a time and carrying a hidden state forward: a compressed memory of everything seen so far.
hₜ = f(W·xₜ + U·hₜ₋₁ + b)
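The new hidden state hₜ depends on the current input xₜ and the previous hidden state hₜ₋₁. Here f is a nonlinearity (typically tanh), W and U are the input and recurrent weight matrices, and b is a bias. The same weights are reused at every step.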
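The recurrence above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: it assumes f is tanh, and the names rnn_step and rnn_forward, the layer sizes, and the random weights are all hypothetical choices made for the example.

```python
import numpy as np

def rnn_step(x_t, h_prev, W, U, b):
    """One recurrence step: h_t = tanh(W @ x_t + U @ h_prev + b)."""
    return np.tanh(W @ x_t + U @ h_prev + b)

def rnn_forward(xs, h0, W, U, b):
    """Run the same cell over a whole sequence and return every hidden state."""
    h = h0
    hs = []
    for x_t in xs:                      # one element at a time, in order
        h = rnn_step(x_t, h, W, U, b)   # the hidden state carries the past forward
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4          # illustrative sizes (assumption)
W = rng.normal(scale=0.5, size=(hidden_size, input_size))
U = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)
h0 = np.zeros(hidden_size)

# The same W, U, b handle sequences of different lengths.
short_seq = rng.normal(size=(5, input_size))
long_seq = rng.normal(size=(12, input_size))
hs_short = rnn_forward(short_seq, h0, W, U, b)
hs_long = rnn_forward(long_seq, h0, W, U, b)
print(hs_short.shape, hs_long.shape)    # one hidden state per input element
```

Note that nothing in the weights depends on the sequence length: only the number of loop iterations changes.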
Unroll it through time
The same cell processes a sentence word by word, passing its hidden state along. Laid out step by step, the loop becomes a chain with one copy of the cell per word, and during training gradients flow backward through that unrolled chain.
Key ideas
Hidden state: a vector summarizing the sequence so far. It's the channel through which the past influences the present.
Shared weights: one set of weights handles any sequence length, much like a CNN reuses one filter across positions, but across time.
Backpropagation through time (BPTT): unroll the loop into a deep chain, then backpropagate through every time step, summing each step's gradient contribution to update the shared weights.
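The three ideas above can be put together in a small NumPy sketch of backpropagation through time. It rests on several assumptions that are not part of the text: f is tanh, the loss is a toy squared error on the final hidden state only, and the names forward, loss_fn, and bptt are hypothetical. A finite-difference check at the end compares one analytic gradient entry against a numerical estimate.

```python
import numpy as np

def forward(xs, h0, W, U, b):
    """Unroll the cell over the sequence; hs[0] is h0, hs[t] the state after step t."""
    hs = [h0]
    for x_t in xs:
        hs.append(np.tanh(W @ x_t + U @ hs[-1] + b))
    return hs

def loss_fn(xs, h0, W, U, b, target):
    """Toy loss (an assumption): squared error on the final hidden state only."""
    h_last = forward(xs, h0, W, U, b)[-1]
    return 0.5 * np.sum((h_last - target) ** 2)

def bptt(xs, h0, W, U, b, target):
    """Backpropagate through every time step, summing gradients for the shared weights."""
    hs = forward(xs, h0, W, U, b)
    dW, dU, db = np.zeros_like(W), np.zeros_like(U), np.zeros_like(b)
    dh = hs[-1] - target                 # dLoss/dh at the last step
    for t in range(len(xs), 0, -1):      # walk the unrolled chain backward
        da = dh * (1.0 - hs[t] ** 2)     # through tanh: d tanh(a)/da = 1 - tanh(a)^2
        dW += np.outer(da, xs[t - 1])    # same W at every step, so contributions add up
        dU += np.outer(da, hs[t - 1])
        db += da
        dh = U.T @ da                    # hand the gradient to the previous hidden state
    return dW, dU, db

rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(4, 3))   # illustrative sizes and values (assumption)
U = rng.normal(scale=0.5, size=(4, 4))
b = np.zeros(4)
xs = rng.normal(size=(6, 3))
h0 = np.zeros(4)
target = rng.normal(size=4)

dW, dU, db = bptt(xs, h0, W, U, b, target)

# Sanity check: compare one entry of dU with a central finite difference.
eps = 1e-5
U_plus, U_minus = U.copy(), U.copy()
U_plus[0, 1] += eps
U_minus[0, 1] -= eps
numeric = (loss_fn(xs, h0, W, U_plus, b, target)
           - loss_fn(xs, h0, W, U_minus, b, target)) / (2 * eps)
print("abs difference:", abs(numeric - dU[0, 1]))
```

The line dh = U.T @ da is where the gradient is multiplied by the recurrent weights once per time step, which is the repeated product that makes long chains hard to train.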
The Achilles' heel
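Because backpropagation through time multiplies the gradient by the recurrent weights (and the activation's derivative) once per time step, the gradient shrinks or grows roughly exponentially with distance, so plain RNNs suffer badly from vanishing and exploding gradients. In practice they struggle to use anything from more than a handful of steps back: they can't connect "The clouds are in the…" to a word 50 tokens earlier.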
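A small numerical sketch makes the exponential behavior concrete. To keep the arithmetic exact it makes simplifying assumptions that are not in the text: the nonlinearity is dropped (a linear recurrence), and the recurrent matrix is a scaled orthogonal matrix, so each backward step multiplies the gradient norm by exactly the scale factor. The function name grad_norms_through_time is hypothetical. With tanh, the derivative factor is at most 1, which only adds to the shrinking.

```python
import numpy as np

def grad_norms_through_time(U, steps):
    """Norm of a gradient vector after each backward step through a linear RNN."""
    n = U.shape[0]
    g = np.ones(n) / np.sqrt(n)          # unit-norm gradient at the last time step
    norms = []
    for _ in range(steps):
        g = U.T @ g                      # one backward step through the recurrence
        norms.append(np.linalg.norm(g))
    return np.array(norms)

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # orthogonal matrix: preserves vector norms

vanishing = grad_norms_through_time(0.9 * Q, steps=50)  # scale below 1: norm decays
exploding = grad_norms_through_time(1.1 * Q, steps=50)  # scale above 1: norm grows

print(f"scale 0.9, 50 steps back: {vanishing[-1]:.2e}")
print(f"scale 1.1, 50 steps back: {exploding[-1]:.2e}")
```

After 50 steps the norms equal 0.9^50 and 1.1^50 respectively, so a signal from 50 tokens back is either almost erased or blown up by two orders of magnitude before it reaches the weights.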
The fix was gated cells that learn what to remember and what to forget: see LSTM and GRU. (And for very long-range dependencies, transformers dropped recurrence entirely in favor of attention.)