Bidirectional RNNs
Context comes from both sides
A plain RNN reads left to right, so at each word it only knows the past. But meaning often depends on what comes next: in "he went to the bank to fish", only the later word "fish" disambiguates "bank".
A bidirectional RNN runs two RNNs — one forward, one backward — and combines their hidden states at each position. Now every word sees its full left and right context.
Two passes, then combine
A forward pass sweeps left to right, a backward pass sweeps right to left, and then the two hidden states merge at each position.
How it works
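Forward RNN: reads left to right and produces a hidden state at each step summarizing the current word and everything to its left.
Backward RNN: a second RNN that reads right to left, summarizing the current word and everything to the right of each position.
Combine: join the two hidden states at each step (typically by concatenation) → a representation with full context.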
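The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: it assumes a plain tanh RNN cell, random untrained weights, and concatenation as the way to combine. The names `rnn_pass`, `birnn`, and `init_params` are made up for this example.

```python
import numpy as np

def rnn_pass(xs, W_x, W_h, b):
    """Run a simple tanh RNN over xs (shape [T, d_in]); return states [T, d_h]."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h + b)
        states.append(h)
    return np.stack(states)

def birnn(xs, fwd_params, bwd_params):
    # Forward RNN: state t summarizes positions 0..t.
    h_fwd = rnn_pass(xs, *fwd_params)
    # Backward RNN: feed the reversed sequence, then flip the states back
    # so that row t summarizes positions t..T-1.
    h_bwd = rnn_pass(xs[::-1], *bwd_params)[::-1]
    # Combine: concatenate the two states at each position.
    return np.concatenate([h_fwd, h_bwd], axis=1)

def init_params(rng, d_in, d_h):
    # Random, untrained weights (an assumption for illustration only).
    return (rng.normal(scale=0.5, size=(d_h, d_in)),
            rng.normal(scale=0.5, size=(d_h, d_h)),
            np.zeros(d_h))

rng = np.random.default_rng(0)
T, d_in, d_h = 7, 4, 3                 # e.g. 7 words, 4-dim embeddings
xs = rng.normal(size=(T, d_in))
fwd_params = init_params(rng, d_in, d_h)
bwd_params = init_params(rng, d_in, d_h)

out = birnn(xs, fwd_params, bwd_params)
print(out.shape)  # (7, 6): one 2*d_h vector per position
```

Note that the two directions have separate parameters, and the output at each position is twice the hidden size because the states are concatenated.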
Use LSTM or GRU cells inside and you get the famous BiLSTM — for years the go-to for tagging and named-entity recognition.
The crucial limitation
Works well for:
- Tagging & sequence labeling (POS, NER)
- Classification of a whole sequence
- Any task where you have the full input up front
Does not work for:
- Generation: you can't read the future you haven't written
- Real-time/streaming where future tokens are unknown
Why: the backward pass starts at the last token, so the model needs the entire sequence before it can produce any output.
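To make the limitation concrete, the sketch below changes only the last token of a toy sequence and checks which hidden states move. It assumes a plain tanh RNN with hand-picked toy weights, and it reuses the same weights for both directions purely for brevity (a real bidirectional RNN learns separate ones). `rnn_pass` is a hypothetical helper, not a library function.

```python
import numpy as np

def rnn_pass(xs, W_x, W_h, b):
    """Simple tanh RNN; returns one hidden state per input row."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h + b)
        states.append(h)
    return np.stack(states)

# Hand-picked toy weights (assumptions for illustration, not trained values).
d = 2
W_x, W_h, b = np.eye(d), 0.5 * np.eye(d), np.zeros(d)

T = 5
xs = np.linspace(-1.0, 1.0, T * d).reshape(T, d)
xs_changed = xs.copy()
xs_changed[-1] += 1.0  # alter only the final token

fwd_a = rnn_pass(xs, W_x, W_h, b)
fwd_b = rnn_pass(xs_changed, W_x, W_h, b)
bwd_a = rnn_pass(xs[::-1], W_x, W_h, b)[::-1]
bwd_b = rnn_pass(xs_changed[::-1], W_x, W_h, b)[::-1]

# Forward states before the last position never saw the changed token.
forward_prefix_unchanged = np.array_equal(fwd_a[:-1], fwd_b[:-1])
# The backward state at position 0 depends on every later token.
backward_first_changed = bool(np.abs(bwd_a[0] - bwd_b[0]).max() > 0)

print(forward_prefix_unchanged)  # True
print(backward_first_changed)    # True
```

Because the representation of the first token depends on the last one, no output can be finalized until the whole input has arrived, which is what rules out generation and streaming.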
Bidirectional context is exactly what BERT (an encoder-only transformer) brought to scale — every token attends to every other, both directions, at once.
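As a rough illustration of "every token attends to every other", the sketch below computes scaled dot-product attention weights with and without a causal mask. It is a simplified single-head version with random inputs, with no learned projections, and is not BERT's actual implementation; `attention_weights` is a name chosen for this example.

```python
import numpy as np

def attention_weights(q, k, causal=False):
    """Softmax attention weights; row i says how much token i attends to each token."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if causal:
        # Hide future positions (strict upper triangle), as a left-to-right model would.
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
T, d = 4, 8
q = rng.normal(size=(T, d))
k = rng.normal(size=(T, d))

bidirectional = attention_weights(q, k)            # encoder-style: no mask
left_to_right = attention_weights(q, k, causal=True)

print(np.all(bidirectional > 0))                   # True: every token sees every token
print(np.all(np.triu(left_to_right, k=1) == 0))    # True: no weight on future tokens
```

Without the mask, each position mixes information from both sides in a single step, which plays the role that the forward and backward passes play in a bidirectional RNN.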