Bidirectional RNNs · Suman Bhadra Notes

Context comes from both sides

A plain RNN reads left to right, so at each word it only knows the past. But meaning often depends on what comes next: in "he went to the bank to fish", only the later word "fish" disambiguates "bank".

A bidirectional RNN runs two RNNs — one forward, one backward — and combines their hidden states at each position. Now every word sees its full left and right context.
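In symbols, the idea can be written as the recurrences below. This is a sketch in assumed notation that does not appear in the notes: x_t is the input at position t, f and g are the forward and backward cell functions, and the combination is taken to be concatenation.

```latex
\begin{aligned}
\overrightarrow{h}_t &= f\big(x_t,\ \overrightarrow{h}_{t-1}\big)
  && \text{(forward: depends on } x_1, \dots, x_t\text{)} \\
\overleftarrow{h}_t &= g\big(x_t,\ \overleftarrow{h}_{t+1}\big)
  && \text{(backward: depends on } x_t, \dots, x_T\text{)} \\
h_t &= \big[\, \overrightarrow{h}_t \,;\, \overleftarrow{h}_t \,\big]
  && \text{(combined: depends on the whole sequence)}
\end{aligned}
```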

Two passes, then combine

A forward pass sweeps left to right, a backward pass sweeps right to left, and the two hidden states are then merged at each position.

How it works

Forward RNN → left to right

Produces a hidden state at each step summarizing everything to the left.

Backward RNN ← right to left

A second RNN summarizing everything to the right of each position.

Concatenate [ → ; ← ]

Join the two hidden states at each step → a representation with full context.
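The three steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: it assumes a simple tanh (Elman) cell and random untrained weights, and the helper names `rnn_step`, `run_rnn`, `init_params`, and `bidirectional_rnn` are hypothetical.

```python
import math
import random

def rnn_step(W_x, W_h, b, x, h):
    """One tanh RNN step: h_new = tanh(W_x x + W_h h + b)."""
    return [
        math.tanh(
            sum(w * v for w, v in zip(W_x[i], x))
            + sum(w * v for w, v in zip(W_h[i], h))
            + b[i]
        )
        for i in range(len(b))
    ]

def run_rnn(params, xs):
    """Run one RNN over xs in the order given; return one hidden state per step."""
    W_x, W_h, b = params
    h = [0.0] * len(b)
    states = []
    for x in xs:
        h = rnn_step(W_x, W_h, b, x, h)
        states.append(h)
    return states

def init_params(input_dim, hidden_dim, rng):
    """Small random weights (assumption: a stand-in for trained parameters)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), [0.0] * hidden_dim

def bidirectional_rnn(fwd_params, bwd_params, xs):
    fwd = run_rnn(fwd_params, xs)              # left to right
    bwd = run_rnn(bwd_params, xs[::-1])[::-1]  # right to left, re-aligned to positions
    return [f + b for f, b in zip(fwd, bwd)]   # list concatenation: [forward ; backward]

rng = random.Random(0)
input_dim, hidden_dim = 4, 3
fwd_params = init_params(input_dim, hidden_dim, rng)
bwd_params = init_params(input_dim, hidden_dim, rng)
xs = [[rng.uniform(-1, 1) for _ in range(input_dim)] for _ in range(5)]

outputs = bidirectional_rnn(fwd_params, bwd_params, xs)
print(len(outputs), len(outputs[0]))  # 5 positions, each of size 2 * hidden_dim = 6
```

Note that the two directions use separate parameters, and that the backward states are reversed again after the pass so that index t in both lists refers to the same position.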

BiLSTM

Use LSTM cells inside and you get the famous BiLSTM (GRU cells give the analogous BiGRU), for years the go-to for tagging and named-entity recognition.
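In practice a bidirectional LSTM is usually a one-flag change in a deep learning library. The sketch below assumes PyTorch, which the notes do not mention. The sizes and the tag count are arbitrary illustration values, and the linear tagging head `tagger` is a hypothetical addition.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, input_dim, hidden_dim, num_tags = 2, 5, 8, 16, 4

# bidirectional=True runs a forward and a backward LSTM and concatenates their outputs.
bilstm = nn.LSTM(input_dim, hidden_dim, batch_first=True, bidirectional=True)
tagger = nn.Linear(2 * hidden_dim, num_tags)  # hypothetical per-token tagging head

x = torch.randn(batch, seq_len, input_dim)
output, (h_n, c_n) = bilstm(x)   # output: (batch, seq_len, 2 * hidden_dim)
logits = tagger(output)          # one score vector per token

print(output.shape)  # torch.Size([2, 5, 32])
print(logits.shape)  # torch.Size([2, 5, 4])
```

The first `hidden_dim` features of `output` at each position come from the forward LSTM and the last `hidden_dim` from the backward LSTM, so a tagger applied per token sees both left and right context.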

The crucial limitation

Great for

Tagging & sequence labeling (POS, NER)
Classification of a whole sequence
Any task where you have the full input up front

Can't be used for

Generation — you can't read the future you haven't written
Real-time/streaming where future tokens are unknown
Both fail for the same reason: the backward pass needs the entire sequence before it can produce any output.
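The limitation can be made concrete with a small experiment. The sketch below assumes a toy tanh RNN with random untrained weights, and the helper names `run_rnn` and `init_params` are hypothetical. It changes only the last token of a sequence and checks which hidden states move.

```python
import math
import random

def run_rnn(params, xs):
    """Run a tanh RNN over xs in the order given; return one hidden state per step."""
    W_x, W_h, b = params
    h = [0.0] * len(b)
    states = []
    for x in xs:
        h = [
            math.tanh(
                sum(w * v for w, v in zip(W_x[i], x))
                + sum(w * v for w, v in zip(W_h[i], h))
                + b[i]
            )
            for i in range(len(b))
        ]
        states.append(h)
    return states

def init_params(input_dim, hidden_dim, rng):
    """Small random weights (assumption: a stand-in for trained parameters)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), [0.0] * hidden_dim

rng = random.Random(1)
input_dim, hidden_dim = 4, 3
fwd_params = init_params(input_dim, hidden_dim, rng)
bwd_params = init_params(input_dim, hidden_dim, rng)

xs = [[rng.uniform(-1, 1) for _ in range(input_dim)] for _ in range(4)]
xs_changed = xs[:-1] + [[v + 1.0 for v in xs[-1]]]  # only the final token differs

fwd_a, fwd_b = run_rnn(fwd_params, xs), run_rnn(fwd_params, xs_changed)
bwd_a = run_rnn(bwd_params, xs[::-1])[::-1]
bwd_b = run_rnn(bwd_params, xs_changed[::-1])[::-1]

# Forward states before the changed token are identical: they never saw it.
print("forward states 0..2 unchanged:", fwd_a[:-1] == fwd_b[:-1])
# The backward state at position 0 already depends on the final token.
print("backward state 0 changed:", bwd_a[0] != bwd_b[0])
```

A forward-only RNN can therefore emit its state for position 0 as soon as the first token arrives, while the bidirectional representation of position 0 is not defined until the last token is known.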

The successor

Bidirectional context is exactly what BERT (an encoder-only transformer) brought to scale: through self-attention rather than recurrence, every token attends to every other token, in both directions at once.