Sequence-to-Sequence (Seq2Seq) · Suman Bhadra Notes

One sequence in, another out

Many language tasks turn one sequence into another, often of a different length: English → French, article → summary, question → answer. Seq2Seq handles this with two networks. An encoder reads the whole input and compresses it into a context vector; a decoder takes that vector and generates the output one token at a time.

Encoder: read & compress

A recurrent network (often an LSTM) steps through the input one token at a time, updating its hidden state at each step.

Context vector: the handoff

The encoder's final hidden state — a fixed-size summary of the entire input.
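The encoder and context vector described above can be sketched in a few lines. This is a minimal illustration, not a trained model: it swaps the LSTM for a plain tanh RNN to stay short, and the weights are random. The names `encode`, `make_matrix`, `matvec` and the toy sizes are hypothetical, chosen only for this example.

```python
import math
import random

def make_matrix(rows, cols, rng):
    """Small random weights; in a real model these would be learned."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def encode(token_ids, embed, w_xh, w_hh):
    """Run a plain tanh RNN over the input and return every hidden state."""
    hidden = [0.0] * len(w_hh)          # initial hidden state: all zeros
    states = []
    for tok in token_ids:
        x = embed[tok]                  # look up the token's embedding
        pre = [a + b for a, b in zip(matvec(w_xh, x), matvec(w_hh, hidden))]
        hidden = [math.tanh(p) for p in pre]
        states.append(hidden)
    return states

rng = random.Random(0)
VOCAB, EMBED_DIM, HIDDEN_DIM = 10, 4, 6   # hypothetical toy sizes
embed = make_matrix(VOCAB, EMBED_DIM, rng)
w_xh = make_matrix(HIDDEN_DIM, EMBED_DIM, rng)
w_hh = make_matrix(HIDDEN_DIM, HIDDEN_DIM, rng)

short_states = encode([1, 2, 3], embed, w_xh, w_hh)
long_states = encode([1, 2, 3, 4, 5, 6, 7, 8], embed, w_xh, w_hh)

# The context vector is the final hidden state. Its size depends only on
# HIDDEN_DIM, not on how many tokens were read.
context_short = short_states[-1]
context_long = long_states[-1]
print(len(context_short), len(context_long))
```

Both inputs, three tokens and eight tokens, end up as a vector of the same length, which is what "fixed-size summary" means here.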

Decoder: generate

Starts from the context vector and emits one token at a time, feeding each output back in as the next input until it produces an end-of-sequence token.
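The feed-back loop can be sketched as greedy decoding: pick the highest-scoring token at each step and pass it back in. This is one simple decoding strategy among several, shown under stated assumptions. `greedy_decode` and `toy_step` are hypothetical names, and `toy_step` is a hand-written stand-in for a trained decoder cell, not a real model.

```python
def greedy_decode(step_fn, context, bos_id, eos_id, max_len=20):
    """Generate tokens one at a time, feeding each output back as the next input."""
    state = context                 # the decoder starts from the context vector
    token = bos_id                  # begin-of-sequence marker
    output = []
    for _ in range(max_len):        # hard cap in case EOS is never produced
        scores, state = step_fn(token, state)
        # Greedy choice: index of the highest score
        token = max(range(len(scores)), key=scores.__getitem__)
        if token == eos_id:
            break
        output.append(token)
    return output

def toy_step(prev_token, state):
    """Hypothetical decoder step: always favours token id prev_token + 1."""
    vocab_size = 5
    scores = [0.0] * vocab_size
    scores[(prev_token + 1) % vocab_size] = 1.0
    return scores, state            # a real cell would also update the state

result = greedy_decode(toy_step, context=[0.0, 0.0], bos_id=0, eos_id=4)
print(result)
```

With this stand-in the loop emits tokens 1, 2, 3 and stops when token 4 (the end marker) wins. A trained decoder would compute `scores` from its weights and the running state instead.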

Watch it encode, then decode

The encoder ingests "she likes cats" and boils it down to a context vector; the decoder then unrolls that vector into "elle aime les chats", generating left to right, with each word conditioned on the context and on the words already produced.
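Left-to-right generation corresponds to factoring the output probability into one term per token. In this sketch the symbols are notation assumed here, not taken from the notes: x is the input sequence, c is the context vector computed from x, and y_1, ..., y_T are the output tokens.

```latex
p(y_1, \dots, y_T \mid x) = \prod_{t=1}^{T} p(y_t \mid y_1, \dots, y_{t-1}, c)
```

For the example above, T = 4 and the last factor is the probability of "chats" given "elle aime les" and the context.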

The bottleneck — and the fix

One vector for everything

Cramming an entire sentence (or paragraph) into a single fixed-size context vector is a tight squeeze. For long inputs, information about the early words tends to be lost before the decoder ever sees it. The fix is attention: at each step the decoder looks back at all of the encoder's hidden states and weights the relevant ones most heavily. That idea grew into the Transformer.
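One attention step can be sketched as follows. This is a minimal illustration assuming dot-product scoring, which is one common choice; other scoring functions exist, and the notes do not say which one is meant. The function names `softmax` and `attend` and the toy vectors are hypothetical.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)                                  # subtract max for stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Score every encoder state against the decoder state, then blend them."""
    # Dot product as the relevance score (an assumption of this sketch)
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(decoder_state)
    # Weighted average of the encoder states: a fresh context for this step
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Toy encoder states, one per input token
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([4.0, 0.0], encoder_states)
print(weights)
```

The decoder state here points along the first dimension, so the first and third encoder states get most of the weight and the second gets very little. Because a new context is computed at every decoding step, the decoder no longer depends on a single vector for the whole input.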

Translation: EN → FR

The original Seq2Seq breakthrough application.

Summarization: long → short

Map a document to a shorter abstractive summary.

Chat / Q&A: prompt → reply

Any "text in, text out" task fits the encoder-decoder mould.