Sequence-to-Sequence (Seq2Seq)
One sequence in, another out
Many language tasks turn one sequence into another of a different length: English → French, article → summary, question → answer. Seq2Seq solves this with two networks. An encoder reads the whole input and compresses it into a context vector; a decoder reads that vector and generates the output one token at a time.
Encoder: steps through the input one token at a time (often with an LSTM), updating a hidden state at each word.
Context vector: the encoder's final hidden state, a fixed-size summary of the entire input.
Decoder: starts from the context vector and emits tokens one at a time, feeding each output back in to produce the next.
Watch it encode, then decode
The encoder ingests "she likes cats" and boils it down to a context vector; the decoder then unrolls that vector into "elle aime les chats", generating left to right, with each word conditioned on the context vector and the words already produced.
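The encode-then-decode loop can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: it uses a simple tanh RNN cell in place of an LSTM, one-hot inputs in place of learned embeddings, and random untrained weights, so the emitted tokens are meaningless. The names `HIDDEN`, `rnn_step`, `encode`, and `decode`, and the `<sos>`/`<eos>` markers, are assumptions made for the example.

```python
import math
import random

random.seed(0)

HIDDEN = 4  # hypothetical hidden-state size
SRC_VOCAB = ["she", "likes", "cats"]
TGT_VOCAB = ["<sos>", "elle", "aime", "les", "chats", "<eos>"]


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def rnn_step(w_in, w_h, x, h):
    """One step of a plain tanh RNN cell (simpler than an LSTM)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_h, h))]


enc_w_in = rand_matrix(HIDDEN, len(SRC_VOCAB))
enc_w_h = rand_matrix(HIDDEN, HIDDEN)
dec_w_in = rand_matrix(HIDDEN, len(TGT_VOCAB))
dec_w_h = rand_matrix(HIDDEN, HIDDEN)
dec_w_out = rand_matrix(len(TGT_VOCAB), HIDDEN)


def encode(tokens):
    """Read the whole input; the final hidden state is the context vector."""
    h = [0.0] * HIDDEN
    for tok in tokens:
        x = one_hot(SRC_VOCAB.index(tok), len(SRC_VOCAB))
        h = rnn_step(enc_w_in, enc_w_h, x, h)
    return h


def decode(context, max_len=6):
    """Start from the context and feed each emitted token back in (greedy)."""
    h = context
    prev = TGT_VOCAB.index("<sos>")
    output = []
    for _ in range(max_len):
        x = one_hot(prev, len(TGT_VOCAB))
        h = rnn_step(dec_w_in, dec_w_h, x, h)
        scores = matvec(dec_w_out, h)
        prev = max(range(len(scores)), key=scores.__getitem__)
        if TGT_VOCAB[prev] == "<eos>":
            break
        output.append(TGT_VOCAB[prev])
    return output


context = encode(["she", "likes", "cats"])
output = decode(context)
print(len(context), output)
```

Whatever the input length, `encode` returns a vector of `HIDDEN` numbers, and `decode` sees nothing of the input except that vector. With trained weights, the same loop would produce the translation.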
The bottleneck — and the fix
Cramming an entire sentence (or paragraph) into a single fixed-size context vector is a tight squeeze. For long inputs, details from the early words tend to be lost by the time decoding starts. The fix, attention, lets the decoder look back at all of the encoder's hidden states and weight the relevant ones at each step, rather than relying on the final state alone. That idea grew into the Transformer.
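A single attention step can be sketched as follows. This is an illustrative example that assumes dot-product scoring (one common choice among several) and hand-picked toy vectors; the names `attend`, `encoder_states`, and `decoder_state` are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Score every encoder state against the decoder state, then mix them."""
    scores = [
        sum(d * e for d, e in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    size = len(decoder_state)
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(size)
    ]
    return weights, context


# Toy encoder hidden states, one per input word (values chosen by hand).
encoder_states = [
    [1.0, 0.0, 0.0],  # "she"
    [0.0, 1.0, 0.0],  # "likes"
    [0.0, 0.0, 1.0],  # "cats"
]
decoder_state = [0.0, 0.0, 4.0]  # toy state that lines up with "cats"

weights, context = attend(decoder_state, encoder_states)
print(weights)  # most of the weight (about 0.96) lands on "cats"
```

The decoder gets a fresh context at every step, built from all the encoder states, so early words stay reachable no matter how long the input is.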
Machine translation: the original Seq2Seq breakthrough application.
Summarization: map a document to a shorter abstractive summary.
More generally, any "text in, text out" task fits the encoder-decoder mould.