Text Summarization

NLP extractive abstractive seq2seq

Two philosophies of shrinking text

Summarization condenses a long document into a short one that keeps the gist. There are two fundamentally different ways to do it: extractive lifts the most important sentences verbatim, like a highlighter; abstractive understands the content and writes something new, like a person paraphrasing.

Extractive: select & stitch

Score every sentence, keep the top few, and paste them together. Each sentence is grammatical (it is copied straight from the source), but the stitched result can feel choppy.
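The score-select-stitch recipe can be sketched in a few lines. This is a minimal illustration, assuming word frequency as the importance signal (one classic choice among many, not the only way to score sentences). The stopword list, the naive regex sentence splitter, and the names `split_sentences`, `tokenize` and `extractive_summary` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list; a real system would use a fuller one.
STOPWORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "for",
    "is", "are", "was", "were", "it", "its", "this", "that", "with", "as",
    "by", "at", "be", "has", "have", "from",
}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9']+", sentence.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, k: int = 2) -> str:
    sentences = split_sentences(text)
    # Score: word frequencies across the whole document act as an importance signal.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        # Average (not total) frequency, so long sentences are not favoured
        # merely for being long.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Select: indices of the sentences ranked from highest to lowest score.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Stitch: put the top k sentences back in document order.
    return " ".join(sentences[i] for i in sorted(ranked[:k]))


article = (
    "Solar panels convert sunlight into electricity. "
    "Solar power costs have fallen sharply. "
    "Many homes now install solar panels on roofs. "
    "My neighbour owns a cat."
)
print(extractive_summary(article, k=2))
```

On this toy article the two sentences about solar panels win (they share the document's most frequent words), and the off-topic sentence about the cat is dropped. Restoring document order in the last step is a design choice that keeps the stitched text from jumping around.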

Abstractive: understand & rewrite

A seq2seq model reads the source and generates fresh wording one token at a time. The result is fluent and compact, but it can drift from the facts (hallucinate).
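What makes the output "fresh" is the decoding loop: the decoder repeatedly scores candidate next tokens given the source and the words produced so far, so it can emit words the source never contained. The sketch below shows only that loop, assuming greedy search (real systems often use beam search instead). `greedy_decode`, `toy_step` and `TOY_TABLE` are hypothetical names. The toy table stands in for a trained decoder and, unlike a real model, ignores the source entirely.

```python
from typing import Callable, Sequence

# A step function maps (source tokens, tokens generated so far) to next-token scores.
StepFn = Callable[[Sequence[str], Sequence[str]], dict[str, float]]


def greedy_decode(source: Sequence[str], step_fn: StepFn,
                  eos: str = "</s>", max_len: int = 20) -> list[str]:
    """Generate tokens one at a time, always taking the highest-scoring next token."""
    output: list[str] = []
    for _ in range(max_len):
        scores = step_fn(source, output)      # condition on source + prefix
        token = max(scores, key=scores.get)   # greedy choice
        if token == eos:                      # end-of-sequence token stops generation
            break
        output.append(token)
    return output


# Hypothetical toy "decoder": next-token scores keyed by the last generated token.
TOY_TABLE = {
    None: {"solar": 0.6, "homes": 0.3, "</s>": 0.1},
    "solar": {"adoption": 0.7, "panels": 0.2, "</s>": 0.1},
    "adoption": {"grows": 0.8, "</s>": 0.2},
    "grows": {"</s>": 0.9, "fast": 0.1},
}


def toy_step(source: Sequence[str], prefix: Sequence[str]) -> dict[str, float]:
    last = prefix[-1] if prefix else None
    return TOY_TABLE.get(last, {"</s>": 1.0})


source = "many homes now install solar panels on roofs".split()
print(" ".join(greedy_decode(source, toy_step)))
```

Here the decoder produces "solar adoption grows": "adoption" and "grows" appear nowhere in the source. That freedom is what gives abstractive summaries their fluency, and also what lets them state things the source does not support.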

Evaluation: ROUGE

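Quality is often scored with ROUGE, which measures the overlap (shared words, n-grams, or the longest common subsequence) between a system summary and a human-written reference summary.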
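To make "overlap" concrete, here is a simplified sketch of ROUGE-N (n-gram overlap) and ROUGE-L (longest common subsequence). It assumes lowercased whitespace tokenization, no stemming, and plain F1. Official ROUGE implementations add options such as stemming, so reported numbers should come from an established package (for example `rouge-score`) rather than from this sketch. The function names here are hypothetical.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prf(overlap: int, cand_total: int, ref_total: int) -> dict[str, float]:
    """Precision, recall and F1 from an overlap count."""
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict[str, float]:
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter '&' keeps the minimum count of each n-gram (clipped matches).
    overlap = sum((cand & ref).values())
    return prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    """Longest common subsequence length via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> dict[str, float]:
    cand, ref = candidate.lower().split(), reference.lower().split()
    return prf(lcs_length(cand, ref), len(cand), len(ref))


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge_n(candidate, reference, 1))  # 5 of 6 unigrams match
print(rouge_n(candidate, reference, 2))  # 3 of 5 bigrams match
print(rouge_l(candidate, reference))     # LCS "the cat on the mat" has 5 tokens
```

Because ROUGE only counts surface overlap, a faithful paraphrase can score low and a fluent but wrong summary can score high, which is why it is usually paired with human or factuality checks.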

Score, select, or rewrite

The pipeline on a short article: score it sentence by sentence, pull out the top-scoring sentences as an extractive summary, or hand the whole article to an abstractive model that rewrites it as one fresh line.

Which to use?

Extractive when…
  • Factual accuracy is critical (legal, medical)
  • You need to trace every claim to the source
  • You want a fast, robust baseline
Abstractive when…
  • You want fluent, human-like summaries
  • The source is repetitive or messy
  • You can tolerate (and check for) occasional errors

Today's tools

Modern abstractive summarizers are encoder-decoder Transformers (such as BART or T5) or prompted LLMs. Watch for hallucination: always check an abstractive summary against the source.
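A cheap first pass at that check is lexical: flag numbers and capitalized names in the summary that never occur in the source, since invented figures and entities are common hallucinations. The sketch below is a crude heuristic under that assumption, not a real factual-consistency metric. It misses errors that reuse source words (wrong relations, swapped numbers) and may flag legitimate rewordings. `unsupported_tokens` and the tokenizer pattern are hypothetical.

```python
import re

# Numbers (optionally with '.' or ',' groups such as 4.2 or 1,200) or runs of letters.
TOKEN = re.compile(r"\d+(?:[.,]\d+)*|[A-Za-z]+")


def unsupported_tokens(summary: str, source: str) -> list[str]:
    """Numbers and capitalized words in `summary` that never appear in `source`.

    Crude lexical heuristic: a non-empty result means "check this by hand",
    and an empty result does not prove the summary is faithful.
    """
    source_vocab = {t.lower() for t in TOKEN.findall(source)}
    flagged: list[str] = []
    for tok in TOKEN.findall(summary):
        is_number = tok[0].isdigit()
        is_name = tok[0].isupper()
        if (is_number or is_name) and tok.lower() not in source_vocab and tok not in flagged:
            flagged.append(tok)
    return flagged


source = "Acme reported revenue of 4.2 million dollars in 2023, up from 3.9 million."
summary = "Acme's revenue hit 5 million in 2023, says Reuters."
print(unsupported_tokens(summary, source))
```

For this pair the heuristic flags the invented figure "5" and the unsupported attribution "Reuters", while "Acme" and "2023" pass because both occur in the source.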