Text Summarization
Two philosophies of shrinking text
Summarization condenses a long document into a short one that keeps the gist. There are two fundamentally different ways to do it: extractive lifts the most important sentences verbatim, like a highlighter; abstractive understands the content and writes something new, like a person paraphrasing.
Extractive: score every sentence, keep the top few, and paste them together. The result is always grammatical (the sentences are real) but can feel choppy.
Abstractive: a seq2seq model generates fresh wording. The result is fluent and compact, but it can drift from the facts (hallucinate).
Quality is often scored with ROUGE, which measures the n-gram overlap between a generated summary and a human reference summary.
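To make the overlap idea concrete, here is a minimal sketch of a ROUGE-N style score. It assumes a naive lowercase word tokenizer and no stemming, so its numbers will not match an official ROUGE implementation exactly; the names `tokenize` and `rouge_n` are illustrative only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) based on clipped n-gram overlap."""

    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(tokenize(candidate))
    ref = ngrams(tokenize(reference))

    # Counter intersection keeps the minimum count of each shared n-gram,
    # so a repeated word cannot be credited more often than it occurs in the reference.
    overlap = sum((cand & ref).values())

    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f = rouge_n("the cat sat on the mat", "the cat lay on the mat")
    print(f"ROUGE-1 precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Recall asks how much of the reference the summary covers, and precision asks how much of the summary is backed by the reference. A high score means similar wording, not necessarily correct facts.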
Score, select, or rewrite
A typical walkthrough takes a short article, scores it sentence by sentence, pulls out the top sentences as an extractive summary, and finally has an abstractive model rewrite the whole thing into one fresh line.
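The score-and-select steps can be sketched in a few lines. This version assumes a very simple scoring rule (average word frequency after dropping a small hand-picked stopword list) and a naive regex sentence splitter; real systems use stronger scorers such as TF-IDF, TextRank, or a trained model. The names `STOPWORDS`, `split_sentences`, and `extractive_summary` are illustrative only.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
             "that", "for", "on", "with", "as", "are", "was"}


def split_sentences(text):
    """Split on whitespace that follows ., ! or ? (naive, breaks on abbreviations)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text, k=2):
    """Keep the k highest-scoring sentences, returned in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favored just for length.
        return sum(freq[w] for w in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:k])  # restore document order for readability
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    article = ("Solar power is growing fast. Solar panels are cheaper every year. "
               "My neighbor has a dog. Cheaper solar panels drive solar growth.")
    print(extractive_summary(article, k=2))
```

Every sentence in the output is copied verbatim from the input, which is why extractive summaries are easy to trace back to the source but can read as disjointed.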
Which to use?
Use extractive when:
- Factual accuracy is critical (legal, medical)
- You need to trace every claim to the source
- You want a fast, robust baseline
Use abstractive when:
- You want fluent, human-like summaries
- The source is repetitive or messy
- You can tolerate (and check for) occasional errors
Modern abstractive summarizers are encoder-decoder Transformers (such as BART or T5) or prompted LLMs. Watch for hallucination: always check an abstractive summary against the source.
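One cheap way to start checking a summary against its source is to flag numbers and capitalized words that the source never mentions. The sketch below is a crude heuristic under that assumption, not a real factuality checker: it misses paraphrased errors, can flag a harmless sentence-initial word, and splits decimals such as "1.5" into separate tokens. The function name `unsupported_terms` is illustrative only.

```python
import re


def unsupported_terms(source, summary):
    """Return capitalized words and numbers in the summary that are absent from the source."""
    source_tokens = set(re.findall(r"[a-z0-9]+", source.lower()))
    flagged = []
    for token in re.findall(r"[A-Za-z0-9]+", summary):
        # Treat names (capitalized) and anything containing a digit as worth checking.
        is_salient = token[0].isupper() or any(ch.isdigit() for ch in token)
        if is_salient and token.lower() not in source_tokens and token not in flagged:
            flagged.append(token)
    return flagged


if __name__ == "__main__":
    source = "Acme reported revenue of 12 million dollars in 2023, up from 10 million."
    summary = "Acme revenue rose to 15 million in 2023 under Globex."
    print(unsupported_terms(source, summary))
```

An empty result does not prove the summary is faithful; it only means no new names or figures appeared. Anything flagged is a good place for a human reviewer to look first.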