Machine Translation


Why translation is hard

Translating isn't swapping each word for its dictionary match. Word order flips, one word becomes several (or none), and meaning depends on context. "The red car" becomes "la voiture rouge" in French — the adjective moves after the noun. A good translator has to reorder, not just substitute.

Rule-based: hand-written rules

Early systems relied on bilingual dictionaries and hand-written grammar rules. They were brittle, and maintaining them never ended: every exception needed yet another rule.
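A minimal sketch of the rule-based approach, assuming a hypothetical four-word lexicon (LEXICON) and one hypothetical grammar rule that moves an adjective after the noun it modifies. It handles "The red car" but breaks on "The big car", because French places "grand" before the noun. A real system would need a rule for that case, and another for every exception after it.

```python
# Toy rule-based English -> French translator.
# LEXICON and ADJECTIVES are hypothetical, hand-written resources.
LEXICON = {
    "the": "la",       # assumes a feminine noun follows; real systems need gender rules
    "red": "rouge",
    "big": "grande",
    "car": "voiture",
}
ADJECTIVES = {"red", "big"}


def rule_based_translate(sentence: str) -> str:
    words = sentence.lower().split()
    # Grammar rule: an adjective followed by a word swaps places with it
    # (English adjective-noun becomes French noun-adjective).
    reordered = []
    i = 0
    while i < len(words):
        if words[i] in ADJECTIVES and i + 1 < len(words):
            reordered += [words[i + 1], words[i]]
            i += 2
        else:
            reordered.append(words[i])
            i += 1
    # Dictionary lookup; unknown words pass through unchanged.
    return " ".join(LEXICON.get(w, w) for w in reordered)


print(rule_based_translate("The red car"))  # la voiture rouge
print(rule_based_translate("The big car"))  # la voiture grande (French says "la grande voiture")
```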

Statistical (SMT): phrase tables

These systems learned phrase alignments and translation probabilities from millions of human-translated sentence pairs.
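One ingredient of statistical MT can be sketched as follows: estimating phrase translation probabilities by relative frequency, p(f | e) = count(e, f) / count(e). The phrase pairs below are hypothetical stand-ins for pairs extracted from a word-aligned parallel corpus. A full SMT system also combines these scores with a language model and a reordering model, which this sketch leaves out.

```python
from collections import Counter

# Hypothetical (English phrase, French phrase) pairs, as if extracted
# from word-aligned sentence pairs in a parallel corpus.
phrase_pairs = [
    ("the red car", "la voiture rouge"),
    ("the red car", "la voiture rouge"),
    ("the red car", "l'auto rouge"),
    ("red", "rouge"),
    ("red", "rouge"),
    ("red", "rouges"),
]


def phrase_table(pairs):
    """Relative-frequency estimate: p(f | e) = count(e, f) / count(e)."""
    pair_counts = Counter(pairs)
    source_counts = Counter(e for e, _ in pairs)
    table = {}
    for (e, f), n in pair_counts.items():
        table.setdefault(e, {})[f] = n / source_counts[e]
    return table


table = phrase_table(phrase_pairs)
options = table["the red car"]
best = max(options, key=options.get)
print(best)  # la voiture rouge
```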

Neural (NMT): seq2seq + attention

One network reads the source sentence and generates the target, learning the alignment on its own. This is today's standard approach.

From word-swap to alignment

Swapping "The red car" word by word gives the broken "la rouge voiture". An alignment fixes this by mapping each target word to the source word it translates; the crossing links (voiture to car, rouge to red) are the reordering that attention learns automatically.
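The example can be written out directly. This sketch assumes a hypothetical three-word dictionary and stores the alignment in its simplest form, one source index per target word; real alignments can also link one word to several words or to none.

```python
source = ["the", "red", "car"]
dictionary = {"the": "la", "red": "rouge", "car": "voiture"}  # hypothetical

# Literal swap: translates each word but keeps English word order.
word_by_word = [dictionary[w] for w in source]
print(" ".join(word_by_word))  # la rouge voiture

# Alignment: for each target position, the index of the source word it translates.
target = ["la", "voiture", "rouge"]
alignment = [0, 2, 1]  # la <- the, voiture <- car, rouge <- red
for t, s in zip(target, alignment):
    print(f"{t:8} <- {source[s]}")

# Following the alignment reproduces the correct French order.
aligned = [dictionary[source[s]] for s in alignment]

# A non-monotonic alignment (links that cross) signals reordering.
crosses = any(a > b for a, b in zip(alignment, alignment[1:]))
print(crosses)  # True
```

The alignment is not monotonic (index 2 comes before index 1), and that crossing is exactly the reordering a word-by-word swap misses.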

Modern MT, in brief

Attention = learned alignment

NMT is a seq2seq model: an encoder reads the source, and a decoder writes the target. Attention lets the decoder, at every output word, assign a weight to every source word and focus on the ones that matter. These weights form the same kind of alignment described above, but learned from data rather than written by hand. Most of today's translation systems are Transformers, which apply attention throughout the network and at far larger scale.
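A minimal sketch of how attention turns scores into alignment weights, using scaled dot-product attention (the form Transformers use) in plain Python. The encoder states and the decoder query are hypothetical hand-picked vectors standing in for learned ones, and the encoder states serve as both keys and values for simplicity.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one decoder query over source keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)


source = ["the", "red", "car"]
# Hypothetical 3-d encoder states, one per source word (real ones are learned).
keys = [
    [1.0, 0.0, 0.0],  # "the"
    [0.0, 1.0, 0.0],  # "red"
    [0.0, 0.0, 1.0],  # "car"
]
# Hypothetical decoder query while it produces "rouge"; it points toward "red".
query_rouge = [0.0, 3.0, 0.0]

weights = attention_weights(query_rouge, keys)
for word, w in zip(source, weights):
    print(f"{word:4} {w:.2f}")

# Context vector: attention-weighted sum of the encoder states (used as values).
context = [sum(w * v[j] for w, v in zip(weights, keys)) for j in range(len(keys[0]))]
```

The weights sum to 1 and peak on "red", so the context vector is dominated by the "red" state: a soft version of the alignment link rouge <- red.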

Word-by-word
  • Ignores word order
  • Can't map one word to several words (or to none)
  • No context, so no gender or number agreement
Attention-based NMT
  • Reorders fluently
  • Uses full-sentence context
  • Learns alignment without explicit alignment labels