The NLP Pipeline
From raw text to a prediction
Text almost never goes straight into a model. It passes through an assembly line of stages, each one nudging the messy raw text closer to clean numbers a model can learn from.
This page is the map for the whole NLP track — every later article zooms into one of these stages.
Watch a sentence flow through
Follow one sentence as it is tokenized, cleaned, normalized, vectorized, and finally fed to a model that predicts its sentiment.
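To make the flow concrete, here is a minimal sketch of one sentence passing through the four preprocessing steps named above. The function names, the tiny stop-word list, and the choice of a bag-of-words count vector are all assumptions made for illustration; a real project would likely rely on an NLP library for each step.

```python
import string
from collections import Counter

# Hypothetical stop-word list for this sketch; "not" is deliberately kept.
STOP_WORDS = {"the", "a", "an", "was", "is"}

def tokenize(text):
    # Split on whitespace; punctuation stays attached for now.
    return text.split()

def clean(tokens):
    # Strip surrounding punctuation, then drop empty tokens and stop words.
    stripped = [t.strip(string.punctuation) for t in tokens]
    return [t for t in stripped if t and t.lower() not in STOP_WORDS]

def normalize(tokens):
    # Lowercase so "GOOD" and "good" count as the same word.
    return [t.lower() for t in tokens]

def vectorize(tokens, vocabulary):
    # Bag of words: one count per vocabulary entry, in vocabulary order.
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

sentence = "The movie was good, really GOOD, not bad!"
tokens = tokenize(sentence)
cleaned = clean(tokens)
normalized = normalize(cleaned)
vocabulary = sorted(set(normalized))
vector = vectorize(normalized, vocabulary)

print(vocabulary)  # ['bad', 'good', 'movie', 'not', 'really']
print(vector)      # [1, 2, 1, 1, 1]
```

The vector is what the model stage actually receives: the original wording is gone, and only the counts survive.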
The stages
5. Model
learn / predict
Feed the vectors to a classifier or neural network.
6. Evaluate
measure
Score predictions on held-out data with accuracy, precision, recall, and F1, then iterate.
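The last two stages can be sketched together: train a classifier on vectors, then score its predictions on examples it has not seen. The sketch below assumes a simple perceptron as the classifier and a made-up four-word vocabulary (bad, good, great, awful); the data, the function names, and the choice of model are illustrative only, not the method this track prescribes.

```python
def predict_one(weights, bias, features):
    # Linear score; predict the positive class (1) only if it is above zero.
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else 0

def train_perceptron(X, y, epochs=10, lr=1.0):
    # Classic perceptron rule: adjust weights only when a prediction is wrong.
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(X, y):
            error = label - predict_one(weights, bias, features)
            weights = [w + lr * error * f for w, f in zip(weights, features)]
            bias += lr * error
    return weights, bias

def evaluate(y_true, y_pred):
    # Count the four confusion-matrix cells for the positive class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

# Count vectors over the assumed vocabulary: [bad, good, great, awful].
X_train = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
y_train = [1, 0, 1, 0]  # 1 = positive sentiment, 0 = negative

# Held-out examples the model never saw during training.
X_test = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 1, 0, 0],
          [0, 0, 1, 1], [0, 2, 0, 1], [1, 2, 0, 0]]
y_test = [1, 0, 1, 0, 1, 0]

weights, bias = train_perceptron(X_train, y_train)
y_pred = [predict_one(weights, bias, x) for x in X_test]
metrics = evaluate(y_test, y_pred)
print(y_pred)
print(metrics)
```

Reporting precision and recall next to accuracy matters most when classes are imbalanced, where accuracy alone can look good while one class is mostly missed.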
A few guardrails
Good practice
- Fit every transform on training data only, so no information leaks from validation or test data
- Keep the same steps for train and inference
- Bundle stages into a single pipeline object
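The three practices above fit together naturally in code. The sketch below uses hypothetical classes (`Lowercase`, `BagOfWords`, `TextPipeline`) written for this example; libraries such as scikit-learn offer a ready-made `Pipeline` class built on the same fit/transform idea. The vocabulary is learned once, from training text only, and the same object is reused at inference.

```python
class Lowercase:
    """Stateless step: there is nothing to learn, so fit does nothing."""
    def fit(self, texts):
        return self

    def transform(self, texts):
        return [t.lower() for t in texts]

class BagOfWords:
    """Stateful step: the vocabulary is learned in fit and then frozen."""
    def fit(self, texts):
        self.vocabulary = sorted({w for t in texts for w in t.split()})
        return self

    def transform(self, texts):
        # Words outside the learned vocabulary are ignored.
        return [[t.split().count(w) for w in self.vocabulary] for t in texts]

class TextPipeline:
    """Bundles the steps so training and inference cannot drift apart."""
    def __init__(self, steps):
        self.steps = steps

    def fit(self, texts):
        # Each step is fitted on the output of the steps before it.
        for step in self.steps:
            texts = step.fit(texts).transform(texts)
        return self

    def transform(self, texts):
        for step in self.steps:
            texts = step.transform(texts)
        return texts

train_texts = ["Good movie", "Bad movie"]
pipeline = TextPipeline([Lowercase(), BagOfWords()]).fit(train_texts)

train_vectors = pipeline.transform(train_texts)
serve_vectors = pipeline.transform(["GOOD good plot"])  # inference-time input

print(pipeline.steps[1].vocabulary)  # ['bad', 'good', 'movie']
print(train_vectors)                 # [[0, 1, 1], [1, 0, 1]]
print(serve_vectors)                 # [[0, 2, 0]]
```

Note that "plot" never appeared in training, so it is simply dropped at inference rather than silently changing the vector layout.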
Watch out
- Modern transformers need less cleaning — don't over-strip
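- Aggressive normalization can destroy signal (e.g. dropping "not" as a stop word turns "not good" into "good")
- Mismatched train/serve steps cause silent bugs: the model still returns predictions, they are just worse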
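The negation warning is easy to reproduce. In this sketch, both stop-word lists are invented for illustration; the only difference between them is whether "not" and "no" are treated as noise.

```python
# Two hypothetical stop-word lists; only the first one removes negation words.
AGGRESSIVE_STOP_WORDS = {"the", "was", "is", "not", "no"}
GENTLE_STOP_WORDS = {"the", "was", "is"}

def strip_stop_words(text, stop_words):
    # Lowercase, split on whitespace, and drop anything in the stop-word set.
    return [w for w in text.lower().split() if w not in stop_words]

positive = "the movie was good"
negative = "the movie was not good"

aggressive_pos = strip_stop_words(positive, AGGRESSIVE_STOP_WORDS)
aggressive_neg = strip_stop_words(negative, AGGRESSIVE_STOP_WORDS)
gentle_neg = strip_stop_words(negative, GENTLE_STOP_WORDS)

print(aggressive_pos)  # ['movie', 'good']
print(aggressive_neg)  # ['movie', 'good']
print(gentle_neg)      # ['movie', 'not', 'good']
```

After aggressive cleaning the two reviews are indistinguishable, so no model downstream can recover the difference in sentiment.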