Model Interpretability — SHAP & LIME
"The model said no" isn't an answer
A model rejects a loan application, or flags a patient as high-risk, and someone asks the only question that matters: why? If your answer is "the model said so," you don't have an answer — you have a liability. Interpretability tools exist to turn a bare score into a story you can check, defend, and act on.
Some models explain themselves. Linear and logistic regression wear their reasoning on their sleeve: read the weights and you know exactly how each feature moves the prediction. A small decision tree is even friendlier: follow the splits from root to leaf and the prediction narrates itself. But the models that actually win on tabular data, random forests and boosted ensembles like XGBoost, combine hundreds of trees: a forest averages their outputs, and boosting sums them. No human reads hundreds of trees. They're accurate precisely because they're complicated, and complicated means opaque.
Linear/logistic weights and small-tree splits are the explanation. No extra tooling needed — but you pay for it in accuracy on messy data.
Forests and boosted trees blend hundreds of overlapping rules. The prediction is great; the reasoning is smeared across the whole ensemble.
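Tools like permutation importance, LIME, and SHAP work on the already-trained model and reverse-engineer its reasons, with no retraining required. Most of them only need its predictions; SHAP's TreeSHAP variant also reads the tree structure to run faster.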
Interpretability tools take a trained black-box model and answer two questions: which features matter overall (global) and why it made this one prediction (local).
Global vs local: two different questions
"Explain the model" is really two requests wearing one trench coat. A data scientist auditing the model wants the big picture; the rejected applicant wants their own picture. Different tools answer each.
Across the whole dataset, what does the model lean on? Useful for audits, sanity checks, and feature pruning.
For one specific row, which feature values pushed the score where it landed? This is what LIME and SHAP deliver.
The cleanest global tool is permutation importance: take one feature's column, shuffle it so it becomes useless noise (while keeping its distribution intact), and re-score the model on held-out data. If accuracy barely moves, the model never needed that feature. If accuracy craters, the model was leaning on it hard. Repeat per feature and you get an honest ranking — and because it only needs predictions, it works on any model.
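The shuffle-and-re-score loop is short enough to write out. The sketch below uses NumPy only; the `black_box` model, the `accuracy` metric, and the synthetic validation data are hypothetical stand-ins for your trained model and held-out set, and `n_repeats` averages several shuffles because a single shuffle is noisy. For real work, scikit-learn's `sklearn.inspection.permutation_importance` does the same job.

```python
import numpy as np


def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Average drop in `metric` when each column of X is shuffled.

    predict: any callable mapping an (n, d) array to predictions.
    metric:  callable(y_true, y_pred) -> score where higher is better.
    """
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle one column: its values (and distribution) survive,
            # but their link to each row's target is destroyed.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[j] += baseline - metric(y, predict(X_perm))
    return drops / n_repeats


def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))


def black_box(X):
    # Hypothetical model: approve when x0 + 0.5 * x1 > 0; ignores x2 entirely.
    return (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)


# Hypothetical held-out data labelled by the same rule.
rng = np.random.default_rng(42)
X_val = rng.normal(size=(1000, 3))
y_val = black_box(X_val)

importances = permutation_importance(black_box, X_val, y_val, accuracy)
print(importances.round(3))  # feature 0 > feature 1 > feature 2 == 0
```

Because the toy model never looks at feature 2, shuffling it cannot change a single prediction, so its importance is exactly zero, while feature 0 (weight 1) outranks feature 1 (weight 0.5).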
Tree libraries ship a free "feature importance" computed from how much each split reduced impurity during training. It's fast but biased: it inflates high-cardinality features (an ID-like column with thousands of unique values gets endless splitting opportunities) and it's measured on training data. Permutation importance — shuffle and re-score on a validation set — is slower but honest.
LIME in one idea: a tiny linear model, valid nearby
LIME (Local Interpretable Model-agnostic Explanations) starts from a simple observation: a black box's decision boundary may be hopelessly curvy globally, but zoom in close enough to one point and it looks almost flat — the same way the Earth is round but your street is flat.
Take the one row you want to explain and generate hundreds of variants — nudge the income, flip a category, jitter the numbers.
Run every variant through the black box and record its predictions. You're probing how the model behaves around this point.
Fit a small linear model to those (variant, prediction) pairs, weighting nearby variants more. Its coefficients are the local explanation.
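Those three steps fit in one small function. This is a from-scratch sketch for numeric features, not the `lime` package's implementation: the Gaussian perturbations, the `feature_scale` argument, and the exponential proximity kernel with width `0.75 * sqrt(d)` are assumptions chosen for illustration, and `black_box` is a hypothetical logistic model.

```python
import numpy as np


def lime_explain(predict_proba, x, feature_scale, n_samples=500, kernel_width=None, seed=0):
    """LIME-style local surrogate for one numeric row `x`.

    Returns (intercept, coefs): a weighted linear fit of the black box around x.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(d)  # assumed default for this sketch
    # Step 1: perturb, i.e. jitter x with Gaussian noise scaled per feature.
    Z = x + rng.normal(size=(n_samples, d)) * feature_scale
    # Step 2: query the black box on every variant.
    preds = predict_proba(Z)
    # Step 3: weight variants by proximity to x, then fit weighted least squares.
    dist = np.sqrt((((Z - x) / feature_scale) ** 2).sum(axis=1))
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    A = np.hstack([np.ones((n_samples, 1)), Z - x])  # intercept column + offsets from x
    sw = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], preds * sw, rcond=None)
    return beta[0], beta[1:]


def black_box(X):
    # Hypothetical model: logistic score of 2*x0 - x1; x2 is ignored.
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - X[:, 1])))


row = np.array([0.5, 1.0, 3.0])  # sits right on the 0.5 decision boundary
intercept, coefs = lime_explain(black_box, row, feature_scale=np.full(3, 0.3))
print(round(float(intercept), 3), coefs.round(3))
```

The surrogate recovers the local picture: a positive weight on feature 0, a negative weight on feature 1 about half as large, and roughly zero on the ignored feature 2. Changing `seed` resamples the neighborhood and nudges the coefficients, which is the run-to-run wobble LIME is known for.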
The result reads like a regression that holds only near this point: "three late payments pushed the score down 0.21, income below $40k pushed it down 0.14, six years of employment pulled it up 0.06." It's fast, intuitive, and works on any model that returns predictions.
LIME's perturbations are random — run it twice on the same row and you can get two different stories, especially when the local boundary is genuinely curvy or features are correlated. Treat LIME as a quick sketch of the neighborhood, not a sworn statement.
SHAP in one idea: split the credit fairly
SHAP comes from cooperative game theory. Picture the prediction as a payout and the features as players on a team. The Shapley value — a 1950s game-theory result — is the provably fair way to split the payout: for each feature, average its marginal contribution over all possible orders in which the features could "join the team." Join early or join late, every ordering counts, so no feature gets credit just for showing up first.
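Written out, "average over every ordering" is the standard Shapley formula. Here N is the set of n features, v(S) is the model's expected output when only the features in S are known (how "known" is defined is a modeling choice), and P_i^π is the set of features that join before feature i in ordering π. The second form groups orderings by the coalition S already present, which is where the weight |S|!(n−|S|−1)!/n! comes from.

```latex
\phi_i \;=\; \frac{1}{n!}\sum_{\pi}\Big[v\big(P_i^{\pi}\cup\{i\}\big)-v\big(P_i^{\pi}\big)\Big]
\;=\; \sum_{S\subseteq N\setminus\{i\}}\frac{|S|!\,(n-|S|-1)!}{n!}\Big[v\big(S\cup\{i\}\big)-v(S)\Big]
```

Within any single ordering the marginal contributions telescope, so summing over features gives \sum_{i\in N}\phi_i = v(N) - v(\varnothing): the model's output minus the average prediction.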
The "game" is this one prediction. Each feature value gets a signed contribution: how much it pushed the score up or down from the average.
A feature's Shapley value averages its effect across every order of arrival. It is the only attribution scheme satisfying four fairness axioms: efficiency, symmetry, dummy (null player), and additivity.
Start from the base rate (the average prediction) and add every feature's contribution — you land exactly on the model's output. That's what makes waterfall plots possible.
Averaging over all orderings is exponential in general — hopeless to brute-force. But for tree models (forests, gradient boosting, XGBoost) the TreeSHAP algorithm exploits the tree structure to compute exact Shapley values in polynomial time. This is why SHAP became the default explanation tool for tabular ML: the models that most need explaining are the ones it explains fastest.
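For a handful of features, brute force is still feasible and makes the definition concrete. The sketch below enumerates every coalition for one row. It assumes an interventional value function (features outside S are filled in from a background sample and the predictions averaged), which is one common choice, not the only one; `toy_model` and the data are hypothetical. Its cost doubles with every added feature, which is exactly the blow-up TreeSHAP avoids.

```python
import itertools
import math

import numpy as np


def exact_shapley(predict, x, background):
    """Exact Shapley values for one row by enumerating every coalition.

    Assumed value function (interventional): v(S) is the mean prediction when
    the features in S are fixed to x's values and the rest come from each
    background row. Work grows as 2**n_features, so keep n small.
    """
    n = x.shape[0]

    def v(S):
        Xs = background.copy()
        idx = list(S)
        if idx:
            Xs[:, idx] = x[idx]
        return float(np.mean(predict(Xs)))

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Fraction of orderings in which exactly this coalition precedes i.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for S in itertools.combinations(others, size):
                phi[i] += weight * (v(S + (i,)) - v(S))
    return v(()), phi  # (base rate, per-feature contributions)


def toy_model(X):
    # Hypothetical model with an interaction between features 0 and 2.
    return X[:, 0] + 2.0 * X[:, 1] + X[:, 0] * X[:, 2]


rng = np.random.default_rng(0)
background = rng.normal(size=(200, 3))
row = np.array([1.0, -1.0, 2.0])

base, phi = exact_shapley(toy_model, row, background)
print(round(base, 3), phi.round(3), round(base + phi.sum(), 3))
```

Two properties can be checked directly: the base rate plus all contributions lands exactly on the model's output for this row, and feature 1, which enters the toy model only additively, receives exactly its weight times its distance from the background mean.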
Watch one rejection get explained
The animation follows a single loan applicant. First the black box delivers its verdict — rejected, score 0.23 — with no reasons attached. Then each tool opens the box a little wider: permutation importance shuffles columns one at a time and ranks features by how far accuracy falls, LIME scatters perturbed copies of the applicant around the decision boundary and fits a local line through them, and finally a SHAP waterfall starts at the 0.50 base rate and lets each feature push the score down (red) or up (green) until it lands exactly on 0.23.
Read explanations with care
An explanation is a measurement of the model, not of the world. Used well, these tools are the best debugger you have; over-read them and they'll happily tell you a confident, additive, beautifully-plotted lie.
Use them to:
- Debug the model: a feature that "shouldn't" dominate usually means a bug upstream
- Spot leakage: if a suspiciously perfect feature tops every chart, you likely have data leakage
- Sanity-check before shipping: do the top features match domain sense?
- Communicate: give a regulator or an applicant the model's actual reasons
Don't expect them to:
- Prove causation: "income drove the score" is about the model, not about life
- Stay stable under correlation: correlated features split or swap credit between runs and methods
- Describe reality: importance reflects this model's habits, not the true drivers
- Justify a bad model: a tidy waterfall over garbage is still garbage
If income and education are strongly correlated, the model can lean on either — so SHAP may split the credit between them, while LIME or a retrained model may hand it all to one. Neither is "wrong"; the model genuinely can't tell them apart. When attributions matter, check feature correlations first, and treat explanations as a tool to debug — catching leakage, junk features, and silent bugs — at least as much as a tool to justify.
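A quick pre-flight check for that situation is to list strongly correlated feature pairs before reading any attributions. The sketch below uses pairwise Pearson correlation with an arbitrary 0.8 threshold; the feature names and the synthetic data are hypothetical.

```python
import numpy as np


def correlated_pairs(X, names, threshold=0.8):
    """Feature pairs whose absolute Pearson correlation exceeds `threshold`."""
    corr = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], round(float(corr[i, j]), 2)))
    return pairs


# Hypothetical data: education_years tracks income closely; tenure is independent.
rng = np.random.default_rng(0)
income = rng.normal(50, 15, size=500)
education_years = 0.2 * income + rng.normal(0, 1, size=500)
tenure = rng.normal(5, 2, size=500)
X = np.column_stack([income, education_years, tenure])

pairs = correlated_pairs(X, ["income", "education_years", "tenure"])
print(pairs)  # only (income, education_years) is flagged
```

Pearson correlation only catches linear, pairwise relationships; a feature that is a nonlinear function of several others will slip through, so treat an empty result as reassurance, not proof.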