Recommender Systems

ML, collaborative filtering, matrix factorization, latent factors

A table full of holes

Picture a giant table: users down the side, items across the top, a rating in each cell. The catch — almost every cell is empty, because no one rates everything. A recommender system's job is to predict the missing entries, then suggest the items it thinks you'd score highest.
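As a rough illustration of that table, the sketch below stores only the ratings that exist and lists the holes a recommender would have to predict. The user names, film titles and scores are made-up toy data, not anything from a real system.

```python
# Hypothetical toy data: only the cells that actually hold a rating are stored.
ratings = {
    ("ana", "Alien"): 5,
    ("ana", "Heat"): 3,
    ("ben", "Alien"): 4,
    ("cho", "Up"): 5,
}

users = sorted({user for user, _ in ratings})
items = sorted({item for _, item in ratings})

# Every (user, item) pair absent from `ratings` is an empty cell to predict.
missing = [(u, i) for u in users for i in items if (u, i) not in ratings]

# Fraction of the full users-by-items table that is empty.
sparsity = len(missing) / (len(users) * len(items))
print(f"{len(missing)} of {len(users) * len(items)} cells are empty")
```

Even in this tiny table more than half the cells are empty; in real catalogues the share of empty cells is far higher.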

Content-based: match features

Recommend items similar to ones you liked, using item attributes (genre, brand, keywords).

Collaborative: match people

"People like you also liked…" — use the crowd's behaviour, no item features needed.

Hybrid: both

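Real systems usually blend the two. Blending also helps with the cold-start problem: a brand-new user or item has no ratings yet, so collaborative filtering has nothing to go on until item attributes can fill the gap.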
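The content-based approach can be sketched in a few lines. This is a minimal illustration, assuming each item is described by hand-written 0/1 attribute flags and that cosine similarity is the measure of "similar"; the titles, flags and the helper names `cosine` and `similar_items` are all hypothetical.

```python
import math

# Hypothetical item attributes as 0/1 flags: [sci-fi, action, animation]
item_features = {
    "Alien":  [1, 1, 0],
    "Aliens": [1, 1, 0],
    "Heat":   [0, 1, 0],
    "Up":     [0, 0, 1],
}

def cosine(a, b):
    """Cosine similarity between two attribute vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similar_items(liked, features):
    """Rank every other item by attribute similarity to `liked`, most similar first."""
    scores = [
        (cosine(features[liked], vec), name)
        for name, vec in features.items()
        if name != liked
    ]
    return [name for _, name in sorted(scores, key=lambda pair: -pair[0])]

print(similar_items("Alien", item_features))
```

No ratings from other users are involved here, which is why this style of recommender can still say something about an item nobody has rated yet.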

Two ways to fill a blank

First, neighbourhood-based collaborative filtering: find users who agree with you and borrow their opinion. Then, matrix factorization: explain every rating as a handful of hidden "taste factors".
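The first approach, borrowing opinions from users who agree with you, might look like the sketch below. It assumes cosine similarity over co-rated items and a similarity-weighted average as the prediction rule; both are common choices rather than the only ones, and the ratings are invented toy data.

```python
import math

# Hypothetical toy ratings: user -> {item: rating}
ratings = {
    "ana": {"Alien": 5, "Heat": 4, "Up": 1},
    "ben": {"Alien": 5, "Heat": 5, "Up": 1, "Jaws": 5},
    "cho": {"Alien": 1, "Heat": 2, "Up": 5, "Jaws": 1},
}

def similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(ratings[a]) & set(ratings[b])
    if not shared:
        return 0.0
    dot = sum(ratings[a][i] * ratings[b][i] for i in shared)
    norm_a = math.sqrt(sum(ratings[a][i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(ratings[b][i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def predict(user, item):
    """Similarity-weighted average of the ratings other users gave `item`."""
    neighbours = [
        (similarity(user, other), theirs[item])
        for other, theirs in ratings.items()
        if other != user and item in theirs
    ]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None  # nobody comparable has rated this item
    return sum(sim * rating for sim, rating in neighbours) / total

print(predict("ana", "Jaws"))
```

Ana's tastes track Ben's more closely than Cho's, so the prediction for "Jaws" is pulled toward Ben's high score. Note that `predict` has to scan every other user, which hints at why neighbour search becomes expensive as the user base grows.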

Matrix factorization, in one idea

R ≈ U × Vᵀ

Approximate the huge, sparse ratings matrix R as the product of two skinny matrices: one row of hidden factors per user (U) and one per item (V). A user's predicted rating is the dot product of their factor vector with the item's. Learn U and V by minimizing the prediction error on the ratings you do have, using the same gradient-descent idea that can be used to fit a linear regression.
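To make the idea concrete, here is a minimal sketch that learns U and V with stochastic gradient descent on squared error over the observed cells only. The toy ratings, the number of factors `k`, the learning rate, the epoch count and the random seed are all assumptions picked for illustration, and the sketch leaves out refinements a practical system would likely need.

```python
import random

# Hypothetical toy data: (user index, item index, rating) for known cells only.
observed = [
    (0, 0, 5.0), (0, 1, 4.0),
    (1, 0, 5.0), (1, 1, 5.0), (1, 2, 1.0),
    (2, 1, 2.0), (2, 2, 5.0),
]
n_users, n_items, k = 3, 3, 2  # k = number of hidden factors (assumed)

rng = random.Random(0)
# One row of k factors per user (U) and per item (V), randomly initialised.
U = [[rng.uniform(0.0, 1.0) for _ in range(k)] for _ in range(n_users)]
V = [[rng.uniform(0.0, 1.0) for _ in range(k)] for _ in range(n_items)]

def predict(u, i):
    """Predicted rating: dot product of the user's and the item's factor rows."""
    return sum(a * b for a, b in zip(U[u], V[i]))

def rmse():
    """Root-mean-square error over the observed ratings."""
    squared = sum((r - predict(u, i)) ** 2 for u, i, r in observed)
    return (squared / len(observed)) ** 0.5

rmse_before = rmse()
learning_rate = 0.02
for epoch in range(2000):
    rng.shuffle(observed)
    for u, i, r in observed:
        err = r - predict(u, i)
        for f in range(k):
            u_f, v_f = U[u][f], V[i][f]
            # Gradient step on the squared error for this single rating.
            U[u][f] += learning_rate * err * v_f
            V[i][f] += learning_rate * err * u_f
rmse_after = rmse()

print(f"RMSE before: {rmse_before:.3f}, after: {rmse_after:.3f}")
print("Predicted rating for the empty cell (user 0, item 2):", predict(0, 2))
```

Cells (0, 2) and (2, 0) never appear in `observed`, yet `predict` still returns a value for them, because every user and item has a factor row learned from the ratings that do exist.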

Neighbourhood CF
  • Intuitive, easy to explain
  • Struggles when the matrix is very sparse
  • Slow to find neighbours at scale
Matrix factorization
  • Compresses millions of ratings into small factor vectors
  • Generalizes through learned latent structure
  • A core technique in the top Netflix Prize solutions