Amazon Recommendations

The principal axes transform is a rigid registration technique for images. Taking the singular value decomposition (SVD) of the correlation matrix of fiducial markers yields the parameters of the transformation: in three dimensions, three rotations and three translations.
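To make the SVD step concrete, here is a minimal sketch of the least-squares rigid fit between two sets of matched markers (often called the Kabsch or Arun method). It assumes NumPy is available. The marker coordinates, the 30-degree test rotation, and the function name `fit_rigid_transform` are all made up for illustration.

```python
import numpy as np

def fit_rigid_transform(source, target):
    """Least-squares rotation R and translation t with target ~ source @ R.T + t."""
    source_mean = source.mean(axis=0)
    target_mean = target.mean(axis=0)
    # 3x3 cross-covariance ("correlation") matrix of the centered markers.
    h = (source - source_mean).T @ (target - target_mean)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = target_mean - rotation @ source_mean
    return rotation, translation

# Hypothetical marker positions and a known transform to recover.
markers = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
])
angle = np.deg2rad(30.0)
true_rotation = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
true_translation = np.array([1.0, -2.0, 0.5])
moved = markers @ true_rotation.T + true_translation

rotation, translation = fit_rigid_transform(markers, moved)
```

With noise-free markers like these, `rotation` and `translation` should match the transform that generated `moved` up to floating-point error. Real fiducial data would be noisy, and the fit would then be the best rigid transform in the least-squares sense.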

In information retrieval, the same idea is known as latent semantic analysis. For example, if we have data linking customers to the items they’ve bought, we can build a customer-by-item matrix from it. Low-rank approximations of this matrix can be used to cluster customers. For data the size that Amazon has, the matrix is enormous and extremely sparse, and decomposing it across millions of customers and items is too expensive, so this method doesn’t scale.
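As a toy illustration of the low-rank idea, here is a sketch that assumes NumPy and a deliberately clean, made-up purchase matrix with two groups of customers. The function names are hypothetical, and nothing here reflects how any real system stores its data.

```python
import numpy as np

# Hypothetical purchase matrix: rows are customers, columns are items,
# and 1 means the customer bought the item.
purchases = np.array([
    [1, 1, 1, 0, 0, 0],  # customers 0-2 only buy items 0-2
    [1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],  # customers 3-5 only buy items 3-5
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

def low_rank_customer_embedding(matrix, rank):
    """Return (embedding, approximation) from a truncated SVD of the given rank."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    embedding = u[:, :rank] * s[:rank]        # customers in the latent space
    approximation = embedding @ vt[:rank, :]  # rank-limited reconstruction
    return embedding, approximation

def cosine(a, b):
    """Cosine similarity between two latent customer vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

embedding, approx = low_rank_customer_embedding(purchases, rank=2)
same_group = cosine(embedding[0], embedding[1])
other_group = cosine(embedding[0], embedding[4])
```

In this contrived case the two latent dimensions line up with the two groups, so `same_group` comes out close to 1 and `other_group` close to 0. The full SVD on a small dense array is exactly the step that becomes impractical at Amazon's scale.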

How does Amazon do it then? I didn’t know whether the recommendations were computed in real-time or updated offline, like Google does with PageRank. Some research led me to this publication: Amazon.com recommendations: item-to-item collaborative filtering. The publication has a decent review of existing methods (collaborative filtering, cluster models, and search-based methods) and the reasons why they don’t scale for Amazon. More importantly, they aren’t fine-grained enough to recommend relevant items.

Amazon’s approach is a little different. Instead of assigning a user to a cluster of similar customers, they group similar items instead. For the details, read the paper (it also has pseudo-code). This might sound obvious now, but it was pretty novel ten years back (a patent from 1998 by Amazon precedes the publication). The online part of the algorithm scales independently of the number of customers and the number of items in the product catalog. The computation of similar items is still expensive and done offline, but the retrieval can be done in real-time with high quality.
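Here is a rough sketch of the offline/online split, not the paper's actual code. It assumes binary purchase data and cosine similarity between items. The customers, items, and function names are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical purchase history: customer -> set of items bought.
purchases = {
    "alice": {"book", "lamp", "pen"},
    "bob": {"book", "pen"},
    "carol": {"lamp", "mug"},
    "dave": {"book", "pen", "mug"},
}

def build_similar_items(purchases):
    """Offline step: cosine similarity between items that share a buyer."""
    buyers = defaultdict(set)     # item -> customers who bought it
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    co_counts = defaultdict(int)  # (item_a, item_b) -> number of shared buyers
    for items in purchases.values():
        for a, b in combinations(sorted(items), 2):
            co_counts[(a, b)] += 1

    similar = defaultdict(dict)
    for (a, b), shared in co_counts.items():
        # Cosine of two binary purchase vectors.
        score = shared / sqrt(len(buyers[a]) * len(buyers[b]))
        similar[a][b] = score
        similar[b][a] = score
    return similar

def recommend(similar, owned, top_n=3):
    """Online step: sum similarities over the items the customer already owns."""
    scores = defaultdict(float)
    for item in owned:
        for other, score in similar.get(item, {}).items():
            if other not in owned:
                scores[other] += score
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda name: (-scores[name], name))[:top_n]

similar_items = build_similar_items(purchases)
book_recs = recommend(similar_items, {"book"})
```

The expensive part, `build_similar_items`, only looks at pairs of items that some customer bought together, and it can run as a batch job. `recommend` touches only the similarity lists for the items a customer owns, which is why the lookup can stay fast no matter how large the catalog gets.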
