Archive for August, 2007

Economical SVD

Posted in Computing 1 year, 5 months ago

Principle Component Analysis (PCA) generates a rotation for the covariance matrix of random variables so that it becomes diagonal. You can do this by computing the eigenvalue decomposition of the covariance matrix. An easier method is to compute the singular value decomposition (SVD) of the data itself (which in general is non-square.) The eigenvalues of the covariance are the square of the singular values and the eigenvectors are only off by a constant factor.

Y = U S V^t

Let’s do a quick check of the dimensions: If the original matrix with the data Y is of the dimension m-by-n, then U is m-by-m, S is m-by-n and V is n-by-n.

If your data comes from images, then for a 1024×1024 image, the number of rows in Y is on the order of a million. U in this case would be a million by a million. Ouch! Moreover, S is always diagonal and of the same dimensions as the original matrix. In general, this isn’t stored in a sparse fashion.

With PCA, you’re only interested in the first few principle components anyways, so there’s no point in computing all of the basis vectors when you don’t need them.

Enter Economical SVD.

I first came across this while reading the documentation for VXL. Using the “economy” mode for the SVD only computes the first n eigenvectors and eigenvalues. This results in U being m-by-n, S being n-by-n and V being n-by-n. A huge saving in memory and computation time!

In languages such as Matlab, calling this routine is as simple as sending a zero in the second argument. Wikipedia has a whole section on reduced SVDs.

Mersenne Twister

Posted in Computing 1 year, 5 months ago

Mersenne Twister is a high quality random number generator. One of the many qualities is the huge period: 2^19937-1 (which incidently is a Mersenne prime.) I don’t see why one would use anything other than MT for all Monte Carlo simulations.

Recently I was brought to the attention of SFMT, which stands for SIMD-oriented Fast Mersenne Twister by the same folks. The authors claim it to be twice as fast as the original implementation. I remember using the implementation in AMD’s C Math Library: wonder how much faster this is compared to AMD’s.

A Monte Carlo engine I was writing for non-parametric statistical testing required upward of 40 million random numbers. I was happy to see that both Matlab as well as Python had native support for MT. I hope the faster version of MT is included with the next update to Python as it’s trivial to embed C code within Python.