Anton

Posted in Paper 1 year, 5 months ago

Named after Anton van Leeuwenhoek, the father of microbiology, this is the latest from D. E. Shaw Research. This paper describes the massively-parallel hardware part of their super software-hardware technology. I’ve previously described their software here.

Scheduled for completion at the end of 2008, Anton is built with the aim of doing millisecond simulations. In contrast with approaches like Folding@Home, they are interested in a single trajectory. Folding@Home patches together many small trajectories that are more feasible on a regular desktop PC. Their initial configuration will have 512 nodes.

Many of the ideas implemented in their software simulation engine Desmond were developed with Anton in mind. For long range electrostatic interactions, they use a method called k-space Gaussian split Ewald, reducing the computational burden from O(n^2) to O(n log(n)). Another advantage is that the same kernel is used for force interpolation and spreading saving real estate on the chip.

As any molecular dynamics aficionado knows, about 90% of the computational time is spent in computing non-bonded forces. With this is mind, they’ve done everything possible to speed up the process devoting much of the ASIC space. One of their ideas that struck me was the explicit computation between all pairs and then to go back subtract correction terms. This leads to a clear separation in the accounting between what particles are required for what forces – a major win for scalability.

They get major speedups by building a specialized hardware datapath and control logic tuned for common molecular dynamics communication patterns. They even have dedicated support to accumulate forces for reducing latency. I was previously excited by the addition of the dot product instruction haddps to recent generations of processors. Anton does dot products in hardware. Dot products can be used everytime you’d want to “squish” a vector.

Anton does all computation in fixed point arithmetic, which is faster and more accurate for the same silicon area (their claim.) Molecular dynamics is expensive in computation and communication, but not so much in memory. All of their data for a 25,000 particle system fits in L1 cache.

One of the courses in Engineering at SFU (ENSC 250) deals with designing a RISC processor from the ground up. I vaguely remember building branch prediction logic in VHDL, not really knowing if I’d ever use that knowledge again. If only I’d known…

Anyways, I’m missing out on some of the finer details of their technology. Go read the paper for the real deal.

Going Shopping

Posted in Activity 1 year, 5 months ago

Not for grocery, clothes, shoes, electronics or a car, but for university courses!

Shopping for courses

This made me laugh so much that it made me cry. Or maybe the other way.

Economical SVD

Posted in Computing 1 year, 5 months ago

Principle Component Analysis (PCA) generates a rotation for the covariance matrix of random variables so that it becomes diagonal. You can do this by computing the eigenvalue decomposition of the covariance matrix. An easier method is to compute the singular value decomposition (SVD) of the data itself (which in general is non-square.) The eigenvalues of the covariance are the square of the singular values and the eigenvectors are only off by a constant factor.

Y = U S V^t

Let’s do a quick check of the dimensions: If the original matrix with the data Y is of the dimension m-by-n, then U is m-by-m, S is m-by-n and V is n-by-n.

If your data comes from images, then for a 1024×1024 image, the number of rows in Y is on the order of a million. U in this case would be a million by a million. Ouch! Moreover, S is always diagonal and of the same dimensions as the original matrix. In general, this isn’t stored in a sparse fashion.

With PCA, you’re only interested in the first few principle components anyways, so there’s no point in computing all of the basis vectors when you don’t need them.

Enter Economical SVD.

I first came across this while reading the documentation for VXL. Using the “economy” mode for the SVD only computes the first n eigenvectors and eigenvalues. This results in U being m-by-n, S being n-by-n and V being n-by-n. A huge saving in memory and computation time!

In languages such as Matlab, calling this routine is as simple as sending a zero in the second argument. Wikipedia has a whole section on reduced SVDs.

Mersenne Twister

Posted in Computing 1 year, 5 months ago

Mersenne Twister is a high quality random number generator. One of the many qualities is the huge period: 2^19937-1 (which incidently is a Mersenne prime.) I don’t see why one would use anything other than MT for all Monte Carlo simulations.

Recently I was brought to the attention of SFMT, which stands for SIMD-oriented Fast Mersenne Twister by the same folks. The authors claim it to be twice as fast as the original implementation. I remember using the implementation in AMD’s C Math Library: wonder how much faster this is compared to AMD’s.

A Monte Carlo engine I was writing for non-parametric statistical testing required upward of 40 million random numbers. I was happy to see that both Matlab as well as Python had native support for MT. I hope the faster version of MT is included with the next update to Python as it’s trivial to embed C code within Python.

Browse Url

Posted in Computing 1 year, 5 months ago

Superhacker Federico Mena-Quintero talks about using Emacs for GTD. I do something similar but in a less organized fashion. He mentions that it would be nice if it were better integrated with the desktop. For URLs, I’ve been using the browse-url package for many years. Dropping the following line into your .emacs enables the feature.

(global-set-key (kbd "C-c b") 'browse-url)

Pressing C-c b parses the text around the cursor for a plausible URL and uses that as the default. Pressing enter will open the default browser with the chosen URL.

browse-url

Pretty cool, huh?