Named after Anton van Leeuwenhoek, the father of microbiology, this is the latest from D. E. Shaw Research. This paper describes the massively-parallel hardware part of their super software-hardware technology. I’ve previously described their software here.
Scheduled for completion at the end of 2008, Anton is built with the aim of running millisecond-scale simulations. In contrast with approaches like Folding@Home, which patches together many short trajectories that are feasible on a regular desktop PC, the Anton team is interested in a single long trajectory. The initial configuration will have 512 nodes.
Many of the ideas implemented in their software simulation engine Desmond were developed with Anton in mind. For long-range electrostatic interactions, they use a method called k-space Gaussian split Ewald, which reduces the computational burden from O(n^2) to O(n log(n)). Another advantage is that the same kernel is used for both force interpolation and charge spreading, which saves real estate on the chip.
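To make the "same kernel for spreading and interpolation" point concrete, here is a toy 1D sketch. It is not Gaussian split Ewald itself, and the function names, the periodic grid, and the weight normalization are all my own assumptions for illustration. It only shows that one Gaussian weight function can serve both directions: particle to grid, and grid back to particle.

```python
import math

def gaussian_weights(x, n_grid, spacing, sigma, support=3):
    """Normalized Gaussian weights tying position x to nearby points of a periodic grid.

    Hypothetical helper: the support width and normalization are illustrative choices.
    """
    center = int(round(x / spacing))
    raw = {}
    for k in range(center - support, center + support + 1):
        d = x - k * spacing
        w = math.exp(-d * d / (2.0 * sigma * sigma))
        raw[k % n_grid] = raw.get(k % n_grid, 0.0) + w
    total = sum(raw.values())
    return {k: w / total for k, w in raw.items()}

def spread(positions, charges, n_grid, spacing, sigma):
    """Particle -> grid: deposit each charge onto the grid using the kernel."""
    grid = [0.0] * n_grid
    for x, q in zip(positions, charges):
        for k, w in gaussian_weights(x, n_grid, spacing, sigma).items():
            grid[k] += q * w
    return grid

def interpolate(grid, x, spacing, sigma):
    """Grid -> particle: read a grid quantity back at x using the very same kernel."""
    weights = gaussian_weights(x, len(grid), spacing, sigma)
    return sum(w * grid[k] for k, w in weights.items())

grid = spread([1.3, 4.7], [1.0, -0.5], n_grid=16, spacing=0.5, sigma=0.4)
value_at_particle = interpolate(grid, 1.3, spacing=0.5, sigma=0.4)
```

Because both directions call gaussian_weights, a hardware version would only need one unit for it, which is the intuition behind the chip-area saving as I understand it.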
As any molecular dynamics aficionado knows, about 90% of the computational time is spent computing non-bonded forces. With this in mind, they’ve done everything possible to speed up that step, devoting much of the ASIC’s area to it. One idea that struck me was to compute interactions explicitly between all pairs and then go back and subtract correction terms. This leads to a clear separation in the accounting of which particles are required for which forces, a major win for scalability.
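Here is a minimal sketch of the "all pairs, then subtract corrections" idea as I understand it. The bare Coulomb energy in reduced units, the function names, and the excluded-pair set are my own assumptions, not anything taken from the paper. The point is that the inner loop needs no exclusion test, and the corrections become a separate, much smaller pass.

```python
import itertools
import math

def coulomb(qi, qj, ri, rj):
    """Bare Coulomb pair energy in reduced units (illustrative only)."""
    return qi * qj / math.dist(ri, rj)

def energy_skip_excluded(charges, coords, excluded):
    """Conventional approach: test every pair against the exclusion list."""
    total = 0.0
    for i, j in itertools.combinations(range(len(charges)), 2):
        if (i, j) in excluded:
            continue
        total += coulomb(charges[i], charges[j], coords[i], coords[j])
    return total

def energy_all_then_correct(charges, coords, excluded):
    """Branch-free pass over all pairs, followed by a separate correction pass."""
    total = sum(
        coulomb(charges[i], charges[j], coords[i], coords[j])
        for i, j in itertools.combinations(range(len(charges)), 2)
    )
    correction = sum(
        coulomb(charges[i], charges[j], coords[i], coords[j])
        for i, j in excluded
    )
    return total - correction

charges = [1.0, -1.0, 0.5, -0.5]
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0), (2.0, 2.0, 1.0)]
excluded = {(0, 1), (1, 2)}  # hypothetical bonded pairs, stored with i < j
```

Both functions give the same energy up to rounding, but the second keeps the uniform all-pairs loop free of special cases, which is the kind of work that maps well onto a fixed hardware pipeline.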
They get major speedups by building a specialized hardware datapath and control logic tuned for common molecular dynamics communication patterns. They even have dedicated support for accumulating forces, which reduces latency. I was previously excited by the addition of the horizontal-add instruction haddps, which speeds up dot products, to recent generations of processors. Anton does dot products in hardware. Dot products can be used every time you’d want to “squish” a vector down to a single number.
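As a small illustration of "squishing" a vector with a dot product, here is a sketch of the most common case in molecular dynamics: turning a displacement vector into a squared distance for a cutoff test. The function names are mine, and this is ordinary software, not a model of what Anton's hardware does.

```python
def dot(a, b):
    """Dot product: collapses two equal-length vectors into one number."""
    return sum(x * y for x, y in zip(a, b))

def within_cutoff(ri, rj, cutoff):
    """Compare squared distance against squared cutoff, avoiding a square root."""
    d = [a - b for a, b in zip(ri, rj)]
    return dot(d, d) <= cutoff * cutoff

# Two particles 5 units apart are just inside a cutoff of 5.
close_enough = within_cutoff((0.0, 0.0, 0.0), (3.0, 4.0, 0.0), 5.0)
```

This check runs for every candidate pair on every time step, which is why a fast dot product matters so much.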
Anton does all computation in fixed-point arithmetic, which, by their claim, is faster and more accurate for the same silicon area. Molecular dynamics is expensive in computation and communication, but not so much in memory. All of their data for a 25,000-particle system fits in L1 cache.
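One property of fixed-point arithmetic that is easy to demonstrate is that accumulation does not depend on the order of the terms, because integer addition is associative. The sketch below is only an illustration: the 24 fractional bits and the helper names are assumptions of mine, not Anton's actual number format, and Python's unbounded integers hide the overflow handling that real hardware would need.

```python
import random

FRAC_BITS = 24            # hypothetical number of fractional bits
SCALE = 1 << FRAC_BITS

def to_fixed(x):
    """Quantize a float to an integer count of 2**-FRAC_BITS units."""
    return int(round(x * SCALE))

def from_fixed(n):
    """Convert a fixed-point integer back to a float."""
    return n / SCALE

def accumulate_fixed(values):
    """Sum in fixed point; the result is the same for any ordering of values."""
    return sum(to_fixed(v) for v in values)

random.seed(0)
forces = [random.uniform(-1.0, 1.0) for _ in range(1000)]
shuffled = forces[:]
random.shuffle(shuffled)

same_in_fixed_point = accumulate_fixed(forces) == accumulate_fixed(shuffled)
same_in_floating_point = sum(forces) == sum(shuffled)  # not guaranteed
```

The fixed-point sums always agree, while the floating-point sums can differ in the last bits depending on the order. That matters when force contributions arrive from many nodes in no particular order.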
One of the courses in Engineering at SFU (ENSC 250) deals with designing a RISC processor from the ground up. I vaguely remember building branch prediction logic in VHDL, not really knowing if I’d ever use that knowledge again. If only I’d known…
Anyways, I’m missing out on some of the finer details of their technology. Go read the paper for the real deal.