Named after Anton van Leeuwenhoek, the father of microbiology,
Anton is the latest from D. E. Shaw Research. This paper
describes the massively parallel hardware half of their combined
software-hardware technology. I’ve previously described their software
here.
Scheduled for completion at the end of 2008, Anton is built with the
aim of doing millisecond simulations. In contrast with approaches like
Folding@Home, D. E. Shaw Research is interested in a single long
trajectory. Folding@Home patches together many short trajectories that
are more feasible on a regular desktop PC. Anton's initial configuration
will have 512 nodes.
Many of the ideas implemented in their software simulation engine
Desmond were developed with Anton in mind. For long-range
electrostatic interactions, they use a method called k-space Gaussian
split Ewald, which reduces the computational burden from O(n^2) to
O(n log(n)). Another advantage is that the same kernel is used for force
interpolation and charge spreading, saving real estate on the chip.
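To make the "split" in Gaussian split Ewald concrete, here is a minimal sketch of the underlying identity, not the k-space Gaussian split Ewald algorithm itself. It assumes a single Gaussian width `sigma` (a name I made up for illustration) and shows that 1/r separates into a short-range piece that can be summed directly within a cutoff and a smooth long-range piece that suits a grid and FFTs, which is where the O(n log(n)) comes from.

```python
import math

def split_coulomb(r, sigma):
    """Split 1/r into a short-range and a long-range piece (illustrative only)."""
    a = 1.0 / (math.sqrt(2.0) * sigma)
    short_range = math.erfc(a * r) / r  # decays quickly: sum directly within a cutoff
    long_range = math.erf(a * r) / r    # smooth everywhere: suited to a grid plus FFT
    return short_range, long_range

for r in (0.5, 1.0, 2.0, 4.0):
    s, l = split_coulomb(r, sigma=1.0)
    print(f"r={r}: short={s:.6f} long={l:.6f} sum={s + l:.6f} 1/r={1.0 / r:.6f}")
```

The two pieces always add back up to 1/r; the choice of `sigma` only moves work between the direct sum and the grid.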
As any molecular dynamics aficionado knows, about 90% of the
computational time is spent computing non-bonded forces. With this
in mind, they’ve devoted much of the ASIC area to speeding up
that step. One of their ideas that struck me was
to explicitly compute the interactions between all pairs and then go
back and subtract correction terms. This leads to a clear separation in
the accounting of which particles are required for which forces –
a major win for scalability.
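Here is a toy sketch of the "compute everything, then subtract corrections" idea as I read it. It assumes the corrections are for excluded pairs such as bonded neighbours, which is my interpretation rather than a detail taken from the paper, and the function names and the bare Coulomb kernel are hypothetical.

```python
import itertools
import math

def pair_energy(q, pos, i, j):
    """Bare Coulomb energy between particles i and j (illustrative kernel)."""
    return q[i] * q[j] / math.dist(pos[i], pos[j])

def energy_with_branch(q, pos, excluded):
    """Conventional approach: test every pair against the exclusion list."""
    total = 0.0
    for i, j in itertools.combinations(range(len(pos)), 2):
        if (i, j) not in excluded:
            total += pair_energy(q, pos, i, j)
    return total

def energy_with_correction(q, pos, excluded):
    """Uniform all-pairs pass, followed by a separate correction pass."""
    all_pairs = sum(pair_energy(q, pos, i, j)
                    for i, j in itertools.combinations(range(len(pos)), 2))
    correction = sum(pair_energy(q, pos, i, j) for i, j in excluded)
    return all_pairs - correction

charges = [1.0, -1.0, 0.5, -0.5]
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (1.0, 1.0, 3.0)]
excluded_pairs = {(0, 1), (2, 3)}  # assumed bonded pairs, stored with i < j

print(energy_with_branch(charges, positions, excluded_pairs))
print(energy_with_correction(charges, positions, excluded_pairs))
```

The inner loop of the second version has no branch on the exclusion list, so the bulk pass stays uniform and the bookkeeping for special pairs lives in one small, separate pass.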
They get major speedups by building a specialized hardware datapath
and control logic tuned for common molecular dynamics communication
patterns. They even have dedicated support for accumulating forces
to reduce latency. I was previously excited by the addition of
the horizontal-add instruction haddps (handy for dot products) to recent
generations of processors. Anton does dot products in hardware. Dot
products can be used every time you’d want to “squish” a vector.
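As a small illustration of "squishing" a vector, here is the kind of dot product molecular dynamics leans on constantly: a squared distance for a cutoff test. This is plain Python with made-up function names, just to show the operation that Anton bakes into hardware.

```python
def dot(a, b):
    """Collapse two equal-length vectors into one scalar."""
    return sum(x * y for x, y in zip(a, b))

def within_cutoff(ri, rj, cutoff):
    """Compare squared distance to squared cutoff, avoiding a square root."""
    d = [x - y for x, y in zip(ri, rj)]
    return dot(d, d) <= cutoff * cutoff

print(dot((1, 2, 3), (4, 5, 6)))
print(within_cutoff((0.0, 0.0, 0.0), (3.0, 4.0, 0.0), 5.0))
```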
Anton does all computation in fixed point arithmetic, which is faster
and more accurate for the same silicon area (their claim). Molecular
dynamics is expensive in computation and communication, but not so
much in memory. All of their data for a 25,000-particle system fits in
L1 cache.
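One property of fixed point that I can show without guessing at Anton's actual number format: integer addition is associative, so accumulated forces do not depend on the order in which contributions arrive. The sketch below assumes a hypothetical format with 32 fractional bits and uses Python's unbounded integers in place of fixed-width hardware registers.

```python
import random

SCALE = 1 << 32  # hypothetical format: 32 fractional bits

def to_fixed(x):
    """Quantize a float to the nearest multiple of 1/SCALE, stored as an int."""
    return round(x * SCALE)

def from_fixed(n):
    """Convert a fixed-point integer back to a float."""
    return n / SCALE

random.seed(0)
# Force contributions spanning many orders of magnitude.
forces = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-6, 3)
          for _ in range(10000)]
shuffled = forces[:]
random.shuffle(shuffled)

float_a, float_b = sum(forces), sum(shuffled)
fixed_a = sum(to_fixed(f) for f in forces)
fixed_b = sum(to_fixed(f) for f in shuffled)

print("float sums identical:", float_a == float_b)  # may differ with ordering
print("fixed sums identical:", fixed_a == fixed_b)  # same integers, any order
print("fixed total as float:", from_fixed(fixed_a))
```

Each value is rounded once on conversion, and after that the sum is exact integer arithmetic, which is handy when many nodes are adding into the same force in whatever order messages show up.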
One of the courses in Engineering at SFU (ENSC 250) deals with
designing a RISC processor from the ground up. I vaguely remember
building branch prediction logic in VHDL, not really knowing if I’d
ever use that knowledge again. If only I’d known…
Anyways, I’m missing out on some of the finer details of their
technology. Go read the paper for the real deal.