Archive for May, 2007

C++ Gymnastics

Posted in Computing 1 year, 2 months ago

I’m having an opportunity to play with a highly parallel machine (think >40 processors/cores) and starting to think of smart ways to make use of this power. For low level libraries, you can play all kinds of tricks with the C++ preprocessor, known as template meta programming.

I’m going to describe one that I thought of.

Modern processors have special registers for vector processing. For example, if you were asked to add four single-precision floating point values, you’d write something like this:

float a[4], b[4], c[4];
for (int i=0; i<4; i++)
    c[i] = a[i] + b[i];

The SSE addps instruction will add four packed single-precision floating point values simultaneously. __m128 is the packed version of four floats.

__m128 a, b, c;
c = addps (a, b);

If you had longer vectors, the logical extension of this idea is to create an array that is a multiple of four and then process the remaining elements in the normal way. In C, you’d write something like this:

const int LENN = LEN / 4;
const int LENR = LEN % 4;
__m128 a[LENN], b[LENN], c[LENN];
float ar[LENR], br[LENR], cr[LENR];

for (int i=0; i<LENN; i++)
    c[i] = addps(a[i], b[i]);

for (int i=0; i<LENR; i++)
    cr[i] = ar[i] + br[i];

The problem in C is that if your program uses a vector length that is a multiple of four, the second loop is still executed. This isn’t optimal.

Using template meta programming, you’d create a template that uses the length of the vector as a template argument. What happens is that the variables LENN and LENR are determined at compile time, and the compiler will be able to optimize the second loop away (with a decent compiler anyways.)

The second advantage is that each instantiation of the template becomes a new type and gets you compile-time type safety. This makes it impossible to add vectors of different lengths (which happens all the time with void* tricks in vanilla C.)

I hope to share more tricks in the future, but that’s all for now.

Back Up Again

Posted in Web 1 year, 2 months ago

I had forgotten to renew my domain and it expired. That’s fixed now and I foresee no further problems.

Classic Math Papers

Posted in Paper 1 year, 2 months ago

I walked into a gold mine: a list of thirteen classic papers in applied math. The webpage has other key papers, but this list should keep me occupied for some time. I’m reproducing the list for my own reference:

  • J.W. Cooley and J.W. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Math. Comp., 19 (1965), pp. 297–301.
  • R. Courant, K. Friedrichs, and H. Lewy, “On the partial difference equations of mathematical physics,” IBM J. Res. Develop., 11 (1967), pp. 215–234.
  • A.S. Householder, “Unitary triangularization of a nonsymmetric matrix,” J. Assoc. Comput. Mach., 5 (1958), pp. 339–342.
  • C.F. Curtiss and J.O. Hirschfelder, “Integration of stiff equations,” Proc. Nat. Acad. Sci. U.S.A., 38 (1952), pp. 235–243.
  • C. de Boor, “On calculating with B-splines,” J. Approximation Theory, 6 (1972), pp. 50–62.
  • R. Courant, “Variational methods for the solution of problems of equilibrium and vibrations,” Bull. Amer. Math. Soc., 49 (1943), pp. 1–23.
  • G. Golub and W. Kahan, “Calculating the singular values and pseudo-inverse of a matrix,” J. Soc. Indust. Appl. Math. Ser. B Numer. Anal., 2 (1965), pp. 205–224.
  • A. Brandt, “Multi-level adaptive solutions to boundary-value problems,” Math. Comp., 31 (1977), no. 138, pp. 333–390.
  • M.R. Hestenes and E. Stiefel, “Methods of conjugate gradients for solving linear systems,” J. Research Nat. Bur. Standards, 49 (1952), pp. 409–436.
  • R. Fletcher and M.J.D. Powell, “A rapidly convergent descent method for minimization,” Comput. J., 6 (1963/1964), pp. 163–168.
  • G. Wanner, E. Hairer, and S.P. Nørsett, “Order stars and stability theorems,” BIT, 18 (1978), no. 4, pp. 475–489.
  • N. Karmarkar, “A new polynomial-time algorithm for linear programming,” Combinatorica, 4 (1984), no. 4, pp. 373–395.
  • L. Greengard and V. Rokhlin, “A fast algorithm for particle simulations,” J. Comput. Phys., 73 (1987), no. 2, pp. 325–348.

Time Lines

Posted in Design 1 year, 3 months ago

What’s the best way to show a time series? A static plot only goes so far, and it’s actually very bad for presenting long timelines. Fortunately, designers have come up with creative uses of rich platforms such as Flash for interactive interfaces. All of these interfaces enable “zooming” to capture the level of detail.

Google Analytics

Google Analytics

The new Google Analytics platform uses elements from Google Finance. You can click and drag a “window” to focus on a period of interest.

British History Timeline

BBC history timeline

BBC recently released an interactive Flash program to explore British history from the Neolithic (nothing much to see here) ages to today. This is very useful to our history discussions we have often in the lab.

Google Timeline View

Google Timeline

At a conference last week, Google releases a couple of new interfaces to its engine. The timeline view plots the frequency of occurrence of search keywords. (I was talking to a friend about nanotechnology last week, and used this feature to show him when the hype peaked.)

Amazon Recommendations

Posted in Computing 1 year, 3 months ago

The principle axes transform is a rigid registration technique for images. By taking the Singular Value Decomposition of the correlation matrix of fiduciary markers, the parameters of the transformation can be determined (this corresponds to three orientation and three translation in three dimensions.)

In information retrieval, this is known as latent semantic analysis. For example, if we have data linking customers and items they’ve bought, we can build a matrix with this information. Row-rank approximations of this matrix can be used to cluster customers. For data the size that Amazon has, the matrix is extremely sparse. For obvious reasons, this method doesn’t scale.

How does Amazon do it then? I didn’t know if the recommendations were done in real-time or updated offline like Google does with PageRank. That’s when some research took me to this publication: Amazon.com recommendations: item-to-item collaborative filtering. The publication has a decent review of existing methods: collaborative filtering, cluster models, and search-based methods and the reasons why they don’t scale for Amazon. More importantly, they aren’t fine-grained enough to recommend relevant items.

Amazon’s approach is a little different. Instead of grouping a user to a cluster of existing customers, they cluster items instead. For the details, read the paper (it also has pseudo-code.) This might sound obvious now, but it was pretty novel ten years back (a patent from 1998 by Amazon precedes the publication.) The algorithm scales independently of the number of customers and number of items in the product catalog. The computations of similar items is still expensive and done offline, but the retrieval can be done in real-time with high quality.