Archive for May, 2007

C++ Gymnastics

Posted in Computing 4 years, 8 months ago

I’m having an opportunity to play with a highly parallel machine (think >40 processors/cores) and starting to think of smart ways to make use of this power. For low level libraries, you can play all kinds of tricks with the C++ preprocessor, known as template meta programming.

I’m going to describe one that I thought of.

Modern processors have special registers for vector processing. For example, if you were asked to add four single-precision floating point values, you’d write something like this:

//lang cpp
float a[4], b[4], c[4];
for (int i=0; i<4; i++)
    c[i] = a[i] + b[i];

The SSE addps instruction will add four packed single-precision floating point values simultaneously. __m128 is the packed version of four floats.

//lang cpp
__m128 a, b, c;
c = addps (a, b);

If you had longer vectors, the logical extension of this idea is to create an array that is a multiple of four and then process the remaining elements in the normal way. In C, you’d write something like this:

//lang cpp
const int LENN = LEN / 4;
const int LENR = LEN % 4;
__m128 a[LENN], b[LENN], c[LENN];
float ar[LENR], br[LENR], cr[LENR];

for (int i=0; i<LENN; i++)
    c[i] = addps(a[i], b[i]);

for (int i=0; i<LENR; i++)
    cr[i] = ar[i] + br[i];

The problem in C is that if your program uses a vector length that is a multiple of four, the second loop is still executed. This isn’t optimal.

Using template meta programming, you’d create a template that uses the length of the vector as a template argument. What happens is that the variables LENN and LENR are determined at compile time, and the compiler will be able to optimize the second loop away (with a decent compiler anyways.)

The second advantage is that each instantiation of the template becomes a new type and gets you compile-time type safety. This makes it impossible to add vectors of different lengths (which happens all the time with void* tricks in vanilla C.)

I hope to share more tricks in the future, but that’s all for now.

Back Up Again

Posted in Web 4 years, 8 months ago

I had forgotten to renew my domain and it expired. That’s fixed now and I foresee no further problems.

Classic Math Papers

Posted in Paper 4 years, 8 months ago

I walked into a gold mine: a list of thirteen classic papers in applied math. The webpage has other key papers, but this list should keep me occupied for some time. I’m reproducing the list for my own reference:

  • J.W. Cooley and J.W. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Math. Comp., 19 (1965), pp. 297–301.
  • R. Courant, K. Friedrichs, and H. Lewy, “On the partial difference equations of mathematical physics,” IBM J. Res. Develop., 11 (1967), pp. 215–234.
  • A.S. Householder, “Unitary triangularization of a nonsymmetric matrix,” J. Assoc. Comput. Mach., 5 (1958), pp. 339–342.
  • C.F. Curtiss and J.O. Hirschfelder, “Integration of stiff equations,” Proc. Nat. Acad. Sci. U.S.A., 38 (1952), pp. 235–243.
  • C. de Boor, “On calculating with B-splines,” J. Approximation Theory, 6 (1972), pp. 50–62.
  • R. Courant, “Variational methods for the solution of problems of equilibrium and vibrations,” Bull. Amer. Math. Soc., 49 (1943), pp. 1–23.
  • G. Golub and W. Kahan, “Calculating the singular values and pseudo-inverse of a matrix,” J. Soc. Indust. Appl. Math. Ser. B Numer. Anal., 2 (1965), pp. 205–224.
  • A. Brandt, “Multi-level adaptive solutions to boundary-value problems,” Math. Comp., 31 (1977), no. 138, pp. 333–390.
  • M.R. Hestenes and E. Stiefel, “Methods of conjugate gradients for solving linear systems,” J. Research Nat. Bur. Standards, 49 (1952), pp. 409–436.
  • R. Fletcher and M.J.D. Powell, “A rapidly convergent descent method for minimization,” Comput. J., 6 (1963/1964), pp. 163–168.
  • G. Wanner, E. Hairer, and S.P. Nørsett, “Order stars and stability theorems,” BIT, 18 (1978), no. 4, pp. 475–489.
  • N. Karmarkar, “A new polynomial-time algorithm for linear programming,” Combinatorica, 4 (1984), no. 4, pp. 373–395.
  • L. Greengard and V. Rokhlin, “A fast algorithm for particle simulations,” J. Comput. Phys., 73 (1987), no. 2, pp. 325–348.