Archive for the 'Computing' Category

Fast Determinant Calculation

Posted in Computing 5 days, 10 hours ago

Someone came to me recently for some help on computing determinants. Determinants of 2×2 and 3×3 and maybe even 4×4 matrices are trivial and you can easily hard code the expanded symbolic form. Determinants of bigger matrices are much harder and require careful thought.

First some rules:

  • Singular matrices have determinant zero. Simple, but often overlooked.
  • If you require a spectral decomposition at a later stage, the easiest is the product of the eigenvalues (the matrix cookbook is a handy reference.)
  • For triangular matrices, it’s the product of the diagonal entries.
  • The determinant of the product of two matrices is the product of their determinants.

Using the above rules, the first step is to factor your matrix into two triangular matrices. The LU decomposition achieves this. In general, a permutation matrix allows all ones on the diagonal of L. If you only care about the absolute value of the determinant, simply computing the product of the diagonal values of U gets you the determinant.

If your matrix is symmetric and positive definite, then you can use a specialization known as Cholesky decomposition which makes L the transpose of U (doing half the work with half the storage.) The determinant is then the product of the square of the diagonal elements. Diffusion tensors are covariance matrices (symmetric, positive definite) and so this decomposition always applies.

Modern Massive Data Sets

Posted in Computing 2 months ago

I’m excited about the Workshop on Modern Massive Data Sets I’ll be attending next week at Stanford.

The 2008 Workshop on Algorithms for Modern Massive Data Sets (MMDS 2008) will address algorithmic, mathematical, and statistical challenges in modern statistical data analysis. The goals of MMDS 2008 are to explore novel techniques for modeling and analyzing massive, high-dimensional, and nonlinearly-structured scientific and internet data sets, and to bring together computer scientists, statisticians, mathematicians, and data analysis practitioners to promote cross-fertilization of ideas.

These are my notes (disclaimer: I’m no expert, corrections are welcome.)

Most algorithm engineers thus far are happy with algorithms that are linear, i.e., \mathcal{O}(n) in the number of data points in the dataset. With web-scale data, linear is not good enough: sub-linear is required. These streams of data are typically unbounded and do not fit in main memory. They cannot even be stored, as it is infeasible to go back and reload them.

The Frequency Problem is used as a model problem over data streams, for example, computing means/variances, medians, top-k frequent items, distinct elements etc. One approach is to approximate the computation over the stream via random sampling.

There are four topics that keep recurring when looking at proofs of streaming algorithms using randomization:

This is really key. As an example, suppose we wanted to run some complicated algorithm (offline) over the packets flowing through a router. As this would a huge number of packets, we won’t have the luxury of storing all packets, but only a subset. By sampling the packets with probability p, we can estimate the amount of memory required to store the packets as the expectation value of the binomial variable with parameter p and n, where n is the number of packets going through the router.

The use of one of the bounds over the other is a matter of how tight the bound is. The Markov inequality only uses the expectation value of the random variable while the other use higher moments. They do this by using the Markov inequality to a moment generating function exp(tX) of the random variable X.

You’ll want to check the preliminary program for the kind of topics that are going to be covered. While you are at it, check out the workshop webpage from 2006. I’ll go find something to wipe the drool off my chin…

The SVD

Posted in Computing 6 months, 2 weeks ago

A couple of months back, Prof. Gilbert Strang had come to SFU as part of the distinguished speakers series (as I had written earlier.) After the talk, I got a chance to chat with the guru. I owe a lot of my understanding of linear algebra to his books and his kick-ass animations (check out these eigen-analysis demos,) and so I asked him about the single most important topic in linear algebra. Without hesitation, he immediately responded: “the SVD!”

The Singular Value Decomposition is the swiss-knife of linear algebra. Every matrix Y can be factored into three matrices: U, S, and V as

Y = U S V^t

U and V are orthogonal matrices and S is a diagonal matrix. Some uses of this factorization of the matrix Y: calculating 2-norms, Frobenius norms, ranks, null spaces and ranges, (pseudo)inverses and determinants, and by extension solving systems of equations (exact, over- and under- determined), eigenvalues and eigenvectors, and approximations.

Almost every problem in engineering becomes an optimization problem (as in reducing the error under some norm) and the method of least squares makes extensive use of the SVD.

The stage has been set for the next few posts of mine.

Fitting data with Python

Posted in Computing, Physics 6 months, 2 weeks ago

I’ve recently become a heavy user of the numerical capabilities of Python. I’ve written about my experiments before, but now I’m writing production quality code with numpy and matplotlib.

Mobility Temperature Plot

The above is an actual plot that I created for some Hall measurements I was doing. I was supposed for find functional relationships between temperature and majority charge carriers, which in my case were electrons because of the n-type doping. The simple case was a least squares fit: scipy.optimize.leastsq to the rescue. The more complicated part was solving a non-linear equation for roots and then doing a least squares fit. The root-finding module in scientific python provides lots of options. At this point, I can confidently say that this environment has more features than Octave.

Just today, I wanted to use the Fourier method on a differential equation (plug: the advantages of which are here) and numerical python with fft, fftshift and fftfreq are exact substitutes for their Matlab equivalents. You can also put actual LaTeX equations on plots, which is a major plus.

That is all.

Unit Tests in C

Posted in Computing 6 months, 2 weeks ago

I strongly think that unit tests are absolutely necessary to keep the overall quality of ones code high. I’ve written about Unit Tests before, so I’ll continue on that trend.

Most high performance numerical codes use some variant of Fortran or just raw C (which is my staple language.) In that case, as Micheal Feathers rightly points out, unit tests can “collide” with C.

I’ll want to expand on his solution of link seams.

GNU/Linux (and other Unix-like operating systems I think), provide a mechanism of loading shared object files at runtime. Using dlopen(3) and dlsym(3) you can re-create much of the reflection functionality that the Java community enjoys. So by loading your unit test dynamically at runtime, and having a predetermined symbol that behaves as the entry point, you can nicely wrap your module/function with a unit test. Interface this with ctest from the cmake project, and you are set. If you want to have a private function whose symbol isn’t visible on the outside, make it static. The nm utility will give you a list of public symbols in an object.

I’m pretty sure this is supported across the board. I remember doing this under DOS using DJGPP (remember that?) so this isn’t exactly novel.