Archive for December, 2006

Emacs, Xhtml, Javascript

Posted in Web 5 years, 1 month ago

One of my projects for next year has Javascript and related technologies as a strong contender (the others being C++ and Java). A big part of the satisfaction that comes with programming has to come from the environment used for design and coding. Emacs does that for me.

For modifying the code behind this website, I use nxml-mode, css-mode, ecmascript-mode and php-mode. For embedded php, ecmascript or css code, I use html-inlined. I’m also going to try integrating Mozilla and emacs via MozRepl. The supposed benefit is never having to leave emacs to evaluate a chunk of javascript.

Why emacs? First, I don’t really know any other tool that well. The big advantage for me is how trivial it is to integrate different tools within the same program. I can transfer the same set of skills from writing papers, coding in C, lisp or assembly, or (gasp) playing Tetris. Moreover, if a feature isn’t available, it’s trivial to add it (if you know elisp). Now that I have my environment more or less set up, I’ll have to actually go and learn javascript…

BTW, emacs 22 is in pre-test. I built a CVS copy of emacs last week and it has been super-stable for me thus far.

Normality, Graphs & R

Posted in Computing 5 years, 1 month ago

[Figure: R graph]

It always irks me to see unsubstantiated claims such as “this is better,” or “this is faster.” Better or faster than what? I’d much rather see the data and draw my own conclusions.

Similarly, graphs without error bars, or sample means without standard deviations, are pure evil. If you don’t report the deviation, I suspect something fishy, like your results being statistically insignificant.
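As a rough illustration of reporting the spread along with the mean, here is a minimal base R sketch. The three groups and their values are made up for the example; `arrows()` with flat heads is one common way to draw error bars, not the only one.

```r
# Hypothetical data: three groups of 20 measurements each (invented for illustration)
set.seed(1)
groups <- list(a = rnorm(20, 10, 2), b = rnorm(20, 12, 2), c = rnorm(20, 15, 2))

m  <- sapply(groups, mean)               # sample means
s  <- sapply(groups, sd)                 # sample standard deviations
se <- s / sqrt(sapply(groups, length))   # standard errors of the means

x <- seq_along(m)
plot(x, m, ylim = range(m - s, m + s), xaxt = "n", pch = 19,
     xlab = "group", ylab = "mean +/- 1 sd")
axis(1, at = x, labels = names(m))
# Error bars: arrows with flat heads at both ends
arrows(x, m - s, x, m + s, angle = 90, code = 3, length = 0.05)

# Report the numbers too, not just the picture
print(data.frame(mean = m, sd = s, se = se))
```

Whether the bars show the standard deviation or the standard error should be stated on the plot; the two answer different questions.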

This reminds me of a physics experiment a friend conducted sometime back. We were supposed to show the linear relationship between two variables. It was quite difficult to keep the other variables fixed, so after a while he gave up. His final dataset had two points. The best fit curve through two points is a straight line, right? QED.

On Linux, the only way to create publication-quality graphs was by using gnuplot. If you’ve ever used gnuplot, you’d know how unintuitive and complicated it is to accomplish even simple tasks. In short, it is an example of software that is very user-friendly, but it gets to choose who its friends are. (On the other hand, gnuplot is awesome for scripting. I’ve used it a lot to plot data of the same class.)

I’ve recently been trying out the graphing capabilities from the R project. Just check out these graphs. The platform also supports scripting. I’ve slowly started using this program for all my graphing needs.

If you’re familiar with Monte Carlo simulations, you’d know that for an equilibrated stochastic process, averages of the output values are approximately normally distributed. This follows from the Central Limit Theorem, which states that for a “large” number of independent samples with finite variance, the distribution of their mean approaches a normal distribution. Of course, as with almost everything else in statistics, nobody tells you how “large” is “large.”
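One way to get a feel for how “large” is “large” is to try it. The sketch below is only an illustration, under the assumption that the raw values come from an exponential distribution (chosen because it is clearly skewed); the helper name `sample_means` is made up for this example. It prints the spread and skewness of the sample mean for a few sample sizes; skewness is zero for a normal distribution.

```r
set.seed(42)

# Mean of n draws from a skewed (exponential) distribution, repeated 'reps' times
sample_means <- function(n, reps = 5000) {
    replicate(reps, mean(rexp(n, rate = 1)))
}

for (n in c(1, 5, 30, 200)) {
    m <- sample_means(n)
    # Sample skewness of the means: should shrink toward zero as n grows
    skew <- mean((m - mean(m))^3) / sd(m)^3
    cat(sprintf("n = %3d  sd of mean = %.3f  skewness = %.3f\n", n, sd(m), skew))
}
```

The standard deviation of the mean should fall roughly like 1/sqrt(n), while the skewness shows how quickly the shape becomes symmetric.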

For one of my projects, I wanted to use the Kolmogorov-Smirnov test to check the normality of my output. Here’s a variation of a script I wrote in R:

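S <- 10      # smallest sample size
I <- 50      # step between sample sizes
N <- 10000   # largest sample size

v <- seq(S, N, I)         # sample sizes to test
p <- numeric(length(v))   # KS D-statistic for each sample size

for (j in seq_along(v)) {
    y <- rnorm(v[j])
    # D-statistic against a normal with the sample's own mean and sd
    p[j] <- ks.test(y, "pnorm", mean(y), sd(y))$statistic
}

plot(v, p, type = "s", col = "darkred")
title("D -vs- N")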

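For each sample size N, this script draws N normal random numbers and computes the KS D-statistic of that sample against a normal distribution with the sample’s own mean and standard deviation. You can see the value dropping toward zero as the number of samples increases.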

[Figure: Kolmogorov-Smirnov Test]
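The drop in D can be made a little more quantitative. The sketch below is an illustration only: it averages D over repeated normal samples (the sizes and the repeat count are arbitrary choices) and checks whether D shrinks roughly like 1/sqrt(N). One caveat: `ks.test` assumes the parameters of the reference distribution are fixed in advance, so when the mean and sd are estimated from the same sample, as in the script above, the p-values it reports are not reliable. The D values themselves are still fine to plot.

```r
set.seed(7)
sizes <- c(100, 1000, 10000)   # sample sizes to compare (arbitrary)
reps  <- 200                   # repeats per sample size (arbitrary)

# Average D over 'reps' normal samples of size n, fitting mean and sd from each sample
mean_D <- sapply(sizes, function(n) {
    mean(replicate(reps, {
        y <- rnorm(n)
        ks.test(y, "pnorm", mean(y), sd(y))$statistic
    }))
})

# If D shrinks like 1/sqrt(N), the 'scaled' column should stay roughly constant
print(data.frame(N = sizes, mean_D = mean_D, scaled = sqrt(sizes) * mean_D))
```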

Zipf & Supercomputers

Posted in Computing 5 years, 1 month ago

Matei Ripeanu has a short note in IEEE Distributed Systems Online comparing the performance of supercomputers from the TOP500 list to a power law.

Interesting notes by Ian Foster:

  • Plot of performance versus rank is a power law.
  • The power law coefficient is getting smaller with time. Ian says that “bigger machines are getting faster more slowly than slower machines.” It also means that the gap between the first and last ranks on the list is shrinking.
  • Top rankers are more likely to buy faster machines to maintain their ranks. The act of measuring performance evolution changes that very evolution (insert smart QM remark here…)
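To make “performance versus rank is a power law” concrete: a power law is a straight line on log-log axes, so its exponent can be estimated with a linear fit of log(performance) against log(rank). The sketch below uses synthetic data, not real TOP500 numbers; the exponent 0.8, the constant 1000 and the noise level are assumptions picked only for illustration, and least squares on logs is a simple approach rather than necessarily the method used in the note.

```r
set.seed(3)

# Synthetic data, NOT real TOP500 numbers: perf = C * rank^(-alpha) with noise
rnk <- 1:500
alpha_true <- 0.8   # assumed exponent, chosen only for illustration
perf <- 1000 * rnk^(-alpha_true) * exp(rnorm(length(rnk), sd = 0.1))

# On log-log axes: log(perf) = log(C) - alpha * log(rank)
fit <- lm(log(perf) ~ log(rnk))
alpha_hat <- -unname(coef(fit)[2])
cat(sprintf("estimated exponent: %.3f\n", alpha_hat))

plot(rnk, perf, log = "xy", pch = ".", xlab = "rank", ylab = "performance")
lines(rnk, exp(fitted(fit)), col = "darkred")
```

Repeating such a fit on each edition of a list and tracking the estimated exponent over time is one way to see whether the gap between the top and bottom ranks is shrinking.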