
It always irks me to see unsubstantiated claims such as “this is
better,” or “this is faster.” Better or faster than what? I’d much
rather see the data and draw my own conclusions.
Similarly, graphs without error bars, or sample means without standard
deviations are pure evil. If you don’t report the deviation, I suspect
something fishy - like your samples being statistically insignificant.
This reminds me of a physics experiment a friend conducted some time
back. We were supposed to show the linear relationship between two
variables. It was quite difficult to keep the other variables fixed,
so after a while he gave up. His final dataset had two points. The
best fit curve through two points is a straight line, right? QED.
On Linux, the only way to create publication quality graphs was by
using gnuplot. If you’ve ever used gnuplot, you’d know how
unintuitive and complicated it is to accomplish even simple tasks. In
short, it is an example of software that is very user-friendly, but it
gets to choose who its friends are.
I’ve recently been trying out the graphing capabilities from the R
project. Just check out these graphs. The platform also
supports scripting. I’ve slowly started using this program for all my
graphing needs.
If you’re familiar with Monte Carlo simulations, you’d know that
for an equilibrated stochastic process, averages of the output
values are approximately normally distributed. This follows from the
Central Limit Theorem, which states that for a “large” number of
independent samples, the distribution of their mean approaches a
normal distribution. Of course, as with almost everything else in
statistics, nobody tells you how “large” is “large.”
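
To get a feel for how “large” has to be, here is a minimal sketch
(not from my project) that averages draws from a deliberately skewed
distribution and watches the skew fade. The exponential distribution,
the seed, the helper names sample_means and skewness, and the
replicate count are all my own assumptions for illustration.

```r
# Sketch: means of draws from a skewed (exponential) distribution
set.seed(42)  # arbitrary seed, only for reproducibility

sample_means <- function(n, reps = 2000) {
    # Each replicate is the mean of n exponential draws (rate 1)
    replicate(reps, mean(rexp(n)))
}

# Sample skewness: close to 0 for a symmetric (e.g. normal) distribution
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

for (n in c(2, 10, 100, 1000)) {
    m <- sample_means(n)
    cat(sprintf("n = %4d  skewness of means = %.3f\n", n, skewness(m)))
}
```

The skewness of the means should shrink as n grows, which is the
Central Limit Theorem at work. How quickly it shrinks depends on the
distribution you start from.
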
For one of my projects, I wanted to use the Kolmogorov-Smirnov
test to check the normality of my output. Here’s a variation of a
script I wrote in R:
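# Sample sizes: from S up to N in steps of I
S <- 10
I <- 50
N <- 10000
n <- seq(S, N, I)
# One D-statistic per sample size
d <- numeric(length(n))
for (j in seq_along(n)) {
    y <- rnorm(n[j])
    # KS test against a normal with the sample's own mean and sd
    d[j] <- ks.test(y, "pnorm", mean(y), sd(y))$statistic
}
# Step plot of the D-statistic against the sample size
plot(n, d, type = "s", col = "darkred")
title("D -vs- N")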
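What this script does is compute the D-statistic of the KS test, the
largest gap between the sample’s empirical distribution and a fitted
normal distribution, for increasing sample sizes. You can see the
value dropping toward zero as the number of samples increases.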
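
If the D-statistic feels like a black box, it can be computed by hand.
The sketch below is only an illustration under my own assumptions (the
seed, the sample size of 200, and the names d_manual and d_ks are
arbitrary); it builds the largest gap between the empirical CDF and
the fitted normal CDF, then compares it with what ks.test reports.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
y <- rnorm(200)
n <- length(y)
ys <- sort(y)

# Fitted normal CDF evaluated at the sorted sample points
f <- pnorm(ys, mean(y), sd(y))

# The empirical CDF jumps from (i-1)/n to i/n at the i-th sorted point,
# so the gap has to be checked on both sides of every jump
d_manual <- max(pmax((1:n) / n - f, f - (0:(n - 1)) / n))

# The same statistic as reported by ks.test
d_ks <- unname(ks.test(y, "pnorm", mean(y), sd(y))$statistic)

cat("manual D:", d_manual, " ks.test D:", d_ks, "\n")
```

The two numbers should agree. One caveat: because the mean and
standard deviation are estimated from the same sample, the p-value
from ks.test is not reliable here, which is one reason to look at D
itself.
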
