Normality, Graphs & R

R graph

It always irks me to see unsubstantiated claims such as “this is better” or “this is faster.” Better or faster than what? I’d much rather see the data and draw my own conclusions.

Similarly, graphs without error bars and sample means without standard deviations are pure evil. If you don’t report the deviation, I suspect something fishy, like a sample too small for the result to be statistically significant.

This reminds me of a physics experiment a friend conducted some time back. We were supposed to show the linear relationship between two variables. It was quite difficult to keep the other variables fixed, so after a while he gave up. His final dataset had two points. The best-fit curve through two points is a straight line, right? QED.

On Linux, the only way to create publication-quality graphs used to be gnuplot. If you’ve ever used gnuplot, you know how unintuitive and complicated it makes even simple tasks. In short, it is an example of software that is very user-friendly, but it gets to choose who its friends are [1].

I’ve recently been trying out the graphing capabilities of the R project. Just check out these graphs. The platform also supports scripting. I’ve slowly started using it for all my graphing needs.

If you’re familiar with Monte Carlo simulations, you’d know that for an equilibrated stochastic process, averages of the output values are approximately normally distributed. This follows from the Central Limit Theorem, which states that the mean of a “large” number of independent samples with finite variance is approximately normal. Of course, as with almost everything else in statistics, nobody tells you how “large” is “large.”
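To get a rough feel for how “large” is “large,” here is a minimal sketch rather than a rigorous study. It draws sample means from a strongly skewed exponential distribution and measures how far the standardized means are from a standard normal. The exponential distribution, the seed, the sample sizes, and the helper name ks_distance_of_means are all my own choices for illustration.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical helper: draw `reps` sample means, each from n exponential
# variates (mean 1, sd 1), standardize them with the known mean and
# standard error, and return the KS distance to a standard normal.
ks_distance_of_means <- function(n, reps = 5000) {
    m <- replicate(reps, mean(rexp(n, rate = 1)))
    z <- (m - 1) * sqrt(n)                  # known mean 1, standard error 1/sqrt(n)
    unname(ks.test(z, "pnorm")$statistic)   # parameters fixed in advance, not estimated
}

sizes <- c(2, 10, 100, 1000)
D <- vapply(sizes, ks_distance_of_means, numeric(1))

for (k in seq_along(sizes)) {
    cat(sprintf("n = %4d  D = %.4f\n", sizes[k], D[k]))
}
```

The distance should shrink as n grows, but how quickly depends on how skewed the underlying distribution is, which is one reason there is no universal threshold for “large.”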

For one of my projects, I wanted to use the Kolmogorov-Smirnov test to check the normality of my output. Here’s a variation of a script I wrote in R:

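S <- 10      # smallest sample size
I <- 50      # step between sample sizes
N <- 10000   # largest sample size

n <- seq(S, N, I)          # sample sizes to test
D <- numeric(length(n))    # KS D-statistic for each sample size

for (j in seq_along(n)) {
    y <- rnorm(n[j])
    # Compare the sample with a normal distribution that has
    # the sample's own mean and standard deviation
    D[j] <- ks.test(y, "pnorm", mean(y), sd(y))$statistic
}

# Step plot of the D-statistic against the sample size
plot(n, D, type = "s", col = "darkred")
title("D -vs- N")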

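This script computes the D-statistic of the KS test, the largest distance between the empirical distribution of the sample and a normal distribution with the sample’s mean and standard deviation, for increasing sample sizes. You can see the value dropping toward zero as the number of samples increases.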
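To make the D-statistic less of a black box, here is a small sketch that computes it by hand for a single sample and compares it with what ks.test reports. The seed, the sample size of 500, and the variable names are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

n <- 500
y <- rnorm(n)

# Normal CDF evaluated at the sorted sample, using the sample's
# own mean and sd as in the script above
Fx <- pnorm(sort(y), mean(y), sd(y))

# The empirical CDF jumps from (i-1)/n to i/n at the i-th sorted point,
# so the gap has to be checked on both sides of each jump
i <- seq_len(n)
D_manual <- max(i / n - Fx, Fx - (i - 1) / n)

D_ks <- unname(ks.test(y, "pnorm", mean(y), sd(y))$statistic)

cat("manual D: ", D_manual, "\n")
cat("ks.test D:", D_ks, "\n")
```

One caveat: the R documentation for ks.test notes that the parameters of the reference distribution must be specified in advance and not estimated from the data. D is still a perfectly good distance to plot, but the p-value ks.test reports alongside it should not be trusted when mean(y) and sd(y) are plugged in like this.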

Kolmogorov-Smirnov Test


  1. On the other hand, gnuplot is awesome for scripting. I've used it a lot to plot data of the same class.
