Taking a Closer Look at Distributions
The normal distribution (also known as the Gaussian distribution or bell curve)
is a key concept in statistics. Much statistical inference is based on the assumption
that, at some point in your calculations, you have values that are distributed
normally. Checking whether your data follow this bell curve closely enough is
often one of the first things you do before you choose a test for your
hypothesis.
If you’re not familiar with the normal distribution, check out Statistics For
Dummies, 2nd Edition, by Deborah J. Rumsey, PhD (Wiley), which devotes a
whole chapter to this concept.
Let’s take a look at an example. The biologist and statistician Penny Reynolds
observed some beavers for a complete day and measured their body temperature
every ten minutes. She also wrote down whether the beavers were active at that
moment. You find the measurements for one of these animals in a dataset that
consists of two data frames; for the examples, we use the second one, which
contains only four variables, as you can see in the following:
'data.frame': 100 obs. of 4 variables:
$ day : num 307 307 307 307 307 ...
$ time : num 930 940 950 1000 1010 ...
$ temp : num 36.6 36.7 ...
$ activ: num 0 0 0 0 0 ...
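A listing like the one above can be produced with a call along the following lines. This is a minimal sketch: the text here doesn’t name the data frame, so the code assumes it is `beaver2` from R’s built-in datasets package, which matches the dimensions and variable names shown.

```r
# Assumption: the data frame described in the text is `beaver2`, which
# ships with R's built-in datasets package (attached by default).

# Show the structure: number of observations, variables, and their types.
str(beaver2)

# Count how many measurements were taken during rest (activ == 0)
# and during activity (activ == 1).
table(beaver2$activ)
```

The `activ` column is stored as a number, so `table()` is a quick way to confirm that it holds only the two values 0 and 1.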
If you want to know whether there’s a difference between the average body
temperature during periods of activity and periods without, you first have to choose
a test. To know which test is appropriate, you need to find out if the temperature is
distributed normally during both periods. So, let’s take a closer look at the data.
Testing normality graphically
You could, of course, plot a histogram for every sample you want to look at.
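One way to draw a histogram for each sample by hand is sketched below with base R’s `hist()`. The data frame name `beaver2` and the variable name `temp_by_activity` are assumptions made for illustration; only the columns `temp` and `activ` come from the listing above.

```r
# Assumption: `beaver2` (built-in datasets package) is the data frame
# described above; activ is 0 during rest and 1 during activity.

# Split the temperatures into one vector per activity level.
temp_by_activity <- split(beaver2$temp, beaver2$activ)

# Draw the two histograms side by side, then restore the old settings.
old_par <- par(mfrow = c(1, 2))
hist(temp_by_activity[["0"]], main = "Inactive",
     xlab = "Body temperature")
hist(temp_by_activity[["1"]], main = "Active",
     xlab = "Body temperature")
par(old_par)
```

A roughly symmetric, single-peaked histogram is consistent with a normal distribution; strong skew or several peaks suggests that the assumption may not hold.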
You can use the
function pretty easily to plot histograms for different