
How to avoid negative persons

The Green Cow, June 30, 2013

This month we start with a well-known joke about mathematicians. A physicist, a biologist and a mathematician are observing a bar. First, they see two people go in. After a while, three people come out of the bar. "That is a measurement error," says the physicist. "They multiplied," says the biologist. "If one more person goes in, the bar will be empty," says the mathematician.[1] A wise old man said that this was the proof that negative persons do exist. But do they? We will take a look at the number of people who visited The Green Cow each day over the past month.

Some calculations
The number of visitors for each day of the past month is listed in table 1. The average number of visitors per day was 1.5. As most of you will know, the average is calculated by adding all the numbers and dividing the sum by the number of observations. So on any given day, I can expect one or two visitors. Of course, this does not mean that there will be one or two visitors every day. There may be zero visitors, or three, or even more. Statisticians found a measure to express how big the spread of the number of visitors is. When I look at the number of visitors this evening, there is a fair chance that there will be three registered visitors, but there will certainly not be a thousand. I can estimate how big the chance is that I will have at least three visitors, given that the average is 1.5. For this I can use something called the standard deviation (abbreviated as SD). Normally, this thing is calculated as follows:

$$\mathrm{SD} = \sqrt{E\left[(X - \mu)^2\right]} \qquad (1)$$

That looks terrible, doesn't it? μ is a standard symbol for the mean, which is 1.5 in our case. X is the symbol for the data; here it is the number of visitors each day. So X - μ is calculated by subtracting 1.5 from each daily count (0, 1, 0, 11, etc.), which results in 30 new numbers. Most of you will recognize the ²: it stands for squaring, so (X - μ)² is calculated by multiplying (X - μ) by itself. For May 31, this is (-1.5) × (-1.5) = 2.25.[2] The E means that we take the average of (X - μ)². So we add the values of (X - μ)² for all days and divide the result by 30. Then we take the square root of the final result. This is the standard deviation. It turns out to be 2.6.

Very nice, this standard deviation, but what can we do with it? Well, a standard deviation is used to estimate the chance of getting a certain value for your variable of interest. Here I wanted to know the chance of registering a certain number of visitors to The Green Cow on one day. Normally it is assumed that 68% of the days will have a number of visitors between μ - SD and μ + SD (that is, between 1.5 - 2.6 = -1.1 and 1.5 + 2.6 = 4.1). Further, it is assumed that 95%[3] of the days will have a number of visitors between μ - 1.96 × SD and μ + 1.96 × SD (that is, between 1.5 - 5.1 and 1.5 + 5.1). So on 95% of the days I will see between -3.6 and 6.6 daily visitors. There you go: a negative number of visitors. You will all realize that it is impossible to have a negative number of visitors; the lowest possible number is 0. Therefore the only explanation for the -3.6 is that I have negative persons visiting my website.[4]

[1] If you don't get the joke, remember that 2 - 3 + 1 = 0.
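For readers who want to check the arithmetic, here is a minimal Python sketch of the steps above, assuming the daily counts from table 1. The helper names `average` and `standard_deviation` are hypothetical, made up for this illustration. The standard deviation divides by 30, as in the text, which is the same quantity Python's `statistics.pstdev` computes. The last lines apply the 68% and 95% rules exactly as described, including their built-in assumption that the data are normally distributed.

```python
import math

# Daily visitor counts copied from table 1 (May 31 to June 29, 30 days).
counts = [0, 1, 0, 11, 0, 0, 0, 2, 2, 8, 5, 5, 0, 0, 0,
          0, 0, 0, 0, 4, 1, 0, 2, 0, 0, 1, 1, 2, 0, 1]


def average(xs):
    """Add all the numbers and divide by the number of observations."""
    return sum(xs) / len(xs)


def standard_deviation(xs):
    """Formula (1): square root of the average of (X - mu)^2, dividing by n."""
    mu = average(xs)
    squared_deviations = [(x - mu) ** 2 for x in xs]
    return math.sqrt(average(squared_deviations))


mu = average(counts)
sd = standard_deviation(counts)

# The rules of thumb from the text: mu +/- SD and mu +/- 1.96 * SD.
# They only describe the data if the data are normally distributed.
interval_68 = (mu - sd, mu + sd)
interval_95 = (mu - 1.96 * sd, mu + 1.96 * sd)

print(f"mean = {mu:.2f}, SD = {sd:.2f}")
print(f"68% interval: {interval_68[0]:.1f} to {interval_68[1]:.1f}")
print(f"95% interval: {interval_95[0]:.1f} to {interval_95[1]:.1f}")
```

Without rounding, the mean is 46/30 ≈ 1.53 and the SD is about 2.59, so the lower 95% bound comes out at about -3.5 instead of the -3.6 you get from the rounded values 1.5 and 2.6. Either way, it is negative.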

Spoilers
There is an alternative explanation for the -3.6 visitors: I made a far too common statistical error. I intentionally wrote "Normally, this thing is calculated as follows" and "Normally it is assumed that 68%...". The error is that these rules only hold for normally distributed data, that is, data that look like figure 1. The standard deviation itself can be calculated for any data, but the 68% and 95% percentages only apply to the normal distribution. In figure 2 you can see that the data on the number of visitors do not look like normal data at all. Therefore, I used the standard deviation in the wrong way and ended up with negative persons.[5]
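One way to see that the rules of thumb do not fit these data is to count how many days actually fall inside the two intervals and to compare the mean with the median. The sketch below does that with Python's standard `statistics` module, again assuming the counts from table 1. The helper `share_within` is a hypothetical name, and the moment-based skewness is a common textbook measure chosen here for illustration; it is not part of the original post.

```python
import statistics

# Daily visitor counts copied from table 1 (May 31 to June 29, 30 days).
counts = [0, 1, 0, 11, 0, 0, 0, 2, 2, 8, 5, 5, 0, 0, 0,
          0, 0, 0, 0, 4, 1, 0, 2, 0, 0, 1, 1, 2, 0, 1]

mu = statistics.fmean(counts)
sd = statistics.pstdev(counts)  # population SD, the same quantity as formula (1)


def share_within(xs, low, high):
    """Fraction of observations x with low <= x <= high."""
    return sum(low <= x <= high for x in xs) / len(xs)


share_68 = share_within(counts, mu - sd, mu + sd)
share_95 = share_within(counts, mu - 1.96 * sd, mu + 1.96 * sd)

# For normally distributed data the mean equals the median and the
# moment-based skewness is 0; a large positive value means a long right tail.
median = statistics.median(counts)
skewness = statistics.fmean([(x - mu) ** 3 for x in counts]) / sd ** 3

print(f"within mu +/- SD:      {share_68:.0%} of days (normal rule: 68%)")
print(f"within mu +/- 1.96 SD: {share_95:.0%} of days (normal rule: 95%)")
print(f"mean = {mu:.2f}, median = {median}, skewness = {skewness:.2f}")
```

On the table 1 data, 26 of the 30 days fall within μ ± SD rather than 68% of them, the median is 0 while the mean is about 1.53, and the skewness is clearly positive. The 95% interval happens to contain 28 of the 30 days, but it still stretches into impossible negative counts.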

[2] You can calculate it for the other days from the numbers in table 1 if you like.
[3] Click …
[4] Remember from high school that plus times minus is minus, so here we have a positive number times a negative person to explain the -3.6.
[5] However, I would not mind a few more visitors to The Green Cow, so if you enjoy this blog, please spread the word.

Day       Count   X - μ
May 31        0    -1.5
June 1        1    -0.5
June 2        0    -1.5
June 3       11     9.5
June 4        0    -1.5
June 5        0    -1.5
June 6        0    -1.5
June 7        2     0.5
June 8        2     0.5
June 9        8     6.5
June 10       5     3.5
June 11       5     3.5
June 12       0    -1.5
June 13       0    -1.5
June 14       0    -1.5
June 15       0    -1.5
June 16       0    -1.5
June 17       0    -1.5
June 18       0    -1.5
June 19       4     2.5
June 20       1    -0.5
June 21       0    -1.5
June 22       2     0.5
June 23       0    -1.5
June 24       0    -1.5
June 25       1    -0.5
June 26       1    -0.5
June 27       2     0.5
June 28       0    -1.5
June 29       1    -0.5

Table 1: Visitors of The Green Cow (X - μ uses μ = 1.5)

Figure 1: What normally distributed data look like

Figure 2: What the data on the number of visitors look like
