Read M&M 5.1, but ignore two starred parts at the end. Read M&M 5.2.
Skip M&M 5.3. Sampling distributions of counts, proportions and averages.
Binomial distribution. Normal approximations.
where the last sum in each line runs over all outcomes in S. The standard deviation $\sigma_X$
is the square root of the variance.
Facts
For constants $\alpha$ and $\beta$ and random variables X and Y:
$\mu_{X+Y} = \mu_X + \mu_Y$,
$\mu_{\alpha+\beta X} = \alpha + \beta\mu_X$,
$\sigma^2_{\alpha+\beta X} = \beta^2\sigma^2_X$.
Statistics 101–106 Lecture 5 (6 October 98)
© David Pollard

Standard deviations provide one measure of spread for the distribution of a random variable.
If we add together several random variables the spread in the distribution increases, in
general. For independent summands the increase is not as large as you might imagine:
it is not just a matter of adding together standard deviations. The key result is:
$\sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y$ if X and Y are independent random variables
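The facts about means and variances can be checked exactly on a small example. The sketch below is only an illustration: the choice of a fair die for X, a fair coin for Y, and the constants 3 and -2 are assumptions made here, not part of the lecture. It shows that variances add for independent summands while standard deviations do not.

```python
from fractions import Fraction
from itertools import product
from math import sqrt


def mean(dist):
    """Mean of a distribution stored as {value: probability}."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Variance of a distribution stored as {value: probability}."""
    m = mean(dist)
    return sum((x - m) ** 2 * p for x, p in dist.items())


def independent_sum(dist_x, dist_y):
    """Distribution of X + Y when X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        out[x + y] = out.get(x + y, 0) + px * py
    return out


# Illustrative choices (assumptions): X is a fair die, Y is a fair 0/1 coin.
die = {face: Fraction(1, 6) for face in range(1, 7)}
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
total = independent_sum(die, coin)

# Illustrative constants (assumptions) for the rule about alpha + beta * X.
alpha, beta = 3, -2
scaled = {alpha + beta * x: p for x, p in die.items()}

print("var(X), var(Y), var(X+Y):", variance(die), variance(coin), variance(total))
print("var(alpha + beta X):", variance(scaled))

# Standard deviations do not simply add, even though the variances do.
sd_total = sqrt(variance(total))
sd_naive = sqrt(variance(die)) + sqrt(variance(coin))
print("sd(X+Y) =", sd_total, " sd(X) + sd(Y) =", sd_naive)
```

Exact fractions are used so that the equalities hold without rounding error; only the final square roots are floating point.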
[Figure: plot with axes labelled "probs" and "Density".]
Suppose a coin has probability p of landing heads on any particular toss. Let X denote
the number of heads obtained from n independent tosses. The random variable X can
take values 0, 1, 2, ..., n. It is possible (see M&M pages 387–389) to show that
$$P\{X = k\} = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots 1}\, p^k (1-p)^{n-k} \qquad \text{for } k = 0, 1, 2, \dots, n.$$
Such a random variable is said to have a Binomial distribution, with parameters n
and p, or Bin(n, p) for short. (M&M use the abbreviation B(n, p).)
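The formula can be turned into a few lines of code. The sketch below is one possible way to do it; the function name `binomial_pmf` and the example values n = 10, p = 0.3 are assumptions chosen for illustration. It builds the ratio n(n-1)...(n-k+1) / (k(k-1)...1) one factor at a time, then multiplies by the powers of p and 1-p.

```python
from math import isclose


def binomial_pmf(k, n, p):
    """P{X = k} for X with a Bin(n, p) distribution."""
    coefficient = 1
    for j in range(k):
        # After this step, coefficient = n(n-1)...(n-j) / ((j+1) j ... 1).
        # The division is exact because each partial product is an integer.
        coefficient = coefficient * (n - j) // (j + 1)
    return coefficient * p ** k * (1 - p) ** (n - k)


# Illustrative parameters (assumptions).
n, p = 10, 0.3
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]
for k, prob in enumerate(probabilities):
    print(k, round(prob, 4))

# The probabilities over k = 0, 1, ..., n should add to 1.
print("total:", sum(probabilities))
```

Multiplying before dividing keeps the coefficient in exact integer arithmetic, which avoids rounding trouble for larger n.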
Mean and variance of the Binomial distribution
We can write X as a sum $X_1 + X_2 + \dots + X_n$, where
$$X_i = \begin{cases} 1 & \text{if the ith toss lands heads} \\ 0 & \text{if the ith toss lands tails.} \end{cases}$$
Each $X_i$ takes the value 1 with probability p and 0 with probability $1-p$, giving a mean
of $1 \times p + 0 \times (1-p) = p$. Thus $\mu_X = \mu_{X_1} + \mu_{X_2} + \dots + \mu_{X_n} = np$. Sound reasonable?
Each $X_i$ has variance $p(1-p)^2 + (1-p)(0-p)^2 = p(1-p)$. By independence
of the $X_i$, the sum of the $X_i$ has variance $\mathrm{var}(X_1) + \dots + \mathrm{var}(X_n) = np(1-p)$. Notice
that the independence between the $X_i$ is not used for the calculation of the mean, but it
is used for the calculation of the variance.
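To see the role of independence, it may help to compare the Binomial case with an extreme dependent case in which every toss simply copies the first one, so that the count of heads is n times the first indicator. The sketch below is an illustration only: the values n = 5 and p = 3/10 and the "copied toss" construction are assumptions made here, not something from the lecture. The mean is np in both cases, but only the independent case has variance np(1-p).

```python
from fractions import Fraction
from math import comb


def mean(dist):
    """Mean of a distribution stored as {value: probability}."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Variance of a distribution stored as {value: probability}."""
    m = mean(dist)
    return sum((x - m) ** 2 * p for x, p in dist.items())


# Illustrative parameters (assumptions).
n, p = 5, Fraction(3, 10)

# Independent tosses: the number of heads has a Bin(n, p) distribution.
independent = {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

# Completely dependent tosses: every toss copies the first,
# so the number of heads is n with probability p and 0 otherwise.
dependent = {0: 1 - p, n: p}

print("independent: mean", mean(independent), "variance", variance(independent))
print("dependent:   mean", mean(dependent), "variance", variance(dependent))
print("np =", n * p, " np(1-p) =", n * p * (1 - p))
```

In the dependent case the variance is n²p(1-p), a factor of n larger than for independent tosses.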
7. Monte Carlo
Using a computer and the Binomial distribution, one can determine areas.
[Figure: quarter circle inside the unit square with corners (0,0) and (1,1).]
The quarter circle shown in the picture occupies a fraction $p = \pi/4 \approx 0.785397$
of the unit square. I pretended I did not know that fact. I repeatedly
generated a large number of points in the unit square using Minitab, and
calculated the proportion that landed within the circle. The coordinates of
each point came from the Minitab command Random, with subcommand
Uniform. I saved a bunch of instructions in a file that I called monte.mac.
(You can retrieve a copy of this file from a link on the Syllabus page at
the Statistics 101–106 web site.) I was running Minitab from a directory
D:\stat100\Lecture5 on my computer. The file monte.mac was sitting in the directory
D:\stat100\macros. I had to tell Minitab how to find the macro file (.. means go up
one level). When I typed in a percent sign, followed by the path to my monte.mac file,
Minitab executed the commands in the file. (You might find macro files useful if you want
to carry out simulations.)
[output slightly edited]
MTB > %..\macros\monte
Executing from file: ..\macros\monte.MAC
Data Display
sample size 1000
number of repetitions 2000
true p 0.785397
true std.dev 0.0129826
sample proportion 0.785152 ## mean of all 2000 proportions
sample std. dev. 0.0126428 ## from the 2000 proportions
[Figure. Left: histogram of standardized proportions (horizontal axis "rescaled sample proportions", vertical axis "Density"), standard normal density superimposed.
Right: normal plot (vertical axis "Percent", horizontal axis "Data").]
The pictures and the output from monte.mac show how the distribution of
proportions in 2000 samples, each of size 1000, is concentrated around the true p with
an approximately normal shape. For the pictures I subtracted the true p from each
sample proportion, then divided by the theoretical standard deviation, $\sqrt{p(1-p)/n}$.
The central limit theorem says that the resulting standardized proportions should have
approximately a standard normal distribution. What do you think?
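The experiment is easy to imitate in other software. The sketch below uses Python's standard `random` module rather than Minitab, so it is not the contents of monte.mac (which is not reproduced here); the function name `simulate_proportions` and the seed are assumptions for illustration. It generates 2000 samples of 1000 uniform points each, records the proportion inside the quarter circle, and standardizes using the true p and the theoretical standard deviation.

```python
import random
from math import pi, sqrt
from statistics import mean, stdev


def simulate_proportions(sample_size, repetitions, seed=0):
    """Proportion of uniform points in the unit square that fall inside
    the quarter circle x^2 + y^2 <= 1, repeated `repetitions` times."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(repetitions):
        hits = 0
        for _ in range(sample_size):
            x, y = rng.random(), rng.random()
            if x * x + y * y <= 1.0:
                hits += 1
        proportions.append(hits / sample_size)
    return proportions


n, reps = 1000, 2000                        # sizes used in the lecture
true_p = pi / 4
true_sd = sqrt(true_p * (1 - true_p) / n)   # theoretical standard deviation

props = simulate_proportions(n, reps, seed=1)

# Standardize: subtract the true p, divide by the theoretical std. dev.
standardized = [(phat - true_p) / true_sd for phat in props]

# For a standard normal, roughly 95% of values lie within 2 of zero.
within_two = sum(abs(z) <= 2 for z in standardized) / reps

print("true p           ", true_p)
print("true std.dev     ", true_sd)
print("sample proportion", mean(props))
print("sample std. dev. ", stdev(props))
print("fraction of standardized values within +/- 2:", within_two)
```

The numbers will not match the Minitab output digit for digit, because the random points differ, but the sample mean and standard deviation of the proportions should land close to the theoretical values.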
In practice, one would take a single large sample to estimate an unknown
proportion p, then invoke the normal approximation to derive a measure of precision.
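One common way to turn the normal approximation into a measure of precision is an approximate 95% interval: the sample proportion plus or minus 1.96 estimated standard deviations. The sketch below is an illustration under stated assumptions, not necessarily the procedure the course goes on to use; the function name `proportion_interval` and the count of 785 hits in 1000 points are made up for the example.

```python
from math import pi, sqrt


def proportion_interval(successes, n, z=1.96):
    """Sample proportion, its estimated standard deviation, and an
    approximate interval based on the normal approximation."""
    phat = successes / n
    # The unknown p in sqrt(p(1-p)/n) is replaced by its estimate phat.
    se = sqrt(phat * (1 - phat) / n)
    return phat, se, (phat - z * se, phat + z * se)


# Hypothetical single sample (assumption): 785 of 1000 points in the circle.
phat, se, (lower, upper) = proportion_interval(785, 1000)

print("sample proportion:", phat)
print("estimated std. dev.:", se)
print("approximate 95% interval:", (lower, upper))
print("interval contains pi/4:", lower < pi / 4 < upper)
```

The only change from the theoretical standard deviation is that the unknown p is replaced by the sample proportion, which is what makes the calculation possible from a single sample.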