
Chapter 9

Inferences Based on Two Samples
9.1

z Tests and Confidence Intervals for a Difference Between Two Population Means
The Difference Between Two Population Means
Assumptions (new notation):
1. $X_1, \ldots, X_m$ is a random sample from a population with mean $\mu_1$ and variance $\sigma_1^2$ ($m$: size of sample 1).
2. $Y_1, \ldots, Y_n$ is a random sample from a population with mean $\mu_2$ and variance $\sigma_2^2$ ($n$: size of sample 2).
3. The X and Y samples are independent of one another.
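Expected Value and Standard Deviation of $\bar X - \bar Y$
The expected value of $\bar X - \bar Y$ is $\mu_1 - \mu_2$ (think of this as the parameter), so $\bar X - \bar Y$ is an unbiased estimator of $\mu_1 - \mu_2$.
Because the two samples are independent, the variances of $\bar X$ and $\bar Y$ add, and the standard deviation is
$$\sigma_{\bar X - \bar Y} = \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$$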
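As a quick numeric check of this formula, here is a minimal Python sketch. The function name `sd_of_difference` and the example values are illustrative assumptions, not part of the original notes.

```python
import math


def sd_of_difference(sigma1: float, m: int, sigma2: float, n: int) -> float:
    """Standard deviation of Xbar - Ybar for independent samples of sizes m and n."""
    # Independence lets the variances sigma1^2/m and sigma2^2/n simply add.
    return math.sqrt(sigma1**2 / m + sigma2**2 / n)


# Hypothetical example: sigma1 = 3, m = 9, sigma2 = 4, n = 16 gives sqrt(1 + 1).
print(sd_of_difference(3.0, 9, 4.0, 16))
```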
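Test Procedures for Normal Populations With Known Variances

Null hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$ (usually $\Delta_0 = 0$)
Test statistic value:
$$z = \frac{\bar x - \bar y - \Delta_0}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}}$$
$\beta(\Delta) = P(\text{type II error when } \mu_1 - \mu_2 = \Delta)$, where $\sigma = \sigma_{\bar X - \bar Y} = \sqrt{\sigma_1^2/m + \sigma_2^2/n}$ (similar to the one-sample formulas on p. 330):
$H_a: \mu_1 - \mu_2 > \Delta_0$: $\beta(\Delta) = \Phi\left(z_\alpha - \dfrac{\Delta - \Delta_0}{\sigma}\right)$
$H_a: \mu_1 - \mu_2 < \Delta_0$: $\beta(\Delta) = 1 - \Phi\left(-z_\alpha - \dfrac{\Delta - \Delta_0}{\sigma}\right)$
$H_a: \mu_1 - \mu_2 \ne \Delta_0$: $\beta(\Delta) = \Phi\left(z_{\alpha/2} - \dfrac{\Delta - \Delta_0}{\sigma}\right) - \Phi\left(-z_{\alpha/2} - \dfrac{\Delta - \Delta_0}{\sigma}\right)$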
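The known-variance z statistic and the three $\beta(\Delta)$ formulas can be sketched with the standard library's `statistics.NormalDist`, whose `cdf` plays the role of $\Phi$ and whose `inv_cdf(1 - alpha)` gives $z_\alpha$. The function names and example numbers are hypothetical; this is a sketch, not a reference implementation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Phi = STD_NORMAL.cdf, z_alpha = STD_NORMAL.inv_cdf(1 - alpha)


def se_known(sigma1: float, m: int, sigma2: float, n: int) -> float:
    """sigma_{Xbar - Ybar} when both population variances are known."""
    return math.sqrt(sigma1**2 / m + sigma2**2 / n)


def z_statistic(xbar, sigma1, m, ybar, sigma2, n, delta0=0.0):
    """Two-sample z statistic for H0: mu1 - mu2 = delta0 (known variances)."""
    return (xbar - ybar - delta0) / se_known(sigma1, m, sigma2, n)


def type2_error(delta, delta0, sigma1, m, sigma2, n, alpha=0.05, alternative="greater"):
    """beta(delta): probability of a type II error when the true mu1 - mu2 is delta."""
    shift = (delta - delta0) / se_known(sigma1, m, sigma2, n)
    if alternative == "greater":
        z_a = STD_NORMAL.inv_cdf(1 - alpha)
        return STD_NORMAL.cdf(z_a - shift)
    if alternative == "less":
        z_a = STD_NORMAL.inv_cdf(1 - alpha)
        return 1 - STD_NORMAL.cdf(-z_a - shift)
    if alternative == "two-sided":
        z_a2 = STD_NORMAL.inv_cdf(1 - alpha / 2)
        return STD_NORMAL.cdf(z_a2 - shift) - STD_NORMAL.cdf(-z_a2 - shift)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


# Hypothetical example: sigma1 = 3, m = 9, sigma2 = 4, n = 16, true difference 2.
print(z_statistic(10.0, 3.0, 9, 8.0, 4.0, 16))
print(type2_error(2.0, 0.0, 3.0, 9, 4.0, 16, alpha=0.05, alternative="greater"))
```

A useful sanity check: when $\Delta = \Delta_0$, every version of $\beta$ equals $1 - \alpha$.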
Large-Sample Tests

The assumptions of normal population distributions and known values of $\sigma_1$, $\sigma_2$ are unnecessary. The Central Limit Theorem guarantees that $\bar X - \bar Y$ has approximately a normal distribution.

Rule of thumb: both $m, n > 40$.
Large-Sample Tests
Use of the test statistic value
$$z = \frac{\bar x - \bar y - \Delta_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}} \qquad (m, n > 40;\ \Delta_0 \text{ usually zero})$$
along with the previously stated rejection regions based on z critical values gives large-sample tests whose significance levels are approximately $\alpha$.
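A minimal sketch of the large-sample test from summary statistics follows, assuming a two-sided alternative and hypothetical data. The function name is illustrative. The $m, n > 40$ rule of thumb is only a warning here, because the notes present it as a guideline rather than a hard requirement.

```python
import math
import warnings
from statistics import NormalDist


def large_sample_z_test(xbar, s1, m, ybar, s2, n, delta0=0.0, alpha=0.05):
    """Large-sample z statistic for H0: mu1 - mu2 = delta0 with a two-sided decision.

    The sample standard deviations s1, s2 replace the unknown sigmas.
    """
    if m <= 40 or n <= 40:
        # Rule of thumb from the notes, so warn instead of failing.
        warnings.warn("rule of thumb: both m and n should exceed 40")
    z = (xbar - ybar - delta0) / math.sqrt(s1**2 / m + s2**2 / n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    return z, abs(z) >= z_crit


# Hypothetical summary statistics for two large samples
print(large_sample_z_test(10.0, 3.0, 45, 8.0, 4.0, 80))
```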
Confidence Interval for $\mu_1 - \mu_2$

Provided m and n are large, a CI for $\mu_1 - \mu_2$ with a confidence level of $100(1 - \alpha)\%$ is
$$\bar x - \bar y \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$
Confidence bounds can be found by replacing $z_{\alpha/2}$ by $z_\alpha$.
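The interval above can be computed directly. This is a sketch with a hypothetical function name and made-up summary statistics, using `NormalDist().inv_cdf` for $z_{\alpha/2}$.

```python
import math
from statistics import NormalDist


def large_sample_ci(xbar, s1, m, ybar, s2, n, confidence=0.95):
    """Large-sample CI for mu1 - mu2: (xbar - ybar) +/- z_{alpha/2} * sqrt(s1^2/m + s2^2/n)."""
    alpha = 1 - confidence
    z_half = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half_width = z_half * math.sqrt(s1**2 / m + s2**2 / n)
    diff = xbar - ybar
    return diff - half_width, diff + half_width


lo, hi = large_sample_ci(10.0, 3.0, 45, 8.0, 4.0, 80)
print(f"95% CI for mu1 - mu2: ({lo:.3f}, {hi:.3f})")
```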
9.2

The Two-Sample t Test and Confidence Interval
Assumptions

Both populations are normal, so that $X_1, \ldots, X_m$ is a random sample from a normal distribution and so is $Y_1, \ldots, Y_n$. The plausibility of these assumptions can be judged by constructing a normal probability plot of the $x_i$'s and another of the $y_i$'s.
The normality assumption is important for (small-sample) t tests!
t Distribution
When the population distributions are both normal, the standardized variable
$$T = \frac{\bar X - \bar Y - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}}$$
has approximately a t distribution.

t Distribution

The df $\nu$ can be estimated from the data by
$$\nu = \frac{\left(\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}\right)^2}{\dfrac{(s_1^2/m)^2}{m-1} + \dfrac{(s_2^2/n)^2}{n-1}}$$
(round down to the nearest integer). Yuck! Don't do this by hand if you can help it.
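Since the df formula is tedious by hand, here is a short Python sketch of it. The function name `welch_df` and the example numbers are assumptions for illustration.

```python
import math


def welch_df(s1: float, m: int, s2: float, n: int) -> int:
    """Estimated df for the two-sample t procedure, rounded down."""
    a = s1**2 / m  # estimated variance of Xbar
    b = s2**2 / n  # estimated variance of Ybar
    nu = (a + b) ** 2 / (a**2 / (m - 1) + b**2 / (n - 1))
    return math.floor(nu)


# Hypothetical example: s1 = 2, m = 10, s2 = 3, n = 15
print(welch_df(2.0, 10, 3.0, 15))
```

For these values, $\nu \approx 22.99$, which rounds down to 22.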


Two-Sample CI for $\mu_1 - \mu_2$

The two-sample CI for $\mu_1 - \mu_2$ with a confidence level of $100(1 - \alpha)\%$ is
$$\bar x - \bar y \pm t_{\alpha/2,\nu}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$
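The sketch below computes this interval. The Python standard library has no t quantile function, so the caller supplies $t_{\alpha/2,\nu}$ from a t table. If SciPy is available, `scipy.stats.t.ppf(1 - alpha/2, nu)` returns the same value. The function name and data are hypothetical.

```python
import math


def two_sample_t_ci(xbar, s1, m, ybar, s2, n, t_crit):
    """CI for mu1 - mu2: (xbar - ybar) +/- t_crit * sqrt(s1^2/m + s2^2/n).

    t_crit must be t_{alpha/2, nu} at the rounded-down df nu; it is not computed here.
    """
    se = math.sqrt(s1**2 / m + s2**2 / n)
    diff = xbar - ybar
    return diff - t_crit * se, diff + t_crit * se


# Hypothetical data: s1 = 2, m = 10, s2 = 3, n = 15 gives se = 1 and nu = 22;
# t_{.025,22} = 2.074 from a standard t table.
print(two_sample_t_ci(20.0, 2.0, 10, 18.0, 3.0, 15, t_crit=2.074))
```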
Two-Sample t Test

Null hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$ (usually zero)
Test statistic value:
$$t = \frac{\bar x - \bar y - \Delta_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}$$
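The Two-Sample t Test
Alternative hypothesis and rejection region for an approximate level $\alpha$ test:

$H_a: \mu_1 - \mu_2 > \Delta_0$: $t \ge t_{\alpha,\nu}$

$H_a: \mu_1 - \mu_2 < \Delta_0$: $t \le -t_{\alpha,\nu}$

$H_a: \mu_1 - \mu_2 \ne \Delta_0$: $t \ge t_{\alpha/2,\nu}$ or $t \le -t_{\alpha/2,\nu}$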
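The table of rejection regions can be expressed as a small decision function. As in the CI sketch, the critical value is supplied by the caller: $t_{\alpha,\nu}$ for a one-sided alternative and $t_{\alpha/2,\nu}$ for the two-sided one. All names and numbers are hypothetical.

```python
import math


def two_sample_t_test(xbar, s1, m, ybar, s2, n, t_crit, delta0=0.0, alternative="two-sided"):
    """Two-sample t statistic and a reject / do-not-reject decision.

    t_crit is t_{alpha,nu} (one-sided) or t_{alpha/2,nu} (two-sided), from a t table.
    """
    t = (xbar - ybar - delta0) / math.sqrt(s1**2 / m + s2**2 / n)
    if alternative == "greater":
        reject = t >= t_crit
    elif alternative == "less":
        reject = t <= -t_crit
    elif alternative == "two-sided":
        reject = t >= t_crit or t <= -t_crit
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return t, reject


# Hypothetical data with nu = 22, so t = 2.0
print(two_sample_t_test(20.0, 2.0, 10, 18.0, 3.0, 15, t_crit=2.074))  # t_{.025,22}
print(two_sample_t_test(20.0, 2.0, 10, 18.0, 3.0, 15, t_crit=1.717, alternative="greater"))  # t_{.05,22}
```

The same data can be significant for the one-sided alternative but not for the two-sided one. This happens because the two-sided test splits $\alpha$ between both tails.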


Pooled t Procedures
Important: the pooled t procedures assume equal variances.

Assume the two populations are normal and have equal variances. If $\sigma^2$ denotes the common variance, it can be estimated by combining information from the two samples. Standardizing $\bar X - \bar Y$ using the pooled estimator gives a t variable based on $m + n - 2$ df.
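Pooled sample variance
$$S_p^2 = \frac{m-1}{m+n-2}\,S_1^2 + \frac{n-1}{m+n-2}\,S_2^2$$

Usage in formulas:
$$\frac{S_1^2}{m} + \frac{S_2^2}{n} \quad\text{becomes}\quad \frac{S_p^2}{m} + \frac{S_p^2}{n} \quad\text{or}\quad S_p^2\left(\frac{1}{m} + \frac{1}{n}\right)$$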
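A minimal sketch of the pooled variance and the pooled t statistic, with hypothetical names and data. It is only appropriate under the equal-variance assumption stated above.

```python
import math


def pooled_variance(s1, m, s2, n):
    """S_p^2: weighted average of the sample variances with weights (m-1) and (n-1)."""
    return ((m - 1) * s1**2 + (n - 1) * s2**2) / (m + n - 2)


def pooled_t(xbar, s1, m, ybar, s2, n, delta0=0.0):
    """Pooled t statistic for H0: mu1 - mu2 = delta0 and its df (m + n - 2)."""
    sp2 = pooled_variance(s1, m, s2, n)
    t = (xbar - ybar - delta0) / math.sqrt(sp2 * (1 / m + 1 / n))
    return t, m + n - 2


# Hypothetical data: same summaries as in the two-sample t sketches
print(pooled_variance(2.0, 10, 3.0, 15))
print(pooled_t(20.0, 2.0, 10, 18.0, 3.0, 15))
```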
9.3

Analysis of Paired Data
Paired Data (Assumptions)
Important: a natural pairing must exist!
The data consist of n independently selected pairs $(X_1, Y_1), \ldots, (X_n, Y_n)$, with $E(X_i) = \mu_1$ and $E(Y_i) = \mu_2$.
Let $D_1 = X_1 - Y_1, \ldots, D_n = X_n - Y_n$.
The $D_i$'s are assumed to be normally distributed with mean value $\mu_D$ and variance $\sigma_D^2$.
Bottom line: the two-sample problem becomes a one-sample problem!
The Paired t Test

Null hypothesis: $H_0: \mu_D = \Delta_0$ (usually zero)
Test statistic value:
$$t = \frac{\bar d - \Delta_0}{s_D/\sqrt{n}}$$
where $\bar d$ and $s_D$ are the sample mean and standard deviation of the $d_i$'s.
The Paired t Test (nothing new here!)
Alternative hypothesis and rejection region for a level $\alpha$ test:
$H_a: \mu_D > \Delta_0$: $t \ge t_{\alpha,n-1}$

$H_a: \mu_D < \Delta_0$: $t \le -t_{\alpha,n-1}$
$H_a: \mu_D \ne \Delta_0$: $t \ge t_{\alpha/2,n-1}$ or $t \le -t_{\alpha/2,n-1}$
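The reduction to a one-sample problem is visible in code: form the differences, then apply the one-sample t formula to them. This is a sketch with a hypothetical function name and made-up paired data. It uses `statistics.stdev`, which divides by $n - 1$ and so matches $s_D$.

```python
import math
from statistics import mean, stdev


def paired_t(x, y, delta0=0.0):
    """Paired t statistic for H0: mu_D = delta0 and its df (n - 1)."""
    if len(x) != len(y):
        raise ValueError("paired data need samples of equal length")
    d = [xi - yi for xi, yi in zip(x, y)]  # the one-sample data d_i = x_i - y_i
    n = len(d)
    t = (mean(d) - delta0) / (stdev(d) / math.sqrt(n))
    return t, n - 1


# Hypothetical before/after measurements on the same five units
x = [10, 12, 9, 11, 13]
y = [8, 11, 9, 9, 10]
print(paired_t(x, y))
```

The statistic is then compared with the rejection regions in the table, using $n - 1 = 4$ df here.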
Confidence Interval for $\mu_D$ (nothing new here!)

The paired t CI for $\mu_D$ is
$$\bar d \pm t_{\alpha/2,n-1}\cdot\frac{s_D}{\sqrt{n}}$$
Confidence bounds can be found by replacing $t_{\alpha/2,n-1}$ by $t_{\alpha,n-1}$.
For large samples, you could use the z test and CI.
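A matching sketch for the paired CI follows. The caller supplies $t_{\alpha/2,n-1}$ from a t table, because the standard library has no t quantile. Names and data are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_ci(x, y, t_crit):
    """Paired t CI for mu_D: dbar +/- t_crit * s_D / sqrt(n).

    t_crit must be t_{alpha/2, n-1}, taken from a t table.
    """
    if len(x) != len(y):
        raise ValueError("paired data need samples of equal length")
    d = [xi - yi for xi, yi in zip(x, y)]
    half_width = t_crit * stdev(d) / math.sqrt(len(d))
    return mean(d) - half_width, mean(d) + half_width


# Hypothetical data (n = 5 pairs); t_{.025,4} = 2.776 for a 95% interval
print(paired_ci([10, 12, 9, 11, 13], [8, 11, 9, 9, 10], t_crit=2.776))
```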
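Paired Data and Two-Sample t
$$V(\bar X - \bar Y) = V(\bar D) = V\left(\frac{1}{n}\sum_{i=1}^{n} D_i\right) = \frac{V(D_i)}{n} = \frac{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}{n}$$
where $\rho$ is the correlation between $X_i$ and $Y_i$ within a pair.
Remember: a smaller variance means better estimates.
Independence between X and Y: $\rho = 0$.
Positive dependence: $\rho > 0$, which makes $V(\bar D)$ smaller than under independence.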
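To see how much positive correlation helps, the variance formula can be evaluated for a few values of $\rho$. This is a sketch, and the function name and inputs are hypothetical.

```python
def var_mean_difference(sigma1, sigma2, n, rho=0.0):
    """V(Dbar) = (sigma1^2 + sigma2^2 - 2*rho*sigma1*sigma2) / n for n pairs."""
    return (sigma1**2 + sigma2**2 - 2 * rho * sigma1 * sigma2) / n


# Hypothetical: sigma1 = sigma2 = 1 and n = 10 pairs
for rho in (0.0, 0.5, 0.9):
    print(rho, var_mean_difference(1.0, 1.0, 10, rho))
```

With $\sigma_1 = \sigma_2 = 1$, moving from $\rho = 0$ to $\rho = 0.5$ halves the variance of $\bar D$ (0.2 to 0.1).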
Pros and Cons of Pairing
1. If there is great heterogeneity between experimental units and a large correlation within them, the loss in degrees of freedom ($n - 1$ df for the paired t test, versus up to $2n - 2$ df when the same observations are treated as two independent samples) will be compensated for by the increased precision associated with pairing (use pairing). Usually, we're in case 1; use pairing if possible.
2. If the units are relatively homogeneous and the correlation within pairs is not large, the gain in precision due to pairing will be outweighed by the decrease in degrees of freedom (use independent samples).
9.4

Inferences Concerning a Difference Between Population Proportions
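Difference Between Population Proportions
Let $X \sim \text{Bin}(m, p_1)$ and $Y \sim \text{Bin}(n, p_2)$ with X and Y independent variables. Then $\hat p_1 - \hat p_2$ is an estimator of $p_1 - p_2$, where $\hat p_1 = X/m$ and $\hat p_2 = Y/n$.
$$E(\hat p_1 - \hat p_2) = p_1 - p_2$$
$$V(\hat p_1 - \hat p_2) = \frac{p_1 q_1}{m} + \frac{p_2 q_2}{n} \qquad (q_i = 1 - p_i)$$
Large-sample conditions: $m\hat p_1 \ge 10$, $m\hat q_1 \ge 10$, $n\hat p_2 \ge 10$, and $n\hat q_2 \ge 10$.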
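The estimator, a plug-in estimate of its variance, and the large-sample check can be computed together. The variance formula uses the unknown $p_i$. This sketch substitutes $\hat p_i$, which is an assumption of the example. The function name and counts are hypothetical.

```python
def diff_in_proportions(x, m, y, n):
    """Return (p1_hat - p2_hat, estimated variance, whether all four counts are >= 10)."""
    p1, p2 = x / m, y / n
    # Plug-in estimate of V(p1_hat - p2_hat), with q_i = 1 - p_i replaced by 1 - p_i_hat
    var_hat = p1 * (1 - p1) / m + p2 * (1 - p2) / n
    large_enough = min(m * p1, m * (1 - p1), n * p2, n * (1 - p2)) >= 10
    return p1 - p2, var_hat, large_enough


# Hypothetical counts: 60 successes out of 100 versus 40 out of 100
print(diff_in_proportions(60, 100, 40, 100))
```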
Large-Sample Test

Null hypothesis: $H_0: p_1 - p_2 = 0$

Test statistic value:
$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p\hat q\left(\dfrac{1}{m} + \dfrac{1}{n}\right)}}$$
Only for the test of $H_0: p_1 - p_2 = 0$ does the standard error involve $\hat p$, a weighted average of $\hat p_1$ and $\hat p_2$:
$$\hat p = \frac{m}{m+n}\,\hat p_1 + \frac{n}{m+n}\,\hat p_2 = \frac{X + Y}{m + n} = \frac{\text{total number of successes}}{\text{total number of trials}}, \qquad \hat q = 1 - \hat p$$
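A sketch of this test follows, assuming a two-sided alternative and the same hypothetical counts as before. The pooled $\hat p$ is computed as total successes over total trials.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x, m, y, n, alpha=0.05):
    """Large-sample z test of H0: p1 - p2 = 0 against a two-sided alternative."""
    p1, p2 = x / m, y / n
    p = (x + y) / (m + n)  # pooled estimate: total successes / total trials
    q = 1 - p
    z = (p1 - p2) / math.sqrt(p * q * (1 / m + 1 / n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    return z, abs(z) >= z_crit


# Hypothetical counts: 60/100 versus 40/100, so p_hat = 0.5
print(two_proportion_z_test(60, 100, 40, 100))
```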
Confidence Interval for $p_1 - p_2$

$$\hat p_1 - \hat p_2 \pm z_{\alpha/2}\sqrt{\frac{\hat p_1\hat q_1}{m} + \frac{\hat p_2\hat q_2}{n}}$$

Note: the standard error here is slightly different from the one used for the test!
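The difference between the two standard errors is easy to see numerically. This sketch computes the unpooled CI. The function name and counts are hypothetical.

```python
import math
from statistics import NormalDist


def diff_proportions_ci(x, m, y, n, confidence=0.95):
    """Large-sample CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x / m, y / n
    se = math.sqrt(p1 * (1 - p1) / m + p2 * (1 - p2) / n)  # unpooled, unlike the test
    z_half = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    diff = p1 - p2
    return diff - z_half * se, diff + z_half * se


# Hypothetical counts: 60/100 versus 40/100
print(diff_proportions_ci(60, 100, 40, 100))
```

For these counts, the unpooled standard error is $\sqrt{0.0048} \approx 0.0693$. The pooled one used by the test is $\sqrt{0.005} \approx 0.0707$.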
