
Two-Sample Tests

Learning Objectives
In this session, you learn how to use hypothesis testing to compare the difference between:
The means of two independent populations
The means of two related populations
The proportions of two independent populations
The variances of two independent populations
Two-Sample Tests Overview
Two-sample tests, with examples:
Independent population means: Group 1 vs. Group 2
Means, related populations: same group before vs. after treatment
Independent population proportions: Proportion 1 vs. Proportion 2
Independent population variances: Variance 1 vs. Variance 2
Two-Sample Tests
Independent Population Means
Two cases: $\sigma_1$ and $\sigma_2$ known, or $\sigma_1$ and $\sigma_2$ unknown
Goal: Test a hypothesis or form a confidence interval for the difference between two population means, $\mu_1 - \mu_2$
The point estimate for the difference is the difference between the sample means: $\bar{X}_1 - \bar{X}_2$
Two-Sample Tests
Independent Populations
Different data sources
Independent: the sample selected from one population has no effect on the sample selected from the other population
Use the difference between the two sample means
Use a Z test, a pooled-variance t test, or a separate-variance t test
Two-Sample Tests
Independent Populations
$\sigma_1$ and $\sigma_2$ known: use a Z test statistic
$\sigma_1$ and $\sigma_2$ unknown: use S to estimate the unknown $\sigma$, and use a t test statistic
Two-Sample Tests
Independent Populations
Assumptions ($\sigma_1$ and $\sigma_2$ known):
Samples are randomly and independently drawn
Population distributions are normal
Two-Sample Tests
Independent Populations
When $\sigma_1$ and $\sigma_2$ are known and both populations are normal, the test statistic is a Z-value and the standard error of $\bar{X}_1 - \bar{X}_2$ is

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Two-Sample Tests
Independent Populations
The test statistic ($\sigma_1$ and $\sigma_2$ known) is:

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
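As a rough illustration of the Z test statistic for two independent means with known population standard deviations, the sketch below computes the standard error and Z in plain Python. The function name `two_sample_z` and all input numbers are hypothetical and do not come from the slides.

```python
import math


def two_sample_z(x1_bar, x2_bar, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """Return (Z, standard error) for mu1 - mu2 when both sigmas are known."""
    # Standard error of the difference between the two sample means
    std_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    # Z = (observed difference - hypothesized difference) / standard error
    z = ((x1_bar - x2_bar) - hypothesized_diff) / std_error
    return z, std_error


# Hypothetical inputs chosen only for illustration (not from the slides)
z, se = two_sample_z(x1_bar=50.0, x2_bar=48.0, sigma1=4.0, sigma2=5.0, n1=40, n2=50)
print(f"standard error = {se:.4f}, Z = {z:.4f}")
```

The resulting Z would then be compared with the critical value for the chosen tail and significance level.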
Two-Sample Tests
Independent Populations
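Two Independent Populations, Comparing Means
Lower-tail test:
$H_0: \mu_1 \ge \mu_2$
$H_1: \mu_1 < \mu_2$
i.e.,
$H_0: \mu_1 - \mu_2 \ge 0$
$H_1: \mu_1 - \mu_2 < 0$
Upper-tail test:
$H_0: \mu_1 \le \mu_2$
$H_1: \mu_1 > \mu_2$
i.e.,
$H_0: \mu_1 - \mu_2 \le 0$
$H_1: \mu_1 - \mu_2 > 0$
Two-tail test:
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \ne \mu_2$
i.e.,
$H_0: \mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 \ne 0$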
Two-Sample Tests
Independent Populations
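Two Independent Populations, Comparing Means: Rejection Regions
Lower-tail test:
$H_0: \mu_1 - \mu_2 \ge 0$
$H_1: \mu_1 - \mu_2 < 0$
Rejection region: area $\alpha$ in the lower tail, below $-z_\alpha$
Reject $H_0$ if $Z < -Z_\alpha$
Upper-tail test:
$H_0: \mu_1 - \mu_2 \le 0$
$H_1: \mu_1 - \mu_2 > 0$
Rejection region: area $\alpha$ in the upper tail, above $z_\alpha$
Reject $H_0$ if $Z > Z_\alpha$
Two-tail test:
$H_0: \mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 \ne 0$
Rejection region: area $\alpha/2$ in each tail, below $-z_{\alpha/2}$ and above $z_{\alpha/2}$
Reject $H_0$ if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$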
Two-Sample Tests
Independent Populations
Assumptions ($\sigma_1$ and $\sigma_2$ unknown, assumed equal):
Samples are randomly and independently drawn
Populations are normally distributed
Population variances are unknown but assumed equal
Two-Sample Tests
Independent Populations
Forming interval estimates ($\sigma_1$ and $\sigma_2$ unknown, assumed equal):
The population variances are assumed equal, so use the two sample standard deviations and pool them to estimate $\sigma$
The test statistic is a t value with $(n_1 + n_2 - 2)$ degrees of freedom
Two-Sample Tests
Independent Populations
The pooled standard deviation is:

$$S_p = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}}$$
Two-Sample Tests
Independent Populations
The test statistic is:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where t has $(n_1 + n_2 - 2)$ d.f., and

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$$
Two-Sample Tests
Independent Populations
You are a financial analyst for a brokerage firm. Is there a difference in dividend yield between stocks listed on the NYSE and NASDAQ? You collect the following data:

                  NYSE    NASDAQ
Number            21      25
Sample mean       3.27    2.53
Sample std dev    1.30    1.16

Assuming both populations are approximately normal with equal variances, is there a difference in average yield ($\alpha = 0.05$)?
Two-Sample Tests
Independent Populations
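The pooled variance is:

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(21 - 1)1.30^2 + (25 - 1)1.16^2}{(21 - 1) + (25 - 1)} = 1.5021$$

The test statistic is:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(3.27 - 2.53) - 0}{\sqrt{1.5021\left(\frac{1}{21} + \frac{1}{25}\right)}} = 2.040$$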
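The pooled-variance calculation for the NYSE/NASDAQ example can be checked with a few lines of Python. This is a minimal sketch using only the summary statistics given in the example. The helper names `pooled_variance` and `pooled_t` are illustrative, not part of any library.

```python
import math


def pooled_variance(s1, s2, n1, n2):
    """Pooled variance S_p^2 from two sample standard deviations."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))


def pooled_t(x1_bar, x2_bar, s1, s2, n1, n2, hypothesized_diff=0.0):
    """Pooled-variance t statistic with n1 + n2 - 2 degrees of freedom."""
    sp2 = pooled_variance(s1, s2, n1, n2)
    std_error = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return ((x1_bar - x2_bar) - hypothesized_diff) / std_error


# Summary statistics from the NYSE (sample 1) and NASDAQ (sample 2) example
sp2 = pooled_variance(s1=1.30, s2=1.16, n1=21, n2=25)
t_stat = pooled_t(x1_bar=3.27, x2_bar=2.53, s1=1.30, s2=1.16, n1=21, n2=25)
print(f"S_p^2 = {sp2:.4f}, t = {t_stat:.3f}")
```

Both values should agree, to rounding, with the 1.5021 and 2.040 shown in the example.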
Two-Sample Tests
Independent Populations
$H_0: \mu_1 - \mu_2 = 0$, i.e. ($\mu_1 = \mu_2$)
$H_1: \mu_1 - \mu_2 \ne 0$, i.e. ($\mu_1 \ne \mu_2$)
$\alpha = 0.05$
df = 21 + 25 - 2 = 44
Critical values: t = ±2.0154
Test statistic: t = 2.040
Rejection regions: t < -2.0154 or t > 2.0154 (area .025 in each tail)
Decision: Reject $H_0$ at $\alpha = 0.05$, since 2.040 > 2.0154
Conclusion: There is evidence of a difference in the means.
Two-Sample Tests
Independent Populations
The confidence interval for $\mu_1 - \mu_2$ ($\sigma_1$ and $\sigma_2$ known) is:

$$(\bar{X}_1 - \bar{X}_2) \pm Z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Two-Sample Tests
Independent Populations
The confidence interval for $\mu_1 - \mu_2$ ($\sigma_1$ and $\sigma_2$ unknown, assumed equal) is:

$$(\bar{X}_1 - \bar{X}_2) \pm t_{n_1 + n_2 - 2}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$$
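To see the pooled-variance interval in action, the sketch below applies it to the NYSE/NASDAQ summary statistics from the earlier example. The slides do not work this interval out, so the result is an illustration only. The critical value 2.0154 (44 d.f., two-tailed $\alpha = 0.05$) is the one quoted in the example, and `pooled_ci` is a hypothetical helper name.

```python
import math


def pooled_ci(x1_bar, x2_bar, s1, s2, n1, n2, t_crit):
    """Confidence interval for mu1 - mu2 using the pooled variance.

    t_crit is the critical t value with n1 + n2 - 2 degrees of freedom,
    looked up by the caller (for example, from a t table).
    """
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))
    margin = t_crit * math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = x1_bar - x2_bar
    return diff - margin, diff + margin


# NYSE vs. NASDAQ summary statistics; t critical value taken from the example
lower, upper = pooled_ci(3.27, 2.53, 1.30, 1.16, 21, 25, t_crit=2.0154)
print(f"95% CI for mu1 - mu2: ({lower:.4f}, {upper:.4f})")
```

Under these inputs the interval lies just above zero, which is consistent with rejecting $H_0$ in the two-tail test at the same significance level.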
Two-Sample Tests
Related Populations
Tests Means of 2 Related Populations
Paired or matched samples
Repeated measures (before/after)
Use the difference between paired values: $D = X_1 - X_2$
Eliminates variation among subjects
Assumptions:
Both populations are normally distributed
Two-Sample Tests
Related Populations
The ith paired difference is $D_i$, where $D_i = X_{1i} - X_{2i}$
The point estimate for the population mean paired difference is $\bar{D}$:

$$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n}$$

Suppose the population standard deviation of the difference scores, $\sigma_D$, is known.
Two-Sample Tests
Related Populations
The test statistic for the mean difference is a Z value:

$$Z = \frac{\bar{D} - \mu_D}{\sigma_D / \sqrt{n}}$$

where
$\mu_D$ = hypothesized mean difference
$\sigma_D$ = population standard deviation of differences
n = the sample size (number of pairs)
Two-Sample Tests
Related Populations
If $\sigma_D$ is unknown, you can estimate the unknown population standard deviation with a sample standard deviation:

$$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$
Two-Sample Tests
Related Populations
The test statistic for $\bar{D}$ is now a t statistic:

$$t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}}$$

where t has n - 1 d.f. and $S_D$ is:

$$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$
Two-Sample Tests
Related Populations
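Lower-tail test:
$H_0: \mu_D \ge 0$
$H_1: \mu_D < 0$
Rejection region: area $\alpha$ in the lower tail, below $-t_\alpha$
Reject $H_0$ if $t < -t_\alpha$
Upper-tail test:
$H_0: \mu_D \le 0$
$H_1: \mu_D > 0$
Rejection region: area $\alpha$ in the upper tail, above $t_\alpha$
Reject $H_0$ if $t > t_\alpha$
Two-tail test:
$H_0: \mu_D = 0$
$H_1: \mu_D \ne 0$
Rejection region: area $\alpha/2$ in each tail, below $-t_{\alpha/2}$ and above $t_{\alpha/2}$
Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$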
Two-Sample Tests
Related Populations Example
Assume you send your salespeople to a customer service training workshop. Has the training made a difference in the number of complaints? You collect the following data:

               Number of Complaints
Salesperson    Before (1)    After (2)    Difference, D_i (2 - 1)
C.B.           6             4            -2
T.F.           20            6            -14
M.H.           3             2            -1
R.K.           0             0            0
M.O.           4             0            -4
Two-Sample Tests
Related Populations Example
From the differences $D_i$ in the table:

$$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n} = -4.2$$

$$S_D = \sqrt{\frac{\sum (D_i - \bar{D})^2}{n - 1}} = 5.67$$
Two-Sample Tests
Related Populations Example
Has the training made a difference in the number of complaints (at the $\alpha = 0.01$ level)?
$H_0: \mu_D = 0$
$H_1: \mu_D \ne 0$
Critical value = ±4.604
d.f. = n - 1 = 4
Test statistic:

$$t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}} = \frac{-4.2 - 0}{5.67 / \sqrt{5}} = -1.66$$
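The paired calculation for the complaints example can be reproduced from the raw before/after counts. The following is a minimal sketch using Python's standard `statistics` module. The critical value 4.604 is taken from the example rather than computed, and the variable names are illustrative.

```python
import math
import statistics

# Complaint counts per salesperson (C.B., T.F., M.H., R.K., M.O.) from the example
before = [6, 20, 3, 0, 4]
after = [4, 6, 2, 0, 0]

# Paired differences D_i = After - Before, matching the "(2 - 1)" column
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
d_bar = statistics.mean(diffs)   # point estimate of the mean difference
s_d = statistics.stdev(diffs)    # sample standard deviation (n - 1 divisor)

# t statistic for H0: mu_D = 0, with n - 1 degrees of freedom
t_stat = (d_bar - 0) / (s_d / math.sqrt(n))

t_crit = 4.604  # two-tailed critical value for alpha = 0.01, 4 d.f. (from the example)
reject_h0 = abs(t_stat) > t_crit
print(f"D-bar = {d_bar:.1f}, S_D = {s_d:.2f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

Because `statistics.stdev` divides by n - 1, it matches the $S_D$ formula used for the paired t test.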
Two-Sample Tests
Related Populations Example
Rejection regions: t < -4.604 or t > 4.604 (area $\alpha/2$ in each tail)
Test statistic: t = -1.66
Decision: Do not reject $H_0$ (the t statistic is not in the rejection region)
Conclusion: There is no evidence of a significant change in the number of complaints
Two-Sample Tests
Related Populations
The confidence interval for $\mu_D$ ($\sigma_D$ known) is:

$$\bar{D} \pm Z\frac{\sigma_D}{\sqrt{n}}$$

where
n = the sample size (number of pairs in the paired sample)
Two-Sample Tests
Related Populations
The confidence interval for $\mu_D$ ($\sigma_D$ unknown) is:

$$\bar{D} \pm t_{n-1}\frac{S_D}{\sqrt{n}}$$

where

$$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$
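As an illustration of the interval for the mean paired difference when the population standard deviation is unknown, the sketch below applies it to the five differences from the complaints example. The slides do not compute this interval, so treat the output as a worked illustration. The critical value 4.604 (4 d.f., 99% confidence) is the one quoted in the example, and `paired_ci` is a hypothetical helper name.

```python
import math
import statistics


def paired_ci(diffs, t_crit):
    """Confidence interval for mu_D from paired differences.

    t_crit is the critical t value with len(diffs) - 1 degrees of freedom,
    supplied by the caller (for example, from a t table).
    """
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # n - 1 divisor
    margin = t_crit * s_d / math.sqrt(n)
    return d_bar - margin, d_bar + margin


# Differences (After - Before) from the complaints example
lower, upper = paired_ci([-2, -14, -1, 0, -4], t_crit=4.604)
print(f"99% CI for mu_D: ({lower:.2f}, {upper:.2f})")
```

The interval contains zero, in line with the decision not to reject $H_0$ at the 0.01 level.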
Two Population Proportions
Goal: Test a hypothesis or form a confidence interval for the difference between two independent population proportions, $\pi_1 - \pi_2$
Assumptions:
$n_1 \pi_1 \ge 5$, $n_1(1 - \pi_1) \ge 5$
$n_2 \pi_2 \ge 5$, $n_2(1 - \pi_2) \ge 5$
The point estimate for the difference is $p_1 - p_2$
Two Population Proportions
Since you begin by assuming the null hypothesis is true, you assume $\pi_1 = \pi_2$ and pool the two sample (p) estimates.
The pooled estimate for the overall proportion is:

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2}$$

where $X_1$ and $X_2$ are the number of successes in samples 1 and 2
Two Population Proportions
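The test statistic for $p_1 - p_2$ is a Z statistic:

$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2}, \quad p_1 = \frac{X_1}{n_1}, \quad p_2 = \frac{X_2}{n_2}$$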
Two Population Proportions
Hypothesis for Population Proportions
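Lower-tail test:
$H_0: \pi_1 \ge \pi_2$
$H_1: \pi_1 < \pi_2$
i.e.,
$H_0: \pi_1 - \pi_2 \ge 0$
$H_1: \pi_1 - \pi_2 < 0$
Upper-tail test:
$H_0: \pi_1 \le \pi_2$
$H_1: \pi_1 > \pi_2$
i.e.,
$H_0: \pi_1 - \pi_2 \le 0$
$H_1: \pi_1 - \pi_2 > 0$
Two-tail test:
$H_0: \pi_1 = \pi_2$
$H_1: \pi_1 \ne \pi_2$
i.e.,
$H_0: \pi_1 - \pi_2 = 0$
$H_1: \pi_1 - \pi_2 \ne 0$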
Two Population Proportions
Hypothesis for Population Proportions
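Lower-tail test:
$H_0: \pi_1 - \pi_2 \ge 0$
$H_1: \pi_1 - \pi_2 < 0$
Rejection region: area $\alpha$ in the lower tail, below $-z_\alpha$
Reject $H_0$ if $Z < -Z_\alpha$
Upper-tail test:
$H_0: \pi_1 - \pi_2 \le 0$
$H_1: \pi_1 - \pi_2 > 0$
Rejection region: area $\alpha$ in the upper tail, above $z_\alpha$
Reject $H_0$ if $Z > Z_\alpha$
Two-tail test:
$H_0: \pi_1 - \pi_2 = 0$
$H_1: \pi_1 - \pi_2 \ne 0$
Rejection region: area $\alpha/2$ in each tail, below $-z_{\alpha/2}$ and above $z_{\alpha/2}$
Reject $H_0$ if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$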
Two Independent Population
Proportions: Example
Is there a significant difference between the
proportion of men and the proportion of
women who will vote Yes on Proposition A?
In a random sample of 72 men, 36 indicated
they would vote Yes and, in a sample of 50
women, 31 indicated they would vote Yes
Test at the .05 level of significance
Two Independent Population
Proportions: Example
$H_0: \pi_1 - \pi_2 = 0$ (the two proportions are equal)
$H_1: \pi_1 - \pi_2 \ne 0$ (there is a significant difference between proportions)
The sample proportions are:
Men: $p_1$ = 36/72 = .50
Women: $p_2$ = 31/50 = .62
The pooled estimate for the overall proportion is:

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{36 + 31}{72 + 50} = \frac{67}{122} = .549$$
Two Independent Population
Proportions: Example
The test statistic for $\pi_1 - \pi_2$ is:

$$z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.50 - .62) - 0}{\sqrt{.549(1 - .549)\left(\frac{1}{72} + \frac{1}{50}\right)}} = -1.31$$

Critical values = ±1.96 for $\alpha$ = .05
Rejection regions: z < -1.96 or z > 1.96 (area .025 in each tail)
Decision: Do not reject $H_0$, since z = -1.31 lies between -1.96 and 1.96
Conclusion: There is no evidence of a significant difference between men and women in the proportion who will vote Yes.
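The Proposition A calculation can be checked with a short Python sketch. It uses the counts from the example and `statistics.NormalDist` from the standard library to obtain the two-tailed critical value. The function name `two_proportion_z` is illustrative only.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: pi1 - pi2 = 0 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    std_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error, p_pooled


# Men: 36 of 72 vote Yes; women: 31 of 50 vote Yes (from the example)
z, p_pooled = two_proportion_z(x1=36, n1=72, x2=31, n2=50)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # upper critical value, about 1.96
reject_h0 = abs(z) > z_crit
print(f"pooled p = {p_pooled:.3f}, z = {z:.2f}, critical = ±{z_crit:.2f}, reject H0: {reject_h0}")
```

The z value should match the -1.31 in the example to two decimals, and it falls inside the non-rejection region.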
Two Independent Population
Proportions
The confidence interval for $\pi_1 - \pi_2$ is:

$$(p_1 - p_2) \pm Z\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
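For illustration, the interval formula can be applied to the men/women voting counts from the preceding example. The slides do not compute this interval, so the numbers below are a worked illustration. Note that the interval uses the unpooled standard error, unlike the test statistic. `two_proportion_ci` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for pi1 - pi2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    std_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z_crit * std_error
    diff = p1 - p2
    return diff - margin, diff + margin


# Men: 36 of 72 vote Yes; women: 31 of 50 vote Yes
lower, upper = two_proportion_ci(36, 72, 31, 50)
print(f"95% CI for pi1 - pi2: ({lower:.4f}, {upper:.4f})")
```

The interval spans zero, which agrees with the test's decision not to reject $H_0$ at the .05 level.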
Testing Population Variances
Purpose: To determine if two independent
populations have the same variability.
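Two-tail test:
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \ne \sigma_2^2$
Lower-tail test:
$H_0: \sigma_1^2 \ge \sigma_2^2$
$H_1: \sigma_1^2 < \sigma_2^2$
Upper-tail test:
$H_0: \sigma_1^2 \le \sigma_2^2$
$H_1: \sigma_1^2 > \sigma_2^2$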
Testing Population Variances
The F test statistic is:

$$F = \frac{S_1^2}{S_2^2}$$

$S_1^2$ = variance of sample 1
$n_1 - 1$ = numerator degrees of freedom
$S_2^2$ = variance of sample 2
$n_2 - 1$ = denominator degrees of freedom
Testing Population Variances
The F critical value is found from the F table
There are two appropriate degrees of
freedom: numerator and denominator.
In the F table,
numerator degrees of freedom determine the
column
denominator degrees of freedom determine the
row
Testing Population Variances
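Lower-tail test:
$H_0: \sigma_1^2 \ge \sigma_2^2$
$H_1: \sigma_1^2 < \sigma_2^2$
Rejection region: area $\alpha$ in the lower tail, below $F_L$
Reject $H_0$ if $F < F_L$
Upper-tail test:
$H_0: \sigma_1^2 \le \sigma_2^2$
$H_1: \sigma_1^2 > \sigma_2^2$
Rejection region: area $\alpha$ in the upper tail, above $F_U$
Reject $H_0$ if $F > F_U$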
Testing Population Variances
Two-tail test:
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \ne \sigma_2^2$
The rejection region for a two-tail test is:

$$F = \frac{S_1^2}{S_2^2} > F_U \quad \text{or} \quad F = \frac{S_1^2}{S_2^2} < F_L$$

with area $\alpha/2$ in each tail (below $F_L$ and above $F_U$)
Testing Population Variances
To find the critical F values:
1. Find $F_U$ from the F table for $n_1 - 1$ numerator and $n_2 - 1$ denominator degrees of freedom.
2. Find $F_L$ using the formula:

$$F_L = \frac{1}{F_{U*}}$$

where $F_{U*}$ is from the F table with $n_2 - 1$ numerator and $n_1 - 1$ denominator degrees of freedom (i.e., switch the d.f. from $F_U$)
Testing Population Variances
You are a financial analyst for a brokerage firm. You want to compare dividend yields between stocks listed on the NYSE and NASDAQ. You collect the following data:

           NYSE    NASDAQ
Number     21      25
Mean       3.27    2.53
Std dev    1.30    1.16

Is there a difference in the variances between the NYSE and NASDAQ at the $\alpha = 0.05$ level?
Testing Population Variances
Form the hypothesis test:
$H_0: \sigma_1^2 - \sigma_2^2 = 0$ (there is no difference between variances)
$H_1: \sigma_1^2 - \sigma_2^2 \ne 0$ (there is a difference between variances)
$F_U$:
Numerator: $n_1 - 1$ = 21 - 1 = 20 d.f.
Denominator: $n_2 - 1$ = 25 - 1 = 24 d.f.
$F_U = F_{.025,\,20,\,24}$ = 2.33
$F_L$:
Numerator: $n_2 - 1$ = 25 - 1 = 24 d.f.
Denominator: $n_1 - 1$ = 21 - 1 = 20 d.f.
$F_L = 1/F_{.025,\,24,\,20}$ = 0.41
Testing Population Variances
The test statistic is:

$$F = \frac{S_1^2}{S_2^2} = \frac{1.30^2}{1.16^2} = 1.256$$

Rejection regions: F < $F_L$ = 0.41 or F > $F_U$ = 2.33 (area $\alpha/2$ = .025 in each tail)
F = 1.256 is not in the rejection region, so we do not reject $H_0$
Conclusion: There is insufficient evidence of a difference in variances at $\alpha$ = .05
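The F test for the NYSE/NASDAQ variances can be sketched in a few lines of Python. The critical values 0.41 and 2.33 are the table values quoted in the example, not computed here, and the names `f_statistic`, `f_lower` and `f_upper` are illustrative.

```python
def f_statistic(s1, s2):
    """F = S1^2 / S2^2, the ratio of the two sample variances."""
    return s1**2 / s2**2


# Sample standard deviations from the example: NYSE (sample 1), NASDAQ (sample 2)
f_stat = f_statistic(s1=1.30, s2=1.16)

# Two-tailed critical values at alpha = 0.05, as read from the F table in the example
f_lower = 0.41  # F_L = 1 / F(.025; 24, 20)
f_upper = 2.33  # F_U = F(.025; 20, 24)

# Reject H0 only if F falls in either tail
reject_h0 = f_stat < f_lower or f_stat > f_upper
print(f"F = {f_stat:.3f}, reject H0: {reject_h0}")
```

Since F lies between the two critical values, the sketch reaches the same decision as the example.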
