Two-Sample Tests

Two-sample tests cover four cases:

• Population means, independent samples. Example: Population 1 vs. independent Population 2
• Population means, related samples. Example: same population before vs. after treatment
• Population proportions. Example: Proportion 1 vs. Proportion 2
• Population variances. Example: Variance 1 vs. Variance 2

Difference Between Two Means

Goal: Test a hypothesis or form a confidence interval for the difference between two population means, μ1 – μ2

The point estimate for the difference is X̄1 – X̄2

Cases for population means, independent samples:

• σ1 and σ2 known
• σ1 and σ2 unknown, assumed equal
• σ1 and σ2 unknown, not assumed equal

Independent Samples

• Different data sources
  - Unrelated
  - Independent: the sample selected from one population has no effect on the sample selected from the other population
• Use the difference between the 2 sample means
• Use a Z test, a pooled-variance t test, or a separate-variance t test

Difference Between Two Means

Population means, independent samples:

• σ1 and σ2 known: use a Z test statistic
• σ1 and σ2 unknown, assumed equal: use Sp to estimate the unknown σ; use a t test statistic and the pooled standard deviation
• σ1 and σ2 unknown, not assumed equal: use S1 and S2 to estimate the unknown σ1 and σ2; use a separate-variance t test

σ1 and σ2 Known

Assumptions:

• Samples are randomly and independently drawn
• Population distributions are normal or both sample sizes are ≥ 30
• Population standard deviations are known

σ1 and σ2 Known
(continued)

When σ1 and σ2 are known and both populations are normal or both sample sizes are at least 30, the test statistic is a Z-value, and the standard error of X̄1 – X̄2 is

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

σ1 and σ2 Known
(continued)

The test statistic for μ1 – μ2 is:

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

Hypothesis Tests for
Two Population Means
Two Population Means, Independent Samples

Lower-tail test:        Upper-tail test:        Two-tail test:
H0: μ1 ≥ μ2             H0: μ1 ≤ μ2             H0: μ1 = μ2
H1: μ1 < μ2             H1: μ1 > μ2             H1: μ1 ≠ μ2
i.e.,                   i.e.,                   i.e.,
H0: μ1 – μ2 ≥ 0         H0: μ1 – μ2 ≤ 0         H0: μ1 – μ2 = 0
H1: μ1 – μ2 < 0         H1: μ1 – μ2 > 0         H1: μ1 – μ2 ≠ 0

Hypothesis tests for μ1 – μ2
Two Population Means, Independent Samples

Lower-tail test:        Upper-tail test:        Two-tail test:
H0: μ1 – μ2 ≥ 0         H0: μ1 – μ2 ≤ 0         H0: μ1 – μ2 = 0
H1: μ1 – μ2 < 0         H1: μ1 – μ2 > 0         H1: μ1 – μ2 ≠ 0
Area α in lower tail    Area α in upper tail    Area α/2 in each tail
Critical value: -zα     Critical value: zα      Critical values: -zα/2, zα/2
Reject H0 if Z < -zα    Reject H0 if Z > zα     Reject H0 if Z < -zα/2 or Z > zα/2

Confidence Interval,
σ1 and σ2 Known

The confidence interval for μ1 – μ2 is:

$$(\bar{X}_1 - \bar{X}_2) \pm Z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Apply Your Knowledge

• An experiment was performed to compare the fracture toughness of high-purity 18 Ni maraging steel with commercial-purity steel of the same type. For m = 32 specimens, the sample average toughness was 65.6 for the high-purity steel, while for n = 38 specimens of commercial steel the sample mean was 59.8. Because high-purity steel is more expensive, its use for a certain application can be justified only if its fracture toughness exceeds that of commercial-purity steel by more than 5. Assuming that σ1 = 1.2 and σ2 = 1.1, test the relevant hypotheses using α = 0.001.
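
One possible way to set up this exercise is sketched below. It assumes that the two stated standard deviations belong to the high-purity and commercial samples in that order (1.2 and 1.1), and it reads "exceeds by more than 5" as the upper-tail alternative H1: μ1 – μ2 > 5. The function name `two_sample_z` is illustrative only; the critical value comes from Python's standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, delta0=0.0):
    """Z statistic for H0: mu1 - mu2 = delta0 when sigma1 and sigma2 are known."""
    std_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((xbar1 - xbar2) - delta0) / std_error


# Values from the exercise; sample 1 = high purity, sample 2 = commercial
# (the assignment of 1.2 and 1.1 to the two samples is an assumption).
z = two_sample_z(65.6, 59.8, 1.2, 1.1, 32, 38, delta0=5.0)

# Upper-tail test: H0: mu1 - mu2 <= 5 vs. H1: mu1 - mu2 > 5 at alpha = 0.001.
alpha = 0.001
z_crit = NormalDist().inv_cdf(1 - alpha)
reject_h0 = z > z_crit

print(f"Z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

Under these assumptions, the decision follows the upper-tail rule: reject H0 only if Z exceeds zα.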
σ1 and σ2 Unknown,
Assumed Equal

Assumptions:

• Samples are randomly and independently drawn
• Populations are normally distributed or both sample sizes are at least 30
• Population variances are unknown but assumed equal

σ1 and σ2 Unknown,
Assumed Equal
(continued)

Forming interval estimates:

• The population variances are assumed equal, so use the two sample variances and pool them to estimate the common σ²
• The test statistic is a t value with (n1 + n2 – 2) degrees of freedom

σ1 and σ2 Unknown,
Assumed Equal
(continued)

The pooled variance is

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$$

σ1 and σ2 Unknown,
Assumed Equal
(continued)

The test statistic for μ1 – μ2 is:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where t has (n1 + n2 – 2) d.f., and

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$$

Confidence Interval,
σ1 and σ2 Unknown

The confidence interval for μ1 – μ2 (σ1 and σ2 unknown, assumed equal) is:

$$(\bar{X}_1 - \bar{X}_2) \pm t_{n_1 + n_2 - 2}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$$

Apply Your Knowledge

• In an experiment to study the effects of exposure to ozone, 20 rats were exposed to ozone in the amount of 2 parts per million for a period of 30 days. The average lung volume of these rats was determined to be 9.28 ml with a standard deviation of 0.37, while the average lung volume for a control group of 17 rats with similar characteristics was 7.97 ml with a standard deviation of 0.41. Do these data indicate that there is an increase in true average lung volume due to ozone?
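
The sketch below computes the pooled-variance t statistic for this exercise. It assumes the equal-variance case applies (the exercise does not say so explicitly, but it follows the pooled-variance slides) and tests H0: μ1 – μ2 ≤ 0 against H1: μ1 – μ2 > 0, with sample 1 as the ozone group. The helper name `pooled_t` is illustrative. Since the exercise states no α, the code stops at the statistic and its degrees of freedom, to be compared with a t-table value of your choice.

```python
from math import sqrt


def pooled_t(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """Pooled-variance t statistic and its degrees of freedom (n1 + n2 - 2)."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    std_error = sqrt(sp2 * (1 / n1 + 1 / n2))
    return ((xbar1 - xbar2) - delta0) / std_error, df


# Sample 1 = ozone group, sample 2 = control group (values from the exercise).
t_stat, df = pooled_t(9.28, 0.37, 20, 7.97, 0.41, 17)

print(f"t = {t_stat:.2f} with {df} d.f.")
```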
σ1 and σ2 Unknown,
Not Assumed Equal

Assumptions:

• Samples are randomly and independently drawn
• Populations are normally distributed or both sample sizes are at least 30
• Population variances are unknown but cannot be assumed to be equal

σ1 and σ2 Unknown,
Not Assumed Equal
(continued)

Forming the test statistic:

• The population variances are not assumed equal, so include the two sample variances in the computation of the t-test statistic
• The test statistic is a t value with ν degrees of freedom (see next slide)

σ1 and σ2 Unknown,
Not Assumed Equal
(continued)

The number of degrees of freedom is the integer portion of:

$$\nu = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{\left(S_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(S_2^2/n_2\right)^2}{n_2 - 1}}$$

σ1 and σ2 Unknown,
Not Assumed Equal
(continued)

The test statistic for μ1 – μ2 is:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$
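
As a rough illustration of the separate-variance test, the sketch below computes the t statistic together with the integer portion of the degrees-of-freedom formula. The function name `separate_variance_t` is made up for this example, and the input numbers are simply the summary statistics from the ozone exercise earlier in the deck, reused here only to have concrete values.

```python
from math import floor, sqrt


def separate_variance_t(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """Separate-variance t statistic and integer degrees of freedom."""
    v1 = s1**2 / n1  # S1^2 / n1
    v2 = s2**2 / n2  # S2^2 / n2
    t_stat = ((xbar1 - xbar2) - delta0) / sqrt(v1 + v2)
    # Degrees of freedom: integer portion of the ratio given on the slide.
    nu = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_stat, floor(nu)


# Illustrative inputs: ozone group (sample 1) vs. control group (sample 2).
t_stat, df = separate_variance_t(9.28, 0.37, 20, 7.97, 0.41, 17)

print(f"t = {t_stat:.2f} with {df} d.f.")
```

With sample variances this close, the result differs only slightly from the pooled-variance calculation; the two approaches diverge more when the variances or sample sizes are very different.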
Related Populations
Tests Means of 2 Related Populations

Related samples:

• Paired or matched samples
• Repeated measures (before/after)
• Use the difference between paired values: Di = X1i - X2i
• Eliminates variation among subjects
• Assumptions:
  - Both populations are normally distributed
  - Or, if not normal, use large samples
Mean Difference, σD Known

The ith paired difference is Di, where Di = X1i - X2i

The point estimate for the population mean paired difference is D̄:

$$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n}$$

Suppose the population standard deviation of the difference scores, σD, is known.

n is the number of pairs in the paired sample
Mean Difference, σD Known
(continued)

The test statistic for the mean difference (paired samples) is a Z value:

$$Z = \frac{\bar{D} - \mu_D}{\sigma_D / \sqrt{n}}$$

where
μD = hypothesized mean difference
σD = population standard dev. of differences
n = the sample size (number of pairs)
Confidence Interval, σD Known

The confidence interval for μD (paired samples) is

$$\bar{D} \pm Z\frac{\sigma_D}{\sqrt{n}}$$

where n = the sample size (number of pairs in the paired sample)
Mean Difference, σD Unknown

If σD is unknown, we can estimate the unknown population standard deviation with a sample standard deviation.

The sample standard deviation is

$$S_D = \sqrt{\frac{\sum_{i=1}^{n}(D_i - \bar{D})^2}{n - 1}}$$
Mean Difference, σD Unknown
(continued)

• Use a paired t test; the test statistic for D̄ is now a t statistic, with n - 1 d.f.:

$$t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}}$$

where t has n - 1 d.f. and SD is:

$$S_D = \sqrt{\frac{\sum_{i=1}^{n}(D_i - \bar{D})^2}{n - 1}}$$
Confidence Interval, σD Unknown

The confidence interval for μD (paired samples) is

$$\bar{D} \pm t_{n-1}\frac{S_D}{\sqrt{n}}$$

where

$$S_D = \sqrt{\frac{\sum_{i=1}^{n}(D_i - \bar{D})^2}{n - 1}}$$
Hypothesis Testing for
Mean Difference, σD Unknown
Paired Samples

Lower-tail test:        Upper-tail test:        Two-tail test:
H0: μD ≥ 0              H0: μD ≤ 0              H0: μD = 0
H1: μD < 0              H1: μD > 0              H1: μD ≠ 0
Area α in lower tail    Area α in upper tail    Area α/2 in each tail
Critical value: -tα     Critical value: tα      Critical values: -tα/2, tα/2
Reject H0 if t < -tα    Reject H0 if t > tα     Reject H0 if t < -tα/2 or t > tα/2

where t has n - 1 d.f.
Paired t Test Example

• Assume you send your salespeople to a "customer service" training workshop. Has the training made a difference in the number of complaints? You collect the following data:

Number of Complaints: (2) - (1)

Salesperson   Before (1)   After (2)   Difference, Di
C.B.               6            4            -2
T.F.              20            6           -14
M.H.               3            2            -1
R.K.               0            0             0
M.O.               4            0            -4
Sum                                         -21

$$\bar{D} = \frac{\sum D_i}{n} = \frac{-21}{5} = -4.2$$

$$S_D = \sqrt{\frac{\sum (D_i - \bar{D})^2}{n - 1}} = 5.67$$
Paired t Test: Solution

• Has the training made a difference in the number of complaints (at the 0.01 level)?

H0: μD = 0
H1: μD ≠ 0

α = .01, D̄ = -4.2
d.f. = n - 1 = 4
Critical Value = ± 4.604 (area α/2 in each tail; reject H0 if t < -4.604 or t > 4.604)

Test Statistic:

$$t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}} = \frac{-4.2 - 0}{5.67 / \sqrt{5}} = -1.66$$

Decision: Do not reject H0 (the t statistic, -1.66, is not in the rejection region)

Conclusion: There is not a significant change in the number of complaints.
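
The arithmetic in this example can be reproduced with a few lines of standard-library Python. This is only a sketch: the variable names are illustrative, and the critical value 4.604 is copied from the slide rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Complaints per salesperson (C.B., T.F., M.H., R.K., M.O.) from the example.
before = [6, 20, 3, 0, 4]
after = [4, 6, 2, 0, 0]

# Paired differences, taken as After (2) minus Before (1).
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)

d_bar = mean(diffs)   # point estimate of the mean difference
s_d = stdev(diffs)    # sample standard deviation (n - 1 in the denominator)
t_stat = (d_bar - 0) / (s_d / sqrt(n))

# Two-tail critical value for alpha = .01 and 4 d.f., as given on the slide.
t_crit = 4.604
reject_h0 = abs(t_stat) > t_crit

print(f"D-bar = {d_bar:.1f}, S_D = {s_d:.2f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```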
Apply Your Knowledge

• A car company reported results of an experiment to compare handling ability for two cars having quite different lengths, wheelbases, and turning radii. The observations are the times in seconds required to parallel park each car.

Subject    1      2      3      4      5      6      7
Car A      37     25.8   16.2   24.2   22     33.4   23.8
Car B      17.8   20.2   16.8   41.4   21.4   38.4   16.8

Subject    8      9      10     11     12     13     14
Car A      58.2   33.6   24.4   23.4   21.2   36.2   29.8
Car B      32.2   27.8   23.2   29.6   20.6   32.2   53.8

• Do the data suggest that the average person will more easily handle one car than the other? Test the relevant hypotheses using α = 0.10.
Apply Your Knowledge

• A sample of nine local banks shows their deposits (in billions of dollars) 3 years ago and their deposits (in billions of dollars) today. At α = 0.05, can it be concluded that the average deposits for the banks are greater today than they were 3 years ago?

Bank           1       2      3      4      5      6      7      8     9
3 years ago    11.42   8.41   3.98   7.37   2.28   1.10   1.00   0.9   1.35
Today          16.69   9.44   6.53   5.58   2.92   1.88   1.78   1.5   1.22
Two Population Proportions

Goal: test a hypothesis or form a confidence interval for the difference between two population proportions, π1 – π2

Assumptions:
n1 π1 ≥ 5 , n1(1 - π1) ≥ 5
n2 π2 ≥ 5 , n2(1 - π2) ≥ 5

The point estimate for the difference is p1 – p2
Two Population Proportions

Since we begin by assuming the null hypothesis is true, we assume π1 = π2 and pool the two sample estimates.

The pooled estimate for the overall proportion is:

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2}$$

where X1 and X2 are the numbers from samples 1 and 2 with the characteristic of interest
Two Population Proportions
(continued)

The test statistic for p1 – p2 is a Z statistic:

$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2}, \quad p_1 = \frac{X_1}{n_1}, \quad p_2 = \frac{X_2}{n_2}$$
Confidence Interval for
Two Population Proportions

The confidence interval for π1 – π2 is:

$$(p_1 - p_2) \pm Z\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
Hypothesis Tests for
Two Population Proportions
Population proportions

Lower-tail test:        Upper-tail test:        Two-tail test:
H0: π1 ≥ π2             H0: π1 ≤ π2             H0: π1 = π2
H1: π1 < π2             H1: π1 > π2             H1: π1 ≠ π2
i.e.,                   i.e.,                   i.e.,
H0: π1 – π2 ≥ 0         H0: π1 – π2 ≤ 0         H0: π1 – π2 = 0
H1: π1 – π2 < 0         H1: π1 – π2 > 0         H1: π1 – π2 ≠ 0
Hypothesis Tests for
Two Population Proportions
(continued)
Population proportions

Lower-tail test:        Upper-tail test:        Two-tail test:
H0: π1 – π2 ≥ 0         H0: π1 – π2 ≤ 0         H0: π1 – π2 = 0
H1: π1 – π2 < 0         H1: π1 – π2 > 0         H1: π1 – π2 ≠ 0
Area α in lower tail    Area α in upper tail    Area α/2 in each tail
Critical value: -zα     Critical value: zα      Critical values: -zα/2, zα/2
Reject H0 if Z < -zα    Reject H0 if Z > zα     Reject H0 if Z < -zα/2 or Z > zα/2
Example:
Two Population Proportions

Is there a significant difference between the proportion of men and the proportion of women who will vote Yes on Proposition A?

• In a random sample, 36 of 72 men and 31 of 50 women indicated they would vote Yes
• Test at the .05 level of significance

Example:
Two Population Proportions
(continued)

• The hypothesis test is:

H0: π1 – π2 = 0 (the two proportions are equal)
H1: π1 – π2 ≠ 0 (there is a significant difference between proportions)

• The sample proportions are:
  - Men: p1 = 36/72 = .50
  - Women: p2 = 31/50 = .62

• The pooled estimate for the overall proportion is:

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{36 + 31}{72 + 50} = \frac{67}{122} = .549$$
Example:
Two Population Proportions
(continued)

The test statistic for π1 – π2 is:

$$z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.50 - .62) - (0)}{\sqrt{.549(1 - .549)\left(\frac{1}{72} + \frac{1}{50}\right)}} = -1.31$$

Critical Values = ±1.96 for α = .05 (area .025 in each tail; reject H0 if z < -1.96 or z > 1.96)

Decision: Do not reject H0 (z = -1.31 is between -1.96 and 1.96)

Conclusion: There is not significant evidence of a difference in proportions who will vote yes between men and women.
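
A minimal sketch of the same calculation in Python follows. The function name `two_proportion_z` is illustrative, and the critical value is obtained from the standard library's `statistics.NormalDist` instead of a table.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: pi1 - pi2 = 0 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    std_error = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error


# Men: 36 of 72 vote Yes; women: 31 of 50 vote Yes (from the example).
z = two_proportion_z(36, 72, 31, 50)

# Two-tail test at alpha = .05.
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) > z_crit

print(f"z = {z:.2f}, critical values = ±{z_crit:.2f}, reject H0: {reject_h0}")
```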
Hypothesis Tests for Variances

Tests for two population variances use the F test statistic.

Two-tail test:      H0: σ1² = σ2²    H1: σ1² ≠ σ2²
Lower-tail test:    H0: σ1² ≥ σ2²    H1: σ1² < σ2²
Upper-tail test:    H0: σ1² ≤ σ2²    H1: σ1² > σ2²
Hypothesis Tests for Variances
(continued)

The F test statistic is:

$$F = \frac{S_1^2}{S_2^2}$$

S1² = Variance of Sample 1
n1 - 1 = numerator degrees of freedom
S2² = Variance of Sample 2
n2 - 1 = denominator degrees of freedom
The F Distribution

• The F critical value is found from the F table
• There are two appropriate degrees of freedom: numerator and denominator

$$F = \frac{S_1^2}{S_2^2}$$

where df1 = n1 – 1 ; df2 = n2 – 1

• In the F table,
  - numerator degrees of freedom determine the column
  - denominator degrees of freedom determine the row
Finding the Rejection Region

Lower-tail test: H0: σ1² ≥ σ2², H1: σ1² < σ2²
• Area α in the lower tail, below FL
• Reject H0 if F < FL

Upper-tail test: H0: σ1² ≤ σ2², H1: σ1² > σ2²
• Area α in the upper tail, above FU
• Reject H0 if F > FU

Two-tail test: H0: σ1² = σ2², H1: σ1² ≠ σ2²
• Area α/2 in each tail, below FL and above FU
• The rejection region for a two-tail test is:

$$F = \frac{S_1^2}{S_2^2} > F_U \quad \text{or} \quad F = \frac{S_1^2}{S_2^2} < F_L$$
Finding the Rejection Region
(continued)

Two-tail test: H0: σ1² = σ2², H1: σ1² ≠ σ2², with area α/2 in each tail (reject H0 below FL or above FU).

To find the critical F values:

1. Find FU from the F table for n1 – 1 numerator and n2 – 1 denominator degrees of freedom
2. Find FL using the formula:

$$F_L = \frac{1}{F_{U*}}$$

where FU* is from the F table with n2 – 1 numerator and n1 – 1 denominator degrees of freedom (i.e., switch the d.f. from FU)
F Test: An Example

You are a financial analyst for a brokerage firm. You want to compare dividend yields between stocks listed on the NYSE & NASDAQ. You collect the following data:

           NYSE    NASDAQ
Number     21      25
Mean       3.27    2.53
Std dev    1.30    1.16

Is there a difference in the variances between the NYSE & NASDAQ at the α = 0.05 level?
F Test: Example Solution

• Form the hypothesis test:
H0: σ1² – σ2² = 0 (there is no difference between variances)
H1: σ1² – σ2² ≠ 0 (there is a difference between variances)

• Find the F critical values for α = 0.05:

FU:
  - Numerator: n1 – 1 = 21 – 1 = 20 d.f.
  - Denominator: n2 – 1 = 25 – 1 = 24 d.f.
  FU = F.025, 20, 24 = 2.33

FL:
  - Numerator: n2 – 1 = 25 – 1 = 24 d.f.
  - Denominator: n1 – 1 = 21 – 1 = 20 d.f.
  FL = 1/F.025, 24, 20 = 1/2.41 = 0.415
F Test: Example Solution
(continued)

H0: σ1² = σ2²
H1: σ1² ≠ σ2²

• The test statistic is:

$$F = \frac{S_1^2}{S_2^2} = \frac{1.30^2}{1.16^2} = 1.256$$

• Rejection region: α/2 = .025 in each tail; reject H0 if F < FL = 0.415 or F > FU = 2.33
• F = 1.256 is not in the rejection region, so we do not reject H0
• Conclusion: There is not sufficient evidence of a difference in variances at α = .05
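
The decision rule for this example can be sketched as below. The two table values (2.33 and 2.41) are copied from the solution slides rather than computed, because the Python standard library has no F distribution; the variable names are illustrative.

```python
# Summary statistics from the example: sample 1 = NYSE, sample 2 = NASDAQ.
s1, n1 = 1.30, 21
s2, n2 = 1.16, 25

# F test statistic: ratio of the two sample variances.
f_stat = s1**2 / s2**2
df_num, df_den = n1 - 1, n2 - 1

# Critical values for alpha = 0.05 (two-tail), taken from the F table
# values quoted on the slide.
f_upper = 2.33        # F.025 with 20 numerator and 24 denominator d.f.
f_lower = 1 / 2.41    # 1 / F.025 with 24 numerator and 20 denominator d.f.

reject_h0 = f_stat < f_lower or f_stat > f_upper

print(f"F = {f_stat:.3f} with ({df_num}, {df_den}) d.f.")
print(f"FL = {f_lower:.3f}, FU = {f_upper:.2f}, reject H0: {reject_h0}")
```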
Apply Your Knowledge

• "Conservationists have despaired over destruction of tropical rain forest by logging, clearing, and burning." These words begin a report on a statistical study of the effects of logging in one of the forests. Here is a summary of data on the number of tree species in logged and unlogged plots:

Plot        Mean     Variance   Number of plots
Unlogged    17.5     12.45      12
Logged      13.67    20.25      9

• Do these data give evidence that logging affects the variation in species counts among plots? Use α = 0.05.
Apply Your Knowledge

• Tests of product quality using inspectors can lead to serious error problems. To evaluate the performance of inspectors in a new company, a quality manager had a sample of 12 novice inspectors evaluate 200 finished products. The same 200 items were evaluated by 12 experienced inspectors. The quality of each item, whether defective or non-defective, was known to the manager. The following table lists the number of inspection errors (classifying a defective item as non-defective or vice versa) made by each inspector.

Novice Inspectors      Experienced Inspectors
30  35  26  40         31  15  25  19
36  20  45  31         28  17  19  18
33  29  21  48         24  10  20  21

• Prior to conducting this experiment, the manager believed that the variance in inspection errors was lower for experienced inspectors than for novice inspectors. Do the sample data support her belief? Test using α = 0.05.