
GOODNESS OF FIT, INDEPENDENCE, AND HOMOGENEITY TESTS
This Week's Objectives
After careful study of this chapter, you should be able to do the following:
1. Use the chi-square goodness-of-fit test to check distributional assumptions.
2. Use contingency table tests to check independence.
3. Use contingency table tests to check homogeneity.
Testing for Goodness of Fit
Throughout the hypothesis testing chapter, we have been concerned with testing statistical hypotheses about the parameters of one or two populations, such as $\mu$, $\sigma^2$, and $p$.

Now we shall consider a test to determine whether a population has a specified theoretical distribution.

In this test, we compare the frequency of occurrence of observations in an observed sample with the expected frequency under the hypothesized distribution, and ask how good the fit is.
• The test is based on the chi-square distribution.
• Assume there is a sample of size n from a population whose
probability distribution is unknown.
• Let $O_i$ be the observed frequency in the ith class interval.
• Let $E_i$ be the expected frequency in the ith class interval.
The test statistic is

$$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where k is the number of class intervals.
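The statistic above is a short sum, so it can be sketched in a few lines of Python. The function name `chi_square_statistic` and the three-class counts are illustrative assumptions and do not come from the text.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all class intervals."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Illustrative counts only (not from the text): three classes, n = 30.
observed = [8, 12, 10]
expected = [10.0, 10.0, 10.0]

chi2_0 = chi_square_statistic(observed, expected)
print(chi2_0)
```

A larger value means the observed frequencies sit farther from the expected ones, which is evidence against the hypothesized distribution.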
Hypothesis Testing for Goodness of Fit
$H_0$: The distribution of <something> follows the <X> distribution.
$H_1$: The distribution of <something> does not follow the <X> distribution.

We would reject $H_0$ if the calculated value of the test statistic satisfies

$$\chi_0^2 > \chi^2_{\alpha,\,k-p-1}$$

The number of degrees of freedom is $k - p - 1$, where $p$ is the number of parameters of the hypothesized distribution estimated by sample statistics.

Important to remember:
There is no general agreement regarding the minimum value of the expected frequencies $E_i$, but some writers suggest that each $E_i$ should exceed 5.
EXAMPLE 9-12 Printed Circuit Board Defects-Discrete Distribution

Poisson Distribution
The number of defects in printed circuit boards is hypothesized to follow a Poisson distribution. A random sample of n = 60 printed boards has been collected, and the following numbers of defects were observed.

Number of defects    Observed frequency
0                    32
1                    15
2                    9
3                    4
The mean of the assumed Poisson distribution in this example is unknown and
must be estimated from the sample data. The estimate of the mean number of
defects per board is the sample average, that is, (32·0 + 15·1 + 9·2 + 4·3)/60 = 0.75.
From the Poisson distribution with parameter 0.75, we may compute $p_i$, the theoretical, hypothesized probability associated with the ith class interval. Since each class interval corresponds to a particular number of defects, we may find the $p_i$ as follows:

$$p_1 = P(X = 0) = \frac{e^{-0.75}(0.75)^0}{0!} = 0.472$$

$$p_2 = P(X = 1) = \frac{e^{-0.75}(0.75)^1}{1!} = 0.354$$

$$p_3 = P(X = 2) = \frac{e^{-0.75}(0.75)^2}{2!} = 0.133$$

$$p_4 = P(X \ge 3) = 1 - (p_1 + p_2 + p_3) = 0.041$$
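As a check on the hand calculation, the four class probabilities can be reproduced with the Python standard library. This is a minimal sketch; the helper name `poisson_pmf` is an assumption made for illustration.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam = 0.75  # sample mean estimated from the 60 boards

# P(X = 0), P(X = 1), P(X = 2), then P(X >= 3) as the complement.
p = [poisson_pmf(x, lam) for x in range(3)]
p.append(1.0 - sum(p))

for i, p_i in enumerate(p, start=1):
    print(f"p{i} = {p_i:.3f}")
```

Treating the last class as "3 or more" makes the four probabilities sum to 1, which the expected frequencies need in order to add up to n.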
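The expected frequencies are computed by multiplying the sample size n = 60 by the probabilities $p_i$; that is, $E_i = np_i$.
The expected frequencies follow:

Number of defects    Probability    Expected frequency
0                    0.472          28.32
1                    0.354          21.24
2                    0.133          7.98
3 (or more)          0.041          2.46

Since the expected frequency in the last cell is less than 5, we combine the last two cells:

Number of defects    Observed frequency    Expected frequency
0                    32                    28.32
1                    15                    21.24
2 (or more)          13                    10.44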
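The pooling step can be sketched as follows. The function `pool_trailing_cells` is a hypothetical helper that only merges cells from the right-hand end, which is all this example needs; a general routine would also have to handle small cells elsewhere. The observed counts are the ones that appear in the sample-mean calculation, and the probabilities are the rounded values from the text.

```python
def pool_trailing_cells(observed, expected, minimum=5.0):
    """Merge the last cell into its neighbor while its expected frequency is below `minimum`."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < minimum:
        last_o, last_e = obs.pop(), exp.pop()
        obs[-1] += last_o
        exp[-1] += last_e
    return obs, exp


n = 60
observed = [32, 15, 9, 4]           # boards with 0, 1, 2, 3 defects
p = [0.472, 0.354, 0.133, 0.041]    # rounded Poisson probabilities from the text
expected = [n * p_i for p_i in p]   # E_i = n * p_i

pooled_obs, pooled_exp = pool_trailing_cells(observed, expected)
print(pooled_obs)
print([round(e, 2) for e in pooled_exp])
```

After pooling, three cells remain, so k = 3 in the degrees-of-freedom calculation.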

The chi-square test statistic in Equation 9-16 will have $k - p - 1 = 3 - 1 - 1 = 1$ degree of freedom, because the mean of the Poisson distribution was estimated from the data.
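To check a hand calculation for this example, the sketch below evaluates the statistic on the three pooled cells (observed 32, 15, 13; expected 28.32, 21.24, 10.44, obtained by multiplying n = 60 by the rounded probabilities and merging the last two cells). It assumes α = 0.05, which the example does not state, and it relies on an identity that holds only for 1 degree of freedom.

```python
import math

observed = [32, 15, 13]
expected = [28.32, 21.24, 10.44]

k = len(observed)     # number of cells after pooling
p_estimated = 1       # one parameter (the Poisson mean) was estimated
dof = k - p_estimated - 1

chi2_0 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Valid for 1 degree of freedom only: P(chi2_1 > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2_0 / 2.0))

critical = 3.841      # tabulated chi-square value for alpha = 0.05 (assumed), 1 dof

print(f"dof = {dof}, chi2_0 = {chi2_0:.2f}, p-value = {p_value:.4f}")
print("reject H0" if chi2_0 > critical else "fail to reject H0")
```

For other degrees of freedom, the critical value would come from a chi-square table or a statistics library rather than from the `erfc` shortcut.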
It is your turn to conduct the hypothesis
testing and make a conclusion!
EXAMPLE Battery Life-Continuous Distribution

Normal Distribution
The battery life of a specified brand is hypothesized to follow a
Normal distribution with mean 3.5 and standard deviation 0.7. Verify
this hypothesis!
Applying normal approximation factor
for each class interval:
Calculating Expected Frequencies
For example, the z values corresponding to the boundaries of the fourth class, where 2.95 < x < 3.45, are

$$z_1 = \frac{2.95 - 3.5}{0.7} = -0.79, \qquad z_2 = \frac{3.45 - 3.5}{0.7} = -0.07$$

Thus the area between $z_1$ and $z_2$ is

$$P(-0.79 < Z < -0.07) = \Phi(-0.07) - \Phi(-0.79) = 0.2573$$

Hence, the expected frequency for the fourth class is

$$E_4 = np_4 = 40(0.2573) = 10.3$$
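The same calculation can be sketched in Python using the error function in place of a normal table. The helper names are assumptions. The z values are rounded to two decimals to mirror a table lookup; without that rounding the result differs slightly in the second decimal.

```python
import math


def std_normal_cdf(z):
    """Phi(z), the standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_class_frequency(lower, upper, mean, sd, n):
    """n * P(lower < X < upper) for X ~ Normal(mean, sd), with table-style z rounding."""
    z1 = round((lower - mean) / sd, 2)
    z2 = round((upper - mean) / sd, 2)
    return n * (std_normal_cdf(z2) - std_normal_cdf(z1))


e4 = expected_class_frequency(2.95, 3.45, mean=3.5, sd=0.7, n=40)
print(round(e4, 1))
```

Applying the same function to every class interval gives the full set of expected frequencies for the test.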
After calculating all expected frequencies
for all classes we get:
It is your turn to conduct the hypothesis
testing and make a conclusion!
Contingency Table Tests
Many times, the n elements of a sample from a population may be classified according to two or more different criteria. It is then of interest to know whether the methods of classification are statistically independent or, when separate samples are drawn from several populations, whether those populations are homogeneous.
We are interested in testing the hypothesis that the row-and-column methods of classification are independent (or that the populations are homogeneous). If we reject this hypothesis, we conclude there is some interaction between the criteria of classification (or that the populations differ). The exact test procedures are difficult to obtain, but an approximate test statistic is valid for large n.
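For a table with r rows and c columns, let $O_{ij}$ be the observed frequency in row i and column j. The row and column proportions are estimated by

$$u_i = \frac{1}{n}\sum_{j=1}^{c} O_{ij}$$

$$v_j = \frac{1}{n}\sum_{i=1}^{r} O_{ij}$$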
Therefore, the expected frequency of each cell is

$$E_{ij} = n u_i v_j = \frac{1}{n}\sum_{j=1}^{c} O_{ij} \sum_{i=1}^{r} O_{ij}$$
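In words, each expected frequency is the row total times the column total divided by n. A minimal sketch follows; the function name `expected_frequencies` is an assumption, and the counts are taken from the health insurance example later in this section.

```python
def expected_frequencies(table):
    """E_ij = (row total i) * (column total j) / n, which equals n * u_i * v_j."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: salaried workers, hourly workers. Columns: plans 1, 2, 3.
observed = [
    [160, 140, 40],
    [40, 60, 60],
]

for row in expected_frequencies(observed):
    print(row)
```

Working from the totals avoids computing the proportions explicitly and gives the same values.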

Then, for large n, the statistic

$$\chi_0^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

has an approximate chi-square distribution with $(r-1)(c-1)$ degrees of freedom if the null hypothesis is true. We should reject the null hypothesis if

$$\chi_0^2 > \chi^2_{\alpha,\,(r-1)(c-1)}$$
Hypothesis Testing for Independence
$H_0$: <X>, <Y>, <others> are independent.
$H_1$: <X>, <Y>, <others> are not independent.

We would reject $H_0$ if the calculated value of the test statistic satisfies

$$\chi_0^2 > \chi^2_{\alpha,\,(r-1)(c-1)}$$
Hypothesis Testing for Homogeneity
$H_0$: <X>, <Y>, <others> are the same.
$H_1$: <X>, <Y>, <others> are not the same.

We would reject $H_0$ if the calculated value of the test statistic satisfies

$$\chi_0^2 > \chi^2_{\alpha,\,(r-1)(c-1)}$$
Example: Testing for Independence
A company has to choose among three health insurance plans. Management
wishes to know whether the preference for plans is independent of job
classification and wants to use α= 0.05.

The opinions of a random sample of 500 employees are shown in Table 9-3.

                      Health Insurance Plan
Job Classification    1      2      3      Totals
Salaried workers      160    140    40     340
Hourly workers        40     60     60     160
Totals                200    200    100    500
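To find the expected frequencies, we must first compute the row proportions $u_i$ (job classification) and the column proportions $v_j$ (health insurance plan):

$u_1 = 340/500 = 0.68$
$u_2 = 160/500 = 0.32$
$v_1 = 200/500 = 0.4$
$v_2 = 200/500 = 0.4$
$v_3 = 100/500 = 0.2$

The row proportions sum to 1, as do the column proportions.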
The expected frequencies may now be computed from Equation 9-18.
For example, the expected number of salaried workers favoring health
insurance plan 1 is

$$E_{11} = n u_1 v_1 = 500(0.68)(0.4) = 136$$

The expected frequencies are shown in parentheses in the table below.

                      Health Insurance Plan
Job Classification    1            2            3           Totals
Salaried workers      160 (136)    140 (136)    40 (68)     340
Hourly workers        40 (64)      60 (64)      60 (32)     160
Totals                200          200          100         500
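To check a hand calculation, the sketch below evaluates the statistic for this table with α = 0.05. It uses a shortcut that holds only when there are exactly 2 degrees of freedom, as here: the chi-square upper-tail probability is exp(-x/2), so the critical value is -2 ln α. For other table sizes, a chi-square table or a statistics library would be needed.

```python
import math

observed = [[160, 140, 40], [40, 60, 60]]
expected = [[136, 136, 68], [64, 64, 32]]
alpha = 0.05

r, c = len(observed), len(observed[0])
dof = (r - 1) * (c - 1)

chi2_0 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Valid for exactly 2 degrees of freedom: P(chi2_2 > x) = exp(-x / 2).
critical = -2.0 * math.log(alpha)
p_value = math.exp(-chi2_0 / 2.0)

print(f"dof = {dof}, chi2_0 = {chi2_0:.2f}, critical value = {critical:.3f}")
print("reject H0" if chi2_0 > critical else "fail to reject H0")
```

The two cells for plan 3 contribute most of the statistic, which shows where the observed preferences depart most from independence.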
It is your turn to conduct the hypothesis
testing and make a conclusion!
ATTEMPT FOR ADDITIONAL POINTS

Additional points = your score / 50

The maximum score for the next question is 100.
Test the hypothesis that opinions concerning the proposed
abortion law are the same within each political affiliation. Use a
0.05 level of significance.