Stat 130 - Chi-Square Goodness-of-Fit Test

Chi-Square Goodness-of-Fit Test
LOZANO, ALDRIN T.
Introduction
The chi-square distribution can be used for tests concerning frequency distributions, such as:
If a sample of buyers is given a choice of automobile colors, will each color be selected with the same frequency?
Assumptions
- The data are obtained from a random sample
- The expected frequency for each category must be 5 or more
Test for Goodness-of-Fit

The chi-square statistic can be used to see whether a frequency distribution fits a specific pattern.
This is referred to as the chi-squared goodness-of-fit test.
Observed Frequencies vs Expected Frequencies

Suppose a market analyst wished to see whether consumers have any preference among five flavors of a new fruit soda. A sample of 100 people provided these data:
Cherry   Strawberry   Orange   Lime   Grape
32       28           16       14     10

Since the frequencies for each flavor were obtained from a sample, these actual frequencies are called the observed frequencies. The frequencies obtained by calculation (as if there were no preference) are called the expected frequencies.

Frequency   Cherry   Strawberry   Orange   Lime   Grape
Observed    32       28           16       14     10
Expected    20       20           20       20     20
Goodness-of-Fit Test
The formula for the chi-square goodness-of-fit test is:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where:
O = observed or obtained frequency
E = expected or theoretical frequency
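The formula can be checked numerically on the fruit soda data. The following is a minimal sketch in R; the helper name `chi_square_stat` is hypothetical (it is not a base R function), and the expected counts assume no preference among the five flavors.

```r
# Chi-square goodness-of-fit statistic: sum over categories of (O - E)^2 / E.
# `chi_square_stat` is a hypothetical helper name, not part of base R.
chi_square_stat <- function(observed, expected) {
  stopifnot(length(observed) == length(expected), all(expected > 0))
  sum((observed - expected)^2 / expected)
}

# Observed frequencies from the fruit soda sample of 100 people
observed <- c(Cherry = 32, Strawberry = 28, Orange = 16, Lime = 14, Grape = 10)

# Under "no preference", each of the 5 flavors expects 100 / 5 = 20 responses
expected <- rep(sum(observed) / length(observed), length(observed))

chi_square_stat(observed, expected)  # 18
```

Larger gaps between observed and expected counts contribute more to the sum, which is why the test is right-tailed.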
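The degrees of freedom (df) is:
$$df = (C - 1)(R - 1)$$
Where:
C = number of columns
R = number of rows
For the fruit soda table of observed and expected frequencies, C = 5 and R = 2, so df = (5 - 1)(2 - 1) = 4, which is the number of categories minus 1.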
Example
Is there enough evidence to reject the claim that there is no preference in the selection of fruit soda flavors, using the data shown previously? Let $\alpha = 0.05$.

Frequency   Cherry   Strawberry   Orange   Lime   Grape
Observed    32       28           16       14     10
Expected    20       20           20       20     20
Solution
Step 1: State the hypotheses and define the claim
H0: Consumers show no preference for flavors (claim)
Ha: Consumers show a preference
Step 2: Find the critical value

With df = 4 and $\alpha = 0.05$, the critical value from the chi-square distribution table is 9.488.
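The table lookup can also be reproduced in R with the base function `qchisq`. This is a small sketch; the variable names are illustrative only.

```r
# Right-tailed critical value of the chi-square distribution
alpha <- 0.05
df <- 4

# qchisq returns the quantile with the given area to its left,
# so the right-tail critical value uses 1 - alpha
critical_value <- qchisq(1 - alpha, df = df)
round(critical_value, 3)  # 9.488
```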
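Step 3: Compute $\chi^2$
$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(32 - 20)^2}{20} + \frac{(28 - 20)^2}{20} + \frac{(16 - 20)^2}{20} + \frac{(14 - 20)^2}{20} + \frac{(10 - 20)^2}{20} = 7.2 + 3.2 + 0.8 + 1.8 + 5.0 = 18.0$$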
Step 4: Make the decision
The decision is to reject the null hypothesis, since 18.0 > 9.488.
Step 5: Summarize the results
There is enough evidence to reject the claim that consumers show no preference for the flavors.
A good fit
When the observed values and expected values are close together, the chi-square test value will be small.
Then the decision will be not to reject the null hypothesis; hence, there is a good fit.
Not a good fit

When the observed values and the expected values are far apart, the chi-square test value will be large. Then the null hypothesis will be rejected; hence, there is not a good fit.
Chi-Square Goodness-of-Fit Procedure Summary

Step 1: State the hypotheses and define the claim.
Step 2: Find the critical value. (The test is always right-tailed.)
Step 3: Compute the test value.
Step 4: Make the decision.
Step 5: Summarize the results.
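The computational steps of the procedure can be sketched as one small R function. This is an illustration under stated assumptions, not a replacement for `chisq.test`: the name `gof_decision` and its arguments are hypothetical, the default null hypothesis is equal proportions, and the degrees of freedom are taken as the number of categories minus 1.

```r
# Hypothetical helper that mirrors the procedure; not a base R function.
# observed: vector of observed counts
# p:        hypothesized proportions under H0 (default: all categories equal)
# alpha:    significance level
gof_decision <- function(observed,
                         p = rep(1 / length(observed), length(observed)),
                         alpha = 0.05) {
  expected   <- sum(observed) * p                        # expected counts under H0
  df         <- length(observed) - 1                     # categories minus 1 (assumption)
  critical   <- qchisq(1 - alpha, df)                    # Step 2: right-tailed critical value
  test_value <- sum((observed - expected)^2 / expected)  # Step 3: test value
  list(test_value = test_value,
       df         = df,
       critical   = critical,
       reject_h0  = test_value > critical)               # Step 4: decision
}

# Fruit soda example: test value 18 exceeds the critical value 9.488
gof_decision(c(32, 28, 16, 14, 10))
```

Stating the hypotheses and summarizing the results remain the analyst's job; the function only covers the arithmetic.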
An example in R
Professor Bumblefuss takes a random sample of students enrolled in Statistics 101 at ABC University. He finds the following: there are 25 freshmen in the sample, 32 sophomores, 18 juniors, and 20 seniors. Test the null hypothesis that freshmen, sophomores, juniors, and seniors are equally represented among students signed up for Stat 101.

Freshmen   Sophomores   Juniors   Seniors
25         32           18        20
R Implementation
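chisq.test(x, y = NULL, correct = TRUE,
           p = rep(1/length(x), length(x)), rescale.p = FALSE,
           simulate.p.value = FALSE, B = 2000)

> chisq.test(c(25,32,18,20))

        Chi-squared test for given probabilities

data:  c(25, 32, 18, 20)
X-squared = 4.9158, df = 3, p-value = 0.1781

Since the p-value of 0.1781 is greater than 0.05, the null hypothesis of equal representation is not rejected at the 0.05 significance level.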
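The object returned by `chisq.test` can be inspected to see the expected counts and the p-value behind this output. The following is a minimal sketch; the variable names `counts` and `result` are illustrative.

```r
# Class counts from Professor Bumblefuss's sample (95 students in total)
counts <- c(Freshman = 25, Sophomore = 32, Junior = 18, Senior = 20)

# With no `p` argument, chisq.test assumes equal proportions
result <- chisq.test(counts)

result$expected          # 95 / 4 = 23.75 expected students per class
result$statistic         # the X-squared test value
result$p.value > 0.05    # TRUE, so the null hypothesis is not rejected
```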
Another Example
A new casino game involves rolling 3 dice. The winnings are directly proportional to the total number of sixes rolled. Suppose a gambler plays the game 100 times, with the following observed counts:
Number of Sixes   Number of Rolls
0                 48
1                 35
2                 15
3                 2
Another Example continued

The casino becomes suspicious of the gambler and wishes to determine whether the dice are fair. What do they conclude?

If a die is fair, we would expect the probability of rolling a 6 on any given toss to be 1/6. Assuming the 3 dice are independent (the roll of one die should not affect the roll of the others), we might assume that the number of sixes in three rolls is distributed Binomial(3,1/6). To determine whether the gambler's dice are fair, we may compare his results with the results expected under this distribution. The expected values for 0, 1, 2, and 3 sixes under the Binomial(3,1/6) distribution are the following:
Expected Binomial Distribution values

P1 = P(roll 0 sixes) = P(X=0) = 0.58
P2 = P(roll 1 six)   = P(X=1) = 0.345
P3 = P(roll 2 sixes) = P(X=2) = 0.07
P4 = P(roll 3 sixes) = P(X=3) = 0.005
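These probabilities can be reproduced with R's binomial density function `dbinom`. The sketch below prints the unrounded values, which are close to the rounded figures listed above; the variable names are illustrative.

```r
# P(X = 0), P(X = 1), P(X = 2), P(X = 3) for X ~ Binomial(3, 1/6)
probs <- dbinom(0:3, size = 3, prob = 1/6)
round(probs, 4)  # 0.5787 0.3472 0.0694 0.0046

# Expected counts over 100 plays, before rounding
100 * probs
```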
Expected vs Observed
Since the gambler plays 100 times, the expected counts are the following:
Number of Sixes   Expected Count   Observed Count
0                 58               48
1                 34.5             35
2                 7                15
3                 0.5              2
Visual Comparison
The two plots shown below provide visual comparison of the expected and observed values:
Chi-gram
From these graphs, it is difficult to distinguish differences between the observed and expected counts. A visual representation of the differences is the chi-gram, which plots the observed minus expected counts divided by the square root of the expected counts, $(O - E)/\sqrt{E}$.
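The chi-gram values can be computed and drawn in a few lines of R. This is a sketch using the expected and observed counts from the table above; the plot labels and variable names are illustrative choices.

```r
expected <- c(58, 34.5, 7, 0.5)
observed <- c(48, 35, 15, 2)

# One chi-gram value per outcome: (observed - expected) / sqrt(expected)
chi_values <- (observed - expected) / sqrt(expected)
names(chi_values) <- 0:3

# Bars above zero mean more observations than expected, bars below mean fewer
barplot(chi_values,
        xlab = "Number of sixes",
        ylab = "(O - E) / sqrt(E)",
        main = "Chi-gram")
abline(h = 0)

# Squaring and summing the plotted values gives the chi-square statistic
sum(chi_values^2)
```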
Chi-Square Statistic
The chi-square statistic is the sum of the squares of the plotted values:
$$\chi^2 = \frac{(48 - 58)^2}{58} + \frac{(35 - 34.5)^2}{34.5} + \frac{(15 - 7)^2}{7} + \frac{(2 - 0.5)^2}{0.5} \approx 1.72 + 0.007 + 9.14 + 4.5 = 15.367$$
Given this statistic, are the observed values likely under the assumed model?
Making a decision
In the gambling example above, the chi-square test statistic $\chi^2$ was calculated to be 15.367. Since k = 4 in this case (the possibilities are 0, 1, 2, and 3 sixes), the test statistic is associated with the chi-square distribution with k - 1 = 3 degrees of freedom. At a significance level of 0.05, we reject the null hypothesis (that the dice are fair) if $\chi^2 > 7.815$, the critical value of the chi-square distribution with 3 degrees of freedom at the 0.05 level. Since 15.367 is clearly greater than 7.815, we reject the null hypothesis that the dice are fair at the 0.05 significance level.
Given this information, the casino can ask the gambler to take his dice (and business) somewhere else.
R Implementation
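> expected <- c(58,34.5,7,0.5)
> observed <- c(48,35,15,2)
> chisq.test(observed, p = (expected/100))

        Chi-squared test for given probabilities

data:  observed
X-squared = 15.3742, df = 3, p-value = 0.001523

The statistic differs slightly from the hand-computed 15.367 because each term of the hand calculation was rounded before summing.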
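One caveat: the expected count for three sixes is 0.5, below the minimum of 5 listed under Assumptions, and in that situation R warns that the chi-squared approximation may be incorrect. The `simulate.p.value` argument shown in the usage of `chisq.test` offers a Monte Carlo alternative. The sketch below is one way to cross-check the result; the seed and the number of replicates `B` are arbitrary choices, not values from the original slides.

```r
expected <- c(58, 34.5, 7, 0.5)
observed <- c(48, 35, 15, 2)

# Fix the random seed so the simulated p-value is reproducible (arbitrary value)
set.seed(130)

# Monte Carlo p-value: simulate B samples under H0 instead of relying on
# the chi-square approximation, which is doubtful when expected counts are small
sim <- chisq.test(observed, p = expected / 100,
                  simulate.p.value = TRUE, B = 10000)

sim$statistic  # same test value as before; only the p-value is simulated
sim$p.value
```

If the simulated p-value is also below 0.05, the conclusion that the dice are not fair does not depend on the chi-square approximation.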
References
http://www.stat.yale.edu/Courses/1997-98/101/chigf.htm
http://www.scribd.com/doc/101960970/10/CHI-SQUAREGOODNESS-OF-FIT-PROCEDURE-SUMMARY
Thank you!
