
Chi-Square Goodness-of-Fit Test

LOZANO, ALDRIN T.

Introduction
The chi-square distribution can be used for tests concerning frequency distributions, such as:

If a sample of buyers is given a choice of automobile colors, will each color be selected with the same frequency?

Assumptions
- The data are obtained from a random sample

- The expected frequency for each category must be 5 or more

Test for Goodness-of-Fit


The chi-square statistic can be used to see whether a frequency distribution fits a specific pattern.

This is referred to as the chi-squared goodness-of-fit test.

Observed Frequencies vs Expected Frequencies


Suppose a market analyst wished to see whether consumers have any preference among five flavors of a new fruit soda. A sample of 100 people provided these data:

Flavor      Cherry  Strawberry  Orange  Lime  Grape
Observed    32      28          16      14    10

Since the frequencies for each flavor were obtained from a sample, these actual frequencies are called the observed frequencies. The frequencies obtained by calculation (as if there were no preference) are called the expected frequencies. With 100 people and five flavors, no preference means an expected frequency of 100/5 = 20 for each flavor.

Frequency   Cherry  Strawberry  Orange  Lime  Grape
Observed    32      28          16      14    10
Expected    20      20          20      20    20
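The expected row can be reproduced with a few lines of R. This is only a sketch: the variable names are chosen here for illustration, and it assumes the "no preference" model, in which every flavor is equally likely.

```r
# Observed counts from the soda-flavor survey (values from the table above)
observed <- c(Cherry = 32, Strawberry = 28, Orange = 16, Lime = 14, Grape = 10)

# Under "no preference" every flavor is equally likely, so each expected
# count is the sample size divided by the number of flavors.
n <- sum(observed)        # 100 respondents
k <- length(observed)     # 5 flavors
expected <- rep(n / k, k)
names(expected) <- names(observed)

print(rbind(Observed = observed, Expected = expected))
```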

Goodness-of-Fit Test
The formula for the chi-square goodness-of-fit test is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where:
O = observed or obtained frequency
E = expected or theoretical frequency
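As a minimal sketch, the formula translates directly into an R function. The name chi_square_stat is a hypothetical helper, not part of base R, and the coin-flip numbers are toy data assumed purely for illustration.

```r
# Chi-square goodness-of-fit statistic: sum over categories of (O - E)^2 / E.
# chi_square_stat is a hypothetical helper name, not a base R function.
chi_square_stat <- function(observed, expected) {
  stopifnot(length(observed) == length(expected), all(expected > 0))
  sum((observed - expected)^2 / expected)
}

# Toy data (assumed for illustration): 100 coin flips with 60 heads and
# 40 tails, compared with the 50/50 split expected from a fair coin.
toy_stat <- chi_square_stat(observed = c(60, 40), expected = c(50, 50))
print(toy_stat)
```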

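The degrees of freedom (df) are:

$$df = (C - 1)(R - 1)$$

Where:
C = number of columns
R = number of rows

For the two-row table of observed and expected frequencies, R = 2, so df = C − 1: the number of categories minus one.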

Example
Is there enough evidence to reject the claim that there is no preference in the selection of fruit soda flavors, using the data shown previously? Let α = 0.05.

Frequency   Cherry  Strawberry  Orange  Lime  Grape
Observed    32      28          16      14    10
Expected    20      20          20      20    20

Solution
Step 1: State the hypotheses and define the claim
H0: Consumers show no preference for flavors (claim)
Ha: Consumers show a preference

Step 2: Find the critical value


With df = 5 − 1 = 4 and α = 0.05, the critical value from the chi-square distribution table is 9.488.
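The table lookup can be cross-checked in R with qchisq(), the quantile function of the chi-square distribution. The sketch below assumes the same α = 0.05 and four degrees of freedom as the example.

```r
alpha <- 0.05
df <- 5 - 1   # five flavors, so four degrees of freedom

# The test is right-tailed, so the critical value is the (1 - alpha) quantile.
critical_value <- qchisq(1 - alpha, df = df)
print(round(critical_value, 3))
```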

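Step 3: Compute χ²

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(32-20)^2}{20} + \frac{(28-20)^2}{20} + \frac{(16-20)^2}{20} + \frac{(14-20)^2}{20} + \frac{(10-20)^2}{20} = 18.0$$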
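To see how much each flavor contributes to the test value, the individual terms can be computed in R. This is a sketch using the observed and expected counts from the example; the variable names are illustrative.

```r
observed <- c(Cherry = 32, Strawberry = 28, Orange = 16, Lime = 14, Grape = 10)
expected <- rep(20, 5)

# One (O - E)^2 / E term per flavor
contributions <- (observed - expected)^2 / expected
print(contributions)

# The test value is the sum of the per-flavor terms
chi_sq <- sum(contributions)
print(chi_sq)
```

Cherry and Grape, the flavors farthest from 20, account for most of the total.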

Step 4: Make the decision
The decision is to reject the null hypothesis, since 18.0 > 9.488.

Step 5: Summarize the results
There is enough evidence to reject the claim that consumers show no preference for the flavors.
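The same conclusion can be reached from a p-value instead of a critical value. The sketch below runs R's chisq.test() on the soda data; when no probabilities are supplied, the function tests against equal probabilities, which matches the "no preference" claim.

```r
observed <- c(32, 28, 16, 14, 10)

# With no p argument, chisq.test() assumes equal category probabilities.
result <- chisq.test(observed)
print(result)

# Reject H0 when the p-value falls below the significance level.
alpha <- 0.05
reject_null <- result$p.value < alpha
print(reject_null)
```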

A good fit
When the observed values and expected values are close together, the chi-square test value will be small.

The decision will then be not to reject the null hypothesis; hence, there is a good fit.

Not a good fit


When the observed values and the expected values are far apart, the chi-square test value will be large. The null hypothesis will then be rejected; hence, there is not a good fit.

Chi-Square Goodness-of-Fit Procedure Summary


Step 1: State the hypotheses and define the claim.
Step 2: Find the critical value. (The test is always right-tailed.)
Step 3: Compute the test value.
Step 4: Make the decision.
Step 5: Summarize the results.
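The steps above can be sketched as one small R function. The name gof_test and its return structure are hypothetical, chosen here for illustration. The hypothesized probabilities default to equal shares, as in the soda example.

```r
# gof_test is a hypothetical helper that mirrors the steps above.
gof_test <- function(observed,
                     p = rep(1 / length(observed), length(observed)),
                     alpha = 0.05) {
  # Step 1 (hypotheses): H0 says the category probabilities equal p.
  expected <- sum(observed) * p
  # Step 2: critical value for a right-tailed test with k - 1 degrees of freedom.
  df <- length(observed) - 1
  critical_value <- qchisq(1 - alpha, df = df)
  # Step 3: test value.
  test_value <- sum((observed - expected)^2 / expected)
  # Step 4: decision.
  reject <- test_value > critical_value
  list(df = df, critical_value = critical_value,
       test_value = test_value, reject_null = reject)
}

# Applied to the soda-flavor counts from the earlier example
soda <- gof_test(c(32, 28, 16, 14, 10))
str(soda)
```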

An example in R
Professor Bumblefuss takes a random sample of students enrolled in Statistics 101 at ABC University. He finds the following: there are 25 freshmen in the sample, 32 sophomores, 18 juniors, and 20 seniors. Test the null hypothesis that freshmen, sophomores, juniors, and seniors are equally represented among students signed up for Stat 101.

Class   Freshmen  Sophomores  Juniors  Seniors
Count   25        32          18       20

R Implementation
chisq.test(x, y = NULL, correct = TRUE,
           p = rep(1/length(x), length(x)), rescale.p = FALSE,
           simulate.p.value = FALSE, B = 2000)

> chisq.test(c(25,32,18,20))

        Chi-squared test for given probabilities

data:  c(25, 32, 18, 20)
X-squared = 4.9158, df = 3, p-value = 0.1781
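The object returned by chisq.test() also holds the intermediate quantities, which makes the printed result easier to check. The sketch below inspects them for the Stat 101 sample. The 0.05 significance level is an assumption, since the example does not state one.

```r
counts <- c(Freshmen = 25, Sophomores = 32, Juniors = 18, Seniors = 20)
fit <- chisq.test(counts)

# Expected count per class under equal representation (95 / 4 each)
print(fit$expected)

# Pearson residuals: (observed - expected) / sqrt(expected)
print(fit$residuals)

# Their squares add up to the X-squared value that chisq.test() reports
print(sum(fit$residuals^2))

# Assumed significance level of 0.05: TRUE means H0 is not rejected
print(fit$p.value > 0.05)
```

At that assumed level, the p-value of 0.1781 does not give enough evidence to reject equal representation.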

Another Example
A new casino game involves rolling 3 dice. The winnings are directly proportional to the total number of sixes rolled. Suppose a gambler plays the game 100 times, with the following observed counts:
Number of Sixes   Number of Rolls
0                 48
1                 35
2                 15
3                 2

Another Example continued


The casino becomes suspicious of the gambler and wishes to determine whether the dice are fair. What do they conclude?

If a die is fair, we would expect the probability of rolling a 6 on any given toss to be 1/6. Assuming the 3 dice are independent (the roll of one die should not affect the roll of the others), we might assume that the number of sixes in three rolls is distributed Binomial(3,1/6). To determine whether the gambler's dice are fair, we may compare his results with the results expected under this distribution. The expected values for 0, 1, 2, and 3 sixes under the Binomial(3,1/6) distribution are the following:

Expected Binomial Distribution values


P1 = P(roll 0 sixes) = P(X=0) = 0.58
P2 = P(roll 1 six)   = P(X=1) = 0.345
P3 = P(roll 2 sixes) = P(X=2) = 0.07
P4 = P(roll 3 sixes) = P(X=3) = 0.005
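These probabilities can be checked against R's dbinom(), the binomial probability mass function. The sketch below computes the unrounded Binomial(3, 1/6) values and compares them with the rounded figures listed above. The names p_exact and p_slide are illustrative.

```r
sixes <- 0:3

# Exact Binomial(3, 1/6) probabilities of rolling 0, 1, 2, or 3 sixes
p_exact <- dbinom(sixes, size = 3, prob = 1/6)
names(p_exact) <- sixes
print(round(p_exact, 4))

# Rounded values used in the text, and their largest deviation from the exact ones
p_slide <- c(0.58, 0.345, 0.07, 0.005)
print(max(abs(p_exact - p_slide)))
```

The rounded values differ from the exact ones by less than 0.003 and sum to exactly 1.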

Expected vs Observed
Since the gambler plays 100 times, the expected counts are the following:

Number of Sixes   Expected Count   Observed Count
0                 58               48
1                 34.5             35
2                 7                15
3                 0.5              2
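The expected counts are the model probabilities multiplied by the number of games. A short R sketch (with illustrative variable names) rebuilds the table and flags any category whose expected count falls below 5, the threshold listed under Assumptions.

```r
n_games <- 100
p_model <- c(0.58, 0.345, 0.07, 0.005)   # rounded probabilities from the text
observed <- c(48, 35, 15, 2)

# Expected count = number of games times the model probability
expected <- n_games * p_model
comparison <- data.frame(sixes = 0:3, expected = expected, observed = observed)
print(comparison)

# Categories whose expected count is below 5
small_cells <- comparison$sixes[comparison$expected < 5]
print(small_cells)
```

Only the "3 sixes" category falls below the threshold, so the chi-square approximation is least reliable there.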

Visual Comparison
The two plots shown below provide visual comparison of the expected and observed values:

Chi-gram
From these graphs, it is difficult to distinguish differences between the observed and expected counts. A visual representation of the differences is the chi-gram, which plots the observed minus expected counts, divided by the square root of the expected counts, as shown here:
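The values a chi-gram plots can be computed directly. This is a minimal sketch with illustrative variable names. Passing chi_values to barplot() would draw one possible version of the plot.

```r
observed <- c(48, 35, 15, 2)
expected <- c(58, 34.5, 7, 0.5)

# Chi-gram heights: (observed - expected) / sqrt(expected), one per category
chi_values <- (observed - expected) / sqrt(expected)
names(chi_values) <- 0:3
print(round(chi_values, 3))
```

The bars for 2 and 3 sixes stand well above zero, which is where the observed counts exceed the model most.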

Chi-Square Statistic
The chi-square statistic is the sum of the squares of the plotted values:

$$\chi^2 = \frac{(48-58)^2}{58} + \frac{(35-34.5)^2}{34.5} + \frac{(15-7)^2}{7} + \frac{(2-0.5)^2}{0.5} \approx 1.72 + 0.007 + 9.14 + 4.5 = 15.367$$

Given this statistic, are the observed values likely under the assumed model?

Making a decision
In the gambling example above, the chi-square test statistic χ² was calculated to be 15.367. Since k = 4 in this case (the possibilities are 0, 1, 2, and 3 sixes), the test statistic is associated with the chi-square distribution with k − 1 = 3 degrees of freedom. At a significance level of 0.05, we may reject the null hypothesis (that the dice are fair) if χ² > 7.815, the critical value at the 0.05 significance level for the chi-square distribution with 3 degrees of freedom. Since 15.367 is clearly greater than 7.815, we reject the null hypothesis that the dice are fair at the 0.05 significance level.
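The critical value and the corresponding p-value can be checked in R with qchisq() and pchisq(). The sketch below uses the test statistic quoted in the text and assumes the same 0.05 significance level.

```r
chi_sq <- 15.367   # test statistic from the text
df <- 4 - 1        # four categories, so three degrees of freedom

# Right-tailed critical value at the 0.05 level
critical_value <- qchisq(0.95, df = df)
print(round(critical_value, 3))

# Probability of a statistic at least this large if the dice are fair
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
print(p_value)

print(chi_sq > critical_value)
```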

Given this information, the casino can ask the gambler to take his dice (and business) somewhere else.

R Implementation
> expected <- c(58,34.5,7,0.5)
> observed <- c(48,35,15,2)
> chisq.test(observed, p = (expected/100))

        Chi-squared test for given probabilities

data:  observed
X-squared = 15.3742, df = 3, p-value = 0.001523
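Because the expected count for three sixes is only 0.5, below the threshold of 5 listed under Assumptions, the chi-square approximation may be rough here. One possible cross-check, sketched below, is the simulate.p.value argument shown in the chisq.test() signature earlier, which estimates the p-value by Monte Carlo simulation. The seed is an arbitrary choice for reproducibility.

```r
observed <- c(48, 35, 15, 2)
p_model  <- c(0.58, 0.345, 0.07, 0.005)

set.seed(1)   # arbitrary seed so the simulation is reproducible

# Monte Carlo p-value based on B = 2000 simulated samples under the model
sim <- chisq.test(observed, p = p_model, simulate.p.value = TRUE, B = 2000)
print(sim$statistic)
print(sim$p.value)
```

The test statistic is the same as before. Only the p-value is estimated differently.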

References
http://www.stat.yale.edu/Courses/1997-98/101/chigf.htm
http://www.scribd.com/doc/101960970/10/CHI-SQUAREGOODNESS-OF-FIT-PROCEDURE-SUMMARY

Thank you!
