LOZANO, ALDRIN T.
Introduction
The chi-square distribution can be used for tests concerning frequency distributions, such as:
If a sample of buyers is given a choice of automobile colors, will each color be selected with the same frequency?
Assumptions
- The data are obtained from a random sample
Observed frequencies for the five fruit soda flavors: 32, 28, 16, 14, 10
Goodness-of-Fit Test
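The formula for the chi-square goodness-of-fit test is:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O is the observed frequency and E is the expected frequency of each category.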
The degrees of freedom (df) are: $df = k - 1$, where k is the number of categories.
Example
Is there enough evidence to reject the claim that there is no preference in the selection of fruit soda flavors, using the data shown previously? Let α = 0.05.

Frequency   Cherry  Strawberry  Orange  Lime  Grape
Observed    32      28          16      14    10
Expected    20      20          20      20    20
Solution
Step 1: State the hypotheses and define the claim
H0: Consumers show no preference for flavors (claim)
Ha: Consumers show a preference
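Step 3: Compute the test value
$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(32-20)^2}{20} + \frac{(28-20)^2}{20} + \frac{(16-20)^2}{20} + \frac{(14-20)^2}{20} + \frac{(10-20)^2}{20} = 18.0$$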
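The arithmetic in Step 3 can be checked with a few lines of R. This is a minimal sketch, assuming equal expected counts (no preference among the five flavors); the variable names are illustrative only.

```r
# Observed counts from the example; expected counts assume no preference
observed <- c(32, 28, 16, 14, 10)
expected <- rep(sum(observed) / length(observed), length(observed))  # 20 each

# Contribution of each category: (O - E)^2 / E
contrib <- (observed - expected)^2 / expected

# The test value is the sum of the contributions
chi_sq <- sum(contrib)

print(contrib)
print(chi_sq)
```

Printing the per-category contributions shows which categories drive the test value; here the first and last categories contribute the most.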
Step 4: Make the decision
The decision is to reject the null hypothesis, since 18.0 > 9.488, the critical value for df = 5 − 1 = 4 at α = 0.05.
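The critical value 9.488 is normally read from a chi-square table, but it can also be obtained in R with `qchisq`. The sketch below assumes a right-tailed test with k = 5 categories and α = 0.05, as in this example; the variable names are illustrative.

```r
alpha <- 0.05
k <- 5            # number of flavor categories
df <- k - 1       # degrees of freedom for a goodness-of-fit test

# Right-tail critical value: the (1 - alpha) quantile of the chi-square distribution
critical <- qchisq(1 - alpha, df)

chi_sq <- 18.0    # test value from Step 3
reject <- chi_sq > critical

print(round(critical, 3))
print(reject)
```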
Step 5: Summarize the results
There is enough evidence to reject the claim that consumers show no preference for the flavors.
A good fit
When the observed values and expected values are close together, the chi-square test value will be small.
The decision will then be not to reject the null hypothesis; hence, there is a good fit.
An example in R
Professor Bumblefuss takes a random sample of students enrolled in Statistics 101 at ABC University. He finds the following: there are 25 freshmen in the sample, 32 sophomores, 18 juniors, and 20 seniors. Test the null hypothesis that freshmen, sophomores, juniors, and seniors are equally represented among students signed up for Stat 101.

Class       Count
Freshmen    25
Sophomores  32
Juniors     18
Seniors     20
R Implementation
chisq.test(x, y = NULL, correct = TRUE, p = rep(1/length(x), length(x)), rescale.p = FALSE, simulate.p.value = FALSE, B = 2000)
> chisq.test(c(25,32,18,20))

	Chi-squared test for given probabilities

data:  c(25, 32, 18, 20)
X-squared = 4.9158, df = 3, p-value = 0.1781
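To see where the numbers in this output come from, the same quantities can be computed by hand. This is a sketch under the null hypothesis of equal representation, so each class has an expected count of 95/4 = 23.75; the variable names are illustrative.

```r
observed <- c(25, 32, 18, 20)
n <- sum(observed)                                # 95 students in the sample
expected <- rep(n / length(observed), length(observed))  # 23.75 per class

# Test statistic: sum of (O - E)^2 / E over the four classes
stat <- sum((observed - expected)^2 / expected)

# Degrees of freedom and right-tail p-value
df <- length(observed) - 1
p_value <- pchisq(stat, df, lower.tail = FALSE)

print(round(stat, 4))
print(df)
print(round(p_value, 4))
```

Because the p-value is well above 0.05, the null hypothesis of equal representation is not rejected at that level.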
Another Example
A new casino game involves rolling 3 dice. The winnings are directly proportional to the total number of sixes rolled. Suppose a gambler plays the game 100 times, with the following observed counts:
Number of Sixes   Number of Rolls
0                 48
1                 35
2                 15
3                 2
Expected vs Observed
Since the gambler plays 100 times, the expected counts are the following:
Number of Sixes   Expected Count   Observed Count
0                 58               48
1                 34.5             35
2                 7                15
3                 0.5              2
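The expected counts can be reproduced under the assumption that the dice are fair, in which case the number of sixes in three rolls follows a binomial distribution with size 3 and probability 1/6. The sketch below is an illustration of that assumption, not the source's own calculation; the table values (58, 34.5, 7, 0.5) appear to be rounded versions of these figures.

```r
n_plays <- 100
sixes <- 0:3

# Probability of 0, 1, 2, or 3 sixes when rolling three fair dice
prob <- dbinom(sixes, size = 3, prob = 1/6)

# Expected number of plays with each outcome
expected <- n_plays * prob

print(round(expected, 2))
```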
Visual Comparison
The two plots shown below provide a visual comparison of the expected and observed values:
Chi-gram
From these graphs, it is difficult to distinguish differences between the observed and expected counts. A visual representation of the differences is the chi-gram, which plots the observed minus expected counts divided by the square root of the expected counts, as shown here:
Chi-Square Statistic
The chi-square statistic is the sum of the squares of the plotted values:
$$\frac{(48-58)^2}{58} + \frac{(35-34.5)^2}{34.5} + \frac{(15-7)^2}{7} + \frac{(2-0.5)^2}{0.5} \approx 1.72 + 0.007 + 9.14 + 4.5 = 15.367$$
Given this statistic, are the observed values likely under the assumed model?
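The chi-gram values and their sum of squares can be sketched in R as follows. The `barplot` call is one possible way to draw a chi-gram and is an assumption here, since the original figure is not available; the variable names are illustrative.

```r
expected <- c(58, 34.5, 7, 0.5)
observed <- c(48, 35, 15, 2)

# Chi-gram values: (observed - expected) / sqrt(expected)
chi_values <- (observed - expected) / sqrt(expected)
names(chi_values) <- 0:3
print(round(chi_values, 3))

# One way to draw the chi-gram: a bar per number of sixes
barplot(chi_values,
        xlab = "Number of sixes",
        ylab = "(observed - expected) / sqrt(expected)")

# The chi-square statistic is the sum of the squared chi-gram values
chi_sq <- sum(chi_values^2)
print(round(chi_sq, 4))
```

Without intermediate rounding the sum is about 15.374, slightly above the hand-computed 15.367, which adds terms that were rounded first.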
Making a decision
In the gambling example above, the chi-square test statistic X² was calculated to be 15.367. Since k = 4 in this case (the possibilities are 0, 1, 2, and 3 sixes), the test statistic is associated with the chi-square distribution with k − 1 = 3 degrees of freedom. At a significance level of 0.05, we may reject the null hypothesis (that the dice are fair) if X² > 7.815, the critical value for the chi-square distribution with 3 degrees of freedom. Since 15.367 is clearly greater than 7.815, we may reject the null hypothesis that the dice are fair at the 0.05 significance level.
Given this information, the casino can ask the gambler to take his dice (and business) somewhere else.
R Implementation
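> expected <- c(58,34.5,7,0.5)
> observed <- c(48,35,15,2)
> chisq.test(observed, p = (expected/100))

	Chi-squared test for given probabilities

data:  observed
X-squared = 15.3742, df = 3, p-value = 0.001523

The statistic differs slightly from the hand-computed 15.367 because the hand calculation rounds each term before summing.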
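One caveat: the expected count for three sixes is only 0.5, and R typically warns that the chi-squared approximation may be incorrect when expected counts are this small. A possible cross-check, sketched below, uses the `simulate.p.value` argument of `chisq.test` to estimate the p-value by Monte Carlo simulation instead. The seed is arbitrary and chosen only for reproducibility, and the simulated p-value will vary with it.

```r
expected <- c(58, 34.5, 7, 0.5)
observed <- c(48, 35, 15, 2)

set.seed(1)  # arbitrary seed, only so that the run is reproducible

# Monte Carlo p-value based on B simulated samples under the null hypothesis
sim <- chisq.test(observed, p = expected / 100,
                  simulate.p.value = TRUE, B = 2000)

print(sim$statistic)  # same X-squared as the asymptotic test
print(sim$p.value)    # simulated p-value
```

If the simulated p-value is also below 0.05, the conclusion of the asymptotic test is supported.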
Thank you!