
Chi-Square

What is chi-square testing?

Generally speaking, the chi-square test, also referred to as the chi-squared test or χ² test, is a statistical test used to examine differences between categorical variables. It:
- identifies significant differences between the observed frequencies and the expected frequencies of a particular group;
- attempts to identify whether any differences between the expected and observed frequencies are due to chance or to some other factor affecting them.

There are many types of chi-square tests, but the most common one is the Pearson chi-square test. Other types:
- Yates's correction for continuity, also known as Yates's chi-squared test
- Cochran-Mantel-Haenszel chi-squared test
- McNemar's test, used in certain 2 × 2 tables with pairing
- Tukey's test of additivity
- The portmanteau test in time-series analysis, testing for the presence of autocorrelation
- Likelihood-ratio tests in general statistical modeling, for testing whether there is evidence of the need to move from a simple model to a more complicated one

Important Characteristics of a Chi-square Test
- This test is based on frequencies and not on parameters like the mean and standard deviation.
- This test is used for testing hypotheses and is not useful for estimation.
- This test can also be applied to a complex contingency table with several classes, which makes it a very useful test in research work.
- This test is an important non-parametric test: no rigid assumptions are necessary about the type of population, no parameter values are needed, and relatively little mathematical detail is involved.

Application of chi-square testing

Goodness-of-fit of distributions
The chi-square statistic can be used to see whether a frequency distribution fits a specific pattern.
Test of independence of attributes
Enables us to determine whether or not two attributes are associated. For instance, if we are interested in knowing whether a new substance is effective in controlling fever or not, the χ² test is useful.

Test of homogeneity
Enables us to test whether events occur uniformly or not. For example, whether the admission of patients to a government hospital is uniform across all days of the week can be tested with the chi-square test.

Requirements of the Chi-squared Test
- The values to be compared are quantitative counts of nominal (categorical) data.
- There should be two or more categories in the setup.
- The observations should be independent of each other.
- The sample size should be adequate (at least 10).
- Most of the time, it is the frequencies of the observations that are used.

Example
A student wants to see whether the food preferences of males and females differ, specifically whether males and females differ in their preference for cooked and raw foods. A survey was conducted with the following results:
Twelve males preferred cooked foods.
Eight males preferred raw foods.

Five females preferred cooked foods.
Five females preferred raw foods.

Step 1: State the null hypothesis and the alternative hypothesis.
H0: There is no significant difference between the food preferences of males and females.
OR
H0: Food preference is independent of gender.
Ha: There is a significant difference between the food preferences of males and females.
OR
Ha: Food preference depends on gender.

Step 2: State the level of significance.
α = 0.05
0.05 is the level of significance used for most scientific experiments.

Step 3: Set up a contingency table.
The contingency table summarizes the data. The categories on the rows are the preferences that you are checking. The categories on the columns are the populations whose preferences are being checked. Row totals and column totals are always included as well.

Preference        Male    Female    Total (Row)
Cooked            12      5         17
Raw               8       5         13
Total (Column)    20      10        30
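As a minimal sketch of Step 3, the survey counts can be held in a nested dictionary and the marginal totals computed from it. The variable names (`observed`, `row_totals`, `column_totals`, `grand_total`) are illustrative choices and do not come from the handout.

```python
# Observed counts from the survey: outer keys are preferences (rows),
# inner keys are genders (columns).
observed = {
    "Cooked": {"Male": 12, "Female": 5},
    "Raw": {"Male": 8, "Female": 5},
}

# Row totals: one per preference.
row_totals = {pref: sum(cells.values()) for pref, cells in observed.items()}

# Column totals: one per gender.
column_totals = {}
for cells in observed.values():
    for gender, count in cells.items():
        column_totals[gender] = column_totals.get(gender, 0) + count

# Grand total N: the number of respondents in the table.
grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```

The totals printed here should match the margins of the contingency table: 17 and 13 for the rows, 20 and 10 for the columns, and 30 overall.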

Step 4: Compute the expected frequencies.
Expected frequencies (f_e): the cell frequencies that would be expected in a contingency table if the two variables were statistically independent. The expected frequency is computed for EACH cell.
Observed frequencies (f_o): the cell frequencies actually observed in a contingency table.

To obtain the expected frequency for any cell in a cross-tabulation in which the two variables are assumed independent, multiply the row and column totals for that cell and divide the product by the total number of cases (N) in the table.

Preference        Male                   Female                 Total (Row)
Cooked            (20)(17)/30 = 11.33    (10)(17)/30 = 5.67     17
Raw               (20)(13)/30 = 8.67     (10)(13)/30 = 4.33     13
Total (Column)    20                     10                     30
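The rule "row total times column total, divided by N" can be sketched as a short function. This is only an illustration under the assumption that the table is stored as a list of rows without the totals; the name `expected_frequencies` is hypothetical.

```python
def expected_frequencies(observed):
    """Return expected counts under independence for a list-of-rows table."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # total number of cases in the table
    # Each cell: (row total * column total) / N
    return [[r * c / n for c in column_totals] for r in row_totals]


# Rows: Cooked, Raw. Columns: Male, Female.
observed = [[12, 5], [8, 5]]
expected = expected_frequencies(observed)

for label, row in zip(["Cooked", "Raw"], expected):
    print(label, [round(value, 2) for value in row])
```

Rounded to two decimals, the results agree with the expected frequencies tabulated in the text (11.33, 5.67, 8.67, 4.33).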

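The fundamental formula for the Chi-squared test is:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where f_o is the observed frequency and f_e is the expected frequency of a cell, and the sum runs over all cells of the table.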
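As a rough sketch, the Pearson chi-square statistic (the sum over all cells of the squared difference between observed and expected frequency, divided by the expected frequency) can be computed as below. The function name `chi_square_statistic` is an assumption made for illustration. Note that using the expected values rounded to two decimals, as the handout does, gives roughly 0.274, while unrounded expected values give roughly 0.2715.

```python
def chi_square_statistic(observed, expected):
    """Sum (f_o - f_e)^2 / f_e over paired observed and expected counts."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Cells in the order: cooked males, cooked females, raw males, raw females.
observed = [12, 5, 8, 5]

# Expected frequencies rounded to two decimals, as tabulated in the handout.
expected_rounded = [11.33, 5.67, 8.67, 4.33]

# The same expected frequencies without rounding.
expected_exact = [20 * 17 / 30, 10 * 17 / 30, 20 * 13 / 30, 10 * 13 / 30]

stat_rounded = chi_square_statistic(observed, expected_rounded)
stat_exact = chi_square_statistic(observed, expected_exact)

print(round(stat_rounded, 4))
print(round(stat_exact, 4))
```

The small gap between the two results comes only from rounding the expected frequencies; it does not change the outcome of the test in this example.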

Step 5: Rearrange the table.

Preference        Observed    Expected    χ²
Cooked Males      12          11.33       0.0396
Cooked Females    5           5.67        0.0792
Raw Males         8           8.67        0.0518
Raw Females       5           4.33        0.1037
Total                                     0.2743

Step 6: Determine the degrees of freedom.
df = (r − 1)(c − 1)
where
r = number of rows
c = number of columns
df = (2 − 1)(2 − 1)
df = 1
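The degrees-of-freedom rule can be sketched in a few lines, assuming the table is stored as a list of rows with the totals left out. The helper name `degrees_of_freedom` is hypothetical.

```python
def degrees_of_freedom(table):
    """df = (r - 1)(c - 1) for a table given as a list of rows, totals excluded."""
    r = len(table)     # number of rows
    c = len(table[0])  # number of columns
    return (r - 1) * (c - 1)


# Rows: Cooked, Raw. Columns: Male, Female.
observed = [[12, 5], [8, 5]]
df = degrees_of_freedom(observed)
print(df)
```

The total row and total column must not be counted as categories; including them would inflate r and c and give the wrong df.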

Step 7: Check the tabular Chi-squared value with your df and level of significance.
α = 0.05
df = 1

Checking the table, we see that the tabular chi-squared value for df = 1 and α = 0.05 is 3.841. Since our calculated chi-square (0.2743) is less than this, we fail to reject the null hypothesis. Hence, the data give no evidence that food preference depends on gender.

Summary
Step 1: State the null hypothesis and the alternative hypothesis.
Step 2: State the level of significance.
Step 3: Set up a contingency table.
Step 4: Compute the expected frequencies.
Step 5: Rearrange the table to show the observed and expected frequencies on the columns, and the subcategories on the rows.
Step 6: Determine the degrees of freedom.
Step 7: Check the tabular Chi-squared value with your df and level of significance.

Conclusion
The tabular chi-squared value for df = 1 and α = 0.05 is 3.841. Since the calculated chi-square (0.2743) is less than this, the null hypothesis is not rejected: the survey provides no evidence that food preference and gender are dependent on one another.

Limitations
- The data must come from a random sample.
- Applied to a fourfold (2 × 2) table, which has one degree of freedom, this test will not give a reliable result if the expected value in any cell is less than five.
- Interpret this test with caution if the sample total, or the total of the values in all cells, is less than 50.
- This test indicates the presence or absence of an association between the events but does not measure the strength of the association.
- This test does not indicate cause and effect; it only gives the probability that the association occurred by chance.
- The test is to be applied only when the individual observations in the sample are independent, which means that the occurrence of one individual observation (event) has no effect upon the occurrence of any other observation (event) in the sample under consideration.
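The decision rule of Step 7 can be sketched as follows, using the tabular value 3.841 quoted in the text. The p-value helper is an addition not found in the handout: it relies on the fact that, for one degree of freedom only, the upper-tail probability of the chi-square distribution equals erfc(sqrt(x / 2)). The names `decide` and `p_value_df1` are hypothetical.

```python
import math

# Tabular chi-squared value for df = 1 and alpha = 0.05, as quoted in the text.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841


def p_value_df1(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Valid for df = 1 only; other df need a general chi-square distribution.
    """
    return math.erfc(math.sqrt(statistic / 2))


def decide(statistic, critical_value=CRITICAL_VALUE_DF1_ALPHA_05):
    """Compare the calculated statistic with the tabular critical value."""
    if statistic > critical_value:
        return "reject H0"
    return "fail to reject H0"


statistic = 0.2743  # calculated chi-square from the worked example
decision = decide(statistic)
p_value = p_value_df1(statistic)

print(decision)
print(round(p_value, 2))
```

Comparing the statistic with the critical value and comparing the p-value with α are equivalent ways of stating the same decision; the handout uses the first.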

Seatwork
A book publisher would like to know whether customer preference is the same for 4 different book cover designs. The title, author, and price are the same for each of the designs. A random sample of 113 customers was observed with the following results.

Design                                      1     2     3     4
Number of customers selecting the design    30    26    25    22

Solution:
Step 1: State the hypotheses.
H0: The 4 designs are equally preferable.
Ha: The designs are not equally preferable.
Step 2: State the level of significance.
α = 0.05
0.05 is the level of significance used for most scientific experiments.
Step 3: Construct a contingency table
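A goodness-of-fit calculation of this kind might be sketched as below, assuming that "equally preferable" means every design has the same expected count (the total divided by the number of designs). The function name `goodness_of_fit` is hypothetical. The counts as listed sum to 103 rather than the stated 113, and the sketch uses the counts exactly as listed, so its result holds only under that assumption.

```python
def goodness_of_fit(observed):
    """Chi-square statistic and df against equal expected counts per category."""
    n = sum(observed)
    expected = n / len(observed)  # same expected count for every category
    statistic = sum((fo - expected) ** 2 / expected for fo in observed)
    df = len(observed) - 1        # number of categories minus one
    return statistic, df


# Designs 1 to 4, counts as listed in the seatwork (assumption: these are
# the correct counts, even though they do not add up to 113).
counts = [30, 26, 25, 22]
statistic, df = goodness_of_fit(counts)

print(sum(counts))
print(round(statistic, 4), df)
```

The resulting statistic would then be compared with the tabular chi-squared value for the computed df and α = 0.05, in the same way as in Step 7 of the worked example.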
