
Chi-square test

"Chi-square test" also known as Pearson's chi-square test.

A chi-square test (also chi-squared or χ2 test) is any statistical hypothesis test in which the
sampling distribution of the test statistic is a chi-square distribution when the null hypothesis is
true, or any in which this is asymptotically true, meaning that the sampling distribution (if the
null hypothesis is true) can be made to approximate a chi-square distribution as closely as
desired by making the sample size large enough.

Some examples of chi-squared tests where the chi-square distribution is only approximately
valid:

• Pearson's chi-square test, also known as the chi-square goodness-of-fit test or chi-square test for independence. When mentioned without any modifiers or other precluding context, this is the test usually meant (for an exact test used in place of χ2, see Fisher's exact test).
• Yates' chi-square test, also known as Yates' correction for continuity.
• Mantel–Haenszel chi-square test.
• Linear-by-linear association chi-square test.
• The portmanteau test in time-series analysis, testing for the presence of autocorrelation.
• Likelihood-ratio tests in general statistical modelling, for testing whether there is evidence of the need to move from a simple model to a more complicated one (where the simple model is nested within the complicated one).

One case where the distribution of the test statistic is an exact chi-square distribution is the test
that the variance of a normally-distributed population has a given value based on a sample
variance. Such a test is uncommon in practice because values of variances to test against are
seldom known exactly.

Chi-square test for variance in a normal population


If a sample of size n is taken from a population having a normal distribution, then there is a well-
known result (see distribution of the sample variance) which allows a test to be made of whether
the variance of the population has a pre-determined value. For example, a manufacturing process
might have been in stable condition for a long period, allowing a value for the variance to be
determined essentially without error. Suppose that a variant of the process is being tested, giving
rise to a small sample of product items whose variation is to be tested. The test statistic T in this
instance could be set to be the sum of squares about the sample mean, divided by the nominal
value for the variance (i.e. the value to be tested as holding). Then T has a chi-square distribution
with n − 1 degrees of freedom. For example, if the sample size is 21, the acceptance region for T for a significance level of 5% is the interval 9.59 to 34.17.
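The acceptance region quoted above can be reproduced from the chi-square quantile function; the following is a minimal sketch in Python, assuming scipy is available and using the same setup (n = 21, 5% two-sided significance):

# Sketch: acceptance region for the chi-square variance test (n = 21, alpha = 0.05).
from scipy.stats import chi2

n = 21                                # sample size
df = n - 1                            # degrees of freedom
alpha = 0.05

lower = chi2.ppf(alpha / 2, df)       # 2.5% quantile
upper = chi2.ppf(1 - alpha / 2, df)   # 97.5% quantile
print(round(lower, 2), round(upper, 2))   # approximately 9.59 and 34.17

# T = (sum of squared deviations about the sample mean) / sigma0^2 is compared
# with this interval; values outside it lead to rejection at the 5% level.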
G-test
In statistics, G-tests are likelihood-ratio or maximum likelihood statistical significance tests that
are increasingly being used in situations where chi-square tests were previously recommended.

The commonly used chi-squared tests for goodness of fit to a distribution and for independence
in contingency tables are in fact approximations of the log-likelihood ratio on which the G-tests
are based. This approximation was developed by Karl Pearson because at the time it was unduly
laborious to calculate log-likelihood ratios. With the advent of electronic calculators and personal
computers, this is no longer a problem. G-tests are coming into increasing use, particularly since
they were recommended in the 1994 edition of the popular statistics textbook by Sokal and Rohlf.
Dunning introduced the test to the computational linguistics community where it is now widely
used.

The general formula for Pearson's chi-squared test statistic is

Χ² = Σ (Oi − Ei)² / Ei,

where Oi is the frequency observed in a cell, Ei is the frequency expected under the null hypothesis, and the sum is taken across all cells. The corresponding general formula for G is

G = 2 Σ Oi ln(Oi / Ei),

where ln denotes the natural logarithm (log to the base e) and the sum is again taken over all
non-empty cells.
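The two formulas can be compared side by side with a few lines of Python; the observed and expected counts below are invented for illustration, and numpy is assumed to be available:

# Sketch: Pearson's chi-square statistic versus the G statistic for the same counts.
import numpy as np

O = np.array([43, 52, 30, 25])          # hypothetical observed frequencies
E = np.array([37.5, 37.5, 37.5, 37.5])  # expected frequencies under the null

chi_sq = np.sum((O - E) ** 2 / E)       # Pearson: sum of (O - E)^2 / E
G = 2 * np.sum(O * np.log(O / E))       # G: 2 * sum of O * ln(O / E)

print(chi_sq, G)   # numerically similar here; the two tests are asymptotically equivalent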

Relation with Mutual Information


The value of G can also be expressed in terms of mutual information.

Let N = Σij Oij be the total count, πij = Oij / N the observed cell proportions, and πi. and π.j the corresponding row and column proportions.

Then

G = 2 N · MI(R; C),

where

MI(R; C) = Σij πij ln( πij / (πi. π.j) )

is the mutual information between the row variable and the column variable of the contingency table. Equivalently, writing the entropy of a discrete random variable X as H(X) = − Σx p(x) ln p(x), we have MI(R; C) = H(R) + H(C) − H(R, C).
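A small numerical sketch of this identity in Python, with an invented 2-by-2 table and numpy assumed available:

# Sketch: verify that G = 2 * N * MI for a contingency table (natural logarithms).
import numpy as np

O = np.array([[10.0, 20.0],
              [30.0, 40.0]])            # hypothetical 2x2 table of counts
N = O.sum()
E = np.outer(O.sum(axis=1), O.sum(axis=0)) / N   # expected counts under independence

G = 2 * np.sum(O * np.log(O / E))        # G statistic

p = O / N                                # joint cell proportions
pr = p.sum(axis=1, keepdims=True)        # row proportions
pc = p.sum(axis=0, keepdims=True)        # column proportions
MI = np.sum(p * np.log(p / (pr * pc)))   # mutual information in nats

print(G, 2 * N * MI)                     # the two numbers agree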

It can also be shown that the inverse document frequency weighting commonly used for text
retrieval is an approximation of G where the row sum for the query is much smaller than the row
sum for the remainder of the corpus. Similarly, the result of Bayesian inference applied to a
choice of single multinomial distribution for all rows of the contingency table taken together
versus the more general alternative of a separate multinomial per row produces results very
similar to the G statistic.

Distribution and usage


Given the null hypothesis that the observed frequencies result from random sampling from a
distribution with the given expected frequencies, the distribution of G is approximately that of
chi-squared, with the same number of degrees of freedom as in the corresponding chi-squared
test.

For samples of a reasonable size, the G-test and the chi-squared test will lead to the same
conclusions. However, the approximation to the theoretical chi-square distribution for the G-test
is better than for the Pearson chi-squared tests in cases where for any cell | Oi − Ei | > Ei, and in
any such case the G-test should always be used.

For very small samples the multinomial test for goodness of fit, and Fisher's exact test for
contingency tables, or even Bayesian hypothesis selection are preferable to either the chi-squared
test or the G-test.

Statistical software
• Software for the R programming language to perform the G-test is available on a professor's software page at the University of Alberta.
• Fisher's G-Test in the GeneCycle Package of the R programming language (fisher.g.test) does not implement the G-test as described in this article, but rather Fisher's exact test of Gaussian white-noise in a time series (see Fisher, R.A. 1929, "Tests of significance in harmonic analysis").
• In SAS, one can conduct a G-test by applying the /chisq option in proc freq.
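In Python, one convenient option is scipy.stats.power_divergence, whose lambda_="log-likelihood" setting computes the G statistic for a goodness-of-fit problem; the counts below are invented, and scipy is assumed to be available:

# Sketch: G-test of goodness of fit via scipy's power_divergence.
from scipy.stats import power_divergence

observed = [43, 52, 30, 25]              # hypothetical observed counts
expected = [37.5, 37.5, 37.5, 37.5]      # expected counts under the null

g_stat, p_value = power_divergence(f_obs=observed, f_exp=expected,
                                   lambda_="log-likelihood")
print(g_stat, p_value)                   # G statistic and its chi-square p-value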

Pearson's chi-square test

[Figure: a chi-squared test nomogram]

Pearson's chi-square (χ2) test is the best-known of several chi-square tests – statistical
procedures whose results are evaluated by reference to the chi-square distribution. Its properties
were first investigated by Karl Pearson. In contexts where it is important to make a distinction
between the test statistic and its distribution, names similar to Pearson Χ-squared test or
statistic are used.

It tests a null hypothesis stating that the frequency distribution of certain events observed in a
sample is consistent with a particular theoretical distribution. The events considered must be
mutually exclusive and have total probability 1. A common case for this is where the events each
cover an outcome of a categorical variable. A simple example is the hypothesis that an ordinary
six-sided die is "fair", i.e., all six outcomes are equally likely to occur.


Definition
Pearson's chi-square is used to assess two types of comparison: tests of goodness of fit and tests
of independence. A test of goodness of fit establishes whether or not an observed frequency
distribution differs from a theoretical distribution. A test of independence assesses whether paired
observations on two variables, expressed in a contingency table, are independent of each other –
for example, whether people from different regions differ in the frequency with which they
report that they support a political candidate.

The first step in the chi-square test is to calculate the chi-square statistic. In order to avoid
ambiguity, the value of the test-statistic is denoted by Χ2 rather than χ2 (i.e. uppercase chi instead
of lowercase); this also serves as a reminder that the distribution of the test statistic is not exactly
that of a chi-square random variable. However, some authors do use the χ2 notation for the test statistic. An exact test which does not rely on using the approximate χ2 distribution is Fisher's exact test: this is significantly more accurate in evaluating the significance level of the test, especially with small numbers of observations.

The chi-square statistic is calculated by finding the difference between each observed and
theoretical frequency for each possible outcome, squaring them, dividing each by the theoretical
frequency, and taking the sum of the results. A second important part of determining the test
statistic is to define the degrees of freedom of the test: this is essentially the number of observed
frequencies adjusted for the effect of using some of those observations to define the "theoretical
frequencies".

Test for fit of a distribution

Discrete uniform distribution

In this case N observations are divided among n cells. A simple application is to test the
hypothesis that, in the general population, values would occur in each cell with equal frequency.
The "theoretical frequency" for any cell (under the null hypothesis of a discrete uniform
distribution) is thus calculated as

and the reduction in the degrees of freedom is p = 1, notionally because the observed frequencies
Oi are constrained to sum to N.
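For the fair-die example mentioned earlier, a minimal sketch of the goodness-of-fit calculation in Python (the roll counts are invented, and scipy is assumed to be available):

# Sketch: chi-square test that a six-sided die is fair, using invented roll counts.
from scipy.stats import chisquare

observed = [18, 22, 16, 25, 20, 19]      # hypothetical counts from 120 rolls
# Under the discrete uniform null, every face has expected frequency N / n = 120 / 6 = 20.
result = chisquare(observed)             # equal expected frequencies are the default
print(result.statistic, result.pvalue)   # compared against chi-square with 6 - 1 = 5 df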

Other distributions

When testing whether observations are random variables whose distribution belongs to a given
family of distributions, the "theoretical frequencies" are calculated using a distribution from that
family fitted in some standard way. The reduction in the degrees of freedom is calculated as p = s
+ 1, where s is the number of parameters used in fitting the distribution. For instance, when
checking a 3-parameter Weibull distribution, p = 4, and when checking a normal distribution
(where the parameters are mean and standard deviation), p = 3. In other words, there will be n −
p degrees of freedom, where n is the number of categories.

It should be noted that the degrees of freedom are not based on the number of observations as with a Student's t or F-distribution. For example, if testing for a fair, six-sided die, there would be five degrees of freedom because there are six categories (one for each face). The number of times the die is rolled has no effect on the number of degrees of freedom.

Calculating the test-statistic

The value of the test-statistic is

Χ² = Σi=1..n (Oi − Ei)² / Ei,

where

Χ² = Pearson's cumulative test statistic, which asymptotically approaches a χ² distribution;

Oi = an observed frequency;

Ei = an expected (theoretical) frequency, asserted by the null hypothesis;

n = the number of cells in the table.

[Figure: chi-square distribution, with χ² on the x-axis and P-value on the y-axis]

The chi-square statistic can then be used to calculate a p-value by comparing the value of the
statistic to a chi-squared distribution. The number of degrees of freedom is equal to the number
of cells n, minus the reduction in degrees of freedom, p.
The result about the number of degrees of freedom is valid when the original data was
multinomial and hence the estimated parameters are efficient for minimizing the chi-square
statistic. More generally however, when maximum likelihood estimation does not coincide with
minimum chi-square estimation, the distribution will lie somewhere between a chi-square
distribution with n − 1 − p and n − 1 degrees of freedom (See for instance Chernoff and
Lehmann, 1954).
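The way estimated parameters reduce the degrees of freedom can be illustrated with a short sketch in Python. The binned counts below are invented, the open-ended top bin is treated as exactly 4 events when estimating the Poisson mean (a simplification), and scipy is assumed to be available:

# Sketch: goodness of fit to a Poisson distribution with the mean estimated from the data.
# Estimating one parameter removes one extra degree of freedom (ddof=1).
import numpy as np
from scipy.stats import poisson, chisquare

observed = np.array([35, 38, 17, 7, 3])      # hypothetical counts for 0, 1, 2, 3, >=4 events
values = np.array([0, 1, 2, 3, 4])           # treat the ">=4" bin as 4 for the estimate
N = observed.sum()
lam = (observed * values).sum() / N          # crude estimate of the Poisson mean

# Expected counts: N * P(k) for k = 0..3 and N * P(K >= 4) for the last bin.
expected = N * np.append(poisson.pmf(np.arange(4), lam), poisson.sf(3, lam))

# ddof=1 because one parameter (the mean) was estimated, so df = 5 - 1 - 1 = 3.
stat, p = chisquare(observed, f_exp=expected, ddof=1)
print(stat, p)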

Bayesian method

In Bayesian statistics, one would instead use a Dirichlet distribution as conjugate prior. If one
took a uniform prior, then the maximum likelihood estimate for the population probability is the
observed probability, and one may compute a credible region around this or another estimate.

Test of independence

In this case, an "observation" consists of the values of two outcomes and the null hypothesis is
that the occurrence of these outcomes is statistically independent. Each observation is allocated
to one cell of a two-dimensional array of cells (called a table) according to the values of the two
outcomes. If there are r rows and c columns in the table, the "theoretical frequency" for a cell, given the hypothesis of independence, is

Ei,j = (row i total) × (column j total) / N,

where N is the total sample size, and fitting the model of "independence" reduces the number of degrees of freedom by p = r + c − 1. The value of the test-statistic is

Χ² = Σi Σj (Oi,j − Ei,j)² / Ei,j.

The number of degrees of freedom is equal to the number of cells rc, minus the reduction in degrees of freedom, p, which reduces to (r − 1)(c − 1).

For the test of independence, a chi-square probability of less than or equal to 0.05 (or the chi-
square statistic being at or larger than the 0.05 critical point) is commonly interpreted by applied
workers as justification for rejecting the null hypothesis that the row variable is unrelated (that is,
only randomly related) to the column variable. The alternative hypothesis corresponds to the
variables having an association or relationship where the structure of this relationship is not
specified.
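A minimal sketch of a test of independence in Python (the 2-by-3 table of counts is invented, and scipy is assumed to be available):

# Sketch: chi-square test of independence for an r x c contingency table.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[30, 15, 25],       # e.g. region A: counts in three response categories
                  [20, 25, 35]])      # e.g. region B

chi2_stat, p_value, dof, expected = chi2_contingency(table, correction=False)
print(chi2_stat, p_value, dof)        # dof = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2
print(expected)                       # row total * column total / grand total, per cell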

Assumptions
The chi square test, when used with the standard approximation that a chi-square distribution is
applicable, has the following assumptions:
• Random sample – A random sampling of the data from a fixed distribution or population.

• Sample size (whole table) – A sample with a sufficiently large size is assumed. If a chi-square test is conducted on a sample with a smaller size, then the chi-square test will yield an inaccurate inference. A researcher who uses the chi-square test on small samples might end up committing a Type II error.

• Expected cell count – Adequate expected cell counts. Some require 5 or more, and others require 10 or more. A common rule is 5 or more in all cells of a 2-by-2 table, and 5 or more in 80% of cells in larger tables, but no cells with zero expected count. When this assumption is not met, Yates' correction is applied.

• Independence – The observations are always assumed to be independent of each other. This means chi-square cannot be used to test correlated data (like matched pairs or panel data). In those cases you might want to turn to McNemar's test.

Example
For example, to test the hypothesis that a random sample of 100 people has been drawn from a
population in which men and women are equal in frequency, the observed number of men and
women would be compared to the theoretical frequencies of 50 men and 50 women. If there were
44 men in the sample and 56 women, then

Χ² = (44 − 50)² / 50 + (56 − 50)² / 50 = 0.72 + 0.72 = 1.44.
If the null hypothesis is true (i.e., men and women are chosen with equal probability in the
sample), the test statistic will be drawn from a chi-square distribution with one degree of
freedom. Though one might expect two degrees of freedom (one each for the men and women),
we must take into account that the total number of men and women is constrained (100), and thus
there is only one degree of freedom (2 − 1). Alternatively, if the male count is known the female
count is determined, and vice-versa.

Consultation of the chi-square distribution for 1 degree of freedom shows that the probability of
observing this difference (or a more extreme difference than this) if men and women are equally
numerous in the population is approximately 0.23. This probability is higher than conventional
criteria for statistical significance (.001-.05), so normally we would not reject the null hypothesis
that the number of men in the population is the same as the number of women (i.e. we would
consider our sample within the range of what we'd expect for a 50/50 male/female ratio.)
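This calculation can be reproduced in a couple of lines of Python; a minimal sketch, assuming scipy is available:

# Sketch: the 44 men / 56 women example above.
from scipy.stats import chisquare, chi2

observed = [44, 56]
expected = [50, 50]

stat, p = chisquare(observed, f_exp=expected)
print(stat)                      # (44-50)^2/50 + (56-50)^2/50 = 1.44
print(p)                         # about 0.23, matching the text
print(chi2.sf(1.44, df=1))       # same p-value obtained directly from the distribution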

Problems
The approximation to the chi-square distribution breaks down if expected frequencies are too
low. It will normally be acceptable so long as no more than 20% of the events have expected
frequencies below 5. Where there is only 1 degree of freedom, the approximation is not reliable
if expected frequencies are below 10. In this case, a better approximation can be obtained by
reducing the absolute value of each difference between observed and expected frequencies by 0.5
before squaring; this is called Yates' correction for continuity.

In cases where the expected value, E, is found to be small (indicating either a small underlying
population probability, or a small number of observations), the normal approximation of the
multinomial distribution can fail, and in such cases it is found to be more appropriate to use the
G-test, a likelihood ratio-based test statistic. Where the total sample size is small, it is necessary
to use an appropriate exact test, typically either the binomial test or (for contingency tables)
Fisher's exact test; but note that this test assumes fixed and known marginal totals.

Distribution
The null distribution of the Pearson statistic with j rows and k columns is approximated by the
chi-square distribution with (k − 1)(j − 1) degrees of freedom.

This approximation arises as the true distribution, under the null hypothesis, if the expected value
is given by a multinomial distribution. For large sample sizes, the central limit theorem says this
distribution tends toward a certain multivariate normal distribution.

Two cells

In the special case where there are only two cells in the table, the expected values follow a binomial distribution,

Bin(n, p),

where

p = probability, under the null hypothesis;

n = number of observations in the sample.

In the above example the hypothesised probability of a male observation is 0.5, with 100
samples. Thus we expect to observe 50 males.

If n is sufficiently large, the above binomial distribution may be approximated by a Gaussian


(normal) distribution and thus the Pearson test statistic approximates a chi-squared distribution,

Let O1 be the number of observations from the sample that are in the first cell. The Pearson test statistic can be expressed as

Χ² = (O1 − np)² / (np) + (n − O1 − n(1 − p))² / (n(1 − p)),

which can in turn be expressed as

Χ² = (O1 − np)² / ( n p (1 − p) ).

By the normal approximation to a binomial this is the square of one standard normal variate, and hence is distributed as chi-square with 1 degree of freedom. Note that the denominator is one standard deviation of the Gaussian approximation, so the statistic can be written

Χ² = ( (O1 − np) / √( n p (1 − p) ) )².
So as consistent with the meaning of the chi-square distribution, we are measuring how probable
the observed number of standard deviations away from the mean is under the Gaussian
approximation (which is a good approximation for large n).

The chi-square distribution is then integrated on the right of the statistic value to obtain the P-
value, which is equal to the probability of getting a statistic equal or bigger than the observed
one, assuming the null hypothesis.

Many cells

Similar arguments as above lead to the desired result. Each cell (except the final one, whose
value is completely determined by the others) is treated as an independent binomial variable, and
their contributions are summed and each contributes one degree of freedom.

Student's t-test
A t-test is any statistical hypothesis test in which the test statistic follows a Student's t
distribution if the null hypothesis is supported. It is most commonly applied when the test
statistic would follow a normal distribution if the value of a scaling term in the test statistic were
known. When the scaling term is unknown and is replaced by an estimate based on the data, the
test statistic (under certain conditions) follows a Student's t distribution.

History
The t-statistic was introduced in 1908 by William Sealy Gosset, a chemist working for the
Guinness brewery in Dublin, Ireland ("Student" was his pen name).[1][2][3] Gosset had been hired
due to Claude Guinness's innovative policy of recruiting the best graduates from Oxford and
Cambridge to apply biochemistry and statistics to Guinness' industrial processes. Gosset devised
the t-test as a way to cheaply monitor the quality of stout. He published the test in Biometrika in
1908, but was forced to use a pen name by his employer, who regarded the fact that they were
using statistics as a trade secret. In fact, Gosset's identity was known to fellow statisticians.

Uses
Among the most frequently used t-tests are:

• A one-sample location test of whether the mean of a normally distributed population has a value specified in a null hypothesis.

• A two-sample location test of the null hypothesis that the means of two normally distributed populations are equal. All such tests are usually called Student's t-tests, though strictly speaking that name should only be used if the variances of the two populations are also assumed to be equal; the form of the test used when this assumption is dropped is sometimes called Welch's t-test. These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping.

• A test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero. For example, suppose we measure the size of a cancer patient's tumor before and after a treatment. If the treatment is effective, we expect the tumor size for many of the patients to be smaller following the treatment. This is often referred to as the "paired" or "repeated measures" t-test.

• A test of whether the slope of a regression line differs significantly from 0.

Assumptions
Most t-test statistics have the form T = Z/s, where Z and s are functions of the data. Typically, Z
is designed to be sensitive to the alternative hypothesis (i.e. its magnitude tends to be larger when
the alternative hypothesis is true), whereas s is a scaling parameter that allows the distribution of
T to be determined.

As an example, in the one-sample t-test Z = X̄ √n / σ, where X̄ is the sample mean of the data, n is the sample size, and σ is the population standard deviation of the data; s in the one-sample t-test is σ̂ / σ, where σ̂ is the sample standard deviation.

The assumptions underlying a t-test are that

• Z follows a standard normal distribution under the null hypothesis;

• ps² follows a χ² distribution with p degrees of freedom under the null hypothesis, where p is a positive constant;

• Z and s are independent.

In a specific type of t-test, these conditions are consequences of the population being studied,
and of the way in which the data are sampled. For example, in the t-test comparing the means of
two independent samples, the following assumptions should be met:

• Each of the two populations being compared should follow a normal distribution (which can be tested using a normality test, such as the Shapiro-Wilk and Kolmogorov–Smirnov tests, or which can be assessed graphically using a normal quantile plot).

• If using Student's original definition of the t-test, the two populations being compared should have the same variance (testable using Levene's test, Bartlett's test, or the Brown–Forsythe test; or assessable graphically using a normal quantile plot). If the sample sizes in the two groups being compared are roughly equal, Student's original t-test is highly robust to the presence of unequal variances [7]. Welch's t-test is insensitive to equality of the variances regardless of whether the sample sizes are similar.

• The data used to carry out the test should be sampled independently from the two populations being compared. This is in general not testable from the data, but if the data are known to be dependently sampled (i.e. if they were sampled in clusters), then the classical t-tests discussed here may give misleading results.

Unpaired and paired two-sample t-tests

Two-sample t-tests for a difference in mean can be either unpaired or paired. Paired t-tests are a
form of blocking, and have greater power than unpaired tests when the paired units are similar
with respect to "noise factors" that are independent of membership in the two groups being
compared. In a different context, paired t-tests can be used to reduce the effects of confounding
factors in an observational study.

The unpaired, or "independent samples" t-test is used when two separate sets of independent and
identically distributed samples are obtained, one from each of the two populations being
compared. For example, suppose we are evaluating the effect of a medical treatment, and we
enroll 100 subjects into our study, then randomize 50 subjects to the treatment group and 50
subjects to the control group. In this case, we have two independent samples and would use the
unpaired form of the t-test. The randomization is not essential here—if we contacted 100 people
by phone and obtained each person's age and gender, and then used a two-sample t-test to see
whether the mean ages differ by gender, this would also be an independent samples t-test, even
though the data are observational.

Dependent samples (or "paired") t-tests typically consist of a sample of matched pairs of similar
units, or one group of units that has been tested twice (a "repeated measures" t-test). A typical
example of the repeated measures t-test would be where subjects are tested prior to a treatment,
say for high blood pressure, and the same subjects are tested again after treatment with a blood-
pressure lowering medication.

A dependent t-test based on a "matched-pairs sample" results from an unpaired sample that is
subsequently used to form a paired sample, by using additional variables that were measured
along with the variable of interest. The matching is carried out by identifying pairs of values
consisting of one observation from each of the two samples, where the pair is similar in terms of
other measured variables. This approach is often used in observational studies to reduce or
eliminate the effects of confounding factors.

Calculations
Explicit expressions that can be used to carry out various t-tests are given below. In each case,
the formula for a test statistic that either exactly follows or closely approximates a t-distribution
under the null hypothesis is given. Also, the appropriate degrees of freedom are given in each
case. Each of these statistics can be used to carry out either a one-tailed test or a two-tailed test.

Once a t value is determined, a p-value can be found using a table of values from Student's t-
distribution. If the calculated p-value is below the threshold chosen for statistical significance
(usually the 0.10, the 0.05, or 0.01 level), then the null hypothesis is rejected in favor of the
alternative hypothesis.

Independent one-sample t-test

In testing the null hypothesis that the population mean is equal to a specified value μ0, one uses the statistic

t = (x̄ − μ0) / (s / √n),

where x̄ is the sample mean, s is the sample standard deviation of the sample and n is the sample size. The degrees of freedom used in this test is n − 1.
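A minimal sketch of the one-sample test in Python (the measurements and μ0 below are invented, and scipy is assumed to be available):

# Sketch: one-sample t-test of H0: population mean equals mu0.
import numpy as np
from scipy import stats

x = np.array([5.1, 4.9, 5.6, 5.2, 4.8, 5.4, 5.0, 5.3])    # hypothetical measurements
mu0 = 5.0

n = len(x)
t_stat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))  # (x-bar - mu0) / (s / sqrt(n))
p_two_sided = 2 * stats.t.sf(abs(t_stat), df=n - 1)       # compare with t, n - 1 df

print(t_stat, p_two_sided)
print(stats.ttest_1samp(x, popmean=mu0))                  # scipy gives the same result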

Slope of a regression line

Suppose one is fitting the model

Yi = α + β xi + εi,

where xi, i = 1, ..., n are known, α and β are unknown, εi are independent normally distributed random errors with expected value 0 and unknown variance σ², and Yi, i = 1, ..., n are observed. It is desired to test the null hypothesis that the slope β is equal to some specified value β0 (often taken to be 0, in which case the hypothesis is that x and y are unrelated).

Let α̂ and β̂ be the least-squares estimators of α and β, and let SEβ̂ denote the standard error of the slope estimator. Then

tscore = (β̂ − β0) / SEβ̂

has a t-distribution with n − 2 degrees of freedom if the null hypothesis is true. The standard error of the slope coefficient,

SEβ̂ = √( ( Σi ε̂i² / (n − 2) ) / Σi (xi − x̄)² ),

can be written in terms of the residuals ε̂i = Yi − α̂ − β̂ xi. Letting SSR = Σi ε̂i² denote the sum of squared residuals, tscore is given by

tscore = (β̂ − β0) √(n − 2) / √( SSR / Σi (xi − x̄)² ).
Independent two-sample t-test

Equal sample sizes, equal variance

This test is only used when both:

 the two sample sizes (that is, the number, n, of participants of each group) are equal;
 it can be assumed that the two distributions have the same variance.

Violations of these assumptions are discussed below.

The t statistic to test whether the means are different can be calculated as follows:

t = (X̄1 − X̄2) / ( sX1X2 · √(2/n) ),

where

sX1X2 = √( (s²X1 + s²X2) / 2 ).

Here sX1X2 is the grand standard deviation (or pooled standard deviation), 1 = group one, 2 = group two. The denominator of t is the standard error of the difference between two means.

For significance testing, the degrees of freedom for this test is 2n − 2 where n is the number of participants in each group.

Unequal sample sizes, equal variance

This test is used only when it can be assumed that the two distributions have the same variance.
(When this assumption is violated, see below.) The t statistic to test whether the means are
different can be calculated as follows:

t = (X̄1 − X̄2) / ( sX1X2 · √(1/n1 + 1/n2) ),

where

sX1X2 = √( ( (n1 − 1) s²X1 + (n2 − 1) s²X2 ) / (n1 + n2 − 2) ).

Note that the formulae above generalize the equal-sample-size case (substitute n for both n1 and n2 to recover the earlier formulae).

sX1X2 is an estimator of the common standard deviation of the two samples: it is defined in this way so that its square is an unbiased estimator of the common variance whether or not the population means are the same. In these formulae, ni = number of participants in group i, 1 = group one, 2 = group two. ni − 1 is the number of degrees of freedom for group i, and the total sample size minus two (that is, n1 + n2 − 2) is the total number of degrees of freedom, which is used in significance testing.

Unequal sample sizes, unequal variance

This test is used only when the two population variances are assumed to be different (the two
sample sizes may or may not be equal) and hence must be estimated separately. See also Welch's
t-test. The t statistic to test whether the population means are different can be calculated as
follows:

t = (X̄1 − X̄2) / sX̄1−X̄2,

where

sX̄1−X̄2 = √( s1²/n1 + s2²/n2 ).

Here si² is the unbiased estimator of the variance of sample i, and ni = number of participants in group i (1 = group one, 2 = group two). Note that in this case sX̄1−X̄2² is not a pooled variance. For use in significance testing, the distribution of the test statistic is approximated as being an ordinary Student's t distribution with the degrees of freedom calculated using

d.f. = ( s1²/n1 + s2²/n2 )² / ( (s1²/n1)² / (n1 − 1) + (s2²/n2)² / (n2 − 1) ).
This is called the Welch–Satterthwaite equation. Note that the true distribution of the test statistic
actually depends (slightly) on the two unknown variances: see Behrens–Fisher problem.
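A sketch of the Welch statistic and the Welch–Satterthwaite degrees of freedom in Python (the two samples are invented, and numpy/scipy are assumed to be available):

# Sketch: Welch's t-test for two samples with unequal variances.
import numpy as np
from scipy import stats

x1 = np.array([19.8, 20.4, 19.6, 17.8, 18.5, 18.9, 18.3, 18.9])   # hypothetical data
x2 = np.array([28.2, 26.6, 20.1, 23.3, 25.2, 22.1, 17.7, 27.6, 20.6])

v1, v2 = x1.var(ddof=1), x2.var(ddof=1)
n1, n2 = len(x1), len(x2)

t_stat = (x1.mean() - x2.mean()) / np.sqrt(v1 / n1 + v2 / n2)
df = (v1 / n1 + v2 / n2) ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
p_value = 2 * stats.t.sf(abs(t_stat), df)

print(t_stat, df, p_value)
print(stats.ttest_ind(x1, x2, equal_var=False))   # scipy's Welch test gives the same answer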

Dependent t-test for paired samples

This test is used when the samples are dependent; that is, when there is only one sample that has
been tested twice (repeated measures) or when there are two samples that have been matched or
"paired". This is an example of a paired difference test.

For this test, the differences between all pairs must be calculated. The pairs are either one person's pre-test and post-test scores or pairs of persons matched into meaningful groups (for instance drawn from the same family or age group: see table). The average (X̄D) and standard deviation (sD) of those differences are used in the equation

t = (X̄D − μ0) / (sD / √n).

The constant μ0 is non-zero if you want to test whether the average of the difference is significantly different from μ0. The degrees of freedom used is n − 1, where n is the number of pairs.
Example of repeated measures

Number Name Test 1 Test 2

1 Mike 35% 67%

2 Melanie 50% 46%

3 Melissa 90% 86%

4 Mitchell 78% 91%

Example of matched pairs

Pair Name Age Test

1 John 35 250

1 Jane 36 340

2 Jimmy 22 460

2 Jessy 21 200
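Using the Test 1 and Test 2 scores from the repeated-measures table above, a minimal sketch of the paired calculation in Python (scipy assumed available; the percentage signs are dropped):

# Sketch: dependent (paired) t-test on the repeated-measures scores above.
import numpy as np
from scipy import stats

test1 = np.array([35.0, 50.0, 90.0, 78.0])        # Mike, Melanie, Melissa, Mitchell
test2 = np.array([67.0, 46.0, 86.0, 91.0])

d = test2 - test1                                 # per-subject differences
n = len(d)
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(n))  # mu0 = 0: is the average change zero?
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

print(t_stat, p_value)
print(stats.ttest_rel(test2, test1))              # same result from scipy's paired test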

Worked examples
Let A1 denote a set obtained by taking 6 random samples out of a larger set:

A1 = {30.02,29.99,30.11,29.97,30.01,29.99}

and let A2 denote a second set obtained similarly:

A2 = {29.89,29.93,29.72,29.98,30.02,29.98}

These could be, for example, the weights of screws that were chosen out of a bucket.

We will carry out tests of the null hypothesis that the means of the populations from which the
two samples were taken are equal.

The difference between the two sample means, each denoted by X̄i, which appears in the numerator for all the two-sample testing approaches discussed above, is

X̄1 − X̄2 = 30.015 − 29.920 = 0.095.
The sample standard deviations for the two samples are approximately 0.05 and 0.11,
respectively. For such small samples, a test of equality between the two population variances
would not be very powerful. Since the sample sizes are equal, the two forms of the two sample t-
test will perform similarly in this example.

Unequal variances

If we follow the approach for unequal variances, discussed above, we get the standard error of the difference

√( s1²/n1 + s2²/n2 ) ≈ 0.0485

and the Welch–Satterthwaite degrees of freedom

d.f. ≈ 7.03.

The test statistic is approximately 1.959. The two-tailed test p-value is approximately 0.091 and the one-tailed p-value is approximately 0.045.

Equal variances

If we follow the approach for equal variances, discussed above, we get the pooled standard deviation

sX1X2 ≈ 0.084

and

d.f. = n1 + n2 − 2 = 10.

Since the sample sizes are equal (both are 6), the test statistic is again approximately equal to
1.959. Since the degrees of freedom is different from what it is in the unequal variances test, the
p-values will differ slightly from what was found above. Here, the two-tailed p-value is
approximately 0.078, and the one-tailed p-value is approximately 0.039. Thus if there is good
reason to believe that the population variances are equal, the results become somewhat more
suggestive of a difference in the mean weights for the two populations of screws.
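The worked example above can be checked with a few lines of Python, using the same A1 and A2 values (scipy assumed available):

# Sketch: reproducing the screw-weight example with scipy.
import numpy as np
from scipy import stats

A1 = np.array([30.02, 29.99, 30.11, 29.97, 30.01, 29.99])
A2 = np.array([29.89, 29.93, 29.72, 29.98, 30.02, 29.98])

print(A1.mean() - A2.mean())                       # difference of means, about 0.095
print(A1.std(ddof=1), A2.std(ddof=1))              # about 0.05 and 0.11

print(stats.ttest_ind(A1, A2, equal_var=False))    # unequal variances: t ~ 1.96, p ~ 0.091
print(stats.ttest_ind(A1, A2, equal_var=True))     # equal variances:   t ~ 1.96, p ~ 0.078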

Alternatives to the t-test for location problems


The t-test provides an exact test for the equality of the means of two normal populations with unknown, but equal, variances. (Welch's t-test is a nearly exact test for the case where the data are normal but the variances may differ.) For moderately large samples and a one-tailed test, the t-test is relatively robust to moderate violations of the normality assumption.
For exactness, the t-test and Z-test require normality of the sample means, and the t-test
additionally requires that the sample variance follows a scaled χ2 distribution, and that the sample
mean and sample variance be statistically independent. Normality of the individual data values is
not required if these conditions are met. By the central limit theorem, sample means of
moderately large samples are often well-approximated by a normal distribution even if the data
are not normally distributed. For non-normal data, the distribution of the sample variance may
deviate substantially from a χ2 distribution. However, if the sample size is large, Slutsky's
theorem implies that the distribution of the sample variance has little effect on the distribution of
the test statistic. If the data are substantially non-normal and the sample size is small, the t-test
can give misleading results. See Location test for Gaussian scale mixture distributions for some
theory related to one particular family of non-normal distributions.

When the normality assumption does not hold, a non-parametric alternative to the t-test can often
have better statistical power. For example, for two independent samples when the data
distributions are asymmetric (that is, the distributions are skewed) or the distributions have large
tails, then the Wilcoxon Rank Sum test (also known as the Mann-Whitney U test) can have three
to four times higher power than the t-test.[10][11][12] The nonparametric counterpart to the paired
samples t test is the Wilcoxon signed-rank test for paired samples. For a discussion on choosing
between the t and nonparametric alternatives, see Sawilowsky.

One-way analysis of variance generalizes the two-sample t-test when the data belong to more
than two groups.

Multivariate testing
A generalization of Student's t statistic, called Hotelling's T-square statistic, allows for the testing
of hypotheses on multiple (often correlated) measures within the same sample. For instance, a
researcher might submit a number of subjects to a personality test consisting of multiple
personality scales (e.g. the Big Five). Because measures of this type are usually highly
correlated, it is not advisable to conduct separate univariate t-tests to test hypotheses, as these
would neglect the covariance among measures and inflate the chance of falsely rejecting at least
one hypothesis (Type I error). In this case a single multivariate test is preferable for hypothesis
testing. Hotelling's T 2 statistic follows a T 2 distribution. However, in practice the distribution is
rarely used, and instead converted to an F distribution.

One-sample T 2 test

For a one-sample multivariate test, the hypothesis is that the mean vector (μ) is equal to a given vector (μ0). The test statistic is defined as

T² = n (x̄ − μ0)′ S⁻¹ (x̄ − μ0),

where n is the sample size, x̄ is the vector of column means and S is the sample covariance matrix.

Two-sample T² test

For a two-sample multivariate test, the hypothesis is that the mean vectors (μ1, μ2) of the two samples are equal. The test statistic is defined as

T² = ( n1 n2 / (n1 + n2) ) (x̄1 − x̄2)′ S_pooled⁻¹ (x̄1 − x̄2),

where S_pooled is the pooled sample covariance matrix.
Implementations
Most spreadsheet programs and statistics packages, such as OpenOffice.org Calc, Microsoft
Excel, SPSS, DAP, gretl, R, Python, PSPP, and Minitab, include
implementations of Student's t-test.

F-test
An F-test is any statistical test in which the test statistic has an F-distribution under the null
hypothesis. It is most often used when comparing statistical models that have been fit to a data
set, in order to identify the model that best fits the population from which the data were sampled.
Exact F-tests mainly arise when the models have been fit to the data using least squares. The
name was coined by George W. Snedecor, in honour of Sir Ronald A. Fisher. Fisher initially
developed the statistic as the variance ratio in the 1920s.

Common examples of F-tests


Examples of F-tests include:

• The hypothesis that the means of several normally distributed populations, all having the same standard deviation, are equal. This is perhaps the best-known F-test, and plays an important role in the analysis of variance (ANOVA).

• The hypothesis that a proposed regression model fits the data well. See Lack-of-fit sum of squares.

• The hypothesis that a data set in a regression analysis follows the simpler of two proposed linear models that are nested within each other.

• Scheffé's method for multiple comparisons adjustment in linear models.

F-test of the equality of two variances


Main article: F-test of equality of variances

This F-test is extremely sensitive to non-normality. Alternatives are Levene's test, Bartlett's test,
and the Brown–Forsythe test. However, when any of these tests are conducted to test the
underlying assumption of homoscedasticity (i.e. homogeneity of variance), as a preliminary step
to testing for mean effects in ANOVA, there is an increase in the experiment-wise Type I error
rate.

Formula and calculation


Most F-tests arise by considering a decomposition of the variability in a collection of data in
terms of sums of squares. The test statistic in an F-test is the ratio of two scaled sums of squares
reflecting different sources of variability. These sums of squares are constructed so that the
statistic tends to be greater when the null hypothesis is not true. In order for the statistic to follow
the F-distribution under the null hypothesis, the sums of squares should be statistically
independent, and each should follow a scaled chi-squared distribution. The latter condition is
guaranteed if the data values are independent and normally distributed with a common variance.

Multiple-comparison ANOVA problems

The F-test in one-way analysis of variance is used to assess whether the expected values of a
quantitative variable within several pre-defined groups differ from each other. For example,
suppose that a medical trial compares four treatments. The ANOVA F-test can be used to assess
whether any of the treatments is on average superior, or inferior, to the others versus the null
hypothesis that all four treatments yield the same mean response. This is an example of an
"omnibus" test, meaning that a single test is performed to detect any of several possible
differences. Alternatively, we could carry out pairwise tests among the treatments (for instance,
in the medical trial example with four treatments we could carry out six tests among pairs of
treatments). The advantage of the ANOVA F-test is that we do not need to pre-specify which
treatments are to be compared, and we do not need to adjust for making multiple comparisons.
The disadvantage of the ANOVA F-test is that if we reject the null hypothesis, we do not know
which treatments can be said to be significantly different from the others — if the F-test is
performed at level α we cannot state that the treatment pair with the greatest mean difference is
significantly different at level α.

The formula for the one-way ANOVA F-test statistic is

F = (explained variance) / (unexplained variance),

or

F = (between-group variability) / (within-group variability).

The "explained variance", or "between-group variability", is

Σi ni (Ȳi· − Ȳ)² / (K − 1),

where Ȳi· denotes the sample mean in the ith group, ni is the number of observations in the ith group, and Ȳ denotes the overall mean of the data.

The "unexplained variance", or "within-group variability", is

Σij (Yij − Ȳi·)² / (N − K),

where Yij is the jth observation in the ith out of K groups and N is the overall sample size. This F-statistic follows the F-distribution with K − 1, N − K degrees of freedom under the null hypothesis. The statistic will be large if the between-group variability is large relative to the within-group variability, which is unlikely to happen if the population means of the groups all have the same value.

Note that when there are only two groups for the one-way ANOVA F-test, F = t2 where t is the
Student's t statistic.

Regression problems

Consider two models, 1 and 2, where model 1 is 'nested' within model 2. That is, model 1 has p1
parameters, and model 2 has p2 parameters, where p2 > p1, and for any choice of parameters in
model 1, the same regression curve can be achieved by some choice of the parameters of model
2. (We use the convention that any constant parameter in a model is included when counting the
parameters. For instance, the simple linear model y = mx + b has p = 2 under this convention.)
The model with more parameters will always be able to fit the data at least as well as the model
with fewer parameters. Thus typically model 2 will give a better (i.e. lower error) fit to the data
than model 1. But one often wants to determine whether model 2 gives a significantly better fit to
the data. One approach to this problem is to use an F test.

If there are n data points to estimate the parameters of both models from, then one can calculate the F statistic, given by

F = ( (RSS1 − RSS2) / (p2 − p1) ) / ( RSS2 / (n − p2) ),

where RSSi is the residual sum of squares of model i. If the regression model has been calculated with weights, then replace RSSi with χ2, the weighted sum of squared residuals. Under
the null hypothesis that model 2 does not provide a significantly better fit than model 1, F will
have an F distribution, with (p2 − p1, n − p2) degrees of freedom. The null hypothesis is rejected
if the F calculated from the data is greater than the critical value of the F distribution for some
desired false-rejection probability (e.g. 0.05). The test is a likelihood ratio test.
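A sketch of the calculation from residual sums of squares in Python (the RSS values, parameter counts, and sample size are invented, and scipy is assumed to be available):

# Sketch: F-test comparing a nested model 1 (p1 parameters) with model 2 (p2 parameters).
from scipy import stats

rss1, p1 = 152.3, 2      # hypothetical residual sum of squares and parameter count, model 1
rss2, p2 = 110.8, 4      # model 2 contains model 1 and fits at least as well by construction
n = 50                   # number of data points

F = ((rss1 - rss2) / (p2 - p1)) / (rss2 / (n - p2))
p_value = stats.f.sf(F, p2 - p1, n - p2)

print(F, p_value)        # reject "model 2 is no better" if p_value is below the chosen alpha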

One-way ANOVA example


Consider an experiment to study the effect of three different levels of some factor on a response
(e.g. three types of fertilizer on plant growth). If we had 6 observations for each level, we could
write the outcome of the experiment in a table like this, where a1, a2, and a3 are the three levels
of the factor being studied.

a1 a2 a3

6 8 13

8 12 9

4 9 11

5 11 8

3 6 7

4 8 12

The null hypothesis, denoted H0, for the overall F-test for this experiment would be that all three
levels of the factor produce the same response, on average. To calculate the F-ratio:

Step 1: Calculate the mean within each group:

Ȳ1 = (6 + 8 + 4 + 5 + 3 + 4) / 6 = 5, Ȳ2 = (8 + 12 + 9 + 11 + 6 + 8) / 6 = 9, Ȳ3 = (13 + 9 + 11 + 8 + 7 + 12) / 6 = 10.

Step 2: Calculate the overall mean:

Ȳ = (Ȳ1 + Ȳ2 + Ȳ3) / a = (5 + 9 + 10) / 3 = 8,

where a is the number of groups.

Step 3: Calculate the "between-group" sum of squares:

SSB = n [ (Ȳ1 − Ȳ)² + (Ȳ2 − Ȳ)² + (Ȳ3 − Ȳ)² ] = 6 [ (−3)² + 1² + 2² ] = 84,

where n is the number of data values per group.

The between-group degrees of freedom is one less than the number of groups

dfb = 3 − 1 = 2

so the between-group mean square value is

MSB = 84 / 2 = 42

Step 4: Calculate the "within-group" sum of squares. Begin by centering the data in each group

a1 a2 a3

6 − 5 = 1 8 − 9 = -1 13 − 10 = 3

8 − 5 = 3 12 − 9 = 3 9 − 10 = -1

4 − 5 = -1 9 − 9 = 0 11 − 10 = 1

5 − 5 = 0 11 − 9 = 2 8 − 10 = -2

3 − 5 = -2 6 − 9 = -3 7 − 10 = -3

4 − 5 = -1 8 − 9 = -1 12 − 10 = 2

The within-group sum of squares is the sum of squares of all 18 values in this table

SSW = 1 + 9 + 1 + 0 + 4 + 1 + 1 + 9 + 0 + 4 + 9 + 1 + 9 + 1 + 1 + 4 + 9 + 4 = 68

The within-group degrees of freedom is

dfW = a(n − 1) = 3(6 − 1) = 15


Thus the within-group mean square value is

MSW = SSW / dfW = 68 / 15 ≈ 4.5.

Step 5: The F-ratio is

F = MSB / MSW = 42 / 4.5 ≈ 9.3.

The critical value is the number that the test statistic must exceed to reject the test. In this case,
Fcrit(2,15) = 3.68 at α = 0.05. Since F = 9.3 > 3.68, the results are significant at the 5%
significance level. One would reject the null hypothesis, concluding that there is strong evidence
that the expected values in the three groups differ. The p-value for this test is 0.002.

After performing the F-test, it is common to carry out some "post-hoc" analysis of the group
means. In this case, the first two group means differ by 4 units, the first and third group means
differ by 5 units, and the second and third group means differ by only 1 unit. The standard error

of each of these differences is √(2 · MSW / n) = √(2 × 4.53 / 6) ≈ 1.2. Thus the first group is strongly different from the other groups, as the mean difference is more than 3 times the standard error, so we can be highly confident that the population mean of the first group differs from the population means of the other groups. However, there is no evidence that the second and third groups have different population means from each other, as their mean difference of one unit is comparable to the standard error.
Note F(x, y) denotes an F-distribution with x degrees of freedom in the numerator and y degrees
of freedom in the denominator.
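The one-way ANOVA example above can be reproduced with scipy; a minimal sketch using the same three groups (scipy assumed available):

# Sketch: re-running the three-level fertilizer example with scipy's one-way ANOVA.
from scipy import stats

a1 = [6, 8, 4, 5, 3, 4]
a2 = [8, 12, 9, 11, 6, 8]
a3 = [13, 9, 11, 8, 7, 12]

result = stats.f_oneway(a1, a2, a3)
print(result.statistic, result.pvalue)   # F about 9.3 with (2, 15) df, p about 0.002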

ANOVA's robustness with respect to Type I errors for departures from population normality

The one-way ANOVA can be generalized to the factorial and multivariate layouts, as well as to the analysis of covariance. None of these F-tests, however, are robust when there are severe violations of the assumption that each population follows the normal distribution, particularly for small alpha levels and unbalanced layouts. Furthermore, if the underlying assumption of homoscedasticity is violated, the Type I error properties degenerate much more severely. For nonparametric alternatives in the factorial layout, see Sawilowsky. For more discussion see ANOVA.

CHI-SQUARE TEST
Adapted by Anne F. Maben from "Statistics for the Social Sciences" by Vicki Sharp
The chi-square (X²) test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories. Do the numbers of individuals or objects that fall in each category differ significantly from the numbers you would expect? Is this difference between the expected and observed due to sampling error, or is it a real difference?
Chi-Square Test Requirements
1. Quantitative data.
2. One or more categories.
3. Independent observations.
4. Adequate sample size (at least 10).
5. Simple random sample.
6. Data in frequency form.
7. All observations must be used.
Expected Frequencies
When you find the value for chi square, you determine whether the observed frequencies differ
significantly
from the expected frequencies. You find the expected frequencies for chi square in three ways:
1. You hypothesize that all the frequencies are equal in each category. For example, you might expect that half of the entering freshmen class of 200 at Tech College will be identified as women and half as men. You figure the expected frequency by dividing the number in the sample by the number of categories. In this example, where there are 200 entering freshmen and two categories, male and female, you divide your sample of 200 by 2, the number of categories, to get 100 (expected frequencies) in each category.
2. You determine the expected frequencies on the basis of some prior knowledge. Let's use the Tech
College
example again, but this time pretend we have prior knowledge of the frequencies of men and women in
each
category from last year's entering class, when 60% of the freshmen were men and 40% were women.
This
year you might expect that 60% of the total would be men and 40% would be women. You find the
expected
frequencies by multiplying the sample size by each of the hypothesized population proportions. If the
freshmen total were 200, you would expect 120 to be men (60% x 200) and 80 to be women (40% x 200).
Now let's take a situation, find the expected frequencies, and use the chi-square test to solve the
problem.
Situation
Thai, the manager of a car dealership, did not want to stock cars that were bought less frequently
because of
their unpopular color. The five colors that she ordered were red, yellow, green, blue, and white. According
to Thai,
the expected frequencies or number of customers choosing each color should follow the percentages of
last year.
She felt 20% would choose yellow, 30% would choose red, 10% would choose green, 10% would choose
blue,
and 30% would choose white. She now took a random sample of 150 customers and asked them their
color
preferences. The results of this poll are shown in Table 1 under the column labeled "observed frequencies."
Table 1 - Color Preference for 150 Customers for Thai's Superior Car Dealership

Color     Observed Frequencies     Expected Frequencies
Yellow            35                        30
Red               50                        45
Green             30                        15
Blue              10                        15
White             25                        45
The expected frequencies in Table 1 are figured from last year's percentages. Based on the percentages
for
last year, we would expect 20% to choose yellow. Figure the expected frequencies for yellow by taking
20% of
the 150 customers, getting an expected frequency of 30 people for this category. For the color red we
would
expect 30% out of 150 or 45 people to fall in this category. Using this method, Thai figured out the
expected
frequencies 30, 45, 15, 15, and 45. Obviously, there are discrepancies between the colors preferred by
customers
in the poll taken by Thai and the colors preferred by the customers who bought their cars last year. Most
striking
is the difference in the green and white colors. If Thai were to follow the results of her poll, she would stock twice as many green cars as she would if she followed the customer color preference for green based on last year's sales.
In the case of white cars, she would stock half as many this year. What to do??? Thai needs to know
whether or
not the discrepancies between last year's choices (expected frequencies) and this year's preferences on
the basis
of her poll (observed frequencies) demonstrate a real change in customer color preferences. It could be
that the
differences are simply a result of the random sample she chanced to select. If so, then the population of
customers
really has not changed from last year as far as color preferences go. The null hypothesis states that there
is no significant difference between the expected and observed frequencies. The alternative hypothesis
states
they are different. The level of significance (the point at which you can say with 95% confidence that the
difference is NOT due to chance alone) is set at .05 (the standard for most science experiments.) The chi-
square
formula used on these data is
X² = Σ (O − E)² / E

where O is the Observed Frequency in each category,
E is the Expected Frequency in the corresponding category,
Σ means "sum of",
df is the "degrees of freedom" (n − 1),
X² is Chi Square.
PROCEDURE
We are now ready to use our formula for X2 and find out if there is a significant difference between the
observed and expected frequencies for the customers in choosing cars. We will set up a worksheet; then
you will
follow the directions to form the columns and solve the formula.
1. Directions for Setting Up Worksheet for Chi Square

Category    O     E     (O − E)    (O − E)²    (O − E)² / E
yellow      35    30       5          25          0.83
red         50    45       5          25          0.56
green       30    15      15         225         15.00
blue        10    15      -5          25          1.67
white       25    45     -20         400          8.89

                                            X² = 26.95
2. After calculating the Chi Square value, find the "Degrees of Freedom." (DO NOT SQUARE THE NUMBER YOU GET, NOR FIND THE SQUARE ROOT - THE NUMBER YOU GET FROM COMPLETING THE CALCULATIONS AS ABOVE IS CHI SQUARE.)
Degrees of freedom (df) refers to the number of values that are free to vary after restriction has been
placed on the data. For instance, if you have four numbers with the restriction that their sum has to be 50,
then three of these numbers can be anything, they are free to vary, but the fourth number definitely is
restricted. For example, the first three numbers could be 15, 20, and 5, adding up to 40; then the fourth
number has to be 10 in order that they sum to 50. The degrees of freedom for these values are then
three.
The degrees of freedom here is defined as N - 1, the number in the group minus one restriction (4 - 1).
3. Find the table value for Chi Square. Begin by finding the df found in step 2 along the left hand side of
the
table. Run your fingers across the proper row until you reach the predetermined level of significance (.05)
at
the column heading on the top of the table. The table value for Chi Square in the correct box of 4 df and
P=.05 level of significance is 9.49.
4. If the calculated chi-square value for the set of data you are analyzing (26.95) is equal to or greater
than the
table value (9.49 ), reject the null hypothesis. There IS a significant difference between the data sets
that
cannot be due to chance alone. If the number you calculate is LESS than the number you find on the table, then you can probably say that any differences are due to chance alone.
In this situation, the rejection of the null hypothesis means that the differences between the expected
frequencies (based upon last year's car sales) and the observed frequencies (based upon this year's poll
taken by Thai) are not due to chance. That is, they are not due to chance variation in the sample Thai
took;
there is a real difference between them. Therefore, in deciding what color autos to stock, it would be to
Thai's
advantage to pay careful attention to the results of her poll!
The steps in using the chi-square test may be summarized as follows:
1. Write the observed frequencies in column O.
2. Figure the expected frequencies and write them in column E.
3. Use the formula to find the chi-square value: X² = Σ (O − E)² / E.
4. Find the df: (N − 1).
5. Find the table value (consult the Chi Square Table).
6. If your chi-square value is equal to or greater than the table value, reject the null hypothesis: differences in your data are not due to chance alone.
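The worksheet calculation from the car-color example above can also be checked in a couple of lines; a minimal sketch in Python, assuming scipy is available:

# Sketch: the car-color example (observed poll counts versus last year's expectations).
from scipy.stats import chisquare

observed = [35, 50, 30, 10, 25]      # yellow, red, green, blue, white
expected = [30, 45, 15, 15, 45]

stat, p = chisquare(observed, f_exp=expected)
print(stat)    # about 26.95, matching the worksheet
print(p)       # well below .05 with 4 df, so the null hypothesis is rejected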
For example, the reason observed frequencies in a fruit fly genetic breeding lab did not match expected
frequencies could be due to such influences as:
• Mate selection (certain flies may prefer certain mates)
• Too small of a sample size was used
• Incorrect identification of male or female flies
• The wrong genetic cross was sent from the lab

• The flies were mixed in the bottle (carrying unexpected alleles)

Chi-Square Test
Chi-square is a statistical test commonly used to compare observed data with data we would
expect to obtain according to a specific hypothesis. For example, if, according to Mendel's laws,
you expected 10 of 20 offspring from a cross to be male and the actual observed number was 8
males, then you might want to know about the "goodness of fit" between the observed and
expected. Were the deviations (differences between observed and expected) the result of chance,
or were they due to other factors? How much deviation can occur before you, the investigator,
must conclude that something other than chance is at work, causing the observed to differ from
the expected? The chi-square test is always testing what scientists call the null hypothesis, which
states that there is no significant difference between the expected and observed result.
The formula for calculating chi-square (χ²) is:

χ² = Σ (o - e)² / e
That is, chi-square is the sum of the squared difference between observed (o) and the expected
(e) data (or the deviation, d), divided by the expected data in all possible categories.

For example, suppose that a cross between two pea plants yields a population of 880 plants, 639
with green seeds and 241 with yellow seeds. You are asked to propose the genotypes of the
parents. Your hypothesis is that the allele for green is dominant to the allele for yellow and that
the parent plants were both heterozygous for this trait. If your hypothesis is true, then the
predicted ratio of offspring from this cross would be 3:1 (based on Mendel's laws) as predicted
from the results of the Punnett square (Figure B. 1).

Figure B.1 - Punnett square showing the predicted offspring from a cross between green- and yellow-seeded plants. Green (G) is dominant (3/4 green; 1/4 yellow). [Figure not reproduced here.]

To calculate χ², first determine the number expected in each category. If the ratio is 3:1 and the
total number of observed individuals is 880, then the expected numerical values should be 660
green and 220 yellow.

Chi-square requires that you use numerical values, not percentages or ratios.

Then calculate χ² using this formula, as shown in Table B.1. Note that we get a value of 2.668 for
χ². But what does this number mean? Here's how to interpret the χ² value:

1. Determine degrees of freedom (df). Degrees of freedom can be calculated as the number of
categories in the problem minus 1. In our example, there are two categories (green and yellow);
therefore, there is 1 degree of freedom.

2. Determine a relative standard to serve as the basis for accepting or rejecting the hypothesis.
The relative standard commonly used in biological research is p = 0.05. The p value is the
probability that a deviation as large as the one observed would arise by chance alone (no other
forces acting). Using this standard, you reject the hypothesis only when such a deviation would
be expected by chance 5% of the time or less (p ≤ 0.05).

3. Refer to a chi-square distribution table (Table B.2). Using the appropriate degrees of freedom,
locate the value closest to your calculated chi-square in the table. Determine the closest p
(probability) value associated with your chi-square and degrees of freedom. In this case
(χ² = 2.668), the p value is about 0.10, which means that there is a 10% probability that any
deviation from expected results is due to chance only. Based on our standard of p > 0.05, this is
within the range of acceptable deviation. In terms of your hypothesis for this example, the
observed chi-square is not significantly different from expected. The observed numbers are
consistent with those expected under Mendel's laws.
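For readers who want to check this arithmetic in software, here is a minimal sketch using Python's scipy library (an assumed dependency, not part of this manual); it reproduces the χ² value and the associated p value for the pea cross:

from scipy import stats

observed = [639, 241]   # green and yellow seeds actually counted
expected = [660, 220]   # 3:1 ratio applied to the 880 offspring

chi_square, p_value = stats.chisquare(observed, f_exp=expected)
print(round(chi_square, 3), round(p_value, 3))
# chi-square is about 2.67 (2.668 with the rounding used in Table B.1) and p is about
# 0.10, so by the p > 0.05 standard the deviation is within the range attributable to chance.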

Step-by-Step Procedure for Testing Your Hypothesis and Calculating Chi-Square

1. State the hypothesis being tested and the predicted results. Gather the data by conducting the
proper experiment (or, if working genetics problems, use the data provided in the problem).

2. Determine the expected numbers for each observational class. Remember to use numbers, not
percentages.

Chi-square should not be calculated if the expected value in any category is less than 5.

3. Calculate χ² using the formula. Complete all calculations to three significant digits. Round off
your answer to two significant digits.

4. Use the chi-square distribution table to determine significance of the value.

a. Determine degrees of freedom and locate the value in the appropriate column.
b. Locate the value closest to your calculated χ² on that degrees of freedom (df) row.
c. Move up the column to determine the p value.

5. State your conclusion in terms of your hypothesis.

a. If the p value for the calculated χ² is p > 0.05, accept your hypothesis. The deviation is
small enough that chance alone accounts for it. A p value of 0.6, for example, means that
there is a 60% probability that any deviation from expected is due to chance only. This is
within the range of acceptable deviation.
b. If the p value for the calculated χ² is p < 0.05, reject your hypothesis, and conclude that
some factor other than chance is operating for the deviation to be so great. For example, a
p value of 0.01 means that there is only a 1% chance that this deviation is due to chance
alone. Therefore, other factors must be involved.

The chi-square test will be used to test for the "goodness of fit" between observed and expected
data from several laboratory investigations in this lab manual.

Table B.1 - Calculating Chi-Square

                     Green    Yellow
Observed (o)          639      241
Expected (e)          660      220
Deviation (o - e)     -21       21
Deviation² (d²)       441      441
d²/e                0.668    2.000

χ² = Σ d²/e = 2.668

Table B.2
Chi-Square Distribution

Degrees of Freedom (df) and Probability (p)
df   0.95  0.90  0.80  0.70  0.50  0.30  0.20  0.10  0.05  0.01  0.001
1 0.004 0.02 0.06 0.15 0.46 1.07 1.64 2.71 3.84 6.64 10.83
2 0.10 0.21 0.45 0.71 1.39 2.41 3.22 4.60 5.99 9.21 13.82
3 0.35 0.58 1.01 1.42 2.37 3.66 4.64 6.25 7.82 11.34 16.27
4 0.71 1.06 1.65 2.20 3.36 4.88 5.99 7.78 9.49 13.28 18.47
5 1.14 1.61 2.34 3.00 4.35 6.06 7.29 9.24 11.07 15.09 20.52
6 1.63 2.20 3.07 3.83 5.35 7.23 8.56 10.64 12.59 16.81 22.46
7 2.17 2.83 3.82 4.67 6.35 8.38 9.80 12.02 14.07 18.48 24.32
8 2.73 3.49 4.59 5.53 7.34 9.52 11.03 13.36 15.51 20.09 26.12
9 3.32 4.17 5.38 6.39 8.34 10.66 12.24 14.68 16.92 21.67 27.88
10 3.94 4.86 6.18 7.27 9.34 11.78 13.44 15.99 18.31 23.21 29.59
(p values greater than 0.05 are considered nonsignificant; p values of 0.05 or less are considered significant.)

Source: R.A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical Research, 6th ed., Table IV, Oliver & Boyd, Ltd., Edinburgh, by permission of the authors and publishers.
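The critical values printed in Table B.2 can be regenerated with a short script; the sketch below assumes Python with scipy (not part of the original table's source). Each column heading is an upper-tail probability, and the tabulated entry is the chi-square value with that much probability to its right:

from scipy import stats

probabilities = [0.95, 0.90, 0.80, 0.70, 0.50, 0.30, 0.20, 0.10, 0.05, 0.01, 0.001]
for df in range(1, 11):
    row = [stats.chi2.isf(p, df) for p in probabilities]
    print(df, " ".join(f"{value:.2f}" for value in row))
# The df = 1 row prints 0.00 0.02 0.06 0.15 0.45 1.07 1.64 2.71 3.84 6.63 10.83,
# which matches the first row of Table B.2 up to rounding.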
Chi-Square Goodness-of-Fit Test
Purpose: Test for distributional adequacy.
The chi-square test (Snedecor and Cochran, 1989) is used to test if a sample of data came from a
population with a specific distribution.

An attractive feature of the chi-square goodness-of-fit test is that it can be applied to any
univariate distribution for which you can calculate the cumulative distribution function. The chi-
square goodness-of-fit test is applied to binned data (i.e., data put into classes). This is actually
not a restriction since for non-binned data you can simply calculate a histogram or frequency
table before generating the chi-square test. However, the value of the chi-square test statistic is
dependent on how the data are binned. Another disadvantage of the chi-square test is that it
requires a sufficient sample size in order for the chi-square approximation to be valid.

The chi-square test is an alternative to the Anderson-Darling and Kolmogorov-Smirnov
goodness-of-fit tests. The chi-square goodness-of-fit test can be applied to discrete distributions
such as the binomial and the Poisson. The Kolmogorov-Smirnov and Anderson-Darling tests are
restricted to continuous distributions.

Additional discussion of the chi-square goodness-of-fit test is contained in the product and
process comparisons chapter (chapter 7).

Definition: The chi-square test is defined for the hypothesis:


H0: The data follow a specified distribution.
Ha: The data do not follow the specified distribution.
Test Statistic: For the chi-square goodness-of-fit computation, the data are divided into k bins and
the test statistic is defined as

χ² = Σ (from i = 1 to k) (Oi - Ei)² / Ei

where Oi is the observed frequency for bin i and Ei is the expected frequency for
bin i. The expected frequency is calculated by

Ei = N (F(Yu) - F(Yl))

where F is the cumulative distribution function for the distribution being tested, Yu
is the upper limit for class i, Yl is the lower limit for class i, and N is the sample
size.
This test is sensitive to the choice of bins. There is no optimal choice for the bin
width (since the optimal bin width depends on the distribution). Most reasonable
choices should produce similar, but not identical, results. Dataplot uses 0.3*s,
where s is the sample standard deviation, for the class width. The lower and upper
bins are at the sample mean plus and minus 6.0*s, respectively. For the chi-square
approximation to be valid, the expected frequency should be at least 5. This test is
not valid for small samples, and if some of the counts are less than five, you may
need to combine some bins in the tails.
Significance Level: α.
Critical Region: The test statistic follows, approximately, a chi-square distribution with (k - c)
degrees of freedom, where k is the number of non-empty cells and c = the number
of estimated parameters (including location, scale, and shape parameters) for the
distribution + 1. For example, for a 3-parameter Weibull distribution, c = 4.

Therefore, the hypothesis that the data are from a population with the specified
distribution is rejected if

χ² > χ²(1 - α, k - c)

where χ²(1 - α, k - c) is the chi-square percent point function with k - c degrees of
freedom and a significance level of α.
In the above formula for the critical region, the Handbook follows the convention
that χ²(1 - α) is the upper critical value from the chi-square distribution and χ²(α) is the
lower critical value from the chi-square distribution. Note that this is the opposite
of what is used in some texts and software programs. In particular, Dataplot uses
the opposite convention.
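To make the definition above concrete, here is a minimal sketch of a binned chi-square goodness-of-fit test for normality in Python, assuming numpy and scipy are available; the binning rule is a simplification in the spirit of the guidance above, not Dataplot's exact procedure:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(size=1000)                 # sample to be tested for normality

# Fit location and scale from the data (two estimated parameters, so c = 2 + 1 = 3).
mu, sigma = y.mean(), y.std(ddof=1)

# Bin the data over the sample mean plus and minus 6 standard deviations.
edges = np.linspace(mu - 6.0 * sigma, mu + 6.0 * sigma, 25)
observed, _ = np.histogram(y, bins=edges)

# Expected count in each bin: N * (F(upper) - F(lower)) using the fitted normal CDF.
cdf = stats.norm.cdf(edges, loc=mu, scale=sigma)
expected = len(y) * np.diff(cdf)

# Crude stand-in for combining sparse tail bins: keep only bins with expected counts >= 5.
keep = expected >= 5
observed, expected = observed[keep], expected[keep]

chi_square = ((observed - expected) ** 2 / expected).sum()
df = int(keep.sum()) - 3                  # k non-empty cells minus c
p_value = stats.chi2.sf(chi_square, df)
cutoff = stats.chi2.isf(0.05, df)
print(chi_square, df, p_value, cutoff)    # accept H0 at the 5% level when chi_square < cutoff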
Sample Output
Dataplot generated the following output for the chi-square test where 1,000 random numbers
were generated for the normal, double exponential, t with 3 degrees of freedom, and lognormal
distributions. In all cases, the chi-square test was applied to test for a normal distribution. The
test statistics show the characteristics of the test; when the data are from a normal distribution,
the test statistic is small and the hypothesis is accepted; when the data are from the double
exponential, t, and lognormal distributions, the statistics are significant and the hypothesis of an
underlying normal distribution is rejected at significance levels of 0.10, 0.05, and 0.01.

The normal random numbers were stored in the variable Y1, the double exponential random
numbers were stored in the variable Y2, the t random numbers were stored in the variable Y3,
and the lognormal random numbers were stored in the variable Y4.

*************************************************
** normal chi-square goodness of fit test y1 **
*************************************************

CHI-SQUARED GOODNESS-OF-FIT TEST

NULL HYPOTHESIS H0: DISTRIBUTION FITS THE DATA


ALTERNATE HYPOTHESIS HA: DISTRIBUTION DOES NOT FIT THE DATA
DISTRIBUTION: NORMAL

SAMPLE:
NUMBER OF OBSERVATIONS = 1000
NUMBER OF NON-EMPTY CELLS = 24
NUMBER OF PARAMETERS USED = 0

TEST:
CHI-SQUARED TEST STATISTIC = 17.52155
DEGREES OF FREEDOM = 23
CHI-SQUARED CDF VALUE = 0.217101

ALPHA LEVEL CUTOFF CONCLUSION


10% 32.00690 ACCEPT H0
5% 35.17246 ACCEPT H0
1% 41.63840 ACCEPT H0

CELL NUMBER, BIN MIDPOINT, OBSERVED FREQUENCY,


AND EXPECTED FREQUENCY
WRITTEN TO FILE DPST1F.DAT

*************************************************
** normal chi-square goodness of fit test y2 **
*************************************************

CHI-SQUARED GOODNESS-OF-FIT TEST

NULL HYPOTHESIS H0: DISTRIBUTION FITS THE DATA


ALTERNATE HYPOTHESIS HA: DISTRIBUTION DOES NOT FIT THE DATA
DISTRIBUTION: NORMAL

SAMPLE:
NUMBER OF OBSERVATIONS = 1000
NUMBER OF NON-EMPTY CELLS = 26
NUMBER OF PARAMETERS USED = 0

TEST:
CHI-SQUARED TEST STATISTIC = 2030.784
DEGREES OF FREEDOM = 25
CHI-SQUARED CDF VALUE = 1.000000

ALPHA LEVEL CUTOFF CONCLUSION


10% 34.38158 REJECT H0
5% 37.65248 REJECT H0
1% 44.31411 REJECT H0

CELL NUMBER, BIN MIDPOINT, OBSERVED FREQUENCY,


AND EXPECTED FREQUENCY
WRITTEN TO FILE DPST1F.DAT

*************************************************
** normal chi-square goodness of fit test y3 **
*************************************************

CHI-SQUARED GOODNESS-OF-FIT TEST

NULL HYPOTHESIS H0: DISTRIBUTION FITS THE DATA


ALTERNATE HYPOTHESIS HA: DISTRIBUTION DOES NOT FIT THE DATA
DISTRIBUTION: NORMAL

SAMPLE:
NUMBER OF OBSERVATIONS = 1000
NUMBER OF NON-EMPTY CELLS = 25
NUMBER OF PARAMETERS USED = 0

TEST:
CHI-SQUARED TEST STATISTIC = 103165.4
DEGREES OF FREEDOM = 24
CHI-SQUARED CDF VALUE = 1.000000

ALPHA LEVEL CUTOFF CONCLUSION


10% 33.19624 REJECT H0
5% 36.41503 REJECT H0
1% 42.97982 REJECT H0

CELL NUMBER, BIN MIDPOINT, OBSERVED FREQUENCY,


AND EXPECTED FREQUENCY
WRITTEN TO FILE DPST1F.DAT

*************************************************
** normal chi-square goodness of fit test y4 **
*************************************************

CHI-SQUARED GOODNESS-OF-FIT TEST

NULL HYPOTHESIS H0: DISTRIBUTION FITS THE DATA


ALTERNATE HYPOTHESIS HA: DISTRIBUTION DOES NOT FIT THE DATA
DISTRIBUTION: NORMAL

SAMPLE:
NUMBER OF OBSERVATIONS = 1000
NUMBER OF NON-EMPTY CELLS = 10
NUMBER OF PARAMETERS USED = 0

TEST:
CHI-SQUARED TEST STATISTIC = 1162098.
DEGREES OF FREEDOM = 9
CHI-SQUARED CDF VALUE = 1.000000

ALPHA LEVEL CUTOFF CONCLUSION


10% 14.68366 REJECT H0
5% 16.91898 REJECT H0
1% 21.66600 REJECT H0

CELL NUMBER, BIN MIDPOINT, OBSERVED FREQUENCY,


AND EXPECTED FREQUENCY
WRITTEN TO FILE DPST1F.DAT

As we would hope, the chi-square test does not reject the normality hypothesis for the normal
distribution data set and rejects it for the three non-normal cases.
Questions: The chi-square test can be used to answer the following types of questions:
• Are the data from a normal distribution?
• Are the data from a log-normal distribution?
• Are the data from a Weibull distribution?
• Are the data from an exponential distribution?
• Are the data from a logistic distribution?
• Are the data from a binomial distribution?
Importance: Many statistical tests and procedures are based on specific distributional
assumptions. The assumption of normality is particularly common in classical statistical tests.
Much reliability modeling is based on the assumption that the distribution of the data follows a
Weibull distribution.

There are many non-parametric and robust techniques that are not based on strong distributional
assumptions. By non-parametric, we mean a technique, such as the sign test, that is not based on
a specific distributional assumption. By robust, we mean a statistical technique that performs
well under a wide range of distributional assumptions. However, techniques based on specific
distributional assumptions are in general more powerful than these non-parametric and robust
techniques. By power, we mean the ability to detect a difference when that difference actually
exists. Therefore, if the distributional assumption can be confirmed, the parametric techniques
are generally preferred.

If you are using a technique that makes a normality (or some other type of distributional)
assumption, it is important to confirm that this assumption is in fact justified. If it is, the more
powerful parametric techniques can be used. If the distributional assumption is not justified, a
non-parametric or robust technique may be required.

Related Techniques: Anderson-Darling Goodness-of-Fit Test, Kolmogorov-Smirnov Test,
Shapiro-Wilk Normality Test, Probability Plots, Probability Plot Correlation Coefficient Plot.
Case Study: Airplane glass failure times data.
Software: Some general purpose statistical software programs, including Dataplot, provide a
chi-square goodness-of-fit test for at least some of the common distributions.

The Statistics Calculator


Statistical Analysis Tests At Your Fingertips

Counts Menu
The Counts menu selection has four tests that can be performed for simple frequency data. The
chi-square test is used to analyze a contingency table consisting of rows and columns to
determine if the observed cell frequencies differ significantly from the expected frequencies.
Fisher's exact test is similar to the chi-square test except it is used only for tables with exactly
two rows and two columns. The binomial test is used to calculate the probability of two mutually
exclusive outcomes. The Poisson distribution events test is used to describe the number of events
that will occur in a specific period of time.

The Counts menu has four selections:
• Chi-square test
• Fisher's exact test
• Binomial test
• Poisson distribution events test
Chi-square test
The chi-square is one of the most popular statistics because it is easy to calculate and interpret.
There are two kinds of chi-square tests. The first is called a one-way analysis, and the second is
called a two-way analysis. The purpose of both is to determine whether the observed frequencies
(counts) markedly differ from the frequencies that we would expect by chance.

The observed cell frequencies are organized in rows and columns like a spreadsheet. This table
of observed cell frequencies is called a contingency table, and the chi-square test is part of a
contingency table analysis.

The chi-square statistic is the sum of the contributions from each of the individual cells. Every
cell in a table contributes something to the overall chi-square statistic. If a given cell differs
markedly from the expected frequency, then the contribution of that cell to the overall chi-square
is large. If a cell is close to the expected frequency for that cell, then the contribution of that cell
to the overall chi-square is low. A large chi-square statistic indicates that somewhere in the table,
the observed frequencies differ markedly from the expected frequencies. It does not tell which
cell (or cells) is causing the high chi-square, only that such cells are there. When a chi-square is
high, you must visually examine the table to determine which cell(s) are responsible.

When there are exactly two rows and two columns, the chi-square statistic becomes inaccurate,
and Yates' correction for continuity is usually applied. Statistics Calculator will automatically use
Yates' correction for two-by-two tables when the expected frequency of any cell is less than 5 or
the total N is less than 50.

If there is only one column or one row (a one-way chi-square test), the degrees of freedom equal
the number of cells minus one. For a two-way chi-square, the degrees of freedom equal the number
of rows minus one, times the number of columns minus one.

Using the chi-square statistic and its associated degrees of freedom, the software reports the
probability that the differences between the observed and expected frequencies occurred by
chance. Generally, a probability of .05 or less is considered to be a significant difference.
A standard spreadsheet interface is used to enter the counts for each cell. After you've finished
entering the data, the program will print the chi-square, degrees of freedom and probability of
chance.

Use caution when interpreting the chi-square statistic if any of the expected cell frequencies are
less than five. Also, use caution when the total for all cells is less than 50.

Example

A drug manufacturing company conducted a survey of customers. The research question is: Is
there a significant relationship between packaging preference (size of the bottle purchased) and
economic status? There were four packaging sizes: small, medium, large, and jumbo. Economic
status was: lower, middle, and upper. The following data was collected.

            Lower   Middle   Upper
Small         24       22      18
Medium        23       28      19
Large         18       27      29
Jumbo         16       21      33

------------------------------------------------

Chi-square statistic = 9.743
Degrees of freedom = 6
Probability of chance = .1359
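A minimal sketch of the same two-way analysis, assuming Python with scipy installed (scipy is not part of the Statistics Calculator itself):

from scipy import stats

# Rows are packaging sizes (small, medium, large, jumbo); columns are economic status.
table = [[24, 22, 18],
         [23, 28, 19],
         [18, 27, 29],
         [16, 21, 33]]

chi_square, p_value, df, expected = stats.chi2_contingency(table)
print(round(chi_square, 3), df, round(p_value, 4))
# Roughly 9.74 with 6 degrees of freedom and p of about 0.136, in line with the output
# above; since p exceeds .05, the relationship is not statistically significant.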

Fisher's exact test


The chi-square statistic becomes inaccurate when used to analyze contingency tables that contain
exactly two rows and two columns and fewer than 50 cases. Fisher's exact probability
is not plagued by inaccuracies due to small N. Therefore, it should be used for two-by-two
contingency tables that contain fewer than 50 cases.

Example

Here are the results of a public opinion poll broken down by gender. What is the exact
probability that the difference between the observed and expected frequencies occurred by
chance?

            Male   Female
Favor         10      14
Opposed       15       9

-------------------------------------------

Fisher's exact probability = .0828
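For comparison, a sketch using scipy (an assumed dependency). Note that "Fisher's exact probability" can be defined as a one-sided tail, a two-sided tail, or the probability of the observed table itself, so the figures below need not agree exactly with the .0828 reported above:

from scipy import stats

table = [[10, 14],   # Favor: male, female
         [15, 9]]    # Opposed: male, female

odds_ratio, p_two_sided = stats.fisher_exact(table, alternative='two-sided')
_, p_one_sided = stats.fisher_exact(table, alternative='less')
print(round(p_two_sided, 4), round(p_one_sided, 4))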

Binomial test
The binomial distribution is used for calculating the probability of dichotomous outcomes in
which the two choices are mutually exclusive. The program requires that you enter the number of
trials, probability of the desired outcome on each trial, and the number of times the desired
outcome was observed.

Example

If we were to flip a coin one hundred times, and it came up heads seventy times, what is the
probability of this happening?

Number of trials: 100
Probability of success on each trial (0-1): .5
Number of successes: 70

---------------------------------------------------------

Probability of 70 or more successes < .0001
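A sketch of the same calculation, assuming Python with scipy installed:

from scipy import stats

# Probability of 70 or more heads in 100 tosses of a fair coin: P(X >= 70) = P(X > 69).
p_value = stats.binom.sf(69, 100, 0.5)
print(p_value)   # on the order of 4e-5, i.e. < .0001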

Poisson distribution events test


The Poisson distribution, like the binomial distribution, is used to determine the probability of an
observed frequency. It is used to describe the number of events that will occur in a specific
period of time or in a specific area or volume. You need to enter the observed and expected
frequencies.

Example

Previous research on a particular assembly line has shown an average daily defect
rate of 39 products. Thus, the expected number of defective products on any day is 39.
The day after implementing a new quality control program, only 25 defects were found. What is
the probability of seeing 25 or fewer defects on any day?
Observed frequency: 25
Expected frequency: 39

---------------------------------------------------

Probability of 25 or fewer events = .0226
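A sketch with scipy (assumed available). The call below gives the one-sided probability of 25 or fewer events for a Poisson mean of 39; a given calculator may define its reported probability differently (for example, as a two-sided value), so the two figures need not match exactly:

from scipy import stats

p_lower_tail = stats.poisson.cdf(25, mu=39)   # P(X <= 25) when 39 events are expected
print(p_lower_tail)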

Chi-Square Test
The chi-square test (χ²) is the most commonly used method for comparing frequencies or
proportions. It is a statistical test used to determine whether observed data deviate from those expected
under a particular hypothesis. The chi-square test is also referred to as a test of the measure of fit, or
"goodness of fit," between observed and expected data. Typically, the hypothesis tested is whether or not two samples are
different enough in a particular characteristic to be considered members of different populations.
Chi-square analysis belongs to the family of univariate analysis, i.e., those tests that evaluate the
possible effect of one variable (often called the independent variable) upon an outcome (often
called the dependent variable).

The chi-square analysis is used to test the null hypothesis (H0), which is the hypothesis that states
there is no significant difference between expected and observed data. Investigators either accept
or reject H0, after comparing the value of chi-square to a probability distribution. Chi-square
values with low probability lead to the rejection of H0 and it is assumed that a factor other than
chance creates a large deviation between expected and observed results. As with all non-
parametric tests (those that do not require normal distribution curves), chi-square tests evaluate only a
single variable; thus they do not take into account the interaction of more than one variable
upon the outcome.

A chi-square analysis is best illustrated using an example in which data from a population is
categorized with respect to two qualitative variables. Table 1 shows a sample of patients
categorized with respect to two qualitative variables, namely, congenital heart defect (CHD;
present or absent) and karyotype (trisomy 21, also called Down syndrome, or trisomy 13, also
called Patau syndrome). The classification table used in a chi-square analysis is called a
contingency table and this is its simplest form (2 x 2). The data in a contingency table are often
defined as row (r) and column (c) variables.

In general, a chi-square analysis evaluates whether or not variables within a contingency table
are independent, or that there is no association between them. In this example, independence
would mean that the proportion of individuals affected by CHD is not dependent on karyotype;
thus, the proportion of patients with CHD would be similar for both Down and Patau syndrome
patients. Dependence, or association, would mean that the proportion of individuals affected by
CHD is dependent on karyotype, so that CHD would be more commonly found in patients with
one of the two karyotypes examined.

Table 1 shows a 2 x 2 contingency table for a chi-square test—CHD (congenital heart defects)
found in patients with Down and Patau syndromes

Figure 1. Chi-square distributions for 1, 3, and 5 degrees of freedom. The shaded region in
each of the distributions indicates the upper 5% of the distribution. Illustration by Argosy.
The Gale Group.
Chi-square is the sum of the squared difference between observed and expected data, divided by
the expected data in all possible categories:

Χ² = (O11 - E11)²/E11 + (O12 - E12)²/E12 + (O21 - E21)²/E21 + (O22 - E22)²/E22, where O11 represents the
observed number of subjects in column 1, row 1, and so on. A summary is shown in Table 2.

The observed frequency is simply the actual number of observations in a cell. In other words, O11
for CHD in the Down-syndrome-affected individuals is 24. Likewise, the observed frequency of
CHD in the Patau-syndrome-affected patients is 20 (O12).

TABLE 1.

                                      Karyotype
Congenital Heart Defects    Down syndrome   Patau syndrome   Total
CHD present                       24              20           44
CHD absent                        36               5           41
Total                             60              25           85

TABLE 2.

                                      Karyotype
Congenital Heart Defects    Down syndrome   Patau syndrome   Total
CHD present                      O11             O12           r1
CHD absent                       O21             O22           r2
Total                             c1              c2            N

TABLE 3.

Observed (o)   Expected (e)   o - e    (o - e)²   (o - e)²/e
    24             31.1        -7.1      50.41       1.62
    20             12.9         7.1      50.41       3.91
    36             28.9         7.1      50.41       1.74
     5             12.1        -7.1      50.41       4.17
    85             85.0                            Χ² = 11.44

Because the null hypothesis assumes that the two variables are independent of each other, expected frequencies are calculated using the multiplication rule of probability.
The multiplication rule says that the probability of the occurrence of two independent events X
and Y is the product of the individual probabilities of X and Y. In this case, the expected
probability that a patient has both Down syndrome and CHD is the product of the probability that
a patient has Down syndrome (60/85 = 0.706) and the probability that a patient has CHD (44/85
= 0.518), or 0.706 x 0.518 = 0.366. The expected frequency of patients with both Down
syndrome and CHD is the product of the expected probability and the total population studied, or
0.366 x 85 = 31.1.

Table 3 presents observed and expected frequencies and Χ2 for data in Table 1.

Before the chi-square value can be evaluated, the degrees of freedom for the data set must be
determined. Degrees of freedom are the number of values in the data set that are free to vary. In a
contingency table, the degrees of freedom are calculated as the product of the number of rows
minus 1 and the number of columns minus 1, or (r-1)(c-1). In this example, (2-1)(2-1) = 1; thus,
there is just one degree of freedom.

Once the degrees of freedom are determined, the value of Χ2 is compared with the appropriate
chi-square distribution, which can be found in tables in most statistical analyses texts. A relative
standard serves as the basis for accepting or rejecting the hypothesis. In biological research, the
relative standard is usually p = 0.05, where p is the probability that the deviation of the observed
frequencies from the expected frequencies is due to chance alone. If p is less than or equal to
0.05, then the null hypothesis is rejected and the data are not independent of each other. For one
degree of freedom, the critical value associated with p = 0.05 for Χ2 is 3.84. Chi-square values
higher than this critical value are associated with a statistically low probability that H0 is true.
Because the chi-square value is 11.44, much greater than 3.84, the hypothesis that the proportion
of trisomy-13-affected patients with CHD does not differ significantly from the corresponding
proportion for trisomy-21-affected patients is rejected. Instead, it is very likely that there is a
dependence of CHD on karyotype.
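A sketch of the same analysis with scipy (an assumed dependency, not part of the source article). Passing correction=False reproduces the uncorrected Pearson statistic used in the hand calculation; the default Yates continuity correction for 2 x 2 tables would give a somewhat smaller value:

from scipy import stats

table = [[24, 20],   # CHD present: Down syndrome, Patau syndrome
         [36, 5]]    # CHD absent:  Down syndrome, Patau syndrome

chi_square, p_value, df, expected = stats.chi2_contingency(table, correction=False)
print(round(chi_square, 2), df, round(p_value, 4))
print(expected)   # roughly 31.1, 12.9, 28.9, 12.1, as in Table 3
# The statistic comes out near 11.3 (11.44 in the text, which rounds each deviation to 7.1),
# well above the 3.84 critical value for one degree of freedom at p = 0.05.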

Figure 1 shows chi-square distributions for 1, 3, and 5 degrees of freedom. The shaded region in
each of the distributions indicates the upper 5% of the distribution. The critical value associated
with p = 0.05 is indicated. Notice that as the degrees of freedom increases, the chi-square value
required to reject the null hypothesis increases.

Because a chi-square test is a univariate test, it does not consider relationships among multiple
variables at the same time. Therefore, dependencies detected by chi-square analyses may be
unrealistic or non-causal. There may be other unseen factors that make the variables appear to be
associated. However, if properly used, the test is a very useful tool for the evaluation of
associations and can be used as a preliminary analysis of more complex statistical evaluations.

Source: http://science.jrank.org/pages/1401/Chi-Square-Test.html

D. The Chi-Square Test

Introduction
One of the most common and useful ways to look at information about the social world is in the
format of a table. Say, for example, we want to know whether boys or girls get into trouble more
often in school. There are many ways we might show information related to this question, but
perhaps the most frequent and easiest to comprehend method is in a table.

           Got in Trouble   No Trouble   Total
Boys             46              71        117
Girls            37              83        120
Total            83             154        237

The above example is relatively straightforward in that we can fairly quickly tell that more boys
than girls got into trouble in school. Calculating percentages, we find that 39 percent of boys got
into trouble (46 boys got in trouble out of 117 total boys = 39%), as compared with 31 percent of
girls (37 girls got in trouble out of 120 total girls = 31%). However, to re-frame the issue, what if
we wanted to test the hypothesis that boys get into trouble more often than girls in school? These
figures are a good start to examining that hypothesis; however, the figures in the table are only
descriptive. To examine the hypothesis, we need to employ a statistical test, the chi-square test.
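As a sketch of that next step (assuming Python with scipy installed), the table can be handed to a contingency-table chi-square routine, which returns the statistic, its degrees of freedom, the probability of obtaining a difference at least this large by chance, and the expected counts:

from scipy import stats

table = [[46, 71],   # boys: got in trouble, no trouble
         [37, 83]]   # girls: got in trouble, no trouble

chi_square, p_value, df, expected = stats.chi2_contingency(table)
print(round(chi_square, 2), df, round(p_value, 3))
# Compare p_value with the chosen significance level (commonly .05) to decide whether
# the boy/girl difference is larger than chance variation alone would be expected to produce.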
