F-test

From Wikipedia, the free encyclopedia

An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is
most often used when comparing statistical models that have been fit to a data set, in order to identify the
model that best fits the population from which the data were sampled. Exact F-tests mainly arise when the
models have been fit to the data using least squares. The name was coined by George W. Snedecor, in honour
of Sir Ronald A. Fisher. Fisher initially developed the statistic as the variance ratio in the 1920s.[1]

Common examples of F-tests


Examples of F-tests include:

• The hypothesis that the means of several normally distributed populations, all having the same standard deviation, are equal. This is perhaps the best-known F-test, and plays an important role in the analysis of variance (ANOVA).

• The hypothesis that a proposed regression model fits the data well. See Lack-of-fit sum of squares.

• The hypothesis that a data set in a regression analysis follows the simpler of two proposed linear models that are nested within each other.

• Scheffé's method for multiple comparisons adjustment in linear models.

F-test of the equality of two variances

Main article: F-test of equality of variances

This F-test is extremely sensitive to non-normality.[2][3] Alternatives are Levene's test, Bartlett's test, and
the Brown–Forsythe test. However, when any of these tests are conducted to test the underlying assumption
of homoscedasticity (i.e. homogeneity of variance), as a preliminary step to testing for mean effects in ANOVA,
there is an increase in the experiment-wise Type I error rate.[4]

Formula and calculation

Most F-tests arise by considering a decomposition of the variability in a collection of data in terms of sums of
squares. The test statistic in an F-test is the ratio of two scaled sums of squares reflecting different sources of
variability. These sums of squares are constructed so that the statistic tends to be greater when the null
hypothesis is not true. In order for the statistic to follow the F-distribution under the null hypothesis, the sums of
squares should be statistically independent, and each should follow a scaled chi-squared distribution. The latter
condition is guaranteed if the data values are independent and normally distributed with a common variance.
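
To make this construction concrete, here is a minimal simulation sketch (the variable names and the use of NumPy/SciPy are illustrative additions, not part of the original article): two independent chi-squared variables, each divided by its degrees of freedom, produce a ratio that follows the F-distribution.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d1, d2 = 4, 20                                # numerator and denominator degrees of freedom

num = rng.chisquare(d1, size=100_000) / d1    # scaled chi-squared, df = d1
den = rng.chisquare(d2, size=100_000) / d2    # independent scaled chi-squared, df = d2
f_sim = num / den                             # should follow F(d1, d2)

print(np.quantile(f_sim, 0.95))               # simulated 95th percentile
print(stats.f.ppf(0.95, d1, d2))              # theoretical value, ~2.87
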
Multiple-comparison ANOVA problems

The F-test in one-way analysis of variance is used to assess whether the expected values of a quantitative
variable within several pre-defined groups differ from each other. For example, suppose that a medical trial
compares four treatments. The ANOVA F-test can be used to assess whether any of the treatments is on
average superior, or inferior, to the others versus the null hypothesis that all four treatments yield the same
mean response. This is an example of an "omnibus" test, meaning that a single test is performed to detect any
of several possible differences. Alternatively, we could carry out pairwise tests among the treatments (for
instance, in the medical trial example with four treatments we could carry out six tests among pairs of
treatments). The advantage of the ANOVA F-test is that we do not need to pre-specify which treatments are to
be compared, and we do not need to adjust for making multiple comparisons. The disadvantage of the ANOVA
F-test is that if we reject the null hypothesis, we do not know which treatments can be said to be significantly
different from the others — if the F-test is performed at level α we cannot state that the treatment pair with the
greatest mean difference is significantly different at level α.

The formula for the one-way ANOVA F-test statistic is

F = explained variance / unexplained variance,

or

F = between-group variability / within-group variability.

The "explained variance", or "between-group variability" is

where denotes the sample mean in the ith group, ni is the number of observations in
the ith group, and denotes the overall mean of the data.

The "unexplained variance", or "within-group variability" is

where Yij is the jth observation in the ith out of K groups and N is the overall sample size.
This F-statistic follows the F-distribution with K − 1, N −K degrees of freedom under the
null hypothesis. The statistic will be large if the between-group variability is large relative
to the within-group variability, which is unlikely to happen if the population means of the
groups all have the same value.
Note that when there are only two groups for the one-way ANOVA F-
test, F = t2 where t is the Student's t statistic.
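
As a sketch of how these formulas translate into code, with made-up data and illustrative variable names, assuming NumPy and SciPy are available:

import numpy as np
from scipy import stats

groups = [np.array([4.1, 5.0, 4.7, 5.3]),
          np.array([5.8, 6.1, 5.5, 6.4]),
          np.array([4.9, 5.2, 5.6, 5.1])]

K = len(groups)                         # number of groups
N = sum(len(g) for g in groups)         # overall sample size
grand_mean = np.concatenate(groups).mean()

# Between-group ("explained") variance: Σ nᵢ(Ȳᵢ − Ȳ)² / (K − 1)
between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups) / (K - 1)
# Within-group ("unexplained") variance: Σ (Yᵢⱼ − Ȳᵢ)² / (N − K)
within = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N - K)

print(between / within)                 # F-statistic from the formulas above
print(stats.f_oneway(*groups))          # SciPy's one-way ANOVA agrees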

Regression problems

Consider two models, 1 and 2, where model 1 is 'nested' within model 2. That is, model
1 has p1 parameters, and model 2 has p2 parameters, where p2 > p1, and for any choice
of parameters in model 1, the same regression curve can be achieved by some choice
of the parameters of model 2. (We use the convention that any constant parameter in a
model is included when counting the parameters. For instance, the simple linear
model y = mx + b has p = 2 under this convention.) The model with more parameters will
always be able to fit the data at least as well as the model with fewer parameters. Thus
typically model 2 will give a better (i.e. lower error) fit to the data than model 1. But one
often wants to determine whether model 2 gives a significantly better fit to the data. One
approach to this problem is to use an F-test.

If there are n data points to estimate the parameters of both models from, then one can
calculate the F statistic, given by[5]

F = [(RSS₁ − RSS₂) / (p₂ − p₁)] / [RSS₂ / (n − p₂)]

where RSSᵢ is the residual sum of squares of model i. If your regression model has
been calculated with weights, then replace RSSᵢ with χ², the weighted sum of
squared residuals. Under the null hypothesis that model 2 does not provide a
significantly better fit than model 1, F will have an F distribution with
(p₂ − p₁, n − p₂) degrees of freedom. The null hypothesis is rejected if
the F calculated from the data is greater than the critical value of the F distribution
for some desired false-rejection probability (e.g. 0.05). The test is a likelihood ratio
test.
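
A minimal sketch of this comparison, with an intercept-only model (p₁ = 1) nested in a straight-line model (p₂ = 2); the data and names are illustrative, assuming NumPy and SciPy:

import numpy as np
from scipy import stats

x = np.array([1.0, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])
n, p1, p2 = len(y), 1, 2                      # model 1: y = b; model 2: y = m*x + b

rss1 = ((y - y.mean()) ** 2).sum()            # best intercept-only fit is the sample mean
coeffs = np.polyfit(x, y, 1)                  # least-squares straight line
rss2 = ((y - np.polyval(coeffs, x)) ** 2).sum()

F = ((rss1 - rss2) / (p2 - p1)) / (rss2 / (n - p2))
p_value = stats.f.sf(F, p2 - p1, n - p2)      # upper-tail probability
print(F, p_value)                             # large F, tiny p: the line fits much better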

One-way ANOVA example

Consider an experiment to study the effect of three different levels of some factor
on a response (e.g. three types of fertilizer on plant growth). If we had 6
observations for each level, we could write the outcome of the experiment in a
table like this, where a1, a2, and a3 are the three levels of the factor being studied.

a1   a2   a3
 6    8   13
 8   12    9
 4    9   11
 5   11    8
 3    6    7
 4    8   12

The null hypothesis, denoted H0, for the overall F-test for this experiment would be
that all three levels of the factor produce the same response, on average. To
calculate the F-ratio:

Step 1: Calculate the mean within each group:

Ȳ₁ = (6 + 8 + 4 + 5 + 3 + 4) / 6 = 5
Ȳ₂ = (8 + 12 + 9 + 11 + 6 + 8) / 6 = 9
Ȳ₃ = (13 + 9 + 11 + 8 + 7 + 12) / 6 = 10

Step 2: Calculate the overall mean:

Ȳ = (Ȳ₁ + Ȳ₂ + Ȳ₃) / a = (5 + 9 + 10) / 3 = 8

where a is the number of groups.

Step 3: Calculate the "between-group" sum of squares:

SB = n(Ȳ₁ − Ȳ)² + n(Ȳ₂ − Ȳ)² + n(Ȳ₃ − Ȳ)²
   = 6(5 − 8)² + 6(9 − 8)² + 6(10 − 8)²
   = 54 + 6 + 24 = 84

where n is the number of data values per group.

The between-group degrees of freedom is one less than the number of groups:

dfb = 3 − 1 = 2

so the between-group mean square value is

MSB = 84 / 2 = 42
Step 4: Calculate the "within-group" sum of squares. Begin by centering the data in each group:

a1           a2           a3
6 − 5 = 1    8 − 9 = −1   13 − 10 = 3
8 − 5 = 3    12 − 9 = 3   9 − 10 = −1
4 − 5 = −1   9 − 9 = 0    11 − 10 = 1
5 − 5 = 0    11 − 9 = 2   8 − 10 = −2
3 − 5 = −2   6 − 9 = −3   7 − 10 = −3
4 − 5 = −1   8 − 9 = −1   12 − 10 = 2

The within-group sum of squares is the sum of the squares of all 18 values in this table:

SW = (1 + 9 + 1 + 0 + 4 + 1) + (1 + 9 + 0 + 4 + 9 + 1) + (9 + 1 + 1 + 4 + 9 + 4)
   = 16 + 24 + 28 = 68

The within-group degrees of freedom is

dfw = a(n − 1) = 3(6 − 1) = 15

so the within-group mean square value is

MSW = 68 / 15 ≈ 4.5

Step 5: The F-ratio is

F = MSB / MSW = 42 / 4.5 ≈ 9.3

The critical value of F(2, 15) at the 5% significance level is 3.68. Since 9.3 > 3.68, we reject the null hypothesis: the group means differ significantly.
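
These numbers can be checked mechanically; a minimal sketch, assuming SciPy is installed:

from scipy import stats

a1 = [6, 8, 4, 5, 3, 4]
a2 = [8, 12, 9, 11, 6, 8]
a3 = [13, 9, 11, 8, 7, 12]

F, p = stats.f_oneway(a1, a2, a3)
print(F)   # ~9.26, i.e. MSB / MSW = 42 / (68 / 15)
print(p)   # ~0.0024, significant at the 5% level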

ANOVA's robustness with respect to Type I errors for departures from population normality

The one-way ANOVA can be generalized to the factorial and multivariate layouts, as well as to the
analysis of covariance. None of these F-tests, however, are robust when there are severe violations of the
assumption that each population follows the normal distribution, particularly for small alpha levels and
unbalanced layouts.[6] Furthermore, if the underlying assumption of homoscedasticity is violated, the Type
I error properties degenerate much more severely.[7] For nonparametric alternatives in the factorial layout,
see Sawilowsky.[8] For more discussion see ANOVA on ranks.

Earlier, you saw how to perform a t-test to compare a sample mean to an accepted
value, or to compare two sample means. In this section, you will see how to use the
F-test to compare two variances or standard deviations.

When using the F-test, you again require a hypothesis, but this time, it is to
compare standard deviations. That is, you will test the null
hypothesis H₀: σ₁² = σ₂² against an appropriate alternate hypothesis.

You calculate the F-value as the ratio of the two variances:

F = s₁² / s₂²

where s₁² ≥ s₂², so that F ≥ 1. The degrees of freedom for the numerator and
denominator are n₁ − 1 and n₂ − 1, respectively. As with the t-test, you
compare Fcalc to a tabulated value Ftab, to see if you should accept or reject the
null hypothesis. As well, you can perform 1- or 2-tailed F-tests. The following
two examples illustrate the use of 1- and 2-tailed tests.
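
This procedure is straightforward to automate. The helper below is an illustrative sketch, not from the original text, assuming SciPy; for a 2-tailed test it splits the significance level between the two tails, matching the doubling rule used in Example 2 below.

from scipy import stats

def variance_f_test(s1, n1, s2, n2, alpha=0.05, tails=1):
    """F-test of H0: sigma1^2 == sigma2^2 from sample standard deviations."""
    # Put the larger sample variance in the numerator so that F >= 1.
    (s_num, n_num), (s_den, n_den) = sorted(
        [(s1, n1), (s2, n2)], key=lambda t: t[0], reverse=True)
    f_calc = s_num ** 2 / s_den ** 2
    # 1-tailed: upper alpha point; 2-tailed: upper alpha/2 point.
    f_tab = stats.f.ppf(1 - alpha / tails, n_num - 1, n_den - 1)
    return f_calc, f_tab

# Illustrative use: s1 = 2.3 (n = 8) vs s2 = 1.4 (n = 10), 2-tailed at alpha = 0.10.
f_calc, f_tab = variance_f_test(2.3, 8, 1.4, 10, alpha=0.10, tails=2)
print(f_calc, f_tab, f_calc > f_tab)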

Example 1

As an example, assume we want to see if a method (Method 1) for measuring
the arsenic concentration in soil is significantly more precise than a second
method (Method 2). Each method was tested ten times, yielding the
following values:

Method   Mean (ppm)   Standard deviation (ppm)
1        6.7          0.8
2        8.2          1.2

A method is more precise if its standard deviation is lower than that of the
other method. So we want to test the null hypothesis H₀: σ₂² = σ₁² against the
alternate hypothesis Hₐ: σ₂² > σ₁².

Since s₂ > s₁, Fcalc = s₂²/s₁² = 1.2²/0.8² = 2.25. The tabulated value for d.o.f. ν = 9
in each case, and a 1-tailed, 95% confidence level, is F(9,9) = 3.179. In this
case, Fcalc < F(9,9), so we accept the null hypothesis that the two standard
deviations are equal, and we are 95% confident that any difference in the
sample standard deviations is due to random error. We use a 1-tailed test in
this case because the only information we are interested in is whether Method
1 is more precise than Method 2.
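
A quick numerical check of this example, as a sketch assuming SciPy:

from scipy import stats

f_calc = 1.2 ** 2 / 0.8 ** 2        # larger variance in the numerator
f_tab = stats.f.ppf(0.95, 9, 9)     # 1-tailed, 95% confidence, d.o.f. 9 and 9
print(f_calc)                       # 2.25
print(f_tab)                        # ~3.179
print(f_calc > f_tab)               # False: do not reject H0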

Example 2

If we are not interested in whether one method is better than another, but
simply want to determine whether the variances are the same or different,
we need to use a 2-tailed test. For instance, assume we made
two sets of measurements of ethanol concentration in a sample of vodka using
the same instrument, but on two different days. On the first day, we found a
standard deviation of s₁ = 9 ppm and on the next day we found s₂ = 2 ppm.
Both datasets comprised 6 measurements. We want to know if we can
combine the two datasets, or if there is a significant difference between the
datasets, in which case we should discard one of them.

As usual, we begin by defining the null hypothesis, H₀: σ₁² = σ₂², and the
alternate hypothesis, Hₐ: σ₁² ≠ σ₂². The "≠" sign indicates that this is a 2-tailed
test, because we are interested in both cases: σ₁² > σ₂² and σ₁² < σ₂². For the
F-test, you can perform a 2-tailed test by multiplying the significance level P by 2,
so from a table for a 1-tailed test at P = 0.05, we would perform a 2-tailed
test at P = 0.10, i.e. at a 90% confidence level.

For this dataset, s₁ > s₂, so Fcalc = s₁²/s₂² = 9²/2² = 20.25. The tabulated value
for ν = 5 at 90% confidence is F(5,5) = 5.050. Since Fcalc > F(5,5), we reject the null
hypothesis, and can say with 90% certainty that there is a difference between
the standard deviations of the two datasets.
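
And the corresponding check for this 2-tailed example, again a sketch assuming SciPy (the 2-tailed test at 90% confidence uses the 1-tailed 95% point):

from scipy import stats

f_calc = 9 ** 2 / 2 ** 2            # larger variance in the numerator
f_tab = stats.f.ppf(0.95, 5, 5)     # upper 5% point: 2-tailed test at alpha = 0.10
print(f_calc)                       # 20.25
print(f_tab)                        # ~5.050
print(f_calc > f_tab)               # True: reject H0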

Tables for other confidence levels can be found in most statistics or analytical
chemistry textbooks. When using these tables, be careful to note whether the
table is for a 1-tailed or a 2-tailed test. In most cases, tables are given for
2-tailed tests, so you can divide the significance level by 2 for the 1-tailed test.
For the F-test, always ensure that the larger standard deviation is in the
numerator, so that F ≥ 1.
