7/12/2016

Degrees of Freedom Tutorial | Ron Dotsch

RON DOTSCH

DEGREES OF FREEDOM TUTORIAL
A lot of researchers seem to be struggling with their
understanding of the statistical concept of degrees of
freedom. Most do not really care about why degrees of
freedom are important to statistical tests, but just want
to know how to calculate and report them. This page will
help. For those interested in learning more about
degrees of freedom, take a look at the following
resources:
This chapter in the little handbook of statistical
practice
Walker, H. W. (1940). Degrees of freedom. Journal of
Educational Psychology, 31(4), 253-269.

I couldn't find any resource on the web that explains
calculating degrees of freedom in a simple and clear
manner and believe this page will fill that void. It reflects
my current understanding of degrees of freedom, based
on what I read in textbooks and scattered sources on the
web. Feel free to add or comment.

http://ron.dotsch.org/degreesoffreedom/


CONCEPTUAL UNDERSTANDING


Let's start with a simple explanation of degrees of
freedom. I will describe how to calculate degrees of
freedom in an F-test (ANOVA) without much statistical
terminology. When reporting an ANOVA, you write
degrees of freedom 1 (df1) and degrees of freedom 2
(df2) between the brackets, like this: F(df1, df2) = ...
Df1 and df2 refer to different things, but both can be
understood in the same way, as follows.
Imagine a set of three numbers; pick any numbers you
want. For instance, it could be the set [1, 6, 5].
Calculating the mean for those numbers is easy: (1 + 6 +
5) / 3 = 4.
Now, imagine a set of three numbers whose mean is 3.
There are lots of sets of three numbers with a mean of 3,
but for any set the bottom line is this: you can freely pick
the first two numbers, any number at all, but the third
(last) number is out of your hands as soon as you have
picked the first two. Say our first two numbers are the
same as in the previous set, 1 and 6, giving us a set of two
freely picked numbers, and one number that we still need
to choose, x: [1, 6, x]. For this set to have a mean of 3, we
don't have anything to choose about x. X has to be 2,
because (1 + 6 + 2) / 3 is the only way to get to 3. So, the
first two values were free for you to choose, and the last
value is set accordingly to get to a given mean. This set is
said to have two degrees of freedom, corresponding
with the number of values that you were free to choose
(that is, that were allowed to vary freely).
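
The example with [1, 6, x] can be checked in a few lines of Python. This is only a minimal sketch; the function name `forced_last_value` is made up for illustration and is not part of any library.

```python
def forced_last_value(free_values, target_mean):
    """Return the single value that completes the set so that the
    whole set (free values plus this one) has the target mean."""
    n = len(free_values) + 1              # size of the completed set
    return target_mean * n - sum(free_values)

free = [1, 6]                             # the two freely chosen numbers
x = forced_last_value(free, 3)            # the third number is not free
print(x)                                  # 2
print(len(free))                          # 2 values were free to vary
```

Whatever two numbers you put in `free`, the function returns exactly one value for the third, which is why the set has two degrees of freedom and not three.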
This generalizes to a set of any given length. If I ask you
to generate a set of 4, 10, or 1,000 numbers that average
to 3, you can freely choose all numbers but the last one.
In those sets the degrees of freedom are, respectively, 3,
9, and 999. The general rule then for any set is that
if n equals the number of values in the set, the degrees
of freedom equal n - 1.
This is the basic method to calculate degrees of freedom,
just n - 1. It is as simple as that. The thing that makes it
seem more difficult is the fact that in an ANOVA, you
don't have just one set of numbers, but there is a system
(design) to the numbers. In the simplest form you test
the mean of one set of numbers against the mean of
another set of numbers (one-way ANOVA). In more
complicated one-way designs, you test the means of
three groups against each other. In a 2 x 2 design things
seem even more complicated, especially if there's a
within-subjects variable involved (Note: all examples on
this page are between-subjects, but the reasoning mostly
generalizes to within-subjects designs). However, things
are not as complicated as you might think. It's all pretty
much the same reasoning: how many values are free to
vary to get to a given number?

DF1
Df1 is all about means and not about single observations.
The value depends on the exact design of your test.
Basically, the value represents the number of cell means
that are free to vary to get to a given grand mean. The
grand mean is just the mean across all groups and
conditions of your entire sample. The cell means are
nothing more than the means per group and condition.
We'll call the number of cells (or cell means) k.
Let's start off with a one-way ANOVA. We have two
groups that we want to compare, so we have two cells. If
we know the mean of one of the cells and the grand
mean, the other cell must have a specific value such that
(cell mean 1 + cell mean 2) / 2 = grand mean (this
example assumes equal cell sample sizes, but unequal
cell sample sizes would not change the number of
degrees of freedom). Conclusion: for a two-group design,
df1 = 1.
Let's stick to the one-way ANOVA, but move on to three
groups. We now have three cells, so we have three
means and a grand mean. Again, how many means are
free to vary to get to the given grand mean? That's right,
2. So df1 = 2. See the pattern? For one-way ANOVAs, df1
= k - 1.
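
As a rough sketch of the same reasoning at the level of cell means: with equal cell sizes (an assumption made here only to keep the arithmetic simple), the grand mean is the plain average of the cell means, so the last cell mean follows from the others. The function names and the example numbers are hypothetical.

```python
def df1_one_way(k):
    """df1 for a one-way ANOVA with k groups (cells)."""
    return k - 1

def forced_last_cell_mean(free_cell_means, grand_mean):
    """Assuming equal cell sizes, the grand mean is the average of the
    k cell means, so the last cell mean is fixed by the other k - 1."""
    k = len(free_cell_means) + 1
    return grand_mean * k - sum(free_cell_means)

print(df1_one_way(2))                                      # 1
print(df1_one_way(3))                                      # 2
# Three groups, grand mean 5.0: two cell means are free, the third is not.
print(forced_last_cell_mean([4.0, 7.0], grand_mean=5.0))   # 4.0
```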
Moving on to an ANOVA with four groups. We know the
answer if this is a one-way ANOVA (that is, a 4 x 1 design):
df1 = k - 1 = 4 - 1 = 3. However, what if this is a two-way
ANOVA (a 2 x 2 design)? We still have four means, so to
get to a given grand mean, we can have three freely
varying cell means, right? Although this is true, we have
more to deal with than just the grand mean, namely the
marginal means. The marginal means are the combined
cell means of one variable, given a specific level of the
other variable. Let's say our 2 x 2 ANOVA follows a 2
(gender: male vs. female) x 2 (eye color: blue vs. brown)
design. In that case, the grand mean is the average of all
observations in all 4 cells. The marginal means are the
average of all eye colors for male participants, the
average of all eye colors for female participants, the
average of all genders for blue-eyed participants, and the
average of all genders for brown-eyed participants. The
following table shows the same thing:

       | Brown eyes | Blue eyes |
Male   | CELL MEAN: brown-eyed males | CELL MEAN: blue-eyed males | MARGINAL MEAN of brown-eyed males and blue-eyed males
Female | CELL MEAN: brown-eyed females | CELL MEAN: blue-eyed females | MARGINAL MEAN of brown-eyed females and blue-eyed females
       | MARGINAL MEAN of brown-eyed males and brown-eyed females | MARGINAL MEAN of blue-eyed males and blue-eyed females | GRAND MEAN

The reason that we are now dealing with marginal means
is that we are interested in interactions. In a 4 x 1
one-way ANOVA, no interactions can be calculated. In our
2 x 2 two-way ANOVA, we can. For instance, we might be
interested in whether females perform better than
males depending on their eye color. Now, because we are
interested in cell mean differences in a specific way (i.e.,
we are not just interested in whether one cell mean
deviates from the grand mean, but we are also
interested in more complex patterns), we need to pay
attention to the marginal means. As a consequence, we
now have less freedom to vary our cell means, because
we need to account for the marginal means (if you want
to know how this all works, you should read up on how
the sums of squares are partitioned in 2 x 2 ANOVAs). It
is also important to realize that if all marginal means are
fixed, the grand mean is fixed too. In other words, we do
not have to worry about the grand mean anymore for
calculating our df1 in a two-way ANOVA, because we are
already worrying about the marginal means. As a
consequence, our df1 will not lose a degree of freedom
because we do not want to get to a specific grand mean.
Our df1 will only lose degrees of freedom to get to the
specific marginal means.
Now, how many cell means are free to vary before we
need to fill in the other cell means to get to the four
marginal means in the 2 x 2 design? Let's start with freely
picking the cell mean for brown-eyed males. We know
the marginal mean for brown-eyed males and blue-eyed
males together (it is given, all marginal means are), so
we can't choose the blue-eyed males cell mean
freely. There goes one degree of freedom. We also know
the marginal mean for brown-eyed males and brown-eyed
females together. That means we can't choose the
brown-eyed females cell mean freely either. And as we
know the other two marginal means, we have no choice
in what we put in the blue-eyed females cell mean to get
to the correct marginal means. So, we chose one cell
mean, and the other three cell means had to be filled in
as a consequence to get to the correct marginal means.
You know what that means, don't you? We only have one
degree of freedom in df1 for a 2 x 2 design. That's
different from the three degrees of freedom in a 4 x 1
design. The same number of groups, and they might even
contain the same observations, but we get a different
number of degrees of freedom. So now you see that
using the degrees of freedom, you can infer a lot about
the design of the test.
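
The fill-in exercise for the 2 x 2 design can be sketched in code. This sketch assumes equal cell sizes, so that each marginal mean is the plain average of its two cell means; the function name and the numbers are invented for illustration.

```python
def fill_2x2(brown_male, male_marginal, female_marginal, brown_marginal):
    """Given one freely chosen cell mean and the marginal means, derive the
    other three cell means. Assumes equal cell sizes, so every marginal
    mean is the average of the two cell means it summarizes."""
    blue_male = 2 * male_marginal - brown_male        # fixed by the male row
    brown_female = 2 * brown_marginal - brown_male    # fixed by the brown column
    blue_female = 2 * female_marginal - brown_female  # fixed by the female row
    return blue_male, brown_female, blue_female

# One free choice (brown-eyed males = 10); everything else follows.
blue_male, brown_female, blue_female = fill_2x2(
    brown_male=10, male_marginal=12, female_marginal=11, brown_marginal=9
)
print(blue_male, brown_female, blue_female)   # 14 8 14
# The blue-eyes marginal mean is then determined as well:
print((blue_male + blue_female) / 2)          # 14.0
```

Note that the fourth marginal mean is not an input: once the other three are given, it has to be consistent with them, which matches the observation that fixing the marginal means also fixes the grand mean.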
You could do the same mental exercise for a 2 x 3 design,
but it is tedious for me to write up, so I am going to give
you the general rule. Every variable in your design has a
certain number of levels. Variable 1 in the 2 x 3 design
has 2 levels, variable 2 has 3 levels. You get df1 by
subtracting one from the number of levels of each
variable and multiplying the results with each other.
(Strictly speaking, this product is the df1 of the
interaction between all variables; the main effect of a
single variable has df1 = its number of levels - 1.) So in
the 2 x 3 design, df1 would be (2 - 1) x (3 - 1) = 2 degrees
of freedom. Back to the 2 x 2 design, df1 would be
(2 - 1) x (2 - 1) = 1 degree of freedom. Now let's see what
happens with a 2 x 2 x 2 design: (2 - 1) x (2 - 1) x (2 - 1) =
still 1 degree of freedom. A 3 x 3 x 4 design (I hope
you'll never have to analyze that one): (3 - 1) x (3 - 1) x
(4 - 1) = 2 x 2 x 3 = 12 degrees of freedom.
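
The multiplication rule is easy to express as a small helper. This is a sketch only; `df1_from_levels` is a hypothetical name, and the function simply multiplies (levels - 1) across all variables, as in the examples above.

```python
from math import prod

def df1_from_levels(levels):
    """Multiply (number of levels - 1) across all variables in the design."""
    return prod(n_levels - 1 for n_levels in levels)

print(df1_from_levels([2, 3]))      # 2
print(df1_from_levels([2, 2]))      # 1
print(df1_from_levels([2, 2, 2]))   # 1
print(df1_from_levels([3, 3, 4]))   # 12
```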
By now, you should be able to calculate df1 in F(df1, df2)
with ease. By the way, most statistical programs give you
this value for free. However, now you'll be able to judge
whether researchers have performed the right
analyses in their papers to some extent based on their
df1 value. Also, df1 is calculated the same way in a
within-subjects design. Just treat the within-subjects
variable as any other variable. Let's move on to df2.

DF2
Whereas df1 was all about how the cell means relate to
the grand mean or marginal means, df2 is about how the
single observations in the cells relate to the cell means.
Basically, df2 is the total number of observations in all
cells (n) minus the degrees of freedom lost because the
cell means are set (that is, minus the number of cell
means or groups/conditions: k). Df2 = n - k, that's all
folks! Say we have 150 participants across four
conditions. That means we will have df2 = 150 - 4 = 146,
regardless of whether the design is 2 x 2 or 4 x 1.
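
A short sketch of the df2 calculation for between-subjects designs. The helper name is hypothetical; the number of cells k is taken as the product of the numbers of levels of all variables.

```python
from math import prod

def df2_between(n_observations, levels):
    """df2 = n - k, where k is the number of cells in the design."""
    k = prod(levels)                  # e.g. 2 x 2 -> 4 cells
    return n_observations - k

print(df2_between(150, [2, 2]))       # 146
print(df2_between(150, [4]))          # 146 (four groups in a one-way design)
```

Both designs have four cells, so 150 participants give the same df2 either way; only df1 distinguishes them.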
Most statistical packages give you df2 too. In SPSS, it's
called df error; in other packages it might be called df
residuals.
For the case of within-subjects designs, things can
become a bit more complicated. The following
paragraphs are work in progress. The calculation of df2
for a repeated measures ANOVA with one
within-subjects factor is as follows: df2 = df_total -
df_subjects - df_factor, where df_total = number of
observations (across all levels of the within-subjects
factor, n) - 1, df_subjects = number of participants (N) - 1,
and df_factor = number of levels (k) - 1. Basically, the
take-home message for repeated measures ANOVA is that
you lose additional degrees of freedom for the subjects
(if you're interested: this is because the sum of squares
representing individual subjects' average deviation from
the grand mean is partitioned separately, whereas in
between-subjects designs, that's not the case. To get to a
specific subjects sum of squares, N - 1 subject means are
free to vary, hence you lose N - 1 additional degrees of
freedom).
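
The repeated measures formula can be written out step by step. This sketch assumes a complete design in which every participant is measured once at every level, so that n = N x k; the function name and the example of 20 participants and 3 levels are made up for illustration.

```python
def df2_repeated_measures(n_participants, n_levels):
    """df2 for a repeated measures ANOVA with one within-subjects factor,
    assuming each participant contributes one observation per level."""
    n_observations = n_participants * n_levels    # n = N * k
    df_total = n_observations - 1
    df_subjects = n_participants - 1
    df_factor = n_levels - 1
    return df_total - df_subjects - df_factor

print(df2_repeated_measures(20, 3))   # 38
```

Under this assumption the result equals (N - 1) x (k - 1), here 19 x 2 = 38.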

CONCLUSION
You should be able to calculate df1 and df2 with ease
now (or identify them from the output of your statistical
package like SPSS). Keep in mind that the degrees of
freedom you specify are those of the design of
the effect that you are describing. There is no such thing
as one set of degrees of freedom that is appropriate for
every effect of your design (although, in some cases,
they might seem to have the same value for every
effect).
Moreover, although we have been discussing means in
this tutorial, for a complete understanding, you should
learn about sums of squares, how those translate into
variance, and how test statistics, such as the F-ratio, work.
This will make clear to you how degrees of freedom are
used in statistical analysis. The short functional
description is that, primarily, degrees of freedom affect
which critical values are chosen for test statistics of
interest, given a specific alpha level (remember those
look-up tables in your early statistics classes?).
Why do we use n - 1 degrees of freedom? Computing the
mean uses all observations, divided by n. The mean is
then used in computing the sum of squares (the mean
needs to be known, otherwise you can't compute the sum
of squares). That fixes one number (the mean), and
therefore you lose one degree of freedom in computing
the sum of squares. If the mean is known, n - 1
observations are free to vary. The last one no longer gets
to be freely picked, because it has to get us to the given
mean.
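
To make the role of n - 1 concrete, here is a small sketch using the set [1, 6, 5] from earlier. It computes the sum of squares around the mean and divides it by the degrees of freedom; the comparison with `statistics.variance` from Python's standard library (which computes the sample variance) is included only as a check.

```python
import statistics

values = [1, 6, 5]
mean = sum(values) / len(values)                        # 4.0

# Deviations from the mean always sum to zero, so once n - 1 of them
# are known, the last one is fixed.
deviations = [v - mean for v in values]
print(sum(deviations))                                  # 0.0

sum_of_squares = sum(d ** 2 for d in deviations)        # 9 + 4 + 1
df = len(values) - 1                                    # n - 1 = 2
variance = sum_of_squares / df

print(sum_of_squares)                                   # 14.0
print(variance)                                         # 7.0
print(variance == statistics.variance(values))          # True
```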


43 COMMENTS ON DEGREES OF FREEDOM TUTORIAL


1. Jon Martin says:
April 9, 2015 at 8:00 am

Very good article, finally found one that
really explains the degrees of freedom
with good examples. Thanks!

2. Catherine says:
April 18, 2015 at 1:50 am

Great post. I've shared it with all my stats
students. Thanks so much!

3. Colin says:
April 20, 2015 at 7:25 pm

Well done,
I just wanted to extend my gratitude as
this was well done and clarified my
understanding of degrees of freedom
beyond "here is the formula, use it".

4. Peter says:
April 28, 2015 at 12:38 am

Excellent explanation in jargon-free
language. Thank you.

5. Hadsy says:
April 30, 2015 at 8:28 am

Really good and clear explanation. Why
can't everyone explain statistical concepts
this clearly?
Thanks a lot for this!

6. Arun Sasidharan says:
May 16, 2015 at 6:16 am

Written in a simple manner that will be
useful to even people with some aversion
to statistical details. Thanks

7. Dimuth says:
May 23, 2015 at 8:38 am

Thanks a lot. It clearly explained what DF
is. Really good.

8. Dan says:
June 3, 2015 at 6:40 pm

Thanks for this, brilliant writing and
explanation

9. Kien says:
July 3, 2015 at 7:25 am

Thanks a lot for the explanation. Learnt a
lot in university, training of Six sigma but I
always were wondering that what degree
of freedom is, I read this article 1 year ago.
But now read it again and still feel very
useful and interesting.

10. Weronika says:
July 21, 2015 at 1:34 pm

Many thanks for this article! Everything is
more clear now!

11. Joan says:
July 30, 2015 at 6:36 am

This is really helpful for me, as a student! I
didn't understand this when my teacher
tried to explain it, but I clearly understand
now, after reading your article! Thanks a
lot!

12. Gary says:
September 2, 2015 at 1:44 pm

Thanks for the great article. Still not 100%
clear, but almost there. I'm sure once I look
more into refreshing my stats knowledge
it'll all become clear.
Cheers!

13. William says:
September 5, 2015 at 11:11 pm

Thank you for this. I was hoping for the
following explanation:
Why do I care? So I understand what the
concept is. What does it mean to me when
I am considering my ANOVA? For instance,
I understand WHY I care about significance
or the values I receive from a t-test or
various coefficients.
I don't understand WHY I care about DF.
It's a value and it is meaningless to me.
Thoughts?

A. Ron Dotsch says:
September 7, 2015 at 6:59 am

You care because the p-value you
compute depends on the degrees of
freedom. The F-value is the ratio of
between-group and within-group
variance (or more precisely, sum of
squares). The higher the ratio, the more
between-group variance relative to
within-group variance and so the less
likely that you'll observe your data (or
more extreme data) given that the null
hypothesis is true. With more degrees of
freedom (more participants, more
observations), this becomes even less
likely, and that's why the same F value will
produce smaller p-values with higher
degrees of freedom.
Try and look up the p-value of an F-value
without knowing the degrees of
freedom; it's impossible.

14. Nick says:
November 8, 2015 at 10:57 pm

Really good explanation in terms of
calculation (the best I have seen).
I still struggle with the purpose. Using the
ANOVA example it's very clear to easy
report differences between groups when
we look at the P value (and go on to do
post hoc etc). However, reports that I have
been exposed to do not interpret the F
value nor the df, for instance testing eye
colour (IV, 3 levels) against IQ (DV),
if we compare results of 2 completely
separate studies with different numbers of
people.
Assuming both studies showed differences
that were significant, what would the
difference be between these F value and
df results: F(2,197) = 5.55 and F(2,997) =
25.5?
Thanks
Nick

A. Ron Dotsch says:
November 9, 2015 at 1:57 pm

Hi Nick,
The F values are important to determine
significance jointly with the degrees of
freedom. From the degrees of freedom
in the F-tests you mention I can see that
the second result is based on more
observations. Given that larger sets of
observations are better at estimating the
true population parameters, I would
update my belief about the effect you
are testing more for the F(2,997) test
than for F(2,197). Moreover, the same F
value will yield lower p-values given
larger df2. In other words: when you test
small effects, you need more
observations to have the power to detect
the effects. However, this quickly moves
the discussion from degrees of freedom
to power and sample size.
Best,
Ron

I. Lucas says:
November 12, 2015 at 1:22 pm

That's a really great post, and very clear
answers as well. Thanks a lot!

II. Nick says:
November 20, 2015 at 1:02 am

OK this does make more sense. I do
really appreciate the reply.
Thanks again.
Nick

15. Samar says:
November 20, 2015 at 4:08 pm

Thank you. The post and the previous
comments are very helpful for me as a
student. I pinned this post.

16. CJ says:
November 29, 2015 at 6:56 pm

Great and simplified exposition. Thanks.

17. Kunjan says:
November 30, 2015 at 11:26 am

Article is really awesome. I have never seen
such an easy way to know deep about
degrees of freedom. Thank u!

18. eduke bridget says:
December 4, 2015 at 5:08 pm

Thanks Ron. I am a master degree student
in Cameroon (Africa) still struggling to write
my thesis. There are many things that
confuse me when dealing with statistics.
DF is one of such. Let me say I have
understood 50%. Now, my problem is HOW
is it helpful? HOW can I explain DF to an
ordinary man? HOW DOES IT EXPLAIN A
SITUATION?

A. Ron Dotsch says:
December 8, 2015 at 1:29 pm

It is helpful if you want to compute a
statistic. If an ordinary man is interested
in it he/she should get into statistics,
which are more complex than you can
explain in a single sentence.
The easiest explanation would be that in
order to compute whether a difference
between two conditions of at least the
sampled size would occur if in reality
there is no difference, you need to
estimate the within-condition and
between-condition variance. With those
two ingredients you can compute a t, F,
p-value. However, we know that the
estimation of both variances is
systematically biased, unless you account
for the degrees of freedom. If you want
to know why, you would need to read old
statistics texts, but I am fine with
assuming that this is true as I don't want
to be a statistician. There's no shorter
way to say it.

19. Han Nguyen says:
December 12, 2015 at 9:59 pm

Dear Professor Ron Dotsch,
Thank you very much for your useful
notes. It helps me a lot.
Best regards,
Han Nguyen.

20. nina says:
December 15, 2015 at 4:58 pm

nice explanation but also give other
example

21. lefteris says:
December 16, 2015 at 8:49 am

Thank you so much!!

22. Feifei says:
December 30, 2015 at 7:13 pm

Explains very clearly. Great resource!
Thanks : )

23. Edson Luwobo says:
December 31, 2015 at 3:19 pm

I struggled to understand degrees of
freedom but within few minutes now I can
also teach someone what it is, wonderful
thank you. May I have to ask you for
Hypothesis as well.

24. Dag says:
February 23, 2016 at 2:30 pm

But, clearly you're not entirely free to
choose whatever numbers as the two first
ones of three if the mean is set to 3. There
must be some constraints on the numbers
you are allowed to choose?

A. Ron Dotsch says:
February 23, 2016 at 2:50 pm

No, you can choose whatever numbers as
the first two; the third is the only one
that then will have to be set according to
a specific number to make the mean have
a certain value. The only constraints you
have for choosing the values are
introduced if you want the values to be
realistic. In that case they would be
dependent on the measurement level
and value range of your measurement.
However, that is irrelevant for the
degrees of freedom.

25. Motti says:
February 27, 2016 at 6:49 pm

If I try non-parametric analysis for
Chi-square then the results and df=3 are Ok
and sensible so is that I need to use
non-parametric analysis for departments with
innovation? thnx

26. steve t says:
February 29, 2016 at 3:47 pm

Hi.
Thanks for that. Am I right in concluding
that a lower DF1 will make a particular F
level more likely to be significant, while a
lower df2 will make it less likely?

27. Trish says:
March 4, 2016 at 1:45 am

Thank you so much! Great explanation.

28. Enrico says:
March 29, 2016 at 3:50 pm

Great explanation, thanks!
I still don't get why we divide the sum of
squares by df and not by n.
How does this give us a better F?

A. Ron Dotsch says:
March 31, 2016 at 7:40 am

This is explained here:
https://en.wikipedia.org/wiki/Bessel%27s_correction#Proof_of_correctness__Alternate_3

29. Enrico says:
March 31, 2016 at 10:09 am

Ah, perfect, thanks!

30. Dutchiebrit says:
April 8, 2016 at 7:11 am

Thanks so much for your great tutorial!
always assumed I am useless with numbers
and avoided them, but picked up a study
which requires me to do stats and this has
been no end of help! will be bookmarking
this site!!

31. sritej gunta says:
April 14, 2016 at 10:53 pm

I have a question. If there are 7 roads and 4
paints and each road is marked with each
of the 4 paints and a total of (7*4=28)
brightness are recorded. I am doing a 2
way ANOVA on this data. df1 is (7-1)+(4-1)
=9 right? and df2=28-(7+4)=17 right? The
total mean, means across each of the 4
groups and also means across each of the
7 groups are known. When I am doing the
analysis using SAS I get df1=9 but
df2=18 (not 17). I am confused. Please
help. BTW Nice article.

32. issa assam says:
April 24, 2016 at 9:44 am

simple, logical nd helpful xplanation i have
encounterd ever, thax fo the article

33. Morrison says:
May 28, 2016 at 1:59 am

Great post, the first I've seen that
explains in such easy to understand terms.


© 2016 Ron Dotsch
