Degrees of Freedom (1)
A further application of the same principle occurs in analysis of variance (ANOVA): since the degrees of freedom represent the effective sample size, it makes perfect sense to divide the sums of squares by their respective df to get mean squares. In particular, for a total of $n$ observations, the overall variance is simply the total sum of squares (SST) divided by the total degrees of freedom, $n - 1$. For $k$ treatment categories, the sum of squares due to treatments is given by $\mathrm{SSTR} = \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2$, where $n_i$ and $\bar{x}_i$ denote the number of observations and the mean in the $i$th treatment category, respectively, and $\bar{x}$ is the overall mean of all $n$ observations. Once SSTR has been calculated, the final term in the sum, $n_k(\bar{x}_k - \bar{x})^2$, is determined by the value of SSTR and the preceding $k - 1$ terms; hence for the purpose of computing the mean square due to treatments there are only $k - 1$ degrees of freedom. And because the sum of squares due to error (SSE) is based on squared deviations within each of the $k$ categories, the mean square due to error has $(n_1 - 1) + (n_2 - 1) + \dots + (n_k - 1) = n - k$ degrees of freedom. Of course, because they are additive, the degrees of freedom associated with treatments and those associated with error sum to the total: $(k - 1) + (n - k) = n - 1$.

Exactly the same degrees of freedom reappear in the context of a multiple linear regression if $k$ is taken to represent the number of regression coefficients including the constant term. The rationale for replacing the coefficient of determination, $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$, with its adjusted counterpart, $\text{adjusted } R^2 = 1 - \dfrac{\mathrm{SSE}/(n - k)}{\mathrm{SST}/(n - 1)}$, is then a straightforward application of the argument used above – that is, it is proper to average SSE and SST over their effective sample sizes, $n - k$ and $n - 1$, respectively.

Clearly the table is impossible to complete without further information. Of course, if a single frequency is inserted, such as (R1, C1) = 16, one other cell becomes determined or redundant; in particular, it must be the case that (R2, C1) = 4. But once a second nonredundant frequency is given, the rest of the table is determined by default: if (R1, C2) = 44, then (R1, C3) = 40, (R2, C2) = 26, and (R2, C3) = 50. Thus for $r = 2$ rows and $c = 3$ columns, the table is uniquely determined by 2 nonredundant pieces of information; that is, there are $(r - 1)(c - 1) = 2$ degrees of freedom. Of course, this exercise can easily be replicated and even extended to larger contingency tables, so that with sufficient practice students will convince themselves that the formula for degrees of freedom is intuitively reasonable.

Alternatively, the general formula for the degrees of freedom in a contingency table can be derived as follows. Any table of $r$ rows and $c$ columns has $rc$ total cells. The table is effectively completed after entries are made in all but one row and one column. Since each row has $c$ cells, each column has $r$ cells, and each row intersects each column once, the number of redundant cells (i.e. the number of cells in the final row and final column) is $r + c - 1$. Therefore the degrees of freedom can be calculated as $df = rc - (r + c - 1) = (r - 1)(c - 1)$.

© 2008 The Author Teaching Statistics. Volume 30, Number 3, Autumn 2008 • 77
Journal compilation © 2008 Teaching Statistics Trust

DEGREES OF FREEDOM FOR GOODNESS-OF-FIT

As a final application, consider a chi-squared goodness-of-fit test of the hypothesis that a sample of 100 observations was drawn from a population
having a (truncated) Poisson distribution. Initially, let us assume that the population mean is known to be $\mu = 1.4$, and for simplicity let us assume that no observation ever exceeds 5 (for context, one can think of a physical or legal limitation that precludes $x \ge 6$; suppose, for example, that a local ordinance bars tavern patrons from purchasing more than five alcoholic beverages per hour). The sample data can then be divided into six categories ($k = 6$), such that the random variable $X$ takes the values 0, 1, 2, 3, 4 or 5. The frequencies with which the sample observations fall into these categories represent the information used in the chi-squared test, so there are nominally six such pieces of information. But since the six frequencies must sum to 100, only five of them can be freely varied, the remaining frequency being determined by default. Hence the chi-squared statistic in this case has $df = k - 1 = 5$.

Now suppose instead that $\mu$ is unknown and must be estimated from the sample data. The estimation of the mean from the sample data will ‘cost’ another degree of freedom. To see why, suppose the sample mean is calculated to be $\bar{x} = 1.40$. Now not only must the frequencies sum to 100, that is,

$$f_1 + f_2 + f_3 + f_4 + f_5 + f_6 = 100,$$

but also the sum of all observations – or equivalently, the sum of the products of the $x$ values and their respective frequencies – must be $100 \times 1.40 = 140$; that is,

$$0f_1 + 1f_2 + 2f_3 + 3f_4 + 4f_5 + 5f_6 = 140.$$

Clearly these last two expressions represent simultaneous equations in the six frequencies. Thus we have two equations in six unknowns, leaving $k - 2 = 4$ degrees of freedom. In particular, we can manipulate these simultaneous equations algebraically to derive the observed frequencies for the final two categories as functions of the frequencies in the preceding four categories (subtracting the second equation from five times the first eliminates $f_6$ and gives $f_5$; the first equation then gives $f_6$):

$$f_5 = 360 - 5f_1 - 4f_2 - 3f_3 - 2f_4$$

and

$$f_6 = 4f_1 + 3f_2 + 2f_3 + f_4 - 260,$$

so that only four frequencies are free to be varied. This extension of the simpler example reinforces the notion that a degree of freedom is lost for each parameter that must be estimated from the sample. Although most textbooks state this rule in the goodness-of-fit context, many do so without providing a convincing rationale.

CONCLUSION

Degrees of freedom are nearly ubiquitous in statistical inference, yet they are often ill-defined. Remarkably, this is as true in the reference literature as it is in the pedagogical literature. As a consequence, students taking courses in statistics – especially those students with high levels of math anxiety or outright math phobia – tend to view degrees of freedom as yet another set of inexplicable formulas. This need not be the case, however: some intuitive discussions and elementary exercises can provide reassurance that the concept is not nearly as complex or forbidding as it may at first appear, and students can at least glimpse the relationships among the degrees of freedom in various statistical procedures.
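The ANOVA bookkeeping described earlier can be checked numerically. The following is a minimal Python sketch, not part of the original article: the helper name `anova_table` and the three small groups of data are hypothetical, chosen only so that the arithmetic is easy to follow. It computes SST, SSTR and SSE as defined in the text, divides the treatment and error sums of squares by their degrees of freedom to form mean squares, and lets a reader confirm that both the sums of squares and the degrees of freedom add up to their totals.

```python
from statistics import fmean


def anova_table(groups):
    """One-way ANOVA sums of squares, degrees of freedom and mean squares.

    `groups` holds one list of observations per treatment category
    (hypothetical helper for illustration).
    """
    all_obs = [x for g in groups for x in g]
    n, k = len(all_obs), len(groups)
    grand_mean = fmean(all_obs)
    means = [fmean(g) for g in groups]

    # Total: every observation's squared deviation from the overall mean
    sst = sum((x - grand_mean) ** 2 for x in all_obs)
    # Treatments: one term n_i * (mean_i - overall mean)^2 per category
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Error: squared deviations within each category from its own mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_total, df_treat, df_error = n - 1, k - 1, n - k
    return {
        "SST": sst, "SSTR": sstr, "SSE": sse,
        "df_total": df_total, "df_treat": df_treat, "df_error": df_error,
        "MSTR": sstr / df_treat,  # mean square due to treatments
        "MSE": sse / df_error,    # mean square due to error
    }


t = anova_table([[4, 5, 6], [7, 8, 9, 10], [2, 3]])  # illustrative data
print(t["SSTR"], t["SSE"], t["SST"])                 # 52.5 7.5 60.0
print(t["df_treat"], t["df_error"], t["df_total"])   # 2 6 8
```

With these data $n = 9$ and $k = 3$, so the treatment and error degrees of freedom are 2 and 6, which sum to $n - 1 = 8$, just as SSTR and SSE sum to SST.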
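The same sums of squares can illustrate the adjusted $R^2$ from the regression paragraph, because a one-way ANOVA with $k$ categories can be written as a regression with $k$ coefficients (a constant term plus $k - 1$ indicator variables). The sketch below uses hypothetical helper names and simply transcribes the two formulas given in the text, with $k$ counting the constant term.

```python
def r_squared(sse, sst):
    """Coefficient of determination, R^2 = 1 - SSE/SST."""
    return 1 - sse / sst


def adjusted_r_squared(sse, sst, n, k):
    """Adjusted R^2: SSE and SST averaged over their degrees of freedom.

    n is the number of observations and k the number of regression
    coefficients, including the constant term.
    """
    if n <= k:
        raise ValueError("need n > k so that SSE has n - k > 0 degrees of freedom")
    return 1 - (sse / (n - k)) / (sst / (n - 1))


# Illustrative values: SSE = 7.5 and SST = 60 with n = 9 and k = 3
print(r_squared(7.5, 60))  # 0.875
print(adjusted_r_squared(7.5, 60, 9, 3))
```

Because $n - k < n - 1$, the adjusted value is always somewhat smaller than $R^2$ whenever $k > 1$ and $\mathrm{SSE} > 0$; here it is $1 - 1.25/7.5 = 5/6$.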
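The contingency-table exercise can also be automated. In this sketch (function names are hypothetical), the $(r - 1)(c - 1)$ free cells are entered and every remaining cell is filled in from the row and column totals, mirroring the argument that the final row and final column are redundant. The margins used below are not quoted in this section; they are simply the row and column sums implied by the completed $2 \times 3$ table in the worked example.

```python
def contingency_df(r, c):
    """rc cells minus the r + c - 1 redundant cells in the final row and column."""
    return r * c - (r + c - 1)


def complete_table(free_cells, row_totals, col_totals):
    """Fill an r x c table from its margins and its (r-1) x (c-1) free cells."""
    r, c = len(row_totals), len(col_totals)
    if sum(row_totals) != sum(col_totals):
        raise ValueError("row and column totals must have the same sum")
    table = [[0] * c for _ in range(r)]
    for i in range(r - 1):
        for j in range(c - 1):
            table[i][j] = free_cells[i][j]
        # The last cell of each of the first r - 1 rows is fixed by its row total
        table[i][c - 1] = row_totals[i] - sum(table[i][: c - 1])
    for j in range(c):
        # Every cell of the final row is fixed by its column total
        table[r - 1][j] = col_totals[j] - sum(table[i][j] for i in range(r - 1))
    if any(cell < 0 for row in table for cell in row):
        raise ValueError("free cells are inconsistent with the margins")
    return table


# Margins implied by the completed 2 x 3 example: rows 100, 80; columns 20, 70, 90
print(complete_table([[16, 44]], [100, 80], [20, 70, 90]))  # [[16, 44, 40], [4, 26, 50]]
print(contingency_df(2, 3), (2 - 1) * (3 - 1))              # 2 2
```

Only (R1, C1) and (R1, C2) are supplied; the other four cells follow from the margins, which is the sense in which the $2 \times 3$ table has two degrees of freedom.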
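Finally, the goodness-of-fit argument can be verified by solving the two constraints for $f_5$ and $f_6$. This sketch uses hypothetical helper names and hypothetical counts for the first four categories; the defaults `n=100` and `total=140` (that is, $100 \times 1.40$) reproduce the equations in the text, and `gof_df` encodes the rule that one degree of freedom is lost for the fixed total and one more for each estimated parameter.

```python
def gof_df(k, n_estimated=0):
    """Chi-squared goodness-of-fit df: k categories, minus 1 for the fixed
    total, minus 1 for each parameter estimated from the sample."""
    return k - 1 - n_estimated


def last_two_frequencies(f1, f2, f3, f4, n=100, total=140):
    """Solve the two constraints for f5 and f6.

        f1 + f2 + f3 + f4 + f5 + f6 = n
        0*f1 + 1*f2 + 2*f3 + 3*f4 + 4*f5 + 5*f6 = total   (total = n * xbar)

    With n = 100 and total = 140 this gives f5 = 360 - 5f1 - 4f2 - 3f3 - 2f4
    and f6 = 4f1 + 3f2 + 2f3 + f4 - 260, as in the text.
    """
    f5 = (5 * n - total) - 5 * f1 - 4 * f2 - 3 * f3 - 2 * f4
    f6 = (total - 4 * n) + 4 * f1 + 3 * f2 + 2 * f3 + f4
    return f5, f6


print(gof_df(6), gof_df(6, n_estimated=1))  # 5 4

# Hypothetical counts for x = 0, 1, 2, 3
f = [25, 34, 24, 11]
f5, f6 = last_two_frequencies(*f)
print(f5, f6)  # 5 1
counts = f + [f5, f6]
print(sum(counts), sum(x * fx for x, fx in enumerate(counts)))  # 100 140
```

Any choice of the first four counts determines the last two, so only four frequencies are free once $\bar{x}$ is fixed; some choices force a negative $f_5$ or $f_6$, which simply means those four counts are incompatible with $n = 100$ and $\bar{x} = 1.40$.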