Historical Perspective
Ross E. Traub
The Ontario Institute for Studies in Education of
the University of Toronto
Winter 1997 9
Association Between Two Things" and published in the American Journal of Psychology. Spearman's article angered Karl Pearson. The reason was that Spearman had had the temerity to challenge Pearson's conclusion that the coefficient of correlation between pairs of brothers was 0.5 for psychic characteristics, just as it had been found to be for physical characteristics. Using reliability estimates from his own work, Spearman estimated the corrected correlation coefficient for mental ability to be 0.8.

Pearson was co-editor of a newly founded journal, Biometrika. He inserted the following petulant note in his 1904 article in that journal on the correlation between selected characteristics of sibling pairs.

I hardly know whether it is needful to refer here to a recent article by Mr. C. Spearman . . . criticising my results for the similarity of inheritance in the physical and psychical characters. Without waiting to read my paper in full he seems to think I must have disregarded "home influences" and the personal equation of the school teachers. He proceeded to "correct" my results for the error of what he calls dilation on the double basis (i) of a formula invented by himself, but given without proof, and (ii) of his own experience that two observers' observations or measurements of the same series of two characters were such that the correlation between their determinations was .58 in one case and .22 in the other. The formula invented by Mr. Spearman for his so-called "dilation" is clearly wrong, for applied to perfectly definite cases, it gives values greater than unity for the correlation coefficient. As to his second basis, all I can say is that if the correlation between two observers of the same thing in Mr. Spearman's case can be as low as .22, he must have employed the most incompetent observers, or given them the most imperfect instructions, or chosen a character [more] suitable for random guessing than observation in the scientific sense. Mr. Spearman says that "it is difficult to avoid the conclusion that the remarkable coincidence announced between physical and mental heredity can be [nothing] more than mere accidental coincidence" (p. 98). I think I may safely leave him to calculate the odds for or against this most remarkable "mere accidental coincidence". . . . Perhaps the best thing at present would be for Mr. Spearman to write a paper giving algebraic proofs of all the formulae he has used; and if he did not discover their erroneous nature in the process, he would at least provide tangible material for definite criticism, which it is difficult to apply to mere unproven assertions. (Pearson, 1904, p. 160)

Stung into responding, Spearman published a proof of the correction for attenuation in 1907, again using the American Journal of Psychology. Subsequently, a proof in a form often encountered in present day textbooks on educational and psychological measurement was given by Spearman (1910) and also by William Brown (1910), both of whom ascribed it to Yule. This derivation stressed that the error components of all measures should be independent, and hence uncorrelated. Spearman's earlier "proof" had not emphasized this restriction (Walker, 1929).

Pearson was not the only critic of the correction for attenuation. In particular, Brown (1910) challenged it on the grounds that measurement error is not really random (accidental). Brown proposed a way of testing the equality of the covariances S(x1y1) and S(x2y2), where x1, x2, y1, and y2 are observed-score variables. Assuming x1 is parallel to x2 and y1 is parallel to y2, then both these covariances, according to classical theory, should equal S(xy), where x and y are the true-score variables associated with {x1, x2} and {y1, y2}, respectively. Brown (1913) reported results based on an application of this proposal, results he claimed did show that measurement errors are not accidental.

Spearman was aware of Brown's criticism, among others, and presciently responded as follows in his British Journal of Psychology article of 1910:

1. Spearman reiterated his position that of the two kinds of errors in measurement, "regular" and "accidental" (p. 273), the correction formula applies only to the latter. As regards Brown's contention that accidental errors can be correlated, Spearman observed that, if errors were indeed found to be linked (as might be the case, e.g., if a person were ill at the time of taking both tests x and y), then the investigator should employ a better experimental design.
2. It had been suggested according to Spearman (1910, p. 272) that investigators should make measurements so "efficient" that no correction would be needed. Spearman wondered how an investigator would know his measurements were efficient enough except by using the correction formula?
3. To Pearson's criticism that the correction could produce coefficients greater than one, Spearman countered that this might occur due to sampling error. He recommended (p. 277) the coefficient be set to one whenever this happened.

The Spearman-Brown formula. Coefficients of reliability are needed in order to apply the correction for attenuation. In his 1904 article, Spearman had assumed the availability of two independent measurements of both the characteristics for which a corrected correlation coefficient is desired. The breakthrough, apparently achieved independently by Spearman and Brown, to a formula by which to calculate a reliability coefficient from the two halves of just one composite measure was published in adjacent articles in a 1910 issue of The British Journal of Psychology. Brown's proof of the formula is the more elegant and bears the stamp of Yule.

During the second and third decades of the 20th century, numerous experiments were conducted to test predictions of the Spearman-Brown formula. A thoughtful review of publications emanating from this preoccupation of early psychometricians can be found in the notes on test theory prepared by Louis Leon Thurstone (1932).

The index of reliability and other results. It is a worthwhile experience, though in the late 20th cen-
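As an illustration of the two formulas discussed in this section, the following is a minimal Python sketch, not a reproduction of Spearman's or Brown's own notation. It assumes the standard textbook forms: the correction for attenuation as the observed correlation divided by the square root of the product of the two reliabilities, and the Spearman-Brown formula for two halves as 2r / (1 + r). The function names and the numerical inputs are hypothetical, chosen only for illustration. The cap at one follows the recommendation attributed above to Spearman (1910, p. 277).

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the true-score correlation from an observed correlation.

    r_xy: observed correlation between measures x and y
    r_xx, r_yy: reliability coefficients of x and y
    (Standard textbook form; names are illustrative assumptions.)
    """
    if not (0.0 < r_xx <= 1.0 and 0.0 < r_yy <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    corrected = r_xy / math.sqrt(r_xx * r_yy)
    # Sampling error can push the estimate above 1; cap it at 1,
    # as the text reports Spearman recommending.
    return min(corrected, 1.0)


def spearman_brown(r_half):
    """Step up the correlation between two test halves to an
    estimate of the reliability of the full-length test."""
    return 2.0 * r_half / (1.0 + r_half)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(spearman_brown(0.60))
    print(correct_for_attenuation(0.40, 0.64, 0.64))
    print(correct_for_attenuation(0.50, 0.30, 0.30))  # capped at 1.0
```

The sketch caps only the upper end of the corrected coefficient, since that is the case the text describes; how to treat other out-of-range values is left open here.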
$$\frac{\sigma_t^2 - npq}{(n - 1)\,npq},$$

which, when substituted in Equation 3, and simplified, gives

$$\frac{n}{n - 1}\left(\frac{\sigma_t^2 - npq}{\sigma_t^2}\right).$$

Formula 21 was derived under the additional assumption of equal item difficulty indices.

Kelley criticized the KR formulas in a 1942 article in Psychometrika on the grounds that they are valid only for tests "with unity of purpose", that is, for tests composed of items that share just one factor in common. Reiterating his long-standing advocacy of the parallel-test design for reliability estimation, Kelley went on to say "we conclude that a belief that two or more measures of a mental function exist is prerequisite to the concept of reliability, and, further, not only that they exist but that they are available before a measure of reliability is possible" (p. 76). As a further challenge to the KR formulation, Kelley demonstrated that a test with zero interitem covariances could produce a reasonable correlation with a "similar" (p. 81) test, even though its KR20 index would be zero.

It is obvious, a half-century later, that Kelley's view did not prevail. The KR formulations quickly received widespread acceptance, abetted in part by the publication of an article by Paul Dressel (1940). Dressel showed that, when all the items of a test intercorrelate perfectly and all item variances are equal, KR20 attains the value of 1; otherwise, it is less. He further demonstrated that KR20 can take values less than 0. Dressel also increased the applicability of KR20 by deriving a version for tests to which the correction for guessing is applied.

Lower bounds to reliability. Perhaps it was Dressel's demonstration that KR20 can be negative that marks the beginning of work on lower bounds to reliability. Alternatively, a case might be made for Philip Rulon's (1939) article in the Harvard Educational Review. Rulon introduced the notion, not the name, of essentially tau-equivalent test halves. The halves of a test are essentially tau equivalent if, for any examinee in the test population, the true score on one test half differs from the true score on the other half by a constant, which is the same for all examinees. Also, under the essentially tau-equivalent assumption, as distinct from the parallel test-halves assumption, the error variance for an examinee on one test half is not necessarily equal to the error variance for the examinee on the other half (see Lord & Novick, 1968, p. 50).

Whatever the wellspring of work on lower bounds, Louis Guttman (1945) published the first article, as far as I know, in which lower bounds to reliability are explicitly derived. But this Psychometrika article is important for a reason other than the lower bounds it contains. Guttman also offered a theoretical framework within which to treat, if not actually to reconcile, the antagonistic views of Brown and Kelley regarding how the reliability coefficient should be estimated. Guttman did this by first identifying three sources of variation in test responses: persons, items, and trials. Guttman defined error variance exclusively in terms of variation in responses over the universe of trials. This definition leads to a proof of total test variance as the sum of true-score and error variance, without the need to assume zero covariance of true and error scores. (The latter assumption lies at the heart of Yule's proof of the correction for attenuation.) Defining the reliability coefficient "as the complement of the ratio of error variance to total [test] variance" (p. 257), Guttman then went on to demonstrate (pp. 267-268) that the reliability coefficient can be estimated as the correlation between the test scores for a group of examinees on two "experimentally independent" (p. 264) trials of a test. Given the results from only one trial, however, Guttman showed that the best result possible is an estimate of a lower bound to reliability. This result was shown to rest on the assumption

that the errors of observation are independent between items and between persons over the universe of trials. In the conventional [Yule] approach, independence is taken over persons rather than trials, and the problem of observability from a single trial is not explicitly analyzed. (p. 257)

Guttman derived six lower bounds to reliability, of which three are noted here. One of these is a generalization of KR20 to tests composed of items scored on any scale, dichotomous or otherwise. He labeled this index $\lambda_3$. Subsequently, $\lambda_3$ became better known as coefficient alpha (Cronbach, 1951). Two of the other lower bounds to reliability were labeled, not surprisingly, $\lambda_1$ and $\lambda_2$, with the former typically smaller than alpha, the latter typically larger. Research on lower bounds to reliability constitutes a small but still active line of psychometric research.

Formalization

Various attempts to formalize classical test theory have been made over the years. Already mentioned is the section on reliability in Kelley's (1923) Statistical Method. Another early work is that by Thurstone (1932). The next presentation of note is Theory of Mental Tests by Harold Gulliksen (1950). The culmination of such efforts as these was realized in the work of Melvin Novick (1966) and in the early chapters of Statistical Theories of Mental Test Scores (1968) by Frederic Lord and Novick.

Concluding Remarks

Several important topics from the realm of classical test theory have not been covered in this brief retrospective. Among them are the effects of range restriction on the magnitude of the reliability coefficient, the application of analysis of variance to the study of measurement error and reliability (this before the advent of generalizability theory), and the modeling of test data generally addressed under the topic of congeneric models. Clearly, there is more to classical test theory, and its history, than the work reviewed in this article.

Lest we leave this topic thinking classical test theory an unduly important area of research in the history of empirical research in psychology and education, we will find it salutary to reflect on the following
Kelley, T. L. (1923). Statistical method. New York: Macmillan.
Kelley, T. L. (1942). The reliability coefficient. Psychometrika, 7, 75-83.
Kuder, G. F., & Richardson, M. W. (1937). The theory of estimation of test reliability. Psychometrika, 2, 151-160.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Novick, M. R. (1966). The axioms and principal results of classical test theory. Journal of Mathematical Psychology, 3, 1-18.
Pearson, K. (1896). Mathematical contributions to the theory of evolution-III. Regression, heredity and panmixia. Philosophical Transactions, A, 187, 252-318.
Pearson, K. (1904). On the laws of inheritance in man. II. On the inheritance of the mental and moral characters in man, and its comparison with the inheritance of physical characters. Biometrika, 3, 131-190.
Pearson, K. (1930). The life, letters, and labours of Francis Galton. Vol. IIIA. Correlation, personal identification and eugenics. Cambridge: The University Press.
Pearson, K., & Lee, A. (1903). On the laws of inheritance in man. I. Inheritance of physical characters. Biometrika, 2, 357-462.
Read, C. B. (1985). Normal distribution. In S. Kotz & N. L. Johnson (Eds.), Encyclopedia of statistical sciences (Vol. 6, pp. 347-359). Toronto: Wiley.
Richardson, M. W. (1936). Notes on the rationale of item analysis. Psychometrika, 1(1), 69-76.
Rulon, P. J. (1939). A simplified procedure for determining the reliability of a test by split-halves. Harvard Educational Review, 9, 99-103.
Sheynin, O. B. (1968). On the early history of the law of large numbers. Biometrika, 55, 459-467.
Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15, 72-101.
Spearman, C. (1907). Demonstration of formulae for true measurement of correlation. American Journal of Psychology, 18, 160-169.
Spearman, C. (1910). Correlation calculated from faulty data. British Journal of Psychology, 3, 271-295.
Thurstone, L. L. (1932). The reliability and validity of tests. Ann Arbor, MI: N. p.
Venn, J. (1888). The logic of chance (3rd ed.). London: Macmillan.
Walker, H. M. (1929). Studies in the history of statistical method. Baltimore: Williams & Wilkins.