Use of Ranks in One-Criterion Variance Analysis
Author(s): William H. Kruskal and W. Allen Wallis
Source: Journal of the American Statistical Association, Vol. 47, No. 260 (Dec., 1952), pp. 583-621
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2280779

JOURNAL OF THE AMERICAN
STATISTICAL ASSOCIATION
Number 260          DECEMBER 1952          Volume 47

USE OF RANKS IN ONE-CRITERION VARIANCE ANALYSIS

WILLIAM H. KRUSKAL AND W. ALLEN WALLIS
University of Chicago
1. INTRODUCTION .............................................. 584
   1.1 Problem ................................................ 584
   1.2 Usual Solution .......................................... 584
   1.3 Advantages of Ranks ..................................... 585
   1.4 The H Test .............................................. 586
2. EXAMPLES .................................................... 587
   2.1 Without Ties ............................................ 587
   2.2 With Ties ............................................... 588
3. JUSTIFICATION OF THE METHOD ................................. 590
   3.1 Two Samples ............................................. 590
       3.1.1 Continuity adjustment ............................. 591
       3.1.2 Ties .............................................. 592
       3.1.3 Examples .......................................... 593
   3.2 Three Samples ........................................... 595
   3.3 More than Three Samples ................................. 597
4. INTERPRETATION OF THE TEST .................................. 598
   4.1 General Considerations .................................. 598
   4.2 Comparison of Means when Variability Differs ............ 598
5. RELATED TESTS ............................................... 600
   5.1 Permutation Tests and Ranks ............................. 600
   5.2 Friedman's χr² .......................................... 601
   5.3 Wilcoxon's Two-Sample Test .............................. 602
       5.3.1 Wilcoxon (1945, 1947) ............................. 602
       5.3.2 Festinger (1946) .................................. 602
       5.3.3 Mann and Whitney (1947) ........................... 603
       5.3.4 Haldane and Smith (1948) .......................... 603
       5.3.5 White (1952) ...................................... 604
       5.3.6 Power of Wilcoxon's test .......................... 604
   5.4 Whitney's Three-Sample Test ............................. 605
   5.5 Terpstra's C-Sample Test ................................ 606
   5.6 Mosteller's C-Sample Test ............................... 606
   5.7 Fisher and Yates' Normalized Ranks ...................... 606
   5.8 Other Related Tests ..................................... 607
       5.8.1 Runs .............................................. 607
       5.8.2 Order statistics .................................. 607
6. SIGNIFICANCE LEVELS, TRUE AND APPROXIMATE ................... 608
   6.1 True Significance Levels ................................ 608
       6.1.1 Two samples ....................................... 608
       6.1.2 Three samples ..................................... 608
       6.1.3 More than three samples ........................... 608
   6.2 Approximate Significance Levels ......................... 609
       6.2.1 χ² approximation .................................. 609
       6.2.2 Γ approximation ................................... 609
       6.2.3 B approximation ................................... 609
   6.3 Comparisons of True and Approximate Significance Levels . 618
7. REFERENCES .................................................. 618

Given C samples, with ni observations in the ith sample, a test of the
hypothesis that the samples are from the same population may be made
by ranking the observations from 1 to Σni (giving each observation in a
group of ties the mean of the ranks tied for), finding the C sums of ranks,
and computing a statistic H. Under the stated hypothesis, H is distributed
approximately as χ²(C-1), unless the samples are too small, in which case
special approximations or exact tables are provided. One of the most
important applications of the test is in detecting differences among the
population means.*

1. INTRODUCTION
1.1. Problem
A COMMON problem in practical statistics is to decide whether
several samples should be regarded as coming from the same
population. Almost invariably the samples differ, and the question is
whether the differences signify differences among the populations, or
are merely the chance variations to be expected among random samples
from the same population. When this problem arises one may often
assume that the populations are of approximately the same form, in
the sense that if they differ it is by a shift or translation.
1.2. Usual Solution
The usual technique for attacking such problems is the analysis of
variance with a single criterion of classification [46, Chap. 10]. The
variation among the sample means, x̄i, is used to estimate the variation
among individuals, on the basis of (i) the assumption that the variation
among the means reflects only random sampling from a population in
which individuals vary, and (ii) the fact that the variance of the means
of random samples of size ni is σ²/ni, where σ² is the population
variance. This estimate of σ² based on the variation among sample
means is then compared with another estimate based only on the variation
* Based in part on research supported by the Office of Naval Research at the Statistical Research Center, University of Chicago.
For criticisms of a preliminary draft which have led to a number of improvements we are indebted to Maurice H. Belz (University of Melbourne), William G. Cochran (Johns Hopkins University), J. Durbin (London School of Economics), Churchill Eisenhart (Bureau of Standards), Wassily Hoeffding (University of North Carolina), Harold Hotelling (University of North Carolina), Howard L. Jones (Illinois Bell Telephone Company), Erich L. Lehmann (University of California), William G. Madow (University of Illinois), Henry B. Mann (Ohio State University), Alexander M. Mood (The Rand Corporation), Lincoln E. Moses (Stanford University), Frederick Mosteller (Harvard University), David L. Russell (Bowdoin College), I. Richard Savage (Bureau of Standards), Frederick F. Stephan (Princeton University), Alan Stuart (London School of Economics), T. J. Terpstra (Mathematical Center, Amsterdam), John W. Tukey (Princeton University), Frank Wilcoxon (American Cyanamid Company), and C. Ashley Wright (Standard Oil Company of New Jersey), and to our colleagues K. A. Brownlee, Herbert T. David, Milton Friedman, Leo A. Goodman, Ulf Grenander, Joseph L. Hodges, Harry V. Roberts, Murray Rosenblatt, Leonard J. Savage, and Charles M. Stein.
within samples. The agreement between these two estimates is
tested by the variance ratio distribution with C-1 and N-C degrees
of freedom (where N is the number of observations in all C samples
combined), using the test statistic F(C-1, N-C). A value of F larger
than would ordinarily result from two independent sample estimates
of a single population variance is regarded as contradicting the
hypothesis that the variation among the sample means is due solely to
random sampling from a population whose individuals vary.
When σ² is known, it is used in place of the estimate based on the
variation within samples, and the test is based on the χ²(C-1)
distribution (that is, χ² with C-1 degrees of freedom) using the test
statistic

(1.1)    χ²(C-1) = (1/σ²) Σ{i=1..C} ni(x̄i - x̄)²

where x̄ is the mean of all N observations.

1.3. Advantages of Ranks


Sometimes it is advantageous in statistical analysis to use ranks
instead of the original observations, that is, to array the N observations
in order of magnitude and replace the smallest by 1, the next-to-smallest
by 2, and so on, the largest being replaced by N. The advantages are:
(1) The calculations are simplified. Most of the labor when using ranks is in
    making the ranking itself, and short cuts can be devised for this. For
    example, class intervals can be set up as for a frequency distribution,
    and actual observations entered instead of tally marks. Another method is
    to record the observations on cards or plastic chips,1 which can be
    arranged in order, the cards perhaps by sorting devices.
(2) Only very general assumptions are made about the kind of distributions
    from which the observations come. The only assumptions underlying the
    use of ranks made in this paper are that the observations are all
    independent, that all those within a given sample come from a single
    population, and that the C populations are of approximately the same form.
    The F and χ² tests described in the preceding section assume approximate
    normality in addition.
(3) Data available only in ordinal form may often be used.
(4) When the assumptions of the usual test procedure are too far from reality,
    not only is there a problem of distribution theory if the usual test is used,
    but it is possible that the usual test may not have as good a chance as a
    rank test of detecting the kinds of difference of real interest.
The present paper presents an analog, based on ranks and called the
H test, to one-criterion variance analysis.
1 We are indebted to Frank Wilcoxon for this suggestion.

1.4. The H Test


The rank test presented here requires that all the observations be
ranked together, and the sum of the ranks obtained for each sample.
The test statistic to be computed if there are no ties (that is, if no two
observations are equal) is

(1.2)    H = [12 / (N(N+1))] Σ{i=1..C} Ri²/ni - 3(N+1)        (no ties)

where
    C = the number of samples,
    ni = the number of observations in the ith sample,
    N = Σni, the number of observations in all samples combined,
    Ri = the sum of the ranks in the ith sample.
Large values of H lead to rejection of the null hypothesis.
If the samples come from identical continuous populations and the
ni are not too small, H is distributed as χ²(C-1), permitting use of the
readily available tables of χ². When the ni are small and C = 2, tables
are available which are described in Section 5.3. For C = 3 and all
ni ≤ 5, tables are presented in Section 6. For other cases where the χ²
approximation is not adequate, two special approximations are
described in Section 6.2.
If there are ties, each observation is given the mean of the ranks for
which it is tied. H as computed from (1.2) is then divided by

(1.3)    1 - ΣT / (N³ - N)

where the summation is over all groups of ties and T = (t-1)t(t+1)
= t³ - t for each group of ties, t being the number of tied observations
in the group. Values of T for t up to 10 are shown in Table 1.1.2

TABLE 1.1
(See Section 3.1.2)

t    1    2    3    4    5    6    7    8    9   10
T    0    6   24   60  120  210  336  504  720  990

Since (1.3) must lie between zero and one, it increases (1.2). If all N
observations are equal, (1.3) reduces (1.2) to the indeterminate form
0/0. If there are no ties, each value of t is 1, so ΣT = 0 and (1.2) is
unaltered by (1.3).
2 DuBois [4, Table II] gives values of T/12 (his C1) and T/6 (his C2) for t (his N) from 5 to 50.
Thus, (1.2) divided by (1.3) gives a general expression which holds
whether or not there are ties, assuming that such ties as occur are given
mean ranks:

(1.4)    H = { [12 / (N(N+1))] Σ{i=1..C} Ri²/ni - 3(N+1) } / { 1 - ΣT/(N³ - N) }

In many situations the difference between (1.4) and (1.2) is negligible.
A working guide is that with ten or fewer samples a χ² probability of
0.01 or more obtained from (1.2) will not be changed by more than
ten per cent by using (1.4), provided that not more than one-fourth of
the observations are involved in ties.3 H for large samples is still
distributed as χ²(C-1) when ties are handled by mean ranks; but the
tables for small samples, while still useful, are no longer exact.
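For readers who want to check the arithmetic by machine, the following is a
minimal sketch (ours, not part of the original paper; the function names are our
own) of the computation just described: ranks are assigned by the mean-rank
method, H is formed as in (1.2), and the result is divided by the tie correction (1.3).

    # Sketch of the H computation of (1.2)-(1.4), with ties handled by mean ranks.
    from collections import Counter

    def mean_ranks(values):
        """Rank all values together, giving tied values the mean of the ranks tied for."""
        ordered = sorted(values)
        first_position = {}
        for pos, v in enumerate(ordered, start=1):
            first_position.setdefault(v, pos)
        counts = Counter(ordered)
        # mean rank of a value = first position + (number of ties - 1)/2
        return {v: first_position[v] + (counts[v] - 1) / 2 for v in counts}

    def kruskal_wallis_H(samples):
        """H of (1.2) divided by the tie correction (1.3); samples is a list of lists."""
        pooled = [x for sample in samples for x in sample]
        N = len(pooled)
        rank_of = mean_ranks(pooled)
        H = 12 / (N * (N + 1)) * sum(
            sum(rank_of[x] for x in sample) ** 2 / len(sample) for sample in samples
        ) - 3 * (N + 1)
        sum_T = sum(t ** 3 - t for t in Counter(pooled).values())   # the ΣT of (1.3)
        return H / (1 - sum_T / (N ** 3 - N))   # divisor is 1 when there are no ties

With no ties the divisor is exactly one, so the value returned is simply (1.2).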
For understanding the nature of H, a better formulation of (1.2) is

(1.5)    H = [(N-1)/N] Σ{i=1..C} ni [R̄i - ½(N+1)]² / [(N²-1)/12]        (no ties)

where R̄i is the mean of the ni ranks in the ith sample. If we ignore the
factor (N-1)/N, and note that ½(N+1) is the mean and (1/12)(N²-1)
the variance of the uniform distribution over the first N integers, we
see that (1.5), like (1.1), is essentially a sum of squared standardized
deviations of random variables from their population mean. In this
respect, H is similar to χ², which is defined as a sum of squares of
standardized normal deviates, subject to certain conditions on the
relations among the terms of the sum. If the ni are not too small, the
R̄i jointly will be approximately normally distributed and the relations
among them will meet the χ² conditions.
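The algebraic step connecting (1.5) with (1.2) is short; written out for convenience
(our addition, using the paper's notation Ri = niR̄i and ΣRi = ½N(N+1)):

    \begin{aligned}
    \sum_{i=1}^{C} n_i\bigl[\bar R_i - \tfrac12(N+1)\bigr]^2
      &= \sum_i \frac{R_i^2}{n_i} - (N+1)\sum_i R_i + \frac{N(N+1)^2}{4}
       = \sum_i \frac{R_i^2}{n_i} - \frac{N(N+1)^2}{4},\\[2pt]
    \frac{N-1}{N}\cdot\frac{12}{N^2-1}\sum_i n_i\bigl[\bar R_i - \tfrac12(N+1)\bigr]^2
      &= \frac{12}{N(N+1)}\sum_i \frac{R_i^2}{n_i} - 3(N+1) = H.
    \end{aligned}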
2. EXAMPLES
2.1 Without Ties
In a factory, three machines turn out large numbers of bottle caps.
One machine is standard and two have been modified in different
ways, but otherwise the machines and their operating conditions are
identical. On any one day, only one machine is operated. Table 2.1
shows the production of the machines on various days and the calculation
of H as 5.656. The true probability, if the machines really are
the same with respect to output, that H should be as large as 5.656
is shown in Figure 6.1 and Table 6.1 as 0.049. The approximation to
this probability given by the χ²(2) distribution is 0.059. Two more-
complicated approximations described in Section 6.2 give 0.044 and
0.045.
3 Actually, for the case described it is possible for the discrepancy slightly to exceed ten per cent. For a given total number of ties, S, the second term of (1.3) is a maximum if all S ties are in one group, and this maximum, (S³ - S)/(N³ - N), is slightly less than (S/N)³. Thus, for S/N = ¼, (1.3) > 63/64. The 0.01 level of χ²(9) is 21.666. This divided by 63/64 is 22.010, for which the probability is 0.00885, a change of 11½ per cent. For higher probability levels, fewer samples, or more than one group of ties, the percentage change in probability would be less. With the S ties divided into G groups, the second term of (1.3) is always less than [(S - h)³ + 4h]/N³, where h = 2(G - 1).
TABLE 2.1
DAILY BOTTLE-CAP PRODUCTION OF THREE MACHINES
(Artificial data.)

       Standard        Modification 1      Modification 2
    Output   Rank      Output   Rank       Output   Rank
      340      5         339      4          347     10
      345      9         333      2          343      7
      330      1         344      8          349     11
      342      6                             355     12
      338      3
                                                            Sum
    n          5                  3                    4     12
    R         24                 14                   40     78
    R²/n   115.2             65.333               400.    580.533

Checks: Σn = N = 12;  ΣR = ½N(N+1) = 78

    H = [12 × 580.533 / (12 × 13)] - 3 × 13 = 5.656, approximately χ²(2), from (1.2)
    Pr[χ²(2) ≥ 5.656] = 0.059, from [9] or [13]
    Pr[H(5, 4, 3) ≥ 5.656] = 0.049, from Table 6.1

If the production data of Table 2.1 are compared by the conventional
analysis of variance, F(2, 9) is 4.2284, corresponding to a probability
of 0.051.
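As a cross-check on this arithmetic (our addition, not in the original), the H of
Table 2.1 can be reproduced with the kruskal_wallis_H sketch given after (1.4), or,
if SciPy is available, with scipy.stats.kruskal, which computes the tie-corrected
statistic of (1.4) together with the χ² approximation to its probability.

    # Bottle-cap data of Table 2.1 (artificial data).
    standard = [340, 345, 330, 342, 338]
    mod1 = [339, 333, 344]
    mod2 = [347, 343, 349, 355]

    print(kruskal_wallis_H([standard, mod1, mod2]))   # 5.656..., as in Table 2.1

    from scipy.stats import kruskal
    H, p = kruskal(standard, mod1, mod2)
    print(H, p)   # about 5.656 and 0.059 (the χ²(2) approximation)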
2.2 With Ties
Snedecor's data on the birth weight of pigs [46, Table 10.12] are
shown in Table 2.2, together with the calculation of H adjusted for
the mean-rank method of handling ties. Here H as adjusted4 is 18.566.
The true probability in this case would be difficult to find, but the
χ²(7) approximation gives a probability of 0.010. Two better
approximations described in Section 6.2 give 0.006 and 0.005.
4 Note that, as will often be true in practice, the adjustment is not worth the trouble even in this case: by changing H from 18.464 to 18.566, it changed the probability by only 0.0003, or 3 per cent. Since there are 47 ties in 13 groups, we see from the last sentence of note 3 that (1.3) cannot be less than 1 - (23³ + 96)/56³, which is 0.9302.
[Table 2.2, BIRTH WEIGHTS OF PIGS (Snedecor [46, Table 10.12]): the eight samples of birth weights with their mean ranks, the rank sums, and the calculation of H adjusted for ties. The table is not legibly reproduced in this copy.]

The conventional analysis of variance [46, Sec. 10.8] gives F(7, 48)
= 2.987, corresponding with a probability of 0.011.

3. JUSTIFICATION OF THE METHOD

3.1. Two Samples
The rationale of the H test can be seen most easily by considering
the case of only two samples, of sizes n and N-n. As is explained in
Section 5.3, the H test for two samples is essentially the same as a
test published by Wilcoxon [61] in 1945 and later by others.
In this case, we consider either one of the two samples, presumably
the smaller for simplicity, and denote its size by n and its sum of ranks
by R. We ask whether the mean rank of this sample is larger (or smaller)
than would be expected if n of the integers 1 through N were selected
at random without replacement.
The sum of the first N integers is ½N(N+1) and the sum of their
squares is (1/6)N(N+1)(2N+1). It follows that the mean and variance of
the first N integers are ½(N+1) and (1/12)(N²-1).
The means of samples of n drawn at random without replacement
from the N integers will be normally distributed to an approximation
close enough for practical purposes, provided that n and N-n are not
too small. The mean of a distribution of sample means is, of course, the
mean of the original distribution; and the variance of a distribution of
sample means is (σ²/n)[(N-n)/(N-1)], where σ² is the population
variance, N is the population size, and n is the sample size. In this case,
σ² = (1/12)(N²-1), so

(3.1)    σR̄² = (N²-1)(N-n) / [12n(N-1)] = (N+1)(N-n) / 12n

where σR̄² represents the variance of the mean of n numbers drawn at
random without replacement from N consecutive integers. Letting R̄
denote the mean rank for a sample of n,

(3.2)    [R̄ - ½(N+1)] / √[(N+1)(N-n)/12n]

may be regarded as approximately a unit normal deviate. The square
of (3.2) is H as given by (1.2) with5 C = 2, and the square of a unit
normal deviate has the χ²(1) distribution.
5 This may be verified by replacing R̄ in (3.2) by R/n and letting the two values of Ri in (1.2) be R and ½N(N+1) - R, with n and N-n the corresponding values of ni.
Notice that this expression is the same, except for sign, whichever of
the two samples is used to compute it. For if the first sample contains
n ranks whose mean is R̄, the other sample must contain N-n ranks
whose mean is

(3.3)    [½N(N+1) - nR̄] / (N-n)

and the value of (3.2) is changed only in sign if we interchange n and
N-n, and replace R̄ by (3.3).
In the two-sample case the normal deviate is perhaps a little simpler
to compute than is H; furthermore, the sign of the normal deviate is
needed if a one-tail test is required. For computations, formula (3.2)
may be rewritten

(3.4)    [2R - n(N+1)] / √[n(N+1)(N-n)/3]

The null hypothesis is that the two samples come from the same
population. The alternative hypothesis is that the samples come from
populations of approximately the same form, but shifted or translated
with respect to each other. If we are concerned with the one-sided
alternative that the population producing the sample to which R and n
relate is shifted upward, then we reject when (3.4) is too large. The
critical level of (3.4) at the α level of significance is approximately Kα,
the unit normal deviate exceeded with probability α, as defined by

(3.5)    (1/√(2π)) ∫{Kα to ∞} e^(-½x²) dx = α.

Values of (3.4) as large as Kα or larger result in rejection of the null
hypothesis. If the alternative is one-sided but for a downward shift,
the null hypothesis is rejected when (3.4) is as small as -Kα or smaller.
If the alternative is two-sided and symmetrical, the null hypothesis is
rejected if (3.4) falls outside the range -Kα/2 to +Kα/2.
3.1.1. Continuity adjustment. It seems reasonable to expect that a
continuity adjustment may be desirable, to allow for the fact that R,
the sum of the ranks in one sample, can take only integral values,
whereas the normal distribution is continuous.6 In testing against a
two-sided alternative to the null hypothesis, the adjustment is made
by increasing or decreasing R by ½, whichever brings it closer to
½n(N+1), before substituting into (3.4). (If R = ½n(N+1), ignore the
continuity adjustment.) With a one-sided alternative, R is increased
(decreased) by ½ if the alternative is that the sample for which R is
computed comes from the population which is to the left (right) of
the other.
6 An extensive comparison of exact probabilities for the two-sample test [28] with those based on the normal approximation indicates that the normal approximation is usually better with the continuity adjustment when the probability is above 0.02, and better without it when the probability is 0.02 or below. This comparison was made for us by Jack Karush, who has also rendered invaluable assistance with numerous other matters in the preparation of this paper.
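A minimal sketch (ours) of the two-sample deviate (3.4), with the optional ½
continuity adjustment just described; the illustrative values anticipate the Pitman
example of Table 3.1 in Section 3.1.3 below.

    import math

    def wilcoxon_normal_deviate(R, n, N, continuity=False):
        """Approximate unit normal deviate (3.4).
        R = sum of ranks of the chosen sample, n = its size, N = both samples combined."""
        if continuity:
            center = n * (N + 1) / 2
            if R > center:
                R -= 0.5          # move R half a unit closer to its null expectation
            elif R < center:
                R += 0.5
        return (2 * R - n * (N + 1)) / math.sqrt(n * (N + 1) * (N - n) / 3)

    print(wilcoxon_normal_deviate(12, 4, 9))                    # about -1.96, no adjustment
    print(wilcoxon_normal_deviate(12, 4, 9, continuity=True))   # about -1.84, with adjustment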
3.1.2. Ties. If some of the N observations are equal, we suggest that
each member of a group of ties be given the mean of the ranks tied for in
that group. This does not affect the mean rank, ½(N+1). It does,
however, reduce the variance below (1/12)(N²-1). Letting T = (t-1)t(t+1)
for each group of ties, where t is the number of tied observations in the
group, and letting ΣT represent the sum of the values of T for all
groups of ties, we have, instead of (3.1),

(3.6)    σR̄² = { [N(N²-1) - ΣT] / 12Nn } · (N-n)/(N-1)

as the variance of the mean rank for samples of n. When there are no
ties, ΣT = 0 and (3.6) reduces to (3.1), so (3.6) may be regarded as the
general expression for σR̄² when the mean-rank method is used for such
ties as occur. Notice that (3.6) is the product of (3.1) and (1.3).
This adjustment comes about as follows:7 The variance (1/12)(N²-1) is
obtained by subtracting the square of the mean from the mean of the
squares of N consecutive integers. If each of the t integers (x+1)
through (x+t) is replaced by x + ½(t+1), the sum is not changed but
the sum of the squares is reduced by

(3.7)    Σ{i=1..t} (x+i)² - t[x + ½(t+1)]² = (t³ - t)/12 = T/12.

So the mean of the squares, and consequently the variance, is reduced
by T/12N.
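For completeness, the expansion behind (3.7), in which the terms involving x
cancel (our addition):

    \begin{aligned}
    \sum_{i=1}^{t}(x+i)^2 - t\bigl[x+\tfrac12(t+1)\bigr]^2
      &= \frac{t(t+1)(2t+1)}{6} - \frac{t(t+1)^2}{4}
       = \frac{t(t+1)\bigl[2(2t+1)-3(t+1)\bigr]}{12}\\[2pt]
      &= \frac{t(t+1)(t-1)}{12} = \frac{t^{3}-t}{12} = \frac{T}{12}.
    \end{aligned}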
The mean-rank method of handling ties somewhat complicates the
continuity adjustment, for the possible values of R are no longer simply
the consecutive integers ½n(n+1) to ½n(2N-n+1), nor need
they be symmetrical about ½n(N+1). Our guess, however, is that it is
better to make the ±½ adjustment of Section 3.1.1 than not to make
any.
7 This is the adjustment alluded to by Friedman [10, footnote 11]. An equivalent adjustment for mean ranks has been suggested by Hemelrijk [16, formula (6)], but in a very complex form. A much simpler version of his formula (6) is obtained by multiplying our (3.6) by n². The same adjustment has been suggested by Horn [18a].
This adjustment, however, goes back at least as far as a 1921 paper by 'Student' [48a], applying it to the Spearman rank correlation coefficient. For further discussion and other references, see Kendall [20, Chap. 3].
An alternative method of handling ties is to assign the ranks at random
within a group of tied observations. The distribution of H under
the null hypothesis is then the same as if there had been no ties, since
the null hypothesis is that the ranks are distributed at random. In order
to use this method, adequate randomization must be provided, with
consequent complications in making and verifying computations. Some
statisticians argue further that the introduction of extraneous random
variability tends to reduce the power of a test. We do not know whether
for the H test random ranking of ties gives more or less power than
mean ranks; indeed, it may be that the answer varies from one
alternative hypothesis to another and from one significance level to another.8
When all members of a group of ties fall within the same sample, every
assignment of their ranks gives rise to the same value of H, so that it
might be thought artificial in this instance to use mean ranks; even
here, however, an eristic argument can be made for mean ranks, on the
ground that H interprets a particular assignment of ranks against the
background of all possible assignments of the same ranks to samples of
the given sizes, and some of the possible assignments put the ties into
different samples.9
3.1.3. Examples. (i) As a first example consider a particularly simple
one discussed by Pitman [41].

TABLE 3.1
PITMAN EXAMPLE [41, p. 122]

       Sample A            Sample B
  Observation  Rank   Observation  Rank
        0        1         16        4
       11        2         19        5
       12        3         22        7
       20        6         24        8
                           29        9

                 n = 4,  N = 9,  R = 12

8 A few computations for simple distributions and small samples, some carried out by Howard L. Jones and some by us, showed mean ranks superior sometimes and random ranks others. For theoretical purposes, random ranking of ties is much easier to handle. For practical purposes, it should be remembered that there will ordinarily be little difference between the two methods; see notes 3 and 4. Computational considerations, therefore, lead us to suggest the mean-rank method.
Ranking of tied observations at random should be distinguished from increasing the power of a test by rejecting or accepting the null hypothesis on the basis of an ancillary random device, in such a way as to attain a nominal significance level which, because of discontinuities, could not otherwise be attained. Discussions of this are given by Eudey [6] and E. S. Pearson [37].
9 This is illustrated in the calculation of the exact probability for the data of Table 3.2.

If we use the two-tail H test without adjustment for continuity, we
compute the approximate unit-normal deviate from (3.4):

    [(2 × 12) - (4 × 10)] / √[(4 × 10 × 5)/3] = -16/√(200/3) = -1.9596    (no adjustment)

corresponding to a two-tail normal probability of 0.0500.
If we make the continuity adjustment, we get:

    [(2 × 12.5) - (4 × 10)] / √[(4 × 10 × 5)/3] = -15/√(200/3) = -1.8371    (continuity adjustment)

corresponding to a two-tail normal probability of 0.0662.
Actually, since the samples are so small, it is easy to compute the
true probability under the null hypothesis of a value of R as extreme as,
or more extreme than, 12. There are 9!/4!5! or 126 ways of selecting
four ranks from among the nine, and all 126 ways are equally probable
under the null hypothesis. Only four of the 126 lead to values of R of
12 or less. By symmetry another set of 4 lead to values as extreme but
in the opposite direction, that is, n(N+1) - R = 28 or more. Hence the
true probability to compare with the foregoing approximations is
8/126, or 0.06349. This value can also be obtained from the tables
given by Mann and Whitney [28]; they show 0.032 for one tail, and
when doubled this agrees, except for rounding, with our calculation.10
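The count of 8 cases among the 126 equally likely selections is easy to reproduce
by enumeration (our sketch, not part of the paper):

    # Exact two-tail probability for the Pitman example of Table 3.1.
    from itertools import combinations
    from math import comb

    N, n, R_observed = 9, 4, 12
    center = n * (N + 1) / 2                    # 20, the null expectation of R
    extreme = sum(
        1 for ranks in combinations(range(1, N + 1), n)
        if abs(sum(ranks) - center) >= abs(R_observed - center)
    )
    print(extreme, comb(N, n), extreme / comb(N, n))   # 8 126 0.0634...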
(ii) A second, and more realistic, example will illustrate the kind of
complication that arises in practice. Table 3.2 shows the results of two
alternative methods of technical chemical analysis. Since there are ties
(two groups of two ties), mean ranks are used.

TABLE 3.2
BROWNLEE EXAMPLE [2, p. 36]

       Method A             Method B
    Value    Rank        Value    Rank
    95.6      9½         93.3      4
    94.9      7          92.1      3
    96.2     12          94.7      5½
    95.1      8          90.1      2
    95.8     11          95.6      9½
    96.3     13          90.0      1
                         94.7      5½

                R = 60½,  n = 6,  N = 13

10 Pitman [41] gives a test which is like H except that it considers possible permutations of the actual observations instead of their ranks. For the example of Table 3.1, Pitman's test yields a two-tail probability of 5/126 or 0.03968.
If we use (3.4) without adjusting either for continuity or for the use
of mean ranks, we obtain as our approximate unit-normal deviate

    (121 - 84) / √[(84 × 7)/3] = 37/14 = 2.6429    (no adjustments)

which corresponds to the two-tail normal probability of 0.0082.
If we use the adjustment for mean ranks, we find that ΣT = 12, so
(3.6) gives σR̄ = 1.1635 and the denominator of (3.4), which is

(3.8)    σ2R = 2nσR̄,

is adjusted to 13.9615. This leads to the approximate unit-normal
deviate

    (121 - 84) / 13.9615 = 2.6501    (adjusted for mean ranks)

corresponding to a two-tail probability of 0.0080, not appreciably
different from the result without the adjustment.
The continuity adjustment is not desirable in this case, since the
probability level is appreciably less than 0.02.6 The comments of
Section 3.1.2 about irregularities in the sequence of possible values of R
also apply. For purely illustrative purposes, however, we note that the
effect of the continuity adjustment would be to reduce R from 60½ to
60, resulting in an approximate normal deviate of

    (120 - 84) / 13.9615 = 2.5785    (adjusted for continuity and mean ranks)

for which the symmetrical two-tail normal probability is 0.0099.
The true probability in this case can be computed by considering all
possible sets of six that could be selected from the 13 ranks 1, 2, 3, 4,
5½, 5½, 7, 8, 9½, 9½, 11, 12, 13. There are 13!/6!7! or 1716 such sets, all
equally probable under the null hypothesis. Six of them give rise to
values of R greater than or equal to 60½, and five give rise to values of
R less than or equal to 23½, which is as far below as 60½ is above
½n(N+1). Hence the true probability is 11/1716, or 0.00641.
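The exact value 11/1716 can likewise be verified by enumerating all sets of six of
the thirteen mean ranks (our sketch):

    # Exact two-tail probability for the Brownlee example of Table 3.2.
    from itertools import combinations
    from math import comb

    ranks = [1, 2, 3, 4, 5.5, 5.5, 7, 8, 9.5, 9.5, 11, 12, 13]
    n, N = 6, 13
    center = n * (N + 1) / 2                     # 42
    observed = 60.5
    count = sum(
        1 for subset in combinations(range(N), n)   # choose positions, since ranks repeat
        if abs(sum(ranks[i] for i in subset) - center) >= observed - center
    )
    print(count, comb(N, n), count / comb(N, n))    # 11 1716 0.00641...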
3.2. Three Samples
When there are three samples, we may consider the average ranks for
any two of them, say the ith and jth. The other sample, the kth,
would not tell us anything we cannot find out from two, for its mean
rank must be

(3.9)    [½N(N+1) - (niR̄i + njR̄j)] / [N - (ni + nj)]

If the n's are not too small, the joint distribution of R̄i and R̄j will be
approximately that bivariate normal distribution whose exponent is

(3.10)    -{1/[2(1-ρ²)]} { [R̄i - ½(N+1)]²/σR̄i²
              - 2ρ[R̄i - ½(N+1)][R̄j - ½(N+1)]/(σR̄i σR̄j)
              + [R̄j - ½(N+1)]²/σR̄j² }

The variances needed in (3.10) are given by (3.1) and the correlation
by

(3.11)    ρ = - √[ ninj / ((N - ni)(N - nj)) ]

which is the correlation between the means of samples of sizes ni and
nj when all ni + nj are drawn at random without replacement from a
population of N elements.11 Thus the exponent (3.10) of the bivariate
normal distribution which approximates the joint distribution of R̄i
and R̄j is, when multiplied by -2,

(3.12)    [12ninj / (N(N+1)(N - ni - nj))] { [(N - nj)/nj][R̄i - ½(N+1)]²
              + 2[R̄i - ½(N+1)][R̄j - ½(N+1)]
              + [(N - ni)/ni][R̄j - ½(N+1)]² }
It is well known that -2 times the exponent of a bivariate normal
distribution has the χ²(2) distribution [32, Sec. 10.10]. Hence (3.12) could
be taken as our test statistic for the three-sample problem, and
approximate probabilities found from the χ² tables.
11 Although (3.11) is easily derived and is undoubtedly familiar to experts on sampling from finite populations, we have not found it in any of the standard treatises. It is a special case of a formula used by Neyman [47, p. 39] in 1923, and a more general case of one used by K. Pearson [38] in 1924. For assistance in trying to locate previous publications of (3.11) we are indebted to Churchill Eisenhart, Tore Dalenius (Stockholm), W. Edwards Deming (Bureau of the Budget), P. M. Grundy (Rothamsted Experimental Station) who told us of [38], Morris H. Hansen (Bureau of the Census), Maurice G. Kendall (London School of Economics), Jerzy Neyman (University of California) who told us of [47], June H. Roberts (Chicago), Frederick F. Stephan who provided a compact derivation of his own, John W. Tukey, and Frank Yates (Rothamsted Experimental Station).
From the relations

(3.13)    niR̄i + njR̄j + nkR̄k = ½N(N+1)

and

(3.14)    ni + nj + nk = N

it can be shown that the value of (3.12) will be the same whichever pair
of samples is used in it, and that this value will be H as given by (1.2)
with C = 3. For computing, (1.2) has the advantages of being simpler
than (3.12) and of treating all (R, n) pairs alike.
With three or more samples, adjustments for continuity are unimportant
except when the ni are so small that special tables of the true
distribution should be used anyway.
Since the adjustment for the mean-rank method of handling ties is
a correction to the sum of squares of the N ranks, it is the same for
three or more groups as for two. The variances given by (3.1) for the
case without ties are replaced by (3.6) when there are ties; hence (1.2)
with mean ranks should be divided by (1.3) to give H as shown by
(1.4).
3.3. More than Three Samples
Nothing essentially new is involved when there are more than three
samples. If there are C samples, the mean ranks for any C-1 of them
are jointly distributed approximately according to a multivariate
normal distribution, provided that the sample sizes are not too small. The
exponent of this (C-1)-variate normal distribution will have the same
value whichever set of C-1 samples is used. This value, when multiplied
by -2, will be H as given by (1.2), and it will be distributed
approximately as χ²(C-1), provided the ni are not too small. The
exponent of the approximating multivariate normal distribution is more
complicated than for three samples, but it involves only the variances
of the R̄i as given by (3.6) and the correlations among pairs (R̄i, R̄j) as
given by (3.11).
By using matrix algebra, the general formula for H is obtained quite
as readily as the formulas for two and three samples by the methods
used in this paper. A mathematically rigorous discussion of H for the
general case of C samples is presented elsewhere by Kruskal [25],
together with a formal proof that its distribution under the null
hypothesis is asymptotically χ².

4. INTERPRETATION OF THE TEST

4.1. General Considerations
H tests the null hypothesis that the samples all come from identical
populations. In practice, it will frequently be interpreted, as is F in the
analysis of variance, as a test that the population means are equal
against the alternative that at least one differs. So to interpret it,
however, is to imply something about the kinds of differences among the
populations which, if present, will probably lead to a significant value
of H, and the kinds which, even if present, will probably not lead to a
significant value of H. To justify this or any similar interpretation, we
need to know something about the power of the test: For what
alternatives to identity of the populations will the test probably lead to
rejection, and for what alternatives will it probably lead to acceptance of
the null hypothesis that the populations are identical? Unfortunately,
for the H test as for many nonparametric tests the power is difficult to
investigate and little is yet known about it.
It must be recognized that relations among ranks need not conform
to the corresponding relations among the data before ranking. It is
possible, for example, that if an observation is drawn at random from
each of two populations, the one from the first population is larger in
most pairs, but the average of those from the second population is
larger. In such a case the first population may be said to have the
higher average rank but the lower average value.
It has been shown by Kruskal [25] that a necessary and sufficient
condition for the H test to be consistent12 is that there be at least one
of the populations for which the limiting probability is not one-half
that a random observation from this population is greater than an
independent random member of the N sample observations. Thus, what
H really tests is a tendency for observations in at least one of the
populations to be larger (or smaller) than all the observations together,
when paired randomly. In many cases, this is practically equivalent to
the mean of at least one population differing from the others.
4.2. Comparison of Means when Variability Differs
Rigorously interpreted, all we can conclude from a significant value
of H is that the populations differ, not necessarily that the means
differ. In particular, if the populations differ in variability we cannot,
strictly speaking, infer from a significant value of H that the means
differ. In the data of Table 3.2, for example, the variances of the two
chemical methods differ significantly (normal theory probability less
than 0.01) and substantially (by a factor of 16), as Brownlee shows
[2]. A strict interpretation of H and its probability of less than 0.01
does not, therefore, justify the conclusion that the means of the two
chemical methods differ.
12 A test is consistent against an alternative if, when applied at the same level of significance for increasing sample size, the probability of rejecting the null hypothesis when the alternative is true approaches unity. Actually, the necessary and sufficient condition stated here must be qualified in a way that is not likely to affect the interpretation of the H test suggested in this paragraph. An exact statement is given in [25].
There is some reason to conjecture, however, that in practice the H
test may be fairly insensitive to differences in variability, and so may
be useful in the important "Behrens-Fisher problem" of comparing
means without assuming equality of variances. Perhaps, for example,
we could conclude that the means of the two chemical methods of
Table 3.2 differ. The following considerations lend plausibility to this
conjecture (and perhaps suggest extending it to other differences in form):
(i) The analysis of consistency referred to in Section 4.1 shows that
if two symmetrical populations differ only by a scale factor about their
common mean the H test is not consistent for small significance levels;
in other words, below a certain level of significance there is no assurance
that the null hypothesis of identical populations will be rejected,
no matter how large the samples.
(ii) Consider the following extreme case: Samples of eight are drawn
from two populations having the same mean but differing so much in
variability that there is virtually no chance that any of the sample from
the more variable population will lie within the range of the other
sample. Furthermore, the median of the more variable population is at the
common mean, so that its observations are as likely to lie above as to
lie below the range of the sample from the less variable population. The
actual distribution of H under these assumptions is easily computed
from the binomial distribution with parameters 8 and ½. Figure 4.1
shows the exact distribution of H under the null hypothesis that the
two populations are completely identical, under the symmetrical
alternative just described, and under a similar but skew alternative in
which the probability is 0.65 that an observation from the more variable
population will lie below the range of the other sample and 0.35
that it will lie above. Possible values of H under each hypothesis are
those at which occur the risers in the corresponding step function of
Figure 4.1, and the probabilities at these possible values of H are given
by the tops of the risers. Figure 4.1 shows, for example, that samples in
which seven observations from the more variable population lie above
and one lies below the eight observations from the less variable population
(so that the two values of R are 44 and 92, leading to an H of
6.353) would be judged by the H test to be significant at the 0.010 level
using true probabilities (or at the 0.012 level using the χ² approximation),
while such samples will occur about seven per cent of the time
under the symmetrical alternative and about seventeen per cent under
the other. In view of the extreme difference of the variances assumed
in the alternatives, it seems rather striking that the cumulative
distributions given in Figure 4.1 do not differ more than they do. At least
in the case of the symmetrical alternative, the distribution for the null
hypothesis seems not too poor a partial smoothing, though on the whole
it lies too low.

[Figure 4.1, a step-function plot of Pr{H ≥ H0} against H0 (probability on a logarithmic scale from 0.001 to 1.000, H0 from about 2 to 12), with curves labeled Null Hypothesis, Symmetrical Alternative, and Skew Alternative, is not legibly reproduced in this copy.]

FIGURE 4.1. Distribution of H for two samples of 8, under the null hypothesis
that the populations are identical and under two alternatives in which the means
are the same but the variances are extremely different. (For further specification
of the alternatives, see Section 4.2.)
The applicability of the H test to the Behrens-Fisher problem,
particularly in its two-tail form, merits further investigation.
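The figures quoted for this extreme case are easily verified; the sketch below (ours,
under the assumptions stated in (ii) above) recomputes the H of 6.353 for the
7-above, 1-below configuration and the frequencies of about seven and seventeen
per cent with which configurations at least that extreme occur under the two
alternatives.

    # Two samples of 8; every observation of the more variable sample lies outside
    # the range of the other sample.  If k of its 8 observations lie above, its
    # ranks are 1..(8-k) and (17-k)..16, so H depends only on k.
    from math import comb

    def H_for_split(k, m=8):
        N = 2 * m
        R_var = sum(range(1, m - k + 1)) + sum(range(N - k + 1, N + 1))
        R_other = N * (N + 1) // 2 - R_var
        return 12 / (N * (N + 1)) * (R_var ** 2 / m + R_other ** 2 / m) - 3 * (N + 1)

    print(H_for_split(7))          # 6.353 (rank sums 92 and 44)

    def prob_at_least_as_extreme(p_above, k=7, m=8):
        """P(at least k of the m observations lie above, or at least k lie below)."""
        binom = lambda j, p: comb(m, j) * p ** j * (1 - p) ** (m - j)
        return sum(binom(j, p_above) + binom(j, 1 - p_above) for j in range(k, m + 1))

    print(prob_at_least_as_extreme(0.5))    # about 0.070, symmetrical alternative
    print(prob_at_least_as_extreme(0.35))   # about 0.172, skew alternative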
5. RELATED TESTS

5.1. PermutationTestsand Ranks


The H test stemsfromtwo statisticalmethods,permutationsof the
data, and rank transformations.
Permutation tests, which to the best of our knowledge were first
proposed by Fisher [8] in connection with a defense of the normality
assumption, accept or reject the null hypothesis according to the
probability of a test statistic among all relevant permutations of the
observed numbers; a precise general formulation of the method is given
by Scheffé [45]. Applications of the permutation method to important
cases may be found in articles by Pitman [41, 42, 43] and by Welch
[57].
The use of ranks, or more generally, of conventional numbers, instead
of the observations themselves has been proposed often, and we
do not know to whom this idea may be credited.13 Its advantages have
been summarized in Section 1.3. Its disadvantage is loss of information
about exact magnitudes.
If in one-criterion variance analysis the permutation method based
on the conventional F statistic is combined with the rank method, the
result is the H test.
5.2. Friedman's χr²
Two kinds of data must be distinguished in discussing tests for the
equality of C population averages. The first kind consists of C
independent random samples, one from each population. The second kind
consists of C samples of equal size which are matched (that is,
cross-classified or stratified, each stratum contributing one observation to
each sample) according to some criterion which may affect the values
of the observations. This distinction is, of course, exactly that between
one-criterion variance analysis with equal sample sizes and two-criterion
variance analysis with one observation per cell.
For comparing the weights of men and women, data of the first kind
might be obtained by measuring a random sample of n1 men and an
independent random sample of n2 women. Such data would ordinarily
be analyzed by one-criterion variance analysis, as described in Section
1.2 above, which in the two-sample case is equivalent to the two-tail t
test with n1 + n2 - 2 degrees of freedom. The H test, or the two-sample
version of it given by (3.4), would also be applicable.
Data of the second kind for the same problem might be obtained by
selecting n ages (not necessarily all different) and for each age selecting
at random one man and one woman. Such data would ordinarily be
analyzed by two-criterion variance analysis, the between-sexes
component being the one tested.
13 Our attention has been directed by Harold Hotelling to the use of ranks by Galton [12, Chaps. 4 and 5] in 1889. Churchill Eisenhart and I. Richard Savage have referred us to the extensive analyses of ranks by eighteenth century French mathematicians in connection with preference-ordering problems, specifically elections. The earliest work they mention is by Borda [1] in 1770, and they mention also Laplace [26] in 1778, Condorcet [3] in 1786, and Todhunter's summary of these and related writings [51, Secs. 690, 806, 989, 990]. Systematic treatment of ranks as a nonparametric statistical device, however, seems to commence with the work of Hotelling and Pabst [19] in 1936.

This test would be equivalent to the two-tail t test of the mean
difference, with n-1 degrees of freedom. Friedman's χr² [10], or the
two-tail sign test which is its two-sample version, would be appropriate.14
The H test thus provides a rank test for data of the first kind, just
as the χr² test does for data of the second kind. H makes it possible to
test by ranks the significance of a grouping according to a single
criterion. The effect of one criterion cannot be tested by χr² unless the
observations in the different groups are matched according to a second
criterion. On the other hand, if the data are matched H is not appropriate
and χr² should be used.
5.3. Wilcoxon's Two-Sample Test
The H test in its general form is new, as far as we know,15 but not its
two-sample form.
5.3.1. Wilcoxon (1945, 1947). Wilcoxon was the first, we believe, to
introduce the two-sample form. His first paper [61] considers the case
of two samples of equal size and gives true probabilities for values of
the smaller sum of ranks in the neighborhood of the 0.01, 0.02, and 0.05
probability levels for sample sizes from 5 to 10. A method of calculating
the true probabilities is given. An example uses the mean-rank method
for ties, interpreting the result in terms of a table for the no-ties
situation.
In a second paper [62] on the case of two equal samples, Wilcoxon
gives a normal approximation to the exact distribution, basing it on
the theory of sampling without replacement from a finite uniform
population, along the lines of Section 3.1 of the present paper. A table of 5
per cent, 2 per cent, and 1 per cent significance levels for the smaller
total is given, covering sample sizes from 5 to 20.
5.3.2. Festinger (1946).16 Wilcoxon's test was discovered independently
by Festinger [7], who considers the case where the two sample
sizes, n and m, are not necessarily equal. He gives a method of calculating
true probabilities, and a table of two-tail 5 per cent and 1 per cent
significance levels for n from 2 to 12 with m from n to 40-n, and for n
from 13 to 15 with m from n to 30-n; and more extensive tables are
available from him. A large proportion of the entries in Festinger's
table, especially at the 5 per cent level, seem to be slightly erroneous.
14 For other discussions of χr², see Kendall and Smith [21], Friedman [11], and Wallis [55].
15 After an abstract [24] of a theoretical version [25] of the present paper was published we learned from T. J. Terpstra that similar work has been done at the Mathematical Center, Amsterdam, and that papers closely related to the H test will be published soon by himself [50] and by P. G. Rijkoort [44]; also that P. van Elteren and A. Benard are doing some research related to χr². References [50] and [44] propose tests based upon statistics similar to, but not identical with, H.
Alan Stuart tells us that H. R. van der Vaart (University of Leiden) has been planning a generalization of the Wilcoxon test to several samples.
P. V. Krishna Iyer has announced [23] "a non-parametric method of testing k samples." This brief announcement is not intelligible to us, but it states that "full details will be published in the Journal of the Indian Society of Agricultural Research."
16 We are indebted to Alan Stuart for calling our attention to Festinger's paper.
5.3.3. Mann and Whitney (1947). Mann and Whitney [28] made an
important advance in showing that Wilcoxon's test is consistent for the
null hypothesis that the two populations are identical against the
alternative that the cumulative distribution of one lies entirely above
that of the other.17 They discuss the test in terms of a statistic U which,
as they point out, is equivalent to Wilcoxon's sum of ranks (our R).
When all observations from both samples are arranged in order, they
count for each observation in one sample, say the first, the number of
observations in the second sample that precede it. The sum of these
counts for the first sample is called U. It is related to R, the sum of the
ranks for the first sample, by18

(5.1)    U = R - n(n+1)/2

They give a table showing the one-tail probability to three decimals for
each possible value of U, for all combinations of sample sizes in which
the larger sample is from three to eight.19
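Relation (5.1) is easily illustrated on the Pitman data of Table 3.1 (our addition):

    # U counted directly equals R - n(n+1)/2 for the first sample.
    sample_A = [0, 11, 12, 20]          # ranks 1, 2, 3, 6; R = 12
    sample_B = [16, 19, 22, 24, 29]

    U = sum(1 for a in sample_A for b in sample_B if b < a)   # B observations preceding each A
    n = len(sample_A)
    print(U, 12 - n * (n + 1) / 2)       # both equal 2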
Hemelrijk [16] has pointed out recently that U, and consequently R
for the two-sample case, may be regarded as a special case of Kendall's
coefficient of rank correlation [20].
5.3.4. Haldane and Smith (1948).20 Haldane and Smith [14] developed
the Wilcoxon test independently in connection with the problem
of deciding whether the probability of a hereditary trait appearing in a
particular member of a sibship depends on his birth rank. They propose
a test based on the sum of the birth-ranks of those members of a sibship
having the trait, i.e., our R, where N is the number in the sibship and
n is the number having the trait. They develop an approximate
distribution from the theory of sampling from an infinite, continuous,
uniform population, and approximate this by the unit normal deviate
given in this paper as (3.4), including the continuity adjustment, which
they seem to be the first to use. They tabulate the means and variances
of R for values of N from 2 to 20, with n from 1 to N. They also give a
table of exact probabilities (not cumulated) for all possible values of
n up to N = 12.
17 Actually the test is consistent under more general conditions; see Section 5.3.6 (iv).
18 Mann and Whitney's version of this formula is a trifle different because they relate the count in the first sample (our terminology) to the sum of ranks in the other sample.
19 We have recomputed the Mann-Whitney table to additional decimals. It agrees entirely with our computations.
20 We are indebted to Alan Stuart for calling our attention to the Haldane and Smith paper.
Blair M. Bennett, University of Washington, is computing power functions for the Wilcoxon test against alternatives appropriate to the birth-order problem. Bennett emphasizes, in a personal communication, that the distribution of R under the null hypothesis corresponds to a partition problem which has been studied in the theory of numbers for centuries, in particular by Euler [6a, Chap. 16], who in 1748 considered closely related partition problems and their generating functions, and by Cauchy [2a, Numbers 225, 226]. In fact, Euler [6a, p. 252*] gives a table which is in part equivalent to that of Mann and Whitney [28]. This number-theoretic approach is discussed by Wilcoxon [61].
Haldane and Smith discuss the problem of ties in connection with
multiple births. They propose to assign to each member of each birth
the rank of that birth. In our terminology, they give each member of
a tied group the lowest of the ranks tied for, and give the next individual
or group the next rank, not the rank after the highest in the group
tied for. For a test in this case, they refer to the theory of sampling
without replacement from a finite but non-uniform population.
With the Haldane-Smith method of handling ties, the difference
between the ranks of two non-tied observations is one more than the
number of distinct values or groups intervening between the two,
regardless of the number of intervening individuals; with the mean-rank
method, the difference is one more than the number of observations
intervening, plus half the number of other observations having the same
rank as either of the two observations being compared. The mean-rank
method seems preferable when the cause of ties is measurement
limitations on an effectively continuous variable, the Haldane-Smith method
when the cause is actual identity. Unfortunately, the Haldane-Smith
method does not lend itself so readily as does the mean-rank method
to simple adjustment of the formulas for the no-ties case, since the
necessary adjustments depend upon the particular ranks tied for, not
merely the number of ties.
5.3.5. White (1952). Tables of critical values of R at two-tail
significance levels of 5, 1, and 0.1 per cent for all sample sizes in which
N ≤ 30 are given by White [59].21 He suggests that ties be handled by
the mean-rank method, not allowing for its effect on the significance
level, or else by assigning the ranks so as to maximize the final
probability, which may then be regarded as an upper limit for the true
probability.
21 Comparison of the 5 and 1 per cent levels given by White with Festinger's earlier and more extensive table [7] shows 104 disagreements among 392 comparable entries (78 disagreements among 196 comparisons at the 5 per cent level, and 26 among 196 at 1 per cent). In each disagreement, Festinger gives a lower critical value of the statistic, although both writers state that they have tabulated the smallest value of the statistic whose probability does not exceed the specified significance level. Three of the disagreements can be checked with the Mann-Whitney table [28]; in all three, White's entry agrees with Mann-Whitney's. In one additional case (sample sizes 4 and 11 at the 1 per cent level) we have made our own calculation and found Festinger's entry to have a true probability (0.0103) exceeding the stated significance level. The disagreements undoubtedly result from the fact that the distributions are discontinuous, so that exact 5 and 1 per cent levels cannot ordinarily be attained.
5.3.6. Power of Wilcoxon's test. The power of nonparametric tests in
general, and of the H test in particular, is difficult to investigate; but
for the special case of Wilcoxon's two-sample test certain details have
been discovered. Some that are interesting from a practical viewpoint
are indicated below, but without the technical qualifications to which
they are subject:
(i) Lehmann [27] has shown that the one-tail test is unbiased, that
is, less likely to reject when the null hypothesis is true than when any
alternative is true; but van der Vaart [52] has shown that the
corresponding two-tail test may be biased.
(ii) Lehmann [27] has shown, on the basis of a theorem of Hoeffding's
[17], that under reasonable alternative hypotheses, as under the
null hypothesis, the distribution of √H is asymptotically normal.
(iii) Mood [33] has shown that the asymptotic efficiency of Wilcoxon's
test compared with Student's test, when both populations are
normal with equal variance, is 3/π, i.e., 0.955. Roughly, this means
that 3/π is the limiting ratio of sample sizes necessary for the two tests
to attain a fixed power. This result was given in lecture notes by
E. J. G. Pitman at Columbia University in 1948; it was also given by
van der Vaart [52]. To the best of our knowledge, Mood's proof is the
first complete one.
(iv) Lehmann [27] and van Dantzig [15, 51a], generalizing the findings
of Mann and Whitney [28], have shown that the test is consistent12
if the probability differs from one-half that an observation from the first
population will exceed one drawn independently from the second
population (for one-tail tests the condition is that the probability differ
from one-half in a stated direction). In addition van Dantzig [51a] gives
inequalities for the power. The C-sample condition for consistency
given by Kruskal (see Section 4.1) is a direct extension of the two-sample
condition given by Lehmann and van Dantzig.

5.4. Whitney's Three-Sample Test
Whitney [60] has proposed two extensions of the Wilcoxon test to the
three-sample case. Neither of his extensions, which are expressed in
terms of inversions of order rather than in terms of ranks, is equivalent
to our H test for C = 3, since Whitney seeks tests with power against
more specific alternatives than those appropriate to the H test.
Whitney arrays all three samples in a single ranking and then defines
U as the number of times in which an observation from the second
sample precedes an observation from the first and V as the number of
times in which an observation from the third sample precedes one from
the first.22
22 U and V are not determined by R1, R2, and R3, nor vice versa, though U + V = R1 - ½n1(n1 + 1).

Whitney's first test, which rejects the null hypothesis of equality of
the populations if both U and V are too small (alternatively, too large),
is suggested when the alternative is that the cumulative distribution
of the first population lies above (alternatively, below) those of both
the second and third populations. His second test, which rejects if U is
too large and V is too small, is suggested when the alternative is that
the cumulative distribution of the first population lies below that of the
second and above that of the third.

5.5. Terpstra's C-Sample Test
Terpstra [50a] has proposed and investigated a test appropriate for
alternatives similar to those of Whitney's second test, but extending
to any number of populations.

5.6. Mosteller's C-Sample Test
Mosteller [34] has proposed a multi-decision procedure for accepting
either the null hypothesis to which the H test is appropriate or one of
the C alternatives that the ith population is translated to the right (or
left) of the others. His criterion is the number of observations in the
sample containing the largest observation that exceed all observations
in other samples. This procedure has been discussed further by
Mosteller and Tukey [35].

5.7. Fisher and Yates' Normalized Ranks
Fisher and Yates have proposed [9, Table XX] that each observation
be replaced not by its simple rank but by a normalized rank, defined
as the average value of the observation having the corresponding
rank in samples of N from a normal population with mean of zero and
standard deviation of one. They propose that ordinary one-criterion
variance analysis then be applied to these normalized ranks. Ehrenberg
[5] has suggested as a modification using the values of a random sample
of N from the standardized normal population.
Two advantages might conceivably be gained by replacing the
observations by normalized ranks or by some other set of numbers instead
of by simple ranks. First, it might be that the distribution theory would
be simplified. Quite a large class of such transformations, for example,
lead to tests whose distribution is asymptotically χ²(C-1); but for
some transformations the χ² approximation may be satisfactory at
smaller sample sizes than for others, thus diminishing the area of need
for special tables and approximations such as those presented in Sec. 6.
Second, the power of the test might be greater against important
classes of alternatives.
Whether either of these possible advantages over ranks is actually
realized by normalized ranks, or by any other specific transformation,
has not to our knowledge been investigated. Offhand, it seems
intuitively plausible that the χ² distribution might be approached more
rapidly with normalized ranks, or some other set of numbers which
resemble the normal form more than do ranks. On the other hand, it
seems likely that if there is such an advantage it is not very large,
partly because the distribution of means from a uniform population
approaches normality rapidly as sample size increases, and partly
because (as Section 6 indicates) the distribution of H approaches the χ²
distribution quite rapidly as sample sizes increase. As to power, we
have no suggestions, except the obvious one that the answer is likely
to differ for different alternatives of practical interest.23
5.8. Other Related Tests
A number of tests have been proposed which have more or less the same purpose as H and are likewise non-parametric. We mention here only two of the principal classes of these.
5.8.1. Runs. Wald and Wolfowitz [53] have proposed for the two-sample case that all observations in both samples be arranged in order of magnitude, that the observations then be replaced by designations A or B, according to which sample they represent, and that the number of runs (i.e., groups of consecutive A's or consecutive B's) be used to test the null hypothesis that both samples are from the same population. The distribution theory of this test has been discussed by Stevens [48], Wald and Wolfowitz [53], Mood [31], Krishna Iyer [22], and others; and Swed and Eisenhart [49] have provided tables covering all cases in which neither sample exceeds 20. For larger samples, normal approximations are given by all the writers mentioned. Wald and Wolfowitz discussed the consistency of the test, and later Wolfowitz [63] discussed its asymptotic power. An extension to cases of three or more samples has been given by Wallis [56], based on the distribution theory of Mood and Krishna Iyer.
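The run count itself is easily illustrated; the sketch below is ours, not part of the original proposal.

```python
# Illustrative only (our sketch): the Wald-Wolfowitz run count. Pool the two
# samples, sort, label each value by its sample, and count runs of labels.
from itertools import groupby

def run_count(sample_a, sample_b):
    labeled = sorted([(x, 'A') for x in sample_a] + [(x, 'B') for x in sample_b])
    labels = [label for _, label in labeled]
    return sum(1 for _ in groupby(labels))

# A small number of runs is evidence against the null hypothesis.
print(run_count([1.1, 2.4, 3.3], [2.0, 2.9, 4.5]))  # 6
```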
5.8.2. Order statistics. Westenberg [58] has suggested a test for the two-sample case utilizing the number of observations in each sample above the median of the combined samples. Mood and Brown [32, pp. 394-5, 398-9] have discussed the test further and generalized it to several samples. Massey [29] has generalized the test further by using other order statistics of the combined samples as a basis for a more refined classification.
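The counts on which this median test is based may be illustrated as follows (our sketch, with hypothetical names):

```python
# Illustrative only (our sketch): the counts used by the median test - the
# number of observations in each sample above the median of the combined samples.
from statistics import median

def counts_above_combined_median(samples):
    grand_median = median([x for s in samples for x in s])
    return [sum(1 for x in s if x > grand_median) for s in samples]

print(counts_above_combined_median([[1, 4, 6], [2, 3, 8, 9], [5, 7]]))  # [1, 2, 1]
```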

6. SIGNIFICANCE LEVELS, TRUE AND APPROXIMATE

6.1. True Significance Levels


6.1.1. Two samples. Festinger [7], Haldane and Smith [14], Mann and Whitney [28], White [59], and Wilcoxon [61, 62] have published tables for the two-sample case. These are described in Section 5.3. They are exact only if ties are absent or are handled by the random-rank method, but our guess is that they will also serve well enough if the mean-rank method is used and there are not too many ties.
6.1.2. Three samples. (i) Five or fewer observations in each sample. For each of these cases, Table 6.1 shows three pairs of values of H and their probabilities of being equalled or exceeded if the null hypothesis is true.24 Each pair brackets as closely as possible the 10, 5, or 1 per cent level, except that in some cases one or both members of a pair are missing because H can take only a small number of values. The final sentence of Section 6.1.1, about ties, applies to Table 6.1 also.

24 These computations and others used for this paper were made by John P. Gilbert with the assistance of Billy L. Foster, Thomas O. King, and Roland Silver. Space prevents reproducing all or even most of the results, but we hope to file them in such a way that interested workers may have access to them. We have the true joint distributions of R1, R2, and R3 under the null hypothesis for n1, n2, and n3, each from 1 through 5, and the true distribution of H under the same conditions, except that for some cases we have probabilities only for those values of H exceeding the upper twenty per cent level.
(ii) More than five observations in each sample. No exact tables are available for these cases. Our recommendation is that the χ² approximation be used. Only at very small significance levels (less than 1 per cent, say) and sample sizes only slightly above five is there likely to be appreciable advantage to the more complicated Γ and B approximations described in Section 6.2. This recommendation is based only on the comparisons shown in Table 6.1, no true probabilities having been computed in this category.
(iii) Intermediate cases. No exact tables are available here. The Γ and B approximations probably should be resorted to if more than roughly approximate probabilities are required. Except at very low significance levels or with very small samples, the Γ approximation, which is simpler, should serve. This recommendation is not very firm, however, since we have computed no true probabilities in this category.
6.1.3. More than three samples. Since we have computed no true probabilities for more than three samples, our recommendations here must be entirely tentative. It seems safe to use the χ² approximation when all samples are as large as five. If any sample is much smaller than five, the Γ or B approximation should probably be used, especially at low significance levels, though the importance of this presumably is less the larger the proportion of samples of more than five.

6.2. Approximate Significance Levels
6.2.1. χ² approximation. This is the approximation discussed in Sections 1, 2, and 3. The most extensive single table is that of Hald and Sinkbaek [13], though the table in almost any modern statistics text will ordinarily suffice.
6.2.2. Γ approximation. This utilizes the incomplete-Γ distribution by matching the variance as well as the true mean of H. The mean, or expected value, of H under the null hypothesis is [25]

(6.1)   E = C - 1

and the variance is

(6.2)   V = 2(C - 1) - \frac{2[3C^2 - 6C + N(2C^2 - 6C + 1)]}{5N(N + 1)} - \frac{6}{5} \sum_{i=1}^{C} \frac{1}{n_i}.

One way of applying the approximation is to enter an ordinary χ² table taking χ² = 2HE/V and degrees of freedom f = 2E²/V. Note that the degrees of freedom will not ordinarily be an integer, so interpolation will be required in both χ² and the degrees of freedom if the four bounding tabular entries do not define the probability accurately enough.25

25 The Γ approximations shown in Table 6.1 were based on K. Pearson's table of the incomplete-Γ function [40]. In Pearson's notation, the required probability is 1 - I(u, p), where u = H/√V and p = E²/V - 1. We used linear double interpolation, which on a few tests seemed to be satisfactory in the region of interest.
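For readers who prefer to evaluate this recipe by machine rather than by table look-up, the following sketch is ours (it assumes the scipy library for the continuous χ² distribution; the function name is hypothetical):

```python
# Illustrative only (our sketch; the function name is hypothetical, and
# scipy's continuous chi-square distribution stands in for a printed table).
from scipy.stats import chi2

def gamma_approx_pvalue(H, sample_sizes):
    C = len(sample_sizes)
    N = sum(sample_sizes)
    E = C - 1                                                   # (6.1)
    V = (2 * (C - 1)
         - 2 * (3 * C**2 - 6 * C + N * (2 * C**2 - 6 * C + 1)) / (5 * N * (N + 1))
         - 1.2 * sum(1.0 / n for n in sample_sizes))            # (6.2)
    # Enter the chi-square table with chi2 = 2HE/V and f = 2E^2/V degrees of freedom.
    return chi2.sf(2 * H * E / V, df=2 * E ** 2 / V)

# The illustration used later in Section 6.2.3: C = 3, sizes 5, 4, 3, H = 5.6308.
print(gamma_approx_pvalue(5.6308, [5, 4, 3]))  # roughly 0.045, near the tabled 0.044
```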
6.2.3. B approximation. This utilizes the incomplete-B distribution by matching the true maximum as well as the mean and variance of H. The maximum value of H is [25]

(6.3)   M = \frac{N^3 - \sum_{i=1}^{C} n_i^3}{N(N + 1)}.

To apply the approximation, K. Pearson's table of the incomplete-B distribution [39] may be employed, but it is usually more convenient to use the F distribution, a form of the incomplete-B distribution, since tables of F are widely accessible to statisticians.26 We set
(6.4)   F = \frac{H(M - E)}{E(M - H)}

with degrees of freedom (not usually integers)

(6.5)   f_1 = E \cdot \frac{E(M - E) - V}{\tfrac{1}{2} MV}

(6.6)   f_2 = (M - E) \cdot \frac{E(M - E) - V}{\tfrac{1}{2} MV} = f_1 \frac{M - E}{E}.

The probability may then be obtained by three-way interpolation in the F tables or by using Paulson's approximation [36], according to which the required probability, P, is the probability that a unit normal deviate will exceed

(6.7)   K_P = \frac{\left(1 - \frac{2}{9f_2}\right) F' + \frac{2}{9f_1} - 1}{\sqrt{\frac{2F'^2}{9f_2} + \frac{2}{9f_1}}}

where F' = \sqrt[3]{F}.
As an illustration, suppose C = 3, n1 = 5, n2 = 4, n3 = 3, and H = 5.6308. From (6.1), (6.2), and (6.3) we find E = 2, V = 3.0062, and M = 9.6923. Substituting these into (6.4), (6.5), and (6.6) gives F = 5.332, f1 = 1.699, and f2 = 6.536. Then (6.7) gives K_P = 1.690, for which the normal distribution shows a probability of 0.046. This may be compared with the true probability of 0.050, the χ² approximation of 0.060, and the Γ approximation of 0.044, shown in Table 6.1.27
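The same illustration can be reproduced by a short computation; the sketch below is ours, with hypothetical names, and uses Paulson's approximation in place of the F tables.

```python
# Illustrative only (our sketch; names are hypothetical, and Paulson's
# approximation stands in for three-way interpolation in the F tables).
from statistics import NormalDist

def b_approx_pvalue(H, sample_sizes, E, V):
    N = sum(sample_sizes)
    M = (N ** 3 - sum(n ** 3 for n in sample_sizes)) / (N * (N + 1))   # (6.3)
    F = H * (M - E) / (E * (M - H))                                    # (6.4)
    f1 = E * (E * (M - E) - V) / (0.5 * M * V)                         # (6.5)
    f2 = f1 * (M - E) / E                                              # (6.6)
    Fp = F ** (1.0 / 3.0)                                              # F' = cube root of F
    K = (((1 - 2 / (9 * f2)) * Fp + 2 / (9 * f1) - 1)
         / (2 * Fp ** 2 / (9 * f2) + 2 / (9 * f1)) ** 0.5)             # (6.7)
    return 1 - NormalDist().cdf(K)

# The illustration in the text: H = 5.6308, sizes 5, 4, 3, E = 2, V = 3.0062.
print(b_approx_pvalue(5.6308, [5, 4, 3], 2, 3.0062))  # about 0.046
```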
26 The most detailed table of the F distribution is that of Merrington and Thompson [30].
27 The B approximations shown in Table 6.1 are based on K. Pearson's table of the incomplete-B function [39]. In Pearson's notation, the required probability is 1 - I_x(p, q), where x = H/M, p = ½f_1, and q = ½f_2. To simplify the three-way interpolation, the following device (based on the relation of the incomplete-B to the binomial distribution, and of the binomial to the normal distribution) was used: First, let p_0, q_0, and x_0 be the tabulated arguments closest to p, q, and x, and as a first approximation to the required probability take 1 - I_{x_0}(p_0, q_0). Second, add to this first approximation the probability that a unit normal deviate will not exceed (in algebraic, not absolute, value)

K = \frac{p - \tfrac{1}{2} - x(p + q - 1)}{\sqrt{x(1 - x)(p + q - 1)}}.

Third, subtract from this the probability that a unit normal deviate will not exceed K_0, where K_0 is defined like K but in terms of p_0, q_0, and x_0. This method of interpolation was compared at three points with the trivariate Everett formula to third differences as presented by Pearson [39, Introduction]. The results were not excellent, but seemed to suffice for the present purposes.
Strictly speaking, all our statements and numerical results concerning the B approximation (including entries in Table 6.1) actually apply to that approximation based on Pearson's tables in combination with this normal interpolation device.
Values calculated in this way will not in general agree precisely with those calculated by interpolating in the F tables or by using Paulson's approximation, though the example in the text agrees to three decimals.
[Figures 6.1, 6.2, and 6.3 appeared here; the graphs cannot be recovered from this scan. They compare the true significance levels of H with the χ², Γ, and B approximations (see Section 6.3). A legible fragment of the caption reads: "... for H in the neighborhood of the 1, 5, and 10 per cent points, for ... samples of sizes 2 to 5. Crosses indicate that the smallest sample exceeds 2, ... that it is 2. Cases involving samples of 1 and a few involving samples of 2 have been omitted."]
TABLE 6.1
TRUE DISTRIBUTION OF H FOR THREE SAMPLES, EACH OF SIZE FIVE OR LESS,
IN THE NEIGHBORHOOD OF THE 10, 5, AND 1 PER CENT POINTS;
AND COMPARISON WITH THREE APPROXIMATIONS
The probabilities shown are the probabilities under the null hypothesis
that H will equal or exceed the values in the column headed "H"

Sample Sizes            True           Approximate minus true
 n1  n2  n3     H       Probability    χ²       Γ (Linear Interp.)    B (Normal Interp.)
2 1 1 2.7000 .500 -.241 -.309 -.500
2 2 1 3.6000 .267 - .101 - .167 - .267
2 2 2 4.5714 .067 +.035 -.007 -.067
3.7143 .200 - .044 - .083 +.010
3 1 1 3.2000 .300 - .098 - .180 - .300
3 2 1 4.2857 .100 +.017 - .040 - .100
3.8571 .133 +.012 - .045 - .042
3 2 2 5.3572 .029 +.040 +.083 -.029
4.7143 .048 +.047 +.012 +.014
4.5000 .067 +.039 +.003 +.020
4.4643 .105 +.002 - .033 - .014
3 3 1 5.1429 .043 +.034 - .010 - .043
4.5714 .100 +.002 - .046 - .062
4.0000 .129 +.007 - .041 - .024
3 3 2 6.2500 .011 +.033 +.012 -.011
5.3611 .032 +.036 +.010 +.001
5.1389 .061 +.016 - .012 - .019
4.5556 .100 +.002 - .027 - .020
4.2500 .121 - .002 - .031 - .014
3 3 3 7.2000 .004 +.024 +.010 -.004
6.4889 .011 +.028 +.011 - .001
5.6889 .029 +.030 +.009 +.003
5.6000 .050 +.011 - .010 - .015
5.0667 .086 - .006 - .029 - .026
4.6222 .100 -.001 -.025 -.010
4 1 1 3.5714 .200 - .032 - .114 - .200
4 2 1 4.8214 .057 +.033 -.017 -.057
4.5000 .076 +.029 - .022 - .047
4.0179 .114 +.020 - .032 - .056
4 2 2 6.0000 .014 +.036 +.010 -.014
5.3333 .033 +.036 +.007 - .017
5.1250 .052 +.025 - .006 - .021
4.3750 .100 +.012 - .020 - .002
4.1667 .105 +.020 - .012 +.014
4 3 1 5.8333 .021 +.033 -.001 -.021
5.2083 .050 +.024 - .016 - .037
5.0000 .057 +.025 - .016 - .034
4.0556 .093 +.039 - .005 +.014
3.8889 .129 +.014 - .028 - .003
4 3 2 6.4444 .009 +.031 +.012 -.002
6.4222 .010 +.030 +.011 - .004
5.4444 .047 +.019 - .005 - .010
5.4000 .052 +.016 - .008 - .013
4.5111 .098 +.006 - .020 - .004
4.4667 .101 +.006 - .020 - .003
4 3 3 6.7455 .010 +.024 +.010 -.001
6.7091 .013 +.022 +.007 - .003
5.7909 .046 +.010 - .009 - .013
5.7273 .050 +.007 - .012 - .015
4.7091 .094 +.001 - .021 - .006
4.7000 .101 - .006 - .027 - .012
4 4 1 6.6667 .010 +.026 +.002 -.010
6.1667 .022 +.024 - .005 - .020
4.9667 .048 +.036 - .003 - .009
4.8667 .054 +.034 - .005 - .009
4.1667 .082 +.042 +.002 +.016
4.0667 .102 +.029 - .011 +.007
4 4 2 7.0364 .006 +.024 +.010 -.002
6.8727 .011 +.021 +.006 - .005
5.4545 .046 +.020 - .002 - .003
5.2364 .052 +.021 - .002 +.001
4.5545 .098 +.005 - .019 - .003
4.4455 .103 +.006 - .018 +.000
4 4 3 7.1439 .010 +.018 +.007 -.002
7.1364 .011 +.018 +.006 - .003
5.5985 .049 +.012 - .005 - .004
5.5758 .051 +.011 - .006 - .005
4.5455 .099 +.004 - .015 +.003
4.4773 .102 +.004 -.014 +.004
4 4 4 7.6538 .008 +.014 +.005 .000
7.5385 .011 +.012 +.003 - .002
5.6923 .049 +.009 - .006 - .002
5.6538 .054 +.005 - .010 - .007
4.6539 .097 +.001 - .015 +.004
4.5001 .104 +.001 - .015 +.007
5 1 1 3.8571 .143 +.003 - .109 -.143
5 2 1 5.2500 .036 +.037 -.006 -.036
5.0000 .048 +.034 +.011 - .037
4.4500 .071 +.037 - .012 - .020
4.2000 .095 +.027 - .022 - .018
4.0500 .119 +.013 - .036 - .024
5 2 2 6.5333 .008 +.030 +.010 -.008
6.1333 .013 +.033 +.010 - .010
5.1600 .034 +.041 +.013 +.008
5.0400 .056 +.025 - .004 - .006
4.3733 .090 +.022 - .007 +.010
4.2933 .122 - .005 - .034 - .014

5 3 1 6.4000 .012 +.029 +.002 - .012


4.9600 .048 +.036 - .004 - .010
4.8711 .052 +.036 - .004 - .009
4.0178 .095 +.039 - .002 +.018
3.8400 .123 +.024 - .016 +.010
5 3 2 6.9091 .009 +.023 +.007 - .006
6.8218 .010 +.023 +.007 - .006
5.2509 .049 +.023 - .000 +.001
5.1055 .052 +.026 +.003 +.006
4.6509 .091 +.006 - .018 - .005
4.4945 .101 +.005 - .020 - .003

5 3 3 6.9818 .010 +.020 +.008 -.002


6.8606 .011 +.022 +.008 - .001
5.4424 .048 +.018 - .000 +.002
5.3455 .050 +.019 +.000 +.004
4.5333 .097 +.007 - .013 +.004
4.4121 .109 +.001 - .018 +.000
5 4 1 6.9545 .008 +.023 +.002 -.008
6.8400 .011 +.022 - .000 - .011
4.9855 .044 +.038 +.002 - .001
4.8600 .056 +.032 - .005 - .005
3.9873 .098 +.038 +.001 +.018
3.9600 .102 +.036 - .000 +.018
5 4 2 7.2045 .009 +.018 +.005 - .005
7.1182 .010 +.018 +.005 - .005
5.2727 .049 +.023 +.002 +.005
5.2682 .050 +.021 +.000 +.004
4.5409 .098 +.005 - .017 - .002
4.5182 .101 +.004 - .018 - .002
5 4 3 7.4449 .010 +.014 +.004 -.004
7.3949 .011 +.014 +.004 - .004
5.6564 .049 +.010 - .005 - .004
5.6308 .050 +.010 - .006 - .004
4.5487 .099 +.004 - .013 +.003
4.5231 .103 +.001 - .016 - .000

5 4 4 7.7604 .009 +.011 +.003 -.002


7.7440 .011 +.010 +.002 - .003
5.6571 .049 +.010 - .004 +.000
5.6176 .050 +.010 - .004 +.001
4.6187 .100 - .001 - .016 +.003
4.5527 .102 +.001 - .014 +.005
5 5 1 7.3091 .009 +.016 -.002 -.009
6.8364 .011 +.022 +.001 - .009
5.1273 .046 +.031 - .003 - .005
4.9091 .053 +.032 - .002 - .002
4.1091 .086 +.042 +.007 +.020
4.0364 .105 +.028 - .007 +.008
5 5 2 7.3385 .010 +.016 +.004 -.004
7.2692 .010 +.016 +.004 - .004
5.3385 .047 +.022 +.003 +.006
5.2462 .051 +.022 +.002 +.007
4.6231 .097 +.002 - .018 - .005
4.5077 .100 +.005 - .016 - .001

5 5 3 7.5780 .010 +.013 +.004 -.001


7.5429 .010 +.013 +.004 - .002
5.7055 .046 +.012 - .003 +.000
5.6264 .051 +.009 - .005 - .002
4.5451 .100 +.003 - .012 +.007
4.5363 .102 +.002 - .014 +.005
5 5 4 7.8229 .010 +.010 +.003 -.002
7.7914 .010 +.010 +.003 - .002
5.6657 .049 +.010 - .003 +.001
5.6429 .050 +.009 - .003 +.001
4.5229 .099 +.005 - .009 +.010
4.5200 .101 +.004 - .010 +.008
5 5 5 8.0000 .009 +.009 +.003 -.002
7.9800 .010 +.008 +.002 - .003
5.7800 .049 +.007 - .005 - .001
5.6600 .051 +.008 - .004 +.001
4.5600 .100 +.003 -.010 +.008
4.5000 .102 +.004 - .009 +.009

6.3. Comparisons of True and Approximate Significance Levels

Figures 6.1 and 6.2 show the true probabilities and the χ², Γ, and B approximations when the sample sizes are 3, 4, and 5, and when they are all 5.28
For each entry in Table 6.1 the probabilities given by the three approximations have been computed and their errors recorded in the last three columns of the table. In Figure 6.3 these errors are graphed against the true probabilities. To avoid confusing this figure, sample sizes have not been indicated; cases involving samples of one have been omitted, and cases involving samples of two have been distinguished from those in which the smallest sample exceeds two.

28 All four figures in this paper are the work of H. Irving Forman.
7. REFERENCES

[1] Borda, Jean Charles, "Mémoire sur les élections au scrutin," Mémoires de l'Académie royale des Sciences de Paris pour l'Année 1781, pp. 657-65.
[2] Brownlee, K. A., Industrial Experimentation, Third American Edition, Brooklyn, Chemical Publishing Company, 1949.
[2a] Cauchy, Augustin, "Oeuvres complètes," Series 1, Volume 8, Paris, Gauthier-Villars et Fils, 1893.
[3] Condorcet, le Marquis de (Marie Jean Antoine Nicolas Caritat), Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix, Paris, 1785, pp. lvii, clxxvii ff.
[4] DuBois, Philip, "Formulas and tables for rank correlation," Psychological Record, 3 (1939), 46-56.
[5] Ehrenberg, A. S. C., "Note on normal transformations of ranks," British Journal of Psychology, Statistical Section, 4 (1951), 133-4.
[6] Eudey, M. W., On the treatment of discontinuous random variables, Technical Report Number 13, Statistical Laboratory, University of California (Berkeley), 1949.
[6a] Euler, Leonhard, "Introduction à l'analyse infinitésimale" (translated from the Latin edition of 1748 into French by J. B. Labey), Vol. 1, Paris, Chez Barrois, 1796.
[7] Festinger, Leon, "The significance of differences between means without reference to the frequency distribution function," Psychometrika, 11 (1946), 97-105.
[8] Fisher, R. A., The Design of Experiments, Edinburgh, Oliver and Boyd Ltd., 1935 and later.
[9] Fisher, Ronald A., and Yates, Frank, Statistical Tables for Biological, Agricultural and Medical Research, Edinburgh, Oliver and Boyd Ltd., 1938 and later.
[10] Friedman, Milton, "The use of ranks to avoid the assumption of normality implicit in the analysis of variance," Journal of the American Statistical Association, 32 (1937), 675-701.
[11] Friedman, Milton, "A comparison of alternative tests of significance for the problem of m rankings," Annals of Mathematical Statistics, 11 (1940), 86-92.
USE OF RANKS IN ONE-CRITERION VARIANCE ANALYSIS 619
[12] Galton, Sir Francis, Natural Inheritance, London, Macmillan and Co., 1889.
[13] Hald, A., and Sinkbaek, S. A., "A table of percentage points of the χ²-distribution," Skandinavisk Aktuarietidskrift, 33 (1950), 168-75.
[14] Haldane, J. B. S., and Smith, Cedric A. B., "A simple exact test for birth-order effect," Annals of Eugenics, 14 (1947-49), 117-24.
[15] Hemelrijk, J., "A family of parameter free tests for symmetry with respect to a given point. II," Proceedings, Koninklijke Nederlandse Akademie van Wetenschappen, 53 (1950), 1186-98.
[16] Hemelrijk, J., "Note on Wilcoxon's two-sample test when ties are present," Annals of Mathematical Statistics, 23 (1952), 133-5.
[17] Hoeffding, Wassily, "A class of statistics with asymptotically normal distributions," Annals of Mathematical Statistics, 19 (1948), 293-325.
[18] Hoeffding, Wassily, "Some powerful rank order tests" (abstract), Annals of Mathematical Statistics, 23 (1952), 303.
[18a] Horn, Daniel, "A correction for the effect of tied ranks on the value of the rank difference correlation coefficient," Journal of Educational Psychology, 33 (1942), 686-90.
[19] Hotelling, Harold, and Pabst, Margaret Richards, "Rank correlation and tests of significance involving no assumption of normality," Annals of Mathematical Statistics, 7 (1936), 29-43.
[20] Kendall, Maurice G., Rank Correlation Methods, London, Charles Griffin and Company, 1948.
[21] Kendall, Maurice G., and Smith, B. Babington, "The problem of m rankings," Annals of Mathematical Statistics, 10 (1939), 275-87.
[22] Krishna Iyer, P. V., "The theory of probability distributions of points on a line," Journal of the Indian Society of Agricultural Statistics, 1 (1948), 173-95.
[23] Krishna Iyer, P. V., "A non-parametric method of testing k samples," Nature, 167 (1951), 33.
[24] Kruskal, William H., "A nonparametric analogue based upon ranks of one-way analysis of variance" (abstract), Annals of Mathematical Statistics, 23 (1952), 140.
[25] Kruskal, William H., "A nonparametric test for the several sample problem," Annals of Mathematical Statistics, 23 (1952), 525-40.
[26] Laplace, Pierre Simon, A Philosophical Essay on Probabilities, New York, Dover Publications, Inc., 1951 (first edition 1814).
[27] Lehmann, E. L., "Consistency and unbiasedness of certain non-parametric tests," Annals of Mathematical Statistics, 22 (1951), 165-79.
[28] Mann, H. B., and Whitney, D. R., "On a test of whether one of two random variables is stochastically larger than the other," Annals of Mathematical Statistics, 18 (1947), 50-60.
[29] Massey, Frank J., Jr., "A note on a two-sample test," Annals of Mathematical Statistics, 22 (1951), 304-6.
[30] Merrington, Maxine, and Thompson, Catherine M., "Tables of percentage points of the inverted Beta (F) distribution," Biometrika, 33 (1943), 73-88.
[31] Mood, A. M., "The distribution theory of runs," Annals of Mathematical Statistics, 11 (1940), 367-92.
[32] Mood, Alexander McFarlane, Introduction to the Theory of Statistics, New York, McGraw-Hill Book Co., 1950.

[33] Mood, A. M., Unpublished manuscript, submitted to Annals of Mathematical Statistics.
[34] Mosteller, Frederick, "A k-sample slippage test for an extreme population," Annals of Mathematical Statistics, 19 (1948), 58-65.
[35] Mosteller, Frederick, and Tukey, John W., "Significance levels for a k-sample slippage test," Annals of Mathematical Statistics, 21 (1950), 120-3.
[36] Paulson, Edward, "An approximate normalization of the analysis of variance distribution," Annals of Mathematical Statistics, 13 (1942), 233-5.
[37] Pearson, E. S., "On questions raised by the combination of tests based on discontinuous distributions," Biometrika, 37 (1950), 383-98.
[38] Pearson, Karl, "On a certain double hypergeometrical series and its representation by continuous frequency surfaces," Biometrika, 16 (1924), 172-88.
[39] Pearson, Karl, editor, Tables of the Incomplete Beta Function, London, Biometrika Office, 1934.
[40] Pearson, Karl, editor, Tables of the Incomplete Γ-Function, London, Biometrika Office, 1951 (reissue).
[41] Pitman, E. J. G., "Significance tests which may be applied to samples from any populations," Supplement to the Journal of the Royal Statistical Society, 4 (1937), 119-30.
[42] Pitman, E. J. G., "Significance tests which may be applied to samples from any populations. II. The correlation coefficient test," Supplement to the Journal of the Royal Statistical Society, 4 (1937), 225-32.
[43] Pitman, E. J. G., "Significance tests which may be applied to samples from any populations. III. The analysis of variance test," Biometrika, 29 (1937), 322-35.
[44] Rijkoort, P. G., "A generalization of Wilcoxon's test," Proceedings, Koninklijke Nederlandse Akademie van Wetenschappen, 53 (1952).
[45] Scheffé, Henry, "Statistical inference in the non-parametric case," Annals of Mathematical Statistics, 14 (1943), 305-32.
[46] Snedecor, George W., Statistical Methods, Ames, Iowa State College Press, 1937 and later.
[47] Splawa-Neyman, Jerzy, "Próba uzasadnienia zastosowań rachunku prawdopodobieństwa do doświadczeń polowych. (Sur les applications de la théorie des probabilités aux expériences agricoles. Essay des principes)," Roczniki Nauk Rolniczych, 10 (1923), 1-51. (Polish with German summary.)
[48] Stevens, W. L., "Distribution of groups in a sequence of alternatives," Annals of Eugenics, 9 (1939), 10-17.
[48a] 'Student,' "An experimental determination of the probable error of Dr. Spearman's correlation coefficient," Biometrika, 13 (1921), 263-82. Reprinted in 'Student's' Collected Papers (edited by E. S. Pearson and John Wishart), London, Biometrika Office, n.d., 70-89.
[49] Swed, Frieda S., and Eisenhart, C., "Tables for testing randomness of grouping in a sequence of alternatives," Annals of Mathematical Statistics, 14 (1943), 66-87.
[50] Terpstra, T. J., "A non-parametric k-sample test and its connection with the H test." Unpublished manuscript.
[50a] Terpstra, T. J., "The asymptotic normality and consistency of Kendall's test against trend, when ties are present in one ranking," Indagationes Mathematicae, 14 (1952), 327-33.
[51] Todhunter, Isaac, A History of the Mathematical Theory of Probability from the Time of Pascal to That of Laplace, New York, Chelsea Publishing Company, 1949 (first edition 1865).
[51a] van Dantzig, D., "On the consistency and the power of Wilcoxon's two sample test," Indagationes Mathematicae, 13 (1951), 1-8; also Proceedings, Koninklijke Nederlandse Akademie van Wetenschappen, 54 (1951), 1-8.
[52] van der Vaart, H. R., "Some remarks on the power of Wilcoxon's test for the problem of two samples," Proceedings, Koninklijke Nederlandse Akademie van Wetenschappen, 53 (1950), 494-506, 507-20.
[53] Wald, A., and Wolfowitz, J., "On a test whether two samples are from the same population," Annals of Mathematical Statistics, 11 (1940), 147-62.
[54] Wald, A., and Wolfowitz, J., "Statistical tests based on permutations of the observations," Annals of Mathematical Statistics, 15 (1944), 358-72.
[55] Wallis, W. Allen, "The correlation ratio for ranked data," Journal of the American Statistical Association, 34 (1939), 533-8.
[56] Wallis, W. Allen, "Rough-and-ready statistical tests," Industrial Quality Control, 8 (1952), 35-40.
[57] Welch, B. L., "On the z-test in randomized blocks and Latin Squares," Biometrika, 29 (1937), 21-52.
[58] Westenberg, J., "Significance test for median and interquartile range in samples from continuous populations of any form," Proceedings, Koninklijke Nederlandse Akademie van Wetenschappen, 51 (1948), 252-61.
[59] White, Colin, "The use of ranks in a test of significance for comparing two treatments," Biometrics, 8 (1952), 33-41.
[60] Whitney, D. R., "A bivariate extension of the U statistic," Annals of Mathematical Statistics, 22 (1951), 274-82.
[61] Wilcoxon, Frank, "Individual comparisons by ranking methods," Biometrics Bulletin (now Biometrics), 1 (1945), 80-3.
[62] Wilcoxon, Frank, "Probability tables for individual comparisons by ranking methods," Biometrics, 3 (1947), 119-22.
[63] Wolfowitz, J., "Non-parametric statistical inference," Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability (edited by Jerzy Neyman), Berkeley and Los Angeles, University of California Press, 1949, 93-113.
