
RICHARD VON MISES

Probability, Statistics and Truth


SECOND REVISED ENGLISH EDITION PREPARED BY

HILDA GEIRINGER

DOVER PUBLICATIONS, INC. NEW YORK


CONTENTS

PREFACE
PREFACE TO THE THIRD GERMAN EDITION

FIRST LECTURE

The Definition of Probability

Amendment of Popular Terminology
Explanation of Words
Synthetic Definitions
Terminology
The Concept of Work in Mechanics
An Historical Interlude
The Purpose of Rational Concepts
The Inadequacy of Theories
Limitation of Scope
Unlimited Repetition
The Collective
The First Step towards a Definition
Two Different Pairs of Dice
Limiting Value of Relative Frequency
The Experimental Basis of the Theory of Games
The Probability of Death
First the Collective, then the Probability
Probability in the Gas Theory
An Historical Remark
Randomness
Definition of Randomness: Place Selection
The Principle of the Impossibility of a Gambling System
Example of Randomness
Summary of the Definition

SECOND LECTURE

The Elements of the Theory of Probability

The Theory of Probability is a Science Similar to Others
The Purpose of the Theory of Probability
The Beginning and the End of Each Problem must be Probabilities
Distribution in a Collective
Probability of a Hit; Continuous Distribution
Probability Density
The Four Fundamental Operations
First Fundamental Operation: Selection
Second Fundamental Operation: Mixing
Inexact Statement of the Addition Rule
Uniform Distribution
Summary of the Mixing Rule
Third Fundamental Operation: Partition
Probabilities after Partition
Initial and Final Probability of an Attribute
The So-called Probability of Causes
Formulation of the Rule of Partition
Fourth Fundamental Operation: Combination
A New Method of Forming Partial Sequences: Correlated Sampling
Mutually Independent Collectives
Derivation of the Multiplication Rule
Test of Independence
Combination of Dependent Collectives
Example of Noncombinable Collectives
Summary of the Four Fundamental Operations
A Problem of Chevalier de Méré
Solution of the Problem of Chevalier de Méré
Discussion of the Solution
Some Final Conclusions
Short Review
THIRD LECTURE

Critical Discussion of the Foundations of Probability

The Classical Definition of Probability
Equally Likely Cases . . .
. . . Do Not Always Exist
A Geometrical Analogy
How to Recognize Equally Likely Cases
Are Equally Likely Cases of Exceptional Significance?
The Subjective Conception of Probability
Bertrand's Paradox
The Suggested Link between the Classical and the New Definitions of Probability
Summary of Objections to the Classical Definition
Objections to My Theory
Finite Collectives
Testing Probability Statements
An Objection to the First Postulate
Objections to the Condition of Randomness
Restricted Randomness
Meaning of the Condition of Randomness
Consistency of the Randomness Axiom
A Problem of Terminology
Objections to the Frequency Concept
Theory of the Plausibility of Statements
The Nihilists
Restriction to One Single Initial Collective
Probability as Part of the Theory of Sets
Development of the Frequency Theory
Summary and Conclusion

FOURTH LECTURE

The Laws of Large Numbers

Poisson's Two Different Propositions
Equally Likely Events
Arithmetical Explanation
Subsequent Frequency Definition
The Content of Poisson's Theorem
Example of a Sequence to which Poisson's Theorem does not Apply
Bernoulli and non-Bernoulli Sequences
Derivation of the Bernoulli-Poisson Theorem
Summary
Inference
Bayes's Problem
Initial and Inferred Probability
Longer Sequences of Trials
Independence of the Initial Distribution
The Relation of Bayes's Theorem to Poisson's Theorem
The Three Propositions
Generalization of the Laws of Large Numbers
The Strong Law of Large Numbers
The Statistical Functions
The First Law of Large Numbers for Statistical Functions
The Second Law of Large Numbers for Statistical Functions
Closing Remarks

FIFTH LECTURE

Applications in Statistics and the Theory of Errors

What is Statistics?
Games of Chance and Games of Skill
Marbe's 'Uniformity in the World'
Answer to Marbe's Problem
Theory of Accumulation and the Law of Series
Linked Events
The General Purpose of Statistics
Lexis' Theory of Dispersion
The Mean and the Dispersion
Comparison between the Observed and the Expected Variance
Lexis' Theory and the Laws of Large Numbers
Normal and Nonnormal Dispersion
Sex Distribution of Infants
Statistics of Deaths with Supernormal Dispersion
Solidarity of Cases
Testing Hypotheses
R. A. Fisher's 'Likelihood'
Small Sample Theory
Social and Biological Statistics
Mendel's Theory of Heredity
Industrial and Technological Statistics
An Example of Faulty Statistics
Correction
Some Results Summarized
Descriptive Statistics
Foundations of the Theory of Errors
Galton's Board
Normal Curve
Laplace's Law
The Application of the Theory of Errors

SIXTH LECTURE

Statistical Problems in Physics

The Second Law of Thermodynamics
Determinism and Probability
Chance Mechanisms
Random Fluctuations
Small Causes and Large Effects
Kinetic Theory of Gases
Order of Magnitude of 'Improbability'
Criticism of the Gas Theory
Brownian Motion
Evolution of Phenomena in Time
Probability 'After-Effects'
Residence Time and Its Prediction
Entropy Theorem and Markoff Chains
Svedberg's Experiments
Radioactivity
Prediction of Time Intervals
Marsden's and Barratt's Experiments
Recent Development in the Theory of Gases
Degeneration of Gases: Electron Theory of Metals
Quantum Theory
Statistics and Causality
Causal Explanation in Newton's Sense
Limitations of Newtonian Mechanics
Simplicity as a Criterion of Causality
Giving up the Concept of Causality
The Law of Causality
New Quantum Statistics
Are Exact Measurements Possible?
Position and Velocity of a Material Particle
Heisenberg's Uncertainty Principle
Consequences for our Physical Concept of the World
Final Considerations

SUMMARY OF THE SIX LECTURES IN SIXTEEN PROPOSITIONS
NOTES AND ADDENDA
SUBJECT INDEX
NAME INDEX

FIRST LECTURE

The Definition of Probability

To illustrate the apparent contrast between statistics and truth which might be inferred from the title of our book may I quote a remark I once overheard: 'There are three kinds of lies: white lies, which are justifiable; common lies - these have no justification; and statistics.' Our meaning is similar when we say: 'Anything can be proved by figures'; or, modifying a well-known quotation from Goethe, with numbers 'all men may contend their charming systems to defend.' At the basis of all such remarks lies the conviction that conclusions drawn from statistical considerations are at best uncertain and at worst misleading. I do not deny that a great deal of meaningless and unfounded talk is presented to the public in the name of statistics. But my purpose is to show that, starting from statistical observations and applying to them a clear and precise concept of probability, it is possible to arrive at conclusions which are just as reliable and 'truthful' and quite as practically useful as those obtained in any other exact science. In order to achieve this purpose, I must ask you to follow me along a road which is often laborious and by paths which at first sight may appear unnecessarily winding.
AMENDMENT OF POPULAR TERMINOLOGY


'All our philosophy is a correction of the common usage of words,' says Lichtenberg.1 Many of the quarrels and mistakes occurring in the course of scientific advance could be avoided if this remark were always remembered. Our first step, therefore, will be to inquire more closely into the meaning of the word 'probability'. This will be followed by arguments which will gradually lead us to an adequate scientific definition of the concept of probability. I have already hinted that the key to the relation between statistics and truth may be found in a reasonable definition of probability. I hope that this point will become quite clear in the course of the subsequent discussion.

The word 'probable' is frequently used in everyday speech. We say, for instance, 'It will probably rain tomorrow', or, 'It is probably snowing right now in Iceland', or 'The temperature was probably lower a year ago today than it is today'. Again, we speak of something being more or less probable or more or less improbable when, for example, we are discussing the guilt of an accused person or the deposition of a witness. In a more definite way, we may say that there is a greater probability of winning the first prize in a certain sweepstake than of gaining the smallest in another. We have no difficulty in explaining what we mean by these statements as long as the inquirer is satisfied by a 'descriptive' answer. We can easily find a number of expressions which will serve. We may speak of a 'guess', of 'approximate' or 'incomplete' knowledge, or 'chance', or we may say that we have more or less adequate reasons for believing that this or that is the case, and so forth.
EXPLANATION OF WORDS

Considerable difficulties arise, however, when we are asked to give an exact explanation, or, even more, a definition of what we mean by 'probability'. Perhaps someone may suggest looking up the word in a dictionary. Volume XIII of the German Dictionary by Jakob and Wilhelm Grimm2 gives us detailed information: The Latin term 'probabilis', we are told, was at one time translated by 'like truth', or, by 'with an appearance of truth' ('mit einem Schein der Wahrheit'). Only since the middle of the seventeenth century has it been rendered by 'wahrscheinlich' (lit. truth-resembling). We also find a number of quotations illustrating the use of the word, most of them taken from philosophical works. I shall only refer to a few examples: 'The probable is something which lies midway between truth and error' (Thomasius, 1688); 'An assertion, of which the contrary is not completely self-contradictory or impossible, is called probable' (Reimarus). Kant says: 'That which, if it were held as truth, would be more than half certain, is called probable.' Perhaps, after these examples, some may wish to know what modern philosophy has contributed to this subject. I quote literally from Robert Eisler's Dictionary of Philosophic Concepts (1910): 'Probability, in the subjective sense, is a degree of certainty which is based on strong or even overwhelming reasons for making an assertion. . . . In the objective sense, the probable is that which is supported by a number of objective arguments. There are differing degrees of probability and these depend upon the kind and number of reasons or facts upon which the assertion or conclusion of probability is based.'3

To consider now a familiar, modern source, Webster's New International Dictionary gives the following definition of probability: 'Quality or state of being probable; reasonable ground for presuming; likelihood; more narrowly, a conclusion that is not proof but follows logically from such evidence as is available; as, reports devoid of all probability, to establish probability of guilt.'4 (For Webster's definition of mathematical probability, see note 1, Lect. 3).

It is useless to quarrel with these philosophic explanations. They are merely substitutions; one word is replaced by others and frequently by a great many. If these new words are more familiar to the reader than the original one, then he may find some explanation in this procedure while others will find none in this way. Some, for instance, may understand the meaning of 'more than half certain' better than the simple word 'probable'. This can only be a matter of personal preference, and explanations of this kind cannot be generally regarded as a correction of common word usage.

SYNTHETIC DEFINITIONS

Let us now consider a way by which we may arrive at a better definition of probability than that given in the dictionaries, which is so obviously unsatisfactory for our purpose. In the course of the last few centuries a method of forming and defining concepts has been developed by the exact sciences which shows us the way clearly and with certainty. To ignore or to reject this method would be to question all the achievements of modern mathematics and physics. As a preliminary example, let me quote a modern definition of a concept which belongs to the science of sociology; this is more nearly related to the subject-matter of our general education and will thus form a transition to those concepts with which we shall be concerned later. Werner Sombart,5 in his book Proletarian Socialism, attempts to find a useful definition of his subject and in so doing he considers a number of current interpretations. He concludes: 'The only remaining possibility is to consider socialism as an idea and to form a reasonable concept of it, i.e., to delimit a subject matter which possesses a number of characteristics considered to be particularly important to it and which form a meaningful unity; the "correctness" of this concept can only be judged from its fruitfulness, both as a creative idea in life and as a useful instrument for advancing scientific investigation.' These words actually contain almost all that is characteristic of the scientific method of developing new concepts. There are in particular two points which I wish to emphasize: in the first place, the content of a concept is not derived from the meaning popularly given to a word, and it is therefore independent of current usage. Instead, the concept is first established and its boundaries are purposely circumscribed, and a word, as a suitable kind of label, is affixed later. In the second place, the value of a concept is not gauged by its correspondence with some usual group of notions, but only by its usefulness for further scientific development, and so, indirectly, for everyday affairs.

We may say, with Kant,6 that our aim is to give not an analytic definition of probability but a synthetic one. We may leave open the question of the general possibility of finding analytic definitions at all.
TERMINOLOGY

I should like to add a further remark about the first of the above-mentioned properties of synthetic definitions. The person who arrives at a new scientific concept may be inclined to invent a new name for it: he will look for a word which has not already been used in some other sense, perhaps one closely related to that in which he himself wishes to use it. Since it is obviously difficult to find new words in one's own language, foreign or adopted words are frequently introduced into the scientific vocabulary. Although the purists in the matter of language are not altogether to be blamed, it would appear that they go too far when they ignore this reason for the introduction of foreign words into the language of science, and attempt to retranslate them into ordinary language. For example, it is unfortunate that most languages have no specific word for probability in its scientific sense but only popular terms like Wahrscheinlichkeit, probability, probabilité. However, no term has been invented and, naturally, it is quite possible for a scientific concept to exist without having a special name. This is the case with many of the most important concepts of mechanics which are hidden behind such ordinary words as force, mass, work, etc. All the same, I do feel that many laymen, and even some professionals in the field of mechanics, would understand these concepts more clearly if they had Latin names rather than names taken from everyday usage. Scientists are themselves only human, and they use the common language for the greater part of their lives in the same way as other humans. They are subject to the same kinds of confusion of speech and often give way to them only too freely.

THE CONCEPT OF WORK IN MECHANICS

Before I deal with the development of the scientific concept of probability, I should like to recall the similar state of affairs which prevailed during the formation of most other concepts in the exact sciences. As an example of a concept which is familiar today to educated persons, I shall choose that of work as it is used in theoretical mechanics. We all use the word 'work' with a variety of different meanings. Even when we do not consider idiomatic phrases like 'to work on someone's feelings', there are many ways in which the word is used which have little to do with the concept of work as it is understood in science. The simplest scientific definitions of work are: 'Work is the product of force and distance', or, more exactly, 'the scalar product of the vectors of force and displacement', or 'the line-integral of force'.7 All these definitions are equally suitable for many everyday matters and the nonmathematician need only keep in mind the first of them. If we consider some examples, such as the lifting of a weight, the turning of a crank, or the pushing of a pedal, in each case the work performed becomes greater with an increase in the weight of the load moved as well as with an increase in the distance through which it is moved.

Yet this scientific definition of work is hardly applicable to even the simplest of activities which are only partly of a mechanical nature. We may think of working a typewriter or playing a musical instrument. In the latter case, it is hardly possible to say that the correct measure of the work performed by the musician is the product of the force applied by the fingers of the musician and their displacement. Again, when we speak of the work involved in writing a book, painting a picture, or attending a patient, we are even further from the scientific meaning of the word 'work'. It is hard work from the human point of view to hold a heavy weight steadily with outstretched arms, but in this case the product of the force and the displacement is zero. In sports and games, the work calculated according to the rules of mechanics can hardly be regarded as a correct measure of the physical effort involved. No reasonable person objects to these discrepancies because we have become too accustomed to the fact that the same word may have a different meaning according as it is used scientifically or colloquially. When we use the word 'work' in its scientific meaning, we automatically eliminate all other associations which it may bring to our minds on other occasions, since these do not appertain to it in mechanics.
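The second of the definitions quoted above, work as the scalar product of force and displacement, can be made concrete with a small sketch. The function name `work`, the 10 kg load and the rounded figure of 98.1 N for its weight are assumptions chosen only for illustration; they do not come from the lecture.

```python
def work(force, displacement):
    """Mechanical work as the scalar product of force and displacement vectors."""
    if len(force) != len(displacement):
        raise ValueError("force and displacement must have the same dimension")
    return sum(f * d for f, d in zip(force, displacement))


# Illustrative assumption: a 10 kg load, so the supporting force is about 98.1 N upward.
supporting_force = (0.0, 0.0, 98.1)

# Lifting the load by 2 m: force and displacement point the same way.
lifting = work(supporting_force, (0.0, 0.0, 2.0))

# Holding the load steady with outstretched arms: the displacement is zero.
holding = work(supporting_force, (0.0, 0.0, 0.0))

print(lifting)
print(holding)
```

The second call shows the discrepancy discussed in the text: however tiring it is to hold the load, the mechanical work comes out as zero because nothing is displaced.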
AN HISTORICAL INTERLUDE

It was not immediately realized that the meaning of scientific concepts is independent of the literal meanings of the words used for them; this recognition only evolved over a long period in the development of scientific thought. This is illustrated by the controversy between the followers of Descartes and of Leibnitz on the question of vis viva.8 Is the 'effect of a force' equal to the product of the mass and the velocity or to the product of the mass and half the square of the velocity? We know now that this question has no logical answer and relies upon a definition which is ultimately arbitrary; what is to be called 'vis viva' and what 'momentum' is completely secondary. In the words of Robert Mayer we may say: 'It does not matter what others mean by the word "work", what we intend to convey by it is the thing that really matters'.

We have all experienced, in school, the difficulties which arise from the confusion between the colloquial and the scientific meanings of words. We had to learn that even the slowest motion has velocity, that a retarded motion has an acceleration but with a negative sign, and that 'rest' is a particular case of motion. This mastering of scientific language is essential in mental development, for, without it, there is no approach to modern natural science.

We have given a few examples of the use of common words as scientific terms. There is a growing tendency towards a more precise use of certain words of everyday language. Most educated persons are nowadays aware of the specific meanings given to words in science. They may, for instance, be expected to distinguish between the words quadrangle, rectangle, and square, and to know how these three terms are defined. In the near future, a much deeper understanding of questions of this kind will be taken for granted.
THE PURPOSE OF RATIONAL CONCEPTS

When a name is chosen for a scientific concept, it is obvious that we must consider linguistic convenience and good taste. Nevertheless, it is the content of a concept, and not its name, which is of importance. The definition must serve a useful purpose. We consider a purpose to be useful if it is in agreement with what we generally regard as the purpose of science. This is to bring order into the multiplicity of observed phenomena, to predict the course of their development, and to point out ways by which we may bring about particular phenomena in which we are interested. The scientific notion of 'work', and the whole conceptual system of classical physics, of which this notion is a part, have proved their utility in all these directions. The Law of Conservation of Energy has provided us with the means of bringing order into a very wide region of physical phenomena. This law enables us to predict the course of many natural events, while, at the same time, the engineer and the electrician derive from it the data necessary to calculate the dimensions of their machines. Nobody can deny the theoretical and practical success of scientific mechanics, which has been founded upon concepts of this kind. One criticism occasionally levelled against the practical utility of this rationalization of scientific concepts will be examined briefly.

People have said, 'It is easy to formulate a self-consistent theory based on exactly defined artificial concepts, but in the practical applications of the theory we always have to deal with vague processes which can only be adequately described in terms of correspondingly vague concepts which have evolved in a natural way'. There is some truth in this objection, for it makes evident a great deficiency which is to be found in any theoretical treatment of reality. The events which we observe, and in which we take part, are always very complicated; even the most elaborate and detailed theory cannot take into account all the factors involved. It is an art requiring a scientifically trained mind merely to identify the one feature among a multitude present in a natural process which is to be considered as the only essential one from the theoretical point of view. Nevertheless, it would be a mistake, or at least would lead us away from the whole of the scientific development of the last few centuries, if we were to follow the Bergson school of modern philosophy. The adherents of this school repudiate the use of sharply defined concepts, hoping in this way to cope more adequately with the complexity of the real world. Nothing would be gained by a return to those vague notions, which are sometimes praised as intuitive but which are really nothing but an imprecise and indefinite use of words.
THE INADEQUACY OF THEORIES

Imagine that I draw a 'straight line' on a blackboard with a piece of chalk. What a complicated thing is this 'line' compared with the 'straight line' defined by geometry! In the first place, it is not a line at all, since it has definite breadth; even more than that, it is a three-dimensional body made of chalk, an aggregate of many small bodies, the chalk particles. A person who was unaccustomed to seeing the teacher at school draw 'straight lines' of this kind would be almost unable to understand what this body of chalk has in common with the 'straight line' defined in the textbooks as 'the shortest distance between two points'. All the same, we do know that the exact idealized conceptions of pure geometry are essential tools for dealing with the real things around us. We need these abstract concepts just because they are simple enough that our minds can handle them with comparative ease.

Attempts have been made to construct geometries in which no 'infinitely narrow' lines exist but only those of definite width. The results were meagre because this method of treatment is much more difficult than the usual one. Moreover, a strip of definite width is only another abstraction no better than a straight line, and is really more complicated, since it involves something like two straight lines limiting it, one on either side.

I am prepared to concede without further argument that all the theoretical constructions, including geometry, which are used in the various branches of physics are only imperfect instruments to enable the world of empirical fact to be reconstructed in our minds. The theory of probability, which we include among the exact sciences, is just one such theoretical system. But I do not believe that there is any other way to achieve progress in science than the old method: to begin with the simplest, i.e., the exact theoretical scheme and to extend and improve it gradually. In dealing with the theory of probability, i.e., with probability calculus, I do not hope to achieve more than the results already attained by geometry, mechanics, and certain other branches of physics. That is to say, I aim at the construction of a rational theory, based on the simplest possible exact concepts, one which, although admittedly inadequate to represent the complexity of the real processes, is able to reproduce satisfactorily some of their essential properties.
LIMITATION OF SCOPE

After all these preliminary discussions, we now come to the description of our concept of probability. It follows from our previous remarks that our first task must be one of elimination. From the complex of ideas which are colloquially covered by the word 'probability', we must remove all those that remain outside the theory we are endeavouring to formulate. I shall therefore begin with a preliminary delimitation of our concept of probability; this will be developed into a more precise definition during the course of our discussion.

Our probability theory has nothing to do with questions such as: 'Is there a probability of Germany being at some time in the future involved in a war with Liberia?' Again, the question of the 'probability' of the correct interpretation of a certain passage from the Annals of Tacitus has nothing in common with our theory. It need hardly be pointed out that we are likewise unconcerned with the 'intrinsic probability' of a work of art. The relation of our theory to Goethe's superb dialogue on Truth and Probability in Fine Arts is thus only one of similarity in the sounds of words and consequently is irrelevant. We shall not deal with the problem of the historical accuracy of Biblical narratives, although it is interesting to note that a Russian mathematician, A. Markoff,10 inspired by the ideas of the eighteenth-century Enlightenment, wished to see the theory of probability applied to this subject. Similarly, we shall not concern ourselves with any of those problems of the moral sciences which were so ingeniously treated by Laplace11 in his Essai Philosophique. The unlimited extension of the validity of the exact sciences was a characteristic feature of the exaggerated rationalism of the eighteenth century. We do not intend to commit the same mistake.

Problems such as the probable reliability of witnesses and the correctness of judicial verdicts lie more or less on the boundary of the region which we are going to include in our treatment. These problems have been the subject of many scientific discussions; Poisson12 chose them as the title of his famous book.

To reach the essence of the problems of probability which do form the subject-matter of this book, we must consider, for example, the probability of winning in a carefully defined game of chance. Is it sensible to bet that a 'double 6' will appear at least once if two dice are thrown twenty-four times? Is this result 'probable'? More exactly, how great is its probability? Such are the questions we feel able to answer. Many problems of considerable importance in everyday life belong to the same class and can be treated in the same way; examples of these are many problems connected with insurance, such as those concerning the probability of illness or death occurring under carefully specified conditions, the premium which must be asked for insurance against a particular kind of risk, and so forth.
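To give a feeling for the kind of answer meant here, the question of the 'double 6' in twenty-four throws can be worked out under two assumptions which the lecture has not yet stated at this point: that the dice are true, so that each throw shows a double six with probability 1/36, and that the throws are independent, so that the probabilities of 'no double six' multiply. The function name below is hypothetical, and the sketch is only an illustration, not the derivation given later in the book.

```python
from fractions import Fraction


def prob_at_least_one_double_six(throws):
    """Probability of at least one double six in `throws` throws of two dice.

    Assumes true dice (1/36 per throw) and independent throws.
    """
    p_no_double_six = Fraction(35, 36)  # a single throw shows no double six
    return 1 - p_no_double_six ** throws


p24 = prob_at_least_one_double_six(24)
p25 = prob_at_least_one_double_six(25)

print(float(p24))  # slightly below one half
print(float(p25))  # slightly above one half
```

Under these assumptions the bet on twenty-four throws is slightly unfavourable, while one more throw would make it slightly favourable.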

Besides the games of chance and certain problems relating to social mass phenomena, there is a third field in which our concept has a useful application. This is in the treatment of certain mechanical and physical phenomena. Typical examples may be seen in the movement of molecules in a gas or in the random motion of colloidal particles which can be observed with the ultramicroscope. ('Colloid' is the name given to a system of very fine particles freely suspended in a medium, with the size of the particles so minute that the whole appears to the naked eye to be a homogeneous liquid.)
UNL IM IT ED REPE TITION

TH E D EFIN ITION

OF PR OBABIL ITY

We state here explicitly: The rational concept of probability, applies only to probwhichis the only basisof probabilitycalculus, itselfagainand again,or eventrepeats lemsin whicheitherthe same are a greatnumberof uniform elements involvedat the sametime. of we the language physics, may saythat in orderto apply the Uling theoryof probabilitywe must havea practicallyunlimitedsequence of uniform observations.
TH E C OLLE C TIV E

What is the commonfeature the lastthreeexamoles what is in and -'probability' the essential distinction between the meaningof in these cases and its meaning the earlierexamDles in whichwe have excluded frorn our treatmeit? One commonfeaturecan be recognized easily,and rve think it cmcial. In gamesof chance,in td'e problems insurance, in the molecular of and processes find events we repeating themselves againand again.They are massphenomena or repetitive events. The throwing of a pair of dice is an eventwhich can theoretically repeated unlimitednumberof times,for we be an do not take into accountthe wearof the box or the possibilitythat the dice may break. If we are dealingrvith a typical problem of insurance, canimagine greatarmy of individuals we a insuringthemselves against same the risk, and the repeated occurrence ofeventsof a similarkind (e.g., deaths) registered the records insurance are in of companies. the third case,that of the molecules colloidal In or partieles, imrnense the numberof particles partakingin eachprocess is a fundamental featureof the wholeconceotion. On the otherhand,this unlimitedrepetition, this 'mass character', is typicallyabsent the case all the examples in of previously excluded. The implicationof Germanyin a war with the Republicof Liberia is not a situationwhich frequentlyrepeats itself; the uncertainties that occurin the transcription ancientauthorsare,in general, of of a too individualcharacter themto be treated mass for as phenomena. The question the trustworthiness the historical of of narratives the of Bibleis clearlyuniqueand cannotbe considered a link in a chain as of analogous problems. classified reliabilityand trustworthiWe the nessof witnesses judgesas a borderline and casesincewe may feel reasonable doubt whetl-rer similar situationsoccur sufficientlv frequently and uniformly for them to be considered repetitive as phenomena. l0

of suitablefor the applicaA good example a massphenomenon of tion of the theoryof probabilityis the inheritance certaincharacfrom the cultivationof e.9.,the colour of flowersresulting teristics, from a given seed. Here largenumbersof plantsof a givenspecies what is meant by the words 'a repetitive we can easilyrecognize instance: growingof oneplant the Thereis primarilya single event'. of and the observation the colour of its flowers.Then comesthe contreatmentof a greatnumberof suchinstances, comprehensive unity. The individualelements as belongsidered partsof onegreater to ing to this unity differ from eachother only with respect a single attribute,the colour of the flowers. In games dice,the individualeventis a singlethrow of the dice of from-thebox and the attributeis the observati6n the numberof of pointsshownby the dice.In thegame'heads tails',eachtossof the or coin is an individualevent,and the sideof the coin which is uppermostis the attribute.In life insurance single the eventis the life of the individualand the attribute observed eitherthe ageat which the is individual diesor, more generally, momentat which the insurance the company liablefor payment. becomes Whenwe speakof 'the probability of death',the exact meaning this expression be defined of can in the followingway only. We mustnot think of an individual,but of a certainclassas a whole, e.9., 'all insuredmen forty-oneyearsold living in a given country and not engaged certain dangerous in occupations'. probabilityof deathis attached this classof men A to or to anotherclassthat can be defined a similarway. We can say in nothing about the probabilityof deathof an individual evenif we know his condition of life and health in detail. The phrase'probability of death',when it refersto a singleperson,has no meaning at all for us. This is one of the most importantconsequences our of definitionof probabilityand we shall discuss this point in greater detaillater on. 
We must now introduce a new term, which will be very useful during the future course of our argument. This term is 'the collective', and it denotes a sequence of uniform events or processes which differ by certain observable attributes, say colours, numbers, or anything else. In a preliminary way we state: All the peas grown by a botanist concerned with the problem of heredity may be considered as a collective, the attributes in which we are interested being the different colours of the flowers. All the throws of dice made in the course of a game form a collective wherein the attribute of the single event is the number of points thrown. Again, all the molecules in a given volume of gas may be considered as a collective, and the attribute of a single molecule might be its velocity. A further example of a collective is the whole class of insured men and women whose ages at death have been registered by an insurance office. The principle which underlies the whole of our treatment of the probability problem is that a collective must exist before we begin to speak of probability. The definition of probability which we shall give is only concerned with 'the probability of encountering a certain attribute in a given collective'.
THE FIRST STEP TOWARDS A DEFINITION

After our previous discussion it should not be difficult to arrive at a rough form of definition of probability. We may consider a game with two dice. The attribute of a single throw is the sum of the points showing on the upper sides of the two dice. What shall we call the probability of the attribute '12', i.e., the case of each die showing six points? When we have thrown the dice a large number of times, say 200, and noted the results, we find that 12 has appeared a certain number of times, perhaps five times. The ratio 5/200 = 1/40 is called the frequency, or more accurately the relative frequency, of the attribute '12' in the first 200 throws. If we continue the game for another 200 throws, we can find the corresponding relative frequency for 400 throws, and so on. The ratios which are obtained in this way will differ a little from the first one, 1/40. If the ratios were to continue to show considerable variation after the game had been repeated 2000, 4000, or a still larger number of times, then the question whether there is a definite probability of the result '12' would not arise at all. It is essential for the theory of probability that experience has shown that in the game of dice, as in all the other mass phenomena which we have mentioned, the relative frequencies of certain attributes become more and more stable as the number of observations is increased. We shall discuss the idea of 'the limiting value of the relative frequency' later on; meanwhile, we assume that the frequency is being computed with a limited accuracy only, so that small deviations are not perceptible. This approximate value of the relative frequency we shall, preliminarily, regard as the probability of the attribute in question, e.g., the probability of the result '12' in the game of dice. It is obvious that if we define probability in this way, it will be a number less than 1, that is, a proper fraction.

TWO DIFFERENT PAIRS OF DICE

I have here two pairs of dice which are apparently alike. By repeatedly throwing one pair, it is found that the relative frequency of the 'double 6' approaches a value of 0.028, or 1/36, as the number of trials is increased. The second pair shows a relative frequency for the '12' which is four times as large. The first pair is usually called a pair of true dice, the second is called biased, but our definition of probability applies equally to both pairs. Whether or not a die is biased is as irrelevant for our theory as is the moral integrity of a patient when a physician is diagnosing his illness. 1800 throws were made with each pair of these dice. The sum '12' appeared 48 times with the first pair and 178 times with the second. The relative frequencies are

48/1800 = 1/37.5 = 0.027 and 178/1800 = 1/10.1 = 0.099.

These ratios became practically constant towards the end of the series of trials. For instance, after the 1500th throw they were 0.023 and 0.094 respectively. The differences between the values calculated at this stage and later on did not exceed 10-15%.

It is impossible for me to show you a lengthy experiment in the throwing of dice during the course of this lecture since it would take too long. It is sufficient to make a few trials with the second pair of dice to see that at least one 6 appears at nearly every throw; this is a result very different from that obtained with the other pair. In fact, it can be shown that if we throw one of the dice belonging to the second pair, the relative frequency with which a single 6 appears is about 1/3, whereas for either of the first pair this frequency is almost exactly 1/6. In order to realize clearly what our meaning of probability implies, it will be useful to think of these two pairs of dice as often as possible; each pair has a characteristic probability of showing 'double 6', but these probabilities differ widely.

Here we have the 'primary phenomenon' (Urphänomen) of the theory of probability in its simplest form. The probability of a 6 is a physical property of a given die and is a property analogous to its mass, specific heat, or electrical resistance. Similarly, for a given pair of dice (including of course the total setup) the probability of a 'double 6' is a characteristic property, a physical constant belonging to the experiment as a whole and comparable with all its other physical properties. The theory of probability is only concerned with relations existing between physical quantities of this kind.
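The behaviour of the two pairs of dice can be imitated, very roughly, by a small simulation. The following Python sketch is only an illustration under stated assumptions: the function name `double_six_frequency` is hypothetical, each die is modelled solely by its chance `p_six` of showing a 6 (1/6 for a true die, 1/3 for a die of the second pair, as quoted in the text), and a pseudo-random generator stands in for the physical throw.

```python
import random


def double_six_frequency(p_six, n_throws, seed):
    """Relative frequency of 'double 6' in n_throws throws of a pair of dice.

    Assumption: each die shows a 6 with probability p_six, independently
    of the other die and of all earlier throws.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_throws):
        first_is_six = rng.random() < p_six
        second_is_six = rng.random() < p_six
        if first_is_six and second_is_six:
            hits += 1
    return hits / n_throws


# Relative frequencies after short, medium and long series of throws.
for n in (200, 1_800, 180_000):
    true_pair = double_six_frequency(1 / 6, n, seed=1)
    biased_pair = double_six_frequency(1 / 3, n, seed=2)
    print(n, round(true_pair, 4), round(biased_pair, 4))

# The two relative frequencies reported in the text for 1800 throws.
observed_true = round(48 / 1800, 3)
observed_biased = round(178 / 1800, 3)
print(observed_true, observed_biased)
```

With a long series the simulated frequencies should settle near 1/36 and 1/9 respectively, the second being four times the first, which is the relation stated for the two pairs.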
LIMITING VALUE OF RELATIVE FREQUENCY


I have used the expression 'limiting value', which belongs to higher analysis, without further explanation.13 We do not need to know much about the mathematical definition of this expression, since we propose to use it in a manner which can be understood by anyone, however ignorant of higher mathematics. Let us calculate the relative frequency of an attribute in a collective. This is the ratio of the number of cases in which the attribute has been found to the total number of observations. We shall calculate it with a certain limited accuracy, i.e., to a certain number of decimal places without asking what the following figures might be. Suppose, for instance, that we play 'heads or tails' a number of times and calculate the relative frequency of 'heads'. If the number of games is increased and if we always stop at the same decimal place in calculating the relative frequency, then, eventually, the results of such calculations will cease to change. If the relative frequency of heads is calculated accurately to the first decimal place, it would not be difficult to attain constancy in this first approximation. In fact, perhaps after some 500 games, this first approximation will reach the value of 0.5 and will not change afterwards. It will take us much longer to arrive at a constant value for the second approximation, calculated to two decimal places. For this purpose it may be necessary to calculate the relative frequency in intervals of, say, 500 casts, i.e., after the 500th, 1000th, 1500th, and 2000th cast, and so on. Perhaps more than 10,000 casts will be required to show that now the second figure also ceases to change and remains equal to 0, so that the relative frequency remains constantly 0.50. Of course it is impossible to continue an experiment of this kind indefinitely. Two experimenters, co-operating efficiently, may be able to make up to 1000 observations per hour, but not more. Imagine, for example, that the experiment has been continued for ten hours and that the relative frequency remained constant at 0.50 during the last two hours. An astute observer might perhaps have managed to calculate the third figure as well, and might have found that the changes in this figure during the last hours, although still occurring, were limited to a comparatively narrow range.

Considering these results, a scientifically trained mind may easily accept the hypothesis that by continuing this play for a sufficiently long time under conditions which do not change (insofar as this is practically possible), one would arrive at constant values for the third, fourth, and all the following decimal places as well. The expression we used, stating that the relative frequency of the attribute 'heads' tends to a limit, is no more than a short description of the situation assumed in this hypothesis.

Take a sheet of graph paper and draw a curve with the total number of observations as abscissae and the value of the relative frequency of the result 'heads' as ordinates. At the beginning this curve shows large oscillations, but gradually they become smaller and smaller, and the curve approaches a straight horizontal line. At last the oscillations become so small that they cannot be represented on the diagram, even if a very large scale is used. It is of no importance for our purpose if the ordinate of the final horizontal line is 0.6, or any other value, instead of 0.5. The important point is the existence of this straight line. The ordinate of this horizontal line is the limiting value of the relative frequency represented by the diagram, in our case the relative frequency of the event 'heads'.

Let us now add further precision to our previous definition of the collective. We will say that a collective is a mass phenomenon or a repetitive event, or, simply, a long sequence of observations for which there are sufficient reasons to believe that the relative frequency of the observed attribute would tend to a fixed limit if the observations were indefinitely continued. This limit will be called the probability of the attribute considered within the given collective. This expression being a little cumbersome, it is obviously not necessary to repeat it always. Occasionally, we may speak simply of the probability of 'heads'. The important thing to remember is that this is only an abbreviation, and that we should know exactly the kind of collective to which we are referring. 'The probability of winning a battle', for instance, has no place in our theory of probability, because we cannot think of a collective to which it belongs. The theory of probability cannot be applied to this problem any more than the physical concept of work can be applied to the calculation of the 'work' done by an actor in reciting his part in a play.
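The procedure of computing the relative frequency of 'heads' at intervals of 500 casts, always stopping at the same decimal place, can be sketched in a few lines of Python. This is a minimal illustration, not an account of any real experiment: the helper `rounded_frequencies` is a hypothetical name, the coin is assumed to be fair, and a seeded pseudo-random generator replaces the two experimenters.

```python
import random


def rounded_frequencies(n_casts, step, decimals, seed=1):
    """Relative frequency of 'heads', recorded every `step` casts and
    rounded to a fixed number of decimal places.

    Assumption: a fair coin simulated by a seeded pseudo-random generator.
    Returns a list of (number_of_casts, rounded_frequency) pairs.
    """
    rng = random.Random(seed)
    heads = 0
    records = []
    for n in range(1, n_casts + 1):
        if rng.random() < 0.5:
            heads += 1
        if n % step == 0:
            records.append((n, round(heads / n, decimals)))
    return records


first_figure = rounded_frequencies(100_000, step=500, decimals=1)
second_figure = rounded_frequencies(100_000, step=500, decimals=2)

# Show the first few and the last record of each approximation.
print(first_figure[:4], first_figure[-1])
print(second_figure[:4], second_figure[-1])
```

The first decimal figure is expected to become constant quite early, whereas the second may still fluctuate after many thousands of casts; how long this takes depends on the particular sequence generated.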

THE EXPERIMENTAL BASIS OF THE THEORY OF GAMES

It will be useful to consider how the fundamental experiment of the determination of probability can be carried out in the other two cases mentioned: I mean in the case of life insurance, and that of molecules of a gas. Before doing this, I should like to add a few more words on the question of games of chance. People may ask, 'How do we know for certain that a game of chance will develop in practice along the lines which we have discussed, i.e., tending towards a stabilization of the relative frequencies of the possible results? Is there a sufficient basis for this important assumption in actual sequences of experiments? Are not all experiments limited to a relatively short initial stage?' The experimental material is, however, not as restricted as it may appear at first sight. The great gambling banks in Monte Carlo and elsewhere have collected data relating to many millions of repetitions of one and the same game. These banks do quite well on the assumption of the existence of a limiting value of the relative frequency of each possible result. The occasional occurrence of 'breaking the bank' is not an argument against the validity of this theory. This could only be questioned on the basis of a substantial decrease in the total earnings of the bank from the beginning of its operation to any specified date, or, even worse, by the transformation of a continued gain into a loss. Nobody who is acquainted with the balance sheets of gambling banks would ever consider such a possibility. The lottery belongs, from this point of view, to the same class as roulette. Lotteries have been organized by certain governments for decades, and the results have always been in complete agreement with the assumption of constant values of the relative frequencies.

We thus see that the hypothesis of the existence of limiting values of the relative frequencies is well corroborated by a large mass of experience with actual games of chance. Only processes to which this hypothesis applies form the subject of our subsequent discussion.

THE PROBABILITY OF DEATH

The 'probability of death' is calculated by the insurance companies by a method very similar to the one which we have used to define the probability in the case of the game of dice. The first thing needed is to have an exact definition of the collective for each single case. As an example, we may mention the compiling of the German Life Tables Based on the Experience of Twenty-three Insurance Companies. These tables were calculated on the basis of 900,000 single observations on persons whose lives were insured with one of the twenty-three companies.14 The observations covered the period from the moment of conclusion of the insurance contract until the cessation of this contract by death or otherwise. Let us consider, in particular, the following collective: 'All men insured before reaching the age of forty after complete medical examination and with the normal premium, the characteristic event being the death of the insured in his forty-first year.' Cases in which the occurrence or non-occurrence of this event could not be ascertained, e.g., because of a discontinuation of the insurance, were excluded from calculation. The number of cases which could be ascertained was 85,020. The corresponding number of deaths was 940. The relative frequency of deaths or the death-rate is therefore 940/85,020 = 0.01106. This figure was accepted, after certain corrections which we do not need to be bothered with, as the probability of death occurring in the forty-first year for members of the above-described class of insured persons, i.e., for an exactly defined collective.

In this case 85,000 observations have been assumed to be sufficient for the relative frequency of deaths to become practically equal to its limiting value, that is, to a constant which refers to an indefinitely long series of observations of persons of the same category. This assumption is an arbitrary one and, strictly speaking, it would be wrong to expect that the above relative frequency agrees with the true probability to more than the first three decimal places. In other words, if we could increase the number of observations and keep calculating the relative frequency of deaths, we can only expect that the first three decimal places of the original death-rate, namely 0.011, will remain unchanged. All concerned in insurance business would prefer the death-rates to be calculated on a broader basis; this, however, is difficult for obvious practical reasons. On the other hand, no figure of this kind, however exact at the moment of its determination, can remain valid for ever. The same is true for all physical data. The scientists determine the acceleration due to gravity at a certain place on the surface of the earth, and continue to use this value until a new determination happens to reveal a change in it; local differences are treated in the same way. Similarly, insurance mathematicians are satisfied with the best data available at the moment, and continue to use a figure such as the above 0.011 until new and more accurate calculations become possible. In other words, the insurance companies continue to assume that out of 1000 newly insured men of the previously defined category, eleven will die in their forty-first year.

No significance for any other category is claimed for this figure 0.011. It is utter nonsense to say, for instance, that Mr. X, now aged forty, has the probability 0.011 of dying in the course of the next year. If the analogous ratio is calculated for men and women together, the value obtained in this way is somewhat smaller than 0.011, and Mr. X belongs to this second collective as much as to that previously considered. He is, furthermore, a member of a great number of other collectives which can be easily defined, and for which the calculation of the probability of death may give as many different values. One might suggest that a correct value of the probability of death for Mr. X may be obtained by restricting the collective to which he belongs as far as possible, by taking into consideration more and more of his individual characteristics. There is, however, no end to this process, and if we go further and further into the selection of the members of the collective, we shall be left finally with this individual alone. Insurance companies nowadays apply the principle of so-called 'selection by insurance'; this means that they take into consideration the fact that persons who enter early into insurance contracts are on the average of a different type and have a different distribution of death ages from persons admitted to the insurance at a more advanced age. It is obviously possible to go further in this or other directions in the limitation of the collective. It is, however, equally obvious that in trying to take into account all the properties of an individual, we shall finally arrive at the stage of finding no other members of the collective at all, and the collective will cease to exist altogether.
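The arithmetic behind the death-rate quoted above is short enough to be checked directly. The following Python lines are merely a worked calculation using the two counts given in the text; the variable names are illustrative.

```python
# Counts quoted in the text for the collective of men insured before forty.
ascertained_cases = 85_020
deaths_in_forty_first_year = 940

# Relative frequency of deaths (the death-rate) in this collective.
death_rate = deaths_in_forty_first_year / ascertained_cases
print(round(death_rate, 5))

# Only the first three decimal places are treated as reliable.
reliable_rate = round(death_rate, 3)
print(reliable_rate)

# Expected number of deaths among 1000 newly insured men of this category.
deaths_per_thousand = round(1000 * death_rate)
print(deaths_per_thousand)
```

The figure refers to the collective as a whole; nothing in the calculation attaches a value to any single member of it.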
FIRST THE COLLECTIVE - THEN THE PROBABILITY

I should like to dwell a little on this last point, which implies a characteristic difference between the definition of probability assumed in these lectures and that which has been generally accepted before. I have already stated this once in the following short sentence: 'We shall not speak of probability until a collective has been defined'. In this connexion, it is of interest to consider the diametrically opposite viewpoint expressed by one of the older authors, Johannes von Kries,16 in a once widely read book on the principles of the theory of probability. He declares: '. . . I shall assume therefore a definite probability of the death of Caius, Sempronius or Titus in the course of the next year. If, on the other hand, the question is raised of the probability of a general event, including an indefinite number of individual cases - for instance, of the probability of a man of 40 living another 20 years, this clearly means the use of the word "probability" in another and not quite proper way - as a kind of abbreviation. If an expression of this kind should have any connexion with the true meaning of probability at all, this connexion may only consist in a comprehensive description of a certain number of single probabilities.'

My opinion is that the 'improper' use of the probability notion, as defined by von Kries, is in fact the only one admissible in the calculus of probability. This has been demonstrated in the foregoing paragraph by means of the same example of the death probability as was used by von Kries, and I have tried to show that any other conception is impossible. I consider, quite generally, the introduction of the expression 'probability in a collective' as an important 'improvement in word usage'. Two examples may help to elucidate this point further.

Consider a lottery with one million tickets. Imagine that the first prize has fallen to ticket No. 400,000. People will consider this an amazing and rare event; newspapers will discuss it, and everybody will think that this was a very improbable occurrence. On the other hand, the essence of a lottery is that all precautions have been taken to ensure the same probability for a win for all tickets, and No. 400,000 has therefore exactly the same chance of winning as all the other numbers, for instance No. 786,331, namely the probability 1/1,000,000. What shall we think about this paradox?

Another example is given by Laplace in his famous Essai philosophique: In playing with small cards, on each of which is written a single letter, selecting at random fourteen of them and arranging them in a row, one would be extremely amazed to see the word 'Constantinople' formed. However, in this case again, the mechanism of the play is such as to ensure the same probability for each of the 26^14 possible combinations of fourteen letters (out of the twenty-six letters of the alphabet). Why do we nevertheless assume the appearance of the word 'Constantinople' to be something utterly improbable?

The solution of these two seeming paradoxes is the same. The event that the first prize will fall to ticket No. 400,000 has, in itself, no 'probability' at all. A collective has to be defined before the word probability acquires a definite meaning. We may define this collective to consist of repeated draws of a lottery, the attribute of the particular draw being the number of the ticket drawn. In this collective each number has exactly the same probability as No. 400,000. However, in speaking of the 'improbability' of the number 400,000, we have in mind a collective of a different kind. The above-mentioned impression of improbability would be created not only by drawing the number 400,000, but all numbers of the same kind: 100,000, 200,000, etc. The collective with which we have to deal has therefore only the two following attributes: either the number does end with five 0's, or it does not. The first-named attribute has the probability 0.00001, the second 0.99999, i.e., nearly 100,000 times larger. In an alternative between the draw of a number containing five 0's and that of a number not having this property, the second result has indeed a very much larger probability.

Exactly the same considerations apply to the second example. What astonishes us in the case of the word 'Constantinople' is the fact that fourteen letters, taken and ordered at random, should form a well-known word instead of unreadable gibberish. Among the immense number of combinations of fourteen letters (26^14, or about 10^20), not more than a few thousand correspond to words. The elements of the collective are in this case all the possible combinations of fourteen letters with the alternative attributes 'coherent' or 'meaningless'. The second attribute ('meaningless') has, in this collective, a very much larger probability than the first one, and that is why we call the appearance of the word 'Constantinople', or of any other word, a highly improbable event.

In many appropriate uses of the probability notion in practical life the collective can be easily constructed. In cases where this proves to be impossible, the use of the word probability, from the point of view of the rational theory of probability, is an illegitimate one, and numerical determination of the probability value is therefore impossible. In many cases the collective can be defined in several ways and these are cases in which the magnitude of the probability may become a subject of controversy. It is only the notion of probability in a given collective which is unambiguous.
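The figures used in the two examples can be verified by direct counting. The short Python sketch below is offered only as a check of the arithmetic; it assumes tickets numbered from 1 to 1,000,000, each equally likely, and all variable names are illustrative.

```python
from fractions import Fraction

# Lottery: tickets numbered 1 .. 1,000,000, all equally likely (assumption).
n_tickets = 1_000_000
round_tickets = sum(1 for t in range(1, n_tickets + 1) if t % 100_000 == 0)

# Two-attribute collective: 'ends with five 0's' versus 'does not'.
p_round = Fraction(round_tickets, n_tickets)
p_other = 1 - p_round
ratio = p_other / p_round
print(round_tickets, float(p_round), float(p_other), int(ratio))

# Laplace's cards: number of ordered arrangements of fourteen letters
# drawn from an alphabet of twenty-six (repetition allowed).
arrangements = 26 ** 14
print(arrangements)
print(f"{arrangements:.2e}")
```

Exact fractions are used so that the ratio of the two probabilities comes out as a whole number rather than a rounded decimal.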
PROBABILITY IN THE GAS THEORY

We now return to our preliminary survey of fields in which the theory of probability can be applied, and consider the third example, that of molecular physics, rather more closely. In the investigation of the behaviour of molecules in a gas we encounter conditions not essentially different from those prevailing in the two applications of probability we have previously discussed. In this case, the collective can be formed, for instance, by all molecules present in the volume of gas enclosed by the walls of a cylinder and a piston. As attributes of the single elements (molecules), we may consider, for instance, the three rectangular components of their velocities, or the velocity vector itself. It is true that nobody has yet tried to measure the actual velocities of all the single molecules in a gas, and to calculate in this way the relative frequencies with which the different values occur. Instead, the physicist makes certain theoretical assumptions concerning these frequencies (or, more exactly, their limiting values), and tests experimentally certain consequences, derived on the basis of these assumptions. Although the possibility of a direct determination of the probability does not exist in this case, there is nevertheless no fundamental difference between it and the other two examples treated. The main point is that in this case, too, all considerations are based on the existence of constant limiting values of relative frequencies which are unaffected by a further increase in the number of elements concerned, i.e., by an increase in the volume of gas under consideration.

In order to explain the relation between this problem and the previous example of the probability of death established by direct counting, we may think of the following analogy. A surveyor may have to make calculations relating to a right-angled triangle, e.g., the evaluation of its hypotenuse by means of the Pythagorean theorem. His first step may be to establish by direct measurement that the angle in the triangle is sufficiently near to 90°. Another method which he can apply is to assume that this angle is 90°, to draw conclusions from this assumption, and to verify them by comparison with the experimental results. This is the situation in which the physicist finds himself when he applies statistical methods to molecules or other particles of the same kind. The physicists often say that the velocity of a molecule is 'in principle' a measurable quantity, although it is not possible to carry out this measurement in practice by means of the usual measuring devices. (At this stage we do not consider the modern development of the question of the measurability of molecular quantities.) Similarly, we can say that the relative frequency and its limiting value, the probability, are determined in the molecular collective 'in principle' in the same way as in the cases of the games of chance and of social statistics which we have previously discussed.
AN HISTORICAL REMARK

The way in which the probability concept has been developed in the preceding paragraphs is widely different from the one which the older textbooks of probability calculus used in formally defining their subject. On the other hand, our foundation of probability is in no contradiction whatsoever to the actual content of the probability concept used by these authors. In this sense, the first pages of Poisson's famous textbook, On the Probability of the Judgments of Courts of Justice, are very instructive. Poisson says that a certain phenomenon has been found to occur in many different fields of experience, namely, the fact which we have described above as the stabilization of relative frequencies with the increase in the number of observations. In this connexion, Poisson uses an expression which I have avoided up till now on account of a prevailing confusion regarding its interpretation. Poisson calls the fact that the relative frequencies become constant, after the sequence of experiments has been sufficiently extended, the Law of Large Numbers. He considers this law to be the basis of a theory of probability, and we fully agree with him on this point. In the actual investigations which follow the introduction, however, Poisson starts not from this law, but from the formal definition of probability introduced by Laplace.18 (We shall have to speak about this definition later.) From it he deduces, by analytical methods, a mathematical proposition which he also calls the Law of Large Numbers. We shall see later on that this mathematical proposition means something very different from the general empirical rule called by the same name at the beginning of Poisson's book. This double use of the same expression to describe two widely different things has caused much confusion, and we shall have to return to this point again: it will form the subject of the fourth chapter. At that point, I shall also quote the exact words in which Poisson states the empirical rule of the constancy of the relative frequencies with large numbers of observations as the foundation of the theory of probability. In the meantime I ask you not to associate any definite meaning with the expression 'The Law of Large Numbers'.

Let me add that our conception of the sequence of observations as the cornerstone in the foundation of the theory of probability, and our definition of probability as the relative frequency with which certain events or properties recur in these sequences, is not something absolutely new. In a more dialectical form and without the immediate intention of developing a theory of probability calculus on this basis, the same ideas were presented as early as 1866 by John Venn19 in his book Logic of Chance. The development of the so-called theory of finite populations by Theodor Fechner20 and Heinrich Bruns21 is closely related to our frequency theory of probability. Georg Helm,22 who played a certain part in the foundation of the energy principle, expressed ideas very similar to ours in his paper on 'Probability Theory as the Theory of the Concept of Collectives', which appeared in 1902. These attempts, as well as many others which time does not allow us to enumerate, did not lead, and could not lead, to a complete theory of probability, because they failed to realize one decisive feature of a collective which we shall discuss in the following paragraph.

RANDOMNESS

The condition that the relative frequencies of attributes should have constant limiting values is not the only one we have to stipulate when dealing with collectives, i.e., with sequences of single observations, mass phenomena, or repetitive events which may appropriately serve as a basis for the application of probability theory. Examples can easily be found where the relative frequencies converge towards definite limiting values, and where it is nevertheless not appropriate to speak of probability. Imagine, for instance, a road along which milestones are placed, large ones for whole miles and smaller ones for tenths of a mile. If we walk long enough along this road, calculating the relative frequencies of large stones, the value found in this way will lie around 1/10. The value will be exactly 0.1 whenever in each mile we are in that interval between two small milestones which corresponds to the one in which we started. The deviations from the value 0.1 will become smaller and smaller as the number of stones passed increases; in other words, the relative frequency tends towards the limiting value 0.1. This result may induce us to speak of a certain 'probability of encountering a large stone'. Nothing that we have said so far prevents us from doing so. It is, however, worth while to inquire more closely into the obvious difference between the case of the milestones and the cases previously discussed. A point will emerge from this inquiry which will make it desirable to restrict the definition of a collective in such a way as to exclude the case of the milestones and other cases of a similar nature.

The sequence of observations of large or small stones differs essentially from the sequence of observations, for instance, of the results of a game of chance, in that the first sequence obeys an easily recognizable law. Exactly every tenth observation leads to the attribute 'large', all others to the attribute 'small'. After having just passed a large stone, we are in no doubt about the size of the next one; there is no chance of its being large. If, however, we have cast a double 6 with two dice, this fact in no way affects our chances of getting the same result in the next cast. Similarly, the death of an insured person during his forty-first year does not give the slightest indication of what will be the fate of another who is registered next to him in the books of the insurance company, regardless of how the company's list was prepared.

This difference between the two sequences of observations is actually observable. We shall, in future, consider only such sequences of events or observations, which satisfy the requirements of complete lawlessness or 'randomness' and refer to them as collectives. In certain cases, such as the one mentioned above, where there is no collective properly speaking, it may sometimes be useful to have a short expression for the limiting value of the relative frequency. We shall then speak of the 'chance' of an attribute's occurring in an unlimited sequence of observations, which may be called an improper collective. The term 'probability' will be reserved for the limiting value of the relative frequency in a true collective which satisfies the condition of randomness. The only question is how to describe this condition exactly enough to be able to give a sufficiently precise definition of a collective.
DEFINITION OF RANDOMNESS: PLACE SELECTION


On the basis of all that has been said, an appropriate definition of randomness can be found without much difficulty. The essential difference between the sequence of the results obtained by casting dice and the regular sequence of large and small milestones consists in the possibility of devising a method of selecting the elements so as to produce a fundamental change in the relative frequencies.

We begin, for instance, with a large stone, and register only every second stone passed. The relative frequency of large stones among those registered will now converge towards 1/5 instead of 1/10. (We miss none of the large stones, but we do miss every second of the small ones.) If the same method, or any other, simple or complicated, method of selection is applied to the sequence of dice casts, the effect will always be nil; the relative frequency of the double 6, for instance, will remain, in all selected partial sequences, the same as in the original one (assuming, of course, that the selected sequences are long enough to show an approach to the limiting value). This impossibility of affecting the chances of a game by a system of selection, this uselessness of all systems of gambling, is the characteristic and decisive property common to all sequences of observations or mass phenomena which form the proper subject of probability calculus.

In this way we arrive at the following definition: A collective appropriate for the application of the theory of probability must fulfil two conditions. First, the relative frequencies of the attributes must possess limiting values. Second, these limiting values must remain the same in all partial sequences which may be selected from the original one in an arbitrary way. Of course, only such partial sequences can be taken into consideration as can be extended indefinitely, in the same way as the original sequence itself. Examples of this kind are, for instance, the partial sequences formed by all odd members of the original sequence, or by all members for which the place number in the sequence is the square of an integer, or a prime number, or a number selected according to some other rule, whatever it may be. The only essential condition is that the question whether or not a certain member of the original sequence belongs to the selected partial sequence should be settled independently of the result of the corresponding observation, i.e., before anything is known about this result. We shall call a selection of this kind a place selection. The limiting values of the relative frequencies in a collective must be independent of all possible place selections. By place selection we mean the selection of a partial sequence in such a way that we decide whether an element should or should not be included without making use of the attribute of the element, i.e., the result of our game of chance.
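A place selection of the kind just defined can be sketched as follows. The helper names (`place_selection`, `frequency`, `every_second`) are hypothetical, and the dice casts come from Python's pseudo-random generator, which is only a stand-in for a real game and not a collective in the strict sense. The sketch applies the same rule, "register every second element", to the milestones and to the dice.

```python
import random
from fractions import Fraction

def place_selection(seq, rule):
    """Keep the element at place i (counted from 1) when rule(i) is true.
    The rule sees only the place number, never the result at that place."""
    return [x for i, x in enumerate(seq, start=1) if rule(i)]

def frequency(seq, attribute):
    """Exact relative frequency of an attribute in a finite sequence."""
    return Fraction(sum(1 for x in seq if x == attribute), len(seq))

every_second = lambda i: i % 2 == 1   # places 1, 3, 5, ...

# Milestones, beginning with a large stone: places 1, 11, 21, ... are large.
stones = ["large" if i % 10 == 1 else "small" for i in range(1, 1001)]
print(frequency(stones, "large"))
print(frequency(place_selection(stones, every_second), "large"))

# Dice: simulated casts of two dice (pseudo-random stand-in for a real game).
rng = random.Random(0)
casts = [(rng.randint(1, 6), rng.randint(1, 6)) for _ in range(360_000)]
double_six = (6, 6)
all_casts = float(frequency(casts, double_six))
selected = float(frequency(place_selection(casts, every_second), double_six))
print(all_casts, selected, 1 / 36)
```

For the milestones the selection doubles the frequency of large stones; for the simulated dice the frequency of the double 6 stays close to 1/36 in both the full and the selected sequence.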

THE PRINCIPLE OF THE IMPOSSIBILITY OF A GAMBLING SYSTEM

We may now ask a question similar to one we have previously asked: 'How do we know that collectives satisfying this new and more rigid requirement really exist?' Here again we may point to experimental results, and these are numerous enough. Everybody who has been to Monte Carlo, or who has read descriptions of a gambling bank, knows how many 'absolutely safe' gambling systems, sometimes of an enormously complicated character, have been invented and tried out by gamblers; and new systems are still being suggested every day. The authors of such systems have all, sooner or later, had the sad experience of finding out that no system is able to improve their chances of winning in the long run, i.e., to affect the relative frequencies with which different colours or numbers appear in a sequence selected from the total sequence of the game. This experience forms the experimental basis of our definition of probability.

An analogy presents itself at this point which I shall briefly discuss. The system fanatics of Monte Carlo show an obvious likeness to another class of 'inventors' whose useless labour we have been accustomed to consider with a certain compassion, namely, the ancient and undying family of constructors of 'perpetual-motion' machines. This analogy, which is not only a psychological one, is worth closer consideration. Why does every educated man smile nowadays when he hears of a new attempt to construct a perpetual-motion machine? Because, he will answer, he knows from the law of the conservation of energy that such a machine is impossible. However, the law of conservation of energy is nothing but a broad generalization, however firmly rooted in various branches of physics, of fundamental empirical results. The failure of all the innumerable attempts to build such a machine plays a decisive role among these. In theoretical physics, the energy principle and its various applications are often referred to as 'the principle of the impossibility of perpetual motion'. There can be no question of proving the law of conservation of energy, if we mean by 'proof' something more than the simple fact of an agreement between a principle and all the experimental results so far obtained. The character of being nearly self-evident, which this principle has acquired for us, is only due to the enormous accumulation of empirical data which confirm it. Apart from the unsuccessful attempts to construct a perpetual-motion machine, the interest of which is now purely historical, all the technical methods of transformation of energy are evidence for the validity of the energy principle.

By generalizing the experience of the gambling banks, deducing from it the Principle of the Impossibility of a Gambling System, and including this principle in the foundation of the theory of probability, we proceed in the same way as did the physicists in the case of the energy principle. In our case also, the naive attempts of the hunters of fortune are supplemented by more solid experience, especially that of the insurance companies and similar bodies. The results obtained by them can be stated as follows. The whole financial basis of insurance would be questionable if it were possible to change the relative frequency of the occurrence of the insurance cases (deaths, etc.) by excluding, for example, every tenth one of the insured persons, or by some other selection principle. The principle of the impossibility of a gambling system has the same importance for the insurance companies as the principle of the conservation of energy for the electric power station: it is the rock on which all the calculations rest. We can characterize these two principles, as well as all far-reaching laws of nature, by saying that they are restrictions which we impose on the basis of our previous experience, upon our expectation of the further course of natural events. (This formulation goes back to E. Mach.) The fact that predictions of this kind have been repeatedly verified by experience entitles us to assume the existence of mass phenomena or repetitive events to which the principle of the impossibility of a gambling system actually applies. Only phenomena of this kind will be the subject of our further discussion.

EXAMPLE OF RANDOMNESS

In order to illustrate randomness in a collective, I will show a simple experiment. It is again taken from the field of games of chance; this is only because experiments on subjects belonging to other fields in which the theory of probability finds its application require apparatus much too elaborate to be shown here.

I have a bag containing ninety round discs, bearing the numbers 1 to 90. I extract one disc from the bag at random. I note whether the number it bears is an odd or an even one and replace the disc. I repeat the experiment 100 times and denote all the odd numbers by 1's, and all even numbers by 0's. The following table shows the result:

[Table of the 100 results (1 = odd, 0 = even); the entries are not recoverable from this copy.]

Among 100 experimental results we find fifty-one ones; in other words, the relative frequency of the result 1 is 51/100. If we consider only the first, third, fifth draw, and so forth, i.e., if we take only the figures in the odd columns of the table, we find that ones appear in twenty-four cases out of fifty; the relative frequency is 48/100. Using only the numbers in the odd horizontal rows of the table, we obtain, for the relative frequency of the result 1, the value 50/100. We may further consider only those results whose place in the sequence corresponds to one of the prime numbers, i.e., 1, 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89 and 97. These twenty-six draws have produced thirteen 1's; the relative frequency is thus again exactly 50/100. Finally, we may consider the 51 draws following a result 1. (A 'system' gambler might prefer to bet on 0 after 1 has just come out.) We find in this selection of results twenty-seven 1's, i.e., the relative frequency 27/51, or about 53/100. These calculations show that, in all the different selections which we have tried out, the 1's always appear with a relative frequency of about 1/2. I trust that this conveys the feeling that more extensive experiments, which I am not able to carry out here because of the lack of time, would demonstrate the phenomenon of randomness still more strikingly.

It is of course possible, after knowing the results of the hundred draws, to indicate a method of selection which would produce only 1's, or only 0's, or 1's and 0's in any desired proportion. It is also possible that in some other group of a hundred experiments, analogous to the one just performed, one kind of selection may give a result widely different from 1/2. The principle of randomness requires only that the relative frequency should converge to 1/2 when the number of results in an arbitrarily selected partial sequence becomes larger and larger.
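A more extensive version of this experiment can be imitated on a computer. The sketch below is an assumption-laden stand-in and not the experiment of the text: the draws come from a seeded pseudo-random generator, the number of draws (100,000) and the seed are arbitrary, and the prime places follow the usual convention that excludes 1, whereas the list in the text includes it. It computes the relative frequency of 1's in the whole sequence, at the odd places, at the prime places, and among the draws that follow a 1.

```python
import random

def primes_up_to(n):
    """Sieve of Eratosthenes; returns the set of primes <= n."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for multiple in range(p * p, n + 1, p):
                sieve[multiple] = False
    return {k for k, flag in enumerate(sieve) if flag}

rng = random.Random(1)          # arbitrary seed, for repeatability only
n = 100_000                     # far more draws than the 100 in the text
# Draw a disc numbered 1..90 with replacement; record 1 for odd, 0 for even.
results = [rng.randint(1, 90) % 2 for _ in range(n)]

def frequency_of_ones(values):
    """Relative frequency of the result 1 in a list of 0's and 1's."""
    return sum(values) / len(values)

primes = primes_up_to(n)
selections = {
    "all draws": results,
    "odd places": [r for place, r in enumerate(results, start=1) if place % 2 == 1],
    "prime places": [r for place, r in enumerate(results, start=1) if place in primes],
    # Decided from the previous result only, never from the draw itself.
    "after a 1": [results[i] for i in range(1, n) if results[i - 1] == 1],
}
for name, values in selections.items():
    print(f"{name:12s} {len(values):6d} {frequency_of_ones(values):.4f}")
```

The last selection depends on earlier results but not on the result of the draw being selected, so it still qualifies as a place selection in the sense defined above.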
SUMMARY OF THE DEFINITION

I do not need to insist here on mathematical details and such considerations which are only necessary for making the definitions complete from a mathematical point of view. Those who are interested in this question may refer to my first publication on the foundation of probability calculus (1919) or to my textbook of the theory of probability (1931), presenting the theory in a simplified and, it seems to me, improved form. (See Notes and Addenda, p. 224.) In my third lecture I will deal with various basic questions and with different objections to my definition of probability and these I hope to be able to refute. I trust that this discussion will dispel those doubts which may have arisen in your minds and further clarify certain points.

In closing this first lecture, may I summarize briefly the propositions which we have found and which will serve as a basis for all future discussions. These propositions are equivalent to a definition of mathematical probability, in the only sense in which we intend to use this concept.

1. It is possible to speak about probabilities only in reference to a properly defined collective.

2. A collective is a mass phenomenon or an unlimited sequence of observations fulfilling the following two conditions: (i) the relative frequencies of particular attributes within the collective tend to fixed limits; (ii) these fixed limits are not affected by any place selection. That is to say, if we calculate the relative frequency of some attribute not in the original sequence, but in a partial set, selected according to some fixed rule, then we require that the relative frequency so calculated should tend to the same limit as it does in the original set.

3. The fulfilment of the condition (ii) will be described as the Principle of Randomness or the Principle of the Impossibility of a Gambling System.

4. The limiting value of the relative frequency of a given attribute, assumed to be independent of any place selection, will be called 'the probability of that attribute within the given collective'. Whenever this qualification of the word 'probability' is omitted, this omission should be considered as an abbreviation and the necessity for reference to some collective must be strictly kept in mind.

5. If a sequence of observations fulfils only the first condition (existence of limits of the relative frequencies), but not the second one, then such a limiting value will be called the 'chance' of the occurrence of the particular attribute rather than its 'probability'.


SECOND LECTURE

The Elements of the Theory of Probability


In the first lecture I have already mentioned that the conception which I developed there, defining probability as the limiting value of an observable relative frequency, has its opponents. In the third lecture I intend to examine the objections raised against this definition in greater detail. Before doing this, I should like, however, to describe briefly the application of the fundamental definitions to real events, how they can be used for solving practical problems; in short, I shall discuss their general value and utility. The applicability of a theory to reality is, in my opinion, if not the only, then the most important test of its value.

THE THEORY OF PROBABILITY IS A SCIENCE SIMILAR TO OTHERS

I begin with a statement which will meet with the immediate opposition of all who think that the theory of probability is a science fundamentally different from all the other sciences and governed by a special kind of logic. It has been asserted (and this is no overstatement) that whereas other sciences draw their conclusions from what we know, the science of probability derives its most important results from what we do not know. 'Our absolute lack of knowledge concerning the conditions under which a die falls,' says Czuber,¹ 'causes us to conclude that each side of the die has the probability 1/6.' If, however, our lack of knowledge were as complete as Czuber assumes it to be, how could we distinguish between the two pairs of dice shown in the preceding lecture? Yet, the probability of casting '6' with one of them is considerably different from 1/6, at least according to our definition of probability.

In fact, we will have nothing to do with assumptions as fantastic as that of a distinct kind of logic used in the theory of probability. Twice two are four; B and the contrary of B cannot both follow from one and the same true premise, any more in the theory of probability than elsewhere. And ex nihilo nihil is true in the theory of probability as well. Like all the other natural sciences, the theory of probability starts from observations, orders them, classifies them, derives from them certain basic concepts and laws and, finally, by means of the usual and universally applicable logic, draws conclusions which can be tested by comparison with experimental results. In other words, in our view the theory of probability is a normal science, distinguished by a special subject and not by a special method of reasoning.
THE PURPOSE OF THE THEORY OF PROBABILITY

From this sober scientific point of view, which assumes that the same fundamental methods and the same laws of reasoning are applicable in the theory of probability as in all other sciences, we can describe the purpose of the theory of probability as follows: Certain collectives exist which are in some way linked with each other, e.g., the throwing of one or the other of two dice separately and the throwing of the same two dice together form three collectives of this kind. The first two collectives determine the third one, i.e., the one where both dice are thrown together. This is true so long as the two dice, in falling together, do not influence each other in any way. If there is no such interaction, experience has shown that the two dice thrown together give again a collective such that its probabilities can be derived, in a simple way, from the probabilities in the first two collectives. This derivation and nothing else is here the task of probability calculus. In this problem the given quantities are the six probabilities of the six possible results of casting the first die and the six similar probabilities for the second die. A quantity which can be calculated is, for example, the probability of casting the sum '10' with the two dice.

This is very much like the geometrical problem of calculating the length of a side of a triangle from the known lengths of the two other sides and the angle formed by them. The geometer does not ask how the lengths of the two sides and the magnitude of the angle have been measured; the source from which these initial elements of the problem are taken lies outside the scope of the geometrical problem itself. This may be the business of the surveyor, who, in his turn, may have to use many geometrical considerations in his work. We shall find an exact analogy to these relations in the interdependence of statistics and probability. Geometry proper teaches us only how to determine certain unknown quantities from other quantities which are supposed to be known, quite independently of the actual values of these known quantities and of their derivation. The calculus of probability, correctly interpreted, provides us, for instance, with a formula for the calculation of the probability of casting the sum '10', or the 'double 6', with two dice, a formula which is of general validity, whichever pair of dice may be used, e.g., one of the two pairs discussed in the preceding lecture, or another pair formed from these four dice, or a completely new and different set. The six probabilities for the six sides of the first die, and the corresponding set of six probabilities for the second die, may have any conceivable values. The source from which these values are known is irrelevant, in the same way in which the source of knowledge of the geometrical data is irrelevant for the solution of the geometrical problem in which these data are used.

A great number of popular and more or less serious objections to the theory of probability disappear at once when we recognize that the exclusive purpose of this theory is to determine, from the given probabilities in a number of initial collectives, the probabilities in a new collective derived from the initial ones. A mathematician teased with the question, 'Can you calculate the probability that I shall not miss the next train?', must decline to answer it in the same way as he would decline to answer the question, 'Can you calculate the distance between these two mountain peaks?', namely, by saying that a distance can only be calculated if other appropriate distances and angles are known, and that a probability can only be determined from the knowledge of other probabilities on which it depends.
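The calculation of the sum '10' mentioned above can be sketched as follows, under the stated condition that the two dice do not influence each other. The sketch assumes the product rule for such non-interacting dice, which this passage has not yet derived; the biased distribution is invented purely for illustration and does not describe any of the dice discussed in the lectures.

```python
from fractions import Fraction
from itertools import product

def sum_distribution(die_a, die_b):
    """Distribution of the sum of two dice that do not influence each other.
    Each die is given as a mapping face -> probability (assumed product rule)."""
    dist = {}
    for (a, pa), (b, pb) in product(die_a.items(), die_b.items()):
        dist[a + b] = dist.get(a + b, 0) + pa * pb
    return dist

# Given data: the six probabilities of each die.
unbiased = {face: Fraction(1, 6) for face in range(1, 7)}
# Hypothetical biased die (values invented for this sketch; they sum to 1).
biased = {1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
          4: Fraction(2, 10), 5: Fraction(2, 10), 6: Fraction(3, 10)}

fair_pair = sum_distribution(unbiased, unbiased)
mixed_pair = sum_distribution(unbiased, biased)

print(fair_pair[10], fair_pair[12])    # sum '10' and 'double 6' for two unbiased dice
print(mixed_pair[10], mixed_pair[12])  # the same quantities for the mixed pair
```

The function is the same whichever pair of dice is supplied; only the given probabilities change, which is the point of the comparison with the geometer and the surveyor.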
Because certain elements of geometry have for a long time been included in the general course of education, every educated man is able to distinguish between the practical task of the land surveyor and the theoretical investigation of the geometer. The corresponding distinction between the theory of probability and statistics has yet to be recognized.
THE BEGINNING AND THE END OF EACH PROBLEM MUST BE PROBABILITIES

We all know, I think, that in each mathematical problem there are, on the one hand, a number of known quantities or data, and, on the other, certain quantities which are to be determined and which, after this determination, are called results. The conclusion at which we arrived in the last paragraph can be restated by saying: In a problem of probability calculus, the data as well as the results are probabilities. Emphasis was laid above on the first part of this statement, namely, the starting point of all probability calculations. I should like to add a few words concerning the second part.

The result of each calculation appertaining to the field of probability is always, as far as our theory goes, nothing else but a probability, or, using our general definition, the relative frequency of a certain event in a sufficiently long (theoretically, infinitely long) sequence of observations. The theory of probability can never lead to a definite statement concerning a single event. The only question that it can answer is: what is to be expected in the course of a very long sequence of observations? It is important to note that this statement remains valid also if the calculated probability has one of the two extreme values 1 or 0.

According to the classical theory of probability, and to some new versions of this theory, the probability value 1 means that the corresponding event will certainly take place. If we accept this, we are admitting, by implication, that the knowledge of a probability value can enable us, under certain circumstances, to predict with certainty the result of any one of an infinite number of experiments. If, however, we define probability as the limiting value of the relative frequency, the probability value 1 does not mean that the corresponding attribute must be found in every one of the elements forming the collective. This can be illustrated by the following example:

Imagine an infinite sequence of elements distinguished by the two different attributes A and B. Assume that the sequence has the following structure: First comes an A, then a B, then again an A, then a group of two consecutive B's, again one A, then a group of three B's, and so on, the single A's being separated by steadily growing groups of B's:

ABABBABBBABBBBABBBBB...

This is a regular sequence of symbols; it can be represented by a mathematical formula, and it is easily ascertained that, with the increasing number of elements, the relative frequency of the attribute A converges towards 0, whereas the relative frequency of B converges towards unity. The same thing can occur also in irregular sequences. Once an attribute is rare enough, it is possible that its relative frequency, although never attaining the value 0, converges to this value with increasing length of the sequence. In other words, its limiting value may be 0. We see, therefore, that the probability 0 means only a very rare (we may say, an infinitely rare) occurrence of an event, but not its absolute impossibility. In the same way the probability 1 means that the corresponding attribute occurs nearly always, but not that it must be necessarily found in each observation. In this way the indeterminate character of all statements of the probability theory is maintained in the limiting cases as well.

It remains now to give a more detailed consideration as to what is meant by the derivation of one collective from another, an operation to which we have repeatedly referred. It is only from a clear conception of this process that we can hope to recognize fully the nature of the fundamental task of probability calculus. We shall begin this investigation by introducing a new, simple expression which will permit us to make our statements in a clearer and simpler way. It is at first only an abbreviation; later, however, it will lead us to a slight extension of the concept of a collective.

DISTRIBUTION IN A COLLECTIVE

The elements or members of a collective are distinguished by certain attributes, which may be numbers, as in the case of the game of dice, colours, as in roulette (rouge et noir), or any other observable properties. The smallest number of different attributes in a collective is two; in this case we call it a simple alternative. In such a collective there are only two probabilities, and, obviously, the sum of these two must be 1. The game of 'heads or tails' with a coin is an example of such an alternative, with the two distinctive attributes being the two different faces of the coin. Under normal conditions, each of these attributes has the same probability, 1/2. In problems of life insurance we also deal with a simple alternative. The two possible events are, for example, the death of the insured between the first and the last day of his forty-first year, and his survival beyond this time. In this example, the probability of the first event is 0.011 and that of the second one 0.989. In other cases, such as the game of dice, more than two attributes are involved. A cast with one die can give six different results, corresponding to the six sides of the die. There are six distinct probabilities and their sum is again 1. If all the six results are equally probable, the single probabilities all have the value 1/6. We call a die of this kind an unbiased one. However, the die may be biased; the six probabilities will still be proper fractions with the sum 1, although not all equal to 1/6. The values of these six probabilities must be known if the corresponding collective is to be considered as given.

It is useful to have a short expression for denoting the whole of the probabilities attached to the different attributes in a collective. We shall use for this purpose the word distribution. If we think of the distribution of chances in a game of chance, the reasons for this choice of term will be easily understood. If, for instance, six players bet, each on one of the six different sides of a die, the chances are 'distributed' in such a way that the relative chance of each of the players is equal to the probability of the side which he has chosen. If the die is an unbiased one, all the chances are equal; they are then uniformly 'distributed'. With a biased die, the distribution of chances is nonuniform. In the case of a simple alternative the whole distribution consists of two numbers only, whose sum is 1. To illustrate the meaning of the word 'distribution', one can also think of how the different possible attributes are distributed in the infinite sequence of elements forming the collective. If, for instance, the numbers 1/5, 3/5, and 1/5 represent the distribution in a collective with three attributes A, B, and C, the probabilities of A and C being 1/5 each, and that of B being 3/5, then in a sufficiently long sequence of observations we shall find the attributes A, B, and C 'distributed' in such a way that the first and third of them occur in 1/5 of all observed cases and the second in the remaining 3/5.

PROBABILITY OF A HIT; CONTINUOUS DISTRIBUTION

The concept of distribution defined above leads to the consideration of certain cases which have been so far left aside. Imagine a man shooting repeatedly at a target. This is a repetitive event; there is nothing to prevent it from being, in principle, continued indefinitely. By assigning a number to each concentric part of the target, beginning with 1 for the bull's-eye, and ending with the space outside the last ring, we can characterize each shot by means of a number. So far there is nothing unfamiliar in the example; the number of different attributes of the collective which, together with the corresponding probabilities, makes up the distribution, is equal to the number of different concentric regions of the target. Naturally, in order to be able to speak of probabilities at all, we must assume that the usual conditions concerning the existence of limits of frequencies and of randomness are satisfied.

We may, however, treat the same example in a slightly different way. We may measure the distance of each hit from the centre of the target and consider this distance as the attribute of the shot in question, instead of the number assigned to the corresponding ring. 'Distance' is then also nothing but a number, the number of units of length which can be marked off on a straight line between two given points. As long as we measure this distance in whole centimetres only, the situation is not very different from that previously described, and each shot is characterized, as before, by an integer. If the radius of the target is 1 metre, the number of different possible attributes is 101, namely, the integers from 0 to 100; consequently there are 101 different probabilities, and the distribution consists of 101 proper fractions giving the sum 1. Everyone, however, feels that the measure of a distance in centimetre units is not an adequate expression of the notion of distance. There are more than just 101 different distances between 0 and 1 metre. Geometry teaches us that distance is a continuous variable, which may assume every possible value between 0 and 100, i.e., values belonging to the infinite set of fractional numbers as well as those belonging to the finite set of whole numbers in this interval. We arrive in this way at the idea of a collective with an infinite number of attributes. In such cases the classical books speak of geometrical probabilities, which are thus contrasted with arithmetical ones, where the number of attributes is finite. We do not propose to question here the appropriateness of these terms. However, in a case like the present one, in order to describe the distribution in the same way as before, one would need an infinite set of fractions to represent the corresponding probabilities, and to present such a set is obviously impossible. Fortunately, the way of solving difficulties of this kind was discovered long ago, and everybody who has some slight knowledge of analysis knows how to proceed in a case like this.
PROBABILITY DENSITY

To explain how it is possible to describe distributions in which an infinite continuum of attributes is involved, we may consider an analogous case in another field. Imagine that we have to distribute a certain mass, say 1 kg, along a straight line 1 metre long. As long as the number of loaded points remains finite, the distribution consists of a finite number of fractions, fractions of a kilogram assigned to any such point. If, however, the weight has to be distributed continuously along the whole length of the straight line, e.g., in the form of a rod of nonuniform thickness, 1 metre long and of 1 kg weight, we can no longer speak of single loaded points. Nevertheless, the meaning of the expression distribution of mass is quite clear in this case as well. For example, we say that more mass is concentrated in a certain element of length in the thicker part of the rod than in an equal element in its thinner part, or that the mass density (mass per unit length) is greater in the thicker and smaller in the thinner part. Again, if the rod is uniformly thick, we speak of a uniform distribution of mass. Generally speaking, the distribution is fully described by the indication of the mass density at each point of the line.

It is easy to extend this concept to the case of hits on a target. To each segment of distance between 0 and 100 cm there corresponds a certain probability of finding a hit in it, and the distribution of these hits is described by their density (number of hits per unit length) in each part of the target. We take the liberty of introducing a new expression, 'probability density',² and state: If a collective contains only a finite number of attributes, with no continuous transition between them, then its distribution consists of a finite number of probabilities corresponding to these attributes. If, however, the attributes are continuously variable quantities, e.g., distances from a fixed point, the distribution is described by a function representing the probability density per unit of length over the range of the continuous variable.

Let us take again the case of shots at a target. Assuming that the shots were fired blindly, we may expect the number of shots hitting a ring near the outer circumference of the target to be greater than that of the shots hitting a ring nearer to the centre, because the surface of the former is larger. We may thus expect the probability density to increase proportionally to the radius (or to some power of the radius).

We shall have to deal later with certain problems arising from the existence of collectives with continuously varying attributes; we shall further discuss the generalization of this concept so as to include attributes that are continuous in more than one dimension, i.e., densities on surfaces or volumes rather than on lines.
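The remark about blind shots can be checked with a short sketch. It rests on an assumption that the text does not spell out: 'blindly' is taken to mean that hits are uniformly distributed over the face of a target of radius 100 cm. Under that assumption the probability density of the distance r from the centre is 2r/R², so the probability of a ring between radii a and b is (b² - a²)/R². All names in the code are illustrative, and the shots are pseudo-random.

```python
import math
import random

R = 100.0  # radius of the target in centimetres

def density(r):
    """Assumed probability density of the distance r for blind shots:
    proportional to the radius, normalised over 0..R."""
    return 2 * r / R ** 2

def ring_probability(a, b):
    """Integral of the density from a to b: probability of a hit in that ring."""
    return (b * b - a * a) / R ** 2

rng = random.Random(7)  # arbitrary seed, for repeatability only

def blind_shot():
    """Distance from the centre of a hit placed uniformly on the target face."""
    while True:
        x, y = rng.uniform(-R, R), rng.uniform(-R, R)
        d = math.hypot(x, y)
        if d <= R:          # keep only points that fall on the target
            return d

distances = [blind_shot() for _ in range(200_000)]

def observed(a, b):
    """Relative frequency of simulated hits with a <= distance < b."""
    return sum(a <= d < b for d in distances) / len(distances)

for a, b in ((0, 10), (40, 50), (90, 100)):
    print(a, b, ring_probability(a, b), observed(a, b))
```

Rings of equal width become more probable towards the rim, in line with the statement that the density grows with the radius.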
At this stage of our discussion we merely mention these questions in order to illustrate the notion of distribution, and to indicate its wide range. By using this concept, we can now give a more precise formulation of the purpose of the theory of probability. From one or more well-defined collectives, a new collective is derived (by methods which will be described in the following paragraphs). The purpose of the theory of probability is to calculate the distribution in the new collective from the known distribution (or distributions) in the initial ones.

THE FOUR FUNDAMENTAL OPERATIONS

The above statement contains the concept of a 'collective derived from others' and this requires closer consideration. How can a new collective be derived from a given one? Unless the manner in which such a derivation is made is clearly explained, all that has been said so far is in danger of becoming devoid of meaning. There are four, and only four, ways of deriving a collective and all problems treated by the theory of probability can be reduced to a combination of these four fundamental methods. Most practical problems involve the application, often a repeated one, of several fundamental operations. We shall now consider each of them in turn. In each of the four cases, our basic task is to compute the new distribution in terms of the given ones, and I do not expect that this will give rise to any great difficulty.

The first two of the four fundamental operations are of surprising simplicity. Some of you may even think that I am trying to avoid the real mathematical difficulties of the theory of probability, difficulties which you think are bound to exist on account of the large number of formulae usually found in textbooks. This is far from my purpose. By saying that all the operations by which different collectives are brought into mutual relation in the theory of probability can be reduced to four relatively simple and easily explained types, I do not suggest that there are no difficulties in the solution of problems which we may encounter. Such difficulties arise from the complicated combination of a great number of the four fundamental operations.

Remember that algebra, with all its deep and intricate problems, is nothing but a development of the four fundamental operations of arithmetic. Everyone who understands the meaning of addition, subtraction, multiplication, and division holds the key to all algebraic problems. But the correct use of this key requires long training and great mental effort. The same conditions are found in the calculus of probability. I do not plan to teach you in a few lectures how to solve problems which have occupied the minds of a Bernoulli or a Laplace, as well as of many great mathematicians of our time. On the other hand, nobody would willingly give up the knowledge of the four fundamental operations of arithmetic, even if he were free of all mathematical ambition and had no need to perform any mathematical work. This knowledge is valuable, not only from the point of view of practical utility, but also for its educational value. By explaining briefly the four fundamental operations of the theory of probability, I hope to achieve the same two objects: to give you tools for solving occasionally a simple problem of probability, and, what is more important, to give you some understanding of what the theory of probability means; this is a matter of interest to every educated person.
FIRST FUNDAMENTAL OPERATION: SELECTION

The first of the four fundamental operations by which a new collective can be derived from one (or several) initial ones is called selection. Imagine, for instance, a collective consisting of all casts made with a certain die, or of all games on a certain roulette table. New collectives can be formed, for instance, by selecting the first, fourth, seventh . . . casts of the die, or the second, fourth, eighth, sixteenth . . . games of roulette; generally speaking, by the selection of elements occupying certain places in the total sequence of the original collective. The attributes in the new collective remain the same as in the original one, namely, the number of points on the die, or the colours 'red' or 'black' in roulette. We are interested in the probabilities in the new collective, e.g., the probabilities of 'red' and 'black' in a selection of roulette games consisting only of games the order-numbers of which are, say, powers of 2 in the original sequence.

According to an earlier statement concerning the properties of collectives, especially their randomness, the answer to this question is obvious: the probabilities remain unchanged by the transition from the initial collective to the new one formed by selection. The six probabilities of the numbers 1 to 6 are the same in the selected sequence of games of dice as they were in the original one. This, and nothing more, is the meaning of the condition of randomness imposed on all collectives. The whole operation is so simple that it hardly requires any further explanation. We therefore proceed immediately to the following exact formulation:

From a given collective, many new ones can be formed by selections of various kinds. The selected collective is a partial sequence derived from the complete sequence by the operation of place selection. The attributes in the selected collective are the same as in the original one. The distribution within the new collective is the same as in the original one.
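The selection rule can be illustrated numerically. The following is only a sketch under stated assumptions: a finite pseudo-random sequence stands in for a collective (which is, strictly, an infinite sequence), and the six probabilities in `PROBS`, the helper names, and the sequence length are all chosen for illustration, not taken from the text. The place selection keeps the first, fourth, seventh, . . . casts without looking at their results.

```python
import random
from collections import Counter

# Hypothetical biased die: these six probabilities are an assumption for illustration.
PROBS = {1: 0.10, 2: 0.20, 3: 0.15, 4: 0.25, 5: 0.10, 6: 0.20}


def cast_sequence(n, seed=0):
    """Simulate n casts of the die; a finite stand-in for a collective."""
    rng = random.Random(seed)
    faces = list(PROBS)
    weights = [PROBS[face] for face in faces]
    return rng.choices(faces, weights=weights, k=n)


def relative_frequencies(seq):
    """Relative frequency of each attribute in a finite sequence."""
    counts = Counter(seq)
    return {face: counts[face] / len(seq) for face in PROBS}


def place_selection(seq, start=1, step=3):
    """Keep the casts at places start, start + step, ... (1-based), ignoring their results."""
    return seq[start - 1::step]


casts = cast_sequence(300_000)
selected = place_selection(casts)      # 1st, 4th, 7th, ... casts
full = relative_frequencies(casts)
part = relative_frequencies(selected)
for face in PROBS:
    # Frequencies in the full and in the selected sequence, side by side.
    print(face, round(full[face], 3), round(part[face], 3))
```

In a run of this kind the two columns should agree to within ordinary sampling fluctuation, which is what the statement 'the distribution within the new collective is the same as in the original one' leads us to expect.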

SECOND FUNDAMENTAL OPERATION: MIXING

The second method of formation of a new collective from a given one is scarcely more complicated: this is the operation called mixing.

First, let us consider an example: take the same game of dice as in the previous example, the elements of the collective being the consecutive casts, and the attributes, the six different possible results. The following question can be asked: What is the probability of casting an even number? The answer is well known. Of the six numbers 1 to 6, three are even ones, 2, 4, and 6. The probability of an even number is the sum of the probabilities of these three results. I hardly expect anyone to doubt the correctness of this solution; it is also easy to deduce it by means of the definition of probability as the limiting value of relative frequency.

The general principle underlying the operation is easily recognizable. We have constructed a new collective consisting of the same elements as before, but with new attributes. Instead of the six former attributes 1 to 6, we have now two new ones, 'even' and 'odd'. The essential point is that several original attributes are covered by a single new one. It would be different if an original attribute were replaced by several new ones, for this would make the calculation of the new probabilities from the initial ones impossible. The term 'mixing' is chosen to connote that several original attributes are now mixed together to form a single new attribute. We can also say that mixing is performed on several elements differing originally by their attributes, but now forming a unit in the new collective.
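The mixing operation can be sketched in a few lines. The function name `mix`, the representation of a distribution as a dictionary, and the six probabilities of the die are assumptions made for this illustration only; the rule itself (summing the probabilities of all original attributes covered by one new attribute) is the one described above.

```python
from collections import defaultdict


def mix(distribution, new_attribute):
    """Mixing: add the probabilities of all original attributes
    that are covered by the same new attribute."""
    mixed = defaultdict(float)
    for attribute, probability in distribution.items():
        mixed[new_attribute(attribute)] += probability
    return dict(mixed)


# Assumed (illustrative) probabilities of a biased die.
die = {1: 0.10, 2: 0.20, 3: 0.15, 4: 0.25, 5: 0.10, 6: 0.20}

# Several original attributes are covered by a single new one: 'even' or 'odd'.
parity = mix(die, lambda face: "even" if face % 2 == 0 else "odd")
print(parity)
```

Note that `new_attribute` maps each old attribute to exactly one new attribute; the reverse situation, one old attribute split into several new ones, cannot be computed from the initial distribution, as remarked in the text.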
INEXACT STATEMENT OF THE ADDITION RULE

Perhaps some of you will remember from school days references to the probability of 'either-or' and the following proposition concerning the calculation of unknown probabilities from known ones: The probability of casting either 2 or 4 or 6 is equal to the sum of the probabilities of each of these results separately. This statement is, however, inexact; it remains incomplete even if we say that only probabilities of mutually exclusive events can be added in this way. The probability of dying in the interval between one's fortieth and one's forty-first birthday is, say, 0.011, and that of marrying between the forty-first and the forty-second birthdays 0.009. The two events are mutually exclusive; nevertheless we cannot say that a man entering his forty-first year has the chance 0.011 + 0.009 = 0.020 of either dying in the course of this year or marrying in the course of the following year.

The clarification and the ensuing correct formulation of the mixing operation can only be achieved by having recourse to the concept of the collective. The difference between the correct formulation of the addition rule and the incorrect one follows from the principle that only such probabilities can be added as are attached to different attributes in one and the same collective. The operation consists in mixing only attributes of this kind. In the above example, however, the two probabilities belonged to two different collectives. The first collective was that of all men aged forty, and the two attributes were the occurrence and nonoccurrence of death in the course of the forty-first year of age. The second collective was formed of men who have attained their forty-first year, and who were divided into groups characterized by the occurrence or nonoccurrence of the event of marriage in the course of the following year. Both collectives are simple alternatives. The only possible mixing operation in each of them is the addition of the two probabilities, of life and death, or of marrying and remaining single, giving in each case the sum 1. It is not permissible to mix together attributes belonging to two different collectives.

Another example which shows clearly the insufficiency of the usual 'either-or' proposition follows: Consider a good tennis player. He may have 80% probability of winning in a certain tournament in London. His chance of winning another tournament in New York, beginning on the same day, may be 70%. The possibility of playing in both tournaments is ruled out, hence, the events are mutually exclusive, but it is obviously nonsense to say that the probability of his winning either in London or in New York is 0.80 + 0.70 = 1.50. In this case again, the explanation of the paradox lies in the fact that the two probabilities refer to two different collectives, whereas the addition of probabilities is only allowed within a single collective.
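The restriction to a single collective can be made concrete in a small sketch. The dictionaries, the attribute labels, and the helper `mix` are hypothetical; the complementary probabilities 0.20 and 0.30 are filled in on the assumption that each tournament is treated as a simple alternative, as in the mortality example. The helper refuses to add probabilities of attributes that do not belong to the collective it is given.

```python
def mix(collective, attributes):
    """Add the probabilities of several attributes of ONE collective."""
    missing = [a for a in attributes if a not in collective]
    if missing:
        raise KeyError(f"attributes not in this collective: {missing}")
    return sum(collective[a] for a in attributes)


# Two separate collectives, each a simple alternative (complements are assumed).
london = {"wins in London": 0.80, "does not win in London": 0.20}
new_york = {"wins in New York": 0.70, "does not win in New York": 0.30}

# The only mixing possible inside one alternative gives the sum 1.
print(mix(london, list(london)))

# Adding across collectives yields a number above 1, which cannot be a probability.
naive = london["wins in London"] + new_york["wins in New York"]
print(naive)

# The helper rejects the attempt to mix attributes of two different collectives.
rejected = False
try:
    mix(london, ["wins in London", "wins in New York"])
except KeyError:
    rejected = True
print(rejected)
```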

UNIFORM DISTRIBUTION

A very special case of mixing which occurs often is sometimes given the first place in the presentation of the theory of probability; it is even assumed that it forms the basis of every calculation of probabilities. We have previously asked: What is the probability of casting an even number of points with a die? The general solution of this problem does not depend on the special values of the probabilities involved, i.e., those of the results 2, 4, 6. The die may be an unbiased one, with all six probabilities equal to 1/6; the sum is in this case 1/6 + 1/6 + 1/6 = 1/2. The die may, however, be one of the biased ones, such as we have already used several times, and the six probabilities may be different from 1/6. The rule, according to which the probability of an even number of points is equal to the sum of the probabilities of the three possible even numbers, remains valid in either case. The special case which we are now going to consider is that first mentioned, i.e., that of the unbiased die, or, as we are going to call it, the case of the 'uniform' distribution of probabilities.

In this case a correct result may be obtained in a way slightly different from the one we have used before. We begin with the fact that six different results of a cast are possible and each of them is equally likely. We now use a slightly modified method of reasoning: we point out that, out of the six possible results, three are 'favourable' to our purpose (which is to cast an even number) and three are 'unfavourable'. The probability of an even number, that is 3/6 = 1/2, is equal to the ratio of the number of favourable results to the total number of possible results. This is obviously a special case of a general rule, applicable to cases in which all attributes in the initial collective have equal probabilities. We may assume, for instance, that the number of possible attributes is n, and that the probability of the occurrence of each of them is 1/n. Assuming further that m among the n attributes are mixed together to form a new one, we find, by means of the addition rule, that the probability of the new attribute (in the new collective) is a sum of m terms, each equal to 1/n. In other words the probability is m/n, or equal to the ratio of the number of favourable attributes to the total number of different original attributes. Later, we shall show how this rule has been misused to serve as a basis for an apparent definition of probability.

For the time being we shall be satisfied with having clearly stated that the determination of probabilities by counting the number of equally probable, favourable and unfavourable, cases is merely a very special case of the derivation by mixing of a new collective from one initially given. I have already made use of this special form of the mixing rule in my first lecture, without explicit mention. We spoke there of two collectives, whose elements were the consecutive draws in a lottery. In the first case, the attributes considered were all the different numbers of the lottery tickets; in the second case, the numbers ending with five 0's were bracketed together. Ten numbers ending with five 0's exist between 0 and one million. By adding their probabilities, with the assumption of a uniform distribution of probabilities in the initial collective, we found the probability of drawing a number ending with five 0's to be equal to 10 in a million, or 0.00001.
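The ratio m/n of the uniform case lends itself to a short computation with exact fractions. This is only a sketch: the function name `uniform_mix` is hypothetical, and the numbering of the lottery tickets from 1 to 1,000,000 is an assumption (the text says only that ten numbers ending with five 0's exist between 0 and one million).

```python
from fractions import Fraction


def uniform_mix(attributes, is_favourable):
    """Probability m/n of the mixed attribute when all n original
    attributes are equally probable (each has probability 1/n)."""
    attributes = list(attributes)
    n = len(attributes)
    m = sum(1 for a in attributes if is_favourable(a))
    return Fraction(m, n)


# Unbiased die: three favourable results out of six.
p_even = uniform_mix(range(1, 7), lambda face: face % 2 == 0)

# Lottery, assuming tickets numbered 1 to 1,000,000:
# the favourable numbers are those ending with five 0's.
p_lottery = uniform_mix(range(1, 1_000_001), lambda ticket: ticket % 100_000 == 0)

print(p_even, p_lottery, float(p_lottery))
```

The same function would give a wrong answer for a biased die, since counting cases is legitimate only when the initial distribution is uniform.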

SUMMARY OF THE MIXING RULE

I will now briefly formulate the mixing rule, as derived from the concept of a collective.

Starting with an initial collective possessing more than two attributes, many different new collectives can be derived by 'mixing'; the elements of the new collective are the same as those of the old one, their attributes are 'mixtures' of those of the initial collective, e.g., all odd numbers or all even numbers, rather than the individual numbers 1, 2, 3, . . . The distribution in the new collective is obtained from the given distribution in the initial collective by summing the probabilities of all those original attributes which are mixed together to form a single attribute in the new collective.

The practical application of this rule has already been illustrated by simple examples. I would mention in passing that this rule can be extended to include collectives, of the kind previously discussed, that have a continuous range of attributes. Higher mathematics teaches us that, in a case of this kind, addition of probabilities is replaced by an operation called integration, which is analogous to addition but less easily explained. Remember, for instance, the example of shooting at a target. Let us assume that the probability density is known for all distances from the centre of the target. The probability of a hit somewhere between 0.5 m and 1 m from the centre, i.e., in the outer half of the target, can be calculated by a mixing operation, involving the integration of the density function between the limits 0.5 m to 1.0 m.[3] These indications are sufficient for those who are familiar with the fundamental concepts of analysis. Others may be sure that, although these generalizations are undoubtedly necessary for the solution of many problems, they are irrelevant from the point of view of those general principles which are our only concern in these lectures.
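For readers who wish to see the continuous case worked out, here is a minimal numerical sketch. The density used, f(r) = 2r on a target of radius 1 m, is an assumption: it corresponds to shots fired blindly, with a density proportional to the radius, and is not a figure given in the text. The integration is approximated by the midpoint rule, and the function names are hypothetical.

```python
def density(r):
    """Assumed probability density (per metre) of the distance r
    from the centre, for 0 <= r <= 1; proportional to the radius."""
    return 2.0 * r


def probability_between(a, b, f=density, steps=10_000):
    """Mixing over a continuous range: midpoint-rule approximation
    of the integral of the density f between the limits a and b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


p_outer = probability_between(0.5, 1.0)   # hit in the outer half of the target
p_total = probability_between(0.0, 1.0)   # should be close to 1
print(round(p_outer, 6), round(p_total, 6))
```

Under this assumed density the outer half of the target receives about three quarters of the hits, in agreement with the exact integral 1 - 0.25 = 0.75.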
THIRD FUNDAMENTAL OPERATION: PARTITION


After having considered the first two operations by which a new collective can be derived, those of selection and of mixing, we now turn to the third one, which I call partition. The choice of this term will soon be clear to you; the word suggests a certain analogy to the arithmetical term 'division', the operation in question being in fact a 'division of probabilities'. To help you understand this third operation, I shall start with the same collective which served for the explanation of the first two operations, namely that formed by a series of throws of a die from a dice-box. The six attributes are again the numbers of points appearing on the top side of the die. The six corresponding probabilities have the sum 1, without being necessarily equal to 1/6 each. The new problem which we are going to discuss now and to solve by means of the operation which we call partition is the following: What is the probability that a result which we already know to be an even number will be equal to 2?

This question may appear somewhat artificial, but it can easily be given a form which is often met in real life. Imagine that you are standing at a bus stop where six different bus lines pass by. Three of them are served by double-decked and three by single-decked buses. The first ones may bear the numbers 2, 4, 6; the second ones, the numbers 1, 3, 5. When a bus approaches the stop, we recognize from afar to which of the two groups it belongs. Assuming that it has a double deck, what is the probability of its belonging to line No. 2? To solve this problem, we must of course know the six original probabilities (or, practically speaking, the relative frequencies) of the six services. Assuming that they are all equally frequent, and that the probabilities are therefore all equal to 1/6 (thus corresponding to the case of an unbiased die), the answer is easy: the probability of a double-decked bus being No. 2 is 1/3. One of the arguments by which we can arrive at this result is as follows: There are three different, equally probable possibilities; only one of them is a favourable one; its probability is, therefore, according to a rule quoted previously, equal to 1/3.

This method of calculation is, however, not always applicable; it cannot be used if the six bus lines pass the stop with different frequencies, or if the six sides of the die have different probabilities. We arrive at a general solution and at a general statement of the problem by inquiring into the nature of the new derived collective. We are by now sufficiently accustomed to the idea that the expression 'probability of an event' has no exact meaning unless the collective in which this event is to be considered has been precisely defined.

For the sake of simplicity, let us return to the example of the die. The new derived collective may be described as follows. It is formed of elements of the initial collective but not of all its elements. In fact, it contains only those casts of the die which are distinguished by having the common attribute 'even number of dots'. The attributes within the new collective are the same as in the initial collective, namely, 'numbers of dots on the upper side of the die', but, whereas in the initial collective there were six different attributes 1, 2, . . ., 6, in the derived collective there are only three, 2, 4, and 6. We say that the new collective resulted from a partition into two categories of the elements of the original collective. One of them, the elements of which are distinguished by the common attribute 'even number', forms the derived collective.

It is important to realize that this partition is something quite different from the place selection which we have discussed before. The latter consists in selecting certain elements out of the initial collective, according to a rule, ignoring the attributes while specifying the order numbers of the elements to be selected for the new collective. We have seen that the probabilities in a collective obtained in this way are the same as in the original one. On the other hand, when dealing with partition, the decision whether a given element is to be selected to take part in the derived collective is specifically based on what its attribute is. As a result, the probabilities within the derived collective are essentially different from those within the original one and the manner of their change is the subject of the following section.
PROBABILITIES AFTER PARTITION

Let us assume that the probabilities of the six possible results (1 to 6 points) in the original collective are 0.10, 0.20, 0.15, 0.25, 0.10, and 0.20 respectively, their sum being 1. It is unimportant whether we think of the game of dice, or the case of the buses. The way in which the six probabilities have been derived is equally irrelevant. We now add together the probabilities of all even numbers, i.e., the fractions 0.20, 0.25, and 0.20; this gives the sum 0.65 as the probability for the occurrence of any one of the even numbers (second fundamental problem, mixing, solved by the addition rule). According to our concept of probability, the frequency of 'even' results in a sufficiently long sequence of observations is equal to 65%. About 6500 elements among the first 10,000 observed have even numbers as their attributes. About 2000 of them have the attribute 2, since 0.20 is the frequency of this attribute.

We are now going to form a new collective by excluding from the initial one all elements whose attributes are odd numbers. Among the first 6500 elements of the new collective, we find 2000 elements with the attribute 2; the relative frequency of this attribute is therefore 2000/6500 = 0.308. Since the calculation which we have just carried out is, strictly speaking, only valid for an infinitely long sequence of observations, the result which we have obtained, the fraction 0.308, represents already the limiting value of the relative frequency of the attribute 2 in the new collective; in other words, 0.308 is the probability of this attribute.

The general rule for the solution of problems of this kind is now easily deduced from this special case. The first step is to form the sum of the given probabilities of all those attributes which are to be retained in the partition, i.e., the 2's, 4's, and 6's in our example. The next step is to divide by this sum the probability of the attribute about which we are inquiring (2 in the chosen example); 0.20/0.65 = 0.308. The procedure is in fact that of a division of probabilities.
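The two steps of the rule can be written out as a short sketch. The six probabilities are those of the numerical example above; the function name `partition` and the dictionary representation are assumptions made for illustration.

```python
def partition(distribution, retained):
    """Partition: divide each retained initial probability by the
    sum of the initial probabilities of all retained attributes."""
    total = sum(distribution[a] for a in retained)       # first step: the sum (a mixing)
    return {a: distribution[a] / total for a in retained}  # second step: the division


# Initial distribution of the numerical example (1 to 6 points).
initial = {1: 0.10, 2: 0.20, 3: 0.15, 4: 0.25, 5: 0.10, 6: 0.20}

# Retain only the even numbers.
final = partition(initial, [2, 4, 6])
print({face: round(p, 3) for face, p in final.items()})
```

The probabilities in the derived collective again have the sum 1, as they must in any distribution.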
INITIAL AND FINAL PROBABILITY OF AN ATTRIBUTE

It is useful to introduce distinct names for the two probabilities of the same attribute, the given probability in the initial collective and the calculated one in the new collective formed by partition. The current expressions for these two probabilities are not very satisfactory, although I cannot deny that they are impressive enough. The usual way is to call the probability in the initial collective the a priori, and that in the derived collective the a posteriori probability. The fact that these expressions suggest a connexion with a well-known philosophical terminology is their first deficiency in my eyes. Another one is that these same expressions, a priori and a posteriori, are used in the classical theory of probability in a different sense as well, namely, to distinguish between probabilities derived from empirical data and those assumed on the basis of some hypothesis; such a distinction is not pertinent in our theory. I prefer, therefore, to give to the two probabilities in question less pretentious names, which have less far-reaching and general associations. I will speak of initial probability and final probability, meaning by the first term the probability in the original collective, and by the second one, the probability (of the same attribute) in the collective derived by partition. In our numerical example the attribute 2 (two points on the die, or the bus line 2) has the initial probability 0.20, and the final probability 0.20/0.65 = 0.308. This means simply that this attribute has the probability 0.20 of being found among all the elements of the sequence, and the probability 0.308 of being found among those elements which resulted in an even number.
THE SO-CALLED PROBABILITY OF CAUSES

Another expression which I cannot leave unmentioned, and which I find equally misleading and confusing, is often used in connexion with the problem of partition. In the discussion of cases similar to that treated in the preceding paragraphs, i.e., the probability of the number 2 among the even numbers 2, 4, and 6, it is often argued that the appearance of an even number may have three different 'causes', or can be explained by three different 'hypotheses'. The possible 'causes' or 'hypotheses' are nothing else but the appearance of one of the three numbers 2, 4, 6. The above-calculated final probability 0.308 is correspondingly described as the probability of the appearance of an even number being 'caused' by the number 2. In this way an apparently quite new and special chapter of probability calculus is opened, dealing with the 'probability of causes' or the 'probability of hypotheses', instead of the usual 'probability of events'. The partition problem is usually presented in this theory in the following form:

Three urns, filled with black and white balls, are placed on the table. We consider an initial collective, each element of which is composed of two separate observations. The first observation consists in selecting at random one of the three urns and stating its number, 1, 2, or 3. The second observation consists in drawing a ball out of the urn previously selected and in noting its colour. Thus the attribute of each single element of the initial collective consists of the colour of the ball drawn and the number of the urn from which this ball was drawn. Clearly, there are six different attributes within the original collective, namely, white and No. 1, white and No. 2, white and No. 3, black and No. 1, etc. The corresponding six probabilities are given. Now assume that in a particular case the ball drawn was white, while the number of the urn from which it was drawn is unknown. In that case we may wish to calculate the probability that the ball was drawn from urn No. 1 or, in other words, that the appearance of a white ball was due to the cause that the urn selected was that bearing No. 1. The solution is exactly along the same lines as before: The initial probability of the attribute 'white and No. 1' has to be divided by the sum of the probabilities of the three attributes, white and No. 1, white and No. 2, and white and No. 3.

The usual metaphysical formulation of this problem can only be explained historically. The partition rule was first derived by Thomas Bayes, in the middle of the eighteenth century, and his original formulation has since been reprinted in most textbooks practically without alteration.

FORMULATION OF THE RULE OF PARTITION

At this stage I should like to state definitely that in our theory no difference whatsoever exists between the 'probability of causes' (or 'probability of hypotheses') and the more usual 'probability of events'. Our only subject is collectives, i.e., sequences of observational data with various attributes obeying the two laws of the existence of limiting values of relative frequencies and randomness. In every collective possessing more than two distinct attributes, a partition can be carried out. After this, each of the attributes appearing both in the initial and in the derived collectives has two probabilities: the initial one, i.e., that in the original collective, and the final one, i.e., that in the collective derived by partition. There is no place in this problem for any metaphysical formulation.

Before considering the fourth and last of the fundamental operations, I want to summarize briefly the definition of the partition operation, and the solution of the partition problem:

If a collective involves more than two attributes, then, by means of a 'partition', a new collective may be derived from it in the following way. Consider the set of attributes of the initial collective and choose a certain group of them; pick out from the initial collective all elements whose attributes belong to the chosen group. The selected elements, with their attributes unchanged, will form a new collective. The distribution within this new collective is obtained by dividing each initial probability of a selected attribute by the sum of the probabilities of all selected attributes.

FOURTH FUNDAMENTAL OPERATION: COMBINATION

The three fundamental operations described so far, selection, mixing, and partition, have one thing in common. In each of them a new collective was derived from one original collective by applying a certain procedure to its elements and attributes. The fourth and last operation, which we are now going to consider, is characterized by the fact that a new collective is formed from two original ones. During the discussion of this operation, we shall at the same time gain a first insight into the different forms of relations between two or more collectives. I call this fourth operation combination. The example which we are going to use to explain this operation will be, as far as possible, similar to our previous examples of the game of dice. The two initial collectives are now two series of casts; the attributes are in both cases the numbers 1 to 6. The corresponding two sets of six probabilities each, which are not necessarily identical sets, are assumed to be known. The final collective consists of a sequence of simultaneous casts of both dice, and the attributes are the possible combinations of the numbers on both dice. For example, we ask for the probability of the result '3 on the first die and 5 on the second die'. We consider the two dice in this problem to be distinguished by being marked with the figures I and II or having different colours, or other distinctive marks.

Those among you who have learned the elements of the theory of probability at school or have given thought to this problem, know how it can be solved in a primitive way. You will say that it is the question of a probability of 'this as well as that', and the rule is the multiplication of probabilities. If, say, the probability of casting 3 with the first die is 1/7, and that of casting 5 with the second die is 1/6, the probability of casting 3 and 5 with both dice is 1/7 × 1/6 = 1/42. This becomes obvious if one thinks that only 1/7 of all casts with the first die are to be taken into consideration, and that in 1/6 of these selected casts the second die is expected to show the number 5. Clearly, however, this rule requires exact statement and foundation before its general validity can be accepted, a clarification of the same type as was previously given for the addition rule of the probability of 'either-or'. The probability of casting with two dice the sum 8 as well as the difference 2 is, for instance, surely not equal to the product of the two corresponding single probabilities. I shall now consider these finer points, and, in order to be able to present this investigation in a more concise form, I shall use a few simple algebraic symbols. I do not think that this will make these arguments too difficult to follow.

A NEW METHOD OF FORMING PARTIAL SEQUENCES: CORRELATED SAMPLING

We consider a game in which two dice are cast. The method of casting is irrelevant; the dice may be cast from two boxes or from one common box, simultaneously or consecutively. The only essential point is the possibility of establishing a one-to-one correspondence between the casts of die I and those of die II. We consider first only the results obtained with die I. Among the first n of them there will be a certain number, say n₃, of casts in which 3 is the number of points that appeared on the face of the die. The ratio n₃/n is the relative frequency of the result 3 in casting die I; the limiting value of this fraction n₃/n is the probability of casting 3 with this die.

Now we go a step further: Each time that die I has produced the result 3, we note the result of the corresponding cast of die II. This second die will likewise produce the results 1 to 6, in irregular alternation. A certain number of them, say n′₅, will show the result 5. The relative frequency of the result 5 for the second die (in these selected casts) is thus n′₅/n₃. As we consider now only a partial sequence, derived from the complete sequence of all casts by means of the condition that the corresponding cast of die I should have the result 3, the relative frequency n′₅/n₃ is not necessarily equal to the frequency of 5 in the collective composed of all casts with the second die.

This kind of selection of a partial sequence of the elements of a collective is new to us. The process is different both from place selection, where the elements are selected by means of a pre-established arithmetical rule, independent of their attributes, and from partition, in which the elements selected are those possessing a certain specified attribute. We need therefore a special term to denote this new operation, and we use for this purpose the expression correlated sampling, or sampling for short. We will say, for instance, that the second collective was sampled by means of the first one, or, more exactly, by the appearance of the attribute 3 in the first collective. In this connexion it will be convenient to use the expressions the sampled collective and the sampling collective.

The procedure may then be described as follows: We start by establishing a one-to-one correspondence between the elements of the collective to be sampled and those of the sampling collective. This is done in our example by casting the two dice each time simultaneously. Next, we choose an attribute of the sampling collective (here the 3 of die I) and select those elements of the sampled collective which correspond to elements of the sampling collective bearing the chosen attribute. In the above example, the first die may be used in 6 different ways to sample the casts with the second die, namely, by means of the attribute 1, attribute 2, . . ., etc.
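The sampling of the casts of die II by means of the attribute 3 of die I can be tried out on simulated casts. This is a sketch under assumptions that are not part of the text: both dice are taken to be fair, their results are generated separately by a pseudo-random generator, and a finite sequence stands in for the collective. The names `cast_pairs` and `sample` are hypothetical.

```python
import random


def cast_pairs(n, seed=1):
    """Simulate n simultaneous casts of die I and die II.
    Fair dice generated separately: an assumption of this sketch."""
    rng = random.Random(seed)
    return [(rng.randint(1, 6), rng.randint(1, 6)) for _ in range(n)]


def sample(pairs, sampling_attribute):
    """Results of die II in exactly those casts in which
    die I showed the chosen sampling attribute."""
    return [second for first, second in pairs if first == sampling_attribute]


pairs = cast_pairs(600_000)
sampled = sample(pairs, 3)

n = len(pairs)
n3 = len(sampled)              # casts in which die I showed 3
n5_prime = sampled.count(5)    # among these, casts in which die II showed 5

# Relative frequency of 3 for die I, and of 5 for die II within the sampled casts.
print(n3 / n, n5_prime / n3)
```

With dice simulated in this way the second ratio comes out close to the frequency of 5 among all casts of die II; for dice that influence each other it need not.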
MUTUALLY INDEPENDENT COLLECTIVES

The ratio $n'_5/n_3$, which we considered in the preceding paragraph, is the relative frequency of the result 5 in the collective II which is sampled by means of the attribute 3 of the sampling collective I. So far, we do not know the numerical value of this ratio. We are going to assume that by indefinitely increasing the length of the sequence of observations, the ratio $n'_5/n_3$ tends to a limiting value. But what is this value? It is possible that this value is equal to the limiting value of the relative frequency of the attribute 5 in the complete sequence of casts carried out with die II. 'Possible', I say, but not certain, since this does not follow from anything we have learned so far. Let us assume, for the moment, that it is so. This assumption is simple to understand; it implies that the effect of sampling a partial sequence out of the elements of collective II is similar to that of a place selection, causing no change in probabilities at all. Is there, in fact, any ground for suspecting that the chance of casting 5 with die II may be influenced by the fact that we reject all casts in which die I did not show 3 points? A sceptic may answer: 'Perhaps! It all depends on whether or not the casts of die II are independent of those of die I.' But what does the word 'independent' mean in this connexion?

We can easily indicate conditions under which two collectives are certainly not independent. Two dice tied together by a short thread cannot be expected to produce two sequences of independent results. However, to obtain a definition of independence, we must return to the method that we have already used in defining the concepts 'collective' and 'probability'. It consists in choosing that property of the phenomenon which promises to be the most useful one for the development of the theory, and postulating this property as the fundamental characteristic of the concept which we are going to define. Accordingly, subject to a slight addition to be made later on, we now give the following definition: A collective II will be said to be independent of another collective I if the process of sampling from collective II by means of collective I, using any of its attributes, does not change the probabilities of the attributes of collective II, or, in other words, if the distribution within any of the sampled collectives remains the same as that in the original collective II.

If we now assume that the two dice in the above example are independent in the meaning of the word just given, then our problem of finding the probability of the combined attribute (3, 5) can be solved readily.

DERIVATION OF THE MULTIPLICATION RULE

We have considered altogether $n$ casts of two dice and we have found that in $n_3$ of them the first die showed the attribute 3. Again, among those $n_3$ casts there were $n'_5$ such casts in which the second die had the attribute 5. Hence the total number of casts bearing the combined attribute 3 and 5 was $n'_5$. The relative frequency of this attribute is therefore $n'_5/n$, and the limit of this ratio is just the probability we are looking for. Everybody familiar with the use of mathematical symbols will understand the equation:

$$\frac{n'_5}{n} = \frac{n'_5}{n_3} \cdot \frac{n_3}{n}.$$

In other words, the relative frequency $n'_5/n$ is the product of the two relative frequencies $n'_5/n_3$ and $n_3/n$, both of which we have considered previously. The limiting value of the second of them is the probability of a cast 3 with the first die; we denote it by $p_3$. According to our assumption of independence of the two dice, the ratio $n'_5/n_3$ has the same limiting value as the relative frequency of 5 in the complete sequence of casts of die II; in other words, its limiting value is the probability of casting 5 with the second die. Let us denote the probabilities corresponding to die II by the letter $q$, e.g., the probability of casting 5 by $q_5$. According to a mathematical rule, the limiting value of a product is equal to the product of the limiting values of the two factors; the limiting value of $n'_5/n$ is thus the product $p_3 \times q_5$. In words: the probability of casting simultaneously 3 with the first die and 5 with the second die is the product of the probabilities of the two separate events. Using the letter $P$ to denote the probabilities in the game with the two dice, we can write the following formula:

$$P_{3,5} = p_3 \times q_5.$$

Analogous formulae will be valid for all other combinations of two numbers, from 1,1 to 6,6. For instance, the probability of casting 5 with die I and 3 with die II is $P_{5,3} = p_5 \times q_3$, where $p_5$ denotes the probability of casting 5 with die I in collective I, and so on.

We have introduced and used the definition of independence of collective II with respect to collective I by postulating that the process of sampling from II by means of I should not change the original probabilities in II. However, we have to add that the 'independence' thus defined is actually a reciprocal property; in other words, if II is independent of I, then I is also independent of II. They are mutually independent. This follows from our last formula for $P_{3,5}$, where the two probabilities $p_3$ and $q_5$ play exactly the same role.
The same argument can be repeated, starting now with collective II and sampling from I by means of the attribute 5 of II, etc. Whenever in what follows we speak of the independence of collective II with respect to collective I, it is with the understanding that the roles of I and II might also be interchanged.

To state the multiplication rule of probabilities for independent collectives in accordance with the general principles of our theory, one more addition must be made. We must be sure that the new sequence of elements formed by the game of two dice, with two numbers as a combined attribute, is a collective in the sense of our definition. Otherwise, no clear meaning would be conveyed by speaking of the probability of the result 3,5. The first criterion, the existence of limiting values, is obviously satisfied, since we have been able to show how these limiting values ($P_{3,5}$ or any other of the thirty-six values from $P_{1,1}$ to $P_{6,6}$) can be calculated. We must now investigate the question of the insensitivity of the limiting values to place selection. To be able to prove the insensitivity we are in fact obliged to add a certain restriction to our previous definition of independence. We must require expressly that the values of the limiting frequencies in collective II shall remain unchanged when we first make an arbitrary place selection in collective I and then use this selected partial sequence of collective I for sampling collective II. The actual need to impose this condition will be illustrated later by an example.

To conclude this section we give a summary concerning the combination of independent collectives.

1. We say that collective II is independent of collective I if the distribution in II remains unchanged by the operation which consists of first, an arbitrary place selection in I, then a sampling of II by means of some attribute in the selected part of I, and finally an arbitrary place selection in the sampled part of II.

2. From two independent collectives of this kind, a new collective can be formed by the process of 'combination', i.e., by considering simultaneously both the elements and the attributes of the two initial collectives. The result of this operation is that the distribution in the new collective is obtained by multiplying the probabilities of the single attributes in the two initial collectives.
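The multiplication rule for independent collectives can be checked on a finite run of simulated casts. The sketch below is not part of the original argument: the two loaded distributions `p` and `q` are arbitrary assumptions, and independence is built in by drawing the two dice separately from a pseudo-random generator.

```python
import random
from fractions import Fraction

rng = random.Random(1)
faces = [1, 2, 3, 4, 5, 6]
p = [0.10, 0.15, 0.25, 0.20, 0.20, 0.10]   # assumed distribution of die I
q = [0.20, 0.10, 0.10, 0.15, 0.30, 0.15]   # assumed distribution of die II

n = 200_000
die_1 = rng.choices(faces, weights=p, k=n)
die_2 = rng.choices(faces, weights=q, k=n)  # drawn separately from die I

n3 = sum(1 for x in die_1 if x == 3)
n5_prime = sum(1 for x, y in zip(die_1, die_2) if x == 3 and y == 5)

# The identity between the counts is exact: n'5/n = (n'5/n3) * (n3/n).
assert Fraction(n5_prime, n) == Fraction(n5_prime, n3) * Fraction(n3, n)

print("relative frequency of (3, 5):", n5_prime / n)
print("product p3 * q5             :", p[2] * q[4])
```

The identity between the counts holds for any data whatever; only the agreement of the observed frequency with $p_3 \times q_5$ depends on the independence of the two simulated dice.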
TEST OF INDEPENDENCE

We have thus defined the fourth and last method of forming new collectives. We have merely to add a few words concerning the combination of nonindependent collectives. Before doing this, I will insert another short remark.

In the same sceptical spirit in which we have discussed the concept of probability, we may now ask: How do we know that truly independent collectives exist, i.e., those where the multiplication rule applies? The answer is that we take our conviction from the same source as previously, namely, from experience. As always in the exact sciences, when conclusions are drawn from abstract and idealized assumptions, the test of the value of these idealizations is the confirmation of these conclusions by experiment. The definition of an elastic body in mechanics states that at all points of such a body, strain and stress determine each other uniquely. If we assume that such bodies exist, mechanics teaches us, for instance, how to calculate the deformation by a given load of a girder made of elastic material. How do we know that a particular girder is in fact elastic (in the sense of the above definition), and that therefore the results of the theoretical calculation apply to it? Is it possible to measure the rates of strain and stress at each point of the girder? Obviously not. What we have to do is to assume that the definition applies, calculate the deformation of the girder according to the theory, and test our result by means of an experiment. If satisfactory agreement between theory and experiment is obtained, we consider the premises of the calculation to be correct, not only for the one girder tested, but for all girders (or other objects) made of the same material.

Another and still simpler example is this: Geometry teaches different propositions concerning the properties of a sphere. How do we know that these propositions apply to the earth? Has anybody ever confirmed by direct measurement the existence within the earth of a point equidistant from all points on its surface (this being the geometric criterion of spherical shape)? Surely not. To assume the spherical shape of the earth was first an intuition. This assumption was afterwards confirmed by checking a great number of conclusions drawn from it against empirical results. Finally, slight discrepancies between the theoretical predictions and the experimental results were detected and showed that the sphere is only a first approximation to the true shape of the earth.
Exactly the same conditions are encountered in the case of independent collectives. If two dice are connected by a short thread, nobody will assume mutual independence of the two corresponding collectives. If the thread is somewhat longer, the answer is less obvious, and the best thing to do is to postpone judgment until a sufficiently long sequence of trials has been carried out and the multiplication rule has been tested in this way. If the dice are put into the box singly, without anything connecting them, long-standing and wide experience has demonstrated the validity of the multiplication rule in cases of this kind. If, finally, the two dice are thrown by two different persons from separate boxes, perhaps even in two distant places, the assumption of independence becomes an intuitive certainty, which is an outcome of a still more general human experience. In each concrete case, however, the correctness of the assumption of independence can be confirmed only by a test, namely, by carrying out a sufficiently long sequence of observations of the dice under consideration, or of another system considered to be equivalent to the one in which we are interested. The results of this test are compared with the predictions deduced from the assumption of the multiplication rule, and the collectives are considered as independent if agreement between theory and experiment is found.

The mutual independence of two collectives can often be deduced directly from their definition. This is true when both collectives are derived in a certain way from a common original collective. Examples of this kind will be discussed later when we deal with the repeated use of the four fundamental operations.
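A test of this kind might be sketched as follows. Everything in the sketch is an assumption made for illustration: the dice are simulated, the function name `max_deviation` is invented, and the 'short thread' is imitated by letting die II copy die I in half of the casts. The text itself prescribes no particular measure of agreement between theory and experiment.

```python
import random
from collections import Counter

def max_deviation(xs, ys):
    """Largest gap between an observed joint frequency and the product of the marginals."""
    n = len(xs)
    fx, fy, fxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return max(abs(fxy[(a, b)] / n - (fx[a] / n) * (fy[b] / n))
               for a in fx for b in fy)

rng = random.Random(2)
n = 100_000
die_1 = [rng.randint(1, 6) for _ in range(n)]
free = [rng.randint(1, 6) for _ in range(n)]      # die II thrown separately
# Hypothetical model of a short thread: die II repeats die I half of the time.
tied = [x if rng.random() < 0.5 else rng.randint(1, 6) for x in die_1]

print("separate dice:", max_deviation(die_1, free))
print("tied dice    :", max_deviation(die_1, tied))
```

For the separately thrown dice the deviation from the multiplication rule should be small and shrink as the sequence grows; for the tied dice it stays clearly away from zero.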
COMBINATION OF DEPENDENT COLLECTIVES

To conclude this discussion, I shall briefly describe how the combination of two collectives operates in cases in which the conditions of independence are not completely satisfied. We do not mean cases in which no condition whatsoever is imposed on the two collectives; far from it. It is only a matter of a slight relaxation in the condition of independence. We shall say that two collectives A and B are combinable but interdependent if the following relation exists: We start, as before, with an arbitrary place selection in A. Next we use, as before, some attribute within this selected sequence in order to sample a partial sequence of B. In contrast to the previous definition of independence, we assume now that the distribution of probabilities in the sampled partial sequence of B depends on the attribute of A that was used for the sampling. Here is a concrete example: The dice A and B are cast simultaneously. The probability of obtaining 5 with B, if we count only those casts where A has given the result 3, has a definite value. This value is, however, now assumed to be different from the probability of obtaining 5 with B if the sampling is made by means of the result 4 for A.

The following is an illustration. Three black balls and three white balls are placed in an urn. We draw two balls consecutively, without putting the first ball back before drawing the second one. The two balls are then replaced and the whole procedure repeated. The first of the two collectives under consideration is that composed of all the 'first' draws, i.e., draws made from the urn containing six balls; the probability of a white ball in this collective is 1/2, if the distribution in the urn is uniform. The second collective consists of all 'second' draws, out of the urn containing only five balls. This second collective can be sampled by means of the first one. Two partial sequences of elements are obtained in this way; the first contains all second draws following the drawing of a white ball, and the second contains all second draws following the drawing of a black ball. The probability of drawing black is 3/5 in the first of these two new collectives, and only 2/5 in the second one. This follows from the fact that in the first case all three black balls remained in the urn after the first ball had been drawn, whereas in the second case, the number of black balls left after the first draw was only two. The distribution of probabilities in the sampled collectives depends in this case on the attribute of the first collective used for the sampling. It can easily be seen how the final distribution can be calculated in such a case of combinable, but not independent, collectives. To obtain, e.g., the probability of the sequence black ball, white ball, one must multiply the following two factors: the probability 1/2 of a first black ball and the probability of a second ball being white calculated under the assumption that the first one was black. This last probability is 3/5; the result is therefore 1/2 × 3/5 = 3/10. Analogous calculations can be carried out for all other combinations of the two properties black and white.
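The figures of the urn example can be verified by listing all ordered pairs of distinct balls. The sketch assumes, as the text does, a uniform distribution in the urn, so that every ordered pair is treated as equally frequent; the function names `prob` and `conditional` are invented for this illustration.

```python
from fractions import Fraction
from itertools import permutations

urn = ["black"] * 3 + ["white"] * 3
# All ordered pairs of distinct balls (first draw, second draw), 6 * 5 = 30 in all.
pairs = list(permutations(range(len(urn)), 2))

def prob(first, second):
    """Probability of the sequence `first`, `second` in the combined collective."""
    hits = sum(1 for i, j in pairs if urn[i] == first and urn[j] == second)
    return Fraction(hits, len(pairs))

def conditional(second, first):
    """Probability of `second` in the collective of second draws sampled by `first`."""
    given = [(i, j) for i, j in pairs if urn[i] == first]
    hits = sum(1 for i, j in given if urn[j] == second)
    return Fraction(hits, len(given))

print(conditional("black", "white"))   # 3/5
print(conditional("black", "black"))   # 2/5
print(prob("black", "white"))          # 3/10
```

The last value agrees with the product of the probability 1/2 of a first black ball and the probability 3/5 of a white ball in the collective sampled by 'first ball black'.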
EXAMPLE OF NONCOMBINABLE COLLECTIVES

It is, finally, not without interest to give an example of two collectives which are neither independent nor dependent in the above sense, collectives which we consider altogether uncombinable. Imagine that a certain meteorological quantity, such as the relative humidity of the air, has been measured over a long time every day at 8 a.m. The results are expressed by numbers, say by integers from 1 to 6. These numbers are the attributes in the collective formed by these consecutive measurements. Now imagine that the same or another meteorological quantity has also been measured every day at 8 p.m. This gives us a second collective, the elements of which are in one-to-one correspondence with the elements of the first collective. We assume that both sets of measurements have the essential properties of collectives, namely, existence of limiting frequencies and randomness. We can, furthermore, assume that the distribution in the second collective is not affected by sampling by means of the first one, in other words, that evening measurements following a certain morning value, say the value 3, have the same distribution as those following any other morning result. All these assumptions do not preclude, however, the possibility of a regularity of the following kind: on each 28th day a morning value 3, if it happens to occur on this day, automatically involves the occurrence of the same value 3 in the evening. In a case like this, the combination of the two collectives produces a sequence which is not a collective. By applying to the first collective the place selection consisting only of the 28th, 56th, 84th, ... observations, we obtain a sequence for which the probabilities of the attribute combinations (3,1), (3,2), (3,4), (3,5), and (3,6) are zero. For example, $P_{3,1} = p_3 \times 0 = 0$, where $p_3$ is the probability of the morning value 3, and 0 the probability of an evening value 1 following a morning value 3. The probability of the combination (3,3), i.e., $P_{3,3}$, equals $p_3 \times 1 = p_3$, since the probability of an evening value 3 following a morning value 3 is 1. The distribution in the selected sequence is thus different from that for the total of all morning and evening data, which shows definite nonzero probabilities for all possible combinations. The sequence of elements obtained by combination is, in this case, not a collective, since in this sequence the limiting values of the relative frequencies can be changed by place selection. The initial single sequences of observations have the property of randomness; they have, however, a certain mutual relation which precludes their being combined into a new collective. We call two collectives of this kind noncombinable.

This example illustrates again the insufficiency of the well-known elementary form of the multiplication rule, which does not take into account the possible relations between the two collectives. A reliable statement of the multiplication rule can only be based on a rational concept of probability whose foundation is the analysis of the collective.
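The effect of the place selection in this example can be made visible with simulated data. The construction below is one hypothetical way of producing such measurements and is not taken from the text: all values are pseudo-random, each value 1 to 6 is given the same frequency, and on every 28th day the pair is generated so that a morning 3 always goes with an evening 3.

```python
import random

rng = random.Random(3)
others = [1, 2, 4, 5, 6]
days = 28 * 20_000
morning, evening = [], []
for day in range(1, days + 1):
    if day % 28 == 0:
        # Assumed coupling on every 28th day: a morning 3 forces an evening 3.
        if rng.random() < 1 / 6:
            m, e = 3, 3
        else:
            m, e = rng.choice(others), rng.choice(others)
    else:
        m, e = rng.randint(1, 6), rng.randint(1, 6)
    morning.append(m)
    evening.append(e)

combined = list(zip(morning, evening))
selected = combined[27::28]          # place selection: days 28, 56, 84, ...

def freq(seq, attribute):
    """Relative frequency of `attribute` in the finite sequence `seq`."""
    return seq.count(attribute) / len(seq)

print("(3,1) in the full sequence :", freq(combined, (3, 1)))
print("(3,1) after place selection:", freq(selected, (3, 1)))
print("(3,3) after place selection:", freq(selected, (3, 3)))
```

In the full combined sequence the pair (3,1) occurs with a definite nonzero frequency, whereas in the selected subsequence it never occurs, so the relative frequencies of the combined sequence are sensitive to place selection.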
SUMMARY OF THE FOUR FUNDAMENTAL OPERATIONS

I shall give the substance of what we have learned about the four fundamental operations, in the form of the following short statements:

1. Selection: Definition: The attributes unchanged, the sequence of elements reduced by place selection. Solution: The distribution is unchanged.

2. Mixing: Definition: Elements unchanged, attributes 'mixed'. Solution: Addition rule.

3. Partition: Definition: Attributes unchanged, sequence of elements reduced by partition. Solution: Division rule.

4. Combination: Definition: Attributes and elements of two collectives combined in pairs. Solution: Multiplication rule.

With the statement of these four fundamental operations, and with the indication of the methods for determining the distributions in the derived collective from that (or those) in the initial ones, the foundations of the theory of probability are laid. The method of solving concrete problems by the application of this general scheme is as follows:

First of all, we must find out what the initial collectives are and state their distributions. Secondly, we must consider the final collective whose probabilities we are asked to determine. Finally, we have to carry out the transformations from the initial collectives to the final one, in steps which consist of the fundamental operations. The problem is then solved by applying to each operation its solution from the above scheme. Of course, it is not always necessary to proceed pedantically in this way, especially after one has acquired a certain experience. An experienced worker in the field immediately recognizes certain connexions between the collectives under consideration. He will use certain repeatedly occurring groups of fundamental operations as new operations, which he applies in one step. In many examples, not only in the simplest, the entire preparatory work reduces to a minimum. The whole solution may, for instance, consist in a single mixing operation; this may, however, involve difficulties of a purely analytic nature, consisting in the evaluation of complicated sums or integrals.

In the following sections I shall discuss an example in which no mathematical difficulties are involved, but which illustrates several combinations of the fundamental operations.

A PROBLEM OF CHEVALIER DE MÉRÉ

This is perhaps the oldest problem ever solved by probability calculus; a consideration of it will be useful for us from more than one point of view.

In the time of Pascal and Fermat, two great seventeenth-century mathematicians, there lived in France a certain Chevalier de Méré, a passionate gambler. One of the games of chance fashionable in his time was played in this way: A die was cast four times in succession; one of the players bet that the 6 would appear at least once in four casts; the other bet against it. Chevalier de Méré found out that there was a slightly greater chance of getting the positive result (i.e., 6 coming out at least once in four casts). Gamblers sometimes like variety and the following variation of the game was introduced: Two dice were used instead of one, and were thrown twenty-four times; the subject of the betting was the appearance or nonappearance of at least one double 6 in twenty-four casts. Chevalier de Méré, who was obviously a studious gambler, found out that in this case the win went more often to the player betting against the positive result (the appearance of the combination 6,6). This seemed to him strange and he even suggested that arithmetic must be wrong in this case. His argument went as follows: The casting of a single die can produce six different results, that of two dice thirty-six results, six times as many. One of the six possibilities in the game with one die is 6; one of the thirty-six possibilities in the game with two dice is the combination 6,6. In casting two dice twenty-four times, the chance of casting 6,6 must be the same as that of casting 6 in four casts of one die. Chevalier de Méré asked Fermat for a solution of this paradox; and Fermat solved it. His solution has been preserved for us in a letter addressed to Pascal.

I will give the solution in the following section, in a more general form than that given by Fermat, and will show how the solution follows from the concepts on which we have founded the theory of probability.
SOLUTION OF THE PROBLEM OF CHEVALIER DE MÉRÉ

We begin with the simpler case, that of the four casts with one die. The initial collective is obviously the sequence of casts with one die; the elements are single casts, and the attributes are the numbers 1 to 6. Fermat assumed implicitly that the six attributes are equally probable, i.e., that the die used is an 'unbiased' one; this assumption, which assigns the value 1/6 to each of the six probabilities, forms the basis of his calculations. According to our general concepts, the solution can be found without this special assumption. We ascribe to the six possible results the six probabilities $p_1, p_2, \ldots, p_6$, which may be all equal or different from each other, but in any case give the sum 1.

What is the problem? We are asked to determine the probability of a 6 appearing at least once in a group of four casts. This is obviously a probability in the following new collective: the elements are groups of four consecutive casts; the attributes are 'yes' or 'no' (simple alternative): 'yes', if at least one of the four results is 6, 'no' if no 6 occurs in these four results. This is the collective which we must derive from the initial one. What we are interested in is the probability of the attribute 'yes' in this final collective.

We must now find out which of our fundamental operations lead to the final collective $K$ from the initial one, which we may denote by $C$. First of all, we drop the distinction between the results 1, 2, 3, 4, and 5, because we are only asked whether the result is 6 or not 6. We begin therefore by mixing the attributes 1 to 5, and leaving 6 as a second alternative attribute. We form in this way a new collective, which we may call $C'$, consisting of the same elements as $C$, but with only two attributes, 6 and not-6. According to the addition rule, the probabilities of these two attributes are $p_6$ and $p_1 + p_2 + p_3 + p_4 + p_5$, respectively. Since the sum of these two probabilities must be 1, we can replace the last sum by $(1 - p_6)$.

We now apply to $C'$ a selection, by selecting from the infinite sequence those elements whose numbers in the original sequence are

1, 5, 9, 13, 17, 21, 25, ...

The attributes in this new collective, let us call it $C'_1$, are the same as in $C'$ (i.e., 6 and not-6). According to our general rules, the distribution must also be the same, and the probability of a 6 is therefore still $p_6$, that of not-6 is $(1 - p_6)$.

We can form a second similar collective by another selection from $C'$, namely, by retaining the elements whose numbers are

2, 6, 10, 14, 18, 22, 26, ...

We call this collective $C'_2$; again, the probability of 6 in it is $p_6$ and that of not-6 is $(1 - p_6)$. In the same way we carry out a third selection, that of the elements

3, 7, 11, 15, 19, 23, 27, ...

and a fourth selection, that of the elements

4, 8, 12, 16, 20, 24, 28, ...

These last two collectives we call $C'_3$ and $C'_4$. We have thus formed altogether four new collectives, $C'_1$, $C'_2$, $C'_3$, and $C'_4$, by selection from the collective $C'$; the attributes in each of them are simple alternatives with the probabilities $p_6$ for the attribute 6, and $(1 - p_6)$ for the attribute not-6. These probabilities are known quantities, since we assumed that the values of $p_1, p_2, \ldots, p_6$ are the given data of the problem.

It remains now to make one last step: to carry out a combination of the four collectives derived by selection. Let us first combine $C'_1$ with $C'_2$; this means that we couple together the first elements of the two collectives (casts 1 and 2), the second ones (casts 5 and 6), the third ones (casts 9 and 10), and so on. The new collective formed in this way we call $C''_1$; its elements are certain pairs of casts, namely, those pairs whose places in the original sequence were

1 and 2; 5 and 6; 9 and 10; 13 and 14; 17 and 18; ...

The attributes in this collective are the four possible combinations of the two attributes 6 and not-6, i.e., 6 and 6, 6 and not-6, not-6 and 6, and not-6 and not-6.

Are we right in applying the procedure of the combination of independent collectives to the collectives $C'_1$ and $C'_2$? The answer is in the affirmative; this case is one of those mentioned above, in which the independence follows directly from the derivation of the collectives. This fact can be proved mathematically; it is, however, easy to recognize, without any mathematical deduction, that the randomness of the initial collective $C$ implies the mutual independence of the collectives $C'_1$ and $C'_2$ derived from it (via the intermediate collective $C'$). The probabilities of the four combinations mentioned above can therefore be calculated by means of the multiplication rule. That of the first one (6 and 6) is $p_6^2$, that of the second and third ones (6 and not-6, not-6 and 6) is $p_6(1 - p_6)$, and that of the fourth one (not-6 and not-6) is $(1 - p_6)^2$.

Exactly the same kind of combination can be carried out with the collectives $C'_3$ and $C'_4$. The new collective, $C''_2$, formed in this way, contains the following pairs of casts:

3 and 4; 7 and 8; 11 and 12; 15 and 16; 19 and 20; ...

The attributes and the probabilities are the same as in $C''_1$.

We now proceed to the last combination, that of $C''_1$ and $C''_2$. This process means the coupling together of two pairs, e.g., casts 1 and 2 (from the collective $C''_1$) with casts 3 and 4 (from the collective $C''_2$), next casts 5 and 6 with casts 7 and 8, and so on. The elements of the new collective are thus groups of four casts each, those numbered

1 to 4; 5 to 8; 9 to 12; 13 to 16; 17 to 20; ...

We denote this collective by $K'$; its attributes are the sixteen possible combinations of the four attributes occurring in $C''_1$ with the four attributes occurring in $C''_2$. The corresponding sixteen probabilities can be derived by the multiplication rule, whose applicability in this case is due to the same relations as in the case of the combination of $C'_1$ with $C'_2$, and of $C'_3$ with $C'_4$. The probability of the attribute '6 and 6', '6 and 6' ('four 6's'), for instance, is $p_6^2 \times p_6^2 = p_6^4$; that of the attribute 'four times not-6' is $(1 - p_6)^4$; and so on.

We are now at the last stage of the calculation, leading from $K'$ to the final collective $K$. We are not interested in the probabilities of all the sixteen attributes occurring in $K'$, but only in the alternative: no 6 at all, i.e., four times not-6, or all the other results. Another mixing is thus necessary. The probability of the property 'no 6 at all' remains unaffected by mixing, i.e., equal to $(1 - p_6)^4$. The probabilities of the remaining fifteen results need not be calculated separately. Their sum is given by the expression

$$P = 1 - (1 - p_6)^4.$$
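The chain of operations that leads from $C$ to $K$ can be followed step by step on a finite, simulated sequence of casts. The sketch rests on assumptions made only for illustration: the biased die with $p_6 = 0.25$ is an arbitrary choice, the casts come from a pseudo-random generator, and the variable names are invented here.

```python
import random

rng = random.Random(4)
weights = [0.15, 0.15, 0.15, 0.15, 0.15, 0.25]   # assumed biased die, p6 = 0.25
p6 = weights[5]
casts = rng.choices([1, 2, 3, 4, 5, 6], weights=weights, k=400_000)  # collective C

mixed = [c == 6 for c in casts]                   # C': attributes 6 / not-6 (mixing)
c1, c2, c3, c4 = (mixed[k::4] for k in range(4))  # four place selections from C'
k_prime = list(zip(c1, c2, c3, c4))               # K': groups of four (combination)
k_final = [any(group) for group in k_prime]       # K: 'yes' if at least one 6 (mixing)

print("observed frequency of 'yes':", sum(k_final) / len(k_final))
print("1 - (1 - p6)**4            :", 1 - (1 - p6) ** 4)
```

For brevity the two intermediate combinations into pairs are merged into a single `zip` over the four selected sequences; the resulting groups of four casts are the same.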

The expression $P = 1 - (1 - p_6)^4$ is the probability of the property 'not four times not-6', i.e., 'at least one 6', in the collective $K$ (derived by mixing from the collective $K'$). Our problem is thus solved.

DISCUSSION OF THE SOLUTION

The result can be extended, without much further calculation, to the second part of de Méré's problem, the case of twenty-four casts of two dice. We consider the sequence of casts with two dice as the initial collective $C$; the result in which we are interested is a double 6. The probability $p_6$ in the previous derivation must be replaced now by the probability $p_{6,6}$, that of casting a double 6 in an indefinitely long sequence of casts of two dice. The solution is found along exactly the same lines as above, although twenty-four selections are now to be made instead of the four selections required in the simpler example, and twenty-four selected collectives must be combined in successive steps. We need not discuss all these steps in detail; the outcome is simply the substitution of the exponent 24 for the exponent 4 in the above-given formula. Hence,

$$P' = 1 - (1 - p_{6,6})^{24}$$

is the probability of a double 6 appearing at least once in a series of twenty-four casts with two dice.

Assuming that the results of the game with two dice can be considered as a combination of two independent collectives, we can express the initial collective $C$ of the second part of the problem in terms of the initial collective $C$ of the first part. The probability $p_{6,6}$ is in this case equal to $p_6^2$. The formula for the probability of a double 6 becomes

$$P' = 1 - (1 - p_6^2)^{24}.$$

This last probability we now wish to compare with the probability $P$ calculated for the game with one die.

We see, first of all, that the two expressions are different. In other words, the values of $P$ and $P'$ are not identical for arbitrary values of the probability $p_6$. De Méré had surely a correct die in mind, with $p_6 = 1/6$. By introducing this particular value into the two formulae we obtain the following numerical values of $P$ and $P'$:

$$P = 1 - (5/6)^4 = 0.518, \qquad P' = 1 - (35/36)^{24} = 0.491.$$

The observations of de Méré were thus correct: in betting on a 6 in four single casts, the chance is somewhat higher than 0.5, and in betting on a double 6 in twenty-four double casts, somewhat lower than 0.5. It is therefore profitable to bet on 'yes' in the first game, and on 'no' in the second one. His reasoning was, however, inexact, and his conclusion that, theoretically, the chances must be the same in the two cases, was wrong.

SOME FINAL CONCLUSIONS

A number of useful consequences may be drawn from this solution. First of all we see that the solution of a problem in the theory of probability can teach us something definite about the real world. It gives a prediction of the result of a long sequence of physical events; this prediction can be tested by observation. Historically, in this case the observation preceded the calculation. This is, however, of no basic importance, since the result of the calculation is of general validity and can be applied to all similar cases in the future. For instance, another number may be substituted for 4 or 24, a biased die can be used instead of the unbiased one (i.e., the probability $p_6$ can be different from 1/6), etc. Another characteristic property of results obtained by probability calculus clearly illustrated by this problem is that all such results apply to relative frequencies of events in long sequences of observations, and to nothing else. Thus, a probability theory which does not introduce from the very beginning a connexion between probability and relative frequency is not able to contribute anything to the study of reality.

I should like to stress here also another side of the problem under discussion. It is often assumed that in games of chance we are always dealing with probabilities known a priori from the general principle of equal probability of all possible results. However, in the game considered in the above example, there is no reason for assuming

62

PROBA BI LI TY,

STATI STI CS

AND T R U T H

O TH E E LEt v1ENTS F TF{E THEO RY O F PRO BABI LI TY observationending with tire recotding of a ccrtain attribute. The relative frequency with rvhich a specified attribute occurs in the has a lirniting value, rvhich remains unof sequence observations is changed if a partial sequence folmed from the original one by an arbitrary place selection. 3. The limiting value of the relative frequency of a given attribute to -which, asjust stated,is insensitive all kinds of placeselectionsThe probais called its probability within the collectiveconsidered. bilities of all the attributeswithin a collectiveform its distribution. (This much was covered in the first lecture. Now we come to the new matter n,e have learned in the secondlectrrre.) 4. The task of the theory of probability is to derivenew collectives and their distributionsfrom givendistributionsin one or more initial The specialcaseof a uniform distribution of probabilities collectives. in the original collective ('equally probable' cases)plays no exceptional role in our theory. 5. The derivation of a new collectivefrom the initial ones consists in the application of one or severalof the four fundamental oPeraMixing, Partition, Combination). tions (Selection, leaves distribution unchanged; the 6. The first operation, Selection, the second one, Mixing, changesit according to the addition rule; the third one, Partition, changesit according to the division rule, and the fourth one, Combination, changesit according to the multiplication rule. 7. The knowledge of the effect of the four fundamental operations on the distribution enablesus, in principle, to solve all problerns of the calculus of probabilities. 
Actual problerns may be, nevertheless, very nvolved, whether on account of difficulties in the logical analysis of the problem, i.e., in the enumeration of tite necessary elementaryoperations; or becauseof complications arising from the accumulation of a great number of elementary operations; or finally, becauseof purely analytical difficulties. 8. Each probability calculatiou is based on the knowledge of in ofobservations,and its certainrelativefrequencies long sequences result is always the prediction of another relativc flequency, which of can be tested by a new sequence observations. of The following is a summary of the essence these poirrts in a singlesentence: fhe theory of probability deals exclusively with frequencies in of long sequences observations; it starts with certain given frequencies and derives new ones by means of calculations carried out according to certain establishedrules. 65

a priori that the chancesof the two partners are the same.It is by no means obvious that exactly four casts of a die should be necessary to give to the result 6 the chance 0.5. The origin of such gamescan only have been the observation of a very large number of actual castsof dice. The history of this particular game of dice might have been as follows: In the course of centuries, men learned how to make dice so that the chancesof all six resultswere about the same.Afterwards, it was found that with these unbiased dice the chance of castine 6 once in four castswas close to 50\, and that the same was truelor the chanceof casting a double 6 in twenty-four casts.Finally, longer seriesof observationsshowedthat in thesetwo cases orobabilities the were not exactly equal to 0.5; deviations were found wtrich required explanation. Now came the theory, which investigatedthe relation between the two properties of a die-namely, its property of falling equally often on each of its sides,and its property of giving in half of the sequences four casts at least one 6. Calculations showed that of thesetwo propertiesare not strictly in accordance with each other: The valuep. : l/6 resultsin a valuep : 0.516and not 0.5. It is also easy to calculate,from the above formula, that p :0.5 requires, conversely,a value ofp. slightly smaller than 1/6, namely 0. I 591. It is hardly possible to demonstratemore clearly the empirical character of the theory of probability and its purpose of interpreting observable phenomena. However, this has brought us to the subject-matter of the next lecture.I shall not, thelefore, pursue this line any further here. I must also abstain from considering more examples and from discussing more special problems in detail. They would teach us little that we do not alreadyknow. 
On a former occasionI said that experience calculating helps us in to simplify the solution of special problems, to reduce the number of necessary steps,which was so large in the examplediscussed the in preceding paragraphs. It is, however, not my task to give you this practical tuition. I prefer to close by giving a short summary of the rnost important points, marking our development of probability calculus.
SHORT REV IEW

l. The starting point of the theory is the conceptof a collective. Probabilityhas real meaningonlyas probabilityin a givencollective. a 2. A collective is an infinite sequenceof observations.each u4

THIRD LECTURE

Critical Discussion of the Foundations of Probability


I HAVE given, in the first two lectures of this series, an outline of what I call the foundation of the new probability theory. The main points of the theory were briefly restated at the end of the last lecture. If it were my intention to give a complete course on the theory of probability, I should now demonstrate how new collectives are derived from given ones by more and more complicated combinations of the four fundamental operations, and, on the other hand, how all problems usually treated in probability calculus can be reduced to combinations of this kind. It would, however, be impossible to do this without using mathematical methods out of place in this book. Those who are interested in this side of the theory may refer to my Lectures on the Theory of Probability, originally published in 1931, and to my Mathematical Theory of Probability and Statistics of 1946 (see Notes and Addenda, p. 224). Here, we are interested in the general foundations of the theory.

This lecture will therefore deal with a critical survey of the results described in the first two lectures. Discussion will proceed along two lines. First, I shall consider the relation of the new theory to the classical one¹ and to some of the recent modifications which are intended to provide the classical theory with a firmer foundation. Second, I am going to deal with the numerous works which have appeared since my first publications and which have sometimes contained objections to my theory, and sometimes suggestions for its modification or further development.
THE CLASSICAL DEFINITION OF PROBABILITY

The 'classical' definition of probability was given by Laplace and has been repeated, until about 1930, in nearly all the textbooks on the theory of probability with its form almost unchanged. It runs: Probability is the ratio of the number of favourable cases to the total number of equally likely cases. This same idea underlies all the work prior to that of Laplace, although the authors did not always state the definition clearly.

I must point out, however, that in more recent times many mathematicians have been aware of the inadequacy of Laplace's definition. For instance, Poincaré (1912) says: 'It is hardly possible to give any satisfactory definition of probability; the usual one is as follows . . .' Later, we shall see that a complete logical development of the theory on the basis of the classical definition has never been attempted. Authors start with the 'equally likely cases', only to abandon this point of view at a suitable moment and turn to the notion of probability based on the frequency definition; in fact, they even sometimes explicitly introduce a definition of this kind. For this reason I maintain that the gulf between the new point of view and the classical one is not unbridgeable. For most mathematicians the acceptance of the new definition would only mean the surrender of the form in which the theory of probability is usually presented, namely, one which permits the solution of a number of simple problems at the beginning of the course, while avoiding the immediate discussion of more difficult and fundamental ideas.

The main objection to the definition given by Laplace concerns the expression 'equally likely' or 'equally possible cases'. Ordinary speech recognizes different degrees of possibility. A certain event is sometimes called 'possible' or 'impossible'; it may equally well be 'quite possible' or 'hardly possible', and these expressions mean that we are conscious of the varying degrees of 'effort' involved. It is 'hardly possible' to write, in longhand, at a speed of forty words a minute, and 'impossible' at a speed of a hundred and twenty words a minute. Nevertheless, it is 'quite possible' to achieve the first speed with a typewriter, and the second by using shorthand. In the same way we call two events 'equally possible' if the same 'effort' is required to produce each of them. Jacob Bernoulli,³ a predecessor of Laplace, does in fact speak of events 'quod pari facilitate mihi obtingere possit' (that can be achieved with equal ease). However, Laplace was thinking of something else when he spoke of 'equally likely cases'.

In another sense we say, 'This event is more likely than that', and in this way we express our conjecture concerning what we expect to occur; this is the sense in which Laplace and the followers of the classical theory of probability use the phrase 'equal possibility'. Thus we see that this latter phrase merely means 'equally reliable conjectures' or, to use the current expression, 'equal probabilities'. The phrase 'equally likely cases' is exactly synonymous with 'equally probable cases'. Even such a voluminous treatise as A. Meinong's⁴ Probability and Possibility only serves to confirm this simple fact. If we remember that with equal probabilities in a collective the distribution was called uniform, we may say that, unless we consider the classical definition of probability to be a vicious circle, this definition means the reduction of all distributions to the simpler case of uniform distributions.

EQUALLY LIKELY CASES . . .

We must now examine a little more closely the way in which this reduction is carried out. An unbiased die can produce six equally likely results. One of them is the number 3, and, therefore, the probability of throwing a 3 is 1/6. If a wheel used in a lottery bears the numbers 1 to 90, there are ninety equally likely cases. Nine of these correspond to numbers expressed by a single digit (1 to 9); nine others are two-digit numbers which are divisible by 10; the remaining seventy-two numbers have two digits and are not divisible by 10. Therefore, the probability of a number with a single digit is 9/90 = 1/10, and so is that of a number divisible by 10; the probability of all the other results is 8/10. Let us consider as a third example a game played with two unbiased dice. Each possible combination of numbers is an equally likely case; there are thirty-six such combinations. Therefore the probability of throwing a double 6 is 1/36; that of throwing the sum 11 is 1/18, because two cases are favourable to this result, namely, 5 on the first die and 6 on the second, and vice versa.

The consideration of these three applications of the theory of 'equally likely cases' leads to the following conclusions. In the first case, we have obviously a mere tautology, if we remember that the expressions 'equally likely' and 'equally probable' are identical, the only other consideration involved being that the sum of all the probabilities is equal to unity. In the second case, we have several favourable cases united into a group. This is a special case of the operation discussed in the previous lecture, that of mixing the labels in a collective. The attributes 1, 2, 3, . . ., 9 are mixed, the attributes 10, 20, . . ., 90 likewise, and also the remaining attributes. Three groups of attributes are thus formed, the single probabilities being each equal to 1/90; the addition of the probabilities in each group produces the results shown above. For the game with two dice which formed our third example, the classical theory uses a theorem relating to the combination of independent collectives. In a very specialized form this theorem states that: Each combination of one of the equally likely cases from the first and from the second collectives produces an equally likely case in the new collective. As we know, we can solve the problems which occur most frequently and are most important in the theory of probability by the processes of mixing and combination. Hence, the theory of equal possibilities permits us to solve most problems in which there are uniform distributions of probabilities in the original collectives. Most of the usual games of chance (unbiased dice, properly made roulette wheels, and so forth) produce collectives of this kind.

. . . DO NOT ALWAYS EXIST

But how are we to deal with the problem of a biased die by means of a theory which knows only probability based on a number of equally likely results? It is obvious that a slight filing away of one corner of an unbiased die will destroy the equal distribution of chances. Are we to say that now there is no longer a probability of throwing a 3 with such a die, or that the probability of throwing an even number is no longer the sum of the probabilities of throwing a 2, 4, or 6? According to the classical theory, none of the theorems derived on the basis of equally likely cases can be applied to a biased die (since there is no probability without equally likely cases). Nevertheless Laplace⁵ in his fundamental treatise attempted to deal with the case of a coin which had different chances for showing heads or tails. It was later realized that his conclusions were not valid, and later textbooks on the theory of probability merely omitted any consideration of these questions. The biased die was not considered a subject worthy of treatment by the calculus of probability. It is obvious that such a point of view admits of no argument.

There are other problems, however, belonging to the same category as the biased die which cannot be set aside so easily. One of these is the problem of the probability of death. According to a certain insurance table (see note 14, lect. 1), the probability that a man forty years old will die within the next year is 0.011. Where are the 'equally likely cases' in this example? Which are the 'favourable' ones? Are there 1000 different possibilities, eleven of which are 'favourable' to the occurrence of death, or are there 3000 possibilities and thirty-three 'favourable' ones? It would be useless to search the textbooks for an answer, for no discussion on how to define equally

likely cases in questions of this kind is given. When the authors have arrived at the stage where something must be said about the probability of death, they have forgotten that all their laws and theorems are based on a definition of probability founded only on equally likely cases. The authors pass, as if it were a matter of no importance, from the consideration of a priori probabilities to the discussion of cases where the probability is not known a priori, but has to be found a posteriori by determining the frequency of the different attributes in a sufficiently long series of experiments. With extraordinary intrepidity all the theorems proved for probabilities of the first kind are assumed to be valid for those of the second kind. If an author wishes to substantiate this step, he usually refers to Bernoulli's so-called Law of Large Numbers, which is supposed to form a bridge between the concept of a priori probabilities and the determination of probabilities from observations.

We shall see later that this does not work, and that the whole chain of argument is completely circular. Without awaiting this discussion, we may say at once that, up to the present time, no one has succeeded in developing a complete theory of probability without, sooner or later, introducing probability by means of the relative frequencies in long sequences. There is, then, little reason to adhere to a definition which is too narrow for the inclusion of a number of important applications and which must be given a forced interpretation in order to be capable of dealing with many questions of which the theory of probability has to take cognizance.
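The question raised above about the biased die, whether the probability of an even number is still the sum of the probabilities of 2, 4, and 6, can be illustrated from the frequency point of view. The following is a minimal sketch under stated assumptions: the face weights (face 6 three times as heavy as each other face), the number of throws, and the helper name `throw_biased_die` are all hypothetical, and a pseudo-random generator stands in for an actual series of casts.

```python
import random
from collections import Counter
from fractions import Fraction


def throw_biased_die(weights, n_throws, seed=0):
    """Simulate n_throws casts of a die whose faces 1..6 carry the given weights."""
    rng = random.Random(seed)
    return rng.choices([1, 2, 3, 4, 5, 6], weights=weights, k=n_throws)


# Assumed bias, chosen only for illustration.
throws = throw_biased_die([1, 1, 1, 1, 1, 3], n_throws=20_000)
n = len(throws)

# Relative frequency of each face, kept as exact fractions.
counts = Counter(throws)
freq = {face: Fraction(counts[face], n) for face in range(1, 7)}

# Relative frequency of "even", counted directly and obtained by addition.
freq_even_direct = Fraction(sum(1 for t in throws if t % 2 == 0), n)
freq_even_by_addition = freq[2] + freq[4] + freq[6]

print({face: float(f) for face, f in freq.items()})
print(float(freq_even_direct), float(freq_even_by_addition))
```

The two values for "even" agree exactly, because the addition rule holds for relative frequencies whatever the bias; no appeal to equally likely cases is involved.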
The position may be illustrated by an analogy from the field of elementary plane geometry.
A GEOMETRICAL ANALOGY

Somebody might consider the possibility of developing the geometry of closed rectilinear figures (polygons) from the exclusive consideration of polygons with equal sides of one given length. In this kind of geometry there would be no measurement of length, all figures being determined by their angles and number of sides. If an adherent of this system of geometry were presented with a triangle having sides of different lengths, say three, four, and five units, he would describe this figure as an equilateral dodecagon in which three, four, and five sides respectively fall on straight lines, i.e., a dodecagon with nine of its angles each equal to 180 degrees. The reduction of all polygons to equilateral ones is possible without great difficulty provided all the sides are multiples of a certain unit length; this can be carried to any desired degree of accuracy, if a sufficiently small unit is taken. Nevertheless, in a geometry of this kind a distinction must be drawn between polygons for which the number of sides is known a priori (all of their angles being different from 180°), and those for which it must be determined a posteriori by expressing the lengths of their sides, exactly or approximately, as multiples of the unit length.

It is quite possible to develop a theory of this kind, but no mathematician will say that the concept of length and the measurement of length can be entirely removed from geometry in this way. In fact, such a theory is merely a roundabout way of replacing the more direct approach.

The same holds true for the theory of probability based on equally likely cases. From an historical point of view, it is easy to understand why the theory started with the consideration of cases of equal probability (corresponding to the equilateral polygons). The first subjects of the theory of probability were games of chance based on uniform distributions of probabilities. If, however, a modern mathematician attempts to reduce the probabilities of life and death, determined as relative frequencies, to some hypothetical equally likely cases, he is merely playing hide and seek with the necessity for a comprehensive definition of probability which for our theory is just as unavoidable as the idea of length and of its measurement are for geometry.

HOW TO RECOGNIZE EQUALLY LIKELY CASES

I think that I have made clear the distinction between our definition of probability and the classical one, which is still preferred by a few authors. I anticipate that, in the future, the more important problems of insurance, statistics, and the theory of errors will take precedence over the problems of gambling, which are chiefly of historical importance. Then there will be no hesitation in founding the theory of probability on principles which are both simple and rational. In fact, we have already entered upon this development.

Various authors have asked how it is possible to be sure that each of the six sides of a die is equally likely to appear or that each of ninety numbers in a lottery is equally likely to be drawn. Our answer is of course that we do not actually know this unless the dice or the lottery drums have been the subject of sufficiently long series of experiments to demonstrate this fact. In contrast to this point of view, the defenders of the classical theory use a particular argument to support

their concept. They assert that the presence of equally likely cases is a piece of a priori knowledge.

Let us assume that a perfect geometrical cube has been made from perfectly homogeneous material. One would think that it is then clear, a priori, that none of the six sides can be more likely to show up than any other. One usually states that in this case it is certain that the chance of the cube falling on any particular side is the same for all six sides. I will concede this statement for the moment, although the result of the throw depends also on the dice box, as well as on the whole process of throwing the die from the box, and so on. I will also forget that the statement has a definite meaning only if we already know what 'equal chance' means. For example, we may have adopted the frequency definition, and according to this, 'equal chance' of a number of results would mean equal frequency in a long series of throws. Without some such definition, the statement conveys no knowledge at all, either a priori or of any other kind. Let us, however, overlook these points and assume that the whole task consists in ascribing some fractions, the 'probabilities', to the six sides of the die. The question arises whether, for an actual cube, we can arrive at the conclusion that all these numbers must be equal by a logical process of thought, independent of experience. As soon as we consider more closely the assumptions of homogeneity and symmetry which must be satisfied by the cube we find out the practical emptiness of the whole statement.

We can call a material 'homogeneous' in a logical sense if no particular distinction can be made between any of its parts; that is, the material must also be one whose parts have the same origin and history. However, one part of the ivory of which the die is made was certainly nearer to the tip of the tusk than some other part; consequently, the identity of behaviour of all parts is no longer a logical necessity.
This identity of behaviour follows in fact from experience, which shows that the original position of the ivory on the animal does not influence its properties in this respect.

In a concrete example, we not only use this but many other deductions from experience as well. For instance, we inscribe the six sides of the die with six different numbers and assume that this does not affect the relative chances. Primitive tribes, i.e., human beings with a restricted experience, frequently believe the fate of the human body to be affected by inscriptions on its different parts. Moreover, we not only paint the numbers on the die, but make from one to six incisions on its surface and so substantially change its geometrical symmetry; we still assume, on the basis of experience, that this does


not affect the chances of the game. If a supporter of the a priori concept of probability is pressed to explain what he understands by 'complete homogeneity', he finally merely requires that the centre of gravity of the cube should coincide with its geometrical centre. If he knows enough of mechanics, he adds that the twelve moments of inertia about its twelve edges must all be equal. No one will any longer maintain that it is evident a priori that just these conditions are necessary and sufficient for the 'equal possibility' of the six sides of the die, and that no further conditions, such as conditions involving moments of higher order, need be considered. In fact, this formulation contains a number of results taken from the mechanics of rigid bodies, a science likewise based on experience. We may sum up our discussion by saying that no concrete case can be handled merely by means of an a priori knowledge of equally likely cases. It is always necessary to use more or less general results derived from observation and experience in order to determine which properties of the apparatus that we are using may influence the course of the experiments, and which properties are irrelevant from this point of view.

The situation is very much the same as in the well-known application of the principle of symmetry to the derivation of the equilibrium conditions for a lever with equal arms. When the two sides of the lever are completely identical, the equality of the forces is assumed to follow 'by reason of symmetry'. This form of the lever theorem is, however, much too specialized; apart from the practical impossibility of constructing a lever with exactly identical sides (in the logical sense we have discussed), we must bear in mind that a lever with equal arms is not defined as one having identical arms, but as one in which the forces act at equal distances from the fulcrum. No further geometrical symmetry is required. It is instructive to see, in the older textbooks of applied mechanics, how many figures representing levers of different shapes have been drawn to acquaint the student with the idea of equal-arm levers which do not possess geometrical symmetry. Yet this decisive fact, that only the distances from the fulcrum matter, is a result of experience and observation.
ARE EQUALLY LIKELY CASES OF EXCEPTIONAL SIGNIFICANCE?

Those who admit the insufficiency of the above-explained a priori approach but wish to maintain the exceptional role of 'equally likely' cases may reason as follows: If, in addition to geometrical symmetry, a cube possesses 'kinetic symmetry' (equal moments of first and

second order), then the equal chances for the six faces follow from the mechanics of rigid bodies.

However, let us consider now the case of a biased die; we find that here mechanics gives us no assistance. When we have determined all the mechanical characteristics of this die, centre of gravity, moments of inertia, etc., we are still unable to derive, by means of mechanics, the relative frequencies with which it will fall on its different sides. In this case, the only way to determine the probability of the different results is by statistical experiment. The case of a symmetrical die is thus distinguished from that of an unsymmetrical one in that in the former case a prediction of probabilities is possible, if not a priori, at least by the application of an experimental science (mechanics) which is of a distinctly deterministic character.

I think, however, that there is a flaw in this argument. I have already pointed out that the result of a statistical experiment with a die depends not only on the die but on the whole process of throwing it. It is possible to cheat, wittingly or unwittingly, with a perfectly symmetrical die by using certain tricks in placing the die in the box or throwing it out. Very delicately balanced psychological or physiological phenomena are sometimes involved in these procedures. This is well known from the experience with card sharps as well as from certain observations
which have often defied explanation and are the favourite subject-matter of so-called 'parapsychology'.⁶ I do not want to defend the occult sciences; I am, however, convinced that further unbiased investigation of these phenomena by collection and evaluation of old and new evidence, in the usual scientific manner, will lead us sooner or later to the discovery of new and important relations of which we have as yet no knowledge, but which are natural phenomena in the usual sense. At any rate, it is certain that at the present stage of scientific development we are not in a position to derive 'theoretically' all the conditions which must be satisfied so that the six possible results of the game of dice will occur with equal frequency in a long series of throws. By 'theoretically' we mean a procedure which may make use of some results of experimental science but does not involve statistical experiments carried out with the apparatus whose probability we want to know, or with one similarly constructed.

The following proposition, although not an integral part of our new foundation of the theory of probability, is an essential element in my conception of statistical processes. The form of a distribution in a collective can be deduced only from a sufficiently long series of repeated observations, and this holds true for uniform as well as for


all other distributions. The experiment may be carried out by using the system under consideration or one considered to be equivalent to it on the basis of appropriate observations. This proposition applies, in the first instance, to the distributions in the initial collectives with which all probability problems begin; it also applies to the distributions in the derived collectives if these are to be checked by observations.
THE SUBJECTIVE CONCEPTION OF PROBABILITY

According to our conception, the uniform distribution of probabilities is only a special case of the general distribution; this position is in sharp contrast to that of those epistemologists who uphold the so-called subjective theory of probability.

In the opinion of these authors, the probability which we ascribe to a certain event, i.e., to our assertion of its occurrence, depends exclusively on the degree of our knowledge; the assumption of equal chances for several events follows from our absolute lack of knowledge. I have already quoted the characteristically concise formulation of this principle due to E. Czuber, who said that we consider events to be equally probable if we have 'not the slightest knowledge of the conditions' under which each of them is going to occur. In an apparently more scientific form, this is the so-called 'Principle of Indifference'.

J. M. Keynes remarks, quite justly, that, by virtue of this principle, each proposition of whose correctness we know nothing is endowed with a probability of 1/2, for the proposition and its contradictory proposition can be regarded as two equally likely cases. Thus, if we know nothing about the colour of the cover of a book and say that it is red, the probability of this assertion is 1/2. The same probabilities can also be asserted for the propositions that it is blue, yellow, or green, and consequently the sum of these probabilities is much larger than unity. Keynes⁷ makes every effort to avoid this dangerous consequence of the subjective theory, but with little success. He gives a formal rule precluding the application of the Principle of Indifference to such a case, but he makes no suggestion as to what is to replace it. It does not occur to him to draw the simple conclusion that if we know nothing about a thing, we cannot say anything about its probability.
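The arithmetic behind Keynes's objection is short enough to write out. The following is a minimal sketch, assuming the four colours named in the text are treated as four separate assertions, each set against its own negation; the variable names are illustrative only.

```python
from fractions import Fraction

# The four colour assertions about the unknown book cover.
colours = ["red", "blue", "yellow", "green"]

# Principle of Indifference applied to each assertion and its contradictory:
# every assertion receives the probability 1/2.
naive = {colour: Fraction(1, 2) for colour in colours}

# The assertions are mutually exclusive, so their probabilities should not
# add up to more than unity; here they do.
total = sum(naive.values())
print(total)  # 2
```

Each further colour added to the list raises the total by another 1/2, which is why the principle cannot be applied to such a case without some additional rule.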
The curious mistake of the 'subjectivists' may, I think, be explained by the following example. If we know nothing about the stature of six men, we may presume that they are all of equal height. This


BERTRAND'S PARADOX⁸

The attempts to justify, in various ways, the assumption of equally likely cases or, more generally, of a uniform distribution by having recourse to principles of symmetry or of indifference fail definitely in the treatment of the problems first considered by Bertrand, and later given the name of 'Bertrand's Paradox' by Poincaré. I shall show, by means of the simplest possible example, the insuperable difficulties which such problems present to every form of the classical theory of probability.

Consider the following simple problem: We are given a glass containing a mixture of water and wine. All that is known about the proportions of the liquids is that the mixture contains at least as much water as wine, and at most, twice as much water as wine. The range for our assumptions concerning the ratio of water to wine is thus the interval 1 to 2. Assuming that nothing more is known about the mixture, the indifference or symmetry principle or any other similar form of the classical theory tells us to assume that equal parts of this interval have equal probabilities. The probability of the ratio lying between 1 and 1.5 is thus 50%, and the other 50% corresponds to the probability of the range 1.5 to 2.

But there is an alternative method of treating the same problem. Instead of the ratio water/wine, we consider the inverse ratio, wine/water; this we know lies between 1/2 and 1. We are again told to assume that the two halves of the total interval, i.e., the intervals 1/2 to 3/4 and 3/4 to 1, have equal probabilities (50% each); yet, the wine/water ratio 3/4 is equal to the water/wine ratio 4/3. Thus, according to our second calculation, 50% probability corresponds to the water/wine range 1 to 4/3 and the remaining 50% to the range 4/3 to 2. According to the first calculation, the corresponding intervals were 1 to 3/2 and 3/2 to 2. The two results are obviously incompatible.
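The incompatibility can be made numerical by asking both calculations for the probability of one and the same event, namely that the water/wine ratio lies between 1 and 3/2. The following is a minimal sketch; the helper `uniform_prob` is a hypothetical name, and the only assumption is the one the indifference principle itself makes, a uniform distribution over the chosen variable.

```python
from fractions import Fraction


def uniform_prob(lo, hi, a, b):
    """Probability of the sub-interval [a, b] under a uniform distribution on [lo, hi]."""
    return Fraction(b - a) / Fraction(hi - lo)


# Event: water/wine ratio between 1 and 3/2.

# First calculation: water/wine is uniform on [1, 2].
p_ratio = uniform_prob(Fraction(1), Fraction(2), Fraction(1), Fraction(3, 2))

# Second calculation: wine/water is uniform on [1/2, 1]; the same event
# then reads "wine/water between 2/3 and 1".
p_inverse = uniform_prob(Fraction(1, 2), Fraction(1), Fraction(2, 3), Fraction(1))

print(p_ratio)    # 1/2
print(p_inverse)  # 2/3
```

The same physical statement about the mixture receives the probability 1/2 under one description and 2/3 under the other, although nothing but the choice of variable has changed.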
Similar contradictions may occur in all caseswhere the characteristic attributes (in our case the relative concentration) are expressed by continuous variables rather than by a discrete set of numbers (as in the case of a die or a lottery). I have already mentioned these socalled 'problems of geometrical probability', which owe their name to the geometrical origin of most of the older problems in this class. One of the oldest and best-known examples of geometrical probability is Buffon's needleproblem (1733).'0 number of parallel lines A are drawn on the floor and a needle is dropped on it at random. The question is: What is the probability that the needle shall lie 77

from is applicationof the Principleof Indifference also legitimate may be true or rule.This presumption the point of view of Keynes's in as false; it can also be described more or lessprobable, the colloquialmeaningof this word. In the sameway we can presume we that the six sidesof a die, of whoseproperties know nothing only a conjecture, haveequalprobabilities. This is, however, definite, may show that it is false,and the and nothing more. Experiment wasan illustration sucha case. of pair of diceusedin our first lecture lies The peculiarapproachof the subjectivists in the fact that they that thesecasesare equally probable'to be consider'l presume to are equallyprobable',since,for them, equivalent 'Thesecases notion. Nobody, however,would probability is only a subjective six assertthat the above-mentioned men are, in fact, equallytall, which can be measured the because lengthof the body is something lengthand between wereto maintainthis diffeience If objectively. w-e from a could in fact be 'deduced' its probability,equalprobabilities havejust as much right to we lack of knowledge; should,however, 'deduce' e.g., concerning these probabilities, any other assumption of that they are proportionalto the squares the numbersI to 6, and as would be as permissible any other. this conjecture I quite agreethat most people,askedabout the position of the centreof gravity of an unknowncube,will answer'It probablylies at the centre'.This answeris due, not to their lack of knowledge of this particularcube,but to their actualknowledge a concerning which wereall more or less'true'. It greatnumberof other cubes, investipsychological would not beimpossible carryout a detailed to probabilityestimations, of gationinto the foundations our subjective is but its relationto probabilitycalculus similarto that of the subThermojectivefeelingof temperature scientific thermodynamics. 
to of impressions hot had its startingpoint in the subjective dynamics begins, however,when an objective and cold. Its development of temperatures means a columnof mercury by methodof comparing of of estimate the degree warmth. for is substituted the subjective do measurements not knows that objectivetemperature Everyone estimate feeling,sinceour subjective alwaysconfirm our subjective or by of is often affected influences a psychological physiological certainly not impair the usefulness do discrepancies These character. thermodynamics, nobodythinks of alteringthermoand of physical of impressions in dynamics order to make it agreewith subjective observapointedout that repeated hot and cold. I havepreviously of are tions and frequency determinations the thermometers probability theory. 16

Y
P RO B AB IL IT Y. ST A T IST IC S AN D TR U TH ac r o s s u e o f th c l i n c so n th e fl o o r ? The characteri stiattri butcof a o c singlethrow is the position of thc necdlein rclation to the systemof lines on the floor; it carr bc describedby a set of numbers,called co-ordinates. Certain values of the co-ordinatescorresDondto the attribute 'crossing', other values to 'noncrossing'. Tlre origin of possiblecontradictionsin this case is exactly the same as in the preceding one. They alise from the fact that the resultsof the experiments can be describedby means of severaldifferent sets of coordinates. Tl.re concentration the rnixturein the previousexample of could be describedby the ratio wine/water as well as by the ratio water/wine. In the case of the needle,we calt use Cartesian coordinates,polar co-ordillates,or other sets of co-ordinates.Equal probabilitiesfor equal rangesin a certain co-ordinatesystemcoirespond,in general, unequalprobabilities equal ranges another to for in possible co-ordinatesystem,and vice versa. Any theory which startsfrorn the notion of eclualpossibilities a of number of differentcases, supposed be knowrr a priori, or derived to by some kind of instinctive insiglit. nrust invaliably fail when it comesto problemsof this kind. Keynes.whom I have alreadymentioned as being orreof the leadingsubjectivists. actually adrnitsthat in these casesseveraldifferent assurnptions are equally justifiable eventhough they lead to differentcorrclusions. The point of view of the frequencytheory is that in order to solve problemsof this kind (as well as any other problems)the distribution in the initial collective must be given. The source of this knowledge and the special characterof the distribution have nothing to do with probability calculus.In order to obtain resultsin an actual casewhich can be expected to be confirmed by experiment, the initial data rnust be taken from statistical observations. 
In the case of the water/wine mixture, it is perhaps difficult to give a leasonabledefinition of the collective involved; one would have to specify the actual procedure by which mixtures with different concentrationsare obtained. In the caseof the needleproblem, the rvay in which the collective is formed is more or lessclear.A needleis thrown repeatedly, meansof an by arrangement whosedetailsrcmain to be defined,and tlie distribution in this initial collective,which is formed by the seriesof throws, is characterizedby a 'probability density', which may be given, in principle, in terms of any co-ordinate system. Once this density function has been determined by actual experirnent, all further calculationsmust be basedon it, and the final resultsare independent of the choice of co-ordinates,which are nothing but a tool. The problem belongsto the classof 'mixing' problems: all co-ordinate ?8 D IS C U S SI O N O F THE FO UNDATI O N O F PRO BABI LI TY vaiues correspondingto the crossing of a line by the needle are 'mixed' together and all the remaining values are similarly 'mixed'. to such that the initial distriIt may be possible chooseco-ordinates bution is uniform in them. This may make the calculationseasier; it is, however,of no importance.Somechoicesof co-ordinates may appear preferable from various points of view; none of them is though empirical conditions may indicated by an inherent necessity, influence our choice.
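The arithmetic of the water/wine example can be retraced with exact fractions. What follows is only a minimal sketch added for illustration, not part of the lecture; the helper `prob_uniform` and all variable names are assumptions. It applies a uniform distribution once to the ratio water/wine on the interval 1 to 2 and once to the ratio wine/water on the interval 1/2 to 1, and evaluates the probability of one and the same event, a water/wine ratio between 1 and 3/2, under both assumptions.

```python
from fractions import Fraction


def prob_uniform(lo, hi, a, b):
    """P(a <= X <= b) for X uniformly distributed on [lo, hi]."""
    a, b = max(a, lo), min(b, hi)          # clip the event to the admissible range
    return max(b - a, Fraction(0)) / (hi - lo)


# Assumption 1: r = water/wine is uniform on [1, 2].
p_first = prob_uniform(Fraction(1), Fraction(2), Fraction(1), Fraction(3, 2))

# Assumption 2: s = wine/water is uniform on [1/2, 1].
# The same event, 1 <= r <= 3/2, reads 2/3 <= s <= 1 in this co-ordinate.
p_second = prob_uniform(Fraction(1, 2), Fraction(1), Fraction(2, 3), Fraction(1))

# Under assumption 2 the 50% point lies at s = 3/4, i.e. r = 4/3, as in the text.
p_half = prob_uniform(Fraction(1, 2), Fraction(1), Fraction(3, 4), Fraction(1))

print(p_first, p_second, p_half)  # 1/2 2/3 1/2
```

The same event receives the probability 1/2 under the first assumption and 2/3 under the second, which is the incompatibility described above; neither number has any claim to be correct until the distribution in the initial collective is actually given.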
A SUGGESTED LINK BETWEEN THE CLASSICAL AND THE NEW DEFINITIONS OF PROBABILITY

As we have seen, the essential objections which can be raised against the classical definition of probability are twofold. On the one hand, the definition is much too narrow; it includes only a small part of the actual applications and omits those problems which are most important in practice, e.g., all those connected with insurance. On the other hand, the classical definition puts undue emphasis on the assumption of equally possible events in the initial collectives. This assumption fails in all those cases of 'geometrical' probability which were discussed in the last few paragraphs.

Nothing worthy of mention, as far as I know, has been brought forward to meet the second objection. I think that this objection is usually left unanswered through a lack of interest rather than on positive grounds. As far as the first objection is concerned, nearly everybody who has followed the traditional course in the theory of probability will reply that the classical theory provides a link connecting the two definitions of probability; that, owing to this link, the problems which were eliminated at the outset, such as those of life insurance, may be dealt with; and that the results are satisfactory, at least as far as practical applications are concerned. This link is supposedly found in the Law of Large Numbers, which was first suggested by Bernoulli and Poisson. (We have already mentioned it on a previous occasion.) By means of this law, it can be proved mathematically that probability values obtained as quotients of the number of favourable cases divided by the total number of equally possible cases, are, to a certain degree of approximation, equal to values obtained by the determination of relative frequencies in prolonged series of observations. Many authors have already pointed out the dangerous weakness of this link; nevertheless, it has been used again and again, owing to the absence of anything which could replace it.

We shall have to examine this point closely later on, owing to the general importance of the 'Law of Large Numbers' and the need of it in all practical applications. However, we postpone these delicate considerations for the moment. Our fourth lecture will deal exclusively with the various aspects of this famous law. Meanwhile, anticipating some of the results of that discussion, we state: The Law of Large Numbers, including its consequences, does not relieve us of the necessity of introducing probability as a limit of relative frequency. In fact, the basic law derived by Bernoulli and Poisson loses its main importance and actually its meaning if we do not adopt the frequency definition of probability. Only through hidden errors of argument and circular reasonings can we arrive at the idea that this theorem 'links' the frequency definition with the definition based on equally likely events.
SUMMARY OF OBJECTIONS TO THE CLASSICAL DEFINITION

The second part of this lecture will be a discussion of some new contributions to the foundation of the theory of probability; but before dealing with this, I would like to sum up briefly the objections I have raised against the classical definition of probability, based on the notion of equally likely events.

1. Since 'equally possible' is only another expression for 'equally probable', the classical 'definition' means, at best, a reduction of collectives with all kinds of distributions to collectives with uniform distributions.

2. Equally possible cases do not always exist, e.g., they are not present in the game with a biased die, or in life insurance. Strictly speaking, the propositions of the classical theory are therefore not applicable to these cases.

3. The statement that 'the six faces of an absolutely homogeneous cube have equal probabilities' is devoid of content, unless what we mean by 'equal probabilities' has previously been explained.

4. Perfect homogeneity, in the logical sense of this phrase, does not exist in practice. If the process of manufacture of a die is completely known, it is always possible to find aspects in which the different sides differ from each other.

5. The 'Principle of Indifference' and similar concepts are only circumlocutions of the classical theory. They avoid none of its difficulties.

6. In the case of a continuous distribution, the assumption of a 'uniform distribution' means something different in different co-ordinate systems. No general prescription for selecting 'correct' co-ordinates can be given, and there can therefore be no general preference for one of the many possible uniform distributions.

7. The 'Law of Large Numbers', derived mathematically by Bernoulli and Poisson, provides no link between the definition of probability based on equally likely cases and the statistical results derived from long series of observations. It does not alter our postulate that the frequency definition is the starting point of the whole theory of probability. This last assertion will be elaborated on in the next lecture.

OBJECTIONS TO MY THEORY

Since my first publications which appeared in 1919, an intensive discussion of the foundations of the theory of probability has started and is still in progress. Those authors who had worked in this field for many years and had been successful in the solution of a number of special problems could hardly be expected to agree at once to a complete revision of the very foundations of their work. Apart from this older generation,¹⁰ there is scarcely a modern mathematician who still adheres without reservation to the classical theory of probability. The majority have more or less accepted the frequency definition. A small group, whom I call 'nihilists', insist that basic definitions connecting probability theory with the empirical world are unnecessary. I will deal with this point of view at the end of this lecture.

Even among those who agree that the subject of probability calculus is frequencies and who think that this should find its expression in the definition of probability, there are still many differences of opinion. In the first place, there are some mathematicians who begin their course by defining probability as the limit of relative frequency, but do not adhere consistently to this point of view in their further developments. Instead, they revert to the old ways of the classical theory. The French textbook by Fréchet and Halbwachs (1924),¹¹ and that by the American mathematician Julian Coolidge (1925),¹² belong to this group.

A more recent work by Harald Cramér,¹³ which seems to represent the prevalent trend among American and British statisticians, completely adopts the point of view of the frequency definition. Cramér rejects the definition based on equally possible cases as inadequate and firmly opposes the standpoint of the modern subjectivists which will be further discussed later on. However, Cramér omits giving a

clear definition of probability and in no way explains or derives in a logical manner the elementary operations of probability calculus. The reason why he and authors of the same school of thought are able to proceed in this way is that, for all of them, the fundamental questions which arise from the simple problems of the theory of chance do not exist. If one's attention is focused on the mathematical difficulties of complicated problems it is easily possible to pass over the difficulties of the fundamentals. The same holds true in the case of pure mathematics: the mathematician who is concentrating on the solution of intricate problems need not concern himself with the proposition that a times b equals b times a. The significant difference is that in this field scientific discipline is much further advanced and it is therefore no longer customary to deal with the foundations in a few casual words.

Another small group of mathematicians is opposed to the definition of the collective as an infinite sequence of elements; they prefer to deal exclusively with frequencies in long, but finite, sequences, i.e., to avoid the use of limits. A larger group accepts my first postulate, viz., the existence of limiting values of relative frequencies, but finds difficulties with the second one, the postulate of randomness. Certain suggestions concerning the possible alteration of these conditions have been made. I propose to deal with these questions in turn in the following sections, including also a brief discussion of new developments in the subjective concept of probability.
FINITE COLLECTIVES

There is no doubt about the fact that the sequences of observations to which the theory of probability is applied in practice are all finite. In the same way, we apply in practice the mechanics of particles to the treatment of problems concerned with bodies of finite size which are not geometrical points. Nevertheless, nobody will deny the utility and theoretical importance of the abstraction underlying the concept of a material point, and this despite the fact that we now have theories of mechanics which are not based on the consideration of discrete points. On the other hand, abstractions that originally belonged to the mechanics of particles permeate far into the mechanics of finite bodies. We need not enter into details here.

It is doubtless possible to avoid the notion of infinite sequences in dealing with mass phenomena or repetitive events. The question is, what would be the results of such a method? I do not know of any argument for using infinite sequences, apart from the greater simplicity of this method, and I have never claimed for it any other advantages. In 1934, Johannes Blume¹⁴ set himself the task of transforming my theory in such a way as to use only finite sequences of observations, especially in the fundamental definitions. His procedure is this: Instead of the postulate concerning the limits of the relative frequencies, he assumes the existence of certain fixed numbers determining the distribution of the collective, and postulates that the values of the actual relative frequencies should differ from these numbers by no more than a small positive magnitude ε. Assuming that ε is sufficiently small, it is possible to perform certain operations on these finite collectives, constantly remaining within the limits of an approximation defined by the magnitude ε. As far as this method actually goes, it amounts to nothing more than a circumscription of the concept of a limiting value, which may be quite useful for certain purposes. This has been stressed already by A. Kolmogoroff¹⁵ in his review of Blume's work. The word 'limit' is in fact used in mathematics only as a concise way of making certain statements concerning small deviations. On the other hand, neither Blume nor other authors working in the same direction have so far been successful in describing in the language of the 'finite' theory all properties of a collective and all connexions between collectives, especially those relating to the principle of randomness. At the present time, therefore, I do not think that we can speak of the actual existence of a theory of probability based on finite collectives.¹⁶

Here I should like to insert an historical interpolation. The philosopher Theodor Fechner,¹⁷ who had many-sided interests, created, under the name of 'Kollektivmasslehre', a kind of systematic description of finite sequences of observations, which he called 'finite populations' (Kollektivgegenstände). This work was edited by Lipps in 1897, after the death of the author. Fechner probably did not think of the possibility of arriving at a rational concept of probability from such an abstraction as his 'finite population', but his views have served, at least for me, as a stimulus in developing the new concept of probability.

Returning to our subject, I must defend myself most emphatically against the recurring misunderstanding that in our theory infinite sequences are always substituted for finite sequences of observations. This is of course false. In an example discussed at the end of the preceding lecture, we spoke of the group of twenty-four throws of a pair of dice. Such a group can serve as the subject of our theory, if

it is assumed that it has been repeated, as a whole, an infinite number of times and in this way has become an element of a collective. This leads us to certain statements about probability that apply to a finite number of observations, in this example, twenty-four. Similarly, if we consider, for instance, the birth rate of boys in a hundred different towns, our theory shows what can be expected, on the average, in the case of this finite number (n = 100) of observations. There is no question of substituting an infinite sequence for each group of 100 observations. This point will be discussed in greater detail in the fifth lecture of this series, which will be concerned with the problems of statistics.
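The idea that a whole group of twenty-four throws forms one element of a collective can be imitated numerically. The following is a rough sketch added for illustration, under several assumptions that are not in the text: the attribute of a group is taken to be 'at least one double six' (the classical question about twenty-four throws of a pair of dice), a pseudo-random generator with a fixed seed stands in for real dice, and 20,000 repetitions stand in for the unlimited repetition. The function and variable names are hypothetical.

```python
import random


def group_has_double_six(rng, throws=24):
    """One element of the collective: a whole group of `throws` casts of two dice."""
    hit = False
    for _ in range(throws):
        first, second = rng.randint(1, 6), rng.randint(1, 6)
        if first == 6 and second == 6:
            hit = True
    return hit


rng = random.Random(0)            # fixed seed, only to make the run repeatable
groups = 20_000                   # finite stand-in for the infinite repetition
hits = sum(group_has_double_six(rng) for _ in range(groups))
observed = hits / groups

# Value computed from the multiplication rule for independent casts.
calculated = 1 - (35 / 36) ** 24

print(f"relative frequency over {groups} groups: {observed:.4f}")
print(f"calculated probability:               {calculated:.4f}")
```

Each pass through `group_has_double_six` yields one observation whose attribute belongs to the group as a whole; the statement about probability concerns the repetition of such groups, not any lengthening of the group itself.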
TESTING PROBABILITY STATEMENTS

The problem of formulating a theory of finite collectives, in the sense explained above, must be clearly distinguished from that of the actual interpretation of the results of our probability calculations. Since we consider that the sole purpose of a scientific theory is to provide a mental image of objectively observable phenomena, the only test of such a theory is the extent to which it applies to actual sequences of observations, and these are always finite.

On the other hand, I have mentioned on many occasions that all the results of our calculations lead to statements which apply only to infinite sequences. Even if the subject of our investigation is a sequence of observations of a certain given length, say 500 individual trials, we actually treat this whole group as one element of an infinite sequence. Consequently, the results apply only to the infinite repetition of sequences of 500 observations each. It might thus appear that our theory could never be tested experimentally.

This difficulty, however, is exactly the same as that which occurs in all applications of science. If, for instance, a physical or a chemical consideration leads us to the conclusion that the specific weight of a substance is 0.897, we may try to test the accuracy of this conclusion by direct weighing, or by some other physical experiment. However, the weight of only a finite volume of the substance can be determined in this way. The value of the specific weight, i.e., the limit of the ratio weight/volume for an infinitely small volume, remains uncertain just as the value of a probability derived from the relative frequency in a finite sequence of observations remains uncertain. One might even go so far as to say that specific weight does not exist at all, because the atomic theory of matter makes impossible the transition to the limit of an infinitely small homogeneous volume. As a parallel to this difficulty we may consider, for instance, the fact that it is impossible to make an infinitely long sequence of throws with one and the same die, under unchanged conditions, because of the gradual wear of the die.

One could say that, after all, not all physical statements concern limits, for instance, that the indication of the weight of a certain finite volume of matter is likewise a physical statement. However, as soon as we begin to think about a really exact test of such a statement, we run into a number of conditions which cannot even be formulated in an exact way. For instance, the weighing has to be carried out under a known air pressure, and this notion of air pressure is in turn founded on the concept of a limit. An experienced physicist knows how to define conditions under which an experimental test can be considered as 'valid', but it is impossible to give a logically complete description of all these conditions in a form comparable, for instance, to that in which the premises of a mathematical proposition are stated. The assumption of the correctness of a theory is based, as H. Dubislav justly states, not so much on a logical conclusion (Schluss) as on a practical decision (Entschluss).

I quite agree with the view which Carl G. Hempel¹⁸ put forward in his very clearly written article on 'The Content of Probability Statements'. According to Hempel, the results of a theory based on the notion of the infinite collective can be applied to finite sequences of observations in a way which is not logically definable, but is nevertheless sufficiently exact in practice. The relation of theory to observation in this case is essentially the same as in all other physical sciences.

Considerations of this kind are often described as inquiries into the 'problem of application'. It is, however, very definitely advisable to avoid the introduction of a 'problem of applicability', in addition to the two problems, the observations and their theory. There is no special theory, i.e., a system of propositions, deductions, proofs, etc., that deals with the question of how a scientific theory is to be applied to the actual observations. The connexion between the empirical world and theory is established in each case by the fundamental principles of the particular theory, which are usually called its axioms. This remark is of special importance to us because occasional attempts have been made to assign to the theory of probability the role of such a general 'application theory'. This conception fails at once when we realize that a new problem of application would arise in connexion with each single statement of the calculus of probability.

AN OBJECTION TO THE FIRST POSTULATE

than 0.51 may be, for example, 0.00001. According to our theory, this means that if we repeat these sets of n throws an infinite number of times, we shall find that, on the average, 1 in 100,000 of these sets contains more than 51% even results. The frequency which is considered in this example is that in a finite set of n casts and is obtained by the division of the n₁ even numbers in the set by the fixed total number n of throws.

On the other hand, when defining the probability of 'even' we consider a relative frequency of a different kind. In fact, we consider the whole sequence of all experiments, without dividing it into sets of n, and count the number of even numbers from the beginning of the sequence. If N throws have been made altogether, and N₁ of them have given 'even' results, the quotient N₁/N is the frequency considered, and we assume that this fraction, in which both the denominator and the numerator increase indefinitely, tends to a constant limiting value. In our case this value would be 1/2. No immediate connexion exists between the two propositions of which one postulates the existence of a limiting value of the ratio N₁/N, for N tending to infinity, and the other states the occurrence of certain sets of the given fixed length n which exhibit an unusual value of the frequency n₁/n. There is therefore no contradiction between the two statements. The idea of such a contradiction could only arise from an incomplete and inexact formulation of the problem. One of the purposes of our next lecture will be to inquire more closely into the relation between these two statements, and we shall find not only that they are reconcilable but that the Law of Large Numbers acquires its proper sense and full importance only by being based on the frequency definition of probability.
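The two kinds of frequency distinguished here can be put side by side in a short computation. This is a sketch added for illustration under stated assumptions, not the author's procedure: the die is taken to be correct, so that 'even' has probability 1/2 in every throw and the throws combine by the multiplication rule, and a seeded pseudo-random generator stands in for actual throws. The names `prob_freq_exceeds` and `running_frequency` are hypothetical. The first function gives the exact probability, within the collective whose elements are sets of n throws, that the frequency n₁/n exceeds 0.51; the second gives the single frequency N₁/N of one long undivided sequence.

```python
import random
from fractions import Fraction
from math import comb


def prob_freq_exceeds(n, num=51, den=100):
    """Exact P(n1/n > num/den) for n throws, each 'even' with probability 1/2."""
    k_min = (num * n) // den + 1          # smallest n1 with n1/n > num/den
    term = comb(n, k_min)                 # number of sequences with exactly k_min evens
    favourable = 0
    for k in range(k_min, n + 1):
        favourable += term
        term = term * (n - k) // (k + 1)  # C(n, k+1) obtained from C(n, k)
    return Fraction(favourable, 2 ** n)


def running_frequency(total_throws, seed=0):
    """N1/N for one long simulated sequence, not divided into sets."""
    rng = random.Random(seed)
    evens = sum(rng.randint(1, 6) % 2 == 0 for _ in range(total_throws))
    return evens / total_throws


tail = {n: float(prob_freq_exceeds(n)) for n in (100, 1_000, 10_000)}
for n, p in tail.items():
    print(f"n = {n:6d}: P(n1/n > 0.51) = {p:.5f}")

print(f"N1/N after 200000 throws: {running_frequency(200_000):.4f}")
```

For every fixed n the first quantity is a small but positive number that shrinks as n grows, while the second is a statement about one sequence; the two refer to different collectives, which is why no contradiction arises between them.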
OBJECTIONS TO THE CONDITION OF RANDOMNESS

The majority of mathematicians now agree that the concept of an infinite sequence of observations or attributes is an appropriate foundation for a rational theory of probability. A certain objection, resulting from a vague recollection of the classical theory, is raised, however, by many who hear for the first time the definition of probability as the limiting value of the relative frequency. I will discuss this objection briefly, although it does not stand close examination; it belongs logically to the problems which I am going to discuss in my next lecture dealing with the Laws of Large Numbers.

The objection¹⁹ refers in fact to the text of the theorem of Bernoulli and Poisson which I have mentioned previously. According to this proposition, it is 'almost certain' that the relative frequency of even numbers in a very long sequence of throws with a correct die will lie near to the probability value 1/2. Nevertheless, a certain small probability exists that this relative frequency will differ slightly from 0.5; it may be equal to 0.51, for instance, even if the sequence is a very long one. This is said to contradict the assumption that the limiting value of the relative frequency is exactly equal to 0.5. In other words, so runs the objection, the frequency theory implies that, with a sufficient increase in the length of the sequence of observations, the difference between the observed relative frequency and the value 0.5 will certainly (and not almost certainly) become smaller than any given small fraction; there is no room for the deviation 0.01 from the value 0.50 occurring with a finite, although small, probability even in a sufficiently long sequence of observations.

This objection is based on nothing but an inexact wording and may be easily disposed of. The above-mentioned law does say something about the probability of a certain value of relative frequency occurring in a group of n experiments. We therefore have to know what probability means if we are to interpret the statement.
According to our definition, the whole group of n consecutive throws has to be considered as one element in a collective, in the same way as this was done before with groups of four and of twenty-four throws. The attribute in the collective which we now consider is the frequency of the attribute 'even' in a group of n throws. Let us call this frequency x. It can have one of the n + 1 values, 0, 1/n, 2/n, ..., to n/n = 1. If 'even' appears n₁ times in a series of n throws, the attribute is the fraction x = n₁/n. Each of these n + 1 different values of x has a certain probability. The probability that x has a value greater

I shall now consider the objections which have been raised to my second condition, that of randomness. Let us restate the problem. We consider an infinite sequence of zeros and ones, i.e., the successive outcomes of a simple alternative. We say that it possesses the property of randomness if the relative frequency of 1's (and therefore also that of 0's) tends to a certain limiting value which remains unchanged by the omission of a certain number of the elements and the construction of a new sequence from those which are left. The selection must be a so-called place selection, i.e., it must be made by means of a formula which states which elements in the original sequence are to be selected and retained and

which discarded. This formula must leave an infinite number of retained elements and it must not use the attributes of the selected elements, i.e., the fate of an element must not be affected by the value of its attribute.

Examples of place selection are: the selection of each third element in the sequence; the selection of each element whose place number, less 2, is the square of a prime number; or the selection of each number standing three places behind one whose attribute was 0.

The principle of randomness expresses a well-known property of games of chance, namely, the fact that the chances of winning or losing in a long series of games, e.g., of roulette, are independent of the system of gambling adopted. Betting on 'black' in each game gives the same result, in the long run, as doing so in every third game, or after 'black' has appeared five times in succession, and so on.

In my first publication in 1919 (see auto-bibliogr. note), I gave much space to the discussion of the concept of randomness. Among other propositions, I derived the following 'Theorem 5': 'A collective is completely determined by the distribution, i.e., by the (limits of the) relative frequencies for each attribute; it is however impossible to specify which elements have which attributes.' In the discussion of this proposition, I said further that 'the existence of a collective cannot be proved by means of the actual analytical construction of a collective in a way similar, for example, to the proof of existence of continuous but nowhere differentiable functions, a proof which consists in actually writing down such a function. In the case of the collective, we must be satisfied with its abstract "logical" existence. The proof of this "existence" is that it is possible to operate with the concept of a collective without contradictions arising.'
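The place selections named above, and the gambling system of waiting for five successes in succession, can be tried on a simulated alternative. This is a minimal sketch added for illustration, with several assumptions: the sequence comes from a seeded pseudo-random generator, which is itself a formula and therefore only a practical stand-in for a collective, not an instance of one; the sequence is finite; and all function names are hypothetical. The selection by squares of primes is omitted because it retains too few elements of a finite run to show anything.

```python
import random


def every_third(bits):
    """Keep the elements with place numbers 3, 6, 9, ..."""
    return [b for place, b in enumerate(bits, start=1) if place % 3 == 0]


def three_after_zero(bits):
    """Keep each element standing three places behind one whose attribute was 0."""
    return [bits[i] for i in range(3, len(bits)) if bits[i - 3] == 0]


def after_five_ones(bits):
    """Keep each element that follows five consecutive 1's (a 'gambling system')."""
    return [bits[i] for i in range(5, len(bits)) if all(bits[i - 5:i])]


def frequency_of_ones(bits):
    return sum(bits) / len(bits)


rng = random.Random(1)                       # fixed seed for a repeatable run
tosses = [rng.randint(0, 1) for _ in range(400_000)]

results = {
    "whole sequence": frequency_of_ones(tosses),
    "every third": frequency_of_ones(every_third(tosses)),
    "three after a 0": frequency_of_ones(three_after_zero(tosses)),
    "after five 1's": frequency_of_ones(after_five_ones(tosses)),
}
for name, value in results.items():
    print(f"{name:16s} {value:.4f}")
```

None of the three selectors looks at the attribute of the element it is deciding about; each uses only the place number or the attributes of earlier elements, which is what makes it a place selection in the sense defined above.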
Today, I would perhaps express this thought in different words, but the essential point remains: A sequence of zeros and ones which satisfies the principle of randomness cannot be described by a formula or by a rule such as: 'Each element whose place number is divisible by 3 has the attribute 1; all the others the attribute 0'; or 'All elements with place numbers equal to squares of prime numbers plus 2 have the attribute 1, all others the attribute 0'; and so on. If a collective could be described by such a formula, then, using the same formula for a place selection, we could select a sequence consisting of 1's (or 0's) only. The relative frequency of the attribute 1 in this selected sequence would have the limiting value 1, i.e., a value different from that of the same attribute in the initial complete sequence.

It is to this consideration, namely, to the impossibility of explicitly describing the succession of attributes in a collective by means of a formula that critics of the randomness principle attach their arguments. Reduced to its simplest form, the objection which we shall have to discuss first asserts that sequences which conform to the condition of randomness do not exist. Here, 'nonexistent' is equivalent to 'incapable of representation by a formula or rule'. A variant of this objection counters the joint use of the second with the first axiom, that of randomness with that of limiting values. The argument runs, roughly, as follows.

The existence or nonexistence of limiting values of the frequencies of numbers composing a sequence, say 1's and 0's, can be proved only if this sequence conforms to a rule or formula. Since, however, in a sequence fulfilling the condition of randomness the succession of attributes never conforms to a rule, it is meaningless to speak of limiting values in sequences of this kind.
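The remark that a sequence described by a formula cannot be random can be made concrete with the first rule quoted above ('each element whose place number is divisible by 3 has the attribute 1'). The following Python sketch is an illustration under that single assumption; the names `lawful_sequence` and `select_by_same_rule` are hypothetical.

```python
def lawful_sequence(n):
    """First n elements of the sequence with attribute 1 at places divisible by 3."""
    return [1 if place % 3 == 0 else 0 for place in range(1, n + 1)]


def select_by_same_rule(seq):
    """Place selection that reuses the defining formula: keep places divisible by 3."""
    return [x for place, x in enumerate(seq, start=1) if place % 3 == 0]


sequence = lawful_sequence(30_000)
selected = select_by_same_rule(sequence)

# Relative frequency of the attribute 1 before and after the place selection.
print("original sequence:", sum(sequence) / len(sequence))
print("selected sequence:", sum(selected) / len(selected))
```

The relative frequency of 1's is one third in the original sequence but exactly 1 in the selected one, so the limiting value is not insensitive to this place selection and the sequence is not a collective.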
RESTRICTED RANDOMNESS

One way to avoid all these difficulties would seem to consist in effectively restricting the postulate of randomness. Instead of requiring that the limiting value of the relative frequency remain unchanged for every place selection, one may consider only a predetermined definite group of place selections.

In the example which we discussed at the end of the second lecture, we made use of a frequently recurring, typical place selection. Starting with an infinite sequence of elements, we first selected the 1st, 5th, 9th, 13th, . . . elements; then the elements numbered 2, 6, 10, 14, . . .; following this, the numbers 3, 7, 11, 15, . . .; and finally 4, 8, 12, 16, . . . We assumed that in each of these partial sequences the limiting frequencies of the various attributes were the same as in the original sequence, and furthermore that the four partial sequences were 'independent' in the sense required for the operation of combination, i.e., that the limiting frequencies in the new sequences which are formed by combination and whose attributes are four-dimensional could be computed according to the simple rule of multiplication. The same reasoning holds true if instead of the value n = 4 we consider any other integral value for n, such as n = 24, or n = 400. A sequence of elements which has the above-described property for every n is today generally called a Bernoulli sequence. The American mathematician A. H. Copeland²⁰ and later on myself,²¹ in a simpler way, have shown how it is actually possible to construct Bernoulli sequences. By following explicitly prescribed rules, one can form an infinite sequence of 0's and 1's which satisfies the above-stated conditions for every n.

Copeland has also shown that Bernoulli sequences have other interesting properties. If a partial sequence is formed out of those elements which follow a predetermined group of results, e.g., a group of five elements consisting of four 1's with a 0 in the middle, then in such a sequence the limiting frequency of the 1 (and of course also of the 0) will remain unchanged. We may therefore say that Bernoulli sequences are those without aftereffects. This property is called 'freedom from aftereffect'.

These facts seem to indicate that it might be sufficient to require that a collective should be of the Bernoulli type. Since it is explicitly possible to construct Bernoulli sequences, this restriction would dispose of all arguments against the existence of such collectives. Let us, however, consider what we would lose by thus restricting the condition of randomness.

Whereas we would undoubtedly be able to deal with questions of the type of the problem of the Chevalier de Méré, discussed in the preceding lecture, and would be able to proceed in the same way, there is, on the other hand, no doubt that a number of other meaningful questions would now remain unanswered. What happens, for instance, if a player decides, at the beginning, that he will consider only the first, second, third, fifth, seventh, eleventh, . . . casts of the die, that is to say, only those whose order number is a prime number? Will this change his chances of winning or not? Will the same rule of combination hold true in the sequence obtained through the place selection by prime numbers?

If, instead of restricting ourselves to Bernoulli sequences, we consider some differently defined class of sequences, we do not improve the state of affairs.
In every case it will be possible to indicate place selections which will fall outside the framework of the class of sequences which we have selected. It is not possible to build a theory of probability on the assumption that the limiting values of the relative frequencies should remain unchanged only for a certain group of place selections, predetermined once and for all. All the same, we shall see that the consideration of sequences such as Bernoulli sequences and others, which satisfy conditions of restricted randomness, will prove valuable in solving certain questions in which we are interested.
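The two kinds of place selection contrasted in this section can be compared numerically. The sketch below, in Python, is an illustration under stated assumptions: a pseudo-random generator stands in for the casts of a die (the attribute is 1 when a six appears), and the names `simulate_sixes`, `residue_class`, `combine` and `prime_places` are hypothetical. It splits the sequence into the four partial sequences with place numbers 1, 5, 9, . . .; 2, 6, 10, . . .; and so on, compares the frequency of one four-dimensional attribute with the value given by the rule of multiplication, and then applies the place selection by prime numbers. Unlike the player in the text, the sketch uses the modern convention and does not count 1 as a prime.

```python
import random
from collections import Counter


def simulate_sixes(n, seed=1):
    """Pseudo-random stand-in (an assumption): 1 if a cast shows a six, else 0."""
    rng = random.Random(seed)
    return [1 if rng.randint(1, 6) == 6 else 0 for _ in range(n)]


def residue_class(seq, start, n):
    """Elements at places start, start + n, start + 2n, ... (places counted from 1)."""
    return seq[start - 1::n]


def combine(seq, n):
    """Group consecutive blocks of n elements into n-dimensional attributes."""
    usable = len(seq) - len(seq) % n
    return [tuple(seq[i:i + n]) for i in range(0, usable, n)]


def prime_places(seq):
    """Elements whose place number is a prime (sieve of Eratosthenes)."""
    n = len(seq)
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [seq[k - 1] for k in range(2, n + 1) if sieve[k]]


def frequency(bits):
    return sum(bits) / len(bits)


casts = simulate_sixes(400_000)
p = frequency(casts)
print("whole sequence:", round(p, 4))
for start in range(1, 5):
    print(f"places {start}, {start + 4}, ...:", round(frequency(residue_class(casts, start, 4)), 4))

blocks = Counter(combine(casts, 4))
observed = blocks[(1, 0, 0, 0)] / sum(blocks.values())
print("attribute (1,0,0,0):", round(observed, 4), "product rule:", round(p * (1 - p) ** 3, 4))
print("prime places:", round(frequency(prime_places(casts)), 4))
```

For a simulated sequence of this kind one would expect all of these frequencies to agree closely, including the one obtained by prime-number selection. The point of the text is that the Bernoulli property alone guarantees only the first group of results; the last one requires a stronger condition of randomness.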
MEANING OF THE CONDITION OF RANDOMNESS

In our theory of probability we have given first place to the proposition that in the sequence of observations under consideration the relative frequency of each attribute has a limiting value independent of any place selection. Let us review once more what we mean by this postulate. To be sure, it is not possible to prove it. Even if it were possible to form infinite series of observations, we would not be able to test any one of them for its insensitivity against all place selections, if for no other reason, because we are not in a position to enumerate all place selections. The axioms of science are not statements of facts. They are rules which single out the classes of problems to which they apply and determine how we are to proceed in the theoretical consideration of these problems. If we say in classical mechanics that the mass of a solid body remains unchanged in time, then all we mean is that, in every individual problem of mechanics concerned with solid bodies, it will be assumed that a definite positive number can be attributed to the body under consideration; this number will be called its mass and will figure as a constant in all calculations. Whether this is 'correct' or not can be tested only by checking whether the predictions concerning the behaviour of the body made on the basis of such calculations coincide with observations. Another reason for rejecting the axiom of a constant mass would be, of course, that it presented a logical contradiction with other assumptions. This, however, would merely imply that calculations based on all assumptions together would lead to mutually contradictory predictions.

Let us now see what kind of prescriptions follow from the axiom of randomness.
After all that has been said in the first and second lectures, it can only be this: We agree to assume that in problems of probability calculus, that is, in deriving new collectives from known ones, the relative frequencies of the attributes remain unchanged whenever any of the sequences has been subjected to one or more place selections. We do not ask, at this moment, whether such an assumption is appropriate, i.e., whether it will lead us to useful results. All we ask now is whether this procedure may cause contradictions. This question can be answered clearly, as I shall show below. But first, I must insert some words of explanation introducing an important mathematical concept.

A quantity which cannot be expressed by a number, in the usual sense of the word, is said to be infinite. However, following Georg Cantor, the great founder of the theory of sets, modern mathematics distinguishes between several kinds of infinity. I shall assume as known what is meant by the infinite sequence of natural numbers. If it is possible to establish a one-to-one correspondence between the elements of a given infinite set and the natural numbers, then we say that the set considered is enumerable or enumerably infinite. In other words, an infinite set is said to be enumerable whenever it is possible to number all its elements. The set of all numbers which represent squares of integers and also the set of all fractions having integers as numerators and denominators are enumerably infinite. On the other hand, the set of all numbers lying between two fixed limits, say, between 1 and 2, or the set of all points in a given interval are not enumerable. At least, it has not yet been possible to devise a theory of the set of points in an interval which would not use some other essential concept besides that of enumeration. The set of all points in an interval is said to be 'nonenumerable' or, more specifically, 'continuously infinite'. This distinction between enumerable and continuously infinite sets is of the greatest importance in many problems of mathematics. Using this concept, we will explain the present stage of our knowledge with respect to the consistency of the axiom of randomness.
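To say that a set is enumerable is to say that a rule can be given which assigns a place number to each of its elements. The following Python sketch illustrates this for the two examples mentioned above, the squares of integers and the fractions. It is an illustration only: it is restricted to positive fractions for brevity, it uses the familiar device of walking through the fractions in order of the sum of numerator and denominator, and the names `enumerate_squares`, `enumerate_positive_fractions` and `place_number` are hypothetical.

```python
from fractions import Fraction
from itertools import islice
from math import gcd


def enumerate_squares():
    """Number the squares of the positive integers: 1, 4, 9, ..."""
    k = 1
    while True:
        yield k * k
        k += 1


def enumerate_positive_fractions():
    """Number all positive fractions p/q, each value exactly once.

    Fractions are visited in order of p + q = 2, 3, 4, ...; a fraction not in
    lowest terms is skipped because its value has already received a number.
    """
    total = 2
    while True:
        for p in range(1, total):
            q = total - p
            if gcd(p, q) == 1:
                yield Fraction(p, q)
        total += 1


def place_number(target):
    """Place number assigned to a positive fraction by the enumeration above."""
    if target <= 0:
        raise ValueError("this sketch enumerates positive fractions only")
    for place, value in enumerate(enumerate_positive_fractions(), start=1):
        if value == target:
            return place


print(list(islice(enumerate_squares(), 5)))
print([str(f) for f in islice(enumerate_positive_fractions(), 9)])
print(place_number(Fraction(2, 3)))
```

Every positive fraction is reached after finitely many steps, which is exactly the one-to-one correspondence with the natural numbers required by the definition. No comparable rule exists for the points of an interval.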
CONSISTENCY OF THE RANDOMNESS AXIOM

During the last twenty years, a number of mathematicians have worked on this question. I name here in particular, K. Dörge,²² A. H. Copeland,²³ A. Wald,²⁴ and W. Feller.²⁵ Although both the starting points and the aims of their respective investigations vary, all of them unequivocally bring out this same result: Given a sequence of attributes, the assumption that the limits of the relative frequencies of the various attributes are insensitive to any finite or enumerably infinite set of place selections cannot lead to a contradiction in a theory based on this assumption. It is not necessary to specify the type or properties of the place selections under consideration. It can be shown that, whatever enumerably infinite set of place selections is used, there exist sequences of attributes which satisfy the postulate of insensitivity. It can even be stated that 'almost all' (and this expression has a precise meaning which I cannot go into here) sequences of attributes have the required property. This last statement implies that collectives are in a sense 'the rule', whereas lawfully ordered sequences are 'the exception', which is not surprising from our point of view.

I know of no problem in probability in which a sequence of attributes is subjected to more than an enumerably infinite number of place selections, and I do not know whether this is even possible. Rather, it might be in the spirit of modern logic to maintain that the total number of all the place selections which can be indicated is enumerable. Moreover, it has in no way been proved that if a problem should require the application of a continuously infinite number of place selections this would lead to a contradiction. This last question is still an open one.

But whatever the answer may be, from what we know so far, it is certain that the probability calculus, founded on the notion of the collective, will not lead to logical inconsistencies in any application of the theory known today. Therefore, whoever wishes to reject or to modify my theory cannot give as his reason that the theory is 'mathematically unsound'.

A PROBLEM OF TERMINOLOGY

I must now say a few words about another question, which is solely one of terminology. It has sometimes been said that a deficiency of my theory consists in the exclusion of certain purely mathematical problems connected with the existence of limiting values of relative frequencies in sequences of numbers defined by formulae. It is not my intention to exclude anything. I have merely introduced a new name, that of a collective, for sequences satisfying the criterion of randomness. I think further that it is reasonable to use the word 'probability' only in connexion with the relative frequencies of attributes in sequences of this special kind. My purpose is to devise a uniform terminology for all investigations concerning the problems of games of chance and similar sequences of phenomena. It is open to everyone to use the term 'probability' with a more general meaning, e.g., to say that in going through the natural sequence of numbers the probability of encountering an even number is 1/2. It will, however, then be up to him to explain the difference existing, from his point of view, between, say, the natural sequence of integers and the sequence of the results of 'odd' and 'even' in a game of dice. This problem is not solved by a change in terminology.

Neither am I willing to concede that a theory is more general or superior because it is based on some notion of 'limited randomness', and therefore includes a greater variety of sequences. There still remains the essential difficulty of indicating the characteristics by which sequences such as those formed by the successive results of a game of chance differ from others. On the other hand, a probability theory which does not even try to define the boundaries of this special field, far from being superior to mine, fails, in my opinion, to fulfil the most legitimate demands.

I intend to show later, by certain examples, how sequences which do not possess the properties of collectives can be derived from collectives (in my sense) by means of operations which do not belong to the system of the four fundamental operations discussed above. In so far as sequences of this kind are of practical interest (e.g., certain so-called 'probability chains'), they belong within the framework of my theory; but I do not see any harm in denying the name 'probabilities' to the limiting values of the relative frequencies in such sequences. In my opinion, it is both convenient and useful to call these values simply 'limiting frequencies', or, as I have suggested earlier, to use a word such as 'chance'. Of course, there is no logical need for this cautious use of the word probability; it is quite possible that, once the frequency theory has been firmly established, more freedom can be allowed in the use of the terms.
OBJECTIONS TO THE FREQUENCY CONCEPT

As I have mentioned previously, the frequency theory of probability has today been accepted by almost all mathematicians interested in the calculus of probability or in statistics. This is usually expressed by the phrase that probability means an 'idealized frequency' in a long sequence of similar observations. I believe that by introducing the notion of the collective I have shown how this 'idealization' is obtained and how it leads to the usual propositions and operations of probability calculus.

On the other hand, there have been in the past and there still are a few authors who recommend applying the theory of probability in cases which in no way deal with frequencies and mass observations. To cite an older example: Eduard v. Hartmann,²⁶ in the introduction to his Philosophy of the Unconscious (1869), derives mathematical formulae for the probability of natural events being due to spiritual causes and finds it to be equal to 0.5904. I have earlier mentioned the economist John Maynard Keynes,²⁷ a persistent subjectivist. According to his opinion, probability ceases to be a trustworthy guide in life if the frequency concept is adopted. It seems to me that if somebody intends to marry and wants to find out 'scientifically' if his choice will probably be successful, then he can be helped, perhaps, by psychology, physiology, eugenics, or sociology, but surely not by a science which centres around the word 'probable'. The point of view of the geophysicist Harold Jeffreys²⁸ is similar to that of Keynes. In his book Scientific Inference (1931), he goes even further and says that any probability, in the widest sense of the word, can be expressed by a number. If, for example, a newborn child has seen only blue and red objects so far in his life, there exists for this child a numerical probability of the next colour being yellow; this probability, however, is not supposed to be determined in any way by statistical observations. Other arguments of the subjectivists have been presented earlier in connexion with the question of equally possible cases.

In recent years, the Keynes-Jeffreys point of view has found some support; efforts have been made to construct a rigorous system of subjective probability. Let us briefly describe these attempts.

THEORY OF THE PLAUSIBILITY OF STATEMENTS

In an interesting paper (1941), the mathematician G. Pólya²⁹ takes as his starting point the following historical fact. Referring to a proposition concerning an unproved property of integers, Euler stated that this proposition was 'probably correct', since it was valid for the numbers 1 to 40 as well as for the numbers 101 and 301. Even though such inductive reasoning is not otherwise customary in mathematics, or perhaps just because of this fact, Pólya considers this argument worthy of further investigation. He proposes that in such instances one might speak of 'plausibility' instead of probability. We are quite willing from our point of view to accept this terminological suggestion. Pólya arrives essentially at the following conclusions: (1) There are objective rules, i.e., rules accepted by all, on how to judge plausibility; e.g., if the number of known instances which support a proposition is increased, the plausibility increases; if an hypothesis on which the proposition could be founded is shown to be incorrect, the plausibility is decreased. (2) A numerically nondeterminable figure, between 0 and 1, corresponds to every plausibility. (3) The formulae of the calculus of probability are qualitatively applicable to plausibility considerations.

The first of the above conclusions, namely, that there are generally accepted rules for judging plausibility, will not be contended. What is meant by mathematical formulae being qualitatively applicable is not quite clear to me. Perhaps this means that merely statements of inequalities and not of equalities can be made, though even that would require that the plausibilities could be ordered in a sequence such as that of the real numbers. But my main objection to Pólya's plausibility theory is the following:

The plausibility of Euler's Theorem does not rest exclusively, or even essentially, on his forty-two particular instances. If it did, we might state equally well that all numbers of the decimal system could be represented by at most three digits, or that no number is the product of more than six prime numbers. The essential, or at least an essential, reason for the plausibility of Euler's theorem lies in the fact that it does not contradict any well-known and easily checked property of the integers. Moreover, if we pay attention to this theorem we do so because it was formulated by Euler and we know that he had a comprehensive knowledge of the theory of numbers. How are we to weigh these facts in judging the plausibility in question? Should we then count the number of properties which a theorem does not contradict? Would we have to conclude that plausibility will increase with every new property with which the theorem does not conflict?

As I have stated, Pólya does not attempt to express the plausibility of a statement by a definite number. Other authors are less reserved. R. Carnap,³⁰ who belonged to the Vienna Circle of Logical Positivism, now supports a theory of 'inductive logic' where he uses the expression 'probability 1' for the plausibility of a judgment, whereas the idealized frequency is called 'probability 2'. Both of these are said to follow the usual rules of probability calculus. In Carnap's opinion, the difference between Jeffreys' view and mine consists in the fact that one of us talks of 'probability 1' and the other of 'probability 2'. Within the framework of theory 1, Carnap formulates the following proposition: On the basis of today's meteorological data, the probability that it will rain tomorrow is 0.20.
However, 'the value 0.20, in this statement, is not attributed to tomorrow's rain but to a definite logical relationship between the prediction of rain and the meteorological data. This relationship being a logical one . . . does not require any verification by observation of tomorrow's weather or any other observation.' Carnap does not state how the figure 0.20 is to be derived from the meteorological data. No meteorologist would fail to say that such a deduction is ultimately based on statistical experience. This, however, would bring us right back to probability 2. Carnap's theory would need to indicate how, by starting with propositions expressing the meteorological data, we arrive, by means of logical operations, at the figure 0.20 (or any other figure). His theory is, however, unable to show this.

The same unbridgeable gap exists in other systems which seek to define 'a purely logical notion of the plausibility of an hypothesis on the basis of given facts', using in an elaborate way the formal tools of symbolic logic and large doses of mathematics. C. G. Hempel and P. Oppenheim,³¹ who attempted to do this, had to resort in the end to the admission of statistical observations as an essential basis, thus recognizing that mass phenomena and repetitive events are actually the subject of their theory. I certainly do not wish to contest the usefulness of logical investigations, but I do not see why one cannot admit to begin with that any numerical statements about a probability 1, about plausibility, degree of confirmation, etc., are actually statements about relative frequencies.
THE NIHILISTS

Finally, it is necessary to say a few words about those contemporary mathematicians who profess, more or less explicitly, that there is no need to give any definition or explanation of the notion of probability: What probability is, everybody knows who uses everyday language; and the task of the theory of probability is only to determine the exact values of these probabilities in different special cases. Such mathematicians completely misunderstand the meaning of exact science. I think that I have already said in the first lecture all that need be said about this question. It is essentially true that, historically, such a conception forms the starting point of scientific development. All theories arise primarily from the wish to find relations between certain notions whose meaning seems to be firmly established. In the course of such investigations, it is often found that not every notion for which the usual language has a word is an appropriate basis for theoretical deductions. In all fields in which science has worked for a sufficiently long time, a number of new artificial or theoretical concepts have been created. We know that this process is an essential part of scientific progress. Everywhere, from the most abstract parts of mathematics to the experimental physical sciences, in so far as they are treated theoretically, the exact definition of concepts is a necessary step which precedes the statement of propositions or goes parallel to it.

We may find an example in the modern development of physics. In the whole history of theoretical physics until the beginning of the present century, the notion of two simultaneous events occurring at two different points was considered to be self-evident and in no need of further explanation. Today, every physicist knows, as an essential consequence of Einstein's special theory of relativity, that the notion of simultaneity requires a definition. A whole theory springs from this definition which is generally considered one of the most fruitful developments of modern physics. This theory must be simply nonexistent for all who think that we know the meaning of simultaneity anyhow, i.e., 'from the usual sense of the word'.³²

I think therefore that the refutation of those who consider every definition of probability to be superfluous can be left to follow its natural course. One reason for mentioning these 'nihilists' is the existence of certain intermediate opinions between their position and our point of view regarding the formation of concepts in an exact science. Some of these middle-of-the-road conceptions should not go unmentioned.
RESTRICTION TO ONE SINGLE INITIAL COLLECTIVE

A point of view typical of the attitude of many mathematicians is represented in A. Kolmogoroff's attractive and important book on the Foundations of the Theory of Probability.³³ To understand this point of view, consider for a moment the purely mathematical aspect of the content of a textbook on the theory of probability. We soon notice that a great many of the calculations are of one and the same type; namely, 'Given the distribution in a certain collective; to determine the probability corresponding to a certain part of the total set of attributes'; this 'part' of the so-called 'attribute space' or 'label space' is often determined in a complicated way; problems of this kind, which in our terminology belong to the class of 'mixing' problems, are sometimes very complicated. The following is an example:

The given collective consists of a combination of n simple alternatives, n being a very large number. The attribute of an element is thus a sequence of n symbols, which are, e.g., 0 or 1, 'red' or 'blue', etc. The probability of each combined result, i.e., each of the 2ⁿ possible combinations of n symbols, is known. We now consider another large number m, smaller than n, together with a variable number x lying between m and n, and a given function f(x) (e.g., the square root of x). One may now ask, what is the probability for the number of 1's among the first x symbols to be smaller than f(x) for all x lying between m and n? This question obviously singles out a certain part of the 2ⁿ possible combinations, a part depending only on the number m and the function f(x), and we are seeking the sum of the probabilities of all attributes belonging to this group. This is a 'mixing' problem. The mathematical solution of such a problem can be a very difficult and complicated one, even if it consists, as in this case, in the application of one single fundamental operation to one given initial collective. In the literature, we find the solution of this problem for the special case of f(x) proportional to the product √x log (log x) with m and n both becoming infinitely large.

Let us return to the general problem in which we are interested. It is quite understandable that mathematicians who are engaged in the solution of difficult problems of a certain kind become inclined to define probability in such a way that the definition fits exactly this type of problem. This may be the origin of the view (which is in general not explicitly formulated), that the calculus of probability deals each time merely with one single collective, whose distribution is subjected to certain summations or integrations. This kind of theory would not need any foundation or 'axioms' other than the conditions restricting the admissible distributions and integrations. The axioms of this theory therefore consist in assumptions concerning the admissible distribution functions, the nature of the sub-sets of the attribute space for which probabilities can be defined, etc.

In the case of probability calculus, these basic mathematical investigations were carried out by Kolmogoroff. They form an essential part of a complete course on the theory of probability. They do not, however, constitute the foundations of probability but rather the foundations of the mathematical theory of distributions, a theory which is also used in other branches of science.

According to our point of view, such a system of axioms cannot take the place of our attempt to clarify and delimit the concept of probability. This becomes evident if we think of the simple case of the die or the coin where the above-indicated mathematical difficulties do not exist or rather where their solution is immediate without drawing on the mathematical theory of sets.³⁴

Our presentation of the foundations of probability aims at clarifying precisely that side of the problem which is left aside in the formalist mathematical conception.

PROBABILITY AS PART OF THE THEORY OF SETS

By consistently developing a theory which deals with only one collective in each problem of probability and merely with one type of operation applied to this collective, we would eventually arrive at the conclusion that there is no theory of probability at all. All that is left of it then are certain mathematical problems of real functions and point sets which in turn can be considered as belonging to other well-known mathematical domains. 'From this point of view', to quote from one of the reviews of Kolmogoroff's book,³⁵ 'the theory of probability appears to lose its individual existence; it becomes a part of the theory of additive set functions'.

In the same manner, some mathematicians proclaimed that hydrodynamics does not exist as a separate science since it is nothing but a certain boundary problem of the theory of partial differential equations. Years ago, when Einstein's theory first became known among mathematicians, some of them said that electrodynamics is from now on a part of the theory of groups.

To a logical mind this identification of two things belonging to different categories, this confusion of task and tool is something quite unbearable. A mathematical investigation, difficult as it may be, and much space as it may occupy in the presentation of a physical theory, is never, and can never be, identical with the theory itself. Still less can a physical theory be a part of a mathematical domain. The interest of the scientist may be concentrated on the mathematical, i.e., the tautological, side of the problem; the physical assumptions on which the mathematical construction is based may be mentioned extremely casually, but the logical relation of the two must never be reversed.

Here is an analogy from another field: A state is not identical with its government; it is not a part of the governmental functions. In certain cases all the external signs of the existence of a state are the actions of its government; but the two must not be identified.

In the same sense probability theory can never become a part of the mathematical theory of sets.
It remains a natural science,a theory of certain observablephenomena,which we have idealizedin the concept of a collective.It makes use of certain propositionsof the theory of sets,especially the theory of integration,to solve the mathematicalproblems arising from ihe defin-itionof collectives. Neither can we concedethe exiitence of a separateconcept of probability basedon the theory of sets, which is sometimes said to contradict the concept of probability based on the notion of relative freouencv. All that remains after our study of the modern formal development of _thisproblem is the rather unimportant statementthat the of probability doesnot require in irs surnmations integra(or 11eolY trons) other mathematical implementsbesides those alreadyexisiing in the general theory of sets.

DEVELOPMENT OF THE FREQUENCY THEORY

During the past decade, the frequency theory founded on the notion of the collective has given rise to a noteworthy development. This evolution seems most promising even though practically applicable formulations have so far not resulted from it. This new theory was founded in Germany (1936) by E. Tornier.36 J. L. Doob37 is today its chief proponent in America. I shall briefly explain its fundamental ideas, in so far as this is possible without presupposing familiarity with the theory of sets on the part of the reader.

At the outset, Tornier introduces in place of the 'collective' the concept of the 'experimental rule'. By that he means an infinite sequence of observations made according to a certain rule; for example, the consecutive results of a game of roulette. He expressly admits the possibility of the result of a certain observation depending on the preceding one or of other connexions. My theory is based on the assumption that all that happens to one given die, or to one given roulette wheel, forms one infinite sequence of events. In Tornier's theory, however, a given experimental rule admits of an infinite number of infinite sequences as its 'realizations'. Let us, for instance, think of a game of 'heads and tails' with the possible results described by the figures 0 (heads) and 1 (tails). One realization of this game may be an infinite sequence of 0's, another a sequence of alternating 0's and 1's, in short, any infinite sequence consisting of these two numbers. The total of all possible realizations forms a set in the mathematical sense of the word; each group of realizations which have a certain characteristic in common is a partial set.
If we assign the measure 1 to the total set, then the theory of sets teaches us how to attribute smaller numbers to the partial sets according to their frequencies; the sum of these numbers must be 1. In Tornier's theory, a given die, or rather the experimental rule referring to this die, is characterized by attributing to the partial sets of possible realizations certain measures as their probabilities. For instance, there may be a die such that the realizations containing more 1's than 6's predominate; for another die, sequences showing a certain regular alternation of results may occur frequently; and so on.

In Tornier's theory, there is not simply a probability of the 6 as such; there exists instead a probability of the 6 being, for instance, the result of the tenth cast, i.e., the relative frequency of the realizations which show a 6 on the tenth place. That means, of course, that the set-up in Tornier's theory is much more general than that in my theory. His theory permits us, for instance, to stipulate that the probability of a 6 on the twentieth cast should be different from that on the tenth. It also leaves us free to make an arbitrary assumption concerning the probability of casting 1 in the eleventh trial after having cast 6 in the tenth one (this being the frequency of the group of realizations containing 6 in the tenth place and 1 in the eleventh place). Thus the multiplication rule does not follow from the fundamentals of this theory. Tornier's theory is also applicable to experimental rules whose results do not form collectives in the sense of my theory. To take into account the conditions which prevail in games of chance, it is necessary to make certain assumptions, e.g., that the multiplication rule holds, that the frequency of the realizations having a 6 in the nth place is independent of n, etc.

The greater generality of the Tornier-Doob theory is bought at the expense of a greatly complicated mathematical apparatus, but the logical structure of the system is perhaps more lucid and satisfactory. We will have to wait and see how the solutions of the elementary problems of probability calculus will be developed in the new system. This seems to me to be the test for judging the foundations of a theory.

It should be noted that in the American literature this development of the frequency theory is often referred to under the heading of 'Probability as a Measure of Sets'. I have earlier pointed out that probability can always be considered as a measure of a set even in the classical theory of equally likely cases. This is certainly not a speciality of the theory which we have just discussed, even though in it the principles of the theory of sets are used to a greater extent than in others.
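
A finite toy version may make this greater generality concrete. The sketch below is only an illustration under stated assumptions, not Tornier's construction: the realizations are cut down to two throws of a coin, and the four measures in `measure` are hypothetical numbers chosen so that they add up to 1.

```python
from fractions import Fraction as F

# Hypothetical measure on the four "realizations" of a two-throw rule.
# Keys are (first result, second result); the values are assumptions
# made for illustration and sum to 1.
measure = {
    (0, 0): F(4, 10),
    (0, 1): F(1, 10),
    (1, 0): F(2, 10),
    (1, 1): F(3, 10),
}

def prob(predicate):
    """Measure of the partial set of realizations satisfying `predicate`."""
    return sum(m for r, m in measure.items() if predicate(r))

p_first = prob(lambda r: r[0] == 1)    # a 1 on the first place
p_second = prob(lambda r: r[1] == 1)   # a 1 on the second place
p_both = prob(lambda r: r == (1, 1))   # a 1 on both places

print(p_first, p_second, p_both, p_first * p_second)  # 1/2 2/5 3/10 1/5
```

With these assumed numbers the probability of a 1 depends on the place (1/2 against 2/5), and the measure of the joint event (3/10) differs from the product (1/5). Independence of place and the multiplication rule therefore have to be imposed as additional assumptions, as the text states.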
SUMMARY AND CONCLUSION

I have said all that I intended to say on the problem of the foundations of the theory of probability and the discussion which has arisen around it, and I am now at the end of this argument. In an attempt to summarize the results, I may conveniently refer to the content of the last paragraphs. My position may be described under the following five points:

1. The calculus of probability, i.e., the theory of probabilities, in so far as they are numerically representable, is the theory of definite observable phenomena, repetitive or mass events. Examples are found in games of chance, population statistics, Brownian motion, etc. The word 'theory' is used here in the same way as when we call hydrodynamics the 'theory of the flow of fluids', thermodynamics the 'theory of heat phenomena', or geometry the 'theory of space phenomena'.

2. Each theory of this kind starts with a number of so-called axioms. In these axioms, use is made of general experience; they do not, however, state directly observable facts. They delineate or define the subject of the theory; all theorems are but deductions from the axioms, i.e., tautological transformations; besides this, to solve concrete problems by means of the theory, certain data have to be introduced to specify the particular problem.

3. The essentially new concept of our theory is the collective. Probabilities exist only in collectives and all problems of the theory of probability consist in deriving, according to certain rules, new collectives from the given ones, and calculating the distributions in these new collectives. This idea, which is a deliberate restriction of the calculus of probabilities to the investigation of relations between distributions, has not been clearly carried through in any of the former theories of probability.

4. The exact formulation of the necessary properties of a collective is of comparatively secondary importance and is capable of further modification. These properties are the existence of limiting values of relative frequencies, and randomness.

5. Recent investigations have shown that objections to the consistency of my theory are invalid. It is not possible to substitute for the general randomness requirement some postulate of randomness which is restricted to certain classes of place selections. The new set-up of Tornier and Doob constitutes a promising development of the frequency theory.

THE LAWS OF LARGE NUMBERS

FOURTH LECTURE

The Laws of Large Numbers1


AMONG the many difficult questions connected with the rational foundation of the theory of probability none has caused so much confusion as that concerning the real meaning of the so-called Law of Large Numbers, and especially its relation to the frequency theory of probability. Most authors waver between two positions: the definition of probability as the limiting value of relative frequency is alleged either to imply Poisson's Law2 or to contradict it. In fact, neither is the case.

The plan of these lectures naturally includes a detailed discussion of this question. A restriction, however, is imposed on me by the fact that I do not expect from my audience any special mathematical knowledge; therefore I shall refrain from deductions of a mathematical kind. Nevertheless, I hope to be able to explain the essential points of the problem sufficiently clearly. We are going to discuss, besides the proposition which is usually called the Law of Large Numbers, its classical counterpart, often called the Second Law of Large Numbers, and we shall briefly indicate the extensions which these two laws have found in modern mathematics.
POISSON'S TWO DIFFERENT PROPOSITIONS

The ultimate cause of the confusion lies in Poisson's book itself. As we have already mentioned, he called two different propositions, which were discussed in two parts of his Recherches sur la probabilité des jugements, by the same name. Quite probably he held them to be really identical. In the introduction to his book he formulates the first of them in the following words: 'In many different fields, empirical phenomena appear to obey a certain general law, which can be called the Law of Large Numbers. This law states that the ratios of numbers derived from the observation of a very large number of similar events remain practically constant, provided that these events are governed partly by constant factors and partly by variable factors whose variations are irregular and do not cause a systematic change in a definite direction. Certain values of these relations are characteristic of each given kind of event. With the increase in length of the series of observations the ratios derived from such observations come nearer and nearer to these characteristic constants. They could be expected to reproduce them exactly if it were possible to make series of observations of an infinite length'.

These sentences, taken together with the discussion of a great number of practical examples which follows, make it quite clear that, in speaking of the Law of Large Numbers, Poisson meant here a generalization of empirical results. The ratios to which he refers in his proposition are obviously the relative frequencies with which certain events repeat themselves in a long series of observations, or the frequencies with which the different possible results of an experiment occur in a prolonged series of trials. If a certain result occurs m times in n trials, we call m/n its 'relative frequency'. The law formulated by Poisson in his introduction is thus identical with the first condition we imposed on a collective, namely, that the relative frequency of a certain event occurring in a sequence of observations approaches a limiting value as the sequence of observations is indefinitely continued. If, when speaking of the Law of Large Numbers, everybody meant only what Poisson meant by it in the introduction to his book, it would be correct to say that this law is the empirical basis of the definition of probability as the limiting value of relative frequency.

A large part of Poisson's book, however, is taken up by the derivation and discussion of a mathematical proposition, which the author also calls the Law of Large Numbers, and which is usually referred to either under this name or simply as 'Poisson's Law'.
This proposition is a generalization of a theorem formulated earlier by Jacob Bernoulli.3 The Bernoulli Theorem may be quoted as follows: If an experiment, whose results are simple alternatives with the probability p for the positive result, is repeated n times, and if ε is an arbitrary small number, the probability that the number of positive results will be not smaller than n(p − ε), and not larger than n(p + ε), tends to 1 as n tends to infinity.

We may illustrate Bernoulli's proposition with a concrete example. In tossing a coin 100 times, we have a certain probability that the result 'heads' will occur at least 49, and at most 51 times. (Here the p of the theorem equals 1/2, n = 100, ε = 0.01.) In casting the same coin 1000 times, the probability of the frequency of the result 'heads' being between 490 and 510 is larger (p and ε are the same as before, but n = 1000). The probability of this frequency falling in the range between 4900 and 5100 in 10,000 casts is still nearer to 1, and so on.

Poisson's generalization of this proposition consists in discarding the condition that all casts must be carried out with the same or with identical coins. Instead, he allowed the possibility of using a different coin each time, p (in our case equal to 1/2) now denoting the arithmetical mean of the n probabilities of the n coins. A still more general and very simple formulation of this proposition was given by Tschebyscheff.4 It applies to the case in which the experiment involved is not an alternative ('heads or tails'), but admits of a number of different results. For our discussion, however, of the fundamental meaning of Poisson's Law, it is quite sufficient to consider it in the special form which it takes in a simple game of '0 or 1'.

The question is: What is the relation of this mathematical proposition, which we may briefly call Poisson's Theorem (or Bernoulli Theorem), to the empirical law formulated by Poisson in his introduction? Is it true that Poisson's Theorem is equivalent to this law? Is it correct to consider Poisson's Theorem as a theoretical deduction capable of an experimental test, and actually confirmed by it?
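
The three probabilities of the coin example can be evaluated directly. The following is a minimal sketch, assuming the usual binomial expression for the probability of k positive results in n trials; the function name `prob_within` is an illustrative choice. Exact fractions are used because floating-point powers such as 0.5 ** 10000 would underflow to zero, and because 100 * 0.49 is not exactly 49 in floating point.

```python
from fractions import Fraction
from math import ceil, comb, floor

def prob_within(n, p=Fraction(1, 2), eps=Fraction(1, 100)):
    """Probability that the number of positive results in n trials is
    not smaller than n(p - eps) and not larger than n(p + eps)."""
    lo = ceil(n * (p - eps))    # smallest admissible count
    hi = floor(n * (p + eps))   # largest admissible count
    total = sum(comb(n, k) * p**k * (1 - p)**(n - k)
                for k in range(lo, hi + 1))
    return float(total)

for n in (100, 1000, 10000):
    print(n, round(prob_within(n), 4))
```

The printed values grow with n, in line with the statement that the probability tends to 1. Passing another `p`, for instance `Fraction(1, 6)`, gives the corresponding figures for one face of a true die.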
EQUALLY LIKELY EVENTS

To answer the above questions, we must begin by considering what Bernoulli and his successors understood by probability. Poisson's Theorem contains the word 'probability'; Poisson's empirical law does not mention it. To understand clearly the meaning of Poisson's Theorem in the classical theory, we must explicitly introduce into it the definition of probability used by Poisson himself.

We already know that the classical theory, in its concept of probability, did not take into account the frequency with which different events occur in long series of observations. Instead it declared, in a more formalist way, that 'probability is the ratio of the number of favourable cases to the total number of equally likely cases'. With an ordinary coin the two possible positions after the throw are the two 'equally likely' cases. One of them is favourable for the result 'heads'; thus 'heads' has the probability 1/2. This probability concept is the only one used in the derivation of Poisson's Theorem. To say that an event has a probability 'nearly 1' means, in the language of this theory, to stipulate that 'nearly all' the equally likely cases are favourable to the occurrence of this event.

If n throws are carried out with a coin, and n is a large number, the number of different possible results of this series of throws is a very large one. For instance, the first twenty throws, as well as the last thirty, may have the result 'heads', and all the remaining ones the result 'tails'; or the first ten throws may have the result 'tails', and the rest 'heads', and so on. With n = 100 there are 2^100 (a 31-digit number) different possible outcomes of the game. If we assume that the probability p of throwing 'heads' is equal to 1/2 for each single throw, then all these 2^100 combinations of results must be considered as 'equally likely' cases. Let us assume that ε is taken as 0.01; Poisson's Theorem states that when n is a large number, by far the greatest part of the 2^n different results have the common property that the number of 'heads' contained in them differs from n/2 by not more than n/100. This is the content of the proposition derived by Poisson. It does not lead to any statement concerning the actual results of a prolonged series of experiments.

ARITHMETICAL EXPLANATION

In order to make this point still clearer, we shall now represent the results of throwing the coin by the figures 0 and 1, where 0 stands for the result 'heads' and 1 for the result 'tails'. Each game of 100 throws can be characterized by a 100-digit number, the digits being 0's and 1's. If we omit any zeros which precede the first 1 on the left-hand side, we obtain shorter numbers which can still be used to represent the corresponding sequence of experiments. We can now arrange all the numbers occurring in our system of results in a simple scheme, which begins as follows:

0, 1, 10, 11, 100, 101, 110, 111, 1000, 1001, ...

The scheme includes all numbers that can be expressed by 0's and 1's up to the number represented by a succession of 100 1's. As mentioned above, this sequence includes a total of 2^100 numbers, i.e., about a million trillions.

The meaning of the notation introduced may be explained by the following example. The number 101 in the scheme corresponds to a result beginning with 97 zeros and ending with 1, 0, and again 1. If n were 1000 instead of 100, the scheme would begin with the same numbers as above, but would be very much longer, containing 2^1000 numbers. 0 would now mean a result composed of 1000 zeros, and 101 a result beginning with 997 zeros. Poisson's Theorem then is nothing but a statement concerning such systems of numbers.

The following facts are of a merely arithmetical nature and have nothing to do with repeated events or with probability in our sense. If we consider the set of natural numbers represented by 0's and 1's up to 100 digits, the proportion of numbers containing 49, 50, or 51 zeros is found to be about 16%. If we extend the scheme to include numbers with up to 1000 figures, the proportion of those containing from 490 to 510 zeros is much higher, roughly 47%. Among the combinations of 0's and 1's containing up to 10,000 figures, there will be more than 95% containing from 4900 to 5100 zeros. In other words, for n = 10,000 at most 5% of all combinations are such that the proportion of 0's to 1's differs from 1/2 by more than 1%, i.e., such that the number of 0's differs from 5000 by more than 0.01n = 100. The concentration of the frequencies in the neighbourhood of the value 1/2 becomes more and more pronounced with the increase in the length of the sequence of throws.

This arithmetical situation is expressed in the classical theory of probability by saying: In the first sequence the probability of the results 49 to 51 zeros is 0.16; in the second sequence the probability of the results 490 to 510 is 0.47; in the third sequence the results 4900 to 5100 have the probability 0.95. By assuming ε = 0.01 and p = 1/2, the theorem of Bernoulli and Poisson can be formulated as follows: Let us write down, in the order of their magnitudes, all 2^n numbers which can be written by means of 0's and 1's containing up to n figures. The proportion of numbers containing from 0.49n to 0.51n zeros increases steadily with an increase in n.

This proposition is purely arithmetical; it says something about certain numbers and their properties.
The statement has nothing to do with the result of a single or repeated sequence of 1000 actual observations and says nothing about the distribution of 1's and 0's in such an experimental sequence. The proposition does not lead to any conclusions concerning empirical sequences of observations as long as we adopt a definition of probability which is concerned only with the relative number of favourable and unfavourable cases, and states nothing about the relation between probability and relative frequency.

The same considerations apply, in principle, to cases similar to that of tossing a coin. When we consider a game with a true die, we must replace the system of numbers composed of 1's and 0's by the system of numbers containing six different figures, i.e., the figures 1 to 6. The theorem states in this case that with an increase in the number n, the proportion of numbers containing about n/6 ones increases steadily, and finally approaches one.

Our conclusions can be summarized as follows: The mathematical deductions of Bernoulli, Poisson, and Tschebyscheff, based as they are on a definition of probability which has nothing to do with the frequency of occurrence of events in a sequence of observations, cannot be used for predictions relative to the results of such sequences. They have, therefore, no connexion whatsoever with the general empirical rule formulated by Poisson in the introduction to his book.
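
The purely arithmetical character of the proposition can be checked by counting. The sketch below is an illustration with assumed helper names: it enumerates a small scheme outright, confirms that the count agrees with a sum of binomial coefficients, and then evaluates the proportions for the three lengths discussed in the text. It also evaluates the normal (de Moivre-Laplace) estimate without continuity correction, on the assumption that the rounded figures 0.16, 0.47 and 0.95 quoted in the lecture were obtained in that way.

```python
from math import comb, erf, sqrt

def count_by_enumeration(n, lo, hi):
    """Count the n-figure strings of 0's and 1's (leading zeros restored)
    whose number of zeros lies between lo and hi."""
    return sum(1 for x in range(2 ** n)
               if lo <= n - bin(x).count("1") <= hi)

def count_by_formula(n, lo, hi):
    """Same count, using binomial coefficients instead of enumeration."""
    return sum(comb(n, k) for k in range(lo, hi + 1))

def exact_proportion(n):
    """Proportion of strings with 0.49n to 0.51n zeros (eps = 0.01, p = 1/2)."""
    half, width = n // 2, n // 100
    return count_by_formula(n, half - width, half + width) / 2 ** n

def normal_estimate(n, eps=0.01):
    """Normal estimate of the same proportion, no continuity correction."""
    return erf(eps * sqrt(2 * n))

# Small scheme: direct enumeration and the formula must agree.
assert count_by_enumeration(12, 5, 7) == count_by_formula(12, 5, 7)

for n in (100, 1000, 10000):
    print(n, round(exact_proportion(n), 3), round(normal_estimate(n), 2))
```

The normal estimates round to 0.16, 0.47 and 0.95, matching the figures in the text, while the exact proportions come out somewhat larger (about 0.24, 0.49 and 0.96). Either way the proportions increase with n, which is all the argument requires.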
SUBSEQUENT FREQUENCY DEFINITION

How is it possible that Poisson himself considered his mathematical theorem as a confirmation of his empirical Law of Large Numbers? Once this question has been asked, it is easy to answer. Poisson understood two different things by probability. At the beginning of his calculations, he meant by the probability 1/2 of the result 'heads' the ratio of the number of favourable cases to that of all equally possible cases. However, he interpreted the probability 'nearly 1' at the end of the calculation in a different sense. This value was supposed to mean that the corresponding event, the occurrence of between 0.49n and 0.51n heads in a game of n throws, must occur in nearly all games. This change of the meaning of a notion in the course of deduction is obviously not permissible. The exact point where the change takes place remains unspecified. Is the probability 0.16 of a series of 100 throws containing from 49 to 51 heads already to mean that 16% of all games in a long series must produce these results? Or is this interpretation only applicable to the probability 0.95 calculated for n = 10,000?

There is no possible answer to these questions. If one wishes to retain at all cost the classical definition of probability and at the same time to obtain Poisson's conclusion, then one must introduce an auxiliary hypothesis as a kind of deus ex machina.5 This would have to run somewhat in this way: 'If a calculation gives a value not very different from 1 for the probability of an event, then this event takes place in nearly all repetitions of the corresponding experiment'. What else is this hypothesis but the frequency definition of probability in a restricted form? If a probability value 0.999 means that the corresponding event occurs 'nearly always', why not concede at once that the probability value 0.50 means that the event occurs in the long run in 50 cases out of 100?

It is then, of course, necessary to give a precise formulation of this assumption and to show that Poisson's Theorem can be derived on the basis of this new definition of probability. We shall see that this deduction of the theorem differs from the classical one in many ways. At any rate, the procedure of changing the meaning of a concept, without notice, between the beginning and the end of an argument is certainly not permissible.

We close this section with the statement that Poisson's Theorem can be brought into relation with Poisson's Law of Large Numbers only by adopting a frequency definition of probability in one form or another.
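
The frequency reading of a computed probability can be tried out numerically. In the sketch below a pseudo-random generator stands in for the coin, which is an assumption of the illustration and not part of the argument; the function name, the number of games and the seed are likewise arbitrary choices. It plays many games of 10,000 throws and reports the share of games in which the number of heads falls between 4900 and 5100.

```python
import random

def share_of_games(n_throws=10_000, lo=4900, hi=5100, games=2000, seed=1):
    """Relative frequency, over `games` simulated games, of the event
    'between lo and hi heads in n_throws throws'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(games):
        # n_throws random bits; each 1-bit is counted as 'heads'
        heads = bin(rng.getrandbits(n_throws)).count("1")
        if lo <= heads <= hi:
            hits += 1
    return hits / games

print(share_of_games())
```

Under the frequency definition, the value 0.95 calculated for n = 10,000 is a statement about exactly this kind of share in a long series of games; under the classical definition alone, nothing connects the two.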
THE CONTENT OF POISSON'S THEOREM

It is natural to raise the following objection. If we wish to define the probability of the result 'heads' as the limiting value of its relative frequency, then we have to know in advance that such a limiting value exists. In other words, first of all we must admit the validity of Poisson's Law of Large Numbers. What then is the use of deducing, by complicated calculations, a theorem which apparently adds nothing to what has been assumed? The answer is that the propositions derived mathematically by Bernoulli, Poisson, and Tschebyscheff imply much more than a simple statement of the existence of limiting values of relative frequencies. Once the frequency definition of probability has been accepted and Poisson's Theorem restated in the terms of this theory, we find that it goes much further than the original Law of Large Numbers. Under the new conditions, the essential part of this theorem is the formulation of a definite statement concerning the succession of, say, the results 'heads' and 'tails' in an indefinitely long series of throws.

It is easy to find sequences which obey Poisson's empirical Law of Large Numbers, but do not obey Poisson's Theorem. In the next section, we shall discuss a simple sequence of this kind. In it the relative frequency of the positive result has a limiting value 1/2, exactly as in tossing a coin; yet Poisson's Theorem does not apply to it. No mathematician who knows the properties of this sequence would consider it in connexion with probability. We shall use it here in order to show what additional condition is imposed by Poisson's Theorem on sequences which form the subject of the theory of probability, namely, those which we call collectives.

EXAMPLE OF A SEQUENCE TO WHICH POISSON'S THEOREM DOES NOT APPLY
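
The sequence in question, built from the sixth decimal of the square roots of successive integers as described in the paragraphs that follow, can be generated directly. The sketch below is an illustration with assumed function names. It works in exact integer arithmetic and reads the sixth decimal from the value rounded to seven decimals, as a printed table would show it.

```python
from math import isqrt

def case(k):
    """1 ('positive result') if the sixth decimal of sqrt(k), read from
    the value rounded to seven decimals, is 5..9; otherwise 0."""
    # round(sqrt(k) * 10**7) computed exactly with integers
    rounded7 = (isqrt(4 * k * 10**14) + 1) // 2
    return 1 if (rounded7 // 10) % 10 >= 5 else 0

def run_lengths(bits):
    """Lengths of the maximal runs of equal consecutive values."""
    runs, current = [], 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            current += 1
        else:
            runs.append(current)
            current = 1
    runs.append(current)
    return runs

start = [case(k) for k in range(1, 3001)]
far = [case(10**12 + j) for j in range(1240, 1300)]

print(sum(start) / len(start))   # share of positive results near the start
print(run_lengths(far))          # [10, 10, 10, 10, 10, 10]
```

Near the beginning of the table the share of positive results is close to 1/2 and the 0's and 1's look irregular. Around 10^12 the same rule yields strictly alternating blocks of ten, so that groups of moderate length no longer contain roughly equal numbers of 0's and 1's in the way Poisson's Theorem would require of a game of chance.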

Let us consider a table of square roots; this is a table containing the values of the square roots of the successive integers 1, 2, 3, 4, ..., calculated to a certain number of decimals, say 7. We shall consider only the penultimate (sixth) figure after the decimal point; 'positive results' will be numbers which contain one of the figures 5, 6, 7, 8, or 9 in this place; all numbers containing one of the figures 0, 1, 2, 3, or 4 in this place will be 'negative results'. The whole table is transformed in this way into a sequence of 1's (positive results) and 0's (negative results) which alternate in an apparently irregular way. Theoretically, this sequence is an infinite one, although in practice all square-root tables end with some particular number. It is plausible and can be proved rigorously6 that the limiting frequencies of both 0's and 1's in this sequence have the value 1/2. It is also possible to prove the more general proposition that the relative frequencies of the single figures 0, 1, ..., 9 are all equal to 1/10.

What we want to investigate is the possibility of applying Poisson's Theorem to this sequence. The theorem requires that if groups, each of n consecutive numbers, are formed from the infinite sequence of 0's and 1's, then, if n is sufficiently large, each group should contain about 50% zeros and 50% ones.

The beginning of the table seems to conform to this rule. To make this clear, we may consider each column of 30 entries in a specified table as a separate group, and then count the numbers of 'positive' and 'negative' results in each group. A simple calculation shows, however, that the state of affairs changes when we proceed to very large numbers which lie beyond the scope of the usual tables. According to a well-known formula, if a is much larger than 1, then the square root of an expression of the form a^2 + 1 is nearly equal to a + 1/(2a). Let us assume, for instance, that a is one million (10^6), and a^2 one billion (10^12). The square roots of the consecutive numbers a^2, a^2 + 1, a^2 + 2, ... will differ by the very small amount 1/2 · 10^-6, i.e., by one half of a unit in the sixth decimal place. It is necessary to consider about ten consecutive entries to cause a change in the value of the square root by 0.000005, and so to change a 'positive' result in our sequence into a 'negative' one. In other words, in this part of the table our sequence of 0's and 1's contains regularly alternating groups of about ten consecutive 1's and ten consecutive 0's. The following section of an actual table illustrates this arrangement:

    a^2                 Square root             Case
    10^12 + 1237        10^6 + 0.0006185        1
    10^12 + 1238        10^6 + 0.0006190        1
    10^12 + 1239        10^6 + 0.0006195        1
    10^12 + 1240        10^6 + 0.0006200        0
    10^12 + 1241        10^6 + 0.0006205        0
    10^12 + 1242        10^6 + 0.0006210        0
    ...                 ...                     ...
    10^12 + 1249        10^6 + 0.0006245        0
    10^12 + 1250        10^6 + 0.0006250        1

Further along in the table, e.g., in the region of a = 100 millions, the runs of consecutive 0's and 1's will be of the average length 1000. We see that the structure of the sequence of 1's and 0's derived from the table of square roots is radically different from the structure of sequences such as that derived from tossing a coin. The randomness described by Poisson's Theorem apparently exists only at the beginning of the table. Further on, the runs of identical 'results' are slowly but steadily increasing in length. It can be easily seen that this sequence does not obey the Poisson Theorem. Let us assume a large, but finite number n as the length of a group, say n = 500. By taking enough terms in our sequence (e.g., to a = 100 millions), we come into a region where the average length of the runs is much greater than n. In this region nearly all groups of 500 items will consist either of zeros only, or of ones only, whereas Poisson's Theorem requires them to consist of, roughly, 50% of each. The limiting values 1/2 for the frequencies of the results '1' and '0' are due in this case to the equal lengths of runs of both kinds. In a true game of chance, however, the approach to the limiting values is brought about by an equalization of the relative frequencies of 0's and 1's in nearly every group of sufficient length.

BERNOULLI AND NON-BERNOULLI SEQUENCES

What was shown here for the table of square roots holds true, in the same or in a similar way, for tables of many other functions, e.g., for powers of any order, etc. It does not hold for the table of logarithms, which Henri Poincaré discussed as an example. Poincaré failed to notice that in this table the frequencies of 0's and 1's fluctuate permanently and do not tend toward limiting values.

The result that is important for us is that there exist infinite sequences of 0's and 1's such that the relative frequencies of both these attributes tend toward definite limiting values but for which Bernoulli's theorem does not hold true.

Of course, the sequences which we have considered here do not satisfy the condition of randomness since each of them is defined by a mathematical formula. We may then ask whether randomness in our sense (which, together with the existence of the limit of relative frequencies, forms a sufficient condition for the validity of Bernoulli's theorem) is a necessary prerequisite for the validity of this theorem. This is not the case. It is not too difficult to construct mathematical formulae which define 'Bernoulli sequences', i.e., sequences where a stabilization of frequencies will occur in sufficiently long subsequences. Hence, the state of affairs is the following:

For arbitrary sequences satisfying the first axiom the Bernoulli Theorem need not hold true. It is not necessary to require complete randomness in order to prove the Bernoulli theorem. In other words, the theorem of Bernoulli is a consequence of the assumption of randomness but it cannot be used as a substitute for the randomness requirement. One could, for instance, indicate sequences of numbers which would satisfy the Bernoulli theorem but in which the chance of an attribute could be changed by a place selection of the prime-number type. The Bernoulli-Poisson-Tschebyscheff Theorem expresses only a special type of randomness. If we call those sequences for which this theorem holds 'Bernoulli-sequences', we can say that: All collectives are Bernoulli-sequences but not all Bernoulli-sequences are collectives.

DERIVATION OF THE BERNOULLI-POISSON THEOREM

It should thus be clear that, once the frequency definition of probability has been adopted, the theorem derived by Poisson in the fourth chapter of his book contributes important information regarding the distribution of the results in a long sequence of experiments. We have also seen that there is no logical way of obtaining this information starting with the classical definition of probability. The only way is to define probability from the beginning as the limiting value of relative frequencies and then to apply, in an appropriate manner, the operations of selection, combination, and mixing.

Let us consider the simplest case, in which an infinite sequence of experiments is made for the same simple alternative with the probability p of success (special case of the Bernoulli Theorem). By applying n selections followed by n combinations, we form a new collective just as we did when solving the problem of de Méré, in the second lecture (p. 00). Each element of this new collective consists of a group of n trials. The attribute of each such element is therefore given by n numbers (0's and 1's). Then, by the operation of mixing, we return to a one-dimensional attribute: This is done by considering now as the result of the combined experiment only the number n_1 of 1's in the group of n trials, regardless of the arrangement of 0's and 1's in the group. In this way we obtain a probability of the occurrence of n_1 ones in a group of n trials. We call this probability w(n_1; p), since it is dependent on both n_1 and p; it is given by the formula:

$$w(n_1; p) = \binom{n}{n_1} p^{n_1} (1 - p)^{n - n_1}.$$

Here the symbol $\binom{n}{n_1}$ stands for a known integer, dependent on n and n_1. If we add all the w(n_1; p) for all those n_1 which fall between n(p - ε) and n(p + ε), where ε is an arbitrarily chosen, small positive magnitude, we obtain the probability P that the relative frequency of 1's in a group of n experiments lies between (p - ε) and (p + ε). Studying the resulting formula, we arrive at the conclusion that, however small the value of ε, the probability P tends towards unity as n increases indefinitely. We have thus proved the Bernoulli Theorem.

Today we know simpler and more far-reaching methods of arriving at this result. We can also include the more general case where the n observations are not derived from the same collective (case of Poisson), or where they are not simple alternatives (case of Tchebyscheff). These generalizations hold no further interest for us here since we are concerned only with the logical aspect of the deduction.

In closing, we note that by amending the usual understanding of the 'First Law of Large Numbers' we are in no way belittling the great achievement of the pioneers of probability calculus. This will be realized by anyone who is at all familiar with the history of mathematics. The founders of analysis, Bernoulli, Euler, Laplace, have given us a large treasure of propositions on integration, series, and similar subjects, which were properly derived and correctly fitted into logical systems only centuries later. In many cases, all that was needed was greater precision with respect to the passage to the limit. To provide this greater precision in the case of the fundamentals of probability calculus was the very task which we had set ourselves.

SUMMARY

(1) The proposition introduced by Poisson in the introduction to his book as the Law of Large Numbers is a statement of the empirical fact that the relative frequencies of certain events or attributes in indefinitely prolonged sequences of observations tend to constant limiting values. This statement is postulated explicitly in our first condition for a collective.
(2) The mathematical proposition derived by Poisson in the fourth chapter of his book, which he also called 'The Law of Large Numbers', says nothing about the course of a concrete sequence of observations. As long as we use the classical definition of probability, this theorem is nothing but a statement of certain purely arithmetical regularities, and bears no relation to the empirical law explained in (1).
(3) The Poisson Theorem (see (2)) obtains a new meaning if we agree to define probability as the limiting value of relative frequency, a definition suggested by the above empirical law. If, however, we adopt this definition, Poisson's Theorem goes considerably further than the empirical law; it characterizes the way in which different attributes alternate in empirical sequences.
(4) The content of Poisson's Theorem is in fact that a certain equalization or stabilization of relative frequencies occurs already within nearly all sufficiently long sub-groups of the infinite sequence of elements. This is not implied by assuming only the existence of limiting values of relative frequencies. In fact, as was shown in an example of a sequence of 0's and 1's, the relative frequencies may tend to definite limits, e.g., 1/2, but the runs of 0's and 1's may gradually and regularly increase in length so that eventually, however large n may be, most groups of n consecutive elements will consist of 0's or of 1's only. Thus, in most of these groups there will be no 'equalization' or 'stabilization' of the relative frequencies; obviously, such a sequence does not satisfy the criteria of randomness.
(5) The correct derivation of the Poisson Theorem based on the frequency definition of probability requires not only the assumption of the existence of limiting values but also that of complete randomness of the results. This condition is formulated in our second postulate imposed on collectives.
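The 'equalization' within groups of n trials that the theorem asserts can be made concrete by evaluating the probability P that the relative frequency in a group lies within a small distance of p. The sketch below sums the binomial probabilities w(n_1; p) for a simple alternative with 0 < p < 1; the function names, the use of logarithms to avoid overflow, and the sample values p = 0.5 and eps = 0.05 are choices of this sketch only.

```python
from math import ceil, exp, floor, lgamma, log

def w(n1: int, n: int, p: float) -> float:
    """Probability of exactly n1 ones in n trials (requires 0 < p < 1)."""
    # log of the binomial coefficient, via the log-gamma function
    log_binom = lgamma(n + 1) - lgamma(n1 + 1) - lgamma(n - n1 + 1)
    return exp(log_binom + n1 * log(p) + (n - n1) * log(1 - p))

def prob_within(n: int, p: float, eps: float) -> float:
    """P that the relative frequency n1/n lies between p - eps and p + eps."""
    lo = max(0, ceil(n * (p - eps)))
    hi = min(n, floor(n * (p + eps)))
    return sum(w(n1, n, p) for n1 in range(lo, hi + 1))

values = [prob_within(n, 0.5, 0.05) for n in (100, 1_000, 10_000)]
for n, value in zip((100, 1_000, 10_000), values):
    print(n, value)
```

For fixed eps the printed values increase with n and approach unity, which is the behaviour the theorem describes for a sequence satisfying both postulates.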


After things have been clarified in this way, all that remains to be done is to decide a question of terminology: which proposition shall be called the 'Law of Large Numbers'? My suggestion is that this name should be reserved for the theorem of Bernoulli and Poisson. The empirical law formulated by Poisson in the introduction to his book can conveniently be called the axiom of the existence of limiting frequencies.
INFERENCE

If the probability of the occurrence of an attribute within a given collective has a value near to unity, we may express this fact by saying that 'there is a great certainty' or 'we are almost certain' that this event will occur on one specific trial. This way of expressing ourselves is not too reprehensible so long as we realize that it is only an abbreviation and that its real meaning is that the event occurs almost always in an infinitely long sequence of observations. If we apply this terminology in connexion with the Bernoulli Theorem, and if we say in addition that a number which, for small ε, lies between n(p - ε) and n(p + ε) is 'almost equal to np', we arrive at the following imprecise formulation: If a trial with probability p for 'success' is performed again and again, and if n is a large number, it is to be expected with great certainty that in one particular sequence of n trials the event will occur approximately np times. This formulation leads us to ask whether a certain converse of this proposition might not hold true, namely the following: If in one set of n observations, n being a large number, the 'event' has occurred n_1 times, may we then inversely 'expect with great certainty' that the basic probability p of the event is approximately equal to the ratio n_1/n? We shall see that under certain conditions this is actually the case, and that this so-called Bayes's Theorem represents a Second Law of Large Numbers which, under certain assumptions, can be proved mathematically.

If we use an extremely sloppy terminology, both laws can be said to coincide in one statement: The probability of an event and the relative frequency of its occurrence, in a long sequence of trials, are about equal. If, however, we use our words with precision (and to do so is one of the principal tasks of these lectures), we shall find that the original proposition and its converse have very different meanings.
Now, we are speaking about a probability that p will assume certain values or will lie in a given interval, whereas in the original instance p was a given, unchanging number. I shall presently construct the collective for Bayes's problem, but first I wish to insert a remark.

Those who think that probability can be defined independently of the frequency of occurrence of an attribute in a sequence of experiments believe that the above-mentioned proposition, whereby probability and frequency roughly coincide in a long run of observations, constitutes a 'bridge' between what actually happens and the concept of probability introduced by them. However, we know that this is a delusion. From the definition of probability as the ratio of favourable to equally likely cases, no logical reasoning will lead to the propositions discussed above, neither to the original Bernoulli-Poisson statement nor to Bayes's converse of it. All that we can logically deduce from this premise are propositions concerning such ratios. A gap remains: the manner in which it is to be crossed is arbitrary and logically not justifiable.
BAYES'S PROBLEM

An easy way to understand Bayes's problem is to consider a game of dice in combination with a lottery. Let us imagine an urn filled with a very large number of small cubes or similar bodies, which we are going to call stones. Each stone has six flat sides, numbered 1 to 6, on any of which it can fall when thrown out of a dice box. Each time a stone is drawn from the urn, it is placed in a box and then turned out, and the result of the throw is noted. The stones are not all equal; some of them may be 'true', with a probability 1/6 for each side, others biased, with the single probabilities differing more or less widely from the value 1/6. The actual values of the six probabilities can be determined by a sufficiently long sequence of throws made with each stone. We shall consider the probability of casting 6, and denote it by p. Thus each stone has a characteristic value of p.

We now consider a collective whose elements are as follows: A stone is drawn from the urn, placed in the dice box, and thrown n times in succession. The result, or attribute, of this total experiment is composed of, on the one hand, the p-value of the stone drawn from the urn, and, on the other hand, the ratio n_1/n, where n_1 is the number of casts which have produced the result 6. This is a two-dimensional collective, with each element characterized by a pair of numbers, p and n_1/n.

The distribution in this collective, or the probabilities of all possible combinations of the numbers p and n_1/n, must be calculated by the combination rule. This involves the multiplication of two


factors: the first of them is the probability u(p) of drawing a stone with a certain value of p from the urn; the second is the probability that an event with probability p will occur n_1 times in a series of n trials. The elementary theory of probability teaches us how to calculate this second factor from given values of p, n_1 and n. This probability has, in fact, been derived (see p. 114) and denoted by w(n_1; p). The probability of drawing a stone with the attribute p and of obtaining with it n_1 6's in n throws is, according to the multiplication rule, u(p)w(n_1; p). Assuming that n is a constant, we can consider this product as a function of p and n_1/n, and denote it by f(p, n_1/n).

To arrive at Bayes's problem, we must perform one more operation on our collective, namely, a partition. Let us recollect what we mean by this. In a simple game of dice a typical problem of partition is the determination of the probability of the result 2, if it is already known that the result is an even number. The collective is divided into two parts, and only that corresponding to 'even' results is considered.

In the case which we are now discussing, the following partition takes place. We know already that 6 appeared n_1 times in n casts. What is the probability of this result 'being due' to a stone with a certain given value of p? The part of the collective to be considered in the calculation of this probability is that formed by the sequence of casts with a certain value of n_1/n, e.g., n_1/n = a. According to the division rule derived in the second lecture, the probability in question, the final probability, is calculated by dividing the probability f(p, a) by the sum of the probabilities of all results retained after partition. The summation must be extended over all values of p, while a is constant. We will denote the result of this summation by F(a), since it is a function of a only. The final result of the calculation is the quotient f(p, a)/F(a), i.e., a function of both p and a. We may call it, in the usual terminology, the 'a posteriori probability' of a certain value of p, corresponding to a given value of n_1/n = a.
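Collected in one place, the combination and partition steps just described may be written as follows. This is merely a restatement in symbols of the rules given in the text, for discrete values of p; the index p' for the summation variable is introduced here for clarity.

```latex
\begin{align*}
f(p, a) &= u(p)\, w(n_1; p), \qquad a = n_1/n, \\
F(a)    &= \sum_{p'} f(p', a) = \sum_{p'} u(p')\, w(n_1; p'), \\
\frac{f(p, a)}{F(a)} &= \frac{u(p)\, w(n_1; p)}{\sum_{p'} u(p')\, w(n_1; p')} .
\end{align*}
```

Since the binomial coefficient in w(n_1; p) does not depend on p, it appears in every term of the sum and cancels in the quotient.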
INITIAL AND INFERRED PROBABILITY

The expression 'a posteriori probability' for the ratio f(p, a)/F(a) refers, of course, to the fact that there is still another probability of p in our problem, namely, u(p). The latter is the probability that a stone with a given value of p will be drawn from the urn. It is called the 'a priori probability' of p. Even though we feel that it does not matter what words are used so long as their meaning is clear, it does seem preferable to use expressions which are not weighed down by metaphysical associations. We have already proceeded in this way in our second lecture, when we developed the operation of partition.

The quantity u(p) has the following meaning: Instead of considering the whole experiment, let us fix our attention only on the repetitive drawing of a stone out of the urn, without being interested in what happens later to this stone. The relative frequency with which a stone with a given value of p appears in these drawings has the limiting value u(p). This value may well be called the initial or original probability of p. This probability is quite independent of the observations made subsequently on a stone during the n times it is tossed out of the dice box. Next, let us consider the whole experiment, concentrating only on those instances when, in n throws, a stone which may have any value of p showed the 6 with a relative frequency n_1/n = a. Among the stones thus selected, those with the specified value of p will appear with a relative frequency different from u(p). The limiting value of this frequency, in an infinitely continued sequence of observations, will be f(p, a)/F(a). This limiting value is the probability which we infer from the observation of the n_1 successes in n throws. We shall therefore call it the probability of inference or inferred probability of p.

A numerical example will clarify this point. Suppose our urn contains nine kinds of stones, such that the possible values of the probability p for the occurrence of a 6 are 0.1, 0.2, 0.3, ..., 0.9. The stones are considered alike in outward appearance. They could, for instance, be regular icosahedra showing the 6 on 2, or 4, or 6, ..., or 18 of their 20 sides with corresponding values of p equal to 2/20 = 0.1, 4/20 = 0.2, ..., 18/20 = 0.9. If there is an equal number of each kind in the urn, we can assume that the probability of drawing a stone from any of the nine categories is the same. Therefore, in this case, u(0.1) = u(0.2) = ... = u(0.9) = 1/9.
This gives us our initial distribution u(p). We now decide to consider the case n = 5, n_1 = 3, i.e., we cast each stone five times and note only those instances in which three of the five throws have shown a 6. The probability of obtaining three successes out of five trials with a given experimental object can be determined according to the formula shown on p. 114, and is:

$$w(3; p) = 10\, p^3 (1 - p)^2.$$
In fact, if n = 5 and n_1 = 3, the term $\binom{n}{n_1}$ has the value 10. We can now calculate the product f(p, a), where a = n_1/n = 3/5, for each of the nine values of p. It is given by the formula:

$$f(p, 3/5) = u(p)\, w(3; p) = \tfrac{1}{9} \cdot 10\, p^3 (1 - p)^2.$$

This gives us the following easily computed results: For p = 0.1, the corresponding value of f is 0.0009; for p = 0.2, f = 0.0057; ...; for p = 0.6, f = 0.0384; and so on. The sum of all nine f-values amounts to F = 0.1852. Consequently, the probability of inference which we are seeking will be 0.0009/0.1852 = 0.005 for p = 0.1; for p = 0.2 we obtain 0.0057/0.1852 = 0.031; ...; for p = 0.6 it becomes 0.0384/0.1852 = 0.21; and the sum of nine values will be equal to 1.

We thus have shown the following: If we know nothing about a stone except that it was drawn from the above-described urn, the probability that it will belong to one of the nine different categories is equal to 1/9 = 0.11. If, however, we already know that this stone has shown three successes in five casts, then the probability that it belongs to the category of p = 0.1 or p = 0.2 becomes much smaller than 0.11, namely, 0.005 or 0.031 respectively, whereas the probability that it is a stone for which p = 0.6 increases from 0.11 to 0.21.

Everyone will consider this result to be understandable and reasonable. The fact that a stone has had a history of 60% success within a short experimental run increases the probability that it is a stone of the category of p = 0.6 and lessens its probability for a very different value of p such as 0.1 or 0.2. Other values of the probability of inference in this case are: 0.08 for p = 0.3; 0.19 for both p = 0.5 and p = 0.7 (taking two decimals); and finally 0.05 for p = 0.9. The sum of the three probabilities corresponding to the three values p = 0.5, p = 0.6, and p = 0.7 is 0.59, whereas the total probability corresponding to the other six p-values together is only 0.41.

LONGER SEQUENCES OF TRIALS

Let us consider the same example as above, but with a certain modification. We have again nine types of objects in equal proportions in our urn. Thus, the initial probability for each category is again u = 1/9 = 0.11 for each of the nine types. Once again, we 'partition off' those instances when casts with a stone have shown success with a frequency of 60%. However, this time we will assume that the total number of casts is no longer 5 but 500, and, correspondingly, that the number n_1 of required successes is changed from 3 to 300. What can we now infer with regard to the probabilities of the nine values of p?

The same formula as used above will answer our question, even though the process of computation is now more complicated, owing to the higher powers of p and (1 - p), and to the higher value of $\binom{n}{n_1}$. The result is as follows: The probability of inference for p = 0.6 is now 0.99995, while the figure which corresponds to the sum of the three neighbouring p-values 0.5, 0.6, and 0.7 differs from unity only in the 17th decimal place. We can therefore say that it is 'almost certain' that an object which has had 60% 'success' in 500 trials has a probability p of success equal or almost equal to 0.60.

This result is not surprising. It shows that inference based on a long series of experiments is far more effective than that based on a short one. If we consider all the stones in the urn, the same probability corresponds to each of the nine categories, whereas by considering only those stones which showed 60% success in 500 trials, we find, practically speaking, no probability corresponding to values of p smaller than 0.5 and larger than 0.7. This result stands out even more markedly as we increase the number n of observations still further. This feature constitutes the main content of Bayes's Theorem. Let us state it, for the moment, in a preliminary version:

If an object, arbitrarily drawn out of an urn, is subjected to a large number n of observations showing a frequency of success n_1/n = a, it is highly probable that the probability of success p of this object is equal or nearly equal to the observed relative frequency a. Stated a little more precisely, this probability tends to unity as n increases indefinitely.

We have to clarify further the meaning of the words 'equal or nearly equal to a'. If we assume that there is merely a finite number of different stones with corresponding different, discrete values of the probability p of success, then the observed frequency a of success will, in general, not coincide with any of those values of p. There will, however, be two values immediately adjoining the observed value of a which are equal to some of the possible p-values and we can apply the statement of Bayes's Theorem to these two. If, as in our case, the observed value of a does coincide with one of the possible values of p, then this one alone, or together with the two immediately adjoining values of p, is to be included in the statement of Bayes's proposition.

The mathematician will often prefer to stipulate that the probability of success p is not restricted in advance to certain discrete fractions but that it can assume any value in the whole interval between 0 and 1, or in a given part of this interval. In that case, the


initial probability u(p) is given as a probability density. Bayes's Theorem then considers p along a short interval extending from a - ε to a + ε, where ε is an arbitrarily small number. We can then state the following:

If an object picked at random has shown a frequency of success a in a long sequence of experiments, then the probability P that the probability p of this object lies between a - ε and a + ε will approach unity more and more closely as the number n of experiments is more and more increased. The number ε can be arbitrarily small, so long as the interval from a - ε to a + ε is large enough to include, in the case of preassigned discrete p-values, at least two possible values of p.
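The two numerical cases treated above (three successes in five casts, and 300 successes in 500 casts, both with nine equally frequent kinds of stones) can be recomputed with a few lines of code. This is a sketch only: the names `P_VALUES`, `log_w` and `inferred` are introduced here, and the computation is carried out with logarithms simply to avoid numerical underflow for n = 500.

```python
from math import exp, lgamma, log

P_VALUES = [k / 10 for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9

def log_w(n1: int, n: int, p: float) -> float:
    """Logarithm of w(n1; p), the probability of n1 successes in n trials."""
    return (lgamma(n + 1) - lgamma(n1 + 1) - lgamma(n - n1 + 1)
            + n1 * log(p) + (n - n1) * log(1 - p))

def inferred(prior, n: int, n1: int):
    """Inferred probabilities f(p, a) / F(a) for each p in P_VALUES."""
    logs = [log(u) + log_w(n1, n, p) for u, p in zip(prior, P_VALUES)]
    top = max(logs)                      # common rescaling; cancels in f / F
    f = [exp(x - top) for x in logs]
    F = sum(f)
    return [x / F for x in f]

uniform = [1 / 9] * 9
short = inferred(uniform, 5, 3)      # three successes in five casts
long_ = inferred(uniform, 500, 300)  # 300 successes in 500 casts

for p, s, l in zip(P_VALUES, short, long_):
    print(f"p = {p:.1f}   n = 5: {s:.3f}   n = 500: {l:.5f}")
```

With the short run the inferred probability is spread over several neighbouring categories; with the long run it is concentrated almost entirely on p = 0.6.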
INDEPENDENCE OF THE INITIAL DISTRIBUTION

We still have to add a very essential remark in order to bring into proper focus the actual content and the great importance of Bayes's proposition. We have seen that the probability of inference depends on two groups of data: (1) the initial probability u(p); (2) the observed results of n experiments from which the inference is drawn. In our first example, we assumed that the nine different types of stones, with p = 0.1 to 0.9, were contained in equal numbers in the urn, so that the value of each of the nine probabilities u(p) was equal to 1/9. Let us now assume that there are in the urn one stone of the first category, two of the second, ..., and nine of the ninth category. The total content will now be 45 stones (being the sum of the numbers 1 to 9), and the values of the probabilities will be: u(0.1) = 1/45, u(0.2) = 2/45, ..., u(0.9) = 9/45. The probability of inference can again be computed according to the formula on p. 000, substituting, in place of the previous factor 1/9, the new values 1/45, 2/45, ..., 9/45, respectively. We now obtain the following results from our calculations: The probability of inference for p = 0.1 is now 0.001 (compared with 0.005 before); for p = 0.2, it is now 0.011 (0.031 before); for p = 0.5, 0.6, 0.7, we get 0.16, 0.22, 0.23 (0.19, 0.21, 0.19 before), and for p = 0.9, we find 0.07 (0.05 before). As was to be expected, the new values are markedly different from the earlier ones. If we compare the inferred with the initial probabilities we find, however, again that the numerical results are higher for values of p close to 0.6, and lower for values further away from 0.6.

Let us now consider the same distribution of the initial probabilities but assume that the number n of experiments is 500. We then arrive at the very remarkable fact that now there is no noticeable change in the inferred probabilities. Except for negligible differences, all the results remain the same as before when u(p) was uniform. If we pause to reflect on this fact, we find that it is not so surprising after all. As long as the number of experiments is small, the influence of the initial distribution predominates; however, as the number of experiments increases, this influence decreases more and more. Since Bayes's Theorem is, mathematically speaking, a proposition applying to an infinite number of experiments, we conclude that the above-stated proposition of Bayes holds true independently of the given initial probabilities, i.e., whatever the contents of the urn from which the stone was drawn.

Conditions like those given in our examples, such as an urn with a given distribution of objects, occur only very rarely. More commonly, we may pick a die about which nothing will be known except that it seems a suitable object with which to test the alternative 'six or nonsix'. The conclusions remain the same as before: If n experiments have shown n_1 successes, then, so long as n is small, we cannot conclude anything from this since for small n, the result of our inference depends mainly on the initial distribution, i.e., on the general make-up of the dice from which we have picked our particular die. If, however, n is a larger number, say 500, then we can draw conclusions without any further knowledge of the total body of available dice. And we can say that there is indeed a very high probability that the probability of success of the die we picked will lie close to the observed frequency of success. Once we have derived this fact, it appears clear and almost obvious; it is, however, quite remarkable to realize that it is a direct result of probability calculus and can be derived
only on the basis of the frequency definition of probability.

A brief and useful formulation of Bayes's Theorem can be obtained by substituting for the term 'probability of success' the definition of this probability. We can then state: If a sufficiently long sequence of alternatives has shown a frequency of success a, we can expect with a probability very close to unity that the limiting value of this frequency will not be very different from a. This brings out clearly the close relation between the First and the Second Laws of Large Numbers.

In the fifth lecture, we shall discuss problems of mathematical statistics which are closely related to Bayes's Theorem. It is indeed the principal object of statistics to make inferences on the probability of events from their observed frequencies. As the initial probabilities of such events are generally unknown, that aspect of the inference problem which we have just discussed will prove to be essential. It explains why we can generally draw meaningful conclusions only from a large body of statistical observations and not from small groups of experiments.
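The fading influence of the initial distribution can be seen by comparing the inferred probabilities obtained from the two urns considered in this section: the one with equal numbers of each kind of stone, and the one containing 1, 2, ..., 9 stones of the nine kinds. The sketch below does this for n = 5 and n = 500; the names `inferred`, `graded` and `largest_gap` are chosen here for illustration only.

```python
from math import exp, log

P_VALUES = [k / 10 for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9

def inferred(prior, n: int, n1: int):
    """Inferred probabilities of the nine p-values after n1 successes in n."""
    # The binomial coefficient is the same for every p and cancels in the
    # quotient f / F, so it is omitted here.
    logs = [log(u) + n1 * log(p) + (n - n1) * log(1 - p)
            for u, p in zip(prior, P_VALUES)]
    top = max(logs)                      # rescale to avoid underflow
    f = [exp(x - top) for x in logs]
    total = sum(f)
    return [x / total for x in f]

uniform = [1 / 9] * 9                      # equal numbers of each kind
graded = [k / 45 for k in range(1, 10)]    # 1, 2, ..., 9 stones of each kind

def largest_gap(n: int, n1: int) -> float:
    """Largest difference between the two sets of inferred probabilities."""
    a = inferred(uniform, n, n1)
    b = inferred(graded, n, n1)
    return max(abs(x - y) for x, y in zip(a, b))

print(largest_gap(5, 3))      # noticeable for a short run
print(largest_gap(500, 300))  # negligible for a long run
```

For five casts the two urns lead to visibly different inferred probabilities; for 500 casts the largest difference is far below anything that would show in two or three decimals.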
THE RELATION OF BAYES'S THEOREM TO POISSON'S THEOREM

It is not my intention to show how Bayes's Theorem is reduced, in the same way as Poisson's Theorem, to a purely arithmetical proposition if we adhere to the classical definition which regards probability as the ratio of the number of favourable events to the total number of equally possible events. This proposition leads to the prediction of empirical results only if we smuggle into it again an ad hoc hypothesis such as: 'Events for which the calculation gives a probability of nearly 1 can be expected to occur nearly always, i.e., in the great majority of trials'. This hypothesis, as we know, is equivalent to the frequency definition of probability in a restricted form. By introducing this additional hypothesis we change the meaning of 'probability' somewhere between the beginning and the end of the investigation.

From our point of view, a more important aspect of the question is the relation in the new frequency theory of probability between Bayes's Theorem and the Law of Large Numbers (Poisson's Theorem), and the relation of Bayes's proposition to the axiom of the existence of limiting values of relative frequencies. At first sight nothing seems more reasonable than to identify the proposition: 'If a relative frequency a is found in a sequence of length n (n being a large number), it will almost certainly remain nearly unchanged by an indefinite prolongation of the sequence', with the simple assertion: 'The relative frequency approaches a definite limiting value with increasing n.' The essential difference lies, however, in the words 'almost certainly', and these words can be replaced by 'nearly always'. If a stone has been drawn from the urn and thrown n times (n being a large number), and the result 6 has been obtained n_1 times, so that the relative frequency of this result is a = n_1/n, this experiment says nothing about the behaviour of the same stone in a prolonged sequence of throws. If we merely assume the existence of limiting values, and nothing concerning randomness, it is quite possible that for almost all stones which have given the same value of n_1/n = a in n throws the limiting value of the frequency will differ considerably from a, however large n may be. Bayes's proposition means that in practice this is not the case, and that the limiting value is nearly always only slightly different from the observed relative frequency n_1/n. Bayes's Theorem thus contains a new statement, not identical with the premise used in its derivation (i.e., the first axiom) and obtainable only by a mathematical deduction, which uses also our second axiom, that of randomness. Because of the analogy of this theorem with the Law of Large Numbers, it is often called the 'Second Law of Large Numbers'.

These considerations have led us further than I intended into the field of abstract arguments, and I will conclude by restating the two Laws of Large Numbers and the First Axiom in the special form adapted to the problem of throwing dice.

THE THREE PROPOSITIONS

In n casts of a die, a certain result, say 6, was obtained n_1 times. The three propositions are as follows:
(1) The First Axiom, which is basic to our definition of probability, says that in an indefinitely long repetition of the same game the quotient n_1/n will approach a constant limiting value, this value being the probability of casting 6 with this particular die.
(2) The First Law of Large Numbers, which is also called the Bernoulli-Poisson Theorem, says that if the game of n casts is repeated again and again with the same die, and n is a sufficiently large number, nearly all games will yield nearly the same value of the ratio n_1/n.
(3) The Second Law of Large Numbers, or Bayes's Theorem, says that if, for a great number of different dice, each one has given n_1 results 6 in a game of n casts, where n is a sufficiently large number, nearly all of these dice must have almost the same limiting values of the relative frequencies of the result 6, namely, values only slightly different from the observed ratio n_1/n.
These formulations exactly delimit the three propositions; all that must be added is that the first of them is an axiom, that is to say an empirical statement which cannot be reduced to simpler components. The other two are derived mathematically from this axiom and the axiom of randomness. Propositions (2) and (3), the two Laws of Large Numbers, lose their relation to reality if we do not assume from the beginning the axiom (1) of limiting frequencies.
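Proposition (3) lends itself to a small simulation in the spirit of the frequency definition: many dice are drawn, each is cast n times, only those showing exactly n_1 sixes are retained, and the p-values of the retained dice are inspected. The sketch below reuses the urn with nine equally frequent kinds of objects as a stand-in for 'a great number of different dice'; the values n = 50 and n_1 = 30, the number of draws, the seed, and the function name `simulate` are arbitrary choices made here to keep the run short.

```python
import random

def simulate(n: int = 50, n1: int = 30, draws: int = 40_000, seed: int = 0):
    """Return the p-values of all drawn dice that showed n1 sixes in n casts."""
    rng = random.Random(seed)
    p_values = [k / 10 for k in range(1, 10)]
    selected = []
    for _ in range(draws):
        p = rng.choice(p_values)                 # draw a die from the urn
        sixes = sum(rng.random() < p for _ in range(n))
        if sixes == n1:                          # partition: keep this outcome only
            selected.append(p)
    return selected

selected = simulate()
# Share of retained dice whose p is 0.5, 0.6 or 0.7, i.e. close to n1/n = 0.6.
close = sum(abs(p - 0.6) < 0.15 for p in selected) / len(selected)
print(len(selected), close)
```

Even with only 50 casts per die, the great majority of the retained dice have a p-value adjoining the observed ratio 0.6, and the share grows further as n is increased.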
GENERALIZATION OF THE LAWS OF LARGE NUMBERS

The two theorems discussed,that of Bernoulli and Poisson and that bearing the name of Bayes, form the classical Laws of Large 125

PROBABI LI TY,

STATI STI CS

AN D T R U T H

TH E L AWS OF L AR GE

N U M BER S

Numbers.It may be worth while mentioning that we owe to Bayes only the statement the problemand the principleof its solution. of The theoremitself was first formulatedby Laplace. recenttimes In the two laws have beenconsiderably supplemented extended. and We are not going to discuss thesecontributions detail here,bein cause theydo not affectthe foundations the theoryof probability. of On the other hand, we cannotpassover them altogether. Someof themareimportantfor thepractical application the theory;others of have causeda controversy which is interesting from our point of view.Although the positionis absolutely clear,many opponents to the frequency theorystill try to interpreteverynew theoremof this kind in sucha way as to construct contradiction the frequency a to conception. This is especially true for the so-called Strong Law of LargeNumbers. Let us first consider kinds of propositions the which can reasonably be considered belongto the class Lawsof LargeNumbers. to of The limitationsare,of course, more or lessarbitrary.In accordance with a certainestablished usage, suggest following: First of we the all, the problemunder consideration must contain a large number n of singleobservations; is to say,it must represent combinathis a tion of n collectives. aim of thecalculation The mustbe a probability P, determined the r observations. otherwords,the final collecby In tive mustbe produced the combination the n initial collectives, by of P beingtherefore functionof n. The lastcharacteristic the proba of lem is the fact that P approaches as n increases I indefinitely. The solutioncan therefore alwaysbe formulated a sentence in beginning with'It is almostcertainthat whenn becomes verylarge. .' In the theoremof Bernoulli and Poisson,it is 'almost certain' that the relativefrequency an eventin a sequence n singleexperiments of of is nearlyequalto its probability. 
In the theorem of Bayes it is 'almost certain' that the unknown probability lies close to the frequency found in an empirical sequence. Of course these abbreviated formulations do not express the full content of the theorems; complete formulations have been given in the preceding paragraphs.

The word 'condensation' is a very adequate one for the description of the mathematical facts expressed in the different Laws of Large Numbers. Usually a probability distribution for a number of different attributes is more or less uniform. In the cases to which the Laws of Large Numbers apply, the total probability 1 is 'condensed' in one point or rather in its very close neighbourhood. The condensation becomes more and more pronounced with an increase in the value of the parameter n. Analogously to the terminology used in analysis, attempts have been made to introduce into the theory of probability the notion of convergence. In my opinion these attempts are not too fortunate. The expression 'convergence in the sense of probability' does not add much to the clarification of the facts. This terminology has been suggested by those mathematicians whose point of view I have characterized at the end of my preceding lecture, and who are inclined to make the theory of probability a part of the theory of sets or of the theory of functions.
THE STRONG LAW OF LARGE NUMBERS

Various mathematicians, especially Cantelli and Pólya,⁸ have recently derived a proposition which goes somewhat further than the theorem of Bernoulli and Poisson. As in the derivation of Bernoulli's Theorem, we again consider a group of n repetitions of a simple alternative (e.g., 0 or 1), and treat this whole group as an element of a collective composed of an infinite number of such groups. We have considered the number $n_0$ of 0's as the attribute of each group of n observations. If we introduce now the letter x to denote the relative frequency $n_0/n$ of zeros in a group, the theorem of Bernoulli and Poisson says that it is almost certain that the frequency x is not very different from the probability p.

Let us now consider more closely the n single observations forming an element of our collective. Let m be a number smaller than n, so that n − m = k is a positive number. Among the first (m + 1) results in a group there will be a certain number of zeros, which may be anything between zero and m + 1. We denote the frequency of 0's in this part of the group by $x_1$. The corresponding frequency in the first m + 2 observations we denote by $x_2$, and so on, up to $x_k$. If, for instance, m = 10 and n = 15, we start by calculating the frequency of 0's within the first eleven observations. If six 0's have been recorded in the first eleven trials, then $x_1 = 6/11$. If the twelfth trial gives a 0, then $x_2 = 7/12$; in the opposite case $x_2 = 6/12$. The last frequency to be calculated is $x_5$, determined from the total number of 0's in the fifteen results. In fact, $x_k$ is nothing else but the frequency denoted previously by x.

As attributes of an element in our collective, we now consider not simply the value of x or $x_k$, but the whole system of values $x_1, x_2, \ldots, x_k$, which are all positive and less than 1. We shall now perform a mixing operation: Let p be the probability of a zero in the original simple alternative (e.g., p = 1/2 in the game with an ordinary coin), and ε a small positive number. Some of the k numbers $x_1$ to $x_k$ may belong to the interval from p − ε to p + ε, others may lie outside it. If at least one of the k numbers $x_1$ to $x_k$ falls outside this interval, we register this as an 'event'. If all the k numbers fall in the interval, we say that the event did not occur.

The collective derived in this way is again a simple and well-defined alternative. We are interested in the probability P of the occurrence of the above-defined event in a group of n single observations. We can describe P as the probability that the relative frequency of the result zero deviates by more than ε from the fixed value p at least once in the interval between the mth and the last (nth) single observations in a group of n experiments. This probability can be calculated by the repeated application of the rules described in the second lecture.

The probability P depends on the four variables n, m, p, and ε. We are, however, not so much interested in the actual value of P, as in a certain remarkable property of this function. Calculation shows that P is always smaller than the reciprocal of the product of m and $\varepsilon^2$, i.e.,

$$P < \frac{1}{m\varepsilon^2}.$$

This relation is independent of the values of n and p.

Let us consider the meaning of this result more closely. However small the constant value adopted for ε (e.g., $\varepsilon = 0.001$, or $\varepsilon = 10^{-6}$), the expression $1/(m\varepsilon^2)$ decreases indefinitely with indefinite increase in m, and finally tends to zero. The number n, which is supposed to be larger than m, will of course also increase indefinitely during this process. When the probability of an event decreases to 0, the probability of its nonoccurrence increases to 1. The above relation can then be interpreted thus: It is almost certain that between the mth and the nth observations in a group of length n, the relative frequency of the event 'zero' will remain near the fixed value p and be within the interval from p − ε to p + ε, provided that m and n are both sufficiently large numbers.
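The event defined above can also be examined empirically. What follows is only a minimal sketch, not part of the derivation itself: it imitates many groups of n observations of a simple alternative and counts the groups in which the relative frequency of zeros leaves the interval from p − ε to p + ε at least once after the mth observation. The function name `deviation_probability`, its parameters, and the use of a pseudo-random generator as a stand-in for a collective are assumptions made purely for illustration.

```python
import random


def deviation_probability(n, m, p, eps, groups=2000, seed=0):
    """Estimate P: the share of groups in which the relative frequency of
    zeros deviates from p by more than eps at least once between the
    (m+1)th and the nth observation (hypothetical helper)."""
    rng = random.Random(seed)  # pseudo-random stand-in for a collective
    events = 0
    for _ in range(groups):
        zeros = 0
        occurred = False
        for i in range(1, n + 1):
            if rng.random() < p:      # a 'zero' occurs with probability p
                zeros += 1
            # x_1, ..., x_k are the frequencies after m+1, ..., n observations
            if i > m and abs(zeros / i - p) > eps:
                occurred = True
                break
        events += occurred
    return events / groups


if __name__ == "__main__":
    n, m, p, eps = 600, 400, 0.5, 0.1
    print("estimated P :", deviation_probability(n, m, p, eps, groups=300))
    print("upper limit :", 1 / (m * eps ** 2))
```

With these assumed values the upper limit is 0.25, and the estimated share of groups showing the event should lie below it; the estimate itself varies with the seed and the number of groups.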
The difference between this proposition and Bernoulli's Theorem is quite clear: formerly, we concluded only that the relative frequency x will almost certainly be nearly equal to p at the end of each group of n observations; now we add that, if m is sufficiently large, this frequency will remain nearly equal to p throughout the last part of the group, beginning from the mth observation.

The amazing and unexpected part of this result is the fact that the upper limit $1/(m\varepsilon^2)$ of the probability P is independent of n. This result has given rise to the incorrect and unclear explanations to which I have previously referred. Let us assume a constant value for m, say m = 1000, and consider steadily increasing values of n, say n = 2000, 3000, 4000, etc. Obviously, the probability P of the deviation in question increases with an increase in n, since the number k = n − m of observations in which the deviation of the relative frequency from the fixed value p can occur becomes larger and larger. Calculation shows that despite this increase in the number of possibilities for the occurrence of the deviation, its probability will not increase above the fixed limit $1/(m\varepsilon^2)$. If ε is, for instance, 0.1 and m is 10,000, then $1/(m\varepsilon^2)$ is 0.01, and we can make the following statement: We may expect with a probability exceeding 99% that after 10,000 tosses of a coin the relative frequency of 'heads' will always be included within the interval from 0.4 to 0.6 (i.e., between p − ε and p + ε), no matter how large the total number of tosses from which this frequency is calculated may be, whether it is a million, or ten millions, or a larger number still.

This formulation of the Strong Law of Large Numbers, and the way in which we derived it, shows clearly that both the problem and its solution fit into the general scheme of the frequency theory without difficulty. It is a problem of constructing a certain new collective by means of the usual operations. This is all I wanted to show; it is not my purpose to give a discussion of the incorrect expositions which the publication of this proposition provoked.
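The arithmetic behind the statement about 10,000 tosses can be checked with a few lines of exact rational arithmetic. This is merely an illustrative sketch; the helper names `strong_law_bound` and `smallest_m` are hypothetical, and ε and the tolerated probability are passed as strings so that they are represented exactly.

```python
import math
from fractions import Fraction


def strong_law_bound(m, eps):
    """Upper limit 1/(m * eps^2) of the probability P; it involves neither n nor p."""
    eps = Fraction(eps)
    return 1 / (m * eps ** 2)


def smallest_m(eps, delta):
    """Smallest m for which the upper limit 1/(m * eps^2) does not exceed delta."""
    eps, delta = Fraction(eps), Fraction(delta)
    return math.ceil(1 / (delta * eps ** 2))


if __name__ == "__main__":
    print(strong_law_bound(10_000, "0.1"))   # upper limit for m = 10,000, eps = 0.1
    print(smallest_m("0.1", "0.01"))         # m needed for an upper limit of 0.01
```

Since the limit does not depend on n, the same value of m serves for a million tosses as for ten millions.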
THE STATISTICAL FUNCTIONS

I wish to use the remainder of my time for the discussion of another generalization of the Law of Large Numbers,⁹ which is of more practical interest and forms a suitable link with those problems of statistics with which we shall deal in the next lecture. We begin by substituting a general collective with many different attributes for the simple alternative (1 or 0) which we have been considering hitherto. As an example, we may use the game of roulette with its thirty-seven possible results, the numbers 0 to 36.

Let us consider, as an element in a new collective, a group of n, say 100, single games of roulette. The attribute of this new element is a certain arrangement of 100 numbers from 0 to 36. We may, however, not be interested in the 100 individual results, but only in the relative frequency of the 37 attributes, i.e., in indicating the number of times when 0, 1, 2, . . . and, finally, 36 appear in the group in question. We call this the statistical description of the group, and it is obvious that the sum of these 37 entries will be n = 100. If we divide each of these entries by n, we shall obtain a sequence of proper fractions with sum 1. These fractions $x_0, x_1, x_2, x_3, \ldots, x_{36}$ are the relative frequencies of the occurrence of the different results, 0 to 36, in the group under consideration. These quantities $x_0, x_1, x_2, x_3, \ldots$, with the sum 1, describe what we call the frequency distribution of the various results in the group of 100 experiments.

In the sense of the definitions given in the second lecture, the transition from the original complete description of the group to the new abbreviated one is equivalent to a 'mixing' operation. A given frequency distribution can be due to a great number of different arrangements. In the simple case of only two possible results, 0 and 1, and a group of, say, 10 casts, the distribution of 0.30 zeros and 0.70 ones can be produced by 120 different arrangements, since three 0's and seven 1's can be arranged in 120 different ways. If the probability of each possible arrangement is known, the probability of any frequency distribution can be calculated according to the law of mixing, by means of summations.

It must be pointed out that the same frequency distribution can correspond to different lengths of the group. For instance, the distribution of three 0's, two 1's, and five 2's in a group of ten observations is equal to that of fifteen zeros, ten 1's, and twenty-five 2's in a group with n = 50. To say that we know the distribution of some attributes in a certain experimental material does not necessarily involve the knowledge of the number n of the experiments made.

The subject of our interest is, however, often not the frequency distribution as such, but some quantity derived from it. It is this quantity which is considered as the attribute in which we are interested.
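The two remarks above, that one frequency distribution covers many arrangements and that it does not fix the length n of the group, can be made concrete in a short sketch. The helper names `frequency_distribution` and `number_of_arrangements` are hypothetical and serve only this illustration.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def frequency_distribution(results):
    """Relative frequency of each attribute in a group, as exact fractions."""
    counts = Counter(results)
    n = len(results)
    return {attribute: Fraction(count, n) for attribute, count in sorted(counts.items())}


def number_of_arrangements(counts):
    """Number of distinct arrangements having the given attribute counts,
    i.e. n! divided by the factorials of the individual counts."""
    total = factorial(sum(counts))
    for count in counts:
        total //= factorial(count)
    return total


if __name__ == "__main__":
    ten = [0] * 3 + [1] * 2 + [2] * 5          # a group of ten observations
    fifty = [0] * 15 + [1] * 10 + [2] * 25     # a group with n = 50
    print(frequency_distribution(ten) == frequency_distribution(fifty))
    print(number_of_arrangements([3, 7]))      # three 0's and seven 1's
```

Exact fractions are used instead of decimals so that the comparison of the two distributions is not disturbed by rounding.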
This amounts to the performance of another mixing operation, in which all distributions leading to the same value of the attribute are mixed. A quantity of this kind, derived from n single results, but depending only on their frequency distribution (and not on their arrangement or on the total number n), is called a statistical function.

The simplest example of a statistical function is the average of the observations. If fifteen 0's, ten 1's, and twenty-five 2's have been counted in a group of fifty results, the average is calculated by dividing by fifty the following sum:

$$(15 \times 0) + (10 \times 1) + (25 \times 2) = 10 + 50 = 60;$$

this gives us the result 60/50 = 1.20. Another method of calculation, which leads to the same result, is to multiply each of the three results (0, 1, 2) by its frequency in the distribution; in our case, the three frequencies are 0.30, 0.20, and 0.50, and the addition of the three products gives

$$(0.30 \times 0) + (0.20 \times 1) + (0.50 \times 2) = 0.20 + 1.00 = 1.20.$$
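The two methods of calculation just described can be set side by side in a brief sketch. The function names are hypothetical; exact fractions stand in for the decimal frequencies 0.30, 0.20, and 0.50.

```python
from fractions import Fraction


def average_from_counts(counts):
    """First method: sum of (result x number of occurrences), divided by n."""
    n = sum(counts.values())
    return Fraction(sum(result * count for result, count in counts.items()), n)


def average_from_distribution(distribution):
    """Second method: sum of (result x relative frequency); n is not needed."""
    return sum(result * frequency for result, frequency in distribution.items())


if __name__ == "__main__":
    counts = {0: 15, 1: 10, 2: 25}                      # fifty results
    distribution = {0: Fraction(3, 10), 1: Fraction(1, 5), 2: Fraction(1, 2)}
    print(average_from_counts(counts))
    print(average_from_distribution(distribution))
```

The second function never receives the number n, which is the sense in which the average depends on the frequency distribution alone.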


We see that the average depends in fact only on the numbers $x_0$, $x_1$, and $x_2$, and not on the arrangement of the fifty results. Neither is it affected by doubling the total number of observations, provided that this involves doubling the number of results of every kind.

The average, or mean, of a number of results is thus a statistical function. Various other statistical functions will be described in the next lecture. We are now going to say a few words on the Laws of Large Numbers in their relation to statistical functions.

THE FIRST LAW OF LARGE NUMBERS FOR STATISTICAL FUNCTIONS

The average of a sequence of n measurements whose single results are either 0's or 1's is obviously equal to the ratio of the number of 1's to n. The Bernoulli Theorem may therefore be stated in this way: If n observations of an alternative (0 or 1) are grouped together to form an element of a collective, it is almost certain that the average of the n observations will be nearly equal to a certain known number p, if n is sufficiently large.

Poisson's generalization of Bernoulli's proposition can now be formulated as follows: The n observations need not be derived from a single alternative; it is also admissible to calculate the average in the case of n different alternatives, that is, to divide by n the number of 'positive' results obtained in such a sequence of experiments. The 'condensation of probability' demonstrated by Bernoulli for the case of n identical alternatives occurs in this case as well.

The next step in the generalization of the theorem is mainly due to Tschebyscheff. Tschebyscheff's proposition says that the results need not be derived from simple alternatives; they can also be taken from collectives with more than two different attributes. We can, for instance, consider n games played with a certain roulette wheel, or n games played with different wheels, and calculate the average of the n numbers representing the results of all games. If n is sufficiently large, it is almost certain that this average will be nearly equal to a certain known number, which is determined by the probability distributions of the n initial collectives.

Certain recent investigations enable us to generalize the proposition still further, and in this new form it becomes a very important theorem of practical statistics. The phenomenon of 'condensation', first described by Bernoulli, holds not only for the average of n results, but essentially for all statistical functions of n observations, if n is a sufficiently large number. In other words, if we observe n collectives (which may be all equal or different), and if we calculate the value of some function of the n results which depends on their frequency distribution (but not on the order of the observations nor on their total number), then, provided n is sufficiently large, it is almost certain that the value so obtained will differ but little from a certain known number, calculated in advance from the probability distributions of the n collectives. The expressions 'almost certain', etc., are, of course, abbreviations whose meaning must be interpreted according to our definition of probability, in the following way: If a group of n experiments is repeated a very large number of times, and if ε is an arbitrary small number, the value calculated each time from the n observations will in the overwhelming majority of all groups differ by less than ε from the 'theoretical' known value. The larger n is, the greater is the majority of cases in which this prediction turns out to be true for a given constant value of ε.
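The frequency interpretation given in the last sentences lends itself to a small numerical experiment. The sketch below is an illustration under stated assumptions rather than a proof: a pseudo-random generator imitates an ideal roulette wheel with 37 equally probable results, the statistical function is taken to be the average (theoretical value 18), and the names `average` and `share_within` are hypothetical.

```python
import random


def average(results):
    """The simplest statistical function: the mean of the observations."""
    return sum(results) / len(results)


def share_within(n, eps, stat=average, theoretical=18.0, groups=200, seed=0):
    """Share of groups of n roulette games in which the statistical function
    differs from its theoretical value by less than eps."""
    rng = random.Random(seed)  # stand-in for an ideal wheel, results 0..36
    hits = 0
    for _ in range(groups):
        results = [rng.randrange(37) for _ in range(n)]
        if abs(stat(results) - theoretical) < eps:
            hits += 1
    return hits / groups


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, share_within(n, eps=1.0))
```

For a fixed ε the printed share should grow towards 1 as n increases, which is the 'condensation' spoken of above; the exact figures depend on the seed and on the number of groups.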

THE SECOND LAW OF LARGE NUMBERS FOR STATISTICAL FUNCTIONS

Bayes's Theorem, discussed in one of the preceding sections, can also be generalized in such a way as to apply to statistical functions. Let us imagine now that n observations have been made on one collective whose distribution is unknown, e.g., n throws with a stone of apparently cubic form and selected at random from a heap of such stones. From n observations made with this stone (each characterized by one of the numbers 1 to 6), we deduce a certain number L, which depends neither on the order of results nor on the value of n, but only on the frequency distribution of the results; L is, therefore, a statistical function. According to the considerations of the preceding sections, we must assume the existence of a certain 'theoretical value of L', denoted by $L_0$, which is determined by the probability distribution in the original collective, in our case by the six probabilities of the six sides of the stone; $L_0$ is unknown, since we have assumed that the stone which served for the experiments has not been investigated before. All we know about this stone is the value of L calculated from the observations.

The Second Law of Large Numbers, generalized for statistical functions, asserts that, with a sufficiently large value of n, the unknown theoretical value $L_0$ will 'almost certainly' lie quite near to the observed experimental value L. The original Bayes Theorem was equivalent to the above statement for the special case of the average of the results of a simple alternative; the 'theoretical' value was in this case the fundamental probability p of the event under consideration. We can now formulate the general proposition: Assuming that n observations have been made on an unknown collective and have produced the value L of a certain statistical function, if n is a sufficiently large number it is almost certain that the unknown theoretical value of this function, $L_0$, characteristic of the distribution in the collective under investigation, lies very near to the observed value L. The way of interpreting the expressions 'almost certain', etc., by means of the frequency definition of probability has been indicated above.

The proposition which we have now formulated thus allows an inference into the nature of an unknown collective based on the results of a sufficiently long sequence of experiments. It is therefore of an even greater practical importance than the First Law of Large Numbers. It is perhaps the most important theorem of theoretical statistics. In the next lecture we shall consider as an example of an important statistical function the so-called Lexis's Ratio. If a sufficiently long sequence of observations has given for this ratio a certain value, say L = 1.1, the generalized Second Law of Large Numbers allows us to assume that the 'theoretical' value of L (i.e., the value characteristic of the material of our investigation) is nearly equal to 1.1. I do not intend to deal here with the mathematical characterization of those statistical functions to which the two laws apply. Instead, I will close this lecture with the following remarks.
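The example of the stone taken from a heap can be imitated numerically. The following is a rough sketch resting on several assumptions that are not in the text: the heap is modelled by drawing six unequal side probabilities at random, the statistical function L is taken to be the average of the throws, and the names `random_stone` and `share_close` are hypothetical.

```python
import random

FACES = (1, 2, 3, 4, 5, 6)


def random_stone(rng):
    """Draw one stone from the heap: six unequal side probabilities (assumed model)."""
    weights = [rng.uniform(0.5, 1.5) for _ in FACES]
    total = sum(weights)
    return [w / total for w in weights]


def share_close(n, eps, stones=100, seed=0):
    """Share of stones for which the observed value L, computed from n throws,
    lies within eps of the theoretical value L0 of that stone."""
    rng = random.Random(seed)
    close = 0
    for _ in range(stones):
        probs = random_stone(rng)
        theoretical = sum(f * p for f, p in zip(FACES, probs))   # L0, unknown to the observer
        throws = rng.choices(FACES, weights=probs, k=n)
        observed = sum(throws) / n                               # L, here the average
        if abs(observed - theoretical) < eps:
            close += 1
    return close / stones


if __name__ == "__main__":
    for n in (5, 50, 2000):
        print(n, share_close(n, eps=0.2))
```

In the simulation the theoretical value is of course known to the program; the observer in the text knows only L, and the law states that for large n he will seldom err by taking $L_0$ to be near L.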
CLOSING REMARKS

In this lecture I have tried to elucidate a number of questions connected with the popular ideas of the Laws of Large Numbers, as far as this is possible with a minimum use of mathematics. I had in mind a double purpose: In the first place, I wanted to acquaint you with the content of these laws, which are indispensable to anyone who has to deal with statistical problems in one form or another, or with other applications of the theory of probability. Secondly, it was especially important for me to investigate the part played by these laws in the calculus of probability based on the frequency definition. As I said at the beginning of this lecture, and also indicated in the previous one, many objections raised against my probability theory have been directed against its supposed contradictions to the Laws of Large Numbers. I hope that I have succeeded in making the following two points sufficiently clear.

(1) Starting with the notion of a collective and the definition of probability as a limiting value of relative frequency, all the Laws of Large Numbers have a clear and unambiguous meaning free from contradictions. Each of them amounts to a definite prediction concerning the outcome of a very long sequence of experiments, each of which consists of a great number n of single observations.

(2) If we base the concept of probability, not on the notion of relative frequency, but on the definition used in the classical probability theory, none of the Laws of Large Numbers is capable of a prediction concerning the outcome of sequences of observations. When such conclusions are nevertheless drawn, this is possible only if, at the end of the calculations, the meaning of the word 'probability' is silently changed from that adopted at the start to a definition based on the concept of frequency. Naturally, such a procedure may lead to obscurities and contradictions.

Before concluding, I must add another warning. It is impossible to give absolutely correct formulations of the propositions we have discussed if the use of formulas and mathematical concepts, except those of the most elementary nature, is avoided. I hope that I have succeeded in stating all the essentials correctly. From the mathematical point of view, the formulations which I have given are still incomplete, lacking various restrictions, such as those concerning the sets of admissible functions, as well as further formal mathematical conditions. Those who are interested and possess the necessary mathematical knowledge can find this information in the mathematical literature of the subject.

FIFTH LECTURE

Applications in Statistics and the Theory of Errors


This lecture and the next, which concludes this series, will be devoted to a consideration of the two most important applications of the theory of probability. We shall no longer concentrate on games of chance. In the present lecture we will deal with various series of events, which occur in everyday life, and whose investigation is commonly called 'statistics'.

WHAT IS STATISTICS?

The word statistics has been interpreted as the 'investigation of large numbers', or 'theory of frequencies'. This is not the literal meaning of the word, but an attempt to make clear the sense which it has acquired in modern language. Long sequences of similar phenomena which can be characterized by numbers form the subject matter of statistics; examples are: population statistics (e.g., birth rates and death rates); statistics of social phenomena (e.g., marriages, suicides, incomes); statistics of biological phenomena (e.g., heredity, sizes of different organs); medical statistics (e.g., action of drugs, cures); technological and industrial statistics (e.g., mass production, mass consumption, most problems grouped today under the heading of operational research); economic statistics (e.g., prices, demand).

In these and similar fields, the usual procedure is to collect empirical material, draw general conclusions and use these to form further conclusions which may be applied in practice. Which part of this investigation should be called statistics is more or less arbitrary. We are not going to intervene in the struggle between different schools of thought, the 'general' and the 'mathematical', the 'formal' and the 'realistic'. All that is necessary for us is to delimit the field which we
