
Basic Concepts of Probability and Statistics in the Law

Michael O. Finkelstein


Michael O. Finkelstein
25 East 86 St., Apt. 13C
New York NY 10028-0553
USA
mofinkelstein@hotmail.com

ISBN 978-0-387-87500-2
e-ISBN 978-0-387-87501-9
DOI 10.1007/b105519

Library of Congress Control Number: 2008940587

© Springer Science+Business Media, LLC 2009

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

springer.com

Dedicated To My Wife, Vivian

Preface

When as a practicing lawyer I published my first article on statistical evidence in 1966, the editors of the Harvard Law Review told me that a mathematical equation had never before appeared in the review.¹ This hardly seems possible, but if they meant a serious mathematical equation, perhaps they were right.

Today all that has changed in legal academia. Whole journals are devoted to scientific methods in law or empirical studies of legal institutions. Much of this work involves statistics. Columbia Law School, where I teach, has a professor of law and epidemiology, and other law schools have similar "law and" professorships. Many offer courses on statistics (I teach one) or, more broadly, on law and social science.

The same is true of practice. Where there are data to parse in a litigation, statisticians and other experts using statistical tools now frequently testify. And judges must understand them. In 1993, in its landmark Daubert decision, the Supreme Court commanded federal judges to penetrate scientific evidence and find it reliable before allowing it in evidence.² It is emblematic of the rise of statistics in the law that the evidence at issue in that much-cited case included a series of epidemiological studies. The Supreme Court's new requirement made the Federal Judicial Center's Reference Manual on Scientific Evidence, which appeared at about the same time, a best seller. It has several important chapters on statistics.

Before all this began, to meet the need for a textbook, Professor Bruce Levin and I wrote Statistics for Lawyers, which was first published in 1990. A second edition appeared in 2000. I use the book in my course, but law students who had not previously been exposed to statistical learning frequently complained that it was too hard. This led me to write a much shorter and mathematically less challenging version: the present book. I have inflicted pamphlet versions of it on students for several years, with good feedback.

The book can be read as an introduction to statistics in law standing alone, or in conjunction with other materials. Where I thought the exposition in Statistics for Lawyers not too mathematical to be accessible, I have made free use of it, seeing no reason to restate what we had already written. Not all subjects in Statistics for Lawyers are covered; some selectivity was inevitable to keep the book short. On the other hand, a number of new cases are discussed and in some instances I have described in greater detail how the courts have grappled, or not grappled, with statistical evidence. The fate of such evidence in the inevitably messy world of litigation bears only a faint resemblance to the pristine hypotheticals, with their neat conclusions, used in most elementary statistical texts. Among the frustrations of social scientists who work in statistics and law is the fact that interesting statistical questions are not infrequently rendered moot by some point of law or fact that disposes of the case without their resolution, indeed perhaps to avoid their resolution. The book nonetheless considers such cases.

The book also includes material from a number of studies that Professor Levin and I made after Statistics for Lawyers. I am grateful to him for our long and fruitful collaboration. His current position as Chair of the Department of Biostatistics at Columbia's Mailman School of Public Health has kept him from joining me in writing this text, so any errors that may have crept by me are my responsibility alone.

Finally, I am indebted to my wife, Professor Vivian Berger, for her meticulous reading of certain chapters from the point of view of a legally trained person being introduced to parts of the subject for the first time.

New York, NY
April 28, 2008
Michael O. Finkelstein

1 The article is "The Application of Statistical Decision Theory to the Jury Discrimination Cases," 80 Harvard Law Review 338 (1966).

2 Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993).

Contents

1 Probability
    Classical and Legal Probability
    Bayes's Theorem
    Screening Tests
    Debate over Bayesian analysis

2 Descriptive Tools
    Measures of Central Location
        Mean
        Median
        Mode
        Variants of the Mean
    Measures of Dispersion
        Variance and Standard Deviation
        Variance of Sample Sums and the Sample Mean
    Correlation
    Measuring the Disparity Between Two Proportions

3 Compound Events
    The Addition Rule
    The Product Rule

4 Significance
    The Concept of Significance
    Rejecting the Null Hypothesis
    The Two-or-Three Standard Error Rule
    Statistical and Legal Significance
    Factors That Determine Significance
    Nonsignificant Data

5 Random Variables and Their Distributions
    Expectation, Variance, and Correlation
    The Binomial Distribution
    The Hypergeometric Distribution
    The Normal Distribution
    The Poisson Distribution
    Student's t-Distribution
    The Geometric and Exponential Distributions

6 Confidence

7 Power

8 Sampling
    What is Sampling?
    Simple Random Sampling
    The Problem of Bias
    More Complex Designs
    Small Samples

9 Epidemiology
    Cohort Studies
    Case-Control Studies
    Biases and Confounding
    Association vs. Causation

10 Combining the Evidence
    Aggregated and Disaggregated Data
    Meta-analysis

11 Regression Models
    Four Models
        Pay Discrimination in an Agricultural Extension Service
        Nitrogen Oxide Emission Reduction
        Daily Driving Limits for Truckers
        Bloc Voting
    Estimating the Model
    Measuring Indeterminacy in Models
        Inherent Variability
        Sampling Error
        Confidence and Prediction Intervals in Relation to the Regression Estimate

12 Regression Models: Further Considerations
    Choice of Explanatory Variables
    Proxy Variables
    Tainted Variables
    Aggregation Issues
    Forms of Models
    Forms for Variables
        Logarithms
        Quadratic Terms
        Interactive Variables
    Measuring Uncertainty in Models
    Assumptions of the Model
        First Assumption: The Errors Have Zero Mean
        Second Assumption: The Errors Are Independent
        Third Assumption: The Errors Have Constant Variance
        Fourth Assumption: The Errors Are Normally Distributed
    Validating the Model
    Logistic Regression

Table of Cases

Index