ENVIRONMENTAL
AND ECOLOGICAL
STATISTICS WITH R
Second Edition
Song S. Qian
The University of Toledo
Ohio, USA
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222
Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that
provides licenses and registration for a variety of users. For organizations that have been granted a
photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
In memory of my grandmother 张一贯, mother 仲泽庆, and father 钱拙.
Contents
Preface
I Basic Concepts
1 Introduction
2 A Crash Course on R
2.1 What is R?
2.2 Getting Started with R
2.2.1 R Commands and Scripts
2.2.2 R Packages
2.2.3 R Working Directory
2.2.4 Data Types
2.2.5 R Functions
2.3 Getting Data into R
2.3.1 Functions for Creating Data
2.3.2 A Simulation Example
2.4 Data Preparation
2.4.1 Data Cleaning
2.4.1.1 Missing Values
3 Statistical Assumptions
4 Statistical Inference
4.1 Introduction
4.2 Estimation of Population Mean and Confidence Interval
4.2.1 Bootstrap Method for Estimating Standard Error
4.3 Hypothesis Testing
4.3.1 t-Test
4.3.2 Two-Sided Alternatives
4.3.3 Hypothesis Testing Using the Confidence Interval
4.4 A General Procedure
4.5 Nonparametric Methods for Hypothesis Testing
4.5.1 Rank Transformation
4.5.2 Wilcoxon Signed Rank Test
4.5.3 Wilcoxon Rank Sum Test
4.5.4 A Comment on Distribution-Free Methods
4.6 Significance Level α, Power 1 − β, and p-Value
4.7 One-Way Analysis of Variance
4.7.1 Analysis of Variance
4.7.2 Statistical Inference
4.7.3 Multiple Comparisons
4.8 Examples
4.8.1 The Everglades Example
4.8.2 Kemp’s Ridley Turtles
4.8.3 Assessing Water Quality Standard Compliance
4.8.4 Interaction between Red Mangrove and Sponges
4.9 Bibliography Notes
Bibliography
Index
Preface
methods. When using statistics, we must first determine the nature of the
problem before deciding which statistical tools to use. This first step is not
always taught in a statistics class.
Using the PCB in fish example, I want to illustrate the iterative nature
of a statistical inference problem. We may not be able to identify the most
appropriate model at first. Through repeated efforts at proposing a model,
identifying its flaws, and revising it, we hope to
reach a sensible conclusion. As a result, a statistical analysis must have subject
matter context. It is a process of sifting through data to find useful information
to achieve a specific objective. The basic problem of the PCB in fish example
is the risk of PCB exposure from consuming fish from Lake Michigan. The
initial use of the data showed a large difference between large and small fish
PCB concentrations. However, Figure 5.1 suggests that the difference between
small and large fish PCB concentrations cannot be adequately described by the
simple two-sample t-test model. Throughout Chapter 5, I used this example
to discuss how a linear regression model should be evaluated and updated. In
Chapter 6, some alternative models are presented to summarize the attempts
made in the literature to correct the inadequacies of the linear models. But I
left Chapter 6 without a satisfactory model. In Chapter 9, I used this example
again to illustrate the use of simulation for model evaluation. While writing
Chapter 9, I discovered the length imbalance. In a way, this example shows
the typical outcome of a statistical analysis: no matter how hard we try, the
outcome is never completely satisfactory. There are always more “what
if”s. However, the ability to ask “what if” is not easy to teach or to learn,
because of the “seven unnatural acts of statistical thinking” required by a
statistical analysis: think critically, be skeptical, think about variation (rather
than about center), focus on what we don’t know, perfect the process, and
think about conditional probabilities and rare events [De Veaux and Velleman,
2008]. By examining the same problem from different angles, I hope to bring
home the essential message: statistical analysis is more than reporting a
p-value.
Since the publication of the first edition, I have learned more about the
problems of using statistical hypothesis testing. One of these problems
lies in the terminology we use in statistical hypothesis testing. The term
“statistically significant” is particularly corruptive. The term has a specific
meaning with respect to the null hypothesis. But by declaring our result
to be “significant” without further explanation, we often mislead not only
the consumer of the result but also ourselves. In this edition, I removed the
term “statistically significant” whenever possible. Instead, I try to use plain
language to describe the meaning of a “significant” result. As I explained in
a guest editorial for the journal Landscape Ecology, a statistical result should
be measured by the MAGIC criteria of Abelson [1995]: a statistical inference
should be a principled argument and the strength of the inference should
be measured by Magnitude, Articulation, Generality, Interestingness, and
Credibility, not just a p-value or R² or any other single statistic. Throughout
Song S. Qian
Sylvania, Ohio, USA
July 2016