Descriptive Statistics
Stem Leaf
3 6
4
5 37
6 235899
7 011346778999
8 00111233568889
9 02238
2.Numerical descriptions
Let y denote a quantitative variable, with
observations y1, y2, y3, ..., yn
Mean: $\bar{y} = \dfrac{y_1 + y_2 + \dots + y_n}{n} = \dfrac{\sum y_i}{n}$
Example: Annual per capita carbon dioxide
emissions (metric tons) for n = 8 largest nations
in population size
Ordered sample:
Median =
y
Mean =
Variance: $s^2 = \dfrac{\sum (y_i - \bar{y})^2}{n-1} = \dfrac{(y_1 - \bar{y})^2 + \dots + (y_n - \bar{y})^2}{n-1}$
The standard deviation s is the square root of the
variance,
$s = \sqrt{s^2}$
Example: Political ideology
For those in the student sample who attend
religious services at least once a week (n = 9
of the 60),
y = 2, 3, 7, 5, 6, 7, 5, 6, 4
$\bar{y} = 5.0$,
$s^2 = \dfrac{(2-5)^2 + (3-5)^2 + \dots + (4-5)^2}{9-1} = \dfrac{24}{8} = 3.0$
$s = \sqrt{3.0} = 1.7$
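The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch that uses only the nine scores listed above; the variable names are illustrative.

```python
import math

# Political ideology scores for the n = 9 students in the example.
y = [2, 3, 7, 5, 6, 7, 5, 6, 4]

n = len(y)
mean = sum(y) / n                                  # sample mean
squared_deviations = [(yi - mean) ** 2 for yi in y]
variance = sum(squared_deviations) / (n - 1)       # divisor n - 1
sd = math.sqrt(variance)                           # standard deviation

print(mean, sum(squared_deviations), variance, round(sd, 1))
```

The sum of squared deviations is divided by n - 1 rather than n, matching the formula for s² given above.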
p = 50: median
p = 25: lower quartile (LQ)
p = 75: upper quartile (UQ)
------------------------------
Total 589 1089 315
Can summarize by percentages on response
variable (happiness)
Data available at
http://www.stat.ufl.edu/~aa/social/data.html
Example: Survey in Alachua County,
Florida, on predictors of mental health
(data for n = 40 on p. 327 of text and at
www.stat.ufl.edu/~aa/social/data.html)
% income
spent on
lottery
e.g., at x = 0, predicted y =
at x = 100, predicted y =
Regression analysis gives line
predicting y using x
Example:
y = mental impairment, x = life events
(data at www.stat.ufl.edu/~aa/social/data.html)
Variable
  Qualitative: Nominal, Ordinal
  Quantitative: Discrete, Continuous
Nominal Variable: A qualitative variable that categorizes
(or describes, or names) an element of a population.
Variables
A variable is a characteristic or
condition that can change or take on
different values.
Most research begins with a general
question about the relationship
between two variables for a specific
group of individuals.
Population
The entire group of individuals is
called the population.
For example, a researcher may be
interested in the relation between
class size (variable 1) and academic
performance (variable 2) for the
population of third-grade children.
Sample
Usually populations are so large that
a researcher cannot examine the
entire group. Therefore, a sample is
selected to represent the population
in a research study. The goal is to
use the results obtained from the
sample to help answer questions
about the population.
Types of Variables
Variables can be classified as
discrete or continuous.
Discrete variables (such as class
size) consist of indivisible categories,
and continuous variables (such as
time or weight) are infinitely divisible
into whatever units a researcher may
choose. For example, time can be
measured to the nearest minute,
second, half-second, etc.
Real Limits
To define the units for a continuous
variable, a researcher must use real
limits which are boundaries located
exactly half-way between adjacent
categories.
Measuring Variables
To establish relationships between
variables, researchers must observe
the variables and record their
observations. This requires that the
variables be measured.
The process of measuring a variable
requires a set of categories called a
scale of measurement and a
process that classifies each individual
into one category.
4 Types of Measurement
Scales
1. A nominal scale is an unordered
set of categories identified only by
name. Nominal measurements only
permit you to determine whether
two individuals are the same or
different.
2. An ordinal scale is an ordered set
of categories. Ordinal
measurements tell you the direction
of difference between two
individuals.
4 Types of Measurement
Scales
3. An interval scale is an ordered series of
equal-sized categories. Interval
measurements identify the direction and
magnitude of a difference. The zero point
is located arbitrarily on an interval scale.
4. A ratio scale is an interval scale where a
value of zero indicates none of the
variable. Ratio measurements identify
the direction and magnitude of
differences and allow ratio comparisons
of measurements.
Correlational Studies
The goal of a correlational study is
to determine whether there is a
relationship between two variables
and to describe the relationship.
A correlational study simply
observes the two variables as they
exist naturally.
Experiments
The goal of an experiment is to
demonstrate a cause-and-effect
relationship between two variables;
that is, to show that changing the
value of one variable causes changes
to occur in a second variable.
Experiments (cont.)
In an experiment, one variable is
manipulated to create treatment
conditions. A second variable is observed
and measured to obtain scores for a group
of individuals in each of the treatment
conditions. The measurements are then
compared to see if there are differences
between treatment conditions. All other
variables are controlled to prevent them
from influencing the results.
In an experiment, the manipulated
variable is called the independent
variable and the observed variable is the
dependent variable.
Other Types of Studies
Other types of research studies,
known as non-experimental or
quasi-experimental, are similar to
experiments because they also
compare groups of scores.
These studies do not use a
manipulated variable to differentiate
the groups. Instead, the variable
that differentiates the groups is
usually a pre-existing participant
variable (such as male/female) or a
time variable (such as before/after).
Other Types of Studies
(cont.)
Because these studies do not use the
manipulation and control of true
experiments, they cannot
demonstrate cause and effect
relationships. As a result, they are
similar to correlational research
because they simply demonstrate
and describe relationships.
Data
The measurements obtained in a
research study are called the data.
The goal of statistics is to help
researchers organize and interpret
the data.
Descriptive Statistics
Descriptive statistics are methods
for organizing and summarizing data.
Notation
The individual measurements or scores
obtained for a research participant will be
identified by the letter X (or X and Y if
there are multiple scores for each
individual).
The number of scores in a data set will be
identified by N for a population or n for a
sample.
Summing a set of values is a common
operation in statistics and has its own
notation. The Greek letter sigma, Σ, will
be used to stand for "the sum of." For
example, ΣX identifies the sum of the scores.
Order of Operations
1. All calculations within parentheses are
done first.
2. Squaring or raising to other exponents is
done second.
3. Multiplying and dividing are done third,
and should be completed in order from
left to right.
4. Summation with the Σ notation is done
next.
5. Any additional adding and subtracting is
done last and should be completed in
order from left to right.
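The order of operations matters most when summation is combined with squaring or subtraction. The sketch below uses three hypothetical scores (not taken from the slides) to show that ΣX², (ΣX)² and Σ(X - 1)² are different quantities.

```python
# Hypothetical scores, used only to illustrate the order of operations.
X = [1, 2, 3]

sum_x = sum(X)                                  # ΣX: add the scores
sum_of_squares = sum(x ** 2 for x in X)         # ΣX²: square each score, then sum
square_of_sum = sum(X) ** 2                     # (ΣX)²: sum inside parentheses first, then square
sum_shifted_squares = sum((x - 1) ** 2 for x in X)  # Σ(X - 1)²: subtract, square, then sum

print(sum_x, sum_of_squares, square_of_sum, sum_shifted_squares)
```

Squaring before summing (ΣX²) and summing before squaring ((ΣX)²) give different results for the same scores, which is why the rules above fix the order.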
Basics of Statistics
Statistics presents a rigorous scientific method for gaining insight into data. For
example, suppose we measure the weight of 100 patients in a study. With so
many measurements, simply looking at the data fails to provide an informative
account. However, statistics can give an instant overall picture of the data, based
on graphical presentation or numerical summarization, irrespective of the
number of data points. Besides data summarization, another important task of
statistics is to make inferences and predict relationships between variables.
A Taxonomy of Statistics
Statistical Description of
Data
Statistics describes a numeric set of
data by its
Center
Variability
Shape
Statistics describes a categorical set
of data by
Frequency, percentage or proportion of each
category
Some Definitions
Variable - any characteristic of an individual or entity. A variable can
take different values for different individuals. Variables can be
categorical or quantitative. Following S. S. Stevens, four scales of measurement are distinguished:
Nominal - Categorical variables with no inherent order or ranking sequence, such
as names or classes (e.g., gender). Values may be numerals, but without
numerical meaning (e.g., I, II, III). The only operation that can be applied to nominal
variables is enumeration.
Ordinal - Variables with an inherent rank or order, e.g. mild, moderate, severe.
Can be compared for equality, or greater or less, but not how much greater or less.
Interval - Values of the variable are ordered as in Ordinal and, additionally,
differences between values are meaningful; however, the scale is not absolutely
anchored. Calendar dates and temperatures on the Fahrenheit scale are examples.
Addition and subtraction, but not multiplication and division, are meaningful
operations.
Ratio - Variables with all properties of Interval plus an absolute, non-arbitrary zero
point, e.g. age, weight, temperature (Kelvin). Addition, subtraction, multiplication,
and division are all meaningful operations.
Some Definitions
Distribution - (of a variable) tells us what values the variable takes
and how often it takes these values.
Unimodal - having a single peak
Bimodal - having two distinct peaks
Symmetric - left and right half are mirror images.
Frequency Distribution
Consider a data set of 26 children of ages 1-6 years. Then the
frequency distribution of variable age can be tabulated as
follows:
Frequency Distribution of Age
Age 1 2 3 4 5 6
Frequency 5 3 7 5 4 2
Grouped Frequency Distribution of Age:
Age Group 1-2 3-4 5-6
Frequency 8 12 6
Cumulative Frequency
Cumulative frequency of the data above
Age 1 2 3 4 5 6
Frequency 5 3 7 5 4 2
Cumulative Frequency 5 8 15 20 24 26
Age Group 1-2 3-4 5-6
Frequency 8 12 6
Cumulative Frequency 8 20 26
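The cumulative and grouped frequencies above are running sums of the raw frequencies. A short sketch using the age data from the table (the variable names are illustrative):

```python
from itertools import accumulate

ages = [1, 2, 3, 4, 5, 6]
frequency = [5, 3, 7, 5, 4, 2]          # number of children at each age

# Cumulative frequency: running total of the frequencies.
cumulative = list(accumulate(frequency))

# Grouped frequency for age groups 1-2, 3-4, 5-6: add adjacent pairs.
grouped = [frequency[i] + frequency[i + 1] for i in range(0, len(frequency), 2)]
grouped_cumulative = list(accumulate(grouped))

print(cumulative)
print(grouped, grouped_cumulative)
```

The last cumulative value equals the total number of children, 26, in both the ungrouped and the grouped table.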
Data Presentation
Two types of statistical presentation of data - graphical and numerical.
Graphical Presentation: We look for the overall pattern and for striking
deviations from that pattern. The overall pattern is usually described by the
shape, center, and spread of the data. An individual value that falls
outside the overall pattern is called an outlier.
Bar diagrams and pie charts are used for categorical variables.
Histograms, stem-and-leaf plots and box plots are used for numerical variables.
Data Presentation Categorical
Variable
Bar Diagram: Lists the categories and presents the percent or count of
individuals who fall in each category.
Category Frequency Proportion Percent
1 15 (15/60)=0.250 25.0
2 25 (25/60)=0.417 41.7
3 20 (20/60)=0.333 33.3
Total 60 1.00 100
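The proportion and percent columns follow directly from the counts. A minimal sketch, assuming the three category counts shown in the table (15, 25 and 20 out of 60):

```python
# Category counts from the table; the category labels 1, 2, 3 are as given.
counts = {1: 15, 2: 25, 3: 20}
total = sum(counts.values())

# For each category: (proportion rounded to 3 places, percent rounded to 1 place).
summary = {
    category: (round(count / total, 3), round(100 * count / total, 1))
    for category, count in counts.items()
}

for category, (proportion, percent) in summary.items():
    print(category, counts[category], proportion, percent)
```

A bar diagram or pie chart of these data would plot either the counts or the percentages; both carry the same information.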
Data Presentation Categorical
Variable
Pie Chart: Lists the categories and presents the percent or count of
individuals who fall in each category.
Category Frequency Proportion Percent
1 15 (15/60)=0.250 25.0
2 25 (25/60)=0.417 41.7
3 20 (20/60)=0.333 33.3
Mean 90.41666667
Standard Error 3.902649518
Median 84
Mode 84
Standard Deviation 30.22979318
Sample Variance 913.8403955
Kurtosis -1.183899591
Skewness 0.389872725
Range 95
Minimum 48
Maximum 143
Sum 5425
Count 60
Graphical Presentation Numerical
Variable
Box-Plot: Describes the five-number summary
Box Plot
Numerical Presentation
A fundamental concept in summary statistics is that of a central value for a set of
observations and the extent to which the central value characterizes the whole
set of data. Measures of central value such as the mean or median must be
coupled with measures of data dispersion (e.g., average distance from the
mean) to indicate how well the central value characterizes the data as a whole.
$\bar{x} = \dfrac{x_1 + x_2 + \dots + x_n}{n} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
Methods of Center Measurement
$S^2 = \dfrac{(x_1 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n-1}$
Variance of 5, 7, 3? Mean is (5+7+3)/3 = 5 and the variance is
$\dfrac{(5-5)^2 + (3-5)^2 + (7-5)^2}{3-1} = 4$
Standard Deviation: Square root of the variance. The standard
deviation of the above example is 2.
Methods of Variability Measurement
Quartiles: Data can be divided into four regions that cover the total
range of observed values. Cut points for these regions are known as
quartiles.
In notation, the q-th quartile of a data set is the ((n+1)/4)q-th observation of the
ordered data, where q is the desired quartile and n is the number of
observations.
The first quartile region (Q1) is the first 25% of the data. The second
(Q2) lies between the 25th and 50th percentage points in the data; the
upper bound of Q2 is the median. The third (Q3) is the 25% of
the data lying between the median and the 75% cut point in the data.
In the following example (n = 15), Q1 = ((15+1)/4)1 = 4th observation of the data. The
4th observation is 11, so Q1 of this data is 11.
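The position rule ((n+1)/4)q can be sketched as a small function. The 15-value sample below is hypothetical (the slide's own data are not reproduced here); it is constructed so that its 4th ordered value is 11. Linear interpolation for non-integer positions is an assumption, since the slide only shows an integer position.

```python
def quartile(data, q):
    """Return the q-th quartile (q = 1, 2, 3) using the position q * (n + 1) / 4.

    Positions are 1-based in the ordered data. Interpolating between
    neighbours for fractional positions is an assumption of this sketch.
    """
    ordered = sorted(data)
    n = len(ordered)
    position = q * (n + 1) / 4
    lower = int(position)            # whole part of the position
    fraction = position - lower      # fractional part
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


# Hypothetical ordered sample of n = 15 observations.
sample = [2, 5, 8, 11, 13, 14, 17, 20, 21, 24, 26, 29, 30, 33, 35]

print(quartile(sample, 1), quartile(sample, 2), quartile(sample, 3))
```

With n = 15 the positions are 4, 8 and 12, so no interpolation is needed; Q2 is the median of the sample.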
Box Plot: A box plot is a graph of the five number summary. The
central box spans the quartiles. A line within the box marks the
median. Lines extending above and below the box mark the
smallest and the largest observations (i.e., the range). Outlying
samples may be additionally plotted outside the range.
Boxplot
Distribution of Age in Month
Choosing a Summary
The five number summary is usually better than the mean and standard
deviation for describing a skewed distribution or a distribution with
extreme outliers. The mean and standard deviation are reasonable for
symmetric distributions that are free of outliers.
In real life we can't always expect symmetry of the data. It's common
practice to report the number of observations (n), mean, median, standard
deviation, and range when summarizing data. We
can include other summary statistics, such as Q1, Q3 or the coefficient of variation,
if they are considered important for describing the data.
Shape of Data
Shape of data is measured by
Skewness
Kurtosis
Skewness
Measures asymmetry of data
Positive or right skewed: Longer right tail
Negative or left skewed: Longer left tail
Summary of the Variable Age in the given data set
[Figure: histogram of Age in Month (x-axis) against Number of Subjects (y-axis)]
Median 84
Mode 84
Standard Deviation 30.22979318
Sample Variance 913.8403955
Kurtosis -1.183899591
Skewness 0.389872725
Range 95
Minimum 48
Maximum 143
Sum 5425
Count 60
Class Summary (First Part)
So far we have learned-
Any questions ?
Brief concept of Statistical Software
http://www.galaxy.gmu.edu/papers/astr1.html
http://ourworld.compuserve.com/homepages/Rainer_Wuerlaender/statsoft.htm#archiv
http://www.R-project.org
Microsoft Excel
A Spreadsheet Application. It features calculation, graphing tools, pivot
tables and a macro programming language called VBA (Visual Basic for
Applications).
There are many versions of MS-Excel. Excel XP, Excel 2003, Excel 2007
are capable of performing a number of statistical analyses.
Worksheet: Consists of a grid of cells with numbered rows down the page
and alphabetically titled columns across the page. Each cell is referenced by its
coordinates. For example, A3 refers to the cell in column A and row 3;
B10:B20 refers to the range of cells in column B, rows 10 through 20.
Microsoft Excel
Opening a document: File > Open (from an existing workbook). Change the
directory area or drive to look for files in other locations.
Creating a new workbook: File > New > Blank Document
Saving a File: File > Save
Selecting more than one cell: Click on a cell (e.g. A1), then hold the Shift key and
click on another (e.g. D4) to select the cells between A1 and D4, or click on a cell
and drag the mouse across the desired range.
Creating Formulas: 1. Click the cell in which you want to enter the formula,
2. Type = (an equal sign), 3. Click the Function button (fx), 4. Select the
formula you want and step through the on-screen instructions.
Microsoft Excel
Entering Date and Time: Dates are stored as MM/DD/YYYY, but there is no need to enter
them in that format. For example, Excel will recognize jan 9 or jan-9 as 1/9/2007 and
jan 9, 1999 as 1/9/1999. To enter today's date, press Ctrl and ; together. Use a
or p to indicate am or pm. For example, 8:30 p is interpreted as 8:30 pm. To
enter the current time, press Ctrl and : together.
Copy and Paste all cells in a Sheet: Ctrl+A for selecting, Ctrl +C for copying
and Ctrl+V for Pasting.
EDIT: used to copy and paste data values; to find data in a
file; and to insert variables and cases. OPTIONS allows the user to set
general preferences as well as the setup for the Navigator, Charts,
etc.
VIEW: user can change toolbars; value labels can be seen in cells
instead of data values
EDIT: undo and redo a pivot, select a table or table body (e.g., to
change the font)
Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to
its new location
Customize a toolbar
You will now have an SPSS data file containing the former tab-delimited data. You
simply need to add variable and value labels and define missing values.
1. Open the data file of interest (from the menus, click on FILE > OPEN > DATA).
DESCRIPTIVES
PASTE SPECIAL
4. Select Formatted Text (RTF) and then click on OK
5. Enlarge the graph to a desired size by dragging one or more of the black squares
along the perimeter (if the black squares are not visible, click once on the graph).
Statistical Package
for the Social Sciences (SPSS)
BASIC STATISTICAL PROCEDURES: CROSSTABS
1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS >
CROSSTABS.
2. The CROSSTABS Dialog Box will then open.
3. From the variable selection box on the left click on a variable you wish to
designate as the Row variable. The values (codes) for the Row variable make up
the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next,
click on a different variable you wish to designate as the Column variable. The
values (codes) for the Column variable make up the columns of the crosstabs
table. Click on the arrow (>) button for Column(s).
4. You can specify more than one variable in the Row(s) and/or Column(s). A cross
table will be generated for each combination of Row and Column variables.
Statistical Package
for the Social Sciences (SPSS)
Limitations: SPSS users have less control over data manipulation and
statistical output than users of other statistical packages such as SAS or Stata.
Overview
1. Terminology
2. Frequency Distributions/Histograms
3. Measures of data location
4. Measures of data spread
5. Box-plots
6. Scatter-plots
7. Clustering (Multivariate Data)
Lecture 1 Lecture 2 Lecture 3
Lecture 4 Statistical Inference
Lecture 1 Lecture 2 Lecture 3
Lecture 4 Sample Inferences
1. Two-Sample Inferences
Paired t-test
Two-sample t-test
2. Inferences for more than two samples
One-way ANOVA
Two-way ANOVA
Interactions in Two-way ANOVA
3. DataDesk demo
Lecture 1 Lecture 2 Lecture 3
Lecture 4
1. Regression
2. Correlation
3. Multiple Regression
4. ANCOVA
5. Normality Checks
6. Non-parametrics
7. Sample Size Calculations
8. Useful tools and websites
FIRST, A REALLY USEFUL SITE
Explanations of outputs
Videos with commentary
Help with deciding what test
to use with what data
1. Terminology
Populations & Samples
Population: the complete set of
individuals, objects or scores of interest.
Often too large to sample in its entirety
It may be real or hypothetical (e.g. the results
from an experiment repeated ad infinitum)
Descriptive Statistics:
Quantities and techniques used to
describe a sample characteristic or
2. Frequency Distributions
An (Empirical) Frequency Distribution
or Histogram for a continuous variable
presents the counts of observations
grouped within pre-specified classes or
groups
Serum CK Data for 36 male
volunteers
[Figure: frequency histogram of the serum CK data]
Relative Frequency Distribution
[Figure: relative frequency histogram of CK concentration (U/l), with the mode, the left tail and the (skewed) right tail marked]
Shaded area is the percentage of males with CK values between 60 and 100 U/l, i.e. 42%.
The Mean
Example
Example 2: The systolic blood pressures
of seven middle-aged men were as
follows:
151, 124, 132, 170, 146, 124 and 113.
$\bar{x} = \dfrac{151 + 124 + 132 + 170 + 146 + 124 + 113}{7} = 137.14$
The mean is 137.14
The Median and Mode
If the sample data are arranged in
increasing order, the median is
(i) the middle value if n is an odd
number, or
(ii) midway between the two middle
values if n is an even number
The mode is the most commonly
occurring value.
Example 1 n is odd
The reordered systolic blood pressure data seen
earlier are:
113, 124, 124, 132, 146, 151, 170
The median is the middle (4th) value, 132.
Example 2 n is even
Two men have the same cholesterol level- the Mode is 274.
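The three measures of location can be compared on the systolic blood pressure data given earlier. This sketch uses Python's standard `statistics` module; the four-value list used for the even-n case is hypothetical, since the cholesterol data are not reproduced here.

```python
import statistics

# Systolic blood pressures of the seven men from the earlier example.
bp = [151, 124, 132, 170, 146, 124, 113]

bp_mean = statistics.mean(bp)        # arithmetic mean
bp_median = statistics.median(bp)    # middle value of the ordered data (n is odd)
bp_mode = statistics.mode(bp)        # most commonly occurring value

# Hypothetical even-sized sample: the median lies midway between the two middle values.
even_sample = [1, 2, 3, 4]
even_median = statistics.median(even_sample)

print(round(bp_mean, 2), bp_median, bp_mode, even_median)
```

Here the median (132) is lower than the mean (137.14) because the single large value 170 pulls the mean upward.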
Mean versus Median
4. Measures of Dispersion
Range
the sample Range is the difference
between the largest and smallest
observations in the sample
easy to calculate;
Blood pressure example: min=113
and max=170, so the range=57
mmHg
useful for best or worst case
scenarios
sensitive to extreme values
Sample Variance
The sample variance, s², is (approximately) the
arithmetic mean of the squared
deviations from the sample mean, with divisor n − 1 rather than n:
$s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
Standard Deviation
The sample standard deviation, s, is
the square-root of the variance
$s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
$\sum_{i=1}^{7} (x_i - \bar{x})^2 = 2304.86$
Therefore,
$s = \sqrt{\dfrac{2304.86}{7-1}} = 19.6$
Coefficient of Variation
The coefficient of variation (CV) or
relative standard deviation (RSD) is the
sample standard deviation expressed as a
percentage of the mean, i.e.
$CV = \dfrac{s}{\bar{x}} \times 100\%$
The CV is not affected by multiplicative
changes in scale
Consequently, it is a useful way of comparing the
dispersion of variables measured on different
scales
Example
The CV of the blood pressure data is:
$CV = \dfrac{19.6}{137.1} \times 100\% = 14.3\%$
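The variance, standard deviation and CV of the blood pressure data can be reproduced as follows. This is a sketch with illustrative function names; the conversion factor used to demonstrate scale invariance is arbitrary.

```python
import math

bp = [151, 124, 132, 170, 146, 124, 113]   # systolic blood pressures (mmHg)


def sample_sd(data):
    """Sample standard deviation with divisor n - 1."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))


def coefficient_of_variation(data):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * sample_sd(data) / (sum(data) / len(data))


mean = sum(bp) / len(bp)
sum_sq = sum((x - mean) ** 2 for x in bp)   # sum of squared deviations
sd = sample_sd(bp)
cv = coefficient_of_variation(bp)

# Multiplying every value by a constant changes s but not the CV.
rescaled = [x * 0.5 for x in bp]            # arbitrary change of scale
cv_rescaled = coefficient_of_variation(rescaled)

print(round(sum_sq, 2), round(sd, 1), round(cv, 1), round(cv_rescaled, 1))
```

The rescaled data have half the standard deviation but the same CV, which is what makes the CV useful for comparing variables measured on different scales.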
Inter-quartile range
The Median divides a distribution into two
halves.
Q1 Q3
5. Box-plots
A box-plot is a visual description of
the distribution based on
Minimum
Q1
Median
Q3
Maximum
Useful for comparing large sets of
data
Example 1
The pulse rates of 12 individuals
arranged in increasing order are:
62, 64, 68, 70, 70, 74, 74, 76, 76, 78,
78, 80
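The five numbers behind a box-plot of these pulse rates can be computed as below. The slide does not say which quartile convention it uses; this sketch assumes the default "exclusive" method of Python's `statistics.quantiles`, so Q1 and Q3 may differ slightly from values obtained with other conventions.

```python
import statistics

# Pulse rates of the 12 individuals, already in increasing order.
pulse = [62, 64, 68, 70, 70, 74, 74, 76, 76, 78, 78, 80]

# quantiles(n=4) returns the three cut points Q1, median, Q3.
q1, median, q3 = statistics.quantiles(pulse, n=4)

five_number_summary = (min(pulse), q1, median, q3, max(pulse))
iqr = q3 - q1                      # inter-quartile range: width of the box

print(five_number_summary, iqr)
```

The box of the plot would span Q1 to Q3 with a line at the median, and the lines would extend to the minimum and maximum.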
Example 2: Box-plots of intensities
from 11 gene expression arrays
[Figure: box-plots of the 11 arrays, with outliers marked]
6. Scatter-plot
Example 2: Up-regulation/Down-regulation
of gene expression across an array
(Control Cy5 versus Disease Cy3)
Example of a Scatter-plot matrix
(multiple pair-wise plots)
Other graphical representations
Dot-Plots, Stem-and-leaf plots
Not visually appealing
Pie-chart
Visually appealing, but hard to compare two
datasets. Best for 3 to 7 categories. A total must be
specified.
Violin-plots
=boxplot+smooth density
Nice visual of data shape
Multivariate Data
Clustering is useful for visualising
multivariate data and uncovering patterns,
often reducing its complexity
UPGMA
Unweighted Pair-Group Method Average
Most commonly used clustering method
Procedure:
1. Each observation forms its own cluster
2. The two with minimum distance are grouped into
a single cluster representing a new observation-
take their average
3. Repeat 2. until all data points form a single
cluster
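The procedure above can be sketched in a few lines. This follows the simplified description on the slide (merge the closest pair and replace it by the average of the two profiles); library implementations of UPGMA instead average all pairwise distances between cluster members, so results can differ on other data. The data are the five-gene contrived example from this section, and the function names are illustrative.

```python
import math
from itertools import combinations

# Expression values (Array1, Array2, Array3) from the contrived example.
genes = {
    "p53": (9, 3, 7),
    "mdm2": (10, 2, 9),
    "bcl2": (1, 9, 4),
    "cyclinE": (6, 5, 5),
    "caspase8": (1, 10, 3),
}


def euclidean(a, b):
    """Straight-line distance between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster(points):
    """Repeatedly merge the two closest clusters; return the merges in order."""
    clusters = dict(points)           # step 1: each observation is its own cluster
    merges = []
    while len(clusters) > 1:          # step 3: repeat until one cluster remains
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: euclidean(clusters[pair[0]], clusters[pair[1]]),
        )
        distance = euclidean(clusters[a], clusters[b])
        # step 2: the merged cluster is represented by the average profile
        merged = tuple((x + y) / 2 for x, y in zip(clusters[a], clusters[b]))
        del clusters[a], clusters[b]
        clusters["{" + a + " & " + b + "}"] = merged
        merges.append((a, b, round(distance, 2)))
    return merges


merges = cluster(genes)
for a, b, d in merges:
    print(a, "+", b, "at distance", d)
```

The order and heights of the merges are exactly what a dendrogram displays.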
Contrived Example
5 genes of interest on 3 replicate arrays/gels
          Array1  Array2  Array3
p53       9       3       7
mdm2      10      2       9
bcl2      1       9       4
cyclinE   6       5       5
caspase 8 1       10      3
$d_{xy} = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2}$
Example
Construct a distance matrix of all pair-wise
distances
p53 mdm2 bcl2 cyclinE caspase 8
                      p53   mdm2   cyclin E   {caspase-8 & bcl-2}
p53                   0     2.5    4.12       10.9
mdm2                        0      6.4        12.9
cyclin E                           0          6.9
{caspase-8 & bcl-2}                           0
Example (contd)
Example of a gene expression
dendrogram
Variety of approaches to clustering
Clustering techniques
agglomerative -start with every element in its own
cluster, and iteratively join clusters together
divisive - start with one cluster and iteratively divide it
into smaller clusters
Distance Metrics
Euclidean (as-the-crow-flies)
Manhattan
Minkowski (a whole class of metrics)
Correlation (similarity in profiles: called similarity
metrics)
Linkage Rules
average: Use the mean distance between cluster
members
single: Use the minimum distance (gives loose clusters)
complete: Use the maximum distance (gives tight
clusters)
median: Use the median distance
centroid: Use the distance between the average
Clustering Summary
The clusters & tree topology often depend
highly on the distance measure and linkage
method used
What is Statistics?
Statistics is a way to get information
from data
Data -> Statistics -> Information
Nominal Data
Nominal Data
The values of nominal data are categories.
E.g. responses to questions about marital status,
coded as:
Single = 1, Married = 2, Divorced = 3, Widowed = 4
Ordinal Data
Ordinal Data appear to be categorical in nature, but their
values have an order; a ranking to them:
Graphical & Tabular Techniques for Nominal
Data
Nominal Data (Tabular
Summary)
Nominal Data (Frequency)
Graphical Techniques for Interval
Data
There are several graphical methods that are
used when the data are interval (i.e. numeric,
non-categorical).
Building a Histogram
1) Collect the Data
2) Create a frequency distribution for
the data.
3) Draw the Histogram.
Histogram and Stem &
Leaf
Ogive
Cumulative Relative
Frequencies
first class
next class: .355+.185=.540
:
:
last class: .930+.070=1.00
Ogive
The ogive can be used
to answer questions
like:
around $35
(Refer also to Fig. 2.13 in your textbook)
Scatter Diagram
Example 2.9 A real estate agent wanted
to know to what extent the selling price
of a home is related to its size
1.217
Scatter Diagram
It appears that in fact there is a
relationship, that is, the greater the
house size the greater the selling
price
Patterns of Scatter
Diagrams
Linearity and Direction are two
concepts we are interested in
Numerical Descriptive
Techniques
Measures of Central Location
Mean, Median, Mode
Measures of Variability
Range, Standard Deviation, Variance, Coefficient of
Variation
Measures of Central
Location
The arithmetic mean, a.k.a.
average, shortened to mean, is the
most popular & useful measure of
central location.
It is computed by simply adding up
all the observations and dividing by
the total number of observations:
Mean = Sum of the observations / Number of observations
Arithmetic Mean
Sample Mean
Population Mean
Statistics is a pattern
language
            Population   Sample
Size        N            n
Mean        μ            x̄
The Arithmetic Mean
is appropriate for describing
measurement data, e.g. heights of
people, marks of student papers, etc.
Measures of Variability
Measures of central location fail to
tell the whole story about the
distribution; that is, how much are
the observations spread out around
the mean value?
For example, two sets of class grades
are shown. The mean (=50) is the
same in each case.
But, the red class has greater
variability than the blue class.
Range
The range is the simplest measure of variability,
calculated as:
Range = Largest observation - Smallest observation
E.g.
Data: {4, 4, 4, 4, 50} Range = 46
Data: {4, 8, 15, 24, 39, 50} Range = 46
The range is the same in both cases,
but the data sets have very different distributions
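The two data sets above can be compared numerically. This short sketch computes the range of each and, as an additional illustration not taken from the slide, the median, which shows how differently the values are distributed between the same two extremes.

```python
import statistics

data_a = [4, 4, 4, 4, 50]
data_b = [4, 8, 15, 24, 39, 50]


def value_range(data):
    """Largest observation minus smallest observation."""
    return max(data) - min(data)


range_a = value_range(data_a)
range_b = value_range(data_b)
median_a = statistics.median(data_a)
median_b = statistics.median(data_b)

print(range_a, range_b, median_a, median_b)
```

Because the range depends only on the two extreme observations, it says nothing about how the remaining values are spread.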
Statistics is a pattern
language
            Population   Sample
Size        N            n
Mean        μ            x̄
Variance    σ²           s²
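Variance
The variance of a population is:
$\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, where μ is the population mean and N is the population size.
The variance of a sample is:
$s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$, where x̄ is the sample mean.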
Application
Example 4.7. The following sample consists of the
number of jobs six randomly selected students
applied for: 17, 15, 23, 7, 9, 13.
Find its mean and variance.
Sample Variance
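A quick check of Example 4.7 using Python's standard `statistics` module. The population variance is computed as well, purely to show the effect of dividing by n instead of n - 1; the example itself asks only for the sample quantities.

```python
import statistics

# Number of jobs applied for by the six sampled students.
jobs = [17, 15, 23, 7, 9, 13]

sample_mean = statistics.mean(jobs)
sample_variance = statistics.variance(jobs)        # divisor n - 1
population_variance = statistics.pvariance(jobs)   # divisor n, for comparison only

print(sample_mean, sample_variance, round(population_variance, 2))
```

The squared deviations from the mean of 14 sum to 166, so the sample variance is 166 / 5 = 33.2.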
Standard Deviation
The standard deviation is simply the
square root of the variance, thus:
Standard Deviation
Consider Example 4.8 where a golf
club manufacturer has designed a
new club and wants to determine if it
is hit more consistently (i.e. with less
variability) than with an old club.
Using Tools > Data Analysis >
[may need to add in
Chebysheff's Theorem
Not often used because the interval
is very wide.
Wendy's service
time is shortest and
least variable.
Methods of Collecting
Data
There are many methods used to
collect or obtain data for statistical
analysis. Three of the most popular
methods are:
Direct Observation
Experiments, and
Surveys.
Sampling
Recall that statistical inference permits us to draw
conclusions about a population based on a sample.
Sampling Plans
A sampling plan is just a method or
procedure for specifying how a sample will be
taken from a population.
Stratified Random
Sampling
After the population has been
stratified, we can use simple
random sampling to generate the
complete sample:
Sampling Error
Sampling error refers to differences between the
sample and the population that exist only because of the
observations that happened to be selected for the
sample.
Nonsampling Error
Nonsampling errors are more serious and are
due to mistakes made in the acquisition of data or
due to the sample observations being selected
improperly. Three types of nonsampling errors:
Approaches to Assigning
Probabilities
There are three ways to assign a probability, P(Oi),
to an outcome, Oi, namely:
Interpreting Probability
One way to interpret probability is this:
Conditional Probability
Conditional probability is used to
determine how two events are
related; that is, we can determine
the probability of one event given
the occurrence of another related
event.
Complement Rule
The complement of an event A is the event that occurs
when A does not occur.
$P(A^C) = 1 - P(A)$
Multiplication Rule
The multiplication rule is used to
calculate the joint probability of
two events. It is based on the
formula for conditional probability
defined earlier:
$P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)}$
If we multiply both sides of the equation by P(B) we have:
$P(A \text{ and } B) = P(A \mid B)\,P(B)$
Addition Rule
Recall: the addition rule was introduced
earlier to provide a way to compute the
probability of event A or B or both A and B
occurring; i.e. the union of A and B.
Addition Rule for Mutually Exclusive
Events
If A and B are mutually exclusive, the occurrence of
one event makes the other one impossible. This means
that
P(A and B) = 0
and the addition rule reduces to P(A or B) = P(A) + P(B).
Two Types of Random
Variables
Discrete Random Variable
one that takes on a countable number of values
E.g. values on the roll of dice: 2, 3, 4, ..., 12
Analogy:
Integers are Discrete, while Real Numbers are
Continuous
Laws of Expected Value
1. E(c) = c
The expected value of a constant (c) is just
the value of the constant.
2. E(X + c) = E(X) + c
3. E(cX) = cE(X)
We can pull a constant out of the
expected value expression (either as part of
a sum with a random variable X or as a
coefficient of random variable X).
Laws of Variance
1. V(c) = 0
The variance of a constant (c) is zero.
2. V(X + c) = V(X)
The variance of a random variable and a constant is
just the variance of the random variable (per 1 above).
3. V(cX) = c²V(X)
The variance of a random variable and a constant
coefficient is the coefficient squared times the variance
of the random variable.
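The laws of expected value and variance can be verified numerically on a small discrete distribution. The sketch below assumes a fair six-sided die and the constant c = 3, both chosen only for illustration; exact fractions are used so the comparisons are not affected by rounding.

```python
from fractions import Fraction


def expected_value(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """V(X) = E[(X - E(X))^2]."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def transform(dist, scale=1, shift=0):
    """Distribution of scale * X + shift (scale must be non-zero)."""
    return {scale * x + shift: p for x, p in dist.items()}


# Illustrative assumptions: a fair die and the constant c = 3.
die = {face: Fraction(1, 6) for face in range(1, 7)}
c = 3

e_x = expected_value(die)
v_x = variance(die)

print(e_x, v_x)
print(expected_value(transform(die, shift=c)), expected_value(transform(die, scale=c)))
print(variance(transform(die, shift=c)), variance(transform(die, scale=c)))
```

Adding a constant shifts the mean but leaves the variance unchanged, while multiplying by c scales the mean by c and the variance by c².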
Binomial Distribution
The binomial distribution is the probability
distribution that results from doing a binomial
experiment. Binomial experiments have the
following properties:
Binomial Random Variable
The binomial random variable
counts the number of successes in n
trials of the binomial experiment. It
can take on values 0, 1, 2, ..., n.
Thus, it's a discrete random variable.
P(X ≤ 4) = .967
Binomial Table
What is the probability that Pat gets
two answers correct?
i.e. what is P(X = 2), given
P(success) = .20 and n=10 ?
P(X=2)=.3020
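The table value can be reproduced from the binomial probability formula. This is a minimal sketch with illustrative function names, using the values n = 10 and P(success) = .20 from the example.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k): cumulative probability of at most k successes."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


n, p = 10, 0.20

p_exactly_2 = binomial_pmf(2, n, p)
p_at_most_4 = binomial_cdf(4, n, p)

print(round(p_exactly_2, 4), round(p_at_most_4, 4))
```

A binomial table usually lists cumulative probabilities, so P(X = 2) is obtained there as P(X ≤ 2) - P(X ≤ 1); the function above computes it directly.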
=BINOMDIST() Excel
Function
There is a binomial distribution
function in Excel that can also be
used to calculate these probabilities.
# successes
P(X≤4)=.9672
Binomial Distribution
As you might expect, statisticians
have developed general formulas for
the mean, variance, and standard
deviation of a binomial random
variable. They are:
$\mu = np$
$\sigma^2 = np(1-p)$
$\sigma = \sqrt{np(1-p)}$
Poisson Distribution
Named for Simeon Poisson, the Poisson
distribution is a discrete probability distribution
and refers to the number of events (a.k.a.
successes) within a specific time period or region
of space. For example:
The number of cars arriving at a service station in 1
hour. (The interval of time is 1 hour.)
The number of flaws in a bolt of cloth. (The specific
region is a bolt of cloth.)
The number of accidents in 1 day on a particular
stretch of highway. (The interval is defined by both time,
1 day, and space, the particular stretch of highway.)
1.263
The Poisson Experiment
Like a binomial experiment, a Poisson experiment
has four defining characteristic properties:
1. The number of successes that occur in any interval is
independent of the number of successes that occur
in any other interval.
2. The probability of a success in an interval is the
same for all equal-size intervals
3. The probability of a success is proportional to the
size of the interval.
4. The probability of more than one success in an
interval approaches 0 as the interval becomes
smaller.
1.264
Poisson Distribution
The Poisson random variable is the number
of successes that occur in a period of time or
an interval of space in a Poisson experiment.
1.265
Poisson Probability
Distribution
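The probability that a Poisson random
variable assumes a value of x is given by:
$P(x) = \dfrac{e^{-\mu}\mu^{x}}{x!}$ for $x = 0, 1, 2, \ldots$, where $\mu$ is the mean number of successes in the interval.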
FYI:
1.266
Example 7.12
The number of typographical errors in new
editions of textbooks varies considerably
from book to book. After some analysis, an instructor
concludes that the number of errors is
Poisson distributed with a mean of 1.5 per
100 pages. The instructor randomly selects
100 pages of a new book. What is the
probability that there are no typos?
1.268
Example 7.13
For a 400-page book, what is the
probability that there are
no typos?
The mean is now μ = 1.5 × 4 = 6 typos per 400 pages, so
P(X=0) = $\dfrac{e^{-6}6^{0}}{0!}$ = .002479
There is a very small chance that there are no typos.
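Both typo questions can be evaluated from the Poisson formula. The sketch below assumes the mean scales with the number of pages (1.5 per 100 pages, so 6 per 400 pages); `poisson_pmf` is an illustrative helper name.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu**x / factorial(x)

p_none_100_pages = poisson_pmf(0, 1.5)      # mean of 1.5 typos per 100 pages
p_none_400_pages = poisson_pmf(0, 1.5 * 4)  # mean of 6 typos per 400 pages
print(round(p_none_100_pages, 4), round(p_none_400_pages, 6))
```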
1.269
Example 7.13
Excel is an even better alternative:
1.270
Probability Density
Functions
Unlike a discrete random variable which
we studied in Chapter 7, a continuous
random variable is one that can assume
an uncountable number of values.
We cannot list the possible values
because there is an infinite number of
them.
Because there is an infinite number of
values, the probability of each individual
value is 0.
1.271
Point Probabilities are Zero
Because there is an infinite number of values, the
probability of each individual value is 0. We can only
determine the probability of a range of values.
1.272
Probability Density
Function
A function f(x) is called a probability density
function (over the range a ≤ x ≤ b) if it meets
the following requirements:
1) f(x) ≥ 0 for all x between a and b, and
2) The total area under the curve between a and b is
1.0
1.273
The Normal Distribution
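The normal distribution is the most important of
all probability distributions. The probability density
function of a normal random variable is given by:
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$, $-\infty < x < \infty$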
1.274
The Normal Distribution
Important things to note:
The normal distribution is fully defined by two parameters:
its standard deviation and mean
1.276
Calculating Normal
Probabilities
We can use the following function to
convert any normal random variable
to a standard normal random
variable:
$Z = \dfrac{X - \mu}{\sigma}$
Some advice:
always draw a
picture!
1.277
Calculating Normal
Probabilities
Example: The time required to build a computer is
normally distributed with a mean of 50 minutes
and a standard deviation of 10 minutes:
1.278
Calculating Normal
Probabilities
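With a mean of 50 minutes and a
standard deviation of 10 minutes, what is the probability that
a computer is built in between 45 and 60 minutes?
P(45 < X < 60) = P(-0.5 < Z < 1.0)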
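As a cross-check on the table lookup, the probability can be computed with Python's standard library. This is a sketch, assuming the stated mean of 50 and standard deviation of 10; the variable names are illustrative.

```python
from statistics import NormalDist

build_time = NormalDist(mu=50, sigma=10)

# Standardize the endpoints: z = (x - mu) / sigma.
z_low = (45 - build_time.mean) / build_time.stdev
z_high = (60 - build_time.mean) / build_time.stdev

# P(45 < X < 60) is the difference of two cumulative probabilities.
p_between = build_time.cdf(60) - build_time.cdf(45)
print(z_low, z_high, round(p_between, 4))
```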
1.279
Calculating Normal
Probabilities
We can use Table 3 in
Appendix B to look-up
probabilities P(0 < Z < z)
1.280
Calculating Normal
Probabilities
How to use Table 3
This table gives probabilities P(0 < Z < z)
First column = integer + first decimal
Top row = second decimal place
1.281
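Using the Normal Table (Table 3)
What is P(Z > 1.6)?
P(0 < Z < 1.6) = .4452, so P(Z > 1.6) = .5 - .4452 = .0548
(Figures: standard normal curves with areas marked at z = 1.6; z = -2.23 and 2.23; z = 1.52; z = 0.9 and 1.9)
P(0.9 < Z < 1.9) = P(0 < Z < 1.9) - P(0 < Z < 0.9)
= .4713 - .3159
= .1554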
1.285
Finding Values of Z
1.286
Using the values of Z
Similarly
P(-1.645 < Z < 1.645) = .90
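Values of Z for a given tail area can be found by inverting the standard normal cumulative distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_upper` is hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_upper(area):
    """Return z_A, the value with area A to its right under the curve."""
    return STANDARD_NORMAL.inv_cdf(1 - area)

z_05 = z_upper(0.05)
z_025 = z_upper(0.025)
central_area = STANDARD_NORMAL.cdf(1.645) - STANDARD_NORMAL.cdf(-1.645)
print(round(z_05, 3), round(z_025, 2), round(central_area, 2))
```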
1.287
Other Continuous
Distributions
Three other important continuous
distributions which will be used
extensively in later sections are
introduced here:
Student t Distribution,
Chi-Squared Distribution, and
F Distribution.
1.288
Student t Distribution
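Here the letter t is used to represent the random
variable, hence the name. The density function
for the Student t distribution is as follows:
$f(t) = \dfrac{\Gamma[(\nu + 1)/2]}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\left[1 + \dfrac{t^2}{\nu}\right]^{-(\nu + 1)/2}$
where ν (nu) is the number of degrees of freedom.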
1.289
Student t Distribution
In much the same way that μ and σ define the normal
distribution, ν, the degrees of freedom, defines the
Student t distribution:
Figure 8.24
As the number of degrees of freedom increases, the t
distribution approaches the standard normal distribution.
1.290
Determining Student t
Values
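The Student t distribution is used extensively in
statistical inference. Table 4 in Appendix B lists values of
$t_{A,\nu}$, the value of t with ν degrees of freedom such that the area to its right under the curve is A.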
1.291
Using the t table (Table 4) for
values
For example, if we want the value of
t with 10 degrees of freedom such
that the area under the Student t
curve to its right is .05:
Area under the curve value ($t_A$) : COLUMN
Degrees of Freedom : ROW
$t_{.05,10} = 1.812$
1.292
F Distribution
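The F density function is given by:
$f(F) = \dfrac{\Gamma\left(\frac{\nu_1 + \nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)}\left(\dfrac{\nu_1}{\nu_2}\right)^{\nu_1/2}\dfrac{F^{(\nu_1 - 2)/2}}{\left(1 + \frac{\nu_1 F}{\nu_2}\right)^{(\nu_1 + \nu_2)/2}}$, $F > 0$
where $\nu_1$ is the numerator degrees of freedom and $\nu_2$ is the denominator degrees of freedom.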
1.293
Determining Values of F
For example, what is the value of F
for 5% of the area under the right-hand
tail of the curve, with a
numerator degree of freedom of 3
and a denominator degree of
freedom of 7?
Solution: use the F look-up (Table 6)
$F_{.05,3,7} = 4.35$
There are different tables
for different values of A.
Make sure you start with
the correct table!!
Denominator Degrees of Freedom : ROW
Numerator Degrees of Freedom : COLUMN
1.294
Determining Values of F
For areas under the curve on the left-hand
side of the curve, we can
leverage the following relationship:
$F_{1-A,\nu_1,\nu_2} = \dfrac{1}{F_{A,\nu_2,\nu_1}}$
1.295
Chapter 9
Sampling Distributions
1.296
Sampling Distribution of the
Mean
A fair die is thrown infinitely many times,
with the random variable X = # of spots on
any throw.
The probability distribution of X is:
x      1    2    3    4    5    6
P(x)  1/6  1/6  1/6  1/6  1/6  1/6
1.298
Sampling Distribution of Two Dice
The sampling distribution of $\bar{x}$ (the mean of two dice) is
shown below:
$\bar{x}$    P($\bar{x}$)
1.0  1/36
1.5  2/36
2.0  3/36
2.5  4/36
3.0  5/36
3.5  6/36
4.0  5/36
4.5  4/36
5.0  3/36
5.5  2/36
6.0  1/36
(Figure: bar chart of P($\bar{x}$) against $\bar{x}$ = 1.0, 1.5, ..., 6.0, peaking at 6/36 for $\bar{x}$ = 3.5)
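The table for two dice can be rebuilt by listing all 36 equally likely outcomes. This is a minimal sketch using exact fractions; the function name `two_dice_mean_distribution` is illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def two_dice_mean_distribution():
    """Map each possible sample mean of two fair dice to its probability."""
    counts = Counter(Fraction(a + b, 2)
                     for a, b in product(range(1, 7), repeat=2))
    return {mean: Fraction(count, 36) for mean, count in sorted(counts.items())}

dist = two_dice_mean_distribution()
mean_of_means = sum(m * p for m, p in dist.items())
var_of_means = sum((m - mean_of_means) ** 2 * p for m, p in dist.items())

for m, p in dist.items():
    print(float(m), p)
print(mean_of_means, var_of_means)
```

The mean of the sample means equals the die's mean of 3.5, and the variance is half the single-die variance of 35/12, which is the σ²/n pattern for n = 2.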
1.299
Compare
Compare the distribution of X with the sampling distribution of $\bar{x}$
(Figure: bar charts of P(x) over x = 1, 2, ..., 6 and of P($\bar{x}$) over $\bar{x}$ = 1.0, 1.5, ..., 6.0)
1.300
Central Limit Theorem
The sampling distribution of the
mean of a random sample drawn
from any population is
approximately normal for a
sufficiently large sample size.
1.302
Sampling Distribution of the Sample
Mean
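1. $\mu_{\bar{x}} = \mu$
2. $\sigma^2_{\bar{x}} = \sigma^2/n$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
3. If X is normal, $\bar{X}$ is normal. If X is
nonnormal, $\bar{X}$ is approximately normal for
sufficiently large sample sizes.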
Note: the definition of sufficiently large
depends on the extent of nonnormality of x
(e.g. heavily skewed; multimodal)
1.303
Example 9.1(a)
The foreman of a bottling plant has
observed that the amount of soda in each
32-ounce bottle is actually a normally
distributed random variable, with a mean
of 32.2 ounces and a standard deviation of
.3 ounce.
1.306
Example 9.1(b)
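We want to find $P(\bar{X} > 32)$, where X is normally
distributed
with μ = 32.2 and σ = .3
Things we know:
1) X is normally distributed, therefore so will $\bar{X}$.
2) $\mu_{\bar{x}} = \mu$ = 32.2 oz.
3) $\sigma_{\bar{x}} = \sigma/\sqrt{n} = .3/\sqrt{4} = .15$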
1.307
Example 9.1(b)
If a customer buys a carton of four bottles,
what is the probability that the mean
amount of the four bottles will be greater
than 32 ounces?
Compare the two questions:
What is the probability that one bottle will contain more than 32 ounces?
What is the probability that the mean of four bottles will exceed 32 oz?
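The two questions differ only in the standard deviation used: σ for one bottle and σ/√n for the mean of n bottles. A sketch of both calculations, assuming μ = 32.2 and σ = .3 as stated in the example:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 32.2, 0.3, 4

one_bottle = NormalDist(mu, sigma)              # distribution of X
mean_of_four = NormalDist(mu, sigma / sqrt(n))  # distribution of the sample mean

p_one = 1 - one_bottle.cdf(32)     # P(X > 32)
p_four = 1 - mean_of_four.cdf(32)  # P(sample mean > 32)
print(round(p_one, 4), round(p_four, 4))
```

The sample mean is less variable than a single bottle, so it is more likely to exceed 32 ounces.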
1.309
Sampling Distribution: Difference
of two means
The final sampling distribution introduced is that of the
difference between two sample means. This
requires independent random samples drawn from each of two normal populations.
1.310
Sampling Distribution: Difference
of two means
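The expected value and standard deviation of the
sampling distribution of $\bar{x}_1 - \bar{x}_2$ are given by:
mean: $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
standard deviation: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$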
1.311
Estimation
There are two types of inference: estimation
and hypothesis testing; estimation is
introduced first.
Point Estimator
Interval Estimator
1.313
Point & Interval Estimation
For example, suppose we want to estimate the mean
summer income of a class of business students. For
n = 25 students, the sample mean
is calculated to be $400 per week.
1.314
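Estimating μ when σ is
known
We established in Chapter 9:
$P\left(-z_{\alpha/2} < \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$
Rearranging gives the confidence interval estimator of μ:
$\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
Table 10.1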
1.316
Example 10.1
A computer company samples demand during
lead time over 25 time periods:
235 374 309 499 253
421 361 514 462 369
394 439 348 344 330
261 374 302 466 535
386 316 296 332 334
Example 10.1
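In order to use our confidence interval estimator, we need
the following pieces of data:
$\bar{x}$ = 370.16 (calculated from the data)
$z_{\alpha/2}$ = 1.96, σ = 75, n = 25 (given)
therefore:
$\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n} = 370.16 \pm 1.96(75/\sqrt{25}) = 370.16 \pm 29.40$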
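The interval can be recomputed from the 25 demand values. This sketch assumes σ = 75 and z = 1.96 as given in the example; `z_interval` is an illustrative helper, not a library function.

```python
from math import sqrt
from statistics import mean

demand = [235, 374, 309, 499, 253,
          421, 361, 514, 462, 369,
          394, 439, 348, 344, 330,
          261, 374, 302, 466, 535,
          386, 316, 296, 332, 334]

def z_interval(data, sigma, z):
    """Confidence interval for the mean when sigma is known."""
    x_bar = mean(data)
    half_width = z * sigma / sqrt(len(data))
    return x_bar - half_width, x_bar + half_width

lcl, ucl = z_interval(demand, sigma=75, z=1.96)
print(round(lcl, 2), round(ucl, 2))
```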
1.318
INTERPRET
Example 10.1
We estimate that the mean demand during lead
time lies between 340.76 and 399.56; we can use
this as input in developing an inventory policy.
1.319
Interval Width
A wide interval provides little information.
For example, suppose we estimate with 95%
confidence that an accountant's average starting
salary is between $15,000 and $100,000.
1.320
Interval Width
The width of the confidence interval
estimate is a function of the
confidence level, the population
standard deviation, and the sample
size
1.321
Selecting the Sample Size
We can control the width of the interval by
determining the sample size necessary to produce
narrow intervals.
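Since the interval estimate is $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, to estimate μ to within W units (W being the desired bound on the error)
It follows that $z_{\alpha/2}\,\sigma/\sqrt{n} = W$
Solve for n to get requisite sample size!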
1.322
Selecting the Sample Size
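Solving the equation $z_{\alpha/2}\,\sigma/\sqrt{n} = W$ for n, where W is the desired bound on the error of estimation, gives
$n = \left(\dfrac{z_{\alpha/2}\,\sigma}{W}\right)^2$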
1.324
Example 10.2
A lumber company must estimate the
mean diameter of trees to determine
whether or not there is sufficient lumber to
harvest an area of forest. They need to
estimate this to within 1 inch at a
confidence level of 99%. The tree
diameters are normally distributed with a
standard deviation of 6 inches.
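With a bound of W = 1 inch, σ = 6 and $z_{.005}$ = 2.575:
$n = \left(\dfrac{z_{\alpha/2}\,\sigma}{W}\right)^2 = \left(\dfrac{(2.575)(6)}{1}\right)^2 = 238.7$, which rounds up to 239.
That is, we will need to sample at
least 239 trees to have a
99% confidence interval of $\bar{x} \pm 1$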
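The sample-size calculation can be sketched as follows. The function name `sample_size` and its parameters are hypothetical; the z value is computed rather than read from a table, so it differs slightly from 2.575 but gives the same rounded result.

```python
from math import ceil
from statistics import NormalDist

def sample_size(sigma, bound, confidence):
    """Smallest n so that the interval half-width is at most `bound`."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    return ceil((z * sigma / bound) ** 2)    # always round up

n_trees = sample_size(sigma=6, bound=1, confidence=0.99)
print(n_trees)
```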
1.327
Nonstatistical Hypothesis Testing
P(Type I error) = α
P(Type II error) = β
1.332
Concepts of Hypothesis Testing (1)
1.333
Concepts of Hypothesis
Testing
Consider Example 10.1 (mean demand for
computers during assembly lead time) again.
Rather than estimate the mean demand, our
operations manager wants to know whether the
mean is different from 350 units. We can
rephrase this request into a test of the hypothesis:
H0: μ = 350
H1: μ ≠ 350
1.334
Concepts of Hypothesis Testing (4)
1.335
Concepts of Hypothesis
Testing
Once the null and alternative hypotheses are stated, the
next step is to randomly sample the population and
calculate a test statistic (in this example, the sample
mean).
1.336
Types of Errors
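A Type I error occurs when we reject a true null
hypothesis (i.e. Reject H0 when it is TRUE)
A Type II error occurs when we do not reject a false null
hypothesis (i.e. do NOT reject H0 when it is FALSE)
                   H0 true     H0 false
Reject H0          Type I      (correct)
Do not reject H0   (correct)   Type II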
1.337
Recap I
1) Two hypotheses: H0 & H1
2) ASSUME H0 is TRUE
3) GOAL: determine if there is enough
evidence to infer that H1 is TRUE
4) Two possible decisions:
Reject H0 in favor of H1
NOT Reject H0 in favor of H1
5) Two possible types of errors:
Type I: reject a true H0 [P(Type I) = α]
Type II: not reject a false H0 [P(Type II) = β]
1.338
Example 11.1
A department store manager determines that a
new billing system will be cost-effective only if
the mean monthly account is more than $170.
1.340
Example 11.1
What we want to show:
H1: μ > 170
H0: μ = 170 (we'll assume this is true)
We know:
n = 400,
$\bar{x}$ = 178, and
σ = 65
1.342
Example 11.1 Rejection
Region
The rejection region is a range of
values such that if the test statistic
falls into that range, we decide to
reject the null hypothesis in favor of
the alternative hypothesis.
1.344
Example 11.1
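At a 5% significance level (i.e. α = 0.05), we get $z_\alpha = z_{.05} = 1.645$, so the rejection region is $\bar{x} > 170 + 1.645(65/\sqrt{400}) \approx 175.35$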
1.345
Example 11.1 The Big
Picture
Reject H0 in favor of H1
1.346
Standardized Test Statistic
An easier method is to use the standardized test
statistic:
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
1.347
PLOT POWER CURVE
1.348
p-Value
The p-value of a test is the probability of
observing a test statistic at least as extreme
as the one computed given that the null
hypothesis is true.
1.349
Interpreting the p-value
The smaller the p-value, the more statistical evidence
exists to support the alternative hypothesis.
If the p-value is less than 1%, there is overwhelming
evidence that supports the alternative hypothesis.
If the p-value is between 1% and 5%, there is
strong evidence that supports the alternative
hypothesis.
If the p-value is between 5% and 10%, there is weak
evidence that supports the alternative hypothesis.
If the p-value exceeds 10%, there is no evidence that
supports the alternative hypothesis.
We observe a p-value of .0069, hence there is
overwhelming evidence to support H1: μ > 170.
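The p-value of .0069 can be reproduced from the Example 11.1 inputs (n = 400, sample mean 178, σ = 65, hypothesized mean 170). The sketch below is one way to do it; `z_test_right_tail` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def z_test_right_tail(x_bar, mu0, sigma, n):
    """Return (z, p-value) for H1: mu > mu0 when sigma is known."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return z, p_value

z, p_value = z_test_right_tail(x_bar=178, mu0=170, sigma=65, n=400)
print(round(z, 2), round(p_value, 4))
```

Because the p-value is below 1%, it falls in the "overwhelming evidence" band described above.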
1.350
Interpreting the p-value
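Compare the p-value with the selected value of the
significance level:
If the p-value is less than α, we judge the p-value to be small enough to reject the null hypothesis.
If the p-value is greater than α, we do not reject the null hypothesis.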
1.351
Chapter-Opening Example
H1: μ < 22
H0: μ = 22
1.352
Chapter-Opening Example
The test statistic is
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
1.353
Chapter-Opening Example
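Rejection region:
$z < -z_{\alpha} = -z_{.10} = -1.28$
$\bar{x} = \dfrac{\sum x_i}{220} = \dfrac{4{,}759}{220} = 21.63$
and
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{21.63 - 22}{6/\sqrt{220}} = -.91$
p-value = P(Z < -.91) = .5 - .3186 = .1814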
1.354
Chapter-Opening Example
1.356
Right-Tail Testing
Calculate the critical value of the
mean ($\bar{x}_L$) and compare against the
observed value of the sample mean ($\bar{x}$)
1.357
Left-Tail Testing
Calculate the critical value of the
mean ($\bar{x}_L$) and compare against the
observed value of the sample mean ($\bar{x}$)
1.358
Two-Tail Testing
Two-tail testing is used when we want
to test a research hypothesis that a
parameter is not equal (≠) to some
value
1.359
Example 11.2
AT&T argues that its rates are such that customers won't
see a difference in their phone bills between them and
their competitors. They calculate the mean and standard
deviation for all their customers at $17.09 and $3.87
(respectively).
1.360
Example 11.2
The rejection region is set up so we can reject the null
hypothesis when the test statistic is large or when it is
small.
1.361
Example 11.2
At a 5% significance level (i.e. α = .05), we have
α/2 = .025. Thus, $z_{.025}$ = 1.96 and
our rejection region is:
z < -1.96 or z > 1.96
1.362
Example 11.2
From the data, we calculate $\bar{x}$ = 17.55
We find that:
1.363
PLOT POWER CURVE
1.364
Summary of One- and Two-Tail
Tests
1.365
Inference About a
Population [σ unknown]
(Diagram: a sample is drawn from the population; the sample statistic is used to make an inference about the population parameter.)
1.367
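Testing μ when σ is
unknown
When the population standard
deviation is unknown and the
population is normal, the test
statistic for testing hypotheses about
μ is:
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
which is Student t distributed with ν = n - 1 degrees of freedom.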
1.369
IDENTIFY
Example 12.1
Our objective is to describe the population of the
numbers of packages processed in 1 hour by new
workers; that is, we want to know whether the new
workers' productivity is more than 90% of that of
experienced workers. Thus we have:
H0: μ = 450
H1: μ > 450
1.370
COMPUTE
Example 12.1
Our test statistic is:
1.371
COMPUTE
Example 12.1
From the data, we calculate $\bar{x}$ = 460.38, s
= 38.83 and thus:
Since
Example 12.2
Can we estimate the return on
investment for companies that won
quality awards?
Example 12.2
From the data, we calculate:
and so:
1.374
Check Requisite
Conditions
The Student t distribution is robust, which means
that if the population is nonnormal, the results of
the t-test and confidence interval estimate are still
valid provided that the population is not
extremely nonnormal.
1.375
Inference About Population
Variance
If we are interested in drawing inferences about a
population's variability, the parameter we need to
investigate is the population variance: σ²
1.376
Testing & Estimating Population
Variance
Combining this statistic:
1.377
IDENTIFY
Example 12.3
Consider a container filling machine.
Management wants a machine to fill 1 liter
(1,000 cc's) so that the variance of the fills is
less than 1 cc². A random sample of n = 25 1-liter
fills was taken. Does the machine perform as it
should at the 5% significance level?
H1: σ² < 1 (the variance is less than 1 cc²)
1.378
COMPUTE
Example 12.3
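Since our alternative hypothesis is phrased as:
H1: σ² < 1
we compare the test statistic with the left-tail critical value, rejecting H0 if $\chi^2 < \chi^2_{1-\alpha,\,n-1} = \chi^2_{.95,24} = 13.85$.
From the data, s² = .8088
And thus our test statistic takes on this value:
$\chi^2 = \dfrac{(n - 1)s^2}{\sigma^2} = \dfrac{(25 - 1)(.8088)}{1} = 19.41$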
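The arithmetic for this test can be sketched in a few lines. The code assumes the usual chi-squared statistic (n - 1)s²/σ² and takes the critical value 13.85 as a table lookup for 24 degrees of freedom (an assumption here, since the standard library has no chi-squared quantile function); the function name is illustrative.

```python
def chi_squared_statistic(n, sample_variance, hypothesized_variance):
    """Test statistic for inference about a population variance."""
    return (n - 1) * sample_variance / hypothesized_variance

statistic = chi_squared_statistic(n=25, sample_variance=0.8088,
                                  hypothesized_variance=1.0)

# Assumed table value: chi-squared with 24 d.f. and area .95 to its right.
critical_value = 13.85

# Left-tail test: reject H0 only if the statistic falls below the critical value.
reject_h0 = statistic < critical_value
print(round(statistic, 2), reject_h0)
```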
1.379
Example 12.4
As we saw, we cannot reject the null hypothesis
in favor of the alternative. That is, there is not
enough evidence to infer that the claim is true.
Note: the result does not say that the variance
is greater than 1, rather it merely states that
we are unable to show that the variance is
less than 1.
1.380
COMPUTE
Example 12.4
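In order to create a confidence interval
estimate of the variance, we need these
formulae:
lower confidence limit: $LCL = \dfrac{(n - 1)s^2}{\chi^2_{\alpha/2,\,n-1}}$
upper confidence limit: $UCL = \dfrac{(n - 1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}$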
1.382
Difference of Two Means
In order to test and estimate the difference
between two population means, we
draw random samples from each of two
populations. Initially, we will consider
independent samples, that is, samples that
are completely unrelated to one another.
1.383
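Sampling Distribution of $\bar{x}_1 - \bar{x}_2$
1. $\bar{x}_1 - \bar{x}_2$ is normally distributed if the original
populations are normal or approximately normal
if the populations are nonnormal and the sample
sizes are large (n1, n2 > 30)
2. The expected value of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$
3. The variance of $\bar{x}_1 - \bar{x}_2$ is $\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$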
1.385
Making Inferences About $\mu_1 - \mu_2$
In practice, the z statistic is rarely
used since the population variances are unknown.
1.386
When are variances equal?
How do we know when the population
variances are equal?
degrees of freedom
1.388
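CI Estimator for $\mu_1 - \mu_2$ (equal
variances)
The confidence interval estimator
for $\mu_1 - \mu_2$ when the population
variances are equal is given by:
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$
where $s_p^2$ is the pooled variance estimator and $\nu = n_1 + n_2 - 2$ degrees of freedom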
1.389
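Test Statistic for $\mu_1 - \mu_2$ (unequal
variances)
The test statistic for $\mu_1 - \mu_2$ when
the population variances are
unequal is given by:
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
with $\nu = \dfrac{(s_1^2/n_1 + s_2^2/n_2)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$ degrees of freedom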
Example 13.2
Two methods are being tested for assembling
office chairs. Assembly times are recorded (25
times for each method). At a 5% significance
level, do the assembly times for the two
methods differ?
1.391
COMPUTE
Example 13.2
The assembly times for each of the
two methods are recorded and
preliminary data is prepared
Example 13.2
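Recall, we are doing a two-tailed test,
hence the rejection region will be:
$t < -t_{\alpha/2,\nu}$ or $t > t_{\alpha/2,\nu}$, with α/2 = .025 and, assuming equal variances, ν = n1 + n2 - 2 = 25 + 25 - 2 = 48 degrees of freedom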
Example 13.2
In order to calculate our t-statistic,
we need to first calculate the pooled
variance estimator, followed by
the t-statistic
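The pooled calculation, and the unequal-variances alternative, can be sketched as below. The summary numbers are hypothetical placeholders (the assembly-time data are not reproduced here), and both function names are illustrative; the test assumes H0: μ1 - μ2 = 0.

```python
from math import sqrt

def pooled_t(x1, s1, n1, x2, s2, n2):
    """Equal-variances t statistic and its degrees of freedom."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    t = (x1 - x2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def unequal_variances_t(x1, s1, n1, x2, s2, n2):
    """Unequal-variances t statistic and its approximate degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (x1 - x2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics for two methods, 25 observations each.
t_pooled, df_pooled = pooled_t(6.0, 1.0, 25, 5.5, 1.0, 25)
t_unequal, df_unequal = unequal_variances_t(6.0, 1.0, 25, 5.5, 1.0, 25)
print(round(t_pooled, 3), df_pooled, round(t_unequal, 3), round(df_unequal, 1))
```

With equal sample sizes and equal sample standard deviations, the two approaches give the same statistic and the same degrees of freedom; they diverge as the sample variances differ.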
1.394
INTERPRET
Example 13.2
1.395
INTERPRET
Example 13.2
Excel, of course, also provides us
with the information
Compare
or look at p-value
1.396
Confidence Interval
We can compute a 95% confidence interval estimate
for the difference in mean assembly times as:
1.397
Matched Pairs Experiment
Previously when comparing two populations,
we examined independent samples.
1.399
Inference about the ratio of two
variances
So far we've looked at comparing measures of central
location, namely the means of two populations.
degrees of freedom.
1.400
Inference about the ratio of two
variances
Our null hypothesis is always:
H0: $\sigma_1^2/\sigma_2^2 = 1$
df1 = n1 - 1
df2 = n2 - 1
1.401
IDENTIFY
Example 13.6
In example 13.1, we looked at the variances of
the samples of people who consumed high fiber
cereal and those who did not and assumed
they were not equal. We can use the ideas just
developed to test if this is in fact the case.
1.402
CALCULATE
Example 13.6
Since our research hypothesis is H1: $\sigma_1^2/\sigma_2^2 \neq 1$,
we are doing a two-tailed test, and
our rejection region is:
1.403
CALCULATE
Example 13.6
Our test statistic is:
(Figure: F axis with the rejection-region boundaries marked at .58 and 1.61)
Hence there is sufficient evidence to reject the null
hypothesis in favor of the alternative; that is, there is a
difference in the variance between the two populations.
1.404
INTERPRET
Example 13.6
We may need to work with the Excel
output before drawing conclusions
Our research hypothesis
H1: $\sigma_1^2/\sigma_2^2 \neq 1$
requires two-tail testing,
but Excel only gives us values
for one-tail testing
Ordinal/Rank
In order but not
equal (Likert)
Categorical
Names
9/14/2010 407
Continuous Data
If comparing 2 groups
(treatment/control)
t-test
If comparing > 2 groups
ANOVA (F-test)
If measuring association between 2
variables
Pearson r correlation
If trying to predict an outcome
(crystal ball)
Regression or multiple regression
9/14/2010 408
Ordinal Data
Beyond the capability of Excel; just FYI
If comparing 2 groups
Mann Whitney U (treatment vs. control)
Wilcoxon (matched pre vs. post)
If comparing > 2 groups
Kruskal-Wallis (median test)
If measuring association between 2
variables
Spearman rho (ρ)
Likert-type scales are ordinal data
9/14/2010 409
Categorical Data
Called a test of frequency: how
often something is observed (AKA:
Goodness of Fit Test, Test of
Homogeneity)
Chi-Square (χ²)
Examples of burning research
questions:
Do negative ads change how people
vote?
Is there a relationship between marital
status and health insurance coverage?
9/14/2010 410
Words we use to describe
statistics
Mean (μ)
The arithmetic
average (add all of
the scores
together, then
divide by the
number of scores)
μ = Σx / n
9/14/2010 412
Median
The middle number
(just like the
median strip that
divides a highway
down the middle;
50/50)
Used when data is
not normally
distributed
Often hear about
the median price of
housing
9/14/2010 413
Mode
The most
frequently
occurring number
(score,
measurement,
value, cost)
On a frequency
distribution, it's the
highest point (like
the à la mode on
pie)
9/14/2010 414
Standard Deviation (σ)
(Figure: normal curve with the 95% and 99% regions marked)
9/14/2010 415
We Make Mistakes!
Alpha level
Set BEFORE we collect data, run statistics
Defines how much of an error we are willing to make to say we made a difference
If we're wrong, it's an alpha error or Type 1 error
p value
Calculated AFTER we gather the data
The calculated probability of a mistake by saying it works
AKA: level of significance
Describes the percent of the population/area under the curve (in the tail) that is beyond our statistic
9/14/2010 416
2-tailed Test
The critical value is
the number that
separates the blue
zone from the middle
(±1.96 in this example)
In a t-test, in order to
be statistically
significant the t score
needs to be in the
blue zone
If α = .05, then 2.5%
of the area is in each
tail
9/14/2010 417
1-tailed Test
9/14/2010 418
Chi-Square (χ²)