
1. Measures of Central Tendency and Variability

Central tendency is a statistical measure that identifies a single score as representative of an


entire distribution of scores. The goal of central tendency is to find the single score that is most
typical or most representative of the entire distribution. Unfortunately, there is no single,
standard procedure for determining central tendency. The problem is that there is no single
measure that will always produce a central, representative value in every situation. There are
three main measures of central tendency: the arithmetical mean, the median and the mode.

The mean of a set of scores (commonly denoted M or x̄) is the most common and useful measure of
central tendency. The mean is the sum of the scores divided by the total number of scores. The
mean is commonly known as the arithmetic average. The mean can only be used for variables
at the interval or ratio levels of measurement. The mean of [2 6 2 10] is (2 + 6 + 2 + 10)/4 = 20/4
= 5. One can think of the mean as the balance point of a distribution (the center of gravity). It
balances the distances of observations to the mean. Another measure of central tendency is
the median, which is defined as the middle value when the numbers are arranged in increasing
or decreasing order. The median is the score that divides the distribution of scores exactly in
half. The median is also the 50th percentile. The median can be used for variables at the
ordinal, interval or ratio levels of measurement. If, for example, daily expenses are $50, $100,
$150, $350, $350, the middle value is $150, and therefore $150 is the median. For an odd number
of values, the median is the middle value. If there is an even number of items in a set, the median is
the average of the two middle values. For example, if we had four values ($50, $100, $150,
$350), the median would be the average of the two middle values, $100 and $150; thus, $125 is
the median in that case. The median may sometimes be a better indicator of central tendency
than the mean, especially when there are extreme values. Another indicator of central
tendency is the mode, or the value that occurs most often in a set of numbers. In other words,
the mode is the score or category of scores in a frequency distribution that has the greatest
frequency. In the set of expenses mentioned above, the mode would be $350 because it
appears twice and the other values appear only once. The mode can be used for variables at
any level of measurement (nominal, ordinal, interval or ratio). Sometimes a distribution has
more than one mode. Such a distribution is called multimodal. A distribution with two modes is
called bimodal. Note that the modes do not have to have the same frequencies. The tallest
peak is called the major mode; other peaks are called minor modes. Some distributions do not
have modes. A rectangular distribution has no mode. Some distributions have many peaks and
valleys.

Variability provides a quantitative measure of the degree to which scores in a


distribution are spread out. The greater the difference between scores, the more spread out
the distribution is. The more tightly the scores group together, the less variability there is in the
distribution. Variability is the essence of statistics. The most frequently used methods of
measurement of this variance are: range, deviation and variance, interquartile range and
standard deviation. The range is simply the difference between the highest score and the
lowest score in a distribution (some texts add one to obtain the inclusive range). This statistic
can be calculated for measurements that are on an interval scale or above. In a dataset with 10
numbers {99, 45, 23, 67, 45, 91, 82, 78, 62, 51}, the highest number is 99 and the lowest number
is 23, so 99 − 23 = 76; the range is 76. The
interquartile range (IQR) is a range that contains the middle 50% of the scores in a distribution.
It is computed as follows: IQR = 75th percentile − 25th percentile. A related measure of variability
is called the semi-interquartile range. The semi-interquartile range is defined simply as the
interquartile range divided by 2. Variance can be defined as a measure of how close the scores
in the distribution are to the middle of the distribution. Using the mean as the measure of the
middle of the distribution, the variance is defined as the average squared difference of the
scores from the mean. When the scores are spread out or heterogeneous, the measure of
variability should be large. When the scores are homogeneous the variability should be smaller.
Another measure of variability is the standard deviation. The standard deviation is simply the
square root of the variance. The standard deviation is an especially useful measure of variability
when the distribution is normal or approximately normal (see Probability) because the
proportion of the distribution within a given number of standard deviations from the mean can
be calculated. Roughly speaking, the standard deviation is the average distance of scores from
the mean. So the mean is the representative value, and the standard deviation is the
representative distance of any one point in the distribution from the mean.

While the measures of central tendency convey information about the commonalities of
measured properties, the measures of variability quantify the degree to which they differ. If not
all values of data are the same, they differ and variability exists. The measures of central
tendency should be complemented by measures of variability for the same reason.
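To make these measures concrete, here is a minimal Python sketch (standard library only) that computes them for the expense and ten-number examples used above; the variable names are illustrative, and the quartile method (and hence the exact IQR) can differ slightly between packages.

import statistics

expenses = [50, 100, 150, 350, 350]
print(statistics.mean(expenses))      # 200, the arithmetic average
print(statistics.median(expenses))    # 150, the middle value of the sorted list
print(statistics.mode(expenses))      # 350, the value that occurs most often

data = [99, 45, 23, 67, 45, 91, 82, 78, 62, 51]
print(max(data) - min(data))          # 76, the range

q1, q2, q3 = statistics.quantiles(data, n=4)   # quartiles (Python 3.8+)
print(q3 - q1)                                 # interquartile range

print(statistics.pvariance(data))     # population variance: mean squared deviation
print(statistics.pstdev(data))        # population standard deviation: its square root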

2. Measures of Central Tendency: Mean, Median, and Mode

Mean

The mean is the most commonly-used measure of central tendency. When we talk about an
"average", we usually are referring to the mean. The mean is simply the sum of the values
divided by the total number of items in the set. The result is referred to as the arithmetic mean.
Sometimes it is useful to give more weighting to certain data points, in which case the result is
called the weighted arithmetic mean.

The notation used to express the mean depends on whether we are talking about the
population mean or the sample mean:

μ = population mean

x̄ = sample mean

The population mean then is defined as:

μ = ( Σ Xi ) / N

where

N = number of data points in the population

Xi = value of each data point i.

The mean is valid only for interval data or ratio data. Since it uses the values of all of the data
points in the population or sample, the mean is influenced by outliers that may be at the
extremes of the data set.

Median

The median is determined by sorting the data set from lowest to highest values and taking the
data point in the middle of the sequence. There is an equal number of points above and below
the median. For example, in the data set {1,2,3,4,5} the median is 3; there are two data points
greater than this value and two data points less than this value. In this case, the median is equal
to the mean. But consider the data set {1,2,3,4,10}. In this dataset, the median still is three, but
the mean is equal to 4. If there is an even number of data points in the set, then there is no
single point at the middle and the median is calculated by taking the mean of the two middle
points.

The median can be determined for ordinal data as well as interval and ratio data. Unlike the
mean, the median is not influenced by outliers at the extremes of the data set. For this reason,
the median often is used when there are a few extreme values that could greatly influence the
mean and distort what might be considered typical. This often is the case with home prices and
with income data for a group of people, which often is very skewed. For such data, the median
often is reported instead of the mean. For example, in a group of people, if the salary of one
person is 10 times the mean, the mean salary of the group will be higher because of the
unusually large salary. In this case, the median may better represent the typical salary level of
the group.

Mode

The mode is the most frequently occurring value in the data set. For example, in the data set
{1,2,3,4,4}, the mode is equal to 4. A data set can have more than a single mode, in which case
it is multimodal. In the data set {1,1,2,3,3} there are two modes: 1 and 3.

The mode can be very useful for dealing with categorical data. For example, if a sandwich shop
sells 10 different types of sandwiches, the mode would represent the most popular sandwich.
The mode also can be used with ordinal, interval, and ratio data. However, in interval and ratio
scales, the data may be spread thinly with no data points having the same value. In such cases,
the mode may not exist or may not be very meaningful.
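As a quick illustration of the mode for categorical and multimodal data, here is a small Python sketch; the sandwich names are made up for illustration.

from statistics import mode, multimode

sales = ["club", "blt", "club", "veggie", "club", "blt"]
print(mode(sales))        # 'club', the most frequently sold sandwich

values = [1, 1, 2, 3, 3]
print(multimode(values))  # [1, 3]: a bimodal data set has two modes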

  
Measurement Scales and Measures of Central Tendency

The following table summarizes the appropriate methods of determining the middle or typical
value of a data set based on the measurement scale of the data.

Measurement Scale       Best Measure of the "Middle"

Nominal (Categorical)   Mode

Ordinal                 Median

Interval                Symmetrical data: Mean; Skewed data: Median

Ratio                   Symmetrical data: Mean; Skewed data: Median

3. Skewness

The first thing you usually notice about a distribution's shape is whether it has one mode (peak)
or more than one. If it's unimodal (has just one peak), like most data sets, the next thing you
notice is whether it's symmetric or skewed to one side. If the bulk of the data is at the left and
the right tail is longer, we say that the distribution is skewed right or positively skewed; if the
peak is toward the right and the left tail is longer, we say that the distribution is skewed left or
negatively skewed.

Look at the two graphs below. They both have μ = 0.6923 and σ = 0.1685, but their shapes are
different.

[Two histograms: skewness = −0.5370 (left) and skewness = +0.5370 (right)]

The first one is moderately skewed left: the left tail is longer and most of the distribution is at
the right. By contrast, the second distribution is moderately skewed right: its right tail is longer
and most of the distribution is at the left.

You can get a general impression of skewness by drawing a histogram, but there are also some
common numerical measures of skewness. Some authors favor one, some favor another.

You may remember that the mean and standard deviation have the same units as the original
data, and the variance has the square of those units. However, the skewness has no units: it's a
pure number, like a z-score.

Computing Skewness

The moment coefficient of skewness of a data set is

(1)  skewness: g1 = m3 / m2^(3/2)

where

m3 = Σ(x − x̄)³ / n   and   m2 = Σ(x − x̄)² / n

x̄ is the mean and n is the sample size, as usual. m3 is called the third moment of the data set.
m2 is the variance, the square of the standard deviation.

You'll remember that you have to choose one of two different measures of standard deviation,
depending on whether you have data for the whole population or just a sample. The same is
true of skewness. If you have the whole population, then g1 above is the measure of skewness.
But if you have just a sample, you need the sample skewness:

(2)  sample skewness: G1 = [ √( n(n−1) ) / (n−2) ] × g1

source: D. N. Joanes and C. A. Gill, "Comparing Measures of Sample Skewness and Kurtosis",
The Statistician 47(1):183–189.

Excel doesn't concern itself with whether you have a sample or a population: its measure of
skewness is always G1.

Example 1: College Men's Heights

Here are grouped data for heights of 100 randomly selected male students, adapted from
Spiegel & Stephens, Theory and Problems of Statistics, 3/e (McGraw-Hill, 1999), page 68.

Height (inches)   Class Mark, x   Frequency, f
59.5–62.5         61              5
62.5–65.5         64              18
65.5–68.5         67              42
68.5–71.5         70              27
71.5–74.5         73              8

A histogram shows that the data are skewed left, not symmetric. But how highly skewed are
they, compared to other data sets? To answer this question, you have to compute the skewness.

Begin with the sample size and sample mean. (The sample size was given, but it never hurts to
check.)

n = 5+18+42+27+8 = 100

x̄ = (61×5 + 64×18 + 67×42 + 70×27 + 73×8) ÷ 100

x̄ = (305 + 1152 + 2814 + 1890 + 584) ÷ 100

x̄ = 6745 ÷ 100 = 67.45

Now, with the mean in hand, you can compute the skewness. (Of course in real life you'd
probably use Excel or a statistics package, but it's good to know where the numbers come
from.)

Class Mark, x   Frequency, f   x·f     x − x̄    (x − x̄)²·f   (x − x̄)³·f

61              5              305     -6.45    208.01       -1341.68

64              18             1152    -3.45    214.25       -739.15

67              42             2814    -0.45    8.51         -3.83

70              27             1890    2.55     175.57       447.70

73              8              584     5.55     246.42       1367.63

Σ                              6745    n/a      852.75       -269.33

÷ n = 100                      67.45   n/a      8.5275       -2.6933

Finally, the skewness is

g1 = m3 / m2^(3/2) = −2.6933 / 8.5275^(3/2) = −0.1082

But wait, there's more! That would be the skewness if you had data for the whole
population. But obviously there are more than 100 male students in the world, or even in
almost any school, so what you have here is a sample, not the population. You must compute
the sample skewness:

G1 = [ √(100×99) / 98 ] × [ −2.6933 / 8.5275^(3/2) ] = −0.1098
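Here is a short Python sketch that reproduces the grouped-data skewness calculation above without any statistics library; only the class marks and frequencies from the table are used.

import math

x = [61, 64, 67, 70, 73]          # class marks
f = [5, 18, 42, 27, 8]            # frequencies
n = sum(f)                        # 100
mean = sum(xi * fi for xi, fi in zip(x, f)) / n               # 67.45

m2 = sum(fi * (xi - mean) ** 2 for xi, fi in zip(x, f)) / n   # 8.5275
m3 = sum(fi * (xi - mean) ** 3 for xi, fi in zip(x, f)) / n   # about -2.6933

g1 = m3 / m2 ** 1.5                                           # about -0.1082
G1 = math.sqrt(n * (n - 1)) / (n - 2) * g1                    # about -0.1098 (sample skewness)
print(round(g1, 4), round(G1, 4))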

Interpreting Skewness

If skewness is positive, the data are positively skewed or skewed right, meaning that the right
tail of the distribution is longer than the left. If skewness is negative, the data are negatively
skewed or skewed left, meaning that the left tail is longer.

If skewness = 0, the data are perfectly symmetrical. But a skewness of exactly zero is quite
unlikely for real-world data, so how can you interpret the skewness number? Bulmer, M. G.,
Principles of Statistics (Dover, 1979), a classic, suggests this rule of thumb:

• If skewness is less than −1 or greater than +1, the distribution is highly skewed.
• If skewness is between −1 and −½ or between +½ and +1, the distribution is moderately
skewed.
• If skewness is between −½ and +½, the distribution is approximately symmetric.

With a skewness of −0.1098, the sample data for student heights are approximately symmetric.

Caution: This is an interpretation of the data you actually have. When you have data for the
whole population, that's fine. But when you have a sample, the sample skewness doesn't
necessarily apply to the whole population. In that case the question is, from the sample
skewness, can you conclude anything about the population skewness? To answer that question,
see the next section.

Inferring Population Skewness

Your data set is just one sample drawn from a population. Maybe, from ordinary sample
variability, your sample is skewed even though the population is symmetric. But if the sample is
skewed too much for random chance to be the explanation, then you can conclude that there is
skewness in the population.

But what do I mean by "too much for random chance to be the explanation"? To answer that,
you need to divide the sample skewness G1 by the standard error of skewness (SES) to get the
test statistic, which measures how many standard errors separate the sample skewness from
zero:

(3)  test statistic: Zg1 = G1 / SES   where   SES = √[ 6n(n−1) / ((n−2)(n+1)(n+3)) ]

This formula is adapted from page 85 of Cramer, Duncan, Basic Statistics for Social Research
(Routledge, 1997). (Some authors suggest √(6/n), but for small samples that's a poor
approximation. And anyway, we've all got calculators, so you may as well do it right.)

The critical value of Zg1 is approximately 2. (This is a two-tailed test of skewness ≠ 0 at roughly
the 0.05 significance level.)

• If Zg1 < −2, the population is very likely skewed negatively (though you don't know by
how much).
• If Zg1 is between −2 and +2, you can't reach any conclusion about the skewness of the
population: it might be symmetric, or it might be skewed in either direction.
• If Zg1 > +2, the population is very likely skewed positively (though you don't know by how
much).

Don't mix up the meanings of this test statistic and the amount of skewness. The amount of
skewness tells you how highly skewed your sample is: the bigger the number, the bigger the
skew. The test statistic tells you whether the whole population is probably skewed, but not by
how much: the bigger the number, the higher the probability.

Estimating Population Skewness

GraphPad suggests a confidence interval for skewness:

(4)  95% confidence interval of population skewness = G1 ± 2 SES

For the college men's heights, recall that the sample skewness was G1 = −0.1098. The sample
size was n = 100 and therefore the standard error of skewness is

SES = √[ (600×99) / (98×101×103) ] = 0.2414

The test statistic is

Zg1 = G1/SES = −0.1098 / 0.2414 = −0.45

This is quite small, so it's impossible to say whether the population is symmetric or skewed.

Since the sample skewness is small, a confidence interval is probably reasonable:

G1 ± 2 SES = −0.1098 ± 2×0.2414 = −0.1098 ± 0.4828 = −0.5926 to +0.3730.

You can give a 95% confidence interval of skewness as about −0.59 to +0.37, more or less.
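The standard error of skewness, the test statistic Zg1, and the rough 95% confidence interval above can be reproduced with a few lines of Python; n and G1 are taken from the height example.

import math

n, G1 = 100, -0.1098
SES = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))  # about 0.2414
Zg1 = G1 / SES                                                    # about -0.45
ci = (G1 - 2 * SES, G1 + 2 * SES)                                 # about (-0.59, +0.37)
print(round(SES, 4), round(Zg1, 2), ci)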

4. Kurtosis

If a distribution is symmetric, the next question is about the central peak: is it high and sharp, or
short and broad? You can get some idea of this from the histogram, but a numerical measure is
more precise.

The height and sharpness of the central peak, relative to the rest of the data, are measured by a
number called kurtosis. Higher values indicate a higher, sharper peak; lower values indicate a
lower, less distinct peak. This occurs because, as Wikipedia's article on kurtosis explains, higher
kurtosis means more of the variability is due to a few extreme differences from the mean,
rather than a lot of modest differences from the mean.

Balanda and MacGillivray say the same thing in another way: increasing kurtosis is associated
with the "movement of probability mass from the shoulders of a distribution into its center and
tails." (Kevin P. Balanda and H. L. MacGillivray. "Kurtosis: A Critical Review". The American
Statistician 42:2 [May 1988], pp 111–119, drawn to my attention by Karl Ove Hufthammer)

You may remember that the mean and standard deviation have the same units as the original
data, and the variance has the square of those units. However, the kurtosis has no units: it's a
pure number, like a z-score.

The reference standard is a normal distribution, which has a kurtosis of 3. In token of this, often
the excess kurtosis is presented: excess kurtosis is simply kurtosis − 3. For example, the
"kurtosis" reported by Excel is actually the excess kurtosis.

• A normal distribution has kurtosis exactly 3 (excess kurtosis exactly 0). Any distribution
with kurtosis ≈ 3 (excess ≈ 0) is called mesokurtic.
• A distribution with kurtosis < 3 (excess kurtosis < 0) is called platykurtic. Compared to a
normal distribution, its central peak is lower and broader, and its tails are shorter and
thinner.
• A distribution with kurtosis > 3 (excess kurtosis > 0) is called leptokurtic. Compared to a
normal distribution, its central peak is higher and sharper, and its tails are longer and
fatter.

Visualizing Kurtosis

Kurtosis is unfortunately harder to picture than skewness, but these illustrations, suggested by
Wikipedia, should help. All three of these distributions have mean of 0, standard deviation of 1,
and skewness of 0, and all are plotted on the same horizontal and vertical scale. Look at the
progression from left to right, as kurtosis increases.

[Three distributions: Uniform(min = −√3, max = √3), kurtosis = 1.8, excess = −1.2; Normal(μ = 0, σ = 1), kurtosis = 3, excess = 0; Logistic(α = 0, β = 0.55153), kurtosis = 4.2, excess = 1.2]

Moving from the illustrated uniform distribution to a normal distribution, you see that the
"shoulders" have transferred some of their mass to the center and the tails. In other words, the
intermediate values have become less likely and the central and extreme values have become
more likely. The kurtosis increases while the standard deviation stays the same, because more
of the variation is due to extreme values.

Moving from the normal distribution to the illustrated logistic distribution, the trend continues.
There is even less in the shoulders and even more in the tails, and the central peak is higher and
narrower.

How far can this go? What are the smallest and largest possible values of kurtosis? The
smallest possible kurtosis is 1 (excess kurtosis −2), and the largest is ∞, as shown here:

[Two distributions: kurtosis = 1, excess = −2; kurtosis = ∞, excess = ∞]

A discrete distribution with two equally likely outcomes, such as winning or losing on the flip of
a coin, has the lowest possible kurtosis. It has no central peak and no real tails, and you could
say that it's "all shoulder"; it's as platykurtic as a distribution can be. At the other extreme,
Student's t distribution with four degrees of freedom has infinite kurtosis. A distribution can't
be any more leptokurtic than this.

Computing Kurtosis

The moment coefficient of kurtosis of a data set is computed almost the same way as the
coefficient of skewness: just change the exponent 3 to 4 in the formulas:

(5)  kurtosis: a4 = m4 / m2²   and   excess kurtosis: g2 = a4 − 3

where

m4 = Σ(x − x̄)⁴ / n   and   m2 = Σ(x − x̄)² / n

Again, the excess kurtosis is generally used because the excess kurtosis of a normal distribution
is 0. x̄ is the mean and n is the sample size, as usual. m4 is called the fourth moment of the data
set. m2 is the variance, the square of the standard deviation.

Just as with variance, standard deviation, and skewness, the above is the final computation if you
have data for the whole population. But if you have data for only a sample, you have to
compute the sample excess kurtosis using this formula, which comes from Joanes and Gill:

(6)  sample excess kurtosis: G2 = [ (n−1) / ((n−2)(n−3)) ] × [ (n+1) g2 + 6 ]

Excel doesn't concern itself with whether you have a sample or a population: its measure of
kurtosis is always G2.

Example: Let's continue with the example of the college men's heights, and compute the
kurtosis of the data set. n = 100, x̄ = 67.45 inches, and the variance m2 = 8.5275 in² were
computed earlier.

Class Mark, x   Frequency, f   x − x̄    (x − x̄)⁴·f

61              5              -6.45    8653.84

64              18             -3.45    2550.05

67              42             -0.45    1.72

70              27             2.55     1141.63

73              8              5.55     7590.35

Σ                                       19937.60

m4 = Σ/n                                199.3760

Finally, the kurtosis is

a4 = m4 / m2² = 199.3760/8.5275² = 2.7418

and the excess kurtosis is

g2 = 2.7418 − 3 = −0.2582

But this is a sample, not the population, so you have to compute the sample excess kurtosis:

G2 = [99/(98×97)] × [101×(−0.2582) + 6] = −0.2091

This sample is slightly platykurtic: its peak is just a bit shallower than the peak of a normal
distribution.
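A Python sketch of the same kurtosis calculation follows; it reuses the class marks and frequencies from the height example (small rounding differences from the hand calculation are expected).

x = [61, 64, 67, 70, 73]          # class marks
f = [5, 18, 42, 27, 8]            # frequencies
n = sum(f)
mean = sum(xi * fi for xi, fi in zip(x, f)) / n               # 67.45

m2 = sum(fi * (xi - mean) ** 2 for xi, fi in zip(x, f)) / n   # 8.5275
m4 = sum(fi * (xi - mean) ** 4 for xi, fi in zip(x, f)) / n   # about 199.38

a4 = m4 / m2 ** 2                                             # about 2.7418 (kurtosis)
g2 = a4 - 3                                                   # about -0.258 (excess kurtosis)
G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)       # about -0.209 (sample excess kurtosis)
print(round(a4, 4), round(g2, 4), round(G2, 4))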

Inferring Population Kurtosis

Your data set is just one sample drawn from a population. How far must the excess kurtosis be
from 0 before you can say that the population also has nonzero excess kurtosis?

The answer comes in a similar way to the similar question about skewness. You divide the
sample excess kurtosis by the standard error of kurtosis (SEK) to get the test statistic, which
tells you how many standard errors the sample excess kurtosis is from zero:

(7)  test statistic: Zg2 = G2 / SEK   where   SEK = 2 × SES × √[ (n²−1) / ((n−3)(n+5)) ]

The formula is adapted from page 89 of Duncan Cramer's Basic Statistics for Social Research
(Routledge, 1997). (Some authors suggest √(24/n), but for small samples that's a poor
approximation. And anyway, we've all got calculators, so you may as well do it right.)

The critical value of Zg2 is approximately 2. (This is a two-tailed test of excess kurtosis ≠ 0 at
approximately the 0.05 significance level.)

• If Zg2 < −2, the population very likely has negative excess kurtosis (kurtosis < 3,
platykurtic), though you don't know how much.
• If Zg2 is between −2 and +2, you can't reach any conclusion about the kurtosis: excess
kurtosis might be positive, negative, or zero.
• If Zg2 > +2, the population very likely has positive excess kurtosis (kurtosis > 3,
leptokurtic), though you don't know how much.

For the sample college men's heights (n = 100), you found excess kurtosis of G2 = −0.2091. The
sample is platykurtic, but is this enough to let you say that the whole population is platykurtic
(has lower kurtosis than the bell curve)?

First compute the standard error of kurtosis:

SEK = 2 × SES × √[ (n²−1) / ((n−3)(n+5)) ]

n = 100, and the SES was previously computed as 0.2414.

SEK = 2 × 0.2414 × √[ (100²−1) / (97×105) ] = 0.4784

The test statistic is

Zg2 = G2/SEK = −0.2091 / 0.4784 = −0.44

You can't say whether the kurtosis of the population is the same as or different from the
kurtosis of a normal distribution.

Assessing Normality

There are many ways to assess normality, and unfortunately none of them are without
problems. Graphical methods are a good start, such as plotting a histogram and making a
quantile plot.

One test is the D'Agostino-Pearson omnibus test, so called because it uses the test statistics for
both skewness and kurtosis to come up with a single p-value. The test statistic is

(8)  DP = Zg1² + Zg2², which follows a χ² distribution with df = 2.

You can look up the p-value in a table, or use the χ² cdf function on a TI-83 or TI-84.

Caution: The D'Agostino-Pearson test has a tendency to err on the side of rejecting normality,
particularly with small sample sizes. David Moriarty, in his StatCat utility, recommends that you
not use the D'Agostino-Pearson test for sample sizes below 20.

For college students' heights you had test statistics Zg1 = −0.45 for skewness and Zg2 = −0.44 for
kurtosis. The omnibus test statistic is

DP = Zg1² + Zg2² = 0.45² + 0.44² = 0.3961

and the p-value for χ²(2 df) > 0.3961, from a table or a statistics calculator, is 0.8203. You
cannot reject the assumption of normality. (Remember, you never accept the null hypothesis,
so you can't say from this test that the distribution is normal.) The histogram suggests
normality, and this test gives you no reason to reject that impression.
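Here is a minimal sketch of the omnibus statistic for the height example, using the Zg1 and Zg2 values computed above and SciPy only for the chi-square tail probability. (SciPy's scipy.stats.normaltest performs a related omnibus test directly on raw data, but with slightly different Z transformations, so its result will not match these hand-computed Z values exactly.)

from scipy.stats import chi2

Zg1, Zg2 = -0.45, -0.44
DP = Zg1 ** 2 + Zg2 ** 2          # 0.3961
p_value = chi2.sf(DP, df=2)       # about 0.82, so no reason to reject normality
print(round(DP, 4), round(p_value, 4))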

Example 2: Size of Rat Litters

For a second illustration of inferences about skewness and kurtosis of a population, I'll use an
example from Bulmer's Principles of Statistics:

Frequency distribution of litter size in rats, n = 815

Litter size   1   2   3   4     5     6     7     8     9    10   11   12

Frequency     7   33  58  116   125   126   121   107   56   37   25   4

I'll spare you the detailed calculations, but you should be able to verify them by following
equation (1) and equation (2):
n = 815, x̄ = 6.1252, m2 = 5.1721, m3 = 2.0316

skewness g1 = 0.1727 and sample skewness G1 = 0.1730

The sample is roughly symmetric but slightly skewed right, which looks about right from the
histogram. The standard error of skewness is

SES = √[ (6×815×814) / (813×816×818) ] = 0.0856

Dividing the skewness by the SES, you get the test statistic

Zg1 = 0.1730 / 0.0856 = 2.02

Since this is greater than 2, you can say that there is some positive skewness in the population.
Again, "some positive skewness" just means a figure greater than zero; it doesn't tell us
anything more about the magnitude of the skewness.

If you go on to compute a 95% confidence interval of skewness from equation (4), you get
0.1730±2×0.0856 = 0.00 to 0.34.

What about the kurtosis? You should be able to follow equation (5) and compute a fourth
moment of m4 = 67.3948. You already have m2 = 5.1721, and therefore

kurtosis a4 = m4 / m2² = 67.3948 / 5.1721² = 2.5194

excess kurtosis g2 = 2.5194 − 3 = −0.4806

sample excess kurtosis G2 = [814/(813×812)] × [816×(−0.4806) + 6] = −0.4762

So the sample is moderately less peaked than a normal distribution. Again, this matches the
histogram, where you can see the higher "shoulders".

What if anything can you say about the population? For this you need equation (7). Begin by
computing the standard error of kurtosis, using n = 815 and the previously computed SES of
0.0856:

SEK = 2 × SES × √[ (n²−1) / ((n−3)(n+5)) ]

SEK = 2 × 0.0856 × √[ (815²−1) / (812×820) ] = 0.1711

and divide:

Zg2 = G2/SEK = −0.4762 / 0.1711 = −2.78

Since Zg2 is comfortably below −2, you can say that the population of rat litter sizes is
platykurtic, less sharply peaked than the normal distribution. But be careful: you know that it is
platykurtic, but you don't know by how much.

You already know the population is not normal, but let's apply the D'Agostino-Pearson test
anyway:

DP = 2.02² + 2.78² = 11.8088

p-value = P( χ²(2) > 11.8088 ) = 0.0027

The test agrees with the separate tests of skewness and kurtosis: the size of rat litters, for the
entire population of rats, is not normally distributed.
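The moments quoted for the rat-litter data can be checked with a short Python sketch built directly from Bulmer's frequency table.

litter = list(range(1, 13))
freq = [7, 33, 58, 116, 125, 126, 121, 107, 56, 37, 25, 4]
n = sum(freq)                                                    # 815
mean = sum(x * f for x, f in zip(litter, freq)) / n              # about 6.125

m2 = sum(f * (x - mean) ** 2 for x, f in zip(litter, freq)) / n  # about 5.17
m3 = sum(f * (x - mean) ** 3 for x, f in zip(litter, freq)) / n  # about 2.03
m4 = sum(f * (x - mean) ** 4 for x, f in zip(litter, freq)) / n  # about 67.4

g1 = m3 / m2 ** 1.5               # about 0.173  (slightly skewed right)
g2 = m4 / m2 ** 2 - 3             # about -0.48  (platykurtic)
print(round(mean, 4), round(g1, 4), round(g2, 4))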

How Do I Determine Whether My Data Are Normal?
• There are four interrelated approaches to determine normality, and all of them should be
conducted.
1. Look at a histogram with the normal curve superimposed. A histogram provides a
useful graphical representation of the data. To provide a rough example of
normality and non-normality, see the following histograms. The black line
superimposed on the histograms represents the bell-shaped "normal" curve.
Notice how the data for variable1 are normal, and the data for variable2 are
non-normal. In this case, the non-normality is driven by the presence of an
outlier. For more information about outliers, see What are outliers?, How do I
detect outliers?, and How do I deal with outliers?. Problem: all samples deviate
somewhat from normal, so the question is how much deviation from the black
line indicates "non-normality"? Unfortunately, graphical representations like the
histogram provide no hard-and-fast rules. After you have viewed many (many!)
histograms, over time you will get a sense for the normality of data.

2. Look at the values of Skewness and Kurtosis. Skewness involves the symmetry
of the distribution. Skewness that is normal involves a perfectly symmetric
distribution. A positively skewed distribution has scores clustered to the left,
with the tail extending to the right. A negatively skewed distribution has scores
clustered to the right, with the tail extending to the left. Kurtosis involves the
peakedness of the distribution. Kurtosis that is normal involves a distribution
that is bell-shaped and not too peaked or flat. Positive kurtosis is indicated by a
peak. Negative kurtosis is indicated by a flat distribution. Both Skewness and
Kurtosis are 0 in a normal distribution, so the farther away from 0, the more
non-normal the distribution. The question is "how much" skew or kurtosis
render the data non-normal? This is an arbitrary determination, and sometimes
difficult to interpret using the values of Skewness and Kurtosis. The histogram
above for variable1 represents perfect symmetry (skewness) and perfect
peakedness (kurtosis); and the descriptive statistics below for variable1 parallel
this information by reporting "0" for both skewness and kurtosis. The histogram
above for variable2 represents positive skewness (tail extending to the right) and
positive kurtosis (high peak); and the descriptive statistics below for variable2
parallel this information. Problem: the question is "how much" skew or
kurtosis render the data non-normal? This is an arbitrary determination, and
sometimes difficult to interpret using the values of Skewness and Kurtosis.
Luckily, there are more objective tests of normality, described next.

3. Look at established tests for normality that take into account both Skewness
and Kurtosis simultaneously. The Kolmogorov-Smirnov test (K-S) and Shapiro-
Wilk (S-W) test are designed to test normality by comparing your data to a
normal distribution with the same mean and standard deviation as your sample.
If the test is NOT significant (p above .05), the data do not differ significantly
from a normal distribution. If the test is significant (p less than .05), then the
data are non-normal. See the data below, which indicate variable1 is normal,
and variable2 is non-normal. (A small code sketch of these tests appears after
this list.) Also, keep in mind one limitation of the normality tests is that the
larger the sample size, the more likely to get significant results. Thus, you may
get significant results with only slight deviations from normality when sample
sizes are large.

4. Look at normality plots of the data. A "Normal Q-Q Plot" provides a graphical
way to determine the level of normality. The black line indicates the values your
sample should adhere to if the distribution were normal. The dots are your actual
data. If the dots fall exactly on the black line, then your data are normal. If they
deviate from the black line, your data are non-normal. Notice how
the data for variable1 fall along the line, whereas the data for variable2 deviate
from the line.
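As referenced in item 3, here is a minimal SciPy sketch of the Shapiro-Wilk and Kolmogorov-Smirnov checks; variable1 and variable2 are synthetic stand-ins for the data shown in the (omitted) histograms, with variable2 containing one extreme outlier.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
variable1 = rng.normal(loc=50, scale=10, size=200)        # roughly normal
variable2 = np.append(rng.normal(50, 10, 199), 500.0)     # one extreme outlier

for name, data in [("variable1", variable1), ("variable2", variable2)]:
    w, p_sw = stats.shapiro(data)                         # Shapiro-Wilk
    # Kolmogorov-Smirnov against a normal with the sample's own mean and SD
    d, p_ks = stats.kstest(data, "norm", args=(data.mean(), data.std(ddof=1)))
    print(name, round(p_sw, 4), round(p_ks, 4))           # p < .05 suggests non-normality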

5. Correlation Coefficients

Correlation coefficients measure the strength of association between two variables. The most
common correlation coefficient, called the Pearson product-moment correlation coefficient,
measures the strength of the linear association between variables.
In this tutorial, when we speak simply of a correlation coefficient, we are referring to the
Pearson product-moment correlation. Generally, the correlation coefficient of a sample is
denoted by r, and the correlation coefficient of a population is denoted by ρ or R.

How to Interpret a Correlation Coefficient

The sign and the absolute value of a correlation coefficient describe the direction and the
magnitude of the relationship between two variables.
• The value of a correlation coefficient ranges between -1 and 1.
• The greater the absolute value of a correlation coefficient, the stronger the linear
relationship.
• The strongest linear relationship is indicated by a correlation coefficient of -1 or 1.
• The weakest linear relationship is indicated by a correlation coefficient equal to 0.
• A positive correlation means that if one variable gets bigger, the other variable tends to
get bigger.
• A negative correlation means that if one variable gets bigger, the other variable tends to
get smaller.
Keep in mind that the Pearson product-moment correlation coefficient only measures linear
relationships. Therefore, a correlation of 0 does not mean zero relationship between two
variables; rather, it means zero linear relationship. (It is possible for two variables to have zero
linear relationship and a strong curvilinear relationship at the same time.)

Scatterplots and Correlation Coefficients

The scatterplots below show how different patterns of data produce different degrees of
correlation.

[Six scatterplots illustrating different degrees of correlation, ranging from a maximum positive correlation (r = 1.0) through zero correlation to a maximum negative correlation (r = -1.0), plus one plot in which a single outlier reduces an otherwise perfect correlation to r = 0.71]

Several points are evident from the scatterplots.

• When the slope of the line in the plot is negative, the correlation is negative; and vice
versa.
• The strongest correlations (r = 1.0 and r = -1.0) occur when data points fall exactly on a
straight line.
• The correlation becomes weaker as the data points become more scattered.
• If the data points fall in a random pattern, the correlation is equal to zero.
• Correlation is affected by outliers. Compare the first scatterplot with the last scatterplot.
The single outlier in the last plot greatly reduces the correlation (from 1.00 to 0.71).

How to Calculate a Correlation Coefficient

If you look in different statistics textbooks, you are likely to find different-looking (but
equivalent) formulas for computing a correlation coefficient. In this section, we present several
formulas that you may encounter.

The most common formula for computing a product-moment correlation coefficient (r) is given
below.

Product-moment correlation coefficient. The correlation r between two variables is:

r = Σ(xy) / sqrt [ ( Σx² ) × ( Σy² ) ]

where Σ is the summation symbol, x = xi − x̄, x̄ is the mean of all the x values, y = yi − ȳ, and ȳ is
the mean of all the y values.
The formula below uses population means and population standard deviations to compute a
population correlation coefficient (ρ) from population data.

Population correlation coefficient. The correlation ρ between two variables is:

ρ = [ 1 / N ] × Σ { [ (Xi − μX) / σx ] × [ (Yi − μY) / σy ] }

where N is the number of observations in the population, Σ is the summation symbol, Xi is the X
value for observation i, μX is the population mean for variable X, Yi is the Y value for observation
i, μY is the population mean for variable Y, σx is the population standard deviation of X, and σy is
the population standard deviation of Y.

The formula below uses sample means and sample standard deviations to compute a
correlation coefficient (r) from sample data.

Sample correlation coefficient. The correlation r between two variables is:

r = [ 1 / (n − 1) ] × Σ { [ (xi − x̄) / sx ] × [ (yi − ȳ) / sy ] }

where n is the number of observations in the sample, Σ is the summation symbol, xi is the x
value for observation i, x̄ is the sample mean of x, yi is the y value for observation i, ȳ is the
sample mean of y, sx is the sample standard deviation of x, and sy is the sample standard
deviation of y.

The interpretation of the sample correlation coefficient depends on how the sample data are
collected. With a simple random sample, the sample correlation coefficient is an unbiased
estimate of the population correlation coefficient.

Each of the latter two formulas can be derived from the first formula. Use the second formula
when you have data from the entire population. Use the third formula when you only have
sample data. When in doubt, use the first formula. It is always correct.

Fortunately, you will rarely have to compute a correlation coefficient by hand. Many software
packages (e.g., Excel) and most graphing calculators have a correlation function that will do the
job for you.
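For instance, here is a small Python sketch that computes the Pearson correlation both by the deviation-score formula given above and with a library call; the x and y values are made up for illustration.

import numpy as np
from scipy.stats import pearsonr

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

dx, dy = x - x.mean(), y - y.mean()                    # deviation scores
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_scipy, p_value = pearsonr(x, y)                      # same r, plus a two-sided p-value
print(round(r_manual, 4), round(r_scipy, 4))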

Note: Sometimes, it is not clear whether a software package or a graphing calculator is
computing a population correlation coefficient or a sample correlation coefficient. For example,
a casual user might not realize that Microsoft uses a population correlation coefficient (ρ) for
the Pearson function in its Excel software.

6. Hypothesis Testing

What Is Hypothesis Testing?

Statistical Hypotheses

A statistical hypothesis is an assumption about a population parameter. This assumption may
or may not be true.

The best way to determine whether a statistical hypothesis is true would be to examine the
entire population. Since that is often impractical, researchers typically examine a random
sample from the population. If sample data are not consistent with the statistical hypothesis,
the hypothesis is rejected.

There are two types of statistical hypotheses.

• Null hypothesis. The null hypothesis, denoted by H0, is usually the hypothesis that
sample observations result purely from chance.
• Alternative hypothesis. The alternative hypothesis, denoted by H1 or Ha, is the
hypothesis that sample observations are influenced by some non-random cause.
For example, suppose we wanted to determine whether a coin was fair and balanced. A null
hypothesis might be that half the flips would result in Heads and half, in Tails. The alternative
hypothesis might be that the number of Heads and Tails would be very different. Symbolically,
these hypotheses would be expressed as
H0: P = 0.5
Ha: P ≠ 0.5
Suppose we flipped the coin 50 times, resulting in 40 Heads and 10 Tails. Given this result, we
would be inclined to reject the null hypothesis. We would conclude, based on the evidence,
that the coin was probably not fair and balanced.

Can We Accept the Null Hypothesis?

Some researchers say that a hypothesis test can have one of two outcomes: you accept the null
hypothesis or you reject the null hypothesis. Many statisticians, however, take issue with the
notion of "accepting the null hypothesis." Instead, they say: you reject the null hypothesis or
you fail to reject the null hypothesis.
Why the distinction between "acceptance" and "failure to reject?" Acceptance implies that the
null hypothesis is true. Failure to reject implies that the data are not sufficiently persuasive for
us to prefer the alternative hypothesis over the null hypothesis.

Hypothesis Tests

Statisticians follow a formal process to determine whether to reject a null hypothesis, based on
sample data. This process, called hypothesis testing, consists of four steps.
• State the hypotheses. This involves stating the null and alternative hypotheses. The
hypotheses are stated in such a way that they are mutually exclusive. That is, if one is
true, the other must be false.
• Formulate an analysis plan. The analysis plan describes how to use sample data to
evaluate the null hypothesis. The evaluation often focuses around a single test statistic.
• Analyze sample data. Find the value of the test statistic (mean score, proportion, t-
score, z-score, etc.) described in the analysis plan.
• Interpret results. Apply the decision rule described in the analysis plan. If the value of
the test statistic is unlikely, based on the null hypothesis, reject the null hypothesis.

Decision Errors

Two types of errors can result from a hypothesis test.

• Type I error. A Type I error occurs when the researcher rejects a null hypothesis when it
is true. The probability of committing a Type I error is called the significance level. This
probability is also called alpha, and is often denoted by α.
• Type II error. A Type II error occurs when the researcher fails to reject a null hypothesis
that is false. The probability of committing a Type II error is called Beta, and is often
denoted by β. The probability of not committing a Type II error is called the Power of
the test.

Decision Rules

The analysis plan includes decision rules for rejecting the null hypothesis. In practice,
statisticians describe these decision rules in two ways: with reference to a P-value or with
reference to a region of acceptance.
• P-value. The strength of evidence in support of a null hypothesis is measured by the P-
value. Suppose the test statistic is equal to S. The P-value is the probability of observing
a test statistic as extreme as S, assuming the null hypothesis is true. If the P-value is less
than the significance level, we reject the null hypothesis.
• Region of acceptance. The region of acceptance is a range of values. If the test statistic
falls within the region of acceptance, the null hypothesis is not rejected. The region of
acceptance is defined so that the chance of making a Type I error is equal to the
significance level.

The set of values outside the region of acceptance is called the region of rejection. If the test
statistic falls within the region of rejection, the null hypothesis is rejected. In such cases, we say
that the hypothesis has been rejected at the α level of significance.
These approaches are equivalent. Some statistics texts use the P-value approach; others use
the region of acceptance approach. In subsequent lessons, this tutorial will present examples
that illustrate each approach.

One-Tailed and Two-Tailed Tests

A test of a statistical hypothesis, where the region of rejection is on only one side of the
sampling distribution, is called a one-tailed test. For example, suppose the null hypothesis
states that the mean is less than or equal to 10. The alternative hypothesis would be that the
mean is greater than 10. The region of rejection would consist of a range of numbers located on
the right side of the sampling distribution; that is, a set of numbers greater than 10.
A test of a statistical hypothesis, where the region of rejection is on both sides of the sampling
distribution, is called a two-tailed test. For example, suppose the null hypothesis states that the
mean is equal to 10. The alternative hypothesis would be that the mean is less than 10 or
greater than 10. The region of rejection would consist of a range of numbers located on both
sides of the sampling distribution; that is, the region of rejection would consist partly of numbers
that were less than 10 and partly of numbers that were greater than 10.

How to Test Hypotheses

This lesson describes a general procedure that can be used to test statistical hypotheses.

How to Conduct Hypothesis Tests

All hypothesis tests are conducted the same way. The researcher states a hypothesis to be
tested, formulates an analysis plan, analyzes sample data according to the plan, and accepts or
rejects the null hypothesis, based on results of the analysis.
• State the hypotheses. Every hypothesis test requires the analyst to state a null
hypothesis and an alternative hypothesis. The hypotheses are stated in such a way that
they are mutually exclusive. That is, if one is true, the other must be false; and vice
versa.
• Formulate an analysis plan. The analysis plan describes how to use sample data to
accept or reject the null hypothesis. It should specify the following elements.

o Significance level. Often, researchers choose significance levels equal to 0.01,
0.05, or 0.10; but any value between 0 and 1 can be used.
o Test method. Typically, the test method involves a test statistic and a sampling
distribution. Computed from sample data, the test statistic might be a mean
score, proportion, difference between means, difference between proportions,
z-score, t-score, chi-square, etc. Given a test statistic and its sampling
distribution, a researcher can assess probabilities associated with the test
statistic. If the test statistic probability is less than the significance level, the null
hypothesis is rejected.
• Analyze sample data. Using sample data, perform computations called for in the analysis
plan.
o Test statistic. When the null hypothesis involves a mean or proportion, use
either of the following equations to compute the test statistic.
Test statistic = (Statistic - Parameter) / (Standard deviation of statistic)

Test statistic = (Statistic - Parameter) / (Standard error of statistic)

where Parameter is the value appearing in the null hypothesis, and Statistic is the point
estimate of Parameter. As part of the analysis, you may need to compute the standard
deviation or standard error of the statistic. Previously, we presented common formulas for the
standard deviation and standard error.

When the parameter in the null hypothesis involves categorical data, you may use a chi-square
statistic as the test statistic. Instructions for computing a chi-square test statistic are presented
in the lesson on the chi-square goodness of fit test.

o P-value. The P-value is the probability of observing a sample statistic as extreme
as the test statistic, assuming the null hypothesis is true.
• Interpret the results. If the sample findings are unlikely, given the null hypothesis, the
researcher rejects the null hypothesis. Typically, this involves comparing the P-value to
the significance level, and rejecting the null hypothesis when the P-value is less than the
significance level.

Hypothesis Test of the Mean

This lesson explains how to conduct a hypothesis test of a mean, when the following conditions
are met:
• The sampling method is simple random sampling.
• The sample is drawn from a normal or near-normal population.
Generally, the sampling distribution will be approximately normally distributed if any of the
following conditions apply.
• The population distribution is normal.
• The sampling distribution is symmetric, unimodal, without outliers, and the sample size
is 15 or less.
• The sampling distribution is moderately skewed, unimodal, without outliers, and the
sample size is between 16 and 40.
• The sample size is greater than 40, without outliers.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3)
analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative
hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if
one is true, the other must be false; and vice versa.
The table below shows three sets of hypotheses. Each makes a statement about how the
population mean μ is related to a specified value M. (In the table, the symbol ≠ means "not
equal to".)

Set   Null hypothesis   Alternative hypothesis   Number of tails
1     μ = M             μ ≠ M                    2
2     μ > M             μ < M                    1
3     μ < M             μ > M                    1

The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on
either side of the sampling distribution would cause a researcher to reject the null hypothesis.
The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on
only one side of the sampling distribution would cause a researcher to reject the null
hypothesis.
Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. It
should specify the following elements.
• Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or
0.10; but any value between 0 and 1 can be used.
• Test method. Use the one-sample t-test to determine whether the hypothesized mean
differs significantly from the observed sample mean.

Analyze Sample Data

Using sample data, conduct a one-sample t-test. This involves finding the standard error,
degrees of freedom, test statistic, and the P-value associated with the test statistic.
• Standard error. Compute the standard error (SE) of the sampling distribution.
SE = s * sqrt{ ( 1/n ) * ( 1 - n/N ) * [ N / ( N - 1 ) ] }
where s is the standard deviation of the sample, N is the population size, and n is the sample
size. When the population size is much larger (at least 10 times larger) than the sample size, the
standard error can be approximated by:
SE = s / sqrt( n )
• Degrees of freedom. The degrees of freedom (DF) is equal to the sample size (n) minus
one. Thus, DF = n - 1.
• Test statistic. The test statistic is a t-score (t) defined by the following equation.
t = (x - μ) / SE
where x is the sample mean, μ is the hypothesized population mean in the null hypothesis, and
SE is the standard error.
• P-value. The P-value is the probability of observing a sample statistic as extreme as the
test statistic. Since the test statistic is a t-score, use the t Distribution Calculator to
assess the probability associated with the t-score, given the degrees of freedom
computed above. (See sample problems at the end of this lesson for examples of how
this is done.)

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null
hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting
the null hypothesis when the P-value is less than the significance level.
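A minimal Python sketch of this one-sample t-test follows, computing SE, the t-score, and the P-value by the formulas above and then checking the result against SciPy; the sample values and the hypothesized mean M = 10 are made up for illustration.

import math
import statistics
from scipy import stats

sample = [9.2, 10.5, 9.8, 11.1, 10.3, 9.7, 10.9, 10.1]
M = 10.0                                       # hypothesized population mean

n = len(sample)
x_bar = statistics.mean(sample)
SE = statistics.stdev(sample) / math.sqrt(n)   # s / sqrt(n)
t = (x_bar - M) / SE
p = 2 * stats.t.sf(abs(t), df=n - 1)           # two-tailed P-value

t_sp, p_sp = stats.ttest_1samp(sample, M)      # library equivalent
print(round(t, 3), round(p, 3), round(t_sp, 3), round(p_sp, 3))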

Hypothesis Test for the Difference Between Two Means

This lesson explains how to conduct a hypothesis test for the difference between two means.
The test procedure, called the two-sample t-test, is appropriate when the following conditions
are met:
• The sampling method for each sample is simple random sampling.
• The samples are independent.
• Each population is at least 10 times larger than its respective sample.
• Each sample is drawn from a normal or near-normal population. Generally, the sampling
distribution will be approximately normal if any of the following conditions apply.
o The population distribution is normal.
o The sample data are symmetric, unimodal, without outliers, and the sample size
is 15 or less.
o The sample data are slightly skewed, unimodal, without outliers, and the sample
size is 16 to 40.
o The sample size is greater than 40, without outliers.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3)
analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative
hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if
one is true, the other must be false; and vice versa.
The table below shows three sets of null and alternative hypotheses. Each makes a statement
about the difference between the mean of one population μ1 and the mean of another
population μ2. (In the table, the symbol ≠ means "not equal to".)

Set   Null hypothesis    Alternative hypothesis   Number of tails
1     μ1 - μ2 = d        μ1 - μ2 ≠ d              2
2     μ1 - μ2 > d        μ1 - μ2 < d              1
3     μ1 - μ2 < d        μ1 - μ2 > d              1

The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on
either side of the sampling distribution would cause a researcher to reject the null hypothesis.
The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on
only one side of the sampling distribution would cause a researcher to reject the null
hypothesis.
When the null hypothesis states that there is no difference between the two population means
(i.e., d = 0), the null and alternative hypothesis are often stated in the following form.

H0: μ1 = μ2
Ha: μ1 ≠ μ2

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. It
should specify the following elements.
• Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or
0.10; but any value between 0 and 1 can be used.
• Test method. Use the two-sample t-test to determine whether the difference between
means found in the sample is significantly different from the hypothesized difference
between means.




Analyze Sample Data

Using sample data, find the standard error, degrees of freedom, test statistic, and the P-value
associated with the test statistic.

• Standard error. Compute the standard error (SE) of the sampling distribution.
SE = sqrt[ (s1²/n1) + (s2²/n2) ]
where s1 is the standard deviation of sample 1, s2 is the standard deviation of sample 2, n1 is the
size of sample 1, and n2 is the size of sample 2.

• Degrees of freedom. The degrees of freedom (DF) is:

DF = (s1²/n1 + s2²/n2)² / { [ (s1²/n1)² / (n1 - 1) ] + [ (s2²/n2)² / (n2 - 1) ] }
If DF does not compute to an integer, round it off to the nearest whole number. Some texts
suggest that the degrees of freedom can be approximated by the smaller of n1 - 1 and n2 - 1;
but the above formula gives better results.

• Test statistic. The test statistic is a t-score (t) defined by the following equation.
t = [ (x1 - x2) - d ] / SE
where x1 is the mean of sample 1, x2 is the mean of sample 2, d is the hypothesized difference
between population means, and SE is the standard error.
• P-value. The P-value is the probability of observing a sample statistic as extreme as the
test statistic. Since the test statistic is a t-score, use the t Distribution Calculator to
assess the probability associated with the t-score, having the degrees of freedom
computed above. (See sample problems at the end of this lesson for examples of how
this is done.)

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null
hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting
the null hypothesis when the P-value is less than the significance level.
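Here is a minimal Python sketch of this two-sample test with unequal variances, using the standard-error and degrees-of-freedom formulas above and comparing against SciPy's Welch t-test; group1 and group2 are made-up samples and the hypothesized difference d is 0.

import math
import statistics
from scipy import stats

group1 = [23.1, 25.4, 24.8, 26.0, 22.9, 25.1, 24.3]
group2 = [21.0, 22.5, 20.8, 23.1, 21.9, 22.2, 20.5]

n1, n2 = len(group1), len(group2)
s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
SE = math.sqrt(s1**2 / n1 + s2**2 / n2)
DF = (s1**2 / n1 + s2**2 / n2) ** 2 / (
    (s1**2 / n1) ** 2 / (n1 - 1) + (s2**2 / n2) ** 2 / (n2 - 1)
)
t = (statistics.mean(group1) - statistics.mean(group2) - 0) / SE
p = 2 * stats.t.sf(abs(t), df=DF)                 # two-tailed, fractional DF

t_sp, p_sp = stats.ttest_ind(group1, group2, equal_var=False)  # Welch's t-test
print(round(t, 3), round(p, 3), round(t_sp, 3), round(p_sp, 3))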

Hypothesis Test for the Difference Between Paired Means

This lesson explains how to conduct a hypothesis test for the difference between paired means.
The test procedure, called the matched-pairs t-test, is appropriate when the following
conditions are met:
• The sampling method for each sample is simple random sampling.
• The test is conducted on paired data. (As a result, the data sets are not independent.)
• Each sample is drawn from a normal or near-normal population. Generally, the sampling
distribution will be approximately normal if any of the following conditions apply.
o The population distribution is normal.
o The sample data are symmetric, unimodal, without outliers, and the sample size
is 15 or less.
o The sample data are slightly skewed, unimodal, without outliers, and the sample
size is 16 to 40.
o The sample size is greater than 40, without outliers.
This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3)
analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative
hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if
one is true, the other must be false; and vice versa.
The hypotheses concern a new variable d, which is based on the difference between paired
values from two data sets.
d = x1 - x2
where x1 is the value of variable x in the first data set, and x2 is the value of the variable from
the second data set that is paired with x1.
The table below shows three sets of null and alternative hypotheses. Each makes a statement
about how the true difference in population values μd is related to some hypothesized value D.
(In the table, the symbol ≠ means "not equal to".)

Set   Null hypothesis   Alternative hypothesis   Number of tails
1     μd = D            μd ≠ D                   2
2     μd > D            μd < D                   1
3     μd < D            μd > D                   1

The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on
either side of the sampling distribution would cause a researcher to reject the null hypothesis.
The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on
only one side of the sampling distribution would cause a researcher to reject the null
hypothesis.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. It
should specify the following elements.
• Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or
0.10; but any value between 0 and 1 can be used.
• Test method. Use the matched-pairs t-test to determine whether the difference
between sample means for paired data is significantly different from the hypothesized
difference between population means.

Analyze Sample Data

Using sample data, find the standard deviation, standard error, degrees of freedom, test
statistic, and the P-value associated with the test statistic.
• Standard deviation. Compute the standard deviation (sd) of the differences computed
from n matched pairs.
sd = sqrt[ Σ(di - d̄)² / (n - 1) ]
where di is the difference for pair i, d̄ is the sample mean of the differences, and n is the
number of paired values.
• Standard error. Compute the standard error (SE) of the sampling distribution of d̄.
SE = sd * sqrt{ ( 1/n ) * ( 1 - n/N ) * [ N / ( N - 1 ) ] }
where sd is the standard deviation of the sample differences, N is the population size, and n is
the sample size. When the population size is much larger (at least 10 times larger) than the
sample size, the standard error can be approximated by:
SE = sd / sqrt( n )

• Degrees of freedom. The degrees of freedom (DF) is: DF = n - 1.
• Test statistic. The test statistic is a t-score (t) defined by the following equation.
t = [ (x1 - x2) - D ] / SE = (d̄ - D) / SE
where x1 is the mean of sample 1, x2 is the mean of sample 2, d̄ is the mean difference between
paired values in the sample, D is the hypothesized difference between population means, and
SE is the standard error.
• P-value. The P-value is the probability of observing a sample statistic as extreme as the
test statistic. Since the test statistic is a t-score, use the t Distribution Calculator to
assess the probability associated with the t-score, having the degrees of freedom
computed above. (See the sample problem at the end of this lesson for guidance on
how this is done.)

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null
hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting
the null hypothesis when the P-value is less than the significance level.
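The matched-pairs procedure can be sketched in a few lines of Python (hypothesized difference D = 0), alongside SciPy's paired test; the before/after values are made up for illustration.

import math
import statistics
from scipy import stats

before = [12.0, 14.5, 11.8, 13.2, 15.0, 12.7]
after  = [11.1, 13.9, 11.5, 12.0, 14.2, 12.1]

diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
d_bar = statistics.mean(diffs)
sd = statistics.stdev(diffs)                 # standard deviation of the differences
SE = sd / math.sqrt(n)
t = (d_bar - 0) / SE
p = 2 * stats.t.sf(abs(t), df=n - 1)

t_sp, p_sp = stats.ttest_rel(before, after)  # library equivalent
print(round(t, 3), round(p, 3), round(t_sp, 3), round(p_sp, 3))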

Hypothesis Test of a Proportion

This lesson explains how to conduct a hypothesis test of a proportion, when the following
conditions are met:
• The sampling method is simple random sampling.
• Each sample point can result in just two possible outcomes. We call one of these
outcomes a success and the other, a failure.
• The sample includes at least 10 successes and 10 failures. (Some texts say that 5
successes and 5 failures are enough.)
• The population size is at least 10 times as big as the sample size.
This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3)
analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative
hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if
one is true, the other must be false; and vice versa.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. It
should specify the following elements.
* Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or
0.10; but any value between 0 and 1 can be used.
* Test method. Use the one-sample z-test to determine whether the hypothesized
population proportion differs significantly from the observed sample proportion.

Analyze Sample Data

Using sample data, find the test statistic and its associated P-value.
* Standard deviation. Compute the standard deviation (σ) of the sampling distribution.
σ = sqrt[ P * ( 1 - P ) / n ]
where P is the hypothesized value of the population proportion in the null hypothesis, and n is
the sample size.
* Test statistic. The test statistic is a z-score (z) defined by the following equation.

z = (p - P) / σ
where P is the hypothesized value of the population proportion in the null hypothesis, p is the
sample proportion, and σ is the standard deviation of the sampling distribution.
* P-value. The P-value is the probability of observing a sample statistic as extreme as the
test statistic. Since the test statistic is a z-score, use the Normal Distribution Calculator
to assess the probability associated with the z-score. (See sample problems at the end of
this lesson for examples of how this is done.)
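As a minimal sketch of these two steps, the Python fragment below (with hypothetical numbers) computes the standard deviation of the sampling distribution, the z-score, and a two-tailed P-value; for a one-tailed test you would use a single tail instead of doubling.

import math
from scipy.stats import norm

P = 0.80          # hypothesized population proportion (from the null hypothesis)
n = 100           # sample size
p = 0.73          # observed sample proportion

sigma = math.sqrt(P * (1 - P) / n)     # standard deviation of the sampling distribution
z = (p - P) / sigma                    # test statistic

p_two_tailed = 2 * norm.sf(abs(z))     # two-tailed P-value from the standard normal distribution
print(f"z = {z:.3f}, P-value = {p_two_tailed:.4f}")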

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null
hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting
the null hypothesis when the P-value is less than the significance level.

Hypothesis Test of a Proportion (Small Sample)

In the previous lesson, we showed how to conduct a hypothesis test for a proportion when the
sample included at least 10 successes and 10 failures. This requirement serves two purposes:
* It guarantees that the sample size will be at least 20 when the proportion is 0.5.
* It ensures that the minimum acceptable sample size increases as the proportion
becomes more extreme.
When the sample does not include at least 10 successes and 10 failures, the sample size will be
too small to justify the hypothesis testing approach presented in the previous lesson. This
lesson describes how to test a hypothesis about a proportion when the sample size is small, as
long as the sample includes at least one success and one failure. The key steps are:
* Formulate the hypotheses to be tested. This means stating the null hypothesis and the
alternative hypothesis.
* Determine the sampling distribution of the proportion. If the sample proportion is the
outcome of a binomial experiment, the sampling distribution will be binomial. If it is the
outcome of a hypergeometric experiment, the sampling distribution will be
hypergeometric.
* Specify the significance level. (Researchers often set the significance level equal to 0.05
or 0.01, although other values may be used.)
* Based on the hypotheses, the sampling distribution, and the significance level, define
the region of acceptance.
* Test the null hypothesis. If the sample proportion falls within the region of acceptance,
accept the null hypothesis; otherwise, reject the null hypothesis.
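When the sampling distribution is binomial, the region of acceptance can be found directly from binomial probabilities. The sketch below (hypothetical numbers, two-tailed test) uses SciPy's binomial distribution to bound the number of successes that would be consistent with the null hypothesis at significance level α.

from scipy.stats import binom

P0 = 0.50        # proportion claimed by the null hypothesis
n = 15           # small sample size
alpha = 0.05     # significance level
observed = 11    # observed number of successes

# Region of acceptance for a two-tailed test: roughly the central 1 - alpha of the
# binomial(n, P0) distribution. Because the distribution is discrete, the achieved
# significance level will generally be somewhat below alpha.
lower = binom.ppf(alpha / 2, n, P0)
upper = binom.ppf(1 - alpha / 2, n, P0)

if lower <= observed <= upper:
    print(f"{observed} successes is inside [{lower:.0f}, {upper:.0f}]: do not reject H0")
else:
    print(f"{observed} successes is outside [{lower:.0f}, {upper:.0f}]: reject H0")

SciPy also offers scipy.stats.binomtest, which reports an exact binomial P-value instead of a region of acceptance.
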
Hypothesis Test for Difference Between Proportions

This lesson explains how to conduct a hypothesis test to determine whether the difference
between two proportions is significant. The test procedure, called the two-proportion z-test, is
appropriate when the following conditions are met:

* The sampling method for each population is simple random sampling.


* The samples are independent.
* Each sample includes at least 10 successes and 10 failures. (Some texts say that 5 successes
and 5 failures are enough.)
* Each population is at least 10 times as big as its sample.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3)
analyze sample data, and (4) interpret results.
State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative
hypothesis. The table below shows three sets of hypotheses. Each makes a statement about the
difference d between two population proportions, P1 and P2. (In the table, the symbol ≠ means
"not equal to".)

Set Null hypothesis Alternative hypothesis Number of tails


1 P1 - P2 = 0 P1 - P2 ≠ 0 2
2 P1 - P2 > 0 P1 - P2 < 0 1
3 P1 - P2 < 0 P1 - P2 > 0 1

The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on
either side of the sampling distribution would cause a researcher to reject the null hypothesis.
The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on
only one side of the sampling distribution would cause a researcher to reject the null
hypothesis.

When the null hypothesis states that there is no difference between the two population
proportions (i.e., d = 0), the null and alternative hypothesis for a two-tailed test are often stated
in the following form.

H0: P1 = P2
Ha: P1 ≠ P2

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. It
should specify the following elements.

* Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10;
but any value between 0 and 1 can be used.

* Test method. Use the two-proportion z-test (described in the next section) to determine
whether the hypothesized difference between population proportions differs significantly from
the observed sample difference.

Analyze Sample Data

Using sample data, complete the following computations to find the test statistic and its
associated P-Value.

* Pooled sample proportion. Since the null hypothesis states that P1=P2, we use a pooled
sample proportion (p) to compute the standard error of the sampling distribution.

p = (p1 * n1 + p2 * n2) / (n1 + n2)


where p1 is the sample proportion from population 1, p2 is the sample proportion from
population 2, n1 is the size of sample 1, and n2 is the size of sample 2.

* Standard error. Compute the standard error (SE) of the sampling distribution of the difference
between the two proportions.

SE = sqrt{ p * ( 1 - p ) * [ (1/n1) + (1/n2) ] }


where p is the pooled sample proportion, n1 is the size of sample 1, and n2 is the size of
sample 2.

* Test statistic. The test statistic is a z-score (z) defined by the following equation.

z = (p1 - p2) / SE
where p1 is the proportion from sample 1, p2 is the proportion from sample 2, and SE is the
standard error of the sampling distribution.

* P-value. The P-value is the probability of observing a sample statistic as extreme as the test
statistic. Since the test statistic is a z-score, use the Normal Distribution Calculator to assess the
probability associated with the z-score.

The analysis described above is a two-proportion z-test.
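The following Python sketch (with made-up sample counts) strings these computations together: pooled proportion, standard error, z-score, and two-tailed P-value.

import math
from scipy.stats import norm

# Hypothetical sample results
x1, n1 = 38, 100      # successes and sample size for sample 1
x2, n2 = 30, 120      # successes and sample size for sample 2

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                            # pooled sample proportion
se = math.sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))     # standard error under H0
z = (p1 - p2) / se                                        # test statistic

p_two_tailed = 2 * norm.sf(abs(z))                        # two-tailed P-value
print(f"z = {z:.3f}, P-value = {p_two_tailed:.4f}")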

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null
hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting
the null hypothesis when the P-value is less than the significance level.

Region of Acceptance

In this lesson, we describe how to find the region of acceptance for a hypothesis test.

One-Tailed and Two-Tailed Hypothesis Tests

The steps taken to define the region of acceptance will vary, depending on whether the null
hypothesis and the alternative hypothesis call for one- or two-tailed hypothesis tests. So we
begin with a brief review.
The table below shows three sets of hypotheses. Each makes a statement about how the
population mean μ is related to a specified value M. (In the table, the symbol ≠ means "not
equal to".)

Set Null hypothesis Alternative hypothesis Number of tails


1 μ = M μ ≠ M 2
2 μ > M μ < M 1
3 μ < M μ > M 1

The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on
either side of the sampling distribution would cause a researcher to reject the null hypothesis.
The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on
only one side of the sampling distribution would cause a researcher to reject the null
hypothesis.

How to Find the Region of Acceptance

We define the region of acceptance in such a way that the chance of making a Type I error is
equal to the significance level. Here is how that is done.
* Define a test statistic. Here, the test statistic is the sample measure used to estimate the
population parameter that appears in the null hypothesis. For example, suppose the null
hypothesis is
H0: μ = M
The test statistic, used to estimate the population parameter, would be the corresponding
sample statistic m. If the parameter were a population mean, m would be the sample mean; if it
were a population proportion, m would be the sample proportion; if it were a difference
between population means, m would be the difference between sample means; and so on.

* Given the significance level α, find the upper limit (UL) of the region of acceptance.
There are three possibilities, depending on the form of the null hypothesis.
- If the null hypothesis is μ < M: The upper limit of the region of acceptance will be
equal to the value for which the cumulative probability of the sampling
distribution is equal to one minus the significance level. That is, P( m < UL ) = 1 -
α.
- If the null hypothesis is μ = M: The upper limit of the region of acceptance will be
equal to the value for which the cumulative probability of the sampling
distribution is equal to one minus the significance level divided by 2. That is, P( m
< UL ) = 1 - α/2 .
- If the null hypothesis is μ > M: The upper limit of the region of acceptance is
equal to plus infinity, unless the test statistic is a proportion or a percentage.
The upper limit is 1 for a proportion, and 100 for a percentage.

* In a similar way, we find the lower limit (LL) of the region of acceptance. Again, there are
three possibilities, depending on the form of the null hypothesis.
- If the null hypothesis is μ < M: The lower limit of the region of acceptance is
equal to minus infinity, unless the test statistic is a proportion or a percentage.
The lower limit for a proportion or a percentage is zero.
- If the null hypothesis is μ = M: The lower limit of the region of acceptance will be
equal to the value for which the cumulative probability of the sampling
distribution is equal to the significance level divided by 2. That is, P( m < LL ) =
α/2 .
- If the null hypothesis is μ > M: The lower limit of the region of acceptance will be
equal to the value for which the cumulative probability of the sampling
distribution is equal to the significance level. That is, P( m < LL ) = α .

The region of acceptance is defined by the range between LL and UL.
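For a test about a mean whose sampling distribution is approximately normal, these limits are simply quantiles of that distribution centered on the hypothesized value M. A sketch for the two-tailed case, using hypothetical numbers and SciPy:

import math
from scipy.stats import norm

M = 100          # value specified in the null hypothesis (H0: mu = M)
sigma = 15       # population standard deviation (assumed known for this sketch)
n = 36           # sample size
alpha = 0.05     # significance level

se = sigma / math.sqrt(n)                 # standard error of the sample mean

# Two-tailed test: P(m < LL) = alpha/2 and P(m < UL) = 1 - alpha/2
LL = norm.ppf(alpha / 2, loc=M, scale=se)
UL = norm.ppf(1 - alpha / 2, loc=M, scale=se)
print(f"Region of acceptance: [{LL:.2f}, {UL:.2f}]")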



Power of a Hypothesis Test

The probability of not committing a Type II error is called the power of a hypothesis test.

Effect Size

To compute the power of the test, one offers an alternative view about the "true" value of the
population parameter, assuming that the null hypothesis is false. The effect size is the
difference between the true value and the value specified in the null hypothesis.
Effect size = True value - Hypothesized value
For example, suppose the null hypothesis states that a population mean is equal to 100. A
researcher might ask: What is the probability of rejecting the null hypothesis if the true
population mean is equal to 90? In this example, the effect size would be 90 - 100, which equals
-10.

Factors That Affect Power

The power of a hypothesis test is affected by three factors.


* Sample size (n). Other things being equal, the greater the sample size, the greater the
power of the test.
* Significance level (α). The higher the significance level, the higher the power of the test.
If you increase the significance level, you reduce the region of acceptance. As a result,
you are more likely to reject the null hypothesis. This means you are less likely to accept
the null hypothesis when it is false; i.e., less likely to make a Type II error. Hence, the
power of the test is increased.
* The "true" value of the parameter being tested. The greater the difference between the
"true" value of a parameter and the value specified in the null hypothesis, the greater
the power of the test. That is, the greater the effect size, the greater the power of the
test.

How to Compute Power

When a researcher designs a study to test a hypothesis, he/she should compute the power of
the test (i.e., the likelihood of avoiding a Type II error).

How to Compute the Power of a Hypothesis Test

To compute the power of a hypothesis test, use the following three-step procedure.
* Define the region of acceptance. Previously, we showed how to compute the region of
acceptance for a hypothesis test.
* Specify the critical parameter value. The critical parameter value is an alternative to the
value specified in the null hypothesis. The difference between the critical parameter
value and the value from the null hypothesis is called the effect size. That is, the effect
size is equal to the critical parameter value minus the value from the null hypothesis.
* Compute power. Assume that the true population parameter is equal to the critical
parameter value, rather than the value specified in the null hypothesis. Based on that
assumption, compute the probability that the sample estimate of the population
parameter will fall outside the region of acceptance. That probability is the power of the
test.
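Continuing the normal-approximation sketch from the previous lesson, the fragment below (hypothetical numbers, echoing the earlier example in which the hypothesized mean is 100 and the assumed true mean is 90) follows the three steps: it builds the region of acceptance under the null hypothesis, then computes the probability that the sample mean falls outside that region when the true mean equals the critical parameter value.

import math
from scipy.stats import norm

M = 100            # value from the null hypothesis (H0: mu = M)
mu_crit = 90       # critical parameter value (assumed "true" mean under Ha)
sigma = 15         # population standard deviation (assumed known for this sketch)
n = 36             # sample size
alpha = 0.05       # significance level

se = sigma / math.sqrt(n)

# Step 1: region of acceptance under H0 (two-tailed test)
LL = norm.ppf(alpha / 2, loc=M, scale=se)
UL = norm.ppf(1 - alpha / 2, loc=M, scale=se)

# Steps 2-3: effect size and power, assuming the true mean is mu_crit
effect_size = mu_crit - M
power = norm.cdf(LL, loc=mu_crit, scale=se) + norm.sf(UL, loc=mu_crit, scale=se)
print(f"Effect size = {effect_size}, power = {power:.3f}")
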
Chi-Square Goodness of Fit Test

This lesson explains how to conduct a chi-square goodness of fit test. The test is applied when
you have one categorical variable from a single population. It is used to determine whether
sample data are consistent with a hypothesized distribution.
For example, suppose a company printed baseball cards. It claimed that 30% of its cards were
rookies; 60%, veterans; and 10%, All-Stars. We could gather a random sample of baseball cards
and use a chi-square goodness of fit test to see whether our sample distribution differed
significantly from the distribution claimed by the company. The sample problem at the end of
the lesson considers this example.
The test procedure described in this lesson is appropriate when the following conditions are
met:
* The sampling method is simple random sampling.
* The population is at least 10 times as large as the sample.
* The variable under study is categorical.
* The expected value for each level of the variable is at least 5.
This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3)
analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative
hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if
one is true, the other must be false; and vice versa.
For a chi-square goodness of fit test, the hypotheses take the following form.
H0: The data are consistent with a specified distribution.
Ha: The data are not consistent with a specified distribution.

Typically, the null hypothesis specifies the proportion of observations at each level of the
categorical variable. The alternative hypothesis is that at least one of the specified proportions
is not true.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. The
plan should specify the following elements.
* Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or
0.10; but any value between 0 and 1 can be used.
* Test method. Use the chi-square goodness of fit test to determine whether observed
sample frequencies differ significantly from expected frequencies specified in the null
hypothesis. The chi-square goodness of fit test is described in the next section, and
demonstrated in the sample problem at the end of this lesson.

Analyze Sample Data

Using sample data, find the degrees of freedom, expected frequency counts, test statistic, and
the P-value associated with the test statistic.
* Degrees of freedom. The degrees of freedom (DF) is equal to the number of levels (k) of
the categorical variable minus 1: DF = k - 1 .
* Expected frequency counts. The expected frequency counts at each level of the
categorical variable are equal to the sample size times the hypothesized proportion
from the null hypothesis:
Ei = n * pi
where Ei is the expected frequency count for the ith level of the categorical variable, n is the
total sample size, and pi is the hypothesized proportion of observations in level i.
* Test statistic. The test statistic is a chi-square random variable (χ²) defined by the
following equation.
χ² = Σ [ (Oi - Ei)² / Ei ]

where Oi is the observed frequency count for the ith level of the categorical variable, and Ei is
the expected frequency count for the ith level of the categorical variable.
* P-value. The P-value is the probability of observing a sample statistic as extreme as the
test statistic. Since the test statistic is a chi-square, use the Chi-Square Distribution
Calculator to assess the probability associated with the test statistic. Use the degrees of
freedom computed above.
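Using the baseball-card example from the start of this lesson (with a hypothetical sample of 100 cards), the sketch below computes the expected counts, the chi-square statistic, and the P-value; scipy.stats.chisquare performs the same calculation in a single call.

import numpy as np
from scipy.stats import chi2, chisquare

observed = np.array([50, 45, 5])            # hypothetical counts: rookies, veterans, All-Stars
proportions = np.array([0.30, 0.60, 0.10])  # proportions claimed by the null hypothesis
n = observed.sum()

expected = n * proportions                  # expected frequency counts, Ei = n * pi
chi_sq = ((observed - expected) ** 2 / expected).sum()
df = len(observed) - 1
p_value = chi2.sf(chi_sq, df)
print(f"chi-square = {chi_sq:.2f}, df = {df}, P-value = {p_value:.4f}")

# Equivalent one-liner:
stat, p = chisquare(observed, f_exp=expected)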

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null
hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting
the null hypothesis when the P-value is less than the significance level.

Chi-Square Test for Homogeneity

This lesson explains how to conduct a chi-square test for homogeneity. The test is applied to a
single categorical variable from two different populations. It is used to determine whether
frequency counts are distributed identically across different populations.
For example, in a survey of TV viewing preferences, we might ask respondents to identify their
favorite program. We might ask the same question of two different populations, such as males
and females. We could use a chi-square test for homogeneity to determine whether male
viewing preferences differed significantly from female viewing preferences. The sample problem
at the end of the lesson considers this example.

The test procedure described in this lesson is appropriate when the following conditions are
met:
* For each population, the sampling method is simple random sampling.
* Each population is at least 10 times as large as its respective sample.
* The variable under study is categorical.
* If sample data are displayed in a contingency table (Populations x Category levels), the
expected frequency count for each cell of the table is at least 5.
This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3)
analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative
hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if
one is true, the other must be false; and vice versa.
Suppose that data were sampled from r populations, and assume that the categorical variable
has c levels. At any specified level of the categorical variable, the null hypothesis states that
each population has the same proportion of observations. Thus,
H0: Plevel 1 of population 1 = Plevel 1 of population 2 = . . . = Plevel 1 of population r
H0: Plevel 2 of population 1 = Plevel 2 of population 2 = . . . = Plevel 2 of population r
...
H0: Plevel c of population 1 = Plevel c of population 2 = . . . = Plevel c of population r
The alternative hypothesis (Ha) is that at least one of the null hypothesis statements is false.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. The
plan should specify the following elements.
* Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or
0.10; but any value between 0 and 1 can be used.
* Test method. Use the chi-square test for homogeneity to determine whether observed
sample frequencies differ significantly from expected frequencies specified in the null
hypothesis. The chi-square test for homogeneity is described in the next section.

Analyze Sample Data

Using sample data from the contingency tables, find the degrees of freedom, expected
frequency counts, test statistic, and the P-value associated with the test statistic. The analysis
described in this section is illustrated in the sample problem at the end of this lesson.
* Degrees of freedom. The degrees of freedom (DF) is equal to:
DF = (r - 1) * (c - 1)
where r is the number of populations, and c is the number of levels for the categorical variable.
* Expected frequency counts. The expected frequency counts are computed separately
for each population at each level of the categorical variable, according to the following
formula.
Er,c = (nr * nc) / n
where Er,c is the expected frequency count for population r at level c of the categorical variable,
nr is the total number of observations from population r, nc is the total number of observations
at level c, and n is the total sample size.
* Test statistic. The test statistic is a chi-square random variable (χ²) defined by the
following equation.
χ² = Σ [ (Or,c - Er,c)² / Er,c ]

where Or,c is the observed frequency count in population r for level c of the categorical variable,
and Er,c is the expected frequency count in population r for level c of the categorical variable.
* P-value. The P-value is the probability of observing a sample statistic as extreme as the
test statistic. Since the test statistic is a chi-square, use the Chi-Square Distribution
Calculator to assess the probability associated with the test statistic. Use the degrees of
freedom computed above.
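In code, the contingency-table bookkeeping is usually delegated to a library routine. The sketch below builds a hypothetical populations-by-categories table for the TV-preference example and lets scipy.stats.chi2_contingency return the chi-square statistic, the P-value, the degrees of freedom, and the expected counts (the same routine applies here because homogeneity and independence share the same test statistic).

import numpy as np
from scipy.stats import chi2_contingency

# Rows = populations (males, females); columns = favorite-program categories (hypothetical counts)
table = np.array([
    [50, 30, 20],   # males
    [40, 45, 15],   # females
])

chi_sq, p_value, df, expected = chi2_contingency(table)
print(f"chi-square = {chi_sq:.2f}, df = {df}, P-value = {p_value:.4f}")
print("Expected counts:\n", expected.round(1))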

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null
hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting
the null hypothesis when the P-value is less than the significance level.

Chi-Square Test for Independence

This lesson explains how to conduct a chi-square test for independence. The test is applied
when you have two categorical variables from a single population. It is used to determine
whether there is a significant association between the two variables.
For example, in an election survey, voters might be classified by gender (male or female) and
voting preference (Democrat, Republican, or Independent). We could use a chi-square test for
independence to determine whether gender is related to voting preference. The sample
problem at the end of the lesson considers this example.
The test procedure described in this lesson is appropriate when the following conditions are
met:
* The sampling method is simple random sampling.
* Each population is at least 10 times as large as its respective sample.
* The variables under study are each categorical.
* If sample data are displayed in a contingency table, the expected frequency count for
each cell of the table is at least 5.
This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3)
analyze sample data, and (4) interpret results.

State the Hypotheses

Suppose that Variable A has r levels, and Variable B has c levels. The null hypothesis states that
knowing the level of Variable A does not help you predict the level of Variable B. That is, the
variables are independent.
H0: Variable A and Variable B are independent.
Ha: Variable A and Variable B are not independent.
The alternative hypothesis is that knowing the level of Variable A does help you predict the level
of Variable B.
Note: Support for the alternative hypothesis suggests that the variables are related; but the
relationship is not necessarily causal, in the sense that one variable "causes" the other.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. The
plan should specify the following elements.
* Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or
0.10; but any value between 0 and 1 can be used.
* Test method. Use the chi-square test for independence to determine whether there is a
significant relationship between two categorical variables.


Analyze Sample Data

Using sample data, find the degrees of freedom, expected frequencies, test statistic, and the P-
value associated with the test statistic. The approach described in this section is illustrated in
the sample problem at the end of this lesson.
* Degrees of freedom. The degrees of freedom (DF) is equal to:
DF = (r - 1) * (c - 1)
where r is the number of levels for one categorical variable, and c is the number of levels for
the other categorical variable.
* Expected frequencies. The expected frequency counts are computed separately for each
level of one categorical variable at each level of the other categorical variable. Compute
r * c expected frequencies, according to the following formula.
Er,c = (nr * nc) / n
where Er,c is the expected frequency count for level r of Variable A and level c of Variable B, nr is
the total number of sample observations at level r of Variable A, nc is the total number of
sample observations at level c of Variable B, and n is the total sample size.
* Test statistic. The test statistic is a chi-square random variable (χ²) defined by the
following equation.
χ² = Σ [ (Or,c - Er,c)² / Er,c ]
where Or,c is the observed frequency count at level r of Variable A and level c of Variable B, and
Er,c is the expected frequency count at level r of Variable A and level c of Variable B.
* P-value. The P-value is the probability of observing a sample statistic as extreme as the
test statistic. Since the test statistic is a chi-square, use the Chi-Square Distribution
Calculator to assess the probability associated with the test statistic. Use the degrees of
freedom computed above.
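For the voting-preference example, the same computation can be done by hand with NumPy. The sketch below (hypothetical counts) forms the expected counts from the row and column totals, then sums (O - E)² / E and looks up the P-value in the chi-square distribution.

import numpy as np
from scipy.stats import chi2

# Rows = gender (male, female); columns = voting preference (Dem, Rep, Ind) - hypothetical counts
observed = np.array([
    [120, 90, 40],
    [110, 95, 45],
])

n = observed.sum()
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)

expected = np.outer(row_totals, col_totals) / n           # E[r,c] = (nr * nc) / n
chi_sq = ((observed - expected) ** 2 / expected).sum()    # test statistic
df = (observed.shape[0] - 1) * (observed.shape[1] - 1)    # (r - 1) * (c - 1)
p_value = chi2.sf(chi_sq, df)
print(f"chi-square = {chi_sq:.2f}, df = {df}, P-value = {p_value:.4f}")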

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null
hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting
the null hypothesis when the P-value is less than the significance level.

