MATHEMATICS 536
STATISTICS
Independent Study Unit
Mathematics 536 Statistics
General Introduction
Statistics is a branch of mathematics which is full of subtleties and
which allows us to draw conclusions from a mass of data. In this unit, for
example, statistics will help you to better understand certain realities,
as in the life of Tommy and Clive, two cousins whom you will meet here.
They will even resort to statistical studies to clear up the “selection
mystery” for CEGEP candidates.
This will involve the study of the Z-score, which is used by educational
institutions to classify students at the time of their applications, and of
measures of dispersion.
The titles below sum up the important ideas which will be addressed in
each of these sections:
1. Measures of dispersion
2. Correlation
Unhappily, it hasn’t happened! Clive has been refused in the first round of
applications and must wait for the second round. The boys let out their
frustrations, then resolve to find out why this has happened by asking
some pertinent questions about the criteria colleges use to make their
admission decisions. How do colleges classify students in order to select
successful candidates? Could there be any element of favouritism?
Range
You should have found that the ? is 5, because the sum of the elements in
the first list is 45, and 45 divided by 9 elements gives a mean of 5 years.
For the mean of the second list of ages to equal 5 as well, since it also
contains 9 elements, the second list must also sum to 45.
How are the two lists similar? _____________________________
______________________________________________________
You may have found that, in the second list, the data are more closely
grouped around the mean and that, even if both lists have the same mean,
they are far from being composed of similar elements. Thus, it is not
enough to rely on a measure of central tendency (mean, median, mode,
studied in 436); we must also consider the measures of dispersion.
A measure of dispersion indicates how spread out the elements of a
sample are.
You should have found: 9 - 1 = 8 and 7 - 3 = 4. Thus, in the first case, the
difference between the age of the youngest and of the oldest is 8 years,
and, in the second case, the difference is 4 years. These are the two
numbers which represent the ranges of the distributions.
The range can be misleading when there are only a few extreme values, or
when the distribution is not uniform. For example, Jenny, a student in
Secondary V, wants to study people’s mass. She asks some friends from her
class and also her little sister. She collects the following data:
10, 42, 48, 49, 51, 55, 57, 57 and 58 kg.
Why?______________________________________________________
Certainly you found that the range is 58 - 10 = 48 kg, but we could obtain
a more representative range for these data, 58 - 42 = 16 kg, by removing
the value that does not fit.
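The two ranges quoted for Jenny's data can be checked with a short Python sketch. The function name `value_range` is introduced here only for illustration, and dropping the single 10 kg value is an assumption about which value is removed.

```python
# Jenny's data from the text (masses in kg); 10 kg is her little sister.
masses = [10, 42, 48, 49, 51, 55, 57, 57, 58]


def value_range(data):
    """Range = largest value minus smallest value."""
    return max(data) - min(data)


full_range = value_range(masses)

# Assumption for illustration: the 10 kg value is the one that is removed.
trimmed = [m for m in masses if m != 10]
trimmed_range = value_range(trimmed)

print(full_range)     # 48
print(trimmed_range)  # 16
```

A single extreme value is enough to triple the range here, which is why the range alone can give a poor picture of the spread.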
Is there a value in this distribution which does not seem to fit with
the others? ___________________________________________
_____________________________________________________
It would appear that the mass of the little sister, 10 kg, is not
representative of the set of masses.
One can easily draw out the pertinent information regarding the dispersion
of data using a box-and-whisker plot which you saw last year.
Box-and-whisker plot
In a box-and-whisker plot, the quartiles separate the distribution
into 4 parts, each containing 25% of the data. Q2 represents the
median, and Q1 and Q3 represent the medians of the lower and upper
halves.
[Figure: box-and-whisker plot with Q1, Q2 and Q3 marked on the box and the range spanning the whole plot]
Semi-interquartile Range
In the case where the range does not give a good indication of the
distribution, we use other measures. An easy measure to find is the semi-
interquartile range.
The semi-interquartile range is given by the expression
$\frac{Q_3 - Q_1}{2}$, where $Q_1$ and $Q_3$ are the first and third
quartiles.
The semi-interquartile range of the students in Tommy’s chemistry
class is:_____________________________________________
Certainly, it is $\frac{92\% - 52\%}{2} = 20\%$. This 20% is a measure of
dispersion and not of position like the mode, mean or median. It must be
understood that a measure of dispersion makes sense when it is used in
making a comparison. So, for example, if Tommy knows that the other
chemistry class has a semi-interquartile range of 30%, this indicates that
the results in his class are less spread out and are more closely grouped
around a “middle” mark (about 70%, the median, for example).
On the other hand, the semi-interquartile range is not a sufficiently
reliable measure in certain cases, since it only considers half of the
values in the distribution. However, we may encounter it being used in
advertising, for example, where there is the need to give a rapid
indication of the dispersion of the data in a majority (50%) of cases.
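As a rough illustration of why the semi-interquartile range ignores the extreme values, the Python sketch below computes it for Jenny's masses and for a modified list. The helper names are hypothetical, the modified list (10 kg replaced by 40 kg) is invented for the comparison, and the quartile convention (the median is left out of both halves when the number of values is odd) is an assumption; other conventions can give slightly different quartiles.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3), with Q1 and Q3 the medians of the two halves.

    Assumption: for an odd number of values, the median itself is excluded
    from both halves. Needs at least two values.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


def semi_interquartile_range(data):
    q1, _, q3 = quartiles(data)
    return (q3 - q1) / 2


masses = [10, 42, 48, 49, 51, 55, 57, 57, 58]   # Jenny's data from the text
adjusted = [40] + masses[1:]                    # hypothetical: 10 kg replaced by 40 kg

print(quartiles(masses))                   # (45.0, 51, 57.0)
print(semi_interquartile_range(masses))    # 6.0
print(semi_interquartile_range(adjusted))  # 6.0, although the range drops from 48 to 18

# Tommy's chemistry class, using the quartiles quoted in the text.
tommy_class = (92 - 52) / 2
print(tommy_class)                         # 20.0
```

Changing the smallest value alters the range but leaves the semi-interquartile range untouched, because only the middle half of the data enters the calculation.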
Mean Variation
A third measure of dispersion is the mean variation, also called the mean
deviation, which is defined below.
The mean deviation is the mean of the absolute deviations from the mean.
$$\text{Mean variation} = \frac{\sum |x_i - \bar{X}|}{n}$$
where $x_i$ represents the data points, $\bar{X}$ the mean, $n$ the number
of data points and $\sum$ is the symbol for the summation (total).
Help Clive to complete the following table in order to find the mean
deviation.
Hours of TV watched, $x_i$       18   14   2   0   0   5   10
Deviation, $|x_i - \bar{X}|$
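Clive's table can be checked with a few lines of Python. This is only a sketch of the definition above; the function name `mean_deviation` is introduced for illustration.

```python
# Hours of TV watched, from Clive's table.
hours = [18, 14, 2, 0, 0, 5, 10]


def mean_deviation(data):
    """Mean of the absolute deviations from the mean."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)


mean_hours = sum(hours) / len(hours)
deviations = [abs(x - mean_hours) for x in hours]

print(mean_hours)             # 7.0
print(deviations)             # [11.0, 7.0, 5.0, 7.0, 7.0, 2.0, 3.0]
print(mean_deviation(hours))  # 6.0
```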
Although the mean variation takes all values into account, the difficulties
in manipulating absolute values mean that statisticians use almost
exclusively the standard deviation, which uses squares.
Standard Deviation
The standard deviation σ (sigma) is the most widely used of the
measures of dispersion. The graphing calculator calculates the standard
deviation very rapidly, as well as the mean, median, etc. To calculate the
standard deviation, the calculator uses the following formula:
$$\sigma = \sqrt{\frac{\sum (x_i - \bar{X})^2}{n}}$$
where $x_i$ represents the data points, $\bar{X}$ the mean, $n$ the number
of data points and $\sum$ the summation (or the total).
N.B. In the case of a sample, or if n < 30:
- replace σ by s
- replace n by n - 1
In case your batteries die, it is worth knowing the standard deviation
formula.
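The difference between dividing by n and by n - 1 can be seen in a minimal Python sketch. It reuses the TV-hours data from Clive's table purely as an example; the function names are hypothetical, and the comparison with `statistics.pstdev` and `statistics.stdev` is only a cross-check against the standard library.

```python
from math import sqrt
from statistics import pstdev, stdev


def population_sd(data):
    """sigma: square root of the mean squared deviation (divide by n)."""
    mean = sum(data) / len(data)
    return sqrt(sum((x - mean) ** 2 for x in data) / len(data))


def sample_sd(data):
    """s: same sum of squares, but divided by n - 1."""
    mean = sum(data) / len(data)
    return sqrt(sum((x - mean) ** 2 for x in data) / (len(data) - 1))


hours = [18, 14, 2, 0, 0, 5, 10]  # TV hours from Clive's table

print(round(population_sd(hours), 2))  # 6.61
print(round(sample_sd(hours), 2))      # 7.14

# The standard library gives the same two values.
print(round(pstdev(hours), 2), round(stdev(hours), 2))  # 6.61 7.14
```

Dividing by n - 1 always gives the slightly larger value, and the gap shrinks as n grows.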
How would you find the value of the mean deviation for this
distribution? ___________________________________________
______________________________________________________
______________________________________________________
Enter the data representing the population of the ten towns into a graphing
calculator.
We must take SX because there are only 10 towns (<30). Thus, analyzing
the results, we obtain a mean population for the 10 towns of 78 787
inhabitants, and a standard deviation of 40 770 inhabitants. We cannot
interpret these results in themselves, since we cannot compare them with
results from another distribution of similar data on the same subject.
There is, however, a measure of position which can help us to analyze this
sort of data: the standard score or Z-score.
Z-Score
Do you remember Clive and Tommy? Of course! Some days after receiving
their respective CEGEP replies, Tommy called Clive and said, to give him
some comfort, “Don’t worry about it. Someone has explained to me that it
is perhaps your Z-score which caused you not to be accepted the first
time. You will certainly be accepted the second time.” Clive understood
nothing of what his cousin had explained to him about Z-scores, so the
encouragement did not have the anticipated effect. Here you will find out
what it is...
[Figure: distribution centred on the mean $\bar{X}$, with 95.5% of the data lying between $-2\sigma$ and $+2\sigma$]
The two graphs below show that the distributions are fundamentally
equivalent. They differ only in their means ($\bar{X}$) and in their
standard deviations ($\sigma$). To better compare the distributions, we use
a measure which eliminates the differences in the standard deviations and
in the means. This new measure is called the Z-score (standard score).
Figure 1.7 Variation in the mean without changing the standard deviation ($\sigma_1 = \sigma_2 = \sigma_3$)
Figure 1.8 Variation in the standard deviation without changing the mean ($\sigma_1 \neq \sigma_2 \neq \sigma_3$)
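To see how a standard score removes differences in mean and standard deviation, here is a small Python sketch. It assumes the usual definition Z = (x - mean) / standard deviation, which matches the worked calculations in this section; the two groups are invented values chosen only for illustration, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def z_scores(data):
    """Standard score of every value: (x - mean) / standard deviation."""
    m = mean(data)
    sd = pstdev(data)  # divide-by-n version; an assumption for this sketch
    return [(x - m) / sd for x in data]


# Hypothetical groups: same shape, different means and different spreads.
group_a = [60, 70, 80]
group_b = [10, 30, 50]

za = z_scores(group_a)
zb = z_scores(group_b)

print([round(z, 2) for z in za])  # [-1.22, 0.0, 1.22]
print([round(z, 2) for z in zb])  # [-1.22, 0.0, 1.22]
```

Once converted, both groups have a mean of 0 and a standard deviation of 1, so values from the two groups can be placed on the same scale.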
So, if you lay out the results of all the students in Tommy’s and Clive’s
classes, you will be able to calculate their respective Z-scores and
attempt to solve the mystery which is still hovering over us!
Here are the results (average marks) of the students when they sent their
applications for admission to the CEGEPs.
Figure 1.10 Average marks of the students in the two groups
Group Average mark
Tommy’s class 76 70 80 81 68 60 78 84 75 72 70 77 75 83 60
88 77 74 74 65 83 78 73 80 70 68 85 77 74
Clive’s class 85 70 53 66 77 85 81 70 73 95 58 68 70 73 91
85 56 67 68 69 80 88 56 95 77 77 72 80 90
Enter both sets of values in the graphing calculator, find the values
of the means and standard deviations of the two distributions and
calculate the Z-scores of the two cousins. The numbers in bold
characters represent their respective marks.
__________________________________________________________
__________________________________________________________
__________________________________________________________
You should have found that Tommy’s class had a mean of 75% and a
standard deviation of 6.9%. Clive’s class had a mean of 75% and a standard
deviation of 11.5% (figures 1.11 and 1.12). Since Tommy had 88%, his
Z-score is $\frac{88\% - 75\%}{6.9\%} = 1.88$. You should note that the
subtraction of two percentages gives a percentage and that the division of
the two percentages gives a ratio (the number of times the standard
deviation is contained in the deviation from the mean). In the case of
Clive, who had obtained a slightly lower mark of 85%, his Z-score is
$\frac{85\% - 75\%}{11.5\%} = 0.87$. Look at the results of the two cousins
placed on the same graph (figure 1.13).
This value removes the effect of the mean and the standard deviation of
the two groups. This permits a comparison between the different data
sets.
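The calculator work for the two classes can be reproduced with the Python sketch below, using the marks from Figure 1.10. It assumes the n - 1 (sample) standard deviation, since each list has 29 marks and that choice reproduces the rounded values quoted in the text; the variable names are introduced only for illustration.

```python
from statistics import mean, stdev

# Average marks from Figure 1.10 (29 marks per class).
tommy_class = [76, 70, 80, 81, 68, 60, 78, 84, 75, 72, 70, 77, 75, 83, 60,
               88, 77, 74, 74, 65, 83, 78, 73, 80, 70, 68, 85, 77, 74]
clive_class = [85, 70, 53, 66, 77, 85, 81, 70, 73, 95, 58, 68, 70, 73, 91,
               85, 56, 67, 68, 69, 80, 88, 56, 95, 77, 77, 72, 80, 90]


def z_score(x, data):
    """Standard score of x within data, using the sample (n - 1) deviation."""
    return (x - mean(data)) / stdev(data)


z_tommy = z_score(88, tommy_class)  # Tommy's mark is 88%
z_clive = z_score(85, clive_class)  # Clive's mark is 85%

print(mean(tommy_class), round(stdev(tommy_class), 1))  # 75 6.9
print(mean(clive_class), round(stdev(clive_class), 1))  # 75 11.5
print(round(z_tommy, 2))  # 1.88
print(round(z_clive, 2))  # 0.87
```

The marks differ by only 3 percentage points, but the Z-scores differ by about 1, because the marks in Clive's class are much more spread out.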
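[Figures 1.11 and 1.12: Tommy’s position at $+1.88\sigma_1$ in his class ($\bar{X}_1 = 75\%$); Clive’s position at $+0.87\sigma_2$ in his class ($\bar{X}_2 = 75\%$, $\sigma_2 = 11.5\%$)]
[Figure 1.13: both cousins on the same axis centred at Z = 0, with Clive’s position at +0.87 and Tommy’s position at +1.88]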
Explain the difference in their Z-scores knowing that the means of
their two respective groups are identical._____________________
______________________________________________________
______________________________________________________
1.2 Practise
1. A family doctor collected in a table the heights (in m) of some of his
patients in order to do some statistical analysis. The table is below:
1.85 0.95 1.04 1.15 0.80 1.18 1.32 1.45 1.24 1.03 1.28
1.75 1.42 1.53 1.22 1.24 1.27 1.18 1.53 1.29 1.41 0.99
1.33 1.21 1.28 1.52 1.65 0.42 1.80 1.10 1.25 1.35 1.42
1.26 1.32 1.18 1.32 1.22 1.05 1.23 1.42 0.75 1.15 1.32
a) the range__________________________________________
2. Statistics Canada collects national data in order to obtain statistics about a wide
range of subjects. This is very useful for determining social, economic,
environmental and other policies. Here are some data collected by certain
countries in 1993.
Figure 1.14
Population, density, birth and death rates, 1993
Range
Semi-interquartile range
Mean variation
Standard deviation
Canada’s Z-score
Formula 1: $s = \sqrt{\dfrac{\sum (x_i - \bar{X})^2}{n - 1}}$
Formula 2: $s = \sqrt{\dfrac{\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}}{n - 1}}$
Check whether they would give the same result with the following distribution: 7, 8, 3,
1, 1, 4.
Figure 1.17
$x_i$     $x_i - \bar{X}$     $(x_i - \bar{X})^2$     $x_i^2$
7
8
3
1
1
4
$\sum x_i =$     $\sum (x_i - \bar{X}) =$     $\sum (x_i - \bar{X})^2 =$     $\sum x_i^2 =$
Using formula 1: s=
Using formula 2: s=
For someone without a graphing calculator, why is it preferable to use formula 2?____
__________________________________________________________________
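The comparison of the two formulas can be checked numerically with a short Python sketch. The function names `formula_1` and `formula_2` are introduced here only to mirror the labels above.

```python
from math import sqrt

data = [7, 8, 3, 1, 1, 4]  # distribution given in the exercise


def formula_1(values):
    """s from the squared deviations about the mean."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def formula_2(values):
    """s from the sum of the values and the sum of their squares."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x ** 2 for x in values)
    return sqrt((total_sq - total ** 2 / n) / (n - 1))


print(sum(data), sum(x ** 2 for x in data))  # 24 140
print(round(formula_1(data), 3))             # 2.966
print(round(formula_2(data), 3))             # 2.966
```

Formula 2 only needs two running totals, the sum of the values and the sum of their squares, so the mean and the individual deviations never have to be worked out.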