STATISTICS
Definition:
According to Moore and McCabe (1999), statistics is the science of collecting, organising and
Class boundary refers to the actual (true) limit of a class interval.
Consider the class interval 60kg – 62kg. Theoretically, the interval
includes all measurements from 59.5kg to 62.5kg. The smaller number, 59.5, is the
lower class boundary while the larger number, 62.5, is the upper class boundary.
GRAPHICAL REPRESENTATION OF DATA
Pictorial representation
Line Graphs
Bar Graphs
Histograms
Frequency polygon
Pie charts
XY graphs, which show relationships between two sets of data, e.g. scattergrams
MEASURES OF CENTRAL TENDENCY
mode
median
mean
MODE
It is the piece of data with the highest frequency, i.e. it appears more often than any other
value.
b) No mode
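The definition of the mode above can be sketched in Python as follows. This is only an illustration: the function name `modes`, the sample lists, and the convention that a distribution has no mode when every value occurs equally often are assumptions, not taken from the notes.

```python
from collections import Counter

def modes(values):
    """Return the value(s) with the highest frequency, or [] when no value stands out."""
    counts = Counter(values)
    highest = max(counts.values())
    winners = [value for value, count in counts.items() if count == highest]
    # Assumed convention: if all distinct values are equally frequent, there is no mode.
    if len(counts) > 1 and len(winners) == len(counts):
        return []
    return sorted(winners)

# Hypothetical data, chosen only to show the three cases.
print(modes([2, 3, 3, 5, 7]))     # [3]     one mode
print(modes([2, 2, 3, 5, 5]))     # [2, 5]  two modes
print(modes([1, 2, 3, 4]))        # []      no mode
```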
MEDIAN
If the number of terms in the distribution is even, the median is half (1/2) of the sum of the two middle terms, e.g. if n = 20
(a) 7, 1, 4, 9, 7, 8, 6, 5, 3
(b) Ranking the numbers: 1 3 4 5 6 7 7 8 9 (n = 9)
Position of the median = (n + 1)/2 = (9 + 1)/2 = 5th term
Therefore Median = 6
MEDIAN continues
(b) Ranking the numbers
19 24 29 36 50 60 77 82 100 105
Median = (50 + 60)/2
= 110/2
= 55
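The two median cases worked above (an odd and an even number of terms) can be checked with a short Python sketch. The function name `median` is illustrative; the two lists are the ranked numbers from the examples in these notes.

```python
def median(values):
    """Middle term of the ranked data; mean of the two middle terms when n is even."""
    ranked = sorted(values)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the (n + 1)/2-th term, which is index n // 2 when counting from 0.
        return ranked[middle]
    # Even n: half the sum of the two middle terms.
    return (ranked[middle - 1] + ranked[middle]) / 2

odd_example = [1, 3, 4, 5, 6, 7, 7, 8, 9]
even_example = [19, 24, 29, 36, 50, 60, 77, 82, 100, 105]

print(median(odd_example))   # 6
print(median(even_example))  # 55.0
```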
MEAN
It is obtained by adding all the terms in the distribution and dividing the sum by the number of terms in the
distribution.
It is represented by x̅ (x bar)
∑x = x₁ + x₂ + x₃ + … + xₙ (the sum of all the terms)
n = Number of terms in the distribution
x̅ = ∑x / n
ADVANTAGES OF MEAN
Quartiles
Quartiles are the three values (Q1, Q2, Q3) which divide an array of data
into four equal parts.
Q1 is the first (lower) quartile of the distribution. It is the point that divides the
distribution in the ratio 1:3 (25% = ¼).
Position of the lower quartile = ¼(N + 1)
Q2 is the median of the distribution. It is the point that divides the
distribution into two equal parts (50%).
Position of the median = ½(N + 1)
Q3 is the upper quartile of the distribution. It is the point that divides the
distribution in the ratio 3:1 (75% = ¾).
Position of the upper quartile = ¾(N + 1)
N is the number of values or the number of times a raw score occurs.
Method 1
How to compute quartiles using an odd number of scores
Compute Q1, Q2, Q3 and the Interquartile Range of the following scores: 3, 1, 5,
9, 8, 6, 7
The number of scores is 7 (odd)
Arrange the marks in ascending order (from the lowest to the highest score)
Array   1     3     5     6     7     8     9
Rank    1st   2nd   3rd   4th   5th   6th   7th
              Q1          Q2          Q3
Method 2
Computation of Q1, Q2 and Q3 using the position formula to locate the scores
Using the example from the first method, locate the positions of Q1, Q2 and
Q3
N is the number of scores = 7 (odd)
Position of the lower quartile (Q1) = ¼(N + 1) = ¼(7 + 1) = ¼(8) = 2nd
Position of Q2 (median) = ½(N + 1) = ½(7 + 1) = ½(8) = 4th
Position of the upper quartile (Q3) = ¾(N + 1) = ¾(7 + 1) = ¾(8) = 6th
Q1 = 2nd position = 3
Q2 = 4th position(Median)= 6
Q3 = 6th position = 8
Interquartile Range= Q3 – Q1 = 8 – 3 =5
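The position method above can be sketched in Python, using the seven scores from the worked example. The helper name `quartile_positions` is illustrative, and the sketch assumes the positions come out as whole numbers, as they do when N = 7.

```python
def quartile_positions(n):
    """1-based positions of Q1, Q2 and Q3, i.e. 1/4, 2/4 and 3/4 of (n + 1)."""
    return [k * (n + 1) / 4 for k in (1, 2, 3)]

scores = [3, 1, 5, 9, 8, 6, 7]
array = sorted(scores)                      # 1 3 5 6 7 8 9
positions = quartile_positions(len(array))  # 2nd, 4th and 6th

# Whole-number positions only: read the score straight off the ranked array.
q1, q2, q3 = (array[int(position) - 1] for position in positions)
interquartile_range = q3 - q1

print(positions)             # [2.0, 4.0, 6.0]
print(q1, q2, q3)            # 3 6 8
print(interquartile_range)   # 5
```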
Method 2
Q1 lies between the 2nd and 3rd positions, at position 2.25
Q1 = 2nd + 0.25 (3rd – 2nd)
= 2 + 0.25 (4 – 2)
= 2 + 0.25 (2)
= 2 + 0.5
= 2.5
Q2 lies between the 4th and 5th positions, at position 4.5
Q2 = 4th + 0.5 (5th – 4th)
= 5 + 0.5 (6 – 5)
= 5 + 0.5 (1)
= 5 + 0.5
= 5.5
Q3 lies between the 6th and 7th positions, at position 6.75
Q3 = 6th + 0.75 (7th – 6th)
= 7 + 0.75 (8 – 7)
= 7 + 0.75 (1)
= 7 + 0.75
= 7.75
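When a quartile position is not a whole number, the working above interpolates between the two neighbouring scores. A minimal Python sketch of that step follows. The data set for this example is not shown in the notes, so the array below is partly hypothetical: the scores in positions 2 to 7 (2, 4, 5, 6, 7, 8) are read off the working, while the first and last scores (1 and 9) are invented to complete a set of eight. The function name `value_at_position` is also illustrative.

```python
def value_at_position(array, position):
    """Score at a 1-based, possibly fractional, position in a ranked array."""
    lower = int(position)            # whole part of the position
    fraction = position - lower      # fractional part, e.g. 0.25
    value = array[lower - 1]
    if fraction == 0:
        return value
    # Move the given fraction of the way towards the next score.
    return value + fraction * (array[lower] - value)

# Positions 2..7 come from the working; positions 1 and 8 are assumed.
array = [1, 2, 4, 5, 6, 7, 8, 9]
n = len(array)

q1 = value_at_position(array, 1 * (n + 1) / 4)   # position 2.25
q2 = value_at_position(array, 2 * (n + 1) / 4)   # position 4.5
q3 = value_at_position(array, 3 * (n + 1) / 4)   # position 6.75

print(q1, q2, q3)   # 2.5 5.5 7.75
```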
MEASURES OF DISPERSION / VARIABILITY / SCATTER / SPREAD
They indicate how much the terms in the distribution are spread or scattered from the
mean/average.
The distribution can be measured relative to the mean (starting from the mean)
RANGE
It focuses on the difference between the greatest and the smallest values in the distribution. There
are two types of range, namely the ordinary range and the inclusive range.
The ordinary range is the difference between the greatest and the smallest value in the distribution.
Disadvantages
It uses only the two extreme values and ignores all the other terms.
It is greatly affected by outliers.
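The two types of range can be sketched in Python as below. The notes do not define the inclusive range, so the definition used here (ordinary range plus one unit of measurement) is an assumption based on common textbook usage. The function names are illustrative, and the scores are borrowed from the variance example in these notes.

```python
def ordinary_range(values):
    """Greatest value minus smallest value."""
    return max(values) - min(values)

def inclusive_range(values, unit=1):
    """Assumed definition: ordinary range plus one unit of measurement."""
    return max(values) - min(values) + unit

scores = [60, 83, 71, 63, 89, 90, 40, 72]

print(ordinary_range(scores))    # 50
print(inclusive_range(scores))   # 51
```

Replacing the 40 with a 10 would change both results by 30 while leaving every other score untouched, which is the sensitivity to outliers listed among the disadvantages.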
VARIANCE
2. Subtract the mean from each term in the distribution to get the deviations, i.e.
(x₁ - x̅), (x₂ - x̅), …, (xₙ - x̅)
X     X̅     X - X̅    (X - X̅)²
60    71    -11      121
83    71    12       144
71    71    0        0
63    71    -8       64
89    71    18       324
90    71    19       361
40    71    -31      961
72    71    1        1
∑(X - X̅)² = 1976
Using the sample variance formula
Variance = ∑(X - X̅)² / (n - 1)
= 1976 / (8 - 1)
= 1976 / 7
= 282.2857143
SD = √Variance
= √282.2857143
= 16.80136049
= 16.80 (2 d.p.)
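The variance and standard deviation working above can be reproduced with a few lines of Python. The scores are the eight values from the table; the variable names are illustrative.

```python
import math

scores = [60, 83, 71, 63, 89, 90, 40, 72]
n = len(scores)

mean = sum(scores) / n                                   # 71.0
squared_deviations = [(x - mean) ** 2 for x in scores]   # (X - mean)^2 for each score
sum_of_squares = sum(squared_deviations)                 # 1976.0

# Sample formulas: divide by n - 1, not n.
variance = sum_of_squares / (n - 1)
standard_deviation = math.sqrt(variance)

print(mean, sum_of_squares)            # 71.0 1976.0
print(round(variance, 7))              # 282.2857143
print(round(standard_deviation, 2))    # 16.8
```

Python's standard `statistics.variance` and `statistics.stdev` functions use the same n - 1 divisor, so they should agree with these figures.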
INTERPRETATION OF VARIANCE AND STANDARD DEVIATION
A large value of the standard deviation/variance shows that the values are widely
scattered relative to the mean, which means the greater the variance / standard
deviation, the more widely spaced the terms are above and below the mean. The
smaller the variance, the more closely packed the values are around the mean.
MEASURES OF ASSOCIATION
It focuses on the relationship between variables.
Correlation is the degree of association between two or more variables or factors.
There are two types of variables: the independent variable (x) and the dependent variable (y).
The independent variable is the variable which is manipulated by the researcher during an
experiment.
The dependent variable is the factor which is influenced by the manipulation of the independent
variable
CORRELATION CO-EFFICIENT
It is the number which shows the size and direction of association between variables.
r normally represents a correlation co-efficient
The maximum value of r is + 1
The minimum value of r is – 1
This means r lies between -1 and + 1
When r is +1 there is a perfect positive correlation
When r is between 0.9 and 0.99 there is a very strong positive correlation
When r is between 0.7 and 0.89 there is a strong positive correlation
When r is between 0.4 and 0.69 there is a moderate positive correlation
When r is between 0.2 and 0.39 there is a weak positive correlation
When r is between 0.1 and 0.19 there is a very weak positive correlation
x     y     x²     y²     xy
x₁    y₁    x₁²    y₁²    x₁y₁
x₂    y₂    x₂²    y₂²    x₂y₂
x₃    y₃    x₃²    y₃²    x₃y₃
x₄    y₄    x₄²    y₄²    x₄y₄
∑x    ∑y    ∑x²    ∑y²    ∑xy
PEARSON’S PRODUCT WORKED EXAMPLE
Ten Form 4 pupils at a certain school wrote two tests, one in History and the other in
Mathematics, and the results are as follows:
Pupil     A    B    C    D    E    F    G    H    I    J
History   80   74   56   52   78   90   73   65   40   75
Maths     40   52   75   74   50   54   59   60   71   48
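Taking x as the History mark and y as the Maths mark:
x     y     x²      y²      xy
80    40    6400    1600    3200
74    52    5476    2704    3848
56    75    3136    5625    4200
52    74    2704    5476    3848
78    50    6084    2500    3900
90    54    8100    2916    4860
73    59    5329    3481    4307
65    60    4225    3600    3900
40    71    1600    5041    2840
75    48    5625    2304    3600
∑x = 683   ∑y = 583   ∑x² = 48679   ∑y² = 35247   ∑xy = 38503
n = 10
r = [n∑xy - ∑x∑y] / √{[n∑x² - (∑x)²] [n∑y² - (∑y)²]}
= [10(38503) - (683)(583)] / √{[10(48679) - 683²] [10(35247) - 583²]}
= -13159 / √(20301 × 12581)
= -13159 / 15981.45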
Therefore r = -0.823 (to 3 decimal places)
There is a strong negative correlation between History marks and Mathematics marks
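The value of r for the History and Mathematics marks can be checked with the Python sketch below. It assumes the raw-score form of Pearson's formula, which is what the x², y² and xy columns of the working table point to. The function name `pearson_r` is illustrative.

```python
import math

history = [80, 74, 56, 52, 78, 90, 73, 65, 40, 75]   # x
maths = [40, 52, 75, 74, 50, 54, 59, 60, 71, 48]     # y

def pearson_r(xs, ys):
    """Pearson's r from the column totals: sums of x, y, x^2, y^2 and xy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(history, maths)
print(round(r, 3))   # -0.823
```

Swapping the two lists gives the same r, so the choice of which subject is called x does not affect the result.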
SPEARMAN’S RANK ORDER CORRELATION CO-EFFICIENT (rho)
This correlation co-efficient does not use the actual scores of the variables. It uses the rank
order of the scores (variables). The values of x and y are ranked separately, either in ascending
or descending order. The differences (d) between corresponding ranks are then squared and added
to give ∑d².
x     y     Rank x (rx)   Rank y (ry)   d = rx - ry   d²
50    52    2             2             0             0
60    58    3             3             0             0
75    80    5             5             0             0
42    47    1             1             0             0
92    95    6             6             0             0
61    60    4             4             0             0
∑x = 380   ∑y = 392   ∑d² = 0
rho = 1 - 6∑d² / [n(n² - 1)]
= 1 - 6(0) / [6(6² - 1)]
= 1 - 0
= 1
rho = 1. There is a perfect positive correlation between Maths and Physics marks.
SPEARMAN’S RANK ORDER EXAMPLE 2
AGE (X)    61   71   72   74   83   54   74   67   57   61
MASS (Y)   63   61   51   58   48   75   57   60   75   61
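x     y     Rx     Ry     d = Rx - Ry   d²
61    63    3.5    8      -4.5          20.25
71    61    6      6.5    -0.5          0.25
72    51    7      2      5             25
74    58    8.5    4      4.5           20.25
83    48    10     1      9             81
54    75    1      9.5    -8.5          72.25
74    57    8.5    3      5.5           30.25
67    60    5      5      0             0
57    75    2      9.5    -7.5          56.25
61    61    3.5    6.5    -3            9
∑d² = 314.5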
n = 10
When ranking, tied scores share the average of the positions they occupy. For
example, 75 in the above table under (y) falls in positions 9 and 10, so its rank becomes
(9 + 10)/2 = 19/2 = 9.5
rho = 1 - 6∑d² / [n(n² - 1)]
= 1 - 6(314.5) / [10(10² - 1)]
= 1 - 1887/990
= 1 - 1.906060
= -0.906060
= -0.91 (2 d.p.)
There is a very strong negative correlation between age and mass, that is, as someone gets older their
mass decreases
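The age and mass example can be reproduced with the Python sketch below. It assumes ascending ranks with tied scores sharing the average of their positions, and the ∑d² formula rho = 1 - 6∑d² / [n(n² - 1)], applied as in the working even though ties are present. The function names `average_ranks` and `spearman_rho` are illustrative.

```python
def average_ranks(values):
    """Ascending ranks (1 = smallest); tied values share the mean of their positions."""
    ranked = sorted(values)
    ranks = []
    for value in values:
        first_position = ranked.index(value) + 1   # 1-based position of the first tie
        tie_count = ranked.count(value)
        ranks.append(first_position + (tie_count - 1) / 2)
    return ranks

def spearman_rho(xs, ys):
    """Return (sum of d squared, rho) using the simple sum-of-d-squared formula."""
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    n = len(xs)
    rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
    return sum_d2, rho

age = [61, 71, 72, 74, 83, 54, 74, 67, 57, 61]
mass = [63, 61, 51, 58, 48, 75, 57, 60, 75, 61]

sum_d2, rho = spearman_rho(age, mass)
print(average_ranks(mass))   # the two 75s both receive rank 9.5
print(sum_d2)                # 314.5
print(round(rho, 2))         # -0.91
```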
NORMAL DISTRIBUTION CURVE