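Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Add up the values $x_i$ and divide by the number of values, where $N$ is the
population size and $n$ is the sample size.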
Properties of the mean:
+ Each quantitative data set has one and only one mean;
+ It is the most comprehensive measure of central location
(i.e. it is computed from all available data values);
+ The (sample) mean is used extensively in inferential statistics;
– It can be distorted by outliers (or extreme values).
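The last property can be illustrated with a minimal Python sketch. The function name `mean` and the extra value 500 are chosen here purely for illustration; the ten data values are the ones used in the percentile example later in these notes.

```python
def mean(values):
    """Arithmetic mean: add up the values and divide by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


data = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]

print(mean(data))          # 11.0
# One extreme value (500, a made-up outlier) pulls the mean far upwards.
print(mean(data + [500]))  # roughly 55.45
```

A single extreme observation moves the mean from 11 to about 55, even though the other ten values are unchanged.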
Median: the middle value of the ordered data (smallest to largest); 50% of the
observations lie below it and 50% above it.
How to find the median ‘manually’?
i. Sort the data from smallest to largest.
ii. Choose the middle value if n (N) is odd,
or take the average of the two middle values if n (N) is even.
Properties of the median:
+ Each quantitative data set has one and only one median;
+ It is unaffected by outliers;
– It is computed from at most two data points;
– It has limited application and mathematical potential.
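The two 'manual' steps above (sort, then take the middle value or the average of the two middle values) can be sketched in Python as follows. The function name `median` and the extra value 500 are illustrative choices, not part of the original notes.

```python
def median(values):
    """Median found 'manually': sort, then pick or average the middle value(s)."""
    ordered = sorted(values)          # step i: sort from smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                    # step ii, odd n: the single middle value
        return ordered[mid]
    # step ii, even n: the average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]

print(median(data))          # 8.5
# Adding a made-up extreme value barely moves the median.
print(median(data + [500]))  # 9
```

Adding the extreme value changes the median only from 8.5 to 9, which is what "unaffected by outliers" means in practice.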
Properties of the range (largest value minus smallest value):
+ Each quantitative data set has one and only one range;
– It is computed from only two data points;
– It is affected by outliers;
– It has limited application and mathematical potential.
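Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
The numerator is the sum of squared deviations; it is divided by $N$ for a
population and by $n-1$ for a sample.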
Properties of the variance:
+ Each quantitative data set has one and only one variance;
+ It is a comprehensive measure of dispersion;
– It is affected by outliers;
– It is conceptually complicated;
– It is hard to interpret since it is given in ‘squared’ units of the
observations.
Standard deviation: $\sigma = \sqrt{\sigma^2}$ (population), $s = \sqrt{s^2}$ (sample)
The standard deviation has properties similar to those of the variance, but
+ It is easier to interpret since it is given in the original units;
+ s is used extensively in inferential statistics.
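As a rough sketch of the variance and standard deviation formulas, the following Python code computes both the population and the sample versions. The function names are illustrative, and the data set is the one used in the percentile example later in these notes.

```python
import math


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]

print(population_variance(data))                       # 92.2
print(round(sample_variance(data), 2))                 # 102.44
# The standard deviation is the square root of the variance,
# so it is back in the original units of the observations.
print(round(math.sqrt(population_variance(data)), 2))  # 9.6
print(round(math.sqrt(sample_variance(data)), 2))      # 10.12
```

The sample figures are slightly larger than the population ones because the same sum of squared deviations (922) is divided by 9 rather than 10.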
• The range, the variance and the standard deviation are all ‘useless’ for
comparing the dispersions of data sets that are measured in different units
(e.g. kg and cm), or have markedly different magnitudes.
• Percentile: the pth percentile separates the lower p% of the
observations from the upper (100-p)%.
(Figure: ordered data from smallest to largest, with the pth percentile
separating the lower p% from the upper (100-p)%.)
• Quartiles: the 25th (Q1), 50th (Q2 or median) and 75th (Q3)
percentiles.
• Locating percentiles: the following formula approximates the location of
any percentile, where $L_P$ is the location (position in the sorted data) of
the $P$th percentile:
$L_P = (n+1)\frac{P}{100}$
• Calculate the 75th percentile of the data 0 0 5 7 8 9 12 14 22 33 (n = 10):
$L_{75} = (10+1)\frac{75}{100} = 8.25$
The 75th percentile lies between the 8th and 9th observations,
i.e. between 14 and 22, one quarter of the way from the 8th to the 9th:
$0.25 \times (22-14) = 2$
Therefore the 75th percentile is 14 + 2 = 16.
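The location formula and the interpolation step can be sketched in Python as below. The function name `percentile` is illustrative, and the handling of locations that fall before the first or after the last observation (returning the smallest or largest value) is an assumption, since the notes do not cover that case.

```python
def percentile(values, p):
    """Approximate the p-th percentile using the location L_p = (n + 1) * p / 100."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("percentiles of an empty data set are undefined")
    location = (n + 1) * p / 100     # 1-based position in the sorted data
    lower = int(location)            # whole part: observation just below the location
    fraction = location - lower      # fractional part: how far towards the next one
    # Assumption: clamp locations that fall outside the data to the extremes.
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


data = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]

print(percentile(data, 75))  # 16.0
print(percentile(data, 50))  # 8.5
```

For P = 75 the location is 8.25, so the result is 14 + 0.25 × (22 - 14) = 16, matching the hand calculation. For P = 50 the same rule gives 8.5, the median of the data.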
• Inter-quartile range: IQR = Q3 – Q1
i.e. the range of the middle 50% of the data.
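(Figure: ordered data from smallest to largest; 25% of the observations lie
below Q1, the middle 50% (the IQR) lie between Q1 and Q3, with Q2 = median
in between, and 25% lie above Q3.)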
+ It is unaffected by outliers;
– It has limited application and mathematical potential,
but it is used to identify outliers.
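A short Python sketch of the inter-quartile range follows. It relies on `statistics.quantiles` from the standard library (Python 3.8 or later) with `method="exclusive"`, which for this data set gives the same quartiles as the $(n+1)P/100$ location rule. The replacement value 330 is a made-up outlier used only for illustration.

```python
import statistics


def quartiles(values):
    """Return (Q1, Q2, Q3) using the 'exclusive' method of statistics.quantiles."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q1, q2, q3


data = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 3.75 8.5 16.0 12.25

# Replace the largest value with a made-up extreme value: the range explodes,
# but the quartiles, and hence the IQR, do not change.
with_outlier = data[:-1] + [330]
o1, _, o3 = quartiles(with_outlier)
print(max(with_outlier) - min(with_outlier))  # 330
print(o3 - o1)                                # 12.25
```

The range jumps from 33 to 330, while the IQR stays at 12.25, because the quartiles depend only on the values near the 25% and 75% positions.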
3. DESCRIBING THE SHAPE OF A DATA SET
• (1) Plot the data using a histogram or polygon and observe its shape.
The distribution is said to be skewed, i.e. not symmetrical, if the
tails are not of (approximately) the same length.
The distribution is skewed to the left (negatively skewed) if the left tail is
longer than the right tail.
The distribution is skewed to the right (positively skewed) if the right tail is
longer than the left tail.
(Figure: example of a negatively (or left) skewed distribution.)
• (2) Compare the mean and the median.
Three possibilities:
i. mean = median: the distribution is symmetrical
ii. mean < median: the distribution is skewed to the left
iii. mean > median: the distribution is skewed to the right
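The three cases can be written as a small Python helper. The function name `describe_shape` and the `tolerance` parameter are assumptions added for illustration: with real data the mean and median are rarely exactly equal, so some tolerance is usually needed before calling a distribution symmetrical.

```python
import statistics


def describe_shape(mean, median, tolerance=0.0):
    """Classify a distribution's shape by comparing its mean with its median."""
    difference = mean - median
    if abs(difference) <= tolerance:
        return "symmetrical"
    if difference < 0:
        return "skewed to the left (negatively skewed)"
    return "skewed to the right (positively skewed)"


data = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]
mean = statistics.mean(data)      # 11
median = statistics.median(data)  # 8.5

print(describe_shape(mean, median))  # skewed to the right (positively skewed)
```

Here the mean (11) exceeds the median (8.5) because the few large values (22 and 33) pull the mean upwards, so the data set is classified as skewed to the right.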
Kurtosis
Kurtosis measures the peakedness of a distribution. It can be
computed using MS Excel.
Ex 4:
We consider the price to earnings ratio and the dividend yield for 20 listed
shares. The data was downloaded from Selvanathan Case 3.1 and summarised
using MS Excel. We get the following results:

                      P/E ratio   Div yield
Mean                  15.3        4.4
Standard Error        1.2         0.4
Median                13.9        4.4
Mode                  15.0        5.5
Standard Deviation    5.4         1.8
Sample Variance       29.0        3.2
Kurtosis              3.0         0.4
Skewness              1.8         -0.3
Range                 21.1        7.4
Minimum               8.8         0.3
Maximum               29.9        7.7
Sum                   306.1       88.8
Count                 20          20

Mean: For the 20 listed shares the average P/E ratio is 15.3, and the
average dividend yield is 4.4%.
Median: 50% of the shares have P/E ratios less than 13.9 and the other
50% have P/E ratios more than 13.9. 50% of the shares have dividend
yields less than 4.4 and the other 50% have dividend yields more than 4.4.
Skewness: The mean P/E ratio is larger than the median, and so its
distribution is positively skewed. Note, the skewness figure is positive
1.8. The mean dividend yield is the same as the median, and so its
distribution is symmetrical. Note, the skewness figure is very close to
zero.

Ex 4 Continued:
Range: The range of P/E ratios is 21.1 and the range of dividend yields is 7.4.
Standard deviation: The average deviation of P/E ratios from the mean is
measured as 5.4, and that of dividend yield as 1.8.
Identifying extreme values (outliers)
$Q_1 = 11.975$, $Q_3 = 15$
$IQR = Q_3 - Q_1 = 15 - 11.975 = 3.025$
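One common way to turn these quartiles into an outlier check is sketched below in Python. The rule itself is an assumption: this section does not state which rule is used, so the sketch applies the widely used convention of flagging values more than 1.5 × IQR below Q1 or above Q3. The function names and the multiplier `k` are illustrative. The test values 8.8 and 29.9 are the minimum and maximum P/E ratios from Ex 4, on the assumption that the quartiles above refer to the P/E data.

```python
def iqr_fences(q1, q3, k=1.5):
    """Lower and upper fences at k * IQR beyond the quartiles (k = 1.5 is assumed)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x, q1, q3, k=1.5):
    """True if x lies outside the fences."""
    lower, upper = iqr_fences(q1, q3, k)
    return x < lower or x > upper


q1, q3 = 11.975, 15.0             # quartiles quoted in the notes
lower, upper = iqr_fences(q1, q3)
print(round(lower, 4), round(upper, 4))  # 7.4375 19.5375

print(is_outlier(29.9, q1, q3))   # True:  the maximum P/E ratio is above the upper fence
print(is_outlier(8.8, q1, q3))    # False: the minimum P/E ratio is inside the fences
```

Under this assumed rule the largest P/E ratio (29.9) would be flagged as an extreme value, which is consistent with the positive skewness reported in Ex 4.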