
MEASURES OF LOCATION and

VARIATION for 1 variable


Lectures 3+4+5 Topics
•Measures of Central Tendency for
numerical and categorical data
Mean, Median, Mode + other means, Fractiles
•Measures of Variation for numerical and
binary data
The Range, Variance and
Standard Deviation
•Shape
Symmetric, Skewed, Skewness, Kurtosis
Summary Measures
• Central Tendency (part of Location): Mean, Median, Mode, Fractiles
• Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Measures of Central Tendency
• Central Tendency: Mean, Median, Mode
• Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
The Mean (Arithmetic Mean, Average)
•It is the Arithmetic Average of data values:
Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
•The Most Common Measure of Central Tendency
•Affected by Extreme Values (Outliers)

[Figure: two dot plots on number lines, one running 0 to 10 with Mean = 5, the other running 0 to 14 with Mean = 6]
THE ARITHMETIC MEAN
• This is the most popular and useful measure of central location
Mean = (Sum of the observations) / (Number of observations)
THE ARITHMETIC MEAN
• Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $n$ is the sample size
• Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where $N$ is the population size
THE ARITHMETIC MEAN
• Example 1
The reported times spent on the Internet by 10 adults are 0, 7, 12, 5, 33, 14, 8, 0, 9, 22 hours. Find the mean time spent on the Internet.
$\bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{0 + 7 + \cdots + 22}{10} = \frac{110}{10} = 11.0$ hours
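The calculation in Example 1 can be checked with a few lines of Python. This is only a minimal sketch; the variable names `hours` and `x_bar` are chosen here for illustration and do not come from the slides.

```python
# Internet usage times (hours) for the 10 adults in Example 1
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

# Arithmetic mean: sum of the observations / number of observations
x_bar = sum(hours) / len(hours)

print(sum(hours), len(hours), x_bar)  # 110 10 11.0
```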
• Example 2
Suppose the telephone bills represent the population of measurements ($N = 200$). The population mean is
$\mu = \frac{\sum_{i=1}^{200} x_i}{200} = \frac{42.19 + 38.45 + \cdots + 45.77}{200} = 43.59$
WEIGHTED MEAN FOR DATA
GROUPED BY CATEGORIES OR
VARIANTS

$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$
When many of the measurements have the same value, the measurements can be summarized in a frequency table. Suppose the number of children in a sample of 16 families was recorded as follows:

NUMBER OF CHILDREN   0   1   2   3
NUMBER OF FAMILIES   3   4   7   2   (16 families in total)

$\bar{x} = \frac{\sum_{i=1}^{4} x_i f_i}{16} = \frac{x_1 f_1 + x_2 f_2 + x_3 f_3 + x_4 f_4}{16} = \frac{3(0) + 4(1) + 7(2) + 2(3)}{16} = \frac{24}{16} = 1.5$
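The weighted mean for the families example can be sketched as follows. The dictionary `children_counts` and the helper `weighted_mean` are illustrative names, not part of the lecture material.

```python
# Frequency table from the example: number of children -> number of families
children_counts = {0: 3, 1: 4, 2: 7, 3: 2}

def weighted_mean(freq_table):
    """Return sum(x_i * f_i) / sum(f_i) for a {value: frequency} table."""
    total_weighted = sum(x * f for x, f in freq_table.items())
    total_freq = sum(freq_table.values())
    return total_weighted / total_freq

n_families = sum(children_counts.values())
x_bar = weighted_mean(children_counts)

print(n_families, x_bar)  # 16 1.5
```

Note that the sum runs over the 4 distinct values (0 to 3 children), while the denominator is the total frequency, 16.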
MEAN FOR TABULATED DATA BY CLASSES


APPROXIMATING DESCRIPTIVE
MEASURES FOR GROUPED DATA BY
CLASSES
• Approximating descriptive measures for grouped data may be needed in two cases:
  • when approximate values are sufficient,
  • when only secondary grouped data are available.

$\bar{x} \approx \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$, where $x_i$ is the midpoint and $f_i$ is the frequency of class $i$

• Example 3
Approximate the mean of the telephone call durations, as represented by the frequency distribution:

Class i   Class limits   Frequency f_i   Midpoint x_i   x_i f_i
1         2-5            3               3.5            10.5
2         5-8            6               6.5            39.0
3         8-11           8               9.5            76.0
…         …              …               …              …
6         17-20          2               18.5           37.0
Total                    30                             312.0

n = sample size = 30 = f_1 + … + f_6
Approximate mean: $\bar{x} \approx \frac{312.0}{30} = 10.4$
Real value: $\bar{x} = 10.26$
The Median
•Important Measure of Central Tendency
•In an ordered array, the median is the
“middle” number.
•If n is odd, the median is the middle number.
•If n is even, the median is the average of the 2
middle numbers.
•Not Affected by Extreme Values
[Figure: two dot plots on number lines, one running 0 to 10, the other running 0 to 14; Median = 5 in both]
THE MEDIAN
• The median of a set of observations is the value that falls in the middle when the observations are ranked in increasing order of magnitude.

Example (even number of observations)
Find the median of the time spent on the Internet for the adults of Example 1.
0, 0, 5, 7, 8, 9, 12, 14, 22, 33
Median = (8 + 9) / 2 = 8.5

Comment (odd number of observations)
Suppose only 9 adults were sampled (exclude, say, the longest time (33)).
0, 0, 5, 7, 8, 9, 12, 14, 22
Median = 8
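The odd/even rule for the median can be sketched in Python using the Internet-time data of Example 1. The function `median` below is written out by hand for illustration, and its name and the variable names are assumptions of this sketch.

```python
def median(values):
    """Middle value of the ranked data; average of the two middle values if n is even."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]          # 10 adults (even n)
without_longest = [h for h in hours if h != 33]     # 9 adults (odd n)

print(median(hours))            # 8.5
print(median(without_longest))  # 8
```

Dropping the extreme value 33 moves the median only from 8.5 to 8, whereas the mean of the same data would move from 11.0 to about 8.6.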
MEDIAN

• Data tabulated discretely: treated as ungrouped data
• Data tabulated by classes: the median is estimated
MEDIAN AND MODE

• Median

$Me = x_0 + K \cdot \frac{\frac{1}{2}\left(\sum n_i + 1\right) - \sum_{i=1}^{Me-1} n_i}{n_{Me}}$
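A rough sketch of the class-based median estimate follows. The slide does not define its symbols, so this sketch assumes the usual reading: $x_0$ is the lower limit of the median class, $K$ is the class width, $n_i$ are the class frequencies, $n_{Me}$ is the frequency of the median class, and the second sum is the cumulative frequency of the classes before it. The classes and frequencies are hypothetical, not taken from the lecture.

```python
def grouped_median(lower_limits, width, freqs):
    """Estimate the median from equal-width classes (symbol meanings are assumed)."""
    target = (sum(freqs) + 1) / 2        # position of the median: (sum n_i + 1) / 2
    cumulative = 0                       # sum of n_i over the classes before the current one
    for x0, n_me in zip(lower_limits, freqs):
        if cumulative + n_me >= target:  # first class whose cumulative frequency reaches the position
            return x0 + width * (target - cumulative) / n_me
        cumulative += n_me
    raise ValueError("empty frequency table")

# Hypothetical classes 0-10, 10-20, 20-30, 30-40 (not from the slides)
lower_limits = [0, 10, 20, 30]
freqs = [4, 9, 5, 1]

me = grouped_median(lower_limits, 10, freqs)
print(round(me, 2))  # 16.67
```

Here the median position is (19 + 1) / 2 = 10, which falls in the class 10-20, so the estimate is 10 + 10 · (10 − 4) / 9.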
The Mode
•A Measure of Central Tendency
•Value that Occurs Most Often
•Not Affected by Extreme Values
•There May Not be a Mode
•There May be Several Modes
•Used for Either Numerical or Categorical Data

[Figure: two dot plots; on the number line running 0 to 6 there is No Mode, on the number line running 0 to 14 the Mode = 9]
THE MODE
• The mode of a set of observations is the value of the variable that occurs most frequently.
• A set of data may have one mode (or modal class), or two or more modes.
• For large data sets, the modal class is much more relevant than a single-value mode.
MEDIAN AND MODE

• Mode

$Mo = x_0 + K \cdot \frac{\Delta_1}{\Delta_1 + \Delta_2}$
RELATIONSHIP AMONG MEAN,
MEDIAN, AND MODE

• If a distribution is symmetrical, the mean, median and mode coincide.
• If a distribution is not symmetrical, and is skewed to the left or to the right, the three measures differ.
  • A positively skewed distribution (“skewed to the right”): Mode < Median < Mean
  • A negatively skewed distribution (“skewed to the left”): Mean < Median < Mode
OTHER MEANS

• Harmonic
• Geometric
• Square
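The slide only names these means. As a hedged illustration, the sketch below assumes the standard definitions: the harmonic mean is $n / \sum (1/x_i)$, the geometric mean is $\sqrt[n]{x_1 x_2 \cdots x_n}$, and the square (quadratic) mean is $\sqrt{\sum x_i^2 / n}$. The data values are hypothetical, and `quadratic_mean` is a helper written for this sketch.

```python
import math
from statistics import geometric_mean, harmonic_mean  # available in Python 3.8+

def quadratic_mean(values):
    """Square (quadratic) mean: square root of the arithmetic mean of the squares."""
    return math.sqrt(sum(x * x for x in values) / len(values))

data = [1, 2, 4]  # hypothetical positive values, not from the slides

h = harmonic_mean(data)      # 3 / (1/1 + 1/2 + 1/4)
g = geometric_mean(data)     # cube root of 1 * 2 * 4
a = sum(data) / len(data)    # arithmetic mean, for comparison
q = quadratic_mean(data)     # sqrt((1 + 4 + 16) / 3)

print(round(h, 4), round(g, 4), round(a, 4), round(q, 4))
```

For positive data that are not all equal, the four values come out in the order harmonic < geometric < arithmetic < square.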
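FRACTILES

• Quartiles: 3 values that split the ranked data into 4 equal parts
• Percentiles: 99 values that split the ranked data into 100 equal parts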
Summary Measures
• Central Tendency: Mean, Median, Mode
  Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• Variation: Range, Variance, Standard Deviation, Coefficient of Variation
  Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Measures of Variation
• Range
• Variance: Population Variance, Sample Variance
• Standard Deviation: Population Standard Deviation, Sample Standard Deviation
• Coefficient of Variation: $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$
The Range
• Measure of Variation
• Difference Between Largest & Smallest Observations:
  Absolute Range = $x_{Largest} - x_{Smallest}$
• Relative Range = $(x_{Largest} - x_{Smallest})$ / mean
•Ignores How Data Are Distributed:
[Figure: two dot plots on a number line running 7 to 12; Range = 12 - 7 = 5 in both]
INTERQUARTILE RANGE

• Can eliminate some outlier problems by using the interquartile range
• Eliminate high- and low-valued observations and calculate the range of the middle 50% of the data
• Interquartile range = 3rd quartile – 1st quartile
  IQR = Q3 – Q1
INTERQUARTILE RANGE

Example:
[Figure: box plot with X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70; each of the four segments holds 25% of the data]

Interquartile range = 57 – 30 = 27
QUARTILES
• Quartiles split the ranked data into 4 segments with an equal number of values per segment (25% each), separated by Q1, Q2 and Q3
• The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger
• Q2 is the same as the median (50% are smaller, 50% are larger)
• Only 25% of the observations are greater than the third quartile, Q3
QUARTILE FORMULAS

Find a quartile by determining the value in the appropriate position in the ranked data, where
• First quartile position: Q1 = 0.25(n+1)
• Second quartile position: Q2 = 0.50(n+1) (the median position)
• Third quartile position: Q3 = 0.75(n+1)
where n is the number of observed values


QUARTILES

• Example: Find the first quartile
Sample Ranked Data: 11 12 13 16 16 17 18 21 22 (n = 9)
Q1 is in the 0.25(9+1) = 2.5 position of the ranked data, so use the value halfway between the 2nd and 3rd values:
Q1 = 12.5
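The position rule p(n+1) can be sketched as a small function. The name `quartile`, the linear interpolation for other fractional positions, and the clamping at the ends of the data are assumptions of this sketch, since the slides only show the halfway case.

```python
def quartile(values, p):
    """Value at position p*(n+1) in the ranked data, interpolating between neighbours."""
    ranked = sorted(values)
    n = len(ranked)
    pos = p * (n + 1)      # 1-based position, e.g. 2.5
    lower = int(pos)       # rank just below (or at) the position
    frac = pos - lower     # how far to move towards the next ranked value
    if lower < 1:          # position before the first value: clamp (assumption)
        return ranked[0]
    if lower >= n:         # position at or beyond the last value: clamp (assumption)
        return ranked[-1]
    return ranked[lower - 1] + frac * (ranked[lower] - ranked[lower - 1])

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1, q2, q3 = (quartile(data, p) for p in (0.25, 0.50, 0.75))
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 12.5 16.0 19.5 7.0
```

Q1 = 12.5 matches the worked example; Q2, Q3 and the interquartile range are computed here by the same rule and are not stated in the slides. Statistical packages use several slightly different quartile conventions, so other tools may return different values for the same data.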
DEVIATION

• Individual deviation from the mean = $x_i - \bar{x}$
• Overall deviation = 0, because $\sum (X_i - \bar{X}) = 0$
• Summing squared deviations, $\sum (X_i - \bar{X})^2$, or absolute values of the deviations, $\sum |x_i - \bar{x}|$
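A short check of these three sums, using the Internet-time data of Example 1 (mean 11.0). The variable names are illustrative only.

```python
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
x_bar = sum(hours) / len(hours)

deviations = [x - x_bar for x in hours]     # individual deviations from the mean

sum_dev = sum(deviations)                   # positive and negative deviations cancel out
sum_sq = sum(d ** 2 for d in deviations)    # sum of squared deviations
sum_abs = sum(abs(d) for d in deviations)   # sum of absolute deviations

print(sum_dev, sum_sq, sum_abs)  # 0.0 922.0 74.0
```

Because the plain sum is always zero, it carries no information about spread; the squared or absolute deviations are what the variation measures are built on.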
Variance
•Important Measure of Variation
•Shows Variation About the Mean
• Computed as the arithmetic mean of the squared deviations (the square of the square mean of the individual deviations)
•For the Population: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$ (use N in the denominator)
•For the Sample: $s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$ (use n - 1 in the denominator)
Standard Deviation
•Most Important Measure of Variation
•Shows Variation About the Mean:
•For the Population: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$ (use N in the denominator)
•For the Sample: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$ (use n - 1 in the denominator)
Sample Standard Deviation

$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$

Data: $X_i$: 10 12 14 15 17 18 18 24

n = 8, Mean = 16

$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095$
Comparing Standard Deviations
Data: $X_i$: 10 12 14 15 17 18 18 24

N = 8, Mean = 16

$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{130}{7}} = 4.3095$
$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\frac{130}{8}} = 4.0311$

Value for the Standard Deviation is larger for data considered as a Sample.
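The comparison can be reproduced with Python's `statistics` module, where `stdev` divides by n - 1 and `pstdev` divides by N. This is a sketch with illustrative variable names. With all eight values included (18 occurs twice), the squared deviations from the mean of 16 sum to 130.

```python
from statistics import pstdev, stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

mean = sum(data) / len(data)
sum_sq_dev = sum((x - mean) ** 2 for x in data)  # sum of squared deviations

s = stdev(data)       # sample standard deviation: sqrt(sum_sq_dev / (n - 1))
sigma = pstdev(data)  # population standard deviation: sqrt(sum_sq_dev / N)

print(mean, sum_sq_dev)              # 16.0 130.0
print(round(s, 4), round(sigma, 4))  # 4.3095 4.0311
```

The sample value is the larger of the two because the same sum is divided by 7 instead of 8.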
Comparing Standard Deviations
[Figure: three dot plots of AGE on a number line running 11 to 21]
• Data A: Mean = 15.5, s = 3.338
• Data B: Mean = 15.5, s = 0.9258
• Data C: Mean = 15.5, s = 4.57
COEFFICIENT OF VARIATION

• Measure of Relative Variation
• Always a % or coefficient
• Shows Variation Relative to Mean
• Used to Compare 2 or More Groups
• Formula (for Sample):

$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$
COMPARING COEFFICIENT OF VARIATION
• Stock A: Average Price last year = $50, Standard Deviation (sd) = $5
• Stock B: Average Price last year = $100, Standard Deviation (sd) = $5

Coefficient of Variation, CV = (S / X̄) · 100%:
• Stock A: CV = (5 / 50) · 100% = 10%
• Stock B: CV = (5 / 100) · 100% = 5%
Both average prices are representative.
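The stock comparison can be sketched in a few lines. The function name `coefficient_of_variation` and its parameter names are chosen for this illustration.

```python
def coefficient_of_variation(sd, mean):
    """Coefficient of variation as a percentage: (sd / mean) * 100."""
    return 100 * sd / mean

cv_a = coefficient_of_variation(sd=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(sd=5, mean=100)   # Stock B

print(cv_a, cv_b)  # 10.0 5.0
```

Both stocks have the same standard deviation in dollars, but relative to its average price Stock A varies twice as much as Stock B.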
SHAPE
• Describes How Data Are Distributed between smallest and largest values
• Measures of Shape:
  • Symmetric or skewed
  • Left-Skewed (Negatively Skewed): Mean < Median < Mode
  • Symmetric: Mean = Median = Mode
  • Right-Skewed (Positively Skewed): Mode < Median < Mean
BOX PLOT – GRAPHICAL
PRESENTATION OF CTM
CENTRAL TENDENCY MEASURES
SUMMARY FOR 1 VARIABLE
• Discussed Measures of Central Tendency
  • Mean, Median, Mode
• Addressed Measures of Variation
  • The Range, Variance, Standard Deviation, Coefficient of Variation
• Determined Shape of Distributions
  • Symmetric or Skewed
  • Coefficient of skewness
[Figure: three distribution shapes; left-skewed (Mean < Median < Mode), symmetric (Mean = Median = Mode), right-skewed (Mode < Median < Mean)]
