Measures of Location and VARIATION For 1 Variable


Lectures 3+4+5 Topics
•Measures of Central Tendency for numerical and categorical data
  Mean, Median, Mode + other means, Fractiles
•Measures of Variation for numerical and binary data
  The Range, Variance and Standard Deviation
•Shape
  Symmetric, Skewed, Skewness, Kurtosis
Summary Measures
Central Tendency (part of Location): Mean, Median, Mode, Fractiles
Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Measures of Central Tendency
Central Tendency: Mean, Median, Mode
The Mean (Arithmetic Mean, Average)
•It is the Arithmetic Average of data values:
Sample Mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$
•The Most Common Measure of Central Tendency
•Affected by Extreme Values (Outliers)
[Figure: two dot plots. Left (axis 0 to 10): Mean = 5. Right (axis 0 to 14): Mean = 6.]
THE ARITHMETIC MEAN
• This is the most popular and useful measure of central location
Mean = (Sum of the observations) / (Number of observations)
THE ARITHMETIC
MEAN
Sample mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$, where n is the sample size
Population mean: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$, where N is the population size
THE ARITHMETIC
MEAN
• Example 1
The reported times spent on the Internet by 10 adults are 0, 7, 12, 5,
33, 14, 8, 0, 9, 22 hours. Find the mean time spent on the Internet.
$\bar{x} = \dfrac{\sum_{i=1}^{10} x_i}{10} = \dfrac{0 + 7 + \cdots + 22}{10} = \dfrac{110}{10} = 11.0$ hours
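The calculation in Example 1 can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative and not part of the lecture.

```python
from statistics import mean

# Reported Internet hours for the 10 adults in Example 1.
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

# Arithmetic mean: sum of the observations divided by their number.
manual_mean = sum(hours) / len(hours)

# The standard library computes the same quantity.
library_mean = mean(hours)

print(manual_mean, library_mean)
```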
• Example 2
Suppose the telephone bills represent the population of measurements (N = 200). The population mean is
$\mu = \dfrac{\sum_{i=1}^{200} x_i}{200} = \dfrac{42.19 + 38.45 + \cdots + 45.77}{200} = 43.59$
WEIGHTED MEAN FOR DATA GROUPED BY CATEGORIES OR VARIANTS
$\bar{x} = \dfrac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$
When many of the measurements have the same value, the measurements can be summarized in a frequency table. Suppose the number of children in a sample of 16 families was recorded as follows:
NUMBER OF CHILDREN (x_i)    0   1   2   3
NUMBER OF FAMILIES (f_i)    3   4   7   2    (total: 16 families)
$\bar{x} = \dfrac{\sum_{i=1}^{4} x_i f_i}{16} = \dfrac{x_1 f_1 + x_2 f_2 + x_3 f_3 + x_4 f_4}{16} = \dfrac{3(0) + 4(1) + 7(2) + 2(3)}{16} = \dfrac{24}{16} = 1.5$
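As a rough sketch, the weighted mean for the family data can be computed from the frequency table directly. The dictionary layout and the function name `weighted_mean` are assumptions made for this illustration.

```python
# Frequency table from the example: number of children -> number of families.
children_counts = {0: 3, 1: 4, 2: 7, 3: 2}

def weighted_mean(freq_table):
    """Return sum(x * f) / sum(f) for a {value: frequency} table."""
    total_weighted = sum(x * f for x, f in freq_table.items())
    total_freq = sum(freq_table.values())
    return total_weighted / total_freq

mean_children = weighted_mean(children_counts)
print(mean_children)
```

The same function applies to class midpoints and class frequencies when only grouped data are available, in which case the result is an approximation.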
MEAN FOR TABULATED DATA BY CLASSES

APPROXIMATING DESCRIPTIVE MEASURES FOR GROUPED DATA BY CLASSES
• Approximating descriptive measures for grouped data may be needed in two cases:
  • when approximate values suffice,
  • when only secondary grouped data are available.
$\bar{x} \approx \dfrac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$, where $x_i$ is the midpoint and $f_i$ the frequency of class i
 Example 3
 Approximate the mean (calculate the mean) of the telephone call
durations problem as represented by the frequency distribution:
Class i   Class limits   Frequency f_i   Midpoint x_i   x_i f_i
1         2-5            3               3.5            10.5
2         5-8            6               6.5            39.0
3         8-11           8               9.5            76.0
…         …              …               …              …
6         17-20          2               18.5           37.0
Total                    30                             312.0
n = sample size = 30 = f_1 + … + f_6
Approximate mean: $\bar{x} \approx \dfrac{312.0}{30} = 10.4$
Real value (from the ungrouped data): $\bar{x} = 10.26$
The Median
•Important Measure of Central Tendency
•In an ordered array, the median is the
“middle” number.
•If n is odd, the median is the middle number.
•If n is even, the median is the average of the 2
middle numbers.
•Not Affected by Extreme Values
[Figure: two dot plots. Left (axis 0 to 10): Median = 5. Right (axis 0 to 14): Median = 5.]
THE MEDIAN
• The Median of a set of observations is the value that falls in the middle when the observations are arranged in increasing order of magnitude.
Example: Find the median of the time spent on the Internet for the adults of Example 1.
Even number of observations (n = 10): 0, 0, 5, 7, 8, 9, 12, 14, 22, 33
Median = (8 + 9) / 2 = 8.5
Comment: Suppose only 9 adults were sampled (exclude, say, the longest time (33)).
Odd number of observations (n = 9): 0, 0, 5, 7, 8, 9, 12, 14, 22
Median = 8
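The odd/even rule for the median can be written out as a short sketch. The helper name `median_by_hand` is illustrative; the standard-library `statistics.median` is called alongside it as a cross-check.

```python
from statistics import median

# Internet hours from Example 1 (unsorted, as reported).
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

def median_by_hand(values):
    """Middle value of the ranked data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Even case: all 10 adults.
median_even = median_by_hand(hours)

# Odd case: drop the longest time (33), leaving 9 adults.
hours_without_max = sorted(hours)[:-1]
median_odd = median_by_hand(hours_without_max)

print(median_even, median_odd, median(hours))
```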
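MEDIAN
• Data tabulated discretely – as ungrouped
• Data tabulated by classes – estimation
MEDIAN AND MODE
• Median
$Me = x_0 + K \cdot \dfrac{\frac{1}{2}\left(\sum n_i + 1\right) - \sum_{i=1}^{Me-1} n_i}{n_{Me}}$
with $x_0$ the lower limit of the median class, $K$ the class width, $n_{Me}$ the frequency of the median class, and the second sum running over the classes before the median class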
The Mode
•A Measure of Central Tendency
•Value that Occurs Most Often
•Not Affected by Extreme Values
•There May Not be a Mode
•There May be Several Modes
•Used for Either Numerical or Categorical Data
[Figure: two dot plots. Left (axis 0 to 6): No Mode. Right (axis 0 to 14): Mode = 9.]
THE MODE
• The Mode of a set of observations is the variable value that occurs most frequently.
• A set of data may have one mode (or modal class), or two or more modes.
• For large data sets the modal class is much more relevant than a single-value mode.
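A small sketch of how the mode can be found by counting. It assumes the convention that a data set in which every value occurs exactly once has no mode. The function name `modes` and the two short lists at the end are made up for the illustration; the other two data sets come from the earlier examples.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] if no value repeats (assumed convention)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: treated as "no mode"
    return sorted(v for v, c in counts.items() if c == top)

# Number of children in the 16 families (frequency table example).
children = [0] * 3 + [1] * 4 + [2] * 7 + [3] * 2
# Internet hours from Example 1.
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

print(modes(children))         # one mode
print(modes(hours))            # one mode, though far from the centre of the data
print(modes([1, 2, 3]))        # made-up data: no mode
print(modes([1, 1, 2, 2, 3]))  # made-up data: two modes
```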
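MEDIAN AND MODE
• Mode
$Mo = x_0 + K \cdot \dfrac{\Delta_1}{\Delta_1 + \Delta_2}$
with $x_0$ the lower limit of the modal class, $K$ the class width, $\Delta_1$ the difference between the frequency of the modal class and that of the preceding class, and $\Delta_2$ the difference between the frequency of the modal class and that of the following class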
RELATIONSHIP AMONG MEAN, MEDIAN, AND MODE
• If a distribution is symmetrical, the mean, median and mode coincide.
• If a distribution is not symmetrical, and skewed to the left or to the right, the three measures differ.
A positively skewed distribution ("skewed to the right"): Mode < Median < Mean
A negatively skewed distribution ("skewed to the left"): Mean < Median < Mode
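The ordering of mean and median gives a quick, informal indication of the direction of skew. The sketch below applies that rule of thumb to the Example 1 data. It is not a formal skewness measure, and the function name `skew_direction` is an assumption of this illustration.

```python
from statistics import mean, median

def skew_direction(values):
    """Rule of thumb: compare the mean with the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "positively skewed (to the right)"
    if m < med:
        return "negatively skewed (to the left)"
    return "no skew indicated"

# Internet hours from Example 1: the mean is pulled up by the large values.
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
print(skew_direction(hours))
```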
OTHER MEANS
 Harmonic
 Geometric
 Square
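The slides only name these means. The sketch below uses their standard textbook definitions, reads "Square" as the quadratic mean (root mean square), and assumes all values are positive. The data are the eight values from the standard deviation example in these slides.

```python
import math

data = [10, 12, 14, 15, 17, 18, 18, 24]

def harmonic_mean(values):
    """n divided by the sum of reciprocals (values must be positive)."""
    return len(values) / sum(1 / x for x in values)

def geometric_mean(values):
    """n-th root of the product, computed through logarithms (values must be positive)."""
    return math.exp(sum(math.log(x) for x in values) / len(values))

def square_mean(values):
    """Quadratic mean: square root of the mean of the squares."""
    return math.sqrt(sum(x * x for x in values) / len(values))

arithmetic = sum(data) / len(data)
print(harmonic_mean(data), geometric_mean(data), arithmetic, square_mean(data))
```

For positive data these satisfy harmonic ≤ geometric ≤ arithmetic ≤ square, with equality only when all values are equal.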
FRACTILES
 Quartiles: 3
 Percentiles: 99
Summary Measures
Central Tendency: Mean ($\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$), Median, Mode
Variation: Range, Variance ($s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$), Standard Deviation, Coefficient of Variation
Measures of Variation
Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Variance: Population Variance, Sample Variance
Standard Deviation: Population Standard Deviation, Sample Standard Deviation
Coefficient of Variation: $CV = \left(\dfrac{S}{\bar{X}}\right) \cdot 100\%$
The Range
• Measure of Variation
• Difference Between Largest & Smallest
Observations:
Absolute Range = $x_{Largest} - x_{Smallest}$
• Relative Range = $(x_{Largest} - x_{Smallest}) / \text{mean}$
•Ignores How Data Are Distributed:
[Figure: two dot plots on the same axis (7 to 12) with different distributions; in both, Range = 12 - 7 = 5.]
INTERQUARTILE RANGE
• Can eliminate some outlier problems by using the interquartile range
• Eliminate high- and low-valued observations and calculate the range of the middle 50% of the data
• Interquartile range = 3rd quartile – 1st quartile
IQR = Q3 – Q1
INTERQUARTILE RANGE
Example:
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70
(each of the four segments between these values holds 25% of the data)
Interquartile range = 57 – 30 = 27
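For comparison, the absolute range and the interquartile range of this example can be computed side by side. This is a minimal sketch using only the five values shown above; the dictionary keys are illustrative names.

```python
# Five-number summary from the example.
summary = {"minimum": 12, "q1": 30, "median": 45, "q3": 57, "maximum": 70}

# Absolute range uses only the two extreme observations.
absolute_range = summary["maximum"] - summary["minimum"]

# Interquartile range covers the middle 50% of the data.
iqr = summary["q3"] - summary["q1"]

print(absolute_range, iqr)
```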
QUARTILES
• Quartiles split the ranked data into 4 segments with an equal number of values per segment (25% | Q1 | 25% | Q2 | 25% | Q3 | 25%)
• The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger
• Q2 is the same as the median (50% are smaller, 50% are larger)
• Only 25% of the observations are greater than the third quartile, Q3
QUARTILE FORMULAS
Find a quartile by determining the value in the appropriate position in the ranked data, where
First quartile position: Q1 = 0.25(n+1)
Second quartile position: Q2 = 0.50(n+1) (the median position)
Third quartile position: Q3 = 0.75(n+1)
where n is the number of observed values

QUARTILES
• Example: Find the first quartile
Sample Ranked Data: 11 12 13 16 16 17 18 21 22 (n = 9)
Q1 is in the 0.25(9+1) = 2.5 position of the ranked data, so use the value halfway between the 2nd and 3rd values: Q1 = (12 + 13) / 2 = 12.5
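The position rule p(n+1) can be sketched in code as follows. The slides only show the halfway case, so using linear interpolation for other fractional positions is an assumption of this sketch (some textbooks round the position instead), and the function names are illustrative.

```python
def value_at_position(ranked, position):
    """Value at a 1-based, possibly fractional, position in ranked data.

    Fractional positions are resolved by linear interpolation (an assumption).
    """
    n = len(ranked)
    position = min(max(position, 1), n)  # keep the position inside the data
    lower = int(position)                # 1-based index at or below the position
    fraction = position - lower
    if fraction == 0:
        return ranked[lower - 1]
    return ranked[lower - 1] + fraction * (ranked[lower] - ranked[lower - 1])

def quartiles(values):
    """Q1, Q2, Q3 using the positions 0.25(n+1), 0.50(n+1), 0.75(n+1)."""
    ranked = sorted(values)
    n = len(ranked)
    return tuple(value_at_position(ranked, p * (n + 1)) for p in (0.25, 0.50, 0.75))

sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1, q2, q3 = quartiles(sample)
print(q1, q2, q3, q3 - q1)
```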
DEVIATION
• Individual deviation from the mean = $x_i - \bar{x}$
• Overall deviation = 0, because $\sum (X_i - \bar{X}) = 0$
• Summing squared deviations, $\sum (X_i - \bar{X})^2$, or absolute values of the deviations, $\sum |x_i - \bar{x}|$
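A short numerical check of why raw deviations cannot measure spread, while squared or absolute deviations can. The data are the eight values used in the standard deviation example of these slides; the variable names are illustrative.

```python
data = [10, 12, 14, 15, 17, 18, 18, 24]
mean_value = sum(data) / len(data)

# Individual deviations from the mean.
deviations = [x - mean_value for x in data]

# Positive and negative deviations cancel out.
sum_of_deviations = sum(deviations)

# Squaring or taking absolute values removes the sign before summing.
sum_of_squares = sum(d ** 2 for d in deviations)
sum_of_absolute = sum(abs(d) for d in deviations)

print(sum_of_deviations, sum_of_squares, sum_of_absolute)
```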
Variance
•Important Measure of Variation
•Shows Variation About the Mean
• Computed as an arithmetic mean of squared deviations or as a square mean of individual deviations
•For the Population: $\sigma^2 = \dfrac{\sum (X_i - \mu)^2}{N}$ (use N in the denominator)
•For the Sample: $s^2 = \dfrac{\sum (X_i - \bar{X})^2}{n - 1}$ (use n - 1 in the denominator)
Standard Deviation
•Most Important Measure of Variation
•Shows Variation About the Mean:
•For the Population: $\sigma = \sqrt{\dfrac{\sum (X_i - \mu)^2}{N}}$ (use N in the denominator)
•For the Sample: $s = \sqrt{\dfrac{\sum (X_i - \bar{X})^2}{n - 1}}$ (use n - 1 in the denominator)
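Sample Standard Deviation
$s = \sqrt{\dfrac{\sum (X_i - \bar{X})^2}{n - 1}}$
Data: $X_i$: 10 12 14 15 17 18 18 24
n = 8, Mean = 16
$s = \sqrt{\dfrac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{8 - 1}} = \sqrt{\dfrac{130}{7}} = 4.3095$
(The value 18 occurs twice, so the term $(18-16)^2$ appears twice in the sum.)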
Comparing Standard Deviations
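Data: $X_i$: 10 12 14 15 17 18 18 24
N = 8, Mean = 16
Sample: $s = \sqrt{\dfrac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\dfrac{130}{7}} = 4.3095$
Population: $\sigma = \sqrt{\dfrac{\sum (X_i - \mu)^2}{N}} = \sqrt{\dfrac{130}{8}} = 4.0311$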
Value for the Standard Deviation is larger for data considered as a Sample.
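The two denominators can be compared directly in code. This is a sketch that computes both versions by hand and cross-checks them against `statistics.stdev` (sample) and `statistics.pstdev` (population). Note that 18 occurs twice in the data, so both occurrences contribute to the sum of squared deviations (130 in total).

```python
import math
from statistics import pstdev, stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
mean_value = sum(data) / n

# Sum of squared deviations about the mean (all 8 values, including both 18s).
sum_of_squares = sum((x - mean_value) ** 2 for x in data)

# Sample: divide by n - 1. Population: divide by N.
sample_sd = math.sqrt(sum_of_squares / (n - 1))
population_sd = math.sqrt(sum_of_squares / n)

print(round(sample_sd, 4), round(population_sd, 4))
print(stdev(data), pstdev(data))
```

Dividing by n - 1 rather than N always gives the larger value for the same data.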
Comparing Standard Deviations
[Figure: three dot plots of AGE on the same axis (11 to 21), with the same mean but different spread]
Data A - AGE: Mean = 15.5, s = 3.338
Data B - AGE: Mean = 15.5, s = 0.9258
Data C - AGE: Mean = 15.5, s = 4.57
COEFFICIENT OF VARIATION
• Measure of Relative Variation
• Always a % or coefficient
• Shows Variation Relative to Mean
• Used to Compare 2 or More Groups
• Formula (for Sample): $CV = \left(\dfrac{S}{\bar{X}}\right) \cdot 100\%$
COMPARING COEFFICIENT OF VARIATION
• Stock A: Average Price last year = $50, Standard Deviation (sd) = $5
• Stock B: Average Price last year = $100, Standard Deviation (sd) = $5
Coefficient of Variation: CV = (S / mean) · 100%
Stock A: CV = (5 / 50) · 100% = 10%
Stock B: CV = (5 / 100) · 100% = 5%
Both average prices are representative.
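The stock comparison can be reproduced with a small helper. This is a sketch only: the function name `coefficient_of_variation` and the zero-mean guard are assumptions of this illustration, not part of the slides.

```python
def coefficient_of_variation(sd, mean_value):
    """Standard deviation as a percentage of the mean."""
    if mean_value == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * sd / mean_value

# Both stocks have the same standard deviation but different average prices.
cv_stock_a = coefficient_of_variation(sd=5, mean_value=50)
cv_stock_b = coefficient_of_variation(sd=5, mean_value=100)

print(cv_stock_a, cv_stock_b)
```

Although the absolute variation is identical, Stock A varies twice as much relative to its average price.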
SHAPE
• Describes How Data Are Distributed between smallest and largest values
• Measures of Shape:
• Symmetric or skewed
Left-Skewed (negatively skewed): Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed (positively skewed): Mode < Median < Mean
BOX PLOT – GRAPHICAL PRESENTATION OF CTM (CENTRAL TENDENCY MEASURES)
SUMMARY FOR 1 VARIABLE
• Discussed Measures of Central Tendency
  • Mean, Median, Mode
• Addressed Measures of Variation
  • The Range, Variance, Standard Deviation, Coefficient of Variation
• Determined Shape of Distributions
  • Symmetric or Skewed
  • Coefficient of skewness
Left-Skewed: Mean < Median < Mode; Symmetric: Mean = Median = Mode; Right-Skewed: Mode < Median < Mean
