Session 8
Session Speaker
K.M. Sharath Kumar
Basic
PEMP-GP-POM
Statistics
M.S Ramaiah School of Advanced Studies - Bangalore
Session Objectives
To describe the relevance of statistics and model building process
Session Outline
Introduction to Statistics
Representation of Data
Normal Distribution
Skewness and Kurtosis
Correlation and regression
Business
Finance
Economy
Statistics
Include numerical facts and figures
Earthquake
Male Prisoners
Elderly Population
Statistics
Rely upon data
Rely upon how data are selected/chosen and statistics are
interpreted
Sometimes have problematic interpretations
Consider some examples
Ice-cream Sales
What is Statistics?
A Case
You hear a commercial that 80% of children prefer to eat a
certain kind of noodles for breakfast . What do you conclude ?
1.
2.
3.
Benjamin Disraeli
Former British Prime Minister
Importance of Statistics
Complete information about any subject matter may not be
always available
Data
Qualitative (categorical or nominal):
Examples are color, gender, nationality
Quantitative (measurable or countable):
Examples are temperatures, salaries, number of points scored on a 100-point exam
Scales of Measurement
Nominal Scale - groups or classes
Gender
Ordinal Scale - order matters
Ranks (top ten videos)
Interval Scale - difference or distance matters; has an arbitrary zero value
Temperatures (°F, °C)
Ratio Scale - ratio matters; has a natural zero value
Salaries
Population
Sample 2
Population
Population (N)
Sample (n)
Why Sample?
Census of a population may be:
Impossible
Impractical
Too costly
Tabulation
Example
To show children that studying more leads to better grades
Develop a questionnaire to collect data
Example
Example continued
CROSS TABULATION
CROSS TABULATION
Output Variable (columns): Very Good, Good, Average, Below Average, Poor
Rows: 0-3, 3-6, 6-12
CROSS TABULATION
Example:
A project was undertaken to improve the CSat score of transaction processing.
Based on brainstorming, the project team concluded that lack of experience is a
cause of low CSat score.
The following data was collected. Analyze the data and verify whether lack of
experience is a cause of low CSat score
Experience        CSat Score
(Months)      VD    D     N     S     VS
0-3           50    40    30    10    10
3-6            5    30    50    35     7
6-12           6     7    30    40    50
Note: Table gives the count of CSat score of 1, 2 etc for each group of
agents
CROSS TABULATION
Example:
Step 1:
Take the column wise (output variable category wise) total
Experience        CSat Score
(Months)      VD    D     N     S     VS
0-3           50    40    30    10    10
3-6            5    30    50    35     7
6-12           6     7    30    40    50
Total         61    77   110    85    67
CROSS TABULATION
Example:
Step 2:
Take the percentages based on column total
Experience        CSat Score (% of column total)
(Months)      VD       D        N        S        VS
0-3           81.97    51.95    27.27    11.76    14.93
3-6            8.20    38.96    45.45    41.18    10.45
6-12           9.84     9.09    27.27    47.06    74.63
Total        100      100      100      100      100
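The two steps above (column totals, then percentages of each column total) can be sketched in Python. This is a minimal illustration, not part of the original slides: the function names are hypothetical, and the single-digit cell counts (5, 6, 7, 7) are an assumption inferred from the column totals and percentages shown above.

```python
# Counts of CSat scores per experience group (rows) and category (columns).
# ASSUMPTION: the single-digit cells (5, 6, 7, 7) are inferred from the
# column totals and percentages on the slides.
categories = ["VD", "D", "N", "S", "VS"]
counts = {
    "0-3": [50, 40, 30, 10, 10],
    "3-6": [5, 30, 50, 35, 7],
    "6-12": [6, 7, 30, 40, 50],
}


def column_totals(table):
    """Step 1: total of each output-variable category (column)."""
    return [sum(column) for column in zip(*table.values())]


def column_percentages(table):
    """Step 2: each cell as a percentage of its column total."""
    totals = column_totals(table)
    return {
        row: [round(100 * value / total, 2) for value, total in zip(values, totals)]
        for row, values in table.items()
    }


totals = column_totals(counts)
percentages = column_percentages(counts)

print("Total", dict(zip(categories, totals)))
for row, values in percentages.items():
    print(row, dict(zip(categories, values)))
```

Reading down a column shows where each score concentrates: most "VD" scores come from the 0-3 month group, and most "VS" scores from the 6-12 month group.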
Representation of Data
Data can be represented in two ways
Graphical representation
Numerical representation
Graphical representation
Bar chart
Pie chart
Histogram chart
[Figure: histogram, TVSM-Mysore; y-axis: Frequency (0 to 10); x-axis: 40 to 100]
Frequency Distributions
A frequency distribution is an organisation of raw data into
tabular form using classes (or intervals) and frequencies.
Frequency count: The frequency or frequency count for a
data value is the number of times the value occurs in the data
set.
Source: Statistics and Probability for Engineering Applications by W. J. DeCoursey, College of Engineering, University of Saskatchewan, Saskatoon.
Example continued
Example:
The weights (in pounds) of 30 female students in a biology
class at a college are given below.
Summarise the information with a frequency
distribution using seven classes.
143 151 136 127 132 132 126 138 119 104
113 90 126 123 121 133 104 99 112 129
107 139 122 137 112 121 140 134 133 123
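One possible way to build the seven-class frequency distribution is sketched below. The class limits are an assumption: the sketch starts at the smallest value and uses the smallest whole-number class width that lets seven classes cover the range, which may differ from the limits chosen on the original slide. The function name is hypothetical.

```python
# Weights (in pounds) of the 30 students, copied from the example above.
weights = [
    143, 151, 136, 127, 132, 132, 126, 138, 119, 104,
    113, 90, 126, 123, 121, 133, 104, 99, 112, 129,
    107, 139, 122, 137, 112, 121, 140, 134, 133, 123,
]


def frequency_table(data, n_classes):
    """Return (lower limit, upper limit, frequency) for each class.

    ASSUMPTION: classes start at the minimum and share one integer width,
    the smallest width for which n_classes classes cover every value.
    """
    low, high = min(data), max(data)
    width = (high - low) // n_classes + 1
    table = []
    for k in range(n_classes):
        lower = low + k * width
        upper = lower + width - 1
        frequency = sum(1 for x in data if lower <= x <= upper)
        table.append((lower, upper, frequency))
    return table


table = frequency_table(weights, 7)
for lower, upper, frequency in table:
    print(f"{lower:>3}-{upper:<3} {frequency:>2} {'*' * frequency}")
```

The row of asterisks printed beside each class is a rough text histogram of the same frequencies.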
Frequency table
Histogram
Numerical Representation
How to characterise numerical data?
Central tendency
Spread
Numerical representation
Central Tendency
Measures of central tendency
Mean
Median
Mode
Central tendency
Mean
Mean: It is the arithmetic average.
Consider a list $x_1, x_2, \dots, x_n$ of $n$ data values. Then
Population mean: $\mu = \frac{\sum x_i}{N}$
Sample mean: $\bar{x} = \frac{\sum x_i}{n}$
Central tendency
Mean
Example
Calculate the mean for
10, 12, 22, 18, 25, 15
Solution
$\bar{x} = \frac{\sum x_i}{n} = \frac{10+12+22+18+25+15}{6} = \frac{102}{6} = 17$
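As a quick check of the arithmetic, the same mean can be computed in a few lines of Python. This is only an illustrative sketch; the function name `mean` is a hypothetical helper, not something from the slides.

```python
def mean(values):
    """Arithmetic average: the sum of the values divided by their count."""
    return sum(values) / len(values)


data = [10, 12, 22, 18, 25, 15]
result = mean(data)
print(sum(data), len(data), result)
```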
Central tendency
Median
The median $\tilde{x}$ is the middle value of the data arranged in order:
$\tilde{x}$ = the $[(n+1)/2]$th term, when $n$ is odd
$\tilde{x}$ = the average of the $(n/2)$th and $[(n/2)+1]$th terms, when $n$ is even
Central tendency
Median
Example 1
Calculate the median for
10, 12, 22, 18, 25, 15
Solution
Arrange the numbers in ascending order
10, 12, 15, 18, 22, 25
Here $n = 6$ is even, so
$\tilde{x} = \frac{\text{3rd term} + \text{4th term}}{2} = \frac{15 + 18}{2} = \frac{33}{2} = 16.5$
Example 2
Calculate the median for
10, 12, 22, 18, 15
Solution
Arrange the numbers in ascending order
10, 12, 15, 18, 22
Here $n = 5$ is odd, so
$\tilde{x}$ = $[(n+1)/2]$th term = 3rd term = 15
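The odd and even cases can be sketched as one small Python function. This is an illustrative helper with a hypothetical name, not code from the slides; it sorts the data first and then applies the two rules.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the [(n+1)/2]th term, which is index mid when counting from 0.
        return ordered[mid]
    # Even n: average of the (n/2)th and [(n/2)+1]th terms.
    return (ordered[mid - 1] + ordered[mid]) / 2


even_case = median([10, 12, 22, 18, 25, 15])
odd_case = median([10, 12, 22, 18, 15])
print(even_case, odd_case)
```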
Mode
[Figure: dot plot of data values from 6 to 24; the tallest stack of dots is at 16]
Mode = 16
The mode is the most frequently occurring value. It
is the value with the highest frequency.
Numerical representation
Spread
Measures of spread
Range
Standard deviation
Inter quartile range
Spread
Range
Spread
Standard Deviation
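Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
Sample standard deviation: $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$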
Spread
Standard Deviation
Example
Calculate the standard deviation for 10, 12, 22, 18, 25, 15
Solution
Variance, $s^2 = \frac{(10-17)^2 + (12-17)^2 + (22-17)^2 + (18-17)^2 + (25-17)^2 + (15-17)^2}{6-1} = \frac{168}{5} = 33.6$
S.D., $s = \sqrt{s^2} = \sqrt{33.6} = 5.797$
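The calculation above can be reproduced with a short Python sketch. The function names are hypothetical; the divisor is n - 1 because this is the sample variance used in the example.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std_dev(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


data = [10, 12, 22, 18, 25, 15]
variance = sample_variance(data)
std_dev = sample_std_dev(data)
print(round(variance, 1), round(std_dev, 3))
```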
Spread
IQR = Q3 - Q1
These values are plotted in a box called a box plot.
[Figure: box plot marking the min value, first quartile, third quartile, IQR, and max value]
Exercise
Calculate the
1. Mean
2. Median
3. Range
4. Variance
5. Standard deviation
6.
4, 6, 10, 7, 6, 9
Continuous distribution
Normal Distribution
Example
Physiological characteristics (weight, height, size, etc), yield of
agricultural crops, experimental and measurement error
For the standardized normal distribution:
Mean, $\mu = 0$
Standard deviation, $\sigma = 1$
The standardized normal variable is $z = \frac{x - \mu}{\sigma}$ (also written $u$)
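[Figure: normal curve; about 68% of values lie within $\pm 1\sigma$ of the mean, about 95% within $\pm 2\sigma$, and 99.73% within $\pm 3\sigma$]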
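Formulae   $\mu-3\sigma$   $\mu-2\sigma$   $\mu-1\sigma$   $\mu$   $\mu+1\sigma$   $\mu+2\sigma$   $\mu+3\sigma$
Values     224.5   233   241.5   250   258.5   267   275.5
[Figure: normal curve; about 68% of values lie within $\pm 1\sigma$, about 95% within $\pm 2\sigma$, and 99.73% within $\pm 3\sigma$]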
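The limits and coverage figures for a normal distribution can be checked with a short Python sketch. The values of the mean (250) and standard deviation (8.5) are an assumption read off the centre and spacing of the values listed above; the function names are hypothetical. The coverage within k standard deviations is computed from the standard error function as erf(k / sqrt(2)).

```python
import math

# ASSUMPTION: mean and standard deviation inferred from the listed values
# (centre 250, spacing 8.5 between successive limits).
mu, sigma = 250.0, 8.5


def z_score(x, mean, std_dev):
    """Standardize x: number of standard deviations away from the mean."""
    return (x - mean) / std_dev


def coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


limits = {k: (mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)}
for k, (lower, upper) in limits.items():
    print(f"+/-{k} sigma: {lower} to {upper}, coverage {100 * coverage(k):.2f}%")
```

For example, `z_score(267, mu, sigma)` gives 2.0, meaning 267 lies two standard deviations above the mean.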
Skewness
Measure of asymmetry of a frequency distribution
Skewed to left
Symmetric or unskewed
Skewed to right
Kurtosis
Measure of flatness or peakedness of a frequency
distribution
Platykurtic (relatively flat)
Mesokurtic (normal)
Leptokurtic (relatively peaked)
Skewness
Skewed to left
Skewness
Symmetric
Skewness
Skewed to right
Kurtosis
Platykurtic - flat distribution
Kurtosis
Mesokurtic - not too flat and not too peaked
Kurtosis
Leptokurtic - peaked distribution
Scatter Diagram
A scatter diagram is used to study and identify a possible relationship
between two variables.
It can also be used to establish:
Existence of Correlation
Type of Correlation
Strength of the relation
Correlation
Statistical measure of the degree of association between
two variables
Is there a relationship between the speed at which a car travels
and the rate at which it consumes fuel?
Car Speed and Fuel Consumption
Speed:  20   25   30   35   40   45   50   55   60
Cons.:  22   21   20   23   19   18   16   14   11
[Figure: scatter plot of fuel consumption against speed in km/hr]
Computing the linear correlation coefficient
Speed (x)   Con (y)   xy      x²       y²
20          22        440     400      484
25          21        525     625      441
30          20        600     900      400
35          23        805     1225     529
40          19        760     1600     361
45          18        810     2025     324
50          16        800     2500     256
55          14        770     3025     196
60          11        660     3600     121
Totals 360  164       6170    15,900   3112
r = -0.906
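The value of r can be reproduced from the column totals with the usual computational formula, r = (nΣxy - ΣxΣy) / sqrt((nΣx² - (Σx)²)(nΣy² - (Σy)²)). The Python sketch below is illustrative only; the function name is a hypothetical helper.

```python
import math

speed = [20, 25, 30, 35, 40, 45, 50, 55, 60]
consumption = [22, 21, 20, 23, 19, 18, 16, 14, 11]


def pearson_r(xs, ys):
    """Linear correlation coefficient from the five sums used in the table."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(speed, consumption)
print(round(r, 3))
```

The negative sign reflects that consumption falls as speed rises over this range of the data.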
Interpretation of r
If the computed r has a value greater than +1 or less than -1, there
must be an error in the computations
Strong positive correlation is indicated by r near +1
Strong negative correlation is indicated by r near -1
An r near 0 indicates little or no linear correlation
As a rule of thumb, a correlation with magnitude greater than 0.7 is
described as strong, whereas one with magnitude less than 0.7 is
described as weak
Exercise
Regression
Regression helps derive a relation between two
sets of data (Cause and Effect).
Understanding equation for straight line
y = a + bx
b = slope of the line
a = y-axis intercept
y = Estimated average (mean) value of dependent variable for
a given value of independent variable x
$\sum y = na + b\sum x$
$\sum xy = a\sum x + b\sum x^2$
An Example
Use the least squares regression line to estimate the increase in sales
revenue expected from an increase of 7.5 percent in advertising expenditure.
[Table: for each firm, the annual percentage increase in ad budget and the annual percentage increase in sales]
Ad Expenditure
Column totals for the $n = 8$ firms ($x$ = increase in ad budget, $y$ = increase in sales):
$\sum x = 56$, $\sum y = 40$, $\sum x^2 = 524$, $\sum xy = 373$
$\sum y = na + b\sum x$
$\sum xy = a\sum x + b\sum x^2$
40 = 8a + 56b
373 = 56a + 524b
Y = a + bx = 0.072 + 0.704x
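Solving the two normal equations for a and b can be sketched in Python from the totals alone (n = 8, Σx = 56, Σy = 40, Σx² = 524, Σxy = 373, as read from the table totals). The function name is hypothetical. Note that the unrounded solution gives b ≈ 0.7045 and a ≈ 0.068; the intercept 0.072 on the slide follows if b is first rounded to 0.704 and then substituted into the first equation.

```python
def solve_normal_equations(n, sum_x, sum_y, sum_x2, sum_xy):
    """Solve  sum_y = n*a + b*sum_x  and  sum_xy = a*sum_x + b*sum_x2  for (a, b)."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Totals taken from the ad expenditure table (8 firms).
a, b = solve_normal_equations(n=8, sum_x=56, sum_y=40, sum_x2=524, sum_xy=373)
print(round(a, 3), round(b, 3))

# Intercept obtained when the slope is rounded to 0.704 before substitution.
a_from_rounded_b = (40 - 0.704 * 56) / 8
print(round(a_from_rounded_b, 3))
```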
Contd.
For x = 7.5 (a 7.5% increase in advertising expenditure, in the same
percentage units as the data), the estimated increase in sales revenue will
be
y = 0.072 + 0.704(7.5) = 5.352, or about 5.35%
Exercise
Big Bazaar is hopeful that its sales are rising significantly
week by week. Treating the sales for the previous six weeks
as a typical rising trend, it recorded them in 1000s and
analysed the results.
Week:   1     2     3     4     5     6
Sales:  2.69  2.62  2.80  2.70  2.75  2.81
Fit a linear regression equation to estimate expected sales
for the 7th week
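One way to check a hand-worked answer to this exercise is a small least squares fit in Python. This is only a sketch under the assumption that week number is the independent variable x and sales (in 1000s) is the dependent variable y; the function name is hypothetical. It uses the deviation form of the slope, b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², which is algebraically equivalent to solving the normal equations.

```python
def fit_line(xs, ys):
    """Least squares line y = a + b*x, using deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


weeks = [1, 2, 3, 4, 5, 6]
sales = [2.69, 2.62, 2.80, 2.70, 2.75, 2.81]  # in 1000s

a, b = fit_line(weeks, sales)
forecast_week_7 = a + b * 7
print(f"y = {a:.4f} + {b:.4f}x; week 7 estimate = {forecast_week_7:.4f}")
```

The estimate is an extrapolation one step beyond the data, so it is only as reliable as the assumption that the linear trend continues.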
$y = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$
X1 = Kms traveled    Y = Travel time
100                  9.3
50                   4.8
100                  8.9
100                  6.5
50                   4.2
80                   6.2
75                   7.4
65                   6.0
90                   7.6
90                   6.1
Summary (1/2)
Descriptive Statistics:
It includes collecting, organising, summarising, and graphically
displaying data
Inferential Statistics:
It includes making inferences, hypothesis testing,
determining relationships, and making predictions
Summary (2/2)
Data Collection and Questionnaire Design:
Scaling techniques play a pivotal role in collecting the right
sample for statistical analysis
Normal Distribution:
Describes many natural phenomena, industrial and scientific
situations. A normal curve is a graphical representation of
the mathematical expression used to describe the normal
distribution
Data Analysis Package:
Helps to perform bi-variate and multi-variate analysis faster