
Inspire…Educate…Transform.

Statistics and Probability in Decision Modeling
Linear Regression

Dr. L. Srinivasa Varadharajan


srinivasa.varadharajan@insofe.edu.in

Thanks to Dr. Sridhar Pappu for the material.


The BEST GLOBAL DESTINATION for individuals and organizations to learn and adopt disruptive technologies for solving business and society’s challenges
Linear Regression

Linear Regression
[Figure: Walking Discipline - Sridhar Pappu (Mi Band 2 data). x-axis: Streak (# of days continuously meeting goal of 8000 steps), 0 to 30; y-axis: Ahead of % people, 0 to 100. Trend line: y = 2.7576x + 9.2606, R² = 0.99248]

Linear Regression
[Figure: Walking Discipline - Sridhar Pappu (Mi Band 2 data). x-axis: Streak (# of days continuously meeting goal of 8000 steps), 0 to 30; y-axis: Ahead of % people, 0 to 100]

Be careful when extrapolating.


Extrapolation is done assuming that the same process that
generated observed data is continuing in the unseen region as well.
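
As a rough illustration of the warning above, the sketch below plugs streak lengths into the trend line printed on the first Walking Discipline chart (y = 2.7576x + 9.2606). It assumes the observed streaks run from 0 to 30 days, as the chart axis suggests; the function name is hypothetical.

```python
# Trend line read from the first Walking Discipline chart.
SLOPE, INTERCEPT = 2.7576, 9.2606

def predict_percent_ahead(streak_days: float) -> float:
    """Predicted 'ahead of % people' for a given streak length."""
    return INTERCEPT + SLOPE * streak_days

inside = predict_percent_ahead(30)   # edge of the plotted range
outside = predict_percent_ahead(40)  # extrapolated beyond the plotted range

print(f"30-day streak: {inside:.1f}%")   # 92.0%
print(f"40-day streak: {outside:.1f}%")  # 119.6%
```

Being ahead of more than 100% of people is impossible, so the linear process that fits the first 30 days cannot continue unchanged in the unseen region.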

How to Pick the Best Model?
𝑦 = 𝛽0 + 𝛽1 𝑥 + 𝜀 (Probabilistic model)
𝑦 = 𝐸(𝑌|𝑋 = 𝑥) + 𝜀
Recall: Conditional Expected Value…Conditional Expectation of a Random Variable…Conditional Mean of a Random Variable

The line whose residual error over all points is the least is the best line.

To ensure residual errors don't cancel, we take squares of residual errors.
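
A minimal sketch of this idea, using the standard closed-form least-squares estimates for a single predictor. The data points and function names are made up for illustration and do not come from the slides.

```python
def fit_line(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

def sum_squared_errors(xs, ys, b0, b1):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, roughly following y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
print(f"sum of raw residuals     = {sum(residuals):.6f}")  # cancels to about 0
print(f"sum of squared residuals = {sum_squared_errors(xs, ys, b0, b1):.4f}")
```

The raw residuals of the fitted line sum to roughly zero, so they cannot rank candidate lines; the squared residuals can, and any other slope or intercept gives a larger sum.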

Burgernomics: Overvalued or Undervalued Currencies?
• Big Mac price in the US: $ 5.51
• Maharaja Mac price in India: Rs 173
• Implied PPP is 173/5.51 = Rs 31.3975/$
• Actual exchange rate = Rs 71.5565/$
• (31.3975 − 71.5565) / 71.5565 = −0.56
• Rupee undervalued by 56% against the USD
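
The arithmetic in the bullets above can be reproduced with a short sketch. The prices and exchange rate are the ones quoted on the slide; the function names are hypothetical.

```python
def implied_ppp(local_price: float, us_price: float) -> float:
    """Exchange rate at which the burger would cost the same in both countries."""
    return local_price / us_price

def valuation(implied_rate: float, actual_rate: float) -> float:
    """Relative over- (positive) or under- (negative) valuation of the local currency."""
    return (implied_rate - actual_rate) / actual_rate

ppp = implied_ppp(local_price=173, us_price=5.51)
gap = valuation(ppp, actual_rate=71.5565)

print(f"Implied PPP: Rs {ppp:.4f}/$")  # Rs 31.3975/$
print(f"Valuation:   {gap:.2f}")       # -0.56
```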

Global prices for a Big Mac in July 2018, based on a survey and on data from IMF, McDonald’s, Thomson Reuters, Eurostat and The Economist
Source: https://www.economist.com/news/2018/07/11/the-big-mac-index
Last accessed: December 13, 2018

Burgernomics by UBS Wealth Management Research

Burgernomics

Source: http://www.economist.com/content/big-mac-index
Last accessed: March 04, 2016
Determining the Equation of the Regression Line - Excel

Determining the Equation of the Regression Line - Excel

Sample Software Output

WAYS OF TESTING HOW WELL THE REGRESSION LINE FITS DATA

Assumptions of the Regression Model – Residuals Analysis
The model is linear
Zero residual line: The regression line
[Figure: Walking Discipline - Sridhar Pappu (Mi Band 2 data). x-axis: Streak (# of days continuously meeting goal of 8000 steps), 0 to 100; y-axis: Ahead of % people, 0 to 120. Trend line: y = 0.8201x + 35.373, R² = 0.88699]

Assumptions of the Regression Model
The error terms are independent
– Plot the residuals against any time or spatial variable where the order of observation is important.
– Time series methods are more appropriate in such situations than regular regression.
[Figure: two residual plots, labeled Independent and Dependent]
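
Alongside the plot, one common numeric check is the Durbin-Watson statistic. It is not mentioned in the slides and is offered here only as a sketch. Values near 2 suggest no first-order dependence between consecutive residuals, while values near 0 suggest that each residual tends to resemble the previous one. Both residual series are hypothetical.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Hypothetical residuals listed in order of observation.
drifting = [-3, -2, -1, 0, 1, 2, 3]             # steady drift: dependent errors
scattered = [1, -2, 0.5, 1.5, -1, -0.5, 0.5]    # no obvious run

print(f"drifting:  {durbin_watson(drifting):.2f}")
print(f"scattered: {durbin_watson(scattered):.2f}")
```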

Assumptions of the Regression Model
The error terms have constant variances (homoscedasticity as opposed to heteroscedasticity)
– Under heteroscedasticity, the RMSE (Root Mean Square Error) of Regression or Standard Error of the Estimate will be misleading, as it will underestimate the spread for some 𝑥𝑖 and overestimate it for others.
[Figure: two residual plots, labeled Heteroscedastic and Homoscedastic]
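
To see why a single spread figure misleads, the sketch below compares the root mean square of residuals over the whole sample with the same quantity computed separately for the low-x and high-x halves. The residuals are invented to fan out as x grows. For simplicity it uses a plain root mean square rather than the n − 2 divisor of the regression standard error.

```python
import math

def rms(values):
    """Root mean square of a list of residuals."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# Hypothetical residuals, ordered by increasing x, that fan out as x grows.
residuals = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]

half = len(residuals) // 2
overall = rms(residuals)
low_x = rms(residuals[:half])
high_x = rms(residuals[half:])

print(f"overall spread: {overall:.2f}")
print(f"low-x spread:   {low_x:.2f}")   # overall figure overestimates this
print(f"high-x spread:  {high_x:.2f}")  # overall figure underestimates this
```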

Assumptions of the Regression Model
The error terms are normally distributed

[Figure: two q-q plots, labeled Normal and Not Normal]

The quantile-quantile (q-q) plot
x-axis: Theoretical quantiles in a standard normal distribution
y-axis: Observed quantiles in the sample

Q-Q plot (Excel)
Quantiles are cutpoints dividing the range of a probability distribution into contiguous
intervals with equal probabilities, or dividing the observations in a sample in the same way.
https://en.wikipedia.org/wiki/Quantile

The quantile-quantile (q-q) plot is used to validate distributional assumptions of a data set.

In linear regression, this data set consists of the residual errors.

If the normality assumption holds true, then the z-scores of the residuals should be equal
to the expected z-scores at corresponding quantiles (of a normal distribution).

Q-Q Plot Example (Excel)
• 11 data points cover 100% area
• Each data point represents 1/11*100 = 9.09% area (or 0.091)
• Each data point considered as mid-point of each of 11 bins
[Figure: q-q plot with both axes running from -2.00 to 2.00. z-values marked on the plot: -1.64, -1.08, -0.74, -0.47, -0.23, 0, 0.23, 0.47, 0.74, 1.08, 1.64]
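
The bin mid-point construction can be sketched with Python's standard library instead of Excel. Each of the n points is placed at the middle of its 1/n-wide probability bin, and the standard normal quantile at that probability is the expected z-score. With exact mid-points the results come out close to, though not always identical to, the values marked on the slide, which may reflect rounded probabilities. The sample residuals and function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def expected_z_scores(n):
    """Standard normal quantiles at the mid-points of n equal-probability bins."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]

def observed_z_scores(values):
    """Sorted z-scores of the sample (population standard deviation)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in sorted(values)]

# Hypothetical residuals from a fitted regression (11 points).
sample = [-2.1, -1.3, -0.9, -0.5, -0.2, 0.0, 0.3, 0.4, 0.9, 1.2, 2.2]

expected = expected_z_scores(len(sample))
observed = observed_z_scores(sample)

# Points near the line y = x support the normality assumption.
for e, o in zip(expected, observed):
    print(f"expected {e:6.2f}   observed {o:6.2f}")
```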

Interpreting Residuals
http://www.stat.berkeley.edu/~stark/SticiGui/Text/regressionDiagnostics.htm

Residual Analysis – Big Mac
Which assumption is getting violated?

Residuals – Big Mac
• Is a wrong model fitted (linear or quadratic, etc.)?
• Are the residuals normally distributed?
• Is the data homoscedastic?
• Are there influential outliers?
[Figure: four diagnostic plots, one per question, with the points for USA, Japan, Brazil, Switzerland and Sweden labeled]

Caution – Is there heteroscedasticity here?

Fixing Non-normality and Heteroscedasticity
Transformation of data (square root, logarithm, etc.) can
help correct normality and unequal variances problems.
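
A small constructed example of how a logarithm can equalize variances. The data below are invented so that y scatters around x by a constant multiplicative factor, which makes the raw deviations grow with x while the log-scale deviations stay the same size. Deviations are measured from the known line y = x rather than from a fitted line, purely to keep the sketch short.

```python
import math

def rms(values):
    """Root mean square of a list of deviations."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# Hypothetical data: y is x scaled alternately up by 25% and down by 20%.
xs = list(range(1, 9))
factors = [1.25, 0.8] * 4
ys = [x * f for x, f in zip(xs, factors)]

raw_dev = [y - x for x, y in zip(xs, ys)]
log_dev = [math.log(y) - math.log(x) for x, y in zip(xs, ys)]

raw_low, raw_high = rms(raw_dev[:4]), rms(raw_dev[4:])
log_low, log_high = rms(log_dev[:4]), rms(log_dev[4:])

print(f"raw scale: low-x spread {raw_low:.3f}, high-x spread {raw_high:.3f}")
print(f"log scale: low-x spread {log_low:.3f}, high-x spread {log_high:.3f}")
```

Whether a square root, a logarithm or another transformation helps depends on how the spread changes with x, so the residual plots should be rechecked after transforming.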

Influential Observations – Rules of Thumb
• If Cook’s D of any observation (Di) > 1, that observation can be considered as having too much influence, but investigate values greater than 0.5 also.
• Relative size interpretation: In general, investigate any value that is very different from the rest.
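
The slides do not give the formula, so the sketch below uses the standard definition of Cook's D for a least-squares fit with one predictor: Di = ei² / (p · MSE) · hi / (1 − hi)², where p = 2 estimated parameters, MSE = SSE / (n − p), and hi is the leverage of point i. The data are hypothetical, with the last point placed far from the others in x.

```python
def cooks_distances(xs, ys):
    """Cook's D for each point of a simple (one-predictor) least-squares fit."""
    n = len(xs)
    p = 2  # parameters estimated: intercept and slope
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    intercept = y_mean - slope * x_mean

    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mse = sum(e * e for e in residuals) / (n - p)
    # Leverage of each point in simple linear regression.
    leverages = [1 / n + (x - x_mean) ** 2 / sxx for x in xs]
    return [
        (e * e) / (p * mse) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverages)
    ]

# Hypothetical data: five points near y = x plus one distant point.
xs = [1, 2, 3, 4, 5, 20]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 8.0]

distances = cooks_distances(xs, ys)
for x, d in zip(xs, distances):
    flag = "too much influence" if d > 1 else "investigate" if d > 0.5 else ""
    print(f"x = {x:2d}  D = {d:8.3f}  {flag}")
```

Here only the distant point crosses the threshold of 1, and its value is also far larger than all the others, so both rules of thumb point to the same observation.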

HYDERABAD
2nd Floor, Jyothi Imperial, Vamsiram Builders, Old Mumbai Highway, Gachibowli, Hyderabad - 500 032
+91-9701685511 (Individuals)
+91-9618483483 (Corporates)

PUNE
Kirloskar - Pune
S. L. Kirloskar Center for Executive Education, Kirloskar Corporate Office, 8th Floor, Cello Platina, Model Colony, Shivaji Nagar – 411005

BENGALURU
Floors 1-3, L77, 15th Cross Road, 3A Main Road, Sector 6, HSR Layout, Bengaluru – 560 102
+91-9502334561 (Individuals)
+91-9502799088 (Corporates)

MUMBAI
Kanakia Wall Street, 4th Floor, Andheri-Kurla Road, Chakala, Andheri East, Mumbai - 400093

Web: http://www.insofe.edu.in
Facebook: https://www.facebook.com/insofe
Twitter: https://twitter.com/Insofeedu
YouTube: http://www.youtube.com/InsofeVideos
SlideShare: http://www.slideshare.net/INSOFE
LinkedIn: http://www.linkedin.com/company/international-school-of-engineering

This presentation may contain references to findings of various reports available in the public domain. INSOFE makes no representation as to their accuracy or that the organization subscribes to those findings.
