Econ101 110501195141 Phpapp02

What is econometrics?
A simple, non-technical introduction to linear regression/OLS as a technique
About this document

This document is not meant to be presented. It is meant to be read, and is best viewed as a slideshow or in printed format.
This document covers only the very basics of econometrics. Econometrics as a subject is theoretically complex. The goal of this document is to give the reader enough understanding of econometrics so that she/he can discuss the topic with some confidence.

This document assumes zero knowledge of econometrics and of linear regression. It may appear long-winded at times, but it is designed that way in order to impress upon the reader the concepts being discussed. Some online references and books are listed at the end of the document for those who are interested in learning more about econometric and statistical modeling.

Readers who have a formal background in, a conceptual understanding of, or a keen interest in statistics should find this document helpful in transitioning towards econometric modeling. A conceptual understanding of linear regression also helps in appreciating econometrics, but this document assumes zero knowledge of regression. Econometrics as a science is founded on complex equations and assumptions based on the theories of probability and statistics; these are not covered in this document.
Econometrics? Isn't that difficult?
It's full of formulas and it can be complex
But
Things must be made as simple as possible, but never simpler

This is an attempt to present econometrics as simply as possible
What's required to learn a little bit of econometrics
lots of curiosity
a little bit of patience
a little bit of brains
confidence in dealing with numbers
a belief that numbers can tell stories
Let's start with a little bit of definition
Econometrics is an application of statistics and mathematics aimed at identifying and quantifying the relationships between two sets of variables: (1) the predicted variables and (2) the predictor variables. The goal of econometrics is to test a hypothesized causal relationship between the predicted and the predictor variables.
Econometrics is an application of statistics and mathematics. Econometrics is derived from statistics (largely regression and trending techniques) and from mathematics. There are differences between statistics and econometrics, but the differences are academic*
* but not necessarily moot or unimportant. For those interested in the differences, see future tutorials
aimed at identifying and quantifying the relationships between two sets of variables: (1) the predicted variables and (2) the predictor variables. The basic goal of econometrics is to explain, using formulas and numbers, the relationship between a predictor variable (such as GRPs, or gross rating points, ad spends, competitive spends, temperature, and seasonality) and a predicted variable (such as awareness, sales, revenues, and profits)
This relationship is expressed in an equation such as
y = mx + b + u
where y is the predicted variable, x is the predictor variable, and m, b and u are the values that econometrics wants to uncover
We know the values of y and x. Econometrics helps us identify the values of m, b and u
If we were interested in awareness and GRPs

We can rewrite the first equation, taking our interest into consideration, as follows
awareness = m × GRPs + b + u
What econometrics does is estimate the values of m, b and u based on the available data on awareness and GRPs, so that we have an equation that relates awareness to GRPs. Once m, b and u are estimated, we can use the equation to explain the movements in awareness with respect to GRPs and to predict how awareness is going to move in the future given different levels of GRPs
NB. This simplifies the relationship between GRPs and awareness drastically. The relationship is far more complex, of course, but let's assume that this equation is true for now.
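To make the last step concrete, here is a minimal Python sketch of how the equation is used once m and b have been estimated. The values of m and b below are hypothetical placeholders chosen purely for illustration (they are not estimates from any real GRP or awareness data), and the function names are likewise assumptions of this sketch.

```python
# Hypothetical coefficients, for illustration only (not estimated from real data).
M_HYPOTHETICAL = 0.05   # assumed change in awareness (percentage points) per GRP
B_HYPOTHETICAL = 20.0   # assumed awareness level (in %) when GRPs are zero


def predict_awareness(grps, m=M_HYPOTHETICAL, b=B_HYPOTHETICAL):
    """Return the awareness level that the line m * GRPs + b predicts."""
    return m * grps + b


def unexplained(actual_awareness, grps, m=M_HYPOTHETICAL, b=B_HYPOTHETICAL):
    """Return u: the part of the observed awareness that the line does not explain."""
    return actual_awareness - predict_awareness(grps, m, b)


if __name__ == "__main__":
    for grps in (0, 100, 200, 400):
        print(f"GRPs = {grps:>3}: predicted awareness = {predict_awareness(grps):.1f}%")
    # An observed awareness of 33% at 200 GRPs leaves 3 points unexplained.
    print(f"u = {unexplained(33.0, 200):.1f}")
```

The split between the predicted part (m × GRPs + b) and the leftover u is the same split the equation makes: the line explains what it can, and u collects the rest.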
There are many econometric techniques
But the most common technique is linear regression
A brief introduction to linear regression
How to create regression lines
Regression in econometrics and marketing
What is linear regression?

Introduction to linear regression

Let's assume that we track the number of users of a certain product (in 000) across months, with time represented by t. In the first month, for example, we see that there are 4,905 users of the product. By the 5th month, that has increased to about 6,800 users, and by the 26th month, the number of users has increased to around 34,200. Clearly, there is an increase in the number of users, and it seems, from looking at the data alone, that there is indeed a significant uptrend
If we plotted the data, we would indeed see an upward trend

[Chart: product users (000) against time t, in months. In the 1st month there are about 5,000 product users (4.905 thousand); by the 30th month the number of users has increased to about 40,000 (39.915 thousand).]
The question
If this trend held and continued into the next 12 months, how many more users will we have?
To answer this question

we first need to understand the past relationship between the two variables: time and number of users.
[Diagram: the past on one side, the future on the other]
We will then use this understanding of the past to predict what's going to happen in the next 12 months
What bridges the gap between the past and the future
Once we have identified the equation or the model, we will have a better grasp of (1) the past trends and (2) the potential of the future
[Diagram: the past and the future, linked by the linear regression equation]
Linear regression comes into the picture by bridging that gap between the past and the future
With that in mind, let's look at the chart again
From mere observation, we see an uptrend in users across time

[Chart: product users (000) against time t, in months]
How do we quantify* that uptrend?

[Chart: product users (000) against time t, in months]
* Remember: In order to project into the future, we need to create a model that quantifies the relationship between time and number of users
There are an infinite number of lines that we could use to characterize the uptrend
[Chart: product users (000) against time t, in months, with several candidate trend lines drawn through the data]
Different people have different views even when viewing the same set of data: I can argue that the best line is the grey line, another can argue that the blue line is best, and still another can argue that the best line is the pink line
Linear regression insists that there is one (and only one) line that best characterizes the trend and the relationship between the two variables
Linear regression also insists that this equation be of the following form:
y = mx + b + u
where y is the number of users per month (000), x is time, m is the slope, b is the constant, and
u is the unexplained part (the error term)
This one line that best describes the relationship between the two variables is derived through OLS
OLS, which stands for ordinary least squares, is a method that chooses the values of m and b such that the sum of the squared vertical distances between the actual values and the line is at its minimum; u is whatever remains unexplained
Huh?
Let's go back a few charts

Remember: given any data set, there are an infinite number of lines that can be used to describe the trend. One person can choose the pink line as the best and rationalize it; another can argue that the yellow line is the best; and a third can defend the blue line. We can argue indefinitely about the merits of each of these lines. What OLS does is settle the argument objectively: out of the infinite number of lines, it finds the best-fitting one, the line for which the sum of the squared distances between the line and the original data points is at a minimum
OLS does not need trial and error to do this: for a straight line, the values of m and b that minimize that sum can be calculated directly from the data with a formula, and u is what is left over between each data point and the line. (It still helps to think of OLS as a search that considers every possible m-b combination and returns the best-fitting line.)
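For readers who like to see the mechanics, here is a minimal Python sketch of OLS for a single predictor. It uses the standard textbook formulas for the slope and the constant rather than any search, and it compares the resulting line against two hand-picked candidate lines. The five data points are made up for illustration (they are not the data behind the charts), and the function names are assumptions of this sketch.

```python
def ols_fit(xs, ys):
    """Return (m, b) for the line y = m*x + b that minimizes the sum of
    squared vertical distances to the points (xs, ys)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; no unique line exists")
    m = sxy / sxx                # slope
    b = mean_y - m * mean_x      # constant (intercept)
    return m, b


def sum_squared_error(xs, ys, m, b):
    """Sum of squared vertical distances between the data and the line."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data: month number and users in thousands.
months = [1, 2, 3, 4, 5]
users = [5.0, 6.1, 6.9, 8.2, 9.0]

m, b = ols_fit(months, users)
print(f"OLS line: y = {m:.4f}x + {b:.4f}")

# Two hand-picked candidate lines, standing in for the "pink" and "blue" lines.
for name, (cand_m, cand_b) in {"candidate A": (1.0, 4.0), "candidate B": (1.2, 3.5)}.items():
    print(name, "error:", round(sum_squared_error(months, users, cand_m, cand_b), 4))
print("OLS error:", round(sum_squared_error(months, users, m, b), 4))
```

Whatever candidate line is proposed, its sum of squared distances cannot be smaller than that of the OLS line, which is what makes the OLS line "best-fitting" in this specific sense.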
Going back to the data, the best-fitting regression line after applying OLS is
[Chart: product users (000) against time t, in months, with the fitted line y = 1.1416x + 3.6329, R² = 0.9391]
By applying OLS, the equation y = 1.1416x + 3.6329 is found to be the best-fitting regression line
It is objective and unbiased
By using OLS, we are assured that the line does not depend on anyone's personal judgment
It is linear
It conforms to the y = mx + b + u requirement of econometrics
It is the best-fitting line

Because the OLS algorithm is aimed at minimizing the distance between the line and the data points, we are assured that it is the best-fitting line
Now comes the interesting part
So what exactly does the equation mean?
The story behind y = 1.1416x + 3.6329

This equation suggests the following: for every 1-unit change in x, there is a corresponding 1.1416-unit change in y
Applying this to our data, we can say that each additional month brings about 1,142 new users of the product (in other words, 1,000 new users a little under every 4 weeks)
3.6329 is called the constant: it is the number of users (about 3,600) when the product was rolled out into the marketplace (at time t = 0)
These are perhaps the early adopters of the product or those who have been exposed to the product through free samples
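The reading of the slope and the constant can be checked with a few lines of arithmetic. The sketch below uses the coefficients as printed on the chart (y = 1.1416x + 3.6329, with y in thousands of users); the function name is an assumption of this sketch.

```python
# Coefficients as printed on the chart; y is measured in thousands of users.
M = 1.1416   # slope: change in y for each additional month
B = 3.6329   # constant: value of y at time t = 0


def predicted_users_thousands(month):
    """Users (in thousands) that the fitted line predicts for a given month."""
    return M * month + B


at_launch = predicted_users_thousands(0)
gain_per_month = predicted_users_thousands(11) - predicted_users_thousands(10)
months_per_thousand_users = 1 / M

print(f"Users at t = 0: about {at_launch * 1000:.0f}")
print(f"New users per additional month: about {gain_per_month * 1000:.0f}")
print(f"Months needed to add 1,000 users: about {months_per_thousand_users:.2f}")
```

The gain per month is the same whichever pair of consecutive months is compared, which is exactly what it means for the relationship to be a straight line.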
OK, we have an equation. How do we know it's the correct equation?

First, we eyeball the line and the actual data. Are the data points within a reasonable distance of the line?
If each of the data points seems to be near the trendline, then we can say, initially, that we have a good fit. If there are data points that are significantly far from the line, then the equation may need to be revisited, or the outlying data point may be caused by something other than time
Let's eyeball the model: there seem to be no data points that are significantly away from the line
[Chart: product users (000) against time t, in months, with the fitted line y = 1.1416x + 3.6329, R² = 0.9391]
Eyeballing the data, however, brings back subjective interpretations

[Chart: product users (000) against time t, in months, with the fitted line y = 1.1416x + 3.6329, R² = 0.9391]
One can argue that the point at month 11 is significantly away from the line, and so is the point for month 24. We therefore need a more accurate, more objective measurement of fit
How else do we know if the equation is valid or not?

We look at the r-squared (R²): 0.9391
This suggests that the variable time is able to explain 93.91% of the variance, or movements, in the number of users. The other 6.09% is unexplained by time and could be due to other factors. The 6.09% unexplained variance could also be due to errors in measurement, or simply random errors that we will never be able to uncover
An r-squared of 0.75+ is often treated as acceptable as a rule of thumb, although the right threshold depends on the kind of data being modeled
The r-squared is only one of several measures of goodness-of-fit (GOF). Other measures include adjusted R-squared, AIC (Akaike Information Criterion), RMSE (root-mean-square error), and GLM-ANOVA. These will not be discussed here.
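As a rough illustration of what the r-squared measures, the sketch below computes it as one minus the ratio of unexplained variation to total variation. The data points and the fitted line are made up for illustration (they are not the data behind the charts), and the function name is an assumption of this sketch.

```python
def r_squared(actual, fitted):
    """Share of the variation in `actual` that the fitted values explain."""
    if len(actual) != len(fitted) or not actual:
        raise ValueError("need two equal-length, non-empty sequences")
    mean_actual = sum(actual) / len(actual)
    total = sum((y - mean_actual) ** 2 for y in actual)            # total variation
    unexplained = sum((y - f) ** 2 for y, f in zip(actual, fitted))  # left over
    if total == 0:
        raise ValueError("actual values do not vary; r-squared is undefined")
    return 1 - unexplained / total


# Made-up illustrative data: users in thousands for months 1 to 5,
# and the values a hypothetical line y = 1.0x + 4.0 gives for those months.
users = [5.0, 6.1, 6.9, 8.2, 9.0]
line_values = [1.0 * month + 4.0 for month in (1, 2, 3, 4, 5)]

print(f"r-squared: {r_squared(users, line_values):.4f}")
```

A line that passes through every point leaves nothing unexplained and scores 1.00, while a flat line at the average of the data explains none of the movement and scores 0.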
Will we ever have an r-squared of 1.00?

Possible, but highly improbable. The higher the r-squared, the better, and it is possible to have an r-squared of 1.00, but in the real world it is highly improbable. An r-squared of 1.00 will only happen in a perfect scenario where the model perfectly fits and explains the data. Getting an r-squared of 0.75+ will in and of itself be a challenge
But there are deviations between the line and the data!
Why do we have deviations? Because there are other things that we probably are not taking into account in this model
Deviations are not entirely bad

Actually, the deviations are part of the story. Because these deviations are an indication that something else apart from time is at work, it is worth checking why they exist. This is where analytics and econometrics/statistics meet: uncovering what a model can and cannot explain.
Let's go back to the original question:
What have we done so far?

We've modeled and derived an equation relating time t to the number of users for the first 30 months
[Chart: product users (000) against time t, in months, with the fitted line y = 1.1416x + 3.6329, R² = 0.9391]
We're fairly confident in the model because it explains about 94% of the variance in the number of users, as reflected by the r-squared
Let's now project what's going to happen in the next 12 months

[Chart: product users (000), actual and projected, against time t in months, from month 1 to month 42]
At the end of the next 12 months (by month 42), we can expect to have about 54,300 users if all things remain equal
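A projection of this kind can be sketched by extending the fitted line beyond the last observed month. The code below does this with the coefficients printed on the chart (y = 1.1416x + 3.6329, y in thousands of users); the names used are assumptions of this sketch. A straight extension of the printed equation gives roughly 51.6 thousand users at month 42. The slides do not state how their own projected figure was produced, so the two need not agree exactly.

```python
# Coefficients as printed on the chart; y is measured in thousands of users.
M = 1.1416
B = 3.6329

LAST_OBSERVED_MONTH = 30
MONTHS_AHEAD = 12


def project(month):
    """Extend the fitted line to a future month (result in thousands of users)."""
    return M * month + B


# Projected users for months 31 to 42, keyed by month number.
projection = {
    month: project(month)
    for month in range(LAST_OBSERVED_MONTH + 1, LAST_OBSERVED_MONTH + MONTHS_AHEAD + 1)
}

for month, users_thousands in projection.items():
    print(f"month {month}: about {users_thousands:.1f} thousand users")
```

This only holds "if all things remain equal": the line assumes that the monthly gain seen in the first 30 months continues unchanged.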
Since we don't really know what's going to happen in the future and we don't have a perfect model
We can report ranges instead of just a line. The dashed lines indicate the range of expectations for the next 12 months
[Chart: product users (000), actual and projected, against time t in months, from month 1 to month 42, with dashed lines marking the projected range]
We can expect that there will be about 47,000 to 61,600 users by month 42
Are you still there?
Breathe a sigh of relief
Linear regression through OLS is just one of the many techniques in econometrics
For those interested: Wikipedia's page on linear regression is here, and the OLS technique is discussed here. Specifically on econometrics, Wikipedia's entry is here. An international organization of econometricians and some information on econometrics can be found here. A more detailed introduction to econometrics can be found here.
Books on econometrics that we've found useful

Econometrics by Samuel Cameron, on Amazon.com, is an approachable introduction to the concepts. Introductory Econometrics by Humberto Barreto uses Microsoft Excel and includes a CD-ROM with interactive files. A Guide to Econometrics by Peter Kennedy is considered a good guide by most teachers of beginning econometrics and by practitioners.
Other books that might be helpful

Probability plays a major role in econometrics; for those interested, E. T. Jaynes has an e-book (in PDF) here. This is heavy reading, but enlightening. An HTML version can be found here. Since econometrics builds on statistical theory, try reading the chapters on linear regression (bivariate/multivariate) in Stat101 books. Amazon has this list for you to choose from.
Credits for the images used

Most of the images in the presentation are from Gettyimages.Com; GettyImages' ownership of these photos is asserted, and no claims on these images are made by the presenter, the author, or the company. We acknowledge GettyImages' ownership of copyright over their work in this presentation. We also acknowledge, and claim no ownership of, the other images that have been used in this presentation/file.
This presentation
Author: Philip Tiongson (philtiongson@gmail.com)
Audience: Staff interested in the basics of econometrics