This document covers the very basics of econometrics. Econometrics as a subject is theoretically complex. The goal of this document is to give the reader enough understanding of econometrics to discuss the topic with some confidence.
What is econometrics?
Econometrics is an application of statistics and mathematics aimed at identifying and quantifying the relationships between two sets of variables: (1) the predicted variables and (2) the predictor variables. The goal of econometrics is to test a hypothesized causal relationship between the predicted and the predictor variables.
Econometrics is an application of statistics and mathematics. It is derived from statistics (largely regression and trending techniques) and from mathematics. There are differences between statistics and econometrics, but the differences are academic.*
* But not necessarily moot or unimportant. For those interested in the differences, see future tutorials.
Econometrics is aimed at identifying and quantifying the relationships between two sets of variables: (1) the predicted variables and (2) the predictor variables. The basic goal of econometrics is to explain, using formulas and numbers, the relationship between predictor variables (such as GRPs, ad spend, competitive spend, temperature, and seasonality) and a predicted variable.
This relationship is expressed in an equation such as:
y = mx + b + u
where
y is the predicted variable
x is the predictor variable
m, b and u are the values that econometrics wants to uncover
We know the values of y and x. Econometrics helps us identify the values of m, b and u.
Awareness = m × GRPs + b + u
What econometrics does is estimate the values of m, b and u from the available data on Awareness and GRPs, so that we have an equation relating the two. Once m, b and u are estimated, we can use the equation to explain the movements in awareness with respect to GRPs, and to predict how awareness will move in the future under different levels of GRPs.
NB. This simplifies the relationship between GRPs and awareness drastically. The relationship is far more complex, of course, but let's assume that this equation is true for now.
A brief introduction to linear regression
How to create regression lines?
Regression in econometrics and marketing
[Figure: product users (thousands) plotted against time t, in months]
In the 1st month, we see that there are about 5,000 product users.
By the 30th month, the number of users has increased to about 40,000.
The question
If this trend holds and continues over the next 12 months, how many more users will we have?
We will then use this understanding of the past to predict what's going to happen in the next 12 months.
What bridges the gap between the past and the future?
Once we have identified the equation, or model, we will have a better grasp of (1) past trends and (2) the potential of the future.
[Diagram: The Past → Linear regression equation → The Future]
Linear regression comes into the picture by bridging the gap between the past and the future.
Remember: in order to project into the future, we need to create a model that quantifies the relationship between time and the number of users.
There are an infinite number of lines that we could use to characterize the uptrend.
[Figure: product users (thousands) against time t, in months, with several candidate trend lines drawn through the data]
Different people have different views even when looking at the same set of data: I can argue that the best line is the grey line, another can argue that the blue line is best, and still another can argue that the best line is the pink line.
Linear regression insists that there is one (and only one) line that would best characterize the trend and the relationship between the two variables
Linear regression also insists that this equation be of the following form:
y = mx + b + u
where
y is the number of users per month (in thousands)
x is time (in months)
m is the slope
b is the constant
u is the error term
This one line that best describes the relationship between the two variables is derived through OLS
OLS, which stands for ordinary least squares, is a method that sets the values of m and b such that the total squared distance between the actual values and the line is at its minimum. The u values are what is left over: the gaps between each actual value and the line.
Huh
Conceptually, you can think of OLS as a search that tries different combinations of m and b until it arrives at the line with the minimum total squared distance between it and the original data. In practice, OLS does not need trial and error: for a straight line, the best-fitting values of m and b can be computed directly from the data with a formula.
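To make the idea concrete, here is a minimal sketch in Python of how the best-fitting m and b can be computed directly. The data points are made up for illustration (they are not the data behind the charts in this presentation), and the function name ols_fit is our own.

```python
def ols_fit(xs, ys):
    """Return (m, b) for the line y = m*x + b that minimizes the
    sum of squared vertical distances to the points (xs, ys)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # How x and y move together, and how much x varies on its own
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx            # slope
    b = y_mean - m * x_mean  # constant (intercept)
    return m, b


# Hypothetical data, for illustration only
months = [1, 2, 3, 4, 5]
users_000 = [5.0, 6.0, 7.5, 8.0, 9.5]

m, b = ols_fit(months, users_000)

# The u values: what is left over after the line has explained what it can
residuals = [y - (m * x + b) for x, y in zip(months, users_000)]

print(f"m = {m:.3f}, b = {b:.3f}")
```

For these made-up points the sketch gives a slope of about 1.1 and a constant of about 3.9. A useful check is that the residuals of an OLS line with a constant always sum to (approximately) zero.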
Going back to the data, the best-fitting regression line after applying OLS is:
y = 1.1416x + 3.6329
R² = 0.9391
[Figure: product users (thousands) against time t, in months, with the fitted regression line]
By applying OLS, the equation y = 1.1416x + 3.6329 is found to be the best-fitting regression line.
It is objective and unbiased: by using OLS, we are assured that the line does not depend on personal judgment, since anyone applying OLS to the same data arrives at the same line.
It is linear: it conforms to the y = mx + b + u form required above.
3.6329 is called the constant. It is the estimated number of users, in thousands (about 3,600 users), when the product was rolled out into the marketplace (at time t = 0).
These are perhaps the early adopters of the product or those who have been exposed to the product through free samples
Let's eyeball the model: there seem to be no data points that are significantly away from the line.
One can argue that the point at month 11 is significantly away from the line, and so is the point for month 24. We therefore need a more accurate, more objective measurement of fit.
R-squared is only one of several measures of goodness of fit (GOF). Other measures include adjusted R-squared, the Akaike information criterion (AIC), root-mean-squared error (RMSE), and GLM-ANOVA. These will not be discussed here.
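R-squared is the share of the variation in y that the line accounts for, so a value such as 0.9391 says that roughly 94% of the variation in users is explained by time. The sketch below shows one common way to compute it. The data points are made up for illustration, the function name r_squared is our own, and m = 1.1 and b = 3.9 are the OLS values for these made-up points, not the values from the charts.

```python
def r_squared(xs, ys, m, b):
    """Share of the variation in ys explained by the line y = m*x + b."""
    y_mean = sum(ys) / len(ys)
    # Variation left unexplained by the line (squared residuals)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    # Total variation of y around its own average
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data, for illustration only
months = [1, 2, 3, 4, 5]
users_000 = [5.0, 6.0, 7.5, 8.0, 9.5]

r2 = r_squared(months, users_000, m=1.1, b=3.9)
print(f"R-squared = {r2:.3f}")
```

A value close to 1 means the points sit close to the line. A flat line drawn at the average of y would score 0.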
But there are deviations between the line and the data!
Why do we have deviations? Because there are other things that we probably are not taking into account in this model
[Figure: product users (thousands) against time t, in months, with the trend extended to month 42]
At the end of the next 12 months (by month 42), we can expect to have about 54,300 users, if all things remain equal.
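As a worked example, the fitted equation can be evaluated at any month. The sketch below plugs a few months into the rounded coefficients printed on the chart. The function name is our own, and because it uses only the rounded equation, its results will not necessarily match numbers read off the chart.

```python
M = 1.1416  # slope from the fitted equation (thousand users per month)
B = 3.6329  # constant from the fitted equation (thousand users)


def projected_users_000(month):
    """Projected users, in thousands, from y = 1.1416x + 3.6329."""
    return M * month + B


for month in (30, 36, 42):
    print(month, round(projected_users_000(month), 1))
```

With these rounded coefficients, the equation gives about 51.6 thousand users at month 42.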
Since we don't really know what's going to happen in the future, and we don't have a perfect model, we can report ranges instead of just a line.
[Figure: actual and projected product users (thousands) by month, months 1 to 42. The dashed lines indicate the range of expectations for the next 12 months.]
We can expect that there will be about 47,000 to 61,600 users by month 42.
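The presentation does not say how the dashed range was calculated. One simple, rough way to build such a range, shown here purely as an assumption, is to take the projected value plus or minus two times the typical size of the deviations around the line (the residual standard error). This is a simplification of a formal prediction interval. The data and the function names below are made up for illustration.

```python
import math


def ols_fit(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_mean - m * x_mean


def rough_range(xs, ys, x_new, width=2.0):
    """Return (low, center, high) for a projection at x_new.

    Assumption: the band is center +/- width * residual standard error.
    It ignores the extra uncertainty of projecting far beyond the data.
    """
    m, b = ols_fit(xs, ys)
    residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
    # Typical deviation from the line, with n - 2 degrees of freedom
    s = math.sqrt(sum(r * r for r in residuals) / (len(xs) - 2))
    center = m * x_new + b
    return center - width * s, center, center + width * s


# Hypothetical data, for illustration only
months = [1, 2, 3, 4, 5]
users_000 = [5.0, 6.0, 7.5, 8.0, 9.5]

low, center, high = rough_range(months, users_000, x_new=8)
print(f"month 8: {low:.1f} to {high:.1f} (center {center:.1f})")
```

The noisier the past data is around the line, the wider the reported range becomes.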
Linear regression through OLS is just one of the many techniques in econometrics.
For those interested, Wikipedia's page on linear regression is here and the OLS technique is discussed here. Specifically on econometrics, Wikipedia's entry is here. An international organization of econometricians and some information on econometrics can be found here. A more detailed introduction to econometrics can be found here.
This presentation
Author: Philip Tiongson (philtiongson@gmail.com)
Audience: Staff interested in the basics of econometrics