Introduction and The Simple Linear Regression Model

BMAN 70211 Cross Sectional Econometrics
Lecture 1. Introduction and the Simple Linear Regression Model
Dr. Viet Anh Dang
University of Manchester, 2016
V.A. Dang (2016) BMAN 70211 Econometrics Lecture 1 1 / 35

Outline
1 What is econometrics?
2 Key information about this course unit
3 Steps in an empirical analysis and structure of data
4 The simple regression model
5 Deriving the OLS estimator
6 Properties of the OLS estimator
7 OLS and BLUE
8 Summary

What is econometrics?
Econometrics is
[T]he study of the application of statistical methods to the analysis of
economic phenomena (Tintner, 1953).
[T]he art and science of using statistical methods for the measurement of
economic relations (Chow, 1985).
The application of statistical and mathematical methods to the analysis of
economic data, with a purpose of giving empirical content to economic
theories and verifying them or refuting them (Maddala, 1992).
Examples of studies using econometrics:
Forecasting the foreign exchange rate.
Studying the relation between economic growth and unemployment.
Investigating the relation between managerial pay and firm performance.

Why study econometrics?
It is important to be able to apply theory to real world data.
Theory may be ambiguous as to the effect of some policy change, and in any
case theory rarely tells us how large the effect might be.
It is important to forecast economic, accounting, and financial variables
(inflation, interest rates, exchange rates, sales, firm value, performance, etc.).
What does econometrics mean to us?
Many A&F academics use econometrics in their research.

You will most likely need to use econometrics in your dissertations.

Learning outcomes of the course unit
On completion of this course, you will be able to

1 understand and use appropriate methods for estimating and testing the linear
regression model.
2 understand and use appropriate methods for estimating models using
instrumental variables or those with limited dependent variables.
3 understand and use advanced estimators for static and dynamic panel data
models.
4 appreciate and critically evaluate methods employed by cross-sectional
and panel data studies in accounting, finance, and business economics.
5 develop programming skills required to manage and analyse cross-sectional
and panel data.
6 develop quantitative research skills in conducting an empirical research
project in accounting, finance, and business economics.
7 develop the ability to participate constructively in group work.

Delivery and contact
Delivery via ten Lectures and ten Practical Lectures (PLs): 37.5 hours.
Five two-hour Lectures, followed by five one-hour PLs.
Five three-hour Lectures, followed by five 1.5-hour PLs in the computer cluster.
Private study: approx. 120 hours
We expect you to (1) complete the assigned reading, (2) attend the lectures
and participate actively in class discussions, (3) develop and practice
programming skills, and (4) participate constructively in group work.
Also attempt the weekly problem set for independent study.
Contact by email: vietanh.dang@manchester.ac.uk. TA: Ms. Liu Liu.
My office hours: Fridays 10.30-12.30 (appointment via SOHOL).
Be proactive: see me sooner rather than later if there is any problem.
Feedback on your progress:

Informal feedback via (1) lectures, (2) practical lectures including computer
labs, (3) emails, (4) discussions on Blackboard, and (5) in person.
Summative written feedback on assignment. Formative and summative
feedback on exam.
Your feedback on the course: via the Rep and questionnaire survey.

Textbook, reading, and assessment
Wooldridge, J.M. (2016). Introductory Econometrics, 6th Ed., Cengage Learning.
Available in the Blackwell bookstore or online.
There are also alternatives, including Stock, J.H. and Watson, M.W. (2015).
Introduction to Econometrics, Global Edition, Pearson Education.
E-book versions of these texts are available for sale on the respective
publishers' websites.
A limited number of copies of these texts are available in the library.
Reading for each lecture will be set in advance.
Assessment:
Group coursework (30%): empirical project to analyze real data using
techniques and programming skills learned.
Two-hour unseen examination in January (70%): more on this later (Lecture 10).
Enjoy and have fun!

Overall process in an empirical analysis
1 Step 1: Carefully develop research question(s).

2 Step 2: Specify an economic or conceptual model.
Ex: to study the effects of job training on worker productivity (observed hourly
wage), we can start with an equation such as
wage = f(educ, exper, training),

where educ is a measure of schooling, exper is a measure of workforce
experience, and training is a measure of time spent in job training (the
variable of most interest).
3 Step 3: Turn the economic model into an econometric model.

Ex: specify an econometric model for the wage/job training example as
wage = β0 + β1 educ + β2 exper + β3 training + u.
The constants β0, β1, β2, and β3 (the betas) are the parameters of the model,
and it is these (especially β3 in this example) that we hope to estimate. u is
called the error term or disturbance.
4 Step 4: Collect data on the variables and use econometric methods to
estimate the parameters, and test hypotheses.

Types of data
Data come in several different forms, namely cross-sectional, time-series,
pooled cross sections, and panel data.
We will focus on cross-sectional data.

The last few lectures will look at panel data.
In the second semester, you can study time-series data in more depth.
Cross-sectional data
Data are collected on individuals, families, firms, governments, or some other units
at a given point in time.
We will assume that a cross-sectional data set represents a random sample.
Intuitively, a random sample is representative of the population of interest, and gives
us the best chance of learning about the population.

Cross-sectional data Example

Types of data (cont'd) Time series data
Time series data

Consists of observations for a single entity (firm, government) on variables
observed over a stretch of time.
Examples include stock returns, interest rates, growth rates, unemployment
rates, etc.
Data frequency: daily, weekly, monthly, quarterly, annually, ...
Key feature: the order of observations is important.
Time-series observations are typically serially correlated: we cannot assume
outcomes are independent across observations (that is, across time).
Time trends and seasonality typically need to be controlled for.
Examples
Many examples in macroeconomics and finance.
Estimate the CAPM for a stock using daily stock returns and a market index.
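To make the point about serial correlation concrete, the following minimal sketch compares independent draws with a simulated AR(1) series. The AR(1) process, the persistence value 0.8, the series length, and the seed are illustrative assumptions and are not taken from the lecture.

```python
import random

def lag1_autocorrelation(series):
    """Sample correlation between a series and its own first lag."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in series)
    return num / den

rng = random.Random(0)  # fixed seed so the sketch is reproducible
T = 5000                # hypothetical number of time periods
PHI = 0.8               # assumed persistence parameter (not from the lecture)

# Independent draws, as under random sampling in a cross section.
iid_series = [rng.gauss(0.0, 1.0) for _ in range(T)]

# AR(1) series: each value depends on the previous value plus a new shock.
ar1_series = [0.0]
for _ in range(T - 1):
    ar1_series.append(PHI * ar1_series[-1] + rng.gauss(0.0, 1.0))

rho_iid = lag1_autocorrelation(iid_series)
rho_ar1 = lag1_autocorrelation(ar1_series)
print(f"lag-1 autocorrelation, independent draws: {rho_iid:.3f}")
print(f"lag-1 autocorrelation, AR(1) series:      {rho_ar1:.3f}")
```

The independent draws should show an autocorrelation near zero, while the AR(1) series should show one near the assumed persistence, which is why the order of observations matters for time series.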

Time series data Example

Types of data (cont'd) Panel data
Panel data
The same cross-sectional units are followed over time.
Panel data have a cross-sectional and a time series dimension.
Some important features of panel data:
May account for time-invariant unobservables.

May be used to model dynamics and lagged responses.
Examples
Custodio et al. (JFE, 2013) examine the evolution of corporate debt
maturity: each firm's short-term debt ratio is observed for a number of years.
Time-invariant unobserved firm characteristics may be modeled.
Effects of some firm characteristics on debt maturity may exhibit a time lag.
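As a rough illustration of how panel data can account for time-invariant unobservables, the sketch below simulates firms with an unobserved fixed characteristic that is correlated with the regressor. It then compares a pooled regression with one run on firm-demeaned data. The demeaning (within) transformation is one standard approach chosen here for illustration; the slides do not specify a method, and all numbers are made up.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(1)              # fixed seed for reproducibility
N_FIRMS, N_YEARS = 500, 5           # hypothetical panel dimensions
BETA1 = 1.0                         # assumed true slope

x_all, y_all, x_within, y_within = [], [], [], []
for _ in range(N_FIRMS):
    a = rng.gauss(0.0, 1.0)         # time-invariant unobserved firm effect
    xs = [a + rng.gauss(0.0, 1.0) for _ in range(N_YEARS)]   # x correlated with a
    ys = [BETA1 * xi + a + rng.gauss(0.0, 1.0) for xi in xs]
    x_mean, y_mean = sum(xs) / N_YEARS, sum(ys) / N_YEARS
    x_all += xs
    y_all += ys
    # Subtracting each firm's own time average removes the firm effect a.
    x_within += [xi - x_mean for xi in xs]
    y_within += [yi - y_mean for yi in ys]

pooled_slope = ols_slope(x_all, y_all)
within_slope = ols_slope(x_within, y_within)
print(f"pooled slope: {pooled_slope:.3f}, within slope: {within_slope:.3f}")
```

In this setup the pooled slope picks up the firm effect and drifts away from the assumed value of 1, while the within slope should stay close to it.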

Panel data Example

The simple regression model Basic setup
We begin with cross-sectional analysis and assume we can collect a random

sample from the population of interest.
There are two variables, x and y, and we would like to study how y varies
with changes in x.
Ex: in the wage model, x is years of schooling and y is hourly wage.
We must address three issues:
1 What is the functional relationship between x and y?

2 Do we allow factors other than x to affect y?
3 Can we be sure x causes y?
E.g.: Does more ad spending cause better sales? Or does another year of
education cause an increase in wages?
Finding correlations in data might be suggestive but is rarely conclusive.

The simple regression model
Consider the following simple linear regression model relating y to x:
y = β0 + β1 x + u.
This equation allows for other factors, contained in u, to affect y.

Graphical illustration

Ordinary Least Squares (OLS) estimator
We use a random sample {(xi, yi) : i = 1, 2, ..., n} of size n (the number of
observations) from the population.
OLS estimation: fit a regression line through the data points as well as
possible.

Minimizing the Sum of Squared Residuals (1)
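This is equivalent to solving the following minimization problem:
\[ \min_{\hat\beta_0, \hat\beta_1} S(\hat\beta_0, \hat\beta_1) = \sum_{i=1}^{n} \hat u_i^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2. \tag{1} \]
First-order conditions: $\partial S(\hat\beta_0, \hat\beta_1)/\partial \hat\beta_0 = 0$ and $\partial S(\hat\beta_0, \hat\beta_1)/\partial \hat\beta_1 = 0$, or
\[ -2 \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0, \tag{2} \]
\[ -2 \sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0. \tag{3} \]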
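To solve the equations, drop the constant factor in (2), multiply by $n^{-1}$, and pass the summation operator through:
\[ n^{-1} \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = n^{-1} \sum_{i=1}^{n} y_i - n^{-1} \sum_{i=1}^{n} \hat\beta_0 - n^{-1} \sum_{i=1}^{n} \hat\beta_1 x_i = 0. \tag{4} \]

Recall:
\[ n^{-1} \sum_{i=1}^{n} y_i - n^{-1} \sum_{i=1}^{n} \hat\beta_0 - n^{-1} \sum_{i=1}^{n} \hat\beta_1 x_i = \bar y - \hat\beta_0 - \hat\beta_1 \bar x = 0, \tag{5} \]
which implies
\[ \hat\beta_0 = \bar y - \hat\beta_1 \bar x. \tag{6} \]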
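Next, plug (6) into the second FOC, (3):
\[ \sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = \sum_{i=1}^{n} x_i \left[ y_i - (\bar y - \hat\beta_1 \bar x) - \hat\beta_1 x_i \right] = 0. \tag{7} \]
Rearranging (7) yields
\[ \sum_{i=1}^{n} x_i (y_i - \bar y) = \hat\beta_1 \left[ \sum_{i=1}^{n} x_i (x_i - \bar x) \right]. \]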

Recall that:
\[ \sum_{i=1}^{n} x_i (y_i - \bar y) = \hat\beta_1 \left[ \sum_{i=1}^{n} x_i (x_i - \bar x) \right]. \tag{8} \]
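Given two useful facts about the summation operator (see Appendix A.6):
\[ \sum_{i=1}^{n} x_i (y_i - \bar y) = \sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y), \quad \text{where } \sum_{i=1}^{n} (y_i - \bar y) = 0, \tag{9} \]
\[ \sum_{i=1}^{n} x_i (x_i - \bar x) = \sum_{i=1}^{n} (x_i - \bar x)^2, \quad \text{where } \sum_{i=1}^{n} (x_i - \bar x) = 0. \tag{10} \]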
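Both summation facts follow from the same short argument. The derivation below is a sketch added for completeness and is not reproduced from the slides; it uses only the definitions of $\bar x$ and $\bar y$ as sample averages.

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
  &= \sum_{i=1}^{n} x_i (y_i - \bar{y}) - \bar{x} \sum_{i=1}^{n} (y_i - \bar{y})
   = \sum_{i=1}^{n} x_i (y_i - \bar{y}), \\
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i (x_i - \bar{x}) - \bar{x} \sum_{i=1}^{n} (x_i - \bar{x})
   = \sum_{i=1}^{n} x_i (x_i - \bar{x}),
\end{align*}
% The second term on each line vanishes because deviations from a sample mean sum to zero:
% \sum_{i=1}^{n} (y_i - \bar{y}) = n\bar{y} - n\bar{y} = 0, and likewise for x.
```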
We obtain
\[ \hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2} = \frac{\mathrm{Cov}(x_i, y_i)}{\mathrm{Var}(x_i)}. \tag{11} \]
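The closed-form solution can be checked numerically. The sketch below implements the intercept and slope formulas from the derivation on a small made-up data set (the numbers are purely illustrative) and confirms that nudging either coefficient away from the OLS values increases the sum of squared residuals.

```python
def ols_simple(x, y):
    """Return (b0_hat, b1_hat) for the simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope cannot be computed")
    b1 = sxy / sxx            # slope: sample covariance over sample variance
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through the means
    return b0, b1

def ssr(b0, b1, x, y):
    """Sum of squared residuals S(b0, b1) for a candidate line."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

# Made-up illustrative data (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = ols_simple(x, y)
print(f"b0_hat = {b0:.3f}, b1_hat = {b1:.3f}")
print(f"S at the OLS estimates: {ssr(b0, b1, x, y):.3f}")
print(f"S with intercept shifted by 0.1: {ssr(b0 + 0.1, b1, x, y):.3f}")
print(f"S with slope shifted by 0.1: {ssr(b0, b1 + 0.1, x, y):.3f}")
```

For these data the sums work out by hand to a slope of 6/10 = 0.6 and an intercept of 4 - 0.6 × 3 = 2.2.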

Example (will illustrate in Practical Lecture 2)
Regress CEO salary (in thousands of dollars) on ROE (net income as % of
common equity in the previous three years) using the data in CEOSAL1.dta.
salary = β0 + β1 roe + u.
The estimated equation for the sample of firms is
\[ \widehat{salary} = 936.191 + 18.501\, roe. \]
What do these estimates tell us?
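One way to read the fitted line is to evaluate it at a few values of roe. The sketch below takes the coefficients exactly as printed on the slide; the function name and the chosen roe values (0, 10, 11, 30) are illustrative assumptions.

```python
# Coefficients as reported on the slide
# (salary in thousands of dollars, roe in percent).
B0_HAT = 936.191
B1_HAT = 18.501

def predicted_salary(roe):
    """Fitted salary (thousands of dollars) for a given return on equity."""
    return B0_HAT + B1_HAT * roe

salary_at_0 = predicted_salary(0.0)    # the intercept: fitted salary when roe = 0
salary_at_30 = predicted_salary(30.0)  # fitted salary at a hypothetical roe of 30
change_per_point = predicted_salary(11.0) - predicted_salary(10.0)

print(f"fitted salary at roe = 0:  {salary_at_0:.3f}")
print(f"fitted salary at roe = 30: {salary_at_30:.3f}")
print(f"change in fitted salary per one-point rise in roe: {change_per_point:.3f}")
```

Read this way, the intercept is the fitted salary for a firm with roe equal to zero, and the slope says that a one percentage point higher roe is associated with a fitted salary about 18.5 thousand dollars higher. This describes an association in the sample, not necessarily a causal effect.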

Population vs. Sample regression function

Properties of the OLS estimator
1 The OLS residuals always add up to zero (see (2)):
\[ \sum_{i=1}^{n} \hat u_i = 0. \tag{12} \]
$\hat u$ is the residual computed from the sample data, and is different from the
unobserved error $u$ in $y = \beta_0 + \beta_1 x + u$.
Likewise, $\hat\beta_0$ and $\hat\beta_1$ are different from the parameter values, $\beta_0$ and $\beta_1$.
2 The sample covariance (and therefore the sample correlation) between the
explanatory variable(s) and the residuals is always zero (see (3)):
\[ \sum_{i=1}^{n} x_i \hat u_i = 0. \tag{13} \]
3 The point $(\bar x, \bar y)$ is always on the OLS regression line:
\[ \bar y = \hat\beta_0 + \hat\beta_1 \bar x. \tag{14} \]

Residuals and Fitted values
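The three algebraic properties of OLS can be verified on any data set. The sketch below computes fitted values and residuals for a small made-up sample (illustrative numbers only) and checks each property up to floating-point rounding.

```python
# Made-up illustrative data (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]                    # y-hat for each observation
residuals = [yi - fi for yi, fi in zip(y, fitted)]     # u-hat for each observation

sum_resid = sum(residuals)                                  # property 1: should be 0
sum_x_resid = sum(xi * ui for xi, ui in zip(x, residuals))  # property 2: should be 0
line_at_x_bar = b0 + b1 * x_bar                             # property 3: should equal y_bar

print(f"sum of residuals:        {sum_resid:.2e}")
print(f"sum of x times residual: {sum_x_resid:.2e}")
print(f"line at x_bar: {line_at_x_bar:.3f}, y_bar: {y_bar:.3f}")
```

These properties hold by construction for any sample, because they are just the first-order conditions of the minimization problem; they say nothing about the unobserved errors u.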

Goodness-of-fit (1)
For each observation, we note that $y_i = \hat y_i + \hat u_i$.

Next, we define the Total Sum of Squares (SST), Explained Sum of
Squares (SSE) (STATA calls this the model sum of squares), and Residual
Sum of Squares (or Sum of Squared Residuals, SSR) as follows:
\[ SST = \sum_{i=1}^{n} (y_i - \bar y)^2, \tag{15} \]
\[ SSE = \sum_{i=1}^{n} (\hat y_i - \bar y)^2, \tag{16} \]
\[ SSR = \sum_{i=1}^{n} \hat u_i^2. \tag{17} \]
Each of these is a sample variance when divided by n − 1 (see PL1).
The total variation in y is the sum of the explained variation and
unexplained variation:
SST = SSE + SSR. (18)
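The decomposition of total variation can be confirmed numerically. The following sketch computes the three sums of squares on a small made-up sample (illustrative numbers only) and checks that they add up.

```python
# Made-up illustrative data (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                  # total sum of squares
sse = sum((fi - y_bar) ** 2 for fi in fitted)             # explained sum of squares
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))    # residual sum of squares

print(f"SST = {sst:.3f}, SSE = {sse:.3f}, SSR = {ssr:.3f}")
print(f"SSE + SSR = {sse + ssr:.3f}")
```

For these numbers the hand calculation gives SST = 6, SSE = 3.6, and SSR = 2.4.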

Goodness-of-fit (2)
To measure how well the OLS regression line fits the data, we use:
\[ R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}. \tag{19} \]
The R-squared shows the fraction of the sample variation in y that is
explained by x.
By construction, $0 \le R^2 \le 1$:
$R^2 = 0$ means no linear relationship, while $R^2 = 1$ means a perfect linear
relationship (OLS provides a perfect fit to the data).
As $R^2$ increases, the $y_i$ are closer and closer to falling on the OLS regression
line.
Caution when using R-squared:
Can be low in cross-sectional analysis.
A low R-squared does not always mean the OLS regression is useless.
A high R-squared may say nothing about causality, which is often the aim of the
analysis.
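The two expressions for R-squared can be compared directly. The sketch below computes both on a small made-up sample (illustrative numbers only) and also computes the squared sample correlation between x and y, which in simple regression is a known equivalent way to obtain the same number. The function name is an assumption made for this example.

```python
def r_squared_three_ways(x, y):
    """Return R-squared as SSE/SST, as 1 - SSR/SST, and as the squared correlation."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((fi - y_bar) ** 2 for fi in fitted)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    corr_squared = sxy ** 2 / (sxx * sst)   # squared sample correlation of x and y
    return sse / sst, 1.0 - ssr / sst, corr_squared

# Made-up illustrative data (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r2_explained, r2_residual, r2_corr = r_squared_three_ways(x, y)
print(f"SSE/SST = {r2_explained:.3f}")
print(f"1 - SSR/SST = {r2_residual:.3f}")
print(f"squared correlation = {r2_corr:.3f}")
```

All three numbers agree (0.6 for these data), so x accounts for 60% of the sample variation in y in this made-up sample.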

Example: the salary regression model
In the CEO salary regression, we obtain the following:
\[ \widehat{salary} = 936.191 + 18.501\, roe, \]
$n = 209$, $R^2 = 0.0132$.
How do we interpret the R-squared in this case?

Desirable properties for estimators
β̂i with i = 0, 1 is an estimator of the true value of βi .

The OLS estimator is one of several possible estimators of βi .
Unbiasedness: $E(\hat\beta) = \beta$.
On average, $\hat\beta$ in repeated sampling will be equal to the true value of $\beta$.
There is no systematic over- or under-estimation of the true parameter.
Consistency: $\mathrm{plim}(\hat\beta) = \beta$.
As the sample size gets larger (or increases indefinitely), the estimator
converges in probability to the true value.
This is a large sample (asymptotic) property, a minimal requirement of an
estimator.
Efficiency: $\mathrm{Var}(\hat\beta)$ is smallest.
No other estimator in the class being compared has smaller variance.
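Unbiasedness and consistency are statements about repeated sampling, which a small Monte Carlo experiment can illustrate. In the sketch below, the data-generating process (intercept 1, slope 2, standard normal errors, x drawn uniformly on [0, 10]), the sample sizes, and the number of replications are all assumptions chosen for illustration.

```python
import random
import statistics

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def simulate_slopes(n, reps, beta0=1.0, beta1=2.0, sigma=1.0, seed=0):
    """Draw `reps` random samples of size n and return the OLS slope from each."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.uniform(0.0, 10.0) for _ in range(n)]
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes

small = simulate_slopes(n=20, reps=1000)
large = simulate_slopes(n=200, reps=1000)

mean_small = statistics.mean(small)   # unbiasedness: close to 2 even for small n
sd_small = statistics.stdev(small)
sd_large = statistics.stdev(large)    # consistency: spread shrinks as n grows

print(f"n = 20:  mean slope = {mean_small:.3f}, sd = {sd_small:.3f}")
print(f"n = 200: mean slope = {statistics.mean(large):.3f}, sd = {sd_large:.3f}")
```

The average estimate should sit close to the assumed slope at both sample sizes, while the spread of the estimates should be visibly smaller for the larger sample.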

Unbiasedness, consistency, and efficiency

BLUE and the Gauss-Markov assumptions
What is BLUE?
An estimator is BLUE if it is the best linear unbiased estimator.
Under certain conditions, the OLS estimator can be shown to be BLUE.
The Gauss-Markov assumptions
A1. Linear in parameters: the model is linear in parameters, i.e.,
y = β0 + β1 x + u.
A2-A3. Random sampling (no sample selection) and some variation in x.
A4. Zero conditional mean: conditional on x, each error has zero mean,
E(u | x) = 0.
This means that not only E(u) = 0 but also that any function of x is
uncorrelated with u.
This assumes strict exogeneity of the explanatory variable.
A5. Homoskedasticity: Var(u | x) = σ², which, combined with A4, implies
Var(y | x) = σ², i.e., the variance of y, conditional on x, is constant (the
constant variance assumption).

Homoskedasticity Constant variance assumption

When is homoskedasticity violated? An example

Some remarks on the Gauss-Markov theorem
Proof of unbiasedness of OLS is based on Assumptions A1-A4 (see
Theorem 2.1 in Wooldridge).
Failure of the zero conditional mean assumption causes OLS to be biased.
However, violation of the constant variance assumption does not make OLS
biased.
Proof of efficiency of OLS is more complicated, but a full proof using matrix
notation is available in Appendix E.
Implications of the theorem:
If the standard set of assumptions holds, we should use OLS as it is BLUE.
If, however, any of the assumptions fails to hold, the properties of OLS will be
affected.
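The contrast between the two kinds of assumption failure can be illustrated by simulation. In the sketch below, the first design has errors whose variance grows with x but whose conditional mean is zero. The second design has an error that shares an unobserved component with x, so the zero conditional mean assumption fails. Both designs, the true slope of 2, and all other numbers are assumptions made up for this example.

```python
import random
import statistics

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

BETA0, BETA1 = 1.0, 2.0   # assumed true parameters
N, REPS = 100, 1000       # hypothetical sample size and number of replications
rng = random.Random(42)   # fixed seed for reproducibility

hetero_slopes, endog_slopes = [], []
for _ in range(REPS):
    # Design 1: heteroskedastic errors, E(u | x) = 0 but Var(u | x) rises with x.
    x = [rng.uniform(0.0, 10.0) for _ in range(N)]
    u = [0.5 * xi * rng.gauss(0.0, 1.0) for xi in x]
    y = [BETA0 + BETA1 * xi + ui for xi, ui in zip(x, u)]
    hetero_slopes.append(ols_slope(x, y))

    # Design 2: x and u share an unobserved component z, so E(u | x) != 0.
    z = [rng.gauss(0.0, 1.0) for _ in range(N)]
    x2 = [zi + rng.gauss(0.0, 1.0) for zi in z]
    u2 = [zi + rng.gauss(0.0, 1.0) for zi in z]
    y2 = [BETA0 + BETA1 * xi + ui for xi, ui in zip(x2, u2)]
    endog_slopes.append(ols_slope(x2, y2))

mean_hetero = statistics.mean(hetero_slopes)
mean_endog = statistics.mean(endog_slopes)
print(f"mean slope, heteroskedastic errors:       {mean_hetero:.3f}")
print(f"mean slope, zero conditional mean fails:  {mean_endog:.3f}")
```

Under heteroskedasticity the average estimate should remain close to the true slope of 2. In the second design it should settle near 2.5, since the covariance between x and u is 1 and the variance of x is 2.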

Summary
Discussed the importance of econometrics in studying economics, accounting,
and finance.
Provided key information about the course unit.
Briefly talked about the steps in an empirical analysis.
Looked at three main data structures: cross-sectional, time series, and panel
data.
Introduced the simple linear regression model and derived the OLS estimator.
Looked at the properties of the OLS estimator.
OLS is BLUE under the Gauss-Markov assumptions.
What about models with multiple regressors, i.e., multiple regression analysis?
