
BMAN 70211: Cross Sectional Econometrics

Lecture 1. Introduction and the Simple Linear Regression Model

Dr. Viet Anh Dang

University of Manchester, 2016

V.A. Dang (2016) BMAN 70211 Econometrics Lecture 1 1 / 35


Outline

1 What is econometrics?

2 Key information about this course unit

3 Steps in an empirical analysis and structure of data

4 The simple regression model

5 Deriving the OLS estimator

6 Properties of the OLS estimator

7 OLS and BLUE

8 Summary



What is econometrics?

Econometrics is
 [T]he study of the application of statistical methods to the analysis of
economic phenomena (Tintner, 1953).

 [T]he art and science of using statistical methods for the measurement of
economic relations (Chow, 1985).

The application of statistical and mathematical methods to the analysis of
economic data, with a purpose of giving empirical content to economic
theories and verifying them or refuting them (Maddala, 1992).

Examples of studies using econometrics:

Forecasting the foreign exchange rate.

Studying the relation between economic growth and unemployment.

Investigating the relation between managerial pay and firm performance.



Why study econometrics?

It is important to be able to apply theory to real world data.

Theory may be ambiguous as to the effect of some policy change, and in any
case theory rarely tells us how large the effect might be.

It is important to forecast economic, accounting, and financial variables
(inflation, interest rates, exchange rates, sales, firm value, performance, etc.).

What does econometrics mean to us?

Many A&F academics use econometrics in their research.


You will most likely need to use econometrics in your dissertations.



Learning outcomes of the course unit

On completion of this course, you will be able to


1 understand and use appropriate methods for estimating and testing the linear
regression model.
2 understand and use appropriate methods for estimating models using
instrumental variables or those with limited dependent variables.
3 understand and use advanced estimators for static and dynamic panel data
models.
4 appreciate and critically evaluate methods employed by cross-sectional
and panel data studies in accounting, finance, and business economics.
5 develop programming skills required to manage and analyse cross-sectional
and panel data.
6 develop quantitative research skills in conducting an empirical research
project in accounting, finance, and business economics.
7 develop the ability to participate constructively in group work.



Delivery and contact

Delivery via ten Lectures and ten Practical lectures (PLs): 37.5 hours.

Five two-hour Lectures, followed by five one-hour PLs.


Five three-hour Lectures. Five 1.5-hour PLs in computer cluster.

Private study: approx. 120 hours

We expect you to (1) complete the assigned reading, (2) attend the lectures
and participate actively in class discussions, (3) develop and practice
programming skills, and (4) participate constructively in group work.
Also attempt the weekly problem set for independent study.

Contact by email: vietanh.dang@manchester.ac.uk. TA: Ms. Liu Liu.

My office hours: Fridays 10.30-12.30 (appointment via SOHOL).

Be proactive: see me sooner rather than later if there is any problem.

Feedback on your progress:


Informal feedback via (1) lectures, (2) practical lectures including computer
labs, (3) emails, (4) discussions on Blackboard, and (5) in person.
Summative written feedback on assignment. Formative and summative
feedback on exam.

Your feedback on the course: via the Rep and questionnaire survey.



Textbook, reading, and assessment

Wooldridge, J.M. (2016). Introductory Econometrics, 6th Ed, Cengage Learning.

Available in the Blackwell bookstore or online.


There are also alternatives, including Stock, J.H. and Watson, M.W. (2015).
Introduction to Econometrics, Global Edition, Pearson Education.
E-book versions of these texts are available for sale on the respective
publishers' websites.

A limited number of copies of these texts are available in the library.

Reading for each lecture will be set in advance.

Assessment:

Group coursework (30%): empirical project to analyze real data using
techniques and programming skills learned.
Two-hour unseen examination in January (70%): more on this later (Lecture 10).

Enjoy and have fun!



Overall process in an empirical analysis

1 Step 1: Carefully develop research question(s).


2 Step 2: Specify an economic or conceptual model.
Ex: to study the effects of job training on worker productivity (observed hourly
wage) we can start with an equation such as

wage = f(educ, exper, training),


where educ is a measure of schooling, exper is a measure of workforce
experience, and training is a measure of time spent in job training (the
variable of most interest).

3 Step 3: Turn the economic model into an econometric model.


Ex: specify an econometric model for the wage/job training example as

wage = β0 + β1 educ + β2 exper + β3 training + u.


The constants β0, β1, β2, and β3 (the betas) are the parameters of the model,
and it is these (especially β3 in this example) that we hope to estimate. u is
called the error term or disturbance.
4 Step 4: Collect data on the variables and use econometric methods to
estimate the parameters, and test hypotheses.
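
Steps 3 and 4 can be tried out on simulated data before touching a real data set. The sketch below is only an illustration: the parameter values in `TRUE_BETA`, the variable ranges, and the helper `ols` are assumptions made up for this example (they do not come from the lecture or from any data set). The helper solves the normal equations directly with the Python standard library.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical "true" parameters (beta0, beta1, beta2, beta3), for illustration only.
TRUE_BETA = [1.0, 0.5, 0.1, 0.3]

rng = random.Random(0)
X, wage = [], []
for _ in range(500):
    educ = rng.uniform(8, 20)
    exper = rng.uniform(0, 30)
    training = rng.uniform(0, 10)
    u = rng.gauss(0, 1)                  # error term
    row = [1.0, educ, exper, training]   # leading 1.0 is the intercept column
    X.append(row)
    wage.append(sum(b * v for b, v in zip(TRUE_BETA, row)) + u)

beta_hat = ols(X, wage)
for name, b, bh in zip(["beta0", "beta1", "beta2", "beta3"], TRUE_BETA, beta_hat):
    print(f"{name}: true = {b:.3f}, estimate = {bh:.3f}")
```

Because the data are simulated, the true parameters are known and the estimates can be compared with them directly, which is never possible with real data.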



Types of data

Data come in several different forms, namely cross-sectional, time-series,
pooled cross sections, and panel data.

We will focus on cross-sectional data.


The last few lectures will look at panel data.
In the second semester, you can study time-series data in more depth.

Cross-sectional data

Data are collected on individuals, families, firms, governments, or some other units
at a given point in time.

We will assume that a cross-sectional data set represents a random sample.

Intuitively, a random sample is representative of the population of interest, and gives
us the best chance of learning about the population.



Cross-sectional data: Example


Types of data (cont'd): Time series data

Time series data


Consists of observations for a single entity (firm, government) on variables
observed over a stretch of time.

Examples include stock returns, interest rates, growth rates, unemployment
rates, etc.

Data frequency: daily, weekly, monthly, quarterly, annually,...

Key feature: the order of observations is important.

Time-series observations are typically serially correlated: we cannot assume
outcomes are independent across observations (that is, across time).

The need to control for time trends and seasonality.

Examples

Many examples in macroeconomics and finance.

Estimate the CAPM for a stock using daily stock return and market index.
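
To see what "serially correlated" means in practice, the sketch below compares the lag-1 sample autocorrelation of an independent series with that of a persistent one. The persistent series is generated by a first-order autoregression with coefficient 0.8; that process, the series length, and the function name `lag1_autocorr` are assumptions chosen for illustration and are not part of the lecture.

```python
import random

def lag1_autocorr(series):
    """Sample correlation between each observation and the one before it."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in series)
    return num / den

rng = random.Random(1)
T = 2000

# Independent draws: the order of observations carries no information.
iid = [rng.gauss(0, 1) for _ in range(T)]

# Persistent series: today's value depends on yesterday's (assumed AR(1), coefficient 0.8).
ar1 = [0.0]
for _ in range(T - 1):
    ar1.append(0.8 * ar1[-1] + rng.gauss(0, 1))

print("lag-1 autocorrelation, independent series:", round(lag1_autocorr(iid), 3))
print("lag-1 autocorrelation, persistent series: ", round(lag1_autocorr(ar1), 3))
```

The first number should be close to zero and the second close to 0.8, which is why random-sampling arguments that work for cross sections cannot be carried over to time series without modification.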



Time series data: Example


Types of data (cont'd): Panel data

Panel data
The same cross-sectional units are followed over time.

Panel data have a cross-sectional and a time series dimension.

Some important features of panel data:

May account for time-invariant unobservables.


May be used to model dynamics and lagged responses.

Examples

Custodio et al. (JFE, 2013) examine the evolution of corporate debt
maturity: each firm's short-term debt ratio is observed for a number of years.

Time-invariant unobserved firm characteristics may be modeled.

Effects of some firm characteristics on debt maturity may exhibit a time lag.
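
As a small preview of why panel data can account for time-invariant unobservables, the sketch below builds a tiny artificial panel in which each firm has its own fixed level that is correlated with x. Everything here is hypothetical: the three firms, their effects, the true slope of 2, and the helper `slope` are invented for illustration, and subtracting firm-level time averages is shown as one simple way to remove such effects, not as the method the course will necessarily use.

```python
def slope(xs, ys):
    """OLS slope of ys on xs (simple regression with an intercept)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

# Hypothetical panel: three firms observed for three years each.
# y_it = firm_effect_i + 2 * x_it, with no noise, so the true slope is 2.
firm_effect = {"A": 10.0, "B": 0.0, "C": -10.0}
firm_x = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}

pooled_x, pooled_y, within_x, within_y = [], [], [], []
for firm, xs in firm_x.items():
    ys = [firm_effect[firm] + 2.0 * x for x in xs]
    pooled_x += xs
    pooled_y += ys
    # Subtract each firm's own time average: the firm effect drops out.
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    within_x += [x - x_mean for x in xs]
    within_y += [y - y_mean for y in ys]

pooled_slope = slope(pooled_x, pooled_y)   # ignores the firm effects
within_slope = slope(within_x, within_y)   # uses only variation within each firm
print("pooled slope:", pooled_slope)
print("within-firm slope:", within_slope)
```

In this constructed example the pooled regression gives a slope of -1, the wrong sign, because firms with high x happen to have low fixed levels, while the within-firm regression recovers the true slope of 2.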



Panel data: Example


The simple regression model: Basic setup

We begin with cross-sectional analysis and assume we can collect a random
sample from the population of interest.

There are two variables, x and y, and we would like to study how y varies
with changes in x.
Ex: in the wage model, x is years of schooling and y is hourly wage.

We must address three issues:

1 What is the functional relationship between x and y?


2 Do we allow factors other than x to affect y?
3 Can we be sure x causes y?
E.g.: Does more ad spending cause better sales? Or does another year of
education cause an increase in wages?

Finding correlations in data might be suggestive but is rarely conclusive.



The simple regression model

Consider the following simple linear regression model relating y to x:

y = β0 + β1 x + u.

This equation allows for other factors, contained in u, to affect y.



Graphical illustration



Ordinary Least Squares (OLS) estimator

We use a random sample {(xi, yi) : i = 1, 2, ..., n} of size n (the number of
observations) from the population.
OLS estimation: fit a regression line through the data points as well as
possible.



Minimizing the Sum of Squared Residuals (1)

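This is equivalent to solving the following minimization problem:

\[
\min_{\hat\beta_0,\,\hat\beta_1} S(\hat\beta_0, \hat\beta_1) = \sum_{i=1}^{n} \hat u_i^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2. \tag{1}
\]

First-order conditions: ∂S(β̂0, β̂1)/∂β̂0 = 0 and ∂S(β̂0, β̂1)/∂β̂1 = 0, or

\[
2 \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0. \tag{2}
\]

\[
2 \sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0. \tag{3}
\]

To solve the equations, pass the summation operator through (2):

\[
n^{-1} \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = n^{-1} \sum_{i=1}^{n} y_i - n^{-1} \sum_{i=1}^{n} \hat\beta_0 - n^{-1} \sum_{i=1}^{n} \hat\beta_1 x_i = 0. \tag{4}
\]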



Minimizing the Sum of Squared Residuals (2)

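Recall:

\[
n^{-1} \sum_{i=1}^{n} y_i - n^{-1} \sum_{i=1}^{n} \hat\beta_0 - n^{-1} \sum_{i=1}^{n} \hat\beta_1 x_i = \bar y - \hat\beta_0 - \hat\beta_1 \bar x = 0, \tag{5}
\]

which implies

\[
\hat\beta_0 = \bar y - \hat\beta_1 \bar x. \tag{6}
\]

Next, plug (6) into the second FOC, (3):

\[
\sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = \sum_{i=1}^{n} x_i \left[ y_i - (\bar y - \hat\beta_1 \bar x) - \hat\beta_1 x_i \right] = 0. \tag{7}
\]

Rearranging (7) yields

\[
\sum_{i=1}^{n} x_i (y_i - \bar y) = \hat\beta_1 \left[ \sum_{i=1}^{n} x_i (x_i - \bar x) \right].
\]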



Minimizing the Sum of Squared Residuals (3)

Recall that:

\[
\sum_{i=1}^{n} x_i (y_i - \bar y) = \hat\beta_1 \left[ \sum_{i=1}^{n} x_i (x_i - \bar x) \right]. \tag{8}
\]

Given two useful facts about the summation operator (see Appendix A.6):

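\[
\sum_{i=1}^{n} x_i (y_i - \bar y) = \sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y), \quad \text{where } \sum_{i=1}^{n} (y_i - \bar y) = 0, \tag{9}
\]

\[
\sum_{i=1}^{n} x_i (x_i - \bar x) = \sum_{i=1}^{n} (x_i - \bar x)^2, \quad \text{where } \sum_{i=1}^{n} (x_i - \bar x) = 0. \tag{10}
\]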
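
Both facts follow from expanding the product and using the definition of the sample mean. A short derivation of (9), added here as a supplement rather than taken from the slides, is:

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
  &= \sum_{i=1}^{n} x_i (y_i - \bar{y}) - \bar{x} \sum_{i=1}^{n} (y_i - \bar{y}) \\
  &= \sum_{i=1}^{n} x_i (y_i - \bar{y}) - \bar{x}\,(n\bar{y} - n\bar{y}) \\
  &= \sum_{i=1}^{n} x_i (y_i - \bar{y}).
\end{align*}
```

Replacing y by x throughout gives (10) in the same way.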

We obtain

\[
\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2} = \frac{\operatorname{Cov}(x_i, y_i)}{\operatorname{Var}(x_i)}. \tag{11}
\]
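
Formulas (6) and (11) translate almost line by line into code. The following is a minimal sketch using only the Python standard library; the function name `ols_simple` and the five data points are made up for illustration.

```python
def ols_simple(x, y):
    """Return (b0, b1) for the line y = b0 + b1 * x fitted by OLS."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must vary in the sample, otherwise the slope is undefined")
    b1 = s_xy / s_xx          # equation (11)
    b0 = y_bar - b1 * x_bar   # equation (6)
    return b0, b1

# Hypothetical data, small enough to check by hand.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = ols_simple(x, y)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

For these numbers the sample means are 3 and 4, the numerator of (11) is 6 and the denominator is 10, so the slope is 0.6 and the intercept is 4 - 0.6 × 3 = 2.2. The check on `s_xx` reflects the requirement that x shows some variation in the sample.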



Example (will illustrate in Practical Lecture 2)

Regress CEO salary (in thousands of dollars) on ROE (net income as % of
common equity in the previous three years) using the data in CEOSAL1.dta.

salary = β0 + β1 roe + u.

The estimated equation for the sample of firms is

\[
\widehat{salary} = 936.191 + 18.501\, roe.
\]

What do these estimates tell us?
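
One way to read the estimates is to plug values of roe into the fitted line. The sketch below takes the two coefficients exactly as printed on this slide; the function name `predicted_salary` and the example ROE values are illustrative choices, not part of the lecture.

```python
def predicted_salary(roe, intercept=936.191, slope=18.501):
    """Fitted CEO salary in thousands of dollars for a given ROE in percent.

    The default coefficients are the ones reported on the slide.
    """
    return intercept + slope * roe

for roe in (0, 10, 30):
    print(f"roe = {roe:>2}%  ->  predicted salary = {predicted_salary(roe):.3f} thousand dollars")

# Effect of a one-percentage-point increase in ROE.
print("change per percentage point:", round(predicted_salary(11) - predicted_salary(10), 3))
```

The intercept is the predicted salary when roe is zero, and the slope says that one more percentage point of ROE is associated with a predicted salary that is higher by 18.501 thousand dollars, that is, $18,501.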


Population vs. Sample regression function



Properties of the OLS estimator

1 The OLS residuals always add up to zero (see (2)):

\[
\sum_{i=1}^{n} \hat u_i = 0. \tag{12}
\]

û is the residual computed from the sample data, and is different from the
unobserved error u in y = β0 + β1 x + u.
Likewise, β̂0 and β̂1 are different from the parameter values, β0 and β1.
2 The sample covariance (and therefore the sample correlation) between the
explanatory variable(s) and the residuals is always zero (see (3)):

\[
\sum_{i=1}^{n} x_i \hat u_i = 0. \tag{13}
\]

3 The point (x̄, ȳ) is always on the OLS regression line:

\[
\bar y = \hat\beta_0 + \hat\beta_1 \bar x. \tag{14}
\]
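
These three properties hold for any sample by construction, so they can be checked numerically. The sketch below does so on five made-up data points; the helper name and the data are assumptions for illustration only.

```python
def ols_simple(x, y):
    """Return (b0, b1) for the OLS line y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = ols_simple(x, y)

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)

sum_resid = sum(residuals)                                  # property (12)
sum_x_resid = sum(xi * ui for xi, ui in zip(x, residuals))  # property (13)
gap_at_means = y_bar - (b0 + b1 * x_bar)                    # property (14)

print("sum of residuals:      ", sum_resid)
print("sum of x_i * residual: ", sum_x_resid)
print("y_bar minus fitted value at x_bar:", gap_at_means)
```

All three printed numbers are zero up to floating-point rounding. Note that this says nothing about the unobserved errors u, only about the residuals û.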



Residuals and Fitted values



Goodness-of-fit (1)

For each observation, we note that yi = ŷi + ûi .


Next, we define the Total Sum of Squares (SST), Explained Sum of
Squares (SSE) (STATA calls this the model sum of squares), and Residual
Sum of Squares (or Sum of Squared Residuals, SSR) as follows:

\[
SST = \sum_{i=1}^{n} (y_i - \bar y)^2, \tag{15}
\]

\[
SSE = \sum_{i=1}^{n} (\hat y_i - \bar y)^2, \tag{16}
\]

\[
SSR = \sum_{i=1}^{n} \hat u_i^2. \tag{17}
\]

Each of these is a sample variance when divided by n − 1 (see PL1).

The total variation in y is the sum of the explained variation and
unexplained variation:

\[
SST = SSE + SSR. \tag{18}
\]
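
The decomposition in (18) can be verified on a small example. The data and helper below are hypothetical and serve only to show how (15) to (17) are computed.

```python
def ols_simple(x, y):
    """Return (b0, b1) for the OLS line y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = ols_simple(x, y)

y_bar = sum(y) / len(y)
fitted = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                # equation (15)
sse = sum((fi - y_bar) ** 2 for fi in fitted)           # equation (16)
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # equation (17)

print(f"SST = {sst:.3f}, SSE = {sse:.3f}, SSR = {ssr:.3f}")
print(f"SSE + SSR = {sse + ssr:.3f}")
```

For these numbers SST is 6, SSE is 3.6 and SSR is 2.4, so 60% of the sample variation in y is explained by the fitted line.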



Goodness-of-fit (2)

To measure how well the OLS regression line fits the data, we use:

\[
R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}. \tag{19}
\]

The R-squared shows the fraction of the sample variation in y that is
explained by x.
By construction, 0 ≤ R² ≤ 1:
R² = 0 means no linear relationship, while R² = 1 means a perfect linear
relationship (OLS provides a perfect fit to the data).
As R² increases, the yi are closer and closer to falling on the OLS regression
line.

Caution when using R-squared:


Can be low in cross-sectional analysis.
Low R-squared does not always mean the OLS regression is useless.
High R-squared may say nothing about causality, which is often the aim of the
analysis.
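
The point that a low R-squared need not make a regression useless can be illustrated by simulation. In the sketch below the true slope is 0.5 and the error term is deliberately very noisy; all numbers (sample size, parameters, error standard deviation) and the function name `fit_and_r2` are assumptions chosen for the illustration.

```python
import random

def fit_and_r2(x, y):
    """Return (b0, b1, R-squared) for the simple OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, 1 - ssr / sst

rng = random.Random(7)
n = 5000
x = [rng.uniform(0, 10) for _ in range(n)]
# Hypothetical population model: y = 1 + 0.5 x + u, with a large error variance.
y = [1.0 + 0.5 * xi + rng.gauss(0, 5) for xi in x]

b0, b1, r2 = fit_and_r2(x, y)
print(f"estimated slope = {b1:.3f} (true value 0.5), R-squared = {r2:.3f}")
```

With this design x accounts for well under 10% of the variation in y, yet the slope estimate should land close to 0.5, so the regression is still informative about the effect of x.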



Example: the salary regression model

In the CEO salary regression, we obtain the following:

\[
\widehat{salary} = 936.191 + 18.501\, roe.
\]

n = 209, R² = 0.0132.
How do we interpret the R-squared in this case?



Desirable properties for estimators

β̂i with i = 0, 1 is an estimator of the true value of βi .


The OLS estimator is one of several possible estimators of βi .
Unbiasedness: E (β̂) = β .
On average, β̂ in repeated sampling will be equal to the true value of β.
There is no over- or under-estimation of the true parameter.

Consistency: plim (β̂) = β.


As the sample size gets larger (or increases indefinitely), the estimator
converges to the true value.
This is a large sample (asymptotic) property, a minimal requirement of an
estimator.

Efficiency: Var(β̂) is smallest.

No other estimator has smaller variance.
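
"Repeated sampling" can be made concrete with a small Monte Carlo experiment: draw many samples from a known population, estimate the slope in each, and look at the distribution of the estimates. The design below (true slope 2, uniform x, normal errors, sample sizes 20 and 200, 500 replications) is an assumption made for illustration, as are the function names.

```python
import random

def ols_slope(x, y):
    """OLS slope from a simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))

def simulate_slopes(n, reps, rng, beta0=1.0, beta1=2.0):
    """Slope estimates from `reps` independent samples of size n."""
    slopes = []
    for _ in range(reps):
        x = [rng.uniform(0, 10) for _ in range(n)]
        y = [beta0 + beta1 * xi + rng.gauss(0, 3) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(42)
small = simulate_slopes(n=20, reps=500, rng=rng)
large = simulate_slopes(n=200, reps=500, rng=rng)

print(f"n = 20:  mean of estimates = {mean(small):.3f}, variance = {variance(small):.4f}")
print(f"n = 200: mean of estimates = {mean(large):.3f}, variance = {variance(large):.4f}")
```

In both cases the average estimate should sit near the true value of 2, which is what unbiasedness means, while the spread of the estimates shrinks as n grows, which is the behaviour consistency describes.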



Unbiasedness, consistency, and efficiency


BLUE and the Gauss-Markov assumptions

What is BLUE?
An estimator is BLUE if it is the best linear unbiased estimator.

Under certain conditions, the OLS estimator can be shown to be BLUE.

The Gauss-Markov assumptions

A1. Linear in parameters: the model is linear in parameters, i.e.,
y = β0 + β1 x + u.
A2-A3. Random sampling (no sample selection) and some variation in x.
A4. Zero conditional mean: conditional on x, each error has zero mean,
E(u | x) = 0.
This means that not only E(u) = 0 but also that any function of x is
uncorrelated with u.
This assumes strict exogeneity of the explanatory variable.

A5. Homoskedasticity: Var(u | x) = σ², which, combined with A4, implies
Var(y | x) = σ², i.e., the variance of y, conditional on x, is constant (the
constant variance assumption).



Homoskedasticity: Constant variance assumption


When is homoskedasticity violated? An example



Some remarks on the Gauss-Markov theorem

Proof of unbiasedness of OLS is based on Assumptions A1-A4 (see
Theorem 2.1 in Wooldridge).

Failure of the zero conditional mean assumption causes OLS to be biased.


However, violation of the constant variance assumption does not make OLS
biased.

Proof of efficiency of OLS is more complicated but a full proof using matrix
notation is available in Appendix E.

Implications of the theorem:

If the standard set of assumptions holds, we should use OLS as it is BLUE.

If, however, any of the assumptions fails to hold, the properties of OLS will be
affected.
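
The difference between the two kinds of failure can be seen in a simulation. The sketch below compares (a) errors whose variance depends on x but whose conditional mean is zero with (b) errors that share an unobserved factor with x, so that the zero conditional mean assumption fails. Both designs, the true slope of 2, and all names are hypothetical choices made for this illustration.

```python
import random

def ols_slope(x, y):
    """OLS slope from a simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))

BETA0, BETA1 = 1.0, 2.0   # hypothetical true parameters
N, REPS = 100, 300
rng = random.Random(123)

hetero_slopes, endog_slopes = [], []
for _ in range(REPS):
    # (a) Heteroskedastic errors: Var(u | x) grows with x, but E(u | x) = 0.
    x = [rng.uniform(0, 10) for _ in range(N)]
    y = [BETA0 + BETA1 * xi + xi * rng.gauss(0, 1) for xi in x]
    hetero_slopes.append(ols_slope(x, y))

    # (b) Zero conditional mean fails: an unobserved factor z moves both x and u.
    z = [rng.gauss(0, 1) for _ in range(N)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [BETA0 + BETA1 * xi + zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    endog_slopes.append(ols_slope(x, y))

mean_hetero = sum(hetero_slopes) / REPS
mean_endog = sum(endog_slopes) / REPS
print(f"(a) heteroskedastic errors:      average slope = {mean_hetero:.3f}")
print(f"(b) zero conditional mean fails: average slope = {mean_endog:.3f}")
```

In design (a) the average estimate should stay close to 2, whereas in design (b) the covariance between x and u is 1 and the variance of x is 2, so the estimates centre near 2.5 rather than 2, no matter how many samples are drawn.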



Summary

Discussed the importance of econometrics in studying economics, accounting,
and finance.

Provided key information about the course unit.

Briefly talked about the steps in an empirical analysis.

Looked at three main data structures: cross-sectional, time series, and panel
data.

Introduced the simple linear regression model and derived the OLS estimator.

Looked at the properties of the OLS estimator.

OLS is BLUE under the Gauss-Markov assumptions.

What about models with multiple regressors, i.e., multiple regression analysis?
