
Note #3

The Simple Regression Model

Shif Gurmu

1. Definition of the Model

The simple regression model can be used to study the relationship between two variables, say y and x. In particular, we are interested in explaining y in terms of x, or in studying how y varies with changes in x. For example, if y is the hourly wage rate and x is education, we might be interested in how changes in years of schooling affect the hourly wage rate. Another example is how the crime rate in a city (y) varies with changes in the number of police officers (x). Although the model is simple and has limitations in empirical analysis, learning how to interpret the simple regression model is good practice for studying multiple regression later on.

There are three important issues in specifying a simple regression model that will explain y in terms of x.

1. Since there is never an exact relationship between two variables, how do we allow for other factors to affect y?

2. What is the functional relationship between y and x (e.g., between y = wage and x = years of schooling)?

3. If desired, how do we capture the ceteris paribus relationship between y and x?

These issues can be resolved by specifying an equation relating y to x. A linear equation relating y to x takes the form

$$y = \beta_0 + \beta_1 x + \varepsilon, \qquad (1)$$

where the various terms are defined as follows: y is the dependent variable, the explained variable, the response variable, the predicted variable, or the regressand; x is the independent variable, the explanatory variable, the control variable, the predictor variable, the regressor, or the covariate. Here $\beta_0$ and $\beta_1$ are unknown parameters; $\beta_0$ is the intercept parameter (or the constant parameter) and $\beta_1$ is the slope parameter. Finally, the variable $\varepsilon$, called the error term or the disturbance term in the relationship, represents factors other than x that affect y. Both x, which is observed, and the unobserved factor $\varepsilon$ affect y.

Table 1. Some Examples

    Dependent Variable (y)       Independent Variable (x)
    Salary                       Experience
    Hourly wage                  Education
    Crime rate in cities         Number of police officers
    Expenditure on clothing      Disposable income
    Corn output                  Amount of fertilizer

If the other factors in $\varepsilon$ are held constant, so that the change in $\varepsilon$ is zero ($\Delta\varepsilon = 0$), then from equation (1), x has a linear effect on y:

$$\Delta y = \beta_1 \Delta x. \qquad (2)$$

That is, holding the other factors in $\varepsilon$ fixed, the change in y is simply the slope $\beta_1$ times the change in x. For example, a simple linear regression model relating a person's hourly wage in dollars (wage) to observed years of education (educ) and other unobserved factors ($\varepsilon$) is

$$wage = \beta_0 + \beta_1\, educ + \varepsilon. \qquad (3)$$

Here $\beta_1$ measures the change in hourly wage given another year of education, holding all other factors fixed. In empirical applications, we are interested in the ceteris paribus effect of x on y. To achieve this, we need to make assumptions about the relationship between the random variables x and $\varepsilon$. We cannot estimate the causal effect of x on y without taking the relationship between x and $\varepsilon$ into account. Just assuming that the factors in $\varepsilon$ are held constant will not be helpful in estimating the causal effect of x on y.

One simple assumption is that the average value of the error term in the population is zero:

$$E(\varepsilon) = 0. \qquad (4)$$

However, this zero mean assumption does not say anything about how x and $\varepsilon$ are related. Since x and $\varepsilon$ are random variables, we can define the conditional distribution of $\varepsilon$ given x, and hence get the conditional mean of $\varepsilon$ given x. The crucial assumption of the simple linear regression model is the zero conditional mean assumption:

$$E(\varepsilon|x) = 0. \qquad (5)$$

This says that knowing something about x does not give us any information about $\varepsilon$. Assumption (5) follows from $E(\varepsilon|x) = E(\varepsilon)$, which says that $\varepsilon$ is mean independent of x, and the zero mean assumption (4).

The zero conditional mean assumption (5) implies that

$$E(y|x) = \beta_0 + \beta_1 x. \qquad (6)$$

This shows that the population regression function is a linear function of x. For any given value of x, the distribution of y is centered about $E(y|x)$. The linearity means that a one-unit increase in x changes the expected value of y by the amount $\beta_1$. That is,

$$\frac{\Delta E(y|x)}{\Delta x} = \beta_1. \qquad (7)$$

Note that the change in the conditional mean of y, given x, is simply $\beta_1$ times the change in x:

$$\Delta E(y|x) = \beta_1 \Delta x \qquad (8)$$

for the population.

2. Estimation: Ordinary Least Squares Estimates

The basic idea of regression is to estimate the population parameters using a sample. Let $\{(y_i, x_i) : i = 1, 2, \ldots, n\}$ denote a random sample of size n from the population. Since the data come from (1), the regression model for each i is

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i. \qquad (9)$$

We need to estimate the unknown regression parameters, $\beta_0$ and $\beta_1$. There are different ways of estimating the intercept and slope parameters, including the method of moments and the ordinary least squares (OLS) approach. The focus here is on the OLS approach; see Wooldridge (2013), page 28 in Section 2.2, for an explanation of the method of moments approach.

In the OLS approach, we choose the estimates $\hat\beta_0$ and $\hat\beta_1$ to minimize the sum of squared residuals. Define the predicted regression line for the i-th observation as

$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i. \qquad (10)$$

Then the residual for the i-th observation is

$$\hat\varepsilon_i = y_i - \hat{y}_i, \quad \text{or} \quad \hat\varepsilon_i = y_i - \hat\beta_0 - \hat\beta_1 x_i. \qquad (11)$$

Figure 1 of Problems for Class Discussion #1 gives the scatter diagram of salary against experience. In the context of the regression of y on x, the fitted values and residuals are depicted in Figure 2.4 in your textbook.

In the method of ordinary least squares, we choose $\hat\beta_0$ and $\hat\beta_1$ simultaneously to make

$$\sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right)^2 \qquad (12)$$

as small as possible. This leads to two equations in two unknowns:

$$\sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right) = 0 \qquad (13)$$

$$\sum_{i=1}^{n} x_i \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right) = 0$$

Rearranging, we get

$$n \hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \qquad (14)$$

$$\hat\beta_0 \sum_{i=1}^{n} x_i + \hat\beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$

These are called the normal equations or first order conditions.

Solving the normal equations, we obtain the OLS estimates:

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad (15)$$

and

$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}. \qquad (16)$$

The ensuing predicted regression line is

$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i.$$

It follows that

$$\frac{\Delta \hat{y}_i}{\Delta x_i} = \hat\beta_1, \qquad (17)$$

showing that the slope estimate $\hat\beta_1$ is the amount by which $\hat{y}_i$ changes when $x_i$ increases by one unit. For any change in $x_i$, the predicted change in $\hat{y}_i$ is given by

$$\Delta \hat{y}_i = \hat\beta_1 \Delta x_i. \qquad (18)$$
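
To make formulas (15) and (16) concrete, here is a minimal Stata sketch, our own illustration rather than part of the original notes, assuming generic variables y and x are already in memory. It computes the slope as the sample covariance over the sample variance (which equals equation (15)) and checks the result against the regress command:

    * Slope via equation (15), written as sample covariance over variance.
    quietly correlate x y, covariance
    matrix C = r(C)
    scalar b1 = C[1,2]/C[1,1]        // equation (15)
    quietly summarize x, meanonly
    scalar xbar = r(mean)
    quietly summarize y, meanonly
    scalar b0 = r(mean) - b1*xbar    // equation (16)
    display "b1 = " b1 "   b0 = " b0
    regress y x                      // should reproduce b0 and b1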

Let us consider two examples using output from Stata.

Example 1 - From Problems for Class Discussion #1: In this illustration, we examine the relationship between experience and salary of 32 economists from the University of Michigan, using data from 1983. Let salary = salary in thousands of dollars (y) and exper = years of experience, defined as years since receiving the Ph.D. (x). A model relating an individual's salary to observed years of experience and other unobserved factors is

$$salary_i = \beta_0 + \beta_1\, exper_i + \varepsilon_i. \qquad (19)$$

Using Stata's regress command, the estimated regression line for salary is

$$\widehat{salary} = 39.314831 + 0.439907\, exper. \qquad (20)$$

How do we interpret the equation? First, if experience is zero, then the predicted salary is the intercept, 39.31483 in thousands of dollars, or $39,314.83. The predicted change in salary is given by

$$\Delta \widehat{salary} = 0.439907\, \Delta exper,$$

so that

$$\frac{\Delta \widehat{salary}}{\Delta exper} = 0.439907.$$

This says that if experience increases by one year, then salary is predicted to increase by about $439.91, or 0.43991 in 1000s of dollars.

What is the predicted salary for a person with 10 years of experience? Using (20), the predicted salary is

$$\widehat{salary} = 39.314831 + 0.439907 \times 10 = 43.713901,$$

or $43,713.90 in 1983 dollars.
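
The same prediction can be reproduced right after the regression from the stored coefficients. A sketch (the _b[] syntax is standard Stata; the data set is the one used in the class problem):

    regress salary exper
    * Predicted salary (in 1000s of dollars) at 10 years of experience:
    display _b[_cons] + _b[exper]*10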

Example 2 - From Problems for Class Discussion #2: This illustration is based on ZIP code-level data on prices for various items at fast-food restaurants, along with characteristics of the ZIP code population, in New Jersey and Pennsylvania. The purpose is to see whether fast-food restaurants charge higher prices in areas with a large concentration of blacks. In specification 2.1 (see the do file), we estimate a simple linear model:

$$psoda_i = \beta_0 + \beta_1\, prpblck_i + \varepsilon_i, \qquad (21)$$

where $psoda_i$ is the price of soda and $prpblck_i$ is the proportion black in the i-th ZIP code.

The estimated soda price equation is

$$\widehat{psoda}_i = 1.037399 + 0.0649269\, prpblck_i,$$

so that the predicted change in the price of soda is

$$\Delta \widehat{psoda}_i = 0.0649269\, \Delta prpblck_i.$$

This means that if, say, prpblck increases by 0.10 (ten percentage points), the price of soda is estimated to increase by about 0.0065 dollars (0.0649 × 0.10), or about 0.65 cents.
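
As a sketch, this marginal effect calculation can be read off the stored slope (variable names as in the class do file):

    regress psoda prpblck
    * Estimated change in soda price for a 0.10 increase in prpblck:
    display _b[prpblck]*0.10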

3. Algebraic Properties of OLS and Goodness-of-Fit

► Algebraic Properties

The following are the three most important algebraic properties of OLS estimates and their associated statistics.

1. The point $(\bar{x}, \bar{y})$ is always on the OLS regression line. That is, if we plug in $\bar{x}$ for $x_i$ in equation (10), the predicted value is $\bar{y}$.

2. The sum of the residuals, and therefore the sample average of the residuals, is zero. That is,

$$\sum_{i=1}^{n} \hat\varepsilon_i = 0. \qquad (22)$$

This follows from the first line of the first order conditions given in the simultaneous equation system (13).

3. The sample covariance between the regressors and the OLS residuals is zero:

$$\sum_{i=1}^{n} x_i \hat\varepsilon_i = 0. \qquad (23)$$

This follows from the second line of the first order conditions given in the simultaneous equation system (13); namely,

$$\sum_{i=1}^{n} x_i \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right) = 0.$$
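
These properties are easy to confirm numerically after any regression. A minimal sketch using the salary example, with yhat and uhat as our own variable names:

    regress salary exper
    predict yhat                 // fitted values
    predict uhat, residuals      // OLS residuals
    summarize uhat               // mean is zero up to rounding error
    correlate exper uhat         // correlation with x is zero up to rounding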

► Decomposition of the Total Sum of Squares in the Dependent Variable, y

We can view OLS as decomposing each $y_i$ into two parts, a fitted value and a residual:

$$y_i = \hat{y}_i + \hat\varepsilon_i. \qquad (24)$$

Since the average of the residuals is zero, $\bar{\hat\varepsilon} = 0$, the average of the fitted values $\hat{y}_i$ is the same as the average of $y_i$. That is, $\bar{\hat{y}} = \bar{y}$.

Next, define three sums of squares associated with the regression. First, the total sum of squares (SST) is a measure of the total variation in the dependent variable y. This is given as

$$SST = \sum_i (y_i - \bar{y})^2. \qquad (25)$$

Observe that SST/(n − 1) is the sample variance of y, $s_y^2 = \frac{1}{n-1} \sum_i (y_i - \bar{y})^2$. Second, the explained sum of squares (SSE) is

$$SSE = \sum_i (\hat{y}_i - \bar{y})^2, \qquad (26)$$

where we use $\bar{\hat{y}} = \bar{y}$. SSE measures the sample variation in $\hat{y}$. Finally, the residual sum of squares (SSR) measures the sample variation in the residuals $\hat\varepsilon_i$, and is given as

$$SSR = \sum_i \hat\varepsilon_i^2. \qquad (27)$$

The total sum of squares in y can be expressed as the sum of the explained variation and the residual sum of squares (the unexplained variation):

$$SST = SSE + SSR. \qquad (28)$$

This follows by squaring (24), or equivalently

$$(y_i - \bar{y}) = (\hat{y}_i - \bar{y}) + \hat\varepsilon_i,$$

and then summing and simplifying to get

$$\sum_i (y_i - \bar{y})^2 = \sum_i (\hat{y}_i - \bar{y})^2 + \sum_i \hat\varepsilon_i^2.$$

The cross term $2 \sum_i (\hat{y}_i - \bar{y}) \hat\varepsilon_i$ can be shown to be equal to zero.

Example 3: In the output from step 7 of the Problems for Class Discussion #1, we have SSE = 425.201684, SSR = 1880.12047, and SST = 2305.32216, so that 2305.32216 = 425.201684 + 1880.12047.
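
In Stata, these three quantities can be read from the ANOVA block of the regress output or from the stored results, where they are labeled Model, Residual, and Total. A sketch:

    regress salary exper
    display "SSE = " e(mss)             // model (explained) sum of squares
    display "SSR = " e(rss)             // residual sum of squares
    display "SST = " e(mss) + e(rss)    // should equal the Total line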

► Goodness-of-Fit

We want a measure of how well the independent variable, x, explains the dependent variable, y. To obtain one, divide both sides of equation (28) by SST to get

$$1 = \frac{SSE}{SST} + \frac{SSR}{SST}.$$

Consequently, the R-squared of the regression, or the coefficient of determination, is defined as the ratio of the explained variation to the total variation:

$$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}. \qquad (29)$$

$R^2$ is the proportion of the sample variation in y that is explained by x. The value of $R^2$ is always between 0 and 1. An R-squared close to zero indicates a poor fit of the regression line. OLS gives a perfect fit to the data if all points lie on the same straight line, in which case $R^2 = 1$. It can be shown that $R^2$ is the square of the sample correlation between $y_i$ and $\hat{y}_i$.

Example 4: In Problems for Class Discussion #1, $R^2 = 0.1844$ from the Stata regression output. This implies that 18.44% of the total variation in salary is explained by experience. It also means that about 81.56% of the salary variation for the economists at Michigan remains unexplained. This is not surprising, since other important determinants of salary are not included in the regression.
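
A quick sketch of the correlation interpretation of R-squared, using a hypothetical fitted-value variable yhat:

    regress salary exper
    predict yhat
    correlate salary yhat
    display r(rho)^2         // equals e(r2) = 0.1844 up to rounding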

4. Nonlinear Models - Incorporating Nonlinearities in Simple Regression

► Introduction

So far, we have assumed a linear relationship between the dependent and independent variables. The linearity assumption may be restrictive for some economic applications. For example, we can expect nonlinear relationships in the following cases: (a) output and labor input, (b) salary and experience, (c) number of doctor visits and age. Why? Here, we focus on nonlinear models that can be transformed into linear models by suitably defining the dependent and independent variables. For now, we cover transformed models involving logarithms, where y and/or x appear in logarithmic form.

The simple linear regression model we considered earlier, namely

$$y = \beta_0 + \beta_1 x + \varepsilon,$$

is said to be in level-level form. As we saw earlier, a level-level model implies that a unit increase in x leads to a change in the expected value of y equal to $\beta_1$, a constant amount irrespective of the value of x.

We consider three functional forms where the dependent variable and/or the explanatory variable appear in natural logarithmic form.

1. Double-Log (Log-Log or Constant Elasticity) Model - In the constant elasticity model, both y and x are in logs. The model is

$$\log y = \beta_0 + \beta_1 \log x + \varepsilon, \qquad (30)$$

where log stands for the natural logarithm and $\beta_1$ is the elasticity of y with respect to x. That is, if $\Delta\varepsilon = 0$, then

$$\frac{\%\Delta y}{\%\Delta x} = \beta_1. \qquad (31)$$

The unknown parameters can be estimated from a simple linear regression of log y on log x, including the intercept term.

The double-log model is derived from a Cobb-Douglas type nonlinear equation relating y to x:

$$y = \alpha_0 x^{\beta_1} e^{\varepsilon}. \qquad (32)$$

Taking the log of both sides of (32) gives the double-log model (30), where $\beta_0 = \log \alpha_0$.

2. Log-Level (Semi-log 1) Model - The dependent variable is in log form while the explanatory variable is in level form. The log-level model takes the form

$$\log y = \beta_0 + \beta_1 x + \varepsilon. \qquad (33)$$

The intercept and slope parameters can be obtained from an OLS regression of log y on a constant term and x. In this model, a unit increase in x changes y by a constant percentage. That is, if the change in $\varepsilon$ is zero, then

$$\frac{\%\Delta y}{\Delta x} = 100\,\beta_1. \qquad (34)$$

The quantity $100\,\beta_1$ is called the semi-elasticity of y with respect to x.

3. Level-Log (Semi-log 2) Model - Here x is logged, but y is not. The level-log model is

$$y = \beta_0 + \beta_1 \log x + \varepsilon. \qquad (35)$$

The unknown parameters are obtained from the OLS regression of y on a constant and log x. Assuming $\Delta\varepsilon = 0$, the slope parameter divided by 100, $\beta_1/100$, is the change in y as a result of a 1% increase in x. That is,

$$\frac{\Delta y}{\%\Delta x} = \beta_1/100. \qquad (36)$$

Table 2 gives a summary of the functional forms considered above.

Table 2. Summary of Functional Forms Involving Logarithms

    Model         Depen. Var.   Indep. Var.   Interpretation of β₁      Alternative Interp. of β₁
    level-level   y             x             Δy = β₁ Δx                Δy/Δx = β₁
    log-log       log y         log x         %Δy = β₁ %Δx              %Δy/%Δx = β₁
    log-level     log y         x             %Δy = (100 β₁) Δx         %Δy/Δx = 100 β₁
    level-log     y             log x         Δy = (β₁/100) %Δx         Δy/%Δx = β₁/100
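
In practice, each log form is estimated by generating the logged variables and running an ordinary regression. A sketch, assuming the raw variables psoda, income, and prpblck are in memory (lpsoda and lincome are the logged-variable names used in the class do file):

    generate lpsoda  = log(psoda)
    generate lincome = log(income)
    regress lpsoda prpblck      // log-level model, specification 3.1
    regress lpsoda lincome      // log-log model, specification 3.3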

Example 5 - From Problems for Class Discussion #2: In specification 3.1, we estimate a semi-elasticity model of the log of the price of soda on the proportion black:

$$lpsoda = \beta_0 + \beta_1\, prpblck + \varepsilon,$$

to get the predicted equation:

$$\widehat{lpsoda} = 0.0331 + 0.0625\, prpblck.$$

This means that if the proportion black increases by 0.20, the price of soda is estimated to increase by about 1.25% (100 × 0.0625 × 0.20), which is the log-level form interpretation.

Example 6 - From Problems for Class Discussion #2: Specification 3.3 is based on the constant elasticity model, where the coefficient estimates are obtained from the OLS regression of the log of the price of soda on log income. The estimated equation is

$$\widehat{lpsoda} = -0.3614 + 0.0375\, lincome.$$

The slope estimate is $\hat\beta_1 = 0.0375$. If median income in the ZIP code increases by 10%, then the price of soda is predicted to increase by 0.375% (0.0375 × 10).
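
The 10% calculation is again simple arithmetic on the stored slope. A sketch:

    regress lpsoda lincome
    * Predicted percentage change in soda price for a 10% rise in income:
    display _b[lincome]*10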

5. The Means and Variances of the OLS Estimators

We now consider the statistical properties of the OLS estimators, where we view $\hat\beta_0$ and $\hat\beta_1$ as estimators for the parameters $\beta_0$ and $\beta_1$ in model (1). In doing so, we need to formally state the assumptions of the simple linear regression (SLR) model along the way. Following Wooldridge (2013), we number the assumptions using the prefix "SLR".

5.1 Unbiasedness of OLS

We start with the assumption that defines the population model using a linear functional form.

Assumption SLR.1 (Linear in Parameters): The model relating the dependent variable y to the explanatory variable x and the disturbance $\varepsilon$ is

$$y = \beta_0 + \beta_1 x + \varepsilon, \qquad (37)$$

where $\beta_0$ and $\beta_1$ are the population intercept and slope parameters, respectively.

Since we are interested in estimating the unknown parameters $\beta_0$ and $\beta_1$ using data on y and x, we assume that our data are obtained as a random sample.

Assumption SLR.2 (Random Sampling): We have a random sample of size n, $\{(x_i, y_i) : i = 1, 2, \ldots, n\}$, from the population.

The model can now be written in terms of the random sample:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n, \qquad (38)$$

where the subscript i refers to observation i, such as a person, country, city, and so on.

The next assumption requires that we have sample variation in the independent variable x.

Assumption SLR.3 (Sample Variation in the Explanatory Variable): The sample observations on x, $\{x_i : i = 1, \ldots, n\}$, are not all the same value. That is,

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0.$$

If there is no variability in x, we cannot estimate the model. In any case, there is no point in doing regression if the explanatory variable is constant for all observations.

To establish the unbiasedness of the OLS estimators, we need the zero conditional mean assumption we considered earlier (see around equation (5)).

Assumption SLR.4 (Zero Conditional Mean): The error term $\varepsilon$ has an expected value of zero given any value of the independent variable. That is,

$$E(\varepsilon|x) = 0.$$

Assumption SLR.4 implies that

$$E(y|x) = \beta_0 + \beta_1 x,$$

which is the population regression line. For a random sample, Assumption SLR.4 implies that

$$E(\varepsilon_i|x_i) = 0$$

and

$$E(y_i|x_i) = \beta_0 + \beta_1 x_i.$$

Now we turn to the unbiasedness property of the OLS estimators:

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)},$$

which is the slope estimator of $\beta_1$, and

$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x},$$

the intercept estimator.

★ Main Result on Unbiasedness of OLS

It can be shown that, under Assumptions SLR.1 through SLR.4, $\hat\beta_0$ is an unbiased estimator of $\beta_0$ and $\hat\beta_1$ is an unbiased estimator of $\beta_1$. In other words,

$$E(\hat\beta_0) = \beta_0 \quad \text{and} \quad E(\hat\beta_1) = \beta_1. \qquad (39)$$

The unbiasedness property for $\hat\beta_1$, for example, says that, although $\hat\beta_1$ in any particular sample could be different from the true parameter value $\beta_1$, the expected value (mean) of $\hat\beta_1$ is equal to the true parameter $\beta_1$. We say that the distribution of $\hat\beta_1$ is centered around $\beta_1$. Unbiasedness generally fails if any of the four assumptions fails. For example, if the data we use are nonrandom (i.e., Assumption SLR.2 fails), then the OLS estimators are biased.
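
Unbiasedness is a statement about the sampling distribution, which a small Monte Carlo exercise can make concrete. The sketch below is our own construction with a made-up data-generating process (true values $\beta_0 = 1$ and $\beta_1 = 2$, samples of size 100); across 1,000 replications, the slope estimates should average out close to 2:

    capture program drop olssim
    program define olssim, rclass
        drop _all
        set obs 100
        generate x = rnormal()
        generate y = 1 + 2*x + rnormal()   // true beta0 = 1, beta1 = 2
        regress y x
        return scalar b1 = _b[x]
    end
    simulate b1 = r(b1), reps(1000) nodots: olssim
    summarize b1              // mean of the 1000 slope estimates is near 2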

5.2 Variances of the OLS Estimators

In addition to knowing that $\hat\beta_1$ is unbiased, it is useful to know how far we can expect $\hat\beta_1$ to be from $\beta_1$ on average. One measure of the spread of $\hat\beta_1$ about its mean $\beta_1$ is the variance of $\hat\beta_1$. The square root of the variance of an estimator is called the standard deviation of the estimator; its sample-based estimate is called the standard error. We need one more assumption, the "constant variance" assumption, to obtain the variances of the OLS estimators.

Assumption SLR.5 (Homoskedasticity): The error term $\varepsilon$ has the same variance given any value of the explanatory variable. That is,

$$\mathrm{Var}(\varepsilon|x) = \sigma^2.$$

The assumption of constant variance of $\varepsilon$ implies that the conditional variance of y is constant:

$$\mathrm{Var}(y|x) = \sigma^2. \qquad (40)$$

If $\mathrm{Var}(\varepsilon|x)$ is not constant, say $\mathrm{Var}(\varepsilon|x)$ varies with x, then the error term is said to exhibit heteroskedasticity, or nonconstant variance. Since $\mathrm{Var}(\varepsilon|x) = \mathrm{Var}(y|x)$, heteroskedasticity is present whenever $\mathrm{Var}(y|x)$ is a function of x.

Example 7 - Suppose we want to get an unbiased estimator of the ceteris paribus effect of household income on saving using the model

$$saving = \beta_0 + \beta_1\, income + \varepsilon.$$

According to SLR.4, we must assume that $E(\varepsilon|income) = 0$, which in turn implies that $E(saving|income) = \beta_0 + \beta_1\, income$. If, in addition, we impose the homoskedasticity assumption $\mathrm{Var}(\varepsilon|income) = \sigma^2$, this is the same as assuming $\mathrm{Var}(saving|income) = \sigma^2$. Thus, while average saving is allowed to change with income, the variability of saving about its mean is assumed to be constant across all income levels.

Question - In Example 7, is the assumption of homoskedasticity realistic?

★ Main Result on Variances of the OLS Estimators

Under Assumptions SLR.1 through SLR.5, we have the following sampling variances and standard deviations (SD) of the OLS estimators.

Variance and standard deviation of $\hat\beta_1$:

$$\mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad (41)$$

$$\mathrm{SD}(\hat\beta_1) = \sigma \Big/ \left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]^{1/2}. \qquad (42)$$

Variance and standard deviation of $\hat\beta_0$:

$$\mathrm{Var}(\hat\beta_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right) \qquad (43)$$

$$\mathrm{SD}(\hat\beta_0) = \sigma \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right)^{1/2}. \qquad (44)$$

The covariance between $\hat\beta_0$ and $\hat\beta_1$ is

$$\mathrm{Cov}(\hat\beta_0, \hat\beta_1) = \frac{-\bar{x}\,\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}. \qquad (45)$$

5.3 Estimating the Error Variance

The variances and standard deviations given in equations (41) through (45) are unknown, since the variance of the error term, $\sigma^2$, is unknown. In empirical implementation, we need to estimate $\sigma^2$ and subsequently find the estimators of the variances of the OLS estimators.

It can be shown that an unbiased estimator of $\sigma^2$ is

$$\hat\sigma^2 = \frac{SSR}{n-2} = \frac{\sum_{i=1}^{n} \hat\varepsilon_i^2}{n-2}. \qquad (46)$$

MSR = SSR/(n − 2) is called the mean squared residual (or mean squared error). Its square root, $\hat\sigma = \left( \frac{SSR}{n-2} \right)^{1/2}$, is called the standard error of the regression (or root MSR).

Given the estimator of the error variance, we can estimate the variances and standard deviations of the OLS estimators. For example, the estimator of the standard deviation of the slope estimator $\hat\beta_1$ is given by

$$\mathrm{SE}(\hat\beta_1) = \hat\sigma \Big/ \left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]^{1/2}, \qquad (47)$$

where $\hat\sigma = \sqrt{\sum_{i=1}^{n} \hat\varepsilon_i^2 / (n-2)}$. Similarly,

$$\mathrm{SE}(\hat\beta_0) = \hat\sigma \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right)^{1/2}. \qquad (48)$$

Standard errors of regression estimates play an important role in constructing confidence intervals and testing hypotheses.
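
These quantities appear directly in Stata's stored results: e(rmse) holds the standard error of the regression and _se[] holds each coefficient's standard error. A sketch using the salary example:

    regress salary exper
    display e(rmse)          // sigma-hat, the standard error of the regression
    display _se[exper]       // SE of the slope, as in equation (47)
    display _se[_cons]       // SE of the intercept, as in equation (48)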

Example 8

In order to display the standard errors of the coefficient estimates, we reconsider Examples 1, 2, and 5, which were based on Problems for Class Discussion #1 and #2. The results are summarized as follows.

Stata output from step 7 of the Problems for Class Discussion #1 - the estimated salary equation is

$$\widehat{salary} = 39.3148\,(3.3994) + 0.4399\,(0.1689)\, exper \qquad (49)$$

$n = 32$, $R^2 = 0.1844$,

where standard errors are enclosed in parentheses. Standard errors in Stata results are given in the table of coefficient estimates under the column titled "Std. Err.".

Stata output from step 2.1 of the Problems for Class Discussion #2 - estimation results for soda price:

$$\widehat{psoda} = 1.0374\,(0.0052) + 0.0649\,(0.0240)\, prpblck \qquad (50)$$

$n = 401$, $R^2 = 0.0181$,

where standard errors are again enclosed in parentheses.

Stata output from step 3.1 of the Problems for Class Discussion #2 - the results relating the log of the price of soda to the fraction black in the ZIP code are

$$\widehat{lpsoda} = 0.0331\,(0.0050) + 0.0625\,(0.0229)\, prpblck \qquad (51)$$

$n = 401$, $R^2 = 0.01831$.

Later on, we will discuss the use of standard errors in inference about the regression parameters.

6. Effects of Scaling and Units of Measurement on OLS Statistics

In Problems for Class Discussion #1, salary was measured in thousands of dollars and experience in years since receiving the Ph.D. The predicted equation (to six decimal places) is

$$\widehat{salary} = 39.31483\,(3.399436) + 0.439907\,(0.1688867)\, exper \qquad (52)$$

$n = 32$, $R^2 = 0.1844$.

Suppose we choose to measure salary in dollars. How do the OLS estimates and their standard errors change when salary is measured in dollars? We would like to know how the regression statistics change without running another regression.

To answer this question, let salarydol be salary measured in dollars, so that "salary in dollars" equals "salary in 1000s of dollars" times 1000. That is, salarydol = salary*1000, or salary = salarydol/1000. It is not difficult to see that the estimated salary-in-dollars equation is

$$\widehat{salarydol} = 39314.83\,(3399.436) + 439.907\,(168.8867)\, exper \qquad (53)$$

$n = 32$, $R^2 = 0.1844$.

We obtain the intercept and slope in (53) by multiplying the intercept and slope in (52) by 1000. The standard errors are also multiplied by 1000, but the coefficient of determination does not change. Note that a change in the units of measurement of the dependent variable does not affect the estimated effect of experience on salary. Whether salary is measured in dollars or in thousands of dollars, for an additional year of experience, salary is predicted to increase by $439.91, or by $0.43991 in 1000s.

Generally, if the dependent variable is multiplied by a constant c, then the OLS intercept and slope estimates are also multiplied by c. This gives the new estimates of the intercept and slope parameters. The standard errors are also multiplied by the constant c, but $R^2$ is not affected by scaling the dependent variable.
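
Both claims are easy to verify by rerunning the regression on the rescaled variable. A sketch using the salarydol definition above:

    generate salarydol = salary*1000
    regress salary exper
    regress salarydol exper   // coefficients and SEs scale by 1000; R2 unchanged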

It is also useful to know what happens to OLS statistics when the units of measurement of the explanatory variable x change. We can use the price of soda example from Problems for Class Discussion #2 to see what happens when the units of measurement of the explanatory variable change. Recall that the predicted soda price is given as

$$\widehat{psoda} = 1.0374\,(0.0052) + 0.0649\,(0.0240)\, prpblck \qquad (54)$$

$n = 401$, $R^2 = 0.0181$.

Suppose now that the proportion black in the ZIP code is measured in percent. Let pctblck be the percent black in the ZIP code, so that pctblck = prpblck*100. For the regression of psoda on pctblck, we get

$$\widehat{psoda} = 1.0374\,(0.0052) + 0.000649\,(0.000240)\, pctblck \qquad (55)$$

$n = 401$, $R^2 = 0.0181$.

In going from (54), where the independent variable is measured as a proportion, to (55), where it is measured in percent, the new slope is obtained by dividing the slope associated with prpblck by 100. The standard error associated with the slope is also divided by 100. The intercept is not affected by scaling the explanatory variable, and neither is the R-squared.
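
The analogous check for rescaling the regressor, using the pctblck definition above (a sketch):

    generate pctblck = prpblck*100
    regress psoda prpblck
    regress psoda pctblck     // slope and its SE divided by 100; intercept and R2 unchanged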

7. Regression through the Origin

In some rare cases, we want to estimate a linear regression model without an intercept term. Regression without an intercept means that, when x = 0, the expected value of y is zero; the regression line passes through the origin. For example, if income (x) is zero, then income tax revenue (y) must also be zero. We will also see later on examples where a model with an intercept is transformed into another model without an intercept.

A linear regression model without an intercept ($\beta_0 = 0$) can be specified as

$$y = \beta_1 x + \varepsilon, \qquad (56)$$

where $\beta_1$ is the slope parameter associated with a model without an intercept. We can use the method of ordinary least squares to obtain the estimator (say $\tilde\beta_1$) of $\beta_1$. The OLS method minimizes the sum of squared residuals:

$$\sum_{i=1}^{n} \left( y_i - \tilde\beta_1 x_i \right)^2. \qquad (57)$$

This gives the first order condition:

$$\sum_{i=1}^{n} \left( y_i - \tilde\beta_1 x_i \right) x_i = 0. \qquad (58)$$

Solving this equation for $\tilde\beta_1$ gives the OLS estimator

$$\tilde\beta_1 = \frac{\sum_{i=1}^{n} y_i x_i}{\sum_{i=1}^{n} x_i^2} \qquad (59)$$

pertaining to a regression through the origin.

In Stata, we use the "noconstant" option to request estimation of a linear model without an intercept term. For example, the Stata command

    reg y x, noconstant

suppresses the constant term (intercept) in the model.
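
As a check on formula (59), the estimator can also be computed by hand and compared with the noconstant fit. A sketch with generic y and x in memory:

    generate double xy  = x*y
    generate double xsq = x*x
    quietly summarize xy, meanonly
    scalar sxy = r(sum)
    quietly summarize xsq, meanonly
    scalar sxx = r(sum)
    display "slope through origin = " sxy/sxx   // equation (59)
    regress y x, noconstant                     // should match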
