
Linear Regression Models

Simple Linear Regression



Simple linear regression is the study of the linear relationship between two random
variables X and Y. We call X the independent or explanatory variable and we call Y the
dependent, the predicted or the forecast variable. We assume that we can represent the
relationship between population values of X and Y using the equation of the linear model
Yᵢ = β₀ + β₁Xᵢ + εᵢ.

We call β₀ the Y-intercept and it represents the expected value of Y that is independent
of X, or the expected value of Y when X equals zero, if appropriate. We call β₁ the slope
and it represents the expected change in Y per unit change in X, i.e., the expected
marginal change in Y with respect to X. Finally, we call εᵢ the random error in Y for
each observation i.

Given a sample of X and Y values, we use the method of least squares to estimate sample
values for β₀ and β₁, which we call b₀ and b₁, respectively. We represent the predicted
value of Y using the prediction line equation or the simple linear regression equation

Ŷ = b₀ + b₁X.

We call b₀ the sample Y-intercept and it represents the expected value of Y that is
independent of X, and we call b₁ the sample slope and it represents the expected change
in Y per unit change in X, i.e., the expected marginal change in Y with respect to X.
Finally, we call Ŷ the predicted value of Y.
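
As a concrete illustration, the least-squares estimates b₀ and b₁ can be computed directly
from a sample. The sketch below uses Python with NumPy and made-up X and Y values
(the data are purely illustrative, not taken from any example in this text).

```python
# Minimal least-squares sketch: fit Y-hat = b0 + b1*X to illustrative data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # explanatory variable X (made-up values)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # dependent variable Y (made-up values)

x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # sample slope
b0 = y_bar - b1 * x_bar                                            # sample Y-intercept
y_hat = b0 + b1 * x                                                # predicted values of Y

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```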



The Coefficient of Determination and the Correlation Coefficient

The coefficient of determination is the statistic r², and it measures the proportion of the
linear variation in Y that is explained by X using the regression model.

The correlation coefficient is the statistic r and it measures the strength of the linear
association between X and Y.
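
As a sketch of how these two statistics relate, the snippet below (Python/NumPy, with the
same kind of made-up data as above) computes r from the sample and squares it to obtain
r²; in simple linear regression this r² equals the regression's coefficient of determination.

```python
# Correlation coefficient r and coefficient of determination r^2 (illustrative data).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = np.corrcoef(x, y)[0, 1]   # strength of the linear association between X and Y
r_squared = r ** 2            # proportion of the linear variation in Y explained by X

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```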

t Test for the Slope β₁

H₀: β₁ = 0 (there is no linear relationship)
H₁: β₁ ≠ 0 (there is a linear relationship)

For α = .05, the p-value for the slope (the coefficient of the explanatory variable) should
be less than .05 for the sample slope b₁ to be statistically significantly different from zero,
indicating the presence of a statistically significant linear relationship between X and Y.

Important Notice: The p-value for the slope and the Significance F are the same in
Simple Linear Regression, leading to the same conclusion.
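
One way to see this in practice is with a regression routine that reports both statistics.
The sketch below uses the statsmodels library with made-up data (the values are
assumptions for illustration only).

```python
# Slope t-test p-value versus Significance F in simple linear regression (illustrative data).
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

model = sm.OLS(y, sm.add_constant(x)).fit()   # fit Y-hat = b0 + b1*X

slope_p = model.pvalues[1]   # p-value of the t test for the slope b1
sig_f = model.f_pvalue       # Significance F for the overall regression

# In simple linear regression the two p-values coincide; with alpha = .05,
# a value below .05 indicates a statistically significant linear relationship.
print(f"slope p-value = {slope_p:.4g}, Significance F = {sig_f:.4g}")
```
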
Multiple-Linear Regression

Multiple-linear regression or multiple regression is the study of the linear relationship
between more than two random variables X₁, X₂, …, Xₖ, and Y. We call the Xᵢ,
i = 1, 2, …, k, the independent or explanatory variables and we call Y the dependent, the
predicted or the forecast variable. We assume that we can represent the relationship
between population values of the Xᵢ and Y using the equation of the linear model

Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + … + βₖXₖᵢ + εᵢ.
We call β₀ the Y-intercept and it represents the expected value of Y that is independent
of X, or the expected value of Y when each Xᵢ equals zero, if appropriate. We call βᵢ the
slope of Y with variable Xᵢ, holding each Xⱼ, j ≠ i, constant; and it represents the expected
change in Y per unit change in Xᵢ, i.e., the expected marginal change in Y with respect to
Xᵢ, holding each Xⱼ, j ≠ i, constant. Finally, we call εᵢ the random error in Y for each
observation i.

Given a sample of X and Y values, we use the method of least squares to estimate sample
values for β₀ and the βᵢ, which we call b₀ and bᵢ for all i = 1, …, k, respectively. We
represent the predicted value of Y using the prediction line equation or the multiple
regression equation

Ŷ = b₀ + b₁X₁ + … + bₖXₖ.
We call b₀ the sample Y-intercept and it represents the expected value of Y that is
independent of X, or the expected value of Y when each Xᵢ equals zero, if appropriate.
We call bᵢ the sample slope of Y with variable Xᵢ, holding each Xⱼ, j ≠ i, constant; and it
represents the expected change in Y per unit change in Xᵢ, i.e., the expected marginal
change in Y with respect to Xᵢ, holding each Xⱼ, j ≠ i, constant. Finally, we call Ŷ the
predicted value of Y.
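
A brief sketch of fitting such a model, again using the statsmodels library with simulated
data (two explanatory variables X1 and X2 chosen purely for illustration):

```python
# Least-squares fit of Y-hat = b0 + b1*X1 + b2*X2 on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.5 + 2.0 * x1 - 0.8 * x2 + rng.normal(scale=0.5, size=n)   # true line plus random error

X = sm.add_constant(np.column_stack([x1, x2]))   # columns: intercept, X1, X2
model = sm.OLS(y, X).fit()

b0, b1, b2 = model.params    # sample Y-intercept and sample slopes
y_hat = model.predict(X)     # predicted values of Y
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```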

The Adjusted r²

The adjusted r² measures the proportion of the linear variation in Y that is explained by
the multiple-regression model, adjusted for the number of independent variables and the
sample size.
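
For reference, the usual adjustment uses the sample size n and the number of independent
variables k; a small sketch with hypothetical numeric inputs:

```python
# Adjusted r^2: r^2 penalized for the number of independent variables k and the sample size n.
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical example: r^2 = 0.85 from a model with k = 2 variables and n = 50 observations.
print(f"{adjusted_r_squared(0.85, n=50, k=2):.4f}")   # prints 0.8436
```

The statsmodels results object used in the sketches above reports the same quantity as
rsquared_adj.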

The Coefficient of Partial Determination

The coefficient of partial determination measures the proportion of the linear variation in
Y that is explained by Xᵢ, holding each Xⱼ, j ≠ i, constant.
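
One common way to compute it is to compare the unexplained variation of the full model
with that of the model that omits Xᵢ. The sketch below does this with statsmodels on
simulated data; it is only an illustration of that comparison, not a method prescribed by
this text.

```python
# Coefficient of partial determination for X1, holding X2 constant:
# the relative drop in unexplained variation (SSE) when X1 joins a model that already has X2.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 60
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.7, size=n)

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()   # model with X1 and X2
reduced = sm.OLS(y, sm.add_constant(x2)).fit()                       # model without X1

partial_r2_x1 = (reduced.ssr - full.ssr) / reduced.ssr   # ssr = sum of squared residuals (SSE)
print(f"partial r^2 for X1 = {partial_r2_x1:.3f}")
```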

Significance Testing

In a multiple-regression model with two or more independent variables, we recommend
that independent variables that fail the t-test of significance for their slope coefficients be
removed from the model, and that the model be run again without them. Recall that this
t-test of significance is the same as the slope test in a simple linear regression model
(outlined above).
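
A hedged sketch of that recommendation, using statsmodels and pandas with simulated
data and assumed column names X1, X2, X3 (X3 is constructed to be unrelated to Y, so it
is the variable we expect to fail the t-test):

```python
# Drop explanatory variables whose slope t-test p-values exceed alpha, then refit without them.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 80
X = pd.DataFrame({
    "X1": rng.normal(size=n),
    "X2": rng.normal(size=n),
    "X3": rng.normal(size=n),                 # noise variable, unrelated to Y
})
y = 1.0 + 2.0 * X["X1"] - 1.5 * X["X2"] + rng.normal(scale=0.5, size=n)

alpha = 0.05
first_fit = sm.OLS(y, sm.add_constant(X)).fit()
keep = [c for c in X.columns if first_fit.pvalues[c] < alpha]   # variables with significant slopes

refit = sm.OLS(y, sm.add_constant(X[keep])).fit()               # run the model again without the rest
print("kept:", keep)
```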

Interactions

Interaction terms, or cross-product terms, are introduced into a multiple-regression model
when the effect of an independent variable Xᵢ on the dependent variable Y changes
according to the values of other independent variables Xⱼ, j ≠ i. In such cases we
recommend running the model with all the relevant interaction terms, removing those
interaction terms that are not statistically significant while keeping those that are, and
then running the model again, followed by significance testing to confirm the effects of
all remaining interaction terms.
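
A short sketch of adding one interaction term with statsmodels on simulated data (the
cross-product column name X1_X2 is an assumption for illustration); its p-value then
decides whether the term stays in the model:

```python
# Add the cross-product term X1*X2 and test whether it is statistically significant.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 100
X = pd.DataFrame({"X1": rng.normal(size=n), "X2": rng.normal(size=n)})
X["X1_X2"] = X["X1"] * X["X2"]                # interaction (cross-product) term
y = 1.0 + 1.5 * X["X1"] - 0.5 * X["X2"] + 0.8 * X["X1_X2"] + rng.normal(scale=0.6, size=n)

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.pvalues[["X1", "X2", "X1_X2"]])   # keep X1*X2 only if its p-value is below alpha
```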
