
A Tool for Linear Prediction

What is regression?
Regression refers to the method of predicting the value of one variable from the known values of one or more other variables. This is done by establishing a simplified statement, or model, of the relationship between that variable and another variable or set of variables.

Example of a Statement
Y = α + β1X1 + β2X2 + β3X3 + ε
This statement is called a regression equation. It implies that the value of Y, except for a random error ε, is determined by the values of the variables X1, X2 and X3. When the observed variations in the values of Y can largely be accounted for by the values of X1, X2 and X3, the statement is acceptable.

What is a regression equation?


A regression equation is the equation of the line defined by the path of the means of Y for fixed values of X; this is the regression equation of Y on X.
Assumptions:
The equation is linear
Y is normally distributed for each value of X
Var(Y) is the same for each value of X

A Graphical Presentation
Fig. 1. General form of regression of Y on X, or the path of the means of Y values for fixed values of X. (Vertical axis: Y, Income. Horizontal axis: Education, with fixed values X1 to X7.)

Method for finding the regression line

Least squares procedure - method for finding the straight line that minimizes the sum of the squared vertical deviations of the points around it

Assumptions:
1. Random sampling
2. E(εi) = 0
3. Xi and εi are statistically independent

A Graphical Presentation
Fig. 2. Least-squares equation minimizing the sum of squares of the vertical distances between the line and the points, and estimating the regression of Y on X
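As an illustration of the least squares procedure, the following minimal Python sketch fits a straight line Y = a + bX to a handful of points. The function name fit_line and the education and income figures are made up for this example; they are not data from the presentation.

```python
def fit_line(x, y):
    """Return (a, b) for the line y = a + b*x that minimizes the sum of
    squared vertical distances between the line and the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares of the deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through the point of means
    return a, b


# Hypothetical illustrative data: years of education (X) and income (Y).
education = [8, 10, 12, 14, 16]
income = [20, 24, 31, 33, 42]

a, b = fit_line(education, income)
print(f"Y = {a:.2f} + {b:.2f} X")
```

The closed-form slope and intercept used here apply only to the one-variable case; with several explanatory variables the same criterion is usually minimized with matrix methods.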

Components of deviations

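SST = total sum of squared deviations between the observed values and their mean (d.f. = n - 1)
SSR = sum of squared deviations between the fitted values and the mean of the observed values (d.f. = p - 1, the number of explanatory variables, where p counts the regression coefficients including the constant)
SSE = unexplained deviations, the sum of squared deviations between the observed values and the corresponding fitted values; SSE = SST - SSR, with d.f. = n - p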
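The three components can be computed directly from the observed and fitted values. The sketch below is a minimal Python illustration; the function name sums_of_squares and the data are hypothetical, and the fitted values are assumed to come from a least-squares line (here Y = -1.8 + 2.65X, the least-squares line for these made-up points), since the identity SST = SSR + SSE relies on that.

```python
def sums_of_squares(y, fitted):
    """Return (SST, SSR, SSE) for observed values y and fitted values."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)              # observed vs. mean
    ssr = sum((fi - mean_y) ** 2 for fi in fitted)         # fitted vs. mean
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # observed vs. fitted
    return sst, ssr, sse


# Hypothetical illustrative data and the least-squares line fitted to it.
x = [8, 10, 12, 14, 16]
y = [20, 24, 31, 33, 42]
fitted = [-1.8 + 2.65 * xi for xi in x]

sst, ssr, sse = sums_of_squares(y, fitted)
print(sst, ssr, sse)  # SST equals SSR + SSE up to rounding error
```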

Statistics for Evaluating Regression Equation

F = MSR / MSE

where
MSR = SSR/(p - 1)
MSE = SSE/(n - p)
Thus, F is a measure of the size of the variations accounted for by the fitted regression line relative to the remaining unexplained variations.
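A short Python sketch of this ratio follows. The function name f_statistic is hypothetical, and the input values (SSR = 280.9, SSE = 9.1, n = 5 observations, p = 2 coefficients) are made-up numbers used only to show the arithmetic.

```python
def f_statistic(ssr, sse, n, p):
    """Overall F: mean square due to regression over mean square error."""
    msr = ssr / (p - 1)  # explained variation per regression d.f.
    mse = sse / (n - p)  # unexplained variation per residual d.f.
    return msr / mse


# Hypothetical values: one explanatory variable plus a constant (p = 2).
f_value = f_statistic(ssr=280.9, sse=9.1, n=5, p=2)
print(round(f_value, 2))
```

The resulting value would then be compared with the F distribution with (p - 1, n - p) degrees of freedom to judge significance.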

Statistics for Evaluating Regression Equation


Coefficient of determination, R²
R² = SSR/SST
R² = 1.0 implies that the regression line is a perfect fit
R² = 0.0 implies that the regression line fails to explain the deviations in the observed values; i.e., all deviations are accounted for by SSE
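The two extreme cases can be checked with a small Python sketch. The function name r_squared and the data are hypothetical; the "perfect" case uses fitted values equal to the observed values, and the "flat" case uses the mean of Y as the fitted value for every unit.

```python
def r_squared(y, fitted):
    """Coefficient of determination computed as SSR / SST."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    ssr = sum((fi - mean_y) ** 2 for fi in fitted)
    return ssr / sst


# Hypothetical illustrative data.
x = [8, 10, 12, 14, 16]
y = [20, 24, 31, 33, 42]

perfect = r_squared(y, y)                                   # fitted values reproduce Y exactly
flat = r_squared(y, [sum(y) / len(y)] * len(y))             # fitted values are all the mean of Y
line = r_squared(y, [-1.8 + 2.65 * xi for xi in x])         # least-squares line for these points
print(perfect, flat, round(line, 4))
```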

Statistics for Evaluating Regression Equation


Regression coefficient - quantitative measure of the change in the value of the dependent variable (Y) produced by a unit change in the value of an explanatory variable (X), with the other explanatory variables held fixed.

Y = a + b1X1 + b2X2 + b3X3 + b4X4

Here b1, b2, b3 and b4 are the regression coefficients.

Statistics for Evaluating Regression Equation

Partial Fi - measure of the significance of the contribution of the regression coefficient bi, given that the other variables are included in the model.

Y = a + b1X1 + b2X2 + b3X3 + b4X4

The partial F-values F1, F2, F3 and F4 correspond to b1, b2, b3 and b4, respectively.
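One common way to compute a partial F-value is to compare the residual sum of squares of the full model with that of the model with the variable of interest removed. The Python sketch below assumes that formulation; the helper names sse and partial_f are hypothetical, and the data are constructed by hand so that X1 matters a great deal and X2 hardly at all.

```python
import numpy as np


def sse(X, y):
    """Residual sum of squares from a least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def partial_f(X, y, i):
    """Partial F for column i of X, given that all other columns stay in the model.

    X must already contain the constant column.
    """
    n, p = X.shape
    sse_full = sse(X, y)
    sse_reduced = sse(np.delete(X, i, axis=1), y)  # model without column i
    mse_full = sse_full / (n - p)
    return (sse_reduced - sse_full) / mse_full     # one numerator d.f.


# Hypothetical data: Y depends strongly on X1 and only weakly on X2.
x1 = np.arange(1.0, 9.0)
x2 = np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0])
noise = 0.5 * np.array([1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0])
y = 2.0 + 3.0 * x1 + 0.1 * x2 + noise

X = np.column_stack([np.ones(len(y)), x1, x2])  # constant, X1, X2
f1 = partial_f(X, y, 1)  # contribution of X1 given X2
f2 = partial_f(X, y, 2)  # contribution of X2 given X1
print(f1, f2)
```

A large partial F for X1 and a small one for X2 would suggest keeping X1 and reconsidering X2.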

Stepwise Regression
Used when there is no definite model to be tested, or when from a set of variables a subset is to be identified as the explanatory variables
Variables are entered into (step in) or removed from (step out of) the model one at a time

Stepping Method Criteria: significance of the associated F-value

The explanatory variable with the highest F-value is entered first

A variable is not entered if its contribution to the overall F-value is not significant
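The entry rule can be sketched in Python as a simple forward selection loop. This is only an illustration under stated assumptions: the function names are hypothetical, the cut-off f_enter = 4.0 is an arbitrary stand-in for a proper significance test, the data are made up, and the removal (step out) phase is not implemented.

```python
import numpy as np


def sse(X, y):
    """Residual sum of squares from a least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def forward_select(X, y, f_enter=4.0):
    """Enter variables one at a time, largest F-to-enter first.

    X holds the candidate explanatory variables (no constant column).
    Returns the column indices in order of entry.
    """
    n, k = X.shape
    selected = []
    current = np.ones((n, 1))  # start with the constant only
    while len(selected) < k:
        sse_cur = sse(current, y)
        best_j, best_f = None, f_enter
        for j in range(k):
            if j in selected:
                continue
            trial = np.column_stack([current, X[:, j]])
            sse_new = sse(trial, y)
            df_resid = n - trial.shape[1]
            f = (sse_cur - sse_new) / (sse_new / df_resid)
            if f > best_f:
                best_j, best_f = j, f
        if best_j is None:  # no remaining variable passes the cut-off
            break
        selected.append(best_j)
        current = np.column_stack([current, X[:, best_j]])
    return selected


# Hypothetical data: Y depends strongly on X1 (column 0), weakly on X2 (column 1).
x1 = np.arange(1.0, 9.0)
x2 = np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0])
noise = 0.5 * np.array([1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0])
y = 2.0 + 3.0 * x1 + 0.1 * x2 + noise

selected = forward_select(np.column_stack([x1, x2]), y)
print(selected)
```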

Regression with Dummy Variables


Regression procedure used when at least one of the explanatory variables is categorical (either nominal or ordinal)
A categorical variable with k levels is converted into (k - 1) indicator variables
Example:
Levels of the original variable X4: X41, X42, X43, X44
Dummy variables: DX41, DX42, DX43

Regression with Dummy Variables


The dummies are defined as follows:
DX41 = 1 if the unit belongs to level X41 of variable X4; 0 otherwise
DX42 = 1 if the unit belongs to level X42 of variable X4; 0 otherwise
DX43 = 1 if the unit belongs to level X43 of variable X4; 0 otherwise
Note that when the three dummies are all zero, the unit belongs to the omitted level of variable X4 (here X44). Thus, there is no need for another dummy variable.
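The definitions above translate into a few lines of Python. The function name make_dummies and the sample units are hypothetical; the sketch omits the last level listed, which matches the choice of X44 as the omitted level.

```python
def make_dummies(values, levels):
    """Convert a categorical variable with k levels into k - 1 indicator
    columns, omitting the last level in `levels`."""
    kept = levels[:-1]
    return [[1 if v == level else 0 for level in kept] for v in values]


levels = ["X41", "X42", "X43", "X44"]
x4 = ["X41", "X44", "X42", "X43", "X44"]  # hypothetical units

# Each row is (DX41, DX42, DX43) for one unit.
dummies = make_dummies(x4, levels)
for unit, row in zip(x4, dummies):
    print(unit, row)
```

Units at the omitted level get a row of zeros, so no fourth dummy is needed to identify them.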

Regression with Dummy Variables

The Model:
Y = a + b1X1 + b2X2 + b3X3 + b4X4
If X4 is a categorical variable, the adjusted model is:
Y = a + b1X1 + b2X2 + b3X3 + b41DX41 + b42DX42 + b43DX43

Regression with Dummy Variables


Interpretation of the Coefficients of Dummy Variables

Value of X4    Mean value of Y
X41            Y = a + b1X1 + b2X2 + b3X3 + b41
X42            Y = a + b1X1 + b2X2 + b3X3 + b42
X43            Y = a + b1X1 + b2X2 + b3X3 + b43
X44            Y = a + b1X1 + b2X2 + b3X3
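This interpretation can be checked numerically. The Python sketch below simplifies the model by leaving out X1, X2 and X3, so that Y depends only on the dummies for X4; under that assumption the constant a is the mean of Y at the omitted level X44, and each dummy coefficient is the difference between the mean at its level and the mean at X44. The data are made up for the illustration.

```python
import numpy as np

# Hypothetical responses, two units at each level of X4.
level = ["X41", "X41", "X42", "X42", "X43", "X43", "X44", "X44"]
y = np.array([10.0, 12.0, 20.0, 22.0, 30.0, 34.0, 5.0, 7.0])

# Constant column plus the dummies DX41, DX42, DX43; X44 is the omitted level.
X = np.array([
    [1.0] + [1.0 if lv == kept else 0.0 for kept in ("X41", "X42", "X43")]
    for lv in level
])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b41, b42, b43 = coef

# Group means: X41 -> 11, X42 -> 21, X43 -> 32, X44 -> 6.
print(a, a + b41, a + b42, a + b43)
```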
