
Simple Regression
Osa Rafshodia

Fakultas Kedokteran Universitas Mulawarman 2013

Table of Contents
Continuous predictors: Linear
  Chapter overview
  Simple Linear Regression
  Multiple regression
  Graphing
  Checking for nonlinearity graphically
    1. Examining scatterplot of predictor and outcome
    2. Checking for nonlinearity using residuals
    3. Checking for nonlinearity using locally weighted smoother
    4. Graphing outcome mean at each level of predictor

Continuous predictors: Linear


Chapter overview

By: dr. Osa Rafshodia Rafidin, MSc.IH, MPH

This chapter focuses on how to interpret the coefficient of a continuous predictor in a linear regression model. It begins with a simple linear regression model with one continuous variable predicting a continuous outcome.

Terminology: Continuous and categorical variables

When I use the term continuous variable, I am referring to a variable that is measured on an interval or ratio scale. By contrast, when I speak of a categorical (or factor) variable, I am referring to either a nominal variable or an ordinal/interval/ratio variable that we wish to treat as though it were a nominal variable.

Simple Linear Regression


Let's focus only on males who were interviewed in 2008, and use the educational level of the respondent as the outcome variable. A frequency distribution of this variable (tab educ, missing) shows that it ranges from 0 to 20 years of education. The missing value code .n indicates no answer.

The dataset includes several variables that can be used as predictors of the respondent's education, including the education of the respondent's father, the education of the respondent's mother, and the age of the respondent (sum paeduc maeduc age). As we interpret the meaning of these education variables, let's assume that having 12 years of education corresponds to graduating high school and having 16 years of education corresponds to completing a four-year college degree.

Let's run a simple regression model in which we predict the education of the respondent from the education of the respondent's father (regress educ paeduc). The fitted equation is:

educ = 9.74 + 0.36paeduc

The intercept is 9.74 and the coefficient for paeduc is 0.36. The intercept is the predicted mean of the respondent's education when the father's education is 0. For every one-year increase in the education of the father, we would predict that the education of the respondent increases by 0.36 years.

We could use the regression equation to compute the predicted mean of the respondent's education for any given level of the father's education. For example, if the father had 8 years of education, we can request the predicted mean with the margins command (margins, at(paeduc=8)). We can also request the predicted means for several values at once (margins, at(paeduc=(8 12 16)) vsquish).
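The arithmetic behind these predicted means can be sketched in a few lines of Python. This is only an illustration that plugs the rounded coefficients quoted above (9.74 and 0.36) into the regression equation. The function name predicted_educ is made up for this sketch. Because Stata works with unrounded coefficients, the margins output may differ slightly in the last decimal place.

```python
# Rounded coefficients as reported in the text: educ = 9.74 + 0.36 * paeduc
INTERCEPT = 9.74
SLOPE_PAEDUC = 0.36


def predicted_educ(paeduc):
    """Predicted mean of the respondent's education for a given father's education."""
    return INTERCEPT + SLOPE_PAEDUC * paeduc


# Same values of father's education as in margins, at(paeduc=(8 12 16))
for years in (8, 12, 16):
    print(f"paeduc = {years:2d} -> predicted educ = {predicted_educ(years):.2f}")

# paeduc =  8 -> predicted educ = 12.62
# paeduc = 12 -> predicted educ = 14.06
# paeduc = 16 -> predicted educ = 15.50
```

Each four-year step in the father's education adds 4 x 0.36 = 1.44 years to the predicted mean, which is the coefficient interpretation given above.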

Multiple regression
Let's now turn to a multiple regression model that predicts the respondent's education from the father's education, the mother's education, and the age of the respondent (regress educ paeduc maeduc age). The fitted equation is:

educ = 6.96 + 0.26paeduc + 0.21maeduc + 0.03age

The coefficients from this multiple regression model reflect the association between each predictor and the outcome after adjusting for all the other predictors. For example, the coefficient for paeduc is 0.26, meaning that for every one-year increase in the education of the father, we would expect the education of the respondent to be 0.26 years higher, holding the mother's education and the age of the respondent constant.

As an aid to interpreting the coefficients from the multiple regression model, we can compute adjusted means of the outcome as a function of one or more predictors from the model. For example, to help interpret the coefficient for paeduc, we can use the margins command to compute the adjusted mean of the respondent's education when the father's education equals 8, 12, and 16, adjusting for the other predictors (mother's education and age of the respondent) (margins, at(paeduc=(8 12 16)) vsquish).

Terminology: Adjusted means

Looking at the output from the previous margins command, we see that when a respondent's father has 8 years of education, the respondent is predicted to have 13.03 years of education, after adjusting for education of the mother and age of the respondent. What do we call the quantity 13.03? We can call this a predicted mean after adjusting for all other predictors. For example, we can say that the predicted mean, given the father has 8 years of education, is 13.03 after adjusting for all other predictors. We could also call this an adjusted mean: when the father has 8 years of education, the adjusted mean is 13.03. (The term adjusted mean implies adjustment for all other predictors in the model.)
The margins command allows us to hold more than one variable constant at a time. In the example below, we compute the adjusted means when the father's education equals 8, 12, and 16, while holding the mother's education constant at 14 (margins, at(paeduc=(8 12 16) maeduc=14) vsquish).

Compared with the result of the previous margins command, we can see that the adjusted means are higher when the mother's education is held constant at 14. However, the effect of the father's education remains the same. For example, in the previous margins command, the change in the adjusted means due to increasing the father's education from 8 to 16 years was 2.06 (15.09 - 13.03). When the mother's education is held constant at 14, the change due to increasing the father's education from 8 to 16 years is the same (aside from rounding), 2.07 (15.61 - 13.54). Although the adjusted means are higher when the mother's education is held constant at 14 years, the difference in the adjusted means due to increasing the father's education remains the same.
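Why the difference stays the same can be checked with a small Python sketch. It plugs the rounded coefficients from the multiple regression equation into a hypothetical helper, adjusted_mean. The age of 45 and the mother's education of 10 are arbitrary values picked for illustration and do not come from the dataset. With rounded coefficients the change works out to 8 x 0.26 = 2.08, close to the 2.06 and 2.07 that Stata reports from its unrounded estimates.

```python
# Rounded coefficients from the text:
# educ = 6.96 + 0.26*paeduc + 0.21*maeduc + 0.03*age
B0, B_PAEDUC, B_MAEDUC, B_AGE = 6.96, 0.26, 0.21, 0.03

AGE = 45  # hypothetical value at which age is held constant (not from the data)


def adjusted_mean(paeduc, maeduc, age):
    """Adjusted mean of the respondent's education at the given predictor values."""
    return B0 + B_PAEDUC * paeduc + B_MAEDUC * maeduc + B_AGE * age


def effect_of_paeduc(maeduc, age, low=8, high=16):
    """Change in the adjusted mean when father's education rises from low to high."""
    return adjusted_mean(high, maeduc, age) - adjusted_mean(low, maeduc, age)


# Hold mother's education at two different levels (10 is an arbitrary choice,
# 14 matches the margins example in the text) and compare the effect of paeduc.
for maeduc in (10, 14):
    at_8 = adjusted_mean(8, maeduc, AGE)
    at_16 = adjusted_mean(16, maeduc, AGE)
    print(f"maeduc = {maeduc}: paeduc 8 -> {at_8:.2f}, paeduc 16 -> {at_16:.2f}, "
          f"change = {effect_of_paeduc(maeduc, AGE):.2f}")
```

The levels shift upward when maeduc is higher, but the change is always (16 - 8) x 0.26, because the model contains no interaction between the predictors.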

Graphing
We can graph the adjusted means as a function of the father's education by running the margins command and then the marginsplot command (margins, at(paeduc=(0(4)20)) followed by marginsplot).

Checking for nonlinearity graphically

These approaches include:
1. examining a scatterplot of the predictor and outcome
2. examining residual-versus-fitted plots
3. creating plots based on locally weighted smoothers
4. plotting the mean of the outcome for each level of the predictor

1. Examining scatterplot of predictor and outcome

Let's look at a scatterplot of the size of the engine (displacement) by the length of the car (length), with a line showing the linear fit (use autosubset, then graph twoway (scatter displacement length) (lfit displacement length), ytitle("Engine displacement (cu in.)") legend(off)).

2. Checking for nonlinearity using residuals

We can check for nonlinearity by looking at the relationship between the residuals and predicted values, after accounting for the other variables in the model (regress displacement length trunk weight, then rvfplot).

3. Checking for nonlinearity using locally weighted smoother

Suppose we want to determine the nature of the relationship between the year that the respondent was born and education level (graph use "/Users/osa/Desktop/survival analisis/gss_ivrm lowess.gph").

4. Graphing outcome mean at each level of predictor

One simple way to create such a graph is to fit a regression model predicting the outcome, treating the predictor variable as a categorical (factor) variable (graph use "/Users/osa/Desktop/survival analisis/yrborn.gph").
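The idea behind approaches 2 and 4 can be illustrated numerically with a short Python sketch. It is a plain least-squares calculation, not Stata's rvfplot or margins. The data are made up for this example and deliberately follow a curve. The helper names fit_line and mean_by_level are hypothetical. The sketch fits a straight line, computes the mean of the outcome at each level of the predictor, and compares those means with the fitted line.

```python
from collections import defaultdict
from statistics import mean

# Made-up toy data with a deliberately curved relationship (roughly y = x squared)
x = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
y = [1.0, 1.2, 3.9, 4.1, 9.1, 8.9, 15.8, 16.2, 25.1, 24.9]


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def mean_by_level(xs, ys):
    """Mean of the outcome at each distinct level of the predictor."""
    groups = defaultdict(list)
    for a, b in zip(xs, ys):
        groups[a].append(b)
    return {level: mean(values) for level, values in sorted(groups.items())}


intercept, slope = fit_line(x, y)
level_means = mean_by_level(x, y)

# Difference between the observed mean and the linear prediction at each level
mean_residuals = {
    level: observed - (intercept + slope * level)
    for level, observed in level_means.items()
}

for level, observed in level_means.items():
    fitted = intercept + slope * level
    print(f"x = {level}: mean y = {observed:5.2f}, linear fit = {fitted:5.2f}, "
          f"difference = {mean_residuals[level]:+.2f}")
```

With these toy data the differences are positive at the lowest and highest levels and negative in the middle. A systematic pattern like this, rather than differences scattered around zero, is the signal of nonlinearity that the residual and mean-by-level graphs are meant to reveal.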