
Session 5 - Multiple Regression Examples

DIRECTIONS: Problems should be done using the computer.

For each problem, be sure to:
- Discuss the assumptions of the analysis
- Display all appropriate data and discuss the results as you would for an article

Assumptions:
1. Normal distribution
2. Independent sampling
3. Simple random sampling
4. Constant variance (the error variance is the same across the range of predicted values)
5. A linear model is used

1. Consider the following data:

X: 2 3 4 6 7 8 9 10 11 12 13
Y: 3 6 8 4 10 14 8 12 14 12 16

a.) Regress Y on X (i.e., regress the dependent variable Y on the independent variable X). Obtain and interpret the prediction equation. Predict the value of Y when X = 5.

Y = 2.170 + .978*X. When X = 5, Y = 7.06.

Assumptions: 1. Normal distribution 2. Independent sampling 3. Simple random sampling 4. Constant variance 5. Linear model

Steps to perform this:
1. Analyze > Regression > Linear
2. Select Y as the dependent variable and X as the independent variable
3. Click Statistics and select Descriptives (the other defaults can be left alone)
4. Plots: leave alone for this example
5. Save: nothing
6. Options: leave the defaults; nothing else needed (make sure the "Include constant in equation" box is checked)

Descriptive Statistics
        Mean    Std. Deviation   N
Y       9.73    4.292            11
X       7.73    3.690            11

Correlations: note that .841 (the correlation coefficient, r) signifies a strong relationship (the closer to 1, the stronger the relationship; 0 = no relationship).

Correlations
                        Y       X
Pearson Correlation  Y  1.000   .841
                     X  .841    1.000
Sig. (1-tailed)      Y  .       .001
                     X  .001    .
N                    Y  11      11
                     X  11      11

The Variables Entered/Removed table is a throwaway table (no need to copy/paste it).

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .841a  .707       .675                2.448
a. Predictors: (Constant), X

R squared is .841 squared. The correlation coefficient (.841 in the Correlations table) becomes the R reported in the Model Summary. It is .841 here because we have only one predictor: Y is predicted from X alone. With multiple predictors, the R in the Model Summary becomes the multiple correlation coefficient. R squared is the proportion of variance in the dependent variable explained by the independent variable(s). Here R squared is .707, interpreted as: we're explaining about 71% of the variation in Y by X. It describes how well the model fits and how predictable our dependent variable is from our independent variable.

ANOVAb
Model        Sum of Squares   df   Mean Square   F        Sig.
1 Regression 130.248          1    130.248       21.735   .001a
  Residual   53.934           9    5.993
  Total      184.182          10
a. Predictors: (Constant), X
b. Dependent Variable: Y

The ANOVA table is not heavily used. The Sig. level on the Regression row tests whether R squared is different from 0; we don't really interpret this, though. We spend most of our time on the Coefficients table. We are interested in R square to determine the appropriateness of the fit of the model, and in the Coefficients table itself.

Coefficientsa
Model          Unstandardized Coefficients   Standardized Coefficients
               B       Std. Error            Beta                        t       Sig.
1 (Constant)   2.170   1.781                                             1.218   .254
  X            .978    .210                  .841                        4.662   .001
a. Dependent Variable: Y

This table shows a row for the constant and a row for every predictor in the model (here, only one predictor, X). Sig. reports on the null hypothesis that the predictor's coefficient is 0, against the alternative that it is not 0. Ultimately, we examine the Coefficients table to identify which variables in the model play a significant role, and to arrive at a final model we can express as a function and interpret. Always include the constant; we don't have to interpret its significance because we keep it anyway (2.170). The unstandardized coefficients are the coefficients that make up the prediction equation; "unstandardized" means the metrics of the actual variables are preserved. Y = 2.170 + .978*X + error (whose variance is estimated by the mean square residual); we suppress the error term when predicting, leaving Y = 2.170 + .978*X. Interpret the constant as the value of Y when X = 0 (the intercept): Y = 2.170 + .978*0 = 2.170. Interpret the X coefficient as the change in Y for every one-unit increase in X.

b.) Plot the standardized residuals against the standardized predicted values. Do you see any pattern? What does this suggest?

To plot:
1. Analyze > Regression > Linear, select the Plots button
2. Standardized = Z
3. Y = ZRESID
4. X = ZPRED
5. Run the plot and comment on whether you see anything abnormal

The plot shows the residuals (the difference between the actual observations and the predicted values); they should be centered at 0. We are looking for randomness (no pattern), centering at 0 (some dots above and some below), and a horizontal channel (e.g., dots falling between -2 and 2). This one looks good, which suggests the linear model is appropriate.

c.) Overall, how well does the model fit? Answer by commenting on the scatterplot of residuals. A linear relationship seems appropriate; constant variance seems appropriate; the parameterization of the model as a linear model seems appropriate. R square: we are explaining about 71% of the variation in Y from X.

2. An experimenter was interested in the possible linear relation between the time spent per day practicing a foreign language and the ability of the person to speak the language at the end of a 6-week period. Some 50 students were assigned at random among five experimental conditions, ranging from 15 minutes of practice daily to 3 hours of practice per day. At the end of 6 weeks, each student was scored for proficiency in the language. The data follow:

Proficiency Scores, by Daily Practice Time (X = practice, in hours)
.25    .50    1      2      3
117    106    86     140    105
85     81     98     128    149
112    74     125    108    110
81     79     123    104    144
105    118    118    132    137
109    110    94     133    151
80     82     93     96     117
73     86     91     101    113
110    111    122    103    142
78     113    130    135    112

Assumptions:
1. Normal distribution
2. Independent sampling
3. Simple random sampling
4. Constant variance (error variance is the same across the range of predicted values)
5. A linear model is used

Find the linear regression equation for predicting Y, the proficiency of a student, from X, the practice time per day.

Proficiency (Y) = 92.385 + 12.308*hours (X = hours of practice). The constant is the Y intercept (the value when X = 0): without practice, the expected proficiency score is 92.4. For every hour of practice, proficiency increases by 12.308 points. There is a moderate relationship between practice time and proficiency; each hour of practice raises proficiency by approximately 12 points.

Descriptive Statistics
        Mean     Std. Deviation   N
Score   109.00   20.893           50
Time    1.3500   1.03016          50

Correlations
                           Score   Time
Pearson Correlation Score  1.000   .607
                    Time   .607    1.000
Sig. (1-tailed)     Score  .       .000
                    Time   .000    .
N                   Score  50      50
                    Time   50      50

This matrix shows the bivariate correlations between all the variables in the model; here we have only one predictor. With more independent variables, these are the bivariate correlations among all the variables, but not the joint model.

Pearson correlation: r = .607, a moderate relationship; Sig. (1-tailed) = .000, which is < .05 (reject the null hypothesis). r squared = .368: we are explaining 37% of proficiency by time.

Rough guidelines for interpreting r:
.9+ = very strong
.8 = strong
.6-.7 = moderate
.4-.5 = weaker but worth mentioning
<.4 = not worth mentioning

Model Summaryb
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .607a  .368       .355                16.779
a. Predictors: (Constant), Time
b. Dependent Variable: Score

R = the multiple correlation coefficient (in the Model Summary); it is the correlation between the set of independent variables and the dependent variable. Because this model has only one predictor, the multiple correlation coefficient is the same as the bivariate correlation coefficient (Pearson r).

R square = the proportion of variance explained by the model. The ANOVA table tests the null hypothesis that the proportion of variation explained by the model is 0 against the alternative that it is not 0. With regression analysis, most models are significant.

ANOVAb
Model        Sum of Squares   df   Mean Square   F        Sig.
1 Regression 7876.923         1    7876.923      27.980   .000a
  Residual   13513.077        48   281.522
  Total      21390.000        49
a. Predictors: (Constant), Time
b. Dependent Variable: Score
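The F and R square values follow directly from the sums of squares in the ANOVA table. A worked check using the printed values (my own variable names):

```python
# Sums of squares from the Problem 2 ANOVA table
ss_regression = 7876.923
ss_residual = 13513.077
df_regression = 1    # one predictor
df_residual = 48     # n - 2 = 50 - 2

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual      # ~281.522

f = ms_regression / ms_residual              # ~27.98, as reported

# R square is the regression share of the total sum of squares (~.368)
r_squared = ss_regression / (ss_regression + ss_residual)
print(round(f, 2), round(r_squared, 3))
```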

Coefficientsa
Model          Unstandardized Coefficients   Standardized Coefficients
               B        Std. Error           Beta                        t        Sig.
1 (Constant)   92.385   3.937                                            23.468   .000
  Time         12.308   2.327                .607                        5.290    .000
a. Dependent Variable: Score

The bulk of our analysis comes from here. Does the intercept (constant) have any meaning? The intercept represents the value of the dependent variable when all the independent variables are 0 (X = 0). Time is significant (.000 < .05; reject the null hypothesis). If you practice for 0 hours, you expect a score of 92.385. For every additional hour invested in studying, the return on that investment is an increase in score of 12.308.

Residuals Statisticsa
                      Minimum   Maximum   Mean     Std. Deviation   N
Predicted Value       95.46     129.31    109.00   12.679           50
Residual              -24.538   25.308    .000     16.607           50
Std. Predicted Value  -1.068    1.602     .000     1.000            50
Std. Residual         -1.462    1.508     .000     .990             50
a. Dependent Variable: Score

Plot to check for randomness: residuals should fall in a channel around 0. We are looking to see whether the errors are independent and random; if they are similar in variability, then our assumption of constant variance is probably appropriate.

3. An experimenter was interested in the possible linear relationship between a measure of finger dexterity X and another measure representing general muscular coordination Y. A random sample of 25 persons showed the following scores:

Person   X Value   Y Value
1        75        84
2        77        94
3        75        90
4        76        90
5        75        91
6        76        86
7        73        87
8        75        95
9        74        83
10       75        85
11       76        88
12       74        91
13       72        80
14       75        85
15       73        87
16       75        82
17       78        86
18       76        83
19       74        85
20       74        88
21       77        100
22       75        98
23       76        89
24       74        91
25       75        99

Assumptions:
1. Normal distribution
2. Independent sampling
3. Simple random sampling
4. Constant variance (error variance is the same across the range of predicted values)
5. A linear model is used

Test: Analyze > Correlate > Bivariate. Compute the correlation coefficient and test its significance (alpha = .05): r = .324, sig = .114, not significant. Therefore, no relationship between finger dexterity and general muscular coordination.

4. Based on the data in Exercise 3, find the regression equation for predicting X from Y. Plot this regression equation along with the raw data. What is the appropriate measure of the scatter or horizontal deviations of the obtained points in this plot about the regression line? (He skipped this part and said the next session might cover it.)

5. A developmental psychologist believes that the age at which a normal child begins to speak words clearly is closely related to the age at which the child first begins to use complete sentences. A random sample of 33 normal children was taken, and careful records were kept for each. Let X be the age at which words are first clearly used, and let Y be the age at which complete sentences are used. The following data give the values of X and Y in months.

Child   X      Y        Child   X      Y        Child   X      Y
1       15.1   25.2     12      14.3   25.7     23      13.6   24.3
2       12.7   24.3     13      11.5   23.4     24      15.2   26.3
3       11.7   22.1     14      13.4   25.7     25      12.1   23.4
4       13.1   23.3     15      13.7   24.5     26      12.6   24.5
5       13.0   24.1     16      13.5   26.0     27      14.1   26.2
6       11.2   23.6     17      12.8   24.6     28      11.2   23.0
7       13.3   25.5     18      13.2   25.4     29      14.0   24.3
8       12.3   24.3     19      14.7   26.3     30      13.1   25.3
9       13.7   25.5     20      12.2   25.2     31      11.5   24.2
10      12.2   23.2     21      14.7   26.4     32      14.9   27.2
11      13.3   27.1     22      14.6   25.8     33      13.8   26.3

Find the correlation between X and Y (.758, moderately strong), and compute and interpret an appropriate regression equation: (Age-Sentences) = 13.873 + .835*(Age-Words). For every month later a child starts speaking words, it takes .835 months longer to speak in sentences. (Explained: Y = the age at which the child speaks sentences. When X = 0 (if the child started speaking words at age 0), the earliest the child can speak in sentences is 13.873 months. If the child starts speaking words at age 1 month, then the age at which the child starts speaking in sentences is 13.873 + .835 = 14.708. In other words, if speaking words is delayed by 1 month, then the age at which the child speaks in sentences will be .835 months later.)

Analyze > Regression > Linear.
Pearson r = .758: the correlation is moderately strong (only one predictor variable).
R square = .574: we are explaining about 57% of the variation (moderate).
Coefficients table: (Age-Sentences) = 13.873 + .835*(Age-Words).
H0: R square = 0. H1: R square is not 0. The 57% we are explaining is significantly better than 0.

Descriptive Statistics
        Mean     Std. Deviation   N
Y       24.915   1.2578           33
X       13.221   1.1409           33

Correlations
                        Y       X
Pearson Correlation  Y  1.000   .758
                     X  .758    1.000
Sig. (1-tailed)      Y  .       .000
                     X  .000    .
N                    Y  33      33
                     X  33      33

Model Summaryb
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .758a  .574       .560                .8341
a. Predictors: (Constant), X
b. Dependent Variable: Y

Coefficientsa
Model          Unstandardized Coefficients   Standardized Coefficients
               B        Std. Error           Beta                        t       Sig.
1 (Constant)   13.873   1.715                                            8.090   .000
  X            .835     .129                 .758                        6.462   .000
a. Dependent Variable: Y

The constant is significant (but we would keep it in the model anyway).
