Least Squares Regression Line (LSRL)
October 15, 2014

Recall from yesterday:
- Scatterplot: direction, form, strength, unusual features
- Correlation requires quantitative variables
- Correlation does not imply causation
- r applies to linear associations only
- A "bent" association may be made linear by re-expressing the data
- After standardization, the regression line is $\hat{z}_y = r\,z_x$

Linear Regression Line
The equation "models" the relationship, allowing us to predict a y value for any given x value:
$\hat{y} = 34x + 310$, where $\hat{y}$ = predicted rent with a hotel and x = number of spaces from "Go". (The "hat" means predicted.)
For the data point (39, 2000): $\hat{y} = 34(39) + 310 = 1636$

Residual
residual = observed − predicted, i.e. $e = y - \hat{y}$
For (39, 2000): residual = 2000 − 1636 = 364
If the residual > 0, the observed y value is greater than the predicted value, i.e. the actual data point lies above the regression line.

Calculating the Slope of the Least Squares Regression Line
Recall: r = 0.83 (for an explanation, see the Math Box on page 175).
Moving away from the "standardized" equation to the actual equation: $\hat{z}_y = r\,z_x$ works, but it means we would have to convert all of our data to z-scores, solve the equation, and then convert the z-scores back to the original units. This is too cumbersome! Instead, use:
slope: $b = r\,\frac{s_y}{s_x}$
y-intercept: $a = \bar{y} - b\,\bar{x}$

Revisiting yesterday's problem
a = ____   b = ____

Remember: the least squares regression line always passes through the center of the data, $(\bar{x}, \bar{y})$. To find the equation of a regression line, use the slope and this point.

QUOTATION OF THE DAY
"Thank God the research didn't find that novels increased tooth decay or blocked up your arteries."
LOUISE ERDRICH, the novelist, on studies finding that after reading literary fiction people performed better on tests measuring empathy, social perception and emotional intelligence.

What conditions must a scatterplot satisfy before we can discuss correlation and the regression line?
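As a quick check of the arithmetic in the rent example, the minimal Python sketch below evaluates the line $\hat{y} = 34x + 310$ at x = 39 and computes the residual for the observed rent of 2000. The helper names `predict` and `residual` are hypothetical, chosen only for this illustration.

```python
# Prediction and residual for the example line y-hat = 34x + 310
# (x = spaces from "Go", y = rent with a hotel).

def predict(x, slope=34.0, intercept=310.0):
    """Predicted y (y-hat) from the line y-hat = intercept + slope * x."""
    return intercept + slope * x


def residual(x, y_observed, slope=34.0, intercept=310.0):
    """Residual = observed - predicted."""
    return y_observed - predict(x, slope, intercept)


x, y = 39, 2000
y_hat = predict(x)   # 34*39 + 310
e = residual(x, y)   # 2000 - y_hat
print(f"predicted rent: {y_hat:.0f}")
print(f"residual: {e:.0f}")
if e > 0:
    print("The point lies above the line (observed > predicted).")
```

A positive residual means the line under-predicts that point; a negative one means it over-predicts.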
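The step from the standardized line to the slope and intercept formulas can be written out as follows, starting from $\hat{z}_y = r\,z_x$ and substituting $z = (\text{value} - \text{mean})/s$ for each variable:

```latex
\begin{aligned}
\hat{z}_y &= r\,z_x \\
\frac{\hat{y} - \bar{y}}{s_y} &= r\,\frac{x - \bar{x}}{s_x} \\
\hat{y} &= \bar{y} + r\,\frac{s_y}{s_x}\,(x - \bar{x}) \\
\hat{y} &= \underbrace{\left(\bar{y} - b\,\bar{x}\right)}_{a} + \underbrace{r\,\frac{s_y}{s_x}}_{b}\,x
\end{aligned}
```

Setting $x = \bar{x}$ in the third line gives $\hat{y} = \bar{y}$, which is why the line always passes through $(\bar{x}, \bar{y})$.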
Conditions:
- No outliers
- No discernible pattern besides being linear
- Both variables are quantitative

Important notes about the LSRL
- Before you use the LSRL, you must look at the scatterplot to make sure the relationship is linear.
- Be sure to distinguish between actual data values and predicted values. Use words like "predicted" and "on average."
- The slope of the LSRL has the same sign as r, the correlation coefficient.
- Be able to answer questions such as: What is the equation of the LSRL? What is the interpretation of the slope and the y-intercept?

How well does the line fit the data? How appropriate is our line for making predictions?
The answer? Residuals.
How? Create a scatterplot of the residuals vs. the x-values.

[Figure: residual plot, Residuals vs. Registrations]

Residuals (continued)
A good residual plot (one that confirms the appropriate use of the LSRL) should:
- have similar scatter throughout
- have no interesting features
- have no direction or shape

$s_e$ = standard deviation of the residuals
- A small $s_e$ implies residuals that are closely clustered about their mean.
- The mean of the residuals = 0.
- The residuals are in the same units as the y-values.

What is it about $R^2$?
$R^2$: the coefficient of determination (for the LSRL, $R^2 = r^2$). It shows how well x predicts y through the linear model: the closer $R^2$ is to 1, the larger the fraction of the variation in y that the line accounts for.
If $R^2 = 0.80$, then 80% of the variation in the values of y is explained by its linear relationship with the values of x.

Before you find the best-fit line:
Step 1: Check that the two variables are quantitative, check that the association is linear, and check for the presence of outliers.
Step 2: Do the math: find the least squares line, r, and $R^2$.
State the conclusion: What does the y-intercept tell you? What does $R^2$ tell you?
$s_e$ = standard deviation of the residuals

Finally: check the residual plot to make sure that the linear model is appropriate.

Note: you cannot work backwards. You cannot use the linear regression line (which predicts y from x) to predict x.

Slope of the LSRL: $b = r\,\frac{s_y}{s_x}$