
ALEKS 1/8/20, 10:43 AM

Learning

QUESTION

Bivariate data obtained for the paired variables x and y are shown below, in the table labelled "Sample data." These data are plotted in the scatter plot in Figure 1, which also displays the least-squares regression line for the data. The equation for this line is ŷ = 11.40 + 0.60x.

In the "Calculations" table are calculations involving the observed y values, the mean ȳ of these values, and the values ŷ predicted from the regression equation.

Sample data and Calculations:

    x      y     (y − ȳ)²   (y − ŷ)²   (ŷ − ȳ)²
   21.7   25.2    3.0276     0.6084     6.3504
   24.2   25.3    2.6896     0.3844     1.0404
   26.3   25.7    1.5376     2.1904     0.0576
   27.6   29.1    4.6656     1.2996     1.0404
   29.6   29.4    6.0516     0.0576     4.9284
   Column sums:  17.9720     4.5404    13.4172

[Figure 1: scatter plot of the sample data with the least-squares regression line ŷ = 11.40 + 0.60x; both axes run from 22 to 32.]

Answer the following:

1. The variation in the sample y values that is explained by the estimated linear relationship
between x and y is given by the ? , which for these data is
? .

2. The value r² is the proportion of the total variation in the sample y values that is
explained by the estimated linear relationship between x and y. For these data, the value of
r² is ___. (Round your answer to at least 2 decimal places.)

3. The least-squares regression line given above is said to be a line which "best fits" the
sample data. The term "best fits" is used because the line has an equation that minimizes the
? , which for these data is ? .

4. For the data point (27.6, 29.1), the value of the residual is . (Round your answer to at
least 2 decimal places.)

EXPLANATION

Simple linear regression, which can be used to predict values of one variable from values of
another variable, can also be used to quantify and evaluate the linear relationship between two
variables. In this problem, we examine some statistics used in simple linear regression:
statistics that measure the variation in the data values and the amount of this variation that is
explained by a "best-fitting" line relating the data values.

Suppose that a bivariate data set has the n points (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ). (In the
bivariate data set in the problem, there are five data points.) The total sum of squares for this
data set, abbreviated SST, gives a measure of the variation of the y values about their mean ȳ:

    SST = Σᵢ₌₁ⁿ (yᵢ − ȳ)².

Note that the SST is simply a multiple of the sample variance of the y values. The deviations
yᵢ − ȳ entering into the calculation of the SST for the data set in this problem are represented
by the red lines in Figure 2 below. These deviations show a pattern: for small x values the
deviations are negative, and for large x values they are positive. This suggests that there is a
relationship between the two variables x and y. We can use simple linear regression to find a
"best-fitting" line for the data points and examine the linear relationship between the two
variables.
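As a quick illustrative sketch (the variable names are my own, not from the problem), the SST for the five sample points can be computed directly from its definition:

```python
# Sketch: computing the mean and the total sum of squares (SST)
# for the five sample y values from the problem.
ys = [25.2, 25.3, 25.7, 29.1, 29.4]

y_bar = sum(ys) / len(ys)                # mean of the y values: 26.94
sst = sum((y - y_bar) ** 2 for y in ys)  # SST = sum of squared deviations

print(round(y_bar, 2))  # 26.94
print(round(sst, 4))    # 17.972
```

This agrees with the "Column sums" entry for the first calculation column of the table.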

Suppose that a line having equation ŷ = a + bx is fit to the n data points
(x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ). We'll call this line the regression line and its equation the
regression equation. (In Figure 1, the regression line has equation ŷ = 11.40 + 0.60x.) The
difference yᵢ − ŷᵢ between the observed y value (yᵢ) for a given x value and the predicted y
value (ŷᵢ) for that x value is called a residual. (The residuals for the data points in the
problem and the least-squares regression line ŷ = 11.40 + 0.60x are represented by the red lines
in Figure 4 below.) When each of the residuals is squared and these squares are added, the value
obtained is called the error sum of squares (or sum of squares for error), abbreviated SSE:

    SSE = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)².
Measuring the distance of data points from a line: The SSE gives a measure of how far the data
points tend to be from the line. The farther the data points tend to be from the line, the larger
the value of the SSE. The least-squares regression line is the regression line that minimizes the
value of the SSE for a given set of data points. In other words, of all the regression lines that
could be drawn to fit the data points, the least-squares regression line is the one that gives
the smallest value for the SSE, and therefore may be said to "best fit" the data points. For the
five data points in the problem, the least-squares regression line is ŷ = 11.40 + 0.60x.
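A minimal sketch of the SSE calculation for these data, using the rounded coefficients a = 11.40 and b = 0.60 from the problem (variable names are illustrative):

```python
# Sketch: residuals and the error sum of squares (SSE) for the sample data,
# using the regression line y-hat = 11.40 + 0.60x from the problem.
xs = [21.7, 24.2, 26.3, 27.6, 29.6]
ys = [25.2, 25.3, 25.7, 29.1, 29.4]
a, b = 11.40, 0.60  # intercept and slope (rounded values from the problem)

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
sse = sum(r ** 2 for r in residuals)

print(round(sse, 4))  # 4.5404
```

The squared residuals reproduce the second calculation column of the table, and their sum matches its column sum.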
Along with the SSE and the SST, another important statistic is the regression sum of squares (or
the sum of squares for regression), abbreviated SSR and defined by

    SSR = Σᵢ₌₁ⁿ (ŷᵢ − ȳ)².


The deviations ŷᵢ − ȳ entering into the calculation of the SSR for the data set in the problem
are shown by the red lines in Figure 3 below. As implied by this figure, the SSR gives a measure
of the total variation in the sample y values that is explained by the linear relationship
between x and y as given by the regression equation.
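The SSR can be sketched the same way. Note that these are the rounded coefficients from the problem, so the result matches the table exactly, even though rounding makes the line only approximately the least-squares fit:

```python
# Sketch: regression sum of squares (SSR) for the sample data,
# using the predicted values y-hat = 11.40 + 0.60x.
xs = [21.7, 24.2, 26.3, 27.6, 29.6]
ys = [25.2, 25.3, 25.7, 29.1, 29.4]
a, b = 11.40, 0.60  # rounded coefficients from the problem

y_bar = sum(ys) / len(ys)  # 26.94
ssr = sum((a + b * x - y_bar) ** 2 for x in xs)

print(round(ssr, 4))  # 13.4172
```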

[Figure 2: scatter plot showing the deviations (in red) used in calculating the SST; the mean ȳ = 26.94 is marked.]
[Figure 3: scatter plot showing the deviations (in red) used in calculating the SSR; the mean ȳ = 26.94 is marked.]
[Figure 4: scatter plot showing the deviations (in red) used in calculating the SSE.]
(All three plots share the axes of Figure 1, running from 22 to 32.)

The relationship among the SST, the SSR, and the SSE: It turns out that

    SST = SSR + SSE.

This is an important result: the total variation (SST) in the sample y values is equal to the sum
of the variation (SSR) that is explained by the regression equation and the variation (SSE) that
is not explained by the regression equation.

The r² statistic: The ratio SSR/SST gives the proportion of the total variation in the sample y
values that is explained by the regression equation. This ratio is denoted by r².
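The identity SST = SSR + SSE holds exactly only for the exact least-squares coefficients (with the rounded coefficients 11.40 and 0.60 it holds only approximately). As a sketch, with helper names of my own, fitting the line from first principles and checking the identity and r²:

```python
import math

# Sketch: fit the least-squares line to the sample data from first
# principles, then verify SST = SSR + SSE and compute r-squared.
xs = [21.7, 24.2, 26.3, 27.6, 29.6]
ys = [25.2, 25.3, 25.7, 29.1, 29.4]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Least-squares slope and intercept (close to 0.60 and 11.40).
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
    / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

sst = sum((y - y_bar) ** 2 for y in ys)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
ssr = sum((a + b * x - y_bar) ** 2 for x in xs)

assert math.isclose(sst, ssr + sse)  # the identity holds for the exact fit
r2 = ssr / sst
print(round(r2, 2))  # 0.75
```

The exact fit gives slightly different sums of squares than the table's rounded ones, but r² still rounds to 0.75.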

We are now ready to answer parts 1 through 4 in the original problem:

1. The variation in the sample y values that is explained by the estimated linear relationship
between x and y is given by the regression sum of squares (SSR). For this data set and for the
regression line ŷ = 11.40 + 0.60x, the SSR is given by the sum of the squared deviations in the
third column of the "Calculations" table: 13.4172.

2. The value r² = SSR/SST is the proportion of the total variation in the sample y values that is
explained by the estimated linear relationship between the two variables. (For these data and
the regression line ŷ = 11.40 + 0.60x, the SSR is given by the sum in the third column of the
"Calculations" table, and the SST is given by the sum in the first column.) The proportion is
r² = SSR/SST = 13.4172/17.9720, which is approximately 0.75.
3. The least-squares regression line has an equation that minimizes the error sum of squares
(SSE), which is the sum of the squares of the residuals. For this data set and its least-squares
regression line ŷ = 11.40 + 0.60x, the squares of the residuals are given in the second column
of the "Calculations" table, so the SSE is the sum of the numbers in this column: 4.5404.

4. The residual for the data point (27.6, 29.1) is equal to the observed y value, which is 29.1,
minus the y value predicted from the regression line ŷ = 11.40 + 0.60x, which is
11.40 + 0.60(27.6) = 27.96. Thus, the residual is 29.1 − 27.96 = 1.14.
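The residual calculation in part 4 can be checked with a short sketch (names are illustrative):

```python
# Sketch: residual for the data point (27.6, 29.1) under the
# regression line y-hat = 11.40 + 0.60x.
a, b = 11.40, 0.60
x_obs, y_obs = 27.6, 29.1

y_hat = a + b * x_obs     # predicted value: 27.96
residual = y_obs - y_hat  # observed minus predicted

print(round(y_hat, 2))     # 27.96
print(round(residual, 2))  # 1.14
```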

ANSWER

1. The variation in the sample y values that is explained by the estimated linear relationship
between x and y is given by the regression sum of squares , which for these data is
13.4172 .

2. The value r² is the proportion of the total variation in the sample y values that is
explained by the estimated linear relationship between x and y. For these data, the value of
r² is 0.75. (Round your answer to at least 2 decimal places.)

3. The least-squares regression line given above is said to be a line which "best fits" the
sample data. The term "best fits" is used because the line has an equation that minimizes the
error sum of squares , which for these data is 4.5404 .

4. For the data point (27.6, 29.1), the value of the residual is 1.14 . (Round your answer to at
least 2 decimal places.)

Source: https://www-awa.aleks.com/alekscgi/x/Isl.exe/13mnfybujyKQVP-K…DcXyozC4CaHPK7IBXNaB3Neii7RUDGqu5TyL7RyKumaMaB1krYfHol-VPI6R
