
ALEKS 1/8/20, 10:43 AM

Learning

QUESTION

Bivariate data obtained for the paired variables x and y are shown below, in the table labelled "Sample data." These data are plotted in the scatter plot in Figure 1, which also displays the least-squares regression line for the data. The equation for this line is ŷ = 11.40 + 0.60x.

In the "Calculations" table are calculations involving the observed y values, the mean ȳ of these values, and the values ŷ predicted from the regression equation.

Sample data and Calculations:

    x      y     (y − ȳ)²   (y − ŷ)²   (ŷ − ȳ)²
   21.7   25.2    3.0276     0.6084     6.3504
   24.2   25.3    2.6896     0.3844     1.0404
   26.3   25.7    1.5376     2.1904     0.0576
   27.6   29.1    4.6656     1.2996     1.0404
   29.6   29.4    6.0516     0.0576     4.9284
   Column sums:  17.9720     4.5404    13.4172

[Figure 1: scatter plot of the sample data with the least-squares regression line ŷ = 11.40 + 0.60x; both axes run from 22 to 32.]

Answer the following:

1. The variation in the sample y values that is explained by the estimated linear relationship
between x and y is given by the ? , which for these data is
? .

2. The value r² is the proportion of the total variation in the sample y values that is
explained by the estimated linear relationship between x and y. For these data, the value of
r² is ___. (Round your answer to at least 2 decimal places.)

3. The least-squares regression line given above is said to be a line which "best fits" the
sample data. The term "best fits" is used because the line has an equation that minimizes the
? , which for these data is ? .

4. For the data point (27.6, 29.1), the value of the residual is . (Round your answer to at
least 2 decimal places.)

EXPLANATION

Simple linear regression, which can be used to predict values of one variable from values of
another variable, can also be used to quantify and evaluate the linear relationship between two
variables. In this problem, we examine some statistics used in simple linear regression:
statistics that measure the variation in the data values and the amount of this variation that is
explained by a "best-fitting" line relating the data values.

Suppose that a bivariate data set has the n points (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ). (In the
bivariate data set in the problem, there are five data points.) The total sum of squares for this
data set, abbreviated SST, gives a measure of the variation of the y values about their mean ȳ:

    SST = Σᵢ₌₁ⁿ (yᵢ − ȳ)².

Note that the SST is simply a multiple of the sample variance of the y values. The deviations
yᵢ − ȳ entering into the calculation of the SST for the data set in this problem are represented
by the red lines in Figure 2 below. These deviations show a pattern: for small x values the
deviations are negative, and for large x values they are positive. This suggests that there is a
relationship between the two variables x and y. We can use simple linear regression to find a
"best-fitting" line for the data points and examine the linear relationship between the two
variables.
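As a quick illustrative sketch (the variable names are my own, not from the problem), the SST for the five sample points can be computed directly from its definition:

```python
# Sketch: computing the mean and the total sum of squares (SST)
# for the five sample y values from the problem.
ys = [25.2, 25.3, 25.7, 29.1, 29.4]

y_bar = sum(ys) / len(ys)                # mean of the y values: 26.94
sst = sum((y - y_bar) ** 2 for y in ys)  # SST = sum of squared deviations

print(round(y_bar, 2))  # 26.94
print(round(sst, 4))    # 17.972
```

This agrees with the "Column sums" entry for the first calculation column of the table.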

Suppose that a line having equation ŷ = a + bx is fit to the n data points
(x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ). We'll call this line the regression line and its equation the
regression equation. (In Figure 1, the regression line has equation ŷ = 11.40 + 0.60x.) The
difference yᵢ − ŷᵢ between the observed y value (yᵢ) for a given x value and the predicted y
value (ŷᵢ) for that x value is called a residual. (The residuals for the data points in the
problem and the least-squares regression line ŷ = 11.40 + 0.60x are represented by the red lines
in Figure 4 below.) When each of the residuals is squared and these squares are added, the value
obtained is called the error sum of squares (or sum of squares for error), abbreviated SSE:

    SSE = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)².
Measuring the distance of data points from a line: The SSE gives a measure of how far the data
points tend to be from the line. The farther the data points tend to be from the line, the larger
the value of the SSE. The least-squares regression line is the regression line that minimizes the
value of the SSE for a given set of data points. In other words, of all the regression lines that
could be drawn to fit the data points, the least-squares regression line is the one that gives
the smallest value for the SSE, and therefore may be said to "best fit" the data points. For the
five data points in the problem, the least-squares regression line is ŷ = 11.40 + 0.60x.
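A minimal sketch of the SSE calculation for these data, using the rounded coefficients a = 11.40 and b = 0.60 from the problem (variable names are illustrative):

```python
# Sketch: residuals and the error sum of squares (SSE) for the sample data,
# using the regression line y-hat = 11.40 + 0.60x from the problem.
xs = [21.7, 24.2, 26.3, 27.6, 29.6]
ys = [25.2, 25.3, 25.7, 29.1, 29.4]
a, b = 11.40, 0.60  # intercept and slope (rounded values from the problem)

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
sse = sum(r ** 2 for r in residuals)

print(round(sse, 4))  # 4.5404
```

The squared residuals reproduce the second calculation column of the table, and their sum matches its column sum.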
Along with the SSE and the SST, another important statistic is the regression sum of squares (or
the sum of squares for regression), abbreviated SSR and defined by

    SSR = Σᵢ₌₁ⁿ (ŷᵢ − ȳ)².


The deviations ŷᵢ − ȳ entering into the calculation of the SSR for the data set in the problem
are shown by the red lines in Figure 3 below. As implied by this figure, the SSR gives a measure
of the total variation in the sample y values that is explained by the linear relationship
between x and y as given by the regression equation.
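The SSR can be sketched the same way. Note that these are the rounded coefficients from the problem, so the result matches the table exactly, even though rounding makes the line only approximately the least-squares fit:

```python
# Sketch: regression sum of squares (SSR) for the sample data,
# using the predicted values y-hat = 11.40 + 0.60x.
xs = [21.7, 24.2, 26.3, 27.6, 29.6]
ys = [25.2, 25.3, 25.7, 29.1, 29.4]
a, b = 11.40, 0.60  # rounded coefficients from the problem

y_bar = sum(ys) / len(ys)  # 26.94
ssr = sum((a + b * x - y_bar) ** 2 for x in xs)

print(round(ssr, 4))  # 13.4172
```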

[Figure 2: scatter plot showing the deviations (in red) used in calculating the SST; the mean ȳ = 26.94 is marked.]
[Figure 3: scatter plot showing the deviations (in red) used in calculating the SSR; the mean ȳ = 26.94 is marked.]
[Figure 4: scatter plot showing the deviations (in red) used in calculating the SSE.]
(All three plots share the axes of Figure 1, running from 22 to 32.)

The relationship among the SST, the SSR, and the SSE: It turns out that

    SST = SSR + SSE.

This is an important result: the total variation (SST) in the sample y values is equal to the sum
of the variation (SSR) that is explained by the regression equation and the variation (SSE) that
is not explained by the regression equation.

The r² statistic: The ratio SSR/SST gives the proportion of the total variation in the sample y
values that is explained by the regression equation. This ratio is denoted by r².
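The identity SST = SSR + SSE holds exactly only for the exact least-squares coefficients (with the rounded coefficients 11.40 and 0.60 it holds only approximately). As a sketch, with helper names of my own, fitting the line from first principles and checking the identity and r²:

```python
import math

# Sketch: fit the least-squares line to the sample data from first
# principles, then verify SST = SSR + SSE and compute r-squared.
xs = [21.7, 24.2, 26.3, 27.6, 29.6]
ys = [25.2, 25.3, 25.7, 29.1, 29.4]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Least-squares slope and intercept (close to 0.60 and 11.40).
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
    / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

sst = sum((y - y_bar) ** 2 for y in ys)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
ssr = sum((a + b * x - y_bar) ** 2 for x in xs)

assert math.isclose(sst, ssr + sse)  # the identity holds for the exact fit
r2 = ssr / sst
print(round(r2, 2))  # 0.75
```

The exact fit gives slightly different sums of squares than the table's rounded ones, but r² still rounds to 0.75.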

We are now ready to answer parts 1 through 4 in the original problem:

1. The variation in the sample y values that is explained by the estimated linear relationship
between x and y is given by the regression sum of squares (SSR). For this data set and for the
regression line ŷ = 11.40 + 0.60x, the SSR is given by the sum of the squared deviations in the
third column of the "Calculations" table: 13.4172.

2. The value r² = SSR/SST is the proportion of the total variation in the sample y values that is
explained by the estimated linear relationship between the two variables. (For these data and
the regression line ŷ = 11.40 + 0.60x, the SSR is given by the sum in the third column of the
"Calculations" table, and the SST is given by the sum in the first column.) The proportion is
r² = SSR/SST = 13.4172/17.9720, which is approximately 0.75.
3. The least-squares regression line has an equation that minimizes the error sum of squares
(SSE), which is the sum of the squares of the residuals. For this data set and its least-squares
regression line ŷ = 11.40 + 0.60x, the squares of the residuals are given in the second column
of the "Calculations" table, so the SSE is the sum of the numbers in this column: 4.5404.

4. The residual for the data point (27.6, 29.1) is equal to the observed y value, which is 29.1,
minus the y value predicted from the regression line ŷ = 11.40 + 0.60x, which is
11.40 + 0.60(27.6) = 27.96. Thus, the residual is 29.1 − 27.96 = 1.14.
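The residual calculation in part 4 can be checked with a short sketch (names are illustrative):

```python
# Sketch: residual for the data point (27.6, 29.1) under the
# regression line y-hat = 11.40 + 0.60x.
a, b = 11.40, 0.60
x_obs, y_obs = 27.6, 29.1

y_hat = a + b * x_obs     # predicted value: 27.96
residual = y_obs - y_hat  # observed minus predicted

print(round(y_hat, 2))     # 27.96
print(round(residual, 2))  # 1.14
```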

ANSWER

1. The variation in the sample y values that is explained by the estimated linear relationship
between x and y is given by the regression sum of squares , which for these data is
13.4172 .

2. The value r² is the proportion of the total variation in the sample y values that is
explained by the estimated linear relationship between x and y. For these data, the value of
r² is 0.75. (Round your answer to at least 2 decimal places.)

3. The least-squares regression line given above is said to be a line which "best fits" the
sample data. The term "best fits" is used because the line has an equation that minimizes the
error sum of squares , which for these data is 4.5404 .

4. For the data point (27.6, 29.1), the value of the residual is 1.14 . (Round your answer to at
least 2 decimal places.)

Source: https://www-awa.aleks.com/alekscgi/x/Isl.exe/13mnfybujyKQVP-K…DcXyozC4CaHPK7IBXNaB3Neii7RUDGqu5TyL7RyKumaMaB1krYfHol-VPI6R
