
The Coefficient of Determination r² / R²: A Measure of “Goodness of Fit”


R² measures the goodness of fit of the fitted regression line to a set of data; that is, it tells
us how “well” the sample regression line fits the data.

In the following figure, it is clear that if all the observations were to lie on the regression line,
we would obtain a “perfect” fit (R² = 1), but this is rarely the case. Generally, there will be some
positive residuals ûᵢ and some negative residuals ûᵢ. We hope that these residuals around the
regression line are as small as possible.

The coefficient of determination r² (two-variable case) or R² (multiple regression) is a summary
measure that tells how well the sample regression line fits the data.

In regression analysis, “R squared” is the proportion of the variance in the dependent variable
that is predicted by the independent variable(s).
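
The definition above can be sketched numerically. The snippet below is a minimal illustration for the two-variable case, assuming the usual OLS fit with an intercept; the function name `r_squared` and the data are hypothetical and are not taken from these notes.

```python
import numpy as np

def r_squared(x, y):
    """Proportion of the variation in y explained by the OLS line y = b0 + b1*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # OLS slope and intercept for the two-variable model
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    residuals = y - (b0 + b1 * x)
    rss = np.sum(residuals ** 2)        # unexplained (residual) variation
    tss = np.sum((y - y.mean()) ** 2)   # total variation in y around its mean
    return 1.0 - rss / tss

# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(r_squared(x, y))
```

Points that lie close to a straight line give a value near 1; the more the residuals grow relative to the total variation in Y, the closer the value moves to 0.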

The Venn Diagram, or the Ballentine View, of R²

The Ballentine view of r²: (a) r² = 0; … (f) r² = 1.

In this figure, the circle Y represents variation in the dependent variable Y and the circle X represents variation in the
explanatory variable X.

The overlap of the two circles (the shaded area) indicates the extent to which the variation in Y is explained by the
variation in X (r² in OLS regression).

The greater the overlap, the greater the share of the variation in Y that is explained by X; the smaller the
overlap, the smaller that share.

The r² is simply a numerical measure of this overlap. In the figure, as we move from left to right, the area of the
overlap increases; that is, a successively greater proportion of the variation in Y is explained by X. In short, r²
increases. When there is no overlap, r² is zero; when the overlap is complete, r² is 1, since 100 percent
of the variation in Y is explained by X. As we shall show shortly, r² lies between 0 and 1.

Two properties of r² may be noted:
1. It is a nonnegative quantity. (Why?)
2. Its limits are 0 ≤ r² ≤ 1.
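
One standard way to see both properties is the decomposition of the total variation in Y. The sketch below assumes an OLS regression that includes an intercept; the labels TSS, ESS and RSS (total, explained and residual sum of squares) are not defined in these notes and are introduced here only for illustration.

```latex
\begin{aligned}
\underbrace{\sum (Y_i - \bar{Y})^2}_{\text{TSS}}
  &= \underbrace{\sum (\hat{Y}_i - \bar{Y})^2}_{\text{ESS}}
   + \underbrace{\sum \hat{u}_i^2}_{\text{RSS}}, \\
r^2 &= \frac{\text{ESS}}{\text{TSS}} = 1 - \frac{\text{RSS}}{\text{TSS}}.
\end{aligned}
```

Because ESS and RSS are sums of squares, neither is negative and neither can exceed TSS, so r² is nonnegative and cannot exceed 1.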

The Coefficient of Correlation r


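The coefficient of correlation “r” is a measure of the degree of linear association between
two variables, such as a dependent and an independent variable.
It can be computed as r = ±√r², where the sign of r is the same as the sign of the estimated slope coefficient.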
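
The link between r and r² can be checked numerically. The snippet below is a sketch under the assumption of a two-variable OLS fit with an intercept; `slope_and_r` and the downward-sloping data are hypothetical. It takes the sign of r from the slope and compares the result with NumPy's `np.corrcoef`.

```python
import numpy as np

def slope_and_r(x, y):
    """Return the OLS slope, r^2 (as ESS/TSS), and r signed like the slope."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    slope = np.sum(dx * dy) / np.sum(dx ** 2)
    # Explained variation over total variation in y
    r2 = slope ** 2 * np.sum(dx ** 2) / np.sum(dy ** 2)
    # Take the square root and attach the sign of the slope
    r = np.sign(slope) * np.sqrt(r2)
    return slope, r2, r

# Hypothetical downward-sloping data, made up for illustration only
x = [1, 2, 3, 4, 5]
y_down = [9.8, 8.1, 6.2, 3.9, 2.0]
slope, r2, r = slope_and_r(x, y_down)
print(slope < 0, r < 0)                               # both negative here
print(np.isclose(r, np.corrcoef(x, y_down)[0, 1]))    # agrees with NumPy's r
```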

Some of the properties of r are as follows:


1. It can be positive or negative; the sign depends on the sign of the sample covariation of the two variables.
2. It lies between the limits of -1 and +1; that is, -1 ≤ r ≤ 1.
3. It is symmetrical in nature; that is, the coefficient of correlation between X and Y (rXY)
is the same as that between Y and X (rYX).
4. It is independent of the origin and scale; that is, adding a constant to X or Y, or
multiplying either by a positive constant, leaves r unchanged.
5. If X and Y are statistically independent, the correlation coefficient between them is zero;
but r = 0 does not imply that the two variables are independent.
6. It is a measure of linear association or linear dependence only; it has no meaning for
describing nonlinear relations. Thus in Figure (h), Y = X² is an exact relationship, yet
r is zero.
7. Although it is a measure of linear association between two variables, it does not
necessarily imply any cause-and-effect relationship.
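
Properties 3, 4 and 6 can be illustrated with a short numerical check. This is a sketch on simulated data, not part of the original notes: the helper `corr`, the seed, and the constants are all hypothetical. The scale factors are taken to be positive, and the X values in the last check are symmetric around zero, which is what makes r vanish for Y = X².

```python
import numpy as np

def corr(a, b):
    """Sample correlation coefficient between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(0)            # fixed seed for reproducibility
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)        # simulated linear relation plus noise

# Property 3: r is symmetric in X and Y
symmetric = np.isclose(corr(x, y), corr(y, x))

# Property 4: shifting the origin and rescaling by positive constants leaves r unchanged
invariant = np.isclose(corr(x, y), corr(3.0 * x + 10.0, 0.5 * y - 4.0))

# Property 6: an exact nonlinear relation can still have r = 0
x_sym = np.linspace(-3, 3, 61)            # symmetric around zero
r_nonlinear = corr(x_sym, x_sym ** 2)

print(symmetric, invariant, abs(r_nonlinear) < 1e-10)
```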

In regression analysis, r² is a more meaningful measure than r. The r² tells us the proportion
of variation/variance in the dependent variable explained by the explanatory variable(s) and
therefore provides an overall measure of the extent to which the variation in one variable
determines the variation in the other. The coefficient r has no such interpretation.