
Coefficient of Determination Formula

The coefficient of determination is one of the most important tools in statistics and is widely used in data analysis in economics, physics, chemistry and many other fields. It helps us judge how well outcomes can be predicted and how much of the variability in the data is accounted for. The coefficient of determination is denoted by r² or sometimes by R². It is simply the square of r, the correlation coefficient. The value of the coefficient of determination lies between 0 and 1. The higher the value of r², the better the prediction becomes. The formula for the coefficient of determination is given below:

$$ R^2 = r^2 $$

The formula of the correlation coefficient is given below:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$

Where,
r = Correlation coefficient
x = Values in first set of data
y = Values in second set of data
n = Total number of values.
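
As a rough illustration, the correlation formula built from these symbols can be written as a short Python function. The names `pearson_r` and `coefficient_of_determination` are chosen for this sketch only and do not come from the original text.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r from the sums n, Σx, Σy, Σxy, Σx², Σy²."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # Happens when every x (or every y) has the same value.
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

def coefficient_of_determination(xs, ys):
    """Square of the correlation coefficient."""
    return pearson_r(xs, ys) ** 2
```

Both functions take two equal-length lists of numbers, one for each set of data.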

Coefficient of Determination Problems

A few problems based on the coefficient of determination are as follows:
Solved Examples
Question 1: The marks obtained by a few students in physics and chemistry tests are given in the following table:
Physics     18   16   15   10
Chemistry   15   12    9   17

Compute the coefficient of determination.



Solution:

Construct the following table for the determination of correlation coefficient:

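x      y      x²     y²     xy
18     15     324    225    270
16     12     256    144    192
15     9      225    81     135
10     17     100    289    170
Σx = 59   Σy = 53   Σx² = 905   Σy² = 739   Σxy = 767

Formula for correlation coefficient:
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$
$$ r = \frac{4(767) - (59)(53)}{\sqrt{\left[4(905) - (59)^2\right]\left[4(739) - (53)^2\right]}} = \frac{-59}{\sqrt{(139)(147)}} $$
r = -0.41275
Coefficient of determination = r² = 0.17036 = 0.17 (approx).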


Question 2: During an observation at a garden, the number of roses and the number of marigold flowers are noted every week. The readings
of 4 successive weeks are as follows:
Roses 2 3 1 4
Marigold 2 5 3 6

Evaluate the coefficient of determination.
Solution:
Construct the following table for the determination of correlation coefficient:
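x      y      x²     y²     xy
2      2      4      4      4
3      5      9      25     15
1      3      1      9      3
4      6      16     36     24
Σx = 10   Σy = 16   Σx² = 30   Σy² = 74   Σxy = 46

Formula for correlation coefficient:
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$
$$ r = \frac{4(46) - (10)(16)}{\sqrt{\left[4(30) - (10)^2\right]\left[4(74) - (16)^2\right]}} = \frac{24}{\sqrt{(20)(40)}} $$
r = 0.8485
Coefficient of determination = r² = 0.7199 = 0.72 (approx).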


Coefficient of Determination: What it is and How to Calculate it

Coefficient of Determination: Overview
The coefficient of determination, R², is used to analyze how differences in one variable can be explained by differences in a second variable. For example, when a person gets pregnant has a direct relation to when they give birth. The coefficient of determination is closely related to the correlation coefficient, r: it is the square of r. The correlation coefficient tells you how strong a linear relationship there is between two variables.


Finding the Coefficient of Determination

Step 1: Find the correlation coefficient, r (it may be given to you in the question). Example: r = 0.543.
Step 2: Square the correlation coefficient.
0.543² = 0.295
Step 3: Convert the result to a percentage.
0.295 = 29.5%

That's it!
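
The three steps can be sketched in a few lines of Python. The function name `r_to_percent` is made up for this illustration.

```python
def r_to_percent(r):
    """Square r (Step 2) and express the result as a percentage (Step 3)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return (r ** 2) * 100.0

# Step 1: r is given in the question.
print(f"{r_to_percent(0.543):.1f}%")  # prints 29.5%
```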
Meaning of the Coefficient of Determination
The coefficient of determination can be thought of as a percent. It gives you an idea of how much of the variation in the data is accounted for by the line formed by the regression equation. The higher the coefficient, the closer the data points lie to the line when the data points and line are plotted. If the coefficient is 0.80, then 80% of the variation in y is explained by the regression line. Values of 1 or 0 would indicate the regression line explains all or none of the variation, respectively. A higher coefficient is an indicator of a better goodness of fit for the observations.
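
One common way to read R² is as the share of the variation in y that a fitted straight line accounts for. The sketch below, which is an illustration rather than part of the original article, fits a least-squares line and computes 1 - SS_res / SS_tot. For a straight-line fit with an intercept this equals the square of the correlation coefficient. The function name `r_squared_from_fit` is hypothetical, and the data is the roses and marigold example from Question 2.

```python
def r_squared_from_fit(xs, ys):
    """R² as 1 - SS_res / SS_tot for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Unexplained variation: squared distances from the points to the line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared distances from the points to the mean of y.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

roses = [2, 3, 1, 4]
marigolds = [2, 5, 3, 6]
print(round(r_squared_from_fit(roses, marigolds), 2))  # 0.72
```

The result matches the 0.72 obtained in Question 2 by squaring r.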
Usefulness of R²

The usefulness of R² is its ability to indicate how well future outcomes are likely to be predicted by the model. The idea is that if more samples are added, a higher coefficient means a new point is more likely to fall close to the line.
Even if there is a strong connection between the two variables, determination does not prove causality. For example, a study on birthdays
may show a large number of birthdays happen within a time frame of one or two months. This does not mean that the passage of time or the
change of seasons causes pregnancy.
Syntax
The coefficient of determination is usually written as R²_p. The p value indicates the number of columns of data, which is useful when comparing the R² of different data sets.
Correlation Coefficients: Find Pearson's Correlation Coefficient
Contents:
1. How to Find Pearson's Correlation Coefficients.
2. How to test a correlation coefficient.
3. What Does the Correlation Coefficient Mean?
4. Cramér's V Correlation
5. Where did the Correlation Coefficient Come From?
How to Find Pearson's Correlation Coefficients
Correlation coefficients are used in statistics to measure how strong a relationship is between two variables. There are several types of correlation coefficient; Pearson's correlation (also called Pearson correlation) is the one commonly used in linear regression.

Sample question: Find the value of the correlation coefficient from the following table:
Subject   Age x   Glucose Level y
1         43      99
2         21      65
3         25      79
4         42      75
5         57      87
6         59      81



Step 1: Make a chart. Use the given data, and add three more columns: xy, x², and y².
Subject   Age x   Glucose Level y   xy      x²      y²
1         43      99
2         21      65
3         25      79
4         42      75
5         57      87
6         59      81

Step 2: Multiply x and y together to fill the xy column. For example, row 1 would be 43 × 99 = 4,257.

Subject   Age x   Glucose Level y   xy      x²      y²
1         43      99                4257
2         21      65                1365
3         25      79                1975
4         42      75                3150
5         57      87                4959
6         59      81                4779

Step 3: Take the square of the numbers in the x column, and put the result in the x² column.
Subject   Age x   Glucose Level y   xy      x²      y²
1         43      99                4257    1849
2         21      65                1365    441
3         25      79                1975    625
4         42      75                3150    1764
5         57      87                4959    3249
6         59      81                4779    3481

Step 4: Take the square of the numbers in the y column, and put the result in the y² column.
Subject   Age x   Glucose Level y   xy      x²      y²
1         43      99                4257    1849    9801
2         21      65                1365    441     4225
3         25      79                1975    625     6241
4         42      75                3150    1764    5625
5         57      87                4959    3249    7569
6         59      81                4779    3481    6561
Step 5: Add up all of the numbers in each column and put the result at the bottom of the column. The Greek letter sigma (Σ) is a short way of saying "sum of".
Subject   Age x   Glucose Level y   xy      x²      y²
1         43      99                4257    1849    9801
2         21      65                1365    441     4225
3         25      79                1975    625     6241
4         42      75                3150    1764    5625
5         57      87                4959    3249    7569
6         59      81                4779    3481    6561
Σ         247     486               20485   11409   40022

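Step 6: Use the following correlation coefficient formula.

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$

From our table:
Σx = 247
Σy = 486
Σxy = 20,485
Σx² = 11,409
Σy² = 40,022
n is the sample size, in our case n = 6
The correlation coefficient =
$$ \frac{6(20{,}485) - (247 \times 486)}{\sqrt{\left[6(11{,}409) - 247^2\right]\left[6(40{,}022) - 486^2\right]}} $$
The answer is: 2868 / 5413.27 = 0.529809
= 0.5298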
The range of the correlation coefficient is from -1 to 1. Our result is 0.5298 or 52.98%, which means the variables have a moderate positive
correlation.
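
The table-based steps for the age and glucose data can be retraced with a short Python sketch. The variable names are chosen for this illustration only; the numbers are the ones from the sample question.

```python
from math import sqrt

ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]

# Steps 1-4: one row per subject with the columns x, y, xy, x², y².
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(ages, glucose)]

# Step 5: column totals (Σx, Σy, Σxy, Σx², Σy²).
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (sum(col) for col in zip(*rows))

# Step 6: plug the totals into the correlation coefficient formula.
n = len(rows)
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 247 486 20485 11409 40022
print(round(r, 4))  # 0.5298
```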

How to test correlation coefficients

If you can read a table, you can test a correlation coefficient.
Sample problem: test the significance of the correlation coefficient r = 0.565 using the critical values for the PPMC (Pearson product-moment correlation) table. Test at α = 0.01 for a sample size of 9.
Step 1: Subtract two from the sample size to get df, degrees of freedom.
9 - 2 = 7

Step 2: Look the values up in the PPMC Table. With df = 7 and α = 0.01, the table value is 0.798.
Step 3: Draw a graph, so you can more easily see the relationship.

r = 0.565 does not fall into the rejection region (above 0.798), so there isn't enough evidence to state that a significant linear relationship exists in the data.
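
The article looks the critical value up in a table. As a cross-check, and not as the article's own method, the same cut-off can be recovered from a critical t value through the relationship r = t / sqrt(t² + df). The sketch below assumes a two-tailed test and takes t = 3.499 from a standard t table for α = 0.01 and df = 7; the function names are hypothetical.

```python
from math import sqrt

def critical_r(t_critical, df):
    """Critical correlation implied by a critical t value with df = n - 2."""
    return t_critical / sqrt(t_critical ** 2 + df)

def t_statistic(r, df):
    """t statistic for testing whether the correlation differs from zero."""
    return r * sqrt(df / (1.0 - r ** 2))

df = 9 - 2       # Step 1: sample size minus two
t_crit = 3.499   # assumed: two-tailed t-table value for alpha = 0.01, df = 7
r = 0.565

r_crit = critical_r(t_crit, df)
print(round(r_crit, 3))  # 0.798
print(abs(r) > r_crit)   # False, so r is not in the rejection region
```

Comparing `t_statistic(r, df)` with `t_crit` leads to the same decision as comparing r with the table value.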
What Does the Correlation Coefficient Mean?
Pearson's Correlation Coefficient returns a value between -1 and +1. A -1 means there is a strong negative correlation and +1 means that there is a strong positive correlation. This can initially be a little hard to wrap your head around (who likes to deal with negative numbers?). The Political Science Department at Quinnipiac University posted this useful list of the meaning of Pearson's Correlation coefficients. They note that these are crude estimates for interpreting strengths of correlations using Pearson's Correlation:
r value           Interpretation

+.70 or higher    Very strong positive relationship
+.40 to +.69      Strong positive relationship
+.30 to +.39      Moderate positive relationship
+.20 to +.29      Weak positive relationship
+.01 to +.19      No or negligible relationship
0                 No relationship
-.01 to -.19      No or negligible relationship
-.20 to -.29      Weak negative relationship
-.30 to -.39      Moderate negative relationship
-.40 to -.69      Strong negative relationship
-.70 or lower     Very strong negative relationship
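
The list above can be turned into a small lookup function. This is only a sketch: the name `describe_correlation` is made up, and because the list leaves small gaps between bands (for example between .19 and .20), the code assumes each band runs up to the start of the next one.

```python
def describe_correlation(r):
    """Label r using the rough cut-offs from the list above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size == 0:
        return "No relationship"
    if size < 0.20:
        return "No or negligible relationship"
    direction = "positive" if r > 0 else "negative"
    if size < 0.30:
        strength = "Weak"
    elif size < 0.40:
        strength = "Moderate"
    elif size < 0.70:
        strength = "Strong"
    else:
        strength = "Very strong"
    return f"{strength} {direction} relationship"

print(describe_correlation(0.8485))  # Very strong positive relationship
print(describe_correlation(-0.25))   # Weak negative relationship
```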


It may be helpful to see graphically what these correlations look like:

Graphs showing a correlation of -1 (a negative correlation), 0 and +1 (a positive correlation)
The images show that a strong negative correlation means that the graph has a downward slope from left to right: as the x-values increase,
the y-values get smaller. A strong positive correlation means that the graph has an upward slope from left to right: as the x-values increase,
the y-values get larger.


Cramér's V Correlation
Cramér's V Correlation is similar to the Pearson Correlation coefficient. While the Pearson correlation is used to test the strength of linear relationships, Cramér's V is used to calculate correlation in tables that have more than 2 x 2 rows and columns. Cramér's V correlation varies between 0 and 1. A value close to 0 means that there is very little association between the variables. A Cramér's V of close to 1 indicates a very strong association.
Cramér's V        Interpretation

.25 or higher     Very strong relationship
.15 to .25        Strong relationship
.11 to .15        Moderate relationship
.06 to .10        Weak relationship
.01 to .05        No or negligible relationship
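
The article does not give a formula for Cramér's V. The sketch below uses the standard definition, V = sqrt(χ² / (n · min(rows - 1, columns - 1))), where χ² is the chi-square statistic of the contingency table and n is the total count. The function name `cramers_v` and the example tables are made up for illustration, and the code assumes every row and column has a non-zero total.

```python
from math import sqrt

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k))

print(cramers_v([[10, 0], [0, 10]]))  # 1.0 (complete association)
print(cramers_v([[5, 5], [5, 5]]))    # 0.0 (no association)
```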
Where did the Correlation Coefficient Come From?
A correlation coefficient gives you an idea of how well data fits a line or curve. Pearson wasn't the original inventor of the term correlation, but his use of it became one of the most popular ways to measure correlation.


Francis Galton (who was also involved with the development of the interquartile range) was the first person to measure correlation, originally termed "co-relation", which actually makes sense considering you're studying the relationship between a couple of different variables. In "Co-Relations and Their Measurement", he said: "The statures of kinsmen are co-related variables; thus, the stature of the father is correlated to that of the adult son,..and so on; but the index of co-relation is different in the different cases." It's worth noting, though, that Galton mentioned in his paper that he had borrowed the term from biology, where "Co-relation and correlation of structure" was being used, but until the time of his paper it hadn't been properly defined.
In 1892, British statistician Francis Ysidro Edgeworth published a paper called "Correlated Averages" (Philosophical Magazine, 5th Series, 34, 190-204), where he used the term "Coefficient of Correlation". It wasn't until 1896 that British mathematician Karl Pearson used "Coefficient of Correlation" in two papers: "Contributions to the Mathematical Theory of Evolution" and "Mathematical Contributions to the Theory of Evolution. III. Regression, Heredity and Panmixia". It was the second paper that introduced the Pearson product-moment correlation formula for estimating correlation.

The Pearson Product-Moment Correlation equation.
