
Chapter 10 Relationships between variables

Definition: A scatter plot is a picture of bivariate numerical data in which each observation (i.e. each pair of values (x, y)) is represented by a point on a rectangular coordinate system. The horizontal axis is identified with values of x and the vertical axis with values of y.

Example: Draw a Scatter Plot to represent the following dataset:

x: 1, 3, 2, 4, 7, 6, 5
y: 4, 2, 5, 6, 9, 8, 7

[Figure: scatter plot of the dataset above, with x on the horizontal axis (0 to 8) and y on the vertical axis (0 to 10).]
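As a rough illustration of the definition, the first dataset can be drawn as a text scatter plot. This is only a sketch: the function name `ascii_scatter` is made up for this example, and it assumes the values are small non-negative whole numbers, as they are here.

```python
def ascii_scatter(xs, ys):
    """Return a text scatter plot with one '*' per (x, y) pair.

    Assumes every value is a non-negative integer (true for the
    datasets in this chapter, not for data in general).
    """
    width, height = max(xs) + 1, max(ys) + 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        grid[y][x] = "*"  # column = x value, row = y value
    # List the largest y first so the picture reads like a graph.
    lines = [f"{y:2d} |" + "".join(grid[y]) for y in range(height - 1, -1, -1)]
    lines.append("   +" + "-" * width)  # horizontal (x) axis
    return "\n".join(lines)


x = [1, 3, 2, 4, 7, 6, 5]
y = [4, 2, 5, 6, 9, 8, 7]
plot = ascii_scatter(x, y)
print(plot)
```

Each star sits in the column given by its x value and the row given by its y value, which is exactly what the definition describes.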

Another Example: Draw a Scatter Plot to represent the following dataset:

x: 1, 3, 2, 4, 7, 6, 5
y: 4, 6, 1, 3, 2, 4, 1

[Figure: scatter plot of the dataset above, with x on the horizontal axis (0 to 8) and y on the vertical axis (0 to 7).]

Question: Any comments on these two datasets? Is there anything special about them?

Looking at a scatter plot can sometimes allow us to determine whether a relationship exists between two variables. But in general we need to go beyond pictures and develop a numerical measure of how strongly the two variables x and y are related.

Definition: Pearson's sample correlation coefficient, r, is a measure of the strength of the linear relationship between two variables x and y.

$$
r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}
$$
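To make the formula concrete, here is a minimal sketch that computes r for the two example datasets from the sums of squares SS_xy, SS_xx and SS_yy. The function name `pearson_r` is an illustrative choice, not something defined in these notes.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's sample correlation coefficient from the sums of squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy / sqrt(ss_xx * ss_yy)


x = [1, 3, 2, 4, 7, 6, 5]
y_first = [4, 2, 5, 6, 9, 8, 7]   # first example dataset
y_second = [4, 6, 1, 3, 2, 4, 1]  # second example dataset

r_first = pearson_r(x, y_first)    # SS_xy = 26, SS_xx = 28: r is about 0.832
r_second = pearson_r(x, y_second)  # SS_xy = -5, SS_yy = 20: r is about -0.211
print(round(r_first, 3), round(r_second, 3))
```

The first dataset gives a value fairly close to 1, while the second gives a weak negative value.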

Properties of r

The correct interpretation of r requires an appreciation of some general properties:

The value of r does not depend on the unit of measurement for either variable, nor does it depend on which variable is labelled x or y.

The value of r is between -1 and 1.

A positive value of r indicates a positive linear relationship between the variables: as x increases, so does y.

A negative value of r corresponds to a negative linear relationship: as x increases, y decreases.

The value r = 1, which indicates the strongest possible positive relationship between x and y, results only when all points in the scatter plot lie exactly on a straight line that slopes upward. The value r = -1, which indicates the strongest possible negative relationship between x and y, results only when all points in the scatter plot lie exactly on a straight line that slopes downward.
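The properties listed above can be checked numerically. The sketch below reuses the first example dataset; the rescaling (multiply by 100 and add 32) and the two straight lines are arbitrary choices made for illustration, and `pearson_r` is a helper name assumed for this example.

```python
from math import isclose, sqrt


def pearson_r(xs, ys):
    """Pearson's sample correlation coefficient from the sums of squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy / sqrt(ss_xx * ss_yy)


x = [1, 3, 2, 4, 7, 6, 5]
y = [4, 2, 5, 6, 9, 8, 7]
r = pearson_r(x, y)

# Changing the unit of measurement of y (an arbitrary positive rescaling).
r_rescaled = pearson_r(x, [100 * v + 32 for v in y])

# Swapping which variable is labelled x and which is labelled y.
r_swapped = pearson_r(y, x)

# Points lying exactly on an upward-sloping and a downward-sloping line.
r_up = pearson_r(x, [2 * v + 1 for v in x])
r_down = pearson_r(x, [10 - 3 * v for v in x])

print(isclose(r, r_rescaled), isclose(r, r_swapped))  # both True
print(round(r_up, 6), round(r_down, 6))               # 1.0 -1.0
```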

The value of r is a measure of the extent to which x and y are linearly related, i.e. the extent to which the points in the scatter plot lie close to a straight line. A value of r close to zero does not rule out a strong relationship between x and y; there could still be a strong relationship, but one that is not linear.
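A small made-up example shows how r can be zero even when y is completely determined by x. Here y = x squared on x values placed symmetrically around zero; the data and the helper name `pearson_r` are assumptions for this sketch, not part of the original notes.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's sample correlation coefficient from the sums of squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy / sqrt(ss_xx * ss_yy)


x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # a perfect, but non-linear, relationship

r_curve = pearson_r(x, y)
print(r_curve)  # SS_xy is 0 here, so r is 0
```

The positive and negative halves of the curve cancel in SS_xy, so r reports no linear relationship even though the points lie exactly on a parabola.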

Examples

For each of the following pairs of variables, indicate whether you would expect a positive correlation, a negative correlation or no correlation.

Minimum daily temperature and heating costs
Interest rate and number of loan applications
Incomes of husbands and wives when both have full-time jobs
Ages of boyfriends and girlfriends
Height and IQ
Height and shoe size
Your Maths score in the Leaving Cert and your Irish score in the Leaving Cert

Correlation and causation

Years of research have established several facts:

There is a strong correlation between the number of storks in a country and the number of births in that country. Countries with many storks have a high number of births, and countries with low stork counts have low numbers of births.

There is a high correlation among primary school children between vocabulary and number of tooth fillings. Children with many fillings have a larger vocabulary than children with only a few fillings or none.

What should we conclude from these facts? That storks really are responsible for bringing babies? That eating Mars bars will increase your vocabulary? No. These examples illustrate a very important point: correlation is not the same as causation.

Larger countries have larger stork populations and usually have larger human populations as well, so more babies are born there than in smaller countries.

Young children have very few fillings because they have only been around for a few years, whereas older children have had time to eat lots of sweets, get a lot of bad teeth and learn a lot of new words.

So be careful before you interpret a correlation as causation. It may be that a third, confounding variable is causing the correlation: here, the size of the country and the age of the child.
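The fillings example can be imitated with simulated data. Everything in the sketch below is invented for illustration (the age range, the coefficients, the noise levels and the helper name `pearson_r`); it is a toy model, not real dental or vocabulary data. Age drives both variables, and fillings have no effect on vocabulary at all.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's sample correlation coefficient from the sums of squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy / sqrt(ss_xx * ss_yy)


random.seed(0)  # fixed seed so the sketch is repeatable

# Made-up model: both variables depend on age plus independent noise.
# (Being a toy model, it can even produce negative "fillings".)
ages = [random.uniform(5, 12) for _ in range(2000)]
fillings = [0.5 * a + random.gauss(0, 1) for a in ages]
vocabulary = [1000 * a + random.gauss(0, 1000) for a in ages]

# Across all children, fillings and vocabulary are clearly correlated.
r_all = pearson_r(fillings, vocabulary)

# Hold the confounder roughly fixed: keep only children aged 8 to 8.5.
same_age = [(f, v) for a, f, v in zip(ages, fillings, vocabulary) if 8 <= a < 8.5]
r_same_age = pearson_r([f for f, _ in same_age], [v for _, v in same_age])

print(round(r_all, 2), round(r_same_age, 2))
```

Across all ages the correlation is clearly positive, but among children of nearly the same age it should be much weaker, which is what we would expect if age, rather than fillings, explains the pattern.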

STATISTICS IN PRACTICE

An undergraduate UCD student was totally hung over for her final exam. She was somewhat relieved to find that the exam was a true/false test. She had taken a basic Stats course and did remember her lecturer once performing a coin flipping experiment.

Since her brain was pretty mushy, she decided to flip a coin she had in her pocket to get the answer to each question. The invigilators watched the student for the entire two hours as she was flipping the coin... writing the answer... flipping the coin... writing the answer, on and on.

At the end of the two hours, everyone else had left the room except for this one student. The head invigilator walks up to her desk and interrupts the student, saying: "Listen, it is obvious that you did not study for this exam since you didn't even open the question booklet. If you are just flipping a coin for your answer, why is it taking you so long?"

The stunned student looks up at the invigilator and replies bitterly (as she is still flipping the coin): "Shhh! I'm checking my answers!"
