
Casino Games

Activity 6 8
Data Analysis
What are some other data trends we can find with scatterplots?
Year    Population (billions)
0       0.3
1000    0.32
1750    0.8
1800    1
1930    2
1960    3
1974    4
1987    5
1990    5.2
1992    5.4
1995    5.7
1999    6

The table above shows world population milestones and the years in which they occurred. A
quick look at the data shows that while it took about 1800 years (counting from year 0) for
the world’s population to reach 1 billion, it took only 130 more years to reach 2 billion.
Because the rate of change in the population is not constant, the data will not be linear.

But what if we assumed it was? Would we get an error? Would the computer tell us it
wasn’t?

No. The computer will give you a linear equation for this data just like any other data. This is
why you have to use common sense. You can’t rely solely on the computer/calculator without
thinking.

Here’s the linear line of best fit for the data above. As you can see, the computer had no
problem giving me a linear equation, but the line doesn’t show the trend in the data.
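
To see that a computer will produce a line without complaint, here is a minimal sketch in
plain Python that fits a least-squares line to the table above. The helper name fit_line is
an assumption made for this illustration, not something from the activity.

```python
# World population milestones from the table above (year, billions).
years = [0, 1000, 1750, 1800, 1930, 1960, 1974, 1987, 1990, 1992, 1995, 1999]
population = [0.3, 0.32, 0.8, 1, 2, 3, 4, 5, 5.2, 5.4, 5.7, 6]


def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(years, population)
print(f"population = {slope:.6f} * year + {intercept:.3f}")

# No error and no warning, yet the intercept is negative: the line
# "predicts" a population below zero at year 0.
print(f"predicted population at year 0: {intercept:.2f} billion")
```

The arithmetic works on any list of numbers, so nothing in the output signals that a line
is the wrong shape for this data. That judgment is left to you.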

Finding a linear equation to fit the data is called linear regression. Since a line does not
fit this data, we need to use nonlinear regression instead. Tools for nonlinear regression
are harder to find, but graphing calculators can do it.

Your job, no matter what the technology, is to choose an appropriate model. A situation like
population growth is usually best described by an exponential model.

Here’s how an exponential model fits the data.
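
One common way to fit an exponential model is to fit a straight line to the logarithm of the
population and then convert back. The sketch below does that in plain Python. It is one
possible approach, not necessarily what any particular calculator does, and the helper name
fit_exponential is an assumption made for this illustration.

```python
import math

# World population milestones from the table above (year, billions).
years = [0, 1000, 1750, 1800, 1930, 1960, 1974, 1987, 1990, 1992, 1995, 1999]
population = [0.3, 0.32, 0.8, 1, 2, 3, 4, 5, 5.2, 5.4, 5.7, 6]


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) with a least-squares line through (x, ln y)."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxy = sum((x - mean_x) * (v - mean_log) for x, v in zip(xs, logs))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx                          # growth rate per year
    a = math.exp(mean_log - b * mean_x)    # modeled population at year 0
    return a, b


a, b = fit_exponential(years, population)
print(f"population = {a:.3f} * exp({b:.6f} * year)")

# Print the model next to the table so the fit can be judged by eye.
for year, actual in zip(years, population):
    print(year, actual, round(a * math.exp(b * year), 2))
```

A single growth rate over two thousand years is a rough simplification, so compare the
printed model values with the table before trusting the curve for predictions.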


Why?

[Figure labels: nonlinear, linear]
What is correlation?
Correlation is the term for when two things change together. When the temperature outside
goes up, so do the sales of ice cream. There’s a correlation between temperature and ice
cream sales. There’s a correlation between brown hair and brown eyes. There’s also a
correlation between hours of TV watching and grades, except that the more a student
watches TV, the lower their grades tend to be. This is an example of a negative correlation.

A positive correlation means the two variables move in the same direction: when variable A
goes up, variable B also goes up, and when variable A goes down, variable B also goes down.

A negative correlation means the two variables move in opposite directions: when variable A
goes up, variable B goes down, and when variable A goes down, variable B goes up.

If two things are correlated, does that mean one thing causes the other thing?
Nooooooooooooooooooooooooooooo!!!!!!!!!!!!!!!!!!!!

Correlation never never never never means by itself that variable A causes variable B to
change (or vice versa). There might be a causal relationship, but correlation alone is not
enough to determine this.

If you ever see a test question that says, “These things are correlated. What can we
conclude?”, eliminate any answer that says one thing causes the other.

In the summer, shark attacks and ice cream sales are found to be correlated at a beach.
Does this mean sharks hate when ice cream is sold, and therefore this causes them to bite
swimmers? Maybe, but we can’t say for sure. A likelier explanation is that hot weather both
brings more swimmers into the water and sells more ice cream.
How do we use correlation?
In the last activity, we came up with lines of best fit using linear regression. One way to
find out whether a line of best fit will be useful is to measure how strong the correlation
is, using the coefficient of correlation, usually written r. Once we have r, r² tells us how
good the line of best fit will be for making predictions: the closer r² is to 1, the better.
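
As an illustration, the sketch below computes r and r² for the world population table from
the start of this activity, using the standard Pearson formula. The function name
correlation is an assumption made for this example.

```python
import math

# World population milestones (year, billions) from the table in this activity.
years = [0, 1000, 1750, 1800, 1930, 1960, 1974, 1987, 1990, 1992, 1995, 1999]
population = [0.3, 0.32, 0.8, 1, 2, 3, 4, 5, 5.2, 5.4, 5.7, 6]


def correlation(xs, ys):
    """Pearson coefficient of correlation r (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = correlation(years, population)
r_squared = r ** 2
print(f"r = {r:.3f}, r squared = {r_squared:.3f}")
```

For this data r² comes out well below 1, which matches what the scatterplot already
suggested: a straight line is a poor tool for predicting world population.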

The coefficient of correlation is a fancy term for a number that goes along with a correlation.
If the number is 1, the correlation is perfectly positive. If the number is zero, there is no
linear correlation. If the number is -1, the correlation is perfectly negative.

The coefficient of correlation tells how linear the data are. In this picture, you can see that
the most linear scatterplots have numbers closer to 1 and -1. The scatterplots with no
relationships have numbers closer to 0.
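
The three landmark values of r can be checked on tiny data sets. The sketch below uses
made-up points chosen only for illustration: one rising line, one falling line, and one
curve. The function name correlation is an assumption made for this example.

```python
import math


def correlation(xs, ys):
    """Pearson coefficient of correlation r (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up points lying exactly on a rising line, y = 2x + 1.
r_rising = correlation([1, 2, 3, 4], [3, 5, 7, 9])

# Made-up points lying exactly on a falling line, y = 10 - 2x.
r_falling = correlation([1, 2, 3, 4], [8, 6, 4, 2])

# Made-up points lying exactly on a curve, y = x squared.
r_curve = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_rising, r_falling, r_curve)
```

The third data set follows a perfect pattern, yet its r is zero, because r only measures how
close the points are to a straight line.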

Problem Set 14
Draw a scatterplot for each of the following r values:
1. r = 1
2. r = -1
3. r = 0
10. Below, four data sets are presented for which a computer gave the same r-squared value and the same regression
line. Write a paragraph describing what you see about the similarities and differences among the sets. Be sure to talk
about each set individually. Does the line of best fit work for all of them? Would a nonlinear regression be more
appropriate for any? Does the term “outlier” help to describe what’s going on with any of the sets? What does this say
about the necessity of using common sense instead of just trusting that a computer or calculator knows what it’s doing?
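
The four data sets for this problem are not reproduced here. A well-known published example
with the same property is Anscombe's quartet (1973). The sketch below assumes those values,
which may or may not be the sets this problem intends, and shows how four different data
sets can share nearly the same regression line and r². The helper name summarize is an
assumption made for this example.

```python
# Anscombe's quartet (assumed here; the problem's own data are not shown).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "set 1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "set 2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "set 3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "set 4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
              [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return (slope, intercept, r squared) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)


results = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (slope, intercept, r2) in results.items():
    print(f"{name}: y = {slope:.2f}x + {intercept:.2f}, r squared = {r2:.2f}")
```

The printed summaries are nearly identical, so the numbers alone cannot tell the sets apart.
Only a scatterplot of each set shows which ones a line actually describes.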