
This webpage shows and explains how to calculate various statistics for a set of data.

In this case, we are using the height from which a ball is dropped vs. the height of its bounce. The drop height is our
x variable and the bounce height is our y variable.

Data Set

[Figure: data set and scatterplot of drop height (x) vs. bounce height (y)]

The scatterplot above shows a positive correlation between the two variables: as the drop height increases, so
does the height of the bounce. The graph also gives a visual picture of the data in a way a table cannot.
Notice that there are no outliers. Outliers are data points that fall far from the regression line. They can pose a
problem because they can skew the results, pulling the regression line and the correlation coefficient toward them.
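
The page does not say how far from the line a point has to be before it counts as an outlier. A minimal Python sketch, assuming the common rule of thumb of flagging any point whose residual is more than 2 standard deviations (of the residuals) away from the least-squares line; the threshold, the helper names `fit_line` and `flag_outliers`, and the sample numbers are all hypothetical, not this project's data:

```python
from statistics import mean, stdev


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def flag_outliers(xs, ys, threshold=2.0):
    """Indices of points whose residual is more than `threshold` standard deviations from 0.

    The 2-standard-deviation cutoff is a rule-of-thumb assumption, not a fixed standard.
    """
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = stdev(residuals)  # sample standard deviation of the residuals
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * s]


# Hypothetical drop heights (x) and bounce heights (y); the point at x = 50 is deliberately off the line.
xs = [10, 20, 30, 40, 50, 60, 70, 80]
ys = [8, 14, 20, 26, 60, 38, 44, 50]
print(flag_outliers(xs, ys))  # [4]
```

Because an extreme point also inflates the standard deviation it is measured against, a numeric rule like this is best used alongside the scatterplot rather than instead of it.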

To find the Mean of a data set, add up all of the values and divide by the number of values.
The Median is the number in the middle of the data once the values are sorted from smallest to largest. If there
are two middle numbers (an even number of values), add them together and divide by two.
The Mode is the value (or values) that occurs most often in the data. If no value repeats, there is no mode.
The Range is the lowest value subtracted from the highest value.
            X (drop height)   Y (bounce height)
Mean        60                36.9
Median      60                38
Mode        none              none
Range       120               70
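
The definitions above translate directly into a few lines of Python. A minimal sketch: since the page's data table is not reproduced here, the sample values are hypothetical, and `describe` is an illustrative name. Following the text, a data set in which no value repeats is reported as having no mode (an empty list).

```python
from collections import Counter
from statistics import mean, median


def describe(values):
    """Mean, median, mode(s) and range of a list of numbers."""
    counts = Counter(values)
    most = max(counts.values())
    # If no value repeats, the data set has no mode.
    modes = [] if most == 1 else sorted(v for v, c in counts.items() if c == most)
    return {
        "mean": mean(values),      # sum of the values / number of values
        "median": median(values),  # sorts first; averages the two middle values when n is even
        "mode": modes,
        "range": max(values) - min(values),  # highest minus lowest
    }


# Hypothetical values (six of them, so the median averages 15 and 16).
print(describe([4, 8, 15, 16, 23, 42]))
```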

The correlation coefficient is r = 0.997387267.
To determine whether there is a linear correlation, compare the absolute value of r with the critical value that
corresponds to the number of data pairs, n. Our data has 21 pairs, but n = 21 was not in our table of critical values, so
we used the value for n = 20; since critical values shrink as n grows, this choice is slightly more conservative. If the
absolute value of r is greater than the critical value, there is a linear correlation. In our case r is 0.997 and the
critical value is 0.444, so our data has a linear correlation.
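
The correlation coefficient and the critical-value comparison can be sketched in Python as follows. This is an illustration, not the tool the authors used: the sample data is hypothetical, the function names are illustrative, and 0.444 is simply the n = 20 table value quoted above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def has_linear_correlation(r, critical_value):
    """Decision rule from the text: linear correlation if |r| exceeds the critical value."""
    return abs(r) > critical_value


# Hypothetical paired data (not the project's measurements).
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4))  # 0.7746

# The page's own numbers: r = 0.997387267 against the n = 20 critical value 0.444.
print(has_linear_correlation(0.997387267, 0.444))  # True
```

The critical value depends on n, so the hypothetical five-point example would need its own table value before drawing a conclusion.
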
The regression equation has the form y = a + bx; for our data it is y = 2.81 + 0.57x. The coefficient of
determination is r^2. For our data, r^2 = 0.997387267^2 ≈ 0.9948, which means about 99.5% of the variation in how high
the ball bounced is explained by the linear relationship between the drop height and the bounce height.
However, the residuals are not randomly distributed, which means this is not a valid regression model.
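
The least-squares coefficients a and b, and r^2 computed as the share of the total variation that the line explains, can be sketched as below. A minimal illustration on hypothetical data; `fit_line` and `r_squared` are illustrative names, and the last line simply squares the r reported on this page.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def r_squared(xs, ys, a, b):
    """Coefficient of determination: 1 - (unexplained variation / total variation)."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


# Hypothetical data, not the project's measurements.
xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))           # 2.2 0.6
print(round(r_squared(xs, ys, a, b), 4))  # 0.6

# For a straight-line fit, r^2 is also r squared; with the page's r:
print(round(0.997387267 ** 2, 4))         # 0.9948
```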

QQ plot

Above is a quantile-quantile plot (QQ plot). All of the points lie close to the line, which suggests the errors
(residuals) are approximately normally distributed.
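
A QQ plot pairs each sorted residual with the value a normal distribution would predict at the same position. A minimal sketch using the standard-library `statistics.NormalDist`; the plotting positions (i - 0.5)/n are one common convention among several (an assumption), `qq_points` is an illustrative name, and the residuals are hypothetical. If the residuals are roughly normal, plotting these pairs gives points close to a straight line.

```python
from statistics import NormalDist


def qq_points(residuals):
    """Pairs (theoretical normal quantile, observed residual) for a normal QQ plot."""
    n = len(residuals)
    observed = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n for i = 1..n (one common convention).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


# Hypothetical residuals from a fitted line.
for z, r in qq_points([0.4, -1.1, 0.2, 0.9, -0.3]):
    print(f"{z:6.3f}  {r:5.2f}")
```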

For our prediction we substitute x = 20 into the regression equation:
y = 2.81 + 0.57x
y = 2.81 + 0.57(20)
y = 14.21
If the ball were dropped from a height of 20 inches, we would predict a bounce of about 14.21 inches.
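
The prediction step is just the regression equation evaluated at a chosen x. A small sketch using the page's coefficients a = 2.81 and b = 0.57; the function name `predict_bounce` is illustrative.

```python
def predict_bounce(drop_height, a=2.81, b=0.57):
    """Predicted bounce height y = a + b*x, using the page's fitted coefficients."""
    return a + b * drop_height


print(round(predict_bounce(20), 2))  # 14.21
```
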
Our residual plot is not random, so this is not a valid regression model. The residuals should form a horizontal band
around zero. If the residuals follow a curve, the regression model is not accounting for all of the non-random variation
in the data. The band must have the same width throughout the range of x; if it does not, the model does not meet the
requirement of equal variance. The residuals must also be uniformly scattered along the horizontal axis; if not, the
data is clustered and the regression model could be biased. Finally, the residuals must be random, with no pattern
whatsoever. A good regression model leaves residuals that are uncorrelated.
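
These checks are usually done by eye on the residual plot, but the residuals themselves are easy to compute. A minimal sketch on hypothetical, deliberately curved data (y = x^2) so that a straight line fits it poorly. `sign_runs` is an illustrative heuristic, not a formal test: it counts runs of same-sign residuals, and a few long runs (positive, then negative, then positive) is the curved pattern described above.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def residuals(xs, ys, a, b):
    """Observed minus predicted: y - (a + b*x)."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sign_runs(res, tol=1e-9):
    """Number of runs of consecutive same-sign residuals (values within tol of 0 are skipped)."""
    signs = [r > 0 for r in res if abs(r) > tol]
    if not signs:
        return 0
    return 1 + sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)


# Hypothetical, deliberately curved data.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [x ** 2 for x in xs]
a, b = fit_line(xs, ys)
res = residuals(xs, ys, a, b)
print(res)             # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
print(sign_runs(res))  # 3
```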

If you need more help with statistics, check out the link below for some informative videos!
https://www.youtube.com/watch?v=uhxtUt_-GyM&list=PL4C863861E3B2E380