Zach Stence

11/18/2015
Crow, 2
Linear Regression Project Report
Does increasing population overcrowd our country and thus bring down life expectancy? No. I
chose this data to see if there was a correlation between the population and the life expectancy of the
United States in a given year. I found these two variables interesting and wondered if they were linked.
This is important in my life because I live in the United States and would like to know how the increasing
Data was taken from the World Bank DataBank on United States Population and Life Expectancy from
1960 to 2013 (World). When a scatterplot was created, the data were almost perfectly linear and showed
a strong positive correlation. Because every point lies so close to the pattern, there were no outliers or
influential points. The least squares regression equation for the data is
$\hat{y} = 0.0686x + 57.469$, where $x$ is the U.S. population in millions and $\hat{y}$ is the predicted
life expectancy in years. The slope means that for each additional million people in the population, the
predicted life expectancy in the United States increases by about 0.0686 years on average. The intercept
(57.469) means that at a population of 0, the predicted life expectancy would be 57.469 years. However,
this is not interpretable in the context of the data: a population of 0 lies far outside the range of the
data, and with no population there would be no one whose life expectancy could be measured. The
correlation coefficient for the data is $r = 0.980$, meaning these two variables have a strong, positive
linear correlation. The coefficient of determination is $r^2 = 0.961$, meaning that 96.1% of the
variability in life expectancy can be explained by the linear relationship with the population of the
United States.
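The slope, intercept, $r$ and $r^2$ quoted above come from the standard least squares formulas. The sketch below is a minimal illustration of those formulas, assuming ordinary least squares on paired $(x, y)$ values; the function name `least_squares` is hypothetical, and the five data points are made-up toy numbers, not the World Bank series used in this report (which is not reproduced here).

```python
import math

def least_squares(xs, ys):
    """Fit y-hat = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r, r^2). Hypothetical helper for illustration.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations and of cross-products about the means
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # b = Sxy / Sxx
    intercept = y_bar - slope * x_bar     # the line passes through (x_bar, y_bar)
    r = sxy / math.sqrt(sxx * syy)        # Pearson correlation coefficient
    return slope, intercept, r, r * r

# Toy data (hypothetical, NOT the report's population / life expectancy data)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept, r, r2 = least_squares(xs, ys)
print(f"y-hat = {slope:.4f}x + {intercept:.3f}, r = {r:.3f}, r^2 = {r2:.3f}")
# y-hat = 0.6000x + 2.200, r = 0.775, r^2 = 0.600
```

Feeding the real 1960 to 2013 values (population in millions as `xs`, life expectancy as `ys`) into the same function should give back coefficients close to the ones reported, up to rounding.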
When a residual plot was created, we could see that when the population was between 200 and 260
million, life expectancy had much larger residuals and was therefore much further above the values
predicted by the line than at any

when looking at the scale of the y-axis on the residual plot, the numbers are very small. This means that
while there is a pattern in the residuals, they are very small and not practically significant. There are no
influential points or outliers affecting the model.
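A residual plot is built from one residual per data point, and whether those residuals are "very small" is easiest to judge against the spread of the response variable. The sketch below assumes a line that has already been fitted by least squares; the helper name `residuals` is hypothetical, and the toy data and the line $\hat{y} = 0.6x + 2.2$ are illustrative numbers, not values from this report.

```python
def residuals(xs, ys, slope, intercept):
    """Return actual minus predicted (y - y-hat) for each (x, y) pair."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Hypothetical toy data and its least squares line y-hat = 0.6x + 2.2
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
res = residuals(xs, ys, slope=0.6, intercept=2.2)

# For a least squares line with an intercept, residuals sum to (essentially) zero,
# so their size is judged relative to the spread of y, not relative to zero.
largest = max(abs(e) for e in res)
y_range = max(ys) - min(ys)
print([round(e, 2) for e in res])   # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(f"largest |residual| is {largest / y_range:.0%} of the range of y")
```

Plotting `res` against `xs` gives the residual plot; a visible pattern with tiny magnitudes, as described above, means the line is slightly off in places but still close everywhere.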
If we choose a data point at random (Population = 186,538,000, Life Expectancy = 70.120), we can
compare it to the least squares regression line to evaluate how accurate our prediction was for this point.
The predicted value is $\hat{y} = 0.0686(186.538) + 57.469 \approx 70.27$ years. The residual for this
point is the difference between the actual and predicted values, $70.12 - 70.27 = -0.15$, meaning that
the linear model did a fairly good job of predicting this data point: the prediction is not exact, but very
close. The model just overestimated a little bit.
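The single-point residual above can be reproduced directly from the reported equation. This is a minimal sketch: the coefficients 0.0686 and 57.469 and the data point are taken from this report, while the function name `predict_life_expectancy` is hypothetical. Population is converted to millions because that is the unit of $x$ in the equation.

```python
# Reported least squares line: x = U.S. population in millions, y-hat = life expectancy (years)
SLOPE = 0.0686
INTERCEPT = 57.469

def predict_life_expectancy(population_millions):
    """Hypothetical helper that evaluates the reported regression line."""
    return SLOPE * population_millions + INTERCEPT

population = 186_538_000            # data point discussed in the text
actual = 70.120
predicted = predict_life_expectancy(population / 1_000_000)   # convert people to millions
residual = actual - predicted       # residual = actual - predicted
print(f"predicted {predicted:.2f}, residual {residual:.2f}")  # predicted 70.27, residual -0.15
```

A negative residual means the line predicted a higher life expectancy than was observed, which is why the model slightly overestimated this point.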
A linear regression was the best fit for this data. Looking at the scatterplot, no other form of
regression would fit better. Also, since the $r$ and $r^2$ values are so high, this was an appropriate
model to use. There are no outliers or influential points when using a linear regression, and all the data
lie very near the least squares regression line.
An interesting thing to do is to use the data we have analyzed to predict a value far off in the
future. While this is a form of extrapolation and does not necessarily tell us anything true or meaningful,
it can still be interesting. Based on this data, far in the future when the United States has a population of
500 million, our model predicts that life expectancy will increase all the way to an average of
$0.0686(500) + 57.469 \approx 91.77$ years! This would mean that the average American would live to be
almost 92 years old!
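Because the 500 million prediction lies outside the populations actually observed, it can help to flag extrapolation explicitly whenever the equation is used. The sketch below is a minimal illustration using the reported coefficients; the function name `predict_with_range_check` is hypothetical, and the bounds 180 and 320 (millions) are illustrative placeholders, not values taken from this report. They should be replaced with the smallest and largest populations in the actual 1960 to 2013 data.

```python
# Reported least squares line: x = U.S. population in millions, y-hat = life expectancy (years)
SLOPE = 0.0686
INTERCEPT = 57.469

def predict_with_range_check(population_millions, observed_min, observed_max):
    """Return (prediction, is_extrapolation) for the reported line.

    Hypothetical helper: is_extrapolation is True when x falls outside
    the range of populations the line was fitted on.
    """
    prediction = SLOPE * population_millions + INTERCEPT
    is_extrapolation = not (observed_min <= population_millions <= observed_max)
    return prediction, is_extrapolation

# Placeholder bounds (assumption): substitute the real min/max of the data set.
prediction, is_extrapolation = predict_with_range_check(500, observed_min=180, observed_max=320)
print(f"{prediction:.2f} years, extrapolation: {is_extrapolation}")   # 91.77 years, extrapolation: True
```

The flag does not make the prediction any more trustworthy; it only makes it obvious that the model is being used beyond the data it was built on.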
This data and analysis could be used by a government policymaker. For example, if the data
showed that life expectancy went down as population increased, he might want to implement policies
controlling how many children people can have, so as to keep the life expectancy of the country high.
However, since the data show a positive correlation between the two variables, then provided that
overcrowding is not a byproduct of increasing population, he might encourage people to have children
so as to increase the life expectancy of the United States. These choices depend on the situation at hand
as well. This has relevance to our society because government policymakers control a large portion of the