Final Project

Over the past 20 years, retail sales have grown rapidly. To examine this growth, our
group decided to focus our project on retail sales and on how different variables affect
them. In our project, we test whether GDP, the stock market, crime rates, unemployment
rates, poverty rates, and internet users have an effect on retail sales. As a group, we all
expected these variables to greatly affect retail sales, because retail sales depend on
them. In particular, we believed that the variables with the strongest influence on retail
sales would be internet users, unemployment, and the poverty rate. However, it was very
interesting to find that four of the six variables were very strong influences.

\[
RS = 1.7258 \times 10^{6} + 10599\,(GDP) - 27080\,(STOCK) - 448358\,(CRIME) - 138803\,(UE) + 255020\,(POV) + 23098.5\,(INTU) + e
\]
The data cover 1992-2014.
RS: Retail sales (in billions)
GDP: Percent change in GDP (%)
STOCK: Dummy variable for the stock market (0 if it decreased, 1 if it increased)
CRIME: Crime rate (%)
UE: Unemployment rate (%)
POV: Poverty rate (%)
INTU: Internet users (%)
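
To make the fitted equation concrete, the sketch below evaluates it for one set of inputs and reports the effect of changing a single regressor while holding the others fixed. The coefficients are the ones reported in the equation. Two readings are assumptions: the intercept is taken as 1.7258 x 10^6, and the signs on STOCK, CRIME, and UE are taken to be negative. The input values are hypothetical and are not drawn from our data set.

```python
# Coefficients as reported in the fitted equation of this write-up.
# ASSUMPTION: the intercept is read as 1.7258e6, and the signs on
# STOCK, CRIME, and UE are taken to be negative.
INTERCEPT = 1.7258e6
COEFFICIENTS = {
    "GDP": 10599.0,
    "STOCK": -27080.0,
    "CRIME": -448358.0,
    "UE": -138803.0,
    "POV": 255020.0,
    "INTU": 23098.5,
}


def predict_retail_sales(values):
    """Return the fitted RS for a dict of regressor values (error term set to 0)."""
    return INTERCEPT + sum(COEFFICIENTS[name] * values[name] for name in COEFFICIENTS)


def partial_effect(name, change):
    """Change in fitted RS when one regressor moves by `change`, others held fixed."""
    return COEFFICIENTS[name] * change


# Hypothetical inputs, for illustration only.
example_year = {"GDP": 2.0, "STOCK": 1, "CRIME": 4.0, "UE": 6.0, "POV": 14.0, "INTU": 70.0}

print(predict_retail_sales(example_year))
print(partial_effect("INTU", 1.0))  # one more percentage point of internet users
```

Because the model is linear, each coefficient is the change in fitted retail sales for a one-unit change in that variable, with everything else held constant.
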
Initially, our group ran a regression that included all of our independent variables:
percent change in GDP, crime rate, unemployment rate, poverty rate, and percent of internet
users. Also included was the dummy variable for stock market trends. This first regression
provided a very good fit: the adjusted R-squared had a value of .989004. Two variables were
insignificant in this regression. In the second model, we took out the least significant
variable, which was the stock market trend, our dummy variable. Upon doing so, the adjusted
R-squared increased to .989534. However, the significance of GDP decreased further. So, in
our third regression model, we removed the change in GDP variable. This resulted in the
adjusted R-squared increasing further to .989855. In this model, all of our variables were
significant within 1%.
Out of curiosity, our group decided to run one last regression model that used only
the insignificant variables. So, we created a model that used only the stock market trend
and GDP change data. The adjusted R-squared for this regression was .079630. The fit was
terrible. The p-values show that in this model GDP change was significant within 10%, but
the stock market trend was significant only within 90%, making it extremely insignificant.
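
As a rough check on how weak this two-variable model is, the sketch below recovers the plain R-squared from the reported adjusted R-squared and converts it into the overall F statistic. The sample size of 23 is an assumption (one observation per year from 1992 through 2014), and the function names are ours.

```python
def r2_from_adjusted(adj_r2, n_obs, n_regressors):
    """Invert adj = 1 - (1 - R2) * (n - 1) / (n - k - 1) to recover R2."""
    return 1.0 - (1.0 - adj_r2) * (n_obs - n_regressors - 1) / (n_obs - 1)


def overall_f_statistic(r2, n_obs, n_regressors):
    """F statistic for the null hypothesis that all slope coefficients are zero."""
    dof_resid = n_obs - n_regressors - 1
    return (r2 / n_regressors) / ((1.0 - r2) / dof_resid)


N_OBS = 23  # ASSUMPTION: one observation per year, 1992 through 2014

r2_model4 = r2_from_adjusted(0.079630, N_OBS, 2)
f_model4 = overall_f_statistic(r2_model4, N_OBS, 2)
print(round(r2_model4, 4), round(f_model4, 2))
```

Under these assumptions the F statistic comes out a little under 2, which is below the 10% critical value of roughly 2.59 for an F distribution with 2 and 20 degrees of freedom. That is consistent with the poor fit described above.
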
One problem we might have run into is omitted variable bias. This happens when the
model attributes the effect of a missing variable to one of the other variables in the model. We
may have missed some variables, such as the advent of the Internet, smartphones becoming
common for the general population, and how many retail sales come specifically from internet
sales. These variables were not included and may have affected our model.
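
Omitted variable bias can be illustrated with a small simulation. This is a sketch on synthetic data, not on our retail sales data: the true coefficients (2 and 3), the 0.8 link between the two regressors, and the variable names are all made up for the example.

```python
import random


def ols_simple_slope(x, y):
    """Slope from regressing y on a single regressor x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ols_two_slopes(x1, x2, y):
    """Slopes from regressing y on x1 and x2 (with intercept), via the normal equations."""
    m1, m2, my = sum(x1) / len(x1), sum(x2) / len(x2), sum(y) / len(y)
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * a + rng.gauss(0, 0.6) for a in x1]  # x2 is correlated with x1
y = [1 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

short_slope = ols_simple_slope(x1, y)  # x2 omitted from the regression
long_slope_x1, long_slope_x2 = ols_two_slopes(x1, x2, y)
print(short_slope, long_slope_x1, long_slope_x2)
```

When x2 is left out, the slope on x1 lands near 2 + 3 x 0.8 = 4.4 instead of the true value of 2, because x1 is credited with the effect of the missing variable. Including x2 brings both slopes back close to their true values.
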
Once we found that the stock market dummy (D-Stock Value) and GDP change were weakening
our estimates, we removed them and ran another regression. Removing these irrelevant
variables increased our adjusted R-squared value. The same variables also showed signs of
multicollinearity: our R-squared was initially high, but the t-stats for those variables
were low. We removed the variables to improve our model. We could also have experienced
multicollinearity because our sample size was not large enough for the number of variables
we had.
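
One common way to screen for multicollinearity is the variance inflation factor (VIF). We did not compute it for this project; the sketch below only shows the idea for the simplest case of two regressors, where the VIF is 1 / (1 - r^2) and r is their correlation. The unemployment and poverty figures are invented for illustration.

```python
import math


def pearson_r(x, z):
    """Pearson correlation between two equal-length lists."""
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    return sxz / math.sqrt(sxx * szz)


def vif_two_regressors(x, z):
    """VIF when the model has exactly two regressors: 1 / (1 - r^2)."""
    r = pearson_r(x, z)
    if abs(r) >= 1.0:
        return float("inf")  # perfectly collinear regressors
    return 1.0 / (1.0 - r * r)


# Hypothetical values, NOT taken from our data set.
ue_rate = [5.0, 6.0, 7.0, 8.0, 9.0]
pov_rate = [12.0, 14.0, 13.0, 16.0, 15.0]

print(pearson_r(ue_rate, pov_rate), vif_two_regressors(ue_rate, pov_rate))
```

A VIF near 1 means a regressor is nearly unrelated to the others. The larger the VIF, the more the standard error of that coefficient is inflated, which is one way a model can have a high R-squared alongside low t-stats.
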
The first regression that our group ran went extremely well: four of the six variables
were found to be highly significant, and we found an R-squared of 0.992003, which is almost
perfect. One concern from these results is the low t-scores on the significant variables
Crime and Unemployment, which could be a sign of multicollinearity. Some of our variables,
like the percent change in GDP (GDP), were not found to be significant at all. We then ran
a regression in which we kept only the significant variables, and our R-squared decreased
to 0.991700, although it is still very high. A small decrease is expected, since R-squared
never rises when variables are removed. It could also be due to the fact that we omitted
some variables that still have some importance in indicating the overall effect of internet
usage on retail sales.
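
The way R-squared can fall while adjusted R-squared rises follows from the adjustment formula, which penalizes each extra regressor. The sketch below applies it to the two R-squared values reported here. The sample size of 23 is an assumption (one observation per year from 1992 through 2014), and the function name is ours.

```python
def adjusted_r2(r2, n_obs, n_regressors):
    """Adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


N_OBS = 23  # ASSUMPTION: one observation per year, 1992 through 2014

full_model = adjusted_r2(0.992003, N_OBS, 6)     # all six regressors
reduced_model = adjusted_r2(0.991700, N_OBS, 4)  # STOCK and GDP removed
print(round(full_model, 6), round(reduced_model, 6))
```

With 23 observations the formula gives about 0.9890 for the full model and about 0.9899 for the reduced model, in line with the adjusted values reported for our first and third regressions. Dropping two weak regressors costs very little R-squared but frees up two degrees of freedom, so the adjusted figure goes up.
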
An interesting finding from our research was the positive correlation between retail
sales and the poverty rate. At first, one would assume that the relationship would be
negative. However, if the poverty rate goes up, perhaps more people are receiving aid and
are better able to purchase goods that are necessities. During our time period of 1992-2014,
the internet rapidly became more popular and more widely used. It is therefore no surprise
that retail sales are positively correlated with internet users.
When using all variables, our work shows that there is a positive linear relationship
between retail sales and an increasing rate of internet users. With an R-squared of 0.992003,
this regression shows that the data are extremely close to the fitted regression line,
meaning it is a good fit. However, we did have concerns about our results. First, with only
twenty-three years of data, our sample size is smaller than what is recommended. This makes
us worry about multicollinearity, which may not affect how reliable our model is as a whole
but could affect what we conclude about individual variables, such as whether they are
redundant. The percent change in GDP and the stock market dummy variable were deemed
insignificant. This may in fact be due to the multicollinearity discussed previously, but it
still demands attention. To improve our model, it would behoove us to obtain a larger sample
size for all of our variables; a sample size of at least thirty would suffice. Further
research would include trying to find any variables we might have omitted. Factors that
could be relevant but weren't included are data measuring how the housing market fared in a
particular year and the Federal Reserve interest rate in those years. The rate at which the
Fed set the interest rate for a particular year is especially interesting, because it could
be a good indicator of how the economy was doing at the time, and because it captures
consumers' propensity to spend less when interest rates increase, which should affect retail
sales.
Regression Model 1

Regression Model 2

Regression Model 3

Regression Model 4

Retail Sales vs. Internet usage

In this graph we can see the positive correlation between retail sales and internet
usage. The internet usage data are shown in red, and there is very little distance between
the red points and the line, meaning that it is a good fit.
Retail Sales vs. Crime Percentages

In this graph we can see that retail sales are negatively related to the crime rate, and that
it is also a good fit, given the little distance between the red dots representing the crime
rate and the blue line representing retail sales.

GDP vs. Retail Sales

We also thought it would be useful to demonstrate what a bad fit looks like, so we compared
the percent change in GDP to retail sales. We can see that the line does not fit the dots
well and that there is little to no correlation.
Sources:
Crime Rate: http://www.disastercenter.com/crime/uscrime.htm
Unemployment Rate: http://data.bls.gov/pdq/SurveyOutputServlet
Retail Sales: https://www.census.gov/retail/index.html
Poverty Rate: http://www.statista.com/statistics/200463/us-poverty-rate-since-1990/
Sales Tax: http://www.statista.com/statistics/249137/us-state-and-local-sales-tax-revenue/
