
United States Obesity Rates

Emmy Masangcay

Economics 284 (Econometrics) Professor David Lewis May 16, 2013

Masangcay 2

Introduction
The obesity epidemic in the United States has worsened in recent years. Between 1980 and 2000, obesity rates nearly doubled among adults.[1] Currently over 60 million adults, or over 30% of the American adult population, are obese. Below are four figures from the Centers for Disease Control and Prevention that show the rising percentage of obesity per state across several years.[2] Obesity is a dangerous health condition, and medical professionals have consistently warned that the percentage of Americans classified as obese is intolerably high, especially since the United States has maintained one of the highest obesity rates in the world. Obesity is defined using body mass index (BMI), a measure of weight relative to height.[3] Obesity in adults is defined as a BMI greater than or equal to 30.

Figure A.1 Percentage of Obesity per State in 2010

Figure A.2 Percentage of Obesity per State in 2005

Figure A.3 Percentage of Obesity per State in 2000

Figure A.4 Percentage of Obesity per State in 1995

[1] "Prevalence of Obesity in the United States, 2009-2010," NCHS Data Brief, Centers for Disease Control and Prevention (CDC), http://www.cdc.gov/nchs/data/databriefs/db82.pdf (accessed 12 April 2013).
[2] "Overweight and Obesity," CDC Home, Centers for Disease Control and Prevention (CDC), http://www.cdc.gov/obesity/data/adult.html (accessed 5 April 2013).
[3] Jennifer Petrelli and Kathleen Wolin, Obesity (Santa Barbara, California: Greenwood Press, 2009), 19.

Obesity is a topic that is relevant to nearly everyone in America. People who are obese are at a high risk of developing Type 2 diabetes, which is now increasingly diagnosed among young people. If younger people develop not only obesity but Type 2 diabetes as well, they are at a much higher risk of suffering the serious complications of these diseases as adults, such as kidney disease, blindness, and amputation.[4] Furthermore, obesity-related costs place a huge burden on affected families and on the economy. For example, the direct health costs attributable to obesity were estimated at $52 billion in 1995 but $75 billion in 2003.[5] Treating obesity, along with related diseases such as diabetes, stroke, and heart disease, is extremely expensive. However, although obesity is a serious problem that now affects nearly a third of Americans, its causes are not well understood. Public health advocates have argued that eating unhealthy foods, particularly fast food, causes obesity because of the high calorie content and generous portion sizes. The number of fast food restaurants has been rising alongside obesity rates for quite some time, which may suggest a relationship between the two trends. The increases in obesity rates and fast food restaurant density are shown below.[6]

Figure A.5 Obesity Rates and Restaurant Density 1960-2005

While there are many factors that contribute to obesity, a common intuition points to the consumption of unhealthy foods, mainly fast food. In 1970, Americans spent about $6 billion on fast food, but in 2000, Americans spent more than $110 billion.[7] The biggest fast food restaurant chain in the world is McDonald's, which operated about one thousand restaurants in 1968; by 2000, McDonald's had about twenty-eight thousand restaurants worldwide, and that number has only increased since then.[8] With these facts in mind, my study asks whether obesity rates in America are rising because of the number of fast food restaurants. My study also aims to capture whether low consumption of healthier foods, such as fruits and vegetables, has contributed to high obesity rates.

[4] "Overweight and Obesity," CDC Home, Centers for Disease Control and Prevention (CDC), http://www.cdc.gov/obesity/data/adult.html (accessed 5 April 2013).
[5] Ibid.
[6] Michael Anderson and David Matsa, "Are Restaurants Really Supersizing America?," American Economic Journal, January 2011, http://are.berkeley.edu/~mlanderson/pdf/Anderson%20and%20Matsa%202011.pdf (accessed 1 April 2013).
[7] Eric Schlosser, Fast Food Nation (Boston, Massachusetts: Houghton Mifflin Harcourt, 2001), 2.

A large body of past literature has been helpful throughout the course of this project. While the sample sizes and setups differ greatly from my model, the general topic of research is similar. Among the literature explored, two sources were particularly useful. The first was "The Effect of Fast Food Restaurants on Obesity and Weight Gain" by Janet Currie, Stefano Della Vigna, Enrico Moretti, and Vikram Pathania, published in the American Economic Journal. These authors explored how changes in the supply of fast food restaurants affect the weight outcomes of 3 million children and 3 million pregnant women. Their study found that among first-year high school students, a fast food restaurant within 0.1 miles of a school leads to a 5.2% increase in obesity rates.[9] Among pregnant women, a fast food restaurant within 0.5 miles of the residence results in a 1.6 percent increase in the probability of gaining over 20 kilos.[10] While this study focuses on the exact proximity of a fast food restaurant to an individual rather than the general concentration of fast food restaurants per capita, the question explored was very similar to mine and provided much guidance throughout the project. Another useful source was "Are Restaurants Really Supersizing America?" by Michael Anderson and David Matsa, also published in the American Economic Journal.
The authors found no causal link between restaurant consumption and obesity, mainly because consumers usually offset calories from fast food meals by eating less at other times.[11] As these two studies show, past results on this topic have been inconsistent. However, the intuition among most residents and health policy advocates is still that greater availability of restaurants increases the obesity rate. My study explores this idea, in addition to how low consumption of healthier foods has contributed to obesity rates.

Conceptual Framework
While it is obvious that fast food is unhealthy due to its lack of nutrients and high calorie content, it is not clear whether changes in the number of fast food restaurants per capita will have an impact on health. On the one hand, more fast food restaurants per capita can make it more convenient for a family to get food. This can lead people to buy unhealthier foods that are closer to them because doing so is cheaper once travel and food costs are taken into account. Furthermore, easier access to fast food could tempt consumers with self-control problems or those who do not have time in their day to cook a healthier meal. On the other hand, more fast food restaurants per capita can lead to a substitution away from unhealthy foods already eaten at home without significantly changing the amount of unhealthy food consumed, thus leaving obesity rates hardly affected.

[8] Ibid.
[9] Janet Currie, Stefano Della Vigna, Enrico Moretti, and Vikram Pathania, "The Effect of Fast Food Restaurants on Obesity and Weight Gain," American Economic Journal, February 2009, http://www.nber.org/papers/w14721.pdf?new_window=1 (accessed 10 March 2013).
[10] Ibid.
[11] Michael Anderson and David Matsa, "Are Restaurants Really Supersizing America?," American Economic Journal, January 2011, http://are.berkeley.edu/~mlanderson/pdf/Anderson%20and%20Matsa%202011.pdf (accessed 1 April 2013).

Both fast food restaurants and obesity have increased over time, as Figure A.5 demonstrates, which suggests a relationship between the two but does not prove one. An argument can be made in both directions for why the number of fast food restaurants per capita would have an impact on the obesity rate or why it would not affect it at all. My intuition before the start of this project (and before reading any prior literature) was that more fast food restaurants per capita are likely to increase the obesity rate for the reasons above. However, I think this is greatly affected by average per capita income, since one would expect people to spend more money on healthier foods if they can afford to (i.e. with higher per capita income).

One point to keep in mind is that fast food managers are most likely to open new restaurants in the areas where they expect high demand. Higher demand for unhealthy foods is most likely correlated with a higher risk of obesity in that area, compared to areas that do not have any fast food restaurants. This possibility shows that there may be unobserved factors contributing to obesity that are correlated with the number of fast food restaurants per capita, which would lead to an overestimation of the role of fast food restaurants in the obesity rate. Aside from this, there are other factors that have an impact on the obesity rate but are impossible to include in the model, either because they are difficult to quantify or simply because data was not available. Two important examples of such unobserved factors are family obesity history and preferences.
In order to explore whether obesity rates are rising mainly due to the growing number of fast food restaurants and low consumption of healthy foods, I chose the obesity rate as my dependent variable, measured in two different years: 2011 and 2007. These years were chosen only because the data I needed was available for both. There are seven independent variables, all of which are explained below. The most important variable for the main question of my project is ffr, fast food restaurants per 1,000 residents. The second most important variable is fruveg, the percentage of people per state who consume the 5+ fruits/vegetables daily recommended by the Centers for Disease Control and Prevention (CDC). These two variables will therefore be the main focus of the project.

Data Description
The variables I used for this project, along with their definitions and sources, are summarized below.

Table A: Variables, Definitions, and Sources

| Term | Variable Name | Definition | Units of Measurement | Source |
|---|---|---|---|---|
| Y | obes | Obesity % per state | Percentage (e.g. 0.15 means the obesity rate is 15% in a particular state) | Centers for Disease Control and Prevention (CDC) |
| β1 | pcinc | Average per capita income per state | In actual dollars (e.g. 27,000 means the average per capita income is $27,000) | United States Census Bureau |
| β2 | age | Average age per state | In years (e.g. 37 means 37 years is the average age in a state) | United States Census Bureau |
| β3 | black[12] | Black % per state | Percentage (e.g. 0.05 means the black population is 5% in a particular state) | United States Census Bureau |
| β4 | ffr | # of fast food restaurants per 1,000 residents per state | Number per 1,000 residents (e.g. 0.800 means there are 0.800 fast food restaurants per 1,000 residents in a state) | United States Department of Agriculture |
| β5 | gen | # of males per 100 females per state | Number of males per 100 females (e.g. 102 means there are 102 males per 100 females in a state) | United States Census Bureau |
| β6 | leisinact[13] | % of leisurely inactive residents per state | Percentage (e.g. 0.15 means 15% of the state's population is leisurely inactive) | Centers for Disease Control and Prevention (CDC) |
| β7 | fruveg | % of people who consume 5+ fruits/vegetables combined daily per state | Percentage (e.g. 0.10 means 10% of the state's population consumes 5+ fruits/vegetables daily) | Centers for Disease Control and Prevention (CDC) |

The units of observation are the 50 states of the United States (n = 50). The sources from which I retrieved the data for my 50 observations, as summarized above, are the Centers for Disease Control and Prevention (CDC), the United States Census Bureau, and the United States Department of Agriculture. The CDC is a government agency under the United States Department of Health and Human Services that works to protect public health and safety, so much of my information regarding obesity and health statistics comes from there. The United States Census Bureau is responsible for producing and releasing information about American residents and the economy. The United States Department of Agriculture is responsible for matters regarding food, farming, agriculture, forestry, natural resources, etc. Because all of the data comes from government agencies, I treat it as accurate and as current as was available. My main model was originally meant to capture 2011. However, the same data was available for 2007. Therefore, I was able to create three different models: 2011, 2007, and a difference model (2007's data subtracted from 2011's). Based on the variables described above, the following summaries were derived:

[12] The variable black, as defined by the United States Census Bureau, includes those residents who are Black in combination, not just solely.
[13] leisinact encompasses residents who did not report leisure-time physical activities when surveyed, such as any physical activities or exercises like running, walking, gardening, golf, volleyball, etc. within the past month.

Stata Table A: 2011 Summary

Stata Table B: 2007 Summary

Stata Table C: Difference Model (2007 subtracted from 2011) Summary

As seen above, there was a big increase in ffr but also a sizable decrease in fruveg between 2007 and 2011. The obesity rate was also higher on average in 2011. Although this data is not particularly helpful in answering the question at hand, it may be interesting to the reader to know which states are associated with the minimum and maximum values for each of the variables. Table B below summarizes that information.

Table B: Minimum and Maximum Values with State Specifications

| Variable | Minimum 2011 | Minimum 2007 | Maximum 2011 | Maximum 2007 |
|---|---|---|---|---|
| obes | Colorado | Colorado | Mississippi | Mississippi |
| pcinc | Mississippi | Mississippi | Connecticut | Connecticut |
| age | Utah | Utah | Maine | Maine |
| black | Montana | Montana | Mississippi | Mississippi |
| ffr | Utah | Wisconsin | Vermont | Mississippi |
| gen | Rhode Island | Mississippi | Alaska | Alaska |
| leisinact | Minnesota | Minnesota | Mississippi | Mississippi |
| fruveg | Mississippi | Oklahoma | Vermont | Vermont |

I also added regional dummy variables to my model because obesity rates have typically been found to be higher in the Southern region of the United States than in the West, Northeast, and Midwest. The dummy variables were added to estimate the ceteris paribus difference in the obesity rate between the regional groups. The dummy variables are described below.

Table C: Dummy Variables

| Dummy Variable | States |
|---|---|
| South (1 = state is in the South, 0 if not) | Florida, Georgia, Maryland, North Carolina, South Carolina, Virginia, West Virginia, Delaware, Alabama, Kentucky, Mississippi, Tennessee, Arkansas, Louisiana, Oklahoma, and Texas |
| West (1 = state is in the West, 0 if not) | Alaska, Arizona, California, Colorado, Hawaii, Idaho, Montana, Nevada, New Mexico, Oregon, Utah, Washington, and Wyoming |
| Midwest (mw) (1 = state is in the Midwest, 0 if not) | Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, and Wisconsin |
| Northeast (ne) (1 = state is in the Northeast, 0 if not) | Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont, New Jersey, New York, and Pennsylvania |
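The regional coding in Table C can be sketched in a few lines of Python. This is only an illustration of how the dummies could be built, not the procedure actually used in Stata; the names `region_dummies`, `SOUTH`, `WEST`, `MIDWEST`, and `NORTHEAST` are hypothetical. It assumes the South is the omitted reference group, so a Southern state has all three dummies equal to zero.

```python
# Region lists copied from Table C. The South is treated as the reference
# group (an assumption of this sketch), so it gets no dummy of its own.
SOUTH = ["Florida", "Georgia", "Maryland", "North Carolina", "South Carolina",
         "Virginia", "West Virginia", "Delaware", "Alabama", "Kentucky",
         "Mississippi", "Tennessee", "Arkansas", "Louisiana", "Oklahoma", "Texas"]
WEST = ["Alaska", "Arizona", "California", "Colorado", "Hawaii", "Idaho",
        "Montana", "Nevada", "New Mexico", "Oregon", "Utah", "Washington",
        "Wyoming"]
MIDWEST = ["Illinois", "Indiana", "Iowa", "Kansas", "Michigan", "Minnesota",
           "Missouri", "Nebraska", "North Dakota", "Ohio", "South Dakota",
           "Wisconsin"]
NORTHEAST = ["Connecticut", "Maine", "Massachusetts", "New Hampshire",
             "Rhode Island", "Vermont", "New Jersey", "New York", "Pennsylvania"]

ALL_STATES = SOUTH + WEST + MIDWEST + NORTHEAST


def region_dummies(state):
    """Return the ne, mw, and west indicators for one state.

    A state in the South (the reference group) has all three equal to 0.
    """
    if state not in ALL_STATES:
        raise ValueError(f"unknown state: {state}")
    return {
        "ne": int(state in NORTHEAST),
        "mw": int(state in MIDWEST),
        "west": int(state in WEST),
    }


if __name__ == "__main__":
    for name in ("Vermont", "Ohio", "Utah", "Texas"):
        print(name, region_dummies(name))
```

Leaving one region without its own dummy avoids perfect collinearity with the intercept, which is why only three indicators are returned for four regions.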

Econometric Model/Estimation Methods


My regression model takes the following form:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + u$$

With the specific variables I used, the regression takes the following form:

$$\text{obes} = \beta_0 + \beta_1 \text{pcinc} + \beta_2 \text{age} + \beta_3 \text{black} + \beta_4 \text{ffr} + \beta_5 \text{gen} + \beta_6 \text{leisinact} + \beta_7 \text{fruveg} + u$$

In the analyses conducted later on, I added regional dummy variables, interactions, and squared terms. The model above is the most basic model used in my project and applies to the 2011, 2007, and difference models. I kept my model as a level-level model. This specification means that $\Delta y = \beta_1 \Delta x$, so if x changes by one unit, y changes by $\beta_1$ units. I chose this specification because, as explained later, the difference model is my primary model of interest throughout this paper. Because the differenced data contains negative numbers, using logs would not have been possible. Therefore, I maintained a level-level model for my regressions.

Throughout this project, I added quadratics and interaction terms for further analysis of my main question. The most important variable in my model with regard to my question is ffr. The second most important variable is fruveg, since it captures the percentage of people eating the recommended amount of fruits and vegetables daily. Because of this, I later squared these terms to analyze whether ffr and fruveg have decreasing or increasing marginal effects on obesity rates. As for interactions, I interacted ffr with pcinc to see how the partial effect of fast food restaurants per 1,000 residents on the obesity rate varies with per capita income. This allows me to see if the effect of fast food restaurants on obesity differs in a poorer area versus a richer area. I interacted fruveg with age to analyze whether the effect of fruveg on obesity differs between states with younger and older average ages.

Some of the challenges that have come up in the project are due to the multiple linear regression (MLR) assumptions and are discussed below.

Assumption MLR.3 (No Perfect Collinearity): in the sample and population, none of the independent variables is constant and there is no exact linear relationship among the independent variables. This assumption allows independent variables to be correlated; they just cannot be perfectly correlated. A simple and common way that two independent variables can be perfectly correlated is when one variable is a constant multiple of another. Although this has not come up as an issue in my model, if it did, I would be able to resolve the perfect collinearity by dropping a variable.

Assumption MLR.4 (Zero Conditional Mean): the error u has an expected value of zero given any values of the independent variables. In other words, $E(u \mid x_1, x_2, \ldots, x_k) = 0$. This assumption has proved to be a challenge for my project, mainly because there are omitted factors correlated with $x_1, x_2, \ldots, x_k$, which causes Assumption MLR.4 to be violated. This is due to data limitations or to factors that cannot be quantified and are therefore impossible to include. Examples of omitted variables that I have not been able to include are genetic obesity history (e.g. how many family members were obese before an individual became obese?), preferences, percentage of income spent on fast food, gym membership numbers, average hours spent working out weekly, etc.
While I have dedicated much time to finding information on these factors, much of this data has yet to be researched (or, if already researched, has not been published). There is very likely correlation between some of these factors and variables included in my regression. For example, preferences over what types of foods to eat and the percentage of income spent on fast food are most likely partially dependent on one's income. Although I have tried to find as much quantifiable data as possible, Assumption MLR.4 has been violated in my project.

Assumption MLR.5 (Homoskedasticity): the error u has the same variance given any values of the explanatory variables. In other words, $\mathrm{Var}(u \mid x_1, \ldots, x_k) = \sigma^2$. This assumption means that the variance of the error term u, conditional on the explanatory variables, is the same for all combinations of outcomes of the explanatory variables. If this fails, then the model exhibits heteroskedasticity, which is when the variance of u, given the explanatory variables, is not constant. Testing for this in the model is important because if MLR.5 fails, the usual t-statistics and F-statistics no longer follow t and F distributions, which means that any hypothesis testing conducted with them would not be valid.

Stata Table D: Heteroskedasticity Test on the Difference Model

Above is the test for heteroskedasticity for the difference model (since this is my main model, as explained later) without dummy variables and squared terms. Stata calculates the F-statistic, which is 1.32. The 5% critical value is 2.34, so because the F-statistic is less than the critical value, we fail to reject the null hypothesis of homoskedasticity at the 5% significance level. This means that there is no evidence that MLR.5 is violated, so the usual standard errors and any t-tests based on them can be used without correction, including the t-tests in the Results section. Because there is no time series involved in this project, there is no issue of serial correlation or other time series assumptions.

Results
Parameter Estimates

Below are the parameter estimates for my three different models.

Table D: Model Coefficients (coefficients with unexpected signs are marked with *)

| Model | constant | pcinc | age | black | ffr | gen | leisinact | fruveg |
|---|---|---|---|---|---|---|---|---|
| 2011 | 19.7763 | -0.0001 | 0.2431 | 0.0736 | 0.0285 | 0.0161 | 0.2875 | -0.4362 |
| 2007 | 13.7836 | -0.0001 | 0.1688 | 0.0983 | -4.8077* | 0.2115 | 0.1101 | -0.1189 |
| 2011 - 2007 | 2.3667 | -0.00004 | 0.0301 | -0.1283* | 0.6963 | 0.1776 | 0.0768 | 0.0293* |

For the most part, the coefficients were signed in the ways I expected them to be, aside from the three marked coefficients in Table D. For the 2007 model, the parameter on ffr was estimated to be negative, which says that an increase of 1 fast food restaurant per 1,000 residents decreases the obesity rate by 4.81 percentage points. However, as the mean of ffr is only 0.1189 (Stata Table C), it is unlikely that there will ever be an increase of 1 in ffr. It would be more appropriate to say, for example, that an increase in ffr of 0.1 is associated with a decrease in the obesity rate of 0.48077 percentage points. Regardless, it was still unexpected to see a negative sign on ffr. Furthermore, in the 2011 and 2011 - 2007 models, the estimate on ffr is positive, but in the 2011 model its effect is very small. If ffr were to increase by 0.1 in 2011, the corresponding change in the obesity rate would be an increase of only 0.00285 percentage points. However, the coefficient is much larger in the difference model.

The estimates on fruveg are small as well. Because fruveg is recorded as a proportion, its values lie between 0 and 1. A change of 1, for example, would represent a 100 percentage point increase in the share of people who consume the recommended fruit and vegetable intake daily, whereas a change of 0.05 would represent a 5 percentage point increase. The 2011 model has the largest estimate of the fruveg coefficient in magnitude; in that case, if there were a 5 percentage point increase in the share of residents who consume 5+ fruits and vegetables daily, the obesity rate would decrease by 0.02181 percentage points. I anticipated the coefficient to be negative in all cases, but the difference model shows otherwise. Based on health studies, doctor recommendations, and common intuition, eating healthier (ceteris paribus) should be expected to reduce the obesity rate. This was not the only unexpected finding, since the coefficient on black was negative in the difference model. This was unexpected because obesity rates are typically much higher in minority populations, especially those identified as black.

It could be that the coefficients carry the unanticipated signs due to omitted variable bias. There are several unmeasured factors that are most likely correlated with ffr and fruveg. For example, dietary preferences are probably related to both. Preferences regarding food can influence whether fast food chains build a restaurant in an area and whether one eats more or fewer fruits and vegetables. Because preferences have been excluded from this model, some of the coefficients may be biased, possibly including the parameters with the unanticipated signs.
Other omitted factors include family obesity history, time spent working out, ability to cook, etc. To limit some of this bias, it is most appropriate to use the difference model for the remainder of the project. While the difference model is preferred, the unanticipated signs on its black and fruveg coefficients can be overlooked to an extent because, as explained later, both variables have statistically insignificant p-values.

Explaining the Difference Model

In the 2007 regression in the previous section, the estimated equation, if read causally, implies that an increase in ffr lowers the obesity rate. Although this is possible, it is not the expected case, and this regression most likely suffers from omitted variable problems. To account for the omitted variables, I could have tried to control for more factors, but because many of these factors were hard to quantify or lacked appropriate data, I was unable to do so. An alternative method is to use a difference model. This means viewing the unobserved factors that affect obes as two different types: those that are constant between 2007 and 2011 and those that vary over time. This can be expressed in the model below:

$$\text{obes}_{it} = \beta_0 + \beta_1 \text{pcinc}_{it} + \beta_2 \text{age}_{it} + \beta_3 \text{black}_{it} + \beta_4 \text{ffr}_{it} + \beta_5 \text{gen}_{it} + \beta_6 \text{leisinact}_{it} + \beta_7 \text{fruveg}_{it} + [a_i + u_{it}]$$

Above, i indexes the observations while t indexes the time period. The terms in the brackets, $a_i + u_{it}$, represent the unobserved factors: $a_i$ captures all of the unobserved factors affecting obes that are constant over time, and $u_{it}$ is the time-varying error, representing the unobserved factors affecting obes that change over time. The term $a_i$ is typically called the unobserved effect, or fixed effect, while $u_{it}$ is typically referred to as the idiosyncratic error. When I use the difference model by subtracting 2007's data from 2011's, I am able to eliminate some omitted variable bias in my regression. This is demonstrated below:

$$\text{obes}_{2011-2007} = \beta_0 + \beta_1[\text{pcinc}_{2011} - \text{pcinc}_{2007}] + \beta_2[\text{age}_{2011} - \text{age}_{2007}] + \beta_3[\text{black}_{2011} - \text{black}_{2007}] + \beta_4[\text{ffr}_{2011} - \text{ffr}_{2007}] + \beta_5[\text{gen}_{2011} - \text{gen}_{2007}] + \beta_6[\text{leisinact}_{2011} - \text{leisinact}_{2007}] + \beta_7[\text{fruveg}_{2011} - \text{fruveg}_{2007}] + [a_i - a_i] + [u_{2011} - u_{2007}]$$

This simplifies to:

$$\text{obes}_{2011-2007} = \beta_0 + \beta_1[\text{pcinc}_{2011} - \text{pcinc}_{2007}] + \beta_2[\text{age}_{2011} - \text{age}_{2007}] + \beta_3[\text{black}_{2011} - \text{black}_{2007}] + \beta_4[\text{ffr}_{2011} - \text{ffr}_{2007}] + \beta_5[\text{gen}_{2011} - \text{gen}_{2007}] + \beta_6[\text{leisinact}_{2011} - \text{leisinact}_{2007}] + \beta_7[\text{fruveg}_{2011} - \text{fruveg}_{2007}] + [u_{2011} - u_{2007}]$$

The term $a_i$ cancels out because it does not change over time. The difference model allows me to account for at least some of the unobserved factors that are correlated with ffr. It is therefore no longer necessary to assume that ffr is uncorrelated with $a_i$, because those unobservables are no longer in the model. It is important to realize that while the difference model addresses a good number of omitted variables, it does not account for all of them. The table below shows some (not all) examples of omitted variables that are included in $a_i$ (and are therefore accounted for) and in $u_{it}$ (still unaccounted for). Included in $a_i$ are also some variables that have not been exactly constant over time, but roughly constant.

Table E: Examples of Unobservable Data

| Included in $a_i$ | Included in $u_{it}$ |
|---|---|
| Climate, education, statewide standardized processes for making fast foods, mountains and parks per state, state perceptions toward obesity, etc. | Gym membership, amount of time spent working out, family obesity history, changes in technology, preferences, etc. |
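The cancellation of the fixed effect can be illustrated with a small numerical sketch. The numbers and names here (`outcome`, `first_difference`, the slope of 0.5, and the two values of `a_i`) are hypothetical and chosen only for illustration; the intercept and the other regressors are left out for brevity.

```python
def outcome(x, a_i, u, slope=0.5):
    """Outcome in one period: slope * x + fixed effect a_i + idiosyncratic error u.

    The slope of 0.5 is a made-up value, not an estimate from the paper.
    """
    return slope * x + a_i + u


def first_difference(x_2007, x_2011, a_i, u_2007=0.0, u_2011=0.0):
    """2011 outcome minus 2007 outcome for the same unit."""
    return outcome(x_2011, a_i, u_2011) - outcome(x_2007, a_i, u_2007)


if __name__ == "__main__":
    # Two hypothetical states with the same change in x but very different
    # time-constant unobservables a_i.
    d_small = first_difference(0.70, 0.80, a_i=1.0)
    d_large = first_difference(0.70, 0.80, a_i=25.0)
    # Both differences are about slope * (0.80 - 0.70) = 0.05, because a_i
    # enters both periods and drops out of the subtraction.
    print(d_small, d_large)
```

Anything that changes between the two years (the `u_2007` and `u_2011` arguments here) does not cancel, which is why the factors listed under $u_{it}$ in Table E remain unaccounted for.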

Because the difference model accounts for and eliminates a number of unobservable factors, and is thus more reliable than the two other models, the differenced model is the model that I will be using for the remainder of the project. The 2007 and 2011 individual models are no longer needed. The equation for the difference model, excluding dummy variables, squared terms, and interactions, is:

$$\widehat{\text{obes}} = 2.3667 - 0.0004\,\text{pcinc} + 0.0301\,\text{age} + 0.6963\,\text{ffr} - 0.1283\,\text{black} + 0.1776\,\text{gen} + 0.0768\,\text{leisinact} + 0.0293\,\text{fruveg}$$
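As a worked illustration of how to read the level-level coefficients in the difference model, the short sketch below multiplies a coefficient by a chosen change in its variable, holding everything else fixed. The names `DIFF_COEF` and `predicted_change` are hypothetical, only three of the reported coefficients are included, and the result is read in the same units as obes (treated here as percentage points, which is an assumption about how the data was entered).

```python
# Coefficients on ffr, black, and fruveg from the difference model equation.
DIFF_COEF = {"ffr": 0.6963, "black": -0.1283, "fruveg": 0.0293}


def predicted_change(variable, change):
    """Ceteris paribus change in obes for a given change in one regressor."""
    return DIFF_COEF[variable] * change


if __name__ == "__main__":
    # A change of 0.1 in ffr (0.1 more restaurants per 1,000 residents).
    print(predicted_change("ffr", 0.1))
    # A change of 0.05 in fruveg (5 percentage points more residents eating
    # the recommended amount).
    print(predicted_change("fruveg", 0.05))
```

Scaling by a realistic change such as 0.1 rather than 1 keeps the interpretation within the range of the data, since a one-unit change in ffr is far larger than anything observed.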

T-Statistics and P-Values

The following table shows all t-statistics and p-values for the differenced model.

Stata Table E: Regression with T-Statistics and P-Values

The t-statistic is used to test a hypothesis about a single coefficient. For a two-sided alternative, the rejection rule is that H0 is rejected in favor of H1 at the chosen significance level if the absolute value of the t-statistic exceeds the critical value. We can test the null hypothesis of no linear relationship between ffr and obes:

$$H_0: \beta_4 = 0 \qquad H_1: \beta_4 \neq 0$$

For this test, the critical value used is 1.684 and the t-statistic is 0.43. (With roughly 40 degrees of freedom, 1.684 is the two-sided critical value at the 10% level; the two-sided 5% value is about 2.02, and the conclusion is the same with either.) Because the critical value is greater than the t-statistic, we fail to reject the null of no linear relationship between ffr and obes. Likewise, we can test the null hypothesis of no linear relationship between fruveg and obes:

$$H_0: \beta_7 = 0 \qquad H_1: \beta_7 \neq 0$$

The critical value is the same as in the last test, 1.684, but the t-statistic is 0.27. As before, because the critical value is greater than the t-statistic, we fail to reject the null of no linear relationship between fruveg and obes.

The p-value summarizes the strength or weakness of the empirical evidence against the null hypothesis. It is the probability of observing a t-statistic at least as extreme as the one obtained if the null hypothesis is true. Therefore, small p-values are evidence against the null hypothesis, while large p-values provide little evidence against it. As seen above, none of the variables is statistically significant at any conventional significance level. These p-values essentially say that no statistically significant effect of the variables on the obesity rate can be detected. When the regional dummy variables are added to the model, with the South as the reference group, it is still the case that none of these variables is significant at any conventional level.

F-Tests and Joint Significance

A group of variables is jointly insignificant when the null hypothesis that all of their coefficients are zero cannot be rejected, which often justifies dropping those variables from a model. An F-test can therefore be used to determine whether certain variables are worth keeping in a model. Using the difference model gives us the following information. When all of the independent variables are kept in the model, $R^2$ is 0.0672. When ffr and fruveg are dropped, $R^2$ is 0.0618. The F-statistic can be calculated from here:

$$F = \frac{(R^2_{UR} - R^2_{R})/q}{(1 - R^2_{UR})/(n - k - 1)} = \frac{(0.0672 - 0.0618)/2}{(1 - 0.0672)/(50 - 7 - 1)} = \frac{0.0054/2}{0.9328/42} = \frac{0.0027}{0.0222095} \approx 0.12157$$

The F-statistic is very small, and thus we fail to reject H0 in favor of H1 at any conventional significance level. This means that the variables ffr and fruveg are jointly insignificant, so dropping these variables from the model would be justified. However, the insignificance of these variables has more to do with precision in estimation than with causality, so we do not necessarily have to drop them. Dropping these variables would move them into the error term, which could bias the estimates on the remaining independent variables and would be a serious issue to deal with. The fact that ffr and fruveg are deemed insignificant does not mean that the tests regarding these variables stop here. Rather, it is important to keep these facts in mind throughout the rest of this project.

Dummy Variables

Although using dummy variables does not directly answer the question at hand, they can be useful because they highlight qualitative factors that are of interest to the project. In this case, regional dummy variables can be used to highlight differences in the obesity rate between the South, Northeast, Midwest, and West. Table C shows which states were assigned to which regional category. The model takes the following form:

$$\text{obes} = \beta_0 + \beta_1 \text{pcinc} + \beta_2 \text{age} + \beta_3 \text{black} + \beta_4 \text{ffr} + \beta_5 \text{gen} + \beta_6 \text{leisinact} + \beta_7 \text{fruveg} + \delta_1 \text{ne} + \delta_2 \text{mw} + \delta_3 \text{west} + u$$

The reference category is the South, meaning that all comparisons are made against this group, because the South typically has the highest obesity rate.
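Returning briefly to the joint significance test above, the arithmetic of the F-statistic for dropping ffr and fruveg can be checked with a short sketch. The function name `f_statistic` and its argument names are hypothetical; the inputs are the R-squared values, q = 2 restrictions, n = 50 states, and k = 7 regressors reported in the text.

```python
def f_statistic(r2_unrestricted, r2_restricted, q, n, k):
    """R-squared form of the F-statistic for testing q exclusion restrictions.

    n is the number of observations and k is the number of regressors in the
    unrestricted model (not counting the intercept).
    """
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1 - r2_unrestricted) / (n - k - 1)
    return numerator / denominator


if __name__ == "__main__":
    f_stat = f_statistic(0.0672, 0.0618, q=2, n=50, k=7)
    print(round(f_stat, 5))
```

A value this far below 1 is well short of any conventional critical value for an F distribution with 2 and 42 degrees of freedom, which matches the conclusion that ffr and fruveg are jointly insignificant.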
After estimating this model in Stata, the fitted equation is:

$\widehat{\text{obes}} = 4.7126 - 0.0001\,\text{pcinc} + 0.3429\,\text{age} - 0.3634\,\text{black} + 0.4863\,\text{ffr} + 0.2378\,\text{gen} + 0.2422\,\text{leisinact} + 0.1161\,\text{fruveg} - 0.4337\,\text{ne} - 0.9549\,\text{mw} - 2.1843\,\text{west}$

Relative to the reference group, the Northeast is predicted to have an obesity rate 0.4337 percentage points lower than the South, while the Midwest is predicted to have a rate 0.9549 percentage points lower. The West has the largest estimated difference, as it is predicted to have an obesity rate 2.1843 percentage points lower than the South. Since the p-values on ne and mw are 0.579 and 0.165 respectively, we fail to reject the null hypothesis that these regions have the same obesity level as the South at any conventional significance level. However, the null is rejected for west, since its p-value is 0.001. This illustrates that there are indeed regional differences in the obesity rate, but the specific causes for this are unknown. It could be factors such as weather, climate, and the nature of the regions relative to the South. For example, the West probably has the most ideal climates year round for outdoor exercise (think of states such as California, Colorado, Oregon, etc.), especially in comparison to

the very hot and humid states in the South. Furthermore, there may be more mountains and parks in the states of the West than in the other regions, so more people can exercise outdoors through hikes and similar activities in these states than in the other states. There could be other factors as well; the explanation above is just a theory that could perhaps explain the regional differences in the obesity rate, especially in regard to the West vs. the South.

Quadratics

Quadratics are useful for this project because they allow us to see whether a variable has an increasing or diminishing marginal effect on obes. I added quadratic terms for ffr and fruveg since those are the most relevant variables to the question at hand.

Stata Table F: Quadratic Model

The estimated equation shows that both ffr and fruveg have an increasing marginal effect on obes. The turning point of a quadratic function can be calculated with the equation $x^* = |\hat{\beta}_1/(2\hat{\beta}_2)|$, where $\hat{\beta}_1$ is the coefficient on the linear term and $\hat{\beta}_2$ is the coefficient on the squared term. Before its turning point, ffr has a negative effect on obes, but after the turning point, ffr has a positive effect on obesity. The turning point of ffr is calculated below:

$\text{ffr}^* = 0.7676178/(2 \times 5.032455) = 0.0762667$

With this model, the results show that the marginal effect of ffr on obes is zero when ffr is 0.0762667. When ffr < 0.0762667, there is a negative effect on obes, but when ffr > 0.0762667, obes increases as ffr increases. The quadratic in ffr is U-shaped, since the coefficient on ffr is negative while the coefficient on its squared term is positive.

A turning point does not have to be calculated for fruveg because there is no turning point at positive values of fruveg. The coefficients on both the linear and the squared term are positive, so over that range the curve is always upward sloping. There is never a point at which the curve turns around and the obesity rate decreases given increases in fruveg. The reason for this is probably omitted variable bias. Even though it is possible, it is most likely not the case that as more people eat fruits and vegetables, the obesity rate increases. Although the difference model eliminated the time-constant error terms, there are still important unobserved factors affecting the model's estimates, such as the coefficients on fruveg and fruveg_sq. However, it is interesting to note that fruveg and its squared term are statistically significant at the 5% level in this model.
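The turning-point calculation for ffr can be reproduced with a short Python sketch. The function names are hypothetical, and the sign of the linear coefficient is an assumption taken from the text's statement that the coefficient on ffr is negative while the coefficient on its squared term is positive; the magnitudes are the ones used in the calculation above.

```python
def turning_point(b_linear, b_squared):
    """Turning point of b_linear * x + b_squared * x**2, in absolute value."""
    return abs(b_linear / (2.0 * b_squared))


def marginal_effect(b_linear, b_squared, x):
    """Slope of the quadratic at x: b_linear + 2 * b_squared * x."""
    return b_linear + 2.0 * b_squared * x


# Assumption: negative linear term, as described in the text.
B_FFR = -0.7676178
B_FFR_SQ = 5.032455

ffr_star = turning_point(B_FFR, B_FFR_SQ)
print(round(ffr_star, 7))

# The slope is negative below the turning point and positive above it.
print(marginal_effect(B_FFR, B_FFR_SQ, 0.05) < 0)
print(marginal_effect(B_FFR, B_FFR_SQ, 0.10) > 0)
```

Evaluating `marginal_effect` at `ffr_star` returns a value that is zero up to floating-point error, which is what defines the turning point.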

Therefore, we could reject the null of no relationship between fruveg and fruveg_sq with obes. 32 states have an ffr value of 0.0762667 or greater. We can see that adding more ffr after 0.0762667 has an increasing effect on obes. This is illustrated below:

$\Delta\widehat{\text{obes}} \approx [-0.7676178 + 2(5.032455)\,\text{ffr}]\,\Delta\text{ffr} = (-0.7676178 + 10.06491\,\text{ffr})\,\Delta\text{ffr}$

At ffr = 0.08 the slope is -0.7676178 + 10.06491(0.08) = 0.0376, whereas at ffr = 0.09 the slope is -0.7676178 + 10.06491(0.09) = 0.1382. An increase in ffr from 0.08 to 0.09 therefore raises the obesity rate by roughly 0.0004 percentage points, while an increase from 0.09 to 0.10 raises it by roughly 0.0014 percentage points. The marginal effect is increasing after the turning point of 0.0762667. Because more than half of the states have an ffr value equal to the turning point or higher, the obesity rate is expected to increase with any additional fast food restaurants built in these areas. Due to the high p-values, ffr and ffr_sq are not statistically different from zero at any reasonable test level, so we fail to reject the null hypothesis of no relationship between ffr or ffr_sq and obes. This supports the findings from prior tests as well.

Interactions

I interacted ffr with pcinc and fruveg with age, and here are the Stata results:

Stata Table G: Interaction ffr*pcinc

In this model, the partial effect of ffr on obes (holding all other variables fixed) is

$\dfrac{\Delta\,\text{obes}}{\Delta\,\text{ffr}} = \beta_4 + \beta_8\,\text{pcinc}$

If $\beta_8 > 0$, then an additional fast food restaurant per capita implies a larger increase in the obesity rate in areas with a higher per capita income, meaning that there is an interaction effect between ffr and pcinc. At the mean pcinc of 3,604.6 (Stata Table C), the estimated partial effect of ffr on the obesity rate is 3.248847 - (0.000704)(3,604.6) = 0.7112086. For a state that has the average per capita income, an increase of ffr by 1 increases the obesity rate by 0.7112 percentage points. With the maximum income of 12,390, the partial effect is -5.473713 (decreasing effect), and with the minimum of -3,516, the estimated partial effect is 5.724111 (increasing effect). This illustrates a dramatic difference in the effects of fast food restaurants across per capita income levels, since there is a decreasing effect of ffr on obes at higher levels of income, but an increasing effect at lower levels of income. Each t-statistic is also insignificant here, so we are unable to reject either $H_0: \beta_4 = 0$ or $H_0: \beta_8 = 0$.
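The partial effects quoted above can be reproduced with a minimal Python sketch. The function name `partial_effect` is hypothetical, and the negative sign on the interaction coefficient is an assumption inferred from the reported arithmetic (the three reported values are only consistent with an interaction coefficient of -0.000704).

```python
def partial_effect(b_main, b_interaction, moderator):
    """Partial effect of x in a model with an x * moderator interaction."""
    return b_main + b_interaction * moderator


B_FFR = 3.248847          # coefficient on ffr reported in the text
B_FFR_PCINC = -0.000704   # assumption: sign inferred from the reported results

# pcinc values reported in the text (differenced data, so the minimum is negative).
pcinc_values = {"mean": 3604.6, "max": 12390.0, "min": -3516.0}

effects = {
    label: partial_effect(B_FFR, B_FFR_PCINC, pcinc)
    for label, pcinc in pcinc_values.items()
}

for label, effect in effects.items():
    print(label, round(effect, 6))
```

Writing the calculation as a function of the moderator makes it easy to evaluate the effect of ffr at any other income level of interest, such as the quartiles of pcinc.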

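To find the partial effect of pcinc on the obesity rate, we use $\beta_1 + \beta_8\,\text{ffr}$, in which we have to pick a value for ffr. We can test the null hypothesis that ffr has no effect on the obesity rate. If we use the pcinc mean, we get $H_0: \beta_4 + 3{,}604.6\,\beta_8 = 0$ against $H_1: \beta_4 + 3{,}604.6\,\beta_8 > 0$. Let $\theta = \beta_4 + 3{,}604.6\,\beta_8$, so $\beta_4 = \theta - 3{,}604.6\,\beta_8$. When we substitute this into our equation, we get:

$\text{obes} = \beta_0 + \beta_1\text{pcinc} + \beta_2\text{age} + \beta_3\text{black} + \beta_4\text{ffr} + \beta_5\text{gen} + \beta_6\text{leisinact} + \beta_7\text{fruveg} + \beta_8\,\text{ffr}\cdot\text{pcinc}$
$\text{obes} = \beta_0 + \beta_1\text{pcinc} + \beta_2\text{age} + \beta_3\text{black} + [\theta - 3{,}604.6\,\beta_8]\,\text{ffr} + \beta_5\text{gen} + \beta_6\text{leisinact} + \beta_7\text{fruveg} + \beta_8\,\text{ffr}\cdot\text{pcinc}$
$\text{obes} = \beta_0 + \beta_1\text{pcinc} + \beta_2\text{age} + \beta_3\text{black} + \theta\,\text{ffr} + \beta_5\text{gen} + \beta_6\text{leisinact} + \beta_7\text{fruveg} + \beta_8\,\text{ffr}\,[\text{pcinc} - 3{,}604.6]$

The relevant test would be $H_0: \theta = 0$.

Stata Table H: Interaction ffr*pcinc Test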

The t-statistic on ffr is the t-statistic used to test the null hypothesis that ffr has no effect on the obesity rate. The t-statistic is 0.44, which is less than the critical value at the 5% level, so we fail to reject the null hypothesis at the 5% level. At the average per capita income, ffr has a statistically insignificant effect on the obesity rate. As a side note, the coefficient on ffr is also 0.711, which matches the partial effect of ffr on obes calculated above.

Another interaction is fruveg with age. The partial effect of fruveg on obes (holding all other variables fixed) is $\beta_7 + \beta_8\,\text{age}$. If $\beta_8 > 0$, then increases in the percentage of people who eat the recommended amount of fruits and vegetables daily would be associated with a larger increase in the obesity rate where the average age is higher. The mean age for the differenced set is -0.7308, so the estimated partial effect of fruveg on the obesity rate (based on the coefficients; table not shown) is -5.335453 + (0.5806202)(-0.7308) = -5.759770242. This shows that when there is an increase in fruveg by 10%, the obesity rate will decrease by about 0.5760 percentage points. With the minimum average age of -1.81, the estimated partial effect of fruveg is -6.386375562, and with the maximum average age of 0.45, the estimated partial effect is -5.07417391. As the results show, there is not much variation in the effect of fruveg on obes across different age levels.

The null hypothesis stating that fruveg has no effect on the obesity rate can be tested. If we use the mean age, the null and alternative would look like $H_0: \beta_7 - 0.7308\,\beta_8 = 0$ against $H_1: \beta_7 - 0.7308\,\beta_8 > 0$. We can let $\theta = \beta_7 - 0.7308\,\beta_8$ so that $\beta_7 = \theta + 0.7308\,\beta_8$. When we substitute this into our equation, we get:

$\text{obes} = \beta_0 + \beta_1\text{pcinc} + \beta_2\text{age} + \beta_3\text{black} + \beta_4\text{ffr} + \beta_5\text{gen} + \beta_6\text{leisinact} + \beta_7\text{fruveg} + \beta_8\,\text{fruveg}\cdot\text{age}$
$\text{obes} = \beta_0 + \beta_1\text{pcinc} + \beta_2\text{age} + \beta_3\text{black} + \beta_4\text{ffr} + \beta_5\text{gen} + \beta_6\text{leisinact} + [\theta + 0.7308\,\beta_8]\,\text{fruveg} + \beta_8\,\text{fruveg}\cdot\text{age}$
$\text{obes} = \beta_0 + \beta_1\text{pcinc} + \beta_2\text{age} + \beta_3\text{black} + \beta_4\text{ffr} + \beta_5\text{gen} + \beta_6\text{leisinact} + \theta\,\text{fruveg} + \beta_8\,\text{fruveg}\,[\text{age} + 0.7308]$

Here, the mean age is added, not subtracted, because of the original negative sign associated with it (subtracting a negative number is the same as adding the corresponding positive number). The relevant test is $H_0: \theta = 0$.

Stata Table I: Interaction fruveg*age Test

The t-statistic to focus upon is the one associated with fruveg. The t-statistic is less than the critical value, so we fail to reject the null hypothesis at the 5% level; at the average age, fruveg has a statistically insignificant effect on the obesity rate. The coefficient on fruveg is also now negative, indicating that as the share of people who eat the recommended amount of fruits and vegetables increases, the obesity rate decreases. However, this interpretation should be treated with caution because the variable is insignificant.
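The recentering used in both interaction tests can be checked numerically. The Python sketch below uses hypothetical function names and the fruveg and age coefficients quoted in the text; the positive sign on the interaction coefficient is an assumption inferred from the reported partial effects at the mean, minimum, and maximum age. It shows that the coefficient on fruveg after recentering equals the partial effect at the mean age, and that the original and recentered specifications give identical fitted contributions.

```python
def recentered_coefficient(b_main, b_interaction, center):
    """theta = b_main + b_interaction * center: the partial effect at `center`."""
    return b_main + b_interaction * center


def original_terms(b_main, b_interaction, x, z):
    """Contribution of x and x*z in the original specification."""
    return b_main * x + b_interaction * x * z


def recentered_terms(theta, b_interaction, x, z, center):
    """Contribution of x and x*(z - center) in the recentered specification."""
    return theta * x + b_interaction * x * (z - center)


B_FRUVEG = -5.335453       # coefficient on fruveg quoted in the text
B_FRUVEG_AGE = 0.5806202   # assumption: sign inferred from the reported effects
MEAN_AGE = -0.7308         # mean of the differenced age variable

theta = recentered_coefficient(B_FRUVEG, B_FRUVEG_AGE, MEAN_AGE)
print(round(theta, 6))

# Illustrative (hypothetical) values of fruveg and age for one observation.
before = original_terms(B_FRUVEG, B_FRUVEG_AGE, x=0.1, z=0.45)
after = recentered_terms(theta, B_FRUVEG_AGE, x=0.1, z=0.45, center=MEAN_AGE)
print(abs(before - after) < 1e-12)
```

Because the two specifications are algebraically identical, recentering changes only which quantity the coefficient on fruveg (and its standard error and t-statistic) refers to, not the fit of the model.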

Conclusion
I originally had three different models for my project: the 2007, 2011, and difference models. However, the difference model was the most compelling because it was able to eliminate some omitted variable bias. Since omitted variable bias is a challenge for any project, especially given the number of undocumented variables in my project, I decided it would be best to use the difference model for all of my final results. The overall question was whether fast food restaurants and the lack of healthier foods eaten have been the driving force behind increasing obesity rates. My results show no statistically significant effect of these factors on the obesity rate. The variable ffr was the most important variable because it measured essentially what my guiding question asked. This variable was insignificant at all levels in every regression. The fruveg variable was important too, but not as much as ffr. This variable was only significant when the quadratic terms (ffr_sq and fruveg_sq) were added into the equation. Additionally, the F-test showed that it is justifiable to drop the ffr and fruveg terms from the regression, further emphasizing the insignificance of these variables with regard to obes.

Omitted variable bias was a very big challenge throughout my project. Table E lists a few examples of factors that perhaps affect the obesity rate that I was unable to capture. These factors could not be included mostly because of a lack of information in these areas or the difficulty of quantifying the variables. For example, changes in technology may greatly affect the obesity rate. With changes in technology, the method of food processing within fast food restaurants and the process of growing fruits and vegetables can change, and it is possible that this may affect how healthy or unhealthy the final product is. However, technology would be extremely hard to measure and insert into a dataset. Other factors, such as the amount of time spent working out, are much easier to quantify, but variables such as these had very little or no information published for the United States. With that said, because there are still quite a number of variables missing from my project that could greatly influence my results, I do not think that my model was able to accurately answer my guiding question. This leads to the discussion of possible ways to improve upon my project in the future.

There are a number of ways to extend and improve my project. First, including more observable data in my set would be a major improvement. Specifically, including information regarding the presence of organic and locally grown foods could be interesting, as this would contrast with the results concerning fast food restaurants. Second, choosing different years to research could be very useful. The years that I chose, 2011 and 2007, were not selected for any particular reason; they were the only time periods for which the data I was searching for was available. If I were able to choose years selectively, I would like to research the mid-1990s, as this is when the obesity rate started to rapidly increase in America.
By researching the mid-1990s specifically, perhaps the reason(s) for the rapid increases in obesity rates would be more apparent. Third, I would find it very intriguing to do the same study, but in another country. Although my research project concerns obesity rates in America, it could be interesting to analyze whether the same results are seen in other areas of the world. Because America has had the highest obesity rate for quite a number of years, looking into the causes of obesity in other countries may give insight into what sets America apart from other areas in terms of health and wellbeing. It would also be interesting to try to collect the same data used for my project on an individual basis rather than an aggregate state basis. The ways to improve upon my project extend well beyond those listed above, but if I were to do this project again and have more time, these are the areas that I would like to look at.

The results from my project definitely surprised me; I did not expect them. I thoroughly enjoyed working on this project, but still cannot help but wonder how the results would have differed if I had been able to account for all of the variables that I would have liked to include. Attached at the back of this paper are the data from 2007, 2011, and the difference model.

Works Cited

Anderson, Michael and David Matsa. Are Restaurants Really Supersizing America? 2011. In American Economic Journal. <http://are.berkeley.edu/~mlanderson/pdf/Anderson%20and%20Matsa%202011.pdf>.

Centers for Disease Control and Prevention (CDC). Prevalence of Obesity in the United States, 2009-2010. <http://www.cdc.gov/nchs/data/databriefs/db82.pdf>.

Centers for Disease Control and Prevention (CDC). Overweight and Obesity. <http://www.cdc.gov/obesity/data/adult.html>.

Currie, Janet, Stefano DellaVigna, Enrico Moretti, and Vikram Pathania. The Effect of Fast Food Restaurants on Obesity and Weight Gain. 2009. In American Economic Journal. <http://www.nber.org/papers/w14721.pdf?new_window=1>.

Petrelli, Jennifer and Kathleen Wolin. 2009. Obesity. Santa Barbara, California: Greenwood Press.

Schlosser, Eric. 2001. Fast Food Nation. Boston, Massachusetts: Houghton Mifflin Harcourt.
