
Using the regression function in Excel, we obtained the following estimates for our pizza-demand regression equation:

Y = 26.67 - 0.088X1 + 0.138X2 - 0.076X3 - 0.544X4
           (0.018)    (0.087)    (0.020)    (0.884)

R2 = 0.717    Standard error of Y estimate = 1.64    Adjusted R2 = 0.67    F = 15.8

(Standard errors of the coefficients are listed in parentheses.) Before interpreting these results, we should first think about what direction of impact changes in the explanatory variables are expected to have on the demand for pizza, as evidenced by the signs of the estimated regression coefficients. To put it more formally, we can state the following hypotheses about the anticipated relationship between each of the explanatory variables and the demand for pizza:

Hypothesis 1: The price of pizza (X1) is an inverse determinant of the quantity of pizza demanded (i.e., the sign of the coefficient is expected to be negative).

Hypothesis 2: Assuming tuition to be a proxy for income, pizza could be either a normal or an inferior good. Therefore, we hypothesize that tuition (X2) is a determinant of the demand for pizza, but we cannot say beforehand whether it is an inverse or a direct determinant (i.e., the sign of the coefficient could be either positive or negative).

Hypothesis 3: The price of a soft drink (X3) is an inverse determinant of the demand for pizza (i.e., the sign of the coefficient is expected to be negative).

Hypothesis 4: Location in an urban setting (X4) is expected to be an inverse determinant of the demand for pizza.
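The estimated equation can be encoded as a small function for checking the predictions discussed below; this is just a sketch, and the function and argument names are ours, not part of the regression output.

```python
# Estimated pizza-demand equation (coefficients from the Excel output above).
# Function and argument names are illustrative, not from the original study.
def predicted_demand(price, tuition, soda_price, urban):
    """Predicted slices of pizza demanded per month.

    price      -- price of pizza (X1), in cents
    tuition    -- annual college tuition (X2), in thousands of dollars
    soda_price -- price of a soft drink (X3), in cents
    urban      -- 1 if the campus is in an urban area, 0 otherwise (X4)
    """
    return (26.67
            - 0.088 * price
            + 0.138 * tuition
            - 0.076 * soda_price
            - 0.544 * urban)
```

For example, a campus with $1.00 pizza, $14,000 tuition, $1.10 soft drinks, and an urban location yields a predicted demand of about 10.9 slices per month.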

Turning now to the regression results, we observe that the X1 coefficient has a negative sign, and this is exactly what we would expect because of the law of demand. As the price of pizza (X1) changes, the quantity of pizza demanded will change in the opposite direction. This is what a negative coefficient tells us. The positive sign of the tuition coefficient tells us that tuition costs and the quantity of pizza demanded are directly related to each other. Higher tuition costs are associated with a greater demand for pizza, and vice versa. Thus, pizza appears to be a normal product. The negative sign of the soft-drink price confirms the complementarity between soft drinks and pizza. As the price of a soft drink goes up, college students tend to buy less pizza. The opposite would hold true for a reduction in the price of a soft drink. Finally, the negative sign of the dummy location variable tells us that students who attend colleges in urban areas will buy about half a slice of pizza per month (i.e., 0.544) less than their counterparts in the suburbs or rural areas.

An interpretation of the magnitudes of the estimated regression coefficients is a bit more involved. Each estimated coefficient tells us how much the demand for pizza will change relative to a unit change in each of the explanatory variables. For example, a b1 of 0.088 indicates that a unit change in price will result in a change in the demand for pizza of 0.088 in the opposite direction. Price, as you will recall, was measured in cents. Therefore, according to our regression estimates, a 100-cent (or $1.00) increase will result in a decrease in the quantity of pizza demanded of 8.8 slices (100 × 0.088 = 8.8). A tuition increase of one unit (in this case, $1,000) results in an increase in the quantity of pizza demanded of 0.138. Are these changes, and those associated with changes in the price of soft drinks and the location of the college campus, substantial or inconsequential?
Researchers who are constantly estimating the demand for a particular good or service will have a fairly accurate idea whether the magnitudes of the coefficients estimated in a particular study are high or low relative to their

other work. But if there are no other studies available for comparison, then researchers can at least use the elasticities of demand to gauge the relative impact that the explanatory variables have on the quantity demanded. From our discussion of elasticity in chapter 4, you can see that regression analysis results are ideal for point-elasticity estimation. Recall that the formula for computing point elasticity is

E = (∂Q/∂X) × (X/Q)

where Q = quantity demanded and X = any variable that affects Q (e.g., price or income). In the case of our estimated demand for pizza, let us assume the explanatory variables have the following values:

Price of pizza (X1) = 100 (i.e., $1.00)
Annual college tuition (X2) = 14 (i.e., $14,000)
Price of a soft drink (X3) = 110 (i.e., $1.10)
Location of campus (X4) = urban area (i.e., X4 = 1)

Therefore, inserting these values into the estimated equation gives us

Y = 26.67 - 0.088(100) + 0.138(14) - 0.076(110) - 0.544(1)
  = 10.898, or 11 (rounded to the nearest slice)

To compute the point elasticities for each variable assuming the preceding values, we simply plug the appropriate numbers into the point-elasticity formula. The partial derivative of Y with respect to changes in each variable (i.e., ∂Y/∂X) is simply the estimated coefficient of each variable.

Price elasticity: -0.088 × (100/10.898) = -0.807
Tuition elasticity: 0.138 × (14/10.898) = 0.177
Cross-price elasticity: -0.076 × (110/10.898) = -0.767

With these estimates, we can say that the demand for pizza is somewhat price inelastic and that there is some degree of cross-price elasticity between soft drinks and pizza. Judging from the rather low elasticity coefficient of 0.177, tuition does not appear to have that great an impact on the demand for pizza.

Statistical Evaluation of the Regression Results

Our regression results are based on a sample of colleges across the country. How confident are we that these results are truly reflective of the population of college students?
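The point-elasticity arithmetic above can be verified in a few lines (variable names are ours):

```python
# Predicted demand at the assumed values of the explanatory variables.
Y = 26.67 - 0.088 * 100 + 0.138 * 14 - 0.076 * 110 - 0.544 * 1  # 10.898

# Point elasticity: the coefficient (the partial derivative) times X/Q.
price_elasticity = -0.088 * (100 / Y)
tuition_elasticity = 0.138 * (14 / Y)
cross_price_elasticity = -0.076 * (110 / Y)
```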
The basic test of the statistical significance of each estimated regression coefficient is called the t-test. Essentially, this test is conducted by computing a t-value or t-statistic for each estimated coefficient. This is done by dividing

the estimated coefficient by its standard error. That is:

t = estimated coefficient / standard error of the coefficient

As is the common practice in presentations of regression results, the standard errors in our pizza regression are presented in parentheses under the estimated coefficients. To interpret the value of t, we use the t-table. The convention in economic research is to select the .05 level of significance. This means you can be 95 percent confident that the results obtained from the sample are representative of the population. We also need to know the number of degrees of freedom involved in the estimate. The term degrees of freedom is defined as n - k - 1, where n represents the sample size and k the number of independent variables. The 1 represents the constant or intercept term. Therefore, in our pizza example, we have 30 - 4 - 1 = 25 degrees of freedom. Turning to the t-table shown in Table A.4 in Appendix A, we see that the critical t-value at the .05 level of significance is 1.708 using a one-tail test and 2.060 using a two-tail test. If the t-value computed for a particular estimated coefficient is greater than 1.708, we can say that the estimate is significant at the .05 level using a one-tail test. If it is greater than 2.060, then the same can be said, but with a two-tail test. A simple and useful way to handle the critical level is to use the rule of 2: if the absolute value of t is greater than 2, we can conclude that the estimated coefficient is significant at the .05 level. It is evident from the preceding regression equation that X1 (price of pizza) and X3 (price of soft drinks) are statistically significant because the absolute values of their t-statistics are 4.89 and 3.80, respectively. The other two variables, X2 (tuition) and X4 (location), are not statistically significant because the absolute values of their t-statistics are less than 2.
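The t-statistics and the rule of 2 can be reproduced from the reported coefficients and standard errors (a sketch; the dictionary layout is ours):

```python
# Coefficients and standard errors from the pizza regression output.
coefficients = {"X1": -0.088, "X2": 0.138, "X3": -0.076, "X4": -0.544}
standard_errors = {"X1": 0.018, "X2": 0.087, "X3": 0.020, "X4": 0.884}

# t-statistic = estimated coefficient / its standard error.
t_stats = {v: coefficients[v] / standard_errors[v] for v in coefficients}

# Rule of 2: |t| > 2 implies significance at roughly the .05 level.
significant = {v: abs(t) > 2 for v, t in t_stats.items()}
```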
If the estimated coefficient of a variable passes the t-test, we can be confident that the variable truly has an impact on demand. If it does not pass the t-test, then in all likelihood, the variable does not truly have an impact for the whole population of college students. In other words, the regression coefficients are nonzero

numbers simply because of a fluke in the sample of students that we took from the population. In statistical analysis, the best we can hope for is to be confident that our sample results are truly reflective of the population that they represent. However, we can never be absolutely sure. Therefore, statistical analysts set up degrees of uncertainty. As explained in greater detail later in this chapter, using the rule of 2 generally implies a 5 percent level of significance. In other words, by declaring a coefficient that passes the rule-of-2 version of the t-test to be statistically significant, we leave ourselves open to a 5 percent chance that we may be mistaken.

Another important statistical indicator used to evaluate the regression results is the coefficient of determination, or R2. This measure shows the percentage of the variation in a dependent variable accounted for by the variation in all the explanatory variables in the regression equation. This measure can be as low as 0 (indicating that the variations in the dependent variable are not accounted for by the variation in the explanatory variables) and as high as 1.0 (indicating that all the variation in the dependent variable can be accounted for by the explanatory variables). For statistical analysts, the closer R2 is to 1.0, the greater the explanatory power of the regression equation. In our pizza regression, R2 = 0.717. This means that about 72 percent of the variation in the demand for pizza by college students can be accounted for by the variation in the price of pizza, the cost of tuition, the price of a soft drink, and the location of the college. As explained later in this chapter, R2 increases as more independent variables are added to a regression equation. Therefore, most analysts prefer to use a measure that adjusts for the number of independent variables used, so equations with different numbers of independent variables can be more fairly compared.
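The standard adjustment formula is adj R2 = 1 - (1 - R2)(n - 1)/(n - k - 1), and it can be checked directly for the pizza equation (variable names are ours):

```python
# Adjusted R-squared for the pizza regression: n = 30 observations,
# k = 4 independent variables, plain R-squared = 0.717.
n, k, r2 = 30, 4, 0.717
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # about 0.67
```

The penalty factor (n - 1)/(n - k - 1) grows with k, so adding regressors that explain little lowers the adjusted figure even as plain R2 rises.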
This alternative measure is called the adjusted R2. As it turns out, the adjusted R2 for this equation is 0.67. Another test, called the F-test, is often used in conjunction with R2. This test measures the statistical significance of the entire regression

equation rather than of each individual coefficient (as the t-test is designed to do). In effect, the F-test is a measure of the statistical significance of R2. The procedure for conducting the F-test is similar to that for the t-test. A critical value for F is first established, depending on the degree of statistical significance that the researcher wants to set (typically at the .05 or .01 level). The critical F-values corresponding to these acceptance levels are shown in Table A.3 in Appendix A. As can be seen, there are two degrees-of-freedom values that must be incorporated in the selection of the critical F-value. These values are the number of independent variables in the equation, and the sample size minus the number of independent variables minus 1 for the intercept of the equation. Therefore, because the pizza example has a sample size of 30 and four independent variables, the degrees of freedom are 4 and 25 (30 minus 4 minus 1). Table A.3 shows that at the .05 level, the critical F-value with those degrees of freedom is 2.76. At the .01 level, the critical value is 4.18. Because the regression results for the demand for pizza indicate an F-value of 15.8, we can conclude that our entire equation is statistically significant at the .01 level.

Step 1: Check Signs and Magnitudes

The estimated equation under review is Q = 70 - 10P + 4PX + 50I. The negative sign for the P variable indicates an inverse relationship between price and the quantity demanded of the product. A unit increase in price (i.e., 1 cent) will cause the quantity to decrease by ten units. A unit decrease in price will cause the quantity to increase by ten units. So, for example, if price were decreased by $1.00, quantity would increase by 1,000 units. The positive sign for the PX variable indicates a direct relationship between the price of a related product and the quantity demanded. This indicates that the related product is a substitute for the product in question.
For example, if the price of the related product changes by one unit (i.e., 1 cent), then the quantity demanded of the product in question will change by four units in the same direction. The

positive sign for the I variable indicates that the product is normal or perhaps superior, depending on the magnitude of the income elasticity coefficient. A unit change in per capita income (i.e., $1,000) will cause the quantity to change by fifty units in the same direction.

Step 2: Compute Elasticity Coefficients

To compute elasticity coefficients, we need to assume certain levels of the independent variables P, PX, and I. Let us say they are as follows:

P = 100 (remember, this is 100 cents, or $1.00)
PX = 120 (also in cents)
I = 25 (this represents $25,000)

Inserting these values into the previous equation gives us

Q = 70 - 10(100) + 4(120) + 50(25)
Q = 800

We now use the formula for point elasticity to obtain the elasticity coefficients. Recall that

E = (∂Q/∂X) × (X/Q)

Using this formula, we obtain

Price elasticity: -10 × (100/800) = -1.25
Cross-price elasticity: 4 × (120/800) = 0.60
Income elasticity: 50 × (25/800) = 1.56

Step 3: Determine Statistical Significance

Using the rule of 2 as an approximation for the .05 level of significance, we can say that P and PX are statistically significant because their t-values are greater than or equal to 2 (3.3 and 2, respectively). I is not statistically significant at the .05 level because its t-value is only 1.67. As an added consideration, we note that the R2 of 0.47 indicates that 47 percent of the variation in quantity can be accounted for by variations in the three independent variables P, PX, and I. Although this is not actually an indication of statistical significance, it does show the explanatory power of the regression equation. For cross-section data, this R2 level can be interpreted as being moderately high.
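The Step 2 arithmetic lends itself to a quick numerical check (variable names are ours):

```python
# Quantity demanded at the assumed values P = 100, PX = 120, I = 25.
Q = 70 - 10 * 100 + 4 * 120 + 50 * 25  # 800

# Point elasticities: each coefficient times X/Q.
price_elasticity = -10 * (100 / Q)
cross_price_elasticity = 4 * (120 / Q)
income_elasticity = 50 * (25 / Q)
```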
The correlation coefficient is a measure of the degree of association between two variables. This measure, denoted r, ranges from a value of -1 (perfect negative correlation) to +1 (perfect positive correlation).

Questions:

1. What can you say about R and R-squared?

The degree to which two or more independent variables (X) are related to the dependent variable (Y) is expressed in the correlation coefficient R, a measure of the degree of association that ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation). In this case, R is 0.8819, so there is a positive correlation between the dependent and independent variables. R is a measure of the strength of a linear relationship. R2, the coefficient of determination, is the proportion of variability in the response explained by the regression model: the percentage of variation in the dependent variable (Y) accounted for by variation in all the explanatory variables (X). Its value ranges from 0.0 to 1.0, and the closer it is to 1.0, the greater the explanatory power of the regression. The interpretation of R-squared is "the amount of variance in the dependent variable that can be explained by the model." If the R-squared value were 1.0, the model would explain 100 percent of the variance and produce perfect predictive accuracy, though this never happens in the real world. The point is, the closer the R-squared value is to 1.0, the better the model; the closer it is to 0, the worse the model. Here R-squared is 0.7778, which is close to 1, so this is a good model: 77.78 percent of the variation in demand (Y) can be accounted for by variations in the independent variables number of inhabitants, income, competitors, and price.
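The two reported figures are tied together by R = sqrt(R2):

```python
import math

r_squared = 0.7778        # coefficient of determination from the output
r = math.sqrt(r_squared)  # multiple correlation coefficient, about 0.8819
```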

2. How do you interpret the standard error?

The standard error is 18.3609. The standard error here refers to the estimated standard deviation of the error term u; it is sometimes called the standard error of the regression and equals sqrt(SSE/(n - k - 1)). A standard error can also be attached to each coefficient: that standard error is an estimate of the standard deviation of the coefficient, the amount it varies across samples, and can be thought of as a measure of the precision with which the regression coefficient is measured. If a coefficient is large compared to its standard error, then it is probably different from 0. The standard error of the regression is the error you would expect between the predicted and actual values of the dependent variable. Thus, 18.3609 means that a prediction from this equation can be expected to be off by about 18.36 units.

3. Explain briefly the meaning of the F-test and interpret its value

The F-statistic is 8.7521. The F-test measures the statistical significance of the entire regression as a whole (not each coefficient). It is used to test the significance of R, which is the same as testing the significance of R2, which is the same as testing the significance of the regression model as a whole. If prob(F) < .05, then the model is considered significantly better than would be expected by chance, and we reject the null hypothesis of no linear relationship of Y to the independent variables. F is a function of R2, the number of independent variables, and the number of cases. F is computed with k and (n - k - 1) degrees of freedom, where k is the number of terms in the equation not counting the constant:

F = [R2/k] / [(1 - R2)/(n - k - 1)]

Alternatively, F is the ratio of the mean square for the model (labeled Regression) divided by the mean square for error (labeled Residual), where the mean squares are the respective sums of squares divided by their degrees of freedom. This test measures the statistical significance of the entire regression equation rather than of each individual coefficient; in effect, the F-test is a measure of the statistical significance of R2. A critical value for F is first established, depending on the degree of statistical significance that the researcher wants to set (typically at the .05 or .01 level). Two degrees-of-freedom values must be incorporated in the selection of the critical F-value: the number of independent variables, and the sample size minus the number of independent variables minus 1 for the intercept. Therefore, because this example has a sample size of 15 and four independent variables, the degrees of freedom are 4 and 10 (15 minus 4 minus 1). Table A.3 shows that at the .05 level, the critical F-value with those degrees of freedom is 3.48. At the .01 level, the critical value is 5.99.
Because the regression results for this demand equation indicate an F-value of 8.75, we can conclude that the entire equation is statistically significant at the .01 level.
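Plugging the reported R2 = 0.7778, n = 15, and k = 4 into the formula above reproduces the F-value (variable names are ours):

```python
# F-test for the overall regression: R-squared = 0.7778, 15 observations,
# 4 independent variables, so degrees of freedom are 4 and 10.
n, k, r2 = 15, 4, 0.7778
F = (r2 / k) / ((1 - r2) / (n - k - 1))  # about 8.75

# Compare against the .01-level critical value from Table A.3.
significant_at_01 = F > 5.99
```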

4. For which variables did you obtain a significant coefficient? Interpret all the coefficients from an economic point of view.

A negative coefficient shows that as the independent variable (Xn) changes, the dependent variable (Y) changes in the opposite direction. A positive coefficient shows that as the independent variable (Xn) changes, the dependent variable (Y) changes in the same direction. The magnitudes of the regression coefficients, combined with the levels of the variables, determine the elasticity of each variable. To interpret the direction of the relationship between variables, look at the signs (plus or minus) of the regression coefficients (B). If a B coefficient is positive, then the relationship

of this variable with the dependent variable is positive. If the B coefficient is negative, then the relationship is negative, and if it is equal to 0, then there is no relationship between the variables. The coefficients were determined as: number of inhabitants = -9.9105, income = 10.1086, competitors = -10.1397, and price = 46.1962. This means demand will change in the opposite direction of the number of inhabitants and competitors, and in the same direction as income and price.

Y = -125.94 - 9.9105(X1) + 10.1086(X2) - 10.1397(X3) + 46.1962(X4)

where the averages are X1 = 6.0707, X2 = 42.6855, X3 = 2.7333, and X4 = 2.6567. Evaluating at the averages:

Y = -125.94 - 9.9105(6.0707) + 10.1086(42.6855) - 10.1397(2.7333) + 46.1962(2.6567) = 340.4094

Number-of-inhabitants elasticity = -9.9105 × (6.0707/340.4094) = -0.1767, so demand is inelastic
Income elasticity = 10.1086 × (42.6855/340.4094) = 1.2676, so demand is elastic
Competitors elasticity = -10.1397 × (2.7333/340.4094) = -0.0814, so demand is inelastic
Price elasticity = 46.1962 × (2.6567/340.4094) = 0.3605, so demand is inelastic
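The elasticity calculations above can be verified mechanically (the dictionary names are ours; tiny differences from the reported figures reflect rounding in the coefficients):

```python
# Coefficients and average values of the independent variables.
b = {"inhabitants": -9.9105, "income": 10.1086,
     "competitors": -10.1397, "price": 46.1962}
x_bar = {"inhabitants": 6.0707, "income": 42.6855,
         "competitors": 2.7333, "price": 2.6567}

# Predicted demand at the average values (about 340.4).
Y = -125.94 + sum(b[v] * x_bar[v] for v in b)

# Point elasticity of each variable: coefficient times X/Y.
elasticities = {v: b[v] * x_bar[v] / Y for v in b}
```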

5. Which variables would you eliminate if you were to re-do the estimation?

The variables to eliminate in a multiple regression model are explanatory variables that are highly related to each other, because such multicollinearity can cause problems in the estimated slopes. In such a case, we should eliminate the redundant variables from the model. In this case, I calculated the VIF of each independent variable (see the summary output sheet), and each VIF was small, indicating that no two variables are significantly related, so no variable needs to be eliminated.
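The VIF itself is not defined in the text above; the standard computation regresses each explanatory variable on all the others. A minimal sketch with numpy (the function is ours and assumes a plain data matrix, not the actual summary-output sheet):

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing
    column j of X on the remaining columns plus an intercept.
    VIFs near 1 suggest little multicollinearity; values above
    about 10 are a common warning sign."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        residuals = y - others @ beta
        r2_j = 1 - residuals @ residuals / ((y - y.mean()) ** 2).sum()
        vifs.append(1.0 / (1.0 - r2_j))
    return vifs
```

Orthogonal columns give VIFs of 1, while near-collinear columns give very large VIFs, which is the pattern the answer above says was absent.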
