Ph.D. B.A. Program Class RES600: Introductory to Data Analysis
Professor: Dr. Truel Student: Anh Tran E-mail: Anh.NTran@my.trident.edu Phone: 714-904-6209
Subject: Case 4 for Module 4: Causality and inference: Structuring the well-formed hypothesis
Date: 2-Dec-2013
From: A. Tran
References
1. Bryant, Adam, "INVESTING IT: Duffers Need Not Apply," The New York Times, May 31, 1998, Section 3, page 1.
2. Knowledge Base, "In SPSS, how do I find outliers in my regression?", Indiana University, 2013, URL: http://kb.iu.edu/data/afho.html.
A. Introduction
The purposes of this report are 1) to review the value of Adam Bryant's article [1] from a statistical standpoint and 2) to recommend whether the study should have been conducted differently, together with the technical reasons for the recommended approach.
B. Analysis
In 1998, Adam Bryant [1] wrote an interesting article titled "INVESTING IT: Duffers Need Not Apply" proposing that there was a significant relationship between a company's stock performance and the golf skill of its CEO. According to Bryant [1], the support for this hypothesis is the pattern in the mean golf handicaps of the CEOs across three groups of companies with good, average, and below-average stock performance.
Because Mr. Bryant's proposal is interesting and very entertaining, it is important to examine his data and his approach carefully from a statistical standpoint.
After an extensive review of the article, three main issues with Mr. Bryant's thesis were observed:
First, by selecting the Golf Digest survey data, which were collected from golf handicaps self-reported by the magazine's readers, Mr. Bryant introduced a sampling frame error, in which certain sample elements are excluded or the entire population is not accurately represented. In addition, the sample of 51 CEOs who voluntarily reported their golf handicaps may not represent the population of all CEOs who play golf.
Second, even assuming that the sample was collected properly from a statistical standpoint, and that there was a significant correlation between the CEOs' golf handicaps and their companies' stock performance, correlation alone is not sufficient to infer causality between these variables, such as the claim that a good CEO golf handicap causes good stock performance. Therefore, Bryant's approach was flawed in this regard.
Third, in Bryant's article, Mr. Crystal deleted seven points from his analysis, as shown in Figure 4 (observation numbers 45, 46, 47, 48, 49, 50, and 51 in the SPSS Handicap_stockrate.sav data file), without providing any statistically sound criterion for rejecting these data.
It is common that, when a sample of N observations of a variable is obtained, some observations appear to differ markedly from the others. If mistakes in the survey technique are identified, these observations, so-called outliers, can simply be discarded. If no mistakes are found, a statistical criterion must be used to identify the observations that can be considered for discard.
In order to determine whether Mr. Crystal's decision to discard these seven observations is valid, the influence of the discarded data on the correlation between company stock performance and CEO golf handicap is examined through linear regression. From Figures 1 and 2, it is clear that the discarded observations have a strong effect on the correlation. For the full data set (51 observations), there is a very weak correlation (corr = -0.04172) between the two variables. For the reduced data set (44 observations), there is a much stronger negative correlation (corr = -0.41448). This conclusion is also supported by the descriptive statistics of the mean CEO golf handicaps for the three groups of good, average, and below-average stock performance, as shown in Figure 3. In the 44-observation sample, the mean CEO golf handicap increases monotonically from 12.42 to 14.56 to 17.22 across the good, average, and below-average stock performance groups. In the full sample, however, the means are non-monotonic, going from 17.10 to 13.71 to 16.87 across the same groups.
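The effect described above can be illustrated with a short Python sketch. The first part only checks that the Multiple R values in the regression summaries equal the square roots of the reported R Square values, as they must in a simple linear regression. The second part uses a small made-up data set (an assumption for illustration, not the CEO data) to show how a few extreme observations can change the sign and size of a Pearson correlation. The function name pearson_r and all data values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Consistency check on the reported regression output: with one predictor,
# Multiple R = sqrt(R Square) = |corr|.
r_reduced = math.sqrt(0.171794)  # 44-observation regression
r_full = math.sqrt(0.00174)      # 51-observation regression

# Hypothetical data, NOT the CEO sample: the last two points are extreme
# handicaps paired with high stock ratings.
handicap = [8, 10, 12, 14, 16, 18, 20, 34, 36]
stock = [80, 74, 70, 62, 55, 50, 41, 85, 90]

r_all = pearson_r(handicap, stock)             # all nine points
r_trim = pearson_r(handicap[:-2], stock[:-2])  # extreme points removed

print(f"Multiple R check: {r_reduced:.5f} (44 obs), {r_full:.5f} (51 obs)")
print(f"corr with all points: {r_all:.3f}, without extremes: {r_trim:.3f}")
```

A sensitivity this large is exactly why the decision to drop observations needs a stated criterion rather than visual judgment.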
Figure 1
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.41448
R Square            0.171794
Adjusted R Square   0.152075
Standard Error      19.09969
Observations        44
Figure 2
Figure 3
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.04172
R Square            0.00174
Adjusted R Square   -0.0186
Standard Error      25.3818
Observations        51
From the analyses above, it may be correct that some of these observations are outliers. However, it is a flaw to discard observations simply because they appear to differ from the rest; a statistical criterion must be used to identify the points that can be considered for rejection.
In SPSS, one criterion for identifying outliers is the centered leverage of each observation [2]. The centered leverage lies between 0 and (n-1)/n and is compared with the cutoff value 2*p/n, where n is the number of observations and p is the number of independent (explanatory) variables. Observations whose leverage is greater than 2*p/n are candidates for discard. For this data set (n = 51, p = 1), 2*p/n is 0.03922.
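As a rough illustration of this criterion, the Python sketch below computes the cutoff 2*p/n for n = 51 and p = 1 and applies the same rule to a small made-up predictor sample. It assumes a regression with a single predictor, for which the centered leverage reduces to (x_i - mean)^2 / sum_j (x_j - mean)^2. The function names and the sample values are hypothetical; this is not the SPSS implementation.

```python
def centered_leverage(xs):
    """Centered leverage for a regression with ONE predictor:
    h_i = (x_i - mean)^2 / sum_j (x_j - mean)^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [(x - mean_x) ** 2 / sxx for x in xs]


def leverage_cutoff(n, p):
    """Rule-of-thumb cutoff 2*p/n for n observations and p predictors."""
    return 2 * p / n


# Cutoff for the data set discussed in the report (n = 51, p = 1).
cutoff = leverage_cutoff(n=51, p=1)
print(f"cutoff = {cutoff:.5f}")

# Hypothetical predictor values, NOT the CEO data: one extreme value at 40.
xs = [10, 11, 12, 13, 14, 15, 16, 17, 18, 40]
lev = centered_leverage(xs)
flagged = [i for i, h in enumerate(lev) if h > leverage_cutoff(len(xs), 1)]
print("indices above the cutoff:", flagged)
```

The flagged indices are only candidates for review; the rule identifies high-leverage points and does not by itself show that an observation is erroneous.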
The observations and their calculated centered leverages are shown in Figure 4. The SPSS criterion shows that the last ten observations can be considered outliers. Therefore, Mr. Crystal's decision to discard the last seven observations is valid from the standpoint of statistical influence. The proportion of observations discarded is not, by itself, grounds for invalidating the retained data.
Figure 4
Given that Mr. Crystal's reduced data set is defensible and shows a correlation between the CEOs' golf handicaps and their companies' stock performance, the next question is whether the differences in mean golf handicap among the stock performance groups are statistically significant. To address this question, an ANOVA test was conducted.
Figure 5 shows the ANOVA results for the 51-observation and 44-observation data sets. The p-values of the ANOVA F-test for the 51-observation and 44-observation data sets are 0.156 and 0.077, respectively. Both p-values are greater than 0.05. Therefore, there is insufficient evidence to reject the null hypothesis that all group means are equal. The ANOVA results also show that the outliers have a noticeable influence on the outcome: the p-value decreases from 0.156 for the 51-observation data set to 0.077 for the 44-observation data set. In brief, Bryant's conclusion is not supported by the data at the 0.05 significance level.
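For readers who want to see what the F-test computes, the following Python sketch calculates the one-way ANOVA F statistic and its degrees of freedom from three groups of made-up handicap values (an assumption for illustration, not the CEO data). The function name one_way_anova_f is hypothetical. Converting F to a p-value requires the F distribution, which a statistics package such as SPSS supplies and which is not computed here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical handicaps for three stock performance groups (NOT the CEO data).
good = [10, 12, 14, 13, 11]
average = [13, 15, 14, 16, 12]
below_average = [15, 18, 17, 16, 19]

f_stat, df_b, df_w = one_way_anova_f([good, average, below_average])
print(f"F = {f_stat:.3f} with ({df_b}, {df_w}) degrees of freedom")
```

The null hypothesis of equal group means is rejected only when the p-value associated with F falls below the chosen significance level, which is 0.05 in this report.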
Figure 5
The independent-samples t-tests between group 1 (below-average stock performance) and group 2 (average stock performance), and between group 2 (average stock performance) and group 3 (good stock performance), lead to the same conclusion: there is not enough evidence to reject the null hypothesis, and there is no statistically significant difference between the handicap means of these groups. The t-test results are shown in Figure 6.
It is not clear why a nationally respected newspaper such as the New York Times published this article when its conclusion is not well supported. However, it is suspected that the entertainment value of a story linking two popular subjects, business success and golf success, gave the NYT editors enough reason to publish it.