We want to see whether any of these result sets is better than the others. To do this, we make pairwise comparisons of them: three pairs in all. When comparing two result sets, our null hypothesis is that there is no statistically significant difference between them. There is an elaborate way and a simpler way to do this. Before we start, here is what the raw results look like. How do you think they compare?
[Chart: raw results for Alg1, Alg2, and Alg3, 10 observations per algorithm; y-axis runs from 94.5 to 97.5]
The result of this covers several lines (my annotations in red):

t-Test: Paired Two Sample for Means

                              Variable 1   Variable 2
Mean                          94.7976      96.2183
Variance                      0.194812     0.412564
Observations                  10           10
Pearson Correlation           0.17494
Hypothesized Mean Difference  0
df                            9            <- Degrees of freedom
t Stat                        -6.302187    <- t-statistic
P(T<=t) one-tail              7.03E-05
t Critical one-tail           1.833113
P(T<=t) two-tail              0.000141     <- Note 2
t Critical two-tail           2.262157     <- Note 1
Note 1: We use a two-tailed test because either set of results might be better. If |t Stat| > t Critical two-tail, we reject the null hypothesis that the means are the same, and conclude that the difference between the algorithms is significant. Conclusion: these two sets of results are not equally good, at the 95% confidence level.

Note 2: This is the smallest value of alpha at which the null hypothesis would still be rejected. Thus, our conclusion holds up to a confidence level of 1 - P(T<=t) = 99.986%.
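In code form, the decision rule from Notes 1 and 2 looks like this. A minimal Python sketch, using the values copied from the Excel output above; the variable names are my own, not part of the spreadsheet:

```python
# Two-tailed paired t-test decision at alpha = 0.05, df = 9.
# Values copied from the Excel t-test output above.
t_stat = -6.302187            # t Stat
t_crit_two_tail = 2.262157    # t Critical two-tail
p_two_tail = 0.000141         # P(T<=t) two-tail

# Note 1: reject the null hypothesis if |t| exceeds the critical value.
reject = abs(t_stat) > t_crit_two_tail
print(reject)                 # True: the difference is significant

# Note 2: the conclusion holds up to a confidence level of 1 - p.
confidence = 1 - p_two_tail
print(f"{confidence:.3%}")    # 99.986%
```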
We can use this to test each pair of result sets:

Pair   Set 1   Set 2   P(T<=t)   Confidence
1      Alg1    Alg2    0.014%    99.986%
2      Alg2    Alg3    9.812%    90.188%
3      Alg1    Alg3    0.007%    99.993%
We have to decide on our confidence level. 95% is most commonly used, and 99% is also popular. 95% corresponds to a p-value of 5%, meaning that we accept a 1 in 20 chance of incorrectly rejecting the null hypothesis. If the p-value calculated is lower than this threshold, then we reject the null hypothesis.
Note 3: Excel's TTEST formula can return #DIV/0! (division by zero) in two situations:
1) The two algorithms have identical results (no difference at all)
2) An extremely large difference between the results (effectively infinite)
Obviously, these two situations are very different from each other! If you get a #DIV/0! result, you should examine the data to determine whether it is evidence that the null hypothesis should be accepted or rejected.
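A quick Python sketch of why the first situation in Note 3 arises: the paired t-statistic divides by the standard error of the per-row differences, which is zero when the two result columns are identical. The data here is illustrative, not from the spreadsheet:

```python
from math import sqrt
from statistics import mean, stdev

# Two algorithms with identical results on every row, so every
# per-row difference is exactly zero.
diffs = [0.0] * 10

try:
    # Paired t-statistic: mean difference over its standard error.
    # stdev(diffs) is 0 here, so the division blows up.
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
except ZeroDivisionError:
    print("divide by zero: the t-statistic is undefined for these data")
```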
Conclusions:
On this particular data set, at a confidence level of 95%, from the table above:
- Alg1 and Alg2 are not equally good
- Alg1 and Alg3 are not equally good
- Alg2 and Alg3 are equally good
The simpler way: compute the per-row differences for each pair of algorithms, alongside a column of zeros.

Zeros   Alg1-Alg2   Alg2-Alg3   Alg1-Alg3
0       -1.599       0.564      -1.035
0       -1.694       0.000      -1.694
0        0.094      -0.376      -0.282
0       -1.976       0.282      -1.694
0       -0.941       0.376      -0.565
0       -1.975       0.000      -1.975
0       -0.565      -0.188      -0.753
0       -1.694       0.189      -1.505
0       -1.976       0.941      -1.035
0       -1.881       0.470      -1.411
Now we can perform a t-test comparing the zeros to each of the sets of differences:
Pair   Comparison            P(T<=t)   Confidence
1      Zeros vs Alg1-Alg2    0.014%    99.986%
2      Zeros vs Alg2-Alg3    9.812%    90.188%
3      Zeros vs Alg1-Alg3    0.007%    99.993%

The formula here is: =TTEST(D17:D26,E17:E26,2,1)

These results are exactly the same as before, so the same conclusions are drawn. If you do the test 'the elaborate way', you of course get the same t-statistic results. However, the individual variable statistics (mean, variance) will be different, since the raw numbers are different, and the Pearson Correlation figure will be undefined, since the row of zeros gives a divide-by-zero error.
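Comparing a difference column to a column of zeros is equivalent to a one-sample t-test on the differences. As a sanity check, here is a short Python sketch (my own, not part of the original spreadsheet) that recomputes the t-statistic from the Alg1-Alg2 difference column in the table above; it reproduces the t Stat figure from the earlier Excel output:

```python
from math import sqrt
from statistics import mean, stdev

# Alg1 - Alg2 differences, copied from the table above.
diffs = [-1.599, -1.694, 0.094, -1.976, -0.941,
         -1.975, -0.565, -1.694, -1.976, -1.881]

# Paired t-statistic: mean difference over its standard error.
t_stat = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
print(round(t_stat, 3))   # -6.302, matching Excel's t Stat
```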