Magpoc, Maverick M.
Department of Biology
College of Science
Problem Set
A marine biologist in charge of four marine reserves located on a small island noticed that
one of the marine reserves (Area ‘A’) was twice the size of the other areas (‘B’, ‘C’ and ‘D’).
Considering that all other aspects of the marine reserves were equal except for size, the biologist
wanted to find out if the size of the marine reserve had an effect on the overall size of fish species
living within them. To test this, he designated a single fish species, Acanthurus olivaceus, as the
test species, and collected 10 specimens of this fish in each of the four marine reserves. He
recorded the following sizes:
Area A: (78, 88, 87, 88, 83, 82, 81, 80, 80, 89)
Area B: (78, 78, 83, 81, 78, 81, 81, 82, 76, 76)
Area C: (79, 73, 79, 75, 77, 78, 80, 78, 83, 84)
Area D: (77, 69, 75, 70, 74, 83, 80, 75, 76, 75)
Introduction
Data management comprises the operations needed for a systematic, coherent process of data
collection, storage and retrieval. It is an essential part of research and documentation of analyses.
Proper data handling and management are crucial to the success and reproducibility of a statistical
analysis. Selection of the appropriate tools and efficient use of these tools can save the researcher
numerous hours, and allow other researchers to leverage the products of their work.
Data analysis for quantitative studies, on the other hand, involves critical analysis and
interpretation of figures and numbers, and attempts to find rationale behind the emergence of main
findings. Comparisons of primary research findings to the findings of the literature review are
The statistical method used to analyze and solve the problem set is one-way analysis of
variance (ANOVA). This statistical tool compares the means of three or more groups
defined by only one independent variable, and determines if there is a statistically significant
difference among them.
The objectives of the activity are to learn some of the principles and techniques
of data management; to become familiar with different statistical methods; and to
analyze and ascertain the proper statistical approach for a given set of data.
Hypotheses
H0: The area of the marine reserves has no effect on the size of the fish.
HA: The area of the marine reserves affects the size of the fish.
Results and Discussion
Table 1. Tabulated form of the data set with the computed sums and means of the values.
              Area A   Area B   Area C   Area D
              78       78       79       77
              88       78       73       69
              87       83       79       75
              88       81       75       70
              83       78       77       74
              82       81       78       83
              81       81       80       80
              80       82       78       75
              80       76       83       76
              89       76       84       75
Mean          83.6     79.4     78.6     75.4
Grand mean    79.25
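The group means and the grand mean in Table 1 can be checked directly from the raw data. The following is a minimal sketch in Python using only the standard library; the names `samples`, `group_means`, and `grand_mean` are illustrative and do not come from the problem set.

```python
from statistics import mean

# Fish sizes copied from the problem set (10 specimens per marine reserve).
samples = {
    "A": [78, 88, 87, 88, 83, 82, 81, 80, 80, 89],
    "B": [78, 78, 83, 81, 78, 81, 81, 82, 76, 76],
    "C": [79, 73, 79, 75, 77, 78, 80, 78, 83, 84],
    "D": [77, 69, 75, 70, 74, 83, 80, 75, 76, 75],
}

# Mean size of the fish in each reserve.
group_means = {area: mean(values) for area, values in samples.items()}

# Grand mean over all 40 observations.
grand_mean = mean(x for values in samples.values() for x in values)

for area, m in group_means.items():
    print(f"Area {area}: mean = {m:.1f}")
print(f"Grand mean = {grand_mean:.2f}")
```

Because every group has the same number of specimens, the grand mean is also the mean of the four group means.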
The data in the problem set fall into four groups, one for each marine reserve. The groups are
distinguished by the independent variable, which is the size of the marine reserve: Area A has an
area twice the size of that of Areas B, C, and D. The dependent variable is the size of the fish,
which is to be tested according to the problem. In order to determine if the size of the marine
reserve does have an effect on the size of the fish, one-way ANOVA was used.
Other statistical methods could have been used to analyze the data, but one-way ANOVA
was the most suitable for the type of data in the problem set. The numerical data are not
ranked; therefore, the Mann-Whitney U test, Kruskal-Wallis H test, and Spearman’s correlation
coefficient were not the best methods to analyze them. The data also comprise four groups,
which rules out the t-test, z-test, chi-square test, and Pearson’s correlation coefficient as the
best method for analysis. The difference between one-way and two-way ANOVA is that the former
involves only one independent variable, while the latter involves two. In the problem set, only
one independent variable was analyzed, and thus one-way ANOVA was used.
One-way analysis of variance (ANOVA) is the test used to analyze and solve the problem.
This test compares the means of three or more groups or categories defined by one variable, and
determines if a statistically significant difference exists among them. Usually, it tests the null
hypothesis (H0), which states that there is no significant difference between the means of three or
more groups:
$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$$
where $\mu_i$ is the mean of group i, and k is the number of groups. If a statistically
significant difference is observed in the data, then the alternative hypothesis (HA) is accepted.
However, one-way ANOVA does not specifically state which two sets of data significantly
differ from each other; it is therefore called an omnibus test. In order to determine which two sets of
data differ, a post hoc test such as the Tukey-Kramer method must be performed.
sum of the squared differences of each value from the grand mean. Two sums of squares were
computed in the analysis: the Sum of Squares of Treatments (SSb) and the Sum of Squares of Errors
(SSw). Computations for SSb and SSw are shown in Tables 4 and 5.
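The two sums of squares can be recomputed from the raw data as a check. The sketch below assumes the usual one-way ANOVA definitions: SSb sums, over the groups, the group size times the squared deviation of the group mean from the grand mean, and SSw sums the squared deviations of each value from its own group mean. The variable names are illustrative.

```python
# Fish sizes copied from the problem set (10 specimens per marine reserve).
samples = {
    "A": [78, 88, 87, 88, 83, 82, 81, 80, 80, 89],
    "B": [78, 78, 83, 81, 78, 81, 81, 82, 76, 76],
    "C": [79, 73, 79, 75, 77, 78, 80, 78, 83, 84],
    "D": [77, 69, 75, 70, 74, 83, 80, 75, 76, 75],
}

n_total = sum(len(values) for values in samples.values())
grand_mean = sum(sum(values) for values in samples.values()) / n_total

ss_between = 0.0  # SSb: variation of the group means around the grand mean
ss_within = 0.0   # SSw: variation of the values around their own group mean
for values in samples.values():
    group_mean = sum(values) / len(values)
    ss_between += len(values) * (group_mean - grand_mean) ** 2
    ss_within += sum((x - group_mean) ** 2 for x in values)

print(f"SSb = {ss_between:.1f}")
print(f"SSw = {ss_within:.1f}")
```

The results agree with the values of 341.9 and 455.6 used in the computations that follow.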
The Mean of Squares of Treatments and of Errors is computed by simply dividing each sum of
squares by its degrees of freedom:
$$MS = \frac{SS}{df}$$
The values of MSb and MSw are determined through the computations below:
$$MS_b = \frac{SS_b}{df_1} = \frac{341.9}{3} = \mathbf{113.97} \qquad MS_w = \frac{SS_w}{df_2} = \frac{455.6}{36} = \mathbf{12.66}$$
With the MSb and MSw determined, the F-value can now be determined. The formula for
the F-value is:
$$F = \frac{MS_b}{MS_w}$$
$$F = \frac{MS_b}{MS_w} = \frac{113.97}{12.66} = \mathbf{9.00}$$
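The mean squares and the F-value follow from the sums of squares and the degrees of freedom. The sketch below assumes the standard one-way ANOVA degrees of freedom, df1 = k - 1 and df2 = N - k, with k = 4 groups and N = 40 observations; these reproduce the values 3 and 36 used above. Variable names are illustrative.

```python
# Sums of squares as reported in the text.
ss_between = 341.9  # SSb
ss_within = 455.6   # SSw

k = 4         # number of groups (marine reserves)
n_total = 40  # total number of specimens

df_between = k - 1       # df1 = 3
df_within = n_total - k  # df2 = 36

ms_between = ss_between / df_between  # MSb
ms_within = ss_within / df_within     # MSw
f_value = ms_between / ms_within

print(f"MSb = {ms_between:.2f}")
print(f"MSw = {ms_within:.2f}")
print(f"F   = {f_value:.3f}")
```

Carrying full precision gives F of about 9.005, while dividing the rounded mean squares (113.97 / 12.66) gives 9.00 as reported. The difference does not affect the comparison with the critical value.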
Now that the F-value is determined, the critical value must also be obtained from the F
table (Appendix 1). The values of df1 and df2 are needed to determine the critical value. However,
since most F tables do not list a value for df2 = 36, linear interpolation must be used, with a
formula similar to the two-point formula in basic algebra:
$$y - y_1 = \frac{y_2 - y_1}{x_2 - x_1}(x - x_1)$$
$$F_C - 2.9223 = \frac{2.8387 - 2.9223}{40 - 30}(36 - 30)$$
$$F_C = \mathbf{2.87}$$
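The interpolation step can be written as a small helper. This is a minimal sketch of linear interpolation between two tabulated points; the function name `interpolate` is hypothetical, and the two table entries (2.9223 at df2 = 30 and 2.8387 at df2 = 40) are the ones quoted in the text.

```python
def interpolate(x, x1, y1, x2, y2):
    """Linearly interpolate y at x between the points (x1, y1) and (x2, y2)."""
    return y1 + (y2 - y1) / (x2 - x1) * (x - x1)

# Critical F values quoted in the text for df1 = 3 at df2 = 30 and df2 = 40.
f_critical = interpolate(36, 30, 2.9223, 40, 2.8387)

f_value = 9.00  # F-value reported in the text
reject_null = f_value > f_critical

print(f"FC = {f_critical:.2f}")
print(f"Reject H0: {reject_null}")
```

Linear interpolation is only an approximation of the tabulated distribution, but the gap between 9.00 and 2.87 is large enough that the approximation does not change the decision.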
Comparing the F-value to the critical value (FC), the F-value is greater than the critical
value. Since the F-value (F=9.00) is greater than the critical value (FC=2.87), the null hypothesis
must be rejected.
Considering that the null hypothesis is rejected, the Tukey-Kramer method is performed to
determine which pair(s) of means differ. In this test, the studentized range distribution
table (Appendix 2) is used, and the values of k and df2 must be determined since they are
needed in the formula:
$$\omega = q_\alpha(k, df_2)\,\frac{\sqrt{MSE}}{\sqrt{n}}$$
Based on the data, the values for k and df2 are 3 and 36 respectively, at a significance level
(α) of 0.05.
From the studentized range distribution table, the following values and computation
were obtained:
$$y - 3.919 = \frac{3.858 - 3.919}{40 - 30}(36 - 30)$$
$$y = \mathbf{3.8824}$$
$$q_\alpha(k, df_2) = \mathbf{3.88}$$
Substituting the interpolated value into the formula, with MSE and n equal to 12.66
and 40 respectively,
$$\omega = 3.88\,\frac{\sqrt{12.66}}{\sqrt{40}}$$
$$\omega = \mathbf{2.18}$$
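The substitution can be reproduced numerically. The sketch below uses exactly the values quoted in the text (q = 3.88, MSE = 12.66, n = 40). As a labeled assumption for comparison only, it also evaluates the same formula with n = 10, because in the usual statement of Tukey's procedure n is the number of observations per group rather than the total sample size.

```python
import math

q = 3.88     # interpolated studentized range value quoted in the text
mse = 12.66  # mean square error (MSw) from the ANOVA

# Value as computed in the text, with n = 40 (total number of specimens).
n_text = 40
omega = q * math.sqrt(mse) / math.sqrt(n_text)

# Assumption, not from the text: n taken as the per-group sample size.
n_per_group = 10
omega_per_group = q * math.sqrt(mse) / math.sqrt(n_per_group)

print(f"omega (n = 40) = {omega:.2f}")
print(f"omega (n = 10) = {omega_per_group:.2f}")
```

The first value matches the 2.18 reported above. The second is larger, so the choice of n determines which pairs of means exceed the threshold.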
The value of 𝜔 then serves as the threshold for the differences between the means. If the
difference between two means is greater than the computed value of 2.18, then that pair of means
differs significantly.
In this problem, the mean differences for each pair are shown in Table 6.
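The pairwise comparison can be sketched as follows. This example recomputes the absolute differences from the group means in Table 1 and compares each with the threshold of 2.18 reported in the text; the names `means`, `differences`, and `differing_pairs` are illustrative.

```python
from itertools import combinations

# Group means from Table 1 and the threshold reported in the text.
means = {"A": 83.6, "B": 79.4, "C": 78.6, "D": 75.4}
omega = 2.18

# Absolute difference between the means of every pair of areas.
differences = {
    (a, b): abs(means[a] - means[b]) for a, b in combinations(means, 2)
}

# Pairs whose difference exceeds the threshold.
differing_pairs = [pair for pair, d in differences.items() if d > omega]

for (a, b), d in differences.items():
    verdict = "differs" if d > omega else "does not differ"
    print(f"Area {a} vs Area {b}: |difference| = {d:.1f} ({verdict})")
```

With this threshold, every pair except Areas B and C exceeds 𝜔. A different threshold would change which pairs are flagged, so the outcome depends on the value of 𝜔 used.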
Conclusion
Since the F-value (F=9.00) is greater than the critical value (FC=2.87), the null hypothesis
must be rejected. Therefore, it can be concluded that the area of the marine reserves affects the
size of the fish.
Samuels, M.L., Witmer, J.A., Schaffner, A.A. (2015). Statistics for the Life Sciences. United States:
Pearson Education.