Data Management: One-Way Analysis of Variance (ANOVA): Problem Set

Data Management: One-Way Analysis of Variance (ANOVA)
Enaje, Shana Marisse E.
Labao, Jerzeth Kate A.
Magpoc, Maverick M.
Roxas, Ria Francesca M.
Department of Biology
College of Science
University of Santo Tomas, España Street, Manila, 1501
Problem Set
A marine biologist in charge of four marine reserves located on a small island noticed that
one of the marine reserves (Area ‘A’) was twice the size of the other areas (‘B’, ‘C’ and ‘D’).
Considering that all other aspects of the marine reserves were equal except for size, the biologist
wanted to find out if the size of the marine reserve had an effect on the overall size of fish species
living within them. To test this, he designated a single fish species, Acanthurus olivaceus, as the
test species, and collected 10 specimens of this fish in each of the four marine reserves. He
measured each fish (cm) and tabulated the data below.
Area A: (78, 88, 87, 88, 83, 82, 81, 80, 80, 89)
Area B: (78, 78, 83, 81, 78, 81, 81, 82, 76, 76)
Area C: (79, 73, 79, 75, 77, 78, 80, 78, 83, 84)
Area D: (77, 69, 75, 70, 74, 83, 80, 75, 76, 75)
Introduction
Data management comprises the operations needed for a systematic, coherent process of data
collection, storage, and retrieval. It is an essential part of research and of documenting analyses.
Proper data handling and management is crucial to the success and reproducibility of a statistical
analysis. Selection of the appropriate tools and efficient use of these tools can save the researcher
numerous hours, and allow other researchers to leverage the products of their work.
Data analysis for quantitative studies, on the other hand, involves the critical analysis and
interpretation of figures and numbers, and attempts to explain why the main findings emerged.
Comparing primary research findings with the findings of the literature review is critically
important for both qualitative and quantitative studies.
The statistical method used to analyze and solve the problem set is one-way analysis of
variance (ANOVA). This statistical tool compares the means of three or more groups that differ
in only one independent variable, and determines if there is a statistically significant
difference between at least two of the groups.
The objectives of the activity are to learn some of the principles and techniques of data
management; to become familiar with different statistical methods; and to analyze and ascertain
the proper statistical approach for a given set of data.
Hypotheses
The null hypothesis states that:
H0: The area of the marine reserves has no effect on the size of the fish.
The alternative hypothesis states that:
HA: The area of the marine reserves affects the size of the fish.
Results and Discussion
Table 1. Tabulated form of the data set with the computed sums and means of the values.
                  Area A   Area B   Area C   Area D
                  78       78       79       77
                  88       78       73       69
                  87       83       79       75
                  88       81       75       70
                  83       78       77       74
                  82       81       78       83
                  81       81       80       80
                  80       82       78       75
                  80       76       83       76
                  89       76       84       75
Group mean (x̄)    83.6     79.4     78.6     75.4
Grand mean (X̄)    79.25
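The means in Table 1 can be reproduced with a short script. This is only a sketch: the dictionary name fish_lengths and the variable names are illustrative choices, not part of the original analysis.

```python
from statistics import mean

# Fish lengths (cm) as tabulated in the problem set.
fish_lengths = {
    "A": [78, 88, 87, 88, 83, 82, 81, 80, 80, 89],
    "B": [78, 78, 83, 81, 78, 81, 81, 82, 76, 76],
    "C": [79, 73, 79, 75, 77, 78, 80, 78, 83, 84],
    "D": [77, 69, 75, 70, 74, 83, 80, 75, 76, 75],
}

# Mean of each area, then the mean of all 40 observations pooled together.
group_means = {area: mean(values) for area, values in fish_lengths.items()}
grand_mean = mean(x for values in fish_lengths.values() for x in values)

for area, m in group_means.items():
    print(f"Area {area}: mean = {m:.1f}")
print(f"Grand mean = {grand_mean:.2f}")
```

Because every area has the same number of specimens, the grand mean is also the mean of the four group means.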
The data in the problem set fall into four groups that differ in a single independent
variable: the size of the marine reserve, where Area A has an area twice the size of that of
Areas B, C, and D. The dependent variable is the size of the fish, which is what the problem
asks to be tested. In order to determine if the size of the marine reserve does have an effect
on fish size, one-way ANOVA must be used as the statistical tool.
Other statistical methods could have been used to analyze the data, but one-way ANOVA
was the most appropriate for the type of data in the problem set. The numerical data are
measurements rather than ranks, so the Mann-Whitney U test, Kruskal-Wallis H test, and
Spearman's correlation coefficient were not the best methods to analyze the data. The data also
comprise more than two groups, which eliminates the t-test, z-test, chi-square test, and Pearson's
correlation coefficient as the best method for analysis. The difference between one-way
and two-way ANOVA is that the former involves only one independent variable, while the latter
involves two independent variables. In the problem set, only one independent variable was
analyzed, and thus one-way ANOVA was used as the statistical test.
One-way analysis of variance (ANOVA) is the test used to analyze and solve the problem.
This test compares the means of three or more groups or categories defined by one variable, and
determines if a statistically significant difference exists among them. It tests the null
hypothesis (H0), which states that there is no significant difference between the means of three or
more different sets of data. This is mathematically stated as:
\[ H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k \]
where μ is the mean of a group, and k is the number of groups. If a statistically
significant difference is observed in the data, the null hypothesis is rejected in favor of the
alternative hypothesis (HA).
However, one-way ANOVA does not specifically state which two sets of data significantly
differ from each other; it is therefore called an omnibus test. In order to determine which sets of
data differ from each other, the Tukey method is used.
Table 2. Computation of the values for α and degrees of freedom.
Quantity     Formula   Substitution   Value
α                                     0.05
dfb (df1)    k - 1     4 - 1          3
dfw (df2)    n - k     40 - 4         36
dfT (df3)    n - 1     40 - 1         39
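As a quick check of Table 2, the three degrees of freedom can be computed from the number of groups and the total number of observations. The function name degrees_of_freedom is a hypothetical helper introduced only for this sketch.

```python
def degrees_of_freedom(k, n_total):
    """Return (df_between, df_within, df_total) for a one-way ANOVA.

    k is the number of groups and n_total the total number of observations.
    """
    return k - 1, n_total - k, n_total - 1


# Four marine reserves, 10 specimens each.
df_between, df_within, df_total = degrees_of_freedom(k=4, n_total=40)
print(df_between, df_within, df_total)
```

The between-group and within-group degrees of freedom always add up to the total degrees of freedom, which is a useful sanity check on the table.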
Table 3. Determination of the F value.
Source                    SS (Sum of Squares)   df (Degrees of Freedom)   MS (Mean of Squares)   F-value
Treatments (between/b)    341.9                 3                         113.97                 9.00
Error (within/w)          455.6                 36                        12.66
Total                     797.5                 39
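There are several steps in computing the F-value of the data. A sum of squares is a sum of
squared deviations from a mean. Two sums of squares were computed in the analysis: the Sum of
Squares of Treatments (SSb), which measures how far each group mean lies from the grand mean, and
the Sum of Squares of Errors (SSw), which measures how far each value lies from its own group
mean. The formulae for both values are:
\[ SS_b = n \sum_i (\bar{x}_i - \bar{X})^2 \qquad SS_w = \sum_i \sum_j (x_{ij} - \bar{x}_i)^2 \]
where x̄i is the mean of group i, X̄ is the grand mean, xij is the j-th value in group i, and n
in this formula is the number of values per group (10), not the total of 40 used for the degrees
of freedom.
Computations for the SSb and SSw are shown in Tables 4 and 5.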
Table 4. Computation for SSb.
Term   Computation
SSA    10 (83.6 - 79.25)²
SSB    10 (79.4 - 79.25)²
SSC    10 (78.6 - 79.25)²
SSD    10 (75.4 - 79.25)²
SSb    341.9
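The terms in Table 4 can be evaluated with a few lines of code. This is a minimal sketch that takes the group means and grand mean from Table 1 as given; the variable names are illustrative.

```python
# Group means and grand mean as reported in Table 1.
group_means = {"A": 83.6, "B": 79.4, "C": 78.6, "D": 75.4}
grand_mean = 79.25
n_per_group = 10  # specimens collected in each area

# Contribution of each area: n * (group mean - grand mean)^2.
contributions = {
    area: n_per_group * (m - grand_mean) ** 2
    for area, m in group_means.items()
}
ss_between = sum(contributions.values())

for area, value in contributions.items():
    print(f"SS{area} = {value:.3f}")
print(f"SSb = {ss_between:.1f}")
```

Areas A and D account for almost all of SSb, since their means lie farthest from the grand mean.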
Table 5. Computation for SSw.
                 A              B              C              D
                 (78 - 83.6)²   (78 - 79.4)²   (79 - 78.6)²   (77 - 75.4)²
                 (88 - 83.6)²   (78 - 79.4)²   (73 - 78.6)²   (69 - 75.4)²
                 (87 - 83.6)²   (83 - 79.4)²   (79 - 78.6)²   (75 - 75.4)²
                 (88 - 83.6)²   (81 - 79.4)²   (75 - 78.6)²   (70 - 75.4)²
                 (83 - 83.6)²   (78 - 79.4)²   (77 - 78.6)²   (74 - 75.4)²
                 (82 - 83.6)²   (81 - 79.4)²   (78 - 78.6)²   (83 - 75.4)²
                 (81 - 83.6)²   (81 - 79.4)²   (80 - 78.6)²   (80 - 75.4)²
                 (80 - 83.6)²   (82 - 79.4)²   (78 - 78.6)²   (75 - 75.4)²
                 (80 - 83.6)²   (76 - 79.4)²   (83 - 78.6)²   (76 - 75.4)²
                 (89 - 83.6)²   (76 - 79.4)²   (84 - 78.6)²   (75 - 75.4)²
Σ (xij - x̄i)²    146.4          56.4           98.4           154.4
SSw              455.6
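The column sums in Table 5 can be checked directly from the raw data. The sketch below assumes a hypothetical helper, within_group_ss, that sums the squared deviations of one area's values from that area's mean.

```python
from statistics import mean

# Fish lengths (cm) as tabulated in the problem set.
fish_lengths = {
    "A": [78, 88, 87, 88, 83, 82, 81, 80, 80, 89],
    "B": [78, 78, 83, 81, 78, 81, 81, 82, 76, 76],
    "C": [79, 73, 79, 75, 77, 78, 80, 78, 83, 84],
    "D": [77, 69, 75, 70, 74, 83, 80, 75, 76, 75],
}


def within_group_ss(values):
    """Sum of squared deviations of the values from their own mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)


per_group = {area: within_group_ss(v) for area, v in fish_lengths.items()}
ss_within = sum(per_group.values())

for area, value in per_group.items():
    print(f"Area {area}: {value:.1f}")
print(f"SSw = {ss_within:.1f}")
```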
The Mean of Squares of Treatments and of Errors is each computed by dividing the corresponding
sum of squares by its degrees of freedom. In equation form,
\[ MS = \frac{SS}{df} \]
The values of MSb and MSw are determined through the computations below:
\[ MS_b = \frac{SS_b}{df_1} = \frac{341.9}{3} = 113.97 \qquad MS_w = \frac{SS_w}{df_2} = \frac{455.6}{36} = 12.66 \]
With the MSb and MSw determined, the F-value can now be determined. The formula for
the F-value is expressed as:
\[ F = \frac{MS_b}{MS_w} \]
The F-value of the data set is computed as:
\[ F = \frac{MS_b}{MS_w} = \frac{113.97}{12.66} = 9.00 \]
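The mean squares and the F-value follow from the sums of squares and degrees of freedom already computed. The sketch below carries full precision rather than the rounded mean squares, so the ratio comes out near 9.005, which is consistent with the 9.00 obtained from the rounded values.

```python
# Sums of squares and degrees of freedom from Tables 2 and 3.
ss_between, ss_within = 341.9, 455.6
df_between, df_within = 3, 36

# Each mean square is a sum of squares divided by its degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# The F-value is the ratio of between-group to within-group mean squares.
f_value = ms_between / ms_within

print(f"MSb = {ms_between:.2f}")
print(f"MSw = {ms_within:.2f}")
print(f"F   = {f_value:.3f}")
```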
Now that the F-value is determined, the critical value must also be obtained from the F
table (Appendix 1). The values of df1 and df2 are needed to determine the critical value. However,
since most F tables do not show the value for df2 = 36, the critical value is obtained by linear
interpolation between the tabulated values for df2 = 30 and df2 = 40, using the two-point formula
from basic algebra:
\[ y - y_1 = \frac{y_2 - y_1}{x_2 - x_1}(x - x_1) \]
Computations for the critical value are shown below:
\[ F_C - 2.9223 = \frac{2.8387 - 2.9223}{40 - 30}(36 - 30) \]
\[ F_C = 2.87 \]
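The interpolation can be written as a small reusable function. This is a sketch only: interpolate is a hypothetical helper name, and the table values are the ones quoted in the text.

```python
def interpolate(x, x1, y1, x2, y2):
    """Two-point linear interpolation.

    Returns the value at x on the straight line through (x1, y1) and (x2, y2).
    """
    return y1 + (y2 - y1) / (x2 - x1) * (x - x1)


# Critical F at df1 = 3: table values 2.9223 (df2 = 30) and 2.8387 (df2 = 40).
f_critical = interpolate(36, 30, 2.9223, 40, 2.8387)
print(f"F critical at df2 = 36: {f_critical:.2f}")

# The same helper applies to the studentized range values quoted later
# in the text: 3.919 (df2 = 30) and 3.858 (df2 = 40).
q_value = interpolate(36, 30, 3.919, 40, 3.858)
print(f"q at df2 = 36: {q_value:.4f}")
```

Linear interpolation is an approximation, because the tabulated values do not fall exactly on a straight line between df2 = 30 and df2 = 40, but the error at two decimal places is negligible for this comparison.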
Comparing the F-value to the critical value (FC), the F-value is greater than the critical
value. This is illustrated in the distribution curve in Figure 1.

Figure 1. Comparison of the F-value to the critical value in the distribution curve.
Since the F-value (F=9.00) is greater than the critical value (FC=2.87), the null hypothesis
must therefore be rejected.
Since the null hypothesis is rejected, the Tukey-Kramer method is performed to determine
which pair(s) of means differ. This test uses the studentized range distribution table
(Appendix 2), from which the value of q is read using k and df2. That value is needed in the
formula for ω:
\[ \omega = q_\alpha(k, df_2) \frac{\sqrt{MSE}}{\sqrt{n}} \]
where MSE is the Mean of Squares of Errors (MSw).
Based on the data, the values for k and df2 are 4 and 36 respectively, with a significance
level (α) of 0.05.
Interpolating between the values from the studentized range distribution table with the
two-point formula gives:
\[ y - 3.919 = \frac{3.858 - 3.919}{40 - 30}(36 - 30) \]
\[ y = 3.8824 \]
Since q_α(k, df2) = y, then
\[ q_\alpha(k, df_2) = 3.88 \]
Substituting the interpolated value into the formula, with MSE = 12.66 and n = 10 (n in this
formula is the number of specimens in each area, not the total of 40),
\[ \omega = 3.88 \frac{\sqrt{12.66}}{\sqrt{10}} \]
\[ \omega = 4.37 \]
The value of ω is then the basis for comparing the differences between the means. If the
difference between two means is greater than the computed value of 4.37, then that pair of means
differs significantly.
In this problem, the difference between each pair of means is shown in Table 6.
Table 6. Differences between the pairs of means.
Pair of means   Computation    Difference   Greater than ω = 4.37?
A and B         83.6 - 79.4    4.2          No
A and C         83.6 - 78.6    5.0          Yes
A and D         83.6 - 75.4    8.2          Yes
B and C         79.4 - 78.6    0.8          No
B and D         79.4 - 75.4    4.0          No
C and D         78.6 - 75.4    3.2          No
Hence fish sizes in Area A differ from those in Areas C and D. The difference between Areas A
and B (4.2) falls just short of ω, and Areas B, C, and D do not differ from one another.
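The pairwise comparison can be sketched in code. Note the assumption made here: n in the formula for ω is taken as the number of specimens per area (10), which is how the Tukey formula defines it, and q is the interpolated value 3.88 quoted in the text. Under that assumption ω is about 4.37. The variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

group_means = {"A": 83.6, "B": 79.4, "C": 78.6, "D": 75.4}
q_value = 3.88      # interpolated studentized range value quoted in the text
ms_within = 12.66   # MSE (MSw) from Table 3
n_per_group = 10    # assumption: n is the per-area sample size, not 40

# Minimum difference between two means that counts as significant.
omega = q_value * sqrt(ms_within) / sqrt(n_per_group)
print(f"omega = {omega:.2f}")

# Compare every pair of areas against omega.
differing_pairs = []
for (a, mean_a), (b, mean_b) in combinations(group_means.items(), 2):
    difference = abs(mean_a - mean_b)
    differs = difference > omega
    if differs:
        differing_pairs.append(a + b)
    print(f"{a} vs {b}: difference = {difference:.1f}, differs = {differs}")
```

With these inputs only the A-C and A-D differences exceed ω. The result is sensitive to the choice of n: substituting the total sample size of 40 halves ω, so more pairs would appear to differ.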
Conclusion
Since the F-value (F=9.00) is greater than the critical value (FC=2.87), the null hypothesis
must be rejected. Therefore, it can be concluded that the area of the marine reserves affects the
size of the fish.

References
Samuels, M.L., Witmer, J.A., Schaffner, A.A. (2015). Statistics for the Life Sciences. United States:
Pearson Education.
