
Data Management: One-Way Analysis of Variance (ANOVA)

Enaje, Shana Marisse E.

Labao, Jerzeth Kate A.

Magpoc, Maverick M.

Roxas, Ria Francesca M.

Department of Biology

College of Science

University of Santo Tomas, España Street, Manila, 1501

Problem Set

A marine biologist in charge of four marine reserves located on a small island noticed that one of the marine reserves (Area ‘A’) was twice the size of the other areas (‘B’, ‘C’ and ‘D’). Considering that all other aspects of the marine reserves were equal except for size, the biologist wanted to find out if the size of the marine reserve had an effect on the overall size of fish species living within them. To test this, he designated a single fish species, Acanthurus olivaceus, as the test species, and collected 10 specimens of this fish in each of the four marine reserves. He measured each fish (cm) and tabulated the data below.

Area A: (78, 88, 87, 88, 83, 82, 81, 80, 80, 89)

Area B: (78, 78, 83, 81, 78, 81, 81, 82, 76, 76)

Area C: (79, 73, 79, 75, 77, 78, 80, 78, 83, 84)

Area D: (77, 69, 75, 70, 74, 83, 80, 75, 76, 75)
Introduction

Data management comprises the operations needed for a systematic, coherent process of data collection, storage and retrieval. It is an essential part of research and of the documentation of analyses. Proper data handling and management are crucial to the success and reproducibility of a statistical analysis. Selecting appropriate tools and using them efficiently can save the researcher numerous hours and allow other researchers to build on the products of that work.

Data analysis for quantitative studies, on the other hand, involves critical analysis and interpretation of figures and numbers, and attempts to find the rationale behind the main findings. Comparing primary research findings with the findings of the literature review is critically important for both qualitative and quantitative studies.

The statistical method used to analyze and solve the problem set is one-way analysis of variance (ANOVA). This statistical tool compares the means of three or more groups defined by a single independent variable, and determines whether there is a statistically significant difference between at least two of the groups.

The objectives of the activity are to learn some of the principles and techniques of data management; to become familiar with different statistical methods; and to analyze and ascertain the proper statistical approach for a given set of data.

Hypotheses

The null hypothesis states that:

H0: The area of the marine reserves has no effect on the size of the fish.

The alternative hypothesis states that:

HA: The area of the marine reserves affects the size of the fish.
Results and Discussion

Table 1. Tabulated form of the data set with the computed sums and means of the values.
Area A Area B Area C Area D
78 78 79 77
88 78 73 69
87 83 79 75
88 81 75 70
83 78 77 74
82 81 78 83
81 81 80 80
80 82 78 75
80 76 83 76
89 76 84 75
Mean (x̄) 83.6 79.4 78.6 75.4
Grand mean (X̄) 79.25
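The group means and the grand mean in Table 1 can be checked with a short Python sketch. It uses only the measurements given in the problem set; the variable names (`fish_cm`, `group_means`, `grand_mean`) are illustrative and not part of the original analysis.

```python
from statistics import mean

# Fish lengths (cm) from the problem set, one list per marine reserve.
fish_cm = {
    "A": [78, 88, 87, 88, 83, 82, 81, 80, 80, 89],
    "B": [78, 78, 83, 81, 78, 81, 81, 82, 76, 76],
    "C": [79, 73, 79, 75, 77, 78, 80, 78, 83, 84],
    "D": [77, 69, 75, 70, 74, 83, 80, 75, 76, 75],
}

# Mean of each reserve (x-bar) and mean of all 40 measurements (X-bar).
group_means = {area: mean(values) for area, values in fish_cm.items()}
grand_mean = mean(v for values in fish_cm.values() for v in values)

for area, m in group_means.items():
    print(f"Area {area}: mean = {m:.1f}")
print(f"Grand mean = {grand_mean:.2f}")
```

Because every reserve contributes the same number of fish, the grand mean is also the simple average of the four group means.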

The data in the problem set fall directly into four groups. The groups differ in the independent variable, the size of the marine reserve: Area A is twice the size of Areas B, C, and D. The dependent variable is the size of the fish, which is what the problem asks to be tested. To determine whether the size of the marine reserve has an effect on fish size, one-way ANOVA is used as the statistical tool.

Other statistical methods could have been used to analyze the data, but one-way ANOVA was the most suitable for the type of data in the problem set. The numerical data are measurements rather than ranks; therefore the Mann-Whitney U Test, Kruskal-Wallis H Test, and Spearman’s Correlation Coefficient were not the best methods to analyze the data. The data also comprise more than two groups, which eliminates the t-test, z-test, chi-square test, and Pearson’s Correlation Coefficient as the best method for analysis. The difference between one-way and two-way ANOVA is that the former involves only one independent variable, while the latter involves two. In the problem set, only one independent variable was analyzed, and thus one-way ANOVA was used as the statistical test.

One-way analysis of variance (ANOVA) is the test used to analyze and solve the problem. This test compares the means of three or more groups or categories defined by one variable, and determines whether a statistically significant difference exists among them. It tests the null hypothesis (H0), which states that there is no difference between the means of the three or more groups. This is mathematically stated as:

H0: 𝜇1 = 𝜇2 = 𝜇3 = ⋯ = 𝜇𝑘

where μ is the mean, and k is the number of groups. If a statistically significant difference is observed in the data, the null hypothesis is rejected in favor of the alternative hypothesis (HA).

However, one-way ANOVA does not specifically state which sets of data significantly differ from each other; it is therefore called an omnibus test. To determine which pairs of groups differ, the Tukey Method is utilized.

Table 2. Computation of the values for α and degrees of freedom.

Quantity     Formula   Substitution   Value
α                                     0.05
dfb (df1)    k - 1     4 - 1          3
dfw (df2)    n - k     40 - 4         36
dfT (df3)    n - 1     40 - 1         39

Table 3. Determination of the F value.

Source                   SS (Sum of Squares)   df (Degrees of Freedom)   MS (Mean of Squares)   F-value
Treatments (between/b)   341.9                 3                         113.97                 9.00
Error (within/w)         455.6                 36                        12.66
Total                    797.5                 39
There are several steps in computing the F-value of the data. The total Sum of Squares is the sum of the squared differences of each value from the grand mean, and it is partitioned into two components that were computed in the analysis: the Sum of Squares of Treatments (SSb) and the Sum of Squares of Errors (SSw). The formulae for both values are:

$$SS_b = n \sum_{i} (\bar{x}_i - \bar{X})^2 \qquad SS_w = \sum_{i} \sum_{j} (x_{ij} - \bar{x}_i)^2$$

Computations for the SSb and SSw are shown in Tables 4 and 5.

Table 4. Computation for SSb.

Term   Computation           Value
SSA    10(83.6 - 79.25)²     189.225
SSB    10(79.4 - 79.25)²     0.225
SSC    10(78.6 - 79.25)²     4.225
SSD    10(75.4 - 79.25)²     148.225
SSb                          341.9

Table 5. Computation for SSw.


                 A               B               C               D
                 (78 - 83.6)²    (78 - 79.4)²    (79 - 78.6)²    (77 - 75.4)²
                 (88 - 83.6)²    (78 - 79.4)²    (73 - 78.6)²    (69 - 75.4)²
                 (87 - 83.6)²    (83 - 79.4)²    (79 - 78.6)²    (75 - 75.4)²
                 (88 - 83.6)²    (81 - 79.4)²    (75 - 78.6)²    (70 - 75.4)²
                 (83 - 83.6)²    (78 - 79.4)²    (77 - 78.6)²    (74 - 75.4)²
                 (82 - 83.6)²    (81 - 79.4)²    (78 - 78.6)²    (83 - 75.4)²
                 (81 - 83.6)²    (81 - 79.4)²    (80 - 78.6)²    (80 - 75.4)²
                 (80 - 83.6)²    (82 - 79.4)²    (78 - 78.6)²    (75 - 75.4)²
                 (80 - 83.6)²    (76 - 79.4)²    (83 - 78.6)²    (76 - 75.4)²
                 (89 - 83.6)²    (76 - 79.4)²    (84 - 78.6)²    (75 - 75.4)²
Σ(xij - x̄i)²     146.4           56.4            98.4            154.4
SSw              455.6
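The sums of squares in Tables 4 and 5 can be reproduced with the following Python sketch. It is only a check of the arithmetic, written from the formulae for SSb and SSw given above; the variable names are illustrative.

```python
from statistics import mean

# Fish lengths (cm) from the problem set, one list per marine reserve.
fish_cm = {
    "A": [78, 88, 87, 88, 83, 82, 81, 80, 80, 89],
    "B": [78, 78, 83, 81, 78, 81, 81, 82, 76, 76],
    "C": [79, 73, 79, 75, 77, 78, 80, 78, 83, 84],
    "D": [77, 69, 75, 70, 74, 83, 80, 75, 76, 75],
}

all_values = [v for values in fish_cm.values() for v in values]
grand_mean = mean(all_values)
group_means = {area: mean(values) for area, values in fish_cm.items()}

# SSb: group size times the squared deviation of each group mean
# from the grand mean, summed over the four groups (Table 4).
ss_between = sum(
    len(values) * (group_means[area] - grand_mean) ** 2
    for area, values in fish_cm.items()
)

# SSw: squared deviation of each fish from its own group mean,
# summed within each group and then across groups (Table 5).
ss_within_by_area = {
    area: sum((v - group_means[area]) ** 2 for v in values)
    for area, values in fish_cm.items()
}
ss_within = sum(ss_within_by_area.values())

# Total sum of squares: squared deviation of each fish from the grand mean.
ss_total = sum((v - grand_mean) ** 2 for v in all_values)

print(f"SSb = {ss_between:.1f}, SSw = {ss_within:.1f}, SST = {ss_total:.1f}")
```

The total computed directly from the 40 measurements equals SSb + SSw, which is a useful check that neither component was miscalculated.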

Each mean square (for treatments and for error) is computed by dividing the corresponding sum of squares by its degrees of freedom:

$$MS = \frac{SS}{df}$$

The values of MSb and MSw are determined through the computations below:

$$MS_b = \frac{SS_b}{df_1} = \frac{341.9}{3} = 113.97 \qquad MS_w = \frac{SS_w}{df_2} = \frac{455.6}{36} = 12.66$$

With MSb and MSw determined, the F-value can now be computed. The formula for the F-value is expressed as:

$$F = \frac{MS_b}{MS_w}$$

The F-value of the data set is computed as:

$$F = \frac{MS_b}{MS_w} = \frac{113.97}{12.66} = 9.00$$
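The mean squares and the F-value can be checked with a few lines of Python. This sketch starts from the sums of squares in Table 3 and the group counts in Table 2; the variable names are illustrative. It computes F twice: once from the mean squares rounded to two decimals, as in the text, and once from the unrounded values.

```python
# Sums of squares from Table 3.
ss_between = 341.9
ss_within = 455.6

# Number of groups and total number of fish, from Table 2.
k = 4
n_total = 40

df_between = k - 1          # df1
df_within = n_total - k     # df2

ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F from the unrounded mean squares.
f_value = ms_between / ms_within

# F from the mean squares rounded to two decimals, as done in the text.
f_from_rounded = round(ms_between, 2) / round(ms_within, 2)

print(f"MSb = {ms_between:.2f}, MSw = {ms_within:.2f}")
print(f"F (unrounded MS) = {f_value:.3f}")
print(f"F (rounded MS)   = {f_from_rounded:.3f}")
```

The two results differ only in the third decimal place, so the rounding of the mean squares does not affect the comparison with the critical value.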

Now that the F-value is determined, the critical value must also be obtained from the F table (Appendix 1). The values of df1 and df2 are needed to determine the critical value. However, since most F distribution tables do not list a value for df2 = 36, linear interpolation between the tabulated values at df2 = 30 and df2 = 40 must be used, with a formula similar to the two-point formula in basic algebra:

$$y - y_1 = \frac{y_2 - y_1}{x_2 - x_1}(x - x_1)$$

Computations for the critical value are shown below:

$$F_C - 2.9223 = \frac{2.8387 - 2.9223}{40 - 30}(36 - 30)$$

$$F_C = 2.87$$
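The interpolation can be written as a small Python function. This is a minimal sketch of the two-point formula; the function name `interpolate` is illustrative, and the tabulated values 2.9223 (df2 = 30) and 2.8387 (df2 = 40) are the ones used in the computation above.

```python
def interpolate(x, x1, y1, x2, y2):
    """Linear interpolation on the line through (x1, y1) and (x2, y2), evaluated at x."""
    return y1 + (y2 - y1) / (x2 - x1) * (x - x1)


# Critical F for df1 = 3 at alpha = 0.05, interpolated to df2 = 36
# from the tabulated values at df2 = 30 and df2 = 40.
f_crit = interpolate(36, 30, 2.9223, 40, 2.8387)

print(f"F critical = {f_crit:.2f}")
```

Linear interpolation is an approximation, since the critical value does not change exactly linearly with df2, but over a short interval such as 30 to 40 the error is small compared with the gap between F and the critical value.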

Comparing the F-value to the critical value (FC), the F-value is greater than the critical value. This is illustrated in the distribution curve in Figure 1.


Figure 1. Comparison of the F-value to the critical value in the distribution curve.

Since the F-value (F=9.00) is greater than the critical value (FC=2.87), the null hypothesis must therefore be rejected.

Since the null hypothesis is rejected, the Tukey-Kramer method is performed to determine which pair(s) of means differ. This test uses the studentized range distribution table (Appendix 2), from which the value of q is read using k and df2. That value is needed in the formula for ω:

$$\omega = q_\alpha(k, df_2)\,\frac{\sqrt{MSE}}{\sqrt{n}}$$

Based on the data, the values for k and df2 are 4 and 36 respectively, with a significance level of 0.05.

Since the studentized range table lists values for df2 = 30 and df2 = 40 but not for df2 = 36, the value is interpolated between the two tabulated entries using the same two-point formula:

$$y - 3.919 = \frac{3.858 - 3.919}{40 - 30}(36 - 30)$$

$$y = 3.8824$$

Since $q_\alpha(k, df_2) = y$, then

$$q_\alpha(k, df_2) = 3.88$$

Substituting the interpolated value into the formula, with MSE = 12.66 and n = 10, where n is the number of fish measured in each reserve,

$$\omega = 3.88 \cdot \frac{\sqrt{12.66}}{\sqrt{10}}$$

$$\omega = 4.37$$

The value of ω is the threshold against which the differences between the means are compared. If the absolute difference between two means is greater than the computed value of 4.37, then that pair of means differs significantly.

In this problem, the difference between each pair of means is shown in Table 6.

Table 6. Differences between the pairs of means.

Pair of means   Computation   Difference   Greater than ω = 4.37?
A and B         83.6 - 79.4   4.2          No
A and C         83.6 - 78.6   5.0          Yes
A and D         83.6 - 75.4   8.2          Yes
B and C         79.4 - 78.6   0.8          No
B and D         79.4 - 75.4   4.0          No
C and D         78.6 - 75.4   3.2          No

Hence fish sizes in Area A differ significantly from those in Areas C and D. The difference between Areas A and B (4.2) falls just below the threshold, and Areas B, C and D do not differ significantly from one another.
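The pairwise comparison can be sketched in Python as follows. The sketch takes the group means from Table 1, MSE = 12.66 from Table 3, and the interpolated q = 3.88 as given. It assumes that n in the formula for ω is the number of fish per reserve (10), which is how Tukey's procedure is usually stated for equal group sizes; the variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

group_means = {"A": 83.6, "B": 79.4, "C": 78.6, "D": 75.4}  # Table 1
ms_within = 12.66   # MSE from Table 3
q = 3.88            # interpolated studentized range value from the text
n_per_group = 10    # ASSUMPTION: n is the per-group sample size, not the total of 40

# Threshold for a significant difference between two group means.
omega = q * sqrt(ms_within) / sqrt(n_per_group)
print(f"omega = {omega:.2f}")

# Compare every pair of group means against the threshold.
significant = []
for a, b in combinations(group_means, 2):
    diff = abs(group_means[a] - group_means[b])
    if diff > omega:
        significant.append((a, b))
    print(f"{a} vs {b}: difference = {diff:.1f}, exceeds omega: {diff > omega}")

print("Pairs that differ:", significant)
```

The threshold is sensitive to the value chosen for n, so the per-group sample size should be stated explicitly whenever ω is reported.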

Conclusion

Since the F-value (F=9.00) is greater than the critical value (FC=2.87), the null hypothesis must be rejected. Therefore, it can be concluded that the area of the marine reserves affects the size of the fish.


