Anova 1

Analysis of Variance (ANOVA)
W&W, Chapter 10
Introduction
Last time we learned about the chi-square test for independence, which is useful for data measured at the nominal or ordinal level. If we have data measured at the interval level, we can compare two or more populations in terms of their means using a technique called analysis of variance, or ANOVA.
Completely randomized design

Population 1: Mean = $\mu_1$, Variance = $\sigma_1^2$
Population 2: Mean = $\mu_2$, Variance = $\sigma_2^2$
...
Population k: Mean = $\mu_k$, Variance = $\sigma_k^2$
We want to know something about how the populations compare. Do they have the same mean? We can collect random samples from each population, which gives us the following data.

Sample 1: Mean = $M_1$, Variance = $s_1^2$, $n_1$ cases
Sample 2: Mean = $M_2$, Variance = $s_2^2$, $n_2$ cases
...
Sample k: Mean = $M_k$, Variance = $s_k^2$, $n_k$ cases
Suppose we want to compare 3 college majors in a business school by the average annual income people make 2 years after graduation. We collect the following data (in $1000s) based on random surveys.

| Accounting | Marketing | Finance |
|---|---|---|
| 27 | 23 | 48 |
| 22 | 36 | 35 |
| 33 | 27 | 46 |
| 25 | 44 | 36 |
| 38 | 39 | 28 |
| 29 | 32 | 29 |

Can the dean conclude that there are differences among the majors' incomes?
$H_0$: $\mu_1 = \mu_2 = \mu_3$
$H_A$: the means are not all equal ($\mu_i \neq \mu_j$ for at least one pair of majors)
In this problem we must take into account: 1) The variance between samples, or the actual differences by major. This is called the sum of squares for treatment (SST).

2) The variance within samples, or the variance of incomes within a single major. This is called the sum of squares for error (SSE). Recall that when we sample, there will always be a chance of getting something different than the population. We account for this through #2, or the SSE.
F-Statistic
For this test, we will calculate an F statistic, which is used to compare variances.
$$F = \frac{SST/(k-1)}{SSE/(n-k)}$$
SST = sum of squares for treatment
SSE = sum of squares for error
k = the number of populations
n = total sample size
F-statistic
Intuitively, the F statistic is:
$$F = \frac{\text{explained variance}}{\text{unexplained variance}}$$
Explained variance is the difference between majors.
Unexplained variance is the difference based on random sampling for each group (see Figure 10-1, page 327).
Calculating SST
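$$SST = \sum_{i=1}^{k} n_i (M_i - \bar{\bar{X}})^2$$
$\bar{\bar{X}}$ = grand mean: $\bar{\bar{X}} = \sum M_i / k$ when the samples are of equal size, or, in general, the sum of all values for all groups divided by the total sample size
$M_i$ = mean for each sample
$n_i$ = number of cases in each sample
k = the number of populations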
Calculating SST
By major:
Accounting: $M_1 = 29$, $n_1 = 6$
Marketing: $M_2 = 33.5$, $n_2 = 6$
Finance: $M_3 = 37$, $n_3 = 6$
Grand mean $\bar{\bar{X}} = (29 + 33.5 + 37)/3 = 33.17$
$SST = (6)(29 - 33.17)^2 + (6)(33.5 - 33.17)^2 + (6)(37 - 33.17)^2 = 193$
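The arithmetic above can be checked with a few lines of Python. This is only a sketch: the dictionary `incomes` and the helper names `mean` and `sum_of_squares_treatment` are illustrative choices, not part of the course material. The grand mean is computed as the sum of all values divided by the total sample size, so the same function would also work for samples of unequal size.

```python
# Incomes in $1000s, copied from the survey data for the three majors.
incomes = {
    "Accounting": [27, 22, 33, 25, 38, 29],
    "Marketing": [23, 36, 27, 44, 39, 32],
    "Finance": [48, 35, 46, 36, 28, 29],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sum_of_squares_treatment(groups):
    """SST: sum over groups of n_i * (M_i - grand mean)^2."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    return sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )


group_means = {major: mean(values) for major, values in incomes.items()}
sst = sum_of_squares_treatment(incomes)

print(group_means)    # sample means M_1, M_2, M_3
print(round(sst, 2))  # 193.0
```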
Calculating SST
Note that when $M_1 = M_2 = M_3$, then SST = 0, which would support the null hypothesis. In this example, the samples are of equal size, but we can also run this analysis with samples of varying size.
Calculating SSE
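$$SSE = \sum_{i=1}^{k} \sum_{t=1}^{n_i} (X_{it} - M_i)^2$$
In other words, it is just the sum of squared deviations from the sample mean, computed within each sample and then added together.
$SSE = \sum_t (X_{1t} - M_1)^2 + \sum_t (X_{2t} - M_2)^2 + \sum_t (X_{3t} - M_3)^2$
$SSE = [(27-29)^2 + (22-29)^2 + \dots + (29-29)^2] + [(23-33.5)^2 + (36-33.5)^2 + \dots] + [(48-37)^2 + (35-37)^2 + \dots + (29-37)^2]$
SSE = 819.5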
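As a check on the SSE arithmetic, the within-sample sums of squared deviations can be computed one major at a time and then added. The sketch below assumes the same survey data as above; the names `incomes` and `within_sum_of_squares` are illustrative only.

```python
# Incomes in $1000s, copied from the survey data for the three majors.
incomes = {
    "Accounting": [27, 22, 33, 25, 38, 29],
    "Marketing": [23, 36, 27, 44, 39, 32],
    "Finance": [48, 35, 46, 36, 28, 29],
}


def within_sum_of_squares(values):
    """Sum of squared deviations of each value from its own sample mean."""
    sample_mean = sum(values) / len(values)
    return sum((x - sample_mean) ** 2 for x in values)


# One term per major, then SSE is their total.
per_major = {major: within_sum_of_squares(v) for major, v in incomes.items()}
sse = sum(per_major.values())

print(per_major)  # {'Accounting': 166.0, 'Marketing': 301.5, 'Finance': 352.0}
print(sse)        # 819.5
```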
Statistical Output
When you estimate this information in a computer program, it will typically be presented in a table as follows:
| Source of Variation | df | Sum of squares | Mean squares | F-ratio |
|---|---|---|---|---|
| Treatment | k-1 | SST | MST = SST/(k-1) | F = MST/MSE |
| Error | n-k | SSE | MSE = SSE/(n-k) | |
| Total | n-1 | SS = SST + SSE | | |
Calculating F for our example

$$F = \frac{193/2}{819.5/15} = 1.77$$
Our calculated F is compared to the critical value $F_{\alpha,\,k-1,\,n-k}$ from the F-distribution, with k-1 numerator degrees of freedom and n-k denominator degrees of freedom.
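The same calculation, written out as a small Python sketch that fills in the mean squares and the F-ratio from SST, SSE, k, and n. The function name `anova_summary` and the dictionary layout are assumptions made for illustration.

```python
def anova_summary(sst, sse, k, n):
    """Return the mean squares and F-ratio for a one-way ANOVA.

    sst: sum of squares for treatment
    sse: sum of squares for error
    k:   number of populations (groups)
    n:   total sample size
    """
    mst = sst / (k - 1)  # mean square for treatment
    mse = sse / (n - k)  # mean square for error
    return {
        "df_treatment": k - 1,
        "df_error": n - k,
        "MST": mst,
        "MSE": mse,
        "F": mst / mse,
    }


# Values from the three-major example: k = 3 majors, n = 18 graduates.
summary = anova_summary(sst=193, sse=819.5, k=3, n=18)
print(summary["MST"])          # 96.5
print(round(summary["F"], 2))  # 1.77
```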
The Results
For 95% confidence ($\alpha = .05$), our critical F is 3.68 (averaging across the table values at 14 and 16 denominator degrees of freedom). In this case, 1.77 < 3.68, so we fail to reject the null hypothesis. The dean is puzzled by these results because just by eyeballing the data, it looks like finance majors make more money.
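The critical value can also be checked without a table. The sketch below relies on a special case that is not part of the course material: when the numerator degrees of freedom equal 2, the upper-tail probability of the F-distribution has the closed form $P(F > f) = (1 + 2f/d)^{-d/2}$, where d is the denominator degrees of freedom. The function names are illustrative, and the shortcut does not apply to other numerator degrees of freedom.

```python
def f_critical_df1_2(alpha, denom_df):
    """Critical value of F with 2 numerator df (closed form, df1 = 2 only)."""
    return (denom_df / 2) * (alpha ** (-2 / denom_df) - 1)


def f_p_value_df1_2(f, denom_df):
    """Upper-tail probability P(F > f) with 2 numerator df (df1 = 2 only)."""
    return (1 + 2 * f / denom_df) ** (-denom_df / 2)


f_observed = 1.77  # calculated F from the three-major example
critical = f_critical_df1_2(alpha=0.05, denom_df=15)
p_value = f_p_value_df1_2(f_observed, denom_df=15)

print(round(critical, 2))     # 3.68
print(f_observed > critical)  # False: we fail to reject the null hypothesis
print(p_value > 0.05)         # True: same conclusion from the p-value
```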
The Results
Many other factors may determine the salary level, such as GPA. The dean decides to collect new data, randomly selecting one student from each major at each of the following average grades.
New data
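| Average | Accounting | Marketing | Finance | M(b) |
|---|---|---|---|---|
| A+ | 41 | 45 | 51 | M(b1) = 45.67 |
| A | 36 | 38 | 45 | M(b2) = 39.67 |
| B+ | 27 | 33 | 31 | M(b3) = 30.33 |
| B | 32 | 29 | 35 | M(b4) = 32 |
| C+ | 26 | 31 | 32 | M(b5) = 29.67 |
| C | 23 | 25 | 27 | M(b6) = 25 |
| M(t) | M(t)1 = 30.83 | M(t)2 = 33.5 | M(t)3 = 36.83 | Grand mean = 33.72 |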
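The treatment means M(t), the block means M(b), and the grand mean can be recomputed from the raw values. This is a sketch with illustrative variable names; the data are the 18 incomes in the new sample, listed from A+ down to C within each major.

```python
gpa_levels = ["A+", "A", "B+", "B", "C+", "C"]

# Incomes in $1000s, one graduate per major at each GPA level.
incomes = {
    "Accounting": [41, 36, 27, 32, 26, 23],
    "Marketing": [45, 38, 33, 29, 31, 25],
    "Finance": [51, 45, 31, 35, 32, 27],
}

# Treatment means M(t): average over the GPA levels within each major.
treatment_means = {
    major: sum(values) / len(values) for major, values in incomes.items()
}

# Block means M(b): average over the three majors within each GPA level.
block_means = {
    level: sum(block) / len(block)
    for level, block in zip(gpa_levels, zip(*incomes.values()))
}

# Grand mean: all 18 values divided by the total sample size.
all_values = [x for values in incomes.values() for x in values]
grand_mean = sum(all_values) / len(all_values)

for level, value in block_means.items():
    print(level, round(value, 2))
print(round(grand_mean, 2))  # 33.72
```

For the B+ block, for example, this gives (27 + 33 + 31)/3 = 30.33.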
Randomized Block Design

Now the data in the 3 samples are not independent; they are matched by GPA level. Just like before, matched samples are superior to unmatched samples because they provide more information. In this case, we have added a factor that may account for some of the SSE.
Two way ANOVA

Now SS(total) = SST + SSB + SSE, where SSB = the variability among blocks, and a block is a matched group of observations from each of the populations. We can calculate a two-way ANOVA to test our null hypothesis. We will talk about this next week.
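As a preview, the decomposition can be sketched numerically for the GPA-matched data. The slides do not give a formula for SSB, so the code assumes the usual randomized block definition, SSB = (number of treatments) × Σ (block mean − grand mean)², and obtains SSE as whatever is left after SST and SSB are removed from SS(total). The variable names are illustrative.

```python
# Incomes in $1000s; rows are majors, positions within a row are GPA levels
# (A+, A, B+, B, C+, C).
incomes = {
    "Accounting": [41, 36, 27, 32, 26, 23],
    "Marketing": [45, 38, 33, 29, 31, 25],
    "Finance": [51, 45, 31, 35, 32, 27],
}

rows = list(incomes.values())
k = len(rows)     # number of treatments (majors)
b = len(rows[0])  # number of blocks (GPA levels)

all_values = [x for row in rows for x in row]
grand_mean = sum(all_values) / len(all_values)

# Total variability of all 18 observations around the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

# Variability among treatment (major) means, each based on b observations.
sst = sum(b * (sum(row) / b - grand_mean) ** 2 for row in rows)

# ASSUMED formula: variability among block (GPA) means, each based on k
# observations.
ssb = sum(k * (sum(block) / k - grand_mean) ** 2 for block in zip(*rows))

# What remains after treatments and blocks are accounted for.
sse = ss_total - sst - ssb

print(round(ss_total, 2), round(sst, 2), round(ssb, 2), round(sse, 2))
```

Under these assumptions, most of the total variability in this sample is attributed to the GPA blocks, which is why matching can shrink the error term.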
