
ASSIGNMENT 1
MANOVA (Multivariate ANOVA)

Introduction

Multivariate analysis of variance (MANOVA) is simply an ANOVA with several dependent variables. That is to say, ANOVA tests for the difference in means between two or more groups, while MANOVA tests for the difference in two or more vectors of means. For example, we may conduct a study where we try two different textbooks, and we are interested in the students' improvements in math and physics. In that case, improvements in math and physics are the two dependent variables, and our hypothesis is that both together are affected by the difference in textbooks. A MANOVA could be used to test this hypothesis.

Instead of a univariate F value, we would obtain a multivariate F value (Wilks' lambda, Λ) based on a comparison of the error variance/covariance matrix and the effect variance/covariance matrix. Although we only mention Wilks' Λ here, there are other statistics that may be used, including Hotelling's trace and Pillai's criterion. The "covariance" here is included because the two measures are probably correlated and we must take this correlation into account when performing the significance test. Testing the multiple dependent variables is accomplished by creating new dependent variables that maximize group differences. These artificial dependent variables are linear combinations of the measured dependent variables.

Research Questions

The main objective in using MANOVA is to determine whether the response variables (student improvement in the example mentioned above) are altered by the observer's manipulation of the independent variables. Therefore, there are several types of research questions that may be answered by using MANOVA:
1) What are the main effects of the independent variables?
2) What are the interactions among the independent variables?
3) What is the importance of the dependent variables?
4) What is the strength of association between dependent variables?
5) What are the effects of covariates? How may they be utilized?
Results

If the overall multivariate test is significant, we conclude that the respective effect (e.g., textbook) is significant. However, our next question would of course be whether only math skills improved, only physics skills improved, or both. In fact, after obtaining a significant multivariate test for a particular main effect or interaction, customarily one would examine the univariate F tests for each variable to interpret the respective effect. In other words, one would identify the specific dependent variables that contributed to the significant overall effect.

MANOVA is useful in experimental situations where at least some of the independent variables are manipulated. It has several advantages over ANOVA. First, by measuring several dependent variables in a single experiment, there is a better chance of discovering which factor is truly important. Second, it can protect against Type I errors that might occur if multiple ANOVAs were conducted independently. Additionally, it can reveal differences not discovered by ANOVA tests.

However, there are several cautions as well. It is a substantially more complicated design than ANOVA, and therefore there can be some ambiguity about which independent variable affects each dependent variable. Thus, the observer must make many potentially subjective assumptions. Moreover, one degree of freedom is lost for each dependent variable that is added. The gain of power obtained from decreased SS error may be offset by the loss in these degrees of freedom. Finally, the dependent variables should be largely uncorrelated. If the dependent variables are highly correlated, there is little advantage in including more than one in the test given the resultant loss in degrees of freedom. Under these circumstances, use of a single ANOVA test would be preferable.

Assumptions

Normal Distribution: - The dependent variable should be normally distributed within groups. Overall, the F test is robust to non-normality if the non-normality is caused by skewness rather than by outliers. Tests for outliers should be run before performing a MANOVA, and outliers should be transformed or removed.

Linearity: - MANOVA assumes that there are linear relationships among all pairs of dependent variables, all pairs of covariates, and all dependent variable-covariate pairs in each cell. Therefore, when the relationship deviates from linearity, the power of the analysis will be compromised.

Homogeneity of Variances: - Homogeneity of variances assumes that the dependent variables exhibit equal levels of variance across the range of predictor variables. Remember that the error variance (SS error) is computed by adding up the sums of squares within each group. If the variances in the groups are different from each other, then adding them together is not appropriate, and will not yield an estimate of the common within-group variance. Homoscedasticity can be examined graphically or by means of a number of statistical tests.

Homogeneity of Variances and Covariances: - In multivariate designs, with multiple dependent measures, the homogeneity of variances assumption described earlier also applies. However, since there are multiple dependent variables, it is also required that their intercorrelations (covariances) are homogeneous across the cells of the design. There are various specific tests of this assumption.

Special Cases

Two special cases arise in MANOVA: the inclusion of within-subjects independent variables, and unequal sample sizes in cells.

Unequal sample sizes - As in ANOVA, when cells in a factorial MANOVA have different sample sizes, the sum of squares for effect plus error does not equal the total sum of squares. This causes tests of main effects and interactions to be correlated. SPSS offers an adjustment for unequal sample sizes in MANOVA.

Within-subjects design - Problems arise if the researcher measures several different dependent variables on different occasions. This situation can be viewed as a within-subjects independent variable with as many levels as occasions, or it can be viewed as separate dependent variables for each occasion. Tabachnick and Fidell (1996) provide examples and solutions for each situation. This situation often lends itself to the use of profile analysis, which is explained below.

Additional Limitations

o Outliers - Like ANOVA, MANOVA is extremely sensitive to outliers. Outliers may produce either a Type I or Type II error and give no indication as to which type of error is occurring in the analysis. There are several programs available to test for univariate and multivariate outliers.
o Multicollinearity and Singularity - When there is high correlation between dependent variables, one dependent variable becomes a near-linear combination of the other dependent variables. Under such circumstances, it would be statistically redundant and suspect to include both variables.

RAW DATA

Respondent   Sikap (attitude)   Sejarah (History)   Matematik (Mathematics)
1            1                  56                  67
2            3                  51                  56
3            1                  67                  65
4            2                  57                  77
5            2                  68                  87
6            1                  77                  66
7            3                  76                  89
8            1                  89                  67
9            2                  65                  76
10           1                  77                  56
11           3                  78                  71
12           1                  67                  72
13           2                  67                  61
14           2                  76                  80
15           3                  78                  59
16           1                  55                  78
17           1                  58                  65
18           1                  59                  78
19           2                  67                  71
20           3                  78                  62
21           3                  89                  59
22           3                  90                  63
23           3                  78                  85
24           1                  67                  47
25           2                  78                  56
26           3                  65                  77
27           2                  47                  87
28           3                  66                  80
29           1                  88                  66
30           2                  85                  85

RESULT OF DATA ANALYSIS FROM SPSS V.20


Between-Subjects Factors

SIKAP   Value Label                    N
1       Tidak Minat (Not Interested)   11
2       Tidak Pasti (Unsure)           9
3       Minat (Interested)             10

Descriptive Statistics

                 SIKAP         Mean    Std. Deviation   N
SKOR SEJARAH     Tidak Minat   69.09   12.194           11
                 Tidak Pasti   67.78   11.322           9
                 Minat         74.90   11.618           10
                 Total         70.63   11.758           30
SKOR MATEMATIK   Tidak Minat   66.09   8.882            11
                 Tidak Pasti   75.56   11.114           9
                 Minat         70.10   11.958           10
                 Total         70.27   10.998           30

The Descriptive Statistics table displays the mean test score for each attitude group on the History and Mathematics tests. As we are not interested in comparing History scores with Mathematics scores (i.e. the two dependent variables), they appear in separate rows. However, we are interested in comparing the scores from the three levels of attitude on each of the dependent variables. For example, in the History results the Interested (Minat) group has a higher mean (74.90) than the Not Interested (Tidak Minat) group (69.09) and the Unsure (Tidak Pasti) group (67.78). In the Mathematics results, the Unsure group (75.56) has a higher mean than the Interested group (70.10), while the Not Interested group (66.09) has the lowest mean.
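The group means in the table can be reproduced directly from the raw data. The following is a minimal sketch in plain Python, not the SPSS procedure itself; the variable names (`sikap`, `sejarah`, `matematik`) and the helper `group_summary` are illustrative choices, not part of the original analysis.

```python
from statistics import mean, stdev

# Raw data for respondents 1-30, copied from the RAW DATA table.
sikap = [1, 3, 1, 2, 2, 1, 3, 1, 2, 1, 3, 1, 2, 2, 3,
         1, 1, 1, 2, 3, 3, 3, 3, 1, 2, 3, 2, 3, 1, 2]
sejarah = [56, 51, 67, 57, 68, 77, 76, 89, 65, 77, 78, 67, 67, 76, 78,
           55, 58, 59, 67, 78, 89, 90, 78, 67, 78, 65, 47, 66, 88, 85]
matematik = [67, 56, 65, 77, 87, 66, 89, 67, 76, 56, 71, 72, 61, 80, 59,
             78, 65, 78, 71, 62, 59, 63, 85, 47, 56, 77, 87, 80, 66, 85]
labels = {1: "Tidak Minat", 2: "Tidak Pasti", 3: "Minat"}


def group_summary(scores, groups):
    """Return {group code: (mean, sample SD, n)} for each attitude level."""
    summary = {}
    for code in sorted(set(groups)):
        values = [s for s, g in zip(scores, groups) if g == code]
        summary[code] = (mean(values), stdev(values), len(values))
    return summary


history = group_summary(sejarah, sikap)
maths = group_summary(matematik, sikap)

for name, table in (("SKOR SEJARAH", history), ("SKOR MATEMATIK", maths)):
    for code, (m, sd, n) in table.items():
        print(f"{name:15s} {labels[code]:12s} mean={m:6.2f} sd={sd:6.3f} n={n}")
```

Comparing the printed means with the Descriptive Statistics table is a quick check that the raw data were entered correctly.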

Box's Test of Equality of Covariance Matrices(a)

Box's M   F      df1   df2         Sig.
1.382     .205   6     14929.054   .975

Tests the null hypothesis that the observed covariance matrices of the dependent variables are equal across groups.
a. Design: Intercept + SIKAP

Box's Test of Equality of Covariance Matrices checks the assumption of homogeneity of covariance matrices across the groups, using p < .001 as the criterion. Box's M (1.38) was not significant (p = .975 > .001), indicating that there are no significant differences between the covariance matrices. Therefore, the assumption is not violated and Wilks' Lambda is an appropriate test to use.

Multivariate Tests(a)

Effect                           Value     F             Hypothesis df   Error df   Sig.   Partial Eta Squared
Intercept   Pillai's Trace       .991      1411.149(b)   2.000           26.000     .000   .991
            Wilks' Lambda        .009      1411.149(b)   2.000           26.000     .000   .991
            Hotelling's Trace    108.550   1411.149(b)   2.000           26.000     .000   .991
            Roy's Largest Root   108.550   1411.149(b)   2.000           26.000     .000   .991
SIKAP       Pillai's Trace       .196      1.471         4.000           54.000     .224   .098
            Wilks' Lambda        .812      1.424(b)      4.000           52.000     .239   .099
            Hotelling's Trace    .220      1.376         4.000           50.000     .256   .099
            Roy's Largest Root   .145      1.962(c)      2.000           27.000     .160   .127

a. Design: Intercept + SIKAP
b. Exact statistic
c. The statistic is an upper bound on F that yields a lower bound on the significance level.

Above is the MANOVA using the Wilks' Lambda test. Using an alpha level of .05, we can see that this test is not significant, Wilks' Λ = .81, F(4, 52) = 1.42, p = .239, multivariate η² = .10. The non-significant F indicates that there are no significant differences among the students' attitude groups on a linear combination of the two dependent variables. The multivariate η² = .10 indicates that approximately 10% of the multivariate variance of the dependent variables is associated with the group factor.
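To make the Wilks' Lambda value less of a black box, the sketch below recomputes it from the raw data as the ratio of the determinant of the within-group (error) sums-of-squares-and-cross-products matrix to that of the total matrix. It is an illustration in plain Python under the assumption that SPSS uses this standard definition; the helper names `sscp` and `det2` are hypothetical.

```python
# Raw data for respondents 1-30, copied from the RAW DATA table.
sikap = [1, 3, 1, 2, 2, 1, 3, 1, 2, 1, 3, 1, 2, 2, 3,
         1, 1, 1, 2, 3, 3, 3, 3, 1, 2, 3, 2, 3, 1, 2]
sejarah = [56, 51, 67, 57, 68, 77, 76, 89, 65, 77, 78, 67, 67, 76, 78,
           55, 58, 59, 67, 78, 89, 90, 78, 67, 78, 65, 47, 66, 88, 85]
matematik = [67, 56, 65, 77, 87, 66, 89, 67, 76, 56, 71, 72, 61, 80, 59,
             78, 65, 78, 71, 62, 59, 63, 85, 47, 56, 77, 87, 80, 66, 85]


def sscp(pairs):
    """Return (Sxx, Sxy, Syy): the 2x2 SSCP matrix about the means of `pairs`."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxx, sxy, syy


def det2(m):
    """Determinant of the symmetric 2x2 matrix [[Sxx, Sxy], [Sxy, Syy]]."""
    sxx, sxy, syy = m
    return sxx * syy - sxy ** 2


pairs = list(zip(sejarah, matematik))
total = sscp(pairs)                      # deviations from the grand means

within = (0.0, 0.0, 0.0)                 # pooled deviations from group means
for code in sorted(set(sikap)):
    part = sscp([p for p, g in zip(pairs, sikap) if g == code])
    within = tuple(w + v for w, v in zip(within, part))

wilks = det2(within) / det2(total)       # Wilks' Lambda = |W| / |T|
s = min(2, len(set(sikap)) - 1)          # min(number of DVs, groups - 1)
partial_eta_sq = 1 - wilks ** (1 / s)    # multivariate effect size

print(f"Wilks' Lambda = {wilks:.3f}, partial eta squared = {partial_eta_sq:.3f}")
```

The diagonal entries of `total` and `within` are the Corrected Total and Error sums of squares that appear in the univariate follow-up table, which gives a second cross-check.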

Levene's Test of Equality of Error Variances(a)

                 F       df1   df2   Sig.
SKOR SEJARAH     .209    2     27    .813
SKOR MATEMATIK   1.549   2     27    .231

Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept + SIKAP

Levene's Test of Equality of Error Variances tests the assumption of MANOVA and ANOVA that the variance of each variable is equal across the groups. From the Levene's Test table, we can see that the assumption is met for both dependent variables (History score, p = .813 > .05, and Mathematics score, p = .231 > .05).
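For readers who want to see what the Levene statistic measures, here is a small sketch that treats it as a one-way ANOVA F ratio computed on absolute deviations from each group's mean. This is the mean-centred form of the test, which is assumed here to be what the SPSS table reports; `one_way_f` and `levene_f` are illustrative helper names.

```python
# Raw data for respondents 1-30, copied from the RAW DATA table.
sikap = [1, 3, 1, 2, 2, 1, 3, 1, 2, 1, 3, 1, 2, 2, 3,
         1, 1, 1, 2, 3, 3, 3, 3, 1, 2, 3, 2, 3, 1, 2]
sejarah = [56, 51, 67, 57, 68, 77, 76, 89, 65, 77, 78, 67, 67, 76, 78,
           55, 58, 59, 67, 78, 89, 90, 78, 67, 78, 65, 47, 66, 88, 85]
matematik = [67, 56, 65, 77, 87, 66, 89, 67, 76, 56, 71, 72, 61, 80, 59,
             78, 65, 78, 71, 62, 59, 63, 85, 47, 56, 77, 87, 80, 66, 85]


def one_way_f(groups):
    """F ratio MS(between) / MS(within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def levene_f(scores, codes):
    """Levene statistic: one-way F on |score - group mean|."""
    groups = [[s for s, c in zip(scores, codes) if c == code]
              for code in sorted(set(codes))]
    abs_dev = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    return one_way_f(abs_dev)


f_history = levene_f(sejarah, sikap)
f_maths = levene_f(matematik, sikap)
print(f"Levene F: History = {f_history:.3f}, Mathematics = {f_maths:.3f}")
```

The p values in the table would additionally require the F distribution with (2, 27) degrees of freedom, which is left out of this sketch.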

Tests of Between-Subjects Effects

Source            Dependent Variable   Type III Sum of Squares   df   Mean Square   F          Sig.   Partial Eta Squared
Corrected Model   SKOR SEJARAH         281.602(a)                2    140.801       1.020      .374   .070
                  SKOR MATEMATIK       443.835(b)                2    221.918       1.956      .161   .127
Intercept         SKOR SEJARAH         148486.679                1    148486.679    1075.596   .000   .976
                  SKOR MATEMATIK       148455.517                1    148455.517    1308.178   .000   .980
SIKAP             SKOR SEJARAH         281.602                   2    140.801       1.020      .374   .070
                  SKOR MATEMATIK       443.835                   2    221.918       1.956      .161   .127
Error             SKOR SEJARAH         3727.365                  27   138.051
                  SKOR MATEMATIK       3064.031                  27   113.483
Total             SKOR SEJARAH         153681.000                30
                  SKOR MATEMATIK       151630.000                30
Corrected Total   SKOR SEJARAH         4008.967                  29
                  SKOR MATEMATIK       3507.867                  29

a. R Squared = .070 (Adjusted R Squared = .001)
b. R Squared = .127 (Adjusted R Squared = .062)

Follow-up univariate ANOVAs (shown above) indicated that neither History score nor Mathematics score differed significantly among students with different attitudes, F(2, 27) = 1.02, p > .05, η² = .07 and F(2, 27) = 1.96, p > .05, η² = .13, respectively.

Based on the results of the multivariate ANOVA, we fail to reject our two null hypotheses: there is no significant difference in mean History score across attitude levels, and there is no significant difference in mean Mathematics score across attitude levels.

ASSIGNMENT 2
Simple Regression

Introduction

Regression analysis is used when two or more variables are thought to be systematically connected by a linear relationship. In simple regression, we have only two variables (let us designate them x and y) and we suppose that they are related by an expression of the form $y = \beta_0 + \beta_1 x + \varepsilon$. We'll leave aside for a moment the nature of the variable $\varepsilon$ and focus on the x-y relationship. $y = \beta_0 + \beta_1 x$ is the equation of a straight line; $\beta_0$ is the intercept (or constant) and $\beta_1$ is the x coefficient, which represents the slope of the straight line the equation describes.

To be concrete, suppose we are talking about the relation between air temperature and the drying time of paint. We know from experience that as x (temperature) increases, y (drying time) decreases, and we might suppose that the relationship is linear. But suppose that we need to know the exact nature of the relationship, so that we can predict drying time at various temperatures. How could we discover the actual values of $\beta_0$ and $\beta_1$? Well, to start with, we cannot discover the actual values. Note that $\beta_0$ and $\beta_1$ are Greek letters, indicating that these are parameters, and they are somewhat in the nature of population parameters, which can never be known exactly. What we can do is to get estimates of these parameters (let us call them $b_0$ and $b_1$, using Latin characters to indicate that these are statistics and only approximations of the real thing).

Assumptions of the Regression Model

o The relation between x and y is given by $y = \beta_0 + \beta_1 x + \varepsilon$.
o $\varepsilon$ is a random variable, which may have both positive and negative values.
o $\varepsilon$ is normally distributed.
o $E(\varepsilon) = 0$.
o The standard deviation of $\varepsilon$, $\sigma_{y \cdot x}$, is constant over the whole range of variation of x. This property is called homoscedasticity.
o Since $E(\varepsilon) = 0$, we're supposing that $E(y) = \beta_0 + \beta_1 x + E(\varepsilon) = \beta_0 + \beta_1 x$.

Finding the regression line

o The method of ordinary least squares (OLS): begin with assumed values for $b_0$ and $b_1$ and suppose that the relation between x and y is given by $y = b_0 + b_1 x$; some $b_0$'s and $b_1$'s will give us better fits than others.
o Let $\hat{y}_i = b_0 + b_1 x_i$ be the value of y estimated by the regression equation when x has the value $x_i$; then if $y_i$ is the actual value, $y_i - \hat{y}_i$ is called the residual or the error.
o Substituting, let $e_i = y_i - \hat{y}_i = y_i - b_0 - b_1 x_i$. Different $b_0$'s and $b_1$'s will cause each $e_i$ to have a different value.
o The residuals along the line marked A are larger than those along the line marked B, but the sum of deviations is always zero.
o Square each residual and define the sum of squared errors as $\sum_i (y_i - b_0 - b_1 x_i)^2$.
o x and y are data: the variables are $b_0$ and $b_1$, and choosing different values of these will change the size of the sum of squares.
o Minimizing the sum of squares with respect to $b_0$ and $b_1$, using minimization methods from differential calculus, gives unique values for the b's.
o Resulting formulas are rarely used explicitly anymore, but

RAW DATA

Student   Multiple-choice test   Standardized test
1         9                      155
2         7                      152
3         5                      150
4         6                      151
5         8                      151
6         3                      144
7         5                      149
8         2                      146
9         10                     155
10        6                      150

RESULT OF DATA ANALYSIS FROM SPSS V.20

SPSS has produced a line (the regression line, or line of best fit) that is as close as possible to all of the dots. The line was built using the statistical concept of least squares. This means that the line was drawn so as to minimize the sum of the squared vertical distances between the dots and the line.

Variables Entered/Removed(a)

Model   Variables Entered         Variables Removed   Method
1       Multiple-choice test(b)   .                   Enter

a. Dependent Variable: Standardized Test
b. All requested variables entered.

Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .940(a)   .883       .868                1.257

a. Predictors: (Constant), Multiple-choice test

The R Square value in the Model Summary table shows the amount of variance in the dependent variable that can be explained by the independent variable. The independent variable of Multiple-choice test score accounts for 88.3 percent of the variance in the score of the Standardized test. The R value (0.940) indicates that as Multiple-choice test score increases the Standardized test score also increases, and this is a positive correlation, with r= 0.940. This is statistically significant from the Pearson correlation output.
ANOVA(a)

Model            Sum of Squares   df   Mean Square   F        Sig.
1   Regression   95.460           1    95.460        60.419   .000(b)
    Residual     12.640           8    1.580
    Total        108.100          9

a. Dependent Variable: Standardized Test
b. Predictors: (Constant), Multiple-choice test
(F is the test statistic; Sig. is the p value.)

The ANOVA tests the significance of the regression model. From the above table, F(1, 8) = 60.419, p < 0.01, and we can therefore conclude that the regression is statistically significant.

Coefficients(a)

                           Unstandardized Coefficients   Standardized Coefficients
Model                      B         Std. Error          Beta                        t         Sig.
1   (Constant)             142.399   1.091                                           130.470   .000
    Multiple-choice test   1.295     .167                .940                        7.773     .000

a. Dependent Variable: Standardized Test

The Unstandardized Coefficients B column gives us the value of the intercept (from the Constant row) and the slope of the regression line (from the Multiple-choice test row). This gives us the following regression equation:

Standardized Test score = 142.399 + 1.295 × Multiple-choice test
Y = 142.40 + 1.30 X

The Standardized Coefficients Beta column informs us of the contribution that an individual variable makes to the model. From the above table we can see that the Multiple-choice test has a standardized coefficient of .940, which in simple regression equals Pearson's r. In unstandardized terms, each additional point on the Multiple-choice test is associated with a 1.295-point increase in the predicted Standardized test score. The t value for the Constant (t = 130.470, p < 0.01) tells us that the intercept is significantly different from zero. The t value for the Multiple-choice test (t = 7.773, p < 0.01) shows that the regression slope is significant.

A simple regression procedure was used to predict students' standardized test scores from their short multiple-choice test scores. A total of 10 subjects participated in the study. The simple regression analysis revealed that the short multiple-choice test predicted the standardized test scores, r² = .88 (adjusted r² = .87), F(1, 8) = 60.419, p < 0.01. The unstandardized and standardized regression equations are reported in Table 1. The regression coefficient was statistically significant (p < .01).
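The coefficients reported by SPSS can be checked by hand with the closed-form least-squares estimates, b1 = Sxy / Sxx and b0 = mean(y) - b1 × mean(x). The sketch below applies them to the ten students in the raw data; it is plain Python rather than SPSS syntax, and the variable names are illustrative.

```python
# Raw data for students 1-10, copied from the RAW DATA table.
x = [9, 7, 5, 6, 8, 3, 5, 2, 10, 6]                       # multiple-choice test
y = [155, 152, 150, 151, 151, 144, 149, 146, 155, 150]    # standardized test

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Sums of squares and cross-products about the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b1 = sxy / sxx                  # slope
b0 = mean_y - b1 * mean_x       # intercept

ss_regression = b1 * sxy        # variation explained by the line
ss_residual = syy - ss_regression
r_squared = ss_regression / syy

ms_residual = ss_residual / (n - 2)
f_stat = ss_regression / ms_residual        # regression df = 1
se_b1 = (ms_residual / sxx) ** 0.5          # standard error of the slope
t_b1 = b1 / se_b1

print(f"Y = {b0:.3f} + {b1:.3f} X")
print(f"R^2 = {r_squared:.3f}, F(1, {n - 2}) = {f_stat:.3f}, t = {t_b1:.3f}")
```

With a single predictor, the slope's t value squared equals the ANOVA F, which is why both tests lead to the same conclusion.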

Table 1
Unstandardized and Standardized Regression Equations for the Prediction of Standardized Test Scores from the Short Multiple-choice Test

                       Unstandardized Coefficients   Standardized Coefficients
                       B        SE                   Beta                        t
(Constant)             142.40   1.09                                             130.47**
Multiple-choice test   1.30     0.17                 0.94                        7.77**

RAW DATA

Student   Multiple-choice test   Standardized test
1         9                      155
2         7                      152
3         5                      150
4         6                      151
5         8                      151
6         3                      144
7         5                      149
8         2                      146
9         10                     155
10        6                      150
11        12                     159
12        11                     156
13        9                      154
14        7                      152
15        2                      146
16        10                     155
17        6                      150
18        12                     159
19        11                     156
20        8                      154
21        3                      141
22        5                      152
23        2                      140
24        8                      154
25        3                      143
26        11                     158
27        15                     168
28        14                     159
29        10                     156
30        11                     160

RESULT OF DATA ANALYSIS FROM SPSS V.20

SPSS has produced a line (the regression line, or line of best fit) that is as close as possible to all of the dots. The line was built using the statistical concept of least squares. This means that the line was drawn so as to minimize the sum of the squared vertical distances between the dots and the line.

Variables Entered/Removed(a)

Model   Variables Entered         Variables Removed   Method
1       Multiple-choice test(b)   .                   Enter

a. Dependent Variable: Standardized Test
b. All requested variables entered.

Model Summary(b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .946(a)   .894       .890                2.027

a. Predictors: (Constant), Multiple-choice test
b. Dependent Variable: Standardized Test

The R Square value in the Model Summary table shows the amount of variance in the dependent variable that can be explained by the independent variable. The independent variable of Multiple-choice test score accounts for 89.4 percent of the variance in the score of the Standardized test. The R value (0.946) indicates that as Multiple-choice test score increases the Standardized test score also increases, and this is a positive correlation, with r= 0.946. This is statistically significant from the Pearson correlation output.
ANOVA(a)

Model            Sum of Squares   df   Mean Square   F         Sig.
1   Regression   972.470          1    972.470       236.714   .000(b)
    Residual     115.030          28   4.108
    Total        1087.500         29

a. Dependent Variable: Standardized Test
b. Predictors: (Constant), Multiple-choice test
(F is the test statistic; Sig. is the p value.)

The ANOVA tests the significance of the regression model. From the above table, F(1, 28) = 236.714, p < 0.01, and we can therefore conclude that the regression is statistically significant.

Coefficients(a)

                           Unstandardized Coefficients   Standardized Coefficients
Model                      B         Std. Error          Beta                        t         Sig.
1   (Constant)             140.314   .874                                            160.506   .000
    Multiple-choice test   1.583     .103                .946                        15.386    .000

a. Dependent Variable: Standardized Test

The Unstandardized Coefficients B column gives us the value of the intercept (from the Constant row) and the slope of the regression line (from the Multiple-choice test row). This gives us the following regression equation:

Standardized Test score = 140.314 + 1.583 × Multiple-choice test
Y = 140.31 + 1.58 X

The Standardized Coefficients Beta column informs us of the contribution that an individual variable makes to the model. From the above table we can see that the Multiple-choice test has a standardized coefficient of .946, which in simple regression equals Pearson's r. In unstandardized terms, each additional point on the Multiple-choice test is associated with a 1.583-point increase in the predicted Standardized test score. The t value for the Constant (t = 160.506, p < 0.01) tells us that the intercept is significantly different from zero. The t value for the Multiple-choice test (t = 15.386, p < 0.01) shows that the regression slope is significant.

A simple regression procedure was used to predict students' standardized test scores from their short multiple-choice test scores. A total of 30 subjects participated in the study. The simple regression analysis revealed that the short multiple-choice test predicted the standardized test scores, r² = .894 (adjusted r² = .89), F(1, 28) = 236.714, p < 0.01. The unstandardized and standardized regression equations are reported in Table 1. The regression coefficient was statistically significant (p < .01).

Table 1
Unstandardized and Standardized Regression Equations for the Prediction of Standardized Test Scores from the Short Multiple-choice Test

                       Unstandardized Coefficients   Standardized Coefficients
                       B        SE                   Beta                        t
(Constant)             140.31   0.87                                             160.51**
Multiple-choice test   1.58     0.10                 0.95                        15.39**

ASSIGNMENT 3 & 4
ANOVA One Way and Two Way

ANOVA One Way

A One-Way Analysis of Variance is a way to test the equality of three or more means at one time by using variances.

Assumptions

o The populations from which the samples were obtained must be normally or approximately normally distributed.
o The samples must be independent.
o The variances of the populations must be equal.

Hypotheses

The null hypothesis is that all population means are equal; the alternative hypothesis is that at least one mean is different. In the following, lower case letters apply to the individual samples and capital letters apply to the entire set collectively. That is, n is one of many sample sizes, but N is the total sample size.

Grand Mean

The grand mean of a set of samples is the total of all the data values divided by the total sample size. This requires that you have all of the sample data available to you, which is usually the case, but not always. It turns out that all that is necessary to perform a one-way analysis of variance are the number of samples, the sample means, the sample variances, and the sample sizes.

Another way to find the grand mean is to find the weighted average of the sample means. The weight applied is the sample size.

Total Variation

The total variation (not variance) is the sum of the squares of the differences of each data value from the grand mean.

There is the between group variation and the within group variation. The whole idea behind the analysis of variance is to compare the ratio of between group variance to within group variance. If the variance caused by the interaction between the samples is much larger than the variance that appears within each group, then it is because the means aren't the same.

Between Group Variation

The variation due to the interaction between the samples is denoted SS(B) for Sum of Squares Between groups. If the sample means are close to each other (and therefore to the grand mean) this will be small. There are k samples involved with one data value for each sample (the sample mean), so there are k-1 degrees of freedom.

The variance due to the interaction between the samples is denoted MS(B) for Mean Square Between groups. This is the between group variation divided by its degrees of freedom. It is also denoted by s_b^2.

Within Group Variation

The variation due to differences within individual samples is denoted SS(W) for Sum of Squares Within groups. Each sample is considered independently; no interaction between samples is involved. The degrees of freedom is equal to the sum of the individual degrees of freedom for each sample. Since each sample has degrees of freedom equal to one less than its sample size, and there are k samples, the total degrees of freedom is k less than the total sample size: df = N - k.

The variance due to the differences within individual samples is denoted MS(W) for Mean Square Within groups. This is the within group variation divided by its degrees of freedom. It is also denoted by s_w^2. It is the weighted average of the variances (weighted with the degrees of freedom).

F test statistic

Recall that an F variable is the ratio of two independent chi-square variables divided by their respective degrees of freedom. Also recall that the F test statistic is the ratio of two sample variances; it turns out that's exactly what we have here. The F test statistic is found by dividing the between group variance by the within group variance. The degrees of freedom for the numerator are the degrees of freedom for the between group (k-1) and the degrees of freedom for the denominator are the degrees of freedom for the within group (N-k).
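The quantities described above can be collected into formulas. This is a reconstruction from the verbal definitions in the text, not a copy of the original figures; the symbols $\bar{x}_i$, $s_i^2$ and $n_i$ (mean, variance and size of sample i) and $\bar{x}_{GM}$ (grand mean) are notation assumed here for illustration.

```latex
\begin{align*}
\bar{x}_{GM} &= \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{N}, \\
SS(B) &= \sum_{i=1}^{k} n_i \,(\bar{x}_i - \bar{x}_{GM})^2, & MS(B) &= \frac{SS(B)}{k-1}, \\
SS(W) &= \sum_{i=1}^{k} (n_i - 1)\, s_i^2, & MS(W) &= \frac{SS(W)}{N-k}, \\
F &= \frac{MS(B)}{MS(W)}.
\end{align*}
```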

Summary Table

All of this sounds like a lot to remember, and it is. However, there is a table which makes things really nice.

Source    SS              df    MS              F
Between   SS(B)           k-1   SS(B) / (k-1)   MS(B) / MS(W)
Within    SS(W)           N-k   SS(W) / (N-k)
Total     SS(W) + SS(B)   N-1

Notice that each Mean Square is just the Sum of Squares divided by its degrees of freedom, and the F value is the ratio of the mean squares. Do not put the largest variance in the numerator, always divide the between variance by the within variance. If the between variance is smaller than the within variance, then the means are really close to each other and you will fail to reject the claim that they are all equal. The degrees of freedom of the F-test are in the same order they appear in the table.
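As a worked illustration of the summary table, the sketch below fills it in using only group sizes, means and standard deviations, which is all the text says a one-way ANOVA needs. The numbers are the SKOR SEJARAH (History) descriptives from Assignment 1; because they are rounded, the results agree with the SPSS output only approximately. The function name `anova_from_summary` is a hypothetical helper.

```python
def anova_from_summary(sizes, means, sds):
    """One-way ANOVA table entries from per-group n, mean and sample SD."""
    k = len(sizes)
    n_total = sum(sizes)
    # Grand mean as the weighted average of the sample means.
    grand = sum(n * m for n, m in zip(sizes, means)) / n_total
    ss_between = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(sizes, sds))
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "SS(B)": ss_between, "SS(W)": ss_within,
        "df": (df_between, df_within),
        "MS(B)": ms_between, "MS(W)": ms_within,
        "F": ms_between / ms_within,   # always between over within
    }


# History score descriptives by attitude group (Assignment 1).
table = anova_from_summary(
    sizes=[11, 9, 10],
    means=[69.09, 67.78, 74.90],
    sds=[12.194, 11.322, 11.618],
)
for key, value in table.items():
    print(key, value)
```

The F value from this sketch would still have to be compared with the critical value for (k-1, N-k) degrees of freedom before any decision about the null hypothesis is made.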

Decision Rule

The decision will be to reject the null hypothesis if the test statistic from the table is greater than the F critical value with k-1 numerator and N-k denominator degrees of freedom. If the decision is to reject the null, then at least one of the means is different. However, the ANOVA does not tell you where the difference lies. For this, you need another test, either the Scheffé or Tukey test.

ANOVA Two Way

The two-way analysis of variance is an extension to the one-way analysis of variance. There are two independent variables (hence the name two-way).

Assumptions

o The populations from which the samples were obtained must be normally or approximately normally distributed.
o The samples must be independent.
o The variances of the populations must be equal.
o The groups must have the same sample size.

Hypotheses

There are three sets of hypotheses with the two-way ANOVA. The null hypotheses for each of the sets are given below.
1. The population means of the first factor are equal. This is like the one-way ANOVA for the row factor.
2. The population means of the second factor are equal. This is like the one-way ANOVA for the column factor.
3. There is no interaction between the two factors. This is similar to performing a test for independence with contingency tables.

Factors

The two independent variables in a two-way ANOVA are called factors. The idea is that there are two variables, factors, which affect the dependent variable. Each factor will have two or more levels within it, and the degrees of freedom for each factor is one less than the number of levels.

Treatment Groups

Treatment groups are formed by making all possible combinations of the two factors. For example, if the first factor has 3 levels and the second factor has 2 levels, then there will be 3x2=6 different treatment groups. As an example, let's assume we're planting corn. The type of seed and type of fertilizer are the two factors we're considering in this example. This example has 15 treatment groups. There are 3-1=2 degrees of freedom for the type of seed, and 5-1=4 degrees of freedom for the type of fertilizer. There are 2*4 = 8 degrees of freedom for the interaction between the type of seed and type of fertilizer. The data that actually appear in the table are samples. In this case, 2 samples from each treatment group were taken.

             Fert I     Fert II   Fert III   Fert IV    Fert V
Seed A-402   106, 110   95, 100   94, 107    103, 104   100, 102
Seed B-894   110, 112   98, 99    100, 101   108, 112   105, 107
Seed C-952   94, 97     86, 87    98, 99     99, 101    94, 98

Main Effect

The main effect involves the independent variables one at a time. The interaction is ignored for this part. Just the rows or just the columns are used, not mixed. This is the part which is similar to the one-way analysis of variance. Each of the variances calculated to analyze the main effects is like the between variances.

Interaction Effect

The interaction effect is the effect that one factor has on the other factor. The degrees of freedom here is the product of the two degrees of freedom for each factor.

Within Variation

The within variation is the sum of squares within each treatment group. You have one less than the sample size (remember all treatment groups must have the same sample size for a two-way ANOVA) for each treatment group. The total number of treatment groups is the product of the number of levels for each factor. The within variance is the within variation divided by its degrees of freedom. The within group is also called the error.

F-Tests

There is an F-test for each of the hypotheses, and the F-test is the mean square for each main effect and the interaction effect divided by the within variance. The numerator degrees of freedom come from each effect, and the denominator degrees of freedom is the degrees of freedom for the within variance in each case.

Two-Way ANOVA Table

It is assumed that main effect A has a levels (and A = a-1 df), main effect B has b levels (and B = b-1 df), n is the sample size of each treatment, and N = abn is the total sample size. Notice the overall degrees of freedom is once again one less than the total sample size.

Source               SS              df                MS        F
Main Effect A        given           A, a-1            SS / df   MS(A) / MS(W)
Main Effect B        given           B, b-1            SS / df   MS(B) / MS(W)
Interaction Effect   given           A*B, (a-1)(b-1)   SS / df   MS(A*B) / MS(W)
Within               given           N - ab, ab(n-1)   SS / df
Total                sum of others   N - 1, abn - 1
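The table treats the sums of squares as given. As a hedged illustration of where they come from in a balanced design, the sketch below computes them for the seed and fertilizer data above using the usual equal-cell-size formulas. It is plain Python, the variable names are illustrative, and no critical values or p values are computed.

```python
# Yields for 3 seed types x 5 fertilizers, 2 observations per treatment group.
data = {
    "A-402": [[106, 110], [95, 100], [94, 107], [103, 104], [100, 102]],
    "B-894": [[110, 112], [98, 99], [100, 101], [108, 112], [105, 107]],
    "C-952": [[94, 97], [86, 87], [98, 99], [99, 101], [94, 98]],
}
cells = list(data.values())      # cells[i][j]: seed i, fertilizer j
a, b, n = len(cells), len(cells[0]), len(cells[0][0])


def mean(values):
    return sum(values) / len(values)


all_obs = [x for row in cells for cell in row for x in cell]
grand = mean(all_obs)
row_means = [mean([x for cell in row for x in cell]) for row in cells]
col_means = [mean([x for row in cells for x in row[j]]) for j in range(b)]

# Sums of squares for a balanced design (equal n in every treatment group).
ss_a = b * n * sum((m - grand) ** 2 for m in row_means)        # seed
ss_b = a * n * sum((m - grand) ** 2 for m in col_means)        # fertilizer
ss_cells = n * sum((mean(c) - grand) ** 2 for row in cells for c in row)
ss_ab = ss_cells - ss_a - ss_b                                 # interaction
ss_w = sum((x - mean(c)) ** 2 for row in cells for c in row for x in c)
ss_t = sum((x - grand) ** 2 for x in all_obs)

df_a, df_b, df_ab, df_w = a - 1, b - 1, (a - 1) * (b - 1), a * b * (n - 1)
ms_w = ss_w / df_w
f_a = (ss_a / df_a) / ms_w
f_b = (ss_b / df_b) / ms_w
f_ab = (ss_ab / df_ab) / ms_w

print(f"df: seed={df_a}, fertilizer={df_b}, interaction={df_ab}, within={df_w}")
print(f"F: seed={f_a:.2f}, fertilizer={f_b:.2f}, interaction={f_ab:.2f}")
```

The four component sums of squares add up to the total, and the degrees of freedom add up to N - 1, which mirrors the "sum of others" entry in the Total row.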

RAW DATA Sebuah firma computer bercadang memasarkan sejenis computer yang baru. Anda ditugaskan menentukan dua jenis skrin yang sesuai. Satu skrin menaip huruf berwarna oren pada latar belakang yang gelap sementara skrin yang satu lagi menaip huruf biru pada latar belakang yang cerah. Keperluan lain yang perlu disesuaikan adalah lampu floresen dan lampu mentol untuk kedua-dua skrin. Untuk tujuan itu anda merangka satu eksperimen melibatkan 20 orang dan dibahagikan kepada empat kumpulan yang berbeza. Sampel diuji dengan mengukur bilangan saat yang digunakan untuk setiap responden membaca 1000 perkataan. Datanya adalah seperti berikut.

Light colour   Fluorescent            Bulb
Blue           130 129 118 133 125    132 117 123 129 122
Orange         144 139 145 139 139    147 143 141 134 140

a. Identify which data are metric and which are non-metric.
Metric data: Number of seconds (interval / ratio)
Non-metric data: Light colour and lamp type

b. Determine the independent and dependent variables.
Dependent variable (DV): Reading speed
Independent variables (IV): Light colour (blue and orange); lamp type (fluorescent and bulb)

c. Propose hypotheses for the experiment and test them.
H0: There is no significant difference in mean reading speed between light colours.
H1: There is a significant difference in mean reading speed between light colours.

H0: There is no significant difference in mean reading speed between lamp types.
H1: There is a significant difference in mean reading speed between lamp types.

d. What is the result of the test?
e. Recommend an appropriate action for the firm based on the results of the statistical test.

RESULT OF DATA ANALYSIS FROM SPSS V.20


Between-Subjects Factors

Factor               Value   Label         N
Light colour (WL)    1       Blue          10
                     2       Orange        10
Lamp type (JL)       1       Fluorescent   10
                     2       Bulb          10

Descriptive Statistics
Dependent Variable: Number of seconds to read

Light colour   Lamp type     Mean     Std. Deviation   N
Blue           Fluorescent   127.00    5.788            5
               Bulb          124.60    5.941            5
               Total         125.80    5.673           10
Orange         Fluorescent   141.20    3.033            5
               Bulb          141.00    4.743            5
               Total         141.10    3.755           10
Total          Fluorescent   134.10    8.660           10
               Bulb          132.80   10.020           10
               Total         133.45    9.139           20

The Descriptive Statistics table displays the mean reading time under blue and orange light with the fluorescent lamp and the bulb. The Total rows show a large overall difference in reading time between blue and orange light (a mean of 125.80 seconds for blue light compared with 141.10 for orange light). Under orange light, reading time was almost the same with the fluorescent lamp (141.20) as with the bulb (141.00). Under blue light, however, reading time was lower with the bulb (124.60) than with the fluorescent lamp (127.00).

The Total rows at the bottom of the table give the overall reading time for each lamp type, irrespective of light colour. They show a small difference, with a longer mean reading time under the fluorescent lamp (134.10) than under the bulb (132.80). The standard deviations show that, comparing the two light colours overall, blue light has the greater spread of reading times (5.673) and orange light the more tightly clustered times (3.755). Within the blue light group, it is the bulb that shows the greatest spread of scores (5.941).
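The cell and marginal figures discussed above can be recomputed from the raw data. The following is a minimal sketch using only the Python standard library; the dictionary layout and the names `times`, `describe`, `cell_stats` and `colour_stats` are illustrative choices, not part of the SPSS output.

```python
from statistics import mean, stdev

# Reading times in seconds, keyed by (light colour, lamp type).
times = {
    ("Blue", "Fluorescent"): [130, 129, 118, 133, 125],
    ("Blue", "Bulb"): [132, 117, 123, 129, 122],
    ("Orange", "Fluorescent"): [144, 139, 145, 139, 139],
    ("Orange", "Bulb"): [147, 143, 141, 134, 140],
}


def describe(values):
    """Return (mean, sample standard deviation, n) for one group."""
    return mean(values), stdev(values), len(values)


# One entry per treatment group (cell).
cell_stats = {cell: describe(values) for cell, values in times.items()}

# Marginal statistics pool both lamp types within each light colour.
colour_stats = {
    colour: describe(
        [v for (c, _), values in times.items() if c == colour for v in values]
    )
    for colour in ("Blue", "Orange")
}

for cell, (m, sd, n) in cell_stats.items():
    print(cell, round(m, 2), round(sd, 3), n)
for colour, (m, sd, n) in colour_stats.items():
    print((colour, "Total"), round(m, 2), round(sd, 3), n)
```

`stdev` uses the n - 1 denominator, which is the convention behind the Std. Deviation column.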
Levene's Test of Equality of Error Variances(a)
Dependent Variable: Number of seconds to read

F      df1   df2   Sig.
.717   3     16    .556

Tests the null hypothesis that the error variance of the dependent variable is equal across groups. a. Design: Intercept + WL + JL + WL * JL

From this table we can see that the assumption of homogeneity of variances of the dependent variable across groups is met. We know this because the Sig. value (0.556) is greater than 0.05, the level we set for alpha. If the Sig. value had been less than 0.05, we would have concluded that the variances across groups were significantly different (unequal).
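The F value in the Levene table can be checked by hand. The sketch below assumes the mean-centred form of the test, that is, a one-way ANOVA F ratio computed on the absolute deviations of each score from its own group mean. The function name `levene_f` is hypothetical, and the Sig. value itself (which comes from the F distribution) is not computed here.

```python
from statistics import mean

# The four treatment groups (reading times in seconds).
groups = [
    [130, 129, 118, 133, 125],  # blue light, fluorescent lamp
    [132, 117, 123, 129, 122],  # blue light, bulb
    [144, 139, 145, 139, 139],  # orange light, fluorescent lamp
    [147, 143, 141, 134, 140],  # orange light, bulb
]


def levene_f(groups):
    """Mean-centred Levene statistic: ANOVA F on |y - group mean|."""
    z = [[abs(y - mean(g)) for y in g] for g in groups]
    k = len(z)
    n_total = sum(len(g) for g in z)
    grand = mean([v for g in z for v in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in z)
    ss_within = sum((v - mean(g)) ** 2 for g in z for v in g)
    df1, df2 = k - 1, n_total - k
    return (ss_between / df1) / (ss_within / df2), df1, df2


f_value, df1, df2 = levene_f(groups)
print(round(f_value, 3), df1, df2)
```

The degrees of freedom follow from four groups and 20 observations, matching df1 = 3 and df2 = 16 in the table.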
Tests of Between-Subjects Effects
Dependent Variable: Number of seconds to read

Source            Type III Sum of Squares   df   Mean Square   F           Sig.
Corrected Model   1184.950(a)                3   394.983       15.721      .000
Intercept         356178.050                 1   356178.050    14176.241   .000
WL                1170.450                   1   1170.450      46.585      .000
JL                8.450                      1   8.450         .336        .570
WL * JL           6.050                      1   6.050         .241        .630
Error             402.000                   16   25.125
Total             357765.000                20
Corrected Total   1586.950                  19

a. R Squared = .747 (Adjusted R Squared = .699)
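The sums of squares and F ratios for WL, JL and WL * JL can be reproduced from the raw data. This is a minimal sketch for a balanced two-factor design, not the procedure SPSS runs internally; the variable names are illustrative, and WL and JL are assumed to stand for light colour and lamp type.

```python
from statistics import mean

# Reading times in seconds for each (light colour, lamp type) cell.
cells = {
    ("Blue", "Fluorescent"): [130, 129, 118, 133, 125],
    ("Blue", "Bulb"): [132, 117, 123, 129, 122],
    ("Orange", "Fluorescent"): [144, 139, 145, 139, 139],
    ("Orange", "Bulb"): [147, 143, 141, 134, 140],
}
colours = ["Blue", "Orange"]
lamps = ["Fluorescent", "Bulb"]
a, b, n = len(colours), len(lamps), 5  # levels of each factor, cell size

grand = mean([y for ys in cells.values() for y in ys])
cell_mean = {cell: mean(ys) for cell, ys in cells.items()}
colour_mean = {
    c: mean([y for lamp in lamps for y in cells[(c, lamp)]]) for c in colours
}
lamp_mean = {
    lamp: mean([y for c in colours for y in cells[(c, lamp)]]) for lamp in lamps
}

# Sums of squares for the two main effects and the interaction.
ss = {
    "WL": b * n * sum((colour_mean[c] - grand) ** 2 for c in colours),
    "JL": a * n * sum((lamp_mean[lamp] - grand) ** 2 for lamp in lamps),
    "WL * JL": n * sum(
        (cell_mean[(c, lamp)] - colour_mean[c] - lamp_mean[lamp] + grand) ** 2
        for c in colours
        for lamp in lamps
    ),
}
df = {"WL": a - 1, "JL": b - 1, "WL * JL": (a - 1) * (b - 1)}

# Within (error) variation and its mean square.
ss_error = sum((y - cell_mean[cell]) ** 2 for cell, ys in cells.items() for y in ys)
df_error = a * b * (n - 1)
ms_error = ss_error / df_error

# Each F ratio is the effect mean square over the error mean square.
f_ratio = {source: (ss[source] / df[source]) / ms_error for source in ss}

for source in ss:
    print(source, round(ss[source], 3), df[source], round(f_ratio[source], 3))
print("Error", round(ss_error, 3), df_error, round(ms_error, 3))
```

Because the design is balanced, these sequentially computed sums of squares agree with the Type III values in the table.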

We can see from the Tests of Between-Subjects Effects table that there is no significant main effect of lamp type, F(1, 16) = 0.336, p = .570. This was expected from the earlier discussion of the descriptive statistics. There is a significant main effect of light colour, i.e. of whether participants were reading under blue or orange light, F(1, 16) = 46.585, p < .001.

The chart plots the cell means. The two lines are close to parallel, in keeping with the non-significant interaction between light colour and lamp type in the Tests of Between-Subjects Effects table, F(1, 16) = 0.241, p = .630. Orange light gave the longest reading times under both lamp types, and these were almost equal (141.20 seconds with the fluorescent lamp, 141.00 with the bulb). Blue light gave shorter reading times, slightly shorter with the bulb (124.60) than with the fluorescent lamp (127.00).

ASSIGNMENT 5 & 6 Correlation Pearson, Spearman & Scatter Plot

Introduction

Often several quantitative variables are measured on each member of a sample. If we consider a pair of such variables, it is frequently of interest to establish whether there is a relationship between the two, i.e. to see if they are correlated. We can categorise the type of correlation by considering what happens to one variable as the other increases:
Positive correlation: the other variable has a tendency to also increase;
Negative correlation: the other variable has a tendency to decrease;
No correlation: the other variable does not tend to either increase or decrease.

The starting point of any such analysis should thus be the construction and subsequent examination of a scatterplot. Examples of negative, no and positive correlation are as follows.

Correlation coefficient

Pearson's correlation coefficient is a statistical measure of the strength of a linear relationship between paired data. In a sample it is denoted by r and is by design constrained as follows:

-1 ≤ r ≤ 1

Furthermore:
Positive values denote positive linear correlation;
Negative values denote negative linear correlation;
A value of 0 denotes no linear correlation;
The closer the value is to 1 or -1, the stronger the linear correlation.

In the figures various samples and their corresponding sample correlation coefficient values are presented. The first three represent the extreme correlation values of -1, 0 and 1:

When r = 1 or r = -1 we say we have perfect correlation, with the points lying on a perfect straight line. Invariably what we observe in a sample are values as follows:

Note:
1) The correlation coefficient does not relate to the gradient beyond sharing its +ve or -ve sign!
2) The correlation coefficient is a measure of linear relationship, and thus a value of r = 0 does not imply there is no relationship between the variables. For example, in the following scatterplot r = 0, which implies no (linear) correlation; however, there is a perfect quadratic relationship:
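Both notes can be illustrated numerically. The sketch below uses a small made-up data set (five symmetric x values, chosen only for illustration) and a hypothetical helper `pearson_r` built from the usual definition of r as the cross-product sum divided by the square root of the product of the two sums of squares.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from centred sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]

# Two straight lines with different gradients: r is 1 for both.
r_shallow = pearson_r(x, [0.5 * v + 1 for v in x])
r_steep = pearson_r(x, [10 * v + 1 for v in x])

# A perfect quadratic relationship: r is 0 although y depends on x.
r_quadratic = pearson_r(x, [v ** 2 for v in x])

print(r_shallow, r_steep, r_quadratic)
```

The two linear cases differ twentyfold in gradient yet share the same r, while the quadratic case has no linear correlation at all.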

Correlation is an effect size, and so we can verbally describe the strength of the correlation using the guide that Evans (1996) suggests for the absolute value of r:
.00-.19 very weak
.20-.39 weak
.40-.59 moderate
.60-.79 strong
.80-1.0 very strong

For example, a positive correlation value between .40 and .59 would be a moderate positive correlation.
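The guide translates directly into a small lookup. This is a sketch only: the function name `evans_label` is hypothetical, and because the published bands leave gaps (for example between .19 and .20), the cut-offs are assumed here to fall at .20, .40, .60 and .80.

```python
def evans_label(r):
    """Verbal strength of a correlation, following the Evans (1996) bands.

    Assumption: band boundaries are taken as .20, .40, .60 and .80.
    """
    strength = abs(r)
    if strength > 1:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if strength < 0.20:
        return "very weak"
    if strength < 0.40:
        return "weak"
    if strength < 0.60:
        return "moderate"
    if strength < 0.80:
        return "strong"
    return "very strong"


for value in (0.134, -0.45, 0.621):
    print(value, evans_label(value))
```

The sign of r is ignored when choosing the label, so the direction (positive or negative) has to be reported separately.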

Assumptions

The calculation of Pearson's correlation coefficient and the subsequent significance testing of it require the following data assumptions to hold:
interval or ratio level;
linearly related;
bivariate normally distributed.

In practice the last assumption is checked by requiring both variables to be individually normally distributed (which is a by-product consequence of bivariate normality). Pragmatically, Pearson's correlation coefficient is sensitive to skewed distributions and outliers, so if the data show neither we are content. If your data do not meet the above assumptions, then use Spearman's rank correlation!
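As a rough illustration of the fallback, Spearman's rank correlation can be obtained by ranking each variable and applying Pearson's formula to the ranks. The sketch below assumes that approach, with tied values sharing the average of their rank positions; the helper names and the toy data are illustrative, not taken from the assignment.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson's r from centred sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho as Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Monotonic but non-linear toy data: rho is 1, Pearson's r is below 1.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, round(r, 3))
```

Because only the ordering matters, rho is unaffected by the curvature that pulls Pearson's r below 1.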

RAW DATA

The state of Vermont is divided into 10 Health Planning Districts, which correspond roughly to counties. The following data for 2010 represent the percentage of births of babies under 2500 grams (Y), the fertility rate for females younger than 18 or older than 34 years of age (X1) and the percentage of births to unmarried mothers (X2) for each district.

District   Y     X1     X2
1          6.1   43.0    9.2
2          7.1   55.3   12.0
3          7.4   48.5   10.4
4          6.3   38.8    9.8
5          6.5   46.2    9.8
6          5.7   39.9    7.7
7          6.6   43.1   10.9
8          8.1   48.5    9.5
9          6.3   40.0   11.6
10         6.9   56.7   11.6

a. Make a scatter diagram of (i) Y and X1 and (ii) Y and X2.
b. Calculate the correlation between (i) Y and X1 and (ii) Y and X2.
c. Use a t-test to test H0: ρ = 0 for the answers to b(i) and b(ii).

Scatter diagram of (i) Y and X1

Correlations

                                                    % of births of babies   Fertility rate (females
                                                    under 2500 grams        under 18 or over 34)
% of births of babies        Pearson Correlation    1                       .621
under 2500 grams             Sig. (2-tailed)                                .055
                             N                      10                      10
Fertility rate (females      Pearson Correlation    .621                    1
under 18 or over 34)         Sig. (2-tailed)        .055
                             N                      10                      10

r = 0.621 (the Pearson correlation reported in the table above; ΣY = 67 and ΣX1 = 460)

On the Evans guide, the percentage of births of babies under 2500 grams (Y) has a strong positive correlation with the fertility rate for females younger than 18 or older than 34 years of age (X1), although with p = .055 the correlation is not significant at the 0.05 level.
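As a check on the coefficient for Y and X1, r can be worked from the centred sums of squares and cross-products. The figures below were recomputed from the raw data table (with means 46.0 for X1 and 6.7 for Y) and are not the intermediate values of the original hand calculation; the S notation is introduced here for illustration.

```latex
\begin{align*}
S_{x_1 y}   &= \sum_{i=1}^{10} (x_{1i} - \bar{x}_1)(y_i - \bar{y}) = 24.54 \\
S_{x_1 x_1} &= \sum_{i=1}^{10} (x_{1i} - \bar{x}_1)^2 = 355.98 \\
S_{yy}      &= \sum_{i=1}^{10} (y_i - \bar{y})^2 = 4.38 \\
r &= \frac{S_{x_1 y}}{\sqrt{S_{x_1 x_1}\, S_{yy}}}
   = \frac{24.54}{\sqrt{355.98 \times 4.38}} \approx 0.621
\end{align*}
```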

Scatter diagram of (ii) Y and X2

Correlations (Spearman's rho)

                                                       District   % of births of babies under 2500g
District                     Correlation Coefficient   1.000      .134
                             Sig. (2-tailed)           .          .713
                             N                         10         10
% of births of babies        Correlation Coefficient   .134       1.000
under 2500g                  Sig. (2-tailed)           .713       .
                             N                         10         10

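The hand calculation of r uses ΣY = 67 and ΣX2 = 102.5.

From the coefficient reported in the table above (.134, p = .713), the percentage of births of babies under 2500 grams (Y) and the percentage of births to unmarried mothers (X2) for each district have a very weak correlation. Note that the table pairs Y with District rather than with X2.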
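Since the SPSS table above reports Spearman's rho for District and Y, it may help to compute Pearson's r for Y with X1 and with X2 directly from the raw data. The following is a minimal sketch; `pearson_r` is a hypothetical helper based on the standard definition, and the results are a recomputation rather than figures from the original output.

```python
from math import sqrt

# Raw data for the 10 Health Planning Districts.
y = [6.1, 7.1, 7.4, 6.3, 6.5, 5.7, 6.6, 8.1, 6.3, 6.9]           # % births under 2500 g
x1 = [43.0, 55.3, 48.5, 38.8, 46.2, 39.9, 43.1, 48.5, 40.0, 56.7]  # fertility rate
x2 = [9.2, 12.0, 10.4, 9.8, 9.8, 7.7, 10.9, 9.5, 11.6, 11.6]       # % births to unmarried mothers


def pearson_r(x, y):
    """Pearson's r from centred sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


r_y_x1 = pearson_r(x1, y)
r_y_x2 = pearson_r(x2, y)
print(round(r_y_x1, 3), round(r_y_x2, 3))
```

On this recomputation r for Y and X1 is about 0.62, agreeing with the Pearson table, and r for Y and X2 is about 0.35, which would fall in the weak band of the Evans guide rather than the very weak one.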

Use a t-test to test H0: ρ = 0 for the answers to b(i) and b(ii)

T-Test
One-Sample Statistics

                                    N    Mean     Std. Deviation   Std. Error Mean
fertility rate 18 to 34             10   46.000   6.2891           1.9888
% of births of babies under 2500g   10   6.700    .6976            .2206

One-Sample Test (Test Value = 0)

                                                                              95% Confidence Interval of the Difference
                                    t        df   Sig. (2-tailed)   Mean Difference   Lower    Upper
fertility rate 18 to 34             23.130   9    .000              46.0000           41.501   50.499
% of births of babies under 2500g   30.371   9    .000              6.7000            6.201    7.199

T-Test
One-Sample Statistics

                                    N    Mean     Std. Deviation   Std. Error Mean
% of births of babies under 2500g   10   6.700    .6976            .2206
% of births to unmarried mothers    10   10.250   1.3218           .4180

One-Sample Test (Test Value = 0)

                                                                              95% Confidence Interval of the Difference
                                    t        df   Sig. (2-tailed)   Mean Difference   Lower    Upper
% of births of babies under 2500g   30.371   9    .000              6.7000            6.201    7.199
% of births to unmarried mothers    24.522   9    .000              10.2500           9.304    11.196
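The One-Sample Test tables above test whether each variable's mean is zero. A test of H0: ρ = 0 is usually based instead on t = r√(n - 2) / √(1 - r²) with n - 2 degrees of freedom. The sketch below applies that statistic to the Pearson coefficient reported for Y and X1 (r = .621, n = 10); the function name is hypothetical, and the decision uses the tabulated two-tailed 5% critical value for 8 degrees of freedom, 2.306, rather than an exact p-value.

```python
from math import sqrt


def correlation_t(r, n):
    """t statistic for H0: rho = 0, referred to t with n - 2 df."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


# Two-tailed 5% critical value of t with 8 degrees of freedom (from tables).
T_CRIT_8DF = 2.306

# b(i): Pearson r for Y and X1 as reported in the Correlations table.
t_x1 = correlation_t(0.621, 10)
reject_x1 = abs(t_x1) > T_CRIT_8DF

print(round(t_x1, 2), reject_x1)
```

A t value below the critical value means H0 is not rejected at the 0.05 level, which is consistent with the Sig. (2-tailed) value of .055 in the Pearson table. The same function can be applied to the coefficient for Y and X2.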
