
MULTIVARIATE ANALYSIS

Multivariate analysis (MVA) is a statistical procedure for analyzing data that involve more than one type of measurement or observation. It may also mean solving problems in which more than one dependent variable is analyzed simultaneously with other variables. MVA is based on the principles of multivariate statistics, which involve observing and analyzing more than one statistical outcome variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest.

Multivariate Analysis Techniques


Multiple Regression Analysis

Multiple regression is the most commonly used multivariate technique. It examines the relationship between a single metric dependent variable and two or more metric independent variables. The technique finds the linear relationship with the lowest sum of squared residuals; therefore, the assumptions of normality, linearity, and equal variance should be checked carefully. The beta coefficients (weights) are the marginal impacts of each variable, and the size of each weight can be interpreted directly. Multiple regression is often used as a forecasting tool.

Discriminant Analysis

The purpose of discriminant analysis is to correctly classify observations or people into homogeneous groups. The independent variables must be metric and must have a high degree of normality. Discriminant analysis builds a linear discriminant function, which can then be used to classify the observations. The overall fit is assessed by looking at the degree to which the group means differ (Wilks' lambda or Mahalanobis D²) and how well the model classifies. To determine which variables have the most impact on the discriminant function, one can examine the partial F values: the higher the partial F, the more impact that variable has on the discriminant function. This tool helps categorize people, such as buyers and nonbuyers.
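As a minimal sketch of the multiple-regression idea above, the following fits a least-squares line through made-up data in which the true relationship is y = 1 + 2·x1 + 3·x2, so the recovered intercept and beta weights should match those values. The variables and figures are purely illustrative, not from the text:

```python
import numpy as np

# Hypothetical data: two metric predictors and one metric response,
# generated so that y = 1 + 2*x1 + 3*x2 holds exactly.
X = np.array([[1.0, 2.0], [2.0, 1.5], [3.0, 1.0], [4.0, 0.5], [5.0, 0.2]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Add an intercept column, then solve for the least-squares coefficients,
# i.e. the linear fit with the lowest sum of squared residuals.
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

residual_ss = np.sum((y - A @ beta) ** 2)
print(beta)  # -> approximately [1.0, 2.0, 3.0]
```

Because the beta weights are marginal impacts, beta[1] here says how much y changes per unit change in x1, holding x2 fixed.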

Multivariate Analysis of Variance (MANOVA)

This technique examines the relationship between several categorical independent variables and two or more metric dependent variables. Whereas analysis of variance (ANOVA) assesses the differences between groups (using t-tests for two means and F tests for three or more means), MANOVA examines the dependence relationship between a set of dependent measures across a set of groups. This analysis is typically used in experimental designs, usually with a hypothesized relationship between the dependent measures. The technique is slightly different in that the independent variables are categorical and the dependent variables are metric.

Factor Analysis

When there are many variables in a research design, it is often helpful to reduce them to a smaller set of factors. This is an interdependence technique: there is no dependent variable; rather, the researcher is looking for the underlying structure of the data matrix. Ideally, the independent variables are normal and continuous, with at least three to five variables loading onto each factor. There are two main factor analysis methods: common factor analysis, which extracts factors based on the variance shared by the variables, and principal component analysis, which extracts factors based on the total variance. Common factor analysis is used to look for the latent (underlying) factors, whereas principal component analysis is used to find the fewest components that explain the most variance. The first factor extracted explains the most variance.

Cluster Analysis

The purpose of cluster analysis is to reduce a large data set to meaningful subgroups of individuals or objects. The division is made on the basis of the similarity of the objects across a set of specified characteristics. Outliers are a problem with this technique and are often caused by too many irrelevant variables. The sample should be representative of the population, and it is desirable to have uncorrelated variables. There are three main clustering methods: hierarchical, a treelike process appropriate for smaller data sets; nonhierarchical, which requires the number of clusters to be specified a priori; and a combination of both. There are four main rules for developing clusters: the clusters should be different, they should be reachable, they should be measurable, and they should be profitable (big enough to matter). This is a great tool for market segmentation.

Conjoint Analysis

Conjoint analysis is often referred to as trade-off analysis, since it allows objects and the various levels of their attributes to be evaluated together. It is both a compositional technique and a dependence technique, in that a level of preference for a combination of attributes and levels is developed. A part-worth, or utility, is calculated for each level of each attribute, and the part-worths of a combination of attribute levels are summed to give the overall preference for that combination. Models can be built to identify the ideal levels and combinations of attributes for products and services.
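The principal-component idea described above can be sketched in a few lines. Since the text gives no data, the example below uses synthetic data: four variables that all share one underlying source of variance, so the first component should explain nearly all of the total variance (all names and figures here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "survey": 100 respondents, 4 variables driven by one latent factor.
base = rng.normal(size=(100, 1))
data = np.hstack([base + 0.1 * rng.normal(size=(100, 1)) for _ in range(4)])

# Centre the data and eigendecompose the covariance matrix; the eigenvalues,
# sorted descending, give the variance explained by each principal component.
centred = data - data.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centred, rowvar=False))
explained = np.sort(eigvals)[::-1] / eigvals.sum()
print(explained)  # the first component dominates, as expected
```

This matches the text's point that principal component analysis finds the fewest components that explain the most variance: here one component captures almost everything, because the four variables are really one latent factor plus noise.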

Z-TEST
A Z-test is any statistical test for which the distribution of the test statistic under the null hypothesis can be approximated by a normal distribution. Because of the central limit theorem, many test statistics are approximately normally distributed for large samples. For each significance level, the Z-test has a single critical value (for example, 1.96 for 5% two-tailed), which makes it more convenient than the Student's t-test, which has a separate critical value for each sample size. Therefore, many statistical tests can be conveniently performed as approximate Z-tests if the sample size is large or the population variance is known. If the population variance is unknown (and therefore has to be estimated from the sample itself) and the sample size is not large (n < 30), the Student's t-test may be more appropriate. The formula for calculating the z test:

z = (x̄ − μ) / (σ / √n)

where x̄ is the sample mean, μ is a specified value to be tested, σ is the population standard deviation, and n is the size of the sample. For example, to test a specified value of 5, the hypotheses would be H0: μ = 5 against Ha: μ > 5.

Example
How many subjects will be needed to find the average age of students at Fisher College plus or minus a year, with a 95 percent confidence level and a population standard deviation of 3.5?

Solving the margin-of-error formula E = zσ/√n for n gives n = (zσ/E)² = (1.96 × 3.5 / 1)² ≈ 47.06.

Interpretation: Rounding up, a sample of 48 students would be sufficient to determine the students' mean age plus or minus one year. Note that the confidence interval width is always double the plus-or-minus figure.
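The sample-size calculation can be checked directly in Python, using the figures from the example (1.96 is the two-tailed 95% critical value quoted earlier in the section):

```python
import math

# Sample size needed to estimate a mean to within +/- E at 95% confidence:
# n = (z * sigma / E) ** 2, rounded up to the next whole subject.
z = 1.96       # two-tailed critical value for 95% confidence
sigma = 3.5    # population standard deviation of student ages
E = 1.0        # desired margin of error (+/- one year)

n = math.ceil((z * sigma / E) ** 2)
print(n)  # -> 48
```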

T-TEST
A t-test is a statistical examination of two population means. A two-sample t-test examines whether two samples are different; it is commonly used when the variances of the two normal distributions are unknown and the experiment uses a small sample size. The formula for calculating the two-sample t statistic:

t = (Mx − My) / √{ [ (Σ(x − Mx)² + Σ(y − My)²) / (nx + ny − 2) ] × (1/nx + 1/ny) }

where x and y are the individual scores, Mx and My are the group means, and nx and ny are the numbers of scores in each group.

Example
Sam, a sleep researcher, hypothesizes that people who are allowed to sleep for only four hours will score significantly lower than people who are allowed to sleep for eight hours on a cognitive skills test. He brings sixteen participants into his sleep lab and randomly assigns them to one of two groups. In one group he has participants sleep for eight hours, and in the other group he has them sleep for four. The next morning he administers the SCAT (Sam's Cognitive Ability Test) to all participants. (Scores on the SCAT range from 1-9, with high scores representing better performance.)

SCAT scores:

X (8 hours)   (x − Mx)²   Y (4 hours)   (y − My)²
5             0           8             16
7             4           1             9
5             0           4             0
3             4           6             4
5             0           6             4
3             4           4             0
3             4           1             9
9             16          2             4

Σx = 40, Mx = 5, Σ(x − Mx)² = 32; Σy = 32, My = 4, Σ(y − My)² = 46

t = (5 − 4) / √{ [ (32 + 46) / 14 ] × (1/8 + 1/8) } ≈ 0.85

*(According to the t significance/probability table with df = 14, t must be at least 2.145 to reach p < .05, so this difference is not statistically significant.)
Interpretation: Sam's hypothesis was not confirmed. He did not find a significant difference in cognitive test performance between those who slept for four hours and those who slept for eight hours.
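Sam's pooled-variance two-sample t-test can be reproduced with plain Python (standard library only), using the SCAT scores from the example:

```python
import math

x = [5, 7, 5, 3, 5, 3, 3, 9]   # 8-hour sleep group
y = [8, 1, 4, 6, 6, 4, 1, 2]   # 4-hour sleep group

mx = sum(x) / len(x)           # Mx = 5
my = sum(y) / len(y)           # My = 4
ssx = sum((v - mx) ** 2 for v in x)   # 32
ssy = sum((v - my) ** 2 for v in y)   # 46

# Pooled-variance two-sample t statistic.
df = len(x) + len(y) - 2       # 14
pooled = (ssx + ssy) / df
t = (mx - my) / math.sqrt(pooled * (1 / len(x) + 1 / len(y)))
print(round(t, 2))  # -> 0.85, well below the 2.145 critical value at df = 14
```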

CHI-SQUARE TEST
Chi-square is a statistical test commonly used to compare observed data with the data we would expect to obtain according to a specific hypothesis. For example, if, according to Mendel's laws, you expected 10 of 20 offspring from a cross to be male and the actual observed number was 8 males, then you might want to know about the "goodness of fit" between the observed and expected. Were the deviations (differences between observed and expected) the result of chance, or were they due to other factors? How much deviation can occur before you, the investigator, must conclude that something other than chance is at work, causing the observed to differ from the expected? The chi-square test always tests what scientists call the null hypothesis, which states that there is no significant difference between the expected and observed results.

The formula for calculating chi-square:

χ² = Σ (O − E)² / E

Example

Thai, the manager of a car dealership, did not want to stock cars that were bought less frequently because of their unpopular color. The five colors that she ordered were red, yellow, green, blue, and white. According to Thai, the expected frequencies, or number of customers choosing each color, should follow last year's percentages: she felt 20% would choose yellow, 30% would choose red, 10% would choose green, 10% would choose blue, and 30% would choose white. She then took a random sample of 150 customers and asked them their color preferences. The results of this poll are shown in Table 1 under the column labeled "observed frequencies."

Table 1 - Color Preference for 150 Customers for Thai's Superior Car Dealership

Color    Observed Frequencies   Expected Frequencies
Yellow   35                     30
Red      50                     45
Green    30                     15
Blue     10                     15
White    25                     45

We are now ready to use our formula for χ² and find out whether there is a significant difference between the observed and expected frequencies for the customers in choosing cars. We will set up a worksheet; then you can follow the directions to form the columns and solve the formula.

Category   O    E    (O − E)   (O − E)²   (O − E)²/E
Yellow     35   30   5         25         0.83
Red        50   45   5         25         0.56
Green      30   15   15        225        15.00
Blue       10   15   -5        25         1.67
White      25   45   -20       400        8.89

χ² = 26.95

The table value for chi-square at 4 df and the P = .05 level of significance is 9.49. If the calculated chi-square value for the set of data you are analyzing (26.95) is equal to or greater than the table value (9.49), reject the null hypothesis: there IS a significant difference between the data sets that cannot be due to chance alone. If the calculated value is LESS than the table value, then any differences are probably due to chance alone.

Interpretation: In this situation, the rejection of the null hypothesis means that the differences between the expected frequencies (based upon last year's car sales) and the observed frequencies (based upon this year's poll taken by Thai) are not due to chance. That is, they are not due to chance variation in the sample Thai took; there is a real difference between them. Therefore, in deciding what color autos to stock, it would be to Thai's advantage to pay careful attention to the results of her poll!
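The worksheet arithmetic can be verified with a few lines of Python, using the observed and expected frequencies from Table 1 (standard library only; note the rounding caveat in the comment):

```python
observed = [35, 50, 30, 10, 25]   # yellow, red, green, blue, white
expected = [30, 45, 15, 15, 45]   # from last year's percentages of 150

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# The worksheet total of 26.95 comes from summing the already-rounded
# column entries (0.83 + 0.56 + 15 + 1.67 + 8.89); the unrounded sum is 26.94.
print(round(chi_sq, 2))  # -> 26.94, far above the 9.49 critical value at 4 df
```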

ANOVA
The technique of analysis of variance is referred to as ANOVA. A table showing the source of variation, the sum of squares, degrees of freedom, mean square (variance), and the F-ratio is known as an ANOVA table. ANOVA is used to test whether the means of a number of populations are equal. The formulas for a one-way ANOVA:

SStotal = Σx² − (Σx)²/N
SSamong = Σ[(Σxg)²/ng] − (Σx)²/N
SSwithin = SStotal − SSamong
dfamong = r − 1
dfwithin = N − r
F = (SSamong/dfamong) / (SSwithin/dfwithin)

where x is an individual observation, r is the number of groups, N is the total number of observations (all groups), and ng is the number of observations in group g.

One-Way ANOVA example
Susan Sound predicts that students will learn most effectively with a constant background sound, as opposed to an unpredictable sound or no sound at all. She randomly divides twenty-four students into three groups of eight. All students study a passage of text for 30 minutes. Those in group 1 study with background sound at a constant volume; those in group 2 study with noise that changes volume periodically; those in group 3 study with no sound at all. After studying, all students take a 10-point multiple-choice test over the material. Their scores follow:

Group 1 (constant sound): 7 4 6 8 6 6 2 9
Group 2 (random sound):   5 5 3 4 4 7 2 2
Group 3 (no sound):       2 4 7 1 2 1 5 5

x1   x1²    x2   x2²   x3   x3²
7    49     5    25    2    4
4    16     5    25    4    16
6    36     3    9     7    49
8    64     4    16    1    1
6    36     4    16    2    4
6    36     7    49    1    1
2    4      2    4     5    25
9    81     2    4     5    25

Σx1 = 48, (Σx1)² = 2304, M1 = 6, Σx1² = 322
Σx2 = 32, (Σx2)² = 1024, M2 = 4, Σx2² = 148
Σx3 = 27, (Σx3)² = 729, M3 = 3.375, Σx3² = 125

SStotal = 595 − 477.04 = 117.96
SSamong = 507.13 − 477.04 = 30.08
SSwithin = 117.96 − 30.08 = 87.88

Source   SS      df   MS      F
Among    30.08   2    15.04   3.59
Within   87.88   21   4.18

*(According to the F significance/probability table with df = (2, 21), F must be at least 3.4668 to reach p < .05, so the F score is statistically significant.)
Interpretation: Susan can conclude that her hypothesis may be supported. The means are as she predicted, in that the constant-sound group has the highest score. However, the significant F only indicates that at least two means are significantly different from one another; she cannot know which specific pairs of means differ significantly until she conducts a post-hoc analysis.
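Susan's one-way ANOVA can be reproduced from the raw-score formulas above with plain Python (no stats library), using the test scores from the example:

```python
groups = [
    [7, 4, 6, 8, 6, 6, 2, 9],   # group 1: constant sound
    [5, 5, 3, 4, 4, 7, 2, 2],   # group 2: random sound
    [2, 4, 7, 1, 2, 1, 5, 5],   # group 3: no sound
]

all_scores = [s for g in groups for s in g]
N = len(all_scores)                      # 24
correction = sum(all_scores) ** 2 / N    # (Σx)²/N = 107²/24 ≈ 477.04

ss_total = sum(s ** 2 for s in all_scores) - correction          # ≈ 117.96
ss_among = sum(sum(g) ** 2 / len(g) for g in groups) - correction  # ≈ 30.08
ss_within = ss_total - ss_among                                  # ≈ 87.88

df_among = len(groups) - 1    # 2
df_within = N - len(groups)   # 21
f = (ss_among / df_among) / (ss_within / df_within)
print(round(f, 2))  # -> 3.59, above the 3.4668 critical value at df = (2, 21)
```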

ASSIGNMENT ON MULTIVARIATE ANALYSIS

SUBMITTED BY P.J. CERLIN PAJILA, M.Phil Management, 26/11/2012
