Multivariate Ana

MULTIVARIATE ANALYSIS
Statistical techniques that simultaneously analyze more than two variables.

Multivariate techniques fall into two categories:

1. Dependency techniques deal with one or more dependent variables.
   - One dependent variable, metric data: Multiple Regression Analysis
   - Several dependent variables, metric data
   - Discriminant Analysis
2. Interdependency techniques analyze more than two variables without segregating them into dependent and independent variables; the interrelationships between the variables are analyzed.
   - Metric data: Factor Analysis, Cluster Analysis
MULTIVARIATE ANALYSIS Multiple Regression

Multiple regression is used to analyze quantitative data. It studies the cause-and-effect relationship between a single dependent variable and two or more independent variables, and it is used mainly for prediction and forecasting.

The general multiple regression equation with k independent variables is:

$$Y' = a + b_1X_1 + b_2X_2 + \cdots + b_kX_k$$

Here Y' is the estimated value of the dependent variable, a is the Y-intercept, and $X_1$ to $X_k$ are the independent variables. When denoting population parameters, the Greek letters α (alpha) and β (beta) are used in place of a and b.

$b_j$ is called a partial regression coefficient. It is the net change in Y for each unit change in $X_j$, holding all other independent variables constant (j = 1 to k).
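A minimal Python sketch of evaluating the equation, included only to make the role of $b_j$ concrete. The function name `predict` and the coefficient values are hypothetical, not taken from the slides; the usage shows that raising one $X_j$ by one unit while holding the others fixed changes Y' by exactly $b_j$.

```python
def predict(a, b, x):
    """Return Y' = a + b1*x1 + ... + bk*xk for one observation."""
    if len(b) != len(x):
        raise ValueError("need one coefficient per independent variable")
    return a + sum(bj * xj for bj, xj in zip(b, x))

# Hypothetical intercept, coefficients and inputs, for illustration only.
a, b = 10.0, [2.0, -3.0, 0.5]
x = [4.0, 1.0, 6.0]
base = predict(a, b, x)                   # 10 + 8 - 3 + 3 = 18.0
bumped = predict(a, b, [4.0, 2.0, 6.0])   # X2 raised by one unit, others fixed
print(base, bumped - base)                # the change equals b2
```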
ASSUMPTIONS IN MULTIPLE REGRESSION
1. The independent variables and the dependent variable have a linear relationship.
2. The dependent variable must be continuous and at least interval-scaled.
3. The residuals should be normally distributed with mean 0.
4. The variation in the residuals (Y − Y') must be the same for all values of Y'. When this is the case, the residuals are said to exhibit homoscedasticity.
5. Successive values of the dependent variable must be uncorrelated, that is, not autocorrelated.
Correlation Matrix (correlation coefficients)

              Cars    Advertising   Sales force
Cars          1.000
Advertising   0.808   1.000
Sales force   0.872   0.537         1.000
A correlation matrix is used to show all possible simple correlation coefficients among the variables. The matrix is useful for locating correlated independent variables. It shows how strongly each independent variable is correlated with the dependent variable.
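The lower-triangular matrix above can be mirrored into a full symmetric matrix and used to rank the independent variables by their correlation with the dependent variable. This NumPy sketch assumes, as the variable names suggest, that Cars is the dependent variable and Advertising and Sales force are the independent variables; the slide does not state this explicitly.

```python
import numpy as np

names = ["Cars", "Advertising", "Sales force"]
# Lower triangle of the correlation matrix as printed on the slide.
lower = np.array([
    [1.000, 0.0,   0.0],
    [0.808, 1.000, 0.0],
    [0.872, 0.537, 1.000],
])
# Mirror the lower triangle; subtract the diagonal once so it stays 1.0.
R = lower + lower.T - np.diag(np.diag(lower))

dep = 0  # assumption: Cars is the dependent variable
ranked = sorted(
    ((names[j], float(R[dep, j])) for j in range(len(names)) if j != dep),
    key=lambda item: abs(item[1]),
    reverse=True,
)
print(ranked)    # strongest correlate of Cars first
print(R[1, 2])   # correlation between the two independent variables
```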

The least squares criterion is used to develop this equation. Because determining b1, b2, etc. is very tedious, a software package such as Excel or MINITAB may be used.
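As a sketch of what such software does, NumPy's `numpy.linalg.lstsq` solves the least squares problem directly: a column of ones is prepended for the intercept a, and the solver returns the a, $b_1, \ldots, b_k$ that minimize the sum of squared residuals. The small data set is made up so that Y = 1 + 2X1 + 3X2 holds exactly, which makes the result easy to check.

```python
import numpy as np

def fit_least_squares(X, y):
    """Return [a, b1, ..., bk] minimizing sum((y - y_hat)**2)."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # intercept column first
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef

# Made-up data generated from Y = 1 + 2*X1 + 3*X2 with no noise.
X = [[0, 0], [1, 0], [0, 1], [2, 1], [1, 3]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
print(np.round(fit_least_squares(X, y), 6))  # approximately [1. 2. 3.]
```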
ANOVA TABLE

Source       df           SS                                   MS
Regression   k            SSR = $\sum (Y' - \bar{Y})^2$        SSR/k
Error        n − k − 1    SSE = $\sum (Y - Y')^2$              SSE/(n − k − 1)
Total        n − 1        SS total = $\sum (Y - \bar{Y})^2$

Explained variation (SSR): variation accounted for by the set of independent variables.
Unexplained or random variation (SSE): variation not accounted for by the independent variables.
Total variation: SS total = SSR + SSE.
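A short sketch of how the ANOVA quantities follow from the sums of squares, with k degrees of freedom for regression and n − k − 1 for error. The function name `anova_summary` is hypothetical; the input values are the sums of squares MINITAB reports for Example 1 (n = 12 families, k = 3 independent variables), so the results can be compared with that output.

```python
def anova_summary(ssr, sse, n, k):
    """ANOVA quantities for a regression with k independent variables and n observations."""
    df_reg, df_err = k, n - k - 1
    msr, mse = ssr / df_reg, sse / df_err
    sst = ssr + sse                              # total variation
    r2 = ssr / sst                               # share of variation explained
    adj_r2 = 1 - (sse / df_err) / (sst / (n - 1))
    return {"MSR": msr, "MSE": mse, "F": msr / mse, "R2": r2, "adj_R2": adj_r2}

# Sums of squares reported by MINITAB for Example 1.
summary = anova_summary(ssr=10762903, sse=2623764, n=12, k=3)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```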
EXAMPLE 1: A market researcher for Super Markets is studying the yearly amount that families of four or more spend on food. Three independent variables are thought to be related to yearly food expenditure (Food): total family income (Income) in hundreds of dollars ($00), size of family (Size), and whether the family has children in college (College).
$\text{Expenditure} = a + b_1(\text{Income}) + b_2(\text{Size}) + b_3(\text{College})$

The variable College is called a dummy or indicator variable. It can take only one of two possible outcomes, i.e., the family has a child in college or it does not. Other examples of dummy variables: gender, whether a part is acceptable or not, and whether a voter will or will not vote for the incumbent governor.
We usually code one value of the dummy variable as 1 and the other 0.
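A minimal sketch of this 1/0 coding in Python. The helper `dummy_code` and the raw "yes"/"no" answers are hypothetical; they only illustrate turning a two-outcome variable into the 0/1 column a regression can use.

```python
def dummy_code(values, positive):
    """Code each observation as 1 if it equals `positive`, else 0."""
    return [1 if v == positive else 0 for v in values]

# Hypothetical raw answers to "Does the family have a child in college?"
answers = ["yes", "no", "no", "yes"]
college = dummy_code(answers, positive="yes")
print(college)  # [1, 0, 0, 1]
```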
Family   Food   Income   Size   Student
1        3900   376      4      0
2        5300   515      5      1
3        4300   516      4      0
4        4900   468      5      0
5        6400   538      6      1
6        7300   626      7      1
7        4900   543      5      0
8        5300   437      4      0
9        6100   608      5      1
10       6400   513      6      1
11       7400   493      6      1
12       5800   563      5      0
Example 1 continued
From the analysis provided by MINITAB, the estimated multiple regression equation is:

$$Y' = 954 + 1.09X_1 + 748X_2 + 565X_3$$

Food Expenditure = 954 + 1.09(Income) + 748(Size) + 565(College)
What food expenditure would you estimate for a family of 4 with no college student and an income of $50,000 (which is input as 500)?
Each additional $100 of income per year (one unit of Income) increases the amount spent on food by about $1.09 per year. Each additional family member increases the amount spent on food by $748 per year. A family with a college student spends $565 more per year on food than a family without one. So a family of 4 with no college student and an income of $50,000 will spend an estimated $4,491:
Food Expend. = $954 + $1.09(500) + $748(4) + $565(0) = $954 + $545 + $2,992 + $0 = $4,491
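The same estimate as a small Python function, a sketch that simply hard-codes the rounded coefficients of the fitted equation. The function name `food_expenditure` is hypothetical; Income is entered in hundreds of dollars, as in Example 1.

```python
def food_expenditure(income, size, college):
    """Estimated yearly food spending ($) from the fitted Example 1 equation.

    income is in hundreds of dollars ($00), so $50,000 is entered as 500;
    college is the 0/1 dummy for having a child in college.
    """
    return 954 + 1.09 * income + 748 * size + 565 * college

estimate = food_expenditure(income=500, size=4, college=0)
print(round(estimate))  # 4491
# Same family with a college student: the dummy adds $565.
print(round(food_expenditure(500, 4, 1) - estimate))  # 565
```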
The regression equation is
Food = 954 + 1.09 Income + 748 Size + 565 Student

Predictor   Coef    SE Coef   T      P
Constant    954     1581      0.60   0.563
Income      1.092   3.153     0.35   0.738
Size        748.4   303.0     2.47   0.039
Student     564.5   495.1     1.14   0.287

S = 572.7   R-Sq = 80.4%   R-Sq(adj) = 73.1%

Analysis of Variance
Source           DF   SS         MS        F       P
Regression        3   10762903   3587634   10.94   0.003
Residual Error    8    2623764    327970
Total            11   13386667
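The MINITAB coefficients and R-Sq can be cross-checked with an ordinary least squares fit on the Example 1 data. This is a sketch using `numpy.linalg.lstsq`, not the procedure MINITAB uses internally; small differences in the last digit are possible because MINITAB rounds its display.

```python
import numpy as np

# Example 1 data (12 families).
food = np.array([3900, 5300, 4300, 4900, 6400, 7300, 4900, 5300, 6100, 6400, 7400, 5800], dtype=float)
income = np.array([376, 515, 516, 468, 538, 626, 543, 437, 608, 513, 493, 563], dtype=float)
size = np.array([4, 5, 4, 5, 6, 7, 5, 4, 5, 6, 6, 5], dtype=float)
student = np.array([0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0], dtype=float)

# Design matrix: intercept column followed by the three predictors.
X = np.column_stack([np.ones_like(food), income, size, student])
coef, *_ = np.linalg.lstsq(X, food, rcond=None)

fitted = X @ coef
sse = np.sum((food - fitted) ** 2)         # unexplained variation
sst = np.sum((food - food.mean()) ** 2)    # total variation
r2 = 1 - sse / sst

print(np.round(coef, 3))   # compare with Coef: 954, 1.092, 748.4, 564.5
print(round(r2, 3))        # compare with R-Sq = 80.4%
```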
The coefficient of determination is 80.4 percent. This means that more than 80 percent of the variation in the amount spent on food is accounted for by the variables income, family size, and student.
Correlation matrix

           Food    Income   Size    College
Food       1.000
Income     0.587   1.000
Size       0.876   0.609    1.000
College    0.773   0.491    0.743   1.000
The strongest correlation between the dependent variable and an independent variable is between family size and amount spent on food.
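A sketch that recomputes the correlation matrix from the Example 1 data with `numpy.corrcoef` (each row passed in is treated as one variable) and picks the independent variable most strongly correlated with Food.

```python
import numpy as np

food = [3900, 5300, 4300, 4900, 6400, 7300, 4900, 5300, 6100, 6400, 7400, 5800]
income = [376, 515, 516, 468, 538, 626, 543, 437, 608, 513, 493, 563]
size = [4, 5, 4, 5, 6, 7, 5, 4, 5, 6, 6, 5]
college = [0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0]

names = ["Food", "Income", "Size", "College"]
R = np.corrcoef([food, income, size, college])  # each row is one variable

# Correlation of each independent variable with Food (row 0).
for j in range(1, len(names)):
    print(f"Food vs {names[j]}: {R[0, j]:.3f}")
strongest = names[1 + int(np.argmax(np.abs(R[0, 1:])))]
print("Strongest:", strongest)  # Size
```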
Conduct an individual test for each independent variable to determine which coefficients are not zero. For family size, the hypotheses are:

H0: $\beta_2 = 0$
H1: $\beta_2 \neq 0$

Using the 5% level of significance, reject H0 if the p-value < .05. From the MINITAB output, the only significant variable is Size (family size), with p = 0.039; the p-values for Income (0.738) and Student (0.287) are above .05. The other two variables can therefore be omitted from the model.
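The p-value rule is equivalent to comparing each t statistic (coefficient divided by its standard error) with the two-tailed critical value of t. This sketch takes the coefficients and standard errors from the MINITAB output and uses the table value t = 2.306 for 8 degrees of freedom at the 5% level, rather than computing p-values.

```python
# Coefficients and standard errors from the MINITAB output for Example 1.
coef = {"Income": 1.092, "Size": 748.4, "Student": 564.5}
se = {"Income": 3.153, "Size": 303.0, "Student": 495.1}

# Two-tailed 5% critical value of t with n - k - 1 = 12 - 3 - 1 = 8 df.
T_CRIT = 2.306

significant = []
for name in coef:
    t = coef[name] / se[name]      # test statistic for H0: beta_j = 0
    reject = abs(t) > T_CRIT
    print(f"{name}: t = {t:.2f}, reject H0: {reject}")
    if reject:
        significant.append(name)
print(significant)
```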
If we rerun the analysis using only the significant independent variable, family size, the new regression equation is $Y' = 340 + 1031X_2$. The coefficient of determination is 76.8 percent. We dropped two independent variables, and R-square fell by only 3.6 percentage points (from 80.4 to 76.8 percent).
Regression Analysis: Food versus Size

The regression equation is
Food = 340 + 1031 Size

Predictor   Coef     SE Coef   T      P
Constant    339.7    940.7     0.36   0.726
Size        1031.0   179.4     5.75   0.000

S = 557.7   R-Sq = 76.8%   R-Sq(adj) = 74.4%

Analysis of Variance
Source           DF   SS         MS         F       P
Regression        1   10275977   10275977   33.03   0.000
Residual Error   10    3110690     311069
Total            11   13386667
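A sketch reproducing the reduced model with `numpy.polyfit` (degree 1 returns slope, then intercept) and computing adjusted R-square, which penalizes extra predictors. Note that in the output above the adjusted R-square of the one-variable model (74.4%) is higher than that of the three-variable model (73.1%). The helper `adjusted_r2` is a hypothetical name.

```python
import numpy as np

food = np.array([3900, 5300, 4300, 4900, 6400, 7300, 4900, 5300, 6100, 6400, 7400, 5800], dtype=float)
size = np.array([4, 5, 4, 5, 6, 7, 5, 4, 5, 6, 6, 5], dtype=float)

# Simple regression of Food on Size.
slope, intercept = np.polyfit(size, food, 1)
fitted = intercept + slope * size
r2 = 1 - np.sum((food - fitted) ** 2) / np.sum((food - food.mean()) ** 2)

def adjusted_r2(r2, n, k):
    """Adjusted R-square for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

adj = adjusted_r2(r2, n=len(food), k=1)
print(round(intercept, 1), round(slope, 1))  # compare with 339.7 and 1031.0
print(round(r2, 3), round(adj, 3))           # compare with 76.8% and 74.4%
```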
Analysis of Residuals

A residual is the difference between the actual value of Y and the predicted value Y'. Residuals should follow the normal distribution; histograms are useful for checking this requirement. A plot of the residuals against their corresponding Y' values is used to show that there are no trends or patterns in the residuals.
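A sketch of both residual checks in text form: a rough histogram of the residuals and a listing of residuals sorted by Y'. It assumes the three-variable model from Example 1; the slides do not say which model the plots were drawn from. With an intercept in the model the residuals sum to approximately zero.

```python
import numpy as np

food = np.array([3900, 5300, 4300, 4900, 6400, 7300, 4900, 5300, 6100, 6400, 7400, 5800], dtype=float)
income = np.array([376, 515, 516, 468, 538, 626, 543, 437, 608, 513, 493, 563], dtype=float)
size = np.array([4, 5, 4, 5, 6, 7, 5, 4, 5, 6, 6, 5], dtype=float)
student = np.array([0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0], dtype=float)

X = np.column_stack([np.ones_like(food), income, size, student])
coef, *_ = np.linalg.lstsq(X, food, rcond=None)
fitted = X @ coef
residuals = food - fitted  # actual Y minus predicted Y'

# Text histogram: look for a roughly bell-shaped pattern.
counts, edges = np.histogram(residuals, bins=5)
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:7.0f} to {hi:7.0f}: {'*' * int(c)}")

# Residuals ordered by Y': look for trends or a funnel shape.
for yhat, e in sorted(zip(fitted, residuals)):
    print(f"Y' = {yhat:7.1f}  residual = {e:7.1f}")
```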
[Figure: Residual Plots against Estimated Values of Y. Residuals on the vertical axis, Y' on the horizontal axis.]

[Figure: Histograms of Residuals. Frequency on the vertical axis, residuals on the horizontal axis.]
