
# Training Data Analysis

## Multivariate Linear Regression

BASIC PRINCIPLE
Novandri Kusuma Wardana

## Multivariate Linear Regression

A. The Basic Principle
We consider the multivariate extension of multiple linear regression, modeling the relationship between $m$ responses $Y_1, \ldots, Y_m$ and a single set of $r$ predictor variables $z_1, \ldots, z_r$. Each of the $m$ responses is assumed to follow its own regression model, i.e.,

$$
\begin{aligned}
Y_1 &= \beta_{01} + \beta_{11}z_1 + \beta_{21}z_2 + \cdots + \beta_{r1}z_r + \varepsilon_1 \\
Y_2 &= \beta_{02} + \beta_{12}z_1 + \beta_{22}z_2 + \cdots + \beta_{r2}z_r + \varepsilon_2 \\
&\;\;\vdots \\
Y_m &= \beta_{0m} + \beta_{1m}z_1 + \beta_{2m}z_2 + \cdots + \beta_{rm}z_r + \varepsilon_m
\end{aligned}
$$

where $E(\boldsymbol{\varepsilon}) = E[(\varepsilon_1, \ldots, \varepsilon_m)'] = \mathbf{0}$ and $\operatorname{Var}(\boldsymbol{\varepsilon}) = \boldsymbol{\Sigma}$.

Conceptually, we can let

$$\mathbf{z}_j' = [z_{j0}, z_{j1}, \ldots, z_{jr}]$$

denote the values of the predictor variables for the $j$th trial and

$$
\mathbf{Y}_j = \begin{bmatrix} Y_{j1} \\ Y_{j2} \\ \vdots \\ Y_{jm} \end{bmatrix},
\qquad
\boldsymbol{\varepsilon}_j = \begin{bmatrix} \varepsilon_{j1} \\ \varepsilon_{j2} \\ \vdots \\ \varepsilon_{jm} \end{bmatrix}
$$

be the responses and errors for the $j$th trial. Thus we have an $n \times (r+1)$ design matrix

$$
\mathbf{Z} = \begin{bmatrix}
z_{10} & z_{11} & \cdots & z_{1r} \\
z_{20} & z_{21} & \cdots & z_{2r} \\
\vdots & \vdots & \ddots & \vdots \\
z_{n0} & z_{n1} & \cdots & z_{nr}
\end{bmatrix}
$$

If we now set

$$
\mathbf{Y} = \begin{bmatrix}
Y_{11} & Y_{12} & \cdots & Y_{1m} \\
Y_{21} & Y_{22} & \cdots & Y_{2m} \\
\vdots & \vdots & & \vdots \\
Y_{n1} & Y_{n2} & \cdots & Y_{nm}
\end{bmatrix}
= \left[\, \mathbf{Y}_{(1)} \mid \mathbf{Y}_{(2)} \mid \cdots \mid \mathbf{Y}_{(m)} \,\right]
$$

$$
\boldsymbol{\beta} = \begin{bmatrix}
\beta_{01} & \beta_{02} & \cdots & \beta_{0m} \\
\beta_{11} & \beta_{12} & \cdots & \beta_{1m} \\
\vdots & \vdots & & \vdots \\
\beta_{r1} & \beta_{r2} & \cdots & \beta_{rm}
\end{bmatrix}
= \left[\, \boldsymbol{\beta}_{(1)} \mid \boldsymbol{\beta}_{(2)} \mid \cdots \mid \boldsymbol{\beta}_{(m)} \,\right]
$$

$$
\boldsymbol{\varepsilon} = \begin{bmatrix}
\varepsilon_{11} & \varepsilon_{12} & \cdots & \varepsilon_{1m} \\
\varepsilon_{21} & \varepsilon_{22} & \cdots & \varepsilon_{2m} \\
\vdots & \vdots & & \vdots \\
\varepsilon_{n1} & \varepsilon_{n2} & \cdots & \varepsilon_{nm}
\end{bmatrix}
= \left[\, \boldsymbol{\varepsilon}_{(1)} \mid \boldsymbol{\varepsilon}_{(2)} \mid \cdots \mid \boldsymbol{\varepsilon}_{(m)} \,\right]
= \begin{bmatrix} \boldsymbol{\varepsilon}_1' \\ \boldsymbol{\varepsilon}_2' \\ \vdots \\ \boldsymbol{\varepsilon}_n' \end{bmatrix}
$$

then the multivariate linear regression model is

$$\mathbf{Y} = \mathbf{Z}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

with

$$E(\boldsymbol{\varepsilon}_{(i)}) = \mathbf{0} \quad\text{and}\quad \operatorname{Cov}(\boldsymbol{\varepsilon}_{(i)}, \boldsymbol{\varepsilon}_{(k)}) = \sigma_{ik}\mathbf{I}, \qquad i, k = 1, \ldots, m$$

Note also that the $m$ observed responses on the $j$th trial have covariance matrix

$$
\boldsymbol{\Sigma} = \begin{bmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1m} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{m1} & \sigma_{m2} & \cdots & \sigma_{mm}
\end{bmatrix}
$$

The ordinary least squares estimates $\hat{\boldsymbol{\beta}}$ are found in a manner analogous to the univariate case: we begin by taking

$$\hat{\boldsymbol{\beta}}_{(i)} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{Y}_{(i)}$$

Collecting the univariate least squares estimates yields

$$
\hat{\boldsymbol{\beta}} = \left[\, \hat{\boldsymbol{\beta}}_{(1)} \mid \hat{\boldsymbol{\beta}}_{(2)} \mid \cdots \mid \hat{\boldsymbol{\beta}}_{(m)} \,\right]
= (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}' \left[\, \mathbf{Y}_{(1)} \mid \mathbf{Y}_{(2)} \mid \cdots \mid \mathbf{Y}_{(m)} \,\right]
= (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{Y}
$$
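In other words, the multivariate fit is just the $m$ univariate fits stacked side by side. A minimal SAS/IML sketch of this identity, using a small made-up data set (the matrices below are illustrative only and are not from these notes):

```sas
proc iml;
/* made-up toy data: n = 4 trials, r = 1 predictor (plus intercept), m = 2 responses */
Z = {1 2, 1 4, 1 5, 1 7};        /* n x (r+1) design matrix */
Y = {3 10, 6 13, 7 16, 10 20};   /* n x m response matrix   */

/* column-by-column univariate OLS fits */
b1 = inv(Z`*Z) * Z` * Y[, 1];
b2 = inv(Z`*Z) * Z` * Y[, 2];

/* one-shot multivariate OLS fit -- its columns match b1 and b2 exactly */
B   = inv(Z`*Z) * Z` * Y;
b12 = b1 || b2;

print B b12;
quit;
```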

Now for any choice of parameters

$$\mathbf{B} = \left[\, \mathbf{b}_{(1)} \mid \mathbf{b}_{(2)} \mid \cdots \mid \mathbf{b}_{(m)} \,\right]$$

the resulting matrix of errors is $\mathbf{Y} - \mathbf{Z}\mathbf{B}$, and the error sums of squares and crossproducts matrix is

$$
(\mathbf{Y} - \mathbf{Z}\mathbf{B})'(\mathbf{Y} - \mathbf{Z}\mathbf{B}) =
\begin{bmatrix}
(\mathbf{Y}_{(1)} - \mathbf{Z}\mathbf{b}_{(1)})'(\mathbf{Y}_{(1)} - \mathbf{Z}\mathbf{b}_{(1)}) & \cdots & (\mathbf{Y}_{(1)} - \mathbf{Z}\mathbf{b}_{(1)})'(\mathbf{Y}_{(m)} - \mathbf{Z}\mathbf{b}_{(m)}) \\
\vdots & \ddots & \vdots \\
(\mathbf{Y}_{(m)} - \mathbf{Z}\mathbf{b}_{(m)})'(\mathbf{Y}_{(1)} - \mathbf{Z}\mathbf{b}_{(1)}) & \cdots & (\mathbf{Y}_{(m)} - \mathbf{Z}\mathbf{b}_{(m)})'(\mathbf{Y}_{(m)} - \mathbf{Z}\mathbf{b}_{(m)})
\end{bmatrix}
$$

We can show that the selection $\mathbf{b}_{(i)} = \hat{\boldsymbol{\beta}}_{(i)}$ minimizes the $i$th diagonal sum of squares $(\mathbf{Y}_{(i)} - \mathbf{Z}\mathbf{b}_{(i)})'(\mathbf{Y}_{(i)} - \mathbf{Z}\mathbf{b}_{(i)})$, so $\mathbf{B} = \hat{\boldsymbol{\beta}}$ minimizes $\operatorname{tr}\!\left[(\mathbf{Y} - \mathbf{Z}\mathbf{B})'(\mathbf{Y} - \mathbf{Z}\mathbf{B})\right]$; it also minimizes the generalized variance $\left|(\mathbf{Y} - \mathbf{Z}\mathbf{B})'(\mathbf{Y} - \mathbf{Z}\mathbf{B})\right|$.

Using $\hat{\boldsymbol{\beta}}$, the matrices of predicted values and residuals are

$$\hat{\mathbf{Y}} = \mathbf{Z}\hat{\boldsymbol{\beta}} = \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{Y}$$

$$\hat{\boldsymbol{\varepsilon}} = \mathbf{Y} - \hat{\mathbf{Y}} = \left[\mathbf{I} - \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\right]\mathbf{Y}$$

Note that the orthogonality conditions among residuals, predicted values, and columns of the design matrix which hold in the univariate case are also true in the multivariate case, because

$$\mathbf{Z}'\hat{\boldsymbol{\varepsilon}} = \mathbf{Z}'\left[\mathbf{I} - \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\right]\mathbf{Y} = (\mathbf{Z}' - \mathbf{Z}')\mathbf{Y} = \mathbf{0}$$

which means the residuals are perpendicular to the columns of the design matrix, and also to the predicted values:

$$\hat{\mathbf{Y}}'\hat{\boldsymbol{\varepsilon}} = \hat{\boldsymbol{\beta}}'\mathbf{Z}'\left[\mathbf{I} - \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\right]\mathbf{Y} = \mathbf{0}$$

Furthermore, because $\mathbf{Y} = \hat{\mathbf{Y}} + \hat{\boldsymbol{\varepsilon}}$, we have

$$
\underbrace{\mathbf{Y}'\mathbf{Y}}_{\substack{\text{total sums of squares} \\ \text{and crossproducts}}}
= \underbrace{\hat{\mathbf{Y}}'\hat{\mathbf{Y}}}_{\substack{\text{predicted sums of squares} \\ \text{and crossproducts}}}
+ \underbrace{\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}}_{\substack{\text{residual (error) sums of} \\ \text{squares and crossproducts}}}
$$

## Example

Suppose we had the following six sample observations on two independent variables (palatability and texture) and two dependent variables (purchase intent and overall quality):

| Palatability | Texture | Overall Quality | Purchase Intent |
|---|---|---|---|
| 65 | 71 | 63 | 67 |
| 72 | 77 | 70 | 70 |
| 77 | 73 | 72 | 70 |
| 68 | 78 | 75 | 72 |
| 81 | 76 | 89 | 88 |
| 73 | 87 | 76 | 77 |

Use these data to estimate the multivariate linear regression model for which palatability and texture are independent variables while purchase intent and overall quality are the dependent variables.

We wish to estimate

$$Y_1 = \beta_{01} + \beta_{11}z_1 + \beta_{21}z_2 \quad\text{and}\quad Y_2 = \beta_{02} + \beta_{12}z_1 + \beta_{22}z_2$$

jointly. The design matrix is

$$
\mathbf{Z} = \begin{bmatrix}
1 & 65 & 71 \\
1 & 72 & 77 \\
1 & 77 & 73 \\
1 & 68 & 78 \\
1 & 81 & 76 \\
1 & 73 & 87
\end{bmatrix}
$$

so

$$
\mathbf{Z}'\mathbf{Z} =
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
65 & 72 & 77 & 68 & 81 & 73 \\
71 & 77 & 73 & 78 & 76 & 87
\end{bmatrix}
\begin{bmatrix}
1 & 65 & 71 \\
1 & 72 & 77 \\
1 & 77 & 73 \\
1 & 68 & 78 \\
1 & 81 & 76 \\
1 & 73 & 87
\end{bmatrix}
=
\begin{bmatrix}
6 & 436 & 462 \\
436 & 31852 & 33591 \\
462 & 33591 & 35728
\end{bmatrix}
$$

and

$$
(\mathbf{Z}'\mathbf{Z})^{-1} =
\begin{bmatrix}
6 & 436 & 462 \\
436 & 31852 & 33591 \\
462 & 33591 & 35728
\end{bmatrix}^{-1}
=
\begin{bmatrix}
62.560597030 & -0.378268027 & -0.453330568 \\
-0.378268027 & 0.005988412 & -0.000738830 \\
-0.453330568 & -0.000738830 & 0.006584661
\end{bmatrix}
$$

and

$$
\mathbf{Z}'\mathbf{y}_{(1)} =
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
65 & 72 & 77 & 68 & 81 & 73 \\
71 & 77 & 73 & 78 & 76 & 87
\end{bmatrix}
\begin{bmatrix} 63 \\ 70 \\ 72 \\ 75 \\ 89 \\ 76 \end{bmatrix}
=
\begin{bmatrix} 445 \\ 32536 \\ 34345 \end{bmatrix}
$$

so

$$
\hat{\boldsymbol{\beta}}_{(1)} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{y}_{(1)} =
\begin{bmatrix}
62.560597030 & -0.378268027 & -0.453330568 \\
-0.378268027 & 0.005988412 & -0.000738830 \\
-0.453330568 & -0.000738830 & 0.006584661
\end{bmatrix}
\begin{bmatrix} 445 \\ 32536 \\ 34345 \end{bmatrix}
=
\begin{bmatrix} -37.501205460 \\ 1.134583728 \\ 0.379499410 \end{bmatrix}
$$

and

$$
\mathbf{Z}'\mathbf{y}_{(2)} =
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
65 & 72 & 77 & 68 & 81 & 73 \\
71 & 77 & 73 & 78 & 76 & 87
\end{bmatrix}
\begin{bmatrix} 67 \\ 70 \\ 70 \\ 72 \\ 88 \\ 77 \end{bmatrix}
=
\begin{bmatrix} 444 \\ 32430 \\ 34260 \end{bmatrix}
$$

so

$$
\hat{\boldsymbol{\beta}}_{(2)} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{y}_{(2)} =
\begin{bmatrix}
62.560597030 & -0.378268027 & -0.453330568 \\
-0.378268027 & 0.005988412 & -0.000738830 \\
-0.453330568 & -0.000738830 & 0.006584661
\end{bmatrix}
\begin{bmatrix} 444 \\ 32430 \\ 34260 \end{bmatrix}
=
\begin{bmatrix} -21.432293350 \\ 0.940880634 \\ 0.351449792 \end{bmatrix}
$$

so

$$
\hat{\boldsymbol{\beta}} = \left[\, \hat{\boldsymbol{\beta}}_{(1)} \mid \hat{\boldsymbol{\beta}}_{(2)} \,\right] =
\begin{bmatrix}
-37.501205460 & -21.432293350 \\
1.134583728 & 0.940880634 \\
0.379499410 & 0.351449792
\end{bmatrix}
$$

This gives us the matrix of estimated (fitted) values

$$
\hat{\mathbf{Y}} = \mathbf{Z}\hat{\boldsymbol{\beta}} =
\begin{bmatrix}
1 & 65 & 71 \\
1 & 72 & 77 \\
1 & 77 & 73 \\
1 & 68 & 78 \\
1 & 81 & 76 \\
1 & 73 & 87
\end{bmatrix}
\begin{bmatrix}
-37.501205460 & -21.432293350 \\
1.134583728 & 0.940880634 \\
0.379499410 & 0.351449792
\end{bmatrix}
=
\begin{bmatrix}
63.19119 & 64.67788 \\
73.41028 & 73.37275 \\
77.56520 & 76.67135 \\
69.25144 & 69.96067 \\
83.24203 & 81.48922 \\
78.33986 & 77.82812
\end{bmatrix}
$$

and the matrix of residuals

$$
\hat{\boldsymbol{\varepsilon}} = \mathbf{Y} - \hat{\mathbf{Y}} =
\begin{bmatrix}
63 & 67 \\
70 & 70 \\
72 & 70 \\
75 & 72 \\
89 & 88 \\
76 & 77
\end{bmatrix}
-
\begin{bmatrix}
63.19119 & 64.67788 \\
73.41028 & 73.37275 \\
77.56520 & 76.67135 \\
69.25144 & 69.96067 \\
83.24203 & 81.48922 \\
78.33986 & 77.82812
\end{bmatrix}
=
\begin{bmatrix}
-0.19119 & 2.32212 \\
-3.41028 & -3.37275 \\
-5.56520 & -6.67135 \\
5.74856 & 2.03933 \\
5.75797 & 6.51078 \\
-2.33986 & -0.82812
\end{bmatrix}
$$

Note that each column of residuals sums to zero!
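These hand calculations are straightforward to reproduce in SAS. A minimal SAS/IML sketch (it assumes SAS/IML is licensed; the equivalent PROC GLM run appears at the end of these notes) that also verifies the orthogonality conditions and the sums-of-squares decomposition derived above:

```sas
proc iml;
/* design and response matrices from the example */
Z = {1 65 71, 1 72 77, 1 77 73, 1 68 78, 1 81 76, 1 73 87};
Y = {63 67, 70 70, 72 70, 75 72, 89 88, 76 77};

B    = inv(Z`*Z) * Z` * Y;    /* beta-hat = (Z'Z)^{-1} Z'Y */
Yhat = Z * B;                 /* fitted values             */
e    = Y - Yhat;              /* residuals                 */

colsum = e[+, ];                       /* column sums of residuals: ~ 0 */
orth   = Z` * e;                       /* Z'e: ~ 0 (orthogonality)      */
sscp   = Y`*Y - (Yhat`*Yhat + e`*e);   /* SSCP decomposition: ~ 0       */

print B, Yhat, e, colsum, orth, sscp;
quit;
```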

## B. Inference in Multivariate Regression

The least squares estimators

$$\hat{\boldsymbol{\beta}} = \left[\, \hat{\boldsymbol{\beta}}_{(1)} \mid \hat{\boldsymbol{\beta}}_{(2)} \mid \cdots \mid \hat{\boldsymbol{\beta}}_{(m)} \,\right]$$

of the multivariate regression model have the following properties:

- $E(\hat{\boldsymbol{\beta}}_{(i)}) = \boldsymbol{\beta}_{(i)}$, i.e., $\hat{\boldsymbol{\beta}}$ is unbiased;
- $\operatorname{Cov}(\hat{\boldsymbol{\beta}}_{(i)}, \hat{\boldsymbol{\beta}}_{(k)}) = \sigma_{ik}(\mathbf{Z}'\mathbf{Z})^{-1}, \quad i, k = 1, \ldots, m$;
- $E(\hat{\boldsymbol{\varepsilon}}) = \mathbf{0}$ and $E\!\left(\dfrac{\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}}{n - r - 1}\right) = \boldsymbol{\Sigma}$

if the model is of full rank, i.e., $\operatorname{rank}(\mathbf{Z}) = r + 1 < n$. Note that $\hat{\boldsymbol{\varepsilon}}$ and $\hat{\boldsymbol{\beta}}$ are also uncorrelated.

For a fixed vector of predictor values $\mathbf{z}_0$, we have

$$E(\mathbf{z}_0'\hat{\boldsymbol{\beta}}) = \mathbf{z}_0'\boldsymbol{\beta}$$

so $\mathbf{z}_0'\hat{\boldsymbol{\beta}}$ is an unbiased estimator of $\mathbf{z}_0'\boldsymbol{\beta}$. We can also determine from these properties that the estimation errors $\mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)} - \mathbf{z}_0'\boldsymbol{\beta}_{(i)}$ have covariances

$$
E\!\left[\mathbf{z}_0'(\hat{\boldsymbol{\beta}}_{(i)} - \boldsymbol{\beta}_{(i)})(\hat{\boldsymbol{\beta}}_{(k)} - \boldsymbol{\beta}_{(k)})'\mathbf{z}_0\right]
= \sigma_{ik}\,\mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0
$$

Moreover, $\hat{\mathbf{Y}}_0 = \hat{\boldsymbol{\beta}}'\mathbf{z}_0$, i.e., the forecasted vector $\hat{\mathbf{Y}}_0$ associated with the values of the predictor variables $\mathbf{z}_0$ is an unbiased estimator of $\mathbf{Y}_0$. The forecast errors have covariance

$$
E\!\left[(Y_{0i} - \mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)})(Y_{0k} - \mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(k)})\right]
= \sigma_{ik}\left(1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\right)
$$

Thus, for the multivariate regression model with full rank$(\mathbf{Z}) = r + 1$, $n \geq r + 1 + m$, and normally distributed errors $\boldsymbol{\varepsilon}$, the least squares estimator

$$\hat{\boldsymbol{\beta}} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{Y}$$

is also the maximum likelihood estimator of $\boldsymbol{\beta}$; it is normally distributed with

$$\operatorname{Cov}(\hat{\boldsymbol{\beta}}_{(i)}, \hat{\boldsymbol{\beta}}_{(k)}) = \sigma_{ik}(\mathbf{Z}'\mathbf{Z})^{-1}, \qquad i, k = 1, \ldots, m$$

Also, the maximum likelihood estimator $\hat{\boldsymbol{\beta}}$ is independent of the maximum likelihood estimator of the positive definite matrix $\boldsymbol{\Sigma}$ given by

$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\,\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}} = \frac{1}{n}(\mathbf{Y} - \mathbf{Z}\hat{\boldsymbol{\beta}})'(\mathbf{Y} - \mathbf{Z}\hat{\boldsymbol{\beta}})$$

and

$$n\hat{\boldsymbol{\Sigma}} \sim W_{p,\,n-r-1}(\boldsymbol{\Sigma})$$

(a Wishart distribution with $n - r - 1$ degrees of freedom), all of which provide additional support for using the least squares estimates when the errors are normally distributed; in particular, $n\hat{\boldsymbol{\Sigma}}/(n - r - 1) = \hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}/(n - r - 1)$ is then an unbiased estimator of $\boldsymbol{\Sigma}$.

These results can be used to develop likelihood ratio tests for the multivariate regression parameters. The hypothesis that the responses do not depend on the predictor variables $z_{q+1}, z_{q+2}, \ldots, z_r$ is

$$H_0 : \boldsymbol{\beta}_{(2)} = \mathbf{0} \quad\text{where}\quad \boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\beta}_{(1)} \\ \boldsymbol{\beta}_{(2)} \end{bmatrix}$$

with $\boldsymbol{\beta}_{(1)}$ of dimension $(q+1) \times m$ and $\boldsymbol{\beta}_{(2)}$ of dimension $(r-q) \times m$. If we partition $\mathbf{Z}$ in a similar manner,

$$\mathbf{Z} = \left[\, \mathbf{Z}_1 \mid \mathbf{Z}_2 \,\right]$$

with $\mathbf{Z}_1$ of dimension $n \times (q+1)$ and $\mathbf{Z}_2$ of dimension $n \times (r-q)$, we can write the general model as

$$E(\mathbf{Y}) = \mathbf{Z}\boldsymbol{\beta} = \left[\, \mathbf{Z}_1 \mid \mathbf{Z}_2 \,\right] \begin{bmatrix} \boldsymbol{\beta}_{(1)} \\ \boldsymbol{\beta}_{(2)} \end{bmatrix} = \mathbf{Z}_1\boldsymbol{\beta}_{(1)} + \mathbf{Z}_2\boldsymbol{\beta}_{(2)}$$

The extra sum of squares and crossproducts associated with $\boldsymbol{\beta}_{(2)}$ is

$$n(\hat{\boldsymbol{\Sigma}}_1 - \hat{\boldsymbol{\Sigma}}) = (\mathbf{Y} - \mathbf{Z}_1\hat{\boldsymbol{\beta}}_{(1)})'(\mathbf{Y} - \mathbf{Z}_1\hat{\boldsymbol{\beta}}_{(1)}) - (\mathbf{Y} - \mathbf{Z}\hat{\boldsymbol{\beta}})'(\mathbf{Y} - \mathbf{Z}\hat{\boldsymbol{\beta}})$$

where

$$\hat{\boldsymbol{\beta}}_{(1)} = (\mathbf{Z}_1'\mathbf{Z}_1)^{-1}\mathbf{Z}_1'\mathbf{Y}$$

and

$$\hat{\boldsymbol{\Sigma}}_1 = \frac{1}{n}(\mathbf{Y} - \mathbf{Z}_1\hat{\boldsymbol{\beta}}_{(1)})'(\mathbf{Y} - \mathbf{Z}_1\hat{\boldsymbol{\beta}}_{(1)})$$

The likelihood ratio for the test of the hypothesis $H_0 : \boldsymbol{\beta}_{(2)} = \mathbf{0}$ is given by the ratio of generalized variances

$$
\Lambda = \frac{\max_{\boldsymbol{\beta}_{(1)},\,\boldsymbol{\Sigma}} L(\boldsymbol{\beta}_{(1)}, \boldsymbol{\Sigma})}{\max_{\boldsymbol{\beta},\,\boldsymbol{\Sigma}} L(\boldsymbol{\beta}, \boldsymbol{\Sigma})}
= \left(\frac{|\hat{\boldsymbol{\Sigma}}|}{|\hat{\boldsymbol{\Sigma}}_1|}\right)^{n/2}
$$

Finally, for the multivariate regression model with full rank$(\mathbf{Z}) = r + 1$, $n \geq r + 1 + m$, normally distributed errors $\boldsymbol{\varepsilon}$, and a true null hypothesis (so that $n(\hat{\boldsymbol{\Sigma}}_1 - \hat{\boldsymbol{\Sigma}}) \sim W_{m,\,r-q}(\boldsymbol{\Sigma})$), we have, for large $n$,

$$-\left[n - r - 1 - \tfrac{1}{2}(m - r + q + 1)\right]\ln\frac{|\hat{\boldsymbol{\Sigma}}|}{|\hat{\boldsymbol{\Sigma}}_1|} \;\sim\; \chi^2_{m(r-q)} \quad\text{(approximately)}$$

If we define the Error Sum of Squares and Crossproducts as

$$\mathbf{E} = n\hat{\boldsymbol{\Sigma}}$$

and the Hypothesis Sum of Squares and Crossproducts as

$$\mathbf{H} = n(\hat{\boldsymbol{\Sigma}}_1 - \hat{\boldsymbol{\Sigma}})$$

then we can define Wilks' lambda as

$$\Lambda^{2/n} = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|} = \prod_{i=1}^{s} \frac{1}{1 + \eta_i}$$

where $\eta_1 \geq \eta_2 \geq \cdots \geq \eta_s$ are the ordered eigenvalues of $\mathbf{H}\mathbf{E}^{-1}$ and $s = \min(m, r - q)$.

There are other similar tests (as we have seen in our discussion of MANOVA):

$$\text{Pillai's Trace} = \operatorname{tr}\!\left[\mathbf{H}(\mathbf{H} + \mathbf{E})^{-1}\right] = \sum_{i=1}^{s} \frac{\eta_i}{1 + \eta_i}$$

$$\text{Hotelling-Lawley Trace} = \operatorname{tr}\!\left[\mathbf{H}\mathbf{E}^{-1}\right] = \sum_{i=1}^{s} \eta_i$$

$$\text{Roy's Greatest Root} = \frac{\eta_1}{1 + \eta_1}$$

Each of these statistics is an alternative to Wilks' lambda and performs in a very similar manner (particularly for large sample sizes).
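Since all four statistics are functions of the eigenvalues of $\mathbf{H}\mathbf{E}^{-1}$, they are easy to compute together. A SAS/IML sketch, using the $\mathbf{E}$ and $\mathbf{H}$ matrices for the palatability test computed in the example that follows (the GENEIG call solves the generalized eigenproblem $\mathbf{H}\mathbf{x} = \eta\mathbf{E}\mathbf{x}$, whose roots are the eigenvalues of $\mathbf{H}\mathbf{E}^{-1}$):

```sas
proc iml;
/* E and H for the palatability (z1) test in the example below */
E = {114.31302415 99.335143683, 99.335143683 108.5094298};
H = {214.96186763 178.26225891, 178.26225891 147.82823253};

call geneig(eta, H, E);   /* roots of H*x = eta*E*x          */
eta1 = max(eta);          /* largest root (s = 1 here)       */

wilks  = det(E) / det(E + H);
pillai = trace(H * inv(H + E));
hotlaw = trace(H * inv(E));
roy    = eta1 / (1 + eta1);

print wilks pillai hotlaw roy;
quit;
```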

## Example

For our previous data (the six sample observations on two independent variables, palatability and texture, and two dependent variables, purchase intent and overall quality):

| Palatability | Texture | Overall Quality | Purchase Intent |
|---|---|---|---|
| 65 | 71 | 63 | 67 |
| 72 | 77 | 70 | 70 |
| 77 | 73 | 72 | 70 |
| 68 | 78 | 75 | 72 |
| 81 | 76 | 89 | 88 |
| 73 | 87 | 76 | 77 |

we wish to test the hypotheses that (i) palatability has no joint relationship with purchase intent and overall quality, and (ii) texture has no joint relationship with purchase intent and overall quality.

We first test the hypothesis that palatability has no joint relationship with purchase intent and overall quality, i.e.,

$$H_0 : \boldsymbol{\beta}_{(1)} = \mathbf{0}$$

The likelihood ratio for the test of this hypothesis is given by the ratio of generalized variances

$$
\Lambda = \frac{\max_{\boldsymbol{\beta}_{(2)},\,\boldsymbol{\Sigma}} L(\boldsymbol{\beta}_{(2)}, \boldsymbol{\Sigma})}{\max_{\boldsymbol{\beta},\,\boldsymbol{\Sigma}} L(\boldsymbol{\beta}, \boldsymbol{\Sigma})}
= \left(\frac{|\hat{\boldsymbol{\Sigma}}|}{|\hat{\boldsymbol{\Sigma}}_2|}\right)^{n/2}
$$

(where $\hat{\boldsymbol{\Sigma}}_2$ is the estimate of $\boldsymbol{\Sigma}$ from the reduced model that omits palatability), so we use the Wilks' lambda statistic

$$\Lambda^{2/n} = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|}$$

The error sum of squares and crossproducts matrix is

$$\mathbf{E} = \begin{bmatrix} 114.31302415 & 99.335143683 \\ 99.335143683 & 108.5094298 \end{bmatrix}$$

and the hypothesis sum of squares and crossproducts matrix for this null hypothesis is

$$\mathbf{H} = \begin{bmatrix} 214.96186763 & 178.26225891 \\ 178.26225891 & 147.82823253 \end{bmatrix}$$

so the calculated value of the Wilks' lambda statistic is

$$
\Lambda^{2/n} = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|}
= \frac{\begin{vmatrix} 114.31302415 & 99.335143683 \\ 99.335143683 & 108.5094298 \end{vmatrix}}
{\left|\begin{bmatrix} 114.31302415 & 99.335143683 \\ 99.335143683 & 108.5094298 \end{bmatrix} + \begin{bmatrix} 214.96186763 & 178.26225891 \\ 178.26225891 & 147.82823253 \end{bmatrix}\right|}
= \frac{2536.570299}{7345.238098} = 0.34533534
$$

The transformation to a chi-square distributed statistic (which is actually valid only when $n - r$ and $n - m$ are both large) is

$$
-\left[n - r - 1 - \tfrac{1}{2}(m - r + q + 1)\right]\ln\Lambda^{2/n}
= -\left[6 - 2 - 1 - \tfrac{1}{2}(2 - 2 + 1 + 1)\right]\ln(0.34533534)
= 2.12648
$$

At $\alpha = 0.01$ and $m(r - q) = 2(1) = 2$ degrees of freedom, the critical value is 9.21034, so we have a strong non-rejection. Also, the approximate p-value of this chi-square test is 0.345335; note that this is an extremely gross approximation (since $n - r = 4$ and $n - m = 4$).

We next test the hypothesis that texture has no joint relationship with purchase intent and overall quality, i.e.,

$$H_0 : \boldsymbol{\beta}_{(2)} = \mathbf{0}$$

The likelihood ratio for the test of this hypothesis is given by the ratio of generalized variances

$$
\Lambda = \frac{\max_{\boldsymbol{\beta}_{(1)},\,\boldsymbol{\Sigma}} L(\boldsymbol{\beta}_{(1)}, \boldsymbol{\Sigma})}{\max_{\boldsymbol{\beta},\,\boldsymbol{\Sigma}} L(\boldsymbol{\beta}, \boldsymbol{\Sigma})}
= \left(\frac{|\hat{\boldsymbol{\Sigma}}|}{|\hat{\boldsymbol{\Sigma}}_1|}\right)^{n/2}
$$

so we again use the Wilks' lambda statistic $\Lambda^{2/n} = |\mathbf{E}|/|\mathbf{E} + \mathbf{H}|$. The error sum of squares and crossproducts matrix is

$$\mathbf{E} = \begin{bmatrix} 114.31302415 & 99.335143683 \\ 99.335143683 & 108.5094298 \end{bmatrix}$$

and the hypothesis sum of squares and crossproducts matrix for this null hypothesis is

$$\mathbf{H} = \begin{bmatrix} 21.872015222 & 20.255407498 \\ 20.255407498 & 18.758286731 \end{bmatrix}$$

so the calculated value of the Wilks' lambda statistic is

$$
\Lambda^{2/n} = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|}
= \frac{\begin{vmatrix} 114.31302415 & 99.335143683 \\ 99.335143683 & 108.5094298 \end{vmatrix}}
{\left|\begin{bmatrix} 114.31302415 & 99.335143683 \\ 99.335143683 & 108.5094298 \end{bmatrix} + \begin{bmatrix} 21.872015222 & 20.255407498 \\ 20.255407498 & 18.758286731 \end{bmatrix}\right|}
= \frac{2536.570299}{3030.059055} = 0.837135598
$$

The transformation to a chi-square distributed statistic (which is actually valid only when $n - r$ and $n - m$ are both large) is

$$
-\left[n - r - 1 - \tfrac{1}{2}(m - r + q + 1)\right]\ln\Lambda^{2/n}
= -\left[6 - 2 - 1 - \tfrac{1}{2}(2 - 2 + 1 + 1)\right]\ln(0.837135598)
= 0.35554
$$

At $\alpha = 0.01$ and $m(r - q) = 2(1) = 2$ degrees of freedom, the critical value is 9.21034, so we have a strong non-rejection. Also, the approximate p-value of this chi-square test is 0.837136; note that this is an extremely gross approximation (since $n - r = 4$ and $n - m = 4$).
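Both Wilks' lambda values and the approximate chi-square tests can be reproduced directly from $\mathbf{E}$ and the two hypothesis matrices. A SAS/IML sketch (note that the transformation requires the natural logarithm):

```sas
proc iml;
n = 6; r = 2; m = 2; q = 1;   /* six trials, two predictors, two responses; r - q = 1 */
E  = {114.31302415 99.335143683, 99.335143683 108.5094298};
H1 = {214.96186763 178.26225891, 178.26225891 147.82823253};  /* palatability (z1) */
H2 = {21.872015222 20.255407498, 20.255407498 18.758286731};  /* texture (z2)      */
df = m * (r - q);

do test = 1 to 2;
   if test = 1 then H = H1;
   else H = H2;
   lambda = det(E) / det(E + H);                              /* Wilks' lambda     */
   stat   = -(n - r - 1 - (m - r + q + 1)/2) * log(lambda);   /* natural log       */
   pval   = 1 - probchi(stat, df);
   crit   = cinv(0.99, df);                                   /* alpha = 0.01      */
   print test lambda stat df crit pval;
end;
quit;
```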

## SAS code for a Multivariate Linear Regression Analysis:

```sas
OPTIONS LINESIZE = 72 NODATE PAGENO = 1;
DATA stuff;
   INPUT z1 z2 y1 y2;
   LABEL z1='Palatability Rating'
         z2='Texture Rating'
         y1='Overall Quality Rating'
         y2='Purchase Intent';
   CARDS;
65 71 63 67
72 77 70 70
77 73 72 70
68 78 75 72
81 76 89 88
73 87 76 77
;
PROC GLM DATA=stuff;
   MODEL y1 y2 = z1 z2;
   MANOVA H=z1 z2 / PRINTE PRINTH;
   TITLE4 'Using PROC GLM for Multivariate Linear Regression';
RUN;
```

## SAS output for a Multivariate Linear Regression Analysis:

```
Dependent Variable: y1   Overall Quality Rating

                                     Sum of
Source               DF             Squares     Mean Square    F Value    Pr > F
Model                 2         256.5203092     128.2601546       3.37    0.1711
Error                 3         114.3130241      38.1043414
Corrected Total       5         370.8333333

R-Square     Coeff Var      Root MSE       y1 Mean
0.691740      8.322973      6.172871      74.16667

Source       DF       Type I SS     Mean Square    F Value    Pr > F
z1            1     234.6482940     234.6482940       6.16    0.0891
z2            1      21.8720152      21.8720152       0.57    0.5037

Source       DF     Type III SS     Mean Square    F Value    Pr > F
z1            1     214.9618676     214.9618676       5.64    0.0980
z2            1      21.8720152      21.8720152       0.57    0.5037

                                  Standard
Parameter         Estimate           Error     t Value     Pr > |t|
Intercept     -37.50120546     48.82448511       -0.77       0.4984
z1              1.13458373      0.47768661        2.38       0.0980
z2              0.37949941      0.50090335        0.76       0.5037
```

## SAS output for a Multivariate Linear Regression Analysis:

```
Dependent Variable: y2   Purchase Intent

                                     Sum of
Source               DF             Squares     Mean Square    F Value    Pr > F
Model                 2         181.4905702      90.7452851       2.51    0.2289
Error                 3         108.5094298      36.1698099
Corrected Total       5         290.0000000

R-Square     Coeff Var      Root MSE       y2 Mean
0.625830      8.127208      6.014134      74.00000

Source       DF       Type I SS     Mean Square    F Value    Pr > F
z1            1     162.7322835     162.7322835       4.50    0.1241
z2            1      18.7582867      18.7582867       0.52    0.5235

Source       DF     Type III SS     Mean Square    F Value    Pr > F
z1            1     147.8282325     147.8282325       4.09    0.1364
z2            1      18.7582867      18.7582867       0.52    0.5235

                                  Standard
Parameter         Estimate           Error     t Value     Pr > |t|
Intercept     -21.43229335     47.56894895       -0.45       0.6829
z1              0.94088063      0.46540276        2.02       0.1364
z2              0.35144979      0.48802247        0.72       0.5235
```

## SAS output for a Multivariate Linear Regression Analysis:

```
The GLM Procedure
Multivariate Analysis of Variance

              E = Error SSCP Matrix

                    y1              y2
y1        114.31302415    99.335143683
y2        99.335143683     108.5094298

Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|

DF = 3              y1              y2

y1            1.000000        0.891911
                                0.1081

y2            0.891911        1.000000
                0.1081
```

## SAS output for a Multivariate Linear Regression Analysis:

```
The GLM Procedure
Multivariate Analysis of Variance

         H = Type III SSCP Matrix for z1

                    y1              y2
y1        214.96186763    178.26225891
y2        178.26225891    147.82823253

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for z1
E = Error SSCP Matrix

Characteristic                Characteristic Vector  V'EV=1
          Root    Percent              y1              y2
    1.89573606     100.00      0.10970859     -0.01905206
    0.00000000       0.00     -0.17533407      0.21143084

MANOVA Test Criteria and Exact F Statistics
for the Hypothesis of No Overall z1 Effect
H = Type III SSCP Matrix for z1
E = Error SSCP Matrix
S=1    M=0    N=0

Statistic                      Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda             0.34533534       1.90         2         2    0.3453
Pillai's Trace            0.65466466       1.90         2         2    0.3453
Hotelling-Lawley Trace    1.89573606       1.90         2         2    0.3453
Roy's Greatest Root       1.89573606       1.90         2         2    0.3453
```

## SAS output for a Multivariate Linear Regression Analysis:

```
The GLM Procedure
Multivariate Analysis of Variance

         H = Type III SSCP Matrix for z2

                    y1              y2
y1        21.872015222    20.255407498
y2        20.255407498    18.758286731

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for z2
E = Error SSCP Matrix

Characteristic                Characteristic Vector  V'EV=1
          Root    Percent              y1              y2
    0.19454961     100.00      0.06903935      0.02729059
    0.00000000       0.00     -0.19496558      0.21052601

MANOVA Test Criteria and Exact F Statistics
for the Hypothesis of No Overall z2 Effect
H = Type III SSCP Matrix for z2
E = Error SSCP Matrix
S=1    M=0    N=0

Statistic                      Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda             0.83713560       0.19         2         2    0.8371
Pillai's Trace            0.16286440       0.19         2         2    0.8371
Hotelling-Lawley Trace    0.19454961       0.19         2         2    0.8371
Roy's Greatest Root       0.19454961       0.19         2         2    0.8371
```

We can also build confidence intervals for the predicted mean value of $\mathbf{Y}_0$ associated with $\mathbf{z}_0$. If the model $\mathbf{Y} = \mathbf{Z}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ has normally distributed errors, then

$$\hat{\boldsymbol{\beta}}'\mathbf{z}_0 \sim N_m\!\left(\boldsymbol{\beta}'\mathbf{z}_0,\; \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\,\boldsymbol{\Sigma}\right)$$

independent of

$$n\hat{\boldsymbol{\Sigma}} \sim W_{n-r-1}(\boldsymbol{\Sigma})$$

so

$$
T^2 = \frac{(\hat{\boldsymbol{\beta}}'\mathbf{z}_0 - \boldsymbol{\beta}'\mathbf{z}_0)' \left(\dfrac{n}{n-r-1}\hat{\boldsymbol{\Sigma}}\right)^{-1} (\hat{\boldsymbol{\beta}}'\mathbf{z}_0 - \boldsymbol{\beta}'\mathbf{z}_0)}{\mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0}
$$

and the $100(1-\alpha)\%$ confidence ellipsoid for $\boldsymbol{\beta}'\mathbf{z}_0$ is given by

$$T^2 \leq \frac{m(n-r-1)}{n-r-m}\,F_{m,\,n-r-m}(\alpha)$$

and the $100(1-\alpha)\%$ simultaneous confidence intervals for the mean values of $Y_i$ associated with $\mathbf{z}_0$, i.e., $\mathbf{z}_0'\boldsymbol{\beta}_{(i)}$, are

$$
\mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)} \;\pm\; \sqrt{\frac{m(n-r-1)}{n-r-m}\,F_{m,\,n-r-m}(\alpha)}\;\sqrt{\mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0 \left(\frac{n}{n-r-1}\,\hat{\sigma}_{ii}\right)}, \qquad i = 1, \ldots, m
$$

Finally, we can build prediction intervals for the predicted value of $\mathbf{Y}_0$ associated with $\mathbf{z}_0$. Here the prediction error satisfies

$$\mathbf{Y}_0 - \hat{\boldsymbol{\beta}}'\mathbf{z}_0 \sim N_m\!\left(\mathbf{0},\; \left(1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\right)\boldsymbol{\Sigma}\right)$$

independent of

$$n\hat{\boldsymbol{\Sigma}} \sim W_{n-r-1}(\boldsymbol{\Sigma})$$

so

$$
T^2 = \frac{(\mathbf{Y}_0 - \hat{\boldsymbol{\beta}}'\mathbf{z}_0)' \left(\dfrac{n}{n-r-1}\hat{\boldsymbol{\Sigma}}\right)^{-1} (\mathbf{Y}_0 - \hat{\boldsymbol{\beta}}'\mathbf{z}_0)}{1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0}
$$

and the $100(1-\alpha)\%$ prediction ellipsoid associated with $\mathbf{z}_0$ is given by

$$T^2 \leq \frac{m(n-r-1)}{n-r-m}\,F_{m,\,n-r-m}(\alpha)$$

and the $100(1-\alpha)\%$ simultaneous prediction intervals for the individual responses $Y_{0i}$ associated with $\mathbf{z}_0$ are

$$
\mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)} \;\pm\; \sqrt{\frac{m(n-r-1)}{n-r-m}\,F_{m,\,n-r-m}(\alpha)}\;\sqrt{\left(1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\right)\left(\frac{n}{n-r-1}\,\hat{\sigma}_{ii}\right)}, \qquad i = 1, \ldots, m
$$
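As a concrete sketch, both sets of simultaneous intervals can be computed for the example data in SAS/IML. The new point $\mathbf{z}_0' = (1, 75, 80)$ and the level $\alpha = 0.05$ below are made up purely for illustration:

```sas
proc iml;
/* example data; z0 is a hypothetical new observation point */
Z = {1 65 71, 1 72 77, 1 77 73, 1 68 78, 1 81 76, 1 73 87};
Y = {63 67, 70 70, 72 70, 75 72, 89 88, 76 77};
n = 6; r = 2; m = 2; alpha = 0.05;

B  = inv(Z`*Z) * Z` * Y;        /* coefficient estimates             */
E  = (Y - Z*B)` * (Y - Z*B);    /* error SSCP matrix = n * Sigma-hat */
z0 = {1, 75, 80};

h0 = z0` * inv(Z`*Z) * z0;      /* leverage term z0'(Z'Z)^{-1} z0    */
c  = sqrt( (m*(n-r-1)/(n-r-m)) * finv(1-alpha, m, n-r-m) );
s2 = vecdiag(E) / (n-r-1);      /* n * sigma-hat_ii / (n - r - 1)    */

fit     = B` * z0;              /* point forecasts for both responses */
ci_mean = (fit - c#sqrt(h0#s2))     || (fit + c#sqrt(h0#s2));
ci_pred = (fit - c#sqrt((1+h0)#s2)) || (fit + c#sqrt((1+h0)#s2));

print fit, ci_mean[colname={"lower" "upper"}],
      ci_pred[colname={"lower" "upper"}];
quit;
```

As expected from the extra $1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0$ factor, the prediction intervals are always wider than the corresponding mean-response intervals.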