
Training Data Analysis

Multivariate Linear Regression


Novandri Kusuma Wardana
A. The Basic Principle
We consider the multivariate extension of multiple linear regression: modeling the relationship between m responses Y_1, ..., Y_m and a single set of r predictor variables z_1, ..., z_r. Each of the m responses is assumed to follow its own regression model, i.e.,

\[
\begin{aligned}
Y_1 &= \beta_{01} + \beta_{11}z_1 + \beta_{21}z_2 + \cdots + \beta_{r1}z_r + \varepsilon_1 \\
Y_2 &= \beta_{02} + \beta_{12}z_1 + \beta_{22}z_2 + \cdots + \beta_{r2}z_r + \varepsilon_2 \\
&\;\;\vdots \\
Y_m &= \beta_{0m} + \beta_{1m}z_1 + \beta_{2m}z_2 + \cdots + \beta_{rm}z_r + \varepsilon_m
\end{aligned}
\]

where \(E(\boldsymbol{\varepsilon}) = E[(\varepsilon_1, \ldots, \varepsilon_m)'] = \mathbf{0}\) and \(\mathrm{Var}(\boldsymbol{\varepsilon}) = \boldsymbol{\Sigma}\).

Conceptually, for the jth trial we can let

\[
\mathbf{z}_j' = [z_{j0}, z_{j1}, \ldots, z_{jr}]
\]

denote the values of the predictor variables, and

\[
\mathbf{Y}_j = \begin{bmatrix} Y_{j1} \\ Y_{j2} \\ \vdots \\ Y_{jm} \end{bmatrix},
\qquad
\boldsymbol{\varepsilon}_j = \begin{bmatrix} \varepsilon_{j1} \\ \varepsilon_{j2} \\ \vdots \\ \varepsilon_{jm} \end{bmatrix}
\]

be the responses and errors for the jth trial. Thus we have an n x (r + 1) design matrix

\[
\mathbf{Z} = \begin{bmatrix}
z_{10} & z_{11} & \cdots & z_{1r} \\
z_{20} & z_{21} & \cdots & z_{2r} \\
\vdots & \vdots & \ddots & \vdots \\
z_{n0} & z_{n1} & \cdots & z_{nr}
\end{bmatrix}
\]
If we now set

\[
\mathbf{Y} = \begin{bmatrix}
Y_{11} & Y_{12} & \cdots & Y_{1m} \\
Y_{21} & Y_{22} & \cdots & Y_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
Y_{n1} & Y_{n2} & \cdots & Y_{nm}
\end{bmatrix}
= \left[\, \mathbf{Y}_{(1)} \mid \mathbf{Y}_{(2)} \mid \cdots \mid \mathbf{Y}_{(m)} \,\right]
\]

\[
\boldsymbol{\beta} = \begin{bmatrix}
\beta_{01} & \beta_{02} & \cdots & \beta_{0m} \\
\beta_{11} & \beta_{12} & \cdots & \beta_{1m} \\
\vdots & \vdots & \ddots & \vdots \\
\beta_{r1} & \beta_{r2} & \cdots & \beta_{rm}
\end{bmatrix}
= \left[\, \boldsymbol{\beta}_{(1)} \mid \boldsymbol{\beta}_{(2)} \mid \cdots \mid \boldsymbol{\beta}_{(m)} \,\right]
\]

\[
\boldsymbol{\varepsilon} = \begin{bmatrix}
\varepsilon_{11} & \varepsilon_{12} & \cdots & \varepsilon_{1m} \\
\varepsilon_{21} & \varepsilon_{22} & \cdots & \varepsilon_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
\varepsilon_{n1} & \varepsilon_{n2} & \cdots & \varepsilon_{nm}
\end{bmatrix}
= \left[\, \boldsymbol{\varepsilon}_{(1)} \mid \boldsymbol{\varepsilon}_{(2)} \mid \cdots \mid \boldsymbol{\varepsilon}_{(m)} \,\right]
= \begin{bmatrix}
\boldsymbol{\varepsilon}_1' \\ \boldsymbol{\varepsilon}_2' \\ \vdots \\ \boldsymbol{\varepsilon}_n'
\end{bmatrix}
\]

the multivariate linear regression model is

\[
\mathbf{Y} = \mathbf{Z}\boldsymbol{\beta} + \boldsymbol{\varepsilon}
\]

with

\[
E\left(\boldsymbol{\varepsilon}_{(i)}\right) = \mathbf{0}
\quad\text{and}\quad
\mathrm{Cov}\left(\boldsymbol{\varepsilon}_{(i)}, \boldsymbol{\varepsilon}_{(k)}\right) = \sigma_{ik}\mathbf{I}, \qquad i, k = 1, \ldots, m
\]


Note also that the m observed responses on the jth trial have covariance matrix

\[
\boldsymbol{\Sigma} = \begin{bmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1m} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{m1} & \sigma_{m2} & \cdots & \sigma_{mm}
\end{bmatrix}
\]

The ordinary least squares estimates \(\hat{\boldsymbol{\beta}}\) are found in a manner analogous to the univariate case: we begin by taking

\[
\hat{\boldsymbol{\beta}}_{(i)} = \left(\mathbf{Z}'\mathbf{Z}\right)^{-1}\mathbf{Z}'\mathbf{Y}_{(i)}
\]

and collecting the univariate least squares estimates yields

\[
\hat{\boldsymbol{\beta}} = \left[\, \hat{\boldsymbol{\beta}}_{(1)} \mid \hat{\boldsymbol{\beta}}_{(2)} \mid \cdots \mid \hat{\boldsymbol{\beta}}_{(m)} \,\right]
= \left(\mathbf{Z}'\mathbf{Z}\right)^{-1}\mathbf{Z}'\left[\, \mathbf{Y}_{(1)} \mid \mathbf{Y}_{(2)} \mid \cdots \mid \mathbf{Y}_{(m)} \,\right]
= \left(\mathbf{Z}'\mathbf{Z}\right)^{-1}\mathbf{Z}'\mathbf{Y}
\]
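The fact that the joint estimate is just the per-response univariate fits stacked side by side can be checked directly. A minimal numpy sketch (an addition to these notes, which otherwise use SAS; the data below are made up purely for illustration):

```python
import numpy as np

# small illustrative dataset: n = 5 trials, r = 2 predictors, m = 2 responses
Z = np.array([[1., 2., 3.],
              [1., 4., 1.],
              [1., 5., 7.],
              [1., 7., 2.],
              [1., 8., 9.]])   # n x (r+1) design matrix, first column = intercept
Y = np.array([[3., 1.],
              [5., 2.],
              [9., 4.],
              [11., 3.],
              [16., 8.]])      # n x m response matrix

# joint estimate: beta_hat = (Z'Z)^{-1} Z'Y
beta_joint = np.linalg.solve(Z.T @ Z, Z.T @ Y)

# column-by-column: beta_hat_(i) = (Z'Z)^{-1} Z'Y_(i)
beta_cols = np.column_stack([
    np.linalg.solve(Z.T @ Z, Z.T @ Y[:, i]) for i in range(Y.shape[1])
])

assert np.allclose(beta_joint, beta_cols)  # the two computations agree
```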

Now for any choice of parameters

\[
\mathbf{B} = \left[\, \mathbf{b}_{(1)} \mid \mathbf{b}_{(2)} \mid \cdots \mid \mathbf{b}_{(m)} \,\right]
\]

the resulting matrix of errors is

\[
\mathbf{Y} - \mathbf{Z}\mathbf{B}
\]

and the resulting Error Sums of Squares and Crossproducts matrix is

\[
\left(\mathbf{Y} - \mathbf{Z}\mathbf{B}\right)'\left(\mathbf{Y} - \mathbf{Z}\mathbf{B}\right)
= \begin{bmatrix}
\left(\mathbf{Y}_{(1)} - \mathbf{Z}\mathbf{b}_{(1)}\right)'\left(\mathbf{Y}_{(1)} - \mathbf{Z}\mathbf{b}_{(1)}\right) & \cdots & \left(\mathbf{Y}_{(1)} - \mathbf{Z}\mathbf{b}_{(1)}\right)'\left(\mathbf{Y}_{(m)} - \mathbf{Z}\mathbf{b}_{(m)}\right) \\
\vdots & \ddots & \vdots \\
\left(\mathbf{Y}_{(m)} - \mathbf{Z}\mathbf{b}_{(m)}\right)'\left(\mathbf{Y}_{(1)} - \mathbf{Z}\mathbf{b}_{(1)}\right) & \cdots & \left(\mathbf{Y}_{(m)} - \mathbf{Z}\mathbf{b}_{(m)}\right)'\left(\mathbf{Y}_{(m)} - \mathbf{Z}\mathbf{b}_{(m)}\right)
\end{bmatrix}
\]

We can show that the selection \(\mathbf{b}_{(i)} = \hat{\boldsymbol{\beta}}_{(i)}\) minimizes the ith diagonal sum of squares \(\left(\mathbf{Y}_{(i)} - \mathbf{Z}\mathbf{b}_{(i)}\right)'\left(\mathbf{Y}_{(i)} - \mathbf{Z}\mathbf{b}_{(i)}\right)\), so the generalized variance \(\left|\left(\mathbf{Y} - \mathbf{Z}\mathbf{B}\right)'\left(\mathbf{Y} - \mathbf{Z}\mathbf{B}\right)\right|\) and the trace \(\mathrm{tr}\left[\left(\mathbf{Y} - \mathbf{Z}\mathbf{B}\right)'\left(\mathbf{Y} - \mathbf{Z}\mathbf{B}\right)\right]\) are both minimized.
So we have the matrix of predicted values

\[
\hat{\mathbf{Y}} = \mathbf{Z}\hat{\boldsymbol{\beta}} = \mathbf{Z}\left(\mathbf{Z}'\mathbf{Z}\right)^{-1}\mathbf{Z}'\mathbf{Y}
\]

and the resulting matrix of residuals

\[
\hat{\boldsymbol{\varepsilon}} = \mathbf{Y} - \hat{\mathbf{Y}} = \left[\mathbf{I} - \mathbf{Z}\left(\mathbf{Z}'\mathbf{Z}\right)^{-1}\mathbf{Z}'\right]\mathbf{Y}
\]

Note that the orthogonality conditions among residuals, predicted values, and columns of the design matrix which hold in the univariate case are also true in the multivariate case because

\[
\mathbf{Z}'\left[\mathbf{I} - \mathbf{Z}\left(\mathbf{Z}'\mathbf{Z}\right)^{-1}\mathbf{Z}'\right] = \mathbf{Z}' - \mathbf{Z}' = \mathbf{0}
\]

which means the residuals are perpendicular to the columns of the design matrix

\[
\mathbf{Z}'\hat{\boldsymbol{\varepsilon}} = \mathbf{Z}'\left[\mathbf{I} - \mathbf{Z}\left(\mathbf{Z}'\mathbf{Z}\right)^{-1}\mathbf{Z}'\right]\mathbf{Y} = \mathbf{0}
\]

and to the predicted values

\[
\hat{\mathbf{Y}}'\hat{\boldsymbol{\varepsilon}} = \hat{\boldsymbol{\beta}}'\mathbf{Z}'\left[\mathbf{I} - \mathbf{Z}\left(\mathbf{Z}'\mathbf{Z}\right)^{-1}\mathbf{Z}'\right]\mathbf{Y} = \mathbf{0}
\]

Furthermore, because \(\mathbf{Y} = \hat{\mathbf{Y}} + \hat{\boldsymbol{\varepsilon}}\), we have

\[
\underbrace{\mathbf{Y}'\mathbf{Y}}_{\substack{\text{total sums of squares} \\ \text{and crossproducts}}}
= \underbrace{\hat{\mathbf{Y}}'\hat{\mathbf{Y}}}_{\substack{\text{predicted sums of squares} \\ \text{and crossproducts}}}
+ \underbrace{\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}}_{\substack{\text{residual (error) sums of squares} \\ \text{and crossproducts}}}
\]
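The orthogonality conditions and the sums-of-squares decomposition above can be verified numerically. A numpy sketch (an addition to the notes; random data here stand in for any full-rank design):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 8, 2, 3                      # trials, predictors, responses (illustrative sizes)
Z = np.column_stack([np.ones(n), rng.normal(size=(n, r))])   # n x (r+1) design matrix
Y = rng.normal(size=(n, m))                                   # n x m responses

H = Z @ np.linalg.solve(Z.T @ Z, Z.T)  # hat matrix Z(Z'Z)^{-1}Z'
Y_hat = H @ Y                          # predicted values
resid = Y - Y_hat                      # residuals (I - H)Y

# residuals are orthogonal to the design-matrix columns and to the fits
assert np.allclose(Z.T @ resid, 0)
assert np.allclose(Y_hat.T @ resid, 0)

# total SSCP = predicted SSCP + residual SSCP
assert np.allclose(Y.T @ Y, Y_hat.T @ Y_hat + resid.T @ resid)
```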

Example: suppose we had the following six sample observations on two independent variables (palatability and texture) and two dependent variables (purchase intent and overall quality):

Palatability   Texture   Overall Quality   Purchase Intent
     65           71            63                67
     72           77            70                70
     77           73            72                70
     68           78            75                72
     81           76            89                88
     73           87            76                77

Use these data to estimate the multivariate linear regression model for which palatability and texture are independent variables while purchase intent and overall quality are the dependent variables.

We wish to estimate

\[
Y_1 = \beta_{01} + \beta_{11}z_1 + \beta_{21}z_2
\quad\text{and}\quad
Y_2 = \beta_{02} + \beta_{12}z_1 + \beta_{22}z_2
\]

jointly. The design matrix is

\[
\mathbf{Z} = \begin{bmatrix}
1 & 65 & 71 \\
1 & 72 & 77 \\
1 & 77 & 73 \\
1 & 68 & 78 \\
1 & 81 & 76 \\
1 & 73 & 87
\end{bmatrix}
\]
so

\[
\mathbf{Z}'\mathbf{Z} = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
65 & 72 & 77 & 68 & 81 & 73 \\
71 & 77 & 73 & 78 & 76 & 87
\end{bmatrix}
\begin{bmatrix}
1 & 65 & 71 \\
1 & 72 & 77 \\
1 & 77 & 73 \\
1 & 68 & 78 \\
1 & 81 & 76 \\
1 & 73 & 87
\end{bmatrix}
= \begin{bmatrix}
6 & 436 & 462 \\
436 & 31852 & 33591 \\
462 & 33591 & 35728
\end{bmatrix}
\]

and

\[
\left(\mathbf{Z}'\mathbf{Z}\right)^{-1}
= \begin{bmatrix}
6 & 436 & 462 \\
436 & 31852 & 33591 \\
462 & 33591 & 35728
\end{bmatrix}^{-1}
= \begin{bmatrix}
62.560597030 & -0.378268027 & -0.453330568 \\
-0.378268027 & 0.005988412 & -0.000738830 \\
-0.453330568 & -0.000738830 & 0.006584661
\end{bmatrix}
\]

and

\[
\mathbf{Z}'\mathbf{y}_{(1)} = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
65 & 72 & 77 & 68 & 81 & 73 \\
71 & 77 & 73 & 78 & 76 & 87
\end{bmatrix}
\begin{bmatrix} 63 \\ 70 \\ 72 \\ 75 \\ 89 \\ 76 \end{bmatrix}
= \begin{bmatrix} 445 \\ 32536 \\ 34345 \end{bmatrix}
\]

so

\[
\hat{\boldsymbol{\beta}}_{(1)} = \left(\mathbf{Z}'\mathbf{Z}\right)^{-1}\mathbf{Z}'\mathbf{y}_{(1)}
= \begin{bmatrix}
62.560597030 & -0.378268027 & -0.453330568 \\
-0.378268027 & 0.005988412 & -0.000738830 \\
-0.453330568 & -0.000738830 & 0.006584661
\end{bmatrix}
\begin{bmatrix} 445 \\ 32536 \\ 34345 \end{bmatrix}
= \begin{bmatrix} -37.501205460 \\ 1.134583728 \\ 0.379499410 \end{bmatrix}
\]

and

\[
\mathbf{Z}'\mathbf{y}_{(2)} = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
65 & 72 & 77 & 68 & 81 & 73 \\
71 & 77 & 73 & 78 & 76 & 87
\end{bmatrix}
\begin{bmatrix} 67 \\ 70 \\ 70 \\ 72 \\ 88 \\ 77 \end{bmatrix}
= \begin{bmatrix} 444 \\ 32430 \\ 34260 \end{bmatrix}
\]

so

\[
\hat{\boldsymbol{\beta}}_{(2)} = \left(\mathbf{Z}'\mathbf{Z}\right)^{-1}\mathbf{Z}'\mathbf{y}_{(2)}
= \begin{bmatrix}
62.560597030 & -0.378268027 & -0.453330568 \\
-0.378268027 & 0.005988412 & -0.000738830 \\
-0.453330568 & -0.000738830 & 0.006584661
\end{bmatrix}
\begin{bmatrix} 444 \\ 32430 \\ 34260 \end{bmatrix}
= \begin{bmatrix} -21.432293350 \\ 0.940880634 \\ 0.351449792 \end{bmatrix}
\]

so

\[
\hat{\boldsymbol{\beta}} = \left[\, \hat{\boldsymbol{\beta}}_{(1)} \mid \hat{\boldsymbol{\beta}}_{(2)} \,\right]
= \begin{bmatrix}
-37.501205460 & -21.432293350 \\
1.134583728 & 0.940880634 \\
0.379499410 & 0.351449792
\end{bmatrix}
\]

This gives us the matrix of estimated values

\[
\hat{\mathbf{Y}} = \mathbf{Z}\hat{\boldsymbol{\beta}}
= \begin{bmatrix}
1 & 65 & 71 \\
1 & 72 & 77 \\
1 & 77 & 73 \\
1 & 68 & 78 \\
1 & 81 & 76 \\
1 & 73 & 87
\end{bmatrix}
\begin{bmatrix}
-37.501205460 & -21.432293350 \\
1.134583728 & 0.940880634 \\
0.379499410 & 0.351449792
\end{bmatrix}
= \begin{bmatrix}
63.19119 & 64.67788 \\
73.41028 & 73.37275 \\
77.56520 & 76.67135 \\
69.25144 & 69.96067 \\
83.24203 & 81.48922 \\
78.33986 & 77.82812
\end{bmatrix}
\]

and the matrix of residuals

\[
\hat{\boldsymbol{\varepsilon}} = \mathbf{Y} - \hat{\mathbf{Y}}
= \begin{bmatrix}
63 & 67 \\
70 & 70 \\
72 & 70 \\
75 & 72 \\
89 & 88 \\
76 & 77
\end{bmatrix}
- \begin{bmatrix}
63.19119 & 64.67788 \\
73.41028 & 73.37275 \\
77.56520 & 76.67135 \\
69.25144 & 69.96067 \\
83.24203 & 81.48922 \\
78.33986 & 77.82812
\end{bmatrix}
= \begin{bmatrix}
-0.191194960 & 2.322116943 \\
-3.410277515 & -3.372746244 \\
-5.565198512 & -6.671350244 \\
5.748557985 & 2.039326498 \\
5.757968347 & 6.510777845 \\
-2.339855345 & -0.828124797
\end{bmatrix}
\]

Note that each column sums to zero!
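The whole worked example can be reproduced in a few lines. A numpy sketch (an addition to the notes, which use SAS for this analysis):

```python
import numpy as np

z1 = [65, 72, 77, 68, 81, 73]          # palatability
z2 = [71, 77, 73, 78, 76, 87]          # texture
y1 = [63, 70, 72, 75, 89, 76]          # overall quality
y2 = [67, 70, 70, 72, 88, 77]          # purchase intent

Z = np.column_stack([np.ones(6), z1, z2])   # 6 x 3 design matrix
Y = np.column_stack([y1, y2])               # 6 x 2 response matrix

beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)   # (Z'Z)^{-1} Z'Y, both responses at once
resid = Y - Z @ beta_hat                       # residual matrix

print(beta_hat)            # columns match beta_hat_(1) and beta_hat_(2) above
print(resid.sum(axis=0))   # each residual column sums to (numerically) zero
```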

B. Inference in Multivariate Regression

The least squares estimators

\[
\hat{\boldsymbol{\beta}} = \left[\, \hat{\boldsymbol{\beta}}_{(1)} \mid \hat{\boldsymbol{\beta}}_{(2)} \mid \cdots \mid \hat{\boldsymbol{\beta}}_{(m)} \,\right]
\]

of the multivariate regression model have the following properties:

- unbiasedness, i.e., \(E\left(\hat{\boldsymbol{\beta}}_{(i)}\right) = \boldsymbol{\beta}_{(i)}\)
- \(\mathrm{Cov}\left(\hat{\boldsymbol{\beta}}_{(i)}, \hat{\boldsymbol{\beta}}_{(k)}\right) = \sigma_{ik}\left(\mathbf{Z}'\mathbf{Z}\right)^{-1}, \quad i, k = 1, \ldots, m\)
- \(E\left(\hat{\boldsymbol{\varepsilon}}\right) = \mathbf{0}\) and \(E\left(\dfrac{\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}}{n - r - 1}\right) = \boldsymbol{\Sigma}\)

if the model is of full rank, i.e., rank(\(\mathbf{Z}\)) = r + 1 < n. Note that \(\hat{\boldsymbol{\varepsilon}}\) and \(\hat{\boldsymbol{\beta}}\) are also uncorrelated.
~

This means that, for any observation \(\mathbf{z}_0\),

\[
\hat{\boldsymbol{\beta}}'\mathbf{z}_0 = \left[\, \mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(1)} \mid \mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(2)} \mid \cdots \mid \mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(m)} \,\right]'
\]

is an unbiased estimator, i.e.,

\[
E\left(\mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)}\right) = \mathbf{z}_0'\boldsymbol{\beta}_{(i)}
\]

We can also determine from these properties that the estimation errors \(\mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)} - \mathbf{z}_0'\boldsymbol{\beta}_{(i)}\) have covariances

\[
E\left[\mathbf{z}_0'\left(\hat{\boldsymbol{\beta}}_{(i)} - \boldsymbol{\beta}_{(i)}\right)\left(\hat{\boldsymbol{\beta}}_{(k)} - \boldsymbol{\beta}_{(k)}\right)'\mathbf{z}_0\right]
= \sigma_{ik}\,\mathbf{z}_0'\left(\mathbf{Z}'\mathbf{Z}\right)^{-1}\mathbf{z}_0
\]

Furthermore, we can easily ascertain that

\[
E\left(\hat{\boldsymbol{\beta}}'\mathbf{z}_0\right) = \boldsymbol{\beta}'\mathbf{z}_0 = E\left(\mathbf{Y}_0\right)
\]

i.e., the forecasted vector \(\hat{\mathbf{Y}}_0\) associated with the values of the predictor variables \(\mathbf{z}_0\) is an unbiased estimator of \(\mathbf{Y}_0\). The forecast errors have covariance

\[
E\left[\left(Y_{0i} - \mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)}\right)\left(Y_{0k} - \mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(k)}\right)\right]
= \sigma_{ik}\left[1 + \mathbf{z}_0'\left(\mathbf{Z}'\mathbf{Z}\right)^{-1}\mathbf{z}_0\right]
\]

Thus, for the multivariate regression model with full rank(\(\mathbf{Z}\)) = r + 1, n \(\geq\) r + 1 + m, and normally distributed errors \(\boldsymbol{\varepsilon}\),

\[
\hat{\boldsymbol{\beta}} = \left(\mathbf{Z}'\mathbf{Z}\right)^{-1}\mathbf{Z}'\mathbf{Y}
\]

is the maximum likelihood estimator of \(\boldsymbol{\beta}\), with

\[
\mathrm{Cov}\left(\hat{\boldsymbol{\beta}}_{(i)}, \hat{\boldsymbol{\beta}}_{(k)}\right) = \sigma_{ik}\left(\mathbf{Z}'\mathbf{Z}\right)^{-1}, \qquad i, k = 1, \ldots, m
\]

Also, the maximum likelihood estimator of \(\boldsymbol{\beta}\) is independent of the maximum likelihood estimator of the positive definite matrix \(\boldsymbol{\Sigma}\) given by

\[
\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}
= \frac{1}{n}\left(\mathbf{Y} - \mathbf{Z}\hat{\boldsymbol{\beta}}\right)'\left(\mathbf{Y} - \mathbf{Z}\hat{\boldsymbol{\beta}}\right)
\]

and

\[
n\hat{\boldsymbol{\Sigma}} \sim W_{m,\,n-r-1}\left(\boldsymbol{\Sigma}\right)
\]

all of which provide additional support for using the least squares estimates when the errors are normally distributed: \(\hat{\boldsymbol{\beta}}\) and \(n^{-1}\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}\) are the maximum likelihood estimators of \(\boldsymbol{\beta}\) and \(\boldsymbol{\Sigma}\).

These results can be used to develop likelihood ratio tests for the multivariate regression parameters. The hypothesis that the responses do not depend on predictor variables \(z_{q+1}, z_{q+2}, \ldots, z_r\) is

\[
H_0: \boldsymbol{\beta}_{(2)} = \mathbf{0}
\quad\text{where}\quad
\boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\beta}_{(1)} \\ \boldsymbol{\beta}_{(2)} \end{bmatrix}
\]

with \(\boldsymbol{\beta}_{(1)}\) of dimension (q + 1) x m and \(\boldsymbol{\beta}_{(2)}\) of dimension (r - q) x m. If we partition \(\mathbf{Z}\) in a similar manner,

\[
\mathbf{Z} = \left[\, \mathbf{Z}_1 \mid \mathbf{Z}_2 \,\right]
\]

where \(\mathbf{Z}_1\) is n x (q + 1) and \(\mathbf{Z}_2\) is n x (r - q), we can write the general model as

\[
E\left(\mathbf{Y}\right) = \mathbf{Z}\boldsymbol{\beta}
= \left[\, \mathbf{Z}_1 \mid \mathbf{Z}_2 \,\right]\begin{bmatrix} \boldsymbol{\beta}_{(1)} \\ \boldsymbol{\beta}_{(2)} \end{bmatrix}
= \mathbf{Z}_1\boldsymbol{\beta}_{(1)} + \mathbf{Z}_2\boldsymbol{\beta}_{(2)}
\]

The extra sum of squares associated with \(\boldsymbol{\beta}_{(2)}\) is

\[
n\hat{\boldsymbol{\Sigma}}_1 - n\hat{\boldsymbol{\Sigma}}
= \left(\mathbf{Y} - \mathbf{Z}_1\hat{\boldsymbol{\beta}}_{(1)}\right)'\left(\mathbf{Y} - \mathbf{Z}_1\hat{\boldsymbol{\beta}}_{(1)}\right)
- \left(\mathbf{Y} - \mathbf{Z}\hat{\boldsymbol{\beta}}\right)'\left(\mathbf{Y} - \mathbf{Z}\hat{\boldsymbol{\beta}}\right)
\]

where

\[
\hat{\boldsymbol{\beta}}_{(1)} = \left(\mathbf{Z}_1'\mathbf{Z}_1\right)^{-1}\mathbf{Z}_1'\mathbf{Y},
\qquad
\hat{\boldsymbol{\Sigma}}_1 = n^{-1}\left(\mathbf{Y} - \mathbf{Z}_1\hat{\boldsymbol{\beta}}_{(1)}\right)'\left(\mathbf{Y} - \mathbf{Z}_1\hat{\boldsymbol{\beta}}_{(1)}\right)
\]

and

\[
n\hat{\boldsymbol{\Sigma}} = \left(\mathbf{Y} - \mathbf{Z}\hat{\boldsymbol{\beta}}\right)'\left(\mathbf{Y} - \mathbf{Z}\hat{\boldsymbol{\beta}}\right)
\]

The likelihood ratio for the test of the hypothesis \(H_0: \boldsymbol{\beta}_{(2)} = \mathbf{0}\) is given by the ratio of generalized variances

\[
\Lambda = \frac{\max_{\boldsymbol{\beta}_{(1)},\,\boldsymbol{\Sigma}} L\left(\boldsymbol{\beta}_{(1)}, \boldsymbol{\Sigma}\right)}{\max_{\boldsymbol{\beta},\,\boldsymbol{\Sigma}} L\left(\boldsymbol{\beta}, \boldsymbol{\Sigma}\right)}
= \left(\frac{|\hat{\boldsymbol{\Sigma}}|}{|\hat{\boldsymbol{\Sigma}}_1|}\right)^{n/2}
\]

which is often converted to Wilks' lambda statistic

\[
\Lambda^{2/n} = \frac{|\hat{\boldsymbol{\Sigma}}|}{|\hat{\boldsymbol{\Sigma}}_1|}
\]

Finally, for the multivariate regression model with full rank(\(\mathbf{Z}\)) = r + 1, n \(\geq\) r + 1 + m, normally distributed errors \(\boldsymbol{\varepsilon}\), and the null hypothesis true (so that \(n\left(\hat{\boldsymbol{\Sigma}}_1 - \hat{\boldsymbol{\Sigma}}\right) \sim W_{m,\,r-q}\left(\boldsymbol{\Sigma}\right)\)),

\[
-\left[n - r - 1 - \frac{1}{2}\left(m - r + q + 1\right)\right]\ln\Lambda^{2/n}
\;\dot{\sim}\; \chi^2_{m(r-q)}
\]

when n - r and n - m are both large.

If we again refer to the Error Sum of Squares and Crossproducts as

\[
\mathbf{E} = n\hat{\boldsymbol{\Sigma}}
\]

and the Hypothesis Sum of Squares and Crossproducts as

\[
\mathbf{H} = n\left(\hat{\boldsymbol{\Sigma}}_1 - \hat{\boldsymbol{\Sigma}}\right)
\]

then we can define Wilks' lambda as

\[
\Lambda^* = \Lambda^{2/n} = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|} = \prod_{i=1}^{s}\frac{1}{1 + \eta_i}
\]

where \(\eta_1 \geq \eta_2 \geq \cdots \geq \eta_s\) are the ordered eigenvalues of \(\mathbf{H}\mathbf{E}^{-1}\), with s = min(m, r - q).

There are other similar tests (as we have seen in our discussion of MANOVA):

\[
\text{Pillai's Trace} = \sum_{i=1}^{s}\frac{\eta_i}{1 + \eta_i} = \mathrm{tr}\left[\mathbf{H}\left(\mathbf{H} + \mathbf{E}\right)^{-1}\right]
\]

\[
\text{Hotelling-Lawley Trace} = \sum_{i=1}^{s}\eta_i = \mathrm{tr}\left[\mathbf{H}\mathbf{E}^{-1}\right]
\]

\[
\text{Roy's Greatest Root} = \frac{\eta_1}{1 + \eta_1}
\]

Each of these statistics is an alternative to Wilks' lambda, and they perform in a very similar manner (particularly for large sample sizes). (Note that the SAS output below reports the eigenvalue \(\eta_1\) itself as Roy's greatest root.)
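All four statistics are functions of the eigenvalues of \(\mathbf{H}\mathbf{E}^{-1}\). A numpy sketch (an addition to the notes) using the E and H matrices from the palatability test in the example that follows:

```python
import numpy as np

# E and H (Type III SSCP for palatability) from the worked example / SAS output
E = np.array([[114.31302415,  99.335143683],
              [ 99.335143683, 108.5094298]])
H = np.array([[214.96186763, 178.26225891],
              [178.26225891, 147.82823253]])

# ordered eigenvalues of HE^{-1} (here rank(H) = 1, so one eigenvalue is ~0)
eta = np.sort(np.linalg.eigvals(H @ np.linalg.inv(E)).real)[::-1]

wilks     = np.prod(1.0 / (1.0 + eta))   # |E| / |E + H|
pillai    = np.sum(eta / (1.0 + eta))    # tr[H(H + E)^{-1}]
hotelling = np.sum(eta)                  # tr[HE^{-1}]
roy       = eta[0] / (1.0 + eta[0])      # note: SAS reports eta[0] itself

# the product form of Wilks' lambda agrees with the determinant form
assert np.isclose(wilks, np.linalg.det(E) / np.linalg.det(E + H))
```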

Example: For our previous data (the six sample observations on two independent variables, palatability and texture, and two dependent variables, purchase intent and overall quality), test the hypotheses that (i) palatability has no joint relationship with purchase intent and overall quality, and (ii) texture has no joint relationship with purchase intent and overall quality.

We first test the hypothesis that palatability has no joint relationship with purchase intent and overall quality, i.e.,

\[
H_0: \boldsymbol{\beta}_{(1)} = \mathbf{0}
\]

The likelihood ratio for the test of this hypothesis is given by the ratio of generalized variances

\[
\Lambda = \frac{\max_{\boldsymbol{\beta}_{(2)},\,\boldsymbol{\Sigma}} L\left(\boldsymbol{\beta}_{(2)}, \boldsymbol{\Sigma}\right)}{\max_{\boldsymbol{\beta},\,\boldsymbol{\Sigma}} L\left(\boldsymbol{\beta}, \boldsymbol{\Sigma}\right)}
= \left(\frac{|\hat{\boldsymbol{\Sigma}}|}{|\hat{\boldsymbol{\Sigma}}_2|}\right)^{n/2}
\]

where \(\hat{\boldsymbol{\Sigma}}_2\) comes from the reduced model containing only the intercept and texture.

For ease of computation, we'll use the Wilks' lambda statistic

\[
\Lambda^* = \Lambda^{2/n} = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|}
\]

The error sum of squares and crossproducts matrix is

\[
\mathbf{E} = \begin{bmatrix}
114.31302415 & 99.335143683 \\
99.335143683 & 108.5094298
\end{bmatrix}
\]

and the hypothesis sum of squares and crossproducts matrix for this null hypothesis is

\[
\mathbf{H} = \begin{bmatrix}
214.96186763 & 178.26225891 \\
178.26225891 & 147.82823253
\end{bmatrix}
\]

so the calculated value of the Wilks' lambda statistic is

\[
\Lambda^* = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|}
= \frac{\begin{vmatrix} 114.31302415 & 99.335143683 \\ 99.335143683 & 108.5094298 \end{vmatrix}}{\begin{vmatrix} 329.27489178 & 277.59740259 \\ 277.59740259 & 256.33766233 \end{vmatrix}}
= \frac{2536.570299}{7345.238098} = 0.34533534
\]
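E and H need not be read off SAS output: both follow from fitting the full model and the reduced model that omits palatability, since H is the extra error SSCP of the reduced fit. A numpy sketch (an illustrative addition to the notes):

```python
import numpy as np

z1 = [65, 72, 77, 68, 81, 73]                  # palatability
z2 = [71, 77, 73, 78, 76, 87]                  # texture
Y = np.column_stack([[63, 70, 72, 75, 89, 76],  # overall quality
                     [67, 70, 70, 72, 88, 77]]) # purchase intent

def error_sscp(Z, Y):
    """Residual SSCP (Y - Z beta_hat)'(Y - Z beta_hat) for an OLS fit."""
    resid = Y - Z @ np.linalg.solve(Z.T @ Z, Z.T @ Y)
    return resid.T @ resid

Z_full = np.column_stack([np.ones(6), z1, z2])
Z_red  = np.column_stack([np.ones(6), z2])     # drop palatability (z1)

E = error_sscp(Z_full, Y)                      # error SSCP, full model
H = error_sscp(Z_red, Y) - E                   # extra SSCP due to z1

wilks = np.linalg.det(E) / np.linalg.det(E + H)
print(round(wilks, 8))                         # ≈ 0.34533534, as computed above
```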

The transformation to a Chi-square distributed statistic (which is actually valid only when n - r and n - m are both large) is

\[
-\left[n - r - 1 - \frac{1}{2}\left(m - r + q + 1\right)\right]\ln\Lambda^{2/n}
= -\left[6 - 2 - 1 - \frac{1}{2}\left(2 - 2 + 1 + 1\right)\right]\ln 0.34533534
= -2\ln 0.34533534 = 2.1265
\]

At \(\alpha\) = 0.01 and m(r - q) = 2 degrees of freedom, the critical value is 9.210351, so we have a strong nonrejection. Also, the approximate p-value of this chi-square test is 0.3453; note that this is an extremely gross approximation (since n - r = 4 and n - m = 4).
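The chi-square approximation is easy to evaluate directly; for two degrees of freedom the upper-tail probability has the closed form exp(-x/2), so only the standard library is needed. A Python sketch (an addition to the notes):

```python
import math

n, r, m, q = 6, 2, 2, 1          # six trials, two predictors, two responses, q = 1
wilks = 0.34533534               # Wilks' lambda for the palatability test

scale = n - r - 1 - 0.5 * (m - r + q + 1)   # Bartlett-type multiplier (= 2 here)
stat = -scale * math.log(wilks)             # approx. chi-square with m(r - q) = 2 df
p_value = math.exp(-stat / 2)               # chi-square(2) upper tail: exp(-x/2)

print(round(stat, 4), round(p_value, 4))
```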

We next test the hypothesis that texture has no joint relationship with purchase intent and overall quality, i.e.,

\[
H_0: \boldsymbol{\beta}_{(2)} = \mathbf{0}
\]

The likelihood ratio for the test of this hypothesis is given by the ratio of generalized variances

\[
\Lambda = \frac{\max_{\boldsymbol{\beta}_{(1)},\,\boldsymbol{\Sigma}} L\left(\boldsymbol{\beta}_{(1)}, \boldsymbol{\Sigma}\right)}{\max_{\boldsymbol{\beta},\,\boldsymbol{\Sigma}} L\left(\boldsymbol{\beta}, \boldsymbol{\Sigma}\right)}
= \left(\frac{|\hat{\boldsymbol{\Sigma}}|}{|\hat{\boldsymbol{\Sigma}}_1|}\right)^{n/2}
\]

where \(\hat{\boldsymbol{\Sigma}}_1\) comes from the reduced model containing only the intercept and palatability.

For ease of computation, we'll use the Wilks' lambda statistic

\[
\Lambda^* = \Lambda^{2/n} = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|}
\]

The error sum of squares and crossproducts matrix is

\[
\mathbf{E} = \begin{bmatrix}
114.31302415 & 99.335143683 \\
99.335143683 & 108.5094298
\end{bmatrix}
\]

and the hypothesis sum of squares and crossproducts matrix for this null hypothesis is

\[
\mathbf{H} = \begin{bmatrix}
21.872015222 & 20.255407498 \\
20.255407498 & 18.758286731
\end{bmatrix}
\]

so the calculated value of the Wilks' lambda statistic is

\[
\Lambda^* = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|}
= \frac{\begin{vmatrix} 114.31302415 & 99.335143683 \\ 99.335143683 & 108.5094298 \end{vmatrix}}{\begin{vmatrix} 136.18503937 & 119.59055118 \\ 119.59055118 & 127.26771653 \end{vmatrix}}
= \frac{2536.570299}{3030.059055} = 0.837135598
\]

The transformation to a Chi-square distributed statistic (which is actually valid only when n - r and n - m are both large) is

\[
-\left[n - r - 1 - \frac{1}{2}\left(m - r + q + 1\right)\right]\ln\Lambda^{2/n}
= -\left[6 - 2 - 1 - \frac{1}{2}\left(2 - 2 + 1 + 1\right)\right]\ln 0.837135598
= -2\ln 0.837135598 = 0.3555
\]

At \(\alpha\) = 0.01 and m(r - q) = 2 degrees of freedom, the critical value is 9.210351, so we have a strong nonrejection. Also, the approximate p-value of this chi-square test is 0.8371; note that this is an extremely gross approximation (since n - r = 4 and n - m = 4).

SAS code for a Multivariate Linear Regression Analysis:

OPTIONS LINESIZE = 72 NODATE PAGENO = 1;
DATA stuff;
INPUT z1 z2 y1 y2;
LABEL z1='Palatability Rating'
      z2='Texture Rating'
      y1='Overall Quality Rating'
      y2='Purchase Intent';
CARDS;
65 71 63 67
72 77 70 70
77 73 72 70
68 78 75 72
81 76 89 88
73 87 76 77
;
PROC GLM DATA=stuff;
MODEL y1 y2 = z1 z2;
MANOVA H=z1 z2 / PRINTE PRINTH;
TITLE4 'Using PROC GLM for Multivariate Linear Regression';
RUN;

SAS output for a Multivariate Linear Regression Analysis:

Dependent Variable: y1   Overall Quality Rating

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              2       256.5203092    128.2601546       3.37    0.1711
Error              3       114.3130241     38.1043414
Corrected Total    5       370.8333333

R-Square    Coeff Var    Root MSE    y1 Mean
0.691740     8.322973    6.172871   74.16667

Source    DF      Type I SS    Mean Square    F Value    Pr > F
z1         1    234.6482940    234.6482940       6.16    0.0891
z2         1     21.8720152     21.8720152       0.57    0.5037

Source    DF    Type III SS    Mean Square    F Value    Pr > F
z1         1    214.9618676    214.9618676       5.64    0.0980
z2         1     21.8720152     21.8720152       0.57    0.5037

                                  Standard
Parameter        Estimate            Error    t Value    Pr > |t|
Intercept    -37.50120546      48.82448511      -0.77      0.4984
z1             1.13458373       0.47768661       2.38      0.0980
z2             0.37949941       0.50090335       0.76      0.5037

SAS output for a Multivariate Linear Regression Analysis:

Dependent Variable: y2   Purchase Intent

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              2       181.4905702     90.7452851       2.51    0.2289
Error              3       108.5094298     36.1698099
Corrected Total    5       290.0000000

R-Square    Coeff Var    Root MSE    y2 Mean
0.625830     8.127208    6.014134   74.00000

Source    DF      Type I SS    Mean Square    F Value    Pr > F
z1         1    162.7322835    162.7322835       4.50    0.1241
z2         1     18.7582867     18.7582867       0.52    0.5235

Source    DF    Type III SS    Mean Square    F Value    Pr > F
z1         1    147.8282325    147.8282325       4.09    0.1364
z2         1     18.7582867     18.7582867       0.52    0.5235

                                  Standard
Parameter        Estimate            Error    t Value    Pr > |t|
Intercept    -21.43229335      47.56894895      -0.45      0.6829
z1             0.94088063       0.46540276       2.02      0.1364
z2             0.35144979       0.48802247       0.72      0.5235

SAS output for a Multivariate Linear Regression Analysis:

The GLM Procedure
Multivariate Analysis of Variance

E = Error SSCP Matrix

          y1              y2
y1    114.31302415     99.335143683
y2     99.335143683    108.5094298

Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|

DF = 3        y1          y2
y1      1.000000    0.891911
                      0.1081
y2      0.891911    1.000000
          0.1081

SAS output for a Multivariate Linear Regression Analysis:

The GLM Procedure
Multivariate Analysis of Variance

H = Type III SSCP Matrix for z1

          y1              y2
y1    214.96186763    178.26225891
y2    178.26225891    147.82823253

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for z1
E = Error SSCP Matrix

Characteristic              Characteristic Vector  V'EV=1
          Root    Percent            y1             y2
    1.89573606     100.00    0.10970859    -0.01905206
    0.00000000       0.00   -0.17533407     0.21143084

MANOVA Test Criteria and Exact F Statistics
for the Hypothesis of No Overall z1 Effect
H = Type III SSCP Matrix for z1
E = Error SSCP Matrix
S=1    M=0    N=0

Statistic                      Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda             0.34533534       1.90         2         2    0.3453
Pillai's Trace            0.65466466       1.90         2         2    0.3453
Hotelling-Lawley Trace    1.89573606       1.90         2         2    0.3453
Roy's Greatest Root       1.89573606       1.90         2         2    0.3453

SAS output for a Multivariate Linear Regression Analysis:

The GLM Procedure
Multivariate Analysis of Variance

H = Type III SSCP Matrix for z2

          y1              y2
y1    21.872015222    20.255407498
y2    20.255407498    18.758286731

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for z2
E = Error SSCP Matrix

Characteristic              Characteristic Vector  V'EV=1
          Root    Percent            y1             y2
    0.19454961     100.00    0.06903935     0.02729059
    0.00000000       0.00   -0.19496558     0.21052601

MANOVA Test Criteria and Exact F Statistics
for the Hypothesis of No Overall z2 Effect
H = Type III SSCP Matrix for z2
E = Error SSCP Matrix
S=1    M=0    N=0

Statistic                      Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda             0.83713560       0.19         2         2    0.8371
Pillai's Trace            0.16286440       0.19         2         2    0.8371
Hotelling-Lawley Trace    0.19454961       0.19         2         2    0.8371
Roy's Greatest Root       0.19454961       0.19         2         2    0.8371

We can also build confidence intervals for the predicted mean value of \(\mathbf{Y}_0\) associated with \(\mathbf{z}_0\). If the model

\[
\mathbf{Y} = \mathbf{Z}\boldsymbol{\beta} + \boldsymbol{\varepsilon}
\]

has normal errors, then

\[
\hat{\boldsymbol{\beta}}'\mathbf{z}_0 \sim N_m\left(\boldsymbol{\beta}'\mathbf{z}_0,\; \mathbf{z}_0'\left(\mathbf{Z}'\mathbf{Z}\right)^{-1}\mathbf{z}_0\,\boldsymbol{\Sigma}\right)
\quad\text{and}\quad
n\hat{\boldsymbol{\Sigma}} \sim W_{n-r-1}\left(\boldsymbol{\Sigma}\right)
\]

are independent, so

\[
T^2 = \left(\frac{\hat{\boldsymbol{\beta}}'\mathbf{z}_0 - \boldsymbol{\beta}'\mathbf{z}_0}{\sqrt{\mathbf{z}_0'\left(\mathbf{Z}'\mathbf{Z}\right)^{-1}\mathbf{z}_0}}\right)'
\left(\frac{n}{n - r - 1}\hat{\boldsymbol{\Sigma}}\right)^{-1}
\left(\frac{\hat{\boldsymbol{\beta}}'\mathbf{z}_0 - \boldsymbol{\beta}'\mathbf{z}_0}{\sqrt{\mathbf{z}_0'\left(\mathbf{Z}'\mathbf{Z}\right)^{-1}\mathbf{z}_0}}\right)
\]

has a \(T^2\) distribution. Thus the 100(1 - \(\alpha\))% confidence region for the predicted mean value of \(\mathbf{Y}_0\) associated with \(\mathbf{z}_0\), i.e., \(\boldsymbol{\beta}'\mathbf{z}_0\), is given by

\[
\left(\hat{\boldsymbol{\beta}}'\mathbf{z}_0 - \boldsymbol{\beta}'\mathbf{z}_0\right)'
\left(\frac{n}{n - r - 1}\hat{\boldsymbol{\Sigma}}\right)^{-1}
\left(\hat{\boldsymbol{\beta}}'\mathbf{z}_0 - \boldsymbol{\beta}'\mathbf{z}_0\right)
\leq
\mathbf{z}_0'\left(\mathbf{Z}'\mathbf{Z}\right)^{-1}\mathbf{z}_0
\left[\frac{m\left(n - r - 1\right)}{n - r - m}\right]F_{m,\,n-r-m}\left(\alpha\right)
\]

and the 100(1 - \(\alpha\))% simultaneous confidence intervals for the mean values \(\mathbf{z}_0'\boldsymbol{\beta}_{(i)}\) associated with \(\mathbf{z}_0\) are

\[
\mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)} \;\pm\;
\sqrt{\frac{m\left(n - r - 1\right)}{n - r - m}F_{m,\,n-r-m}\left(\alpha\right)}
\sqrt{\mathbf{z}_0'\left(\mathbf{Z}'\mathbf{Z}\right)^{-1}\mathbf{z}_0\left(\frac{n}{n - r - 1}\hat{\sigma}_{ii}\right)},
\qquad i = 1, \ldots, m
\]
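For the worked example (m = 2, n = 6, r = 2) these simultaneous intervals can be evaluated at, say, z0 = (1, 65, 71), the first design point. A numpy sketch (an illustrative addition; the critical value F_{2,2}(0.05) = 19.00 is hard-coded from standard F tables):

```python
import numpy as np

z1 = [65, 72, 77, 68, 81, 73]
z2 = [71, 77, 73, 78, 76, 87]
Y = np.column_stack([[63, 70, 72, 75, 89, 76],
                     [67, 70, 70, 72, 88, 77]])
Z = np.column_stack([np.ones(6), z1, z2])
n, m, r = 6, 2, 2

beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)
resid = Y - Z @ beta_hat
E = resid.T @ resid                       # n * Sigma_hat

z0 = np.array([1.0, 65.0, 71.0])
center = beta_hat.T @ z0                  # z0' beta_hat_(i), i = 1, 2
lev = z0 @ np.linalg.solve(Z.T @ Z, z0)   # leverage z0'(Z'Z)^{-1} z0

F_crit = 19.00                            # F_{m, n-r-m}(0.05) = F_{2,2}(0.05), from tables
scale = m * (n - r - 1) / (n - r - m) * F_crit

# half-width: sqrt(scale) * sqrt( lev * (n/(n-r-1)) * sigma_hat_ii ), with n*sigma_hat_ii = E_ii
half = np.sqrt(scale) * np.sqrt(lev * np.diag(E) / (n - r - 1))

for i in range(m):
    print(f"response {i + 1}: {center[i]:.4f} +/- {half[i]:.4f}")
```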

Finally, we can build prediction intervals for the predicted value of \(\mathbf{Y}_0\) associated with \(\mathbf{z}_0\). Here the relevant quantity is the prediction error \(\mathbf{Y}_0 - \hat{\boldsymbol{\beta}}'\mathbf{z}_0\); if the model has normal errors, then

\[
\mathbf{Y}_0 - \hat{\boldsymbol{\beta}}'\mathbf{z}_0 \sim N_m\left(\mathbf{0},\; \left[1 + \mathbf{z}_0'\left(\mathbf{Z}'\mathbf{Z}\right)^{-1}\mathbf{z}_0\right]\boldsymbol{\Sigma}\right)
\quad\text{and}\quad
n\hat{\boldsymbol{\Sigma}} \sim W_{n-r-1}\left(\boldsymbol{\Sigma}\right)
\]

are independent, so the 100(1 - \(\alpha\))% prediction region associated with \(\mathbf{z}_0\) is given by

\[
\left(\mathbf{Y}_0 - \hat{\boldsymbol{\beta}}'\mathbf{z}_0\right)'
\left(\frac{n}{n - r - 1}\hat{\boldsymbol{\Sigma}}\right)^{-1}
\left(\mathbf{Y}_0 - \hat{\boldsymbol{\beta}}'\mathbf{z}_0\right)
\leq
\left[1 + \mathbf{z}_0'\left(\mathbf{Z}'\mathbf{Z}\right)^{-1}\mathbf{z}_0\right]
\left[\frac{m\left(n - r - 1\right)}{n - r - m}\right]F_{m,\,n-r-m}\left(\alpha\right)
\]

and the 100(1 - \(\alpha\))% simultaneous prediction intervals for the individual responses \(Y_{0i}\) associated with \(\mathbf{z}_0\) are

\[
\mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)} \;\pm\;
\sqrt{\frac{m\left(n - r - 1\right)}{n - r - m}F_{m,\,n-r-m}\left(\alpha\right)}
\sqrt{\left[1 + \mathbf{z}_0'\left(\mathbf{Z}'\mathbf{Z}\right)^{-1}\mathbf{z}_0\right]\left(\frac{n}{n - r - 1}\hat{\sigma}_{ii}\right)},
\qquad i = 1, \ldots, m
\]
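The only change from the confidence-interval half-width is the extra "1 +" leverage term inside the second square root, which makes prediction intervals strictly wider. A numpy sketch comparing the two for the worked example (an illustrative addition; F_{2,2}(0.05) = 19.00 hard-coded from tables):

```python
import numpy as np

z1 = [65, 72, 77, 68, 81, 73]
z2 = [71, 77, 73, 78, 76, 87]
Y = np.column_stack([[63, 70, 72, 75, 89, 76],
                     [67, 70, 70, 72, 88, 77]])
Z = np.column_stack([np.ones(6), z1, z2])
n, m, r = 6, 2, 2

beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)
E = (Y - Z @ beta_hat).T @ (Y - Z @ beta_hat)   # n * Sigma_hat

z0 = np.array([1.0, 65.0, 71.0])
center = beta_hat.T @ z0
lev = z0 @ np.linalg.solve(Z.T @ Z, z0)         # z0'(Z'Z)^{-1} z0

F_crit = 19.00                                  # F_{2,2}(0.05), from tables
scale = m * (n - r - 1) / (n - r - m) * F_crit

ci_half = np.sqrt(scale * lev * np.diag(E) / (n - r - 1))        # mean-value CI
pi_half = np.sqrt(scale * (1 + lev) * np.diag(E) / (n - r - 1))  # prediction interval

# the prediction interval is always wider, due to the extra "1 +" term
assert np.all(pi_half > ci_half)
```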