
Chapter 4. Multivariate Models

The primary purpose of this chapter is to introduce some basic ideas from multivariate statistical analysis. Quite often, experiments produce data where measurements were obtained on more than one variable, hence the name: multivariate. In the Swiss head dimension example (Flury, 1997), in order to determine well-fitting masks, several different head-dimension measurements were obtained on the soldiers. In the

next chapter on regression analysis, we will examine models that are defined in terms

of several parameters. In order to properly understand the estimation of these model

parameters, a foundation in multivariate statistics is needed. In particular, we need

to understand concepts such as covariances and correlations between variables and

estimators. An advantage of the multivariate approach is to allow for designs of experiments where the resulting parameter estimators will be uncorrelated, thus making

it easier to interpret results.

The probabilistic background for multivariate statistics requires multiple integration

ideas as seen in some of the formulas below. However, this chapter does not require

multiple integration computations. We shall be concerned instead with statistical

estimation computations which require simple (but tedious) arithmetic and some

elementary matrix algebra. Fortunately, these computations can be done very easily

on the computer. Data, particularly multivariate data, comes in the form of arrays of

numbers and hence matrix algebra techniques are the natural way of handling such

data. The appendix to this chapter contains a short review of some matrix algebra

in case the reader needs to brush up on these ideas.

Suppose we are interested in two variables. For instance, in the Swiss head dimension

data, let Y1 = MFB (Minimal frontal breadth, or forehead width) and let Y2 = BAM (Breadth of angulus mandibulae, or chin width). Data that consists of measurements

on two different variables is called bivariate data (similarly, data collected on three

variables is called trivariate and so on). We can define a joint probability density

function f (y1 , y2 ) that satisfies the following properties which mirror the properties

satisfied by the (univariate) pdf:

1. $f(y_1, y_2) \ge 0$ for all $(y_1, y_2)$.

2. The total volume under the pdf must be 1:

$$\int\!\!\int f(y_1, y_2)\, dy_1\, dy_2 = 1.$$

3. The probability that $(Y_1, Y_2)$ falls in a region $A$ is the volume under the pdf over $A$:

$$P((Y_1, Y_2) \in A) = \int\!\!\int_A f(y_1, y_2)\, dy_1\, dy_2.$$

Definition. The marginal pdf of Y1, denoted $f_1(y_1)$, is just the pdf of the random variable Y1 considered alone. To determine the marginal pdf, we integrate out y2 in the joint pdf:

$$f_1(y_1) = \int f(y_1, y_2)\, dy_2.$$

Our focus here is not so much on computing probabilities using multiple integration.

Instead, we will focus on statistical measures of association between variables.

2 Covariance

Let Y1 and Y2 be two jointly distributed random variables with means $\mu_1$ and $\mu_2$ respectively and variances $\sigma_1^2$ and $\sigma_2^2$. A common measure of association between Y1 and Y2 is the covariance, denoted $\sigma_{12}$:

$$\sigma_{12} = \int\!\!\int (y_1 - \mu_1)(y_2 - \mu_2) f(y_1, y_2)\, dy_1\, dy_2.$$

A positive covariance indicates that if Y1 is above its average ($Y_1 - \mu_1 > 0$), then Y2 tends to be above its average ($Y_2 - \mu_2 > 0$), so that $(Y_1 - \mu_1)(Y_2 - \mu_2)$ tends to be positive; also, if Y1 is below average, then Y2 tends to be below average as well, whereby $(Y_1 - \mu_1)(Y_2 - \mu_2)$ is a negative times a negative, resulting in a positive value. Conversely, a negative covariance indicates that if Y1 tends to be small, then Y2 tends to be large, and vice versa.

To illustrate, if Y1 is a measure of a person's height and Y2 is a measure of their weight,

then these two variables tend to be associated. In particular, the covariance between

them is usually positive since taller people tend to weigh more and shorter people

tend to weigh less. On the other hand, if Y1 is the hours of training a technician

receives for learning to operate a new machine and Y2 represents the number of

errors the technician makes using the machine, then we would expect to see fewer

errors corresponding with more training and hence Y1 and Y2 would have a negative

covariance.
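To make the sign of the covariance concrete, here is a small Matlab sketch (the sample size and the height/weight numbers are made-up illustration values, not data from the text) that simulates positively associated pairs and checks the sign of their sample covariance:

% Sketch: simulate 100 (height, weight)-like pairs with a positive
% association; all numbers here are hypothetical illustration values.
n = 100;
height = 170 + 10*randn(n,1);                     % heights in cm
weight = 70 + 0.9*(height - 170) + 5*randn(n,1);  % weights tied to height
C = cov(height, weight);                          % 2 x 2 sample covariance matrix
C(1,2)   % off-diagonal entry: positive, reflecting the association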

An important situation where the covariance matters is when considering differences of jointly distributed random variables, $Y_1 - Y_2$. For instance, we will later discuss experiments looking at paired differences in situations where we may want to compare two different experimental conditions. The statistical analysis requires that we know the variance of the difference, $\text{var}(Y_1 - Y_2)$. There are two extreme cases:

$$Y_1 = Y_2: \quad \text{var}(Y_1 - Y_2) = \text{var}(0) = 0$$

$$\sigma_{12} = 0: \quad \text{var}(Y_1 - Y_2) = \text{var}(Y_1) + \text{var}(Y_2)$$

These two extremes are special cases of the following formula, which holds in all cases:

$$\text{var}(Y_1 - Y_2) = \text{var}(Y_1) + \text{var}(Y_2) - 2\sigma_{12}.$$
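A quick numerical check of this formula (a sketch; the simulation settings are made up) compares the directly computed sample variance of Y1 − Y2 with the right-hand side built from the sample variances and covariance:

% Sketch: check var(y1 - y2) = var(y1) + var(y2) - 2*cov(y1,y2)
% on simulated correlated data (settings are hypothetical).
n = 10000;
y1 = randn(n,1);
y2 = 0.6*y1 + 0.8*randn(n,1);   % correlated with y1
C = cov(y1, y2);
lhs = var(y1 - y2);
rhs = var(y1) + var(y2) - 2*C(1,2);
[lhs rhs]   % the two values agree (exactly, for sample quantities)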

3 Correlation

We can transform the covariance to obtain a well-known measure of association known as the correlation, which is denoted by the Greek letter $\rho$ (rho):

$$\text{Correlation:} \quad \rho = \frac{\sigma_{12}}{\sigma_1 \sigma_2},$$

where $\sigma_1$ and $\sigma_2$ are the standard deviations of Y1 and Y2 respectively. The correlation satisfies two important properties:

1. $-1 \le \rho \le 1$.

2. $\rho = \pm 1$ if and only if Y1 and Y2 are perfectly linearly related; that is, there exist constants a and b so that $Y_2 = a + bY_1$.

Property (1) highlights the fact that the correlation is a unitless quantity. Property (2) highlights the fact that the correlation is a measure of the strength of the linear relation between Y1 and Y2. A perfect linear relation produces a correlation of $1$ or $-1$. A correlation of zero indicates no linear relation between the two random variables. Figure 1 shows scatterplots of data obtained from bivariate distributions with different correlations. The distribution for the top-left panel had a correlation of $\rho = 0.95$. The plot shows a strong positive relation between Y1 and Y2, with the points tightly clustered together in a linear pattern. The correlation for the top-right panel is also positive with $\rho = 0.50$, and again we see a positive relation between the two variables, but not as strong as in the top-left panel. The bottom-left panel corresponds to a correlation of $\rho = 0$ and consequently we see no relationship evident between Y1 and Y2 in this plot. Finally, the bottom-right panel shows a negative linear relation with a correlation of $\rho = -0.50$.
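Scatterplots like those in Figure 1 are easy to sketch in Matlab. The construction below (with rho = 0.95 chosen to mirror the top-left panel) uses the fact that if Y1 and Z are independent standard normals, then $\rho Y_1 + \sqrt{1-\rho^2}\, Z$ has correlation $\rho$ with Y1:

% Sketch: simulate bivariate data with a specified correlation rho.
rho = 0.95;
n = 200;
y1 = randn(n,1);
y2 = rho*y1 + sqrt(1 - rho^2)*randn(n,1);  % corr(y1,y2) is about rho
plot(y1, y2, '*')
corr(y1, y2)   % sample correlation, close to 0.95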

A note of caution is in order: two variables Y1 and Y2 can be strongly related, but the relation may be nonlinear, in which case the correlation may not be a reasonable measure of association. Figure 2 shows data obtained from such a bivariate distribution. There is clearly a very strong relation between y1 and y2, but the relation is nonlinear. The correlation is not an appropriate measure of association for this data. In fact, the correlation is nearly zero. To say y1 and y2 are unrelated because they are uncorrelated can be misleading if the relation is nonlinear. This is an error that is quite commonly made in everyday usage of the term correlation.

Figure 1: Scatterplots of data from bivariate distributions with different correlations.

Figure 2: A strong nonlinear relation between y1 and y2; the correlation is nearly zero.

Caution: Another very common error made in practice is to assume that because

two variables are highly correlated, one causes the other. Sometimes this will indeed

be the case (e.g. more fertilizer leads to taller plants and hence a positive correlation.)

In other cases, the causation conclusion is silly. For example, do a survey of fires in

a large city and note Y1 , the dollar amount of fire damage, and also Y2 , the number

of fire-fighters called in to fight the fire. Will Y1 and Y2 be positively or negatively

correlated? Does sending more fire fighters to a fire cause more fire damage? Or,

could the association be due to something else?

Below is some Matlab code for obtaining plots and statistics for the multivariate Swiss head dimension data:

% Swiss head dimension data (Flury, 1997): head measurements of soldiers,
% collected to design well-fitting gas masks. 6 measurements were taken
% on each soldier (facial height, width, etc.)
load swiss.dat;
mfb = swiss(:,1);   % Minimal frontal breadth (forehead width)
bam = swiss(:,2);   % Breadth of angulus mandibulae (chin width)
tfh = swiss(:,3);   % True facial height
lgan = swiss(:,4);  % Length from glabella to apex nasi (tip of nose to top of forehead)
ltn = swiss(:,5);   % Length from tragion to nasion (top of nose to ear)
ltg = swiss(:,6);   % Length from tragion to gnathion (bottom of chin to ear)
plot(mfb, bam, '*')
title('Swiss Head Data')
xlabel('Forehead Width')
ylabel('Chin Width')
corr(swiss)         % Compute the sample correlation matrix

Note that to access a particular variable (i.e. a column of the data set called swiss), we write swiss(:,1) for column 1, and so on.

For higher dimensional data, it is helpful to employ matrix notation. Suppose we have p jointly distributed random variables $Y_1, Y_2, \ldots, Y_p$ with means $\mu_1, \mu_2, \ldots, \mu_p$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_p^2$. For instance, in the Swiss head dimension example, there were p = 6 head dimension variables recorded for each soldier. We can let the random vector Y and its mean vector $\mu$ be written as

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_p \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix}.$$

When we have more than two variables, we can compute covariances between each pair of variables. These covariances are collected together in a p × p matrix called the covariance matrix. The diagonal elements of a covariance matrix correspond to the variances of the random variables. The i-jth element of the covariance matrix is the covariance between $Y_i$ and $Y_j$. The covariance matrix is a symmetric matrix because the covariance between $Y_i$ and $Y_j$ is the same as the covariance between $Y_j$ and $Y_i$. To illustrate, suppose we have a trivariate distribution for Y1, Y2 and Y3. Let $\sigma_{12} = \text{cov}(Y_1, Y_2)$, $\sigma_{13} = \text{cov}(Y_1, Y_3)$, and $\sigma_{23} = \text{cov}(Y_2, Y_3)$. Then the covariance matrix, denoted by $\Sigma$, is

$$\text{Covariance Matrix:} \quad \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2 \end{pmatrix}.$$

In matrix notation, the covariance matrix can be written compactly as

$$\Sigma = E[(Y - \mu)(Y - \mu)'].$$

When we take the expected value of a random vector or a random matrix, we compute

the expected value of each term individually. For example,

$$E[Y] = \begin{pmatrix} E[Y_1] \\ E[Y_2] \\ \vdots \\ E[Y_p] \end{pmatrix}.$$

To illustrate for p = 2,

$$(Y - \mu)(Y - \mu)' = \begin{pmatrix} Y_1 - \mu_1 \\ Y_2 - \mu_2 \end{pmatrix} \begin{pmatrix} Y_1 - \mu_1 & Y_2 - \mu_2 \end{pmatrix} = \begin{pmatrix} (Y_1 - \mu_1)^2 & (Y_1 - \mu_1)(Y_2 - \mu_2) \\ (Y_1 - \mu_1)(Y_2 - \mu_2) & (Y_2 - \mu_2)^2 \end{pmatrix}.$$

Therefore,

$$E[(Y - \mu)(Y - \mu)'] = \begin{pmatrix} E[(Y_1 - \mu_1)^2] & E[(Y_1 - \mu_1)(Y_2 - \mu_2)] \\ E[(Y_1 - \mu_1)(Y_2 - \mu_2)] & E[(Y_2 - \mu_2)^2] \end{pmatrix}.$$

Of course, the population covariances (e.g. $\sigma_{12}$) and the population correlations are typically unknown population parameters which must be estimated from the data.

Generally, multivariate data sets are organized so that each row corresponds to a new

p-dimensional observation and each column corresponds to the measurement on one

of the p variables. In other words, the data usually comes in the form of n rows for the

sample size and p columns for the p measured variables. For a p-dimensional data set, let $y_{i1}$ equal the ith observation on the first variable, $y_{i2}$ the ith observation on the second variable, and so on, for $i = 1, 2, \ldots, n$. The sample covariance between variables 1 and 2, denoted $s_{12}$, is

$$s_{12} = \sum_{i=1}^{n} (y_{i1} - \bar{y}_1)(y_{i2} - \bar{y}_2)/(n-1), \tag{2}$$

where $\bar{y}_1$ and $\bar{y}_2$ are the sample means of the first and second variables respectively.

We can estimate the covariance matrix $\Sigma$ by replacing the population variances and covariances by their respective estimators; the result is called the sample covariance matrix and is generally denoted by S.
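As a sketch of how formula (2) maps to code (dat below is a hypothetical n × p data matrix, not one of the data sets in the text), the sample covariance between the first two columns can be computed directly and compared with Matlab's built-in cov command:

% Sketch: compute s12 from formula (2) by hand and compare with cov.
dat = randn(50, 3);     % placeholder data matrix for illustration
n = size(dat, 1);
ybar1 = mean(dat(:,1));
ybar2 = mean(dat(:,2));
s12 = sum((dat(:,1) - ybar1).*(dat(:,2) - ybar2))/(n - 1);
S = cov(dat);           % sample covariance matrix
[s12 S(1,2)]            % the two values agree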

Example. Consider once again the Swiss head dimension data consisting of p = 6

head measurements. Denote these measurements by

Y1 = MFB = Minimal frontal breadth (forehead width)

Y2 = BAM = Breadth of angulus mandibulae (chin width)

Y3 = TFH = True facial height

Y4 = LGAN = Length from glabella to apex nasi (tip of nose to top of forehead)

Y5 = LTN = length from tragion to nasion (top of nose to ear)

Y6 = LTG = Length from tragion to gnathion (bottom of chin to ear).

To give an indication of what the data looks like, below is a list of the first 20 observations:

MFB    BAM    TFH    LGAN   LTN    LTG
113.2 111.7 119.6 53.9 127.4 143.6

117.6 117.3 121.2 47.7 124.7 143.9

112.3 124.7 131.6 56.7 123.4 149.3

116.2 110.5 114.2 57.9 121.6 140.9

112.9 111.3 114.3 51.5 119.9 133.5

104.2 114.3 116.5 49.9 122.9 136.7

110.7 116.9 128.5 56.8 118.1 134.7

105.0 119.2 121.1 52.2 117.3 131.4

115.9 118.5 120.4 60.2 123.0 146.8

96.8 108.4 109.5 51.9 120.1 132.2

110.7 117.5 115.4 55.2 125.0 140.6

108.4 113.7 122.2 56.2 124.5 146.3

104.1 116.0 124.3 49.8 121.8 138.1

107.9 115.2 129.4 62.2 121.6 137.9

106.4 109.0 114.9 56.8 120.1 129.5

112.7 118.0 117.4 53.0 128.3 141.6

109.9 105.2 122.2 56.6 122.2 137.8

116.6 119.5 130.6 53.0 124.0 135.3

109.9 113.5 125.7 62.8 122.7 139.5

107.1 110.7 121.7 52.1 118.6 141.6

To get a better feel for the data, Figure 3 shows scatterplots of each pair of variables.

The sample mean vector for the entire data set is given by

$$\bar{y} = \begin{pmatrix} \bar{y}_1 \\ \bar{y}_2 \\ \vdots \\ \bar{y}_6 \end{pmatrix} = \begin{pmatrix} 114.7245 \\ 115.9140 \\ 123.0550 \\ 57.9885 \\ 122.2340 \\ 138.8335 \end{pmatrix},$$

and the sample covariance matrix is

$$S = \begin{pmatrix} 26.9012 & 12.6229 & 5.3834 & 2.9313 & 8.1767 & 12.1073 \\ 12.6229 & 27.2522 & 2.8805 & 2.0575 & 7.1255 & 11.4412 \\ 5.3834 & 2.8805 & 35.2300 & 10.3692 & 6.0275 & 7.9725 \\ 2.9313 & 2.0575 & 10.3692 & 17.8453 & 2.9194 & 4.9936 \\ 8.1767 & 7.1255 & 6.0275 & 2.9194 & 15.3702 & 14.5213 \\ 12.1073 & 11.4412 & 7.9725 & 4.9936 & 14.5213 & 31.8369 \end{pmatrix}.$$

Matlab can compute these statistics easily: use the cov command to get the sample covariance matrix. Note that the covariances between the six head measurements are all positive. It is quite common to see all positive covariances on data of this sort: for example, people with larger than average forehead widths tend to also have larger than average chin widths, and so on.


Figure 3: Scatterplot matrix of each pair of variables in the Swiss head data. Note

that most pairs of variables are positively correlated.

The sample correlations, typically denoted by r, are the sample counterparts of the population correlation. For instance,

$$r_{12} = \frac{s_{12}}{s_1 s_2}. \tag{3}$$

We can collect the sample correlations together into a correlation matrix, denoted by R, where the i-jth element of the matrix is $r_{ij}$, the sample correlation between the ith and the jth variables. Note that the correlation of a random variable with itself is always 1 (the same goes for sample correlations). Therefore, correlation matrices always have ones down the diagonal. For the Swiss head dimension data, the sample correlation matrix is

$$R = \begin{pmatrix} 1.0000 & 0.4662 & 0.1749 & 0.1338 & 0.4021 & 0.4137 \\ 0.4662 & 1.0000 & 0.0930 & 0.0933 & 0.3482 & 0.3884 \\ 0.1749 & 0.0930 & 1.0000 & 0.4135 & 0.2590 & 0.2381 \\ 0.1338 & 0.0933 & 0.4135 & 1.0000 & 0.1763 & 0.2095 \\ 0.4021 & 0.3482 & 0.2590 & 0.1763 & 1.0000 & 0.6564 \\ 0.4137 & 0.3884 & 0.2381 & 0.2095 & 0.6564 & 1.0000 \end{pmatrix}.$$

Note that the highest correlation, $r_{56} = 0.6564$, is between LTN and LTG, the distance from the top of the nose to the ear and the distance from the bottom of the chin to the ear. The correlation between chin width (BAM) and facial height (TFH) is relatively small ($r_{23} = 0.0930$). The correlation between chin width and the length from the glabella to the apex nasi (LGAN) is also relatively small ($r_{24} = 0.0933$). Looking at Figure 3, one can see a weak association between BAM and TFH and a strong association between LTN and LTG.
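The correlation matrix R can also be obtained from S by applying (3) elementwise; a minimal Matlab sketch (assuming S already holds a sample covariance matrix) is:

% Sketch: convert a covariance matrix S into a correlation matrix R
% by dividing each entry s_ij by s_i*s_j, as in formula (3).
d = sqrt(diag(S));   % vector of sample standard deviations
R = S ./ (d*d')      % elementwise division by the products s_i*s_j
% (the Statistics Toolbox function corrcov(S) computes the same thing)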

4 The Multivariate Normal Distribution

Recall that a normal random variable Y with mean $\mu$ and variance $\sigma^2$ has a probability density function (pdf) of

$$f(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2}(y - \mu)^2 \right\},$$

for $-\infty < y < \infty$. It is easy to generalize this univariate pdf to a multivariate normal pdf. Let $Y = (Y_1, Y_2, \ldots, Y_p)'$ denote a multivariate normal random vector with mean vector $\mu = (\mu_1, \mu_2, \ldots, \mu_p)'$ and covariance matrix $\Sigma$. To obtain the multivariate density function, we replace $(y - \mu)^2/\sigma^2$ in the exponent by

$$(y - \mu)' \Sigma^{-1} (y - \mu),$$

and we replace the $1/\sigma$ scalar by the determinant of $\Sigma$ raised to the $-1/2$ power: $|\Sigma|^{-1/2}$. The p-dimensional normal pdf can be written

$$f(y_1, y_2, \ldots, y_p) = \left(\frac{1}{2\pi}\right)^{p/2} |\Sigma|^{-1/2} \exp\left\{ -\frac{1}{2}(y - \mu)' \Sigma^{-1} (y - \mu) \right\}, \tag{4}$$

for $y \in \mathbb{R}^p$.
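Formula (4) translates directly into matrix operations. Below is a minimal Matlab sketch evaluating the density at a point, using the LTN/LTG sample means and the corresponding block of S from the Swiss example as illustrative values for mu and Sigma (the evaluation point y is made up):

% Sketch: evaluate the p-dimensional normal pdf (4) at a point y.
mu = [122.2340; 138.8335];                   % LTN and LTG sample means
Sigma = [15.3702 14.5213; 14.5213 31.8369];  % corresponding block of S
y = [120; 135];                              % hypothetical evaluation point
p = length(mu);
q = (y - mu)' / Sigma * (y - mu);   % quadratic form (y-mu)' inv(Sigma) (y-mu)
f = (2*pi)^(-p/2) * det(Sigma)^(-1/2) * exp(-q/2)
% mvnpdf(y', mu', Sigma) from the Statistics Toolbox gives the same value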

It is informative to note that if we set the expression $(y - \mu)' \Sigma^{-1} (y - \mu)$ in the exponent of the multivariate normal density equal to a constant, the resulting equation defines an ellipsoid in p dimensions. These elliptical contour patterns are used for forming multivariate confidence regions and multivariate critical regions for hypothesis testing.

Figure 4: The bivariate normal pdf (4) for the Swiss head dimension variables LTN and LTG.

Introductory textbooks typically refrain from using matrix notation when expressing

the multivariate normal pdf given in (4). However (4) is fairly easy to write down

in matrix notation compared to what one would get writing it down without matrix

notation. For example, for p = 2 dimensions, we can write out the bivariate normal pdf as

$$f(y_1, y_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{y_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{y_1-\mu_1}{\sigma_1}\right)\left(\frac{y_2-\mu_2}{\sigma_2}\right) + \left(\frac{y_2-\mu_2}{\sigma_2}\right)^2 \right] \right\},$$

for $-\infty < y_1 < \infty$ and $-\infty < y_2 < \infty$. This looks quite complicated. The expression

for p = 3 or more dimensions becomes even more of a mess to write out without

matrix notation but (4) stays the same regardless of the dimension.

To get an idea of what a multivariate normal pdf looks like, Figure 4 shows a bivariate

normal pdf for the LTN and LTG variables from the Swiss head dimension data. The

bivariate normal pdf looks like a mountain centered over the mean of the distribution.

In order to compute probabilities using the pdf, one needs to compute the volume

under the pdf surface corresponding to the region of interest.


5 Confidence Regions

Confidence intervals were introduced for estimating a single parameter such as the

mean of a distribution. In the multivariate setting, we can similarly define confidence

regions for vectors of parameters, such as the mean vector $\mu$. To illustrate matters, we shall consider two of the Swiss head dimension variables: LTN and LTG. Since they correspond to the 5th and 6th variables, the mean vector of interest is $\mu = (\mu_5, \mu_6)'$.

There are two approaches. One method is to simply compute two univariate confidence intervals separately for $\mu_1$ and $\mu_2$ and form the Cartesian product of the two intervals to obtain a confidence rectangle. However, if we compute, say, 95% confidence intervals for $\mu_1$ and $\mu_2$, then the joint confidence region (the rectangle) has a lower

confidence level. To understand why, consider an analogy: suppose there is a 5% chance I'll get a speeding ticket on a given day when I drive to work. Then the probability I get at least one ticket during the year is certainly higher than 5%. Similarly, if there is a 5% probability that each random interval does not contain its respective mean, then the probability that at least one of the intervals does not contain its respective mean is higher than 5%. A simple (but not always efficient) fix to this problem is to

use what is known as the Bonferroni adjustment. If you form p confidence intervals for p parameters and want a confidence level of $1 - \alpha$ for the family of parameters, then one can compute a confidence interval for each parameter separately using a confidence level of $1 - \alpha/p$ to guarantee that the confidence level is at least $1 - \alpha$ for all p intervals considered jointly.
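As a sketch of the Bonferroni adjustment in Matlab (dat is a hypothetical n × p data matrix; tinv is the Student-t quantile function), each of the p intervals is computed at level 1 − α/p, splitting α/p over the two tails:

% Sketch: Bonferroni-adjusted t confidence intervals for p means,
% each at level 1 - alpha/p so the family level is at least 1 - alpha.
dat = randn(30, 2);                  % placeholder n x p data matrix
[n, p] = size(dat);
alpha = 0.05;
tcrit = tinv(1 - alpha/(2*p), n - 1);  % alpha/p split over two tails
m = mean(dat)';                      % column means
se = std(dat)'/sqrt(n);              % standard errors of the means
CI = [m - tcrit*se, m + tcrit*se]    % one row per parameter: [lower upper]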

A more efficient approach for estimating a mean vector is to incorporate the correlations between the estimated parameters. For multivariate normal data, the resulting

confidence regions have ellipsoidal shapes. Instead of determining a random interval

that covers a mean with high probability, we want to determine a region that covers

the vector with high probability.

The solution to this problem requires introducing another probability distribution

known as the F-distribution, which results when we look at statistics formed by ratios of variance estimates. The F-distribution is used extensively in analysis of variance (ANOVA) applications where we want to compare several means. Because an F random variable is defined in terms of a ratio of variance estimators, and variance estimators depend on degrees of freedom, the F-distribution is specified by a numerator and a denominator degrees of freedom. The F-distribution is skewed to the right and takes only positive values. Critical values for the F-distribution can be found beginning on page 202 in the Appendix. Let $F_{p,n-p}(\alpha)$ denote the upper-$\alpha$ critical value of an F-distribution on p numerator degrees of freedom and n − p denominator degrees of freedom.

Returning to the confidence region problem, one can show (e.g., Johnson and Wichern, 1998, page 179) that for a sample of size n from a p-dimensional normal distribution,

$$P\left( n(\bar{Y} - \mu)' S^{-1} (\bar{Y} - \mu) \le \frac{(n-1)p}{n-p}\, F_{p,n-p}(\alpha) \right) = 1 - \alpha.$$

This statement shows that a $(1-\alpha)100\%$ confidence region for the mean of a p-dimensional normal distribution is given by the set of $\mu \in \mathbb{R}^p$ that satisfy the inequality

$$n(\bar{y} - \mu)' S^{-1} (\bar{y} - \mu) \le \frac{(n-1)p}{n-p}\, F_{p,n-p}(\alpha).$$

Figure 5: A 95% confidence ellipse for the Swiss head dimension data using only the variables LTN and LTG.

The inequality defines a p-dimensional ellipsoid centered at $\bar{y}$. To determine if a hypothesized value of $\mu$ lies in this region, simply plug it into the expression and see if the inequality is satisfied or not. Figure 5 shows a 95% confidence ellipse for the Swiss head data for variables Y5 = LTN and Y6 = LTG.
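Checking whether a hypothesized mean vector lies inside the region is a direct translation of the inequality. The sketch below assumes the swiss data matrix from the earlier code is loaded, takes its LTN and LTG columns, and uses finv for the F critical value; the value of mu0 is a made-up hypothesis:

% Sketch: test whether a hypothesized mean mu0 lies inside the
% (1-alpha)100% confidence ellipse for a p-variate normal mean.
dat = swiss(:, 5:6);   % LTN and LTG columns (assumes swiss is loaded)
[n, p] = size(dat);
alpha = 0.05;
ybar = mean(dat)';
S = cov(dat);
mu0 = [122; 139];      % hypothetical mean vector to test
T2 = n * (ybar - mu0)' / S * (ybar - mu0);          % left side of the inequality
crit = (n-1)*p/(n-p) * finv(1 - alpha, p, n - p);   % right side
inside = (T2 <= crit)  % 1 if mu0 lies inside the ellipse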

Multivariate statistics is a broad field of statistics and we have only introduced some

of the most basic ideas. Additional topics in multivariate analysis (such as principal

component analysis, discriminant analysis, cluster analysis, canonical correlations,

MANOVA) take the correlations between variables into consideration to solve various

problems.

Problems

1. Data on felled black cherry trees was collected (Ryan et al., 1976). The measured

variables were the diameter (in inches measured from 4.5 feet above the ground),

the height (measured in feet) and the volume (measured in cubic feet). The full

data set appears in the following table:

Diameter (x1)   Height (x2)   Volume (x3)   (xi1 − x̄1)   (xi2 − x̄2)   (xi1 − x̄1)(xi2 − x̄2)

11.0 75 18.2

11.1 80 22.6

11.2 75 19.9

11.3 79 24.2

11.4 76 21.0

11.4 76 21.4

11.7 69 21.3

12.0 75 19.1

12.9 74 22.2

12.9 85 33.8

13.3 86 27.4

13.7 71 25.7

13.8 64 24.9

14.0 78 34.5

14.2 80 31.7

14.5 74 36.3

16.0 72 38.3

16.3 77 42.6

17.3 81 55.4

17.5 82 55.7

17.9 80 58.3

18.0 80 51.5

18.0 80 51.0

20.6 87 77.0

a) Before analyzing the data, do you expect the correlations between these

three variables to be negative, positive or zero? A scatterplot matrix of

the data is plotted in Figure 6.

b) Since computing the sample covariance by hand is tedious, we shall attempt to get a feel for the covariance between x1, the diameter, and x2, the height. The sample means for the three variables are

$\bar{x}_1 = 13.248$, $\bar{x}_2 = 76.00$, and $\bar{x}_3 = 30.17$. In the table above, put a + in the column $(x_{i1} - \bar{x}_1)$ if the ith diameter is higher than the average diameter and put a − if the ith diameter is lower than the mean value. Do the same thing for the heights in the column labelled $(x_{i2} - \bar{x}_2)$. If both these differences are positive, or they are both negative, put a + in the column labelled $(x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2)$, and a − otherwise. To illustrate, here is how to do this for the first row:

Diameter (x1)   Height (x2)   Volume (x3)   (xi1 − x̄1)   (xi2 − x̄2)   (xi1 − x̄1)(xi2 − x̄2)
11.0            75            18.2          −             −             +

The sample covariance is basically the average of the products $(x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2)$. From the list of +'s and −'s, does it appear the covariance will be positive or negative?

c) The sample covariance matrix for the entire data set is given by

$$S = \begin{pmatrix} 9.85 & 10.38 & 49.89 \\ 10.38 & 40.60 & 62.66 \\ 49.89 & 62.66 & 270.20 \end{pmatrix}.$$

Compute the sample correlation matrix (using (3)) from the covariance

matrix.

d) The purpose of this study was to predict the volume of wood of the tree

using the diameter and/or height. If you had to choose one of the variables

(height or diameter) for predicting the volume of the tree, which would you choose from a purely statistical point of view? (Note that for trees that have not been cut, it would be much more difficult to measure the height than the diameter.) What was the basis for your choice?

e) If we convert the diameter measurements from units of inches to feet,

then we would need to divide each diameter measurement by 12. Let $x^*_{i1} = x_{i1}/12$ denote the diameter measurements in units of feet. Compute the sample variance of the $x^*_{i1}$ measurements. Also, compute the sample

correlation between the diameter (in feet) and the height of the cherry

trees.

Appendix: Matrix Algebra

In this appendix we give a brief review of some of the basics of matrix algebra. A

matrix is simply an array of numbers. Let n denote the number of rows and p denote

the number of columns in an array. Matrices are denoted by boldface letters. For

example, let A denote a matrix with n = 3 rows and p = 2 columns. Then we say that A is an n × p matrix; in this case, A is a 3 × 2 matrix. A special case of a matrix is a vector, which is simply a matrix with a single column (a column vector) or a single row (a row vector). By convention, whenever we denote a vector, we shall assume it is a column vector. One can regard an n × 1 column vector as a point in n-dimensional Euclidean space.

To illustrate matters, let x denote a 3 × 1 column vector and A denote a 3 × 2 matrix defined as follows:

$$x = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \qquad A = \begin{pmatrix} 2 & 3 \\ 4 & 5 \\ 1 & 6 \end{pmatrix}.$$

We can perform operations on vectors and matrices such as summation, subtraction, and multiplication.

The transpose of a matrix is obtained by interchanging its rows and columns, and is denoted by a prime: $A'$ is the transpose of A. Thus,

$$x' = (1, 2, 3)$$

and

$$A' = \begin{pmatrix} 2 & 4 & 1 \\ 3 & 5 & 6 \end{pmatrix}.$$


Figure 6: Scatterplots of the black cherry tree data. Here, Girth = diameter.

To multiply a matrix by a number (i.e. a scalar), one just multiplies each element of the matrix by the scalar. For instance, if c = 2 then

$$cA = 2\begin{pmatrix} 2 & 3 \\ 4 & 5 \\ 1 & 6 \end{pmatrix} = \begin{pmatrix} 4 & 6 \\ 8 & 10 \\ 2 & 12 \end{pmatrix}.$$

In order to add two matrices together, they must both be of the same dimensions in

which case you just add the corresponding components together (or subtract if you

are subtracting matrices). We cannot add the vector x to the matrix A because they

are not of the same dimension. However, if

$$y = \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix},$$

then

$$x + y = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix} = \begin{pmatrix} 5 \\ 7 \\ 9 \end{pmatrix}.$$

Let

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{pmatrix},$$

then

$$A + B = \begin{pmatrix} a_{11}+b_{11} & a_{12}+b_{12} \\ a_{21}+b_{21} & a_{22}+b_{22} \\ a_{31}+b_{31} & a_{32}+b_{32} \end{pmatrix}.$$

Note that the ijth entry in the matrix A for the ith row and the jth column is denoted

aij . Thus, the first index specifies the row number and the second index specifies the

column number.

Next we consider how to multiply two matrices together. First of all, matrix multiplication is not commutative, as we shall see. If A and B are matrices and we want to form the product AB, then the number of columns of A must match the number of rows of B. Suppose A has dimension n × p and B has dimension p × q; then the product AB will have dimension n × q.

To illustrate, let us first compute the product of two vectors a and b, say, where $a = (a_{11}, a_{12}, a_{13})$ and

$$b = \begin{pmatrix} b_{11} \\ b_{21} \\ b_{31} \end{pmatrix}.$$

Since a is a 1 × 3 row vector and b is a 3 × 1 column vector, we can form the product ab since the number of columns of a equals the number of rows of b. The product ab is defined as

$$ab = (a_{11}, a_{12}, a_{13}) \begin{pmatrix} b_{11} \\ b_{21} \\ b_{31} \end{pmatrix} = a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31}.$$

Now consider the product of two matrices A and B. Think of each row of A as a row vector and each column of B as a column vector. Then the ijth element of the product AB is defined to be the product of the ith row of A times the jth column of B. To illustrate, let A denote a 3 × 2 matrix and B denote a 2 × 4 matrix:

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} b_{11} & b_{12} & b_{13} & b_{14} \\ b_{21} & b_{22} & b_{23} & b_{24} \end{pmatrix},$$

then AB is a 3 × 4 matrix computed as

$$AB = \begin{pmatrix} a_{11}b_{11}+a_{12}b_{21} & a_{11}b_{12}+a_{12}b_{22} & a_{11}b_{13}+a_{12}b_{23} & a_{11}b_{14}+a_{12}b_{24} \\ a_{21}b_{11}+a_{22}b_{21} & a_{21}b_{12}+a_{22}b_{22} & a_{21}b_{13}+a_{22}b_{23} & a_{21}b_{14}+a_{22}b_{24} \\ a_{31}b_{11}+a_{32}b_{21} & a_{31}b_{12}+a_{32}b_{22} & a_{31}b_{13}+a_{32}b_{23} & a_{31}b_{14}+a_{32}b_{24} \end{pmatrix}.$$

In this example, we cannot form the product BA since the number of columns of B

does not match the number of rows of A.

Consider again the multiplication of the two vectors $a = (a_{11}, a_{12}, a_{13})$ and

$$b = \begin{pmatrix} b_{11} \\ b_{21} \\ b_{31} \end{pmatrix}.$$

We saw how to compute the product ab. Note that we can also form the product ba since b is a 3 × 1 column vector and a is a 1 × 3 row vector, i.e. the number of columns of b matches the number of rows of a, and the product will be a matrix of dimension 3 × 3:

$$ba = \begin{pmatrix} b_{11} \\ b_{21} \\ b_{31} \end{pmatrix} (a_{11}, a_{12}, a_{13}) = \begin{pmatrix} b_{11}a_{11} & b_{11}a_{12} & b_{11}a_{13} \\ b_{21}a_{11} & b_{21}a_{12} & b_{21}a_{13} \\ b_{31}a_{11} & b_{31}a_{12} & b_{31}a_{13} \end{pmatrix}.$$
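The same distinction shows up directly in Matlab; a small sketch with made-up entries:

% Sketch: the inner product a*b is a scalar; the outer product b*a is 3 x 3.
a = [1 2 3];    % 1 x 3 row vector
b = [4; 5; 6];  % 3 x 1 column vector
a*b             % scalar: 1*4 + 2*5 + 3*6 = 32
b*a             % 3 x 3 matrix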

A square matrix is a matrix with the same number of rows as columns.

A diagonal matrix is a matrix with all zeros except along the main diagonal.

An important special case of a square diagonal matrix is the identity matrix, denoted by I. The identity matrix is a square matrix whose diagonal elements are ones and the off-diagonal elements are all zero. For instance, the 3 × 3 identity matrix is

$$I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

The reason we call this the identity matrix is that it acts as the multiplicative identity

element: for any matrix A, we have

AI = A

and

IA = A

provided the matrix multiplications are defined (one can easily verify these relations).

A symmetric matrix is any matrix A such that $A = A'$, that is, A is equal to its transpose. For example,

$$A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$$

is symmetric. Covariance matrices are always symmetric.

Two (column) vectors a and b of the same dimension are orthogonal if $a'b = 0$. Geometrically speaking, if we think of a vector as an arrow extending from the origin to the point represented by the vector, then orthogonal vectors are perpendicular to each other. For example, if $a = (1, 1)'$ and $b = (1, -1)'$, then

$$a'b = (1, 1)\begin{pmatrix} 1 \\ -1 \end{pmatrix} = 1 \cdot 1 - 1 \cdot 1 = 0.$$

Figure 7 illustrates the geometric property of the orthogonal vectors.

Inverses. For a scalar such as 5, its inverse is simply $1/5 = 5^{-1}$, and $5 \cdot (1/5) = 1$. All real numbers have an inverse except zero. Let A denote a square p × p matrix. The inverse of A, if it exists, is denoted $A^{-1}$ and is the p × p matrix such that

$$AA^{-1} = A^{-1}A = I \quad \text{(the identity matrix)}.$$

In order for a matrix A to have an inverse, its columns must be linearly independent, which means that no column of A can be expressed as a linear combination of the other columns of A. Such matrices are called nonsingular. Thus a singular matrix does not have an inverse. Finding the inverse of a matrix is somewhat tedious for higher

dimensional matrices. However, for a 2 × 2 matrix, there is a simple formula. If

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix},$$

then

$$A^{-1} = \frac{1}{a_{11}a_{22} - a_{12}a_{21}} \begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}.$$
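The 2 × 2 formula is easy to check in Matlab against the built-in inv function; the sketch below uses the matrix A = [3 4; 2 5] that appears in the determinant example further on:

% Sketch: verify the 2 x 2 inverse formula against Matlab's inv.
A = [3 4; 2 5];
Ainv = (1/(A(1,1)*A(2,2) - A(1,2)*A(2,1))) * [A(2,2) -A(1,2); -A(2,1) A(1,1)];
inv(A)    % built-in inverse, matches Ainv
A*Ainv    % the 2 x 2 identity matrix (up to rounding)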

Inverse matrices are needed in order to define the multivariate normal density function and to understand multivariate distance. The inverses of diagonal matrices are easy to compute:

$$\begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{pp} \end{pmatrix}^{-1} = \begin{pmatrix} 1/a_{11} & 0 & \cdots & 0 \\ 0 & 1/a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/a_{pp} \end{pmatrix}.$$

In order for the inverse of a diagonal matrix to exist, all the diagonal elements must be nonzero. For higher dimensional non-diagonal matrices, computer software such as Matlab can be used to compute inverses of matrices.

Another matrix operation needed for the multivariate normal density is the determinant of a matrix, denoted by |A| (also denoted by det(A)). The computation of the determinant is rather tedious for square matrices of dimension higher than 3 × 3, and again, software such as Matlab can be used to compute determinants. In the case of a 2 × 2 matrix A where

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix},$$

the formula is quite simple:

$$|A| = a_{11}a_{22} - a_{12}a_{21}.$$

Thus, if

$$A = \begin{pmatrix} 3 & 4 \\ 2 & 5 \end{pmatrix},$$

then $|A| = 3 \cdot 5 - 4 \cdot 2 = 15 - 8 = 7$. One way to think of the determinant of a

matrix is to look at the two column vectors of A. If we plot these two vectors, they form two edges of a parallelogram, as seen in Figure 8. The determinant of A is the (signed) area of the parallelogram. For higher dimensional matrices, the columns of the matrix form the edges of a parallelepiped and the determinant is equal to the (signed) volume of the parallelepiped.

Figure 8: The determinant is the area of the parallelogram formed from the column vectors that make up the matrix.

The determinant of a singular matrix is zero. For instance, suppose

$$A = \begin{pmatrix} 1 & 4 \\ 2 & 8 \end{pmatrix}.$$

Then the second column of A is just 4 times the first column of A, and therefore A is singular. The determinant of A is $|A| = 8 - 8 = 0$. Since the second column of A is 4 times the first column of A, the parallelogram formed by these two columns collapses onto a line and hence the area of the resulting parallelogram is zero.
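These determinant facts are quick to confirm in Matlab:

% Sketch: determinants of a nonsingular and a singular matrix.
A = [3 4; 2 5];
det(A)            % 3*5 - 4*2 = 7
B = [1 4; 2 8];   % second column is 4 times the first
det(B)            % 0: B is singular and has no inverse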

Note that in the formula for the multivariate normal distribution, we divide by the square root of the determinant of the covariance matrix. If the determinant is zero, then the distribution

does not have a density. To understand what this means, consider a bivariate normal random vector $(Y_1, Y_2)'$. If the covariance matrix has determinant zero, then Y2 is a linear function of Y1 and the two random variables are perfectly correlated (i.e. correlation equal to ±1). In a scatterplot such as Figure 5, if Y1 and Y2 were perfectly correlated, then the points would lie exactly on a line and the

and Y2 were perfectly correlated, then the points would lie exactly in a line and the

confidence ellipse would shrink to a line. The bivariate density assigns probability by

computing the volume under the density surface (as shown in Figure 4). However, if

the entire distribution is concentrated on a line in the plane, then the volume under

the density, if it existed, would be zero. In other words, the distribution is degenerate.

Problems

1. Let

$$A = \begin{pmatrix} 3 & 4 \\ 4 & 7 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.$$

Find the following:

a) A + B.

b) AB.

c) BA. (Is AB = BA?)

d) $A^{-1}$. Verify that your answer is correct by confirming that $AA^{-1} = I$.

e) |A|

2. Let

$$X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ 1 & x_4 \\ 1 & x_5 \end{pmatrix}.$$

Find the following:

a) $X'X$.

b) $(X'X)^{-1}$.

References

Flury, B. (1997). A First Course in Multivariate Statistics. Springer, New York.

Johnson, R. A. and Wichern, D. W. (1998). Applied Multivariate Statistical Analysis.

Prentice Hall, New Jersey.

Ryan, T. A., Joiner, B. L., and Ryan, B. F. (1976). The Minitab Student Handbook.

Duxbury Press, California.
