between objects based on variables measured on those objects. These chapters should be reviewed before Chapters 6 to 12, which cover the most
important multivariate procedures in current use. The final Chapter 13 contains some general comments about the analysis of multivariate data.
The chapters in this third edition of the book are the same as those in
the second edition. The changes that have been made for the new edition
are the updating of references, some new examples, some examples carried
out using newer computer software, and changes in the text to reflect new
ideas about multivariate analyses.
In making changes, I have continually kept in mind the original intention
of the book, which was that it should be as short as possible and attempt to
do no more than take readers to the stage where they can begin to use
multivariate methods in an intelligent manner.
I am indebted to many people for commenting on the text of the three
editions of the book and for pointing out various errors. Particularly, I thank
Earl Bardsley, John Harraway, and Liliana Gonzalez for their help in this
respect. Any errors that remain are my responsibility alone.
I would like to express my appreciation to the Department of Mathematics and Statistics at the University of Otago in New Zealand for hosting
me as a visitor twice in 2003, first in May and June, and later in November
and December. The excellent university library was particularly important
for my final updating of references.
In conclusion, I wish to thank the staff of Chapman and Hall and of CRC
for their work over the years in promoting the book and encouraging me to
produce the second and third editions.
Bryan F.J. Manly
Laramie, Wyoming
chapter one
Example 1.1
Table 1.1 Body Measurements of 49 Female Sparrows

Bird  X1 (mm)  X2 (mm)  X3 (mm)  X4 (mm)  X5 (mm)
  1     156      245     31.6     18.5     20.5
  2     154      240     30.4     17.9     19.6
  3     153      240     31.0     18.4     20.6
  4     153      236     30.9     17.7     20.2
  5     155      243     31.5     18.6     20.3
  6     163      247     32.0     19.0     20.9
  7     157      238     30.9     18.4     20.2
  8     155      239     32.8     18.6     21.2
  9     164      248     32.7     19.1     21.1
 10     158      238     31.0     18.8     22.0
 11     158      240     31.3     18.6     22.0
 12     160      244     31.1     18.6     20.5
 13     161      246     32.3     19.3     21.8
 14     157      245     32.0     19.1     20.0
 15     157      235     31.5     18.1     19.8
 16     156      237     30.9     18.0     20.3
 17     158      244     31.4     18.5     21.6
 18     153      238     30.5     18.2     20.9
 19     155      236     30.3     18.5     20.1
 20     163      246     32.5     18.6     21.9
 21     159      236     31.5     18.0     21.5
 22     155      240     31.4     18.0     20.7
 23     156      240     31.5     18.2     20.6
 24     160      242     32.6     18.8     21.7
 25     152      232     30.3     17.2     19.8
 26     160      250     31.7     18.8     22.5
 27     155      237     31.0     18.5     20.0
 28     157      245     32.2     19.5     21.4
 29     165      245     33.1     19.8     22.7
 30     153      231     30.1     17.3     19.8
 31     162      239     30.3     18.0     23.1
 32     162      243     31.6     18.8     21.3
 33     159      245     31.8     18.5     21.7
 34     159      247     30.9     18.1     19.0
 35     155      243     30.9     18.5     21.3
 36     162      252     31.9     19.1     22.2
 37     152      230     30.4     17.3     18.6
 38     159      242     30.8     18.2     20.5
 39     155      238     31.2     17.9     19.3
 40     163      249     33.4     19.5     22.8
 41     163      242     31.0     18.1     20.7
 42     156      237     31.7     18.2     20.3
 43     159      238     31.5     18.4     20.3
 44     161      245     32.1     19.1     20.8
 45     155      235     30.7     17.7     19.6
 46     162      247     31.9     19.1     20.4
 47     153      237     30.6     18.6     20.4
 48     162      245     32.5     18.5     21.1
 49     164      248     32.3     18.8     20.9

Note: X1 = total length, X2 = alar extent, X3 = length of beak and head, X4 = length of
humerus, and X5 = length of keel of sternum. Birds 1 to 21 survived, and birds 22 to
49 died. The data source is Bumpus (1898), who measured in inches and millimeters.
Source: Adapted from Bumpus, H.C. (1898), Biological Lectures, 11th Lecture, Marine Biology
Laboratory, Woods Hole, MA, pp. 209-226.
close to the average survived better than individuals with measurements far
from the average.
In fact, the development of multivariate statistical methods had hardly
begun in 1898 when Bumpus was writing. The correlation coefficient as a
measure of the relationship between two variables was devised by Francis
Galton in 1877. However, it was another 56 years before Harold Hotelling
described a practical method for carrying out a principal components analysis, which is one of the simplest multivariate analyses that can usefully be
applied to Bumpus's data. Bumpus did not even calculate standard deviations. Nevertheless, his methods of analysis were sensible. Many authors
have reanalyzed his data and, in general, have confirmed his conclusions.
Taking the data as an example for illustrating multivariate methods,
several interesting questions arise. In particular:
1. How are the various measurements related? For example, does a
large value for one of the variables tend to occur with large values
for the other variables?
2. Do the survivors and nonsurvivors have statistically significant differences for their mean values of the variables?
3. Do the survivors and nonsurvivors show similar amounts of variation for the variables?
4. If the survivors and nonsurvivors do differ in terms of the distributions of the variables, then is it possible to construct some function
of these variables that separates the two groups? It would then be
convenient if large values of the function tended to occur with the
survivors, as the function would then apparently be an index of the
Darwinian fitness of the sparrows.
Example 1.2
Egyptian skulls
For a second example, consider the data shown in Table 1.2 for measurements
made on male skulls from the area of Thebes in Egypt. There are five samples
of 30 skulls each, from the early predynastic period (circa 4000 B.C.), the late
predynastic period (circa 3300 B.C.), the 12th and 13th dynasties (circa
1850 B.C.), the Ptolemaic period (circa 200 B.C.), and the Roman period
(circa A.D. 150). Four measurements are available on each skull, as illustrated
in Figure 1.1.

[Table 1.2 Measurements on male Egyptian skulls: the four measurements X1 to X4, in millimeters, for each of the 30 skulls in the five samples (early predynastic, late predynastic, 12th and 13th dynasties, Ptolemaic, and Roman periods); data from Thomson and Randall-Maciver (1905)]
For this example, some interesting questions are:
1. How are the four measurements related?
2. Are there statistically significant differences in the sample means for
the variables, and, if so, do these differences reflect gradual changes
over time in the shape and size of skulls?
3. Are there significant differences in the sample standard deviations
for the variables, and, if so, do these differences reflect gradual changes over time in the amount of variation?
4. Is it possible to construct a function of the four variables that, in some
sense, describes the changes over time?
These questions are, of course, rather similar to the ones suggested for
Example 1.1.
It will be seen later that there are differences between the five samples
that can be explained partly as time trends. It must be said, however, that
the reasons for the apparent changes are unknown. Migration of other races
into the region may well have been the most important factor.
Example 1.3
Distribution of a butterfly
[Table 1.3 Environmental Variables and Phosphoglucose-Isomerase (Pgi) Gene Frequencies for Colonies of the Butterfly Euphydryas editha in California and Oregon: altitude (ft), annual precipitation (in.), maximum and minimum temperatures (°F), and Pgi gene frequencies for the 16 colonies SS, SB, WSB, JRC, JRH, SJ, CR, UO, LO, DP, PZ, MC, IF, AF, GH, and GL. The data source was McKechnie et al. (1975), with the environmental variables rounded to integers for simplicity.]

[Figure 1.2 A map of California showing the geographical locations of the 16 colonies]
describe the genetic distribution of the butterfly to some extent. Figure 1.2
shows the geographical locations of the colonies.
In this example, questions that can be asked include:
1. Are the Pgi frequencies similar for colonies that are close in space?
2. To what extent, if any, are the Pgi frequencies related to the environmental variables?
These are important questions in trying to decide how the Pgi frequencies are determined. If the genetic composition of the colonies was largely
determined by past and present migration, then gene frequencies will tend
to be similar for colonies that are close in space, although they may show
little relationship with the environmental variables. On the other hand, if it
is the environment that is most important, then this should show up in
relationships between the gene frequencies and the environmental variables
(assuming that the right variables have been measured), but close colonies
will only have similar gene frequencies if they have similar environments.
Obviously, colonies that are close in space will usually have similar environments, so it may be difficult to reach a clear conclusion on this matter.
Table 1.4 Mean Mandible Measurements for Seven Canine Groups

Group            X1 (mm)  X2 (mm)  X3 (mm)  X4 (mm)  X5 (mm)  X6 (mm)
Modern dog         9.7      21.0     19.4      7.7     32.0     36.5
Golden jackal      8.1      16.7     18.3      7.0     30.3     32.9
Chinese wolf      13.5      27.3     26.8     10.6     41.9     48.1
Indian wolf       11.5      24.3     24.5      9.3     40.0     44.6
Cuon              10.7      23.5     21.4      8.5     28.8     37.6
Dingo              9.6      22.6     21.1      8.3     34.4     43.1
Prehistoric dog   10.3      22.1     19.1      8.1     32.2     35.0
Note: X1 = breadth of mandible; X2 = height of mandible below the first molar; X3 = length of
the first molar; X4 = breadth of the first molar; X5 = length from first to third molar,
inclusive; and X6 = length from first to fourth premolar, inclusive.
Source: Adapted from Higham, C.F.W. et al. (1980), J. Archaeological Sci., 7, 149-165.
Example 1.4
Excavations of prehistoric sites in northeast Thailand have produced a collection of canid (dog) bones covering a period from about 3500 B.C. to the
present. However, the origin of the prehistoric dog is not certain. It could
descend from the golden jackal (Canis aureus) or the wolf, but the wolf is not
native to Thailand. The nearest indigenous sources are western China (Canis
lupus chanco) or the Indian subcontinent (Canis lupus pallides).
In order to try to clarify the ancestors of the prehistoric dogs, mandible
(lower jaw) measurements were made on the available specimens. These
were then compared with the same measurements made on the golden
jackal, the Chinese wolf, and the Indian wolf. The comparisons were also
extended to include the dingo, which may have its origins in India; the cuon
(Cuon alpinus), which is indigenous to southeast Asia; and modern village
dogs from Thailand.
Table 1.4 gives mean values for the six mandible measurements for
specimens from all seven groups. The main question here is what the measurements suggest about the relationships between the groups and, in particular, how the prehistoric dog seems to relate to the other groups.
Example 1.5
Table 1.5 Percentages of the Workforce Employed in Nine Different Industry Groups in 30 Countries in Europe

Country                 Group     AGR   MIN   MAN   PS   CON   SER   FIN   SPS   TC
Belgium                 EU         2.6   0.2  20.8  0.8   6.3  16.9   8.7  36.9  6.8
Denmark                 EU         5.6   0.1  20.4  0.7   6.4  14.5   9.1  36.3  7.0
France                  EU         5.1   0.3  20.2  0.9   7.1  16.7  10.2  33.1  6.4
Germany                 EU         3.2   0.7  24.8  1.0   9.4  17.2   9.6  28.4  5.6
Greece                  EU        22.2   0.5  19.2  1.0   6.8  18.2   5.3  19.8  6.9
Ireland                 EU        13.8   0.6  19.8  1.2   7.1  17.8   8.4  25.5  5.8
Italy                   EU         8.4   1.1  21.9  0.0   9.1  21.6   4.6  28.0  5.3
Luxembourg              EU         3.3   0.1  19.6  0.7   9.9  21.2   8.7  29.6  6.8
Netherlands             EU         4.2   0.1  19.2  0.7   0.6  18.5  11.5  38.3  6.8
Portugal                EU        11.5   0.5  23.6  0.7   8.2  19.8   6.3  24.6  4.8
Spain                   EU         9.9   0.5  21.1  0.6   9.5  20.1   5.9  26.7  5.8
U.K.                    EU         2.2   0.7  21.3  1.2   7.0  20.2  12.4  28.4  6.5
Albania                 Eastern   55.5  19.4   0.0  0.0   3.4   3.3  15.3   0.0  3.0
Bulgaria                Eastern   19.0   0.0  35.0  0.0   6.7   9.4   1.5  20.9  7.5
Czech/Slovak Republics  Eastern   12.8  37.3   0.0  0.0   8.4  10.2   1.6  22.9  6.9
Hungary                 Eastern   15.3  28.9   0.0  0.0   6.4  13.3   0.0  27.3  8.8
Poland                  Eastern   23.6   3.9  24.1  0.9   6.3  10.3   1.3  24.5  5.2
Romania                 Eastern   22.0   2.6  37.9  2.0   5.8   6.9   0.6  15.3  6.8
USSR (former)           Eastern   18.5   0.0  28.8  0.0  10.2   7.9   0.6  25.6  8.4
Yugoslavia (former)     Eastern    5.0   2.2  38.7  2.2   8.1  13.8   3.1  19.1  7.8
Cyprus                  Other     13.5   0.3  19.0  0.5   9.1  23.7   6.7  21.2  6.0
Gibraltar               Other      0.0   0.0   6.8  2.0  16.9  24.5  10.8  34.0  5.0
Malta                   Other      2.6   0.6  27.9  1.5   4.6  10.2   3.9  41.6  7.2
Turkey                  Other     44.8   0.9  15.3  0.2   5.2  12.4   2.4  14.5  4.4

Note: AGR, agriculture, forestry, and fishing; MIN, mining and quarrying; MAN, manufacturing; PS, power and water supplies; CON,
construction; SER, services; FIN, finance; SPS, social and personal services; TC, transport and communications. The data for the
individual countries are for various years from 1989 to 1995. Data from Euromonitor (1995), except for Germany and the U.K., where
more reasonable values were obtained from the United Nations Statistical Yearbook (2000).
Source: Adapted from Euromonitor (1995), European Marketing Data and Statistics, Euromonitor Publications, London; and from United
Nations (2000), Statistical Yearbook, 44th Issue, U.N. Department of Social Affairs, New York.
With the five body measurements X1 to X5 of Example 1.1, a factor analysis model with two factors takes the form

    Xi = ai1 F1 + ai2 F2 + ei,  for i = 1, 2, .., 5

where the aij values are constants, F1 and F2 are the factors, and ei represents
the variation in Xi that is independent of the variation in the other X variables.
Here F1 might be the factor of size. In that case, the coefficients a11, a21, a31,
a41, and a51 would all be positive, reflecting the fact that some birds tend to
be large and some birds tend to be small on all body measurements. The
second factor F2 might then measure an aspect of the shape of birds, with
some positive coefficients and some negative coefficients. If this two-factor
model fitted the data well, then it would provide a relatively straightforward
description of the relationship between the five body measurements being
considered.
One type of factor analysis starts by taking the first few principal components as the factors in the data being considered. These initial factors are
then modified by a special transformation process called factor rotation in
order to make them easier to interpret. Other methods for finding initial
factors are also used. A rotation to simpler factors is almost always done.
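To make the idea concrete, here is a minimal Python sketch (not from the book) of extracting initial factors from a correlation matrix: the first two principal components, scaled to give factor loadings. The correlation matrix here is made up purely for illustration.

```python
import numpy as np

# Hypothetical correlation matrix for five body measurements
# (made-up values, for illustration only).
R = np.array([
    [1.00, 0.70, 0.60, 0.65, 0.50],
    [0.70, 1.00, 0.55, 0.60, 0.45],
    [0.60, 0.55, 1.00, 0.70, 0.40],
    [0.65, 0.60, 0.70, 1.00, 0.55],
    [0.50, 0.45, 0.40, 0.55, 1.00],
])

# Eigendecomposition, with eigenvalues sorted largest first.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Initial (unrotated) loadings for a two-factor model: each of the
# first two eigenvectors scaled by the square root of its eigenvalue.
loadings = eigvecs[:, :2] * np.sqrt(eigvals[:2])
print(loadings.round(2))
```

A varimax or similar rotation would normally be applied to these loadings afterwards to make them easier to interpret.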
Discriminant function analysis is concerned with the problem of seeing
whether it is possible to separate different groups on the basis of the available
measurements. This could be used, for example, to see how well surviving
and nonsurviving sparrows can be separated using their body measurements
(Example 1.1), or how skulls from different epochs can be separated, again
using size measurements (Example 1.2). Like principal components analysis,
discriminant function analysis is based on the idea of finding suitable linear
combinations of the original variables to achieve the intended aim.
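As an illustrative sketch (with simulated data rather than Bumpus's measurements), one such linear combination is Fisher's linear discriminant, w = W⁻¹(m1 - m2), where m1 and m2 are the two group mean vectors and W is the pooled within-group covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical groups of objects measured on three variables.
group1 = rng.normal(loc=[156.0, 241.0, 31.2], scale=1.0, size=(20, 3))
group2 = rng.normal(loc=[158.0, 242.0, 31.5], scale=1.0, size=(25, 3))

m1, m2 = group1.mean(axis=0), group2.mean(axis=0)

# Pooled within-group covariance matrix.
W = ((len(group1) - 1) * np.cov(group1, rowvar=False) +
     (len(group2) - 1) * np.cov(group2, rowvar=False)) / (len(group1) + len(group2) - 2)

# Fisher's linear discriminant direction.
w = np.linalg.solve(W, m1 - m2)

# Scores for the linear combination: by construction, the mean score
# is larger for group 1 than for group 2.
scores1, scores2 = group1 @ w, group2 @ w
print(scores1.mean() > scores2.mean())
```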
Cluster analysis is concerned with the identification of groups of similar
objects. There is not much point in doing this type of analysis with data like
those of Examples 1.1 and 1.2, as the groups (survivors/nonsurvivors and
epochs) are already known. However, in Example 1.3 there might be some
interest in grouping colonies on the basis of environmental variables or Pgi
frequencies, while in Example 1.4 the main point of interest is in the similarity
between prehistoric Thai dogs and other animals. Likewise, in Example 1.5
the European countries can possibly be grouped in terms of their similarity
in employment patterns.
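The basic idea can be sketched in a few lines of Python: agglomerative (hierarchical) clustering repeatedly merges the closest objects or groups. This toy example (made-up objects and Euclidean distance, one of several possible choices) carries out just the first merge:

```python
import numpy as np

# Hypothetical objects described by two standardized variables.
objects = {
    "A": np.array([0.1, 0.2]),
    "B": np.array([0.2, 0.1]),
    "C": np.array([2.0, 1.9]),
    "D": np.array([2.1, 2.2]),
}

# Pairwise Euclidean distances between all objects.
names = list(objects)
pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
dist = {(a, b): float(np.linalg.norm(objects[a] - objects[b])) for a, b in pairs}

# Agglomerative clustering starts by merging the closest pair;
# here that is A and B.
closest = min(dist, key=dist.get)
print(closest)  # ('A', 'B')
```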
With canonical correlation, the variables (not the objects) are divided into
two groups, and interest centers on the relationship between these. Thus in
Example 1.3, the first four variables are related to the environment, while
the remaining six variables reflect the genetic distribution at the different
colonies of Euphydryas editha. Finding what relationships, if any, exist
between these two groups of variables is of considerable biological interest.
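A hedged numpy sketch of the calculation, using simulated data rather than the butterfly measurements: the squared canonical correlations are the eigenvalues of Sxx^-1 Sxy Syy^-1 Syx, where the S blocks come from partitioning the joint covariance matrix between the two groups of variables.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 30 objects, two "environmental" variables (X)
# and two "genetic" variables (Y), with some built-in association.
X = rng.normal(size=(30, 2))
Y = 0.5 * X + rng.normal(size=(30, 2))

# Blocks of the joint covariance matrix.
S = np.cov(np.hstack([X, Y]), rowvar=False)
Sxx, Sxy = S[:2, :2], S[:2, 2:]
Syx, Syy = S[2:, :2], S[2:, 2:]

# Squared canonical correlations are the eigenvalues of this matrix.
M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Syx)
r = float(np.sqrt(np.max(np.real(np.linalg.eigvals(M)))))
print(round(r, 2))
```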
Multidimensional scaling begins with data on some measure of the distances apart of a number of objects. From these distances, a map is then
constructed showing how the objects are related. This is a useful facility, as
it is often possible to measure how far apart pairs of objects are without
having any idea of how the objects are related in a geometric sense. Thus in
Example 1.4, there are ways of measuring the distances between modern
dogs and golden jackals, modern dogs and Chinese wolves, etc. Considering
each pair of animal groups gives 21 distances altogether, and from these
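The map construction can be sketched with classical metric scaling in numpy (a made-up 4 × 4 distance matrix stands in here for the 21 canine distances): the squared distances are double-centered, and the leading eigenvectors give two-dimensional coordinates.

```python
import numpy as np

# Hypothetical symmetric matrix of distances between four objects.
D = np.array([
    [0.0, 1.0, 4.0, 4.1],
    [1.0, 0.0, 4.2, 4.0],
    [4.0, 4.2, 0.0, 1.1],
    [4.1, 4.0, 1.1, 0.0],
])

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n     # centering matrix
B = -0.5 * J @ (D ** 2) @ J             # double-centered matrix

# Coordinates come from the largest eigenvalues/eigenvectors and
# give a two-dimensional "map" of the objects.
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1]
coords = eigvecs[:, order[:2]] * np.sqrt(np.maximum(eigvals[order[:2]], 0))

# Objects 1 and 2 end up close together on the map, and far from
# objects 3 and 4, mirroring the original distances.
d12 = np.linalg.norm(coords[0] - coords[1])
d13 = np.linalg.norm(coords[0] - coords[2])
print(d12 < d13)
```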
believe that this is not true. In particular, if all the individual variables being
studied appear to be normally distributed, then it is assumed that the joint
distribution is multivariate normal. This is, in fact, a minimum requirement
because the definition of multivariate normality requires more than this.
Cases do arise where the assumption of multivariate normality is clearly
invalid. For example, one or more of the variables being studied may have
a highly skewed distribution with several very high (or low) values; there
may be many repeated values; etc. This type of problem can sometimes be
overcome by an appropriate transformation of the data, as discussed in
elementary texts on statistics. If this does not work, then a rather special
form of analysis may be required.
One important aspect of a multivariate normal distribution is that it is
specified completely by a mean vector and a covariance matrix. The definitions of a mean vector and a covariance matrix are given in Section 2.7.
Basically, the mean vector contains the mean values for all of the variables
being considered, while the covariance matrix contains the variances for all
of the variables plus the covariances, which measure the extent to which all
pairs of variables are related.
References
Bumpus, H.C. (1898), The elimination of the unfit as illustrated by the introduced
sparrow, Passer domesticus, Biological Lectures, 11th Lecture, Marine Biology
Laboratory, Woods Hole, MA, pp. 209-226.
Euromonitor (1995), European Marketing Data and Statistics, Euromonitor Publications,
London.
Higham, C.F.W., Kijngam, A., and Manly, B.F.J. (1980), An analysis of prehistoric
canid remains from Thailand, J. Archaeological Sci., 7, 149-165.
McKechnie, S.W., Ehrlich, P.R., and White, R.R. (1975), Population genetics of Euphydryas butterflies, I: genetic variation and the neutrality hypothesis, Genetics,
81, 571-594.
Thomson, A. and Randall-Maciver, R. (1905), Ancient Races of the Thebaid, Oxford
University Press, Oxford, U.K.
United Nations (2000), Statistical Yearbook, 44th Issue, U.N. Department of Social
Affairs, New York.
chapter two
Matrix algebra
2.1 The need for matrix algebra
The theory of multivariate statistical methods can be explained reasonably
well only with the use of some matrix algebra. For this reason it is helpful,
if not essential, to have at least some knowledge of this area of mathematics.
This is true even for those who are interested in using the methods only as
tools. At first sight, the notation of matrix algebra is somewhat daunting.
However, it is not difficult to understand the basic principles, provided that
some of the details are accepted on faith.
An m × n matrix A has m rows and n columns of elements, and is written as

A = ( a11  a12  ..  a1n )
    ( a21  a22  ..  a2n )
    (  .    .        .  )
    (  .    .        .  )
    ( am1  am2  ..  amn )

If a matrix has only one column, such as

c = ( c1 )
    ( c2 )
    ( .  )
    ( .  )
    ( cm )

then this is called a column vector. If there is only one row, such as
r = (r1, r2 , . ., rn)
then this is called a row vector. Bold type is used to indicate matrices and
vectors.
The transpose of a matrix is obtained by interchanging the rows and the
columns. Thus the transpose of the matrix A above is

A' = ( a11  a21  ..  am1 )
     ( a12  a22  ..  am2 )
     (  .    .        .  )
     (  .    .        .  )
     ( a1n  a2n  ..  amn )

Also, the transpose of the vector c is c' = (c1, c2, .., cm), and the transpose of
the row vector r is the column vector r'.
There are a number of special kinds of matrices that are important. A
zero matrix has all elements equal to zero, so that it is of the form

0 = ( 0  0  ..  0 )
    ( 0  0  ..  0 )
    ( .  .      . )
    ( .  .      . )
    ( 0  0  ..  0 )
A diagonal matrix has zero elements except down the main diagonal, so that
it takes the form
D = ( d1  0   ..  0  )
    ( 0   d2  ..  0  )
    ( .   .       .  )
    ( .   .       .  )
    ( 0   0   ..  dn )
An identity matrix is a diagonal matrix with all of its diagonal elements equal to one:

I = ( 1  0  ..  0 )
    ( 0  1  ..  0 )
    ( .  .      . )
    ( .  .      . )
    ( 0  0  ..  1 )
Two matrices are equal only if they are the same size and all of their
elements are equal. For example
( a11  a12  a13 )   ( b11  b12  b13 )
( a21  a22  a23 ) = ( b21  b22  b23 )
( a31  a32  a33 )   ( b31  b32  b33 )

only if a11 = b11, a12 = b12, a13 = b13, and so on.
The trace of a matrix is the sum of its diagonal terms, and is only
defined for a square matrix. For example, the 3 × 3 matrix with elements
aij shown above has trace(A) = a11 + a22 + a33.
Matrices can be added and subtracted element by element. For example, with two 3 × 2 matrices A and B,

A + B = ( a11  a12 )   ( b11  b12 )   ( a11 + b11  a12 + b12 )
        ( a21  a22 ) + ( b21  b22 ) = ( a21 + b21  a22 + b22 )
        ( a31  a32 )   ( b31  b32 )   ( a31 + b31  a32 + b32 )

while

A - B = ( a11  a12 )   ( b11  b12 )   ( a11 - b11  a12 - b12 )
        ( a21  a22 ) - ( b21  b22 ) = ( a21 - b21  a22 - b22 )
        ( a31  a32 )   ( b31  b32 )   ( a31 - b31  a32 - b32 )
Clearly, these operations only apply with two matrices of the same size.
In matrix algebra, an ordinary number such as 20 is called a scalar. Multiplication of a matrix A by a scalar k is then defined to be the multiplication
of every element in A by k. Thus if A is the 3 × 2 matrix shown above, then

kA = ( ka11  ka12 )
     ( ka21  ka22 )
     ( ka31  ka32 )
The multiplication of two matrices, denoted by A.B or AB, is more
complicated. To begin with, A.B is defined only if the number of columns
of A is equal to the number of rows of B. Assume that this is the case, with
A having the size m × n and B having the size n × p. Then multiplication is
defined to produce the result

A.B = ( a11  a12  ..  a1n ) ( b11  b12  ..  b1p )
      ( a21  a22  ..  a2n ) ( b21  b22  ..  b2p )
      (  .    .        .  ) (  .    .        .  )
      ( am1  am2  ..  amn ) ( bn1  bn2  ..  bnp )

    = ( Σ a1j bj1   Σ a1j bj2   ..  Σ a1j bjp )
      ( Σ a2j bj1   Σ a2j bj2   ..  Σ a2j bjp )
      (     .           .               .     )
      ( Σ amj bj1   Σ amj bj2   ..  Σ amj bjp )

where the summations are for j running from 1 to n. Hence the element in
the ith row and kth column of A.B is

    Σ aij bjk = ai1 b1k + ai2 b2k + .. + ain bnk
When A and B are both square matrices, then A.B and B.A are both
defined. However, they are not generally equal. For example,

( 2  -1 ) ( 1  1 )   ( 2×1 - 1×0    2×1 - 1×1 )   ( 2  1 )
( 1   1 ) ( 0  1 ) = ( 1×1 + 1×0    1×1 + 1×1 ) = ( 1  2 )

whereas

( 1  1 ) ( 2  -1 )   ( 1×2 + 1×1    1×(-1) + 1×1 )   ( 3  0 )
( 0  1 ) ( 1   1 ) = ( 0×2 + 1×1    0×(-1) + 1×1 ) = ( 1  1 )
The inverse of a square matrix A, when it exists, is the matrix A^-1 with the property that A.A^-1 = A^-1.A = I. For example,

( 2  1 ) (  2/3  -1/3 )   ( 1  0 )
( 1  2 ) ( -1/3   2/3 ) = ( 0  1 )

so that the second matrix in this product is the inverse of the first.
Actually, the inverse of a 2 × 2 matrix, if it exists, can be calculated fairly
easily. The equation is

( a  b )^-1   (  d/D  -b/D )
( c  d )    = ( -c/D   a/D )

where D = ad - bc is the determinant of the matrix being inverted.
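The 2 × 2 formula is easily verified in numpy against the general-purpose inverse:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Inverse from the 2 x 2 formula, with D the determinant ad - bc.
a, b = A[0]
c, d = A[1]
D = a * d - b * c
inv_formula = np.array([[d, -b],
                        [-c, a]]) / D

# numpy's inverse agrees, and A times its inverse is the identity.
inv_np = np.linalg.inv(A)
print(np.allclose(inv_formula, inv_np))       # True
print(np.allclose(A @ inv_formula, np.eye(2)))  # True
```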
With A being an n × n matrix and x being an n × 1 column vector, the quadratic form is the scalar quantity

    Q = x'Ax = Σ Σ xi aij xj

where both summations run from 1 to n, xi is the element in the ith
row of x, and aij is the element in the ith row and jth column of A.
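In numpy, the double summation and the matrix product x'Ax give the same scalar (small made-up A and x):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
x = np.array([1.0, 2.0])

# Quadratic form via the matrix product x'Ax.
q_matrix = float(x @ A @ x)

# The same quantity via the explicit double summation.
n = len(x)
q_sum = sum(x[i] * A[i, j] * x[j] for i in range(n) for j in range(n))

print(q_matrix, q_sum)  # both 18.0
```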
For a square n × n matrix A, consider the set of equations

    a11 x1 + a12 x2 + .. + a1n xn = λ x1
    a21 x1 + a22 x2 + .. + a2n xn = λ x2
      .
      .
    an1 x1 + an2 x2 + .. + ann xn = λ xn

where λ is a scalar. These can also be written in matrix form as

    Ax = λx

or

    (A - λI)x = 0

where I is the n × n identity matrix, and 0 is an n × 1 vector of zeros. Then
it can be shown that these equations can hold only for certain particular
values of λ that are called the latent roots or eigenvalues of A. There can be
up to n of these eigenvalues. Given the ith eigenvalue λi, the equations can
be solved by arbitrarily setting x1 = 1, and the resulting vector of x values
with transpose x' = (1, x2, x3, .., xn), or any multiple of this vector, is called
the ith latent vector or the ith eigenvector of the matrix A. Also, the sum of the
eigenvalues is equal to the trace of A defined above, so that

    trace(A) = λ1 + λ2 + .. + λn
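Both facts, that Ax = λx for each eigenvector and that the eigenvalues sum to the trace, can be confirmed numerically for a small symmetric matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

eigvals, eigvecs = np.linalg.eigh(A)

# The sum of the eigenvalues equals the trace (here 2 + 3 = 5),
# and each eigenvector satisfies A x = lambda x.
print(np.isclose(eigvals.sum(), np.trace(A)))                        # True
print(np.allclose(A @ eigvecs[:, 0], eigvals[0] * eigvecs[:, 0]))    # True
```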
Consider first a single variable X, with values x1, x2, .., xn for n multivariate observations. The sample mean is

    x̄ = (x1 + x2 + .. + xn)/n = Σ xi / n

and the sample variance is

    s² = Σ (xi - x̄)² / (n - 1)

with both summations over i = 1 to n. For two variables Xj and Xk measured on the same n observations, the sample covariance is

    cjk = Σ (xij - x̄j)(xik - x̄k) / (n - 1)
where xij is the value of variable Xj for the ith multivariate observation. This
covariance is then a measure of the extent to which there is a linear relationship between Xj and Xk, with a positive value indicating that large values of
Xj and Xk tend to occur together, and a negative value indicating that large
values for one variable tend to occur with small values for the other variable.
It is related to the ordinary correlation coefficient between the two variables,
which is defined to be

    rjk = cjk / (sj sk)

where sj and sk are the sample standard deviations of Xj and Xk.
Furthermore, the definitions imply that ckj = cjk, rkj = rjk, cjj = sj², and rjj = 1.
With these definitions, the transpose of the sample mean vector is

    x̄' = (x̄1, x̄2, .., x̄p)

which can be thought of as reflecting the center of the multivariate sample.
It is also an estimate of the transpose of the population vector of means

    μ' = (μ1, μ2, .., μp)
Furthermore, the sample matrix of variances and covariances, or the covariance matrix, is

C = ( c11  c12  ..  c1p )
    ( c21  c22  ..  c2p )
    (  .    .        .  )
    (  .    .        .  )
    ( cp1  cp2  ..  cpp )
where cii = si². This is also sometimes called the sample dispersion matrix, and
it measures the amount of variation in the sample as well as the extent to
which the p variables are correlated. It is an estimate of the population covariance matrix

Σ = ( σ11  σ12  ..  σ1p )
    ( σ21  σ22  ..  σ2p )
    (  .    .        .  )
    (  .    .        .  )
    ( σp1  σp2  ..  σpp )
Similarly, the sample correlation matrix is

R = ( 1    r12  ..  r1p )
    ( r21  1    ..  r2p )
    (  .    .        .  )
    (  .    .        .  )
    ( rp1  rp2  ..  1   )
If the variables are coded by subtracting the sample mean and dividing by the
sample standard deviation, then the coded values will have a mean of zero
and a standard deviation of one for each variable. In that case, the sample
covariance matrix will equal the sample correlation matrix, i.e., C = R.
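This relationship is easy to verify in Python with a small simulated data matrix (six objects on three variables, values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 3))          # 6 objects, 3 variables

# Sample mean vector and covariance matrix (divisor n - 1).
mean_vector = X.mean(axis=0)
C = np.cov(X, rowvar=False)

# Code the variables: subtract the mean and divide by the standard
# deviation (computed with divisor n - 1 to match C).
Z = (X - mean_vector) / X.std(axis=0, ddof=1)

# For the coded data, the covariance matrix equals the correlation
# matrix of the original data, i.e. C = R.
print(np.allclose(np.cov(Z, rowvar=False), np.corrcoef(X, rowvar=False)))
```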
chapter three
Displaying multivariate
data
3.1 The problem of displaying many variables
in two dimensions
Graphs must be displayed in two dimensions either on paper or on a computer screen. It is therefore straightforward to show one variable plotted on
a vertical axis against a second variable plotted on a horizontal axis. For
example, Figure 3.1 shows the alar extent plotted against the total length for
the 49 female sparrows measured by Hermon Bumpus in his study of natural
selection (Table 1.1). Such plots allow one or more other characteristics of
the objects being studied to be shown as well. For example, in the case of
Bumpus's sparrows, survival and nonsurvival are also indicated.
It is considerably more complicated to show one variable plotted against
another two, but still possible. Thus Figure 3.2 shows beak and head lengths
(as a single variable) plotted against total lengths and alar lengths for the 49
sparrows. Again, different symbols are used for survivors and nonsurvivors.
It is not possible to show one variable plotted against another three at
the same time in some extension of a three-dimensional plot. Hence there is
a major problem in showing in a simple way the relationships that exist
between the individual objects in a multivariate set of data where those
objects are each described by four or more variables. Various solutions to
this problem have been proposed and are discussed in this chapter.
Figure 3.1 Alar extent plotted against total length for the 49 female sparrows measured by Hermon Bumpus.
Figure 3.2 The length of the beak and head plotted against the total length and alar
extent (all measured in millimeters) for the 49 female sparrows measured by Hermon
Bumpus (separate symbols show survivors and nonsurvivors).
of principal component 1 can be used as a means of representing the relationships between objects graphically, and a display of principal component
3 against the first two principal components can also be used if necessary.
The use of suitable index variables has the advantage of reducing the
problem of plotting many variables to two or three dimensions, but the
potential disadvantage is that some key difference between the objects may
be lost in the reduction. This approach is discussed in various different
contexts in the chapters that follow and will not be considered further here.
Figure 3.3 Draftsman's plot of the bird number and five variables measured (in
millimeters) on 49 female sparrows. The variables are the total length, the alar extent,
the length of the beak and head, the length of the humerus, and the length of the
keel of the sternum (separate symbols show survivors and nonsurvivors). Only the
extreme values are shown on each scale.
[Figure 3.4 Symbol displays, panels (a) and (b), for the seven canine groups (modern dog, golden jackal, Chinese wolf, Indian wolf, cuon, dingo, and prehistoric dog), based on the six mandible measurements X1 to X6]
Toit et al. (1986, ch. 4). In summary, it can be said that the use of symbols
has the advantage of displaying all variables simultaneously, but the disadvantage that the impression gained from the graph may depend quite
strongly on the order in which objects are displayed and the order in which
variables are assigned to the different aspects of the symbol.
The assignment of variables is likely to have more effect with faces than
it is with stars, because variation in different features of the face may have
very different impacts on the observer, whereas this is less likely to be the
case with different rays of a star. For this reason, the recommendation is
often made that alternative assignments of variables to features should be
tried with faces in order to find what seems to be the best. The subjective
nature of this type of process is clearly rather unsatisfactory.
Although the use of faces, stars, and other similar representations for
the values of variables on the objects being considered seems to be useful
under some circumstances, the fact is that this is seldom done. One reason
is the difficulty in finding computer software to produce the graphs. In the
past, this software was reasonably easily available, but these options are now
very hard to nd in statistical packages.
Figure 3.5 Profiles of variables for mandible measurements on seven canine groups.
The variables are in order of increasing average values, with X1 = breadth of mandible;
X2 = height of mandible below the first molar; X3 = length of the first molar; X4 = breadth
of the first molar; X5 = length from first to third molar, inclusive; and X6 = length from
first to fourth premolar, inclusive.
Figure 3.6 An alternative way to show variable profiles using bars instead of the
lines used in Figure 3.5 (M. = mandible).
and differences between the canine groups is exactly the same as seen in
Figure 3.5.
References
Chernoff, H. (1973), Using faces to represent points in K-dimensional space graphically, J. Am. Stat. Assoc., 68, 361-368.
Cleveland, W.S. (1985), The Elements of Graphing Data, Wadsworth, Monterey, CA.
Everitt, B. (1978), Graphical Techniques for Multivariate Data, North-Holland, New York.
Jacoby, W.G. (1998), Statistical Graphics for Visualizing Multivariate Data, Sage Publications, Thousand Oaks, CA.
Toit, S.H.C., Steyn, A.G.W., and Stumf, R.H. (1986), Graphical Exploratory Data Analysis,
Springer-Verlag, New York.
Tufte, E.R. (2001), The Visual Display of Quantitative Information, 2nd ed., Graphics
Press, Cheshire, CT.
Welsch, R.E. (1976), Graphics for data analysis, Comput. Graphics, 2, 31-37.
chapter four