Tensor product
In Chapter 2 we have looked at the conjugation action of GL(V) on matrices. This action
corresponds with the view of matrices as linear transformations. In Chapter 1 we have
looked into the role of matrices for describing linear subspaces of F^n. In Remark 1.1.2, we
mentioned yet another interpretation of a matrix A, namely as data determining a bilinear
map F^n × F^m → F by means of matrix multiplication:

(x, y) ↦ x⊤ A y,   where x = (x1, ..., xn)⊤ and y = (y1, ..., ym)⊤.

In Chapter 4 we will take a closer look at that interpretation. In this chapter we pave the
way by taking a closer look at the notion of bilinearity. For this purpose we will be using a very
general construction: tensor products of vector spaces.
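As a small numerical illustration of this interpretation (a sketch assuming NumPy; the matrix and vectors are arbitrary made-up values), the map (x, y) ↦ x⊤Ay is linear in each argument separately:

```python
import numpy as np

# A 2x3 matrix A determines a bilinear map F^2 x F^3 -> F by (x, y) -> x^T A y.
# The entries of A and the test vectors below are arbitrary illustration values.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

def f(x, y):
    """The bilinear map determined by A."""
    return x @ A @ y

x1 = np.array([1.0, -2.0])
x2 = np.array([0.5, 3.0])
y = np.array([2.0, 0.0, -1.0])
a, b = 2.0, -3.0

# Linearity in the first argument; linearity in the second is checked the same way.
assert np.isclose(f(a * x1 + b * x2, y), a * f(x1, y) + b * f(x2, y))
```

Bilinearity here is nothing more than the distributivity of matrix multiplication over addition and scalars.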
The foundation of tensor products lies in a natural extension of the notion of linear map:
the multilinear map. If we take multi equal to 1 then we get linear; equal to 2 gives us
bilinear, 3 gives us trilinear, et cetera. We shall focus on bilinear maps. In this chapter V,
W, U, V1, V2, ... are vector spaces over the field F. In general they are finite dimensional, but every
now and then we will make an exception to this rule.
3.1 Bilinear maps
Definition 3.1.1 A map f : V × W → U with the property that, for every v ∈ V and every
w ∈ W, the maps x ↦ f(x, w) and y ↦ f(v, y) are linear, is called bilinear (with respect
to V and W).

In general, for every natural number t a map f : V1 × V2 × ⋯ × Vt → U is called t-linear
(with respect to V1, ..., Vt) if, for every i ∈ {1, ..., t} and every vj ∈ Vj (j ≠ i), the map
x ↦ f(v1, ..., v_{i−1}, x, v_{i+1}, ..., vt) is a linear map from Vi to U.

The set of all t-linear maps f : V1 × V2 × ⋯ × Vt → U will be denoted by ML(V1, V2, ..., Vt; U).
Thus, if we view f as a map with t arguments, then t-linearity means that f is linear in
each of its t arguments.
Example 3.1.1 Here are two well-known special cases.
(i) Inner products on V are bilinear maps V × V → F.
(ii) The determinant, defined on the vector space Mn(F) of n × n matrices over F, can be
viewed as an n-linear map from F^n × ⋯ × F^n (n copies) to F: the n arguments
are the columns of the matrix in Mn(F).
Exercise 3.1 Often the word product refers to a bilinear map. Think of the inner product
described as above. Prove that also the following products are bilinear maps.
(i) Matrix multiplication Mn(F) × Mn(F) → Mn(F).
(ii) The outer product F^3 × F^3 → F^3,

x × y = (x2 y3 − x3 y2, x3 y1 − x1 y3, x1 y2 − x2 y1).
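A quick numerical check of (ii), assuming NumPy (the test vectors and scalars are arbitrary):

```python
import numpy as np

def cross(x, y):
    """The outer product on F^3 from Exercise 3.1(ii)."""
    return np.array([x[1] * y[2] - x[2] * y[1],
                     x[2] * y[0] - x[0] * y[2],
                     x[0] * y[1] - x[1] * y[0]])

x1 = np.array([1.0, 0.0, 2.0])
x2 = np.array([0.0, 3.0, -1.0])
y = np.array([2.0, 1.0, 1.0])
a, b = 2.0, -1.0

# Linearity in the first argument (linearity in the second then follows,
# since swapping the two arguments only changes the sign).
assert np.allclose(cross(a * x1 + b * x2, y), a * cross(x1, y) + b * cross(x2, y))
# Agrees with NumPy's built-in cross product.
assert np.allclose(cross(x1, y), np.cross(x1, y))
```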
Exercise 3.2 Show that ML(V1, V2, ..., Vt; U), with point-wise addition and scalar multiplication,
is a vector space over F. (Harder, and a consequence of the theory which we will
develop shortly:) Verify that the dimension is dim(U) · ∏_i dim(Vi).
We construct a vector space which is, so to speak, a product of V and W. One could think
of the set V × W of all pairs (v, w) with v ∈ V and w ∈ W. But what should addition look
like? If we demand (v1, w1) + (v2, w2) = (v1 + v2, w1 + w2) then we arrive at the direct sum
of V and W.
So the question for such a product has to be made precise before we can continue. The
demand that the product is the universal domain for bilinear maps on V × W, viewed as
linear maps, helps us further. The product we want to form is called the tensor product
and is denoted by V ⊗ W. It is characterised as the vector space T satisfying the following
property.
Definition 3.1.2 Let T be a vector space over F. We say that T satisfies the characteristic
property of the tensor product (with respect to V and W) if there is a bilinear
map h : V × W → T such that the image of h spans all of the vector space T and,
for every bilinear map f : V × W → U (with U a vector space), there exists a linear map
g : T → U with f = g ∘ h.
One also expresses this by saying that the following diagram commutes.

        V × W
        /     \
     h /       \ f
      v         v
      T ---g--> U
This property determines T up to isomorphism. Strictly speaking the tensor product
does not consist solely of the space T but also of the map h : V × W → T. The fact that two
tensor products are isomorphic should therefore be formulated a little more precisely: a
homomorphism φ : (h, T) → (h′, T′) is a linear map φ : T → T′ such that h′ = φ ∘ h. The
map φ is called an isomorphism if there is a homomorphism ψ : (h′, T′) → (h, T) which, as
a linear map T′ → T, is the inverse of φ.
Lemma 3.1.1 Any two vector spaces which satisfy the characteristic property of the tensor
product of V and W are isomorphic.

Proof: Suppose both (h, T) and (h′, T′) satisfy the characteristic property of the tensor
product with respect to V and W. Apply the characteristic property of the tensor product
to (h, T) with (f, U) = (h′, T′). It follows that there exists a linear map g : T → T′ with
h′ = g ∘ h. Apply the property to (h′, T′) with (f, U) = (h, T). Then there exists a linear
map g′ : T′ → T with h = g′ ∘ h′. Therefore we have both h = g′ ∘ g ∘ h and h′ = g ∘ g′ ∘ h′. Since
h(V × W) spans all of T, and g′ ∘ g is linear, we find g′ ∘ g = id_T, and likewise g ∘ g′ = id_{T′}. It
follows that T and T′, and even that (h, T) and (h′, T′), are isomorphic (by means of g). □
We could take the characteristic property of the tensor product as the definition of the
tensor product V ⊗ W, except that we have not yet shown its existence! A rather axiomatic
definition of the tensor product is the following.
Definition 3.1.3 Consider the formal vector space A with basis all pairs (v, w) from V × W.
Let B be the linear subspace of A generated by all vectors of the form

(v, w1 + w2) − (v, w1) − (v, w2)
(v1 + v2, w) − (v1, w) − (v2, w)
(v, λw) − λ(v, w)
(λv, w) − λ(v, w)

with λ ∈ F, v, v1, v2 ∈ V and w, w1, w2 ∈ W. Then we define V ⊗ W, the tensor product of
V and W, as the quotient vector space A/B.
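The generators of B can be made concrete in coordinates. If we model the class (v, w) + B of a pair of coordinate vectors by the Kronecker product kron(v, w) (a choice justified later, once simple tensors of basis vectors are shown to form a basis), then every generator of B evaluates to zero. A sketch assuming NumPy, with arbitrary vectors and scalar:

```python
import numpy as np

# Arbitrary vectors in F^2 and F^3, and an arbitrary scalar lam.
v, v1, v2 = np.array([1.0, 2.0]), np.array([0.0, 3.0]), np.array([4.0, -1.0])
w, w1, w2 = np.array([2.0, -1.0, 5.0]), np.array([1.0, 1.0, 0.0]), np.array([0.0, 2.0, 3.0])
lam = 3.0

# The four kinds of generators of B, with (v, w) + B modeled by np.kron(v, w):
assert np.allclose(np.kron(v, w1 + w2), np.kron(v, w1) + np.kron(v, w2))
assert np.allclose(np.kron(v1 + v2, w), np.kron(v1, w) + np.kron(v2, w))
assert np.allclose(np.kron(v, lam * w), lam * np.kron(v, w))
assert np.allclose(np.kron(lam * v, w), lam * np.kron(v, w))
```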
Exercise 3.3 Show that the vector space A of Definition 3.1.3 is finite dimensional if and
only if F is finite and V and W are finite dimensional.
To derive the characteristic property of the tensor product, we use the bilinear map h ∈
ML(V, W; A/B) given by h(v, w) = (v, w) + B.
Proposition 3.1.1 The tensor product V ⊗ W of V and W satisfies the characteristic
property of the tensor product.

Proof: In the first place we see that the residue class (v, w) + B in A/B of every basis
element (v, w) of A appears as an image under h, so that indeed A/B is spanned by h(V × W).
Now let f : V × W → U be a bilinear map. To see that g can be found as in the
commutative diagram of Definition 3.1.2, we use the well-known result:

If k : A → U is a linear map with B contained in ker(k), then there is a unique
linear map g : A/B → U with g(a + B) = k(a) for all a ∈ A.

We apply this result to the linear map k : A → U determined by k(v, w) = f(v, w) ((v, w) ∈
V × W). (Since k is only given on a basis of A, this determines a unique linear map.)
The fact that f is bilinear implies that B ⊆ ker(k). We conclude that there is a map
g : A/B → U with g((v, w) + B) = f(v, w) for all v ∈ V and w ∈ W, so with f = g ∘ h. □
Theorem 3.1.2 Suppose m = dim(W) is finite. Then V′ ⊗ W ≅ L(V, W).

Proof: We show that the bilinear map h : V′ × W → L(V, W) given by

h(b, w) = (v ↦ b(v)w),

satisfies the characteristic property for the tensor product of V′ and W. Then the theorem
follows from Lemma 3.1.1.

Choose a basis (wi)i of W. For a ∈ L(V, W), there are linear functionals bi ∈ V′ such that
a = (v ↦ ∑_i bi(v)wi), so a = ∑_i h(bi, wi) is in the linear span in L(V, W) of the image of
h.

Now let f : V′ × W → U be a bilinear map. Then the linear map g : L(V, W) → U given
by

g(v ↦ ∑_i bi(v)wi) = ∑_i f(bi, wi)

satisfies f = g ∘ h, i.e., the following diagram commutes.

        V′ × W
        /      \
     h /        \ f
      v          v
   L(V, W) --g--> U

We have seen that the pair (h, L(V, W)) satisfies the characteristic property for the tensor
product of V′ and W. This suffices for the proof of the theorem. □
Exercise 3.4 Investigate where the theorem goes wrong if W is not finite dimensional.
Remark 3.1.1 From the theorem it follows that U′ ⊗ F ≅ L(U, F) = U′. But this can also
be seen directly. Verify this!
Remark 3.1.2 By Theorem 3.1.2 the linear map

V′ ⊗ W → L(V, W)

determined by

f ⊗ w ↦ (v ↦ f(v)w)

is an isomorphism.

Likewise, for finite dimensional V, the linear map

V ⊗ W → L(V′, W)

determined by

v ⊗ w ↦ (f ↦ f(v)w)

is an isomorphism.
Compare this with the comments following Definition 2.2.1.

Furthermore, the linear map V ⊗ W → W ⊗ V determined by

v ⊗ w ↦ w ⊗ v

is an isomorphism, and the linear map (U ⊗ V) ⊗ W → U ⊗ (V ⊗ W) determined by

(u ⊗ v) ⊗ w ↦ u ⊗ (v ⊗ w)

is an isomorphism. It therefore does not matter where the parentheses are (or where we
put them in case they are missing) in a multiple tensor product.
Proposition 3.1.2 If (vi)i is a basis of V and (wj)j is a basis of W, then (vi ⊗ wj)i,j is a
basis of V ⊗ W.

Proof: Let φ : V ⊗ W → L(V′, W) be the isomorphism determined by

v ⊗ w ↦ (f ↦ f(v)w)   (f ∈ V′, v ∈ V, w ∈ W).

Now (f ↦ f(vi)wj)i,j is a basis of L(V′, W), and so (φ⁻¹(f ↦ f(vi)wj))i,j = (vi ⊗ wj)i,j is
a basis of V ⊗ W. □
Remark 3.1.3 Let V = F^n. Using the fact that L(V) ≅ V′ ⊗ V we can now interpret
n × n matrices as vectors in V′ ⊗ V. The tensor y⊤ ⊗ x of an n-dimensional column
vector x from V = F^n and an n-dimensional row vector y⊤ from V′ corresponds with the
linear transformation z ↦ (y⊤z)x ∈ L(V), i.e., with the rank 1 matrix xy⊤ (for (y⊤z)x =
x(y⊤z) = (xy⊤)z).
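The correspondence of the remark can be checked numerically (a sketch assuming NumPy; x, y and z are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # a column vector in V = F^3
y = np.array([4.0, -1.0, 2.0])  # y^T is a row vector in V'
M = np.outer(x, y)              # the matrix x y^T

# x y^T has rank 1 ...
assert np.linalg.matrix_rank(M) == 1
# ... and acts as the linear transformation z -> (y^T z) x.
z = np.array([0.5, 1.0, -2.0])
assert np.allclose(M @ z, (y @ z) * x)
```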
Corollary 3.1.3 A linear transformation on a finite dimensional vector space V has rank
at most r if and only if the corresponding element in V′ ⊗ V is the sum of at most r
matrices of rank 1.

Proof: A matrix has rank at most r if and only if the image of the associated linear transformation
has dimension at most r.
In the first place, every rank 1 matrix is of the form xy⊤, i.e., it corresponds to the simple
tensor y⊤ ⊗ x of Remark 3.1.3. So the statement we are trying to prove holds in the case r = 1.
From this it follows directly that a vector g ∈ V′ ⊗ V is the sum of at most r simple tensors
if and only if the image of the associated linear transformation in L(V) is spanned by at most r
vectors. □
Exercise 3.8 Write the matrix

        74  88 102
A  =    82  98 114
        90 108 126

as a sum of two simple tensors, or, equivalently, as a sum of two rank 1 matrices.
Exercise 3.9 Write an algorithm which decomposes a square matrix A into a sum of
rank(A) matrices of rank 1.
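One possible such algorithm over the real numbers (a sketch, not necessarily the intended solution: it uses the singular value decomposition, whereas over a general field Gaussian elimination would be the natural tool):

```python
import numpy as np

def rank_one_decomposition(A, tol=1e-10):
    """Decompose a real matrix A into rank(A) matrices of rank 1 summing to A,
    using the singular value decomposition A = sum_i s_i u_i v_i^T."""
    U, s, Vt = np.linalg.svd(A)
    return [s[i] * np.outer(U[:, i], Vt[i, :])
            for i in range(len(s)) if s[i] > tol * s[0]]

# The matrix of Exercise 3.8 has rank 2, so two rank 1 terms suffice.
A = np.array([[74.0, 88.0, 102.0],
              [82.0, 98.0, 114.0],
              [90.0, 108.0, 126.0]])
terms = rank_one_decomposition(A)
assert len(terms) == 2
assert all(np.linalg.matrix_rank(T) == 1 for T in terms)
assert np.allclose(sum(terms), A)
```

Note that the SVD terms are one valid answer to Exercise 3.8, but not the only one: rank 1 decompositions are far from unique.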
Exercise 3.10 For vector spaces V and W over F, prove the following assertions.
(i) ML(V, W; F) ≅ (V ⊗ W)′.
(ii) (V ⊗ W)′ ≅ V′ ⊗ W′.
3.2 Tensor products of linear maps
Now that we have constructed the tensor product U ⊗ V, we would of course like to relate
linear maps on U and V to linear maps on U ⊗ V.
Theorem 3.2.1 For A ∈ L(U, W) and B ∈ L(V, T), the requirement

u ⊗ v ↦ Au ⊗ Bv   (u ∈ U, v ∈ V)

determines a unique linear map A ⊗ B : U ⊗ V → W ⊗ T. On a general element it is given by

∑_i ui ⊗ vi ↦ ∑_i Aui ⊗ Bvi.
In terms of matrices, A ⊗ B is the block matrix

            A1,1 B   A1,2 B   ...   A1,n B
    A ⊗ B = A2,1 B   A2,2 B   ...   A2,n B
              ...      ...    ...     ...
            An,1 B   An,2 B   ...   An,n B
For example, if

    A = | u  v |  ∈ L(F^2)    and    B = | a  b  c |  ∈ L(F^3),
        | x  y |                         | d  e  f |
                                         | g  h  k |

then

            | ua ub uc  va vb vc |
            | ud ue uf  vd ve vf |
    A ⊗ B = | ug uh uk  vg vh vk |
            | xa xb xc  ya yb yc |
            | xd xe xf  yd ye yf |
            | xg xh xk  yg yh yk |
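The defining property of Theorem 3.2.1 can be verified numerically for this block matrix, which NumPy computes as np.kron (a sketch; the matrices and vectors are random illustration data):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))   # an arbitrary A in L(F^2)
B = rng.standard_normal((3, 3))   # an arbitrary B in L(F^3)
u = rng.standard_normal(2)
v = rng.standard_normal(3)

AB = np.kron(A, B)  # the 6x6 block matrix (A_{i,j} B)_{i,j}
assert AB.shape == (6, 6)
# On a simple tensor, modeled by np.kron(u, v): (A (x) B)(u (x) v) = Au (x) Bv.
assert np.allclose(AB @ np.kron(u, v), np.kron(A @ u, B @ v))
```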
3.3 Hadamard matrices
As an application we now look at a special type of matrix that has proved useful in statistics.

Definition 3.3.1 A Hadamard matrix is a square matrix H of size n satisfying

HH⊤ = nIn,

with each entry equal to 1 or −1.
Example 3.3.1 For n = 1 the only two Hadamard matrices are H = (1) and H = (−1).
For n = 2,

    H = | 1  1 |
        | 1 −1 |

is a Hadamard matrix.
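Larger Hadamard matrices can be produced from the 2 × 2 example with the Kronecker product of Section 3.2; this is known as the Sylvester construction. A sketch assuming NumPy:

```python
import numpy as np

def sylvester_hadamard(k):
    """Build a Hadamard matrix of size 2**k by repeatedly taking the
    Kronecker product with the 2 x 2 Hadamard matrix."""
    H2 = np.array([[1, 1],
                   [1, -1]])
    H = np.array([[1]])
    for _ in range(k):
        H = np.kron(H2, H)
    return H

H = sylvester_hadamard(3)  # an 8 x 8 Hadamard matrix
n = H.shape[0]
assert set(np.unique(H)) == {-1, 1}                       # all entries are +-1
assert np.array_equal(H @ H.T, n * np.eye(n, dtype=int))  # H H^T = n I_n
```

That H ⊗ H′ is again a Hadamard matrix whenever H and H′ are follows directly from (H ⊗ H′)(H ⊗ H′)⊤ = HH⊤ ⊗ H′H′⊤.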
3.4 Bilinear forms
We have now reached the stage where we can return to our point of departure: understanding
a matrix as a data structure for bilinear forms. The bilinear forms on a vector space V
are the elements of ML(V, V; F), but they can also be viewed as elements of (V ⊗ V)′, or of
V′ ⊗ V′ (see Exercise 3.10), or as elements of L(V, V′).

Let f be a bilinear form on V. If we choose a basis (ei)i for V, then f is determined by the
numbers fi,j = f(ei, ej). If we identify V with F^n using our basis (ei)i, then the matrix
Af = (fi,j)i,j satisfies

f(x, y) = x⊤ Af y.
We want to investigate how Af changes under a coordinate transformation. Since a coordinate
transformation is an element of the group GL(V), we shall first investigate how the
action of GL(V) on V′ can be described.
To start with, we explain what we mean by an action.
For a group G acting on V and W, the action of G on V ⊗ W is determined by

(g, ∑_i vi ⊗ wi) ↦ ∑_i gvi ⊗ gwi;

the dual action of G on V′ is the map given by (g, f) ↦ f ∘ g⁻¹. (For the dual action,
see also Definition 1.1.5.)
Verify that the dual action is a special case of the conjugation action. Indeed, V′ = L(V, F),
and G acts trivially on F (i.e., via (g, λ) ↦ λ), so that g f g⁻¹ = f ∘ g⁻¹.
The key observation in understanding the action of GL(V) on V′ is that the evaluation
function

V′ ⊗ V → F,

determined by

f ⊗ v ↦ f(v)   (v ∈ V, f ∈ V′),

is invariant under GL(V): (f ∘ g⁻¹)(gv) = f(v) for all f ∈ V′ and v ∈ V.
We can also formulate this in the language of matrices. Below we shall denote by the
exponent −⊤ the transposed inverse: first take the transpose and then the inverse (or
the other way around, the order does not matter).
Lemma 3.4.2 The dual action of GL(n, F) on (F^n)′ ≅ F^n is given by

(T, v) ↦ T^{−⊤} v   (T ∈ GL(n, F), v ∈ F^n).

Proof: Under the dual action, T sends the linear form z ↦ y⊤z to z ↦ y⊤T⁻¹z
(T ∈ GL(n, F), y ∈ F^n).
The lemma now follows by choosing the basis (e′i)i for (F^n)′, and determining the matrix of
the linear transformation y⊤ ↦ y⊤T⁻¹ with respect to this basis. □
Exercise 3.16 Suppose we are given two actions of a group G: one on V and one on W.
A map φ : V → W such that φ ∘ g = g ∘ φ for all g ∈ G is called G-equivariant. Show that there
are G-equivariant isomorphisms in the following cases:
(i) φ : V′ ⊗ W → L(V, W).
(ii) φ : ML(V, W; F) → L(V, W′).
Lemma 3.4.3 If V = F^n, then the action of T ∈ GL(n, F) on the matrix Af of a bilinear
form f is given by

Af ↦ T^{−⊤} Af T⁻¹.

Proof:

(T f)(x, y) = f(T⁻¹x, T⁻¹y) = (T⁻¹x)⊤ Af T⁻¹y = x⊤ (T^{−⊤} Af T⁻¹) y.

So T transforms Af to T^{−⊤} Af T⁻¹. □
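A numerical check of this computation (a sketch assuming NumPy; Af, T and the vectors are random illustration data, and T^{−⊤} denotes the transposed inverse as above):

```python
import numpy as np

rng = np.random.default_rng(1)
Af = rng.standard_normal((3, 3))                 # matrix of an arbitrary bilinear form f
T = rng.standard_normal((3, 3)) + 3 * np.eye(3)  # a generic, hence invertible, T
Tinv = np.linalg.inv(T)
x, y = rng.standard_normal(3), rng.standard_normal(3)

# (T f)(x, y) = f(T^{-1} x, T^{-1} y) equals x^T (T^{-T} Af T^{-1}) y.
lhs = (Tinv @ x) @ Af @ (Tinv @ y)
rhs = x @ (Tinv.T @ Af @ Tinv) @ y
assert np.isclose(lhs, rhs)
```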
Now we really are only interested in bilinear forms modulo coordinate transformations, i.e.,
in the orbits of GL(V) on ML(V, V; F). For this it does not matter whether we look at this
action or its dual, i.e., the one composed with the automorphism g ↦ g^{−⊤}.
Definition 3.4.2 If we take the dual action of Lemma 3.4.3, then we are dealing with the
action of GL(n, F) given by

A ↦ SAS⊤   for S ∈ GL(n, F).

We call this action the congruency action. Matrices (bilinear forms) in the same GL(n, F)-orbit
under this action are called congruent.
Notice that this action is indeed different from the conjugation action. In the next chapter
we shall study the orbits of GL(V ) under this action.
Exercise 3.17 Determine the orbits of GL(3, F2) on M3(F2).
Exercise 3.19 Give a solution of the congruency problem for bilinear forms in the
one-dimensional case.
In fact we will not investigate all GL(V )-orbits under congruency on bilinear forms, but
only those orbits that belong to special bilinear forms: the alternating bilinear forms and
the symmetric bilinear forms.
Theorem 3.4.4 Suppose F has characteristic not equal to 2. Then the linear space ML(V, V; F)
of all bilinear forms on V is the direct sum of the linear subspaces

S(V) = {f ∈ ML(V, V; F) | f(x, y) = f(y, x) for all x, y ∈ V}

and

A(V) = {f ∈ ML(V, V; F) | f(x, y) = −f(y, x) for all x, y ∈ V}.
Both subspaces S(V ) and A(V ) are invariant under the congruency action of GL(V ).
Proof: An arbitrary element f ∈ ML(V, V; F) can be written as

(x, y) ↦ (f(x, y) + f(y, x))/2 + (f(x, y) − f(y, x))/2.

The first summand belongs to S(V), the second to A(V). These two subsets of ML(V, V; F)
are linear subspaces. Their intersection is zero: if f ∈ S(V) ∩ A(V), then we have f(y, x) =
f(x, y) = −f(y, x), so 2f(y, x) = 0, so f(y, x) = 0 for all x, y ∈ V.
To prove the statement at the end of the theorem we choose ε = ±1 such that f(x, y) =
εf(y, x) for all x, y ∈ V. Then it follows that for g ∈ GL(V)

(g f)(x, y) = f(g⁻¹x, g⁻¹y) = εf(g⁻¹y, g⁻¹x) = ε(g f)(y, x),

so that g f ∈ A(V) if f ∈ A(V) (with ε = −1) and g f ∈ S(V) if f ∈ S(V) (with ε = 1). □
If n = dim(V), then

dim(S(V)) = (n+1 choose 2) = n(n+1)/2   and   dim(A(V)) = (n choose 2) = n(n−1)/2.
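In matrix terms the decomposition of Theorem 3.4.4 splits Af into its symmetric and antisymmetric parts; a sketch assuming NumPy (and characteristic 0, so dividing by 2 is allowed):

```python
import numpy as np

rng = np.random.default_rng(2)
Af = rng.standard_normal((4, 4))  # matrix of an arbitrary bilinear form

S = (Af + Af.T) / 2  # symmetric part: the matrix of an element of S(V)
A = (Af - Af.T) / 2  # antisymmetric part: the matrix of an element of A(V)

assert np.allclose(S, S.T)
assert np.allclose(A, -A.T)
assert np.allclose(S + A, Af)

# The dimensions n(n+1)/2 and n(n-1)/2 add up to dim ML(V, V; F) = n^2.
n = Af.shape[0]
assert n * (n + 1) // 2 + n * (n - 1) // 2 == n * n
```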