MATH1231 Mathematics 1B
ALGEBRA NOTES
For MATH1231 students, this summary is an extract from the Algebra Notes in the MATH1231/41
Course Pack, intended for revision.
If you find any mistake or typo, please send an email to chi.mak@unsw.edu.au.
Chapter 6
VECTOR SPACES
6.1
Furthermore, the following set, its subset of all continuous functions and its subset of all
differentiable functions are all vector spaces over R:
R[X], the set of all possible real-valued functions with domain X.
6.2
Vector arithmetic
Proposition 1. In any vector space V , the following properties hold for addition.
1. Uniqueness of Zero. There is one and only one zero vector.
2. Cancellation Property. If u, v, w ∈ V satisfy u + v = u + w, then v = w.
3. Uniqueness of Negatives. For all v ∈ V , there exists only one w ∈ V such that v + w = 0.
Proposition 2. Suppose that V is a vector space over a field F, λ ∈ F, v ∈ V , 0 is the zero scalar
in F and 0 is the zero vector in V . Then the following properties hold for multiplication by a scalar:
1. Multiplication by the zero scalar. 0v = 0,
2. Multiplication of the zero vector. λ0 = 0.
3. Multiplication by −1. (−1)v = −v (the additive inverse of v).
4. Zero products. If λv = 0, then either λ = 0 or v = 0.
5. Cancellation Property. If λv = μv and v ≠ 0 then λ = μ.
6.3
Subspaces
Definition 1. A subset S of a vector space V is called a subspace of V if S is
itself a vector space over the same field of scalars as V and under the same rules for
addition and multiplication by scalars.
In addition, if there is at least one vector in V which is not contained in S, the
subspace S is called a proper subspace of V .
Theorem 1 (Subspace Theorem). A subset S of a vector space V over a field F, under the same
rules for addition and multiplication by scalars, is a subspace of V if and only if
i) The vector 0 in V also belongs to S.
ii) S is closed under vector addition, and
iii) S is closed under multiplication by scalars from F.
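As a hedged sketch, the three conditions of the Subspace Theorem can be checked mechanically for a sample subset of R^2; the set S = {(x, y) : y = 2x} and the test vectors below are illustrative choices (not from the notes), and spot checks on sample vectors do not replace a proof of closure.

```python
# Sketch of the Subspace Theorem for S = {(x, y) in R^2 : y = 2x}.
# The set S and the sample vectors are illustrative choices.

def in_S(v):
    """Membership test for S: pairs (x, y) with y = 2x."""
    x, y = v
    return y == 2 * x

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def scale(c, v):
    return (c * v[0], c * v[1])

# i) the zero vector of R^2 lies in S
assert in_S((0, 0))

# ii) closed under addition (checked on sample vectors, not a proof)
u, v = (1, 2), (3, 6)
assert in_S(add(u, v))

# iii) closed under multiplication by scalars
assert in_S(scale(-5, u))
```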
6.4
6.4.1
6.4.2
6.5
Linear independence
Definition 1. Suppose that S = {v1 , . . . , vn } is a subset of a vector space. The
set S is a linearly independent set if the only values of the scalars λ1 , λ2 , . . . , λn
for which

λ1 v1 + · · · + λn vn = 0

are λ1 = λ2 = · · · = λn = 0.
6.5.1
We have seen that questions about spans in R^m can be answered by relating them to questions
about the existence of solutions for systems of linear equations. The same is true for questions
about linear dependence in R^m .
Proposition 1. If S = {a1 , . . . , an } is a set of vectors in R^m and A is the m × n matrix whose
columns are the vectors a1 , . . . , an then the set S is linearly dependent if and only if the system
Ax = 0 has at least one non-zero solution x ∈ R^n .
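Proposition 1 suggests a computational test: row-reduce A and count pivot columns; the set is dependent exactly when some column is non-leading. A minimal sketch in Python with exact rational arithmetic (the sample vectors are illustrative):

```python
# Dependence test via Gaussian elimination: vectors a1,...,an in R^m are
# linearly dependent iff elimination on A finds fewer than n pivots.
from fractions import Fraction

def pivot_count(columns):
    """Row-reduce the matrix whose columns are the given vectors;
    return the number of pivots found."""
    m, n = len(columns[0]), len(columns)
    # build the rows of A with exact rational arithmetic
    A = [[Fraction(columns[j][i]) for j in range(n)] for i in range(m)]
    pivots = 0
    for col in range(n):
        # find a row at or below `pivots` with a non-zero entry here
        pivot_row = next((r for r in range(pivots, m) if A[r][col] != 0), None)
        if pivot_row is None:
            continue
        A[pivots], A[pivot_row] = A[pivot_row], A[pivots]
        for r in range(pivots + 1, m):
            factor = A[r][col] / A[pivots][col]
            A[r] = [A[r][k] - factor * A[pivots][k] for k in range(n)]
        pivots += 1
    return pivots

# a3 = a1 + a2, so this set is linearly dependent
a1, a2, a3 = [1, 0, 1], [0, 1, 1], [1, 1, 2]
assert pivot_count([a1, a2, a3]) < 3     # dependent: a non-leading column
assert pivot_count([a1, a2]) == 2        # {a1, a2} is independent
```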
6.5.2
6.5.3
Theorem 3. A set of vectors S is a linearly independent set if and only if no vector in S can be
written as a linear combination of the other vectors in S, that is, if and only if no vector in S is
in the span of the other vectors in S.
Note. The theorem is equivalent to:
A set of vectors S is a linearly dependent set if and only if at least one vector in S is in the
span of the other vectors in S.
Theorem 4. If S is a finite subset of a vector space V and the vector v is in V , then
span (S ∪ {v}) = span (S) if and only if v ∈ span (S).
Theorem 5. Suppose that S is a finite subset of a vector space. The span of every proper subset
of S is a proper subspace of span (S) if and only if S is a linearly independent set.
Theorem 6. If S is a finite linearly independent subset of a vector space V and v is in V but not
in span (S) then S ∪ {v} is a linearly independent set.
6.6
6.6.1
Bases
Definition 1. A set of vectors B in a vector space V is called a basis for V if:
1. B is a linearly independent set, and
2. B is a spanning set for V (that is, span (B) = V ).
Let B = {v1 , . . . , vn } be a basis for a vector space V over F. Every vector v ∈ V can be
uniquely written as

v = λ1 v1 + · · · + λn vn ,

where λ1 , . . . , λn ∈ F.
An orthonormal basis is a basis whose elements all have length 1 and are mutually orthogonal.
The advantage of using an orthonormal basis is that we can easily write any vector as the unique
linear combination of the basis vectors: each coordinate is the dot product of the vector with the
corresponding basis vector.
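This coordinate-by-dot-product computation can be sketched as follows; the orthonormal basis of R^2 below (a rotated standard basis) is an illustrative choice:

```python
# With an orthonormal basis, the coordinates of v are the dot products
# v . b_i.  The basis below is an illustrative rotation of the standard
# basis of R^2.
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

s = 1 / math.sqrt(2)
b1, b2 = (s, s), (s, -s)            # unit length, and b1 . b2 = 0

v = (3.0, 1.0)
c1, c2 = dot(v, b1), dot(v, b2)     # coordinates via dot products

# reconstruct v as c1*b1 + c2*b2 and compare componentwise
w = tuple(c1 * x + c2 * y for x, y in zip(b1, b2))
assert all(abs(wi - vi) < 1e-12 for wi, vi in zip(w, v))
```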
6.6.2
Dimension
Theorem 1. The number of vectors in any spanning set for a vector space V is always greater
than or equal to the number of vectors in any linearly independent set in V .
Theorem 2. If a vector space V has a finite basis then every set of basis vectors for V contains
the same number of vectors, that is, if B1 = {u1 , . . . , um } and B2 = {v1 , . . . , vn } are two bases for
the same vector space V then m = n.
6.6.3
Theorem 4. If S is a finite non-empty subset of a vector space then S contains a subset which is
a basis for span (S).
In particular, if V is any non-zero vector space which can be spanned by a finite set of vectors
then V has a basis.
Theorem 5. Suppose that V is a vector space which can be spanned by a finite set of vectors. If S
is a linearly independent subset of V then there exists a basis for V which contains S as a subset.
In other words, every linearly independent subset of V can be extended to a basis for V .
Theorem 6 (Reducing a spanning set to a basis in R^m ). Suppose that S = {v1 , . . . , vn } is any
subset of R^m and A is the matrix whose columns are the members of S. If U is a row-echelon form
for A and S′ is created from S by deleting those vectors which correspond to non-leading columns
in U then S′ is a basis for span (S).
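A sketch of this procedure: row-reduce the matrix with the vectors of S as columns and keep the vectors matching leading columns. The sample set S below, with one redundant vector, is an illustrative choice.

```python
# Reducing a spanning set to a basis: keep the vectors that correspond
# to leading (pivot) columns of a row-echelon form.
from fractions import Fraction

def leading_columns(columns):
    """Return the indices of the leading columns of the matrix whose
    columns are the given vectors."""
    m, n = len(columns[0]), len(columns)
    A = [[Fraction(columns[j][i]) for j in range(n)] for i in range(m)]
    leads, row = [], 0
    for col in range(n):
        pivot = next((r for r in range(row, m) if A[r][col] != 0), None)
        if pivot is None:
            continue                      # non-leading column: vector dropped
        A[row], A[pivot] = A[pivot], A[row]
        for r in range(row + 1, m):
            f = A[r][col] / A[row][col]
            A[r] = [A[r][k] - f * A[row][k] for k in range(n)]
        leads.append(col)
        row += 1
    return leads

# v3 = v1 + v2 is redundant; the basis for span(S) keeps v1, v2, v4
S = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
keep = leading_columns(S)
basis = [S[j] for j in keep]
assert keep == [0, 1, 3]
```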
Theorem 7 (Extending a linearly independent set to a basis in R^m ).
Suppose that S = {v1 , . . . , vn } is a linearly independent subset of R^m and A is the matrix whose
columns are the members of S followed by the members of the standard basis for R^m . If U is a
row-echelon form for A and S′ is created by choosing those columns of A which correspond to leading
columns in U then S′ is a basis for R^m containing S as a subset.
Proposition 8. If V is a finite-dimensional space and W is a subspace of V and dim(W ) = dim(V )
then W = V .
Proof. By Theorem 4, there exists a basis B for W . Then B is a linearly independent set in V . By
Theorem 3 part 4, B is also a basis for V , so V = span (B) = W .
6.9
6.9.1
A set is any collection of elements. Sets are usually defined either by listing their elements or by
giving a rule for selection of the elements. The elements of a set are usually enclosed in braces {}.
Definition 1. Two sets A and B are equal (notation A = B) if every element of
A is an element of B, and if every element of B is an element of A.
To prove that A = B it is necessary to prove that the two conditions:
1. if x ∈ A then x ∈ B, and
2. if x ∈ B then x ∈ A
are both satisfied.
Definition 2. A set A is a subset of another set B (notation A ⊆ B) if every
element of A is also an element of B.
Definition 5. The union of two sets A and B (notation: A ∪ B) is the set of all
elements which are in either or both sets.
That is,

A ∪ B = {x : x ∈ A or x ∈ B}.

6.9.2
Function notation
The notation f : X → Y (which is read as f is a function (or map) from the set X to the set Y )
means that f is a rule which associates exactly one element y ∈ Y to each element x ∈ X. The y
associated with x is written as y = f (x) and is called the value of the function f at x or the
image of x under f . The set X is often called the domain of the function f and the set Y is often
called the codomain of the function f .
Equality of Functions. Two functions f : X → Y and g : X → Y are defined to be equal if and
only if f (x) = g(x) for all x ∈ X.
Addition of Functions. If f : X → Y and g : X → Y , and if elements of Y can be added, then
the sum function f + g is defined by

(f + g)(x) = f (x) + g(x) for all x ∈ X.
Multiplication by a Scalar. If f : X → Y and λ ∈ F, where F is a field, and if elements of Y
can be multiplied by elements of F, then the function λf is defined by

(λf )(x) = λf (x) for all x ∈ X.
Chapter 7
LINEAR TRANSFORMATIONS
7.1
Theorem 3. If T is a linear map with domain V and S is a set of vectors in V , then the function
value of a linear combination of S is equal to the corresponding linear combination of the function
values of S, that is, if S = {v1 , . . . , vn } and λ1 , . . . , λn are scalars, then

T (λ1 v1 + · · · + λn vn ) = λ1 T (v1 ) + · · · + λn T (vn ).
Theorem 4. For a linear map T : V → W , the function values for every vector in the domain are
known if and only if the function values for a basis of the domain are known.
Further, if B = {v1 , . . . , vn } is a basis for the domain V then for all v V we have
T (v) = x1 T (v1 ) + · · · + xn T (vn ),

where x1 , . . . , xn are the scalars in the unique linear combination v = x1 v1 + · · · + xn vn of the basis
B.
7.2
The function T : R^n → R^m defined by T (x) = Ax, for x ∈ R^n , is a linear map.
Theorem 2 (Matrix Representation Theorem). Let T : R^n → R^m be a linear map and let the
vectors ej for 1 ≤ j ≤ n be the standard basis vectors for R^n . Then the m × n matrix A whose
columns are given by

aj = T (ej ) for 1 ≤ j ≤ n

has the property that

T (x) = Ax for all x ∈ R^n .
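The construction in the Matrix Representation Theorem can be sketched for a sample linear map T(x, y) = (x + y, 2x) (an illustrative choice, not from the notes):

```python
# Build the matrix of a linear map from the images of the standard basis.
# T(x, y) = (x + y, 2x) is an illustrative linear map on R^2.

def T(v):
    x, y = v
    return (x + y, 2 * x)

# the columns of A are T(e1) and T(e2)
e1, e2 = (1, 0), (0, 1)
A = list(zip(T(e1), T(e2)))   # rows of the 2x2 matrix A

def matvec(A, v):
    return tuple(sum(row[j] * v[j] for j in range(len(v))) for row in A)

# check T(x) = Ax on a sample vector
v = (3, -2)
assert matvec(A, v) == T(v)
```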
7.3
7.4
7.4.1
Proposition 4. The columns of a matrix A are linearly independent if and only if nullity(A) = 0.
7.4.2
Image
Definition 4. Let T : V → W be a linear map. Then the image of T is the set
of all function values of T , that is, it is the subset of the codomain W defined by

im(T ) = {w ∈ W : w = T (v) for some v ∈ V }.
Definition 6. The rank of a linear map T is the dimension of im(T ). The rank
of a matrix A is the dimension of im(A).
Proposition 7. For a matrix A:
rank(A)
7.4.3
7.5
7.10
Chapter 8
EIGENVALUES AND
EIGENVECTORS
8.1
8.1.1
8.1.2
8.2
8.3
8.3.1
If D is the diagonal matrix

D = diag(λ1 , λ2 , . . . , λn ),

then for every positive integer k,

D^k = diag(λ1^k , λ2^k , . . . , λn^k ).
Proposition 2. If A is diagonalisable, that is, if there exists an invertible matrix M and a diagonal
matrix D such that M^{-1} AM = D, then

A^k = M D^k M^{-1} for every integer k ≥ 1.

8.3.2
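Proposition 2 can be checked on a small illustrative example: A = [[2, 1], [1, 2]] has eigenvalues 3 and 1 with eigenvectors (1, 1) and (1, −1), so the M and D below form one valid diagonalisation (exact arithmetic via fractions; all matrices are illustrative choices).

```python
# Check A^k = M D^k M^{-1} for A = [[2, 1], [1, 2]], eigenvalues 3 and 1.
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[2, 1], [1, 2]]
M = [[1, 1], [1, -1]]                      # columns are eigenvectors
Minv = [[Fraction(1, 2), Fraction(1, 2)],
        [Fraction(1, 2), Fraction(-1, 2)]]
D = [[3, 0], [0, 1]]

# the diagonalisation M^{-1} A M = D holds
assert matmul(matmul(Minv, A), M) == D

# A^3 computed directly ...
A3_direct = matmul(A, matmul(A, A))
# ... equals M D^3 M^{-1}, where D^3 = diag(3^3, 1^3)
D3 = [[27, 0], [0, 1]]
A3_diag = matmul(M, matmul(D3, Minv))
assert A3_direct == A3_diag == [[14, 13], [13, 14]]
```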
Chapter 9
INTRODUCTION TO
PROBABILITY AND STATISTICS
9.1
Definition 2.
A set A is a subset of a set B (written A ⊆ B) if and only if
each element of A is also an element of B; that is, if x ∈ A, then x ∈ B.
The power set P(A) of A is the set of all subsets of A.
The universal set S is the set that denotes all objects of given interest.
The empty set ∅ (or {}) is the set with no elements.
Definition 4.
complement of A: A^c = {x ∈ S : x ∉ A}
intersection of A and B: A ∩ B = {x ∈ S : x ∈ A and x ∈ B}
union of A and B: A ∪ B = {x ∈ S : x ∈ A or x ∈ B}
difference: A − B = {x ∈ S : x ∈ A but x ∉ B} = A ∩ B^c
Definition 5. Sets A and B are disjoint (or mutually exclusive) if and only if
A ∩ B = ∅ .
|A ∪ B| = |A| + |B| − |A ∩ B|

9.2
Probability

9.2.1
Sample Space and Probability Axioms
Definition 1. A sample space of an experiment is a set of all possible outcomes.
2. If S is finite and P ({a}) is constant for all outcomes a ∈ S, then P (A) = |A| / |S| .

3. If S is finite (or countable), then Σ_{a ∈ S} P ({a}) = 1 .
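The equally-likely-outcomes rule P(A) = |A| / |S| can be sketched with one fair six-sided die (an illustrative experiment, not from the notes):

```python
# P(A) = |A| / |S| for equally likely outcomes: one fair six-sided die.
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}             # sample space
A = {a for a in S if a % 2 == 0}   # event: an even number is rolled

P_A = Fraction(len(A), len(S))
assert P_A == Fraction(1, 2)

# the outcome probabilities sum to 1
assert sum(Fraction(1, len(S)) for _ in S) == 1
```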
9.2.2
Addition Rule

9.2.3
Conditional Probabilities
We now consider what happens if we restrict the sample space from S to some event in S.
Multiplication Rule

If A1 , . . . , An partition S and B is an event, then

P (B) = Σ_{i=1}^{n} P (B|Ai )P (Ai ) .
Bayes Rule

If A1 , . . . , An partition S and B is an event, then

P (Aj |B) = P (B|Aj )P (Aj ) / Σ_{i=1}^{n} P (B|Ai )P (Ai ) .
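The total-probability and Bayes formulas can be sketched with a made-up three-set partition of S (all priors and likelihoods below are illustrative numbers):

```python
# Total probability and Bayes' rule on an illustrative partition A1, A2, A3.
from fractions import Fraction

priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]        # P(Ai)
likelihoods = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 2)]  # P(B|Ai)

# total probability: P(B) = sum_i P(B|Ai) P(Ai)
P_B = sum(l * p for l, p in zip(likelihoods, priors))
assert P_B == Fraction(1, 5)

# Bayes: P(Aj|B) = P(B|Aj) P(Aj) / P(B)
posteriors = [l * p / P_B for l, p in zip(likelihoods, priors)]
assert sum(posteriors) == 1   # the posteriors also sum to 1
```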
9.2.4
Statistical Independence
Definition 4. Events A and B are (statistically) independent if and only if

P (A ∩ B) = P (A)P (B).
9.3
Random Variables
Definition 1. A random variable is a real function defined on a sample space.
Definition 2. For a random variable X on some sample space S, define for all
subsets A ⊆ S and real numbers r ∈ R,

{X ∈ A} = {s ∈ S : X(s) ∈ A}
{X = r} = {s ∈ S : X(s) = r}
{X ≤ r} = {s ∈ S : X(s) ≤ r}
... and so on.
Definition 3. The cumulative distribution function F_X of a random variable X
is given by

F_X (x) = P (X ≤ x) for x ∈ R .

9.3.1
9.3.2
Theorem 2.
9.4
Special Distributions
A Bernoulli trial is an experiment with two outcomes, often success and failure, or Y(es) and
N(o), or {1, 0}, where P (Y ) and P (N ) are denoted by p and q = 1 − p, respectively. A Bernoulli
process is an experiment composed of a sequence of identical and mutually independent Bernoulli
trials.
9.4.1
Theorem 1. If X is the random variable that counts the successes of some Bernoulli process with
n trials having success probability p, then X has the binomial distribution B(n, p).
We write X ∼ B(n, p) to denote that X is a random variable with this distribution.
Theorem 2. If X is a random variable and X ∼ B(n, p), then
E(X) = np ; Var(X) = np(1 − p) = npq .
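E(X) = np can be verified numerically by summing k · P(X = k) over the binomial probability function; the values of n and p below are illustrative.

```python
# Verify that the binomial pmf sums to 1 and that E(X) = np, exactly.
from fractions import Fraction
from math import comb

n, p = 10, Fraction(3, 10)

def pmf(k):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

assert sum(pmf(k) for k in range(n + 1)) == 1          # pmf sums to 1
assert sum(k * pmf(k) for k in range(n + 1)) == n * p  # E(X) = np
```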
9.4.2
Geometric Distribution
Definition 2. The Geometric distribution G(p) is the function

G(p, k) = (1 − p)^{k−1} p = q^{k−1} p where k = 1, 2, . . . .
Theorem 3. Consider an infinite Bernoulli process of trials each of which has success probability p.
If the random variable X is the number of trials conducted until success occurs for the first time,
then X has the geometric distribution G(p).
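As a numerical sanity check (not part of the notes), the standard geometric mean E(X) = 1/p and variance (1 − p)/p^2 can be approximated by truncated sums, here with the illustrative value p = 0.25:

```python
# Approximate E(X) and Var(X) for X ~ G(p) by truncating the infinite sums;
# the tail beyond k = 2000 is negligible for p = 0.25.
p = 0.25
q = 1 - p

mean = sum(k * q**(k - 1) * p for k in range(1, 2000))
assert abs(mean - 1 / p) < 1e-9            # E(X) = 1/p = 4

var = sum((k - 1 / p)**2 * q**(k - 1) * p for k in range(1, 2000))
assert abs(var - q / p**2) < 1e-9          # Var(X) = (1 - p)/p^2 = 12
```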
If X ∼ G(p), then

E(X) = 1/p ; Var(X) = (1 − p)/p^2 .

9.4.3
Sign Tests
Often, we have a sample of data consisting of independent observations of some quantity of interest,
and it might be of interest to see whether the observed values differ systematically from some fixed
and pre-determined value.
To answer this question, one may use a sign test approach as follows:
1. Count the number of observations that are strictly greater than the target value (+).
2. Count the total number of observations that are either strictly greater (+) or strictly smaller
(−) than the target value.
3. Calculate the tail probability that measures how often one would expect to observe as many
increases (+) as were observed, if there were equal probability of + and − .
In this course, we will regard a tail probability of less than 5% as significant.
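The three steps can be sketched for made-up counts (10 of 12 observations above the target value; the numbers are illustrative):

```python
# Sign test sketch: tail probability of seeing at least 10 "+" out of 12
# observations when + and - are equally likely (a fair-coin model).
from math import comb

n_plus, n_total = 10, 12

# P(at least n_plus successes out of n_total) with success probability 1/2
tail = sum(comb(n_total, k) for k in range(n_plus, n_total + 1)) / 2**n_total

assert abs(tail - 79 / 4096) < 1e-15   # (66 + 12 + 1) / 2^12
assert tail < 0.05                     # significant by this course's convention
```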
9.5
Strictly speaking, F_X (x) must actually be piecewise differentiable, which means that F_X (x) is
differentiable except for at most countably many points. However, the above definition is good
enough for our present purposes.
Definition 2. The probability density function f (x) of a continuous random
variable X is defined by

f (x) = f_X (x) = d/dx F (x) , x ∈ R ,

if F (x) is differentiable, and by f (a) = lim_{x→a} d/dx F (x) if F (x) is not differentiable at x = a.
Theorem 1. F (x) = ∫_{−∞}^{x} f (t) dt .
9.5.1
Theorem 2. If X is a continuous random variable with density function f (x), and g(x) is a real
function, then the expected value of Y = g(X) is
E(Y ) = E(g(X)) = ∫_{−∞}^{∞} g(x) f (x) dx .
9.6
9.6.1
If Z = (X − μ)/σ , then E(Z) = 0 and Var(Z) = 1.

Var(X) = σ^2 .

(X − μ)/σ ∼ N (0, 1).
The normal distribution is used, among other things, to approximate the binomial distribution
B(n, p) when n grows large.
Theorem 2. If X ∼ N (μ, σ^2 ), then