
A Summary of

MATH1231 Mathematics 1B

ALGEBRA NOTES

For MATH1231 students, this summary is an extract from the Algebra Notes in the MATH1231/41
Course Pack, intended for revision.
If you find any mistake or typo, please send an email to chi.mak@unsw.edu.au


Chapter 6

VECTOR SPACES

6.1 Definitions and examples of vector spaces


Definition 1. A vector space V over the field F is a non-empty set V of vectors
on which addition of vectors is defined and multiplication by a scalar is defined in
such a way that the following ten fundamental properties are satisfied:
1. Closure under Addition. If u, v ∈ V, then u + v ∈ V.
2. Associative Law of Addition. If u, v, w ∈ V, then (u + v) + w = u + (v + w).
3. Commutative Law of Addition. If u, v ∈ V, then u + v = v + u.
4. Existence of Zero. There exists an element 0 ∈ V such that, for all v ∈ V,
v + 0 = v.
5. Existence of Negative. For each v ∈ V there exists an element w ∈ V
(usually written as −v), such that v + w = 0.
6. Closure under Multiplication by a Scalar. If v ∈ V and λ ∈ F, then
λv ∈ V.
7. Associative Law of Multiplication by a Scalar. If λ, μ ∈ F and v ∈ V,
then λ(μv) = (λμ)v.
8. If v ∈ V and 1 ∈ F is the scalar one, then 1v = v.
9. Scalar Distributive Law. If λ, μ ∈ F and v ∈ V, then (λ + μ)v = λv + μv.
10. Vector Distributive Law. If λ ∈ F and u, v ∈ V, then λ(u + v) = λu + λv.

The following are vector spaces.
Rⁿ over R, where n is a positive integer.
Cⁿ over C, where n is a positive integer.
P(F), Pₙ(F) over F, where F is a field. Usually F is either Q, R or C.
Mₘₙ(F) over F, where m, n are positive integers and F is a field.

Furthermore, the following set, its subset of all continuous functions and its subset of all
differentiable functions are vector spaces over R.
R[X], the set of all possible real-valued functions with domain X.

6.2 Vector arithmetic

Proposition 1. In any vector space V, the following properties hold for addition.
1. Uniqueness of Zero. There is one and only one zero vector.
2. Cancellation Property. If u, v, w ∈ V satisfy u + v = u + w, then v = w.
3. Uniqueness of Negatives. For all v ∈ V, there exists only one w ∈ V such that v + w = 0.
Proposition 2. Suppose that V is a vector space over a field F, λ ∈ F, v ∈ V, 0 is the zero scalar
in F and 0 is the zero vector in V. Then the following properties hold for multiplication by a scalar:
1. Multiplication by the zero scalar. 0v = 0.
2. Multiplication of the zero vector. λ0 = 0.
3. Multiplication by −1. (−1)v = −v (the additive inverse of v).
4. Zero products. If λv = 0, then either λ = 0 or v = 0.
5. Cancellation Property. If λv = μv and v ≠ 0, then λ = μ.

6.3 Subspaces

Definition 1. A subset S of a vector space V is called a subspace of V if S is
itself a vector space over the same field of scalars as V and under the same rules for
addition and multiplication by scalars.
In addition if there is at least one vector in V which is not contained in S, the
subspace S is called a proper subspace of V .

Theorem 1 (Subspace Theorem). A subset S of a vector space V over a field F, under the same
rules for addition and multiplication by scalars, is a subspace of V if and only if
i) The vector 0 in V also belongs to S.
ii) S is closed under vector addition, and
iii) S is closed under multiplication by scalars from F.


6.4 Linear combinations and spans


Definition 1. Let S = {v₁, . . . , vₙ} be a finite set of vectors in a vector space V
over a field F. Then a linear combination of S is a sum of scalar multiples of the
form
λ₁v₁ + · · · + λₙvₙ   with λ₁, . . . , λₙ ∈ F.

Proposition 1 (Closure under Linear Combinations). If S is a finite set of vectors in a vector
space V, then every linear combination of S is also a vector in V.

Definition 2. Let S = {v₁, . . . , vₙ} be a finite set of vectors in a vector space V
over a field F. Then the span of the set S is the set of all linear combinations of S,
that is,
span(S) = span(v₁, . . . , vₙ) = {v ∈ V : v = λ₁v₁ + · · · + λₙvₙ for some λ₁, . . . , λₙ ∈ F}.

Theorem 2 (A span is a subspace). If S is a finite, non-empty set of vectors in a vector space V,
then span(S) is a subspace of V. Further, span(S) is the smallest subspace containing S (in the
sense that span(S) is a subspace of every subspace which contains S).

Definition 3. A finite set S of vectors in a vector space V is called a spanning
set for V if span(S) = V or, equivalently, if every vector in V can be expressed as
a linear combination of vectors in S.

6.4.1 Matrices and spans in Rᵐ

Proposition 3 (Matrices, Linear Combinations and Spans). If S = {v₁, . . . , vₙ} is a set of
vectors in Rᵐ and A is the m × n matrix whose columns are the vectors v₁, . . . , vₙ, then
a) a vector b in Rᵐ can be expressed as a linear combination of S if and only if it can be
expressed in the form Ax for some x in Rⁿ,
b) a vector b in Rᵐ belongs to span(S) if and only if the equation Ax = b has a solution x in
Rⁿ.
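
As an illustration (not from the notes), the following Python sketch uses NumPy to test whether a vector b lies in the span of a set of vectors in Rᵐ by checking whether Ax = b is consistent; the vectors chosen here are arbitrary examples.

    import numpy as np

    # Columns of A are the vectors of S (arbitrary example vectors).
    A = np.column_stack([[1, 0, 2], [0, 1, 1]])    # S = {(1,0,2), (0,1,1)} in R^3
    b = np.array([2, 3, 7])                        # candidate vector

    # b is in span(S) exactly when Ax = b is consistent, i.e. when
    # rank(A) equals the rank of the augmented matrix [A | b].
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))
    print("b in span(S):", rank_A == rank_Ab)

    # If consistent, least squares returns one x with Ax = b.
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    print("coefficients:", x)                      # here b = 2*v1 + 3*v2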

Definition 4. The subspace of Rᵐ spanned by the columns of an m × n matrix A
is called the column space of A and is denoted by col(A).

6.4.2 Solving problems about spans

6.5 Linear independence

Definition 1. Suppose that S = {v₁, . . . , vₙ} is a subset of a vector space. The
set S is a linearly independent set if the only values of the scalars λ₁, λ₂, . . . , λₙ
for which
λ₁v₁ + · · · + λₙvₙ = 0
are λ₁ = λ₂ = · · · = λₙ = 0.

Definition 2. Suppose that S = {v₁, . . . , vₙ} is a subset of a vector space. The
set S = {v₁, . . . , vₙ} is a linearly dependent set if it is not a linearly independent
set, that is, if there exist scalars λ₁, . . . , λₙ, not all zero, such that
λ₁v₁ + · · · + λₙvₙ = 0.

6.5.1 Solving problems about linear independence

We have seen that questions about spans in Rm can be answered by relating them to questions
about the existence of solutions for systems of linear equations. The same is true for questions
about linear dependence in Rm .
Proposition 1. If S = {a₁, . . . , aₙ} is a set of vectors in Rᵐ and A is the m × n matrix whose
columns are the vectors a₁, . . . , aₙ, then the set S is linearly dependent if and only if the system
Ax = 0 has at least one non-zero solution x ∈ Rⁿ.
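
As a small illustration (not part of the notes), this Python sketch checks linear dependence of a set of column vectors in Rᵐ by comparing the rank of A with the number of columns; any nullity greater than zero means Ax = 0 has a non-zero solution. The example vectors are arbitrary.

    import numpy as np

    # Arbitrary example: three vectors in R^3; the third is v1 + v2.
    vectors = [np.array([1, 0, 2]), np.array([0, 1, 1]), np.array([1, 1, 3])]
    A = np.column_stack(vectors)

    rank = np.linalg.matrix_rank(A)
    n_cols = A.shape[1]
    print("rank =", rank, "columns =", n_cols)
    print("linearly dependent:", rank < n_cols)   # True here, since nullity = 1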

6.5.2 Uniqueness and linear independence

Theorem 2 (Uniqueness of Linear Combinations). Let S be a finite, non-empty set of vectors
in a vector space and let v be a vector which can be written as a linear combination of S. Then
the values of the scalars in the linear combination for v are unique if and only if S is a linearly
independent set.

6.5.3 Spans and linear independence

Theorem 3. A set of vectors S is a linearly independent set if and only if no vector in S can be
written as a linear combination of the other vectors in S, that is, if and only if no vector in S is
in the span of the other vectors in S.
Note. The theorem is equivalent to:
A set of vectors S is a linearly dependent set if and only if at least one vector in S is in the
span of the other vectors in S.
Theorem 4. If S is a finite subset of a vector space V and the vector v is in V, then
span(S ∪ {v}) = span(S) if and only if v ∈ span(S).
Theorem 5. Suppose that S is a finite subset of a vector space. The span of every proper subset
of S is a proper subspace of span(S) if and only if S is a linearly independent set.
Theorem 6. If S is a finite linearly independent subset of a vector space V and v is in V but not
in span(S), then S ∪ {v} is a linearly independent set.


6.6 Basis and dimension

6.6.1 Bases

Definition 1. A set of vectors B in a vector space V is called a basis for V if:
1. B is a linearly independent set, and
2. B is a spanning set for V (that is, span (B) = V ).

Let B = {v₁, . . . , vₙ} be a basis for a vector space V over F. Every vector v ∈ V can be
uniquely written as
v = λ₁v₁ + · · · + λₙvₙ,   where λ₁, . . . , λₙ ∈ F.

An orthonormal basis is a basis whose elements all have length 1 and are mutually orthogonal.
The advantage of using an orthonormal basis is that any vector can easily be written as the unique
linear combination of the basis vectors by taking dot products.
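
For instance (an illustrative sketch, not from the notes), with an orthonormal basis {q₁, . . . , qₙ} of Rⁿ the coordinate of v along qᵢ is simply the dot product v · qᵢ; the basis below is an arbitrary rotated copy of the standard basis of R².

    import numpy as np

    # An orthonormal basis of R^2 (standard basis rotated by an arbitrary angle).
    theta = np.pi / 6
    q1 = np.array([np.cos(theta), np.sin(theta)])
    q2 = np.array([-np.sin(theta), np.cos(theta)])

    v = np.array([3.0, -1.0])
    c1, c2 = v @ q1, v @ q2                    # coordinates via dot products
    print(np.allclose(c1 * q1 + c2 * q2, v))   # True: v = c1*q1 + c2*q2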

6.6.2 Dimension

Theorem 1. The number of vectors in any spanning set for a vector space V is always greater
than or equal to the number of vectors in any linearly independent set in V .
Theorem 2. If a vector space V has a finite basis then every set of basis vectors for V contains
the same number of vectors, that is, if B₁ = {u₁, . . . , uₘ} and B₂ = {v₁, . . . , vₙ} are two bases for
the same vector space V then m = n.

Definition 2. If V is a vector space with a finite basis, the dimension of V,
denoted by dim(V), is the number of vectors in any basis for V. V is called a finite
dimensional vector space.

Theorem 3. Suppose that V is a finite dimensional vector space.
1. The number of vectors in any spanning set for V is greater than or equal to the dimension of V.
2. The number of vectors in any linearly independent set in V is less than or equal to the dimension of V.
3. If the number of vectors in a spanning set is equal to the dimension then the set is also a
linearly independent set and hence a basis for V.
4. If the number of vectors in a linearly independent set is equal to the dimension then the set
is also a spanning set and hence a basis for V.


6.6.3 Existence and construction of bases

Theorem 4. If S is a finite non-empty subset of a vector space then S contains a subset which is
a basis for span (S).
In particular, if V is any non-zero vector space which can be spanned by a finite set of vectors
then V has a basis.
Theorem 5. Suppose that V is a vector space which can be spanned by a finite set of vectors. If S
is a linearly independent subset of V then there exists a basis for V which contains S as a subset.
In other words, every linearly independent subset of V can be extended to a basis for V .
Theorem 6 (Reducing a spanning set to a basis in Rᵐ). Suppose that S = {v₁, . . . , vₙ} is any
subset of Rᵐ and A is the matrix whose columns are the members of S. If U is a row-echelon form
for A and S′ is created from S by deleting those vectors which correspond to non-leading columns
in U, then S′ is a basis for span(S).
Theorem 7 (Extending a linearly independent set to a basis in Rᵐ).
Suppose that S = {v₁, . . . , vₙ} is a linearly independent subset of Rᵐ and A is the matrix whose
columns are the members of S followed by the members of the standard basis for Rᵐ. If U is a
row-echelon form for A and S′ is created by choosing those columns of A which correspond to leading
columns in U, then S′ is a basis for Rᵐ containing S as a subset.
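
A sketch of how Theorem 6 might be carried out computationally (illustrative only; SymPy's rref is used here as a stand-in for hand row-reduction, and the vectors are arbitrary examples).

    import sympy as sp

    # Arbitrary example vectors in R^3; v3 = v1 + v2, so S is linearly dependent.
    S = [sp.Matrix([1, 0, 2]), sp.Matrix([0, 1, 1]), sp.Matrix([1, 1, 3])]
    A = sp.Matrix.hstack(*S)

    # rref returns the reduced row-echelon form and the indices of leading columns.
    _, leading_cols = A.rref()
    basis = [S[j] for j in leading_cols]        # keep the vectors at leading columns
    print("basis for span(S):", [list(v) for v in basis])   # the first two vectors
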
Proposition 8. If V is a finite-dimensional space and W is a subspace of V and dim(W ) = dim(V )
then W = V .
Proof. By Theorem 4, there exists a basis B for W. So B is a linearly independent set in V. By
Theorem 3 part 4, B is also a basis for V and V = span(B) = W.

6.9 A brief review of set and function notation

6.9.1 Set notation

A set is any collection of elements. Sets are usually defined either by listing their elements or by
giving a rule for selection of the elements. The elements of a set are usually enclosed in braces {}.
Definition 1. Two sets A and B are equal (notation A = B) if every element of
A is an element of B, and if every element of B is an element of A.
To prove that A = B it is necessary to prove that the two conditions:
1. if x ∈ A then x ∈ B, and
2. if x ∈ B then x ∈ A
are both satisfied.
Definition 2. A set A is a subset of another set B (notation A ⊆ B) if every
element of A is also an element of B.


To prove that A ⊆ B it is necessary to prove that the condition:
if x ∈ A then x ∈ B
is satisfied.

Definition 3. A is said to be a proper subset of B if A is a subset of B and at
least one element of B is not an element of A.
To prove that A is a proper subset of B it is necessary to prove that the two conditions:
1. if x ∈ A then x ∈ B, and
2. for some x ∈ B, x is not an element of A
are both satisfied.
Definition 4. The intersection of two sets A and B (notation: A ∩ B) is the set
of elements which are common to both sets. That is,
A ∩ B = {x : x ∈ A and x ∈ B}.

Definition 5. The union of two sets A and B (notation: A ∪ B) is the set of all
elements which are in either or both sets. That is,
A ∪ B = {x : x ∈ A or x ∈ B}.

6.9.2 Function notation

The notation f : X → Y (which is read as "f is a function (or map) from the set X to the set Y")
means that f is a rule which associates exactly one element y ∈ Y to each element x ∈ X. The y
associated with x is written as y = f(x) and is called the value of the function f at x or the
image of x under f. The set X is often called the domain of the function f and the set Y is often
called the codomain of the function f.
Equality of Functions. Two functions f : X → Y and g : X → Y are defined to be equal if and
only if f(x) = g(x) for all x ∈ X.
Addition of Functions. If f : X → Y and g : X → Y, and if elements of Y can be added, then
the sum function f + g is defined by
(f + g)(x) = f(x) + g(x) for all x ∈ X.
Multiplication by a Scalar. If f : X → Y and λ ∈ F, where F is a field, and if elements of Y
can be multiplied by elements of F, then the function λf is defined by
(λf)(x) = λf(x) for all x ∈ X.


Multiplication of Functions. If f : X → Y and g : X → Y, and if elements of Y can be
multiplied, then the product function fg is defined by
(fg)(x) = f(x)g(x) for all x ∈ X.
Composition of Functions. If g : X → W and f : W → Y, then the composition function
f ∘ g : X → Y is defined by
(f ∘ g)(x) = f(g(x)) for all x ∈ X.

Chapter 7

LINEAR TRANSFORMATIONS

7.1 Introduction to linear maps


Definition 1. Let V and W be two vector spaces over the same field F. A function
T : V → W is called a linear map or linear transformation if the following two
conditions are satisfied.
Addition Condition. T(v + v′) = T(v) + T(v′) for all v, v′ ∈ V, and
Scalar Multiplication Condition. T(λv) = λT(v) for all λ ∈ F and v ∈ V.

Proposition 1. If T : V → W is a linear map, then
1. T(0) = 0, and
2. T(−v) = −T(v) for all v ∈ V.
Theorem 2. A function T : V → W is a linear map if and only if for all λ₁, λ₂ ∈ F and v₁, v₂ ∈ V,
T(λ₁v₁ + λ₂v₂) = λ₁T(v₁) + λ₂T(v₂).          (#)
Theorem 3. If T is a linear map with domain V and S is a set of vectors in V, then the function
value of a linear combination of S is equal to the corresponding linear combination of the function
values of S, that is, if S = {v₁, . . . , vₙ} and λ₁, . . . , λₙ are scalars, then
T(λ₁v₁ + · · · + λₙvₙ) = λ₁T(v₁) + · · · + λₙT(vₙ).
Theorem 4. For a linear map T : V → W, the function values for every vector in the domain are
known if and only if the function values for a basis of the domain are known.
Further, if B = {v₁, . . . , vₙ} is a basis for the domain V then for all v ∈ V we have
T(v) = x₁T(v₁) + · · · + xₙT(vₙ),
where x₁, . . . , xₙ are the scalars in the unique linear combination v = x₁v₁ + · · · + xₙvₙ of the basis
B.

7.2 Linear maps from Rⁿ to Rᵐ and m × n matrices

Theorem 1. For each m × n matrix A, the function T_A : Rⁿ → Rᵐ, defined by
T_A(x) = Ax   for x ∈ Rⁿ,
is a linear map.
Theorem 2 (Matrix Representation Theorem). Let T : Rⁿ → Rᵐ be a linear map and let the
vectors eⱼ for 1 ≤ j ≤ n be the standard basis vectors for Rⁿ. Then the m × n matrix A whose
columns are given by
aⱼ = T(eⱼ) for 1 ≤ j ≤ n
has the property that
T(x) = Ax for all x ∈ Rⁿ.
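
An illustrative sketch (not from the notes) of the Matrix Representation Theorem: if T can be evaluated on the standard basis vectors, stacking the images as columns gives a matrix A with T(x) = Ax. The map T below is an arbitrary example from R² to R³.

    import numpy as np

    def T(x):
        # Arbitrary example linear map R^2 -> R^3.
        return np.array([x[0] + 2 * x[1], 3 * x[1], x[0] - x[1]])

    n = 2
    E = np.eye(n)
    A = np.column_stack([T(E[:, j]) for j in range(n)])   # columns a_j = T(e_j)

    x = np.array([5.0, -2.0])
    print(np.allclose(A @ x, T(x)))   # True: T(x) = Ax for this x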

7.3 Geometric examples of linear transformations

Proposition 1. Suppose that T : Rⁿ → Rᵐ is a linear map. It maps a line in Rⁿ to either a
line or a point in Rᵐ.

7.4 Subspaces associated with linear maps

7.4.1 The kernel of a map


Definition 1. Let T : V → W be a linear map. Then the kernel of T (written
ker(T)) is the set of all zeroes of T, that is, it is the subset of the domain V defined
by
ker(T) = {v ∈ V : T(v) = 0}.

Definition 2. For an m × n matrix A, the kernel of A is the subset of Rⁿ defined
by
ker(A) = {x ∈ Rⁿ : Ax = 0},
that is, it is the set of all solutions of the homogeneous equation Ax = 0.
Theorem 1. If T : V → W is a linear map, then ker(T) is a subspace of the domain V.
Definition 3. The nullity of a linear map T is the dimension of ker(T). The
nullity of a matrix A is the dimension of ker(A).
Proposition 2. Let A be an m × n matrix with real entries and T_A : Rⁿ → Rᵐ the associated
linear transformation. Then
ker(T_A) = ker(A).


Proposition 3. For a matrix A:
nullity(A) = maximum number of independent vectors in the solution space of Ax = 0
           = number of parameters in the solution of Ax = 0 obtained by Gaussian
             elimination and back substitution
           = number of non-leading columns in an equivalent row-echelon form U for A.

Proposition 4. The columns of a matrix A are linearly independent if and only if nullity(A) = 0.

7.4.2 Image

Definition 4. Let T : V → W be a linear map. Then the image of T is the set
of all function values of T, that is, it is the subset of the codomain W defined by
im(T) = {w ∈ W : w = T(v) for some v ∈ V}.

Definition 5. The image of an m × n matrix A is the subset of Rᵐ defined by
im(A) = {b ∈ Rᵐ : b = Ax for some x ∈ Rⁿ}.
Theorem 5. Let T : V → W be a linear map between vector spaces V and W. Then im(T) is a
subspace of the codomain W of T.
Proposition 6. Let A be an m × n matrix with real entries and T_A : Rⁿ → Rᵐ the associated
linear transformation. Then
im(A) = im(T_A).

Definition 6. The rank of a linear map T is the dimension of im(T). The rank
of a matrix A is the dimension of im(A).
Proposition 7. For a matrix A:
rank(A) = maximal number of linearly independent columns of A
        = number of leading columns in a row-echelon form U for A.

7.4.3 Rank, nullity and solutions of Ax = b

Theorem 8 (Rank-Nullity Theorem for Matrices). For any matrix A,
rank(A) + nullity(A) = number of columns of A.
Theorem 9 (Rank-Nullity Theorem). Suppose V and W are finite dimensional vector spaces and
T : V → W is linear. Then
rank(T) + nullity(T) = dim(V).
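
As an illustration (not part of the notes), the Rank-Nullity Theorem for Matrices can be verified numerically; the matrix below is an arbitrary example with a dependent row and dependent columns.

    import sympy as sp

    # Arbitrary 3x4 example matrix.
    A = sp.Matrix([[1, 0, 1, 2],
                   [0, 1, 1, 3],
                   [1, 1, 2, 5]])

    rank = A.rank()
    nullity = len(A.nullspace())        # dimension of ker(A)
    print(rank, nullity, A.cols)        # rank + nullity = number of columns (4)
    assert rank + nullity == A.cols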


Theorem 10. The equation Ax = b has:
1. no solution if rank(A) ≠ rank([A|b]), and
2. at least one solution if rank(A) = rank([A|b]). Further,
i) if nullity(A) = 0 the solution is unique, whereas,
ii) if nullity(A) = ν > 0, then the general solution is of the form
x = x_p + λ₁k₁ + · · · + λ_ν k_ν   for λ₁, . . . , λ_ν ∈ R,
where x_p is any solution of Ax = b, and where {k₁, . . . , k_ν} is a basis for ker(A).

7.5 Further applications and examples of linear maps

7.10 One-to-one, onto and inverses for functions

Definition 1. The range or image of a function is the set of all function values,
that is, for a function f : X → Y,
im(f) = {y ∈ Y : y = f(x) for some x ∈ X}.

Definition 2. A function is said to be onto (or surjective) if the codomain is
equal to the image of the function, that is, a function f : X → Y is onto if for all
y ∈ Y there exists an x ∈ X such that y = f(x).

Definition 3. A function is said to be one-to-one (or injective) if no point in
the codomain is the function value of more than one point in the domain, that is, a
function f : X → Y is one-to-one if f(x₁) = f(x₂) if and only if x₁ = x₂.

Definition 4. Let f : X → Y be a function. Then a function g : Y → X is called
an inverse of f if it satisfies the two conditions:
a) g ∘ f = id_X, where id_X is the identity function on X with the property that
id_X(x) = x for all x ∈ X.
b) f ∘ g = id_Y, where id_Y is the identity function on Y with the property that
id_Y(y) = y for all y ∈ Y.
Theorem 1. A function has an inverse if and only if the function is both one-to-one and onto.

Chapter 8

EIGENVALUES AND EIGENVECTORS

8.1 Definitions and examples


Definition 1. Let T : V → V be a linear map. Then if a scalar λ and non-zero
vector v ∈ V satisfy
T(v) = λv,
then λ is called an eigenvalue of T and v is called an eigenvector of T for the
eigenvalue λ.

Note. An eigenvector is non-zero, but zero can be an eigenvalue.

Definition 2. Let A ∈ Mₙₙ(C) be a square matrix. Then if a scalar λ ∈ C and
non-zero vector v ∈ Cⁿ satisfy
Av = λv,
then λ is called an eigenvalue of A and v is called an eigenvector of A for the
eigenvalue λ.

8.1.1 Some fundamental results

Theorem 1. A scalar λ is an eigenvalue of a square matrix A if and only if det(A − λI) = 0, and
then v is an eigenvector of A for the eigenvalue λ if and only if v is a non-zero solution of the
homogeneous equation (A − λI)v = 0, i.e., if and only if v ∈ ker(A − λI) and v ≠ 0.
Theorem 2. If A is an n × n matrix and λ ∈ C, then det(A − λI) is a complex polynomial of
degree n in λ.

Definition 3. For a square matrix A, the polynomial p(λ) = det(A − λI) is called
the characteristic polynomial for the matrix A.

Theorem 3. An n × n matrix A has exactly n eigenvalues in C (counted according to their
multiplicities). These eigenvalues are the zeroes of the characteristic polynomial p(λ) = det(A − λI).
Note.
1. The equation p(λ) = 0 is called the characteristic equation for A.
2. Theorem 3 is of fundamental theoretical importance, as it proves the existence of eigenvalues
of a matrix. However, with the exception of 2 × 2 and specially constructed larger matrices,
modern methods of finding eigenvalues of matrices do not make use of this theorem. These
efficient modern methods are currently available in standard matrix software packages such
as MAPLE and MATLAB.
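
For instance (an illustrative sketch, not from the notes), NumPy computes eigenvalues and eigenvectors of a small matrix directly, and the results can be checked against the definition Av = λv; the 2 × 2 matrix is an arbitrary example.

    import numpy as np

    A = np.array([[4.0, 1.0],
                  [2.0, 3.0]])           # arbitrary 2x2 example

    eigvals, eigvecs = np.linalg.eig(A)  # columns of eigvecs are eigenvectors
    for lam, v in zip(eigvals, eigvecs.T):
        print(lam, np.allclose(A @ v, lam * v))   # each prints True

    # Cross-check with the characteristic polynomial det(A - lambda*I) = 0,
    # which for a 2x2 matrix is lambda^2 - trace(A)*lambda + det(A).
    print(np.roots([1, -np.trace(A), np.linalg.det(A)]))   # same eigenvalues: 5 and 2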

8.1.2 Calculation of eigenvalues and eigenvectors

8.2 Eigenvectors, bases, and diagonalisation

Theorem 1. If an n × n matrix has n distinct eigenvalues then it has n linearly independent
eigenvectors.
Note. Even if the n × n matrix does not have n distinct eigenvalues, it may have n linearly
independent eigenvectors.
Theorem 2. If an n × n matrix A has n linearly independent eigenvectors, then there exists an
invertible matrix M and a diagonal matrix D such that
M⁻¹AM = D.
Further, the diagonal elements of D are the eigenvalues of A and the columns of M are the
eigenvectors of A, with the jth column of M being the eigenvector corresponding to the jth element of
the diagonal of D.
Conversely, if M⁻¹AM = D with D diagonal then the columns of M are n linearly independent
eigenvectors of A.

Definition 1. A square matrix A is said to be a diagonalisable matrix if there
exists an invertible matrix M and a diagonal matrix D such that M⁻¹AM = D.

8.3 Applications of eigenvalues and eigenvectors

8.3.1 Powers of A

Proposition 1. Let D be the diagonal matrix
D = diag(λ₁, λ₂, . . . , λₙ).
Then, for k ≥ 1,
Dᵏ = diag(λ₁ᵏ, λ₂ᵏ, . . . , λₙᵏ).
Proposition 2. If A is diagonalisable, that is, if there exists an invertible matrix M and a diagonal
matrix D such that M⁻¹AM = D, then
Aᵏ = M Dᵏ M⁻¹   for every integer k ≥ 1.
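
A sketch of how this can be used in practice (illustrative only; the matrix is an arbitrary diagonalisable example):

    import numpy as np

    A = np.array([[4.0, 1.0],
                  [2.0, 3.0]])
    eigvals, M = np.linalg.eig(A)       # M has eigenvectors as columns
    D = np.diag(eigvals)

    k = 5
    Ak = M @ np.diag(eigvals ** k) @ np.linalg.inv(M)      # A^k = M D^k M^{-1}
    print(np.allclose(Ak, np.linalg.matrix_power(A, k)))   # True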

8.3.2 Solution of first-order linear differential equations

Proposition 3. y(t) = v e^(λt) is a solution of
dy/dt = Ay
if and only if λ is an eigenvalue of A and v is an eigenvector for the eigenvalue λ.
Proposition 4. If u₁(t) and u₂(t) are two solutions of the equation
dy/dt = Ay,
then any linear combination of u₁ and u₂ is also a solution.
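
As a hedged illustration (not from the notes), the eigenvalue/eigenvector solution y(t) = v e^(λt) can be checked numerically against a finite-difference approximation of dy/dt = Ay; the matrix and evaluation point are arbitrary.

    import numpy as np

    A = np.array([[4.0, 1.0],
                  [2.0, 3.0]])
    eigvals, eigvecs = np.linalg.eig(A)
    lam, v = eigvals[0], eigvecs[:, 0]       # one eigenvalue/eigenvector pair

    def y(t):
        return v * np.exp(lam * t)           # candidate solution y(t) = v e^(lam t)

    t, h = 0.7, 1e-6
    dydt = (y(t + h) - y(t - h)) / (2 * h)   # central-difference derivative
    print(np.allclose(dydt, A @ y(t), atol=1e-4))   # True: dy/dt = Ay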


Chapter 9

INTRODUCTION TO PROBABILITY AND STATISTICS

9.1 Some Preliminary Set Theory


Definition 1. A set is a collection of objects. These objects are called elements.

Definition 2.
A set A is a subset of a set B (written A ⊆ B) if and only if
each element of A is also an element of B; that is, if x ∈ A, then x ∈ B.
The power set P(A) of A is the set of all subsets of A.
The universal set S is the set that denotes all objects of given interest.
The empty set ∅ (or { }) is the set with no elements.

Definition 3. A set S is countable if its elements can be listed as a sequence.

Definition 4. For all subsets A, B ⊆ S, define the following set operations:
complement of A:          Aᶜ = {x ∈ S : x ∉ A}
intersection of A and B:  A ∩ B = {x ∈ S : x ∈ A and x ∈ B}
union of A and B:         A ∪ B = {x ∈ S : x ∈ A or x ∈ B}
difference:               A − B = {x ∈ S : x ∈ A but x ∉ B} = A ∩ Bᶜ

Definition 5. Sets A and B are disjoint (or mutually exclusive) if and only if
A ∩ B = ∅.

Definition 6. Disjoint subsets A₁, . . . , Aₖ partition a set B if and only if
A₁ ∪ · · · ∪ Aₖ = B.
Lemma 1. If A₁, . . . , Aₙ partition S and B is a subset of S, then A₁ ∩ B, . . . , Aₙ ∩ B partition B.
Definition 7. If A is a set, then |A| is the number of elements in A.

The Inclusion-Exclusion Principle.   |A ∪ B| = |A| + |B| − |A ∩ B|

9.2 Probability

9.2.1 Sample Space and Probability Axioms
Definition 1. A sample space of an experiment is a set of all possible outcomes.

Definition 2. A probability P on a sample space S is any real function on P(S)
that satisfies the following conditions:
(a) 0 ≤ P(A) ≤ 1 for all A ⊆ S;
(b) P(∅) = 0;
(c) P(S) = 1;
(d) If A and B are disjoint, then P(A ∪ B) = P(A) + P(B).
Theorem 1. Let P be a probability on a sample space S, and let A be an event in S.
1. If S is finite (or countable), then P(A) = Σ_{a∈A} P({a}).
2. If S is finite and P({a}) is constant for all outcomes a ∈ S, then P(A) = |A| / |S|.
3. If S is finite (or countable), then Σ_{a∈S} P({a}) = 1.

9.2.2 Rules for Probabilities

Theorem 2. Let A and B be events of a sample space S.
1. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)   (Addition Rule)
2. P(Aᶜ) = 1 − P(A)
3. If A ⊆ B, then P(A) ≤ P(B).

9.2.3 Conditional Probabilities

We now consider what happens if we restrict the sample space from S to some event in S.

Definition 3. The conditional probability of A given B is denoted and defined
by
P(A|B) = P(A ∩ B) / P(B),   provided that P(B) ≠ 0.

Lemma 3. For any fixed event B, the function P (A|B) is a probability on S.

Multiplication Rule.   P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A).

Total Probability Rule. If A₁, . . . , Aₙ partition S and B is an event, then
P(B) = Σᵢ₌₁ⁿ P(B|Aᵢ)P(Aᵢ).

Bayes' Rule. If A₁, . . . , Aₙ partition S and B is an event, then
P(Aⱼ|B) = P(B|Aⱼ)P(Aⱼ) / Σᵢ₌₁ⁿ P(B|Aᵢ)P(Aᵢ).
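
A small numeric illustration (the numbers are made up for this sketch): a test that detects a condition with probability 0.99 but false-alarms with probability 0.05, applied to a population where the condition has probability 0.01.

    # Events A1 = "has condition", A2 = "does not"; B = "test is positive".
    P_A = [0.01, 0.99]            # P(A1), P(A2): a partition of S
    P_B_given_A = [0.99, 0.05]    # P(B|A1), P(B|A2)

    # Total Probability Rule: P(B) = sum_i P(B|Ai) P(Ai)
    P_B = sum(pb * pa for pb, pa in zip(P_B_given_A, P_A))

    # Bayes' Rule: P(A1|B) = P(B|A1) P(A1) / P(B)
    P_A1_given_B = P_B_given_A[0] * P_A[0] / P_B
    print(round(P_B, 4), round(P_A1_given_B, 4))   # about 0.0594 and 0.1667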

9.2.4 Statistical Independence

Definition 4. Events A and B are (statistically) independent if and only if
P(A ∩ B) = P(A)P(B).

Definition 5. Events A₁, . . . , Aₙ are mutually independent if and only if, for
any Aᵢ₁, . . . , Aᵢₖ of these,
P(Aᵢ₁ ∩ · · · ∩ Aᵢₖ) = P(Aᵢ₁) · · · P(Aᵢₖ).
Theorem 4. If events A₁, . . . , Aₙ are mutually independent and Bᵢ is either Aᵢ or Aᵢᶜ for each
i = 1, . . . , n, then B₁, . . . , Bₙ are also mutually independent.
Theorem 5. If events A₁,₁, . . . , A₁,ₙ₁, A₂,₁, . . . , Aₘ,ₙₘ are mutually independent and, for each
i = 1, . . . , m, the event Bᵢ is obtained from Aᵢ,₁, . . . , Aᵢ,ₙᵢ by taking unions, intersections, and
complements, then B₁, . . . , Bₘ are also mutually independent.

9.3 Random Variables

Definition 1. A random variable is a real function defined on a sample space.

Definition 2. For a random variable X on some sample space S, define, for all
subsets A ⊆ S and real numbers r ∈ R,
{X ∈ A} = {s ∈ S : X(s) ∈ A}
{X = r} = {s ∈ S : X(s) = r}
{X ≤ r} = {s ∈ S : X(s) ≤ r}
... and so on.

Definition 3. The cumulative distribution function of a random variable X
is given by
F_X(x) = P(X ≤ x)   for x ∈ R.

9.3.1 Discrete Random Variables


Definition 4. A random variable X is discrete if its image is countable.

Definition 5. The probability distribution of a discrete random variable X is
the function P(X = x) on R. We sometimes write this as pₖ = P(X = xₖ).

9.3.2 The Mean and Variance of a Discrete Random Variable

Definition 6. The expected value (or mean) of a discrete random variable X
with probability distribution pₖ is
E(X) = Σ_{all k} xₖ pₖ.
The expected value E(X) is often denoted by μ or μ_X.


Theorem 1. Let X be a discrete random variable with probability distribution pₖ = P(X = xₖ).
Then for any real function g(x), the expected value of Y = g(X) is
E(Y) = E(g(X)) = Σ_{all k} g(xₖ) pₖ.


Definition 7. The variance of a discrete random variable X is
Var(X) = E((X − E(X))²).
The standard deviation of X is SD(X) = √Var(X).
The standard deviation is often denoted by σ or σ_X, and the variance is often written as σ² or σ_X².

Theorem 2.   Var(X) = E(X²) − (E(X))².

Theorem 3. If a and b are constants, then
E(aX + b) = aE(X) + b,
Var(aX + b) = a² Var(X),
SD(aX + b) = |a| SD(X).
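
A small illustrative sketch (the values are made up) computing the mean, variance and standard deviation of a discrete distribution directly from the definitions above:

    import math

    x = [0, 1, 2, 3]              # values x_k
    p = [0.1, 0.4, 0.3, 0.2]      # probabilities p_k (sum to 1)

    mean = sum(xk * pk for xk, pk in zip(x, p))                 # E(X)
    var = sum((xk - mean) ** 2 * pk for xk, pk in zip(x, p))    # E((X - E(X))^2)
    ex2 = sum(xk ** 2 * pk for xk, pk in zip(x, p))
    print(mean, var, ex2 - mean ** 2)    # the two variance formulas agree
    print(math.sqrt(var))                # SD(X)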

9.4 Special Distributions

A Bernoulli trial is an experiment with two outcomes, often success and failure, or Y(es) and
N(o), or {1, 0}, where P(Y) and P(N) are denoted by p and q = 1 − p, respectively. A Bernoulli
process is an experiment composed of a sequence of identical and mutually independent Bernoulli
trials.

9.4.1 The Binomial Distribution

Definition 1. The Binomial distribution B(n, p) for n ∈ N is the function
B(n, p, k) = C(n, k) pᵏ (1 − p)ⁿ⁻ᵏ   where k = 0, 1, . . . , n,
and C(n, k) = n! / (k!(n − k)!) is the binomial coefficient.

Theorem 1. If X is the random variable that counts the successes of some Bernoulli process with
n trials having success probability p, then X has the binomial distribution B(n, p).
We write X ~ B(n, p) to denote that X is a random variable with this distribution.
Theorem 2. If X is a random variable and X ~ B(n, p), then
E(X) = np ;   Var(X) = npq = np(1 − p).
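
An illustrative check (not from the notes) of the binomial formulas using Python's standard library; n and p are arbitrary example values.

    from math import comb

    n, p = 10, 0.3
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

    print(abs(sum(pmf) - 1) < 1e-12)                       # probabilities sum to 1
    mean = sum(k * pk for k, pk in enumerate(pmf))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    print(round(mean, 6), round(var, 6))                   # np = 3.0, np(1-p) = 2.1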

9.4.2 Geometric Distribution

Definition 2. The Geometric distribution G(p) is the function
G(p, k) = (1 − p)ᵏ⁻¹ p = qᵏ⁻¹ p   where k = 1, 2, . . . .

Theorem 3. Consider an infinite Bernoulli process of trials each of which has success probability p.
If the random variable X is the number of trials conducted until success occurs for the first time,
then X has the geometric distribution G(p).


Theorem 4. If X ~ G(p) and n is a positive integer, then P(X > n) = (1 − p)ⁿ = qⁿ.


Corollary 5. If X ~ G(p), then the cumulative distribution function F is given by
F(x) = P(X ≤ x) = 1 − (1 − p)^⌊x⌋ = 1 − q^⌊x⌋ for x ∈ R.
Note that ⌊x⌋ denotes the largest integer less than or equal to x.
Theorem 6. If X is a random variable and X ~ G(p), then
E(X) = 1/p ;   Var(X) = (1 − p)/p².

9.4.3 Sign Tests

Often, we have a sample of data consisting of independent observations of some quantity of interest,
and it might be of interest to see whether the observed values differ systematically from some fixed
and pre-determined value.
To answer this question, one may use a sign test approach as follows:
1. Count the number of observations that are strictly greater than the target value (+).
2. Count the total number of observations that are either strictly greater (+) or strictly smaller
(−) than the target value.
3. Calculate the tail probability that measures how often one would expect to observe at least as many
increases (+) as were observed, if + and − were equally likely.
In this course, we regard the result as significant if this tail probability is less than 5%.
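
A sketch of the sign-test calculation (illustrative numbers only): with m observations differing from the target and k of them above it, the tail probability is a binomial tail with p = 1/2.

    from math import comb

    k, m = 9, 11      # example: 9 of 11 non-tied observations lie above the target

    # Under equal probability of + and -, the number of +'s is B(m, 1/2);
    # tail probability = P(at least k successes out of m).
    tail = sum(comb(m, j) for j in range(k, m + 1)) / 2 ** m
    print(round(tail, 4), "significant" if tail < 0.05 else "not significant")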

9.5 Continuous random variables

Definition 1. A random variable X is continuous if and only if F_X(x) is continuous.

Strictly speaking, F_X(x) must actually be piecewise differentiable, which means that F_X(x) is
differentiable except for at most countably many points. However, the above definition is good
enough for our present purposes.
Definition 2. The probability density function f(x) of a continuous random
variable X is defined by
f(x) = f_X(x) = d/dx F(x), x ∈ R,
if F(x) is differentiable, and by f(a) = lim_{x→a} d/dx F(x) if F(x) is not differentiable at x = a.
Theorem 1. F(x) = ∫_{−∞}^{x} f(t) dt.

9.5.1 The mean and variance of a continuous random variable


Definition 3. The expected value (or mean) of a continuous random variable X
with probability density function f(x) is defined to be
μ = E(X) = ∫_{−∞}^{∞} x f(x) dx.

Theorem 2. If X is a continuous random variable with density function f(x), and g(x) is a real
function, then the expected value of Y = g(X) is
E(Y) = E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx.

Definition 4. The variance of a continuous random variable X is
Var(X) = E((X − E(X))²) = E(X²) − (E(X))².
The standard deviation of X is σ = SD(X) = √Var(X).
Theorem 3. If a and b are constants, then
E(aX + b) = aE(X) + b,
Var(aX + b) = a² Var(X),
SD(aX + b) = |a| SD(X).
Theorem 4. If E(X) = μ and Var(X) = σ², and Z = (X − μ)/σ, then E(Z) = 0 and Var(Z) = 1.

9.6 Special Continuous Distributions

9.6.1 The Normal Distribution
Definition 1. A continuous random variable X has normal distribution N(μ, σ²)
if it has probability density
φ(x) = (1 / √(2πσ²)) e^(−(x − μ)² / (2σ²)),   where −∞ < x < ∞.
Theorem 1. If X is a continuous random variable and X ~ N(μ, σ²), then
E(X) = μ,   Var(X) = σ²,   and   (X − μ)/σ ~ N(0, 1).

The normal distribution is used, among other things, to approximate the binomial distribution
B(n, p) when n grows large.

Theorem 2. If X ~ N(μ, σ²), then
