
A Summary of

MATH1231 Mathematics 1B

ALGEBRA NOTES

For MATH1231 students, this summary is an extract from the Algebra Notes in the MATH1231/41
Course Pack, intended for revision.
If you find any mistake or typo, please send an email to chi.mak@unsw.edu.au


Chapter 6

VECTOR SPACES

6.1 Definitions and examples of vector spaces


Definition 1. A vector space V over the field F is a non-empty set V of vectors
on which addition of vectors is defined and multiplication by a scalar is defined in
such a way that the following ten fundamental properties are satisfied:
1. Closure under Addition. If u, v ∈ V, then u + v ∈ V.
2. Associative Law of Addition. If u, v, w ∈ V, then (u + v) + w = u + (v + w).
3. Commutative Law of Addition. If u, v ∈ V, then u + v = v + u.
4. Existence of Zero. There exists an element 0 ∈ V such that, for all v ∈ V,
v + 0 = v.
5. Existence of Negative. For each v ∈ V there exists an element w ∈ V
(usually written as −v), such that v + w = 0.
6. Closure under Multiplication by a Scalar. If v ∈ V and λ ∈ F, then
λv ∈ V.
7. Associative Law of Multiplication by a Scalar. If λ, μ ∈ F and v ∈ V,
then λ(μv) = (λμ)v.
8. If v ∈ V and 1 ∈ F is the scalar one, then 1v = v.
9. Scalar Distributive Law. If λ, μ ∈ F and v ∈ V, then (λ + μ)v = λv + μv.
10. Vector Distributive Law. If λ ∈ F and u, v ∈ V, then λ(u + v) = λu + λv.

The following are vector spaces.
Rⁿ over R, where n is a positive integer.
Cⁿ over C, where n is a positive integer.
P(F), Pₙ(F) over F, where F is a field. Usually F is either Q, R or C.
Mₘₙ(F) over F, where m, n are positive integers and F is a field.

Furthermore, the following set, its subset of all continuous functions and its subset of all
differentiable functions are vector spaces over R.
R[X], the set of all possible real-valued functions with domain X.

6.2 Vector arithmetic

Proposition 1. In any vector space V, the following properties hold for addition.
1. Uniqueness of Zero. There is one and only one zero vector.
2. Cancellation Property. If u, v, w ∈ V satisfy u + v = u + w, then v = w.
3. Uniqueness of Negatives. For all v ∈ V, there exists only one w ∈ V such that v + w = 0.
Proposition 2. Suppose that V is a vector space over a field F, λ ∈ F, v ∈ V, 0 is the zero scalar
in F and 0 is the zero vector in V. Then the following properties hold for multiplication by a scalar:
1. Multiplication by the zero scalar. 0v = 0.
2. Multiplication of the zero vector. λ0 = 0.
3. Multiplication by −1. (−1)v = −v (the additive inverse of v).
4. Zero products. If λv = 0, then either λ = 0 or v = 0.
5. Cancellation Property. If λv = μv and v ≠ 0, then λ = μ.

6.3 Subspaces

Definition 1. A subset S of a vector space V is called a subspace of V if S is
itself a vector space over the same field of scalars as V and under the same rules for
addition and multiplication by scalars.
In addition if there is at least one vector in V which is not contained in S, the
subspace S is called a proper subspace of V .

Theorem 1 (Subspace Theorem). A subset S of a vector space V over a field F, under the same
rules for addition and multiplication by scalars, is a subspace of V if and only if
i) The vector 0 in V also belongs to S.
ii) S is closed under vector addition, and
iii) S is closed under multiplication by scalars from F.


6.4 Linear combinations and spans


Definition 1. Let S = {v₁, . . . , vₙ} be a finite set of vectors in a vector space V
over a field F. Then a linear combination of S is a sum of scalar multiples of the
form
λ₁v₁ + · · · + λₙvₙ   with λ₁, . . . , λₙ ∈ F.

Proposition 1 (Closure under Linear Combinations). If S is a finite set of vectors in a vector
space V, then every linear combination of S is also a vector in V.

Definition 2. Let S = {v₁, . . . , vₙ} be a finite set of vectors in a vector space V
over a field F. Then the span of the set S is the set of all linear combinations of S,
that is,
span(S) = span(v₁, . . . , vₙ) = {v ∈ V : v = λ₁v₁ + · · · + λₙvₙ for some λ₁, . . . , λₙ ∈ F}.

Theorem 2 (A span is a subspace). If S is a finite, non-empty set of vectors in a vector space V,
then span(S) is a subspace of V. Further, span(S) is the smallest subspace containing S (in the
sense that span(S) is a subspace of every subspace which contains S).

Definition 3. A finite set S of vectors in a vector space V is called a spanning
set for V if span(S) = V or, equivalently, if every vector in V can be expressed as
a linear combination of vectors in S.

6.4.1 Matrices and spans in Rᵐ

Proposition 3 (Matrices, Linear Combinations and Spans). If S = {v₁, . . . , vₙ} is a set of
vectors in Rᵐ and A is the m × n matrix whose columns are the vectors v₁, . . . , vₙ, then
a) a vector b in Rᵐ can be expressed as a linear combination of S if and only if it can be
expressed in the form Ax for some x in Rⁿ,
b) a vector b in Rᵐ belongs to span(S) if and only if the equation Ax = b has a solution x in
Rⁿ.
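
As an illustration (not from the notes), the following Python sketch uses NumPy to test whether a vector b lies in the span of a set of vectors in Rᵐ by checking whether Ax = b is consistent; the vectors chosen here are arbitrary examples.

    import numpy as np

    # Columns of A are the vectors of S (arbitrary example vectors).
    A = np.column_stack([[1, 0, 2], [0, 1, 1]])    # S = {(1,0,2), (0,1,1)} in R^3
    b = np.array([2, 3, 7])                        # candidate vector

    # b is in span(S) exactly when Ax = b is consistent, i.e. when
    # rank(A) equals the rank of the augmented matrix [A | b].
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))
    print("b in span(S):", rank_A == rank_Ab)

    # If consistent, least squares returns one x with Ax = b.
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    print("coefficients:", x)                      # here b = 2*v1 + 3*v2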

Definition 4. The subspace of Rᵐ spanned by the columns of an m × n matrix A
is called the column space of A and is denoted by col(A).

6.4.2 Solving problems about spans

6.5 Linear independence

Definition 1. Suppose that S = {v₁, . . . , vₙ} is a subset of a vector space. The
set S is a linearly independent set if the only values of the scalars λ₁, λ₂, . . . , λₙ
for which
λ₁v₁ + · · · + λₙvₙ = 0
are λ₁ = λ₂ = · · · = λₙ = 0.

Definition 2. Suppose that S = {v₁, . . . , vₙ} is a subset of a vector space. The
set S = {v₁, . . . , vₙ} is a linearly dependent set if it is not a linearly independent
set, that is, if there exist scalars λ₁, . . . , λₙ, not all zero, such that
λ₁v₁ + · · · + λₙvₙ = 0.

6.5.1 Solving problems about linear independence

We have seen that questions about spans in Rm can be answered by relating them to questions
about the existence of solutions for systems of linear equations. The same is true for questions
about linear dependence in Rm .
Proposition 1. If S = {a₁, . . . , aₙ} is a set of vectors in Rᵐ and A is the m × n matrix whose
columns are the vectors a₁, . . . , aₙ, then the set S is linearly dependent if and only if the system
Ax = 0 has at least one non-zero solution x ∈ Rⁿ.
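
As a small illustration (not part of the notes), this Python sketch checks linear dependence of a set of column vectors in Rᵐ by comparing the rank of A with the number of columns; any nullity greater than zero means Ax = 0 has a non-zero solution. The example vectors are arbitrary.

    import numpy as np

    # Arbitrary example: three vectors in R^3; the third is v1 + v2.
    vectors = [np.array([1, 0, 2]), np.array([0, 1, 1]), np.array([1, 1, 3])]
    A = np.column_stack(vectors)

    rank = np.linalg.matrix_rank(A)
    n_cols = A.shape[1]
    print("rank =", rank, "columns =", n_cols)
    print("linearly dependent:", rank < n_cols)   # True here, since nullity = 1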

6.5.2 Uniqueness and linear independence

Theorem 2 (Uniqueness of Linear Combinations). Let S be a finite, non-empty set of vectors
in a vector space and let v be a vector which can be written as a linear combination of S. Then
the values of the scalars in the linear combination for v are unique if and only if S is a linearly
independent set.

6.5.3 Spans and linear independence

Theorem 3. A set of vectors S is a linearly independent set if and only if no vector in S can be
written as a linear combination of the other vectors in S, that is, if and only if no vector in S is
in the span of the other vectors in S.
Note. The theorem is equivalent to:
A set of vectors S is a linearly dependent set if and only if at least one vector in S is in the
span of the other vectors in S.
Theorem 4. If S is a finite subset of a vector space V and the vector v is in V, then
span(S ∪ {v}) = span(S) if and only if v ∈ span(S).
Theorem 5. Suppose that S is a finite subset of a vector space. The span of every proper subset
of S is a proper subspace of span(S) if and only if S is a linearly independent set.
Theorem 6. If S is a finite linearly independent subset of a vector space V and v is in V but not
in span(S), then S ∪ {v} is a linearly independent set.


6.6 Basis and dimension

6.6.1 Bases

Definition 1. A set of vectors B in a vector space V is called a basis for V if:
1. B is a linearly independent set, and
2. B is a spanning set for V (that is, span (B) = V ).

Let B = {v₁, . . . , vₙ} be a basis for a vector space V over F. Every vector v ∈ V can be
uniquely written as
v = λ₁v₁ + · · · + λₙvₙ,   where λ₁, . . . , λₙ ∈ F.

An orthonormal basis is a basis whose elements all have length 1 and are mutually orthogonal.
The advantage of using an orthonormal basis is that any vector can easily be written as the unique
linear combination of the basis vectors by taking dot products.
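
For instance (an illustrative sketch, not from the notes), with an orthonormal basis {q₁, . . . , qₙ} of Rⁿ the coordinate of v along qᵢ is simply the dot product v · qᵢ; the basis below is an arbitrary rotated copy of the standard basis of R².

    import numpy as np

    # An orthonormal basis of R^2 (standard basis rotated by an arbitrary angle).
    theta = np.pi / 6
    q1 = np.array([np.cos(theta), np.sin(theta)])
    q2 = np.array([-np.sin(theta), np.cos(theta)])

    v = np.array([3.0, -1.0])
    c1, c2 = v @ q1, v @ q2                    # coordinates via dot products
    print(np.allclose(c1 * q1 + c2 * q2, v))   # True: v = c1*q1 + c2*q2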

6.6.2 Dimension

Theorem 1. The number of vectors in any spanning set for a vector space V is always greater
than or equal to the number of vectors in any linearly independent set in V .
Theorem 2. If a vector space V has a finite basis then every set of basis vectors for V contains
the same number of vectors, that is, if B₁ = {u₁, . . . , uₘ} and B₂ = {v₁, . . . , vₙ} are two bases for
the same vector space V then m = n.

Definition 2. If V is a vector space with a finite basis, the dimension of V,
denoted by dim(V), is the number of vectors in any basis for V. V is called a finite
dimensional vector space.

Theorem 3. Suppose that V is a finite dimensional vector space.
1. The number of vectors in any spanning set for V is greater than or equal to the dimension of V.
2. The number of vectors in any linearly independent set in V is less than or equal to the dimension of V.
3. If the number of vectors in a spanning set is equal to the dimension then the set is also a
linearly independent set and hence a basis for V.
4. If the number of vectors in a linearly independent set is equal to the dimension then the set
is also a spanning set and hence a basis for V.


6.6.3 Existence and construction of bases

Theorem 4. If S is a finite non-empty subset of a vector space then S contains a subset which is
a basis for span (S).
In particular, if V is any non-zero vector space which can be spanned by a finite set of vectors
then V has a basis.
Theorem 5. Suppose that V is a vector space which can be spanned by a finite set of vectors. If S
is a linearly independent subset of V then there exists a basis for V which contains S as a subset.
In other words, every linearly independent subset of V can be extended to a basis for V .
Theorem 6 (Reducing a spanning set to a basis in Rᵐ). Suppose that S = {v₁, . . . , vₙ} is any
subset of Rᵐ and A is the matrix whose columns are the members of S. If U is a row-echelon form
for A and S′ is created from S by deleting those vectors which correspond to non-leading columns
in U, then S′ is a basis for span(S).
Theorem 7 (Extending a linearly independent set to a basis in Rᵐ).
Suppose that S = {v₁, . . . , vₙ} is a linearly independent subset of Rᵐ and A is the matrix whose
columns are the members of S followed by the members of the standard basis for Rᵐ. If U is a
row-echelon form for A and S′ is created by choosing those columns of A which correspond to leading
columns in U, then S′ is a basis for Rᵐ containing S as a subset.
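
A sketch of how Theorem 6 might be carried out computationally (illustrative only; SymPy's rref is used here as a stand-in for hand row-reduction, and the vectors are arbitrary examples).

    import sympy as sp

    # Arbitrary example vectors in R^3; v3 = v1 + v2, so S is linearly dependent.
    S = [sp.Matrix([1, 0, 2]), sp.Matrix([0, 1, 1]), sp.Matrix([1, 1, 3])]
    A = sp.Matrix.hstack(*S)

    # rref returns the reduced row-echelon form and the indices of leading columns.
    _, leading_cols = A.rref()
    basis = [S[j] for j in leading_cols]        # keep the vectors at leading columns
    print("basis for span(S):", [list(v) for v in basis])   # the first two vectors
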
Proposition 8. If V is a finite-dimensional space and W is a subspace of V and dim(W ) = dim(V )
then W = V .
Proof. By Theorem 4, there exists a basis B for W. So B is a linearly independent set in V. By
Theorem 3 part 4, B is also a basis for V and V = span(B) = W.

6.9 A brief review of set and function notation

6.9.1 Set notation

A set is any collection of elements. Sets are usually defined either by listing their elements or by
giving a rule for selection of the elements. The elements of a set are usually enclosed in braces {}.
Definition 1. Two sets A and B are equal (notation A = B) if every element of
A is an element of B, and if every element of B is an element of A.
To prove that A = B it is necessary to prove that the two conditions:
1. if x ∈ A then x ∈ B, and
2. if x ∈ B then x ∈ A
are both satisfied.
Definition 2. A set A is a subset of another set B (notation A ⊆ B) if every
element of A is also an element of B.


To prove that A ⊆ B it is necessary to prove that the condition:
if x ∈ A then x ∈ B
is satisfied.

Definition 3. A is said to be a proper subset of B if A is a subset of B and at
least one element of B is not an element of A.
To prove that A is a proper subset of B it is necessary to prove that the two conditions:
1. if x ∈ A then x ∈ B, and
2. for some x ∈ B, x is not an element of A
are both satisfied.
Definition 4. The intersection of two sets A and B (notation: A ∩ B) is the set
of elements which are common to both sets. That is,
A ∩ B = {x : x ∈ A and x ∈ B}.

Definition 5. The union of two sets A and B (notation: A ∪ B) is the set of all
elements which are in either or both sets. That is,
A ∪ B = {x : x ∈ A or x ∈ B}.

6.9.2 Function notation

The notation f : X → Y (which is read as "f is a function (or map) from the set X to the set Y")
means that f is a rule which associates exactly one element y ∈ Y to each element x ∈ X. The y
associated with x is written as y = f(x) and is called the value of the function f at x or the
image of x under f. The set X is often called the domain of the function f and the set Y is often
called the codomain of the function f.
Equality of Functions. Two functions f : X → Y and g : X → Y are defined to be equal if and
only if f(x) = g(x) for all x ∈ X.
Addition of Functions. If f : X → Y and g : X → Y, and if elements of Y can be added, then
the sum function f + g is defined by
(f + g)(x) = f(x) + g(x) for all x ∈ X.
Multiplication by a Scalar. If f : X → Y and λ ∈ F, where F is a field, and if elements of Y
can be multiplied by elements of F, then the function λf is defined by
(λf)(x) = λf(x) for all x ∈ X.


Multiplication of Functions. If f : X → Y and g : X → Y, and if elements of Y can be
multiplied, then the product function fg is defined by
(fg)(x) = f(x)g(x) for all x ∈ X.
Composition of Functions. If g : X → W and f : W → Y, then the composition function
f ∘ g : X → Y is defined by
(f ∘ g)(x) = f(g(x)) for all x ∈ X.

Chapter 7

LINEAR TRANSFORMATIONS

7.1 Introduction to linear maps


Definition 1. Let V and W be two vector spaces over the same field F. A function
T : V → W is called a linear map or linear transformation if the following two
conditions are satisfied.
Addition Condition. T(v + v′) = T(v) + T(v′) for all v, v′ ∈ V, and
Scalar Multiplication Condition. T(λv) = λT(v) for all λ ∈ F and v ∈ V.

Proposition 1. If T : V → W is a linear map, then
1. T(0) = 0, and
2. T(−v) = −T(v) for all v ∈ V.
Theorem 2. A function T : V → W is a linear map if and only if for all λ₁, λ₂ ∈ F and v₁, v₂ ∈ V,
T(λ₁v₁ + λ₂v₂) = λ₁T(v₁) + λ₂T(v₂).          (#)
Theorem 3. If T is a linear map with domain V and S is a set of vectors in V, then the function
value of a linear combination of S is equal to the corresponding linear combination of the function
values of S, that is, if S = {v₁, . . . , vₙ} and λ₁, . . . , λₙ are scalars, then
T(λ₁v₁ + · · · + λₙvₙ) = λ₁T(v₁) + · · · + λₙT(vₙ).
Theorem 4. For a linear map T : V → W, the function values for every vector in the domain are
known if and only if the function values for a basis of the domain are known.
Further, if B = {v₁, . . . , vₙ} is a basis for the domain V then for all v ∈ V we have
T(v) = x₁T(v₁) + · · · + xₙT(vₙ),
where x₁, . . . , xₙ are the scalars in the unique linear combination v = x₁v₁ + · · · + xₙvₙ of the basis
B.

7.2 Linear maps from Rⁿ to Rᵐ and m × n matrices

Theorem 1. For each m × n matrix A, the function T_A : Rⁿ → Rᵐ, defined by
T_A(x) = Ax   for x ∈ Rⁿ,
is a linear map.
Theorem 2 (Matrix Representation Theorem). Let T : Rⁿ → Rᵐ be a linear map and let the
vectors eⱼ for 1 ≤ j ≤ n be the standard basis vectors for Rⁿ. Then the m × n matrix A whose
columns are given by
aⱼ = T(eⱼ) for 1 ≤ j ≤ n
has the property that
T(x) = Ax for all x ∈ Rⁿ.
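
An illustrative sketch (not from the notes) of the Matrix Representation Theorem: if T can be evaluated on the standard basis vectors, stacking the images as columns gives a matrix A with T(x) = Ax. The map T below is an arbitrary example from R² to R³.

    import numpy as np

    def T(x):
        # Arbitrary example linear map R^2 -> R^3.
        return np.array([x[0] + 2 * x[1], 3 * x[1], x[0] - x[1]])

    n = 2
    E = np.eye(n)
    A = np.column_stack([T(E[:, j]) for j in range(n)])   # columns a_j = T(e_j)

    x = np.array([5.0, -2.0])
    print(np.allclose(A @ x, T(x)))   # True: T(x) = Ax for this x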

7.3 Geometric examples of linear transformations

Proposition 1. Suppose that T : Rⁿ → Rᵐ is a linear map. It maps a line in Rⁿ to either a
line or a point in Rᵐ.

7.4 Subspaces associated with linear maps

7.4.1 The kernel of a map


Definition 1. Let T : V → W be a linear map. Then the kernel of T (written
ker(T)) is the set of all zeroes of T, that is, it is the subset of the domain V defined
by
ker(T) = {v ∈ V : T(v) = 0}.

Definition 2. For an m × n matrix A, the kernel of A is the subset of Rⁿ defined
by
ker(A) = {x ∈ Rⁿ : Ax = 0},
that is, it is the set of all solutions of the homogeneous equation Ax = 0.
Theorem 1. If T : V → W is a linear map, then ker(T) is a subspace of the domain V.
Definition 3. The nullity of a linear map T is the dimension of ker(T). The
nullity of a matrix A is the dimension of ker(A).
Proposition 2. Let A be an m × n matrix with real entries and T_A : Rⁿ → Rᵐ the associated
linear transformation. Then
ker(T_A) = ker(A).


Proposition 3. For a matrix A:
nullity(A) = maximum number of independent vectors in the solution space of Ax = 0
           = number of parameters in the solution of Ax = 0 obtained by Gaussian
             elimination and back substitution
           = number of non-leading columns in an equivalent row-echelon form U for A.

Proposition 4. The columns of a matrix A are linearly independent if and only if nullity(A) = 0.

7.4.2 Image

Definition 4. Let T : V → W be a linear map. Then the image of T is the set
of all function values of T, that is, it is the subset of the codomain W defined by
im(T) = {w ∈ W : w = T(v) for some v ∈ V}.

Definition 5. The image of an m × n matrix A is the subset of Rᵐ defined by
im(A) = {b ∈ Rᵐ : b = Ax for some x ∈ Rⁿ}.
Theorem 5. Let T : V → W be a linear map between vector spaces V and W. Then im(T) is a
subspace of the codomain W of T.
Proposition 6. Let A be an m × n matrix with real entries and T_A : Rⁿ → Rᵐ the associated
linear transformation. Then
im(A) = im(T_A).

Definition 6. The rank of a linear map T is the dimension of im(T). The rank
of a matrix A is the dimension of im(A).
Proposition 7. For a matrix A:
rank(A) = maximal number of linearly independent columns of A
        = number of leading columns in a row-echelon form U for A.

7.4.3 Rank, nullity and solutions of Ax = b

Theorem 8 (Rank-Nullity Theorem for Matrices). For any matrix A,
rank(A) + nullity(A) = number of columns of A.
Theorem 9 (Rank-Nullity Theorem). Suppose V and W are finite dimensional vector spaces and
T : V → W is linear. Then
rank(T) + nullity(T) = dim(V).
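
As an illustration (not part of the notes), the Rank-Nullity Theorem for Matrices can be verified numerically; the matrix below is an arbitrary example with a dependent row and dependent columns.

    import sympy as sp

    # Arbitrary 3x4 example matrix.
    A = sp.Matrix([[1, 0, 1, 2],
                   [0, 1, 1, 3],
                   [1, 1, 2, 5]])

    rank = A.rank()
    nullity = len(A.nullspace())        # dimension of ker(A)
    print(rank, nullity, A.cols)        # rank + nullity = number of columns (4)
    assert rank + nullity == A.cols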


Theorem 10. The equation Ax = b has:
1. no solution if rank(A) ≠ rank([A|b]), and
2. at least one solution if rank(A) = rank([A|b]). Further,
i) if nullity(A) = 0 the solution is unique, whereas,
ii) if nullity(A) = ν > 0, then the general solution is of the form
x = x_p + λ₁k₁ + · · · + λ_ν k_ν   for λ₁, . . . , λ_ν ∈ R,
where x_p is any solution of Ax = b, and where {k₁, . . . , k_ν} is a basis for ker(A).

7.5 Further applications and examples of linear maps

7.10 One-to-one, onto and inverses for functions

Definition 1. The range or image of a function is the set of all function values,
that is, for a function f : X → Y,
im(f) = {y ∈ Y : y = f(x) for some x ∈ X}.

Definition 2. A function is said to be onto (or surjective) if the codomain is
equal to the image of the function, that is, a function f : X → Y is onto if for all
y ∈ Y there exists an x ∈ X such that y = f(x).

Definition 3. A function is said to be one-to-one (or injective) if no point in
the codomain is the function value of more than one point in the domain, that is, a
function f : X → Y is one-to-one if f(x₁) = f(x₂) if and only if x₁ = x₂.

Definition 4. Let f : X → Y be a function. Then a function g : Y → X is called
an inverse of f if it satisfies the two conditions:
a) g ∘ f = id_X, where id_X is the identity function on X with the property that
id_X(x) = x for all x ∈ X.
b) f ∘ g = id_Y, where id_Y is the identity function on Y with the property that
id_Y(y) = y for all y ∈ Y.
Theorem 1. A function has an inverse if and only if the function is both one-to-one and onto.

Chapter 8

EIGENVALUES AND EIGENVECTORS

8.1 Definitions and examples


Definition 1. Let T : V → V be a linear map. Then if a scalar λ and non-zero
vector v ∈ V satisfy
T(v) = λv,
then λ is called an eigenvalue of T and v is called an eigenvector of T for the
eigenvalue λ.

Note. An eigenvector is non-zero, but zero can be an eigenvalue.

Definition 2. Let A ∈ Mₙₙ(C) be a square matrix. Then if a scalar λ ∈ C and
non-zero vector v ∈ Cⁿ satisfy
Av = λv,
then λ is called an eigenvalue of A and v is called an eigenvector of A for the
eigenvalue λ.

8.1.1 Some fundamental results

Theorem 1. A scalar λ is an eigenvalue of a square matrix A if and only if det(A − λI) = 0, and
then v is an eigenvector of A for the eigenvalue λ if and only if v is a non-zero solution of the
homogeneous equation (A − λI)v = 0, i.e., if and only if v ∈ ker(A − λI) and v ≠ 0.
Theorem 2. If A is an n × n matrix and λ ∈ C, then det(A − λI) is a complex polynomial of
degree n in λ.

Definition 3. For a square matrix A, the polynomial p(λ) = det(A − λI) is called
the characteristic polynomial for the matrix A.

Theorem 3. An n × n matrix A has exactly n eigenvalues in C (counted according to their
multiplicities). These eigenvalues are the zeroes of the characteristic polynomial p(λ) = det(A − λI).
Note.
1. The equation p(λ) = 0 is called the characteristic equation for A.
2. Theorem 3 is of fundamental theoretical importance, as it proves the existence of eigenvalues
of a matrix. However, with the exception of 2 × 2 and specially constructed larger matrices,
modern methods of finding eigenvalues of matrices do not make use of this theorem. These
efficient modern methods are currently available in standard matrix software packages such
as MAPLE and MATLAB.
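
For instance (an illustrative sketch, not from the notes), NumPy computes eigenvalues and eigenvectors of a small matrix directly, and the results can be checked against the definition Av = λv; the 2 × 2 matrix is an arbitrary example.

    import numpy as np

    A = np.array([[4.0, 1.0],
                  [2.0, 3.0]])           # arbitrary 2x2 example

    eigvals, eigvecs = np.linalg.eig(A)  # columns of eigvecs are eigenvectors
    for lam, v in zip(eigvals, eigvecs.T):
        print(lam, np.allclose(A @ v, lam * v))   # each prints True

    # Cross-check with the characteristic polynomial det(A - lambda*I) = 0,
    # which for a 2x2 matrix is lambda^2 - trace(A)*lambda + det(A).
    print(np.roots([1, -np.trace(A), np.linalg.det(A)]))   # same eigenvalues: 5 and 2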

8.1.2 Calculation of eigenvalues and eigenvectors

8.2 Eigenvectors, bases, and diagonalisation

Theorem 1. If an n × n matrix has n distinct eigenvalues then it has n linearly independent
eigenvectors.
Note. Even if the n × n matrix does not have n distinct eigenvalues, it may have n linearly
independent eigenvectors.
Theorem 2. If an n × n matrix A has n linearly independent eigenvectors, then there exists an
invertible matrix M and a diagonal matrix D such that
M⁻¹AM = D.
Further, the diagonal elements of D are the eigenvalues of A and the columns of M are the
eigenvectors of A, with the jth column of M being the eigenvector corresponding to the jth element of
the diagonal of D.
Conversely, if M⁻¹AM = D with D diagonal then the columns of M are n linearly independent
eigenvectors of A.

Definition 1. A square matrix A is said to be a diagonalisable matrix if there
exists an invertible matrix M and a diagonal matrix D such that M⁻¹AM = D.

8.3 Applications of eigenvalues and eigenvectors

8.3.1 Powers of A

Proposition 1. Let D be the diagonal matrix
D = diag(λ₁, λ₂, . . . , λₙ).
Then, for k ≥ 1,
Dᵏ = diag(λ₁ᵏ, λ₂ᵏ, . . . , λₙᵏ).
Proposition 2. If A is diagonalisable, that is, if there exists an invertible matrix M and a diagonal
matrix D such that M⁻¹AM = D, then
Aᵏ = M Dᵏ M⁻¹   for every integer k ≥ 1.
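
A sketch of how this can be used in practice (illustrative only; the matrix is an arbitrary diagonalisable example):

    import numpy as np

    A = np.array([[4.0, 1.0],
                  [2.0, 3.0]])
    eigvals, M = np.linalg.eig(A)       # M has eigenvectors as columns
    D = np.diag(eigvals)

    k = 5
    Ak = M @ np.diag(eigvals ** k) @ np.linalg.inv(M)      # A^k = M D^k M^{-1}
    print(np.allclose(Ak, np.linalg.matrix_power(A, k)))   # True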

8.3.2 Solution of first-order linear differential equations

Proposition 3. y(t) = v e^(λt) is a solution of
dy/dt = Ay
if and only if λ is an eigenvalue of A and v is an eigenvector for the eigenvalue λ.
Proposition 4. If u₁(t) and u₂(t) are two solutions of the equation
dy/dt = Ay,
then any linear combination of u₁ and u₂ is also a solution.
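
As a hedged illustration (not from the notes), the eigenvalue/eigenvector solution y(t) = v e^(λt) can be checked numerically against a finite-difference approximation of dy/dt = Ay; the matrix and evaluation point are arbitrary.

    import numpy as np

    A = np.array([[4.0, 1.0],
                  [2.0, 3.0]])
    eigvals, eigvecs = np.linalg.eig(A)
    lam, v = eigvals[0], eigvecs[:, 0]       # one eigenvalue/eigenvector pair

    def y(t):
        return v * np.exp(lam * t)           # candidate solution y(t) = v e^(lam t)

    t, h = 0.7, 1e-6
    dydt = (y(t + h) - y(t - h)) / (2 * h)   # central-difference derivative
    print(np.allclose(dydt, A @ y(t), atol=1e-4))   # True: dy/dt = Ay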


Chapter 9

INTRODUCTION TO PROBABILITY AND STATISTICS

9.1 Some Preliminary Set Theory


Definition 1. A set is a collection of objects. These objects are called elements.

Definition 2.
A set A is a subset of a set B (written A ⊆ B) if and only if
each element of A is also an element of B; that is, if x ∈ A, then x ∈ B.
The power set P(A) of A is the set of all subsets of A.
The universal set S is the set that denotes all objects of given interest.
The empty set ∅ (or { }) is the set with no elements.

Definition 3. A set S is countable if its elements can be listed as a sequence.

Definition 4. For all subsets A, B ⊆ S, define the following set operations:
complement of A:          Aᶜ = {x ∈ S : x ∉ A}
intersection of A and B:  A ∩ B = {x ∈ S : x ∈ A and x ∈ B}
union of A and B:         A ∪ B = {x ∈ S : x ∈ A or x ∈ B}
difference:               A − B = {x ∈ S : x ∈ A but x ∉ B} = A ∩ Bᶜ

Definition 5. Sets A and B are disjoint (or mutually exclusive) if and only if
A ∩ B = ∅.

Definition 6. Disjoint subsets A₁, . . . , Aₖ partition a set B if and only if
A₁ ∪ · · · ∪ Aₖ = B.
Lemma 1. If A₁, . . . , Aₙ partition S and B is a subset of S, then A₁ ∩ B, . . . , Aₙ ∩ B partition B.
Definition 7. If A is a set, then |A| is the number of elements in A.

The Inclusion-Exclusion Principle.   |A ∪ B| = |A| + |B| − |A ∩ B|

9.2 Probability

9.2.1 Sample Space and Probability Axioms
Definition 1. A sample space of an experiment is a set of all possible outcomes.

Definition 2. A probability P on a sample space S is any real function on P(S)
that satisfies the following conditions:
(a) 0 ≤ P(A) ≤ 1 for all A ⊆ S;
(b) P(∅) = 0;
(c) P(S) = 1;
(d) If A and B are disjoint, then P(A ∪ B) = P(A) + P(B).
Theorem 1. Let P be a probability on a sample space S, and let A be an event in S.
1. If S is finite (or countable), then P(A) = Σ_{a∈A} P({a}).
2. If S is finite and P({a}) is constant for all outcomes a ∈ S, then P(A) = |A| / |S|.
3. If S is finite (or countable), then Σ_{a∈S} P({a}) = 1.

9.2.2 Rules for Probabilities

Theorem 2. Let A and B be events of a sample space S.
1. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)   (Addition Rule)
2. P(Aᶜ) = 1 − P(A)
3. If A ⊆ B, then P(A) ≤ P(B).

9.2.3 Conditional Probabilities

We now consider what happens if we restrict the sample space from S to some event in S.

Definition 3. The conditional probability of A given B is denoted and defined
by
P(A|B) = P(A ∩ B) / P(B),   provided that P(B) ≠ 0.

Lemma 3. For any fixed event B, the function P (A|B) is a probability on S.

Multiplication Rule.   P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A).

Total Probability Rule. If A₁, . . . , Aₙ partition S and B is an event, then
P(B) = Σᵢ₌₁ⁿ P(B|Aᵢ)P(Aᵢ).

Bayes' Rule. If A₁, . . . , Aₙ partition S and B is an event, then
P(Aⱼ|B) = P(B|Aⱼ)P(Aⱼ) / Σᵢ₌₁ⁿ P(B|Aᵢ)P(Aᵢ).
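
A small numeric illustration (the numbers are made up for this sketch): a test that detects a condition with probability 0.99 but false-alarms with probability 0.05, applied to a population where the condition has probability 0.01.

    # Events A1 = "has condition", A2 = "does not"; B = "test is positive".
    P_A = [0.01, 0.99]            # P(A1), P(A2): a partition of S
    P_B_given_A = [0.99, 0.05]    # P(B|A1), P(B|A2)

    # Total Probability Rule: P(B) = sum_i P(B|Ai) P(Ai)
    P_B = sum(pb * pa for pb, pa in zip(P_B_given_A, P_A))

    # Bayes' Rule: P(A1|B) = P(B|A1) P(A1) / P(B)
    P_A1_given_B = P_B_given_A[0] * P_A[0] / P_B
    print(round(P_B, 4), round(P_A1_given_B, 4))   # about 0.0594 and 0.1667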

9.2.4 Statistical Independence

Definition 4. Events A and B are (statistically) independent if and only if
P(A ∩ B) = P(A)P(B).

Definition 5. Events A₁, . . . , Aₙ are mutually independent if and only if, for
any Aᵢ₁, . . . , Aᵢₖ of these,
P(Aᵢ₁ ∩ · · · ∩ Aᵢₖ) = P(Aᵢ₁) · · · P(Aᵢₖ).
Theorem 4. If events A₁, . . . , Aₙ are mutually independent and Bᵢ is either Aᵢ or Aᵢᶜ for each
i = 1, . . . , n, then B₁, . . . , Bₙ are also mutually independent.
Theorem 5. If events A₁,₁, . . . , A₁,ₙ₁, A₂,₁, . . . , Aₘ,ₙₘ are mutually independent and, for each
i = 1, . . . , m, the event Bᵢ is obtained from Aᵢ,₁, . . . , Aᵢ,ₙᵢ by taking unions, intersections, and
complements, then B₁, . . . , Bₘ are also mutually independent.

9.3 Random Variables

Definition 1. A random variable is a real function defined on a sample space.

Definition 2. For a random variable X on some sample space S, define, for all
subsets A ⊆ S and real numbers r ∈ R,
{X ∈ A} = {s ∈ S : X(s) ∈ A}
{X = r} = {s ∈ S : X(s) = r}
{X ≤ r} = {s ∈ S : X(s) ≤ r}
... and so on.

Definition 3. The cumulative distribution function of a random variable X
is given by
F_X(x) = P(X ≤ x)   for x ∈ R.

9.3.1 Discrete Random Variables


Definition 4. A random variable X is discrete if its image is countable.

Definition 5. The probability distribution of a discrete random variable X is
the function P(X = x) on R. We sometimes write this as pₖ = P(X = xₖ).

9.3.2 The Mean and Variance of a Discrete Random Variable

Definition 6. The expected value (or mean) of a discrete random variable X
with probability distribution pₖ is
E(X) = Σ_{all k} xₖ pₖ.
The expected value E(X) is often denoted by μ or μ_X.


Theorem 1. Let X be a discrete random variable with probability distribution pₖ = P(X = xₖ).
Then for any real function g(x), the expected value of Y = g(X) is
E(Y) = E(g(X)) = Σ_{all k} g(xₖ) pₖ.


Definition 7. The variance of a discrete random variable X is
Var(X) = E((X − E(X))²).
The standard deviation of X is SD(X) = √Var(X).
The standard deviation is often denoted by σ or σ_X, and the variance is often written as σ² or σ_X².

Theorem 2.   Var(X) = E(X²) − (E(X))².

Theorem 3. If a and b are constants, then
E(aX + b) = aE(X) + b,
Var(aX + b) = a² Var(X),
SD(aX + b) = |a| SD(X).
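
A small illustrative sketch (the values are made up) computing the mean, variance and standard deviation of a discrete distribution directly from the definitions above:

    import math

    x = [0, 1, 2, 3]              # values x_k
    p = [0.1, 0.4, 0.3, 0.2]      # probabilities p_k (sum to 1)

    mean = sum(xk * pk for xk, pk in zip(x, p))                 # E(X)
    var = sum((xk - mean) ** 2 * pk for xk, pk in zip(x, p))    # E((X - E(X))^2)
    ex2 = sum(xk ** 2 * pk for xk, pk in zip(x, p))
    print(mean, var, ex2 - mean ** 2)    # the two variance formulas agree
    print(math.sqrt(var))                # SD(X)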

9.4 Special Distributions

A Bernoulli trial is an experiment with two outcomes, often success and failure, or Y(es) and
N(o), or {1, 0}, where P(Y) and P(N) are denoted by p and q = 1 − p, respectively. A Bernoulli
process is an experiment composed of a sequence of identical and mutually independent Bernoulli
trials.

9.4.1 The Binomial Distribution

Definition 1. The Binomial distribution B(n, p) for n ∈ N is the function
B(n, p, k) = C(n, k) pᵏ (1 − p)ⁿ⁻ᵏ   where k = 0, 1, . . . , n,
and C(n, k) = n! / (k!(n − k)!) is the binomial coefficient.

Theorem 1. If X is the random variable that counts the successes of some Bernoulli process with
n trials having success probability p, then X has the binomial distribution B(n, p).
We write X ~ B(n, p) to denote that X is a random variable with this distribution.
Theorem 2. If X is a random variable and X ~ B(n, p), then
E(X) = np ;   Var(X) = npq = np(1 − p).
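
An illustrative check (not from the notes) of the binomial formulas using Python's standard library; n and p are arbitrary example values.

    from math import comb

    n, p = 10, 0.3
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

    print(abs(sum(pmf) - 1) < 1e-12)                       # probabilities sum to 1
    mean = sum(k * pk for k, pk in enumerate(pmf))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    print(round(mean, 6), round(var, 6))                   # np = 3.0, np(1-p) = 2.1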

9.4.2 Geometric Distribution

Definition 2. The Geometric distribution G(p) is the function
G(p, k) = (1 − p)ᵏ⁻¹ p = qᵏ⁻¹ p   where k = 1, 2, . . . .

Theorem 3. Consider an infinite Bernoulli process of trials each of which has success probability p.
If the random variable X is the number of trials conducted until success occurs for the first time,
then X has the geometric distribution G(p).


Theorem 4. If X ~ G(p) and n is a positive integer, then P(X > n) = (1 − p)ⁿ = qⁿ.


Corollary 5. If X ~ G(p), then the cumulative distribution function F is given by
F(x) = P(X ≤ x) = 1 − (1 − p)^⌊x⌋ = 1 − q^⌊x⌋ for x ∈ R.
Note that ⌊x⌋ denotes the largest integer less than or equal to x.
Theorem 6. If X is a random variable and X ~ G(p), then
E(X) = 1/p ;   Var(X) = (1 − p)/p².

9.4.3 Sign Tests

Often, we have a sample of data consisting of independent observations of some quantity of interest,
and it might be of interest to see whether the observed values differ systematically from some fixed
and pre-determined value.
To answer this question, one may use a sign test approach as follows:
1. Count the number of observations that are strictly greater than the target value (+).
2. Count the total number of observations that are either strictly greater (+) or strictly smaller
(−) than the target value.
3. Calculate the tail probability that measures how often one would expect to observe at least as many
increases (+) as were observed, if + and − were equally likely.
In this course, we regard the result as significant if this tail probability is less than 5%.
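
A sketch of the sign-test calculation (illustrative numbers only): with m observations differing from the target and k of them above it, the tail probability is a binomial tail with p = 1/2.

    from math import comb

    k, m = 9, 11      # example: 9 of 11 non-tied observations lie above the target

    # Under equal probability of + and -, the number of +'s is B(m, 1/2);
    # tail probability = P(at least k successes out of m).
    tail = sum(comb(m, j) for j in range(k, m + 1)) / 2 ** m
    print(round(tail, 4), "significant" if tail < 0.05 else "not significant")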

9.5 Continuous random variables

Definition 1. A random variable X is continuous if and only if F_X(x) is continuous.

Strictly speaking, F_X(x) must actually be piecewise differentiable, which means that F_X(x) is
differentiable except for at most countably many points. However, the above definition is good
enough for our present purposes.
Definition 2. The probability density function f(x) of a continuous random
variable X is defined by
f(x) = f_X(x) = d/dx F(x), x ∈ R,
if F(x) is differentiable, and by f(a) = lim_{x→a} d/dx F(x) if F(x) is not differentiable at x = a.
Theorem 1. F(x) = ∫_{−∞}^{x} f(t) dt.

9.5.1 The mean and variance of a continuous random variable


Definition 3. The expected value (or mean) of a continuous random variable X
with probability density function f(x) is defined to be
μ = E(X) = ∫_{−∞}^{∞} x f(x) dx.

Theorem 2. If X is a continuous random variable with density function f(x), and g(x) is a real
function, then the expected value of Y = g(X) is
E(Y) = E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx.

Definition 4. The variance of a continuous random variable X is
Var(X) = E((X − E(X))²) = E(X²) − (E(X))².
The standard deviation of X is σ = SD(X) = √Var(X).
Theorem 3. If a and b are constants, then
E(aX + b) = aE(X) + b,
Var(aX + b) = a² Var(X),
SD(aX + b) = |a| SD(X).
Theorem 4. If E(X) = μ and Var(X) = σ², and Z = (X − μ)/σ, then E(Z) = 0 and Var(Z) = 1.

9.6 Special Continuous Distributions

9.6.1 The Normal Distribution
Definition 1. A continuous random variable X has normal distribution N(μ, σ²)
if it has probability density
φ(x) = (1 / √(2πσ²)) e^(−(x − μ)² / (2σ²)),   where −∞ < x < ∞.
Theorem 1. If X is a continuous random variable and X ~ N(μ, σ²), then
E(X) = μ,   Var(X) = σ²,   and   (X − μ)/σ ~ N(0, 1).

The normal distribution is used, among other things, to approximate the binomial distribution
B(n, p) when n grows large.

Theorem 2. If X ~ N(μ, σ²), then
