

Basic Probability

Pattern Classification, Chapter

Introduction

Probability is the study of randomness and uncertainty.

In the early days, probability was associated with games of chance (gambling).

Simple Games Involving Probability

Game: A fair die is rolled. If the result is 2, 3, or 4, you win $1; if it is 5, you win $2; but if it is 1 or 6, you lose $3.
Should you play this game?
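One way to answer is to compute the game's expected winnings; a minimal sketch, using exact fractions to avoid rounding:

```python
from fractions import Fraction

# Payoff for each face of a fair die: 2, 3, 4 win $1; 5 wins $2; 1 and 6 lose $3
payoff = {1: -3, 2: 1, 3: 1, 4: 1, 5: 2, 6: -3}

# Each face has probability 1/6, so the expected winnings per play are:
expected = sum(Fraction(1, 6) * v for v in payoff.values())
print(expected)  # -1/6: on average you lose about 17 cents per play
```

Since the expectation is negative, playing repeatedly loses money on average.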


Random Experiment

A random experiment is a process whose outcome is uncertain.

Examples:
Tossing a coin once or several times
Picking a card or cards from a deck
Measuring temperature of patients
...


Events & Sample Spaces

Sample Space
The sample space is the set of all possible outcomes.

Simple Events
The individual outcomes are called simple events.

Event
An event is any collection of one or more simple events.


Example
Experiment: Toss a coin 3 times.

Sample space Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.

Examples of events include

A = {HHH, HHT, HTH, THH} = {at least two heads}
B = {HTT, THT, TTH} = {exactly two tails}.
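The sample space and these events can be enumerated directly; a small sketch:

```python
from itertools import product

# Sample space for tossing a coin 3 times: all length-3 sequences of H/T
omega = {''.join(s) for s in product('HT', repeat=3)}
print(len(omega))  # 8

# Events are subsets of the sample space
A = {s for s in omega if s.count('H') >= 2}   # at least two heads
B = {s for s in omega if s.count('T') == 2}   # exactly two tails
print(sorted(A))  # ['HHH', 'HHT', 'HTH', 'THH']
print(sorted(B))  # ['HTT', 'THT', 'TTH']
```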


Basic Concepts (from Set Theory)

The union of two events A and B, A ∪ B, is the event consisting of all outcomes that are either in A or in B or in both events.

The complement of an event A, Ac, is the set of all outcomes in Ω that are not in A.

The intersection of two events A and B, A ∩ B, is the event consisting of all outcomes that are in both events.

When two events A and B have no outcomes in common, they are said to be mutually exclusive, or disjoint, events.
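These operations map directly onto Python's built-in set type; a small illustration (the sample space and events here are our own choice, for demonstration):

```python
# Event operations map directly onto Python's built-in set type.
omega = {1, 2, 3, 4, 5, 6}   # sample space: one roll of a die
A = {2, 4, 6}                # "even"
B = {1, 2, 3}                # "at most three"

print(A | B)        # union: {1, 2, 3, 4, 6}
print(A & B)        # intersection: {2}
print(omega - A)    # complement of A: {1, 3, 5}
print(A.isdisjoint({1, 3, 5}))  # True: "even" and "odd" are mutually exclusive
```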


Example
Experiment: toss a coin 10 times and observe the number of heads.

Let A = {0, 2, 4, 6, 8, 10}, B = {1, 3, 5, 7, 9}, C = {0, 1, 2, 3, 4, 5}.

A ∪ B = {0, 1, ..., 10} = Ω.

A ∩ B contains no outcomes, so A and B are mutually exclusive.

Cc = {6, 7, 8, 9, 10}, A ∩ C = {0, 2, 4}.


Rules

Commutative Laws:
A ∪ B = B ∪ A, A ∩ B = B ∩ A

Associative Laws:
(A ∪ B) ∪ C = A ∪ (B ∪ C)
(A ∩ B) ∩ C = A ∩ (B ∩ C)

Distributive Laws:
(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)
(A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)

DeMorgan's Laws:
(A1 ∪ A2 ∪ ... ∪ An)c = A1c ∩ A2c ∩ ... ∩ Anc
(A1 ∩ A2 ∩ ... ∩ An)c = A1c ∪ A2c ∪ ... ∪ Anc
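The laws above can be verified exhaustively on small finite events; a sketch:

```python
# Exhaustive check of DeMorgan's laws over a small finite sample space
omega = set(range(10))
events = [{0, 1, 2}, {2, 3, 4}, {4, 5, 6}]

def comp(s):
    # Complement relative to the sample space
    return omega - s

union_all = set().union(*events)
inter_all = omega.intersection(*events)

# (A1 U ... U An)^c == A1^c ∩ ... ∩ An^c
law1 = comp(union_all) == omega.intersection(*(comp(e) for e in events))
# (A1 ∩ ... ∩ An)^c == A1^c U ... U An^c
law2 = comp(inter_all) == set().union(*(comp(e) for e in events))
print(law1, law2)  # True True
```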


Venn Diagram

(Figure: Venn diagram of events A and B)


Probability

A probability is a number assigned to each event (subset) of a sample space Ω.

Probability distributions satisfy the following rules:


Axioms of Probability

For any event A, 0 ≤ P(A) ≤ 1.

P(Ω) = 1.

If A1, A2, ..., An is a partition of A, then

P(A) = P(A1) + P(A2) + ... + P(An)

(A1, A2, ..., An is called a partition of A if A1 ∪ A2 ∪ ... ∪ An = A and A1, A2, ..., An are mutually exclusive.)


Properties of Probability

For any event A, P(Ac) = 1 - P(A).

If A ⊆ B, then P(A) ≤ P(B).

For any two events A and B,
P(A ∪ B) = P(A) + P(B) - P(A ∩ B).

For three events A, B, and C,
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A ∩ B) - P(A ∩ C) - P(B ∩ C) + P(A ∩ B ∩ C).
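The complement and inclusion-exclusion rules can be checked with a counting measure on equally likely outcomes; a small sketch (the events are our own choice):

```python
from fractions import Fraction

omega = set(range(1, 11))    # 10 equally likely outcomes

def P(event):
    # Probability of an event as |event| / |omega|
    return Fraction(len(event), len(omega))

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# Inclusion-exclusion: P(A U B) = P(A) + P(B) - P(A ∩ B)
print(P(A | B) == P(A) + P(B) - P(A & B))  # True
# Complement rule: P(A^c) = 1 - P(A)
print(P(omega - A) == 1 - P(A))            # True
```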


Example


In a certain population, 10% of the people are rich, 5% are famous, and 3% are both rich and famous. A person is randomly selected from this population. What is the chance that the person is

not rich?
rich but not famous?
either rich or famous?
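One way to work the example, using exact fractions and the rules from the previous slide:

```python
from fractions import Fraction

P_R = Fraction(10, 100)   # P(rich)
P_F = Fraction(5, 100)    # P(famous)
P_RF = Fraction(3, 100)   # P(rich and famous)

not_rich = 1 - P_R                 # complement rule
rich_not_famous = P_R - P_RF       # rich = (rich and famous) plus (rich, not famous)
rich_or_famous = P_R + P_F - P_RF  # inclusion-exclusion

print(not_rich, rich_not_famous, rich_or_famous)  # 9/10 7/100 3/25
```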


Intuitive Development

(agrees with axioms)

Intuitively, the probability of an event a could be defined as

P(a) = lim (n → ∞) N(a) / n

where N(a) is the number of times that event a happens in n trials.
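This relative-frequency definition can be illustrated by simulation; a sketch estimating the probability of rolling an even number with a fair die:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Estimate P(even) for a fair die by the relative frequency N(a)/n
n = 100_000
N_a = sum(1 for _ in range(n) if random.randint(1, 6) % 2 == 0)
print(N_a / n)  # close to the true value 1/2 for large n
```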


Here We Go Again: Not So Basic Probability

More Formal:

Ω is the Sample Space:

Contains all possible outcomes of an experiment

ω in Ω is a single outcome
A in Ω is a set of outcomes of interest


Independence

The probability of independent events A, B and C is given by:

P(A, B, C) = P(A) P(B) P(C)

A and B are independent if knowing that A has happened does not say anything about B happening.
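Independence can be checked exactly by enumerating a finite sample space; a sketch with two fair dice (the events are our own choice):

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice (36 equally likely pairs)
omega = list(product(range(1, 7), repeat=2))

def P(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] % 2 == 0          # first die even
B = lambda w: w[1] > 4               # second die shows 5 or 6

# Independence: P(A and B) = P(A) P(B)
print(P(A), P(B))                                 # 1/2 1/3
print(P(lambda w: A(w) and B(w)) == P(A) * P(B))  # True
```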


Conditional Probability

One of the most useful concepts!

The conditional probability of A given B is defined as

P(A | B) = P(A ∩ B) / P(B), provided P(B) > 0.

Bayes Theorem

Provides a way to convert a priori probabilities to a posteriori probabilities:

P(B | A) = P(A | B) P(B) / P(A)

Using Partitions:

If events Ai are mutually exclusive and partition Ω, then for any event B

P(B) = Σi P(B | Ai) P(Ai)

and Bayes' theorem becomes

P(Ai | B) = P(B | Ai) P(Ai) / Σj P(B | Aj) P(Aj)
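A numerical sketch of the total-probability and Bayes formulas for a partition, using invented priors and likelihoods for illustration only:

```python
# Hypothetical two-set partition {A1, A2} of the sample space, with invented
# priors P(Ai) and likelihoods P(B | Ai); the numbers are illustrative only.
prior = {1: 0.6, 2: 0.4}
likelihood = {1: 0.2, 2: 0.5}

# Total probability: P(B) = sum_i P(B | Ai) P(Ai)
P_B = sum(likelihood[i] * prior[i] for i in prior)

# Bayes: P(Ai | B) = P(B | Ai) P(Ai) / P(B)
posterior = {i: likelihood[i] * prior[i] / P_B for i in prior}

print(round(P_B, 2))          # 0.32
print(round(posterior[1], 4)) # 0.375; the posteriors sum to 1
```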


Random Variables

A (scalar) random variable X is a function that maps the outcome ω of a random event into a real scalar value X(ω).
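As a concrete sketch, the three-coin-toss experiment with X(ω) = number of heads:

```python
from collections import Counter
from itertools import product

# A random variable maps each outcome of the experiment to a real number.
# Here: toss a coin 3 times, X(w) = number of heads.
omega = [''.join(s) for s in product('HT', repeat=3)]

def X(w):
    return w.count('H')

print(X('HHT'))  # 2

# The induced probabilities P(X = k), with all 8 outcomes equally likely:
pmf = {k: c / len(omega) for k, c in Counter(X(w) for w in omega).items()}
print(pmf[0], pmf[1], pmf[2], pmf[3])  # 0.125 0.375 0.375 0.125
```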


Random Variable Distributions

Cumulative Probability Distribution (CDF):

FX(x) = P(X ≤ x)

Probability Density Function (PDF):

fX(x) = dFX(x)/dx


Random Distributions:

From the two previous equations:

FX(x) = ∫ from -∞ to x of fX(u) du



Uniform Distribution

A R.V. X that is uniformly distributed between x1 and x2 has density function:

fX(x) = 1 / (x2 - x1) for x1 ≤ x ≤ x2, and 0 otherwise.
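A direct transcription of the uniform density and the CDF obtained by integrating it; the endpoints below are our own example values:

```python
# Uniform density on [x1, x2] and the CDF obtained by integrating it
def uniform_pdf(x, x1, x2):
    return 1.0 / (x2 - x1) if x1 <= x <= x2 else 0.0

def uniform_cdf(x, x1, x2):
    if x < x1:
        return 0.0
    if x > x2:
        return 1.0
    return (x - x1) / (x2 - x1)

x1, x2 = 2.0, 6.0                 # example endpoints (our own choice)
print(uniform_pdf(3.0, x1, x2))   # 0.25 = 1 / (x2 - x1)
print(uniform_pdf(7.0, x1, x2))   # 0.0 outside the interval
print(uniform_cdf(4.0, x1, x2))   # 0.5 at the midpoint
```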


Gaussian (Normal) Distribution

A R.V. X that is normally distributed has density function:

fX(x) = (1 / (σ √(2π))) exp(-(x - μ)² / (2σ²))
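A direct transcription of the Gaussian density, with mean mu and standard deviation sigma:

```python
import math

# Gaussian (normal) density with mean mu and standard deviation sigma
def normal_pdf(x, mu, sigma):
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# The peak value at x = mu is 1 / (sigma * sqrt(2 pi)), about 0.3989 for sigma = 1
print(round(normal_pdf(0.0, 0.0, 1.0), 4))  # 0.3989
# The density is symmetric about the mean
print(normal_pdf(1.5, 0.0, 1.0) == normal_pdf(-1.5, 0.0, 1.0))  # True
```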


Statistical Characterizations

Expectation (Mean Value, First Moment):

E[X] = ∫ x fX(x) dx

Second Moment:

E[X²] = ∫ x² fX(x) dx


Statistical Characterizations

Variance of X:

Var(X) = E[(X - E[X])²] = E[X²] - (E[X])²

Standard Deviation of X:

σX = √Var(X)


Mean Estimation from Samples

Given a set of N samples {xi} from a distribution, we can estimate the mean of the distribution by:

μ̂ = (1/N) Σi xi


Variance Estimation from Samples

Given a set of N samples {xi} from a distribution, we can estimate the variance of the distribution by:

σ̂² = (1/(N-1)) Σi (xi - μ̂)²

(the N-1 denominator gives the unbiased estimate)
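Both estimators in code; the data values are invented for illustration:

```python
# Sample-based estimates of mean and variance.
# The N-1 denominator gives the unbiased variance estimate.
def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # invented sample
print(sample_mean(data))      # 5.0
print(sample_variance(data))  # 32/7, about 4.571
```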


Pattern Classification

Chapter 1: Introduction to Pattern Recognition (Sections 1.1-1.6)

Machine Perception
An Example
Pattern Recognition Systems
The Design Cycle
Learning and Adaptation
Conclusion

Machine Perception


Build a machine that can recognize patterns:


Speech recognition
Fingerprint identification
OCR (Optical Character Recognition)
DNA sequence identification

An Example


Sorting incoming fish on a conveyor according to species using optical sensing

Species: sea bass or salmon



Problem Analysis
Set up a camera and take some sample images to extract features

Length
Lightness
Width
Number and shape of fins
Position of the mouth, etc.
This is the set of all suggested features to explore for use in our
classifier!



Preprocessing

Use a segmentation operation to isolate fishes from one another and from the background

Information from a single fish is sent to a feature extractor whose purpose is to reduce the data by measuring certain features

The features are passed to a classifier




Classification
Select the length of the fish as a possible feature for discrimination



The length is a poor feature alone!

Select the lightness as a possible feature.



Threshold decision boundary and cost relationship:

Move our decision boundary toward smaller values of lightness in order to minimize the cost (reduce the number of sea bass that are classified as salmon!)

This is the task of decision theory.



Adopt the lightness and add the width of the fish

Fish feature vector: xT = [x1, x2], where x1 = lightness and x2 = width
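As a toy sketch of the kind of two-feature decision rule described here, with weights and threshold invented for illustration (not fitted to any data):

```python
# Toy linear decision rule on the feature vector x = [lightness, width].
# Weights and threshold are invented for illustration, not fitted to data.
def classify(x, w=(1.0, -0.5), b=-2.0):
    score = w[0] * x[0] + w[1] * x[1] + b
    return 'salmon' if score > 0 else 'sea bass'

print(classify([4.0, 2.0]))  # score = 4 - 1 - 2 = 1 > 0, so 'salmon'
print(classify([1.0, 3.0]))  # score = 1 - 1.5 - 2 < 0, so 'sea bass'
```

A real system would learn the decision boundary from labeled samples rather than fix it by hand.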



We might add other features that are not correlated with the ones we already have. A precaution should be taken not to reduce the performance by adding noisy features.

Ideally, the best decision boundary should be the one which provides an optimal performance such as in the following figure:



However, our satisfaction is premature because the central aim of designing a classifier is to correctly classify novel input

Issue of generalization!


