Anoop M. Namboodiri
IIIT, Hyderabad, INDIA. anoop@iiit.ac.in
• What is the expected value of height? Is it the same as above?

• What is the expected value of height, given that the gender of the sample is, say, female?

By the end of this tutorial, you should be able to answer all the above questions (and those at the end) with clear reasoning. Specifically, the last question is the most interesting from the point of view of pattern classification, which asks the inverse question: "what is the most likely gender of a sample, given that the height is 165?" Before we dive deeper into the details, we introduce a few basic assumptions about how the samples are drawn.

2. Identical distribution: The probability that any particular sample is drawn is unchanged across the trials. In other words, the probability distribution is identical for all trials.

We put the two assumptions together and claim that the samples in an experiment are independent and identically distributed (i.i.d., or iid for short). The above assumptions are the primary reason why we can make any inference about a population from a relatively small set of samples drawn from the population. We often assume that the method of sampling is (simple) random sampling, where every sample in the population has an equal chance of being drawn in any trial. Note that for a finite population, applying random sampling with replacement will make the resulting samples iid.

The probabilities of all the values of a discrete random variable sum to one:

\sum_{i=1}^{n} P(v_i) = \sum_{i=1}^{n} p_i = 1. \quad (2)

Certain parametric forms of the probability mass function are popular in practice, as they model the process of generation of the samples. Figure 1 shows two popular PMF forms, uniform and binomial.

In the following sections, we deal with two different types of random variables: discrete and continuous. The distinction is based on the nature of the values that a random variable can take. The tools required to deal with them may also differ, as will be discussed in detail.
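The claim that sampling with replacement yields iid draws can be checked with a short simulation. The following is a sketch in Python; the population of heights and its proportions are made up purely for illustration, and are not from the tutorial:

```python
import random
from collections import Counter

# Hypothetical finite population of heights (cm); the values and their
# counts are illustrative choices, not data from the tutorial.
population = [160] * 30 + [165] * 40 + [170] * 20 + [175] * 10

# Simple random sampling WITH replacement: every trial is independent
# and uses the same distribution, so the draws are i.i.d.
trials = 100_000
random.seed(0)
draws = [random.choice(population) for _ in range(trials)]

# Empirical PMF: the relative frequency of each value.
counts = Counter(draws)
pmf = {v: c / trials for v, c in counts.items()}

print(sum(pmf.values()))   # sums to 1, as equation 2 requires
print(round(pmf[165], 2))  # close to the true proportion 40/100 = 0.4
```

The empirical frequencies approach the true population proportions as the number of trials grows, which is exactly the inference-from-samples argument made above.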
However, if we were to guess the most likely height among all the students, we would be better off guessing the mode of the distribution and not its mean... of course!

Now we know the best guess of the outcome of our experiment. However, can we say anything about the amount of error we will make? This is precisely what the variance tells us.

The variance, σ², of a random variable is defined as:

\mathrm{Var}(x) \equiv \sigma^2 = E[(x - \mu)^2] = \sum_{i=1}^{n} (v_i - \mu)^2 P(v_i). \quad (4)

As you can see, the variance is the mean squared error (MSE) if you guess the mean. If the mean tells you about the centre of mass of the population, the variance tells you how spread out the population is from the mean. Note that the variance only gives you a measure of the spread of the data, and not the exact way in which it is spread. For that you need the complete PMF itself. One can also represent the variance as:

\sigma^2 = E[(x - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 p(x)\,dx
         = \int (x^2 - 2x\mu + \mu^2)\, p(x)\,dx
         = \int x^2 p(x)\,dx - 2\mu \int x\, p(x)\,dx + \mu^2 \int p(x)\,dx
         = E[x^2] - 2\mu E[x] + \mu^2 = E[x^2] - \mu^2, \quad (5)

where the last step uses E[x] = µ and the fact that the density integrates to 1.

Note: if we compute the square root of the variance, i.e., the RMSE w.r.t. µ, we get the standard deviation, σ.

3 Continuous Random Variables

In the previous section, we assumed that the height measurement of a student is an integer value, making x a discrete random variable. If we assume that the height can be measured precisely, to any real number between 150 and 190, the number of values that x can take becomes uncountable, since the number of real numbers in any range is uncountable. Such random variables, which take any value within a continuous range, are referred to as continuous random variables (CRVs). In each trial, the random variable, x, can take any of the infinite number of values within its range, χ. The range could also be infinite, i.e., (−∞, ∞). In our example, even though the range is finite ([150, 190]), the number of possible values that x can take is uncountably infinite. This makes the definition of probabilities tricky.

3.1 Probability Density Function (PDF)

Consider the continuous-domain equivalent of the first question that we asked: "If we randomly select a student, how likely are we to get a specific height, say 172.3413587391, precise up to the picometer or more?" Intuitively, we can say that it is extremely unlikely, well nigh impossible, that we will chance upon a student with that exact height, i.e., Pr[x = 172.3413587391...] = 0. Then what about exactly 172.00...? Or any other specific real number between 150 and 190? We have to say they share the same plight. To generalize, the probability that a continuous random variable takes any specific value in its range is 0. Does that mean no event can ever occur?!

To get around this predicament, we reframe the question a bit, as follows: "How likely are we to select a student of height within the range [172 − δ, 172 + δ]?" Now there is a non-zero probability that we might get a number within that range. Based on this, we define the distribution of samples in the range as follows: for every continuous random variable, x, there exists a probability density function, p(x), such that:

\forall x,\; p(x) \ge 0, \text{ and} \quad (6)

Pr[x \in (a, b)] = \int_a^b p(x)\,dx. \quad (7)

p(x_t) gives the limiting value of the density of probability in a small window around the point x_t. Note that the value of p(x_t) is not a probability. We always use lower-case p for densities, and upper-case P for functions that give probabilities. The probability density function (PDF) plays the same role for CRVs as the PMF does for DRVs. Note that in theory the PDF need not be a parametric function, although in practice it almost always is.

3.2 Expectation and Variance: µ & σ²

We can extend the definitions of the expected value and variance of an RV from the discrete to the continuous domain as the following integrals:

E[x] \equiv \mu = \int_{-\infty}^{\infty} x\, p(x)\,dx

\mathrm{Var}(x) \equiv \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 p(x)\,dx.
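The reduction of the variance to E[x²] − µ² derived above is easy to confirm numerically. The following Python sketch evaluates both expressions on a small, made-up PMF (the values and probabilities are illustrative only):

```python
# A made-up PMF over a few height values (cm); the probabilities sum to 1.
pmf = {160: 0.3, 165: 0.4, 170: 0.2, 175: 0.1}

mu = sum(v * p for v, p in pmf.items())                   # E[x]
var_def = sum((v - mu) ** 2 * p for v, p in pmf.items())  # E[(x - mu)^2]
ex2 = sum(v * v * p for v, p in pmf.items())              # E[x^2]
var_alt = ex2 - mu ** 2                                   # E[x^2] - mu^2

print(mu)                # ≈ 165.5
print(var_def, var_alt)  # the two expressions for the variance agree
```

The two variance expressions differ only by floating-point rounding, confirming the algebraic identity for this PMF.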
These measures also have meanings and interpretations similar to those we found for the discrete RVs. To make the ideas clear, we will consider two examples of PDFs.

3.2.1 Uniform Density

The uniform density function is characterized by the range within which it is defined, and is given by:

U(a, b) = \begin{cases} \frac{1}{b - a} & \text{if } a \le x \le b \\ 0 & \text{otherwise.} \end{cases} \quad (9)

Its mean is:

\mu = \int_a^b x \left( \frac{1}{b - a} \right) dx = \frac{1}{b - a} \int_a^b x\,dx = \frac{1}{b - a} \left[ x^2/2 \right]_a^b = (b + a)/2,

which is what we expect of the mean of a uniform distribution between a and b. Similarly, the variance can be shown to be:

\sigma^2 = \frac{(b - a)^2}{12}. \quad (10)

Figure 2(a) shows the plot of a uniform density function in the range [0, 3].

Figure 2. (a) Uniform density U(0, 3) and (b) normal density.

The range of a normal density is infinite. The expectation and variance of the normal density function are, in fact, µ and σ² themselves.

In addition to what we discussed, there is a large number of probability distributions, for both discrete and continuous RVs, that are used in specific scenarios [1].

4 CDF: Cumulative Distribution Function

The cumulative distribution function, or CDF, is derived from the PDF by integrating the density up to a point. It is defined as:

C(t) = \int_{-\infty}^{t} p(x)\,dx. \quad (12)

Note that the CDF gives the total probability that a continuous random variable takes a value less than a specific value, t. The CDF can be expressed in a parametric form in certain cases, such as for the uniform density:

C(t) = \begin{cases} 0 & \text{if } t < a \\ \frac{t - a}{b - a} & \text{if } a \le t \le b \\ 1 & \text{if } t > b. \end{cases} \quad (13)

Note that the PDF of an RV completely specifies its CDF, and vice versa. However, it is possible that one of them has a compact parametric representation while the other does not. For example, the CDF of the normal distribution (equation 11) is given by:

\mathrm{cdf}(x) = \frac{1}{2} \left( 1 + \mathrm{erf}\!\left( \frac{x - \mu}{\sigma \sqrt{2}} \right) \right), \quad (14)

where erf() is the error function, defined by:

\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt. \quad (15)
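The uniform CDF (equation 13) and the normal CDF written in terms of the error function (equation 14) can both be evaluated directly; Python's standard library provides math.erf. A small sketch, with arbitrary illustrative parameters:

```python
import math

def uniform_cdf(t, a, b):
    """CDF of U(a, b): 0 below a, linear on [a, b], 1 above b (equation 13)."""
    if t < a:
        return 0.0
    if t > b:
        return 1.0
    return (t - a) / (b - a)

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) via the error function (equation 14)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Sanity checks: a CDF rises from 0 to 1, and equals 1/2 at the median.
print(uniform_cdf(1.5, 0, 3))               # 0.5: the midpoint of U(0, 3)
print(normal_cdf(0.0, 0.0, 3.0))            # 0.5: the normal median is mu
print(round(normal_cdf(3.0, 0.0, 3.0), 3))  # one sigma above the mean, ≈ 0.841
```

Note that math.erf hides the Taylor-series-style approximation: there is no closed form, but the library evaluates it to machine precision.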
There is no closed-form expression for erf(), and it is often approximated by its Taylor series expansion. Figure 3 shows the normal density function with four different parameters, and the corresponding cumulative distribution functions.

4.1 Generating Random Numbers

One very useful application of CDFs is that one can generate random numbers that follow any given distribution, provided we can compute or estimate the CDF of the distribution.

Consider a random variable, x, that is distributed according to a PDF, p(x). Also consider another random variable, y = C(x), where C(x) is the CDF corresponding to p(x).

Now, consider a small window of x around the point t, [t − δt, t + δt] (see Figure 4). The value of y corresponding to t will be r = C(t). Moreover, the value of y corresponding to x = t + δt will be r + δt·p(t), assuming that δt is small enough that p(t + δt) ≈ p(t). Similarly, C(t − δt) = r − δt·p(t).

Figure 4. Mapping of a random variable using the CDF.

In other words, all samples of x within a window of size 2δt around t map to a window of size 2δt·p(t) around C(t) for y. The resulting density of y will hence be 1/p(t) times the density of x at t, which is unity. That is, y has a uniform density in the range [0, 1].

We have just argued that, given a random variable x of any density, the corresponding random variable y = C(x) will be U[0, 1]. We can invert this statement and say that, given a random variable y that follows the PDF U[0, 1], the random variable x = C⁻¹(y) will follow the PDF whose CDF is C(). In other words, given a set of random numbers yᵢ with uniform density U(0, 1), we can map them to a set of random numbers xᵢ with any desired PDF using the inverse CDF!

5 Problems

1. Give an example each of probability mass functions with finite and infinite ranges. Show that the conditions on a PMF are satisfied by your examples.

2. Show, with complete steps, that the variance of the uniform density is given by equation 10. (Hint: use the expression for variance in equation 5.)

3. Show examples of two density functions (draw the function plots) that have the same mean and variance, but clearly different distributions. Plot both functions on the same graph in different colours.

4. Show that the alternate expression for variance given in equation 5 holds for discrete random variables as well.

5. Prove that the mean and variance of a normal density, N(µ, σ²), are indeed its parameters, µ and σ².

6. Using the inverse of CDFs, map a set of 10,000 random numbers from U[0, 1] to follow the following PDFs:

   (a) Normal density with µ = 0, σ = 3.0.
   (b) Rayleigh density with σ = 1.0.
   (c) Exponential density with λ = 1.5.

   Once the numbers are generated, plot the normalized histograms (the values in the bins should add up to 1) of the new random numbers, with appropriate bin sizes in each case, along with their PDFs. What do you infer from the plots? Note: see the rand() function in C for U[0, INT_MAX].

7. Write a function to generate a random number as follows: every time the function is called, it generates 500 new random numbers from U[0, 1] and outputs their sum. Generate 50,000 random numbers by repeatedly calling the above function, and plot their normalized histogram (with bin size = 1). What do you find about the shape of the resulting histogram?

References

[1] Probability distribution, Wikipedia, 2008, http://en.wikipedia.org/wiki/Probability_distributions.

[2] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed., John Wiley and Sons, New York, 2001.

[3] J. A. Rice, Mathematical Statistics and Data Analysis, 2nd ed., Duxbury Press, 1995.
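The inverse-CDF recipe above can be sketched in a few lines of Python. The exponential density with λ = 1.5 (as in problem 6(c)) is convenient because its CDF, C(t) = 1 − e^{−λt}, inverts in closed form; this is an illustration of the technique, not code from the tutorial:

```python
import math
import random

lam = 1.5  # rate parameter of the exponential density, as in problem 6(c)

def inverse_exp_cdf(y, lam):
    """Inverse of the exponential CDF C(t) = 1 - exp(-lam * t), for y in [0, 1)."""
    return -math.log(1.0 - y) / lam

random.seed(0)
uniform = [random.random() for _ in range(10_000)]    # y_i ~ U[0, 1)
samples = [inverse_exp_cdf(y, lam) for y in uniform]  # x_i = C^{-1}(y_i)

# If the mapping works, the x_i follow the exponential density, whose
# mean is 1/lam and whose variance is 1/lam^2.
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2))  # close to 1/1.5 ≈ 0.67 and 1/2.25 ≈ 0.44
```

The same pattern solves the other parts of problem 6: only the inverse CDF changes (for the normal density one would invert equation 14, e.g. via a library routine, since erf has no closed-form inverse).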