
Statistics and probability models

Manas Patra

Manas Patra Statistical Methods (MBD 411) Lecture 5


Two faces of “statistics”

Mathematical statistics deals with abstract probability models and their application to statistical inference.
Applied statistics deals with data, obtained either from a controlled experiment or from observation. A common and useful way of representing data is the frequency histogram.
We used the relative frequency (or percentage) representation, which is just a matter of scaling.
Statistical terms, or statistics, such as the mean, median and standard deviation are used in both branches of statistics (statistics understood as a discipline).
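The scaling from frequencies to relative frequencies and percentages takes only a few lines of Python. This is a minimal illustration that uses a hypothetical sample and hypothetical equal-width bin edges. The helper `histogram_counts` is our own name and does not come from the lecture.

```python
from bisect import bisect_right


def histogram_counts(data, edges):
    """Frequency f_k of observations in each bin [edges[k], edges[k+1]).

    The last bin is closed on the right so the largest edge is counted.
    """
    counts = [0] * (len(edges) - 1)
    for x in data:
        if x < edges[0] or x > edges[-1]:
            continue  # outside the histogram range
        k = bisect_right(edges, x) - 1  # index of the bin containing x
        if k == len(counts):  # x equals the last edge
            k -= 1
        counts[k] += 1
    return counts


data = [2.1, 3.4, 3.9, 4.2, 5.5, 5.7, 6.1, 7.8, 8.0, 9.6]  # hypothetical sample
edges = [2, 4, 6, 8, 10]                                      # hypothetical bin edges

f = histogram_counts(data, edges)   # frequencies f_k
n = sum(f)
rel = [fk / n for fk in f]          # relative frequencies y_k
pct = [100 * fk / n for fk in f]    # percentages
```

Dividing by n, or multiplying by 100/n, changes only the vertical scale of the histogram. The shape stays the same.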



More confusion?

We compute these statistics (again!) in seemingly different ways in the two cases.
In mathematical statistics these quantities refer to a random variable and its accompanying distribution (or density) function.
In applied statistics we calculate them from the observed data associated with a “plain” variable, or from a representation of the data such as the histogram.
Are these two parallel branches of statistics, or is there a connection? We explore these questions below.
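On the applied-statistics side, the statistics are computed directly from the observed values. The following minimal sketch uses Python's standard `statistics` module on a hypothetical sample.

```python
import statistics

# Hypothetical observations of a "plain" variable x
data = [2.1, 3.4, 3.9, 4.2, 5.5, 5.7, 6.1, 7.8, 8.0, 9.6]

sample_mean = statistics.mean(data)      # sum of the values / number of values
sample_median = statistics.median(data)  # average of the two middle values, since n is even here
sample_sd = statistics.stdev(data)       # sample standard deviation (divisor n - 1)

print(sample_mean, sample_median, sample_sd)
```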



Building bridges

We deal with the continuous-variable case only. The categorical case is similar (and a bit simpler).
For a random variable X we have a (cumulative) distribution function F(x) = Pr(X ≤ x) and (often) a density function p(x) such that

$$F(x) = \int_{-\infty}^{x} p(t)\,dt.$$

(For a mass function, in the discrete case, the integral is replaced by a sum.)

The mean (average, expectation value) of X is given by

$$\bar{X} = \int_{-\infty}^{\infty} x\,p(x)\,dx.$$

And the median M is the number such that

$$\int_{-\infty}^{M} p(x)\,dx = \frac{1}{2}.$$
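Both definitions can be evaluated numerically. This minimal sketch uses a hypothetical density p(x) = 2x on [0, 1], whose mean is 2/3 and whose median is 1/√2. The integrals use the midpoint rule, and F(M) = 1/2 is solved by bisection. All function names are illustrative.

```python
def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def density(x):
    """Hypothetical example density p(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0


A, B = 0.0, 1.0  # p vanishes outside [A, B], so integrals over R reduce to [A, B]


def cdf(x):
    """F(x) = integral of p(t) dt from -inf to x (here effectively from A)."""
    if x <= A:
        return 0.0
    return integrate(density, A, min(x, B))


def median(lo=A, hi=B, tol=1e-9):
    """Solve F(M) = 1/2 by bisection; this works because F is nondecreasing."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


mean = integrate(lambda x: x * density(x), A, B)  # integral of x p(x) dx
M = median()
```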



Histogram and the statistics

Recall: the range of the variable x is divided by interval points $x_1 < x_2 < \cdots < x_n$ with $x_{k+1} - x_k = c$ for all k, and $y_k$ is the relative frequency of observations lying in the interval $[x_k, x_{k+1})$.
The mean: how do we calculate it? One possible way is to divide each interval $[x_k, x_{k+1})$ into a large number of little subintervals of width 1/J, with left endpoints $x_k, x_k + \frac{1}{J}, x_k + \frac{2}{J}, \ldots, x_k + \frac{Jc-1}{J}$ (choose J so that Jc is an integer). Assume that in the “little bin” $[x_k + \frac{r}{J}, x_k + \frac{r+1}{J})$ all individuals have their x-value at the initial point $x_k + \frac{r}{J}$, and calculate the average. There are $f_k/(Jc)$ individuals in each little bin, where $f_k$ is the frequency in the kth bin.
We may even take the limit $J \to \infty$ to make the argument rigorous.
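The little-bin construction can be checked numerically. This minimal sketch uses hypothetical bin edges and frequencies and assumes that Jc is an integer. The function `little_bin_mean` is our own illustrative name. Within bin k the little-bin average is $x_k + \frac{Jc-1}{2J} = x_k + \frac{c}{2} - \frac{1}{2J}$, so the result should approach the midpoint mean as J grows.

```python
def little_bin_mean(edges, freqs, J):
    """Grouped-data mean via the little-bin construction.

    Each bin [x_k, x_k + c) is split into J*c little bins of width 1/J;
    each little bin holds f_k / (J*c) individuals, all placed at its
    left endpoint x_k + r/J. Assumes equal widths c and integer J*c.
    """
    c = edges[1] - edges[0]
    m = round(J * c)  # number of little bins per bin
    if abs(J * c - m) > 1e-9:
        raise ValueError("J*c must be an integer")
    n = sum(freqs)
    total = 0.0
    for xk, fk in zip(edges[:-1], freqs):
        share = fk / m  # individuals per little bin
        total += sum(share * (xk + r / J) for r in range(m))
    return total / n


# Hypothetical histogram: bins [2,4), [4,6), [6,8), [8,10) with frequencies f_k
edges = [2, 4, 6, 8, 10]
freqs = [3, 3, 2, 2]
midpoint_mean = sum(f * (a + b) / 2 for a, b, f in zip(edges[:-1], edges[1:], freqs)) / sum(freqs)

for J in (1, 10, 100, 1000):
    print(J, little_bin_mean(edges, freqs, J), midpoint_mean)
```

The gap is exactly 1/(2J) in every bin. It therefore disappears in the limit J → ∞, which leaves the midpoint formula.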



Histogram and the statistics
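Whether we do it the simple way (placing all observations of a bin at its midpoint) or the “rigorous” way (the limit $J \to \infty$), we get the same value. Why?
Because we are in effect dealing with a uniform distribution on each interval $[x_k, x_{k+1})$.
The histogram picture, with a flat bar over each interval, suggests exactly that!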
[Figure: Histogram of x, with Frequency on the vertical axis]



Mean and median

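The discussion on the previous slide suggests another approach: identify (relative) frequency with probability, and cumulative relative frequency with the cumulative distribution function.
Thus the (cumulative) distribution function is given by $F(x_1) = 0$, $F(x_k) = y_1 + y_2 + \cdots + y_{k-1}$ for $k = 2, \ldots, n$, and

$$F(x) = \begin{cases} 0 & \text{if } x < x_1, \\ F(x_k) + u\,y_k/c & \text{if } x = x_k + u,\ 0 \le u < c, \\ 1 & \text{if } x \ge x_n. \end{cases}$$

The density function is given by

$$p(x) = \begin{cases} y_k/c & \text{if } x_k \le x < x_{k+1}, \\ 0 & \text{if } x < x_1 \text{ or } x \ge x_n. \end{cases}$$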
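This histogram model can be coded directly as a pair of functions. In this minimal sketch the value of F at the left edge of a bin is the total relative frequency of the bins to its left, and F rises linearly with slope $y_k/c$ inside bin k. The name `histogram_distribution` and the example frequencies are hypothetical, and equal bin widths are assumed.

```python
from bisect import bisect_right
from itertools import accumulate


def histogram_distribution(edges, rel_freqs):
    """Return (F, p) for the histogram model with equal bin width c.

    p is y_k / c on [x_k, x_{k+1}) and 0 outside [x_1, x_n);
    F rises linearly on each bin, from 0 at x_1 to 1 at x_n.
    """
    c = edges[1] - edges[0]
    # left_cum[k] = total relative frequency of the bins before bin k (0-based)
    left_cum = [0.0] + list(accumulate(rel_freqs))

    def p(x):
        if x < edges[0] or x >= edges[-1]:
            return 0.0
        k = bisect_right(edges, x) - 1  # index of the bin containing x
        return rel_freqs[k] / c

    def F(x):
        if x < edges[0]:
            return 0.0
        if x >= edges[-1]:
            return 1.0
        k = bisect_right(edges, x) - 1
        u = x - edges[k]  # offset of x inside its bin
        return left_cum[k] + u * rel_freqs[k] / c

    return F, p


# Hypothetical histogram with relative frequencies y_k
F, p = histogram_distribution([2, 4, 6, 8, 10], [0.3, 0.3, 0.2, 0.2])
```

As a consistency check, letting u → c in bin k gives F(x_k) + y_k, which is the value of F at the next edge, so F is continuous. Its slope y_k / c on each bin is exactly p.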



The spread of distribution: variance

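We can now calculate the mean µ and the median M using the distribution or density function. Integrating x p(x) over each interval gives

$$\mu = \sum_{k=1}^{n-1} y_k\,\frac{x_k + x_{k+1}}{2}.$$

For M we have to solve the equation F(M) = 1/2. Let $[x_m, x_{m+1})$ be the interval such that $F(x_m) = r \le 1/2$ and $F(x_{m+1}) > 1/2$. Writing $M = x_m + u$, the equation $r + u\,y_m/c = 1/2$ gives

$$M = x_m + c\,\frac{1/2 - r}{y_m}.$$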
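Both formulas translate directly into code. This minimal sketch uses hypothetical bin edges and relative frequencies, and the function names are our own. The median routine scans the bins, accumulating r = F(x_m), until the cumulative relative frequency first exceeds 1/2.

```python
def grouped_mean(edges, rel_freqs):
    """mu = sum over bins of y_k (x_k + x_{k+1}) / 2."""
    return sum(y * (a + b) / 2 for a, b, y in zip(edges[:-1], edges[1:], rel_freqs))


def grouped_median(edges, rel_freqs):
    """Solve F(M) = 1/2: find bin m with F(x_m) = r <= 1/2 < F(x_{m+1}),
    then return M = x_m + c (1/2 - r) / y_m. Assumes equal bin width c."""
    c = edges[1] - edges[0]
    r = 0.0  # running value of F at the left edge of the current bin
    for xm, ym in zip(edges[:-1], rel_freqs):
        if r + ym > 0.5:
            return xm + c * (0.5 - r) / ym
        r += ym
    raise ValueError("relative frequencies must sum to 1")


# Hypothetical histogram: bins [2,4), [4,6), [6,8), [8,10)
edges = [2, 4, 6, 8, 10]
y = [0.3, 0.3, 0.2, 0.2]
mu = grouped_mean(edges, y)
M = grouped_median(edges, y)
```

For these frequencies F(4) = 0.3 ≤ 1/2 < F(6) = 0.6, so m is the bin [4, 6) and M = 4 + 2(0.5 − 0.3)/0.3 = 16/3 ≈ 5.33, while µ = 5.6.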
Show that in the symmetric unimodal case the mean, mode and median coincide.
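Before attempting the proof, one hypothetical symmetric unimodal histogram gives a quick numerical illustration. This is not a proof. Here we take the midpoint of the modal bin as the mode, which is our convention and not one from the lecture. The grouped mean and median use the formulas above.

```python
def grouped_mean(edges, y):
    """mu = sum over bins of y_k (x_k + x_{k+1}) / 2."""
    return sum(yk * (a + b) / 2 for a, b, yk in zip(edges[:-1], edges[1:], y))


def grouped_median(edges, y):
    """M = x_m + c (1/2 - r) / y_m for the bin where F first exceeds 1/2."""
    c = edges[1] - edges[0]
    r = 0.0
    for xm, ym in zip(edges[:-1], y):
        if r + ym > 0.5:
            return xm + c * (0.5 - r) / ym
        r += ym
    raise ValueError("relative frequencies must sum to 1")


def modal_midpoint(edges, y):
    """Midpoint of the bin with the largest relative frequency."""
    k = max(range(len(y)), key=lambda i: y[i])
    return (edges[k] + edges[k + 1]) / 2


# Hypothetical symmetric unimodal histogram on [0, 5)
edges = [0, 1, 2, 3, 4, 5]
y = [0.1, 0.2, 0.4, 0.2, 0.1]

print(grouped_mean(edges, y), grouped_median(edges, y), modal_midpoint(edges, y))
```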

