Non-parametric Methods Using Kernel Density Estimation
Harish K (EP12B009), Aditya Gurunathan (EE12B126), and T R Sriram (EE12B056)

Abstract: This paper discusses various non-parametric methods for estimating the density of a random variable. Non-parametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from the data.

An example of a kernel is the Gaussian kernel $K(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$.
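As a quick numerical sanity check of the Gaussian kernel formula, the following minimal sketch integrates it with a midpoint rule. The helper name `midpoint_integral`, the truncation to $[-10, 10]$, and the step count are assumptions made for illustration only.

```python
import math

def gaussian_kernel(x):
    """Standard Gaussian kernel K(x) = exp(-x^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def midpoint_integral(func, a, b, steps=20000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return width * sum(func(a + (k + 0.5) * width) for k in range(steps))

# Assumption: truncating to [-10, 10] is enough, since the Gaussian tails beyond it are negligible.
total_mass = midpoint_integral(gaussian_kernel, -10.0, 10.0)
mean = midpoint_integral(lambda x: x * gaussian_kernel(x), -10.0, 10.0)
second_moment = midpoint_integral(lambda x: x * x * gaussian_kernel(x), -10.0, 10.0)
print(total_mass, mean, second_moment)
```

The three printed values should be close to 1, 0, and 1 respectively: unit mass, zero mean, and unit second moment.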


We will show that, asymptotically, the error of the optimally binned histogram converges at rate $N^{-2/3}$ and that of the kernel density estimator (KDE) at rate $N^{-4/5}$, so the KDE is a better estimator of the density than the histogram.
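To get a feel for the gap between the two rates (sample size to the power $-2/3$ for the histogram and $-4/5$ for the KDE), the sketch below tabulates both. It ignores the constant factors in front of each rate, which is an assumption made purely for illustration, and the function names are hypothetical.

```python
def histogram_rate(n):
    """Asymptotic error rate of the optimal histogram, up to a constant."""
    return n ** (-2.0 / 3.0)

def kde_rate(n):
    """Asymptotic error rate of the KDE with optimal bandwidth, up to a constant."""
    return n ** (-4.0 / 5.0)

for n in (100, 10_000, 1_000_000):
    ratio = kde_rate(n) / histogram_rate(n)  # equals n ** (-2 / 15)
    print(f"n={n:>8}: histogram ~ {histogram_rate(n):.2e}, "
          f"KDE ~ {kde_rate(n):.2e}, ratio = {ratio:.3f}")
```

The ratio shrinks like the sample size to the power $-2/15$, so the advantage of the KDE grows slowly but steadily with the sample size.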

A. Tradeoff between Bias and Variance


For the mean squared error, we have a convenient decomposition

I. INTRODUCTION
Nonparametric statistics are statistics not based on parameterized
families of probability distributions. The term non-parametric means
that the number and nature of the parameters are flexible and not
fixed in advance.

II. HISTOGRAM

The histogram is the simplest non-parametric estimator. Mathematically,

$$f_N(x) = \sum_{j=1}^{m} \frac{p_j}{h}\, I(x \in B_j) \tag{1}$$

where $h = 1/m$ is the bandwidth (the data are taken to lie in $[0, 1]$, divided into $m$ equal-width bins $B_1, \ldots, B_m$), $Y_j$ is the number of observations in the bin $B_j$, and $p_j = Y_j / N$. There is no optimal procedure for determining the number of bins, and different bin sizes can reveal different features of the data. Using wider bins where the density is low reduces noise due to sampling randomness; using narrower bins where the density is high gives greater precision to the density estimation. It can be shown that under certain conditions $E(f_N(x)) \to f(x)$.
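The histogram estimator can be sketched in a few lines. The code below assumes the samples lie in $[0, 1]$ and uses $m$ equal-width bins, so that $h = 1/m$; the function name `histogram_density` and the Beta(2, 2) test data are illustrative choices, not part of the original paper.

```python
import random

def histogram_density(samples, m):
    """Return f_N as a function, for samples assumed to lie in [0, 1] and m equal bins."""
    n = len(samples)
    h = 1.0 / m                       # bandwidth (bin width)
    counts = [0] * m                  # Y_j: number of observations in bin B_j
    for x in samples:
        j = min(int(x * m), m - 1)    # bin index; the clamp handles x == 1.0
        counts[j] += 1

    def f_n(x):
        if not 0.0 <= x <= 1.0:
            return 0.0
        j = min(int(x * m), m - 1)
        p_j = counts[j] / n           # p_j = Y_j / N
        return p_j / h
    return f_n

rng = random.Random(0)
# Illustrative data: Beta(2, 2) samples, whose true density is 6 x (1 - x).
samples = [rng.betavariate(2.0, 2.0) for _ in range(5000)]
f_n = histogram_density(samples, m=20)
print(f_n(0.5), "vs true density", 6 * 0.5 * 0.5)
```

Because each bin contributes $p_j / h$ over a width of $h$, the estimate integrates to one by construction.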
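For $x$ in a bin $[x_0, x_0 + h]$, the expectation of the histogram estimator is

$$
\begin{aligned}
E(f_N(x)) &= \frac{1}{h}\, E\big(I(x_i \in [x_0, x_0 + h])\big) \\
&= \frac{1}{h} \int I(x_i \in [x_0, x_0 + h])\, f(x_i)\, dx_i \\
&= \frac{1}{h} \int_{x_0}^{x_0 + h} \big[ f(x_0) + f'(\tilde{x})(x_i - x_0) \big]\, dx_i \\
&= f(x_0)\, \frac{1}{h} \int_{x_0}^{x_0 + h} dx_i + \frac{1}{h} \int_{x_0}^{x_0 + h} f'(\tilde{x})(x_i - x_0)\, dx_i
\end{aligned} \tag{2}
$$

where $f$ is the true density and $\tilde{x}$ is a point between $x_0$ and $x_i$ (mean value theorem). The first term in the above equation reduces to $f(x_0)$. If the derivative is assumed to be bounded, $|f'| \le C$, the second term is bounded in magnitude by $\frac{1}{h} \int_{x_0}^{x_0 + h} C (x_i - x_0)\, dx_i = Ch/2 \le Ch$. Consistency of the histogram therefore requires that $h \to 0$ as $N \to \infty$.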
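The bound on the bias of the histogram can be checked on a simple example. The sketch below assumes the density $f(x) = 2x$ on $[0, 1]$ (an illustrative choice, with $|f'| \le C = 2$) and uses the fact that the expected height of the histogram over a bin is the probability of the bin divided by $h$.

```python
def expected_histogram_value(cdf, x0, h):
    """E[f_N(x)] for x in the bin [x0, x0 + h]: bin probability divided by h."""
    return (cdf(x0 + h) - cdf(x0)) / h

# Illustrative density (an assumption): f(x) = 2x on [0, 1], so F(x) = x^2 and |f'| <= C = 2.
def density(x):
    return 2.0 * x

def cdf(x):
    return x * x

C = 2.0
x0 = 0.3
biases = {}
for h in (0.2, 0.1, 0.05, 0.01):
    biases[h] = expected_histogram_value(cdf, x0, h) - density(x0)
    print(f"h={h:<5} bias={biases[h]:.4f}  bound C*h={C * h:.4f}")
```

For this density the bias equals $h$ exactly, which stays below $Ch$ and shrinks to zero as the bins narrow.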

III. KERNEL DENSITY ESTIMATION

Kernel density estimation is a fundamental data smoothing problem in which inferences about the population are made based on a finite data sample. Let $(x_1, x_2, \ldots, x_N)$ be an independent and identically distributed sample drawn from some distribution with an unknown density $f$. We are interested in estimating the shape of this function $f$. Its kernel density estimator is

$$f_N(x) = \frac{1}{N} \sum_{i=1}^{N} K_h(x - x_i) = \frac{1}{hN} \sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right) \tag{3}$$
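A direct, unoptimized sketch of the kernel density estimator follows. It assumes a Gaussian kernel, standard normal test data, and a bandwidth of $h = 0.3$; all three are illustrative choices rather than recommendations, and the function name `kde` is hypothetical.

```python
import math
import random

def gaussian_kernel(u):
    """Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(samples, h, kernel=gaussian_kernel):
    """Return f_N(x) = (1 / (N h)) * sum_i K((x - x_i) / h) as a function of x."""
    n = len(samples)

    def f_n(x):
        return sum(kernel((x - xi) / h) for xi in samples) / (n * h)
    return f_n

rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # illustrative data: standard normal
f_n = kde(samples, h=0.3)                                # h = 0.3 is an arbitrary choice
print(f_n(0.0), "vs true density", gaussian_kernel(0.0))
```

Each evaluation costs one kernel call per sample, which is fine for a demonstration but slow for large data sets.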

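The mean squared error of $f_N(x)$ decomposes into squared bias and variance:

$$E\big((f(x) - f_N(x))^2\big) = \mathrm{bias}^2(f_N(x)) + V(f_N(x)) \tag{4}$$

where $\mathrm{bias}(f_N(x)) = E(f_N(x)) - f(x)$.

B. Mean and Variance of KDE

For our estimator $f_N(x) = \frac{1}{N} \sum_{i=1}^{N} K_h(x, x_i)$, where $K_h(x, x_i) = \frac{1}{h} K\!\left(\frac{x - x_i}{h}\right)$,

$$
\begin{aligned}
E(f_N(x)) &= \int \frac{1}{h} K\!\left(\frac{x - t}{h}\right) f(t)\, dt \\
&= \int K(t)\, f(x - ht)\, dt \\
&= f(x) + \frac{1}{2} h^2 f''(x) \int t^2 K(t)\, dt + \ldots
\end{aligned} \tag{5}
$$

We have used the fact that the kernel integrates to 1 and has zero mean in deriving the last equation. The bias can now be written as

$$E(f_N(x)) - f(x) = \frac{1}{2} h^2 f''(x) \int t^2 K(t)\, dt + O(h^4) \tag{6}$$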
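The leading bias term can be compared with the exact bias in a case where everything is computable. The sketch below assumes a Gaussian kernel and a standard normal true density (both illustrative assumptions), for which $f''(x) = (x^2 - 1) f(x)$ and $\int t^2 K(t)\, dt = 1$; the expectation is evaluated by a midpoint rule truncated to $|t| \le 10$.

```python
import math

def phi(x):
    """Standard normal density; used here both as the kernel K and as the true density f."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def midpoint_integral(func, a, b, steps=4000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return width * sum(func(a + (k + 0.5) * width) for k in range(steps))

def expected_kde(x, h):
    """E[f_N(x)] = integral of K(t) f(x - h t) dt, truncated to |t| <= 10."""
    return midpoint_integral(lambda t: phi(t) * phi(x - h * t), -10.0, 10.0)

def leading_bias(x, h):
    """(1/2) h^2 f''(x) * (second moment of K), with f''(x) = (x^2 - 1) phi(x) and second moment 1."""
    return 0.5 * h * h * (x * x - 1.0) * phi(x)

x = 0.0
for h in (0.4, 0.2, 0.1):
    exact_bias = expected_kde(x, h) - phi(x)
    print(f"h={h}: exact bias={exact_bias:.6f}, leading term={leading_bias(x, h):.6f}")
```

The two columns should agree more closely as $h$ decreases, since the neglected remainder is of order $h^4$.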

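The bias therefore vanishes, $E(f_N(x)) - f(x) \to 0$, as $h \to 0$. By a similar calculation, the variance can be written as

$$V(f_N(x)) = \frac{f(x) \int K(t)^2\, dt}{N h} + O\!\left(\frac{1}{N}\right) \tag{7}$$

so that $V(f_N(x)) \to 0$ as $N h \to \infty$.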
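The variance approximation can be checked in the same setting. This sketch again assumes a Gaussian kernel and a standard normal true density, and computes the exact variance of a single kernel term by numerical integration before dividing by the sample size; the helper names are hypothetical.

```python
import math

def phi(x):
    """Standard normal density; used here both as the kernel K and as the true density f."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def midpoint_integral(func, a, b, steps=4000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return width * sum(func(a + (k + 0.5) * width) for k in range(steps))

def kde_variance(x, h, n):
    """Exact Var[f_N(x)] = (E[K_h^2] - (E[K_h])^2) / n, after substituting t = (x - s) / h."""
    first = midpoint_integral(lambda t: phi(t) * phi(x - h * t), -10.0, 10.0)
    second = midpoint_integral(lambda t: phi(t) ** 2 * phi(x - h * t), -10.0, 10.0) / h
    return (second - first ** 2) / n

def leading_variance(x, h, n):
    """f(x) * (integral of K^2) / (n h); the integral is 1 / (2 sqrt(pi)) for the Gaussian kernel."""
    roughness = 1.0 / (2.0 * math.sqrt(math.pi))
    return phi(x) * roughness / (n * h)

n = 1000
for h in (0.2, 0.05, 0.01):
    ratio = kde_variance(0.0, h, n) / leading_variance(0.0, h, n)
    print(f"h={h}: exact / leading = {ratio:.4f}")
```

The ratio approaches one as $h$ shrinks, because the correction of order $1/N$ becomes small relative to the leading term of order $1/(Nh)$.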

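IV. OPTIMAL BANDWIDTH

When we differentiate the error function with respect to $h$ and set the derivative equal to zero, the optimal bandwidth that minimizes the error function can be asymptotically approximated as

$$h^* = \left[ \frac{\int K(t)^2\, dt}{\left( \int t^2 K(t)\, dt \right)^2 \int (f''(x))^2\, dx} \cdot \frac{1}{N} \right]^{1/5} \tag{8}$$

Since $h^* \propto N^{-1/5}$, we have $h^* \to 0$ and $N h^* \propto N^{4/5} \to \infty$ as $N \to \infty$. This choice therefore also satisfies $E(f_N(x)) - f(x) \to 0$ as $h \to 0$ and $V(f_N(x)) \to 0$ as $N h \to \infty$.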
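The optimal bandwidth depends on the unknown $f''$, so it cannot be evaluated directly from data. As an illustration only, the sketch below plugs in a Gaussian kernel and assumes the unknown density is normal with standard deviation `sigma`. This normal-reference assumption is not made in the paper, and the function names are hypothetical. Under that assumption the formula reduces to the familiar rule of thumb of roughly $1.06\, \sigma N^{-1/5}$.

```python
import math

def optimal_bandwidth(kernel_roughness, kernel_second_moment, density_curvature, n):
    """h* = [ R(K) / (mu_2(K)^2 * R(f'') * n) ]^(1/5)."""
    return (kernel_roughness / (kernel_second_moment ** 2 * density_curvature * n)) ** 0.2

def normal_reference_bandwidth(sigma, n):
    """h* for a Gaussian kernel when the unknown density is ASSUMED to be N(0, sigma^2)."""
    roughness = 1.0 / (2.0 * math.sqrt(math.pi))                 # integral of K(t)^2 dt
    second_moment = 1.0                                          # integral of t^2 K(t) dt
    curvature = 3.0 / (8.0 * math.sqrt(math.pi) * sigma ** 5)    # integral of f''(x)^2 dx
    return optimal_bandwidth(roughness, second_moment, curvature, n)

for n in (100, 1000, 10000):
    h = normal_reference_bandwidth(1.0, n)
    print(f"n={n:>6}: h* = {h:.4f}, n * h* = {n * h:.1f}")
```

The printed bandwidths shrink with the sample size while the product of sample size and bandwidth grows, matching the two limiting conditions stated above.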

V. CONCLUSION
Kernel density estimates are closely related to histograms, but can
be endowed with properties such as smoothness or continuity by
using a suitable kernel.

In the kernel density estimator (3), $K(\cdot)$ is the kernel, a non-negative function that integrates to one and has zero mean, and $h > 0$ is a smoothing parameter called the bandwidth.

REFERENCES

[1] http://en.wikipedia.org/wiki/Kernel_density_estimation
[2] Steven Kay, Fundamentals of Statistical Signal Processing, Volume I
[3] http://www.cc.gatech.edu/agray/6740fall09/
[4] http://athena.sas.upenn.edu/petra/class721/nonpar3.pdf