
Independent Component Analysis
For Time Series Separation

ICA
Blind Signal Separation (BSS) or Independent Component Analysis (ICA) is the
identification and separation of mixtures of sources with little prior
information.
Applications include:

Audio processing
Medical data
Finance
Array processing (beamforming)
Coding
and most applications where Factor Analysis and PCA are currently used.
While PCA seeks directions that represent the data best in a minimum
||x0 - x||^2 sense, ICA seeks directions that are maximally independent
from each other.
We will concentrate on Time Series separation of Multiple Targets
The simple Cocktail Party Problem
[Diagram: two sources s1, s2 pass through the mixing matrix A to give the
observations x1, x2]

x = As
n sources, m = n observations
Motivation
Two Independent Sources
Mixture at two Mics:

x1(t) = a11 s1 + a12 s2
x2(t) = a21 s1 + a22 s2

The a_ij depend on the distances of the microphones from the speakers.
Motivation
Get the Independent Signals out of the Mixture
ICA Model (Noise Free)
Use a statistical latent variable system:
random variables s_k instead of time signals.

x_j = a_j1 s_1 + a_j2 s_2 + ... + a_jn s_n, for all j

x = As
The ICs s are latent variables and are unknown, AND the mixing matrix A is
also unknown.
Task: estimate A and s using only the observable random vector x.
Let's assume that the number of ICs equals the number of observable mixtures,
and that A is square and invertible.
So after estimating A, we can compute W = A^-1 and hence

s = Wx = A^-1 x
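A minimal numpy sketch of this noise-free model (the toy signals and the
particular A below are illustrative assumptions, not from the slides): mix two
known sources, then recover them exactly with W = A^-1.

import numpy as np

# Two illustrative independent sources (toy signals, n = 2)
t = np.linspace(0, 8, 1000)
s = np.vstack([np.sin(2 * t),                  # s1: sinusoid
               np.sign(np.sin(3 * t))])        # s2: square wave

A = np.array([[1.0, 0.5],                      # assumed square, invertible A
              [0.3, 1.0]])

x = A @ s                                      # observed mixtures: x = As

W = np.linalg.inv(A)                           # W = A^-1
s_hat = W @ x                                  # s = Wx = A^-1 x

print(np.allclose(s, s_hat))                   # True: exact recovery when A is known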

Illustration
2 ICs with distribution:

p(s_i) = 1 / (2 sqrt(3))  if |s_i| <= sqrt(3),  0 otherwise

Zero mean and variance equal to 1.

Mixing matrix A is

A = [ 2  3 ]
    [ 2  1 ]

The edges of the parallelogram are in the directions of the columns of A.
So if we can estimate the joint pdf of x1 & x2 and then locate the edges,
we can estimate A.
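A small sketch of this illustration (the sampling is my addition; the density
and A are from the slide): mixing the uniform ICs produces a point cloud of
(x1, x2) shaped as a parallelogram whose corners lie along sums of the
columns of A.

import numpy as np

rng = np.random.default_rng(1)

# Uniform ICs on [-sqrt(3), sqrt(3)]: zero mean, unit variance
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])          # mixing matrix from this slide
x = A @ s                           # (x1, x2) points fill a parallelogram

# The corner reached at s = (sqrt(3), sqrt(3)) is sqrt(3) * (col1 + col2):
corner = A @ np.array([np.sqrt(3), np.sqrt(3)])
print(corner)                       # [8.66, 5.20]
print(x.max(axis=1))                # sample extremes approach that corner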
Restrictions
The s_i are statistically independent:
p(s1, s2) = p(s1) p(s2)

Nongaussian distributions:
The joint density of unit-variance gaussian s1 & s2 is symmetric:

p(x1, x2) = (1 / (2 pi)) exp( -(x1^2 + x2^2) / 2 )

So it doesn't contain any information about the directions of the columns of
the mixing matrix A, and A cannot be estimated.
If only one IC is gaussian, the estimation is still possible.
Ambiguities
Can't determine the variances (energies) of the ICs:
Both s & A are unknown, so any scalar multiple of one of the sources can
always be cancelled by dividing the corresponding column of A by it.
Fix the magnitudes of the ICs by assuming unit variance: E{s_i^2} = 1.
Only the ambiguity of sign remains.
Can't determine the order of the ICs:
The terms can be freely reordered, because both s and A are unknown. So we
can call any IC the first one.
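The scaling ambiguity is easy to check numerically; a toy sketch (all numbers
assumed):

import numpy as np

A = np.array([[1.0, 0.5],
              [0.3, 1.0]])          # toy mixing matrix
s = np.array([[2.0],
              [3.0]])               # toy source values

alpha = 5.0
A2 = A.copy()
A2[:, 0] /= alpha                   # divide the first column of A by alpha...
s2 = s.copy()
s2[0] *= alpha                      # ...and multiply the first source by alpha

print(np.allclose(A @ s, A2 @ s2))  # True: the observed x = As is unchanged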
ICA Principle (Non-Gaussian is Independent)
The key to estimating A is non-gaussianity:
the distribution of a sum of independent random variables tends toward a
Gaussian distribution (by the CLT).

[Figure: densities f(s1), f(s2) and the more gaussian mixture density
f(x1) = f(s1 + s2)]

Consider y = w^T x = w^T A s = z^T s,
where w is one of the rows of matrix W.

y is a linear combination of the s_i, with weights given by the z_i.
Since a sum of independent r.v.s is more gaussian than the individual r.v.s,
z^T s is more gaussian than any single s_i, and becomes least gaussian when
it equals one of the s_i.
So we can take w to be a vector that maximizes the non-gaussianity of w^T x.
Such a w corresponds to a z with only one non-zero component, so we recover
one of the s_i.
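A brute-force sketch of this principle (a toy angle scan of my own, not the
FastICA algorithm): on centered, whitened 2-D data (whitening is covered
later in these slides), the unit vector w maximizing |kurt(w^T x)| lines up
with one of the ICs.

import numpy as np

rng = np.random.default_rng(2)

# Sub-gaussian ICs (uniform), mixed and then whitened
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 20000))
A = np.array([[2.0, 3.0], [2.0, 1.0]])
x = A @ s
x = x - x.mean(axis=1, keepdims=True)          # center
d, E = np.linalg.eigh(np.cov(x))               # EVD of the covariance
z = E @ np.diag(d ** -0.5) @ E.T @ x           # whitened data

def kurt(y):
    return np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2

# Scan unit vectors w = (cos t, sin t); keep the most non-gaussian projection
angles = np.linspace(0, np.pi, 361)
best = max(angles, key=lambda t: abs(kurt(np.cos(t) * z[0] + np.sin(t) * z[1])))
w = np.array([np.cos(best), np.sin(best)])
y = w @ z                                      # ≈ one of the s_i, up to sign
print(round(abs(kurt(y)), 2))                  # ≈ 1.2: far from the gaussian value 0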
Measures of Non-Gaussianity
We need a quantitative measure of non-gaussianity for ICA estimation.

Kurtosis: gauss = 0 (sensitive to outliers)
kurt(y) = E{y^4} - 3 (E{y^2})^2

Entropy: gauss = largest
H(y) = - ∫ f(y) log f(y) dy

Negentropy: gauss = 0 (difficult to estimate)
J(y) = H(y_gauss) - H(y)

Approximations:
J(y) ≈ (1/12) E{y^3}^2 + (1/48) kurt(y)^2
J(y) ≈ [ E{G(y)} - E{G(v)} ]^2

where v is a standard gaussian random variable and:
G1(u) = (1/a) log cosh(a u)
G2(u) = -exp(-u^2 / 2)
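These measures are easy to evaluate on samples; a minimal numpy sketch
(choosing a = 1 in G1 is my assumption):

import numpy as np

rng = np.random.default_rng(3)

def kurtosis(y):
    # kurt(y) = E{y^4} - 3 (E{y^2})^2 ; zero for a gaussian
    return np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2

def negentropy_approx(y, v):
    # J(y) ≈ [E{G(y)} - E{G(v)}]^2 with G(u) = (1/a) log cosh(a u), here a = 1
    G = lambda u: np.log(np.cosh(u))
    return (np.mean(G(y)) - np.mean(G(v))) ** 2

v = rng.standard_normal(200000)                 # standard gaussian reference
print(round(kurtosis(v), 3))                    # ≈ 0 for gaussian data
u = rng.uniform(-np.sqrt(3), np.sqrt(3), 200000)
print(round(kurtosis(u), 3))                    # ≈ -1.2 (sub-gaussian)
lap = rng.laplace(size=200000) / np.sqrt(2)     # unit-variance laplacian
print(negentropy_approx(lap, v))                # > 0 (super-gaussian)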
Data Centering & Whitening
Centering:
x = x - E{x}
This doesn't mean that ICA cannot estimate the mean; it just simplifies the
algorithm.
The ICs are also zero mean, because E{s} = W E{x}.
After ICA, add W E{x} to the zero-mean ICs.

Whitening:
We transform the x's linearly so that the x~ are white (uncorrelated, unit
variance). This is done by eigenvalue decomposition (EVD):

x~ = (E D^(-1/2) E^T) x = E D^(-1/2) E^T A s = A~ s

where E{x x^T} = E D E^T.

So we only have to estimate the orthonormal matrix A~.
An orthonormal matrix has n(n-1)/2 degrees of freedom, so for
high-dimensional A we have to estimate only half as many parameters. This
greatly simplifies ICA.
Reducing the dimension of the data (keeping the dominant eigenvalues) while
whitening also helps.
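A minimal sketch of centering and EVD-based whitening (toy data assumed):

import numpy as np

rng = np.random.default_rng(4)
A = np.array([[2.0, 3.0], [2.0, 1.0]])            # toy mixing matrix
x = A @ rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 10000))

x = x - x.mean(axis=1, keepdims=True)             # centering: x <- x - E{x}

d, E = np.linalg.eigh(np.cov(x))                  # E{x x^T} = E D E^T (EVD)
x_white = E @ np.diag(d ** -0.5) @ E.T @ x        # x~ = E D^(-1/2) E^T x

print(np.round(np.cov(x_white), 2))               # ≈ identity matrix: x~ is white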
Noisy ICA Model
x = As + n
A ... m x n mixing matrix
s ... n-dimensional vector of ICs
n ... m-dimensional random noise vector

Same assumptions as for the noise-free model, if we use measures of
non-gaussianity that are immune to gaussian noise.
So gaussian moments are used as contrast functions, i.e.

J(y) ≈ [ E{G(y)} - E{G(v)} ]^2
with G(y) = (1 / (sqrt(2 pi) c)) exp(-y^2 / (2 c^2))

However, in pre-whitening the effect of the noise must be taken into account:

x~ = (E{x x^T} - Σ)^(-1/2) x
x~ = Bs + n~

where Σ is the noise covariance matrix.
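A sketch of this quasi-whitening step, assuming the noise covariance Σ is
known and isotropic (Σ = σ² I is a simplifying assumption of this example):

import numpy as np

rng = np.random.default_rng(5)
A = np.array([[2.0, 3.0], [2.0, 1.0]])
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 50000))
sigma = 0.1
x = A @ s + sigma * rng.standard_normal((2, 50000))   # x = As + n

Sigma = sigma ** 2 * np.eye(2)                        # assumed known noise covariance
C = np.cov(x) - Sigma                                 # E{x x^T} - Sigma ≈ A A^T
d, E = np.linalg.eigh(C)
x_qw = E @ np.diag(d ** -0.5) @ E.T @ x               # x~ = (E{x x^T} - Sigma)^(-1/2) x

print(np.round(np.cov(x_qw), 2))                      # ≈ I plus a small noise term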
