$$I(u) = E\left[\ln \frac{f_u(u)}{\prod_{i=1}^{N} f_{u_i}(u_i)}\right] \ge 0$$
where $E[\cdot]$ denotes expected value. Unlike the sources, $s_i$, which are assumed to be temporally independent, the observed mixtures of the sources, $x_i$, are statistically dependent on each other, so the mutual information between pairs of mixtures, $I(x_i, x_j)$, is in general positive. The blind separation problem is to find a matrix, $W$, such that the linear transformation
$$u = Wx = WAs$$
reestablishes the condition $I(u_i, u_j) = 0$ for all $i \ne j$.
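To make the mixing model concrete, here is a small synthetic sketch (the Laplacian sources, mixing matrix, and variable names are invented for illustration; the oracle inverse stands in for the $W$ that ICA must learn):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 3, 10000
S = rng.laplace(size=(N, T))   # hypothetical independent sources s_i
A = rng.normal(size=(N, N))    # unknown square mixing matrix A
X = A @ S                      # observed mixtures x = As; rows of X are dependent

# A successful unmixing matrix W makes WA a scaled permutation, so that
# u = Wx = WAs recovers the sources and I(u_i, u_j) = 0 for i != j.
W_oracle = np.linalg.inv(A)    # oracle answer, shown only to illustrate u = Wx
U = W_oracle @ X
```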
Consider the joint entropy of two nonlinearly transformed components of $y$:
$$H(y_1, y_2) = H(y_1) + H(y_2) - I(y_1, y_2)$$
where $y_i = g(u_i)$ and $g(\cdot)$ is an invertible, bounded nonlinearity. Through its Taylor series expansion, the nonlinearity provides the higher-order statistics that are necessary to establish independence.
Maximizing this joint entropy involves maximizing the individual entropies, $H(y_1)$ and $H(y_2)$, while minimizing the mutual information, $I(y_1, y_2)$, shared between the two. Thus, maximizing $H(y)$, in general, minimizes $I(y)$. When this latter quantity is zero, the two variables are statistically independent.
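The same decomposition holds for all $N$ components at once; restating it in general form (a standard identity, added here for clarity):
$$H(y) = \sum_{i=1}^{N} H(y_i) - I(y_1, \ldots, y_N)$$
Because $g(\cdot)$ is bounded, each marginal entropy $H(y_i)$ has a finite maximum, so continued increases in $H(y)$ must come from driving the mutual-information term toward zero.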
The algorithm attempts to maximize the entropy $H(y)$ by iteratively adjusting the elements of the square matrix, $W$, using small batches of data vectors (normally 10 or more) drawn randomly from $\{x\}$ without substitution, according to Bell and Sejnowski (1995):
$$\Delta W \propto \frac{\partial H(y)}{\partial W}\, W^T W = \left[I + \varphi u^T\right] W, \quad \text{where } \varphi_i = \frac{\partial}{\partial u_i} \ln \frac{\partial y_i}{\partial u_i}.$$
The $(W^T W)$ natural gradient term (Amari et al., 1996; Cardoso & Laheld, 1996) avoids matrix inversions and speeds convergence.
The form of the nonlinearity $g(u)$ plays an essential role in the success of the algorithm. The ideal form for $g(u)$ is the cumulative distribution function (c.d.f.) of the distributions of the independent sources. When $g(u)$ is a sigmoid function (as in Bell & Sejnowski, 1995), the algorithm is limited to separating sources with super-Gaussian distributions.
A way of generalizing the learning rule to sources with either sub-Gaussian or super-Gaussian distributions is to estimate the p.d.f. of the sources using a parametric density model. Sub-Gaussians can be modeled with a symmetrical form of the Pearson mixture model (Pearson, 1901), as proposed in Girolami (1998) and Lee et al. (1999), whereas super-Gaussians can be modeled as the derivative of the hyperbolic tangent (Girolami, 1998; Lee et al., 1999). For sub-Gaussians, the following approximation is possible: $\varphi_i = \tanh(u_i) - u_i$. For super-Gaussians, the same approximation becomes $\varphi_i = -\tanh(u_i) - u_i$. The two equations can be combined as
$$\Delta W \propto \left[I - K\tanh(u)u^T - uu^T\right]W, \qquad k_i = \begin{cases} 1 & \text{super-Gaussian} \\ -1 & \text{sub-Gaussian} \end{cases}$$
where the $k_i$ are elements of the $N$-dimensional diagonal matrix $K$. The $k_i$ can be derived from the generic stability analysis (Cardoso, 1998; Cardoso & Laheld, 1996; Pham, 1997) of separating solutions. This yields the choice of $k_i$ used by Lee et al. (1999),
$$k_i = \operatorname{sign}\left(E\left[\operatorname{sech}^2(u_i)\right]E\left[u_i^2\right] - E\left[\tanh(u_i)\,u_i\right]\right),$$
which ensures stability of the learning rule.
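A minimal sketch of one batch update under this extended rule, with the $k_i$ estimated from the batch moments; as before, the function name and learning rate are illustrative, not from the paper:

```python
import numpy as np

def extended_infomax_batch_update(W, X, lr=1e-3):
    """One extended-infomax update (Lee et al., 1999 form).

    W : (N, N) current unmixing matrix
    X : (N, B) batch of B observed mixture vectors
    """
    N, B = X.shape
    U = W @ X                                  # u = Wx
    T = np.tanh(U)
    # Switching criterion per component, using sech^2(u) = 1 - tanh^2(u):
    # k_i = sign(E[sech^2(u_i)] E[u_i^2] - E[tanh(u_i) u_i])
    k = np.sign((1.0 - T**2).mean(axis=1) * (U**2).mean(axis=1)
                - (T * U).mean(axis=1))
    K = np.diag(k)                             # +1: super-Gaussian, -1: sub-Gaussian
    # dW ~ [I - K tanh(u) u^T - u u^T] W, averaged over the batch
    dW = (np.eye(N) - (K @ T @ U.T) / B - (U @ U.T) / B) @ W
    return W + lr * dW
```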
Note that although a nonlinear function is used in determining $W$, once the algorithm converges and $W$ is found, the decomposition is a linear transformation, $u = Wx$. This extended infomax algorithm was used to analyze the EEG recordings in this study.