MICROARRAY DATA
Represented by an N × M matrix
$(y_1, \ldots, y_M)$
$y_j$ contains the gene expressions for the N genes
of the jth tissue sample (j = 1, …, M).
N = No. of genes ($10^3$ to $10^4$)
M = No. of tissue samples (10 to $10^2$)
[Figure: the N × M expression matrix. The N rows (Gene 1, Gene 2, ..., Gene N; N ~ $10^4$) are expression profiles; the M columns (samples; M ~ $10^2$) are expression signatures.]
Two Clustering Problems:
$$f(y_j) = \pi_1 f_1(y_j) + \cdots + \pi_g f_g(y_j)$$
where
$$\phi(y_j; \mu, \sigma^2) = (2\pi)^{-1/2}\, \sigma^{-1} \exp\{-\tfrac{1}{2}(y_j - \mu)^2 / \sigma^2\}$$
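As a minimal sketch of the mixture density above, the following evaluates a univariate normal mixture on a grid and checks numerically that it integrates to about one. The function names and the two-component parameter values are illustrative assumptions, not values from the slides.

```python
import numpy as np

def normal_pdf(y, mu, sigma):
    """Univariate normal density phi(y; mu, sigma^2)."""
    return (2.0 * np.pi) ** -0.5 / sigma * np.exp(-0.5 * (y - mu) ** 2 / sigma ** 2)

def mixture_pdf(y, pis, mus, sigmas):
    """f(y) = sum_i pi_i * phi(y; mu_i, sigma_i^2), for scalar or array y."""
    y = np.asarray(y, dtype=float)
    comps = [p * normal_pdf(y, m, s) for p, m, s in zip(pis, mus, sigmas)]
    return np.sum(comps, axis=0)

# Illustrative (made-up) parameters for a two-component mixture.
pis, mus, sigmas = [0.3, 0.7], [0.0, 3.0], [1.0, 1.0]
grid = np.linspace(-10.0, 13.0, 20001)
density = mixture_pdf(grid, pis, mus, sigmas)

# Riemann-sum check that the mixture density integrates to about 1.
area = np.sum(density) * (grid[1] - grid[0])
print(round(area, 6))
```

The mixing proportions must be non-negative and sum to one for f to be a density; the sketch does not enforce this.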
[Figure: panels for Δ = 3 and Δ = 4]
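$\mu_2$:     $(0, 0)^T$        $(0, 0)^T$      $(0.360, 0.115)^T$
$\mu_3$:     $(0, 2)^T$        $(1, 0)^T$      $(-0.004, 2.027)^T$
$\Sigma_1$:  [2 0; 0 0.2]      [1 0; 0 1]      [1.961 -0.016; -0.016 0.218]
$\Sigma_2$:  [2 0; 0 0.2]      [1 0; 0 1]      [2.346 -0.553; -0.553 0.218]
$\Sigma_3$:  [2 0; 0 0.2]      [1 0; 0 1]      [2.339 0.042; 0.042 0.206]
(Each 2 × 2 matrix is written row by row, with rows separated by a semicolon.)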
Figure 7
Figure 8
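MIXTURE OF g NORMAL COMPONENTS
$$f(y) = \pi_1 \phi(y; \mu_1, \Sigma_1) + \cdots + \pi_g \phi(y; \mu_g, \Sigma_g)$$
where
$$-2 \log \phi(y; \mu, \Sigma) = (y - \mu)^T \Sigma^{-1} (y - \mu) + \text{constant}$$
MAHALANOBIS DISTANCE: $(y - \mu)^T \Sigma^{-1} (y - \mu)$
EUCLIDEAN DISTANCE: $(y - \mu)^T (y - \mu)$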
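To make the contrast between the two distances concrete, here is a small sketch that computes both squared distances and the full value of −2 log φ. The function names are hypothetical, and the test point is made up; the covariance matrix with diagonal (2, 0.2) is used purely as an illustration.

```python
import numpy as np

def mahalanobis_sq(y, mu, Sigma):
    """Squared Mahalanobis distance (y - mu)^T Sigma^{-1} (y - mu)."""
    d = np.asarray(y, dtype=float) - np.asarray(mu, dtype=float)
    return float(d @ np.linalg.solve(Sigma, d))

def euclidean_sq(y, mu):
    """Squared Euclidean distance (y - mu)^T (y - mu)."""
    d = np.asarray(y, dtype=float) - np.asarray(mu, dtype=float)
    return float(d @ d)

def neg2_log_normal_pdf(y, mu, Sigma):
    """-2 log phi(y; mu, Sigma): Mahalanobis term plus a term constant in y."""
    p = len(mu)
    _, logdet = np.linalg.slogdet(Sigma)
    return mahalanobis_sq(y, mu, Sigma) + logdet + p * np.log(2.0 * np.pi)

Sigma = np.array([[2.0, 0.0], [0.0, 0.2]])   # illustrative covariance
mu = np.zeros(2)
y = np.array([1.0, 1.0])                      # made-up test point

print(mahalanobis_sq(y, mu, Sigma))  # 1/2 + 1/0.2 = 5.5
print(euclidean_sq(y, mu))           # 1 + 1 = 2.0
```

The point is equally far from the mean along both axes in Euclidean terms, but the Mahalanobis distance penalises the displacement along the low-variance axis far more heavily.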
MIXTURE OF g NORMAL COMPONENTS
k-means
$\Sigma_1 = \cdots = \Sigma_g = \sigma^2 I$
SPHERICAL CLUSTERS
Equal spherical covariance matrices
With a mixture model-based approach to
clustering, an observation is assigned
outright to the ith cluster if its density in
the ith component of the mixture
distribution (weighted by the prior
probability of that component) is greater
than in the other (g-1) components.
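The outright assignment rule described above can be sketched as follows, assuming multivariate normal components. The function names and the example parameters are illustrative assumptions; the weights are computed in log space only for numerical stability.

```python
import numpy as np

def posterior_probs(Y, pis, mus, Sigmas):
    """Return an (n, g) array whose (j, i) entry is proportional to
    pi_i * phi(y_j; mu_i, Sigma_i), normalised over the g components."""
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    n, p = Y.shape
    log_w = np.empty((n, len(pis)))
    for i, (pi_i, mu, Sigma) in enumerate(zip(pis, mus, Sigmas)):
        Sigma = np.asarray(Sigma, dtype=float)
        diff = Y - np.asarray(mu, dtype=float)
        # Squared Mahalanobis distance of every observation to component i.
        maha = np.einsum("nj,nj->n", diff, np.linalg.solve(Sigma, diff.T).T)
        _, logdet = np.linalg.slogdet(Sigma)
        log_w[:, i] = np.log(pi_i) - 0.5 * (maha + logdet + p * np.log(2.0 * np.pi))
    log_w -= log_w.max(axis=1, keepdims=True)   # guard against underflow
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)

def assign(Y, pis, mus, Sigmas):
    """Assign each observation to the component with the largest weighted density."""
    return posterior_probs(Y, pis, mus, Sigmas).argmax(axis=1)

# Illustrative two-component example (made-up parameters).
S = np.array([[2.0, 0.0], [0.0, 0.2]])
labels = assign([[0.0, 0.1], [0.0, 1.9]], [0.5, 0.5], [[0.0, 0.0], [0.0, 2.0]], [S, S])
print(labels)
```

Taking the argmax of the normalised weights is equivalent to comparing the prior-weighted component densities directly, since the normalising constant is shared across components.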
Principal components or a
single-factor analysis model
provides only a global linear
model.
where
$\Sigma_i = B_i B_i^T + D_i \quad (i = 1, \ldots, g),$
$B_i$ is a p × q matrix and $D_i$ is a
diagonal matrix.
Single-Factor Analysis Model
$Y_j = \mu + B U_j + e_j \quad (j = 1, \ldots, n),$
where $U_j$ is a q-dimensional (q < p)
vector of latent or unobservable
variables called factors and $B$ is a
p × q matrix of factor loadings.
The $U_j$ are iid $N(0, I_q)$,
independently of the errors $e_j$,
which are iid $N(0, D)$, where $D$
is a diagonal matrix
$D = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2)$
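A quick simulation can illustrate why the factor model implies a covariance matrix of the form B Bᵀ + D. This is a sketch under made-up values of B and D (not taken from the slides); it draws factors and errors as specified above and compares the sample covariance with the model covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Made-up loadings (p = 5, q = 2) and error variances, for illustration only.
B = np.array([[1.0, 0.0],
              [1.0, 0.5],
              [0.5, 1.0],
              [0.0, 1.0],
              [0.5, 0.5]])
d = np.array([0.5, 1.0, 1.0, 0.5, 1.0])   # diagonal of D
mu = np.zeros(B.shape[0])

U = rng.normal(size=(n, B.shape[1]))           # factors, iid N(0, I_q)
e = rng.normal(size=(n, B.shape[0])) * np.sqrt(d)  # errors, iid N(0, D)
Y = mu + U @ B.T + e                            # Y_j = mu + B U_j + e_j

Sigma_model = B @ B.T + np.diag(d)
Sigma_sample = np.cov(Y, rowvar=False)
max_abs_err = np.abs(Sigma_sample - Sigma_model).max()
print(max_abs_err)
```

The discrepancy shrinks at the usual Monte Carlo rate as n grows; the off-diagonal structure comes entirely from B, since the errors are uncorrelated across coordinates.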
Conditional on ith component
membership of the mixture,
$\tfrac{1}{2}\{(p - q)^2 - (p + q)\}$
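The expression ½{(p − q)² − (p + q)} matches the standard count of how many covariance parameters are saved by a q-factor model relative to an unrestricted p × p covariance matrix. That reading is an assumption, since the surrounding slide text did not survive. The sketch below checks the arithmetic, taking p(p + 1)/2 parameters for the unrestricted matrix and pq + p − q(q − 1)/2 for B Bᵀ + D (the last term reflecting the rotational constraints on B); the function names are hypothetical.

```python
def n_cov_params_full(p):
    """Free parameters in an unrestricted symmetric p x p covariance matrix."""
    return p * (p + 1) // 2

def n_cov_params_factor(p, q):
    """Free parameters in B B^T + D, with B of size p x q and D diagonal,
    after subtracting q(q - 1)/2 for the rotational indeterminacy of B."""
    return p * q + p - q * (q - 1) // 2

def reduction(p, q):
    """(1/2) * {(p - q)^2 - (p + q)}; the bracket is always even."""
    return ((p - q) ** 2 - (p + q)) // 2

p, q = 10, 2
print(n_cov_params_full(p), n_cov_params_factor(p, q), reduction(p, q))
```

For p = 10 and q = 2 this gives 55 unrestricted parameters against 29 under the factor model, a saving of 26. The saving grows roughly quadratically in p, which is the attraction for high-dimensional expression data.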
$$\mu_i^{(k+1)} = \sum_{j=1}^{n} \tau_{ij}^{(k)} y_j \Big/ \sum_{j=1}^{n} \tau_{ij}^{(k)}$$
for i = 1, ..., g.
M step on 2nd cycle:
$$B_i^{(k+1)} = V_i^{(k+1/2)} \gamma_i^{(k)} \left( \gamma_i^{(k)T} V_i^{(k+1/2)} \gamma_i^{(k)} + \omega_i^{(k)} \right)^{-1}$$
$$D_i^{(k+1)} = \mathrm{diag}\{ V_i^{(k+1/2)} - V_i^{(k+1/2)} \gamma_i^{(k)} B_i^{(k+1)T} \}$$
where
$$\gamma_i^{(k)} = \left( B_i^{(k)} B_i^{(k)T} + D_i^{(k)} \right)^{-1} B_i^{(k)},$$
$$\omega_i^{(k)} = I_q - \gamma_i^{(k)T} B_i^{(k)}$$
$V_i^{(k+1/2)}$ is given by
$$V_i^{(k+1/2)} = \frac{\sum_{j=1}^{n} \tau_i(y_j; \Psi^{(k+1/2)}) (y_j - \mu_i^{(k+1)})(y_j - \mu_i^{(k+1)})^T}{\sum_{j=1}^{n} \tau_i(y_j; \Psi^{(k+1/2)})}$$
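The second-cycle updates can be sketched for a single component as below. This is a minimal illustration rather than a full fitting routine: it assumes the posterior probabilities and the updated mean are already available, stores the diagonal matrix D_i as a vector, and uses hypothetical function names.

```python
import numpy as np

def weighted_scatter(Y, tau_i, mu_i):
    """V_i: posterior-weighted scatter of the rows of Y about mu_i."""
    diff = Y - mu_i
    return (diff * tau_i[:, None]).T @ diff / tau_i.sum()

def update_loadings(V, B, d):
    """One update of (B_i, D_i) given V_i.

    V : (p, p) weighted scatter matrix
    B : (p, q) current loadings
    d : (p,)   diagonal of the current D_i
    """
    q = B.shape[1]
    Sigma = B @ B.T + np.diag(d)
    gamma = np.linalg.solve(Sigma, B)        # (B B^T + D)^{-1} B
    omega = np.eye(q) - gamma.T @ B          # I_q - gamma^T B
    Vg = V @ gamma
    B_new = Vg @ np.linalg.inv(gamma.T @ Vg + omega)
    d_new = np.diag(V - Vg @ B_new.T).copy() # diag{V - V gamma B_new^T}
    return B_new, d_new

# Sanity check with made-up values: if V equals B B^T + D exactly,
# the update should leave B and D unchanged.
B0 = np.array([[1.0, 0.0], [1.0, 0.5], [0.5, 1.0], [0.0, 1.0]])
d0 = np.array([0.5, 1.0, 1.0, 0.5])
B1, d1 = update_loadings(B0 @ B0.T + np.diag(d0), B0, d0)
print(np.allclose(B1, B0), np.allclose(d1, d0))
```

The fixed-point check works because, when V equals the current model covariance, the bracketed q × q matrix reduces to the identity and V γ reduces to B.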
Work in q-dim space:
$$(B_i B_i^T + D_i)^{-1} = D_i^{-1} - D_i^{-1} B_i (I_q + B_i^T D_i^{-1} B_i)^{-1} B_i^T D_i^{-1},$$
$$|B_i B_i^T + D_i| = |D_i| \,/\, |I_q - B_i^T (B_i B_i^T + D_i)^{-1} B_i|.$$
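Both identities can be checked numerically. The sketch below, with made-up dimensions and random values, inverts only a q × q matrix and compares the results with direct p × p computations; the function names are hypothetical. For the determinant it uses |D| · |I_q + BᵀD⁻¹B|, which equals the form above because I_q − Bᵀ(BBᵀ + D)⁻¹B is the inverse of I_q + BᵀD⁻¹B.

```python
import numpy as np

def inv_via_q_dim(B, d):
    """(B B^T + D)^{-1} using only a q x q solve; d is the diagonal of D."""
    Dinv_B = B / d[:, None]                       # D^{-1} B
    small = np.eye(B.shape[1]) + B.T @ Dinv_B     # I_q + B^T D^{-1} B
    return np.diag(1.0 / d) - Dinv_B @ np.linalg.solve(small, Dinv_B.T)

def logdet_via_q_dim(B, d):
    """log|B B^T + D| = log|D| + log|I_q + B^T D^{-1} B|."""
    Dinv_B = B / d[:, None]
    small = np.eye(B.shape[1]) + B.T @ Dinv_B
    return np.log(d).sum() + np.linalg.slogdet(small)[1]

rng = np.random.default_rng(1)
p, q = 50, 3                                      # made-up sizes
B = rng.normal(size=(p, q))
d = rng.uniform(0.5, 2.0, size=p)
Sigma = B @ B.T + np.diag(d)

print(np.allclose(inv_via_q_dim(B, d), np.linalg.inv(Sigma)))
print(np.isclose(logdet_via_q_dim(B, d), np.linalg.slogdet(Sigma)[1]))
```

With thousands of genes and a handful of factors, replacing a p × p inversion by a q × q one is what keeps each iteration affordable.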
$$\hat{D}_i = \mathrm{diag}(\hat{V}_i - \hat{B}_i \hat{B}_i^T),$$
where
$$\hat{V}_i = \frac{\sum_{j=1}^{n} \tau_i(y_j; \hat{\Psi}) (y_j - \hat{\mu}_i)(y_j - \hat{\mu}_i)^T}{\sum_{j=1}^{n} \tau_i(y_j; \hat{\Psi})}$$
With EM:
where
$$W_i^{(k)} = (n \hat{\pi}_i^{(k)})^{-1} \sum_{j=1}^{n} \tau_{ij}^{(k)} E_i^{(k)}(U_j U_j^T \mid y_j)$$
To avoid potential computational problems
with small-sized clusters, we impose the
constraint
$D_i = D \quad (i = 1, \ldots, g)$
$$\mu_i^{(k+1)} = \sum_{j=1}^{n} \tau_{ij}^{(k)} u_{ij}^{(k)} y_j \Big/ \sum_{j=1}^{n} \tau_{ij}^{(k)} u_{ij}^{(k)}$$
$V_i^{(k+1/2)}$ is given by
$$V_i^{(k+1/2)} = \frac{\sum_{j=1}^{n} \tau_i(y_j; \Psi^{(k+1/2)})\, u_{ij}^{(k)} (y_j - \mu_i^{(k+1)})(y_j - \mu_i^{(k+1)})^T}{\sum_{j=1}^{n} \tau_i(y_j; \Psi^{(k+1/2)})\, u_{ij}^{(k)}}$$
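The effect of the extra weights u_ij can be illustrated with a short sketch. It takes the posterior probabilities and the u weights as given inputs (how the u weights are computed is not shown in this section) and forms the weighted mean and scatter with the product τ·u in both numerator and denominator, as in the formulas above. The function name and the data are made up.

```python
import numpy as np

def weighted_mean_and_scatter(Y, tau_i, u_i):
    """Weighted mean and scatter for one component, with weights tau_ij * u_ij."""
    w = tau_i * u_i
    mu = w @ Y / w.sum()
    diff = Y - mu
    V = (diff * w[:, None]).T @ diff / w.sum()
    return mu, V

# Made-up one-dimensional data with a single gross outlier at 10.
Y = np.array([[0.0], [0.2], [-0.2], [10.0]])
tau = np.ones(4)

mu_plain, _ = weighted_mean_and_scatter(Y, tau, np.ones(4))
mu_down, _ = weighted_mean_and_scatter(Y, tau, np.array([1.0, 1.0, 1.0, 0.01]))
print(mu_plain[0], mu_down[0])
```

With all u weights equal to one the update reduces to the ordinary posterior-weighted mean (2.5 here); giving the outlier a small weight pulls the estimate back towards the bulk of the data.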
Number of Components
in a Mixture Model
Testing for the number of components,
g, in a mixture is an important but very
difficult problem which has not been
completely resolved.
Order of a Mixture Model
A mixture density with g components might
be empirically indistinguishable from one
with either fewer than g components or
more than g components. It is therefore
sensible in practice to approach the question
of the number of components in a mixture
model in terms of an assessment of the
smallest number of components in the
mixture compatible with the data.
Likelihood Ratio Test Statistic
An obvious way of approaching the problem of
testing for the smallest value of the number of
components in a mixture model is to use the
likelihood ratio test statistic (LRTS), $-2\log\lambda$.
Suppose we wish to test the null hypothesis
$H_0: g = g_0$ versus $H_1: g = g_1$
for some $g_1 > g_0$.
We let $\hat{\Psi}_i$ denote the MLE of $\Psi$ calculated
under $H_i$ (i = 0, 1). Then the evidence against
$H_0$ will be strong if $\lambda$ is sufficiently small,
or equivalently, if $-2\log\lambda$ is sufficiently
large, where
$$-2\log\lambda = 2\{\log L(\hat{\Psi}_1) - \log L(\hat{\Psi}_0)\}.$$
http://www.bioinformatics.oupjournals.org/cgi/screenpdf/18/3/413.pdf
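As a rough illustration of the likelihood ratio statistic for g = 1 versus g = 2, the sketch below fits both models to simulated univariate data and forms twice the difference in maximised log likelihoods. Everything here is an assumption made for illustration: the data are simulated, the EM routine is a bare-bones version with a crude starting split, and no attempt is made to assess the significance of the statistic.

```python
import numpy as np

def log_normal_pdf(y, mu, var):
    """Log of the univariate normal density with mean mu and variance var."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (y - mu) ** 2 / var)

def loglik_one_component(y):
    """Maximised log likelihood under a single normal (MLE mean and variance)."""
    return log_normal_pdf(y, y.mean(), y.var()).sum()

def loglik_two_components(y, n_iter=200):
    """Log likelihood after n_iter EM steps for a two-component normal mixture."""
    med = np.median(y)
    lo, hi = y[y <= med], y[y > med]           # crude start: split at the median
    pis = np.array([0.5, 0.5])
    mus = np.array([lo.mean(), hi.mean()])
    vars_ = np.array([lo.var(), hi.var()])
    for _ in range(n_iter):
        # E-step: posterior probabilities of component membership.
        log_w = np.log(pis) + log_normal_pdf(y[:, None], mus, vars_)
        log_f = np.logaddexp(log_w[:, 0], log_w[:, 1])
        tau = np.exp(log_w - log_f[:, None])
        # M-step: weighted proportions, means and variances.
        nk = tau.sum(axis=0)
        pis = nk / len(y)
        mus = tau.T @ y / nk
        vars_ = (tau * (y[:, None] - mus) ** 2).sum(axis=0) / nk
    log_w = np.log(pis) + log_normal_pdf(y[:, None], mus, vars_)
    return np.logaddexp(log_w[:, 0], log_w[:, 1]).sum()

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, 150), rng.normal(4.0, 1.0, 150)])
lrts = 2.0 * (loglik_two_components(y) - loglik_one_component(y))
print(lrts)
```

A large value favours the two-component model, but turning the statistic into a formal test requires care about its null distribution, which is part of why the choice of g is described as a difficult problem.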
Example: Microarray Data
Colon Data of Alon et al. (1999)
M = 62 (40 tumours; 22 normals)
tissue samples of
N = 2,000 genes in a
2,000 × 62 matrix.
Mixture of 2 normal components
Mixture of 2 t components
The t distribution does not have substantially better breakdown
behavior than the normal (Tyler, 1994).
This point is made more precise in Hennig (2002), who has provided an
excellent account of breakdown points for ML estimation of
location-scale mixtures with a fixed number of components g.
Clustering of COLON Data
Tissues using EMMIX-GENE
[Figure: Grouping for Colon Data, panels numbered 1 to 20]
Heat Map Displaying the Reduced Set of 4,869 Genes
on the 98 Breast Cancer Tumours
[Figure: heat map of 1867 genes]
  i  m_i     U_i |   i  m_i    U_i |   i  m_i    U_i |   i  m_i   U_i
  1  146  112.98 |  11   66  25.72 |  21   44  13.77 |  31   53  9.84
  2   93   74.95 |  12   38  25.45 |  22   30  13.28 |  32   36  8.95
  3   61   46.08 |  13   28  25.00 |  23   25  13.10 |  33   36  8.89
  4   55   35.20 |  14   53  21.33 |  24   67  13.01 |  34   38  8.86
  5   43   30.40 |  15   47  18.14 |  25   12  12.04 |  35   44  8.02
  6   92   29.29 |  16   23  18.00 |  26   58  12.03 |  36   56  7.43
  7   71   28.77 |  17   27  17.62 |  27   27  11.74 |  37   46  7.21
  8   20   28.76 |  18   45  17.51 |  28   64  11.61 |  38   19  6.14
  9   23   28.44 |  19   80  17.28 |  29   38  11.38 |  39   29  4.64
 10   23   27.73 |  20   55  13.79 |  30   21  10.72 |  40   35  2.44
where i = group number,
m_i = number of genes in group i,
U_i = $-2 \log \lambda_i$
Heat Map of Genes in Group G1
Heat Map of Genes in Group G2
Heat Map of Genes in Group G3
Clustering of gene expression profiles
• Cross-sectional data
EMMIX-WIRE
EM-based MIXture analysis With Random Effects
A Mixture Model with Random-Effects Components for Clustering
Correlated Gene-Expression Profiles.
S.K. Ng, G. J. McLachlan, K. Wang, L. Ben-Tovim Jones, S-W. Ng.
Clustering of Correlated Gene Profiles
$$y_j = X\beta_h + U b_{hj} + V c_h + \varepsilon_{hj}$$
Clustering of gene expression profiles
$$\frac{\pi_h f(y_j \mid z_{hj} = 1, c_h; \psi_h)}{\sum_{i=1}^{g} \pi_i f(y_j \mid z_{ij} = 1, c_i; \psi_i)}$$