Individuals cloud
Variables cloud
Helps to interpret
useR-2008
Dortmund, August 11th 2008
French ...)
We made an R package, FactoMineR, offering:
the possibility to add supplementary information;
a more geometrical point of view, which makes it possible to draw graphs;
the possibility to propose new methods (taking into account ...)
Outline
Introduction
Individuals cloud
Variables cloud
Helps to interpret
Some extensions
... variables
Dimensionality reduction: describe the dataset with a smaller number of variables
Techniques widely used for applications such as: data ...
Descriptive methods
Data visualization
Geometrical approach: emphasis on graphical outputs
Identification of clusters, detection of outliers
PCA in R
Some examples
Many examples
Sensory analysis: products - descriptors
Environmental data: plants - measurements; water - physico-chemical analyses
Economy: countries - economic indicators
Microbiology: cheeses - microbiological analyses
etc.
Today we illustrate PCA with:
decathlon data: athletes' performances during two athletics meetings
chicken data: genomics data
Decathlon data
Data: performances of 41 athletes during two decathlon meetings, the Olympic Games and the Decastar (2004)
41 athletes (rows)
13 variables (columns):
10 continuous variables corresponding to the performances
2 continuous variables corresponding to the rank and the points obtained
1 categorical variable corresponding to the athletics meeting (Competition)
Excerpt (the five best-ranked athletes of each meeting):
          100m Long.jump Shot.put High.jump  400m 110m.hurdle Discus Pole.vault Javeline  1500m Rank Points Competition
SEBRLE   11.04      7.58    14.83      2.07 49.81       14.69  43.75       5.02    63.19 291.70    1   8217 Decastar
CLAY     10.76      7.40    14.26      1.86 49.37       14.05  50.72       4.92    60.15 301.50    2   8122 Decastar
KARPOV   11.02      7.30    14.77      2.04 48.37       14.09  48.95       4.92    50.31 300.20    3   8099 Decastar
BERNARD  11.02      7.23    14.25      1.92 48.93       14.99  40.87       5.32    62.77 280.10    4   8067 Decastar
YURKOV   11.34      7.09    15.19      2.10 50.42       15.31  46.26       4.72    63.44 276.40    5   8036 Decastar
Sebrle   10.85      7.84    16.36      2.12 48.36       14.05  48.72       5.00    70.52 280.01    1   8893 OlympicG
Clay     10.44      7.96    15.23      2.06 49.19       14.13  50.11       4.90    69.71 282.00    2   8820 OlympicG
Karpov   10.50      7.81    15.93      2.09 46.81       13.97  51.65       4.60    55.54 278.11    3   8725 OlympicG
Macey    10.89      7.47    15.73      2.15 48.97       14.56  48.34       4.40    58.46 265.42    4   8414 OlympicG
Warners  10.62      7.74    14.48      1.97 47.97       14.01  43.73       4.90    55.39 278.05    5   8343 OlympicG
Problems - objectives
[Figure: data table with the individuals (ind 1, ...) in rows and the variables (var 1, ..., var k, ...) in columns]
same units
[Figure: scatter plots of the data]
Individuals cloud
Individuals are points in $\mathbb{R}^K$
Similarity between individuals: Euclidean distance
Study the structure, i.e. the shape, of the individuals cloud
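A minimal base-R sketch of the distance between individuals. It assumes the variables are centred and scaled to unit variance (population standard deviation, i.e. dividing by I) before distances are computed. The toy matrix is random and purely illustrative.

```r
set.seed(1)
# Toy data: I = 6 individuals (rows) described by K = 3 variables (columns)
X <- matrix(rnorm(6 * 3, mean = 10, sd = 2), nrow = 6)

# Centre each column, then scale it to unit variance (dividing by I, not I - 1)
standardize <- function(X) {
  Xc <- sweep(X, 2, colMeans(X))
  sweep(Xc, 2, sqrt(colMeans(Xc^2)), "/")
}
Z <- standardize(X)

# Euclidean distances between individuals, i.e. between points of R^K
D <- as.matrix(dist(Z, method = "euclidean"))

# The same distance between individuals 1 and 2, computed by hand
d12 <- sqrt(sum((Z[1, ] - Z[2, ])^2))
```

Without the scaling step, variables with large variances would dominate the distance.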
Inertia
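$I_g = \sum_{i=1}^{I} p_i \|x_{i.} - g\|^2 = \frac{1}{I} \sum_{i=1}^{I} (x_{i.} - g)'(x_{i.} - g) = \operatorname{tr}(S) = \sum_{s} \lambda_s = K$
(with uniform weights $p_i = 1/I$ and standardized variables)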
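A small numerical check of the inertia identities. It assumes uniform weights $p_i = 1/I$ and variables standardized with the 1/I convention, so that $S = Z'Z/I$ is the correlation matrix. The random 41 x 10 matrix only mimics the shape of the decathlon performances; it is not the real data.

```r
set.seed(2)
I <- 41; K <- 10
X <- matrix(rnorm(I * K), nrow = I)   # random stand-in data

# Standardize with the 1/I convention, so that S = Z'Z / I is the correlation matrix
Xc <- sweep(X, 2, colMeans(X))
Z  <- sweep(Xc, 2, sqrt(colMeans(Xc^2)), "/")
S  <- crossprod(Z) / I

g <- colMeans(Z)                      # centre of gravity (zero after centring)
p <- rep(1 / I, I)                    # uniform weights p_i = 1/I

# Inertia: weighted sum of squared distances to g
inertia <- sum(p * rowSums(sweep(Z, 2, g)^2))

# Eigenvalues of S
lambda <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
```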
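[Figure: projection of an individual $x$ onto the axis $u_1$, $P_{u_1}(x) = \langle x, u_1 \rangle u_1$; minimizing the distances between the individuals and their projections (min) is equivalent to maximizing the dispersion of the projections $F_u$ (max)]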
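The variance of the projected cloud is maximized:
$u_1 = \arg\max_{u_1} \operatorname{var}(X u_1)$ with $u_1' u_1 = 1$
$\operatorname{var}(F_{u_1}) = \frac{1}{I} (X u_1)' X u_1 = u_1' \frac{1}{I} X' X u_1$
It leads to (with $S = \frac{1}{I} X' X$, $X$ centred):
$\max u_1' S u_1$ with $u_1' u_1 = 1$
$u_1$ is the first eigenvector of $S$ (associated with the largest eigenvalue $\lambda_1$):
$S u_1 = \lambda_1 u_1$.
This eigenvector is known as the first axis (loadings).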
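The derivation can be sketched in base R as follows. This assumes standardized variables with the 1/I convention, and the correlated toy data are random, not the decathlon table. The leading eigenvector of S gives a projected variance equal to $\lambda_1$, and any other unit vector gives a smaller one.

```r
set.seed(3)
I <- 41; K <- 10
# Correlated toy data: random scores times a random positive mixing matrix
X <- matrix(rnorm(I * K), nrow = I) %*% matrix(runif(K * K), K)

Xc <- sweep(X, 2, colMeans(X))
Z  <- sweep(Xc, 2, sqrt(colMeans(Xc^2)), "/")
S  <- crossprod(Z) / I

e       <- eigen(S, symmetric = TRUE)
u1      <- e$vectors[, 1]      # first axis (loadings), unit norm
lambda1 <- e$values[1]         # largest eigenvalue

F1     <- drop(Z %*% u1)       # coordinates of the individuals on the first axis
var_F1 <- sum(F1^2) / I        # variance with the 1/I convention (F1 has mean 0)

# Any other unit vector gives a smaller projected variance
u_rand   <- rnorm(K)
u_rand   <- u_rand / sqrt(sum(u_rand^2))
var_rand <- sum(drop(Z %*% u_rand)^2) / I
```

Up to sign, `u1` should match `prcomp(X, scale. = TRUE)$rotation[, 1]`.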
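Percentage of inertia explained by the first axis: $\lambda_1 / \sum_k \lambda_k$
Additional axes are defined in an incremental fashion: each new axis maximizes the projected inertia among the directions orthogonal to the previous axes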
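The incremental construction of the axes can be checked numerically. This is a sketch on random correlated data (an assumption, not the decathlon table), using the eigenvectors of the correlation matrix. The axes come out orthonormal, the components are uncorrelated with variances $\lambda_s$, and the percentages of inertia decrease from one axis to the next.

```r
set.seed(4)
I <- 41; K <- 10
X <- matrix(rnorm(I * K), nrow = I) %*% matrix(runif(K * K), K)

Xc <- sweep(X, 2, colMeans(X))
Z  <- sweep(Xc, 2, sqrt(colMeans(Xc^2)), "/")
e  <- eigen(crossprod(Z) / I, symmetric = TRUE)

U  <- e$vectors          # axes u_1, ..., u_K in columns
Fs <- Z %*% U            # coordinates of the individuals on every axis

# Gram matrix of the axes and covariance matrix of the components (1/I convention)
axes_gram <- crossprod(U)
comp_cov  <- crossprod(Fs) / I

# Share of the total inertia carried by each axis, in percent
pct <- 100 * e$values / sum(e$values)
```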
[Figure: individuals factor map of the decathlon data; Dimension 1 (32.72%), Dimension 2 (17.37%)]
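... individuals coordinates in $\mathbb{R}^I$)
[Figure: variable $k$ plotted at the coordinates $r(F_1, k)$ and $r(F_2, k)$, its correlations with the first two principal components]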
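A sketch of how the coordinates of the variables can be obtained, on random correlated data with hypothetical names `var1`, ..., `var10`. Each coordinate is the correlation between the variable and a principal component. For standardized data this equals $\sqrt{\lambda_s}\, u_s$, which is why every variable falls inside the unit circle.

```r
set.seed(5)
I <- 41; K <- 10
X <- matrix(rnorm(I * K), nrow = I) %*% matrix(runif(K * K), K)
colnames(X) <- paste0("var", 1:K)   # hypothetical variable names

Xc <- sweep(X, 2, colMeans(X))
Z  <- sweep(Xc, 2, sqrt(colMeans(Xc^2)), "/")
e  <- eigen(crossprod(Z) / I, symmetric = TRUE)
Fs <- Z %*% e$vectors[, 1:2]        # individuals' coordinates F_1, F_2

# Coordinates of variable k on the first two axes: r(F_1, k) and r(F_2, k)
var_coord <- cor(X, Fs)

# Equivalent closed form for standardized data: sqrt(lambda_s) * u_s
closed_form <- sweep(e$vectors[, 1:2], 2, sqrt(e$values[1:2]), "*")
```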
[Figure: variables factor map (correlation circle) of the decathlon data; Dimension 1 (32.72%), Dimension 2 (17.37%); variables X100m, Long.jump, Shot.put, High.jump, X400m, X110m.hurdle, Discus, Pole.vault, Javeline, X1500m]
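$\max \sum_{k=1}^{K} G_{v_1}^2(k) = \max \sum_{k=1}^{K} \operatorname{cor}^2(v_1, x_{.k}) = \max \sum_{k=1}^{K} \cos^2(\theta_k)$ with $v_1' v_1 = 1$, where $\theta_k$ is the angle between $v_1$ and $x_{.k}$
$v_1$ is the best synthetic variable: the one most correlated with the $K$ variables, in the sense of the sum of squared correlations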
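A quick check of the "best synthetic variable" property, on random correlated data (an assumption). The normalized first principal component reaches a sum of squared correlations equal to $\lambda_1$. Another synthetic variable, here the plain average of the standardized variables chosen only for comparison, does no better.

```r
set.seed(6)
I <- 41; K <- 10
X <- matrix(rnorm(I * K), nrow = I) %*% matrix(runif(K * K), K)

Xc <- sweep(X, 2, colMeans(X))
Z  <- sweep(Xc, 2, sqrt(colMeans(Xc^2)), "/")
e  <- eigen(crossprod(Z) / I, symmetric = TRUE)

F1 <- drop(Z %*% e$vectors[, 1])   # first principal component
v1 <- F1 / sqrt(sum(F1^2))         # normalized synthetic variable, v1'v1 = 1

# Criterion: sum over the K variables of squared correlations with a synthetic variable
criterion <- function(v) sum(cor(X, v)^2)

best  <- criterion(v1)
other <- criterion(rowMeans(Z))    # e.g. the average of the standardized variables
```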
[Figure: variables factor map of the decathlon data; Dimension 1 (32.72%), Dimension 2 (17.37%)]
Projections...
Only well projected variables (high cos2 between the variable and
its projection) can be interpreted!
[Figure: points A, B, C and D and their projections $H_A$, $H_B$, $H_C$ and $H_D$]
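A sketch of the cos² (quality of representation) computations, on random correlated data (an assumption). For an individual, cos² is its squared coordinate divided by its squared distance to the centre of gravity. For a variable, cos² on an axis is its squared correlation with the component. Summed over all axes, both equal 1; on the first plane, they are at most 1.

```r
set.seed(7)
I <- 41; K <- 10
X <- matrix(rnorm(I * K), nrow = I) %*% matrix(runif(K * K), K)

Xc <- sweep(X, 2, colMeans(X))
Z  <- sweep(Xc, 2, sqrt(colMeans(Xc^2)), "/")
e  <- eigen(crossprod(Z) / I, symmetric = TRUE)
Fs <- Z %*% e$vectors                # coordinates on all K axes

# Individuals: squared coordinate / squared distance to the centre of gravity
cos2_ind <- Fs^2 / rowSums(Z^2)

# Variables: squared correlation with each component
cos2_var <- cor(X, Fs)^2

# Quality of representation on the first plane (axes 1 and 2)
qual_plane_ind <- rowSums(cos2_ind[, 1:2])
qual_plane_var <- rowSums(cos2_var[, 1:2])
```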
Exercises...
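Exercises...
library(FactoMineR)
# 7 individuals described by 200 independent N(0, 1) variables: pure noise
mat <- matrix(rnorm(7 * 200, 0, 1), ncol = 200)
PCA(mat)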
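A base-R counterpart of this exercise, assuming that `prcomp` with `scale. = TRUE` reproduces the eigenvalue percentages of a standardized PCA. Its n - 1 denominator does not change the percentages. With only 7 centred individuals at most 6 eigenvalues are non-zero, so the first plane always carries at least a third of the inertia, even on pure noise.

```r
set.seed(8)
# 7 individuals described by 200 independent standard normal variables
mat <- matrix(rnorm(7 * 200, 0, 1), ncol = 200)

# Standardized PCA with base R
pca <- prcomp(mat, center = TRUE, scale. = TRUE)
pct <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Percentages of inertia of the first two dimensions
round(pct[1:2], 1)
```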
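$v = \frac{1}{\sqrt{\lambda}} F_u \qquad u = \frac{1}{\sqrt{\lambda}} G_v$
$G_v = X' v = \frac{1}{\sqrt{\lambda}} X' F_u \qquad F_u = X u = \frac{1}{\sqrt{\lambda}} X G_v$
$F_s(i) = \frac{1}{\sqrt{\lambda_s}} \sum_{k=1}^{K} x_{ik} G_s(k)$
$G_s(k) = \frac{1}{\sqrt{\lambda_s}} \sum_{i=1}^{I} x_{ik} F_s(i)$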
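A numerical check of the transition formulas. To make them hold exactly without explicit weights, this sketch assumes the weights 1/I are absorbed into the table, using `Xw = Z / sqrt(I)` (a hypothetical name) so that $S = X_w' X_w$. The data are random and correlated.

```r
set.seed(9)
I <- 41; K <- 10
X <- matrix(rnorm(I * K), nrow = I) %*% matrix(runif(K * K), K)

# Assumption: weights 1/I absorbed into the table, so S = t(Xw) %*% Xw
Xc <- sweep(X, 2, colMeans(X))
Z  <- sweep(Xc, 2, sqrt(colMeans(Xc^2)), "/")
Xw <- Z / sqrt(I)

e      <- eigen(crossprod(Xw), symmetric = TRUE)
s      <- 1                              # axis index
lambda <- e$values[s]
u      <- e$vectors[, s]

Fu <- drop(Xw %*% u)                     # individuals: F_u = X u
v  <- Fu / sqrt(lambda)                  # unit vector of R^I
Gv <- drop(t(Xw) %*% v)                  # variables: G_v = X' v

# Transition formulas: each set of coordinates is recovered from the other one
F_from_G <- drop(Xw %*% Gv) / sqrt(lambda)
G_from_F <- drop(t(Xw) %*% Fu) / sqrt(lambda)
```

With this convention `Gv` equals $\sqrt{\lambda}\,u$, i.e. the correlations between the variables and the component.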
Example on decathlon
[Figure: individuals and variables factor maps of the decathlon data, with the supplementary variables Rank and Points and the supplementary categories OlympicG and Decastar; Dim 1 (32.72%), Dim 2 (17.37%)]
[Figure: variables factor map including the supplementary variables Rank and Points; Dimension 1 (32.72%), Dimension 2 (17.37%)]
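[Figure: individuals HERNU, BARRAS, NOOL, BOURGUIGNON, Sebrle and Clay, with the categories Decastar and Olympic.G]
Mean performances of each category (first four variables):
           100m Long.jump Shot.put High.jump
Decastar  11.18      7.25    14.16      1.98
Olympic.G 10.92      7.27    14.62      1.98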
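A sketch of how a supplementary category such as Decastar or Olympic.G can be placed on the individuals map: at the barycentre of the individuals that belong to it. This is equivalent to projecting the category's mean standardized profile onto the axes. The two-level factor `meeting` and the data are hypothetical stand-ins.

```r
set.seed(10)
I <- 41; K <- 10
X <- matrix(rnorm(I * K), nrow = I) %*% matrix(runif(K * K), K)
# Hypothetical supplementary categorical variable with two levels
meeting <- factor(rep(c("Decastar", "OlympicG"), length.out = I))
n_cat   <- as.vector(table(meeting))   # size of each category

Xc <- sweep(X, 2, colMeans(X))
Z  <- sweep(Xc, 2, sqrt(colMeans(Xc^2)), "/")
e  <- eigen(crossprod(Z) / I, symmetric = TRUE)
Fs <- Z %*% e$vectors[, 1:2]           # active individuals on the first plane

# Barycentre of the individuals of each category
cat_coord <- apply(Fs, 2, function(f) tapply(f, meeting, mean))

# Equivalently: project the mean standardized profile of each category
cat_profile <- apply(Z, 2, function(z) tapply(z, meeting, mean))
cat_coord2  <- cat_profile %*% e$vectors[, 1:2]
```

Because the individuals are centred, the category barycentres weighted by their sizes average to the origin.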
[Figure: individuals factor map with the supplementary categories OlympicG and Decastar; Dimension 1 (32.72%), Dimension 2 (17.37%)]
Confidence ellipses
[Figure: Individuals factor map (PCA) with confidence ellipses around the categories Decastar and OlympicG; Dimension 1 (32.72%), Dimension 2 (17.37%)]
Number of dimensions?
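Percentage of inertia explained by the first $Q$ dimensions: $\frac{\sum_{s=1}^{Q} \lambda_s}{\sum_{s=1}^{K} \lambda_s}$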
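The percentage of inertia carried by the first Q dimensions can be computed directly from the eigenvalues. This sketch uses the eigenvalues of the correlation matrix of random correlated data (an assumption). It returns the cumulative percentages for Q = 1, ..., K, which end at 100.

```r
set.seed(11)
I <- 41; K <- 10
X <- matrix(rnorm(I * K), nrow = I) %*% matrix(runif(K * K), K)

lambda <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values

# sum_{s=1}^{Q} lambda_s / sum_{s=1}^{K} lambda_s, in percent
pct_first <- function(lambda, Q) 100 * sum(lambda[seq_len(Q)]) / sum(lambda)

cum_pct <- sapply(seq_along(lambda), function(Q) pct_first(lambda, Q))
```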
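library(FactoMineR)
nr <- 50   # number of individuals
nc <- 8    # number of variables
iner <- rep(0, 1000)
for (i in 1:1000) {
  mat <- matrix(rnorm(nr * nc, 0, 1), ncol = nc)
  # $eig[2, 3]: cumulative percentage of inertia of the first two dimensions
  iner[i] <- PCA(mat, graph = FALSE)$eig[2, 3]
}
quantile(iner, 0.95)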
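A base-R variant of the same simulation. It assumes that the eigenvalues of the correlation matrix are those of a standardized PCA, which avoids calling `PCA` 1000 times. It keeps the same sizes and number of replications.

```r
set.seed(12)
nr    <- 50     # number of individuals
nc    <- 8      # number of variables
n_sim <- 1000

# Cumulative percentage of inertia of the first two dimensions of a standardized
# PCA (eigenvalues of the correlation matrix) on independent N(0, 1) data
iner <- replicate(n_sim, {
  mat    <- matrix(rnorm(nr * nc, 0, 1), ncol = nc)
  lambda <- eigen(cor(mat), symmetric = TRUE, only.values = TRUE)$values
  100 * sum(lambda[1:2]) / sum(lambda)
})

q95 <- quantile(iner, 0.95)
```

Since both versions use the same eigenvalues, `q95` should be of the same order as the FactoMineR loop's result, up to simulation noise.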
95% quantile of the percentage of inertia explained by the first two dimensions (PCA on independent N(0, 1) data)
Rows: number of individuals; columns: number of variables
Number of variables
        4     5     6     7     8     9    10    11    12    13    14    15    16
  5  96.5  93.1  90.2  87.6  85.5  83.4  81.9  80.7  79.4  78.1  77.4  76.6  75.5
  6  93.3  88.6  84.8  81.5  79.1  76.9  75.1  73.2  72.2  70.8  69.8  68.7  68.0
  7  90.5  84.9  80.9  77.4  74.4  72.0  70.1  68.3  67.0  65.3  64.3  63.2  62.2
  8  88.1  82.3  77.2  73.8  70.7  68.2  66.1  64.0  62.8  61.2  60.0  59.0  58.0
  9  86.1  79.5  74.8  70.7  67.4  65.1  62.9  61.1  59.4  57.9  56.5  55.4  54.3
 10  84.5  77.5  72.3  68.2  65.0  62.4  60.1  58.3  56.5  55.1  53.7  52.5  51.5
 11  82.8  75.7  70.3  66.3  62.9  60.1  58.0  56.0  54.4  52.7  51.3  50.1  49.2
 12  81.5  74.0  68.6  64.4  61.2  58.3  55.8  54.0  52.4  50.9  49.3  48.2  47.2
 13  80.0  72.5  67.2  62.9  59.4  56.7  54.4  52.2  50.5  48.9  47.7  46.6  45.4
 14  79.0  71.5  65.7  61.5  58.1  55.1  52.8  50.8  49.0  47.5  46.2  45.0  44.0
 15  78.1  70.3  64.6  60.3  57.0  53.9  51.5  49.4  47.8  46.1  44.9  43.6  42.5
 16  77.3  69.4  63.5  59.2  55.6  52.9  50.3  48.3  46.6  45.2  43.6  42.4  41.4
 17  76.5  68.4  62.6  58.2  54.7  51.8  49.3  47.1  45.5  44.0  42.6  41.4  40.3
 18  75.5  67.6  61.8  57.1  53.7  50.8  48.4  46.3  44.6  43.0  41.6  40.4  39.3
 19  75.1  67.0  60.9  56.5  52.8  49.9  47.4  45.5  43.7  42.1  40.7  39.6  38.4
 20  74.1  66.1  60.1  55.6  52.1  49.1  46.6  44.7  42.9  41.3  39.8  38.7  37.5
 25  72.0  63.3  57.1  52.5  48.9  46.0  43.4  41.4  39.6  38.1  36.7  35.5  34.5
 30  69.8  61.1  55.1  50.3  46.7  43.6  41.1  39.1  37.3  35.7  34.4  33.2  32.1
 35  68.5  59.6  53.3  48.6  44.9  41.9  39.5  37.4  35.6  34.0  32.7  31.6  30.4
 40  67.5  58.3  52.0  47.3  43.4  40.5  38.0  36.0  34.1  32.7  31.3  30.1  29.1
 45  66.4  57.1  50.8  46.1  42.4  39.3  36.9  34.8  33.1  31.5  30.2  29.0  27.9
 50  65.6  56.3  49.9  45.2  41.4  38.4  35.9  33.9  32.1  30.5  29.2  28.1  27.0
100  60.9  51.4  44.9  40.0  36.3  33.3  31.0  28.9  27.2  25.8  24.5  23.3  22.3
95% quantile of the percentage of inertia explained by the first two dimensions (continued)
Rows: number of individuals; columns: number of variables
Number of variables
       17    18    19    20    25    30    35    40    50    75   100   150   200
  5  74.9  74.2  73.5  72.8  70.7  68.8  67.4  66.4  64.7  62.0  60.5  58.5  57.4
  6  67.0  66.3  65.6  64.9  62.3  60.4  58.9  57.6  55.8  52.9  51.0  49.0  47.8
  7  61.3  60.7  59.7  59.1  56.4  54.3  52.6  51.4  49.5  46.4  44.6  42.4  41.2
  8  57.0  56.2  55.4  54.5  51.8  49.7  47.8  46.7  44.6  41.6  39.8  37.6  36.4
  9  53.6  52.5  51.8  51.2  48.1  45.9  44.4  42.9  41.0  38.0  36.1  34.0  32.7
 10  50.6  49.8  49.0  48.3  45.2  42.9  41.4  40.1  38.0  35.0  33.2  31.0  29.8
 11  48.1  47.2  46.5  45.8  42.8  40.6  39.0  37.7  35.6  32.6  30.8  28.7  27.5
 12  46.2  45.2  44.4  43.8  40.7  38.5  36.9  35.5  33.5  30.5  28.8  26.7  25.5
 13  44.4  43.4  42.8  41.9  39.0  36.8  35.1  33.9  31.8  28.8  27.1  25.0  23.9
 14  42.9  42.0  41.3  40.4  37.4  35.2  33.6  32.3  30.4  27.4  25.7  23.6  22.4
 15  41.6  40.7  39.8  39.1  36.2  34.0  32.4  31.1  29.0  26.0  24.3  22.4  21.2
 16  40.4  39.5  38.7  37.9  35.0  32.8  31.1  29.8  27.9  24.9  23.2  21.2  20.1
 17  39.4  38.5  37.6  36.9  33.8  31.7  30.1  28.8  26.8  23.9  22.2  20.3  19.2
 18  38.3  37.4  36.7  35.8  32.9  30.7  29.1  27.8  25.9  22.9  21.3  19.4  18.3
 19  37.4  36.5  35.8  34.9  32.0  29.9  28.3  27.0  25.1  22.2  20.5  18.6  17.5
 20  36.7  35.8  34.9  34.2  31.3  29.1  27.5  26.2  24.3  21.4  19.8  18.0  16.9
 25  33.5  32.5  31.8  31.1  28.1  26.0  24.5  23.3  21.4  18.6  17.0  15.2  14.2
 30  31.2  30.3  29.5  28.8  26.0  23.9  22.3  21.1  19.3  16.6  15.1  13.4  12.5
 35  29.5  28.6  27.9  27.1  24.3  22.2  20.7  19.6  17.8  15.2  13.7  12.1  11.1
 40  28.1  27.3  26.5  25.8  23.0  21.0  19.5  18.4  16.6  14.1  12.7  11.1  10.2
 45  27.0  26.1  25.4  24.7  21.9  20.0  18.5  17.4  15.7  13.2  11.8  10.3   9.4
 50  26.1  25.3  24.6  23.8  21.1  19.1  17.7  16.6  14.9  12.5  11.1   9.6   8.7
100  21.5  20.7  19.9  19.3  16.7  14.9  13.6  12.5  11.0   8.9   7.7   6.4   5.7
Contribution
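Contribution of each element to the inertia of the axis (i.e. to the construction of the axis):
For the individuals: $Ctr_s(i) = \frac{F_s^2(i)}{\sum_{i=1}^{I} F_s^2(i)} = \frac{F_s^2(i)}{\lambda_s}$
For the variables: $Ctr_s(k) = \frac{G_s^2(k)}{\lambda_s}$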
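A sketch of the contribution formulas, reported as percentages here. As with the transition formulas, it assumes the weights 1/I are absorbed into the table, using `Xw = Z / sqrt(I)` (a hypothetical name) so that $\sum_i F_s^2(i) = \lambda_s$. The data are random and correlated. The contributions to each axis sum to 100% for the individuals and for the variables.

```r
set.seed(13)
I <- 41; K <- 10
X <- matrix(rnorm(I * K), nrow = I) %*% matrix(runif(K * K), K)

# Assumption: weights 1/I absorbed into the table
Xc <- sweep(X, 2, colMeans(X))
Z  <- sweep(Xc, 2, sqrt(colMeans(Xc^2)), "/")
Xw <- Z / sqrt(I)
e  <- eigen(crossprod(Xw), symmetric = TRUE)
lambda <- e$values

Fs <- Xw %*% e$vectors                         # individuals
Gs <- sweep(e$vectors, 2, sqrt(lambda), "*")   # variables (correlations)

# Contributions to axis s, in percent
ctr_ind <- 100 * sweep(Fs^2, 2, lambda, "/")   # F_s(i)^2 / lambda_s
ctr_var <- 100 * sweep(Gs^2, 2, lambda, "/")   # G_s(k)^2 / lambda_s
```

The first form of the individual formula, $F_s^2(i) / \sum_i F_s^2(i)$, does not depend on how the weights are handled.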
Dim.1: 0.96  0.74  0.62  -0.67  -0.68  -0.75  -0.77
$Dim.2
$Dim.2$quanti
         Dim.2
Discus    0.61
Shot.put  0.60
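A simplified sketch of the idea behind this kind of description of a dimension: correlate each variable with the coordinates of the individuals, keep the significant ones, and sort them. This is an assumption about the principle, not FactoMineR's `dimdesc` code. The function `describe_dim` and its `proba` threshold are hypothetical, and the data are random.

```r
set.seed(14)
I <- 41; K <- 10
X <- matrix(rnorm(I * K), nrow = I) %*% matrix(runif(K * K), K)
colnames(X) <- paste0("var", 1:K)      # hypothetical variable names

Z  <- scale(X)                         # correlations are unaffected by the scaling convention
e  <- eigen(cor(X), symmetric = TRUE)
F1 <- drop(Z %*% e$vectors[, 1])       # coordinates on Dim.1

# Correlation of each variable with the coordinates, kept when significant,
# sorted in decreasing order
describe_dim <- function(X, f, proba = 0.05) {
  res <- t(apply(X, 2, function(x) {
    ct <- cor.test(x, f)
    c(correlation = unname(ct$estimate), p.value = ct$p.value)
  }))
  res <- res[res[, "p.value"] <= proba, , drop = FALSE]
  res[order(res[, "correlation"], decreasing = TRUE), , drop = FALSE]
}

desc1 <- describe_dim(X, F1)
```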
P-value
  0.155
$Dim.1$category
         Estimate P-value
OlympicG   0.4393   0.155
Decastar  -0.4393   0.155
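The symmetric estimates (+0.4393 / -0.4393) are consistent with a one-way model on the Dim.1 coordinates using sum-to-zero contrasts. That is an assumption, not a statement of how FactoMineR computes them. The sketch below, on hypothetical data, shows that with two levels such a coefficient is half the difference between the two group means, and the levels get plus or minus that value.

```r
set.seed(15)
I <- 41
# Hypothetical data: coordinates on Dim.1 and a two-level supplementary variable
meeting <- factor(sample(c("Decastar", "OlympicG"), I, replace = TRUE))
f1 <- rnorm(I) + ifelse(meeting == "OlympicG", 0.5, -0.5)

# One-way model with sum-to-zero contrasts
fit <- lm(f1 ~ meeting, contrasts = list(meeting = "contr.sum"))
est_first <- unname(coef(fit)[2])      # estimate for the first level (Decastar)
estimates <- c(est_first, -est_first)
names(estimates) <- levels(meeting)

p_value <- anova(fit)[["Pr(>F)"]][1]   # global test of the factor
means <- tapply(f1, meeting, mean)
```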
Practice
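library(FactoMineR)
data(decathlon)
# Rank and Points (columns 11-12) and Competition (column 13) are supplementary
res <- PCA(decathlon, quanti.sup = 11:12, quali.sup = 13)
# Colour the individuals according to the competition
plot(res, habillage = 13)
# Eigenvalues, percentages and cumulative percentages of inertia
res$eig
dev.new()  # new graphics device (x11() is not available on every platform)
barplot(res$eig[, 1], main = "Eigenvalues", names.arg = 1:nrow(res$eig))
# Coordinates, quality of representation (cos2) and contributions of the individuals
res$ind$coord
res$ind$cos2
res$ind$contrib
# Automatic description of the dimensions
dimdesc(res)
# Confidence ellipses around the categories
aa <- cbind.data.frame(decathlon[, 13], res$ind$coord)
bb <- coord.ellipse(aa, bary = TRUE)
plot.PCA(res, habillage = 13, ellipse = bb)
# write.infile(res, file = "my_FactoMineR_results.csv")  # to export a list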
Application
Chicken data:
43 chickens (individuals)
7407 genes (variables)
One categorical variable: 6 diets corresponding to different
stresses
Are genes differentially expressed from one stress to another?
[Figure: individuals factor map of the chicken data, with the diet categories J16, J16R5, J16R16, J48 and J48R24; Dimension 1 (19.63%), Dimension 2 (9.35%)]
[Figure: individuals factor map of the chicken data on dimensions 3 and 4, with the diet categories N, J16, J16R5, J16R16, J48 and J48R24; Dimension 3 (7.24%), Dimension 4 (5.87%)]