
Curse of Dimensionality
▪ Theoretically, increasing features should improve performance
▪ In practice, adding more features leads to worse performance
▪ Number of training examples required increases exponentially with dimensionality

(Figure: 1 dimension: 10 positions; 2 dimensions: 100 positions; 3 dimensions: 1,000 positions)
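
The grid sizes in the figure follow directly from the arithmetic: with 10 positions per axis, a d-dimensional grid has 10**d cells. A quick, purely illustrative Python check:

# With 10 positions per axis, the number of grid cells grows as 10**d.
for d in (1, 2, 3, 10):
    print(d, "dimensions:", 10 ** d, "positions")
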
Solution: Dimensionality Reduction
▪ Data can be represented by fewer dimensions (features)
▪ Reduce dimensionality by selecting a subset (feature elimination)
▪ Combine features with linear and non-linear transformations

(Figure: scatter plot of Height vs. Cigarettes/Day)
Solution: Dimensionality Reduction
▪ Two features: height and cigarettes per day
▪ Both features increase together (correlated)
▪ Can we reduce the number of features to one?

(Figure: Height vs. Cigarettes/Day scatter plot with a combined "Height + Cigarettes/Day" axis drawn through the data)
Solution: Dimensionality Reduction
▪ Create a single feature that is a combination of height and cigarettes per day
▪ This is Principal Component Analysis (PCA)

(Figure: the data projected onto the single combined "Height + Cigarettes/Day" axis)
Dimensionality Reduction
Given an N-dimensional data set x, find an N × K matrix U such that

    y = Uᵀx,  where y has K dimensions and K < N

    x = (x1, x2, …, xN)  →  y = (y1, y2, …, yK),  K < N
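
As a concrete sketch (not part of the slides), the projection y = Uᵀx can be applied to row-wise data with NumPy; here U is just a random orthonormal matrix, whereas PCA, introduced next, chooses U to capture the most variance:

import numpy as np

# Toy data set: 5 samples, N = 3 original features (one row per sample).
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.1, 6.2],
              [3.0, 5.9, 9.1],
              [4.0, 8.2, 11.8],
              [5.0, 9.9, 15.2]])

N, K = X.shape[1], 2                           # reduce from N = 3 to K = 2
rng = np.random.default_rng(0)

# Any N x K matrix with orthonormal columns defines a linear projection.
U, _ = np.linalg.qr(rng.normal(size=(N, K)))

Y = X @ U                                      # y = U^T x applied to every row
print(Y.shape)                                 # (5, 2): K = 2 features per sample
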
Principal Component Analysis (PCA)
▪ Find the orthogonal directions along which the data varies the most

(Figure: scatter plot in the X1-X2 plane with two principal directions drawn through the data: v₁ with length λ₁ and v₂ with length λ₂)
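
A minimal NumPy sketch (an illustration, not the course's code) of where the directions v₁, v₂ and lengths λ₁, λ₂ in the figure come from: they are the eigenvectors and eigenvalues of the data's covariance matrix.

import numpy as np

rng = np.random.default_rng(42)

# Correlated 2-D data (X1, X2), one sample per row.
X = rng.multivariate_normal(mean=[0, 0],
                            cov=[[3.0, 2.0],
                                 [2.0, 2.0]],
                            size=500)

# Principal directions = eigenvectors of the covariance matrix;
# eigenvalues give the variance captured along each direction.
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)          # returned in ascending order

order = np.argsort(eigvals)[::-1]               # sort largest-variance first
lambdas, V = eigvals[order], eigvecs[:, order]

print("lengths (variances) lambda_1, lambda_2:", lambdas)
print("directions v_1, v_2 (columns):")
print(V)
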
Singular Value Decomposition (SVD)
▪ SVD is a matrix factorization method normally used for PCA
▪ Does not require a square data set
▪ SVD is used by Scikit-learn for PCA

    A = U S Vᵀ,  where A is m × n, U is m × m, S is m × n (diagonal), and V is n × n

(Figure: the factorization drawn as block matrices, with only the diagonal of S non-zero)
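
A small NumPy check of the factorization and its shapes (illustrative data only):

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))                    # m = 5 samples, n = 3 features

# full_matrices=True returns U (m x m), the singular values s, and V^T (n x n).
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n diagonal matrix S and verify A = U S V^T.
S = np.zeros_like(A)
S[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ S @ Vt))              # True
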
Truncated Singular Value Decomposition
▪ How can SVD be used for dimensionality reduction?
▪ Principal components are calculated from US
▪ "Truncated SVD" is used for dimensionality reduction (n → k): keep only the k largest singular values and their corresponding vectors

    A = U S Vᵀ,  with A of size m × n, U of size m × m, S of size m × n, and V of size n × n

(Figure: the same block-matrix factorization as on the previous slide, with the retained blocks highlighted)
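
A sketch (not from the slides) of truncating the SVD of a centered data matrix and reading the principal components off US:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 10))                 # 100 samples, n = 10 features
A = A - A.mean(axis=0)                         # center the data, as PCA does

k = 3                                          # target dimensionality
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values/vectors; the reduced
# representation (the principal components) is U_k S_k = A V_k.
A_reduced = U[:, :k] * s[:k]                   # shape (100, k)
print(np.allclose(A_reduced, A @ Vt[:k].T))    # True
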
Importance of Feature Scaling
▪ PCA and SVD seek to find the vectors that capture the most variance
▪ Variance is sensitive to axis scale
▪ Must scale data!

(Figure: the same data shown unscaled, with X1 spanning roughly 10-50 and X2 spanning roughly 100-500, and again after scaling)
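
A minimal sketch of scaling before PCA with scikit-learn; the two synthetic features below are invented to mimic the figure's unequal ranges:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X1 = rng.uniform(10, 50, size=200)             # small-range feature
X2 = 10 * X1 + rng.normal(0, 20, size=200)     # large-range, correlated feature
X = np.column_stack([X1, X2])

# Without scaling, the first component is dominated by the large-range X2.
print(PCA(n_components=1).fit(X).components_)

# With scaling, both features contribute comparably.
X_scaled = StandardScaler().fit_transform(X)
print(PCA(n_components=1).fit(X_scaled).components_)
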
PCA: The Syntax
Import the class containing the dimensionality reduction method.
from sklearn.decomposition import PCA

Create an instance of the class. n_components is the final number of dimensions; whiten=True additionally rescales each component to unit variance (PCA always centers the data).
PCAinst = PCA(n_components=3, whiten=True)

Fit the instance on the data and then transform the data.
X_trans = PCAinst.fit_transform(X_train)

Does not work with sparse matrices.
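
A short end-to-end example of the syntax above; the iris data set stands in for X_train and is only an illustration:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X_train = load_iris().data                     # 150 samples, 4 features
PCAinst = PCA(n_components=3, whiten=True)
X_trans = PCAinst.fit_transform(X_train)

print(X_trans.shape)                           # (150, 3)
print(PCAinst.explained_variance_ratio_)       # variance captured per component
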
Truncated SVD: The Syntax
Import the class containing the dimensionality reduction method.
from sklearn.decomposition import TruncatedSVD

Create an instance of the class. Unlike PCA, TruncatedSVD does not center the data.
SVD = TruncatedSVD(n_components=3)

Fit the instance on the data and then transform the data.
X_trans = SVD.fit_transform(X_sparse)

Works with sparse matrices; used with text data for Latent Semantic Analysis (LSA).
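
A brief sketch of the LSA use case; the toy documents below are invented for illustration:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs are pets",
        "logs and mats are objects"]

X_sparse = TfidfVectorizer().fit_transform(docs)   # sparse document-term matrix
SVD = TruncatedSVD(n_components=2)
X_trans = SVD.fit_transform(X_sparse)              # dense 4 x 2 "topic" representation
print(X_trans.shape)
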
Moving Beyond Linearity
▪ Transformations calculated with PCA/SVD are linear
▪ Data can have non-linear features
▪ This can cause dimensionality reduction to fail

(Figure: "Original Space" vs. "Projection by PCA", where the linear projection fails to separate the non-linear structure)
Kernel PCA
▪ Solution: kernels can be used to perform non-linear PCA
▪ Like the kernel trick introduced for SVMs: a mapping Φ takes the data from R² to a feature space F, where linear PCA is applied

(Figure: "Original Space" vs. "Projection by KPCA"; "Linear PCA" in R² alongside "Kernel PCA" via Φ: R² → F)
Kernel PCA: The Syntax
Import the class containing the dimensionality reduction method.
from sklearn.decomposition import KernelPCA

Create an instance of the class.
kPCA = KernelPCA(n_components=3, kernel='rbf', gamma=1.0)

Fit the instance on the data and then transform the data.
X_trans = kPCA.fit_transform(X_train)
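
A minimal sketch (not from the slides) applying the syntax above to the concentric-circles data that linear PCA cannot untangle:

from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles; the classes are not linearly separable in R^2.
X_train, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

kPCA = KernelPCA(n_components=2, kernel='rbf', gamma=10.0)
X_trans = kPCA.fit_transform(X_train)
print(X_trans.shape)    # (400, 2); the circles become approximately linearly separable
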
Multi-Dimensional Scaling (MDS)
▪ Non-linear transformation
▪ Doesn't focus on maintaining overall variance
▪ Instead, maintains geometric distances between points

(Figure: a 3-D data set and its lower-dimensional embedding)
MDS: The Syntax
Import the class containing the dimensionality reduction method.
from sklearn.manifold import MDS

Create an instance of the class.
mdsMod = MDS(n_components=2)

Fit the instance on the data and then transform the data.
X_trans = mdsMod.fit_transform(X_train)

Many other manifold dimensionality reduction methods exist: Isomap, t-SNE (TSNE).
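
A short sketch of the syntax above on a synthetic 3-D data set; make_s_curve is used here only as a stand-in for real data:

from sklearn.datasets import make_s_curve
from sklearn.manifold import MDS

X_train, color = make_s_curve(n_samples=300, random_state=0)   # 300 samples in 3-D

mdsMod = MDS(n_components=2, random_state=0)
X_trans = mdsMod.fit_transform(X_train)        # 2-D embedding preserving distances
print(X_trans.shape)                           # (300, 2)
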
Uses of Dimensionality Reduction
▪ Frequently used for high-dimensional data
▪ Natural language processing (NLP): many word combinations
▪ Image-based data sets: pixels are features

Image Source: https://commons.wikimedia.org/wiki/File:Monarch_In_May.jpg
Uses of Dimensionality Reduction
▪ Divide the image into 12 x 12 pixel sections
▪ Flatten each section to create a row of data with 144 features
▪ Perform PCA on all data points

(Figure: each 12 x 12 patch flattened into a row of 144 pixel values; the rows are stacked into a data matrix)

Image Source: https://commons.wikimedia.org/wiki/File:Monarch_In_May.jpg
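
A rough sketch (not the course's code) of the patch-based pipeline just described, using a synthetic grayscale image in place of the photograph:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
image = rng.random((120, 120))                 # stand-in for the butterfly photo

# Divide the image into non-overlapping 12 x 12 patches and flatten each
# patch into a row of 144 features.
patches = [image[r:r + 12, c:c + 12].ravel()
           for r in range(0, 120, 12)
           for c in range(0, 120, 12)]
X = np.array(patches)                          # shape (100, 144)

# Compress 144 -> 16 dimensions and reconstruct to inspect the loss.
pca = PCA(n_components=16)
X_reduced = pca.fit_transform(X)               # (100, 16)
X_restored = pca.inverse_transform(X_reduced)  # (100, 144) approximation
print(X_reduced.shape, X_restored.shape)
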
PCA Compression: 144 → 60 Dimensions
(Figure: image patches reconstructed from the original 144 dimensions vs. 60 PCA dimensions)

PCA Compression: 144 → 16 Dimensions
(Figure: image patches reconstructed from the original 144 dimensions vs. 16 PCA dimensions)

Sixteen Most Important Eigenvectors
(Figure: the 16 leading eigenvectors displayed as 12 x 12 images)

PCA Compression: 144 → 4 Dimensions
(Figure: image patches reconstructed from the original 144 dimensions vs. 4 PCA dimensions)
L2 Error and PCA Dimension
(Figure: relative L2 reconstruction error plotted against the number of PCA dimensions, roughly 20 to 140; the error decreases as more dimensions are kept)
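
One way such a curve could be produced, sketched on synthetic data (the actual patch matrix from the photograph is not reproduced here):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((200, 144))                     # stand-in for the patch matrix

# Relative L2 (Frobenius) reconstruction error for increasing n_components.
for k in (1, 4, 16, 60, 144):
    pca = PCA(n_components=k).fit(X)
    X_hat = pca.inverse_transform(pca.transform(X))
    rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
    print(k, round(rel_err, 3))
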
Four Most Important Eigenvectors
(Figure: the 4 leading eigenvectors displayed as 12 x 12 images)
PCA Compression: 144 → 1 Dimension
(Figure: image patches reconstructed from the original 144 dimensions vs. a single PCA dimension)
