
Curse of Dimensionality
▪ Theoretically, increasing features should improve performance
▪ In practice, adding more features leads to worse performance
▪ Number of training examples required increases exponentially with dimensionality

(Figure: 1 dimension: 10 positions; 2 dimensions: 100 positions; 3 dimensions: 1,000 positions)
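
The grid sizes in the figure follow directly from the arithmetic: with 10 positions per axis, a d-dimensional grid has 10**d cells. A quick, purely illustrative Python check:

# With 10 positions per axis, the number of grid cells grows as 10**d.
for d in (1, 2, 3, 10):
    print(d, "dimensions:", 10 ** d, "positions")
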
Solution: Dimensionality Reduction
▪ Data can be represented by fewer dimensions (features)
▪ Reduce dimensionality by selecting a subset (feature elimination)
▪ Combine features with linear and non-linear transformations

(Figure: scatter plot of Height vs. Cigarettes/Day)
Solution: Dimensionality Reduction
▪ Two features: height and cigarettes per day
▪ Both features increase together (correlated)
▪ Can we reduce the number of features to one?

(Figure: Height vs. Cigarettes/Day scatter plot with a combined "Height + Cigarettes/Day" axis drawn through the data)
Solution: Dimensionality Reduction
▪ Create a single feature that is a combination of height and cigarettes per day
▪ This is Principal Component Analysis (PCA)

(Figure: the data projected onto the single combined "Height + Cigarettes/Day" axis)
Dimensionality Reduction
Given an N-dimensional data set x, find an N × K matrix U such that

    y = Uᵀx,  where y has K dimensions and K < N

    x = (x1, x2, …, xN)  →  y = (y1, y2, …, yK),  K < N
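
As a concrete sketch (not part of the slides), the projection y = Uᵀx can be applied to row-wise data with NumPy; here U is just a random orthonormal matrix, whereas PCA, introduced next, chooses U to capture the most variance:

import numpy as np

# Toy data set: 5 samples, N = 3 original features (one row per sample).
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.1, 6.2],
              [3.0, 5.9, 9.1],
              [4.0, 8.2, 11.8],
              [5.0, 9.9, 15.2]])

N, K = X.shape[1], 2                           # reduce from N = 3 to K = 2
rng = np.random.default_rng(0)

# Any N x K matrix with orthonormal columns defines a linear projection.
U, _ = np.linalg.qr(rng.normal(size=(N, K)))

Y = X @ U                                      # y = U^T x applied to every row
print(Y.shape)                                 # (5, 2): K = 2 features per sample
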
Principal Component Analysis (PCA)
▪ Find the orthogonal directions along which the data varies the most

(Figure: scatter plot in the X1-X2 plane with two principal directions drawn through the data: v₁ with length λ₁ and v₂ with length λ₂)
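
A minimal NumPy sketch (an illustration, not the course's code) of where the directions v₁, v₂ and lengths λ₁, λ₂ in the figure come from: they are the eigenvectors and eigenvalues of the data's covariance matrix.

import numpy as np

rng = np.random.default_rng(42)

# Correlated 2-D data (X1, X2), one sample per row.
X = rng.multivariate_normal(mean=[0, 0],
                            cov=[[3.0, 2.0],
                                 [2.0, 2.0]],
                            size=500)

# Principal directions = eigenvectors of the covariance matrix;
# eigenvalues give the variance captured along each direction.
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)          # returned in ascending order

order = np.argsort(eigvals)[::-1]               # sort largest-variance first
lambdas, V = eigvals[order], eigvecs[:, order]

print("lengths (variances) lambda_1, lambda_2:", lambdas)
print("directions v_1, v_2 (columns):")
print(V)
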
Singular Value Decomposition (SVD)
▪ SVD is a matrix factorization method normally used for PCA
▪ Does not require a square data set
▪ SVD is used by Scikit-learn for PCA

    A = U S Vᵀ,  where A is m × n, U is m × m, S is m × n (diagonal), and V is n × n

(Figure: the factorization drawn as block matrices, with only the diagonal of S non-zero)
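
A small NumPy check of the factorization and its shapes (illustrative data only):

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))                    # m = 5 samples, n = 3 features

# full_matrices=True returns U (m x m), the singular values s, and V^T (n x n).
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n diagonal matrix S and verify A = U S V^T.
S = np.zeros_like(A)
S[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ S @ Vt))              # True
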
Truncated Singular Value Decomposition
▪ How can SVD be used for dimensionality reduction?
▪ Principal components are calculated from US
▪ "Truncated SVD" is used for dimensionality reduction (n → k): keep only the k largest singular values and their corresponding vectors

    A = U S Vᵀ,  with A of size m × n, U of size m × m, S of size m × n, and V of size n × n

(Figure: the same block-matrix factorization as on the previous slide, with the retained blocks highlighted)
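
A sketch (not from the slides) of truncating the SVD of a centered data matrix and reading the principal components off US:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 10))                 # 100 samples, n = 10 features
A = A - A.mean(axis=0)                         # center the data, as PCA does

k = 3                                          # target dimensionality
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values/vectors; the reduced
# representation (the principal components) is U_k S_k = A V_k.
A_reduced = U[:, :k] * s[:k]                   # shape (100, k)
print(np.allclose(A_reduced, A @ Vt[:k].T))    # True
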
Importance of Feature Scaling
▪ PCA and SVD seek to find the vectors that capture the most variance
▪ Variance is sensitive to axis scale
▪ Must scale data!

(Figure: the same data shown unscaled, with X1 spanning roughly 10-50 and X2 spanning roughly 100-500, and again after scaling)
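
A minimal sketch of scaling before PCA with scikit-learn; the two synthetic features below are invented to mimic the figure's unequal ranges:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X1 = rng.uniform(10, 50, size=200)             # small-range feature
X2 = 10 * X1 + rng.normal(0, 20, size=200)     # large-range, correlated feature
X = np.column_stack([X1, X2])

# Without scaling, the first component is dominated by the large-range X2.
print(PCA(n_components=1).fit(X).components_)

# With scaling, both features contribute comparably.
X_scaled = StandardScaler().fit_transform(X)
print(PCA(n_components=1).fit(X_scaled).components_)
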
PCA: The Syntax
Import the class containing the dimensionality reduction method.
from sklearn.decomposition import PCA

Create an instance of the class. n_components is the final number of dimensions; whiten=True additionally rescales each component to unit variance (PCA always centers the data).
PCAinst = PCA(n_components=3, whiten=True)

Fit the instance on the data and then transform the data.
X_trans = PCAinst.fit_transform(X_train)

Does not work with sparse matrices.
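
A short end-to-end example of the syntax above; the iris data set stands in for X_train and is only an illustration:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X_train = load_iris().data                     # 150 samples, 4 features
PCAinst = PCA(n_components=3, whiten=True)
X_trans = PCAinst.fit_transform(X_train)

print(X_trans.shape)                           # (150, 3)
print(PCAinst.explained_variance_ratio_)       # variance captured per component
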
Truncated SVD: The Syntax
Import the class containing the dimensionality reduction method.
from sklearn.decomposition import TruncatedSVD

Create an instance of the class. Unlike PCA, TruncatedSVD does not center the data.
SVD = TruncatedSVD(n_components=3)

Fit the instance on the data and then transform the data.
X_trans = SVD.fit_transform(X_sparse)

Works with sparse matrices; used with text data for Latent Semantic Analysis (LSA).
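
A brief sketch of the LSA use case; the toy documents below are invented for illustration:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs are pets",
        "logs and mats are objects"]

X_sparse = TfidfVectorizer().fit_transform(docs)   # sparse document-term matrix
SVD = TruncatedSVD(n_components=2)
X_trans = SVD.fit_transform(X_sparse)              # dense 4 x 2 "topic" representation
print(X_trans.shape)
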
Moving Beyond Linearity
▪ Transformations calculated with PCA/SVD are linear
▪ Data can have non-linear features
▪ This can cause dimensionality reduction to fail

(Figure: "Original Space" vs. "Projection by PCA", where the linear projection fails to separate the non-linear structure)
Kernel PCA
▪ Solution: kernels can be used to perform non-linear PCA
▪ Like the kernel trick introduced for SVMs: a mapping Φ takes the data from R² to a feature space F, where linear PCA is applied

(Figure: "Original Space" vs. "Projection by KPCA"; "Linear PCA" in R² alongside "Kernel PCA" via Φ: R² → F)
Kernel PCA: The Syntax
Import the class containing the dimensionality reduction method.
from sklearn.decomposition import KernelPCA

Create an instance of the class.
kPCA = KernelPCA(n_components=3, kernel='rbf', gamma=1.0)

Fit the instance on the data and then transform the data.
X_trans = kPCA.fit_transform(X_train)
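
A minimal sketch (not from the slides) applying the syntax above to the concentric-circles data that linear PCA cannot untangle:

from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles; the classes are not linearly separable in R^2.
X_train, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

kPCA = KernelPCA(n_components=2, kernel='rbf', gamma=10.0)
X_trans = kPCA.fit_transform(X_train)
print(X_trans.shape)    # (400, 2); the circles become approximately linearly separable
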
Multi-Dimensional Scaling (MDS)
▪ Non-linear transformation
▪ Doesn't focus on maintaining overall variance
▪ Instead, maintains geometric distances between points

(Figure: a 3-D data set and its lower-dimensional embedding)
MDS: The Syntax
Import the class containing the dimensionality reduction method.
from sklearn.manifold import MDS

Create an instance of the class.
mdsMod = MDS(n_components=2)

Fit the instance on the data and then transform the data.
X_trans = mdsMod.fit_transform(X_train)

Many other manifold dimensionality reduction methods exist: Isomap, t-SNE (TSNE).
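
A short sketch of the syntax above on a synthetic 3-D data set; make_s_curve is used here only as a stand-in for real data:

from sklearn.datasets import make_s_curve
from sklearn.manifold import MDS

X_train, color = make_s_curve(n_samples=300, random_state=0)   # 300 samples in 3-D

mdsMod = MDS(n_components=2, random_state=0)
X_trans = mdsMod.fit_transform(X_train)        # 2-D embedding preserving distances
print(X_trans.shape)                           # (300, 2)
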
Uses of Dimensionality Reduction
▪ Frequently used for high-dimensional data
▪ Natural language processing (NLP): many word combinations
▪ Image-based data sets: pixels are features

Image Source: https://commons.wikimedia.org/wiki/File:Monarch_In_May.jpg
Uses of Dimensionality Reduction
▪ Divide the image into 12 x 12 pixel sections
▪ Flatten each section to create a row of data with 144 features
▪ Perform PCA on all data points

(Figure: each 12 x 12 patch flattened into a row of 144 pixel values; the rows are stacked into a data matrix)

Image Source: https://commons.wikimedia.org/wiki/File:Monarch_In_May.jpg
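
A rough sketch (not the course's code) of the patch-based pipeline just described, using a synthetic grayscale image in place of the photograph:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
image = rng.random((120, 120))                 # stand-in for the butterfly photo

# Divide the image into non-overlapping 12 x 12 patches and flatten each
# patch into a row of 144 features.
patches = [image[r:r + 12, c:c + 12].ravel()
           for r in range(0, 120, 12)
           for c in range(0, 120, 12)]
X = np.array(patches)                          # shape (100, 144)

# Compress 144 -> 16 dimensions and reconstruct to inspect the loss.
pca = PCA(n_components=16)
X_reduced = pca.fit_transform(X)               # (100, 16)
X_restored = pca.inverse_transform(X_reduced)  # (100, 144) approximation
print(X_reduced.shape, X_restored.shape)
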
PCA Compression: 144 → 60 Dimensions
(Figure: image patches reconstructed from the original 144 dimensions vs. 60 PCA dimensions)

PCA Compression: 144 → 16 Dimensions
(Figure: image patches reconstructed from the original 144 dimensions vs. 16 PCA dimensions)

Sixteen Most Important Eigenvectors
(Figure: the 16 leading eigenvectors displayed as 12 x 12 images)

PCA Compression: 144 → 4 Dimensions
(Figure: image patches reconstructed from the original 144 dimensions vs. 4 PCA dimensions)
L2 Error and PCA Dimension
(Figure: relative L2 reconstruction error plotted against the number of PCA dimensions, roughly 20 to 140; the error decreases as more dimensions are kept)
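
One way such a curve could be produced, sketched on synthetic data (the actual patch matrix from the photograph is not reproduced here):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((200, 144))                     # stand-in for the patch matrix

# Relative L2 (Frobenius) reconstruction error for increasing n_components.
for k in (1, 4, 16, 60, 144):
    pca = PCA(n_components=k).fit(X)
    X_hat = pca.inverse_transform(pca.transform(X))
    rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
    print(k, round(rel_err, 3))
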
Four Most Important Eigenvectors
(Figure: the 4 leading eigenvectors displayed as 12 x 12 images)
PCA Compression: 144 → 1 Dimension
(Figure: image patches reconstructed from the original 144 dimensions vs. a single PCA dimension)
