Curse of Dimensionality
▪ 1 dimension: 10 positions
▪ 2 dimensions: 100 positions
▪ 3 dimensions: 1000 positions
▪ With 10 positions per axis, the number of possible positions grows exponentially (10^d) with the number of dimensions d
Solution: Dimensionality Reduction
▪ Data can be represented by fewer dimensions (features)
▪ Reduce dimensionality by selecting a subset (feature elimination)
▪ Combine with linear and non-linear transformations
[Figure: scatter plot of Height vs. Cigarettes/Day]
Solution: Dimensionality Reduction
▪ Two features: height and cigarettes per day
▪ Both features increase together (correlated)
▪ Can we reduce the number of features to one?
[Figure: Height vs. Cigarettes/Day scatter with a combined "Height + Cigarettes/Day" axis drawn through the data]
Solution: Dimensionality Reduction
▪ Create a single feature that is a combination of height and cigarettes per day
▪ This is Principal Component Analysis (PCA)
[Figure: data projected onto the single "Height + Cigarettes/Day" axis]
Dimensionality Reduction
Given an $N$-dimensional data set $x$, find an $N \times K$ matrix $U$ such that

$$y = U^T x, \qquad \text{where } y \text{ has } K \text{ dimensions and } K < N$$

$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{pmatrix} \xrightarrow{\;U^T\;} y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_K \end{pmatrix} \qquad (K < N)$$
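A minimal numpy sketch of this projection, assuming a stand-in $U$ (here an arbitrary orthonormal basis from a QR factorization; PCA, introduced below, instead chooses the columns of $U$ that capture the most variance):

import numpy as np

N, K = 10, 3                            # original and reduced dimensionality
x = np.random.rand(N)                   # one N-dimensional data point

# Stand-in U: any N x K matrix with orthonormal columns shows the shapes;
# QR of a random matrix is just a convenient way to build one.
U, _ = np.linalg.qr(np.random.rand(N, K))

y = U.T @ x                             # project down: y has K dimensions
print(y.shape)                          # (3,)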
Principal Component Analysis (PCA)
[Figure: X1 vs. X2 scatter with two principal axes overlaid; first axis with direction 𝑣1 and length 𝜆1, second axis with direction 𝑣2 and length 𝜆2]
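The directions 𝑣1, 𝑣2 and lengths 𝜆1, 𝜆2 in the figure are the eigenvectors and eigenvalues of the data's covariance matrix; a short sketch of computing them (the synthetic correlated data is illustrative):

import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[3, 2], [2, 2]], size=500)  # correlated 2-D data

C = np.cov(X, rowvar=False)              # 2 x 2 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # eigh sorts ascending for symmetric C

v1, lam1 = eigvecs[:, -1], eigvals[-1]   # first principal direction and length
v2, lam2 = eigvecs[:, -2], eigvals[-2]   # second principal direction and length
print(v1, lam1, v2, lam2)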
Single Value Decomposition (SVD)
▪ SVD is a matrix factorization method ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ 0 0
normally used for PCA ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ 0 ⋆ 0 ⋆ ⋆ ⋆
⋆ ⋆ ⋆ = ⋆ ⋆ ⋆ ⋆ ⋆ 0 0 ⋆ ⋆ ⋆ ⋆
▪ Does not require a square data set ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ 0 0 0 ⋆ ⋆ ⋆
▪ SVD is used by Scikit-learn for PCA ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ 0 0 0
𝑇
𝐴𝑚×𝑛 𝑈𝑚×𝑚 𝑆𝑚×𝑛 𝑉𝑛×𝑛
19
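A minimal numpy check of the factorization (numpy returns the singular values as a vector, so the diagonal $S$ is rebuilt explicitly here):

import numpy as np

A = np.random.rand(5, 3)              # non-square data matrix
U, s, Vt = np.linalg.svd(A)           # U is 5x5, s has 3 values, Vt is 3x3

S = np.zeros_like(A)                  # rebuild the 5x3 diagonal S
np.fill_diagonal(S, s)
print(np.allclose(A, U @ S @ Vt))     # True: A = U S V^T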
Truncated Single Value Decomposition
▪ How can SVD be used for ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ 0 0
dimensionality reduction? ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ 0 ⋆ 0 ⋆ ⋆ ⋆
⋆ ⋆ ⋆ = ⋆ ⋆ ⋆ ⋆ ⋆ 0 0 ⋆ ⋆ ⋆ ⋆
▪ Principal components are calculated ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ 0 0 0 ⋆ ⋆ ⋆
from 𝑈𝑆 ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ 0 0 0
𝑇
▪ "Truncated SVD" used for 𝐴𝑚×𝑛 𝑈𝑚×𝑚 𝑆𝑚×𝑛 𝑉𝑛×𝑛
dimensionality reduction (𝑛→𝑘)
20
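A truncation sketch in numpy, assuming the compact SVD and an illustrative 𝑘 = 2:

import numpy as np

A = np.random.rand(5, 3)                           # n = 3 original features
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # compact SVD

k = 2                                              # keep the top-k singular values
components = U[:, :k] * s[:k]                      # principal components from U S
A_k = components @ Vt[:k, :]                       # best rank-k approximation of A
print(components.shape)                            # (5, 2): n=3 reduced to k=2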
Importance of Feature Scaling
▪ PCA and SVD seek to find the vectors that capture the most variance
▪ Variance is sensitive to axis scale
▪ Must scale data!
[Figure: the same data unscaled (X1 spanning 10–50, X2 spanning 100–500) and after scaling]
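A sketch of scaling before PCA with scikit-learn's StandardScaler (the synthetic data and component count are illustrative):

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = np.random.rand(100, 2) * [50, 500]        # features on very different scales

X_scaled = StandardScaler().fit_transform(X)  # zero mean, unit variance per feature
X_trans = PCA(n_components=1).fit_transform(X_scaled)
print(X_trans.shape)                          # (100, 1)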
PCA: The Syntax
Import the class containing the dimensionality reduction method.
from sklearn.decomposition import PCA
Fit the instance on the data and then transform the data.
X_trans = PCAinst.fit_transform(X_train)
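The slides use an instance named PCAinst without showing its creation; a runnable sketch assuming a typical instantiation such as PCA(n_components=3):

from sklearn.decomposition import PCA
import numpy as np

X_train = np.random.rand(100, 10)          # stand-in training data

PCAinst = PCA(n_components=3)              # assumed: keep 3 components
X_trans = PCAinst.fit_transform(X_train)

print(X_trans.shape)                       # (100, 3)
print(PCAinst.explained_variance_ratio_)   # variance captured per component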
Truncated SVD: The Syntax
Import the class containing the dimensionality reduction method.
from sklearn.decomposition import TruncatedSVD
Fit the instance on the data and then transform the data.
X_trans = SVD.fit_transform(X_sparse)
Works with sparse matrices; used with text data for Latent Semantic Analysis (LSA).
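A fuller sketch of the text/LSA use case; the sample documents, TfidfVectorizer, and n_components=2 are illustrative, and the SVD instance name follows the slide:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs are pets"]

X_sparse = TfidfVectorizer().fit_transform(docs)  # sparse document-term matrix

SVD = TruncatedSVD(n_components=2)                # assumed: reduce to 2 components
X_trans = SVD.fit_transform(X_sparse)
print(X_trans.shape)                              # (3, 2)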
Moving Beyond Linearity
▪ Transformations calculated with PCA/SVD are linear
▪ Data can have non-linear features
▪ This can cause dimensionality reduction to fail
[Figure: original space vs. projection by PCA]
Kernel PCA
▪ Solution: kernels can be used to perform non-linear PCA
▪ Like the kernel trick introduced for SVMs
[Figure: linear PCA in ℝ² vs. kernel PCA mapping ℝ² into a feature space 𝐹 via Φ]
Kernel PCA: The Syntax
Import the class containing the dimensionality reduction method.
from sklearn.decomposition import KernelPCA
Fit the instance on the data and then transform the data.
X_trans = kPCA.fit_transform(X_train)
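A sketch on data that linear PCA cannot separate; make_circles, the RBF kernel, and gamma=10 are illustrative choices, and the kPCA instance name follows the slide:

from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X_train, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

kPCA = KernelPCA(n_components=2, kernel="rbf", gamma=10)  # assumed settings
X_trans = kPCA.fit_transform(X_train)
print(X_trans.shape)                                      # (400, 2)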
Multi-Dimensional Scaling (MDS)
▪ Non-linear transformation
▪ Doesn't focus on maintaining overall variance
▪ Instead, maintains geometric distances between points
[Figure: 3-D data embedded into a lower-dimensional space]
MDS: The Syntax
Import the class containing the dimensionality reduction method.
from sklearn.manifold import MDS
Fit the instance on the data and then transform the data.
X_trans = mdsMod.fit_transform(X_train)
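A usage sketch; the swiss-roll data and n_components=2 are illustrative, and the mdsMod instance name follows the slide (scikit-learn's MDS expects a dense array, hence X_train rather than the X_sparse used with TruncatedSVD):

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import MDS

X_train, _ = make_swiss_roll(n_samples=200, random_state=0)  # 3-D manifold data

mdsMod = MDS(n_components=2, random_state=0)  # assumed: embed in 2 dimensions
X_trans = mdsMod.fit_transform(X_train)
print(X_trans.shape)                          # (200, 2)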
Uses of Dimensionality Reduction
▪ Frequently used for high-dimensional data
▪ Natural language processing (NLP): many word combinations
▪ Image-based data sets: pixels are features
Uses of Dimensionality Reduction
▪ Divide the image into 12 × 12 pixel sections
▪ Flatten each section to create a row of data with 144 features
▪ Perform PCA on all data points
[Figure: 12 × 12 pixel sections flattened into rows of 144 pixel values]
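A sketch of this pipeline; the stand-in image, the non-overlapping patch grid, and the component count are assumptions:

import numpy as np
from sklearn.decomposition import PCA

image = np.random.rand(120, 120)              # stand-in grayscale image

# Divide into non-overlapping 12 x 12 sections, flatten each to 144 features
patches = [image[i:i + 12, j:j + 12].ravel()
           for i in range(0, 120, 12)
           for j in range(0, 120, 12)]
X = np.array(patches)                         # (100, 144)

pca = PCA(n_components=60)                    # assumed: 144 -> 60 dimensions
X_trans = pca.fit_transform(X)
X_rebuilt = pca.inverse_transform(X_trans)    # approximate reconstruction
print(X_trans.shape, X_rebuilt.shape)         # (100, 60) (100, 144)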
PCA Compression: 144 → 60 Dimensions
PCA Compression: 144 → 16 Dimensions
Sixteen Most Important Eigenvectors
PCA Compression: 144 → 4 Dimensions
L2 Error and PCA Dimension
[Figure: L2 reconstruction error vs. number of PCA dimensions]
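A sketch of how such a curve can be computed; the stand-in data and the plain L2 (Frobenius) norm are assumptions:

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(500, 144)                      # stand-in patch data

dims, errors = range(1, 145, 16), []
for k in dims:
    pca = PCA(n_components=k).fit(X)
    X_rebuilt = pca.inverse_transform(pca.transform(X))
    errors.append(np.linalg.norm(X - X_rebuilt))  # L2 reconstruction error
print(list(zip(dims, errors))[:3])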
Four Most Important Eigenvectors
PCA Compression: 144 → 1 Dimension