Classification challenges
• In machine learning, we have many parameters, which are
called features.
2-Dimensional graph:
Gene 1 on the X axis, Gene 2 on the Y axis
3 Dimensional Graph
4 Dimensional Graph
PCA is the solution
• Let's discuss how PCA can handle 4 or more
genes.
• We will also see how a 4D dataset is converted into a 2D
graph.
PCA with 2 variables
PCA with 2 genes
PCA with 2 genes: find the center point
Focus on the data
Let's focus on the plotted data, not on the original data anymore.
X is the center point
Move the data points to the center
Shifting the data did not change the data points' positions relative to each other.
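The centering step above can be sketched as follows (a minimal sketch with made-up gene measurements, not the slide's actual data):

```python
import numpy as np

# Hypothetical Gene 1 / Gene 2 measurements (rows = samples)
data = np.array([[10.0, 6.0],
                 [11.0, 4.0],
                 [ 8.0, 5.0],
                 [ 3.0, 3.0],
                 [ 2.0, 2.8],
                 [ 1.0, 1.0]])

center = data.mean(axis=0)   # the point marked X on the slide
shifted = data - center      # move the center of the data to the origin

# Subtracting a constant from every point leaves all pairwise
# differences unchanged, so relative positions are preserved
assert np.allclose(shifted - shifted[0], data - data[0])
```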
Fit a Line & Rotate it
How does PCA decide whether this fit is good or not?
PC1 Calculation
Distance calculation
Distance Criteria
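The distance criterion can be sketched numerically: for a candidate line through the origin, project each centered point onto it and sum the squared distances of the projections from the origin; the line that maximizes this sum is PC1. The data below are hypothetical, not the slide's:

```python
import numpy as np

# Centered example points (hypothetical)
pts = np.array([[ 4.0,  1.2], [ 2.0,  0.3], [-1.0, -0.5],
                [-2.0, -0.2], [-3.0, -0.8]])

def sum_sq_projections(points, slope):
    """Sum of squared distances from the origin to each point's
    projection onto the line y = slope * x through the origin."""
    d = np.array([1.0, slope])
    d = d / np.linalg.norm(d)   # unit direction vector
    proj = points @ d           # signed projection lengths
    return np.sum(proj ** 2)

# Rotate the line through a range of slopes; the best one is PC1
slopes = np.linspace(-2, 2, 401)
scores = [sum_sq_projections(pts, s) for s in slopes]
best = slopes[int(np.argmax(scores))]
```

Maximizing the squared projection distances is equivalent to minimizing the perpendicular distances from the points to the line, which is why either criterion gives the same best-fit line.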
Slope is 0.25
Inference
Most of the variation is spread along Gene 1 and less along Gene 2.
PC1 = 4 parts of Gene 1 + 1 part of Gene 2
For PCA using SVD, the length of this direction vector is scaled to 1.
PC1 – Final calculation
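The final PC1 calculation from the slope 0.25 can be sketched as:

```python
import numpy as np

# PC1 direction from the slide: 4 parts Gene 1, 1 part Gene 2 (slope 0.25)
direction = np.array([4.0, 1.0])
length = np.linalg.norm(direction)   # sqrt(4^2 + 1^2) ≈ 4.12

# For PCA via SVD the direction is scaled to unit length; the entries
# of the resulting unit vector are the loading scores of PC1
pc1 = direction / length             # ≈ [0.97, 0.242]
```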
PC2 Calculation
PC2 Steps
PCA Final Plot
Rotate the plot
Variation around each PC
Scree Plot
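The numbers behind a scree plot can be sketched as follows (the eigenvalues here are hypothetical, just to show the calculation):

```python
import numpy as np

# Hypothetical variances (eigenvalues) for PC1 and PC2
eigvals = np.array([15.0, 3.0])

# A scree plot shows each PC's percentage of the total variation
explained = 100.0 * eigvals / eigvals.sum()   # PC1 ≈ 83.3 %, PC2 ≈ 16.7 %
```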
PCA with 3 variables
Center the data & find the best-fit line to
find PC1
PC2
PC3
The PCs and their variation
Convert 3D to 2D graph
Strip away everything other than PC1 and PC2
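The 3D-to-2D conversion can be sketched with SVD (the 3-gene data below are randomly generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 3-gene data: 20 samples x 3 variables,
# with most spread along the first variable
X = rng.normal(size=(20, 3)) * np.array([5.0, 2.0, 0.5])
Xc = X - X.mean(axis=0)          # center the data first

# SVD gives the principal directions as the rows of Vt
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Keep only PC1 and PC2: the 3D cloud becomes a 2D plot
coords_2d = Xc @ Vt[:2].T        # shape (20, 2)
```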
PCA with 4 variables
PCA with 4D
• It is not possible to draw a 4D graph, but the
PCA calculation can still be done.
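This is the point of PCA for high-dimensional data: the arithmetic never needs the graph. A minimal sketch on randomly generated 4-variable data:

```python
import numpy as np

rng = np.random.default_rng(1)
# 4 genes: impossible to plot directly, but PCA still works
X = rng.normal(size=(30, 4))
Xc = X - X.mean(axis=0)

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
var = S**2 / (len(Xc) - 1)   # variance along each of the 4 PCs
share = var / var.sum()      # per-PC share of variation: the scree plot input
```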
Scree plot for 4D data
Maths
Standard Deviation
• The standard deviation (SD) of a data set is a
measure of how spread out the data are around the mean.
Variance
• Variance is another measure of the spread of
data in a data set. In fact it is almost identical
to the standard deviation: it is simply the square of the SD.
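Both measures can be sketched in one short example (made-up data; the population formula with divisor n is used here, while the sample version divides by n − 1):

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
mean = data.mean()

# Variance: average squared deviation from the mean
variance = np.sum((data - mean) ** 2) / len(data)

# Standard deviation is simply the square root of the variance
sd = np.sqrt(variance)
```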
Covariance
• Covariance is always measured between 2 dimensions (variables).
• Positive covariance: indicates that two variables tend to move in the same
direction.
• Negative covariance: indicates that they tend to move in opposite directions.
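A sketch of the covariance calculation on hypothetical gene data (sample formula, divisor n − 1):

```python
import numpy as np

gene1 = np.array([10.0, 11.0, 8.0, 3.0, 2.0, 1.0])
gene2 = np.array([ 6.0,  4.0, 5.0, 3.0, 2.8, 1.0])

# Covariance: average product of the two variables' deviations from their means
cov = np.sum((gene1 - gene1.mean()) * (gene2 - gene2.mean())) / (len(gene1) - 1)

# Positive here: the two genes tend to rise and fall together
assert cov > 0
```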
Eigenvalues
| 0.049083     0      |
|    0      1.284028  |
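These eigenvalues come from decomposing a 2×2 covariance matrix; each eigenvalue is the variance along one of the rotated axes. The covariance matrix below is an assumption (the slide does not show it), chosen so that its eigenvalues match the values above:

```python
import numpy as np

# Hypothetical 2x2 covariance matrix (an assumption: picked so its
# eigenvalues reproduce the slide's 0.049083 and 1.284028)
cov = np.array([[0.616556, 0.615444],
                [0.615444, 0.716556]])

# eigh is for symmetric matrices; it returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(cov)

# Reorder so the largest eigenvalue (PC1's variance) comes first
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
```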
Rotated Axes
[Scatter plot of the data on the rotated (principal) axes; both axes run from −2 to 2]
Find the new feature vector
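Forming the feature vector and the final transformed data can be sketched as follows (randomly generated 2-variable data for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
Xc = rng.normal(size=(10, 2))
Xc = Xc - Xc.mean(axis=0)        # center the data

# Eigen-decompose the covariance matrix (eigh: symmetric, ascending order)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc.T))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # PC1 first

# Feature vector: the chosen eigenvectors stacked as columns
# (keeping both here; dropping a column would reduce the dimensionality)
feature_vector = eigvecs

# Final transformed data: project the centered data onto the new axes
transformed = Xc @ feature_vector
```

The variance of each column of `transformed` equals the corresponding eigenvalue, which is why the eigenvalues feed directly into the scree plot.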