
Q1.

In the figure below, samples falling in one closed curve belong to one class, and samples falling in the other closed curve belong to the other class. Is PCA a good choice to reduce the dimension of the data? Why or why not?

Answer: PCA may not be a good choice. PCA is meant to find the best representation of the whole data set in fewer dimensions, but the problem stated here is a classification problem. Also, from the figure it is clear that the direction of maximum variability of the data is almost orthogonal to the direction along which the class labels vary.
Thus, projecting the data onto the direction of maximum variability will lose the information necessary for classification.
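This failure mode can be demonstrated numerically. The data below is hypothetical (the original figure is not reproduced here): two classes that each spread widely along x1 but are separated only along x2, so the top principal component is (nearly) orthogonal to the class-separating direction.

```python
import numpy as np

# Hypothetical data: each class has large variance along x1,
# but the classes are separated only along x2.
rng = np.random.default_rng(0)
class1 = rng.normal(loc=[0.0, 1.0], scale=[5.0, 0.3], size=(200, 2))
class2 = rng.normal(loc=[0.0, -1.0], scale=[5.0, 0.3], size=(200, 2))
X = np.vstack([class1, class2])

# PCA: top eigenvector of the covariance matrix of the pooled data.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
top_pc = eigvecs[:, np.argmax(eigvals)]   # direction of maximum variance

# The maximum-variance direction is (almost) the x1 axis,
# orthogonal to the direction (x2) that separates the classes.
print(abs(top_pc[0]))                     # close to 1.0

# After projecting onto it, the class means nearly coincide
# relative to the within-class spread: the classes are no longer separable.
proj1, proj2 = class1 @ top_pc, class2 @ top_pc
print(abs(proj1.mean() - proj2.mean()) / proj1.std())   # much less than 1
```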
Q2. Use the Fisher Linear Discriminant method to find the discriminant function, and predict whether the last row belongs to class 1 or class 2.
class        X1      X2
1            2.95    6.63
1            2.53    7.79
1            3.57    5.65
1            3.16    5.47
2            2.58    4.46
2            2.16    6.22
2            3.27    3.52
prediction   2.81    5.46
Answer: (following the worked procedure at http://people.revoledu.com/kardi/tutorial/LDA/index.html)

Class means and overall mean:
m1 = (3.05, 6.38), m2 = (2.67, 4.73), m = (2.88, 5.65)
Counts: n1 = 4, n2 = 3, so the priors are P(1) = 4/7 ≈ 0.571 and P(2) = 3/7 ≈ 0.429.

Mean-adjusted data (each sample minus its class mean):
class 1: (-0.10, 0.24), (-0.52, 1.40), (0.52, -0.73), (0.11, -0.92)
class 2: (-0.09, -0.28), (-0.51, 1.49), (0.60, -1.21)

Scatter matrices for the two classes:
S1 = |  0.56  -1.23 |      S2 = |  0.64  -1.47 |
     | -1.23   3.40 |           | -1.47   3.76 |

Within-class scatter matrix and its inverse:
Sw = S1 + S2 = |  1.20  -2.70 |      Sw^-1 = | 5.70  2.15 |
               | -2.70   7.16 |              | 2.15  0.95 |

Fisher direction and bias:
w = Sw^-1 (m1 - m2) = (5.714, 2.386)
c = -29.925 ≈ -w'm, so the discriminant g(x) = w'x + c is (approximately) zero at the overall mean.

Discriminant values g(x) for the eight rows:
x1': 2.73, x2': 3.11, x3': 3.94, x4': 1.16, x5': -4.53, x6': -2.76, x7': -2.82, x8': -0.85

The four class 1 samples get positive values and the three class 2 samples get negative values. For the last row, g(2.81, 5.46) = -0.85 < 0, so the last row is predicted to belong to class 2.
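The hand computation can be checked with NumPy. This is a sketch of the same procedure (w ∝ Sw⁻¹(m1 − m2), threshold taken at the overall mean); small differences from the worksheet values are expected because the worksheet rounds intermediate results.

```python
import numpy as np

# Q2 data: class 1 and class 2 samples as (X1, X2) rows, plus the test row.
C1 = np.array([[2.95, 6.63], [2.53, 7.79], [3.57, 5.65], [3.16, 5.47]])
C2 = np.array([[2.58, 4.46], [2.16, 6.22], [3.27, 3.52]])
x_test = np.array([2.81, 5.46])

m1, m2 = C1.mean(axis=0), C2.mean(axis=0)
m = np.vstack([C1, C2]).mean(axis=0)          # overall mean

S1 = (C1 - m1).T @ (C1 - m1)                  # scatter matrix, class 1
S2 = (C2 - m2).T @ (C2 - m2)                  # scatter matrix, class 2
Sw = S1 + S2                                  # within-class scatter

w = np.linalg.solve(Sw, m1 - m2)              # w = Sw^{-1} (m1 - m2)

def g(x):
    """Discriminant: positive -> class 1, negative -> class 2."""
    return w @ (x - m)

print(w)           # close to the worksheet's (5.714, 2.386)
print(g(x_test))   # negative, so the last row is predicted as class 2
```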

Q3. Which of the classification problems below will get a better discriminant function from the Least Squares method, and why? (5 min)

Answer: Least Squares weights every training sample equally while estimating the discriminant function (classifier). Looking at the second figure, the classifier obtained from those samples would also be almost equally good for classifying the samples in the first figure. However, training with the additional samples at the bottom-right corner of the first figure will tilt the classifier towards them, which may cause some samples close to the decision boundary to be misclassified.
Q4.
PCA and SVD are both dimension reduction techniques.
PCA can be derived from SVD.
PCA is achieved by projecting the data points onto the eigenvectors corresponding to the largest eigenvalues of the covariance matrix.
SVD is more general: it can be computed even for a singular or rectangular matrix.
SVD decomposes a matrix as A = UΣV'.
The columns of U are orthonormal eigenvectors of AA' (a basis for the column space of A).
The columns of V are orthonormal eigenvectors of A'A (a basis for the row space of A).
Σ is a diagonal matrix holding the singular values (the square roots of the eigenvalues of A'A).
In both methods, the eigenvectors corresponding to the largest eigenvalues (or singular values) capture the greatest variability in the data.
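The PCA-from-SVD connection can be sketched numerically (the data matrix below is synthetic): for a centered data matrix A with n rows, A = UΣV' gives the principal directions as the columns of V, and the covariance eigenvalues as λᵢ = σᵢ²/(n − 1).

```python
import numpy as np

# Synthetic centered data matrix with unequal variance along three directions.
rng = np.random.default_rng(1)
A = rng.normal(size=(100, 3)) @ np.array([[3.0, 0, 0], [1, 1, 0], [0, 0, 0.1]])
A = A - A.mean(axis=0)                        # center the data

# Route 1: SVD of the data matrix.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
eig_from_svd = s**2 / (len(A) - 1)            # covariance eigenvalues from sigma

# Route 2: eigendecomposition of the covariance matrix (the PCA definition).
eigvals = np.linalg.eigvalsh(A.T @ A / (len(A) - 1))[::-1]  # descending order

print(np.allclose(eig_from_svd, eigvals))     # True: the two routes agree

# Dimension reduction: project onto the top-2 principal directions (rows of Vt).
A_reduced = A @ Vt[:2].T                      # shape (100, 2)
```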

Q5. Classification by least squares

Assign a target value (or vector) to each class:

        X1   X2    Y
S_1      0    5    1
S_2      6    1    1
S_3      5    1   -1
S_4      2    2   -1

Apply the least squares method to this regression problem:

W' = [w0 w1 w2]
x' = [1 x1 x2]

Fit W'x ≈ Y in the least squares sense by solving the normal equations:

W = (X'X)^{-1} X'Y, where X = [S_1; S_2; ...] stacks the rows [1 x1 x2].

For a test point x, the predicted class is sign(W'x).
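The recipe above can be checked directly on the four samples:

```python
import numpy as np

# Design matrix: one row [1, x1, x2] per sample S_1..S_4, plus the targets Y.
X = np.array([[1, 0, 5],    # S_1
              [1, 6, 1],    # S_2
              [1, 5, 1],    # S_3
              [1, 2, 2]])   # S_4
Y = np.array([1, 1, -1, -1])

# W = (X'X)^{-1} X'Y; solving the normal equations directly is preferred
# to forming the inverse explicitly.
W = np.linalg.solve(X.T @ X, X.T @ Y)

def classify(x1, x2):
    """Predicted class for a test point: sign(W'x)."""
    return int(np.sign(W @ np.array([1, x1, x2])))

# Sanity check: all four training samples are classified correctly here.
print([classify(x1, x2) for _, x1, x2 in X])   # [1, 1, -1, -1]
```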


Q6. Overfitting in SVM

A hard, narrow margin is overfitting.

The solution is the soft-margin SVM.

(Some students used a non-linear SVM to show overfitting. Although overfitting can occur with non-linear SVMs as well, its remedies there are not as well developed. Moreover, overfitting is not necessarily linked to non-linearity, as many students assumed.)
Q7. Fisher Linear Discriminant
Refer to Sections 4.1.5 and 4.1.6 of the Bishop book.