
ML LAB - Principal Component Analysis

-RAHUL NABERA M
-15BCE1101

DATASET: arrhythmia dataset

What is PCA?

Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to
convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated
variables called principal components. The number of distinct principal components is equal to the
smaller of the number of original variables and the number of observations minus one. The
transformation is defined so that the first principal component has the largest possible
variance (that is, it accounts for as much of the variability in the data as possible), and each succeeding
component in turn has the highest variance possible under the constraint that it is orthogonal to the
preceding components. The resulting vectors form an uncorrelated orthogonal basis set. PCA is sensitive
to the relative scaling of the original variables, which is why the features are standardized before PCA is
applied in the code below.
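
The properties described above can be checked numerically. The following is a minimal sketch, not the lab's code: it computes PCA from the eigendecomposition of the covariance matrix using NumPy only. The helper name pca_eig and the synthetic array X_demo are assumptions made for illustration; the arrhythmia data is not used here.

```python
import numpy as np

def pca_eig(X, n_components):
    """PCA via eigendecomposition of the sample covariance matrix (illustrative helper)."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]      # columns are the principal directions
    variances = eigvals[order]          # variance along each direction
    scores = X_centered @ components    # data expressed in the new basis
    return scores, components, variances

# Synthetic, correlated data (hypothetical stand-in for real measurements)
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X_demo = np.column_stack([
    latent[:, 0],
    0.8 * latent[:, 0] + 0.2 * latent[:, 1],
    latent[:, 1],
]) + 0.05 * rng.normal(size=(200, 3))

scores, components, variances = pca_eig(X_demo, 3)
print("variances (largest first):", variances)
print("covariance of the scores:\n", np.round(np.cov(scores, rowvar=False), 6))

# Sensitivity to scaling: inflating one feature makes it dominate the first component
X_rescaled = X_demo * np.array([100.0, 1.0, 1.0])
_, components_rescaled, _ = pca_eig(X_rescaled, 3)
print("first component after rescaling:", np.round(components_rescaled[:, 0], 3))
```

The covariance matrix of the scores comes out diagonal (the components are uncorrelated), the variances are sorted from largest to smallest, and rescaling a single feature changes the directions that PCA finds.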

CODE:

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset

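dataset = pd.read_csv('log.csv', header=None)

# Features are all columns except the last; the last column is the class label
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Optional mean imputation, left disabled as in the original run.
# Imputer was removed in scikit-learn 0.22; SimpleImputer is its replacement.
#from sklearn.impute import SimpleImputer
#imputer = SimpleImputer(missing_values=0, strategy='mean')
#X[:, 10:15] = imputer.fit_transform(X[:, 10:15])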

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

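# Applying PCA: keep the first 7 principal components
from sklearn.decomposition import PCA
pca = PCA(n_components=7)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)
# Fraction of the total variance explained by each retained component
Explained = pca.explained_variance_ratio_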

# Fitting a K-nearest neighbours classifier (k = 5) to the PCA-reduced Training set
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

# Reporting accuracy on the Training and Test sets
print('accuracy train:{:.3f}'.format(classifier.score(X_train, y_train)))
print('accuracy test:{:.3f}'.format(classifier.score(X_test, y_test)))
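
The script fixes n_components at 7 and stores the explained variance ratios without using them further. One common way to judge such a choice is to look at the cumulative explained variance. The sketch below illustrates the idea on synthetic data generated with make_classification; the sample and feature counts, the 95% threshold and the names X_demo, cumulative and k_95 are assumptions for illustration, not values from the lab.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the lab data (hypothetical shape, not the arrhythmia dataset)
X_demo, _ = make_classification(n_samples=300, n_features=40,
                                n_informative=10, random_state=0)

# Standardize first, as in the lab script, because PCA is scale-sensitive
X_scaled = StandardScaler().fit_transform(X_demo)

# Fit PCA with all components and accumulate the explained variance ratios
pca_full = PCA().fit(X_scaled)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches 95%
k_95 = int(np.searchsorted(cumulative, 0.95) + 1)
print("components needed for 95% of the variance:", k_95)
print("variance kept by the first 7 components: {:.3f}".format(cumulative[6]))
```

scikit-learn's PCA also accepts a float between 0 and 1 for n_components, for example PCA(n_components=0.95), and then selects the number of components needed to explain that fraction of the variance.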

RESULTS:

PCA’s:
ACCURACY:

accuracy train:0.791
accuracy test:0.684
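
The script builds the confusion matrix cm but prints only the accuracy. The two are directly related: accuracy is the sum of the diagonal of the confusion matrix divided by the total number of samples. The sketch below shows this on a made-up 3-class matrix; the counts in cm_demo are hypothetical and are not the lab's results.

```python
import numpy as np

# Hypothetical confusion matrix (rows: true class, columns: predicted class).
# These counts are invented for illustration only.
cm_demo = np.array([
    [50,  3,  2],
    [ 4, 30,  6],
    [ 1,  5, 19],
])

# Correct predictions lie on the diagonal
accuracy = np.trace(cm_demo) / cm_demo.sum()

# Recall per class: correct predictions divided by the number of true samples in that class
per_class_recall = np.diag(cm_demo) / cm_demo.sum(axis=1)

print("accuracy: {:.3f}".format(accuracy))
print("per-class recall:", np.round(per_class_recall, 3))
```

Per-class recall is worth checking alongside accuracy when the classes are imbalanced, since a high overall accuracy can hide poor performance on rare classes.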
