Abstract: Diabetes Mellitus, commonly called diabetes, is a harmful disease in which a person has a high blood
glucose level. The main cause of the disease is that the body fails to produce insulin or does not use insulin
properly. Diabetes is associated with factors such as insulin resistance, age, central obesity, stress, polyuria, and
polydipsia. Severe diabetes is associated with heart disease. Diabetes is highly prevalent worldwide. Existing
systems predict its complications from clinical datasets, but few base their predictions on risk factors. Systems
based on risk factors not only help experts but also warn patients in advance. This paper proposes a methodology
that aims to predict complications of diabetes in advance on the basis of risk factors. The methodology uses data
mining techniques for prediction. Two data mining techniques, Feed Forward Neural Network and Kernel
Discriminant Analysis (KDA), are used to classify the medical databases. The parameters used for disease
identification have been designed so that users can predict for themselves whether they are affected by diabetes.
Keywords: Diabetes Mellitus, Risk Factors, Medical Databases, Kernel Discriminant Analysis, Feed Forward Neural
Network.
I. INTRODUCTION
Diabetes is one of the most dangerous diseases and a cause of death [1]. It is a metabolic disorder that occurs when the
body fails to produce insulin properly. According to a W.H.O. projection, 300 million of the world's population will
be affected by diabetes by 2015 [2]. It has been observed that diabetes is fatal for many people and that it affects
women more than men. The worse effect on women is reflected in their lower survival rate and poorer quality of life.
A contributing factor is that many people lack knowledge of the disease [3]. The human body needs energy to function;
carbohydrates are broken down into glucose, which is an important energy source for body cells. Insulin is necessary
to transport glucose into the body cells; blood glucose reaches the cells with the help of insulin [4]. Many systems
exist that predict diabetes complications in advance and produce results on the basis of symptoms. Most of these
systems base their predictions on datasets available in clinical labs, but some predict diabetes from risk factors such
as insulin resistance, age, central obesity, stress, polyuria, and polydipsia. The major problems of these systems
remain diagnosing the disease correctly and the cost of medical tests [1]. Many data mining techniques are used to
solve these problems. Data mining techniques help experts and patients calculate diabetes risks; on the basis of these
risks, patients can know in advance whether they are affected by diabetes.
III. METHODOLOGY
a. The Data
Many risk factors contribute to diabetes, and they are difficult to diagnose. In most cases the disease is diagnosed
only at its last stage. With the help of risk factors, the possibility of disease can be identified in advance. The
dataset used in this research is composed of 9 attributes: the risk factors blood cholesterol, plasma glucose,
diastolic blood pressure, triceps skin fold thickness (SFT), insulin, BMI, diabetes pedigree function (DPF), and age,
together with the class label. On the basis of these risk factors, the results are computed to determine whether the
patient is at risk of diabetes. The dataset contains the records of 768 people collected from the UCI repository. The
dataset consists of 9 attributes as shown in Table 1.
                    Actual
Prediction      Positive    Negative
Positive        TP          FP
Negative        FN          TN
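The four cells of the matrix above can be counted directly from paired actual and predicted labels. The following is a minimal Python sketch, not the implementation used in this work; the function name `confusion_counts`, the 1/0 label encoding, and the sample labels are assumptions made only for illustration.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, FN, TN) for two equal-length sequences of binary labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct if the actual label is positive too.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: a miss if the actual label is positive.
            counts["FN" if a == positive else "TN"] += 1
    return counts["TP"], counts["FP"], counts["FN"], counts["TN"]


# Made-up labels for illustration only (1 = diabetic, 0 = non-diabetic).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, fn, tn = confusion_counts(actual, predicted)
print(tp, fp, fn, tn)  # prints: 3 1 1 3
```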
The following evaluation measures are used to assess the results [8].
Sensitivity: A high sensitivity is clearly important where the test is used to identify a serious but treatable disease.
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
Specificity: The specificity of a clinical test refers to the ability of the test to correctly identify those patients without the
disease.
$$\text{Specificity} = \frac{TN}{TN + FP}$$
Accuracy: Accuracy is the proportion of all cases, positive and negative, that the diagnostic test classifies correctly.
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
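As a worked illustration of the three ratios defined so far, the sketch below computes sensitivity, specificity, and accuracy from the four confusion-matrix counts. The function names and the example counts are hypothetical and are not results from this study; returning 0.0 for an empty denominator is a convenience chosen here, not something the definitions prescribe.

```python
def sensitivity(tp, fn):
    """TP / (TP + FN): share of actual positives that are detected."""
    total = tp + fn
    return tp / total if total else 0.0  # assumption: 0.0 when undefined


def specificity(tn, fp):
    """TN / (TN + FP): share of actual negatives that are correctly rejected."""
    total = tn + fp
    return tn / total if total else 0.0


def accuracy(tp, fp, tn, fn):
    """(TP + TN) / (TP + FP + TN + FN): share of all cases classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
tp, fp, fn, tn = 40, 10, 5, 45
print(round(sensitivity(tp, fn), 3))       # 40 / 45  -> 0.889
print(round(specificity(tn, fp), 3))       # 45 / 55  -> 0.818
print(round(accuracy(tp, fp, tn, fn), 3))  # 85 / 100 -> 0.85
```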
Precision: the fraction of retrieved instances that are relevant. Precision is calculated by:
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall: the fraction of relevant instances that are retrieved. Recall is calculated by:
$$\text{Recall} = \frac{TP}{TP + FN}$$
F-measure: a measure that combines precision and recall. It is the harmonic mean of precision and recall, also known as
the traditional F-measure or balanced F-score, and depends entirely on the values of precision and recall. F-measure is
calculated by:
$$\text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
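The precision, recall, and F-measure formulas can be checked with a short Python sketch. The function names and counts below are hypothetical and are not results from this study. Substituting the two ratios into the harmonic mean gives the equivalent form 2TP / (2TP + FP + FN), which the last line uses as a cross-check.

```python
def precision(tp, fp):
    """TP / (TP + FP): share of predicted positives that are truly positive."""
    total = tp + fp
    return tp / total if total else 0.0  # assumption: 0.0 when undefined


def recall(tp, fn):
    """TP / (TP + FN): share of actual positives that are retrieved."""
    total = tp + fn
    return tp / total if total else 0.0


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts, for illustration only.
tp, fp, fn = 40, 10, 5
print(round(precision(tp, fp), 3))      # 40 / 50 -> 0.8
print(round(recall(tp, fn), 3))         # 40 / 45 -> 0.889
print(round(f_measure(tp, fp, fn), 3))  # 80 / 95 -> 0.842
# Cross-check against the simplified form 2TP / (2TP + FP + FN).
print(abs(f_measure(tp, fp, fn) - 2 * tp / (2 * tp + fp + fn)) < 1e-12)  # True
```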
The results on the dataset are displayed in the form of a two-dimensional confusion matrix with a row and a column
for each class.
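One way to display such a matrix as plain text is sketched below. It assumes that rows hold the predicted class and columns the actual class, which is the reading consistent with the TP/FP/FN/TN layout given earlier; the function name `format_confusion_matrix` and the counts passed to it are hypothetical.

```python
def format_confusion_matrix(tp, fp, fn, tn):
    """Render a 2x2 confusion matrix with predicted rows and actual columns."""
    rows = [
        ("", "Actual positive", "Actual negative"),
        ("Predicted positive", str(tp), str(fp)),
        ("Predicted negative", str(fn), str(tn)),
    ]
    # Pad every column to the width of its longest cell so the columns line up.
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    return "\n".join(
        "  ".join(cell.ljust(widths[i]) for i, cell in enumerate(row)).rstrip()
        for row in rows
    )


# Hypothetical counts, for illustration only.
print(format_confusion_matrix(3, 1, 1, 3))
```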
VI. CONCLUSION
Data mining techniques used for diabetes prediction provide good results. In this research, the KDA process extracts
the features and the Feed Forward Neural Network produces the results on the basis of these features. The combination
of KDA and Feed Forward Neural Network achieved an accuracy of 96%. The obtained results show that this
approach performs better than existing techniques. There is still scope for improvement in this field: with the
advent of faster and more accurate learning techniques, the results can be improved further. In future work, the
performance of this system could be improved by using a better feature extraction technique and better classification
methods to create the model.
REFERENCES
[1] Anand A. Chaudhari and S. P. Akarte, "Fuzzy & Datamining based Disease Prediction Using K-NN Algorithm,"
International Journal of Innovations in Engineering and Technology (IJIET), ISSN 2319-1058, Volume 3,
Issue 4, April 2014.
[2] Sumathy, Mythili Thirugnanam, Praveen Kumar, Jishnujit T M, and K Ranjith Kumar, "Diagnosis of
Diabetes Mellitus based on Risk Factors," International Journal of Computer Applications (0975-8887),
Volume 10, Issue 4, November 2010.
[3] Aiswarya Iyer, S. Jeyalatha, and Ronak Sumbaly, "Diagnosis Of Diabetes Using Classification Mining
Techniques," International Journal of Data Mining & Knowledge Management Process (IJDKP), Volume 5,
Issue 1, January 2015.
[4] M. Durairaj and G. Kalaiselvi, "Prediction Of Diabetes Using Soft Computing Techniques - A Survey," International
Journal of Scientific & Technology Research, ISSN 2277-8616, Volume 4, Issue 03, March 2015.
[5] K. Rajesh and V. Sangeetha, "Application of Data Mining Methods and Techniques for Diabetes Diagnosis,"
International Journal of Engineering and Innovative Technology (IJEIT), Volume 2, Issue 3, September 2012.
[6] Ashis Pradhan, "Support Vector Machine - A Survey," International Journal of Emerging Technology and
Advanced Engineering, ISSN 2250-2459, Volume 2, Issue 8, August 2012.
[7] Thirumal P. C. and Nagarajan N, "Utilization Of Data Mining Techniques For Diagnosis Of Diabetes Mellitus -
A Case Study," ARPN Journal of Engineering and Applied Sciences, ISSN 1819-6608, Volume 10, Issue 1,
January 2015.
[8] A. S. Abdullah and R. Rajalaxmi, "A data mining model for predicting the coronary heart disease using
random forest classifier," in International Conference in Recent Trends in Computational Methods,
Communication and Controls, pp. 22-25, 2012.
[9] K. R. Lakshmi and S. Prem Kumar, "Utilization of Data Mining Techniques for Prediction of Diabetes Disease
Survivability," International Journal of Scientific & Engineering Research, ISSN 2229-5518, Volume 4, Issue 6,
June 2013.
[10] Miroslav Marinov, Abu Saleh Mohammad Mosa, Illhoi Yoo, and Suzanne Austin Boren, "Data-Mining
Technologies for Diabetes: A Systematic Review," Journal of Diabetes Science and Technology, Volume 5,
Issue 6, November 2011.