
DWIT COLLEGE

DEERWALK INSTITUTE OF TECHNOLOGY


Tribhuvan University
Institute of Science and Technology

DISEASE PREDICTOR

A PROJECT REPORT
Submitted to
Department of Computer Science and Information Technology
DWIT College

In partial fulfillment of the requirements for the Bachelor’s Degree in Computer
Science and Information Technology

Submitted by
Anil Parajuli
August, 2016
DWIT College
DEERWALK INSTITUTE OF TECHNOLOGY
Tribhuvan University

SUPERVISOR’S RECOMMENDATION

I hereby recommend that this project prepared under my supervision by ANIL
PARAJULI entitled “DISEASE PREDICTOR” in partial fulfillment of the
requirements for the degree of B.Sc. in Computer Science and Information Technology be
processed for evaluation.

…………………………………………
Ritu Raj Lamsal
Lecturer
Deerwalk Institute of Technology
DWIT College
DWIT College
DEERWALK INSTITUTE OF TECHNOLOGY
Tribhuvan University

LETTER OF APPROVAL

This is to certify that this project prepared by ANIL PARAJULI entitled “DISEASE
PREDICTOR” in partial fulfillment of the requirements for the degree of B.Sc. in
Computer Science and Information Technology has been well studied. In our opinion it is
satisfactory in the scope and quality as a project for the required degree.

……………………………………
Ritu Raj Lamsal [Supervisor]
Lecturer
DWIT College

…………………………………………
Hitesh Karki
Chief Academic Officer
DWIT College

…………………………………………..
Jagdish Bhatta [External Examiner]
IOST, Tribhuvan University

…………………………………………..
Sarbin Sayami [Internal Examiner]
Assistant Professor
IOST, Tribhuvan University

ACKNOWLEDGEMENT

First of all, I would like to thank DWIT College for providing me with the opportunity
and resources needed for this project. I am also truly thankful to my respected and
esteemed guide, Mr. Ritu Raj Lamsal, who helped me complete this project.

At the end, I would like to express my sincere thanks to all my friends and others who
helped me directly or indirectly during this project work.

Anil Parajuli
TU Roll No.: 1789/069

Tribhuvan University
Institute of Science and Technology

STUDENT’S DECLARATION

I hereby declare that I am the only author of this work and that no sources other than
those listed here have been used in this work.

... ... ... ... ... ... ... ...

Anil Parajuli

Date: August, 2016

ABSTRACT

The “Disease Predictor” system, based on predictive modeling, predicts the disease of the
user on the basis of the symptoms that the user provides as input. The system analyzes
these symptoms and gives the probability of each disease as output.
Disease prediction is done by implementing the Naïve Bayes Classifier, which calculates
the probability of the disease. An average prediction accuracy of 60% is obtained.

Keywords: Predictive Modeling, Naïve Bayes Classifier

TABLE OF CONTENTS

LETTER OF APPROVAL
ACKNOWLEDGEMENT
STUDENT’S DECLARATION
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS

CHAPTER 1 INTRODUCTION
  1.1 Introduction
  1.2 Problem Statement
  1.3 Objective
    1.3.1 General Objective
    1.3.2 Specific Objective
  1.4 Scope and Limitations
    1.4.1 Scope
    1.4.2 Limitations
  1.5 Outline of Document

CHAPTER 2 REQUIREMENT ANALYSIS AND FEASIBILITY ANALYSIS
  2.1 Literature Review
  2.2 Requirement Analysis
    2.2.1 Functional requirements
    2.2.2 Non-functional requirements
  2.3 Feasibility Analysis
    2.3.1 Technical feasibility
    2.3.2 Economic feasibility
    2.3.3 Operational feasibility

CHAPTER 3 SYSTEM DESIGN
  3.1 Methodology
    3.1.1 Data collection
    3.1.2 Algorithm implemented
  3.2 System Design
    3.2.1 Class Diagram
    3.2.2 State diagram
    3.2.3 Sequence diagram

CHAPTER 4 IMPLEMENTATION AND TESTING
  4.1 Implementation
    4.1.2 Description

CHAPTER 5 MAINTENANCE AND SUPPORT
  5.1 Corrective Maintenance
  5.2 Adaptive Maintenance

CHAPTER 6 CONCLUSION AND RECOMMENDATION
  6.1 Conclusion
  6.2 Recommendations

APPENDIX

REFERENCES
LIST OF FIGURES

Figure 1 - Class Diagram
Figure 2- State Diagram
Figure 3- Sequence Diagram
Figure 4- Workflow
LIST OF TABLES

Table 1- Predictive Accuracy of Bayes and other Technique
Table 2- Sample Data Sets
Table 3- Sample Output
LIST OF ABBREVIATIONS

CARE- Collaborative Assessment and Recommendation Engine
ICD- International Classification of Diseases
NB- Naïve Bayes
HTML- HyperText Markup Language
CSS- Cascading Style Sheets
Disease Predictor

CHAPTER 1 INTRODUCTION

1.1 Introduction

At present, a person suffering from a disease has to visit a doctor, which is
time-consuming and costly. If the person is out of reach of doctors and hospitals, the
disease may go unidentified. If this process can be completed by an automated program,
it saves time as well as money and makes things easier for the patient. Existing heart
disease prediction systems use data mining techniques to analyze the risk level of the
patient.

Disease Predictor is a web-based application that predicts the disease of the user from
the symptoms the user provides. Its data sets are collected from different health-related
sites. With the help of Disease Predictor, the user can find out the probability of a
disease given the symptoms.

As the use of the internet grows every day, people are increasingly curious and tend to
turn to the internet whenever a problem arises. People have easier access to the internet
than to hospitals and doctors, and they often have no immediate option when they fall
ill. This system can therefore be helpful, as it is accessible over the internet 24 hours
a day.


1.2 Problem Statement

There are many tools related to disease prediction, but they mostly analyze heart-related
diseases and generate a risk level. There are generally no such tools for the prediction
of general diseases. Disease Predictor therefore helps in the prediction of general
diseases.

1.3 Objective

1.3.1 General Objective

-To implement Naïve Bayes Classifier that classifies the disease as per the input of the
user.

1.3.2 Specific Objective

-To develop a web interface for the prediction of the disease.

1.4 Scope and Limitations

1.4.1 Scope

This project aims to provide a web platform that predicts the occurrence of a disease on
the basis of various symptoms. The user can select various symptoms and find the likely
diseases with their probabilities.


1.4.2 Limitations

The limitations of this project are:

a. Disease Predictor does not recommend medications of the disease.

b. Past history of the disease has not been considered.

1.5 Outline of Document

Preliminary Section
- Title Page
- Abstract
- Table of Contents
- List of Figures and Tables

Introduction Section
- Background of Research
- Statement of Problem
- Objectives

Requirement Analysis and Feasibility Analysis
- Literature Review
- Requirement Analysis
- Feasibility Analysis

System Design
- Methodology
- System Design
- Implementation and Testing

Maintenance and Support
- Maintenance
- Support

Conclusion and Recommendation
- Conclusion
- Recommendation

CHAPTER 2 REQUIREMENT ANALYSIS AND FEASIBILITY ANALYSIS

2.1 Literature Review

K.M. Al-Aidaroos, A.A. Bakar and Z. Othman conducted research to find the best data
mining technique for medical diagnosis. The authors compared Naïve Bayes (NB) with five
other classifiers: Logistic Regression (LR), KStar (K*), Decision Tree (DT), Neural
Network (NN) and a simple rule-based algorithm (ZeroR). Fifteen real-world medical
problems from the UCI machine learning repository (Asuncion and Newman, 2007) were
selected for evaluating the performance of all algorithms. In the experiment, NB
outperformed the other algorithms in 8 out of 15 data sets, so it was concluded that the
predictive accuracy of Naïve Bayes is better than that of the other techniques.


Table 1- Predictive Accuracy of Bayes and other Technique

Medical Problems         NB     LR     K*     DT     NN     ZeroR
Breast Cancer Wisc       97.3   92.98  95.72  94.57  95.57  65.52
Breast Cancer            72.7   67.77  73.73  74.28  66.95  70.3
Dermatology              97.43  96.89  94.51  94.1   96.45  30.6
Echocardiogram           95.77  94.59  89.38  96.41  93.64  67.86
Liver Disorders          54.89  68.72  66.82  65.84  68.73  57.98
Pima Diabetes            75.75  77.47  70.19  74.49  74.75  65.11
Haberman                 75.36  74.41  73.73  72.16  70.32  73.53
Heart-c                  83.34  83.7   75.18  77.13  80.99  54.45
Heart-statlog            84.85  84.04  73.89  75.59  81.78  55.56
Heart-b                  83.95  84.23  77.83  80.22  80.07  63.95
Hepatitis                83.81  83.89  80.17  79.22  80.78  79.38
Lung Cancer              53.25  47.25  41.67  40.83  44.08  40
Lymphography             84.97  78.45  83.18  78.21  81.81  54.76
Postoperative Patient    68.11  61.11  61.67  69.78  58.54  71.11
Primary Tumor            49.71  41.62  38.02  41.39  40.38  24.78
Wins                     8/15   5/15   0/15   2/15   1/15   1/15
(Al-Aidaroos, Bakar, & Othman, 2012)

Darcy A. Davis, Nitesh V. Chawla, Nicholas Blumm, Nicholas Christakis and Albert-Laszlo
Barabasi found that the global treatment of chronic disease is neither time- nor
cost-efficient, so they conducted research to predict future disease risk. For this they
used CARE, which relies only on a patient’s medical history, expressed as ICD-9-CM codes,
to predict future disease risks. CARE combines collaborative filtering methods with
clustering to predict each patient’s greatest disease risks based on their own medical
history and that of similar patients. The authors also describe an iterative version,
ICARE, which incorporates ensemble concepts for improved performance. These systems
require no specialized information and provide predictions for medical conditions of all
kinds in a single run. The future disease coverage of ICARE represents more accurate
early warnings for thousands of diseases, some even years in advance. Applied to its full
potential, the CARE framework can be used to explore broader disease histories, suggest
previously unconsidered concerns, and facilitate discussion about early testing and
prevention.
(Davis, Chawla, Blumm, Christakis, & Barabasi, 2008)

Jyoti Soni, Ujma Ansari, Dipesh Sharma and Sunita Soni surveyed the current techniques of
knowledge discovery in databases using data mining that are in use in today’s medical
research, particularly in heart disease prediction. A number of experiments were
conducted to compare the performance of predictive data mining techniques on the same
dataset. The outcome reveals that Decision Tree outperforms the other techniques and that
Bayesian classification sometimes has accuracy similar to that of Decision Tree, whereas
other predictive methods such as KNN, Neural Networks and classification based on
clustering do not perform well.
(Soni, Ansari, Sharma, & Soni, 2011)

Shadab Adam Pattekari and Asma Parveen conducted research using the Naïve Bayes
algorithm to predict heart disease, where the data provided by the user is compared with
a trained set of values. In this system, patients provide their basic information, which
is compared with the data, and the heart disease is predicted.
(Adam & Parveen, 2012)

M.A. Nishara Banu and B. Gomathy used medical data mining techniques such as association
rule mining, classification and clustering to analyze different kinds of heart-related
problems. A decision tree is built to illustrate every possible outcome of a decision,
and different rules are made to get the best outcome. In this research, age, sex,
smoking, overweight, alcohol intake, blood sugar, heart rate and blood pressure are the
parameters used for making the decisions. The risk levels for the different parameters
are stored with IDs ranging from 1 to 8: an ID of 1 or lower indicates the normal level,
and IDs higher than 1 indicate higher risk levels. The K-means clustering technique is
used to study the patterns in the dataset. The algorithm clusters the data into k groups:
each point in the dataset is assigned to the closest cluster, and each cluster center is
recomputed as the average of the points in that cluster.
(Nishara Banu & Gomathy, 2013)
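
The K-means procedure described above can be sketched as follows. This is only a minimal illustration in Python, not the code of the cited authors; the function names `squared_distance` and `kmeans`, the fixed number of iterations and the sample points are assumptions made for the example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, iterations=10):
    """Cluster points around the given initial centers (illustrative helper)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        # Assignment step: each point goes to its closest center.
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: each center becomes the average of its points.
        for j, cluster in enumerate(clusters):
            if cluster:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centers, clusters


# Made-up two-dimensional points forming two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

With these sample points the two centers settle after the first iteration, one per group of nearby points.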


2.2 Requirement Analysis

2.2.1 Functional requirements

a. Predict disease with the given symptoms.

b. Compare the given symptoms with the input datasets.

2.2.2 Non-functional requirements

a. Display the list of symptoms where user can select the symptoms.

b. Naïve Bayes Classifier is used to classify the data sets.

2.3 Feasibility Analysis

2.3.1 Technical feasibility

The project is technically feasible as it can be built using existing technologies. It is
a web-based application that uses the Grails framework. The technology required by
Disease Predictor is available and hence it is technically feasible.

2.3.2 Economic feasibility

The project is economically feasible as the only cost involved is the hosting of the
project. As the number of data samples increases, more time and processing power are
consumed; in that case, a better processor might be needed.


2.3.3 Operational feasibility

The project is operationally feasible as it can be used by anyone with basic knowledge of
computers and the Internet. Disease Predictor is based on a client-server architecture,
where the clients are the users and the server is the machine where the datasets are
stored.


CHAPTER 3 SYSTEM DESIGN

3.1 Methodology

Disease prediction has already been implemented using different techniques such as Neural
Networks, Decision Trees and the Naïve Bayes algorithm, mostly for heart-related
diseases. From the analysis, Naïve Bayes was found to be more accurate than the other
techniques. So, Disease Predictor also uses Naïve Bayes for the prediction of different
diseases.

3.1.1 Data collection

The data has been collected from the internet. The real symptoms of each disease are
collected, i.e. no dummy values are entered. The symptoms of the diseases are collected
from different health-related websites.

3.1.2 Algorithm implemented

The algorithm implemented in this project is Naïve Bayes Classifier.

Naïve Bayes classifier is based on Bayes' theorem.

Equation 1:
$$P(Y \mid X_1, \dots, X_n) = \frac{P(Y)\, P(X_1, \dots, X_n \mid Y)}{P(X_1, \dots, X_n)}$$
Where,
$Y$ is the class variable
$X_1, X_2, \dots, X_n$ are the dependent features

From Equation 1 we get Equation 2 as:
$$P(\text{Disease} \mid \text{symptom}_1, \dots, \text{symptom}_n) = \frac{P(\text{Disease})\, P(\text{symptom}_1, \dots, \text{symptom}_n \mid \text{Disease})}{P(\text{symptom}_1, \dots, \text{symptom}_n)}$$

Using the naive independence assumption, each symptom is conditionally independent of the
other symptoms given the disease:
$$P(\text{symptom}_i \mid \text{Disease}, \text{symptom}_1, \dots, \text{symptom}_{i-1}, \text{symptom}_{i+1}, \dots, \text{symptom}_n) = P(\text{symptom}_i \mid \text{Disease})$$
where $i = 1, 2, \dots, n$.

Equation 3:
$$P(\text{symptom}_1, \dots, \text{symptom}_n \mid \text{Disease}) = \prod_{i=1}^{n} P(\text{symptom}_i \mid \text{Disease})$$

So the relation becomes:
Equation 4:
$$P(\text{Disease} \mid \text{symptom}_1, \dots, \text{symptom}_n) = \frac{P(\text{Disease}) \prod_{i=1}^{n} P(\text{symptom}_i \mid \text{Disease})}{P(\text{symptom}_1, \dots, \text{symptom}_n)}$$

Since $P(\text{symptom}_1, \dots, \text{symptom}_n)$ is constant for a given input, we can
use the following classification rule:
$$P(\text{Disease} \mid \text{symptom}_1, \dots, \text{symptom}_n) \propto P(\text{Disease}) \prod_{i=1}^{n} P(\text{symptom}_i \mid \text{Disease})$$
$$\hat{Y} = \arg\max_{\text{Disease}} P(\text{Disease}) \prod_{i=1}^{n} P(\text{symptom}_i \mid \text{Disease})$$
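
The classification rule above can be illustrated with a short Python sketch. It is not the project's Grails code: the function name `predict` and the probability tables are made up for the example, and summing logarithms instead of multiplying probabilities is an added design choice that avoids numerical underflow when many symptoms are selected.

```python
import math


def predict(priors, likelihoods, symptoms):
    """Return the disease maximizing P(Disease) * prod_i P(symptom_i | Disease)."""
    best_disease, best_score = None, -math.inf
    for disease, prior in priors.items():
        # The log of a product is the sum of the logs, so the ranking is unchanged.
        score = math.log(prior)
        for symptom in symptoms:
            score += math.log(likelihoods[disease][symptom])
        if score > best_score:
            best_disease, best_score = disease, score
    return best_disease


# Hypothetical probabilities, chosen only to make the arithmetic easy to follow.
priors = {"Malaria": 0.5, "Common cold": 0.5}
likelihoods = {
    "Malaria": {"fever": 0.4, "cough": 0.1},
    "Common cold": {"fever": 0.2, "cough": 0.4},
}

with_cough = predict(priors, likelihoods, ["fever", "cough"])
fever_only = predict(priors, likelihoods, ["fever"])
```

For fever and cough together, the scores are 0.5 × 0.4 × 0.1 = 0.02 for Malaria and 0.5 × 0.2 × 0.4 = 0.04 for Common cold, so Common cold is selected; for fever alone, Malaria (0.2) beats Common cold (0.1).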

The value of $P(\text{symptom}_i \mid \text{Disease})$ can be calculated by using
multinomial Naïve Bayes, which is given by:

$$P(\text{symptom}_i \mid \text{Disease}) = \frac{N_{yi} + \alpha}{N_y + \alpha n}$$

Where:
$N_{yi}$ = number of times symptom $i$ appears with the disease in the dataset
$N_y$ = total number of symptoms recorded for the particular disease
$n$ = total number of distinct symptoms in the dataset
$\alpha = 1$, known as Laplace smoothing

The value of P(Disease) can be calculated by using Laplace's Law of Succession, which is
given by:
$$P(\text{Disease}) = \frac{N(\text{Disease}) + 1}{N + 2}$$

Where,
$N(\text{Disease})$ = frequency of the disease in the dataset
$N$ = total number of disease records in the dataset
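
As a worked illustration of the two estimates above, the following Python sketch computes them from a small list of records. The function names, the `(disease, symptoms)` record layout and the three sample records are assumptions made for the example, not part of the project's code.

```python
from collections import Counter


def symptom_likelihood(records, disease, symptom, vocabulary, alpha=1.0):
    """Smoothed estimate (N_yi + alpha) / (N_y + alpha * n)."""
    counts = Counter()
    for record_disease, symptoms in records:
        if record_disease == disease:
            counts.update(symptoms)
    n_yi = counts[symptom]        # occurrences of this symptom with the disease
    n_y = sum(counts.values())    # all symptom occurrences for the disease
    n = len(vocabulary)           # number of distinct symptoms in the data set
    return (n_yi + alpha) / (n_y + alpha * n)


def disease_prior(records, disease):
    """Estimate (N(Disease) + 1) / (N + 2) from the record counts."""
    n_disease = sum(1 for record_disease, _ in records if record_disease == disease)
    return (n_disease + 1) / (len(records) + 2)


# Made-up records: each is (disease, list of observed symptoms).
records = [
    ("Common cold", ["cough", "fever"]),
    ("Malaria", ["fever", "nausea"]),
    ("Malaria", ["fever"]),
]
vocabulary = {"cough", "fever", "nausea"}

p_fever = symptom_likelihood(records, "Malaria", "fever", vocabulary)
p_cough = symptom_likelihood(records, "Malaria", "cough", vocabulary)
p_malaria = disease_prior(records, "Malaria")
```

Here fever occurs twice among the three Malaria symptom entries, giving (2 + 1) / (3 + 3) = 0.5, while cough never occurs with Malaria and still receives (0 + 1) / (3 + 3) ≈ 0.167 thanks to the smoothing term. The prior for Malaria is (2 + 1) / (3 + 2) = 0.6. Note that the "+2" in the denominator is the two-outcome form of the rule, so the priors add up to exactly 1 only when there are two diseases; this does not affect which disease ranks highest.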


3.2 System Design

3.2.1 Class Diagram

Figure 1 - Class Diagram

Figure 1 shows the classes used in Disease Predictor. There are three classes in total:
SymptomsReader: Reads the user input and creates the list of symptoms.
SymptomsAnalyzer: Displays the result according to the given symptoms.
CalculateValues: Calculates the probabilistic model of the diseases.


3.2.2 State diagram

Figure 2- State Diagram


Figure 2 shows the different states of the system. First, the user opens Disease
Predictor and selects the symptoms. When finished selecting, the user submits the
symptoms. Disease Predictor then analyzes the symptoms and displays the result.


3.2.3 Sequence diagram

Figure 3- Sequence Diagram

Figure 3 shows the sequence of interactions in Disease Predictor. Initially, the system
shows the symptoms to be selected. The user selects the symptoms and submits them to the
system. Disease Predictor then predicts the disease and displays the result.


CHAPTER 4 IMPLEMENTATION AND TESTING

4.1 Implementation

Disease Predictor predicts the disease from the symptoms that are provided to the system.
For disease prediction, the Naïve Bayes Classifier is implemented.

Figure 4- Workflow

As shown in the figure, the input data sets are classified using the Naïve Bayes
classifier. A sample of the input data sets is shown below.


Table 2- Sample Data Sets

Symptoms                                                   Disease
Runny nose, sore throat, cough, congestion, body aches,    Common cold
headache, sneezing, fever

Fever, profuse sweating, headache, nausea, vomiting,       Malaria
diarrhea, anemia, muscle pain, convulsions, coma,
bloody stools, shaking chills

Poor appetite, abdominal pain, headaches, generalized      Typhoid
aches and pains, fever, lethargy, intestinal bleeding
or perforation, diarrhea, constipation

Naïve Bayes classifier uses the following rule to classify the datasets:
$$\hat{Y} = \arg\max_{\text{Disease}} P(\text{Disease}) \prod_{i=1}^{n} P(\text{symptom}_i \mid \text{Disease})$$

The user gives input to the system in the form of symptoms, marking the symptoms he or
she is experiencing, for example:
1. Fever
2. Cough
3. Vomiting
The “Disease Predictor” system predicts the disease according to the input data sets and
calculates the probability of each disease.
The sample output is given as:

Table 3- Sample Output

Disease Name    Probability
Typhoid         0.5%
Malaria         0.3%
Flu             0.333%

4.1.1 Tools Used

1. HTML is used to display content in the browser.
2. CSS is used to properly align the HTML content.
3. Grails framework is used for developing the application.
4. Creately is used for constructing figures.

4.1.2 Description

The major classes in the application are:

SymptomsReader

This class runs first when the user requests a disease prediction.
Input: The user selects the symptoms from the list.
Output: The selected symptoms are put in a list.

SymptomsAnalyzer

Input: Takes the user input, i.e. the symptoms.
Output: Predicts the disease.

CalculateValues

This is where the actual mathematical computation takes place.
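
The division of work between the three classes can be sketched as below. The application itself is built with Grails, so this Python version is only an illustration: the class names come from the report, but every method name (`read`, `log_score`, `predict`), the record layout and the choice to rank by log scores are assumptions. The records are taken from the sample data sets (Table 2), with "headaches" normalized to "headache", and the result is not meant to reproduce the sample output table.

```python
import math
from collections import Counter


class SymptomsReader:
    """Turns the raw user selection into a clean list of symptoms."""

    def read(self, selected):
        symptoms = []
        for item in selected:
            name = item.strip().lower()
            if name and name not in symptoms:  # skip blanks and duplicates
                symptoms.append(name)
        return symptoms


class CalculateValues:
    """Estimates P(Disease) and P(symptom | Disease) from the data set."""

    def __init__(self, records, alpha=1.0):
        self.alpha = alpha
        self.total = len(records)
        self.disease_counts = Counter(disease for disease, _ in records)
        self.symptom_counts = {d: Counter() for d in self.disease_counts}
        for disease, symptoms in records:
            self.symptom_counts[disease].update(symptoms)
        self.vocabulary = {s for _, symptoms in records for s in symptoms}

    def log_score(self, disease, symptoms):
        """log of P(Disease) * prod_i P(symptom_i | Disease), with smoothing."""
        counts = self.symptom_counts[disease]
        n_y = sum(counts.values())
        n = len(self.vocabulary)
        score = math.log((self.disease_counts[disease] + 1) / (self.total + 2))
        for symptom in symptoms:
            score += math.log((counts[symptom] + self.alpha) / (n_y + self.alpha * n))
        return score


class SymptomsAnalyzer:
    """Ranks the diseases for a list of symptoms, most likely first."""

    def __init__(self, calculator):
        self.calculator = calculator

    def predict(self, symptoms):
        return sorted(self.calculator.disease_counts,
                      key=lambda d: self.calculator.log_score(d, symptoms),
                      reverse=True)


# Records from the sample data sets; "headaches" is normalized to "headache".
records = [
    ("Common cold", ["runny nose", "sore throat", "cough", "congestion",
                     "body aches", "headache", "sneezing", "fever"]),
    ("Malaria", ["fever", "profuse sweating", "headache", "nausea", "vomiting",
                 "diarrhea", "anemia", "muscle pain", "convulsions", "coma",
                 "bloody stools", "shaking chills"]),
    ("Typhoid", ["poor appetite", "abdominal pain", "headache",
                 "generalized aches and pains", "fever", "lethargy",
                 "intestinal bleeding or perforation", "diarrhea",
                 "constipation"]),
]

symptoms = SymptomsReader().read(["Fever", "Cough", "Vomiting"])
ranking = SymptomsAnalyzer(CalculateValues(records)).predict(symptoms)
```

With only these three records, the input fever, cough and vomiting ranks Common cold first, then Malaria, then Typhoid, because cough appears only under Common cold and vomiting only under Malaria.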


4.2 Testing

The test case designed for the project is discussed below:

Test Case- I: Submit the symptoms from the list

Precondition: The application is open.
Assumptions: The symptoms for the disease are available.
Test steps: 1. Select the checkboxes of the symptoms from the list.
2. Select submit.
Expected Result: The selected symptoms should be submitted and further analyzed to
calculate the probability of the disease.


CHAPTER 5 MAINTENANCE AND SUPPORT

5.1 Corrective Maintenance

If any bugs are left in the system, they will be fixed to keep the application running
smoothly. The accuracy of the system can be further improved with other algorithms if
needed.

5.2 Adaptive Maintenance

New features can be added to the application; for example, a history of the user’s
diseases can be kept in a log. The available list of symptoms can also be extended to
cover more diseases.


CHAPTER 6 CONCLUSION AND RECOMMENDATION

6.1 Conclusion

This project aims to predict the disease on the basis of the symptoms. The system takes
symptoms from the user as input and produces the predicted disease as output. An average
prediction accuracy of 55% is obtained. Disease Predictor was successfully implemented
using the Grails framework.

6.2 Recommendations
This project has not implemented the recommendation of medications to the user, so
medication recommendation can be added to the project. A history of each user’s diseases
can also be kept as a log.


APPENDIX

Landing Page of Disease Predictor

Output showing the probability of the disease


REFERENCES

Adam, S., & Parveen, A. (2012). Prediction System for Heart Disease Using Naive Bayes.
Al-Aidaroos, K., Bakar, A., & Othman, Z. (2012). Medical Data Classification with Naive
Bayes Approach. Information Technology Journal.
Davis, D. A., Chawla, N. V., Blumm, N., Christakis, N., & Barabasi, A.-L. (2008).
Predicting Individual Disease Risk Based on Medical History.
Nishara Banu, M. A., & Gomathy, B. (2013). Disease Predicting System Using Data Mining
Techniques.
Soni, J., Ansari, U., Sharma, D., & Soni, S. (2011). Predictive Data Mining for Medical
Diagnosis: An Overview of Heart Disease Prediction.
