
DWIT COLLEGE DEERWALK INSTITUTE OF TECHNOLOGY

Tribhuvan University

Institute of Science and Technology


DISEASE PREDICTOR

A PROJECT REPORT

Submitted to Department of Computer Science and Information Technology DWIT College

In partial fulfillment of the requirements for the Bachelor’s Degree in Computer Science and Information Technology

Submitted by

Anil Parajuli

August, 2016


SUPERVISOR’S RECOMMENDATION

I hereby recommend that this project prepared under my supervision by ANIL PARAJULI entitled “DISEASE PREDICTOR” in partial fulfillment of the requirements for the degree of B.Sc. in Computer Science and Information Technology be processed for the evaluation.

Ritu Raj Lamsal, Lecturer, Deerwalk Institute of Technology, DWIT College


LETTER OF APPROVAL

This is to certify that this project prepared by ANIL PARAJULI entitled “DISEASE PREDICTOR” in partial fulfillment of the requirements for the degree of B.Sc. in Computer Science and Information Technology has been well studied. In our opinion, it is satisfactory in scope and quality as a project for the required degree.

Ritu Raj Lamsal [Supervisor], Lecturer, DWIT College

Hitesh Karki, Chief Academic Officer, DWIT College

Jagdish Bhatta [External Examiner], IOST, Tribhuvan University

Sarbin Sayami [Internal Examiner], Assistant Professor, IOST, Tribhuvan University


ACKNOWLEDGEMENT

First of all, I would like to thank DWIT College for providing me with the opportunity and resources needed for this project. I am also thankful to my respected and esteemed guide, Mr. Ritu Raj Lamsal, who helped me complete this project.

Finally, I would like to express my sincere thanks to all my friends and others who helped me directly or indirectly during this project work.

Anil Parajuli TU Roll No.: 1789/069


STUDENT’S DECLARATION

I hereby declare that I am the only author of this work and that no sources other than those listed here have been used in this work.

Anil Parajuli

Date: August, 2016


ABSTRACT

The “Disease Prediction” system, based on predictive modeling, predicts a user’s disease from the symptoms the user provides as input. The system analyzes these symptoms and outputs the probability of each disease. Prediction is performed by a Naïve Bayes classifier, which calculates the probability of each disease given the symptoms. An average prediction accuracy of 60% is obtained.

Keywords: Predictive Modeling, Naïve Bayes Classifier


TABLE OF CONTENTS

LETTER OF APPROVAL
ACKNOWLEDGEMENT
STUDENT’S DECLARATION
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER 1 INTRODUCTION
1.1 Introduction
1.2 Problem Statement
1.3 Objective
1.3.1 General Objective
1.3.2 Specific Objective
1.4 Scope and Limitations
1.4.1 Scope
1.4.2 Limitations
1.5 Outline of Document
CHAPTER 2 REQUIREMENT ANALYSIS AND FEASIBILITY ANALYSIS
2.1 Literature Review
2.2 Requirement Analysis
2.2.1 Functional requirements
2.2.2 Non-functional requirements
2.3 Feasibility Analysis
2.3.1 Technical feasibility
2.3.2 Economic feasibility
2.3.3 Operational feasibility
CHAPTER 3 SYSTEM DESIGN
3.1 Methodology
3.1.1 Data collection
3.1.2 Algorithm implemented
3.2 System Design
3.2.1 Class Diagram
3.2.2 State diagram
3.2.3 Sequence diagram
CHAPTER 4 IMPLEMENTATION AND TESTING
4.1 Implementation
4.1.1 Tools Used
4.1.2 Description
4.2 Testing
CHAPTER 5 MAINTENANCE AND SUPPORT
5.1 Corrective Maintenance
5.2 Adaptive Maintenance
CHAPTER 6 CONCLUSION AND RECOMMENDATION
6.1 Conclusion
6.2 Recommendations
APPENDIX
REFERENCES

LIST OF FIGURES

Figure 1 - Class Diagram
Figure 2 - State Diagram
Figure 3 - Sequence Diagram
Figure 4 - Workflow

LIST OF TABLES

Table 1 - Predictive Accuracy of Naïve Bayes and Other Techniques
Table 2 - Sample Data Sets
Table 3 - Sample Output

LIST OF ABBREVIATIONS

CARE - Collaborative Assessment and Recommendation Engine
ICD - International Classification of Diseases
NB - Naïve Bayes
HTML - HyperText Markup Language
CSS - Cascading Style Sheets

Disease Predictor

CHAPTER 1 INTRODUCTION

1.1 Introduction

At present, a person suffering from a particular disease has to visit a doctor, which is both time-consuming and costly. If the person is out of reach of doctors and hospitals, the disease may go unidentified. An automated program that performs this process could save patients both time and money. Existing disease prediction systems, such as those for heart disease, use data mining techniques to analyze a patient’s risk level.

Disease Predictor is a web-based application that predicts a user’s disease from the symptoms the user provides. Its data sets were collected from various health-related websites. With Disease Predictor, the user can see the probability of each disease given the selected symptoms.

Internet use is growing every day, and people increasingly turn to the internet when a problem arises. Many people have easier access to the internet than to hospitals and doctors, and often have no immediate option when they fall ill. Because the internet is available 24 hours a day, this system can help such people.


1.2 Problem Statement

Many tools exist for disease prediction, but most of them focus on heart-related diseases and generate a risk level for the patient. There are generally no comparable tools for predicting general diseases. Disease Predictor addresses this gap by predicting general diseases from symptoms.

1.3 Objective

1.3.1 General Objective

- To implement a Naïve Bayes classifier that classifies the disease according to the user’s input.

1.3.2 Specific Objective

- To develop a web interface for disease prediction.

1.4 Scope and Limitations

1.4.1 Scope

This project aims to provide a web platform that predicts the occurrence of diseases on the basis of symptoms. The user can select symptoms and view the candidate diseases along with their probabilities.


1.4.2 Limitations

The limitations of this project are:

a. Disease Predictor does not recommend medications for the disease.

b. The user’s past history of disease is not considered.

1.5 Outline of Document

Preliminary Section: Title Page, Abstract, Table of Contents, List of Figures and Tables

Introduction Section: Background of Research, Statement of Problem, Objectives

Requirement Analysis and Feasibility Analysis: Literature Review, Requirement Analysis, Feasibility Analysis

System Design: Methodology, System Design, Implementation and Testing

Maintenance and Support: Maintenance, Support

Conclusion and Recommendation: Conclusion, Recommendation

CHAPTER 2 REQUIREMENT ANALYSIS AND FEASIBILITY ANALYSIS

2.1 Literature Review

K. M. Al-Aidaroos, A. A. Bakar, and Z. Othman conducted research to find the best data mining technique for medical diagnosis. The authors compared Naïve Bayes with five other classifiers: Logistic Regression (LR), KStar (K*), Decision Tree (DT), Neural Network (NN), and a simple rule-based algorithm (ZeroR). Fifteen real-world medical problems from the UCI machine learning repository (Asuncion and Newman, 2007) were selected to evaluate the performance of all the algorithms. In the experiment, NB outperformed the other algorithms on 8 of the 15 data sets, so it was concluded that Naïve Bayes achieves better predictive accuracy than the other techniques.


Table 1 - Predictive Accuracy (%) of Naïve Bayes and Other Techniques

Medical Problem            NB     LR     K*     DT     NN     ZeroR
Breast Cancer (Wisconsin)  97.3   92.98  95.72  94.57  95.57  65.52
Breast Cancer              72.7   67.77  73.73  74.28  66.95  70.3
Dermatology                97.43  96.89  94.51  94.1   96.45  30.6
Echocardiogram             95.77  94.59  89.38  96.41  93.64  67.86
Liver Disorders            54.89  68.72  66.82  65.84  68.73  57.98
Pima Diabetes              75.75  77.47  70.19  74.49  74.75  65.11
Haberman                   75.36  74.41  73.73  72.16  70.32  73.53
Heart-c                    83.34  83.7   75.18  77.13  80.99  54.45
Heart-statlog              84.85  84.04  73.89  75.59  81.78  55.56
Heart-b                    83.95  84.23  77.83  80.22  80.07  63.95
Hepatitis                  83.81  83.89  80.17  79.22  80.78  79.38
Lung Cancer                53.25  47.25  41.67  40.83  44.08  40
Lymphography               84.97  78.45  83.18  78.21  81.81  54.76
Postoperative Patient      68.11  61.11  61.67  69.78  58.54  71.11
Primary Tumor              49.71  41.62  38.02  41.39  40.38  24.78
Wins                       8/15   5/15   0/15   2/15   1/15   1/15

(Al-Aidaroos, Bakar, & Othman, 2012)

Darcy A. Davis, Nitesh V. Chawla, Nicholas Blumm, Nicholas Christakis, and Albert-Laszlo Barabasi found that global treatment of chronic disease is neither time- nor cost-efficient, so they set out to predict future disease risk. To this end, they used CARE, which relies only on a patient’s medical history, expressed as ICD-9-CM codes, to predict future disease risks. CARE combines collaborative filtering methods with clustering to predict each patient’s greatest disease risks based on their own medical history and that of similar patients. The authors also describe an iterative version, ICARE, which incorporates ensemble concepts for improved performance. These systems require no specialized information and provide predictions for medical conditions of all kinds in a single run. ICARE’s extensive coverage of future diseases enables more accurate early warnings for thousands of diseases, some even years in advance. Applied to its full potential, the CARE framework can be used to explore broader disease histories, suggest previously unconsidered concerns, and facilitate discussion about early testing and prevention. (Davis, Chawla, Blumm, Christakis, & Barabasi, 2008)

Jyoti Soni, Ujma Ansari, Dipesh Sharma, and Sunita Soni surveyed current techniques of knowledge discovery in databases using data mining in today’s medical research, particularly in heart disease prediction. A number of experiments were conducted to compare the performance of predictive data mining techniques on the same data set. The outcome reveals that Decision Tree performs best, that Bayesian classification sometimes achieves accuracy similar to Decision Tree, and that other predictive methods such as KNN, Neural Networks, and classification based on clustering do not perform as well. (Soni, Ansari, Sharma, & Soni, 2011)

Shadab Adam Pattekari and Asma Parveen used the Naïve Bayes algorithm to predict heart disease: the user provides data, which is compared with a trained set of values. In their system, patients provide their basic information, which is compared with the trained data to predict heart disease. (Adam & Parveen, 2012)

M. A. Nishara Banu and B. Gomathy used medical data mining techniques such as association rule mining, classification, and clustering to analyze different kinds of heart-related problems. A decision tree is built to illustrate every possible outcome of a decision, and different rules are derived to obtain the best outcome. In this research, age, sex, smoking, overweight, alcohol intake, blood sugar, heart rate, and blood pressure are the parameters used for making decisions. The risk levels for the different parameters are stored with IDs ranging from 1 to 8: an ID of 1 corresponds to the normal level, and IDs greater than 1 correspond to higher risk levels. The K-means clustering technique is used to study patterns in the data set. The algorithm clusters the data into k groups: each point in the data set is assigned to the closest cluster, and each cluster center is then recomputed as the average of the points in that cluster.

(Nishara Banu & Gomathy, 2013)
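The two K-means steps described above (assign each point to the closest cluster center, then recompute each center as the average of its points) can be sketched as follows. This is a generic illustration of the standard (Lloyd’s) algorithm on 2-D points, not Nishara Banu and Gomathy’s implementation; the function name `kmeans`, the caller-supplied initial centers, and the iteration limit are assumptions.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Lloyd's k-means on 2-D points; `centers` holds the k initial centers."""
    centers = [tuple(c) for c in centers]
    assignment = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the closest center.
        assignment = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center becomes the average of its assigned points.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(old)  # an empty cluster keeps its previous center
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, assignment


centers, labels = kmeans([(1, 1), (1.5, 2), (8, 8), (9, 8.5)], [(0, 0), (10, 10)])
print(centers, labels)
```

The loop stops as soon as an update step leaves every center unchanged; on the four sample points it settles after two passes into one cluster near the origin and one near (8.5, 8.25).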


2.2 Requirement Analysis

2.2.1 Functional requirements

a. Predict disease with the given symptoms.

b. Compare the given symptoms with the input data sets.

2.2.2 Non-functional requirements

a. Display a list of symptoms from which the user can select.

b. Naïve Bayes Classifier is used to classify the data sets.

2.3 Feasibility Analysis

2.3.1 Technical feasibility

The project is technically feasible, as it can be built using existing, readily available technologies: it is a web-based application that uses the Grails framework.

2.3.2 Economic feasibility

The project is economically feasible, as its only cost is hosting. As the number of data samples increases, prediction consumes more time and processing power, and a better processor might then be needed.


2.3.3 Operational feasibility

The project is operationally feasible, as it requires the user to have only basic knowledge of computers and the Internet. Disease Predictor is based on a client-server architecture, where the clients are the users and the server is the machine on which the data sets are stored.


CHAPTER 3 SYSTEM DESIGN

3.1 Methodology

Disease prediction has already been implemented using different techniques such as neural networks, decision trees, and the Naïve Bayes algorithm, mostly for heart-related diseases. In the comparison by Al-Aidaroos, Bakar, and Othman reviewed in Section 2.1, Naïve Bayes was more accurate than the other techniques on 8 of the 15 data sets tested. Disease Predictor therefore also uses Naïve Bayes to predict different diseases.

3.1.1 Data collection

Data were collected from the internet. Only real symptoms of each disease were collected, i.e. no dummy values were entered. The symptoms were gathered from various health-related websites.

3.1.2 Algorithm implemented

The algorithm implemented in this project is Naïve Bayes Classifier.

The Naïve Bayes classifier is based on Bayes’ theorem:

Equation 1:

$$P(Y \mid X_1, \dots, X_n) = \frac{P(Y)\, P(X_1, \dots, X_n \mid Y)}{P(X_1, \dots, X_n)}$$

where $Y$ is the class variable and $X_1, X_2, \dots, X_n$ are the dependent features.


From Equation 1 we get Equation 2:

$$P(\text{Disease} \mid \text{symptom}_1, \dots, \text{symptom}_n) = \frac{P(\text{Disease})\, P(\text{symptom}_1, \dots, \text{symptom}_n \mid \text{Disease})}{P(\text{symptom}_1, \dots, \text{symptom}_n)}$$

Using the naive independence assumption, Equation 3:

$$P(\text{symptom}_1, \dots, \text{symptom}_n \mid \text{Disease}) = \prod_{i=1}^{n} P(\text{symptom}_i \mid \text{Disease})$$

So the relation becomes Equation 4:

$$P(\text{Disease} \mid \text{symptom}_1, \dots, \text{symptom}_n) = \frac{P(\text{Disease}) \prod_{i=1}^{n} P(\text{symptom}_i \mid \text{Disease})}{P(\text{symptom}_1, \dots, \text{symptom}_n)}$$

Since $P(\text{symptom}_1, \dots, \text{symptom}_n)$ is constant for a given input, we can use the following classification rule:

$$P(\text{Disease} \mid \text{symptom}_1, \dots, \text{symptom}_n) \propto P(\text{Disease}) \prod_{i=1}^{n} P(\text{symptom}_i \mid \text{Disease})$$

$$\hat{Y} = \arg\max_{\text{Disease}} P(\text{Disease}) \prod_{i=1}^{n} P(\text{symptom}_i \mid \text{Disease})$$
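The classification rule above can be illustrated with a small sketch that evaluates it as a sum of logarithms rather than a product of many small probabilities. The function name `predict`, the dictionary layout, and the disease names and probabilities in the usage lines are hypothetical; the likelihoods are assumed to be already smoothed so that none of them is zero.

```python
import math


def predict(priors, likelihoods, symptoms):
    """Return the disease that maximises P(Disease) * prod_i P(symptom_i | Disease).

    priors:      {disease: P(Disease)}
    likelihoods: {disease: {symptom: P(symptom | Disease)}}, all values > 0
    symptoms:    the symptoms selected by the user
    """
    def log_score(disease):
        # log P(Disease) + sum_i log P(symptom_i | Disease). Because log is
        # monotonic, the arg max of the log score equals the arg max of the
        # product, and summing logs avoids floating-point underflow.
        return math.log(priors[disease]) + sum(
            math.log(likelihoods[disease][s]) for s in symptoms
        )

    return max(priors, key=log_score)


# Hypothetical numbers, for illustration only.
priors = {"A": 0.5, "B": 0.5}
likelihoods = {"A": {"fever": 0.4, "cough": 0.1}, "B": {"fever": 0.2, "cough": 0.3}}
print(predict(priors, likelihoods, ["fever", "cough"]))
```

Here disease B wins (0.5 × 0.2 × 0.3 = 0.03 against 0.5 × 0.4 × 0.1 = 0.02), even though A has the higher likelihood for fever alone; the denominator of Equation 4 is never computed because it does not change which disease attains the maximum.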

The value of $P(\text{symptom}_i \mid \text{Disease})$ can be calculated using multinomial Naïve Bayes, which is given by:

$$P(\text{symptom}_i \mid \text{Disease}) = \frac{N_{yi} + \alpha}{N_y + \alpha n}$$

where:

$N_{yi}$ = number of times symptom $i$ occurs in the records of the disease

$N_y$ = total number of symptoms of the particular disease

$n$ = total number of distinct symptoms in the data set

$\alpha = 1$, known as Laplace smoothing
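The smoothed estimate can be sketched as below, assuming the training data is a list of `(disease, [symptoms])` records like the sample data sets. The function name `symptom_likelihood` and the record format are illustrative assumptions; $N_{yi}$ is counted as the number of occurrences of the symptom in that disease’s records, which is the usual multinomial Naïve Bayes definition.

```python
from collections import Counter


def symptom_likelihood(records, symptom, disease, alpha=1.0):
    """Multinomial Naive Bayes estimate (N_yi + alpha) / (N_y + alpha * n).

    records: list of (disease, [symptoms]) pairs forming the training data.
    """
    n = len({s for _, symptoms in records for s in symptoms})  # distinct symptoms
    counts = Counter(s for d, symptoms in records if d == disease for s in symptoms)
    n_yi = counts[symptom]      # occurrences of this symptom for the disease
    n_y = sum(counts.values())  # all symptom occurrences for the disease
    return (n_yi + alpha) / (n_y + alpha * n)


records = [("cold", ["cough", "fever"]), ("malaria", ["fever", "vomiting"])]
print(symptom_likelihood(records, "cough", "cold"))     # (1 + 1) / (2 + 3)
print(symptom_likelihood(records, "vomiting", "cold"))  # (0 + 1) / (2 + 3)
```

With α = 1, a symptom never seen for a disease (vomiting for cold here) still receives a non-zero probability, so one unmatched symptom cannot zero out the whole product; the estimates for one disease sum to 1 over all n symptoms.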

The value of $P(\text{Disease})$ can be calculated using Laplace’s law of succession, which is given by:

$$P(\text{Disease}) = \frac{N(\text{Disease}) + 1}{N + 2}$$

where:

$N(\text{Disease})$ = frequency of the disease in the data set

$N$ = total number of disease records in the data set
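A minimal sketch of this prior, under the same assumption that the data set is a list of `(disease, [symptoms])` records; the function name `disease_prior` is hypothetical.

```python
def disease_prior(records, disease):
    """Laplace's rule of succession: (N(Disease) + 1) / (N + 2).

    records: list of (disease, [symptoms]) pairs; N is the number of records.
    """
    n_disease = sum(1 for d, _ in records if d == disease)  # N(Disease)
    return (n_disease + 1) / (len(records) + 2)


records = [("cold", []), ("cold", []), ("malaria", [])]
for name in ["cold", "malaria", "typhoid"]:
    print(name, disease_prior(records, name))
```

The +1 gives a disease that never appears in the data (typhoid here) a small non-zero prior. With more than two diseases these priors do not sum to 1, but because the denominator N + 2 is the same for every disease, the ranking produced by the arg max rule is unaffected.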


3.2 System Design

3.2.1 Class Diagram


Figure 1 - Class Diagram

Figure 1 shows the classes used in Disease Predictor. Three classes are used in total:

SymptomsReader: reads the user input and creates the list of symptoms.

SymptomsAnalyzer: displays the result corresponding to the given symptoms.

CalculateValues: calculates the probabilistic model of the diseases.


3.2.2 State diagram


Figure 2- State Diagram

Figure 2 shows the different states of the system. First, the user opens Disease Predictor and selects the symptoms. When finished, the user submits the symptoms, and Disease Predictor analyzes them and displays the result.


3.2.3 Sequence diagram


Figure 3- Sequence Diagram

Figure 3 shows the sequence of interactions in Disease Predictor. Initially, the system shows the symptoms to be selected. The user selects symptoms and submits them to the system, and Disease Predictor predicts the disease and displays the result.


CHAPTER 4 IMPLEMENTATION AND TESTING

4.1 Implementation

Disease Predictor predicts a disease from the symptoms provided to the system. To do so, it implements a Naïve Bayes classifier.


Figure 4- Workflow

As shown in Figure 4, the input data sets are classified using the Naïve Bayes classifier. A sample of the input data sets is shown below.


Table 2 - Sample Data Sets

Disease       Symptoms
Common cold   Runny nose, sore throat, cough, congestion, body aches, headache, sneezing, fever
Malaria       Fever, profuse sweating, headache, nausea, vomiting, diarrhea, anemia, muscle pain, convulsions, coma, bloody stools, shaking chill
Typhoid       Poor appetite, abdominal pain, headaches, generalized aches and pains, fever, lethargy, intestinal bleeding or perforation, diarrhea, constipation

The Naïve Bayes classifier uses the following rule to classify the data sets:

$$\hat{Y} = \arg\max_{\text{Disease}} P(\text{Disease}) \prod_{i=1}^{n} P(\text{symptom}_i \mid \text{Disease})$$

The user gives input to the system in the form of symptoms, marking the symptoms that are making the user feel unwell. For example, the user might select:

1. Fever

2. Cough

3. Vomiting

The “Disease Predictor” system predicts the disease from the input data sets and calculates the probability of each disease.

The sample output is given as:

Table 3 - Sample Output

Disease Name   Probability
Typhoid        0.5%
Malaria        0.3%
Flu            0.333%

4.1.1 Tools Used


1. HTML is used to display content in the browser.

2. CSS is used to properly align the HTML content.

3. Grails framework is used for developing the application.

4. Creately is used for constructing figures.

4.1.2 Description

The major classes in the application are:

SymptomsReader

This class runs first when the user requests a disease prediction.

Input: The user selects symptoms from the list.

Output: The selected symptoms are put in a list.

SymptomsAnalyzer

Input: The symptoms selected by the user.

Output: The predicted disease.

CalculateValues

Here the actual mathematical computation takes place.
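The three classes described above can be sketched as follows. This is a hypothetical Python illustration of their responsibilities, not the project’s Grails code: the method names (`read`, `log_score`, `posterior`, `analyze`), the `(disease, [symptoms])` record format, the lower-casing of symptoms, and the final normalization of scores so that they sum to 1 are all assumptions. `CalculateValues` combines the α-smoothed likelihood and the rule-of-succession prior from Section 3.1.2 and works with log scores to avoid underflow. It runs only on the three sample rows of Table 2, so its probabilities are not meant to reproduce Table 3.

```python
import math
from collections import Counter

# Sample rows from Table 2, lower-cased: (disease, [symptoms]).
SAMPLE = [
    ("Common cold", ["runny nose", "sore throat", "cough", "congestion",
                     "body aches", "headache", "sneezing", "fever"]),
    ("Malaria", ["fever", "profuse sweating", "headache", "nausea", "vomiting",
                 "diarrhea", "anemia", "muscle pain", "convulsions", "coma",
                 "bloody stools", "shaking chill"]),
    ("Typhoid", ["poor appetite", "abdominal pain", "headaches",
                 "generalized aches and pains", "fever", "lethargy",
                 "intestinal bleeding or perforation", "diarrhea", "constipation"]),
]


class SymptomsReader:
    """Turns the user's checkbox selections into a normalized symptom list."""

    def read(self, selected):
        return [s.strip().lower() for s in selected if s.strip()]


class CalculateValues:
    """Multinomial Naive Bayes: alpha-smoothed likelihoods, rule-of-succession prior."""

    def __init__(self, records, alpha=1.0):
        self.records = records
        self.alpha = alpha
        self.n = len({s for _, syms in records for s in syms})  # distinct symptoms
        self.diseases = sorted({d for d, _ in records})

    def log_score(self, disease, symptoms):
        counts = Counter(s for d, syms in self.records if d == disease for s in syms)
        n_y = sum(counts.values())  # total symptom occurrences for this disease
        n_disease = sum(1 for d, _ in self.records if d == disease)
        prior = (n_disease + 1) / (len(self.records) + 2)
        return math.log(prior) + sum(
            math.log((counts[s] + self.alpha) / (n_y + self.alpha * self.n))
            for s in symptoms
        )

    def posterior(self, symptoms):
        # Normalize the scores so they sum to 1; subtracting the largest log
        # score before exp() avoids underflow when many symptoms are selected.
        logs = {d: self.log_score(d, symptoms) for d in self.diseases}
        top = max(logs.values())
        weights = {d: math.exp(v - top) for d, v in logs.items()}
        total = sum(weights.values())
        return {d: w / total for d, w in weights.items()}


class SymptomsAnalyzer:
    """Ranks diseases by the probabilities computed by CalculateValues."""

    def __init__(self, calculator):
        self.calculator = calculator

    def analyze(self, symptoms):
        probabilities = self.calculator.posterior(symptoms)
        return sorted(probabilities.items(), key=lambda item: item[1], reverse=True)


symptoms = SymptomsReader().read(["Fever", "Cough", "Vomiting"])
ranking = SymptomsAnalyzer(CalculateValues(SAMPLE)).analyze(symptoms)
for disease, probability in ranking:
    print(f"{disease}: {probability:.3f}")
```

With these three rows and the input Fever, Cough, Vomiting, the sketch ranks Common cold first, then Malaria, then Typhoid. Common cold and Malaria each match two of the three symptoms, but Common cold lists fewer symptoms in total, so each matched symptom receives a larger smoothed likelihood; Typhoid matches only fever.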


4.2 Testing

The test case designed for the project is discussed below:

Test Case- I: Submit the symptoms from the list

Precondition: The application is open.

Assumptions: The symptoms for the disease are available

Test steps:

1. Select the checkboxes of the symptoms from the list.

2. Select Submit.

Expected Result: The symptoms selected should be submitted and further analyzed to calculate the probability of the disease.


CHAPTER 5 MAINTENANCE AND SUPPORT

5.1 Corrective Maintenance

Any bugs or issues remaining in the system will be fixed to keep the application running smoothly. If needed, the accuracy of the system can be improved further with other algorithms.

5.2 Adaptive Maintenance

New features can be added to the application; for example, a user’s disease history can be kept in a log. The list of available symptoms can also be extended to cover more diseases.


CHAPTER 6 CONCLUSION AND RECOMMENDATION

6.1 Conclusion

This project aims to predict disease on the basis of symptoms. The system takes symptoms from the user as input and produces the predicted disease as output. An average prediction accuracy of 55% is obtained. Disease Predictor was successfully implemented using the Grails framework.

6.2 Recommendations

This project does not recommend medications to the user, so medication recommendation could be implemented in the future. A user’s disease history could also be kept as a log to support such recommendations.


APPENDIX

Landing Page of Disease Predictor


Output showing the probability of the disease


REFERENCES

Adam, S., & Parveen, A. (2012). Prediction System for Heart Disease Using Naive Bayes.

Al-Aidaroos, K. M., Bakar, A. A., & Othman, Z. (2012). Medical Data Classification with Naive Bayes Approach. Information Technology Journal.

Davis, D. A., Chawla, N. V., Blumm, N., Christakis, N., & Barabasi, A. L. (2008). Predicting Individual Disease Risk Based on Medical History.

Nishara Banu, M. A., & Gomathy, B. (2013). Disease Predicting System Using Data Mining Techniques.

Soni, J., Ansari, U., Sharma, D., & Soni, S. (2011). Predictive Data Mining for Medical Diagnosis: An Overview of Heart Disease Prediction.