Minor Project Report

DIABETES PREDICTOR
A
Minor Project Report
Submitted in partial fulfillment of the requirements for the award of
Bachelor of Engineering
In
Computer Science and Engineering.
Submitted to
Samrat Ashok Technological Institute, Vidisha

(An Autonomous Institute Affiliated to RGPV, Bhopal)
Submitted by
ANAMIKA TRIPATHI (0108CS161009)

RITU LAKHANI (0108CS161045)
Under the supervision of
Prof. SANJEET KUMAR

Assistant Professor
Department of Computer Science & Engineering

Samrat Ashok Technological Institute
Vidisha (M.P.)-464001
MAY 2019
Samrat Ashok Technological Institute
Vidisha (M.P.)
Department of Computer Science and Engineering

CERTIFICATE
This is to certify that the Minor Project entitled “DIABETES PREDICTOR”, submitted by
ANAMIKA TRIPATHI (0108CS161009) and RITU LAKHANI (0108CS161045) in partial
fulfillment of the requirements for the award of the degree of Bachelor of Engineering in the
specialization of Computer Science and Engineering from Samrat Ashok Technological
Institute, Vidisha (M.P.), is a record of work carried out by them under my supervision and
guidance. The matter presented in this report has not been presented by them elsewhere for any
other degree or diploma.
Prof. SANJEET KUMAR
(Project Guide)
Assistant Professor
Computer Science & Engineering
Samrat Ashok Technological Institute, Vidisha (M.P.)

Prof. RAM RATAN AHIRWAR
Assistant Professor
Computer Science & Engineering
Samrat Ashok Technological Institute, Vidisha (M.P.)

Prof. SHAILENDRA KUMAR SHRIVASTAVA
(H.O.D.)
Computer Science & Engineering
Samrat Ashok Technological Institute, Vidisha (M.P.)

ACKNOWLEDGEMENT
The present project report is submitted to Samrat Ashok Technological Institute, Vidisha
(M.P.).
We, the students of Samrat Ashok Technological Institute, convey our sincere thanks to the
Director, Dr. J. S. Chauhan, for providing all the facilities required to make the project a
success.
We take great pleasure in thanking Dr. Shailendra Kumar Shrivastava (H.O.D., Computer
Science & Engineering) for all the moral and educational support he gave us throughout the
year.
We are grateful to Prof. Ram Ratan Ahirwar, Department of Computer Science
and Engineering, for providing guidance for this project work. Under his supervision and
inspiring guidance this project was embarked upon, planned and executed. His sincere
suggestions helped us greatly in bringing this work to its present shape.
Above all, we thank S.A.T.I. for providing an opportunity to show our talent in the
field of Information Technology. Last but not least, we take this opportunity to express
our deepest gratitude to our Professor and Head, Dr. S. K. Shrivastava, for helping us
complete this report successfully.
S.No NAME Roll No.
1. Anamika Tripathi 0108cs161009
2. Ritu Lakhani 0108cs161045

Table of Contents:

1. Title Page
2. Certificate
3. Acknowledgement
4. Certificate of the guide
5. Abstract
6. Introduction
• Setting the Research Goal
• Retrieving Data
• Data Preprocessing and Cleansing
• Data Exploration and Visualization
• Data Modeling
• Model Deployment
7. Logistic Regression
8. Requirements
9. Program
10. References

Abstract:

Using machine learning, a diabetes prediction model has been trained and deployed
with a Tkinter GUI.
The model is trained using Logistic Regression on the dataset diabetes.csv, which
records the values of various parameters together with the diabetes outcome.
Input values for the parameters Blood Pressure, Age, BMI, Glucose, Insulin, Diabetes
Pedigree Function, Skin Thickness and Pregnancies are taken from the user. These
values are stored in a list, which is passed to the regression model as the
observation to classify. The regression model itself is trained on the dataset provided.
The trained model is then tested and used for prediction.

Introduction:

Diabetes Prediction is a Machine Learning project built using Logistic Regression to
predict whether a person has diabetes from certain parameters, deployed in the form of
a GUI built with Tkinter.
The following steps have been followed in building and deploying the project:
1. Setting the research goal
2. Retrieving Data
3. Data Preprocessing & Cleansing
4. Data Exploration & Visualization
5. Data Modeling
6. Model Deployment
Setting the research goal:

The aim of this work is to become familiar with the Data Science process described
above by building and deploying a Machine Learning model that can predict diabetes
from a person's features using Logistic Regression.

Retrieving Data:
Retrieving data is the important step that follows setting the research goal. For
this purpose we used the diabetes.csv data from Kaggle.

Data Preprocessing and cleansing:

The first step in data preprocessing is importing the libraries needed to preprocess
the data. NumPy and Pandas were used for numerical computation and data handling, and
Matplotlib and Seaborn for plotting.

The dataset was loaded into a Pandas DataFrame.
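
The loading step can be sketched as follows. This is a minimal example, assuming the Kaggle file is saved as diabetes.csv next to the notebook. The helper name load_diabetes_data and the two inline sample rows are made up for illustration, so that the snippet runs without the real file.

```python
import io

import pandas as pd

# Column names of diabetes.csv as used elsewhere in this report.
COLUMNS = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
           "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]


def load_diabetes_data(source):
    """Read the dataset from a path or file-like object and check its columns."""
    df = pd.read_csv(source)
    missing = [name for name in COLUMNS if name not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


# Illustrative stand-in for the real file (values are invented, not from Kaggle).
sample_csv = io.StringIO(
    ",".join(COLUMNS) + "\n"
    "2,120,70,25,80,28.5,0.45,33,0\n"
    "5,165,82,30,130,34.1,0.62,51,1\n"
)
df = load_diabetes_data(sample_csv)
print(df.shape)
print(df.head())
```

With the real file in place, the call would simply be load_diabetes_data("diabetes.csv").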

Correlation heatmap among the various parameters on which diabetes is judged:

Data correlation is the way in which one set of data corresponds to another.
In ML, correlation indicates how strongly each feature is related to the output.
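
To make the idea concrete, the sketch below computes the Pearson correlation coefficient between one feature and the output in plain Python. The function name pearson and the toy values are assumptions for illustration, not values from diabetes.csv. A heatmap like the one described above would show this same quantity for every pair of columns.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented toy values, not taken from diabetes.csv.
glucose = [85, 90, 110, 140, 150, 180]
outcome = [0, 0, 0, 1, 1, 1]

r = pearson(glucose, outcome)
print(f"correlation(glucose, outcome) = {r:.3f}")
```

A value close to +1 or -1 indicates a strong linear relationship, and a value near 0 indicates a weak one.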

Data Exploration and visualization:
After the data extraction and data preprocessing steps, the data set is visualized to
gain more insight into how the data is distributed and how the individual features
relate to diabetes.
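
Before plotting, the same kind of question can be answered with a small table. The sketch below groups people into age bands and reports how many fall into each band and what share of them have a positive outcome. The bin edges, the labels and the toy rows are assumptions chosen for illustration.

```python
import pandas as pd

# Invented toy rows standing in for diabetes.csv.
df = pd.DataFrame({
    "Age":     [21, 24, 29, 33, 38, 45, 52, 61],
    "Outcome": [0,  0,  1,  0,  1,  1,  1,  0],
})

# Hypothetical bin edges; with right=False the first band means 20 <= age < 30.
bins = [20, 30, 40, 50, 60, 90]
labels = ["20-29", "30-39", "40-49", "50-59", "60+"]
df["AgeGroup"] = pd.cut(df["Age"], bins=bins, labels=labels, right=False)

# Number of people and share of positive outcomes in each age group.
summary = df.groupby("AgeGroup", observed=False)["Outcome"].agg(["count", "mean"])
print(summary)
```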
Age groups present in the dataset:

Relation between Age and Body Mass Index:

How pregnancy affects diabetes:

How blood pressure affects diabetes:

Data Modeling:
Logistic Regression is imported from the library sklearn.linear_model. A classifier
object is created and fitted to the training set.
Now that the model has learned from the training set, it is time to predict some
observations.
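
The fit-then-predict cycle can be sketched as below. To keep the example self-contained it uses synthetic two-feature data generated with NumPy instead of diabetes.csv, so the numbers it prints say nothing about the real model. The 80/20 split and random_state=101 follow the program listed later in this report.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic two-feature data (invented): the label depends mostly on feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=101)

# Create the classifier object and fit it to the training set.
classifier = LogisticRegression()
classifier.fit(X_train, y_train)

# Predict the held-out observations and compare with the true labels.
predictions = classifier.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print("accuracy:", accuracy)
print(confusion_matrix(y_test, predictions))
```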
Model-
Output = 0 or 1
Hypothesis => Z = WX + B
hΘ(x) = sigmoid(Z) = 1 / (1 + e^(-Z))

As Z tends to positive infinity, the predicted Y approaches 1; as Z tends to negative
infinity, the predicted Y approaches 0.
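
The hypothesis can be written out in a few lines of Python. This is a sketch only: the weights, bias and inputs are invented, and the 0.5 cut-off used to turn the probability into a 0/1 output is an assumed threshold.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def hypothesis(weights, bias, features):
    """h(x) = sigmoid(W.x + B) for one observation."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Invented weights and inputs, for illustration only.
p = hypothesis(weights=[0.8, -0.4], bias=0.1, features=[2.0, 1.0])
label = 1 if p >= 0.5 else 0  # assumed decision threshold of 0.5
print(p, label)
```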

Model Deployment with Tkinter:

Python offers multiple options for developing a GUI (Graphical User Interface). Of
these, tkinter is the most commonly used. It is the standard Python
interface to the Tk GUI toolkit and ships with Python, which makes it a fast and
easy way to create GUI applications.

The Prediction button, when clicked, opens a new window that takes the values of the
parameters as input. These inputs are saved in a list, which is used as the data for
predicting diabetes.

The label on the screen indicates [0] or [1].

[0] - Indicates that the model has not predicted diabetes for the person.
[1] - Indicates that the model has predicted diabetes for the person.
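
Text typed into the entry fields arrives as strings, so it has to be converted to numbers and arranged in the column order the model was trained on. The sketch below shows one way to do that and to turn the 0/1 label into a message. The helper names parse_inputs and describe, the message wording and the example values are all hypothetical.

```python
FEATURES = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
            "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]


def parse_inputs(raw):
    """Convert {feature name: text from an entry field} into an ordered list of floats."""
    values = []
    for name in FEATURES:
        text = raw.get(name, "").strip()
        try:
            values.append(float(text))
        except ValueError:
            raise ValueError(f"{name}: expected a number, got {text!r}") from None
    return values


def describe(prediction):
    """Turn the model's 0/1 label into a readable message."""
    return "Diabetes predicted" if int(prediction) == 1 else "Diabetes not predicted"


# Invented example values.
raw = {"Pregnancies": "2", "Glucose": "120", "BloodPressure": "70",
       "SkinThickness": "25", "Insulin": "80", "BMI": "28.5",
       "DiabetesPedigreeFunction": "0.45", "Age": "33"}
row = parse_inputs(raw)
print(row)
print(describe(0), "/", describe(1))
```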

Logistic Regression:

Logistic Regression is a Machine Learning classification algorithm that is used to
predict the probability of a categorical dependent variable. In logistic regression, the
dependent variable is a binary variable that contains data coded as 1 (yes, success,
etc.) or 0 (no, failure, etc.). In other words, the logistic regression model predicts
P(Y=1) as a function of X.
Logistic Regression is one of the most popular ways to fit models for categorical
data, especially for binary response data in Data Modeling. It is the most important
(and probably most used) member of a class of models called generalized linear
models. Unlike linear regression, logistic regression can directly predict probabilities
(values that are restricted to the (0,1) interval); furthermore, those probabilities are
well-calibrated when compared to the probabilities predicted by some other
classifiers, such as Naive Bayes. Logistic regression preserves the marginal
probabilities of the training data. The coefficients of the model also provide some
hint of the relative importance of each input variable.
Logistic Regression is used when the dependent variable (target) is categorical.
It is generally used where the dependent variable is binary or
dichotomous, meaning it can take only two possible values
such as “Yes or No”, “Default or No Default”, “Living or Dead”, or “Responder or Non
Responder”. Independent factors or variables can be categorical or
numerical.
Logistic Regression Assumptions:
· Binary logistic regression requires the dependent variable to be binary.
· For a binary regression, the factor level 1 of the dependent variable should
represent the desired outcome.
· Only the meaningful variables should be included.
· The independent variables should be independent of each other. That is, the model
should have little or no multi-collinearity.
· The independent variables are linearly related to the log odds.
· Logistic regression requires quite large sample sizes.
Logistic Regression Equation:

The underlying algorithm, Maximum Likelihood Estimation (MLE), determines the
regression coefficients under which the model assigns the highest probability to the
observed values of the binary dependent variable. The algorithm stops when the convergence criterion is
met or the maximum number of iterations is reached. Since the probability of any
event lies between 0 and 1 (or 0% to 100%), a plot of the probability of the
dependent variable against an independent factor shows an ‘S’-shaped curve.
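
The estimation loop described above can be illustrated with plain gradient ascent on the log-likelihood. This is a simplified stand-in, not the solver scikit-learn uses internally. The function name fit_logistic, the learning rate, the tolerance and the toy data are assumptions.

```python
import numpy as np


def fit_logistic(X, y, learning_rate=0.1, max_iter=5000, tol=1e-6):
    """Estimate logistic regression coefficients by gradient ascent on the log-likelihood."""
    X = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for iteration in range(1, max_iter + 1):
        p = 1.0 / (1.0 + np.exp(-X @ beta))     # predicted P(Y=1)
        gradient = X.T @ (y - p) / len(y)       # gradient of the mean log-likelihood
        beta += learning_rate * gradient
        if np.max(np.abs(gradient)) < tol:      # convergence criterion
            break
    return beta, iteration


# Invented one-feature data with overlapping classes, so the estimate stays finite.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([0, 0, 1, 0, 1, 0, 1, 1])
beta, n_iter = fit_logistic(x.reshape(-1, 1), y)
print("intercept, slope:", beta, "iterations:", n_iter)
```

The loop ends either when the largest gradient component falls below tol or when max_iter is reached, mirroring the two stopping conditions named in the text.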
The logit transformation is defined as follows:
Logit(p) = log(p / (1 - p)) = log(probability of event happening / probability of event not
happening) = log(odds)
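
A short sketch of the logit and its inverse follows, using only the Python standard library. The function names and the sample probabilities are chosen for illustration.

```python
import math


def logit(p):
    """Log-odds of a probability p, defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inverse_logit(log_odds):
    """Map log-odds back to a probability (the sigmoid function)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


for p in (0.1, 0.5, 0.8):
    odds = p / (1.0 - p)
    print(f"p={p:.1f}  odds={odds:.2f}  logit={logit(p):+.3f}")
```

A probability of 0.5 corresponds to odds of 1 and a logit of 0. Probabilities above 0.5 give positive log-odds and those below give negative log-odds.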
Logistic Regression is part of a larger class of algorithms known as the Generalized Linear
Model (GLM). The fundamental equation of the generalized linear model is:
g(E(y)) = α + βx1 + γx2
Here, g() is the link function, E(y) is the expectation of the target variable and α + βx1 +
γx2 is the linear predictor (α, β, γ are the coefficients to be estimated). The role of the link function is to ‘link’
the expectation of y to the linear predictor.
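
For logistic regression the link function is the logit, and for a binary target the expectation E(y) is the probability p = P(Y=1). Substituting these into the GLM equation and solving for p gives the sigmoid form used in the model. The derivation below is a sketch for the two-predictor case written above.

```latex
\begin{align}
g\bigl(E(y)\bigr) &= \log\frac{p}{1-p} = \alpha + \beta x_1 + \gamma x_2,
  \qquad p = E(y) = P(Y = 1 \mid x_1, x_2) \\
\frac{p}{1-p} &= e^{\alpha + \beta x_1 + \gamma x_2} \\
p &= \frac{1}{1 + e^{-(\alpha + \beta x_1 + \gamma x_2)}}
\end{align}
```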
System Requirements:
Hardware Specification:
• A computer with at least 6 GB of RAM (Random Access Memory)
• 1 TB of hard disk space

Software Requirements:
• Jupyter Notebook
• Language: Python
• Tkinter
Program:
import tkinter as tk

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Feature columns, in the order they appear in diabetes.csv.
FEATURES = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness',
            'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age']

INTRO = ("Diabetes is a condition in which the amount of glucose (sugar) in your blood "
         "is too high because your body cannot use it properly. \nThis happens because "
         "your body either cannot use or make a hormone called insulin, which is "
         "responsible for turning sugar into food for your body's cells.")

# (heading, description) pairs shown on the main window.
FACTORS = [
    ("Blood Pressure",
     "Having diabetes raises your risk of heart disease, stroke, kidney disease and "
     "other health problems. Having high blood pressure also raises this risk.\n If you "
     "have diabetes and high blood pressure together, this raises your risk of health "
     "problems even more. If you have diabetes, your doctor will want to be sure that "
     "your blood pressure is very well controlled. \nThis means that they will probably "
     "want your blood pressure to be below 130 over 80."),
    ("Age",
     "Diabetes in older adults is a growing public health burden. The unprecedented "
     "aging of the world's population is a major contributor to the diabetes epidemic, "
     "\nand older adults represent one of the fastest growing segments of the diabetes "
     "population"),
    ("BMI",
     "An increase in body fat is generally associated with an increase in risk of "
     "metabolic diseases such as type 2 diabetes mellitus, hypertension and "
     "dyslipidaemia.\n Body mass index (BMI) criteria are currently the primary focus in "
     "obesity treatment recommendations, with different treatment cutoff points based "
     "upon the presence or absence of obesity-related comorbid disease"),
    ("Glucose",
     "The presence of glucose in the blood stimulates the pancreas to secrete insulin."),
    ("Insulin",
     "The insulin facilitates the transport of glucose from the blood into the cells "
     "where it is used. \nIf not enough insulin is secreted, the glucose blood level "
     "remains high.\n Consistently high blood glucose levels caused by insufficient "
     "insulin is diabetes mellitus."),
    ("DiabetesPedigreeFunction",
     "It provides some data on diabetes mellitus history in relatives and the genetic "
     "relationship of those relatives to the patient"),
    ("SkinThickness",
     "Glucose measurement from different skin areas might be influenced by changes in "
     "skin texture due to several environmental confounders"),
    ("Pregnancies",
     "The effects of pregnancy on acute metabolic complications of diabetes may have "
     "important consequences for both mother and fetus."),
]

# Main window: title, introduction and one heading/description pair per factor.
screen3 = tk.Tk()
screen3.title("Main Window")

tk.Label(screen3, text="Welcome to Diabetes Prediction", fg='Blue', font=('Calibri', 30)).pack()
tk.Label(screen3, text="").pack()
tk.Label(screen3, text=INTRO).pack()
tk.Label(screen3, text="Factors Affecting Diabetes", fg='Red', font=('Calibri', 25)).pack()
for heading, description in FACTORS:
    tk.Label(screen3, text=heading, fg='Green', font=('Calibri', 20)).pack()
    tk.Label(screen3, text=description).pack()
tk.Label(screen3, text="").pack()


def train_model():
    """Fit a logistic regression model on 80% of diabetes.csv."""
    df1 = pd.read_csv("diabetes.csv")
    X_train, X_test, y_train, y_test = train_test_split(
        df1[FEATURES], df1['Outcome'], test_size=0.20, random_state=101)
    # max_iter is raised to give the solver room to converge on unscaled features.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return model


def enter():
    """Open the prediction window with one entry field per feature."""
    root = tk.Toplevel(screen3)
    root.title("Diabetes Prediction")
    model = train_model()

    entries = {}
    for row, name in enumerate(FEATURES):
        tk.Label(root, text=name).grid(row=row, column=0, sticky=tk.E)
        entries[name] = tk.Entry(root)
        entries[name].grid(row=row, column=1)

    result = tk.StringVar(value="")

    def predict():
        # Read every field as a number, in the same column order used for training.
        try:
            data = [float(entries[name].get()) for name in FEATURES]
        except ValueError:
            result.set("Please enter a number in every field.")
            return
        patient = pd.DataFrame([data], columns=FEATURES)
        result.set(str(model.predict(patient)))  # shown as [0] or [1]

    tk.Button(root, text="Predict", command=predict).grid(columnspan=2)
    tk.Label(root, textvariable=result).grid(columnspan=2)


tk.Button(screen3, text="Prediction", command=enter).pack()
screen3.mainloop()

References:
• https://towardsdatascience.com/logistic-regression-detailed-overview-46c4da4303bc
• https://towardsdatascience.com/build-develop-and-deploy-a-machine-learning-model-to-predict-cars-price-using-gradient-boosting-2d4d78fddf09
• https://github.com/udacity/machine-learning/blob/master/projects/capstone/report-example-1.pdf