Chapters 2

CERTIFICATE
This is to certify that Himanshu Pawar (1609010056), Kumari Shweta (1609010066), Karan
Kandpal (1609010061), and Raghav Sharma (1609010098) have carried out the research work presented
in this thesis entitled “Crime Rate Prediction Using K-Means” for the award of Bachelor of Technology
from Dr APJ Abdul Kalam Technical University, Lucknow under the supervision of Ms. Ankita Taak.
The thesis embodies the results of original work, the studies were carried out by the group, and the
contents of the thesis do not form the basis for the award of any other degree to the candidates or to
anybody else from this or any other University/Institution.
Signature
(Ms. Ankita Taak)
(Project Guide)
(I.E.C - CET)
Date:
ABSTRACT
Crime analysis and prevention is a systematic approach to identifying and analysing
patterns and trends in crime. Our system predicts regions that have a high
probability of crime occurrence and visualizes crime-prone areas. With the
growing use of computerized systems, crime data analysts can help law
enforcement officers speed up the process of solving crimes. About 10% of
criminals commit about 50% of crimes. Although we cannot predict who the
victims of a crime may be, we can predict the places where crime is likely to
occur. The K-means algorithm partitions data into groups based on their
means. K-means can be extended to the expectation-maximization
algorithm, in which the data are partitioned based on their parameters. This
easy-to-implement data mining framework works with the geospatial plot of crime and
helps to improve the productivity of detectives and other law enforcement officers.
The system can also be used by Indian crime departments to reduce crime and
to solve crimes in less time.
ACKNOWLEDGEMENT
It is my pleasure to be indebted to the various people who directly or indirectly contributed to
the development of this work and who influenced my thinking, behaviour, and acts during the
course of study.
I express my sincere gratitude to Prof. Rajnesh Singh, HOD of CSE/IT, for providing me an
opportunity to undertake this project, “CRIME RATE PREDICTION USING K-MEANS”, in
the premises of IEC COLLEGE OF ENGINEERING & TECHNOLOGY.
I am thankful to Ms. Ankita Taak for her support, cooperation, and motivation during the
project, and for her constant inspiration, presence, and blessings.
Lastly, I would like to thank the almighty and my parents for their moral support, and my
friends, with whom I shared my day-to-day experience and from whom I received many
suggestions that improved the quality of my work.
Name: Himanshu Pawar (1609010056)
Kumari Shweta (1609010066)
Karan Kandpal (1609010061)
Raghav Sharma (1609010098)
Date:

1. CHAPTERS
1.1 INTRODUCTION
At present, criminals are becoming technologically sophisticated in
committing crimes. One challenge faced by intelligence and law enforcement
agencies is the difficulty of analysing the large volume of data involved in crime
and terrorist activities. Agencies therefore need techniques that help them catch
criminals and stay ahead in the ongoing race between criminals and law
enforcement.
An appropriate field must be chosen for crime analysis. Data mining refers to
extracting knowledge from large amounts of data, so it is applied here to a
high-volume crime dataset, and the knowledge gained is useful in supporting
police forces. Within data mining, an appropriate approach must also be chosen.
Clustering groups a set of objects in such a way that objects in the same group
are more similar to each other than to those in other groups. It covers various
algorithms that differ significantly in their notion of what constitutes a cluster
and of how to find clusters efficiently.
In this work, the k-means clustering technique is used to extract useful
information from the high-volume crime dataset and to interpret the data. This
assists police in identifying and analysing crime patterns and provides
information that helps reduce further occurrences of similar incidents. K-means
clustering is implemented using an open source data mining tool. Among the
available open source data mining suites, such as R, Tanagra, WEKA, KNIME,
Orange, and RapidMiner, k-means clustering is done with RapidMiner, an open
source statistical and data mining package written in Java with flexible data
mining support options. The dataset used for crime analysis is Indian crime
data from Kaggle. We take the data first district-wise and then state-wise.
K-means works by partitioning data into groups based on their means. We use
clustering algorithms to predict crime-prone areas.
We chose clustering over supervised techniques such as classification
because crimes vary widely in nature and crime databases are often filled
with unsolved crimes. A classification technique that relies on the existing
and unsolved crimes will therefore not give good predictive quality for
future crimes.
1.2 Identification of Problems & Issues
Here we identify the ways in which crime can be detected. Criminal
identification is indispensable in combating crime: it is not only the most
potent factor in securing the apprehension of the criminal, but its
establishment also enables the judiciary to sentence the guilty equitably. In
most countries the detection of crime is the responsibility of the police, though
special law enforcement agencies may be responsible for the discovery of
particular types of crime; for example, a customs department may be charged
with combating smuggling and related offences. Crime detection falls into three
distinguishable phases: the discovery that a crime has been committed, the
identification of a suspect, and the collection of sufficient evidence to indict
the suspect before a court. Many crimes are discovered and reported by persons
other than the police, for example victims or witnesses. For crimes such as
murder, burglary, theft, child abuse, cybercrime, domestic abuse, and fraud, we
take datasets from Kaggle and apply effective algorithms to try to predict where
such crimes are likely to occur, so that the respective police can look into
them and take appropriate measures to prevent them.
1.3 Formulation Of Problem
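Our algorithm aims at minimizing an objective function known as the squared error
function, given by:
\[ J(V) = \sum_{j=1}^{c} \sum_{x_i \in S_j} \left( \lVert x_i - v_j \rVert \right)^2 \]
where ‘||xi - vj||’ is the Euclidean distance between data point xi and cluster centre vj,
‘Sj’ is the set of the ‘cj’ data points assigned to the jth cluster, and
‘c’ is the number of cluster centres.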
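
As an illustration, the squared error function described above can be evaluated directly once every data point has been assigned to a cluster. The following is a minimal Python sketch, not the project's RapidMiner workflow; the function name `squared_error` and the sample points and centres are hypothetical and chosen only for the example.

```python
import math

def squared_error(clusters, centres):
    """Sum of squared Euclidean distances from each point to its centre.

    clusters[j] is the list of points assigned to centres[j].
    """
    total = 0.0
    for points, centre in zip(clusters, centres):
        for point in points:
            total += math.dist(point, centre) ** 2
    return total

# Two hypothetical clusters of 2-D points and their centres (illustrative only).
clusters = [[(1.0, 1.0), (1.0, 3.0)], [(5.0, 5.0), (7.0, 5.0)]]
centres = [(1.0, 2.0), (6.0, 5.0)]
print(squared_error(clusters, centres))  # every point is 1 unit from its centre, so this prints 4.0
```

A smaller value means the points lie closer to their assigned centres, which is what the clustering procedure tries to achieve.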
1.4 Solution Approach

1) Let X = {x1, x2, x3, ..., xn} be the set of data points and
V = {v1, v2, ..., vc} be the set of centres.
2) Randomly select ‘c’ cluster centres.
3) Calculate the distance between each data point and each cluster centre.
4) Assign each data point to the cluster centre nearest to it, that is, the
centre whose distance from the data point is the minimum over all the cluster centres.
5) Recalculate the new cluster centres using:
\[ v_j = \frac{1}{c_j} \sum_{x_i \in S_j} x_i \]
where ‘Sj’ is the set of data points assigned to the jth cluster and ‘cj’ represents
the number of data points in the jth cluster.
6) Recalculate the distance between each data point and the newly obtained cluster
centres.
7) If no data point was reassigned then stop, otherwise repeat from step 3.
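
The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the RapidMiner operator used in the project: the initial centres are drawn at random from the data points themselves, a cluster that becomes empty keeps its previous centre, and the names `kmeans`, `seed`, and `max_iter` as well as the sample points are hypothetical.

```python
import math
import random

def kmeans(points, c, seed=0, max_iter=100):
    """Plain k-means following steps 1-7; returns (centres, labels)."""
    rng = random.Random(seed)
    # Step 2: pick c distinct data points as the initial centres (an assumption).
    centres = rng.sample(points, c)
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Steps 3-4: assign each point to its nearest centre.
        new_labels = [
            min(range(c), key=lambda j: math.dist(p, centres[j]))
            for p in points
        ]
        # Step 7: stop when no data point was reassigned.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 5: move each centre to the mean of its assigned points.
        for j in range(c):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centre
                centres[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centres, labels

# Six hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centres, labels = kmeans(points, c=2)
print(centres)
print(labels)
```

Because the starting centres are random, which group receives label 0 depends on the seed, and on less clearly separated data different seeds can lead to different final clusters.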
1.5 Implementations
Step 1: Loading
Step 2: Cleaning
Step 3: K-means implementation on the datasets.
Step 4: Plotting of graphs.
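
As a rough illustration of Steps 1 and 2 only, the sketch below loads a small CSV and drops rows whose crime counts are missing or non-numeric. It assumes a district-wise table; the column names (`district`, `murder`, `theft`), the sample values, and the helper names `load_rows` and `clean_rows` are invented for the example and are not taken from the Kaggle dataset or from the project's RapidMiner process.

```python
import csv
import io

# Hypothetical stand-in for a district-wise crime CSV; the column names
# and values are invented for illustration only.
RAW = """district,murder,theft
A,12,340
B,,120
C,7,n/a
D,3,95
"""

def load_rows(text):
    """Step 1: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def clean_rows(rows, numeric_columns):
    """Step 2: keep only rows whose numeric columns all parse as numbers."""
    cleaned = []
    for row in rows:
        try:
            values = tuple(float(row[col]) for col in numeric_columns)
        except ValueError:
            continue  # drop rows with missing or non-numeric counts
        cleaned.append((row["district"], values))
    return cleaned

rows = load_rows(RAW)
data = clean_rows(rows, ["murder", "theft"])
print(data)  # only districts A and D have complete numeric counts
```

The cleaned numeric tuples are in the form a clustering step would take as input; Steps 3 and 4 are not covered by this sketch.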
