
A SPIKY STUDY OF TAMILNADU CRIME DATA USING DATAMINING

Abstract

It is evident from newspapers, TV, the web and other news sources that the occurrence of crime and terrorism in India is increasing year by year. Generally, the crime rate is not falling but rising. Terrorism and growing crime are a serious threat to the country's peace and livelihood, and they devastate the country's resources. The rise in crime and the terrorism threat must be controlled, and in the long run eradicated, before they gradually deplete those resources. Crime occurrences and terrorist attacks are recorded by police departments countrywide. This huge volume of crime records needs to be thoroughly analyzed to reveal the frequency of crime occurrence, the crime type, the type of terrorist attack and other factors. The outcome of the analysis should be interpreted, and the conclusions submitted to senior police officials as suggestions and recommendations. Analyzing this volume of data manually is a cumbersome task, so this research proposes to use data mining techniques and tools. Research covering the whole of India would be a tedious task, so the research here focuses only on Tamilnadu.

The research addresses two problems related to crime analysis. The first part of this paper deals with data clustering. Six clustering techniques are reviewed, presented and compared in order to identify the most suitable algorithm among them: K-Means Clustering, Hierarchical Clustering, DBSCAN Clustering, Density Based Clustering, OPTICS and the EM Algorithm. The second part of this paper deals with an intelligent crime analysis and recording system designed to overcome problems that appear mainly in the Tamilnadu police department. It is a GIS based system which comprises data mining techniques such as Hotspot detection, Crime clock, Crime comparison, Crime pattern visualization, Outbreaks detection and nearest police station detection. Salient features of the proposed system include a rich environment for crime data analysis and a simplified environment for location based data analysis. It facilitates the identification of various types of crimes in detail and assists police personnel in controlling and preventing such incidents efficiently. The conclusions of the study will be recommended to the Tamilnadu police department as suggestions for reducing the crime level.

Terms: Data clustering, K-Means Clustering, Hierarchical Clustering, DBSCAN Clustering, Density Based Clustering, OPTICS, EM Algorithm, Crime Analysis, Crime Investigation, Data Mining

Introduction

The primary goal of crime data mining is to identify crime trends and patterns/series. Mining of crime data provides timely and pertinent information about crime patterns. It also provides trend analysis to assist law enforcement personnel in planning and deploying resources for the prevention and suppression of criminal activities. Combining historical data with current data can sometimes unearth new clues, helping to solve pending crimes, and it aids the investigation process. In crime data analysis, statistical examinations are performed on the frequency of specific crimes in order to evaluate the security of property and persons. It involves careful analysis of the time, location and type of crime committed at a particular place, so that appropriate steps can be taken to reduce crime.

Through research and documentation of crimes and their categorization by type of offense, location and time, gradual patterns and trends emerge that lead to preventive solutions. The objective of crime data mining is to evaluate the probability of a crime and to assess risks. This involves analyzing data on observed behavior and modeling it in order to determine the likelihood of its recurrence. An estimate of the probability of a crime or attack occurring is made using documented historical data such as crime reports. For example, a security professional may use the documented statistics on car thefts for a building over a one-year period.

Clustering is a data mining technique that groups similar data into a cluster and dissimilar data into different clusters. Clustering can be considered the most important unsupervised learning technique; like every other problem of this kind, it deals with finding structure in a collection of unlabeled data. Clustering is "the process of organizing objects into groups whose members are similar in some way". A cluster is therefore a collection of objects which are "similar" to one another and "dissimilar" to the objects belonging to other clusters. Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters).

Data clustering is a process of putting similar data into groups. A clustering algorithm partitions a data set into several groups such that the similarity within a group is larger than among groups. Moreover, most of the data collected in many problems seem to have some inherent properties that lend themselves to natural groupings. Clustering algorithms are used extensively not only to organize and categorize data, but also for data compression and model construction. Finding these groupings or trying to categorize the data is not a simple task for humans unless the data is of low dimensionality (two or three dimensions at maximum). Another reason for clustering is to discover relevant knowledge in data. Data clusters are created to meet specific requirements that cannot be met using any of the categorical levels. One can combine data subjects as a temporary group to get a data cluster.

[Figure: Disk structure: (A) Track, (B) Geometrical Sector, (C) Track Sector, (D) Cluster]

The common approach of all the clustering techniques presented here is to find cluster centers that represent each cluster. A cluster center indicates where the heart of each cluster is located, so that later, when presented with an input vector, the system can tell which cluster this vector belongs to by measuring a similarity metric between the input vector and all the cluster centers and determining which cluster is the nearest or most similar one. Some clustering techniques rely on knowing the number of clusters a priori. In that case the algorithm tries to partition the data into the given number of clusters. K-means and Fuzzy C-means clustering are of that type. The grouping step can be performed in a number of ways. The output clustering (or clusterings) can be hard (a partition of the data into groups) or fuzzy (where each pattern has a variable degree of membership in each of the output clusters).

Aim of the Research

1. To classify the different types of crimes.
2. To identify the most suitable algorithm among the different clustering algorithms.
3. To represent graphically Hotspot detection, Crime clock, Crime comparison, Crime pattern visualization, Outbreaks detection and nearest police station detection.

Need and significance

Many police departments all around the world lack good and efficient crime recording and analysis systems. The vast geographical diversity and the complexity of crime patterns have made the analysis and recording of crime data even more difficult. According to the Tamilnadu police department, it has faced these problems for many years. It needs a good and efficient system to control and prevent various crime incidents.
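The nearest-center assignment and the case where the number of clusters is fixed in advance, both described in the Introduction, can be illustrated with a minimal hard K-means sketch in Python. The function names (`nearest_center`, `kmeans`), the sample points and the initial centers are illustrative assumptions and are not taken from the study, which ran its clustering in WEKA.

```python
import math

def nearest_center(point, centers):
    """Return the index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def kmeans(points, centers, max_iter=100):
    """Hard K-means: the number of clusters is fixed by len(centers)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [nearest_center(p, centers) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(old)  # an empty cluster keeps its old center
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    return centers, labels

# Hypothetical 2-D crime locations forming two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, labels = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

Once the centers are found, a new input vector is classified with `nearest_center(vector, centers)`. This is the hard variant; a fuzzy method would return a degree of membership for every cluster instead of a single index.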

Significance

The research addresses two problems related to crime analysis. The first part of this paper deals with data clustering. Six clustering techniques are reviewed, presented and compared in order to identify the most suitable algorithm among them: K-Means Clustering, Hierarchical Clustering, DBSCAN Clustering, Density Based Clustering, OPTICS and the EM Algorithm. The second part of this paper deals with an intelligent crime analysis and recording system designed to overcome problems that appear mainly in the Tamilnadu police department. It is a GIS based system which comprises data mining techniques such as Hotspot detection, Crime clock, Crime comparison, Crime pattern visualization, Outbreaks detection and nearest police station detection. Salient features of the proposed system include a rich environment for crime data analysis and a simplified environment for location based data analysis. It facilitates the identification of various types of crimes in detail and assists police personnel in controlling and preventing such incidents efficiently. The conclusions of the study will be recommended to the Tamilnadu police department as suggestions for reducing the crime level.

Data Analysis

1. Crime record reports for 2000-2012 were collected from the State Crime Records Bureau, Tamil Nadu, Chennai - 600 028.
2. SPSS 16.0 software is used to produce the statistical report.
3. The clustering techniques are implemented and analyzed using the clustering tool WEKA.
4. The performance of the 6 techniques is presented and compared.
5. Database: MySQL database and PostGIS/PostgreSQL database.

Limitations

Research covering the whole of India would be a tedious task, so the research here focuses only on Tamilnadu.

DATAMINING TECHNIQUES

Crime analysis is carried out as a collection of steps: Hotspot detection, Crime clock, Crime comparison, Crime pattern visualization, Outbreaks detection and nearest police station detection. Each of these steps has been automated as a tool in the Tamilnadu crime analysis Net system. Police personnel can therefore use different tools at different times according to the situation at hand, and decisions can be taken in a fast and well-organized manner. This section describes each of those analysis tools in detail.

1. Hotspot Detection

Cluster analysis is the process of identifying groups in a dataset such that the data inside each group share specific similarities while the relationships among the groups are minimal. Therefore, in order to identify hotspots with high crime density, cluster analysis is used to identify clusters of crime spots. The clustering algorithm of the system first accepts the area to be investigated as its input. According to the user's inputs, the algorithm measures the Euclidean distances between all pairs of data points within the defined area. Then it clusters the data points into the most suitable number of clusters using the nearest neighbor concept and the calculated Euclidean distances. Finally, the coordinates of the centers of the clusters are identified and the number of crime points inside each cluster is returned. Depending on the value returned with a coordinate, each cluster is assigned a color darkness and a radius according to the magnitude of the cluster.

2. Crime Clock

A crime clock is a representation of the number of crime scenes that have taken place within the 24 hours of a day. A crime clock is represented as a bar chart. The 24-hour clock is represented using 24 bars on the graph, and the height of each bar represents the number of crime scenes per hour. Three extra bars are used to represent the crime scenes without an exact time of incident.
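A minimal Python sketch of the binning behind such a chart follows. It assumes each record carries either an exact hour (0 to 23) or a coarse label for incidents without an exact time: "day", "night", or anything else, which is counted as "unknown". The record format and the function name `crime_clock` are assumptions made for illustration, not details of the system itself.

```python
from collections import Counter

EXTRA_BARS = ("day", "night", "unknown")

def crime_clock(records):
    """Count incidents per bar: 24 hourly bars plus three extra bars.

    Each record is an int hour 0..23 when the exact time is known,
    otherwise "day" or "night"; anything else counts as "unknown".
    """
    counts = Counter()
    for rec in records:
        if isinstance(rec, int) and 0 <= rec <= 23:
            counts[rec] += 1
        elif rec in ("day", "night"):
            counts[rec] += 1
        else:
            counts["unknown"] += 1
    # Fixed bar order so that empty bars still appear on the chart.
    bars = list(range(24)) + list(EXTRA_BARS)
    return [(bar, counts[bar]) for bar in bars]

# Hypothetical incident times.
clock = crime_clock([22, 22, 3, "night", "day", None, 14, 22])
heights = dict(clock)
```

The returned list always has 27 entries in a fixed order, so it can be passed directly to any bar-chart routine.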
The "day bar" represents the crime scenes which took place in the daytime, the "night bar" represents the crime scenes which took place at night, and the "unknown bar" represents the crime scenes which cannot be assigned to any time period.

3. Crime Comparison

Comparing different types of crimes is very important to get an idea about the growth of a particular crime relative to the other types of crimes. A pie graph is used to satisfy this requirement, giving the analyst the maximum freedom to compare the different types of crimes. It shows the percentage comparison between different crime types.

4. Crime Pattern Visualization

In statistics, signal processing, econometrics and mathematical finance, a time series is a sequence of data points, typically measured at successive times spaced at uniform intervals. Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. A time series plot is used to represent the changes in the frequency of crime occurrence. The Y-axis represents the frequency of crimes and the X-axis represents the time.

5. Outbreaks Detection

A crime outbreak is the occurrence of crime incidents in excess of what would normally be expected in a defined geographical area or time period. The "Crime outbreaks detection tool" is an agent system that monitors the number of crimes in different regions. If the number of crimes increases out of control, the system prompts an alert to all the relevant police stations. In this system, the user first defines a reference time frame, and the system then calculates the average and the standard deviation of the number of crimes per day for each cluster. If, in a particular cluster, the number of crimes within a day exceeds the threshold computed from these values, the system prompts an alert.

6. Nearest Police Station Detection

The J48 decision tree is a predictive machine-learning model that decides the target value (dependent variable) of a new sample based on various attribute values of the available data. In an emergency, such as following a suspect in pursuit, it is very important to know clearly what police support is available around the current location. To achieve this, a nearest police station detection tool has been integrated. The "J48" classification algorithm is the methodology used in building this tool. First, the J48 algorithm is trained on about 150 data points for each 400 km² area. Those data points include the coordinates and the nearest police stations. The algorithm was trained several times to adapt the coordinates to the predefined classes (police stations). When the user clicks on a desired point on the map, that coordinate is analyzed by the algorithm and the most suitable class for that coordinate is returned.
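The hotspot detection steps described earlier can be sketched in Python as follows. The text does not specify how the "nearest neighbor concept" decides the number of clusters, so this sketch assumes a simple rule: two crime points belong to the same cluster when a chain of points links them and every hop is no longer than a distance threshold `eps`. The function name `find_hotspots`, the `eps` parameter and the sample coordinates are all illustrative assumptions.

```python
import math

def find_hotspots(points, eps):
    """Group points linked by hops of length <= eps; return (center, count) pairs."""
    unvisited = set(range(len(points)))
    hotspots = []
    while unvisited:
        # Grow one cluster from an arbitrary seed by following near neighbors.
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            near = [j for j in unvisited
                    if math.dist(points[current], points[j]) <= eps]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        # Cluster center = mean coordinate of its member points.
        xs = [points[i][0] for i in cluster]
        ys = [points[i][1] for i in cluster]
        center = (sum(xs) / len(cluster), sum(ys) / len(cluster))
        hotspots.append((center, len(cluster)))
    # Largest clusters first, so the densest hotspots can be drawn darkest.
    return sorted(hotspots, key=lambda h: h[1], reverse=True)

crimes = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),  # dense group
          (10.0, 10.0), (10.0, 11.0),                       # smaller group
          (50.0, 50.0)]                                     # isolated incident
hotspots = find_hotspots(crimes, eps=1.5)
```

Each returned pair gives a center coordinate and a crime count; a display layer could map the count to the color darkness and radius of the hotspot marker.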

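The outbreak detection rule described above can be sketched in Python as follows. The text names the average and the standard deviation of daily crimes per cluster but the exact alert threshold did not survive, so this sketch assumes the common rule of mean plus `k` standard deviations with `k = 2`. That rule, the function names and the sample counts are assumptions for illustration only.

```python
import statistics

def outbreak_threshold(reference_counts, k=2.0):
    """Alert threshold from a reference period: mean + k * standard deviation.

    The multiplier k is an assumption; the source does not state it.
    """
    mean = statistics.mean(reference_counts)
    std = statistics.pstdev(reference_counts)  # population standard deviation
    return mean + k * std

def detect_outbreaks(reference, today, k=2.0):
    """Return the clusters whose count for today exceeds their threshold.

    reference maps cluster id -> daily counts in the reference time frame;
    today maps cluster id -> number of crimes observed today.
    """
    return sorted(cluster for cluster, counts in reference.items()
                  if today.get(cluster, 0) > outbreak_threshold(counts, k))

# Hypothetical daily counts for two clusters over an eight-day reference frame.
reference = {
    "cluster_a": [2, 4, 4, 4, 5, 5, 7, 9],   # mean 5, standard deviation 2
    "cluster_b": [1, 1, 1, 1, 1, 1, 1, 1],   # mean 1, standard deviation 0
}
alerts = detect_outbreaks(reference, today={"cluster_a": 10, "cluster_b": 1})
```

A larger `k` makes alerts rarer; the right value would have to be tuned against the department's historical data.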
System Architecture

The crime data analysis system was built using the following software tools. All packages are free or open source software. Java 6 is a powerful object-oriented language. Eclipse J2EE, version 3.4, is the Java 2 Enterprise Edition version of the Java Integrated Development Environment. Apache Tomcat 6.0 is an open source web application server. The Google Maps API offers a 2D mapping interface with a robust overlay capability. PostgreSQL is a database with support for geometry and geospatial query capability, used in conjunction with PostGIS 1.3.2. WEKA is a data mining tool with a collection of machine learning algorithms.

The purpose of this course project is to develop a web application capable of searching and visualizing crime report data. The major aspects of this project involved extracting the data from XML data files into text format and storing the data in the database. The next major step involved applying the mining algorithms to the data to extract meaningful patterns. The final step is creating a web-based front end (visualization) that interacts with the data stored at the back end to represent the data.

The model of the Tamilnadu Net is composed of a MySQL database, a PostGIS/PostgreSQL database and a Map Layers container. Tamilnadu Net analysis tools communicate with the two databases, MySQL and PostGIS/PostgreSQL, while the GeoServer communicates with the map layers and the PostGIS/PostgreSQL database. When the user request is for a map, the system communicates with the OpenLayers API. In turn, the API communicates with the GeoServer to resolve the WMS and WFS requests and provides a layered view of maps to the user. The OpenLayers API uses Google Maps as the base layers, while the GeoExt API helps the OpenLayers API to display this information in a graphically rich environment.

Conclusion

The project is a good starting point for implementing data mining on real-world examples. This project has given us insight into various techniques, not only in the field of data mining but also in database utilization, visualization, etc. A few points of consideration for the project itself are as follows. Data quality is an extremely important aspect, and we realized during the course of implementing the project that more time should have been spent checking how sane our data was. This, however, would have had no effect on the work done, but it would definitely have resulted in much more useful information about the data. Although the problem of parsing crime reports wasn't tackled in this work, we realize how important it is, and how challenging it can be. From the variation we've seen among the different datasets, we believe that some sort of standardization should be enforced among the different police departments in order to make automatic parsing of crime reports more reliable. One more issue that could be considered is the use of open-source data mining tools: even though WEKA is a very useful alternative, many other tools exist that are more robust and feature rich. Utilizing such tools would provide for a more open and feature-rich application.

