Академический Документы
Профессиональный Документы
Культура Документы
Abstract:- Modern Era inventions have resulted in a better Section 2 includes a literature review on work done by recent
life span and a safer and better world. In spite the fact that researchers. Section 3 describes the accident dataset including
there is a remarkable development in health status and description of attributes. Section 4 describes the methodology
standard across the globe. One area that still haunts employed in this thesis by using machine learning algorithms
human life the most is the road accidents, resulting in to evaluate the performance of different algorithms. Section 5
colossal loss of immensely precious human life every year. describes performances of algorithms with accuracy and also
In this work, we avail data mining techniques to analyze draws the conclusion about the thesis and includes some
the record of accident dataset in recent years. We possible areas for further research.
proposed hybrid approach by using combinations of
machine learning algorithms i.e. Bayes Net, j48 decision II. LITERATURE SURVEY
tree, j48 graft and analyze their performance in
forecasting the road accidents. The experimental results This chapter describes the review of published studies in
reveal that a hybrid algorithm is more suitable than the recent years. Authors paid a great interest in recognizing the
single algorithm to predict occurrence of road accidents in accident data set by using different data mining approaches to
future. This research provides an insight to reduce the determine the cause of road accidents that significantly affects
road accidents to certain extent. the severity of accidents.
Keywords:-Road Accident, Data Mining, Weka. Ayushi Jain et. al [1] focus to create a model that expels the
discrepancy of the data by clustering to group similar objects
I. INTRODUCTION together to locate the accident prone areas and these formed
clusters are labeled for using decision tree. The author
In the recent years, the increased number of transportation has attempts to identify the accident prone areas in states and
given rise to greater number of road accidents. Transport territories of India against the causes that contribute to road
accidental costs; in terms of casualties and injuries have accident and the experiences of drivers by using K-means
profound implication on the survivors. Apparently survivors, clustering, decision tree algorithms to analyze the accident
their families both have to sustain the sufferings, caused by the data set to improve road safety management.
accident [1].
Isra et.al [2] determines the causes of road accidents by using
Demographic data of low income countries elaborates that various data mining techniques to figure out the factors leads
they bear a large share of the burden, accounting for 85 to car accidents severity in Riyad. The result of the models has
percent of annual deaths and 90 percent of the disability- comparable accuracy values. The intent of this paper lies on
adjusted life years lost because of road traffic injury [6]. analyzing the road accidents records by using three
Vehicle crashes are known to be the third-leading countries of classification techniques i.e.; CHAID, j48 and Nave Bayes are
casualties and injuries [5]. In low and middle income nations, used to predict the factors responsible for car accidents and
people from lower socio economic strata are more prone to be also highlight crashes involving distracted drivers that can
involved in a road traffic accident[12]. Though, a lot has been assist the authorities to implement effective measures.
done in this field, till it leaves much to be desired, to prevent
these mishaps [16]. Sachin Kumar et.al [3] focused on analyzing the accident data
by using association rule mining algorithms and clustering
Databases are increasing in size to a level where traditional algorithms to identify various factors associated with road
techniques of analysis and visualization of the data are accidents. Synthesis of k mode clustering along with
cracking down [9]. Wide availability of large volume of data association rule mining is very impressive as it provides vital
in can be turned into useful information through data mining. data that would not be possible if no segmentation has been
The basic assumption of data mining is to analyze the data done before generating association rules. Then trend analysis
from different aspects [10].The study highlights analyzing the is made for each cluster and entire data set of the road
accident data in past records to reduce the accident rate and accident, finds different trends in the different cluster.
death rate.
IJISRT17AG18 www.ijisrt.com 7
Volume 2, Issue 8 , August 2017 International Journal of Innovative Science and Research Technology
ISSN No: - 2456 2165
RamyaV [4] analyses the accident patterns from many M. Gupta et. al [11] proposed approach in which association
different dimensions or angles to predict the probability of rule mining is used to find out the severity of road accident
accident to occur and also determines the death rates due to data obtained from police records in Mujjafarnagar district,
road accidents. The combination of approaches CRISP- Uttar Pradesh, India. To identify the accident severity based on
DM(Cross-Industry Standard Process for Data Mining) and fatal, major injury and minor/no injury occurred during
SEMMA(sample, explore, modify, model, assess) and the accidents. The association rules exposed different factors that
algorithms such as: random forest, J48, Nave Bayes, lib SVM, are associated with road accidents in each of the categories.
Regression Analysis are used to identify the accident zones to Author describes the way to find some hidden factors that can
improve road safety and draws attention in the view of road be used to understand the factors that cause road accident.
accidents statistics.
Dheeraj Khera et.al [12] predicts the severity of road accidents
Frantisek Babic et al. [5] describes the way to apply the techniques by using classification and regression tress, random
compiled data to extract frequent patterns and vital factors tree, ID3, functional tree, J48, nave Bayes and PART over
causing different accidents by using Decision Tree algorithm, different data sets to yields the better results with accuracy.
apriority algorithm. Analyzing the real sample data using two Using the different methods of data mining, the author
alternatives i.e.; predictive mining through decision tree concluded that J48 performed with better accurate results as
algorithms, and using descriptive mining by the apriority compared to other techniques used to improve the
algorithm. transportation system.
An Shi et. al [6] describes the traffic flow on highways, when Bahram Sadeghi et. al [13] contributes to determining the
crashes took place. The author proposed a time series model various factors leads to road accidents by using data mining
by using temporal data mining method and Clustering analysis algorithms including Market basket analysis, association rule,
was done to study the transport dynamics. Result of the pattern mining that would help transportation system to
proposed method was carried out by numerical overcome the accidents and analyses accident patterns to
experimentation. The newly developed method highlights the predict future behaviors.
increase of traffic flow and the resultant crashes on highways.
Hence in previous work various different data mining
Maninder et. al [7] analyze the accident patterns of collision techniques were used to analyze the record of accident dataset
occurred during road accidents that involves head on collisions to improve the traffic system and to reduce factors leading to
between vehicles and animals, vehicles and pedestrians and road accidents [14].The following table summarizes the above
vehicles and fixed obstructions due to the high speed of road described work done to prevent road accidents.
traffic. To predict accident severity by using different data
mining techniques that are classification and regression trees, S.no. Author Objectives Data Mining
random forest, ID3, functional tree, nave Bayes and j48 to Techniques
prevent the road accidents. 1 Ayushi jain, To find the k-means
garima Ahuja accident prone clustering,
Dipo T. Akolmalageet. al [8] made attempts to analyze the (2016) areas with decision tree
cause and occurrence of crashes on highways using various respect to
data mining techniques to overcome road mishaps. They used different
different techniques to analyses the reasons of collision along accident
this route and the effects of accidents. These mishaps were factors.
also explored by using decision tree: ID3, functional tree. To 2 Isra Al- Understanding CHAID, J48
find the cause of road collision and accident prone locations Turaiki, Maryam the various and Naive
proposed using Lagos Ibadan Highway. Aloumi, Nour factors that Bayes
Aloumi, Khulood causes car
Hiwot teshome et. al [9] determines the factors that cause road Alghamdi (2016) accidents by
accidents to reduce the accident rate attempted the possible comparing the
application of data mining to predict the relationship between performance of
drivers experience including different attributes and road classification
traffic accident. They followed the CRISP-DM model, and techniques to
machine learning tool Weka is used to implement the Nave compute the
Bayes, J48, and PART algorithms to find the factors accurate
associated with the road accident. results.
3 Sachin kumar, Analysis of Association
Kwon OH et. al [10] Analyzing the risk factor of road accident Durga Toshiwal data to identify rule mining
and the result of the work was evaluated by comparing the (2015) the factors algorithms
classification methods: Nave Bayes Classifier, decision tree associated with and K-mode
classifier and Binary Logistic Regression Model. road accidents clustering
IJISRT17AG18 www.ijisrt.com 8
Volume 2, Issue 8 , August 2017 International Journal of Innovative Science and Research Technology
ISSN No: - 2456 2165
5 Frantisek Babic, To use the Decision tree 9 Hiwot teshome, To determine J48, PART,
Karin Zuskacona collected data algorithm, tibebe beshah the factors Nave Bayes.
(2016) about road Apriori leads to road
accidents to algorithm accidents in
mine frequent relation with
patterns and training of the
important driver.
factors causing
different types
10 Kwon oH, Rheew, Analyzing the Nave Bayes
of accident
Yoon Y risk factor of Classifier,
road accident Decision tree
by comparing classifier,
the Binary
classification Logistic
6 An Shi, Zhang This paper Clustering methods Regression
Tao,Zhang describes analysis, Model.
Xinming, Wang traffic flow temporal data 11 Meenu Gupta, To identify the Association
Jian(2014) evolution by mining Vijender Kumar traffic rule Mining
creating a Time Solanki, Vijay accidents
series model Kumar Jain severity by
using temporal (2016) understanding
data mining. the hidden
The newly factors behind
developed road accidents
method
highlights the 12 Dheeraj Khera, Determine the classification
evolution of Williamjeet Singh degree of injury and regression
traffic flow on (2014) severity in road tress, random
highways traffic accident tree, ID3,
by evaluating functional
the variables tree, J48,
that contribute nave bayes
in severity of and PART
injury occurred
7 Maninder Singh, To determine Classification
during
Amrit Kaur (2016) the factors and regression
accident.
causes road tree, ID3,
13 Bahram sadeghi To identify the Market basket
accident that random
Bigham (2014) factor behind analysis,
provides the forest,
road accidents association
methods for functional
analyzing the rule, Pattern
traffic accident tree, J48,
accident mining.
prevention to nave bayes.
patterns to
predict the rate
predict future
of accident
behaviors.
severity
IJISRT17AG18 www.ijisrt.com 9
Volume 2, Issue 8 , August 2017 International Journal of Innovative Science and Research Technology
ISSN No: - 2456 2165
III. DATA SET COLLECTED The above described attributes are related to road accidents,
each attribute contains the different data as described. So, its
The first step in data mining process is to extract the relevant required to convert the format of data into the machine
information from the available database. Accident data set understandable format (.arff) format file that is automatically
have been produced by data repository. To understand the converted by loading the selected data set in weka.
accident dataset in perspective of machine learning [1]. It is
essential to convert the format of selected accident data set to ARFF format includes three parts: relation, attribute and data.
the required ARFF (attribute relation) format file that is Attributes information in machine readable format is described
machine readable format. This dataset contains 7 attributes below:
related to road accident data i.e. Fatal, Commercial Vehicle,
Alcohol, Year, gender, driver city, and accident. Dataset type @relation accident data
can be either nominal or numeric form. The table given below @attribute Fatal {yes, no}
describes the detailed description of attributes.
@attribute Commercial Vehicle {yes, no}
@attribute Alcohol {yes, no}
The detailed description of attributes can be viewed by the
table. @attribute Year {2014, 2015, 2016}
@attribute Gender {male, female}
S.NO ATTRIBUTES TYPE DESCRIPTION @attribute Driver city {Annapolis, Rockville,
Baltimore, Cumberland}
1 Fatal Nominal Describes accident that
causes someone to die @attribute accident {yes, no}
or not.
IV. METHODOLOGY
6 Driver City Nominal Describes that city in A Bayes net model can be used to calculate the probability of
which in accident any event involving variables in the domain, conditional on
occurred any other event.
IJISRT17AG18 www.ijisrt.com 10
Volume 2, Issue 8 , August 2017 International Journal of Innovative Science and Research Technology
ISSN No: - 2456 2165
Algorithm for Bayesian Network: If the two nodes are dependent, add this arc back to E;
Algorithm: Algo_Bayes net otherwise remove the arc permanently.
Input: Training Data set
Output: Structure of Bayesian network
B. J48 Decision Tree
a). Procedure
A decision tree is a decision support tool that decides on the
b). Phase I: (Drafting) basis of certain conditions, a graphical model in which an
internal node represents a test on an attribute, each branch
Initiate a graph G (V, E) where V= {all the nodes of a denotes the outcome of the test and leaf node represents a class
label distribution.J48 Decision tree is the implementation of
data set}, E= { }. Initiate two empty ordered set S, R.
ID3. J48 is an open source Java implementation of the
C4.5 algorithm and by applying J48 decision tree on available
For each pair of nodes (Yi, Yj) where Yi, Yj Y, estimate dataset would help to predict the target variable of a new
mutual information I (Yi, Yj) using equation (1). For the dataset record [11]. It follows the following algorithm:
pairs of nodes that have mutual information greater than a
certain small value , sort them by their mutual Algorithm for J48 Decision Tree:
Algorithm: J48_ALGO
information from large to small and put them into an
Input: Training dataset
ordered set S. Output: Decision tree
Get the first two pairs of nodes in S and remove them a). Procedure:
from S. Add the corresponding arcs to E. (the direction of
the arcs in this algorithm is determined by the previously In this step, on the basis of attributes value of available
available nodes ordering.) training data set, identify the value of the attributes and
distinguish the various instances clearly.
Get the first pair of nodes remained in S and removes it Selecting recursively, classify the best attribute to have
from S. If there is no open path (open path means every the highest information gain.
node in the path is active) between the two nodes (these
two nodes are d-separated given empty set), add the If there is any value for which there is no ambiguity, then
corresponding arc to E; Otherwise, add the pair of nodes we stop and assign the target value as decision attribute
to the end of an ordered set R. for the node.
Get the first pair of nodes in R and remove it from R. In case, we run out of attributes or if unable to get the as it
is value for the target variable from the available data, we
Find a block set that blocks each open path between these assign this branch a target value that most of the items in
two nodes by a set of minimum number of nodes. this branch possess.
Conduct a Conditional independency test. If these two
nodes are still dependent on each other given the block In this decision tree, follow the order of attribute selection
as we obtained for tree. Verify all the attributes, their
set, connect them by an arc.
values with the values in the decision tree and then assign
the target value of this new instance.
Go to step 6 until R is empty.
C. J48 Graft
d). Phase III: (Thinning)
Grafted Decision tree implemented in J48 Graft component of
For each arc in E, if there are open paths between the the weka machine learning workbench, grafting would allow
two nodes besides this arc remove this arc from E temporarily adding nodes to the tree to increase the performance of
and call procedure find_block_set (current graph, node1, decision tree as good as possible.
node2). Conduct a CI test on the condition of the block set.
IJISRT17AG18 www.ijisrt.com 11
Volume 2, Issue 8 , August 2017 International Journal of Innovative Science and Research Technology
ISSN No: - 2456 2165
a). Procedure:
IJISRT17AG18 www.ijisrt.com 12
Volume 2, Issue 8 , August 2017 International Journal of Innovative Science and Research Technology
ISSN No: - 2456 2165
each classifier are obtained by 10-fold cross validation and analysis of traffic accident data can be done to reduce the
66% of split validation with accuracy. accident rate to certain extent.
REFRENCES
IJISRT17AG18 www.ijisrt.com 13
Volume 2, Issue 8 , August 2017 International Journal of Innovative Science and Research Technology
ISSN No: - 2456 2165
[5]. Bahram Sadeghi Bigham, "Road Accident data analysis: [17]. Jaideep Kashyap1, Chandra Prakash Singh, "Mining
A Data Mining Approach," Indian Journal of Scientific Road TrafficAccident Data to Improve Safety on Road-
Research, May 2014. related Factors for Classification and Prediction of
[6]. Tibebe Shah, Shawndra Hill, "Mining Road Traffic Accident Severity," International Research Journal of
Accident Data to Improve Safety: Role of Road- related Engineering and Technology (IRJET), October 2016.
Factors on Accident Severity in Ethiopia.," Conference: [18]. Kwon OH,Rhee W, Yoon Y "Application of
Artificial Intelligence for Development, Papers from the classification algorithms for analysis of road safety risk
2010 AAAI Spring Symposium, Technical Report SS-10- factor dependencies," US National Library of Medicine
01, Stanford, California, USA, January 2010. National Institutes of Health, November 2016.
[7]. DipoT. Akomolafe, Akinbola Olutayo, "Using Data [19]. Hamzah Al Najada, Imad Mahgoub, "Big vehicular
Mining Technique to Predict Cause of Accident and traffic Data mining:Towards accident and congestion
Accident Prone Locations on Highways," American prevention," International Wireless Communications and
Journal of Database Theory and Application, 2012. Mobile Computing Conference (IWCMC), September
[8]. Dheeraj Khera, Williamjeet Singh, "A Review on Injury 2016.
Severity in Traffic System using Various Data Mining [20]. An Shi, Zhang Tao, Zhang Xinming, and Wang Jian,
Techniques," International Journal of Computer "Evolution of Traffic Flow Analysis under Accidents on
Applications, August 2014. Highways Using Temporal Data Mining," International
[9]. M. Sowmya, Dr.P.Ponmuthuramalingam, "Analyzing the Conference on Intelligent Systems Design and
Road Traffic and Accidents with Classification Engineering Applications, December 2014.
Techniques," International Journal of Computer Trends [21]. Emil Brissman, Kajsa Eriksson, Classification:
and Technology (IJCTT), november 2013. Grafted Decision Trees, Linkping University Campus
[10]. PragyaBaluni Y.1 P. RaiwaniV, "Extraction of Road Norrkping, December 2011.
accident patterns in Uttarakhand using the neural [22]. Jie Cheng, David A. Bell, Weiru Liu, "An Algorithm
network," International Journal of Computer Engineering for Bayesian Belief Network Construction from Data,"
and Technology, August 2014. University of Ulster, U.K., April 2010.
[11]. S.Vigneswaran, A.Arun Joseph, E.Rajamanickam,
"Efficient Analysis of Traffic Accident Using Mining
Techniques," International Journal of Software and
Hardware Research in Engineering, March 2014.
[12]. Isra Al-Turaiki, Maryam Aloumi, Nour Aloumi,
Khulood Alghamdi, "Modeling traffic accidents in Saudi
Arabia using classification techniques," Saudi
International Conference on Information Technology (Big
Data Analysis) (KACSTIT), December 2016.
[13]. MS.R. Saravanya,Ms.Mangayarkarasi, "A Study and
Analysis of Road Accident in Tamilnadu using Data
mining Technique," International Journal of Recent
Engineering Science (IJRES), August 2015.
[14]. HiwotTeshome, TibebeBeshah, "Traffic Accident
Analysis from Drivers Training Perspective Using
Predictive Approach," HiLCoE Journal of Computer
Science and Technology, December 2013.
[15]. Meenu Gupta, Vijender Kumar Solanki, Vijay Kumar
Singh, "A NovelFramework to Use Association Rule
Mining for classification of traffic accident severity,"
Ingeniera Solidaria, January 2017.
[16]. Farzaneh Moradkhani, SomayyaEbrahimkhani,
BahramSadeghiBegham, "Road Accident Data Analysis:
A Data Mining Approach," Indian Journal of Scientific
Research, May 2014.
IJISRT17AG18 www.ijisrt.com 14