IMDS
105,9

Distractions and motor vehicle accidents
Data mining application on Fatality Analysis Reporting System (FARS) data files
Wen-Shuan Tseng, Hang Nguyen,
Jay Liebowitz and William Agresti
Graduate Division of Business and Management, Department
of Information Technology, Johns Hopkins University,
Rockville, Maryland, USA
Abstract
Purpose – This research applies data mining techniques to discover the relationship between driver
inattention and motor vehicle accidents.
Design/methodology/approach – The data used in this research are obtained from the Fatality
Analysis Reporting System of the National Highway Traffic Safety Administration, focusing on the
Maryland and Washington, DC area from 2000 to 2003. The data are first clustered using
Kohonen networks. Then, the patterns and rules in the data are explored using decision tree and
neural network models.
Findings – Results suggest that when inattention and physical/mental conditions occur at the
same time, the driver has a higher tendency to be involved in a crash that involves a collision with
static objects. Furthermore, with regard to the manner of collision, the relative importance of
colliding with a moving vehicle as the first harmful event in a crash is twice that of colliding with a
fixed object as the first harmful event.
Research limitations/implications – The data used in this research are limited to fatal crashes
that happened in Maryland and Washington, DC from 2000 to 2003.
Originality/value – This is one of the first research papers utilizing data mining techniques to
explore the possible relationships between driver inattention and motor vehicle crashes.
Keywords Data collection, Road vehicles, Accidents
Paper type Research paper
1. Introduction
Driver distraction, which has emerged as a significant highway safety issue, is defined by the
AAA Foundation for Traffic Safety as a class of inattentive behaviors and mental states of drivers
(Stutts et al., 2001). In 1996, a study published by the National Highway Traffic Safety
Administration (NHTSA) found that driver distraction contributed to approximately 25-30
percent of the injuries caused by car crashes (Utter, 2001). In 1999, according to the Fatality
Analysis Reporting System (FARS), 11 percent of fatal crashes, corresponding to 4,462
fatalities, were due to driver inattention (Utter, 2001). The use of cellular phones while driving
has received the most attention. In 2003, the Harvard Center for Risk Analysis (HCRA)
estimated that cell phone use while driving alone caused 330,000 moderate to severe injuries
and approximately 2,600 deaths each year (Sundeen, 2003). In response, most states have
considered legislation on drivers' use of cell phones while driving.

Industrial Management & Data Systems
Vol. 105 No. 9, 2005
pp. 1188-1205
© Emerald Group Publishing Limited
0263-5577
DOI 10.1108/02635570510633257

The authors gratefully acknowledge the grant and support from the GEICO Educational
Foundation, David Schindler, and Karen Watson for the GEICO Scholarship in Discovery
Informatics.
This research program, funded by the GEICO Educational Foundation and
coordinated with the Department of Information Technology at Johns Hopkins
University, applies data mining techniques to explore the patterns linking distraction
factors and traffic crashes. In particular, we use data mining as the main tool to
understand and recognize correlations in the crash information provided by FARS.
FARS, initiated in 1975, was developed by the National Center for Statistics and
Analysis (NCSA) with the aim of building a safer traffic community within the 50 states,
the District of Columbia, and Puerto Rico. FARS contains data, drawn mainly from police
crash reports, on traffic crashes in which a death occurs within 30 days of the crash.
Traffic safety problems identified through FARS have been used to evaluate both motor
vehicle safety standards and highway safety initiatives. In this research, the FARS crash
data files were analyzed using data mining.
This research applies three data mining techniques (Kohonen networks, decision trees,
and neural networks) to find combinations of distraction factors that help to explain
the high accident rates. Clusters in the data are detected using Kohonen networks, in
which topologically ordered output units compete to respond to each input signal
(Kohonen, 1995). Decision trees allow data to be explored and classified by
mathematically deriving the effect of each event on successive events (Marakas, 2003).
In addition, neural networks are used to model the potential correlation between
distraction and car accidents in the FARS data files. Neural networks are designed on
the basis of biological models of how the human brain works (Berry and Linoff, 2004).
A neural network is, literally, a network of connected artificial neurons that imitate
their biological counterparts and produce a useful output by combining various inputs
(Berry and Linoff, 2004). The findings of this research will inform future studies.
2. Literature review
According to NHTSA, driver distraction is a form of inattention in which drivers shift
their attention away from the task at hand. Driver distractions can be classified into
two types (internal distraction and external stimuli) and four categories: visual (e.g.
reading a map), cognitive (e.g. being lost in thought), auditory (e.g. responding to a
ringing cell phone), and biomechanical (e.g. manually adjusting the radio volume)
(Ranney et al., 2000). A driver may be engaged in more than one activity at the same
time; as defined by the AAA Foundation for Traffic Safety (AAAFTS), a distraction
occurs when a driver “is delayed in the recognition of information needed to safely
accomplish the driving task because some event, activity, object, or person within or
outside the vehicle compels or induces the driver, shifting attention away from the
driving task” (Stutts et al., 2001).
Driver distraction is not a new cause of car accidents. In fact, driver inattention is one
of the most prevalent causes of traffic crashes (Wang et al., 1996). However, driver
distraction was so little recognized by the public that government and legislators paid
it little attention until the 1990s, when mobile phone use while driving drew increasing
attention to distracted driving (“Status Report,” 2002).
Mobile phone use while driving is a typical form of distraction in accidents. Many
studies have concluded that, on average, drivers who talk on mobile phones while
driving have a higher risk of car accidents than drivers who do not
(Laberge-Nadeau et al., 2003; Wilson et al., 2003; Redelmeier and Tibshirani, 1997;
Strayer and Drews, 2004). In 2002, NHTSA and the Gallup Organization conducted a
national survey of distracted and drowsy driving attitudes and behaviors based on
telephone interviews with 4,010 drivers in the United States. The results suggested that
6 in 10 participating drivers had a mobile phone, and 3 in 10 of all drivers nationwide
used a mobile phone while driving on at least some of their trips. Moreover, talking
with passengers, tuning the radio, reading a map, eating, and dealing with children
riding in the back seat were identified as more common distractions (Royal, 2003).
Still, phoning while driving was among the first distractions to draw public attention
to distracted driving.
In addition to the NHTSA studies, two epidemiological studies in Canada compared two
cohorts: drivers who used mobile phones while driving and drivers who did not. One
study showed that using a mobile phone while driving could increase the risk of
collision fourfold (Redelmeier and Tibshirani, 1997). In the other study,
Laberge-Nadeau et al. (2003) concluded that mobile phone users faced a 38 percent
higher risk of injury collisions. Furthermore, among mobile phone users, the results
indicated a dose-response relationship between the frequency of cell phone use and
crash risk; in other words, frequent mobile phone users had higher relative risks than
infrequent users (Laberge-Nadeau et al., 2003). Laberge-Nadeau et al.'s research,
however, did not provide precise data on cell phone use while driving. A similar study
was conducted in the Greater Vancouver Regional District, but using factor analysis and
logistic regression instead of an epidemiological approach. Wilson et al.'s (2003)
research suggested that mobile phone users have a higher risk of rear-end collisions,
supporting the findings of Redelmeier and Tibshirani (1997) and Laberge-Nadeau et al.
(2003). Wilson et al. also found that mobile phone users tend to have more violations
for “speeding, impaired driving, seat belt nonuse, aggressive driving, and non-moving
violations” (Wilson et al., 2003). Thus, lifestyle, attitude, and personality factors should
be considered when assessing the direct risk attributable to mobile phone use (Wilson
et al., 2003).
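The risk comparisons above (a fourfold risk, a 38 percent higher risk) are relative risks: the crash rate among exposed drivers divided by the rate among unexposed drivers. The minimal sketch below shows only the arithmetic; the counts are hypothetical and are not taken from any of the cited studies, and the function name `relative_risk` is ours.

```python
def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Crash rate among exposed drivers divided by the rate among unexposed drivers."""
    exposed_rate = exposed_events / exposed_total
    unexposed_rate = unexposed_events / unexposed_total
    return exposed_rate / unexposed_rate


# Hypothetical cohorts (illustrative counts, not data from the cited studies):
# 138 injury collisions per 10,000 phone users vs. 100 per 10,000 non-users.
rr = relative_risk(138, 10_000, 100, 10_000)
print(f"relative risk = {rr:.2f}")                    # relative risk = 1.38
print(f"excess risk = {(rr - 1) * 100:.0f} percent")  # excess risk = 38 percent
```

A relative risk of 4 corresponds to the "fourfold" finding, and a dose-response relationship means this ratio grows as the exposure (frequency of phone use) grows.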
Besides the relationship between mobile phone use and car accidents, some recent
studies have compared the effects of hand-held and hands-free mobile phone interfaces
on driving performance and behavior. According to a driving simulator study
conducted at the University of Utah, drivers' reactions are 18 percent slower when they
are using a mobile phone. Indeed, drivers aged 18-25 who talk on a mobile phone while
driving perform similarly to drivers aged 65-74 who are not using a phone. Moreover,
the risk of rear-end collision doubles when drivers talk on a phone while driving
(Strayer and Drews, 2004).
Because of rapidly growing mobile phone subscription rates and periodic reports of car
accidents involving mobile phone use, the public has begun to recognize the distraction
problem caused by mobile phone use while driving. In November 1997, NHTSA
investigated the safety implications of wireless communications in vehicles and
concluded that conversations over mobile phones while driving increased the risk of a
crash (NHTSA, 1997). NHTSA did not limit its research to mobile phone use, however;
it broadened its research to all types of driver distraction. According to NHTSA's
estimates, 25-30 percent of police-reported crashes were caused by some form of
distraction. NHTSA further estimated that these distractions caused 1.2 million crashes
a year and 12,000 fatalities (Sundeen, 2002; Shelton, 2001).
At this point, researchers began to realize that distracted driving is a broader issue
than the wireless communication technologies that were first perceived as its main
source. In the NHTSA Driver Distraction Expert Working Group meetings held in 2000,
panelists concluded that many forms of distraction existed and that very little was
known about the magnitude and characteristics of the distraction problem (NHTSA,
2000). Hence, more and more researchers began to explore the different variables in
distracted driving.
A candid-camera study of distraction conducted by the University of North Carolina
at Chapel Hill and AAAFTS indicated that the use of wireless communication devices
ranked relatively low on the list of distractions: 30 percent of drivers were recorded
making phone calls while driving. The most frequent distracting activities were
reaching or leaning (97 percent of drivers), adjusting the radio (91 percent), eating or
drinking (77 percent), grooming (46 percent), and reading or writing (40 percent)
(Stutts et al., 2003). The study also concluded that many distractions were “neither new
nor technological in nature” (Stutts et al., 2003); rather, as the activities listed show,
they are things drivers do but are seldom aware of (Stutts et al., 2003). The drawbacks
of this research are the small sample of only 70 drivers and the relatively small number
of hours analyzed (Stutts et al., 2003).
In response to these surveys and studies, governments and legislators have begun to
take driver distraction seriously. Although the issue is still being debated throughout
the United States, laws and regulations have gradually been enacted to protect drivers,
passengers, and pedestrians from accidents caused by distracted driving. In November
2001, New York became the first state to ban the use of hand-held mobile phones while
driving. In July 2004, similar bans took effect in New Jersey and Washington, DC
(www.cellular-news.com/car_bans). One of the latest legislative efforts in Maryland
took place in March 2005: to bring the state in line with legislation in Virginia,
Washington, DC, and 23 other states, and to address distracted driving among teenage
drivers, the Maryland House of Delegates approved bills restricting teenage passengers
and wireless phone use for teenage drivers (Snyder and Williamson, 2005).
Table II. Major variables

Variable   Description and codes
MAN_COLL   Manner of collision (9 items)
            0  Not collision with motor vehicle in transport
            1  Rear and Head (rear-end, head-on, etc.)
            2  Angle (front-to-side, etc.)
            3  Side (sideswipe)
            4  Unknown
SEX        Driver’s gender
            1  Male
            2  Female
            3  Unknown
OCUPANTS   Occupants in the motor vehicle recorded
            1  One occupant
            2  Two occupants
            ...
VEH_MAN    Vehicle maneuver before crash
            1  Going straight
            2  Slowing or stopping in traffic lane
            3  Passing or overtaking another vehicle
            4  Parked
            5  Turning right
            6  Turning left
            7  U-Turn
            8  Changing lanes or merging
            9  Other
           10  Unknown
WEATHER    Weather condition at the time of crash
            1  No adverse atmospheric conditions
            2  Rain
            3  Sleet
            4  Snow
            5  Fog
            6  Rain and fog
            7  Sleet and fog
            8  Other
            9  Unknown
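Kohonen and neural network models operate on numeric input vectors, so coded categorical fields such as those in Table II are commonly expanded into binary indicator (one-hot) inputs, one per code. The sketch below illustrates that idea with the SEX and WEATHER codes from Table II; it is an assumption about one common encoding, not necessarily the encoding Clementine applied, and the helper names `one_hot` and `encode_record` are hypothetical.

```python
# Codebooks copied from Table II (only two of the listed fields are shown).
SEX = {1: "Male", 2: "Female", 3: "Unknown"}
WEATHER = {
    1: "No adverse atmospheric conditions", 2: "Rain", 3: "Sleet",
    4: "Snow", 5: "Fog", 6: "Rain and fog", 7: "Sleet and fog",
    8: "Other", 9: "Unknown",
}


def one_hot(code, codebook):
    """Return a 0/1 indicator vector with one position per code, in code order."""
    if code not in codebook:
        raise ValueError(f"unexpected code {code!r}")
    return [1 if c == code else 0 for c in sorted(codebook)]


def encode_record(record):
    """Concatenate the indicator vectors of the coded fields of one record."""
    return one_hot(record["SEX"], SEX) + one_hot(record["WEATHER"], WEATHER)


# Hypothetical record: female driver, crash in rain.
print(encode_record({"SEX": 2, "WEATHER": 2}))
# [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```

One-hot encoding avoids implying an order among codes (for example, that "Snow" = 4 is "greater" than "Rain" = 2), at the cost of one input per category.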
Teuvo Kohonen. Although the networks were originally applied to images and sounds,
they can also be used to detect clusters in data. A Kohonen network consists of
interconnected processing units, which are usually arranged in rectangular
(two-dimensional) or linear (one-dimensional) maps. Input patterns are presented as
vectors to this discrete one- or two-dimensional map, whose units are topologically
ordered and compete to respond to each input (Kohonen, 1995). Once the data have been
clustered, decision tree and neural network models are applied to gain a further
understanding of the patterns in the data.
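The competitive learning described above can be illustrated with a minimal one-dimensional (linear) Kohonen map. This is a generic sketch of the algorithm, not the configuration the authors used in Clementine: the map size, learning rate, Gaussian neighborhood, linear decay schedules, and the toy records are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))


def train_som(data, n_units=2, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Train a one-dimensional (linear) Kohonen map and return its weight vectors."""
    rng = random.Random(seed)
    # Initialise the units with evenly spaced records from the training data.
    weights = [list(data[i * len(data) // n_units]) for i in range(n_units)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # records in shuffled order
            frac = step / total_steps
            lr = lr0 * (1 - frac)                     # learning rate decays linearly
            radius = max(radius0 * (1 - frac), 1e-3)  # neighborhood shrinks
            winner = best_matching_unit(weights, x)   # competition step
            for j, w in enumerate(weights):
                # The winner and its map neighbors move toward the input;
                # h falls off with distance from the winner along the map.
                h = math.exp(-((j - winner) ** 2) / (2 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights


# Hypothetical two-feature records forming two well-separated groups.
data = [(0.1, 0.2), (0.15, 0.1), (0.2, 0.15), (0.9, 0.8), (0.85, 0.95), (0.8, 0.9)]
weights = train_som(data, n_units=2)
print([best_matching_unit(weights, x) for x in data])
```

After training, each record's cluster is the index of its best-matching unit; records in the same group share a unit. Shrinking the neighborhood over time lets units first organize the map as a whole and then specialize on their own clusters.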
A decision tree is a structure that can be used to divide a large collection of
records into successively smaller sets of records by applying a sequence of simple
if-then decision rules (Berry and Linoff, 2004). Decision trees combine data
exploration and modeling, so they are useful for gaining insight into the relationships
between a large number of input variables and a target variable. According to
Breiman et al. (1984), a popular approach to tree-structured modeling consists of two
separate phases: growing and pruning. In tree growing, the data are split into smaller
nodes, referred to as “child nodes” (Breiman et al., 1984). Tree growing is limited by
sample size, node homogeneity, or other stopping rules. After the tree has been grown,
algorithms are used to prune it back. Tree pruning is usually guided by cross-validation
within the training data set (Miller, 2005).
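The growing phase can be sketched as a recursive search for the split that most reduces node impurity, stopped by a minimum node size or a pure node. The sketch below assumes Gini impurity and one-value-versus-rest binary splits on categorical fields, in the spirit of CART; the field names, the `min_size` threshold, and the toy records are hypothetical, and the pruning phase is omitted.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, fields):
    """Return the (field, value) test whose yes/no split most lowers weighted Gini."""
    best, best_score = None, gini(labels)
    for field in fields:
        for value in sorted({row[field] for row in rows}):
            yes = [y for row, y in zip(rows, labels) if row[field] == value]
            no = [y for row, y in zip(rows, labels) if row[field] != value]
            if not yes or not no:
                continue
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
            if score < best_score:
                best, best_score = (field, value), score
    return best


def grow(rows, labels, fields, min_size=2):
    """Grow a tree recursively; internal nodes are dicts, leaves are class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping rules: node too small to split, or already homogeneous.
    if len(rows) < min_size or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels, fields)
    if split is None:  # no split makes the child nodes more homogeneous
        return majority
    field, value = split
    yes = [i for i, row in enumerate(rows) if row[field] == value]
    no = [i for i, row in enumerate(rows) if row[field] != value]
    return {
        "test": (field, value),
        "yes": grow([rows[i] for i in yes], [labels[i] for i in yes], fields, min_size),
        "no": grow([rows[i] for i in no], [labels[i] for i in no], fields, min_size),
    }


# Hypothetical toy records (not FARS data).
rows = [
    {"factor": "physical/mental", "age": "under 70"},
    {"factor": "physical/mental", "age": "over 70"},
    {"factor": "obscured vision", "age": "under 70"},
    {"factor": "obscured vision", "age": "over 70"},
]
labels = ["no moving vehicle", "no moving vehicle", "angle", "angle"]
print(grow(rows, labels, ["factor", "age"]))
# {'test': ('factor', 'obscured vision'), 'yes': 'angle', 'no': 'no moving vehicle'}
```

Here "factor" separates the two classes perfectly (weighted Gini 0), while "age" leaves both child nodes mixed (weighted Gini 0.5), so the tree splits on "factor" and both children are pure leaves.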
Like a real tree, a decision tree includes a root node, child nodes, leaf nodes, and
branches. The decision tree algorithm starts growing at the root node and stops
growing new nodes when no further splits are necessary. Based on the decision tree
rules, the root node splits into two or more child nodes, and each child node may split
further until leaf nodes are reached; each leaf node leads to a decision or a set of
possible answers. Branches connect one node to another. For each record to be
classified, the values in its fields determine which branches it takes through the tree.
The path ends at a leaf node that classifies the record.
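Classifying a record then means following branches from the root until a leaf is reached, and each root-to-leaf path can be read as one if-then rule. A minimal sketch, assuming a hypothetical nested-dictionary tree in which each internal node tests whether a field equals a value; the fields, values, and labels are illustrative only.

```python
def classify(tree, record):
    """Follow branches from the root until a leaf (a plain class label) is reached."""
    node = tree
    while isinstance(node, dict):
        field, value = node["test"]
        node = node["yes"] if record[field] == value else node["no"]
    return node


def to_rules(node, conditions=()):
    """Yield one IF ... THEN ... rule for every root-to-leaf path."""
    if not isinstance(node, dict):
        yield "IF " + (" AND ".join(conditions) or "TRUE") + " THEN " + node
        return
    field, value = node["test"]
    yield from to_rules(node["yes"], conditions + (f"{field} = {value}",))
    yield from to_rules(node["no"], conditions + (f"{field} != {value}",))


# Hypothetical two-level tree; fields, values, and labels are illustrative only.
tree = {
    "test": ("age", "over 70"),
    "yes": "angle collision",
    "no": {
        "test": ("sex", "female"),
        "yes": "angle collision",
        "no": "no collision with moving vehicle",
    },
}

print(classify(tree, {"age": "under 70", "sex": "male"}))
# no collision with moving vehicle
for rule in to_rules(tree):
    print(rule)
```

The rules printed by `to_rules` have the same IF/THEN shape as a text rendering of a tree's leaf nodes.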
Classification and regression trees (CART; C&RT in Clementine), C4.5 (C5.0 in
Clementine), and CHAID are three decision tree algorithms. CART builds a binary tree
by splitting the records at each node based on a single input field. The C4.5 algorithm
produces a tree with varying numbers of branches at each node. For categorical
variables, C4.5 creates one branch for each value of the variable, and it comes with a
companion program for turning trees into rules. C5.0 in Clementine is a commercial
version of C4.5. Unlike CART and C4.5, CHAID tries to stop growing the tree before
overfitting occurs.
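The difference between CART's binary splits and C4.5's one-branch-per-value splits can be seen on a single categorical field. The sketch below is illustrative only: it scores a multiway split with information gain divided by split information (the gain ratio associated with C4.5) and lists the two-group partitions a binary splitter could consider. CHAID's chi-square tests and each algorithm's pruning are not shown, and the weather values and labels are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(labels):
    """Shannon entropy (bits) of the class labels in a node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5-style score for a split with one branch per distinct value."""
    n = len(labels)
    branches = {}
    for v, y in zip(values, labels):
        branches.setdefault(v, []).append(y)
    remainder = sum(len(b) / n * entropy(b) for b in branches.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(b) / n * math.log2(len(b) / n) for b in branches.values())
    return gain / split_info if split_info > 0 else 0.0


def binary_groupings(values):
    """CART-style candidates: each way to divide the distinct values into two groups."""
    distinct = sorted(set(values))
    first, rest = distinct[0], distinct[1:]
    # Keeping the first value on the left avoids listing mirror-image splits twice.
    for k in range(len(rest)):
        for combo in combinations(rest, k):
            left = [first, *combo]
            right = [v for v in distinct if v not in left]
            yield left, right


# Hypothetical weather values and collision labels.
values = ["rain", "rain", "snow", "snow", "clear", "clear"]
labels = ["angle", "angle", "side", "side", "angle", "side"]

print(round(gain_ratio(values, labels), 3))  # 0.421
for left, right in binary_groupings(values):
    print(left, "|", right)
```

A field with k distinct values yields one k-way split for C4.5 but 2^(k-1) - 1 candidate binary splits for CART; dividing by split information penalizes multiway splits that gain information merely by creating many small branches.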
Artificial neural networks (ANN) are a data mining technique inspired by biological
models of the human nervous system. Neural networks have a proven track record and
are generally employed in data mining as tools for prediction, classification, and
clustering (Berry and Linoff, 2004). They mimic their biological counterpart in how the
units are interconnected and how they work. Neural networks can be constructed with
feed-forward or feedback architectures. Feed-forward networks allow signals to travel
one way only, from input to output, whereas feedback networks can have signals
traveling in both directions by introducing loops into the network. Feedback networks
are dynamic: their state changes continuously until they reach an equilibrium point.
Thus, feedback networks can become extremely complicated.
According to Berry and Linoff (2004), the most common neural network is a
feed-forward network. Feed-forward ANNs tend to be straightforward networks that
associate inputs with outputs. They are extensively used in pattern recognition.
Cluster 1  250  20-30, 30-40   22.40  7      21.60  6      97.60  2      36.80  1      99.20  1     100.00  1      86.80
Cluster 2  320  20-30          28.75  7      21.25  6      92.81  0     100.00  1     100.00  1      99.69  1      93.44
Cluster 3   94  Over 70        21.28  2      24.47  6      95.74  4      94.68  1      98.94  1      95.74  1      87.23
Cluster 4    2  20-30         100.00  5     100.00  6     100.00  3, 8   50.00  1     100.00  1     100.00  1     100.00
Cluster 5   61  20-30          39.34  7      27.87  6      91.80  0     100.00  3      59.20  1     100.00  1      85.25
Cluster 6   65  Over 70        36.92  2      20.00  6      98.46  4      78.46  1      93.85  2     100.00  1      95.38
Cluster 7    9  30-40, 40-50   33.33  1, 3   22.22  6     100.00  4      88.89  3      66.67  1     100.00  1      88.89
Cluster 8   83  20-30          39.76  6      24.10  6      92.77  0      56.63  2      55.42  1     100.00  1      92.77
Cluster 9  187  20-30          21.93  6      19.25  6      96.79  0      39.57  1      58.29  2     100.00  1      91.44
Cluster 10  49  20-30          40.82  1      24.49  6      97.96  0      42.86  2     100.00  2     100.00  1      77.55
Cluster 11 137  30-40          29.20  1      27.10  6      96.35  0      29.93  2     100.00  1     100.00  1      85.40
result
Table III.
Figure 1.
Kohonen network
feedback report
Figure 2.
DT model result
Table IV. DT model result in text form

No.  Leaf node   Rule
1    C, K, L, M  IF   his/her driver-related factor is a physical/mental condition
                 THEN this driver is more likely to be involved in a crash that does not collide with moving vehicles
2    H, I        IF   a female driver is under 70, and her driver-related factors are improper/illegal driving behavior and obscured vision
                 THEN this driver is more likely to be involved in an angle collision
3    J           IF   a female driver is under 70, and her driver-related factors are improper/illegal driving behavior, obscured vision, and a physical/mental condition
                 THEN this driver is more likely to be involved in a crash that does not collide with moving vehicles
4    A, B        IF   a driver is over 70, and his/her driver-related factors are improper/illegal driving behavior and obscured vision
                 THEN this driver is more likely to be involved in an angle collision
5    D           IF   a male driver is between 30 and 40 or between 50 and 70 years old, his driver-related factors are improper/illegal driving behavior and obscured vision, and there are 2 to 4 occupants (including the driver)
                 THEN this driver is more likely to be involved in an angle collision
6    E           IF   a male driver is between 40 and 50 or between 20 and 30 years old, his driver-related factors are improper/illegal driving behavior and obscured vision, and there are 1 to 3 passengers
                 THEN this driver is more likely to be involved in a crash that does not collide with moving vehicles
7    F, G        IF   a male driver is under 70 years old, his driver-related factors are improper/illegal driving behavior and obscured vision, and he drives alone or with more than 4 passengers
                 THEN this driver is more likely to be involved either in a side collision or in a crash that does not collide with moving vehicles
Figure 3.
NN model trained result
Figure 4 shows the result of applying the second subset of the data to test the NN
model. Among the output fields, the NN model has a very high rate of correct
predictions for manner of collision recorded as unknown (98.99 percent) and side
collision (97.29 percent). This is because very few records in the data fall into these
two categories, so most records are correctly predicted as not belonging to them. The
prediction accuracy is 78.31 percent for rear and head collisions and 77.35 percent for
angle collisions. Furthermore, we had Clementine create coincidence matrices (Tables
V-VIII) for all output fields except the output for not colliding with a moving vehicle.
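The effect of rare categories on accuracy can be made concrete. When each manner of collision is treated as a separate yes/no output, accuracy also counts correct "no" predictions, so a category with very few records can score highly even for a model that never predicts it. The sketch below builds a two-by-two coincidence matrix for one such flag; the 100-record test set and the always-"no" predictions are hypothetical and are not the paper's results, and recall is shown as one complementary measure.

```python
from collections import Counter


def coincidence_matrix(actual, predicted):
    """Count (actual, predicted) pairs, i.e. the cells of a coincidence matrix."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Share of records whose prediction matches the actual value."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def recall(actual, predicted, positive=True):
    """Share of actually positive records that the model also predicts as positive."""
    hits = sum(a == p == positive for a, p in zip(actual, predicted))
    total = sum(a == positive for a in actual)
    return hits / total if total else float("nan")


# Hypothetical test set: 1 side collision among 100 records, and a model
# that never predicts "side collision".
actual = [True] + [False] * 99
predicted = [False] * 100

print(coincidence_matrix(actual, predicted))
print(accuracy(actual, predicted))  # 0.99
print(recall(actual, predicted))    # 0.0
```

The 99 percent accuracy here comes entirely from the (False, False) cell of the matrix, which is why per-category coincidence matrices are more informative than a single accuracy figure for rare categories.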
Figure 4.
NN model test result
5. Summary
This research focuses on discovering relationships between motor vehicle accidents
and driver distractions. We used the data mining tool Clementine to explore the data
derived from FARS. Three data mining models are used in this research. The Kohonen
network model is used to reveal the patterns of the input variables. The decision tree
model is then applied to classify the output variable, manner of collision, using the
input variables: driver-related factors and the driver's basic demographic data. The
results of the DT model suggest a relationship between crashes and inattentive drivers
with physical/mental conditions. Furthermore, a neural network model is trained and
tested to evaluate how effective it is. Among the five output variables, the prediction
accuracies for “rear and head collisions” and “angle collisions” are 78 and 77 percent,
respectively.
One of the major limitations of the research is the difficulty of correctly recording
the mental state of drivers. In fact, defining distraction properly can be very
controversial. Finally, the risk of car crashes due to driver distraction is rising as
wireless communication technologies and in-vehicle communication devices become
increasingly pervasive.
Figure A1.