The Emerald Research Register for this journal is available at www.emeraldinsight.com/researchregister
The current issue and full text archive of this journal is available at www.emeraldinsight.com/0263-5577.htm

IMDS 105,9

Distractions and motor vehicle accidents
Data mining application on fatality analysis reporting system (FARS) data files

Wen-Shuan Tseng, Hang Nguyen, Jay Liebowitz and William Agresti
Graduate Division of Business and Management, Department of Information Technology, Johns Hopkins University, Rockville, Maryland, USA
Abstract
Purpose – This research applies data mining techniques to discover the relationship between driver
inattention and motor vehicle accidents.
Design/methodology/approach – The data used in this research are obtained from the Fatality Analysis Reporting System of the National Highway Traffic Safety Administration, focusing on the Maryland and Washington, DC area from 2000 to 2003. The data are first clustered using Kohonen networks. Then, the patterns and rules in the data are explored with decision tree and neural network models.
Findings – Results suggest that when inattention and a physical/mental condition occur at the same time, the driver is more likely to be involved in a crash that collides with static objects. Furthermore, with regard to the manner of collision, the relative importance of colliding with a moving vehicle as the first harmful event in a crash is about twice that of colliding with a fixed object as the first harmful event.
Research limitations/implications – The data used in this research are limited to fatal crashes that happened in Maryland and Washington, DC from 2000 to 2003.
Originality/value – This is one of the first research papers utilizing data mining techniques to
explore the possible relationships between driver inattention and motor vehicle crashes.
Keywords Data collection, Road vehicles, Accidents
Paper type Research paper
1. Introduction
Driver distraction, which has emerged as a significant highway safety issue, is defined by the AAA Foundation for Traffic Safety as a class of inattentive behaviors and mental states of drivers (Stutts et al., 2001). In 1996, a study published by the National Highway Traffic Safety Administration (NHTSA) found that driver distraction contributed to approximately 25-30 percent of the injuries caused by car crashes (Utter, 2001). According to the Fatality Analysis Reporting System (FARS), 11 percent of fatal crashes in 1999, corresponding to 4,462 fatalities, were due to driver inattention (Utter, 2001). The use of cellular phones while driving has received the most attention. In 2003, the Harvard Center for Risk Analysis (HCRA) estimated that cell phone use while driving alone caused 330,000 moderate to severe injuries and approximately 2,600 deaths each year (Sundeen, 2003). In response, most states have considered legislation on drivers' use of cell phones while driving.

Industrial Management & Data Systems, Vol. 105 No. 9, 2005, pp. 1188-1205, © Emerald Group Publishing Limited, 0263-5577, DOI 10.1108/02635570510633257

The authors gratefully acknowledge the grant and support from the GEICO Educational Foundation, David Schindler, and Karen Watson for the GEICO Scholarship in Discovery Informatics.
This research program, funded by the GEICO Educational Foundation and coordinated with the Department of Information Technology at Johns Hopkins University, applies data mining techniques to explore patterns linking distraction factors and traffic crashes. In particular, we use data mining as the main tool to understand and recognize correlations in the crash information provided by FARS. FARS, initiated in 1975, was developed by the National Center for Statistics and Analysis (NCSA) with the aim of building a safer traffic community within the 50 states, the District of Columbia, and Puerto Rico. FARS contains data, drawn mainly from police crash reports, on traffic crashes that resulted in a death within 30 days of the crash. Traffic safety problems classified by FARS are used to evaluate both motor vehicle safety standards and highway safety initiatives. In this research, the FARS crash data files were analyzed using data mining.
This research applies three data mining techniques, namely Kohonen networks, decision trees, and neural networks, to find combinations of distraction factors that help to explain the high accident rates. Cluster detection is performed with Kohonen networks, in which topologically ordered units compete to respond to each input signal (Kohonen, 1995). Decision trees allow data to be explored and classified through rules derived mathematically from the effect of each event on successive events (Marakas, 2003). In addition, neural networks are used to examine potential correlations between distraction and car accidents in the FARS data files. Neural networks are designed on the basis of biological models of how the human brain works (Berry and Linoff, 2004). Literally, a neural network is a network of connected neurons that imitate their biological counterparts and produce a useful output by combining various inputs (Berry and Linoff, 2004). The findings of this research will be applied in future studies.
2. Literature review
According to NHTSA, driver distraction is a form of inattention in which drivers shift their attention away from the task at hand. Driver distractions can be classified into two types (internal distraction and external stimuli) and four categories: visual (e.g. reading a map), cognitive (e.g. being lost in thought), auditory (e.g. responding to a ringing cell phone) and biomechanical (e.g. manually adjusting the radio volume) (Ranney et al., 2000). Although a driver can be engaged in more than one activity at the same time, a distraction occurs when a driver “is delayed in the recognition of information needed to safely accomplish the driving task because some event, activity, object, or person within or outside the vehicle compels or induces the driver, shifting attention away from the driving task”, as defined by the AAA Foundation for Traffic Safety (AAAFTS) (Stutts et al., 2001).
Driver distraction is not a new cause of car accidents. In fact, driver inattention is one of the most prevalent causes of traffic crashes (Wang et al., 1996). However, driver distraction was so little recognized by the public that the government and legislators did not pay much attention to it until the 1990s, when mobile phone use while driving drew increasing attention to distracted driving (“Status Report,” 2002).
Mobile phone usage while driving is a typical type of distraction in accidents. Many studies have concluded that, on average, drivers talking on mobile phones while driving have a higher risk of car accidents than drivers who do not use mobile phones (Laberge-Nadeau et al., 2003; Wilson et al., 2003; Redelmeier and Tibshirani, 1997; Strayer and Drews, 2004). In 2002, NHTSA and the Gallup Organization conducted a national survey of distracted and drowsy driving attitudes and behaviors, based on telephone interviews with 4,010 drivers in the United States. The results suggested that 6 in 10 participating drivers had a mobile phone, and 3 in 10 of all drivers nationwide used mobile phones while driving on at least some of their trips. Moreover, talking with passengers, tuning the radio, reading a map, eating, and dealing with children riding in the back seat were identified as more common distractions than phoning (Royal, 2003). Still, phoning while driving was among the first causes to draw public attention to distracted driving.
In addition to the NHTSA studies, two epidemiological studies in Canada compared two cohorts: drivers who use mobile phones while driving and drivers who do not. One study showed that using a mobile phone while driving could quadruple the risk of a collision (Redelmeier and Tibshirani, 1997). In the other study, Laberge-Nadeau et al. (2003) concluded that mobile phone users faced a 38 percent higher risk of injury collisions when talking on a mobile phone while driving. Furthermore, among mobile phone users, the results indicated a dose-response relationship between the frequency of cell phone use and crash risk. In other words, frequent mobile phone users had higher relative risks than rare users (Laberge-Nadeau et al., 2003). Laberge-Nadeau et al.'s research, however, did not provide precise data on cell phone usage while driving. A similar study was conducted in the Greater Vancouver Regional District, but it used factor analysis and logistic regression instead of an epidemiological approach. Wilson et al.'s (2003) research suggested that mobile phone users have higher risks of rear-end collisions, supporting the findings of Redelmeier and Tibshirani (1997) and Laberge-Nadeau et al. (2003). Wilson et al. also found that mobile phone users tend to have more violations for “speeding, impaired driving, seat belt nonuse, aggressive driving, and non-moving violations” (Wilson et al., 2003). Thus, lifestyle, attitude, and personality factors should be considered when assessing the direct risk attributable to mobile phone use (Wilson et al., 2003).
Besides the relationship between mobile phone use and car accidents, some recent studies have compared the impact of hand-held and hands-free mobile phone interfaces on driving performance and behavior. According to a driving-simulator study conducted at the University of Utah, drivers' reactions are 18 percent slower when they use a mobile phone. In other words, drivers aged 18-25 talking on a mobile phone while driving perform similarly to drivers aged 65-74 driving without talking on a mobile phone. Moreover, the risk of a rear-end collision doubles when drivers talk on a phone while driving (Strayer and Drews, 2004).
Because of the fast-growing mobile phone subscription rate and the car accidents involving mobile phone use that are reported from time to time, the public has started to recognize the distraction problem caused by mobile phone use while driving. In November 1997, NHTSA investigated the safety implications of wireless communications in vehicles and concluded that conversations on mobile phones while driving increased the risk of a crash (NHTSA, 1997). NHTSA did not limit its research to mobile phone use while driving; it broadened the research to all types of driver distraction factors. According to NHTSA's estimates, 25-30 percent of police-reported crashes were caused by some form of distraction. NHTSA further estimated that these distractions caused 1.2 million crashes and 12,000 fatalities a year (Sundeen, 2002; Shelton, 2001).
At this point, researchers began to realize that distracted driving is a broader issue than the wireless communication technologies that were first perceived as the main distraction. At the NHTSA Driver Distraction Expert Working Group meetings held in 2000, panelists concluded that many forms of distraction existed, and that very little was known about the magnitude and characteristics of the distraction problem (NHTSA, 2000). Hence, more and more researchers began to explore the different variables in distracted driving.
A candid-camera survey on distraction conducted by the University of North Carolina at Chapel Hill and AAAFTS indicated that the use of wireless communication devices ranked relatively low on the list of distractions: 30 percent of drivers were captured on video making phone calls while driving. The most frequent distracting activities were reaching or leaning (97 percent of drivers), adjusting the radio (91 percent), eating or drinking (77 percent), grooming (46 percent), and reading or writing (40 percent) (Stutts et al., 2003). This research also concluded that many distractions were “neither new nor technological in nature” (Stutts et al., 2003). Instead, as the activities listed above show, they are things drivers do but are seldom aware of (Stutts et al., 2003). The drawbacks of this research are the small sample of only 70 drivers and the relatively small number of hours analyzed (Stutts et al., 2003).
In response to these surveys and studies, governments and legislators have started to take driver distraction seriously. Although the legislative issue is still being debated throughout the United States, laws and regulations have gradually been established in an attempt to protect drivers, passengers, and pedestrians from accidents caused by distracted driving. In November 2001, New York became the first state to ban hand-held mobile phone use while driving. In July 2004, the ban became effective in New Jersey and Washington, DC (www.cellular-news.com/car_bans). One of the latest legislative efforts in Maryland took place in March 2005: to bring Maryland in line with the legislation in Virginia, Washington, DC, and 23 other states, and to address distracted driving among teenage drivers, the Maryland House of Delegates approved bills restricting teenage passengers and wireless phone use for teenage drivers (Snyder and Williamson, 2005).
2.1 Database review

The NHTSA data repository (URL: ftp://ftp.nhtsa.dot.gov/) is a comprehensive database that aggregates vehicle crash data from a variety of sources such as the Fatality Analysis Reporting System (FARS), the National Automotive Sampling System (NASS) Crashworthiness Data System (CDS), and the NASS General Estimates System (GES). NHTSA data also provide an effective means to identify emerging safety problems, monitor trends, and evaluate the effectiveness of various countermeasures.
In the quest for valid datasets for our research, we focused on those that identify causes of accidents, especially causes related to driver distraction or inattention. Data quality was another important criterion when selecting the database for our research. Missing and inconsistent data may distort distributions and depress correlations in methods such as k-nearest neighbor and neural networks, which complicates the interpretation of data mining results (Brown and Kros, 2003). After reviewing the national databases in Table I, we adopted FARS because it is the one database that fits our requirements for both data content and quality.
FARS is a collection of fatal crashes drawn from police accident reports in the United States since 1975. Qualifying crashes recorded in the FARS database are those that involved a motor vehicle, took place on an open trafficway, and resulted in the death of a person within 30 days of the crash. The FARS database records crashes from three perspectives, in separate accident, vehicle, and person sections. Since 1991, the FARS database has recorded inattentive factors inside and outside the vehicle under the driver related factor variables (DR_CF1, DR_CF2, DR_CF3, and DR_CF4) in the vehicle section.
The scope of this research is the crashes in the FARS database that took place in Maryland and Washington, DC from 2000 to 2003. We apply data mining techniques to these datasets in an attempt to find correlations between traffic crashes and inattention, as well as other driver related factors. First, we merge the person file and the vehicle file by the variable VEH_NO (vehicle number, a serial number for each vehicle involved in a recorded accident). Next, we merge the accident file with the merged person-vehicle file by the variable ST_CASE (state case number, a serial number for each accident recorded in the FARS database). At this point, we have 134 variables and 6,344 crash records that meet the geographic and time constraints of Maryland and Washington, DC from 2000 to 2003.
In the FARS database, driver distractions such as eating and drinking are coded under a single driver related factor: inattention. Among the records, we identified 1,255 records of drivers involved in a fatal crash with “Inattention” listed as one of the driver related factors that led to the crash. In other words, all 1,255 drivers had been distracted or inattentive when the fatal crashes occurred. We identified 14 variables for data mining; Table II lists them.
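The merge and selection steps above can be sketched in pandas. This is a minimal illustration, not the authors' Clementine stream: the tiny inline frames are hypothetical stand-ins for the FARS accident, vehicle, and person files, and the STATE and YEAR columns, their codes (24 for Maryland, 11 for DC), and the use of Table II's derived code 4 for "Inattention" are assumptions. Because VEH_NO numbers vehicles within each crash, the sketch joins person to vehicle on ST_CASE and VEH_NO together rather than on VEH_NO alone.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the FARS accident, vehicle and person files.
accident = pd.DataFrame({
    "ST_CASE": [240001, 240002, 110001],
    "STATE": [24, 24, 11],       # assumed state codes: 24 = Maryland, 11 = DC
    "YEAR": [2001, 2002, 2003],
    "WEATHER": [1, 2, 1],
})
vehicle = pd.DataFrame({
    "ST_CASE": [240001, 240001, 240002, 110001],
    "VEH_NO": [1, 2, 1, 1],
    # Derived driver related factor codes from Table II (4 = Inattention).
    "DR_CF1": [4, 0, 2, 1],
    "DR_CF2": [1, 0, 4, 0],
    "DR_CF3": [0, 0, 0, 0],
    "DR_CF4": [0, 0, 0, 0],
})
person = pd.DataFrame({
    "ST_CASE": [240001, 240001, 240002, 110001],
    "VEH_NO": [1, 2, 1, 1],
    "AGE": [34, 71, 22, 45],
    "SEX": [1, 2, 1, 2],
})

INATTENTION = 4
DR_CF_COLS = ["DR_CF1", "DR_CF2", "DR_CF3", "DR_CF4"]

# Step 1: attach vehicle attributes to each person (VEH_NO is unique only within a case).
person_vehicle = person.merge(vehicle, on=["ST_CASE", "VEH_NO"], how="inner")

# Step 2: attach crash-level attributes by the state case number.
merged = person_vehicle.merge(accident, on="ST_CASE", how="inner")

# Keep Maryland and DC crashes from 2000 to 2003 (between() is inclusive).
in_scope = merged[merged["STATE"].isin([24, 11]) & merged["YEAR"].between(2000, 2003)]

# Keep drivers with "Inattention" in any of the four driver related factor slots.
inattentive = in_scope[in_scope[DR_CF_COLS].eq(INATTENTION).any(axis=1)]
print(len(in_scope), len(inattentive))  # prints: 4 2
```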
3. Some data mining techniques

Data mining (DM) is an analytic methodology that uses statistical algorithms to discover meaningful correlations in large collections of data. DM is also one of the information technology enablers for implementing knowledge management (Wong, 2005). As DeVries et al. (2004) point out, relationships among variables are often hard to determine. Therefore, the principles of DM are applied to describe relationships among various attributes of data and to predict future activities or trends by using data about the past. Analysis of data comes in two flavors, directed and undirected (Berry and Linoff, 2004). Directed data mining focuses on specific target fields, while undirected data mining looks for patterns and similarities among sets of data without a predefined hypothesis.
Table I. List of the national databases
Database | Description | URL
State data program (SDP) | The State Data Program provides details of crashes that contribute to national databases of traffic safety such as FARS and NASS GES | www-nrd.nhtsa.dot.gov/departments/nrd-30/ncsa/SDP.html
Crash outcome data evaluation system (CODES) | CODES reports the data at the crash scene and associates the outcome with vehicle registration, driver licensing, citation and roadway inventory data in order to achieve an inclusive data collection for highway safety assessment | www-nrd.nhtsa.dot.gov/departments/nrd-30/ncsa/CODES.html
National automotive sampling systems (NASS) | NASS collects data mainly from the CDS (Crashworthiness Data System) and GES (General Estimates System) data systems on cases selected from a sample of police crash reports. CDS data investigate the nature and severity of injuries and crash damage to propose potential improvements in vehicle design. GES, on the other hand, assesses national police reports on crashes of all types | www-nrd.nhtsa.dot.gov/departments/nrd-30/ncsa/NASS.html; CDS: www-nrd.nhtsa.dot.gov/departments/nrd-30/ncsa/TextVer/CDS.html; GES: ftp://ftp.nhtsa.dot.gov/ges/
Special crash investigations (SCI) | SCI data files are compiled by professional crash investigation teams. SCI provides data ranging from the basic data maintained in routine police and insurance crash reports to comprehensive data | www-nrd.nhtsa.dot.gov/departments/nrd-30/ncsa/sci.html
National Center for Statistics and Analysis (NCSA) | NCSA provides comprehensive statistical data to analyze the nature, causes, and injury outcomes of crashes and proposes strategies to reduce crashes and their consequences | www-nrd.nhtsa.dot.gov/departments/nrd-30/ncsa/
Fatality analysis reporting system (FARS) | FARS compiles data in three principal files, namely the accident, vehicle, and person files for each crash, in the form of fact sheets and reports from within the States | www-fars.nhtsa.dot.gov/main.cfm
Bureau of transportation statistics (BTS) | BTS distributes valid and reliable data covering transportation in every area of interest to support significant transportation policy decisions | www.bts.gov/about/
Nationwide personal transportation survey (NPTS) | NPTS analyzes data from a nationally representative sample of households to derive statistically reliable travel estimates at the national level | http://nhts.ornl.gov/2001/html_files/download_directory.shtml
Source: NHTSA (2003)

Table II. Major variables
Variable | Description | Derived category (examples used)
AGE | Driver's age in 10-year intervals | 10 to 19; 20 to 29; 30 to 39; 40 to 49; 50 to 59; 60 to 69; 70 and over
AVOID | Crash avoidance maneuver | 0 No avoidance maneuver reported; 1 Braking; 2 Steering; 3 Steering and braking; 4 Other avoidance maneuver; 5 Not reported
Day_week | Day of the week the crash happened | 1 Monday; 2 Tuesday; 3 Wednesday; 4 Thursday; 5 Friday; 6 Saturday; 7 Sunday
DR_CF1, DR_CF2, DR_CF3, DR_CF4 | Driver related factors in the crash (99 factors) | 0 None; 1 Physical/Mental condition; 2 Improper/Illegal driving behavior; 3 Vision obscured; 4 Inattention; 5 Others
HARM_EV | The first harmful event that applies to the crash (50 events) | 1 Motor vehicle in transport; 2 Fixed object (tree, utility pole, etc.); 3 Moving object (animal, pedestrian, etc.); 4 Vehicle (overturn, fire/explosion, etc.); 5 Occupants (fell from vehicle, etc.); 6 Others
M_HARM | The most harmful event that applies to the vehicle (same 50 events as above) | Same categories as HARM_EV
MAN_COLL | Manner of collision (9 items) | 0 Not collision with motor vehicle in transport; 1 Rear and Head (rear-end, head-on, etc.); 2 Angle (front-to-side, etc.); 3 Side (sideswipe); 4 Unknown
SEX | Driver's gender | 1 Male; 2 Female; 3 Unknown
OCUPANTS | Number of occupants recorded in the motor vehicle | 1 One occupant; 2 Two occupants; ...
VEH_MAN | Vehicle maneuver before the crash | 1 Going straight; 2 Slowing or stopping in traffic lane; 3 Passing or overtaking another vehicle; 4 Parked; 5 Turning right; 6 Turning left; 7 U-Turn; 8 Changing lanes or merging; 9 Other; 10 Unknown
WEATHER | Weather condition at the time of the crash | 1 No adverse atmospheric conditions; 2 Rain; 3 Sleet; 4 Snow; 5 Fog; 6 Rain and fog; 7 Sleet and fog; 8 Other; 9 Unknown

Kohonen networks, decision trees, and neural networks are examples of data mining techniques. Kohonen networks were developed by the neurocomputing researcher Teuvo Kohonen. Although the networks were originally applied to images and sounds, they can also be used to recognize clusters in data. A Kohonen network consists of interconnected processing units, which are usually arranged in rectangular or linear maps. In particular, data patterns are presented as input vectors to a discrete map of one or two dimensions, whose units are topologically ordered and compete for each input signal (Kohonen, 1995). After the data have been clustered, decision tree and neural network models are put into play to further understand the patterns in the data.
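The competitive, topology-preserving training described above can be sketched as a small self-organizing map in NumPy. This is an illustration, not Clementine's Kohonen node: the 3 by 4 grid, the linearly decaying learning rate, the Gaussian neighbourhood, and the synthetic two-dimensional data are all assumptions, and categorical FARS fields would first need a numeric encoding (for example, 0/1 indicator columns).

```python
import numpy as np

def train_som(data, rows=3, cols=4, epochs=30, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small 2-D Kohonen self-organizing map on the rows of `data`."""
    rng = np.random.default_rng(seed)
    # One weight vector per map unit, initialised from randomly chosen records.
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    # Fixed grid position of each unit; distances on this grid define the topology.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total
            lr = lr0 * (1.0 - frac)              # learning rate decays toward 0
            sigma = sigma0 * (1.0 - frac) + 0.1  # neighbourhood shrinks over time
            # Competition: the unit whose weights are closest to x wins.
            winner = np.argmin(((weights - x) ** 2).sum(axis=1))
            # Cooperation: the winner's grid neighbours are also pulled toward x.
            grid_d2 = ((grid - grid[winner]) ** 2).sum(axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def assign_clusters(data, weights):
    """Label each record with the index of its best-matching map unit."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Usage on two synthetic, well-separated groups of records.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
weights = train_som(data)
labels = assign_clusters(data, weights)
```

Each map unit acts as a candidate cluster center; records from the two synthetic groups end up on different units, which is the behavior the clustering step relies on.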
A decision tree is a structure that can be used to divide a large collection of records into successively smaller sets of records by applying a sequence of simple if-then decision rules (Berry and Linoff, 2004). Decision trees combine data exploration and modeling, so they are useful for exploring data to gain insight into the relationships between a large number of input variables and a target variable. According to Breiman et al. (1984), a popular approach to tree-structured modeling consists of two separate phases: growing and pruning. In tree growing, data are split into smaller nodes, referred to as “child nodes” (Breiman et al., 1984). Because the goal of tree growing is a model that generalizes well rather than one that merely fits the training records, growth is limited by sample size, node homogeneity, or stopping rules. After the tree has been grown, algorithms are used to prune it back. Tree pruning is usually guided by cross-validation within the training data set (Miller, 2005).
Like a tree, a decision tree includes a root node, child nodes, leaf nodes, and branches. The decision tree algorithm starts growing at the root node and stops growing new nodes when no further splits are necessary. Based on the decision tree rules, the root node splits into two or more child nodes, and each child node may split further until leaf nodes are reached; each leaf node leads to a decision or a set of possible answers. Branches connect one node to another. For each record to be classified, the values in its fields determine which branches it takes in the tree. The path ends at a leaf node that classifies the record.
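To show how a record's field values select its path, here is a hypothetical hand-built tree (not the C&RT model reported for this study) together with a traversal function. The dictionary layout, including a "default" child for values without an explicit branch, is an assumption of this sketch.

```python
def classify(tree, record):
    """Follow the branch matching the record's value at each node until a leaf."""
    node, path = tree, []
    while "leaf" not in node:
        field = node["field"]
        value = record[field]
        path.append((field, value))
        # Values with no explicit branch fall through to the node's default child.
        node = node["branches"].get(value, node["default"])
    return node["leaf"], path

# Hypothetical tree: the root splits on DR_CF, one child splits again on SEX.
tree = {
    "field": "DR_CF",
    "branches": {
        "physical/mental": {"leaf": "no collision with moving vehicle"},
        "vision obscured": {
            "field": "SEX",
            "branches": {"F": {"leaf": "angle"}},
            "default": {"leaf": "side"},
        },
    },
    "default": {"leaf": "rear/head"},
}

print(classify(tree, {"DR_CF": "vision obscured", "SEX": "F"}))
# prints: ('angle', [('DR_CF', 'vision obscured'), ('SEX', 'F')])
```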
Classification and regression trees (CART) (in Clementine, C&RT), C4.5 (in Clementine, C5.0), and CHAID are three decision tree algorithms. CART builds a binary tree by splitting the records at each node based on a single input field. The C4.5 algorithm produces a tree with varying numbers of branches at each node. For categorical variables, C4.5 creates one branch for each value of the variable, and it comes with a companion program for turning trees into rules. C5.0 in Clementine is a commercial version of C4.5. Unlike CART and C4.5, CHAID tries to stop growing the tree before overfitting occurs.
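To make the CART versus C4.5 contrast concrete, the sketch below scores one categorical field in two ways: a CART-style binary split that searches every two-group partition of the field's values using Gini impurity, and a C4.5-style multiway split (one branch per value) scored by gain ratio. Both criteria are the standard textbook ones; the toy data are hypothetical, and neither function reproduces Clementine's C&RT or C5.0 nodes.

```python
from collections import Counter
from itertools import combinations
from math import log2

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def cart_binary_split(values, labels):
    """CART-style: best two-way grouping of one categorical field's values by Gini."""
    cats = sorted(set(values))
    n, best = len(labels), None
    # Keeping cats[0] on the left side enumerates each partition exactly once.
    rest = cats[1:]
    for k in range(len(rest) + 1):
        for combo in combinations(rest, k):
            left_set = {cats[0], *combo}
            if len(left_set) == len(cats):
                continue  # the right side would be empty
            left = [y for x, y in zip(values, labels) if x in left_set]
            right = [y for x, y in zip(values, labels) if x not in left_set]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, frozenset(left_set))
    return best

def c45_multiway_gain_ratio(values, labels):
    """C4.5-style: one branch per value, scored by information gain / split information."""
    n = len(labels)
    groups = {}
    for x, y in zip(values, labels):
        groups.setdefault(x, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0

values = ["A", "A", "B", "B", "C", "C"]
labels = ["x", "x", "y", "y", "y", "y"]
print(cart_binary_split(values, labels))        # best binary grouping: {A} vs {B, C}
print(c45_multiway_gain_ratio(values, labels))  # score of the three-way split
```

The binary search grows as 2^(k-1) - 1 partitions for a field with k categories, whereas the multiway split always uses exactly k branches; gain ratio divides by split information to offset C4.5's bias toward fields with many values.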
Artificial neural networks (ANN) are a data mining technique inspired by biological models of the human nervous system. Neural networks have a proven track record and are widely employed in data mining as tools for prediction, classification, and clustering (Berry and Linoff, 2004). Neural networks are modeled on their biological counterpart both in how their units are interconnected and in how they work. Neural networks can be constructed with feed-forward or feedback architectures. Feed-forward networks allow signals to travel one way only, from input to output, whereas feedback networks can have signals traveling in both directions by introducing loops in the network. Feedback networks are dynamic: their state changes continuously until they reach an equilibrium point. Thus, feedback networks can become extremely complicated.
According to Berry and Linoff (2004), the most common neural network is a feed-forward network. Feed-forward ANNs tend to be straightforward networks that associate inputs with outputs. They are extensively used in pattern recognition.
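A single feed-forward pass can be sketched with NumPy to show the one-way flow from inputs through a hidden layer to outputs. The layer sizes are borrowed from the network described in Section 4.3; the sigmoid activation, the random untrained weights, and the indicator-coded input records are assumptions of this sketch, not details of the Clementine model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Feed-forward pass: input -> hidden -> output, with no feedback loops."""
    hidden = sigmoid(x @ w_hidden + b_hidden)  # each hidden unit combines all inputs
    return sigmoid(hidden @ w_out + b_out)     # each output unit combines all hidden units

# Layer sizes borrowed from Section 4.3; the weights are random, i.e. untrained.
n_in, n_hidden, n_out = 29, 3, 5
rng = np.random.default_rng(0)
w_hidden, b_hidden = rng.normal(0.0, 0.1, (n_in, n_hidden)), np.zeros(n_hidden)
w_out, b_out = rng.normal(0.0, 0.1, (n_hidden, n_out)), np.zeros(n_out)

# Two hypothetical records encoded as 0/1 indicator vectors.
x = np.zeros((2, n_in))
x[0, [0, 7, 15]] = 1.0
x[1, [3, 12, 20]] = 1.0

scores = forward(x, w_hidden, b_hidden, w_out, b_out)
predicted = scores.argmax(axis=1)  # index of the most activated output unit
```

Training would adjust the weights (for example, by backpropagation) so that the most activated output unit matches each record's class; that step is not shown.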
4. Application of data mining to distractions and motor vehicle accidents

For our research, we used SPSS Clementine, one of the most widely used DM software packages according to KDnuggets.com's poll in May 2004 (www.kdnuggets.com/polls/2004/data_mining_software.htm), to mine the data derived from the FARS database with three models. First, we built a Kohonen network model to cluster the data into groups with similar features, as a first attempt to better understand the data. Next, we used decision trees (DT), specifically C&RT, to further classify relationships among the variables we identified. With the DT, we classify the dependent variable manner of collision (MAN_COLL) using if-then rules on the independent variables (SEX, OCUPANTS, DR_CF). Lastly, we applied a neural network (NN) model to predict the manner of collision. The Appendix shows the Clementine data streams that we created for this data mining application.
4.1 Kohonen network model

Since Kohonen networks do not use a target field, the 14 variables identified in Section 2 are loaded into the Kohonen node. Using the self-organizing map feature of the Kohonen node, Clementine divides the data into 11 clusters (Table III). The six clusters with the largest numbers of records were selected to build a feedback graph (Figure 1). According to the feedback graph, variables such as AGE, Day_week, MAN_COLL, OCUPANTS, and SEX are identified as strong units, which represent probable cluster centers. In contrast, DR_CF1 and WEATHER act as weak units in the model because they carry little weight when the Kohonen node clusters the data; as Table III shows, each of these two variables has nearly the same dominant value in every cluster.
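Table III appears to summarize each cluster by its record count and, for each variable, the most frequent value and that value's percentage within the cluster. Assuming that reading, such a profile can be computed with pandas from any cluster assignment (for example, the unit labels produced by a Kohonen network). The data frame below is hypothetical.

```python
import pandas as pd

def cluster_profile(df, cluster_col, fields):
    """Per cluster: record count, plus the modal value and its percentage for each field."""
    rows = []
    for cluster, group in df.groupby(cluster_col):
        row = {"cluster": cluster, "records": len(group)}
        for f in fields:
            counts = group[f].value_counts()  # sorted by frequency, most common first
            row[f] = counts.index[0]
            row[f + "_pct"] = round(100.0 * counts.iloc[0] / len(group), 2)
        rows.append(row)
    return pd.DataFrame(rows)

# Hypothetical cluster assignments for seven drivers.
df = pd.DataFrame({
    "cluster": [1, 1, 1, 1, 2, 2, 2],
    "SEX": [1, 1, 1, 2, 2, 2, 1],
    "WEATHER": [1, 1, 1, 1, 1, 2, 1],
})
profile = cluster_profile(df, "cluster", ["SEX", "WEATHER"])
print(profile)
```

A field whose modal value is the same and dominant in every cluster, as WEATHER is in this toy frame, does little to distinguish the clusters.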
4.2 Decision tree (C&RT) model

Whereas the Kohonen network used all the identified variables for clustering, we picked SEX, OCUPANTS, AGE, DR_CF, and MAN_COLL for the DT model. Our goal was to classify the manner of collision by driver characteristics and driver related factors. Figure 2 displays the results from the C&RT model, and Table IV states the resulting if-then rules in text form. The most significant result derived from this C&RT model is that if a driver has a physical/mental condition in addition to inattention, he or she is more likely to be involved in a crash that collides with fixed objects rather than with motor vehicles in transport. Examples of physical/mental conditions recorded in the FARS database include fatigued, drowsy, emotional, drugs-medication, and physical impairment.
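Rules like those in Table IV can be read off a tree by collecting the conditions along each root-to-leaf path. The sketch below does this for a hypothetical tree (not the model in Figure 2); the nested-dictionary layout is an assumption of the sketch.

```python
def extract_rules(node, conditions=()):
    """Return one (conditions, outcome) pair per leaf, reading the tree root to leaf."""
    if "leaf" in node:
        return [(list(conditions), node["leaf"])]
    rules = []
    for value, child in node["branches"].items():
        rules.extend(extract_rules(child, conditions + (f"{node['field']} = {value}",)))
    return rules

def format_rule(conditions, outcome):
    return "IF " + " AND ".join(conditions) + " THEN " + outcome

# Hypothetical tree with two levels of splits.
tree = {
    "field": "DR_CF",
    "branches": {
        "physical/mental condition": {"leaf": "no collision with moving vehicle"},
        "vision obscured": {
            "field": "SEX",
            "branches": {"female": {"leaf": "angle"}, "male": {"leaf": "side"}},
        },
    },
}

for conditions, outcome in extract_rules(tree):
    print(format_rule(conditions, outcome))
```

Each printed line corresponds to one leaf, just as each row of Table IV corresponds to one or more leaf nodes of the C&RT tree.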
4.3 Neural network model

Aside from describing relationships among variables in previous sections, DM can also
be applied to predict future activities based on the data in the past. In this section, we
use a NN to conduct prediction on manner of collision. One of the advantages of NN is
its ability to handle multiple dependent variables. In this model, we divide the data into
two subsets – one to train the model and the other to test the model. In addition, we use
Records AGE Percent Day_week Percent DR_CF1 Percent MAN_COLl Percent OCUPANTS Percent SEX Percent WEATHER Percent
Cluster 1 250 20-30, 30-40 22.40 7 21.60 6 97.60 2 36.80 1 99.20 1 100.00 1 86.80
Cluster 2 320 20-30 28.75 7 21.25 6 92.81 0 100.00 1 100.00 1 99.69 1 93.44
Cluster 3 94 Over 70 21.28 2 24.47 6 95.74 4 94.68 1 98.94 1 95.74 1 87.23
Cluster 4 2 20-30 100.00 5 100.00 6 100.00 3, 8 50.00 1 100.00 1 100.00 1 100.00
Cluster 5 61 20-30 39.34 7 27.87 6 91.80 0 100.00 3 59.20 1 100.00 1 85.25
Cluster 6 65 Over 70 36.92 2 20.00 6 98.46 4 78.46 1 93.85 2 100.00 1 95.38
Cluster 7 9 30-40, 40-50 33.33 1, 3 22.22 6 100.00 4 88.89 3 66.67 1 100.00 1 88.89
Cluster 8 83 20-30 39.76 6 24.10 6 92.77 0 56.63 2 55.42 1 100.00 1 92.77
Cluster 9 187 20-30 21.93 6 19.25 6 96.79 0 39.57 1 58.29 2 100.00 1 91.44
Cluster 10 49 20-30 40.82 1 24.49 6 97.96 0 42.86 2 100.00 2 100.00 1 77.55
Cluster 11 137 30-40 29.20 1 27.10 6 96.35 0 29.93 2 100.00 1 100.00 1 85.40
accidents
motor vehicle
Kohonen network cluster

Distractions and
result
Table III.
1197
IMDS
105,9
1198
Figure 1.
Kohonen network
feedback report
Figure 2.
DT model result
Distractions and
No. Leaf node IF Then
motor vehicle
1 C, K, L, M His/her driver related factor is a This driver will more likely be accidents
physical/mental condition involved in a crash which does not
collide with moving vehicles
2 H, I A female driver is under 70 and, This driver will more likely be
Her driver related factors are involved in an angle collision 1199
improper/illegal driving behavior and
obscured vision
3 J A female driver is under 70 and, This driver will more likely be
Her driver related factors are involved in a crash which does not
improper/illegal driving behavior and collide with moving vehicles
obscured vision and,
Physical/mental condition
4 A, B A driver is over 70, This driver will more likely be
His/her driver related factors are involved in an angle collision
obscured vision
5 D A male driver is between 30 and 40 or This driver will more likely be
between 50 and 70 years old and, involved in an angle collision
His driver related factors are
vision obscured and,
There are 2 to 4 occupants (including
the driver)
6 E A male driver is between 40 and 50 or
between 20 and 30 years old and,
His driver related factors are This driver will more likely be
improper/illegal driving behavior and involved in a crash which does not
vision obscured and, collide with moving vehicles
There are 1 to 3 passengers
7 F, G A male driver is under 70 years old This driver will more likely be
and, involved either in a side collision or a
His/her driver related factors are crash which does not collide with
improper/illegal driving behavior and moving vehicles
vision obscured and, Table IV.
He drives alone or with more than 4 DT model result in text
passengers form
DR_CF, HARM_EV, M_HARM, OCUPANTS, SEX, and WEATHER as input variables and MAN_COLL as the output variable. The NN model has 29 neurons in the input layer, three neurons in the hidden layer, and five neurons in the output layer.
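For orientation, the 29-3-5 architecture described above corresponds to a feedforward pass like the one sketched below. The sigmoid activation, the randomly initialised weights, and the binary input encoding are assumptions made for illustration. The text does not state how Clementine configured the network, and training (for example by backpropagation) is omitted.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 29, 3, 5  # layer sizes reported for the NN model


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Random weights for n_out neurons; index 0 of each row is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def layer_forward(weights, inputs):
    # Each neuron outputs sigmoid(bias + weighted sum of its inputs).
    return [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], inputs)))
            for w in weights]


def forward(hidden_w, output_w, x):
    """One forward pass: 29 inputs -> 3 hidden neurons -> 5 output scores."""
    return layer_forward(output_w, layer_forward(hidden_w, x))


rng = random.Random(42)
hidden_w = init_layer(N_INPUT, N_HIDDEN, rng)
output_w = init_layer(N_HIDDEN, N_OUTPUT, rng)

# Hypothetical encoded record: 29 binary indicators (e.g. one-hot categories).
x = [1.0 if i % 7 == 0 else 0.0 for i in range(N_INPUT)]
scores = forward(hidden_w, output_w, x)
predicted = max(range(N_OUTPUT), key=lambda k: scores[k])
print(predicted)
```

In this reading, the predicted manner of collision would be the output neuron with the highest score. A usable model would first need its weights fitted on the training subset of the data.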
Figure 3 shows the result of the trained model, whose estimated accuracy is 91.05 percent. We also conducted a sensitivity analysis on the trained NN model. The three inputs with the highest relative importance are colliding with a motor vehicle in transport (0.21), colliding with a fixed object (0.11), and the number of occupants (0.06).
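The text does not describe how Clementine computes relative importance. One common way to approximate it, sketched below under that assumption, is to nudge each input of a trained model in turn, measure how much the outputs move, and then normalise the results so they sum to 1. The toy linear model and its weights are hypothetical. As a point of arithmetic, 0.21 / 0.11 ≈ 1.9, which is the roughly twofold difference between colliding with a moving vehicle and colliding with a fixed object noted in the abstract.

```python
def relative_importance(model, baseline, delta=1.0):
    """Perturbation-based sensitivity analysis (one common approach; assumed).

    model: callable mapping a list of inputs to a list of output scores.
    baseline: reference input vector (e.g. per-input means).
    Returns one normalised importance per input, summing to 1.
    """
    base_out = model(baseline)
    raw = []
    for i in range(len(baseline)):
        perturbed = list(baseline)
        perturbed[i] += delta
        out = model(perturbed)
        # Total absolute change across all outputs when input i is nudged.
        raw.append(sum(abs(a - b) for a, b in zip(out, base_out)))
    total = sum(raw)
    return [r / total if total else 0.0 for r in raw]


# Hypothetical linear "model" with three inputs and two outputs.
WEIGHTS = [[2.0, 1.0, 0.0],
           [2.0, 1.0, 1.0]]


def toy_model(x):
    return [sum(w * v for w, v in zip(row, x)) for row in WEIGHTS]


importance = relative_importance(toy_model, [0.0, 0.0, 0.0])
print(importance)
```

Here the first input moves the outputs most, so it receives the largest share (4/7). With a neural network, the same loop would call the trained network instead of `toy_model`.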
Figure 3. NN model trained result
Figure 4 shows the result of applying the second subset of the data to test the NN model. Among the output fields, the NN model has a very high rate of correct predictions for manner of collision recorded as unknown (98.99 percent) and for side collision (97.29 percent). This is because very few records in the data fall into these two categories, so a model that rarely or never predicts them is still almost always correct. The prediction accuracy is 78.31 percent for rear and head collisions and 77.35 percent for angle collisions. Furthermore, we used Clementine to generate coincidence matrices (Tables V-VIII) for all output fields except not colliding with a moving vehicle.
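The accuracy figures above can be recomputed from the coincidence matrices in Tables V-VII. Accuracy is the share of records on the diagonal. The sketch below also computes recall for class 1, the share of actual positives that the model catches. Recall makes the class-imbalance point concrete: for side collisions the model never predicts class 1, so its 97.29 percent accuracy comes with a recall of zero. The helper names and the choice of recall as the extra metric are illustrative.

```python
# Coincidence matrices from Tables V-VII: rows are actual values (0, 1),
# columns are predicted values (0, 1).
MATRICES = {
    "MC_A (rear and head)": [[394, 82], [54, 97]],
    "MC_B (angle)": [[397, 50], [92, 88]],
    "MC_C (side)": [[610, 0], [17, 0]],
}


def accuracy(m):
    """Share of records whose predicted value equals the actual value."""
    total = sum(sum(row) for row in m)
    return (m[0][0] + m[1][1]) / total


def recall_positive(m):
    """Share of actual class-1 records that were predicted as class 1."""
    actual_pos = m[1][0] + m[1][1]
    return m[1][1] / actual_pos if actual_pos else 0.0


for name, m in MATRICES.items():
    print(f"{name}: accuracy {accuracy(m):.2%}, recall {recall_positive(m):.2%}")
```

Run as is, this prints accuracies of 78.31, 77.35 and 97.29 percent, which match the figures reported above. It also prints recalls of 64.24, 48.89 and 0 percent.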
4.4 Limitations of the research

There are a number of limitations to this study. First, inattention or distraction has a cognitive aspect that can be difficult to record and report objectively. Second, the FARS datasets contain records of fatal motor vehicle accidents only. Therefore, the analysis is limited to fatal crashes and cannot be generalized to accidents in general.
Figure 4. NN model test result
Third, the sample is limited to Maryland and Washington, DC from 2000 through 2003. Finally, the “inattention” variable we used is a broad driver-related factor that encompasses most driver distractions.
5. Summary
This research examines the relationships between motor vehicle accidents and driver distractions. We used the data mining tool Clementine to explore data derived from FARS, applying three data mining models. The Kohonen network model is used to reveal patterns in the input variables. The decision tree model is then applied to classify the output variable, manner of collision, using the input variables: driver-related factors and the driver’s basic demographic data. The results of the DT model suggest a relationship between crashes and inattentive drivers who also have physical/mental conditions. Furthermore, a neural network model is trained and tested to evaluate how effectively it predicts the manner of collision. Among the five output variables, the model predicts “rear and head collisions” and “angle collisions” with 78 and 77 percent accuracy, respectively.
One of the major limitations of the research is the difficulty of correctly recording the mental state of drivers; indeed, how to define distraction is itself controversial. Finally, the risk of car crashes caused by driver distraction is rising as wireless communication technologies and in-vehicle communication devices become increasingly pervasive.
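Table V. Rear and head collision (MC_A) coincidence matrix
MC_A (rows: actual; columns: predicted) | 0 | 1 | Number of actual records
0 | 394 | 82 | 476
1 | 54 | 97 | 151
Number of predicted records | 448 | 179 | 627

Table VI. Angle collision (MC_B) coincidence matrix
MC_B (rows: actual; columns: predicted) | 0 | 1 | Number of actual records
0 | 397 | 50 | 447
1 | 92 | 88 | 180
Number of predicted records | 489 | 138 | 627

Table VII. Side collision (MC_C) coincidence matrix
MC_C (rows: actual; columns: predicted) | 0 | 1 | Number of actual records
0 | 610 | 0 | 610
1 | 17 | 0 | 17

Table VIII. Manner of collision unknown (MC_9) coincidence matrix
MC_9 (rows: actual; columns: predicted) | 0 | 1 | Number of actual records
0 | 620 | 0 | 620
1 | 7 | 0 | 7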
References
Berry, M. and Linoff, G. (2004), Data Mining Techniques, Wiley, Indianapolis, IN.
Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984), Classification and Regression Trees, Chapman & Hall, New York, NY.
Brown, M.L. and Kros, J.H. (2003), “Data mining and the impact of missing data”, Industrial Management & Data Systems, Vol. 103 No. 8, pp. 611-21.
DeVries, P., Mulig, E.V. and Lowery, K. (2004), “A useful tool for data scanning in executive information systems: schematic faces”, Industrial Management & Data Systems, Vol. 104 No. 8, pp. 644-9.
Kohonen, T. (1995), Self-Organizing Maps, Springer-Verlag, Berlin.
Laberge-Nadeau, C., Maag, U., Bellavance, F., Lapierre, S.D., Desjardins, D., Messier, S. et al. (2003), “Wireless telephones and the risk of road crashes”, Accident Analysis and Prevention, Vol. 35, pp. 649-60.
Marakas, G. (2003), Modern Data Warehousing, Mining, and Visualization, Pearson Education, Inc., Upper Saddle River, NJ.
Miller, T. (2005), Data and Text Mining: A Business and Applications Approach, Pearson Education, Inc., Upper Saddle River, NJ.
NHTSA (1997), An Investigation of the Safety Implications of Wireless Communications in Vehicles, Report DOT HS 808-635, National Highway Traffic Safety Administration, Washington, DC, available at: www.nhtsa.dot.gov/people/injury/research/wireless
NHTSA (2000), NHTSA Driver Distraction Expert Working Group Meetings: Summary and Proceedings, National Highway Traffic Safety Administration, Washington, DC, available at: www-nrd.nhtsa.dot.gov/pdf/nrd-13/GroupProceedings.pdf
Ranney, T.A., Mazzae, E., Garrott, R. and Goodman, M.J. (2000), “NHTSA driver distraction research: past, present and future”, available at: www-nrd.nhtsa.dot.gov/departments/nrd-13/driver-distraction/PDF/233.pdf
Redelmeier, D.A. and Tibshirani, R.J. (1997), “Association between cellular-telephone calls and motor vehicle collisions”, The New England Journal of Medicine, Vol. 336 No. 7, pp. 453-8.
Royal, D. (2003), “Volume I: findings; national survey of distracted and drowsy driving attitudes and behavior: 2002”, Report DOT HS 809 566, Washington, DC.
Shelton, L.R. (2001), “Statement before the Subcommittee on Highways and Transit Committee on Transportation and Infrastructure”, US House of Representatives, available at: www.nhtsa.dot.gov/nhtsa/announce/testimony/distractiontestimony.html (accessed 9 May 2001).
Snyder, D. and Williamson, E. (2005), “Teen driving limits clear Md. House; restrictions on cell phone use and passengers are approved”, The Washington Post, p. B1, 18 March.
Status Report (2002), Insurance Institute for Highway Safety, Arlington, VA, 17 August.
Strayer, D.L. and Drews, F.A. (2004), “Profiles in driver distraction: effects of cell phone conversations on younger and older drivers”, Human Factors, Vol. 46 No. 4, pp. 640-9.
Stutts, J., Reinfurt, D., Staplin, L. and Rodgman, E. (2001), The Role of Driver Distraction in Traffic Crashes, AAA Foundation for Traffic Safety, Washington, DC.
Stutts, J., Feaganes, J., Rodgman, E., Hamlett, C., Meadows, T., Reinfurt, D. et al. (2003), Distractions in Everyday Driving, Final Report, AAA Foundation for Traffic Safety and The University of North Carolina Highway Safety Research Center, Washington, DC.
Sundeen, M. (2002), “Along for the ride: reducing driver distractions”, paper presented at National Conference of State Legislatures, Final Report of the Driver Focus and Technology Forum, Denver, CO.
Sundeen, M. (2003), “Cell phones and highway safety: 2003 state legislative update”, paper presented at National Conference of State Legislatures.
Utter, D. (2001), “Passenger vehicle driver cell phone use results from the fall 2000 national occupant protection use survey”, Report No. DOT HS 809 293.
Wang, J.S., Knipling, R.R. and Goodman, M.J. (1996), “The role of driver inattention in crashes: new statistics from the 1995 crashworthiness data system”, Proceedings of 40th Annual Meeting of the Association for the Advancement of Automotive Medicine, Vancouver, October.
Wilson, J., Fang, M., Wiggins, S. and Cooper, P. (2003), “Collision and violation involvement of drivers who use cellular telephones”, Traffic Injury Prevention, Vol. 4 No. 1, pp. 45-52.
Wong, K.Y. (2005), “Critical success factors for implementing knowledge management in small and medium enterprises”, Industrial Management & Data Systems, Vol. 105 No. 3, pp. 261-79.
Appendix

Figure A1.
