Data Mining Based on Extreme Learning Machines
for the Classification of Premium and Regular
Gasoline in Arson and Fuel Spill Investigation
S. O. Olatunji, Imran A. Adeleke, Alaba Akingbesote
Abstract— In this work, we developed a data mining approach based on extreme learning machines (ELM) for identifying gasoline types. Detection and correct identification of gasoline types during arson and fuel spill investigation are very important in forensic science. As arson and spillage incidents become more commonplace, it becomes more important to have an accurate means of detecting and classifying gasoline found at such incident sites. However, only a few classification models have so far been explored in this area of forensic science, particularly as it relates to gasoline identification. The proposed model was constructed using gas chromatography–mass spectrometry (GC–MS) spectral data obtained from gasoline sold in Canada over one calendar year. The prediction accuracy of the model was evaluated and compared with methods used earlier on the same datasets. Empirical results from simulation showed that the proposed ELM-based approach achieved better performance than the other, earlier implemented techniques.
Index Terms— Data mining; extreme learning machine; gas chromatography–mass spectrometry; pattern recognition; principal component analysis; artificial neural networks.
• S. O. Olatunji is on study leave from the Computer Science Department, Adekunle Ajasin University, Akungba Akoko, Ondo State, Nigeria.
• Imran A. Adeleke is with the Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia.
• Alaba Akingbesote is with the Computer Science Department, Adekunle Ajasin University, Akungba Akoko, Nigeria.
I. INTRODUCTION

The importance of detection and accurate classification of gasoline for both arson and environmental spill investigations cannot be over-emphasized. In this work, ELM was used to classify premium and regular gasolines from gas chromatography–mass spectrometry (GC–MS) spectral data [1, 2] obtained from gasoline sold in Canada over one calendar year [3, 4]. In arson, petroleum-based accelerants such as gasoline, kerosene, and paint thinners are often used to accelerate a fire. In some cases, liquid accelerant is left at the scene, which may be matched to samples that are associated with the suspect. In the environment, gasoline spills are commonplace, but identification of the source is not always straightforward.

The identification of gasoline is crucial for the successful prosecution of an offending individual and/or company. Frequently, gas chromatography is used to fingerprint fuel spills, with the gas chromatograms of the spill sample and the different candidate fuels compared visually in order to seek a best match. This method has some shortcomings. One problem with this technique is that the interpretation and classification of the data is limited by the skill and experience of the analyst. Also, visual analysis of gas chromatograms is subjective and is not always persuasive in a court of law. Pattern recognition methods offer a better approach to the problem of matching gas chromatograms of weathered fuels [5]. Pattern recognition methods involve less subjectivity in the interpretation of the data and are capable of identifying the samples correctly.

Among the pattern recognition methods that have been used for this identification purpose are artificial neural networks (ANN) and principal component analysis (PCA) [6]. Unfortunately, the accuracy of some of these earlier approaches is often limited, and they are sometimes bedeviled by problems such as over-fitting.
Recently, ELM has been proposed as a new intelligence framework for both prediction and classification [7-11]. It has featured in a wide range of journals, often with promising results.

In this work, we developed an ELM-based identification model for identifying gasoline types. The model is constructed using gas chromatography–mass spectrometry (GC–MS) spectral data obtained from gasoline sold in Canada over one calendar year, as contained in [6]. The prediction accuracy of the model is evaluated and compared with methods used earlier on the same datasets. Based on the excellent performance of ELM on the various identification problems surveyed in the literature, coupled with the empirical results from our simulation, we found that the ELM-based model produced accurate and promising results, better than or at least equal to the best among the other methods previously applied to the same datasets. To demonstrate the usefulness of the extreme learning machine technique for spill and arson investigation in particular, and forensic science in general, we describe both the steps and the use of ELM as a pattern recognition modeling approach for identifying liquid accelerant left at the scene of arson or spillage, which may be matched to samples associated with the suspects. An ELM classifier has been developed and used to identify gasoline types in arson and spill investigation. Comparative studies were also carried out to compare the performance of ELM as a classifier with that of other classifiers already used for the same purpose on the same datasets, such as ANN and PCA. This study thus presents a comparison of ELM to PCA and ANN for the classification of liquid premium and regular gasoline from their GC–MS chromatograms. The ANN referred to was trained using the classical back-propagation learning algorithm [6]; interested readers can find details and structures of both the PCA and ANN models in [6]. ELM has been used in the past for many different types of pattern recognition problems, but this is the first report of applying ELM to the classification of summer and winter, premium and regular grade gasoline from GC–MS chromatogram data.

II. EXTREME LEARNING MACHINES

Extreme learning machine (ELM) is a recently introduced learning algorithm for single-hidden-layer feed-forward neural networks (SLFNs) which randomly chooses hidden nodes and analytically determines the output weights of the network. In general, the learning speed of feed-forward neural networks (FFNN) is far slower than required, and this has become a bottleneck in their applications. According to [9], there are two main reasons for this behavior: one is the slow gradient-based learning algorithms used to train the neural network (NN), and the other is the iterative tuning of the network parameters by these learning algorithms. To overcome these problems, [7-9] proposed a learning algorithm called the extreme learning machine (ELM) for single-hidden-layer feed-forward neural networks. It is stated that "In theory, this algorithm tends to provide the best generalization performance at extremely fast learning speed since it is a simple tuning-free algorithm" [8].

The advent of ELM has been viewed as a welcome and timely development: in the past, there seemed to exist an unbreakable virtual speed barrier which classic learning algorithms could not break through, so that feed-forward network implementations took a very long time to train, whether the application was simple or complex. ELM also tends to reach the minimum training error while considering the magnitude of the weights, in contrast to classic gradient-based learning algorithms, which only aim to reach the minimum training error and do not consider the magnitude of the weights. Furthermore, unlike classical gradient-based learning algorithms, which only work for differentiable activation functions, the ELM learning algorithm can be used to train SLFNs with non-differentiable activation functions [7]. According to [9], "Unlike the traditional classic gradient-based learning algorithms, like back-propagation method, facing several issues like local minimum, improper learning rate and over-fitting, etc, the ELM is a simple tuning-free three-step algorithm that tends to reach the solutions straightforward without such trivial issues".
A. The Learning Process for the Proposed ELM-Based Framework
Let us first define the standard SLFN (single-hidden-layer feed-forward neural network). If we have N samples (x_i, t_i), where x_i = [x_{i1}, x_{i2}, ..., x_{in}]^T ∈ R^n and t_i = [t_{i1}, t_{i2}, ..., t_{im}]^T ∈ R^m, then a standard SLFN with Ñ hidden neurons and activation function g(x) is defined as

\sum_{i=1}^{\tilde{N}} \beta_i \, g(w_i \cdot x_j + b_i) = o_j, \qquad j = 1, \ldots, N \qquad (1)

where w_i = [w_{i1}, w_{i2}, ..., w_{in}]^T is the weight vector connecting the i-th hidden neuron to the input neurons, β_i = [β_{i1}, β_{i2}, ..., β_{im}]^T is the weight vector connecting the i-th hidden neuron to the output neurons, and b_i is the threshold of the i-th hidden neuron. The "·" in w_i · x_j denotes the inner product of w_i and x_j.

The SLFN aims to minimize the difference between o_j and t_j, which can be expressed mathematically as

\sum_{i=1}^{\tilde{N}} \beta_i \, g(w_i \cdot x_j + b_i) = t_j, \qquad j = 1, \ldots, N \qquad (2)

or, more compactly, as

H\beta = T \qquad (3)

where

H(w_1, \ldots, w_{\tilde{N}}, b_1, \ldots, b_{\tilde{N}}, x_1, \ldots, x_N) = \Psi \qquad (4a)

with

\Psi = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_N + b_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}}, \quad \beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_{\tilde{N}}^T \end{bmatrix}_{\tilde{N} \times m}, \quad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m} \qquad (4b)

Thus the training procedure for the proposed ELM-based framework can be summarized in the following algorithmic steps; see [8, 9] for further details on the workings of the ELM algorithm.

Input: the input parameters, i.e. the independent variables (inputs x_i ∈ R^n) and the target variable, i.e. the dependent variable (t_i ∈ R^m), the activation function g(x), and the number of hidden neurons Ñ.

Output: the target output values and the weights of the output layer.

Mathematically, given a training set {(x_i, t_i) | x_i ∈ R^n, t_i ∈ R^m, i = 1, ..., N}, an activation function g(x), and a number of hidden neurons Ñ, do the following:

Step 0: Initialization. Assign random values to the input weights w_j and the biases b_j, j = 1, ..., Ñ.
Step 1: Find the hidden-layer output matrix H.
Step 2: Find the output weights β as β = H†T, where H† is the Moore–Penrose generalized inverse of H, and β, H and T are defined as in the SLFN specification above (equations 1-4).
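For concreteness, the three steps above amount to a few lines of linear algebra. The following is a minimal NumPy sketch rather than the authors' implementation: the sigmoid default, the uniform [-1, 1] initialization range, and all variable names are our own assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden, g=sigmoid, seed=0):
    """Steps 0-2 above: X is the N x n input matrix, T the N x m targets."""
    rng = np.random.default_rng(seed)
    # Step 0: assign random input weights w_j and biases b_j, j = 1..n_hidden
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Step 1: hidden-layer output matrix H (equation 4a), one row per sample
    H = g(X @ W + b)
    # Step 2: output weights via the Moore-Penrose pseudo-inverse, beta = H†T
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta, g=sigmoid):
    """Equation (1): network outputs o = g(XW + b) @ beta."""
    return g(X @ W + b) @ beta
```

Because β is obtained in a single pseudo-inverse step, there is no iterative weight tuning, which is the source of the speed advantage of ELM discussed above.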
III. EMPIRICAL STUDIES

A. Experimental data

The experimental data were originally taken from a Canadian Petroleum Products Institute report on the composition of unleaded summer and winter gasolines in 1993 [5], and were also reported in [6]. In this report, 44 samples of regular gasoline (22 winter, 22 summer) and 44 samples of premium gasoline (22 winter, 22 summer) were analyzed by GC–MS. The gasolines were collected over the course of one year from different regions across the country. Forty-four compounds that may be present in automotive gasoline in concentrations greater than 1% were reported.
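Concretely, the classifier therefore sees an 88 × 44 feature matrix with four balanced classes. A sketch of this layout follows; the array names, the zero placeholders, and the row ordering are illustrative assumptions, with the class abbreviations taken from Table 2 below.

```python
import numpy as np

# One row per gasoline sample, one column per reported compound
# concentration; zeros stand in for the GC-MS values reported in [5].
X = np.zeros((88, 44))

# Four balanced classes, 22 samples each (the PUW, RUW, PUS, RUS
# groups of Table 2); the ordering here is illustrative only.
labels = np.repeat(["PUW", "RUW", "PUS", "RUS"], 22)
```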
B. Determining the Optimal Parameters through Cross-Validation
… validation method. These criteria also provide a more accurate evaluation of the classifier.

An experimental procedure based on test-set cross-validation was employed in our study. We used the stratified sampling approach to divide the data set into training and testing data, such that the size of the training set is 70% of the available data and the testing set is the rest. The parameters associated with the ELM were optimized through test-set cross-validation on the available data set. This entire process was repeated 10 times with different random splittings of the training and testing data sets using the stratified sampling approach; the final results were averaged over the 10 runs.
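A sketch of this evaluation loop follows, assuming scikit-learn's StratifiedShuffleSplit for the stratified 70/30 splits and reusing the elm_train/elm_predict sketch from Section II-A together with the X and labels arrays above; the hidden-neuron count is a placeholder, since selecting it is the subject of the search procedure below.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# One-hot targets for the four gasoline classes.
classes, y = np.unique(labels, return_inverse=True)
T = np.eye(len(classes))[y]

# Ten random stratified 70/30 splits; accuracy averaged over the runs.
splitter = StratifiedShuffleSplit(n_splits=10, train_size=0.7, random_state=0)
accuracies = []
for train_idx, test_idx in splitter.split(X, y):
    # n_hidden=20 is a placeholder; the search procedure chooses it properly.
    W, b, beta = elm_train(X[train_idx], T[train_idx], n_hidden=20)
    pred = elm_predict(X[test_idx], W, b, beta).argmax(axis=1)
    accuracies.append(np.mean(pred == y[test_idx]))

print("mean test accuracy over 10 runs:", np.mean(accuracies))
```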
The details of the test-set cross-validation for optimizing the ELM parameters go thus: for each run of the generated training and testing sets, the percentage of correctly classified samples was noted for each group of parameters, viz. the number of hidden neurons and the activation function type. Searching through all possible values of the parameters in a given range identifies the best value of the performance measure and the corresponding values of the parameters for the fixed set of features. The optimal values of the parameters with the best performance measure were identified. A summary of the procedure is as follows.

C. Optimal parameter search procedure for ELM

The parameters associated with the ELM were optimized through test-set cross-validation on the available data set. The details of the test-set cross-validation for optimizing the ELM parameters go thus: for each run of the generated training and testing sets, the values of RMSE and the correlation coefficient were monitored for a group of parameters that includes the number of hidden neurons (N) and the activation function (AF). Searching through all possible values of the parameters in a given range identifies the best performance measures and the corresponding values of the parameters for the fixed set of features. In our experiment, this process was repeated for every available ELM activation function, each time stepping incrementally through the parameters. The optimal values of the parameters and the activation function option associated with the best performance measure were identified. A summary of the procedure is as follows.

Step 1: Choose the initial "activation function" option from the list of available options.
Step 2: Identify the best value of the number of hidden neurons through test-set cross-validation and store the corresponding performance measures.
Step 3: If there is no activation function option left, go to Step 4. Otherwise, select the next activation function option and go to Step 2.
Step 4: Identify the best performance measure and its associated parameter values.
Step 5: Use the optimized activation function option and parameter values to train the final ELM.
Step 6: Calculate the performance measures for both the training and testing sets using the system obtained in Step 5.

This is presented in mathematical form as follows. Let the set A contain all the possible activation function options, with elements of the form A_i(j), where i is the activation function number, j is the selected number of hidden neurons, nf is the total number of activation functions available, and nh is the maximum number of hidden neurons assumed. Also, pm represents the performance measure taken, ix represents the index of the best activation function, and jx represents the index of the best number of hidden neurons. The algorithm then goes thus:

Initialization: jx = 0, vx = 0, ix = 0
for i = 1 → nf
    for j = 1 → nh
        pm = f(A_i(j))   {performance measure for the present parameter combination}
        if pm is better than vx then
            vx = pm
            ix = i; jx = j   {store the indices of the better parameters}
        end if
    end
end
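The pseudocode above maps directly onto a nested loop. Below is a minimal Python sketch under stated assumptions: it reuses the elm_train/elm_predict sketch from Section II-A, takes classification accuracy as the performance measure pm, and supplies a three-option candidate set of activation functions of our own choosing, since the paper's exact list of options is not given here.

```python
import numpy as np

# Candidate activation functions (the set A); this particular list is an
# assumption, as the paper does not enumerate its options here.
activations = {
    "sigmoid": sigmoid,
    "sine": np.sin,
    "hardlim": lambda z: (z >= 0).astype(float),
}

def parameter_search(X_train, T_train, X_test, y_test, nh_max=50):
    """Nested loop over activation function (i) and hidden-neuron count (j),
    tracking the best performance measure vx and its indices (ix, jx)."""
    vx, ix, jx = -np.inf, None, 0
    for name, g in activations.items():          # i = 1 .. nf
        for nh in range(1, nh_max + 1):          # j = 1 .. nh
            W, b, beta = elm_train(X_train, T_train, nh, g=g)
            pred = elm_predict(X_test, W, b, beta, g=g).argmax(axis=1)
            pm = np.mean(pred == y_test)         # pm = f(A_i(j)), here accuracy
            if pm > vx:                          # pm better than vx
                vx, ix, jx = pm, name, nh        # store the better indices
    return ix, jx, vx
```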
… field of forensic science, during arson and oil spillage investigation.

Table 2: The percentage of correctly classified PUW, RUW, PUS, and RUS

… correct classification, just as the ELM did. But on the four-class classification of gasoline, the ANN's performance on the testing set dropped to 84.09%, which is far lower than that of the ELM, which stayed at 96.47%. Thus we conclude that ELM has again distinguished itself as a viable tool in the field of forensic science for the correct classification of gasoline samples in the course of arson and oil spill investigations. It could also serve as a powerful classification tool in other fields of forensic science, such as the identification of broken glass found at the scene of a crime during arson investigation.
REFERENCES
Alaba Akingbesote is with the Computer Science Department, Adekunle Ajasin University, Akungba Akoko, Nigeria. He received the B.Sc. and M.Sc. degrees in Computer Science from Ogun State University, Ago-Iwoye, and the Federal University of Technology, Akure, Nigeria, respectively. He is currently pursuing a PhD in Computer Science. He has several years of experience as a lecturer in computer science.