
Information Sciences 177 (2007) 1673–1686

www.elsevier.com/locate/ins

Fuzzy integral-based perceptron for two-class pattern classification problems

Yi-Chung Hu *

Department of Business Administration, Chung Yuan Christian University, 200, Chung Pei Road, Chung-Li 32023, Taiwan, ROC

* Tel.: +886 3 2655130; fax: +886 3 2655199. E-mail address: ychu@cycu.edu.tw

Received 16 August 2005; received in revised form 16 September 2006; accepted 20 September 2006
doi:10.1016/j.ins.2006.09.009

Abstract

The single-layer perceptron with a single output node is a well-known neural network for two-class classification problems. The sigmoid or logistic function is usually used as the activation function of the output neuron. A critical step is computing the sum of the products of the connection weights with the corresponding inputs, which assumes additivity among the individual variables. Unfortunately, because the input variables are not always independent of each other, the additivity assumption may not be warranted. In this paper, the inner product is replaced with an aggregation value obtained by a fuzzy integral, viewing each connection weight as a value of a λ-fuzzy measure for the corresponding variable. A genetic algorithm is then employed to obtain the connection weights by maximizing the number of correctly classified training patterns and minimizing the errors between the actual and desired outputs of the individual training patterns. The experimental results demonstrate that the proposed method outperforms the traditional single-layer perceptron and performs well in comparison with other fuzzy and non-fuzzy classification methods.
© 2006 Elsevier Inc. All rights reserved.

Keywords: Single-layer perceptron; Fuzzy integral; Pattern classification; Genetic algorithm; Neural network

1. Introduction

Pattern classification is the problem of partitioning a pattern space into classes and then assigning a pattern to one of those classes [18]. The traditional single-layer perceptron (SLP) with a single output node has a simple structure, but it can find half-planes bounded by a hyperplane [29], and it has been a widely used tool for two-class classification problems, such as financial distress analysis [2]. A nonlinear function, such as the sigmoid function, whose output ranges from zero to one, is commonly used as the activation function of the output node. With such a function, the performance errors between the actual and desired outputs of individual training patterns can be measured [14]. Then, the back-propagation algorithm (BP) using gradient descent [36] is utilized to train the SLP. However, the main problem with gradient descent is that the training may converge to a local minimum [48].
It is clear that the output of the sigmoid function is obtained by computing the inner product of the connection weight vector with the input pattern; that is, the product terms are summed inside the sigmoid function. This indicates that the additivity property [27] of the interaction among the individual variables is assumed. However, because the variables are not always independent of each other, the additivity assumption may not be warranted.
Fuzzy measures with the monotonicity assumption consider the interrelation between elements [4] by generalizing the usual definition of a measure and replacing the usual additive property with the monotonic property. In effect, the grade of importance of a variable can be represented by a fuzzy measure. In practice, since a non-additive technique, namely the fuzzy integral proposed by Sugeno [37,38], does not assume the independence of one element from another [43] and considers the implicit interrelation between elements, fuzzy measures are used with the fuzzy integral to aggregate performance values [20,21,37–39]. In view of its effectiveness, the fuzzy integral has been applied to real problems such as classifier fusion [6,19,20,39,45] and multiple criteria decision-making [5,11,26,40–42,47].
This paper proposes a novel perceptron that replaces the above-mentioned inner product in the sigmoid function with an aggregation value obtained by the fuzzy integral. In particular, each connection weight is viewed as a value of a λ-fuzzy measure ranging from zero to one, and each input value is viewed as the performance value of the corresponding input variable. Usually, for a perceptron, an input pattern is categorized into one class or the other according to whether the output value is above or below 0.5. However, 0.5 may not be an optimal cut value [34]. In order to determine the parameter specifications, including the connection weights and the cut value, a general-purpose optimization technique, namely the genetic algorithm (GA) [10,23,33], is designed to determine these parameters by maximizing the number of correctly classified training patterns and minimizing the errors between the actual and desired outputs of the individual training patterns. That is, a GA is designed to build a fuzzy integral-based perceptron with high classification performance and low training error. When searching for the connection weights, the GA is unlikely to become stuck in a local minimum, owing to its nature [44].
The rest of this paper is organized as follows. The traditional perceptron (i.e., the SLP with the inner product) and the proposed perceptron (the fuzzy integral-based perceptron) are described in Sections 2 and 3, respectively. Section 4 describes the genetics-based learning method for the proposed perceptron. In Section 5, in order to compare the classification performance of the proposed perceptron with that of the traditional perceptron, the proposed perceptron is evaluated by computer simulations on realistic two-class problems, including the thyroid data, appendicitis data, breast cancer data, Wisconsin breast-cancer data, and Pima Indian diabetes data. It is demonstrated that the proposed perceptron outperforms the traditional perceptron and performs well in comparison with other fuzzy and non-fuzzy classification methods. This paper is concluded in Section 6.

2. Single-layer perceptron with inner product

Let θ denote a bias in the activation function, denoted by f_θ. As depicted in Fig. 1, a traditional SLP with a single output node and n input nodes is a one-layer feed-forward network, where y is the actual output corresponding to the input pattern x = (x_1, x_2, . . ., x_n), and n is the number of input features. Such a simple neural network can decide whether an input belongs to one of the two classes by computing the output of the activation function f_θ(u), where u = wx − θ so that y = f_θ(u), and wx = w_1x_1 + w_2x_2 + · · · + w_nx_n is the inner product of the connection weight vector w = (w_1, w_2, . . ., w_n) with x. f_θ(u) is usually taken to be nonlinear [14], such as the sigmoid function. With the sigmoid function, an input pattern is categorized into one class or the other according to whether the output value is above or below 0.5. In addition, the errors between the actual and desired outputs of individual training patterns can be measured. Note that the additivity property of the interaction between x_1, x_2, . . ., and x_n is assumed; in other words, x_1, x_2, . . ., and x_n are assumed to be independent of each other.

Fig. 1. A single-layer perceptron.
Without loss of generality, the sigmoid function is defined as

f_\theta(u) = \frac{1}{1 + e^{-u}} \qquad (1)

w and θ are obtained by the learning rules in the perceptron learning algorithm. Furthermore, the learning rules can be derived by the gradient descent method to minimize a squared error measure, denoted by E:

E = \frac{1}{2} \sum_{i=1}^{m} (d_i - y_i)^2 \qquad (2)

where d_i and y_i are the desired output and the actual output of the ith training pattern, say x_i = (x_{i1}, x_{i2}, . . ., x_{in}), respectively, and m is the number of training patterns. Note that d_i is either 1 or 0, whereas y_i ranges from zero to one. With the well-known BP training algorithm [34], the learning continues until a convergence condition is reached; for instance, if E falls below a pre-specified value, the learning algorithm can be stopped during the training phase.
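To make Eqs. (1) and (2) concrete, the following minimal Python sketch computes the SLP output and the squared error measure. It is an illustrative reconstruction, not the paper's own Delphi implementation, and the example weights, bias, and patterns are invented:

```python
import numpy as np

def slp_output(w, x, theta):
    """Sigmoid activation of the inner product minus the bias (Eq. (1))."""
    u = np.dot(w, x) - theta
    return 1.0 / (1.0 + np.exp(-u))

def squared_error(w, X, d, theta):
    """Squared error measure E over m training patterns (Eq. (2))."""
    y = np.array([slp_output(w, x, theta) for x in X])
    return 0.5 * np.sum((d - y) ** 2)

# Invented example: two features, three training patterns
w = np.array([0.4, 0.7])
X = np.array([[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]])
d = np.array([1.0, 0.0, 1.0])
print(squared_error(w, X, d, theta=0.3))
```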

3. Single-layer perceptron with fuzzy integral

Let X denote a finite set {x_1, x_2, . . ., x_n}, and let h : X → [0, 1]. The elements of X are renumbered so that the element attaining max{h(x_j) | j = 1, 2, . . ., n} receives index one; that is, the x_j's are renumbered by ordering the h(x_j) in descending order, so that h(x_n) ≤ · · · ≤ h(x_2) ≤ h(x_1). The fuzzy integral, h(x) ∘ g(·), over X of h with respect to the fuzzy measure g is computed as [20,50]

h(x) \circ g(\cdot) = \max_{E \subseteq X} \left[ \min\left( \min_{x \in E} h(x),\; g(E) \right) \right] = \max_{x_j \in X} \left\{ \min\left( h(x_j),\; g(E_j) \right) \right\} \qquad (3)

where E_j = {x_1, x_2, . . ., x_j} and h(x_j) denotes the performance value of x_j for x. For the SLP with the fuzzy integral, the fuzzy integral is employed to aggregate the n performance values of x, and wx in the sigmoid function can be replaced by h(x) ∘ g(·).
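Eq. (3) can be sketched in a few lines of Python. This is an illustrative implementation, with the measure g supplied by the caller as a function on index sets (for instance, the λ-fuzzy measure constructed below); the function name is a hypothetical choice:

```python
def sugeno_integral(h_values, g):
    """Sugeno fuzzy integral of performance values h(x_j) w.r.t. measure g (Eq. (3)).

    h_values: list of performance values h(x_1), ..., h(x_n) in [0, 1].
    g: function mapping a frozenset of indices E to its measure g(E) in [0, 1].
    """
    n = len(h_values)
    # Renumber so that h(x_1) >= h(x_2) >= ... >= h(x_n)
    order = sorted(range(n), key=lambda j: h_values[j], reverse=True)
    best = 0.0
    E = set()
    for j in order:
        E.add(j)                               # E_j = {x_1, ..., x_j} after sorting
        best = max(best, min(h_values[j], g(frozenset(E))))
    return best
```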
The fuzzy measure is used with the fuzzy integral for aggregating information. Let P(X) denote the power set of X, so that g : P(X) → [0, 1]. In general, g satisfies the following properties [11,22,37,43,46]:

1. g(∅) = 0, g(X) = 1 (boundary conditions).
2. For any R, S ∈ P(X), if R ⊆ S, then g(R) ≤ g(S) (monotonicity).
3. For every sequence of subsets of X, if either R_1 ⊆ R_2 ⊆ · · · or R_1 ⊇ R_2 ⊇ · · ·, then lim_{i→∞} g(R_i) = g(lim_{i→∞} R_i) (continuity).

Since the elements of the considered systems are not always independent of each other, the Lebesgue measure [25,50], a probabilistic measure assuming additivity of the interaction among the elements of X (e.g., g(A ∪ B) = g(A) + g(B)), may not be reasonable. Fuzzy measures with the monotonicity assumption, by contrast, consider the interrelation between elements [4] by generalizing the usual definition of a measure and replacing the usual additive property with the monotonic property.
A λ-fuzzy measure has been suggested for computing the fuzzy integral [20,45]. Formally, g is called a λ-fuzzy measure if

g(R \cup S) = g(R) + g(S) + \lambda\, g(R)\, g(S), \quad \lambda \in (-1, \infty) \qquad (4)
1676 Y.-C. Hu / Information Sciences 177 (2007) 1673–1686

where R, S ∈ P(X) and R ∩ S = ∅. Let g_j denote g({x_j}), which is called a fuzzy density. For the interaction between R and S, if λ > 0, there is a multiplicative effect; if λ < 0, there is a substitutive effect; and if λ = 0, the additivity principle applies [5,40], i.e., R and S are independent of each other. In the proposed perceptron, the connection weight w_j is equal to g_j and can be determined by the GA. In comparison with the traditional SLP, g_j can be interpreted as the grade of importance or the discriminatory power of x_j, and g(R) represents the grade of importance of the elements in R. By contrast, it is difficult to analyze the implicit meaning of the weights in the traditional SLP [9].
From the boundary conditions, g(∅) and g({x_1, x_2, . . ., x_n}) are equal to 0 and 1, respectively. In addition, g({x_1, x_2, . . ., x_j}) (1 ≤ j ≤ n) is defined as

g(\{x_1, x_2, \ldots, x_j\}) = g_j + g(\{x_1, \ldots, x_{j-1}\}) + \lambda\, g_j\, g(\{x_1, x_2, \ldots, x_{j-1}\})
= \sum_{i=1}^{j} g_i + \lambda \sum_{i_1=1}^{j-1} \sum_{i_2=i_1+1}^{j} g_{i_1} g_{i_2} + \cdots + \lambda^{j-1} g_1 g_2 \cdots g_j
= \frac{1}{\lambda} \left[ \prod_{i=1}^{j} (1 + \lambda g_i) - 1 \right] \qquad (5)

As can be seen, the fuzzy densities g_j are a critical part of the computation of g({x_1, x_2, . . ., x_j}) but are not easily pre-specified. Additionally, since g({x_1, x_2, . . ., x_n}) = 1, λ can be obtained by solving the following polynomial equation of order (n − 1) using the well-known Newton's method [39]:

\frac{1}{\lambda} \left[ \prod_{j=1}^{n} (1 + \lambda g_j) - 1 \right] = 1 \qquad (6)
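The following Python sketch solves Eq. (6) by Newton's method and evaluates Eq. (5) for the prefix sets E_j. The starting point, tolerance, and function names are assumptions, since the paper does not specify them:

```python
import numpy as np

def solve_lambda(g, lam0=1.0, tol=1e-10, max_iter=100):
    """Newton's method for Eq. (6): find lam in (-1, inf), lam != 0, with
    prod_j(1 + lam*g_j) - 1 = lam. lam0, tol, and max_iter are assumed settings."""
    g = np.asarray(g, dtype=float)
    if abs(g.sum() - 1.0) < 1e-12:
        return 0.0                      # additive case: lam = 0 solves Eq. (6)
    lam = lam0
    for _ in range(max_iter):
        prods = 1.0 + lam * g
        F = np.prod(prods) - 1.0 - lam
        # dF/dlam = sum_j g_j * prod_{i != j}(1 + lam*g_i) - 1
        dF = np.sum(g * np.prod(prods) / prods) - 1.0
        lam -= F / dF
        if abs(F) < tol:
            break
    return lam

def g_lambda(g, lam, j):
    """Measure of the prefix set {x_1, ..., x_j} via Eq. (5)."""
    g = np.asarray(g, dtype=float)
    if lam == 0.0:
        return float(g[:j].sum())
    return (np.prod(1.0 + lam * g[:j]) - 1.0) / lam
```

For example, with densities g = [0.2, 0.3, 0.4] (which sum to less than one), solve_lambda returns a positive λ, and g_lambda(g, λ, 3) evaluates to 1, as the boundary condition requires.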

4. Genetics-based learning method

In this section, a GA-based learning method for the proposed perceptron is presented. The purpose of the GA is to design a fuzzy integral-based perceptron with excellent classification performance.

4.1. Encoding parameters

As mentioned above, the selection of 0.5 as the cut value may not be optimal. Thus, an appropriate cut value, say c, should be determined. By comparing the output value y of the input x with c (0 ≤ c ≤ 1), x can be roughly classified into one of the two classes. In fact, some real-world classification problems involve uncertain decisions; that is, the cut value is not the only basis for classifying a pattern into one of the two classes. For instance, the normal body temperature is around 37 °C, say 36.8 °C. In principle, medical treatment for a diagnosis of fever must be given if the body temperature is above 38.5 °C. Thus, if the body temperature is above 36.8 °C but below 38.5 °C (e.g., 38.2 °C), it is not necessary to give the person any medical treatment. This means that tolerance ranges around the cut value should be taken into account.
If y is sufficiently larger than c, then x can be assigned to one class, say class 1; otherwise x can be assigned to the other class, say class 2. Similarly, if y is sufficiently smaller than c, then x can be assigned to class 2; otherwise x can be assigned to class 1. Hence, two thresholds, t_1 and t_2, ranging from zero to one, are needed to classify x; that is, a degree of confidence is taken into account when assigning x to a class. Considering separately the cases y ≥ c and y < c, x is categorized into class 1 if

y \ge c \quad \text{and} \quad y - c \ge t_1 \qquad (7)

or

y < c \quad \text{and} \quad c - y < t_2 \qquad (8)

On the other hand, x is categorized into class 2 if

y \ge c \quad \text{and} \quad y - c < t_1 \qquad (9)

or

y < c \quad \text{and} \quad c - y \ge t_2 \qquad (10)
Y.-C. Hu / Information Sciences 177 (2007) 1673–1686 1677

Since y, obtained from the output of the proposed perceptron, ranges from zero to one, if, for instance, c + t_1 is above 1, then y − c ≥ t_1 cannot hold; in such a case, a pattern is categorized into class 2 whenever y ≥ c. As for t_2, if c ≤ t_2, then a pattern must be categorized into class 1 whenever y < c.
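The four conditions collapse into a compact decision rule. The following sketch (illustrative Python, with return labels 1 and 2 standing for class 1 and class 2) makes the case analysis explicit:

```python
def classify(y, c, t1, t2):
    """Assign a pattern to class 1 or class 2 from its output y (Eqs. (7)-(10))."""
    if y >= c:
        return 1 if y - c >= t1 else 2   # Eq. (7) vs. Eq. (9)
    else:
        return 2 if c - y >= t2 else 1   # Eq. (10) vs. Eq. (8)
```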
Let P_i, N_pop and N_con denote the population generated in the ith generation, the population size, and the total number of generations, respectively. Since w_1, w_2, . . ., w_n, c, t_1, t_2 and θ cannot be pre-specified, these parameters are determined automatically by a GA. The kth chromosome (1 ≤ k ≤ N_pop) in P_i (1 ≤ i ≤ N_con) is represented by the concatenated string w^i_{k1} w^i_{k2} · · · w^i_{kn} c^i_k t^i_{k1} t^i_{k2} θ^i_k. Its length is |w^i_{k1}| + |w^i_{k2}| + · · · + |w^i_{kn}| + |c^i_k| + |t^i_{k1}| + |t^i_{k2}| + |θ^i_k|, where |w^i_{kh}| (1 ≤ h ≤ n), |c^i_k|, |t^i_{k1}|, |t^i_{k2}| and |θ^i_k| denote the lengths of w^i_{kh}, c^i_k, t^i_{k1}, t^i_{k2} and θ^i_k, respectively. Moreover, w_1, w_2, . . ., w_n, c, t_1, t_2 and θ are encoded as widely used binary strings. An example with two weights is depicted in Fig. 2. Note that each population consists of N_pop chromosomes.

The values of |w^i_{kh}|, |c^i_k|, |t^i_{k1}|, |t^i_{k2}| and |θ^i_k| depend on the domain lengths of the variables and the required precision [28]. That is, if the domain of a variable has length s_1, and the required precision is s_2 decimal places, then s_3 bits are required to encode the variable, where 2^{s_3 − 1} < s_1 · 10^{s_2} < 2^{s_3}. Since the above variables range from zero to one (i.e., s_1 = 1.0), if the required precision is three decimal places (i.e., s_2 = 3) for each variable, then 10 bits are required to encode each of them (i.e., s_3 = 10).
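As a small worked example of this bit-length rule (a hypothetical helper, not part of the paper's algorithm):

```python
import math

def bits_required(domain_length, decimal_places):
    """Smallest s3 with 2**(s3 - 1) < domain_length * 10**decimal_places < 2**s3.

    Assumes the target is not an exact power of two, which holds for the
    decimal precisions used here.
    """
    target = domain_length * 10 ** decimal_places
    return math.ceil(math.log2(target))

print(bits_required(1.0, 3))  # -> 10, since 2**9 = 512 < 1000 < 1024 = 2**10
```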

4.2. Population evolution

An initial population containing N_pop chromosomes is generated and inserted into P_0. Each gene in a binary chromosome is randomly assigned one or zero with probability 0.5. After evaluating the fitness value of each chromosome in P_i, the genetic operators (reproduction, crossover, and mutation [10,23]) are iterated until N_pop new chromosomes have been generated in P_{i+1}. When the stopping condition is satisfied, the evolution of the GA terminates. The best chromosome, i.e., the one with the maximum fitness value over all successive generations, is taken as the final solution used to examine the generalization ability of the proposed perceptron; that is, the final solution is not necessarily the best chromosome in the final population. In this paper, the total number of generations (i.e., N_con) is used as the stopping condition.

4.3. Evaluation of each chromosome

Each substring can be directly decoded as a real value ranging from zero to one. For instance, let |w^i_{kh}| = 6. If w^i_{kh} = 100010, then w^i_{kh} decodes to 34/(2^6 − 1) (i.e., approximately 0.54). As mentioned above, given w^i_{k1}, w^i_{k2}, . . ., w^i_{kn}, λ can be obtained by solving Eq. (6) with Newton's method. For each pattern, an output value can then be obtained from the sigmoid function applied to the performance value aggregated by the fuzzy integral. Let CA(w^i_{k1} w^i_{k2} · · · w^i_{kn} c^i_k t^i_{k1} t^i_{k2} θ^i_k) and E(w^i_{k1} w^i_{k2} · · · w^i_{kn} c^i_k t^i_{k1} t^i_{k2} θ^i_k) denote, respectively, the number of correctly classified training patterns and the squared error computed by Eq. (2) with the fuzzy integral. Then the overall performance of the chromosome is evaluated by its fitness value f^i_k, defined as

f^i_k = w_{CA} \cdot CA(w^i_{k1} w^i_{k2} \cdots w^i_{kn} c^i_k t^i_{k1} t^i_{k2} \theta^i_k) - w_V \cdot E(w^i_{k1} w^i_{k2} \cdots w^i_{kn} c^i_k t^i_{k1} t^i_{k2} \theta^i_k) \qquad (11)

where w_CA and w_V are the relative weights of CA(·) and E(·), respectively. The fitness function is thus designed to find a set of connection weights or fuzzy measures for the fuzzy integral-based perceptron with higher classification performance and lower squared error between the actual and desired outputs of the individual training patterns.
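Chromosome decoding and fitness evaluation can be sketched as follows. This is illustrative Python: the helper names and the training-set representation are assumptions, perceptron_output implements the fuzzy-integral aggregation of Eq. (3) directly, and g_lambda, solve_lambda, and classify refer to the earlier sketches:

```python
import numpy as np

def decode(bits):
    """Decode a binary substring into a real value in [0, 1]."""
    return int(bits, 2) / (2 ** len(bits) - 1)   # e.g. decode("100010") ~ 0.54

def perceptron_output(g, lam, h_values, theta):
    """Sigmoid of the fuzzy-integral aggregation (Eq. (3)) minus the bias."""
    n = len(h_values)
    order = sorted(range(n), key=lambda j: h_values[j], reverse=True)
    densities = [g[i] for i in order]
    agg = max(min(h_values[j], g_lambda(densities, lam, r + 1))
              for r, j in enumerate(order))
    return 1.0 / (1.0 + np.exp(-(agg - theta)))

def fitness(chrom_bits, X, d, n, w_ca=1.0, w_v=1.0, bits_per_var=10):
    """Fitness f = w_CA * CA - w_V * E (Eq. (11)) for one chromosome."""
    fields = [chrom_bits[i * bits_per_var:(i + 1) * bits_per_var]
              for i in range(n + 4)]              # w_1..w_n, c, t1, t2, theta
    vals = [decode(b) for b in fields]
    g, (c, t1, t2, theta) = vals[:n], vals[n:]
    lam = solve_lambda(g)
    ca, err = 0, 0.0
    for x, dx in zip(X, d):
        y = perceptron_output(g, lam, list(x), theta)
        err += 0.5 * (dx - y) ** 2
        predicted = classify(y, c, t1, t2)        # class 1 or class 2
        ca += int(predicted == (1 if dx == 1 else 2))
    return w_ca * ca - w_v * err
```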

Fig. 2. The kth binary chromosome in P_i: the concatenated binary substrings w^i_{k1}, w^i_{k2}, c^i_k, t^i_{k1}, t^i_{k2} and θ^i_k.


1678 Y.-C. Hu / Information Sciences 177 (2007) 1673–1686

4.4. Genetic operations

When the fitness values of the individual chromosomes have been obtained, the genetic operators are employed to generate the chromosomes of the next generation. In order to generate new chromosomes, pairs of chromosomes are selected from P_i. Following roulette wheel selection [17], the selection probability of the kth chromosome in P_i is

\Pr(w^i_{k1} w^i_{k2} \cdots w^i_{kn} c^i_k t^i_{k1} t^i_{k2} \theta^i_k) = \frac{f^i_k - \min\{f^i_z \mid z = 1, \ldots, N_{pop}\}}{\sum_{k=1}^{N_{pop}} \left[ f^i_k - \min\{f^i_z \mid z = 1, \ldots, N_{pop}\} \right]} \qquad (12)

N_pop/2 pairs of chromosomes are randomly selected from P_i, and two new chromosomes are generated from each pair by the crossover and mutation operations.
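A minimal sketch of this roulette wheel selection (illustrative Python; the uniform fallback for a population of equal fitnesses is an added safeguard, not something specified in the paper):

```python
import random

def roulette_select(population, fitnesses):
    """Select one chromosome with probability proportional to Eq. (12)."""
    fmin = min(fitnesses)
    weights = [f - fmin for f in fitnesses]
    total = sum(weights)
    if total == 0:                        # all fitnesses equal: pick uniformly
        return random.choice(population)
    r = random.uniform(0, total)
    acc = 0.0
    for chrom, w in zip(population, weights):
        acc += w
        if acc >= r:
            return chrom
    return population[-1]
```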
A GA evolves a population of candidate solutions to the given problem by focusing primarily on crossover [30]. The crossover operation, applied with a pre-specified probability Pr_c, exchanges partial information between the substrings of the two selected parents, and the two newly generated chromosomes replace their parents by being inserted into P_{i+1}. Each crossover point within a substring is chosen randomly; thus, there are (n + 4) crossover points for a selected pair. An example of this multi-point crossover operation is depicted in Fig. 3: a six-point crossover is performed on a selected pair (i.e., w^i_{u1} w^i_{u2} c^i_u t^i_{u1} t^i_{u2} θ^i_u and w^i_{v1} w^i_{v2} c^i_v t^i_{v1} t^i_{v2} θ^i_v, 1 ≤ u, v ≤ N_pop) when two weights are taken into account. The mutation operation is then performed with a pre-specified probability Pr_m on each bit or gene of the newly generated binary chromosomes in P_{i+1}. An example of the mutation operation is illustrated in Fig. 4, where '*' denotes a mutation position. Subsequently, P_{i+1} can inherit the best-performing chromosomes of P_i through the elitist strategy; that is, the best chromosome in P_i is viewed as a kind of elite chromosome that is inserted into P_{i+1} without change [33,35]. In practice, N_del chromosomes (0 ≤ N_del ≤ N_pop) are randomly selected and removed from P_{i+1} and replaced by the N_del strings with the maximum fitness values in P_i.
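The two operators can be sketched as follows. This is one plausible reading, with one random crossover point inside each of the (n + 4) substrings; that interpretation of the multi-point crossover is an assumption based on Fig. 3:

```python
import random

def multipoint_crossover(parent_a, parent_b, bits_per_var, n_vars):
    """Swap the tails of each of the (n + 4) substrings at random points."""
    child_a, child_b = list(parent_a), list(parent_b)
    for v in range(n_vars):               # one crossover point per substring
        start = v * bits_per_var
        point = start + random.randint(1, bits_per_var - 1)
        end = start + bits_per_var
        child_a[point:end], child_b[point:end] = \
            child_b[point:end], child_a[point:end]
    return "".join(child_a), "".join(child_b)

def mutate(chrom, pr_m):
    """Flip each bit independently with probability pr_m."""
    return "".join(b if random.random() >= pr_m else ("1" if b == "0" else "0")
                   for b in chrom)
```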

4.5. Learning algorithm for fuzzy integral-based perceptron

With a GA, the training algorithm of the fuzzy integral-based perceptron for two-class classification prob-
lems is written as follows:

Fig. 3. Six-point crossover: one crossover point per substring of the two selected parents w^i_{u1} w^i_{u2} c^i_u t^i_{u1} t^i_{u2} θ^i_u and w^i_{v1} w^i_{v2} c^i_v t^i_{v1} t^i_{v2} θ^i_v.



Fig. 4. Mutation operation ('*' marks a mutated bit).

Algorithm: Fuzzy integral-based perceptron learning algorithm


Input:
a. Population size: N_pop;
b. Stopping condition: N_con (i.e., total number of generations);
c. Number of elite chromosomes: N_del;
d. Lengths of the binary substrings (i.e., |w^i_{kh}|, |c^i_k|, |t^i_{k1}|, |t^i_{k2}| and |θ^i_k|);
e. Crossover probability: Pr_c;
f. Mutation probability: Pr_m;
g. Relative weights in the fitness function (i.e., w_CA and w_V);
h. A set of training patterns.

Output: A fuzzy integral-based perceptron with higher classification performance and lower squared error
Method:

Step 1. Data normalization
Since the individual performance values (input values) must range from zero to one, each performance value h(x_j) of an input pattern x is normalized as

h(x_j) \leftarrow \frac{h(x_j)}{ma_j}, \quad j = 1, 2, \ldots, n \qquad (13)

where ma_j is the maximum value of the domain interval of x_j.
Step 2. Initialization
Generate the initial population of Npop chromosomes.
Step 3. Compute fitness values
Decode each chromosome in the current population and compute the corresponding fitness value of
each chromosome in the current population.
Step 4. Generate new chromosomes
Generate new Npop chromosomes from the current population using genetic operations.
Step 5. Perform elitist strategy
Randomly select and remove N_del of the newly generated chromosomes from the current population, and insert the N_del chromosomes with the maximum fitness values in the previous population into the current one.
Step 6. Termination test
Terminate the algorithm if N_con generations have been generated; otherwise, return to Step 3.
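Steps 2 through 6 can be tied together in a single loop, sketched below in illustrative Python that reuses the earlier helper sketches (fitness, roulette_select, multipoint_crossover, and mutate); the population is represented as a list of bit strings, and the default parameter values follow Section 5:

```python
import random

def train_fip(X, d, n, n_pop=50, n_con=2000, n_del=2,
              bits_per_var=10, pr_c=0.90, pr_m=0.05):
    """GA loop of Steps 2-6; returns the best chromosome over all generations."""
    length = (n + 4) * bits_per_var
    pop = ["".join(random.choice("01") for _ in range(length))
           for _ in range(n_pop)]                       # Step 2
    best, best_fit = None, float("-inf")
    for _ in range(n_con):                              # Step 6 stopping condition
        fits = [fitness(ch, X, d, n) for ch in pop]     # Step 3
        gen_best = max(range(n_pop), key=lambda k: fits[k])
        if fits[gen_best] > best_fit:
            best, best_fit = pop[gen_best], fits[gen_best]
        new_pop = []
        while len(new_pop) < n_pop:                     # Step 4
            pa = roulette_select(pop, fits)
            pb = roulette_select(pop, fits)
            if random.random() < pr_c:
                pa, pb = multipoint_crossover(pa, pb, bits_per_var, n + 4)
            new_pop += [mutate(pa, pr_m), mutate(pb, pr_m)]
        for idx in random.sample(range(n_pop), n_del):  # Step 5 (elitism)
            new_pop[idx] = pop[gen_best]
        pop = new_pop[:n_pop]
    return best
```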

5. Computer simulations

In this section, the generalization ability of the proposed perceptron is mainly compared with that of the traditional SLP. The generalization ability is examined through computational experiments on five well-known two-class data sets, including the thyroid data, appendicitis data, breast cancer data, Wisconsin breast-cancer data, and Pima Indian diabetes data, available from the UCI machine learning repository at http://www.ics.uci.edu/~mlearn/MLRepository.html.

An advantage of the proposed learning algorithm is that it is simple enough to implement as a computer program, without any statistical assumptions. For instance, the assumption of conditional independence of the observations is not needed for the proposed perceptron. The computer simulations are implemented in Delphi 7.0 on a Pentium 4 personal computer with a clock rate of 1700 MHz.

5.1. Thyroid data

The thyroid data reported by Quinlan [31] concern the problem of determining whether a patient referred to the clinic is hypothyroid. The samples, collected during 1985 and 1986, consist of 3772 and 3428 patterns with 21 features, respectively, and are classified into two classes (i.e., hypothyroid and non-hypothyroid). The former and latter samples are used as training and test cases, respectively. Note that c, t_1 and t_2 are 0.158, 0.567 and 0.602, respectively, for the best chromosome.
With carefully tuned parameters, the best classification accuracy rates obtained by eight non-fuzzy methods, including the linear discriminant, the quadratic discriminant, the nearest neighbor method (i.e., 1-NN) with Euclidean distance, the 1-NN with a normalized distance, the Bayes rule with independence (i.e., Bayes independence), the Bayes rule with second-order considerations, the traditional SLP trained by the BP algorithm, and the two-layer perceptron with a single hidden layer trained by the BP algorithm, were reported by Weiss and Kulikowski [48] and are summarized in Table 1. The BP algorithm with 6000 epochs is employed to train the traditional SLP sufficiently. Considering configurations with different numbers of hidden units, the two-layer perceptron with three hidden units, a learning rate of 0.9, a momentum of 0.5, and 70,000 training epochs has the best generalization ability (i.e., 98.54%).
With regard to the proposed perceptron, the performance of a GA can be influenced by different parameter specifications. Nevertheless, there is no single best set of parameter values [28]. The reasons for the pre-specified parameter values are briefly described below:

(a) N_pop = 50: The most common population size varies from 50 to 500 individuals [28]. Hence, 50 individuals is an acceptable size.
(b) N_con = 2000: The stopping condition is specified according to the available computing time. Moreover, a sufficient evolution of the GA is required.
(c) N_del = 2: To avoid generating much perturbation in the next generation, a small number of elite chromosomes is used.
(d) |w^i_{kh}| = |c^i_k| = |t^i_{k1}| = |t^i_{k2}| = |θ^i_k| = 10, for k = 1, . . ., N_pop: the required precision for each parameter is three decimal places.
(e) Pr_c = 0.90 and Pr_m = 0.05: Usually, a higher value of Pr_c is used, since it allows the exploration of more of the solution space. Furthermore, in order not to generate excessive perturbation, Pr_m should be set to a low value.
(f) w_CA = w_V = 1.0: Since there is no special consideration in setting w_CA and w_V, both are specified as 1.0. That is, the relative degree of importance of the classification performance is equal to that of the squared errors. Of course, the scales of measurement of the classification performance and squared error can also influence the fitness of a string.

Table 1
Classification accuracy rates of different classification methods for the thyroid data

Method                              Accuracy
Linear discriminant                 93.85%
Quadratic discriminant              88.39%
1-NN                                95.27%
1-NN normalized                     95.77%
Bayes independence                  96.06%
Bayes second-order                  92.44%
SLP                                 96.41%
Two-layer perceptron                98.54%
Fuzzy integral-based perceptron     97.87%

As a result, the classification accuracy rate of the proposed perceptron is 97.87%. Except for the two-layer perceptron, the fuzzy integral-based perceptron outperforms the other seven classification methods. Additionally, the classification accuracy rates on the 3772 training patterns for the proposed perceptron and the traditional SLP are 97.53% and 97.40%, respectively; thus, the proposed perceptron slightly outperforms the traditional SLP in terms of fit.

It should be noted that in [48], when the BP algorithm is run with a learning rate of 1, a momentum of 0, and 2000 epochs, the classification rate of the traditional SLP is below 95%, and the best classification rate of the two-layer perceptron is below 97.4%, occurring at nine hidden units. In general, the learning rate and the number of training epochs can affect the training performance of the traditional SLP [48]. Regarding the two-layer perceptron, although it has been shown that a two-layer perceptron with any fixed continuous sigmoidal function is sufficient to approximate any continuous function [7], the number of hidden nodes is also a key factor that can influence the training performance.

5.2. Appendicitis data

The appendicitis data are classified into two classes (i.e., acute appendicitis and other diagnoses) and consist of 106 patterns with seven quantitative features. Marchand et al. [25] used this data set to confirm the diagnosis of acute appendicitis. The generalization ability of the proposed perceptron is examined by the leave-one-out technique: in each iteration, the perceptron is trained on 105 training patterns and the single remaining pattern is tested by the trained perceptron. This procedure is iterated until each of the 106 patterns has been used as a test pattern.
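A leave-one-out loop of this kind can be sketched as follows (illustrative Python; train_fip is the earlier GA sketch, and predict is an assumed helper that decodes the best chromosome and applies Eqs. (7)–(10) to one pattern):

```python
def leave_one_out_accuracy(X, d, n):
    """Train on all patterns but one, test on the holdout; repeat for each pattern."""
    correct = 0
    for i in range(len(X)):
        X_train = [x for j, x in enumerate(X) if j != i]
        d_train = [y for j, y in enumerate(d) if j != i]
        best = train_fip(X_train, d_train, n)
        # predict() is an assumed helper: decode 'best' and classify X[i]
        # exactly as in fitness().
        correct += int(predict(best, X[i], n) == d[i])
    return correct / len(X)
```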
Using the leave-one-out technique, the best classification accuracy rates obtained by nine fuzzy methods and by 10 non-fuzzy methods with carefully tuned parameters were reported by Grabisch and Dispot [12] and by Weiss and Kulikowski [48], respectively. The nine fuzzy classification methods are the fuzzy integral with perceptron criterion, the fuzzy integral with quadratic criterion, fuzzy pattern matching with the minimum operator, fast heuristic search with the Sugeno integral, simulated annealing with the Sugeno integral, the fuzzy k-nearest neighbor, fuzzy c-means, fuzzy c-means for histograms, and hierarchical fuzzy c-means. The 10 non-fuzzy classification methods are the linear discriminant, the quadratic discriminant, the 1-NN with Euclidean distance, the Bayes independence, the Bayes rule with second-order considerations, the two-layer perceptron trained by the BP algorithm, the two-layer perceptron trained by an ordinary differential equation (ODE) solver, the PVM rule, an exhaustive search for the optimal rule of size 2, and the CART tree. The performance of the proposed perceptron is compared with that of the above-mentioned fuzzy and non-fuzzy classification methods, and the classification results are summarized in Table 2. Furthermore, the classification performances of the fuzzy rule-based voting system [15] and the genetics-based learning method [16] are also included in this table.

Table 2
Classification accuracy rates of different classification methods for the appendicitis data

Fuzzy methods
Perceptron criterion                79.2%
Quadratic criterion                 86.8%
Minimum operator                    86.8%
Fast heuristic search               84.9%
Simulated annealing                 81.1%
Fuzzy k-nearest neighbor            86.8%
Fuzzy c-means                       71.2%
Fuzzy c-means for histograms        78.3%
Hierarchical fuzzy c-means          80.2%
Rule-based voting system            89.6%
Genetics-based learning method      84.9%
Fuzzy integral-based perceptron     88.7%

Non-fuzzy methods
Linear discriminant                 86.8%
Quadratic discriminant              73.6%
1-NN                                82.1%
Bayes independence                  83.0%
Bayes second-order                  81.1%
Two-layer perceptron                85.8%
ODE algorithm                       86.8%
PVM rule                            89.6%
Optimal rule with size 2            89.6%
CART tree                           84.9%
SLP                                 85.8%

It is noted that the BP algorithm with 15,000 epochs is utilized to train the traditional SLP and the two-layer perceptron sufficiently. Among the configurations with different numbers of hidden units, the two-layer perceptron with two hidden units has the best generalization ability (i.e., 85.8%).
With the same parameter values as in the computer simulation for the thyroid data, the classification accuracy rate of the proposed perceptron is 88.7%. It can be seen that the proposed perceptron outperforms the traditional SLP. Furthermore, the fuzzy integral-based perceptron is comparable with the other classification methods. Additionally, for simplicity, an observation of c, t_1 and t_2 is obtained by using all patterns as training samples; for the best chromosome, c, t_1 and t_2 are 0.794, 0.372 and 0.255, respectively.

5.3. Breast cancer data

The breast cancer data, provided by Zwitter and Soklic, were obtained from the Institute of Oncology, University Medical Centre, Ljubljana, Yugoslavia. The samples consist of 286 patterns with nine attributes. In a similar manner, c, t_1 and t_2 for the breast cancer data are obtained by using all patterns as training samples; for the best chromosome, c, t_1 and t_2 are 0.498, 0.490 and 0.008, respectively.

Using a random sampling procedure with 50% training patterns and 50% test patterns, Grabisch and Dispot [12] reported the classification rates of the nine fuzzy classification methods mentioned in the previous simulation. The classification rates of the 10 non-fuzzy classification methods mentioned in the case of the appendicitis data, reported by Weiss and Kulikowski [48], were obtained by a random sampling procedure with 70% training patterns and 30% test patterns. The classification results are summarized in Table 3. The classification performance of the fuzzy rule-based voting system [15], obtained by a random sampling procedure with 70% training patterns and 30% test patterns, is also included in Table 3. In [43], the BP with 2000 epochs is employed to train the traditional SLP and the two-layer perceptron; the best two-layer perceptron in terms of resampled accuracy (i.e., 71.0%) occurs at three hidden units.
In order to compare the proposed perceptron with these fuzzy and non-fuzzy classification methods, two different data partitions using resampling techniques are considered in this computer simulation: a 50% training/50% test partition and a 70% training/30% test partition. Since the results of a random sampling procedure may depend on the partition of the data, 10 randomly sampled data sets are created for each partition. With the same parameter values as in the computer simulation for the thyroid data, the average classification accuracy rate of the proposed perceptron on the test patterns over 10 independent trials is 74.8% for the 70%/30% partition and 73.5% for the 50%/50% partition.

Table 3
Classification accuracy rates of different classification methods for the breast cancer data

Fuzzy methods
Perceptron criterion                                60.9%
Quadratic criterion                                 68.0%
Minimum operator                                    55.3%
Fast heuristic search                               59.2%
Simulated annealing                                 65.5%
Fuzzy k-nearest neighbor                            63.7%
Fuzzy c-means                                       54.9%
Fuzzy c-means for histograms                        58.1%
Hierarchical fuzzy c-means                          59.5%
Rule-based voting system                            74.7%
Fuzzy integral-based perceptron (70/30 partition)   74.8%
Fuzzy integral-based perceptron (50/50 partition)   73.5%

Non-fuzzy methods
Linear discriminant                 70.6%
Quadratic discriminant              65.6%
1-NN                                65.3%
Bayes independence                  71.8%
Bayes second-order                  65.6%
Two-layer perceptron                71.0%
ODE algorithm                       72.4%
PVM rule                            77.1%
ASSISTANT tree                      72.0%
CART tree                           77.1%
SLP                                 71.5%

It can be seen that the proposed perceptron outperforms the traditional SLP. Moreover, the fuzzy integral-based perceptron is comparable with the other classification methods.

5.4. Wisconsin breast-cancer data

There are 699 patterns with nine attributes in the Wisconsin breast-cancer data, obtained from the University of Wisconsin Hospitals, Madison [24]. Sixteen patterns that contain missing attribute values are removed. The data set is preprocessed by normalizing the maximum of each attribute to 1. An observation of c, t_1 and t_2 for the Wisconsin breast-cancer data is obtained by using all patterns as training samples; for the best chromosome, c, t_1 and t_2 are 0.617, 0.262 and 0.205, respectively.

The generalization ability of the proposed perceptron is examined by 10-fold cross-validation, also known as the rotation method [20]. In general, 10-fold cross-validation is adequate for estimating the true error rate [48]. In 10-fold cross-validation, all patterns are divided into 10 different subsets; nine subsets are used as training patterns, and the classifier is tested on the single remaining subset. This procedure is repeated until each of the 10 subsets has been tested. Moreover, the 10-fold cross-validation is independently performed 10 times using different partitions of the Wisconsin breast-cancer data.
With the same parameter specifications as in the previous computer simulations, the average result of the proposed perceptron on the test patterns over 10 independent trials of 10-fold cross-validation is 96.38%. Since the decision-tree algorithm C4.5 [32] is one of the best-known and most frequently used classification methods, the performance of the proposed perceptron is mainly compared with the six variants of the C4.5 algorithm proposed by Elomaa and Rousu [8]. These six variants, covering the binary, greedy, and optimal splitting strategies, were tested with three evaluation functions: the information gain function (IG), the gain ratio function (GR), and the balanced gain function (BGlog). The performances are summarized in Table 4. The classification results of the decision tree-based fuzzy classifier [1], the GA-based non-fuzzy classifier [13], and the learning algorithm with the parallel delta rule [3], obtained by 10-fold cross-validation, are also included in Table 4.

The classification rate of the SLP is obtained by using the BP algorithm with a learning rate of 0.9, a momentum of 0.5, and 10,000 training epochs. It can be seen that the classification performances of the decision tree-based fuzzy classifier and the algorithm with the parallel delta rule are slightly superior to that of the fuzzy integral-based perceptron. On the other hand, the proposed perceptron slightly outperforms the traditional SLP, and it performs better than the six variants of the C4.5 algorithm in classifying the test patterns.
Additionally, a statistical hypothesis test is performed to examine whether the classification performance of the proposed perceptron is statistically different from that of the traditional perceptron. The null hypothesis is that the classification performance of the proposed perceptron is equal or inferior to that of the traditional perceptron; the alternative hypothesis is that the classification performance of the proposed perceptron is superior. Since the paired t-test shows that the difference of 0.29% is statistically significant at the 5% level, the null hypothesis is rejected for the Wisconsin breast-cancer data.
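A one-sided paired t-test of this kind can be run as follows (illustrative Python using scipy; the ten accuracy pairs shown are hypothetical placeholders, not the paper's measured values):

```python
from scipy import stats

# Hypothetical accuracies (%) of the proposed perceptron and the SLP
# over 10 independent trials of 10-fold cross-validation.
proposed = [96.4, 96.3, 96.5, 96.2, 96.4, 96.5, 96.3, 96.4, 96.4, 96.4]
slp      = [96.1, 96.0, 96.2, 96.0, 96.1, 96.2, 96.0, 96.1, 96.1, 96.1]

# Two-sided paired t-test, halved for the one-sided alternative
# that 'proposed' is superior to 'slp'.
t_stat, p_two_sided = stats.ttest_rel(proposed, slp)
p_one_sided = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2
print(f"t = {t_stat:.3f}, one-sided p = {p_one_sided:.4f}")
```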

5.5. Pima Indian diabetes data

This data set, obtained from the National Institute of Diabetes and Digestive and Kidney Diseases, USA, contains 768 patterns with eight attributes. The preprocessing of the data set is the same as for the Wisconsin breast-cancer data: the domain of each attribute is normalized to the unit interval [0, 1].

Table 4
Classification accuracy rates of different classification methods for the Wisconsin breast-cancer data

C4.5 variants:
  Binary (GR)       94.7%
  Binary (IG)       94.9%
  Greedy (GR)       94.1%
  Greedy (BGlog)    94.6%
  Optimal (GR)      94.0%
  Optimal (BGlog)   94.7%
SLP                                    96.09%
Decision tree-based fuzzy classifier   96.82%
GA-based non-fuzzy classifier          95.30%
Parallel delta rule                    96.94%
Proposed perceptron                    96.38%
Table 5
Classification accuracy rates of different classification methods for the diabetes data

C4.5 algorithm variants:
  Binary (GR)       72.8%
  Binary (IG)       75.0%
  Greedy (GR)       73.2%
  Greedy (BGlog)    74.3%
  Optimal (GR)      73.2%
  Optimal (BGlog)   74.3%
SLP                                    73.63%
Decision tree-based fuzzy classifier   73.05%
Parallel delta rule                    73.66%
Proposed perceptron                    74.81%

The corresponding values of c, t_1 and t_2 for the diabetes data are observed by using all patterns as training samples; for the best chromosome, c, t_1 and t_2 are 0.208, 0.243 and 0.063, respectively.
The parameter specifications used for the diabetes data are the same as in the previous computer simulations. As a result, the average classification rate of the proposed perceptron on the test patterns over 10 independent trials of 10-fold cross-validation is 74.81%. The performance of the fuzzy integral-based perceptron is also compared with that of the six variants of the C4.5 algorithm. The performances of these algorithms are summarized in Table 5, together with the classification results of the decision tree-based fuzzy classifier and the learning algorithm with the parallel delta rule obtained by 10-fold cross-validation. As for the Wisconsin breast-cancer data, the classification rate of the SLP is obtained by using the BP algorithm with a learning rate of 0.9, a momentum of 0.5, and 10,000 training epochs. From the test results, it is seen that the proposed perceptron outperforms the traditional SLP. Moreover, the proposed perceptron is comparable to the classification methods mentioned above. A statistical hypothesis test is also performed: with the same hypotheses as in the computer simulation for the Wisconsin breast-cancer data, the paired t-test shows that the difference of 1.18% is statistically significant at the 5% level. Hence, the null hypothesis is rejected for the diabetes data.
As seen from Tables 1–5, the fuzzy integral-based perceptron performs well in comparison with the other fuzzy and non-fuzzy classification methods. Additionally, it should be noted that λ is below zero in the best solution found for each data set (i.e., the thyroid, appendicitis, breast cancer, Wisconsin breast-cancer, and diabetes data). In other words, there exists a substitutive effect in the interaction among the attributes. Thus, it is reasonable to incorporate the fuzzy integral with a λ-fuzzy measure into a perceptron in place of the inner product.

6. Discussion and conclusions

An SLP with a single output node for two-class problems is explored in this paper. First, a fuzzy integral-based activation function for the output node of a perceptron is considered. The main purpose is to demonstrate that the interaction among individual input features should not be ignored. A λ-fuzzy measure is used with the fuzzy integral to obtain aggregated performance values of the individual features, since both the fuzzy integral and the fuzzy measure consider the implicit interrelation among elements. In particular, a fuzzy density corresponds exactly to a connection weight and can be interpreted as the degree of importance of the corresponding feature. Second, a GA is designed to determine the fuzzy measures, bias, cut value, and thresholds automatically. The thresholds determine whether a pattern can be assigned to one of the two classes with sufficient certainty.

As shown in the experiment on the thyroid data, in order to obtain the smallest predictive error rate on different problems with a traditional SLP, the learning rate and the number of training epochs must be carefully pre-specified. In the computer simulations, however, no customized parameter tuning is performed for the proposed perceptron learning algorithm; the parameter specification for the proposed perceptron does not appear to be a serious problem. It is particularly stressed that the aim of incorporating the fuzzy integral into the SLP is to replace the inner product operation with a reasonable non-additive technique, rather than to produce the best classifier among all fuzzy and non-fuzzy classification methods; in fact, there is no such thing as the "best" classifier [20]. Furthermore, it can be seen that the fuzzy integral-based perceptron is comparable to other fuzzy and non-fuzzy classification methods. This may indicate that the proposed perceptron has the potential to solve real problems, such as medical diagnosis and bankruptcy prediction.
Although the following issues are not the focus of this paper, they may be considered in future studies to propose novel models:

a. The fuzzy integral proposed by Sugeno is computed with the max and min operators. In fact, other fuzzy aggregators, such as the algebraic product [51], may be useful. In order to perform "soft" aggregation that is free from the bias effect of extreme maximum or minimum performance values, Yager [49,50] proposed two special ordered weighted averaging operators, S-OWA-OR and S-OWA-AND, to replace max and min, respectively. It would be possible to apply various operators to the fuzzy integral and analyze the individual classification performances.
b. Only two-class problems are addressed in this paper, but the possibility of extending the current work to multiple classes should be analyzed. A possible method is to set up an SLP with l output nodes if there are l classes; that is, one net is responsible for one class.
c. The fuzzy integral-based activation function may be applied to neurons in other neural networks, such as the multi-layer perceptron or the functional-link net [29].
d. The main limitation of the proposed perceptron comes from the nature of a GA. When a GA is used to solve problems, a number of factors, such as population size, number of generations, crossover rate, and mutation rate, must be determined subjectively [28,33]. In general, there are no optimal values for these parameters, and the setting depends mainly on the application context. In future studies, appropriate parameter values for a given real problem would be worth analyzing in order to obtain better solutions.

Acknowledgments

The author would like to thank the anonymous referees for their valuable comments.

References

[1] J. Abonyi, J.A. Roubos, F. Szeifert, Data-driven generation of compact, accurate, and linguistically-sound fuzzy classifiers based on a decision-tree initialization, International Journal of Approximate Reasoning 32 (1) (2003) 1–21.
[2] B. Andrea, M. Sergio, T. Pietro, Neural Networks for Economic and Financial Modelling, International Thomson Computer Press, London, 1996.
[3] P. Auer, H. Burgsteiner, W. Maass, Reducing communication for distributed learning in neural networks, Lecture Notes in Computer Science, vol. 2415, Springer-Verlag, Heidelberg, 2002, pp. 123–128.
[4] T.Y. Chen, H.L. Chang, G.H. Tzeng, Using fuzzy measures and habitual domains to analyze the public attitude and apply to the gas taxi policy, European Journal of Operational Research 137 (1) (2002) 145–161.
[5] H.K. Chiou, G.H. Tzeng, Fuzzy multicriteria decision-making approach to analysis and evaluation of green engineering for industry, Environmental Management 30 (6) (2002) 816–830.
[6] S.B. Cho, J.H. Kim, Multiple network fusion using fuzzy logic, IEEE Transactions on Neural Networks 6 (2) (1995) 497–501.
[7] G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals, and Systems 2 (1989) 303–314.
[8] T. Elomaa, J. Rousu, General and efficient multisplitting of numerical attributes, Machine Learning 36 (3) (1999) 201–244.
[9] R. Eric, S. Seema, Bankruptcy prediction by neural network, in: R.R. Trippi, E. Turban (Eds.), Neural Networks in Finance and Investing: Using Artificial Intelligence to Improve Real-World Performance, Irwin, Chicago, 1996, pp. 243–259.
[10] D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, 1989.
[11] M. Grabisch, Fuzzy integral in multicriteria decision-making, Fuzzy Sets and Systems 69 (3) (1995) 279–298.
[12] M. Grabisch, F. Dispot, A comparison of some methods of fuzzy classification on real data, in: Proceedings of the 2nd International Conference on Fuzzy Logic and Neural Networks, Iizuka, Japan, 1992, pp. 659–662.
[13] S.U. Guan, F. Zhu, Class decomposition for GA-based classifier agents – a Pitt approach, IEEE Transactions on Systems, Man, and Cybernetics 34 (1) (2004) 381–392.
[14] J. Hertz, A. Krogh, R.G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, Redwood, 1991.
[15] H. Ishibuchi, T. Nakashima, T. Morisawa, Voting in fuzzy rule-based systems for pattern classification problems, Fuzzy Sets and Systems 103 (2) (1999) 223–238.
[16] H. Ishibuchi, T. Nakashima, T. Murata, Performance evaluation of fuzzy classifier systems for multidimensional pattern classification problems, IEEE Transactions on Systems, Man, and Cybernetics 29 (5) (1999) 601–618.
[17] H. Ishibuchi, T. Nakashima, T. Murata, Three-objective genetics-based machine learning for linguistic rule extraction, Information Sciences 136 (2001) 109–133.
[18] D. Kim, S.Y. Bang, A handwritten numeral character classification using tolerant rough set, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (9) (2000) 923–937.
[19] A.S. Kumar, S.K. Basu, K.L. Majumdar, Robust classification of multispectral data using multiple neural networks and fuzzy integral, IEEE Transactions on Geoscience and Remote Sensing 35 (3) (1997) 787–790.
[20] L.I. Kuncheva, Fuzzy Classifier Design, Physica-Verlag, Heidelberg, 2000.
[21] K.M. Lee, H. Leekwang, Identification of λ-fuzzy measure by genetic algorithms, Fuzzy Sets and Systems 75 (1995) 301–309.
[22] Y.K. Liu, B. Liu, Fuzzy random programming with equilibrium chance constraints, Information Sciences 170 (2–4) (2005) 363–395.
[23] K.F. Man, K.S. Tang, S. Kwong, Genetic Algorithms: Concepts and Designs, Springer, London, 1999.
[24] O.L. Mangasarian, W.H. Wolberg, Cancer diagnosis via linear programming, SIAM News 23 (5) (1990) 1–18.
[25] A. Marchand, F. Van Lente, R. Galen, The assessment of laboratory tests in the diagnosis of acute appendicitis, American Journal of Clinical Pathology 80 (3) (1983) 369–374.
[26] P. Meyer, M. Roubens, On the use of the Choquet integral with fuzzy numbers in multiple criteria decision support, Fuzzy Sets and Systems 157 (7) (2006) 927–938.
[27] T. Onisawa, M. Sugeno, M.Y. Nishiwaki, H. Kawai, Y. Harima, Fuzzy measure analysis of public attitude towards the use of nuclear energy, Fuzzy Sets and Systems 20 (1986) 259–289.
[28] A. Osyczka, Evolutionary Algorithms for Single and Multicriteria Design Optimization, Physica-Verlag, New York, 2002.
[29] Y.H. Pao, Adaptive Pattern Recognition and Neural Networks, Addison-Wesley, Reading, 1989.
[30] M. Pelikan, D.E. Goldberg, S. Tsutsui, Getting the best of both worlds: discrete and continuous genetic and evolutionary algorithms in concert, Information Sciences 156 (3–4) (2003) 147–171.
[31] J. Quinlan, Simplifying decision trees, International Journal of Man-Machine Studies 27 (1987) 221–234.
[32] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, 1993.
[33] A.J.F. van Rooij, L.C. Jain, R.P. Johnson, Neural Network Training Using Genetic Algorithms, World Scientific, Singapore, 1996.
[34] L.M. Salchenberger, E.M. Cinar, N.A. Lash, Neural networks: a new tool for predicting thrift failures, Decision Sciences 23 (4) (1992) 899–916.
[35] P. Siarry, F.A. Guely, A genetic algorithm for optimizing Takagi–Sugeno fuzzy rule bases, Fuzzy Sets and Systems 99 (1998) 37–47.
[36] K.A. Smith, J.N.D. Gupta, Neural Networks in Business: Techniques and Applications, Idea Group, Hershey, PA, 2002.
[37] M. Sugeno, Theory of Fuzzy Integrals and Its Applications, Ph.D. thesis, Tokyo Institute of Technology, Tokyo, 1974.
[38] M. Sugeno, Fuzzy measures and fuzzy integrals: a survey, in: M.M. Gupta, G.N. Saridis, B.R. Gaines (Eds.), Fuzzy Automata and Decision Processes, North-Holland, New York, 1977, pp. 89–102.
[39] H. Tahani, J.M. Keller, Information fusion in computer vision using the fuzzy integral, IEEE Transactions on Systems, Man, and Cybernetics 20 (1990) 733–741.
[40] H.H. Tsai, I.Y. Lu, The evaluation of service quality using generalized Choquet integral, Information Sciences 176 (6) (2006) 640–663.
[41] F.M. Tseng, Y.J. Chiu, Hierarchical fuzzy integral stated preference method for Taiwan's broadband service market, Omega 33 (2005) 55–64.
[42] F.M. Tseng, C.Y. Yu, Partitioned fuzzy integral multinomial logit model for Taiwan's internet telephony market, Omega 33 (3) (2005) 267–276.
[43] G.H. Tzeng, Y.P. Ou Yang, C.T. Lin, C.B. Chen, Hierarchical MADM with fuzzy integral for evaluating enterprise intranet web sites, Information Sciences 169 (3–4) (2005) 409–426.
[44] E. Vonk, L.C. Jain, R.P. Johnson, Automatic Generation of Neural Network Architecture Using Evolutionary Computation, World Scientific, Singapore, 1997.
[45] D. Wang, J.M. Keller, C.A. Carson, K.K. McAdoo-Edwards, C.W. Bailey, Use of fuzzy-logic-inspired features to improve bacterial recognition through classifier fusion, IEEE Transactions on Systems, Man, and Cybernetics 28 (4) (1998) 583–591.
[46] Z. Wang, G.J. Klir, Fuzzy Measure Theory, Plenum Press, New York, 1992.
[47] Z. Wang, K.S. Leung, G.J. Klir, Applying fuzzy measures and nonlinear integrals in data mining, Fuzzy Sets and Systems 156 (3) (2005) 371–380.
[48] S.M. Weiss, C.A. Kulikowski, Computer Systems that Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems, Morgan Kaufmann, San Mateo, 1991.
[49] R.R. Yager, On ordered weighted averaging aggregation operators in multi-criteria decision making, IEEE Transactions on Systems, Man, and Cybernetics 18 (1988) 183–190.
[50] R.R. Yager, Element selection from a fuzzy subset using the fuzzy integral, IEEE Transactions on Systems, Man, and Cybernetics 23 (1993) 467–477.
[51] H.-J. Zimmermann, Fuzzy Set Theory and Its Applications, Kluwer, Boston, 1996.
