
Neural Networks 15 (2002) 909–925

www.elsevier.com/locate/neunet

An intelligent sales forecasting system through integration of artificial neural networks and fuzzy neural networks with fuzzy weight elimination

R.J. Kuo a,*, P. Wu b, C.P. Wang c

a Department of Industrial Engineering, National Taipei University of Technology, No. 1, Section 3, Chung-Hsiao East Road, Taipei 106, Taiwan, ROC
b Department of Industrial Engineering and Management, I-Shou University, Kaohsiung County 840, Taiwan, ROC
c Graduate School of Management Science, I-Shou University, Kaohsiung County 840, Taiwan, ROC
Received 3 August 1999; revised 10 May 2002; accepted 10 May 2002

Abstract

Sales forecasting plays a very prominent role in business strategy. Numerous investigations addressing this problem have generally employed statistical methods, such as regression or autoregressive and moving average (ARMA) models. However, sales forecasting is very complicated owing to the influence of internal and external environments. Recently, artificial neural networks (ANNs) have also been applied in sales forecasting, given their promising performance in the areas of control and pattern recognition. However, further improvement is still necessary, since unique circumstances, e.g. promotions, cause sudden changes in the sales pattern. Thus, this study utilizes a proposed fuzzy neural network (FNN), which is able to eliminate unimportant weights, to learn the fuzzy IF–THEN rules obtained from marketing experts with respect to promotion. The result from the FNN is further integrated with the time series data through an ANN. Both the simulated and real-world results show that the FNN with weight elimination achieves a lower training error than the regular FNN. Besides, the real-world results also indicate that the proposed forecasting system outperforms the conventional statistical method and a single ANN in accuracy. © 2002 Elsevier Science Ltd. All rights reserved.

Keywords: Sales forecasting; Artificial neural networks; Fuzzy neural networks; Fuzzy weight elimination

1. Introduction

To enhance the commercial competitive advantage in a constantly fluctuating environment, an organization's management must make the right decisions in time, depending on the information at hand. However, the decision lead time ranges from several years to several hours based on the type of business. Thus, making an accurate decision in time plays a prominent role. Intuitively, historical data can provide a feasible estimate through forecasting models. Therefore, if the marketing department can estimate the sales quantity for the next period, the materials department can then effectively control the inventory to achieve just-in-time (JIT). In addition, the production department can make the scheduling and arrange the facility utilization. Such an action may cause the production cost to decrease. Thus, obtaining an accurate forecast is critical. Statistical methods, such as regression models and ARMA, have been the candidates for decision makers. However, these methods are only efficient for data which are seasonal or cyclical. If the data are influenced by a special case, like promotion, they are not feasible. Though artificial neural networks (ANNs), which are better than the conventional statistical methods (Agrawal & Schorling, 1997; Chakraborty, Mehrotra, & Mohan, 1992; Kumar, Rao, & Soni, 1995; Lachtermacher & Fuller, 1995), have recently been employed, the problem still arises. These studies do put the promotion into consideration; nonetheless, it is treated in a simple way, e.g. as one input of the ANN. Thus, this study first attempts to propose a fuzzy neural network (FNN) which is able to learn the fuzzy IF–THEN rules obtained from the marketing experts with respect to the promotion. Though the FNN concept and learning algorithm were first presented by Lin and Lee (1991), their model handles only real inputs and real outputs. The FNN proposed in this study is not only able to learn the fuzzy IF–THEN rules (or fuzzy inputs and outputs), but also possesses fuzzy weights. In addition, it can also eliminate the unimportant weights during training. This can lead to better convergence according to the simulated results in the present study.

* Corresponding author. Tel.: 886-2-2771-2171x2341; fax: 886-2-2731-7168.
E-mail address: rjkuo@ntut.edu.tw (R.J. Kuo).

0893-6080/02/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved.
PII: S0893-6080(02)00064-3

Nomenclature

p: the sample number
$X_p$: the input vector of sample p
$T_p$: the target vector of sample p
$O_{pk}$: the output of the kth output node
$O_{ph}$: the output of the hth hidden node
$W_{ih}$: the connection weight from the ith input node to the hth hidden node
$W_{hk}$: the connection weight from the hth hidden node to the kth output node
$Net_{pk}$: the net internal activity level of the kth output node
$Net_{ph}$: the net internal activity level of the hth hidden node
$\Theta_j$: the bias of the jth output node
$E_p$: the cost function for sample p
$E_{p\alpha}$: the cost function of the α-level cut set for sample p
$E^L_{pk\alpha}$: the cost function of the lower boundary of the α-level cut set for sample p
$E^U_{pk\alpha}$: the cost function of the upper boundary of the α-level cut set for sample p
$\tilde{X}_p$: the fuzzy input for sample p
$\tilde{O}_p$: the fuzzy output for sample p
$\tilde{W}_{ih}, \tilde{W}_{hk}$: the fuzzy weights
$\tilde{\Theta}_h, \tilde{\Theta}_k$: the fuzzy biases
$\eta$: the learning rate
$\beta$: the momentum term
$[\alpha]^L, [\alpha]^U$: the lower limit and the upper limit of the α-cut of a fuzzy number

The main reason to propose such an FNN is that the promotion effect on sales is always very vague, or fuzzy.

In addition to obtaining the promotion effect on sales, it is also necessary to provide the forecast of the sales. Therefore, this study also aims to develop an intelligent sales forecasting system, which consists of three parts: (1) data collection, (2) special pattern model (FNN), and (3) decision integration (ANN). To evaluate the proposed system, real-world data provided by a well-known convenience store (CVS) company in Taiwan are used, while the promotion effect is obtained by surveying experts in retailing. According to these results, the proposed system performs more accurately than the conventional statistical method and a single ANN, particularly when a promotion is conducted.

The rest of this paper is organized as follows. Section 2 provides some necessary background information, while the proposed system is discussed in Section 3. Section 4 presents the simulation results of the FNN, while the evaluation results are summarized in Section 5. Discussion and concluding remarks are finally made in Sections 6 and 7, respectively.

2. Background

In this section, sales forecasting systems and applications of artificial neural networks in sales forecasting are briefly reviewed. In addition, fuzzy neural networks are also discussed in the following.

2.1. Artificial neural networks in sales forecasting

In an enterprise's decision support system, sales forecasting always plays a prominent role. An accurate sales forecast made in advance is able to help the decision maker calculate production and materials costs, and even determine the sale price (LeVee, 1992-1993). This will lead to a lower inventory level and achieve the objective of just-in-time. Among the conventional sales forecasting methods (Chase, 1993; Florance & Sawicz, 1993; Meyer, 1993), most used either factors or time series data to determine the forecast. However, the relationship between the factors or the past time series data (independent variables) and the sales (dependent variable) is always quite complicated. Obtaining promising results through the above-mentioned approaches is quite difficult. Therefore, various decision makers prefer using their own intuition instead of model-based approaches (i.e. time series or regression models). However, a model-free approach, the ANN, has recently been applied in the area of forecasting owing to its adequate performance in control and pattern recognition.

Artificial neural network (ANN) models are built on networks of processing units called neurons that are arranged in layers and are connected to one another by restricted links. Links between neurons have associated weights. Many studies have attempted to apply ANNs to time-series forecasting. However, their conclusions are often contradictory. Some studies found that ANNs are better than conventional methods (Weigend, Rumelhart, & Huberman, 1991), while others reached the opposite conclusion (Tang, Almeida, & Fishwick, 1991).
A weight-elimination back-propagation learning procedure to effectively deal with the overfitting problem was introduced by Weigend et al. (1991). It was also applied to sunspots and an exchange rate time series. Tang et al. (1991) compared the ANN and Box-Jenkins models using international airline passenger traffic, domestic car sales, and foreign car sales in the USA. They concluded that the Box-Jenkins models outperformed the ANN models in short-term forecasting. On the other hand, the ANN models outperformed the Box-Jenkins models in long-term forecasting. In order to predict the flour prices in three cities in the USA, Chakraborty et al. (1992) presented an ANN approach to multivariate time-series analysis. They showed that the result is quite accurate. According to their results, the ANN approach is a leading contender among statistical modeling approaches. Lachtermacher and Fuller (1995) developed a calibrated ANN model. The model used Box-Jenkins methods to identify the lag components of the data, which should be used as input variables. In addition, it employed a heuristic to suggest the number of hidden units needed in structuring the model. In examining stationary series, they observed that the calibrated ANN models have only a slightly better overall performance than the conventional time-series methods used in the benchmark. In the case of non-stationary series, the calibrated ANN models outperformed the ARMA model for three of the four series, and performed almost as well as the ARMA for the fourth series. The above survey indicates that the ANN is appropriate for time series data. Ansuj, Camargo, Radharamanan, and Petry (1996) compared the time series model with interventions and the ANN model in analyzing the behavior of sales in a medium-size enterprise. The results showed that the ANN model is more accurate. Kumar et al. (1995) found that the ANN does quite well compared to logistic regression in predicting a dichotomous choice in the presence of several independent variables. However, considering only time series data may result in a worse forecast. Including both the time series data and factors in the forecasting model seems preferable.

Recently, Bigus (1996) used promotion, time of year, end-of-month flag, and weekly sales as inputs for the ANN in order to forecast the weekly demand. The results seem very promising. Agrawal and Schorling (1997) also have shown that the ANN is able to predict brand shares quite well even when price promotions, feature, and display are present in the data set.

2.2. Fuzzy neural networks

ANNs and the fuzzy model have been used in many application areas (Lee, 1990; Lippmann, 1987; Zadeh, 1973), each pairing its own advantages and disadvantages. Therefore, how to combine these two approaches successfully has become a relevant concern of further studies.

Two major directions have recently received much interest: (1) fusing ANNs and fuzzy logic and (2) integrating ANNs and fuzzy logic. In the first one, the traditional fuzzy system mentioned above is based on experts' knowledge. However, it is not very objective. Besides, acquiring robust knowledge and finding available human experts are extremely difficult. Now, the ANN's learning algorithm has been applied to enhance the performance of a fuzzy system and has been demonstrated to be an innovative approach. In addition, fuzzy IF–THEN rules were generated and adjusted by learning methods using numerical data. Takagi and Hayashi (1991) introduced a feedforward ANN into fuzzy inference. An ANN represents a rule, while all the membership functions are represented by only one ANN. Jang (1991, 1992) and Jang and Sun (1993) proposed a method which transforms the fuzzy inference system into a functionally equivalent adaptive network, and then employs the EBP-type algorithm to update the premise parameters and a least squares method to identify the consequence parameters. Meanwhile, Fukuda and Shibata (1992), Shibata, Fukuda, Kosuge, and Arai (1992) and Wang and Mendel (1992) also presented similar methods. Nakayama, Horikawa, Furuhashi, and Uchikawa (1992) proposed a so-called FNN which has a special structure for realizing a fuzzy inference system. Each membership function consists of one or two sigmoid functions for each inference rule. Lin and Lee (1991) proposed the so-called Neural-Network-Based Fuzzy Logic Control System (NN-FLCS). They introduced the low-level learning power of neural networks into the fuzzy logic system and provided high-level, human-understandable meaning to the normal connectionist architecture. In addition, Kuo (1994) and Kuo and Cohen (1998, 1999) introduced a feedforward ANN into fuzzy inference represented by the Takagi-Sugeno model.

The above-mentioned FNNs are only appropriate for numerical data. However, experts' knowledge is always of the fuzzy type. Thus, some researchers have attempted to address this problem. Gupta and Knopf (1990) and Gupta and Qi (1991, 1992) presented some models with fuzzy neurons, but no learning algorithms were proposed. However, in a series of papers as cited in a survey paper (Buckley & Hayashi, 1994), the authors discussed the learning algorithms and applications of fuzzy neural networks with fuzzy inputs, weights, and outputs (Buckley & Hayashi, 1992; Hayashi, Buckley, & Czogula, 1993). Ishibuchi, Kwon and Tanaka (1995a) and Ishibuchi, Okada, Fujioka, and Tanaka (1993) also proposed learning methods for neural networks that utilize not only numerical data but also expert knowledge represented by fuzzy IF–THEN rules. Lin (1995) and Lin and Lu (1995) also presented an FNN capable of handling both fuzzy inputs and outputs. Based on Ishibuchi's work, Kuo and Xue (1999) presented an FNN which possesses not only asymmetric fuzzy inputs and outputs, but also asymmetric fuzzy weights. Moreover, a genetic algorithm was integrated with the proposed FNN in order to yield better results in both speed and accuracy (Kuo, Chen, & Hwang, 2001).
Fig. 1. The architecture of the proposed forecasting system.

3. Methodology

Section 2 has emphasized the relevance of sales forecasting as well as some necessary background information. Though studies like Agrawal and Schorling (1997) and Bigus (1996) have put the promotion effect on sales into consideration, they still use a straightforward model. It is necessary to develop a more robust approach to handle the promotion effect on sales and then input it to the ANN. The proposed system is discussed in more detail in the following.

The proposed intelligent forecasting system consists of (1) data collection, (2) a special pattern model (FNN), and (3) decision integration (ANN). Fig. 1 shows the proposed system architecture. The system determines the qualitative factors affecting the sales first. Thereafter, this effect is integrated with the time series data through a feedforward neural network with the error back-propagation (EBP) learning algorithm. Each part is thoroughly discussed in the following subsections.

3.1. Data collection

The current study requires two different kinds of data: quantitative and qualitative. The CVS (convenience store) franchise company can provide the daily sales data needed, while the promotion effect on sales can be obtained by questionnaire. This study employs a fuzzy questionnaire to obtain the fuzzy IF–THEN rules from the domain experts.

3.2. FNN architecture

This section discusses how to use the FNN to effectively handle the circumstance of promotion. Since the FNN architecture is based on fuzzy logic, which possesses both a precondition and a consequence, the precondition variables represent the effective factors while the sales represent the consequence variable. First, the data and IF–THEN rules are obtained through the fuzzy Delphi method. After this procedure, the collected data can be applied to train the proposed FNN. The structure of the FNN presented in this study is similar to that of Ishibuchi et al. (1995a). The main difference is that the network employs asymmetric bell-shaped instead of triangular fuzzy weights. In addition, the network can eliminate the unimportant weights during training. In the following, the two components, fuzzy Delphi and FNN, are discussed in more detail.

3.2.1. Fuzzy Delphi

The Delphi method was first developed by Dalkey and Helmer (1963) at the RAND Corporation. This approach has been widely applied in many management areas, e.g. forecasting, public policy analysis, or project planning. However, the conventional Delphi method cannot converge very well. Besides, high survey frequencies always result in high costs. Thus, Ishikawa, Amagasa, Tomiqawa, Tatsuta, and Mieno (1993) utilized fuzzy sets theory in the Delphi method to resolve the above shortcomings. However, the method proposed by Ishikawa et al. (1993) is inappropriate
for this research. Therefore, the procedures of the modified fuzzy Delphi method for this research are as follows:

a. Collect all the possible factors which may affect the sales, and sort and group them in order to formulate the first questionnaire. The domain experts select the important factors and give each a fuzzy number.
b. Formulate the second questionnaire, which is a set of IF–THEN rules based on the three dimensions.
c. Fuzzify the returned second questionnaires from the senior managers and determine the pessimistic index, optimistic index and average index. The formulations are as follows:

1. Pessimistic (minimum) index:

$l = \dfrac{l_1 + l_2 + \cdots + l_n}{n}$  (1)

where $l_i$ is the pessimistic index of the ith expert and n is the number of experts.

2. Optimistic (maximum) index:

$u = \dfrac{u_1 + u_2 + \cdots + u_n}{n}$  (2)

where $u_i$ is the optimistic index of the ith expert.

3. Average (most appropriate) index: for each interval $[l_i, u_i]$, calculate the midpoint $m_i = (l_i + u_i)/2$ and then find

$m = (m_1 + m_2 + \cdots + m_n) \cdot \dfrac{1}{n}$  (3)

Thereafter, the fuzzy number $\tilde{A} = (m, \sigma^R, \sigma^L)$, which represents the mean, right width, and left width, respectively, of an asymmetric bell-shaped function, can be determined through the above indices:

$\sigma^R = \dfrac{l - m}{3}$  (4)

$\sigma^L = \dfrac{u - m}{3}$  (5)

The asymmetric bell-shaped function is defined as a bell-shaped function with different left and right widths.

d. Formulate the third questionnaire with the above indices and repeat the survey.
e. Employ the dissemblance index rule (Kaufmann & Gupta, 1985) to examine the second and third questionnaires. If all the fuzzy numbers have converged, stop the survey; otherwise, go back to step d.
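As a concrete illustration of step c and Eqs. (1)-(5), the following is a minimal Python sketch of the index aggregation. The function and variable names are our own, not part of the original system, and since Table 12 reports positive widths, the widths are taken here as magnitudes of the printed formulas.

```python
import numpy as np

def fuzzy_delphi_number(lower, upper):
    """Aggregate per-expert interval estimates into an asymmetric
    bell-shaped fuzzy number (m, sigma_R, sigma_L), Eqs. (1)-(5)."""
    lower = np.asarray(lower, dtype=float)   # pessimistic indices l_i
    upper = np.asarray(upper, dtype=float)   # optimistic indices u_i
    l = lower.mean()                         # Eq. (1)
    u = upper.mean()                         # Eq. (2)
    m = ((lower + upper) / 2).mean()         # Eq. (3): mean of midpoints
    sigma_r = abs(l - m) / 3                 # Eq. (4), taken as magnitude
    sigma_l = abs(u - m) / 3                 # Eq. (5), taken as magnitude
    return m, sigma_r, sigma_l

# Hypothetical example: five experts rate one promotion event
m, sr, sl = fuzzy_delphi_number([8.2, 8.5, 8.4, 8.8, 8.6],
                                [9.0, 9.2, 8.9, 9.4, 9.1])
```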
3.2.2. FNN

Section 3.2.1 has determined the shape of each membership function. However, most FNNs can handle only real inputs and outputs, except those of Ishibuchi et al. (1995a), Ishibuchi, Morioka, and Turksen (1995b), Ishibuchi and Tanaka (1991), Lin (1995) and Lin and Lu (1995). Thus, this component intends to modify Ishibuchi's work (1995a). In Ishibuchi's work, the input, weight, and output fuzzy numbers are symmetric triangular. Thus, this paper replaces the triangular fuzzy numbers with asymmetric Gaussian functions, since this can speed up the convergence (Kuo & Xue, 1999). The input-output relation of the proposed FNN is discussed in the following. However, the operations of fuzzy numbers are presented first.

Operations of fuzzy numbers. Before describing the FNN architecture, fuzzy numbers and fuzzy number operations are defined by the extension principle. In the proposed algorithm, real numbers and fuzzy numbers are denoted by lowercase letters (e.g. a, b, ...) and uppercase letters with a tilde placed over them (e.g. $\tilde{A}, \tilde{B}, \ldots$), respectively.

Since the input vectors, connection weights and output vectors of the multi-layer feedforward neural network are fuzzified in the proposed FNN, the addition, multiplication and non-linear mapping of fuzzy numbers are necessary for defining the proposed FNN. Thus, they are defined as follows:

$\tilde{Z}(z) = (\tilde{X} + \tilde{Y})(z) = \max\{\tilde{X}(x) \wedge \tilde{Y}(y) \mid z = x + y\},$  (6)

$\tilde{Z}(z) = (\tilde{X} \cdot \tilde{Y})(z) = \max\{\tilde{X}(x) \wedge \tilde{Y}(y) \mid z = xy\},$  (7)

$f(\widetilde{Net})(z) = \max\{\widetilde{Net}(x) \mid z = f(x)\},$  (8)

where $\tilde{X}, \tilde{Y}, \tilde{Z}$ and $\widetilde{Net}$ are fuzzy numbers, $\tilde{X}(\cdot)$ denotes the membership function of each fuzzy number, $\wedge$ is the minimum operator, and $f(x) = (1 + \exp(-x))^{-1}$ is the activation function of the hidden units and output units of the proposed FNN. The α-cut of a fuzzy number $\tilde{X}$ is defined as

$\tilde{X}[\alpha] = \{x \mid \tilde{X}(x) \ge \alpha,\ x \in \mathbb{R}\} \quad \text{for } 0 < \alpha \le 1,$

where $\tilde{X}[\alpha]$ is the interval $[\tilde{X}[\alpha]^L, \tilde{X}[\alpha]^U]$, and $\tilde{X}[\alpha]^L$ and $\tilde{X}[\alpha]^U$ are the lower boundary and the upper boundary of the α-cut set $\tilde{X}[\alpha]$, respectively.
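On α-cuts, Eqs. (6)-(8) reduce to ordinary interval arithmetic: addition adds the endpoints, multiplication takes the extremes of the endpoint products, and a monotone increasing activation such as the sigmoid maps an interval endpoint-wise. A minimal sketch with our own helper names:

```python
import math

def add(a, b):
    # Eq. (6) on alpha-cuts: [aL + bL, aU + bU]
    return (a[0] + b[0], a[1] + b[1])

def mul(a, b):
    # Eq. (7) on alpha-cuts: extremes of all four endpoint products
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def act(a):
    # Eq. (8): a monotone increasing f maps an interval endpoint-wise
    return (sigmoid(a[0]), sigmoid(a[1]))
```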
FNN learning algorithm. The proposed FNN learning algorithm is similar to the EBP-type learning algorithm. Before discussing the algorithm, some assumptions should be clarified:

1. a three-layer feedforward neural network with $n_I$ input units, $n_H$ hidden units, and $n_O$ output units is fuzzified (i.e. input vectors, target vectors, connection weights and thresholds are fuzzified);
2. the input vectors are non-negative fuzzy numbers whose lower and upper bounds are larger than zero;
3. these fuzzy numbers are asymmetric Gaussian-shaped fuzzy numbers.

The input-output relation of the proposed FNN is defined by the extension principle (Ishibuchi et al., 1995a) and can be written as follows:

Input layer

$\tilde{O}_{pi}[\alpha] = \tilde{X}_{pi}[\alpha], \quad i = 1, 2, \ldots, n_I,$  (9)
Hidden layer

$\tilde{O}_{ph}[\alpha] = f(\widetilde{Net}_{ph}[\alpha]), \quad h = 1, 2, \ldots, n_H,$  (10)

$\widetilde{Net}_{ph}[\alpha] = \sum_{i=1}^{n_I} \tilde{W}_{hi}[\alpha]\,\tilde{O}_{pi}[\alpha] + \tilde{\Theta}_h[\alpha],$  (11)

Output layer

$\tilde{O}_{pk}[\alpha] = f(\widetilde{Net}_{pk}[\alpha]), \quad k = 1, 2, \ldots, n_O,$  (12)

$\widetilde{Net}_{pk}[\alpha] = \sum_{h=1}^{n_H} \tilde{W}_{kh}[\alpha]\,\tilde{O}_{ph}[\alpha] + \tilde{\Theta}_k[\alpha].$  (13)

From the above equations, the α-cut sets of the fuzzy output $\tilde{O}_{pk}$ are calculated from the α-cut sets of the fuzzy inputs, fuzzy weights, and fuzzy biases. If the α-cut set of the fuzzy output $\tilde{O}_{pk}$ is required, then the above relation can be rewritten as follows:

Input layer

$\tilde{O}_{pi}[\alpha] = [\tilde{O}_{pi}[\alpha]^L, \tilde{O}_{pi}[\alpha]^U] = [\tilde{X}_{pi}[\alpha]^L, \tilde{X}_{pi}[\alpha]^U], \quad i = 1, 2, \ldots, n_I,$  (14)

Hidden layer

$\tilde{O}_{ph}[\alpha] = [\tilde{O}_{ph}[\alpha]^L, \tilde{O}_{ph}[\alpha]^U] = [f(\widetilde{Net}_{ph}[\alpha]^L), f(\widetilde{Net}_{ph}[\alpha]^U)], \quad h = 1, 2, \ldots, n_H,$  (15)

$\widetilde{Net}_{ph}[\alpha]^L = \sum_{\substack{i=1 \\ \tilde{W}_{hi}[\alpha]^L \ge 0}}^{n_I} \tilde{W}_{hi}[\alpha]^L\,\tilde{O}_{pi}[\alpha]^L + \sum_{\substack{i=1 \\ \tilde{W}_{hi}[\alpha]^L < 0}}^{n_I} \tilde{W}_{hi}[\alpha]^L\,\tilde{O}_{pi}[\alpha]^U + \tilde{\Theta}_h[\alpha]^L,$

$\widetilde{Net}_{ph}[\alpha]^U = \sum_{\substack{i=1 \\ \tilde{W}_{hi}[\alpha]^U \ge 0}}^{n_I} \tilde{W}_{hi}[\alpha]^U\,\tilde{O}_{pi}[\alpha]^U + \sum_{\substack{i=1 \\ \tilde{W}_{hi}[\alpha]^U < 0}}^{n_I} \tilde{W}_{hi}[\alpha]^U\,\tilde{O}_{pi}[\alpha]^L + \tilde{\Theta}_h[\alpha]^U,$  (16)

Output layer

$\tilde{O}_{pk}[\alpha] = [\tilde{O}_{pk}[\alpha]^L, \tilde{O}_{pk}[\alpha]^U] = [f(\widetilde{Net}_{pk}[\alpha]^L), f(\widetilde{Net}_{pk}[\alpha]^U)], \quad k = 1, 2, \ldots, n_O,$  (17)

$\widetilde{Net}_{pk}[\alpha]^L = \sum_{\substack{h=1 \\ \tilde{W}_{kh}[\alpha]^L \ge 0}}^{n_H} \tilde{W}_{kh}[\alpha]^L\,\tilde{O}_{ph}[\alpha]^L + \sum_{\substack{h=1 \\ \tilde{W}_{kh}[\alpha]^L < 0}}^{n_H} \tilde{W}_{kh}[\alpha]^L\,\tilde{O}_{ph}[\alpha]^U + \tilde{\Theta}_k[\alpha]^L,$

$\widetilde{Net}_{pk}[\alpha]^U = \sum_{\substack{h=1 \\ \tilde{W}_{kh}[\alpha]^U \ge 0}}^{n_H} \tilde{W}_{kh}[\alpha]^U\,\tilde{O}_{ph}[\alpha]^U + \sum_{\substack{h=1 \\ \tilde{W}_{kh}[\alpha]^U < 0}}^{n_H} \tilde{W}_{kh}[\alpha]^U\,\tilde{O}_{ph}[\alpha]^L + \tilde{\Theta}_k[\alpha]^U.$  (18)
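The sign-split sums of Eqs. (16) and (18) have a direct code form: a positive weight bound pairs with the like input bound, a negative one with the opposite bound. A sketch with our own names:

```python
def interval_net(weights_low, weights_up, inputs_low, inputs_up,
                 bias_low, bias_up):
    """Lower/upper bounds of one fuzzy neuron's net input at a fixed
    alpha level, Eqs. (16) and (18)."""
    net_low, net_up = bias_low, bias_up
    for wl, wu, xl, xu in zip(weights_low, weights_up,
                              inputs_low, inputs_up):
        net_low += wl * (xl if wl >= 0 else xu)
        net_up += wu * (xu if wu >= 0 else xl)
    return net_low, net_up
```

Applying the `act` helper from the earlier sketch to the returned pair gives the α-cut of the neuron output, Eqs. (15) and (17), since the sigmoid is monotone increasing.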
The objective is to minimize the cost function defined as

$E_p = \sum_{\alpha}\sum_{k=1}^{n_O} \alpha\left(E^L_{k\alpha} + E^U_{k\alpha}\right) = \sum_{\alpha} E_{p\alpha},$  (19)

where

$E_{p\alpha} = \alpha\sum_{k=1}^{n_O}\left(E^L_{k\alpha} + E^U_{k\alpha}\right),$  (20)

$E^L_{k\alpha} = \frac{1}{2}\left(\tilde{T}_{pk}[\alpha]^L - \tilde{O}_{pk}[\alpha]^L\right)^2, \qquad E^U_{k\alpha} = \frac{1}{2}\left(\tilde{T}_{pk}[\alpha]^U - \tilde{O}_{pk}[\alpha]^U\right)^2,$  (21)

where $E^L_{k\alpha}$ and $E^U_{k\alpha}$ can be viewed as the squared errors for the lower boundaries and the upper boundaries of the α-cut sets of the fuzzy outputs and fuzzy targets.
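A direct transcription of Eqs. (19)-(21) over the α-levels used in this paper, for a single output node. This is a sketch with our own names; `out` and `tgt` map each α-level to its (lower, upper) boundary pair:

```python
ALPHAS = (0.1, 0.3, 0.5, 0.7, 0.9)

def fuzzy_cost(out, tgt, alphas=ALPHAS):
    """E_p of Eq. (19): alpha-weighted squared errors of the lower and
    upper alpha-cut boundaries, Eqs. (20)-(21)."""
    total = 0.0
    for a in alphas:
        e_low = 0.5 * (tgt[a][0] - out[a][0]) ** 2   # Eq. (21), lower
        e_up = 0.5 * (tgt[a][1] - out[a][1]) ** 2    # Eq. (21), upper
        total += a * (e_low + e_up)                  # Eq. (20)
    return total
```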
If the α-cut sets of a fuzzy weight were modified independently to reduce $E_{p\alpha}$, the fuzzy numbers after modification would be distorted. Therefore, each fuzzy weight is updated in a similar but still different way from the approach of Ishibuchi et al. (1995a). That is, in the proposed FNN, the membership functions are asymmetric Gaussian functions (i.e. a general shape), which are represented as

$\tilde{A}(x) = \begin{cases} \exp\left(-\dfrac{1}{2}\left(\dfrac{x - m}{\sigma^L}\right)^2\right), & x < m \\[4pt] 1, & x = m \\[4pt] \exp\left(-\dfrac{1}{2}\left(\dfrac{x - m}{\sigma^R}\right)^2\right), & \text{otherwise.} \end{cases}$  (22)

Thus, the asymmetric Gaussian fuzzy weights are specified by their three parameters (i.e. center, right width and left width).
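Eq. (22) in code form; a small sketch, with the vectorized form being our choice:

```python
import numpy as np

def asym_gaussian(x, m, sigma_l, sigma_r):
    """Asymmetric Gaussian membership, Eq. (22): left width sigma_l
    below the center m, right width sigma_r at and above it."""
    x = np.asarray(x, dtype=float)
    sigma = np.where(x < m, sigma_l, sigma_r)
    return np.exp(-0.5 * ((x - m) / sigma) ** 2)
```

Inverting Eq. (22) at level α gives the α-cut boundaries in closed form, $\tilde{X}[\alpha]^L = m - \sigma^L(-2\ln\alpha)^{1/2}$ and $\tilde{X}[\alpha]^U = m + \sigma^R(-2\ln\alpha)^{1/2}$, which is also the form that appears inside Eq. (25) below.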
The gradient search method is derived for each parameter. The amount of adjustment for each parameter, using the cost function $E_{p\alpha}$, is as follows:

$\Delta m_{kh}(t) = -\eta\,\frac{\partial E_{p\alpha}}{\partial m_{kh}} + \beta\,\Delta m_{kh}(t-1),$  (23)

$\Delta \sigma^L_{kh}(t) = -\eta\,\frac{\partial E_{p\alpha}}{\partial \sigma^L_{kh}} + \beta\,\Delta \sigma^L_{kh}(t-1), \qquad \Delta \sigma^R_{kh}(t) = -\eta\,\frac{\partial E_{p\alpha}}{\partial \sigma^R_{kh}} + \beta\,\Delta \sigma^R_{kh}(t-1).$  (24)

Since the gradient search method easily gets trapped in a local minimum, this research employs a genetic algorithm to provide the initial weights for the FNN, which not only decreases the training time but also prevents the local minimum. For a detailed discussion, refer to Kuo et al. (2001).
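Eqs. (23) and (24) are ordinary gradient descent with momentum, applied separately to the three parameters of each fuzzy weight; a minimal sketch with our own names:

```python
def momentum_step(param, grad, prev_delta, eta=0.3, beta=0.6):
    """One update of Eqs. (23)-(24): param is m, sigma_L or sigma_R of
    a fuzzy weight; grad is dE_p_alpha/dparam."""
    delta = -eta * grad + beta * prev_delta
    return param + delta, delta
```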
3.2.3. FNN with weight elimination

In Section 3.2.2, all the weights are kept until the end of the training. However, weights which are close to zero may be a source of bias. If these weights can be deleted during training, the error may be reduced. For a real-valued network, a threshold close to zero is set up and the weights which are smaller than the threshold are deleted. The close-to-zero value is determined by trial and error. If the network is fuzzy instead of crisp, a fuzzy threshold value should be determined, and one of the fuzzy ranking methods should be chosen to determine whether a weight should be deleted or not. In the present study, the area measurement method proposed by Yager is employed, though many other fuzzy ranking methods, like degree of optimality, Hamming distance, α-cut, fuzzy mean and standard deviation, and some other functions, scores or indices, can be applied.

Yager's approach (1980) provides an index for comparing fuzzy numbers. The main concept for calculating the area is to find the average membership function values curve from α = 0 to 1. Fig. 2 shows Yager's area for an asymmetric Gaussian function. The dashed line representing the average membership curve, the horizontal line μ(x) = 1, the x axis, and the y axis bound the measured area. Practically, α is set up to start from 0.05, since the membership function values for α = 0 are at positive and negative infinity. Yager's fuzzy number index, F, is calculated as follows:

$F = \int_{0.05}^{1} \frac{1}{2}\left[\left(m + \sigma^R(-2\ln\alpha)^{1/2}\right) + \left(m - \sigma^L(-2\ln\alpha)^{1/2}\right)\right] d\alpha$

$\quad = \int_{0.05}^{1}\left[m + \frac{1}{2}\left(\sigma^R - \sigma^L\right)(-2\ln\alpha)^{1/2}\right] d\alpha$

$\quad = m\,\alpha\Big|_{0.05}^{1} + \frac{1}{2}\left(\sigma^R - \sigma^L\right)\int_{0.05}^{1}(-2\ln\alpha)^{1/2}\,d\alpha$  (25)

$\quad = 0.95\,m + \frac{1}{2}\left(\sigma^R - \sigma^L\right)\sum_{\alpha}(-2\ln\alpha)^{1/2}\,\Delta\alpha$

$\quad \approx 0.95\,m + 0.556516\left(\sigma^R - \sigma^L\right).$

Fig. 2. Yager's area.

The learning algorithm of the fuzzy neural network with weight elimination is shown in Fig. 3(a). Besides the above weight elimination method, the other approach is to add one more gene at the front of the chromosome, which represents the status of the weight: zero implies that the weight can be eliminated, while the weight should be kept if the gene is equal to one. Fig. 4 shows the structure of the chromosome and Fig. 3(b) shows the learning procedure.
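A sketch of the pruning test implied by Eq. (25): compute F for a fuzzy weight and delete the weight when its index falls inside a band around zero defined by the fuzzy threshold. The names are ours and the band interpretation is our assumption; the numeric constant is the paper's.

```python
def yager_index(m, sigma_l, sigma_r):
    # Closed-form approximation of Eq. (25)
    return 0.95 * m + 0.556516 * (sigma_r - sigma_l)

def should_eliminate(weight, threshold):
    """weight and threshold are (m, sigma_l, sigma_r) triples; the
    weight is pruned when its Yager index is smaller in magnitude
    than that of the fuzzy threshold."""
    return abs(yager_index(*weight)) < abs(yager_index(*threshold))
```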
3.3. Decision integration

From the above part, the FNN provides the qualitative effect on sales. To yield the final result, the promotional effect is integrated with the time series data. This study employs a feedforward neural network with the EBP learning algorithm. Since the result from the FNN is a fuzzy number, the α-cuts at α = 0.1, 0.3, 0.5, 0.7 and 0.9 are applied in order to get real numbers from the FNN. Thus, there are ten input nodes which represent the promotion effect.

For the feedforward neural network with the EBP learning algorithm, the input neurons use $y = f(x) = x$ (no change in input) and all the other neurons have the sigmoidal function $y = f(x) = (1 + \exp(-x))^{-1}$. The objective is to minimize the cost function defined as

$E = \frac{1}{2}\sum_{p}\left(T_p - O_p\right)^2,$  (26)

where $T_p$ and $O_p$ are the target and actual outputs, respectively, and p is the number of training samples.
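The ten promotion inputs are simply the lower and upper α-cut boundaries of the FNN's fuzzy output at the five α levels. Assuming the asymmetric Gaussian form of Eq. (22), a sketch (our names):

```python
import math

def promotion_inputs(m, sigma_l, sigma_r,
                     alphas=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Ten crisp inputs for the integration ANN: (lower, upper)
    alpha-cut boundaries of the FNN fuzzy output at five levels."""
    inputs = []
    for a in alphas:
        r = math.sqrt(-2.0 * math.log(a))   # from inverting Eq. (22)
        inputs += [m - sigma_l * r, m + sigma_r * r]
    return inputs                           # length 10
```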
4. Simulation

Section 3 has presented the proposed fuzzy neural network with weight elimination theoretically. In order to verify its feasibility, this section employs four examples to simulate
the FNN. It is written in the C language and implemented on an IBM-compatible PC. The simulation results can be referenced for the real-world problem, which will be presented in Section 5.

Fig. 3. (a) Training procedure for GA FNNW. (b) Training procedure for GAW FNN.

Fig. 4. The structure of the chromosome.

4.1. Example one

The first example is a linear mapping. Three training samples $(\tilde{X}_p, \tilde{T}_p)$, where $\tilde{X}_p$ is the fuzzy input, $\tilde{T}_p$ is the fuzzy desired output and the training sample number is p = 1, 2, 3, are developed in the two-dimensional space. Each fuzzy output number has mean $y_m = x_m$, left width $y_{\sigma L} = 2x_{\sigma L}$, and right width $y_{\sigma R} = 3x_{\sigma R}$. The corresponding means and standard deviations are presented in Table 1. The relationship of fuzzy inputs and fuzzy outputs is shown in Fig. 5 for α = 0.1, 0.3, 0.5, 0.7, and 0.9, respectively. All the required parameters are set up as follows:

1. The number of hidden nodes: 3.
2. The number of hidden layers: 2.
3. α-cut levels: α = 0.1, 0.3, 0.5, 0.7, and 0.9.
4. The number of training epochs: 30,000.
5. Training rate: η = 0.3.
6. Momentum: β = 0.6.
7. Weight elimination: yes and no.

Fig. 6 shows the results after training. Table 2 presents the MSE values for these two algorithms. The testing sample results are shown in Fig. 7. It is obvious that the proposed FNN can learn the fuzzy relation between fuzzy inputs and fuzzy outputs accurately. In addition, the network with weight elimination is really better than the network without weight elimination.
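The training pairs of Table 1 follow directly from the mapping above; a generation sketch with our own names:

```python
def example_one_pairs(inputs):
    """Fuzzy training pairs for the linear mapping of example one:
    y_m = x_m, left width doubled, right width tripled."""
    return [((sl, m, sr), (2 * sl, m, 3 * sr))
            for (sl, m, sr) in inputs]

# The three (left sigma, mean, right sigma) inputs of Table 1
pairs = example_one_pairs([(0.456, 2, 0.123),
                           (0.567, 5, 0.234),
                           (0.678, 8, 0.345)])
```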
Table 1
Training pairs for example one

Sample number | Input (left σ, m, right σ) | Output (left σ, m, right σ)
1 | 0.456, 2, 0.123 | 0.912, 2, 0.369
2 | 0.567, 5, 0.234 | 1.134, 5, 0.702
3 | 0.678, 8, 0.345 | 1.356, 8, 1.035

Table 2
Example one's MSE values

Weight elimination | With (fuzzy ranking) | With (GA) | Without
MSE | 0.000102 | 0.000117 | 0.000124
Improved rate | 17.74% | |

Fig. 5. Example one's training pairs.
Fig. 6. Example one's testing results for training pairs.
Fig. 7. Example one's testing results for two new testing pairs.

4.2. Example two

Example two simulates the non-linear relation of the cosine function. Six training pairs $(\tilde{X}_p, \tilde{T}_p)$, where p = 1, 2, ..., 6, in the two-dimensional space are used. They are generated using the Gaussian functions presented in Table 3. The mean value is $y_m = \cos(x_m)$, while the left width and right width are $y_{\sigma L} = x_{\sigma L}$ and $y_{\sigma R} = x_{\sigma R}$, respectively. The graphical relation of fuzzy inputs and outputs for α equal to 0.1, 0.3, 0.5, 0.7 and 0.9 for the training pairs is shown in Fig. 8. All the required parameters are set up the same as in example one. In addition, the training data are linearly transformed in order to fit the assumptions.

Table 3
Training pairs of example two

Sample number | Input (left σ, m, right σ) | Output (left σ, m, right σ)
1 | 0.05, -2.94, 0.05 | 0.05, -0.2, 0.05
2 | 0.05, -2.62, 0.05 | 0.05, -0.5, 0.05
3 | 0.05, -2.12, 0.05 | 0.05, -0.85, 0.05
4 | 0.05, -1.03, 0.05 | 0.05, -0.86, 0.05
5 | 0.05, -0.52, 0.05 | 0.05, -0.5, 0.05
6 | 0.05, -0.2, 0.05 | 0.05, -0.2, 0.05

Fig. 8. Example two's training pairs.

Fig. 9 shows the testing results for the training pairs, while the computational results for new testing pairs are shown in Fig. 10. The final MSE values with and without weight elimination are 0.000054 and 0.000076, respectively (Table 4). This, again, proves that weight elimination really can improve the performance of the FNN.

4.3. Example three

This example tries to learn fuzzy IF–THEN rules with two precondition variables (temperature X1 and humidity X2) and one consequence variable (engine speed Y) through the FNN. These rules are shown in Table 5. The Gaussian functions applied are presented in Table 6. In total, there are nine training pairs for the FNN. In order to determine the best network topology, different hidden node numbers ranging from 3 to 6 are implemented. Similarly, different training rates, 0.3 and 0.6, and different momentum
terms, 0.3 and 0.6, are also tested. In addition, two different fuzzy thresholds are tested: (0.3, ±0.1, 0.3) and (0.5, ±0.3, 0.5). In total, there are 48 different combinations. The simulation results are shown in Table 7. The network with weight elimination definitely can reduce the training error. The decrease rate is 3.5%. Twelve out of thirteen cases show better performance when weights are really eliminated.

The other goal of the trained FNN is to infer two more linguistic terms for each precondition variable with respect to the existing three linguistic terms. Thus, in total there are 25 fuzzy IF–THEN rules after inference. The newly added fuzzy rules are presented in Table 8. Besides, the genetic algorithm is also applied for increasing the accuracy. Table 9 indicates the computational results. It is very clear that GA FNNW can provide the best forecast. Using GA to eliminate the unimportant fuzzy weights provides a better result, but not the best.

Table 4
Example two's MSE values

Weight elimination | With (fuzzy ranking) | With (GA) | Without
MSE | 0.000054 | 0.000061 | 0.000076
Improved rate | 28.95% | |

Table 5
Example three's fuzzy rule table for training (rows: X1; columns: X2; entries: Y)

X1 \ X2 | S | MS | M | ML | L
L | S | | MS | | M
ML | | | | |
M | MS | | M | | ML
MS | | | | |
S | ML | | L | | L

Fig. 9. Example two's testing results for training pairs.
Fig. 10. Example two's testing results for four new testing pairs.

4.4. Example four

The purpose of this example is to reconfirm the validity of the FNN with weight elimination in both speed and accuracy. The training pairs are adopted from example one. The initial weights are generated randomly. The number of simulations is 30 for both cases, using an IBM-compatible PC-166. All the parameter setups are identical to example one. Table 10 shows the simulation results and t-test values. Weight elimination really reduces the training time and training error at α = 0.01.

In order to further find out the procedure for weight elimination, we conducted a three-factorial (training rate, momentum and weight elimination) design using example three. There are two levels for each factor; thus, in total there are eight different combinations. The ANOVA results are presented in Table 11. It indicates that the training rate and momentum have a significant influence on the training error compared with weight elimination, since their p-values are 0.006 and 0.009, respectively. Besides, the interaction of training rate and momentum is more significant, while the interaction of training rate and weight elimination and the interaction of momentum and weight elimination are not significant at α = 0.05. According to these results, weight elimination should be implemented after the training rate and momentum term have been well set up. This can result in the lowest training error.
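The paired comparison of Table 10 and the factorial screening of Table 11 can be reproduced with standard tools. A sketch under the assumption that the simulation runs are stored per condition; scipy and statsmodels are our substitutions for the authors' C implementation:

```python
import pandas as pd
from scipy import stats
import statsmodels.api as sm
import statsmodels.formula.api as smf

def compare_runs(err_with, err_without):
    """Table 10 style two-sample t-test on the 30 final MSE values
    recorded with and without weight elimination."""
    t, p = stats.ttest_ind(err_without, err_with)
    return t, p   # significant at alpha = 0.01 in the paper

def screening_anova(df: pd.DataFrame):
    """Table 11 style ANOVA: main effects and two-way interactions of
    training rate (eta), momentum (beta) and elimination cut on the
    training error; df has columns eta, beta, cut, error."""
    model = smf.ols("error ~ C(eta) + C(beta) + C(cut)"
                    " + C(eta):C(beta) + C(eta):C(cut) + C(beta):C(cut)",
                    data=df).fit()
    return sm.stats.anova_lm(model, typ=2)
```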
5. Model evaluation results

The above sections have presented the proposed forecasting system and the FNN's feasibility numerically. Further, a real-world problem is applied to verify the proposed system's practicality. In addition, the proposed system is also compared with other methods, a single ANN and ARMA. Both the procedures and results are shown sequentially in the following subsections.

5.1. Data collection

A nationally well-known CVS franchise company provides the daily sales data. Since the forecasting pattern is divided into two categories, general pattern and special pattern, the data collection is also comprised of two parts:

(1) Time series data. The company provides the daily sales of 500 cm³ papaya milk. The total number of data points is 379, as shown in Fig. 11. A sudden increase of sales indicates that a promotion is being conducted; in total, there are five promotions. The time period lasts from 1 January 1995 to 14 January 1996. For the purpose of testing, these 379 data points are further divided into a training set and a testing set. The former has 334 data points, while the latter has 45 data points.

(2) Expert questionnaire. To survey all the possible factors of promotion and their effects on the sales, this study employs the fuzzy Delphi method. The questionnaire's setup is based on the company's practical requirements; thus, some factors are included. The procedures are based on the modified fuzzy Delphi method.

A large number of factors can generally affect the sales. However, different products have different characteristics. After discussing with the company's senior managers, all the factors are divided into three dimensions. The first dimension represents the methods of promotion, while the types of advertising media are presented in the second dimension. The third dimension represents the competitors' actions. Table 12 presents the fuzzy number of each event

Fig. 11. The time series data.

Table 6
Training pairs of example three

Rule no. | X1 (left σ, m, right σ) | X2 (left σ, m, right σ) | Y (left σ, m, right σ)
1 | 5, 20, 7 | 2, -10, 2.1 | 10, 250, 12
2 | 5, 20, 7 | 1.85, 0, 2.5 | 9, 300, 14
3 | 5, 20, 7 | 2.3, 10, 1.5 | 9, 300, 14
4 | 7, 50, 4 | 2, -10, 2.1 | 12, 150, 13
5 | 7, 50, 4 | 1.85, 0, 2.5 | 14, 200, 11
6 | 7, 50, 4 | 2.3, 10, 1.5 | 10, 250, 12
7 | 6, 80, 8 | 2, -10, 2.1 | 10, 100, 15
8 | 6, 80, 8 | 1.85, 0, 2.5 | 12, 150, 13
9 | 6, 80, 8 | 2.3, 10, 1.5 | 14, 200, 11

Table 7
Example three's MSE values for different topologies and setups

Hidden nodes | η | β | Without | (0.1, 0.3): MSE | Deleted no. | Improved rate (%) | (0.3, 0.5): MSE | Deleted no. | Improved rate (%)
3 | 0.3 | 0.3 | 0.005319 | 0.005319 | 0 | 0.00 | 0.005133 | 3 | 3.50
3 | 0.3 | 0.6 | 0.005805 | 0.005510 | 3 | 5.09 | 0.005510 | 3 | 5.09
3 | 0.6 | 0.3 | 0.005929 | 0.005612 | 3 | 5.35 | 0.005612 | 3 | 5.35
3 | 0.6 | 0.6 | 0.006947 | 0.006947 | 0 | 0.00 | 0.006947 | 0 | 0.00
4 | 0.3 | 0.3 | 0.005233 | 0.005106 | 4 | 2.41 | 0.005088 | 4 | 2.77
4 | 0.3 | 0.6 | 0.005653 | 0.005436 | 4 | 3.83 | 0.005436 | 4 | 3.83
4 | 0.6 | 0.3 | 0.005763 | 0.005763 | 0 | 0.00 | 0.005534 | 4 | 3.97
4 | 0.6 | 0.6 | 0.006644 | 0.006644 | 0 | 0.00 | 0.006644 | 0 | 0.00
5 | 0.3 | 0.3 | 0.005199 | 0.005199 | 0 | 0.00 | 0.005295 | 5 | -1.85
5 | 0.3 | 0.6 | 0.005557 | 0.005557 | 0 | 0.00 | 0.005388 | 5 | 3.04
5 | 0.6 | 0.3 | 0.005657 | 0.005657 | 0 | 0.00 | 0.005482 | 5 | 3.09
5 | 0.6 | 0.6 | 0.006452 | 0.006452 | 0 | 0.00 | 0.006452 | 0 | 0.00
6 | 0.3 | 0.3 | 0.005236 | 0.005236 | 0 | 0.00 | 0.005236 | 0 | 0.00
6 | 0.3 | 0.6 | 0.005920 | 0.005920 | 0 | 0.00 | 0.005920 | 0 | 0.00
6 | 0.6 | 0.3 | 0.005666 | 0.005666 | 0 | 0.00 | 0.005666 | 0 | 0.00
6 | 0.6 | 0.6 | 0.006223 | 0.006223 | 0 | 0.00 | 0.006223 | 0 | 0.00
Table 8
New fuzzy rule table of example three

X1 \ X2 | S | MS | M | ML | L
L | S | S | MS | M | M
ML | S | MS | M | M | ML
M | MS | M | M | ML | ML
MS | M | M | ML | L | L
S | ML | ML | L | L | L

after three surveys. The reason for three surveys is that the similarity testing results indicate that all the fuzzy numbers have converged by the third survey. Therefore, this knowledge base will be applied to train the FNN and represents the FNN outputs.

Besides, one event, the 3-dollar discount, is selected as the testing case. Thus, in total there are only 42 (3 × 7 × 2) IF–THEN rules for training the FNN, as shown in Table 13.

5.2. Special pattern model (FNN)

The initial weights for the FNN are generated by using the genetic algorithm proposed by Kuo et al. (2001). The reason is that the genetic algorithm may prevent the network from getting stuck in a local minimum and accelerate the training. In addition, different training rates and momentum terms may yield different results. Thus, two training rates, 0.3 and 0.6, and two momentum terms, 0.3 and 0.6, are tested. Two different fuzzy threshold numbers, as used in example four, are also utilized. The network does not stop learning until 30,000 epochs. The α-level sets are 0.1, 0.3, 0.5, 0.7, and 0.9. For the training results shown in Table 14, the cell with the symbol * is the best result, or best network. The lowest MSE value is 0.993 × 10⁻³, with training rate and momentum of 0.3 and 0.6, respectively. The network structure is 3-7-7-1. The weight elimination criterion is (0.5, ±0.3, 0.5). Besides, GAW FNN is also implemented for comparison. Its best topology is 3-8-8-1 and its MSE is 1.021 × 10⁻³, with training rate and momentum of 0.3 and 0.1, respectively. Finally, both of these two networks are integrated with the time series data in the next part.

5.3. Decision integration model (ANN)

This subsection demonstrates the integration of the qualitative factor effect on sales and the time series data. Both the training and testing results are presented in the following.

Table 9
The results for different algorithms

Algorithms | FNN | FNNW | GA FNN | GAW FNN | GA FNNW
Mean square error | 0.005818 | 0.005767 | 0.000556 | 0.000526 | 0.00007975

FNN: FNN only, without weight elimination; FNNW: FNN only, with weight elimination; GA FNN: genetic algorithm to find the initial weights, followed by FNN; GAW FNN: genetic algorithm with weight elimination, followed by FNN; GA FNNW: genetic algorithm, followed by FNN with weight elimination.

Table 10
Example four's MSE values and t values

 | With/without weight elimination | Mean | Standard deviation | Sample no. | t value | P-value | Improved rate (%)
Error | Without | 0.000111 | 2.5 × 10⁻⁵ | 30 | 3.392 | 0.0010 | 17.99
 | With | 0.000091 | 2 × 10⁻⁵ | 30 | | |
Time | Without | 336.86 | 0.52481 | 30 | 14.4 | 5 × 10⁻¹⁵ | 0.62
 | With | 334.77 | 0.59616 | 30 | | |

Table 11
Simulation ANOVA table

Source of deviation | Degrees of freedom | Sum of squares | Mean sum of squares | F value | P value
η | 1 | 1.69533 × 10⁻⁵ | 1.69533 × 10⁻⁵ | 8.034 | 0.006
β | 1 | 1.55113 × 10⁻⁵ | 1.55113 × 10⁻⁵ | 7.350 | 0.009
Cut | 1 | 2.36135 × 10⁻⁷ | 2.36135 × 10⁻⁷ | 0.112 | 0.739
η × β | 1 | 1.76375 × 10⁻⁶ | 1.76375 × 10⁻⁶ | 0.836 | 0.364
η × cut | 1 | 9.39688 × 10⁻⁹ | 9.39688 × 10⁻⁹ | 0.004 | 0.947
β × cut | 1 | 1.47744 × 10⁻⁹ | 1.47744 × 10⁻⁹ | 0.001 | 0.979
Error | 57 | 0.000120287 | 2.1103 × 10⁻⁶ | |
Total | 63 | 0.000154763 | | |
Table 12
The fuzzy number of each event for the third questionnaire

Factors | Events | Average (m) | σ_R = (l - m)/3 | σ_L = (u - m)/3
Promotion methods | 10 dollars discount | 8.6000 | 0.2333 | 0.2000
 | 5 dollars discount | 4.9000 | 0.3000 | 0.2333
 | 3 dollars discount | 2.7000 | 0.1000 | 0.0667
 | Buy two get one free | 6.8000 | 0.3000 | 0.2000
Advertising media | At night on TV | 7.6000 | 0.3000 | 0.2667
 | At noon on TV | 2.5000 | 0.3333 | 0.1667
 | In the evening on TV | 3.1000 | 0.4667 | 0.1333
 | Radio | 4.5000 | 0.4000 | 0.2667
 | Newspaper | 3.8000 | 0.5000 | 0.1667
 | POP notice | 6.5000 | 0.4333 | 0.1000
 | Poster | 6.6000 | 0.3000 | 0.2000
Competitors' action | Related products without promotion | 7.6000 | 0.4000 | 0.2000
 | Related products with promotion | 4.5000 | 0.3333 | 0.1000

(1) Training. For the integration network with both the qualitative and quantitative factors, eight different models are tested in order to find the best network topology. Table 15 shows these network topologies (Models I-VIII). Besides, Table 15 also presents the four network topologies (Models IX-XII) without qualitative factors. In addition, a conventional network with an additional input unit for the promotion effect is also considered. This input unit receives on-off values (1: promotion, 0: non-promotion). There are also four network topologies (Models XIII-XVI) for such a setup. The training rate and momentum are both 0.5. The network does not stop training until the MSE no longer decreases over 500 epochs. The four best networks for the two integration networks, the conventional network, and the conventional network with promotion unit are models III, VIII, XII, and XVI, respectively. Model III, with network structure 25-28-1, has an MSE value of 0.002151, while model VIII, with structure 30-58-1, has an MSE value of 0.002248. Model XII, with structure 20-34-1, has 0.002441, while model XVI, with network structure 21-36-1, has an MSE value of 0.002397. Basically, the MSE values of the networks with both quantitative and qualitative factors are all smaller than those of the networks with only quantitative factors. Besides, the former always needs 10.05% fewer training epochs than the latter.

In order to find better results for these models, three training rates, 0.1, 0.3, 0.5, and three momentum terms, 0.1, 0.5, 0.8, are tested. In total, there are nine combinations for each model. The computational results are shown in Table 16. Model III has the lowest MSE value, 0.001837, with η and β of 0.1 and 0.5, respectively, while the lowest MSE value for model VIII is 0.002002, with η and β of 0.1 and 0.5, respectively. Model XII has its lowest MSE value, 0.002282, with η and β of 0.3 and 0.5, respectively. Model XVI has its lowest MSE value, 0.002102, with η and β of 0.5 and 0.8, respectively.

(2) Testing. Though model III has been shown to be the best network using the training data, it cannot be guaranteed that its testing results are also the best. The further comparison is based on the 45 testing data points mentioned above. There is one promotion, the 3-dollar discount, during this period, and it has not been included in the FNN knowledge base. Both the MSE and MAPE (mean absolute percentage error) values for the testing set are listed in Table 17. It is very clear that the integration model still has the best performance compared with both the conventional networks and the ARMA (autoregressive and moving average) model. The MSE value for the integration model with the GA FNNW algorithm is 0.001753, while the integration model with the GAW FNN algorithm has an MSE value of 0.001856; the former also outperforms the latter. The coefficients of the ARMA model are determined after examining the ACF (auto-correlation function) and PACF (partial auto-correlation function). The actual and forecast outputs are shown in Fig. 12.

Fig. 12. The integration ANN forecasting result.
6. Discussion

Section 5 has presented evaluation results based on data accumulated from a CVS franchise company. The proposed
scheme utilizes two kinds of information (i.e. fuzzy IF–THEN rules and numerical time series data) in the learning of neural networks. The factor effects on the sales seem to be subjective, since the data are provided by either the senior managers or experts. However, the number of experts is 20, implying that the subjective factors can be reduced. In particular, since the fuzzy Delphi method is employed, the above consideration can also be ignored.

Table 17 indicates that the integration model outperforms all other forecasting methods, e.g. ARMA(2,5) and the conventional ANN. The reason is that the integration ANN prioritizes the promotion effect on the sales pattern. Even with respect to the forecasting result, the integration ANN is second to none. Regarding the ARMA model and the ANN model, the results depend on the setup: basically, if the ANN can be well set up, it can provide the better result.

The proposed FNN is able to learn the relationships between fuzzy inputs and outputs. If the pruning technique is included in the training, it provides more promising results. However, the simulation results of example four indicate that the training rate and momentum should be well set up before including the weight elimination. Also, GA FNNW can provide the best performance.

Table 13
Fuzzy IF–THEN rules

Rule | IF: Promotion | IF: Media | IF: Competitor | THEN: left σ | m | right σ
1 | 10 dollars discount | At night on TV | No | 0.2000 | 7.8000 | 0.2667
2 | 10 dollars discount | At noon on TV | No | 0.0333 | 2.1000 | 0.5667
3 | 10 dollars discount | In the evening on TV | No | 0.1000 | 3.9000 | 0.4333
4 | 10 dollars discount | Radio | No | 0.2000 | 4.6000 | 0.2667
5 | 10 dollars discount | Newspaper | No | 0.2333 | 5.5000 | 0.3000
6 | 10 dollars discount | POP notice | No | 0.2000 | 7.0000 | 0.3333
7 | 10 dollars discount | Poster | No | 0.1000 | 6.5000 | 0.3000
8 | 5 dollars discount | At night on TV | No | 0.2000 | 6.0000 | 0.4000
9 | 5 dollars discount | At noon on TV | No | 0.0333 | 1.1000 | 0.3000
10 | 5 dollars discount | In the evening on TV | No | 0.1000 | 3.3000 | 0.3667
11 | 5 dollars discount | Radio | No | 0.1333 | 3.4000 | 0.3333
12 | 5 dollars discount | Newspaper | No | 0.2000 | 4.2000 | 0.3333
13 | 5 dollars discount | POP notice | No | 0.1333 | 5.0000 | 0.4000
14 | 5 dollars discount | Poster | No | 0.1000 | 5.3000 | 0.3000
15 | Buy two get one free | At night on TV | No | 0.2333 | 7.3000 | 0.3000
16 | Buy two get one free | At noon on TV | No | 0.0000 | 3.2000 | 0.4667
17 | Buy two get one free | In the evening on TV | No | 0.0333 | 3.7000 | 0.4333
18 | Buy two get one free | Radio | No | 0.2000 | 5.2000 | 0.3333
19 | Buy two get one free | Newspaper | No | 0.2000 | 5.4000 | 0.3333
20 | Buy two get one free | POP notice | No | 0.2333 | 6.5000 | 0.3000
21 | Buy two get one free | Poster | No | 0.2000 | 6.6000 | 0.2667
22 | 10 dollars discount | At night on TV | Yes | 0.2333 | 7.7000 | 0.3000
23 | 10 dollars discount | At noon on TV | Yes | 0.1000 | 3.7000 | 0.3667
24 | 10 dollars discount | In the evening on TV | Yes | 0.1000 | 4.1000 | 0.4333
25 | 10 dollars discount | Radio | Yes | 0.2000 | 4.8000 | 0.2667
26 | 10 dollars discount | Newspaper | Yes | 0.2333 | 5.7000 | 0.3000
27 | 10 dollars discount | POP notice | Yes | 0.2333 | 6.1000 | 0.3667
28 | 10 dollars discount | Poster | Yes | 0.2000 | 7.1000 | 0.2333
29 | 5 dollars discount | At night on TV | Yes | 0.2000 | 5.8000 | 0.4000
30 | 5 dollars discount | At noon on TV | Yes | 0.0000 | 2.4000 | 0.4000
31 | 5 dollars discount | In the evening on TV | Yes | 0.0333 | 2.1000 | 0.4333
32 | 5 dollars discount | Radio | Yes | 0.1333 | 3.4000 | 0.3333
33 | 5 dollars discount | Newspaper | Yes | 0.1333 | 4.4000 | 0.4000
34 | 5 dollars discount | POP notice | Yes | 0.1000 | 4.7000 | 0.3667
35 | 5 dollars discount | Poster | Yes | 0.1667 | 6.0000 | 0.3333
36 | Buy two get one free | At night on TV | Yes | 0.2000 | 6.8000 | 0.2667
37 | Buy two get one free | At noon on TV | Yes | 0.1000 | 3.5000 | 0.3667
38 | Buy two get one free | In the evening on TV | Yes | 0.1000 | 3.5000 | 0.3667
39 | Buy two get one free | Radio | Yes | 0.1000 | 4.6000 | 0.4000
40 | Buy two get one free | Newspaper | Yes | 0.1000 | 4.5000 | 0.3667
41 | Buy two get one free | POP notice | Yes | 0.2000 | 6.2000 | 0.2667
42 | Buy two get one free | Poster | Yes | 0.2000 | 7.1000 | 0.2333
Table 14
Training results of FNN

Network structure | η | β | Weight-elimination criterion (m = 0.1, σ = 0.3) | Weight-elimination criterion (m = 0.3, σ = 0.5)
3-3-3-1 | 0.3 | 0.6 | 0.002994, 0.002950, 0.002971 | 0.002994, 0.002950, 0.002971
3-3-3-1 | 0.6 | 0.3 | 0.002546, 0.002541, 0.002542 | 0.002546, 0.002167, 0.002589
3-4-4-1 | 0.3 | 0.6 | 0.002035, 0.002030, 0.002034 | 0.002035, 0.001742, 0.002034
3-4-4-1 | 0.6 | 0.3 | 0.002082, 0.002079, 0.002082 | 0.002082, 0.002079, 0.002082
3-5-5-1 | 0.3 | 0.6 | 0.001536, 0.001545, 0.001536 | 0.001717, 0.001652, 0.001674
3-5-5-1 | 0.6 | 0.3 | 0.001646, 0.001646, 0.001645 | 0.001756, 0.001686, 0.001794
3-6-6-1 | 0.3 | 0.6 | 0.001706, 0.001691, 0.001718 | 0.001681, 0.001686, 0.001741
3-6-6-1 | 0.6 | 0.3 | 0.002622, 0.002591, 0.002597 | 0.002056, 0.001946, 0.002060
3-7-7-1 | 0.3 | 0.6 | 0.001013, 0.001269, 0.001108 | 0.001013, 0.000993*, 0.001040
3-7-7-1 | 0.6 | 0.3 | 0.001346, 0.001349, 0.001345 | 0.001414, 0.001349, 0.001345
3-8-8-1 | 0.3 | 0.6 | 0.001113, 0.001097, 0.001110 | 0.001249, 0.001250, 0.001276
3-8-8-1 | 0.6 | 0.3 | 0.001350, 0.001393, 0.001355 | 0.001471, 0.001500, 0.001494

* Best result (best network).

Table 15
Different neural network models

Network number | Time series inputs | Qualitative factor (α-cut) inputs | Promotion unit | Input nodes | Hidden nodes
Integration network (GA FNNW):
I | 5 | 10 | 0 | 15 | 8-30
II | 10 | 10 | 0 | 20 | 11-40
III | 15 | 10 | 0 | 25 | 13-50
IV | 20 | 10 | 0 | 30 | 15-60
Integration network (GAW FNN):
V | 5 | 10 | 0 | 15 | 8-30
VI | 10 | 10 | 0 | 20 | 11-40
VII | 15 | 10 | 0 | 25 | 13-50
VIII | 20 | 10 | 0 | 30 | 15-60
Conventional network:
IX | 5 | 0 | 0 | 5 | 3-12
X | 10 | 0 | 0 | 10 | 6-22
XI | 15 | 0 | 0 | 15 | 8-30
XII | 20 | 0 | 0 | 20 | 11-40
Conventional network with promotion unit:
XIII | 5 | 0 | 1 | 6 | 3-14
XIV | 10 | 0 | 1 | 11 | 6-24
XV | 15 | 0 | 1 | 16 | 8-32
XVI | 20 | 0 | 1 | 21 | 11-42

Table 16
MSE values of models III, VIII, XII, and XVI for different setups (columns: η = 0.1, 0.3, 0.5 for each model)

β | Model III: 0.1, 0.3, 0.5 | Model VIII: 0.1, 0.3, 0.5 | Model XII: 0.1, 0.3, 0.5 | Model XVI: 0.1, 0.3, 0.5
0.1 | 0.001943, 0.002315, 0.002067 | 0.002054, 0.002398, 0.002284 | 0.002708, 0.002349, 0.002350 | 0.002683, 0.002293, 0.002302
0.5 | 0.001837, 0.002077, 0.002151 | 0.002002, 0.002211, 0.002309 | 0.002454, 0.002282, 0.002441 | 0.002377, 0.002254, 0.002397
0.8 | 0.002389, 0.002210, 0.002332 | 0.002477, 0.002428, 0.002457 | 0.002793, 0.002697, 0.002712 | 0.002636, 0.002598, 0.002102

7. Conclusions

This study has developed an intelligent forecasting system based on the FNN to solve the sales forecasting problem under promotion. Though directly using a single ANN to model the sales pattern has been shown to be better than the conventional statistical methods, it still needs further improvement. Integrating the time series data with the qualitative factor effect from the FNN can provide a more reliable forecast. Besides, the FNN with weight elimination really can improve the FNN's performance. In the future, the authors would like to further improve the capability of the FNN. One direction is to apply a floating-point genetic algorithm to find the solution for the FNN, while the other is to include more
qualitative factors, which may yield a more precise result. In addition, the proposed system structure can also be used for other applications.

Table 17
Comparison of integration and conventional networks

Models | MSE | MAPE (%)
Integration with GA FNNW | 0.001753 | 7.689
Integration with GAW FNN | 0.001856 | 8.388
Conventional | 0.002359 | 14.929
Conventional with promotion unit | 0.002173 | 11.237
ARMA(2,5) | 0.005356 | 20.323

Acknowledgments

The authors would like to thank the National Science Council, Republic of China, for partially supporting this work under Contract No. NSC 89-2213-H-027. Mr L.C. Shie, the manager of the company providing the data, is also appreciated for providing the daily sales data and for his valuable discussion regarding chain store promotion.

References

Agrawal, D., & Schorling, C. (1997). Market share forecasting: An empirical comparison of artificial neural networks and multinomial logit model. Journal of Retailing, 72(4), 383-408.
Ansuj, A. P., Camargo, M. E., Radharamanan, R., & Petry, D. G. (1996). Sales forecasting using time series and neural networks. Computers and Industrial Engineering, 31(1/2), 421-424.
Bigus, J. P. (1996). Data mining with neural networks: Solving business problems, from application development to decision support. New York: McGraw-Hill.
Buckley, J. J., & Hayashi, Y. (1992). Fuzzy neural nets and applications. Fuzzy Systems and AI, 3, 11-41.
Buckley, J. J., & Hayashi, Y. (1994). Fuzzy neural networks: A survey. Fuzzy Sets and Systems, 66, 1-13.
Chakraborty, K., Mehrotra, K., & Mohan, C. K. (1992). Forecasting the behavior of multivariate time series using neural networks. Neural Networks, 5(6), 961-970.
Chase, C. W. (1993). Ways to improve sales forecasts. Journal of Business Forecasting, 12(3), 15-17.
Dalkey, N. C., & Helmer, O. (1963). An experimental application of the Delphi method to the use of experts. Management Science, 9, 458-467.
Florance, M. M., & Sawicz, M. S. (1993). Positioning sales forecasting for better results. Journal of Business Forecasting, 12(4), 27-28.
Fukuda, T., & Shibata, T. (1992). Hierarchical intelligent control for robotic motion by using fuzzy, artificial intelligence, and neural network. Proceedings of IJCNN'92 (pp. I-269-I-274).
Gupta, M. M., & Knopf, G. K. (1990). Fuzzy neural network approach to control systems. Proceedings of the First International Symposium on Uncertainty Modeling and Analysis, Maryland, MD (pp. 483-488).
Gupta, M. M., & Qi, J. (1991). On fuzzy neuron models. Proceedings of International Joint Conference on Neural Networks, Seattle, II, 431-436.
Gupta, M. M., & Qi, J. (1992). On fuzzy neuron models. In L. Zadeh, & J. Kacprzky (Eds.), Fuzzy logic for the management of uncertainty (pp. 479-491). New York: Wiley.
Hayashi, Y., Buckley, J. J., & Czogula, E. (1993). Fuzzy neural network. International Journal of Intelligent Systems, 8, 527-537.
Ishibuchi, H., Kwon, K., & Tanaka, H. (1995a). A learning algorithm of fuzzy neural networks with triangular fuzzy weights. Fuzzy Sets and Systems, 71, 277-293.
Ishibuchi, H., Morioka, K., & Turksen, I. B. (1995b). Learning by fuzzified neural networks. International Journal of Approximate Reasoning, 13(4), 327-358.
Ishibuchi, H., Okada, H., Fujioka, R., & Tanaka, H. (1993). Neural networks that learn from fuzzy if-then rules. IEEE Transactions on Fuzzy Systems, FS-1(2), 85-97.
Ishibuchi, H., & Tanaka, H. (1991). An extension of the BP-algorithm to interval input vectors. Proceedings of IEEE International Joint Conference on Neural Networks, Singapore (pp. 1588-1593).
Ishikawa, A., Amagasa, M., Tomiqawa, G., Tatsuta, R., & Mieno, H. (1993). The max-min Delphi method and fuzzy Delphi method via fuzzy integration. Fuzzy Sets and Systems, 55, 241-253.
Jang, J.-S. R. (1991). Fuzzy modeling using generalized neural networks and Kalman filter algorithm. Proceedings of the Ninth National Conference on Artificial Intelligence (pp. 762-767).
Jang, J.-S. R. (1992). Fuzzy controller design without domain expert. IEEE International Conference on Fuzzy Systems (pp. 289-296).
Jang, J.-S. R., & Sun, C.-T. (1993). Functional equivalence between radial basis function networks and fuzzy inference systems. IEEE Transactions on Neural Networks, 4(1), 156-159.
Kaufmann, A., & Gupta, M. M. (1985). Introduction to fuzzy arithmetic. Amsterdam: North-Holland.
Kumar, A., Rao, V. R., & Soni, H. (1995). An empirical comparison of neural network and logistic regression models. Marketing Letters, 6(4), 251-263.
Kuo, R. J. (1994). Multi-sensor integration for manufacturing process control through artificial neural networks and fuzzy modelling. PhD Thesis, The Pennsylvania State University, PA, USA.
Kuo, R. J., Chen, J. H., & Hwang, Y. C. (2001). An intelligent stock trading decision support system through integration of genetic algorithm based fuzzy neural network and artificial neural network. Fuzzy Sets and Systems, 118(1), 21-45.
Kuo, R. J., & Cohen, P. H. (1998). Manufacturing process control through integration of neural networks and fuzzy model. Fuzzy Sets and Systems, 98(1), 15-31.
Kuo, R. J., & Cohen, P. H. (1999). Integration of RBF network and fuzzy neural network for tool wear estimation. Neural Networks, 12(2), 355-370.
Kuo, R. J., & Xue, K. C. (1999). Fuzzy neural network with application to sales forecasting. Fuzzy Sets and Systems, 108(2), 123-143.
Lachtermacher, G., & Fuller, J. D. (1995). Backpropagation in time-series forecasting. Journal of Forecasting, 14, 381-393.
Lee, C. C. (1990). Fuzzy logic in control systems: Fuzzy logic controller, Parts I and II. IEEE Transactions on Systems, Man, and Cybernetics, SMC-20(2), 404-435.
LeVee, G. S. (1992-1993). The key to understanding the forecasting process. Journal of Business Forecasting, 11(4), 12-16.
Lin, C. T. (1995). A neural fuzzy control system with structure and parameter learning. Fuzzy Sets and Systems, 70, 183-212.
Lin, C. T., & Lee, C. S. G. (1991). Neural-network-based fuzzy logic control and decision system. IEEE Transactions on Computers, C-40(12), 1320-1336.
Lin, C. T., & Lu, Y. C. (1995). A neural fuzzy system with linguistic teaching signals. IEEE Transactions on Fuzzy Systems, 3(2), 169-189.
Lippmann, R. P. (1987). An introduction to computing with neural nets. IEEE ASSP Magazine, 4-22.
Meyer, G. G. (1993). Marketing research and sales forecasting at Schlegel Corporation. Journal of Business Forecasting, 12(2), 22-23.
Nakayama, S., Horikawa, S., Furuhashi, T., & Uchikawa, Y. (1992). Knowledge acquisition of strategy and tactics using fuzzy neural networks. Proceedings of IJCNN'92 (pp. II-751-II-756).
Rumelhart, D. E., McClelland, J. L., & the PDP Research Group (1986). Parallel distributed processing (Vol. 1). Cambridge, MA: MIT Press.
Shibata, T., Fukuda, T., Kosuge, K., & Arai, F. (1992). Skill based control by using fuzzy neural network for hierarchical intelligent control. Proceedings of IJCNN'92 (pp. II-81-II-86).
Takagi, T., & Hayashi, I. (1991). NN-driven fuzzy reasoning. International Journal of Approximate Reasoning, 5, 191-212.
Tang, Z., Almeida, C., & Fishwick, P. A. (1991). Time series forecasting using neural networks vs. Box-Jenkins methodology. Simulation, Simulation Councils, 303-310.
Wang, L.-X., & Mendel, J. M. (1992). Back-propagation fuzzy system as nonlinear dynamic system identifiers. IEEE International Conference on Fuzzy Systems (pp. 1409-1418).
Weigend, A. S., Rumelhart, D. E., & Huberman, B. A. (1991). Generalization by weight-elimination with application to forecasting. Advances in Neural Information Processing Systems, 3, 875-882.
Yager, R. R. (1980). On choosing between fuzzy subsets. Kybernetes, 9, 151-154.
Zadeh, L. A. (1973). Outline of a new approach to the analysis of complex systems and decision processes. IEEE Transactions on Systems, Man, and Cybernetics, SMC-3(1), 28-44.

R.J. Kuo received the MS degree in Industrial and Manufacturing Systems Engineering from Iowa State University, Ames, IA, in 1990 and the PhD degree in Industrial and Management Systems Engineering from the Pennsylvania State University, University Park, PA, in 1994. Currently, he is Professor and Chairman of the Department of Industrial Engineering, National Taipei University of Technology, Taiwan, ROC. His research interests include architecture issues of neural networks, fuzzy logic, and genetic algorithms, and their applications in decision support systems, process control and forecasting.

P. Wu received the MS degree in Industrial and Manufacturing Systems Engineering from Iowa State University, Ames, IA, in 1992 and the PhD degree in Industrial Engineering from North Carolina State University in 1997. Currently, he is an associate professor in the Department of Industrial Engineering and Management, I-Shou University, Taiwan, ROC. His research focuses on fuzzy logic, neural networks, decision support systems, and operations research.

C.P. Wang received the MS degree in management science from I-Shou University, Taiwan, in 1999. Currently, he is in military service. His research interests include neural networks, fuzzy logic, and their applications in logistics and forecasting.
