
Backpropagation Networks for Time Series Forecasting:
Case Studies in Data Modeling

Kriangsiri Malasri
Undergraduate Student
School of Aerospace Engineering
Georgia Institute of Technology
Atlanta, GA 30332

Siripong Malasri
Professor and Dean
School of Engineering
Christian Brothers University
Memphis, TN 38104

ABSTRACT

A commercial software package, NeuroShell 2, was used to apply a standard backpropagation neural network for
forecasting the behavior of various types of time series, x = x(t). The series used covered a wide range of
complexity: linear, quadratic, sinusoidal, damped sinusoidal, and finally a "real-world" application using sample
stock data provided with the software. Depending on the case, the data was modeled using different methods of
varying difficulty. For the linear and quadratic relations, reasonable results were obtained simply by mapping
values of x(t) to their corresponding t. In the sinusoidal cases, values of x(t) were fed to the network in small
groups, with the network predicting only the next point after each group. Finally, in the stocks case, a complicated approach
using a large number of relevant variables was employed. Overall, there was a direct relationship between the
complexity of the series and the complexity of the model required.

INTRODUCTION

The backpropagation network model has long been a backbone in the practical application of neural networks. For
this paper, it was desired to see how the model would perform in forecasting the behavior of various time series.
The series used were:

• Linear: x(t) = t
• Quadratic: x(t) = t²
• Sinusoidal: x(t) = sin t
• Damped sinusoidal: x(t) = 1.05^(-t) sin t
• Share prices of a certain stock over a 955-day time period

In addition, three different data models (to be discussed in the following section) were applied, depending on the
series being considered. To construct the required network architectures and carry out the training, a commercial
software package called NeuroShell 2 [1] was used. The program contains network design tools and provides a
variety of statistical data as well. In all cases, a standard backpropagation network was used, with one input layer,
one hidden layer, and one output layer.
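Although NeuroShell 2 handles the details internally, the kind of network used here is easy to sketch. The following is a minimal illustration (our own, not NeuroShell's implementation) of a one-hidden-layer backpropagation network with a sigmoid hidden layer and a linear output; the class name, learning rate, and initialization are arbitrary choices.

```python
import numpy as np

class BackpropNet:
    """One input layer, one sigmoid hidden layer, one linear output."""
    def __init__(self, n_in, n_hidden, n_out=1, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))   # input -> hidden
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_out))  # hidden -> output
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def forward(self, X):
        self.h = 1.0 / (1.0 + np.exp(-(X @ self.W1 + self.b1)))  # sigmoid
        return self.h @ self.W2 + self.b2                        # linear output

    def train_epoch(self, X, y):
        """One gradient-descent pass over (X, y); returns mean squared error."""
        err = self.forward(X) - y                         # output-layer error
        dW2 = self.h.T @ err / len(X)                     # backprop: output layer
        db2 = err.mean(axis=0)
        dh = (err @ self.W2.T) * self.h * (1.0 - self.h)  # backprop: hidden layer
        dW1 = X.T @ dh / len(X)
        db1 = dh.mean(axis=0)
        for p, g in ((self.W1, dW1), (self.b1, db1),
                     (self.W2, dW2), (self.b2, db2)):
            p -= self.lr * g                              # gradient-descent update
        return float((err ** 2).mean())
```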

TYPES OF DATA MODELING

"Data modeling" refers to how the time series data is provided to the network for training. As mentioned before,
three different types of models were considered. For simplicity in the rest of the paper, they are referred to via
Roman numerals, as follows:

I. Values of x(t) are mapped directly to their corresponding t. Given values for x(t) over some range of t, the
network predicts x(t) beyond this range. The network consists of one input t_i, some number of hidden cells,
and one output x(t_i), as shown in Figure 1.

Figure 1. Network configuration for type I data modeling.

II. Values of x(t) are provided in small sets, each paired with the value of x(t) that immediately follows. Given a
previously unknown set of x(t), the network predicts the next x(t). For instance, in the series x(t) = t², the
training data might consist of {0, 1, 4, 9, 16}, 25; {1, 4, 9, 16, 25}, 36; {4, 9, 16, 25, 36}, 49; ... ; {25, 36, 49,
64, 81}, 100. Then, when given the set {36, 49, 64, 81, 100}, the network would ideally predict 121 as the
following value (a code sketch of this windowing appears after this list). In this case the network consists of
5 inputs, some number of hidden cells, and one output, as shown in Figure 2.

Figure 2. Network configuration for type II data modeling, using 5 inputs.

III. Values of x(t) are provided over some range of t, along with a number of relevant variables that can affect the
behavior of x(t). The network predicts x(t) beyond this range. Clearly, the input variables used are dependent
on the nature of the problem, and selecting them effectively is not a trivial task. This is thus the most
complex data model considered here. A representation of the network for this type of model is shown in
Figure 3.

Figure 3. Network configuration for type III data modeling.
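As promised in item II above, here is a minimal sketch of the sliding-window construction used by model II (the function name is ours, not NeuroShell's); it reproduces the x(t) = t² example.

```python
import numpy as np

def model_ii_windows(series, width=5):
    """Pair each window of `width` consecutive values with the value after it."""
    X = np.array([series[i:i + width] for i in range(len(series) - width)])
    y = np.array(series[width:]).reshape(-1, 1)
    return X, y

squares = [t * t for t in range(11)]  # 0, 1, 4, ..., 100
X, y = model_ii_windows(squares)
print(X[0], y[0])    # [ 0  1  4  9 16] [25]
print(X[-1], y[-1])  # [25 36 49 64 81] [100]
```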

LINEAR AND QUADRATIC SERIES

For the linear series x(t) = t, model I was applied; the network has a single input t and a single output x(t). The
training data consists of all the {t, x(t)} pairs for t = 1, 2, 3, ... , 40. Training of the network was carried out until the
average error on the training patterns fell below 0.00001. The network was asked to forecast values of x(t) for t =
41, 42, 43, ... , 50. As a matter of interest, the process was repeated for varying numbers of hidden cells. The results
are summarized in Table 1 and Figure 4.

The quantity R² mentioned throughout this paper is the coefficient of multiple determination. It is used as a measure
of the performance of the trained network; a perfect fit with the desired outputs would yield R² = 1. R² may be
computed from

R² = 1 − Σ(x_actual − x_predicted)² / Σ(x_actual − x_mean)²
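The same quantity is direct to compute in code; a minimal numpy sketch (the function name is ours):

```python
import numpy as np

def r_squared(actual, predicted):
    """Coefficient of multiple determination; 1.0 means a perfect fit."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    ss_res = np.sum((actual - predicted) ** 2)      # sum of squared residuals
    ss_tot = np.sum((actual - actual.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot
```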

Table 1. Results for forecasting a linear series, x(t) = t, model I.


Hidden cells    Learning epochs    R²
5               7127               0.9976
10              13084              0.9976
15              16927              0.9981
20              12327              0.9984
25              18797              0.9983
30              11635              0.9987
50              16140              0.9990

Figure 4. Comparison of network predictions for (a) 5 hidden cells and (b) 50 hidden cells.
(In all plots in this paper, the horizontal axis denotes the pattern index, not necessarily the value of t.)

Curiously, increasing the number of hidden cells does not appear to have any clear correlation with the learning
epochs required by the network. However, having more hidden cells generally results in a higher value of R²,
indicating a better fit with the desired output data. This is illustrated in Figure 4, which shows that using 50 hidden
cells results in noticeably more accurate forecasting than using only 5 hidden cells. At 50 hidden cells, the network
provides a reasonably good prediction of the behavior of x(t) immediately beyond the training data.

An analogous process was carried out for the quadratic series x(t) = t², with the only difference being the values of t
used. Here, the training data consists of all the {t, x(t)} pairs for t = 0.5, 1, 1.5, ... , 20. The network was trained and
asked to forecast values of x(t) for t = 20.5, 21, 21.5, ... , 25 (where t = 20.5 corresponds to a pattern index of 41).
The results are summarized in Table 2 and Figure 5.

Table 2. Results for forecasting a quadratic series, x(t) = t², model I.


Hidden cells    Learning epochs    R²
5               224                0.9815
10              571                0.9910
15              962                0.9964
20              1331               0.9966
25              3329               0.9981
50              6622               0.9981


Figure 5. Comparison of network predictions for (a) 5 hidden cells and (b) 50 hidden cells.

In this case, there is a direct relationship between the number of hidden cells and the learning epochs required. As
before, increasing the hidden cells also results in improvement in R², as illustrated in Figure 5. Again, at 50 hidden
cells, the network provides a fairly good short-term prediction of x(t) beyond the training data.

Overall, it can be concluded that given a sufficient number of hidden cells, type I data modeling performs
reasonably well for forecasting the behavior of linear and quadratic series. Using only values for x(t) over a range of
t, the backpropagation network can effectively predict x(t) at a near-future time.
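Tying the pieces together, a model I run on the linear series might look as follows, reusing the BackpropNet sketch from the introduction. NeuroShell scales variables internally; the explicit scaling to [0, 1] below stands in for that step, and the learning rate, epoch cap, and hidden-layer size are our choices, not the paper's exact settings.

```python
import numpy as np

t_train = np.arange(1.0, 41.0).reshape(-1, 1)   # t = 1 ... 40
X, y = t_train / 40.0, t_train / 40.0           # x(t) = t, scaled to (0, 1]

net = BackpropNet(n_in=1, n_hidden=50)
for epoch in range(200_000):                    # train toward the 0.00001 criterion
    if net.train_epoch(X, y) < 1e-5:
        break

t_test = np.arange(41.0, 51.0).reshape(-1, 1)   # forecast t = 41 ... 50
print(net.forward(t_test / 40.0).ravel() * 40.0)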

SINUSOIDAL SERIES

Unfortunately, the data model detailed above fails spectacularly when applied to a periodic function. A test case
with x(t) = sin t was performed. The training data consists of all the {t, x(t)} pairs for t = 0, π/10, 2π/10, ... , 4π; the
network was asked to forecast x(t) for t = 41π/10, 42π/10, 43π/10, ... , 6π (where t = 41π/10 corresponds to a pattern
index of 42). As shown in Figure 6, even with 50 hidden cells, the network completely fails to predict the cyclic
behavior of the series past the training data. Furthermore, training was unable to reduce the average error of the
training data to 0.00001, as was done in the previous cases. Even with 50 hidden cells, the error appeared to stall at
around 0.0005 – over a full order of magnitude higher than the desired result.

Figure 6. Failure of model I to forecast the behavior of x(t) = sin t: (a) 5 hidden cells; (b) 50 hidden cells.

Type II data modeling was thus applied to x(t) = sin t. Whereas the network previously had only a single input, t,
model II calls for multiple inputs, x(t_n). In this case, the inputs were handled in sets of 5. Thus, the training data
consists of {x(0), x(π/10), x(2π/10), x(3π/10), x(4π/10)}, x(5π/10); {x(π/10), x(2π/10), x(3π/10), x(4π/10),
x(5π/10)}, x(6π/10); ... ; {x(4π), x(41π/10), x(42π/10), x(43π/10), x(44π/10)}, x(45π/10). The resulting network
contains 5 input cells and 9 hidden cells (suggested by NeuroShell). Using the same 0.00001 error condition as
before, the network was asked to predict x(t) for t = 46π/10, 47π/10, 48π/10, ... , 65π/10 (where t = 46π/10
corresponds to a pattern index of 42). The same process was then repeated for a damped sine series,
x(t) = 1.05^(-t) sin t; the results for both cases are displayed in Table 3 and Figure 7.

Table 3. Results for forecasting sinusoidal series, model II.

Series                    Learning epochs    R²
x(t) = sin t              458                0.9999
x(t) = 1.05^(-t) sin t    16411              0.9994

Figure 7. Model II applied to forecasting (a) x(t) = sin t; (b) x(t) = 1.05^(-t) sin t.

Clearly, in both cases the network is able to accurately predict x(t) at a future time by using values of x(t) at
immediately preceding times. Note, however, that the examples here can be somewhat misleading because they
always use the known actual values of x(t) (as opposed to network predictions) for the inputs. Nonetheless, for
short-term forecasting, model II is sufficient.
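For reference, true multi-step forecasting with model II would feed each prediction back into the window, as in the sketch below (assuming a trained 5-input net such as the BackpropNet above; the function name is ours). Errors then compound with each step, which is why the one-step results shown here look better than a genuine long-range forecast would.

```python
import numpy as np

def forecast(net, last_window, steps):
    """Roll the window forward, reusing each prediction as the next input."""
    window = list(last_window)           # the 5 most recent known values
    out = []
    for _ in range(steps):
        nxt = float(net.forward(np.array([window]))[0, 0])  # one step ahead
        out.append(nxt)
        window = window[1:] + [nxt]      # drop oldest, append the prediction
    return out
```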

STOCK PRICES

All of the cases considered thus far involve series whose value at any time can be computed using an explicit
expression. To test the practical applicability of these concepts, it was decided to let the backpropagation network
attempt to forecast the share prices of a certain stock. Data was obtained from one of the example problems
provided with NeuroShell; it consists of a history of share prices over a 955-day period, along with some stock
market statistics for each day.

The first approach was to apply model II, again using input sets of 5. The resulting network contains 5 input cells
and 28 hidden cells (suggested by NeuroShell). Training data consists of {x(1), x(2), x(3), x(4), x(5)}, x(6); {x(2),
x(3), x(4), x(5), x(6)}, x(7); ... ; {x(798), x(799), x(800), x(801), x(802)}, x(803). The network was asked to predict
x(t) for t = 804, 805, 806, ... , 956. However, training was able to reduce the average training pattern error to only
about 0.00005 – still significantly greater than the desired value of 0.00001. As Figure 8a demonstrates, the lack of
any obvious trend in the stock prices poses a formidable obstacle to the network, and it is unable to accurately
forecast prices past the given training data.

Figure 8. Comparison of network stock predictions using (a) model II; (b) model III.

With the failure of model II, model III was applied. Unlike the previous approaches, which depend solely on values
of x(t) for training, model III calls for the introduction of relevant outside variables. Unfortunately, neither of the
authors has sufficient knowledge of the stock market to effectively select these inputs. Consequently, it was decided
to simply use the variables already supplied in the example problem. The resulting network contains 24 inputs
covering such technical indicators as the NYSE high/low/close, S&P 500 high/low/close, advances volume, declines
volume, unchanged volume, and total volume for each day. Using these inputs, the network displays impressive
accuracy in forecasting x(t), as shown in Figure 8b.
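For illustration, the model III layout amounts to one row of indicator values per day paired with that day's share price. The sketch below shows the shape of such a dataset; the function and dictionary keys are hypothetical stand-ins, not the actual field names of the NeuroShell example.

```python
import numpy as np

def model_iii_dataset(days):
    """`days`: list of dicts, each holding 24 indicator values plus 'price'."""
    keys = [k for k in sorted(days[0]) if k != "price"]   # the 24 input columns
    X = np.array([[d[k] for k in keys] for d in days])
    y = np.array([[d["price"]] for d in days])
    return X, y

# Hypothetical day record (field names are ours):
# {"nyse_high": ..., "nyse_low": ..., "sp500_close": ...,
#  "advances_volume": ..., "total_volume": ..., "price": ...}
```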

CONCLUSIONS

Overall, the standard backpropagation network proves to be quite effective at forecasting the behavior of time series.
Depending on the complexity of the series, however, different types of data modeling must be employed to achieve
desirable results. Of the models considered here, type I is the simplest to apply but also the most limited. Type II
requires more manipulation of the input data, but it succeeds in forecasting more complex series. Both types I and
II are advantageous in that they require only values of x(t) for training. Type III, by contrast, requires careful
selection of appropriate input variables, meaning that a successful modeler needs to have extensive domain
knowledge. In exchange for this added complexity, type III proves to be the most powerful model considered here;
it succeeds in forecasting a complicated practical problem without a simple mathematical basis.

REFERENCE

1. NeuroShell 2, Release 4.0. Ward Systems Group, Inc., Executive Park West, 5 Hillcrest Dr., Frederick,
MD 21703, www.wardsystems.com, 2000.

AUTHORS

Kriangsiri Malasri is currently a junior in aerospace engineering at the Georgia Institute of Technology. His
experience with neural networks has included programming several Java applets that demonstrate various network
algorithms. He hopes to eventually perform research on how artificial intelligence may be applied to aerospace
problems.

Siripong Malasri is a professor and Dean of Engineering at Christian Brothers University. He has taught artificial
intelligence related courses for the Engineering Management Graduate Program and the Electrical & Computer
Engineering Department, including an undergraduate course in Connectionist Artificial Intelligence that covers
various neural network architectures.
