www.elsevier.com/locate/asoc
Abstract
Technical analysis of stocks mainly focuses on the study of irregularities, which is a non-trivial task. Because one time scale alone cannot be
applied to all analytical processes, the identification of typical patterns on a stock requires considerable knowledge and experience of the stock
market. It is also important for predicting stock market trends and turns. The last two decades have seen attempts to solve such non-linear financial
forecasting problems using AI technologies such as neural networks, fuzzy logic, genetic algorithms and expert systems, but these, although
promising, lack explanatory power or are dependent on domain experts. This paper presents an algorithm, PXtract, to automate the recognition
process of possible irregularities underlying the time series of stock data. It makes dynamic use of different time windows, and exploits the potential
of wavelet multi-resolution analysis and radial basis function neural networks for the matching and identification of these irregularities. The study
provides room for case establishment and interpretation, which are both important in investment decision making.
© 2006 Elsevier B.V. All rights reserved.
Keywords: Forecasting; Wavelet analysis; Neural networks; Radial basis function network; Chart pattern extraction; Stock forecasting; CBR
* Corresponding author.
E-mail addresses: csnkliu@comp.polyu.edu.hk (James N.K. Liu), cskwong@comp.polyu.edu.hk (Raymond W.M. Kwong).
1568-4946/$ – see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.asoc.2006.01.007
J.N.K. Liu, R.W.M. Kwong / Applied Soft Computing 7 (2007) 1197–1208

Many financial researchers believe that there are some hidden indicators and patterns underlying stocks [9]. Weinstein [10] found that every stock has its own characteristics. Stocks mainly fall into five categories: finance, utilities, property, commercial/industrial, and technology. Stock price movements in different categories depend on different factors. It is difficult to identify which factors will affect a particular stock's price movement. To address the problem, we explored the use of a genetic algorithm to provide a dynamic mechanism for selecting appropriate factors from available fundamental data and technical indicators [11]. Our investigation of the HK stock market included potential parameters in fundamental data such as daily high, daily low, daily opening, daily closing, daily turnover, gold price, oil price, HK/US dollar exchange rate, HK deposit call, HK interbank call, HK prime rate, silver price, and the Hang Seng index, comprising 33 stocks from the said five categories. The aggregate market capitalization of these stocks accounts for about 79% of the total market capitalization on The Stock Exchange of Hong Kong Limited (SEHK).

On the other hand, for the technical indicators, we examined the influences of popular indicators such as the relative strength index (RSI), moving average (MA), stochastic and Bollinger bands, prices/index movements, time lags and several data transformations [12,13]. Each of these indicators provides guidance for investors to analyze the trend of stock price movements. In particular, the RSI is quite useful to technical analysts in chart interpretation. The theoretical basis of the relative strength index is the concept of momentum. A momentum oscillator is used to measure the velocity or rate of change of price over time. It is essentially a short-term trading indicator and also quite effective in extracting price information for a non-trending market. In short, the total number of potential inputs being tested was 57 [11]. We applied GAs to determine which input parameters are optimal for modeling different stocks in Hong Kong. The fitness value of the chromosome in the genetic algorithm was the classification rate of the neural network. It was calculated by counting on how many days the network's output matched the derived "best strategy". We defined the best strategy at trading time t as:

\[
\text{best strategy} =
\begin{cases}
\text{buy} & \text{if } \dfrac{\text{price}(t+1) - \text{price}(t)}{\text{price}(t)} > z\% \\[2ex]
\text{sell} & \text{if } \dfrac{\text{price}(t+1) - \text{price}(t)}{\text{price}(t)} < -z\% \\[2ex]
\text{hold} & \text{otherwise}
\end{cases}
\]

where z is the decision threshold, and the output of the network is encoded as 1, 0, and −1 corresponding to the suggested investment strategies 'buy', 'hold', 'sell', respectively. We observed that the daily closing price and its transformation were the most sensitive input parameters for the stock forecast. In contrast, technical indicators such as RSI and MA were not critical in those experiments. As such, we feel confident in concentrating on the investigation of closing price movements for possible trends and irregularities. This will be the subject of chart pattern analysis below.

3. Wave pattern identification

According to Thomas [14], there are up to 47 different chart patterns, which can be identified in stock price charts. These chart patterns play a very important role in technical analysis, with different chart patterns revealing different market trends. For example, a head-and-shoulders tops chart pattern reveals that the market will most likely have a 20–30% rise in the near future. Successfully identifying the chart pattern is said to be the crucial step towards the win. Fig. 1 shows 16 samples of typical chart patterns.

However, the analysis and identification of wave patterns is difficult for two reasons. Firstly, there exists no single time scale that works for all analytical purposes. Secondly, any stock chart may exhibit countless different pattern combinations, some containing sub-patterns. Choosing the most representative presents quite a dilemma. Furthermore, there is no readily available report of research development on the automatic process of identifying chart patterns. We address this problem using the following algorithm.

3.1. The PXtract algorithm

The PXtract algorithm extracts wave patterns from stock price charts based on the following phases:

3.1.1. Window size phase
As there is hardly a single time scale that works for all analytical purposes in a wave identification process [2,29], a set of time window sizes $W = \{w_1, w_2, \ldots, w_n\}$ with $w_1 > w_2 > \cdots > w_n$ is defined ($w_i$ is the window size for $1 \le i \le n$). Different window sizes are used to determine whether a wave pattern occurs in a specific time range. For example, in a short-term investment strategy, a possible set of window sizes can be defined as $w_i \in W = \{40, 39, \ldots, 10\}$.

3.1.2. Time subset generation phase
Stock price trading data contain a set of time data $T = \{t_1, t_2, \ldots, t_n\}$ with $t_1 > t_2 > \cdots > t_n$. For a given time window size $w_i$, T will be divided into a temporary subset $T'$. A set P is also defined, where $P \subseteq T$. It contains the time ranges in which previously identified wave patterns have occurred. Set P is empty ($\emptyset$) in the beginning.

It is said that any large change in a trend plays a more important role in the prediction process [13]. A range which has previously been discovered to contain a wave pattern will not be tested again (i.e. if $T' \subseteq P$, tests will not be carried out). Details about the generation process for the time subset $T'$ are shown in Fig. 2.

For example, T = {10 Jan, 9 Jan, 8 Jan, 7 Jan, 6 Jan, 5 Jan, 4 Jan, 3 Jan, 2 Jan, 1 Jan}, the current testing window size is 3 (w = 3), and P = {9 Jan, 8 Jan, 7 Jan, 6 Jan}. After the time subset generation process, $T'$ = {(5 Jan, 4 Jan, 3 Jan), (4 Jan, 3 Jan, 2 Jan), (3 Jan, 2 Jan, 1 Jan)}.

3.1.3. Pattern recognition
For a given set of time $T''$ with $T'' \subseteq T'$, apply the wavelet theory to identify the desired sequences. If a predefined wave pattern is discovered, add $T''$ to P. Details are described below.

The proposed algorithm PXtract is given in Fig. 3. The function genSet($w_i$) is the subset generation process discussed
earlier. At the end of the algorithm, all the time information of the identified wave pattern is stored in set P.

Pattern matching can be carried out using simple multi-resolution (MR) matching or radial basis function neural network (RBFNN) matching. Details of the wavelet recognition and simple MR matching can be found in our previous work [15].

4. Wavelet recognition and matching

Wavelet analysis is a relatively recent development of applied mathematics, dating from the 1980s. It has since been applied widely with encouraging results in signal processing, image processing and pattern recognition [16]. As the waves in stock charts are 1D patterns, no transformation from higher dimension to 1D is needed. In general, wavelet analysis involves the use of a univariate function $\psi$, defined on R, which, when subjected to the fundamental operations of shifts and dyadic dilation, yields an orthogonal basis of $L^2(R)$.

The orthonormal basis of compactly supported wavelets of $L^2(R)$ is formed by the dilation and translation of a single function $\psi(x)$:

\[ \psi_{j,k}(x) = 2^{j/2}\, \psi(2^{j} x - k) \]

where $j, k \in Z$. Vanishing moments means that the basis functions are chosen to be orthogonal to the low degree polynomials. It is said that a function $\psi(x)$ has a vanishing kth moment at point $t_0$ if the following equality holds with the integral converging absolutely:

\[ \int (t - t_0)^{k}\, \psi(t)\, dt = 0 \]
The function $\psi(x)$ has a companion, the scaling function $\phi(x)$, and these functions satisfy the following relations:

\[ \phi(x) = \sqrt{2} \sum_{k=0}^{L-1} h_k\, \phi(2x - k) \]

\[ \psi(x) = \sqrt{2} \sum_{k=0}^{L-1} g_k\, \phi(2x - k) \]

where $h_k$ and $g_k$ are the low- and high-pass filter coefficients, respectively, L is related to the number of vanishing moments k and L is always even. For example, L = 2k in the Daubechies wavelets.

\[ g_k = (-1)^{k}\, h_{L-k-1}, \quad k = 0, \ldots, L-1 \]

\[ \int_{-\infty}^{+\infty} \phi(x)\, dx = 1 \]

The filter coefficients are assumed to satisfy the orthogonality relations:

\[ \sum_{n} h_n h_{n+2j} = \delta(j) \]

\[ \sum_{n} h_n g_{n+2j} = 0 \]

According to the wavelet orthonormal decomposition as shown in Eq. (1), $V_j$ is first decomposed orthogonally into a low-frequency sub-space $V_{j+1}$ and a high-frequency sub-space $W_{j+1}$. The low-frequency sub-space $V_{j+1}$ is further decomposed into $V_{j+2}$ and $W_{j+2}$, and the process can be continued. The above wavelet orthonormal decomposition can be represented by

\[ V_j = W_{j+1} \oplus V_{j+1} = W_{j+1} \oplus W_{j+2} \oplus V_{j+2} = W_{j+1} \oplus W_{j+2} \oplus W_{j+3} \oplus V_{j+3} = \cdots \]

According to Tang et al. [16], projective operators $A_j$ and $D_j$ are defined as:

\[ A_j : L^2(R) \to V_j \quad \text{(projective operator from } L^2(R) \text{ to } V_j) \]

\[ D_j : L^2(R) \to W_j \quad \text{(projective operator from } L^2(R) \text{ to } W_j) \]

Since $f(x) \in V_j \subset L^2(R)$:

\[ f(x) = A_j f(x) = \sum_{k \in Z} c_{j,k}\, \phi_{j,k}(x) = A_{j+1} f(x) + D_{j+1} f(x) = \sum_{m \in Z} c_{j+1,m}\, \phi_{j+1,m}(x) + \sum_{m \in Z} d_{j+1,m}\, \psi_{j+1,m}(x) \]

Also, Tang et al. [16] have proved the following equations:

\[ c_{j+1,m} = \sum_{k} h_k\, c_{j,k+2m} \qquad (2) \]
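The filter relations and Eq. (2) can be checked numerically. The sketch below uses the well-known four-tap Daubechies filter (DB2, L = 4), builds $g_k$ from $h_k$, verifies the orthogonality relations, and applies one decomposition level to a short series. The detail formula $d_{j+1,m} = \sum_k g_k c_{j,k+2m}$ is the standard companion of Eq. (2) and is an assumption here, since it does not appear in the text above. The periodic wrap-around at the boundary and the `prices` values are also made up for illustration.

```python
import numpy as np

SQRT3 = np.sqrt(3.0)
# Daubechies DB2 low-pass filter (L = 4, two vanishing moments)
h = np.array([1 + SQRT3, 3 + SQRT3, 3 - SQRT3, 1 - SQRT3]) / (4 * np.sqrt(2.0))
L = len(h)
# High-pass filter from g_k = (-1)^k * h_{L-k-1}
g = np.array([(-1) ** k * h[L - k - 1] for k in range(L)])

def shifted_dot(a, b, j):
    """Sum over n of a_n * b_{n+2j}, treating out-of-range taps as zero."""
    total = 0.0
    for n in range(len(a)):
        m = n + 2 * j
        if 0 <= m < len(b):
            total += a[n] * b[m]
    return total

def decompose(c):
    """One analysis level for an even-length series c.

    approx_m = sum_k h_k * c_{k+2m}   (Eq. (2))
    detail_m = sum_k g_k * c_{k+2m}   (assumed companion formula)
    Indices wrap around (periodic extension, an assumption of this sketch).
    """
    c = np.asarray(c, dtype=float)
    n = len(c)
    idx = (np.arange(L)[None, :] + 2 * np.arange(n // 2)[:, None]) % n
    return c[idx] @ h, c[idx] @ g

# Hypothetical closing prices, for illustration only
prices = np.array([10.0, 10.4, 10.1, 10.9, 11.3, 11.0, 10.6, 10.8])
approx, detail = decompose(prices)
```

Calling `decompose` again on `approx` gives the next coarser level, which mirrors the repeated splitting of $V_{j+1}$ into $V_{j+2}$ and $W_{j+2}$. With the periodic extension used here, the energy of `prices` equals the combined energy of `approx` and `detail`.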
extracting chart patterns in the stock time series data is a time consuming and expensive operation. We have examined five typical stocks for the period 1 January 1995 to 31 December 2001 (see Table 1). A summary of the total numbers of real training data for fourteen different chart patterns is shown in Table 2. The training set of the chart patterns is collected based on the judgment of a human critic following the rules suggested by Thomas [14] from the real and deformed data described in the following. The training set contains 308 records in total. A quarter of the training set is extracted as the validation set. We set the wavelet resolution equal to 8. We found that the signal/pattern at resolution levels 1–3 was too smooth, and the patterns were too similar to each other at those levels. The network was not able to recognize different patterns well. Therefore, only four RBFNNs were created for training different chart patterns at the resolution levels 4–7. The performance of the networks at different resolution levels and the classification results are shown in Section 6.

In our training set, the initial quantity of data is insufficient for training the system well. If we tried to extract over 200 chart patterns in the time series data, it would be infeasible, time consuming and expensive. In order to expand the training set, we use a simple but powerful mechanism to generate more training data based on the real data.

To generate more training samples, a radial deformation method is introduced. Here are the major steps of the radial deformation process:

(a) P = {p1, p2, p3, . . ., pn} is a set of data points containing a chart pattern.
(b) Randomly pick i points (i ≤ n) in set P for deformation.
(c) Randomly generate a set of radial deformation distances D = {d1, d2, . . ., di}.
(d) For each point in P, a random step dr is taken in a random direction. The deformed pattern is constructed by joining consecutive points with straight lines. Details are depicted in Fig. 7.
(e) Have human critics judge whether the deformed pattern is acceptable.

Psychophysical studies [25] tell us that humans are better than machines at recognizing objects, which are more
Table 1
The five different stocks and their stock IDs

Stock ID    Stock name
00341       CAFÉ DE CORAL HOLDINGS Ltd.
00293       CATHAY PACIFIC AIRWAYS Ltd.
00011       HANG SENG BANK Ltd.
00005       HSBC HOLDINGS PLC.
00016       SUN HUNG KAI PROPERTIES Ltd.

Fig. 7. Radial deformation. (a) An example of an accepted deformed pattern. (b) An example of a deformed pattern that was NOT accepted.
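Steps (a) to (d) of the radial deformation process can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: points are treated as (x, y) pairs, the distances are drawn uniformly from [0, `max_distance`], and the names `radial_deform`, `num_points` and `max_distance` are hypothetical. Step (e), the human judgment, is deliberately left outside the code.

```python
import math
import random

def radial_deform(points, num_points, max_distance, rng=None):
    """Return a deformed copy of `points`, a list of (x, y) pairs."""
    rng = rng or random.Random()
    if not 0 <= num_points <= len(points):
        raise ValueError("num_points must be between 0 and len(points)")
    deformed = list(points)
    # (b) randomly pick which points to deform
    chosen = rng.sample(range(len(points)), num_points)
    # (c) one random radial distance per chosen point (uniform: an assumption)
    distances = [rng.uniform(0.0, max_distance) for _ in chosen]
    # (d) move each chosen point by its distance in a random direction
    for index, d in zip(chosen, distances):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x, y = points[index]
        deformed[index] = (x + d * math.cos(angle), y + d * math.sin(angle))
    return deformed

# Made-up shape loosely resembling a "double tops" pattern
pattern = [(0, 1.0), (1, 2.0), (2, 1.2), (3, 2.0), (4, 1.0)]
new_pattern = radial_deform(pattern, num_points=3, max_distance=0.2,
                            rng=random.Random(7))
```

Joining consecutive entries of `new_pattern` with straight lines gives the deformed chart. Each candidate would still need to be accepted or rejected by a human critic, as in Fig. 7, before it is added to the training set.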
Table 2
Total numbers of training patterns in fourteen typical chart patterns of five different stocks
Table 3
Optimal wavelet and thresholds setting found by empirical testing

Wavelet family     Resolution   Threshold   Accuracy   Total number of       Processing
                   threshold    value       (%)        patterns discovered   time (s)
Daubechies (DB2)   4            0.3          6.2       8932                   312
                                0.2          7.1       7419
                                0.15        14.2       3936
                                0.1         43.1        543
                   5            0.3          7.1       7734                   931
                                0.2          9.4       6498
                                0.15        17.4       2096
                                0.1         53          420
                   6            0.3          8.9       7146                  3143
                                0.2         13.5       5942
                                0.15        19.9       1873
                                0.1         56.9        231
                   7            0.3         10.5       6023                  8328
                                0.2         14.5       5129
                                0.15        18.5       1543
                                0.1         48.3        194
a large range of window sizes, we used typical stock prices of SUN HUNG KAI and Co. Ltd. (0086) for the period from 2 January 1992 to 31 December 2001.

As shown in Fig. 10, the algorithm scales linearly as the size of the time window increases.

In the experiments on wavelet chart pattern recognition, different wavelet families were selected as the filter. The maximum resolution level was set to be 7. The highest resolution level, 8, is taken as the raw input. The left hand side of Fig. 11 shows the price of the stock CATHAY PACIFIC (00293) for the period from 7 June 1999 to 22 July 1999. This period contains the 'Double Tops' pattern. For the identification of the chart patterns, two matching methods were studied: simple multi-resolution (MR) matching and RBFNN matching. For the simple MR matching, similarity between the input and the template is measured by mean absolute percentage error (MAPE). A low MAPE denotes that they are similar. The performance of simple MR matching was tested in experiments using different resolution thresholds t and different level thresholds l.

Table 3 shows the most accurate combinations. We note that the accuracy using simple MR matching is not accurate
Fig. 9. Training set from both the real and deformed chart patterns.
Fig. 11. Algorithm PXtract using wavelet multi-resolution analysis on the pattern "double tops" template.
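The time subset generation phase of PXtract (Section 3.1.2) and the MAPE similarity measure used in simple MR matching can be sketched as follows. This is a hedged illustration, not the algorithm of Fig. 3. The names `gen_set`, `mape` and `pxtract` are hypothetical, and `matches` is a caller-supplied stand-in for the wavelet or RBFNN matcher. One further assumption: a window is skipped as soon as it touches any time point already in P. That rule reproduces the worked example of Section 3.1.2, whereas a strict subset test alone would keep the window (10 Jan, 9 Jan, 8 Jan).

```python
def gen_set(times, w, found):
    """Contiguous windows of length w from `times`, skipping any window that
    touches a time point already in `found` (the set P)."""
    windows = []
    for start in range(len(times) - w + 1):
        window = tuple(times[start:start + w])
        if not any(t in found for t in window):
            windows.append(window)
    return windows

def mape(values, template):
    """Mean absolute percentage error (in %) between two equal-length
    sequences; template entries must be non-zero."""
    if len(values) != len(template):
        raise ValueError("sequences must have the same length")
    total = sum(abs((v - t) / t) for v, t in zip(values, template))
    return 100.0 * total / len(template)

def pxtract(times, window_sizes, matches):
    """Try window sizes from largest to smallest and collect matched ranges.

    `matches(window)` is a placeholder predicate for the pattern matcher.
    """
    found = set()
    for w in sorted(window_sizes, reverse=True):
        for window in gen_set(times, w, found):
            # Re-check overlap, since `found` may grow during this pass
            if matches(window) and not any(t in found for t in window):
                found.update(window)
    return found

# Worked example from Section 3.1.2
days = [f"{d} Jan" for d in range(10, 0, -1)]
P = {"9 Jan", "8 Jan", "7 Jan", "6 Jan"}
subsets = gen_set(days, 3, P)
```

With these inputs, `subsets` holds the three windows listed in Section 3.1.2. A matcher based on simple MR matching could, for instance, accept a window when `mape` between its coefficients and the template's falls below a chosen threshold.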
Cybernetics, Sheraton Hotel, Xi'an, China, 02–05 November 2003, (2003), pp. 1768–1773.
[28] L.J. Cao, F.E.H. Tay, Support vector machine with adaptive parameters in financial time series forecasting, IEEE Trans. Neural Networks 14 (6) (2003) 1506–1518.
[29] P. Blakey, Pattern recognition techniques [in stock price and volumes], IEEE Microwave Mag. 3 (1) (2000) 28–33.
[32] L.Y. Yu, Y.-Q. Zhang, Evolutionary fuzzy neural networks for hybrid financial prediction, IEEE Trans. SMC Part C 35 (2) (2005) 244–249.