Article history:
Received 24 July 2016
Revised 6 September 2016
Accepted 16 September 2016
Available online 21 September 2016

Keywords:
Daily stock return forecasting
Principal component analysis (PCA)
Fuzzy robust principal component analysis (FRPCA)
Kernel-based principal component analysis (KPCA)
Artificial neural networks (ANNs)
Trading strategies

Abstract: In financial markets, it is both important and challenging to forecast the daily direction of the stock market return. Among the few studies that focus on predicting daily stock market returns, the data mining procedures utilized are either incomplete or inefficient, especially when a large number of features are involved. This paper presents a complete and efficient data mining process to forecast the daily direction of the S&P 500 Index ETF (SPY) return based on 60 financial and economic features. Three mature dimensionality reduction techniques, including principal component analysis (PCA), fuzzy robust principal component analysis (FRPCA), and kernel-based principal component analysis (KPCA), are applied to the whole data set to simplify and rearrange the original data structure. Corresponding to different levels of dimensionality reduction, twelve new data sets are generated from the entire cleaned data using each of the three dimensionality reduction methods. Artificial neural networks (ANNs) are then used with the thirty-six transformed data sets for classification to forecast the daily direction of future market returns. Moreover, the three dimensionality reduction methods are compared with respect to the natural data set. A group of hypothesis tests are then performed over the classification and simulation results to show that combining the ANNs with the PCA gives slightly higher classification accuracy than the other two combinations, and that the trading strategies guided by the comprehensive classification mining procedures based on PCA and ANNs gain significantly higher risk-adjusted profits than the comparison benchmarks, while also being slightly higher than those strategies guided by the forecasts based on the FRPCA and KPCA models.

© 2016 Elsevier Ltd. All rights reserved.
∗ Corresponding author.
E-mail addresses: xz4y4@mst.edu (X. Zhong), enke@mst.edu (D. Enke).
http://dx.doi.org/10.1016/j.eswa.2016.09.027

1. Introduction

Analyzing stock market movements is extremely challenging for both investors and researchers. This is mainly due to the stock market essentially being a dynamic, nonlinear, nonstationary, nonparametric, noisy, and chaotic system (Deboeck, 1994; Yaser & Atiya, 1996). In fact, stock markets are affected by many highly interrelated factors. These factors include: 1) economic variables, such as interest rates, exchange rates, monetary growth rates, commodity prices, and general economic conditions; 2) industry specific variables, such as growth rates of industrial production and consumer prices; 3) company specific variables, such as changes in company policies, income statements, and dividend yields; 4) psychological variables of investors, such as investors' expectations and institutional investors' choices; and 5) political variables, such as the occurrence and the release of important political events and methodology (Enke & Thawornwong, 2005; Wang, Wang, Zhang, & Guo, 2011). Each of these factors interacts in a very complex manner (Yao, Tan, & Poh, 1999). Above all, the efficient market hypothesis states that current stock values reflect all available information in the market at that moment, and that the public cannot make successful trades based on that information, further adding to the difficulty of understanding and predicting stock market movements.

However, it is believed by some researchers that the markets are inefficient, in part due to psychological factors of the various market participants, along with the inability of the markets to immediately respond to newly released information (Jensen, 1978). Financial variables, such as stock prices, stock market index values, and the prices of financial derivatives, are therefore thought to be predictable. This allows one to gain a return above the market average by examining information released to the general public, with results that are better than random (Lo & MacKinlay, 1988). For decades, investors and researchers have thus been attracted to the possibility of making significant profits from potential market inefficiencies by improving trading strategies based on increasingly accurate forecasts of financial variables.
X. Zhong, D. Enke / Expert Systems With Applications 67 (2017) 126–139 127
There exist different categorizations among previous stock market forecasting technologies. For instance, given the number of input variables, financial time series forecasting can be classified as either univariate or multivariate analysis. In univariate analysis, only the financial time series itself is considered as the input, while in multivariate analysis the input variables can be a lagged time series, or another type of data, such as a technical, fundamental, or inter-market indicator. With regard to the techniques used to analyze the stock markets, both statistical and artificial intelligence methods have been explored. One group of statistical approaches is based on the autoregressive moving average (ARMA), the autoregressive integrated moving average (ARIMA), the generalized autoregressive conditional heteroskedastic (GARCH) volatility (Franses & Ghijsels, 1999), and the smooth transition autoregressive model (STAR) (Sarantis, 2001). These statistical techniques also fall into the category of univariate analysis since they use the financial time series itself, as well as a lagged time series, as input variables. Other types of statistical approaches often employed include linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), linear regression (LR), and support vector machines (SVM), each of which usually includes multiple input variables. Because of their assumptions of linearity, stationarity, and normality, most of the statistical analysis methods listed above have seen only restricted use within the area of financial forecasting. On the contrary, artificial intelligence models, such as artificial neural networks (ANNs), fuzzy systems, and genetic algorithms, are driven by multivariate data with no required assumptions. Many of these methodologies have been applied to forecast financial variables. For instance, see Armano, Marchesi, and Murru (2005), Cao and Tay (2001), Chen, Leung, and Daouk (2003), Chun and Kim (2004), Thawornwong and Enke (2004), Enke and Thawornwong (2005), Hansen and Nelson (2002), Kim and Han (2000), Shen and Loh (2004), Ture and Kurt (2006), Vellido, Lisboa, and Meehan (1999), Wang (2002), and Zhang (2003). A comprehensive review of these studies can be found in Atsalakis and Valavanis (2009) and Vanstone and Finnie (2009). Often, the developed price forecasting and stock market timing systems are used in conjunction with trading rules to develop an intelligent, autonomous, and/or adaptive decision support system. For instance, see Barak, Dahooie, and Tichý (2015), Cervelló-Royo, Guijarro, and Michniuk (2015), Chen and Chen (2016), Chiang, Enke, Wu, and Wang (2016), Chourmouziadis and Chatzoglou (2016), Enke and Mehdiyev (2013), Jaisinghani (2016), Kim and Enke (2016), Monfared and Enke (2014), and Thawornwong, Enke, and Dagli (2001).

With nonlinear, data-driven, and easy-to-generalize characteristics, multivariate analysis through the use of ANNs has become a dominant and popular analysis tool in finance and economics. Refenes, Burgess, and Bentz (1997) and Zhang, Patuwo, and Hu (1998) provide a review of using ANNs as a forecasting method in different areas of finance and investing, including financial engineering. Although ANNs seem to be suited for financial time series forecasting, they have some limitations. Saad, Prokhorov, and Wunsch (1998) question the robustness of ANN results. Hussain, Knowles, Lisboa, and El-Deredy (2007) and Lam (2004) also state that it is crucial for the ANNs to achieve accurate results with a deliberate selection of the input variables and an optimal combination of the network parameters, including the learning rate, momentum, number of hidden layers, and number of nodes in each layer. Atsalakis and Valavanis (2009), Cao, Leggio, and Schniederjans (2005), and Thawornwong and Enke (2004) demonstrate that designing an ANN with the least complexity and the most relevant and influential input variables can improve the efficiency and accuracy of financial time series forecasts. As mentioned earlier, stock markets are affected by various factors, many of which are utilized as possible input variables during the development of a stock market forecasting system. Thus, it is necessary to choose the most influential and representative inputs if an ANN is expected to produce an efficient and accurate prediction. This type of selection is the main task of dimensionality reduction technology.

Strictly speaking, the dimensionality reduction can be performed in two different ways: either by selecting the most relevant variables from the original data set (usually called feature selection) or by generating a smaller group of new variables, each being a certain combination of the original input variables. Researchers in statistics, computer science, and applied mathematics have worked in this field for many years and developed a variety of linear and nonlinear dimensionality reduction techniques. Van der Maaten, Postma, and Van den Herik (2009) present a review and systematic comparison of these techniques. Sorzano, Vargas, and Pascual-Montano (2014) also categorize the plethora of dimension reduction techniques with the mathematical insight behind them.

Principal component analysis (PCA) is the most classical and well-known statistical method for extracting important features from a high-dimensional data space. This methodology dates back to Pearson (1901), and is based on the idea of defining a new coordinate system or space where the raw data can be expressed in terms of many fewer variables without a significant loss of information. Nonetheless, there are some concerns that this linear technique cannot adequately handle complex nonlinear data. Therefore, a number of nonlinear techniques, including kernel-based principal component analysis (KPCA), have been proposed. KPCA is a kernel-based dimensionality reduction method that has a broad application in pattern recognition and machine learning. The KPCA method gained more interest after SVM was introduced by Vapnik (1998). Van der Maaten, Postma, and Van den Herik (2009) compare PCA with twelve front-ranked nonlinear dimensionality reduction techniques, such as Multidimensional Scaling, Isomap, Maximum Variance Unfolding, KPCA, Diffusion Maps, Multilayer Autoencoders, Locally Linear Embedding, Laplacian Eigenmaps, Hessian LLE, Local Tangent Space Analysis, Locally Linear Coordination, and Manifold Charting by performing each on artificial and natural tasks. The results show that although nonlinear techniques do well on selected artificial data, none of them outperforms the traditional PCA on real-world data. However, they also point out that the selection of a proper kernel function is important for the performance of KPCA. In general, the model selection in kernel methods, including the specification of relevant parameters, can lead to high computational costs. Consistently, Sorzano et al. (2014) state that among the available dimensionality reduction techniques, PCA and its different versions, such as standard PCA, robust PCA, sparse PCA, and KPCA, are still the preferred techniques given their simplicity and intuitiveness. Moreover, Van der Maaten, Postma, and Van den Herik (2009) demonstrate the four main weaknesses of the popular dimensionality reduction techniques, including: (1) the susceptibility to the curse of dimensionality, (2) the problems in finding the smallest eigenvalues in an eigenproblem, (3) overfitting, and (4) the presence of outliers.

It is known that many well-accepted techniques are sensitive to noisy data, especially outliers in the data. The quality and performance of such techniques can be significantly affected by missing values and outliers, not to mention incorrect data and mismatches that possibly exist in the data collected from different sources. Properly handling outliers can improve the robustness and accuracy of the dimensionality reduction results and help keep any subsequent classifier from spending too much time trying to find an effective solution. Moreover, if the number of outliers is large, the data cannot be normal or symmetric based on the empirical principle of normality, further reducing classification accuracy for some techniques. Thus, in order to perform an efficient and reliable analysis with reasonably accurate results, it is necessary to conduct careful data preprocessing at the beginning of any
data mining procedure. Yet, data preprocessing can be very time consuming and somewhat tedious depending on the specific cases. It is not unusual to spend 60–90% of the modeling and testing effort on cleaning and preprocessing the raw data. Atsalakis and Valavanis (2009) summarize that among the studies of stock market forecasting, some researchers simply preprocess the data by using a logarithmic data transformation or standardization of the raw data, while others do not preprocess the data or give any further details about cleaning the data. There are some techniques from other fields aimed at alleviating and solving this issue. For example, robustness theory was developed for solving problems subject to model perturbation, added noise, or outliers, and the theory of fuzzy sets proposed by Zadeh (1965) can reduce the effect of outliers or noise when applied to data sets with unmodeled characteristics by assigning a fuzzy membership to each input data point, such that different input points can make different contributions to the analysis process. For almost four decades, statisticians have investigated robust algorithms for principal component analysis. One outstanding idea was proposed by Xu and Yuille (1995). They adapt the statistical physics approach to define an objective function with the consideration of outliers, and then generalize several commonly used PCA self-organizing rules into robust versions. They demonstrate that their method can resist outliers very well. However, it is difficult to choose a hard threshold in their approach. Yang and Wang (1999) extend Xu and Yuille's method by defining a fuzzy objective function and using gradient descent optimization. Their robust principal component analysis algorithm, FRPCA, only needs to preset one parameter, the fuzziness variable, which determines the influence of outliers on the results. They developed their algorithm in three different ways based on updating the weights of the data points differently and called them FRPCA1, FRPCA2, and FRPCA3. Luukka (2011) develops a nonlinear version of the FRPCA3 algorithm and claims that with outlier removal his algorithm brings promising results in the study of medical data sets.

Data mining, or big data analytics, is focused on analyzing large amounts of data efficiently and extracting important, useful, and hidden information from the data by combining various techniques in different areas, such as pattern recognition, decision making, expert systems, knowledge database discovery, artificial intelligence, and statistics. The main types of data mining include classification mining, cluster mining, association rule mining, text mining, and image mining. Zhong (2000, 2004) and Zhong, Ma, Yu, and Zhang (2001) demonstrate classification and cluster mining in more detail. In general, stock market or financial time series forecasting is focused on developing approaches to successfully forecast or predict index values or stock prices so that investors can gain high profits using well-defined trading strategies according to the forecasting results. Atsalakis and Valavanis (2009) state that the key to successful stock market forecasting is achieving the best results with both the minimum required input data and the least complex stock market model. Therefore, it is natural to connect data mining with stock market forecasting in order to mine historical data from stock markets to help define better trading strategies. Given the technical challenges and significant potential profits, many researchers find it worthwhile to seek a comprehensive data mining procedure that can produce accurate, consistent, and reliable forecasting results with potential profits. Since a stock market index contains numerous individual stocks and reflects the broader market movement rather than the movement of any individual stock, forecasting stock market indices has attracted the attention of many researchers. Some of the studies target monthly data. For example, Thawornwong and Enke (2004), Enke and Thawornwong (2005), and Leung, Daouk, and Chen (2000) forecast the S&P 500 index using monthly data, whereas Wang et al. (2011) analyze historical monthly data to predict the Shanghai Composite index. Other researchers have studied daily data. For example, Guresen, Kayakutlu, and Daim (2011) explore daily data of the NASDAQ Stock Exchange index, Kara, Boyacioglu, and Baykan (2011) attempt to predict the direction of movement in the daily Istanbul Stock Exchange (ISE) National 100 Index, O'Connor and Madden (2006) predict the daily movements in the Dow Jones Industrial Average index, while Zhu, Wang, Xu, and Li (2008) use daily data to forecast the NASDAQ, DJIA, and STI indices. A few research groups, such as Armano, Marchesi, and Murru (2005), Cao and Tay (2001), and Niaki and Hoseinzade (2013), work on predicting daily movements of the S&P 500 index. Both Thawornwong and Enke (2004) and Leung et al. (2000) conclude that trading strategies guided by classification models generate higher risk-adjusted profits compared to the benchmark buy-and-hold strategy and those strategies directed by level-estimation based forecasts.

In this paper, the daily direction of the SPDR S&P 500 ETF (ticker symbol: SPY) is forecasted using a deliberately designed classification mining procedure. This will begin by preprocessing the raw data to deal with missing values, outliers, and mismatched samples. Three versions of PCA are applied next to the cleaned and complete data in order to select the most influential and uncorrelated variables for classification. ANNs acting as classifiers are then used with the transformed data sets to forecast the direction of future market returns.

The remainder of the paper is organized as follows. The data description and preprocessing will be discussed next in Section 2, while three different dimensionality reduction techniques will be introduced in Section 3. The proposed classifiers will be briefly reviewed in Section 4, and the data analysis and model development will be illustrated in Section 5. The modeling results will be summarized in Section 6, with the simulation process described in Section 7. Concluding remarks are presented in Section 8. The data sources and descriptions are included in the Appendix.

2. Data description and preprocessing

2.1. Data description

The data set utilized for this study involves the daily direction (UP or DOWN) of the closing price of the SPDR S&P 500 ETF (ticker: SPY) as the output, along with 60 financial and economic factors as the potential features. These daily data are collected from 2518 trading days between June 1, 2003 and May 31, 2013. The 60 potential features can be divided into 10 groups, including the SPY return for the current day and three previous days, the relative difference in percentage of the SPY return, exponential moving averages of the SPY return, Treasury bill (T-bill) rates, certificate of deposit rates, financial and economic indicators, the term and default spreads, exchange rates between the USD and four other currencies, the return of seven world major indices (other than the S&P 500), SPY trading volume, and the return of eight large capitalization companies within the S&P 500 (which is a market cap weighted index and driven by larger capitalization companies). Some of these features are being considered for the first time, while others are a mixture of the features used by various research groups (Armano et al., 2005; Cao & Tay, 2001; Thawornwong & Enke, 2004; Enke & Thawornwong, 2005; Niaki & Hoseinzade, 2013), as long as their values were released without a gap of more than five continuous trading days during the study period. The details of these 60 financial and economic factors, including their descriptions, sources, and calculation formulas, are given in Table A1 of the Appendix. After further analysis, only the most important and influential principal components among all the linear combinations of the 60 factors, determined using PCA, FRPCA, and KPCA, will be input into the classifiers to predict the direction of the SPY for the next day.
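Two of the feature groups listed above, the relative difference in percentage (RDP) of the SPY return and exponential moving averages (EMA) of the SPY return, can be sketched as follows. The paper's exact formulas are given in Table A1 of the Appendix (not reproduced in this excerpt), so this sketch assumes the standard textbook definitions:

```python
# Illustrative sketch of two feature groups from Section 2.1, assuming
# standard definitions (the paper's exact formulas are in Table A1):
# RDP-k and an n-day exponential moving average (EMA).

def rdp(prices, k):
    """RDP-k: 100 * (p_t - p_{t-k}) / p_{t-k}, defined for t >= k."""
    return [100.0 * (prices[t] - prices[t - k]) / prices[t - k]
            for t in range(k, len(prices))]

def ema(values, n):
    """n-day EMA with smoothing alpha = 2 / (n + 1), seeded at the first value."""
    alpha = 2.0 / (n + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

closes = [100.0, 101.0, 99.0, 102.0, 103.0]   # hypothetical SPY closes
daily_returns = rdp(closes, 1)                 # day-over-day % changes
print(daily_returns)
print(ema(daily_returns, 10))                  # cf. EMA10 in Table A1
```

The RDP horizons (5, 10, 15, 20) and EMA windows (10, 20, 50, 200) in the feature list would simply be different `k` and `n` arguments here.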
Fig. 1. Histogram of SPY current return (on the left); histogram of adjusted SPY current return (on the right).
2.2. Data preprocessing

The data used for this study covers 60 factors over 2518 trading days. As is to be expected for such a large collection of data, there are missing values, mismatching samples, and outliers existing in the raw data. Using the 2518 trading days during the 10-year period as the criteria, the collected samples from other days should be deleted. As for the missing values, if there are n values for any variable or column that are missing continuously, the average of the n existing values on both sides of the missing values is used to fill in the n missing values. A simple statistical principle is employed to detect the possible outliers (Navidi, 2011). The possible outliers are then adjusted using a method similar to the one employed by Cao and Tay (2001). Specifically, for each of the 60 factors or columns in the data, any value beyond the interval (Q1 − 1.5·IQR, Q3 + 1.5·IQR) is regarded as a possible outlier, with the factor value replaced by the boundary of the interval closer to it. Here, Q1 and Q3 are the first and third quartiles of all the values in that column, and IQR = Q3 − Q1 is the interquartile range of those values. The symmetry of all adjusted and cleaned columns can be checked using histograms or statistical tests. For example, Fig. 1 includes the histograms of the factor SPYt (i.e., the SPY current daily return) before and after data preprocessing. It can be observed that the outliers are removed and symmetry is achieved after the adjustments.

In this study, the ANNs are used as classifiers. At the start of the classification mining procedure, the cleaned data are sequentially partitioned into three parts: training data (the first 70% of the data), validation data (the last 15% of the first 85% of the data), and testing data (the last 15% of the data). The reason for having validation data is to decrease the possibility of overfitting the data, which often happens in ANN analysis. Additional details about how the data was used for classification are provided in Section 5.2.

3. Dimensionality reduction using PCA, FRPCA, and KPCA

3.1. PCA

A number of linear or nonlinear techniques have been developed to embed high-dimensional data into a lower dimensional space without much loss of the information. Among them, PCA is the most popular unsupervised linear technique for dimensionality reduction. Jolliffe (1986) gives an authoritative and accessible account of this methodology. As one of the earliest multivariate techniques, PCA aims to construct a low-dimensional representation of the data while keeping the maximal variance and covariance structure of the data. In order to achieve this goal, a linear mapping W that can maximize W^T var(X)W, where var(X) is the variance-covariance matrix of the data X, is needed. It can be shown that W is formed by the principal eigenvectors of var(X). Thus, PCA turns out to be the eigenproblem var(X)W = λW, where λ represents the eigenvalues of var(X). In addition, it is known that working on the raw data X instead of standardized data with PCA tends to give more emphasis to those variables that have higher variances compared to those variables that have very low variances, especially if the units in which the variables are measured are not consistent. In this study, not all variables are measured in the same units. Thus, PCA is applied to the standardized version of the cleaned data X. In other words, the linear mapping W∗ is searched for such that

corr(X)W∗ = λ∗W∗,  (1)

where corr(X) is the correlation matrix of the data X. That is, suppose the data X has the format X = (X1 X2 ··· XM); then corr(X) = ρ is an M × M matrix, where M is the dimensionality of the data, and the ijth element of the correlation matrix is

corr(Xi, Xj) = ρij = σij / (σi σj),

where σij = cov(Xi, Xj), σi = √var(Xi), σj = √var(Xj), and i, j = 1, 2, . . . , M.  (2)

Essentially, the principal components are the linear combinations of all the factors, with the coefficients equal to the elements of the corresponding eigenvectors. Different numbers of principal components can explain different proportions of the variance-covariance structure of the data. The eigenvalues can be used to rank the eigenvectors based on how much of the variation in the data they explain.
Table 1
The results of PCA over the entire data (pairs of PCs and the cumulative proportion of variance explained; the numeric entries are not preserved in this excerpt).

… in the cases of PCA and FRPCA, are selected and used for the forecasting of daily return with the ANN classifier in Section 5.2.

5.2. Use ANN to classify the data

… spaces with various dimensions lower than 60. To show the influence of dimensionality reduction with PCA, FRPCA, and KPCA on the daily direction classification, ANNs are applied to each of the thirty-six transformed data sets. The results are listed in Table 3.
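A minimal stand-in for the ANN classifier of this section is sketched below: a single-hidden-layer network trained by gradient descent to predict an UP/DOWN label from principal-component scores. The paper's actual architecture and training settings are specified elsewhere in the paper and are not assumed here; the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for one transformed data set: 10 principal-component
# scores per day, with an UP/DOWN label depending on the first two scores.
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

# One hidden layer of 8 tanh units with a sigmoid output (illustrative sizes).
W1 = rng.normal(scale=0.5, size=(10, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1));  b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for epoch in range(300):
    H = np.tanh(X @ W1 + b1)            # hidden-layer activations
    p = sigmoid(H @ W2 + b2)            # predicted P(UP)
    grad_out = (p - y) / len(X)         # cross-entropy gradient at the output
    grad_H = (grad_out @ W2.T) * (1 - H ** 2)   # backpropagate through tanh
    W2 -= lr * H.T @ grad_out; b2 -= lr * grad_out.sum(axis=0)
    W1 -= lr * X.T @ grad_H;   b1 -= lr * grad_H.sum(axis=0)

acc = ((p > 0.5) == (y > 0.5)).mean()
print(f"training accuracy: {acc:.3f}")
```

In the paper's procedure, a network like this is fit on the training partition, tuned against the validation partition, and its direction forecasts on the testing partition feed the accuracy figures of Table 3.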
[Table of principal-component loadings on the features, grouped by type: SPY returns for the current and three previous days (SPYt, SPYt1, SPYt2, SPYt3); relative difference in percentage of the SPY return (RDP5, RDP10, RDP15, RDP20); exponential moving averages of the SPY return (EMA10, EMA20, EMA50, EMA200); and T-bill rates (T1, T3, T6). The numeric entries are not preserved in this excerpt.]
Table 3
The ANN classification results (accuracy, %) of the 36 transformed data sets based on the three PCAs. Columns are grouped, left to right, as PCA, FRPCA, and KPCA.

       PCA                                 FRPCA                               KPCA
PCs    Training Validation Testing Total   Training Validation Testing Total   Training Validation Testing Total
1      54.8     53.6       56.8    54.9    54.8     53.3       57      54.9    55.3     53.3       57      55.2
3      55.2     53.3       57.3    55.2    55.2     53.8       56.8    55.2    55.8     53.6       57      55.6
6      54.9     53.6       57.3    55      57.1     53.6       57      56.6    55.6     53.3       57      55.5
10     56.4     54.6       57.3    56.3    57.1     56.5       56.8    57      56.7     54.6       58.1    56.6
15     56.3     53.3       57.6    56      55.3     55.4       57.8    55.7    56       54.9       57.6    56
22     55.2     54.6       58.1    55.5    56.2     54.9       57.8    56.2    56.6     56         57.8    56.7
26     55.1     53.1       58.1    55.2    56.8     56.5       58.6    57      55.4     54.1       57.8    55.6
31     57.5     57.3       58.1    57.5    56.2     54.4       59.2    56.4    55.7     54.1       57.3    55.7
34     56.2     56         57.3    56.4    56       53.8       58.1    56      55.5     54.4       56.8    55.5
37     55       54.4       57      55.2    56.3     54.1       57.8    56.2    55.7     53.1       57.3    57.6
40     56.2     56.2       56.2    56.2    56       54.1       57.8    56      55.8     59.2       57.6    56.6
60     57.5     54.1       58.1    57.1    56.5     54.4       57.3    56.3    57.4     54.9       58.4    57.1
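The paired t-tests used for the comparisons in Tables 4 and 6 can be computed from first principles. A sketch on illustrative accuracy vectors (not the paper's data), using only the standard library; the reported p-value would be read from a t distribution with n − 1 degrees of freedom:

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic for H0: mean(a - b) = 0.

    The p-values reported in Tables 4 and 6 correspond to this statistic
    under a t distribution with len(a) - 1 degrees of freedom.
    """
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

# Illustrative accuracy vectors for two model combinations (NOT the paper's data):
acc_pca   = [55.2, 56.3, 57.5, 56.4, 57.1]
acc_frpca = [54.9, 57.0, 56.4, 56.0, 56.3]
print(round(paired_t(acc_pca, acc_frpca), 3))
```

A large positive statistic would favor the one-sided alternative (e.g., μPCA > μFRPCA); a statistic near zero, as in the paper's tests, leaves the null of equal means unrejected.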
Table 4
The paired t-test results used for the comparison of different classification models with respect to the PCAs.

Null hypothesis     Alternative hypothesis   P-value
μPCA = μFRPCA       μPCA ≠ μFRPCA            0.2989
μPCA = μKPCA        μPCA ≠ μKPCA             0.8163
μFRPCA = μKPCA      μFRPCA ≠ μKPCA           0.4727
μPCA = μFRPCA       μPCA > μFRPCA            0.8505
μPCA = μKPCA        μPCA > μKPCA             0.5918
μFRPCA = μKPCA      μFRPCA > μKPCA           0.2363

… can be achieved in much higher dimensional data spaces. This phenomenon may be interpreted by considering Table 1. From Column 2 of Table 1 we see that the first (and the largest) principal component of the entire cleaned data set can explain 93.08% of the variation of the data, such that there is not much room left for improvement for the remaining 59 smaller principal components.

7. Trading simulation

After using the ANNs to predict the daily SPY direction, it is natural to carry out a trading simulation to see if the higher predictability implies higher profitability. Given that this research study is based on predicting the direction of S&P 500 ETF (SPY) daily returns, we modified the trading strategy for classification models defined by Enke and Thawornwong (2005) as follows: If UPt+1 = 1, fully invest in stocks or maintain, and receive the actual stock return for the day t + 1 (i.e., SPYt+1); if UPt+1 = 0, fully invest in one-month T-bills or maintain, and receive the actual one-month T-bill return for the day t + 1 (i.e., T1Ht+1). Here, UP is the direction of the SPY daily return as predicted by the models described in this paper. The actual one-month T-bill return for the day t + 1 is:

T1Ht+1 = (discount rate / 100) × (term / 360 days) = (T1t+1 / 100) × (28 days / 360 days) = (T1t+1 / 100) × (7 / 90),  (22)

where T1t+1 is the one-month T-bill discount rate (or risk-free rate) in percentage on the secondary market for business day t + 1. Specifically, at the beginning of each trading day, the investor decides to buy the SPY portfolio or the one-month T-bill according …

… not considered. Moreover, both leveraging and short selling when investing are forbidden. The two benchmarks used to measure how well the models can perform include investing in a stock portfolio (i.e., buy-and-hold) and purchasing a one-month T-bill at the start of the testing period, and closing the trading at the end of the testing period. The trading simulation is done for all the classification models over each testing period, including 376 samples (excluding the first day of the 377-day testing period because of the lack of a direction prediction for that day) of the thirty-six transformed data sets corresponding to the number of principal components involved. The resulting mean, standard deviation (or volatility), and Sharpe ratio of the daily returns on investment generated from each forecasting model over each testing data set are then calculated. The results are presented in Table 5. In addition to the trading simulation results of the three models for each of the twelve principal components, the 376-day return for both the buy-and-hold and T-bill benchmarks are provided for comparison.

As shown in Table 5, the return from the buy-and-hold benchmark is much higher than the one-month T-bill benchmark. By multiplying the mean of the daily return column by 376 and then comparing with the two benchmarks, this comparison indicates that: for all thirty-six transformed data sets, the trading strategies based on the classification models generate higher returns than the one-month T-bill benchmark; the trading strategies based on the ANNs combining PCA generate higher returns than the buy-and-hold benchmark except for three data sets (PCs = 3, 22, and 31), where the returns are slightly less than the buy-and-hold benchmark; the returns from the trading strategies based on the ANNs combining FRPCA are higher than the buy-and-hold benchmark except for four data sets (PCs = 1, 3, 10, and 60); and the returns from the trading strategies based on the ANNs combining KPCA are higher than the buy-and-hold benchmark except for six data sets (PCs = 1, 3, 6, 26, 31, and 34).

Six paired t-tests are carried out to compare the means of the daily returns from the three different model combinations. The results are given in Table 6.

Since all the P-values are greater than 0.05, there is no significant difference among the means of the daily returns generated by the models involving PCA, FRPCA, and KPCA given the thirty-six transformed natural data sets. However, with more careful observation of the P-values listed in Table 6, it seems that on average PCA performs slightly better than FRPCA and KPCA, while FRPCA performs slightly better than KPCA.

The Sharpe ratio is calculated by dividing the mean daily
to the forecasted direction of the SPY daily return. For simplicity, it return by the standard deviation of the daily returns. The higher
is assumed in this paper that the money invested in either a stock the Sharpe ratio, the higher the return and the lower the standard
portfolio or T-bills is illiquid and detained in each asset during deviation or volatility, the better the trading strategy. Therefore,
the entire trading day. Dividends and transaction costs are also another six paired t-tests over the Sharpe ratio are performed to
136 X. Zhong, D. Enke / Expert Systems With Applications 67 (2017) 126–139
Table 5
Trading simulation results.

Benchmark        376-day return
Buy-and-hold     3.08E-01
T-bill           3.89E-04

PCs   Models   Mean of daily return   Std. of daily return   Sharpe ratio
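The trading rule and the T-bill conversion in Eq. (22) can be sketched as follows. This is an illustrative reimplementation with made-up inputs, not the authors' code; the arrays stand in for the UPt+1 signals, SPYt+1 returns, and T1t+1 discount rates defined in the text:

```python
import numpy as np

def tbill_daily_return(discount_rate_pct):
    # Eq. (22): T1H = (T1 / 100) * (28 days / 360 days) = (T1 / 100) * (7 / 90)
    return (np.asarray(discount_rate_pct, dtype=float) / 100.0) * (7.0 / 90.0)

def simulate_strategy(up_pred, spy_ret, tbill_rate_pct):
    """Hold SPY on days predicted up (UP = 1), one-month T-bills otherwise."""
    up_pred = np.asarray(up_pred)
    return np.where(up_pred == 1,
                    np.asarray(spy_ret, dtype=float),
                    tbill_daily_return(tbill_rate_pct))

# Made-up inputs for a 376-day testing period
rng = np.random.default_rng(0)
spy = rng.normal(0.0005, 0.01, 376)          # hypothetical SPY daily returns
rate = np.full(376, 0.05)                    # hypothetical T-bill discount rate (%)
pred = (rng.random(376) < 0.55).astype(int)  # hypothetical ANN direction signals

daily = simulate_strategy(pred, spy, rate)
print("mean daily return: %.6f" % daily.mean())
print("376-day return (mean x 376): %.4f" % (daily.mean() * 376))
```

The final line mirrors the comparison in the text, where the mean daily return is multiplied by 376 before being set against the buy-and-hold and T-bill benchmarks.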
In this study, a natural data set is collected and analyzed. The ANN classifiers combined with PCA are recognized as the simplest, yet relatively more accurate, procedure. The trading strategies based on this procedure generate slightly higher risk-adjusted profits than those based on combining the ANNs with either FRPCA or KPCA. Nonetheless, the selection of a proper kernel function is important to the performance of KPCA; a more careful selection of the kernel functions and the relevant kernel parameters is suggested for future work.

Appendix
Table A1
The 60 financial and economic features of the raw data.
Group Name Description Source/Calculation
CTB6M Change in the market yield on US Treasury securities H. 15 Release - Federal Reserve Board of Governors
at 6-month constant maturity, quoted on investment
basis.
CTB1Y Change in the market yield on US Treasury securities H. 15 Release - Federal Reserve Board of Governors
at 1-year constant maturity, quoted on investment
basis.
CTB5Y Change in the market yield on US Treasury securities H. 15 Release - Federal Reserve Board of Governors
at 5-year constant maturity, quoted on investment
basis.
CTB10Y Change in the market yield on US Treasury securities H. 15 Release - Federal Reserve Board of Governors
at 10-year constant maturity, quoted on investment
basis.
AAA Change in the Moody’s yield on seasoned corporate H. 15 Release - Federal Reserve Board of Governors
bonds - all industries, Aaa.
BAA Change in the Moody’s yield on seasoned corporate H. 15 Release - Federal Reserve Board of Governors
bonds - all industries, Baa.
The term and default spreads TE1 Term spread between T120 and T1. TE1 = T120 - T1
TE2 Term spread between T120 and T3. TE2 = T120 - T3
TE3 Term spread between T120 and T6. TE3 = T120 - T6
TE5 Term spread between T3 and T1. TE5 = T3 - T1
TE6 Term spread between T6 and T1. TE6 = T6 - T1
DE1 Default spread between BAA and AAA. DE1 = BAA - AAA
DE2 Default spread between BAA and T120. DE2 = BAA - T120
DE4 Default spread between BAA and T6. DE4 = BAA - T6
DE5 Default spread between BAA and T3. DE5 = BAA - T3
DE6 Default spread between BAA and T1. DE6 = BAA - T1
DE7 Default spread between CD6 and T6. DE7 = CD6 - T6
Exchange rate between USD USD_Y Relative change in the exchange rate between US http://www.investing.com/currencies/usd-jpy-historical-data
and four other currencies dollar and Japanese yen.
(day t)
USD_GBP Relative change in the exchange rate between US http://www.investing.com/currencies/gbp-usd-historical-data
dollar and British pound. (the changes are then negated)
USD_CAD Relative change in the exchange rate between US http://www.investing.com/currencies/usd-cad-historical-data
dollar and Canadian dollar.
USD_CNY Relative change in the exchange rate between US http://www.investing.com/currencies/usd-cny-historical-data
dollar and Chinese Yuan (Renminbi).
The return of the other seven HSI Hang Seng index return. finance.yahoo.com
world major indices (day t)
SSE Composite Shanghai Stock Exchange Composite index return. finance.yahoo.com
FCHI CAC 40 index return. finance.yahoo.com
FTSE FTSE 100 index return. finance.yahoo.com
GDAXI DAX index return. finance.yahoo.com
DJI Dow Jones Industrial Average index return. finance.yahoo.com (no download function for this one);
measuringworth.com/datasets/DJA/result.php
IXIC NASDAQ Composite index return. finance.yahoo.com
SPY trading volume (day t) V Relative change in the trading volume of S&P 500 finance.yahoo.com
index ETF (SPY).
The return of the eight big AAPL Apple Inc stock return. finance.yahoo.com
companies in S&P 500 (day
t)
MSFT Microsoft stock return. finance.yahoo.com
XOM Exxon Mobil stock return. finance.yahoo.com
GE General Electric stock return. finance.yahoo.com
JNJ Johnson and Johnson stock return. finance.yahoo.com
WFC Wells Fargo stock return. finance.yahoo.com
AMZN Amazon.com Inc stock return. finance.yahoo.com
JPM JPMorgan Chase & Co stock return. finance.yahoo.com
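The spread and relative-change features in Table A1 are simple arithmetic transformations of the raw series. A minimal sketch follows, using synthetic data; the names T1, T3, T6, T120, AAA, BAA, and CD6 are assumed from the table to denote the underlying daily yield series, in percent:

```python
import numpy as np

# Hypothetical daily yield series (in percent), named after Table A1's inputs.
rng = np.random.default_rng(2)
n = 5
series = {name: np.round(rng.uniform(0.1, 5.0, n), 2)
          for name in ["T1", "T3", "T6", "T120", "AAA", "BAA", "CD6"]}

# Term spreads per Table A1: TE1 = T120 - T1, TE2 = T120 - T3, TE3 = T120 - T6,
# TE5 = T3 - T1, TE6 = T6 - T1; default spreads DE1 ... DE7 analogously.
features = {
    "TE1": series["T120"] - series["T1"],
    "TE2": series["T120"] - series["T3"],
    "TE3": series["T120"] - series["T6"],
    "TE5": series["T3"] - series["T1"],
    "TE6": series["T6"] - series["T1"],
    "DE1": series["BAA"] - series["AAA"],
    "DE2": series["BAA"] - series["T120"],
    "DE4": series["BAA"] - series["T6"],
    "DE5": series["BAA"] - series["T3"],
    "DE6": series["BAA"] - series["T1"],
    "DE7": series["CD6"] - series["T6"],
}

def relative_change(x):
    """Day-over-day relative change, as used for the exchange-rate and volume features."""
    x = np.asarray(x, dtype=float)
    return (x[1:] - x[:-1]) / x[:-1]

# Hypothetical USD/JPY quotes -> the USD_Y feature
usd_jpy = np.array([110.2, 110.8, 109.9, 111.0])
features["USD_Y"] = relative_change(usd_jpy)
print(sorted(features))
```

For the USD_GBP feature the quoted rate is GBP per USD inverted (GBP/USD), which is why Table A1 notes that the computed changes are then negated.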
References

Amornwattana, S., Enke, D., & Dagli, C. (2007). A hybrid options pricing model using a neural network for estimating volatility. International Journal of General Systems, 36, 558–573.
Armano, G., Marchesi, M., & Murru, A. (2005). A hybrid genetic-neural architecture for stock indexes forecasting. Information Sciences, 170(1), 3–33.
Atsalakis, G. S., & Valavanis, K. P. (2009). Surveying stock market forecasting techniques – Part II: Soft computing methods. Expert Systems with Applications, 36(3), 5941–5950.
Bao, D., & Yang, Z. (2008). Intelligent stock trading system by turning point confirming and probabilistic reasoning. Expert Systems with Applications, 34, 620–627.
Barak, S., Dahooie, J. H., & Tichý, T. (2015). Wrapper ANFIS-ICA method to do stock market timing and feature selection on the basis of Japanese Candlestick. Expert Systems with Applications, 42(23), 9221–9235.
Bogullu, V. K., Enke, D., & Dagli, C. (2002). Using neural networks and technical indicators for generating stock trading signals. Intelligent Engineering Systems through Artificial Neural Networks, 12, 721–726.
Cao, Q., Leggio, K. B., & Schniederjans, M. J. (2005). A comparison between Fama and French's model and artificial neural networks in predicting the Chinese stock market. Computers & Operations Research, 32(10), 2499–2512.
Cao, L., & Tay, F. (2001). Financial forecasting using support vector machines. Neural Computing and Applications, 10, 184–192.
Cervelló-Royo, R., Guijarro, F., & Michniuk, K. (2015). Stock market trading rule based on pattern recognition and technical analysis: Forecasting the DJIA index with intraday data. Expert Systems with Applications, 42(14), 5963–5975.
Chavarnakul, T., & Enke, D. (2008). Intelligent technical analysis based equivolume charting for stock trading using neural networks. Expert Systems with Applications, 34, 1004–1017.
Chen, T. L., & Chen, F. Y. (2016). An intelligent pattern recognition model for supporting investment decisions in stock market. Information Sciences, 346, 261–274.
Chen, A. S., Leung, M. T., & Daouk, H. (2003). Application of neural networks to an emerging financial market: Forecasting and trading the Taiwan stock index. Computers and Operations Research, 30(6), 901–923.
Chiang, W. C., Enke, D., Wu, T., & Wang, R. (2016). An adaptive stock index trading decision support system. Expert Systems with Applications, 59, 195–207.
Chourmouziadis, K., & Chatzoglou, P. D. (2016). An intelligent short term stock trading fuzzy system for assisting investors in portfolio management. Expert Systems with Applications, 43, 298–311.
Chun, S. H., & Kim, S. H. (2004). Data mining for financial prediction and trading: Application to single and multiple markets. Expert Systems with Applications, 26(2), 131–139.
Deboeck, G. J. (1994). Trading on the edge: Neural, genetic, and fuzzy systems for chaotic financial markets. New York: Wiley.
Enke, D., Ratanapan, K., & Dagli, C. (2000). Large machine-part family formation utilizing a parallel ART1 neural network. Journal of Intelligent Manufacturing, 11(6), 591–604.
Enke, D., & Mehdiyev, N. (2013). Stock market prediction using a combination of stepwise regression analysis, differential evolution-based fuzzy clustering, and a fuzzy inference neural network. Intelligent Automation & Soft Computing, 29(4), 636–649.
Enke, D., & Thawornwong, S. (2005). The use of data mining and neural networks for forecasting stock market returns. Expert Systems with Applications, 29(4), 927–940.
Franses, P. H., & Ghijsels, H. (1999). Additive outliers, GARCH and forecasting volatility. International Journal of Forecasting, 15(1), 1–9.
Guresen, E., Kayakutlu, G., & Daim, T. U. (2011). Using artificial neural network models in stock market index prediction. Expert Systems with Applications, 38(8), 10389–10397.
Hansen, J. V., & Nelson, R. D. (2002). Data mining of time series using stacked generalizers. Neurocomputing, 43(1–4), 173–184.
Hussain, A. J., Knowles, A., Lisboa, P. J. G., & El-Deredy, W. (2007). Financial time series prediction using polynomial pipelined neural networks. Expert Systems with Applications, 35(3), 1186–1199.
Jaisinghani, D. (2016). An empirical test of calendar anomalies for the Indian securities markets. South Asian Journal of Global Business Research, 5(1), 53–84.
Jensen, M. (1978). Some anomalous evidence regarding market efficiency. Journal of Financial Economics, 6(2/3), 95–101.
Jolliffe, I. T. (1986). Principal component analysis. New York: Springer-Verlag.
Kara, Y., Boyacioglu, M. A., & Baykan, O. K. (2011). Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the Istanbul stock exchange. Expert Systems with Applications, 38(5), 5311–5319.
Kim, Y., & Enke, D. (2016). Developing a rule change trading system for the futures market using rough set analysis. Expert Systems with Applications, 59, 165–173.
Kim, K. J., & Han, I. (2000). Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index. Expert Systems with Applications, 19(2), 125–132.
Lam, M. (2004). Neural network techniques for financial performance prediction: Integrating fundamental and technical analysis. Decision Support Systems, 37(4), 567–581.
Leung, M. T., Daouk, H., & Chen, A. S. (2000). Forecasting stock indices: A comparison of classification and level estimation models. International Journal of Forecasting, 16(2), 173–190.
Lo, A. W., & MacKinlay, A. C. (1988). Stock market prices do not follow random walks: Evidence from a simple specification test. Review of Financial Studies, 1(1), 41–66.
Luukka, P. (2011). A new nonlinear fuzzy robust PCA algorithm and similarity classifier in classification of medical data sets. International Journal of Fuzzy Systems, 13(3), 153–162.
Monfared, S. A., & Enke, D. (2014). Volatility forecasting using a hybrid GJR-GARCH neural network model. Procedia Computer Science, 36, 246–253.
Navidi, W. (2011). Statistics for engineers and scientists (3rd ed.). New York: McGraw-Hill.
Niaki, S. T. A., & Hoseinzade, S. (2013). Forecasting S&P 500 index using artificial neural networks and design of experiments. Journal of Industrial Engineering International, 9(1), 1–9.
O'Connor, N., & Madden, M. G. (2006). A neural network approach to predicting stock exchange movements using external factors. Knowledge-Based Systems, 19(5), 371–378.
Oja, E. (1995). The nonlinear PCA learning rule and signal separation – mathematical analysis. Helsinki University of Technology Technical Report.
Oja, E., & Karhunen, J. (1985). On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix. Journal of Mathematical Analysis and Applications, 106, 69–84.
Patel, J., Shah, S., Thakkar, P., & Kotecha, K. (2015). Predicting stock market index using fusion of machine learning techniques. Expert Systems with Applications, 42, 2162–2172.
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(6), 559–572.
Rather, A. M., Agarwal, A., & Sastry, V. N. (2015). Recurrent neural network and a hybrid model for prediction of stock returns. Expert Systems with Applications, 42, 3234–3241.
Refenes, A. P. N., Burgess, A. N., & Bentz, Y. (1997). Neural networks in financial engineering: A study in methodology. IEEE Transactions on Neural Networks, 8(6), 1222–1267.
Saad, E. W., Prokhorov, D. V., & Wunsch, D. C. (1998). Comparative study of stock trend prediction using time delay, recurrent and probabilistic neural networks. IEEE Transactions on Neural Networks, 9(6), 1456–1470.
Sarantis, N. (2001). Nonlinearities, cyclical behavior and predictability in stock markets: International evidence. International Journal of Forecasting, 17(3), 459–482.
Schölkopf, B., Smola, A., & Müller, K. R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5), 1299–1319.
Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. New York: Cambridge University Press.
Shen, L., & Loh, H. T. (2004). Applying rough sets to market timing decisions. Decision Support Systems, 37(4), 583–597.
Sorzano, C. O. S., Vargas, J., & Pascual-Montano, A. (2014). A survey of dimensionality reduction techniques. arXiv preprint (pp. 1–35).
Thawornwong, S., & Enke, D. (2004). The adaptive selection of financial and economic variables for use with artificial neural networks. Neurocomputing, 56, 205–232.
Thawornwong, S., Enke, D., & Dagli, C. (2001). Using neural networks and technical analysis indicators for predicting stock trends. Intelligent Engineering Systems through Artificial Neural Networks, 11, 05–232.
Ture, M., & Kurt, I. (2006). Comparison of four different time series methods to forecast hepatitis A virus infection. Expert Systems with Applications, 31(1), 41–46.
Turk, M., & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1), 71–86.
Van der Maaten, L., Postma, E., & Van den Herik, J. (2009). Dimensionality reduction: A comparative review. Journal of Machine Learning Research, 10(1–41), 66–71.
Vanstone, B., & Finnie, G. (2009). An empirical methodology for developing stock market trading systems using artificial neural networks. Expert Systems with Applications, 36(3), 6668–6680.
Vapnik, V. (1998). Statistical learning theory. New York: Wiley.
Vellido, A., Lisboa, P. J. G., & Meehan, K. (1999). Segmentation of the on-line shopping market using neural networks. Expert Systems with Applications, 17(4), 303–314.
Wang, J. Z., Wang, J. J., Zhang, Z. G., & Guo, S. P. (2011). Forecasting stock indices with back propagation neural network. Expert Systems with Applications, 38(11), 14346–14355.
Wang, Y. F. (2002). Predicting stock price using fuzzy grey prediction system. Expert Systems with Applications, 22(1), 33–39.
Xu, L., & Yuille, A. L. (1995). Robust principal component analysis by self-organizing rules based on statistical physics approach. IEEE Transactions on Neural Networks, 6(1), 131–143.
Yang, T. N., & Wang, S. D. (1999). Robust algorithms for principal component analysis. Pattern Recognition Letters, 20, 927–933.
Yao, J., Tan, L. C., & Poh, H. (1999). Neural networks for technical analysis: A study on KLCI. International Journal of Theoretical and Applied Finance, 2(2), 221–241.
Yaser, S., & Atiya, A. F. (1996). Introduction to financial forecasting. Applied Intelligence, 6(3), 205–213.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8, 338–353.
Zhang, G. (2003). Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing, 50, 159–175.
Zhang, G., Patuwo, B. E., & Hu, M. Y. (1998). Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting, 14(1), 35–62.
Zhong, X. (2000). Classification and cluster mining. Zhejiang University.
Zhong, X. (2004). A study of several statistical methods for classification with application to microbial source tracking (Master's thesis). Worcester Polytechnic Institute.
Zhong, X., Ma, S. P., Yu, R. Z., & Zhang, B. (2001). Data mining: A survey. Pattern Recognition and Artificial Intelligence, 14(1), 48–55.
Zhu, X. T., Wang, H., Xu, L., & Li, H. Z. (2008). Predicting stock index increments by neural networks: The role of trading volume under different horizons. Expert Systems with Applications, 34(4), 3043–3054.