Loke K.S.
Faculty of Engineering, Computing and Science.
Swinburne University of Technology Sarawak Campus
Sarawak, Malaysia
ksloke@swinburne.edu.my
Abstract— A stock movement prediction method is presented using quarterly financial ratio data from Hong Kong companies over the period 2011-2014. We found the accuracy of price movement prediction using the Random Forest method over multiple quarters to be fairly weak. However, we were able to predict with high accuracy in the last quarter of 2014, but not in other years. We attribute this not to the superiority of the method but to the non-stationary nature of the price signals.

Keywords— Stock Prediction, Stock Market, Random Forest, Financial Ratios

I. INTRODUCTION

The use of artificial intelligence and machine learning techniques to determine future trends of the stock market is an active research area. Even though the Efficient Market Hypothesis [1] posits that all relevant information is already reflected in prices, making it impossible to outperform the market, this thesis has its critics. Some studies show that over very short time spans, price movements can be predicted better than chance [2]. Others have found that news can affect price movements as well [3] [4] [5]. In this research we studied the effect of financial ratios on market price prediction using random forests, a well-known machine learning method introduced by Breiman [6].

This paper is organized as follows. In section 2, we review some of the previous related works that used machine learning methods. We describe the Random Forest algorithm in section 3. Next, in section 4, we present our approach and methods. The experimental results are presented in section 5. Finally, the conclusion of our work is summarized in the last section.

II. LITERATURE REVIEW

There has been much empirical research on the predictive power of financial ratios, and many of the results are mixed. In his review, Hjalmarsson [7] found that results using the dividend- and earnings-price ratios as regressors were sensitive to the sample period and the choice of frequency. His own research showed that traditional valuation measures such as the dividend- and earnings-price ratios had very limited predictive power. Fama and French [8] found weak evidence of dividend yield predictability on monthly New York Stock Exchange (NYSE) returns. Lewellen [9] also found that dividend yield predicts market returns. However, Goyal and Welch [10] argue that the evidence is too weak. Lau et al [11] found the relationship between the earnings-price ratio and market returns to be conditional. Many of these studies used statistical analysis, regression and ordinary least squares to find the relationship between price and the financial ratios. The application of artificial intelligence and machine learning was not widespread.

However, in the field of technical analysis, the use of machine learning methods was quite common. Evolutionary methods such as Genetic Algorithms [12], Swarm Optimization [13] and Evolutionary Learning [14] are also popular. Patel et al [15] compared Artificial Neural Networks, Support Vector Machines, Random Forests and Naïve Bayes for predicting stock movement direction; their data indicated good results with Random Forests. Similarly, Ballings et al [16] compared a variety of ensemble algorithms against single classifier models, including Support Vector Machines, Neural Networks and Logistic Regression. They also concluded that the Random Forest ensemble method should be used for stock price direction prediction.

Ladyzynski et al [17] also used Random Forest for stock price trend detection. Even though they failed to generate a profitable trading strategy, they concluded that the artificial intelligence approach is promising. Ash et al [18] used a recency-weighted Random Forest to take seasonality into consideration, for which they claimed superior results. Given that a number of recent works have been optimistic about ensemble methods like Random Forest, we decided to adopt Random Forest in our tests.

III. RANDOM FOREST

Random Forest is an ensemble classification algorithm that uses a collection of decision trees in combination. Random Forest was first introduced by Leo Breiman [6], following on the ideas of Amit et al [19] and Ho [20]. The method requires the random selection of the features (or attributes) considered for the split at each decision tree node. This random factor reduces the correlation between the individual trees, which makes the Random Forest robust to noise and resistant to overtraining. At the end of its traversal, each tree casts a vote for the class of the input; the class receiving the majority of the votes is the final classification. A single random tree classifier will only have a slightly better than
c) As in b) above but using a different threshold.
d) As in b) above but using different NA values. We wanted to test which NA values should be used.

Balancing  Trees  Accuracy %  Kappa   AUC
Manual     100    65.22       0.2931  -
Manual     200    66.03       0.3081  -
Manual     400    66.30       0.3147  0.699
Weka       400    62.90       0.2596  0.709
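The table above reports accuracy, Cohen's kappa and AUC for the up/down classifications. As a rough aid to reading those columns, the following Python sketch computes all three from scratch for binary labels. The function names and the toy data are hypothetical; the reported figures came from the authors' own tooling, not from this code.

```python
from collections import Counter
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    # Chance agreement: product of the marginal frequencies of each label.
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive (label 1) is scored
    above a random negative (label 0); ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, with eight labels split evenly between the classes and six correct predictions, accuracy is 0.75, chance agreement is 0.5, and kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5. Kappa is reported alongside accuracy because it discounts the agreement a classifier would get by guessing in proportion to the class frequencies.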
Fig. 1. Hang Seng Index, Jan 2011 - Jan 2015, with SMA and EMA overlays
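Fig. 1 overlays a simple moving average (SMA) and an exponential moving average (EMA) on the index. The window lengths used in the figure are not stated here, so the Python sketch below takes the window `n` as a parameter. The smoothing factor alpha = 2 / (n + 1) and seeding the EMA with the first full-window SMA are common conventions, assumed rather than taken from the paper; the function names are hypothetical.

```python
from typing import List, Optional, Sequence


def sma(prices: Sequence[float], n: int) -> List[Optional[float]]:
    """Simple moving average: mean of the last n prices (None until n are available)."""
    out: List[Optional[float]] = []
    window_sum = 0.0
    for i, p in enumerate(prices):
        window_sum += p
        if i >= n:
            window_sum -= prices[i - n]  # drop the price that left the window
        out.append(window_sum / n if i >= n - 1 else None)
    return out


def ema(prices: Sequence[float], n: int) -> List[Optional[float]]:
    """Exponential moving average with alpha = 2 / (n + 1), seeded with the
    SMA of the first n prices (an assumed, commonly used convention)."""
    alpha = 2.0 / (n + 1)
    out: List[Optional[float]] = []
    prev = 0.0
    for i, p in enumerate(prices):
        if i < n - 1:
            out.append(None)
        elif i == n - 1:
            prev = sum(prices[:n]) / n
            out.append(prev)
        else:
            prev = alpha * p + (1 - alpha) * prev  # weight recent prices more
            out.append(prev)
    return out
```

For prices 2, 4, 6, 8, 20 and n = 3, the EMA ends at 13.0 while the SMA ends at about 11.33, illustrating that the EMA reacts faster to the most recent jump.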
TABLE III. 2014 Q3 AND Q4 RETURNS DIRECTION PREDICTION WITH DIFFERENT ATTRIBUTES
All Return Return.2 Return.3 RET.2 RET.3 R3 Class AUC
D D D D D D D R4 0.853
D D D R4 0.739
D D R4 0.645
D D D D R4 0.889
D D D R3 0.741
D D D R3 0.819
D D D D D D R4 0.867
D D D R4 0.849
D D D R4 0.864
TABLE IV. 2013 Q3 AND Q4 RETURNS DIRECTION PREDICTION WITH DIFFERENT ATTRIBUTES
All Return Return.2 Return.3 RET.2 RET.3 R3 Class AUC
D D D D D D D R4 0.937
D D D D D D R4 0.972
D D D R4 0.847
D R4 0.724
D D R4 0.761
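Tables III and IV were produced with Random Forest classifiers trained on different attribute subsets. To make the voting scheme of Section III concrete, here is a minimal, unoptimized Python sketch: each tree considers only `n_features` randomly chosen attributes at every node, and the forest predicts by majority vote. Two details are assumptions not stated in Section III: each tree is trained on a bootstrap sample (part of Breiman's formulation), and splits are chosen by Gini impurity. All names are hypothetical, and this is not the tooling used for the experiments.

```python
import random
from collections import Counter
from typing import List, Sequence


def gini(labels: List[int]) -> float:
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels: List[int]) -> int:
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, n_features, max_depth, rng):
    """Grow a tree; leaves are class labels, internal nodes are
    (feature, threshold, left, right). Only n_features randomly chosen
    attributes are considered at each node."""
    if max_depth == 0 or len(set(y)) == 1:
        return majority(y)
    best = None  # (weighted impurity, feature, threshold)
    for f in rng.sample(range(len(X[0])), n_features):
        for t in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no valid split among the sampled attributes
        return majority(y)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], n_features, max_depth - 1, rng),
            build_tree([X[i] for i in ri], [y[i] for i in ri], n_features, max_depth - 1, rng))


def predict_tree(node, row: Sequence[float]) -> int:
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def train_forest(X, y, n_trees=100, n_features=1, max_depth=5, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(X) rows with replacement (assumed, see above).
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 n_features, max_depth, rng))
    return forest


def predict_forest(forest, row: Sequence[float]) -> int:
    """Each tree casts one vote; the majority class is the prediction."""
    return majority([predict_tree(tree, row) for tree in forest])
```

The `n_trees` parameter corresponds to the "Trees" column varied in the earlier table (100, 200, 400); an odd number of trees avoids tied votes in the two-class case.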
Table III shows the results for the 2014 quarters using different attributes. Return.2 refers to the third-quarter returns. RET.2 refers to the cumulative value obtained by adding Return and Return.2; similarly for RET.3. R3 was set to 1 if RET.3 was above one, and zero otherwise. Table IV shows similar results for 2013.

The values indicated that the R4 values (direction) could be predicted fairly well without the financial ratios; in other words, they were predictable from prices alone. Given various financial ratios calculated over every quarter and the returns for the subsequent quarter, the 4th quarter was quite predictable. In fact, it was predictable without the financial ratios. However, this form of classification used R4 information in generating the model.
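The attribute definitions above can be written out as a small Python sketch for one company record. It follows the text where it is explicit (RET.2 = Return + Return.2; R3 = 1 when RET.3 is above one, else 0). Where the text says only "similarly for RET.3", the sketch assumes RET.3 = RET.2 + Return.3. The function name `add_cumulative_features` and the dictionary layout are hypothetical.

```python
from typing import Dict


def add_cumulative_features(row: Dict[str, float], threshold: float = 1.0) -> Dict[str, float]:
    """Return a copy of one record with the derived attributes RET.2, RET.3 and R3.

    Assumption: RET.3 = RET.2 + Return.3 (the text says only "similarly").
    R3 uses the stated rule: 1 if RET.3 is above the threshold (one), else 0.
    """
    out = dict(row)  # leave the caller's record unchanged
    out["RET.2"] = row["Return"] + row["Return.2"]
    out["RET.3"] = out["RET.2"] + row["Return.3"]
    out["R3"] = 1 if out["RET.3"] > threshold else 0
    return out
```

Keeping the threshold as a parameter makes it easy to re-run the labelling with a different cut-off, in the spirit of item c) in the earlier list of test variants.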