Impact of Financial Ratios and Technical Analysis on Stock Price Prediction Using Random Forests

Loke K.S.
Faculty of Engineering, Computing and Science.
Swinburne University of Technology Sarawak Campus
Sarawak, Malaysia
ksloke@swinburne.edu.my

978-1-5386-0765-7/17/$31.00 ©2017 IEEE

Abstract: A stock movement prediction method is presented using quarterly financial ratio data from Hong Kong companies for the period 2011-2014. We found the accuracy of price movement prediction using the Random Forest method over multiple quarters to be fairly weak. However, we were able to predict with high accuracy in the last quarter of 2014 but not in other years. We attribute this not to the superiority of the method but to the non-stationary nature of the price signals.

Keywords: Stock Prediction, Stock market, Random Forest, Financial Ratios

I. INTRODUCTION

The use of artificial intelligence and machine learning techniques to determine future trends of the stock market is an active research area. Even though the Efficient Market Hypothesis [1] posits that all relevant information is already reflected in prices, so that it is impossible to outperform the market, this thesis has its critics. Some studies show that over very short time spans, price movements can be predicted better than chance [2]. Others have found that news can affect price movements as well [3] [4] [5]. In this research we studied the effect of financial ratios on market price prediction using Random Forests, a well-known machine learning method introduced by Breiman [6].

This paper is organized as follows. In Section II, we review some of the previous related works that used machine learning methods. We describe the Random Forest algorithm in Section III. Next, in Section IV, we present our approach and methods. The experimental results are presented in Section V. Finally, the conclusion of our work is summarized in the last section.

II. LITERATURE REVIEW

There have been many empirical studies on the predictive power of financial ratios, and many of the results are mixed. In his review, Hjalmarsson [7] found that results using dividend- and earnings-price ratios as regressors were sensitive to the sample period and the choice of frequency. His own research showed that traditional valuation measures such as the dividend- and earnings-price ratios had very limited predictive power. Fama and French [8] found weak evidence of dividend yield predictability on monthly New York Stock Exchange (NYSE) returns. Lewellen [9] also found that dividend yield predicts market returns. However, Goyal and Welch [10] argue that the evidence was too weak. Lau et al. [11] found the relationship of the earnings-price ratio to market returns to be conditional. Many of these studies used statistical analysis, regression and ordinary least squares to find the relationship between price and the financial ratios. The application of artificial intelligence and machine learning was not widespread.

However, in the field of technical analysis the use of machine learning methods was quite common. The use of evolutionary methods such as Genetic Algorithms [12], Swarm Optimization [13] and Evolutionary Learning [14] is also popular. Patel et al. compared the use of Artificial Neural Networks, Support Vector Machines, Random Forests and Naïve Bayes for predicting stock movement direction [15]. Their data indicated good results with Random Forests. Similarly, Ballings et al. [16] compared a variety of ensemble algorithms against single classifier models including Support Vector Machines, Neural Networks and Logistic Regression. They also concluded that the Random Forest ensemble method should be used for stock price direction prediction. Ladyzynski et al. [17] also used Random Forest for stock price trend detection. Even though they failed to generate a profitable trading strategy, they concluded that the artificial intelligence approach is promising. Booth et al. [18] used a recency-weighted Random Forest to take seasonality into consideration, for which they claimed superior results. Given that a number of recent works have been optimistic about ensemble methods like Random Forest, we decided to adopt Random Forest in our tests.

III. RANDOM FOREST

Random Forest is an ensemble classification algorithm that uses a collection of decision trees in combination. Random Forest was first introduced by Leo Breiman [6], following on the ideas of Amit et al. [19] and Ho [20]. The method requires the random selection of features (or attributes) to split on at each decision tree node. This random factor reduces the correlation between the individual trees, which makes the Random Forest robust to noise and resistant to overtraining. Each of the trees, at the end of the tree traversal, casts a vote for the class of the input; the class that receives the majority of the votes is the classification.
A single random tree classifier will only have a slightly better than
random classification but combining them as an ensemble can produce much improved accuracy. A feature of Random Forest is that it does not overfit as more trees are added; instead, the generalization error converges to a limiting value.

IV. APPROACH

We used a dataset that consists of 433 companies listed on the Hong Kong Stock Exchange from 2011 to 2014. For each company we calculated a set of 63 attributes per quarter, the majority of which were related to financial ratios. For each quarter, the returns in the next quarter were also calculated based on historical data. For example, if the financial data was for quarter 1 (Q1), then the returns column would hold the difference in price from the start to the end of the next quarter, i.e. Q2. However, not all the financial ratios were available for all companies, so some values were missing. The data format per quarter is as follows: [Company-id, ratio1, ratio2, ratio3, …, ratio62, next-quarter-returns]. So for Q1 data, the returns value is actually the next quarter returns, i.e. Q2. All data were normalized to have zero mean and a variance of 1.

Some of the financial ratios included are: Liquidation value/Market Cap, Book Asset Value/Market Cap, Sales/Market Cap, EBITDA/Enterprise Value, Earnings/Market Cap, Operating Cash Flow/Market Cap, Dividend/Market Cap, Return on Assets, Return on Equity, Return on Invested Capital, Net Asset Value/Total Assets, Revenue growth first half, Earnings per share first half, Net Assets Growth Rate over 5 years, Return on Asset Margin over 5 years, Book to Price previous 5 years average deviation, market capitalization and Dividend Yield ratio previous 5 years average deviation. The rest are variations of the above calculated with slightly different periods and transformations.

We investigated whether the calculated financial ratios have any impact on future price direction. Instead of using actual return values, we set the values to 1 or 0, depending on whether the return is above a threshold (e.g. threshold=0) or below it.

A. Experiments

We performed the following experiments.

1) Experiment A. Used all attributes (columns) but varied the NA values, quarter, year and threshold.
a) Combined all Q4 data as rows from 2011-2014, using all columns to create a model to classify returns direction with the threshold set to 0. We only used Q4 instead of Q1 through Q4 because the data may be seasonal [21]. The table rows were partitioned for 10-fold validation.
b) Modeled each quarter (Q1, Q2, Q3, Q4) in 2013 and 2014 to classify return direction separately.
c) As in b) above but used a different threshold.
d) As in b) above but used different NA values. We wanted to test what NA values should be used.

2) Experiment B. Varied the attributes (columns) and predicted the returns in the Q3 and Q4 dataset. All rows with incomplete data (with NA) were removed.
a) Used Q1 2014 data
b) Used Q1 2013 data

3) Experiment C. Varied the attributes (columns), trained with Q3 data and predicted the returns in the Q4 dataset. This is called a one-quarter walk-forward test.
a) Used combined 2013-2014 data
b) Used combined 2011-2012 data

Experiments A and B made use of the next-quarter-returns in creating the model. However, that makes the model not useful for actual prediction, because the next-quarter-returns were required to create it.

Experiment C removed that requirement by training on Q2-Q4 returns and predicting the Q5 return direction using Q3-Q4 values; that is, we treat the data as a time series moving one quarter at a time. We created a model using all Q1 attributes, Q2-returns and Q3-returns to classify Q4-returns. This model was then used to classify the Q5-return direction. None of the Q5 values were used in the training set.

In all the experiments we used Weka to perform the training and classification. In some cases we also performed the same experiment using Rattle/R [22]. We used the Random Forest [6] classifier in Rattle/R to obtain the important attributes. In Weka [23] we exclusively used the Random Forest classifier, varying the number of trees and the number of random attributes to choose from. We typically used 100-400 trees with 10-fold validation unless mentioned otherwise. Some of the attributes might not be independent, which we tolerated since the Random Forest algorithm randomly selects only some of the attributes for each tree. Varying the number of trees and attributes resulted in only marginal changes.

V. EXPERIMENTAL RESULTS

Experiment A(a) used the combined Q4 quarters from 2011 to 2014, with the companies that had incomplete information removed. The classes were balanced by removing some instances manually so that the class distribution was less skewed (0=195, 1=173). We also tested classes that were resampled using the Weka class balancer to create an equal class distribution (0=240, 1=240). The same quarters were used because there might be seasonal patterns. The classifier was to predict the direction of the Q4-returns (which are the next-quarter returns). The results are shown in Table I.

TABLE I. 2011-2014 Q4 RETURNS DIRECTION CLASSIFICATION

Balancing  Trees  Accuracy %  Kappa   AUC
Manual     100    65.22       0.2931  -
Manual     200    66.03       0.3081  -
Manual     400    66.30       0.3147  0.699
Weka       400    62.90       0.2596  0.709
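To make the Accuracy and Kappa columns of Table I concrete, the following minimal Python sketch computes both from a 2x2 confusion matrix. The paper does not report a confusion matrix; the cell counts below are inferred here as one matrix consistent with the manual class sizes (0=195, 1=173) and the first row of Table I, and should be read as an illustration only. The function name accuracy_and_kappa is hypothetical.

```python
def accuracy_and_kappa(tn, fp, fn, tp):
    """Return (accuracy, Cohen's kappa) for a binary confusion matrix."""
    total = tn + fp + fn + tp
    observed = (tn + tp) / total  # observed agreement, i.e. accuracy

    # Agreement expected by chance, from actual and predicted class totals.
    actual0, actual1 = tn + fp, fn + tp
    pred0, pred1 = tn + fn, fp + tp
    expected = (actual0 * pred0 + actual1 * pred1) / total ** 2

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Illustrative counts (an assumption, not reported in the paper):
# 195 actual zeros = 150 + 45, 173 actual ones = 83 + 90.
acc, kappa = accuracy_and_kappa(tn=150, fp=45, fn=83, tp=90)
print(f"accuracy = {acc * 100:.2f}%, kappa = {kappa:.4f}")
```

Kappa discounts the agreement that would be expected by chance, which is why an accuracy near 65% corresponds to a kappa of only about 0.29 here.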

There were only marginal differences when using different numbers of trees. Secondly, reporting accuracy (the percentage of correctly classified instances) can be misleading because of the skewed class distribution, so results are also reported as Area Under the Curve (AUC) values [24]. An AUC of 0.5 corresponds to random classification (values below 0.5 are no better than random) and an AUC of 1.0 to perfect classification. A good AUC value should be around 0.8 and above.

A(b)-(c). The per-quarter results are presented in Table II. The number of trees was 400. The number of companies is 433 in each quarter. The input attributes were all of the financial ratios, used to classify the next-quarter return direction. The accuracy percentage should not be used for comparison because of the skewed class distribution.

TABLE II. 2013-2014 PER QUARTER RETURNS CLASSIFICATION

QYYYY  Threshold  NA  Accuracy  AUC
12013  0.2        99  70.43     0.755
22013  0.2        99  70.2      0.659
32013  0.2        99  67.9      0.712
42013  0.2        99  65.5      0.652
12014  0          0   67.2      0.651
12014  0.2        0   83.4      0.616
22014  0.2        0   61.4      0.633
42014  0          0   67.9      0.718
42014  0.2        0   66.1      0.718
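The contrast between accuracy and AUC under a skewed class distribution can be illustrated with a small Python sketch. It computes AUC as the fraction of (positive, negative) pairs that the classifier ranks correctly, with ties counted as half. This is a standard formulation, not the Weka implementation used in the experiments, and the labels and scores are made-up values.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one instance of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up example: 8 of the 9 positive/negative pairs are ordered correctly.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
print(auc(labels, scores))

# Skewed classes: always predicting class 0 is 90% accurate,
# yet the constant score carries no ranking information.
labels_skewed = [0] * 90 + [1] * 10
constant_scores = [0.0] * 100
majority_accuracy = sum(1 for y in labels_skewed if y == 0) / len(labels_skewed)
print(majority_accuracy, auc(labels_skewed, constant_scores))
```

In the skewed example the accuracy is high while the AUC stays at the random level, which is the reason accuracy alone is a poor basis for comparison.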

Fig. 1. Hang Seng Index Jan 2011- Jan 2015 with SMA and EMA overlays
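The SMA and EMA overlays named in the caption of Fig. 1 can be computed as in the following Python sketch. The smoothing factor 2/(period+1) and seeding the EMA with the first price are common conventions assumed here; the paper does not state which variant was used, and the price list is a made-up example.

```python
def sma(prices, period):
    """Simple moving average; None until a full window is available."""
    out = []
    window_sum = 0.0
    for i, price in enumerate(prices):
        window_sum += price
        if i >= period:
            window_sum -= prices[i - period]  # drop the oldest price
        out.append(window_sum / period if i >= period - 1 else None)
    return out


def ema(prices, period):
    """Exponential moving average, seeded with the first price."""
    alpha = 2.0 / (period + 1)  # assumed smoothing convention
    out = []
    prev = None
    for price in prices:
        prev = price if prev is None else alpha * price + (1 - alpha) * prev
        out.append(prev)
    return out


# Made-up daily closes; a 50-day overlay would use period=50.
closes = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sma(closes, 3))
print(ema(closes, 3))
```

The EMA weights recent prices more heavily, so it reacts faster than the SMA to a change in trend.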

TABLE III. 2014 Q3 AND Q4 RETURNS DIRECTION PREDICTION WITH DIFFERENT ATTRIBUTES
All Return Return.2 Return.3 RET.2 RET.3 R3 Class AUC
D D D D D D D R4 0.853
D D D R4 0.739
D D R4 0.645
D D D D R4 0.889
D D D R3 0.741
D D D R3 0.819
D D D D D D R4 0.867
D D D R4 0.849
D D D R4 0.864

TABLE IV. 2013 Q3 AND Q4 RETURNS DIRECTION PREDICTION WITH DIFFERENT ATTRIBUTES
All Return Return.2 Return.3 RET.2 RET.3 R3 Class AUC
D D D D D D D R4 0.937
D D D D D D R4 0.972
D D D R4 0.847
D R4 0.724
D D R4 0.761
Table III shows the results for the 2014 quarters using different attributes. Return.2 refers to the third quarter returns. RET.2 refers to the cumulative value obtained by adding Return and Return.2; similarly for RET.3. R3 was set to 1 if RET.3 was above one, and zero otherwise. Table IV shows similar results for 2013.

The values indicated that the R4 values (direction) were fairly predictable without the financial ratios; in other words, they were predictable based on prices alone. Given various financial ratios calculated over every quarter and the returns for the subsequent quarter, the 4th quarter was quite predictable. In fact, it was predictable without the financial ratios. However, this form of classification used R4 information in generating the model.
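The derived attributes used in Tables III and IV can be sketched in Python as follows. The text defines RET.2 as Return plus Return.2 and says RET.3 is formed similarly; treating RET.3 as RET.2 plus Return.3 is an assumption made here, and the function name and input values are hypothetical.

```python
def derived_attributes(ret, ret2, ret3, threshold=1.0):
    """Build the cumulative-return attributes from three quarterly returns."""
    ret_2 = ret + ret2     # RET.2 = Return + Return.2
    ret_3 = ret_2 + ret3   # RET.3: assumed to extend the sum with Return.3
    r3 = 1 if ret_3 > threshold else 0  # R3 is 1 when RET.3 is above one
    return {"RET.2": ret_2, "RET.3": ret_3, "R3": r3}


# Made-up quarterly returns for two companies.
print(derived_attributes(0.5, 0.25, 0.5))
print(derived_attributes(0.25, 0.25, 0.25))
```

Because R3 is computed directly from the return columns, a classifier given those columns has the class nearly in hand, which fits the observation that the financial ratios added little.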

We have found that prices were informative in predicting price directions. Table V presents the results of using quarter returns as a time series prediction with a one-quarter lag. The table also shows which attributes were used. We trained the model using data up to Q3 and used it to classify Q4 data.

TABLE V. USING Q3 LAGGED VALUES TO PREDICT PRICE DIRECTION AT Q4 FOR 2011 AND 2014

QYYYY  All / Return,Return.2,RET.2,R3  Accuracy  Kappa    AUC
12014  D D                             88.98     0.7761   0.920
12014  D                               88.04     0.6936   0.880
12011  D D                             38.85     -0.1238  0.414
12011  D                               34.20     -0.138   0.396

There seemed to be an anomaly here. The data was predictive for 2014 but not for 2011. We discuss this further in Section VI.

VI. DISCUSSION

The results reported in Tables I and II were only slightly better than random. This indicated that using financial ratios to predict the next-quarter results was not reliable, despite the fact that the next-quarter return values were used in creating the model. We did not further test the impact of using different threshold or NA values: since the prediction results were not good, a slight increase in accuracy would not be meaningful.

Tables III and IV showed the results for longer-term impact beyond the one-quarter returns. They tested for the 1-year (4 quarters) cumulative return direction (R3) and for the 5-quarter return direction (R4). The results were more positive here; there seemed to be a correlation between the inputs and the predicted class. There also seemed to be a close relationship between the quarterly returns and R3 and R4, again indicating that the financial ratios did not play a big role. On closer examination, R3 and R4 were also highly correlated.

However, the same results were not obtained when using 2011 data, as shown in Table V. Unlike the previous results, the results in Table V were obtained using a model trained up to R3 but tested on R4. Good results were obtained for 2014. For the year 2011, the model trained up to R3 could not predict values for R4 that it had not seen.

This can be explained by examining the Hong Kong Hang Seng Index for the relevant years. Fig. 1 shows the period from January 2011 to January 2015 with the Simple Moving Average (SMA-50) and Exponential Moving Average (EMA-50), both with a period of 50 days. The difference between the two periods, 2011 and 2014, can be seen clearly. The period from 2011 to early 2012 showed an early rise and drop, whereas 2014 showed a steady rise until early 2015. This would explain why the model for 2014 could be predictive for 1Q 2015 (R4) while the model for 2011 was not predictive for 2012: the prices in 2011-2012 were fluctuating. The simple model was not powerful enough to take into consideration all the causes of the fluctuation.

These results also explain why some research has reported highly predictive values: high prediction scores can be obtained by selecting an appropriate time period for the experiments.

The results imply that the correct predictive model may need to be episodic (essentially non-stationary); that is, the model is only useful within a period of time and may need to be updated when there is a change in the operating environment. Many studies have shown that stock prices are non-stationary [17]. This suggests a research direction: identifying the triggers for the change in the environment (also called regime change) that should prompt model relearning. Some of these triggers will be external events in the macro environment, conveyed through news events or social media. They can also be conveyed through early price movements. Many studies use these signals directly for prediction; we believe that it will be useful to study them as triggers for model re-learning.

REFERENCES

[1] E. F. Fama, "Efficient Capital Markets: A Review of Theory and Empirical Work," Journal of Finance, vol. 25, no. 2, pp. 383-417, 1970.
[2] M. Rechenthin and W. Street, "Using conditional probability to identify trends in intra-day high-frequency equity pricing," Physica A: Statistical Mechanics and its Applications, vol. 392, no. 24, pp. 6169-6188, 2013.
[3] G. Gidofalvi and C. Elkan, "Using News Articles to Predict Stock Price Movements," Department of Computer Science and Engineering, University of California, San Diego, 2003.
[4] K. S. Loke and P. Chan, "Prediction of Individual Stock Movements in Bursa Malaysia using Online News," in SME-Entrepreneurship Global Conference, Kuala Lumpur, 2006.
[5] G. P. Fung, J. X. Yu and W. Lam, "News Sensitive Stock Trend Prediction," in Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Taipei, 2002.
[6] L. Breiman, "Random forests," Machine Learning, vol. 45, pp. 5-32, 2001.
[7] E. Hjalmarsson, "On the Predictability of Global Stock Returns," School of Business, Economics and Law, Goteborg University, Gothenburg, 2005.
[8] E. Fama and K. French, "Dividend yields and expected stock returns," Journal of Financial Economics, vol. 22, pp. 3-25, 1988.
[9] J. Lewellen, "Predicting returns with financial ratios," Journal of Financial Economics, vol. 74, pp. 209-235, 2004.
[10] A. Goyal and I. Welch, "A Note on 'Predicting Returns with Financial Ratios'," Yale School of Management, 2003.
[11] S. T. Lau, T. C. Lee and T. H. McInish, "Stock Returns and Beta, Firms Size, E/P, CF/P, Book to Market and Sales Growth: Evidence from Singapore and Malaysia," Journal of Multinational Financial Management, vol. 12, pp. 207-222, 2002.
[12] S. Mabu, K. Hirasawa, M. Obayashi and T. Kuremoto, "Enhanced decision making mechanism of rule-based genetic network programming for creating stock trading signals," Expert Systems with Applications, vol. 40, pp. 6311-6320, 2013.
[13] F. Wang, P. L. Yu and D. W. Cheung, "Combining technical trading rules using particle swarm optimization," Expert Systems with Applications, vol. 41, pp. 3016-3026, 2014.
[14] Y. Hu, B. Feng, X. Zhang, E. Ngai and M. Liu, "Stock trading rule discovery with an evolutionary trend following model," Expert Systems with Applications, vol. 42, pp. 212-222, 2015.
[15] J. Patel, S. Shah, P. Thakkar and K. Kotecha, "Predicting stock and stock price index movement using Trend Deterministic Data Preparation and machine learning techniques," Expert Systems with Applications, vol. 42, pp. 259-268, 2015.

[16] M. Ballings, D. V. den Poel, N. Hespeels and R. Gryp, "Evaluating multiple classifiers for stock price direction prediction," Expert Systems with Applications, vol. 42, pp. 7046-7056, 2015.
[17] P. Ladyzynski, K. Zbikowski and P. Grzegorzewski, "Stock Trading With Random Forests, Trend Detection Tests and Force Index Volume Indicators," Artificial Intelligence and Soft Computing, vol. 1, pp. 441-452, 2013.
[18] A. Booth, E. Gerding and F. McGroarty, "Automated trading with performance weighted random forests," Expert Systems with Applications, pp. 3651-3661, 2014.
[19] Y. Amit and D. Geman, "Shape quantization and recognition with randomized trees," Neural Computation, vol. 9, no. 7, pp. 1545-1588, 1997.
[20] T. Ho, "The Random Subspace Method for Constructing Decision Forests," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 8, pp. 832-844, 1998.
[21] R. Ariel, "A monthly effect in stock returns," Journal of Financial Economics, vol. 18, pp. 161-174, 1987.
[22] G. J. Williams, Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery, Springer, 2011.
[23] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann and I. Witten, "The WEKA Data Mining Software: An Update," SIGKDD Explorations, vol. 11, no. 1, 2009.
[24] T. Fawcett, "ROC graphs: Notes and practical considerations for researchers," ReCALL, Vols. HPL-2003-4, no. 31, pp. 1-38, 2004.
[25] M. Ariff, M. Shamsher and M. N. Annuar, "Stock Pricing in Malaysia: Financial and Investment Management," in Financial Economics Behaviour of an Emerging Capital Market, Serdang, University Putra Malaysia Press, 1998.
[26] X. Jiang and B.-S. Lee, "Do Decomposed Financial Ratios Predict Stock Returns and Fundamentals Better?," Social Science Research Network, 2009.
