
Subspace Clustering

3D Subspace Clustering for Value Investing
Kelvin Sim, Institute for Infocomm Research

Vivekanand Gopalkrishnan, Deloitte Analytics Institute Asia

Clifton Phua, SAS

Gao Cong, Nanyang Technological University

A 3D-subspace-clustering method generates rules to pick potential undervalued stocks; 3D subspace clustering is effective in handling high-dimensional financial data and is adaptive to new data.

Value investing is an investment strategy where the investor believes that a stock's fundamentals determine future stock prices [1]. A value investor analyzes stock fundamentals and buys stocks that are undervalued with the belief that the prices of the stocks will rise in the future [2]. The success of value investing is evident in the stock market, with many famous value investors' portfolios, such as Warren Buffett's, outperforming the market indices. There are also many successful mutual funds that follow the philosophy of value investing, such as Third Avenue Value Fund, which managed 5.04 billion dollars of assets in 2011. Academic research has shown that stock fundamentals are related to stock prices [3].

Stock fundamentals can be measured by stock financial ratios. For example, the return-on-equity ratio measures a stock's efficiency in using its assets to generate profit, the debt–equity ratio measures the amount of the stock's assets that are debts, and the price–earnings ratio measures the ratio of the stock's current price to its current earnings.

Therefore, scrutinizing financial ratios is important in finding undervalued stocks [2]. However, there's no perfect rule that shows which financial ratios and what values of these ratios are related to undervalued stocks. For example, Benjamin Graham, the founder of value investment, prefers stocks with a price–earnings ratio of no more than 7 [2]. Using Graham's rules on picking stocks has been proven to generate profits for value investors. An experiment conducted over an eight-year period from 1973 to 1980 showed this strategy to be profitable [1]. We propose using 3D subspace clustering to generate rules to pick potential undervalued stocks. The 3D subspace-clustering method is effective in handling high-dimensional financial data and is adaptive to new data. In addition, its results aren't influenced by human biases and emotions, and are easily interpretable. We conducted extensive experimentation in the stock market over a period of 28 years (from 1980 to 2007), and we found that using rules

52 1541-1672/14/$31.00 © 2014 IEEE IEEE INTELLIGENT SYSTEMS


Published by the IEEE Computer Society

generated by two 3D subspace-clustering algorithms (CATSeeker and MIC) results in 60 percent more profits than using Graham's rules alone.

Problem Defined
Value investing isn't simply about buying stocks based on some rules on the financial ratios, although Henry Oppenheimer [1] has shown that Graham's rule-based strategy enables the investor to make profits from the stock market. It's not necessary to always use Graham's rules, and the investor can set his or her own rules, based on his or her preferences and domain knowledge. These rules are generally used to select a comfortable number of stocks for the investor to conduct further analysis. Hence, these rules provide some general decision support. For an inexperienced investor, manually setting rules on the financial ratios can be difficult, and even an experienced investor might be prone to setting irrational and biased rules. The investor can stick to Graham's rules, but the relevance of these rules at the present time remains to be seen. Hence, the following problem needs to be addressed: How do we find rules on financial ratios that are related to high stock price returns? We should note here that we define the price return of a stock as (sold price – purchased price)/purchased price.

There are financial studies that investigate the impact of single financial ratios on stock prices [3]. However, different financial ratios quantify different aspects of a stock, so to get the complete picture, it will be useful to study the collective influence of financial ratios on the stock prices, and this is a nontrivial problem.

Proposed Solution
We propose using 3D subspace-clustering algorithms to mine rules that are related to high stock price returns. The 3D subspace-clustering approach groups stocks that have similar fundamentals (financial ratios) and high price returns across years. The highlighted region in Figure 1a is a 3D subspace cluster containing stocks s2, s3, s4 that have similar fundamentals reflected in financial ratios r2, r3, r4 for years 1–3, 5–6, and 8–10. From Figure 1b, we can see that stocks s2, s3, s4 have high price returns.

Figure 1. (a) Example of a 3D financial dataset defined by stocks, financial ratios, and years. The highlighted region is an actionable 3D subspace cluster of stocks s2, s3, s4 that have similar financial fundamentals reflected in financial ratios r2, r3, r4. (b) The price returns of the stocks. Stocks s2, s3, s4 have high price returns.

This cluster's subspace can be used as a rule that's related to high price returns. For future years, if there's a stock whose financial ratio values fall in this subspace, we can consider this stock as a potential undervalued stock. Using the example in Figure 1a, the rule is r2[2, 3], r3[10, 11], r4[5, 6] in year 1, …, r2[…], r3[…], r4[0, 2] in year 10, where r[j, k] denotes that the stock's value on financial ratio r should fall between values j and k. If there's a stock whose set of future years contains this rule, this stock is a potentially undervalued stock.

3D subspace clustering is suitable for this value investing problem, due to the following reasons:

• Effective in handling financial ratio data. Financial ratio data is high dimensional, as the number of financial ratios and timestamps can be large. Techniques such as traditional clustering (for example, k-means clustering) suffer from the curse of dimensionality in this type of data; the stocks are equidistant from each

MARCH/APRIL 2014 www.computer.org/intelligent 53

other in the full space of the data, hence it's difficult to cluster them [4]. We developed 3D subspace clustering to overcome this curse of dimensionality. We achieve this by clustering stocks based on similar subsets of financial ratios (data subspace). The financial ratio data is continuous and 3D subspace clustering is generally used on this type of data.
• Adaptive to new data. The financial ratio data is constantly changing and 3D subspace clustering can be easily reapplied on the new data to get the updated results.
• Easy interpretation of results. The investor can easily analyze 3D subspace clusters because the clusters are explicitly created.

We aren't trying to solve the problem of how to invest, for example, or what stocks to buy at a particular time. Instead, we're trying to determine if 3D subspace clustering can help the value investor in his or her stock selection process by decreasing the pool of stocks to select.

To evaluate the effectiveness of using 3D subspace clustering for value investing, we compare its profits and risks to those of Graham's rule-based strategy.

Preliminaries
Let the 3D financial dataset be a cuboid D with its axes defined by objects (stocks) $\mathcal{O}$, attributes (financial ratios) $\mathcal{A}$, and time stamps (years) $\mathcal{T}$, that is, $D = \mathcal{O} \times \mathcal{A} \times \mathcal{T}$.

Let the value of financial ratio a of stock o, in year t, be denoted as $v_{oat}$. Let p(o, t) be the closing price of stock o at the end of the fiscal year t, which we use as the buying and selling prices in our experiments. The price return of stock o, bought at year t and sold at year t + i, is calculated as

$$ret(o, t, t + i) = \frac{p(o, t + i) - p(o, t)}{p(o, t)},$$

which is also known as the i-period simple net return [5].

For the sake of brevity, we also denote ret(o) as the price return of stock o, if the year it's bought and the year it's sold aren't required to be explicitly stated.

A 3D subspace cluster is a subcuboid C = O × A × T, with its axes defined by a subset of stocks $O \subseteq \mathcal{O}$, a subset of financial ratios $A \subseteq \mathcal{A}$, and a subset of years $T \subseteq \mathcal{T}$. We denote $\{C_1, \ldots, C_m\}$ as the set of 3D subspace clusters mined from the dataset D.

Research Design
We present the research design of our experiments. The research design consists of three main phases: data preparation, stock picking, and data analysis.

Data Preparation
In the data preparation phase, we obtained raw financial figures of US stocks from Compustat (see www.compustat.com) and converted this information into a 3D dataset of financial ratios. We removed microcap stocks (whose prices are less than $5) from the data, as these stocks have a high risk of being manipulated and their financial figures are less transparent.

We converted the raw financial figures into 30 financial ratios, based on the ratios' formulas from Investopedia (see www.investopedia.com/university/ratios).

We prepared a financial dataset D containing 30 financial ratios and spanning 28 years (from 1980 to 2007). The number of stocks increased from 3,335 to 5,049, due to the stock market's expansion. Some (14.7 percent) of the dataset contains missing values.

Graham's rule-based strategy uses 10 years of financial ratio data to pick stocks. To have a fair comparison, we also partition the financial ratio data into 10-year datasets, and use them as training data for the 3D-subspace-clustering algorithms to pick stocks. The partitioned datasets are denoted as $D_t$, t ∈ {1980, …, 1999}, with each $D_t$ containing data of the set of years T = {t, …, t + 9}. For example, $D_{1980}$ contains data from 1980 to 1989.

We also processed these 10-year datasets $D_t$ to contain only stocks that have high price returns. These datasets are required as training data for certain 3D-subspace-clustering algorithms. More specifically, $D^t_{minret}$ is a processed dataset that contains stocks o whose CAGR(o, t, t + 9) ≥ minret, given that minret is a threshold. The compound annual growth rate is

$$CAGR(o, t, t + 9) = \left(\frac{p(o, t + 9)}{p(o, t)}\right)^{1/9} - 1.$$

We use compound annual growth rate instead of average return to minimize the effect of volatility of periodic returns.

We vary minret from 0.1 to 0.5, as there are no valid stocks in some 10-year datasets $D^t_{minret}$ for minret > 0.5.

Stock Picking
Graham's rule-based strategy consists of a buy phase and a sell phase [1]. In the buy phase, a stock is bought if it satisfies at least one reward criterion and one risk criterion. The criteria are shown in Table 1. If a stock satisfies at least one reward criterion and one risk criterion on year t, we will purchase the stock on the last day of its fiscal year t.

In the sell phase, the stock will be sold either on the last day of its fiscal year t + 2, or on the day when its price appreciates by 50 percent, whichever comes first. We slightly tweak the sell phase, as we are only able to obtain the price of the last day of each fiscal year of a stock. A stock will be sold on the last day of its fiscal year t + 1 if its price appreciates by more than 50 percent, or it will

be sold on the last day of its fiscal year t + 2. In our experiments, we show that this strategy still generates good profits.

We use 10 datasets $D_t$, t ∈ {1989, …, 1998} as the testing datasets. Hence, the testing period contains 19 years, from 1989 to 2007.

Table 1. Graham's rule-based strategy (adapted from other work [1]).

Reward criteria
  1   Earnings/price yield ≥ 2 × AAA bond credit rating yield
  2   Price–earnings ratio ≤ 0.4 × highest price–earnings ratio of the stock during the past 5 years
  3   Dividend yield ≥ 2/3 × AAA bond yield
  4   Stock price ≤ 2/3 × tangible book value per share
  5   Stock price ≤ net current asset value
Risk criteria
  6   Total liabilities ≤ book value
  7   Current ratio > 2
  8   Total liabilities < 2 × net current asset value
  9   Earnings growth of prior 10 years ≥ 7 percent annual (compound) rate
  10  No more than two declines of 5 percent or more in year-end earnings in the prior 10 years
Financial ratio definitions in reward and risk criteria
  Book value: total assets − total liabilities
  Current ratio: current assets/current liabilities
  Dividend yield: dividend per share/stock price per share
  Earnings/price yield: earnings per share/price per share
  Net current asset value: current assets − total liabilities
  Price–earnings ratio: stock price per share/earnings per share
  Tangible book value per share: total tangible assets/total number of shares outstanding

For the 3D subspace-clustering strategy, depending on which algorithm we use, the training dataset can either be $D_t$ or $D^t_{minret}$. Assume that we mine 3D subspace clusters from $D_t$, and we use the clusters as rules to pick stocks. Let C = O × A × T be one of the 3D subspace clusters.

The general idea is to use the values of the financial ratios of a 3D subspace cluster as a rule to pick stocks, because these values are associated with stocks of high compound annual growth rate. Let there be a 3D subspace cluster C = O × A × T mined from training dataset $D_t$, and let $V_{at_i} = \{v_{oat_i} \mid o \in O\}$ be the set of values in cluster C of financial ratio a at year $t_i$. We denote $boundary(V_{at_i}) = [\min(V_{at_i}), \max(V_{at_i})]$, which defines the boundaries of the values of financial ratio a at year $t_i$.

Definition 1 of our approach is the rule of the 3D subspace cluster C = O × A × T. We denote the rule of the cluster C as

$$rule(C) = \{boundary(V_{at_i}) \mid a \in A,\ t_i \in T\},$$

which is a set of boundaries.

We then use rule(C) from training dataset $D_t$ to pick stocks from dataset $D_{t+9}$, which is the corresponding testing dataset. Subsequently, we use rule(C) from training dataset $D_{t+j}$ to pick stocks from testing dataset $D_{t+j+9}$, for j ≥ 1.

Definition 2 of our approach is the 3D subspace-clustering strategy's buy rule. Let $D_{t+9}$ be a dataset with a set of years T′. A stock o′ in dataset $D_{t+9}$ is bought if it satisfies rule(C). Stock o′ satisfies $rule(C) = \{boundary(V_{at_i}) \mid a \in A,\ t_i \in T = \{t_1, \ldots, t_n\}\}$ if, $\forall a \in A$:

$$v_{o'at'_1} \in boundary(V_{at_1}),\ \ldots,\ v_{o'at'_n} \in boundary(V_{at_n}),$$

where $t'_1, \ldots, t'_n \in T'$ and $t'_1 < t'_2 < \cdots < t'_n$. The stock o′ is then bought on the year $t'_n$.

We only buy stock o′ at year $t'_n$ once, regardless of the number of times it's picked; this is to prevent the stock from dominating the results.

We consider 3D subspace clusters that contain at least two years, because clusters that contain only a single year are trivial. This means that the earliest year a stock can be bought in the testing data is the second year. Hence, we set the testing data to start at year t + 9, so that the earliest year a stock in the testing data $D_{t+9}$ can be bought is year t + 10. If the testing data starts at year t + 10, then it's not possible to buy stocks on year t + 10 across all experiments.

To have a fair comparison, we used the same sell phase described earlier for Graham's rule-based strategy for the 3D subspace-clustering strategy.

For a training dataset $D_t$ or $D^t_{minret}$, we test the rules mined on the testing dataset $D_{t+9}$. We use 10 datasets $D_t$, t ∈ {1980, …, 1989} as the training datasets. Hence, each training dataset has a corresponding testing dataset. We use 10 datasets $D_{t'}$, t′ ∈ {1989, …, 1998} as the testing datasets.

Data Analysis
Let strat denote the strategy used in the stock picking phase. Let $O^D_{strat}$ be the set of stocks bought using strat on a training dataset D. We use the function ret to calculate the price return of the stocks bought.

Definition 3 is the average return of strategy:

$$ret^D_{strat} = \frac{\sum_{o \in O^D_{strat}} ret(o)}{|O^D_{strat}|}.$$

The strategy's risk on training dataset D is its standard deviation of the

average return. A high standard deviation implies that the strategy is risky and volatile. Let $\delta_{ret}$ denote the risk-free return that the investor is assumed to have. In calculating the standard deviation, we shouldn't incorporate returns that are at least $\delta_{ret}$. Thus, we calculate the strategy's risk on training dataset D using the downside standard deviation, which is Definition 4 (the risk of strategy):

$$risk^D_{strat} = \sqrt{\frac{\sum_{o \in O^D_{strat} \mid ret(o) < \delta_{ret}} \left(ret(o) - ret^D_{strat}\right)^2}{\left|\{o \mid o \in O^D_{strat} \wedge ret(o) < \delta_{ret}\}\right| - 1}}.$$

A strategy is thus desirable if it gives high average return and low downside risk (standard deviation), which can be measured using the Sortino ratio [6]. Definition 5 is the Sortino ratio of strategy:

$$SortinoRatio^D_{strat} = \frac{ret^D_{strat} - \delta_{ret}}{risk^D_{strat}}.$$

We conduct different stock-picking strategies and evaluate their results by the following experiments:
• Average returns across years. We denote T as the set of years used to test a strat. For each testing dataset $D_t$, t ∈ T, we calculate the average return $ret^{D_t}_{strat}$ and $SortinoRatio^{D_t}_{strat}$ of the stocks bought.
• Overall average returns and risks. We calculate the overall average return of a strat by averaging $ret^{D_t}_{strat}$, ∀t ∈ T, and we calculate the overall risk of this strategy by averaging the downside standard deviation of $ret^{D_t}_{strat}$, ∀t ∈ T. We also calculate the overall Sortino ratios of the strategies using the overall average return and risk.

3D Subspace-Clustering Algorithms
We use a wide range of 3D subspace-clustering algorithms in our experiments.

TRICLUSTER
TRICLUSTER is the pioneer algorithm for mining 3D subspace clusters, which are denoted as triclusters [7]. A tricluster can be transformed into a wide variation of 3D subspace clusters, depending on the setting of the TRICLUSTER algorithm's parameters [7]. In a tricluster C = O × A × T, the stocks O have homogeneous values in the set of financial ratios A in each year t ∈ T, and the homogeneity and size criteria are satisfied subject to the setting of parameters δ, minO, minA, minT. We also set its parameters δy = δz = ∞, as they aren't applicable in mining our desired clusters. The clusters are sensitive to the parameters, and careful setting of the parameters is required.

STATPC
Moise and Sander proposed statistically significant subspace clusters (SSSCs), which are subspace clusters that are insensitive to the parameters of their algorithm STATPC [8]. The number of stocks in a statistically significant cluster is significantly more than expected, under the assumption that the data is uniformly distributed.

SSSCs are 2D subspace clusters O × A, thus we require a postprocessing step to convert the 2D SSSCs to 3D. Given a dataset D, which contains a set of time stamps T, we mine SSSCs from each year t ∈ T, and we try all possible combinations of them to obtain 3D SSSCs. That is, a 3D SSSC C = O × A × T is formed if there exists a 2D SSSC O × A in every year t ∈ T.

MIC
Correlated subspace clusters (CSCs) are insensitive to the parameters of their algorithm, MIC [9]. Unlike SSSCs, CSCs are 3D and they don't require the assumption of uniformly distributed data. A 3D subspace cluster is a CSC when it satisfies the following criterion: the 3D subspace cluster C = O × A × T is correlated when the values in the cluster have high co-occurrences and these co-occurrences aren't by chance.

CATSeeker
The price return of the stocks can be crucial information in clustering, but the previous three algorithms don't incorporate this information. The CATSeeker algorithm incorporates this information, and its clusters are denoted as CATSs [10]. A CATS satisfies the following criterion: the 3D subspace cluster C = O × A × T is actionable when, in every year t ∈ T, the stocks in O are similar on the set of financial ratios A, and the stocks in O have high and correlated price returns in years T. Given a set of centroids, the optimal clusters with respect to these centroids are found. The results of the algorithm are shown to be insensitive to its parameters [10].

On the selection of centroids, the user can set a threshold to select stocks that have good historical price returns as the centroids.

Experiments
We coded all algorithms in C++, and their codes or programs were kindly provided by their respective authors. We performed all experiments using computers with Intel Core 2 Quad 3.0-GHz CPUs with 8 Gbytes of RAM. We used Windows 7 except for experiments involving TRICLUSTER, which we performed in Ubuntu 10.10.

We conducted the experiments in accordance with the parameters presented in the "Research Design" section. We set the risk-free return at $\delta_{ret} = 0$ in our experiments.

For TRICLUSTER, we fixed its minimum size parameters to minO = 5, minA = 2, and minT = 3, and varied its similarity parameters as ε = 1 and δ = 0, 0.1, 0.01, as it's not possible to test on all possible combinations of its parameters.

Figure 2. The different 3D-subspace-clustering strategies' (a) average returns and (b) Sortino ratios across the years. Each year on the x-axis denotes the start of a ten-year test period. (c) Percentage of stocks bought. Each panel plots Graham, TRI, STATPC, MIC, and CATS against the years 89 to 98.
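The average returns and Sortino ratios plotted in Figure 2 follow Definitions 3 to 5 in the "Data Analysis" section. As a rough illustration, the Python sketch below computes the price return, the compound annual growth rate, the average return, the downside standard deviation, and the Sortino ratio from a list of per-stock returns. The function names are hypothetical, and two choices are assumptions of this sketch rather than statements from the article: the risk is taken as zero when no return falls below the risk-free return, and the divisor falls back to 1 when exactly one return does.

```python
import math
from typing import Sequence


def price_return(buy_price: float, sell_price: float) -> float:
    """ret = (sold price - purchased price) / purchased price."""
    return (sell_price - buy_price) / buy_price


def cagr(start_price: float, end_price: float, years: int = 9) -> float:
    """Compound annual growth rate over `years` yearly steps."""
    return (end_price / start_price) ** (1.0 / years) - 1.0


def average_return(returns: Sequence[float]) -> float:
    """Definition 3: mean price return of the stocks bought."""
    if not returns:
        raise ValueError("no stocks were bought")
    return sum(returns) / len(returns)


def downside_risk(returns: Sequence[float], risk_free: float = 0.0) -> float:
    """Definition 4: standard deviation over returns below the risk-free return."""
    mean = average_return(returns)
    below = [r for r in returns if r < risk_free]
    if not below:
        return 0.0  # assumption: no downside returns means zero risk
    divisor = max(len(below) - 1, 1)  # assumption: avoid dividing by zero
    return math.sqrt(sum((r - mean) ** 2 for r in below) / divisor)


def sortino_ratio(returns: Sequence[float], risk_free: float = 0.0) -> float:
    """Definition 5: (average return - risk-free return) / downside risk."""
    excess = average_return(returns) - risk_free
    risk = downside_risk(returns, risk_free)
    if risk == 0.0:
        # Matches the article's remark that a strategy whose picks all have
        # positive returns has a Sortino ratio of infinity.
        return math.inf if excess > 0 else 0.0
    return excess / risk
```

With the risk-free return set to 0, as in the experiments, `sortino_ratio` is infinite for any list of strictly positive returns.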

For STATPC, we used its default setting $\alpha_0 = 10^{-10}$, $\alpha_K = \alpha_H = 10^{-3}$. For MIC, we used its default setting p-value = $10^{-4}$. For CATSeeker, we used its default setting t = 0.1; m = 10; d = 0.001; l = 0.1; and varied r = 0.2, 0.3, 0.4, as its results are shown to be insensitive to this range of r [10].

On the use of training datasets, TRICLUSTER, STATPC, and MIC used $D^t_{minret}$, as these algorithms don't consider the stocks' price returns during their clustering process. CATSeeker used $D_t$, as this algorithm considers the stocks' price returns during its clustering process. For CATSeeker, a stock o is selected as a centroid if its CAGR(o, t, t + 9) is at least minret.

Average Returns Across Years
Figures 2a and 2b present the average returns and Sortino ratios of the different 3D subspace-clustering strategies, on testing datasets $D_t$, t ∈ {1989, …, 1998}.

The TRICLUSTER-based strategy generated good positive returns in the initial seven datasets. In dataset $D_{1992}$, it even has a Sortino ratio of infinity, as all the stocks picked have positive returns. However, this approach generated substantial losses in the last two datasets, notably generating an 80 percent loss in $D_{1998}$. Hence, TRICLUSTER produced volatile results. A strategy with volatile results naturally generates high returns in the datasets where it's profitable; the TRICLUSTER-based strategy has the highest Sortino ratio in datasets $D_{1991}$, $D_{1993}$, and $D_{1994}$. However, strategies with less volatile results also outperformed their volatile peer in certain years, as the CATSeeker-based strategy has the highest Sortino ratio in $D_{1996}$,

$D_{1997}$, and $D_{1998}$; and MIC in $D_{1989}$, $D_{1996}$, and $D_{1995}$.

STATPC-, MIC-, and CATSeeker-based strategies were able to generate positive average returns across the 10 datasets, which even Graham's strategy was unable to achieve (it generated losses in dataset $D_{1998}$).

Percentage of Stocks Bought
Figure 2c presents the percentage of stocks bought by different strategies, based on the pool of stocks available to each strategy. The CATSeeker-based strategy bought between 10 and 20 percent of all stocks, and is better than Graham's strategy, which bought the most stocks (between 30 and 50 percent of all stocks). STATPC bought the fewest stocks (less than 5 percent of all stocks) but it has a lower average return than MIC and CATSeeker. TRICLUSTER and MIC are more volatile strategies, because the percentage of stocks bought by them varied across the years.

Overall Average Returns and Risks
We calculated the overall average return and risk of the different strategies across the 10 testing datasets and present the results in Figure 3a. STATPC-, MIC-, and CATSeeker-based strategies have zero risk, as they have positive average returns across the 10 testing datasets. Figure 3b presents the overall Sortino ratio across the 10 testing datasets, which shows that Graham's strategy has a high Sortino ratio. However, STATPC-, MIC-, and CATSeeker-based strategies have higher Sortino ratios than Graham's strategy. Among these 3D-subspace-clustering strategies, MIC- and CATSeeker-based strategies have higher average return and lower risk than Graham's strategy.

Figure 3. The results of the different strategies across the 10 testing datasets. (a) Overall average returns and risks, and (b) overall Sortino ratios. Panel (a) plots average return against risk, and panel (b) plots the Sortino ratio for Graham, TRI, STATPC, MIC, and CATS.

Summary of the Experiments
All stocks recommended by CATSeeker- and MIC-based strategies have positive returns. This is a good achievement, and CATSeeker- and MIC-based strategies generate 60 percent more returns than Graham's strategy, with no negative returns across the years (see Figure 3a). Although CATSeeker and MIC used different approaches to find 3D subspace clusters, their performances are similarly good. This suggests that good performance can be achieved by different approaches; price return is used to guide the clustering in CATSeeker, and an information theory concept (correlation information) is used to guide the clustering in MIC.

We investigated the effectiveness of using 3D subspace clustering for value investing. This approach involves grouping stocks that have similarly good fundamentals (represented by their financial ratios) over the years, and then using this information to buy stocks.

We compared this approach with a highly successful value investment strategy, known as Graham's strategy. We found that two 3D subspace-clustering strategies generated 60 percent more returns than Graham's strategy, with zero risk.

References
1. H.R. Oppenheimer, "A Test of Ben Graham's Stock Selection Criteria," Financial Analyst J., vol. 40, no. 5, 1984, pp. 68–74.
2. B. Graham, The Intelligent Investor: A Book of Practical Counsel, Harper Collins Pub., 1986.
3. J.Y. Campbell and R.J. Shiller, "Valuation Ratios and the Long-Run Stock Market Outlook: An Update," J. Portfolio Management, vol. 24, 2001, pp. 11–26.
4. H.-P. Kriegel, P. Kröger, and A. Zimek, "Clustering High-Dimensional Data: A Survey on Subspace Clustering, Pattern-Based Clustering, and Correlation Clustering," ACM Trans. Knowledge Discovery from Data, vol. 3, no. 1, 2009, pp. 1–58.
5. R. Tsay, Analysis of Financial Time Series, Wiley-Interscience, vol. 543, 2005.
6. F. Sortino and L. Price, "Performance Measurement in a Downside Risk Framework," The J. Investing, vol. 3, no. 3, 1994, pp. 59–64.
7. L. Zhao and M.J. Zaki, "TRICLUSTER: An Effective Algorithm for Mining Coherent Clusters in 3D Microarray Data," Proc. ACM Sigmod Int'l Conf. Management of Data, 2005, pp. 694–705.
8. G. Moise and J. Sander, "Finding Non-Redundant, Statistically Significant Regions in High Dimensional Data: A Novel Approach to Projected and Subspace Clustering," Proc. 14th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, 2008, pp. 533–541.
9. K. Sim, A. Aung, and G. Vivekanand, "Discovering Correlated Subspace Clusters in 3D Continuous-Valued Data," Proc. IEEE Int'l Conf. Data Mining, 2010, pp. 471–480.
10. K. Sim et al., "Centroid Based Actionable 3D Subspace Clustering," IEEE Trans. Knowledge and Data Eng., vol. 25, no. 6, 2012.

The Authors
Kelvin Sim is a scientist at the Data Analytics Department, Institute for Infocomm Research, Singapore, which is part of the Agency for Science, Technology, and Research. His research interests include financial data mining, subspace clustering, graph mining, co-clustering, and activities of daily living recognition. Sim has a PhD in computer engineering from Nanyang Technological University, Singapore. Contact him at shsim@i2r.a-star.edu.sg.

Vivekanand Gopalkrishnan is the director of research at Deloitte Analytics Institute Asia. His research interests include efficient algorithms for mining interesting item sets, subspace clustering, mining in P2P networks, outlier detection, and data warehousing. Gopalkrishnan has a PhD in computer science (data warehousing) from City University of Hong Kong. Contact him at vivek@deloitte.com.

Clifton Phua is the security and fraud analytics lead at SAS. His research interests include data mining, fraud detection, activity recognition, and intelligent monitoring. Clifton has a PhD in information technology from Monash University, Australia. He's a member of IEEE. Contact him at clifton.phua@sas.com.

Gao Cong is an assistant professor at Nanyang Technological University, Singapore. His research interests include geospatial keyword queries and mining social media. Cong has a PhD in computer science from the National University of Singapore. Contact him at gaocong@ntu.edu.sg.

