
DATA MINING TECHNIQUE WITH CLUSTER ANALYSIS USE K-MEANS ALGORITHM FOR LQ45 INDEX ON INDONESIA STOCK EXCHANGE

A. Raharto Condrobimo (1, 2)
(1) Computer Science Department, BINUS Graduate Program - Doctor of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
(2) Information System Department, School of Information System, Bina Nusantara University, Jakarta 11480, Indonesia
acondrobimo@binus.edu

Bahtiar Saleh Abbas
Computer Science Department, BINUS Graduate Program - Doctor of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
bahtiars@binus.edu

Agung Trisetyarso
Computer Science Department, BINUS Graduate Program - Doctor of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
atrisetyarso@binus.edu

Wayan Suparta
Civil Engineering Department, University of Technology Yogyakarta, Yogyakarta, Indonesia
drwaynesparta@gmail.com

Chul-Ho Kang
Department of Electronic and Communication Engineering, Kwangwoon University, South Korea
Chikang5136@kw.ac.kr

Abstract— The purpose of this research is to apply a data mining technique, cluster analysis (also known as clustering), to data on the stocks listed in the LQ45 index of the Indonesia Stock Exchange. The method used for the cluster analysis is the k-means algorithm. The data used in this research is taken from the Indonesia Stock Exchange. The cluster analysis uses two characteristics of the data: the volume and the value of the shares traded. The results are presented visually as a plot of the cluster members. The analysis can therefore be used to identify, quickly and efficiently, the cluster to which each member of the LQ45 index belongs, based on the share value and volume of each cluster. Beginner-level investors who have become interested in stock investment can use the results to make informed decisions about trading stocks in the cluster groups they prefer.

Keywords— data mining, cluster analysis, k-means, stocks

I. INTRODUCTION

In recent years, it has become normal for professional stockbrokers to try to extract relationships between different stocks by thoroughly analyzing past trading graphs. In addition, stock investors can use a growing number of stock prediction software systems to help them generate fast stock market forecasts [1].

A stock or share (this study does not distinguish between the two terms) represents an ownership relationship between a company and its shareholders. There are two types of shares: 1) preferred stock and 2) ordinary shares. Preferred stock carries special rights in the company (for example, receiving distributions of corporate profits before other shareholders), whereas ordinary shares carry no rights beyond the general right to a share of the profits according to the profit-sharing schedule set at the Annual General Meeting of Shareholders (AGMS). Ordinary (common) shares, hereinafter referred to simply as shares, have the advantage over shares with special interests that they can be freely transferred to other parties, so they can be traded on a stock market.

The Indonesia Stock Exchange (IDX) is the only stock market in Indonesia. IDX provides a mechanism for selling and buying the shares of publicly listed companies. A limited liability company (PT) is a legal entity for running a business whose capital consists of shares held by its shareholders. A PT Tbk is a limited liability company that also has public company status (go public).

Shares are the main product transacted among capital market instruments. There are several derivatives
that arise from transactions on the stock market. There are two ways to invest in stocks. The first is to buy and hold shares in order to receive a share of the profits (dividends). The second is to buy shares and sell them again in order to benefit from the difference between the sale and purchase values (capital gain). In general, shares can be bought in two ways: at the Initial Public Offering (IPO), when a stock first goes on sale, or on the secondary market, which we know as the stock market.

Before the online era, shares were an investment only for the upper classes. With the growth of online trading, in which transactions are made over the Internet, stocks have increasingly become an investment option for many people. This is supported by minimum initial deposits that are more affordable for most people. Information about shares on the internet has become one of the sources of information for investors and traders. However, it remains an open debate whether the activity of stock message boards on the internet is quickly reflected in the price or value of a stock [2].

Therefore, it is reasonable to link data mining with stock market forecasting, mining historical data from the stock market to help determine a better trading strategy. Given the significant technical challenges and potential advantages, many researchers feel the need to look for comprehensive data mining procedures that can produce accurate, consistent, and reliable forecasting results with potential benefits. Since a stock market index contains many individual stocks and reflects a broader market movement than individual stock movements, the forecasting of stock market indices has attracted the attention of many researchers [3].

The dynamics of financial markets play a major role in the functioning of the stock market, but the best predictive method has always been a topic of ongoing research and discussion. In the past decade, most methods have been based on stock index data, using unclear hybrid models or artificial neural networks to predict its future [4].

II. THEORETICAL BACKGROUND

A. Data Mining

Data mining focuses on analyzing large amounts of data efficiently and extracting important, useful, and hidden information from the data by combining techniques from different areas, such as pattern recognition, decision making, expert systems, knowledge discovery in databases, artificial intelligence, and statistics. The main types of data mining include classification mining, cluster mining, association rule mining, text mining, and image mining [3]. A data mining technique such as clustering can be applied to uncover hidden knowledge in stock data. Clustering is the process of grouping a set of objects into classes of similar objects. A cluster is a collection of data objects that resemble each other and differ from the objects in other clusters [5].

Data exploration is a preliminary investigation of the data to understand its main characteristics and decide on the best approach for extracting meaningful information. The main purpose is to help determine the most appropriate pre-processing and data analysis techniques [6].

To correct errors at the pre-processing stage, the news reports are cleaned, integrated, transformed, and reduced. This involves filling in missing values, combining reports by relevance, and consolidating the data by replacing the original information using the news aggregator. Once pre-processed, the data is stored in the data repository, which therefore contains cleaned data [7].

Figure 1. Data mining and knowledge discovery process of databases [11]

B. Cluster Analysis or Clustering

Data source clustering is another form of data pre-processing for mining multiple data sources. In contrast to the classification of data sources, it is a type of undirected learning. In other words, clustering groups the data according to the similarity between data items without knowing their classes in advance [8].

The work in [9] contributes to the literature by proposing a time series clustering model that (1) is more accurate than conventional approaches, (2) is scalable to large datasets because it uses multi-resolution time series at different levels of clustering, and (3) overcomes the limitations of conventional clustering algorithms in finding clusters of time series that are similar in shape. This feature is very advantageous for assessing co-movement in the stock market [9].

Cluster analysis offers a useful way to organize and present a complex dataset [6]. It can be regarded as the most popular and foremost technique for problems of undirected, or unsupervised, learning. Any technique used to solve such problems must find a way of dealing with the structure of data that has not been labeled [10].

As one of the most prominent grouping techniques in science and technology, the k-means clustering algorithm is used in this study. It is used to group the stock data collected from the Indonesia Stock Exchange [4].

Many methods have been proposed in the past to group similar data elements. K-means clustering (KM) is the most popular. It divides the data into disjoint groups using the Euclidean distance between the data elements and their corresponding cluster centers [10].

The k-means clustering algorithm then iterates to improve the separation distances or similarities
within the clusters [11]. In each iteration, the algorithm calculates a new mean for each cluster from the objects that were assigned to that cluster in the previous iteration. All objects are then regrouped using the updated cluster centers. The iteration continues until the grouping is stable, which means that the clusters formed in the last iteration are the same as the clusters formed in the previous iteration [11].

III. RESEARCH DESIGN AND METHOD

The cluster analysis in this paper divides the data into four parts (clusters) based on two attributes, transaction volume and transaction value, for 45 blue chip stocks on the Indonesia Stock Exchange.

The data in this research is taken from the Indonesia Stock Exchange (http://www.idx.co.id/id-id/beranda/publikasi/lq45.aspx). The data used in this study is the latest update as of November 5, 2017.

Figure 2. The Proposed Model

Figure 5 plots the results of the cluster analysis of the 45 blue chip stocks on the Indonesia Stock Exchange for the transactions of 6 November 2017.

The Euclidean distance d(i, j) between objects i and j, used as the similarity measure, also satisfies the following mathematical properties [11]:
• Non-negativity: d(i, j) ≥ 0. A distance cannot be negative.
• Identity of indiscernibles: d(i, i) = 0. The distance from an object to itself is 0.
• Symmetry: d(i, j) = d(j, i). Distance is a symmetric function.
• Triangle inequality: d(i, j) ≤ d(i, k) + d(k, j). The direct distance from object i to object j cannot be greater than the distance of a detour through any other object k.
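The four distance properties listed above can be checked numerically. The following is a minimal Python sketch, not part of the original study: the function names and the three sample (volume, value) points are hypothetical and chosen only for illustration.

```python
import math
from itertools import product


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def satisfies_metric_properties(pts, tol=1e-12):
    """Check the four distance properties for every triple of points."""
    for i, j, k in product(pts, repeat=3):
        if euclidean(i, j) < 0:                      # non-negativity
            return False
        if euclidean(i, i) != 0:                     # identity: d(i, i) = 0
            return False
        if abs(euclidean(i, j) - euclidean(j, i)) > tol:   # symmetry
            return False
        if euclidean(i, j) > euclidean(i, k) + euclidean(k, j) + tol:  # triangle
            return False
    return True


# Hypothetical (volume, value) points; NOT taken from the LQ45 data.
points = [(1.0, 2.0), (4.0, 6.0), (0.5, 9.0)]

print(euclidean(points[0], points[1]))      # 5.0, since sqrt(3^2 + 4^2) = 5
print(satisfies_metric_properties(points))  # True
```

A check like this does not prove the properties in general; it only confirms them on the sample points.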

Figure 4 shows the proposed data mining model, which applies the k-means clustering algorithm. The database, based on the general stock list, consists of 500 listings. We use the subset of stocks in the LQ45 list, with two parameters: volume and value.
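The k-means procedure that this model applies can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the RapidMiner process used in the study: the function names are hypothetical, the eight (volume, value) points are made up, and the first k points are taken as the initial centers so that the run is deterministic.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, max_iter=100):
    """Plain k-means; assumes the first k points as initial centers."""
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object joins the cluster with the nearest center.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centers[c]))
            for p in points
        ]
        # Stop when the grouping is the same as in the previous iteration.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center becomes the mean of its cluster's members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical (volume, value) pairs; NOT taken from the LQ45 data.
stocks = [(1.0, 1.0), (10.0, 1.0), (1.0, 10.0), (10.0, 10.0),
          (2.0, 1.0), (11.0, 2.0), (2.0, 11.0), (11.0, 11.0)]

labels, centers = k_means(stocks, k=4)
print(labels)   # [0, 1, 2, 3, 0, 1, 2, 3]
print(centers)  # [(1.5, 1.0), (10.5, 1.5), (1.5, 10.5), (10.5, 10.5)]
```

The stopping rule mirrors the description in the text: iteration ends once the clusters no longer change. With a different choice of initial centers, k-means can converge to a different grouping.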

IV. RESULT AND DISCUSSION

In this cluster analysis with k-means, the software used for the mining process is RapidMiner Studio. At the pre-processing stage, three attributes are defined for the cluster analysis: 1) the stock code, 2) the transaction volume, and 3) the transaction value of the stock. The stock code serves as the identifier attribute, the volume attribute describes the number of shares traded, and the value attribute describes the total value of the transactions.

The Euclidean distance is used as the distance measure in the cluster analysis in this study, to express the similarity and dissimilarity between data objects. Let $i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ and $j = (x_{j1}, x_{j2}, \ldots, x_{jp})$ be two objects described by $p$ numerical attributes. The Euclidean distance between the objects is [11]:

\[ d(i, j) = \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2 + \cdots + (x_{ip} - x_{jp})^2} \]

Figure 5. Plots of the cluster analysis results for the 45 blue chip stocks

The plot shows volume against value. Cluster 0 has 18 objects, cluster 1 has 8 objects, cluster 2 has 11 objects, and cluster 3 has 8 objects. Cluster 0 is dominant in the number of member objects compared with the other clusters. In addition, its objects lie much more densely together. Based on these two findings, it can be concluded that the LQ45 stocks most sought after by investors are those that combine low transaction value with low transaction volume. Like many other studies, this study is not a perfect cluster analysis of the 45 blue chip stocks on the Indonesia Stock Exchange. The researchers identified some potential weaknesses. One is the need for studies that assess the accuracy of the cluster analysis by
comparing experiments that use different numbers of clusters. Another is the need for comparative studies of accuracy using other cluster analysis algorithms, such as k-medoids.

V. CONCLUSION

This cluster analysis study can quickly and efficiently give potential novice investors information on the distribution map of the 45 blue chip stocks on the Indonesia Stock Exchange.

The cluster analysis of the 45 blue chip stocks on the Indonesia Stock Exchange provides useful information at a glance: a visual map of the stocks divided into four parts according to the stock price and share transaction value attributes. This gives stock investors quick and accurate information to support their decisions.

REFERENCES

[1] Y. Luo, J. Hu, X. Wei, D. Fang, and H. Shao, “Stock trends prediction based on hypergraph modeling clustering algorithm,” in 2014 IEEE International Conference on Progress in Informatics and Computing, 2014, pp. 27–31.
[2] H. Leung and T. Ton, “The impact of internet stock message boards on cross-sectional returns of small-capitalization stocks,” J. Bank. Financ., vol. 55, no. December 1997, pp. 37–55, 2015.
[3] X. Zhong and D. Enke, “A comprehensive cluster and classification mining procedure for daily stock market return forecasting,” Neurocomputing, vol. 267, pp. 152–168, Dec. 2017.
[4] E. N. Desokey, A. Badr, and A. F. Hegazy, “Enhancing stock prediction clustering using K-means with genetic algorithm,” in 2017 13th International Computer Engineering Conference (ICENCO), 2017, pp. 256–261.
[5] R. Asif, A. Merceron, S. A. Ali, and N. G. Haider, “Analyzing undergraduate students’ performance using educational data mining,” Comput. Educ., vol. 113, pp. 177–194, Oct. 2017.
[6] M. S. Packianather, A. Davies, S. Harraden, S. Soman, and J. White, “Data Mining Techniques Applied to a Manufacturing SME,” Procedia CIRP, vol. 62, pp. 123–128, 2017.
[7] R. Mythily, A. Banu, and S. Raghunathan, “Clustering Models for Data Stream Mining,” Procedia Comput. Sci., vol. 46, no. Icict 2014, pp. 619–626, 2015.
[8] R. Wang et al., “Review on mining data from multiple data sources,” Pattern Recognit. Lett., vol. 0, pp. 1–9, Jan. 2018.
[9] S. Aghabozorgi and Y. W. Teh, “Stock market co-movement assessment using a three-phase clustering method,” Expert Syst. Appl., vol. 41, no. 4 PART 1, pp. 1301–1314, 2014.
[10] V. Vijay, V. P. Raghunath, A. Singh, and S. N. Omkar, “Variance Based Moving K-Means Algorithm,” in 2017 IEEE 7th International Advance Computing Conference (IACC), 2017, no. i, pp. 841–847.
[11] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 4th ed. San Francisco: Morgan Kaufmann Publishers, 2012.
