A. Raharto Condrobimo 1,2
1 Computer Science Department, BINUS Graduate Program - Doctor of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
2 Information System Department, School of Information System, Bina Nusantara University, Jakarta 11480, Indonesia
acondrobimo@binus.edu

Bahtiar Saleh Abbas
Computer Science Department, BINUS Graduate Program - Doctor of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
bahtiars@binus.edu

Agung Trisetyarso
Computer Science Department, BINUS Graduate Program - Doctor of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
atrisetyarso@binus.edu

Wayan Suparta
Civil Engineering Department, University of Technology Yogyakarta, Yogyakarta, Indonesia
drwaynesparta@gmail.com

Chul-Ho Kang
Department of Electronic and Communication Engineering, Kwangwoon University, South Korea
Chikang5136@kw.ac.kr
Abstract— The purpose of this research is to apply a data mining technique, cluster analysis (also known as clustering), to data on the stocks listed in the LQ45 index of the Indonesia Stock Exchange. The clustering method used is the k-means algorithm. The data are taken from the Indonesia Stock Exchange, and the analysis uses two characteristics of each stock: the volume and the value of shares traded. The results are presented visually as a plot of cluster members. The analysis can therefore be used to identify quickly and efficiently which cluster each member of the LQ45 index belongs to, based on share value and volume. The results can be used by beginner-level investors who have become interested in stock investment, to help them make informed decisions about trading stocks in the cluster groups they are interested in.

Keywords— data mining, cluster analysis, k-means, stocks

I. INTRODUCTION

In recent years, it has become normal for professional stockbrokers to try to extract relationships between different stocks by thoroughly analyzing past trading graphs. In addition, a growing number of stock prediction software systems are available to help stock investors generate fast stock market forecasts [1].

Stocks or shares (this study does not distinguish between the two terms) represent the ownership relationship between a company and its shareholders. There are two types of shares: 1) preferred stock and 2) ordinary shares. Preferred stock carries special rights in the company (for example, receiving a distribution of corporate profits before other shareholders), whereas ordinary shares carry no rights beyond the general right to a share of profits according to the profit-sharing schedule set at the Annual General Meeting of Shareholders (AGMS). Ordinary shares (hereinafter referred to as shares) have the advantage over preferred stock that they can be freely transferred to other parties, so they can be traded on a stock market.

The Indonesia Stock Exchange (IDX) is the only stock market in Indonesia. IDX provides a mechanism for buying and selling the shares of the publicly listed companies on the exchange.

A Limited Liability Company (PT) is a legal entity for running a business whose capital consists of shares held by its shareholders. A PT Tbk is a limited liability company that also has public company status (Go Public).

Shares are the main instrument transacted in the capital market, and several derivatives arise from stock market transactions. There are two ways to invest in stocks. The first is to buy and hold shares in order to receive a distribution of profits (dividends). The second is to buy shares and sell them again in order to benefit from the difference between the purchase and sale values (capital gain). Shares can generally be bought in two ways: at the Initial Public Offering (IPO), when a company first offers its stock, or through the secondary market, which is what we know as the stock market.

Before the online era, shares were an investment only for the upper classes. Since the rise of online trading, in which transactions are made over the Internet, stocks have increasingly become an investment option for many people. This is supported by a minimum initial deposit that is affordable for most people. Information about shares on the Internet has become one of the sources of information for investors and traders. However, it remains an open debate whether activity on Internet stock message boards is quickly reflected in the price or value of a stock [2].

Therefore, it is reasonable to link data mining with stock market forecasting, mining historical stock market data to help determine a better trading strategy. Given the significant technical challenges and potential advantages, many researchers have looked for comprehensive data mining procedures that can produce accurate, consistent, and reliable forecasting results with potential benefits. Since a stock market index contains many individual stocks and reflects a broader market movement than individual stock movements, forecasting of the stock market index has attracted the attention of many researchers [3].

The dynamics of financial markets play a major role in the functioning of the stock market, but the best predictive method has always been a topic of ongoing research and discussion. In the past decade, most methods have been based on stock or index data, using fuzzy hybrid models or artificial neural networks to predict future values [4].

II. THEORETICAL BACKGROUND

A. Data Mining

Data mining focuses on analyzing large amounts of data efficiently and extracting important, useful, and hidden information from the data by combining techniques from different areas, such as pattern recognition, decision making, expert systems, knowledge discovery in databases, artificial intelligence, and statistics. The main types of data mining include classification mining, cluster mining, association rule mining, text mining, and image mining [3].

A data mining technique such as clustering can be applied to uncover hidden knowledge in stock data. Clustering is the process of grouping a set of objects into classes of similar objects. A cluster is a collection of data objects that resemble each other within the same cluster and differ from the objects in other clusters [5].

Data exploration is a preliminary investigation of the data to understand its main characteristics and to decide on the best approach for extracting meaningful information. Its main purpose is to help determine the most appropriate pre-processing and data analysis techniques [6].

To reduce errors in the data, pre-processing applies cleaning, integration, transformation, and reduction to the news reports. This involves filling in missing values, combining reports by relevance, and consolidating the data by replacing the original information using the news aggregator. Once pre-processed, the data are stored in the data repository, which therefore contains data that have been cleaned [7].

Figure 1. Data mining and the process of knowledge discovery in databases [11]

B. Cluster Analysis Or Clustering

Clustering of data sources is another pre-processing step for mining multiple data sources. In contrast to the classification of data sources, it is a type of unsupervised learning. In other words, data clustering groups data according to the similarity between them, without knowing their classes in advance [8].

Aghabozorgi and Teh [9] contribute to the literature by proposing a time series clustering model that (1) is more accurate than conventional approaches, (2) is scalable (on large datasets) owing to the use of multi-resolution time series at various levels of clustering, and (3) can overcome the limitations of conventional clustering algorithms in finding clusters of time series that are similar in shape. This feature is very advantageous for assessing co-movement in the stock market [9].

Cluster analysis offers a useful way to organize and present a complex dataset [6]. It can be regarded as the most popular and foremost technique for solving unsupervised (undirected) learning problems. Every technique used for such problems must therefore find a way of dealing with the structure of data that have not been labeled [10].

As one of the most prominent grouping techniques in science and technology, the k-means clustering algorithm is used in this study. It is used to group the data collected from stock data on the Indonesia Stock Exchange [4].

A rich set of methods has been proposed in the past to group similar data elements. K-means (KM) clustering is the most popular method; it divides the data into disjoint groups using the Euclidean distance between the data elements and the corresponding cluster centers [10].

The k-means algorithm then iterates to improve the similarity within each cluster [11]. In each iteration, it calculates a new mean for each cluster from the objects that were assigned to that cluster in the previous iteration. All objects are then regrouped using the updated cluster centers. The iteration continues until the grouping is stable, which means that the clusters formed in the last iteration are the same as the clusters formed in the previous iteration [11].

III. RESEARCH DESIGN AND METHOD

The cluster analysis in this paper divides the data into four clusters and is applied to two attributes, transaction volume and transaction value, of 45 blue chip stocks on the Indonesia Stock Exchange.

The data in this research are taken from the Indonesia Stock Exchange (http://www.idx.co.id/id-id/beranda/publikasi/lq45.aspx). The data used in this study are the latest update as of November 5, 2017. The plots presented in Section IV are the results of the cluster analysis of the 45 blue chip stocks on the Indonesia Stock Exchange for the transactions of 6 November 2017.

Figure 2. The Proposed Model

The Euclidean distance used as the similarity measure satisfies the following mathematical properties [11]:
Non-negativity: d(i, j) ≥ 0. A distance cannot be negative.
Identity of indiscernibles: d(i, i) = 0. The distance from an object to itself is 0.
Symmetry: d(i, j) = d(j, i). Distance is a symmetric function.
Triangle inequality: d(i, j) ≤ d(i, k) + d(k, j). The distance from object i directly to object j cannot be greater than the distance of a detour through any other object k.
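The four distance properties listed above can be checked numerically. The following is a minimal sketch, not part of the original study: the function names `euclidean` and `satisfies_metric_properties` and the three sample points are illustrative assumptions, and the points are hypothetical (volume, value) pairs rather than IDX data.

```python
import math
from itertools import product


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def satisfies_metric_properties(points, tol=1e-12):
    """Check the four properties on every triple (i, j, k) of the given points."""
    for i, j, k in product(points, repeat=3):
        if euclidean(i, j) < 0:                                   # non-negativity
            return False
        if euclidean(i, i) != 0:                                  # d(i, i) = 0
            return False
        if abs(euclidean(i, j) - euclidean(j, i)) > tol:          # symmetry
            return False
        if euclidean(i, j) > euclidean(i, k) + euclidean(k, j) + tol:  # triangle inequality
            return False
    return True


# Hypothetical (volume, value) points, chosen only for illustration.
points = [(1.0, 2.0), (4.0, 6.0), (0.0, 0.0)]
print(euclidean(points[0], points[1]))
print(satisfies_metric_properties(points))
```

A check on a finite sample such as this does not prove the properties in general; it only illustrates what each of them states.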
Figure 4 shows the proposed data mining model, which applies the K-means clustering algorithm. The data are based on the general stock list, which consists of 500 listed stocks. From this list we select the stocks in the LQ45 list and use two parameters: volume and value.
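To make the model concrete, the k-means procedure applied to the two parameters (volume and value) can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool or configuration used in this study: the names `k_means`, `euclidean`, and `stocks` are hypothetical, the six data points are invented rather than taken from the LQ45 list, and the sketch uses two clusters only because the toy dataset is so small.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, seed=0, max_iter=100):
    """Basic k-means on a list of numeric tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: start from k randomly chosen input points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object joins the cluster with the nearest center.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # stable grouping: same clusters as previous iteration
            break
        labels = new_labels
        # Update step: each center becomes the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, labels


# Hypothetical (volume, value) pairs on an arbitrary scale, not IDX data.
stocks = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 9.0), (8.2, 9.1), (7.9, 8.8)]
centers, labels = k_means(stocks, k=2)
print(labels)
print(sorted(centers))
```

In practice the two attributes may differ in scale by several orders of magnitude, so rescaling them before computing distances is a common design choice; whether that was done in this study is not stated.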
IV. RESULT AND DISCUSSION
The cluster analysis with K-means in this paper is carried out with the RapidMiner Studio software. At the pre-processing stage, three attributes are determined for the cluster analysis: 1) the stock code, 2) the transaction volume, and 3) the transaction value. The stock code serves as the identifier attribute, the volume attribute describes the number of shares traded, and the value attribute describes the total value of the transactions.

The Euclidean distance is the measure used in the cluster analysis in this study to quantify the similarity and dissimilarity between data objects. Let i = (x_{i1}, x_{i2}, ..., x_{ip}) and j = (x_{j1}, x_{j2}, ..., x_{jp}) be two objects described by p numeric attributes. The Euclidean distance between the two objects is then [11]:

$d(i, j) = \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2 + \cdots + (x_{ip} - x_{jp})^2}$

Figure 5. Plot of the cluster analysis results for the 45 blue chip stocks

The plot of volume against value shows that cluster 0 has 18 objects, cluster 1 has 8 objects, cluster 2 has 11 objects, and cluster 3 has 8 objects. Cluster 0 is dominant in its number of members compared with the other clusters. In addition, the objects in cluster 0 lie most densely together. Based on these two findings, it can be concluded that the LQ45 stocks most sought after by investors are those that combine low transaction value with low transaction volume.

As is the case with many other studies, this cluster analysis of 45 blue chip stocks on the Indonesia Stock Exchange is not a perfect study.
The researchers identified some potential weaknesses in this study. One is the need to compare the accuracy of the cluster analysis across experiments that use different numbers of clusters. Another is the need for comparative studies of accuracy using other cluster analysis algorithms, such as k-medoids.

V. CONCLUSION

This cluster analysis study can provide information quickly and efficiently to potential novice investors on the distribution map of the 45 blue chip stocks on the Indonesia Stock Exchange. The cluster analysis presents this map visually, dividing the 45 blue chip stocks into four parts according to the stock price attributes and share transaction value, so that it provides quick and accurate information to support the decisions of stock investors.

REFERENCES

[1] Y. Luo, J. Hu, X. Wei, D. Fang, and H. Shao, “Stock trends prediction based on hypergraph modeling clustering algorithm,” in 2014 IEEE International Conference on Progress in Informatics and Computing, 2014, pp. 27–31.
[2] H. Leung and T. Ton, “The impact of internet stock message boards on cross-sectional returns of small-capitalization stocks,” J. Bank. Financ., vol. 55, no. December 1997, pp. 37–55, 2015.
[3] X. Zhong and D. Enke, “A comprehensive cluster and classification mining procedure for daily stock market return forecasting,” Neurocomputing, vol. 267, pp. 152–168, Dec. 2017.
[4] E. N. Desokey, A. Badr, and A. F. Hegazy, “Enhancing stock prediction clustering using K-means with genetic algorithm,” in 2017 13th International Computer Engineering Conference (ICENCO), 2017, pp. 256–261.
[5] R. Asif, A. Merceron, S. A. Ali, and N. G. Haider, “Analyzing undergraduate students’ performance using educational data mining,” Comput. Educ., vol. 113, pp. 177–194, Oct. 2017.
[6] M. S. Packianather, A. Davies, S. Harraden, S. Soman, and J. White, “Data Mining Techniques Applied to a Manufacturing SME,” Procedia CIRP, vol. 62, pp. 123–128, 2017.
[7] R. Mythily, A. Banu, and S. Raghunathan, “Clustering Models for Data Stream Mining,” Procedia Comput. Sci., vol. 46, no. Icict 2014, pp. 619–626, 2015.
[8] R. Wang et al., “Review on mining data from multiple data sources,” Pattern Recognit. Lett., vol. 0, pp. 1–9, Jan. 2018.
[9] S. Aghabozorgi and Y. W. Teh, “Stock market co-movement assessment using a three-phase clustering method,” Expert Syst. Appl., vol. 41, no. 4 PART 1, pp. 1301–1314, 2014.
[10] V. Vijay, V. P. Raghunath, A. Singh, and S. N. Omkar, “Variance Based Moving K-Means Algorithm,” in 2017 IEEE 7th International Advance Computing Conference (IACC), 2017, no. i, pp. 841–847.
[11] J. Han and M. Kamber, “Data Mining: Concepts and Techniques,” 4th ed. San Francisco: Morgan Kaufmann Publishers, 2012.