
International Journal of Engineering Associates (2320 0804) / #27 / Volume 2 Issue 6

Implementation Of C-Trend For Commercial Application


Prof. Nilesh Shah#1, Latika Chaudhary#2, Aditya Rajmane#3, Neha Patil#4, Komal Khatke#5
Department of Computers, Padmabhushan Vasantdada Patil Pratishthan's College of Engineering, Mumbai
latikac92@gmail.com (8879309384), adi_rajmane@rediffmail.com (9768975739), neha_patil92@yahoo.co.in (9870669255), komalkhatke@gmail.com (9664683567)

Abstract: Data mining has made broad and significant progress since its early beginnings. Today it is used in a vast array of areas, and numerous commercial data mining systems and research prototypes are available to choose from. When selecting a data mining product that is appropriate for one's task, it is important to consider the features of data mining systems from a multidimensional point of view. Researchers have been striving to build theoretical foundations for data mining, and various clustering techniques have been used for identifying and visualizing trends in multi-attribute transactional data. This paper introduces Cluster-based Temporal Representation of EveNt Data (C-TREND), a system that implements the temporal cluster graph construct, which maps multi-attribute temporal data to a two-dimensional directed graph that identifies trends in dominant data types over time.

Keywords: Clustering, Data Visualization, Trend Analysis, Multiattribute Data

I. INTRODUCTION
Since the inception of information storage, the ability to sift through and analyze huge amounts of information has been a goal pursued in many different ways. With the advent of electronic and magnetic data storage, relational databases emerged as one of the most efficient and widely used methods of storing data. Data stored in such large databases is not always comprehensible to humans; it needs to be filtered and analyzed first. Stored records are raw data that is poor in information: not only is it large and seemingly irrelevant, it is also continuously increasing, updating, and changing [8]. Organizations and firms are increasingly capturing transactional data that contains multiple attributes and some measure of time. For example, through their websites, e-commerce firms capture the clickstream and purchasing behavior of their customers [2]. Real-life transactional data often poses challenges such as very large size and high dimensionality.

Consider the problem of technology forecasting for a firm. Technologies possess many features that change over time, and understanding how a technology evolves requires trend analysis of multiple attributes at once. Similar issues arise in the trend analysis of consumer purchasing behavior and in many other business intelligence applications [3]. Business intelligence applications help firms gather and analyze information about their performance, customers, competitors, and business environment. Knowledge representation and data visualization tools constitute one form of business intelligence technique that presents information to users in a manner that supports business decision-making processes [5].

Identifying temporal relationships or trends in data constitutes an important problem that is relevant in many business and academic settings, and the data mining literature has provided analytical techniques for some specialized types of temporal data, such as time series analysis and sequence analysis techniques. However, temporal data can take many forms, the most common being general multi-attribute transactional data with a timestamp, for which time series or sequence analysis methods are not particularly well suited. The ability to identify trends in such general temporal data can provide significant benefits, such as a competitive advantage to a firm performing forecasts or making decisions on future investments and strategies. In this paper, we present C-TREND (Cluster-based Temporal Representation of EveNt Data), a new method for discovering and visualizing trends and temporal patterns in transactional attribute-value data that builds upon standard data mining clustering techniques.

II. RELATED WORK
Data mining and visualization are knowledge discovery tools used for the automated analysis of data stored in large data sets. Data mining is the analysis step of knowledge discovery in databases (KDD). Identifying and visualizing temporal relationships (e.g., trends) in data constitutes an important problem that is relevant in many business, scientific, and academic settings [5]. Large data sets cannot possibly be analyzed manually; mining tools and visualization provide automated means to comprehend them. In this section, we provide a brief review of related research in the temporal data mining and visualization streams.

2013 IJEA. ALL RIGHTS RESERVED


Fig 1. What is Data Mining?

1. Temporal Data Mining
Temporal data mining is a single step in the process of knowledge discovery in temporal databases that enumerates structures (temporal patterns or models) over the temporal data. It is concerned with the analysis of temporal data and with finding temporal patterns and regularities in sets of temporal data. Temporal data mining techniques also allow for computer-driven, automatic exploration of the data.

2. Clustering
In data mining, clustering groups data records so that records in the same cluster are more similar to one another than to records in other clusters. This should not be confused with a computer cluster, which is a group of linked computers, commonly connected through fast local area networks, that work together closely so that in many respects they form a single computer; such clusters are usually deployed to improve performance and availability over that provided by a single computer, while typically being much more cost-effective than single computers of comparable speed or availability.

3. Trend Analysis
The term "trend analysis" refers to the concept of collecting information and attempting to spot a pattern, or trend, in the information. In some fields of study, the term has more formally defined meanings. The data mining technique considered here uses clusters identified in multiple time periods and identifies trends based on similarities between clusters over time. It is a clustering approach for discovering temporal patterns, which builds on temporal clustering methods. For time-series data, trend analysis decomposes the data into trend movements, cyclic movements, seasonal movements, and irregular movements.

4. Data Visualization
Visual data mining integrates data mining and data visualization to discover implicit and useful knowledge from large data sets. It includes data visualization, data mining results visualization, data mining process visualization, and interactive visual data mining. Data visualization is the process of presenting data in some visual form and allowing the user to interact with the data.

i. Hierarchical Clustering: As a visualization technique, hierarchical clustering partitions all dimensions into subsets, which are then visualized in a hierarchical manner.
ii. Dendrogram: A dendrogram is a tree-structured graph, used for example in heat maps, that visualizes the result of a hierarchical clustering calculation. The result of a clustering is presented either as the distance or as the similarity between the clustered rows or columns, depending on the selected distance measure.
iii. Temporal cluster graph: To construct the temporal cluster graph, the transactional data set is first partitioned with respect to time, and a clustering technique is then applied to each partition. The temporal cluster graph is a directed graph that consists of a set of nodes and directed edges.

Fig 2.Reducing multi-attribute temporal complexity by partitioning data into time periods and producing a temporal cluster graph
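The temporal cluster graph construct can be illustrated with a minimal Python sketch. It assumes the clustering of each time partition has already been done and that only the cluster centers are available. The function name `build_temporal_cluster_graph`, the `(period, cluster)` node identifiers, and the use of Euclidean distance between centers as the edge weight are illustrative assumptions, not details given in this paper.

```python
import math
from typing import Dict, List, Tuple

Center = Tuple[float, ...]
NodeId = Tuple[int, int]  # (time period index, cluster index within that period)


def build_temporal_cluster_graph(
    centers_by_period: List[List[Center]],
) -> Tuple[List[NodeId], Dict[Tuple[NodeId, NodeId], float]]:
    """Build nodes and directed edges between clusters of adjacent time periods.

    Assumption: the edge weight is the Euclidean distance between cluster centers.
    """
    # One node per cluster, ordered by time period.
    nodes = [
        (t, j)
        for t, centers in enumerate(centers_by_period)
        for j in range(len(centers))
    ]
    # One directed edge from every cluster in period t to every cluster in period t + 1.
    edges: Dict[Tuple[NodeId, NodeId], float] = {}
    for t in range(len(centers_by_period) - 1):
        for j, a in enumerate(centers_by_period[t]):
            for l, b in enumerate(centers_by_period[t + 1]):
                edges[((t, j), (t + 1, l))] = math.dist(a, b)
    return nodes, edges


# Hypothetical example: two clusters in period 0, one cluster in period 1.
example_centers = [[(0.0, 0.0), (10.0, 0.0)], [(3.0, 4.0)]]
graph_nodes, graph_edges = build_temporal_cluster_graph(example_centers)
```

In this sketch every pair of clusters in adjacent periods gets a candidate edge; which edges are actually drawn would be decided later by filtering on the edge weight.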

III. THE C-TREND TECHNIQUE

1. C-TREND Overview
C-TREND is designed to work with what we term transactional attribute-value data. Specifically, transactional attribute-value data is a general form of temporal data that consists of a collection of records, each with a time stamp and described by a set of attributes. Examples of this type of data include shopping cart data with a sale timestamp and numerical values representing the number of products purchased in certain categories, as well as product description data that includes a release date and a set of indicators representing the presence and/or quantity of specific product features.
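As a rough illustration of transactional attribute-value data, the following Python sketch represents each record as a timestamp plus a tuple of numeric attributes and groups the records into yearly partitions. The `Record` class, the `partition_by_year` helper, and the sample values are hypothetical; a one-year period is only one possible granularity.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    timestamp: date                 # e.g., sale date or product release date
    attributes: Tuple[float, ...]   # e.g., quantities purchased per category


def partition_by_year(records: List[Record]) -> Dict[int, List[Record]]:
    """Group records into time partitions, one partition per calendar year."""
    partitions: Dict[int, List[Record]] = defaultdict(list)
    for record in records:
        partitions[record.timestamp.year].append(record)
    # Return the partitions in chronological order.
    return dict(sorted(partitions.items()))


# Hypothetical shopping cart records: (books, music, electronics) quantities.
sample_records = [
    Record(date(2012, 3, 14), (2.0, 0.0, 1.0)),
    Record(date(2011, 7, 2), (0.0, 3.0, 0.0)),
    Record(date(2012, 11, 30), (1.0, 1.0, 0.0)),
]
sample_partitions = partition_by_year(sample_records)
```

Each resulting partition would then be clustered separately during preprocessing.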


Fig 3. Overview of C-TREND Process

C-TREND is the system implementation of the temporal cluster-graph-based trend identification and visualization technique; it provides an end user with the ability to generate graphs from data and adjust the graph parameters. C-TREND consists of two main phases: 1) offline preprocessing of the data and 2) online interactive analysis and graph rendering. In the preprocessing phase, the data set is partitioned based on time periods, and each partition is clustered using one of many traditional clustering techniques, such as a hierarchical approach. The results of the clustering for each partition are used to generate two data structures: the node list and the edge list.

Fig 4. The C-TREND Process

Creating these lists in the preprocessing phase allows for more effective (real-time) visualization updates of the C-TREND output graphs. Based on these data structures, graph entities (nodes and edges) are generated and rendered as a temporal cluster graph in the system output window. In the interactive analysis, C-TREND allows the user to modify k_i (i = 1, ..., t), α, and β on demand in real time and, as a result, update the view of the temporal cluster graph.

Note that in this initial implementation, the time partition size is set exogenously by the user and stays constant throughout preprocessing and online interactive analysis. We followed this approach because of the domain-specific nature of time granularity. For example, for analyzing technology evolution, the desired time granularity could be a year, whereas financial market trend analysis may require a much smaller time window. For this reason, we decided to rely on the domain expert to specify the most appropriate time granularity for a given application. However, an important future extension of this research would be to provide the ability to adjust the time granularity interactively in real time.

2. Data Preprocessing
An important requirement for real-time graph customization in C-TREND is the precomputation of multiple clustering solutions from the initial data set. Depending on the type of clustering algorithm employed, the cluster solution can be stored in a way that maximizes the efficiency of the output graph customization. C-TREND can be implemented with multiple different standard clustering algorithms (e.g., agglomerative or divisive hierarchical clustering, or partition-based clustering). C-TREND utilizes optimized dendrogram data structures for storing and extracting cluster solutions generated by hierarchical clustering algorithms. It produces a dendrogram for each data partition and utilizes a global input value N that represents the maximum-sized cluster solution maintained for each data partition. For all practical purposes, a useful solution will consist of a set of N << n clusters (n is the number of data points in partition i) and, therefore, C-TREND has to store only 2N - 1 nodes per partition. We have found that maintaining a maximum solution size of N = 50 clusters is more than sufficient for many practical applications of data visualization. The value of N can also be set by the user before the preprocessing phase.

Fig 5. Dendrogram example (n = 10)

A dendrogram data structure allows for quick extraction of any specific clustering solution for each data partition when the user changes partition zoom level k_i. To obtain a specific clustering solution from the data structure for data partition D_i, C-TREND uses the DENDRO_EXTRACT algorithm (Algorithm 1), which takes the desired number of clusters in the solution k_i as an input and returns the set CurrCl containing the clusters corresponding to the k_i-sized solution. Cluster attributes such as center and size are then accessible from the corresponding dendrogram data structure by referencing the clusters in CurrCl.
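The extraction step can be sketched in Python as follows. This is a minimal illustration, not the authors' Algorithm 1: it assumes the dendrogram nodes are numbered in merge order so that the root carries the highest number, which is what makes the highest-numbered cluster in the current set equal to root - |CurrCl| + 1. The `DendroNode` class, the dictionary layout, and the four-leaf example are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class DendroNode:
    left: Optional[int]   # number of the left child, None for a leaf
    right: Optional[int]  # number of the right child, None for a leaf
    size: int             # number of data points in the cluster


def dendro_extract(nodes: Dict[int, DendroNode], root: int, k: int) -> Set[int]:
    """Return the node numbers that form the k-cluster solution."""
    if k < 1:
        raise ValueError("k must be at least 1")
    curr_cl = {root}
    while len(curr_cl) < k:
        # With merge-order numbering, the highest-numbered cluster in the
        # current set is always root - |curr_cl| + 1.
        max_cl = root - len(curr_cl) + 1
        node = nodes[max_cl]
        if node.left is None or node.right is None:
            raise ValueError("k exceeds the largest stored solution")
        # Replace the cluster by its two children to reach the next level.
        curr_cl.remove(max_cl)
        curr_cl.update((node.left, node.right))
    return curr_cl


# Hypothetical dendrogram: leaves 1..4, merges 5 = (1, 2), 6 = (3, 4), 7 = (5, 6).
example_nodes = {
    1: DendroNode(None, None, 1),
    2: DendroNode(None, None, 1),
    3: DendroNode(None, None, 1),
    4: DendroNode(None, None, 1),
    5: DendroNode(1, 2, 2),
    6: DendroNode(3, 4, 2),
    7: DendroNode(5, 6, 4),
}
three_cluster_solution = dendro_extract(example_nodes, root=7, k=3)
```

Each loop iteration performs one split, so producing a k-cluster solution takes k - 1 iterations, which matches the linear behavior claimed for the extraction.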

DENDRO_EXTRACT starts at the root of the dendrogram and traverses the dendrogram by splitting the highest-numbered node in the current set of clusters until k clusters are included in the set. MaxCl represents the highest element in the current cluster set CurrCl. Because of the specific dendrogram structure, it is always the case that MaxCl = DendrogramRoot_i - |CurrCl| + 1. Furthermore, the dendrogram data array maintains the successive levels of the hierarchical solution in order; therefore, replacing MaxCl by its children MaxCl.Left (left child) and MaxCl.Right (right child) is sufficient for identifying the next solution level in the dendrogram. DENDRO_EXTRACT is linear in time complexity, O(k_i), which provides for the real-time extraction of cluster solutions.

3. Interactive Data Analysis
C-TREND utilizes a series of validation flags to maintain and update the displayed state of the output trend graph. Combinations of the validation flags are used to determine whether or not each possible edge and node should be displayed in the graph, and as these flags change, the displayed components of the graph also change. Each cluster in the node list possesses two flags: k_i-pass and α-pass. These flags indicate whether the cluster should be included in the output graph based on the k_i value and the α value, respectively. Specifically, when k_i is changed, the dendrogram data structure is updated so that only the clusters that should be extracted for the clustering solution of size k_i have a valid k-pass flag. Similarly, when α is changed, the dendrogram data structure is updated so that only the clusters that are large enough to pass the node filter based on α are assigned a valid α-pass flag. The nodes that have both valid k-pass and α-pass flags make up the set of nodes that are both large enough and in the desired clustering solution and are therefore included in the output graph.

In our implementation, a list of all possible edges and their weights is generated during preprocessing. Each edge in the list possesses a β-pass flag. When β is changed, all edges with a passing weight (based on β) are assigned a valid β-pass flag, and all others are assigned an invalid flag. Only edges that have a valid β-pass flag and are incident to two valid nodes (nodes with valid k-pass and α-pass flags) are included in the output graph. Using the implementation described above, C-TREND can update output graphs very efficiently in response to user changes to the k_i, α, and β parameters. Changing any one parameter requires only one operation to update the corresponding flag in the data structure for a given node or edge [5].
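The flag mechanism for nodes and edges can be sketched in Python as below. The class and function names, the parameter names `alpha` and `beta`, and the pass rules (a node passes when its size is at least `alpha`; an edge passes when its weight is at most `beta`) are assumptions made for illustration; the paper does not spell out the exact comparisons.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Node:
    size: int                  # number of records in the cluster
    k_pass: bool = False       # cluster is part of the current k_i-sized solution
    alpha_pass: bool = False   # cluster is large enough to pass the node filter

    @property
    def valid(self) -> bool:
        return self.k_pass and self.alpha_pass


@dataclass
class Edge:
    src: str
    dst: str
    weight: float
    beta_pass: bool = False    # edge weight passes the edge filter


def set_alpha(nodes: Dict[str, Node], alpha: int) -> None:
    """Update only the alpha-pass flag of each node (assumed rule: size >= alpha)."""
    for node in nodes.values():
        node.alpha_pass = node.size >= alpha


def set_beta(edges: List[Edge], beta: float) -> None:
    """Update only the beta-pass flag of each edge (assumed rule: weight <= beta)."""
    for edge in edges:
        edge.beta_pass = edge.weight <= beta


def visible_graph(
    nodes: Dict[str, Node], edges: List[Edge]
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Select the nodes and edges whose flags allow them to be displayed."""
    shown_nodes = [name for name, node in nodes.items() if node.valid]
    shown_edges = [
        (e.src, e.dst)
        for e in edges
        if e.beta_pass and nodes[e.src].valid and nodes[e.dst].valid
    ]
    return shown_nodes, shown_edges


# Hypothetical clusters from two time periods, all in the current k-solutions.
flag_nodes = {
    "t1_a": Node(size=40, k_pass=True),
    "t1_b": Node(size=3, k_pass=True),
    "t2_a": Node(size=25, k_pass=True),
}
flag_edges = [Edge("t1_a", "t2_a", 0.4), Edge("t1_b", "t2_a", 0.2)]
set_alpha(flag_nodes, alpha=10)
set_beta(flag_edges, beta=0.5)
shown_nodes, shown_edges = visible_graph(flag_nodes, flag_edges)
```

Because the edge list and node list are precomputed, a parameter change in this sketch only flips boolean flags; no clustering is rerun.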

Fig 6. Implementation of C-TREND

IV. ADVANTAGES
C-TREND provides three advantages over existing techniques. First, C-TREND presents temporal data in a unique and intuitive manner that emphasizes trends between dominant transaction types over time; its output graphs resemble evolutionary diagrams and naturally portray the changes in data characteristics over time. Second, C-TREND is a meta-analysis tool for data mining results and, therefore, is designed to provide the domain expert with substantial control over the data presentation. In particular, C-TREND gives the user the ability to adjust all key parameters for creating output trend graphs, which allows a domain expert to visualize the data in the manner that provides the most value. Third, C-TREND presents a set of graph statistics which, in our future work, will provide a means for developing new trend metrics and a framework for performing hypothesis testing on the existence and characteristics of trends.

VI. CONCLUSION
By harnessing computational techniques of data mining, we have developed a temporal clustering technique for discovering, analyzing, and visualizing trends in multiattribute temporal data. The proposed technique is versatile and gives significant data representation power to the user: domain experts have the ability to adjust parameters and clustering mechanisms to fine-tune trend graphs. It is also scalable: the time required to adjust trend parameters is quite low even for larger data sets, which provides for real-time visualization capabilities. The proposed technique is applicable in many data analysis contexts and can provide insights for analysts performing historical analyses and generating forecasts.

VIII. REFERENCES
[1] Simmi Bagga, Dr. G. N. Singh, "Application of Data Mining," International Journal for Science and Emerging Technologies with Latest Trends, 1(1):19-23.
[2] B. Ratnamala, P. M. Kiran, "Temporal Cluster Graphs for Visualizing Trends," National Conference on Advances in Computer Science and Applications with International Journal of Computer Applications (NCACSA 2012), proceedings published in International Journal of Computer Applications (IJCA).
[3] D. Radha Rani, A. Vini Bharati, P. Lakshmi Durga Madhuri, M. Phaneendra Babu, A. Sravani, "Analysis of Dendrogram Tree for Identifying and Visualizing Trends in Multi-attribute Transactional Data," International Journal of Engineering Trends and Technology, Volume 3, Issue 1, 2012, pp. 14-18.
[4] Arna Prabha Jena, Annan Naidu, "A Review of C-TREND Using Complete-Link Clustering for Transactional Data," International Journal of Computer Science & Engineering Technology (IJCSET), Vol. 4, No. 07, Jul 2013, pp. 850-854.
[5] Gediminas Adomavicius, Jesse Bockstedt, "C-TREND: Temporal Cluster Graphs for Identifying and Visualizing Trends in Multiattribute Transactional Data," IEEE Transactions on Knowledge and Data Engineering, Vol. 20, No. 6, June 2008, pp. 721-735.
[6] Sotiris Kotsiantis, Dimitris Kanellopoulos, "Association Rules Mining: A Recent Overview," GESTS International Transactions on Computer Science and Engineering, Vol. 32 (1), 2006, pp. 71-82.
[7] Jeffrey Hsu, "Data Mining Trends and Developments: The Key Data Mining Technologies and Applications for the 21st Century."
[8] AbdulRahman R. Alazmi, AbdulAziz R. Alazmi, "Data Mining and Visualization of Large Databases," International Journal of Computer Science and Security (IJCSS), Volume 6, Issue 5, 2012, pp. 295-314.
