International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)

Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 2, Issue 6, November-December 2013 ISSN 2278-6856

Single-pass Interesting Frequent Pattern Mining: without Support Threshold


Parag Moteria¹, Dr Y R Ghodasara²

¹ Gujarat Technological University, ISTAR College, MCA Department, Mota Bazar, Vallabh Vidyanagar 388 120, India
² AIT Department, Anand Agricultural University, Anand 388 110, India

Abstract:

Discovering interesting associations among frequent items in hundreds of itemsets by specifying a user-defined support threshold without any domain knowledge is not appropriate, because it leads to either too few or too many uninteresting results. Mining without a user-defined minimum support threshold is a computationally costly process, but it helps to find interesting frequent itemsets without any domain knowledge. Frequent pattern mining techniques have become necessary for mining massive datasets.

Scanning the whole transactional dataset multiple times to determine the k-itemsets of each transaction is a major problem, because the scanning process takes most of the time needed to determine frequent itemsets. Once the frequent itemsets are found, the next problem is to generate association rules and their confidence efficiently. Let D be a transactional dataset in which each transaction is identified by a TID. An association rule between two disjoint itemsets X and Y is written X => Y, with

$$\mathrm{Support}(X \Rightarrow Y) = \frac{\sup(X \cup Y)}{|D|}, \qquad \mathrm{Confidence}(X \Rightarrow Y) = \frac{\sup(X \cup Y)}{\sup(X)}$$

A frequent itemset is an itemset whose occurrence count in the transactional dataset is at least a certain threshold.

Keywords: Data Mining, Frequent Pattern Itemset, Minimum Support Count, Transactional Dataset
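As a quick check of the support and confidence formulas in the abstract, the Python sketch below computes both directly from the nine transactions of Table 1 (Section 3) with a plain scan. The names `D`, `sup`, `support` and `confidence` are illustrative only; this is the textbook definition, not the single-pass method proposed in this paper.

```python
from fractions import Fraction

# Transactions of Table 1: TID -> set of item IDs.
D = {
    "T100": {"I1", "I2", "I5"},
    "T200": {"I2", "I4"},
    "T300": {"I2", "I3"},
    "T400": {"I1", "I2", "I4"},
    "T500": {"I1", "I3"},
    "T600": {"I2", "I3"},
    "T700": {"I1", "I3"},
    "T800": {"I1", "I2", "I3", "I5"},
    "T900": {"I1", "I2", "I3"},
}


def sup(itemset):
    """Support count: number of transactions that contain every item of itemset."""
    return sum(1 for items in D.values() if set(itemset) <= items)


def support(x, y):
    """Support(X => Y) = sup(X u Y) / |D|, as an exact fraction."""
    return Fraction(sup(set(x) | set(y)), len(D))


def confidence(x, y):
    """Confidence(X => Y) = sup(X u Y) / sup(X), as an exact fraction."""
    return Fraction(sup(set(x) | set(y)), sup(x))


print(support({"I1"}, {"I2"}))     # 4/9
print(confidence({"I1"}, {"I2"}))  # 2/3 (= 4/6)
```

Fractions are used so that the results can be compared exactly with the hand calculations later in the paper.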

1. INTRODUCTION
Frequent itemset mining discovers associations and correlations among items in large transactional datasets. Market basket analysis is the classic example of frequent pattern mining [1]. Frequent pattern mining is one of the most important and well-researched techniques of data mining. Association rules can support decisions concerning product pricing, promotions, store layout and many other areas [2]. Data mining is the process of finding interesting trends or patterns in large datasets to steer decisions about future activities. Interesting frequent itemsets can be mined from a large transactional dataset either with or without a user-defined support threshold. If the user-defined threshold is set too high, there may be only a few results or none at all; if it is set too low, there may be far too many results. The task is to mine all itemsets of cardinality greater than one whose support count is greater than or equal to a certain threshold [3]. Without domain knowledge, a user-specified threshold may therefore produce too many results, too few results, or no result at all. Hence, this paper proposes determining the threshold from the data instead of fixing it in advance.
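To make the sensitivity to the threshold concrete, the sketch below counts, by brute-force enumeration over the transactions of Table 1 (Section 3), how many itemsets of cardinality greater than one reach each support threshold. It is a hypothetical illustration: the helper `frequent_itemsets` is not from the paper, it is neither Apriori nor FPMA-SS, and it is only practical for a handful of items.

```python
from itertools import combinations

# Transactions of Table 1.
TRANSACTIONS = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
ITEMS = sorted(set().union(*TRANSACTIONS))


def frequent_itemsets(min_sup):
    """Brute force: every itemset of cardinality > 1 whose support count >= min_sup."""
    result = {}
    for k in range(2, len(ITEMS) + 1):
        for candidate in combinations(ITEMS, k):
            count = sum(1 for t in TRANSACTIONS if t.issuperset(candidate))
            if count >= min_sup:
                result[candidate] = count
    return result


for threshold in range(1, 6):
    print(threshold, len(frequent_itemsets(threshold)))
```

On these nine transactions the thresholds 1 to 5 yield 14, 8, 3, 3 and 0 itemsets respectively, so a small change in a fixed threshold moves the result from many itemsets to none.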

2. PROBLEM STATEMENT
There are different ways of choosing the minimum threshold used to determine frequent itemsets. Without domain knowledge, however, a user-specified threshold leads to too many results, too few results, or no result at all [3]. Hence, determining the threshold from the data, instead of fixing it in advance, helps to find appropriate frequent itemsets without domain knowledge.

3. PROPOSED WORK
Consider the transactional dataset in Table 1.

Table 1: Transactional dataset
TID     List of item IDs
T100    I1, I2, I5
T200    I2, I4
T300    I2, I3
T400    I1, I2, I4
T500    I1, I3
T600    I2, I3
T700    I1, I3
T800    I1, I2, I3, I5
T900    I1, I2, I3

Applying the frequent pattern mining algorithm based on a single scan of the whole transactional dataset (FPMA-SS) [4] to Table 1 gives the result in Table 2.

Table 2: Result of FPMA-SS on Table 1
                Cardinality
Items     1     2     3     4
I1        0     4     4     1
I2        0     5     4     1
I3        0     4     2     1
I4        0     1     1     0
I5        0     0     2     1
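FPMA-SS itself is defined in [4] and is not reproduced here. As a hedged illustration of what a single scan over Table 1 can collect, the sketch below counts the total occurrences of each item and the number of transactions of each length in one pass; `single_pass_counts` is a hypothetical helper, not the algorithm of [4].

```python
from collections import Counter

# Table 1 as (TID, list of item IDs) pairs.
TABLE_1 = [
    ("T100", ["I1", "I2", "I5"]),
    ("T200", ["I2", "I4"]),
    ("T300", ["I2", "I3"]),
    ("T400", ["I1", "I2", "I4"]),
    ("T500", ["I1", "I3"]),
    ("T600", ["I2", "I3"]),
    ("T700", ["I1", "I3"]),
    ("T800", ["I1", "I2", "I3", "I5"]),
    ("T900", ["I1", "I2", "I3"]),
]


def single_pass_counts(transactions):
    """Scan the dataset once, returning (total count of each item,
    number of transactions of each length)."""
    item_totals = Counter()
    length_hist = Counter()
    for _tid, items in transactions:
        item_totals.update(items)
        length_hist[len(items)] += 1
    return item_totals, length_hist


totals, lengths = single_pass_counts(TABLE_1)
print(dict(sorted(totals.items())))   # {'I1': 6, 'I2': 7, 'I3': 6, 'I4': 2, 'I5': 2}
print(dict(sorted(lengths.items())))  # {2: 5, 3: 3, 4: 1}
```

The longest transaction (T800) has four items, which is consistent with Table 2 stopping at cardinality 4.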

For each cardinality c_idx = 2, 3, …, n, let MSC[c_idx] be the minimum support count for that cardinality, which is always greater than one, and let FSC be the final support count. FSC is the smallest of the cardinality-wise minimum support counts:

$$\mathrm{FSC} = \min_{c\_idx \in \{2, 3, \dots, n\}} \mathrm{MSC}[c\_idx]$$

For example, using Table 2, FSC = Min{2, 4} = 2.
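The paper does not spell out how each MSC[c_idx] is read from Table 2. One reading that reproduces FSC = Min{2, 4} = 2 is that MSC[c_idx] is the smallest count greater than one in the cardinality-c_idx column, with columns that have no such count skipped. The sketch below encodes Table 2 and applies that assumed rule; the helpers `column` and `minimum_support_counts` are hypothetical.

```python
# Table 2: item -> counts for cardinality 1, 2, 3, 4.
TABLE_2 = {
    "I1": [0, 4, 4, 1],
    "I2": [0, 5, 4, 1],
    "I3": [0, 4, 2, 1],
    "I4": [0, 1, 1, 0],
    "I5": [0, 0, 2, 1],
}


def column(table, c_idx):
    """Values of the cardinality-c_idx column (cardinality is 1-based)."""
    return [counts[c_idx - 1] for counts in table.values()]


def minimum_support_counts(table):
    """Assumed rule: MSC[c_idx] is the smallest entry greater than one in the
    column, for c_idx = 2..n; columns without such an entry are skipped."""
    n = len(next(iter(table.values())))
    msc = {}
    for c_idx in range(2, n + 1):
        candidates = [v for v in column(table, c_idx) if v > 1]
        if candidates:
            msc[c_idx] = min(candidates)
    return msc


msc = minimum_support_counts(TABLE_2)
fsc = min(msc.values())
print(msc)  # {2: 4, 3: 2}
print(fsc)  # 2
```

Under this reading the cardinality-4 column contributes nothing, because none of its entries exceeds one.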

References
[1] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Third Edition, Elsevier / Morgan Kaufmann Publishers, July 6, 2011, pp. 227-240.
[2] N. Goswami, Anshu Chaturvedi, C. S. Raghuvanshi, "Frequent Pattern Mining Using Record Filter Approach", International Journal of Computer Science, Vol. 7, Issue 4, No. 7, July 2010, pp. 38-43.
[3] Yin-Ling Cheung and A. W.-C. Fu, "Mining frequent itemsets without support threshold: with and without item constraints", IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 9, pp. 1052-1069, Sept. 2004, doi: 10.1109/TKDE.2004.44, URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1316834&isnumber=29187
[4] Parag Moteria, Dr Y R Ghodasara, "Frequent Pattern Mining Algorithm Based on Single Scanning of Whole Transactional Dataset", International Journal of Engineering Research & Technology (IJERT), Vol. 2, Issue 3, March 2013, ISSN: 2278-0181, www.ijert.org
[5] Shariq Bashir, Zahoor Jan, A. Rauf Baig, "Fast Algorithms for Mining Interesting Frequent Itemsets without Minimum Support", http://arvix.org/ftp/arvix/papers/0904/0904.3319.pdf [Accessed: Sept. 13, 2013]

4. CALCULATE SUPPORT COUNT


Using FPMA-SS [4] together with FSC yields the following frequent itemsets. The proposed approach yields the same frequent items as the Apriori algorithm, namely I1, I2, I3 and I5. The support count of an itemset of cardinality k is taken as the minimum of its items' counts in the cardinality-k column of Table 2:

Apriori algorithm (cardinality 2): support count of {I1, I2} = 4
Proposed approach (cardinality 2): support count of {I1, I2} = Min{4, 5} = 4
Apriori algorithm (cardinality 3): support count of {I1, I2, I3} = 2
Proposed approach (cardinality 3): support count of {I1, I2, I3} = Min{4, 4, 2} = 2

Only a single pass over the transactional dataset is required, the threshold is determined from the data instead of being fixed, and the same pass yields the support count of each item in the dataset.
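The support rule of this section (the minimum of the items' counts in the Table 2 column for the itemset's cardinality) can be sketched in Python and compared with an exact scan of Table 1. The selection of frequent items at the end assumes they are the items whose cardinality-3 count reaches FSC = 2; the text does not state this rule explicitly, but it reproduces I1, I2, I3 and I5. All helper names are illustrative, and the comparison covers only the two itemsets discussed above.

```python
# Table 1 transactions and Table 2 (item -> counts for cardinality 1..4).
TABLE_1 = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
TABLE_2 = {
    "I1": [0, 4, 4, 1],
    "I2": [0, 5, 4, 1],
    "I3": [0, 4, 2, 1],
    "I4": [0, 1, 1, 0],
    "I5": [0, 0, 2, 1],
}
FSC = 2  # final support count determined in Section 3


def table_support(itemset):
    """Support count as in this section: minimum of the items' entries in
    the Table 2 column whose cardinality equals len(itemset)."""
    k = len(itemset)
    return min(TABLE_2[item][k - 1] for item in itemset)


def apriori_support(itemset):
    """Exact support count by scanning Table 1 (the value Apriori reports)."""
    return sum(1 for t in TABLE_1 if set(itemset) <= t)


for itemset in [("I1", "I2"), ("I1", "I2", "I3")]:
    print(itemset, table_support(itemset), apriori_support(itemset))

# Assumed selection rule: items whose cardinality-3 count reaches FSC.
frequent_items = sorted(item for item, counts in TABLE_2.items() if counts[2] >= FSC)
print(frequent_items)  # ['I1', 'I2', 'I3', 'I5']
```

For both itemsets the two columns printed agree (4 and 4, then 2 and 2), matching the comparison listed above.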

5. CALCULATE CONFIDENCE COUNT


The result is the frequent itemset {I1, I2, I3, I5}. The power set of this itemset has sixteen distinct subsets, from which the confidence counts are calculated. For example,

Conf(I1 => I2) = sup(I1, I2) / sup(I1) = Min{sup(I1), sup(I2)} / sup(I1) = Min{4, 5} / 6 = 4/6

Because sup(I1, I2) involves two items, the column for cardinality two in Table 2 is used. When the support count involves a single item, it refers to the total count of that item produced by FPMA-SS (6 for I1, matching the six transactions of Table 1 that contain I1).
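The confidence calculation can be sketched the same way: sup of a single item is its total count (taken here from Table 1), while sup of a multi-item itemset uses the Table 2 column for its cardinality. The names `ITEM_TOTALS`, `sup`, `conf` and `power_set` are hypothetical, and the rule sides are assumed to be disjoint tuples.

```python
from fractions import Fraction
from itertools import chain, combinations

# Table 2: item -> counts for cardinality 1, 2, 3, 4.
TABLE_2 = {
    "I1": [0, 4, 4, 1],
    "I2": [0, 5, 4, 1],
    "I3": [0, 4, 2, 1],
    "I4": [0, 1, 1, 0],
    "I5": [0, 0, 2, 1],
}
# Total count of each item, taken from Table 1 (e.g. I1 occurs in 6 transactions).
ITEM_TOTALS = {"I1": 6, "I2": 7, "I3": 6, "I4": 2, "I5": 2}


def sup(itemset):
    """Single item: its total count. Several items: minimum of their
    entries in the Table 2 column for len(itemset)."""
    if len(itemset) == 1:
        return ITEM_TOTALS[itemset[0]]
    k = len(itemset)
    return min(TABLE_2[item][k - 1] for item in itemset)


def conf(x, y):
    """Conf(X => Y) = sup(X u Y) / sup(X) for disjoint tuples x and y."""
    return Fraction(sup(x + y), sup(x))


def power_set(items):
    """All subsets of items, from the empty set up to the full set."""
    return list(chain.from_iterable(combinations(items, r) for r in range(len(items) + 1)))


print(conf(("I1",), ("I2",)))                   # 2/3, i.e. 4/6
print(len(power_set(["I1", "I2", "I3", "I5"])))  # 16
```

The sixteen subsets include the empty set and the full itemset; only splits into two non-empty parts form usable rules X => Y.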

6. CONCLUSION
Without domain knowledge, a user-specified threshold may produce too many results, too few results, or no result at all, and the computation may take an exceedingly long time or consume more storage [3]. The proposed approach may therefore be useful for determining the final support count threshold without domain knowledge, improving and optimizing the computation of frequent itemsets.

AUTHOR
Parag Moteria received the B.Sc. (Mathematics) degree from Sardar Patel University in 2001 and the M.C.A. degree from Saurashtra University in 2004. Parag Moteria is currently an Assistant Professor in the ISTAR MCA Department, Vallabh Vidyanagar 388 120, India, with the objective of developing efficient frequent pattern mining theory that produces more accurate results.

Dr Y R Ghodasara received the Ph.D. degree from Saurashtra University in 2009 and is currently an Associate Professor in the B.Tech. Agricultural Information Technology Department, Anand Agricultural University, Anand 388 110, India. Key research areas are data mining, distributed computing, distributed operating systems and software engineering, with the aim of developing new concepts and techniques for society.
