
Mining Association Rules using Optimal Genetic Algorithm & Quantum Swarm intelligent PSO.

Presented By K.Indira Under the Guidance of Dr. S. Kanmani, Professor, Department of Information Technology, Pondicherry Engineering College.

Contents
Objective
Introduction
    Data Mining
    Association Analysis
    Limitations of the Existing System
    GA and PSO: An Introduction
Existing Work
    Based on GA
    Based on PSO
Work Done So Far
Proposed Work
Execution Plan
Papers Published
References

Objective

To propose an efficient methodology for mining association rules (ARs) using an Optimal Genetic Algorithm and Quantum Swarm Intelligent PSO.

Data Mining
Extraction of interesting information or patterns from data in large databases is known as data mining.

Association Rules
Tid    Items bought
10     Milk, Nuts, Sugar
20     Milk, Coffee, Sugar
30     Milk, Sugar, Eggs
40     Nuts, Eggs, Bread
50     Nuts, Coffee, Sugar, Eggs, Bread

[Figure: Venn diagram of customers who buy milk, customers who buy sugar, and customers who buy both]

Find all the rules X → Y with minimum support and confidence.
Support, s: probability that a transaction contains X ∪ Y.
Confidence, c: conditional probability that a transaction having X also contains Y.

Let minsup = 50%, minconf = 50%.
Freq. Pat.: Milk:3, Nuts:3, Sugar:4, Eggs:3, {Milk, Sugar}:3
Association rules: Milk → Sugar (60%, 100%), Sugar → Milk (60%, 75%)
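The support and confidence figures on this slide can be checked with a few lines of Python. This is a minimal sketch: the function names `count`, `support`, and `confidence` are illustrative choices, not part of the presented work.

```python
# Transactions from the slide, keyed by transaction id (Tid).
transactions = {
    10: {"Milk", "Nuts", "Sugar"},
    20: {"Milk", "Coffee", "Sugar"},
    30: {"Milk", "Sugar", "Eggs"},
    40: {"Nuts", "Eggs", "Bread"},
    50: {"Nuts", "Coffee", "Sugar", "Eggs", "Bread"},
}


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for items in transactions.values() if itemset <= items)


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return count(itemset) / len(transactions)


def confidence(antecedent, consequent):
    """Fraction of transactions with the antecedent that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return count(both) / count(antecedent)


print(support({"Milk", "Sugar"}))        # 0.6
print(confidence({"Milk"}, {"Sugar"}))   # 1.0
print(confidence({"Sugar"}, {"Milk"}))   # 0.75
```

Both rules clear the 50% thresholds, which matches the (60%, 100%) and (60%, 75%) pairs listed above.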

Limitations of Existing System

Apriori, FP-Growth, and Eclat are some of the popular algorithms for mining ARs.
They traverse the database many times.
I/O overhead and computational complexity are high.
They cannot meet the requirements of large-scale database mining.

GA and PSO An Introduction


Evolutionary algorithms provide a robust and efficient approach to exploring large search spaces.
A Genetic Algorithm (GA) is a procedure used to find approximate solutions to search problems through the application of the principles of evolutionary biology.
The PSO mechanism is inspired by the social and cooperative behavior displayed by various species, such as birds and fish, as well as human beings.
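To make the swarm idea concrete, the following is a minimal sketch of a standard global-best PSO. It minimises a toy sphere function, not a rule-mining objective. The parameter values (`w`, `c1`, `c2`, swarm size, search range) and the names `pso_minimize` and `sphere` are assumptions chosen for illustration and are not taken from the slides.

```python
import random


def pso_minimize(objective, dim, n_particles=20, iterations=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO: each particle is pulled toward its own best
    position (cognitive term) and the swarm's best position (social term)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive pull + social pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best, best_val = pso_minimize(sphere, dim=2)
print(best, best_val)
```

For rule mining, the objective would be replaced by a rule-quality measure and the particle encoding by a rule representation. Neither is specified in this section.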

Existing Work
Mining ARs Based on Genetic Algorithm
Efficient Distributed Genetic Algorithm: the population is spatially partitioned into several semi-isolated nodes, each evolving in parallel and possibly exploring different regions of the search space.
Genetic algorithm that does not take the minimum support and confidence into account: extracts the rules that have the best correlation between support and confidence.
Improved niched Pareto genetic algorithm (INPGA): selects accurate candidates and also saves selection time by combining BNPGA and SDNPGA.
GRA with a new operator, called guided mutation: GRA considers the correlation coefficient between nodes in each individual.

Existing Work (contd.)
Mining ARs Based on Particle Swarm Optimization

A novel algorithm for association rule mining that improves computational efficiency and automatically determines suitable threshold values. The algorithm operates at three evolution levels, where an adaptive inertia weight is presented. A safety distance is introduced to move the particle through its current position, along with a proximity index.

A self-adaptive method that adjusts the inertia weight of the velocity update rule based on empirical values and a negative feedback technique, which relieves the burden of specifying the parameter values.

A method that combines Particle Swarm Optimization (PSO) and Genetic Algorithms (GAs), using fuzzy logic to integrate the results of both methods and to tune parameters. It combines the advantages of PSO and GA to give an improved FPSO + FGA hybrid approach.

Work Done so Far


Association rule mining was carried out using a Genetic Algorithm in Matlab 2008a.
Association rule mining was carried out using a Self Adaptive Genetic Algorithm in Java.
The GA parameters were varied and the results were recorded for each case.

Mining ARs using GA in Matlab 2008a.


Methodology
Selection              : Tournament
Crossover Probability  : Fixed (tested with 3 values)
Mutation Probability   : No mutation
Fitness Function       : [formula not recoverable from the source]
Dataset                : Lenses, Iris, Haberman from the UCI Irvine repository
Population             : Fixed (tested with 3 values)
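The two operators named in this setup, tournament selection and crossover with a fixed probability, can be sketched as follows. The tournament size `k`, the single-point crossover variant, the bit-list chromosome, and the placeholder fitness are all assumptions made for illustration; the slides do not specify them.

```python
import random


def tournament_select(population, fitness, k=2, rng=random):
    """Draw k distinct individuals at random and return the fittest."""
    contenders = rng.sample(range(len(population)), k)
    winner = max(contenders, key=lambda i: fitness[i])
    return population[winner]


def single_point_crossover(a, b, pc, rng=random):
    """With probability pc, swap the tails of two equal-length chromosomes."""
    if len(a) > 1 and rng.random() < pc:
        cut = rng.randrange(1, len(a))
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]


rng = random.Random(1)
population = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
fitness = [sum(ind) for ind in population]  # placeholder fitness, not the authors'

parent1 = tournament_select(population, fitness, k=2, rng=rng)
parent2 = tournament_select(population, fitness, k=2, rng=rng)
child1, child2 = single_point_crossover(parent1, parent2, pc=0.5, rng=rng)
print(child1, child2)
```

No mutation step appears here, mirroring the "No mutation" setting of this experiment.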

Flow chart of the GA


Results Analysis
Comparison based on variation in population size.

          No. of Instances          No. of Instances * 1.25   No. of Instances * 1.5
Dataset   Accuracy %   No. of Gen.  Accuracy %   No. of Gen.  Accuracy %   No. of Gen.
Lenses    75           7            82           12           95           17
Haberman  71           114          68           88           64           70
Iris      77           88           87           53           82           45

Comparison based on variation in minimum support and minimum confidence.

          Sup = 0.4 & con = 0.4     Sup = 0.9 & con = 0.9     Sup = 0.9 & con = 0.2     Sup = 0.2 & con = 0.9
Dataset   Accuracy %   No. of Gen.  Accuracy %   No. of Gen.  Accuracy %   No. of Gen.  Accuracy %   No. of Gen.
Lenses    22           20           49           11           70           21           95           18
Haberman  45           68           58           83           71           90           62           75
Iris      40           28           59           37           78           48           87           55

Comparison based on variation in crossover probability.

          Pc = 0.25                 Pc = 0.5                  Pc = 0.75
Dataset   Accuracy %   No. of Gen.  Accuracy %   No. of Gen.  Accuracy %   No. of Gen.
Lenses    95           8            95           16           95           13
Haberman  69           77           71           83           70           80
Iris      84           45           86           51           87           55

Comparison of the optimum values of parameters for the maximum accuracy achieved.

Dataset    No. of Instances  No. of Attributes  Population Size  Minimum Support  Minimum Confidence  Crossover Rate  Accuracy in %
Lenses     24                4                  36               0.2              0.9                 0.25            95
Haberman   306               3                  306              0.9              0.2                 0.5             71
Iris       150               5                  225              0.2              0.9                 0.75            87

Inferences
The values of minimum support, minimum confidence, and population size affect the accuracy of the system more than the other GA parameters do.
Crossover rate affects the convergence rate rather than the accuracy of the system.
The optimum values of the GA parameters vary from dataset to dataset, and the fitness function plays a major role in optimizing the results.
The size of the dataset and the relationships between its attributes contribute to the setting of the parameters.

Mining ARs using Self Adaptive GA in Java.


Methodology
Selection              : Roulette Wheel
Crossover Probability  : Fixed (tested with 3 values)
Mutation Probability   : Self Adaptive
Fitness Function       : [formula not recoverable from the source]
Dataset                : Lenses, Iris, Car from the UCI Irvine repository
Population             : Fixed (tested with 3 values)
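Roulette wheel selection, named in the setup above, picks individuals with probability proportional to fitness. A minimal sketch follows, assuming non-negative fitness values; the function name and the example data are illustrative only.

```python
import random


def roulette_select(population, fitness, rng=random):
    """Pick one individual with probability proportional to its fitness.

    Assumes every fitness value is non-negative.
    """
    total = sum(fitness)
    if total <= 0:
        # Degenerate wheel: fall back to a uniform choice.
        return rng.choice(population)
    spin = rng.random() * total  # point on the wheel, in [0, total)
    running = 0.0
    for individual, fit in zip(population, fitness):
        running += fit
        if spin < running:
            return individual
    return population[-1]  # guard against floating-point round-off


rng = random.Random(42)
rules = ["rule_a", "rule_b", "rule_c"]   # hypothetical individuals
scores = [1.0, 3.0, 6.0]                 # hypothetical fitness values
picks = [roulette_select(rules, scores, rng) for _ in range(1000)]
print({r: picks.count(r) for r in rules})
```

Over many draws the selection counts should be roughly in the ratio 1:3:6.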

Procedure SAGA
Begin
    Initialize population p(k);
    Define the crossover and mutation rate;
    Do {
        Do {
            Calculate support of all k rules;
            Calculate confidence of all k rules;
            Obtain fitness;
            Select individuals for crossover / mutation;
            Calculate the average fitness of the nth and (n-1)th generations;
            Calculate the maximum fitness of the nth and (n-1)th generations;
            Based on the fitness of the selected item, calculate the new crossover and mutation rate;
            Choose the operation to be performed;
        } k times;
    }
End
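The procedure above does not give the formula used to recompute the rates. The sketch below shows one commonly used style of adaptive rule, in which individuals near the best fitness are mutated less and below-average individuals keep the maximum rate. This is an assumption for illustration only: it uses just the current generation's average and maximum fitness, whereas the procedure also refers to the (n-1)th generation, and the name `adaptive_mutation_rate` and the value `pm_max` are hypothetical.

```python
def adaptive_mutation_rate(fitness, f_max, f_avg, pm_max=0.5):
    """Hypothetical adaptive rule, not the authors' formula.

    Individuals below the average fitness (or a fully converged population)
    get the maximum rate; the rate falls linearly to 0 as fitness nears f_max.
    """
    if f_max == f_avg or fitness < f_avg:
        return pm_max
    return pm_max * (f_max - fitness) / (f_max - f_avg)


fitnesses = [0.2, 0.5, 0.9]              # hypothetical rule fitness values
f_max = max(fitnesses)
f_avg = sum(fitnesses) / len(fitnesses)
rates = [adaptive_mutation_rate(f, f_max, f_avg) for f in fitnesses]
print(rates)
```

The design intent is to protect the best rules from disruption while keeping weaker rules exploring.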

Self Adaptive GA

Results Analysis
ACCURACY COMPARISON BETWEEN GA AND SAGA WHEN PARAMETERS ARE IDEAL FOR TRADITIONAL GA

                Traditional GA            Self Adaptive GA
Dataset         Accuracy     No. of Gen.  Accuracy     No. of Gen.
Lenses          75           38           87.5         35
Haberman        52           36           68           28
Car Evaluation  85           29           96           21

ACCURACY COMPARISON BETWEEN GA AND SAGA WHEN PARAMETERS ARE ACCORDING TO TERMINATION OF SAGA

                Traditional GA            Self Adaptive GA
Dataset         Accuracy     No. of Gen.  Accuracy     No. of Gen.
Lenses          50           35           87.5         35
Haberman        36           38           68           28
Car Evaluation  74           36           96           21

Inferences
Better accuracy.
Better convergence.
Self Adaptive GA gives better accuracy than Traditional GA.

Proposed Work
1. To implement a Distributive niched Pareto memetic Algorithm for Rule Mining.
2. To propose an association rule mining algorithm based on Chaotic PSO and swarm intelligence.
3. To propose a Particle Swarm Optimization rule mining methodology combined with quantum computing and quantum differential evolution.

Niched Pareto Selection Algorithm


Obtain the comparison set S from clustering-based samples.
For any two candidates and the comparison set S, if one candidate is dominated and the other is not, the non-dominated candidate is selected. Exit.
Otherwise, for the two candidates (cd_1 and cd_2), compute the number of samples in their two niches, count1 and count2.
If count1 = 0, cd_1 is selected; if count2 = 0, cd_2 is selected. Exit.
If count1 - count2 > delta, cd_2 is selected; if count2 - count1 > delta, cd_1 is selected. Exit.
If abs(count1 - count2) < delta, compute the standard deviations of the two niches, sd1 and sd2. If sd1 > sd2, cd_1 is selected; otherwise, cd_2 is selected. Exit.
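The selection steps above can be sketched as a single function. This is a hedged illustration: candidates are assumed to be tuples of objectives to be maximised (for example support and confidence), and the niche counts and standard deviations are passed in precomputed, because the slide does not say how they are derived from the clustering-based samples. The function names are illustrative.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def is_dominated(candidate, comparison_set):
    """True if any member of the comparison set dominates the candidate."""
    return any(dominates(s, candidate) for s in comparison_set)


def niched_pareto_select(cd_1, cd_2, comparison_set,
                         count1, count2, sd1, sd2, delta):
    """Choose between two candidates following the steps listed above."""
    dom1 = is_dominated(cd_1, comparison_set)
    dom2 = is_dominated(cd_2, comparison_set)
    if dom1 != dom2:                 # exactly one candidate is dominated
        return cd_2 if dom1 else cd_1
    if count1 == 0:                  # empty niche wins outright
        return cd_1
    if count2 == 0:
        return cd_2
    if count1 - count2 > delta:      # prefer the clearly less crowded niche
        return cd_2
    if count2 - count1 > delta:
        return cd_1
    # Counts are close (the slide leaves the equal-to-delta case open;
    # here it is treated like the close case): compare niche spread.
    return cd_1 if sd1 > sd2 else cd_2


S = [(0.5, 0.5), (0.7, 0.3)]         # hypothetical comparison set
print(niched_pareto_select((0.6, 0.6), (0.4, 0.4), S, 3, 3, 0.1, 0.2, delta=2))
```

In the example call, the second candidate is dominated by (0.5, 0.5) while the first is not, so the first candidate is returned at the dominance step.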

Distributed Model
[Figure: distributed model with the full dataset at the centre and four GA subpopulations (GA1 to GA4), each labelled "Rules Generated"; the figure also carries the labels "Concept" and "Description"]

Association Rule mining Algorithm based on Chaotic PSO and Swarm intelligence.

Swarm Intelligence Concept


Based on chaotic maps
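As an illustration of what a chaotic map is, the sketch below generates values from the logistic map, one widely used example. Which map the proposed work will use is not stated on the slide, so the choice of the logistic map, the parameter `r = 4.0`, and the starting value are assumptions.

```python
def logistic_map(x0, n, r=4.0):
    """Generate n values of the logistic map x_{k+1} = r * x_k * (1 - x_k).

    With r = 4 and a start value in (0, 1), the sequence stays in [0, 1]
    and behaves chaotically for typical start values.
    """
    values = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


sequence = logistic_map(0.3, 5)
print(sequence)
```

In chaotic PSO variants, such a sequence typically stands in for the uniform random numbers of the velocity update or drives parameter values. How it would be applied here is left open by the slides.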

Execution Plan
July : Niched Pareto Sampling based Selection. Implementing GA for Local intensity Search. Distributed Methodology Implementation. Preparing the Above work as a paper. Particle Swarm Optimization based Rule Mining to be implemented.

August September & October

November

Chaotic PSO & Swarm intelligence based PSO for Mining ARs to be implemented. Documenting the same into paper.

December & January

Study on Quantum computing and differential Evolution concepts.


Papers Published
Paper titled "Framework for Comparison of Association Rule Mining Using Genetic Algorithm" was presented at the International Conference on Computers, Communication & Intelligence at VCET, 2010.

Paper titled "Mining Association Rules Using Genetic Algorithm: The role of Estimation Parameters" has been selected for presentation at the International Conference on Advances in Computing and Communications, 2011. To be published in the Springer LNCS (CCIS) series.

Paper titled "Rule Acquisition in Data Mining Using a Self Adaptive Genetic Algorithm" has been selected for presentation at the First International Conference on Computer Science and Information Technology (CCSEIT-2011). To be published in the Springer LNCS (CCIS) series.


Thank You
