
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)

ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 3, May-June (2013), pp. 93-100. IAEME: www.iaeme.com/ijcet.asp. Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com


META-HEURISTIC BASED CLUSTERING OF TWO-DIMENSIONAL DATA USING NEIGHBOURHOOD SEARCH WITH DATA MINING TECHNIQUE AS AN APPLICATION OF P-MEDIAN PROBLEM
D. Srinivas Reddy (1), Dr. A. Govardhan (2), S.S.V.N. Sharma (3)

(1) Vaageswari College of Engineering, Karimnagar, India
(2) JNTUH University, Director of Evaluation, Hyderabad, India
(3) Director of Vaagdevi College of Engineering, India

ABSTRACT

The p-median problem is a well-known facility location problem. The neighborhood search (NS) algorithm provides a solution to it. In this work the NS algorithm is hybridized with a data mining (DM) technique to attain a solution better than that of the plain NS approach [1]. Two new metaheuristic clustering algorithms, HDMNS (Hybrid Data Mining based Neighborhood Search) and HMDMNS (Hybrid Multiple Data Mining based Neighborhood Search), are proposed as hybrid versions of the NS algorithm [10, 11, 17]. Clustering is the process of dividing points into groups of related points. Because of the nature of the p-median problem, the proposed NS with DM method can also be used as a clustering algorithm. Both algorithms have two phases. The first phase constructs the basic solution using the NS approach. In HDMNS, the second phase applies the DM technique once to locate an improved solution, which gives a better clustering of the two-dimensional data space. In HMDMNS, the second phase applies the DM technique repeatedly, on the idea that further mining may yield a better result if one exists. The results are compared with the well-known k-means clustering algorithm and show that both methods outperform k-means.

Index Terms: HDMNS (Hybrid Data Mining based Neighborhood Search), HMDMNS (Hybrid Multiple Data Mining based Neighborhood Search), neighborhood search (NS), data mining (DM), clustering


I. INTRODUCTION

The p-median problem is an NP-hard combinatorial optimization problem that seeks the p facilities serving the maximum number of locations [7, 8]. It is of practical use in several applications, such as developing marketing strategies in the management sciences and locating server positions in computer networks (CNs) [12]. The p-median problem can also be used as a clustering technique. In this work a metaheuristic method, k-Mean-GRASP, is proposed to solve the p-median problem and is discussed in detail.

Clustering is the arrangement of similar objects into groups, i.e., the partitioning of data into subsets (clusters) so that the elements of each subset share some common trait. Data clustering is an important data mining task and a general mechanism for analyzing statistical data. It is used in several other areas, such as mechanical process industries, machine learning, pattern recognition, image analysis and bioinformatics [15].

Metaheuristics represent an important class of approximate techniques for solving hard combinatorial optimization problems, for which the use of exact methods is impractical. They are general-purpose high-level procedures that can be instantiated to explore efficiently the solution space of a specific optimization problem. Over the years, metaheuristics such as genetic algorithms, tabu search, simulated annealing, ant systems, GRASP and others have been introduced and applied to real-life problems in several areas of science. The GRASP (Greedy Randomized Adaptive Search Procedures) metaheuristic has been applied successfully to many optimization problems. The search process employed by GRASP is iterative, and each iteration consists of two phases: a construction phase and an enhancement phase. In the construction phase a feasible solution is built, and in the enhancement phase its neighbourhood is explored to find an improved one. The result is the best solution found over all iterations.
Procedure k-Mean-GRASP()
1. Initialize best_sol
2. repeat
3.    sol ← k-Means(data points);
4.    best_sol ← Enhancement(sol);
5.    if cost(sol) > cost(best_sol)
6.       best_sol ← sol;
7.    end if
8. until Termination criterion;
9. return best_sol;

Figure 1: K-Mean GRASP Procedure

Procedure k-means(data points)
1. Initialize k points as cluster centers
2. Assign each data point to the nearest cluster center
3. Recompute the center of each cluster as the mean of the cluster
4. Repeat steps 2 and 3 until there is no more change in the value of the means

Figure 2: K-Means algorithm
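The four steps of Figure 2 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the seeded random choice of initial centers, the `max_iter` cap and the handling of empty clusters are all assumptions made here for the example.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def nearest(point: Point, centers: Sequence[Point]) -> int:
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def k_means(points: Sequence[Point], k: int, seed: int = 0,
            max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Plain k-means on 2-D points; returns (centers, label of each point)."""
    rng = random.Random(seed)
    # Step 1: pick k data points as initial centers (assumed initialization).
    centers = rng.sample(list(points), k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center.
        labels = [nearest(pt, centers) for pt in points]
        # Step 3: recompute each center as the mean of its cluster.
        new_centers = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[c])  # empty cluster: keep old center
        # Step 4: stop when the means no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


demo_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
               (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
demo_centers, demo_labels = k_means(demo_points, k=2)
print(demo_centers, demo_labels)
```

On the two well-separated groups in `demo_points`, the first three points end up in one cluster and the last three in the other.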

The NS technique examines all possible combinations formed with the elements in the neighbourhood of each element of the solution and determines the optimal solution, i.e., the one that serves the maximum number of locations such that the sum of the distances from each element to its facility is minimized. The NS approach is an iterative metaheuristic with two phases [16]. The first, the construction phase, builds the initial solution. Starting from that initial solution, the second phase searches the feasible solution space for the optimal solution using the NS approach. The NS approach is illustrated in Figure 3; its two phases are the construction phase and the NB Search phase.
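The objective that the NS approach minimizes, the sum of the distances from each point to its closest facility, can be written down directly. The sketch below assumes Euclidean distance between two-dimensional points and restricts facilities to the data points themselves; the function names are hypothetical. The exhaustive search is included only to show why a heuristic is needed: it enumerates every p-subset and is viable only for very small n.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float]


def p_median_cost(points: Sequence[Point], medians: Sequence[int]) -> float:
    """Sum over all points of the distance to the closest chosen median."""
    return sum(min(math.dist(pt, points[m]) for m in medians) for pt in points)


def exhaustive_p_median(points: Sequence[Point],
                        p: int) -> Tuple[Tuple[int, ...], float]:
    """Try every p-subset of point indices and keep the cheapest one."""
    best = min(combinations(range(len(points)), p),
               key=lambda subset: p_median_cost(points, subset))
    return best, p_median_cost(points, best)


points = [(0.0, 0.0), (0.0, 2.0), (0.0, 4.0), (10.0, 0.0), (10.0, 2.0)]
medians, cost = exhaustive_p_median(points, 2)
print(medians, cost)  # (1, 3) 6.0
```

Here the middle point of the left group and one point of the right group are chosen, for a total distance of 4 + 2 = 6.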
Procedure NSApproach()
1. Initialize optml_sol
2. sol ← Construction(data points);
3. best_sol ← NBSearch(sol);
4. if cost(sol) > cost(optml_sol)
5.    optml_sol ← sol;
6. end if
7. until Termination criterion;
8. return optml_sol;

Figure 3: NS Approach procedure

Procedure HDMNS()
1. Initialize sol, optimal_sol, Mined_sol
2. Initialize sol_space
3. Initialize USS, FIS, UFIS
4. Read list, p
5. sol ← NSApproach(list, p)
6. USS ← Update_sol_space(sol)
7. FIS ← Generate_frequent_items(USS)
8. UFIS ← Update_supportcount(FIS, USS)
9. Mined_sol ← Generate_mined_solution(UFIS)
10. Update optimal_sol

Figure 4: Hybridized Data Mining NS algorithm
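One way to read the neighbourhood search phase is as a swap-based local search: each median is tentatively replaced by a nearby point, and the swap is kept whenever it lowers the total distance. The sketch below is written under that assumption. The paper does not define the neighbourhood precisely, so taking the `size` closest points to a median as its neighbourhood, and all the names used, are choices made for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cost(points: Sequence[Point], medians: Sequence[int]) -> float:
    """Total distance from every point to its closest median."""
    return sum(min(math.dist(pt, points[m]) for m in medians) for pt in points)


def neighbours(points: Sequence[Point], idx: int, size: int) -> List[int]:
    """Indices of the `size` points closest to points[idx], excluding idx."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:size]


def ns_approach(points: Sequence[Point], initial: Sequence[int],
                size: int = 3) -> List[int]:
    """Swap medians with nearby points for as long as the cost decreases."""
    sol = list(initial)
    best = cost(points, sol)
    improved = True
    while improved:
        improved = False
        for pos in range(len(sol)):
            for cand in neighbours(points, sol[pos], size):
                if cand in sol:
                    continue  # already a median
                trial = sol[:pos] + [cand] + sol[pos + 1:]
                trial_cost = cost(points, trial)
                if trial_cost < best:  # accept only strict improvements
                    sol, best, improved = trial, trial_cost, True
    return sol


line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
        (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
result = ns_approach(line, initial=[0, 1])
print(result, cost(line, result))  # [4, 1] 4.0
```

Starting from two medians in the same group (cost 31), the search moves one median to the centre of the other group and stops at the middle point of each group (cost 4). Because only strict improvements are accepted and the number of solutions is finite, the loop terminates.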

II. HYBRIDIZED DATA MINING NEIGHBOURHOOD SEARCH APPROACH AND HYBRID MULTIPLE DATA MINING BASED NEIGHBORHOOD SEARCH

The HDMNS() procedure shown in Figure 4 consists of three phases. The first phase, NSApproach(), computes the initial solution using the NS approach for the given list and the user-specified p [18]. The second phase applies the DM technique to this basic feasible solution. The solution obtained in the first phase is the input to the second phase, where it is used to generate the updated solution space (USS), which consists of all the probable solutions generated from the result of NSApproach(). From the USS, the frequent item set (FIS) is generated; it consists of all the individual items present in the USS [4, 5, 9]. Next, the support count is calculated and updated for each item in the FIS. The frequent items are then sorted in decreasing order of support count, and the mined solution is constructed by taking the items with the highest support counts until the number of items in the solution equals p. The final phase obtains the optimal solution: it is updated using the enhancement phase described in NSApproach, which searches for the global optimal solution that optimizes the objective of the given p-median problem [13, 14].

HMDMNS, described in Figure 5, is the metaheuristic based on neighbourhood search (NS) hybridized with the data mining technique (HDMNS), with frequent mining applied iteratively multiple times so that each iteration provides an updated solution to the p-median problem if one exists [6]. The local optimal solution resulting from the NS method serves as the basis for identifying a feasible solution space that holds different possible solutions of similar size; applying the frequent mining technique to it identifies the frequent items. Based on the support count, the most promising solution is identified. In HDMNS the frequent mining technique is applied only once and yields a better solution. In HMDMNS it is applied repeatedly, motivated by the observation that a single application of mining yields a better solution, so repeated application is expected to improve the solution further.
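The mining phase described above (count the support of each item over the solution space, sort by decreasing support, keep the top p items) can be sketched as follows. How the solution space is generated is not spelled out in the text, so it is passed in as a caller-supplied function `build_space`; that function, the `cost` callback and all names here are hypothetical. With `rounds=1` the loop corresponds to the single mining pass of HDMNS, and with `rounds > 1` to the repeated passes of HMDMNS.

```python
from collections import Counter
from typing import Callable, Iterable, Sequence, Tuple

Solution = Tuple[int, ...]


def mine_solution(solution_space: Iterable[Sequence[int]], p: int) -> Solution:
    """Return the p items with the highest support count in the space."""
    support: Counter = Counter()
    for sol in solution_space:
        support.update(set(sol))  # an item counts once per solution
    # Decreasing support; ties broken by item index for determinism.
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    return tuple(sorted(item for item, _ in ranked[:p]))


def hmdmns(initial: Sequence[int],
           build_space: Callable[[Solution], Iterable[Sequence[int]]],
           cost: Callable[[Solution], float],
           p: int, rounds: int) -> Solution:
    """Repeat the mining step `rounds` times, keeping the cheapest solution."""
    best = tuple(sorted(initial))
    for _ in range(rounds):
        mined = mine_solution(build_space(best), p)
        if cost(mined) < cost(best):  # update the optimal solution
            best = mined
    return best


space = [(0, 3, 5), (0, 3, 7), (0, 4, 5), (1, 3, 5)]
print(mine_solution(space, 3))  # (0, 3, 5)
```

In the example, items 0, 3 and 5 each appear in three of the four solutions, while 1, 4 and 7 appear once, so the mined solution is (0, 3, 5).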
Procedure HMDMNS()
1. Initialize sol, optimal_sol, Mined_sol
2. Initialize sol_space
3. Initialize USS, FIS, UFIS
4. Read list, p
5. sol ← NSApproach(list, p)
6. Repeat
7.    USS ← Update_sol_space(sol)
8.    FIS ← Generate_frequent_items(USS)
9.    UFIS ← Update_supportcount(FIS, USS)
10.   Mined_sol ← Generate_mined_solution(UFIS)
11.   Update optimal_sol
12. Go to step 6 until user specified times

Figure 5: Hybridized Data Mining NS algorithm

Procedure k-means(data points)
1. Initialize k points as cluster centers
2. Assign each data point to the nearest cluster center
3. Recompute the center of each cluster as the mean of the cluster
4. Repeat steps 2 and 3 until there is no more change in the value of the means

Figure 6: K-Means algorithm

III. HDMNS AND HMDMNS APPROACHES AS CLUSTERING METHODS, K-MEANS

One of the most popular heuristics for solving clustering problems is the k-means clustering algorithm (CA), which is simple and widely used [2]. This algorithm partitions the data into k disjoint clusters. The center of each cluster is called its centroid. The objects are partitioned so as to minimize the sum of the squared distances between the centroids of the clusters and their objects. The k-means algorithm is described in Figure 6. The cluster quality computed for k-means is the same as the objective function value of the p-median problem. The solution approaches proposed for the p-median problem can therefore also be regarded as new clustering mechanisms that are more effective than k-means. The cluster quality for the k-means algorithm is estimated by the squared-error criterion, given by

$$E = \sum_{i=1}^{k} \sum_{p \in C_i} \lvert p - m_i \rvert^2$$

where E is the sum of the squared error for all objects in the database, p is the point in space representing a given object, and m_i is the mean of cluster C_i. This criterion makes the resulting k clusters as compact as possible.

IV. RESULTS

The experimental results obtained for k-Mean-GRASP and k-means are presented in this section, and the results are compared on the basis of solution quality against k. Experiments are conducted on data sets with 50, 75 and 100 points. Results are tabulated and graphs are plotted. The data sets under study are taken from the web site of Professor Eric Taillard, University of Applied Sciences of Western Switzerland. The companion website for p-median problem instances is http://mistic.heig-vd.ch/taillard/problemes.dir/location.html. The quality of the solution (cluster), i.e., the sum of the distances from each customer location to its closest facility (cluster center), is measured for both the k-means algorithm and the hybridized k-Mean-GRASP algorithm.
In Figure 7 the solution (cluster) quality of k-Means and k-Mean-GRASP is compared for the data set of size 50, with the number of facility locations (cluster centers) incremented in steps of 5. In Figure 8 the same comparison is made for the data set of size 75, with the number of facility locations (cluster centers) incremented in steps of 10.
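As a small worked illustration of the squared-error criterion from Section III, the following sketch computes E for a given assignment of two-dimensional points to clusters. The function names and the toy data are assumptions made for this example and are not taken from the experiments.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def cluster_means(points: Sequence[Point],
                  labels: Sequence[int]) -> Dict[int, Point]:
    """Mean m_i of every cluster C_i, keyed by cluster label."""
    groups: Dict[int, List[Point]] = {}
    for pt, lab in zip(points, labels):
        groups.setdefault(lab, []).append(pt)
    return {lab: (sum(x for x, _ in members) / len(members),
                  sum(y for _, y in members) / len(members))
            for lab, members in groups.items()}


def squared_error(points: Sequence[Point], labels: Sequence[int]) -> float:
    """E = sum over clusters of squared distances from points to the mean."""
    means = cluster_means(points, labels)
    return sum((x - means[lab][0]) ** 2 + (y - means[lab][1]) ** 2
               for (x, y), lab in zip(points, labels))


pts = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (10.0, 4.0)]
labs = [0, 0, 1, 1]
print(squared_error(pts, labs))  # 10.0
```

The cluster means are (1, 0) and (10, 2), so the four points contribute 1 + 1 + 4 + 4 = 10.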

Figure 7: k-Means vs. k-Mean-GRASP

Figure 8: k-Means vs. k-Mean-GRASP

The experimental results obtained for the NS approach, HDMNS and HMDMNS are analyzed in this section, and the results are evaluated on the basis of solution quality against p. Experiments are carried out on data sets with 15, 25, 50 and 75 points. Results are tabulated and graphs are plotted. The data sets under study are taken from the web site of Professor Eric Taillard, University of Applied Sciences of Western Switzerland. The associated website for p-median problem instances is http://mistic.heig-vd.ch/taillard/problemes.dir/location.html. In Graph 1 the cluster quality of the HDMNS, HMDMNS and k-means algorithms is compared for n = 25, with p varied in intervals of 5. HMDMNS is found to produce better-quality clusters than the other two.


The same is observed in the remaining graphs plotted for cluster quality with n = 50, 15 and 75 and p = 15, 3 and 20 respectively, in Graphs 2, 3 and 4. Execution times are also compared for all three algorithms for p = 5, 15 and 3, as depicted in Graphs 5 and 6, and it is found that HMDMNS can be used as a better clustering algorithm than the existing k-means algorithm.

V. CONCLUSION

It is observed that in all the test cases the Hybrid Multiple Data Mining Neighbourhood Search (HMDMNS) metaheuristic performs much better as a clustering algorithm (CA) than the existing k-means method. HMDMNS can therefore be used as a new clustering mechanism in place of k-means.

REFERENCES

[1] R. Agrawal and R. Srikant, Fast algorithms for mining association rules, Proceedings of the Very Large Data Bases Conference, pp. 487-499, 1994.
[2] T. A. Feo and M. G. C. Resende, A probabilistic heuristic for a computationally difficult set covering problem, Operations Research Letters, 8 (1989), pp. 67-71.
[3] M. D. H. Gamal and S. Salhi, A cellular heuristic for the multisource Weber problem, Computers & Operations Research, 30 (2003), pp. 1609-1624.
[4] B. Goethals and M. J. Zaki, Advances in Frequent Itemset Mining Implementations: Introduction to FIMI'03, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2003.
[5] G. Grahne and J. Zhu, Efficiently using prefix-trees in mining frequent itemsets, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2003.
[6] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2nd Ed., Morgan Kaufmann Publishers, 2006.
[7] O. Kariv and S. L. Hakimi, An algorithmic approach to network location problems, part II: the p-medians, SIAM Journal on Applied Mathematics, 37 (1979), pp. 539-560.
[8] N. Mladenovic, J. Brimberg, P. Hansen and Jose A. Moreno-Perez, The p-median problem: A survey of metaheuristic approaches, European Journal of Operational Research, 179 (2007), pp. 927-939.
[9] S. Orlando, P. Palmerini and R. Perego, Adaptive and resource-aware mining of frequent sets, Proceedings of the IEEE International Conference on Data Mining, pp. 338-345, 2002.
[10] M. H. F. Ribeiro, V. F. Trindade, A. Plastino and S. L. Martins, Hybridization of GRASP metaheuristic with data mining techniques, Proceedings of the ECAI Workshop on Hybrid Metaheuristics, pp. 69-78, 2004.
[11] E. G. Talbi, A taxonomy of hybrid metaheuristics, Journal of Heuristics, 8 (2002), pp. 541-564.
[12] B. C. Tansel, R. L. Francis and T. J. Lowe, Location on networks: A survey, Management Science, 29 (1983), pp. 482-511.
[13] Mohd Belal Al-Zoubi, Ahmed Sharieh, Nedal Al-Hanbali and Ali Al-Dahoud, A Hybrid Heuristic Algorithm for Solving the P-Median Problem, Journal of Computer Science (Special Issue), pp. 80-83, 2005, Science Publications.
[14] Alexandre Plastino, Eric R. Fonseca, Richard Fuchshuber, Simone de L. Martins, Alex A. Freitas, Martino Luis and Said Salhi, A Hybrid Datamining Metaheuristic for the p-median problem, Proceedings of SIAM journal (Data Mining), 2009.
[15] D. Srinivas Reddy, Dr. A. Govardhan and S.S.V.N. Sharma, A Nodal Relational Approach to Data Virtualization, International Conference on Cloud Computing and E-governance (Bangkok, Thailand).
[16] D. Srinivas Reddy, Dr. A. Govardhan and S.S.V.N. Sharma, Metaheuristic Approach based on Neighborhood Search for Solving p-Median Problem, IOSR Journal of Computer Engineering (IOSRJCE), Volume 7, Issue 1 (Nov-Dec 2012), pp. 01-05.
[17] D. Srinivas Reddy, Dr. A. Govardhan and S.S.V.N. Sharma, Hybrid K-mean GRASP for Partition based Clustering of Two Dimensional Data Space as an application of p-Median Problem, International Journal of Engineering Sciences Research (IJESR), Volume 3, Issue 12, December 2012.
[18] D. Srinivas Reddy, Dr. A. Govardhan and S.S.V.N. Sharma, Hybridization of Neighborhood Search Metaheuristic with Data Mining Technique to solve p-median Problem, International Journal of Computational Engineering Research (IJCER), Vol. 2, Issue 7, November 2012.
[19] R. Lakshman Naik, D. Ramesh and B. Manjula, Instances Selection using Advance Data Mining Techniques, International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 2, 2012, pp. 47-53, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[20] M. Karthikeyan, M. Suriya Kumar and Dr. S. Karthikeyan, A Literature Review on the Data Mining and Information Security, International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 1, 2012, pp. 141-146, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
