

CHAPTER 3

CLUSTERING APPROACHES ON VLSI CIRCUIT PARTITIONING

3.1. Overview

Circuit partitioning is one of the critical areas of VLSI physical design automation. The principal function of VLSI circuit partitioning is to divide the circuit into a number of sub-circuits with minimum interconnections between them. Designing complex logic circuits requires sub-division of a multi-million transistor design into manageable pieces within a hierarchy. The presence of hierarchy gives rise to natural clusters of cells. However, most of the widely used algorithms tend to ignore this fact and divide the netlist into balanced partitions, resulting in partitions which are not optimal.

This chapter begins with an overview of existing approaches to the graph and hypergraph partitioning problems. Since graph and hypergraph partitioning are NP-complete, this study considers the need for clustering models that yield good sub-optimal solutions. To address this problem, a clustering-based approach to the VLSI circuit partitioning problem is proposed, in which sub-circuits with the lowest amount of interconnection between them are sought. Finally, three different data clustering methods, namely two-step clustering, hierarchical clustering and K-medoids clustering, were considered for dividing the circuits into sub-circuits.

3.2. Clustering Methods

Clustering is one of the techniques for mining data from data warehouses. In the clustering process, a set of objects is partitioned into subsets, also known as clusters, such that objects are similar within a cluster but dissimilar to objects in different subsets [45]. A cluster is considered more distinct when the objects within it are very similar (that is, highly homogeneous) and very dissimilar to those in other groups. Data represented by a few clusters inevitably lose finer detail but, at the same time, achieve simplification. Clustering techniques are used in the analysis of gene expression data [46], data compression [47], anomaly detection [48], statistical data analysis, pattern recognition, machine learning, image analysis, bio-informatics, information retrieval [49] and structuring the results of search engines [50].

In addition to the term clustering, several other terms with similar meanings are used, such as numerical taxonomy, automatic classification, botryology and typological analysis. Clustering is thus a form of unsupervised learning that results in a concept of the data [51]-[54]. Cluster analysis is not one specific algorithm; the solution can be obtained by a variety of algorithms that differ significantly in their notion of what constitutes a cluster and in how efficiently clusters can be found.

The most common notions of a cluster include groups whose members are separated by short distances, dense areas of the data space, and groups following particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The intended use of the individual data set dictates the type of algorithm and the parameters used. Moreover, clustering is an iterative process of knowledge discovery and interactive multi-objective optimization that proceeds by trial and error; as a consequence, it is not an automatic process. Preprocessing and parameters are modified until the desired properties are achieved. Clustering methods are usually based on measuring distances between records and clusters. Records are assigned to clusters using a methodology that tends to reduce the distance between records that belong to the same cluster. The clusters, subsets or segments created through the clustering models are then used as inputs for subsequent analyses.

According to Duda et al. [55], clustering helps to reduce the amount of data by grouping or categorizing similar data into one group. The main objective of using clustering algorithms is to construct automated tools that support the creation of categories or taxonomies. The automation in turn helps to reduce human intervention in the process [56], [57].

Clustering methods are usually classified into two basic types: hierarchical and partitional [53], [56]-[61]. Both types contain many subtypes, and entirely different algorithms are used to find the clusters. Figure 3.1 illustrates the clustering methods.

Figure 3.1: Clustering methods



In data mining, hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. The strategies for hierarchical clustering generally fall into the following two types (Figure 3.2):

 Agglomerative [58]: Merges smaller clusters into larger ones with a “bottom-up” approach, where each observation starts in its own cluster and pairs of clusters are merged as one moves up the hierarchy.

 Divisive: Splits larger clusters into smaller ones with a “top-down” approach, where all observations start in one cluster and splits are performed recursively as one moves down the hierarchy.

Figure 3.2: Agglomerative and divisive clustering

These methods produce a tree of clusters, commonly called a dendrogram, that reveals the way in which the clusters are related to each other. There are three main linkage criteria for hierarchical algorithms: the single-linkage method, where two clusters are merged based on the closest pair made up of one individual from each cluster; the complete-linkage method, where two clusters are merged based on the most distant such pair; and the group-average method, where two clusters are merged based on the average distance over all pairs consisting of one individual from each cluster.

Partitional clustering [58], on the other hand, directly decomposes the data set into a set of smaller disjoint clusters. In this approach, an integer number of partitions is determined so as to optimize a certain criterion function. The optimization is an iterative process which highlights the local or global structure of the data. Generally, the global criteria involve minimizing some measure of dissimilarity among the samples within each cluster while maximizing the dissimilarity between different clusters.

In K-means clustering [62], the criterion function is calculated as the average squared distance of the data items from their nearest cluster centroids,

E = (1/N) * Σ_{k=1}^{N} || x_k − m_{c(x_k)} ||²

where x_k is the k-th of the N data items and c(x_k) is the index of the centroid that is closest to x_k. One possible algorithm for minimizing this cost function begins by initializing a set of K cluster centroids denoted by m_i, i = 1, ..., K.

The positions of the m_i are then adjusted iteratively by first assigning each data sample to its nearest centroid and then recomputing the respective centroids. The iteration stops when E no longer changes. In an alternative, on-line version of the algorithm, randomly chosen samples are considered one at a time and only the nearest centroid is updated. The same equation is also used to represent the objective of a related method called vector quantization [63]-[65].
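The following C++ sketch illustrates this batch K-means procedure. It is a minimal illustration rather than the implementation used in this study; the toy data set, the value of K, the initialization and the convergence tolerance are assumptions for demonstration.

// Minimal batch K-means sketch (illustrative only; the point set, K and
// the convergence tolerance below are assumptions for demonstration).
#include <cmath>
#include <cstdio>
#include <vector>

struct Point { double x, y; };

static double sqDist(const Point& a, const Point& b) {
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

int main() {
    // Toy data set: two loose groups of 2-D points.
    std::vector<Point> data = {{1.0, 1.1}, {0.9, 0.8}, {1.2, 1.0},
                               {8.0, 8.2}, {7.8, 7.9}, {8.3, 8.1}};
    const int K = 2;
    std::vector<Point> m(K);                       // centroids m_i
    for (int i = 0; i < K; ++i) m[i] = data[i];    // simple initialization

    std::vector<int> c(data.size(), 0);            // c(x_k): index of nearest centroid
    double prevE = -1.0;
    for (int iter = 0; iter < 100; ++iter) {
        // Assignment step: attach every sample to its nearest centroid.
        double E = 0.0;
        for (size_t k = 0; k < data.size(); ++k) {
            int best = 0;
            double bestD = sqDist(data[k], m[0]);
            for (int i = 1; i < K; ++i) {
                double d = sqDist(data[k], m[i]);
                if (d < bestD) { bestD = d; best = i; }
            }
            c[k] = best;
            E += bestD;
        }
        E /= data.size();                          // average squared distance (criterion E)

        // Update step: recompute each centroid as the mean of its members.
        std::vector<Point> sum(K, {0.0, 0.0});
        std::vector<int> cnt(K, 0);
        for (size_t k = 0; k < data.size(); ++k) {
            sum[c[k]].x += data[k].x;
            sum[c[k]].y += data[k].y;
            ++cnt[c[k]];
        }
        for (int i = 0; i < K; ++i)
            if (cnt[i] > 0) m[i] = {sum[i].x / cnt[i], sum[i].y / cnt[i]};

        // Stop when E no longer changes.
        if (prevE >= 0.0 && std::fabs(prevE - E) < 1e-9) break;
        prevE = E;
    }
    for (int i = 0; i < K; ++i)
        std::printf("centroid %d: (%.2f, %.2f)\n", i, m[i].x, m[i].y);
    return 0;
}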

In these clustering methods, interpreting the clusters is considered difficult and is perceived as one of the major problems of clustering. Most available clustering algorithms favour certain predefined cluster shapes, and such algorithms will always assign the data to clusters of those shapes even if no such clusters are present in the data. When the goal is to compress the data set as well as to make inferences about its cluster structure, it is therefore important to ascertain beforehand whether the data set displays a clustering tendency [58].

The number of clusters used is also important, since different kinds of clusters may be obtained if K is changed. Cluster centroids must be initialized carefully, otherwise some clusters may even be left empty if their centroids initially lie far from the distribution of the data.

Though clustering is used to induce categorization and reduce the amount of data, the resulting categories have only limited value on their own. It is essential that the clusters are analyzed in a manner that makes the underlying concepts understandable. Shih et al. [66] gave an example with the K-means algorithm, wherein additional illustration methods are required to visualize a cluster whose centroid is defined over high-dimensional variables.

Further, hierarchical clustering algorithms may not be suitable for clustering data sets containing categorical attributes [58].

3.3. Two-Step Cluster Method

Many clustering algorithms have been proposed to combine smaller groups of data into larger clusters in various domains. These clustering methods have been found to be highly effective for purely numeric data or for purely categorical data. However, they do not perform well on mixed numeric and categorical data types [62].

The two-step clustering method was first proposed to find clusters in mixed numeric and categorical data and was designed to analyze large datasets [67]. In the two-step method, data items described by categorical attributes are first examined to establish similarity and are then converted into numeric attributes based on the constructed relationship. The two-step method differs from traditional clustering methods in that it handles both continuous and categorical variables and automatically determines the optimum number of clusters. It works on the principle of building sub-clusters and then clusters in a sequential manner.

The two-step clustering method uses a two-stage approach in which the algorithm first performs a procedure very similar to the K-means algorithm (Figure 3.3). The algorithm then conducts a modified hierarchical agglomerative clustering procedure, combining the objects sequentially to form homogeneous clusters; the pre-clustering stage is supported by a cluster feature tree whose “leaves” represent distinct objects in the dataset [68]-[70].

Figure 3.3: Two-step clustering (Step 1: Pre-Clustering; Step 2: Outlier Handling, optional; Step 3: Clustering)
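A highly simplified C++ sketch of this two-stage idea is shown below: a sequential leader-style pass forms small sub-clusters (standing in for the cluster feature tree), and an agglomerative pass then merges the sub-cluster centroids. The thresholds, data and target cluster count are assumptions for illustration; the actual two-step algorithm (for example, as implemented in SPSS) is considerably more elaborate.

// Simplified two-stage ("two-step") clustering sketch: a leader-style
// pre-clustering pass followed by agglomerative merging of sub-clusters.
// Thresholds and data are illustrative assumptions only.
#include <cmath>
#include <cstdio>
#include <vector>

struct Pt { double x, y; };
struct Sub {
    Pt sum{0, 0};
    int n = 0;
    Pt centroid() const { return {sum.x / n, sum.y / n}; }
};

static double dist(const Pt& a, const Pt& b) { return std::hypot(a.x - b.x, a.y - b.y); }

int main() {
    std::vector<Pt> data = {{1, 1}, {1.2, 0.9}, {0.8, 1.1}, {5, 5}, {5.1, 4.9},
                            {9, 1}, {9.2, 1.1}, {8.9, 0.8}, {5.2, 5.2}};
    // Step 1: pre-clustering. Assign each point to the nearest existing
    // sub-cluster if it is within 'threshold', otherwise start a new one.
    const double threshold = 1.0;
    std::vector<Sub> subs;
    for (const Pt& p : data) {
        int best = -1; double bestD = threshold;
        for (size_t i = 0; i < subs.size(); ++i) {
            double d = dist(p, subs[i].centroid());
            if (d <= bestD) { bestD = d; best = (int)i; }
        }
        if (best < 0) { subs.push_back({}); best = (int)subs.size() - 1; }
        subs[best].sum.x += p.x; subs[best].sum.y += p.y; ++subs[best].n;
    }
    // Step 2: agglomerative clustering of the sub-cluster centroids down
    // to a requested number of final clusters.
    const size_t finalK = 3;
    while (subs.size() > finalK) {
        size_t a = 0, b = 1;
        double bestD = dist(subs[0].centroid(), subs[1].centroid());
        for (size_t i = 0; i < subs.size(); ++i)
            for (size_t j = i + 1; j < subs.size(); ++j) {
                double d = dist(subs[i].centroid(), subs[j].centroid());
                if (d < bestD) { bestD = d; a = i; b = j; }
            }
        subs[a].sum.x += subs[b].sum.x; subs[a].sum.y += subs[b].sum.y; subs[a].n += subs[b].n;
        subs.erase(subs.begin() + b);
    }
    for (size_t i = 0; i < subs.size(); ++i) {
        Pt c = subs[i].centroid();
        std::printf("cluster %zu: centroid (%.2f, %.2f), size %d\n", i, c.x, c.y, subs[i].n);
    }
    return 0;
}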



In traditional methods, every attribute in the clustering algorithm is treated as a single entity without considering the relationships among attributes. In order to overcome this inefficiency, Shih et al. [66] proposed a two-step method that integrates hierarchical and partitional clustering algorithms by adding attributes to cluster objects. They believe that their method can be used to cluster mixed numeric and categorical data.

The two-step clustering method has been used in various fields in the past. Schiopu [71] proposed the use of the SPSS two-step clustering method to analyze information about the customers of a bank, dividing them into three clusters.

3.4. Agglomerative Clustering Algorithm

Agglomerative clustering has been used as a clustering strategy since the 1950s [72], [73], and subsequently found its way into biological taxonomy for classifying organisms [65]. As discussed earlier, agglomerative clustering is a bottom-up clustering process. Every input object initially forms its own cluster, and the two closest clusters are repeatedly combined until only one cluster remains, creating a hierarchy of clusters.

To define an agglomerative strategy, it is essential to specify the distance measure between clusters. Three linkage strategies are commonly used to form agglomerative clusters, as illustrated in the sketch below. In the single-linkage strategy, the distance between two clusters is defined as the distance between their closest pair of data objects. In the complete-linkage strategy, the distance between two clusters is defined as the distance between their furthest pair of data objects. In the average-linkage strategy, the distance is defined as the average distance between data objects from the two clusters [74].
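The following C++ sketch, assuming a small set of one-dimensional points and a user-chosen linkage, shows how the three linkage criteria plug into the same agglomerative merge loop; it is a naive O(n³) illustration rather than an optimized implementation.

// Naive agglomerative clustering sketch with selectable linkage
// (single, complete, average). Data and linkage choice are illustrative.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

enum class Linkage { Single, Complete, Average };

// Distance between two clusters of 1-D points under the chosen linkage.
static double clusterDist(const std::vector<double>& A,
                          const std::vector<double>& B, Linkage link) {
    double best = (link == Linkage::Single) ? 1e300 : 0.0, sum = 0.0;
    for (double a : A)
        for (double b : B) {
            double d = std::fabs(a - b);
            if (link == Linkage::Single)   best = std::min(best, d);
            if (link == Linkage::Complete) best = std::max(best, d);
            sum += d;
        }
    return (link == Linkage::Average) ? sum / (A.size() * B.size()) : best;
}

int main() {
    Linkage link = Linkage::Average;   // pick any of the three strategies
    std::vector<std::vector<double>> clusters = {{1.0}, {1.3}, {2.1}, {7.8}, {8.2}, {15.0}};

    // Merge the two closest clusters until only one remains, printing the
    // merge order (this order defines the dendrogram).
    while (clusters.size() > 1) {
        size_t bi = 0, bj = 1;
        double bestD = clusterDist(clusters[0], clusters[1], link);
        for (size_t i = 0; i < clusters.size(); ++i)
            for (size_t j = i + 1; j < clusters.size(); ++j) {
                double d = clusterDist(clusters[i], clusters[j], link);
                if (d < bestD) { bestD = d; bi = i; bj = j; }
            }
        std::printf("merge clusters %zu and %zu at distance %.2f\n", bi, bj, bestD);
        clusters[bi].insert(clusters[bi].end(), clusters[bj].begin(), clusters[bj].end());
        clusters.erase(clusters.begin() + bj);
    }
    return 0;
}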

3.5. Divisive Clustering Algorithm

Divisive clustering repeatedly divides large clusters into smaller sub-clusters until it reaches a specified stopping point. The divisive clustering algorithm represents a top-down approach, which is conceptually more complex than the bottom-up approach (Figure 3.4). The divisive approach uses an efficient flat-clustering algorithm as a subroutine. It starts from the top, with all the documents in one cluster. This cluster is split using the flat-clustering algorithm, and splitting is then applied recursively to the resulting clusters until each document is in its own cluster or a stopping criterion is met. The criteria used to stop the analysis include a specific number of iterations, a maximum number of levels to which the data set is divided, and a minimum required number of instances for further partitioning; a simplified sketch of this recursive splitting is given after Figure 3.4.

Figure 3.4: Divisive clustering tree
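As a rough illustration of this top-down idea, the C++ sketch below recursively bisects a set of one-dimensional points using a tiny 2-means flat-clustering subroutine until clusters fall below a minimum size; the data, the split procedure and the minimum-size stopping criterion are assumptions for demonstration only.

// Divisive (top-down) clustering sketch: recursively bisect a cluster with
// a simple 2-means flat-clustering subroutine until a minimum size is reached.
// Data and the minimum-size stopping criterion are illustrative assumptions.
#include <cmath>
#include <cstdio>
#include <vector>

// One flat-clustering step: split 'pts' into two groups around two centers.
static void bisect(const std::vector<double>& pts,
                   std::vector<double>& left, std::vector<double>& right) {
    double c0 = pts.front(), c1 = pts.back();      // crude initialization
    for (int iter = 0; iter < 20; ++iter) {
        left.clear(); right.clear();
        for (double p : pts)
            (std::fabs(p - c0) <= std::fabs(p - c1) ? left : right).push_back(p);
        double s0 = 0, s1 = 0;
        for (double p : left)  s0 += p;
        for (double p : right) s1 += p;
        if (!left.empty())  c0 = s0 / left.size();
        if (!right.empty()) c1 = s1 / right.size();
    }
}

static void divisive(const std::vector<double>& pts, int level, size_t minSize) {
    std::printf("level %d: cluster of %zu point(s)\n", level, pts.size());
    if (pts.size() <= minSize) return;             // stopping criterion
    std::vector<double> left, right;
    bisect(pts, left, right);
    if (left.empty() || right.empty()) return;     // cannot split further
    divisive(left, level + 1, minSize);
    divisive(right, level + 1, minSize);
}

int main() {
    std::vector<double> data = {1.0, 1.2, 2.0, 7.5, 8.0, 8.4, 15.0, 15.5};
    divisive(data, 0, /*minSize=*/2);
    return 0;
}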

The divisive method can be used even when a complete hierarchy down to individual document leaves is not generated. In comparison with agglomerative clustering algorithms, which require at least quadratic time, this method runs faster. Further, agglomerative clustering makes decisions based on local patterns and does not consider the global distribution in the initial stages; once a decision is made, it cannot be undone. In the case of divisive clustering, partitioning decisions are taken based on complete information about the global distribution and hence produce better results [75].

3.6. Hierarchical Clustering Method

Partitioning methods are based on specifying an initial number of groups and iteratively reallocating objects among groups until convergence. In contrast, hierarchical methods combine or divide existing groups, creating a hierarchical structure that reflects the order in which groups are merged or divided. Hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types:

 Agglomerative: This is a "bottom-up" approach. Each observation starts in its

own cluster, and pairs of clusters are merged as one moves up the hierarchy.

 Divisive: This is a "top-down" approach. All observations start in one cluster, and

splits are performed recursively as one moves down the hierarchy.

In general, the merges and splits are determined in a greedy manner. The results of hierarchical clustering are usually presented in a dendrogram. In general, the time complexity of agglomerative clustering is O(n³), which makes it too slow for large datasets.

3.7. Agglomerative Hierarchical Clustering Method

In agglomerative hierarchical clustering, each item x1, ..., xn initially lies in its own cluster C1, ..., Cn. The two nearest clusters, say Ci and Cj, are then merged, and this is repeated until only one cluster is left, resulting in a cluster tree. The cluster tree can be cut at any level to produce a new set of clusters; a small sketch illustrating such a cut follows Figure 3.5. Figure 3.5 illustrates an example of the agglomerative hierarchical clustering method.

Figure 3.5: Agglomerative hierarchical clustering method
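The C++ sketch below illustrates cutting the cluster tree at a chosen level: nearest clusters are merged only while their distance stays below a cut height, which yields the clustering that a dendrogram cut at that height would produce. The data, the single-linkage distance and the cut height are assumptions for illustration.

// Sketch of "cutting" the agglomerative cluster tree at a chosen level:
// nearest clusters are merged only while their distance stays below the
// cut threshold, yielding the clustering that a dendrogram cut at that
// height would produce. Data and threshold are illustrative assumptions.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    std::vector<std::vector<double>> clusters = {{1.0}, {1.4}, {2.0}, {7.0}, {7.5}, {14.0}};
    const double cutHeight = 3.0;                 // level at which the tree is cut

    while (clusters.size() > 1) {
        size_t bi = 0, bj = 1;
        double bestD = 1e300;
        for (size_t i = 0; i < clusters.size(); ++i)
            for (size_t j = i + 1; j < clusters.size(); ++j) {
                // single-linkage distance between clusters i and j
                double d = 1e300;
                for (double a : clusters[i])
                    for (double b : clusters[j]) d = std::min(d, std::fabs(a - b));
                if (d < bestD) { bestD = d; bi = i; bj = j; }
            }
        if (bestD > cutHeight) break;             // stop: this merge lies above the cut
        clusters[bi].insert(clusters[bi].end(), clusters[bj].begin(), clusters[bj].end());
        clusters.erase(clusters.begin() + bj);
    }
    std::printf("clusters after cutting at height %.1f: %zu\n", cutHeight, clusters.size());
    for (size_t i = 0; i < clusters.size(); ++i) {
        std::printf("cluster %zu:", i);
        for (double v : clusters[i]) std::printf(" %.1f", v);
        std::printf("\n");
    }
    return 0;
}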

The properties of the hierarchy within the final clustering are listed below [76]:

 The clusters that are generated in the early stages are nested with those generated

in the later stages.

 Clusters with different sizes in the tree can be valuable for geographic knowledge

discovery.

Advantages

 It can produce an ordering of the objects, which may be informative for data

display.

 Smaller clusters are generated, which may be helpful for discovery.



Disadvantages

 No provision can be made for a relocation of objects that may have been

incorrectly grouped at an early stage. The result should be examined closely to

ensure it makes sense.

 Use of different distance metrics for measuring distances between clusters may

generate different results. Performing multiple experiments and comparing the

results is recommended to support the veracity of the original results.

3.8. Divisive Hierarchical Clustering Method

The divisive hierarchical clustering method is a top-down clustering method which is not often used. The principle of the divisive approach is similar to that of agglomerative clustering, but it works in the opposite direction. The method starts with a single cluster containing all objects and then successively splits the resulting clusters until only clusters of individual objects remain. Figure 3.6 illustrates an example of the divisive clustering method.

Figure 3.6: Divisive hierarchical clustering method



3.9. K-Medoids Clustering Method

The K-medoids algorithm is a clustering algorithm related to the K-means algorithm and the medoid shift algorithm. In the case of the K-means algorithm, only a locally optimal solution is attained in terms of clustering quality; moreover, its evolution depends highly on the initial positions of the centroids. The K-medoids method addresses this problem by using medoids rather than centroids to represent the clusters. A medoid is the most centrally located data object in a cluster. Here, k data objects are selected randomly as medoids to represent k clusters, and every remaining data object is placed in the cluster whose medoid is nearest (or most similar) to it. After all data objects have been processed, a new medoid is determined for each cluster which represents the cluster better, and the entire process is repeated. All data objects are again bound to the clusters based on the new medoids. In each iteration, the medoids change their locations step by step; in other words, the medoids move in every iteration. This process continues until no medoid moves any more. As a result, k clusters are found representing the set of n data objects. Figure 3.7 illustrates the steps used in K-medoids clustering.

The K-means technique uses the centroid to represent a cluster, and it is highly sensitive to data objects that lie far from the rest of the observations (outliers). This issue is addressed by the K-medoids clustering technique, in which medoids, instead of centroids, are used to represent the clusters. A medoid is the most centrally located object in a cluster (Figure 3.7).

Figure 3.7: K-medoids clustering

Strengths

K-medoid is more robust than K-means in the presence of noise and outliers, because a

medoid is less influenced by outliers or other extreme values than a mean.
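A minimal PAM-style C++ sketch of the K-medoids idea is given below; it tries swapping each medoid with non-medoid objects and keeps a swap whenever the total distance of objects to their nearest medoid decreases. The data, the value of k and the distance measure are assumptions for illustration, not the configuration used in this study.

// Minimal K-medoids (PAM-style) sketch: iteratively try swapping medoids
// with non-medoids and keep swaps that reduce the total assignment cost.
// Data, k and the distance measure are illustrative assumptions.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

struct Pt { double x, y; };

static double dist(const Pt& a, const Pt& b) { return std::hypot(a.x - b.x, a.y - b.y); }

// Total distance of every object to its nearest medoid.
static double cost(const std::vector<Pt>& data, const std::vector<int>& medoids) {
    double total = 0.0;
    for (const Pt& p : data) {
        double best = dist(p, data[medoids[0]]);
        for (int m : medoids) best = std::min(best, dist(p, data[m]));
        total += best;
    }
    return total;
}

int main() {
    std::vector<Pt> data = {{1, 1}, {1.2, 0.8}, {0.9, 1.1}, {8, 8}, {8.1, 7.9},
                            {7.8, 8.2}, {50, 50} /* outlier */};
    const int k = 2;
    std::vector<int> medoids = {0, 3};          // initial medoids (indices into data)

    bool improved = true;
    while (improved) {                          // repeat until no medoid moves
        improved = false;
        double current = cost(data, medoids);
        for (size_t mi = 0; mi < medoids.size(); ++mi)
            for (size_t o = 0; o < data.size(); ++o) {
                // skip objects that are already medoids
                bool isMedoid = false;
                for (int m : medoids) if (m == (int)o) isMedoid = true;
                if (isMedoid) continue;

                int saved = medoids[mi];
                medoids[mi] = (int)o;           // try swapping medoid mi with object o
                double trial = cost(data, medoids);
                if (trial + 1e-12 < current) { current = trial; improved = true; }
                else medoids[mi] = saved;       // undo the swap if it does not help
            }
    }
    for (int m : medoids)
        std::printf("medoid at (%.1f, %.1f)\n", data[m].x, data[m].y);
    return 0;
}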

3.10. Results and Observation

This study attempted to identify the best clustering method for the partitioning task. The performance of K-medoids was compared with two-step and hierarchical clustering, and the results are provided in the subsequent sections.

3.10.1. Comparison of Three Clustering Methods

In order to compare the performance of the proposed approach, three clustering methods were used in this study:

1. Two-step clustering

2. Hierarchical clustering

3. K-medoids clustering

The clustering algorithms were compared based on the following factors: the size of the data set and the run time. For each factor, a test was conducted for each algorithm. The partitioning problem was applied to the circuit shown in Figure 3.8; its run-time analysis and data analysis are shown in Figures 3.9 and 3.10, respectively. Figure 3.8 shows a small VLSI circuit, which is converted into a hypergraph partitioning instance. The three clustering methods are then used to obtain sub-circuits with the lowest amount of interconnection between them (a sketch of how such interconnection counts can be computed is given below). The experiments were implemented in Visual C++ within the Visual Studio IDE environment.

Figure 3.8: Comparing clustering algorithms



Figure 3.9: Run time performance analysis of proposed clustering methods

Figure 3.10: Data analysis of proposed clustering methods
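To make the notion of interconnection between sub-circuits concrete, the C++ sketch below counts, for a small hypothetical netlist, how many hyperedges (nets) span more than one sub-circuit once every cell has been assigned to a cluster. The netlist, the cluster assignment and the use of cut nets as the interconnection measure are assumptions for illustration and do not reproduce the circuit of Figure 3.8.

// Illustrative cut-size computation for a hypergraph netlist partition:
// a net (hyperedge) contributes to the interconnection count when its
// cells are spread over more than one sub-circuit. The netlist and the
// cluster assignment below are hypothetical.
#include <cstdio>
#include <set>
#include <vector>

int main() {
    // Each net is the list of cell indices it connects.
    std::vector<std::vector<int>> nets = {
        {0, 1, 2},      // net A
        {2, 3},         // net B
        {3, 4, 5},      // net C
        {0, 5}          // net D
    };
    // Sub-circuit (cluster) assigned to each cell, e.g. by one of the
    // clustering methods discussed in this chapter.
    std::vector<int> cluster = {0, 0, 0, 1, 1, 1};

    int cut = 0;
    for (size_t n = 0; n < nets.size(); ++n) {
        std::set<int> spanned;      // distinct sub-circuits touched by the net
        for (int cell : nets[n]) spanned.insert(cluster[cell]);
        if (spanned.size() > 1) ++cut;
        std::printf("net %zu spans %zu sub-circuit(s)\n", n, spanned.size());
    }
    std::printf("interconnections (cut nets): %d\n", cut);
    return 0;
}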



3.11. Discussion

The main objective of the study was to compare K-medoids clustering with two-step and hierarchical clustering. The results were analyzed based on the size of the data set and the run time of each algorithm. The key characteristics of the K-medoids method are summarized below:

 The K-medoids method overcomes the sensitivity of K-means to centroid initialization and outliers by using medoids rather than centroids to represent the clusters.

 K data objects are selected randomly as medoids to represent k clusters, and every remaining data object is placed in the cluster whose medoid is nearest to it.

 After all data objects have been processed, a new medoid is determined for each cluster which represents the cluster better, and the entire process is repeated until no medoid moves.

 As a result, k clusters are found representing the set of n data objects.

K-medoids clustering requires less execution time, as shown in Figure 3.9, and provides more data coverage, as shown in Figure 3.10. Hence, this study shows that K-medoids clustering achieves better performance than two-step and hierarchical clustering.

3.12. Publication

Based on the research work done on “Clustering approaches on VLSI Circuit partitioning”, the following paper was published.

1. Manikandan, R. and Swaminathan, P. 2012, Comparative study of clustering methods

in VLSI circuit partitioning, International Conference on Electrical, Electronics &

Information Technology, 11th Nov 2012, Trivandrum, India.
