CHAPTER 3
3.1. Overview
Circuit partitioning is one of the critical areas of VLSI physical design automation. The principal function of VLSI circuit partitioning is to divide the circuit into a number of sub-circuits with a minimum of interconnections between them. A circuit is typically designed within a hierarchy, and the presence of hierarchy gives rise to natural clusters of cells. However, most of the widely used algorithms tend to ignore this fact and divide the netlist without exploiting this natural structure.
This chapter begins by discussing an overview of existing approaches to the graph and hypergraph partitioning problems. Since graph and hypergraph partitioning are NP-complete problems, this study takes into consideration the need for developing various clustering models that yield good sub-optimal solutions. As a solution to this problem, a different clustering approach to the VLSI circuit partitioning problem is proposed, and useful sub-circuits with the lowest amount of interconnection between them were studied to bridge the gap. Finally, three different data clustering methods, namely Two-Step, hierarchical, and K-medoids clustering, are compared on this problem.
Clustering is one of the techniques for data mining from data warehouses. In the clustering process, a set of objects is partitioned into subsets, also known as clusters, ensuring that objects are similar within a cluster but dissimilar to objects of other subsets [45]. A cluster is considered more distinct when the objects within it are very similar (i.e., have greater homogeneity) and very dissimilar from those of other groups. Data represented by a few clusters inevitably lose finer detail but, at the same time, achieve simplification. The clustering technique is used in the analysis of gene expression data [46], data compression [47], anomaly detection [48], and statistical data analysis.
In addition to the term clustering, several other terms with similar meanings are used in the literature [51]-[54]. Cluster analysis is used to solve a variety of problems, and the form of the solution depends on the application.
The most common concept of clusters involves groups whose members lie at a short distance from one another, or dense areas of the data space with particular statistical distributions. Thus, each individual data set dictates the types of algorithm and parameters used. Moreover, clustering is a multi-objective optimization that works in a trial-and-error mode; as a consequence, it is not an automatic process. Preprocessing steps and parameters are modified until the desired properties are achieved. Clustering methods are usually based on measuring distances between records and clusters. Records are assigned to their respective clusters using a methodology that tends to reduce the distance between records that belong to the same cluster. The clusters (also called subsets or segments) created through the clustering models form the output of this process.
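As a minimal illustration of this idea (the Euclidean metric and the flat feature-vector representation are assumptions made here for illustration, not necessarily the measure used in this study), a record-to-cluster-centre distance can be computed as follows:

#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean distance between a record and a cluster centre,
// both represented as feature vectors of equal length.
double euclideanDistance(const std::vector<double>& record,
                         const std::vector<double>& centre) {
    double sum = 0.0;
    for (std::size_t i = 0; i < record.size(); ++i) {
        double d = record[i] - centre[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

A record is then assigned to the cluster whose centre minimizes this distance.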
According to Duda et al. [55], clustering helps to reduce the amount of data by grouping or categorizing similar data items into one group. The main objective of using clustering algorithms is to construct automated tools that can support the creation of categories or taxonomies. The automation, in turn, helps to reduce human intervention in the categorization process.
Clustering methods are usually classified into two basic types: hierarchical and partitional [53], [56]-[61]. Both types contain many subtypes, and entirely different algorithms are used to find the clusters. Figure 3.1 illustrates the clustering methods.
Hierarchical clustering methods build a hierarchy of clusters. The strategies for hierarchical clustering generally fall into two types:
Agglomerative: Proceeds by merging smaller clusters (initially, one that places each object in its own cluster) into larger clusters with a "bottom-up" approach, where each observation starts in its own cluster and pairs of clusters are merged as one moves up the hierarchy.
Divisive: Proceeds by splitting larger groups of clusters into smaller ones with a "top-down" approach, where all observations start in one cluster and splits are performed recursively as one moves down the hierarchy.
These methods produce a tree of clusters, commonly called a dendrogram, that reveals the way in which the clusters are related to each other. There are three types of hierarchical algorithms: the single-linkage method, in which two clusters are merged based on the closest pair made up of individuals from each cluster; the complete-linkage method, in which two clusters are merged based on the most distant pair made up of individuals from each cluster; and the group-average method, in which two clusters are merged based on the average distance between all pairs of individuals drawn from the two clusters.
Partitional clustering [58], on the other hand, directly decomposes a single data set into a set of smaller disjoint clusters. In this approach, an integer number of partitions is determined and used to optimize a certain criterion function. The optimization is an iterative process that highlights the local or global structure of the data. Generally, the global criteria involve minimizing some measure of dissimilarity among the samples within each cluster while maximizing the dissimilarity between different clusters.
In K-means clustering [62], the criterion function is calculated as the average squared distance between each data sample and its closest cluster centroid:

E = (1/N) Σ_{k=1}^{N} || x_k − m_{c(x_k)} ||²

where x_k is the k-th data sample, c(x_k) is the index of the centroid that is closest to x_k, and N is the number of samples. One possible algorithm for minimizing the cost function begins by initializing a set of K cluster centroids denoted by m_i, i = 1, ..., K. The positions of the m_i are adjusted iteratively by first assigning the data samples to the nearest available clusters and then recomputing the respective centroids. The iteration stops when E no longer changes.
In an alternative algorithm, each randomly chosen sample is considered in succession, and only its nearest centroid is updated. The equation above is also used to represent the objective of a related method called vector quantization [63]-[65].
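The following is a minimal sketch of the batch K-means procedure described above (the data layout, the seeding of the centroids with the first K samples, and the convergence test are assumptions made for illustration):

#include <cstddef>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
static double sqDist(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double d = a[i] - b[i];
        s += d * d;
    }
    return s;
}

// Batch K-means: returns the cluster index c(x_k) for every sample.
// Assumes data.size() >= K; centroids are seeded with the first K
// samples (other seeding schemes are common).
std::vector<int> kMeans(const std::vector<Point>& data, int K, int maxIter = 100) {
    std::vector<Point> m(data.begin(), data.begin() + K);  // centroids m_i
    std::vector<int> c(data.size(), 0);                    // assignments c(x_k)
    for (int iter = 0; iter < maxIter; ++iter) {
        bool changed = false;
        // Assignment step: attach each sample to its nearest centroid.
        for (std::size_t k = 0; k < data.size(); ++k) {
            int best = 0;
            for (int i = 1; i < K; ++i)
                if (sqDist(data[k], m[i]) < sqDist(data[k], m[best])) best = i;
            if (best != c[k]) { c[k] = best; changed = true; }
        }
        if (!changed) break;  // E no longer changes once assignments are stable
        // Update step: recompute each centroid as the mean of its members.
        std::vector<Point> sum(K, Point(data[0].size(), 0.0));
        std::vector<int> cnt(K, 0);
        for (std::size_t k = 0; k < data.size(); ++k) {
            for (std::size_t d = 0; d < data[k].size(); ++d) sum[c[k]][d] += data[k][d];
            ++cnt[c[k]];
        }
        for (int i = 0; i < K; ++i)
            if (cnt[i] > 0)
                for (std::size_t d = 0; d < sum[i].size(); ++d) m[i][d] = sum[i][d] / cnt[i];
    }
    return c;
}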
Assessing whether a data set actually displays a clustering tendency is also perceived as one of the major problems of clustering. Most available clustering algorithms prefer certain predefined cluster shapes, and the related algorithms will always assign the data to clusters of such shapes even if no clusters are present in the data. When the goal is both to compress the data set and to make inferences about its cluster structure, it is important to ascertain beforehand whether the data set displays a clustering tendency. The number of clusters used is also important, since different choices reveal different kinds of structure; moreover, some of the clusters may even be left empty if their centroids initially lie far from the distribution of the data.
Though clustering is used to induce categorization and minimize the amount of data, the internal categories have only limited value. It is essential that the clusters are analyzed in a manner in which the concepts are understood. Shih et al. [66] gave an example of this using K-means clustering. Further, hierarchical clustering algorithms may not be a suitable method for every type of data.
Many clustering algorithms have been proposed to combine smaller groups of data into larger clusters in various domains. The performance of these clustering methods has been found to be highly effective for pure numeric data or for pure categorical data. However, they do not perform well on mixed numeric and categorical data.
The two-step clustering method was first proposed to find clusters in mixed numeric and categorical data and was designed to analyze large datasets [67]. In the two-step method, the categorical attributes of the data items are first examined to establish similarity and are then converted into numeric attributes based on the constructed relationship. The two-step method differs from traditional clustering methods in that it handles both continuous and categorical variables and automatically determines the optimum number of clusters in a sequential manner.
The two-step clustering method uses a two-stage approach in which the algorithm first attempts a procedure that is very similar to the K-means algorithm (Figure 3.3). The algorithm then scans the objects sequentially to form homogeneous pre-clusters, which are built up in a cluster feature (CF) tree.
[Figure: two-step clustering workflow — Pre-Clustering, Outlier Handling (Optional), Clustering]
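As a rough sketch of this two-stage idea (the threshold-based sequential pre-clustering and the midpoint merging rule below are illustrative simplifications, not the exact CF-tree procedure):

#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

static double dist(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) { double d = a[i] - b[i]; s += d * d; }
    return std::sqrt(s);
}

// Stage 1: sequential pre-clustering. Each incoming object joins the
// nearest existing pre-cluster if it lies within `threshold`; otherwise
// it seeds a new pre-cluster (a simplified stand-in for the CF tree).
std::vector<Point> preCluster(const std::vector<Point>& data, double threshold) {
    std::vector<Point> centres;
    std::vector<int> counts;
    for (const Point& x : data) {
        int best = -1;
        double bestD = threshold;
        for (std::size_t i = 0; i < centres.size(); ++i)
            if (dist(x, centres[i]) <= bestD) { bestD = dist(x, centres[i]); best = (int)i; }
        if (best < 0) { centres.push_back(x); counts.push_back(1); }
        else {
            ++counts[best];  // update the running mean of that pre-cluster
            for (std::size_t d = 0; d < x.size(); ++d)
                centres[best][d] += (x[d] - centres[best][d]) / counts[best];
        }
    }
    return centres;
}

// Stage 2: agglomeratively merge the pre-cluster centres until only
// K representatives remain (a crude stand-in for the second step).
std::vector<Point> mergeToK(std::vector<Point> centres, std::size_t K) {
    while (centres.size() > K) {
        std::size_t a = 0, b = 1;
        double best = std::numeric_limits<double>::max();
        for (std::size_t i = 0; i < centres.size(); ++i)
            for (std::size_t j = i + 1; j < centres.size(); ++j)
                if (dist(centres[i], centres[j]) < best) { best = dist(centres[i], centres[j]); a = i; b = j; }
        for (std::size_t d = 0; d < centres[a].size(); ++d)  // replace pair by midpoint
            centres[a][d] = 0.5 * (centres[a][d] + centres[b][d]);
        centres.erase(centres.begin() + b);
    }
    return centres;
}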
In the traditional method, every attribute in the clustering algorithm is treated as a single entity without considering the relationships among attributes. In order to overcome these inefficiencies in the two-step method, Shih et al. [66] proposed a modified two-step method to cluster objects. They believe that their method can be used to cluster mixed numeric and categorical data.
The two-step clustering method has been used in various fields in the past; for example, Schiopu [71] proposed the use of the SPSS Two-Step clustering method for data analysis.
Agglomerative clustering has been used as a clustering strategy since the 1950s [72], [73], and subsequently found its way into biological taxonomy for classifying organisms [65]. Initially, each object forms its own cluster, and the two closest clusters are combined repeatedly until only one cluster remains; this is the agglomerative strategy. Three strategies are used to form the agglomerative clusters. In the single-linkage strategy, the distance between two clusters is defined as the distance between their closest pair of data objects. In the complete-linkage strategy, the distance between two clusters is defined as the distance between their furthest pair of data objects. In the average-linkage strategy, the distance is defined as the average distance between data objects from the two clusters [74].
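A minimal sketch of these three inter-cluster distance definitions (the flat point representation of clusters is an assumption made for illustration):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;
using Cluster = std::vector<Point>;

static double dist(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) { double d = a[i] - b[i]; s += d * d; }
    return std::sqrt(s);
}

// Single linkage: distance between the closest pair of objects.
double singleLinkage(const Cluster& A, const Cluster& B) {
    double best = std::numeric_limits<double>::max();
    for (const Point& a : A)
        for (const Point& b : B) best = std::min(best, dist(a, b));
    return best;
}

// Complete linkage: distance between the furthest pair of objects.
double completeLinkage(const Cluster& A, const Cluster& B) {
    double best = 0.0;
    for (const Point& a : A)
        for (const Point& b : B) best = std::max(best, dist(a, b));
    return best;
}

// Average linkage: mean distance over all cross-cluster pairs.
double averageLinkage(const Cluster& A, const Cluster& B) {
    double sum = 0.0;
    for (const Point& a : A)
        for (const Point& b : B) sum += dist(a, b);
    return sum / (A.size() * B.size());
}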
Divisive clustering runs repeatedly to divide large clusters into smaller sub-clusters until a stopping criterion is met. It is a top-down approach, which is more complex than the bottom-up approach (Figure 3.4). The divisive approach uses an efficient flat-clustering algorithm as a subroutine to continue further. The approach starts at the top, with all documents in one cluster. The first cluster is split using the flat-clustering algorithm, and the procedure is then applied recursively to each resulting cluster until individual documents are obtained or a stopping criterion is reached. The criteria used to stop the analysis include a specific number of iterations, a maximum number of levels to which the data set is divided, and a minimum required number of instances for further partitioning.
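The following is a rough sketch of such a recursive bisection, using a simple two-way K-means as the flat-clustering subroutine (the seeding with the first and last elements and the minimum-size stopping criterion are illustrative assumptions):

#include <cstddef>
#include <vector>

using Point = std::vector<double>;

static double sqDist(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) { double d = a[i] - b[i]; s += d * d; }
    return s;
}

// Flat-clustering subroutine: split `items` into two groups with a
// simple 2-means seeded by the first and last elements (an assumption).
static void twoMeans(const std::vector<Point>& items,
                     std::vector<Point>& left, std::vector<Point>& right) {
    Point c0 = items.front(), c1 = items.back();
    for (int iter = 0; iter < 20; ++iter) {
        left.clear(); right.clear();
        for (const Point& x : items)
            (sqDist(x, c0) <= sqDist(x, c1) ? left : right).push_back(x);
        if (left.empty() || right.empty()) return;  // degenerate split
        // Recompute the two centroids from the current split.
        c0.assign(c0.size(), 0.0); c1.assign(c1.size(), 0.0);
        for (const Point& x : left)
            for (std::size_t d = 0; d < x.size(); ++d) c0[d] += x[d] / left.size();
        for (const Point& x : right)
            for (std::size_t d = 0; d < x.size(); ++d) c1[d] += x[d] / right.size();
    }
}

// Divisive clustering: recursively bisect until clusters fall below
// `minSize` (one of the stopping criteria mentioned above).
void divisive(const std::vector<Point>& items, std::size_t minSize,
              std::vector<std::vector<Point>>& out) {
    std::vector<Point> left, right;
    if (items.size() > minSize) twoMeans(items, left, right);
    if (left.empty() || right.empty()) { out.push_back(items); return; }
    divisive(left, minSize, out);
    divisive(right, minSize, out);
}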
The divisive method can be used even when a complete hierarchy is not required, since it is not necessary to generate the hierarchy all the way down to individual objects. In that setting, the divisive method is preferable, as it runs faster, whereas agglomerative clustering requires at least quadratic time. Further, agglomerative clustering makes decisions based on local patterns and does not consider the global distribution in the initial stages; once a decision is made, it cannot be undone. In the case of divisive clustering, by contrast, partitioning decisions are taken based on complete information about the global distribution and hence produce more efficient results [75].
Partitioning methods are based on specifying an initial number of groups and iteratively combining or dividing existing groups, creating a hierarchical structure that reflects the order in which the groups are merged or divided. Hierarchical clustering, in turn, is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering fall into two types:
Agglomerative: This is a "bottom-up" approach. Each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
Divisive: This is a "top-down" approach. All observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
In general, the merges and splits are determined in a greedy manner, and the results are usually presented in a dendrogram. The standard time complexity of agglomerative clustering is O(n³), which makes it too slow for large datasets.
In agglomerative hierarchical clustering, each item x_1, ..., x_n initially lies in its own cluster C_1, ..., C_n. The two nearest clusters, say C_i and C_j, are then merged, and this step is repeated until only one cluster is left, resulting in a cluster tree. The cluster tree can be cut at any level to produce a new set of clusters. Figure 3.5 illustrates this process.
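A compact sketch of this naive O(n³) merge loop, using single linkage for concreteness (an illustrative assumption), with the tree cut at a chosen number of clusters:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;
using Cluster = std::vector<Point>;

static double dist(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) { double d = a[i] - b[i]; s += d * d; }
    return std::sqrt(s);
}

// Single-linkage distance between two clusters.
static double linkage(const Cluster& A, const Cluster& B) {
    double best = std::numeric_limits<double>::max();
    for (const Point& a : A)
        for (const Point& b : B) best = std::min(best, dist(a, b));
    return best;
}

// Agglomerative clustering: start with each item in its own cluster
// C_1..C_n and repeatedly merge the nearest pair until `target`
// clusters remain (i.e., the tree is cut at one level).
std::vector<Cluster> agglomerate(const std::vector<Point>& items, std::size_t target) {
    std::vector<Cluster> C;
    for (const Point& x : items) C.push_back({x});
    while (C.size() > target) {
        std::size_t bi = 0, bj = 1;
        double best = std::numeric_limits<double>::max();
        for (std::size_t i = 0; i < C.size(); ++i)
            for (std::size_t j = i + 1; j < C.size(); ++j) {
                double d = linkage(C[i], C[j]);
                if (d < best) { best = d; bi = i; bj = j; }
            }
        // Merge C_j into C_i and remove C_j.
        C[bi].insert(C[bi].end(), C[bj].begin(), C[bj].end());
        C.erase(C.begin() + bj);
    }
    return C;
}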
The properties of the hierarchy within the final clustering are listed below [76]:
The clusters generated in the early stages are nested within those generated in the later stages.
Clusters of different sizes in the tree can be valuable for geographic knowledge discovery.
Advantages
It can produce an ordering of the objects, which may be informative for data
display.
Disadvantages
No provision can be made for the relocation of objects that may have been incorrectly grouped at an early stage.
Use of different distance metrics for measuring distances between clusters may generate different results.
Divisive clustering proceeds in a similar manner to agglomerative clustering, but in the opposite direction. This method starts with a single cluster containing all objects and then successively splits the resulting clusters until only clusters of individual objects remain. Figure 3.6 illustrates an example of the divisive clustering method.
The K-medoids algorithm is a clustering algorithm related to the K-means algorithm and the medoid-shift algorithm. In the K-means algorithm, the quality of the solution attained depends highly on the definition of the initial centroids. K-medoids instead uses medoids to represent clusters rather than centroids. A medoid is the most centrally located data object in a cluster. Here, k data objects are selected randomly as medoids to represent k clusters, and all remaining data objects are placed in the cluster whose medoid is nearest (or most similar) to that data object. After processing all data objects, a new medoid is determined that can represent its cluster in a better way, and the entire process is repeated: all data objects are again bound to the clusters based on the new medoids. In each iteration, the medoids change their location step by step; in other words, the medoids move in every iteration. This process continues until no medoid moves. As a result, k clusters are found representing a set of n data objects.
The K-means technique uses the centroid to represent a cluster and is highly sensitive to nodes that lie far from the rest of the observations (outliers). This issue is addressed by the K-medoids clustering technique, which uses medoids instead of centroids to represent the clusters. A medoid is the most centrally located node in a cluster (Figure 3.7).
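A minimal sketch of this medoid-update loop (deterministic initial medoids and a total-distance update rule are assumptions made for illustration; the description above selects the initial medoids randomly):

#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

static double dist(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) { double d = a[i] - b[i]; s += d * d; }
    return std::sqrt(s);
}

// K-medoids: medoids are actual data objects. Returns the indices of
// the k medoids, initialized here with the first k objects.
std::vector<std::size_t> kMedoids(const std::vector<Point>& data, std::size_t k) {
    std::vector<std::size_t> med(k);
    for (std::size_t i = 0; i < k; ++i) med[i] = i;
    std::vector<std::size_t> assign(data.size());
    bool moved = true;
    while (moved) {  // repeat until no medoid moves
        // Bind every object to the cluster with the nearest medoid.
        for (std::size_t i = 0; i < data.size(); ++i) {
            std::size_t best = 0;
            for (std::size_t c = 1; c < k; ++c)
                if (dist(data[i], data[med[c]]) < dist(data[i], data[med[best]])) best = c;
            assign[i] = best;
        }
        moved = false;
        // For each cluster, pick the member minimizing the total distance
        // to the other members: the most centrally located object.
        for (std::size_t c = 0; c < k; ++c) {
            std::size_t bestIdx = med[c];
            double bestCost = std::numeric_limits<double>::max();
            for (std::size_t i = 0; i < data.size(); ++i) {
                if (assign[i] != c) continue;
                double cost = 0.0;
                for (std::size_t j = 0; j < data.size(); ++j)
                    if (assign[j] == c) cost += dist(data[i], data[j]);
                if (cost < bestCost) { bestCost = cost; bestIdx = i; }
            }
            if (bestIdx != med[c]) { med[c] = bestIdx; moved = true; }
        }
    }
    return med;
}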
Strengths
K-medoids is more robust than K-means in the presence of noise and outliers, because a medoid is less influenced by outliers or other extreme values than a mean.
This study made an attempt to identify the best method for clustering. In order to compare the performance of the proposed method, three methods were used:
1. Two-Step clustering
2. Hierarchical clustering
3. K-medoids
The clustering algorithms were compared based on the following factors: the size of the data set and the run time. For each factor, four tests were conducted, one for each algorithm. The partitioning problem was applied to the circuit shown in Figure 3.8, which depicts a small VLSI circuit; its run-time analysis and data analysis are shown in Figures 3.9 and 3.10, respectively. The circuit is converted into a hypergraph partitioning instance, and the three clustering methods are used to obtain sub-circuits with the lowest amount of interconnection between them. This was tested with Visual C++ software in the Visual Studio IDE environment.
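A run-time comparison of this kind can be scripted with a simple timing harness; the sketch below measures wall-clock execution time with the standard <chrono> facilities (the commented-out wrapper calls are hypothetical placeholders for the three implementations, not functions from this study):

#include <chrono>
#include <functional>
#include <iostream>
#include <string>

// Measure the wall-clock run time of one clustering routine.
void timeIt(const std::string& name, const std::function<void()>& run) {
    auto t0 = std::chrono::steady_clock::now();
    run();
    auto t1 = std::chrono::steady_clock::now();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
    std::cout << name << ": " << ms << " ms\n";
}

int main() {
    // Hypothetical wrappers around the three clustering methods.
    timeIt("Two-Step", [] { /* runTwoStep(dataset); */ });
    timeIt("Hierarchical", [] { /* runHierarchical(dataset); */ });
    timeIt("K-medoids", [] { /* runKMedoids(dataset); */ });
    return 0;
}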
3.11. Discussion
The main objective of the study is to compare K-medoids clustering with Two-Step and hierarchical clustering. The results were analyzed based on the size of the data set and the run time. In K-medoids clustering:
All remaining data objects are placed in the cluster whose medoid is nearest to that data object.
After processing all data objects, a new medoid is determined that can represent its cluster in a better way, and the entire process is repeated until no medoid moves.
K-medoids clustering takes less execution time, as shown in Figure 3.9, and provides more data coverage, as shown in Figure 3.10. Hence, this study shows that K-medoids clustering performs best among the three methods considered for this partitioning problem.
3.12. Publication