Clustering in Distributed Incremental

Estimation in Wireless Sensor Networks


Sung-Hyun Son, Mung Chiang, Sanjeev R. Kulkarni, Stuart C. Schwartz

Abstract

Energy efficiency, low latency, high estimation accuracy, and fast convergence are important goals in

distributed incremental estimation algorithms for sensor networks. One approach that adds flexibility in

achieving these goals is clustering. In this paper, the framework of distributed incremental estimation is

extended by allowing clustering amongst the nodes. Among the observations made is that a scaling law

exists where the estimation accuracy increases proportionally with the number of clusters. The distributed

parameter estimation problem is posed as a convex optimization problem involving a social cost function

and data from the sensor nodes. An in-cluster algorithm is then derived using the incremental subgradient

method. Sensors in each cluster successively update a cluster parameter estimate based on local data,

which is then passed on to a fusion center for further processing. We prove convergence results for the

distributed in-cluster algorithm, and provide simulations that demonstrate the benefits of clustering for least

squares and robust estimation in sensor networks.

Index Terms

Distributed estimation, optimization, incremental subgradient method, clustering, wireless sensor net-

works.

This research was supported in part by the ONR under grant N00014-03-1-0290, the Army Research Office under grant DAAD19-

00-1-0466, Draper Laboratory under IR&D 6002 grant DL-H-546263, and the National Science Foundation under grant CCR-

0312413. Portions of this work were presented at the 2005 IEEE International Conference on Wireless Networks, Communications,

and Mobile Computing, Maui, HI, USA, June 13-17, 2005.


The authors are with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA. Email: {sungson,

chiangm, kulkarni, stuart}@princeton.edu.



I. INTRODUCTION

A wireless sensor network (WSN) is comprised of a fusion center and a set of geographically

distributed sensor nodes. The fusion center provides a central point in the network to consolidate

the sensor data while the sensor nodes collect information about a state of nature. In our scenario,

the fundamental objective of a WSN is to reconstruct that state of nature, e.g., estimation of a

parameter, given the sensor observations. Depending on the application and the resources of the

WSN, many possible algorithms exist that solve this parameter estimation problem.

One failsafe approach that accomplishes this objective is the centralized approach in which all

sensor nodes send their observations to the fusion center, which then computes the parameter estimate. The centralized scheme allows the most information to be present when

making the inference. However, the main drawback is the drainage of energy resources from each

sensor of the WSN [1]. In an energy constrained WSN, the energy expenditure of transmitting all

the observations to the fusion center might be too costly, thus making the method highly energy

inefficient. In our application, the purpose of a WSN is to make an inference, not collect all the

sensor observations.

Another approach avoids the fusion center altogether and allows the sensors to collaboratively

make the inference. This approach is referred to as the distributed in-network scheme, recently

proposed by Rabbat and Nowak [2]. First, consider a path that passes through all the sensor nodes

and visits each node only once. The path hops from one neighbor node to another until all the

sensor nodes are covered. Instead of passing the data along the sensor node path, a parameter esti-

mate is passed from node to node. As the parameter estimate passes through each node, each node

updates the parameter estimate with its own local observations. The distributed in-network ap-

proach significantly reduces the transmission energy for communication required by the network.

However, this approach has drawbacks in terms of latency, accuracy, and convergence. While the

centralized approach takes one iteration to have access to all data, the distributed approach takes

n iterations to have seen all the data captured by the network, where n is the number of sensors.

Also, the parameter estimate of the distributed in-network algorithm is less accurate when compared to the parameter estimate of the centralized algorithm. In terms of the number of iterations,

the distributed in-network scheme converges slower than the centralized scheme. The distributed

in-network scheme remedies the issue of energy inefficiency, but suffers in terms of these other

performance parameters.

In this paper, we consider a hybrid form of the two aforementioned approaches. While the

former approach relies heavily on the fusion center and the latter approach eliminates the fusion

center altogether, we allow the fusion center to minimally interact with the sensor nodes. We

formulate a distributed in-cluster approach where the nodes are clustered and there exists a path

within each cluster that passes through each node only once as shown in Fig. 1. While the precise

mathematical formulation of the algorithm is stated in Section IV, roughly speaking, each cluster

operates similarly to the distributed in-network scheme. Within each cluster, every sensor node

updates its own parameter estimate based on its own local observations. Hence, each cluster has

its own parameter estimate. The sensor node that initiates the algorithm is designated to be the

cluster head. After completion of all the iterations within each cluster, the parameter estimate of

each cluster is then passed to the fusion center and averaged. Then, the fusion center announces

the average parameter value back to the cluster heads to start another set of cluster iterations if

necessary.

The purpose of clustering is to address the inherent inflexibility of both the centralized and the

in-network algorithms. For example, if the WSN application calls for the most accurate estimate

regardless of the communication costs, then the centralized algorithm would suffice. If the WSN

demands the most energy efficient algorithm irrespective of the other performance parameters like

latency, accuracy, and convergence speed, then the distributed in-network algorithm would be most

suitable. However, given a WSN application with specific accuracy demands or energy constraints,

are we able to develop an algorithm that is tailored to those desired performance levels? With the

distributed in-cluster algorithm, this becomes feasible: the number of clusters, or equivalently the size of the clusters, adds another dimension to the algorithm development process, and the cluster size can be adjusted to accommodate the WSN requirements.

Throughout the rest of the paper, we consider the following criteria in comparing the distributed

in-cluster, distributed in-network and centralized algorithms:

• Energy efficiency

• Latency

• Estimation accuracy

• Flexible tradeoff among accuracy, robustness, and speed of convergence

We show that the proposed distributed in-cluster algorithm adds a flexible tradeoff among all

the aforementioned criteria. Specifically, due to clustering, we are able to control the estimation

accuracy since the residual error scales as a function of the number of clusters. The inclusion

of clusters improves the scaling behavior of the estimation accuracy and latency. We use the

centralized and the distributed in-network algorithms as extreme cases of maximal and minimal

energy usage, respectively. For the special case where the WSN has √n clusters with each cluster having √n sensors, we show that the transport cost of the distributed in-cluster algorithm has the same order of magnitude as that of the distributed in-network algorithm. However, the latency and accuracy improve by a factor of √n under this specific clustering scenario.

The organization of the paper is as follows. Previous work in the areas of distributed incremental

estimation and clustering is discussed in Section II. We formulate the problem and provide two

concrete applications in Section III. The distributed in-cluster algorithm is precisely formulated

in Section IV and convergence analysis is discussed in Section V. Section VI surveys the benefits

of clustering on the performance parameters. Analytical results are verified in Section VII by

simulations that involve two applications, least-squares and robust estimation, followed by the

conclusion in Section VIII.

II. PREVIOUS WORK

The ideas of incremental subgradient methods and clustering applied to distributed estimation

are quite prevalent in the literature. Incremental subgradient methods were first studied by Kibardin

[3] and then, more recently, by Nedić and Bertsekas [4]. Then, Rabbat and Nowak [2] applied

the framework used in [4] to handle the issue of energy consumption in distributed estimation

in WSNs. To further save on energy expenditure, Rabbat and Nowak [5] also implemented a

quantization scheme for distributed estimation, in which they showed that quantization does not

affect the rate of convergence for the incremental algorithms.

Clustering schemes have been implemented throughout the area of WSNs to provide a hierarchical

structure to minimize the energy spent in the system (see [6] and references therein). The purpose

of clustering is either to minimize the number of hops the data needs to arrive at a destination or

to provide a fusion point within the network to consolidate the amount of data sent. In our work,

clustering is applied to the framework of distributed incremental estimation to provide a better

scaling law for estimation accuracy and latency in relation to the number of clusters.

III. PROBLEM FORMULATION

Consider a WSN with n sensors, each sensor taking m measurements. The parameter

estimation objective of the WSN can be viewed as a convex optimization problem if the distortion

measure between the parameter estimate and the data is convex. The problem is

$$\begin{aligned} \text{minimize} \quad & f(x,\theta) \\ \text{subject to} \quad & \theta \in \Theta \end{aligned} \qquad (1)$$

where f : R^{nm+1} → R is a convex cost function, x ∈ R^{nm} is the vector of all the observations collected by the WSN, θ is a scalar, and Θ is a nonempty, closed, and convex subset of R. Note that the entries of x are constants while θ is the variable.

One method to decompose Problem (1) into a distributed optimization problem is to assume

that the cost function has an additive structure. The additive property states that the social cost

function given all the WSN data, f (x, θ), can be expressed as the normalized sum of individual

cost functions given only individual sensor data, fi (xi , θ). Hence, the problem becomes

$$\begin{aligned} \text{minimize} \quad & f(x,\theta) = \frac{1}{n}\sum_{i=1}^{n} f_i(x_i,\theta) \\ \text{subject to} \quad & \theta \in \Theta \end{aligned} \qquad (2)$$

where fi(xi, θ) : R^{m+1} → R is a convex local cost function for the ith sensor, using only its own measurement data xi ∈ R^m.

Although the additive property does not hold for general cost functions, two important appli-

cations in estimation satisfy this property: least squares estimation and robust estimation with the

Huber loss function.

A. Least Squares Estimation

The simplest estimation procedure is least-squares estimation. For the classical least squares

estimation problem, the distortion measure is f(x, θ) = ‖x − θ1‖², where ‖ · ‖ is the Euclidean

norm. Clearly, the least-squares distortion measure is convex and additive. Hence, the optimization

problem is formulated as follows,

$$\begin{aligned} \text{minimize} \quad & \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{1}{m}\sum_{p=1}^{m} \|x_i^p - \theta\|^2\right\} \\ \text{subject to} \quad & \theta \in \Theta \end{aligned} \qquad (3)$$

where $f_i(x_i,\theta) = \frac{1}{m}\sum_{p=1}^{m}\|x_i^p - \theta\|^2$ and $x_i^p$ denotes the pth entry in the vector of observations

from sensor node i. The beauty of least-squares estimation lies in its simplicity, but the technique is

prone to suffer greatly in terms of accuracy if some of the measurements are not as accurate as the

others. If some measurements have higher variances than other measurements, the least-squares

inference procedure does not take this effect into account. Thus, the least squares procedure is

highly sensitive to large deviations. To make the inference procedure more robust to these types of

deviations, the following robust estimation procedure is often used.

B. Robust Estimation

Another practical application is the robust estimation problem, which has the following form,

$$\begin{aligned} \text{minimize} \quad & \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{1}{m}\sum_{p=1}^{m} f_H(x_i^p,\theta)\right\} \\ \text{subject to} \quad & \theta \in \Theta \end{aligned} \qquad (4)$$

where

$$f_H(x_i^p,\theta) = \begin{cases} \|x_i^p - \theta\|^2/2, & \text{if } \|x_i^p - \theta\| \le \gamma \\ \gamma\|x_i^p - \theta\| - \gamma^2/2, & \text{if } \|x_i^p - \theta\| > \gamma \end{cases} \qquad (5)$$

and γ ≥ 0 is the Huber loss function constant [7]. Note that $f_i(x_i,\theta) = \frac{1}{m}\sum_{p=1}^{m} f_H(x_i^p,\theta)$. The

purpose behind robust estimation is to introduce a new distortion measure that puts more weight

on good measurements, and less weight on, or even discards, bad measurements. The parameter

γ sets the threshold for the measurement values around the parameter estimate θ, i.e., the values

within a γ–range of θ are considered good measurements and the values outside a γ–range of θ are

considered bad measurements. As γ → ∞, robust estimation reduces to least-squares estimation.
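As a concrete illustration of these two local cost functions (a minimal sketch of our own, not code from the paper; the function names are hypothetical), the following Python snippet evaluates fi(xi, θ) for both criteria and the bounded Huber gradient for a scalar parameter:

```python
import numpy as np

# Minimal sketch (ours, not the paper's code) of the local costs in Eqs. (3)-(5);
# xi is the length-m vector of measurements at one sensor, theta a scalar estimate.

def local_least_squares(xi, theta):
    """Least-squares local cost: (1/m) * sum_p (x_i^p - theta)^2."""
    return np.mean((np.asarray(xi) - theta) ** 2)

def local_huber(xi, theta, gamma=1.0):
    """Huber local cost: quadratic within a gamma-band of theta, linear outside."""
    r = np.abs(np.asarray(xi) - theta)
    return np.mean(np.where(r <= gamma, 0.5 * r ** 2, gamma * r - 0.5 * gamma ** 2))

def local_huber_gradient(xi, theta, gamma=1.0):
    """Gradient of the Huber local cost w.r.t. theta; its magnitude is at most gamma."""
    r = np.asarray(xi) - theta
    return -np.mean(np.clip(r, -gamma, gamma))
```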

IV. DISTRIBUTED IN-CLUSTER ALGORITHM

To solve a convex optimization problem like (1), the most common method used is a gradient

descent method. Given any starting point θ̂ ∈ dom f , update θ̂ by descending along the gradient

of f . To formalize this procedure, let

$$\hat{\theta}_{\mathrm{new}} = \hat{\theta}_{\mathrm{old}} + \alpha\,\Delta\hat{\theta} , \qquad (6)$$

where α is the step size and ∆θ̂ = −∇f . The convexity of the function f guarantees that a

local minimum will be a global minimum. However, if the function f is not differentiable, then a

subgradient can be used. A subgradient of any convex function f (x) at a point y is any vector g

such that f(x) ≥ f(y) + (x − y)^T g, ∀x. For a differentiable function, the subgradient is just the

gradient.

Along with convexity, if the cost function has an additive structure, a variant of the subgradient

method can be used. This method is called the incremental subgradient method [8]. The key idea

of the incremental subgradient algorithm is to sequentially take steps along the subgradients of the

marginal function fi (xi , θ) instead of taking one large step along the subgradient of f (x, θ). In

doing so, the parameter estimate can be adjusted by the subgradient of each individual cost function

given only its individual observations. Following this procedure leads to the distributed in-network

algorithm [2]. The convergence results for the algorithm follow directly from the incremental

subgradient method.
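For intuition, a minimal sketch (ours, not the authors' implementation) of this incremental update along the in-network path is shown below; `local_grad` is a hypothetical callable returning a subgradient of fi at the current estimate, and the 1/n factor matches the normalized cost in Eq. (2) and the update in Eq. (7):

```python
# One pass of the in-network incremental subgradient scheme: the estimate visits
# each of the n sensors once, and each sensor steps along its local subgradient.
def in_network_cycle(theta, n, alpha, local_grad):
    for i in range(n):
        theta = theta - alpha * local_grad(i, theta) / n
    return theta
```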

We now describe the in-cluster algorithm. Consider a WSN with nC clusters and nS sensors per cluster, where nS · nC = n; we assume n factors accordingly. Note that the distributed in-network algorithm can be viewed as a special case of the distributed in-cluster algorithm in which the entire network forms a single cluster, i.e., nC = 1 and nS = n.

We use i = 1, . . . , nS to index the sensor nodes within a cluster, j to index clusters, and k to index the iteration number.

Let i = 0 represent the cluster head. Let fi,j (xi,j , θ) and φi,j,k denote the local cost function and

the parameter estimate at node i in cluster j during iteration k, respectively. For conciseness, we

suppress the dependency of f on the parameters in the notation and let fi,j (xi,j , θ) = fi,j (θ) and

f (x, θ) = f (θ). Also, let θk be the estimate maintained by the fusion center during iteration k. We

need to update θk for k = 1, 2, . . . , by local updates of φi,j,k .

Each iteration k of the distributed in-cluster algorithm proceeds as follows:

1) Fusion center passes the current estimate θk to the cluster heads in all clusters.

Cluster heads initialize φ0,j,k = θk , ∀j.

2) Incremental update is conducted in parallel in all the clusters. Within each cluster,

the updates are conducted through update paths that traverse all the nodes in each

cluster:
$$\phi_{i,j,k} = \phi_{i-1,j,k} - \alpha_k\,\frac{g_{i,j,k}}{n} \qquad (7)$$

where αk is the step size and gi,j,k is a subgradient of fi,j evaluated at the last estimate φi−1,j,k using the local measurement data xi,j, denoted gi,j,k ∈ ∂fi,j(φi−1,j,k). If φi−1,j,k ∉ Θ, then project φi−1,j,k onto the nearest point in the a priori constraint set Θ.

3) All clusters pass the last in-cluster estimate φnS,j,k to the fusion center, which takes the average to produce the next estimate $\theta_{k+1} = \frac{1}{n_C}\sum_{j=1}^{n_C} \phi_{n_S,j,k}$.

4) Repeat.

In step 3 of the distributed in-cluster algorithm, the fusion center may process the in-cluster

estimates {φnS,j,k}_{j=1}^{nC} using a variety of methods, e.g., a weighted average depending on the signal

to noise ratio of the observations. Involving the fusion center in this way allows more flexibility in the algorithm development. In the convergence proofs, we consider the case where a simple average of the in-cluster estimates is performed.
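A minimal sketch of one iteration of steps 1)–4) is given below, under simplifying assumptions of our own: nC clusters of nS sensors, a plain average at the fusion center, and Θ = R (so no projection step is needed); `local_grad(j, i, theta)` is a hypothetical callable returning a subgradient of fi,j at θ given that node's local data.

```python
import numpy as np

def in_cluster_iteration(theta_k, nC, nS, alpha_k, local_grad):
    n = nC * nS
    cluster_estimates = []
    for j in range(nC):                  # in the WSN the clusters run in parallel
        phi = theta_k                    # step 1: cluster head starts from theta_k
        for i in range(nS):              # step 2: incremental subgradient updates, Eq. (7)
            phi = phi - alpha_k * local_grad(j, i, phi) / n
        cluster_estimates.append(phi)    # step 3: report the last in-cluster estimate
    return np.mean(cluster_estimates)    # fusion center averages -> theta_{k+1}
```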

V. CONVERGENCE

Following the approach for the distributed incremental subgradient algorithm in [8], we show

convergence for the distributed in-cluster approach. The main difference in our proofs is the emer-

gence of the clustering values, nC and nS . In these proofs, we make reasonable assumptions that

the optimal solution exists and the subgradient is bounded as shown in the following statements.

Let the true underlying state of the environment be the (finite) minimizer, θ∗ , of the cost func-

tion. Also, assume there exist scalars Ci,j ≥ 0 such that Ci,j ≥ ‖gi,j,k‖ for all i = 1, ..., nS,

j = 1, ..., nC , and k.

We start with the following lemma that is true for each cluster parameter estimate {φi,j,k }.

Lemma 1: Let {φi,j,k } be the sequence of subiterations generated by Eq. (7). Then for all y ∈ Θ

and for k ≥ 0,

$$\|\phi_{i,j,k} - y\|^2 \le \|\phi_{i-1,j,k} - y\|^2 - \frac{2\alpha_k}{n}\bigl(f_{i,j}(\phi_{i-1,j,k}) - f_{i,j}(y)\bigr) + \frac{\alpha_k^2}{n^2}\,C_{i,j}^2 \quad \forall\, i, j . \qquad (8)$$

See the Appendix for the proof.

By summing all the inequalities in Eq. (8) over all i = 1, ..., nS and j = 1, ..., nC , we have the

following lemma for each parameter estimate {θk }.

Lemma 2: Let {θk} be a sequence generated by the distributed in-cluster method. Then, for all

y ∈ Θ and for k ≥ 0

$$\|\theta_{k+1} - y\|^2 \le \|\theta_k - y\|^2 - \frac{2\alpha_k}{n_C}\bigl(f(\theta_k) - f(y)\bigr) + \frac{\alpha_k^2}{n^2 \cdot n_C}\,\widehat{C}^2 , \qquad (9)$$

where $\widehat{C}^2 = \sum_{j=1}^{n_C}\bigl\{\sum_{i=1}^{n_S} C_{i,j}\bigr\}^2$.

See the Appendix for the proof.


Lemma 2 guarantees that the distance ‖θk − y‖ decreases from one iteration to the next provided $\alpha_k < 2n^2\bigl(f(\theta_k) - f(y)\bigr)/\widehat{C}^2$. This

key lemma is a crucial step in proving all the subsequent theorems.



Theorem 1: Let {θk } be a sequence generated by the distributed in-cluster method. Then, for a

fixed step-size, αk = α, and using Lemma 2, we have


$$\liminf_{k\to\infty} f(\theta_k) \le f(\theta^*) + \frac{\alpha \widehat{C}^2}{2n^2} \qquad (10)$$

where $f(\theta^*) = \inf_{\theta\in\Theta} f(\theta)$ and $\widehat{C}^2 = \sum_{j=1}^{n_C}\bigl\{\sum_{i=1}^{n_S} C_{i,j}\bigr\}^2$.

See the Appendix for the proof.

If all the subgradients gi,j,k are bounded by one scalar C, we have the following corollary.

Corollary 1: Let C0 = max_{i,j} Ci,j. It is evident that C0 ≥ ‖gi,j,k‖ for all i = 1, ..., nS and

j = 1, ..., nC . Then, for a fixed step-size, αk = α,

$$\liminf_{k\to\infty} f(\theta_k) \le f(\theta^*) + \frac{\alpha C_0^2}{2 n_C} . \qquad (11)$$
Since the incremental subgradient method is a primal feasible method, the lower bound f(θk) ≥ f(θ∗) always holds. Therefore, the sequence of cost values f(θk) will eventually be trapped between f(θ∗) and f(θ∗) + αC₀²/(2nC). The fluctuation around this equilibrium,

$$R = \frac{\alpha C_0^2}{2 n_C} , \qquad (12)$$

is the residual estimation error that arises because a constant step size is used in the subgradient method.

Comparing Corollary 1 with both the standard results in [8] and the result used by Rabbat and

Nowak in [2], we observe that in our case, we have a smaller threshold tolerance. Using the same

assumptions as in Corollary 1, in the distributed in-network case,

$$\liminf_{k\to\infty} f(\theta_k) \le f(\theta^*) + \frac{\alpha C_0^2}{2} . \qquad (13)$$

Thus, as k → ∞, the distributed in-network algorithm converges to an (αC₀²/2)-suboptimal solution, whereas the distributed in-cluster algorithm converges to an (αC₀²/(2nC))-suboptimal solution. We observe a key advantage of the in-cluster approach: the estimation accuracy is tighter by a factor of nC.

Even for medium scale sensor networks, a factor of nC can be an order of magnitude improvement.
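As a concrete numerical illustration (our own numbers, not results from the paper): with α = 0.4, C₀ = 1, and nC = 10 clusters,

$$\frac{\alpha C_0^2}{2} = 0.2 \;\;\text{(in-network)}, \qquad \frac{\alpha C_0^2}{2 n_C} = 0.02 \;\;\text{(in-cluster)},$$

so the residual error bound shrinks by exactly the factor nC = 10.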

The next theorem provides the necessary number of iterations, K, to achieve a certain desired

estimation accuracy.

Theorem 2: Let {θk } be a sequence generated by the distributed in-cluster method. Then, for a

fixed step-size α and for any positive scalar ε,


$$\min_{0 \le k \le K} f(\theta_k) \le f(\theta^*) + \frac{1}{2}\left(\frac{\alpha \widehat{C}^2}{n^2} + \varepsilon\right) \qquad (14)$$

where K is given by

$$K = \left\lfloor \frac{n_C\,\|\theta_0 - \theta^*\|^2}{\alpha\,\varepsilon} \right\rfloor . \qquad (15)$$

See the Appendix for the proof.

Along the same lines as Theorem 1 and Corollary 1, we have the following corollary.

Corollary 2: Let C0 = max_{i,j} Ci,j. Then, for a fixed step-size α and for any positive scalar ε,

$$\min_{0 \le k \le K} f(\theta_k) \le f(\theta^*) + \frac{1}{2}\left(\frac{\alpha C_0^2}{n_C} + \varepsilon\right) \qquad (16)$$

where K is given by Eq. (15).

By observing Eq. (16), we see that the index k refers to the parameter estimate obtained at the end of each cluster cycle. Since each cluster performs nS sub-iterations per cycle, the total number of iterations required for an accuracy of $\frac{1}{2}\left(\frac{\alpha C_0^2}{n_C} + \varepsilon\right)$ is $n_S \left\lfloor \frac{n_C\,\|\theta_0 - \theta^*\|^2}{\alpha\varepsilon} \right\rfloor$. In comparison, for the distributed in-network case, the total number of iterations required for an accuracy of $\frac{1}{2}(\alpha C_0^2 + \varepsilon)$ is $n \left\lfloor \frac{\|\theta_0 - \theta^*\|^2}{\alpha\varepsilon} \right\rfloor$. Therefore, for both algorithms, the total number of iterations necessary is of the same order of magnitude, while the benefit of the distributed in-cluster algorithm over the distributed in-network algorithm is an accuracy improvement by a factor of nC.
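These counts can be tabulated directly; the sketch below (with illustrative inputs of our own choosing, not values from the paper) compares the two expressions:

```python
import math

def total_iterations_in_cluster(dist0_sq, alpha, eps, nC, nS):
    """nS sub-iterations per cluster cycle times K from Eq. (15)."""
    return nS * math.floor(nC * dist0_sq / (alpha * eps))

def total_iterations_in_network(dist0_sq, alpha, eps, n):
    """In-network count: n * floor(||theta_0 - theta*||^2 / (alpha * eps))."""
    return n * math.floor(dist0_sq / (alpha * eps))

# With n = 100 and nC = nS = 10, both counts equal 10000 here: the same order
# of magnitude, as stated in the text.
print(total_iterations_in_cluster(4.0, 0.4, 0.1, nC=10, nS=10))
print(total_iterations_in_network(4.0, 0.4, 0.1, n=100))
```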

Another natural extension is to vary the step-size. For a fixed step-size, complete convergence cannot be achieved: the parameter estimates {θk} enter a limit cycle after a finite number of iterations. To force convergence to the optimal value f(θ∗), the step-size can be set to diminish at a rate inversely proportional to the iteration number, e.g., αk = α/k. More generally, we have the following theorem.

Theorem 3: Let {θk } be a sequence generated by the distributed in-cluster method. Also, assume

that the step-size αk satisfies

$$\lim_{k\to\infty} \alpha_k = 0 \qquad \text{and} \qquad \sum_{k=0}^{\infty} \alpha_k = \infty ,$$

then

$$\liminf_{k\to\infty} f(\theta_k) = f(\theta^*) .$$

See the Appendix for the proof.

VI. BENEFITS OF CLUSTERING ON PERFORMANCE PARAMETERS

A. Energy Efficiency

The main expenditure of energy for a WSN is in the cost of communication. This entails trans-

porting bits either from sensor to sensor or from sensor to fusion center. So, the transport cost

would be a good measure of the energy usage for a WSN.

For example, consider our original WSN consisting of n sensors where each sensor has m mea-

surements. The sensors are distributed randomly (uniformly) over one square meter. We use

bit-meters as the metric to measure the transport cost in the transmission of data.

In the centralized setting, all n sensors send their m observations to the fusion center, requiring

O(mn) bits to be transmitted over an average distance of O(1) meters. In total, the transport cost

is O(mn) bit-meters.

In the distributed in-network setting, all n sensors use their m observations to update the pa-

rameter estimate and pass the parameter estimate along the path that contains all the sensor nodes.

The distributed in-network method needs O(n) bits to be transmitted over an average distance of

O(1/√n) meters. In total, the transport cost is O(√n) bit-meters.

In the distributed in-cluster setting, the sensor network forms nC clusters with nS nodes per

cluster. This method requires O(n) bits to be transmitted over an average distance of O(1/√n)

meters which accounts for the sensor to sensor transport cost and O(nC ) bits to be transmitted over

an average distance of O(1) meters which accounts for the sensor to fusion center transport cost.

Thus, the transport cost is O(√n + nC) bit-meters.

An interesting result arises when the number of clusters and the number of sensors per cluster are equal, nC = nS = √n. For this case, the total transport cost of the distributed in-cluster algorithm becomes O(√n). Thus, the distributed in-cluster algorithm and the distributed in-network algorithm have transport costs of the same order of magnitude when nC = nS = √n.
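A quick Monte Carlo check (our own sketch, with an arbitrary seed and n) illustrates the two distance scalings behind these estimates: the sensor-to-fusion-center distance stays O(1) while an average nearest-neighbor hop shrinks like O(1/√n).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
pts = rng.uniform(size=(n, 2))                 # sensors uniform on the unit square
center = np.array([0.5, 0.5])

avg_to_center = np.linalg.norm(pts - center, axis=1).mean()   # ~0.38, i.e. O(1)

pairwise = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
np.fill_diagonal(pairwise, np.inf)
avg_nearest_hop = pairwise.min(axis=1).mean()                 # roughly 0.5 / sqrt(n)

print(avg_to_center, avg_nearest_hop)
```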

B. Latency

Latency is defined as the number of iterations needed to see all the data captured by the network.

For the centralized case, only one iteration is needed while for the in-network case, n iterations

are needed. With the in-cluster algorithm, however, the latency of the WSN can be adjusted through the cluster size: the latency reduces to n/nC, or more simply, nS iterations, as shown in Table I.

C. Estimation Accuracy

By forming nC clusters in a WSN, estimation accuracy can be improved. For the fixed step-size case, the residual estimation error is reduced by a factor of 1/nC when compared to the distributed in-network case, as shown in Table I. The accuracy improvement by a factor of nC holds both when k tends toward infinity and when k is finite, as shown in Corollary 1 and Corollary 2, respectively.

VII. SIMULATIONS, VARIATIONS, AND EXTENSIONS

Consider a WSN with 100 sensors uniformly distributed over a region, each taking 10 measurements. The observations are independent and identically distributed from measurement to measurement, and independent from sensor to sensor. If a sensor is working properly, its measurements follow a Gaussian distribution with mean 10 and variance 1; if the sensor is defective, its measurements follow a Gaussian distribution with mean 10 and variance 100. This application can be viewed as a deterministic mean-location parameter estimation problem. The simulations assume that 10% of the sensor nodes are damaged and

a fixed step-size of αk = 0.4 is used. As summarized in this section, a variety of simulations are

conducted to verify the theorems and characterize other properties and tradeoffs in the proposed

distributed in-cluster algorithm.
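A sketch of how such measurement data might be generated (our reconstruction of the stated setup, not the authors' code) is:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 100, 10
damaged = rng.random(n) < 0.10                     # roughly 10% defective sensors
sigma = np.where(damaged, 10.0, 1.0)               # std 10 (variance 100) vs. std 1
x = 10.0 + sigma[:, None] * rng.standard_normal((n, m))   # x[i, p]: p-th reading at sensor i
alpha = 0.4                                        # fixed step size used in the simulations
```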

A. Basic Simulations

Least squares estimation and Huber robust estimation are simulated, and the resulting conver-

gence behavior of the residual value is shown in Figs. 2 and 3, respectively. In both figures, the

distributed in-network method and the distributed in-cluster method are shown by a solid line while

the centralized method is shown by a dashed line. Since a fixed step size of αk = 0.4 is used, there

are residual fluctuations around the equilibrium.

In both estimation procedures, an increase in the number of clusters causes a decrease in the fluctuations. The precise data points confirm the theoretical prediction: the distributed in-cluster fluctuation is 1/nC times the size of the distributed in-network fluctuation. In the least squares estimation plots of Fig. 2, when nC = 4 and nC = 10, the fluctuations are smaller by factors of 1/4 and 1/10, respectively, compared to the plot for nC = 1. In the robust estimation example of Fig. 3, we use a Huber parameter of γ = 1. The distributed in-cluster method again shows narrower fluctuations and is almost indistinguishable from the centralized estimation curve.

B. Accuracy, Robustness and Speed Tradeoff

To determine the tolerance bounds for the robust estimation procedure, the gradient of the Huber loss function is calculated. Since $f_i(\theta) = \frac{1}{m}\sum_{p=1}^{m} f_H(x_i^p,\theta)$, it is clear that

$$\|\nabla f_i(\theta)\| \le \gamma$$

by differentiation, while

$$\|\nabla f_i(\theta)\| \le C_0$$

by definition. Hence, C0 can be set equal to γ to provide an upper bound on the gradient.

This gives the following results that analytically characterize the tradeoff among three competing

criteria: accuracy, robustness, and speed of convergence for incremental estimation. Combining

Eq. (12) with the relation that C0 = γ, we have the following formula. For a given network size

n, the tradeoff among estimation error bound R, Huber robustness parameter γ, and constant step

size α is characterized by:

$$2 n_C R = \alpha\gamma^2 . \qquad (17)$$

For example, to maintain a desired level of robustness γ, tighter convergence bounds (smaller R) imply a slower convergence speed (smaller α). As another example, to get tighter convergence bounds, we would like R to be small. This can be achieved either by reducing α, which means a smaller step size and slower convergence speed, or by reducing γ, which means accepting less reliable data for the Huber estimation and reducing the robustness as well as the speed of convergence, since

less reliable data are used for estimation. An illustrative example is shown in Fig. 4, where we

reduce γ by a factor of 2 (cf. Figs. 4a and 4b). This reduction of γ reduces the estimation error by

about a factor of 4 but also increases the convergence time by roughly a factor of 2. The nC term in

the tradeoff characterization in Eq. (17) again highlights an advantage of the in-cluster approach:

more clusters help achieve a more efficient tradeoff.
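A small numeric sketch of Eq. (17) (with illustrative values of our own choosing, not results from the paper) makes the tradeoff concrete:

```python
def residual_bound(alpha, gamma, nC):
    """Residual error bound R implied by 2*nC*R = alpha*gamma**2, i.e., Eq. (17)."""
    return alpha * gamma ** 2 / (2 * nC)

print(residual_bound(alpha=0.4, gamma=1.0, nC=10))   # 0.02
print(residual_bound(alpha=0.4, gamma=0.5, nC=10))   # 0.005: halving gamma cuts R by 4
```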

VIII. CONCLUSION

We have presented a distributed in-cluster scheme for a WSN that uses the incremental subgra-

dient method. By incorporating clustering within the sensor network, we have created a degree

of freedom which allows us to tune the algorithm for energy efficiency, estimation accuracy, con-

vergence speed and latency. Specifically, in terms of estimation accuracy, we have shown that a

different scaling law applies to the clustered algorithm: the residual error is inversely proportional

to the number of clusters. Also, for the special case where a WSN with n sensors forms √n clusters, we are able to maintain the same transport cost as the distributed in-network scheme, while

increasing both accuracy of the estimate and convergence speed, and reducing latency. Simulations

have been provided for both least squares and robust estimation.

We plan to extend our work by relaxing the independence assumption of sensor to sensor obser-

vations. In particular, in future work, we will consider a WSN scenario where the data within each

cluster are spatially correlated, while the data from cluster to cluster are independent.

REFERENCES
[1] G. J. Pottie and W. J. Kaiser, “Wireless integrated network sensors,” Communications of the ACM, vol. 43, no. 5, pp. 51–58,

May 2000.

[2] M. Rabbat and R. Nowak, “Distributed optimization in sensor networks,” in Proceedings of the Third International Symposium

on Information Processing in Sensor Networks (IPSN 04), Berkeley, CA, 2004.

[3] V. M. Kibardin, “Decomposition into functions in the minimization problem,” Automation and Remote Control, vol. 40, pp.

1311–1323, 1980.

[4] A. Nedić and D. P. Bertsekas, “Incremental subgradient methods for nondifferentiable optimization,” Tech. Rep., Massachusetts

Institute of Technology, Cambridge, MA, 1999.

[5] M. Rabbat and R. Nowak, “Quantized incremental algorithms for distributed optimization,” IEEE Journal on Selected Areas

in Communications, vol. 23, no. 4, pp. 798–808, April 2005.

[6] S. Bandyopadhyay and E. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in

Proceedings of the 22nd Annual Joint Conference of the IEEE Computer and Communications Societies (Infocom 2003), San

Francisco, CA, 2003.

[7] P. J. Huber, Robust Statistics, John Wiley & Sons, New York, 1981.

[8] D. P. Bertsekas, Convex Analysis and Optimization, Athena Scientific, Belmont, MA, April 2003.

APPENDIX

A. Proof of Lemma 1

Proof:

$$\begin{aligned}
\|\phi_{i,j,k} - y\|^2 &= \left\|\phi_{i-1,j,k} - \alpha_k \frac{g_{i,j,k}}{n} - y\right\|^2 \\
&\le \|\phi_{i-1,j,k} - y\|^2 - \frac{2\alpha_k}{n}(\phi_{i-1,j,k} - y)\,g_{i,j,k} + \frac{\alpha_k^2}{n^2}\|g_{i,j,k}\|^2 \\
&\le \|\phi_{i-1,j,k} - y\|^2 - \frac{2\alpha_k}{n}\bigl(f_{i,j}(\phi_{i-1,j,k}) - f_{i,j}(y)\bigr) + \frac{\alpha_k^2}{n^2} C_{i,j}^2 .
\end{aligned}$$

The last line of the proof uses the fact that $g_{i,j,k}$ is a subgradient of the convex function $f_{i,j}$ at $\phi_{i-1,j,k}$. Thus,

$$f_{i,j}(\phi_{i-1,j,k}) \ge f_{i,j}(y) + (\phi_{i-1,j,k} - y)\,g_{i,j,k} .$$

B. Proof of Lemma 2

Proof:

$$\begin{aligned}
\|\theta_{k+1} - y\|^2 &= \left\| \frac{1}{n_C}\sum_{j=1}^{n_C} \phi_{n_S,j,k} - y \right\|^2
= \left\| \frac{1}{n_C}\sum_{j=1}^{n_C} \bigl(\phi_{n_S,j,k} - y\bigr) \right\|^2 \\
&\le \frac{1}{n_C}\sum_{j=1}^{n_C} \|\phi_{n_S,j,k} - y\|^2 \\
&\le \frac{1}{n_C}\sum_{j=1}^{n_C} \left\{ \|\phi_{n_S-1,j,k} - y\|^2 - \frac{2\alpha_k}{n}\bigl(f_{n_S,j}(\phi_{n_S-1,j,k}) - f_{n_S,j}(y)\bigr) + \frac{\alpha_k^2}{n^2} C_{n_S,j}^2 \right\}
\end{aligned}$$

where in the third and fourth lines of the proof, we used the Quadratic Mean-Arithmetic Mean inequality and Lemma 1, respectively. After recursively decomposing $\|\phi_{i,j,k} - y\|^2$ a total of $n_S$ times, we get

$$\begin{aligned}
\|\theta_{k+1} - y\|^2 &\le \frac{1}{n_C}\sum_{j=1}^{n_C} \left\{ \|\phi_{0,j,k} - y\|^2 - \frac{2\alpha_k}{n}\sum_{i=1}^{n_S}\bigl(f_{i,j}(\phi_{i-1,j,k}) - f_{i,j}(y)\bigr) + \frac{\alpha_k^2}{n^2}\sum_{i=1}^{n_S} C_{i,j}^2 \right\} \\
&= \|\theta_k - y\|^2 - \frac{2\alpha_k}{n \cdot n_C}\sum_{j=1}^{n_C}\sum_{i=1}^{n_S}\bigl(f_{i,j}(\phi_{i-1,j,k}) - f_{i,j}(y)\bigr) + \frac{\alpha_k^2}{n^2 \cdot n_C}\sum_{j=1}^{n_C}\sum_{i=1}^{n_S} C_{i,j}^2 \\
&= \|\theta_k - y\|^2 - \frac{2\alpha_k}{n \cdot n_C}\sum_{j=1}^{n_C}\sum_{i=1}^{n_S}\bigl(f_{i,j}(\phi_{i-1,j,k}) - f_{i,j}(y) + f_{i,j}(\theta_k) - f_{i,j}(\theta_k)\bigr) \\
&\qquad + \frac{\alpha_k^2}{n^2 \cdot n_C}\sum_{j=1}^{n_C}\sum_{i=1}^{n_S} C_{i,j}^2 .
\end{aligned}$$

Then, using the fact that

$$f(\theta_k) = \frac{1}{n}\sum_{j=1}^{n_C}\sum_{i=1}^{n_S} f_{i,j}(\theta_k) ,$$

the expression simplifies to

$$\begin{aligned}
\|\theta_{k+1} - y\|^2 &\le \|\theta_k - y\|^2 - \frac{2\alpha_k}{n_C}\left\{ f(\theta_k) - f(y) + \frac{1}{n}\sum_{j=1}^{n_C}\sum_{i=1}^{n_S}\bigl(f_{i,j}(\phi_{i-1,j,k}) - f_{i,j}(\theta_k)\bigr) \right\} \\
&\qquad + \frac{\alpha_k^2}{n^2 \cdot n_C}\sum_{j=1}^{n_C}\sum_{i=1}^{n_S} C_{i,j}^2 .
\end{aligned}$$

Then,

$$\begin{aligned}
\|\theta_{k+1} - y\|^2 &\le \|\theta_k - y\|^2 - \frac{2\alpha_k}{n_C}\left\{ f(\theta_k) - f(y) - \frac{1}{n}\sum_{j=1}^{n_C}\sum_{i=1}^{n_S} C_{i,j}\,\|\phi_{i-1,j,k} - \theta_k\| \right\} + \frac{\alpha_k^2}{n^2 \cdot n_C}\sum_{j=1}^{n_C}\sum_{i=1}^{n_S} C_{i,j}^2 \\
&\le \|\theta_k - y\|^2 - \frac{2\alpha_k}{n_C}\left\{ f(\theta_k) - f(y) - \frac{1}{n}\sum_{j=1}^{n_C}\sum_{i=2}^{n_S} C_{i,j}\,\frac{\alpha_k}{n}\sum_{m=1}^{i-1} C_{m,j} \right\} + \frac{\alpha_k^2}{n^2 \cdot n_C}\sum_{j=1}^{n_C}\sum_{i=1}^{n_S} C_{i,j}^2 \\
&= \|\theta_k - y\|^2 - \frac{2\alpha_k}{n_C}\bigl(f(\theta_k) - f(y)\bigr) + \frac{\alpha_k^2}{n^2 \cdot n_C}\sum_{j=1}^{n_C}\left\{ 2\sum_{i=2}^{n_S} C_{i,j}\sum_{m=1}^{i-1} C_{m,j} + \sum_{i=1}^{n_S} C_{i,j}^2 \right\} \\
&= \|\theta_k - y\|^2 - \frac{2\alpha_k}{n_C}\bigl(f(\theta_k) - f(y)\bigr) + \frac{\alpha_k^2}{n^2 \cdot n_C}\sum_{j=1}^{n_C}\left\{ \sum_{i=1}^{n_S} C_{i,j} \right\}^2
\end{aligned}$$

where in the second and third inequalities, we used the facts that

$$f_{i,j}(\theta_k) - f_{i,j}(\phi_{i-1,j,k}) \le C_{i,j}\,\|\phi_{i-1,j,k} - \theta_k\|$$

and

$$\|\phi_{i,j,k} - \theta_k\| \le \frac{\alpha_k}{n}\sum_{m=1}^{i} C_{m,j} , \qquad i = 1, \ldots, n_S ,$$

respectively.

C. Proof of Theorem 1

Proof: The proof is by contradiction. If Thm. 1 is not true, then there exists an ε > 0 such that

$$\liminf_{k\to\infty} f(\theta_k) > f(\theta^*) + \frac{\alpha \widehat{C}^2}{2n^2} + 2\varepsilon .$$

Let z ∈ Θ be a point such that

$$\liminf_{k\to\infty} f(\theta_k) \ge f(z) + \frac{\alpha \widehat{C}^2}{2n^2} + 2\varepsilon ,$$

and let k0 be sufficiently large so that for all k ≥ k0, we have

$$f(\theta_k) \ge \liminf_{k\to\infty} f(\theta_k) - \varepsilon .$$

Then, by combining the above two relations with Lemma 2 and setting y = z, we have, for all k ≥ k0,

$$\|\theta_{k+1} - z\|^2 \le \|\theta_k - z\|^2 - \frac{2\alpha\varepsilon}{n_C} .$$

Therefore,

$$\begin{aligned}
\|\theta_{k+1} - z\|^2 &\le \|\theta_k - z\|^2 - \frac{2\alpha\varepsilon}{n_C} \\
&\le \|\theta_{k-1} - z\|^2 - \frac{4\alpha\varepsilon}{n_C} \\
&\;\;\vdots \\
&\le \|\theta_{k_0} - z\|^2 - \frac{2(k+1-k_0)\,\alpha\varepsilon}{n_C}
\end{aligned}$$

which cannot hold for sufficiently large k.



D. Proof of Theorem 2

Proof: The proof is by contradiction. Assume that for all k with 0 ≤ k ≤ K, we have

$$f(\theta_k) > f(\theta^*) + \frac{1}{2}\left(\frac{\alpha \widehat{C}^2}{n^2} + \varepsilon\right) .$$

By setting αk = α and y = θ∗ in Lemma 2 and by combining that with the above relation, we have, for all k with 0 ≤ k ≤ K,

$$\begin{aligned}
\|\theta_{k+1} - \theta^*\|^2 &\le \|\theta_k - \theta^*\|^2 - \frac{2\alpha}{n_C}\bigl(f(\theta_k) - f(\theta^*)\bigr) + \frac{\alpha^2 \widehat{C}^2}{n^2 \cdot n_C} \\
&\le \|\theta_k - \theta^*\|^2 - \frac{2\alpha}{n_C}\cdot\frac{1}{2}\left(\frac{\alpha \widehat{C}^2}{n^2} + \varepsilon\right) + \frac{\alpha^2 \widehat{C}^2}{n^2 \cdot n_C} \\
&= \|\theta_k - \theta^*\|^2 - \frac{\alpha\varepsilon}{n_C} .
\end{aligned}$$

If we sum the above inequalities over k for k = 0, 1, . . . , K, we have

$$\|\theta_{K+1} - \theta^*\|^2 \le \|\theta_0 - \theta^*\|^2 - \frac{(K+1)\,\alpha\varepsilon}{n_C} .$$

Thus, it is evident that

$$\|\theta_0 - \theta^*\|^2 - \frac{(K+1)\,\alpha\varepsilon}{n_C} \ge 0 ,$$

which contradicts the definition of K.

E. Proof of Theorem 3

Proof: The proof is by contradiction. If Thm. 3 does not hold, then there exists an ε > 0 such that

$$\liminf_{k\to\infty} f(\theta_k) - 2\varepsilon > f(\theta^*) .$$

Then, using the convexity of f and Θ, there exists a point z ∈ Θ such that

$$\liminf_{k\to\infty} f(\theta_k) - 2\varepsilon \ge f(z) > f(\theta^*) .$$

Let k0 be large enough that for all k ≥ k0, we have

$$f(\theta_k) \ge \liminf_{k\to\infty} f(\theta_k) - \varepsilon .$$

Then, by combining the above two relations, we have for all k ≥ k0,

$$f(\theta_k) - f(z) \ge \varepsilon .$$

By setting y = z in Lemma 2 and by combining that with the above relation, we obtain for all k ≥ k0,

$$\begin{aligned}
\|\theta_{k+1} - z\|^2 &\le \|\theta_k - z\|^2 - \frac{2\alpha_k}{n_C}\,\varepsilon + \frac{\alpha_k^2 \widehat{C}^2}{n^2 \cdot n_C} \\
&= \|\theta_k - z\|^2 - \frac{\alpha_k}{n_C}\left(2\varepsilon - \frac{\alpha_k \widehat{C}^2}{n^2}\right) .
\end{aligned}$$

Since αk → 0, we may assume that k0 is large enough so that

$$2\varepsilon - \frac{\alpha_k \widehat{C}^2}{n^2} \ge \varepsilon , \qquad \forall k \ge k_0 .$$

Thus, for all k ≥ k0, we have

$$\begin{aligned}
\|\theta_{k+1} - z\|^2 &\le \|\theta_k - z\|^2 - \frac{\alpha_k \varepsilon}{n_C} \\
&\le \|\theta_{k-1} - z\|^2 - \frac{(\alpha_k + \alpha_{k-1})\,\varepsilon}{n_C} \\
&\;\;\vdots \\
&\le \|\theta_{k_0} - z\|^2 - \frac{\varepsilon}{n_C}\sum_{j=k_0}^{k} \alpha_j ,
\end{aligned}$$

which cannot hold for sufficiently large k.


[Figure 1: schematic of the clustered WSN and the fusion center.]

Fig. 1. Illustration of a sensor network implementing the distributed in-cluster algorithm. The dash-dotted lines represent the borders of the clusters. The shaded nodes represent the cluster heads that communicate with the fusion center. All clusters run the algorithm in parallel, although in the schematic only the lower-right cluster is shown running the incremental subgradient algorithm.

                             Energy Efficiency    Latency        Estimation Accuracy
                             (bit-meters)         (iterations)   (residual error)
centralized                  O(mn)                1              0
distributed in-network       O(√n)                n              αC₀²/2
distributed in-cluster       O(√n + nC)           n/nC           αC₀²/(2nC)
special case (nC = √n)       O(√n)                √n             αC₀²/(2√n)

TABLE I
Summary of performance tradeoffs among the different algorithms. Note that in the special case where nC = √n, the distributed in-cluster and in-network algorithms have the same transport cost, but the latency and estimation accuracy are improved by a factor of 1/√n.
[Figure 2: three panels of least squares residual value vs. total number of iterations (fixed step size, 10% of sensors damaged), comparing the centralized method (dashed line) with the distributed method (solid line).]

Fig. 2. Plots of least squares residual value vs. total number of iterations for three different clustering scenarios are shown. (a) Distributed in-network algorithm, (b) Distributed in-cluster algorithm with nC = 4 and nS = 25, (c) Distributed in-cluster algorithm with nC = 10 and nS = 10.


[Figure 3: three panels of robust residual value vs. total number of iterations (fixed step size, 10% of sensors damaged), comparing the centralized method (dashed line) with the distributed method (solid line).]

Fig. 3. Plots of robust residual value vs. total number of iterations for three different clustering scenarios are shown. (a) Distributed in-network algorithm, (b) Distributed in-cluster algorithm with nC = 4 and nS = 25, (c) Distributed in-cluster algorithm with nC = 10 and nS = 10.
[Figure 4: two panels of robust residual value vs. total number of iterations (fixed step size, 10% of sensors damaged, nC = 10 and nS = 10), comparing the centralized method (dashed line) with the distributed in-cluster method (solid line).]

Fig. 4. Plots of robust residual value vs. total number of iterations for the case where nC = 10 and nS = 10 are shown. The plots are rescaled for clarity. (a) Robust estimation with γ = 1, (b) Robust estimation with γ = 0.5.
