Clustering in Distributed Incremental

Estimation in Wireless Sensor Networks


Sung-Hyun Son, Mung Chiang, Sanjeev R. Kulkarni, Stuart C. Schwartz

Abstract

Energy efficiency, low latency, high estimation accuracy, and fast convergence are important goals in

distributed incremental estimation algorithms for sensor networks. One approach that adds flexibility in

achieving these goals is clustering. In this paper, the framework of distributed incremental estimation is

extended by allowing clustering amongst the nodes. Among the observations made is that a scaling law

exists where the estimation accuracy increases proportionally with the number of clusters. The distributed

parameter estimation problem is posed as a convex optimization problem involving a social cost function

and data from the sensor nodes. An in-cluster algorithm is then derived using the incremental subgradient

method. Sensors in each cluster successively update a cluster parameter estimate based on local data,

which is then passed on to a fusion center for further processing. We prove convergence results for the

distributed in-cluster algorithm, and provide simulations that demonstrate the benefits of clustering for least

squares and robust estimation in sensor networks.

Index Terms

Distributed estimation, optimization, incremental subgradient method, clustering, wireless sensor net-

works.

This research was supported in part by the ONR under grant N00014-03-1-0290, the Army Research Office under grant DAAD19-

00-1-0466, Draper Laboratory under IR&D 6002 grant DL-H-546263, and the National Science Foundation under grant CCR-

0312413. Portions of this work were presented at the 2005 IEEE International Conference on Wireless Networks, Communications,

and Mobile Computing, Maui, HI, USA, June 13-17, 2005.


The authors are with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA. Email: {sungson,

chiangm, kulkarni, stuart}@princeton.edu.



I. INTRODUCTION

A wireless sensor network (WSN) is comprised of a fusion center and a set of geographically

distributed sensor nodes. The fusion center provides a central point in the network to consolidate

the sensor data while the sensor nodes collect information about a state of nature. In our scenario,

the fundamental objective of a WSN is to reconstruct that state of nature, e.g., estimation of a

parameter, given the sensor observations. Depending on the application and the resources of the

WSN, many possible algorithms exist that solve this parameter estimation problem.

One failsafe approach that accomplishes this objective is the centralized approach in which all

sensor nodes send their observations to the fusion center, which then computes the parameter estimate. The centralized scheme allows the most information to be present when

making the inference. However, the main drawback is the drainage of energy resources from each

sensor of the WSN [1]. In an energy constrained WSN, the energy expenditure of transmitting all

the observations to the fusion center might be too costly, thus making the method highly energy

inefficient. In our application, the purpose of a WSN is to make an inference, not collect all the

sensor observations.

Another approach avoids the fusion center altogether and allows the sensors to collaboratively

make the inference. This approach is referred to as the distributed in-network scheme, recently

proposed by Rabbat and Nowak [2]. First, consider a path that passes through all the sensor nodes

and visits each node only once. The path hops from one neighbor node to another until all the

sensor nodes are covered. Instead of passing the data along the sensor node path, a parameter esti-

mate is passed from node to node. As the parameter estimate passes through each node, each node

updates the parameter estimate with its own local observations. The distributed in-network ap-

proach significantly reduces the transmission energy for communication required by the network.

However, this approach has drawbacks in terms of latency, accuracy, and convergence. While the

centralized approach takes one iteration to have access to all data, the distributed approach takes

n iterations to have seen all the data captured by the network, where n is the number of sensors.

Also, the parameter estimate of the distributed in-network algorithm is less accurate when compared to the parameter estimate of the centralized algorithm. In terms of the number of iterations,

the distributed in-network scheme converges slower than the centralized scheme. The distributed

in-network scheme remedies the issue of energy inefficiency, but suffers in terms of these other

performance parameters.

In this paper, we consider a hybrid form of the two aforementioned approaches. While the

former approach relies heavily on the fusion center and the latter approach eliminates the fusion

center altogether, we allow the fusion center to minimally interact with the sensor nodes. We

formulate a distributed in-cluster approach where the nodes are clustered and there exists a path

within each cluster that passes through each node only once as shown in Fig. 1. While the precise

mathematical formulation of the algorithm is stated in Section IV, roughly speaking, each cluster

operates similarly to the distributed in-network scheme. Within each cluster, every sensor node

updates its own parameter estimate based on its own local observations. Hence, each cluster has

its own parameter estimate. The sensor node that initiates the algorithm is designated to be the

cluster head. After completion of all the iterations within each cluster, the parameter estimate of

each cluster is then passed to the fusion center and averaged. Then, the fusion center announces

the average parameter value back to the cluster heads to start another set of cluster iterations if

necessary.

The purpose of clustering is to address the inherent inflexibility of both the centralized and the

in-network algorithms. For example, if the WSN application calls for the most accurate estimate

regardless of the communication costs, then the centralized algorithm would suffice. If the WSN

demands the most energy efficient algorithm irrespective of the other performance parameters like

latency, accuracy, and convergence speed, then the distributed in-network algorithm would be most

suitable. However, given a WSN application with specific accuracy demands or energy constraints,

are we able to develop an algorithm that is tailored to those desired performance levels? With the

distributed in-cluster algorithm, this becomes feasible: the number of clusters, or equivalently the size of the clusters, adds another dimension to the algorithm development process, and the cluster size can be adjusted to accommodate the WSN requirements.

Throughout the rest of the paper, we consider the following criteria in comparing the distributed

in-cluster, distributed in-network and centralized algorithms:

• Energy efficiency

• Latency

• Estimation accuracy

• Flexible tradeoff among accuracy, robustness, and speed of convergence

We show that the proposed distributed in-cluster algorithm adds a flexible tradeoff among all

the aforementioned criteria. Specifically, due to clustering, we are able to control the estimation

accuracy since the residual error scales as a function of the number of clusters. The inclusion

of clusters improves the scaling behavior of the estimation accuracy and latency. We use the

centralized and the distributed in-network algorithms as extreme cases of maximal and minimal

energy usage, respectively. For the special case where the WSN has √n clusters with each cluster having √n sensors, we show that the transport cost of the distributed in-cluster algorithm has the same order of magnitude as that of the distributed in-network algorithm. However, the latency and accuracy improve by a factor of √n under this specific clustering scenario.

The organization of the paper is as follows. Previous work in the areas of distributed incremental

estimation and clustering is discussed in Section II. We formulate the problem and provide two

concrete applications in Section III. The distributed in-cluster algorithm is precisely formulated

in Section IV and convergence analysis is discussed in Section V. Section VI surveys the benefits

of clustering on the performance parameters. Analytical results are verified in Section VII by

simulations that involve two applications, least-squares and robust estimation, followed by the

conclusion in Section VIII.

II. PREVIOUS WORK

The ideas of incremental subgradient methods and clustering applied to distributed estimation

are quite prevalent in the literature. Incremental subgradient methods were first studied by Kibardin

[3] and then, more recently, by Nedić and Bertsekas [4]. Then, Rabbat and Nowak [2] applied

the framework used in [4] to handle the issue of energy consumption in distributed estimation

in WSNs. To further save on energy expenditure, Rabbat and Nowak [5] also implemented a

quantization scheme for distributed estimation, in which they showed that quantization does not

affect the rate of convergence for the incremental algorithms.

Clustering schemes have been implemented throughout the area of WSNs to provide a hierarchical

structure to minimize the energy spent in the system (see [6] and references therein). The purpose

of clustering is either to minimize the number of hops the data needs to arrive at a destination or

to provide a fusion point within the network to consolidate the amount of data sent. In our work,

clustering is applied to the framework of distributed incremental estimation to provide a better

scaling law for estimation accuracy and latency in relation to the number of clusters.

III. PROBLEM FORMULATION

Consider a WSN with n sensors, each sensor taking m measurements. The parameter

estimation objective of the WSN can be viewed as a convex optimization problem if the distortion

measure between the parameter estimate and the data is convex. The problem is

$$\begin{aligned} \text{minimize} \quad & f(x,\theta) \\ \text{subject to} \quad & \theta \in \Theta \end{aligned} \qquad (1)$$

where f : R^{nm+1} → R is a convex cost function, x ∈ R^{nm} is the vector of all the observations collected by the WSN, θ is a scalar, and Θ is a nonempty, closed, and convex subset of R. Note that the entries of x are constants while θ is the variable.

One method to decompose Problem (1) into a distributed optimization problem is to assume

that the cost function has an additive structure. The additive property states that the social cost

function given all the WSN data, f (x, θ), can be expressed as the normalized sum of individual

cost functions given only individual sensor data, fi (xi , θ). Hence, the problem becomes

$$\begin{aligned} \text{minimize} \quad & f(x,\theta) = \frac{1}{n}\sum_{i=1}^{n} f_i(x_i,\theta) \\ \text{subject to} \quad & \theta \in \Theta \end{aligned} \qquad (2)$$

where fi(xi, θ) : R^{m+1} → R is a convex local cost function for the ith sensor, using only its own measurement data xi ∈ R^m.

Although the additive property does not hold for general cost functions, two important appli-

cations in estimation satisfy this property: least squares estimation and robust estimation with the

Huber loss function.

A. Least Squares Estimation

The simplest estimation procedure is least-squares estimation. For the classical least squares

estimation problem, the distortion measure is f(x, θ) = ‖x − θ1‖², where ‖ · ‖ is the Euclidean

norm. Clearly, the least-squares distortion measure is convex and additive. Hence, the optimization

problem is formulated as follows,

$$\begin{aligned} \text{minimize} \quad & \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{1}{m}\sum_{p=1}^{m} \|x_i^p - \theta\|^2\right\} \\ \text{subject to} \quad & \theta \in \Theta \end{aligned} \qquad (3)$$

where $f_i(x_i,\theta) = \frac{1}{m}\sum_{p=1}^{m}\|x_i^p - \theta\|^2$ and $x_i^p$ denotes the pth entry in the vector of observations

from sensor node i. The beauty of least-squares estimation lies in its simplicity, but the technique is

prone to suffer greatly in terms of accuracy if some of the measurements are not as accurate as the

others. If some measurements have higher variances than other measurements, the least-squares

inference procedure does not take this effect into account. Thus, the least squares procedure is

highly sensitive to large deviations. To make the inference procedure more robust to these types of

deviations, the following robust estimation procedure is often used.

B. Robust Estimation

Another practical application is the robust estimation problem, which has the following form,

$$\begin{aligned} \text{minimize} \quad & \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{1}{m}\sum_{p=1}^{m} f_H(x_i^p,\theta)\right\} \\ \text{subject to} \quad & \theta \in \Theta \end{aligned} \qquad (4)$$

where

$$f_H(x_i^p,\theta) = \begin{cases} \|x_i^p - \theta\|^2/2, & \text{if } \|x_i^p - \theta\| \le \gamma \\ \gamma\|x_i^p - \theta\| - \gamma^2/2, & \text{if } \|x_i^p - \theta\| > \gamma \end{cases} \qquad (5)$$

and γ ≥ 0 is the Huber loss function constant [7]. Note that $f_i(x_i,\theta) = \frac{1}{m}\sum_{p=1}^{m} f_H(x_i^p,\theta)$. The

purpose behind robust estimation is to introduce a new distortion measure that puts more weight

on good measurements, and less weight on, or even discards, bad measurements. The parameter

γ sets the threshold for the measurement values around the parameter estimate θ, i.e., the values

within a γ–range of θ are considered good measurements and the values outside a γ–range of θ are

considered bad measurements. As γ → ∞, robust estimation reduces to least-squares estimation.
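As a concrete illustration of these two local cost functions (a minimal sketch of our own, not code from the paper; the function names are hypothetical), the following Python snippet evaluates fi(xi, θ) for both criteria and the bounded Huber gradient for a scalar parameter:

```python
import numpy as np

# Minimal sketch (ours, not the paper's code) of the local costs in Eqs. (3)-(5);
# xi is the length-m vector of measurements at one sensor, theta a scalar estimate.

def local_least_squares(xi, theta):
    """Least-squares local cost: (1/m) * sum_p (x_i^p - theta)^2."""
    return np.mean((np.asarray(xi) - theta) ** 2)

def local_huber(xi, theta, gamma=1.0):
    """Huber local cost: quadratic within a gamma-band of theta, linear outside."""
    r = np.abs(np.asarray(xi) - theta)
    return np.mean(np.where(r <= gamma, 0.5 * r ** 2, gamma * r - 0.5 * gamma ** 2))

def local_huber_gradient(xi, theta, gamma=1.0):
    """Gradient of the Huber local cost w.r.t. theta; its magnitude is at most gamma."""
    r = np.asarray(xi) - theta
    return -np.mean(np.clip(r, -gamma, gamma))
```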

IV. DISTRIBUTED IN-CLUSTER ALGORITHM

To solve a convex optimization problem like (1), the most common method used is a gradient

descent method. Given any starting point θ̂ ∈ dom f , update θ̂ by descending along the gradient

of f . To formalize this procedure, let

$$\hat{\theta}_{\mathrm{new}} = \hat{\theta}_{\mathrm{old}} + \alpha\,\Delta\hat{\theta} , \qquad (6)$$

where α is the step size and ∆θ̂ = −∇f . The convexity of the function f guarantees that a

local minimum will be a global minimum. However, if the function f is not differentiable, then a

subgradient can be used. A subgradient of any convex function f (x) at a point y is any vector g

such that f(x) ≥ f(y) + (x − y)^T g, ∀x. For a differentiable function, the subgradient is just the

gradient.

Along with convexity, if the cost function has an additive structure, a variant of the subgradient

method can be used. This method is called the incremental subgradient method [8]. The key idea

of the incremental subgradient algorithm is to sequentially take steps along the subgradients of the

marginal function fi (xi , θ) instead of taking one large step along the subgradient of f (x, θ). In

doing so, the parameter estimate can be adjusted by the subgradient of each individual cost function

given only its individual observations. Following this procedure leads to the distributed in-network

algorithm [2]. The convergence results for the algorithm follow directly from the incremental

subgradient method.
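For intuition, a minimal sketch (ours, not the authors' implementation) of this incremental update along the in-network path is shown below; `local_grad` is a hypothetical callable returning a subgradient of fi at the current estimate, and the 1/n factor matches the normalized cost in Eq. (2) and the update in Eq. (7):

```python
# One pass of the in-network incremental subgradient scheme: the estimate visits
# each of the n sensors once, and each sensor steps along its local subgradient.
def in_network_cycle(theta, n, alpha, local_grad):
    for i in range(n):
        theta = theta - alpha * local_grad(i, theta) / n
    return theta
```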

We now describe the in-cluster algorithm. Consider a WSN with nC clusters and nS sensors per cluster, where nS · nC = n; we assume n factors accordingly. Note that the distributed in-network algorithm can be viewed as a special case of the distributed in-cluster algorithm in which the entire network forms a single cluster, i.e., nC = 1 and nS = n.

We use i = 1, . . . , nS to index the sensor nodes within a cluster, j to index clusters, and k to index the iteration number.

Let i = 0 represent the cluster head. Let fi,j (xi,j , θ) and φi,j,k denote the local cost function and

the parameter estimate at node i in cluster j during iteration k, respectively. For conciseness, we

suppress the dependency of f on the parameters in the notation and let fi,j (xi,j , θ) = fi,j (θ) and

f (x, θ) = f (θ). Also, let θk be the estimate maintained by the fusion center during iteration k. We

need to update θk for k = 1, 2, . . . , by local updates of φi,j,k .

Each iteration k of the distributed in-cluster algorithm proceeds as follows:

1) Fusion center passes the current estimate θk to the cluster heads in all clusters.

Cluster heads initialize φ0,j,k = θk , ∀j.

2) Incremental update is conducted in parallel in all the clusters. Within each cluster,

the updates are conducted through update paths that traverse all the nodes in each

cluster:
$$\phi_{i,j,k} = \phi_{i-1,j,k} - \alpha_k\,\frac{g_{i,j,k}}{n} \qquad (7)$$

where αk is the step size and gi,j,k is a subgradient of fi,j evaluated at the last estimate φi−1,j,k using the local measurement data xi,j, denoted gi,j,k ∈ ∂fi,j(φi−1,j,k). If φi−1,j,k ∉ Θ, then project φi−1,j,k onto the nearest point in the a priori constraint set Θ.

3) All clusters pass the last in-cluster estimate φnS,j,k to the fusion center, which takes the average to produce the next estimate $\theta_{k+1} = \frac{1}{n_C}\sum_{j=1}^{n_C} \phi_{n_S,j,k}$.

4) Repeat.

In step 3 of the distributed in-cluster algorithm, the fusion center may process the in-cluster

estimates {φnS,j,k}_{j=1}^{nC} using a variety of methods, e.g., a weighted average depending on the signal

to noise ratio of the observations. Involving the fusion center in this way allows more flexibility in the algorithm development. In the convergence proofs, we consider the case where a simple average of the in-cluster estimates is performed.
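A minimal sketch of one iteration of steps 1)–4) is given below, under simplifying assumptions of our own: nC clusters of nS sensors, a plain average at the fusion center, and Θ = R (so no projection step is needed); `local_grad(j, i, theta)` is a hypothetical callable returning a subgradient of fi,j at θ given that node's local data.

```python
import numpy as np

def in_cluster_iteration(theta_k, nC, nS, alpha_k, local_grad):
    n = nC * nS
    cluster_estimates = []
    for j in range(nC):                  # in the WSN the clusters run in parallel
        phi = theta_k                    # step 1: cluster head starts from theta_k
        for i in range(nS):              # step 2: incremental subgradient updates, Eq. (7)
            phi = phi - alpha_k * local_grad(j, i, phi) / n
        cluster_estimates.append(phi)    # step 3: report the last in-cluster estimate
    return np.mean(cluster_estimates)    # fusion center averages -> theta_{k+1}
```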

V. CONVERGENCE

Following the approach for the distributed incremental subgradient algorithm in [8], we show

convergence for the distributed in-cluster approach. The main difference in our proofs is the emer-

gence of the clustering values, nC and nS . In these proofs, we make reasonable assumptions that

the optimal solution exists and the subgradient is bounded as shown in the following statements.

Let the true underlying state of the environment be the (finite) minimizer, θ∗ , of the cost func-

tion. Also, assume there exist scalars Ci,j ≥ 0 such that Ci,j ≥ ‖gi,j,k‖ for all i = 1, ..., nS,

j = 1, ..., nC , and k.

We start with the following lemma that is true for each cluster parameter estimate {φi,j,k }.

Lemma 1: Let {φi,j,k } be the sequence of subiterations generated by Eq. (7). Then for all y ∈ Θ

and for k ≥ 0,

$$\|\phi_{i,j,k} - y\|^2 \le \|\phi_{i-1,j,k} - y\|^2 - \frac{2\alpha_k}{n}\bigl(f_{i,j}(\phi_{i-1,j,k}) - f_{i,j}(y)\bigr) + \frac{\alpha_k^2}{n^2}\,C_{i,j}^2 \quad \forall\, i, j . \qquad (8)$$

See the Appendix for the proof.

By summing all the inequalities in Eq. (8) over all i = 1, ..., nS and j = 1, ..., nC , we have the

following lemma for each parameter estimate {θk }.

Lemma 2: Let {θk} be a sequence generated by the distributed in-cluster method. Then, for all

y ∈ Θ and for k ≥ 0

$$\|\theta_{k+1} - y\|^2 \le \|\theta_k - y\|^2 - \frac{2\alpha_k}{n_C}\bigl(f(\theta_k) - f(y)\bigr) + \frac{\alpha_k^2}{n^2 \cdot n_C}\,\widehat{C}^2 , \qquad (9)$$

where $\widehat{C}^2 = \sum_{j=1}^{n_C}\bigl\{\sum_{i=1}^{n_S} C_{i,j}\bigr\}^2$.

See the Appendix for the proof.


Lemma 2 guarantees that the distance ‖θk − y‖ decreases from one iteration to the next provided $\alpha_k < 2n^2\bigl(f(\theta_k) - f(y)\bigr)/\widehat{C}^2$. This

key lemma is a crucial step in proving all the subsequent theorems.



Theorem 1: Let {θk } be a sequence generated by the distributed in-cluster method. Then, for a

fixed step-size, αk = α, and using Lemma 2, we have


$$\liminf_{k\to\infty} f(\theta_k) \le f(\theta^*) + \frac{\alpha \widehat{C}^2}{2n^2} \qquad (10)$$

where $f(\theta^*) = \inf_{\theta\in\Theta} f(\theta)$ and $\widehat{C}^2 = \sum_{j=1}^{n_C}\bigl\{\sum_{i=1}^{n_S} C_{i,j}\bigr\}^2$.

See the Appendix for the proof.

If all the subgradients gi,j,k are bounded by one scalar C, we have the following corollary.

Corollary 1: Let C0 = max_{i,j} Ci,j. It is evident that C0 ≥ ‖gi,j,k‖ for all i = 1, ..., nS and

j = 1, ..., nC . Then, for a fixed step-size, αk = α,

$$\liminf_{k\to\infty} f(\theta_k) \le f(\theta^*) + \frac{\alpha C_0^2}{2 n_C} . \qquad (11)$$
Since the incremental subgradient method is a primal feasible method, the lower bound f(θk) ≥ f(θ∗) always holds. Therefore, the sequence of cost values f(θk) will eventually be trapped between f(θ∗) and f(θ∗) + αC₀²/(2nC). The fluctuation around this equilibrium,

$$R = \frac{\alpha C_0^2}{2 n_C} , \qquad (12)$$

is the residual estimation error that arises because a constant step size is used in the subgradient method.

Comparing Corollary 1 with both the standard results in [8] and the result used by Rabbat and

Nowak in [2], we observe that in our case, we have a smaller threshold tolerance. Using the same

assumptions as in Corollary 1, in the distributed in-network case,

$$\liminf_{k\to\infty} f(\theta_k) \le f(\theta^*) + \frac{\alpha C_0^2}{2} . \qquad (13)$$

Thus, as k → ∞, the distributed in-network algorithm converges to an (αC₀²/2)-suboptimal solution, whereas the distributed in-cluster algorithm converges to an (αC₀²/(2nC))-suboptimal solution. We observe a key advantage of the in-cluster approach: the estimation accuracy is tighter by a factor of nC.

Even for medium scale sensor networks, a factor of nC can be an order of magnitude improvement.
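As a concrete numerical illustration (our own numbers, not results from the paper): with α = 0.4, C₀ = 1, and nC = 10 clusters,

$$\frac{\alpha C_0^2}{2} = 0.2 \;\;\text{(in-network)}, \qquad \frac{\alpha C_0^2}{2 n_C} = 0.02 \;\;\text{(in-cluster)},$$

so the residual error bound shrinks by exactly the factor nC = 10.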

The next theorem provides the necessary number of iterations, K, to achieve a certain desired

estimation accuracy.

Theorem 2: Let {θk } be a sequence generated by the distributed in-cluster method. Then, for a

fixed step-size α and for any positive scalar ε,


$$\min_{0 \le k \le K} f(\theta_k) \le f(\theta^*) + \frac{1}{2}\left(\frac{\alpha \widehat{C}^2}{n^2} + \varepsilon\right) \qquad (14)$$

where K is given by

$$K = \left\lfloor \frac{n_C\,\|\theta_0 - \theta^*\|^2}{\alpha\,\varepsilon} \right\rfloor . \qquad (15)$$

See the Appendix for the proof.

Along the same lines as Theorem 1 and Corollary 1, we have the following corollary.

Corollary 2: Let C0 = max_{i,j} Ci,j. Then, for a fixed step-size α and for any positive scalar ε,

$$\min_{0 \le k \le K} f(\theta_k) \le f(\theta^*) + \frac{1}{2}\left(\frac{\alpha C_0^2}{n_C} + \varepsilon\right) \qquad (16)$$

where K is given by Eq. (15).

By observing Eq. (16), we see that the index k refers to the parameter estimate obtained at the end of each cluster cycle. Since each cluster performs nS sub-iterations per cycle, the total number of iterations required for an accuracy of $\frac{1}{2}\left(\frac{\alpha C_0^2}{n_C} + \varepsilon\right)$ is $n_S \left\lfloor \frac{n_C\,\|\theta_0 - \theta^*\|^2}{\alpha\varepsilon} \right\rfloor$. In comparison, for the distributed in-network case, the total number of iterations required for an accuracy of $\frac{1}{2}(\alpha C_0^2 + \varepsilon)$ is $n \left\lfloor \frac{\|\theta_0 - \theta^*\|^2}{\alpha\varepsilon} \right\rfloor$. Therefore, for both algorithms, the total number of iterations necessary is of the same order of magnitude, while the benefit of the distributed in-cluster algorithm over the distributed in-network algorithm is an accuracy improvement by a factor of nC.
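These counts can be tabulated directly; the sketch below (with illustrative inputs of our own choosing, not values from the paper) compares the two expressions:

```python
import math

def total_iterations_in_cluster(dist0_sq, alpha, eps, nC, nS):
    """nS sub-iterations per cluster cycle times K from Eq. (15)."""
    return nS * math.floor(nC * dist0_sq / (alpha * eps))

def total_iterations_in_network(dist0_sq, alpha, eps, n):
    """In-network count: n * floor(||theta_0 - theta*||^2 / (alpha * eps))."""
    return n * math.floor(dist0_sq / (alpha * eps))

# With n = 100 and nC = nS = 10, both counts equal 10000 here: the same order
# of magnitude, as stated in the text.
print(total_iterations_in_cluster(4.0, 0.4, 0.1, nC=10, nS=10))
print(total_iterations_in_network(4.0, 0.4, 0.1, n=100))
```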

Another natural extension is to vary the step-size. For a fixed step-size, complete convergence cannot be achieved: the parameter estimates {θk} enter a limit cycle after a finite number of iterations. To force convergence to the optimal value f(θ∗), the step-size can be set to diminish at a rate inversely proportional to the iteration number, e.g., αk = α/k. More generally, we have the following theorem.

Theorem 3: Let {θk } be a sequence generated by the distributed in-cluster method. Also, assume

that the step-size αk satisfies

$$\lim_{k\to\infty} \alpha_k = 0 \qquad \text{and} \qquad \sum_{k=0}^{\infty} \alpha_k = \infty ,$$

then

$$\liminf_{k\to\infty} f(\theta_k) = f(\theta^*) .$$

See the Appendix for the proof.

VI. BENEFITS OF CLUSTERING ON PERFORMANCE PARAMETERS

A. Energy Efficiency

The main expenditure of energy for a WSN is in the cost of communication. This entails trans-

porting bits either from sensor to sensor or from sensor to fusion center. So, the transport cost

would be a good measure of the energy usage for a WSN.

For example, consider our original WSN consisting of n sensors where each sensor has m mea-

surements. The sensors are distributed randomly (uniformly) over one square meter. We use

bit-meters as the metric to measure the transport cost in the transmission of data.

In the centralized setting, all n sensors send their m observations to the fusion center, requiring

O(mn) bits to be transmitted over an average distance of O(1) meters. In total, the transport cost

is O(mn) bit-meters.

In the distributed in-network setting, all n sensors use their m observations to update the pa-

rameter estimate and pass the parameter estimate along the path that contains all the sensor nodes.

The distributed in-network method needs O(n) bits to be transmitted over an average distance of

O(1/√n) meters. In total, the transport cost is O(√n) bit-meters.

In the distributed in-cluster setting, the sensor network forms nC clusters with nS nodes per

cluster. This method requires O(n) bits to be transmitted over an average distance of O(1/√n)

meters which accounts for the sensor to sensor transport cost and O(nC ) bits to be transmitted over

an average distance of O(1) meters which accounts for the sensor to fusion center transport cost.

Thus, the transport cost is O(√n + nC) bit-meters.

An interesting result arises when the number of clusters and the number of sensors per cluster are equal, nC = nS = √n. For this case, the total transport cost of the distributed in-cluster algorithm becomes O(√n). Thus, the distributed in-cluster algorithm and the distributed in-network algorithm have transport costs of the same order of magnitude when nC = nS = √n.
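A quick Monte Carlo check (our own sketch, with an arbitrary seed and n) illustrates the two distance scalings behind these estimates: the sensor-to-fusion-center distance stays O(1) while an average nearest-neighbor hop shrinks like O(1/√n).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
pts = rng.uniform(size=(n, 2))                 # sensors uniform on the unit square
center = np.array([0.5, 0.5])

avg_to_center = np.linalg.norm(pts - center, axis=1).mean()   # ~0.38, i.e. O(1)

pairwise = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
np.fill_diagonal(pairwise, np.inf)
avg_nearest_hop = pairwise.min(axis=1).mean()                 # roughly 0.5 / sqrt(n)

print(avg_to_center, avg_nearest_hop)
```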

B. Latency

Latency is defined as the number of iterations needed to see all the data captured by the network.

For the centralized case, only one iteration is needed while for the in-network case, n iterations

are needed. With the in-cluster algorithm, however, the latency of the WSN can be adjusted through the cluster size: the latency reduces to n/nC, or more simply, nS iterations, as shown in Table I.

C. Estimation Accuracy

By forming nC clusters in a WSN, estimation accuracy can be improved. For the fixed step-size case, the residual estimation error is reduced by a factor of 1/nC when compared to the distributed in-network case, as shown in Table I. The accuracy improvement by a factor of nC holds both when k tends toward infinity and when k is finite, as shown in Corollary 1 and Corollary 2, respectively.

VII. SIMULATIONS, VARIATIONS, AND EXTENSIONS

Consider a WSN with 100 sensors uniformly distributed over a region, each taking 10 measurements. The observations are independent and identically distributed from measurement to measurement, and independent from sensor to sensor. If a sensor is working properly, its measurements follow a Gaussian distribution with mean 10 and variance 1; if the sensor is defective, its measurements follow a Gaussian distribution with mean 10 and variance 100. This application can be viewed as a deterministic mean-location parameter estimation problem. The simulations assume that 10% of the sensor nodes are damaged and

a fixed step-size of αk = 0.4 is used. As summarized in this section, a variety of simulations are

conducted to verify the theorems and characterize other properties and tradeoffs in the proposed

distributed in-cluster algorithm.
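A sketch of how such measurement data might be generated (our reconstruction of the stated setup, not the authors' code) is:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 100, 10
damaged = rng.random(n) < 0.10                     # roughly 10% defective sensors
sigma = np.where(damaged, 10.0, 1.0)               # std 10 (variance 100) vs. std 1
x = 10.0 + sigma[:, None] * rng.standard_normal((n, m))   # x[i, p]: p-th reading at sensor i
alpha = 0.4                                        # fixed step size used in the simulations
```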

A. Basic Simulations

Least squares estimation and Huber robust estimation are simulated, and the resulting conver-

gence behavior of the residual value is shown in Figs. 2 and 3, respectively. In both figures, the

distributed in-network method and the distributed in-cluster method are shown by a solid line while

the centralized method is shown by a dashed line. Since a fixed step size of αk = 0.4 is used, there

are residual fluctuations around the equilibrium.

In both estimation procedures, an increase in the number of clusters causes a decrease in the fluctuations. The precise data points confirm the theoretical prediction: the distributed in-cluster fluctuation is 1/nC times the size of the distributed in-network fluctuation. In the least squares estimation plots of Fig. 2, when nC = 4 and nC = 10, the fluctuations are smaller by factors of 1/4 and 1/10, respectively, compared to the plot for nC = 1. In the robust estimation example of Fig. 3, we use a Huber parameter of γ = 1. The distributed in-cluster method again shows narrower fluctuations and is almost indistinguishable from the centralized estimation curve.

B. Accuracy, Robustness and Speed Tradeoff

To determine the tolerance bounds for the robust estimation procedure, the gradient of the Huber loss function is calculated. Since $f_i(\theta) = \frac{1}{m}\sum_{p=1}^{m} f_H(x_i^p,\theta)$, it is clear that

$$\|\nabla f_i(\theta)\| \le \gamma$$

by differentiation, while

$$\|\nabla f_i(\theta)\| \le C_0$$

by definition. Hence, C0 can be set equal to γ to provide an upper bound on the gradient.

This gives the following results that analytically characterize the tradeoff among three competing

criteria: accuracy, robustness, and speed of convergence for incremental estimation. Combining

Eq. (12) with the relation that C0 = γ, we have the following formula. For a given network size

n, the tradeoff among estimation error bound R, Huber robustness parameter γ, and constant step

size α is characterized by:

$$2 n_C R = \alpha\gamma^2 . \qquad (17)$$

For example, to maintain a desired level of robustness γ, tighter convergence bounds (smaller R) imply a slower convergence speed (smaller α). As another example, to get tighter convergence bounds, we would like R to be small. This can be achieved either by reducing α, which means a smaller step size and slower convergence speed, or by reducing γ, which means accepting less reliable data for the Huber estimation and reducing the robustness as well as the speed of convergence, since

less reliable data are used for estimation. An illustrative example is shown in Fig. 4, where we

reduce γ by a factor of 2 (cf. Figs. 4a and 4b). This reduction of γ reduces the estimation error by

about a factor of 4 but also increases the convergence time by roughly a factor of 2. The nC term in

the tradeoff characterization in Eq. (17) again highlights an advantage of the in-cluster approach:

more clusters help achieve a more efficient tradeoff.
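A small numeric sketch of Eq. (17) (with illustrative values of our own choosing, not results from the paper) makes the tradeoff concrete:

```python
def residual_bound(alpha, gamma, nC):
    """Residual error bound R implied by 2*nC*R = alpha*gamma**2, i.e., Eq. (17)."""
    return alpha * gamma ** 2 / (2 * nC)

print(residual_bound(alpha=0.4, gamma=1.0, nC=10))   # 0.02
print(residual_bound(alpha=0.4, gamma=0.5, nC=10))   # 0.005: halving gamma cuts R by 4
```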

VIII. CONCLUSION

We have presented a distributed in-cluster scheme for a WSN that uses the incremental subgra-

dient method. By incorporating clustering within the sensor network, we have created a degree

of freedom which allows us to tune the algorithm for energy efficiency, estimation accuracy, con-

vergence speed and latency. Specifically, in terms of estimation accuracy, we have shown that a

different scaling law applies to the clustered algorithm: the residual error is inversely proportional

to the number of clusters. Also, for the special case where a WSN with n sensors forms √n clusters, we are able to maintain the same transport cost as the distributed in-network scheme, while

increasing both accuracy of the estimate and convergence speed, and reducing latency. Simulations

have been provided for both least squares and robust estimation.

We plan to extend our work by relaxing the independence assumption of sensor to sensor obser-

vations. In particular, in future work, we will consider a WSN scenario where the data within each

cluster are spatially correlated, while the data from cluster to cluster are independent.

REFERENCES
[1] G. J. Pottie and W. J. Kaiser, “Wireless integrated network sensors,” Communications of the ACM, vol. 43, no. 5, pp. 51–58,

May 2000.

[2] M. Rabbat and R. Nowak, “Distributed optimization in sensor networks,” in Proceedings of the Third International Symposium

on Information Processing in Sensor Networks (IPSN 04), Berkeley, CA, 2004.

[3] V. M. Kibardin, “Decomposition into functions in the minimization problem,” Automation and Remote Control, vol. 40, pp.

1311–1323, 1980.

[4] A. Nedić and D. P. Bertsekas, “Incremental subgradient methods for nondifferentiable optimization,” Tech. Rep., Massachusetts

Institute of Technology, Cambridge, MA, 1999.

[5] M. Rabbat and R. Nowak, “Quantized incremental algorithms for distributed optimization,” IEEE Journal on Selected Areas

in Communications, vol. 23, no. 4, pp. 798–808, April 2005.

[6] S. Bandyopadhyay and E. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in

Proceedings of the 22nd Annual Joint Conference of the IEEE Computer and Communications Societies (Infocom 2003), San

Francisco, CA, 2003.

[7] P. J. Huber, Robust Statistics, John Wiley & Sons, New York, 1981.

[8] D. P. Bertsekas, Convex Analysis and Optimization, Athena Scientific, Belmont, MA, April 2003.

APPENDIX

A. Proof of Lemma 1

Proof:

$$\begin{aligned}
\|\phi_{i,j,k} - y\|^2 &= \left\|\phi_{i-1,j,k} - \alpha_k \frac{g_{i,j,k}}{n} - y\right\|^2 \\
&\le \|\phi_{i-1,j,k} - y\|^2 - \frac{2\alpha_k}{n}(\phi_{i-1,j,k} - y)\,g_{i,j,k} + \frac{\alpha_k^2}{n^2}\|g_{i,j,k}\|^2 \\
&\le \|\phi_{i-1,j,k} - y\|^2 - \frac{2\alpha_k}{n}\bigl(f_{i,j}(\phi_{i-1,j,k}) - f_{i,j}(y)\bigr) + \frac{\alpha_k^2}{n^2} C_{i,j}^2 .
\end{aligned}$$

The last line of the proof uses the fact that $g_{i,j,k}$ is a subgradient of the convex function $f_{i,j}$ at $\phi_{i-1,j,k}$. Thus,

$$f_{i,j}(\phi_{i-1,j,k}) \ge f_{i,j}(y) + (\phi_{i-1,j,k} - y)\,g_{i,j,k} .$$

B. Proof of Lemma 2

Proof:

$$\begin{aligned}
\|\theta_{k+1} - y\|^2 &= \left\| \frac{1}{n_C}\sum_{j=1}^{n_C} \phi_{n_S,j,k} - y \right\|^2
= \left\| \frac{1}{n_C}\sum_{j=1}^{n_C} \bigl(\phi_{n_S,j,k} - y\bigr) \right\|^2 \\
&\le \frac{1}{n_C}\sum_{j=1}^{n_C} \|\phi_{n_S,j,k} - y\|^2 \\
&\le \frac{1}{n_C}\sum_{j=1}^{n_C} \left\{ \|\phi_{n_S-1,j,k} - y\|^2 - \frac{2\alpha_k}{n}\bigl(f_{n_S,j}(\phi_{n_S-1,j,k}) - f_{n_S,j}(y)\bigr) + \frac{\alpha_k^2}{n^2} C_{n_S,j}^2 \right\}
\end{aligned}$$

where in the third and fourth lines of the proof, we used the Quadratic Mean-Arithmetic Mean inequality and Lemma 1, respectively. After recursively decomposing $\|\phi_{i,j,k} - y\|^2$ a total of $n_S$ times, we get

$$\begin{aligned}
\|\theta_{k+1} - y\|^2 &\le \frac{1}{n_C}\sum_{j=1}^{n_C} \left\{ \|\phi_{0,j,k} - y\|^2 - \frac{2\alpha_k}{n}\sum_{i=1}^{n_S}\bigl(f_{i,j}(\phi_{i-1,j,k}) - f_{i,j}(y)\bigr) + \frac{\alpha_k^2}{n^2}\sum_{i=1}^{n_S} C_{i,j}^2 \right\} \\
&= \|\theta_k - y\|^2 - \frac{2\alpha_k}{n \cdot n_C}\sum_{j=1}^{n_C}\sum_{i=1}^{n_S}\bigl(f_{i,j}(\phi_{i-1,j,k}) - f_{i,j}(y)\bigr) + \frac{\alpha_k^2}{n^2 \cdot n_C}\sum_{j=1}^{n_C}\sum_{i=1}^{n_S} C_{i,j}^2 \\
&= \|\theta_k - y\|^2 - \frac{2\alpha_k}{n \cdot n_C}\sum_{j=1}^{n_C}\sum_{i=1}^{n_S}\bigl(f_{i,j}(\phi_{i-1,j,k}) - f_{i,j}(y) + f_{i,j}(\theta_k) - f_{i,j}(\theta_k)\bigr) \\
&\qquad + \frac{\alpha_k^2}{n^2 \cdot n_C}\sum_{j=1}^{n_C}\sum_{i=1}^{n_S} C_{i,j}^2 .
\end{aligned}$$

Then, using the fact that

$$f(\theta_k) = \frac{1}{n}\sum_{j=1}^{n_C}\sum_{i=1}^{n_S} f_{i,j}(\theta_k) ,$$

the expression simplifies to

$$\begin{aligned}
\|\theta_{k+1} - y\|^2 &\le \|\theta_k - y\|^2 - \frac{2\alpha_k}{n_C}\left\{ f(\theta_k) - f(y) + \frac{1}{n}\sum_{j=1}^{n_C}\sum_{i=1}^{n_S}\bigl(f_{i,j}(\phi_{i-1,j,k}) - f_{i,j}(\theta_k)\bigr) \right\} \\
&\qquad + \frac{\alpha_k^2}{n^2 \cdot n_C}\sum_{j=1}^{n_C}\sum_{i=1}^{n_S} C_{i,j}^2 .
\end{aligned}$$

Then,

$$\begin{aligned}
\|\theta_{k+1} - y\|^2 &\le \|\theta_k - y\|^2 - \frac{2\alpha_k}{n_C}\left\{ f(\theta_k) - f(y) - \frac{1}{n}\sum_{j=1}^{n_C}\sum_{i=1}^{n_S} C_{i,j}\,\|\phi_{i-1,j,k} - \theta_k\| \right\} + \frac{\alpha_k^2}{n^2 \cdot n_C}\sum_{j=1}^{n_C}\sum_{i=1}^{n_S} C_{i,j}^2 \\
&\le \|\theta_k - y\|^2 - \frac{2\alpha_k}{n_C}\left\{ f(\theta_k) - f(y) - \frac{1}{n}\sum_{j=1}^{n_C}\sum_{i=2}^{n_S} C_{i,j}\,\frac{\alpha_k}{n}\sum_{m=1}^{i-1} C_{m,j} \right\} + \frac{\alpha_k^2}{n^2 \cdot n_C}\sum_{j=1}^{n_C}\sum_{i=1}^{n_S} C_{i,j}^2 \\
&= \|\theta_k - y\|^2 - \frac{2\alpha_k}{n_C}\bigl(f(\theta_k) - f(y)\bigr) + \frac{\alpha_k^2}{n^2 \cdot n_C}\sum_{j=1}^{n_C}\left\{ 2\sum_{i=2}^{n_S} C_{i,j}\sum_{m=1}^{i-1} C_{m,j} + \sum_{i=1}^{n_S} C_{i,j}^2 \right\} \\
&= \|\theta_k - y\|^2 - \frac{2\alpha_k}{n_C}\bigl(f(\theta_k) - f(y)\bigr) + \frac{\alpha_k^2}{n^2 \cdot n_C}\sum_{j=1}^{n_C}\left\{ \sum_{i=1}^{n_S} C_{i,j} \right\}^2
\end{aligned}$$

where in the second and third inequalities, we used the facts that

$$f_{i,j}(\theta_k) - f_{i,j}(\phi_{i-1,j,k}) \le C_{i,j}\,\|\phi_{i-1,j,k} - \theta_k\|$$

and

$$\|\phi_{i,j,k} - \theta_k\| \le \frac{\alpha_k}{n}\sum_{m=1}^{i} C_{m,j} , \qquad i = 1, \ldots, n_S ,$$

respectively.

C. Proof of Theorem 1

Proof: The proof is by contradiction. If Thm. 1 is not true, then there exists an ε > 0 such that

$$\liminf_{k\to\infty} f(\theta_k) > f(\theta^*) + \frac{\alpha \widehat{C}^2}{2n^2} + 2\varepsilon .$$

Let z ∈ Θ be a point such that

$$\liminf_{k\to\infty} f(\theta_k) \ge f(z) + \frac{\alpha \widehat{C}^2}{2n^2} + 2\varepsilon ,$$

and let k0 be sufficiently large so that for all k ≥ k0, we have

$$f(\theta_k) \ge \liminf_{k\to\infty} f(\theta_k) - \varepsilon .$$

Then, by combining the above two relations with Lemma 2 and setting y = z, we have, for all k ≥ k0,

$$\|\theta_{k+1} - z\|^2 \le \|\theta_k - z\|^2 - \frac{2\alpha\varepsilon}{n_C} .$$

Therefore,

$$\begin{aligned}
\|\theta_{k+1} - z\|^2 &\le \|\theta_k - z\|^2 - \frac{2\alpha\varepsilon}{n_C} \\
&\le \|\theta_{k-1} - z\|^2 - \frac{4\alpha\varepsilon}{n_C} \\
&\;\;\vdots \\
&\le \|\theta_{k_0} - z\|^2 - \frac{2(k+1-k_0)\,\alpha\varepsilon}{n_C}
\end{aligned}$$

which cannot hold for sufficiently large k.



D. Proof of Theorem 2

Proof: The proof is by contradiction. Assume that for all k with 0 ≤ k ≤ K, we have

$$f(\theta_k) > f(\theta^*) + \frac{1}{2}\left(\frac{\alpha \widehat{C}^2}{n^2} + \varepsilon\right) .$$

By setting αk = α and y = θ∗ in Lemma 2 and by combining that with the above relation, we have, for all k with 0 ≤ k ≤ K,

$$\begin{aligned}
\|\theta_{k+1} - \theta^*\|^2 &\le \|\theta_k - \theta^*\|^2 - \frac{2\alpha}{n_C}\bigl(f(\theta_k) - f(\theta^*)\bigr) + \frac{\alpha^2 \widehat{C}^2}{n^2 \cdot n_C} \\
&\le \|\theta_k - \theta^*\|^2 - \frac{2\alpha}{n_C}\cdot\frac{1}{2}\left(\frac{\alpha \widehat{C}^2}{n^2} + \varepsilon\right) + \frac{\alpha^2 \widehat{C}^2}{n^2 \cdot n_C} \\
&= \|\theta_k - \theta^*\|^2 - \frac{\alpha\varepsilon}{n_C} .
\end{aligned}$$

If we sum the above inequalities over k for k = 0, 1, . . . , K, we have

$$\|\theta_{K+1} - \theta^*\|^2 \le \|\theta_0 - \theta^*\|^2 - \frac{(K+1)\,\alpha\varepsilon}{n_C} .$$

Thus, it is evident that

$$\|\theta_0 - \theta^*\|^2 - \frac{(K+1)\,\alpha\varepsilon}{n_C} \ge 0 ,$$

which contradicts the definition of K.

E. Proof of Theorem 3

Proof: The proof is by contradiction. If Thm. 3 does not hold, then there exists an ε > 0 such that

$$\liminf_{k\to\infty} f(\theta_k) - 2\varepsilon > f(\theta^*) .$$

Then, using the convexity of f and Θ, there exists a point z ∈ Θ such that

$$\liminf_{k\to\infty} f(\theta_k) - 2\varepsilon \ge f(z) > f(\theta^*) .$$

Let k0 be large enough that for all k ≥ k0, we have

$$f(\theta_k) \ge \liminf_{k\to\infty} f(\theta_k) - \varepsilon .$$

Then, by combining the above two relations, we have for all k ≥ k0,

$$f(\theta_k) - f(z) \ge \varepsilon .$$

By setting y = z in Lemma 2 and by combining that with the above relation, we obtain for all k ≥ k0,

$$\begin{aligned}
\|\theta_{k+1} - z\|^2 &\le \|\theta_k - z\|^2 - \frac{2\alpha_k}{n_C}\,\varepsilon + \frac{\alpha_k^2 \widehat{C}^2}{n^2 \cdot n_C} \\
&= \|\theta_k - z\|^2 - \frac{\alpha_k}{n_C}\left(2\varepsilon - \frac{\alpha_k \widehat{C}^2}{n^2}\right) .
\end{aligned}$$

Since αk → 0, we may assume that k0 is large enough so that

$$2\varepsilon - \frac{\alpha_k \widehat{C}^2}{n^2} \ge \varepsilon , \qquad \forall k \ge k_0 .$$

Thus, for all k ≥ k0, we have

$$\begin{aligned}
\|\theta_{k+1} - z\|^2 &\le \|\theta_k - z\|^2 - \frac{\alpha_k \varepsilon}{n_C} \\
&\le \|\theta_{k-1} - z\|^2 - \frac{(\alpha_k + \alpha_{k-1})\,\varepsilon}{n_C} \\
&\;\;\vdots \\
&\le \|\theta_{k_0} - z\|^2 - \frac{\varepsilon}{n_C}\sum_{j=k_0}^{k} \alpha_j ,
\end{aligned}$$

which cannot hold for sufficiently large k.


[Figure 1: schematic of the clustered WSN and the fusion center.]

Fig. 1. Illustration of a sensor network implementing the distributed in-cluster algorithm. The dash-dotted lines represent the borders of the clusters. The shaded nodes represent the cluster heads that communicate with the fusion center. All clusters run the algorithm in parallel, although in the schematic only the lower-right cluster is shown running the incremental subgradient algorithm.

                             Energy Efficiency    Latency        Estimation Accuracy
                             (bit-meters)         (iterations)   (residual error)
centralized                  O(mn)                1              0
distributed in-network       O(√n)                n              αC₀²/2
distributed in-cluster       O(√n + nC)           n/nC           αC₀²/(2nC)
special case (nC = √n)       O(√n)                √n             αC₀²/(2√n)

TABLE I
Summary of performance tradeoffs among the different algorithms. Note that in the special case where nC = √n, the distributed in-cluster and in-network algorithms have the same transport cost, but the latency and estimation accuracy are improved by a factor of 1/√n.
[Figure 2: three panels of least squares residual value vs. total number of iterations (fixed step size, 10% of sensors damaged), comparing the centralized method (dashed line) with the distributed method (solid line).]

Fig. 2. Plots of least squares residual value vs. total number of iterations for three different clustering scenarios are shown. (a) Distributed in-network algorithm, (b) Distributed in-cluster algorithm with nC = 4 and nS = 25, (c) Distributed in-cluster algorithm with nC = 10 and nS = 10.


[Figure 3: three panels of robust residual value vs. total number of iterations (fixed step size, 10% of sensors damaged), comparing the centralized method (dashed line) with the distributed method (solid line).]

Fig. 3. Plots of robust residual value vs. total number of iterations for three different clustering scenarios are shown. (a) Distributed in-network algorithm, (b) Distributed in-cluster algorithm with nC = 4 and nS = 25, (c) Distributed in-cluster algorithm with nC = 10 and nS = 10.
[Figure 4: two panels of robust residual value vs. total number of iterations (fixed step size, 10% of sensors damaged, nC = 10 and nS = 10), comparing the centralized method (dashed line) with the distributed in-cluster method (solid line).]

Fig. 4. Plots of robust residual value vs. total number of iterations for the case where nC = 10 and nS = 10 are shown. The plots are rescaled for clarity. (a) Robust estimation with γ = 1, (b) Robust estimation with γ = 0.5.
