A Novel Fault Detection and Recovery Mechanism For Zigbee Sensor Networks

2008 Second International Conference on Future Generation Communication and Networking
Jian Wan, Jianrong Wu, Xianghua Xu
Grid and Services Computing Lab, S. of Computer Science, Hangzhou Dianzi
University, Hangzhou, China
E-mail: wanjian@hdu.edu.cn, wujianlong111@163.com, xianghuaxu@yahoo.com.cn
essential to guarantee sensor network reliability and
connectivity after one or more nodes lose their
connection with the network [2] [3].
ZigBee [4] is a wireless standard based on IEEE
802.15.4 [5] that was developed to address the
needs of remote monitoring, control and sensor
network applications, and to enable broad-based
deployment of low-cost, low-power, low-data-rate
wireless networks. The ZigBee standard has a
self-repair ability, which specifies how a disconnected
node attempts to find a new parent and re-join the
network.
This paper proposes an efficient fault recovery
mechanism based on the ZigBee Specification, in which
a node is regarded as failing when its energy level
drops below a threshold value. The main
contributions of this paper are:
i) We design three types of failure recovery mechanisms,
depending on the type of node in the tree
network of ZigBee.
ii) We design an energy-efficient method for fault
recovery in a ZigBee network.
iii) We evaluate the performance of these mechanisms
as the total number of nodes in the
ZigBee network increases.
Abstract
Wireless sensor networks (WSN) are prone to
failures due to the constraining factors of
energy, memory and communication capability. This
paper describes an efficient fault detection and
recovery mechanism for ZigBee networks, an
industrial standard widely adopted for WSN. We
propose a localized tree-based method for detecting
faults due to energy exhaustion of nodes and repairing
the network after node failure. We design three types of
failure recovery mechanisms, depending on the type of node in
the tree network of ZigBee. To avoid excessive
message exchange and reduce collisions during
re-joining, we elect a new parent
from among the disconnected children, and let
the disconnected node re-join the network together with
as many descendant nodes as possible. Simulation
results show that our proposed method consumes less
energy than the original ZigBee network and prolongs
the lifetime of the network.
1 Introduction
A wireless sensor network (WSN) is composed of a
large number of low-cost, low-powered and
self-organizing sensor devices, called sensor nodes, which
can be deployed on the ground, in the air, under water,
on bodies, in vehicles, and inside buildings. The
monitoring area may be inaccessible terrain or a harsh
environment. A sensor network should therefore
possess cooperative and self-healing capabilities.
Sensor networks have a wide range of
applications, for example environment monitoring,
home automation, military sensing, patient
monitoring and building monitoring [1].
Some sensor nodes may fail for various reasons
such as energy depletion, environmental interference,
or malicious attacks. This often results in a
non-uniform network topology, and some nodes will lose
contact with the rest of the network. Therefore,
sensor nodes should be robust and reliable enough
to detect faulty nodes and take appropriate measures
to recover from the failure status. This ability is
The remainder of this paper is organized as
follows. Section II discusses related work on failure
mechanisms. Section III develops the motivation of
our improvement. Section IV gives a detailed
description of the proposed algorithm. In Section V,
we provide theoretical and simulation results
evaluating the proposed algorithm. Finally, Section
VI concludes the paper.
2 Related Work
In this section, we summarize related work in
the area of fault detection and recovery in wireless
sensor networks. In [6], M. Ding, D. Chen, K. Xing
and X. Chen present one localized algorithm for
faulty sensor identification and one localized
algorithm for fault-tolerant event boundary detection.
Both faulty sensors and sensors in an event region
*This work is supported by the Science and Technology Research
and Development Program of Zhejiang Province, China (Grant Nos.
2007C11023, 2007C21G3230005 and 2007R40G2040097).
978-0-7695-3431-2/08 $25.00 © 2008 IEEE
DOI 10.1109/FGCN.2008.105
get a stain. Existing solutions, by contrast, suppose that all
neighbors have the same influence, which inevitably
results in some false decisions.
can generate abnormal readings that deviate from a
typical application-specific range. Usually, when an
obvious change in sensor readings is detected,
something must have happened. If the change is
present at a single sensor only, the sensor is faulty.
If most neighboring sensors capture the same
phenomenon simultaneously, an event has occurred. The
proposed algorithms have a strong fault-tolerant
ability, effectively identify faulty sensors and locate
the event boundaries, and are sensitive to the settings
of thresholds.
In a tree-like sensor network with the base station at
the root of the tree, the failure of a single node can
cause a whole portion of the network to become silent.
When the base station ceases to receive
measurements from a region of nodes, it cannot
immediately determine whether this is because of the
failure of all the nodes in that region or merely the
result of the failure of a few key routing nodes.
Therefore, it is critical for the base station to trace
failed nodes in sensor networks. In [7], the authors
propose an algorithm that puts the burden of detecting
and tracing failed nodes on the base station. First,
the nodes learn the network topology and send their
portion of the topology information to the base
station. With this information the base station learns
the complete network topology, which it uses to send
route updates as soon as it detects that nodes have become
silent.
Fault recovery techniques enable systems to
continue operating according to their specifications
even if faults of a certain type are present. Multi-path
routing is desirable to prevent a single failing node
from partitioning a sensor network. Thus, a
network should be k-connected, which allows k - 1
nodes to fail while the network remains
connected [8].
G. Gupta and M. Younis, in [9], propose a
reallocation of nodes that were part of a cluster that
suffered a cluster-head failure. The cluster-head,
called gateway, is considered to be a resourceful
node. The solution presented considers that all the
gateways in the network maintain a list of the nodes
that are currently in their cluster and another backup
list of nodes that could become part of their cluster.
When a gateway fails, the nodes from its cluster are
reallocated to the other gateways that have the nodes
in their backup lists created during the time of
clustering. If more than one gateway has a specific
node in its backup list the node is assigned to the
cluster-head that has the smallest communication cost.
Finally, in [10], the authors propose a weighted
majority vote scheme to detect faulty sensors. Based
on the spatial correlations in WSN, a faulty sensor
can diagnose itself through utilizing the spatial and
time information provided by its neighbor sensors.
The proposed algorithm introduces stain to weight
different neighbors. Once the measurement of a
sensor is judged as faulty, the sensor will be said to
3 Motivation
In a ZigBee network, there are three types of nodes,
namely the coordinator, routers and end devices. The
coordinator starts a network and allows other nodes to
join it. Routers scan to find a network to join and
allow other nodes to join it. End devices cannot route
messages, cannot allow other
nodes to join, and poll their parent to get messages.
When a parent accepts a node as its child, it
assigns a 16-bit network address to the child node.
Network addresses are assigned using a distributed
addressing scheme that is designed to provide every
potential parent with a finite sub-block of network
addresses. These addresses are unique within a
particular network and are given by a parent to its
children. Besides, each ZigBee node has a globally
unique 64-bit identifier called IEEE address.
After forming a network, the coordinator
determines the maximum number of children (Cm) of
a ZigBee router, the maximum number of child
routers (Rm) of a parent node, and the depth of the
network (Lm). For a given network depth d, we may
compute the function Cskip(d), essentially the size of
the address sub-block distributed by each parent at
that depth to its router-capable child nodes, as
follows:

$$
\mathrm{Cskip}(d) =
\begin{cases}
1 + C_m (L_m - d - 1), & \text{if } R_m = 1 \\[4pt]
\dfrac{1 + C_m - R_m - C_m R_m^{\,L_m - d - 1}}{1 - R_m}, & \text{otherwise}
\end{cases}
\tag{1}
$$
The kth router and nth end device shall be assigned
their network addresses by their parent at depth d as in
the following equations:

$$
A_k = A_{parent} + \mathrm{Cskip}(d)\,(k - 1) + 1, \qquad 1 \le k \le R_m
\tag{2}
$$

$$
A_n = A_{parent} + \mathrm{Cskip}(d)\,R_m + n, \qquad 1 \le n \le C_m - R_m
\tag{3}
$$
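As an illustration only, the address assignment of equations (1) to (3) can be sketched in Python. The function names and the range checks are assumptions made for this sketch; they do not come from the ZigBee Specification or from the simulation code used in this paper.

```python
def cskip(d: int, cm: int, rm: int, lm: int) -> int:
    """Size of the address sub-block handed out at depth d, as in Eq. (1)."""
    if not 0 <= d < lm:
        raise ValueError("depth d must satisfy 0 <= d < Lm")
    if rm == 1:
        return 1 + cm * (lm - d - 1)
    # Numerator and denominator are both multiples of (1 - Rm),
    # so integer division is exact here.
    return (1 + cm - rm - cm * rm ** (lm - d - 1)) // (1 - rm)


def router_address(a_parent: int, d: int, k: int, cm: int, rm: int, lm: int) -> int:
    """Address of the k-th router child of a parent at depth d, as in Eq. (2)."""
    if not 1 <= k <= rm:
        raise ValueError("k must satisfy 1 <= k <= Rm")
    return a_parent + cskip(d, cm, rm, lm) * (k - 1) + 1


def end_device_address(a_parent: int, d: int, n: int, cm: int, rm: int, lm: int) -> int:
    """Address of the n-th end-device child of a parent at depth d, as in Eq. (3)."""
    if not 1 <= n <= cm - rm:
        raise ValueError("n must satisfy 1 <= n <= Cm - Rm")
    return a_parent + cskip(d, cm, rm, lm) * rm + n


if __name__ == "__main__":
    cm, rm, lm = 6, 5, 4
    print([cskip(d, cm, rm, lm) for d in range(lm)])
    print(router_address(0, 0, 2, cm, rm, lm))
    print(end_device_address(0, 0, 1, cm, rm, lm))
```

With (Cm, Rm, Lm) = (6, 5, 4) and the coordinator at address 0, the second router child receives address 188 and the single end-device child receives address 936, the last address of the 937-address space.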
When a node loses connection with its parent, or is
unable to transmit a message to its parent, this node
will initiate a procedure to find a new parent. Before
the disconnected node tries to re-join the network, it
shall use the disassociation procedure to disassociate
all of its children. Similarly, if a disassociated child
has its own children, it shall disassociate them from
the network before attempting re-association.
Such a process is inefficient
because it can cause many unnecessary
message exchanges and other traffic disruptions. We
therefore propose an improved method that
reduces collisions during re-joining
and message exchanges on the tree topology. Figure 1
depicts the tree topology of a ZigBee network.
information of the failure report is an indication to
start the failure recovery process.
Therefore, the detection mechanism does not
require all the nodes in the network to detect a failure;
it only informs the first-hop neighbors of the failing
node and invokes the recovery mechanism. Note that
the coordinator is considered to be mains-powered and
will not disconnect itself or be disconnected.
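The detection rule of this section can be sketched as follows. This is a minimal sketch: the `Node` class, the `FAILURE_REPORT` message name and the in-memory `inbox` are hypothetical stand-ins for a real radio stack and are not defined in the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    energy: float
    mains_powered: bool = False            # True for the coordinator
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    inbox: List[Tuple[str, str]] = field(default_factory=list)


def check_energy(node: Node, threshold: float) -> bool:
    """Periodic self-check: report a failure to one-hop tree neighbors only.

    Returns True when the node is failing, that is, when its energy has
    dropped below the threshold.
    """
    if node.mains_powered or node.energy >= threshold:
        return False
    recipients = ([node.parent] if node.parent else []) + node.children
    for neighbor in recipients:
        neighbor.inbox.append(("FAILURE_REPORT", node.name))
    return True


if __name__ == "__main__":
    coordinator = Node("coordinator", energy=0.0, mains_powered=True)
    router = Node("router", energy=0.05, parent=coordinator)
    leaf = Node("leaf", energy=1.0, parent=router)
    coordinator.children.append(router)
    router.children.append(leaf)
    print(check_energy(router, threshold=0.1), coordinator.inbox, leaf.inbox)
```

Only the parent and the children of the failing node receive the report, which matches the localized detection described above; the rest of the network is not notified.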
4.3 Fault recovery

First, we assume that sensors are deployed densely.
As described in Section III, the
self-healing process of ZigBee is inefficient: it
can cause many unnecessary message exchanges,
waste precious energy and bandwidth, and lead multiple
disconnected nodes to try to join the same potential
parent simultaneously, so that some re-join attempts are
rejected because of the limited capacity of a ZigBee node.
The procedure for finding a suitable parent is
shown in Fig. 3 and Fig. 4.
Figure 1. Tree Topology of ZigBee network
4 Proposed algorithms
4.1 Tree formation
This paper deals with the fault detection and
recovery based on a tree topology of ZigBee network.
The nodes in the network are classified into four
types: leaf node, pre-leaf node, internal node and
root node. Each type of node is
described in Table 1 and illustrated in Figure 2.
Step1. The healthy child of the failing node sends a
join request message, JOIN_REQUEST (IEEE_addr),
to all its neighbors.
Step2. Wait for the response from all the
neighbors, JOIN_RESPONSE (NWK_addr, status,
depth), where NWK_addr is the assigned network
address, status indicates whether joining is
successful, and depth is the depth of the neighbor.
Step3. If status is successful, then the neighbor is
a potential parent; choose the one with the
minimum depth of all the potential parents. If no
node is qualified, then go to Step1.
Figure 3. Procedure for joining node
Figure 2. Type of Node in ZigBee network
Table 1. Description of node types

Type of Node     Description
Root Node        Coordinator
Internal Node    Owns at least one pre-leaf node or internal node as a child
Pre-leaf Node    Its children are all leaf nodes
Leaf Node        No child nodes

Step1. Receive a JOIN_REQUEST (IEEE_addr). If there is
no capacity for any new node, then send the
JOIN_RESPONSE (NWK_addr, status, depth)
with the status of failure to the requesting node, and
this process completes. Otherwise go to the next
step.
Step2. Compute NWK_addr and depth, and then
send the JOIN_RESPONSE (NWK_addr, status,
depth) with the status of success to the requesting
node.
Figure 4. Procedure for potential parent
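The two procedures of Figures 3 and 4 can be sketched together as follows. This is a sketch under stated assumptions: the class names, the `next_free_addr` field (a placeholder for the address that a real parent would compute from equations (2) and (3)) and the use of `None` for "no qualified parent" are hypothetical and not part of the paper or the ZigBee Specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JoinResponse:
    nwk_addr: Optional[int]   # None when the request is refused
    success: bool             # the "status" field of JOIN_RESPONSE
    depth: int                # depth of the responding neighbor


@dataclass
class Neighbor:
    depth: int
    child_count: int
    max_children: int         # Cm
    next_free_addr: int       # hypothetical placeholder for Eqs. (2)-(3)

    def handle_join_request(self, ieee_addr: int) -> JoinResponse:
        # Figure 4, Step1: refuse when there is no capacity left.
        if self.child_count >= self.max_children:
            return JoinResponse(None, False, self.depth)
        # Figure 4, Step2: offer an address and report the depth.
        return JoinResponse(self.next_free_addr, True, self.depth)


def choose_parent(responses: List[JoinResponse]) -> Optional[JoinResponse]:
    """Figure 3, Step3: pick the successful response with minimum depth.

    Returns None when no neighbor qualifies, in which case the caller
    would go back to Step1 and broadcast JOIN_REQUEST again.
    """
    candidates = [r for r in responses if r.success]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.depth)


if __name__ == "__main__":
    neighbors = [
        Neighbor(depth=3, child_count=1, max_children=4, next_free_addr=300),
        Neighbor(depth=1, child_count=4, max_children=4, next_free_addr=100),
        Neighbor(depth=2, child_count=0, max_children=4, next_free_addr=200),
    ]
    responses = [n.handle_join_request(0xABCD) for n in neighbors]
    print(choose_parent(responses))
```

In this example the neighbor at depth 1 is full, so the joining node selects the neighbor at depth 2 rather than the one at depth 3.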
In this section, we propose three types of failure
recovery mechanisms, depending on the type of node in the
network. The algorithms for fault recovery are as follows:
1) Leaf Node Fault Recovery Algorithm
4.2 Fault detection

In this section, we discuss failures related to
energy exhaustion. We assume every node keeps a
record of its residual energy and periodically checks
its own energy status. When a node's energy level
drops below the threshold value, it sends a failure
report message to its parent and children. This
If the leaf node is failing, it will be ignored because
it does not affect the connectivity of the ZigBee
network. Once the parent receives the failure report
of the faulty child, it considers it a non-active
member.
2) Pre-Leaf Node Fault Recovery Algorithm
node, pre-leaf node or internal node, which will be
treated according to the procedures above; here we
do not apply the failure recovery procedure recursively.
If the pre-leaf node is failing, then its healthy
children will find another suitable parent as explained
in Figure 3. If no potential parent can be found, the
healthy child will become an orphan node since it is a
leaf node. Otherwise, the parameters (i.e., the new
network address and new network depth) of the
healthy child are updated and data transmissions then
follow the new paths. If a child of the failing
pre-leaf node is also failing, it will be ignored.
3) Internal Node Fault Recovery Algorithm
5 Performance evaluation
We evaluate the performance improvement of our
mechanism by comparing it with the built-in self-healing
process of ZigBee. We use Visual C++ as the simulation tool. In
each of our experiments, we study five different
sensor fields, ranging from 100 to 500 nodes in
increments of 100 nodes; all these nodes are
randomly deployed in a region of size 400 m × 400 m,
and the transmission range is set to 50 meters. We set
the network parameters (Cm, Rm, Lm) to three
schemes: (6, 5, 4), (4, 4, 5) and (4, 3, 6). Based on
equation (1), the maximum number of nodes
(Nm) is given by the following equation:
First, we define Ci as the number of child nodes
that have joined a node, and Ca as the
remaining capacity of this node for accepting new nodes,
which is equal to (Cm - Ci).
If the internal node is failing, then its healthy
children exchange messages that include the value
of Ca. The healthy child with the maximum Ca is
selected as the new parent of the remaining children. In
other words, as many other healthy children of the
failing internal node as possible attach directly to the
elected child.
Because of the limited capacity of the new parent, or
because a child lies beyond radio range,
one or more children may be unable to join the elected child.
This procedure is illustrated in Figure 5. Node a is
failing, and all its children elect a new parent, node b;
the remaining children c and d then try to re-join b.
Node d fails to join, because the distance between b and d
is beyond radio range.
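The election step can be sketched as below, mirroring the scenario of Figure 5. The node coordinates, the `Child` class and the rule that ties on Ca go to the first child in the list are assumptions made for this sketch; the paper does not specify them.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Child:
    name: str
    x: float
    y: float
    joined: int   # Ci: number of children this node already has


def elect_new_parent(children: List[Child], cm: int) -> Child:
    """Pick the healthy child with the largest spare capacity Ca = Cm - Ci."""
    return max(children, key=lambda c: cm - c.joined)


def reattach(children: List[Child], cm: int,
             radio_range: float) -> Tuple[str, List[str], List[str]]:
    """Attach the remaining children to the elected parent where possible.

    Returns (elected parent, attached children, children left out). A child
    is left out when it is beyond radio range or the parent is full.
    """
    parent = elect_new_parent(children, cm)
    capacity = cm - parent.joined
    attached, left_out = [], []
    for child in children:
        if child is parent:
            continue
        distance = math.dist((child.x, child.y), (parent.x, parent.y))
        if distance <= radio_range and capacity > 0:
            attached.append(child.name)
            capacity -= 1
        else:
            left_out.append(child.name)
    return parent.name, attached, left_out


if __name__ == "__main__":
    # Hypothetical positions: d is 80 m from b, beyond the 50 m range.
    healthy = [Child("b", 0, 0, 1), Child("c", 30, 0, 3), Child("d", 80, 0, 2)]
    print(reattach(healthy, cm=4, radio_range=50))
```

Children that are left out, like node d here, would fall back to the join procedure of Figure 3 to look for another parent.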
$$
N_m =
\begin{cases}
C_m L_m + 1, & \text{if } R_m = 1 \\[4pt]
\dfrac{1 - C_m R_m^{\,L_m} + C_m - R_m}{1 - R_m}, & \text{otherwise}
\end{cases}
\tag{4}
$$
From equation (4), the values of Nm for the
three schemes are 937, 1365 and 1457, respectively.
Fig. 6 shows a random network with one coordinator, 155
routers and 781 end devices
under the network parameters (Cm=6, Rm=5, Lm=4).
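These figures can be checked numerically with a short sketch of equation (4). The function names are hypothetical, and the router count assumes that every router slot at depths 1 to Lm - 1 is filled and that nodes at depth Lm are counted as end devices because they cannot accept children; this assumption is ours, chosen because it reproduces the numbers quoted above.

```python
def max_nodes(cm: int, rm: int, lm: int) -> int:
    """Maximum number of nodes Nm in the tree, as in Eq. (4)."""
    if rm == 1:
        return cm * lm + 1
    # The numerator is an exact multiple of (1 - Rm).
    return (1 - cm * rm ** lm + cm - rm) // (1 - rm)


def max_routers(rm: int, lm: int) -> int:
    """Routers at depths 1..Lm-1 in a full tree (coordinator excluded)."""
    return sum(rm ** depth for depth in range(1, lm))


if __name__ == "__main__":
    for cm, rm, lm in [(6, 5, 4), (4, 4, 5), (4, 3, 6)]:
        print((cm, rm, lm), max_nodes(cm, rm, lm))
    routers = max_routers(5, 4)
    end_devices = max_nodes(6, 5, 4) - 1 - routers
    print(routers, end_devices)
```

For (Cm, Rm, Lm) = (6, 5, 4) this gives 5 + 25 + 125 = 155 routers and 937 - 1 - 155 = 781 end devices.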
Figure 5. Failure of Internal Node
Then, the disconnected node will try to re-join a
suitable parent following the process in
Figure 3. After it has successfully rejoined the
network, it will be assigned a new network address. If
its new network depth is smaller than or equal to its
previous one, it will keep all its descendants and
assign new network addresses to its children based on
its network address and depth; this process
continues down to all leaf nodes of the sub-network.
Otherwise, if its new network depth is larger than the
previous one and smaller than Lm, it computes new
network addresses and new depths, and assigns them
to its healthy children. This process continues until
the new depth of a node becomes Lm, at which point the
node requests each of its children to leave the network.
If a child of the failing internal node is failing as
well, then the treatment depends on the type of the
node. As we know, the failing child may be a leaf
Fig. 7 shows the number of re-join related messages
for disconnected nodes. We take the number of re-join
messages of the built-in self-healing process of ZigBee
under the same conditions as 100%.
The total number of messages of our
proposed mechanism is smaller than that of the original
ZigBee mechanism. In the original mechanism,
the disconnected node requests all its descendants
to leave the network, and those disconnected children in turn
request their children to leave the network. Then
all the disconnected nodes try to re-join the
network as shown in Fig. 3. This process can result
in many message exchanges. In our mechanism, by contrast,
we elect a new parent and keep as
many descendants as possible, without
disconnecting their children. Therefore, it saves
energy and bandwidth. The effect
on the network differs when the network parameters (Cm,
Rm, Lm) differ.
6 Conclusion
In this paper, we have proposed an efficient fault
recovery mechanism, based on the ZigBee Specification,
that lets disconnected nodes re-join the network with
fewer message exchanges. A node is regarded as failing
when its energy level drops below a threshold value.
We design three types of failure recovery mechanisms,
depending on the type of node in the tree network of
ZigBee. We also elect a new parent from among the
disconnected healthy children, to reduce collisions
during the re-joining process. As the performance
evaluation shows, the proposed mechanism reduces
energy consumption and extends the lifetime of the network.
In future work, we plan to design a more
complex fault detection mechanism, and we will
combine a new cluster mechanism with the ZigBee
network.
Figure 7. Percentage of re-joining messages compared to the
built-in self-healing of ZigBee
References
[1] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E.
Cayirci. Wireless sensor networks: a survey,
Computer Networks, 2002, 38(4): 393-422.
[2] Iman Saleh, Adnan Agbaria, Mohamed Eltoweissy.
In-Network Fault Tolerance in Networked Sensor Systems,
Proceedings of the 2006 workshop on Dependability issues in
wireless ad hoc networks and sensor networks
(DIWANS'06), Los Angeles, CA, USA, September 25, 2006.
[3] Ruiping Ma, Liudong Xing, Howard E. Michel.
Fault-Intrusion Tolerant Techniques in Wireless Sensor Networks,
Proceedings of the 2nd IEEE International Symposium on
Dependable, Autonomic and Secure Computing (DASC'06),
Indianapolis, Sept.-Oct. 2006.
[4] ZigBee Alliance, "ZigBee specification", ZigBee Document
053474r13, December 2006.
[5] LAN/MAN Standards Committee, IEEE Std. 802.15.4-2003,
Wireless Medium Access Control and Physical Layer
Specifications for Low Rate Wireless Personal Area
Networks (WPANs), New York: IEEE Press, 2003.
[6] M. Ding, D. Chen, K. Xing, X. Chen. Localized
fault-tolerant event boundary detection in sensor networks,
Proceedings of the Annual IEEE Conference on Computer
Communications (INFOCOM), Miami, March 2005, 2: 902-913.
[7] J. Staddon, D. Balfanz, and G. Durfee. Efficient Tracing of
Failed nodes in Sensor Networks, ACM WSNA'02, Atlanta,
Georgia, USA, September 2002.
[8] N. Li and J. C. Hou. FLSS: A Fault-Tolerant Topology
Control Algorithm for Wireless Networks, In Proceedings of
the 10th Annual International Conference on Mobile
Computing and Networking, New York, USA, 2004: 275-286.
[9] G. Gupta and M. Younis. Fault-Tolerant Clustering of
Wireless Sensor Networks, Wireless Communications and
Networking, 3: 1579-1584, 2003.
[10] GAO Jianliang, XU Yongjun, LI Xiaowei. Online
Distributed Fault Detection of Sensor Measurements,
TSINGHUA SCIENCE AND TECHNOLOGY, 12(S1), July
2007.
Fig. 8 shows the percentage of disconnected nodes
that re-join the network successfully. We set the
network parameters (Cm, Rm, Lm, Nm) to three
schemes: (6, 5, 4, 937), (4, 4, 5, 1365) and (4, 3, 6,
1457). When the number of network nodes is small,
more routers may raise the probability of successfully
re-joining the network. As the network scale grows,
the number of neighbors of the disconnected nodes
increases, so the probability of successfully re-joining
may also rise, but the primary factor becomes
the maximum depth of the network (Lm). If the Lm
of the network is too small, many disconnected nodes
may find it more difficult to re-join the network.
Figure 8. Percentage of Success in Re-joining Network