
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 19, NO. 8, AUGUST 2008 1099

A Tree-Based Peer-to-Peer
Network with Quality Guarantees
Hung-Chang Hsiao and Chih-Peng He

Abstract—Peer-to-peer (P2P) networks often demand scalability, low communication latency among nodes, and low systemwide
overhead. For scalability, a node maintains partial states of a P2P network and connects to a few nodes. For fast communication,
a P2P network intends to reduce the communication latency between any two nodes as much as possible. With regard to a low
systemwide overhead, a P2P network minimizes its traffic in maintaining its performance efficiency and functional correctness. In this
paper, we present a novel tree-based P2P network with low communication delay and low systemwide overhead. The merits of our
tree-based network include 1) a tree-shaped P2P network, which guarantees that the degree of a node is constant in probability,
regardless of the system size (the network diameter in our tree-based network increases logarithmically with an increase in the
system size, and in particular, given a physical network with a power-law latency expansion property, we show that the diameter of our
tree network is constant), and 2) provable performance guarantees. We evaluate our proposal by a rigorous performance analysis,
and we validate this by extensive simulations.

Index Terms—Peer-to-peer systems, tree-based networks, multicast, performance analysis.

1 INTRODUCTION
PEER-TO-PEER (P2P) networks (or overlays) have recently become an active research area. Applications over P2P networks include information retrieval, content distribution, processor cycle sharing, etc. These applications often demand that their underlying P2P network infrastructures be scalable and have low diameter and overhead. For example, an Internet-scale file sharing system, namely, Oceanstore [1], is designed and deployed on top of a P2P network Tapestry [2]. Tapestry is scalable in that each node participates in the network by using $O(\log X)$ connections. Its overlay diameter is equal to $O(\log X)$, where $X$ is the total number of nodes in the system.

In this study, we concentrate on addressing the above-mentioned fundamental requirements, that is, scalability, low diameter, and low overhead, for the overlay network infrastructures. By scalability, we mean that each node only has partial knowledge regarding the entire network structure. This implies that each node in the network maintains very few overlay links. For diameter, consider that the shortest routing path $v_1, v_2, \ldots, v_n$ of any message in an overlay network $G = (V, E)$, where $v_1, v_2, \ldots, v_n \in V$ are distinct. The diameter of $G$ is the maximal path length $n$ of a path among all possible ones in $G$. An overlay with a low diameter is desirable, since a route between any two nodes visits a lesser number of intermediates and is thus less sensitive to faults of these intermediates [3]. In contrast to the diameter, the "weighted" diameter is the maximally weighted path length $\sum_{i=1}^{n-1} c_{v_i v_{i+1}}$ among all possible shortest paths, where the edge cost (that is, the delay in this study) of two adjacent nodes $v_i$ and $v_j$ on the path is denoted by $c_{v_i v_j}$. Notably, we explicitly differentiate between the "diameter" and "weighted diameter" in this paper. For a low overhead, we mean that an overlay has a low systemwide operational traffic in maintaining the performance efficiency and functional correctness of the overlay. More precisely, we estimate the overhead of an overlay as $\sum_{e \in E} c_e f$, where $c_e$ is the delay of sending a control message through the overlay link $e$, and $f$ is a predefined maximum bandwidth required for sending a control message (footnote 1). We assume that the control messages used to construct and maintain an overlay have the same message length.

In this work, we are particularly interested in studying tree-based overlay networks. We aim at designing a scalable tree-based overlay with low (weighted) diameter and overhead. Tree-based overlays are often the core infrastructures adopted by P2P applications that demand collective communication services (for example, message multicasting and reduction [5]). For example, consider a tree-based live media multicasting system in which a root peer in a tree-shaped overlay acts as a source that stores a complete media stream and offers the stream to nonroot peers. Meanwhile, each nonroot peer downloads the stream from its upstream peer and relays those downloaded to the downstream peers if available.

1. Li et al. in [4] suggest that the overhead required for constructing and maintaining an overlay is parameterized by the bandwidth metric.

. The authors are with the Department of Computer Science and Information Engineering, National Cheng-Kung University, Tainan 701, Taiwan, R.O.C. E-mail: hchsiao@csie.ncku.edu.tw.
Manuscript received 15 June 2007; revised 21 Sept. 2007; accepted 16 Oct. 2007; published online 24 Oct. 2007.
Recommended for acceptance by K. Hwang.
For information on obtaining reprints of this article, please send e-mail to: tpds@computer.org, and reference IEEECS Log Number TPDS-2007-06-0193.
Digital Object Identifier no. 10.1109/TPDS.2007.70798.
1045-9219/08/$25.00 © 2008 IEEE. Published by the IEEE Computer Society.

1.1 Previous Studies
Perhaps, the studies most relevant to our work are [6], [7], [8], [9], [10], and [11], considering the P2P setting. The earlier work relies on tree-shaped overlays to facilitate streaming media contents. Chu et al. [6] suggest constructing a mesh overlay network. Given a mesh network, a tree

subnetwork Narada with good quality links is then built and maintained. Narada implements a shortest path spanning tree algorithm and, thus, intends to minimize the weighted diameter and the overall traffic generated. The shortest path spanning tree, however, may not guarantee the network diameter to be minimal. In addition, Narada targets at a small-scale environment, which acquires the global knowledge to construct its tree network.

Banerjee et al. [7] designed a tree overlay network NICE, in which a node takes $O(\log X)$ connections to join the tree. In NICE, nodes geographically nearby form a cluster, and the cluster is the basic building block for the tree. NICE guarantees the diameter of the tree to be equal to $2 \cdot O(\log X)$. It is, however, unclear whether NICE has the minimal overhead, although the tree network exploiting the physical network topology operates toward the minimization of the overhead. It is also unclear whether the weighted diameter of NICE can have a bound guarantee.

Tran et al. [8] present the construction of a tree network, namely, ZIGZAG, similar to that in NICE. In [8], Tran et al. offer an in-depth performance analysis for ZIGZAG. Overall, if the size of a cluster is in $[k, 3k]$, ZIGZAG guarantees that each node takes $O(k^2)$ connections to join the tree and the diameter of the tree is $2 \cdot O(\log_k X)$. However, Tran et al. do not investigate the bounds for the weighted diameter and systemwide overhead.

In contrast to [6], [7], and [8], Hefeeda et al. [9] present a tree-based overlay network that exploits the physical Internet topology. Liao et al. [10] present how one can utilize the tree links that are used to bridge different tree overlays. Both studies aim at minimizing the delay of receiving a message for any nonroot peer. Clearly, such a design principle works toward the minimization of the weighted diameter and the systemwide overhead. However, the designs presented in [9] and [10] have no performance guarantees.

Chunkyspread [11] shows that tree-based overlays are viable solutions to live media broadcasting in the face of peers joining, departure, and failure. In Chunkyspread, participating nodes balance their loads in streaming data. A node is forced to connect to a new parent node if its present parent is overloaded. If the load of a node is under a targeted lower bound, the node may accommodate more children nodes. In Chunkyspread, instead of optimizing the network diameter, nodes minimize the latency of receiving media contents sent by the root.

England et al. [3] investigate the design trade-off of the data loss rate and performance-oriented metrics (for example, the delay from the source to a destination) for tree-based overlay networks. They present a tree-based overlay to achieve a desirable trade-off. Banik et al. [12] propose a tree-based network to satisfy the given constraints of the delay bound and the delay variation bound from the source to any destination.

Structured overlay networks are general-purpose communication infrastructures for P2P applications like file sharing, multicasting, information retrieval, and processor cycle sharing. Structured P2P systems, for example, Chord [13], Pastry [14], and Tapestry [2], which are all based on distributed hash tables (DHTs), may include tree structures into their designs [15], [16], [17]. A possible tree structure embedded in a DHT overlay, say, Chord, is formatted by having an internal tree node with a hash ID $x$ pick its fingers, with IDs between $[x, y)$ being the children nodes for the tree, where nodes with IDs $x$ and $y$ are immediate siblings in the tree [18], [19]. Clearly, such a tree network can serve as a collective communication substrate, where each node in the tree network maintains $O(\log X)$ children nodes, and the overall diameter of the tree is $2 \cdot O(\log X)$. However, the performance of the tree network embedded in a DHT may not be optimal in terms of the weighted diameter and overhead. For example, Bharambe et al. [20] conclude that the trees embedded in a DHT network may not have a low weighted diameter and an overhead due to the deterministic structure of a DHT network and the mismatch of the ID space and the physical network topology. Instead of relying on trees embedded in a general-purpose DHT network, in this paper, we are interested in designing a tree-shaped overlay to provide collective communication. In particular, we intend to design a tree network with good performance guarantees.

1.2 Our Idea and Contribution
In this study, we present a scalable tree network $\mathbb{T} = (V, E)$ with low (weighted) diameter and overhead. To build our tree network $\mathbb{T}$, we first design a tree-based overlay $T$ with a low diameter. We denote $T$ as $T^1$, and $T^2$ is formed by structuring $d$ disjoint $T^1$, where $d$ is a given positive integer. In general, $T^{k-1}$ consists of $d$ $T^{k-2}$ trees. In our proposal, the height $H(T^i)$ of a tree $T^i$ ($1 \le i \le k = \log_d X$) is guaranteed to have a bound of $O(\ln d)$ if we treat each subtree $T^{i-1}$ as a single node in $T^i$. Let $\mathbb{T} = T^k$. With the recurrence equation [21], this results in the height of $\mathbb{T}$ being $H(\mathbb{T}) = H(T^k) = H(T^{k-1}) + O(\ln d) = O(\ln d \cdot \log_d X) = O(\ln X)$ and, thus, the diameter of $\mathbb{T}$ being $2 \cdot O(\ln X)$. Since in $T^i$, our design allows each $T^{i-1}$ to freely pick a geographically nearby node as its parent, such a flexibility for picking a parent node reduces the weighted diameter of the resulting $\mathbb{T}$ to a constant.

We summarize our major contributions as follows:

1. We propose a decentralized algorithm that constructs and maintains a tree network with low (weighted) diameter and overhead. To our best knowledge, our design is the first attempt to address these design issues simultaneously.
2. Our tree-shaped overlay has provable performance guarantees, which is efficient in that with a constant probability, the degree of each node in the network is constant. The expected diameter of our tree network is $O(\ln X)$. Given a physical network with the power-law latency expansion [22], the expected weighted diameter is $O(\Delta)$, where $\Delta$ is the maximal delay between any two peers in the physical network.
3. We offer a thorough and rigorous theoretical analysis for our tree-shaped overlay protocol. Our analytical results have tight performance bounds. We also validate our analytical results in simulations.

1.3 Roadmap
The remainder of this paper is organized as follows: Section 2 gives the definitions, notations, and assumptions. The design of our constant-degree low-diameter tree-shaped overlay is


described in Section 3. We discuss how our overlay exploits the underlay network locality such that the resulting weighted diameter becomes constant in Section 4. Section 5 presents the performance analysis for our tree overlay protocol. We also perform the simulation study, and the simulation results are given in Section 6. We summarize our study in Section 7, with possible future research directions.

2 DEFINITIONS, NOTATIONS, AND ASSUMPTIONS
We model a P2P network as an undirected graph $G = (V, E)$, where $V$ includes the nodes participating in the system, and $E$ represents the overlay links among the nodes. An overlay edge $e \in E$ between two nodes $u$ and $v$ in $V$ is denoted by $e = uv$. In this paper, the delay of an edge $uv$ in an overlay network is denoted by $c_{uv}$. We assume in this study that nodes in $V$ may come and go. Some terminologies that are frequently used are defined as follows:

Definition 1. The simple path (or path) from $u \in V$ to $v \in V$ ($u \ne v$), denoted by $u \to v$, is a connected subgraph $(V', E') \subseteq G$ such that the cardinality $|V'| = |E'| + 1$, where $V' \subseteq V$, and $E' \subseteq E$. We let the path length $|u \to v| = |V'|$.

Definition 2. The shortest path length from $u \in V$ to $v \in V$ ($u \ne v$), denoted by $l_{u,v}$, is $l_{u,v} = \min\{|u \to v| : \forall\, u \to v \subseteq G\}$. The shortest path length is the length of the shortest simple path from $u$ to $v$.

Definition 3. The diameter of a graph $G$, denoted by $D_G$, is the maximal shortest path length $|u \to v|$ from any node $u \in V$ to any $v \in V$. That is, $D_G = \max\{l_{u,v} \mid \forall\, u \ne v \in V\}$.

Definition 4. The degree of a node $v \in V$, denoted by $d^G_v$, is $|U|$, where any node $u \in U \subseteq V - \{v\}$ has $uv \in E$.

We assume in this study that there exists at least a robust bootstrap node to help a node join/rejoin the network. Table 1 lists the notations frequently used in this paper.

TABLE 1
Notations Frequently Used

3 CONSTANT-DEGREE SMALL-DIAMETER OVERLAY NETWORK PROTOCOL
In this section, we first give an overview of the idea regarding our tree-based overlay formation protocol in Section 3.1. The details of the tree protocol are then given in Sections 3.2 and 3.3. We defer the discussion of how our tree network exploits the physical network locality in Section 4.

3.1 Overview
Fig. 1 shows our idea for constructing a constant-degree low-diameter tree. Basically, our tree is recursively formed in a hierarchical fashion. The basic element of our tree is a $T^1$ tree. A $T^i$ tree is built by at most $\hat{d}^T_v$ $T^{i-1}$ trees, where $1 \le i \le k$. The resulting tree that our tree protocol constructs is $\mathbb{T} = T^k$. We note that $(\hat{d}^T_v)^k$ is the maximum number of nodes in $\mathbb{T}$.

Fig. 1. An example of a $T^k$ tree consisting of six $T^{k-1}$ trees, in which each node $v$ in each of $T^{k-1}$ and $T^k$ trees has a maximum degree $\hat{d}^T_v = 6$.

When forming a $T^i$ tree, nodes self organize, and the maximal path length from the root to any leaf is bounded. We note that in each $T^i$ tree, the root node, denoted by $r$, is associated with an only child node $r.chd[1]$. This allows us to minimize the degree of the root. In contrast, nonroot nodes can use up to the degree of $\hat{d}^T_v$ to participate a $T^i$ tree.

Once a $T^i$ tree is constructed, its root node proceeds to join a $T^{i+1}$ tree. Possibly, the root remains a root node of a $T^{i+1}$ tree. Otherwise, it can connect not more than $\hat{d}^T_v$ nodes in $T^{i+1}$. For example, in Fig. 1, the root node A is associated with an only child node C in a $T^{k-1}$ tree. The root node A then participates in a $T^k$ tree and maintains another child node D for the $T^k$ tree. That is, a node may participate in several $T^k$ trees (where $k = 1, 2, 3, \ldots$), which serves as a root node in each $T^k$ and maintains an only child node for each $T^k$. In contrast, the root node B of another $T^{k-1}$ tree in Fig. 1 joins the $T^k$ tree as a nonroot node. The resulting $T^k$ tree in Fig. 1 consists of nodes A, B, D, E, F, and G that are the roots of corresponding $T^{k-1}$ trees.

We call the tree formation protocol that forms a $T^i$ tree for any $1 \le i \le k$ as the T protocol in the following discussions.

3.2 T Protocol for Constant Peers
We consider formatting and maintaining a tree network $T^i = T = (V, E)$, where $1 \le i \le k$.
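
The size bound noted in Section 3.1, that a level-k tree holds at most (maximum degree)^k nodes, can be turned into a quick capacity check. The sketch below is illustrative only; the helper names `max_nodes` and `levels_needed` are hypothetical and are not part of the paper's protocol.

```python
def max_nodes(d_hat: int, k: int) -> int:
    """Upper bound d_hat**k on the number of nodes in a level-k tree."""
    return d_hat ** k


def levels_needed(x: int, d_hat: int) -> int:
    """Smallest k >= 1 with d_hat**k >= x, computed with integers only."""
    if x < 1 or d_hat < 2:
        raise ValueError("need x >= 1 and d_hat >= 2")
    k, capacity = 1, d_hat
    while capacity < x:
        capacity *= d_hat
        k += 1
    return k


if __name__ == "__main__":
    # With a maximum degree of 6, as in Fig. 1:
    print(max_nodes(6, 2))         # capacity of a level-2 tree
    print(levels_needed(1000, 6))  # levels required for 1000 peers
```

The loop avoids floating-point logarithms, so the result is exact at the boundaries where the number of peers equals a power of the maximum degree.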


3.2.1 Network Construction
We first define the following notation:

Definition 5. The numerical difference of a node $v$ with respect to the root node $r$, that is, $\mathrm{diff}(v)$, is defined as

$$\mathrm{diff}(v) \stackrel{\mathrm{def}}{=} \begin{cases} \triangle, & F(r.chd[1]) \le F(v), \\ R + \triangle, & \text{otherwise}, \end{cases}$$

where $F$ can be an arbitrary collision-free hash function that can provide a unique ID ($\ge 1$) to a node, $R$ is the maximum value that $F$ can return, and $\triangle = F(v) - F(r.chd[1])$.

When a node A intends to join the overlay, it first connects to the bootstrap node (footnote 2) that provides an entry point, that is, the root node $r$, of the overlay (see Algorithm 1). The root node $r$ then helps A join by uniformly picking a node in the tree at random. Notably, $r$ has the knowledge of the tree topology (discussed later) such that $r$ picks a node in the tree uniformly at random without consuming any network traffic.

2. We adopt the mechanism similar to Gnutella [23], which provides a bootstrap node for a node joining. Possibly, there are several bootstrap nodes to help nodes join the overlay.

When the random node, say, B, is determined, the process is immediately performed as follows:

1. If $\mathrm{diff}(A) < \mathrm{diff}(B)$, B reports its parent node B.prt to A. Upon receiving the network address of B.prt, A then iteratively performs the joining by sending the joining request to B.prt. The joining process proceeds until the joining request is forwarded to an ancestor node Q of B.prt, and $\mathrm{diff}(Q) < \mathrm{diff}(A)$. A then connects to Q.
2. Otherwise, if $\mathrm{diff}(A) > \mathrm{diff}(B)$, A simply connects B as B's child node.

We note the following in our tree formation protocol:

1. The bootstrap node picks the first node that joins the network as the root node $r$. $r$ is then registered with the bootstrap node.
2. When nodes are forming a tree network, the root node $r$ in the tree always maintains only one child node $r.chd[1]$. The second node that joins the network simply becomes the only child of $r$. That is, $d^T_r = 1$.
3. Any node $v$, except $r$, can accept any number of nodes as their children subject to their degree constraints. That is, $d^T_v \le \hat{d}^T_v$.
4. The total number of nodes in a tree is up to $d^T_v$. That is, $d^T_v = \hat{d}^T_v$.
5. Each leaf node $v$ of the tree requires sending a live message to its parent v.prt such that v.prt can keep track of the number of its children nodes and its subtree topology. v.prt performs similarly so that the parent node (v.prt).prt of v.prt can add up the size of the "subtree" rooted at (v.prt).prt and maintain the knowledge of its subtree topology. $r$ can then have the topology of the tree and calculate the total number of the nodes in the tree.
6. If the tree rooted at $r$ contains up to $\hat{d}^T_v$ nodes, then $r$ will not include any newly coming node into its tree. $r$ deregisters from the bootstrap node. $r$ may reregister with the bootstrap if $r$ maintains less than $\hat{d}^T_v$ nodes.

3.2.2 Network Maintenance
Our tree-shaped overlay network may be fragmented due to node failure or departure. To handle the dynamics of the overlay, each node $v$ periodically pings its parent node v.prt. If v.prt fails to respond to $v$, $v$ assumes the failure of v.prt and then rejoins the network via the help of another node in the network by consulting its local cache.

Algorithm 2 details the overlay maintenance. A node A first checks whether its parent A.prt is active by sending a ping message periodically. Upon receiving a ping message, A.prt replies to A with a pong message. If A does not receive any pong message from A.prt, A then sends another ping message to A.prt. If a number of ping messages are sent and A does not receive any pong from A.prt, A then performs the rejoining operation by sending a joining request to a node U picked uniformly at random from A's cache (denoted as Cache(A)) that A locally maintains. The rejoining operation simply lets a rejoining node join the network by using the joining algorithm discussed in the previous section. In our design, before A performs its rejoining, A needs to notify all nodes in its subtree (that is,


the subtree rooted at A). Upon receiving the notification sent from A, the nodes in A's subtree leave and join the overlay. The notification can be simply implemented by sending the notification message downward a subtree.

Consider the example given in Fig. 1. A is the root of the tree $T^k$, where $T^k$ is formed by the root nodes A, B, D, E, F, and G of the tree $T^{k-1}$. Assume that D fails. E then detects the failure of D without receiving any pong messages from D. E informs the nodes (that is, B and F) in $T^k$ in the subtree rooted at E to rejoin. G performs the same operations if it has offspring nodes in $T^k$. E and G then rejoin also.

We will provide a theoretical analysis for the rejoining cost (Definition 8) in Section 5.1. Our analysis result (Theorem 4) presents that our network maintenance protocol is efficient. The expected rejoining cost of the maintenance protocol is $O(Nl^2)$, where $l$ is the diameter of a $T^i$ tree, and $N$ is the number of nodes participating in $T^i$. We emphasize that it is critical to let a node pick a node uniformly at random for its joining or rejoining the network. This enables our protocol to guarantee that $l = O(\ln N)$ in expectation. Section 6.5 verifies these analytical results.

We finally note the following for our network maintenance protocol:

1. A rejoins by first invoking join(U) shown in Algorithm 1, where U is the node ID maintained in A's local cache.
2. If A cannot find any live node U from its cache for its rejoining, A needs to locate an entry point B from the bootstrap node (footnote 3) for its rejoining with JOIN(B).
3. Similarly, any node in the subtree rooted at A rejoins by randomly picking an entry node from its cache. If the entry node is unavailable, the node rejoins via the bootstrap node.
4. If a node is performing its rejoining via an entry point that is also performing the rejoining, then the node selects another node in its cache as a new entry point. Similarly, if a node cannot find any node in its cache to help its rejoining, the node requests the bootstrap to pick one.
5. Possibly, more than one of the nodes in the subtree rooted at $r.chd[1]$ select $r$ from their local caches as their entry points for their rejoining with join($r$). If $r.chd[1]$ leaves or fails, $r$ will pick one of these nodes as its $r.chd[1]$. Otherwise, those nodes that fail to become $r.chd[1]$ then rejoin by selecting other entries from their caches (or by consulting the bootstrap node in case no live cached node can be found).

3. In this study, we assume that the bootstrap node is always alive and keeps some random nodes participating in the tree overlay.

Each node A in the network maintains a cache, denoted by Cache(A), by using a PULL algorithm. Basically, in our PULL algorithm, each node A has to periodically send a live message to its parent A.prt. Upon receiving a live message, A.prt collects the IDs in the subtree rooted at A.prt. A.prt then performs the similar by sending the received IDs to its parent. This process continues so that $r$ collects the IDs of all nodes in the tree network rooted at $r$. $r$ then disseminates these IDs to all nodes in the network along the tree structure. Consequently, each node in the network can construct and maintain its cache that contains the IDs of the nodes in the network.

We finally note that our network maintenance protocol can handle the failure of $r.chd[1]$ well. Consider a level-$i$ tree $T^i$ with a root node $r$, $r$'s only child node $r.chd[1]$, and two other nonroot nodes C and D. Assume that the children nodes of $r.chd[1]$ are C and D. If $r.chd[1]$ fails, C and D detect the failure of $r.chd[1]$ and then need to rejoin the network. If the local cache maintained by C (or D) is up to date and has $r$'s location, then C (or D) connects to $r$ and becomes an only child node of $r$. Otherwise, C (or D) consults the bootstrap node to seek a root node of a level-$i$ tree as its entry point for its rejoining.

3.3 $\mathbb{T}$: Scaling with the T Protocol
We have presented the basic algorithm for forming a tree network $T$ that can consist of up to $\hat{d}^T_v$ nodes, where $\hat{d}^T_v$ is also the maximum degree of a node.

For constructing a level-2 tree $T^2$, root nodes $r$ in distinct $T^1$ trees query the bootstrap node for their entry points. This process is identical to that of the joining of a node into a $T^1$ tree, except that the candidate entry points that can help these root nodes form their $T^2$ trees are the root nodes of $T^1$ trees. Therefore, in our tree formation protocol, we require the bootstrap node to additionally label each registry node with its level ID. The bootstrap node depends on the level ID to identify the "root level" of a registry node. That is, the root node of a $T^k$ tree will be labeled with the level ID $k$ in the bootstrap. For example, if a node is a root node of a level-3 tree, then it will have level ID 3 in the bootstrap.
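
Because the same joining step is reused at every level, it may help to see the rule of Section 3.2.1 written out. The following is a minimal single-process sketch, not the paper's Algorithm 1 (which is not reproduced in this text). Everything named here is an assumption made for illustration: the `Node` class, the helpers `attach` and `nonroot_nodes`, the ID-space size `R`, and the choice to ignore degree limits (the setting later assumed in Theorem 2). The random node is drawn from the nonroot nodes, since the root keeps only its single child.

```python
import random

R = 2 ** 16  # assumed maximum value returned by the hash F (hypothetical)


class Node:
    def __init__(self, ident):
        self.ident = ident  # stands in for F(v); assumed unique, in [1, R]
        self.prt = None     # parent pointer (v.prt in the text)
        self.chd = []       # children; root.chd[0] plays the role of r.chd[1]


def diff(v, first):
    """Numerical difference of v with respect to first = r.chd[1]."""
    delta = v.ident - first.ident
    return delta if delta >= 0 else R + delta


def attach(child, parent):
    child.prt = parent
    parent.chd.append(child)


def nonroot_nodes(root):
    """All nodes below the root, collected by a depth-first walk."""
    out, stack = [], list(root.chd)
    while stack:
        v = stack.pop()
        out.append(v)
        stack.extend(v.chd)
    return out


def join(new, root, rng=random):
    """Insert `new` below `root` and return the node it connects to."""
    if not root.chd:  # the second node becomes the root's only child
        attach(new, root)
        return root
    first = root.chd[0]
    target = rng.choice(nonroot_nodes(root))  # the random node B
    # Case 1: while diff(new) < diff(target), move up to target's parent.
    # The climb stops at r.chd[1] at the latest, because diff(r.chd[1]) = 0.
    while diff(target, first) > diff(new, first):
        target = target.prt
    # Case 2 (or end of the climb): connect as a child of target.
    attach(new, target)
    return target


if __name__ == "__main__":
    rng = random.Random(1)
    root = Node(1)
    for ident in rng.sample(range(2, R), 20):
        join(Node(ident), root, rng)
    print(len(nonroot_nodes(root)))
```

Under this rule every parent has a smaller diff value than its children, which is the ordering that the cycle-freeness argument in Section 5 relies on.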


If $\mathbb{T}$ is a level-$k$ tree $T^k$, then the above-mentioned process proceeds until multiple $T^{k-1}$ trees self organize into a $T^k$ tree. Similarly, the root nodes of these $T^{k-1}$ trees form a $T^k$ by consulting the bootstrap for the locations of roots, with level ID $k - 1$ being the entry points.

We note the following:

1. If a node is the root of $T^i$, then it must also be the root of $T^{i-1}, T^{i-2}, \ldots, T^1$. Consider the example shown in Fig. 1. The node A is not only a root node of $T^{k-1}$ but also a root of $T^k$. It is easy to verify that A is also a root node of $T^j$, where $j = 1, 2, \ldots, k - 2$.
2. Nodes can use the degree of up to $d^T_v$ to form a level-$i$ tree, where $1 \le i \le k$. Therefore, in a level-$i$ tree, the root has the "total" degree equal to $\sum_{j=1}^{i} 1 = i$, since the root node (for example, the node A in Fig. 1) maintains an only child node in each level-$k$ tree, where $k = 1, 2, \ldots, i$. In contrast, the "maximum" total degree of a nonroot node in a level-$i$ tree is $d^T_v + i - 1$. This is because a nonroot node (for example, the node B in Fig. 1) in a level-$i$ tree must be a root node of a level-$k$ tree (where $k = 1, 2, \ldots, i - 1$). The nonroot node then participates in the level-$i$ tree by using the degree up to $d^T_v$.
3. Consider a $T^i$ tree. If a node $v$ detects the failure (or departure) of its parent v.prt in $T^i$, then similar to what we have discussed in Section 3.2.2, $v$ notifies its offspring root nodes of the subtree $T^{i-1}$ in $T^i$ regarding the failure/departure. Upon receiving the notification message, a node $u$ in $T^i$ rejoins the network by using Algorithm 2.
4. A node in $T^i$ maintains $i$ caches, and each cache is constructed and maintained as mentioned in Section 3.2.2.

4 EXPLOITATION OF PHYSICAL NETWORK LOCALITY
Studies in [22], [24], and [25] present that the latency distribution between Internet end hosts likely follow the power-law latency expansion. In this study, we thus concentrate on the network graphs with the power-law latency expansion.

Definition 6. A graph follows the $\beta$-power-law latency expansion if for each node $v$ in the graph

$$N_v(x) = \alpha x^{\beta},$$

where $N_v(x)$ denotes the number of nodes that have latencies not more than $x$ to $v$, and $\alpha$ and $\beta$ are two given positive constants.

Assume that the maximal distance between any two nodes in the graph (with the power-law latency expansion) in which our tree network $\mathbb{T}$ overlays is $\Delta$. Then, the probability distribution for $N_v(x)$ is

$$P(c_{vu} < x) = \left(\frac{x}{\Delta}\right)^{\beta},$$

where $u$ is any node in the graph, and $c_{vu}$ denotes the latency from $v$ to $u$. Without loss of generality, we let $\Delta = 1$.

Definition 7. The weighted diameter in a graph $G$ is $\max\{\sum_{\text{adjacent nodes } v, u \text{ in } p} c_{vu} \mid \forall \text{ shortest paths } p \subseteq G\}$, where $p$ denotes the simple shortest path.

Clearly, without exploiting the physical network locality, $\mathbb{T}$ can have the "weighted diameter" $O(\Delta \log_{\hat{d}} X) = O(\log_{\hat{d}} X)$, where $T^i$ ($0 \le i \le k$) has $\hat{d}$ nodes, and the total nodes in $\mathbb{T}$ is $X$.

Lemma 1. Let $X_1, X_2, \ldots, X_n$ be independent random variables over [0, 1], where $X_1, X_2, \ldots, X_n$ follow the probability distribution with the $\beta$-power-law latency expansion $P(X < x) = x^{\beta}$. Let $Y = \min\{X_1, X_2, \ldots, X_n\}$. Then, there exists $Y < c$ such that $E[Y] \le n^{-1/\beta}$, where $c$ is a positive number.

Proof. Since $X_1, X_2, \ldots, X_n$ are independent and $Y = \min\{X_1, X_2, \ldots, X_n\}$,

$$\begin{aligned} P(Y \ge y) &= P(\min\{X_1, X_2, \ldots, X_n\} \ge y) \\ &= P\left(\bigcap_{i=1}^{n} (X_i \ge y)\right) \\ &= \prod_{i=1}^{n} P(X_i \ge y) \\ &= (1 - y^{\beta})^{n}. \end{aligned}$$

Since $1 - a \le e^{-a}$ (when $0 < a < 1$, and $a$ is sufficiently small), it follows that

$$\begin{aligned} E[Y] &= \int_0^1 P(Y \ge y)\, dy \\ &\le \int_0^1 e^{-n y^{\beta}}\, dy \\ &= \int_0^1 e^{-(n^{1/\beta} y)^{\beta}}\, dy \\ &= \frac{1}{s} \int_0^s e^{-x^{\beta}}\, dx, \end{aligned}$$

where $s = n^{1/\beta}$. Since $\int_0^s e^{-x^{\beta}}\, dx \le 1$, $E[Y] \le \frac{1}{s}$, and the proof follows. □

Lemma 1 indicates that if a node $v$ picks $n$ nodes uniformly at random and among these $n$ nodes, $v$ maintains the one $u$ that has the smallest delay to $v$, then the delay from $v$ to $u$ will be $n^{-1/\beta}$. This suggests that our tree protocol works in the following way to exploit the physical network locality:

1. A node $v$ in $T^i$ samples the nodes in $T^i$. Since $T^i$ maintains up to $\hat{d}^T_v$ nodes (that is, the root nodes of $T^{i-1}$ subtrees), each node can sample $\hat{d}^T_v$ nodes at most. Assume that $v$ performs $n$ samples, where $n \le \hat{d}^T_v$.
2. $v$ maintains a node $u$ that is closest in terms of the network delay among the $n$ samples.
3. $v$ then rejoins $T^i$ via $u$ if $\mathrm{diff}(u) < \mathrm{diff}(v)$. $v$ thus becomes a child node of $u$. Otherwise, $v$ still connects to its original parent.
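
The three steps above, together with the expectation bounded in Lemma 1, can be sketched as follows. This is only an illustration under stated assumptions: `closer_parent` and `expected_min_delay` are hypothetical helper names, the caller is assumed to supply the `delay` and `diff` functions, and the candidate list is assumed not to contain the sampling node itself. The numerical integral evaluates the expected minimum of n delays drawn from the power-law distribution of Lemma 1, with the maximal delay normalised to 1 and the exponent passed as `beta`.

```python
import random


def expected_min_delay(n, beta, steps=100_000):
    """Midpoint-rule estimate of the integral over [0, 1] of (1 - y**beta)**n,
    that is, the expected smallest delay among n independent samples."""
    h = 1.0 / steps
    return h * sum((1.0 - ((i + 0.5) * h) ** beta) ** n for i in range(steps))


def closer_parent(v, candidates, n, delay, diff, rng=random):
    """Return the node v should rejoin under, or None to keep its parent.

    candidates: nodes of the same level tree that v may sample (v excluded).
    delay(a, b): measured latency between a and b (measurement is out of scope).
    diff(a):     numerical difference of a, as in Definition 5.
    """
    sample = rng.sample(candidates, min(n, len(candidates)))  # step 1
    if not sample:
        return None
    u = min(sample, key=lambda w: delay(v, w))                # step 2
    return u if diff(u) < diff(v) else None                   # step 3


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, expected_min_delay(n, beta=2.0), n ** (-1 / 2.0))
```

Running the loop shows the expected delay of the chosen neighbour shrinking as the number of samples grows, which is the effect the protocol relies on to shorten each overlay hop.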


The details are shown in the following algorithm:

We note the following:

1. Any node $v \in V - \{r, r.chd[1]\}$ can simply sample nodes in v's cache.
2. Since r has the global knowledge of $T^i$, r replaces r.chd[1] by the closest offspring node among $V - \{r\}$. If so, all offspring nodes, except the newly selected r.chd[1], rejoin $T^i$ by using Algorithm 2.
3. Measuring the latency between two nodes is out of the scope of our study. Such measurements may rely on public network services such as GNP [26].
4. Nodes in the subtree rooted at v need to rejoin, since v successfully connects to a closer parent compared to its previous one. These nodes, except for v, perform rejoining by Algorithm 2.

We will show later in Section 5.3 that if $n \ge 2^{\alpha+1} (\ln X)^{\alpha}$, then the weighted diameter of our tree network TT becomes the constant $\Delta$.

5 PERFORMANCE ANALYSIS

This section provides a rigorous performance analysis for our proposal given in Sections 3 and 4. Section 5.4 concludes this section and presents the implications of our performance results by illustrating an example.

5.1 Performance of T

It is sufficient to consider the subtree $(V, E)$ rooted at r.chd[1] in T; we refer to this subtree as the overlay. Recent measurement studies [27], [28] of real P2P systems (that is, Gnutella [23] and Napster [29]) provide evidence that peers have lifetimes approximating the exponential distribution reasonably well [30]. In the following analysis, we assume that the system follows the M/M/∞ queuing model, in which peers arrive according to a Poisson distribution with parameter $\lambda$. The lifetimes for peers are independent and exponentially distributed with parameter $\mu$. The number of peers in the system at time t is denoted by $M(t)$. We also assess the load of the bootstrap node in this section.

Theorem 1. The number of peers in the system at time t is $O(E[M(t)])$, with high probability.⁴

Corollary 1. Let $\lambda/\mu = N$. If $t \gg N$, then $O(E[M(t)]) = N$.

Due to space limitations, we give the detailed proofs for Theorem 1 and Corollary 1 in [31].

Theorem 1 states that the number of nodes in the system at any time t is $O(E[M(t)])$ with high probability. Corollary 1 states that if the system time $t \gg O(N)$, then the number of nodes in the system is $O(E[M(t)]) = N$. Therefore, in the following, we will discuss the overlay operating at $t > cN$ for some c and denote the number of peers in the overlay at t by N.

Lemma 2. If an overlay is constructed using JOIN, then it will be cycle free.

Proof. Consider a cycle, denoted by $p = a_0 a_1 a_2 \cdots a_{n-1} a_0$ in the overlay, where $a_0 = r.chd[1]$, and $\{a_1, a_2, \ldots, a_{n-1}\} \subseteq V - \{r.chd[1]\}$. We consider the following cases:

1. p is a cycle, because two paths $p_1$ and $p_2$ that share the same end point $a_0$ join at a node, say, $a_i$ ($1 \le i \le n-1$). However, this is impossible, since by definition, each nonroot node in the overlay can only have one parent node.
2. p can be a circular path, even without two paths, with r.chd[1] being their end point, crossing. If so, it can be easily shown that $\mathrm{diff}(a_0) < \mathrm{diff}(a_1) < \mathrm{diff}(a_2) < \cdots < \mathrm{diff}(a_0)$. This is a contradiction, and the proof follows. □

Remark 1. If an overlay is constructed using JOIN, then a node joining the overlay will visit nodes on not more than one path, with the root node being the end point.

Theorem 2. If the overlay implements JOIN, $|V| = N$, and $\hat{d}_v = |V|$ for any $v \in V$, then the maximal path length in the overlay from r.chd[1] to any leaf is $\ln N + O(1)$ in expectation.

Theorem 3. Assume an overlay with N nodes. Denote the maximal path length in the overlay rooted at r.chd[1] by the random variable $S_N$. Then, $S_N \le 6 \ln N$, with the probability not less than $1 - N^{-1}$.

Since the proofs for Theorems 2 and 3 are lengthy, we refer the readers to our technical report in [31] for the details.

Lemma 2 and Remark 1 state that any node a takes a finite number of hops to join the overlay and that the nodes helping a join appear on only one path, with r.chd[1] being the end point. Theorems 2 and 3 show that any path with r.chd[1] being the end point has $O(\ln N)$ hops with high probability. We thus conclude as follows:

Corollary 2. If the overlay rooted at r.chd[1] with N nodes is constructed with JOIN, then a newly joining node takes $O(\ln N)$ hops with high probability to join the overlay. Clearly, T associated with the overlay has the maximal path length $O(\ln N) + 1 = O(\ln N)$ from r to any leaf in T.

4. "With high probability" in this paper denotes the probability not less than $1 - O(N^{-\Omega(1)})$.
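The churn model assumed at the start of Section 5.1 (Poisson arrivals, independent exponentially distributed lifetimes) can be explored numerically. The following Python sketch is an illustration only and is not the simulator used in this paper; the function names and the parameters `lam` (arrival rate) and `mu` (lifetime parameter) are our own. It relies on the standard result that, for such a system started empty, the expected population at time t is $(\lambda/\mu)(1 - e^{-\mu t})$, which approaches $\lambda/\mu$ once t is large compared with $1/\mu$.

```python
import math
import random

def simulate_population(lam, mu, t_end, rng):
    """Number of peers alive at time t_end, starting from an empty system."""
    t = 0.0
    alive = 0
    while True:
        t += rng.expovariate(lam)      # time of the next Poisson arrival
        if t > t_end:
            return alive
        # The peer is still present at t_end if its lifetime exceeds t_end - t.
        if rng.expovariate(mu) > t_end - t:
            alive += 1

def expected_population(lam, mu, t):
    """Closed-form mean population at time t for an initially empty system."""
    return (lam / mu) * (1.0 - math.exp(-mu * t))

if __name__ == "__main__":
    rng = random.Random(1)
    runs = 200
    mean = sum(simulate_population(50.0, 1.0, 10.0, rng) for _ in range(runs)) / runs
    print(mean, expected_population(50.0, 1.0, 10.0))
```

Averaging over independent runs gives an estimate that can be compared with the closed form; the two agree up to sampling noise.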

Our tree construction protocol (that is, JOIN) strongly depends on the uniformity in selecting a node to join the overlay. In addition to r randomly picking a node for joining a newly coming node, each node will also rely on randomly and uniformly selecting a node in its cache for helping its rejoining if the node detects the failure of its parent node.

Remark 2. If the overlay with N nodes is maintained with HEAL and PULL, then any node $u \in V$ has the identical probability to be picked as an entry point by a rejoining node v, where $u \ne v$.

We note that Remark 2 shows that if a node rejoins its tree overlay, then it will pick a node as its entry point from the $N - 1$ participating peers in V with the probability of $\frac{1}{N-1}$. If so, the JOIN algorithm shown in Algorithm 1 guarantees that the maximal path length from r.chd[1] to a leaf remains $O(\ln N)$ (see Corollary 2) with high probability, even if nodes operating the JOIN algorithm are in an environment in which nodes may come and go.

In our HEAL algorithm, if the root node p of a "subtree" in the overlay detects the failure of its parent p.prt, it needs to rejoin and notify the nodes in the subtree regarding the failure. That is, each of the nodes in the subtree is required to perform the JOIN algorithm. We are thus interested in knowing the cost associated with these rejoining operations.

Definition 8. The rejoining cost due to the failure of a node v in the overlay is defined as the number of nodes in $S_1, S_2, \ldots, S_k$ performing their rejoining operations and the number of nodes that help nodes in $S_1, S_2, \ldots, S_k$ perform their rejoining, where $S_1, S_2, \ldots, S_k$ are the subtrees rooted at v.

Theorem 4. If the overlay with N nodes is maintained with HEAL and PULL, then in the time interval $[t - \frac{N-1}{\mu}, t]$, the rejoining cost introduced by any node is not more than $l^2 + O(1)$, on the average, where l is the maximal path length from r.chd[1] to any leaf, and any $t \ge \frac{N-1}{\mu}$.

Proof. Assume that l is the maximal path length from r.chd[1] to any leaf and that $n_i$ is the number of nodes with the i-hop distance from the root. Clearly, $1 \le i \le l-1$, and $\sum_{i=1}^{l-1} n_i = N - 1$ (excluding r.chd[1]). If a node v having the k-hop ($1 \le k \le l-1$) distance from r.chd[1] detects the failure of its parent, then v will rejoin the overlay, and those nodes in the subtree rooted at v will also rejoin. We denote the number of nodes that perform the rejoining operations by $s_{k,v}$. It can be easily shown that if any node $v \in \{v_1, v_2, \ldots, v_{n_k}\}$ that has the k-hop distance from r.chd[1] detects the failure of its parent, then the total number of nodes that need to rejoin the overlay will be

$$s_k = s_{k,v_1} + s_{k,v_2} + \cdots + s_{k,v_{n_k}} = n_k + n_{k+1} + \cdots + n_{l-1} = \sum_{i=k}^{l-1} n_i.$$

The total rejoining cost $c_k$ is thus not more than $s_k + l \cdot s_k$.

Consider the time interval $[t - \frac{N-1}{\mu}, t]$ for any $t \ge \frac{N-1}{\mu}$. Thus, $N - 1 = \mu \cdot \frac{N-1}{\mu}$ nodes depart the system in the interval. In addition, each of these $N - 1$ nodes leaves the system with high probability. This is because the probability distribution of the lifetime $T_l$ of a peer is $P(T_l > t) = e^{-\mu t}$, and $P\left(T_l > \frac{N-1}{\mu}\right) = e^{-N+1} \approx 0$. Therefore, in the time interval $[t - \frac{N-1}{\mu}, t]$, the average rejoining cost $C_{avg}$ is

$$C_{avg} = \frac{c_1 + c_2 + \cdots + c_{l-1}}{N-1} \le (l+1) \cdot \frac{s_1 + s_2 + \cdots + s_{l-1}}{N-1}.$$

Since $s_i \le N - 1$ ($1 \le i \le l-1$),

$$C_{avg} \le (l+1) \cdot \frac{\sum_{i=1}^{l-1} (N-1)}{N-1} = (l+1)(l-1) = l^2 - 1.$$

The proof follows. □

Corollary 3. If the overlay with N nodes is maintained with HEAL and PULL, then in the time interval $[t - \frac{N-1}{\mu}, t]$, the rejoining cost per unit time is not more than $\frac{\mu l^2}{N-1} = O\left(\frac{\mu l^2}{N}\right)$, on the average, where $l \le 6 \ln N$, with the probability not smaller than $1 - N^{-1}$, and any $t \ge \frac{N-1}{\mu}$.

5.2 Performance of TT

As we have discussed earlier, $TT = T^k$. We will, in this section, report the performance analysis for TT regarding the degree $d_v^{TT}$ for any v and the maximal path length from the root node r to any leaf.

Theorem 5. Assume that each node v in TT initially has the degree $\hat{d}$ to form $T^i$, where $1 \le i \le k$. Then, $E[d_v^{TT}] = \hat{d} + O(1)$, and $d_v^{TT} \le 2\hat{d}$, with the probability not less than $1 - \hat{d}^{-3}$.

Theorem 6. Assume that constructing a $\hat{d}$-node $T^i$ tree takes $t(\hat{d})$ time units, where $1 \le i \le k$. If $E[t(\hat{d})] \le \frac{\hat{d}}{2(\lambda + \mu)}$ ($E[t(\hat{d})] \le \frac{\hat{d}}{2\lambda}$ when $\lambda \gg \mu$), then the number of registry nodes in the bootstrap node is less than k in expectation and not more than $k^2 + O(k)$ with the probability $1 - k^{-4} + o(1)$.

Due to space limitations, we omit the proofs for Theorems 5 and 6, and the details of the proofs can be found in [31].

Corollary 4. Let $X = \hat{d}^k$ be the total number of nodes in TT. Then, the diameter of TT is $D_{TT} = 2 \ln X$ in expectation, and with the probability not less than $1 - O(\hat{d}^{-1})$, $D_{TT}$ is not more than $12 \ln X$.

Proof. The diameter $D_{TT}$ of TT is the length of the path crossing through the root node r in $T^i$ ($i = 1, 2, \ldots, k$) from a leaf node a in $T^1$ to a leaf b in another $T^1$. Therefore, $D_{TT} = 2k \ln \hat{d} = 2 \log_{\hat{d}} X \cdot \ln \hat{d} = 2 \ln X$. By the proof in [31, Theorem 3], with the probability not less than $1 - O(\hat{d}^{-1})$, the diameter is not more than $12 \ln X$.

We can also rely on the recurrence [21] to prove this result. Let $H(T^k)$ be the maximal path length from the root of $T^k$ to any node in $T^1$. Then, we have the recurrence equation $H(T^k) = H(T^{k-1}) + \ln \hat{d}$. Since $k = \log_{\hat{d}} X$, we have $H(T^k) = \log_{\hat{d}} X \cdot \ln \hat{d} = \ln X$. We thus have the diameter $D_{TT}$ not more than $2 \ln X$, and the proof follows. □

We have shown in Theorem 5 that with high probability, the degree of any node in TT is not more than $2\hat{d}$. We are, however, also interested in knowing the maximal degree of a node in TT.

Remark 3. The root node in $TT = T^k$ has the degree $k = \log_{\hat{d}} X$. A nonroot node in the $T^k$ tree has the degree not more than $\hat{d} + k - 1$. Thus, the degree of any node in the system is not more than $\hat{d} + k - 1$.

5.3 Weighted Diameter

Corollary 5. In $T^i = (V, E)$, any node $v \in V$ that implements JOIN_VICINITY and samples n nodes in $V - \{r\}$ will connect to a parent node $u \in V - \{v\}$ such that $E[c_{vu}] \le \left(\frac{n}{2}\right)^{-1/\alpha}$.

Proof. Since $P(\mathrm{diff}(u) < \mathrm{diff}(v)) = \frac{1}{2}$, we have


Fig. 2. The maximal path length from r.chd[1] to any leaf.

Fig. 3. The diameter.

$$P\left(\overline{(c_{vu} \le y) \cap \mathrm{diff}(u) < \mathrm{diff}(v)}\right) = 1 - \frac{1}{2} P(c_{vu} \le y) = 1 - \frac{1}{2} y^{\alpha}.$$

Similar to the proof in Lemma 1, we have

$$E[c_{vu}] \le \int_0^1 e^{-\frac{n}{2} y^{\alpha}} \, dy = \int_0^1 e^{-\left(\left(\frac{n}{2}\right)^{1/\alpha} y\right)^{\alpha}} \, dy = \frac{1}{s} \int_0^s e^{-x^{\alpha}} \, dx,$$

where $s = \left(\frac{n}{2}\right)^{1/\alpha}$. Therefore, we have $E[c_{vu}] \le \left(\frac{n}{2}\right)^{-1/\alpha}$, and the proof thus follows. □

Theorem 7. Let each $T^i$ ($0 \le i \le k$) have $\hat{d}$ nodes at most. Let $X = \hat{d}^k$ be the maximum number of nodes in TT. Assume that each node in $T^i$ samples $n = c\hat{d}$ nodes, where $0 < c \le 1$. If $n \ge 2^{\alpha+1} (\ln X)^{\alpha}$, then the weighted diameter of TT is $\Theta(1)$ in expectation.

Proof. Without loss of generality, let $N = d^k$ be the maximum number of nodes in T, where $d > 1$, and $k > 1$. Assume that each node in V samples $n = cd$ nodes, where $0 < c \le 1$. Applying Corollary 4 yields the expected weighted diameter equal to $L = 2 \ln N \cdot \left(\frac{cd}{2}\right)^{-1/\alpha}$. Thus, if $d \ge \frac{2^{\alpha+1} (\ln N)^{\alpha}}{c}$, then $L \le 1$, and the proof follows. □

Corollary 6. If the weighted diameter of TT is $\Theta(1)$, then $\hat{d} \ge \frac{2^{\alpha+1} (\ln N)^{\alpha}}{c}$. That is, $d_v^{T^i} = \hat{d} \ge \frac{2^{\alpha+1} (\ln N)^{\alpha}}{c}$.

5.4 An Example

Previous studies have shown that in popular P2P networks (for example, KaZaA [32]), peers have the mean lifetime $1/\mu = 2.5$ hours [28], [33]. That is, if TT has $X = 10^5$ nodes, then $\lambda = 10^5 / 9{,}000 \approx 11.11$ newly coming peers per second (Corollary 1). We let $\hat{d} = 10$, and k is thus 5. By Theorem 5, the degree of any node is not more than 20, with the probability not less than $1 - 10^{-3} = 0.999$. If the mean network delay between two Internet end hosts is 10 ms [34], then the mean number of registry nodes in the bootstrap will be 5, and the number of registry nodes is not more than $5^2 + 5 = 30$, with the probability not less than $1 - \frac{1}{5^4} = 0.998$ (Theorem 6). This is because by Corollary 2, constructing a $\hat{d}$-node tree takes $E[t(\hat{d})] \approx 0.01 \times \hat{d} \times \lceil \ln \hat{d} \rceil = 0.3$ second, and by Theorem 6, $E[t(\hat{d})] < \hat{d} / (2\lambda) \approx 0.45$ second.

6 SIMULATION RESULTS

We have developed an event-driven simulator that allows the study of the performance of tree-based networks. The performance metrics that we are interested in include the degree distribution of participating nodes and the (weighted) diameter of the network, given the number of nodes participating in the system, the mean lifetime of the joining peers, and the initially maximal degree $\hat{d}$ of a node.

In our simulations, the number of nodes participating in the system is up to $X = 100{,}000$. The initially maximal degree $\hat{d}$ of a node simulated is from 5 to 100. Each participating peer has a lifetime with a mean of 150 minutes [28], [33]. The lifetime follows the exponential distribution. We have also investigated the effect of a mean lifetime of 30 minutes. In this paper, however, we omit the simulation results for the mean lifetime of 30 minutes, because we do not observe any significant difference between the simulation results for the two lifetime values (that is, 30 and 150 minutes).

We perform extensive simulations by averaging the performance metrics collected from 1,000 runs. Each run takes 1,600 minutes.

6.1 Height of T

Fig. 2 depicts the simulation results for the T protocol, where the performance metric, that is, the maximal path length from r.chd[1] to any leaf, is shown for the number of participating peers from 10 to 100,000. We note that the x-axis is in logarithmic scale. The results show that the maximal path length from r.chd[1] to any leaf is nearly identical to $\ln X$. This conforms to our theoretical analysis, as discussed in Section 5 (see Theorem 2).

6.2 Diameter of TT

We illustrate the diameter of the tree implementing the TT protocol in Fig. 3. In the simulations, $\hat{d}$ is 10, 100, and $\infty$. Fig. 3 shows that if $\hat{d}$ is enlarged (for example, $\hat{d} = 100$), then the diameter measured from simulations closely matches the result presented in Corollary 4. However, if $\hat{d}$ becomes relatively smaller (for example, $\hat{d} = 10$), then our


Fig. 4. (a) The BRITE topology models Waxman and Barabasi. (b) The real topology, with 500 PlanetLab nodes.

Fig. 5. The cumulative distribution function of degrees.

Fig. 6. The averaged rejoining cost per node.

TT networks have diameters larger than expected (that is, $2 \ln X$). This is because, in the proof of Corollary 4, we estimate the diameter of a tree with $\hat{d}$ nodes as $\ln \hat{d}$ instead of $\lceil \ln \hat{d} \rceil$. Thus, the diameters measured for our TT trees are not less than $2 \ln X$.

6.3 Weighted Diameter of TT

Fig. 4a illustrates the simulation results for the weighted diameters (that is, the maximal weighted path length) of our tree-shaped overlay networks over the Waxman and Barabasi topology models [35]. The performance results for exploiting and not exploiting the physical network topology (W/ Exploiting Network Locality and W/O Exploiting Network Locality, respectively) are both shown in Fig. 4a. We note that the delay between nodes in the Waxman and Barabasi topologies does not follow the distribution of the power-law latency expansion. However, our algorithms still work well, and with our algorithms, the increase in the delay between nodes is less sensitive to the number of nodes participating in the system.

We also investigate the effectiveness of our algorithms in exploiting the physical network using the real network topology of an experimental testbed, namely, PlanetLab [36]. Fig. 4b depicts the simulation results for the topology with 500 PlanetLab nodes. As we can see from the results shown in Fig. 4b, our algorithms for exploiting the physical network locality are particularly effective (see Theorem 7), since the PlanetLab network topology exhibits the power-law latency expansion [25].

Note that in the experiments with the Waxman, Barabasi, and PlanetLab topologies, we let $\hat{d} = 100$ and $\alpha = 1$ (see Theorem 7 and Corollary 6).

6.4 Degree Distribution

Fig. 5 shows the cumulative distribution function of degrees, where $(x, y)$ represents the number y of nodes having degrees not more than x. In this experiment, we let $\hat{d} = 8, 20, 40$, and $X = 100{,}000$. The simulation results are similar for different system sizes and are thus omitted in this paper.

The results in Fig. 5 conform to our analytical result presented in Theorem 5. That is, the degree of a node is unlikely to be more than the expected $\hat{d}$.

6.5 Rejoining Cost and Overhead

6.5.1 The Rejoining Cost

Theorem 4 states that a node incurs a cost of at most $O(\ln^2 N)$, on the average, to rejoin a $T^i$ network if the node detects the departure or failure of its parent, where N is the number of nodes (that is, the roots of $T^{i-1}$) in $T^i$. Our simulation results show that the averaged rejoining cost of a node is very small, given $\hat{d}$ up to 100. This indicates that our healing protocol is efficient. For further understanding of the rejoining cost of our protocol, we instead investigate $\hat{d}$ up to 100,000 (that is, $N = 100{,}000$). Fig. 6 depicts the


Fig. 7. The overheads. (a) Waxman. (b) Barabasi. (c) PlanetLab.

simulation results, which show that a node incurs a cost far less than $O(\ln^2 N)$ to repair the network.

6.5.2 The Overhead

As we have discussed in Section 1.1, the earlier proposals such as Narada [6], PROMISE [9], and Anysee [10] intend to minimize the delay of receiving messages sent by the root for any nonroot nodes. That is, these proposals work toward the construction of the shortest path spanning tree. We thus also investigate the overhead of the shortest path spanning tree.

We note that the shortest path spanning tree is a minimal-cost network flow problem, which can be formally formulated as follows:

$$
\begin{aligned}
\min \quad & \sum_{ij \in E} c_{ij} f_{ij} \\
\text{s.t.} \quad & \sum_{k \in O(j)} f_{jk} - \sum_{i \in I(j)} f_{ij} = -1, \quad \forall j \in V - \{r\}; \\
& \sum_{k \in O(r)} f_{rk} = |V| - 1; \\
& f_{ij} \ge 0, \quad \forall i, j \in V,
\end{aligned}
$$

where $I(j)$ and $O(j)$ respectively represent the incoming and outgoing flows to/from the node j. Since one of our design objectives is to minimize the overhead $\sum_{ij \in E} c_{ij} f$ as much as possible, if we normalize f such that $f = 1$, then we can simply estimate the overhead equal to $\sum_{ij \in E} c_{ij}$. Clearly, the cost (that is, $\sum_{ij \in E} c_{ij} f_{ij}$) of the shortest path spanning tree, as defined above, is at least $\sum_{ij \in E} c_{ij}$, because the resulting flow $f_{ij}$ must be $\ge 1$ for any link ij in the tree. The shortest path spanning tree works toward the minimization of the overhead of the network.

Fig. 7 shows the overheads of our tree-based overlay with/without the exploitation of the physical network locality (denoted by W/ and W/O, respectively). In Fig. 7, SPT represents the shortest path spanning tree. The physical network topologies that we study in this experiment are Waxman, Barabasi, and PlanetLab, with $|V| = 2{,}000$, $|V| = 2{,}000$, and $|V| = 500$ nodes, respectively. In the experiment, we estimate the overhead by using the total delay (that is, $\sum_{ij \in E} c_{ij}$), assuming that f is normalized to 1. Fig. 7 shows that our tree-based overlay with the exploitation of network locality (that is, W/) clearly outperforms SPT. This indicates that our tree-based overlay performs better toward the minimization of the overhead.

7 SUMMARY AND FUTURE WORK

We have presented a tree-shaped P2P network infrastructure. Our tree-based overlay is lightweight and implements a simple T protocol (see Section 3). We thoroughly and rigorously analyzed the performance of our proposal, and we have shown that our tree-based network has nice performance guarantees in terms of

1. the degree of a peer,
2. the diameter,
3. the weighted diameter,
4. the cost of joining a new peer,
5. the protocol maintenance overhead, and
6. the queue length in the bootstrap node.

We also validate our analytical results in extensive simulations.

We believe that our tree-based overlay could serve as an infrastructure for P2P applications that demand scalability, fast communication, and low overhead. Our next work will study how a P2P application such as live media broadcasting takes advantage of our tree-based overlay for minimizing the communication latencies among nodes and the systemwide overhead. In particular, recent studies (for example, [37] and [38]) have shown that the heterogeneity of peers is the nature of a P2P environment and that taking advantage of the heterogeneity can improve the performance quality of tree-based streaming overlays. In our future study, we will optimize our tree-based overlay by exploiting the heterogeneity of peers.

ACKNOWLEDGMENTS

The authors thank the anonymous reviewers for their valuable feedback and Dr. Yingwu Zhu for his helpful comments on this paper. This work was partially supported by the National Science Council, Taiwan, under Grant 95-2221-E-006-095.

REFERENCES

[1] J. Kubiatowicz, D. Bindel, Y. Chen, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Zhao, “OceanStore: An Architecture for Global-Scale Persistent Storage,” Proc. Ninth ACM Int’l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS ’00), pp. 190-201, Nov. 2000.
[2] B.Y. Zhao, L. Huang, J. Stribling, S.C. Rhea, A.D. Joseph, and J.D. Kubiatowicz, “Tapestry: A Resilient Global-Scale Overlay for Service Deployment,” IEEE J. Selected Areas in Comm., vol. 22, no. 1, pp. 41-53, Jan. 2004.
[3] D. England, B. Veeravalli, and J.B. Weissman, “A Robust Spanning Tree Topology for Data Collection and Dissemination in Distributed Environments,” IEEE Trans. Parallel and Distributed Systems, vol. 18, no. 5, pp. 608-620, May 2007.


[4] J. Li, J. Stribling, T.M. Gil, R. Morris, and M.F. Kaashoek, “Comparing the Performance of Distributed Hash Tables under Churn,” LNCS 3279, pp. 87-99, Jan. 2005.
[5] J. Duato, S. Yalamanchili, and L. Ni, Interconnection Networks: An Engineering Approach. Morgan Kaufmann, 2002.
[6] Y. Chu, S. Rao, and H. Zhang, “A Case for End System Multicast,” Proc. ACM SIGMETRICS ’00, pp. 1-12, 2000.
[7] S. Banerjee, B. Bhattacharjee, and C. Kommareddy, “Scalable Application Layer Multicast,” Proc. ACM SIGCOMM ’02, pp. 205-217, Aug. 2002.
[8] D.A. Tran, K.A. Hua, and T. Do, “ZIGZAG: An Efficient Peer-to-Peer Scheme for Media Streaming,” Proc. IEEE INFOCOM ’03, pp. 1283-1292, Mar. 2003.
[9] M. Hefeeda, A. Habib, B. Botev, D. Xu, and B. Bhargava, “PROMISE: Peer-to-Peer Media Streaming Using CollectCast,” Proc. 11th ACM Int’l Conf. Multimedia (Multimedia ’03), pp. 45-54, Nov. 2003.
[10] X. Liao, H. Jin, Y. Liu, L.M. Ni, and D. Deng, “AnySee: Peer-to-Peer Live Streaming,” Proc. IEEE INFOCOM ’06, pp. 1-10, Mar. 2006.
[11] V. Venkataraman, K. Yoshida, and P. Francis, “Chunkyspread: Heterogeneous Unstructured Tree-Based Peer-to-Peer Multicast,” Proc. 14th IEEE Int’l Conf. Network Protocols (ICNP ’06), pp. 2-11, Nov. 2006.
[12] S.M. Banik, S. Radhakrishnan, and C.N. Sekharan, “Multicast Routing with Delay and Delay Variation Constraints for Collaborative Applications on Overlay Networks,” IEEE Trans. Parallel and Distributed Systems, vol. 18, no. 3, pp. 421-431, Mar. 2007.
[13] I. Stoica, R. Morris, D. Karger, M.F. Kaashoek, and H. Balakrishnan, “Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications,” Proc. ACM SIGCOMM ’01, pp. 149-160, Aug. 2001.
[14] A. Rowstron and P. Druschel, “Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems,” LNCS 2218, pp. 161-172, Nov. 2001.
[15] M. Castro, P. Druschel, A. Kermarrec, A. Nandi, A. Rowstron, and A. Singh, “SplitStream: High-Bandwidth Content Multicast in a Cooperative Environment,” Proc. 19th ACM Symp. Operating Systems Principles (SOSP ’03), pp. 298-313, Oct. 2003.
[16] C. Chou, T.-Y. Huang, K.-L. Huang, and T.-Y. Chen, “SCALLOP: A Scalable and Load-Balanced Peer-to-Peer Lookup Protocol,” IEEE Trans. Parallel and Distributed Systems, vol. 17, no. 5, pp. 419-433, May 2006.
[17] C.G. Plaxton, R. Rajaraman, and A.W. Richa, “Accessing Nearby Copies of Replicated Objects in a Distributed Environment,” Proc. Ninth ACM Symp. Parallel Algorithms and Architectures (SPAA ’97), pp. 311-320, June 1997.
[18] M. Castro, M.B. Jones, A.-M. Kermarrec, A. Rowstron, M. Theimer, H. Wang, and A. Wolman, “An Evaluation of Scalable Application-Level Multicast Built Using Peer-to-Peer Overlays,” Proc. IEEE INFOCOM ’03, pp. 1510-1520, Mar. 2003.
[19] S. El-Ansary, L.O. Alima, P. Brand, and S. Haridi, “Efficient Broadcast in Structured P2P Networks,” LNCS 2735, pp. 304-314, Oct. 2003.
[20] A. Bharambe, S. Rao, V. Padmanabhan, S. Seshan, and H. Zhang, “The Impact of Heterogeneous Bandwidth Constraints on DHT-Based Multicast Protocols,” LNCS 3640, pp. 115-126, Feb. 2005.
[21] T. Cormen, C. Leiserson, and R. Rivest, “Recurrences,” Introduction to Algorithms, second ed. MIT and McGraw-Hill, 2001.
[22] M. Faloutsos, P. Faloutsos, and C. Faloutsos, “On Power-Law Relationships of the Internet Topology,” Proc. ACM SIGCOMM ’99, pp. 251-262, Aug. 1999.
[23] Gnutella, http://rfc-gnutella.sourceforge.net/, 2007.
[24] D.R. Karger and M. Ruhl, “Finding Nearest Neighbors in Growth-Restricted Metrics,” Proc. 34th ACM Ann. Symp. Theory of Computing (STOC ’02), pp. 741-750, May 2002.
[25] H. Zhang, A. Goel, and R. Govindan, “Improving Lookup Latency in Distributed Hash Table Systems Using Random Sampling,” ACM/IEEE Trans. Networking, vol. 13, no. 5, pp. 1121-1134, Oct. 2005.
[26] T.S.E. Ng and H. Zhang, “Predicting Internet Network Distance with Coordinates-Based Approaches,” Proc. IEEE INFOCOM ’02, pp. 170-179, June 2002.
[27] J.C. Chu, K.S. Labonte, and B.N. Levine, “Availability and Locality Measurements of Peer-to-Peer File Systems,” Proc. SPIE—ITCom: Scalability and Traffic Control in IP Networks, pp. 310-321, July 2002.
[28] S. Saroiu, P.K. Gummadi, and S.D. Gribble, “Measurement Study of Peer-to-Peer File Sharing Systems,” Proc. Multimedia Computing and Networking (MCN ’02), pp. 18-25, Jan. 2002.
[29] Napster, http://www.napster.com/, 2007.
[30] G. Pandurangan, P. Raghavan, and E. Upfal, “Building Low-Diameter Peer-to-Peer Networks,” IEEE J. Selected Areas in Comm., vol. 21, no. 6, pp. 995-1002, Aug. 2003.
[31] H.-C. Hsiao and C.-P. He, “A Tree-Based Peer-to-Peer Network with Quality Guarantees,” technical report (available upon request), Dept. of Computer Science and Information Eng., Nat’l Cheng-Kung Univ., June 2007.
[32] KaZaA, http://www.kazaa.com/, 2007.
[33] K.P. Gummadi, R.J. Dunn, S. Saroiu, S.D. Gribble, H.M. Levy, and J. Zahorjan, “Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload,” Proc. 19th ACM Symp. Operating Systems Principles (SOSP ’03), pp. 314-329, Oct. 2003.
[34] S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz, “Handling Churn in a DHT,” Proc. Usenix Ann. Technical Conf., 2004.
[35] A. Medina, A. Lakhina, I. Matta, and J. Byers, “BRITE: An Approach to Universal Topology Generation,” Proc. Ninth Int’l Symp. Modeling, Analysis, and Simulation of Computer and Telecomm. Systems (MASCOTS ’01), pp. 346-353, Aug. 2001.
[36] PlanetLab, http://www.planet-lab.org/, 2007.
[37] M. Bishop, S. Rao, and K. Sripanidkulchai, “Considering Priority in Overlay Multicast Protocols under Heterogeneous Environments,” Proc. IEEE INFOCOM ’06, pp. 1-13, Mar. 2006.
[38] Y.-W. Sung, M. Bishop, and S. Rao, “Enabling Contribution Awareness in an Overlay Broadcasting System,” Proc. ACM SIGCOMM ’06, pp. 411-422, Sept. 2006.

Hung-Chang Hsiao received the PhD degree in computer science from the National Tsing-Hua University, Hsinchu, Taiwan, in 2000. From October 2000 to July 2005, he was a postdoctoral researcher in the Department of Computer Science, National Tsing-Hua University. Since August 2005, he has been an assistant professor in the Department of Computer Science and Information Engineering, National Cheng-Kung University, Tainan, Taiwan. His research interests include peer-to-peer computing, overlay networking, and grid computing.

Chih-Peng He received the BS degree in computer science and information engineering from Fu-Jen Catholic University, Taipei, in 2004 and the MS degree in computer science and information engineering from the National Cheng-Kung University, Tainan, Taiwan, in 2007. He is currently with the Department of Computer Science and Information Engineering, National Cheng-Kung University. His research interests include peer-to-peer computing and overlay networking.
