
A Tree-Based Peer-to-Peer Network with Quality Guarantees

Hung-Chang Hsiao and Chih-Peng He

Abstract: Peer-to-peer (P2P) networks often demand scalability, low communication latency among nodes, and low systemwide overhead. For scalability, a node maintains partial states of a P2P network and connects to a few nodes. For fast communication, a P2P network intends to reduce the communication latency between any two nodes as much as possible. With regard to a low systemwide overhead, a P2P network minimizes its traffic in maintaining its performance efficiency and functional correctness. In this paper, we present a novel tree-based P2P network with low communication delay and low systemwide overhead. The merits of our tree-based network include 1) a tree-shaped P2P network, which guarantees that the degree of a node is constant in probability, regardless of the system size (the network diameter in our tree-based network increases logarithmically with an increase in the system size, and in particular, given a physical network with a power-law latency expansion property, we show that the diameter of our tree network is constant), and 2) provable performance guarantees. We evaluate our proposal by a rigorous performance analysis, and we validate this by extensive simulations.

• The authors are with the Department of Computer Science and Information Engineering, National Cheng-Kung University, Tainan 701, Taiwan, R.O.C. E-mail: hchsiao@csie.ncku.edu.tw.

Manuscript received 15 June 2007; revised 21 Sept. 2007; accepted 16 Oct. 2007; published online 24 Oct. 2007.
Recommended for acceptance by K. Hwang.
For information on obtaining reprints of this article, please send e-mail to: tpds@computer.org, and reference IEEECS Log Number TPDS-2007-06-0193.
Digital Object Identifier no. 10.1109/TPDS.2007.70798.
1045-9219/08/$25.00 © 2008 IEEE. Published by the IEEE Computer Society.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 19, NO. 8, AUGUST 2008
Authorized licensed use limited to: National Taiwan Univ of Science and Technology. Downloaded on September 4, 2009 at 06:24 from IEEE Xplore. Restrictions apply.

1 INTRODUCTION

PEER-TO-PEER (P2P) networks (or overlays) have recently become an active research area. Applications over P2P networks include information retrieval, content distribution, processor cycle sharing, etc. These applications often demand that their underlying P2P network infrastructures be scalable and have low diameter and overhead. For example, an Internet-scale file sharing system, namely, Oceanstore [1], is designed and deployed on top of a P2P network Tapestry [2]. Tapestry is scalable in that each node participates in the network by using $O(\log X)$ connections. Its overlay diameter is equal to $O(\log X)$, where $X$ is the total number of nodes in the system.

In this study, we concentrate on addressing the above-mentioned fundamental requirements, that is, scalability, low diameter, and low overhead, for the overlay network infrastructures. By scalability, we mean that each node only has partial knowledge regarding the entire network structure. This implies that each node in the network maintains very few overlay links. For diameter, consider the shortest routing path $v_1, v_2, \ldots, v_n$ of any message in an overlay network $G = (V, E)$, where $v_1, v_2, \ldots, v_n \in V$ are distinct. The diameter of $G$ is the maximal path length $n$ of a path among all possible ones in $G$. An overlay with a low diameter is desirable, since a route between any two nodes visits fewer intermediates and is thus less sensitive to faults of these intermediates [3]. In contrast to the diameter, the "weighted" diameter is the maximally weighted path length $\sum_{i=1}^{n-1} c_{v_i v_{i+1}}$ among all possible shortest paths, where the edge cost (that is, the delay in this study) of two adjacent nodes $v_i$ and $v_j$ on the path is denoted by $c_{v_i v_j}$. Notably, we explicitly differentiate between the "diameter" and "weighted diameter" in this paper. For a low overhead, we mean that an overlay has a low systemwide operational traffic in maintaining the performance efficiency and functional correctness of the overlay. More precisely, we estimate the overhead of an overlay as $\sum_{e \in E} c_e f$, where $c_e$ is the delay of sending a control message through the overlay link $e$, and $f$ is a predefined maximum bandwidth required for sending a control message.¹ We assume that the control messages used to construct and maintain an overlay have the same message length.

1. Li et al. in [4] suggest that the overhead required for constructing and maintaining an overlay is parameterized by the bandwidth metric.

In this work, we are particularly interested in studying tree-based overlay networks. We aim at designing a scalable tree-based overlay with low (weighted) diameter and overhead. Tree-based overlays are often the core infrastructures adopted by P2P applications that demand collective communication services (for example, message multicasting and reduction [5]). For example, consider a tree-based live media multicasting system in which a root peer in a tree-shaped overlay acts as a source that stores a complete media stream and offers the stream to nonroot peers. Meanwhile, each nonroot peer downloads the stream from its upstream peer and relays the downloaded content to its downstream peers, if any.

Perhaps the studies most relevant to our work, considering the P2P setting, are [6], [7], [8], [9], [10], and [11]. The earlier work relies on tree-shaped overlays to facilitate streaming media contents. Chu et al. [6] suggest constructing a mesh overlay network. Given a mesh network, a tree

subnetwork Narada with good quality links is then built and maintained. Narada implements a shortest path spanning tree algorithm and, thus, intends to minimize the weighted diameter and the overall traffic generated. The shortest path spanning tree, however, may not guarantee the network diameter to be minimal. In addition, Narada targets small-scale environments and requires global knowledge to construct its tree network.

Banerjee et al. [7] designed a tree overlay network NICE, in which a node takes $O(\log X)$ connections to join the tree. In NICE, nodes geographically nearby form a cluster, and the cluster is the basic building block for the tree. NICE guarantees the diameter of the tree to be equal to $2 \cdot O(\log X)$. It is, however, unclear whether NICE has the minimal overhead, although the tree network exploiting the physical network topology operates toward the minimization of the overhead. It is also unclear whether the weighted diameter of NICE can have a bound guarantee.

Tran et al. [8] present the construction of a tree network, namely, ZIGZAG, similar to that in NICE. In [8], Tran et al. offer an in-depth performance analysis for ZIGZAG. Overall, if the size of a cluster is in $[k, 3k]$, ZIGZAG guarantees that each node takes $O(k^2)$ connections to join the tree and the diameter of the tree is $2 \cdot O(\log_k X)$. However, Tran et al. do not investigate the bounds for the weighted diameter and systemwide overhead.

In contrast to [6], [7], and [8], Hefeeda et al. [9] present a tree-based overlay network that exploits the physical Internet topology. Liao et al. [10] present how one can utilize the tree links that are used to bridge different tree overlays. Both studies aim at minimizing the delay of receiving a message for any nonroot peer. Clearly, such a design principle works toward the minimization of the weighted diameter and the systemwide overhead. However, the designs presented in [9] and [10] have no performance guarantees.

Chunkyspread [11] shows that tree-based overlays are viable solutions to live media broadcasting in the face of peers joining, departure, and failure. In Chunkyspread, participating nodes balance their loads in streaming data. A node is forced to connect to a new parent node if its present parent is overloaded. If the load of a node is under a targeted lower bound, the node may accommodate more children nodes. In Chunkyspread, instead of optimizing the network diameter, nodes minimize the latency of receiving media contents sent by the root.

England et al. [3] investigate the design trade-off of the data loss rate and performance-oriented metrics (for example, the delay from the source to a destination) for tree-based overlay networks. They present a tree-based overlay to achieve a desirable trade-off. Banik et al. [12] propose a tree-based network to satisfy the given constraints of the delay bound and the delay variation bound from the source to any destination.

Structured overlay networks are general-purpose communication infrastructures for P2P applications like file sharing, multicasting, information retrieval, and processor cycle sharing. Structured P2P systems, for example, Chord [13], Pastry [14], and Tapestry [2], which are all based on distributed hash tables (DHTs), may include tree structures into their designs [15], [16], [17]. A possible tree structure embedded in a DHT overlay, say, Chord, is formed by having an internal tree node with a hash ID $x$ pick its fingers, with IDs between $[x, y)$ being the children nodes for the tree, where nodes with IDs $x$ and $y$ are immediate siblings in the tree [18], [19]. Clearly, such a tree network can serve as a collective communication substrate, where each node in the tree network maintains $O(\log X)$ children nodes, and the overall diameter of the tree is $2 \cdot O(\log X)$. However, the performance of the tree network embedded in a DHT may not be optimal in terms of the weighted diameter and overhead. For example, Bharambe et al. [20] conclude that the trees embedded in a DHT network may not have a low weighted diameter and a low overhead due to the deterministic structure of a DHT network and the mismatch of the ID space and the physical network topology. Instead of relying on trees embedded in a general-purpose DHT network, in this paper, we are interested in designing a tree-shaped overlay to provide collective communication. In particular, we intend to design a tree network with good performance guarantees.

1.2 Our Idea and Contribution

In this study, we present a scalable tree network $\mathbb{T} = (V, E)$ with low (weighted) diameter and overhead. To build our tree network $\mathbb{T}$, we first design a tree-based overlay $T$ with a low diameter. We denote $T$ as $T^1$, and $T^2$ is formed by structuring $d$ disjoint $T^1$, where $d$ is a given positive integer. In general, $T^{k-1}$ consists of $d$ $T^{k-2}$ trees. In our proposal, the height $H(T^i)$ of a tree $T^i$ ($1 \le i \le k = \log_d X$) is guaranteed to have a bound of $O(\ln d)$ if we treat each subtree $T^{i-1}$ as a single node in $T^i$. Let $\mathbb{T} = T^k$. With the recurrence equation [21], this results in the height of $\mathbb{T}$ being $H(\mathbb{T}) = H(T^k) = H(T^{k-1}) + O(\ln d) = O(\ln d \cdot \log_d X) = O(\ln X)$ and, thus, the diameter of $\mathbb{T}$ being $2 \cdot O(\ln X)$. Since in $T^i$, our design allows each $T^{i-1}$ to freely pick a geographically nearby node as its parent, such a flexibility for picking a parent node reduces the weighted diameter of the resulting $\mathbb{T}$ to a constant.

We summarize our major contributions as follows:

1. We propose a decentralized algorithm that constructs and maintains a tree network with low (weighted) diameter and overhead. To the best of our knowledge, our design is the first attempt to address these design issues simultaneously.
2. Our tree-shaped overlay has provable performance guarantees, which is efficient in that with a constant probability, the degree of each node in the network is constant. The expected diameter of our tree network is $O(\ln X)$. Given a physical network with the power-law latency expansion [22], the expected weighted diameter is $O(\delta)$, where $\delta$ is the maximal delay between any two peers in the physical network.
3. We offer a thorough and rigorous theoretical analysis for our tree-shaped overlay protocol. Our analytical results have tight performance bounds. We also validate our analytical results in simulations.

1.3 Roadmap

The remainder of this paper is organized as follows: Section 2 gives the definitions, notations, and assumptions. The design of our constant-degree low-diameter tree-shaped overlay is

described in Section 3. We discuss how our overlay exploits the underlay network locality such that the resulting weighted diameter becomes constant in Section 4. Section 5 presents the performance analysis for our tree overlay protocol. We also perform the simulation study, and the simulation results are given in Section 6. We summarize our study in Section 7, with possible future research directions.

2 DEFINITIONS, NOTATIONS, AND ASSUMPTIONS

We model a P2P network as an undirected graph $G = (V, E)$, where $V$ includes the nodes participating in the system, and $E$ represents the overlay links among the nodes. An overlay edge $e \in E$ between two nodes $u$ and $v$ in $V$ is denoted by $e = uv$. In this paper, the delay of an edge $uv$ in an overlay network is denoted by $c_{uv}$. We assume in this study that nodes in $V$ may come and go. Some terminologies that are frequently used are defined as follows:

Definition 1. The simple path (or path) from $u \in V$ to $v \in V$ ($u \ne v$), denoted by $u \to v$, is a connected subgraph $(V', E') \subseteq G$ such that the cardinality $|V'| = |E'| + 1$, where $V' \subseteq V$, and $E' \subseteq E$. We let the path length $|u \to v| = |V'|$.

Definition 2. The shortest path length from $u \in V$ to $v \in V$ ($u \ne v$), denoted by $l_{u,v}$, is $l_{u,v} = \min\{\, |u \to v| \;:\; \forall\, u \to v \subseteq G \,\}$. The shortest path length is the length of the shortest simple path from $u$ to $v$.

Definition 3. The diameter of a graph $G$, denoted by $D_G$, is the maximal shortest path length $|u \to v|$ from any node $u \in V$ to any $v \in V$. That is, $D_G = \max\{\, l_{u,v} \;:\; \forall\, u \ne v \in V \,\}$.

Definition 4. The degree of a node $v \in V$, denoted by $d^G_v$, is $|U|$, where any node $u \in U \subseteq V - \{v\}$ has $uv \in E$.

We assume in this study that there exists at least a robust bootstrap node to help a node join/rejoin the network. Table 1 lists the notations frequently used in this paper.

[TABLE 1. Notations Frequently Used]

3 CONSTANT-DEGREE SMALL-DIAMETER OVERLAY NETWORK PROTOCOL

In this section, we first give an overview of the idea regarding our tree-based overlay formation protocol in Section 3.1. The details of the tree protocol are then given in Sections 3.2 and 3.3. We defer the discussion of how our tree network exploits the physical network locality in Section 4.

3.1 Overview

Fig. 1 shows our idea for constructing a constant-degree low-diameter tree. Basically, our tree is recursively formed in a hierarchical fashion. The basic element of our tree is a $T^1$ tree. A $T^i$ tree is built by at most $\hat{d}^T_v$ $T^{i-1}$ trees, where $1 \le i \le k$. The resulting tree that our tree protocol constructs is $\mathbb{T} = T^k$. We note that $(\hat{d}^T_v)^k$ is the maximum number of nodes in $\mathbb{T}$.

[Fig. 1 (caption fragment): each node $v$ in each of the $T^{k-1}$ and $T^k$ trees has a maximum degree $\hat{d}^T_v = 6$.]

When forming a $T^i$ tree, nodes self organize, and the maximal path length from the root to any leaf is bounded. We note that in each $T^i$ tree, the root node, denoted by $r$, is associated with an only child node $r.chd[1]$. This allows us to minimize the degree of the root. In contrast, nonroot nodes can use up to the degree of $\hat{d}^T_v$ to participate in a $T^i$ tree.

Once a $T^i$ tree is constructed, its root node proceeds to join a $T^{i+1}$ tree. Possibly, the root remains a root node of a $T^{i+1}$ tree. Otherwise, it can connect not more than $\hat{d}^T_v$ nodes in $T^{i+1}$. For example, in Fig. 1, the root node A is associated with an only child node C in a $T^{k-1}$ tree. The root node A then participates in a $T^k$ tree and maintains another child node D for the $T^k$ tree. That is, a node may participate in several $T^k$ trees (where $k = 1, 2, 3, \ldots$), in which it serves as a root node in each $T^k$ and maintains an only child node for each $T^k$. In contrast, the root node B of another $T^{k-1}$ tree in Fig. 1 joins the $T^k$ tree as a nonroot node. The resulting $T^k$ tree in Fig. 1 consists of nodes A, B, D, E, F, and G that are the roots of corresponding $T^{k-1}$ trees.

We call the tree formation protocol that forms a $T^i$ tree for any $1 \le i \le k$ as the T protocol in the following discussions.

3.2 T Protocol for Constant Peers

We consider forming and maintaining a tree network $T^i = T = (V, E)$, where $1 \le i \le k$.
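The sizing implied by the overview can be checked with a short calculation. The following Python sketch is illustrative only: the function names `levels_needed` and `height_estimate` are hypothetical and do not come from the paper. It assumes that a level-$k$ tree holds at most $(\hat{d}^T_v)^k$ nodes, as stated above, and that the hidden constant in the per-level height bound $O(\ln d)$ from Section 1.2 equals 1.

```python
import math


def levels_needed(total_nodes: int, d_hat: int) -> int:
    """Smallest k with d_hat**k >= total_nodes, using integer arithmetic."""
    if total_nodes < 1 or d_hat < 2:
        raise ValueError("need total_nodes >= 1 and d_hat >= 2")
    k, capacity = 1, d_hat
    while capacity < total_nodes:
        capacity *= d_hat
        k += 1
    return k


def height_estimate(total_nodes: int, d_hat: int) -> float:
    """k levels, each contributing about ln(d_hat) to the height.

    Assumption: the constant hidden in O(ln d) is taken to be 1.
    """
    return levels_needed(total_nodes, d_hat) * math.log(d_hat)


if __name__ == "__main__":
    # d_hat = 6 is the maximum degree quoted in the Fig. 1 caption.
    for x in (6, 216, 10_000):
        k = levels_needed(x, 6)
        print(x, k, round(height_estimate(x, 6), 2), round(math.log(x), 2))
```

When the hierarchy is exactly full, for instance 216 nodes with $\hat{d}^T_v = 6$, the estimate $k \ln \hat{d}^T_v$ coincides with $\ln X$, which is the identity $\ln d \cdot \log_d X = \ln X$ behind the $O(\ln X)$ height claim.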


3.2.1 Network Construction

We first define the following notation:

Definition 5. The numerical difference of a node $v$ with respect to the root node $r$, that is, $\mathrm{diff}(v)$, is defined as

$$\mathrm{diff}(v) \stackrel{\mathrm{def}}{=} \begin{cases} \Delta, & F(r.chd[1]) \le F(v); \\ R + \Delta, & \text{otherwise}; \end{cases}$$

where $F$ can be an arbitrary collision-free hash function that can provide a unique ID ($\ge 1$) to a node, $R$ is the maximum value that $F$ can return, and $\Delta = F(v) - F(r.chd[1])$.

When a node A intends to join the overlay, it first connects to the bootstrap node² that provides an entry point, that is, the root node $r$, of the overlay (see Algorithm 1). The root node $r$ then helps A join by uniformly picking a node in the tree at random. Notably, $r$ has the knowledge of the tree topology (discussed later) such that $r$ picks a node in the tree uniformly at random without consuming any network traffic.

2. We adopt the mechanism similar to Gnutella [23], which provides a bootstrap node for a node joining. Possibly, there are several bootstrap nodes to help nodes join the overlay.

When the random node, say, B, is determined, the process is immediately performed as follows:

1. If $\mathrm{diff}(A) < \mathrm{diff}(B)$, B reports its parent node $B.prt$ to A. Upon receiving the network address of $B.prt$, A then iteratively performs the joining by sending the joining request to $B.prt$. The joining process proceeds until the joining request is forwarded to an ancestor node Q of $B.prt$, and $\mathrm{diff}(Q) < \mathrm{diff}(A)$. A then connects to Q.
2. Otherwise, if $\mathrm{diff}(A) > \mathrm{diff}(B)$, A simply connects to B as B's child node.

We note the following in our tree formation protocol:

1. The bootstrap node picks the first node that joins the network as the root node $r$. $r$ is then registered with the bootstrap node.
2. When nodes are forming a tree network, the root node $r$ in the tree always maintains only one child node $r.chd[1]$. The second node that joins the network simply becomes the only child of $r$. That is, $d^T_r = 1$.
3. Any node $v$, except $r$, can accept any number of nodes as its children subject to its degree constraint. That is, $d^T_v \le \hat{d}^T_v$.
4. The total number of nodes in a tree is up to $d^T_v$. That is, $d^T_v = \hat{d}^T_v$.
5. Each leaf node $v$ of the tree is required to send a live message to its parent $v.prt$ such that $v.prt$ can keep track of the number of its children nodes and its subtree topology. $v.prt$ performs similarly so that the parent node $(v.prt).prt$ of $v.prt$ can add up the size of the "subtree" rooted at $(v.prt).prt$ and maintain the knowledge of its subtree topology. $r$ can then have the topology of the tree and calculate the total number of the nodes in the tree.
6. If the tree rooted at $r$ contains up to $\hat{d}^T_v$ nodes, then $r$ will not include any newly coming node into its tree. $r$ deregisters from the bootstrap node. $r$ may reregister with the bootstrap if $r$ maintains fewer than $\hat{d}^T_v$ nodes.

Our tree-shaped overlay network may be fragmented due to node failure or departure. To handle the dynamics of the overlay, each node $v$ periodically pings its parent node $v.prt$. If $v.prt$ fails to respond to $v$, $v$ assumes the failure of $v.prt$ and then rejoins the network via the help of another node in the network by consulting its local cache.

Algorithm 2 details the overlay maintenance. A node A first checks whether its parent $A.prt$ is active by sending a ping message periodically. Upon receiving a ping message, $A.prt$ replies to A with a pong message. If A does not receive any pong message from $A.prt$, A then sends another ping message to $A.prt$. If a number of ping messages are sent and A does not receive any pong from $A.prt$, A then performs the rejoining operation by sending a joining request to a node U picked uniformly at random from A's cache (denoted as $\mathrm{Cache}(A)$) that A locally maintains. The rejoining operation simply lets a rejoining node join the network by using the joining algorithm discussed in the previous section. In our design, before A performs its rejoining, A needs to notify all nodes in its subtree (that is,

the subtree rooted at A). Upon receiving the notification sent from A, the nodes in A's subtree leave and join the overlay. The notification can be simply implemented by sending the notification message down the subtree.

Consider the example given in Fig. 1. A is the root of the tree $T^k$, where $T^k$ is formed by the root nodes A, B, D, E, F, and G of the tree $T^{k-1}$. Assume that D fails. E then detects the failure of D without receiving any pong messages from D. E informs the nodes (that is, B and F) in $T^k$ in the subtree rooted at E to rejoin. G performs the same operations if it has offspring nodes in $T^k$. E and G then rejoin also.

We will provide a theoretical analysis for the rejoining cost (Definition 8) in Section 5.1. Our analysis result (Theorem 4) presents that our network maintenance protocol is efficient. The expected rejoining cost of the maintenance protocol is $O(N l^2)$, where $l$ is the diameter of a $T^i$ tree, and $N$ is the number of nodes participating in $T^i$. We emphasize that it is critical to let a node pick a node uniformly at random for its joining or rejoining the network. This enables our protocol to guarantee that $l = O(\ln N)$ in expectation. Section 6.5 verifies these analytical results.

We finally note the following for our network maintenance protocol:

1. A rejoins by first invoking join(U) shown in Algorithm 1, where U is the node ID maintained in A's local cache.
2. If A cannot find any live node U from its cache for its rejoining, A needs to locate an entry point B from the bootstrap node³ for its rejoining with JOIN(B).
3. Similarly, any node in the subtree rooted at A rejoins by randomly picking an entry node from its cache. If the entry node is unavailable, the node rejoins via the bootstrap node.
4. If a node is performing its rejoining via an entry point that is also performing the rejoining, then the node selects another node in its cache as a new entry point. Similarly, if a node cannot find any node in its cache to help its rejoining, the node requests the bootstrap to pick one.
5. Possibly, more than one of the nodes in the subtree rooted at $r.chd[1]$ select $r$ from their local caches as their entry points for their rejoining with join($r$). If $r.chd[1]$ leaves or fails, $r$ will pick one of these nodes as its $r.chd[1]$. Otherwise, those nodes that fail to become $r.chd[1]$ then rejoin by selecting other entries from their caches (or by consulting the bootstrap node in case no live cached node can be found).

3. In this study, we assume that the bootstrap node is always alive and keeps some random nodes participating in the tree overlay.

Each node A in the network maintains a cache, denoted by $\mathrm{Cache}(A)$, by using a PULL algorithm. Basically, in our PULL algorithm, each node A has to periodically send a live message to its parent $A.prt$. Upon receiving a live message, $A.prt$ collects the IDs in the subtree rooted at $A.prt$. $A.prt$ then does the same by sending the received IDs to its parent. This process continues so that $r$ collects the IDs of all nodes in the tree network rooted at $r$. $r$ then disseminates these IDs to all nodes in the network along the tree structure. Consequently, each node in the network can construct and maintain its cache that contains the IDs of the nodes in the network.

We finally note that our network maintenance protocol can handle the failure of $r.chd[1]$ well. Consider a level-$i$ tree $T^i$ with a root node $r$, $r$'s only child node $r.chd[1]$, and two other nonroot nodes C and D. Assume that the children nodes of $r.chd[1]$ are C and D. If $r.chd[1]$ fails, C and D detect the failure of $r.chd[1]$ and then need to rejoin the network. If the local cache maintained by C (or D) is up to date and has $r$'s location, then C (or D) connects to $r$ and becomes an only child node of $r$. Otherwise, C (or D) consults the bootstrap node to seek a root node of a level-$i$ tree as its entry point for its rejoining.

3.3 $\mathbb{T}$: Scaling with the T Protocol

We have presented the basic algorithm for forming a tree network $T$ that can consist of up to $\hat{d}^T_v$ nodes, where $\hat{d}^T_v$ is also the maximum degree of a node.

For constructing a level-2 tree $T^2$, root nodes $r$ in distinct $T^1$ trees query the bootstrap node for their entry points. This process is identical to that of the joining of a node into a $T^1$ tree, except that the candidate entry points that can help these root nodes form their $T^2$ trees are the root nodes of $T^1$ trees. Therefore, in our tree formation protocol, we require the bootstrap node to additionally label each registry node with its level ID. The bootstrap node depends on the level ID to identify the "root level" of a registry node. That is, the root node of a $T^k$ tree will be labeled with the level ID $k$ in the bootstrap. For example, if a node is a root node of a level-3 tree, then it will have level ID 3 in the bootstrap.
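The joining rule of Section 3.2.1 can be sketched in a few lines of Python. This is a simplified, centralized illustration under stated assumptions and not the authors' Algorithm 1: random integers stand in for the hash values $F(v)$, degree constraints are ignored (as in the setting $\hat{d}^T_v = |V|$), the entry point is drawn uniformly from the nodes below the root, and the names `diff`, `build_tree`, and `depth` are hypothetical.

```python
import math
import random


def diff(node_id: int, root_child_id: int, id_max: int) -> int:
    """Definition 5: Delta if F(r.chd[1]) <= F(v), else R + Delta."""
    delta = node_id - root_child_id
    return delta if root_child_id <= node_id else id_max + delta


def build_tree(num_nodes: int, id_max: int = 2**32, seed: int = 0):
    """Join nodes one by one; returns (parent map, root ID, root child ID)."""
    rng = random.Random(seed)
    # Unique random IDs in [1, id_max] stand in for the hash values F(v).
    ids = rng.sample(range(1, id_max + 1), num_nodes)
    root, root_child = ids[0], ids[1]
    parent = {root: None, root_child: root}
    members = [root_child]  # candidate entry points below the root
    for a in ids[2:]:
        q = rng.choice(members)  # entry point B, picked uniformly at random
        # Rule 1: climb toward the root until an ancestor Q has a smaller diff.
        # Rule 2: if B already has a smaller diff, the loop body never runs.
        while diff(q, root_child, id_max) > diff(a, root_child, id_max):
            q = parent[q]
        parent[a] = q
        members.append(a)
    return parent, root, root_child


def depth(node: int, parent: dict, root_child: int) -> int:
    """Number of hops from node up to r.chd[1]."""
    hops = 0
    while node != root_child:
        node = parent[node]
        hops += 1
    return hops


if __name__ == "__main__":
    n = 2000
    parent, root, root_child = build_tree(n, seed=1)
    height = max(depth(v, parent, root_child) for v in parent if v != root)
    print("height:", height, "ln N:", round(math.log(n), 2))
```

Because $\mathrm{diff}(r.chd[1]) = 0$ is the smallest possible value and IDs are unique, the climb always stops at $r.chd[1]$ at the latest, so the root keeps exactly one child and every parent has a smaller diff than its children.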


If $\mathbb{T}$ is a level-$k$ tree $T^k$, then the above-mentioned process proceeds until multiple $T^{k-1}$ trees self organize into a $T^k$ tree. Similarly, the root nodes of these $T^{k-1}$ trees form a $T^k$ by consulting the bootstrap for the locations of roots, with level ID $k - 1$ being the entry points.

We note the following:

1. If a node is the root of $T^i$, then it must also be the root of $T^{i-1}, T^{i-2}, \ldots, T^1$. Consider the example shown in Fig. 1. The node A is not only a root node of $T^{k-1}$ but also a root of $T^k$. It is easy to verify that A is also a root node of $T^j$, where $j = 1, 2, \ldots, k - 2$.
2. Nodes can use the degree of up to $d^T_v$ to form a level-$i$ tree, where $1 \le i \le k$. Therefore, in a level-$i$ tree, the root has the "total" degree equal to $\sum_{j=1}^{i} 1 = i$, since the root node (for example, the node A in Fig. 1) maintains an only child node in each level-$k$ tree, where $k = 1, 2, \ldots, i$. In contrast, the "maximum" total degree of a nonroot node in a level-$i$ tree is $d^T_v + i - 1$. This is because a nonroot node (for example, the node B in Fig. 1) in a level-$i$ tree must be a root node of a level-$k$ tree (where $k = 1, 2, \ldots, i - 1$). The nonroot node then participates in the level-$i$ tree by using the degree up to $d^T_v$.
3. Consider a $T^i$ tree. If a node $v$ detects the failure (or departure) of its parent $v.prt$ in $T^i$, then similar to what we have discussed in Section 3.2.2, $v$ notifies its offspring root nodes of the subtree $T^{i-1}$ in $T^i$ regarding the failure/departure. Upon receiving the notification message, a node $u$ in $T^i$ rejoins the network by using Algorithm 2.
4. A node in $T^i$ maintains $i$ caches, and each cache is constructed and maintained as mentioned in Section 3.2.2.

4 EXPLOITATION OF PHYSICAL NETWORK LOCALITY

Studies in [22], [24], and [25] present that the latency distribution between Internet end hosts likely follows the power-law latency expansion. In this study, we thus concentrate on the network graphs with the power-law latency expansion.

Definition 6. A graph follows the $\alpha$-power-law latency expansion if for each node $v$ in the graph

$$N_v(x) = \gamma x^{\alpha},$$

where $N_v(x)$ denotes the number of nodes that have latencies not more than $x$ to $v$, and $\gamma$ and $\alpha$ are two given positive constants.

Assume that the maximal distance between any two nodes in the graph (with the power-law latency expansion) in which our tree network $\mathbb{T}$ overlays is $\delta$. Then, the probability distribution for $N_v(x)$ is

$$P(c_{vu} < x) = \left(\frac{x}{\delta}\right)^{\alpha},$$

where $u$ is any node in the graph, and $c_{vu}$ denotes the latency from $v$ to $u$. Without loss of generality, we let $\delta = 1$.

Definition 7. The weighted diameter in a graph $G$ is $\max\{\, \sum_{\text{adjacent nodes } v,u \text{ in } p} c_{vu} \;:\; \forall \text{ shortest paths } p \subseteq G \,\}$, where $p$ denotes the simple shortest path.

Clearly, without exploiting the physical network locality, $\mathbb{T}$ can have the "weighted diameter" $O(\delta \log_{\hat{d}} X) = O(\log_{\hat{d}} X)$, where $T^i$ ($0 \le i \le k$) has $\hat{d}$ nodes, and the total number of nodes in $\mathbb{T}$ is $X$.

Lemma 1. Let $X_1, X_2, \ldots, X_n$ be independent random variables over $[0, 1]$, where $X_1, X_2, \ldots, X_n$ follow the probability distribution with the $\alpha$-power-law latency expansion $P(X < x) = x^{\alpha}$. Let $Y = \min\{X_1, X_2, \ldots, X_n\}$. Then, there exists $Y < c$ such that $E[Y] \le n^{-1/\alpha}$, where $c$ is a positive number.

Proof. Since $X_1, X_2, \ldots, X_n$ are independent and $Y = \min\{X_1, X_2, \ldots, X_n\}$,

$$\begin{aligned} P(Y \ge y) &= P(\min\{X_1, X_2, \ldots, X_n\} \ge y) \\ &= P\left(\bigcap_{i=1}^{n} (X_i \ge y)\right) \\ &= \prod_{i=1}^{n} P(X_i \ge y) \\ &= (1 - y^{\alpha})^n. \end{aligned}$$

Since $1 - a \le e^{-a}$ (when $0 < a < 1$, and $a$ is sufficiently small), it follows that

$$\begin{aligned} E[Y] &= \int_0^1 P(Y \ge y)\, dy \\ &\le \int_0^1 e^{-y^{\alpha} n}\, dy \\ &= \int_0^1 e^{-\left(n^{1/\alpha} y\right)^{\alpha}}\, dy \\ &= \frac{1}{s} \int_0^s e^{-x^{\alpha}}\, dx, \end{aligned}$$

where $s = n^{1/\alpha}$. Since $\int_0^s e^{-x^{\alpha}}\, dx \le 1$, $E[Y] \le \frac{1}{s}$, and the proof follows. □

Lemma 1 indicates that if a node $v$ picks $n$ nodes uniformly at random and among these $n$ nodes, $v$ maintains the one $u$ that has the smallest delay to $v$, then the expected delay from $v$ to $u$ will be at most $n^{-1/\alpha}$. This suggests that our tree protocol works in the following way to exploit the physical network locality:

1. A node $v$ in $T^i$ samples the nodes in $T^i$. Since $T^i$ maintains up to $\hat{d}^T_v$ nodes (that is, the root nodes of $T^{i-1}$ subtrees), each node can sample $\hat{d}^T_v$ nodes at most. Assume that $v$ performs $n$ samples, where $n \le \hat{d}^T_v$.
2. $v$ maintains a node $u$ that is closest in terms of the network delay among the $n$ samples.
3. $v$ then rejoins $T^i$ via $u$ if $\mathrm{diff}(u) < \mathrm{diff}(v)$. $v$ thus becomes a child node of $u$. Otherwise, $v$ still connects to its original parent.
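The effect that Lemma 1 describes can be checked numerically. The Monte Carlo sketch below is an illustration under stated assumptions rather than part of the paper's analysis: latencies are drawn by inverse-transform sampling as $X = U^{1/\alpha}$ with $U$ uniform on $[0, 1]$, which gives $P(X < x) = x^{\alpha}$ with the maximal delay normalized to 1. The function names `sample_latency` and `mean_min_latency` are hypothetical.

```python
import random


def sample_latency(alpha: float, rng: random.Random) -> float:
    """Draw X in [0, 1] with P(X < x) = x**alpha (inverse-transform sampling)."""
    return rng.random() ** (1.0 / alpha)


def mean_min_latency(n: int, alpha: float, trials: int = 20000, seed: int = 0) -> float:
    """Estimate E[Y] for Y = min(X_1, ..., X_n) by simulation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Latency to the closest of n uniformly sampled candidate parents.
        total += min(sample_latency(alpha, rng) for _ in range(n))
    return total / trials


if __name__ == "__main__":
    for n, alpha in ((10, 1.0), (16, 2.0), (64, 2.0)):
        estimate = mean_min_latency(n, alpha)
        bound = n ** (-1.0 / alpha)
        print(f"n={n} alpha={alpha} estimate={estimate:.4f} n^(-1/alpha)={bound:.4f}")
```

Each estimate can be compared with $n^{-1/\alpha}$ to see how quickly the latency to the closest sampled node shrinks as the number of samples $n$ grows, which is what step 2 of the sampling procedure relies on.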


HSIAO AND HE: A TREE-BASED PEER-TO-PEER NETWORK WITH QUALITY GUARANTEES 1105

The details are shown in the accompanying algorithm. We note the following:

1. Any node $v \in V - \{r, r.chd[1]\}$ can simply sample nodes in $v$'s cache.
2. Since $r$ has the global knowledge of $T^i$, $r$ replaces $r.chd[1]$ by the closest offspring node among $V - \{r\}$. If so, all offspring nodes, except the newly selected $r.chd[1]$, rejoin $T^i$ by using Algorithm 2.
3. Measuring the latency between two nodes is out of the scope of our study. Public network services such as GNP [26] may be used for this purpose.
4. Nodes in the subtree rooted at $v$ need to rejoin, since $v$ successfully connects to a closer parent compared to its previous one. These nodes, except for $v$, perform rejoining by Algorithm 2.

We will show later in Section 5.3 that if $n \ge 2^{\alpha+1} (\ln X)^{\alpha}$, then the weighted diameter of our tree network $TT$ becomes the constant $\Delta$.

5 PERFORMANCE ANALYSIS

This section provides a rigorous performance analysis for our proposal given in Sections 3 and 4. Section 5.4 concludes this section and presents the implications of our performance results by illustrating an example.

5.1 Performance of T

It is sufficient to consider the subtree $(V, E)$ rooted at $r.chd[1]$ in $T$. Recent measurement studies [27], [28] of real P2P systems (that is, Gnutella [23] and Napster [29]) provide evidence that peers have lifetimes approximating the exponential distribution reasonably well [30]. In the following analysis, we assume that the system follows the $M/M/\infty$ queuing model, in which peers arrive according to a Poisson distribution with parameter $\lambda$. The lifetimes of peers are independent and exponentially distributed with parameter $\mu$. The number of peers in the system at time $t$ is denoted by $M(t)$. We also assess the load of the bootstrap node in this section.

Theorem 1. The number of peers in the system at time $t$ is $O(E[M(t)])$, with high probability.4

4. "With high probability" in this paper denotes the probability not less than $1 - O(N^{-\Omega(1)})$.

Corollary 1. Let $\lambda/\mu = N$. If $t \ge N$, then $O(E[M(t)]) = N$.

The proofs are given as Theorem 1 and Corollary 1 in [31].

Theorem 1 states that the number of nodes in the system at any time $t$ is $O(E[M(t)])$ with high probability. Corollary 1 states that if the system time $t \ge O(N)$, then the number of nodes in the system is $O(E[M(t)]) = N$. Therefore, in the following, we will discuss the overlay operating at $t > cN$ for some $c$ and denote the number of peers in the overlay at $t$ by $N$.

Lemma 2. If an overlay is constructed using JOIN, then the overlay will be cycle free.

Proof. Consider a cycle in the overlay, denoted by $p = a_0 a_1 a_2 \cdots a_{n-1} a_0$, where $a_0 = r.chd[1]$, and $\{a_1, a_2, \ldots, a_{n-1}\} \subseteq V - \{r.chd[1]\}$. We consider the following cases:

1. Two paths with the same end point $a_0$ join at a node, say, $a_i$ ($1 \le i \le n-1$). However, this is impossible, since by definition, each nonroot node in the overlay can have only one parent node.
2. $p$ can be a circular path, even without two paths that have $r.chd[1]$ as their end point crossing each other. If so, it can be easily shown that $\text{diff}(a_0) < \text{diff}(a_1) < \text{diff}(a_2) < \cdots < \text{diff}(a_0)$. This is a contradiction, and the proof follows. □

Remark 1. If an overlay is constructed using JOIN, then a joining node will visit nodes on not more than one path, with the root node being the end point.

Theorem 2. If the overlay implements JOIN, $|V| = N$, and $\hat{d}_v = |V|$ for any $v \in V$, then the maximal path length in the overlay from $r.chd[1]$ to any leaf is $\ln N + O(1)$ in expectation.

Theorem 3. Assume an overlay with $N$ nodes. Denote the maximal path length in the overlay rooted at $r.chd[1]$ by the random variable $S_N$. Then, $S_N \le 6 \ln N$, with the probability not less than $1 - N^{-1}$.

Since the proofs for Theorems 2 and 3 are lengthy, we refer the readers to our technical report [31] for the details.

Lemma 2 and Remark 1 state that any node $a$ takes a finite number of hops to join the overlay and that the nodes helping $a$ join appear on only one path, with $r.chd[1]$ being the end point. Theorems 2 and 3 show that any path with $r.chd[1]$ being the end point has $O(\ln N)$ hops with high probability. We thus conclude as follows:

Corollary 2. If the overlay rooted at $r.chd[1]$ with $N$ nodes is constructed with JOIN, then a newly joining node takes $O(\ln N)$ hops with high probability to join the overlay. Clearly, the tree $T$ associated with the overlay has the maximal path length $O(\ln N) + 1 = O(\ln N)$ from $r$ to any leaf in $T$.
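As a numerical illustration of the population model assumed in Section 5.1 (Poisson arrivals, independent exponential lifetimes), the following sketch simulates the number of peers alive at a given time and compares it with the standard closed form for an infinite-server queue that starts empty, $E[M(t)] = (\lambda/\mu)(1 - e^{-\mu t})$. That closed form is a textbook result quoted here as an assumption, not a formula taken from the paper; the function names and the parameter values in the usage note are likewise hypothetical.

```python
import math
import random


def simulate_population(lam: float, mu: float, horizon: float,
                        rng: random.Random) -> int:
    """Count peers alive at time `horizon`, starting from an empty system.

    Arrivals form a Poisson process with rate `lam`; each peer stays for an
    independent exponential lifetime with parameter `mu`.
    """
    t = 0.0
    alive = 0
    while True:
        t += rng.expovariate(lam)          # next arrival time
        if t > horizon:
            break
        if t + rng.expovariate(mu) > horizon:
            alive += 1                     # this peer has not departed yet
    return alive


def expected_population(lam: float, mu: float, t: float) -> float:
    """Closed-form mean of M(t) for an infinite-server queue that starts empty."""
    return (lam / mu) * (1.0 - math.exp(-mu * t))
```

With, say, `lam = 50.0` and `mu = 0.5` (so that $\lambda/\mu = 100$), averaging `simulate_population` over many runs at a horizon several lifetimes long should land close to `expected_population`, which is the behavior that Corollary 1 relies on.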


1106 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 19, NO. 8, AUGUST 2008

Our tree construction protocol (that is, JOIN) strongly depends on the uniformity in selecting a node to join the overlay. In addition to $r$ randomly picking a node for joining a newly coming node, each node will also rely on randomly and uniformly selecting a node in its cache for helping its rejoining if the node detects the failure of its parent node.

Remark 2. If the overlay with $N$ nodes is maintained with HEAL and PULL, then any node $u \in V$ has the identical probability to be picked as an entry point by a rejoining node $v$, where $u \ne v$.

We note that Remark 2 shows that if a node rejoins its tree overlay, then it will pick a node as its entry point from the $N - 1$ participating peers in $V$ with the probability of $\frac{1}{N-1}$. If so, the JOIN algorithm shown in Algorithm 1 guarantees that the maximal path length from $r.chd[1]$ to a leaf remains $O(\ln N)$ (see Corollary 2) with high probability, even if nodes operating the JOIN algorithm are in an environment in which nodes may come and go.

In our HEAL algorithm, if the root node $p$ of a "subtree" in the overlay detects the failure of its parent $p.prt$, it needs to rejoin the overlay and notify the nodes in the subtree regarding the failure. That is, each of the nodes in the subtree is required to perform the JOIN algorithm. We are thus interested in knowing the cost associated with these rejoining operations.

Definition 8. The rejoining cost due to the failure of a node $v$ in the overlay is defined as the number of nodes in $S_1, S_2, \ldots, S_k$ performing their rejoining operations and the number of nodes that help nodes in $S_1, S_2, \ldots, S_k$ perform their rejoining, where $S_1, S_2, \ldots, S_k$ are the subtrees rooted at $v$.

Theorem 4. If the overlay with $N$ nodes is maintained with HEAL and PULL, then in the time interval $[t - \frac{N-1}{\mu}, t]$, the rejoining cost introduced by any node is not more than $l^2 + O(1)$, on the average, where $l$ is the maximal path length from $r.chd[1]$ to any leaf, and any $t \ge \frac{N-1}{\mu}$.

Proof. Assume that $l$ is the maximal path length from $r.chd[1]$ to any leaf and that $n_i$ is the number of nodes with the $i$-hop distance from the root. Clearly, $1 \le i \le l-1$, and $\sum_{i=1}^{l-1} n_i = N - 1$ (excluding $r.chd[1]$). If a node $v$ having the $k$-hop ($1 \le k \le l-1$) distance from $r.chd[1]$ detects the failure of its parent, then $v$ will rejoin the overlay, and those nodes in the subtree rooted at $v$ will also rejoin. We denote the number of nodes that perform the rejoining operations by $s_{k,v}$. It can be easily shown that if any node $v \in \{v_1, v_2, \ldots, v_{n_k}\}$ that has the $k$-hop distance from $r.chd[1]$ detects the failure of its parent, then the total number of nodes that need to rejoin will be

$$s_k = s_{k,v_1} + s_{k,v_2} + \cdots + s_{k,v_{n_k}} = n_k + n_{k+1} + \cdots + n_{l-1} = \sum_{i=k}^{l-1} n_i.$$

The total rejoining cost $c_k$ is thus not more than $s_k + l \cdot s_k$.

Consider the time interval $[t - \frac{N-1}{\mu}, t]$ for any $t \ge \frac{N-1}{\mu}$. Thus, $N - 1 = \mu \cdot \frac{N-1}{\mu}$ nodes depart the system in the interval. In addition, each of these $N - 1$ nodes leaves the system with high probability. This is because the probability distribution of the lifetime $T_l$ of a peer is $P(T_l > t) = e^{-\mu t}$, and $P\left(T_l > \frac{N-1}{\mu}\right) = e^{-N+1} \approx 0$. Therefore, in the time interval $[t - \frac{N-1}{\mu}, t]$, the average rejoining cost $C_{avg}$ is

$$\frac{c_1 + c_2 + \cdots + c_l}{N-1} \le (l+1)\,\frac{s_1 + s_2 + \cdots + s_l}{N-1}.$$

Since $s_i \le N - 1$ ($1 \le i \le l-1$),

$$C_{avg} \le (l+1) \sum_{i=1}^{l-1} \frac{N-1}{N-1} = (l+1)(l-1) = l^2 - 1.$$

The proof follows. □

Corollary 3. If the overlay with $N$ nodes is maintained with HEAL and PULL, then in the time interval $[t - \frac{N-1}{\mu}, t]$, the rejoining cost per unit time is not more than $\frac{\mu l^2}{N-1} = O\left(\frac{\mu l^2}{N}\right)$, on the average, where $l \le 6 \ln N$, with the probability not smaller than $1 - N^{-1}$, and any $t \ge \frac{N-1}{\mu}$.

5.2 Performance of TT

As we have discussed earlier, $TT = T^k$. We will, in this section, report the performance analysis for $TT$ regarding the degree $d_v^{TT}$ for any $v$ and the maximal path length from the root node $r$ to any leaf.

Theorem 5. Assume that each node $v$ in $TT$ initially has the degree $\hat{d}$ to form $T^i$, where $1 \le i \le k$. Then, $E[d_v^{TT}] = \hat{d} + O(1)$, and $d_v^{TT} \le 2\hat{d}$, with the probability not less than $1 - \hat{d}^{-3}$.

Theorem 6. Assume that constructing a $\hat{d}$-node $T^i$ tree takes $t(\hat{d})$ time units, where $1 \le i \le k$. If $E[t(\hat{d})] \le \frac{\hat{d}}{2(\lambda+\mu)}$ ($E[t(\hat{d})] \le \frac{\hat{d}}{2\lambda}$ when $\lambda \gg \mu$), then the number of registry nodes in the bootstrap node is less than $k$ in expectation and not more than $k^2 + O(k)$ with the probability $1 - k^{-4} + o(1)$.

Due to space limitations, we omit the proofs for Theorems 5 and 6; the details of the proofs can be found in [31].

Corollary 4. Let $X = \hat{d}^k$ be the total number of nodes in $TT$. Then, the diameter of $TT$ is $D_{TT} = 2 \ln X$ in expectation, and with the probability not less than $1 - O(\hat{d}^{-1})$, $D_{TT}$ is not more than $12 \ln X$.

Proof. The diameter $D_{TT}$ of $TT$ is the length of the path crossing through the root node $r$ in $T^i$ ($i = 1, 2, \ldots, k$) from a leaf node $a$ in $T^1$ to a leaf $b$ in another $T^1$. Therefore, $D_{TT} = 2k \ln \hat{d} = 2 \log_{\hat{d}} X \ln \hat{d} = 2 \ln X$. By the proof in [31, Theorem 3], with the probability not less than $1 - O(\hat{d}^{-1})$, the diameter is not more than $12 \ln X$.

We can also rely on the recurrence [21] to prove this result. Let $H(T^k)$ be the maximal path length from the root of $T^k$ to any node in $T^1$. Then, we have the recurrence equation $H(T^k) = H(T^{k-1}) + \ln \hat{d}$. Since $k = \log_{\hat{d}} X$, we have $H(T^k) = \log_{\hat{d}} X \ln \hat{d} = \ln X$. We thus have the diameter $D_{TT}$ not more than $2 \ln X$, and the proof follows. □

We have shown in Theorem 5 that with high probability, the degree of any node in $TT$ is not more than $2\hat{d}$. We are, however, also interested in knowing the maximal degree of a node in $TT$.

Remark 3. The root node in $TT = T^k$ has the degree $k = \log_{\hat{d}} X$. A nonroot node in the $T^k$ tree has the degree not more than $\hat{d} + k - 1$. Thus, the degree of any node in the system is not more than $\hat{d} + k - 1$.

5.3 Weighted Diameter

Corollary 5. In $T^i = (V, E)$, any node $v \in V$ that implements JOIN_VICINITY and samples $n$ nodes in $V - \{r\}$ will connect to a parent node $u \in V - \{v\}$ such that $E[c_{vu}] \le \left(\frac{2}{n}\right)^{1/\alpha}$.

Proof. Since $P(\text{diff}(u) < \text{diff}(v)) = \frac{1}{2}$, we have


$$P(c_{vu} \ge y) = \Big(1 - P\big((c_{vu} \le y) \cap \text{diff}(u) < \text{diff}(v)\big)\Big)^{n} = \Big(1 - \tfrac{1}{2} P(c_{vu} \le y)\Big)^{n} = \Big(1 - \tfrac{1}{2} y^{\alpha}\Big)^{n}.$$

Similar to the proof in Lemma 1, we have

$$E[c_{vu}] = \int_0^1 e^{-\frac{n}{2} y^{\alpha}}\, dy = \int_0^1 e^{-\left(\left(\frac{n}{2}\right)^{1/\alpha} y\right)^{\alpha}}\, dy = \frac{1}{s} \int_0^s e^{-x^{\alpha}}\, dx,$$

where $s = \left(\frac{n}{2}\right)^{1/\alpha}$. Therefore, we have $E[c_{vu}] \le \left(\frac{2}{n}\right)^{1/\alpha}$, and the proof thus follows. □

Theorem 7. Let each $T^i$ ($0 \le i \le k$) have $\hat{d}$ nodes at most. Let $X = \hat{d}^k$ be the maximum number of nodes in $TT$. Assume that each node in $T^i$ samples $n = c\hat{d}$ nodes, where $0 < c \le 1$. If $n \ge 2^{\alpha+1} (\ln X)^{\alpha}$, then the weighted diameter of $TT$ is $\Theta(1)$ in expectation.

Proof. Without loss of generality, let $N = d^k$ be the maximum number of nodes in $T$, where $d > 1$, and $k > 1$. Assume that each node in $V$ samples $n = cd$ nodes, where $0 < c \le 1$. Applying Corollary 4 yields the expected weighted diameter equal to $L = 2 \ln N \left(\frac{2}{cd}\right)^{1/\alpha}$. Thus, if $d \ge \frac{2^{\alpha+1} (\ln N)^{\alpha}}{c}$, then $L \le 1$, and the proof follows. □

Corollary 6. If the weighted diameter of $TT$ is $\Theta(1)$, then $\hat{d} \ge \frac{2^{\alpha+1} (\ln N)^{\alpha}}{c}$. That is, $d_v^{T^i} = \hat{d} \ge \frac{2^{\alpha+1} (\ln N)^{\alpha}}{c}$.

5.4 An Example

Previous studies have shown that in popular P2P networks (for example, KaZaA [32]), peers have the mean lifetime $1/\mu = 2.5$ hours [28], [33]. That is, if $TT$ has $X = 10^5$ nodes, then $\lambda = 10^5 / 9{,}000 \approx 11.11$ newly coming peers per second (Corollary 1). We let $\hat{d} = 10$, and $k$ is thus 5. By Theorem 5, the degree of any node is not more than 20, with the probability not less than $1 - 10^{-3} = 0.999$. If the mean network delay between two Internet end hosts is 10 ms [34], then the mean number of registry nodes in the bootstrap will be 5, and the number of registry nodes is not more than $5^2 + 5 = 30$, with the probability not less than $1 - \frac{1}{5^4} = 0.998$ (Theorem 6). This is because by Corollary 2, constructing a $\hat{d}$-node tree takes $E[t(\hat{d})] = 0.01 \cdot \hat{d} \cdot \lceil \ln \hat{d} \rceil = 0.3$ second, and by Theorem 6, $E[t(\hat{d})] < \hat{d}/(2\lambda) \approx 0.45$ second.

6 SIMULATION RESULTS

We have developed an event-driven simulator that allows the study of the performance of tree-based networks. The performance metrics that we are interested in include the degree distribution of participating nodes and the (weighted) diameter of the network, given the number of nodes participating in the system, the mean lifetime of the joining peers, and the initially maximal degree $\hat{d}$ of a node.

In our simulations, the number of nodes participating in the system is up to $X = 100{,}000$. The initially maximal degree $\hat{d}$ of a node simulated is from 5 to 100. Each participating peer has a lifetime with a mean of 150 minutes [28], [33]. The lifetime follows the exponential distribution. We have also investigated the effect of a mean lifetime of 30 minutes. In this paper, however, we omit the simulation results for the mean lifetime of 30 minutes, because we do not observe any significant difference between the simulation results for the two lifetime values (that is, 30 and 150 minutes).

We perform extensive simulations by averaging the performance metrics collected from 1,000 runs. Each run takes 1,600 minutes.

6.1 Height of T

Fig. 2 depicts the simulation results for the $T$ protocol, where the performance metric, that is, the maximal path length from $r.chd[1]$ to any leaf, is shown for the number of participating peers from 10 to 100,000. We note that the x-axis is in logarithmic scale. The results show that the maximal path length from $r.chd[1]$ to any leaf is nearly identical to $\ln X$. This conforms to our theoretical analysis, as discussed in Section 5 (see Theorem 2).

Fig. 2. The maximal path length from $r.chd[1]$ to any leaf.

Fig. 3. The diameter.

6.2 Diameter of TT

We illustrate the diameter of the tree implementing the $TT$ protocol in Fig. 3. In the simulations, $\hat{d}$ is 10, 100, and $\infty$. Fig. 3 shows that if $\hat{d}$ is enlarged (for example, $\hat{d} = 100$), then the diameter measured from simulations closely matches the result presented in Corollary 4. However, if $\hat{d}$ becomes relatively smaller (for example, $\hat{d} = 10$), then our


$TT$ networks have diameters larger than the expected (that is, $2 \ln X$). This is because the proof of Corollary 4 estimates the diameter of a tree with $\hat{d}$ nodes as $\ln \hat{d}$ instead of $\lceil \ln \hat{d} \rceil$. Thus, the diameters measured for our $TT$ trees are not less than $2 \ln X$.

6.3 Weighted Diameter of TT

Fig. 4a illustrates the simulation results for the weighted diameters (that is, the maximal weighted path length) of our tree-shaped overlay networks over the Waxman and Barabasi topology models [35]. The performance results for exploiting and not exploiting the physical network topology (W/ Exploiting Network Locality and W/O Exploiting Network Locality, respectively) are both shown in Fig. 4a. We note that the delay between nodes in the Waxman and Barabasi topologies does not follow the distribution of the power-law latency expansion. However, our algorithms still work well, and with our algorithms, the increase in the delay between nodes is less sensitive to the number of nodes participating in the system.

We also investigate the effectiveness of our algorithms in exploiting the physical network using the real network topology of an experimental testbed, namely, PlanetLab [36]. Fig. 4b depicts the simulation results for the topology with 500 PlanetLab nodes. As we can see from the results shown in Fig. 4b, our algorithms for exploiting the physical network locality are particularly effective (see Theorem 7), since the PlanetLab network topology exhibits the power-law latency expansion [25].

Note that in the experiments with the Waxman, Barabasi, and PlanetLab topologies, we let $\hat{d} = 100$ and $\alpha = 1$ (see Theorem 7 and Corollary 6).

Fig. 4. (a) The BRITE topology models Waxman and Barabasi. (b) The real topology, with 500 PlanetLab nodes.

6.4 Degree Distribution

Fig. 5 shows the cumulative distribution function of degrees, where $(x, y)$ represents the number $y$ of nodes having degrees not more than $x$. In this experiment, we let $\hat{d} = 8, 20, 40$, and $X = 100{,}000$. The simulation results are similar for different system sizes and are thus omitted in this paper.

The results in Fig. 5 conform to our analytical result presented in Theorem 5. That is, the degree of a node is unlikely to be more than the expected $\hat{d}$.

Fig. 6. The averaged rejoining cost per node.

6.5 Rejoining Cost and Overhead

6.5.1 The Rejoining Cost

Theorem 4 states that a node takes at most $O(\ln^2 N)$, on the average, to rejoin a $T^i$ network if the node detects the departure or failure of its parent, where $N$ is the number of nodes (that is, the roots of $T^{i-1}$) in $T^i$. Our simulation results show that the averaged rejoining cost of a node is very small, given $\hat{d}$ up to 100. This indicates that our healing protocol is efficient. For further understanding of the rejoining cost of our protocol, we instead investigate $\hat{d}$ up to 100,000 (that is, $N = 100{,}000$). Fig. 6 depicts the


simulation results, which show that a node incurs a cost far less than $O(\ln^2 N)$ to repair the network.

6.5.2 The Overhead

As we have discussed in Section 1.1, the earlier proposals such as Narada [6], PROMISE [9], and AnySee [10] intend to minimize the delay of receiving messages sent by the root for any nonroot nodes. That is, these proposals work toward the construction of the shortest path spanning tree. We thus also investigate the overhead of the shortest path spanning tree.

We note that the shortest path spanning tree is a minimal-cost network flow problem, which can be formally formulated as follows:

$$\min \sum_{ij \in E} c_{ij} f_{ij}$$

$$\text{s.t.} \quad \sum_{k \in O(j)} f_{jk} - \sum_{i \in I(j)} f_{ij} = -1, \quad \forall j \in V - \{r\};$$

$$\sum_{k \in O(r)} f_{rk} = |V| - 1;$$

$$f_{ij} \ge 0, \quad \forall i, j \in V;$$

where $I(j)$ and $O(j)$ respectively represent the incoming and outgoing flows to/from the node $j$. Since one of our design objectives is to minimize the overhead $\sum_{ij \in E} c_{ij} f$ as much as possible, if we normalize $f$ such that $f = 1$, then we can simply estimate the overhead as $\sum_{ij \in E} c_{ij}$. Clearly, the cost (that is, $\sum_{ij \in E} c_{ij} f_{ij}$) of the shortest path spanning tree, as defined above, is at least $\sum_{ij \in E} c_{ij}$, because the resulting flow $f_{ij}$ must be at least 1 for any link $ij$ in the tree. The shortest path spanning tree works toward the minimization of the overhead of the network.

Fig. 7 shows the overheads of our tree-based overlay with/without the exploitation of the physical network locality (denoted by W/ and W/O, respectively). In Fig. 7, SPT represents the shortest path spanning tree. The physical network topologies that we study in this experiment are Waxman, Barabasi, and PlanetLab, with $|V| = 2{,}000$, $|V| = 2{,}000$, and $|V| = 500$ nodes, respectively. In the experiment, we estimate the overhead by using the total delay (that is, $\sum_{ij \in E} c_{ij}$), assuming that $f$ is normalized to 1. Fig. 7 shows that our tree-based overlay with the exploitation of network locality (that is, W/) outperforms SPT. This indicates that our tree-based overlay performs better toward the minimization of the overhead.

7 SUMMARY AND FUTURE WORK

We have presented a tree-shaped P2P network infrastructure. Our tree-based overlay is lightweight and implements a simple $T$ protocol (see Section 3). We thoroughly and rigorously analyzed the performance of our proposal, and we have shown that our tree-based network has performance guarantees in terms of

1. the degree of a peer,
2. the diameter,
3. the weighted diameter,
4. the cost of joining a new peer,
5. the protocol maintenance overhead, and
6. the queue length in the bootstrap node.

We also validate our analytical results in extensive simulations.

We believe that our tree-based overlay could serve as an infrastructure for P2P applications that demand scalability, fast communication, and low overhead. Our next work will study how a P2P application such as live media broadcasting takes advantage of our tree-based overlay for minimizing the communication latencies among nodes and the systemwide overhead. In particular, recent studies (for example, [37] and [38]) have observed that heterogeneity of peers is inherent in a P2P environment, and they show that taking advantage of the heterogeneity can improve the performance quality of tree-based streaming overlays. In our future study, we will optimize our tree-based overlay by exploiting the heterogeneity of peers.

ACKNOWLEDGMENTS

The authors thank the anonymous reviewers for their valuable feedback and Dr. Yingwu Zhu for his helpful comments on this paper. This work was partially supported by the National Science Council, Taiwan, under Grant 95-2221-E-006-095.

REFERENCES

[1] J. Kubiatowicz, D. Bindel, Y. Chen, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Zhao, “OceanStore: An Architecture for Global-Scale Persistent Storage,” Proc. Ninth ACM Int’l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS ’00), pp. 190-201, Nov. 2000.
[2] B.Y. Zhao, L. Huang, J. Stribling, S.C. Rhea, A.D. Joseph, and J.D. Kubiatowicz, “Tapestry: A Resilient Global-Scale Overlay for Service Deployment,” IEEE J. Selected Areas in Comm., vol. 22, no. 1, pp. 41-53, Jan. 2004.
[3] D. England, B. Veeravalli, and J.B. Weissman, “A Robust Spanning Tree Topology for Data Collection and Dissemination in Distributed Environments,” IEEE Trans. Parallel and Distributed Systems, vol. 18, no. 5, pp. 608-620, May 2007.


[4] J. Li, J. Stribling, T.M. Gil, R. Morris, and M.F. Kaashoek, “Comparing the Performance of Distributed Hash Tables under Churn,” LNCS 3279, pp. 87-99, Jan. 2005.
[5] J. Duato, S. Yalamanchili, and L. Ni, Interconnection Networks: An Engineering Approach. Morgan Kaufmann, 2002.
[6] Y. Chu, S. Rao, and H. Zhang, “A Case for End System Multicast,” Proc. ACM SIGMETRICS ’00, pp. 1-12, 2000.
[7] S. Banerjee, B. Bhattacharjee, and C. Kommareddy, “Scalable Application Layer Multicast,” Proc. ACM SIGCOMM ’02, pp. 205-217, Aug. 2002.
[8] D.A. Tran, K.A. Hua, and T. Do, “ZIGZAG: An Efficient Peer-to-Peer Scheme for Media Streaming,” Proc. IEEE INFOCOM ’03, pp. 1283-1292, Mar. 2003.
[9] M. Hefeeda, A. Habib, B. Botev, D. Xu, and B. Bhargava, “PROMISE: Peer-to-Peer Media Streaming Using CollectCast,” Proc. 11th ACM Int’l Conf. Multimedia (Multimedia ’03), pp. 45-54, Nov. 2003.
[10] X. Liao, H. Jin, Y. Liu, L.M. Ni, and D. Deng, “AnySee: Peer-to-Peer Live Streaming,” Proc. IEEE INFOCOM ’06, pp. 1-10, Mar. 2006.
[11] V. Venkataraman, K. Yoshida, and P. Francis, “Chunkyspread: Heterogeneous Unstructured Tree-Based Peer-to-Peer Multicast,” Proc. 14th IEEE Int’l Conf. Network Protocols (ICNP ’06), pp. 2-11, Nov. 2006.
[12] S.M. Banik, S. Radhakrishnan, and C.N. Sekharan, “Multicast Routing with Delay and Delay Variation Constraints for Collaborative Applications on Overlay Networks,” IEEE Trans. Parallel and Distributed Systems, vol. 18, no. 3, pp. 421-431, Mar. 2007.
[13] I. Stoica, R. Morris, D. Karger, M.F. Kaashoek, and H. Balakrishnan, “Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications,” Proc. ACM SIGCOMM ’01, pp. 149-160, Aug. 2001.
[14] A. Rowstron and P. Druschel, “Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems,” LNCS 2218, pp. 161-172, Nov. 2001.
[15] M. Castro, P. Druschel, A. Kermarrec, A. Nandi, A. Rowstron, and A. Singh, “SplitStream: High-Bandwidth Content Multicast in a Cooperative Environment,” Proc. 19th ACM Symp. Operating Systems Principles (SOSP ’03), pp. 298-313, Oct. 2003.
[16] C. Chou, T.-Y. Huang, K.-L. Huang, and T.-Y. Chen, “SCALLOP: A Scalable and Load-Balanced Peer-to-Peer Lookup Protocol,” IEEE Trans. Parallel and Distributed Systems, vol. 17, no. 5, pp. 419-433, May 2006.
[17] C.G. Plaxton, R. Rajaraman, and A.W. Richa, “Accessing Nearby Copies of Replicated Objects in a Distributed Environment,” Proc. Ninth ACM Symp. Parallel Algorithms and Architectures (SPAA ’97), pp. 311-320, June 1997.
[18] M. Castro, M.B. Jones, A.-M. Kermarrec, A. Rowstron, M. Theimer, H. Wang, and A. Wolman, “An Evaluation of Scalable Application-Level Multicast Built Using Peer-to-Peer Overlays,” Proc. IEEE INFOCOM ’03, pp. 1510-1520, Mar. 2003.
[19] S. El-Ansary, L.O. Alima, P. Brand, and S. Haridi, “Efficient Broadcast in Structured P2P Networks,” LNCS 2735, pp. 304-314, Oct. 2003.
[20] A. Bharambe, S. Rao, V. Padmanabhan, S. Seshan, and H. Zhang, “The Impact of Heterogeneous Bandwidth Constraints on DHT-Based Multicast Protocols,” LNCS 3640, pp. 115-126, Feb. 2005.
[21] T. Cormen, C. Leiserson, and R. Rivest, “Recurrences,” Introduction to Algorithms, second ed. MIT and McGraw-Hill, 2001.
[22] M. Faloutsos, P. Faloutsos, and C. Faloutsos, “On Power-Law Relationships of the Internet Topology,” Proc. ACM SIGCOMM ’99, pp. 251-262, Aug. 1999.
[23] Gnutella, http://rfc-gnutella.sourceforge.net/, 2007.
[24] D.R. Karger and M. Ruhl, “Finding Nearest Neighbors in Growth-Restricted Metrics,” Proc. 34th ACM Ann. Symp. Theory of Computing (STOC ’02), pp. 741-750, May 2002.
[25] H. Zhang, A. Goel, and R. Govindan, “Improving Lookup Latency in Distributed Hash Table Systems Using Random Sampling,” ACM/IEEE Trans. Networking, vol. 13, no. 5, pp. 1121-1134, Oct. 2005.
[26] T.S.E. Ng and H. Zhang, “Predicting Internet Network Distance with Coordinates-Based Approaches,” Proc. IEEE INFOCOM ’02, pp. 170-179, June 2002.
[27] J.C. Chu, K.S. Labonte, and B.N. Levine, “Availability and Locality Measurements of Peer-to-Peer File Systems,” Proc. SPIE—ITCom: Scalability and Traffic Control in IP Networks, pp. 310-321, July 2002.
[28] S. Saroiu, P.K. Gummadi, and S.D. Gribble, “Measurement Study of Peer-to-Peer File Sharing Systems,” Proc. Multimedia Computing and Networking (MCN ’02), pp. 18-25, Jan. 2002.
[29] Napster, http://www.napster.com/, 2007.
[30] G. Pandurangan, P. Raghavan, and E. Upfal, “Building Low-Diameter Peer-to-Peer Networks,” IEEE J. Selected Areas in Comm., vol. 21, no. 6, pp. 995-1002, Aug. 2003.
[31] H.-C. Hsiao and C.-P. He, “A Tree-Based Peer-to-Peer Network with Quality Guarantees,” technical report (available upon request), Dept. of Computer Science and Information Eng., Nat’l Cheng-Kung Univ., June 2007.
[32] KaZaA, http://www.kazaa.com/, 2007.
[33] K.P. Gummadi, R.J. Dunn, S. Saroiu, S.D. Gribble, H.M. Levy, and J. Zahorjan, “Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload,” Proc. 19th ACM Symp. Operating Systems Principles (SOSP ’03), pp. 314-329, Oct. 2003.
[34] S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz, “Handling Churn in a DHT,” Proc. Usenix Ann. Technical Conf., 2004.
[35] A. Medina, A. Lakhina, I. Matta, and J. Byers, “BRITE: An Approach to Universal Topology Generation,” Proc. Ninth Int’l Symp. Modeling, Analysis, and Simulation of Computer and Telecomm. Systems (MASCOTS ’01), pp. 346-353, Aug. 2001.
[36] PlanetLab, http://www.planet-lab.org/, 2007.
[37] M. Bishop, S. Rao, and K. Sripanidkulchai, “Considering Priority in Overlay Multicast Protocols under Heterogeneous Environments,” Proc. IEEE INFOCOM ’06, pp. 1-13, Mar. 2006.
[38] Y.-W. Sung, M. Bishop, and S. Rao, “Enabling Contribution Awareness in an Overlay Broadcasting System,” Proc. ACM SIGCOMM ’06, pp. 411-422, Sept. 2006.

Hung-Chang Hsiao received the PhD degree in computer science from the National Tsing-Hua University, Hsinchu, Taiwan, in 2000. From October 2000 to July 2005, he was a postdoctoral researcher in the Department of Computer Science, National Tsing-Hua University. Since August 2005, he has been an assistant professor in the Department of Computer Science and Information Engineering, National Cheng-Kung University, Tainan, Taiwan. His research interests include peer-to-peer computing, overlay networking, and grid computing.

Chih-Peng He received the BS degree in computer science and information engineering from Fu-Jen Catholic University, Taipei, in 2004 and the MS degree in computer science and information engineering from the National Cheng-Kung University, Tainan, Taiwan, in 2007. He is currently with the Department of Computer Science and Information Engineering, National Cheng-Kung University. His research interests include peer-to-peer computing and overlay networking.
