
Distributed Service Discovery with Guarantees in Peer-to-Peer Networks using Distributed Hashtables


Sven Kaffille, Karsten Loesing, and Guido Wirtz
Distributed and Mobile Systems Group
Otto-Friedrich-Universität Bamberg
Feldkirchenstraße 21, 96047 Bamberg, GERMANY
{sven.kaffille|karsten.loesing|guido.wirtz}@wiai.uni-bamberg.de
Phone: +49951863{2812|2810|2527}; Fax: +499518635528
Conference: PDPTA'05; Presenting author (if accepted): Karsten Loesing

Abstract: This paper proposes a protocol for decentralized service discovery with guarantees. We use a peer-to-peer network based on the distributed hashtable Chord, which provides a structured overlay network, in order to avoid flooding the whole network. Service descriptions are decomposed into portions which can be efficiently distributed and retrieved. We propose a way to evaluate our protocol by running simulations in comparison with a straightforward way of achieving the same goal in an unstructured, Gnutella-like network.

KEYWORDS: SERVICE DISCOVERY, PEER-TO-PEER, DISTRIBUTED HASHTABLE

I. INTRODUCTION

In recent years the Web has changed from a rather consumer-oriented to a consumer-and-producer environment. Providing a service is by now regarded as nearly as common as consuming one, due to the empowerment of the edge of the Internet. Services are offered by numerous nodes in a decentralized way, and service providers may join or leave at will. Examples include peer-to-peer (P2P) file sharing systems, in which every peer offers a service that lets other peers download files; software agent systems, which allow agents to be created at different places in order to serve users or other agents; and Web Services, in which any node running an HTTP server can create its own Web Service and offer it to other nodes in the Web.

All these services are of little use until they are advertised to service consumers by (hopefully computer-readable) service descriptions that include the type of service as well as parameters describing service details. This is done by first publishing service descriptions at so-called registries and second allowing service consumers to query the registry database for services matching certain criteria. The latter may be divided into looking up a certain service with a known identifier, which is called a name service, and searching for services with certain attributes, which is known as a directory service [1].

Usually registry services are located at central, well-known places, running on nodes dedicated solely to this task. While algorithmically simple, this solution has several drawbacks. First of all, central solutions generally do not scale. Server load grows linearly with the number of clients, making it necessary to replicate servers whenever network size increases significantly. Further, any server is a single point of failure, making the service unavailable when it is disconnected from the network. Apart from that, central or hierarchical structures for registries do not always correspond to the dynamic formation process of service-providing environments. Next, servers have to be set up and managed in order to allow collaboration. This represents a barrier to spontaneous collaboration which might inhibit ad-hoc formation and should be avoided. In some situations it may be reasonable to interconnect multiple registries which have formed independently of each other. This should be done among equals rather than in a master-servant setting. Another example is a group of users who share a common interest or only want to cooperate within their closed group. In these situations a distributed realization of a registry service is more feasible than picking a single node or a fixed set of nodes to perform this task.

There exist means of connecting multiple registries in non-hierarchical fashions. For example, in the agent world of FIPA-conforming agent platforms [2] the so-called directory facilitators can be federated, so that query requests are forwarded among them with a given TTL (time to live). As another example, UDDI allows connection of multiple registries hierarchically or in a peer-like fashion, which is called registry affiliation [3]. These approaches to connecting registries may be compared with unstructured, pure or hybrid P2P systems like Gnutella [4] and FastTrack [5]. They form decentralized networks of all nodes or of specialized super nodes, respectively, and forward queries with a given TTL. The problem with these approaches is that, unless a query traverses all (super) nodes in a network by means of flooding, no guarantees can be given for finding a certain resource. Though this might not be a problem in file sharing systems, it is more than just an inconvenience for service-oriented environments which require precise results.

Our approach aims to decentralize the registry in a structured way in order to provide the guarantee of finding any registered service description matching a given query. This is done by forming an overlay network using the distributed hashtable Chord [6] and distributing the service information in it so that it can be queried by contacting only a logarithmic number of nodes.
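As a rough illustration of the logarithmic-lookup claim, the following Python sketch simulates greedy routing on a Chord-like ring. It is not the authors' implementation: the 16-bit identifier space, the helper names (`ident`, `successor`, `between`, `lookup`), and the shortcut of computing fingers from a global sorted node list are simplifying assumptions made here for illustration only.

```python
import bisect
import hashlib

M = 16            # identifier bits; an assumption (Chord itself uses 160-bit SHA-1 IDs)
RING = 2 ** M


def ident(name):
    """Map a name to an M-bit identifier on the ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING


def successor(nodes, point):
    """First node ID at or after `point`, wrapping around the ring."""
    i = bisect.bisect_left(nodes, point % RING)
    return nodes[i % len(nodes)]


def between(x, a, b, inclusive_right=False):
    """True if x lies in the ring interval (a, b), or (a, b] if requested."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b      # interval wraps past zero (or spans the ring)


def lookup(nodes, start, key):
    """Route a query from node `start`; return (responsible node, hops)."""
    node, hops = start, 0
    while True:
        succ = successor(nodes, node + 1)
        if between(key, node, succ, inclusive_right=True):
            return succ, hops + 1          # final hop delivers to the owner
        # Forward to the farthest finger that still precedes the key.
        for i in reversed(range(M)):
            finger = successor(nodes, node + 2 ** i)
            if between(finger, node, key):
                node, hops = finger, hops + 1
                break
        else:
            raise RuntimeError("no finger precedes the key")


# Hypothetical ring of up to 1000 nodes (hash collisions are merged).
nodes = sorted({ident(f"node-{i}") for i in range(1000)})
owner, hops = lookup(nodes, nodes[0], ident("type.ticketservice"))
print(owner, hops)
```

In this sketch every forwarding step at least halves the remaining distance to the key's predecessor, so a lookup needs at most M + 1 hops however many nodes are on the ring. A flooding-based search would have to reach every node to give the same guarantee.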
Figure 1 gives an example of a registry network that makes use of our protocol. The registry service is distributed among the nodes in the cloud, while the nodes outside of it just make use of it without contributing to it. The circles denote administrative boundaries, e.g. of corporations or universities.

Fig. 1. Network consisting of nodes providing the registry service (inside cloud) and those that only make use of it (outside cloud). Nodes in circles stand for nodes belonging to administrative units.

II. REQUIREMENTS ON DISCOVERY SERVICES

The requirements on a discovery service imposed by service providers and consumers can be divided into functional and non-functional requirements.

Functional requirements of service providers comprise convenient methods and data structures to publish, modify, and unpublish their services. Service descriptions that service providers want to publish consist of (name, value) pairs describing the attributes of the services. Values in service descriptions can be of primitive data types like integer, string, etc., of complex data types, or sets of one of these types. Complex data types are, like a service description itself, composed of (name, value) pairs. A discovery service must therefore allow publishing, modifying, and unpublishing tree-like complex data structures. There must not be any restrictions on the number of attributes a service description can consist of.

Service consumers querying a discovery service need methods to look up services and a convenient data structure to describe templates for the services they are looking for. This data structure must be defined analogously to the one used for service descriptions. The discovery service must return all service descriptions matching a query, giving service consumers the guarantee of finding any available service. Further, it should allow users to specify an upper bound for the number of returned services and provide a means for iterating over them.

Beyond functional requirements the discovery service has to satisfy non-functional requirements. Some of these are independent of whether the discovery service is realized in a distributed manner. These are, e.g., low response time, reliability, and scalability (depending on the number of services published and the number of queries). Other requirements apply only to a distributed discovery service (especially in a P2P environment), for example low bandwidth consumption and a low number of messages for publishing, modifying, unpublishing, and querying of services.

III. OVERLAY NETWORK - REQUIREMENTS AND ASSUMPTIONS

The non-functional requirements directly impose requirements on the underlying P2P overlay network, as does the guarantee to find all available services. The overlay network has to allow storage and retrieval of service descriptions.

Two types of P2P networks could therefore be applied: unstructured and structured networks. Unstructured P2P networks like Gnutella [4] and FastTrack [5] would achieve this goal by creating local indices on all nodes and forwarding queries through the network until either a result is found or a given TTL value has expired. Advantages of this approach are the simplicity of its algorithms and the arbitrary complexity of queries. Its drawbacks, which make this solution unsuitable for a discovery service, are poor scalability and the inability to guarantee whether a certain service is available or not.

Structured P2P networks like Chord [6] take another approach. Here, the index used for service discovery is already distributed in the network when the service is registered. This is done in a way that allows queries to be routed step by step to the node which is responsible for holding the required information. Advantages of structured overlays are efficient lookups which involve only a logarithmic number of nodes in the network. Further, they guarantee that any query can be answered even if service provider and consumer reside at two distant edges of the network. A drawback is that a certain network structure has to be maintained in case of joins, leaves, or failures of nodes, which usually is more expensive than in unstructured P2P networks. Another problem is that distributed hashtables do not support searching by themselves, but only looking up data bound to concrete hash values.

The main argument for using a structured P2P network in our approach is that queries are guaranteed to be answered without having to flood the whole network. It has to be evaluated by simulation whether maintenance costs play a crucial role compared to the costs of publishing and querying. Choosing the right data structure to publish and retrieve service information in the network is one of the crucial tasks of our approach.

In Chord [6] any piece of information has to be uniquely identifiable by a key. This is achieved by applying a hash function to any data which is to be stored in the network. This key is used for storage as well as for retrieval of data. Consequently, in order to find any information the full key has to be known in advance, which means that one has to know exactly what to look for. Searching capability has to be implemented separately from this lookup mechanism or by using additional data structures, e.g. inverted indices. All nodes in the overlay network are assigned unique identifiers (IDs) from the same key space. Any node is responsible for the data keys within the
range of the next smaller node ID in the network up to its own node ID. Therefore every node has to know its predecessor, which is propagated and updated by maintenance messages. The ordering of nodes in successor-predecessor relations forms the so-called Chord ring. In addition to predecessor references every node stores a so-called finger table, a skip-list-like structure whose i-th entry references the first node whose ID exceeds the node's own ID by at least the i-th power of two. In this way, queries can be forwarded at least half the way closer to their destination in every step, while each node only needs to know a logarithmic number of nodes in the network. This leads to logarithmic performance for storing and retrieving any item stored in the ring. Whenever nodes join or leave the network, routing information has to be updated. This is done by a stabilization protocol which has been proven to be correct even in case of multiple node joins or leaves at the same time [6].

In order to prevent data loss because of node failures, data should be replicated on multiple nodes in the network, e.g. by copying it to the k next nodes following the node responsible for a specific data item.

IV. DESIGN OF THE DISCOVERY SERVICE

The design of the proposed P2P discovery service is covered in this section. This service consists of three layers on top of which an application may be built. The first and lowest layer is the transport layer, on top of which the second layer, the P2P overlay network Chord, resides. The third layer is the local discovery service layer, which is implemented above the Chord layer. This layer provides methods to publish, unpublish, modify, and query service descriptions. It also maintains a local data structure which stores all service descriptions that have been published by the local node. Parts of the service discovery layer rely directly on the transport layer for direct communication with other nodes whose addresses are already known by the service discovery layer. Finally, an application may be built on top of the service discovery layer. The architecture is shown in figure 2.

Fig. 2. Architecture of the P2P discovery service.

These layers are intended to be implemented on every node in the discovery service network. Hence, every participating peer can publish, unpublish, and modify the service descriptions that it wants to provide, and query service descriptions provided by other peers. Alternatively, services can be provided to (and provided by) nodes not directly participating in the P2P discovery service. This is done by connecting to one or more nodes which are part of the discovery service.

The following sections show how service descriptions are published, unpublished, modified, and queried with the help of the provided architecture.

A. Publishing service descriptions

The service discovery layer includes a data structure for representing a service description. Within this data structure various attributes describing a service can be set. These attributes consist of (name, value) pairs. Values can be of primitive or complex types, or sets of one primitive or complex type. Primitive types include integer, string, etc. Complex types may contain primitive or complex types as well as sets of them as subtypes. In this way a tree-like structure of types can be built (e.g. figure 3).

    ( type = ticketservice;
      url = ( protocol = http;
              host = 81.200.194.40;
              port = 8080;);
      owner = DB;
      languages = {german,
                   english,
                   french};
    )

Fig. 3. Example of a service description.

For each service description to be published a registration ID is created which can later be used to uniquely identify the published service. Further, keys are generated for the attributes of the service description:

• The keys for attributes of primitive type are generated by calculating the hash value of the concatenation of name and value. For each of these keys a so-called service reference, consisting of the local node's address and the registration ID, is stored in the Chord layer using the calculated hash value as key.

• Attributes of a complex type are decomposed into their subtypes. For each of these a service reference is stored in the Chord layer. In order to retain the tree-like structure of the service description, the names of attributes of a complex type are preceded by the name of the complex attribute itself. In this way each attribute is assigned a fully qualified name which allows unique identification of attributes in a description.

• Finally, attributes consisting of a set of types are stored one by one as described above. The names of the items in the set are also preceded by the attribute name of the set. The order of values in a set is not mapped to Chord. If such an order is desired, a complex type should be used instead.

The presented mapping of service descriptions to Chord keys preserves the hierarchical structure of attributes, but discards the ordering of its elements. Figure 4 shows an
example of the keys generated for the service description of figure 3.

    "type.ticketservice"     -> ADBCB6...17
    "url.protocol.http"      -> 801D4F...F0
    "url.host.81.200.194.40" -> 922E25...59
    "url.port.8080"          -> 9F5A36...23
    "owner.DB"               -> 71A8D1...30
    "languages.german"       -> A36923...42
    "languages.english"      -> BE2CD9...AD
    "languages.french"       -> B81C8B...D9

Fig. 4. Example of keys for a service description.

After service references have been stored within the Chord layer for each attribute, the service description itself is stored in a local data structure of the service discovery layer. The generated registration ID is returned to the application for later reference to the service description.

B. Querying service descriptions

In order to search for a service, a node has to know the schema (the tree-like hierarchy of attribute names) of the service description of the service it is looking for and at least one value of a service attribute. As with publishing of services, the attribute types may be primitive or complex types as well as sets of these. In order to permit multi-attribute queries, templates are used which incorporate all attributes belonging to one query.

Querying for all services matching a given template is done by choosing one of the attributes at random, calculating the hash key for it, and looking up all available service references for it in the Chord layer. Performance can be improved if only the service references for the least frequent key are queried, which requires an additional data structure maintaining keyword frequency (see e.g. [7]). The node addresses of the returned service references are then used to retrieve the complete service descriptions. Whether these descriptions also match the other attributes contained in the query template can be tested locally. Only service descriptions matching all attributes of the template are returned to the application layer.

It follows from this description that templates may only contain complete (name, value) pairs. It is not possible to support wildcards in the value part of an attribute, because searching for them cannot be done efficiently in the Chord layer. Searching for ranges of values is not possible at the moment either. Both issues might be addressed in future work.

C. Modifying and unpublishing service descriptions

Sometimes a previously published service description has to be modified. In that case the attributes which have changed or have become obsolete are removed from the Chord layer, and attributes which are new or have changed are added to it. This is done by calculating the keys for the affected attributes and delegating the addition or removal of service references to the Chord layer. Since the content of the service references stays the same, the unchanged attributes are not affected by the modification. This procedure ensures that network traffic is generated only for changed attributes.

If a service description has to be unpublished, the service references belonging to the affected service have to be deleted. To this end the keys of all service attributes are generated and passed to the Chord layer for removal. Further, the service description is removed from the local service description data structure.

D. Handling departure and failure of nodes

Nodes joining and leaving the P2P network as well as crashing nodes are handled by the Chord layer, transparently to the service discovery layer. But the service discovery layer has to take care of the content which is stored in the Chord layer. When a node leaves the network, the service discovery layer of that node has to ensure that all service references referring to the leaving node are removed from the underlying Chord layer.

If a node crashes, the service references of the crashed node have to be removed, too. There are three possibilities to achieve this:

• A leasing concept could be employed, making every node responsible for renewing the leases for its service references at regular intervals. Unfortunately, this would cause the same number of messages as republishing all service descriptions, repeated whenever leases expire.

• A node could detect that another node has crashed while querying for a service description. This is possible because all nodes storing possibly relevant service descriptions have to be contacted directly. If a node does not respond to this request, the querying node can remove the affected service reference from the Chord layer. This solution would leave many entries in the Chord layer which are never used for a query.

• Service descriptions could be replicated by the publishing node to k other nodes. This could for example be achieved by exploiting the routing information used by the Chord layer for internal replication purposes. If one of these nodes detects that a publishing node has crashed, it could initiate the removal of all service references published by the crashed node. Though it is the most complex, this solution is preferable, because it ensures that service descriptions are up to date without producing significant traffic.

V. EVALUATION

Simulation of our protocol is one possible means to evaluate its performance in comparison to other protocols, i.e. those which are not based on structured P2P networks like distributed hashtables. A protocol suitable for comparison must fulfill the same functional requirements as we stated above. That includes publishing service descriptions as well as sending queries containing service templates which are
guaranteed to be answered by all matching services in the network. Therefore we assume a Gnutella-like network in which service descriptions are stored locally at each node and queries are sent through the network by flooding in order to achieve the same guarantees as our protocol does. The clear disadvantage of this approach is that flooding is inherently inefficient. But an advantage of the Gnutella-like protocol may be that one query message can contain a template with an arbitrary number of attributes of a potential service.

How the discovery service is used does not depend on which of these two protocols is employed. Therefore assumptions must be made on network size and dynamics, e.g. arrival and uptime of nodes, and on service usage, e.g. frequency of publishing and querying of services and number and kind of attributes contained in service descriptions and query templates. The data to be measured and compared in the simulation can be divided into quality measures, e.g. response time of queries, and impacts on the nodes, e.g. amount of storage, number of open connections, and traffic volume. Since the setting is the same for both protocols we expect our protocol to outperform the Gnutella-like approach.

As simulation environment ns-2 [8] may be used, which is a discrete event simulator working on packet level. In addition to this, GnutellaSim [9] might be useful, which is an open-source library for simulation of P2P protocols and can be run with ns-2. The Gnutella protocol provided by GnutellaSim has to be modified to achieve unbounded flooding, which is required for the comparison. Further, an implementation of our protocol as well as a Chord layer has to be added to the library.

VI. RELATED WORK

Recently, some work has been done on enabling searching capabilities in P2P systems based on distributed hashtables which go beyond looking up keys. [10] proposes a way to apply inverted indices to keyword search in file sharing applications. [11] extends this model by adding mechanisms to improve query efficiency in such a system, namely query ordering, bloom filters, popularity information, and truncated results. [7] introduces a keyword dictionary and improves query efficiency by so-called keyword fusion. All this work has been applied to searching for multimedia data in file sharing systems rather than to service discovery in service-oriented environments.

In our approach we assume service providers and consumers to have a common schema for describing services. That means that service types and their attributes are known before using the registry service. In contrast to this, [12] proposes a means to add semantics to registry services by using ontologies, for example by returning similar services which do not exactly match the queried service description.

VII. CONCLUSION

In this paper we proposed a protocol for decentralized service discovery with guarantees. We used a P2P network based on a distributed hashtable that provides a structured overlay network in order to avoid flooding the whole network. Service descriptions are decomposed into portions which can be efficiently distributed and retrieved. We proposed a way to simulate our protocol by comparing it with a straightforward way of achieving the same goal in an unstructured network.

VIII. CURRENT AND FUTURE WORK

Currently we are working on simulating our protocol according to the assumptions made in section V. Moreover, a prototype for service discovery in a FIPA-conforming agent platform is under development. We also aim to improve the efficiency of our protocol in the future and to incorporate means like a frequency dictionary of service attributes in the style of the fusion dictionary [7] in order to decrease the traffic of multi-attribute queries. Further, we intend to support wildcard and range queries in the future. Finally, security issues have to be taken into consideration, since any adversary would be able to add, modify, or delete service descriptions at will.

REFERENCES

[1] G. Coulouris, J. Dollimore, and T. Kindberg, Distributed Systems: Concepts and Design, 3rd ed. Addison-Wesley, 2001.
[2] FIPA Agent Management Specification, Foundation for Intelligent Physical Agents (FIPA), March 2004.
[3] Introduction to UDDI: Important Features and Functional Concepts, Organization for the Advancement of Structured Information Standards (OASIS), October 2004.
[4] The Gnutella Protocol Specification v0.4. [Online]. Available: http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf
[5] A. Oram, Ed., Peer-to-Peer: Harnessing the Benefits of a Disruptive Technology. O'Reilly, March 2001.
[6] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, and H. Balakrishnan, "Chord: A scalable peer-to-peer lookup protocol for internet applications," IEEE/ACM Trans. Netw., vol. 11, no. 1, pp. 17-32, 2003.
[7] L. Liu, K. D. Ryu, and K.-W. Lee, "Keyword fusion to support efficient keyword-based search in peer-to-peer file sharing," in Cluster Computing and the Grid, 2004. CCGrid 2004. IEEE International Symposium on, April 2004, pp. 269-276.
[8] [Online]. Available: http://www.isi.edu/nsnam/ns/
[9] Q. He, M. Ammar, G. Riley, H. Raj, and R. Fujimoto, "Mapping peer behavior to packet-level details: A framework for packet-level simulation of peer-to-peer systems," in 11th IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer Telecommunication Systems, 2003. [Online]. Available: http://csdl.computer.org/comp/proceedings/mascots/2003/2039/00/2039toc.htm
[10] P. Reynolds and A. Vahdat, "Efficient peer-to-peer keyword searching," in Lecture Notes in Computer Science, vol. 2672. Springer-Verlag GmbH, 2003, pp. 21-40.
[11] T. Lu, S. Sinha, and A. Sudan, "Panache: A scalable distributed index for keyword search." [Online]. Available: http://www.pdos.lcs.mit.edu/6.824-2002/projects/
[12] D. Elenius and M. Ingmarsson, "Ontology-based service discovery in p2p networks," in Proceedings of the MobiQuitous'04 Workshop on Peer-to-Peer Knowledge Management (P2PKM 2004), Boston, MA, USA, August 22, 2004, 2004. [Online]. Available: citeseer.ist.psu.edu/711664.html
