This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TSC.2015.2398442, IEEE Transactions on Services Computing

IEEE TRANSACTIONS ON SERVICES COMPUTING

A Multilevel Index Model to Expedite Web Service Discovery and
Composition in Large-Scale Service Repositories
Yan WU, ChunGang YAN, ZhiJun DING, GuanJun LIU, PengWei WANG, ChangJun JIANG, Member,
IEEE, and MengChu ZHOU, Fellow, IEEE

Abstract: The number of Web services has grown drastically, so
managing them efficiently in a service repository has become an
important issue. For a specific field, there often exists an
efficient data structure for a class of objects; e.g., Google's
Bigtable is well suited to the storage and management of Web pages.
Based on the theory of equivalence relations and quotient sets,
this work proposes a multilevel index model for large-scale service
repositories, which can be used to reduce the execution time of
service discovery and composition. Its novel use of keys, inspired
by keys in relational databases, can effectively remove the
redundancy of the commonly used inverted index. Its four
function-based operations are proposed for the first time to
manage and maintain services in a repository. The experiments
validate that the proposed model is more efficient than the
existing structures, i.e., sequential and inverted index ones.
Index Terms: Web service, service composition, service
discovery, service management, big data.

I. INTRODUCTION

The service-oriented computing (SOC) paradigm uses services to
support the development of rapid, low-cost, interoperable,
evolvable, and massively distributed applications [1]. The
application of SOC on the Web is manifested by Web services [2].
Hence, the number of Web services has grown drastically in recent
years [3]. The rapid development of cloud computing also promotes
the use of Web services. For example, Amazon Web Services, Google
App Engine and Windows Azure are well-known cloud computing
platforms, and many of their remote management functions are
implemented by Web services. Many applications are also developed
in the form of Web services for sale, such as Bing Search API.
Furthermore, users can deploy their Web service applications in
such cloud platforms. This trend has two effects. One is that the
number of services grows rapidly. The other is that enormous
numbers of services congregate in service centers or service
repositories. Consequently, a very important and challenging
question arises: how to manage services efficiently in a large
service repository. Clearly, services can be stored by a relational
database in a service repository. Lee et al. [4] propose an
efficient storage model based on such a database. However, for a
specific field, there may exist a more efficient data structure for
a class of objects. For example, Web pages are a class of objects,
and Google has designed, implemented, and deployed Bigtable [5] to
store and manage them with great success. The question is: given a
service repository, can one find a data structure that maximizes
the efficiency of service discovery, composition and management?
This work intends to answer this critically important question.

Manuscript received January xx, 2014. This work was supported in part by
the National Basic Research Program of China (973 Program) under Grant No.
2010CB328101, National Natural Science Funds of P.R. China under Grants
No. 61173016.
Y. Wu, C. Yan, Z. Ding, G. Liu, P. Wang, C. Jiang and M. C. Zhou are with
the Key Laboratory of Embedded System and Service Computing, Ministry of
Education, Tongji University, Shanghai, 201804, China, Lab Tel:
0086-021-69589864 (e-mail: vw_@163.com; yanchungang@tongji.edu.cn;
zhijun_ding@hotmail.com; liugj1116@163.com; pwei.wang@gmail.com;
cgyan2@163.com, zhou@njit.edu).
M. C. Zhou is also with the Department of Electrical and Computer
Engineering, New Jersey Institute of Technology, Newark, NJ 07102 USA
(e-mail: zhou@njit.edu).

The discovery, composition, selection and security of Web
services are widely studied. The most interesting problems
faced by both industry and academia are service discovery and
composition [6-17]. Finding a data structure to improve their
efficiencies is a highly interesting issue.
Service discovery is to find one or more services, each of
which can satisfy a user's requirement. Service composition is
to find a number of services that can be executed in sequence to
satisfy a given user requirement. Therefore, the latter is more
complicated than the former. It is especially difficult to find
the best composed services in a large-scale repository.
Narayanan and McIlraith [18] use Petri nets to analyze the
complexity of a composition problem of Semantic Web services
described by DAML-S and verify that this problem is
Exp-Space-Time-hard in the worst case. Nam et al. [19] study
the computational complexity of behavioral description-based
Web service composition, and conclude that the composition
problem of nondeterministic Web services with incomplete
information is 2-EXP-complete. Our prior work also proves
that deciding whether the composition of a given group of
services is compatible in behavior is co-NP-hard [20].
These findings suggest that more efforts to devise efficient
solutions to the service composition problem are needed.
Various studies [4, 7, 21-23] propose different methods to
improve the service composition efficiency. Their common
shortcoming is that they are designed as parts of particular
composition methods rather than as independent systems. Therefore,
they do not support service discovery, management and maintenance,
e.g., service addition and deletion. However, as the number of
services increases, there is a growing demand for an independent
system that supports not only service composition but also
discovery, and that can be conveniently managed and maintained.

1939-1374 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This work proposes a multilevel index model that can be
used as an independent system to store and manage services in a
large-scale repository and to facilitate service discovery and
composition. The proposed model includes a multilevel index
structure and four operations, i.e., addition, deletion,
replacement and retrieval. It contains four levels of indexes.
The first two are constructed according to the inputs and outputs
of services and are easy to understand. The last two are
constructed according to a newly introduced concept called
"key", and can remove the redundancy contained in the inverted
index [23].
The contributions of this work are summarized as follows:
1. Based on the theory of equivalence relations and quotient
sets, this work proposes a multilevel index model for service
repositories, which reduces the service discovery and
composition time beyond what the existing methods achieve;
2. Based on functions implied in the proposed multilevel
index model, this work proposes four operations that can
retrieve, insert, delete and replace services easily; and
3. The experiments validate that the proposed multilevel
index is more efficient than the two existing structures it is
compared with, i.e., the sequential and inverted index ones.
The rest of this paper is organized as follows. Section II
summarizes the related work. Section III presents the
architecture of the proposed model and provides basic
definitions. Section IV presents the theory of the multilevel
index model. Section V introduces the functions implied in the
proposed model and the four operations built on them. Section
VI reports the experimental results. Section VII concludes the
paper with a discussion of future research.

II. RELATED WORK


Service discovery and composition time is an important
factor affecting user satisfaction. The studies [18-20] have
proven that the composition problem is very time-consuming.
Therefore, efforts that help obtain its solution efficiently are
highly desired.
Tang et al. [7] introduce a novel automatic Web service
composition method based on logical inference of Horn clauses
and Petri nets. They first transform a Web service composition
problem into a logical inference problem of Horn clauses based
on the forward-chaining algorithm. They then use the Petri net
and its structural analysis techniques to obtain the composite
service. Since there may be a large number of services in a
service repository, and a huge number of rules may be
generated consequently, the Petri net of a Horn clause set is
very large. In order to reduce the composition time, they
propose a method to select the candidate clauses for the
inference when a new query comes. Its weakness is that it must
be executed after receiving user requirements, and cannot be
executed beforehand. Wu and Khoury [21] propose a
tree-based search algorithm for Web service composition in a
cloud computing platform. They first create a tree that
represents all possible composition solutions according to user
requirements, then prune the illegal branches to reduce response
time and improve performance, and finally use a heuristic
algorithm to search for an optimal solution. This method has a
disadvantage similar to that of [7], namely, its optimization
process cannot be executed before receiving user requirements.
Constantinescu et al. [24] propose a type-compatible service
composition method. They use a forward composition
algorithm to obtain a solution. Its drawback is that its solution
often contains many useless services. Kwon et al. [22]
propose a two-phase composition method to overcome this
drawback. Its first phase generates a composition solution
via a forward composition algorithm. Its second phase
eliminates useless services backward. In the forward phase, they
use a service net to reduce the composition time. Let $s_i$
denote a service, and ${}^{\bullet}s_i$ and $s_i^{\bullet}$ denote the
input and output parameters of $s_i$, respectively. Services are
the nodes of the net. If $s_i^{\bullet} \cap {}^{\bullet}s_j \neq \emptyset$,
there exists a directed edge from $s_i$ to $s_j$. Assume that
$s_i$ has been selected in a solution. In the next step, only the
services that $s_i$ links to need to be retrieved and checked
regarding whether they should be selected. The service net narrows
the search space for every step of service composition (except the
first one), and is effective in improving the composition
efficiency. Lee et al. [4] propose a scalable and efficient Web
service composition method based on a relational database.
They also use the service net as a basic data structure. The
service net has two shortcomings. First, it does not consider
how to facilitate service discovery. Second, service addition
and deletion are time-consuming. For example, when a service
$s_i$ is added, for every existing service $s_j$, it needs
to be determined whether $s_j^{\bullet} \cap {}^{\bullet}s_i \neq \emptyset$
and $s_i^{\bullet} \cap {}^{\bullet}s_j \neq \emptyset$.
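The maintenance cost of the service net can be illustrated with a minimal Python sketch. It assumes that an edge from one service to another exists when at least one output parameter of the first is an input parameter of the second; the function name `add_to_service_net` and the dictionary layout are hypothetical and are not taken from [22] or [4].

```python
def add_to_service_net(net, services, name, inputs, outputs):
    """Add a service to the net; return how many existing services were examined."""
    examined = 0
    net[name] = set()
    for other, (other_in, other_out) in services.items():
        examined += 1
        if outputs & other_in:   # assumed edge rule: the new service feeds `other`
            net[name].add(other)
        if other_out & inputs:   # assumed edge rule: `other` feeds the new service
            net[other].add(name)
    services[name] = (inputs, outputs)
    return examined


net, services = {}, {}
first = add_to_service_net(net, services, "tourinfo-lookup",
                           {"city", "date"}, {"tour cost", "hotel", "address"})
second = add_to_service_net(net, services, "car-rent",
                            {"date", "address", "car size"}, {"rent", "car"})
```

Every addition walks over all stored services, so under this sketch the cost of one insertion grows linearly with the repository size.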
An inverted index is a highly efficient index that is widely
used in many fields. For example, Google uses it to reduce
response time for users' queries [25, 26]. It is also adopted in
Web service storage.
Aversano et al. [27] propose a backward composition
method that composes services from a terminal state to the
initial one. It contains two steps, called the horizontal and
vertical ones. The former finds a minimal set of services that can
converge to a target state. The latter repeats the former until a
stop condition is met. Clearly, according to the binomial
theorem [28], the time complexity of the horizontal step is $2^n$
in the worst case, where n is the number of services. In order to
reduce its composition time, Li et al. [23] propose an inverted
index to manage services. They store services and their
parameters. If a parameter is an output of a service, then an
index link is created from the parameter to the service. Their
method is efficient and effective for any backward service
composition method, and convenient for service addition and
deletion. However, it is not applicable to the forward composition
methods [21, 22, 24], which are simpler and more popular than the
backward one. Note that the time complexity of a forward method is
$n(n-1)/2$ in the worst case, in contrast with the exponential one
of a backward method [27].
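The exponential worst case quoted for the horizontal step can be read off the binomial theorem. As a hedged reading of the argument (the counting model is an assumption, not spelled out in the text): if the horizontal step may have to examine every subset of the $n$ services while looking for a minimal set, the number of candidate subsets is

```latex
\sum_{k=0}^{n} \binom{n}{k} = (1+1)^{n} = 2^{n}
```

whereas the forward bound $n(n-1)/2$ quoted above grows only quadratically in $n$.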
The inverted index can be easily adapted to the forward
composition method via a simple change: an index link is created
from a parameter to a service if the parameter is an input of the
service. Clearly, the principle of the amended inverted index is
essentially the same as that of the service net [4, 22]. It
overcomes the disadvantage of the service net, namely that
service addition and deletion are inconvenient. However,
it contains considerable redundancies, as illustrated later in this
paper, which can significantly waste time in service
composition and discovery. To remove such redundancies, our
prior work [29] proposed a multilevel index model based on
equivalence relations and quotient sets for the first time.
However, it only discussed how to construct the model and did not
explain theoretically why it could reduce redundancies. This
paper comprehensively presents a multilevel index model,
including its application architecture and four operations to
manage and maintain services, and theoretically proves its
integrity, nonredundancy and correctness. More detailed and
persuasive experiments, including ones on influence factors, are
conducted to validate its efficiency and explore its new
characteristics.
The comparison between existing methods and the proposed
multilevel index can be summarized as follows. The Horn
clause-based method [7] and the tree-based one [21] cannot be
executed before receiving user requirements. The service net
[4, 22] overcomes this disadvantage: it is independent of users'
requirements and reduces response time. However, service addition
and deletion are not easy with it. The inverted index [23] is an
improvement of the service net and makes service addition and
deletion easy. However, it still contains much redundant
information. The multilevel index [29] inherits all their
advantages, being independent of requirements and easy for service
addition and deletion, and is proposed to remove the redundancies
contained in the inverted index. However, [29] only discussed how
to construct the multilevel index and did not give its theoretical
foundation. This paper is a comprehensive extension of it,
including its theoretical foundation, four operations, application
architecture, and more detailed and persuasive experiments.

III. ARCHITECTURE AND DEFINITIONS

The proposed multilevel index model can be easily
integrated into a service repository, as shown in Fig. 1.
The multilevel index, the core of the proposed model, is an
efficient data structure for service retrieval, addition,
deletion and replacement. Service retrieval is an operation that
accepts a set of service parameters and returns the set of services
that can be invoked with this parameter set. It is
commonly invoked by service composition and discovery
methods when they need to search a service repository for
services according to a parameter set. Since it returns a service
set smaller than the whole repository, the service composition or
discovery methods using it can work on this smaller set to find
the most suitable services, thereby reducing their execution time.
Service addition, deletion and replacement are three maintenance
operations that can be used by administrators or service providers
to maintain services in a service repository. The proposed model
needs an ontology to ensure that each service parameter has a
unique identifier. For example, a parameter named "city" may refer
to a departure city in one service and a destination city in
another. Since they are two different concepts, the ontology
resolves them into different identifiers. The function of the
ontology is to resolve semantic questions before the construction
of the proposed model. Note that not all semantic questions can be
resolved by an ontology at present; this issue is beyond the scope
of this paper.
Different services are generally composed through their
inputs and outputs. The example in Fig. 2 is a simplified version
of the one used in [4]. There are two services: 1) a
tourinfo-lookup service that accepts city and date, and outputs
tour cost, hotel and address, and 2) a car-rent service that
accepts date, address and car size, and outputs rent and car.
Users hope to take a tour and rent a car. They can provide city,
date and car size, and would like to reserve a hotel room and a
car. Suppose that no single existing service satisfies their
request, but that it can be fulfilled by the composition of the two
services above. Clearly, service inputs and outputs are
indispensable for service composition.
This simple example illustrates what service composition is and
what role the inputs and outputs of services play in it. In
Section IV, further examples are given to help understand how the
proposed model improves service composition and discovery
efficiency.
Definition 1. A service $s = ({}^{\bullet}s, s^{\bullet}, O)$, where
${}^{\bullet}s$ is the set of input parameters, $s^{\bullet}$ is the set
of output parameters, and O is a set of service attributes, e.g.,
QoS or description.

Fig. 1. Architecture of the proposed model.

Fig. 2. An example of service composition.

In each step of service composition, services are retrieved
and inspected to see whether they can be invoked. In a
large-scale service repository, most services are irrelevant
and need not be retrieved. The multilevel index is
proposed to avoid unnecessary retrievals of irrelevant services.
Definition 2. Service retrieval
$Re(A, S) = \{s \mid {}^{\bullet}s \subseteq A \wedge s \in S\}$, where
A is a given parameter set and S is a service set.
The symbol "$\subseteq$" differs slightly from its traditional usage.
For example, given ${}^{\bullet}s = \{4WDcar\}$ and $A = \{car\}$,
${}^{\bullet}s \subseteq A$ is true if 4WDcar is a subclass of car in
the ontology. Determining whether ${}^{\bullet}s \subseteq A$ holds is
the task of the ontology. Considering subclass relations makes many
more services available.
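The ontology-aware subset test used by service retrieval can be sketched as follows. This is a minimal sketch: the ontology is assumed to be a plain child-to-parent dictionary, and the names `PARENT`, `ancestors` and `semantic_subset` are hypothetical.

```python
def ancestors(concept, parent):
    """Yield the concept itself and then each of its superclasses."""
    seen = set()
    while concept is not None and concept not in seen:
        seen.add(concept)
        yield concept
        concept = parent.get(concept)


def semantic_subset(required, available, parent):
    """True if every required parameter, or one of its superclasses, is available."""
    return all(any(c in available for c in ancestors(r, parent))
               for r in required)


# Hypothetical ontology fragment: 4WDcar is a subclass of car.
PARENT = {"4WDcar": "car"}

matches = semantic_subset({"4WDcar"}, {"car"}, PARENT)
reverse = semantic_subset({"car"}, {"4WDcar"}, PARENT)
```

With this reading, the example from the text holds ({4WDcar} is accepted against {car}), while the reverse direction does not.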
In this paper, service retrieval is defined as an operation that
accepts a set of parameters and returns the set of services that
can be invoked with this parameter set. In the example in Fig. 2,
city, date and car size are given by a user, and hotel, rent
and car are required. The first step is to find out which services
can be invoked with the given parameters, which is exactly
the function of the retrieval defined above. If a tourinfo-lookup
service is found, the available parameters become city, date,
car size, tour cost, hotel and address. The retrieval is then
performed again to find the next service, until all required
parameters are found.
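The retrieval loop described above can be sketched as a simple forward composition over the two services of the Fig. 2 example. The sketch assumes exact name matching of parameters (no ontology) and adds every newly invocable service to the plan; `Service`, `retrieve` and `compose` are illustrative names, not the paper's algorithm.

```python
from typing import NamedTuple


class Service(NamedTuple):
    name: str
    inputs: frozenset
    outputs: frozenset


def retrieve(available, services):
    """Re(A, S): services whose inputs are all contained in `available`."""
    return [s for s in services if s.inputs <= available]


def compose(provided, required, services):
    """Repeat retrieval until every required parameter is available."""
    available, plan = set(provided), []
    while not required <= available:
        new = [s for s in retrieve(available, services) if s not in plan]
        if not new:          # nothing more can be invoked: no solution
            return None
        for s in new:
            plan.append(s)
            available |= s.outputs
    return [s.name for s in plan]


SERVICES = [
    Service("tourinfo-lookup", frozenset({"city", "date"}),
            frozenset({"tour cost", "hotel", "address"})),
    Service("car-rent", frozenset({"date", "address", "car size"}),
            frozenset({"rent", "car"})),
]

plan = compose({"city", "date", "car size"}, {"hotel", "rent", "car"}, SERVICES)
```

Each pass of the loop calls `retrieve` once, which is why the cost of retrieval dominates the cost of composition in this sketch.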
Since "O" in a service s is not related to service retrieval,
only ${}^{\bullet}s$ and $s^{\bullet}$ are considered in this paper. If
$s = (\{a, b\}, \{c, d\}, O)$, it is written as $s: ab \rightarrow cd$
for conciseness. The elements before or after $\rightarrow$ are
unordered. S denotes the set of all services in a repository.
Unless otherwise specified, $s \in S$ is assumed.
Definition 3. A user's request can be denoted as Q=(Qp, Qr),
where Qp is a parameter set provided by the user, and Qr is a
parameter set required by the user.
Definition 4. Service discovery
$Dc(Q, L(O), S) = \{s \mid {}^{\bullet}s \subseteq Q_p \wedge Q_r \subseteq s^{\bullet} \wedge s.O \models L(O) \wedge s \in S\}$,
where L(O) is a set of constraints on the other attributes, S is a
service set, and $s.O \models L(O)$ means that s satisfies these
constraints.
Therefore,
$Dc(Q, L(O), S) = \{s \mid s \in Re(Q_p, S) \wedge Q_r \subseteq s^{\bullet} \wedge s.O \models L(O)\}$.
Clearly, service retrieval is a part of service discovery: if
the time of the former is reduced, so is that of the latter. Note
that if a service discovery method does not follow Definition 4,
but finds a service based on L(O) only, the proposed model is not
applicable. A composite service is built from successfully
retrieved services, whose count may be high. Hence, efficient
service retrieval clearly contributes to efficient service
composition.
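The relation between retrieval and discovery can be sketched as a filter applied on top of retrieval. The sketch below is illustrative only: the attribute `latency_ms` and its values are made up to stand in for the constraints L(O), and the service names are borrowed from the weather and movie search example used later in the paper.

```python
def retrieve(provided, services):
    """Re(Qp, S): services whose inputs are all contained in the provided set."""
    return [s for s in services if s["inputs"] <= provided]


def discover(provided, required, constraint, services):
    """Dc(Q, L(O), S): filter retrieved services by outputs and attributes."""
    return [s["name"] for s in retrieve(provided, services)
            if required <= s["outputs"] and constraint(s["attrs"])]


SERVICES = [
    {"name": "weather1", "inputs": {"time", "location"},
     "outputs": {"weather", "temperature"}, "attrs": {"latency_ms": 50}},
    {"name": "weather2", "inputs": {"time", "location"},
     "outputs": {"weather", "temperature"}, "attrs": {"latency_ms": 200}},
    {"name": "movie search3", "inputs": {"time", "cinema"},
     "outputs": {"movie name", "price"}, "attrs": {"latency_ms": 80}},
]

found = discover({"time", "location"}, {"weather"},
                 lambda attrs: attrs["latency_ms"] < 100, SERVICES)
```

Because `discover` only inspects what `retrieve` returns, any index that shrinks the retrieval result also shrinks the work done by discovery.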

IV. MULTILEVEL INDEX


This section presents the theoretical foundation of the proposed
multilevel index.
A. Equivalence-based Indexes
There are many similar services with the same input and
output parameters on the Internet. When these services are
stored in a repository, they can lower service composition and
discovery efficiency. For example, in Fig. 2, if there are many
similar tourinfo-lookup services and car-rent ones, it is very
likely that more services are retrieved than users need to
complete the composition. If services with the same input and
output parameters are clustered into a class that forms a virtual
service, the search space can be reduced. This method is simple
but effective for service sets that contain many similar
services. After the clustering, a new service can be composed at
the level of virtual services. When the composed workflow is
completed, a real service needs to be bound to each virtual one.
How to select a desirable service from many similar ones is
another research field called service selection [30], which is
beyond the scope of this paper.
All similar services with the same input and output
parameters are clustered into a class, which constructs the first
level index. Equivalence relations are used to describe the whole
index.
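Definition 5. Relation R1 is defined on S: $\forall s_i, s_j \in S$,
$s_i \, R_1 \, s_j \Leftrightarrow ({}^{\bullet}s_i = {}^{\bullet}s_j) \wedge (s_i^{\bullet} = s_j^{\bullet})$.
Theorem 1. R1 is an equivalence relation on S.
Proof: (1) $\forall s \in S$,
${}^{\bullet}s = {}^{\bullet}s \wedge s^{\bullet} = s^{\bullet} \Rightarrow s \, R_1 \, s \Rightarrow$
R1 on S is reflexive [31];
(2) $\forall s_i, s_j \in S$,
$s_i \, R_1 \, s_j \Rightarrow {}^{\bullet}s_i = {}^{\bullet}s_j \wedge s_i^{\bullet} = s_j^{\bullet} \Rightarrow {}^{\bullet}s_j = {}^{\bullet}s_i \wedge s_j^{\bullet} = s_i^{\bullet} \Rightarrow s_j \, R_1 \, s_i \Rightarrow$
R1 on S is symmetric [31];
(3) $\forall s_i, s_j, s_k \in S$,
$(s_i \, R_1 \, s_j) \wedge (s_j \, R_1 \, s_k) \Rightarrow ({}^{\bullet}s_i = {}^{\bullet}s_j \wedge s_i^{\bullet} = s_j^{\bullet}) \wedge ({}^{\bullet}s_j = {}^{\bullet}s_k \wedge s_j^{\bullet} = s_k^{\bullet}) \Rightarrow {}^{\bullet}s_i = {}^{\bullet}s_k \wedge s_i^{\bullet} = s_k^{\bullet} \Rightarrow s_i \, R_1 \, s_k \Rightarrow$
R1 on S is transitive [31]. From (1)-(3), R1 is an
equivalence relation on S.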
An equivalence relation E on a set W divides W into a family
of disjoint subsets called the quotient set [31] of W induced by
E. Each subset is called an equivalence class [31].
Procedure 1: Partition S according to R1.
According to Theorem 1, R1 as an equivalence relation on S
can divide S to result in a quotient set, i.e., a family of disjoint
service subsets.
Definition 6. The quotient set induced by R1 is denoted as $\Omega_1$.
An equivalence class contained in $\Omega_1$ is called a similar
class, denoted as $C_s$.
According to R1, a service set S is divided into many subsets,
and each subset is called a similar class that contains one or
more services with the same input and output parameters.
Therefore, each similar class $C_s$ has a unique pair of parameter
sets, denoted as ${}^{\bullet}C_s$ and $C_s^{\bullet}$, where
${}^{\bullet}C_s = {}^{\bullet}s$ and $C_s^{\bullet} = s^{\bullet}$,
$\forall s \in C_s$.
The First Level Index (L1I): There is an index between a
service s and a similar class $C_s$ if $s \in C_s$.
Relation R1 and Procedure 1 help one construct L1I, as shown
in Fig. 3 and the right side of Fig. 13. By definition, the
quotient set induced by an equivalence relation E on a set W
classifies every element of W into a unique equivalence class, and
any two different equivalence classes are disjoint. Therefore, L1I
ensures that every service is indexed exactly once: no service is
omitted and no service is indexed twice. In other words, L1I has
integrity and contains no redundancy.
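The partition behind L1I can be sketched by grouping services on their pair of input and output sets. The sketch is a simplified illustration under exact name matching; `build_l1i` is a hypothetical helper, and the data reuse the weather and movie search services of Fig. 4.

```python
from collections import defaultdict


def build_l1i(services):
    """Group service names by their (input set, output set) pair."""
    l1i = defaultdict(list)
    for name, (inputs, outputs) in services.items():
        l1i[(frozenset(inputs), frozenset(outputs))].append(name)
    return dict(l1i)


SERVICES = {
    "weather1": ({"time", "location"}, {"weather", "temperature"}),
    "weather2": ({"time", "location"}, {"weather", "temperature"}),
    "movie search3": ({"time", "cinema"}, {"movie name", "price"}),
}

L1I = build_l1i(SERVICES)
weather_class = L1I[(frozenset({"time", "location"}),
                     frozenset({"weather", "temperature"}))]
```

Each service lands in exactly one dictionary entry, which mirrors the integrity and nonredundancy argument: the class sizes add up to the number of services.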
Without L1I, all services need to be checked for computing
Re(Qp, S). With L1I, only the similar classes need to be checked
instead of all services. Since the number of similar classes is
generally smaller than that of services, the retrieval time is
reduced. An example is shown in Fig. 3.

Fig. 3. An example of L1I.

Without L1I, 3 services need to be checked regarding whether
${}^{\bullet}s \subseteq Q_p$ for computing $Re(Q_p, S)$. With L1I, only
2 similar classes need to be checked regarding whether
${}^{\bullet}C_s \subseteq Q_p$ for the same task. Generally, comparing
sets consumes more time than mapping through an index, since a set
may contain many elements. Hence, once $C_{s1}$ is found, $s_1$ and
$s_2$ are found through the index without further set comparisons.
L1I removes the redundancies induced by services with the same
input and output parameters.
A specific example of L1I is shown in Fig. 4. Two weather
services with the same input and output parameters are
classified into an abstract service Cs1. If users want to retrieve
a service, they can check the similar classes instead of all
services. Since the number of similar classes is generally smaller
than that of services, the retrieval time is reduced.
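Definition 7. Relation R2 is defined on $\Omega_1$:
$\forall C_{s1}, C_{s2} \in \Omega_1$,
$C_{s1} \, R_2 \, C_{s2} \Leftrightarrow {}^{\bullet}C_{s1} = {}^{\bullet}C_{s2}$.
Theorem 2. R2 is an equivalence relation on $\Omega_1$.
This theorem can be proved by a method similar to that used in
proving Theorem 1.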
L1I (similar classes and their member services):
Cs1 (weather): time, location → weather, temperature
  weather1: time, location → weather, temperature
  weather2: time, location → weather, temperature
Cs2 (movie search): time, cinema → movie name, price
  movie search3: time, cinema → movie name, price

Fig. 4. A specific example of L1I.

Procedure 2: Partition $\Omega_1$ according to R2.
According to Theorem 2, R2, as an equivalence relation on
$\Omega_1$, divides $\Omega_1$ into a quotient set, i.e., a family of
disjoint subsets of similar classes.
Definition 8. The quotient set induced by R2 is denoted as $\Omega_2$.
An equivalence class contained in $\Omega_2$ is called an
input-similar class, denoted as $\Lambda_{is}$.
According to R2, $\Omega_1$ is divided into many subsets, each of
which is called an input-similar class and contains one or more
similar classes with the same input parameters; namely, every
similar class $C_s$ contained in the same input-similar class
$\Lambda_{is}$ has the same ${}^{\bullet}C_s$. Therefore, each
input-similar class $\Lambda_{is}$ contained in $\Omega_2$ has a unique
parameter set, denoted as ${}^{\bullet}\Lambda_{is}$, where
${}^{\bullet}\Lambda_{is} = {}^{\bullet}C_s$, $\forall C_s \in \Lambda_{is}$.
The Second Level Index (L2I): There is an index between a
similar class $C_s$ and an input-similar class $\Lambda_{is}$ if
$C_s \in \Lambda_{is}$.
Relation R2 and Procedure 2 help one construct L2I, as shown
in Fig. 5 and Fig. 13. Since $Re(Q_p, S)$ depends on ${}^{\bullet}s$
only, with the help of L2I, only the input-similar classes need to
be checked rather than all similar classes. Since the number of
input-similar classes is generally smaller than that of similar
classes, the retrieval time is further reduced. Relation R2 and
Procedure 2 ensure that L2I has integrity and contains no
redundancy. An example is shown in Fig. 5.
Without L2I, 3 similar classes need to be checked to judge
whether ${}^{\bullet}C_s \subseteq Q_p$ for $Re(Q_p, S)$, while with
L2I, only 2 input-similar classes need to be compared.
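The effect of L2I on the number of set comparisons can be sketched as follows. The three similar classes below are hypothetical (only their input sets, ab and ad, match the labels of Fig. 5; the outputs c, d and e are made up), and `build_l2i` and `retrieve_classes` are illustrative names.

```python
from collections import defaultdict


def build_l2i(similar_classes):
    """Group similar classes, given as (inputs, outputs) pairs, by input set."""
    l2i = defaultdict(list)
    for inputs, outputs in similar_classes:
        l2i[frozenset(inputs)].append((frozenset(inputs), frozenset(outputs)))
    return dict(l2i)


def retrieve_classes(provided, l2i):
    """Return the invocable similar classes and the number of set comparisons."""
    hits, comparisons = [], 0
    for inputs, members in l2i.items():
        comparisons += 1            # one subset test per input-similar class
        if inputs <= provided:
            hits.extend(members)
    return hits, comparisons


# Hypothetical similar classes: ab -> c, ab -> d, ad -> e.
CLASSES = [({"a", "b"}, {"c"}), ({"a", "b"}, {"d"}), ({"a", "d"}, {"e"})]
L2I = build_l2i(CLASSES)
hits, comparisons = retrieve_classes({"a", "b"}, L2I)
```

Three similar classes are covered by two subset tests here, because the two classes that share the input set ab are checked together.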

Fig. 5. An example of L2I-L1I (input-similar classes $\Lambda_{is1}$: ab and $\Lambda_{is2}$: ad).

A specific example is shown in Fig. 6. If a retrieval request is
Re({time, price}, S), namely to find all services that can be
invoked with the parameters time and price, then with L2I, after
two comparisons with the input-similar classes, it can be concluded
that no service can be invoked; without it, more comparisons are
needed.
Service retrieval is related to service input parameters only.
Therefore, after the construction of L2I, L1I has no
effect on service retrieval. Nevertheless, it can improve the
efficiency of service discovery and composition. L1I clusters
services into similar classes, whose number is smaller
than that of services. According to the definition of service
discovery Dc(Q, L(O), S), ${}^{\bullet}s \subseteq Q_p$ and
$Q_r \subseteq s^{\bullet}$ need to be judged.
L2I helps one narrow the search space regarding which similar
classes need to be retrieved. Since a similar class contains
services with the same input and output parameters, it is not
necessary to judge whether $Q_r \subseteq s^{\bullet}$ for every service
s contained in a similar class, but only whether
$Q_r \subseteq C_s^{\bullet}$. Therefore, L1I can improve
the efficiency of service discovery. Since each step of service
composition is similar to service discovery, L1I is also effective
in improving service composition efficiency. Another reason is
the quick replacement of unavailable services contained in a
composed service. If a service contained in a composed one is
unavailable, it can be quickly replaced by another one contained in
its similar class. These are the reasons why L1I is needed in the
proposed model.
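The quick replacement mentioned above can be sketched as a lookup in the similar class of the failed service. This is a minimal illustration, not the replacement operation defined later in the paper: `replace_unavailable` is a hypothetical helper, the class labels are plain strings, and the first remaining member is picked without any service selection criterion.

```python
def replace_unavailable(plan, failed, l1i):
    """Swap `failed` in `plan` for another member of its similar class, if any."""
    for members in l1i.values():
        if failed in members:
            substitutes = [m for m in members if m != failed]
            if substitutes:
                return [substitutes[0] if s == failed else s for s in plan]
    return None


# Similar classes of the weather and movie search example (Fig. 4).
L1I = {
    "Cs1": ["weather1", "weather2"],
    "Cs2": ["movie search3"],
}

repaired = replace_unavailable(["weather1", "movie search3"], "weather1", L1I)
unrepairable = replace_unavailable(["weather1", "movie search3"],
                                   "movie search3", L1I)
```

No recomposition is needed when a substitute exists, since all members of a similar class share the same inputs and outputs.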

Fig. 6. A specific example of L2I-L1I (input-similar classes $\Lambda_{is1}$: time, location and $\Lambda_{is2}$: time, cinema).

B. Redundancy-removal Indexes
1) Redundancy contained in the inverted index
After L1I and L2I are constructed, all input-similar classes still
need to be retrieved and checked regarding whether
$Q_p \supseteq {}^{\bullet}\Lambda_{is}$ when computing $Re(Q_p, S)$. In
a large-scale service repository, most input-similar classes are
clearly irrelevant to a user's request Q. Therefore, another index
is desired to narrow the search space of input-similar classes.
The above question can be abstracted as follows: given a set Qp,
how can all of its subsets be found efficiently among many sets?
B-trees and hash tables are efficient index structures for element
search. However, they cannot be used to narrow the search space
for set
comparison directly. For example, suppose that all sets are stored
in a B-tree, as shown in Fig. 7.

Fig. 7. Sets are stored in B-tree, which cannot narrow the search space.

However, it is unknown which sets are subsets of Qp. Therefore,
every set should be compared with Qp. For this reason, the
B-tree or hash table cannot reduce the search space.
An inverted index is another candidate. An example is
shown in Fig. 8. All elements can be stored in a hash table or
B-tree, and each element links to the sets containing it. According
to the inverted index, the search space is quickly
narrowed to A and B instead of all sets.
Although the inverted index can narrow the search space, it
still contains redundancy. In Fig. 8, according to a, sets A and
B are retrieved; according to b, set A is retrieved again, which is
clearly redundant.

Fig. 8. Sets are stored in the inverted index, which can narrow the search space.

Fig. 9 is another example of using the inverted index for input-similar classes. Each parameter links to the input-similar classes whose input parameter set •is contains it.

is1 : ab
is2 : ad
is3 : cd
a -> is1, is2;  b -> is1;  c -> is3;  d -> is2, is3
Fig. 9. An example of inverted index for input-similar classes.

When computing Re(Qp, S) for a request whose Qp contains a and b, only the input-similar classes linked by a and b, instead of all input-similar classes, are retrieved and checked to determine whether •is ⊆ Qp. Therefore, the search space can be significantly narrowed in a large-scale repository. However, as discussed above, the inverted index cannot get rid of all the redundancy. In Fig. 9, is1 is linked by both a and b, which means that is1 is retrieved twice when Re(Qp, S) is computed. This is clearly unnecessary, and such redundancies need to be eliminated.
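The redundancy described above can be made concrete with a minimal Python sketch. It builds an inverted index over the three input-similar classes of Fig. 9 and records every class fetched for a query containing a and b. The function names `build_inverted_index` and `retrieve` are illustrative assumptions, not part of the original model.

```python
from collections import defaultdict

# Input-similar classes from Fig. 9, each mapped to its input parameter set.
classes = {"is1": {"a", "b"}, "is2": {"a", "d"}, "is3": {"c", "d"}}


def build_inverted_index(classes):
    """Map every parameter to the classes whose input set contains it."""
    index = defaultdict(list)
    for name in sorted(classes):
        for parameter in sorted(classes[name]):
            index[parameter].append(name)
    return index


def retrieve(index, classes, query):
    """Return (all classes fetched, duplicates included; classes with inputs in query)."""
    fetched = []
    for parameter in sorted(query):
        fetched.extend(index.get(parameter, []))
    matches = sorted({c for c in fetched if classes[c] <= query})
    return fetched, matches


index = build_inverted_index(classes)
fetched, matches = retrieve(index, classes, {"a", "b"})
print(fetched)  # ['is1', 'is2', 'is1']  (is1 is fetched twice)
print(matches)  # ['is1']
```

The duplicate entry for is1 in `fetched` is exactly the redundancy that the key-based levels introduced next are meant to remove.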
is1 : time, location
is2 : time, cinema
is3 : location, hotel
Fig. 10. A specific example of inverted index between parameters and input-similar classes.

A specific example of an inverted index is shown in Fig. 10. is1, is2 and is3 are each indexed more than once. If a user uses location and hotel to retrieve, is3 is retrieved twice. The elimination of such redundancies is discussed next.
2) Redundancy elimination
In order to eliminate the redundancy residing in the inverted index, this work proposes two new level indexes.
Keys are used in tables of relational databases to reduce some redundancies, which are similar to the redundancy residing in the inverted index. Inspired by this, a new concept of a key for the input parameters is proposed. A parameter in •is is designated as a key of is, denoted as key(is). Each is is forced to pick exactly one key if |•is| ≥ 1. Different input-similar classes may have the same or different keys. According to keys, a new relation and thus a new index can be established.
Definition 9. Relation R3 is defined on the set Ω2 of all input-similar classes: ∀is1, is2 ∈ Ω2, is1 R3 is2 ⇔ key(is1) = key(is2).
Theorem 3. R3 is an equivalence relation on Ω2.
This theorem can be proved by a method similar to that used in proving Theorem 1.
Procedure 3: Partition Ω2 according to R3.
According to Theorem 3, R3, as an equivalence relation on Ω2, divides Ω2 into a quotient set, i.e., a family of disjoint subsets of input-similar classes.
Definition 10. The quotient set induced by R3 is denoted as Ω3. An equivalence class contained in Ω3 is called a key class, denoted as Ck.
The Third Level Index (L3I): There is an index between an input-similar class is and a key class Ck if is ∈ Ck.
Relation R3 and Procedure 3 help one construct L3I as shown in Fig. 11 and Fig. 13 and ensure that L3I has the integrity and contains no redundancy.
According to R3, Ω2 is divided into many subsets, and each subset is called a key class that contains one or more input-similar classes with the same key; namely, all input-similar classes contained in the same key class Ck have the same key. Therefore, each key class Ck contained in Ω3 has a unique key, denoted as key(Ck). Thus key(Ck) = key(is), ∀is ∈ Ck.
The function f: A → B is a bijection iff 1) ∀x, y ∈ A, f(x) = f(y) ⇒ x = y, and 2) ∀z ∈ B, ∃x ∈ A such that f(x) = z. From the above analysis, a key class has a unique key, and a key identifies a unique key class. Let 𝕂 denote the set of keys of all key classes. Thus Ω3 and 𝕂 form a bijection fk: Ω3 → 𝕂. Given Ck ∈ Ω3 and k ∈ 𝕂, fk(Ck) = k if key(Ck) = k.
The Fourth Level Index (L4I): There is an index between a key class Ck and a key k if fk(Ck) = k.
The definition of bijection ensures that L4I has the integrity and contains no redundancy.
After L3I and L4I are built, the information redundancy presented in the prior example can be eliminated. An example is shown in Fig. 11.
a -> Ck1 -> is1 : ab
d -> Ck2 -> is2 : ad
            is3 : cd

Fig. 11. An example of a multilevel index.
In the above example, assume that key(is1) = a, key(is2) = d and key(is3) = d. Then Ck1 with key(Ck1) = a links to is1, and Ck2 with key(Ck2) = d links to is2 and is3. Every object in the above index is linked only once; therefore, the redundancy that resides in Fig. 9 is eliminated.

time     -> Ck1 -> is1 : time, location
                   is2 : time, cinema
location -> Ck2 -> is3 : location, hotel
Fig. 12. A specific example of L4I-L3I.

A specific example of L4I-L3I is shown in Fig. 12. In order to remove the redundancies in Fig. 10, every input-similar class selects a parameter as its key. Assume that is1 and is2 select time as their keys, and is3 selects location as its key. According to Procedure 3, is1 and is2 are classified into one key class, and is3 is classified into another key class. The keys link to their key classes. A service is neither retrieved more than once nor omitted. If the user uses location and hotel to retrieve a service, only is3 is retrieved, and it is retrieved exactly once. The redundancies in the inverted index are thus removed.
All keys can be stored in an efficient structure, e.g., a B-tree or hash table. Such a structure helps one quickly determine whether a parameter contained in Qp is a key. When computing Re(Qp, S) in the example of Fig. 11, only a in Qp is a key, whereas b is not a key. Therefore, only Ck1 is retrieved, and only once. In contrast, under the inverted index of Fig. 9, is1 is retrieved twice and is2 is retrieved once.
L1I-L4I help one construct a multilevel index for services. Fig. 13 shows its resultant structure. Since every level index has the integrity and contains no redundancy, the proposed model has the integrity and contains no redundancy.

[Figure: five linked levels: all keys (k1, k2); all key classes (Ck1, Ck2); all input-similar classes (is1, is2); all similar classes (Cs1, Cs2); S, all services (s1, s2).]
Fig. 13. The Multilevel Index Model

C. The Selection of a Key

A question is how to determine the keys of input-similar classes. Their determination is not the task of a service designer but of the multilevel index modeler. They can be determined automatically in the proposed model according to the cardinality of Ck, denoted as |Ck|.
Theorem 4. Assume that there are n input-similar classes (|Ω2| = n) and m key classes (|Ω3| = m), that every is is retrieved with equal probability, that all |Cki| are the same, i = 1, 2, ..., m, and that t denotes the time of retrieving is from Ω2. When m = √n, t = √n is minimal.
Proof: Key classes and input-similar classes constitute a two-level index. The average time to find is from Ω2 without an index is n/2. The average time to find is with a two-level index is
$$t = \frac{m}{2} + \frac{n}{2m}.$$
Since
$$\frac{dt}{dm} = \frac{1}{2} - \frac{n}{2m^{2}},$$
solving dt/dm = 0 leads to m = √n. t is minimal since
$$\frac{d^{2}t}{dm^{2}} = \frac{n}{m^{3}} = \frac{1}{\sqrt{n}} > 0.$$
Theorem 4 suggests that |Ck| should be as close to √|Ω2| as possible. For example, assume that there are two key classes Ck1 and Ck2 with |Ck1| = 2, key(Ck1) = a, |Ck2| = 1, key(Ck2) = b, and an input-similar class is with •is = {a, b} is added, so that |Ω2| = 4. If the key of is is designated to a, |√|Ω2| - |Ck1|| + |√|Ω2| - |Ck2|| = |√4 - 3| + |√4 - 1| = 2. If its key is designated to b, |√|Ω2| - |Ck1|| + |√|Ω2| - |Ck2|| = |√4 - 2| + |√4 - 2| = 0 < 2. Therefore, its key should be designated to b so that |Ck| is as close to √|Ω2| as possible.
Theorem 4 presents a basic principle for selecting keys. The operation "addition" will be given later and performed based on this principle.
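The cost model of Theorem 4 and the key choice in the example above can be checked numerically. The following Python sketch is illustrative only: `avg_lookup_cost`, `imbalance` and `best_fit_key` are assumed helper names, and `best_fit_key` is a simplified deterministic stand-in for the selection principle, not the full addition algorithm given later.

```python
import math


def avg_lookup_cost(n, m):
    """Average two-level lookup cost t = m/2 + n/(2m) from the proof of Theorem 4."""
    return m / 2 + n / (2 * m)


def imbalance(sizes):
    """Sum over key classes of | sqrt(number of input-similar classes) - |Ck| |."""
    target = math.sqrt(sum(sizes.values()))
    return sum(abs(target - size) for size in sizes.values())


def best_fit_key(sizes, candidates):
    """Pick the candidate key that leaves the key-class sizes most balanced."""
    def after_adding(key):
        trial = dict(sizes)
        trial[key] = trial.get(key, 0) + 1
        return imbalance(trial)
    return min(sorted(candidates), key=after_adding)


n = 10_000
best_m = min(range(1, n + 1), key=lambda m: avg_lookup_cost(n, m))
print(best_m, avg_lookup_cost(n, best_m))  # 100 100.0

# Example from the text: |Ck1| = 2 with key a, |Ck2| = 1 with key b.
sizes = {"a": 2, "b": 1}
print(best_fit_key(sizes, {"a", "b"}))  # b
```

For n = 10,000 the brute-force minimum falls at m = 100 = √n with cost √n, and the example class with inputs {a, b} is assigned key b, in agreement with the text.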

V. OPERATIONS OF THE INDEX


This section first introduces the functions implied in the proposed model and then, based on them, proposes four operations, i.e., retrieval, addition, deletion and replacement.
A. The Functions in the Multilevel Index Model
Given a bijection f: A → B, f^{-1} denotes its inverse function. A function f: A → B is a surjection if and only if ∀y ∈ B, ∃x ∈ A such that f(x) = y. If f: A → B is a surjection and C ⊆ B, the preimage of C under f is defined by f^{-p}(C) = {e | e ∈ A ∧ f(e) ∈ C}.
Assume that a set W is partitioned by an equivalence relation E into a quotient set Q. Then f: W → Q forms a surjection according to the partition relation. Functions can therefore be embedded into the proposed model, as shown in Fig. 14.
S is partitioned into multiple similar classes by R1. According to Procedure 1, S and the set of all similar classes Ω1 form a surjection, denoted as f1: S → Ω1. Ω1 is further partitioned into multiple input-similar classes by R2. According to Procedure 2, Ω1 and the set of all input-similar classes Ω2 form a surjection, denoted as f2: Ω1 → Ω2. Ω2 is partitioned into multiple key classes by R3. According to Procedure 3, Ω2 and the set of all key classes Ω3 form a surjection, denoted as f3: Ω2 → Ω3. The function fk: Ω3 → 𝕂, where 𝕂 is the set of all keys, is a bijection.
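As a small illustration of these functions, the sketch below builds the surjection induced by a partition and its preimage f^{-p} in Python. The three services and their parameters are hypothetical, and `quotient_map` and `preimage` are assumed names introduced only for this example.

```python
def quotient_map(elements, class_of):
    """Surjection f induced by a partition: element -> label of its class."""
    return {element: class_of(element) for element in elements}


def preimage(f, targets):
    """f^{-p}(C) = {e | f(e) in C}."""
    return {element for element, label in f.items() if label in targets}


# Hypothetical services, each given as (input set, output set).
services = {
    "s1": (frozenset("ab"), frozenset("x")),
    "s2": (frozenset("ab"), frozenset("x")),
    "s3": (frozenset("ab"), frozenset("y")),
}

# f1: services -> similar classes (same inputs and same outputs).
f1 = quotient_map(services, lambda s: services[s])
# f2: similar classes -> input-similar classes (same inputs).
f2 = quotient_map(set(f1.values()), lambda similar: similar[0])

print(sorted(preimage(f1, {(frozenset("ab"), frozenset("x"))})))  # ['s1', 's2']
print(len(set(f1.values())), len(set(f2.values())))  # 2 1
```

Here each class is identified by the parameters its members share, so the three services fall into two similar classes and a single input-similar class.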
[Figure: five linked levels: all keys (k1, k2); all key classes (Ck1, Ck2); all input-similar classes (is1, is2); all similar classes (Cs1, Cs2); S, all services (s1, s2).]
Fig. 14. The Multilevel Index Model with Functions

Next, the operations on services in the proposed model are introduced based on these functions.
B. Retrieval
Retrieval is an important operation of the proposed model. It can be invoked by service discovery and composition methods to reduce their execution time. The method to compute Re(A, S) has six steps, as shown in the following algorithm.



Algorithm 1. Retrieval.
Input A;
K = A ∩ 𝕂;
Ω3' = {Ck | Ck ∈ fk^{-p}(K)};
C = {is | is ∈ f3^{-p}(Ω3') ∧ •is ⊆ A};
Ω1' = f2^{-p}(C);
Re(A, S) = f1^{-p}(Ω1');

Here 𝕂 is the set of all keys, Ω3' is the set of retrieved key classes, and Ω1' is the set of retrieved similar classes.

Theorem 5. Algorithm 1 to compute Re(A, S) is correct.

Proof: Let P = Re(A, S) = {s | •s ⊆ A ∧ s ∈ S} and let S' denote the set obtained by Algorithm 1. (1) ∀s ∈ S', f1(s) = Cs ∧ f2(Cs) = is ⇒ •s = •Cs = •is ⊆ A ⇒ s ∈ P ⇒ S' ⊆ P. (2) Given s ∈ P, assume that f1(s) = Cs, f2(Cs) = is, f3(is) = Ck, and fk(Ck) = k. Then •s = •Cs = •is ⊆ A. Since key(is) ∈ 𝕂 and key(is) ∈ A, key(Ck) = key(is) ∈ K. Since fk is a bijection, fk(Ck) ∈ K ⇒ Ck ∈ Ω3'. Since •is ⊆ A and f3 is a surjection, f3(is) = Ck ⇒ is ∈ C. Since f2 is a surjection, f2(Cs) = is ⇒ Cs ∈ Ω1'. Since f1 is a surjection, f1(s) = Cs ⇒ s ∈ S'. Thus, s ∈ S' and P ⊆ S'. By (1) and (2), P = S'. Therefore, this algorithm is correct.
When computing Re(A, S), many services need to be retrieved to answer whether •s ⊆ A. The number of these services is called the search space size.
If a data structure without any index, called a sequential structure, is used to compute Re(A, S), its search space size is |S| in any case. For an inverted index structure [23], the search space size is 0 in the best case and |A|·|S| in the worst case. The search space size of the proposed model is 0 in the best case and |S| in the worst case.

C. Addition
Theorem 4 gives a basic principle for selecting keys, i.e., |Ck| should be as close to √|Ω2| as possible, where Ω2 is the set of all input-similar classes. Based on this principle, a best-fit method is given to insert a service into the multilevel index. When a service s is inserted into the multilevel index, the first step is to find out which of its input parameters are keys, and then to select a key k such that |fk^{-1}(k)| is close to √|Ω2|. If more than one key has the same cardinality, one of them is selected randomly. If no key is contained in •s, a parameter is selected randomly from •s as a new key. Unfortunately, this method may produce key classes whose cardinalities are much bigger than √|Ω2| when some parameters are used as input parameters by most services. To avoid this, a threshold is set to control the cardinality of a key class. When inserting a service s, all keys in •s are found. For each such key k, if |fk^{-1}(k)| ≥ √|Ω2|, it is not selected as the key of s unless every input parameter of s is such a key. The addition algorithm is shown next.
Algorithm 2. Addition.
Input a service s;
K = •s ∩ 𝕂;
if(K == ∅){randomly select a k ∈ •s as a key of s;}
else
{Ω3' = {Ck | Ck ∈ fk^{-p}(K)};
find is: is ∈ f3^{-p}(Ω3') ∧ •is == •s;
if(is exists){select k = key(f3(is)) as a key of s;}
else
{Kincluded = {k | k ∈ K ∧ |fk^{-1}(k)| < √|Ω2|};
Kclose = {k | k ∈ Kincluded ∧ |(|fk^{-1}(k)| - √|Ω2|)| ≤ |(|fk^{-1}(k')| - √|Ω2|)|, ∀k' ∈ Kincluded};
if(Kclose ≠ ∅){randomly select a k ∈ Kclose as a key of s;}
else
{if((•s - K) ≠ ∅){randomly select a k ∈ (•s - K) as a key of s;}
else
{Kexcluded = K - Kincluded;
Kclose = {k | k ∈ Kexcluded ∧ |(|fk^{-1}(k)| - √|Ω2|)| ≤ |(|fk^{-1}(k')| - √|Ω2|)|, ∀k' ∈ Kexcluded};
randomly select a k ∈ Kclose as a key of s;}}}}
if(k ∉ 𝕂){add k into 𝕂; create Ck: Ck ∈ Ω3 ∧ key(Ck) = k;}
find is: •is = •s ∧ is ∈ f3^{-p}({fk^{-1}(k)});
if(is does not exist){create is: is ∈ Ω2 ∧ •is = •s;}
find Cs: Cs• = s• ∧ Cs ∈ f2^{-p}({is});
if(Cs does not exist){create Cs: Cs ∈ Ω1 ∧ •Cs = •s ∧ Cs• = s•;}
Cs = Cs ∪ {s};

D. Deletion and Replacement

Deletion removes a service from the multilevel index. It is the reverse process of addition. Its algorithm is as follows.
Algorithm 3. Deletion.
Input a service s;
S' = Re(•s, S);
if(s ∈ S')
{Cs = f1(s); Cs = Cs - {s};
if(|Cs| == 0)
{is = f2(Cs); is = is - {Cs};
if(|is| == 0)
{Ck = f3(is); Ck = Ck - {is};
if(|Ck| == 0)
{k = fk(Ck); Ω3 = Ω3 - {Ck}; 𝕂 = 𝕂 - {k};}}}}

The replacement operation can be implemented by the deletion and addition operations. It needs two inputs, s and s', where s is the original service and s' is the service that replaces s. Its algorithm is as follows.
Algorithm 4. Replacement.
Input s and s';
Deletion(s);
Addition(s');
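The operations above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: the class `MultilevelIndex`, its method names and the output parameters in the usage example are hypothetical, nested dictionaries stand in for the four index levels, and key selection is a simplified, deterministic variant of Algorithm 2 (it takes the first eligible candidate instead of a random one and measures the threshold after counting the new class).

```python
import math
from collections import defaultdict


class MultilevelIndex:
    """key -> key class -> input-similar class -> similar class -> services."""

    def __init__(self):
        # key -> {input set -> {output set -> set of service names}}
        self.key_classes = defaultdict(dict)
        self.key_of = {}  # input set -> key of its input-similar class

    def _choose_key(self, inputs):
        # Simplified best fit; deterministic where Algorithm 2 chooses randomly.
        target = math.sqrt(len(self.key_of) + 1)

        def distance(key):
            return abs(len(self.key_classes[key]) - target)

        existing = [p for p in sorted(inputs) if p in self.key_classes]
        under = [k for k in existing if len(self.key_classes[k]) < target]
        fresh = [p for p in sorted(inputs) if p not in self.key_classes]
        if under:
            return min(under, key=distance)
        if fresh:
            return fresh[0]
        return min(existing, key=distance)

    def add(self, name, inputs, outputs, key=None):
        inputs, outputs = frozenset(inputs), frozenset(outputs)
        if not inputs:
            raise ValueError("this sketch assumes at least one input parameter")
        if inputs not in self.key_of:
            chosen = key if key is not None else self._choose_key(inputs)
            if chosen not in inputs:
                raise ValueError("a key must be one of the input parameters")
            self.key_of[inputs] = chosen
        similar = self.key_classes[self.key_of[inputs]].setdefault(inputs, {})
        similar.setdefault(outputs, set()).add(name)

    def delete(self, name, inputs, outputs):
        inputs, outputs = frozenset(inputs), frozenset(outputs)
        key = self.key_of.get(inputs)
        if key is None:
            return
        similar = self.key_classes[key][inputs]
        similar.get(outputs, set()).discard(name)
        if outputs in similar and not similar[outputs]:
            del similar[outputs]               # empty similar class
        if not similar:
            del self.key_classes[key][inputs]  # empty input-similar class
            del self.key_of[inputs]
        if not self.key_classes[key]:
            del self.key_classes[key]          # empty key class; key is dropped

    def retrieve(self, available):
        """Return (services invocable with `available`, input-similar classes checked)."""
        available = frozenset(available)
        found, checked = set(), 0
        for key in available:
            for inputs, similar in self.key_classes.get(key, {}).items():
                checked += 1
                if inputs <= available:
                    for names in similar.values():
                        found |= names
        return found, checked


index = MultilevelIndex()
index.add("s1", {"time", "location"}, {"weather"}, key="time")
index.add("s2", {"time", "cinema"}, {"film"}, key="time")
index.add("s3", {"location", "hotel"}, {"room"}, key="location")
print(index.retrieve({"location", "hotel"}))  # ({'s3'}, 1)
index.delete("s3", {"location", "hotel"}, {"room"})
print(index.retrieve({"location", "hotel"}))  # (set(), 0)
```

With the keys of Fig. 12 supplied explicitly, a request with location and hotel checks only the input-similar class of s3, and checks it once. Replacement would simply call `delete` followed by `add`.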

VI. DEPLOYMENT AND EXPERIMENTS


A. Deployment
The proposed multilevel index model can be deployed flexibly depending on practical needs.
L1I aims to remove the redundancy induced by services with the same input and output parameters. If a service set does not contain services with the same input and output parameters, L1I yields no effect. E1 denotes the efficiency improvement brought by L1I, and L1 denotes the overhead induced by L1I. If only a small number of services share the same input and output parameters, it is possible that E1<L1. But when many services share the same input and output parameters, L1<E1. A threshold is therefore needed. It may vary depending on the particular environment, and how to determine it requires future work. If E1 ≤ L1, L1I is not needed, and services directly link to input-similar classes according to their input parameters.
L2I aims to remove the redundancy induced by services with the same input parameters. If a service set does not contain services with the same input parameters, L2I yields no effect. E2 denotes the efficiency improvement brought by L2I, and L2 denotes the overhead induced by L2I. Similarly, if E2 ≤ L2, L2I is not needed, and services directly link to key classes according to their keys.
According to the above analysis, an algorithm for deploying the proposed multilevel index is given.
Algorithm 5. Deployment Decision-making.
If(L1<E1)
{deploy L4I-L1I;}
Else
{If(L2<E2)
{deploy L4I-L2I;}
Else
{deploy L4I-L3I;}}

B. Experiments
In this section, service retrieval, composition and addition are tested and evaluated.
1) Retrieval
ICEBE05 [32] is a publicly available test set that is generated based on WSDL, as used in [3, 4, 33, 34]. It contains two test sets, composition1 and composition2, which together include 81464 services and 198 composition queries. This test set is generally used to evaluate the efficiency of service composition and cannot be directly used to evaluate retrieval. However, each step of service composition needs at least one retrieval operation. Therefore, in order to test the efficiency of retrieval under different index structures, a retrieval request set including 936 retrieval requests is generated from ICEBE05.
Three storage structures, i.e., the sequential structure, the inverted index and the multilevel index, are tested. The sequential structure, denoted as M1 (method 1), is a basic structure without any index and is used as the baseline in our experiments. The inverted index and the proposed multilevel index are denoted as M2 and M3, respectively.
In ICEBE05, no two services share the same input parameters, i.e., E1 ≤ L1 and E2 ≤ L2. According to Algorithm 5, only L4I and L3I are deployed.
The average search space size of finding a service in computing Re(Qp, S) with a two-level index is
$$t = \frac{m}{2} + \frac{n}{2m},$$
while the average search space size without any index is t' = n/2. Under the ideal conditions, i.e., |S| = n and |Ω3| = m = √n, the maximum search space size ratio of t' to t is
$$\frac{t'}{t} = \frac{\sqrt{n}}{2}.$$
When n = |S| = 10^4, t'/t = 50. When n = |S| = 10^6, t'/t = 500. This means that the method with such an index has a clear advantage over one without any index, and this advantage rises as the number of services increases. The next experiments confirm this.
Fig. 15 shows the search space size of 936 retrieval requests under three structures. The inverted index reduces the search space size by 93.89% compared with the sequential structure. The proposed multilevel index reduces the search space size by 98.07% compared with the sequential structure and by 68.45% compared with the inverted index.

[Bar chart, search space size: Sequential 4,647,168; Inverted 283,830; Multilevel 89,553.]
Fig. 15. Search space size for 936 retrievals under three structures.

The proposed index is more complicated than the inverted index, which brings some overhead. Therefore, retrieval time is an important and practical indicator. Fig. 16 shows the total retrieval time of 936 retrieval requests under three structures. The inverted index structure is more complicated than the sequential structure. Although the inverted index reduces the search space size by 93.89% compared with the sequential structure, it reduces the retrieval time by only 29.88%. The proposed multilevel index reduces the retrieval time by 73.69% compared with the sequential structure and by 62.48% compared with the inverted index. The improvement brought by the proposed method over the other two methods is clearly significant. A reduction of about 62% in retrieval time can translate into a saving of over 60% in computation-related electricity and thus electricity cost. Therefore, the proposed model is very attractive for the service industry.

[Bar chart, retrieval time (ms): Sequential 10,957.43; Inverted 7,683.82; Multilevel 2,883.35.]
Fig. 16. Retrieval time for 936 retrievals under three structures.
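The percentages quoted above follow directly from the totals reported in Fig. 15 and Fig. 16. The short Python sketch below recomputes them; `reduction` and `summarize` are illustrative helper names.

```python
def reduction(baseline, value):
    """Percentage by which `value` is smaller than `baseline`."""
    return 100.0 * (1.0 - value / baseline)


# Totals for the 936 retrieval requests, as reported in Fig. 15 and Fig. 16.
search_space = {"sequential": 4_647_168, "inverted": 283_830, "multilevel": 89_553}
retrieval_ms = {"sequential": 10_957.43, "inverted": 7_683.82, "multilevel": 2_883.35}


def summarize(data):
    """Pairwise reductions between the three structures."""
    return {
        "inverted vs sequential": reduction(data["sequential"], data["inverted"]),
        "multilevel vs sequential": reduction(data["sequential"], data["multilevel"]),
        "multilevel vs inverted": reduction(data["inverted"], data["multilevel"]),
    }


for title, data in (("search space", search_space), ("retrieval time", retrieval_ms)):
    for pair, percent in summarize(data).items():
        print(f"{title:>14} | {pair:<24} | {percent:6.2f}%")
```

The gap between the two tables is the point of the paragraph above: a structure can remove most of the search space and still save a much smaller share of the time when its own bookkeeping is costly.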

When the search space is narrowed, a question is whether it still covers all the services that can be invoked, namely the recall. The completeness of the proposed index in Section IV and the correctness of the proposed retrieval in Section V have been proved, which means that the recall is 100%. The experimental results also validate this. Table 1 shows the numbers of services found by retrieval under three structures. Since the sequential method retrieves all services in the test set, it finds all services that can be invoked, namely its recall is 100%. The inverted and multilevel indexes find the same number of services, so their recalls are 100% as well.



TABLE 1. THE NUMBERS OF SERVICES FOUND BY RETRIEVAL UNDER THREE STRUCTURES. THEIR RECALLS ARE 100%.
Index name      Sequential   Inverted   Multilevel
Service count   78108        78108      78108

For the 936 retrieval requests, there are 78108 services that can meet them. Under the sequential structure, the retrieval needs to check 4647168 services to judge whether they can be invoked. The ratio of the number of services that can be invoked to the search space size can be used to measure the efficiency of an index. Therefore, the efficiency of the sequential structure is 78108/4647168, about 1.68%. The efficiencies of the inverted and multilevel indexes are 27.52% and 87.22%, respectively, as shown in Fig. 17. The proposed index significantly outperforms the other two.

[Bar chart, index efficiency: Sequential 1.68%; Inverted 27.52%; Multilevel 87.22%.]
Fig. 17. Index efficiencies of three structures.

2) Service composition
The purpose of the proposed model is to improve the efficiency of service discovery and composition. In Section V.B, the retrieval efficiency under the proposed multilevel index is analyzed. However, the speedup obtained when the retrieval is applied to service composition and discovery under the proposed model is still unclear. This section clarifies it via experiments. Service discovery is a special case of service composition with one step only. Therefore, service composition is selected in our experiments.
Service composition methods can be generally classified into two classes, the forward and backward ones. For a user's request Q = (Qp, Qr), the forward methods build a composition result from Qp to Qr. Since the forward methods are more efficient than the backward ones, as analyzed in Section II, they are widely used [4, 7, 22, 34, 35]. The backward ones compose services from Qr to Qp. The condition of the forward composition is •s ⊆ Qp, and thus a key can be used to narrow the search space, while the condition of the backward composition is stated on the outputs of s rather than on •s. Therefore, the proposed index is difficult to utilize directly in the backward methods, which requires future work.
A breadth-first composition method, which is one of the forward composition methods and is used by Kwon et al. [22], is selected for our experiments. Its advantage is that it can find parallel services to the largest degree with the same time complexity as the depth-first one. Another important reason is that the breadth-first composition method obtains the same composition result for a composition request regardless of the service storage structure. Therefore, it is not affected by randomness and provides a fair basis for comparing the performance of different storage structures.
ICEBE05 contains two composition test sets, each of which contains 9 subsets, and each subset contains 11 composition queries. All eighteen service composition test sets are reordered by their service count, as shown in Table 2.

TABLE 2. REORDERED EIGHTEEN SERVICE COMPOSITION TEST SETS
No.   Name       Service Count     No.   Name       Service Count
1     1-20-4     2156              10    1-100-16   4156
2     1-20-16    2156              11    1-100-32   4156
3     1-20-32    2156              12    2-50-4     5356
4     1-50-4     2656              13    2-50-16    5356
5     1-50-16    2656              14    2-50-32    5356
6     1-50-32    2656              15    2-20-16    6712
7     2-20-4     3356              16    2-100-4    8356
8     2-20-32    3356              17    2-100-16   8356
9     1-100-4    4156              18    2-100-32   8356

Fig. 18 shows the total composition time. Its results are very close to those of the retrieval time. The proposed index has a significant advantage over the other two methods.

[Bar chart, composition time (ms): Sequential 10,988.45; Inverted 7,688.54; Multilevel 2,888.04.]
Fig. 18. Total composition time of a breadth-first method under three structures.
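The forward, breadth-first composition described in this subsection can be sketched as follows. This is a simplified illustration, not the method of [22]: the service names and parameters are hypothetical, and `retrieve` is a plain sequential scan standing in for Re(A, S), which is the step an index would accelerate.

```python
def retrieve(available, services):
    """Re(A, S): names of services whose inputs are all contained in `available`."""
    return {name for name, (inputs, _) in services.items() if inputs <= available}


def compose_forward(provided, required, services):
    """Breadth-first forward composition; returns layers of services or None."""
    available = set(provided)
    used, layers = set(), []
    while not required <= available:
        layer = sorted(retrieve(available, services) - used)
        if not layer:
            return None  # no new service can be invoked; request unsatisfiable
        layers.append(layer)
        used.update(layer)
        for name in layer:
            available |= services[name][1]  # outputs become available parameters
    return layers


# Hypothetical repository: name -> (input parameters, output parameters).
services = {
    "locate": ({"address"}, {"location"}),
    "forecast": ({"location", "time"}, {"weather"}),
    "hotels": ({"location"}, {"hotel"}),
}
print(compose_forward({"address", "time"}, {"weather", "hotel"}, services))
# [['locate'], ['forecast', 'hotels']]
```

Each loop iteration issues one retrieval, which is why retrieval cost dominates composition time; services in the same layer can run in parallel, and the layers do not depend on how the repository is stored.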

Fig. 19 shows the composition time for all 18 subsets. In every set, the composition time of M3 is smaller than that of M1 and M2. In sets 3, 6, 8, 11, 14 and 18, M2 spends more time to complete composition than both M1 and M3. The curves of M1 and M3 increase steadily along with the service count, whereas the curve of M2 shows many large fluctuations.
Fig. 20 shows the composition time differences between M1 and M3 and between M2 and M3. The curves of (M1-M3) and (M2-M3) increase steadily, which once again verifies the conclusion that, as the number of services increases, the advantage of the proposed multilevel index rises accordingly. In other words, the proposed method has more advantages as the repository becomes larger. Our theoretical analyses also support this conclusion.

Fig. 19. Composition time of a breadth-first method under three structures on eighteen test sets in an increasing order of service counts

[Line chart, curves M1-M3 and M2-M3 over the eighteen test sets.]
Fig. 20. Differences of composition time of a breadth-first method under three structures on eighteen test sets in an increasing order of service counts

3) Influence factors
Many factors may affect the performance of the proposed multilevel index. Service count and service parameter count are two of them. In our experiments, their impacts are evaluated.

Subset     Service count   Retrieval count   Composition time
2-20-4     3356            4411              52.6 ms
2-50-4     5356            6448              77.9 ms
2-100-4    8356            9452              116.4 ms
(a) Three subsets have different service counts. Each of their services has about 5 input parameters and about 5 output parameters.

Subset     Service count   Retrieval count   Composition time
2-20-32    3356            4456              227.6 ms
2-50-32    5356            6453              343.7 ms
2-100-32   8356            9456              512.2 ms
(b) Three subsets have different service counts. Each of their services has about 34 input parameters and about 34 output parameters.
Fig. 21. Influence of service count on retrieval count and composition time.

Fig. 21 shows the experimental results of the influence of service count on service retrieval count and composition time. In Fig. 21(a), three subsets have different service counts, and their services have approximately 5 input and 5 output parameters. The results show that service count affects both retrieval count and composition time: they increase with service count. In order to evaluate how fast they increase, their metric units are unified. The comparison shows that they increase slightly more slowly than the service count. Fig. 21(b) shows similar results.
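One way to unify the metric units, sketched below in Python, is to compare growth factors between the smallest and the largest subset of each panel of Fig. 21. The helper `growth` and the dictionary layout are illustrative assumptions; the numbers are those reported in the figure.

```python
def growth(first, last):
    """Growth factor between the smallest and the largest subset."""
    return last / first


# (smallest subset, largest subset) values taken from Fig. 21.
fig21 = {
    "(a) 2-20-4 -> 2-100-4": {
        "service count": (3356, 8356),
        "retrieval count": (4411, 9452),
        "composition time (ms)": (52.6, 116.4),
    },
    "(b) 2-20-32 -> 2-100-32": {
        "service count": (3356, 8356),
        "retrieval count": (4456, 9456),
        "composition time (ms)": (227.6, 512.2),
    },
}

for panel, series in fig21.items():
    for name, (first, last) in series.items():
        print(f"{panel:<24} {name:<22} x{growth(first, last):.2f}")
```

In both panels the service count grows by a factor of about 2.5, while retrieval count and composition time grow by smaller factors, which is consistent with the observation above.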
Subset     Parameter count   Retrieval count   Composition time
2-50-4     5                 6448              77.9 ms
2-50-16    17                6455              191.4 ms
2-50-32    34                6453              343.7 ms
(a) Three subsets have 5356 services each. Services in different subsets have different numbers of input and output parameters.

Subset     Parameter count   Retrieval count   Composition time
2-100-4    5                 9452              116.4 ms
2-100-16   17                9456              287.3 ms
2-100-32   34                9456              512.2 ms
(b) Three subsets have 8356 services each. Services in different subsets have different numbers of input and output parameters.
Fig. 22. Influence of service parameter count on service retrieval count and composition time.

Fig. 22 shows the experimental results of the influence of service parameter count on service retrieval count and composition time. In Fig. 22(a), three subsets have the same number of services, while services in different subsets have different numbers of input and output parameters. The results show that the service parameter count has almost no effect on the retrieval count. However, it does affect the composition time. Similarly, the composition time increases more slowly than the service parameter count. Fig. 22(b) shows a similar result.
The experimental results in Figs. 21 and 22 show that the proposed index is highly efficient: its cost grows no faster than linearly with service count and service parameter count. This means that the multilevel index is well suited to large-scale service repositories.
4) Addition
Since the addition algorithm plays a key role for L3I, its performance is tested. Fig. 23 shows the addition time of the eighteen test sets. The experimental results show that the addition algorithm is very efficient. It costs 1289.8 ms to insert 81464 services. Since addition is more complex than deletion, and replacement can be implemented by addition and deletion, all three maintenance operations can be efficiently implemented.

Fig. 23. Addition time of eighteen test sets in an increasing order of service counts

The addition time curve is not smooth, and there are many fluctuations in it. We think that this may be related to how many input parameters each service has and how many services contain the same parameter as an input one. How to improve the proposed addition algorithm in different cases is an open question.
VII. CONCLUSION
This work proposes a multilevel index model to store and
manage services for large-scale service repositories. Based on



the theory of equivalence relations and quotient sets, four
level indexes are given to construct the multilevel index model.
Four operations are proposed to manage and maintain services
based on the functions implied in the proposed model. The
theoretical analysis and experimental results validate that the
proposed multilevel index is more efficient and stable for
service discovery and composition than the sequential structure
and the inverted index. In particular, the advantages of the
proposed model become clearer and more significant as the number
of services increases. Our experiments validate that our four
operations are efficient. In the era of drastically expanding
services, the proposed model provides a highly desired storage
structure for large-scale service repositories.
Various factors can affect the performance of the proposed
model, such as how many input parameters each service has,
how many services contain the same parameter as an input one,
and the probability of each service being invoked. These factors
require more in-depth and systematic studies. Accordingly, how to
adjust the addition operation to different cases with different
factors in order to keep the proposed model highly efficient
needs to be studied.

REFERENCES
[1] M. P. Papazoglou, P. Traverso, S. Dustdar, and F. Leymann,
"Service-oriented computing: State of the art and research challenges,"
Computer, vol. 40, pp. 38-45, Nov. 2007.
[2] M. P. Papazoglou and D. Georgakopoulos, "Service-oriented computing,"
Communications of the ACM, vol. 46, pp. 25-28, 2003.
[3] S. C. Oh, D. Lee, and S. R. Kumara, "Effective Web Service Composition
in Diverse and Large-Scale Service Networks," IEEE Transactions on
Services Computing, vol. 1, pp. 15-32, 2008.
[4] D. Lee, J. Kwon, S. Lee, S. Park, and B. Hong, "Scalable and efficient web
services composition based on a relational database," Journal of Systems
and Software, vol. 84, pp. 2139-2155, Dec. 2011.
[5] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M.
Burrows, et al., "Bigtable: A Distributed Storage System for Structured
Data," ACM Trans. Comput. Syst., vol. 26, pp. 1-26, 2008.
[6] J. Fan and S. Kambhampati, "A snapshot of public web services,"
SIGMOD Rec., vol. 34, pp. 24-32, 2005.
[7] X. Tang, C. Jiang, and M. Zhou, "Automatic Web service composition
based on Horn clauses and Petri nets," Expert Systems with Applications,
vol. 38, pp. 13024-13031, 2011.
[8] W. Tan and M. C. Zhou, Business and Scientific Workflows: A Web
Service-Oriented Approach, 1 ed.: Wiley-IEEE Press, 2013.
[9] W. Tan, Y. Fan, and M. Zhou, "A Petri Net-Based Method for
Compatibility Analysis and Composition of Web Services in Business
Process Execution Language," IEEE Transactions on Automation Science
and Engineering, vol. 6, pp. 94-106, 2009.
[10] W. Tan, Y. Fan, M. Zhou, and Z. Tian, "Data-Driven Service
Composition in Enterprise SOA Solutions: A Petri Net Approach," IEEE
Transactions on Automation Science and Engineering, vol. 7, pp. 686-694,
2010.
[11] P. Sun, C. J. Jiang, and M. C. Zhou, "Interactive Web service composition
based on Petri net," Transactions of the Institute of Measurement and
Control, vol. 33, pp. 116-132, 2011.
[12] P. Xiong, Y. Fan, and M. Zhou, "A Petri Net Approach to Analysis and
Composition of Web Services," IEEE Transactions on Systems, Man and
Cybernetics, Part A: Systems and Humans, vol. 40, pp. 376-387, 2010.
[13] P. Xiong, Y. Fan, and M. Zhou, "QoS-Aware Web Service
Configuration," IEEE Transactions on Systems, Man and Cybernetics,
Part A: Systems and Humans, vol. 38, pp. 888-895, 2008.
[14] P. W. Wang, Z. J. Ding, C. J. Jiang, and M. C. Zhou, "Design and
Implementation of a Web-Service-Based Public-Oriented Personalized
Health Care Platform," IEEE Transactions on Systems, Man, and
Cybernetics: Systems, pp. 1-17, 2013.

[15] J. Puttonen, A. Lobov, and J. L. Martinez Lastra, "Semantics-Based
Composition of Factory Automation Processes Encapsulated by Web
Services," IEEE Transactions on Industrial Informatics, vol. 9, pp.
2349-2359, 2013.
[16] A. Girbea, C. Suciu, S. Nechifor, and F. Sisak, "Design and
Implementation of a Service-Oriented Architecture for the Optimization
of Industrial Applications," IEEE Transactions on Industrial Informatics,
vol. 10, pp. 185-196, 2014.
[17] S. N. Han, G. M. Lee, and N. Crespi, "Semantic Context-Aware Service
Composition for Building Automation System," IEEE Transactions on
Industrial Informatics, vol. 10, pp. 752-761, 2014.
[18] S. Narayanan and S. A. McIlraith, "Simulation, verification and
automated composition of web services," in Proc. the 11th International
Conference on World Wide Web, Honolulu, Hawaii, USA, 2002, pp.
77-88.
[19] W. Nam, H. Kil, and D. Lee, "On the computational complexity of
behavioral description-based web service composition," Theoretical
Computer Science, vol. 412, pp. 6736-6749, 2011.
[20] G. Liu, C. Jiang, M. Zhou, and P. Xiong, "Interactive Petri Nets," IEEE
Transactions on Systems, Man and Cybernetics, Part A: Systems and
Humans, pp. 1-12, 2012.
[21] C.-S. Wu and I. Khoury, "Tree-based Search Algorithm for Web Service
Composition in SaaS," in Proc. the Ninth International Conference on
Information Technology: New Generations (ITNG), 2012, pp. 132-138.
[22] J. Kwon, H. Kim, D. Lee, and S. Lee, "Redundant-Free Web Services
Composition Based on a Two-Phase Algorithm," in Proc. the 2008 IEEE
International Conference on Web Services, 2008, pp. 361-368.
[23] L. Kuang, Y. Li, S. Deng, and Z. Wu, "Inverted Indexing for
Composition-Oriented Service Discovery," in Proc. the 2007 IEEE
International Conference on Web Services, 2007, pp. 257-264.
[24] I. Constantinescu, B. Faltings, and W. Binder, "Large scale,
type-compatible service composition," in Proc. the 2004 IEEE
International Conference on Web Services, 2004, pp. 506-513.
[25] S. Brin and L. Page, "The anatomy of a large-scale hypertextual Web
search engine," Computer Networks and ISDN Systems, vol. 30, pp.
107-117, 1998.
[26] L. A. Barroso, J. Dean, and U. Hölzle, "Web search for a planet: The
Google cluster architecture," IEEE Micro, vol. 23, pp. 22-28, 2003.
[27] L. Aversano, G. Canfora, and A. Ciampi, "An algorithm for Web service
discovery through their composition," in Proc. the 2004 IEEE
International Conference on Web Services, 2004, pp. 332-339.
[28] E. W. Weisstein. Binomial theorem. Available:
http://mathworld.wolfram.com/BinomialTheorem.html
[29] Y. Wu, C. Yan, Z. Ding, P. Wang, C. Jiang, and M. Zhou, "A Relational
Taxonomy of Services for Large Scale Service Repositories," in Proc. the
19th IEEE International Conference on Web Services (ICWS), 2012, pp.
644-645.
[30] Y. Wu, C. Yan, Z. Ding, G. Liu, P. Wang, C. Jiang, et al., "A Novel
Method for Calculating Service Reputation," IEEE Transactions on
Automation Science and Engineering, vol. 10, pp. 634-642, 2013.
[31] B. Kolman, R. C. Busby, and S. C. Ross, Discrete Mathematical
Structures, 5th ed. Upper Saddle River, NJ: Pearson Prentice Hall, 2004.
[32] ICEBE05. (2005). Web Services Challenge 2005. Available:
http://www.comp.hkbu.edu.hk/~ctr/wschallenge/
[33] S. C. Oh, D. Lee, and S. R. T. Kumara, "Web service planner (WSPR): An
effective and scalable web service composition algorithm," International
Journal of Web Services Research, vol. 4, pp. 1-22, Jan-Mar 2007.
[34] R. Hewett, P. Kijsanayothin, and B. Nguyen, "Scalable Optimized
Composition of Web Services with Complexity Analysis," in Proc. the
2009 IEEE International Conference on Web Services, 2009, pp. 389-396.
[35] H. Mili, G. Tremblay, A. E. Caillot, R. B. Tamrout, and A. Obaid, "Web
service composition as a function cover problem," in Proc. The 2005
Montreal Conference on eTechnologies, Montreal, Canada, 2005, pp.
61-71.

1939-1374 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
