
Towards Improving Web Service Composition and Ranking

Digvijay Singh Shaktawat (200601128) & Pushpak Aggarwal (200601173) Mentor: Prof. Sanjay Chaudhary Evaluation Committee no: 6
Abstract

Web services, which facilitate automated machine-to-machine interaction, are being touted as the next big revolution in the internet world because they provide for easy interoperability, reuse and integration of functionalities. But as the need for them grows, relying on individual services alone will greatly hamper the scalability of the service model. Herein lies the need for bundling services together, a process called web service composition. Here we extend a comprehensive approach towards service composition using workflows and move on to improve service search results using a new technique for ranking web services based on their popularity.

Index Terms: page-rank, service composition, service ranking, structured data, web service, workflow.

INTRODUCTION

Today, there is a growing need for application functionality to be available over the Internet in a standardized, programmatic way. Approaches are being developed so that applications that could previously be accessed only through rigid proprietary protocols can talk to one another over the Internet, regardless of their native language, platform or internal protocols. The Web services model is the most popular approach in this area. Web services utilize existing IT infrastructures and allow companies to wrap legacy applications in a standardized, consistent and reusable format so that every investment can be leveraged [10]. They provide a low-cost way to connect internal applications and to collaborate with business partners. The ability to have applications, internal or external, share data automatically with no human intervention creates cost savings, productivity increases, error reductions and competitive advantages.

WEB SERVICES

The Web services paradigm is emerging to publish, discover, and invoke software components as services.
Standardized by the World Wide Web Consortium (W3C), a Web service is described as [8]: a software application identified by a Uniform Resource Identifier (URI), whose interfaces and bindings are capable of being defined, described, and discovered as XML artifacts. A Web service supports direct interaction with other software agents using XML-based messages exchanged via Internet-based protocols.

A simple example of a web service is a social-networking website offering weather-related information. At the back end, the site itself acts as a client that requests this information from another portal (the server), which actually stores the information. The portal exposes this functionality to the client through an easily accessible API. The complete process, from web service creation to storage, discovery, composition and usage, is explained now.

Phases in web service creation and use

The following phases are involved in the development of services and service composition [1]:

Service Creation and Description: First of all, the service provider should create the service and publish its interface together with a description of the service. The description standard is WSDL, which provides information about the functional requirements of a service: its methods, inputs and outputs along with their data types, exceptions, and transport mechanism. However, the WSDL standard has been criticized for not scaling to highly complex and evolved web service descriptions because it does not cater to non-functional parameters, which relate to the quality of service (QoS), response time, and service price. These parameters are crucial for evaluating the suitability of a service for use.
Service Registration/Publishing: The provider should advertise or publish the capabilities of each service, so that service consumers can later discover any of these services as per their requirements and then communicate with the provider to access the desired services. Universal Description, Discovery and Integration (UDDI) is a platform-independent, Extensible Markup Language (XML)-based registry for businesses worldwide to get themselves listed on the Internet.

Service Composition: As the demand for bigger and more complex services increases, it becomes increasingly likely that a single Web service will not be able to fulfill the requirements. But creating a distinct Web service for every possible operation is infeasible and needlessly increases the maintenance overhead. In such a situation, a possible approach is to find a way to merge different Web services and create so-called composed services, which make it possible to aggregate and reuse the individual functionalities provided by the atomic Web services.


In particular, most of the research conducted falls in the realm of workflow composition [2] or Artificial Intelligence planning. For the former, one can argue that, in many ways, a composite service is similar to a workflow. The definition of a composite service includes a set of atomic services together with the control flow and data flow among the services. For example, assume that a user wants to find the web service whose input is City and whose output is Weather among the web services in Figure 2. If web service composition is not supported, there is no answer to the user query. However, if composition is supported, we can provide a sequence of web services as the answer. The composition of web services WS1 and WS2, in which the output of WS1 is equal to the input of WS2, satisfies the user query. Hence we see that the basis for service composition is actually Service Matching, which is said to occur when the output parameters of one web service are the same as, or a subset of, the input parameters of another web service.

Service Discovery and Ranking: The above are the steps to be taken by the service creators and providers to make the services available for use. Now comes the phase where a client can search for a Web service that suits their needs. Here we need to introduce the concept of ranking the relevant Web services: as the Web service model gains popularity, it is increasingly likely that a large number of services will satisfy the requirements, and for the user to be able to choose effectively between them, they need to be ranked in order of relevance.

SERVICE COMPOSITION AND RANKING

Our work mainly concerns the last two phases of the flow of service creation and use, namely service composition and ranking. Web service composition is still a highly complex task, and it is already beyond human capability to deal with the whole process manually. The complexity, in general, comes from the following sources [2].
First, the number of services available over the Web has increased dramatically in recent years, and one can expect to have a huge Web service repository to be searched. Second, Web services can be created and updated on the fly, so the composition system needs to detect updates at runtime and base its decisions on up-to-date information. Third, Web services can be developed by different organizations, which use different concept models to describe the services; there does not exist a single language to define and evaluate web services in a uniform way.

SERVICE RANKING

Most of our work concentrates on how to rank the searched web services based on their popularity and relevance. After the service composition process, the services are available to the client, who can then search for the most relevant ones. The search results must somehow be ranked based on their suitability; many approaches have been suggested, such as ranking the results based on the perceived Quality of Service (QoS) [5] or on popularity [4]. Our work mainly extends the concept of ranking services based on their popularity to the paradigm of composed services. It builds on a comprehensive approach towards the efficient storage, composition and search of services as explained in [5]. In the next section the architecture used is described in further detail.


Traditional web services are described with semi-structured WSDL documents, which are stored and managed in UDDI. The processing efficiency of semi-structured data is lower than that of structured data, so current service search and service composition technology in UDDI cannot adapt to the requirements of a large number of operating web services. In [5], the authors follow the procedure below to overcome such limitations.

Parse the WSDL documents of web services manually collected from the Internet, then decompose and process the main elements of WSDL and store them in different tables of a relational database, as shown in Fig. 4. The main elements include the service name, function description, operators and input/output parameters. This kind of storage strategy can be flexibly extended to adapt to new web service description standards for future cloud computing services, and it improves the efficiency of service composition. In Fig. 4, the Service Main Table (SMT), Service Operator Table (SOT) and Service Parameter Table (SPT) are used to store the main elements of each Web service. For more details please refer to [5].

Concept similarity relationships are pre-computed based on WordNet [6] and stored in the Concept Similarity Table (CST) to improve the efficiency of service matching. The matching relationship between different web services can be calculated from their input/output parameters by using the data in the above tables.
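The matching step described above can be sketched as follows. This is only an illustration under stated assumptions: the services WS1 and WS2, their parameter names, the similarity value 0.9, the threshold 0.8 and the averaging rule used as the matching degree are all hypothetical and are not taken from [5]; the dictionary merely stands in for the Concept Similarity Table.

```python
# Hypothetical concept-similarity entries, standing in for the CST.
CONCEPT_SIMILARITY = {
    ("zip", "zipcode"): 0.9,
}


def similarity(a, b):
    """Return 1.0 for identical concepts, else the stored value (0.0 if absent)."""
    if a == b:
        return 1.0
    return CONCEPT_SIMILARITY.get((a, b), CONCEPT_SIMILARITY.get((b, a), 0.0))


def matching_degree(outputs, inputs):
    """Assumed scoring rule: for each output of the first service take its best
    similarity to any input of the second service, then average these values."""
    if not outputs or not inputs:
        return 0.0
    best = [max(similarity(o, i) for i in inputs) for o in outputs]
    return sum(best) / len(best)


def one_way_matches(services, threshold=0.8):
    """Return (producer, consumer, degree) rows whose degree reaches the threshold."""
    rows = []
    for a, (_, outputs_a) in services.items():
        for b, (inputs_b, _) in services.items():
            if a != b:
                degree = matching_degree(outputs_a, inputs_b)
                if degree >= threshold:
                    rows.append((a, b, degree))
    return rows


# Hypothetical services: name -> (input parameters, output parameters).
SERVICES = {
    "WS1": ({"city"}, {"zip"}),
    "WS2": ({"zipcode"}, {"weather"}),
}
MATCHES = one_way_matches(SERVICES)
```

With these sample values only the pair WS1 to WS2 passes the threshold, which corresponds to the City-to-Weather composition used as an example earlier.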

Store all results with a high matching degree in the One-way Matching Table (OMT). The automatic service composition module analyzes all data in the OMT to generate a weighted directed graph in which each node denotes an atomic web service and each edge weight denotes the semantic matching degree between two web services. Thus, the service composition problem is reduced to finding all reachable paths between two nodes in the graph.

Calculate all possible service compositions and store the results in the Service Composition Table (SCT) and the Composition Path Table (CPT).

When there is a new service search request, the system supplies two search modes: keywords and input/output definition. The Intelligent Ranking Module extends the key concepts in the search condition based on the Concept Similarity Table (CST) and searches for matching single or composite services in the database.

The QoS Table is used to filter or rank the returned services. QoS considers the following three elements and is recorded at every time interval for long-term analysis.
a) Response Time: the time interval between when a service is requested and when it is delivered;
b) Availability: the percentage of time that a service is available during some time interval;
c) Reliability: the probability that a request is correctly served.

ADVANTAGES OF THIS APPROACH

The comprehensive approach towards service composition and ranking illustrated above has many advantages, enumerated below:

Structured data: Traditional approaches have focused on using WSDL, which provides only partial structure to the data. By adopting the aforementioned approach and using concept similarity, the data can be completely structured.

Multiple inputs and outputs: Most approaches towards composition allow just one input and one output in the service composition chain, which is a big constraint that hampers service usability. With the above approach, the inputs to and the outputs from the services can be many at every point, the only precondition being that the service matching condition holds and the outputs from one link form a subset of the inputs of the next link in the chain.

Lower algorithmic complexity: The approach does not compute only the shortest path, because the composite service with the fewest services (which corresponds to the shortest path) does not necessarily have the best QoS or the lowest service price. In the database domain, finding all reachable paths is a recursive query problem with time complexity O(N^3), where N is the number of entries in the One-way Matching Table. EP-JOIN algorithms [3] reduce the time complexity to O(N^2). The Fast-EP algorithm used in the above approach reduces the time complexity further to O(N*log(N)).

Pre-composition: The major contribution of the above framework is to avoid the need to compute the service composition dynamically when the service is requested. As mentioned above, the approach depends on pre-computing possible composition paths and storing them in a database for fast retrieval. This considerably lowers the response time for service search and hence is much more efficient.

OUR APPROACH TO SEARCH RESULT RANKING

Our major contribution is towards the improvement of search results of services based on popularity, extending the idea given in [4] to the workflow-based service composition framework explained above. The following information about each web service (irrespective of whether it is atomic or composed) is deemed important:
a) The number of services that point to it, as a percentage of the overall services, similar to incoming links to a web page.
b) The percentage of web services it points to, similar to outbound links from a web page.
c) The degree of semantic matching of that particular service.

In the world of the Web, [7] was a revolutionary contribution which gave the widely known and acknowledged PageRank algorithm, which efficiently ranks web pages according to the importance of each page as determined by parameters like the above. Our work aims to extend the concept of the PageRank algorithm so that it can be applied to Web services, both atomic and composed.

BACKGROUND

To understand the approach that we adopt, it is essential to have some background in graph theory. This section is devoted to that end. The PageRank algorithm uses the concept of the importance of a node in a given graph based on how well it is connected. But the concept of importance needs to be quantified to be measurable; hence the goal of the algorithm is to assign a number to each node that determines how vital it is to the graph. For example, consider Figure 4. A naive view would be to assign a higher importance to the node with a higher degree (the number of nodes to which a node is directly connected), but if we look at the graph carefully, we can see that this will not always work.
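As a rough illustration of how PageRank quantifies the importance of a node, the sketch below uses plain power iteration. It is a simplified version written for this discussion, not the implementation used in our experiment; the damping factor 0.85, the iteration count, the uniform treatment of nodes without outgoing edges and the four-node sample graph are all assumptions.

```python
def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Approximate PageRank by power iteration on a directed graph."""
    nodes = list(nodes)
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {v: 1.0 / n for v in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        for v in nodes:
            for dst in out_links[v]:
                # Each node shares its rank equally among the nodes it points to.
                new_rank[dst] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


# Hypothetical graph: B, C and D all point to A, and A points only to B.
NODES = ["A", "B", "C", "D"]
EDGES = [("B", "A"), ("C", "A"), ("D", "A"), ("A", "B")]
RANKS = pagerank(NODES, EDGES)
```

In this toy graph A receives the highest rank, and B ranks well above C and D because its single incoming edge comes from the highest-ranked node, which is the kind of effect that a simple degree count misses.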

OBSERVATIONS

The red node is the most important node in the graph because it is the most well-connected one. Even though the yellow node is also well connected, the red node has a much higher rank because it is possible to go from the yellow node to the red node but NOT vice versa. The orange node has a very high rank because it has an incoming edge from the most important node in the graph and, moreover, that edge is the only outgoing edge from the red node. Most of the other nodes have very small ranks because they lack incoming edges.

THE PROBLEM

As mentioned in the approach towards composition in the earlier section, each node in the service composition graph represents an atomic web service and each path represents a composite web service. Our objective is to find a suitable mechanism for ranking atomic as well as composite web services based on their connectivity. The PageRank algorithm briefly explained above already provides an efficient way to rank the nodes in a graph. Our objective now becomes to find the relative ranking of the nodes AND the paths in the graph.

An important point here is that, to a user, atomic and composite web services are indistinguishable. The user is concerned only with finding the most appropriate web service for their demands, not with whether the service is a combination of other services. In the PageRank algorithm the rank of a web page depends on the other web pages in the graph; here, the rank of a web service depends on all the atomic as well as composed web services in the graph. In other words, the rank of a web service depends on the relative ranking of the nodes and the paths in the graph. We need to find a way to make a path indistinguishable from a node. For this we use the concept of path contraction.

PATH CONTRACTION

Consider a simple connected graph, comprising 4 nodes.


This graph represents 10 web services in total:
a) The atomic web services, represented by the individual nodes, i.e. 1, 2, 3 and 4.
b) The composed web services, represented by the paths in the graph. Hence, each of the following is a composite web service: 1->2, 3->4, 2->3->4, 2->3, 1->2->3, 1->2->3->4.

What we do is add a node corresponding to each composed web service to the graph; this process is termed path contraction. After this process, each composed service is also represented by a node in the newly formed service composition graph, and the PageRank algorithm can now be executed to calculate the rank of each node in the graph.

The process of adding a node: To add a node corresponding to a path in the graph, we need to maintain the same connectivity for the extra node as for the path itself. Hence, the extra node gets an incoming edge from all nodes which point towards the start point of the path, and an outgoing edge towards all nodes which the end point of the path points to.

We repeat this procedure to add a new node for each path in the original service composition graph. We now move on to the ranking part of the process. First, we need to introduce the concepts of single and dual existence.

Single Existence: The rank of each service depends on other atomic as well as composed web services. The atomic web services, as well as their connections, were part of the original graph and hence do not have a duplicate existence; they are represented only once in the entire modified service composition graph.

Dual Existence: A composed web service (represented by a newly added node) is represented twice in the graph, once as the original path and once as the extra node just added.
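The path contraction step can be sketched as below. The sketch assumes that the four-node example is the directed chain 1 -> 2 -> 3 -> 4 (the figure is not reproduced here), enumerates paths by a simple depth-first search rather than with the database-based algorithms of [5], and uses a hypothetical naming scheme such as "2-3" for the added nodes.

```python
def simple_paths(edges):
    """Enumerate every simple path with at least two nodes in a directed graph."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
    paths = []

    def extend(path):
        for nxt in successors.get(path[-1], []):
            if nxt not in path:  # keep paths simple (no repeated node)
                longer = path + (nxt,)
                paths.append(longer)
                extend(longer)

    for start in sorted({node for edge in edges for node in edge}):
        extend((start,))
    return paths


def contract_paths(nodes, edges):
    """Add one node per path, connected like the path it stands for."""
    new_nodes = list(nodes)
    new_edges = list(edges)
    for path in simple_paths(edges):
        label = "-".join(str(node) for node in path)  # hypothetical naming scheme
        new_nodes.append(label)
        start, end = path[0], path[-1]
        for src, dst in edges:
            if dst == start:
                new_edges.append((src, label))  # from a node pointing at the start
            if src == end:
                new_edges.append((label, dst))  # to a node the end point points to
    return new_nodes, new_edges


# Assumed example graph: the chain 1 -> 2 -> 3 -> 4.
NODES = [1, 2, 3, 4]
EDGES = [(1, 2), (2, 3), (3, 4)]
ALL_NODES, ALL_EDGES = contract_paths(NODES, EDGES)
```

For the assumed chain this yields ten nodes in total, four atomic and six composed, in line with the count given above; the node "2-3", for instance, receives an edge from node 1 and sends an edge to node 4.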

From the above it is quite clear that the calculation of the rank for atomic web services is not difficult as we can directly apply the PageRank algorithm to get the scores which will indeed represent the individual rankings. But to calculate the rank for composed web services, we need to eliminate the effect of dual existence, and to do that, while we calculate the rank for each composed service, we only consider it to exist as the additional node and drop all the edges in the original path. In this way, we eliminate the dual existence of a path, though the computation of rank for each path has to be done individually which is a substantial overhead.
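The elimination of dual existence can be illustrated with the following sketch, which ranks each composed service after temporarily dropping the edges of its own path. The graph is the contracted form of an assumed chain 1 -> 2 -> 3 -> 4, written out by hand; the node labels, the damping factor 0.85 and the simplified power-iteration PageRank are assumptions made for this illustration and do not reproduce the tool used in our experiment.

```python
def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Simplified power-iteration PageRank (dangling rank spread uniformly)."""
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        for v in nodes:
            for dst in out_links[v]:
                new_rank[dst] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


def rank_of_composed(label, path, nodes, edges):
    """Rank the added node `label` with the edges of its own path left out."""
    path_edges = set(zip(path, path[1:]))
    kept = [edge for edge in edges if edge not in path_edges]  # original list untouched
    return pagerank(nodes, kept)[label]


# Contracted graph for the assumed chain 1 -> 2 -> 3 -> 4 (hypothetical labels).
NODES = [1, 2, 3, 4, "1-2", "2-3", "3-4", "1-2-3", "2-3-4", "1-2-3-4"]
EDGES = [(1, 2), (2, 3), (3, 4),
         ("1-2", 3), (1, "2-3"), ("2-3", 4), (2, "3-4"),
         ("1-2-3", 4), (1, "2-3-4")]
PATHS = {
    "1-2": (1, 2), "2-3": (2, 3), "3-4": (3, 4),
    "1-2-3": (1, 2, 3), "2-3-4": (2, 3, 4), "1-2-3-4": (1, 2, 3, 4),
}

# One PageRank run per composed service, which is the overhead noted above.
COMPOSED_RANKS = {
    label: rank_of_composed(label, path, NODES, EDGES)
    for label, path in PATHS.items()
}
```

Each entry of COMPOSED_RANKS comes from a separate PageRank run on a slightly different graph, which makes the cost of the per-path computation visible even in this small example.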

The pseudocode for the algorithm that we have proposed is as follows:

FOR every path X1 -> X2 -> ... -> XK REPEAT
    Add new node X1K to the node table
    SCAN the path table UNTIL (a path containing X1 ... XK as a sub-path) REPEAT
        FOREACH (edge of the form XJ -> X1 or XK -> XM)
            Add edges XJ -> X1K and X1K -> XM to the edge table
FOR every path X1 -> X2 -> ... -> XK REPEAT
    Remove all the edges X1 -> X2, X2 -> X3, ..., XK-1 -> XK from the edge table (temporarily)
    Run PageRank algorithm to get the node rankings for node X1K

Known Issues and further development:
1. When we drop the edges corresponding to a particular path to eliminate the effect of dual existence, the web services which use part of that path become ineffective. If the network is large relative to its paths, i.e. the number of nodes is large and the paths are short, this effect can be neglected, though some inaccuracy will inevitably creep in.
2. In the real world, services are never really completely matched, i.e. the inputs and the outputs match only to a certain degree (measured on a scale from 0 to 1) called the semantic matching degree (SM degree). As service composition is done and paths become longer, the matching degree decreases quickly and must be taken into account. To do so, we propose that each edge be weighted according to the SM degree and that the PageRank algorithm be modified accordingly. To our knowledge, no well-recognized technique has been developed to apply the PageRank concept to weighted graphs; once that is done, this concept can perhaps be extended further.

EXPERIMENT

The code and the test data provided by the authors of [5] were used to carry out the service composition process on a large data set. The test bed was then extended to generate the modified service composition graph, and the ranking of the new nodes thus formed was found by running the PageRank algorithm individually for each of the nodes. The data persistence layer used was PostgreSQL and the programming platform used was Java. To generate the node rankings we used Sonamine [12], a proprietary software which calculates the PageRank for the nodes of an input graph. The objective of our testing was to demonstrate the implementation of our approach towards service ranking and to show the efficiency of the ranking that can be generated for composed services based on their connectivity and popularity.

TEST RESULTS

The following is the input service composition graph that we supplied to our ranking mechanism:

Nodes 1 to 8 are atomic services; from them we generate all possible composed services, which are numbered from 9 to 82 in Figure 9. As our approach generates exactly one new node for every path in the initial service composition graph, the modified graph has a much larger number of nodes and is illustrated below.


Here, we can make the following observations, which are also intuitive considering the graph structure:

The atomic web services all have a very high rank because they are the most connected. This is expected, as they form the backbone upon which all the composed web services are developed.

As the paths get longer, the composed web services quickly lose out on ranking because their connectivity decreases, and hence their popularity also decreases rapidly.

Consider node 29 (which represents the path 2->4->5): it has the highest ranking among the composed web services. This is in line with expectations, since it has incoming edges from nodes 1 and 3, two of the most important nodes in the graph, and an outgoing edge to node 6, another well-connected and significant node.

ACKNOWLEDGMENT

First and foremost, we extend our gratitude to Mr. Xiao Guo from Wuhan University, China. Regular interaction with him through email provided us with important insights in the field of service composition and ranking and helped us in better understanding his own approach, which formed the basis of our work. We would also like to thank Mr. Amit Nathani and Ms. Janki, from the M Tech batch; their help while understanding the basics of cloud computing and SOA was significant. Particularly instrumental to our research was the assistance that we got from Mr. Gourav Somani; he gave us the idea and the initial impetus to work in the field of service composition. Lastly, we would especially like to thank our mentor, Prof. Sanjay Chaudhary; regular meetings and detailed discussions with him were the sole reason that we stayed focused on our work, and without him it would not have been possible.

REFERENCES

1. Zakir Laliwala, Event-driven Service-oriented Architecture for Dynamic Composition of Web Services, Ph.D. thesis, DA-IICT, 2008. [Pages 17-20]
2. Jinghai Rao and Xiaomeng Su, A Survey of Automated Web Service Composition Methods, Springer, 2005. [Pages 6-7]
3. Joonho Kwon, Kyuho Park, Daewook Lee, Sukho Lee, PSR: Pre-computing Solutions in RDBMS for Fast Web Services Composition Search, ICWS 2007. [Pages 808-815]
4. John Gekas and Maria Fasli, University of Essex, Department of Computer Science, Wivenhoe Park, Colchester CO4 3SQ, UK, Automatic Web Service Composition Based on Graph Network Analysis Metrics, Springer, 2005.
5. Cheng Zeng, Xiao Guo, Weijie Ou, and Dong Han, State Key Lab of Software Engineering, Wuhan University, 430072, China, Cloud Computing Service Composition and Search Based on Semantic, Springer, 2009. [Pages 290-300]
6. V. Schickel-Zuber and B. Faltings, OSS: A Semantic Similarity Function based on Hierarchical Ontologies, International Joint Conferences on Artificial Intelligence, 2007.
7. L. Page, S. Brin, R. Motwani, and T. Winograd, The PageRank Citation Ranking: Bringing Order to the Web, Technical report, Stanford Digital Library Technologies Project, 1998.
8. C. Ferris and J. Farrell, What are Web services?, Communications of the ACM, 2003.
9. [web-page] [as on 27-04-2010].
10. [web-page] [as on 27-04-2010].
11. [web-page] [as on 27-04-2010].
12. [web-page] [as on 27-04-2010].