Approaches To Failure and Recovery in Service Composition

by Petrus Johannes Steyn bisag@webmail.co.za
Department of Computer Science, University of Pretoria, Pretoria, South Africa
November 2006
SPE780 Computer Science Honours Project
Table of Contents
1 Introduction
2 Overview of Web Services
  2.1 What is a Web Service?
  2.2 Some of the Problems
  2.3 Some Standards related to Web Services
3 Failure: An Introduction
  3.1 Availability Failures
  3.2 Concurrency Failures
  3.3 Dependency Failures
  3.4 Inconsistency Failures
  3.5 Composition Failures
  3.6 Partial Failures
  3.7 Failures Due to Ambiguous Output
  3.8 Other Failures
4 Possible Recovery Methods
  4.1 Transactional Approach
  4.2 Dynamic Web Services
  4.3 Language Constructs
  4.4 Self-Healing Networks
  4.5 Trivial Recovery Methods
5 Failure Detection
  5.1 Defensive Process Design
  5.2 Service Run-Time Monitoring
  5.3 WofBPEL
6 Three Scenarios
  6.1 Foreign Traveller Information
  6.2 General Entertainment Planner
  6.3 Mall Information System
7 Example Scenario: Shopping Domain
  7.1 Program Demo
8 Related Work
9 Conclusion
10 Acknowledgements
Table of Contents for Figures
Figure 1 Web Services Stack
Figure 2 Flow Diagram of an availability failure
Figure 3 Flow Diagram of a Partial Failure
Figure 4 Flow Diagram of a process showing Ambiguous Output
Figure 5 Flow Diagram of Transactional-Based Approach
Figure 6 Catch Branch in OBPMS
Figure 7 Flow diagram of Pizza Company
Figure 8 Flow Diagram of a Trivial Recovery Method
Figure 9 Foreign Traveller Information
Figure 10 General Entertainment Planner
Figure 11 Mall Information System
Figure 12 Flow Diagram showing where Sub-Goals will be Checked
Figure 13 Screenshot of Program requesting data
Figure 14 Busy Searching for Shops
Figure 15 Results Found
Figure 16 Displaying Results
Figure 17 Failure with the possibility of Recovery
Figure 18 Notification of Failure without the possibility of Recovery
Table of Contents for Examples
Code Example 1 Service not Found Exception from BPEL Console
Code Example 2 Error Message produced by server when incorrect types are used as input
Code Example 3 BPEL Code from Example
Code Example 4 The Corresponding WSDL description
Code Example 5 Time out Exception from the BPEL Server
Code Example 6 Time out Exception shown in the BPEL Console
Code Example 7 Catch Branch In BPEL
Abstract. Web services have become a vital part of our lives. People do not always know that they are there, but they do notice when something goes wrong. Various problems can occur when using Web Services, ranging from trivial ones, such as a broken connection, to more complicated ones, such as composition problems. These problems, or failures, can be addressed with different recovery methods. Common recovery methods being researched today include Self-healing networks and Transaction-based strategies, and most current research focuses on Self-healing networks and the dynamic composition of services. Many detection methods also exist; two that are frequently used in Self-healing networks are Defensive Process Design (DPD) and Service run-time Monitoring (SrtM). Both are run-time detection strategies. There are also static, or off-line, detection strategies, of which WofBPEL is a good example. Web Services have many applications, and three example applications are discussed briefly in this document. Each of them could be implemented in the real world and would be of value if implemented successfully. This document proposes a classification for some of the most common failures that can occur when using Web Services, together with recovery methods that can be used to recover from them. One of these recovery methods is illustrated by means of a real-world example.
Keywords: Web services, composition failures, recovery methods.
1 Introduction
Web services are becoming a big part of everyday life. We use them all the time without even knowing it, but, as with most things, we only notice them when they stop working. Web services are very dynamic. Popular websites use Web Services to find and display information from many different domains. The travel domain is one of the domains that rely heavily on Web Services: travel agencies use Web Services to connect to other companies (such as airlines or bus companies) to obtain their schedules and prices. Such services can work for months at a time and then suddenly go down for various reasons. When they are down, the system has to somehow recover the data the user requested, and there are various ways in which this can be done. In this document, I try to classify some of the most common failures that can occur when using Web Services. I also look at a few recovery methods and briefly discuss three failure detection strategies that are used today. These failure detection strategies fall into two categories: run-time detection strategies and off-line detection strategies. They are discussed in Section 5. Finally, I use a real-world example to illustrate one of the recovery methods that I discuss.
The remainder of this document is organised as follows. In Section 2, I give a brief overview of web services. In Section 3, I give an overview of the different types of failures that can occur. In Section 4, I give some methods for recovering from different failures. Section 5 briefly covers some detection methods found in the literature. In Section 6, I introduce a few real-world examples and give some examples of errors that can occur during their use. Section 7 goes into more depth on one of the examples introduced in Section 6. In Section 8, I discuss some related work, and Section 9 concludes this document.
2 Overview of Web Services

As stated in the introduction, web services are all around us. We interact with them all the time without even knowing that they exist. But what is a web service, and why are web services so important today?
2.1 What is a Web Service?
A Web Service is an entity on the web that can provide various kinds of information to clients. Some types of services offered are weather services, exchange rate services, language translation services and geographical information services. These services are accessible from anywhere in the world, and they are always available (at least in theory). Their use is not restricted, although some providers charge clients for the services that they provide. Web Services form part of a greater architecture known as Service Oriented Architecture (SOA). According to Wikipedia [2], SOA is a software architectural concept that defines the use of services to support the requirements of software users. Web Services are often identified as the default implementation of SOA, but SOA can be implemented using various other service-based technologies.
As an analogy, a Web Service can be compared to a company that provides some service to the community, which people in the community can use to their advantage. Let's say the company is a supermarket. The supermarket supplies the community with the goods that they want at an affordable price. Unfortunately, as is always the case, competition is not far away. Another supermarket opens, offering the same services but at a better price. This forces the old supermarket to either lower its prices or offer new services to its customers. The same thing happens continuously with web services. One web service, web service A, offers exchange rate information. Later a new web service, web service B, does the same, but offers better (more up-to-date) information.
In response, web service A starts to offer more services, such as additional stock exchange information. This race can continue until one service provider stops its service completely. As mentioned, web services are always available; the client only has to go out and find them. Finding a service that meets a client's requirements can be complicated, but it becomes easier if the service uses certain methods to advertise itself. Services use a description language to describe what they do and how a client can access them. The Web Services Description Language (WSDL) serves this purpose and acts as an interface to the service: a WSDL description supplies the client with the operations needed to invoke the service, and might also include a description of the service's functionality. Another method of advertising is to use ontology-annotated signatures. According to Brogi & Popescu [2005], these signatures describe the semantics of a service. The semantic description of a Web Service describes not only what the Web Service is, but also in what context to use it (Foggon et al. [2004]). These signatures will eventually be used in WSDL descriptors to fully describe the service and to expose its interface. Various other methods and languages have been developed around web services; they are discussed later.
2.2 Some of the Problems
What problems do we face when we want to use multiple services to gain useful information? Why can we not just use one service for all our needs? According to Yu & Lin [2005], services can be upgraded or changed dynamically in response to changes or needs in the environment. This can cause problems if the interfaces to these services also change. Regarding service discovery, Sahin et al. [2005] state that, although many advances have been made, most service discovery techniques have two major problems: (1) there is usually a centralised server that handles all requests, which makes it a single point of failure for the whole system, and (2) many servers offer limited search capabilities, which means that you will not always be able to find the best service.
Once you have access to the services and have retrieved the necessary data from them, the system has to combine the data in a meaningful way. This is known as service composition. During this process, the system must be able to distinguish between useful data and unwanted data. This is not always easy, and it cannot be guaranteed that the data we receive is correct. In Yu & Lin's [2005] paper, the authors use Quality of Service measures to help ensure that the data retrieved is correct. The problem with this approach is that various services have to be compared with each other in order to establish which service offers the best-quality data. Another way to improve the chances of getting correct data is to always use a trusted and reliable service provider, which makes it much more likely that the data is correct and that the service is of good quality.
However, things do go wrong. Service providers might change the services they offer, their servers might crash, or they might shut their servers down. In such cases, any use of the services provided by that service provider will result in the system reporting a failure. There are various ways in which we can recover from such a failure. The system can keep a backup of previous searches (in the form of a cache) and use this data; however, this data will not be up to date and might be invalid. The system can also launch a search for a new service provider, or for a web service that claims to offer the same services. This gives the user the most up-to-date and correct data, but the search might take a while. Different recovery methods are discussed in Section 4.
2.3 Some Standards related to Web Services
According to Tartanoglu et al. [2006], the overall definition of the Web Services architecture is still incomplete. The base standards for Web Services have, however, already emerged from the W3C. They define a core middleware that is partly built upon results obtained from object-based and component-based middleware. The main standards for the Web Services architecture, as defined by the W3C Web Services Activity and the OASIS consortium, are:
SOAP (Simple Object Access Protocol): A lightweight protocol for information exchange. It sets the rules for encoding data in XML. It also describes invocation semantics and mappings to other Internet transfer protocols.
WSDL (Web Services Description Language): An XML-based language that specifies a service's interface (the types of messages that the service can understand) and the binding information (the protocol-dependent details).
UDDI (Universal Description, Discovery and Integration): A registry for dynamically locating Web Services. It can also be used to advertise Web Services.
Figure 1 shows how these standards fit together in the technology stack. This figure is adapted from the figures that can be found in Mikalsen et al. [2002], Tartanoglu et al. [2006] and van der Aalst [2003]. Along with these standards, there are a number of languages associated with Web Services. The de facto standards for Web Services are BPEL and WSDL. Whereas WSDL describes a service's interface, BPEL describes the service's workflow, that is, the interactions that can be performed with services (interactions such as invoke, reply and receive). Both languages have been used very successfully so far. Both have their roots in XML, and both make use of several W3C-approved standards.
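To make the role of SOAP more concrete, the following Python sketch wraps the shop query that appears in Code Example 1 ("CNA,PRETORIA" in the http://services.otn.com namespace) in a SOAP 1.1 envelope and reads it back, as a receiving service would. It is a minimal illustration using only the Python standard library; the function names are hypothetical, and it leaves out the HTTP transport and the binding details (endpoint address, SOAPAction) that a real client would take from the service's WSDL description.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SHOP_NS = "http://services.otn.com"  # payload namespace seen in Code Example 1

# Serialise the envelope elements with the conventional "soapenv" prefix.
ET.register_namespace("soapenv", SOAP_ENV)


def build_shop_request(shop: str, city: str) -> bytes:
    """Wrap a shop query (e.g. 'CNA,PRETORIA') in a SOAP 1.1 Envelope and Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    payload = ET.SubElement(body, f"{{{SHOP_NS}}}shopdef")
    payload.text = f"{shop},{city}"
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def read_shop_request(message: bytes) -> str:
    """Parse the envelope and return the payload, as the receiving service would."""
    root = ET.fromstring(message)
    node = root.find(f"{{{SOAP_ENV}}}Body/{{{SHOP_NS}}}shopdef")
    if node is None:
        raise ValueError("no shopdef element in the SOAP Body")
    return node.text or ""


if __name__ == "__main__":
    message = build_shop_request("CNA", "PRETORIA")
    print(message.decode("utf-8"))
    print(read_shop_request(message))
```

The envelope only fixes where the payload goes; what the payload must look like is defined by the service's WSDL, which is why a change to the WSDL alone can break a client (see Section 3.4).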
Figure 1 Web Services Stack
Van der Aalst [2003] took a pessimistic look at some of the standards that have been developed in and around Web Services and Web Services workflow languages in his contribution "Don't go with the flow: Web services composition standards exposed". According to him, the support claimed by some of these languages is unfounded. He is also of the opinion that there are too many so-called standards. Some of the languages that he inspected were BPEL, Microsoft's XLANG, IBM's WSFL and the Workflow Management Coalition's XPDL. In his analysis, BPEL was one of the most comprehensive languages, albeit the most complex one. His research was, however, done in 2003, and since then BPEL has become a de facto standard for describing the workflow of Web Services. A few other languages exist (discussed later in this document), but all the examples and code in this document are in BPEL and WSDL.
3 Failure: An Introduction
Different types of failures can occur during the use of web services. These failures can be caused by something as simple as a broken connection or a busy server, but they might also manifest during the composition of services. Most of these failures can be solved with little effort, but sometimes the problem lies much deeper. Trivial errors include broken connections and server downtime. These are usually caused by external factors: the fault can lie on the server side, and in the case of server downtime it always does. Other types of failures range from concurrency problems to dependency problems and availability problems. Most failures can be classified as follows:
Failures caused by availability.
Failures caused by concurrency.
Failures caused by dependency.
Failures caused by inconsistency.
Failures caused by incorrect composition.
Partial failures caused by incorrect parallel execution.
Failures due to ambiguous output.
These classifications are not the only ones that exist, but they are the most common ones. Even though most tools will not deploy services that contain some of these faults, such services can still find their way in if they are deployed manually. In the following sections, I describe each classification and provide some examples of how these failures can occur.
Where possible, I used Oracle JDeveloper 10g [1] and Oracle BPEL Process Manager Server [1] to simulate the errors in the examples. All the examples were coded in WSDL and BPEL, mainly because of the development environment, but also because the two languages work well together and are popular. Other languages do exist; their pros and cons are discussed later in this paper. All examples use dummy services that only take simple inputs and give back simple replies.
3.1 Availability Failures
Failures in this classification can almost always be traced back to the server or to the connection to the server. They can present themselves as either a time out or a service not found error.
[Figure 2 annotation: In OBPMS, this marker indicates that an error occurred during the execution of the process in question.]
Figure 2 Flow Diagram of an availability failure
During a time out, the client usually stops requesting the service after a certain amount of time because the server does not respond to its requests. This can be caused by a busy server, a broken connection or a lost message. Either way, the service cannot be accessed at that time. A service not found error can be attributed to a faulty server, a deleted service or a broken connection. In these cases, the client assumes that the service has been deleted because it cannot find the service or the server that hosts it. A service not found error can also be caused by the same conditions that cause a time out. In this example, I use a service that no longer exists. The system responds with a remoteFault (essentially a service not found error) and returns this error to the client. In Oracle's BPEL Process Manager Server [1] (OBPMS), the user can view either the flow diagram or the code of a process. Figure 2 is the resulting flow diagram produced by OBPMS. In the code view, OBPMS reported the following error.
<process>
  <sequence>
    receiveInput
    [2006/10/16 10:03:42] Received "clientInput" call from partner "client"
    <scope name="shopScope">
      <sequence>
        shopInputAssign
        [2006/10/16 10:03:42] Updated variable "shopInput"
        <shopInput>
          <part xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" name="payload">
            <shopdef xmlns="http://services.otn.com">CNA,PRETORIA</shopdef>
          </part>
        </shopInput>
        searchShop (faulted)
        [2006/10/16 10:03:42] "remoteFault" has been thrown.
        <remoteFault xmlns="http://schemas.oracle.com/bpel/extension">
          <part name="code">
            <code>WSDLReadingError</code>
          </part>
          <part name="summary">
            <summary>Failed to read wsdl. Failed to read wsdl at "http://localhost:9700/orabpel/default/ShopServiceV2/ShopServiceV2?wsdl", because "WSDLException: faultCode=INVALID_WSDL: The document: http://localhost:9700/orabpel/default/ShopServiceV2/ShopServiceV2?wsdl is not a wsdl file or does not have a root element of "definitions" in the "http://schemas.xmlsoap.org/wsdl/" namespace or the "http://www.w3.org/2004/08/wsdl" namespace.". Make sure wsdl is valid. You may need to start the OraBPEL server, or make sure the related bpel process is deployed correctly.</summary>
          </part>
        </remoteFault>
      </sequence>
    </scope>
  </sequence>
  [2006/10/16 10:03:42] "BPELFault" has not been caught by a catch block.
  [2006/10/16 10:03:42] BPEL process instance "1105" cancelled
</process>
Code Example 1 Service not Found Exception from BPEL Console
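From a recovery point of view, the two availability failures behave differently: a time out may disappear if the request is simply repeated, whereas a service that cannot be found will not come back by itself. The following Python sketch illustrates this distinction on the client side. It is not how OBPMS works (OBPMS reports such failures as a remoteFault, as above); the names ServiceTimedOut, ServiceNotFound, classify and call_with_retry are hypothetical, and treating every unreachable-host error as "service not found" is an assumption. Only the Python standard library is used.

```python
import socket
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


class ServiceTimedOut(Exception):
    """No reply in time (busy server, broken connection or lost message)."""


class ServiceNotFound(Exception):
    """The service or its host could not be found; retrying will not help."""


def classify(exc: BaseException) -> BaseException:
    """Map a low-level error onto one of the two availability failures.

    Errors that are not availability failures are returned unchanged.
    """
    if isinstance(exc, urllib.error.HTTPError):
        # The host answered, so only a 404 means the service itself is missing.
        return ServiceNotFound(str(exc)) if exc.code == 404 else exc
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return ServiceTimedOut(str(exc))
    if isinstance(exc, urllib.error.URLError):
        # urllib wraps connect-time time-outs in a URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return ServiceTimedOut(str(exc))
        return ServiceNotFound(str(exc))  # assumption: host unreachable
    if isinstance(exc, ConnectionRefusedError):
        return ServiceNotFound(str(exc))
    return exc


def call_with_retry(call: Callable[[], T], attempts: int = 3, delay: float = 0.5) -> T:
    """Retry time-outs with a growing delay; fail fast on 'service not found'."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception as exc:
            failure = classify(exc)
            if failure is exc:
                raise  # not an availability failure: propagate unchanged
            if isinstance(failure, ServiceTimedOut) and attempt < attempts:
                time.sleep(delay * attempt)  # linear back-off before retrying
                continue
            raise failure from exc
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Failing fast on a missing service matters because repeated retries would only delay the moment at which a real recovery method (such as searching for an alternative provider) can start.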
3.2 Concurrency Failures
Concurrency failures bring with them all the usual concurrency problems that exist in ordinary computer systems and networks. A service can be used by more than one client at any time, and this can cause concurrency problems if the service is being updated by a client or by the server. Other clients need to be informed about the update; otherwise clients using the service will receive inconsistent or corrupt data from it. These types of failures are difficult to detect unless the client is actually aware of the updates: as long as the data they receive looks correct to them, clients cannot tell that they are using an outdated service. These failures are not common, but they can cause big problems if they are not caught in time. In another scenario, a web service can be used as a resource that first needs to be acquired. This will not happen often, since it would not make much sense to create such a service. In addition, Tanenbaum et al. [2002] state that locking distributed resources is difficult and can lead to deadlock if not approached correctly.
3.3 Dependency Failures
Services are not limited to supplying information themselves. They can use other external services to gather the required information before passing it on to the client. Many problems can occur with such a technique. Messages can get lost between services, which can cause a service to conclude that the called service is no longer available. A service can also pass incompatible types to the services it calls (for example, sending string values where integer values are expected). This can cause the receiving service to misinterpret the incoming message and produce incorrect results. This type of error can be avoided when using a development environment, but, as stated in the previous sections, service providers can and will update their services periodically. These updates might include changing the types of the expected input data. Unless these changes are communicated to the clients, or to the services using the updated service, failures will occur. In the following example, I tried to invoke a service with an input of an incorrect type. The server was the only component to respond to this incorrect input; the service itself did not respond, because the server refused to invoke it.
Message handle error.
An exception occurred while attempting to process the message "com.collaxa.cube.engine.dispatch.message.invoke.InvokeInstanceMessage"; the exception is: XPath expression failed to execute.
Error while processing xpath expression, the expression is "((bpws:getVariableData("inputVariable", "payload", "/client:DummyService_3ProcessRequest/client:input") mod 2.0) = 1.0)", the reason is NaN is not an integer.
Please verify the xpath query.
Code Example 2 Error Message produced by server when incorrect types are used as input
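A simple defence against this kind of dependency failure is for the calling side to check the type of its input before invoking the service, so that a value such as "HELO" never reaches the XPath expression. The Python sketch below illustrates the idea under an assumption: that the called process expects an integer in its input element, as the expression "(input mod 2.0) = 1.0" in Code Example 2 suggests. Both function names are hypothetical, and the check runs on the client, not inside OBPMS.

```python
import math


def validate_dummy_input(raw: str) -> int:
    """Check, before invoking the service, that the input really is an integer.

    Assumption: the called process expects an integer in its <input> element,
    as the XPath expression in Code Example 2 suggests.
    """
    try:
        return int(raw.strip())
    except ValueError:
        raise TypeError(f"expected an integer input, got {raw!r}") from None


def xpath_is_odd(value: int) -> bool:
    """Mirror the XPath test (input mod 2) = 1 on the client side.

    math.fmod keeps the sign of the dividend, like the XPath mod operator,
    whereas Python's % operator would turn -3 % 2 into 1.
    """
    return math.fmod(value, 2) == 1.0
```

The design choice here is to reject a bad value before the message is sent, so that the failure is reported as a clear type error at the caller instead of as an XPath error deep inside the server.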
If the service had been set up correctly, in other words, if we had included catch branches and exception handlers, the service would have been invoked.
However, in a synchronous environment, if the necessary exception handlers are not present, the service would eventually throw a time out exception to inform the client that something went wrong. In an asynchronous environment, we would have to include catch branches to catch the exception.
3.4 Inconsistency Failures
Every now and again, a service provider might decide to change the descriptors of some of its services. These changes can affect access to the services positively or negatively. On the positive side, the new descriptors might enhance the use of the service; on the negative side, they might cause a service to become unavailable. Changes to a service's descriptor file can cause one of two major problems. First, if the descriptor is changed at run time, a client already using the service might get unexpected results because of the new descriptor file, and the service might behave differently from what it is supposed to do. Second, changes to the descriptor can break a service completely. This can happen if the descriptor file and the corresponding BPEL file are inconsistent, e.g. the BPEL file declares a variable whose message type is no longer defined in the descriptor file. The latter error should not happen often, since many software development environments provide checks and tools to prevent it. However, if a service provider chooses to do things manually, these errors can (and most probably will) occur. In the following code example, the descriptor has been changed, but the workflow file has been kept the same, which results in a failure. Most development environments will not allow the creation of such erroneous services. The code segments that cause the inconsistency are the messageType references in Code Example 3 and the message definitions in Code Example 4.
The BPEL process still expects the input and fault variables to be of the message types MapServiceRequestMessage and MapServiceFaultMessage respectively, whereas these messages have been renamed in the description file to MapServiceInvokedMessage and MapServiceErrorMessage. The outcome of such an error cannot be tested in the environment setup that I chose to work in, so the resulting behaviour is unknown.
<partnerLinks>
  <partnerLink name="client" partnerLinkType="tns:MapService" myRole="MapServiceProvider"/>
</partnerLinks>
<variables>
  <variable name="input" messageType="tns:MapServiceRequestMessage"/>
  <variable name="output" messageType="tns:MapServiceResponseMessage"/>
  <variable name="fault" messageType="tns:MapServiceFaultMessage"/>
</variables>
Code Example 3 BPEL Code from Example
<types>
  <schema attributeFormDefault="qualified" elementFormDefault="qualified" targetNamespace="http://services.otn.com" xmlns="http://www.w3.org/2001/XMLSchema">
    <element name="request" type="string"/>
    <element name="response" type="string"/>
    <element name="error" type="string"/>
  </schema>
</types>
<message name="MapServiceInvokedMessage">
  <part name="payload" element="tns:request"/>
</message>
<message name="MapServiceResponseMessage">
  <part name="payload" element="tns:response"/>
</message>
<message name="MapServiceErrorMessage">
  <part name="payload" element="tns:error"/>
</message>
<portType name="MapService">
  <operation name="process">
    <input message="tns:MapServiceInvokedMessage"/>
    <output message="tns:MapServiceResponseMessage"/>
    <fault name="MapNotFound" message="tns:MapServiceErrorMessage"/>
  </operation>
</portType>
Code Example 4 The Corresponding WSDL description
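Because development environments usually catch this kind of inconsistency but manual deployment does not, a small pre-deployment check can detect it. The Python sketch below (standard library only; the function and constant names are hypothetical) lists every messageType referenced by a BPEL variable that has no matching message in the WSDL. It compares local names only and assumes that the tns prefix refers to the WSDL's target namespace, which a complete checker would have to resolve properly. Applied to trimmed copies of Code Examples 3 and 4, it flags exactly the two renamed messages, MapServiceRequestMessage and MapServiceFaultMessage.

```python
import xml.etree.ElementTree as ET
from typing import List

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Trimmed copies of the relevant parts of Code Examples 3 and 4.
BPEL_VARIABLES = """<variables>
  <variable name="input" messageType="tns:MapServiceRequestMessage"/>
  <variable name="output" messageType="tns:MapServiceResponseMessage"/>
  <variable name="fault" messageType="tns:MapServiceFaultMessage"/>
</variables>"""

WSDL_MESSAGES = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/">
  <message name="MapServiceInvokedMessage"/>
  <message name="MapServiceResponseMessage"/>
  <message name="MapServiceErrorMessage"/>
</definitions>"""


def local_name(name: str) -> str:
    """Drop a '{namespace}' or 'prefix:' qualifier, e.g. 'tns:X' -> 'X'."""
    return name.rsplit("}", 1)[-1].split(":", 1)[-1]


def undefined_message_types(bpel: str, wsdl: str) -> List[str]:
    """List messageTypes used by BPEL variables but not defined in the WSDL."""
    defined = {m.get("name") for m in ET.fromstring(wsdl).iter(f"{{{WSDL_NS}}}message")}
    used = [
        local_name(v.get("messageType"))
        for v in ET.fromstring(bpel).iter()
        if local_name(v.tag) == "variable" and v.get("messageType")
    ]
    return [name for name in used if name not in defined]


if __name__ == "__main__":
    print(undefined_message_types(BPEL_VARIABLES, WSDL_MESSAGES))
```

A check of this kind could run whenever either file changes, so that a renamed message in the WSDL is reported before deployment rather than as an unknown run-time behaviour.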
3.5 Composition Failures
Failures can also happen during the composition phase, in which different services offering different information are made to work together. During composition, you need to be able to roll back from an error (i.e. recover to the state before the request started), and sometimes these rollbacks are incorrect or incomplete. In Section 4.1, I discuss a transaction-based approach to recovering from these types of errors. Services can also be composed incorrectly (they are forced to work together, but they cannot), and this can cause a serious problem from the client's perspective. These types of errors will not happen often, but an incorrect service can be used because of its incorrect description (I return to this problem in Section 4.2).
3.6 Partial Failures
Partial failures are closely linked to composition failures, since composition failures can cause partial failures. A partial failure occurs when, during the parallel execution of services, one of the branches cannot find the needed or requested service. This is not a major problem if only the output from one branch is needed, but you are then working with incomplete data. From the client's perspective this does not matter, since the client would not know the difference (unless all the branches fail), but the goal of a service is to give the most accurate data to the client invoking it. Composition failures, too, can go unnoticed by the client. Although such failures are not noticed, that does not mean they have no effect: if a service cannot supply the most accurate data, it is not good enough to use. Another form of partial failure arises when we need the results from all the branches of the parallel execution.
In some cases we need the results from all the branches to continue with the execution. If one of the branches fails, the system will still continue to completion, but with incomplete data, which causes the returned results to be incorrect or even corrupt. We can force the execution to stop if we do not have all the information needed to continue, but this would be unacceptable to a client using the service.
The following example clarifies this problem. As a client, we only have access to one service, or access point, of the composite service. The service we are using calls other services in parallel to gather the needed information. The following was observed when one of the required services was not found. Once again I show the resulting flow diagram (Figure 3) and the output (Code Examples 5 and 6) from OBPMS.
In OBPMS, this indicates a time out error. This will only happen when using synchronous services.
Figure 3 Flow Diagram of a Partial Failure
The response from the server and the corresponding code fragments obtained from OBPMS are shown below.
com.oracle.bpel.client.delivery.ReceiveTimeOutException: Waiting for response has timed out. The conversation id is 455aa7269f0030c5:149d886:10e5fc38efa:-7ffc. Please check the process instance for detail.
Code Example 5 Time out Exception from the BPEL Server
<sequence>
Assign_2
[2006/10/19 10:52:47] Updated variable "invokeDummy_initiate_InputVariable"
<invokeDummy_initiate_InputVariable>
  <part xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" name="payload">
    <DummyService_2ProcessRequest xmlns="http://xmlns.oracle.com/DummyService_2">
      <input>HELO</input>
    </DummyService_2ProcessRequest>
  </part>
</invokeDummy_initiate_InputVariable>
invokeDymmy
[2006/10/19 10:52:48] Invoked 1-way operation "initiate" on partner "Dummy2".
<invokeDummy_initiate_InputVariable>
  <part xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" name="payload">
    <DummyService_2ProcessRequest xmlns="http://xmlns.oracle.com/DummyService_2">
      <input>HELO</input>
    </DummyService_2ProcessRequest>
  </part>
</invokeDummy_initiate_InputVariable>
receiveDummy - pending
[2006/10/19 10:52:49] Waiting for "onResult" from "Dummy2". Asynchronous callback.
Code Example 6 Time out Exception shown in the BPEL Console
This example required the output from both branches in order to complete the execution of the process. In this example I used a synchronous service instead of an asynchronous one. A time-out error will only occur when using synchronous services; an asynchronous service will sit idle and wait indefinitely for a result without raising a time-out exception. By including catch blocks in the service, we can catch and handle these errors instead of letting the process fail. These methods are discussed in Section 4.
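The situation above, where one parallel branch never answers, can be sketched outside BPEL. The following Python sketch is an illustration under stated assumptions, not OBPMS code: `upper_branch` and `slow_branch` are hypothetical stand-ins for partner services, the delay and time-out values are arbitrary, and `run_parallel` is a hypothetical helper. All branches share one deadline; a branch that misses it is recorded as a partial failure, and the caller decides whether a missing result is fatal (`require_all=True`, as in the example above where both outputs are needed) or can be tolerated.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def upper_branch(text: str) -> str:
    """Hypothetical partner service that answers immediately."""
    return text.upper()


def slow_branch(text: str) -> str:
    """Hypothetical partner service that answers too late (simulated delay)."""
    time.sleep(0.3)
    return text.lower()


def run_parallel(text, branches, timeout_s, require_all=True):
    """Invoke every branch in parallel and wait at most timeout_s in total.

    Returns (results, failed), where failed lists the branches that timed out.
    With require_all=True a missing branch is treated as a fatal partial
    failure instead of silently continuing with incomplete data.
    """
    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = {name: pool.submit(fn, text) for name, fn in branches.items()}
        deadline = time.monotonic() + timeout_s
        for name, future in futures.items():
            remaining = max(0.0, deadline - time.monotonic())
            try:
                results[name] = future.result(timeout=remaining)
            except FutureTimeout:
                failed.append(name)
    # Leaving the with-block waits for late branches to finish,
    # but their results are no longer used.
    if failed and require_all:
        raise RuntimeError(f"partial failure, no result from: {failed}")
    return results, failed


branches = {"upper": upper_branch, "slow": slow_branch}
results, failed = run_parallel("HELO", branches, timeout_s=0.1, require_all=False)
print(results, failed)  # the "slow" branch is reported as failed
```

Making the "all results required" decision explicit is the point of the design: the process either stops with a clear error or knowingly continues with partial data.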
3.7 Failures due to Ambiguous Output
In rare cases, services can be composed in such a way that they provide a user with more than one response to a single request. This is undesirable, since we want one unique response from a service for a given input. Even though some tools will not allow this type of service to be deployed, such services can still exist if they are created without the help of such a tool.
Figure 4 Flow Diagram of a process showing Ambiguous Output
As mentioned above, some tools will not allow these types of services to be deployed, and that is also the case with JDeveloper 10g [1]. These services can be created, but they are usually riddled with errors. Figure 4 shows how such a process might look. This example takes a string as input and delivers two outputs: the string in upper case and the string in lower case. This service was deployed onto the server, but failed to run to completion. In a real-world scenario, this service would be able to run, but the output would be determined by the speed with which each branch executes: the slowest branch's output would overwrite the other and be the output that is displayed, unless two output variables are defined (which is very difficult to do). In addition, according to Ouyang et al. [2005], a BPEL process must not have two or more simultaneously enabled receive actions on the same partner link, port type, operation and correlation set(s). This means that we cannot have two or more input or output ports that are using the same variable. This rule is also stated in the BPEL specification. However, this type of error does still sometimes occur in real-world services.
3.8 Other Failures
There are other failures that can occur when using Web Services that do not fall into any of the categories mentioned above. Quality of Service (QoS) problems and Service Level Agreement (SLA) problems are among the most common of these. However, both types of failures can be traced back to one of the failures described above. A Quality of Service problem results in a service that is slow to react, or that gives back results that are correct but not up to standard. The only way to fix this problem is to rebind to a new service. In Yu & Lin [2005], the authors describe how to rebind to a new service that will deliver a better Quality of Service. A Service Level Agreement error results in the use of an incorrect service.
As described briefly in Section 2.1, services need to advertise themselves. If these descriptions are incorrect, we might end up using a service that delivers faulty and incorrect data. With a Service Level Agreement, we enter into a contract that promises us the correct data, according to the description of the service. If this description was incorrect to begin with, the agreement is void, and we end up bound to an incorrect service. Once again, the only way to fix this is to rebind to another service, but we might end up rebinding to another faulty service. There are ways to ensure that the services we rebind to are correct; these are discussed in the next section.
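Rebinding on the basis of QoS can be illustrated with a small selection function. This is a simplified sketch and not the algorithm of Yu & Lin [2005]: it assumes that each candidate service advertises a response time (in milliseconds), a price and an availability figure, filters out candidates that violate the client's constraints, excludes the service that just failed, and picks the best remaining candidate by a weighted score. The `Candidate` type, the `rebind` function, the weights and the service names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical advertised QoS of one functionally equivalent service."""
    name: str
    response_ms: float
    price: float
    availability: float  # fraction of successful invocations, 0..1


def rebind(candidates, failed_name, max_response_ms, max_price,
           w_time=0.5, w_price=0.2, w_avail=0.3) -> Optional[Candidate]:
    """Pick a replacement service that satisfies the QoS constraints.

    Lower response time and price are better; higher availability is better.
    Returns None when no candidate satisfies the constraints.
    """
    feasible = [c for c in candidates
                if c.name != failed_name
                and c.response_ms <= max_response_ms
                and c.price <= max_price]
    if not feasible:
        return None

    def score(c: Candidate) -> float:
        # Normalise each attribute against its constraint so they are comparable.
        return (w_time * (1 - c.response_ms / max_response_ms)
                + w_price * (1 - c.price / max_price)
                + w_avail * c.availability)

    return max(feasible, key=score)


services = [
    Candidate("TranslateA", response_ms=120, price=1.0, availability=0.99),
    Candidate("TranslateB", response_ms=400, price=0.5, availability=0.95),
    Candidate("TranslateC", response_ms=900, price=0.1, availability=0.90),
]
# TranslateA has failed; TranslateC is too slow for the 500 ms constraint.
print(rebind(services, failed_name="TranslateA", max_response_ms=500, max_price=2.0))
```

Note that the sketch trusts the advertised figures, so on its own it does nothing against the incorrectly advertised services discussed above.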
4 Possible Recovery Methods

Because failures can, and will, occur when using web services, various methods for recovering from them have been researched. These include transactional methods, self-healing networks and the use of QoS constraints as a heuristic in the dynamic composition of services. Tanenbaum & van Steen [2002] and Tartanoglu et al. [2006] divide recovery methods into two categories: backward error recovery and forward error recovery. Backward error recovery involves rolling back to a previously correct state and retrying the operation; this approach is followed in transactions. Forward error recovery tries to recover from an erroneous state by transforming it into a new safe state; this approach is followed by self-healing networks. In this section I give an overview of some of the proposed methods of error recovery for web services, without attempting a classification of them. I also look at some trivial methods (like caching).
4.1 Transactional Approach
According to Tanenbaum & van Steen [2002], a transaction is an operation with an all-or-nothing property. Transactions are usually characterised by the ACID properties: Atomic, Consistent, Isolated and Durable.
Atomic: the transaction appears indivisible to the outside world; either all of it happens or none of it does.
Consistent: the transaction does not violate any invariants of the system.
Isolated: concurrent transactions do not interfere with each other; they appear to happen sequentially.
Durable: once a transaction commits, its changes are permanent, even if the system crashes after the commit.
They also distinguish three types of transactions: flat transactions, nested transactions and distributed transactions. A flat transaction is a normal transaction that only commits once its main goal has been reached; this is what is normally meant when speaking of transactions in general. Nested and distributed transactions are discussed later in this section and usually apply to systems spread across a network. During a transaction, an operation can only be started once all the resources it requires have been acquired. Once these resources have been acquired, which usually means that they have been locked by the acquiring process, the transaction runs to completion before releasing them. It also only makes changes to the acquired resources permanent once the transaction has completed successfully. According to Mikalsen et al. [2002], a transactional approach can be used successfully to recover from failures. Many architectures already support this model, since it is quite easy to understand and use. The basic idea behind a transactional approach is: only commit when every sub-goal has completed successfully. Using the common example of the travel domain, a transactional approach works as follows. A client sends a request to a travel agent's service, requesting several bookings. The system then finds the relevant details for each sub-goal (booking a flight, booking a hotel, etc.). As soon as one sub-goal cannot be completed, for example when a flight cannot be booked, the system stops and rolls back to the state before the request was made. This roll-back undoes all actions performed, and thus resets the state to just before the request. The client can then restart the request with different parameters. Sometimes, however, this complete roll-back is undesirable.
If the flight cannot be booked because its server is unreachable, we will not be able to complete the transaction without changing the service we are using. A less strict approach is to commit after certain sub-goals have been reached, which gives the client more control over when to commit. In the example above, we can set up the system so that it can commit after each sub-goal is reached. If one of the sub-goals fails, we can still do a partial commit and complete the transaction in some other way (filling in the missing information).
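The difference between the all-or-nothing behaviour and the less strict, partial-commit behaviour can be sketched in Python. This is an illustration only, assuming that each sub-goal (hotel, flight, car rental) is a hypothetical booking function paired with an undo (compensation) function; `run_booking` and `make_sub_goal` are hypothetical names, and real Web Service transaction frameworks are considerably more involved. In strict mode, a failing sub-goal undoes every completed sub-goal in reverse order; in partial mode, completed sub-goals are kept and the failed ones are reported so that they can be completed in some other way.

```python
class BookingError(Exception):
    """Raised by a (hypothetical) booking service when a sub-goal fails."""


def run_booking(sub_goals, strict=True):
    """Execute the sub-goals of a request in order.

    sub_goals is a list of (name, do, undo) tuples: do() performs a booking
    and undo() cancels it. Returns (committed, failed) lists of names.
    strict=True gives all-or-nothing behaviour; strict=False keeps the
    sub-goals that succeeded (a partial commit).
    """
    done = []    # (name, undo) for every sub-goal completed so far
    failed = []
    for name, do, undo in sub_goals:
        try:
            do()
            done.append((name, undo))
        except BookingError:
            if strict:
                # Roll back to the state before the request, newest first.
                for _, undo_done in reversed(done):
                    undo_done()
                return [], [name]
            failed.append(name)  # keep going; this gap is filled in later
    return [name for name, _ in done], failed


# Hypothetical in-memory "services" for the travel example.
bookings = set()


def make_sub_goal(item, available=True):
    def do():
        if not available:
            raise BookingError(f"{item} unavailable")
        bookings.add(item)

    def undo():
        bookings.discard(item)

    return item, do, undo


trip = [make_sub_goal("hotel"), make_sub_goal("flight", available=False),
        make_sub_goal("car")]
print(run_booking(trip, strict=True), bookings)   # everything rolled back
print(run_booking(trip, strict=False), bookings)  # hotel and car kept
```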
Using the same example, if the system is unable to book a flight, the client can still commit to the hotel bookings and the car rental bookings. The client can then choose to do the flight booking manually, or let the system search for other flights that will also reach his final destination on time. In this way, the system only rolls back to the start of the failed sub-goal, instead of rolling back completely. Such an approach is called a nested transaction (Tanenbaum & van Steen [2002]). Another approach would be to use a distributed transaction, but this would not be satisfactory. A distributed transaction is treated as a normal transaction, with the difference that the resources are spread across a network. We still lock resources and perform the transaction as if it were a normal flat transaction on a non-distributed platform, which can cause problems and be difficult to manage. According to Tartanoglu et al. [2006], a transaction-based approach is not well suited to the composition of Web Services, mainly for two reasons. First, transaction management becomes more difficult in a distributed system: it requires cooperation among the transactional support of the individual Web Services, which may not be compatible with each other, or whose providers may not be willing to cooperate.
Second, a transaction-based approach usually involves locking resources until you are done with them. In a Web Services environment this is not really feasible, since the resources belong to autonomous providers and a composite request may take a long time to complete. Overall, though, this type of error recovery is a good method to use. It has been proven to work in many different domains already, and a transaction-based framework already exists for Web Services. Figure 5 illustrates the basic idea of a transaction-based approach using a simple flow diagram.
Figure 5 Flow Diagram of Transactional-Based Approach
4.2 Dynamic Web Services
There are two ways in which services can be composed: statically and dynamically. Static composition is the easiest and most stable method of composing web services. Services are bound to each other when the service is compiled, and the bindings stay the same until the need arises for them to change. However, we live in a dynamic world: services change periodically to reflect new information or data. As a result, statically bound services become useless unless the updated service's interface stays the same. To overcome this problem, we can use dynamic composition, which happens at run time: services are bound to other services on the fly, based on their WSDL descriptions and ontological annotations. Dependency and composition failures can often be solved by this method. There is a problem with this method, though: if a service advertises itself as something that it is not, dynamic composition might result in the use of incorrect services. For example, if we are looking for a translation service and end up using a service that advertised itself as a translation service but is actually an exchange-rate service, our results will be totally incorrect. When using dynamic composition, we cannot detect such a problem until it is too late. We can use other recovery methods, in conjunction with dynamic composition, to solve these problems more effectively.
4.3 Language Constructs
BPEL and WSDL provide some support for error handling: we can include catch branches and catch exceptions (faults) as they occur. These constructs can only catch the exceptions that have been defined, but they are still useful. We can force a process to complete (even with incomplete data) by using these catch constructs. For example, if we are expecting an integer value and the service gets a string value, we can use the catch block to substitute a default value for the incoming value. We can only do so if the value is not important or needed for the completion of the process, and in most cases such a solution will not do.
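The default-substitution idea can be mirrored in Python to make its limits explicit. This sketch is an illustration under assumptions, not BPEL: the hypothetical `parse_quantity` plays the role of the process step that expects an integer, `ValueError`/`TypeError` play the role of the declared fault, and the default is used only when the caller has marked the value as non-essential. Otherwise the fault is propagated, since silently substituting an essential value would produce incorrect results.

```python
def parse_quantity(raw, default=None, essential=True):
    """Convert an incoming value to int, as a step expecting an integer would.

    If conversion fails and the value is not essential, substitute the default
    (the "catch branch"); otherwise re-raise so the failure is not hidden.
    """
    try:
        return int(raw)
    except (TypeError, ValueError):
        if essential or default is None:
            raise
        return default


print(parse_quantity("3"), parse_quantity("three", default=1, essential=False))  # prints: 3 1
```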
Figure 6 Catch Branch in OBPMS
However, we can use a catch branch to recover safely from an erroneous state. Instead of just throwing an exception, we can use the catch branch to catch the fault and return a user-friendly message informing the client that something went wrong. The example in Figure 6 uses a catch branch. In Code Example 7 below, the catch branch is placed in the faultHandlers section of the scope. If the shop search raises an ns1:ShopNotFound fault (for example, because the input is incorrect), the catch branch is invoked and a default assignment is made: a default error message is copied to the output variable.
<scope name="shopScope">
  <faultHandlers>
    <catch faultName="ns1:ShopNotFound">
      <sequence name="Sequence_3">
        <assign name="assignErrorMsg">
          <copy>
            <from expression="'Shop Not Found'"/>
            <to variable="clientOutput" part="payload"
                query="/client:ShopFinderProcessResponse/client:result"/>
          </copy>
        </assign>
      </sequence>
    </catch>
  </faultHandlers>
  <sequence name="Sequence_1">
    <assign name="shopInputAssign">
      <copy>
        <from variable="clientInput" part="payload"
              query="/client:ShopFinderProcessRequest/client:input"/>
        <to variable="shopInput" part="payload" query="/ns1:shopdef"/>
      </copy>
    </assign>
    <invoke name="searchShop" partnerLink="ShopSearch" portType="ns1:ShopServiceV2"
            operation="process" inputVariable="shopInput" outputVariable="shopOutput"/>
  </sequence>
</scope>
Code Example 7 Catch Branch In BPEL
4.4 Self-healing Networks
This is the area that has received most of the research attention so far. Many researchers try to come up with new ways in which a network can heal itself without user intervention. Yu & Lin [2005] use a form of self-healing network in their paper, combined with QoS constraints as a heuristic. Baresi et al. [2006] also propose the use of self-healing networks. But what is a self-healing network?
Self-healing networks are networks that are capable of recovering from errors by themselves. In a Web Services context, they are networks that can recover from composition faults by themselves. This is done by using an external mechanism that monitors the network's behaviour. Different self-healing strategies have already been proposed. Some strategies use QoS constraints as a way of ensuring stability when composing Web Services (Yu & Lin [2005]). Baresi et al. [2006] propose a strategy based on design by contract (a concept borrowed from the Eiffel language). In their strategy, you can set pre- and post-conditions that have to be met (similar to QoS constraints), and they also weave in monitoring code that monitors the workflow and checks the pre- and post-conditions of the services invoked. Since Web Services live in a very dynamic environment, self-healing networks might be the way to go in the future. Baresi et al. [2006] use the example of a pizza company to explain their concepts; the flow diagram in Figure 7 is taken from their paper. In the example, a client uses a web site or a WAP-enabled phone to contact the pizza company. The client is then authenticated, after which his profile is loaded. This profile holds information about the client's favourite pizzas. The Pizza Catalogue Service then offers the client a choice of four different pizzas. When the client has made his choice, his credit card details are validated by the Credit Card Validation Web Service. If everything goes according to plan, the client's account is debited and the pizza company's account is credited. At the same time, the order appears in the browser of the pizza chef, informing him of the new order. In parallel, the address of the client is obtained from the Phone Company Service, and the GPS Web Service is then called to obtain the precise coordinates of the address.
Once the coordinates are obtained, a map is retrieved from the Map Web Service. After this has completed successfully, the map is sent to the delivery boy's PDA, and an SMS is sent to the client informing him that his pizza will be delivered in 20 minutes. Various failures can occur in this example, and because dynamic composition is used, failures are bound to happen.
Figure 7 Flow diagram of Pizza Company
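The pre- and post-condition checking described above can be sketched as a wrapper around a service call. This is not the monitoring code of Baresi et al. [2006], which is woven into BPEL processes; it is a minimal Python illustration that assumes a hypothetical `gps_lookup` stub for the GPS Web Service of the pizza example, which must receive a non-empty address and must return a latitude and longitude within valid ranges. The `monitored` decorator and `ContractViolation` exception are also hypothetical; a violated condition raises `ContractViolation`, which a process engine could treat as the signal to start recovery.

```python
import functools


class ContractViolation(Exception):
    """Signals that a monitored service broke its pre- or post-condition."""


def monitored(pre, post):
    """Decorator that checks pre(*args) before and post(result) after a call."""
    def wrap(service):
        @functools.wraps(service)
        def call(*args, **kwargs):
            if not pre(*args, **kwargs):
                raise ContractViolation(f"pre-condition of {service.__name__} violated")
            result = service(*args, **kwargs)
            if not post(result):
                raise ContractViolation(f"post-condition of {service.__name__} violated")
            return result
        return call
    return wrap


def valid_coordinates(result):
    """Post-condition: a (latitude, longitude) pair within valid ranges."""
    lat, lon = result
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


@monitored(pre=lambda address: bool(address.strip()), post=valid_coordinates)
def gps_lookup(address):
    """Hypothetical GPS Web Service stub; a real call would go over the network."""
    known = {"1 Main Road, Pretoria": (-25.75, 28.19)}
    return known.get(address, (999.0, 999.0))  # a faulty service returns junk


print(gps_lookup("1 Main Road, Pretoria"))
```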
In the paper, the authors propose two failure detection methods and three recovery methods. The two detection methods are briefly discussed in Section 5. The three recovery methods proposed by Baresi et al. [2006] are:
Retry: if a binding to a service failed, we retry it in the hope that it was a one-off failure.
Dynamically bind to another service: we rebind to another service that offers the same functional or non-functional properties as the one that is unavailable.
Process reorganization: a dynamic reorganization of the process at run time, in order to overcome the problems due to a faulty or unavailable external service for which no alternative matching service can be found.
These methods can be structured hierarchically. If a service cannot be reached, we first retry the service a few times. If that strategy doesn't work, we rebind to another service. If that also fails, we switch to the most complex recovery method, namely process reorganization. In process reorganization, we can locally reorganize services if we cannot rebind to another service that offers the same properties as the unavailable service. This is done using graph transformation rules, with which we can split single nodes into parallel and disjoint nodes, and also combine parallel nodes into single nodes. The transformation must preserve the pre- and post-conditions of the part of the process being transformed. For example, if a single node n is split into two nodes n1 and n2, the pre-condition of n1 is the same as that of n, and the post-condition of n2 is the same as that of n; in addition, the post-condition of n1 must imply the pre-condition of n2, so that n2 can run once n1 has completed. As a more concrete example, if the "Get Map and Route" service cannot return a map of the correct resolution for the PDAs, we can split that service into two services, "Get Good Map and Route" and "Filter Map". "Get Good Map and Route" returns a high-resolution map, and "Filter Map" scales the map down to the proper resolution for the PDAs.
4.5 Trivial Recovery Methods
There are some trivial recovery methods that can be used. A good one is caching: clients can cache previously retrieved information and recall it when the service cannot be found (Figure 8), or when some failure occurred during the request.
This is only useful if the service offers information that does not change often (e.g. a service giving information about bus times). Where the information changes very often (e.g. a service offering the latest stock exchange information), this approach would be useless, since stale information would not help the client. Another trivial method is to keep requesting the information until it is received, or until a specified time-out is reached. This type of error recovery is the easiest, but it is the least desirable of all recovery methods, since clients do not want to wait for a service to respond to a request. Clients would prefer to use the quickest and most accurate service, which provides results in a fast and reliable manner.
Figure 8 Flow Diagram of a Trivial Recovery Method
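The recovery strategies of this section can be chained, in the spirit of the hierarchy described for Baresi et al. [2006] in Section 4.4: retry the same service a few times, then rebind to an equivalent service, and, as the trivial last resort of Figure 8, fall back on cached data. The following Python sketch assumes that services are plain callables that raise a hypothetical `ServiceUnavailable` exception when they cannot be reached; `invoke_with_recovery`, the bus-times services, the retry count and the back-off delay are all hypothetical. Process reorganization is left out because it requires rewriting the process graph.

```python
import time


class ServiceUnavailable(Exception):
    """Raised by a (hypothetical) service stub that cannot be reached."""


def invoke_with_recovery(primary, alternatives, request, cache,
                         retries=3, delay_s=0.01):
    """Return (result, source), trying retry, then rebind, then the cache."""
    # 1. Retry the bound service, in case the failure was a one-off.
    for attempt in range(retries):
        try:
            result = primary(request)
            cache[request] = result  # remember the latest good answer
            return result, "primary"
        except ServiceUnavailable:
            if attempt < retries - 1:
                time.sleep(delay_s * (attempt + 1))  # linear back-off
    # 2. Rebind to functionally equivalent services, in order of preference.
    for alternative in alternatives:
        try:
            result = alternative(request)
            cache[request] = result
            return result, alternative.__name__
        except ServiceUnavailable:
            continue
    # 3. Trivial recovery: possibly stale data from an earlier call.
    if request in cache:
        return cache[request], "cache"
    raise ServiceUnavailable(f"no service or cached data for {request!r}")


# Hypothetical bus-times services (bus times change rarely, so caching is acceptable).
def primary_bus_times(route):
    raise ServiceUnavailable("primary bus-times service is down")


def backup_bus_times(route):
    return ["08:00", "08:30"]


cache = {}
print(invoke_with_recovery(primary_bus_times, [backup_bus_times], "Route 5", cache))
print(invoke_with_recovery(primary_bus_times, [], "Route 5", cache))
```

As the text notes, the cache step is only appropriate for slowly changing data; for stock prices it should be dropped.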
5 Failure Detection
Having classified some of the most common failures and discussed some of the most common recovery methods, we need a way to detect whether a failure has occurred in order to bring the two together. Failure detection algorithms are used in self-healing networks to detect whether something went wrong during the composition phase. There are various ways in which this can be done, but these techniques can be split into two main categories: dynamic detection and static detection of errors. Dynamic detection means that the error or failure is detected during execution, or run time. Static detection means that errors or failures are detected offline (in other words, not during run time). Baresi et al. [2006] propose two methods, called Defensive Process Design (DPD) and Service run-time Monitoring (SrtM), which are both forms of dynamic detection. Ouyang et al. [2005] propose an automated analysis using Petri net techniques, which is a form of static detection. Since this is not the main focus of this document, these methods are only discussed briefly in the following subsections.
5.1 Defensive Process Design
According to Baresi et al. [2006], Defensive Process Design (DPD) consists of designing services so that they can cope with failures. This is done by using some of the language constructs included in the BPEL standard. By designing services in this way, we can detect and gracefully recover from most exceptions and failures. For example, a time-out failure can be detected by encapsulating the invoke action in a scope that has a timer. Once the timer has run out, the service can recover from the time-out by calling another service, rebinding, or even retrying the same service. This type of detection ties in with Section 4.3, since we can use exception handlers and catch blocks to detect when an error has occurred. BPEL also provides other constructs that help with the detection of failures.
5.2 Service run-time Monitoring
Service run-time Monitoring (SrtM) consists of using external monitoring tools to check whether functional and non-functional contracts are violated. Various methods can be used to monitor services. Baresi et al. [2006] propose an assertion-based approach.
In their approach, pre- and post-conditions are specified for calls to remote services. These are checked by a separate tool; if a pre- or post-condition has been violated, the tool notifies the process engine, which takes the appropriate actions to recover from the error. The ASTRO tool set (Trainotti et al. [2005]) uses a similar method in its WS-mon component. The difference is that the monitoring code is generated automatically by ASTRO, and it is Java code that monitors the services.
5.3 WofBPEL
Ouyang et al. [2005] propose a technique based on Petri net analysis. They propose the use of an external tool, WofBPEL, which can analyse composite services once they have been translated into the Petri Net Markup Language (PNML). Unlike the previous two methods, which analyse service composition dynamically, this technique analyses service composition statically, in an offline fashion: a composite service needs to be translated into a secondary language before it can be analysed for errors. At the time the article was written, the tool supported only three types of analysis: detection of unreachable actions, detection of conflicting message-consuming activities, and metadata generation for the garbage collection of unconsumable messages.
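Static detection can be illustrated on a much smaller scale than WofBPEL. The following Python sketch does not use Petri nets; it is a simplified approximation that parses a BPEL fragment with the standard library and reports `receive` activities that sit in different branches of the same `flow` (and so may be enabled at the same time) while sharing partner link, port type and operation, the situation that Section 3.7 notes is forbidden. Correlation sets and nested scopes are ignored; the sample process, its names and the `conflicting_receives` function are hypothetical, and the namespace is assumed to be the BPEL4WS 1.1 one.

```python
import itertools
import xml.etree.ElementTree as ET

# Namespace of BPEL4WS 1.1 process definitions.
BPEL_NS = "http://schemas.xmlsoap.org/ws/2003/03/business-process/"


def receive_key(receive):
    """Identify a receive by partner link, port type and operation."""
    return (receive.get("partnerLink"), receive.get("portType"),
            receive.get("operation"))


def conflicting_receives(bpel_xml):
    """List pairs of receives in parallel branches of a flow with equal keys."""
    root = ET.fromstring(bpel_xml)
    conflicts = []
    for flow in root.iter(f"{{{BPEL_NS}}}flow"):
        # Each direct child of a <flow> is a branch that may run concurrently.
        for branch_a, branch_b in itertools.combinations(list(flow), 2):
            receives_a = list(branch_a.iter(f"{{{BPEL_NS}}}receive"))
            receives_b = list(branch_b.iter(f"{{{BPEL_NS}}}receive"))
            for ra, rb in itertools.product(receives_a, receives_b):
                if receive_key(ra) == receive_key(rb):
                    conflicts.append((ra.get("name"), rb.get("name")))
    return conflicts


# Hypothetical process with two branches waiting on the same operation.
SAMPLE = f"""<process xmlns="{BPEL_NS}" name="AmbiguousEcho">
  <flow name="Flow_1">
    <sequence name="UpperBranch">
      <receive name="receiveUpper" partnerLink="client" portType="tns:Echo" operation="process"/>
    </sequence>
    <sequence name="LowerBranch">
      <receive name="receiveLower" partnerLink="client" portType="tns:Echo" operation="process"/>
    </sequence>
  </flow>
</process>"""

print(conflicting_receives(SAMPLE))
```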
6 Three Scenarios
In this section I introduce three scenarios in which service-oriented computing (SOC) can be used for a real-world implementation. There are many different applications of SOC, some very big and some relatively small. With these scenarios I try to cover a wide spectrum, from a smaller implementation (Foreign Traveller Information) to a large-scale implementation (the General Entertainment Planner).
6.1 Foreign Traveller Information
The idea here is that you are a tourist who has just landed in a foreign country. You want to get information about transport options to and from your hotel. In this example, information about bus times, stations and prices can be accessed from a mobile device or a laptop. This is done by using different services (one for bus times, another for geographical information, etc.). The main program finds suitable services to use and composes the received data in a meaningful way for the client using the program. Many services are involved, but only a small amount of data is needed from them in the end. See Figure 9. This can be related to a real-world scenario. A university professor, on his way back from a conference, misses his connecting flight because of a delayed flight from his previous destination. He enquires about other flights and finds out that all flights to his final destination are fully booked, and that the next available flight is only the following night. Now the professor has a problem: it is late at night and he needs to book a flight and also a hotel for the night. Thankfully, there are various web sites that the professor can visit to make these bookings, and these web sites often use Web Services to gather information. So the professor goes to a web site that allows him to make a hotel reservation.
Figure 9 Foreign Traveller Information
The site gathers information from all the local hotels and displays it to the professor so that he can make an informed choice. He also visits a web site to book his flight for the following evening. Thanks to Web Services, the day was saved: the professor got a good night's rest and got home safely on the later flight that he booked using the web sites.
6.2 General Entertainment Planner
In this example, a user can plan his night out by finding information about nearby entertainment complexes. A user will be able to find out, for example, which movies are showing at cinema complexes and at what times. He can also get directions to these cinemas from his current location. Other information that users will be able to access includes information about restaurants, pubs, clubs, bars and other entertainment hubs. All of this information must be obtained from various sources so that the user can plan his night. You will need information on each place's location (geographical information, so that the user can get maps to these places), information about the specific places (prices, atmosphere, type of place, etc.), and probably some sort of translation service so that the information can be displayed in various languages. This once again involves many different services from different sources, and in the end the information obtained from these services must be composed in a meaningful way. See Figure 10.
6.3 Mall Information System
In this example the idea is very simple. A user wants to locate the nearest branch of a specific shop in his area (for example a stationery shop such as CNA). He also wants to know whether the shop will have what he is looking for, and how to get there. The user must be able to access this information from his home computer as well as his mobile phone (or other mobile device). This requires that the system can find information about malls and the shops they contain. It also needs geographical information so that it can give the user directions to the mall. Instead of giving the user a map, the system must be able to give the user directions in a descriptive manner.
This example once again needs information from different services, but this time it is on a smaller scale. The system only needs to provide the user with a list of shopping malls where the shop can be found, and directions to the nearest one (or one chosen by the user). See Figure 11.
Figure 10 General Entertainment Planner
Figure 11 Mall Information System
7 Example Scenario: Shopping Domain

Shopping centres are being built everywhere nowadays, and they are getting bigger and bigger. Many centres, though, do not have all the shops that you would want to visit. Although almost every major shopping centre has a web site with a store directory, not many of us take the time to go onto the internet and find out which shopping centre contains a particular store. It would be much simpler to use your cell phone to get information about a shopping centre. Furthermore, not many of us know where all of the major shopping centres are. In a perfect world we would all know the way to each of them as well as which stores each one has. In practice this is impossible, firstly because there are too many shopping centres, and secondly because shopping centres evolve and change: older stores close to make way for newer ones, so the store directory keeps changing. The system I propose will help frequent shoppers know exactly where to go and what to expect. In concept, the system is very simple. The customer uses his cell phone or his computer (or any other mobile device) to gather the required information. A program on each device connects to the necessary services and returns the results in a meaningful way. It is the responsibility of the program to do error handling and recovery. Many different languages exist that can be used to describe a web service, and almost all of them are derived from XML. Depending on the type of description we want, we can describe a service using any one of the following standards:
Web Services Description Language (WSDL)
OWL and OWL-S
DAML and DAML-S
Each of these languages brings its own method of describing a service. WSDL mainly describes the interface, and can also contain a short description of the service. It describes the interface as a set of end-points operating on messages; these messages are described abstractly and are bound to concrete network protocols. OWL (through OWL-S) describes the semantics of the service. It is often used to describe the ontology of the service, in other words what the service does and how it behaves. For the example, we will use WSDL as the description language. Different workflow languages also exist. Some of the ones that have been proposed are:
BPEL (Business Process Execution Language)
WSFL (Web Services Flow Language)
XLANG (Web Services for Business Process Design)
WSCI (Web Service Choreography Interface)
BPML (Business Process Markup Language)
BPSS (Business Process Specification Schema)
All of these languages have their own characteristics. According to van der Aalst [2003], XLANG is block-structured, with basic control-flow structures. WSFL, on the other hand, is not limited to block structures and allows for directed graphs. It mainly describes Web Service composition and considers two types of composition: usage patterns and interaction patterns. Usage patterns are concerned with how to achieve a particular goal, and interaction patterns are concerned with how a collection of Web Services interact. BPEL builds on both of these languages (XLANG and WSFL) and therefore supports most of the constructs of both. It provides a programming abstraction that allows developers to compose multiple discrete Web Services into an end-to-end process flow. The other languages (WSCI, BPML and BPSS) are quite new, and they have not yet caught on as standards for Web Services. We will use BPEL as the flow language. WSDL and BPEL were chosen for their ease of use, and also because my development platform (Oracle JDeveloper 10g [1]) only allows me to use these two languages. To simulate the use of this system and its ability to recover from a failure, the services that are used will be fake services that I created in JDeveloper 10g [1]. These services only return the information the system needs. This setup allows me to break a service deliberately, so that the system can then start the recovery process.
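The fake services can be pictured as simple stand-ins that return canned data until they are deliberately broken. The following Java sketch illustrates that idea only. The class `DummyService` and its methods `failNext`, `takeDown` and `repair` are hypothetical; they are not the BPEL services actually built in JDeveloper. Separating a transient fault (the next few calls fail) from a permanent one (the service stays down) lets a test exercise both recoverable and unrecoverable failures.

```java
/**
 * Hypothetical stand-in for one of the fake services: it returns canned data,
 * but faults can be injected on demand to trigger the client's recovery logic.
 */
class DummyService {
    private final String name;
    private final String cannedResponse;
    private int failuresLeft = 0;   // number of upcoming calls that should fail
    private boolean down = false;   // true while the service is broken indefinitely

    DummyService(String name, String cannedResponse) {
        this.name = name;
        this.cannedResponse = cannedResponse;
    }

    /** Makes the next n invocations fail, simulating a transient fault. */
    void failNext(int n) {
        failuresLeft = n;
    }

    /** Breaks the service until repair() is called, simulating an unrecoverable fault. */
    void takeDown() {
        down = true;
    }

    /** Restores normal behaviour and clears any pending injected faults. */
    void repair() {
        down = false;
        failuresLeft = 0;
    }

    /** Simulates one invocation: either the canned answer or a fault the client must handle. */
    String invoke(String request) {
        if (down) {
            throw new IllegalStateException(name + " is unavailable");
        }
        if (failuresLeft > 0) {
            failuresLeft--;
            throw new IllegalStateException(name + " failed (injected fault)");
        }
        return cannedResponse + ": " + request;
    }
}
```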
[Figure 12: Get Shopping Center Listing → Shopping Center Listing Retrieved? (Yes/No) → Check Sub-Goal Here → Get City Map → City Map Retrieved? (Yes/No) → Check Sub-Goal Here → Display Retrieved Data]
Figure 12 Flow Diagram showing where Sub-Goals will be Checked
Although there are many different recovery methods, the most practical one to use when dealing with Web Services is a transaction-based approach to recovery. With this approach, we can control where and when failures will be detected. We do this by checking for certain sub-goals that need to be completed before we can continue processing the information. Logical places to insert sub-goals are after each call to a service. Once a service is invoked, we can check that the service has responded to our request; if it has, that particular sub-goal is complete. If it has not responded, we can reissue the request or choose to rebind to another service. Figure 12 shows where the sub-goals will be checked.
For the program, I chose .NET as my development environment. This is mainly due to its ease of use, but also because Web Services can easily be integrated into the code. A common way to simulate a transaction-based approach in any programming language is to use try-catch blocks or if-statements. With try-catch blocks, it is very easy to detect whether an error occurred, and if one did occur, we can recover from it in the catch block. The following C# example shows how this would look.
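```csharp
using System;
using System.Windows.Forms;

// Hypothetical proxy interface for the dummy map service. In the real program this
// role is played by the proxy class generated from the service's WSDL.
public interface IMapService
{
    string GetMap(string shop, string city, string province);
}

public class ShopSearch
{
    private const string MapServiceUrl =
        "http://aikon:9700/orabpel/default/DummyService_1/DummyService_1?wsdl";

    // Creates a proxy bound to the given URL (supplied by the caller).
    private readonly Func<string, IMapService> bindTo;

    public ShopSearch(Func<string, IMapService> bindTo)
    {
        this.bindTo = bindTo;
    }

    public string SearchServices(string shop, string city, string prov)
    {
        while (true)
        {
            try
            {
                // Sub-goal: the service responded with a result.
                return InvokeMapService(MapServiceUrl, shop, city, prov);
            }
            catch (Exception ex)
            {
                // Recovery happens here: ask the user whether to reissue the request.
                DialogResult answer = MessageBox.Show(
                    "Error occurred during invocation of service (" + ex.Message + "). Retry invocation?",
                    "Invocation Error", MessageBoxButtons.RetryCancel, MessageBoxIcon.Error);
                if (answer != DialogResult.Retry)
                    return null; // the user gave up; no result
            }
        }
    }

    private string InvokeMapService(string url, string shop, string city, string prov)
    {
        // Do not swallow faults here: let them propagate to the caller's catch block.
        return bindTo(url).GetMap(shop, city, prov);
    }
}
```
Code Example 8 C# code for a Transaction-based approach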
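Code Example 8 handles a single call. The sub-goal checks of Figure 12 can be sketched more generally as a sequence of steps, each paired with a sub-goal that must hold before processing continues. The Java sketch below is an assumption about how this might be structured, not the program's code. The names `SubGoalFlow`, `Step`, `SubGoalFailed` and `maxAttempts` are hypothetical. A thrown fault is treated the same as an unmet sub-goal, and recovery is limited to reissuing the request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Sketch of sub-goal checkpoints placed after each service call (names are illustrative). */
class SubGoalFlow {

    /** A service call together with the sub-goal that must hold once it returns. */
    static final class Step {
        final String name;
        final Supplier<String> call;
        final Predicate<String> subGoal;

        Step(String name, Supplier<String> call, Predicate<String> subGoal) {
            this.name = name;
            this.call = call;
            this.subGoal = subGoal;
        }
    }

    /** Raised when a sub-goal still does not hold after the allowed number of attempts. */
    static final class SubGoalFailed extends RuntimeException {
        SubGoalFailed(String step) {
            super("Sub-goal not met after step: " + step);
        }
    }

    /**
     * Runs the steps in order. After each call its sub-goal is checked; if it does not hold
     * (or the call throws), the request is reissued, up to maxAttempts calls in total.
     * Results are returned only once every sub-goal has been met.
     */
    static List<String> run(List<Step> steps, int maxAttempts) {
        List<String> results = new ArrayList<>();
        for (Step step : steps) {
            String result = null;
            boolean met = false;
            for (int attempt = 1; attempt <= maxAttempts && !met; attempt++) {
                try {
                    result = step.call.get();
                    met = step.subGoal.test(result);
                } catch (RuntimeException e) {
                    met = false; // a fault counts as an unmet sub-goal
                }
            }
            if (!met) {
                throw new SubGoalFailed(step.name);
            }
            results.add(result);
        }
        return results;
    }
}
```

For the flow in Figure 12, `run` would receive a "Get Shopping Center Listing" step and a "Get City Map" step. Rebinding to another service, the other option mentioned above, could be added inside the retry loop.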
Transactions can also be simulated in a similar way using if-statements. The code looks almost the same as the try-catch example above, but determining whether a failure occurred is more difficult, because every call has to report its own status and the caller has to check that status explicitly.
7.1 Program Demo
In this section I demonstrate how the program works and how it copes with failures. When the program is started, the user must enter the requested data into the fields: the shop name, province and city. This is shown in Figure 13. The program then searches for the relevant information (Figure 14) and displays it on the screen. Depending on the results found, the user will either get only one response (in which case the system automatically displays the results page for that result), or the user will be given a list of results and must choose which one to display (Figure 15). Once the user has made a choice, the program displays the shop name, the mall name, additional information and directions on how to get there. This is shown in Figure 16. If something goes wrong during the invocation of a service, the program informs the user and asks how to handle the situation. The user can either retry the invocation or ask the program to handle the error. The program will first retry the invocation, after which it will try to find a new service (if one is available). If something goes wrong during the operations on the services, the program uses standard transaction-based rules to recover from the failure. This is shown in Figure 17. It can also happen that there is no possibility of recovery. This situation is shown in Figure 18.
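The recovery order described for the demo is to retry the original service, then look for another one, and otherwise report that recovery is impossible. It can be sketched in the if-statement style mentioned above, where each call returns a status value instead of throwing. In this Java sketch, `CallResult`, `RecoveryPolicy`, the endpoint list and the `invoke` function are all hypothetical. The sketch also assumes that alternative services have already been discovered.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Outcome of one invocation, reported as a value so the caller can test it with if-statements. */
final class CallResult {
    final boolean ok;
    final String value;

    private CallResult(boolean ok, String value) {
        this.ok = ok;
        this.value = value;
    }

    static CallResult success(String value) { return new CallResult(true, value); }

    static CallResult failure() { return new CallResult(false, null); }
}

class RecoveryPolicy {
    /**
     * endpoints.get(0) is the service originally bound to; any further entries are
     * alternative services. Returns an empty Optional when no recovery is possible.
     */
    static Optional<String> invokeWithRecovery(List<String> endpoints,
                                               Function<String, CallResult> invoke,
                                               int retriesOnOriginal) {
        if (endpoints.isEmpty()) {
            return Optional.empty();
        }
        String original = endpoints.get(0);
        // 1. First attempt, then the retries, against the original service.
        for (int attempt = 0; attempt <= retriesOnOriginal; attempt++) {
            CallResult result = invoke.apply(original);
            if (result.ok) {
                return Optional.of(result.value);
            }
        }
        // 2. Rebind: try each alternative service once, in order.
        for (String alternative : endpoints.subList(1, endpoints.size())) {
            CallResult result = invoke.apply(alternative);
            if (result.ok) {
                return Optional.of(result.value);
            }
        }
        // 3. Nothing worked: the caller must report that recovery is impossible.
        return Optional.empty();
    }
}
```

An empty result corresponds to the situation in Figure 18, where the user is told that no recovery is possible.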
8 Related Work
During my research, I have not come across any research papers that deal with the classification of faults in Web Services. Many papers do, however, name some common faults that can occur. In Baresi et al. [2006] the authors name some of the faulty behaviour that can occur at deployment time and at run time. They do not, however, try to classify it into categories.
Tanenbaum & van Steen [2002] classify faults in distributed systems. Some of these faults are closely related to faults that can occur in Web Services, and they have been included in the classification model in Section 3, but their work is focussed on distributed systems, not Web Services. A great deal of research has also gone into the detection of faults, something I did not cover in detail in this document. Ouyang et al. [2005] use an automated tool to detect a limited set of faults by means of Petri net analysis techniques. Their tool, WofBPEL, can detect unreachable services, services that make use of ambiguous input or output, and invalid input messages to a service (in other words, messages of the wrong type for the service). Their analysis, however, is static, and the BPEL processes have to be translated into another language before they can be analysed. Baresi et al. use two run-time methods to detect failures: DPD and SrtM can be used to detect failures when using Self-healing networks. Another detection strategy is included in ASTRO (Trainotti et al. [2005]). In ASTRO, monitors are generated automatically in Java. These monitors check predefined properties of the associated processes and produce feedback in the event of a failure. These properties can be related back to the pre- and post-conditions of a service. A lot of research has also gone into recovery methods. Both Tanenbaum & van Steen [2002] and Tartanoglu et al. [2006] classify recovery methods into two subfields, namely forward and backward error recovery. Both also mention the use of transactions as a successful way to recover from failure. However, most of the research focuses on Self-healing networks and the dynamic composition of services. Other methods are also discussed, but not as much as the Self-healing approach.
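Tanenbaum & van Steen describe backward recovery as bringing a system from an erroneous state back to a previously correct state (for example, by checkpointing). Forward recovery instead moves the system into a new correct state from which it can continue. The Java sketch below contrasts the two on a composition whose accumulated results are kept in a list. The state model, the `fallback` value and all names are illustrative assumptions rather than either author's formulation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Illustrative contrast between backward and forward error recovery. */
class ErrorRecovery {

    /**
     * Backward recovery: checkpoint the state, run all steps, and on any fault roll the
     * state back to the checkpoint. Returns true if every step completed.
     */
    static boolean backward(List<String> state, List<Supplier<String>> steps) {
        List<String> checkpoint = new ArrayList<>(state); // last known correct state
        try {
            for (Supplier<String> step : steps) {
                state.add(step.get());
            }
            return true;
        } catch (RuntimeException e) {
            state.clear();
            state.addAll(checkpoint); // undo the partial work
            return false;
        }
    }

    /**
     * Forward recovery: when a step fails, nothing is undone; a fallback value is recorded
     * instead, so the composition reaches a new, acceptable state and carries on.
     */
    static void forward(List<String> state, List<Supplier<String>> steps, String fallback) {
        for (Supplier<String> step : steps) {
            try {
                state.add(step.get());
            } catch (RuntimeException e) {
                state.add(fallback);
            }
        }
    }
}
```

The backward variant suits transaction-like behaviour, where a half-finished composition must leave no trace. The forward variant is closer to what exception handlers in a workflow language do when they substitute a default and continue.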
The transaction-based approach, however, has been described in different papers and textbooks under many different names and guises. It seems to be the most logical choice when you do not want to make use of a Self-healing network (even though the two methods can be combined to produce an even better recovery method).
Various tools and languages have also been created to help with the composition of services. Brogi & Popescu [2005] proposed a semi-automated aggregation approach based on the workflow language Yet Another Workflow Language (YAWL), which can express not only the basic workflow but also the behaviour of the composition. YAWL is based on Petri nets, which makes failure detection somewhat easier. In this approach, a service that uses BPEL as its workflow language and OWL as its descriptor must first be translated into YAWL. After that, services are expanded to include control-flow constructs. These constructs are then used in the next phase to make sure that aggregated services do not have processes with unsatisfied inputs. They can be seen as pre- and post-conditions of a service: if they are not met, the composition will fail. Finally, the service is deployed as a normal Web Service. The strategy is sound in theory, but even though it is semi-automated, it is still an off-line strategy. Ponnekanti & Fox [2002] proposed a developer toolkit for the composition of Web Services called SWORD. Although a developer toolkit isn't anything new, their toolkit allows services to be composed by supplying it with the necessary pre- and post-conditions. It also generates rule-based plans, using these conditions as a base to work from. Pautasso & Alonso [2003] created a visual language in which a service's workflow can be described using a graphical representation. Their language, the BioOpera Flow Language (BFL), works very much like BPEL's graphical notation in OBPMS. BFL has many of the same constructs, as well as a development environment specifically designed for it.
All in all, a lot of research has gone into recovery and detection methods, but not much into failures as such. Many researchers mention some of the failures they came across in their publications, but they do not classify them.
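The check for unsatisfied inputs in Brogi & Popescu's approach can be illustrated in much simplified form. Treat each service's inputs as preconditions and its outputs as postconditions, then walk a sequential composition. This Java sketch is not their algorithm, which works on YAWL nets rather than a plain sequence. `ServiceSpec`, `InputChecker` and the data names are hypothetical.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** A service described only by the data it needs (inputs) and the data it produces (outputs). */
class ServiceSpec {
    final String name;
    final Set<String> inputs;   // play the role of preconditions
    final Set<String> outputs;  // play the role of postconditions

    ServiceSpec(String name, Set<String> inputs, Set<String> outputs) {
        this.name = name;
        this.inputs = inputs;
        this.outputs = outputs;
    }
}

class InputChecker {
    /**
     * Walks a sequential composition and returns, per service, the inputs that neither the
     * client nor any earlier service provides. An empty map means every input is satisfied.
     */
    static Map<String, Set<String>> unsatisfiedInputs(Set<String> clientData, List<ServiceSpec> sequence) {
        Set<String> available = new HashSet<>(clientData);
        Map<String, Set<String>> missing = new LinkedHashMap<>();
        for (ServiceSpec service : sequence) {
            Set<String> gap = new TreeSet<>(service.inputs);
            gap.removeAll(available);
            if (!gap.isEmpty()) {
                missing.put(service.name, gap);
            }
            available.addAll(service.outputs); // later services may consume these
        }
        return missing;
    }
}
```

Suppose a shop-listing service needs `shop`, `city` and `province` and produces `mallAddress`, and a map service needs `mallAddress` and `startLocation`. If the client supplies only the first three items, the check reports `startLocation` as unsatisfied for the map service, so the composition would be rejected before deployment.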
9 Conclusion
Web Services live in a very dynamic environment, and because of this, many things can go wrong during the lifetime of a single Web Service. This paper tries to classify some of the common failure points when using Web Services. This classification is by no means complete; it only serves as a model with which certain failures can be associated. Very little research has gone into the classification of failures. Some papers simply name them (Baresi et al. [2006]), while others classify them into their own categories (Tartanoglu et al. [2006]). More research has gone into recovery from failures than into the failures themselves.
Different recovery methods have been proposed, and some of the more popular ones have stayed in the research arena longer. Nowadays more research is going into the self-healing composition of services than into any other recovery method. This is partly due to its success, but also because there are still many areas in self-healing networks that can be improved upon. Transaction-based approaches have been around for a long time and have already proven successful in the real world. Some problems do persist when a transaction-based approach is used in a distributed fashion, but models have been proposed to solve them (Mikalsen et al. [2002]). Other methods also exist. Tartanoglu et al. [2006] use the term Forward Error Recovery to classify all those recovery methods that come from the workflow language itself (exception handling, etc.). There are also trivial methods, such as caching, that are not suited to Web Services at all and only show why we need all these different recovery methods. The research field of recovery from failure is far from exhausted, and a lot of research can still be done in various related areas. Even though it was not covered in this document, a lot of research is also continuing in service discovery. Discovery and recovery can go hand-in-hand, especially in Self-healing networks, since these recover by searching for (discovering) other services that can take over from a service that failed. Various other research fields are opening up in Web Services, and all of them have to deal with failure and recovery at some point. This document tries to show how important a formal classification of failures can be.
10 Acknowledgements
I would like to thank May Chan for her help and all the discussions regarding this topic. I would also like to thank my supervisor, Prof. J. Bishop, for her support and for always guiding me in the right direction.
References
[1] "Oracle BPEL Process Manager Suite 10g," Oracle.
[2] "Service-oriented architecture," Wikipedia. Available: http://en.wikipedia.org/wiki/Service_Oriented_Architecture. [Accessed: 2006/11/09].
[3] "Microsoft Visual Studio 2005," Professional Edition, Microsoft, 2005.
[4] Wil M.P. van der Aalst, "Don't go with the flow: Web services composition standards exposed," IEEE Intelligent Systems, vol. 18, no. 1, pp. 72-76, 2003.
[5] Wolf-Tilo Balke and Matthias Wagner, "Towards Personalized Selection of Web Services," in Proceedings of the WWW (Alternate Paper Tracks), 2003.
[6] Luciano Baresi, Carlo Ghezzi, and Sam Guinea, "Towards Self-healing Service Compositions," in Contributions to Ubiquitous Computing, vol. 42, Springer, 2006.
[7] Antonio Brogi and Razvan Popescu, "Towards Semi-automated Workflow-Based Aggregation of Web Services," in Proceedings of the ICSOC, 2005, pp. 214-227.
[8] Robert J. Brunner, Frank Cohen, Francisco Curbera, Darren Govoni, Steven Haines, Matthias Kloppmann, Benoit Marchal, K. Scott Morison, Arthur Ryman, Joseph Weber, and Mark Wutka, Java Web Services Unleashed, Sams Publishing, 2002.
[9] Paul A. Buhler, Christopher Starr, William H. Schroder, and José M. Vidal, "Preparing for Service-Oriented Computing: A Composite Design Pattern for Stubless Web Service Invocation," in Proceedings of the ICWE, 2004, pp. 603-604.
[10] Damian Foggon, Daniel Maharry, Chris Ullman, and Karli Watson, Programming Microsoft .NET XML Web Services, Microsoft Press, 2004.
[11] Rania Khalaf, Nirmal Mukhi, and Sanjiva Weerawarana, "Service-Oriented Composition in BPEL4WS," in Proceedings of the WWW (Alternate Paper Tracks), 2003.
[12] Heiko Ludwig, Henner Gimpel, Asit Dan, and Robert Kearney, "Template-Based Automated Service Provisioning - Supporting the Agreement-Driven Service Life-Cycle," in Proceedings of the ICSOC, 2005, pp. 283-295.
[13] Thomas Mikalsen, Stefan Tai, and Isabelle Rouvellou, "Transactional Attitudes: Reliable Composition of Autonomous Web Services," presented at the International Conference on Dependable Systems and Networks, Washington D.C., USA, 2002.
[14] Chun Ouyang, Wil M.P. van der Aalst, Stephan Breutel, Marlon Dumas, Arthur H.M. ter Hofstede, and Eric Verbeek, "WofBPEL: A Tool for Automated Analysis of BPEL Processes," in Proceedings of the ICSOC, 2005, pp. 484-489.
[15] Abhijit A. Patil, Swapna A. Oundhakar, Amit P. Sheth, and Kunal Verma, "Meteor-s web service annotation framework," in Proceedings of the WWW, 2004, pp. 553-562.
[16] Cesare Pautasso and Gustavo Alonso, "Visual composition of web services," in Proceedings of the HCC, 2003, pp. 92-99.
[17] David S. Platt, Introducing Microsoft .NET, Microsoft Press, 2001.
[18] Shankar R. Ponnekanti and Armando Fox, "SWORD: A Developer Toolkit for Web Service Composition," 2002.
[19] Mike Rosen, "BPM and SOA: Where Does One End and the Other Begin?" Available: http://www.bptrends.com. [Accessed: 2006].
[20] Ozgur D. Sahin, Cagdas Evren Gerede, Divyakant Agrawal, Amr El Abbadi, Oscar H. Ibarra, and Jianwen Su, "SPiDeR: P2P-Based Web Service Discovery," in Proceedings of the ICSOC, 2005, pp. 157-169.
[21] Ichiro Satoh, "Location-Based Services in Ubiquitous Computing Environments," in Proceedings of the ICSOC, 2003, pp. 527-542.
[22] Andrew S. Tanenbaum and Maarten van Steen, Distributed Systems: Principles and Paradigms, International Edition, Prentice Hall, 2002, pp. 272-277.
[23] Ferda Tartanoglu, Valerie Issarny, Alexander Romanovsky, and Nicole Levy, "Dependability in the Web Services Architecture." Available: http://www-rocq.inria.fr/~tartanog/publi/wads/. [Accessed: 2006/10/05].
[24] Michele Trainotti, Marco Pistore, Gaetano Calabrese, Gabriele Zacco, Gigi Lucchese, Fabio Barbon, Piergiorgio Bertoli, and Paolo Traverso, "ASTRO: Supporting Composition and Execution of Web Services," in Proceedings of the ICSOC, 2005, pp. 495-501.
[25] Tao Yu and Kwei-Jay Lin, "Service Selection Algorithms for Composing Complex Services with Multiple QoS Constraints," in Proceedings of the ICSOC, 2005, pp. 130-143.
Figure 13 Screenshot of Program requesting data
Figure 14 Busy Searching for Shops
Figure 15 Results Found
Figure 16 Displaying Results
Figure 17 Failure with the possibility of Recovery
Figure 18 Notification of Failure without the possibility of Recovery

