
RESEARCH REPORT

Comparing Microsoft .NET and IBM WebSphere/J2EE


A Productivity, Performance, Reliability and Manageability Analysis

http://www.MiddlewareRESEARCH.com
David Herst, with William Edwards and Steve Wilkes
September 2004
research@middleware-company.com

1 Disclosures

1.1 Research Code of Conduct


The Middleware Company offers the world's leading knowledge network for middleware professionals. The Middleware Company operates communities, sells consulting, and conducts research. As a research organization, The Middleware Company is dedicated to producing independent intelligence about techniques, technologies, products and practices in the middleware industry. Our goal is to provide practical information to aid technical decision making.

Our research is credible. We publish only what we believe and can stand behind.

Our research is honest. To the greatest extent allowable by law, we publish the parameters, methodology and artifacts of a research endeavor. Where the research adheres to a specification, we publish that specification. Where the research produces source code, we publish the code for inspection. Where it produces quantitative results, we fully explain how they were produced and calculated.

Our research is community-based. Where possible, we engage the community and relevant experts for participation, feedback, and validation.

If the research is sponsored, we give the sponsor the opportunity to prevent publication if they deem that publishing the results would harm them. This policy allows us to preserve our research integrity while creating incentives for organizations to sponsor creative experiments rather than scenarios they can win.

This Code of Conduct applies to all research conducted and authored by The Middleware Company, and it is reproduced in all our research reports. It does not apply to research conducted by other organizations that we may publish or mention because we consider it of interest to the community.

1.2 Disclosure
This study was commissioned by Microsoft. The Middleware Company has in the past done other business with both Microsoft and IBM. Moreover, The Middleware Company is an independently operating but wholly owned subsidiary of VERITAS Software (www.veritas.com, NASDAQ: VRTS). VERITAS and IBM have a number of business relationships in certain technology areas, and compete directly against each other in others. Microsoft commissioned The Middleware Company to perform this study on the expectation that we would remain vendor-neutral and therefore unbiased in the outcome. The Middleware Company stands behind the results and pledges that it conducted the study impartially.

1.3 Why are we doing this study? What is our agenda?


We are compelled to answer questions such as this one because of the controversy that sponsored studies occasionally create.


First, what our agenda is not: it is not to demonstrate that a particular company, product, technology, or approach is better than others. Simple words such as "better" or "faster" are gross and ultimately useless generalizations. Life, especially when it involves critical enterprise applications, is more complicated. We do our best to openly discuss the meaning (or lack of meaning) of our results, and we go to great lengths to point out the several cases in which the results cannot and should not be generalized.

Our agenda is to provide useful, reliable, and profitable research and consulting services to our clients and to the community at large. To help our clients in the future, we believe we need to be experienced and proficient in a number of platforms, tools, and technologies. We conduct serious experiments such as this one because they are great learning experiences, and because we feel that every technology consulting firm should conduct some learning experiments to provide their clients with the best value. If we go one step further and ask technology vendors to sponsor the studies (with both expertise and expenses), if we involve the community and known experts, and if we document and disclose what we're doing, then we can:

- Lower our cost of doing these studies
- Do bigger studies
- Do more studies
- Make sure we don't do anything silly in these studies and reach the wrong conclusions
- Make the studies learning experiences for the entire community (not just us)

1.4 Does a sponsored study always produce results favorable to the sponsor?
No. Our arrangement with sponsors is that we will write only what we believe, and only what we can stand behind, but we allow them the option to prevent us from publishing the study if they feel it would be harmful publicity. We refuse to be influenced by the sponsor in the writing of this report. Sponsorship fees are not contingent upon the results. We make these constraints clear to sponsors up front and urge them to consider the constraints carefully before they commission us to perform a study.


2 TABLE OF CONTENTS

1 DISCLOSURES
   1.1 Research Code of Conduct
   1.2 Disclosure
   1.3 Why are we doing this study? What is our agenda?
   1.4 Does a sponsored study always produce results favorable to the sponsor?
2 TABLE OF CONTENTS
3 EXECUTIVE SUMMARY
   3.1 The Teams
   3.2 The System
   3.3 The Implementations
   3.4 Developer Productivity Results
   3.5 Configuration and Tuning Results
   3.6 Performance Results
   3.7 Reliability and Manageability Results
4 INTRODUCTION
   4.1 How this Report is Organized
   4.2 Goals of the Study
   4.3 The Approach
   4.4 The ITS System
   4.5 Development Environments Tested
   4.6 Application Platform Technologies Tested
   4.7 Application Code Availability
5 THE EVALUATION METHODOLOGY
   5.1 The Teams
      5.1.1 The IBM WebSphere Team
      5.1.2 The Microsoft .NET Team
   5.2 Controlling the Laboratory and Conducting the Analysis
   5.3 The Project Timeline
      5.3.1 Target Schedule
      5.3.2 Division of Lab Time Between the Teams
      5.3.3 Detailed Schedule
   5.4 Laboratory Rules and Conditions
      5.4.1 Overall Rules
      5.4.2 Development Phase
      5.4.3 Deployment and Tuning Phase
      5.4.4 Testing Phase
   5.5 The Evaluation Tests
6 THE ITS PHYSICAL ARCHITECTURE
   6.1 Details of the WebSphere Architecture
      6.1.1 IBM WebSphere
      6.1.2 IBM HTTP Server (Apache)
      6.1.3 IBM Edge Server
      6.1.4 IBM WebSphere MQ
   6.2 Details of the .NET Architecture
      6.2.1 Microsoft Internet Information Services (IIS)
      6.2.2 Microsoft Network Load Balancing (NLB)
      6.2.3 Microsoft Message Queue (MSMQ)
7 TOOLS CHOSEN
   7.1 Tools Used by the J2EE Team
      7.1.1 Development Tools
         7.1.1.1 Rational Rapid Developer Implementation
         7.1.1.2 WebSphere Studio Application Developer Implementation
      7.1.2 Analysis, Profiling and Tuning Tools
   7.2 Tools Used by the .NET Team
      7.2.1 Development Tools
      7.2.2 Analysis, Profiling and Tuning Tools
8 DEVELOPER PRODUCTIVITY RESULTS
   8.1 Quantitative Results
      8.1.1 The Basic Data
      8.1.2 .NET vs. RRD
      8.1.3 .NET vs. WSAD
   8.2 RRD Development Process
      8.2.1 Architecture Summary
         8.2.1.1 RRD Applications
         8.2.1.2 Database Access
         8.2.1.3 Overall Shape of the Code
         8.2.1.4 Distributed Transactions
      8.2.2 What Went Well
         8.2.2.1 Web Interfaces
         8.2.2.2 Web Service Integration
      8.2.3 Significant Technical Roadblocks
         8.2.3.1 Holding Data in Sessions
         8.2.3.2 Web Service Integration
         8.2.3.3 Configuring and Using WebSphere MQ
         8.2.3.4 Handling Null Strings in Oracle
         8.2.3.5 Building the Handheld Module
         8.2.3.6 Miscellaneous RRD Headaches
   8.3 WSAD Development Process
      8.3.1 Architecture Summary
         8.3.1.1 Overall Shape of the Code
         8.3.1.2 Distributed Transactions
         8.3.1.3 Organization of Applications in WSAD
      8.3.2 What Went Well
         8.3.2.1 Navigating the IDE
         8.3.2.2 Building for Deployment
         8.3.2.3 Testing in WebSphere
         8.3.2.4 Common Logic in JSPs
      8.3.3 Significant Technical Roadblocks
         8.3.3.1 XA Recovery Errors from Server
         8.3.3.2 Miscellaneous WSAD Headaches
   8.4 Microsoft .NET Development Process
      8.4.1 .NET Architecture Summary
         8.4.1.1 Organization of .NET Applications
         8.4.1.2 Database Access
         8.4.1.3 Distributed Transactions
         8.4.1.4 ASP.NET Session State
      8.4.2 What Went Well
      8.4.3 Significant Technical Roadblocks
         8.4.3.1 Transactional MSMQ Remote Read
      8.4.4 Miscellaneous .NET Headaches
         8.4.4.1 DataGrid Paging
         8.4.4.2 Web Services Returning DataSets
         8.4.4.3 The Mobile Application
         8.4.4.4 Model Object Class Creation
9 CONFIGURATION AND TUNING RESULTS
10 WEBSPHERE CONFIGURATION AND TUNING PROCESS SUMMARY
   10.1 RRD Round: Installing Software
      10.1.1 Starting Point
      10.1.2 Installing WebSphere Network Deployment
      10.1.3 Installing IBM HTTP Server
      10.1.4 Installing IBM Edge Server
   10.2 RRD Round: Configuring the System
      10.2.1 Configuring JNDI
      10.2.2 Configuring the WebSphere Web Server Plugin
   10.3 RRD Round: Resolving Code Bottlenecks
      10.3.1 Rogue Threads
      10.3.2 Optimizing Database Calls
      10.3.3 Optimizing the Web Service
      10.3.4 Paging Query Results
      10.3.5 Caching JNDI Objects
      10.3.6 Using DTOs for Work Tickets
      10.3.7 Handling Queues in Customer Service Application
   10.4 RRD Round: Tuning the System for Performance
      10.4.1 Tuning Strategy
      10.4.2 Performance Indicators
      10.4.3 Tuning the JVM
         10.4.3.1 Garbage Collection
         10.4.3.2 Heap Size
      10.4.4 Vertical Scaling
      10.4.5 Database Tuning
      10.4.6 Tuning JDBC Settings
      10.4.7 Web Container Tuning
         10.4.7.1 Web Thread Pool
         10.4.7.2 Maximum HTTP Sessions
      10.4.8 Web Server Tuning
      10.4.9 Session Persistence
   10.5 WSAD Round: Issues
      10.5.1 Use of External Libraries and Classloading in WebSphere
      10.5.2 Pooling Objects
      10.5.3 Streamlining the Web Service I/O
      10.5.4 Optimizing Queries
   10.6 Significant Technical Roadblocks
      10.6.1 Switching JVMs with WebSphere
      10.6.2 Configuring Linux for Edge Server, Act 1
      10.6.3 Configuring Linux for Edge Server, Act 2
      10.6.4 Configuring Linux for Edge Server, Act 3
      10.6.5 Configuring JNDI for WebSphere ND
      10.6.6 Edge Server's Erratic Behavior
      10.6.7 Session Persistence
         10.6.7.1 Persisting to a Database
         10.6.7.2 In-Memory Replication
         10.6.7.3 Tuning Session Persistence
      10.6.8 Hot Deploying Changes to an Application
      10.6.9 Configuring for Graceful Failover
         10.6.9.1 Failover Requirements
         10.6.9.2 Standard Topology
         10.6.9.3 Non-Standard Topology
         10.6.9.4 Modified Standard Topology
      10.6.10 Deploying the WSAD Web Service
      10.6.11 The Sudden, Bizarre Failure of the Work Order Application
      10.6.12 Using Mercury LoadRunner
11 .NET CONFIGURATION AND TUNING PROCESS SUMMARY
   11.1 Installing and Configuring Software
      11.1.1 Network Load Balancing (NLB)
      11.1.2 ASP.NET Session State Server
   11.2 Resolving Code Bottlenecks
   11.3 Base Tuning Process
      11.3.1 Tuning the Database
      11.3.2 Tuning the Web Applications
      11.3.3 Tuning the Servers
      11.3.4 Tuning the Session State Server
      11.3.5 Code Modifications
      11.3.6 Tuning Data Access Logic
      11.3.7 Tuning Message Processing
      11.3.8 Other Changes
      11.3.9 Changes to Machine.config
      11.3.10 Changes Not Pursued
   11.4 Significant Technical Roadblocks
      11.4.1 Performance Dips in Web Service
      11.4.2 Lost Session Server Connections
12 PERFORMANCE TESTING
   12.1 Performance Testing Overview
   12.2 Performance Test Results
      12.2.1 ITS Customer Service Application
      12.2.2 ITS Work Order Web Application
      12.2.3 Integrated Scenario
      12.2.4 Message Processing
   12.3 Conclusions from Performance Tests
13 MANAGEABILITY TESTING
   13.1 Manageability Testing Overview
   13.2 Manageability Test Results
      13.2.1 Change Request 1: Changing a Database Query
      13.2.2 Change Request 2: Adding a Web Page
      13.2.3 Change Request 3: Binding a Web Page Field to a Database
   13.3 Conclusions from Manageability Tests
14 RELIABILITY TESTING
   14.1 Reliability Testing Overview
   14.2 Reliability Test Results
      14.2.1 Controlled Shutdown Test
      14.2.2 Catastrophic Hardware Failure Test
      14.2.3 Loosely Coupled Test
      14.2.4 Long Duration Test
   14.3 Conclusions from Reliability Tests
15 OVERALL CONCLUSIONS
16 APPENDIX: RELATED DOCUMENTS
17 APPENDIX: SOURCES USED
   17.1 Sources Used by the IBM WebSphere Team
   17.2 Sources Used by the Microsoft .NET Team
18 APPENDIX: SOFTWARE PRICING DATA
   18.1 IBM Software
   18.2 Microsoft Software


3 EXECUTIVE SUMMARY

This study compares the productivity, performance, manageability and reliability of an IBM WebSphere/J2EE system running on Linux to that of a Microsoft .NET system running on Windows Server 2003.

3.1 The Teams


To conduct the study, The Middleware Company assembled two independent teams, one for J2EE using IBM WebSphere, the other for Microsoft .NET. Each team consisted of senior developers similarly skilled on their respective platforms in terms of development, deployment, configuration, and performance tuning experience.

3.2 The System


Each team received the same specification for a loosely-coupled system to be developed, deployed, tuned and tested in a controlled laboratory setting. The system consisted of two Web application subsystems and a handheld device user interface, all integrated via messaging and Web services.

3.3 The Implementations


The WebSphere team developed two different implementations of the specification, one using IBM's model-driven tool Rational Rapid Developer (RRD), the other using IBM's code-centric tool WebSphere Studio Application Developer (WSAD). The .NET team developed its single implementation using Visual Studio .NET as the primary development tool.

3.4 Developer Productivity Results


In the development phase of the study, the time it took each team to complete the initial implementation (including installing all necessary development and runtime software) was carefully measured to determine overall developer productivity. The .NET implementation was completed significantly faster than the RRD implementation, and also faster than the WSAD implementation.


Development Productivity

- .NET vs. RRD: Significantly better. Greatest difference in product installs; less to install/configure on the Windows Server side. Also differences in developer productivity for all subsystems. The .NET team had a longer history with Visual Studio than the J2EE team had with RRD.
- .NET vs. WSAD: Better; uncertain how much.*
- RRD vs. WSAD: Worse; uncertain how much.*

* The team using WSAD had already built the same application in RRD, and hence realized productivity advantages not available to the RRD or .NET efforts, since they were already familiar with the specification.

3.5 Configuration and Tuning Results


After developing their system, each team was measured on how long it took to configure and tune that system in preparation for a series of performance, manageability, and reliability tests. The .NET team completed this stage in significantly less time than the WebSphere team did for the RRD implementation: the .NET team took 16 man-days for configuration and tuning, while the WebSphere team took 71 man-days (much of it, however, spent addressing software installation issues and patching the operating system). Later, when they deployed the WSAD implementation to the existing WebSphere infrastructure, the WebSphere team spent an additional 24 man-days tuning and configuring.

Tuning Productivity

- .NET vs. RRD: Significantly better. A huge part of the RRD time was taken patching Linux and dealing with Edge Server issues. Significant time was also spent re-working RRD-generated code to get better performance.
- .NET vs. WSAD: Better; uncertain how much. Since the complete runtime platform had already been installed, configured and tuned for the RRD implementation, this stage was completed much more quickly for WSAD.
- RRD vs. WSAD: Uncertain.

3.6 Performance Results


In a battery of four performance tests, the .NET implementation running on Windows Server 2003 significantly outperformed the RRD implementation running on Linux. Compared to the WSAD implementation, the .NET version performed about equally well overall, doing better on some tests and not as well on others.


Performance

- .NET vs. RRD: Significantly better on 3 of 4 tests. .NET achieved user throughput 66-123% higher than RRD in 3 tests; in the 4th test, .NET achieved 40% higher message processing throughput.
- .NET vs. WSAD: About equal. .NET achieved higher user throughput in 1 test, slightly higher in 1, and worse in 1; in the 4th test, WSAD achieved nearly 3 times the message processing throughput.
- RRD vs. WSAD: Significantly worse on all 4 tests.

3.7 Reliability and Manageability Results


Manageability and reliability tests revealed better results for .NET on Windows Server 2003; it significantly surpassed the two J2EE implementations on Linux in terms of deploying changes under load, gracefully shutting down servers, and handling catastrophic failover. In terms of sustained, long-term operation under normal load, all three implementations performed equally well.

Manageability

- .NET vs. RRD: Significantly better. .NET had many fewer errors during deployment, was slightly faster to deploy, and preserved sessions more reliably.
- .NET vs. WSAD: Significantly better. .NET had many fewer errors during deployment, was slightly faster to deploy, and preserved sessions more reliably.
- RRD vs. WSAD: Better. RRD had many fewer errors during deployment and preserved sessions more reliably. Time to deploy was about the same.

Reliability: Handling Failover

- .NET vs. RRD: Significantly better. The RRD implementation could not add a server back to the cluster after a graceful shutdown, and could not handle catastrophic failover.
- .NET vs. WSAD: Significantly better. The WSAD implementation could not handle catastrophic failover.
- RRD vs. WSAD: Worse. The RRD implementation could not add a server back to the cluster after a graceful shutdown.

Reliability: Sustained Operation Over a 12-Hour Period Under Moderate Load

- .NET vs. RRD: Equal.
- .NET vs. WSAD: Equal.
- RRD vs. WSAD: Equal.

4 INTRODUCTION

Previous studies by The Middleware Company have compared tools or platforms on the basis of one criterion or another, such as developer productivity, ease of maintenance or application performance. This study compares two enterprise application platforms, Microsoft .NET and IBM WebSphere/J2EE, across a full range of technical criteria: developer productivity, application performance, application reliability, and application manageability.

Although sponsored by Microsoft, the study was conducted independently, in a strictly controlled laboratory environment, with no direct vendor involvement by either Microsoft or IBM. The Middleware Company cannot emphasize enough that Microsoft had no control over the development, testing, and results of the study, and we firmly stand by those results as accurate and unbiased. Toward that end, The Middleware Company has published the methodology used and the source code for both the .NET and J2EE application implementations for public download and scrutiny. Customers can review and comment on the methodology, examine the code, and even repeat the tests in their own testing environment.

4.1 How this Report is Organized


This report covers every aspect of the Microsoft .NET vs. IBM WebSphere/J2EE comparison study: its purpose and methodology, participants, rules and procedures, schedule and working conditions, and of course the results, both qualitative and quantitative.

Section 1 discloses the conditions under which The Middleware Company conducted this study, including our research code of conduct and our policy regarding sponsored studies such as this one.

Section 3 gives a brief, high-level summary of the study and its results.

Section 4 (this section) introduces the study. It describes:

- The goals we tried to achieve with the study
- The unique overall approach that the study takes
- The software system that the two teams developed and tuned
- The development environments that were tested
- The technologies of the .NET and WebSphere platforms that were tested
- What study artifacts are available and how to obtain them

Section 5 covers the study's methodology in detail:


- The composition of the two teams
- The independent auditor who controlled the study conditions and conducted the analysis
- The project schedule
- The rules and conditions that governed the two teams in the laboratory
- A summary of the tests conducted

Section 6 details the physical architecture of the system:


- The hardware infrastructure used by both teams
- The software infrastructure that each team installed


Section 7 describes the tools that each team used during the different phases of the study. In particular, since the J2EE team built two implementations of the system using two different development tools, this section compares the two IDEs.

Section 8 presents the developer productivity results:

- The quantitative results broken out by core development tasks
- The qualitative experiences of the two teams developing each of the three implementations, including important technical roadblocks

Sections 9-11 present the deployment, configuration and tuning results:


Section 9 lays out the quantitative results Section 10 describes the WebSphere teams experience, including significant technical roadblocks Section 11 describes the .NET teams experience, including significant technical roadblocks

Section 12 presents the results of the performance tests.

Section 13 presents the results of the manageability tests.

Section 14 presents the results of the reliability tests.

Section 15 presents our conclusions.

The final sections, from 16 on, contain various appendices:

- Where to find documents related to this report
- Important sources used by both teams
- Pricing data on the software used in this study

4.2 Goals of the Study


Commentary abounds about the technical merits of both J2EE and Microsoft .NET for enterprise application development. The Middleware Company in particular has conducted various studies in the past to compare these two enterprise platforms. Some of these studies, such as "Model Driven Development with IBM Rational Rapid Developer," address developer productivity. Others, such as "J2EE and .NET Application Server and Web Services Performance Comparison," focus on performance. None, however, has spanned a wider set of technical criteria that includes not only productivity and performance, but application platform manageability and reliability as well. This study is the first of its kind to measure all of these criteria, using a novel evaluation approach.

While we expect the study to spark controversy, we also hope it will fulfill two important goals:

- Provide valuable insight into the Microsoft .NET and IBM WebSphere/J2EE development platforms.
- Suggest a controlled, hands-on evaluation approach that organizations can use to structure their own comparisons and technical evaluations of competing vendor offerings.


4.3 The Approach


This study took the approach of simulating a corporate evaluation scenario: a development team is tasked with building, deploying and testing a pilot B2B integrated application in a fixed amount of time, after which we evaluate the results the team was able to achieve in that period. In the study we executed the scenario three times: once using Microsoft .NET 1.1 running on the Windows 2003 platform, and twice using IBM WebSphere 5.1 running on the Red Hat Enterprise Linux AS 2.1 platform. (The latter two cases differed in the development tool used; more on this in Section 4.5.)

We assembled two different teams, one for each platform, each similarly skilled on its respective platform. Each team consisted of senior developers experienced in enterprise application architecture, application development, and/or performance tuning. The rules limited each team to no more than two members in the lab at any time, but did not require the same two members for all phases of the exercise.

The IBM WebSphere/J2EE team consisted of three senior developers from The Middleware Company with 16 years' combined experience in J2EE. The same two of these developers built both J2EE implementations, and all three participated at different times in the deployment, tuning and testing phases. For the installation, deployment and initial tuning of the WebSphere platform, the J2EE team also used two independent, WebSphere-certified consultants with a total of 7 years' experience on the WebSphere platform.

The Microsoft .NET team consisted of three senior developers from Vertigo Software, a California-based Microsoft Solution Provider, with a combined 10 years' experience building software on Microsoft .NET.

The Middleware Company took pains to keep the study free of vendor influence:

- We subcontracted CN2 Technology, a third-party testing company, to prepare the application specification, set up the testing lab, audit the development process for each team, and independently perform the actual application testing for each platform.
- The teams did not communicate with each other during the study. Neither team had knowledge of the other team's results until after the study was completed.
- Neither Microsoft nor IBM had any influence over the development teams during the study.

It is important to note that this study represents what the development teams could achieve using only publicly available technical materials and vendor support channels for their platform. It does not represent what the vendors themselves might have achieved, nor what each team might have achieved if given a longer development and tuning schedule or allowed direct interaction with vendor consultants. Therefore, the resulting applications developed by the two teams may not fully represent vendor best practices or vendor-approved architectures. Rather, they reflect what customers themselves might achieve if tasked with independently building their own custom application using publicly available development patterns, technical guidance and vendor support channels.

4.4 The ITS System


The comparison at the heart of the study centers on the development and testing of a loosely coupled system known as ITS. ITS is a facilities management system created for the fictitious ITS Facilities Management Company (ITS-FMC). The system represents a B2B integration scenario, allowing corporate customers of ITS-FMC to use a Web-based hosted application to create and track work order requests for facilities management on their corporate premises.

The ITS system comprises three core subsystems that operate together in both a loosely coupled fashion (via messaging) and a tightly coupled fashion (via synchronous Web service requests):

- The ITS Customer Service Application. ITS-FMC's corporate clients use this Web-based application to create and track work order requests for facilities management at their premises. The application automatically dispatches work order requests via messaging to the central ITS system, which operates across the Internet on a separate ITS-FMC internal network (a minimal sketch of such a dispatch appears after this list). The ITS Customer Service Application also allows customers to track the status of their work orders via Web service calls to the ITS central system, as well as view/modify customer and user information.

- The ITS Central Work Order Processing Application. This application is operated by ITS-FMC itself on a separate corporate network. It receives incoming work order requests (as messages) from the ITS Customer Service Application and places them into a database for further business processing, including assignment to a specific on-site technician. The application hosts the Web service that returns work order status and historical information to the ITS Customer Service Application. Additionally, this application has a Web user interface that ITS-FMC's central dispatching clerks can use to search, track and update work order requests, as well as query customer information and query/modify technician data.

- The Technician Work Order Mobile Device Application. This application operates on a handheld device, allowing technicians to retrieve their newly assigned work items and update work order status as they complete their work orders at the customer premises. Technicians use this application for dispatching purposes, and to log the time spent working on an issue so that customer billing can occur.
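To make the loosely coupled path concrete, here is a minimal sketch of how a J2EE implementation of the Customer Service Application might dispatch a work order over JMS. This is an illustration under assumptions, not code from the study's published implementations: the JNDI names, the XML payload, and the class itself are hypothetical.

```java
// Hypothetical sketch: dispatching a work order request to the durable queue
// using the standard JMS point-to-point API (JNDI names are assumptions).
import javax.jms.DeliveryMode;
import javax.jms.Queue;
import javax.jms.QueueConnection;
import javax.jms.QueueConnectionFactory;
import javax.jms.QueueSender;
import javax.jms.QueueSession;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

public class WorkOrderDispatcher {

    public void dispatch(String workOrderXml) throws Exception {
        InitialContext ctx = new InitialContext();
        QueueConnectionFactory factory =
                (QueueConnectionFactory) ctx.lookup("jms/ITSQueueConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/ITSWorkOrderQueue");

        QueueConnection connection = factory.createQueueConnection();
        try {
            // Transacted session: the send is committed or rolled back atomically.
            QueueSession session =
                    connection.createQueueSession(true, Session.AUTO_ACKNOWLEDGE);
            QueueSender sender = session.createSender(queue);
            // Persistent delivery, so an accepted work order survives a broker restart.
            sender.setDeliveryMode(DeliveryMode.PERSISTENT);
            TextMessage message = session.createTextMessage(workOrderXml);
            sender.send(message);
            session.commit();
        } finally {
            connection.close();
        }
    }
}
```

The transacted session and persistent delivery mode reflect the durable message queue shown in the diagram below; the same JMS code would run against IBM WebSphere MQ, while the .NET implementation would use the MSMQ equivalent via System.Messaging.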


The following diagram illustrates these three subsystems and their interactions:

[Diagram: the ITS Customer Service Application (with the ITS Customer Service Database) connects across B2B Internet connectivity to the ITS corporate network, where the ITS Work Order Message Queue Server (with the ITS Durable Message Queue) and the ITS Work Order Processing Application (with the ITS Work Order Processing Database) reside, along with the Technician Mobile Device Application.]

Figure 1. ITS Connected System Diagram
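The tightly coupled path, in which the Customer Service Application synchronously queries work order status, can be pictured with a similar hedged sketch using the JAX-RPC dynamic invocation interface from 2004-era J2EE. The namespace, service and port names, operation name, and endpoint URL are illustrative assumptions, not the study's actual WSDL contract.

```java
// Hypothetical sketch: calling the Work Order Processing Application's status
// Web service via JAX-RPC dynamic invocation (all names are assumptions).
import javax.xml.namespace.QName;
import javax.xml.rpc.Call;
import javax.xml.rpc.ParameterMode;
import javax.xml.rpc.Service;
import javax.xml.rpc.ServiceFactory;
import javax.xml.rpc.encoding.XMLType;

public class WorkOrderStatusClient {

    private static final String NS = "urn:its-workorder";

    public String getStatus(String workOrderId) throws Exception {
        ServiceFactory factory = ServiceFactory.newInstance();
        Service service = factory.createService(new QName(NS, "WorkOrderService"));
        Call call = service.createCall(new QName(NS, "WorkOrderPort"));
        call.setTargetEndpointAddress("http://its-central.example.com/ws/WorkOrderStatus");
        call.setOperationName(new QName(NS, "getWorkOrderStatus"));
        // With no WSDL loaded, parameter and return types must be declared explicitly.
        call.addParameter("workOrderId", XMLType.XSD_STRING, ParameterMode.IN);
        call.setReturnType(XMLType.XSD_STRING);
        return (String) call.invoke(new Object[] { workOrderId });
    }
}
```

In practice a generated static stub would be the more typical choice; the dynamic invocation interface is shown here only because it makes the moving parts (service, port, operation, parameter types) explicit without requiring the WSDL itself.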

4.5 Development Environments Tested


In a study that focuses on both developer productivity and application performance, the development environment can influence the study's outcome as much as the deployment platform. The choice of development environment affects how quickly and easily the developer can:

- build the application initially
- alter the code to eliminate performance bottlenecks

While there are .NET development tools from third-party vendors such as Borland, the vast majority of .NET development is done using Visual Studio .NET from Microsoft. This is the development environment that the .NET team used to produce its implementation. The J2EE world, on the other hand, offers many competing development tools with different approaches and advantages. Even within IBM's domain, choices exist. To reflect this range of offerings and enhance the study's usefulness, we had the J2EE team develop two different implementations of the ITS system using two different IBM tools: Rational Rapid Developer (RRD) and WebSphere Studio Application Developer (WSAD). Since both IDEs belong to IBM and are designed to work well with WebSphere, they are both consistent with the study's focus on the IBM WebSphere platform. But the two IDEs have important differences that ultimately led to different results.


Details on these tools, how they compare, how they were used, and other development software used with them can be found in Section 7.

4.6 Application Platform Technologies Tested


The ITS system tests the following functionality of the two application platforms:

- Web application development
- Web application configuration/tuning
- Web application manageability, reliability and performance
- Message-based application development
- Message queue reliability and performance
- Mobile device application development

4.7 Application Code Availability


The application code for both the .NET and J2EE implementations can be downloaded from http://www.middlewareresearch.com. Customers can download the applications and install them in their own environments for further testing and confirmation of the results. The discussion forum for the study is located at http://www.theserverside.com. Finally, customers and vendors can discuss the report, propose further testing, or offer comments by emailing research@middlewareresearch.com.


5 THE EVALUATION METHODOLOGY

This study was designed to simulate two enterprise development teams given a fixed amount of time to build and tune a working pilot application according to a set of business and technical requirements. One team developed the application using IBM WebSphere running on Linux, while the other developed it using Microsoft .NET running on Windows 2003. Development took place in a controlled laboratory environment where the time taken to complete the system was carefully measured.

The two teams worked from a common application specification derived from a set of business and technical requirements. Neither team had access to the specification until development started in the controlled lab setting. After developing an implementation, the team then tuned and configured it as part of a measured deployment phase. Each implementation was then put through a set of basic performance, manageability and reliability tests while running under load on the production equipment. Hence this study not only compares the relative productivity achieved by each development team, but also captures the base performance, manageability and reliability of each application in a deployed production environment.

It is extremely important to note that the study allocated a fixed amount of time to each phase of the project.[1] The study objectively documents exactly what each team was able to achieve in that fixed amount of time, including detailed notes on the technical roadblocks each team encountered and how they were resolved. As such, the study tells an interesting story that will undoubtedly spark much debate, but will also shed valuable light on each platform based on actual hands-on development and testing of a pilot business application.

5.1 The Teams


Each team fielded two developers skilled in their respective development platform, and each team was selected such that product experience levels and skill sets matched as closely as possible. As noted in Section 4.3, each team could have only two members in the lab at one time, but did not have to use the same two throughout the exercise.

Neither team included any representative from IBM or Microsoft, and neither team was allowed any direct interaction with vendor technicians from IBM or Microsoft other than through the standard online customer support channels available to any customer. In cases where a team used a vendor support channel, support technicians were not told they were assisting a research project conducted by The Middleware Company, so the team received only the standard treatment afforded any developer on these channels.

To mirror the development process of a typical corporate development team, we allowed the teams to consult with other members of their organizations outside the lab, to answer technical questions and provide guidance as required. Such access to external resources was monitored and logged, and we extended the rule prohibiting direct vendor interactions (other than with standard customer support channels) to all resources contacted during the development and testing phases of the project.

Here are the details on the makeup and experience of the two teams.

[1] Note that under certain circumstances we allowed a team to go beyond that fixed time period. See Section 5.3.1 for details.


5.1.1 The IBM WebSphere Team

The WebSphere team consisted of three developers from The Middleware Company, described in the following table. Members A and B developed both the RRD and WSAD implementations, while all three members participated at different times in the tuning and testing phases.

J2EE Team Members from The Middleware Company

Team Member A: 14 years development experience; 7 years Java; 4 years J2EE. Broad experience with development tools and platforms; particular strength in design.
Team Member B: 15 years development experience; 8 years Java; 6* years J2EE. Experienced in RRD, modeling and design.
Team Member C: 23 years development experience; 8 years Java; 6* years J2EE. Extensive experience in tuning enterprise applications for performance.

* Includes experience with the Java servlet API predating the introduction of J2EE in 1999.

Additionally, the J2EE team used two independent, IBM-certified WebSphere consultants at different times during the deployment and tuning phase:

- One had three years' experience as a WebSphere administrator on various Unix platforms, including Linux.
- The other had over four years' experience installing, configuring and supporting IBM WebSphere on multiple platforms, including Linux.

5.1.2 The Microsoft .NET Team

The .NET team consisted of three senior developers from Vertigo Software, a California-based Microsoft Solution Provider, with the following credentials:

.NET Team Members from Vertigo Software

Team Member A: 7 years development experience; 7 years Microsoft platform; 3 years .NET. Experienced in Web application development and design.
Team Member B: 13 years development experience; 13 years Microsoft platform. Experienced in design in the presentation, business, and database tiers.
Team Member C: Experienced in development and performance tuning.


5.2 Controlling the Laboratory and Conducting the Analysis


The Middleware Company subcontracted a third-party testing organization, CN2 Technology, to write a specification for the ITS system, set up the lab environment, design the tests, monitor and control the testing environment, and conduct the actual tests of the J2EE and .NET implementations. CN2 strictly monitored the time spent by each development team on the various phases of the project, and controlled the lab environment. CN2 also strictly monitored Internet access and email access, including logging all such access from within the lab, to ensure that neither team violated the rules of the lab. For details on those rules, see Section 5.4.

5.3 The Project Timeline


5.3.1 Target Schedule

This study was designed with the objective that each team would complete its work in 25 workdays (five calendar weeks), distributed as follows:

- Phase 1 (Development): 10 days
- Phase 2 (Deployment and tuning): 10 days
- Phase 3 (Formal evaluation testing): up to 5 days, as needed

While we felt confident that the teams could complete Phases 1 and 3 in the allotted time, we were less certain about Phase 2. If, after ten days of deployment and tuning, the implementation did not perform up to even minimal standards, the results of formal testing in Phase 3 would have little meaning. So we added a requirement that each team continue their configuration and performance tuning until satisfied that their implementation would perform well enough to actually undergo the tests in the final week. This meant that each team was allowed to go beyond their allotted ten days if they desired, with the understanding that all time spent would be monitored and reported.


5.3.2 Division of Lab Time Between the Teams

To keep the two teams from communicating with each other, while at the same time preserving the continuity of their work, we interleaved their time in the lab in the following sequence:

1. .NET team: .NET implementation, Phase 1 (development)
2. J2EE team: RRD implementation, Phase 1 (development)
3. .NET team: .NET implementation, Phase 2 (deployment/tuning)
4. .NET team: .NET implementation, Phase 3 (evaluation testing)
5. J2EE team: RRD implementation, Phase 2 (deployment/tuning)
6. J2EE team: RRD implementation, Phase 3 (evaluation testing)
7. J2EE team: WSAD implementation, Phase 1 (development)*
8. J2EE team: WSAD implementation, Phase 2 (deployment/tuning)
9. J2EE team: WSAD implementation, Phase 3 (evaluation testing)

*Note that the J2EE team developed the WSAD implementation offsite, not in the controlled lab environment. This implementation was not in the initial scope of the project, but was added to ensure that the performance, reliability and manageability tests painted a more complete picture of J2EE/WebSphere for the community.

5.3.3 Detailed Schedule

The following table documents the desired project schedule, including the schedule goals established for the development, tuning/configuration and testing of each implementation. As explained in Section 5.3.2, the two teams occupied the lab at different times, so this schedule was repeated for each implementation.

Desired Development and Testing Schedule (established prior to start of exercise)

Phase 1: Development

- Day 1 (1 hour). Development team arrives in lab; overview of lab rules and hardware environment. The team was introduced to the lab environment for the first time, the lab rules were explained, and a walkthrough of the hardware was conducted.
- Day 1 (2 hours). Development team given the application specification for the first time; two-hour application specification overview with Q&A. CN2 Technology provided a detailed walkthrough of the application specification and answered initial questions about it.
- Day 1. Application specification review; development tool and application server setup. The team reviewed the application specification in detail and created a strategy for dividing the work and beginning development.
- Days 1-10. Application development. The team developed the application according to the provided specification. All development time in the lab was carefully tracked for each component of the system. CN2 deemed development completed when the implementation passed a series of functional tests.

Phase 2: Deployment and Tuning

- Day 11. Review of base performance, manageability and reliability tests and requirements, including review of the Mercury LoadRunner test scripts and test tool. CN2 reviewed with the team the tests to be performed and the technical requirements/goals for these tests, and provided a walkthrough of the Mercury LoadRunner testing environment and base test scripts so the team could begin configuring and tuning.
- Days 11-20+. Application performance and configuration tuning. Ten 8-hour days were initially allotted for tuning in preparation for the evaluation tests. However, the team was allowed more time if required, to ensure they felt ready to conduct the actual tests.

Phase 3: Evaluation Testing

- Days 21-25. Performance, manageability and reliability tests conducted in the lab and results logged.

5.4 Laboratory Rules and Conditions


This section describes the conditions each team faced as they started each phase of the study and the various rules governing their behavior inside and outside the laboratory environment.

5.4.1 Overall Rules

Several rules applied to the entire exercise:

- Team members could only use the provided machines for development work and Internet access. Personal laptops were barred from the lab.
- Each day was limited to 8 hours working time in the lab, with an additional hour for lunch.
- The team could seek technical support and guidance from other members of their organization outside the lab as required. They could communicate via telephone or email.


Neither team members nor their offsite colleagues could have any interaction with vendor technicians from IBM or Microsoft, other than through standard online customer support channels. If they did use vendor support channels, team members could not reveal that they were participating in a study involving IBM and Microsoft software; they received only the standard treatment afforded any developer on these channels.

Note, however, that the WSAD implementation was developed after the RRD implementation, and was developed offsite, not in the controlled lab environment.

5.4.2 Development Phase

When a team entered the lab for the first time, they were given the following initial environment:

- A development machine for each developer, pre-configured with Windows XP and Internet access.
- Two machines with the two ITS databases pre-installed and pre-populated with data. The database server was Microsoft SQL Server for the .NET team and Oracle for the WebSphere team.
- Four application server machines pre-configured with the base OS installation only (Windows Server 2003 for the .NET team, Red Hat Enterprise Linux 2.1 for the WebSphere team).

As for augmenting or modifying this initial environment, both teams were under the same restrictions:

- They had to install/configure their development environment (tools, source control, etc.) as part of the measured time to complete the application development phase.
- They had to install the application server software separately on each server as part of the measured development time.
- They could not make changes to the database schemas, other than adding functions, stored procedures, or indexes.

This rule applied specifically to coding of the RRD and .NET implementations:

- Team members were not allowed to work on code outside the lab. This meant they could not remove code from or bring code into the lab.

For all implementations (RRD, WSAD and .NET) this rule applied:

- We allowed use of publicly available sample code and publicly available pre-packaged libraries, since a typical corporate development team would also have access to such code.

5.4.3 Deployment and Tuning Phase

When Phase 2 began, the CN2 auditor introduced the test environment that the team would be using. This environment consisted of:

- Mercury LoadRunner to simulate load
- A dedicated machine for the LoadRunner controller
- Some 40 additional machines to generate client load
- Base test scripts created by CN2, so that the development team did not have to spend time creating them


5.4.4 Testing Phase

The rules for Phase 3 were the most restrictive, since this phase consisted of the formal evaluation tests conducted by the CN2 auditor:

- Team members could not modify application code or system configurations except as needed during a test.
- After a load test was launched, the team had to leave the lab until the test reached completion (typically 1-4 hours later).

5.5 The Evaluation Tests


During Phase 3 the CN2 auditor conducted a variety of tests to measure manageability, reliability and performance of the implementations. Some of these tests required the active participation of the teams; others did not. Most of the tests were performed under load. As mentioned above, in this study Mercury LoadRunner running on 40 client machines was used to simulate load. CN2 provided the teams with a set of LoadRunner scripts for each implementation. The three sets of scripts were carefully constructed to perform the same set of actions, ensuring that they tested the exact same functionality for each implementation in a consistent manner.2 Here is a summary of the tests performed; for more details and for test results see Sections 12 to 14.

- Performance capacity (stress test). How many users can the system handle before response times become unacceptable or errors occur at a significant rate?
- Performance reliability. Given a reasonable load (based on the results of the stress test), how reliably does the system perform over a sustained period (say, 12 hours)?
- Efficiency of message processing. How quickly can the Work Order module process a backlog of messages in the queue?
- Ease of implementing change requests. How quickly and easily can a developer implement a requested change to the specification?
- Ease and reliability of planned maintenance. How easily and seamlessly can system updates be deployed to the system while under load?
- Graceful failover. How well does the clustered Customer Service module respond when an instance goes down?
- Session sharing under load. If one of the clustered Customer Service instances fails under load, are the sessions that were handled by the failed instance seamlessly resumed by the other Customer Service instance?

2 CN2 could not provide a single set of scripts for all three implementations because the three differed in certain low-level details, such as the URL of a given page, the names of fields in that page, and whether that page was to be invoked with GET or POST.


6. THE ITS PHYSICAL ARCHITECTURE

This section describes the hardware and software infrastructure each team used to run its implementation of the ITS system. The specification required that the teams deploy to identical hardware; in fact, they used the same hardware. On the machines hosting the applications and the message server, each team had its own removable hard drive that was swapped in. On the machines hosting databases, the two teams' DBMSs shared the same drive, but were never run simultaneously. In this way all three implementations used the very same processors, memory and network hardware. On the software side, the teams started with the operating systems and database engines already installed. They were responsible for installing the application server, message server, load balancing software and handheld device software.


This table lists the hardware and software used by each team:

Customer Service application
  Servers:       2 (identical, load-balanced)
  Hardware:      Hewlett Packard DL580 with 4 1.3 GHz processors, 2 GB of RAM and Gigabit networking
  .NET software: Windows Server 2003; .NET 1.1 development framework and runtime (part of Windows Server 2003)
  J2EE software: Red Hat Enterprise Linux AS 2.1; IBM WebSphere Network Deployment 5.1

Work Order Processing application
  Servers:       1
  Hardware:      Same as above
  .NET software: Windows Server 2003; .NET 1.1
  J2EE software: Red Hat Enterprise Linux AS 2.1; IBM WebSphere Network Deployment 5.1

Dedicated durable message queue server
  Servers:       1
  Hardware:      Same as above
  .NET software: Windows Server 2003; Microsoft MSMQ (part of Windows Server 2003)
  J2EE software: Red Hat Enterprise Linux AS 2.1; IBM MQSeries; IBM WebSphere Deployment Manager; IBM Edge Server

Customer Service database / Work Order database
  Servers:       1 each
  Hardware:      Hewlett Packard DL760 with 8 900 MHz processors and 4 GB RAM, attached to a SAN storage array with 500 GB of storage in a RAID 10 configuration
  .NET software: Windows Server 2003; SQL Server 2000 Enterprise Edition
  J2EE software: Windows Server 2003; Oracle 9i Enterprise Edition

Technician Mobile Device application
  Servers:       n/a
  Hardware:      Hewlett Packard iPAQ 5500 PocketPC
  .NET software: .NET Compact Framework
  J2EE software: Insignia Jeode JVM; Mobile Information Device Profile (MIDP) 2.0


The following diagram shows the physical deployment of the ITS system to the network, including all the machines listed above. It also shows the machine hosting the Mercury LoadRunner controller and the 40 machines providing client load.

Figure 2. ITS Connected System Physical Deployment Diagram

6.1 Details of the WebSphere Architecture


The basic WebSphere infrastructure described below was used with both J2EE implementations. The J2EE team installed it during the RRD round and did not substantially change it during the WSAD round.

6.1.1 IBM WebSphere

The team used WebSphere Network Deployment (ND) Edition version 5.1. This version of WebSphere has the same core functionality as basic WebSphere, but allows for central administration of multiple WebSphere instances across a network. It also allows instances to be clustered for the purpose of application deployment, so that, for example, one can deploy the Customer Service application to two WebSphere instances at once.


Initially the team included three nodes in the WebSphere network: the two Customer Service machines and the single Work Order machine. Later they included the Message Queue Server machine as well, so that they could run a WebSphere instance there for sharing session state in the Customer Service application.

WebSphere ND includes a Deployment Manager, a separate server dedicated to system administration. This server communicates with node agents on each node to handle remote deployment and configuration. The team installed the Deployment Manager on the same host as the MQ server.

In terms of WebSphere instances, the team started with one per node. Along the way they experimented with multiple instances per node (for example, to run each Work Order module in a dedicated instance), but found no improvement and returned to the original configuration.

6.1.2 IBM HTTP Server (Apache)

WebSphere has an HTTP transport listening on port 9080 that acts as a Web server. This transport is adequate for development and for running under very light loads, but cannot handle the traffic of even moderate loads. For this reason the team needed a separate Web server. Even though Red Hat Linux includes an Apache Web server distribution, the team chose to install IBM HTTP Server (IHS) 2.0, IBM's distribution of the Apache Web server.

Using an external Web server necessitates the use of IBM's Web Server Plugin, an interface between the Web server and the WebSphere HTTP transport. The plugin consists of a native runtime library and an XML configuration file, plugin-cfg.xml. Applying the plugin consists of these steps:

1. Modify Apache's httpd.conf file to load the plugin library.
2. Modify Apache's httpd.conf to point to the plugin configuration file.
3. From within WebSphere's Deployment Manager, automatically update the plugin configuration file and copy it to the ND nodes. Normally this process takes only a few seconds but must be done every time there is a change in the configuration of an application using the Web (such as the name or location of the Web application).
4. Bounce IHS if the configuration file has changed.

Note that along the way the team found reason to customize the plugin configuration in ways not possible through the Deployment Manager. That meant they departed from the normal plugin configuration update process described in Step 3. For details, see Section 10.6.9.4.

6.1.3 IBM Edge Server

The WebSphere team thought carefully about how to handle load balancing to and failover between the two Customer Service instances. One simple solution is a DNS round-robin arrangement, where a DNS server takes incoming requests to a single cluster IP address and distributes them evenly between the addresses of the two Customer Service machines. This solution addresses load balancing, but not failover. To handle both, the WebSphere team decided to use IBM's preferred solution, Edge Server. This component sits in front of the Web servers and balances load among them. But it also monitors the health of the Web servers and channels traffic away from one that fails.

The team installed Edge Server on the MQ server host, because that machine was guaranteed not to go down. Then they had to configure that host and the Customer Service hosts at the operating system level for Edge Server to work properly. These configuration requirements led to some of the most vexing problems faced by the WebSphere team, as discussed in Section 10.6.


6.1.4 IBM WebSphere MQ

The WebSphere team used IBM's WebSphere MQ Series for its message server. MQ was installed on the host designated for that purpose. Using it also required that host to have an instance of WebSphere, whose JMS server acts as a front end for MQ.
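To make the messaging setup concrete, here is a minimal sketch of JMS message production of the kind the Customer Service application performed against MQ. It is illustrative only: the class name and JNDI resource names are invented for the sketch, not taken from the team's code.

    import javax.jms.*;
    import javax.naming.InitialContext;

    // Illustrative sketch only; the JNDI names below are hypothetical.
    public class TicketMessageSender {

        public void send(String ticketXml) throws Exception {
            InitialContext ctx = new InitialContext();
            QueueConnectionFactory factory = (QueueConnectionFactory)
                ctx.lookup("java:comp/env/jms/QueueConnectionFactory");
            Queue queue = (Queue) ctx.lookup("java:comp/env/jms/WorkOrderQueue");

            QueueConnection connection = factory.createQueueConnection();
            try {
                QueueSession session =
                    connection.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
                QueueSender sender = session.createSender(queue);
                // The teams exchanged ticket data as XML text messages.
                sender.send(session.createTextMessage(ticketXml));
            } finally {
                connection.close(); // also closes the session and sender
            }
        }
    }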

6.2 Details of the .NET Architecture


For the servers, the .NET team required no software other than Windows Server 2003. With a base installation of Windows Server 2003, enabling Application Server mode installs and configures the .NET Framework, Internet Information Services, MSMQ, and all other components that the .NET team needed to build their implementation. The .NET Framework has built-in support for Web services and message queuing, which enabled the team to provide integration between the Customer Service and Work Order applications.

6.2.1 Microsoft Internet Information Services (IIS)

Microsoft Windows Server 2003 comes with Internet Information Services (IIS) version 6.0. Like Apache for Linux, IIS 6.0 is a widely used Web server for Windows Server 2003. ASP.NET, the Web application engine for .NET applications, is integrated directly with IIS 6.0. In addition, Visual Studio enables developers to deploy applications to production servers or staging servers directly from their development machines, a feature that the .NET development team utilized during development.

6.2.2 Microsoft Network Load Balancing (NLB)

One of the requirements for the ITS Customer Service application was to support load balancing and failover. Microsoft Network Load Balancing (NLB) Service is designed to provide this functionality. Built into Windows Server 2003, this service balances the load among multiple Web servers (in this case, two) and monitors their health, providing sub-second failover from a server that fails. The .NET team had to configure NLB on the two servers that hosted the Customer Service application using the graphical configuration tools built into Windows Server 2003. The details of how the .NET team configured NLB for the ITS system are found in Section 11.1.1.

6.2.3 Microsoft Message Queue (MSMQ)

The ITS specification also required loosely coupled integration between the Customer Service and Work Order applications via a message-driven architecture. The .NET team used Microsoft Message Queue (MSMQ) to satisfy this requirement. Like IIS and NLB, MSMQ comes built into Microsoft Windows Server 2003. The .NET team had to enable MSMQ and create and configure the queues for the application. .NET provides classes for accessing and manipulating the queues. As per the specification, a separate, dedicated queue server was used for message queuing, with the Customer Service application writing to the remote queue on this server, and the Work Order application reading messages from this remote queue for processing.


7. TOOLS CHOSEN

Each team had the freedom to choose any development, analysis, profiling and support tools they wished to complete their work for their platform. This section describes the various tools they chose.

7.1 Tools Used by the J2EE Team


7.1.1 Development Tools

The J2EE team had a broad choice of development environment. To give a better overview of the tradeoffs between different types of tools, two implementations for WebSphere were built: one using IBM's Rational Rapid Developer (RRD), the second using IBM WebSphere Studio Application Developer (WSAD). RRD is a model-driven, visual tool that provides O/R mapping and data binding technology and generates J2EE code from visual constructs. WSAD is a more mainstream J2EE development tool dedicated to WebSphere. These two IDEs have important differences that pertain to this study:

- RRD's approach emphasizes developer productivity, but the code it generates is not optimized for performance and does not lend itself to manual tuning.
- WSAD's approach requires the developer to write much more code manually, but gives the developer complete freedom to optimize that code.
- While both tools work well with WebSphere, WSAD integrates more tightly and provides a lightweight version of WebSphere for development testing.

This table compares the two IDEs in greater detail:

Comparing Rational Rapid Developer (RRD) and WebSphere Studio Application Developer (WSAD) as Development Tools

Approach to J2EE development
  RRD:  Takes a model-driven approach that removes you, the developer, from the J2EE platform by several degrees. Has you model your classes, pages, components, messages and business logic in its own format; then it generates Java and JSP code for you. That generated code becomes just another product of the tool which, like generated deployment descriptors, you would normally not touch, much less edit.
  WSAD: Takes a conventional approach in that you must write and manage all the Java and JSP code directly. WSAD may offer templates or wizards to get you started on a particular coding path, but you must still handle the resulting code.

Approach to page development
  RRD:  Has you place controls in a page design space, then bind them to data objects from your class model. Each page is served by its own subset of classes and attributes from the model.
  WSAD: Again, more conventional: you write business logic code to be used in standard JSPs, then write the JSPs themselves. If desired you can use Struts.

Deployment platform for development
  RRD:  Supports a number of popular platforms, including WebSphere, WebLogic and Apache Tomcat. For development purposes IBM recommends deploying to the much lighter-weight Tomcat platform, then at the end regenerating for and deploying to WebSphere.
  WSAD: Dedicated to WebSphere. You can deploy your application directly to a WebSphere instance. Also includes a WebSphere test environment (a lightweight version of WebSphere) that speeds development.

Configuring WebSphere
  RRD:  Has platform settings for WebSphere that let you specify JDBC datasources, JMS message queues and other critical resources. But these settings affect the application only, not the target platform. You must still configure WebSphere directly.
  WSAD: Lets you configure your target platform, whether the WebSphere test environment or a real WebSphere instance, through the IDE. Conversely, you can also configure WSAD's test environment through a standard WebSphere admin console just as you would the real WebSphere.

7.1.1.1 Rational Rapid Developer Implementation

For their first implementation the J2EE team used RRD for most, but not all, development work:

- They used RRD to build the two Web applications (Customer Service and Work Order), the Work Order message consumption module and the Work Order Web service, which answers work ticket queries from the Customer Service application.
- RRD was not suited for developing the handheld module, however. For that piece the team used Sun One Studio, Mobile Edition.
- During the tuning phase they developed a small library of custom classes to solve some performance bottlenecks. They used TextPad to write the classes and the Java Development Kit (JDK) to compile and package the library.

For source control of the RRD code, the team used Microsoft Visual SourceSafe, which integrates nicely with RRD.

7.1.1.2 WebSphere Studio Application Developer Implementation

For their second implementation the J2EE team used WSAD for all development, except the handheld module, which they did not redevelop in the second implementation.


Although WSAD works with certain source control software, including CVS and Rational ClearCase, the team did not use either for this implementation. Instead they simply divided the work carefully and copied changed source files between their two development machines.

7.1.2 Analysis, Profiling and Tuning Tools

To profile the application, identify bottlenecks within the code and analyze system performance, the team used these tools at various times:

- WebSphere's tracing service. A crude runtime monitor built into WebSphere. From the admin console you can select all the different activities you want to monitor in WebSphere; the list covers everything the server does. You choose the categories, restart the server, and see the output in a log file.
- IBM Tivoli Performance Viewer (TPV). A profiler that integrates easily with WebSphere. It displays a wide range of performance information. TPV also has a performance advisor that recommends changes for better performance.
- VERITAS Indepth for J2EE. A sophisticated profiler that lets you measure the performance of code to almost any desired granularity.
- Borland Optimizeit. Another profiling tool, which gave the team important information about thread usage that Indepth could not provide.
- Oracle Enterprise Manager. The team used this tool to manage and tune the database, for example to adjust the size of Oracle's buffer cache. But Enterprise Manager also has a suite of analysis tools that the team used from time to time. By far the most useful was Top SQL, which gives valuable statistics on the SQL statements executed against the database.
- top and Windows Performance Monitor. The team used these simple tools to monitor CPU usage on the Linux and Windows machines respectively.

7.2 Tools Used by the .NET Team


7.2.1 Development Tools

For development, the Microsoft .NET team chose Microsoft Visual Studio .NET Enterprise Architect Edition 2003, coupled with Visual SourceSafe 6.0d for source control. They used Visual Studio to lay out ASP.NET Web Forms graphically, but coded the back-end business and data logic manually for all applications using C#.

For Web development, Visual Studio includes a feature that makes deployment fairly easy. The Copy Project mechanism allows a developer to deploy a Web application to any machine with IIS installed.

The .NET team also used Visual Studio to develop the handheld application, since they chose to target a Microsoft Windows Mobile 2003-based Pocket PC, which includes the .NET Compact Framework. To develop the application, the team used Visual Studio's Pocket PC emulator; for testing and deployment, they used the real device. With Visual Studio, deploying to a real device was straightforward.

7.2.2 Analysis, Profiling and Tuning Tools

The primary tool the .NET team used for analysis was Windows Performance Monitor. Given the broad range of performance counters available in Windows, this tool can provide fine-grained visibility into the resource utilization of the applications under investigation.


To help them analyze database activity, the team used these Microsoft SQL Server 2000 tools:

- Query Analyzer (including the Index Tuning Wizard)
- Enterprise Manager
- Profiler


8. DEVELOPER PRODUCTIVITY RESULTS

The focus during the development phase of the project was on developer productivity: how quickly and easily can a team of two developers build the ITS system to specification? Section 8.1 presents the quantitative productivity results of the development phase. The rest of Section 8 details the experiences of the two development teams: the architecture they chose for their implementations, what went well for them during the development phase, and the major roadblocks they encountered.

8.1 Quantitative Results


The .NET and RRD implementations were built in the controlled environment of the lab. During their development, the team members carefully tracked and the auditor recorded the time spent building each of the core elements of the ITS system. Since the Oracle 9i and SQL Server 2000 databases were fully installed and configured in advance, neither team had to spend time creating database schemas. The WSAD implementation, on the other hand, was built later under special circumstances:

- The J2EE team had already built the application once.
- The team did not work in the lab, so the auditor could not monitor their time. Instead they carefully tracked their own times.
- The team did not reinstall WebSphere on the development or production machines.
- The team did not redevelop the handheld application.

For these reasons the auditor's report provides productivity results only for the .NET and RRD implementations, while issuing a disclaimer regarding the WSAD results.

8.1.1 The Basic Data

The study tracked these core development tasks:

- Installing Products. This included time to install software on both development and server machines. All equipment used by the two teams initially had only a core OS installation, except for the two databases (Customer Service and Work Order), which were already installed and pre-loaded with an initial data set.
- Building the Customer Service Web Application. This included constructing the Web UI and backend processing for the Customer Service application according to the provided specification, as well as the functionality to send messages. It also included creating the Web service request that provides the ticket search functionality in the Customer Service application, and ensuring the application could be deployed to a cluster of two load-balanced servers with centralized server-side state management for failover.
- Building the Work Order Processing Application. This included building the Web UI and backend processing for the Work Order application according to the provided specification, as well as the message handling functionality. It also included creating the Web service for handling ticket search requests from the Customer Service application.
- Building the Technician Mobile Device (Handheld) Application. This included building the complete mobile device application according to the provided specification.


- System-Wide Development Tasks. This category included working out general design issues, writing shared code, and general deployment and testing.

The following table shows the actual time spent building the .NET and RRD implementations, in developer hours. The data come from the auditor's report:

Time Spent Developing the ITS System, by Development Task (in developer hours)

  Development Task / ITS System Component     .NET / VS.NET   J2EE / RRD
  Customer Service Application                      40             69
  Work Order Processing Application                 41             59
  System-Wide Development Tasks                      2             29
  Subtotal                                          83            157
  Product Installs                                   4             22
  Technician Mobile Device Application               7             16
  Overall total                                     94            196

The WSAD implementation was created later by the same team that had previously created the RRD implementation. It was also created outside of the controlled lab setting. Hence, productivity data for this implementation cannot be directly compared to the other two, since the team benefited from having already built the same application once. In addition, the team did not reinstall the WebSphere software or redevelop the handheld application for the WSAD implementation.


Nevertheless, the following table shows the relative time spent developing the WSAD implementation of the ITS system. The data come from the developers' logs.

Time Spent Developing the ITS System, by Development Task (in developer hours)

  Development Task / ITS System Component     J2EE / WSAD
  Customer Service Application                     13
  Work Order Processing Application                46
  System-Wide Development Tasks                    33
  Subtotal                                         92
  Product Installs                                n/a
  Technician Mobile Device Application            n/a
  Overall total                                   n/a

Given how easily two developers working closely together can move quickly among several tasks, one should not read too much precision into the breakdown of these numbers by development task. Nevertheless, some interesting conclusions emerge:

8.1.2 .NET vs. RRD

The .NET team developed the entire system about twice as fast as the J2EE team did using RRD. This greater speed applied across all components. One of the greatest differences was for product installation. This is not surprising, since several key server-side .NET components were already present as part of the base installation of Windows Server 2003:

- Internet Information Services (IIS), the Web server
- Network Load Balancing (NLB)
- Microsoft Message Queue (MSMQ), the message server

The corresponding components on the WebSphere side (IBM HTTP Server,3 Edge Server and WebSphere MQ Server) had to be installed separately. So, of course, did the WebSphere Application Server itself, on both the development and production machines. Another significant difference was in developing the Mobile Device piece, where the J2EE team ran into some roadblocks. (See Section 8.2.3.5 for details.)

3 As noted elsewhere, the base Linux installation included an installation of the Apache Web server, but the team chose to use IBM's version instead.


Even within the core development (the Customer Service and Work Order applications), the .NET team was more productive. Much of the explanation may lie in the simple fact that Visual Studio .NET is the dominant .NET tool, and a developer who has worked in .NET for 3 years has probably worked in VS.NET most or all of that time. In the J2EE world, by contrast, RRD is one of many tools, and a comparatively new one at that. The .NET team was undoubtedly more experienced with their tool than the J2EE team was with theirs.

Another factor may be the differing approaches taken by the two tools. VS.NET is more comparable to WSAD than to RRD: a development environment that connects you directly and explicitly to the platform on which you are developing. RRD, on the other hand, is marketed as a rapid development tool that accelerates the development process via its model-driven approach. RRD distances the developer from J2EE, and the team found that it simplified some tasks but complicated others where low-level code access would have been desirable. In a wide-ranging development project like ITS, RRD's weaknesses may have outweighed its particular strengths.

Although the teams did not track the time spent performing different types of tasks (such as designing a Web page vs. coding database access logic), some inferences are possible. Both RRD and Visual Studio provide excellent GUI design tools and the ability to bind data objects to fields in a page. It is likely that the two tools offered much more similar productivity in this area, and that the greatest differences lay in other aspects of application development, such as coding the Customer Service logic to create and manage new work tickets in memory.

8.1.3 .NET vs. WSAD

The .NET implementation (excluding product installation and the Mobile Device application) took approximately 10% less time to develop than the WSAD implementation (although, as noted, productivity for the WSAD implementation benefited from the fact that the team had already built the same application using RRD).

Though the totals are similar, the distributions differ. The J2EE team spent much more time writing common code and much less time on tasks specific to the Customer Service application. This was also true of the Work Order Web application, though the total for that item in the table also includes most of the work on messaging. The main reason for the higher total under common code is that the J2EE team developed frameworks for the Web, business logic and persistence tiers. For example, their custom-built base servlet class provided much of the functionality needed by all the servlets in the two Web applications. This design reduced the time spent developing individual use cases, while increasing the proportion of time spent on common tasks.

8.2 RRD Development Process


This section describes the J2EE team's experience developing the RRD implementation of ITS.

8.2.1 Architecture Summary

8.2.1.1 RRD Applications

The J2EE team divided the ITS system into three RRD applications:

- The Customer Service application, including the Customer Service Web interface and message production.
- The Work Order console application, which includes message consumption and the Web service used by the Customer Service application. When RRD builds this application, the Web service is packaged as a separate EAR file and must be installed separately.
- The Work Order Web application, which stands by itself.


8.2.1.2 Database Access

On the back end, the team chose to forgo stored procedures and stick with explicit SQL through JDBC. The fact that RRD would generate JDBC logic automatically weighed against writing stored procedures. The team knew from prior experience that, for single database actions, a prepared statement invoked via JDBC performs at least as well as a stored procedure. So they expected that RRD's generated logic would suffice for basic CRUD operations (which covered most cases). There were cases, however, where they needed to customize that logic. For example:

Work ticket searches. Both the Work Order Web application and the Web service used by the Customer Service Web application allow ticket queries based on various combinations of criteria. For example, the Work Order Web application allows queries by any combination of customer ID, ticket creation date, work type, ticket status and technician assignment. The developers had to write code that determined which search criteria were used and constructed a custom SQL statement that uses only those criteria. Customer search. The Work Order Web application has a customer query function based on partial match of customer name. For every customer found it returns the number of tickets in each of three ticket status categories (created, in progress, completed). By default, RRDs generated code would have separately counted tickets in each category for each customer, in other words three additional SQL actions per row of customer data returned. Developer B reduced that number to one action per customer by using a custom 4 SQL statement with a GROUP BY clause to get all three counts in one action.

Later, during the tuning phase, the team discovered that some of the RRD-generated code performed poorly. In response they added other JDBC optimizations. See Section 10.4.9 for specifics.

8.2.1.3 Overall Shape of the Code

RRD lets you choose from a variety of code generation patterns that will produce different code from the same design. The team made its choices based on past experience with development projects as well as performance research.

- Persistence tier. RRD lets you choose EJB entity beans or plain ordinary Java objects (POJOs) for database operations. The team decided to avoid the overhead of entity beans and went with POJOs.
- Business tier. RRD offers session beans and POJOs. Again the team chose the latter because of its lower overhead. One exception was the code for message production in the Customer Service application; RRD wraps that code in stateless session beans.
- Web tier. For Web pages, RRD offers JSPs or ordinary servlets. Since neither would be edited directly, the choice was to a large degree arbitrary. The team chose straight servlets. Note also that RRD has its own Web application framework, so the use of an external MVC framework like Struts was not considered.
- Message consumption. EJB message-driven beans (MDBs) have long been the accepted technique for consuming JMS messages within an application server. They are simple and lightweight, and RRD generates them by default. The team did not deviate from that choice. (A sketch of the pattern follows this list.)
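For the message consumption choice above, the generated code follows the standard EJB 2.0 message-driven bean shape. The sketch below shows only that general pattern; the class name and payload handling are invented, not RRD's generated code.

    import javax.ejb.MessageDrivenBean;
    import javax.ejb.MessageDrivenContext;
    import javax.jms.Message;
    import javax.jms.MessageListener;
    import javax.jms.TextMessage;

    // Illustrative EJB 2.0 MDB skeleton; names are hypothetical.
    public class WorkOrderMessageBean implements MessageDrivenBean, MessageListener {

        private MessageDrivenContext context;

        public void setMessageDrivenContext(MessageDrivenContext context) {
            this.context = context;
        }

        public void ejbCreate() { }

        public void ejbRemove() { }

        // The container delivers each queued message here, typically inside
        // a container-managed (distributed) transaction.
        public void onMessage(Message message) {
            try {
                String payload = ((TextMessage) message).getText();
                // ... parse the ticket XML and update the Work Order database ...
            } catch (Exception e) {
                // Force redelivery rather than losing the message.
                context.setRollbackOnly();
            }
        }
    }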

4 Here is the exact SQL statement: SELECT count(ticketid) FROM worktickets WHERE customerid = ? GROUP BY ticketstatus ORDER BY ticketstatus


8.2.1.4 Distributed Transactions

Those database actions stemming from JMS message processing required a distributed transaction to span the database update and the JMS action. Both sides of the system, Customer Service and Work Order, had such requirements. Because distributed transactions require a two-phase commit, they are slower than one-phase transactions within a single database. So the team did not want to use distributed transactions for all database actions. Instead they set up two JDBC data sources to each database: one using an ordinary driver for simple transactions, the other using an XA-capable driver for distributed transactions.

8.2.2 What Went Well

8.2.2.1 Web Interfaces

The team split up the work of developing the two Web interfaces. Developer A took on the Customer Service Web application, as well as building some generic login and page security functionality. Meanwhile Developer B tackled the Work Order Web application, but first set up style sheets and the look and feel for the applications.

This part went very smoothly. RRD's facilities for building Web pages and linking them to data structures are two of its strong points. By the end of Day 3 (where Day 1 was devoted mostly to installing software), the team had completed much of the simple logic linking the Web interfaces to database actions.

8.2.2.2 Web Service Integration

Part of this process went smoothly. Developer A discovered that RRD only supported scalar types in a message: no schema, no complex data. He found it at least a little restrictive for real applications. In the face of that restriction he decided the best approach was to have the Web service return XML strings for its results, and hand-code the logic to generate and parse those XML strings. This took time, but the coding was straightforward. Other aspects of the Web service piece caused confusion and loss of time. See Section 8.2.3.2 below.

8.2.3 Significant Technical Roadblocks

8.2.3.1 Holding Data in Sessions

In the Customer Service application a user can create one or more work tickets (work order requests), then submit them to the system. The specification requires that the tickets be held in memory before submission; the obvious place to hold them is in the client's HTTP session, and the standard way to do that would be as instances of some work ticket Data Transfer Object (DTO) class.

Here, however, Developer A discovered one of RRD's limitations. Because of the way it organizes its generated code, RRD does not lend itself to the standard solution. RRD organizes all the generated code for a given page in a page-specific package. This includes page-specific classes representing the data structures used by that page. In other words, if two pages both use the WorkTicket class from the class model, RRD generates a different WorkTicket Java class for each page, each in a different package. This means that if Page 1 creates an instance of its WorkTicket class and places it in a session, Page 2 cannot use it as an instance of its WorkTicket class.

Developer A used RRD's preferred solution to this problem: store the data in session in XML form. He used RRD's features to define an XML data structure and map it to classes in the model. The generated code uses a DOM API to parse the XML.5 This solution is tedious, and a bit rankling to the hard-core J2EE developer. Nevertheless, it did work. (Fast forward to Phase 2, when the team found all this to-ing and fro-ing between XML and objects a major performance bottleneck. They ripped out the XML code and replaced it with logic that used a custom DTO class.)

8.2.3.2 Web Service Integration

At one point Developer A ran into an anomalous situation with respect to the Web service on the Work Order side. RRD was not building the Web service properly, and the reason was not apparent. In his previous experience with RRD he had found that RRD did not generate an IBM-specific deployment descriptor that seemed to be necessary. In the past he had used another tool (WSAD) to generate the missing descriptor, so he did so again. He created and built a simple Web service project simply to generate a descriptor. When he included this descriptor in the RRD application, however, it did not deploy correctly.

It turned out that the initial failure was related to a bad state in RRD, possibly a source control issue. One of the files was not open for writing, but RRD didn't tell him. So the Web service implementation hadn't been saved properly and consequently didn't work. Once he tracked down and fixed the file problem, RRD did successfully build and deploy the Web service without the custom descriptor. At that point Developer A took yes for an answer and moved on without investigating the anomaly. But the detour cost him several hours.

8.2.3.3 Configuring and Using WebSphere MQ

Developer A, responsible for the Customer Service application, also worked out how to get RRD to talk to WebSphere MQ on the development machine. Despite the fact that all the software products involved are IBM's, this process was not as simple as it could be. It took some time to work out the correct permutation of settings to get them all to work. Once he had built the message senders and receivers in RRD, putting data into them and getting it out was fairly straightforward (using XML).

Later on, when the team set up its production environment, MQ again gave Developer A a headache, this while he was setting it up in a remote fashion (one MQ installation that served three applications residing on different servers). The process was complicated by the undocumented fact that IBM WebSphere's MQ installation did not use the default port of 1414 for MQ. Rather it used 5558, which was not at all obvious.6

8.2.3.4 Handling Null Strings in Oracle

The team discovered that Oracle treats empty strings in updates as null. This caused errors, since most fields in the ITS schemas don't allow nulls. The team compensated through code in two ways:

- Many string attributes required non-empty values. This requirement was enforced in the input forms.
- For non-required attributes, the developer set the initial value to a space character. (A sketch of this workaround follows.)
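The second workaround amounts to a one-line substitution before each update, along the lines of this hypothetical helper (not the team's actual code):

    // Substitute a single space so Oracle does not treat the value as NULL.
    public static String oracleSafe(String value) {
        return (value == null || value.length() == 0) ? " " : value;
    }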

8.2.3.5 Building the Handheld Module

Although, as noted earlier, the team used a different tool (Sun One Studio ME) to build the handheld application, they did involve RRD in the process.

5 Document Object Model, an API that treats an XML document as a tree of objects.

6 The developer discovered this by getting a complete process list using the ps command (ps -efl). He noticed an entry strmqmlsr that had a port setting, tried this number, and it worked.


Given several ways for the handheld application to communicate with the Work Order application, they chose to use a Web service. Using RRD, Developer B added a Web service interface to the Work Order Web application, defining the five remote operations needed by the handheld application:

- Login (logout doesn't require a remote call)
- View work orders for this technician for a certain status
- View an individual work ticket
- Mark a ticket as started
- Submit time spent and mark ticket completed

Meanwhile Developer A ran a stub generator in the J2ME wireless toolkit in Sun One Studio to create the Web service client. This should have been easy, but it wasn't. It turned out that the stub generator supported only document/literal Web services, whereas RRD supported only rpc/encoded. No manner of tweaking the WSDL would make the two talk to each other.

But they had all the logic ready to use in EJB methods; they needed a way to allow the PDA to execute them. Luckily the wireless toolkit also had a wizard for converting a service (basically a simple Java class with some methods in it) into a servlet and client piece using HTTP POST and simple data (not Web services). So they converted the EJBs generated by RRD into POJOs, delegated to their methods from the service class and ran the wizard. With that, they had the two halves talking to each other. The final step was to build the front-end MIDlet in J2ME. Luckily this step was very easy and took only a couple of hours.

8.2.3.6 Miscellaneous RRD Headaches

Along the way RRD posed a variety of smaller challenges. Among the more interesting:

Inability to centralize common page logic. Because RRD does not let you work directly with JSPs and because the ITS specification prohibited use of frames in the pages, the team could not easily centralize page logic that was common to most or all pages. This logic included the navigation bar (links to other pages) and page authentication logic to verify that the user is logged in before displaying the page. In RRD this logic had to be copied to every page. A tedious but finite process, it would have been much worse had the number of pages been significantly greater.

False error when building an EAR file for WebSphere. Developer A discovered a glitch in RRD's build process for WebSphere. The build script that RRD creates makes an EAR file under <websphere home>/RationalRDApps/<application name>. If there are no JARs in the application (EJB JARs or custom libraries), the script returns an error code and RRD aborts. This is true even if there need not be any JARs. Developer A worked around it by dropping any old JAR into the folder (such as Oracle's classes12.jar) to avoid the false error code.

Placement of the GlobalObject class. For each application, RRD generates a GlobalObject class that contains any global functionality you define. Although the class is application-specific, RRD does not package it in the resulting application EAR file. Rather it is treated like an external library: it must be placed in the server's class path. This means the team had to bounce WebSphere whenever the class changed. Also, since they used WebSphere's admin console rather than RRD to deploy to the production servers, they had to manually copy the class to its proper destination. It cost them some time figuring out where the class belonged and ensuring it was properly updated.

Date handling in a Web page. Developer B discovered an apparent bug in RRD's handling of date fields in a page: after constructing a page with date fields tied to date attributes of an object in the model, he entered a date using the correct format, then printed its value to standard error. The printed value was one day behind the entered value. He wrote a simple global method to increment a date value and compensate for the discrepancy.

Inconsistencies regarding editing and source control. RRD works seamlessly with Visual SourceSafe; you can easily check files out of and into VSS from within RRD. And for the most part, RRD is smart about preventing you from editing files that you have not checked out. But Developer B found gaps in this intelligence, leading to wasted time. For example, when he started to add a session attribute to a project, he was able to define the attribute, enter its name and set its initial value before realizing he had not checked out the source file where the attribute would be stored. But without checking out that file he couldn't save his work. And checking out the file caused him to lose his work.

Adding a static HTML page to a Web application. At one point Developer B wanted to add a static HTML page to the Customer Service Web application. This turns out not to be easy at all in RRD. RRD has no facility for directly adding an actual HTML file to the Web app. Instead it has a way to let you take snapshots of pages that change little or not at all: you designate the page as static in the page properties, then go through a two-step construction process. Developer B created a dummy page and tried this, but quickly got bogged down in the details. So he gave up and instead simply dropped an HTML file in a folder that was included in the WAR build. That did the trick.

8.3 WSAD Development Process


8.3.1 Architecture Summary

For this implementation the team decided to keep things simple and straightforward. They used only standard J2EE APIs according to established best practices.

8.3.1.1 Overall Shape of the Code

The team chose an architectural framework based on lessons learned from prior development experience and performance research. Here's a tier-specific breakdown:

Database access. As with the RRD implementation, the team again avoided putting database logic in the database itself via stored procedures, favoring explicit SQL statements in the Java code. In this more standard J2EE environment, where they would have to write the database logic either way, several factors led to this conclusion:

- For a single database action, a prepared statement invoked via JDBC performs at least as well as a stored procedure.
- Prepared statements are easier to write than PL/SQL stored procedures.
- Having the SQL in the application code rather than in the database simplifies code maintenance.

Along the way the team discovered that this choice gave them even greater flexibility than first thought: they could invoke complete PL/SQL statements as JDBC prepared statements, letting them combine multiple database actions into one. More on this below under Section 10.5.4, Optimizing Queries.

The persistence tier. Having chosen persistence logic within the application itself, the team then had to choose how to organize that logic: in EJB entity beans, in another O/R mapping layer such as Hibernate or JDO, or in POJOs with straight JDBC. Again, from past experience they decided that entity beans were too expensive. They considered Hibernate, but rejected it; it might provide a significant productivity gain only if the entity model were extensive, and the team wasn't certain it would scale. So instead they settled on POJOs.


To centralize all common aspects of JDBC operations, the team created a simple framework based on the Command pattern. A central JdbcHelper class was responsible for getting a connection, executing a statement, getting and returning the results of that statement (if the SQL was a query), and handling errors. It obtained the SQL statement and the actual query results from a case-specific callback object (the command object). The callback classes themselves were organized as inner classes of entity-specific JDBC helper classes, such as CustomerJdbcHelper for customer activity. Each class had methods representing individual database actions, such as updateCustomer(). Each method would create or obtain the proper callback object7 and invoke the central JdbcHelper logic to execute the database action. This framework proved very effective in minimizing repetitive code, reducing bugs and easing maintenance. (A sketch of the pattern follows.)
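Here is a minimal sketch of such a framework. It is an approximation built from the description above, not the team's code: the interface, the JNDI datasource name and the error handling are all simplified assumptions.

    import java.sql.*;
    import javax.naming.InitialContext;
    import javax.naming.NamingException;
    import javax.sql.DataSource;

    public class JdbcHelper {

        // The case-specific command object: supplies the SQL, binds the
        // parameters, and maps the ResultSet to a result object.
        public interface JdbcCommand {
            String getSql();
            void setParameters(PreparedStatement ps) throws SQLException;
            Object handleResults(ResultSet rs) throws SQLException;
        }

        public Object executeQuery(JdbcCommand command) throws SQLException {
            Connection con = null;
            PreparedStatement ps = null;
            ResultSet rs = null;
            try {
                con = getConnection();
                ps = con.prepareStatement(command.getSql());
                command.setParameters(ps);
                rs = ps.executeQuery();
                return command.handleResults(rs);
            } finally {
                // Connection handling and cleanup live in exactly one place.
                if (rs != null) try { rs.close(); } catch (SQLException ignored) { }
                if (ps != null) try { ps.close(); } catch (SQLException ignored) { }
                if (con != null) try { con.close(); } catch (SQLException ignored) { }
            }
        }

        private Connection getConnection() throws SQLException {
            try {
                DataSource ds = (DataSource)
                    new InitialContext().lookup("java:comp/env/jdbc/itsDataSource");
                return ds.getConnection();
            } catch (NamingException e) {
                throw new SQLException("Datasource lookup failed: " + e.getMessage());
            }
        }
    }

An entity-specific helper such as CustomerJdbcHelper would then define its callbacks as inner classes and hand them to executeQuery().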

The business tier. For the business façade the team considered EJB session beans but rejected them as too expensive. Instead they created a simple framework based on the Façade pattern. Each application had a single façade class with stateless methods representing every action required by that application's front end. Most of these methods simply called a method of the appropriate JDBC helper class. Each façade class also created a single instance of itself and of each JDBC helper class it used, to eliminate unnecessary object creation.

The Web tier. The team used JSPs for Web pages and servlets to tie those pages to business logic. They decided against using an MVC framework (Struts was the prime candidate) because it would add runtime overhead. Nevertheless, they did create a simple controller servlet that reproduced some of Struts' conveniences (a sketch of this base servlet follows the list):

- Form field validation
- Message handling (informational and error)
- Request forwarding
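A hedged sketch of what such a base servlet might look like; the hook methods and attribute names here are invented, and the team's real class covered more ground:

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public abstract class BaseControllerServlet extends HttpServlet {

        protected void service(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            if (!validate(request)) { // form field validation hook
                request.setAttribute("errorMessage",
                    "Please correct the highlighted fields.");
                forward(request, response, getFormPage());
                return;
            }
            handle(request, response); // subclass-specific logic
        }

        // Subclasses override these hooks.
        protected boolean validate(HttpServletRequest request) { return true; }
        protected abstract String getFormPage();
        protected abstract void handle(HttpServletRequest request,
                                       HttpServletResponse response)
                throws ServletException, IOException;

        // Centralized request forwarding.
        protected void forward(HttpServletRequest request,
                               HttpServletResponse response, String path)
                throws ServletException, IOException {
            request.getRequestDispatcher(path).forward(request, response);
        }
    }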

This servlet became the ancestor class of all servlets in the two ITS Web applications.

The team also did not use any custom tag libraries. This choice meant that their JSPs contained significant amounts of Java code. For a comparatively short project such as this, the additional code was acceptable; in a longer, more enduring project the team would probably have refactored their JSPs to use Struts or JSTL libraries.

Message consumption. As with the RRD implementation, the team saw no reason to deviate from using EJB message-driven beans (MDBs).

8.3.1.2 Distributed Transactions

To handle distributed transactions, the team used the same solution in the WSAD implementation as they did with RRD: they set up two JDBC data sources to each database, one using an ordinary driver for simple transactions, the other using an XA-capable driver for distributed transactions. See Section 8.2.1.4 for details.

8.3.1.3 Organization of Applications in WSAD

WSAD proved very flexible in letting the team organize their code into projects and applications.

7 Callback classes representing actions with no parameters (such as get all customers) could be treated as singletons.


The team organized their work into ten projects comprising three applications and two external libraries:

  Project                        Produced                        Description
  itsCustServ                    itsCustServ.ear                 Umbrella project for Customer Service Web app
  + itsCustServWeb               itsCustServWeb.war              Web app for Customer Service
  itsWorkOrder                   itsWorkOrder.ear                Umbrella project for Work Order Web app
  + itsWorkOrderWeb              itsWorkOrderWeb.war             Web app for Work Order
  itsWorkOrderConsole            itsWorkOrderConsole.ear         Non-Web functionality on Work Order side
  + itsWorkOrderConsoleMdbs      itsWorkOrderConsoleMdbs.jar     Message beans to process customer and ticket messages
  + itsWorkOrderConsoleWeb       itsWorkOrderConsoleWeb.war      Web front end for Web service
  + itsWorkOrderConsoleCommon    itsWorkOrderConsoleCommon.jar   Common logic for message beans and Web service
  common                         common.jar                      External library containing generic logic, e.g. the JdbcHelper class
  itsCommon                      itsCommon.jar                   External library containing ITS-wide logic, e.g. DTO classes
8.3.2 What Went Well

Some aspects of development (especially page design and coding) were more tedious in WSAD than in RRD. Still, overall WSAD proved easier to use than RRD, for two reasons:

- It was a straightforward tool that facilitated access to J2EE rather than hiding it. This appealed to the J2EE developers on the team.
- It was dedicated to WebSphere.

Several aspects of working with WSAD were particularly easy: 8.3.2.1 Navigating the IDE While far from simple, WSAD seems to be a better organized IDE than RRD. The flow from one type of task to another is simpler and easier. WSADs navigational structure is flatter and allows you to maintain parallel threads of activity simultaneously, such as editing source code, setting project properties and configuring servers. In RRD, often pursuing one thread of activity takes you to a place from which it is more difficult to return to your starting point. 8.3.2.2 Building for Deployment Rebuilding applications in RRD often took many minutes. WSADs build process was much faster. Given how often the process took place, this small savings per build added up to significant time savings over the life of the project. 8.3.2.3 Testing in WebSphere WSADs lightweight WebSphere test environment proved extremely useful. It started and stopped much more quickly than WebSphere running on the local development machines and worked seamlessly with the WSAD development environment. 8.3.2.4 Common Logic in JSPs Because the team was creating explicit JSPs for this implementation, they could take advantage of the JSP @include directive to put common logic in a central JSP and import into


every page that needed it. The team used this technique for two common aspects of their pages:

• The page header, including the navigation bar (links to other pages)
• The page authentication logic to verify that the user is logged in before displaying the page
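For example, each page needs only one directive per shared fragment (the file names here are illustrative, not the team's actual ones):

<%-- shared logic pulled in via static include --%>
<%@ include file="header.jsp" %>
<%@ include file="checkLogin.jsp" %>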

The only tricky part to using @include was the declaration and use of variables in scriptlets. A variable used in any JSP had to be declared in that same JSP. This restriction complicated the design a bit, but not significantly.

8.3.3 Significant Technical Roadblocks

8.3.3.1 XA Recovery Errors from Server
At one point in development testing, Developer B got an unusual error from the WebSphere test server. The error had a long output trace that pointed to one of the XA datasources and began with:
The transaction service encountered an error on an xa_recover operation.

The error appeared shortly after a test of a distributed transaction had failed and the server was bounced. It seemed the server was trying unsuccessfully to recover from the failure. The problem was that the server tried again every minute or so, pumping output into the stdout log. While not a show-stopper, this problem was annoying, especially because it later also appeared in the live WebSphere instances. Bouncing WebSphere and Oracle had no effect, and the team could not find any WebSphere configuration settings that helped. Moreover, the problem had not appeared during the RRD round. Eventually a Google search found the answer in an IBM developerWorks forum: the datasource user must have SELECT rights to the Oracle table PUBLIC.DBA_PENDING_TRANSACTIONS. In the RRD round the team had set up the datasources to log in as SYSTEM, which had that right. In the WSAD round, they configured the datasource to log in as ITSUSER (the ITS schema user), which didn't.8 When they granted that right to ITSUSER and restarted the servers, the problem disappeared.
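The fix itself is a single grant, run as a user with administrative rights. The exact statement below is our illustration, not quoted from the team's notes:

GRANT SELECT ON DBA_PENDING_TRANSACTIONS TO ITSUSER;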

8.3.3.2 Miscellaneous WSAD Headaches
Along the way WSAD posed a handful of smaller challenges. Among the more interesting:

• TODOs in JSPs. WSAD has a nice facility for marking "to do" tasks in your Java code. It displays any comment beginning with // TODO in a special To Do list and lets you easily jump to that comment from the list. However, this facility does not work for comments in JSPs.
• Code completion inside JSPs. WSAD has a nice code completion tool, but it is very slow for Java code inside JSP scriptlets.
• Copying a servlet. One developer discovered that when he created a new servlet by copying an existing one, WSAD did not automatically add the new servlet to the Web deployment descriptor; because of that he got "page not found" errors when invoking that servlet. The Web descriptor editor has a Servlets page that should have allowed him to add existing servlets to the list, but it didn't work right, and the developer didn't see how to make it do so. So he had to edit the descriptor source manually.
• Sharing source code. Because the team chose not to use source control software, they shared code by zipping up their workspace. After a couple of false starts they learned which files not to share.

8 Why the change? Having the application log in as SYSTEM meant that at least some SQL table references had to be qualified with the schema name. Logging in as ITSUSER eliminated that problem.


8.4 Microsoft .NET Development Process


8.4.1 .NET Architecture Summary
The .NET team used a three-tier architecture to implement the ITS system. They used ASP.NET for the Web application components, with C# code behind the ASP.NET Web Forms. The backend processing logic was separated into two distinct layers: business logic and data access. This architecture fully isolated the UI and business tiers from the backend data access layer, so that a different database could be used without changing any business logic or UI code.

The team spent approximately 50% of their development time writing new code, 25% modifying code to correct misinterpretations of the specification, 10% creating the overall design, and 15% performing unit and system tests. Since the machines were speedy, build times were insignificant.

The team used model classes to map objects to relational data, and also used the publicly available Data Access Application Block (DAAB) to simplify their data access code. DAAB provides pre-built libraries for ADO.NET and is available on MSDN for both SQL Server and Oracle backends. More on DAAB below in Section 8.4.1.2.

This diagram shows the software architecture of the Work Order application. The architectures of the Customer Service and Technician Mobile Device applications were very similar, except that the former lacked message processing and the latter had neither message processing nor queue access.

Figure 3. Architecture of the .NET implementation of the Work Order application

8.4.1.1 Organization of .NET Applications
The .NET team divided the ITS system into three Visual Studio .NET projects:


• The Customer Service project included the Customer Service Web application with its associated business and data logic. This application exposed the Web UI and produced customer update and work order messages.
• The Work Order project included the Work Order Web application and the Message Forwarder and Processor console applications, each with its associated business and data logic. The Web application exposed the Web UI as well as the Web service consumed by the Customer Service Web application. The Message Forwarder and Processor console applications worked together to process the customer update and work order messages.
• The Technician Mobile Device project included the Pocket PC Windows Forms application with its associated business and data logic. This application provided the UI for technicians to process work orders. It connected directly to the Work Order database via wireless networking.

8.4.1.2 Database Access

Stored Procedures
The .NET team chose to use stored procedures even though all of the database actions were little more than CRUD operations. Doing so afforded a level of encapsulation and an interface that allowed the underlying database operations to change if necessary without adversely affecting the data access logic in the middle-tier code. From a manageability standpoint, this arrangement has certain advantages, as at least some query changes are possible without having to modify or re-deploy any middle-tier logic.

At the same time, the .NET team did not limit themselves to stored procedures. Along the way they found that, for some operations, putting the SQL statements in the data access logic markedly improved performance. The most notable example was work ticket queries. Since the specification required the queries to employ several independent but optional search criteria, the team coded their middle tier to construct a SQL query for such criteria and send it as a batch to the database.

Data Access Application Block (DAAB)
Although each .NET application had a data access layer, the custom classes in those layers did not directly invoke the .NET framework's data access classes (those found in the System.Data.SqlClient namespace). Instead, the .NET team chose to use a freely available application block, the Data Access Application Block (DAAB), created by Microsoft and publicly available for use with both SQL Server and Oracle. DAAB comes in the form of a single code file containing a few utility classes that encapsulate the most common data access operations.

The main DAAB utility class, SqlHelper, contains methods to return scalar values or result sets in the form of a SqlDataReader or DataSet. It also contains methods to execute SQL batches that have no return value, such as INSERT or UPDATE statements. All SqlHelper methods have several overloads that take a variety of parameters, such as either a SqlConnection object or a connection string. In most cases, the .NET team used the overloads that take a connection string as well as the stored procedure name and a variable list of arguments in varargs style. This approach reduced much of the data access code to a single statement. For instance, the Update method of the CustomerDAO class was very brief:


public static void Update(Customer customer)
{
    SqlHelper.ExecuteNonQuery(connectionString, "UpdateCustomer",
        customer.CustomerId, customer.CompanyName,
        customer.Address1, customer.Address2,
        customer.City, customer.State, customer.Zip,
        customer.ContactFirstName, customer.ContactLastName,
        customer.ContactEmail, customer.ContactPhone,
        customer.ContactFax, customer.MasterAccountCode);
}

8.4.1.3 Distributed Transactions
The .NET team managed transactions with the ServiceDomain class instead of the COM+ Catalog. From previous experience, they knew that transactions would perform at least slightly faster and always be easier to manage with the ServiceDomain class than with COM+ Catalog components (which can also be created via .NET). However, since the ServiceDomain class does not work on Windows XP (the OS of the development machines), the team could not test transactional behavior locally. This limitation did not turn out to be a problem, since the code still compiled correctly; the .NET team protected the transaction-specific code with preprocessor directives (e.g. #if DEBUG). Thus, they were able to perform functional tests on their development machines and specification conformance tests (including transactional behavior) on the target machines, which were running Windows Server 2003.

The .NET team experienced some configuration problems. One was with the Distributed Transaction Coordinator that .NET uses to manage transactions. They suspected the problem was due to renaming the server. They solved this problem without affecting their development time.

8.4.1.4 ASP.NET Session State
The Customer Service subsystem requires clustering and load balancing to achieve reliability and scalability. Since the ITS specification required seamless failover with no loss of session state, the architecture had to include a way to preserve that session state. .NET offered the team two solutions: storing session state in a database or using the ASP.NET Session State service. Since the specification did not require that state persist longer than the session timeout for users (15 minutes), and believing it would perform better, the team decided to use the ASP.NET Session State service.

The ASP.NET Session State service is an out-of-process service that does not depend on any other processes, so developers can use session state in server farm environments without worrying about a request's dependency on a particular server process. Once installed and enabled on the reliable server, this service satisfied the ITS specification for preserving session state in a clustered environment.

8.4.2 What Went Well
Overall, the .NET team's development experience was normal: some parts were easy while others were more difficult or revealed unexpected problems. Error handling, error logging, custom event logging, and project deployment were a few of the easier issues that the .NET team encountered. Paging, Web service return values, and MSMQ were some of the areas in which they spent more time creating solutions.

8.4.3 Significant Technical Roadblocks

8.4.3.1 Transactional MSMQ Remote Read
The ITS specification required that messages sent to the Work Order application to create a work ticket or update a customer stay in the queue until processed successfully. More


specifically, the application would start a transaction before reading and processing a message. If the message processing succeeded, the transaction would be committed and the message would be removed from the queue; if it failed, the transaction would be rolled back and the message would remain on the queue.

This requirement led to the single most costly problem the .NET team experienced. The issue stemmed from the way MSMQ handles distributed transactions. Currently, MSMQ supports transactionally sending messages to, but not reading messages from, a remote queue. In other words, the Work Order application residing on the Work Order host could not read a message from the queue residing on the MQ host within a transaction.

The team discovered this issue about two-thirds of the way through the development phase, when they tried a system-level test involving work ticket processing. Why did they not see it sooner? As described above in Section 8.4.1.3, the team had chosen to manage distributed (DTC, or Distributed Transaction Coordinator) transactions via the ServiceDomain class, which is not available in Windows XP. So they could not test this functionality on the development machines.

It took about half a day to diagnose the problem and a full day to implement a workaround. The solution, a read-request queue, is described in the MSDN article "Transactional Read-response Applications," found at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/msmq/msmq_about_transactions_05wz.asp. It lays out the following architecture:

Figure 4. Architecture for a transactional read-response application using MSMQ

This architecture has these key elements:

• Each receiving (message processing) application has a local queue to hold messages it will process.
• When a receiving application wants to process a request, it sends a read-request message to a read-request queue colocated with the input queue (the queue to which the sending applications send their original messages).
• A separate application (the read-response application) monitors the input and read-request queues, which are local to it. In a single transaction it reads a read-request message, obtains from it the target receiving application, reads a message from the input queue and


forwards it to the target. When the forwarded message appears in the target's local queue, the target receiving application handles the request.

This solution gets around the MSMQ barrier the .NET team encountered because all messages are read from queues local to the reader; there are no remote reads at all, let alone any that must be transactional. To implement this solution, the .NET team had to create:

• two read-request queues on the MQ host (where the original input queue resided), one each for customer updates and new tickets
• two corresponding local queues on the Work Order host
• a read-response application

The team also noted what it considered a non-standard requirement of the ITS specification. In a production system, if a message consumer could not process a message properly, it would probably remove the message from the main queue and place it into a secondary "failed" queue for management reasons. However, the ITS specification did not allow this option; it required the failed message to remain in the main queue.9 One adverse consequence of this requirement was that a corrupt message would be re-processed over and over again.

8.4.4 Miscellaneous .NET Headaches

8.4.4.1 DataGrid Paging
One of the ITS specification requirements called for paging the display when a query returned more than ten rows of data. The ASP.NET DataGrid class has a built-in paging feature that is easy to configure; with some property settings and minimal code (one event handler containing about two lines), it even provides its own Previous and Next links. However, the specification also called for the Previous and Next buttons to be implemented as INPUT tags (to work with the Mercury LoadRunner scripts) and named cmdPrevious and cmdNext, respectively, in the HTML. While these requirements prevented the .NET team from using all of the DataGrid's paging features, they were still able to use the auto-paging feature and write additional code to handle the buttons' events.

Since the specification also forbade client-side caching of query results, the .NET team disabled view state on Web pages that contained a DataGrid. This change had the added benefit of reducing the HTML page size, since the view state for a DataGrid can be quite large. Without view state, however, the team needed another mechanism to hold other necessary page state information. They kept track of some of the DataGrid properties (such as the current page index) in a cookie to get the correct paging behavior.

Had the team been allowed to use the full range of DataGrid features, this particular part of the development phase would have taken only several minutes. Instead, with all of the custom code necessary to implement the ITS paging requirements, it consumed a few hours.

8.4.4.2 Web Services Returning DataSets
During the development phase, the .NET team decided to have the ticket search Web service return a DataSet object. This solution, rather than returning a custom object collection, was the easiest way to return data in a form that could be bound directly to an ASP.NET DataGrid, especially since the ITS specification required the application to sort the search result on a specific column.
9 The reason was simplicity. The ITS specification was designed to include integration technologies such as messaging in the project yet still be simple enough that a team could develop the ITS system in four developer-weeks.


However, during the tuning phase the .NET team found they could increase performance by switching to a custom object collection. This change required a more significant coding effort, since they had to create custom classes not only on the Web service side but also on the client side. Since the classes auto-generated by Visual Studio's Web Reference mechanism expose only public member variables instead of the public properties required for DataGrid data binding, the .NET team had to create wrapper classes to expose those fields as properties. They also had to implement the IComparer interface to get the proper sorting in the DataGrid.

8.4.4.3 The Mobile Application
As part of the .NET team's investigation, they created a simple Pocket PC application that had a single form containing a data grid filled by a query to a SQL database. While that simple application worked fine, when they began to develop the ITS mobile application, compilation of Microsoft's Data Access Application Block (the DAAB implemented in SqlHelper.cs) failed. When they did a quick retest in a desktop Windows Forms application, the failure did not occur; it happened only in a Compact Framework project.

It turned out they had used version 1.0 of the DAAB in their initial mobile test project, but version 2.0 in the Technician Mobile Device project. They did not want to revert to version 1.0, since 2.0 contains several improvements, but for some reason the Compact Framework build was unable to find the IDisposable interface of the SqlDataAdapter class. They were able to adapt version 2.0 of the DAAB to work with the Compact Framework by creating a custom SqlDataAdapter class to wrap the System.Data.SqlClient.SqlDataAdapter class. Even with this issue, the total development time for the mobile application was only one day.

8.4.4.4 Model Object Class Creation
Although Visual Studio is a great tool for creating UIs for all kinds of applications (Web, Windows, and mobile) and provides assistance when writing code (e.g. IntelliSense and the Class Browser), it lacks tools to easily create model object classes (entities). The .NET team had to create all such classes manually. This process was rather tedious, since most of the model object classes had a similar structure: private member variables, public properties (getters and some setters) to expose them, and one or more constructors to initialize them. About half of the classes were rather small, but others had as many as a dozen attributes. Although the time taken to create each class was not very significant, it was still minutes instead of seconds. Copying and pasting parts of the code that were common across most of the classes mitigated some of the effort, but that process introduced the risk of errors. Fortunately, Visual Studio parses the code while editing, effectively performing a syntax check without compiling. Compilation was quick, in any case.


9 CONFIGURATION AND TUNING RESULTS

The development teams were initially allotted up to two weeks to tune and configure the system in preparation for performance, manageability and reliability testing in the final week of the project. However, each team was allowed additional time if they required it in order to prepare successfully for the tests and ensure the application was performing properly. The time spent configuring and tuning the production environment was tracked for comparative purposes. The teams used Mercury LoadRunner to simulate load for performance tuning. Each development team was responsible for their own database indexing, database tuning, and application server tuning. Changes to code were allowed during this phase if required to make the application perform more efficiently under load.

The following table shows the amount of time taken by each team to tune each application to perform as efficiently as possible prior to testing. The data is from the auditor's report:

Time Spent Configuring and Tuning

Team / Platform / Implementation | Man-Days*
Originally scheduled | 20
J2EE / WebSphere / RRD | 76
J2EE / WebSphere / WSAD | 24**
.NET / .NET / .NET | 16

* A man-day is defined as an eight-hour day per individual for all implementations except RRD, in which case hours per day may vary per individual and be slightly higher than eight.
** Note that the base install, tuning and configuration process from the RRD implementation had at least some carry-over to the WSAD implementation, reducing to some extent the time needed for tuning/configuring the WSAD implementation.

The J2EE team clearly went well beyond the two-week time frame when tuning the RRD implementation. There are several reasons for this, detailed in Section 10:

• The RRD code required extensive reworking.
• The team experienced significant problems getting some of the WebSphere infrastructure to work properly, most notably Edge Server (the load-balancing component) and in-memory session replication.
• J2EE systems have many more tunable parts than do .NET systems. This means not only more knobs to be turned, but more ways in which tuning one part affects another.
• Basic J2EE system tuning (JVM, Web server, application server) takes a long time, as it did in this case.

In the WSAD round, the J2EE team built upon the solutions from the previous round, focusing their efforts on tuning the WSAD code and improving failover. Again, see Section 10 for details.

The .NET team, on the other hand, completed their tuning early. As the auditor's report states, they tuned and configured their implementation in 8 days, less than the 10 days allotted; Vertigo Software could have used the entire 10 days for this phase but chose to consider it complete after 8.


Again, as noted above, .NET has many fewer knobs to turn than WebSphere. The team did not, for example, have an equivalent of tuning the JVM. See Section 11 for details on the .NET team's experience.


10 WEBSPHERE CONFIGURATION AND TUNING PROCESS SUMMARY


This section describes the process the J2EE team went through to configure and tune the basic WebSphere infrastructure. It also describes the major bottlenecks encountered and resolved in the two implementations. Here is a high-level summary of the stages the team went through; details follow in the sections below.

For the RRD implementation:

1. Install the basic software: WebSphere Network Deployment, Edge Server, IBM HTTP Server (IHS).
2. Configure the software for the ITS system.
3. Resolve code bottlenecks in the implementation.
4. Tune the system for performance.

For the WSAD implementation, the team did not repeat Stage 1 and did very little Stage 4 tuning. Most of their work in the WSAD round focused on three issues:

• session sharing
• failover
• optimizing database queries

10.1 RRD Round: Installing Software

10.1.1 Starting Point
During the development phase the team had done a basic WebSphere installation to pass functional tests on the target machines. This consisted of a standalone WebSphere instance on each of the four servers: two for the Customer Service application, one for the Work Order application, and one for the MQ server. The Customer Service and Work Order instances were individually configured with the necessary JDBC and JMS resources:

• 2 JDBC datasources (one non-XA, one XA) to the appropriate database
• 1 JMS queue connection factory for the MQ server
• 2 JMS queues for customer and ticket messages

Additionally, the team had set up session sharing between the Customer Service instances using a longstanding standard IBM technique: writing the session data to a database. WebSphere makes this relatively easy to configure. The team used the Customer Service database as the persistent store for sessions. (This technique worked for functional testing, but later would prove unacceptably slow under load. The team would then replace it with in-memory replication; more on this in Section 10.6.7.)

10.1.2 Installing WebSphere Network Deployment
Installing WebSphere Network Deployment was a lengthy but comparatively straightforward process. After you install the Deployment Manager (the administrative server), you add nodes to your federation. In doing so you can choose whether the configurations of the WebSphere instances already installed on those nodes should be preserved. Initially the team did not do


so, losing their resource configurations in the process. Realizing their mistake, they removed the nodes and added them again in a way that preserved the configurations.

Another false start had to do with adding a node for the MQ server's WebSphere instance, which ran on the same host as the Deployment Manager. When the team did so, they discovered two changes in the MQ situation:

• How you start MQ changed. As mentioned earlier, you control MQSeries through WebSphere's own JMS server. If the WebSphere instance stands alone, its JMS server is embedded in the application server, and starting the latter starts the former (as well as MQ). But when you add the WebSphere instance to the Network Deployment, the JMS server is split out as a separate server and must be started separately.
• The MQ configuration names (queue names, JNDI names, etc.) changed, and the applications could no longer reach the server.

While the first change did not cause a problem, the second did. And since the team had no compelling reason to include the MQ host's WebSphere instance in the federation, they removed it and restored the status quo. (Ironically, much later the team decided they needed the session sharing server to run on the MQ host, forcing them to add that node to the federation and confront the MQ configuration change.) Once the nodes were added, the team created a cluster for the two Customer Service servers.

10.1.3 Installing IBM HTTP Server
Next the team installed IBM HTTP Server (IHS). This installation went quickly and smoothly. The only stumbling block came early on, when Developer B tried to launch IHS using the command:

apachectl start

Apache did start, but it wasn't IHS. It took a bit of head scratching to figure out that Linux had its own Apache server already installed and placed in the system path. The two versions of Apache were launched with the same command. Executing the command with a path qualifier pointing to the IHS bin folder solved the problem.

10.1.4 Installing IBM Edge Server
The last piece of the IBM infrastructure was Edge Server, used to handle load balancing and failover for the Customer Service cluster. Installing the software itself was easy. Getting it to run properly was a different matter. In fact, Edge Server was directly or indirectly responsible for the most significant challenges the team faced during this phase. See Section 10.6 for the gory details.

10.2 RRD Round: Configuring the System

10.2.1 Configuring JNDI
One consequence of converting from standalone WebSphere to the Network Deployment version is that WebSphere uses different ports for JNDI. Suddenly the applications were throwing errors. It took one developer a day and a half to sort out the cause and the cure. See Section 10.6.5 for details.


10.2.2 Configuring the WebSphere Web Server Plugin
The Web Server Plugin is WebSphere's interface between Apache and the HTTP transport embedded in the application server. The plugin consists of a native runtime library (already installed with WebSphere) and an XML configuration file, plugin-cfg.xml. Generating that file and putting it in place on the nodes can be done entirely through the Deployment Manager console.

The WebSphere literature talks of "installing" the plugin, but it's really a matter of configuring Apache to use it. This configuration consists of modifying Apache's httpd.conf file to load the plugin library and point to the plugin's configuration file, plugin-cfg.xml. While it took the team a few tries to get everything right, the procedure is straightforward and well documented.
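For illustration, the httpd.conf additions take roughly this form (module file name and paths vary by WebSphere and Apache version; these values are our assumption, not the team's actual configuration):

LoadModule was_ap20_module /opt/WebSphere/AppServer/bin/mod_was_ap20_http.so
WebSpherePluginConfig /opt/WebSphere/AppServer/config/cells/plugin-cfg.xml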

10.3 RRD Round: Resolving Code Bottlenecks

This section describes changes the team made to the RRD implementation itself to improve performance.

10.3.1 Rogue Threads
Initial tests of the RRD implementation were dismal, and the Work Order Web application performed especially poorly. After running a while, WebSphere would actually hang (become too busy to respond). It took a long time to diagnose this problem, but finally the team did so using Borland Optimizeit's Thread Debugger. This tool shows you all thread activity in the JVM. It told the team that the server was spawning many new threads from instances of a log4j class called FileWatchdog. This class extends Thread and is used to check periodically that a certain file has not changed.

What was causing this? The culprit turned out to be RRD's debug settings; the Work Order Web application had been deployed with debug settings turned on. The team redeployed it with all debug output suppressed.

10.3.2 Optimizing Database Calls
Watching Oracle's TopSQL utility told the team that RRD was performing database calls very inefficiently in many places. There were two issues in particular:

• The generated code often used plain statements instead of prepared statements (see the sketch below).
• For queries, the code performed a count(*) before doing the actual query, to see how many rows would be returned. This of course doubled the number of actual database calls.
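To illustrate the first point, the difference is between concatenating values into the SQL text and binding them as parameters. This is a generic sketch, assuming java.sql.* imports and an open Connection conn; table and variable names are ours:

// Plain statement: the SQL text changes per call, so the database
// re-parses it every time.
Statement s = conn.createStatement();
ResultSet rs1 = s.executeQuery(
    "SELECT * FROM WORKTICKETS WHERE CUSTOMERID = " + custId);

// Prepared statement: parsed once, then reused with bound parameters.
PreparedStatement ps = conn.prepareStatement(
    "SELECT * FROM WORKTICKETS WHERE CUSTOMERID = ?");
ps.setInt(1, custId);
ResultSet rs2 = ps.executeQuery();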

The team replaced RRD's code for all major queries with custom code that used prepared statements. It also eliminated the count(*) calls, as these were completely unnecessary. This coding work took considerable time but proved crucial to improving the application's performance.

10.3.3 Optimizing the Web Service
The team found the Work Order Web service a major performance bottleneck. This service queries work tickets for the Customer Service application and returns results as an XML string. Even after optimizing the query logic, the Web service still responded slowly. So the team focused on the fact that RRD generated code to wrap the service in a stateless session bean. Every service call had to acquire a bean instance.


The team tried replacing the session bean with POJOs. They did this by porting all the generated RRD Java code into another tool (WSAD), then refactoring the bean class and the logic that used it. This change improved performance on the entire test script by more than 10%.

Next they looked at the code that creates the XML response string from the query. The original code created DTOs from the JDBC result set, then used the DOM API to construct XML from the DTOs. The team refactored this code to create XML directly from the result set using a simple StringBuffer. Eliminating DTOs and DOM (which is notoriously expensive) improved overall performance by another 16%.
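In outline, the refactored approach looked something like this (a sketch; element and column names are illustrative, not the team's actual ones):

// Requires java.sql.*; builds the response XML straight from the
// result set, with no intermediate DTOs and no DOM.
private static String ticketsToXml(ResultSet rs) throws SQLException {
    StringBuffer xml = new StringBuffer();
    xml.append("<tickets>");
    while (rs.next()) {
        xml.append("<t>")
           .append("<id>").append(rs.getInt("TICKETID")).append("</id>")
           .append("<st>").append(rs.getString("TICKETSTATUS")).append("</st>")
           .append("</t>");
    }
    xml.append("</tickets>");
    return xml.toString();
}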

10.3.4 Paging Query Results
The ITS specification set the following rules for large queries:

• Queries should return no more than 500 rows.
• Results pages should display 10 rows per page.
• Results cannot be cached; each new page should re-execute the query.

The team's initial RRD implementation did limit the overall query size, using Oracle's maxrows variable, such as:

SELECT * FROM ... WHERE ... maxrows <= 500

And in fact RRD's generated code performed some inherent paging. But when they switched to custom queries, the team had to add custom paging logic. There were two important parts to this logic:

• First, limiting the query size: for Page n the query should return the lesser of 500 and n * 10 + 1 rows (see the sketch after this list).
• Second, creating data transfer objects (DTOs) for only the ten rows actually displayed.
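A minimal sketch of both parts, assuming method and helper names of our own invention (not RRD's generated code):

// Requires java.sql.* and java.util.*.
private static List queryPage(PreparedStatement stmt, int page)
        throws SQLException {
    final int pageSize = 10;
    // Part 1: for page n, fetch at most min(500, n*10 + 1) rows; the
    // one extra row tells us whether a further page exists.
    stmt.setMaxRows(Math.min(500, page * pageSize + 1));
    ResultSet rs = stmt.executeQuery();

    // Part 2: build DTOs only for the ten rows actually displayed.
    List tickets = new ArrayList();
    int row = 0;
    int first = (page - 1) * pageSize;   // e.g. rows 20-29 for page 3
    while (rs.next() && row < first + pageSize) {
        if (row >= first) {
            tickets.add(makeWorkTicketDto(rs)); // hypothetical DTO factory
        }
        row++;
    }
    return tickets;
}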

So if the application asked for Page 3 of a query that could return 100 rows, the query should return only 31 rows, and the application should create objects only for rows 21-30. Because RRD's query processing code is so tightly interwoven with its page-producing code, replacing the query code wasn't enough; the team also had to work around the code for displaying the page. This took the team deep into the realm of working against RRD's capabilities instead of with them. But the results were worthwhile in terms of improved performance.

10.3.5 Caching JNDI Objects
The team's experience investigating how RRD handles JNDI lookups made clear that it did so inefficiently. Every JNDI lookup required creating a new InitialContext object, a very expensive operation. The team created a simple ServiceLocator class that cached the InitialContext, JDBC data sources and EJB homes. (Although the implementation used EJB minimally, it did use stateless session beans in the Customer Service application to wrap the JMS message-producing code.) This class was packaged in a custom library that was dropped into WebSphere's AppServer/lib/ext folder.

The library worked well. The only inconvenience was that it meant bouncing WebSphere more often: when the library changed, and when the Customer Service application was rebuilt and redeployed (because the EJB home stubs became stale).
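A minimal sketch of such a locator, caching only data sources (the class shape is our assumption; the team's actual class also cached EJB homes):

import java.util.HashMap;
import java.util.Map;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public final class ServiceLocator {
    private static InitialContext ctx;
    private static final Map DATA_SOURCES = new HashMap();

    // InitialContext creation is expensive, so do it once.
    private static synchronized InitialContext context() throws NamingException {
        if (ctx == null) {
            ctx = new InitialContext();
        }
        return ctx;
    }

    public static synchronized DataSource dataSource(String jndiName)
            throws NamingException {
        DataSource ds = (DataSource) DATA_SOURCES.get(jndiName);
        if (ds == null) {
            ds = (DataSource) context().lookup(jndiName);
            DATA_SOURCES.put(jndiName, ds);
        }
        return ds;
    }
}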


10.3.6 Using DTOs for Work Tickets
As described above in the section on developing the RRD applications, the way RRD generates code prohibits data objects from the class model from being directly shared across pages. So for the Customer Service Web application to hold pending work tickets in session state, it had to do so by converting ticket objects to XML and back again. This process proved highly inefficient, especially because RRD used a DOM API to construct the XML. (DOM is notoriously expensive.) After working through other code bottlenecks, the team focused on this one. The solution was a custom WorkTicket DTO class to replace the XML format, along with efficient code to convert between the DTO class and the page-specific ticket objects. This change proved enormously helpful to performance.

10.3.7 Handling Queues in the Customer Service Application
One coding change that proved fruitless was to refactor how RRD handles JMS queues in the Customer Service message-producing logic. The generated code opens a queue for every message sent. The team tried code to keep the queue open, but ran into transaction issues. Moreover, they discovered that WebSphere maintains a connection pool for the queue, so the effort was unnecessary.

10.4 RRD Round: Tuning the System for Performance

This section discusses various actions the WebSphere team took to tune the ITS system, apart from changes to the application code. It discusses the strategy and performance indicators used, the variables tuned, and some important issues that arose in the process.

10.4.1 Tuning Strategy
With so many variables affecting performance in different and interrelated ways, one can easily waste a great deal of time without a strategy. From previous experience the team settled on a strategy with these basic elements:

1. Work up and out. In other words, take a stab at the application server parameters, including JVM parameters. Find the maximum load without errors. Get into the ballpark; don't try for final precision.
2. Then go to one end of the system (the database or the Web tier) and work toward the other.
3. At each point (for each tuning variable), try making a big change, such as doubling the size of a pool. Run a quick test, see if it had any effect. Use binary chopping to zero in on the optimum value.
4. Never change two things at once. Stick with the plan; resist taking shortcuts as time runs out. One exception to this rule is where two variables are related, such as the Web container thread pool and the database connection pool.
5. Realize that precise tuning requires several passes through the system.

Regarding the last point: because the team spent so much time optimizing code, they really had time for only one pass through the system.

10.4.2 Performance Indicators
The team used a number of indicators to measure performance during tuning:


Page hits per second. LoadRunner provides average response times for each individual action in a test script. With some scripts comprising as many as two dozen actions, and when you have to run many tests in a long tuning process, it takes too long to record all the response times. The total page hits/second statistic in LoadRunner provides a handy summary performance indicator. It tells you the peak load that the application can handle: as a load test ramps up the number of users, at some point hits/second reaches a plateau before response times climb significantly and errors accumulate. This level represents the peak user load. To calculate peak user load from hits/second, use this formula:

users = ( hits/sec ) * ( total user think secs in script / total hits in script )

All the scripts used in this study had 5 seconds of think time per Web request. If a script had 15 requests with a total of 20 hits, then:

users = ( hits/sec ) * (( 5 user think secs / request ) * ( 15 requests / script ) / ( 20 hits / script ))
users = ( hits/sec ) * 3.75 user-secs/hit

In other words, it would take approximately 2,250 users to generate 600 hits/second.

CPU usage. To see how hard a machine was working (CPU usage), the team used top on the Linux servers and Performance Monitor on the Windows machines. The latter also provided indicators of disk activity to tell them how hard the database was working.

Response times. For more specific issues the team looked at response times of individual actions in a script.

10.4.3 Tuning the JVM
Tuning the application server's Java virtual machine centers on two issues:

• the amount of heap memory allocated to the JVM
• the duration and behavior of garbage collection (GC)

The two settings are related: while a larger heap allows the JVM to work with more objects and possibly provide greater throughput, it also means that GC must work harder when the heap begins to fill.

10.4.3.1 Garbage Collection
Previous performance studies had taught the team that GC is a critical issue in J2EE applications. Standard JVMs offer two garbage collection modes:

• Non-concurrent. The garbage collection thread sleeps most of the time, then periodically wakes up to collect garbage. When it does, it pauses the JVM, leading to a backlog of requests that under load can be overwhelming.
• Concurrent. Concurrent mode spreads the performance cost of garbage collection out over time. The garbage collector runs at a low level in the background most of the time. This reduces throughput during the steady state. But when full GC kicks in, it takes less time because the collector has been working continuously, so the backlog of requests doesn't build to a critical level.

Unrecognized Parameter
Sun's JVM uses the parameter -XX:+UseConcMarkSweepGC to turn on concurrent GC. However, when the team tried it, they got an error indicating that the JVM could not start. The


reason was that WebSphere 5.1 is installed with IBM's JVM 1.4.1, which does not recognize this parameter. The team briefly tried having WebSphere use Sun's JVM instead, but quickly ran into other errors. So they abandoned this effort and stayed with the original JVM installation. See Section 10.6.1 for more details.

Garbage Collection with IBM's JVM
IBM's JVM has a parameter similar to Sun's: -Xgcpolicy toggles concurrent marking of objects:

• -Xgcpolicy:optthruput turns concurrent mark off (to optimize throughput). This is the default setting.
• -Xgcpolicy:optavgpause turns concurrent mark on (to optimize the pause due to full GC).

During tuning, the team experimented with the latter policy. When they reached a stable configuration, however, they found that the system performed as well with concurrent mark turned off, and left that setting in place.

Garbage Collection Guidelines
The IBM literature on performance tuning suggests that the JVM should optimally be spending an average of about 15% of its time collecting garbage.10 If GC is less than this, the JVM may be wasting memory; if more, the JVM is working too hard, indicating that the heap may be too small and/or objects are not being used efficiently. The team used this guideline as it examined application server performance. It relied primarily on Tivoli Performance Viewer for statistics on garbage collection.

10.4.3.2 Heap Size
As part of the JVM tuning process, the team determined the best heap sizes for the servers. WebSphere's heap settings default to a maximum size of 256 MB and an initial size of 64 MB (25% of maximum). To find the optimum heap size, the IBM literature suggests the following procedure:

1. Choose a maximum size, say 128, 256 or 384 MB.
2. Set the initial size to 25% of maximum.
3. Use the verbosegc JVM parameter, which prints output of GC and heap expansion activity.
4. Run the server under load. See where heap size stabilizes and where GC falls to an acceptable level (around 15% of total CPU time).
5. Repeat for different heap sizes.

The team followed this test procedure with different heap sizes, ranging from 128 to 768 MB. They chose 768 MB as the upper end of the range because each machine had 1 GB total RAM, and a rule of thumb derived from previous experience suggested devoting no more than 75% of total RAM to the application servers. What they discovered is that WebSphere does not run better with significantly more memory. Ultimately the team left the Customer Service servers at 256 MB but increased the Work Order server's heap size to 384 MB.
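As a concrete illustration, one test pass in this procedure might use generic JVM arguments like these (IBM 1.4 JVM flag syntax; the specific values are our assumption, not the team's recorded settings):

-Xms96m -Xmx384m -verbose:gc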
10 The IBM Redbook IBM WebSphere V5.1 Performance, Scalability and High Availability says: "The average time between garbage collection calls should be 5 to 6 times the average duration of a single garbage collection." This translates to a range of 14-17%.


Once the optimum heap size was determined, the team set the initial size equal to the maximum size, so that the server doesn't waste time adjusting heap size incrementally.

10.4.4 Vertical Scaling
The team also experimented with vertical scaling, the technique of running multiple WebSphere instances on the same host to improve throughput. Would multiple instances running the same application help? What about dedicating an instance to each of the three Work Order modules? On the Customer Service side, the team tested 2 and 3 instances per host. (At 256 MB heap per instance, 3 instances was the most they could run.) On the Work Order side, they tested two alternatives:

• 2 identical instances at 384 MB each, with all three applications deployed to both
• 3 instances at 256 MB apiece, each dedicated to one of the three applications

These alternatives did not improve performance. WebSphere on Linux apparently spawns additional threads that act like processes. So, after much testing, the team found that the basic configuration of one instance per machine was best. On the Work Order side, that one instance hosted all three Work Order applications (the Web application, the message consumption application and the Web service for queries from Customer Service).

10.4.5 Database Tuning
The team took these major actions to tune the database:

• Create indexes. The team found that creating an index on each individual field used in a query was most efficient. So, for the WorkTickets table for example, they created separate indexes on CreationDate, WorkStatus and other fields used in queries. (A sketch follows below.)
• Adjust buffer cache size. Oracle caches recent query results; up to a point, the more it can cache, the faster it performs. The Oracle Enterprise Manager console shows the optimum cache size. The team returned to this setting periodically to make sure it was adjusted properly.
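For example (the index names are ours, and the column names follow the spellings in the PL/SQL block in Section 10.5.4, so this is an illustration rather than the team's actual DDL):

CREATE INDEX IDX_TICKETS_CREATIONDATE ON WORKTICKETS (CREATIONDATE);
CREATE INDEX IDX_TICKETS_STATUS ON WORKTICKETS (TICKETSTATUS);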

10.4.6 Tuning JDBC Settings
Within WebSphere there are two important JDBC settings:

• Connection pool size. The number of connections in the pool affects how long a thread in the Web container must wait to carry out a JDBC action. The team set this pool size equal to that of the Web container thread pool (since the applications did not use EJBs for persistence). They also set the initial size equal to the maximum size (as they did with all pools) to get the system initialized more quickly.
• Prepared statement cache size. WebSphere caches prepared statements. The more distinct SQL statements the application uses, the larger this cache should be.

10.4.7 Web Container Tuning

10.4.7.1 Web Thread Pool
The team experimented with different sizes for the thread pool. The optimum is related to the size of the JDBC connection pool and the Apache parameters.

10.4.7.2 Maximum HTTP Sessions
RRD is greedy when it comes to using HTTP sessions. Every generated servlet asks for a session on every invocation; even an action that logs out the current user (invalidating the


current session) immediately creates a new session when it redisplays the home page! The ITS specification required sessions to last at least 15 minutes, and there was no guarantee the load scripts would perform explicit logouts (not that they would help anyway). So sessions would linger until they timed out. After some experimentation the team settled on a maximum session count equal to twice the peak number of users.

10.4.8 Web Server Tuning
Apache has several settings to tune, all found in its configuration file httpd.conf:

• MaxClients
• ThreadsPerChild
• ListenBacklog

These settings are related to each other in the following ways:

• MaxClients sets the limit on the number of simultaneous HTTP requests that will be served. Any connection attempts over the MaxClients limit go into the queue, whose length is determined by Apache's ListenBacklog setting (as well as Linux's TCP backlog setting).
• Apache creates multiple child processes, each of which has ThreadsPerChild threads. So MaxClients is the maximum number of threads operating simultaneously.
• MaxClients / ThreadsPerChild must be an integer and cannot exceed 16 (Apache creates no more than 16 child processes). (An illustrative combination appears below.)

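To make those constraints concrete, here is an illustrative httpd.conf combination (our values, not the team's actual settings):

# 8 child processes x 64 threads = 512 simultaneous requests
MaxClients 512
ThreadsPerChild 64
ListenBacklog 511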
In the case of Apache threads, the team found that more is not always better. There came a point where reducing MaxClients improved performance. The reason: if resources behind Apache are choked, making users wait improves performance. The team also experimented with using multiple Apache instances, but found they didn't help.

10.4.9 Session Persistence
The ITS specification required that the Customer Service application share session state between the two clustered instances. Session sharing makes possible seamless failover in a distributed system such as ITS. This requires persisting sessions to some kind of store. Over the course of the RRD round, the WebSphere team wrestled with several techniques for persisting session state. Additionally, the team used other WebSphere settings to control the frequency of session persistence and thereby tune for performance. See Section 10.6.7 for a full discussion.

10.5 WSAD Round: Issues

10.5.1 Use of External Libraries and Classloading in WebSphere
In both the RRD and WSAD implementations, the team wrote custom, system-wide code and packaged it into external libraries. They deployed these libraries to WebSphere by dropping them into the <WebSphere home>/AppServer/lib/ext folder, then bouncing the server. In the RRD round the use of these libraries had no effect on how the applications were deployed. But in the WSAD round it did. The team suddenly got NoClassDefFoundErrors on code that used classes from one of the libraries.


It took a few hours to find the remedy. WebSphere has an application deployment setting that controls the order in which class loaders are invoked. The default is parent first (WebSphere's class loader before the application's). When they changed the setting to parent last, the error went away.

While making the error go away, this remedy did not explain why the error had appeared with the WSAD code but not the RRD code. Eventually the team identified the key difference: in the RRD implementation, the application classes used classes in the library but did not extend them. In the WSAD implementation, application classes extended library classes. It was the loading of these dependent classes that caused the error.

Even then a small headache remained. If a team member had to uninstall and reinstall an application, the classloader setting always reverted to its default. It took a few extra steps and a few extra minutes to set it correctly.

10.5.2 Pooling Objects
The WSAD implementation made extensive use of data transfer objects (DTOs). The code included a dozen DTO classes, one for nearly every data entity in the domain and several custom DTOs for special queries. Realizing how expensive the creation of DTOs can be, Developer B created a simple class to manage a pool of objects.

The ObjectPool class was designed as a wrapper around a java.util.Stack. If the stack is empty when a client requests an object, the pool creates a new one. Each DTO type had its own subclass of ObjectPool. To avoid duplication of pools, each subclass was designed to create a static singleton instance of itself. The pooling logic was very simple, using synchronized methods to keep it thread-safe. But as long as the expense of calling those synchronized methods was less than the expense of creating and garbage-collecting DTO instances, it would be worthwhile.

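A minimal sketch of the design just described (class and method names are ours, not the team's actual code; pre-generics Java, as befits the 1.4 era):

import java.util.Stack;

public abstract class ObjectPool {
    private final Stack stack = new Stack();

    // Subclasses know how to create the pooled DTO type.
    protected abstract Object create();

    // Synchronized for thread safety; cheap relative to allocating
    // and garbage-collecting a DTO.
    public synchronized Object acquire() {
        return stack.isEmpty() ? create() : stack.pop();
    }

    public synchronized void release(Object o) {
        stack.push(o);
    }
}

// One pool per DTO type, exposed as a static singleton.
class WorkTicketDtoPool extends ObjectPool {
    static final WorkTicketDtoPool INSTANCE = new WorkTicketDtoPool();
    protected Object create() { return new WorkTicketDTO(); } // hypothetical DTO
}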
Refactoring the code to use this class proved fairly easy, and subsequent testing showed that object pooling improved performance by 5-10%. After refactoring, Developer B realized he could have built a single pool class to cover all cases. It would have maintained a hash map of pool instances, keyed by DTO class type. This change would have improved code simplicity but not performance, so he didn't pursue it.

10.5.3 Streamlining the Web Service I/O
In looking at the XML returned by the Work Order Web service in response to Customer Service queries, the team found two inefficiencies:

• XML element names were very long. They were human-readable names matching the column names in the source table.
• The Web service returned all columns in a WorkTicket row, even those that the client did not use.

Developer B refactored the code to shorten the element names to one or two characters each, and to eliminate the unneeded data elements. For a query returning a full page of data (11 rows), these changes shrank the size of the XML from about 7500 characters to about 2900. That shrinkage improved performance noticeably.


10.5.4 Optimizing Queries
While most database activity in the system was simple (meaning it involved only one action on one or perhaps two tables), some actions were more complex. For example, the Work Order application processed messages from the Customer Service Web application calling for creation of a new ticket. To do so, the Work Order application had to execute three steps:

1. Use an Oracle sequence to get a ticket ID (primary key) for the new ticket.
2. Get the ID of a technician assigned to the specified customer and building.
3. Insert a new row into the WorkTickets table using the information gathered in Steps 1 and 2 plus the other data in the message.

As noted earlier, the team did not use stored procedures to handle complex database operations like this. So as a first cut, this operation would require three database calls. Nesting one call inside another can help: the SELECT statement invoked in Step 2 can be nested inside the INSERT in Step 3, cutting the number of invocations to two. But the SELECT used in Step 1 cannot be nested. The team learned, however, that Oracle PL/SQL logic11 can be passed explicitly as a JDBC statement. So they constructed the following prepared statement to handle this operation in a single database call:

DECLARE
    tickID INT;
BEGIN
    SELECT SEQ_TICKETID.NEXTVAL INTO tickID FROM DUAL;
    INSERT INTO WORKTICKETS (TICKETID, TECHNICIANID, CUSTOMERID, BUILDINGID,
        TYPEID, CONTACTNAME, CONTACTEMAIL, CONTACTPHONE, CONTACTFAX,
        FLOORNUMBER, ROOMNUMBER, WORKDESCRIPTION, TICKETSTATUS,
        TICKETPRIORITY, CREATIONDATE, SUBMITTEDBY)
    VALUES (tickID,
        (SELECT TECHNICIANID FROM TECHNICIANS
         WHERE CUSTOMERID = ? AND BUILDINGID = ? AND ROWNUM = 1),
        ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?);
END;

Using this SQL in place of two separate database calls significantly improved performance on processing messages for new work tickets.
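Executing such a block from Java is ordinary JDBC. A sketch, in which the constant and variable names are ours and only the first few of the fourteen trailing parameters are shown:

// CREATE_TICKET_PLSQL holds the anonymous PL/SQL block above as a String.
PreparedStatement ps = conn.prepareStatement(CREATE_TICKET_PLSQL);
ps.setInt(1, customerId);  // TECHNICIANS lookup: CUSTOMERID
ps.setInt(2, buildingId);  // TECHNICIANS lookup: BUILDINGID
ps.setInt(3, customerId);  // WORKTICKETS.CUSTOMERID
ps.setInt(4, buildingId);  // WORKTICKETS.BUILDINGID
// ...bind the remaining ticket fields in column order...
ps.executeUpdate();
ps.close();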

10.6 Significant Technical Roadblocks

This section provides details on the important technical problems the team encountered.

11 Oracle's native language for stored procedures.


10.6.1 Switching JVMs with WebSphere
When the team was ready to tune the WebSphere JVM for garbage collection (see Section 10.4.3.1), they tried using Sun's JVM parameter -XX:+UseConcMarkSweepGC to turn on concurrent GC. However, this produced an error indicating that the JVM could not start:

Unrecognized option: -XX:+UseConcMarkSweepGC
Unable to create the Java virtual machine

This error occurred because the JVM installed with WebSphere 5.1, IBM's Java 1.4.1, does not recognize this parameter. Concerned that using IBM's JVM might deny them important options provided by Sun's JVM, the team decided to try having WebSphere use Sun's JVM instead. They downloaded Java 1.4.1_04 for Linux from Sun and installed it on one of the Customer Service hosts. To point WebSphere to this JVM, they used the WebSphere admin console to change the value of the JAVA_HOME environment variable. While the server started and appeared to be running, it did log an error indicating that it could not find a particular IBM library, due to the changed Java home. At this point, suspicious that the server had not started cleanly, and fearful of opening a Pandora's box that would waste valuable time, the team abandoned this effort and reverted to the original JVM installation.

10.6.2 Configuring Linux for Edge Server, Act 1
When first starting Edge Server, the team got an error: "kernel extension not loaded." IBM tech support told them that Edge Server doesn't support their version of the Red Hat Linux kernel, namely 2.4.9-e.34. IBM said they'd have a patch in a day or two, then later said they couldn't produce a patch, so the kernel had to be upgraded. The team upgraded to 2.4.9-e.38 on the Edge Server host.

10.6.3 Configuring Linux for Edge Server, Act 2
Once past the kernel upgrade issue, the team ran into a more vexing problem on the machines in the cluster. This issue proved a major challenge, costing the WebSphere team many man-days of effort.

Technical Background
First some background. IBM Edge Server performs load balancing using a technique called MAC forwarding. It redirects a request at a low level by altering the MAC address (machine address) in the packet. The machines and IP addresses involved in this project are shown here:

Machine | IP Address | Description
Customer Service 1 (WEB01) | 192.168.4.201 | Physical address assigned to network card
Customer Service 1 (WEB01) | 192.168.4.200 | Cluster address; alias on loopback device
Customer Service 2 (WEB02) | 192.168.4.202 | Physical address assigned to network card
Customer Service 2 (WEB02) | 192.168.4.200 | Cluster address; alias on loopback device
Edge Server host (MQ01) | 192.168.4.210 | Physical address assigned to network card
Edge Server host (MQ01) | 192.168.4.200 | Cluster address; alias on network card


192.168.4.200 is the address of the Customer Service cluster. Requests using that address must go to the Edge Server host, so that host is assigned the cluster address as an alias. When a packet comes in addressed to the cluster, it contains both the IP address and the MAC address of the destination machine, namely the Edge Server machine. Based on the load balancing algorithm, Edge Server chooses a cluster member to handle the request. It changes the MAC address to that of the cluster member, but leaves the IP address alone.

For MAC forwarding to work, every machine in the cluster must have the cluster IP address as an alias. However, the clustered machines should not respond to ARP requests (broadcast requests asking for the MAC address associated with an IP) on the cluster IP address. The solution is to alias the machine's loopback device, which requires a simple ifconfig command. Unfortunately, Linux has a defect such that aliases on the loopback device do respond to ARP requests. This means that when an ARP request goes out for the cluster IP, all the machines in the cluster respond, which should not happen. There is a Linux patch (the so-called "hidden" patch) that lets you hide those IPs from ARP requests. This patch is documented in the Edge Server Network Dispatcher Admin guide and came from http://oss.software.ibm.com/developerworks/opensource/cvs/naslib

Patching the Linux Kernel, Take 1

Applying the patch required rebuilding the kernel (since that is how things are done in Linux). While the team had experience using Linux, they had never patched a kernel before. Developer A slogged through the process on ITSWEB01. These were the basic steps he followed:

1. Locate the kernel source code on the installation CD and copy it to the machine.
2. Install certain additional Linux packages that had not been installed, in order to get the kernel to recompile.
3. Apply the patch (supplied as a diff file) as follows: patch -b --verbose -p0 < hidden-2.4.5-1.diff
4. Follow directions from Red Hat tech support to compile the new kernel.

When he rebooted ITSWEB01, he got errors regarding the Ethernet devices. The machines each had two network cards, a Linksys Gigabit card and a Compaq card; the first was assigned to eth0, the second to eth1. On reboot, eth0 was not recognized at all, and eth1 gave an error: "Dev eth1 has different MAC address than expected." Next the team tried fiddling with the Network Configuration, but with no luck. By week's end they were no closer to a solution. The following Monday the team tried reinstalling the network card drivers. This took them on a tour of a different circle of Linux hell. Some of the highlights:

- The Linksys CD had two sets of drivers for the Gigabit card, but the install script for what looked like the correct drivers said they required kernel 2.4.13; for earlier kernel versions it suggested upgrading to a newer one.
- The team tried the Compaq ProLiant Support Pack for Red Hat Enterprise Linux 2.1, from the Compaq/HP web site, in the hope that it might update the drivers properly. Getting it to run was a series of small obstacles, and ultimately it failed to help anyway.
- Finally the team rebooted with the original kernel, but the boot failed because the root partition had run out of disk space.


Patching the Linux Kernel, Take 2

At this point the team called a colleague offsite with greater Linux knowledge, who helped them free up disk space and remake the kernel. After rebooting, eth0 was active but eth1 (the Compaq card) was inactive. The team tried installing the bcm5700 driver for the card. With another series of acrobatic maneuvers, including manually editing the file /etc/modules.conf, the team got eth0 and eth1 to activate properly.

With the patch installed, the team applied the alias to the loopback device with this command:

    ifconfig lo:1 192.168.4.200 netmask 255.255.255.255 up

And they applied the hidden flag like so:

    sysctl -w net.ipv4.conf.all.hidden=1   (turn on the hidden patch)
    sysctl -w net.ipv4.conf.lo.hidden=1    (apply the hidden flag to the loopback device)

All that remained was to transfer the configuration to the second Customer Service machine. This was also not a trivial process (is anything trivial in Linux?), but with help from the offsite colleague the team got through it.

10.6.4 Configuring Linux for Edge Server, Act 3

Testing the Hidden Patch

The team worked with Edge Server for some time, but found it not working properly. After wrestling with its configuration settings, they at last delved into the network layer to see whether MAC forwarding was working as expected, using the arp utility:

    arp -a   (to list the table)
    arp -d   (to clear the table)

They learned that the cluster IP address was being associated with the MAC addresses of the two Customer Service hosts, but not the Edge Server host. They suspected that the hidden patch had not taken hold, and performed a definitive test as follows:

1. Power down the two Customer Service hosts.
2. Clear the arp table on a client machine.
3. Ping the cluster IP from the client machine.
4. Read the arp table.

Result: the arp table had the cluster IP associated with the Edge Server box, as it should. Next they tried the opposite test:

1. Boot ITSWEB01 and activate its cluster IP alias on lo:1.
2. Deactivate the cluster IP alias on the Edge Server host.
3. Clear the arp table on a client machine.
4. Ping the cluster IP from the client machine.

Result: ITSWEB01 responded (it should not have). So the hidden patch had not worked.


Why the Failure?

Why had the previous attempt to apply the patch failed? In retracing their steps, the team discovered an apparent discrepancy in how the patch (the diff file) was applied. The team member had used this command:

    patch -b --verbose -p0 < hidden-2.4.5-1.diff

when the documentation had this command (-p1 rather than -p0):

    patch -b --verbose -p1 < hidden-2.4.5-1.diff

The -p option controls how many leading components patch strips from the path names listed in the diff, and thus which files it tries to modify. Was that the explanation? The team tested both options using patch's --dry-run option; -p1 showed all successes, whereas -p0 showed failures.

Patching the Linux Kernel, Take 3

Having discovered this apparent discrepancy, the team retraced their original procedure for applying the patch. But halfway through it they ran into more disk space problems on the root partition, as well as intimidating errors like this one:

    Mount: wrong fs type, bad option, bad superblock on /dev/loop2, or too many
    mounted file systems (could this be the IDE device where you in fact use
    ide-scsi so that sr0 or sda or so is needed?) Can't get a loopback device.

Patching the Linux Kernel, Take 4

At this point the team called a local Linux expert to come onsite and clean up the mess. In about six hours he freed up disk space, cleaned up the kernels and applied the hidden patch successfully to both Customer Service machines. After that, the hidden IP addresses were no longer an issue, and the team could move on to other problems with Edge Server.

10.6.5 Configuring JNDI for WebSphere ND

After installing the Network Deployment version of WebSphere and clustering the Customer Service servers, the team found the Customer Service application throwing a JNDI NameNotFoundException. For example:

    javax.naming.NameNotFoundException: Context: WASCell/nodes/ITSWEB01/servers/nodeagent,
    name: ITSCustServ.CustomerUpdateMessageHome: First component in name
    ITSCustServ.CustomerUpdateMessageHome not found.

The error occurred when the application tried to look up the JDBC datasource or the EJB home for the session beans that produced messages. The frustrating thing was that this error did not occur when they ran the application on a standalone server.

Developer B first thought that the offending part of the name, ITSCustServ, was already representing a target object and therefore could not also be a context for this compound name. So in the WebSphere console he changed the name to ITS.CustServ. But this had no effect; in fact the error message was unchanged, suggesting that his change to the JNDI name had not been recognized at all.

Next he tried inspecting the JNDI tree. WebSphere has a utility, dumpNameSpace, that lets you get the tree in text form. He verified that the entries from the console were there. But a federated environment uses a federated JNDI setup, so the tree is full of links to other places.


In fact, if you use dumpNameSpace with the default JNDI port 2809, you do not see the entries; you need to use port 9811 to see the right location. Knowing the names were in the namespace, he then looked for the reason why they were not found. He found the code RRD generated to do the session bean lookup. It did the following:

1. Look up the AppServerDescriptor.
2. Get properties from the AppServerDescriptor.
3. Create an InitialContext using the properties.
4. Do the lookup, with the JNDI name hard-coded.

The key thing was that RRD hard-codes the JNDI name (in fact, RRD does not let you choose the JNDI name for the session bean; it creates a name of the form appName.messageNameHome), so the change the developer had made in the WebSphere console did not matter. He put it back to the old value. So if the name was valid, then the code had to be looking for it in the wrong place.

The provider URL is specified in an RRD-generated file, ITSCustServ_RtConfig.properties. When you set up a WebSphere deployment model in RRD, that URL defaults to iiop://localhost:2809. What you really want to use is the WebSphere bootstrap port. On a standalone server, it is 2809. But in a federated environment, 2809 is used by the node agent on the local machine; the bootstrap port is some other value, determined at runtime, and it points you to a location service daemon, which is what you want instead. If you were coding straight J2EE, you would instantiate InitialContext with a default constructor and get the correct value automatically. But you cannot do that with RRD, so you have to change the configuration.

At first the developer thought of changing the provider URL setting in RRD and reconstructing the application. After a couple of tests with no change in outcome, he realized that the ITSCustServ_RtConfig.properties file on the target server (ITSWEB01) was not being updated, because he was not using RRD to deploy the application from his development machine. So he manually copied the file to ITSWEB01. That produced a new error earlier in the process, on the lookup of the JDBC datasource.

The relevant part of the properties file is an XML element (the file is an XML file, despite its misleading extension) with an attribute java.naming.provider.URL. Its value initially was iiop://localhost:2809, from the default RRD setting. The developer tried manually changing it to iiop://localhost, then iiop://localhost/, both without success. He also tried an empty string, which gave a different error (because the empty string was treated as the explicit provider URL value). Finally, he decided to add some code that instantiated InitialContext with a default constructor, verify that JNDI lookups on it would work, and see what setting it contained. He did, and it did. The setting it used was corbaloc:rir://NameServiceServerRoot. When he put that URL into the properties file, it worked. After all that, it occurred to the developer to delete the java.naming.provider.URL attribute from the properties file entirely. This worked too, and proved better than hard-coding the URL.
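To make the distinction concrete, the following minimal sketch contrasts the two ways of obtaining an InitialContext. The class is illustrative, not RRD's generated code; the URL value is the one reported above, and the environment relies on the container's default initial context factory.

    import java.util.Hashtable;
    import javax.naming.Context;
    import javax.naming.InitialContext;
    import javax.naming.NamingException;

    public class LookupSketch {

        // What the RRD-generated code effectively did: build the context from
        // an explicit provider URL read out of the properties file. In a
        // federated cell, iiop://localhost:2809 reaches the node agent, so
        // the lookup fails with NameNotFoundException.
        static Object lookupWithExplicitUrl(String jndiName) throws NamingException {
            Hashtable env = new Hashtable();
            env.put(Context.PROVIDER_URL, "iiop://localhost:2809");
            Context ctx = new InitialContext(env);
            return ctx.lookup(jndiName);
        }

        // Straight J2EE style: the default constructor picks up the
        // container's own defaults (here, corbaloc:rir://NameServiceServerRoot),
        // which resolve correctly inside the federated cell.
        static Object lookupWithDefaults(String jndiName) throws NamingException {
            Context ctx = new InitialContext();
            return ctx.lookup(jndiName);
        }
    }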


10.6.6 Edge Server's Erratic Behavior

IBM's Edge Server, the component used for load balancing and failover, can be configured in two modes:

- You can assign fixed weights to the different targets, which govern how load is balanced among them. This mode is similar to DNS round-robin; it balances load but does not offer graceful failover.
- You can configure a load manager that continually pings the targets to confirm their health and diverts load away from a target that cannot be reached. The manager pings a target by invoking the URL of a page on port 80 of the target machine. The page does not have to exist; as long as some reply (even an error) comes back from the target, the manager is satisfied.

The team configured a manager to handle failover, but over the course of working with Edge Server they found it behaving erratically: the manager would mark a target down for no apparent reason. The team wrestled with the Edge Server configuration, looking for mistakes that might explain this behavior, but found none. After that they simply switched to fixed-weight mode to avoid the inconvenience during the tuning phase. But during the RRD round's testing phase this problem became critical.

Only during testing for the WSAD round did a possible explanation emerge. The team discovered a bad network card on the Work Order host (see Section 10.6.11 for the full story). Even though Edge Server had no communication with that host, the team wondered whether the failing card had created noise on the network that interfered with Edge Server's pinging of the Customer Service targets and made it think one had failed. Although they could not prove or disprove that hypothesis conclusively, after the bad card was replaced Edge Server behaved flawlessly.

10.6.7 Session Persistence

As noted above, the ITS specification required that the Customer Service application share session state between the two clustered instances to facilitate seamless failover. The WebSphere team tried a progression of techniques to persist session state:

- Writing session data to a database
- Replicating sessions in memory in a peer-to-peer topology
- Replicating sessions in memory in a client-server topology

Additionally, the team used other WebSphere settings to tune session persistence for performance. This section describes the session persistence techniques and the team's experience with them in detail.

10.6.7.1 Persisting to a Database

When the WebSphere team finished developing the RRD implementation, they were required to demonstrate its functionality, including session sharing. During the development phase, the team had installed only standalone instances of WebSphere on each production server. This configuration of WebSphere made possible only one technique for sharing sessions: persisting to a database. The team chose the Customer Service database as the logical target and set up a separate, dedicated datasource to that database for this purpose. To configure session persistence, all you need to specify is the datasource and the database login; WebSphere automatically creates the necessary table.

While this technique satisfied the functional requirements of the RRD development phase, the team discovered during tuning that it created a significant performance bottleneck. The team tried reducing it by adjusting the settings for tuning session persistence (see Section 10.6.7.3 below), but could not get any improvement. At that point they looked for an alternative.


10.6.7.2 In-Memory Replication

The WebSphere Network Deployment edition, which the team had by now installed, offered that alternative, a feature new to WebSphere 5: in-memory replication. In this technique, sessions are replicated to other WebSphere servers on the same and/or other hosts. It requires you to define a replication domain in WebSphere, an internal messaging domain built on a lightweight, embedded JMS infrastructure. In configuring the replication domain you indicate which WebSphere instances will act as replicators (message servers) and which will use the domain (message clients); a single WebSphere instance can do both. In-memory replication offers two basic topologies:

Peer to peer. In this topology, each clustered server in the replication domain holds not only its own sessions but those of all the other clustered servers. This topology saves you from configuring additional servers, but it has certain implications for performance and failover:

- Lots of duplicate messages. If the cluster has ten servers, every session is replicated to nine destinations. That requires nine messages in the domain.
- Greater memory requirements. Every server must have sufficient memory to hold the sessions for all ten servers, not just its own.

Figure 5. Peer-to-peer topology for in-memory session replication

Client-server. In this topology, additional WebSphere servers act as repositories for replicated sessions. The clustered servers send their sessions to these repositories, but do not themselves replicate sessions from other servers in the cluster.


Figure 6. Client-server topology for in-memory session replication

The team briefly experimented with the peer-to-peer topology. But given the memory overhead it imposed, and given the availability of a host guaranteed not to go down, they shifted to the client-server topology. They created a WebSphere server on the MQ host for this purpose and configured it as the session server. 10.6.7.3 Tuning Session Persistence Regardless of whether you persist sessions to a database or replicate them in memory, you can tune session persistence for performance. WebSphere has two configuration settings that let you control:

- How frequently to write session data. You can choose to write the data at fixed time intervals or after each servlet service.
- What session data to write. You can choose to write the entire session or only the updated attributes.

These settings let you optimize for performance (infrequent writes, updates only), optimize for failover (write after every service, write all session data), or something in between. The team experimented with different combinations.

With database persistence, their focus was on improving the poor performance. They chose to write updated attributes only rather than the entire session, and found this choice improved performance marginally. As for controlling the write frequency, they could not find an acceptable tradeoff between failover and performance: writing after every servlet service was just too slow, while writing at fixed intervals improved performance only when the interval was too long to provide reliable failover. This poor tradeoff led them to abandon database persistence.

With in-memory replication, the team found they could write the session after every servlet service. They also tried writing updates only to improve performance. But at that point the team encountered strange out-of-memory errors related to session replication.


Despite spending a great deal of time diagnosing this problem, including using WebSphere's diagnostic trace service to examine the session replication behavior in detail, they could not explain it. But when they switched from writing updates only to writing all session data, the problem disappeared.

10.6.8 Hot Deploying Changes to an Application

The manageability portion of the study examined how effectively the team could deploy changes to an application running under load. The normal application deployment procedure via the Deployment Manager stops and restarts an application, which would cause numerous errors under load. So the team needed alternatives to handle two sets of special circumstances:

- On the Customer Service side, the team could deploy to one server at a time, downing it first if necessary. But deployment would have to be handled in a special way, because the normal process would deploy to both clustered servers at once.
- Since the Work Order application was running on only one machine, any changes to it would have to be deployed while it was running. This would require hot deployment.

The IBM WebSphere literature describes how to hot deploy components in a Network Deployment environment (go to the WebSphere Information Center at http://publib.boulder.ibm.com/infocenter/ws51help/index.jsp and search on "hot deploy"). But the description is contradictory in a couple of places:

- First it issues this caution: "CAUTION: Do not use hot deployment to update components in a production deployment manager managed cell. Hot deployment is well-suited for development and testing, but poses unacceptable risks to production environments."
- Then, later on, it says: "For changes to take effect, you might need to start, stop, or restart an application." (This would, of course, defeat the purpose of hot deployment.)

Nevertheless, the document describes the various facets of WebSphere that go into the process:

- The exploded, deployed application is located in <WAS home>/installedApps/<cellName>/<application_name>.ear
- Application metadata can be located in a separate place under <WAS home>/config/cells/<cellName>/applications/<application_name>.ear/deployments
- Each deployed application has settings to enable/disable reload and control the reload interval. Changing these settings requires restarting the application.

From this information, and with much experimentation, the WebSphere team put together procedures for updating the applications. For the Customer Service application, they used this procedure in the RRD Round:

1. Copy the installedApps/../itscustserv.war folder from WebSphere on the development machine to a staging area on both Customer Service hosts. (To save time, just copy the WEB-INF, META-INF and perhaps common subfolders.) Also copy the config/../itscustserv.war folder.
2. Stop the first Customer Service server (see Section 10.6.9 for a discussion of how to do this gracefully).
3. Copy the contents of the staging area to their respective locations in the WebSphere installation.


4. Restart the server.
5. Repeat steps 2-4 for the other server.

Later, in the WSAD Round, they developed this alternative procedure:

1. Turn off the Deployment Manager's automatic synchronization with nodes.
2. Deploy the updated Customer Service application through the Deployment Manager's admin console in the normal way, but do not synchronize when saving the changes. This installs the updated application on the Deployment Manager machine only.
3. Stop the first server.
4. In the Deployment Manager, synchronize with the first Customer Service node. This copies the installed application into place.
5. Restart the server.
6. Repeat steps 3-5 for the other server.

For the Work Order application:

1. Install the application with reload enabled and the reload interval set to a reasonable value, such as 60 seconds.
2. Copy the changed components to their target locations under installedApps.
3. Wait until the reload interval has passed to see the changes.

Note that while these procedures cover most situations, there are some circumstances they cannot handle:

- If you update a custom external library deployed to WebSphere's lib folder, you must restart the server.
- If you update an EJB whose stub is cached (as, say, in a service locator class), the stub becomes stale.

10.6.9 Configuring for Graceful Failover

A robust distributed system architecture must meet two important requirements:

- Load balancing. It must balance client traffic evenly among available servers.
- Seamless failover. If a server dies in the middle of a conversation with a client, another server should pick up the conversation without the client knowing it.

To meet these requirements, the architecture must provide two additional capabilities:

- Session persistence. Seamless failover requires applications to persist the client sessions so they are not lost during failover.
- Server affinity (stickiness). Session persistence makes it possible for any server in the cluster to handle any part of the client conversation, by first retrieving the session from the persistent store. But session retrieval is expensive and slows performance; it should be minimized during normal operations. So the smart system directs all requests from the same client to the same server.

While the WebSphere team was able to address load balancing effectively, seamless failover proved a huge challenge. The team tried several topologies:

- A standard topology, suggested by default installations and the IBM literature.
- A major departure from the standard topology.


- A small but important modification to the standard topology.

This section describes these topologies in turn and the team's experiences with them.

10.6.9.1 Failover Requirements

The load balancing and failover issues had to do with the Customer Service module of the system, which was deployed to two clustered machines. For the testing phase the WebSphere team needed to handle two types of failover:

- Controlled shutdown of a server (as when redeploying an application)
- Sudden, catastrophic failure (as if someone tripped over the power cord)

For the first situation, the team had to be able to redirect load away from a server before downing it. If they stopped a WebSphere server, or even simply stopped the application in the server, errors would occur. The only tool available for gracefully stopping load was Edge Server's quiesce function. Quiescing a target server means reducing the traffic to that server to zero. If you use Edge Server to provide server affinity, you must choose whether or not to quiesce immediately. If you choose not to, Edge Server allows ongoing conversations to finish on the same target server, which means it may take much longer to shift load away from that server.

10.6.9.2 Standard Topology

The standard topology, described in the IBM literature and suggested by WebSphere's default behavior, has these features:

- One instance of WebSphere on each clustered machine
- One instance of Apache (IHS) on each machine
- Edge Server balancing load between the two Apache instances and redirecting load when one goes down
- The Web server plugin using the default configuration generated by WebSphere

Figure 7. Standard system topology for load balancing and failover

What's interesting is that Edge Server is used simply to balance load between the Apache instances and provide failover if one of them dies.


The Web server plugin does most of the work: it provides both load balancing and server affinity (stickiness) with respect to the two WebSphere instances. (WebSphere provides stickiness by appending a unique server ID to the session ID that is passed back to the client via a cookie or URL rewriting. When a new request in the same session comes in, the plugin detects the server ID and routes the request to that server, if possible.) The plugin also provides failover capability if one of the WebSphere instances goes down. How the plugin distributes load among the target servers is governed by the plugin configuration file, generated by WebSphere; by default the distribution is equal among the targets.

The WebSphere team found that this topology worked well for load balancing, but performed very poorly during failover tests. They could not gracefully bring down a WebSphere server. If they used Edge Server to quiesce Apache #1 (the instance on Host #1), Apache #2 would continue to direct load to WebSphere #1.


Figure 8. Standard system topology for load balancing and failover after quiescing Apache #1 from Edge Server

10.6.9.3 Non-Standard Topology

As an alternative the team tried a different topology with these characteristics:

- Each Apache instance served only its local WebSphere instance.
- Edge Server provided stickiness.

This configuration required manually editing the plugin configuration file to undo the clustering and have each Apache instance serve one WebSphere instance only. Each host had its own customized plugin configuration.



Figure 9. Non-standard system topology for load balancing and failover

This topology also proved troublesome. It did solve the problem of stopping traffic to a WebSphere instance: when the team used Edge Server to quiesce one of the Apache instances, traffic to the corresponding WebSphere stopped as well. But this topology required Edge Server to provide stickiness, and that was difficult to control. Quiescing immediately led to many errors, while quiescing slowly took much too long. The team found it hard to shut off Edge Server's traffic to a target server in a timely, seamless fashion. They found this alternative worse than the first and reverted to the standard topology.

10.6.9.4 Modified Standard Topology

During the WSAD round the team tried a third alternative that proved effective, a slight variation on the first. In this version they manually edited the plugin configuration file to change the load distribution between the two WebSphere instances, favoring the local WebSphere instance by 100 to 1 (a configuration sketch follows Figure 10).

Figure 10. Modified standard system topology for load balancing and failover
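For illustration, a fragment of the plugin configuration file (plugin-cfg.xml) edited along these lines might look like the following. The element and attribute names follow the WebSphere 5 plugin schema, but the cluster name, clone IDs, server names, and ports here are invented for the example.

    <!-- Host #1's copy of the file: heavily favor the local instance. -->
    <ServerCluster Name="CustomerServiceCluster" LoadBalance="Round Robin">
      <Server CloneID="clone1" Name="WebSphere1" LoadBalanceWeight="100">
        <Transport Hostname="192.168.4.201" Port="9080" Protocol="http"/>
      </Server>
      <Server CloneID="clone2" Name="WebSphere2" LoadBalanceWeight="1">
        <Transport Hostname="192.168.4.202" Port="9080" Protocol="http"/>
      </Server>
    </ServerCluster>

Host #2's copy would invert the weights, so that each Apache instance favored its own WebSphere instance while still being able to fail over to the other.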


This topology allowed the team to take advantage of the plugin's stickiness and failover capabilities while at the same time channeling traffic to a specific WebSphere instance. When the team wanted to bring down WebSphere #1, they would do the following:

1. Use Edge Server to quiesce Apache #1 immediately. With all traffic going to Apache #2, which favored WebSphere #2 (apart from stickiness), traffic would eventually shift over to WebSphere #2.
2. Monitor CPU activity on Host #1 and wait until it subsided, indicating that the traffic had shifted.
3. Bring down WebSphere #1.

Although failover still was not seamless, this was the most successful topology.

10.6.10 Deploying the WSAD Web Service

For this implementation the team completely rebuilt the Work Order Web service in WSAD. Testing it locally on the development machine using WSAD's WebSphere test environment yielded no errors. But when it was deployed to the production environment, the Customer Service application could not reach it: the application received a "Page Not Found" error and logged a java.lang.ExceptionInInitializerError in the error log. The Web service address is found in two places:

- WEB-INF/wsdl/SearchTicketsWS.wsdl in the Customer Service WAR file
- A client class that WSAD generates from the WSDL, which resides in the Customer Service Web application

Developer B double-checked that both these references were using the explicit IP address of the Web service host. Then he gathered other clues:

- When he ran the Customer Service application in the WSAD test environment and tried to use the Web service on the production machine, it worked.
- When he manually invoked the direct URL of the Web service (http://192.168.4.215:9080/itsWorkOrderConsoleWeb/services/SearchTicketsWS) from a browser on the Customer Service host, it gave back the proper response page, which simply said "And now some services."

So the Web service was responding; the problem seemed to be with the client application. Working together, the team noticed that the webservices.jar library in WebSphere's AppServer/lib folder was older than the one in the WSAD test environment. They wondered whether this discrepancy was causing the problem. When they substituted the library from WSAD's runtimes/base_v51_stub folder, that solved the problem.

10.6.11 The Sudden, Bizarre Failure of the Work Order Application

Early in Phase 2 of the WSAD round, the team tested the Work Order Web application under load and got excellent results: it ramped up to 3000 users (600 hits/sec) before hitting a wall. That was a great start, due to two factors:

- Much more efficient code
- Previous system tuning from the RRD round


The very next day, however, the application started behaving badly. As load ramped up, hits/sec would climb erratically; response times were horrible. The team had no idea what had thrown this particular monkey wrench, so they pursued all the usual suspects:

- They bounced all the machines and servers.
- They regenerated the WebSphere Web service plugin.

None of these actions solved the problem. Moreover, a load test on the Customer Service application gave respectable response times on Web service queries. This clue narrowed the focus to the Work Order application itself. So the team looked at it more closely.

- They rechecked the LoadRunner scripts and settings.
- They ran the application with Borland's Optimizeit Thread Debugger; it showed no sign of trouble with threads.
- They discovered an apparent error in the SQL for the Web application's ticket query. Fixing it seemed to help, but the improvement was erratic. Other coding changes did not help.

After this round of effort, another test of the Work Order Web application cranked it up to over 3500 users. That was nice to see, but it did not explain what the original problem was, or whether the team had solved it. In fact, they had not. Despite successful load tests, the horrible results returned, proving again the maxim, "Things that go away by themselves can come back by themselves." By the beginning of the testing phase the problem had worsened: the response time of the Work Order application was orders of magnitude greater than that of the Customer Service application. The team rechecked everything again in a vain search for the cause.

Then a new, tantalizing clue suddenly appeared: Ethernet 0 on the Work Order machine started to fail, going off and back on intermittently. Could the network card be responsible for the poor response times? There were two subnets connecting the servers, 192.168.4 and 192.168.5. The failing network card provided a .4 address, the address used by the load testing scripts. But the Web service (which was performing well) was accessed through a .5 address provided by a second card. So the failing card became the prime suspect in this mystery.

CN2 immediately replaced the failing card, after which the team reran the Work Order load test. Lo and behold, the results were now on par with those of Customer Service. (More importantly, those results would remain stable through the rest of the project!) Everyone involved concluded that the failing card had been the culprit.

But discovering the failing card raised two new questions. First, did it also explain the tendency of Edge Server's load manager to occasionally mark a target server down for no apparent reason (described in Section 10.6.6)? The team wondered whether the failing card had created noise on the network that would fool Edge Server into believing one of its targets had failed. They would keep an eye on Edge Server. (In fact, it behaved flawlessly from that point on.) Second, did the bad card compromise the results of the previous performance tests? Since there was no way to answer that question with certainty, CN2 insisted on rerunning the tests for both the .NET and RRD implementations, to ensure the validity of the results.

10.6.12 Using Mercury LoadRunner

Mercury LoadRunner is a powerful tool with many useful features. It allowed the team to simulate loads well in excess of the applications' capacity, and gave detailed results that helped the team diagnose problems and tune the system.


But like any complex, sophisticated piece of software (for some reason, application servers come to mind), LoadRunner can behave strangely if not configured and used exactly correctly. Over the course of tuning and testing their system, the team learned some important lessons about LoadRunner that would apply to comparable products:

- Adjust the scripts in response to page changes. Any change to the URL of a request (including changes to query parameters) affects a script that invokes that URL. The same is true of the fields in a form: if you add, remove or rename a field, the script must be corrected.
- Double-check the runtime settings. Twice the team got bizarre test results because a simple LoadRunner runtime setting was wrong. In one test, the Work Order Web application suddenly began overloading at a fraction of the load it had handled the day before. The cause turned out to be the LoadRunner runtime setting governing how think time was handled. The scripts had 5-second think times hard-coded before each Web request, but when running a script you can control whether and how that think time is used. The correct setting was to use a random value between 50% and 150% of the coded time (randomization is important because it helps stagger the requests and more evenly distribute the load). But in this case, think time was accidentally turned off, so naturally the system started hyperventilating very quickly. On another occasion the opposite occurred: the team suddenly found it could ramp up to loads much higher than those achieved earlier. These results were too good to be trusted. Again the culprit was an incorrect LoadRunner runtime setting, this one controlling how many seconds a user waits between iterations of the script. The correct setting was zero, but it had been accidentally changed to 60 seconds.
- Periodically refresh the clients. The team found it prudent to reboot the client machines occasionally, to ensure they performed properly. They also had LoadRunner periodically refresh the scripts on the client machines to guard against potential script corruption.



11 .NET CONFIGURATION AND TUNING PROCESS SUMMARY

This section describes the process the .NET team went through to configure and tune the .NET and Windows infrastructure. It also describes the major bottlenecks encountered and resolved in the implementation. Here is a high-level summary of the stages the team went through; details follow in the sections below.

1. Install and configure the software: Network Load Balancing (NLB) and ASP.NET State Server.
2. Resolve code bottlenecks in the implementation.
3. Tune the system for performance.

11.1 Installing and Configuring Software

11.1.1 Network Load Balancing (NLB)

Microsoft Windows Server 2003 comes with Network Load Balancing services (NLB) preinstalled, so the .NET team only had to configure the services to suit the application requirements. The diagram below shows the network topology relevant to the ITS Customer Service application and how the .NET team configured it for NLB.


Figure 11. Network topology and Windows NLB configuration for load balancing and failover in the .NET implementation

Since each Web server has multiple network interface cards (NICs), the team decided to configure NLB in unicast mode for better performance. They also followed a best practice for this mode: connecting the clustered network interfaces to a single hub that is up-linked to the public switch. This practice prevents NLB from flooding the switch (a condition known as port flooding) and degrading the entire network's performance. The hub is used by the servers in the cluster to communicate with each other via a heartbeat process.

There was an additional requirement for using this setup: for NLB to function properly with the hub configuration, MaskSourceMac had to be disabled. To turn it off, the team set this registry key to 0:

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\WLBS\Parameters\MaskSourceMac

The hub and the clustered network interfaces would take care of the incoming traffic. With performance in mind, the .NET team configured the second network interface card (gigabit) for dedicated outgoing traffic back to the clients. To simplify the configuration, the team used the interface metric setting for the outbound gigabit NIC to ensure it was the default NIC used for outbound traffic. To do this, they simply used TCP/IP properties to set the interface metric of the dedicated outbound NIC to a lower value than that of the cluster NIC, which NLB used to communicate over the hub via its heartbeat process. In the diagram, these interface metrics are 1 and 2 respectively. This configuration is documented in the help files for setting up Network Load Balancing.


With this design, routing incoming and outgoing traffic through different dedicated channels, the team could achieve better overall network performance. Once the team had figured out the load balancing plan, setting up the clustered environment was straightforward. Using Network Load Balancing Manager, they could configure the clustered environment or an individual host from any server, making maintenance very simple.

Network Load Balancing has both graphical and command line interfaces in Windows. Commands enable administrators to stop, add and monitor servers in the cluster. One particularly useful command is drainstop, which lets a server in the cluster finish handling its current requests while ceasing to take new ones. Once all requests have been drained from that server, it can be taken offline for maintenance.

It is important to note that the .NET team chose to hold session state in an ASP.NET session state server (rather than in the Customer Service servers themselves), and that the session server was placed on a machine outside the cluster (the same machine handling the durable MSMQ message queue). This topology made the Customer Service application cluster-safe, with several important advantages:

- Network Load Balancing required no configuration for server affinity (stickiness).
- Load balancing became more efficient because NLB did not have to worry about server affinity.
- Cluster maintenance tasks (adding servers, stopping servers, etc.) became very easy.
- In failover scenarios, no session state would be lost, because the clustered servers themselves were not holding session state information.

There were also possible drawbacks to this topology:

- A possible performance cost, because the Customer Service application obtained all session data remotely rather than locally.
- The session state server as a single point of failure. Because the ITS specification guaranteed that the MQ host would always be available, the team did not concern itself with this issue; had it been necessary, they could have used a clustered SQL Server for the session state data store, which would have removed the single point of failure.

11.1.2 ASP.NET Session State Server

Installing Microsoft Windows Server 2003 also installs the ASP.NET State Service by default; however, the .NET team needed to enable the service in order to start using it. Configuring the ASP.NET State Service is quite simple:

1. Open Administrative Tools, and then click Services.
2. In the details pane, right-click ASP.NET State Service, and then click Properties.
3. On the General tab, in the Startup type list box, click Automatic.
4. Under Service status, click Start, and then click OK. The state service starts automatically when the Web server restarts.

To configure the clustered application to use a single session state server, there are a few documented considerations to keep in mind (a configuration sketch follows this list):

- Make sure all applications use the same machine key. The team added the same <machineKey> entries in the Web.config files.
- Make sure all objects that are stored in session state can be serialized. This is easy to implement in .NET by adding the Serializable attribute to each class that needs serializing.


- Make sure the Customer Service applications have an identical Application Path on both Web servers. This means ensuring all installations of the application have the same URL.
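For illustration, the first two considerations might be reflected in a Web.config fragment like the one below. The attribute names are standard ASP.NET 1.1 configuration; the host address, port, timeout, and key values are placeholders, not the team's actual settings.

    <configuration>
      <system.web>
        <!-- Point session state at the shared ASP.NET State Service host.
             42424 is the state service's default port; the IP is assumed. -->
        <sessionState mode="StateServer"
                      stateConnectionString="tcpip=192.168.4.210:42424"
                      cookieless="false"
                      timeout="20" />
        <!-- Every server in the cluster must use the same keys so each can
             validate session and view-state data produced by the others. -->
        <machineKey validationKey="(same explicit value on every server)"
                    decryptionKey="(same explicit value on every server)"
                    validation="SHA1" />
      </system.web>
    </configuration>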

11.2 Resolving Code Bottlenecks

The .NET team investigated the use of an indexed view for the ticket search query performed by the Web service. However, this had an adverse effect on some of the other ticket searches. As a compromise, they removed the indexed view and added indexes to the appropriate tables. Initially, all ticket searches returned the top 500 rows, which turned out to be a large amount of data with a significant impact on performance. After the team implemented paging (so that each query returned only enough data to satisfy the current page), the database returned an average of 20 rows per search (based on the mix of requests in the test scripts), with a commensurate improvement in performance.
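A later item in this report notes that the team replaced dynamic SQL with a stored procedure using SET ROWCOUNT (see Section 11.3.6). As an illustration of SQL Server 2000-era paging of that kind, a sketch might look like the following; the procedure name, parameters, and column list are assumed, not taken from the project's code.

    -- Hypothetical sketch of ROWCOUNT-based paging.
    CREATE PROCEDURE GetWorkTicketPage
        @LastTicketID INT,   -- last ticket ID shown on the previous page (0 for page 1)
        @PageSize     INT    -- rows per page, e.g. 20
    AS
    BEGIN
        SET ROWCOUNT @PageSize          -- stop after one page's worth of rows
        SELECT TICKETID, CUSTOMERID, TICKETSTATUS, CREATIONDATE
        FROM   WORKTICKETS
        WHERE  TICKETID > @LastTicketID -- resume where the previous page ended
        ORDER  BY TICKETID
        SET ROWCOUNT 0                  -- reset so later statements are unaffected
    END

Because each call returns only one page of rows, the database sends back roughly the page size per search rather than hundreds of rows.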

11.3 Base Tuning Process

The .NET team made a significant number of tuning and performance modifications during Phase 2. They are listed below, organized by category. Note: Many of the actions the team took are discussed in the MSDN article "Developing High-Performance ASP.NET Applications" in the .NET Framework Developer's Guide; see Section 17.2 for the URL.

11.3.1 Tuning the Database

- Used the Index Tuning Wizard to optimize SQL queries
- Removed the technician-status index and added the technician and type indices to the WorkTickets table
- Added the IX_Technicians_ID_Name_Phone index to the Technicians table
- Aliased two of the table names to make the batch size smaller
- Added a Creation Date index for WorkTickets
- Changed the SQL Server memory configuration to dynamically allocate a minimum of 1536 MB and reserve physical memory of 10240
- Allocated 10 GB for the Work Order Processing database file and changed the growth increment to 100 MB
- Created a clustered index on (CustomerID, CreationDate) in the WorkTickets table (see the sketch following this list)
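For reference, the clustered-index change in the last item corresponds to a T-SQL statement along these lines (the index name is assumed). Since a table can have only one clustered index, any existing clustered key on WorkTickets would have had to be rebuilt as nonclustered first.

    CREATE CLUSTERED INDEX IX_WorkTickets_CustomerID_CreationDate
        ON WORKTICKETS (CUSTOMERID, CREATIONDATE)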

11.3.2 Tuning the Web Applications

- Disabled script injection attack validation (since all fields were checked manually)
- Disabled debug mode in Web.config
- Turned off request validation, debugging, Windows authentication, and session state in the Work Order Web application (forms authentication was used to authenticate users from the user table in the database)
- Turned off request validation, debugging, and Windows authentication in the Customer Web application (forms authentication was used to authenticate users from the user table in the database)
- Disabled enableViewStateMac
- Optimized the Nav user control by converting all link buttons to anchor tags, except for login/logout
- Set EnableSessionState to false for all pages in the Work Order Processing Web application and most pages in the Customer Service Web application
- Changed TicketSearchParams to use the query string instead of session state


- Added a CustomerID cookie in the Customer Service application to reduce session state during Web service calls
- Disabled unused HttpModules (e.g. output caching)

11.3.3 Tuning the Servers

- Enabled hyper-threading on all servers
- Reduced the maximum I/O and worker thread counts until CPU usage dipped
- Increased maxConnections in Machine.config (set to 200 for the tests, although the team later discovered a more proper setting would have been ~8, from a default of 2)
- Set the HKLM\System\CurrentControlSet\Services\TCPIP\Parameters\MaxUserPort registry value to 65534
- Set the HKLM\System\CurrentControlSet\Services\TCPIP\Parameters\TimeWaitDelay registry value to 30
- Set the HKLM\System\CurrentControlSet\Services\http\Parameters\MaxConnections registry value to 65534
- Changed the httpRuntime appRequestQueueLimit attribute in Machine.config to 3000
- Multithreaded the Forwarder and Processor
- Added ASP.NET worker processes until the CPU saturated (2 were used on each machine)

11.3.4 Tuning the Session State Server

- Compared the ASP.NET Session State Server with SQL Server-based session state. Since using SQL Server to store session state was somewhat slower, they chose to keep using the ASP.NET Session State Server.
- Modified the state network timeout values in the HKLM\System\CurrentControlSet\Services\aspnet_state\Parameters registry key and the associated sessionState Web.config values

11.3.5 Code Modifications

- Modified the code to use a custom collection instead of a DataSet
- Used a dynamic query in GetWorkTicketListBySearch
- Implemented a dynamic row count to minimize data transfer for paging

11.3.6 Tuning Data Access Logic

- Added Connection Reset=false to the connection strings
- Set the number of SQL connections (connection pool size) to 25 in the connection strings of both the Customer Service and the Work Order Web applications
- Replaced dynamic SQL with a stored procedure that uses SET ROWCOUNT

11.3.7 Tuning Message Processing

One area in which the team experienced significant improvement was message processing. Initially, the message processing application was single-threaded and processed about 2 messages per second. After they enhanced it to be multi-threaded (18 threads in the Processor and 12 threads in the Forwarder), the application was able to process about 170 messages per second.

11.3.8 Other Changes

- Turned off DTC tracing
- Removed unnecessary server controls


11.3.9 Changes to Machine.config

The team made these changes to Machine.config:

    <connectionManagement>
        <!-- Raise the outbound HTTP connection limit (the default is 2). -->
        <add address="*" maxconnection="200" />
    </connectionManagement>

    <httpRuntime executionTimeout="90"
                 maxRequestLength="4096"
                 useFullyQualifiedRedirectUrl="false"
                 minFreeThreads="8"
                 minLocalRequestFreeThreads="4"
                 appRequestQueueLimit="3000"
                 enableVersionHeader="true" />

    <processModel enable="true"
                  timeout="Infinite"
                  idleTimeout="Infinite"
                  shutdownTimeout="0:00:05"
                  requestLimit="Infinite"
                  requestQueueLimit="5000"
                  restartQueueLimit="10"
                  memoryLimit="60"
                  webGarden="false"
                  cpuMask="0xffffffff"
                  userName="machine"
                  password="AutoGenerate"
                  logLevel="Errors"
                  clientConnectedCheck="0:00:05"
                  comAuthenticationLevel="Connect"
                  comImpersonationLevel="Impersonate"
                  responseDeadlockInterval="00:03:00"
                  maxWorkerThreads="16"
                  maxIoThreads="16" />

11.3.10 Changes Not Pursued

Here are some possible optimizations the team did not pursue, either because the specification prohibited them or because the team had insufficient time:

- Disable IIS logging.
- Disable unused NT services.
- Put DTC on the fastest machine.
- Put the DTC log file on the fastest disk or a RAM disk.
- Use CommandBehavior.SequentialAccess.
- Use ASCII instead of UTF-8.

11.4 Significant Technical Roadblocks

11.4.1 Performance Dips in Web Service

The team was unable to address several performance characteristics due to time constraints. For instance, they experienced occasional rapid declines in the performance of the Work Order Web service. These dips were momentary and did not affect overall performance. The team spent some time investigating this behavior but could not diagnose it in the allotted time. Since it did not have a material impact on the results, the team was not overly concerned with this issue, but given more time would have liked to address it.

11.4.2 Lost Session Server Connections

In addition, when the Customer Service application made Web service requests while under very high loads (server saturation), connections to the session state server would sometimes be dropped. Following documented Microsoft Knowledge Base articles, the team attempted to correct this condition by setting the session state network timeout values higher than the default, but this had no effect. The team suspected two possible factors contributing to this problem:

- They configured too many network connections (200) for the .NET HTTP module. Microsoft documents settings much lower than the 200 network connections used (the default setting is 2, though it does need to be adjusted upwards on the Web service client machine).


- They did not increase the number of I/O threads to handle the number of network connections. Analysis of MSDN material indicated that such a high setting might also require increasing the I/O thread pool.

Since this problem only occurred after server saturation and above the one-second performance testing cutoff, the team chose not to spend more time diagnosing it.


12 PERFORMANCE TESTING

12.1 Performance Testing Overview

The various ITS implementations were subjected to a series of four performance tests. The first three were similar: Mercury LoadRunner was used to subject each application to load in a variety of tests. Each test consisted of ramping up user load gradually over time to plot system throughput curves, measure transaction response times, determine maximum user loads supported, and track error rates under load. Identical test scripts were created to test each of the three implementations in a consistent manner. The fourth test measured message processing throughput. The following section details the tests and presents the summary findings from the auditors report.

12.2 Performance Test Results

12.2.1 ITS Customer Service Application

This test put load on the Customer Service application running on two load-balanced servers to determine the peak throughput, transaction response times and peak user loads for each implementation. It ramped up user load at a rate of 500 new users every 15 minutes. The test scripts simulated simultaneous users accessing the ITS Customer Service application to:

- access the home page
- log in
- search and modify customer information
- generate new work tickets
- search the work order database via Web service across a variety of search parameters

Since the Customer Service application was integrated with the Work Order application via messaging and a Web service, putting load on the Customer Service application also exercised the Work Order application and message queue server.

- Customer updates would cause the Customer Service application to not only update its local database, but also send a message to the Work Order application to update its database (which replicates customer data) as well.
- The Customer Service application would submit new work orders by sending messages to the Work Order application.
- To perform ticket queries, the Customer Service application would invoke a Web service provided by the Work Order application.


The following table summarizes the auditor's results for the Customer Service application performance test. Please note that a transaction represents a complete business operation invoked by a user, such as executing a ticket search and receiving the first results page. So 272 transactions per second, for example, is equivalent to 979,000 business operations per hour, or 23.5 million business operations per day.

Performance Results for ITS Customer Service Application Running on Cluster

    Statistic                                                      RRD      WSAD     .NET
    Peak throughput (passed transactions / second)                 272      548      606
    Peak user load (load at which throughput achieved peak rate)   1,500    3,500    3,500
    Failed transactions as percentage of total                     0.00%    0.02%    0.00%

This graph shows the performance of the three implementations at each user load:
[Graph: TPS versus user load for the Customer Service load-balanced performance test. X-axis: number of virtual users (500 to 4,000); Y-axis: average TPS; series: WSAD, RRD, .NET 1.1. The last data point is the average TPS past the one-second cut-off.]

The .NET implementation performed slightly better than the WSAD implementation, reaching about 10% higher peak throughput at the same peak user load. And both performed far better than the RRD implementation.

12.2.2 ITS Work Order Web Application

This test put load on just the Work Order Web application to determine the peak throughput, transaction response times and peak user loads for each implementation. The test ramped up user load at a rate of 500 new users every 15 minutes. The test scripts simulated users accessing the application to:

- log in


- search for customers
- modify customer information
- search for technicians
- modify technician information
- search for work tickets

The following table summarizes the auditor's results for the Work Order application performance test. Please note that a transaction represents a complete business operation invoked by a user, such as executing a ticket search and receiving the first results page. So 260 transactions per second, for example, is equivalent to 936,000 business operations per hour, or about 22.5 million business operations per day.

Performance Results for ITS Work Order Application Running on Single Server

    Statistic                                                      RRD      WSAD     .NET
    Peak throughput (passed transactions / second)                 260      754      432
    Peak user load (load at which throughput achieved peak rate)   1,500    4,500    3,000
    Failed transactions as percentage of total                     0.00%    0.11%    0.00%

This graph shows the performance of the three implementations at each user load:
[Figure: TPS versus User Load for the Work Order performance test. The graph plots average TPS for the WSAD, RRD, and .NET 1.1 implementations at user loads from 500 to 5,000 virtual users. Note: the last data point is the average TPS past the 1-second cut-off.]
In this test the WSAD implementation far surpassed the other two. It achieved 75% more throughput at 50% higher peak user load. Again the RRD implementation fell far behind the other two.
12.2.3 Integrated Scenario

This test stressed both systems at once, running the ITS Work Order test (25% of the load) and the ITS Customer Service Application test (75% of the load). In this case, LoadRunner clients accessed both Web applications, running the same scripts as in the two individual tests. The test ramped up user load at a rate of 661 new users every 15 minutes: 500 on the Customer Service application and 161 on the Work Order Web application.

The following table summarizes the auditor's results for the integrated scenario performance test, which included both the Customer Service and Work Order applications. Again, a transaction represents a complete business operation invoked by a user. So 365 transactions per second, for example, is equivalent to 1.3 million business operations per hour, or 31.5 million business operations per day.

Performance Results for Integrated Scenario

Statistic                                                  RRD     WSAD    .NET
Peak throughput (passed transactions/second)               365     482     855
Peak user load (load at which throughput peaked)           1,983   2,644   5,299
Failed transactions as percentage of total                 0.00%   0.00%   0.00%
And the graph of performance at each user load:
[Figure: TPS versus User Load for the fully integrated performance test. The graph plots average TPS for the WSAD, RRD, and .NET 1.1 implementations at user loads from 661 to 5,449 virtual users. Note: the last data point is the average TPS past the 1-second cut-off.]
In this test of the complete system, the .NET implementation was again the clear winner, achieving roughly 75% higher peak throughput than WSAD at about double the peak user load. Interestingly, though, the difference between the WSAD and RRD performance was much smaller for this test than for the others.

12.2.4 Message Processing

The last test measured the performance of the Work Order application in processing messages: specifically, how long the application took to process 20,000 messages in the ticket queue. The teams were asked to halt their Work Order message processing module and make sure the work ticket message queue was empty. A script then loaded the queue with exactly 20,000 messages. Once the queue was loaded, the Work Order message processing module was restarted, and the time it took to process the 20,000 messages was measured. This table presents the results from the auditor's report:

Performance Results for Message Processing

Statistic                                   RRD     WSAD    .NET
Time to process 20,000 messages (seconds)   168     38      101
Throughput (messages/second)                119     526     198
The WSAD implementation was the clear winner over the .NET implementation in this test, by more than a factor of two. The difference may be attributable to differences in code, infrastructure, or both. On the .NET side, the team had to create an additional message-forwarding layer to compensate for the fact that one cannot do transactional reads from a remote MSMQ queue (see Section 8.4.3.1). This layer, unnecessary in the J2EE implementations, undoubtedly added overhead. On the other hand, the .NET team's initial throughput was far worse (2 messages per second); they reached the level reported here by reconfiguring the message processing application from single-threaded to multi-threaded.

Equally interesting is the fact that the RRD implementation lagged so far behind the WSAD, despite the two using the same message server (WebSphere MQ) and the same basic architecture (message-driven EJBs). The main reason is the generated RRD code that acquires message-related resources. In particular, the RRD code creates a new JNDI InitialContext object whenever it wants to do a JNDI lookup. For the WSAD version, the team used a ServiceLocator class that cached the InitialContext and the queue connection factory.
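To make the difference concrete, here is a minimal sketch of the kind of ServiceLocator the WSAD team's fix implies: the InitialContext and queue connection factory are created once and reused rather than re-created on every message. The class layout and JNDI name are illustrative assumptions; the team's actual code is not reproduced here.

import javax.jms.QueueConnectionFactory;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public final class ServiceLocator {
    private static InitialContext context;
    private static QueueConnectionFactory queueFactory;

    private ServiceLocator() { }

    // Create the InitialContext once and reuse it for all lookups.
    private static synchronized InitialContext getContext() throws NamingException {
        if (context == null) {
            context = new InitialContext();
        }
        return context;
    }

    // Cache the factory as well; under load, repeated JNDI lookups
    // of this kind are exactly the overhead described above.
    public static synchronized QueueConnectionFactory getQueueConnectionFactory()
            throws NamingException {
        if (queueFactory == null) {
            queueFactory = (QueueConnectionFactory)
                    getContext().lookup("jms/QueueConnectionFactory");
        }
        return queueFactory;
    }
}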
12.3 Conclusions from Performance Tests
What can we conclude from these performance tests? The following table may help. It summarizes the results of the three load tests described above. (Note that this table presents peak throughput in business operations per hour rather than transactions per second.) It also totals the numbers from the three load tests. These totals are only meant to provide a rough overall indicator of each implementation's performance under load.

Performance Results for 3 Load Tests

Statistic                                           RRD         WSAD        .NET
Peak throughput (passed business operations/hour)
  Customer Service only                             979,200     1,972,800   2,181,600
  Work Order only                                   936,000     2,714,400   1,555,200
  Integrated scenario                               1,314,000   1,735,200   3,078,000
  Total for 3 tests                                 3,229,200   6,422,400   6,814,800
Peak user load (load at which throughput peaked)
  Customer Service only                             1,500       3,500       3,500
  Work Order only                                   1,500       4,500       3,000
  Integrated scenario                               1,983       2,644       5,299
  Total for 3 tests                                 4,983       10,644      11,799
Message processing throughput (messages/hour)       428,600     1,894,700   712,900
The most obvious conclusion is that the RRD implementation fell short of the others by a wide margin: the .NET and WSAD implementations outperformed it by roughly a factor of two. The explanation lies largely in the nature of RRD as a development tool (discussed in Section 7.1.1). Designed to speed development and distance the developer from J2EE coding, RRD generates Java code that in many ways is not optimized for performance. The J2EE team spent a great deal of time trying to compensate for those code limitations.

Comparing the WSAD and .NET implementations, we find a much closer match. .NET performed significantly worse on the Work Order and messaging tests, slightly better on the Customer Service test, and significantly better on the integrated scenario test (which is the one most like the real world). Taking the total of the three load tests, the .NET and WSAD implementations come out nearly even. On the message throughput test, the WSAD implementation surpassed the .NET by more than double.

The ability of the WSAD implementation to roughly match its .NET counterpart in performance suggests that the WebSphere/Linux platform performed on a par with the .NET/Windows platform. Of course the study revealed other indicators where the two platforms differed greatly. But strictly in terms of performance (as measured by these tests), the two platforms are comparable.
13 MANAGEABILITY TESTING

13.1 Manageability Testing Overview
To test manageability of the ITS system, the teams were asked to modify various aspects of the Work Order and Customer Service applications and deploy these changes to the system while it was running under load. The three change requests were as follows:

1. In the Work Order application, change the ordering of results from a database query
2. In the Customer Service application, add a new Web page and modify an existing page
3. In one of the Customer Service application's Web pages, change a drop-down list whose contents are hard-coded so that it instead binds to a database table
Each test proceeded in two parts, development and deployment. For development, the team made, tested and verified the change in a test area, while the auditor timed the process. For deployment, the user load was ramped to 1,750 concurrent users (1,400 against the load-balanced Customer Service application, 350 against the Work Order application). Note that even though Request #1 did not affect the Customer Service application, that application was still running under load during the test. The team was then asked to deploy the change to the appropriate server(s). The auditor measured the time taken to deploy, the number of errors occurring during deployment, and whether the Customer Service application preserved session state. The following section details the tests and presents the summary findings from the auditor's report.

13.2 Manageability Test Results
13.2.1 Change Request 1: Changing a Database Query

The first change request applied to the Work Order Web application. It stated: "Results generated from the Ticket Search Page need to be ordered in descending order by date with most recent tickets displayed first."
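For illustration, the change amounts to adding a descending ORDER BY clause to the ticket search query. The sketch below is hypothetical: the table and column names are invented for the example, and the teams' actual data-access code is not shown in this report.

import java.sql.*;
import java.util.ArrayList;
import java.util.List;

public class TicketSearchDao {
    // The requested one-line change: order results newest-first.
    // An index on the date column (which the RRD team added, per the
    // auditor's notes below) lets the database satisfy the ORDER BY efficiently.
    private static final String SEARCH_SQL =
            "SELECT ticket_id FROM work_ticket WHERE status = ? " +
            "ORDER BY created_date DESC";

    public List searchTicketIds(Connection connection, String status)
            throws SQLException {
        List ids = new ArrayList();
        PreparedStatement stmt = connection.prepareStatement(SEARCH_SQL);
        try {
            stmt.setString(1, status);
            ResultSet rs = stmt.executeQuery();
            while (rs.next()) {
                ids.add(rs.getString("ticket_id"));
            }
        } finally {
            stmt.close();
        }
        return ids;
    }
}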
Here are the summary results from the auditor's report:

Change Request #1: Changing a Database Query and Deploying Under Load

Statistic                                                      RRD    WSAD     .NET
Development
  Development time (minutes)                                   112    35       9
Deployment
  Errors during deployment to live server running under load   874    >2,000   5
  Time required to deploy changes successfully (minutes)       17     9        1
Explosion of errors. The WSAD experience was somewhat bizarre. After the team had successfully deployed the changes to the Work Order Web application, the Customer Service side of the system (which was also running under load) began to fail. This occurred even though the changes did not affect it directly; the only modified code was in the freestanding Work Order Web application, not the Work Order module that processed messages or hosted a Web service. The team had no hypothesis to explain this failure, and under the time constraints they could not properly address it. They are confident, however, that a solution exists and that, given enough time, they could have made this work.

Auditor's observations. Given the high number of errors that occurred for the two J2EE implementations, the auditor included these observations in the report:

RRD: For this change request, the development team chose to modify the middle-tier application logic, since this contained the database query to which the order-by clause needed to be applied. To ensure continued query performance, the team also chose to change an index in the database schema so that the order-by clause could be completed efficiently. The 874 errors can be attributed to both the recompiling of the application and the changing of the database index while under load.

WSAD: A portion of the errors could be attributed to both the recompiling of the application and the changing of the database index while under load. However, in this exercise it was also observed that while Apache had systematically crashed during the live deployment, Edge Server was not part of that particular issue. Technically this test was stopped and allowed to pass; however, the CS portion of the site should not have been affected by the change to the WO server. Since synchronization was not a step performed by Middleware, there is no real cause for the CS site to suddenly go offline. At the time noted for "System Back Online," the user load was continued on the system for approximately 5 more minutes after the test ended to verify that the bouncing of CS#1 worked and that transactions were being passed successfully.

.NET: The five errors were time-out errors (120 seconds), most likely due to the application recompiling and being re-loaded into memory after being re-deployed.
13.2.2 Change Request 2: Adding a Web Page

The second change request applied to the Customer Service application. It required changes to the display of news items on the home page and the creation of a new page for adding company-specific news items. The request stated:

"Add a new page to the Customer Service application that will allow administrators for each company to generate new news bulletins that are displayed for their company. This part requires 2 changes: adding the web form to allow news items to be submitted and stored in the database, and adding a column to the database table allowing news items to be tracked by unique customer id. A CompanyID of zero is specified for news items that should display for every company. Unauthenticated users will see only default news items. However, once logged in, the news items for that company will also display, in addition to the default news items."

Here are the summary results from the auditor's report:

Change Request #2: Adding a New Web Form and Deploying Under Load

Statistic                                                      RRD    WSAD   .NET
Development
  Development time (minutes)                                   169    192    50
Deployment
  Errors during deployment to live server running under load   30     2      0
  Time required to deploy changes successfully (minutes)       10     13     9
  Session state properly maintained for running users
    of the Customer Service application?                       No     No     Yes
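For illustration, the data side of the change requested above might look like the following sketch: one new column plus a query that returns the default items (CompanyID of zero) for everyone and, for an authenticated user, that company's items as well. All schema and class names here are invented for the example.

import java.sql.*;
import java.util.ArrayList;
import java.util.List;

public class NewsItemDao {
    // Schema change implied by the request (names assumed):
    //   ALTER TABLE news_item ADD company_id INT DEFAULT 0 NOT NULL;

    // company_id = 0 marks default items shown to every company.
    private static final String NEWS_SQL =
            "SELECT headline FROM news_item " +
            "WHERE company_id = 0 OR company_id = ? " +
            "ORDER BY posted_date DESC";

    public List loadHeadlines(Connection connection, int companyId)
            throws SQLException {
        List headlines = new ArrayList();
        PreparedStatement stmt = connection.prepareStatement(NEWS_SQL);
        try {
            // Unauthenticated users pass 0, so only default items match.
            stmt.setInt(1, companyId);
            ResultSet rs = stmt.executeQuery();
            while (rs.next()) {
                headlines.add(rs.getString("headline"));
            }
        } finally {
            stmt.close();
        }
        return headlines;
    }
}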
13.2.3 Change Request 3: Binding a Web Page Field to a Database

The final change request also applied to the Customer Service application. Its ticket query page had a drop-down list of work status conditions (Created, In progress, Completed). In the original specification, the list was hard-coded. This change request required creating a table in the Customer Service database and binding the list to it. The request stated: "Databind the Work Status dropdown list to a table in the database vs. hard coding the values in the HTML."
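A minimal sketch of what this change implies on the Java side follows; the lookup table and column names are invented for the example. The page would then render the drop-down by looping over the returned list instead of emitting hard-coded option tags.

import java.sql.*;
import java.util.ArrayList;
import java.util.List;

public class WorkStatusDao {
    // Load the Work Status options from a lookup table instead of
    // hard-coding "Created", "In progress", "Completed" in the page.
    public List loadStatusOptions(Connection connection) throws SQLException {
        List options = new ArrayList();
        Statement stmt = connection.createStatement();
        try {
            ResultSet rs = stmt.executeQuery(
                    "SELECT status_name FROM work_status ORDER BY status_id");
            while (rs.next()) {
                options.add(rs.getString("status_name"));
            }
        } finally {
            stmt.close();
        }
        return options;
    }
}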
Here are the summary results from the auditor's report:

Change Request #3: Changing a Web Form Field and Deploying Under Load

Statistic                                                      RRD    WSAD   .NET
Development
  Development time (minutes)                                   110    47     15
Deployment
  Errors during deployment to live server running under load   6      0      0
  Time required to deploy changes successfully (minutes)       7      10     15
  Session state properly maintained for running users
    of the Customer Service application?                       Yes    No     Yes
13.3 Conclusions from Manageability Tests
Taken together, these tests show that the .NET implementation of ITS was significantly easier to manage than either the RRD or WSAD implementation. It performed better in the two most critical areas. First, it had fewer errors during deployment under load in all three tests. For the changes to the Customer Service application, the RRD and WSAD implementations both had relatively small numbers of errors, whereas the .NET had zero. The greatest difference lay in deploying to the Work Order application, where (as noted above in Section 13.2.1) the WSAD implementation had an inexplicable catastrophic failure. Second, the .NET implementation properly maintained session state in the two tests involving the Customer Service application. The RRD implementation failed once on that count, the WSAD implementation twice.
14 RELIABILITY TESTING

14.1 Reliability Testing Overview
To measure the reliability of each implementation, some simple tests were designed to simulate operation of the production system under load. The following tests were performed:

1. Controlled shutdown: Gracefully bring down a Customer Service load-balanced server for maintenance, then add it back to the cluster
2. Catastrophic failover: Abruptly take down a Customer Service load-balanced server (pull the plug), then bring the failed server back online within the cluster
3. Loose coupling: Power off the Work Order application while running the Customer Service application (to test the loosely coupled nature of the applications)
4. Long duration: Run the entire system at normal load for 12 hours
The following section details the tests and presents the summary findings from the auditor's report.

14.2 Reliability Test Results
Here are the summary results of these tests, taken from the auditor's report.

14.2.1 Controlled Shutdown Test

This test focused on how reliably the system operates when a clustered server is taken offline manually. A Customer Service server was powered down cleanly, properly removed from the cluster, and then brought back online and into the cluster. Once the first server was verified to be back up and handling load properly, the process was repeated for the other Customer Service server. Here are the summary results from the auditor's report:

Reliability During Controlled Shutdown

Statistic                                                      RRD       WSAD   .NET
Bringing Customer Service server down
  Can manually remove server from cluster for maintenance?     Yes       Yes    Yes
  Application continues to operate with no errors
    when server is taken offline?                              Yes       Yes    Yes
  Number of errors thrown                                      0         0      0
  Session state maintained?                                    No        Yes    Yes
Adding Customer Service server back to cluster
  Can manually add server to cluster for additional
    capacity during operations?                                Yes       Yes    Yes
  Application continues to operate with no errors
    when server is added to cluster?                           No        Yes    Yes
  Server picks up load correctly?                              No        Yes    Yes
  Session state maintained?                                    No        Yes    Yes
  Number of errors thrown                                      >58,000   0      0
The WebSphere team did poorly with the RRD implementation, but much better with the WSAD implementation. The difference is explained much more by differences in how WebSphere was configured than by the implementations themselves. By the second round (WSAD), the team had worked out a more reliable means of handling failover, which centered on skewing the load balancing that each Apache instance performed to favor its colocated WebSphere server. See Section 10.6.9 for a complete discussion.

14.2.2 Catastrophic Hardware Failure Test

This test was designed to see how the site would respond when "the janitor trips over the power cord of one of the servers." The power plug was physically pulled from one of the clustered Customer Service servers. At the same time, 1,000 new users were added to the load against the system.

Reliability / Failover During a Catastrophic Hardware Failure

Statistic                                 RRD       WSAD      .NET
A server is downed abruptly
  Session state maintained?               No        No        Yes
Downed server is brought back online
  Server picks up load correctly?         No        No        Yes
  Session state maintained?               No        No        Yes
  Errors                                  >74,000   >90,000   27
Despite the progress they had made in handling a controlled shutdown, the WebSphere team still could not get the system to handle a catastrophic failure gracefully. The downed server did not come back properly, leading to a huge number of errors. This was another case where the team had no explanation for the problem and insufficient time to solve it. The .NET implementation handled the failover well, with minimal errors.

14.2.3 Loosely Coupled Test

This test focused on the loose coupling of the Customer Service and Work Order applications via messaging. It began with the Customer Service application generating new ticket messages and the Work Order application processing those messages. At some point during the test the Work Order application was shut down. The question was whether the Customer Service application could continue to generate new tickets.
Loosely Coupled Reliability Test

Statistic                                                    RRD   WSAD   .NET
Work Order application is shut down
  Can customers continue data entry in the Customer Service
    app to generate new work tickets with no errors?         Yes   Yes    Yes

14.2.4 Long Duration Test

In this test, the entire system was run for 12 hours with a user load of 1,750 virtual users. All three implementations did well. From the auditor's report: "All three implementations were able to sustain an average response time of less than 1 second and no errors were thrown by any implementation at this user load for the 12 hours."
14.3 Conclusions from Reliability Tests
In these tests the .NET implementation proved significantly more reliable in handling service interruptions than did the two J2EE implementations. The WSAD implementation handled the controlled shutdown as well as the .NET, and better than the RRD (which crashed when the downed server was restarted). In the catastrophic failover test, however, only the .NET implementation recovered; both J2EE implementations failed to do so. On the remaining two tests (the loosely coupled test and the 12-hour sustained operation), all three performed with equal reliability.

Again, we note that the manageability and reliability results show what the teams were able to achieve given the time each took for configuration and tuning in preparation for the tests. With additional time and/or involvement by the vendors themselves, better results might have been achieved.
15 OVERALL CONCLUSIONS
At its heart, this study was a comparison of two fundamentally different approaches to enterprise software, embodied by two different technologies and platforms. .NET represents Microsoft's longstanding approach, which emphasizes these elements:

• Focus on the Windows platform, to provide tight integration between the OS and the development framework and tools
• Standardization on Visual Studio .NET as the primary development tool for .NET

The Java/J2EE world, on the other hand, emphasizes:

• Independence of the J2EE platform from the underlying OS
• Open standards
• Vendor competition and consumer choice for tools and runtime platforms
How were these approaches reflected in the results of this study?

Developer productivity. Microsoft's tight-integration approach paid off in the development phase, where VS.NET and the .NET platform proved more productive than either RRD or WSAD with the WebSphere platform. Among the reasons:

• The position of VS.NET as the premier .NET development tool all but guaranteed an equivalence between VS.NET experience and .NET platform experience. In other words, a developer with three years of .NET experience has most likely used VS.NET for three years, whereas a developer with three years of J2EE experience may not have used RRD or WSAD at all.
• VS.NET shared some of the best features of both RRD (visual page design; data binding) and WSAD (direct coding of business logic; tight integration with the target platform).
Installation and configuration of software. Tight integration paid off here, too, for the .NET team. Most key elements of the .NET runtime infrastructure (basic application platform, Web server, load balancer, session server, message server) were already in place with the basic Windows Server 2003 installation. This fact saved the .NET team a great deal of time and trouble. The WebSphere team, by comparison, spent a great deal of time during the development phase installing the software and configuring it for basic functional tests. They also spent considerable time overcoming fundamental configuration obstacles, such as patching the Linux kernel for Edge Server and configuring WebSphere for session replication. The .NET team did not face such obstacles.

System tuning. The .NET team completed their tuning process much more quickly. One obvious reason is that they had fewer knobs to turn. A J2EE system has many more moving parts that interact in many combinations, making the tuning process all the more complex. The WebSphere team took a methodical approach to tuning that certainly proved more time-consuming.

Performance. In terms of sheer processing throughput, the .NET and WSAD implementations performed comparably. (It would be interesting to know to what degree, if any, the different operating systems contributed to the performance results; unfortunately the data from this study sheds no light on that question.) In one particular area, message processing, the .NET version fell far short, but this is most likely explained by the more complex architecture necessary to work around a specific barrier in MSMQ regarding distributed transactions spanning a read from a remote queue server.

Manageability and reliability. The .NET implementation consistently and reliably handled service interruptions, both controlled and unexpected. It also allowed the team to deploy application updates much more smoothly. The WebSphere team, on the other hand, encountered catastrophic failures that they could not diagnose or explain sufficiently to overcome. They also found session persistence to be less than reliable. The team feels they could have solved these problems given more time; unfortunately, time was a measured resource in this study.
Overall, by most indicators in this study, the .NET implementation running on Windows Server 2003 was better, in some cases significantly so, than either WebSphere/J2EE implementation running on Linux.

Are these results surprising? It makes sense that using an integrated, out-of-the-box "operating system and application server" framework such as Windows and .NET would have a much lower setup cost than attempting to integrate multiple products (albeit from the same company) with a third-party OS. Although IBM products have come a long way since 1998, they still have some way to go in providing the seamless integration Microsoft can offer. Nor should it surprise anyone that the development productivity results favor the .NET side; productivity has always been one of Microsoft's strong points. Perhaps a more noteworthy result is that WSAD came much closer to Visual Studio than it would have a couple of years ago.

Regarding performance, the RRD results should not come as a shock. Any code generation scheme will always have a difficult time holding its own against tightly written, hand-crafted code. What is worth noting, however, is how close the WSAD and .NET performance results came. This outcome basically means that both IBM and Microsoft have done a good job getting the most out of the hardware resources on which their platforms run. The only truly unexpected result was that the WSAD message processing (using JMS and message-driven EJBs) performed so much faster than the .NET.
Given that the J2EE approach to enterprise software is very much about competition and choices, we might well ask whether the most significant problems encountered by the WebSphere team could have been helped or eliminated through different choices.

RRD vs. WSAD? RRD, chosen initially for its promised development productivity, did not deliver that productivity in this study. Given that the specification included J2EE technologies beyond the core, such as a handheld application and a Web service, and given that the WebSphere team consisted of skilled J2EE developers comfortable with those technologies' APIs, WSAD was a much better choice and, so the team feels, would have compared favorably with VS.NET strictly in terms of code production. In terms of producing a high-performance implementation, WSAD was clearly the better choice over RRD.

Linux? Given the choice of Edge Server for load balancing, the WebSphere team had to patch and upgrade the Linux kernel to make it work. This process requires skills common to Linux experts but not necessarily to the average J2EE developer or IT team. There is no question that Linux added a layer of complexity to the configuration process.
Moreover, the challenges of Linux are separate from those of WebSphere. Linux emphasizes functionality and control over simplicity and ease of use. Edge Server would undoubtedly have been easier to configure on Windows, which many enterprises choose independently of their choice of J2EE or .NET for their applications.

Configuring session sharing and failover? Two of the most vexing problems for the WebSphere team were their inability to get session replication working reliably and to configure the system for robust failover. During the manageability and reliability tests they found the system crashing for reasons they could not explain in sufficient depth to correct the problems. But knowing that robust, successful WebSphere installations exist in the world, the team feels certain they could have corrected them, given enough time.

WebSphere? And finally, of course, one has a choice of J2EE platforms. This study examined the use of one particular platform in a carefully constructed experiment. These results do not speak to the qualities of others.
16 APPENDIX: RELATED DOCUMENTS
The following documents are integral to this report. They can be found at http://www.middlewareresearch.com/endeavors/040921IBMDOTNET/endeavor.jsp:

• The ITS system specification, which was the basis for the two teams' development.
• The independent auditor's report (the CN2 report of the study results), which provided most of the result data cited in this report.
17 APPENDIX: SOURCES USED

17.1 Sources Used by the IBM WebSphere Team

• IBM WebSphere V5.1 Performance, Scalability and High Availability, IBM Redbook
• IBM WebSphere V5.1 System Management and Configuration, IBM Redbook
• WebSphere Edge Server for Multiplatforms, Administration Guide, Version 2.0
• WebSphere Edge Server for Multiplatforms, Network Dispatcher Administration Guide, Version 2.0
17.2 Sources Used by the Microsoft .NET Team
• "Developing High-Performance ASP.NET Applications," from the .NET Framework Developer's Guide. MSDN article. Found at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpconDevelopingHigh-PerformanceASPNETApplications.asp
• "Transactional Read-Response Applications." MSDN article. Found at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/msmq/msmq_about_transactions_05wz.asp
18 APPENDIX: SOFTWARE PRICING DATA
This section provides sample prices for the software used in this study.

18.1 IBM Software
This IBM pricing data is obtained from the IBM Passport Advantage Express price book. Passport Advantage (PA) is a relationship-based discount program: the more software you buy over time, the larger the discount. It resembles Microsoft Select pricing. PA Express is a transaction-based discount program, similar to Microsoft Open Value; the discount is based solely on the amount of software you buy in a particular transaction. These are real prices: what a customer would pay by approaching IBM today and buying from the Web. They are discounted slightly from IBM's suggested retail prices.

IBM Passport Advantage Express Pricing

Item                         Product                                              Price/unit               Units                    Extended Price
Base Server OS               Red Hat Enterprise Linux AS (Standard Subscription)  $1,499/system, per year  4 servers                $5,996
Application Server Function  WebSphere Application Server ND v5.1                 $15,000/cpu              4 servers x 4 CPUs each  $240,000
Developer tool               WebSphere Studio Application Developer v5.1          $4,000/seat              2 seats                  $8,000
Total                                                                                                                               $253,996

The fine print:

Red Hat Linux: http://www.redhat.com/software/rhel/as/
The only version of Red Hat Linux supported on 4-CPU servers is Red Hat Enterprise Linux AS. Support for this product is available from Red Hat in a Standard or Premium subscription, on a per-system, per-year basis. The Standard subscription includes 9am-9pm telephone support (US Eastern time) with 4-hour response time. The Premium subscription offers 24/7 telephone support and 1-hour response time. This pricing configuration uses the Standard subscription.

WebSphere Application Server: http://www-306.ibm.com/software/info1/websphere/index.jsp?tab=products/appserv
WebSphere Application Server is available in multiple versions and editions. This configuration required load balancing and failover, which are available in WebSphere Application Server ND. WAS ND is licensed on a per-CPU basis, and the initial license includes 1 year of product telephone support and maintenance. This pricing configuration uses prices from IBM's Passport Advantage Express discount purchasing program, the transaction-based licensing program from IBM.

WebSphere Studio Application Developer: http://www-306.ibm.com/software/info1/websphere/index.jsp?tab=products/studio
WebSphere Studio is the brand name for a set of tools available from IBM. For building distributed applications with database access and EJBs, WebSphere Studio Application Developer is required. This pricing configuration uses the per-seat license costs offered under IBM's Passport Advantage Express discount purchase program.

Passport Advantage Express: http://www-306.ibm.com/software/info/ecatalog/en_US/brand/websphere.html

Taxes and surcharges are extra.
18.2 Microsoft Software
These Microsoft prices are based on a quote from Software Spectrum, Microsoft's largest distributor. They assume the Open Value licensing program, in order to get Software Assurance with its support privileges.

Microsoft Actual Reseller Pricing Quote

Item                         Product                                            Price/unit                Units      Extended Price
Base Server OS               Windows Server 2003 Enterprise Edition             $3,575.87/system          4 servers  $14,303.48
Application Server Function  Included in Windows Server                         --                        --         --
Developer tool               MSDN Enterprise Subscription                       $2,483.99/seat, per year  2 seats    $4,967.98
Media                        CD kit for Windows Server 2003 Enterprise Edition  $23/kit                   1          $23.00
Total                                                                                                                $19,294.46

The fine print:

Windows Server 2003: http://www.microsoft.com/windowsserver2003/howtobuy/licensing/pricing.mspx
The version of Windows Server required for a 4-CPU server is the Enterprise Edition. In addition to the base Windows Server license, customers enabling authenticated external connections to Windows Server need to purchase the External Connector license; typically the External Connector is required for e-commerce applications. No External Connector was required in this case, since the application did not authenticate incoming requests against Active Directory. Nor were any Client Access Licenses (CALs) required. The license of Windows Server was priced with Software Assurance (SA) through the Open Value Licensing program, the transaction-based licensing program available through qualified Microsoft resellers. The license and Software Assurance plan priced here provides maintenance and updates, 24x7 Web support, and telephone support during business hours for these products, for a period of 2 years.

Visual Studio: http://msdn.microsoft.com/vstudio/howtobuy/pricing.aspx
Visual Studio is a family of developer tools available from Microsoft. The developers in this benchmark effort used Visual Studio 2003 Enterprise Architect. The purchase vehicle for this version of the tool with Microsoft Software Assurance is the MSDN Enterprise subscription. Taxes and media shipping surcharges are extra.