
GridServer Administration Guide

Version 4.2

The GridServer Administration Series


Proprietary and Confidential
Confidentiality and Disclaimer
Neither this document nor any of its contents may be used or disclosed without the express written consent
of DataSynapse. This document does not carry any right of publication or disclosure to any other party.
While the information provided herein is believed to be accurate and reliable, DataSynapse makes no
representations or warranties, express or implied, as to the accuracy or completeness of such information.
Only those representations and warranties contained in a definitive license agreement shall have any legal
effect. In furnishing this document, DataSynapse reserves the right to amend or replace it at any time and
undertakes no obligation to provide the recipient with access to any additional information. Nothing
contained within this document is or should be relied upon as a promise or representation as to the future.
This product includes software developed by the Apache Software Foundation (www.apache.org/).
This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit.
(www.openssl.org/).
This product includes code licensed from RSA Data Security (java.sun.com/products/jsse/LICENSE.html).
DataSynapse GridServer Administration Guide Version 4.2
Copyright © 2006 DataSynapse, Inc. All Rights Reserved.
GridServer® is a registered trademark, DataSynapse, FabricServer™, the DataSynapse logo,
LiveCluster™, and GridClient™ are trademarks, and GRIDesign is a servicemark of DataSynapse, Inc.
Protected by U.S. Patent No. 6,757,730. Other patents pending.
WebSphere® is a registered trademark and CloudScape™ is a trademark of International Business
Machines Corporation in the United States, other countries, or both. All other product names are trademarks
or registered trademarks of their respective companies.
DataSynapse, Inc. 632 Broadway, 5th Floor; New York, NY 10012
Tel: 212.842.8842 Fax: 212.842.8843
Email: info@datasynapse.com Web: www.datasynapse.com
For technical support issues and product updates, please visit customer.datasynapse.com.
We appreciate any comments or suggestions you may have about this manual or other DataSynapse
documentation. Please send your feedback to docs@datasynapse.com.
2212006
Contents
Confidentiality and Disclaimer ............................................................................................................2
Contents .....................................................................................................................................................3
Chapter 1 - Introduction ........................................................................................................................9
Before you begin .............................................................................................................................9
GridServer 4.2 Documentation Roadmap .......................................................................................9
GridServer Guides ..............................................................................................................9
Other Documentation and Help ........................................................................................10
Document Conventions .................................................................................................................11
Chapter 2 - Work ...................................................................................................................................13
Introduction ...................................................................................................................................13
Services .........................................................................................................................................13
Clients ...............................................................................................................................13
Service Implementations ...................................................................................................13
Service Session .................................................................................................................14
Service benefits .................................................................................................................14
Jobs ...............................................................................................................................................14
Job Benefits .......................................................................................................................15
Binary-level Integration ................................................................................................................15
Chapter 3 - Engine Balancing and Client Routing .........................................................................17
Introduction ...................................................................................................................................17
Client Routing ...............................................................................................................................17
Allowed Brokers Set .........................................................................................................17
Client Properties Rules .....................................................................................................17
Driver API .........................................................................................................................17
Engine Routing and Balancing .....................................................................................................17
Engine Weight-Based Balancer ........................................................................................18
Home/Shared Balancer .....................................................................................................18
Engine Balancer Configuration ........................................................................................19
Failover Brokers ...........................................................................................................................20
Engine Upper and Lower Bounds .................................................................................................20
Example Use Cases .......................................................................................................................20
N+1 Failover with Weighting ...........................................................................................20
Engine Localization with Sharing .....................................................................................21
Chapter 4 - Grid Fault-Tolerance and Failover ...............................................................................23
Introduction ...................................................................................................................................23
The Fault-tolerant GridServer Deployment ..................................................................................23
Heartbeats and Failure Detection ..................................................................................................23
Manager Stability Features ...........................................................................................................24
Engine Failure ...............................................................................................................................24
Driver Failure ................................................................................................................................24
Director Failure .............................................................................................................................25
Broker Failure ...............................................................................................................................25
Failover Brokers ...........................................................................................................................25
Fault-Tolerant Tasks .....................................................................................................................26
Batch Fault-Tolerance ...................................................................................................................27
GridCache Fault-Tolerance ...........................................................................................................27
Client .................................................................................................................................27
Broker Restart ...................................................................................................................27
Failover .............................................................................................................................27
Chapter 5 - Scheduling .........................................................................................................................29
Introduction ...................................................................................................................................29
Reschedules and Retries ...............................................................................................................29
Retry ..................................................................................................................................29
Reschedule ........................................................................................................................29
Timeout Behavior .............................................................................................................30
The Scheduler ...............................................................................................................................30
Scheduler Overview ..........................................................................................................30
Service Priority .................................................................................................................31
Usage Algorithm ...............................................................................................................31
Time Algorithm ................................................................................................................31
Serial Priority Algorithm ..................................................................................................32
Urgent Priority Services and Preemption .....................................................................................32
Engine Blacklisting .......................................................................................................................33
Conditions .....................................................................................................................................33
Redundant Task Rescheduling ......................................................................................................33
Chapter 6 - The GridServer Administration Tool ..........................................................................35
Introduction ...................................................................................................................................35
Getting Started ..............................................................................................................................35
User Accounts and Access Levels ................................................................................................36
Creating User Accounts ....................................................................................................36
Features Available by Access Level .................................................................................37
User Account Security ......................................................................................................37
Navigating the Administration Tool .............................................................................................38
The Home Page .................................................................................................................38
Tabs ...................................................................................................................................38
Shortcut buttons ................................................................................................................39
Action Controls .................................................................................................................39
Links on other pages .........................................................................................................39
Using Tables .................................................................................................................................39
Pager control .....................................................................................................................39
Search control ...................................................................................................................39
Personalize Table ..............................................................................................................40
Refresh ..............................................................................................................................40
Broker and Director Monitors ...........................................................................................40
Manager Component Indicator .........................................................................................40
Status Display ...................................................................................................................41
Chapter 7 - Application Resource Deployment ..............................................................................43
Introduction ...................................................................................................................................43
Grid Libraries ................................................................................................................................43
Grid Library Format ..........................................................................................................44
Using Grid Libraries from a Service .................................................................................49
Deployment .......................................................................................................................50
Grid Library Manager .......................................................................................................50
C++ Bridges ......................................................................................................................51
JREs ..................................................................................................................................51
Grid Library Example .......................................................................................................51
Legacy Resource Deployment ......................................................................................................52
Using Default Resources ..................................................................................................52
Default Resource Paths .....................................................................................................53
C++ Bridges ......................................................................................................................53
Grid Library features not supported by Default Resources ..............................................53
Code Versioning Deprecation ...........................................................................................53
Resource Deployment: Distributing Grid Libraries and Default Resources ................................54
The Resource Deployment Interface ................................................................................54
Resource Deployment File Locations ...............................................................................54
Configuring Directory Replication ...................................................................................55
Using Engines with Shared Network Directories .............................................................55
JAR Ordering File .............................................................................................................56
Remote Application Installation ...................................................................................................56
Service Run-As .............................................................................................................................57
Types of Credentials .........................................................................................................58
Using Run-As ...................................................................................................................58
Chapter 8 - The Batch Scheduling Facility .......................................................................................61
Introduction ...................................................................................................................................61
Terminology ..................................................................................................................................61
Editing Batch Definitions .............................................................................................................62
Batch Components ........................................................................................................................63
Service Runners ............................................................................................................................65
Scheduling Batch Definitions .......................................................................................................66
The Batch Schedule Page .............................................................................................................66
Running Batches ...........................................................................................................................66
Deploying Batch Resources ..........................................................................................................67
Batch Fault-Tolerance ...................................................................................................................67
Using PDriver in a Batch ..............................................................................................................67
Chapter 9 - Configuring Security .......................................................................................................69
Introduction ...................................................................................................................................69
Authentication ...............................................................................................................................69
Operating System Users ....................................................................................................69
Grid Users .........................................................................................................................69
GridServer Built-In Authentication ..................................................................................70
Extensible Authentication Hooks .....................................................................................70
Enabling Client Authentication ........................................................................................70
SSL ................................................................................................................................................71
Communication Overview ................................................................................................71
Certificate Overview .........................................................................................................71
Keypair and Cert Location ................................................................................................72
Types of Connections Using SSL .....................................................................................72
Enabling HTTPS on the Application Server .....................................................................72
Enabling HTTPS on all Components ................................................................................73
Driver SSL ........................................................................................................................73
Engines and Engine Daemon SSL ....................................................................................74
Brokers and Director SSL .................................................................................................75
Resources over HTTPS .....................................................................................................75
Disabling HTTP ................................................................................................................76
Resource Protection ......................................................................................................................76
Chapter 10 - GridServer Performance and Tuning ........................................................................77
Diagnosing Performance Problems ..............................................................................................77
Tuning Data Movement ................................................................................................................77
Stateful Processing ............................................................................................................77
Compression .....................................................................................................................78
Packing ..............................................................................................................................78
Direct Data Transfer .........................................................................................................78
Shared Directories and DDT .............................................................................................79
Caching .............................................................................................................................79
Data References ................................................................................................................79
Tasks Per Message ............................................................................................................79
Invocations Per Message ..................................................................................................80
Tuning for Large Grids .................................................................................................................80
Chapter 11 - Diagnosing GridServer Issues ....................................................................................81
Troubleshooting ............................................................................................................................81
Obtaining Log Files ......................................................................................................................81
Manager Logs ...................................................................................................................81
Engine and Daemon Logs .................................................................................................82
Driver Logs .......................................................................................................................83
Application Server Logs ...................................................................................................83
Chapter 12 - Administration Howto ..................................................................................................85
Backup / Restore ...........................................................................................................................85
Backup Procedure .............................................................................................................85
Restore Procedure .............................................................................................................85
Manager Configuration .................................................................................................................85
Applying a patch or service pack to GridServer ...............................................................85
Importing and Exporting Manager Configuration ............................................................86
Installing Manager Licenses .............................................................................................86
Setting the SMTP host ......................................................................................................87
Setting Up a Failover Broker ............................................................................................87
Configuring SNMP ...........................................................................................................88
Enabling Enhanced Task Instrumentation ........................................................................89
Engine Management .....................................................................................................................89
Deploying Files to Engines ...............................................................................................89
Updating the Windows Engine JRE .................................................................................90
Updating the Unix Engine JRE .........................................................................................90
Setting the Director Used by Engines ...............................................................................91
Running Services ..........................................................................................................................91
Running MPI Jobs using PDriver .....................................................................................91
Registering a Service Type ..............................................................................................92
Creating and Running a Batch .........................................................................................92
Creating a native stack trace in Linux ..............................................................................93
Attaching GDB to Engine native code on Linux ..............................................................93
Logging messages from a Native service to the Engine log .............................................94
Running a .NET Driver from an Engine Service ..............................................................94
Configuration Issues .....................................................................................................................95
Installation on Dual-Interface Machines ...........................................................................95
Configuring the timeout period for the Administration Tool ...........................................95
Reconfiguring Managers when Installing a secondary Director .......................................95
Using UNC paths in a driver.properties file .....................................................................95
Chapter 13 - Database Administration .............................................................................................97
Introduction ...................................................................................................................................97
Database Types .............................................................................................................................97
The Reporting Database ....................................................................................................97
The Internal Database .......................................................................................................97
Internal Database Backup .............................................................................................................97
Appendix A - The grid-library.dtd ....................................................................................................99
Introduction ...................................................................................................................................99
Appendix B - Reporting Database Tables ......................................................................................101
Introduction .................................................................................................................................101
Batches ........................................................................................................................................101
Brokers .......................................................................................................................................101
Broker_stats ................................................................................................................................102
Driver_events .............................................................................................................................102
Driver_profiles ............................................................................................................................103
Driver_users ................................................................................................................................103
Engine_events .............................................................................................................................104
Engine_info .................................................................................................................................104
Engine_stats ................................................................................................................................104
Event_codes ................................................................................................................................105
Job_status_codes ........................................................................................................................105
Jobs .............................................................................................................................................105
Job_discriminators ......................................................................................................................106
Properties ....................................................................................................................................107
Tasks ...........................................................................................................................................107
Task_status_codes ......................................................................................................................107
Users ...........................................................................................................................................108
User_events .................................................................................................................................108
Index .......................................................................................................................................................109






Chapter 1
Introduction
••••••

This guide is a reference for the administrator who maintains GridServer installations. It includes advanced
information on how GridServer works, including scheduling, routing, failover, and file deployment, plus a
tour of the GridServer Administration Tool. How-to information is provided for frequent tasks, along with
advanced information on security, tuning, database administration, and log files.

Before you begin


This guide assumes that you already have a GridServer Manager running and know the hostname, username,
and password. If this isn’t true, see the GridServer Installation Guide or contact the administrator
responsible for the installation.

GridServer 4.2 Documentation Roadmap


The following documentation is available for GridServer 4.2:

GridServer Guides
Four guides and four tutorials are included with GridServer in Adobe Acrobat (PDF) format. They are also
available in print format. To view the guides, log in to the Administration tool, select the Admin tab, go to
the Documentation page, and select a guide. A search engine is also available on this page for you to search
all of the documentation for a phrase or keywords. The PDF files can also be found on the Manager at
livecluster/admin/docs. The following guides are available:

Introducing the GridServer Platform Series:


Introducing the GridServer Platform: Contains an introduction to GridServer, including definitions of key
concepts and terms, such as work, Engines, Directors, and Brokers. This should be read first if you are new
to GridServer.

The GridServer Administration Series:

GridServer Administration Guide: Covers the operation of a GridServer installation as relevant to a system
administrator. It includes basic theory on scheduling, fault-tolerance, failover, and other concepts, plus
how-to information, and performance and tuning information.

GridServer Installation Guide: Covers installation of GridServer for Windows and Unix, including Managers,
Engines, and pre-installation planning.




The GridServer Developer Series:


GridServer Developer’s Guide: Contains information on how to develop applications for GridServer, including
information on Service Domains, using Services, PDriver (the Batch-oriented GridServer Client), the theory
behind development with the GridServer Tasklet API, and concepts needed to write and adapt applications.

GridServer Object-Oriented Integration Tutorial: Tutorial on developing applications for GridServer using
the object-oriented Tasklet API in Java or C++.

GridServer Service-Oriented Integration Tutorial: Tutorial on developing applications for GridServer using
Services, such as Java, .NET, native, or binary executable Services.

GridServer PDriver Tutorial: Tutorial on using PDriver, the Parametric Service Driver, to create and run
Services with GridServer.

GridServer COM Tutorial: Tutorial explaining how client applications in Windows can use COMDriver,
GridServer’s COM API, to work with services on GridServer.

Other Documentation and Help


In addition to the GridServer guides, you can also find help and information from the following sources:
GridServer Administration Tool Help: Context-sensitive help is available throughout the GridServer
Administration Tool by clicking the help icon located on any page. This provides reference help, plus
how-to topics.
API Reference: Reference information for the GridServer API is provided in the GridServer SDK in the docs
directory. The Java API information is in JavaDoc format, while C++ documentation is presented in HTML,
and .NET API help is in HTMLHelp. You can also view and search them from the GridServer
Administration Tool; log in to the Administration Tool, click the Admin tab, and select the Documentation
link.
Knowledge Base: A searchable archive of known issues and support articles is available online. To access
the DataSynapse Knowledge Base, go to the DataSynapse customer extranet site at
customer.datasynapse.com and log in. You can also use this site to file an issue report, download product
updates and licenses, and view documentation.






Document Conventions
Convention | Explanation | Example
italics | Book titles | The GridServer Developer’s Guide describes this API in detail.
“Text in quotation marks” | References to chapter or section titles | See “Preliminaries.”
bold text | Emphasizes key terminology | Client applications (Drivers) submit work to a central Manager.
 | Interface labels or options | Enter your URL in the Address box and click Next.
Courier New | User input, directories, file names, file contents, and program scripts | Run the script in the /opt/datasynapse directory.
Blue text | Hypertext link. Click to jump to the specified page or document. | See the GridServer Developer’s Guide for details.
[GS Manager Root] | The directory where GridServer is installed, such as c:\datasynapse or /opt/datasynapse. | The Driver packages are located in [GS Manager Root]/webapps/livecluster/WEB-INF/driverInstall






Chapter 2
Work
••••••

Introduction
GridServer supports a Services model for dividing and processing work. This method takes a large data-intensive
or compute-intensive problem and logically breaks it down into units of work that can run
independently and combine for a final result. GridServer receives the work unit requests and services them
in parallel. Additionally, high throughput applications or services can be distributed to a Grid. Then, many
similar requests for that service can be fulfilled as they arrive. Each request for service is independent, may
be stateful, and generally arrives unpredictably at different points in time.
Services also provide a language-independent interface to the GridServer platform. As an alternative, the
language-specific Job API can be used to leverage existing Java or C++ development resources. Both
models are described below.

Services
The Service-Oriented method of defining work in GridServer is a standards-based model. It uses a thin client
model, which promotes easy integration of an existing implementation. It also promotes language
interoperability, as clients written in different languages can invoke methods in Service Implementations
written in the same or other languages.
There are two components used with the Service-Oriented method: Clients and Service Implementations.
Both are described below.

Clients
A client or client application is the implementation that is
used to create a Service Session. The client invokes
methods that have been distributed on Engines.
You can create a Service client in different ways:
• A client-side API in Java, COM, C++, or .NET.
• A service proxy of Java or .NET client stubs generated
by GridServer.
• A Web Service client using SOAP, a lightweight protocol
used for exchanging messages with decentralized
components.

Service Implementations
Service Implementations are deployed to Engines, and process requests from clients. They process data and
return results back to the client. Service Implementations are registered on a GridServer Manager, as a
Service Type, which is virtualized on its Engines. When a client makes a client request, it sends the request
to a Manager instead of directly requesting an Engine to do the work. This one-to-many relationship provides
fault tolerance and scalability for Services.

FIGURE 2-1: The relationship between Service Clients and Service Implementations.

Service Implementations can be constructed with any of the following:
• Arbitrary Java classes
• Arbitrary .NET classes
• A Dynamic Library (.so, .DLL) with methods that conform to a simple input-output string interface.
• A command, such as a script or binary executable
Integration as a Service in most cases requires minimal changes to the client application.

Service Session
A running Service is referred to as a Service Session. This includes the Service Client, Service
Implementation, and Service state on all components. When a client has created a Service and the Service
Implementation is running on Engines, this is collectively called the Service Session.

Service benefits
There are many advantages to Services:
Cross-language: Client and Service can be in different languages
Dynamic: Method names can be determined dynamically, or use generated proxies for type safety
Flexible: Use synchronous or asynchronous invocation patterns; can use client proxies generated by
GridServer
Virtual: Client-Engine correspondence is not one-to-one; Service requests are adaptively load
balanced
Stateful: Despite being virtual, stateful Services can be handled
Standards: Standards-compliant
For more information on Services, see Chapter 3, “Creating Services” on page 23 and Chapter 4,
“Accessing Services” on page 33 of the GridServer Developer’s Guide.

Jobs
The Object-Oriented method of defining work in GridServer utilizes easy-to-use C++ and Java APIs to
create a rich, empowered client. Using this API, a programmer defines a “Job” as a collection of Tasks, with
each Task defined as an atomic sub-partition of the overall workload that is run in its entirety on an Engine.
The client code submits work and administrative
commands and retrieves computational results and
status information through a simple API.

FIGURE 2-2: Tasks within a Job.






Using the API, you design a Tasklet, which contains the Engine-side
code for each Task, and marker interfaces called TaskInput and
TaskOutput.

Job Benefits
The Job-Task model differs from the Service model in ways that may be an advantage, depending on your
development scenario. Its API is easy to adopt if you are designing new applications in Java or C++, and it
lets you leverage existing trained programming resources.

FIGURE 2-3: Workflow between a Job and an Engine.

For more information on the Job API, see Chapter 5, “The Tasklet API” on page 45 of the GridServer
Developer’s Guide.

Binary-level Integration
Another native Driver, PDriver, enables you to execute command-line programs as a parallel processing Job
without using the API.
PDriver, or the Parametric Job Driver, is a Driver that can execute existing command-line programs as a
parallel processing service using the GridServer environment, taking full advantage of the parallelism and
fault tolerance of GridServer.
PDriver achieves parallelism by running the same program on Engines several times with different
parameters. A script is used to define how these parameters change. For example, a distributed search
mechanism using the grep command could conduct a brute-force search of a network-attached file system,
with each task in the Service being given a different directory or piece of the file system to search.
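The parametric idea can be illustrated with a minimal Python sketch. This is not PDS syntax or any GridServer API; the function name, the search pattern, and the directory list are hypothetical, and the sketch only shows how one command template expands into one independent task per parameter value.

```python
import shlex


def build_task_commands(pattern, directories):
    """Expand one grep template into one command per directory (one task each)."""
    return [["grep", "-r", pattern, directory] for directory in directories]


# Hypothetical slices of a file system, each searched by a separate task.
commands = build_task_commands("ERROR", ["/data/a", "/data/b", "/data/c"])
for command in commands:
    print(shlex.join(command))
```

Because each command depends only on its own parameter, the tasks can run on different Engines in any order.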
PDriver uses its scripting language, called PDS, to define jobs. These scripts can also be used to set options
for a PDriver Service, such as remote logging and exit code checking.
For more information on the PDriver, see Chapter 6, “PDriver” on page 49 of the GridServer Developer’s
Guide.






Chapter 3
Engine Balancing and Client Routing
••••••

Introduction
This chapter covers the various mechanisms used by GridServer Directors to route Engines and Clients to
Brokers, and reallocation of Engines based on the changing state of the grid.

Client Routing
The following sections describe methods of routing Clients to Brokers, of which one or more can be used
together. However, in most scenarios Clients are associated with a specific Broker, and usually a Failover
Broker for fault tolerance.

Allowed Brokers Set


The easiest and most common method of routing clients is to use the Driver Profile’s allowedBrokers
property to perform direct routing to a set of Brokers. This is configured using the Driver Profile page on
the Driver tab in the GridServer Administration Tool. The profile must be associated with the username of
the client using the User Admin page on the Admin tab.

Client Properties Rules


Clients can also be routed to Brokers using rules based on client properties. For centralized management at
the Director, user-defined properties are created using the Driver Property List, and are set on a Driver
Profile using the Driver Profile page. Additionally, client properties can also be set using the
DriverManager API or driver.properties file on the client. The profile must be associated with the
username of the client using the User Admin page on the Admin tab. The Broker Routing page is then
used to set up routing rules based on these properties.

Driver API
The DriverManager API on all Driver platforms provides a method, connect(String broker), that will force
the client to log in to the specified Broker. If a Driver Profile is associated with the client, this profile must
permit the specified Broker.

Engine Routing and Balancing


Engines are dynamically allocated resources that can migrate among Brokers based on such criteria as load
and policy. The Engine Balancer is the component on the Director that manages login and regularly re-routes
Engines to maintain an optimal balance across the Grid. The Primary Director’s balancer always runs, while
the Secondary Director’s will only run if the Primary is down.




On a regular basis, the Director polls all Brokers for the state of all Engines on those Brokers. The routing
mechanisms are tested against all Engines to determine where all Engines should optimally reside.
Typically, changes in state due to load balancing requirements will result in changes in the optimal
distribution. If it is determined that Engines should be re-routed, the Director sends a request to each Broker
that has Engines that should be moved, to log those Engines off. When an Engine logs off, it will then log
back in to the optimal Broker.
There are three balancers available, depending on how the grid is to be used. The weight-based balancer
algorithm attempts to distribute Engines equally by relative weights, and it also allows rule-based routing
using Engine properties. The Home/Shared Balancer routes Engines based on an Engine’s assigned Home
Brokers, and the sharing policy of Home Brokers to other Brokers. Additionally, because version 4.1 used
a different routing mechanism, and version 4.2 allows for 4.1 Brokers for staged migration of large grids, a
4.1-based balancer is available. All of the balancers take into account the number of running and pending
tasks on each Broker, and the desired maximum and minimum number of Engines for each Broker.
If the Engine Balancer is changed on the Director, the Director must be restarted. Also, all balancer settings
must be identical on the Primary and Secondary Directors.

Engine Weight-Based Balancer


The Engine weight-based balancer allocates Engines based on each Broker’s Engine Weight value, which is
set on the Broker Admin page. This value determines the number of Engines that the Broker will be allocated
relative to the other Brokers’ weights, when all Brokers are idle. The algorithm also takes into account
session load, and idle Engines will be reallocated to busy Brokers as they are needed.
This balancer also allows for rule-based routing via Engine Properties, when it is necessary to restrict some
Engines to a set of Brokers. Engines can be routed via their intrinsic properties, such as cpuTotal, and by
user-defined properties, which can be created using the Engine Property List page and assigned using the
Engine Properties List page. The Broker Routing page is used to set up routing rules based on these
properties.

Home/Shared Balancer
The Home/Shared Engine balancer uses an algorithm based on the idea that every Engine has a set of Home
Brokers that it will always work on when they have outstanding tasks, yet the Engine can be shared with other
Brokers when there are no outstanding tasks on any home. An Engine is assigned its home Brokers via its
configuration, using the Engine Configuration page. Brokers are configured to share their homed Engines with
other Brokers using the Broker Admin page.
This algorithm uses Broker needs and Engine preferences for Brokers to perform allocation. Each Engine
divides the existing Brokers into tiers by preference. A tier is an unordered set of Brokers. There are two
tiers by default: the Engine’s home Brokers, and the shared Brokers of those home Brokers. A third tier can
be introduced by splitting shared Brokers into two groups. The higher the tier, the more the Engine prefers
the Brokers in that tier.
The balancer uses the following rules:
1. An Engine is routed to the highest-tiered Broker that has pending tasks. If multiple Brokers in the
same tier have pending tasks, the choice is made at random, as if all weights were 1.
2. An Engine will leave its current Broker only if there is a needy Broker in a higher tier. An Engine
will not move to a lower-tiered Broker unless it is idle.
3. Failover Brokers are never allocated Engines unless they are needy.
When using the Home/Shared Engine balancer, tiers are shown in the GridServer Administration Tool, in
the Broker Sharing field of the Broker Admin page. Brokers are separated into tiers with the semicolon,
such as “A,B;C,D,E”.
For example, an Engine configuration’s home Brokers are A and B. A’s shared list is “C,D;E”. B’s shared
list is “F;G”. An Engine with this configuration will have the following preferences: first: A, B; second: C,
D, F; third: E, G. Within each group, Brokers are equal, and ordering doesn’t matter.
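The tier example above can be reproduced with a short Python sketch. It is an illustration only, not the balancer's actual implementation: the function names are hypothetical, and the way shared lists from several home Brokers are merged level by level is inferred from the worked example rather than documented.

```python
import random


def parse_sharing(field):
    """Split a Broker Sharing value such as "C,D;E" into ordered tiers of names."""
    return [[name.strip() for name in tier.split(",") if name.strip()]
            for tier in field.split(";") if tier.strip()]


def engine_tiers(home_brokers, sharing):
    """Build preference tiers: home Brokers first, then merged shared tiers."""
    tiers = [list(home_brokers)]
    shared = [parse_sharing(sharing.get(home, "")) for home in home_brokers]
    depth = max((len(levels) for levels in shared), default=0)
    for level in range(depth):
        merged = []
        for levels in shared:
            if level < len(levels):
                for name in levels[level]:
                    if name not in merged and name not in home_brokers:
                        merged.append(name)
        tiers.append(merged)
    return tiers


def pick_broker(tiers, pending, rng=random):
    """Return a Broker from the most preferred tier with pending tasks, or None."""
    for tier in tiers:
        needy = [broker for broker in tier if pending.get(broker, 0) > 0]
        if needy:
            return rng.choice(needy)  # ties within a tier are broken at random
    return None


sharing = {"A": "C,D;E", "B": "F;G"}
tiers = engine_tiers(["A", "B"], sharing)
print(tiers)  # [['A', 'B'], ['C', 'D', 'F'], ['E', 'G']]
print(pick_broker(tiers, {"E": 3, "F": 1}))  # F
```

In the last call, Brokers E and F both have pending tasks, but F sits in a more preferred tier, so the Engine is routed there.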

Engine Balancer Configuration


Engine Balancing is configured in the GridServer Administration Tool on the Manager Configuration
page, in the Engines and Clients section. These settings must be identical on all Directors:

Setting | Description
Engine Balancer | The Engine balancer that will be used: Weight-Based, Home/Shared, or 4.1-Compatible.
Rebalance Interval | The amount of time, in seconds, between balancing episodes. (Previously called the Poll Period.)
Soft Logoff | If true, Engine logoffs do not restart the JVM. This enables them to retain state and log in faster.
Logoff Timeout | The amount of time, in seconds, that an Engine will wait to finish a task before logging off.
Engine Balance Fraction | The fraction of extra Engines that will actually be moved to another Broker on a balance. This can be set to less than 1 to dampen Engine movement. For instance, if the fraction is 0.5 and the balancer determines that a Broker has 8 extra Engines, it will only move 4 on the first balance. Assuming those Engines move, on the next balance it will determine that there are 4 extra and move 2, and so on.
Engine Balance Maximum | The maximum number of Engines that will be moved to another Broker on a rebalance. The maximum applies over the entire grid. For instance, if this parameter is set to 100 and the balancer determines that 200 Engines should be rebalanced (after taking Engine Balance Fraction into account), then only 100 Engines will actually be rebalanced. Does not apply to the 4.1-Compatible balancer.
Engine Threshold | The difference between the actual and optimal number of Engines on a Broker must be greater than this value before any Engines are logged off. This threshold minimizes unnecessary Engine reallocation. For example, if the threshold is 2, and a Broker’s optimal number of Engines is calculated to be 8, it must have more than 10 Engines before it will log off any of them. Applies to the 4.1-Compatible balancer only.
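The arithmetic behind Engine Balance Fraction, Engine Balance Maximum, and Engine Threshold can be sketched in Python as follows. The function names are hypothetical, and rounding down when the fraction yields a partial Engine is an assumption; the table does not say how the balancer rounds.

```python
def engines_to_move(extra, fraction=1.0, maximum=None):
    """Number of extra Engines actually moved in one balancing episode."""
    moved = int(extra * fraction)      # rounding down is an assumption
    if maximum is not None:
        moved = min(moved, maximum)    # grid-wide cap (Engine Balance Maximum)
    return moved


def exceeds_threshold(actual, optimal, threshold):
    """True when the surplus over the optimal count is greater than the threshold."""
    return actual - optimal > threshold


# Fraction example from the table: 8 extra Engines with a fraction of 0.5.
extra = 8
moves = []
for _ in range(3):
    moved = engines_to_move(extra, fraction=0.5)
    moves.append(moved)
    extra -= moved
print(moves)                                 # [4, 2, 1]

print(engines_to_move(200, maximum=100))     # 100
print(exceeds_threshold(10, 8, 2))           # False
print(exceeds_threshold(11, 8, 2))           # True
```

A fraction below 1 makes the grid converge on the optimal distribution over several balancing episodes instead of moving every surplus Engine at once.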




Note that if the 4.1-Compatible balancer is selected, it forces Engine instance grouping to avoid constant
Engine upgrading or downgrading.

Failover Brokers
The purpose of a Failover Broker is to temporarily take over the execution of service sessions when the
Client has no other Brokers to which it is permitted to connect. As far as Clients are concerned, Failover
Brokers become part of the pool of active Brokers when there are no other non-Failover Brokers on which
the client is permitted. As far as Engines are concerned, Failover Brokers are considered to be part of the
active pool when there are active sessions in progress on that Failover. In either case, this Broker is now
treated like a non-Failover by the algorithm. It is important to take this into account when setting up
the routing configuration. For example, if you are setting up a Driver Profile to allow a client on only one
Broker under normal conditions, you must also include a Failover Broker in its list of allowed Brokers if you
want this client to have a failover if its main Broker goes down.
See Chapter 4, “Grid Fault-Tolerance and Failover” on page 23 for more information.
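The client-side rule can be sketched in Python as follows. This is a simplified model with hypothetical names, not the Director's routing code: it assumes a Driver Profile reduces to a set of allowed Broker names, and that a Failover Broker is offered only when no permitted non-Failover Broker is running.

```python
def candidate_brokers(running, failover, allowed=None):
    """Brokers a client may be routed to, preferring non-Failover Brokers."""
    permitted = [b for b in sorted(running) if allowed is None or b in allowed]
    regular = [b for b in permitted if b not in failover]
    return regular if regular else permitted


failover = {"F"}
profile = {"A", "F"}  # main Broker plus a Failover Broker, as recommended above

print(candidate_brokers({"A", "B", "F"}, failover, profile))  # ['A']
print(candidate_brokers({"B", "F"}, failover, profile))       # ['F']
print(candidate_brokers({"B", "F"}, failover, {"A"}))         # []
```

The last call shows why the Failover Broker must appear in the allowed list: a profile that permits only Broker A leaves the client with nowhere to go when A is down.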

Engine Upper and Lower Bounds


Brokers can also be configured to have upper and lower bounds on the number of Engines that can be logged
in at a given time. These are set on the Broker Admin page. By default the columns are hidden, so you may
need to add them using the Add Column control. The minimum value specifies that the balancer algorithm
will always leave at least this number of Engines (assuming there are this many) on the Broker regardless of
the state of other Brokers. The maximum value is the cap on the total number of Engines that can be allowed
on the Broker. Both values are always considered by the balancing algorithms.
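As a rough illustration, the effect of the two bounds on a balancer's target for one Broker can be modeled as a clamp. The function name is hypothetical, and the sketch ignores the case where fewer Engines exist than the minimum asks for.

```python
def apply_bounds(desired, minimum=0, maximum=None):
    """Clamp a balancer's desired Engine count to the Broker's configured bounds."""
    bounded = max(desired, minimum)
    if maximum is not None:
        bounded = min(bounded, maximum)
    return bounded


print(apply_bounds(2, minimum=5))               # 5
print(apply_bounds(40, maximum=25))             # 25
print(apply_bounds(10, minimum=5, maximum=25))  # 10
```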

Example Use Cases


Example use cases are presented in this section.

N+1 Failover with Weighting


An organization has four groups using all available Engines in a Grid. One group is guaranteed to be
allocated at least half of the Grid any time it needs it, and the other three groups share the remaining Engines.
Brokers: Set up five Brokers. Each group gets a Broker, plus one is used for failover.
Drivers: Create four Driver Profiles, one for each group. In each profile, set the allowedBrokers value to the
group’s Broker and the failover Broker. Assign the Profiles to the appropriate users.
Engines: Use the weight-based Engine Balancer. Adjust Engine Weight on the Broker Admin page so the
first group’s Broker is weighted at 3.0, and the other three groups’ Brokers are weighted at 1.0. You would
most likely set the failover Broker weight at 1.0, so that a group would not be assigned any more resources
than normal if its Broker went down.
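The weights in this use case can be checked with a few lines of Python. The sketch assumes that idle allocation is simply proportional to weight and leaves out the failover Broker, which receives Engines only while it has active sessions; the Broker names are hypothetical.

```python
def idle_shares(weights):
    """Fraction of the Grid each Broker receives when shares follow relative weight."""
    total = sum(weights.values())
    return {broker: weight / total for broker, weight in weights.items()}


weights = {"group1": 3.0, "group2": 1.0, "group3": 1.0, "group4": 1.0}
shares = idle_shares(weights)
print(shares["group1"])  # 0.5
```

A weight of 3.0 against three weights of 1.0 gives the first group 3/6 of the Engines, which matches the requirement that it can claim half of the Grid.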






Engine Localization with Sharing
A company has two groups, one in New York and one in London. Each has a single middleware application
that has a Driver that connects to its own Broker. Each group also has a set of CPUs that it expects to always
be working on its own calculations. However, there will be times when one group’s Broker is idle, so the
groups are allowed to share Engines with each other.
Brokers: Set up four Brokers, a regular and a failover for each group. Each regular Broker shares with the
other regular Broker, plus its own failover Broker.
Drivers: Create two Driver Profiles, one for each group. In each profile, set the allowedBrokers value to the
group’s Broker and its failover Broker. Assign the Profile to the middleware application user.
Engines: Use the Home/Shared Engine Balancer. Set up two Engine Configurations, “London” and “New
York,” which home the Engines to their respective Brokers.
In this scenario, the application always connects to its local Broker, unless it is down, in which case it moves
to its failover. Whenever that Broker has pending requests, all of its Engines will always be local. If the other
group’s Broker is idle, or if it does not need all of its Engines, any of its idle Engines will be routed to the
Broker that needs them.
You may also want to increase the Engine Threshold, and decrease the Engine Balance Fraction, to minimize
wandering of Engines during normal work periods when there may be occasional brief times when the
Broker has idle Engines.






Chapter 4
Grid Fault-Tolerance and Failover
••••••

Introduction
GridServer is a fault-tolerant and resilient distributed computing platform. The GridServer platform will
recover from a component failure, guaranteeing the execution of Services over a distributed computing Grid
with diverse, intermittent compute resources. This section describes how GridServer behaves in the event
of Engine, Driver, and Manager failure. Failures of components within the Grid can happen for a number of
reasons, such as power outage, network failure, or interruptions by end users. For the purposes of this
discussion, failure means any event that causes Grid components to be unable to communicate with each
other.

The Fault-tolerant GridServer Deployment


A GridServer deployment consists of a
primary Director, an optional secondary
Director, and one or more Brokers. Drivers
and Engines log into the Director, which
routes them to one of the Brokers. Directors
balance the load among their Brokers by
routing Drivers and Engines to currently
running Brokers.
A minimal fault-tolerant GridServer deployment contains two Directors, a primary and a secondary, and at
least two Brokers. The Brokers, Engines, and Drivers in the Grid have the network locations of both the
primary and the secondary Directors. During normal operation, the Engines and Drivers log in to their
primary Director; the secondary Director is completely idle.

FIGURE 4-1: A typical redundant GridServer configuration.

Other GridServer topologies, such as having multiple Managers to handle volume or to segregate different
types of Services to different Managers, are discussed in Chapter 2, “Installation Overview” on page 7 of
the GridServer Installation Guide.

Heartbeats and Failure Detection


Heartbeats are lightweight network communications sent at regular intervals between
GridServer components, such as from Drivers to Brokers, from Engine Instances to Brokers, and from
Engine Daemons to Directors. A Manager detects Driver and Engine failure when it does not receive a
heartbeat within the configurable heartbeat interval time. Drivers detect Broker failure by failing to connect
when they submit Jobs or poll for results. Engines detect Broker failure when they attempt to report for work
or return results. To minimize unnecessary messaging, a heartbeat is only sent if no other message has been
sent within the heartbeat interval.
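The two sides of this mechanism can be sketched in Python. The class and function names are hypothetical and timestamps are plain numbers in seconds; the sketch only illustrates that ordinary messages count as proof of life and that silence longer than the interval is treated as failure.

```python
class HeartbeatSender:
    """Tracks outgoing traffic and decides when a heartbeat is needed."""

    def __init__(self, interval):
        self.interval = interval
        self.last_sent = None  # time of the most recent outgoing message

    def record_message(self, now):
        """Call whenever any message, heartbeat or not, is sent."""
        self.last_sent = now

    def heartbeat_due(self, now):
        """A heartbeat is needed only if nothing was sent within the interval."""
        return self.last_sent is None or now - self.last_sent >= self.interval


def peer_failed(last_received, now, interval):
    """Receiver side: the peer is considered failed after a silent interval."""
    return now - last_received > interval


sender = HeartbeatSender(interval=30)
sender.record_message(now=100)        # a regular message resets the timer
print(sender.heartbeat_due(now=110))  # False
print(sender.heartbeat_due(now=130))  # True
print(peer_failed(last_received=100, now=131, interval=30))  # True
```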




Manager Stability Features


Several precautions are taken to prevent Manager failure due to excessive traffic. For example, the number
of threads used for file update is limited. This prevents a large number of file updates from Brokers to
Engines from preventing other HTTP activity due to use of all of the HTTP threads on the application server;
instead, Engines will retry the download later when this maximum is reached. By default, this is set at 50
threads, but can be changed in the GridServer Administration Tool on the Manager Configuration page,
in the Communication section, with the Maximum Resource Download Connections property.
The number of Broker/Director messaging threads is also limited. If this limit is reached, clients will retry
rather than immediately fail.
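The "limit and retry later" pattern described above can be sketched with a semaphore in Python. This is a generic illustration, not GridServer code: the class name and the RETRY_LATER sentinel are hypothetical, and only the default of 50 comes from the text.

```python
import threading

RETRY_LATER = object()  # sentinel returned when no connection slot is free


class DownloadLimiter:
    """Caps concurrent downloads; callers over the cap are told to retry."""

    def __init__(self, max_connections=50):
        self._slots = threading.BoundedSemaphore(max_connections)

    def try_download(self, fetch):
        # Never block a server thread waiting for a slot.
        if not self._slots.acquire(blocking=False):
            return RETRY_LATER
        try:
            return fetch()
        finally:
            self._slots.release()


limiter = DownloadLimiter(max_connections=1)
# The nested request arrives while the only slot is still held.
nested = limiter.try_download(lambda: limiter.try_download(lambda: "inner"))
print(nested is RETRY_LATER)               # True
print(limiter.try_download(lambda: "ok"))  # ok
```

Rejecting the request immediately, rather than queueing it, is what keeps the remaining threads free for other HTTP activity.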

Engine Failure
Network connection loss, hardware failure, or errant application code can cause Engine failure. When an
Engine goes offline, the work assigned to it is requeued and will be assigned to another Engine; any work
already done on the failed Engine is lost. The loss is larger for Engines that have built up considerable
state or cache, or that are running particularly long Tasks.
This can be avoided by shortening Task duration in your application or by using the Engine
Checkpointing mechanism. For more information on Task duration, see Chapter 10, “GridServer
Performance and Tuning” on page 77.
Each Engine has a checkpoint directory where a Task can save intermediate results. If an Engine fails and
the Manager retains access to the Engine machine’s file system, a new Engine will copy the checkpoint
directory from the failed Engine. It is the responsibility of the client application to handle correct resumption
of work given the contents of the checkpoint directory.
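How a Task might resume from saved intermediate results can be sketched in Python. This does not use the GridServer checkpointing API: the file name progress.json, the function name, and the simulated failure are all hypothetical, and a temporary directory stands in for the Engine's checkpoint directory.

```python
import json
import os
import tempfile


def run_task(items, checkpoint_dir, fail_after=None):
    """Sum items, saving progress after each one so a rerun can resume."""
    path = os.path.join(checkpoint_dir, "progress.json")  # hypothetical file name
    state = {"next": 0, "total": 0}
    if os.path.exists(path):
        with open(path) as handle:
            state = json.load(handle)  # resume from the last checkpoint
    for index in range(state["next"], len(items)):
        if fail_after is not None and index >= fail_after:
            raise RuntimeError("simulated Engine failure")
        state["total"] += items[index]
        state["next"] = index + 1
        temp_path = path + ".tmp"
        with open(temp_path, "w") as handle:
            json.dump(state, handle)
        os.replace(temp_path, path)  # swap in the new checkpoint atomically
    return state["total"]


with tempfile.TemporaryDirectory() as checkpoint_dir:
    try:
        run_task([1, 2, 3, 4], checkpoint_dir, fail_after=2)
    except RuntimeError:
        pass  # the first attempt dies after processing two items
    resumed_total = run_task([1, 2, 3, 4], checkpoint_dir)
print(resumed_total)  # 10
```

The second run skips the two items already recorded in the checkpoint and still arrives at the full result.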
Note that if an Engine Daemon logs off the Director or otherwise fails, it does not log off its Engines.
Provided the failure has not caused the Engines to also fail, they will continue working and return results
when completed.

Driver Failure
When a client application fails, the Broker detects the failure when the Client does not return a heartbeat
and does not log back in within the interval specified by the Client Timeout setting. When this
happens, any currently running Services are cancelled, and application failure recovery or
restart is the responsibility of your application. The exceptions to cancellation are fully submitted Services
of type Collection.LATER, or any of type Collection.NEVER. Also, if a Client is collecting results from a
Collection.LATER type Service, none of the outputs will be removed until all have been collected and the
Client destroys the Service, so that if a Client fails during collection it can restart and recollect the outputs.
All Driver fileservers return a “Server Unavailable” code with instructions to retry if they are processing too
many concurrent requests. This significantly reduces the chance of a Service invocation failing due to a
temporarily overloaded Driver.






Director Failure
If the primary Director fails, the secondary Director takes over balancing and routing Drivers and Engines
to Brokers. Since the Directors do not maintain any state, no work is lost if a Director fails and is restarted.
Also, because both Directors follow the same rules for routing to Brokers, it makes no difference which
Director is used for login.
The Primary Director is also responsible for the Administrative Database, which contains data needed by
the Grid for operation, such as the User list, routing properties, and so on. These values, then, can only be
modified on the Primary Director. This database is synchronized to the Secondary Director while both are
running, and backed up by the Secondary Director on every database backup, so that the Grid can remain in
operation when the Primary Director is down.

Broker Failure
Like the Director, the Broker is designed as a robust application that will run indefinitely, and will typically
only fail in the event of a hardware failure, power outage, or network failure. However, the fault-tolerance
built into the Drivers guarantees that all Services will complete even in the event of failure.
Because the most likely reason that a Driver will be disconnected from its Broker is a temporary network
outage, the Driver does not immediately attempt to log in to another Broker. Instead, it waits a configurable
amount of time to reconnect to the Broker to which it was connected. After this amount of time, it will then
attempt to log in to any available Broker. This amount of time is specified in the driver.properties file or
via the API.
Once the Driver has timed out and reconnected to another Broker, all Service instances will then resubmit
any outstanding tasks and continue. Tasks that are already complete will not be resubmitted. The Service
instances will also resubmit all state updates in the order in which they were originally made. From the
Service instance point of view, there will be no indication of error, such as exceptions or failure, just the
absence of any activity during the time in which the Driver is disconnected. That is, all Services will run
successfully to completion as long as eventually a suitable Broker is brought online.
If an Engine is disconnected from its Broker, the process simply shuts down, restarts, and logs in to any
suitable Broker. Any work is discarded.
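
The Driver behavior described above can be sketched roughly as follows. This is only an illustration of the wait-then-fail-over rule, not GridServer code: the function names, the return strings, and the use of seconds as the unit are all assumptions made for the example.

```python
def driver_action(seconds_disconnected: float, broker_timeout_seconds: float) -> str:
    """Return what a disconnected Driver does at a given moment (illustrative only)."""
    if seconds_disconnected < broker_timeout_seconds:
        # Still inside the configured wait window: keep trying the original Broker.
        return "retry-original-broker"
    # Window expired: log in to any available Broker and resubmit outstanding work.
    return "login-any-broker-and-resubmit"


def tasks_to_resubmit(all_tasks: dict) -> list:
    """Given task id -> completed flag, return only the outstanding task ids.

    Completed tasks are not resubmitted.
    """
    return [task_id for task_id, done in all_tasks.items() if not done]
```

For instance, under these assumptions a Driver that has been disconnected for 10 seconds with a 300 second timeout keeps retrying its original Broker, and only the tasks that are not yet complete are resubmitted after it fails over.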

Failover Brokers
In the fault-tolerant configuration, some Brokers can be set up as Failover Brokers. When a Driver logs in to
a Director, the Director will first attempt to route it to a non-Failover Broker. If no non-Failover Brokers
are available, the Director will consider all Brokers, which would typically then route the Driver to a
Failover Broker.

FIGURE 4-1: A GridServer configuration with Failover capability.




A Failover Broker is not considered for Engine routing if there are no active Services on that Broker.
Otherwise, it is considered like any other Broker, and follows Engine routing like any other Broker. By
virtue of these rules, if a Failover Broker becomes idle, Engines will be routed back to other Brokers.
The primary Director monitors the state of all Brokers on the Grid. If a Driver logged into a Failover Broker
is able to log in to a non-Failover Broker, it will be logged off so it can return to the non-Failover Broker.
All running Services will be continued on the new Broker by auto-resubmission.
By default, all Brokers are non-Failover Brokers. Designate one or more Brokers within the Grid as Failover
Brokers when you want those Brokers to remain idle during normal operation.
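
As a rough sketch of the routing rules in this section, the following shows one way the candidate Brokers for a Driver and for an Engine could be selected. The Broker class and its field names are hypothetical and exist only for this example; the actual Director applies further routing rules that are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Broker:
    name: str
    failover: bool = False      # by default, all Brokers are non-Failover Brokers
    available: bool = True
    active_services: int = 0


def driver_candidates(brokers):
    """Prefer non-Failover Brokers; consider all Brokers only if none are available."""
    up = [b for b in brokers if b.available]
    regular = [b for b in up if not b.failover]
    return regular if regular else up


def engine_candidates(brokers):
    """A Failover Broker is considered for Engines only while it has active Services."""
    return [b for b in brokers
            if b.available and (not b.failover or b.active_services > 0)]
```

With one regular Broker down and one idle Failover Broker up, driver_candidates returns the Failover Broker, while engine_candidates returns nothing until a Service becomes active on it.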

Fault-Tolerant Tasks
Fault-Tolerant Tasks enable an Engine to continue executing a task even if it logs off of a Broker, so that it
does not lose work due to a Broker failure. It is intended for use on long-running tasks.
This means that if an Engine is working on a task, and it logs off of the Broker, it will not immediately exit.
Rather, it will continue to work on that task, while continuing to attempt to log in to a Broker that has the
Service on which it is working. If it does not log back in within a defined time period, it will exit. If it does
log back in, it will first notify the Broker that it is working on the task. If it has already completed, it will
immediately send the result; otherwise, it will do so upon completion.
This feature is not recommended unless you have individual tasks that take many hours to finish (or the
longest task takes nearly as long as the whole job). For example, if a report runs during the night and some
tasks take 8 hours to process, you may want this feature in place to ensure that an 8-hour task does not have
to start from the beginning if the Broker fails at 7 AM. On the other hand, enabling fault-tolerant tasks can
diminish the efficiency of the Grid, since it will redundantly schedule all outstanding tasks. With short
tasks, it’s usually more efficient to simply recalculate tasks in the event of a Broker failure.
As an example of Fault-Tolerant Tasks, consider the following:
1. An Engine and Driver are connected to Broker A.
2. Broker A goes down.
3. The Driver continues for 5 minutes to find the Broker with its Service. The Engine continues
working, while it attempts to find the Broker with its Service.
4. After 5 minutes, the Driver connects to Broker B, and resubmits outstanding work.
5. Now that the Service is on Broker B, the Engine logs in to Broker B, and indicates that it has taken
that task. When it has finished, it writes its task. If it has already finished, it immediately writes the
task.
If another Engine has already taken that task by the time this Engine logs in, no attempt will be made to
cancel the task on the Broker. It will essentially be the same as a redundantly rescheduled task.
When an Engine logs into a failover Broker and works on a task, the task is cancelled once the Driver
switches to the regular Broker.
To enable Fault-Tolerant Tasks, in the GridServer Administration Tool, click the Manager tab, then click
Manager Configuration, then Engines and Clients and change the value of Engine Timeout Minutes and
click Save. The timeout should be longer than the Driver’s timeout, which is the value of DSBrokerTimeout
set in the driver.properties file.






To use Fault-Tolerant Tasks, another Broker must be available for failover, and the Client running the session
will need to fail over to the Broker and resubmit its session.
No attempt will be made upon login of the Engine running a fault-tolerant task to cancel that same task if it
has already been taken by another Engine.
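
The Engine side of Fault-Tolerant Tasks can be pictured as a small decision function. This is a sketch under stated assumptions, not the Engine implementation: the function name, its parameters, and the returned labels are invented for illustration, and the timeout is assumed to be expressed in minutes, as the Engine Timeout Minutes setting suggests.

```python
def engine_next_step(logged_back_in: bool, task_finished: bool,
                     minutes_logged_off: float, engine_timeout_minutes: float) -> str:
    """Decide what an Engine holding a fault-tolerant task does next (illustrative)."""
    if logged_back_in:
        # On login the Engine first tells the Broker which task it is working on;
        # a finished task is sent immediately, otherwise upon completion.
        return "send-result" if task_finished else "notify-broker-and-finish-task"
    if minutes_logged_off >= engine_timeout_minutes:
        # No Broker with the Service was found within the defined period.
        return "exit"
    # Keep computing while continuing to look for a Broker that has the Service.
    return "keep-working-and-retry-login"
```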

Batch Fault-Tolerance
Batch Schedules that exist on a Manager are persistent, provided the Next Run field is not never. This
provides failover capability in the event of a Manager failure, as the Batch Schedules will still exist when
the Manager is restarted.
The following Batch Schedules are persistent:
• Absolute schedules
• Relative schedules with repeat
• Cron schedules
All persistent Batches are restarted when the Manager is restarted, just as if they were being scheduled for
the first time. Batch runs that were to occur during the time when the Manager was down are ignored.

GridCache Fault-Tolerance
GridCache supports fault-tolerance, as described below. Note that primary and failover Brokers must have
their clocks synchronized for GridCache failover.

Client
If any client puts data in the cache and subsequently dies or logs out, that data is still available to all other
clients. This is due to the fact that the Broker maintains the master index and complete view of the cached
data. This does not apply to the local caching mode where a region has a local loader that does not
synchronize with the other local caches.

Broker Restart
GridCache can be configured to survive Manager restart and failure. GridCache’s cache index is rebuilt on
system startup; objects persisted on the Broker’s file system will be recovered. If some or all of the cache is
stored in memory, that information will be lost.

Failover
A failover Broker can manage a GridServer cache when a regular Broker goes down, provided that the
persistent cache directory is on a shared filesystem. The location of this filesystem is configurable from the
Manager Configuration page in the GridServer Administration Tool. When the regular Broker goes down
and the failover Broker takes over, the failover Broker will build its cache index and begin managing the
cache from the shared filesystem. All clients that then fail over to the failover Broker will be able to get
references to the existing cache regions on the shared filesystem.




Note that a failover Broker can only be configured to fail over to one shared cache directory. Therefore, a
failover Broker can’t serve as a failover for multiple Brokers with different cache directories; a different
failover Broker would have to be used for each Broker.






Chapter 5
Scheduling
••••••

One of the responsibilities of Brokers is scheduling, which is the management of Services and Tasks on
Engines and interactions between Engines and Drivers. This chapter gives more details on how scheduling
works, and the method used to determine what Tasks in a Service are sent to what Engines.

Introduction
Most of the time, the scheduling of Services and Tasks on Engines is completely transparent and requires no
administration. However, in order to tune performance, or to diagnose and resolve problems, it is helpful to
have a basic understanding of how the Broker manages scheduling.
Recall that clients create Service Sessions on the Broker. Each Service Session consists of one or more
Tasks, which may be performed in any order. The scheduler determines the optimal match of Engines to
Services. Whenever an Engine reports to the Broker to request work, the Broker assigns a Task from that
Service to the Engine. When an Engine completes a Task, it is queued on the Broker for collection by the
client. If an Engine is interrupted during processing, the Task is requeued by the Broker.

Reschedules and Retries


Before the discussion of scheduling behavior, we must first define the terms Retry and Reschedule within
the context of scheduling Tasks.

Retry
A Retry is when a Task is re-queued due to a known failure of the Task. Such failures could be due to an
error condition in the implementation, an error due to inability to download data, or a failure of an Engine
(the monitor has detected that the Engine is no longer connected but it has not logged off.) It is always the
result of the Engine returning the Task as failed to the Broker. When a Task is retried, it is always placed at
the front of that session’s queue. The scheduler manages a retry count for each Task, so that a limit can be
placed on the number of allowed retries.

Reschedule
A Reschedule is when a Task is re-queued when it may or may not have failed. When a Task is rescheduled,
it is by default placed at the back of that session’s queue, unless the Reschedule First configuration option
on the Broker (set in the Manager tab, on the Manager Configuration page, in the Services section) is set
to true. The scheduler also manages a reschedule count for each Task. The following conditions result in a
reschedule:
• Engine Logoff: When an Engine logs off gracefully while running a Task (such as when UI or CPU idle
conditions are met, or there is a forced rebalance), the Task is rescheduled, but the reschedule count is not
incremented, since there was no Task error.
• Redundant Rescheduler: If any of the Redundant Rescheduler strategies are in effect, Tasks may be
rescheduled to other Engines. By default, those Tasks are allowed to continue to run on the current
Engines, in case they finish before the rescheduled Tasks. In this case, the reschedule count is increased.
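
The difference between a Retry and a Reschedule comes down to where the Task re-enters the session queue and which counter changes. The following is a minimal sketch of that bookkeeping; the Task class, the function names, and the use of a double-ended queue are assumptions for illustration and do not reflect the Broker's internal data structures.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    retries: int = 0
    reschedules: int = 0


def retry(queue: deque, task: Task) -> None:
    """A retried Task always goes to the front of the session's queue."""
    task.retries += 1
    queue.appendleft(task)


def reschedule(queue: deque, task: Task, reschedule_first: bool = False,
               graceful_logoff: bool = False) -> None:
    """A rescheduled Task goes to the back unless Reschedule First is true."""
    if not graceful_logoff:
        # A graceful Engine logoff is not a Task error, so it is not counted.
        task.reschedules += 1
    if reschedule_first:
        queue.appendleft(task)
    else:
        queue.append(task)
```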

Timeout Behavior
When the INVOCATION_MAX_TIME option is set, it specifies that any invocation of a request may not exceed
this value. If a Task times out on an Engine, it may be either retried or rescheduled, depending on what
makes more sense for your application. If retried, the current Engine’s invoke process is terminated, and the
Task is assigned to another Engine. If rescheduled, the current Engine Task is allowed to continue execution.
In either case, the appropriate count is incremented.
The default behavior is set on the Broker, and is set to retry by default. It can also be set for the Service Type
via the Service Type Registry page, or programmatically when the Service Session is created.

The Scheduler
The Scheduler is the component that is used on a GridServer Broker to assign tasks to Engines. It attempts
to make optimal matches based on criteria such as the session priority level, affinity, and Serial Service and
Priority execution modes.

Scheduler Overview
The scheduler aims to schedule tasks to Engines by attempting to have the proper amount of Engines
allocated to all active Service Sessions at any given time. On any given scheduling event, the algorithm
decides the number of Engines each Session should have at the time based on static and dynamic criteria,
and then assigns the appropriate number of Engines to sessions based on how many the Session needs to
reach the ideal level.
Additionally, the scheduler takes into account the amount of usage that the Session has received over a given
historical window of time. The “usage” refers to the amount of Engine clock time that the Session has
occupied during that window. When a Session is created, its usage is initialized as if the Session had been
running ideally over this window.
This usage provides the ordering in which Engines are allocated to Sessions. This addresses starvation
issues, round off error (the number of ideal Engines will rarely be an integer), and under/over-utilization due
to discrimination, changes in the number of available Engines, and so on.
Essentially, on a scheduling event, sessions are assigned the ideal number of Engines less the amount that
are currently allocated, in the order of least to most usage. The following sections will discuss first the
general algorithm, and then address specific subclasses of that algorithm for serial service and priority
execution modes.
This approach can be seen as analogous to a CPU thread scheduling algorithm. Each session is a “thread”,
the engines are the “CPU”, the window is the sample period, and each task is an uninterruptible unit of CPU
time allotted to a thread.






Service Priority
Every GridServer Service has an associated priority. Priorities can take any integer value between zero and
ten, so that there are eleven priority levels in all. 0 is the lowest priority (a suspended Service), 10 is the
highest (an urgent priority Service, see below), and 5 is the default. The GridServer API provides methods
that allow the application code to attach priorities to Services at runtime (see the GridServer API
documentation for more details) and you can use the GridServer Administration Tool to change priorities
while a Service is running.
Priority Weight refers to the weight associated with a Priority Level. The weight defines the amount of
Engines allocated to a session relative to all other active sessions. For example, if Sessions A and B each
have a weight of 2.0, Session C has a weight of 4.0, and there are eight Engines, Sessions A and B are
allocated two Engines each, and Session C gets four. The weights are set with the Priority Weights property in the
GridServer Administration Tool, on the Manager Configuration page in the Services section.

Usage Algorithm
The usage algorithm is the default mode, and is used when Serial Service Execution mode is not enabled.
Whenever an Engine or set of Engines is available for scheduling, the scheduler decides how many Engines
each session should be allocated. In general, that value is:
Ideal Engines per Session = All Engines * Session Priority Weight / Total Weight,
where “Total Weight” is the sum of all Priority Weights of active sessions. This value is rounded up to the
next integer, which prevents starvation when the ideal calculation is < 0.5, and ensures that the sum of Ideal
Engines is always at least as large as Total Engines. The algorithm also takes into account cases where the
actual number of Engines that can be allocated is less than the ideal, such as when a Session is near its end,
or when Max Engines is used.
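
The ideal allocation formula can be checked with a few lines of arithmetic. The sketch below is not the scheduler's code; the function name and the dictionary of session weights are assumptions made for the example. It applies the formula and the round-up rule to three sessions with weights 2.0, 2.0, and 4.0 sharing eight Engines.

```python
import math


def ideal_engines(total_engines: int, weights: dict) -> dict:
    """Ideal Engines per Session = All Engines * Priority Weight / Total Weight, rounded up."""
    total_weight = sum(weights.values())
    return {session: math.ceil(total_engines * weight / total_weight)
            for session, weight in weights.items()}


print(ideal_engines(8, {"A": 2.0, "B": 2.0, "C": 4.0}))
# prints {'A': 2, 'B': 2, 'C': 4}
```

Because of the round-up, the ideals can add up to more than the number of Engines. For example, three equally weighted sessions on eight Engines each get an ideal of three, for a sum of nine.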
Recall that a Session’s usage is considered to be the total Engine clock time spent on the session over the
last configurable amount of time. This includes running and completed tasks. When a Session is created, it
must initialize its usage. The simplest, most fair method of doing this is to assume it has been operating in
a steady state over the window with the ideal non-rounded number of Engines. The variables that monitor
usage are then initialized as such. If no sessions are active, it initializes them such that the session's ideal is
the total number of Engines currently on the Broker.
Whenever there is any event that requires a scheduling episode, the scheduler assigns the proper number of
engines to each session for it to be at its ideal amount. This assignment is performed in order of least to most
priority-normalized usage. If there are any unassigned Engines remaining after this initial round based on
usage (typically due to disallowed conditions preventing assignment), a second tier round robin assignment
is performed.
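
One way to picture the first assignment round is shown below. This is a sketch only: it assumes that "priority-normalized usage" means usage divided by the Priority Weight, which the text does not state explicitly, and the dictionary keys are invented for the example. The second-tier round robin is not modeled.

```python
def first_round(free_engines: int, sessions: list) -> dict:
    """Grant each session the Engines it needs to reach its ideal, least-used first.

    Each session is a dict with the hypothetical keys:
    name, usage, weight, ideal, allocated.
    """
    grants = {}
    # Assumption: normalized usage = usage / weight.
    for s in sorted(sessions, key=lambda s: s["usage"] / s["weight"]):
        need = max(0, s["ideal"] - s["allocated"])
        give = min(need, free_engines)
        grants[s["name"]] = give
        free_engines -= give
    return grants
```

With three free Engines, a session whose normalized usage is lower is served first and receives everything it needs; the next session receives whatever remains.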

Time Algorithm
The time algorithm is used when Serial Service Execution mode is enabled. This algorithm works as
follows:




Session Addition
When a session is added to the Waiting List, it is placed such that it is ordered by Session creation time.
Typically this is at the back of the list, although if the session had been removed and then re-added, it may
not be.

Scheduling Episode
On each episode, only the first session with waiting tasks is considered for assignment. The scheduler simply
attempts to assign all Idle Engines to the session. Affinity is not considered. Note that as soon as the Session
has no more waiting tasks, subsequent Sessions may be assigned Engines on the next episode even while
the previous session is still running.
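
The selection step of the time algorithm can be sketched in a few lines. The tuple layout and function name below are assumptions for illustration; the point is only that the Waiting List is ordered by creation time and that the first session with waiting tasks receives all Idle Engines.

```python
def session_for_episode(waiting_list: list):
    """Return the name of the earliest-created session that still has waiting tasks.

    waiting_list holds (creation_time, name, waiting_task_count) tuples.
    """
    for _created, name, waiting in sorted(waiting_list):
        if waiting > 0:
            return name
    return None
```

Note that a session with no waiting tasks is skipped even if its tasks are still running, which is why a later session can start receiving Engines before the earlier one has finished.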

Serial Priority Algorithm


The Serial Priority Algorithm is used when Serial Priority Execution mode is enabled. Either the Time
Algorithm or the Usage Algorithm, depending on whether Serial Service Execution mode is enabled, is used
on the subset of sessions at the current highest Priority Level that have waiting tasks in any sessions.
For example, with Serial Service Execution mode off, all sessions at level 9 (assuming highest) will be
allocated equal amounts of Engines until no more sessions at level 9 have waiting tasks, after which level 8
sessions are allocated.
On the other hand, with Serial Service Execution mode on, all sessions at level 9 will execute in their order
of creation. Note that in this state, if they finish, and level 8 sessions start, and then a new level 9 session is
created, that new level 9 session will take over at that point. This is because priority takes precedence over
creation time.
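
The narrowing step of the Serial Priority Algorithm might be sketched as follows. The tuple layout and function name are hypothetical; the resulting subset would then be handed to either the Usage Algorithm or the Time Algorithm, which are not reproduced here.

```python
def sessions_at_top_priority(sessions: list) -> list:
    """Return names of sessions at the highest Priority Level that has waiting tasks.

    sessions holds (name, priority, waiting_task_count) tuples.
    """
    with_work = [s for s in sessions if s[2] > 0]
    if not with_work:
        return []
    top = max(priority for _name, priority, _waiting in with_work)
    return [name for name, priority, _waiting in with_work if priority == top]
```

A level 9 session with no waiting tasks does not block level 8 sessions, and a newly created level 9 session with waiting tasks immediately becomes the only member of the subset again.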

Urgent Priority Services and Preemption


Services with priority of 10 are considered urgent by the scheduler. (The API defines PRIORITY_URGENT to be
equal to 10.) An urgent Service’s weight is hard-coded to be essentially infinite, so that they are assigned all
available Engines. They may also preempt Engines that are currently working. When an Engine is
preempted, the Task it is currently running is cancelled and rescheduled, and the Engine becomes available
for new Tasks.
Engines are preempted on a Service under the following conditions: if after being assigned all free Engines
a Service can still make use of more Engines, then it may preempt some busy Engines, subject to two
constraints that can be adjusted with configuration properties. First, the urgent Service must have been in
the queue for Preempt Delay Seconds. Second, the percentage of Engines in the Grid running urgent
Services cannot exceed Preemptable Engine Percent. For example, if this property is set to 50, and 47
percent of the Engines are currently running urgent Services, then at most three percent will be preempted.
This value is not a hard limit on the number of Engines that may be running urgent Services, because free
Engines are allocated to urgent Services regardless of how many Engines are already running urgent
Services.
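
The two preemption constraints can be expressed as a small calculation. This sketch is an interpretation of the rules above, not the scheduler's code: the function and parameter names are assumptions, and whole Engine counts are used instead of percentages.

```python
def max_preemptable(total_engines: int, urgent_engines: int, preemptable_percent: int,
                    seconds_queued: float, preempt_delay_seconds: float) -> int:
    """Upper bound on the number of busy Engines an urgent Service may preempt."""
    if seconds_queued < preempt_delay_seconds:
        # The urgent Service has not yet waited for Preempt Delay Seconds.
        return 0
    cap = total_engines * preemptable_percent // 100
    return max(0, cap - urgent_engines)


print(max_preemptable(100, 47, 50, seconds_queued=60, preempt_delay_seconds=30))
# prints 3
```

On a Grid of 100 Engines this reproduces the example in the text: with the property set to 50 and 47 Engines already running urgent Services, at most three more can be preempted. Setting the percentage to 0 always yields 0.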
The scheduler chooses Engines for preemption based on the following rules: Engines running an urgent
Service will never be preempted. An Engine running a Task from a Service with lower priority will generally
be selected in preference to one running a higher-priority Task. However, if the lower-priority Task has been
running for a long time, a short-running, higher-priority Task may be preempted instead. The Preempt
Threshold Minutes property determines the value at which this crossover happens. For example, if this
property is set to 30, then an Engine that has just started running a priority 2 Task will be chosen for
preemption over an Engine that has been running a priority 1 Task for more than 30 minutes.
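
One possible reading of these selection rules is sketched below. The exact ranking used by the scheduler is not documented here, so the sort key is an assumption: Tasks that have run longer than the threshold are treated as less attractive victims, and among the rest the lowest priority is chosen. The tuple layout and function name are also hypothetical.

```python
URGENT = 10  # priority 10 is the urgent level


def choose_victim(engines: list, threshold_minutes: float):
    """Pick an Engine to preempt from (engine_name, task_priority, minutes_running) tuples."""
    candidates = [e for e in engines if e[1] < URGENT]  # urgent Tasks are never preempted
    if not candidates:
        return None
    # Prefer Tasks under the threshold, then the lowest priority among them.
    return min(candidates, key=lambda e: (e[2] > threshold_minutes, e[1]))[0]
```

With a threshold of 30, an Engine that has just started a priority 2 Task is chosen over one that has run a priority 1 Task for 45 minutes, which matches the example in the text.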
Other important points concerning priority Services and preemption:
• Tasks canceled by preemption are not subject to a rescheduling limit, since they are not considered
failures.
• To prevent preemption from ever occurring, set Preemptable Engine Percent to 0.
• It is possible that the first Service on the queue will not get all free Engines if it doesn’t have enough
Tasks, it is already using its maximum number of Engines, or it discriminates against some Engines. Free
Engines that are not taken by the first urgent Service are first offered to the other urgent Services on the
queue, and then to all other Services.

Engine Blacklisting
If a Service sets the option “engineBlacklisting” (ENGINE_BLACKLISTING) to true, then Engines that fail on a
Task from that Service will not be given any other Tasks from that Service. The default is false. “Fail” means
any action that results in a failed Task being sent back to the Manager, regardless of whether that failure was
due to Engine hardware, Engine environment, or Tasklet code. It does not include events such as the Engine
going offline due to user activity, since that does not result in a Task failure.
Blacklisted Engines are excluded for a particular Service Session only; they can freely accept tasks from any
other Service, regardless of Service Type, assuming the other Services haven’t also blacklisted the Engine
or have some discriminators in place that prevent it.
To remove an Engine from all blacklists, go to the Engine Daemon Admin page in the GridServer
Administration Tool and select Clear from Blacklists from the Actions list.
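
The per-session nature of blacklisting can be illustrated with a small class. This is a conceptual sketch; the class and method names are invented for the example and are not part of the GridServer API.

```python
class SessionBlacklist:
    """Tracks Engines that failed a Task for one Service Session (illustrative)."""

    def __init__(self, enabled: bool = False):
        self.enabled = enabled          # mirrors the engineBlacklisting option; default false
        self._blocked = set()

    def record_failure(self, engine: str) -> None:
        # Only failed Tasks count; an Engine going offline is not recorded here.
        if self.enabled:
            self._blocked.add(engine)

    def may_take(self, engine: str) -> bool:
        return engine not in self._blocked

    def clear(self, engine: str) -> None:
        # Corresponds in spirit to the Clear from Blacklists action.
        self._blocked.discard(engine)
```

Because each session keeps its own set, an Engine blocked by one session remains eligible for every other session that has not blacklisted it.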

Conditions
Task Discrimination allows limiting certain Tasks to a subset of Engines. If an Engine is ineligible to take
the next waiting Task, it will be assigned the first Task it is eligible to take.
The Broker tracks a number of predefined properties, such as available memory or disk space, performance
rating (megaflops), operating system, and so forth, that the Discriminator can use to define eligibility. The
site administrator can also establish additional attributes to be defined as part of the Engine installation, or
attach arbitrary properties to Engines “on the fly” from the Broker.
More information on using the Discriminator API can be found in Chapter 9, “Using Discriminators” on
page 85 of the GridServer Developer’s Guide.
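
The eligibility rule can be sketched as a scan of the waiting Tasks. This does not use the Discriminator API; the property names (os, free_memory_mb) and the use of plain predicate functions are assumptions chosen for illustration.

```python
def next_task_for(engine_props: dict, waiting_tasks: list):
    """Return the first waiting Task whose discriminator accepts this Engine.

    waiting_tasks holds (task_name, predicate) pairs in queue order.
    """
    for name, is_eligible in waiting_tasks:
        if is_eligible(engine_props):
            return name
    return None


# Hypothetical Engine properties and discriminators.
engine = {"os": "linux", "free_memory_mb": 512}
queue = [
    ("needs-windows", lambda p: p["os"] == "windows"),
    ("needs-1gb", lambda p: p["free_memory_mb"] >= 1024),
    ("anything", lambda p: True),
]
```

Here the Engine is ineligible for the first two Tasks, so next_task_for(engine, queue) skips them and returns the third.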

Redundant Task Rescheduling


Redundant rescheduling addresses the situation in which a handful of Tasks, running on less-capable
processors, might significantly delay or prevent Job completion. The basic idea is to launch redundant
instances of long-running Tasks. The Broker accepts the first result to return; the remaining instances are
not cancelled immediately, but are allowed to run until they finish or until the Job finishes. Redundant rescheduling
does not apply to Services. It is also unrelated to any other retry/reschedule behavior described above.




By default, redundant Task rescheduling is not enabled. With pools of more capable or nearly identical
Engines, fastest Task execution occurs when there is no redundancy from rescheduling. In general,
rescheduling is only appropriate when there are widely different capabilities in Engines.
Three separate strategies, running in parallel, govern rescheduling. Tasks are rescheduled whenever one or
more of the three corresponding criteria are satisfied. However, none of the rescheduling strategies comes
into play for any Service until a certain percentage of Tasks within that Service have completed; the Strategy
Effective Percent parameter determines this percentage.
The rescheduler scans the pending Task list for each Service at regular intervals, as determined by the Poll
Period parameter. Each Service has an associated taskMaxTime, after which Tasks within that Service will
be rescheduled. When the strategies are active (based on the Strategy Effective Percent), the Broker tracks
the mean and standard deviation of the (clock) times consumed by each completed Task within the
Service. Each of the three strategies uses one or both of these statistics to define a strategy-specific time
limit for rescheduling Tasks.
Each time the rescheduler scans the pending list, it checks the elapsed computation time for each pending
Task. Initially, rescheduling is driven solely by the taskMaxTime for the Service; after enough Tasks
complete, and the strategies are active, the rescheduler also compares the elapsed time for each pending Task
against the three strategy-specific limits. If any of the limits is exceeded, it adds a redundant instance of the
Task to the waiting list. (The Broker will reset the elapsed time for that Task when it gives the redundant
instance to an Engine.)
The Reschedule First flag determines whether the redundant Task instance is placed at the front or the back
of the waiting list; that is, if Reschedule First is true, rescheduled Tasks are placed at the front of the queue
to be distributed before other Tasks that are waiting. The default setting is false, which results in less
aggressive rescheduling.
Each of the three strategies computes its corresponding limit as follows:
• The Percent Completed Strategy waits until the Service nears completion (as determined by the
Remaining Task Percent setting), after which it begins rescheduling every pending Task at regular
intervals, based on the average completion time for Tasks within the Service.
• The Average Strategy returns the product of the mean completion time and the Average Limit
parameter. That is, this strategy reschedules Tasks when their elapsed time exceeds some multiple (as
determined by the Average Limit) of the mean completion time:
Average Strategy Limit = Average Limit * Mean Completion Time
• The Standard Dev Strategy returns the mean plus the product of the Standard Dev Limit parameter and
the standard deviation of the completion times. That is, this strategy reschedules Tasks when their elapsed
time exceeds the mean by some multiple (as determined by the Standard Dev Limit) of the standard
deviation:
Standard Dev Strategy Limit = Mean Completion Time + Standard Dev Limit * Standard Deviation
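
The two statistical limits can be computed from the completed Task times as in the sketch below. It is an illustration rather than the Broker's implementation: the function and key names are assumptions, and the population standard deviation is used because the text does not say whether the sample or population form applies.

```python
import statistics


def strategy_limits(completed_times: list, average_limit: float,
                    standard_dev_limit: float) -> dict:
    """Compute rescheduling time limits from completed Task clock times."""
    mean = statistics.mean(completed_times)
    stdev = statistics.pstdev(completed_times)  # assumption: population form
    return {
        "average": average_limit * mean,
        "standard_dev": mean + standard_dev_limit * stdev,
    }


def should_reschedule(elapsed: float, limits: dict) -> bool:
    """A pending Task is rescheduled when it exceeds any active limit."""
    return any(elapsed > limit for limit in limits.values())
```

For completed times of 2, 4, 4, 4, 5, 5, 7, and 9 (mean 5, standard deviation 2), an Average Limit of 3 gives a limit of 15 and a Standard Dev Limit of 2 gives a limit of 9, so a pending Task that has run for 10 would be rescheduled by the Standard Dev Strategy alone.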






Chapter 6
The GridServer Administration Tool
••••••

Introduction
The GridServer Manager provides the GridServer Administration Tool, a set of web-based tools that allow
the administrator to monitor and manage the Manager, its Grid of Engines, and the associated job space.
The GridServer Administration Tool is accessed from a web-based interface, usable by authorized users
from any compatible browser, anywhere on the network. Administrative user accounts provide password-
protected, role-based authorization.
With the pages in the Administration Tool, you can:
• Monitor Service and Task execution and cancel Services
• Monitor Engine activity and kill Engines
• View and modify Manager and Engine configuration
• Install Engines
• Create administrative user accounts and edit user profiles
• Subscribe to get e-mail notification of events
• Edit Engine Tracking properties and change values
• Configure Broker discrimination
• View the GridServer API
• Download the SDK files necessary to integrate application code and run Drivers
• View and extract log information
• View diagnostic reports
• Run Service Tests
FIGURE 6-1: The GridServer Administration Tool.
FIGURE 6-2: The GridServer Administration Tool.

Getting Started
The Administration Tool is accessible over HTTP from any supported browser with JavaScript and Java
applets. Make sure that both of these features are enabled in the browser.




In the browser, open http://hostname:port/livecluster (where hostname is the address of the GridServer
Manager, and port is the port on which it is listening); the Manager will prompt you for a username and
password. If you are running a browser on the same machine that runs the Manager, you can typically open
http://localhost:8000/livecluster to begin.

User Accounts and Access Levels


All of the administrative screens require you to first log in with a user account. The GridServer
Administration Tool uses a system of tiered access to provide security and enable different users to access
different areas of the interface. This is done by assigning different access levels for user accounts.
There are four account access levels: Configure, Manage, Service, and View. The Configure level is for
administrators and allows access to any part of the Administration Tool. By default, the admin account you
created at installation is set to the Configure level; you can also create accounts with full access for other
administrative users.
Other users can be given accounts with more limited access. When a user account with an access level of
View, Service, or Manage is used with the Administration Tool, some pages will either function differently,
or will not be available.

Creating User Accounts


To create a User Account:
1. Log in to the GridServer Administration Tool using an account that has configure-level access, such as
the one created when you first installed GridServer.
2. Click the Admin tab, then click User Admin.
3. On the User Admin page, select Create New User from the Global Actions list. The New User
Information page will open.
4. Enter the User Name, a password, and confirm the password.
FIGURE 6-3: Creating a User account.
The following information for a username is optional. You can also:
• Enter a first and last name, and an email address for notifications.
• Select an access level. By default, this will be View.
• If you are using Driver Authentication, you can associate a Driver Profile with a user account, so Drivers
using the same username as a user account will also use a specified Driver Profile. Select a Driver Profile
from the Driver Profile list to do this.
• Select the users that can be viewed with this account. This user will be able to view any Services
submitted by the selected users. Services that don't specify a user will default to the hostname of the
Driver and can only be viewed by setting Service Username Access to all.






Features Available by Access Level
The following table lists what pages are available in each level:

Level       Pages
View        Service Session Admin, Service Group Admin, GridCache Admin (view only), Dataset Admin
            (view only), Propagator Admin (view only), Engine Home, Engine Admin, Engine Install,
            Driver Admin, Broker Admin, Broker Monitor, Director Monitor, License Information,
            Discriminator Admin (view only), Engine Configuration (view only), Manager Configuration
            (view only), and Documentation.
Service     All pages from the View level, plus SDK Download, Cache Configuration (view only),
            Resource Deployment (view only), Service Test, Engine Admin - Log URL List, Engine Admin
            - Remote Engine Log, Engine Admin - Search Logs, Engine Daemon Admin, Engine Daemon
            Admin - Log Url List, Engine Daemon Admin - Search Logs, Event Subscription, Cache
            Configuration, Hook Admin, Service Session Admin - Cancel Service, Service Session Admin
            - Cancel All Services, Service Session Admin - Remove Finished Service, Service Session
            Admin - Remove Finished Services, Service Session Admin - Set Priority, TaskAdmin - Cancel
            Task, ServiceSessionAdmin - Update Deployment Files, and Service Test.
Manage      All pages from the Service level (with full rights on all Admin pages), plus Discriminator Admin
            (full rights), Engine Properties, Broker Routing, Event Subscription, Batch Admin, Batch
            Schedule, Reports (except Direct Query), Engine Configuration (full rights), Manager
            Configuration (full rights), Cache Configuration, Hook Admin, Current Log, and Diagnostics.
Configure   All pages.

Service Session Admin methods or actions require the user to have Service Username Access to the Service
in question. For example, the Service Session page will only show a user’s Services, and that user can only
cancel their own Services.
User account access levels also affect the ability to use GridServer Web Services to programmatically
interact with GridServer. For a list of GridServer Web Service objects and methods enabled by access level,
see Chapter 10, “GridServer Admin API” on page 89 of the GridServer Developer’s Guide.
Note that access levels don’t filter Services that were submitted before the access level was changed. For
example, if a user’s account is changed from Configure to View while a long-running Service was active,
the user would still have Configure-level access to that Service.

User Account Security


User accounts can be secured by assigning minimum username and password length, password aging, and
other attributes. To configure User security, click the Manager tab, click Manager Configuration, then
click Security. The following are configurable: Minimum Username Length, Minimum Password Length,
Password Complexity, Password Aging, Password Aging Expiration, and Driver Fails Login With Expired
Password.
Note that when a user’s password expires, they are required to provide a new password when they log into
the Manager.



Session timeouts are also configured for logins to the GridServer Administration Tool and Admin Web
Services. By default, these are set at 60 minutes for Administration Tool logins and 300 seconds for Admin
Web Services. To change these values, click the Manager tab, click Manager Configuration, then click
Security. Values are located in the Admin User Management section.

Navigating the Administration Tool


The Administration Tool consists of a number of pages, organized in the following ways:

The Home Page


When you first open the Administration Tool, a home page is displayed with links to every page. Click a
link to go to that page. You can return to this home page by clicking the Home button in the shortcut buttons.

Tabs
All of the pages in the Administration Tool are arranged under seven tabs, grouped by component or
function. Click a tab to display a home page, which contains a description and link for each of the pages
available on the tab. You can click a page link to view that page. Each page in a section is also listed in
the page bar, which is located below the tab controls.
FIGURE 6-4: The Administration Tool Tabs.
Below each tab is a bar containing a link to each page that’s on the home page, including the home page
itself. This is useful for returning to the home page, or quickly going to another page without first returning
to the home page.
Note that if you have gone to a page other than the home page, clicked on another tab, then clicked on the
first tab, you will return to the page you previously viewed, not the home page.
The following tabs are available:
Services    The Services tab contains pages used to manage, view, and submit Services.
Engine      The Engine tab contains pages used to manage, view, install, and configure Engines.
Driver      The Driver tab contains pages used to manage and install Drivers.
Manager     The Manager tab contains pages used to manage Brokers and configure your Manager.
Reports     The Reports tab contains pages used to view statistics and events generated by the Manager.
Admin       The Admin tab contains various administrative pages used to manage users, view logs, edit
            Manager hooks, and view Documentation.
Batch       The Batch tab contains links to create, edit, and manage Batches.



Shortcut buttons
The shortcut buttons (Figure 6-5) are displayed in the upper right of each page. The following buttons
are available:
• Home - returns to the home page of the Administration Tool.
• License Information - displays information on your GridServer license. This button flashes when your
license has expired, or when proxy limits are exceeded. You can turn this off on the Manager tab, in the
Manager Configuration page, in the Admin section, by setting the property under the License Manager
heading to false. You will also get a license warning starting 14 days before your license is due to
expire, on the login page.
• Help Index - opens an index of online help topics in a new window.
• Documentation - opens a list of all documentation, including links and a search engine.
FIGURE 6-5: Shortcut buttons.

Action Controls
Each table item has an action control, which is a list of actions you can choose. Some of these perform
actions on table items, while others open a new page.

Links on other pages


Some pages contain shortcut links to other related pages.
Note that only pages that are accessible from the current account are displayed. If you are not using an
administrative account with all privileges enabled, some options will not be visible.

Using Tables
Most pages have controls or information grouped in tables. The following controls can be used to sort or
reorganize tables for more convenient viewing:

Pager control
The Pager control enables you to step through multiple pages, or specify how many rows appear on a page.
Select a page number from the Page list, or select a range from the second list to display those items.
You can select a greater number of items listed per page in a table or display all of the items; type a
number in the Results Per Page box and click Go.
FIGURE 6-6: The Pager control.

Search control
The Search control is displayed on any page containing a table. You can use it to search any column of a
table. Select a column from the list, enter a search term, and click Go.
FIGURE 6-7: The Search control.



Personalize Table
The Personalize Table commands enable you to make changes to a table by removing or adding columns.
There are two lists that control this:
FIGURE 6-8: The Add and Delete column controls.
Add Column: Select the name of a listed column to add it to the table.
Columns previously deleted from the table will be listed, along with any optional columns that are not
displayed in a table’s default configuration. Columns will be added to the right of existing columns.
Delete Column: Select the name of a column to remove it from the table. Deleted columns will remain
hidden to this account, and these settings will be saved for future login sessions.
Tables are always sorted by a column that has an arrow in it, either facing up or down. You can click this
arrow to reverse the sort order of a table, or click another column to change the sort column.

Refresh
To update the list and display the most current information in a table, click the Refresh button. You can also
select a time value from the Refresh list to automatically refresh the table at a regular interval. To stop
automatic refreshes, select none.

Broker and Director Monitors


While pages like the Service Session Admin page and Engine Admin page can be used to oversee the
running of Services on your Grid, two graphical tools can be used to provide a simpler overview of status
information on your system. Both Directors and Brokers have a graphical monitor available, which can be
displayed in its own window.
To display the Director Monitor, click the button to the left in the Administration Tool. Note that
this button is not present in Managers that only host a Broker.

To display the Broker Monitor, click the button to the left in the Administration Tool. Note that
this button is not present in Managers running only a Director.

Both monitors display up-to-date information on your Grid. The Director Monitor contains graphs with
statistics on Engines, Tasks, Services, and machine status, including thread and memory information. The
Broker Monitor contains similar information about one specific Broker. Figure 6-9 is a sample of a
Director Monitor for a Grid with three Engines running several Services at once.
FIGURE 6-9: The Director Monitor.

Manager Component Indicator


The Manager Component Indicator graphically displays what part of the Manager is controlled by each
page within the Administration Tool. Each page’s functionality will control either the entire Manager, a
Broker, or a Director.



On Manager pages, a red and a blue sphere will be displayed.

If a page’s functionality is tied to a Director, just the red sphere is shown.

If a page’s functionality is for a Broker, just the blue sphere is shown.

Also, the Manager Component Indicator will show the hostname of the related component.

Status Display
The GridServer Administration Tool contains a Status Bar at the top of each page, which contains four Status
displays. Each of these displays is updated at each page reload with information about the status of your
Grid. The following Status displays are included:
• Busy Engines and Available Engines
• Drivers and Engine Daemons
• Running Services and Finished Services
• Running Tasks and Pending Tasks



Chapter 7
Application Resource Deployment

Introduction
GridServer provides several options for distributing classes, libraries, and other resources to Engines.
A Grid Library (or GL) provides an enterprise solution to managing versioned sets of resources that may
be used by multiple services. Grid Libraries provide the following features:
• Version control, including optional automatic selection of the most current version of a Grid Library.
• Resource upgrading without interrupting current Sessions.
• Specification of dependencies on other Grid Libraries.
• Specification of C++ Bridges and non-default JREs via dependencies.
• All-in-one packaging for JARs, native libraries for multiple OSes, .NET assemblies, Command Service
executables, and Engine Hooks.
• Specification of Environment Variables and Java System properties.
• Engines that require different compiler support libraries (GCC2/GCC3) can participate in the same
Service Session.
• Optimization of Engine restarts.
• Task reservation when an Engine requires a restart.
• Parameterization of package configuration through the use of property substitution files.
The Resource Deployment feature replicates sets of directories from a Manager to Engines to provide a
method of copying and managing files. It can be used for Grid Libraries and for the default set of resources.
In the simplest sense, this enables you to copy a JAR, DLL, or another resource to each Engine to run a
Service.
Remote Application Installation can install and uninstall applications on remote Windows Engines in non-
Grid Library deployment.
This chapter details how to use each of these methods of deployment for your GridServer installation.

Grid Libraries
A Grid Library is essentially a set of resources and properties necessary to run a Grid Service, along with
configuration information that describes to the GridServer environment how those resources are to be used.
For example, a Grid Library can contain JARs, native libraries, configuration files, environment variables,
hooks, and other resources.
A Grid Library is deployed as an archive file in ZIP or gzipped TAR format, with a grid-library.xml file
in the root that describes the Grid Library. It may also contain any number of directories that contain
resources.



Grid Libraries are identified by name and version. All Grid Libraries must have a name, and typically have
a version. The version is used to detect conflicts between a desired library and a library that has already
been loaded; it also provides for automatic selection of the latest version of a library. A GridServer Service can
specify that it is implemented by a particular Grid Library by specifying the gridLibrary and
gridLibraryVersion Service Options or Service Type Registry Options.

Grid Libraries can specify that they depend on other Grid Libraries; like the Service Option, such
dependencies can be specified by the name, and optionally the version. Also, nearly all aspects of a Grid
Library can be specified to be valid only for a specific operating system. This means that the same Grid
Library can specify distinct paths and properties for Windows, Linux, and Solaris, but only the appropriate
set of package options will be applied at run-time.

Grid Library Format


The Grid Library can be any archive file in ZIP (.zip) or gzipped TAR format (.tgz or .tar.gz), with a grid-
library.xml file in the root. Although the filename has no inherent meaning, we recommend the format:

[library name]-[library version].[zip|tar.gz|tgz]

The directory structure is completely up to the user, since the configuration file is used to specify where
resources are found within the Grid Library.
The configuration file must be a well-formed XML file named grid-library.xml, and be in the root of the
Grid Library.
The GridServer SDKs include a grid-library.dtd file that can be used to validate the XML file. They also
include an example Apache Ant build.xml file that can be used to validate and build Grid Libraries. This
DTD can also be found in Appendix A, “The grid-library.dtd” on page 99.
Following is a table that specifies all elements and attributes of the grid-library.dtd file. It uses the XML
schema notation for elements and attributes, such as:
[no tag] (Required)
? (Optional)
* (Optional and Repeatable)

Element                   Description; Elements and Attributes

grid-library              The root element.
                          ELEMENTS: grid-library-name, grid-library-version?, dependency*, jar-path*,
                          lib-path*, assembly-path*, command-path*, hooks-path*,
                          environment-variables*, java-system-properties*
                          ATTRIBUTES: os?, compiler?
grid-library-name         The library name. All libraries must be named.
grid-library-version      The version. If not specified, 0 is implied. If in comparable format as defined
                          below, it can be used to determine the latest version.
dependency                A library dependency. If the version is not specified, the latest version is
                          chosen at runtime.
                          ELEMENTS: grid-library-name*, grid-library-version?
conflict                  Indicates that this library conflicts with the given library. If this Grid Library
                          is NOT a dependency, and grid-library-name="*", then it indicates that this Grid
                          Library conflicts with all other Grid Libraries (aside from its dependencies).
                          ELEMENTS: grid-library-name*
pathelement               An element containing a relative path, typically set to a directory. This element
                          must be in the proper format for the OS. The path is resolved relative to the
                          Grid Library.
jar-path                  The JAR path. If specified, all JARs and classes in the path are loaded.
                          ELEMENTS: pathelement*
                          ATTRIBUTES: os?, compiler?
lib-path                  The native library search path.
                          ELEMENTS: pathelement*
                          ATTRIBUTES: os?, compiler?
assembly-path             The .NET assembly search path. Absolute assembly paths, mapped drives, and UNC
                          paths will not work.
                          ELEMENTS: pathelement*
command-path              The path in which the Engine will search for Command Service executables.
                          ELEMENTS: pathelement*
                          ATTRIBUTES: os?, compiler?
hooks-path                Engine hooks library path. Engine Hooks will be initialized at the time the
                          containing Grid Library is loaded.
                          ELEMENTS: pathelement*
                          ATTRIBUTES: os?, compiler?
name                      The name of a property.
value                     The value of a property.
property                  A name/value pair, used by environment variables and Java System properties.
                          ELEMENTS: name, value
environment-variables     Environment variables to set.
                          ELEMENTS: property
                          ATTRIBUTES: os?, compiler?
java-system-properties    Java system properties, which are set immediately prior to executing a task
                          using this library.
                          ELEMENTS: property
                          ATTRIBUTES: os, compiler

The following is a list of attributes used above. Valid values can be found in the Product Info page in the
GridServer Administration Tool:

Attribute   Description
os          Specifies that the containing element applies only to the given operating system (OS). If the
            attribute does not match the current OS, the containing element and its children and content
            are ignored.
compiler    Specifies that the containing element applies only to the given compiler. If the attribute
            does not match the current compiler, the containing element and its children and content
            are ignored.

Variable Substitution
A file can be created that contains variable substitutions, which are substituted into the grid-library.xml
file. This allows for quick changes in properties in the grid-library.xml file without redeploying the Grid
Library.
You can have a default properties file in your Grid Library called grid-library.properties that can provide
baseline values for your variables. You can also create an external properties file, named with the same name
as the Grid Library archive, with the extension .properties, and place it in the Grid Library deployment
directory. External properties will substitute over those in the Grid Library.
If the grid-library.xml file contains a property with a value enclosed in $ characters, such as $mydir$,
and the properties file contains an assignment, such as mydir=c:\\dir, the variable is substituted.
NOTE: Substitutions are allowed within the content of property value elements and pathelements only. If
the substitution is not found in the file, the empty string, "", is substituted.
Substitutions are allowed anywhere in a string. Multiple substitutions per string are allowed. $ characters
can be treated as literals by escaping them with another $ character. Windows paths that are specified in the
[library].properties file must escape the \ character with another \.
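The substitution rules above can be illustrated with a short Python sketch. This is not GridServer code; the function names substitute and resolve_properties are hypothetical, and the sketch assumes the property files have already been parsed into dictionaries (so an entry written as mydir=c:\\dir holds the value c:\dir).

```python
import re


def resolve_properties(defaults, external):
    """Merge property sets; external values override the Grid Library defaults."""
    merged = dict(defaults)
    merged.update(external)
    return merged


def substitute(text, props):
    """Apply $name$ substitutions to one string.

    - $name$ is replaced by props[name]
    - a name missing from props is replaced by the empty string
    - $$ is treated as a literal $ character
    """
    def replace(match):
        name = match.group(1)
        if name == "":          # "$$" escapes a literal dollar sign
            return "$"
        return props.get(name, "")

    return re.sub(r"\$([^$]*)\$", replace, text)


# Hypothetical values for illustration only.
props = resolve_properties({"mydir": "c:\\default"}, {"mydir": "c:\\dir"})
path = substitute("$mydir$\\lib", props)   # c:\dir\lib
```

Multiple substitutions in one string are handled because the pattern is applied repeatedly from left to right.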

Versioning
Versioning provides the following functionality:
• It allows for deployment of new versions of libraries and deletion of old versions without interrupting
currently executing Service Sessions.
• It provides for specifying conflicts, or libraries that cannot coexist with each other.
• It allows for a Service Session or dependency to specify the use of the latest version of a Grid Library.
To use versioning, you must specify the Grid Library version in the configuration file. An Engine can load
only one version of the library with the same name at any time. If the version is not specified, it is implied
to be 0.
While the version can be any String, if it follows the proper comparable version format it can also be used
to determine the latest version of the library, for automatic loading. This format is
[n1].[n2].[n3]...

where nx is an integer, and there may be one or more version points.


For instance,
4.0.1.1, 4.1, 3

are in the proper comparable version format.


Two versions are compared point by point, starting at the first version point and continuing until a
version point in one is greater than the corresponding point in the other. If a version point does not
exist for one, it is implied as zero.
For instance,
4.0.0.1 > 4.0
4.0.0.5 < 4.0.1.1
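The comparison rule can be sketched in Python as follows. This is an illustration of the rule as described here, not the product's implementation; the function names are hypothetical, and the sketch assumes that each version point is a non-negative integer.

```python
import re
from itertools import zip_longest

# Assumed comparable format: one or more non-negative integers separated by dots.
_COMPARABLE = re.compile(r"[0-9]+(\.[0-9]+)*")


def parse_version(version):
    """Return the version points as a list of ints, or None if not comparable."""
    if not _COMPARABLE.fullmatch(version):
        return None
    return [int(point) for point in version.split(".")]


def compare_versions(a, b):
    """Return 1 if a > b, -1 if a < b, and 0 if they are equal."""
    points_a, points_b = parse_version(a), parse_version(b)
    if points_a is None or points_b is None:
        raise ValueError("version is not in comparable format")
    # Missing version points are implied to be zero.
    for x, y in zip_longest(points_a, points_b, fillvalue=0):
        if x != y:
            return 1 if x > y else -1
    return 0
```

With this sketch, compare_versions("4.0.0.1", "4.0") returns 1 and compare_versions("4.0.0.5", "4.0.1.1") returns -1, matching the two examples above.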

To specify that a dependency or Service use a particular version of a Grid Library, the version field is set to
that value. To specify that it use the latest version, the field is left blank.
If a version is specified but not in this format, and there are multiple versions of a library, the “latest version”
is undefined. Thus, automatic selection of the latest version is only possible when all Grid Libraries with the
specified name provide a version in the proper format.
Note that automatic versioning is dynamic. That is, if a Service or dependency specifies the latest version,
and a new version of a Grid Library is deployed, the next time that Grid Library is used by any Session it
will be the new version.
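As a rough illustration of automatic selection, the following Python sketch picks the latest version from the versions deployed under one Grid Library name. The function name latest_version is hypothetical. Because the text says the latest version is undefined when any version is not in comparable format, the sketch returns None in that case; how the product actually behaves there is not specified.

```python
import re

# Assumed comparable format: non-negative integers separated by dots.
_COMPARABLE = re.compile(r"[0-9]+(\.[0-9]+)*")


def latest_version(versions):
    """Return the latest of the given version strings, or None if undefined."""
    keys = {}
    for version in versions:
        # An unspecified version is implied to be 0.
        text = "0" if version in (None, "") else version
        if not _COMPARABLE.fullmatch(text):
            return None  # "latest" is undefined if any version is not comparable
        keys[text] = [int(point) for point in text.split(".")]
    if not keys:
        return None
    # Pad with zeros so that missing version points compare as zero.
    width = max(len(points) for points in keys.values())
    return max(keys, key=lambda t: keys[t] + [0] * (width - len(keys[t])))
```

Since selection is dynamic, a sketch like this would be re-evaluated whenever a new version of the library is deployed.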

Dependencies
Grid Libraries may specify dependencies on other Grid Libraries. A dependency specification resolves to a
particular Grid Library using two values:
grid-library-name: The name of the Grid Library, as specified in the dependency’s XML.
grid-library-version: The version of the Grid Library, as specified in the dependency’s XML. If not
specified, the latest version supported by the OS is used. OS compatibility is determined by checking the
os and compiler attributes of the top-level element in the dependent Grid Library.
Note that if a dependency resolves to more than one Grid Library, the dependency used is undefined.
Two dependent libraries conflict if they have the same library name, but different versions.



Conflicts
A conflict between two Grid Libraries means that these libraries cannot be loaded concurrently. When there
is a conflict between a loaded Grid Library and a Grid Library required by a Service, the Engine must restart
to unload the current libraries and load the requested library.
The following circumstances result in a conflict:
Version Conflict
The most common conflict arises via versioning, and typically when upgrading versions or using more than
one version of the same library concurrently. This conflict arises when a Grid Library with the same grid-
library-name as the requested Grid Library, but different version, is loaded.

Explicit Conflict
There can be situations in which different Grid Libraries can conflict with each other due to conflicting
native libraries, different versions of Java classes, and so on. Because the Engine cannot determine these
implicitly, the conflict element can be used to specify Grid Libraries that are known to conflict with this
Grid Library.
Additionally, the value of the grid-library-name can be set to "*". This means that this Grid Library
conflicts with all other Grid Libraries (aside from its dependencies), and it is guaranteed that no other Grid
Libraries will be loaded concurrently with this Grid Library. Note that this is only allowed if the Grid Library
is not a dependency; if the "*" is used as a conflict in a Grid Library that is a dependency, a verification error
will occur.
Dynamic Version Conflict
A Grid Library conflict occurs if dynamic versioning is used, and the latest version of a Grid Library or Grid
Library dependency has changed due to an addition or removal of a dependency since the Grid Library has
been loaded.
Variable Substitution Conflict
A Grid Library conflict occurs if its variable substitution file has changed since it has been loaded.
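The version and explicit conflict rules can be sketched as a simple check. This Python example is illustrative only: the function name find_conflicts and the dictionary layout are hypothetical, versions are compared as plain strings, only conflicts declared by the requested library are considered, and the dynamic version and variable substitution cases are omitted.

```python
def find_conflicts(loaded, requested):
    """Return reasons why `requested` cannot be loaded alongside `loaded`.

    loaded:    dict mapping loaded Grid Library name -> version string
    requested: dict with "name" and "version", plus optional "conflicts"
               (library names, or "*") and "dependencies" (library names)
    """
    reasons = []
    name, version = requested["name"], requested["version"]

    # Version conflict: same name already loaded, but with a different version.
    if name in loaded and loaded[name] != version:
        reasons.append("version conflict with %s %s" % (name, loaded[name]))

    # Explicit conflict: named libraries, or "*" for everything except
    # the requested library itself and its dependencies.
    conflicts = set(requested.get("conflicts", ()))
    exempt = set(requested.get("dependencies", ())) | {name}
    for other in loaded:
        if other in exempt:
            continue
        if "*" in conflicts or other in conflicts:
            reasons.append("explicit conflict with %s" % other)
    return reasons
```

A non-empty result corresponds to the situation in which the Engine must restart to unload the current libraries before loading the requested one.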

Grid Library Loading


When a Service Session is set to use a Grid Library, that library is loaded. Loading is the process of setting
up all resources in the Grid Library for use by the Service. A library is loaded only once per Engine session.
First, the library loads itself, and then it loads all dependencies. Libraries are loaded depth-first rather than
breadth-first. Certain aspects of a load may require a restart, and possibly re-initialization of the state. The
following steps are performed by a load of the root library and all dependencies:
1. The Engine checks for conflicts with currently loaded Grid Libraries. If a conflict is found, it will
restart with the requested Grid Library and clear out the current state of any loaded libraries.
2. If new lib-paths have been added for its OS, they will be appended to the current list of lib-paths,
and the Engine will restart. The state of loaded libraries will include all libraries already loaded, plus
the requested library. Note that specifying a JRE dependency has this effect.
3. If new jar-paths have been added for its OS, the jars and classes will be added to the classloader.
4. If new assembly-paths have been added, it will add them to the .NET search path.
5. If new command-paths have been added for its OS, they are added to the search path for Command
Tasklets.
6. If new hooks-paths have been added, any hooks in the path will be initialized.
7. If the default is current and a Grid Library is requested, the Engine will restart.
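The load ordering described before these steps (the library first, then its dependencies, depth-first, each library loaded only once) can be sketched as a pre-order traversal. This Python example is an illustration only; the function name load_order and the library names are hypothetical, and the restart handling in the steps above is not modeled.

```python
def load_order(root, dependencies):
    """Return the order in which libraries would be loaded.

    root:         name of the Grid Library requested by the Service
    dependencies: dict mapping a library name -> list of its dependency names
    """
    order = []
    seen = set()

    def visit(name):
        if name in seen:        # a library is loaded only once
            return
        seen.add(name)
        order.append(name)      # the library loads itself first...
        for dep in dependencies.get(name, []):
            visit(dep)          # ...then each dependency, depth-first

    visit(root)
    return order


# Hypothetical dependency graph: A depends on B and C, and B depends on D.
graph = {"A": ["B", "C"], "B": ["D"]}
order = load_order("A", graph)   # ["A", "B", "D", "C"]
```

A breadth-first traversal of the same graph would give A, B, C, D, so D is reached before C only under the depth-first ordering.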

State Preservation
In most cases, when an Engine shuts down, it preserves the current state of which Grid Libraries it has
loaded. When it starts back up, it loads all Grid Libraries that were loaded when it shut down. As Grid
Libraries are loaded, the pathelements they contain are added to a ‘master’ list of paths for that type of
pathelement. For example, if a Grid Library contains a lib-path specification, that lib-path is appended to
the list of lib-path values obtained from already-loaded Grid Libraries.
Note that this means that it is up to the creator of the Grid Libraries deployed on the Grid to ensure that the
ordering of library paths does not lead to loading the wrong library.
For example, if two different Grid Libraries each provide DLLs in their lib-paths that share the same name,
because of OS-specific library load conventions, the one that will be used will be the first one found in the
aggregate lib-path from across all loaded Grid Libraries. Likewise for Java classes, when more than one
copy of the same class is in the classloader, it is undefined which class will be loaded. Therefore it is
important to either subdivide Grid Libraries appropriately when such conflicts could arise, or to use the
conflict element to explicitly state conflicts.

If an Engine shuts down due to a conflict, it clears the current state and sets up for only the requested Grid
Library upon restart. This is referred to as preloading. If an Engine shuts down due to internal library
inconsistencies or a crash, the state is not saved. State is also cleared on all instances for file updates,
Daemon restarts, and Daemon disable.

Task Reservation
If an Engine requires a restart to load a Grid Library, the task will be reserved on the Broker for that Engine.
The Engine is instructed to log back into the same Broker, and will take that task upon login. The timeout
for this is configurable on the Broker on the Manager Configuration page, in the Services section.

Environment Variables and System Properties


All Environment variables and Java System properties for a Grid Library and all dependencies will be set
each time a task is taken from a particular Service that specified that Grid Library. (They are not cleared after
the task is finished.) Environment variables are set via JNI so that they can be used by native libraries or
.NET assemblies, and they are also passed into Command Services. Note that environment variables such
as PATH and LD_LIBRARY_PATH should not be changed through this mechanism. Rather, lib-path and
command-path are reserved for manipulating these variables.

Using Grid Libraries from a Service


Services can specify a Grid Library to use by setting the GRID_LIBRARY and optionally the
GRID_LIBRARY_VERSION Service Options. This would typically be set by Service Type in the Service Registry
page, although it can be set programmatically on the Session. Jobs can specify a Grid Library to use by setting
the corresponding JobOption values. If the version is not set, a Service will use the latest version of a Grid
Library.


If a Service needs to find resources in a Grid Library, it can use the Grid Library Path. This value is a path
value that includes the root directories of all Grid Libraries currently loaded. This path can be retrieved in
the following way:
ds.GridLibraryPath: Java System property, .NET System.AppDomain.CurrentDomain data entry
ds_GridLibraryPath: Command Service, native library Service environment variable
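As an illustration, a Command Service script could search the Grid Library Path for a resource along the following lines. This Python sketch is not part of GridServer: the function name find_in_grid_libraries is hypothetical, and it assumes that the entries in ds_GridLibraryPath are separated by the platform's path separator, which is not stated here.

```python
import os


def find_in_grid_libraries(relative_name, environ=None):
    """Return the first existing path for relative_name under any Grid Library root.

    Reads the ds_GridLibraryPath environment variable. Entries are assumed
    to be separated by os.pathsep (an assumption, not a documented format).
    """
    if environ is None:
        environ = os.environ
    raw = environ.get("ds_GridLibraryPath", "")
    for root in raw.split(os.pathsep):
        if not root:
            continue
        candidate = os.path.join(root, relative_name)
        if os.path.exists(candidate):
            return candidate
    return None  # not found in any loaded Grid Library
```

Because the path lists the roots of all loaded Grid Libraries, the first match wins; resources with the same relative name in two libraries would be ambiguous.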

Deployment
Grid Libraries are typically deployed by placing them in the Grid Library deployment directory on the
Primary Director. The Resource Manager will then replicate these libraries to all Engines. Variable
Substitution property files also should be placed in this directory.
Grid Libraries are special resources, in that adding or removing Grid Libraries or property files will not
result in an Engine and Daemon restart, like other resources. This is because it is not necessary to restart
until the Engine actually needs to use the Grid Library, and even then only if necessary according to the
loading procedure. Note that if a Grid Library is changed, the Daemon and Engines will restart like they
would in the case of a change to any other resource. Also, it is the responsibility of the user not to delete
Grid Libraries via the Resource Deployment page that have been loaded by active Services, as that may lead
to library load failures for subsequently executed Tasks.
If you are not using the Resource Manager for replication, you can use an alternate shared Grid Library
directory. You must then set the Grid Library Path in all Engine Configurations to point to this directory,
instead of the default replicated location. When changes are made to this library, you must then use the
Update button on the Resource Deployment page on the Primary Director. This will send a message to all
Engines to check and update their Grid Libraries via the Grid Library Manager.

Grid Library Manager


The Grid Library Manager exists on all Engines, and is responsible for maintaining the state of all Grid
Libraries deployed. Whenever any change is made to the Grid Library directory (typically due to
replication), the Grid Library Manager will update the local status as follows:
1. Any new Grid Library files are unzipped to a directory with the name corresponding to the file name.
This new library will be added to the Grid Library Manager’s catalog, but not loaded until needed.
2. If a Grid Library is removed, it will delete the local copy of the zipped Grid Library and the unzipped
directory.
3. Variable substitution files are copied into the appropriate directory. If a variable substitution file has
been changed, and the corresponding Grid Library has already been loaded, it is marked as dirty so
that the next time an Engine attempts to use it, it will restart due to conflict.
4. If any Grid Library uses a latest version in the Grid Library’s catalog, and the latest version has
changed, it is marked dirty so that the next time an Engine attempts to use it, it will restart due to
conflict.
The Grid Library Manager locks the directory while making any changes, so that if multiple Engine
instances are running or multiple Engine Daemons are running from a shared Engine directory, only one
Engine will perform any file manipulation. Other Engines will wait until those operations are completed,
and then their Grid Library Managers will update their links appropriately.



C++ Bridges
C++ Bridges are the native bridges that allow Engines to execute native Services. They are packaged as Grid
Libraries, named cppbridge-[os]-[compiler]-[M]-[m], where M and m are the GridServer major and minor
version numbers. All C++ Bridges are pre-packaged and deployed in the Grid Library replication directory
upon GridServer Manager installation or upgrade.
Only one version of a bridge can be loaded at any given time, so all bridges for a particular platform are built
to explicitly conflict with each other. For example, a Service that uses VC7.1 conflicts with one that uses
VC7.0.

JREs
JREs will be packaged as jre-os-.glz. The Grid Library name will be jre-os, and the version will be the JRE
version, for example, 1.4.2.06. DataSynapse will package JREs for customers as needed, or as they become
available; contact DataSynapse support for details.

Grid Library Example


The following example grid-library.xml is for a mixed Java/C++ application that runs on Windows and
on Linux with both gcc2 and gcc3:
Example 7.1: grid-library.xml example
<?xml version="1.0" encoding="UTF-8"?>
<grid-library>
    <grid-library-name>MyLib</grid-library-name>
    <grid-library-version>1.0.0.1</grid-library-version>

    <!-- Example of how to use both gcc2 and gcc3 libraries -->
    <lib-path os="linux">
        <pathelement>lib/gcc2</pathelement>
    </lib-path>
    <lib-path os="linux" compiler="gcc3">
        <pathelement>lib/gcc3</pathelement>
    </lib-path>

    <!-- All three C++ bridges are included here -->
    <dependency>
        <grid-library-name>cppbridge-vc6</grid-library-name>
    </dependency>
    <dependency>
        <grid-library-name>cppbridge-gcc3</grid-library-name>
    </dependency>
    <dependency>
        <grid-library-name>cppbridge-gcc2</grid-library-name>
    </dependency>

    <!-- Specifies that win32 use this JRE Grid Library, others use default -->
    <dependency>
        <grid-library-name>jre-win32</grid-library-name>
        <grid-library-version>1.4.2.06</grid-library-version>
    </dependency>

    <!-- Example of linking to another of my Grid Libraries -->
    <dependency>
        <grid-library-name>MyCalculator</grid-library-name>
    </dependency>

    <hooks-path>
        <pathelement>hooks</pathelement>
    </hooks-path>

    <!-- Example of multiple jar paths -->
    <jar-path>
        <pathelement>jars</pathelement>
        <pathelement>morejars</pathelement>
    </jar-path>

    <!-- Example of a lib path with relative and absolute dirs -->
    <lib-path os="win32">
        <pathelement>lib\win</pathelement>
        <pathelement>s:\lib\win</pathelement>
    </lib-path>

    <!-- Example of OS-dependent env vars, using a property sub -->
    <environment-variables os="win32">
        <property>
            <name>MY_WIN_VAR</name>
            <value>$WinVar$</value>
        </property>
    </environment-variables>
    <environment-variables os="linux" compiler="gcc3">
        <property>
            <name>MY_GCC3_VAR</name>
            <value>$LinuxDriverDir$</value>
        </property>
    </environment-variables>
    <java-system-properties>
        <property>
            <name>foo</name>
            <value>bar</value>
        </property>
    </java-system-properties>
</grid-library>
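
When checking a grid-library.xml file before deployment, it can help to list what it declares. The following Python sketch is not part of GridServer; it assumes only the element names that appear in Example 7.1, and the function name and the shortened sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened sample that uses only elements shown in Example 7.1.
SAMPLE = """
<grid-library>
    <grid-library-name>MyLib</grid-library-name>
    <grid-library-version>1.0.0.1</grid-library-version>
    <lib-path os="linux" compiler="gcc3">
        <pathelement>lib/gcc3</pathelement>
    </lib-path>
    <dependency>
        <grid-library-name>cppbridge-gcc3</grid-library-name>
    </dependency>
    <dependency>
        <grid-library-name>jre-win32</grid-library-name>
        <grid-library-version>1.4.2.06</grid-library-version>
    </dependency>
</grid-library>
"""


def summarize_grid_library(xml_text):
    """Return the name, version, dependencies, and lib paths of a descriptor."""
    root = ET.fromstring(xml_text)
    dependencies = []
    for dep in root.findall("dependency"):
        # The version is None when the dependency does not pin one.
        dependencies.append((dep.findtext("grid-library-name"),
                             dep.findtext("grid-library-version")))
    lib_paths = {}
    for lib_path in root.findall("lib-path"):
        key = (lib_path.get("os"), lib_path.get("compiler"))
        elements = [pe.text for pe in lib_path.findall("pathelement")]
        lib_paths.setdefault(key, []).extend(elements)
    return {
        "name": root.findtext("grid-library-name"),
        "version": root.findtext("grid-library-version"),
        "dependencies": dependencies,
        "lib_paths": lib_paths,
    }
```

Because ET.fromstring rejects malformed XML, running a descriptor through a parser like this also catches unclosed tags and missing attribute quotes before the file reaches an Engine.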

Legacy Resource Deployment


When it is not necessary or optimal to use Grid Libraries, a default set of resources is also available for use
by Engines. For instance, a Grid with only a small number of applications that do not require uninterrupted
upgrading may not require Grid Libraries. Also, developing and testing GridServer applications is typically
easier using the default resources.

Using Default Resources


Default resources are used when a Service does not specify a Grid Library. They cannot be used concurrently
with Grid Libraries, so the default resources can be thought of as a non-versioned Grid Library that conflicts
with all other Grid Libraries. Also, rather than using a grid-library.xml file, default resources use the
Engine Configuration to specify paths.





When using Default Resources, the following Engine Configuration properties take effect; when using Grid
Libraries, they do nothing:

Property
Environment Variables
Default JAR and Class Path
Default Library Path
Common Library Path
Default Hook Path

Default Resource Paths


The paths used by Default Resources are set in the Engine Configuration, in the Classes, Libraries, and
Paths section. By default, these paths are set to replicated resource locations. Following is a list of the paths,
and analogs to Grid Libraries:
JAR and Class Path: The jar-path
Library Path: The lib-path and assembly-path (for Windows)
Hooks Path: The hooks-path

C++ Bridges
C++ Bridges are used by simply including the bridge libraries in the Library Path. These libraries are
installed by default when the Manager is installed or upgraded, into the default library path. Note that this
means that only one version of a bridge may be used. For example, when using the default resources, you
cannot use both VC6 and VC7 services for the same Engine configuration.

Grid Library features not supported by Default Resources


The following features are unique to Grid Libraries and cannot be utilized when using Default Resources:
JRE: Only the default JRE can be used.
System Properties: Not supported, although they can be set via an Engine Hook or in the Service
implementation
Environment Variables: Not supported, although they can be set via an Engine Hook or in the Service
implementation via JNI
Daemon and Engine restart optimization: When default resources are changed, all Engines and Daemons
will restart to update those resources.
Variable Substitution: Not supported.

Code Versioning Deprecation


Code Versioning has been replaced by Grid Libraries as of GridServer version 4.1.




To support migration from Code Versioning to Grid Libraries without changing the client implementation,
the following is done: if the CODE_VERSION option is set for a Service, the GRID_LIBRARY value is set to
that value.
To migrate, then, you must at minimum perform the following so that legacy clients work correctly:
1. Package all Code Version directories as Grid Libraries with grid-library-name=codeVersion.
2. If any directories include C++ Bridge DLLs, remove them and replace with the proper bridge
dependency.
3. If Code Versions conflict with each other, use the conflict element. If all Code Versions conflict
with each other, you can simply use the "*" conflict value.
Note that these instructions are the minimum necessary to migrate from Code Versions to Grid Libraries
without changing existing client code. As client code is changed, you may find a more optimal division of
resources into dependencies.

Resource Deployment: Distributing Grid Libraries and Default Resources


The GridServer system provides a Resource Deployment mechanism for securely distributing Grid Libraries
and resources, such as libraries (.dll or .so), Java class archives (JAR), binaries, or large data files that
change relatively infrequently. The resources to be deployed are placed within a reserved directory on the
Primary Director. The system maintains a synchronized replica of the reserved directory structure for all
Engines. The replica of files on the Director is synchronized to Brokers, and then Brokers synchronize the
files with Engines. The files are secure in that they can be accessed only by the Engines, not by anyone
else on the network.

The Resource Deployment Interface


The GridServer Administration Tool provides a graphical interface to manage resources synchronized to
Engines. To manage resources, on the Primary Director click the Services tab in the Administration Tool,
and click Resource Deployment. The Resource Deployment page, shown in Figure 7-1, features a file
browser that can be used to navigate the replicated directories, create new directories, and add or delete
files.
FIGURE 7-1: The Resource Deployment page.
To navigate the directories, simply click the displayed file names or the directory names in the current
directory, displayed above. You can add new files to a directory by entering a filename and clicking the
Upload button, or clicking the Browse button to find files on your computer. Once you have added new
files, you can click Update to update the files to your Engines.

Resource Deployment File Locations


The resources directory contains a directory for each Engine OS that is deployed only to Engines with the
respective operating system. The gridlib and shared directories are deployed to all Engines.






The default locations for these directories, relative to the livecluster base directory, are in the
deploy/resources directory. Files in the resources directory itself are not deployed.

The corresponding Engine-side directory is located under the root directory for the Engine installation, for
example, C:\Program Files\DataSynapse\Engine\resources for Windows; or
/usr/local/DSEngine/resources for Unix.

There are two reserved file patterns: names that contain a #, and names that end in .tmp. You cannot deploy
resources that match either pattern, as they will cause problems with the replication mechanism.
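
A pre-upload check for the two reserved patterns might look like the following Python sketch. The function name is hypothetical, and treating the .tmp suffix as case-sensitive is an assumption; the text above does not say how case is handled.

```python
def is_deployable(filename):
    """Return True if the name avoids both reserved file patterns."""
    # Reserved: any name containing '#', and any name ending in '.tmp'
    # (case-sensitive here, which is an assumption of this sketch).
    return "#" not in filename and not filename.endswith(".tmp")


def reserved_names(filenames):
    """List the names that would interfere with replication."""
    return [name for name in filenames if not is_deployable(name)]
```

For example, reserved_names(["calc.jar", "data#1.bin", "cache.tmp"]) returns ["data#1.bin", "cache.tmp"].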

Configuring Directory Replication


The system can be configured to trigger updates of the replicas in one of two modes:
• Automatic update mode. The resources will automatically be deployed to any Engine upon login to the
Broker. Also, the Manager continuously polls the file signatures within the designated subdirectories at
the time interval specified in Monitor Interval, and triggers Engine updates whenever it detects changes.
To update the Engines, the system administrator need only add or overwrite files within the directories.
This is the default update method.
• Manual update mode. The administrator ensures that the correct files are located in the designated
subdirectories and triggers the updates manually by issuing the appropriate command in the GridServer
Administration Tool. Updates also take place at startup.
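
The signature polling used by automatic update mode can be pictured with the following Python sketch. It shows the general technique only: the use of SHA-256 content hashes as signatures and the function names are assumptions of this sketch, since the guide does not describe how the Manager computes file signatures.

```python
import hashlib
import os


def snapshot(directory):
    """Map each file's path (relative to directory) to a content signature."""
    signatures = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            signatures[os.path.relpath(path, directory)] = digest
    return signatures


def detect_changes(old, new):
    """Return (added, removed, changed) relative paths between two snapshots."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, changed
```

A poller would take a snapshot once per Monitor Interval and trigger an Engine update whenever detect_changes returns a non-empty list.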
To configure manual updating:
1. Click the Manager tab, then click Manager Configuration.
2. Under Broker Resources and Director Resources, set Monitor Interval for both to 0.
There are two different ways to update files to Engines manually:
1. Click the Services tab, then click Resource Deployment.
2. Click Update.
or:
1. Click the Engine tab, then click Engine Admin.
2. Click Update Deployment Files on the Global Actions menu.
Either of these actions will cause all Engines to update. If you have installed new files and want all Engines
to use them immediately, perform either of these actions.
During rapid Java development, an alternative to file updating is the use of the JAR_FILE Service Option to
dynamically attach a local JAR file to the Service. By default, this option is not available for security
reasons, and has certain restrictions.

Using Engines with Shared Network Directories


Instead of using directory replication, you can also provide Engines with common files with a shared
network directory, such as an NFS mounted directory. To do this, you must provide a directory on a shared
server that can be accessed from all of the Engines. Then the Engines must be configured to use that location.
Click the Engine tab in the Administration Tool, click Engine Configuration, and change the directories
appropriately.




JAR Ordering File


If you are using multiple JAR files and need the classloader to load them in a specific order to prevent
conflicts, you can specify the order in which they are loaded. To do this, create a file called index.libs in
the JAR path root and put the names of JAR files, one per line, in the order in which they should be loaded.
Those not in the list will be loaded afterwards, in no specified order.
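
The resulting load order can be sketched in Python as follows. The function name is hypothetical, and two details are assumptions of this sketch rather than documented behavior: names in index.libs that do not match an available JAR are skipped, and unlisted JARs are sorted alphabetically only to make the output repeatable (the guide states that they load in no specified order).

```python
def order_jars(available, index_lines):
    """Return JAR names in load order, given the lines of an index.libs file."""
    available_set = set(available)
    # Keep listed names in file order, dropping blanks, duplicates,
    # and names that are not actually present.
    listed = [line.strip() for line in index_lines if line.strip()]
    ordered = [jar for jar in dict.fromkeys(listed) if jar in available_set]
    # Everything not listed loads afterwards; sorted here for determinism only.
    remaining = sorted(jar for jar in available if jar not in set(ordered))
    return ordered + remaining
```

With available JARs b.jar, a.jar, and c.jar and an index.libs that lists c.jar and then a.jar, the result is c.jar, a.jar, b.jar.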

Remote Application Installation


The Windows Deployment Scripting Language provides a mechanism by which programs can be executed
in conjunction with file updating on Windows Engines. This can be used for such purposes as registering
COM DLLs and .NET assemblies, running Microsoft Installer packages, and so on. It runs an installation
command when the script is added, and when any dependent files are modified. It can also run an
uninstallation command when the script is removed. Note that the Remote Application Installation feature
does not work with Grid Libraries.
A deployment script is a file named dsinstall.conf in a resource subdirectory. This is a reserved filename,
and the Engine Daemon interprets any file with this name as a deployment script. The script is a properties
file, with name and value pairs that govern the command execution.
Typically, the script is placed, with associated files, in its own subdirectory of the win32 deployment
directory. This will be referred to as the installation directory.
The following properties are provided:

Property            Description
install_cmd         The installation command. The command should be either in the current directory or
                    the resources/win32/lib directory; you can also specify the full path to a command.
                    This command is run when the dsinstall.conf file is added or modified, and when any
                    dependency is modified.
workdir             Working directory from which the commands are launched. The directory is relative
                    to the installation directory.
uninstall_cmd       Optional. The uninstall command. This is executed when the script is deleted, or prior
                    to subsequent runs of the install command if uninstall_first is true. Supporting
                    files for the uninstall script may be deleted along with the script; the command is
                    executed prior to local deletion of the files. Typically an uninstall is performed by
                    simply removing the entire installation directory.
dependfiles         Comma-delimited list of file names that the script depends on. The files are relative
                    to the installation directory. If any of these files change on a file update, the install
                    command is re-run. A file may contain wildcards only as replacements for the entire
                    name or extension, such as *.dll, *.*, or file.*.
waittime            Number of seconds to wait for the install or uninstall command to finish. The default
                    is 30 seconds. If this time is exceeded, the process running the command is killed.
uninstall_first     Optional. If true, the uninstall command will always be run prior to the install
                    command, except for the first time the install command is run. This is for situations
                    in which you need to uninstall software prior to reinstallation.
success_exit_codes  Optional. Comma-delimited list of exit code values that indicate successful command
                    execution. If the exit code does not match any value, an error will be logged with the
                    failure code, and the next time the Daemon restarts it will retry the installation. If this
                    property is not set, exit codes are ignored.
disable_on_fail     Whether an Engine Daemon should disable itself upon the failure of an install. The
                    default is false if not specified in the conf file. When the value is true, the Engine
                    Daemon will disable itself if the exit code returned by the installation is not in the
                    success exit codes.

The : and \ characters must be escaped with a backslash (\) character in the dsinstall.conf file. Also, you
should not rename the dsinstall.conf file.
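
The escaping rule can be applied mechanically when generating dsinstall.conf values. The following Python sketch uses a hypothetical helper name, and the path in the usage note is invented for illustration.

```python
def escape_conf_value(value):
    """Escape backslashes and colons for use in a dsinstall.conf value."""
    # Backslashes must be doubled first; otherwise the backslashes added
    # in front of colons would themselves be doubled.
    return value.replace("\\", "\\\\").replace(":", "\\:")
```

For example, the path C:\tools\setup.exe would be written in the file as C\:\\tools\\setup.exe.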
The following is an example of a script that installs a Microsoft Installer package:
Example 7.2: A Microsoft Installer Package Installation Script
dsinstall.conf:
dependfiles=install.bat,uninstall.bat,mypackage.msi
workdir=.
waittime=30
uninstall_first=true
install_cmd=install.bat
uninstall_cmd=uninstall.bat
success_exit_codes=0

install.bat:
%SystemRoot%\system32\msiexec /q /i mypackage.msi ALLUSERS=1

uninstall.bat:
%SystemRoot%\system32\msiexec /q /x mypackage.msi ALLUSERS=1

These three files, plus the mypackage.msi file, are all placed in a subdirectory under win32. Note that the
uninstall_first property is used to uninstall the previous version of the software whenever the package is
changed. To uninstall the software, simply remove the entire installation directory; the uninstallation is
performed prior to deleting the files.
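
The way the Engine Daemon is described as using dependfiles and success_exit_codes can be sketched in Python. This is a simplified model and not the Daemon's code: the function names are hypothetical, the parser does not handle backslash escapes, treating lines that start with # as comments is an assumption based on common properties-file conventions, and fnmatch accepts broader wildcards than the whole-name or whole-extension forms the guide allows.

```python
from fnmatch import fnmatchcase


def parse_conf(text):
    """Parse name=value lines into a dict (no escape handling in this sketch)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # comment syntax is assumed
            continue
        key, _sep, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def install_succeeded(props, exit_code):
    """Apply success_exit_codes; exit codes are ignored when it is not set."""
    codes = props.get("success_exit_codes")
    if codes is None:
        return True
    return exit_code in {int(c) for c in codes.split(",") if c.strip()}


def needs_reinstall(props, changed_files):
    """Return True if any changed file matches an entry in dependfiles."""
    patterns = [p.strip() for p in props.get("dependfiles", "").split(",") if p.strip()]
    return any(fnmatchcase(name, pattern)
               for name in changed_files for pattern in patterns)
```

Applied to the script in Example 7.2, a change to mypackage.msi triggers a reinstall, and only exit code 0 counts as success.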

Service Run-As
There are often cases where Services require specific user permissions in order to access needed resources.
By creating the Engine process as a given user, all Service invocations executed by the Engine can operate
with these permissions. Service Run-As (or RA) allows for specification of authentication domain accounts
under which Service invocations will execute.
By default, all RA credentials are authenticated on the Engine Daemon in order to verify that the credentials
are valid for the Engine’s authentication domain. Service RA authentication may be disabled on the Broker,
but in most installations this is discouraged unless there is a specific reason for doing so. If Service RA
authentication is disabled, then Driver user authentication should be enabled to prevent unauthorized users
from submitting Services that may run under arbitrary accounts. Also note that while disabling this
authentication step removes the need for passwords, such Services may only run on Unix Engines due to
restrictions in the Windows API.
Note that Service Run-As only supports the Service model; there is no support for RA using the legacy Job
API.

Types of Credentials
There are two ways in which Service Run-as credentials may be specified for a given Service:

Stored Credentials
Service Run-as credentials are entered on the Director with the GridServer Administration Tool and are
synchronized with all Brokers. These credentials are linked to Services in the Service Type Registry by
specifying the username in the RunAsUser field. Credentials in the repository consist of a username and a
password. The username may be in Windows DOMAIN/username format if domain-specific authentication is
required. This domain is ignored by Unix Engines.

“Pass through” Credentials


The Driver provides the username of the current Principal that is logged in and is running the Driver. The
password is provided as a DriverManager property, CURRENT_USER_PASSWORD. These are referred to as “pass
through” credentials. A password set on the Driver is required in order to prevent user account spoofing
between authentication domains (for example, logging in as a local user on the Driver machine to pose as
an LDAP user in the credentials DB).
“Pass through” credentials are indicated for a Service in the Service Type Registry with the $ token. This
token is substituted with the username of the current principal that is executing the Driver process. The token
may also be prepended with a Windows domain if domain-specific authentication is required. This domain
is ignored by Unix Engines.

Using Run-As
To use Run-As, you must do three things: set up Engines, add credentials, and associate credentials with
Service Types.

Engine Setup
To set up Engines for Service RA:
Unix Engines

For Unix Engines, from the DSEngine directory, after running configure.sh, but before you start the
Engine for the first time, do the following:
1. Change mode of all files to be group read/writable:
find . | xargs chmod g+u

2. Change ownership of the invokeRA program to root, and change it to be set UID:
sudo chown root bin/invokeRA
sudo chmod +s bin/invokeRA



3. Set the Engine user’s umask to make these permissions the default:
umask 002

4. Start the Engine:
./engine.sh

Windows Engines

For Windows Engines:


1. Right-click the Engine’s install directory, select Properties, and under the Security tab use Add... to
add all users that you intend to run Services as.
2. Select the Allow check box for Full Control.
3. From the Start menu, click Settings, then Control Panel, then Administrative Tools, then Services.
Right-click the Service running the Engine and select Properties. You will need to ensure that the
Engine Daemon user is allowed to interact with the desktop. If the Local System user is selected,
select the Allow Service to Interact with the Desktop check box.
4. The domain user who launches the Engine service in Windows needs to have the following security
privileges set. Click the Start menu, then click Settings, click Control Panel, click Administrative
Tools, then click Local Security Policy. Click Local Policies, then click User Rights Assignment, and
add the user who launches the Engine service to the following policies:
SE_TCB_NAME (“Act as part of the operating system”)
SE_CHANGE_NOTIFY_NAME (“Bypass traverse checking”)
SE_ASSIGNPRIMARYTOKEN_NAME (“Replace a process level token”)
SE_INCREASE_QUOTA_NAME (“Increase quotas” or “Adjust memory quotas for a process”)
If you are using .NET Services that use XML serialization, complete the following steps:
1. Right-click the Engine’s temp directory in its Windows system directory (C:\WINNT\temp for
Windows 2000, C:\Windows\temp for Windows XP and Windows Server 2003), select Properties,
and under the Security tab, use Add... to add all users that you intend to run Services as.
2. Select the Allow check box, for Read, Write, and Delete permissions. Note that the Delete
permission is set using the Advanced button on the Security page of the Windows Explorer folder
properties dialog box.

Managing Credentials
The Credentials DB is a store of RA credentials on the Director and Brokers to be used for RA services. It
is maintained on the Director and synchronized with Brokers.
The Credential Repository page in the GridServer Administration Tool enables you to create, edit, and
delete RA credentials.
To add new Credentials to your Manager:
1. Log in to the GridServer Administration Tool.
2. Click the Admin tab, then click Credentials Repository.
3. Enter the name of a credential, a password, and then enter the same password again.
4. Click Add.

Manage Service Types


The Service Type Registry entries allow specification of an RA username for use with that Service.
To specify a Run-As user for a Service Type:
1. Log in to the GridServer Administration Tool.
2. Click the Services tab, then click Service Type Registry.
3. For an existing Service Type, go to the Actions control for that Service Type and select Edit Service
Type. This opens the Service Type Editor window.
4. In the Service Type Editor window, under the ContainerBinding header, enter the user name in
RunAsUser.
Note that in this field, you can use $ to indicate the Driver’s current user. Leaving this value blank (the
default) indicates that the process will run as the same user running the Engine Daemon.
It is also possible to specify a Windows domain in the RunAsUser field. For example, if you are using a
Unix Driver (which would not be in a Windows domain) and you want to run Services on Windows Engines
using a specific user and domain, you can specify this in the form domain/username. The forward slash will
be translated to a backslash. For example, specifying DATASYNAPSE/BILL will run Services as the user BILL in
the DATASYNAPSE Windows domain (DATASYNAPSE\BILL).
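
The rules for the RunAsUser field can be summarized in a short Python sketch. The function and parameter names are hypothetical, and the sketch only models the rules stated in this section (blank value, $ token, domain prefix, and Unix Engines ignoring the domain); it is not GridServer's implementation.

```python
def resolve_run_as(run_as_user, driver_user, daemon_user, windows_engine):
    """Return the account a Service would run as under the documented rules."""
    if not run_as_user:
        # Blank (the default): run as the user running the Engine Daemon.
        return daemon_user
    # Split an optional "domain/" prefix from the user part.
    domain, separator, user = run_as_user.rpartition("/")
    if user == "$":
        # "$" stands for the Driver's current user ("pass through").
        user = driver_user
    if not separator or not windows_engine:
        # No domain was given, or a Unix Engine ignores it.
        return user
    # On Windows the forward slash is translated to a backslash.
    return domain + "\\" + user
```

For example, DATASYNAPSE/BILL resolves to DATASYNAPSE\BILL on a Windows Engine and to BILL on a Unix Engine.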






Chapter 8
The Batch Scheduling Facility

Introduction
Commands and Services can be scheduled to run on a regular basis using the Batch Scheduling Facility. A
Batch Definition contains instructions in the form of components that define scheduling and what the Batch
will execute. When the Batch Definition is scheduled on the Manager, it creates a Batch Entry, which
typically waits until its scheduled time, then executes, creating a Batch Execution. Services are executed
using an embedded Driver on the Manager.
Using the Batch Editor page in the GridServer Administration Tool, you can write a Batch Definition with
specific scheduling instructions. You can specify a Batch Definition to immediately execute when
scheduled, or it can wait until a given time and date. A Batch Definition can be submitted to run at a
specific absolute time, or a relative time, such as every hour. They can also be written to wait for an
event, such as a new, modified, or deleted file.
Batch Definitions contain one or more components contained within a batch component. A Command
component contains a program that will be run by the Batch Definition. A schedule or event component
will specify when subsequent Command components will run.
FIGURE 8-1: A Batch Definition consists of Batch Components. When a Batch Definition is scheduled, it
creates a Batch Entry, and will run as defined by the Batch Components. When it runs, it creates a Batch
Execution, which then executes the components according to the definition.

Terminology
The following terms are used to describe components related to the Batch Scheduling Facility:

Name              Page                     Description
Batch Definition  Batch Registry           How a Batch is written. The Batch Definition is edited with the
                                           Batch Editor page and contains a Batch Component, which in turn
                                           contains other components that define the Batch. Once created, it
                                           can be managed from the Batch Registry page.
Batch Component   Batch Editor             When a Batch Definition is created, it consists of a Batch
                                           component, which can contain other components, such as
                                           ServiceCommand components, Conditional components, and other
                                           Batch Components. The Batch Editor page enables you to add,
                                           remove, and edit Batch components and other components it
                                           contains.
Batch Entry       Batch Schedule           When a Batch Definition has been instantiated by being scheduled
                                           on the Batch Schedule page, a Batch Entry is created. The Batch
                                           Entry will either run immediately, or wait to run, depending on
                                           what scheduling components were added to the Batch Definition.
Batch Execution   Batch Admin              When a Batch Entry runs, it creates a Batch Execution, which does
                                           whatever was defined in the Batch Definition. For example, if a
                                           Batch Definition uses the ServiceCommand to start ten Service
                                           Sessions, the Batch Execution will do that. The Batch Execution
                                           is managed on the Batch Admin page. Any actual Service Sessions
                                           created can be managed on the Service Session page on the
                                           Services tab.
Service Runner    Service Runner Registry  Service Runners enable you to define a registered Service Type
                                           with options and init data that can be used in a Batch Definition.

Editing Batch Definitions


To create a new Batch Definition, click the Batch tab in the Administration Tool, then click Batch
Registry. The Batch Registry page contains a list of Batch Definitions on the Manager, plus a blank box
for entering the name of a new Batch Definition. In the Action column, there is an Action list for each
Batch Definition. From each Action list, you can select Edit Batch Definition to edit a Batch Definition,
Rename Batch Definition to rename a Batch Definition, Copy Batch Definition to copy a Batch
Definition, Delete Batch Definition to remove a Batch Definition, Export Batch Definition to save an
XML file of the Batch Definition, or Schedule Batch Definition to place a Batch Definition in the
Manager’s Batch queue. You can also select Batch View to display a graphical representation of the Batch
Definition in a new window.
FIGURE 8-2: The Batch Definition Editor.
To edit a Batch Definition, either select Edit Batch Definition from an existing Batch Definition’s Action
list, or type the name of a new Batch Definition in the empty box at the end of the list and click Add. This
opens a window, shown in Figure 8-2, containing parameters for your new Batch Definition. You can then
change the values of parameters, and click Save to save the values as a Batch Definition on the Manager, or
click Cancel to exit the Batch Editor and discard any changes you have made.






The Batch Definition parameters are as follows:

Parameter        Description
Batch Component
Name             The name of the Batch Definition. If this is a new Batch Definition, this is the name
                 you initially typed in the blank box prior to selecting Add, and is not editable. (You
                 can rename a Batch Definition by selecting the Rename action from the Batch
                 Registry page.) If an additional Batch component is added to a Batch Definition,
                 you can set its name.
Type             Determines how a Batch Definition is run, either in serial or parallel. If set to
                 parallel, all Batch components are executed when the Batch Definition is scheduled.
                 If set to serial, Batch components are executed in the order in which they were
                 added. If any of the components fail, it prevents the Batch from continuing, and the
                 Batch will fail. The default is serial.
Schedule Component
Type             Sets the type of the Schedule. If Immediate, the Batch Definition will run when
                 scheduled. If Absolute, the Batch Definition will run once according to the date set
                 in startTime. If Relative, the Batch Definition will run after the number of minutes
                 specified in minuteDelay, optionally repeating or executing immediately according to
                 repeat and runNow. If Cron, the Batch Definition will run according to the values set
                 in cron. If Manager Startup, the Batch Definition will run when the Manager is first
                 initialized.
Add component    Adds a component to the Batch Definition. A Batch Definition can contain one or more
                 components, which are described below.
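
One possible reading of the Relative schedule type is sketched below in Python. The guide does not spell out exactly how minuteDelay, repeat, and runNow interact, so the interpretation here is an assumption: runNow adds an immediate run, the first delayed run happens minuteDelay minutes after scheduling, and repeat keeps adding runs at the same interval. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def relative_run_times(scheduled_at, minute_delay, repeat=False, run_now=False):
    """Yield run times for a Relative schedule under the assumptions above.

    With repeat=True the generator never ends, so callers should stop
    consuming it once they have the run times they need.
    """
    if run_now:
        yield scheduled_at
    next_run = scheduled_at + timedelta(minutes=minute_delay)
    yield next_run
    while repeat:
        next_run += timedelta(minutes=minute_delay)
        yield next_run
```

For example, a Batch scheduled at 09:00 with a minuteDelay of 60 and repeat enabled would, under this reading, run at 10:00, 11:00, 12:00, and so on.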

Batch Components
The parameters in the Batch Editor window correspond to components contained in the Batch Definition.
Each Batch Definition can contain one or more Batch components. These components can be commands,
events, or other Batch Definitions. For example, a LogCommand Component is shown in Figure 8-3. To add
a component to a Batch Definition, select a component from the add component list.
FIGURE 8-3: A Batch component.
Batch components are processed in a Batch Definition in order when Batch Type, described above, is set to
serial. You can change the order of Batch components by clicking the Move Up and Move Down buttons
in the upper-right corner of each Batch component, to move that component’s order up or down in the
Batch Definition. You can also remove a Batch component by clicking the Remove button in the
upper-right corner.
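
The difference between the serial and parallel Batch types can be illustrated with a small Python model. This is a conceptual sketch, not the Batch Scheduling Facility's implementation: components are modeled as hypothetical zero-argument callables that return True on success, and threads stand in for whatever mechanism the Manager uses to run components in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch(components, batch_type="serial"):
    """Run components and return True only if the whole Batch succeeds."""
    if batch_type == "serial":
        # Components run in the order they were added; the first failure
        # prevents the Batch from continuing.
        for component in components:
            if not component():
                return False
        return True
    if batch_type == "parallel":
        if not components:
            return True
        # All components are started together; order is not significant.
        with ThreadPoolExecutor(max_workers=len(components)) as pool:
            results = list(pool.map(lambda component: component(), components))
        return all(results)
    raise ValueError(f"unknown Batch type: {batch_type}")
```

In the serial case, moving a component up or down changes when it runs and which later components are skipped after a failure; in the parallel case, reordering has no effect on the outcome.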




Each type of Batch component that can be added to a Batch Definition is described below. In the
Batch Editor window, a help description is provided for each Batch component shown. By default,
Extended Help is displayed. Using the help control in the upper-right corner, you can select Help to display
only the first sentence of help, or No Help to suppress the help display.

Name                    Description
Batch                   Contains another Batch Definition. This can be used to create a complex or
                        multi-leveled Batch Definition. For example, a parent Batch Definition could
                        start each day, starting two child Batch Definitions, each with different
                        schedules or conditions.
                        For each new Batch component, you must set the same parameters for a Batch
                        Definition as described above. You can then add additional components to the
                        Batch.
Conditional             Provides conditional processing when running Batches. The component
                        specified by test is run. If it runs successfully, the component specified by
                        success is executed. If it fails, the component specified by failed is executed.
                        The component specified in test returns success in the following conditions:
                        • Command returns Command.SUCCESS
                        • ServiceCommand creates the Service and submits the invocation
                          without exception
                        • ServiceRunnerCommand creates the Service and submits all
                          invocations without exception
BatchReference          Contains a reference to a registered Batch Definition that gets loaded when
                        scheduled from the Batch Registry.
Command                 Runs an implemented method in a deployed class.
ServiceCommand          Starts a Service. You can specify a Service type registered on the Manager and
                        method name to run. You can also specify a Service reference ID (this enables
                        you to reference the Service from another Service Command), Service action,
                        and input and init data for the Service. Data is comma-delimited.
                        You can add ServiceDescription, ServiceOptions, and Discriminator components
                        to a Service by using a Service Runner.
ServiceRunnerReference  Loads the specified registered Service Runner. See below for information on
                        registering a Service Runner.
AdminCommand            Executes a command via the GridServer Admin API. For more information on
                        using the Admin API, see Chapter 10, “GridServer Admin API” on page 89 of
                        the GridServer Developer’s Guide.



EmailCommand Sends an email message from a Batch Definition, for notification or alerts. You
can enter a comma-delimited list of email addresses for recipients, and a message
string, which will be used as both the subject and the body.

Note that in order for email to be sent, you must define an SMTP server in your
Manager Configuration. To do this, click the Manager tab, click Manager
Configuration, click Admin, and enter a value in SMTP Host under the Mail
heading.
EmailFileCommand Sends an email message from a Batch Definition that includes files as
attachments, typically used to send the output of a previous command by
saving that output to a file. You can enter a subject, a message body
string, a comma-delimited list of email addresses, and a semicolon-
delimited list of files, which will then be sent as attachments in the
message.

The setup rules given above in the description of the EmailCommand component
also apply to the EmailFileCommand component.
ExecCommand Executes a command from a Batch. This will execute a command from the
application server’s root directory. You can set an input, output, and error file,
plus a log file for the command to be run.
LogCommand Writes a string to the Manager log. This is useful for testing Batches or indicating
when a Batch is starting or stopping.
WaitCommand Halts for a moment before proceeding. The amount of wait time is specified in
seconds. Note that this component is only useful for generating a wait time when
the Batch type is serial.
EngineWeightCommand Sets the Engine distribution weighting relative to other Brokers. The Brokers
must be logged in to the Director during execution, and must also be logged in to appear in
the Batch Editor. The current Broker list is fetched only when adding a new
EngineWeightCommand component in the Batch Editor.
Event Makes a Batch File wait for an implemented event to take place. You can use this
to pause until a specific condition in a class you deployed has occurred.
FileEvent Makes a Batch wait for a file event to occur before completing the remaining
items in the Batch Definition. Specifically, it enables you to watch a file and wait
until it is created, deleted, or modified before proceeding.
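
The branching behavior of the Conditional component described in the table above can be pictured with a short sketch. This is illustrative Python only, not the GridServer Batch API: the function name run_conditional, and the convention that a component is a callable returning True on success, are assumptions made for the example.

```python
from typing import Callable


def run_conditional(test: Callable[[], bool],
                    success: Callable[[], None],
                    failed: Callable[[], None]) -> str:
    """Run `test`, then run `success` if it succeeded, otherwise `failed`.

    Hypothetical model: a component succeeds when it returns True and
    fails when it returns False or raises an exception.
    """
    try:
        ok = bool(test())
    except Exception:
        ok = False
    if ok:
        success()
        return "success"
    failed()
    return "failed"


log = []
outcome = run_conditional(
    test=lambda: True,
    success=lambda: log.append("ran success branch"),
    failed=lambda: log.append("ran failed branch"),
)
```

Only one of the two branches runs for any given execution, which is what makes the component useful for alerting (for example, placing an EmailCommand in the failed branch).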

Service Runners
Service Runners enable you to define a registered Service Type with options and init data that can be used
in a Batch Definition. They can also be used to chain together Service Types and discriminators into a single
unit that can be used in a Batch Definition.




To create a Service Runner, click the Service Runner Registry page. Type the name of a Service Runner in
the box and click Add. This will open a Service Runner Editor page, where you can choose a Service Type
and enter init data, a description, and method names and input data for invocations. You can also use the list
at the bottom of the page to add discriminators, Service input description data, and Service options.
The Service Runner Registry also lists all Service Runners existing on a Manager. Using the Actions
controls, you can edit, rename, copy, delete, export, or launch each Service Runner.

Scheduling Batch Definitions


After you have created a Batch Definition with the Batch Editor page, it will be listed with the other Batch
Definitions on the Batch Registry page. However, these Batch Definitions are not actually running on the
Manager yet. To create a Batch from a Batch Definition, you must first schedule it. This actually instantiates
a Batch and inserts it into the Manager’s batch queue.
To schedule a Batch Definition, click the Batch Registry page, and find the Batch Definition in the list.
Select Schedule Batch Definition from the Actions control. This will schedule the Batch Definition, and
open the Batch Schedule page, displaying it as a Batch Entry.

The Batch Schedule Page


Batch Entries on a Manager can be listed and administered on the Batch Schedule page. To do this, click
the Batch tab, then click the Batch Schedule page. All Batch Entries resident on the Manager are listed.
To remove or edit an existing Batch Entry or view logs or Batch executions, select a command from the
Actions control next to the relevant Batch.

FIGURE 8-4: The Batch Schedule page.

Running Batches
Batch Entries will automatically run when they reach the scheduled time or conditions defined in their
Batch Definition. When this happens, Batch Executions are created and displayed on the Batch Admin
page. PDriver Batches (which are also Batch Executions) are also displayed on this page. On the
Batch Admin page, you can monitor Batch Executions, search for logs, and display the Batch Monitor
applet to view what parts of a Batch have completed.
Any Services that are run by the Batch Execution are displayed on the Service Session Admin page. From
there, you can cancel Service Sessions, view Tasks, or do any other actions you normally would with a
Service. Note that it is possible to have a Batch Execution run a Service that continues to run, even after the
Batch Execution reports that it is finished.






Deploying Batch Resources
Java Services, Commands, and other resources must be placed in
[GS Manager Root]/webapps/livecluster/WEB-INF/batch/jar to be properly loaded by the embedded
Driver.
For more information on resource deployment, see Chapter 7, “Application Resource Deployment” on
page 43.

Batch Fault-Tolerance
Batch Schedules that exist on a Manager are persistent, provided the Next Run field is not set to never. This
provides failover capability in the event of a Manager failure, as the Batch Schedules will still exist when
the Manager is restarted.
The following Batch Schedules are persistent:
• Absolute schedules
• Relative schedules with repeat
• Cron schedules
All persistent Batches are restarted when the Manager is restarted, just as if they were being scheduled for the
first time. Batch runs that would have occurred while the Manager was down are ignored.

Using PDriver in a Batch


You can use PDriver within a Batch, with the following configuration changes:
1. Download the GridServer SDK to your Broker machine.
2. Write a batch or shell script to run your PDriver job on the Broker.
3. Create a Batch Definition that uses the ExecCommand component to run that script.






Chapter 9
Configuring Security

Introduction
GridServer provides a rich set of security options for integrating into your organization’s computing
environment. GridServer does not impose its own security policy; instead you select from the features
available to implement your preferred policy. The key security areas of authentication, access control and
authorization, event logging, data validation, and cryptography are discussed.

Authentication
Authentication is the process of determining if an entity is what it claims to be. In keeping with the
GridServer philosophy of providing a flexible set of tools that can be used to implement an organization’s
security policy, GridServer provides both a built-in authentication service and an extensible set of hooks for
integrating with external authentication systems.

Operating System Users


By default, GridServer does not authenticate using operating system accounts. Operating system accounts
are used to start GridServer software components, like the Manager, Engine, and Driver. It is not required
to use a superuser operating system account to start any GridServer component. Certain features do require
superuser level access. For instance, to use GridServer’s UIIdle scheduling mode on Windows, at least the
DSHook UI event timing service must run as superuser.

It is possible to use operating system user authentication for GridServer authentication. See “Extensible
Authentication Hooks” on page 70 for more information.
Authentication of operating system users is handled by the operating system in question.

Grid Users
Users of Grid Services may be either compute Service users or administrative users. In either case they are
authenticated through the same mechanism.
GridServer is responsible for authenticating Grid users according to the policy defined by the administrator.
Extensible authentication hooks can be used to interface to an external authentication system such as Active
Directory, LDAP, or NIS.
Once a Grid user has been authenticated, they are given an authentication token to use in further
correspondence. In the case of Administration Tool or Web Services users, the authentication token is a
standard HTTP session cookie. In the case where compute users connect via the DataSynapse APIs, the
authentication token is a DataSynapse object.




User accounts are added or modified with the User Admin page, located on the Admin tab in the
Administration Tool. Each user account is given an access level, which dictates what features of the
Administration Tool they can use. For further details on access levels and their corresponding permissions,
see Chapter 6, “The GridServer Administration Tool” on page 36.

GridServer Built-In Authentication


GridServer’s built-in authentication mechanism uses the embedded Director database (the internal database)
to authenticate Grid users. Administration Tool users must be authenticated with a username and password
before they can access the Administration Tool. Likewise, Web Services users must be authenticated with a
username and password. The DataSynapse Client APIs (JDriver, CPPDriver, PDriver) do not require
authentication by default, but authentication can be enabled.
GridServer built-in authentication includes options for minimum username length, minimum password
length, password complexity, password aging, and application behavior on password failure.
Password authentication can be configured on the Manager Configuration page, in the Security section.

Extensible Authentication Hooks


Many environments already have a suitable authentication service that can be used by GridServer. For
instance, the organization may be running an LDAP-based service like Active Directory. In this case the
organization’s policy may be to centralize all authentication information in Active Directory. GridServer’s
extensible authentication hooks can be used to integrate with existing authentication services.
Since there is no universally-accepted standard for Grid authentication nor for application authentication,
DataSynapse has chosen to create its own interfaces, DriverAuthenticationHook and UserDatabaseHook, that
can be used to integrate existing authentication models. We provide example implementations for these
hooks to integrate with LDAP. Since LDAP bindings for Grid authentication can be expected to vary from
organization to organization, it may be necessary to modify the example implementations to work with your
bindings. An additional authentication hook example is provided for NTLM.

Enabling Client Authentication


By default, any client is allowed to log in to a Manager. However, the Manager can be configured to allow
only Drivers with a valid Grid User identity that is associated with a Driver Profile to log in. Driver
Authentication is a Director setting, and should be set on all Directors.
To enable Driver authentication:
1. Click the Manager tab on the Director.
2. Click Manager Configuration.
3. Click Engines and Clients.
4. In Client Authentication Enabled, enter True.
5. Click Save.
After authentication is enabled, you will then need to allow clients to log in. To do this, a Driver Profile
must be assigned to a Grid User. For example:
1. Click the Driver tab.



2. Click Driver Profiles.
3. Create a new Driver Profile and save it.
4. Click the Admin tab.
5. Click the User Admin page.
6. Create a new user, and assign the profile to that user.
For Drivers, the username and password are assigned using the driver.properties file or the API.
For SOAP clients, they are set using HTTP basic authentication. Most SOAP packages provide a method
for setting the username/password on the proxy.
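
For SOAP toolkits that do not expose a username/password setting, the HTTP basic authentication header can be built by hand. The sketch below uses only the Python standard library; the function name basic_auth_header and the sample credentials are illustrative assumptions, and how the header is attached to a request depends on the SOAP package in use.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header used by HTTP basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Hypothetical Grid user credentials, for illustration only.
headers = basic_auth_header("griduser", "secret")
```

Basic authentication only Base64-encodes the credentials; it does not encrypt them, so it is best combined with the HTTPS settings described in the SSL section.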

SSL
SSL (Secure Sockets Layer) can be enabled for communication at each level in the GridServer
architecture, depending on the security requirements of the organization and the deployment
scenarios involved. SSL provides both encryption of messaging between components and a means for
the client to establish trust in the server. In addition, SSL can be used for resource downloading by
Engines and for use of the Administration Tool. In general, HTTP communication can be completely
disabled, and all GridServer components can be run using only HTTPS.

Communication Overview
To understand how SSL is used for messaging, it is important to understand how components establish
communication channels with each other. For the remainder of this discussion, the terms “client” and
“server” will be used in the traditional way, that is, a client/server relationship. For example, the Engine
Daemon is a “client” to the Director’s “server”.
There are two aspects to establishing communication. The first is the login process. The client requests
a login via a known communication channel. At that point, the server may perform authentication or
validation, and if successful, it returns a connection for use from then on. The second is that connection
itself. Note that the connection may be on a different server from the one used for login. For example, an
Engine logs in via a Director, but the connection exists on a Broker.
SSL is configurable for both aspects. If SSL is to be used for login, it must be configured on the client. If
SSL is to be used for the connection, it must be enabled on the server. For example, to enable a Driver to
log in via SSL, the Driver must be set to use the HTTPS URL of the Director, either via the
driver.properties file or the API. To enable HTTPS communication between the Driver and Broker after
login, HTTPS must be set on the Broker, typically by configuring all Messaging and Download URLs to the
HTTPS URL.

Certificate Overview
All SSL clients establish a trust relationship with their server. This is performed via a certificate on the client
side, which essentially is a public key that is associated with a private key on the server. When establishing
the trust relationship, the server’s certificate must either have been signed by a key trusted by the client, or
be trusted implicitly by the client (a self-signed certificate). Most SSL clients contain a set of trusted
Certificate Authorities (CAs), so that if a server has a certificate signed by one of those CAs, it will
automatically trust the server. If the server’s certificate is self-signed, that certificate must be added to the
client’s list of trusted servers.




In addition, the client may check the Common Name (CN) of the server’s certificate against the hostname
of the server, to verify that the certificate is being used on the intended host.
GridServer is packaged with a default self-signed key-pair and certificate. All clients have a local copy of
the certificate added to their list of trusted servers. In addition, hostname verification is disabled by default,
as the CN will not match the server’s hostname. This configuration allows immediate use of SSL without any
additional setup. This may or may not be sufficient, depending on your needs.
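
The two client-side checks described above (trusting a certificate, and optionally verifying the hostname) are independent of each other. The following sketch shows the same distinction using Python's standard ssl module rather than any GridServer component; the function name make_client_context and its parameters are assumptions made for illustration.

```python
import ssl
from typing import Optional


def make_client_context(trust_file: Optional[str] = None,
                        verify_hostname: bool = True) -> ssl.SSLContext:
    """Create a client-side TLS context.

    trust_file: PEM file containing the certificates to trust (for example,
                a self-signed server certificate); None falls back to the
                system's default CA set.
    verify_hostname: when False, the certificate chain is still validated,
                but the certificate's name is not compared with the host.
    """
    context = ssl.create_default_context(cafile=trust_file)
    context.check_hostname = verify_hostname
    return context


# Comparable in spirit to the default setup: trust is still required,
# but hostname verification is switched off.
ctx = make_client_context(verify_hostname=False)
```

Turning hostname verification off keeps encryption and certificate trust in place, but it no longer confirms that the certificate is being used on the intended host.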

Keypair and Cert Location


All Managers must contain a keypair, either self-signed or CA-signed. The default keypair is stored in a keystore,
located at [GS Manager Root]/webapps/livecluster/WEB-INF/certs/server.keystore. The keystore
password is configurable via the Manager Configuration page, in the Security section, under the SSL
Certificates heading.
If you’ve replaced the cert on the manager with one signed by your own CA, you need to replace the cert in
each downloaded SDK. If you have your own CA and ROOT_CA.pem contains its cert:
• The ROOT_CA.pem file should be imported into config/ssl.keystore as a trusted cert. This is for JDriver
and .NET.
• ROOT_CA.pem should be renamed ssl.pem (replacing the existing one) in the config directory. This is for
C++-based code (including PDriver).
The default SSL trust files are ssl.keystore, ssl.crt, and ssl.pem for JDriver, .NETDriver, and
CPP/PDriver, respectively. New certificates can be used by either importing them into the appropriate one
of these files, or by changing the DSSSLTrustFile property in driver.properties or the
DriverManager.SSL_TRUST_FILE option through the API to the file containing the certs.

Types of Connections Using SSL


It is possible to enable SSL on several different types of connections within GridServer. SSL can be used
for Driver connections, Engine and Engine Daemon connections, Broker and Director communication, and
Engine resources.
There are two methods for enabling SSL within GridServer. The first is to enable Manager HTTPS and then
enable SSL on some components. The other method is to enable HTTPS to all components. Both methods
are detailed below.

Enabling HTTPS on the Application Server


To enable HTTPS, you must first enable HTTPS on the Manager’s application server. You can then
configure HTTPS on any of the connections to components.
To enable HTTPS on the application server:
1. Log in to the GridServer Administration Tool.
2. Click the Admin tab, then click Manager Reconfigure.
3. Click the Resin Configuration option.
4. Proceed to step 4 of the Resin Configuration, the Resin SSL page. Click Enable SSL and enter an
SSL port, or use the default of 8443.



5. Complete the Manager Reconfigure steps and restart your application server.
6. After restart, open the URL to your GridServer Administration Tool. You will be presented with the
Manager Installation page. Complete the installation (enabling HTTPS on components if needed,
described in the next section) and restart your application server.

Enabling HTTPS on all Components


Because it is possible to enable SSL on several different types of connections within GridServer, the option
is available to enable everything with SSL on installation, for those who want to run a pure SSL
environment.
To do this:
1. Complete the above procedure for enabling HTTPS on the application server up to step 6, and start the
Manager Installation.
2. On step 3 of the Manager Installation, you are given the option to select Protocol and Port for both
Web Administration and Messaging and Resource Download.
The Web Administration settings are used for connections for the GridServer Administration Tool.
When this is set to HTTPS (typically with port set to 8443), any attempted HTTP connection will be
rerouted to an HTTPS connection on this port.
The Messaging and Resource Download settings are used for all Engine and Client messaging and
Resource Downloads. Setting this protocol to HTTPS will cause all connections to use HTTPS. To
configure HTTPS for only a subset of these, such as HTTPS only for Resources, you should set this
protocol to HTTP, and then set HTTPS for individual components in the Manager Configuration after
installation. Each component’s specific settings are described below.
3. Complete the remaining steps in the configuration, then click Start Installation to complete the
installation/reconfiguration. You will need to restart your application server.
Note that if you have already installed Drivers from this GridServer installation, their driver.properties
files must be edited to point to the new HTTPS URL before they will use SSL. Engines will reconfigure
themselves to use the new secure installation: the Director URLs in all Engine Configurations
are changed to https://host:sslport.

Driver SSL
All Driver certificates can be found in the SDK stored in the config directory. Drivers will look for this
certificate in this directory by default. The Driver can use a different location if desired; see the API for more
information. If your server is using a CA-signed certificate, there is no need for the default certificate. The
JDriver keystore includes all certificates packaged with the Java 1.4.2 cacerts file, plus the GridServer
default certificate.
HTTPS must be enabled on the Director for login, and on the Brokers for the connection.
To enable SSL for Driver login, you must set the Director URLs to the HTTPS location, either via the
driver.properties file (with the DSPrimaryDirector property) or by setting the URL programmatically
through the DriverManager API.
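
When many installed Drivers need their driver.properties file updated, the edit can be scripted. The sketch below is a minimal example under stated assumptions: it assumes the DSPrimaryDirector property is written as a plain key=value line whose value is an http:// URL, and the host name and path in the sample are hypothetical. Check an actual driver.properties file before relying on it.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url: str, ssl_port: int = 8443) -> str:
    """Return `url` with the scheme changed to https and the port replaced."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return urlunsplit(("https", f"{host}:{ssl_port}",
                       parts.path, parts.query, parts.fragment))


def rewrite_director_url(properties_text: str,
                         key: str = "DSPrimaryDirector",
                         ssl_port: int = 8443) -> str:
    """Rewrite the `key=value` line for `key` so that its URL uses HTTPS."""
    out = []
    for line in properties_text.splitlines():
        name, sep, value = line.partition("=")
        # Only touch the named property, and only if it is still plain HTTP.
        if sep and name.strip() == key and value.strip().startswith("http://"):
            line = f"{name}={to_https(value.strip(), ssl_port)}"
        out.append(line)
    return "\n".join(out)


# Hypothetical file content; the host and path are placeholders.
sample = "DSPrimaryDirector=http://director.example.com:8000/example/path\nOther=1"
updated = rewrite_director_url(sample)
```

Lines for other properties pass through unchanged, so the function can be applied to a whole file's text.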




To enable SSL for Driver communication, you must enable it on every Broker on which you wish to use it. This setting
will affect any Driver that is logged in to that Broker. If your Broker is configured to use HTTPS for all
Messaging, Drivers will already use HTTPS.
If you did not enable HTTPS for all messaging and want to enable SSL for Driver communication:
1. Click the Manager tab.
2. Click Manager Configuration.
3. Click Security.
4. Under HTTPS Communication, set Use HTTPS for Client Communication to True.
5. Click Save.
If you wish to use hostname verification, it can be enabled via the driver.properties file or API. Keep in
mind that you have to create and install your own keypair corresponding to the CN of the host.

Engines and Engine Daemon SSL


The Engine Daemon and Engine use the ssl.pem and ssl.keystore files, respectively, found in the Engine’s
root directory.
HTTPS must be enabled on the Director for login and connection for Daemons, and on the Brokers for the
connection for Engines.
To enable SSL for Engine and Engine Daemon login, you must set the Directors to the HTTPS location in
the Engine Configuration.
To enable SSL for Engine communication, you must enable it on every Broker on which you wish to use it. SSL is
enabled for Engine Daemons on Directors. If your Broker is configured to use HTTPS for all Messaging,
Engines will already use HTTPS.
If you did not enable HTTPS for all messaging and want to enable SSL for Engines on a Broker:
1. Click the Manager tab.
2. Click Manager Configuration.
3. Click Security.
4. Under HTTPS Communication, set Use HTTPS for Engine Communication to True.
5. Click Save.
To enable SSL for Engine Daemons on a Director:
1. Click the Manager tab.
2. Click Manager Configuration.
3. Click Security.
4. Under HTTPS Communication, set Use HTTPS for Engine Daemon Communication to True.
5. Click Save.
If you wish to use hostname verification, it can be enabled via the Engine Configuration. Keep in mind that
you have to create and install your own keypair corresponding to the CN of the host.



Brokers and Director SSL
The communication between Brokers and Directors, and the Secondary Director and Primary Director can
also be configured to use SSL. Note that because they use pure sockets for communication, HTTPS does not
need to be enabled on the Manager.
The default cert is stored in livecluster/WEB-INF/certs/ssl.keystore. Its location is configurable via the
Manager Configuration page, in the SSL section.
To enable SSL for Broker and Secondary Director login:
1. Click the Manager tab on the Director.
2. Click Manager Configuration.
3. Click Security.
4. Under Server-side Socket SSL, set Require SSL for Login to True.
5. Click Save.
6. Click the Manager tab on the Brokers and/or Secondary Director.
7. Click Manager Configuration.
8. Click Security.
9. Set Use SSL for Login to True for all applicable categories (such as Broker-Primary Director).
10. Click Save.
WARNING: If a Director requires SSL, all Brokers and the Secondary Director must also use SSL for
login.
To enable SSL for the connections:
1. Click the Manager tab on the Director.
2. Click Manager Configuration.
3. Click Security.
4. Set Use SSL for Communication to True for the Broker-Primary Director and/or Broker-
Secondary Director Connections.
5. Click Save.
If you wish to use hostname verification, it can be enabled via the Verify Hostname setting on the Security
page. Keep in mind that you have to create and install your own keypair corresponding to the CN of the host.

Resources over HTTPS


The resources used by Engines may be downloaded via HTTPS. In addition to Engines downloading
resources from Brokers, Brokers also download synchronized resources from the Director. Thus there are
two settings. If your Broker is configured to use HTTPS for all Messaging, Resources will already use
HTTPS. Otherwise, the following procedure will enable it.
To enable SSL for the connections:
1. Click the Manager tab.
2. Click Manager Configuration.
3. Click Security.



4. Under Broker Resources, set HTTPS Enabled to True for appropriate settings. On a Manager that
contains only a Broker or Director, there will only be a single setting.

Disabling HTTP
For security reasons, you may want to disable HTTP on the Director and only use HTTPS.
NOTE: 1-Click install will not work if you are accessing the Manager using SSL (through an HTTPS URL).
To disable non-HTTPS connections:
1. Reconfigure the Manager, setting the URL to use the HTTPS URL.
2. Update all Drivers (in the driver.properties files) to use the HTTPS URL.
3. Shut down the Manager and edit the datasynapse/conf/resin.conf file (or whatever RESIN_CONF
refers to) and comment out the <http></http> entry for port 8000. (If you have already successfully
gone through the Resin Configuration pages in the Administration Tool, there will be another,
uncommented <http></http> entry that contains an SSL-enabled tag.)
4. When you restart the Manager, everything should use SSL, with no HTTP port open.

Resource Protection
Resources that are downloaded by Engines are protected from download via HTTPS. This is done in the
following manner:
• The deployment directory is protected such that files cannot be directly downloaded from it.
• When an Engine receives a message to download resources, it is provided a random nonce (a single use
token) that will expire. (This expiration time is configurable via the Manager Configuration page, in
the Security section, in the Resource Deployment heading, in the Broker Resources section, with the
Token Timeout setting.) When the Engine attempts to download data from the URL, it is redirected to
the protected deployment directory. The nonce is then validated by the Manager, and the Engine is
allowed to download the data.
Note that if you are using an alternate base directory, resources are NOT protected.
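
The nonce mechanism described above (a random, single-use token that expires) can be sketched in a few lines. This is a generic illustration in Python, not the Manager's implementation: the class name NonceStore, its methods, and the 30-second timeout are assumptions, and the clock is injectable only so that the example can be exercised without waiting.

```python
import secrets
import time
from typing import Callable, Dict


class NonceStore:
    """Issue single-use tokens that expire after `timeout` seconds."""

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic):
        self._timeout = timeout
        self._clock = clock
        self._expiry: Dict[str, float] = {}

    def issue(self) -> str:
        """Create a random token and remember when it expires."""
        nonce = secrets.token_urlsafe(16)
        self._expiry[nonce] = self._clock() + self._timeout
        return nonce

    def redeem(self, nonce: str) -> bool:
        """Return True once for a valid, unexpired nonce; False otherwise."""
        expires_at = self._expiry.pop(nonce, None)  # pop makes it single use
        return expires_at is not None and self._clock() <= expires_at


now = [0.0]                              # fake clock for the demonstration
store = NonceStore(timeout=30, clock=lambda: now[0])
token = store.issue()
first = store.redeem(token)              # accepted
second = store.redeem(token)             # rejected: already used
late = store.issue()
now[0] = 31.0
expired = store.redeem(late)             # rejected: past the timeout
```

A longer timeout tolerates slow Engines at the cost of a longer window in which an intercepted token could be used.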






Chapter 10
GridServer Performance and Tuning

Diagnosing Performance Problems


To find bottlenecks in application performance, use GridServer’s Instrumentation feature. With
instrumentation enabled, you can get detailed timings of each request submitted to the Broker. These timings
highlight scheduling overhead, data marshalling time and network delays.
Note that Instrumentation measures only GridServer-related times. It does not show other application delays
due to, for example, excessive database load.
For information on turning on Instrumentation, see Chapter 12, “Administration Howto” on page 89. For
more information on instrumentation, see Appendix A, “Task Instrumentation” on page 105 of the
GridServer Developer’s Guide.

Tuning Data Movement


Efficient handling of data can often make or break achieving performance gains in a Grid-enabled
application. Instrumentation will reveal problems with having too much data per request: serialization,
deserialization and network transport times will be high compared to the actual Engine-side compute time.
There are a number of remedies for inefficient data movement. We survey them here in order from simplest
to most complex.

Stateful Processing
GridServer supports two related mechanisms that link client-side service instances to Engine-side state,
thereby reducing the need to transmit the same data many times. The two mechanisms are
initialization/update data, and Service affinity.
Data that is constant across an entire set of task requests should be made Service initialization data.
Initialization data is transmitted once per Engine, rather than once per request. Long-lived volume-based
applications will typically process thousands of requests, and compute-intensive applications should be
designed to create many small requests, rather than few large ones, for a variety of reasons (see Chapter 8,
“GridServer Design Guidelines” on page 79 in the GridServer Developer’s Guide for more information).
If a piece of data is not constant throughout the life of the application, but changes rarely (relative to the
frequency of requests), it can be passed as initialization data and then changed by using an update method.
See Chapter 3, “Creating Services” on page 23 of the GridServer Developer’s Guide for details.
The GridServer scheduler uses the fact that an Engine has initialization data and updates from a particular
Service to route subsequent requests to that Service. This feature, called affinity, further reduces data
movement, because unneeded Engines are not recruited into the Service. (However, if the Service has
pending requests, available but uninitialized Engines will be allocated to it.) Affinity can be further exploited
by dividing the state of an application across multiple client-side Service instances, called Service Sessions.
The application then routes requests to the instance with the appropriate data. For example, in an application
dealing with bonds, each Service instance can be initialized with the data from one or several bonds. When



a request comes in for the value of a particular bond, it is routed to the service instance responsible for that
bond. In this way, a request is likely to arrive on an Engine that already has the bond data loaded, yet no
Engine will be burdened with the entire universe of bonds.
There are Engine and Service parameters related to stateful processing. The Service Session Size parameter,
located on Engine Configuration pages under the Caches heading, controls how much initialization data
can be stored on an Engine in aggregate. In other words, if the total size of init data across all loaded service
instances exceeds the set value of the parameter, then the least-recently used Service instance will be purged
from the cache. If Instrumentation shows a non-zero time for Engine Download Instance the second or
subsequent time an Engine receives a request from a service, that indicates that the service instance was
purged from the cache. Increasing Tasklet Size may then result in improved performance.
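
The purge behavior described above is a size-bounded least-recently-used cache. The sketch below models it in Python to show why an undersized cache causes repeated downloads; the class name InstanceCache, the byte sizes, and the bond identifiers are illustrative assumptions, not the Engine's implementation.

```python
from collections import OrderedDict


class InstanceCache:
    """Hold Service instance init data up to `max_bytes`, evicting the
    least recently used instance when the total is exceeded."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self._sizes = OrderedDict()  # instance id -> size, oldest first

    def touch(self, instance_id: str, size: int) -> bool:
        """Record a request for `instance_id`; return True on a cache hit."""
        hit = instance_id in self._sizes
        self._sizes[instance_id] = size
        self._sizes.move_to_end(instance_id)        # most recently used
        while sum(self._sizes.values()) > self.max_bytes and len(self._sizes) > 1:
            self._sizes.popitem(last=False)         # purge least recently used
        return hit


small = InstanceCache(max_bytes=100)
small_hits = [small.touch("bond-A", 60), small.touch("bond-B", 60),
              small.touch("bond-A", 60)]

large = InstanceCache(max_bytes=200)
large_hits = [large.touch("bond-A", 60), large.touch("bond-B", 60),
              large.touch("bond-A", 60)]
```

With the smaller limit, the second request for bond-A is a miss because loading bond-B purged it; with the larger limit it is a hit. A miss of this kind is what appears as a non-zero download time in Instrumentation.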
The STATE_AFFINITY Service option is a number that controls how strongly the scheduler uses affinity for
this service. The default is 1, so set it to a higher value to give your service preference when Engines are
being allocated by affinity.
The AFFINITY_WAIT Service option controls how long a queued request will avoid being allocated to an
available Engine that has no affinity, in the hope of later being matched to an Engine with affinity. Use this
option when the initialization time for a service instance is large. For instance, say it takes five minutes to
load a bond. If AFFINITY_WAIT is set to two minutes, then a queued request will not be assigned to an available
Engine that lacks affinity for two minutes from the time the first Engine becomes available. If an Engine that
already has loaded the bond becomes available in those two minutes, then the request will be assigned to
that Engine, saving five minutes of startup time.
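The AFFINITY_WAIT decision described above can be expressed as a simple rule. This Python sketch is a simplified model, not the scheduler's implementation; the function name is hypothetical, and seconds are used only for readability and do not reflect the units of the actual option.

```python
def should_assign(engine_has_affinity, seconds_since_first_engine_available,
                  affinity_wait_seconds):
    """Return True if a queued request may be assigned to this available Engine."""
    if engine_has_affinity:
        return True  # an Engine with affinity is always acceptable
    # Without affinity, hold the request until the wait period has elapsed.
    return seconds_since_first_engine_available >= affinity_wait_seconds


AFFINITY_WAIT = 120  # two minutes, as in the example above
early_no_affinity = should_assign(False, 30, AFFINITY_WAIT)
early_with_affinity = should_assign(True, 30, AFFINITY_WAIT)
late_no_affinity = should_assign(False, 150, AFFINITY_WAIT)
```

Only the first case keeps the request queued: thirty seconds into the wait, an Engine without affinity is skipped, while an Engine with affinity is accepted immediately.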

Compression
Setting the COMPRESS_DATA Service option to true (in the Service client or on the Service Type Registry page)
will cause all transmitted data to be compressed. For large amounts of data, the transmission time saved
more than makes up for the time to do the compression.

Packing
Packing multiple requests into a single one can improve performance by amortizing the fixed per-request
overhead of GridServer and the application over multiple units of work. The fixed overhead includes TCP/IP
connection setups for multiple transits, GridServer scheduling, and other possible application initialization
steps.
GridServer’s AUTO_PACK_NUM Service option is an easy way to achieve request packing. If its value is greater
than zero, then that many requests will be packed into a single request, and responses will be unpacked,
transparently to the application. (If the application makes fewer than AUTO_PACK_NUM requests, then the
accumulated requests are transmitted after one second.) Auto-packing amortizes per-request overhead, but
does not factor out common data.
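The effect of packing on fixed overhead can be estimated with a short calculation. The Python sketch below is an illustration only: the function names are hypothetical, and the overhead and work figures are invented rather than measured on GridServer.

```python
def pack(requests, pack_size):
    """Group requests into lists of at most pack_size items."""
    return [requests[i:i + pack_size] for i in range(0, len(requests), pack_size)]


def total_time(num_requests, pack_size, overhead, work):
    """Total time when a fixed overhead is paid once per packed request."""
    num_packed = -(-num_requests // pack_size)  # ceiling division
    return num_packed * overhead + num_requests * work


packed = pack(list(range(10)), 4)  # three packed requests, the last one partial
unpacked_cost = total_time(1000, 1, overhead=0.05, work=0.01)
packed_cost = total_time(1000, 20, overhead=0.05, work=0.01)
```

With these assumed figures, packing 20 requests at a time cuts the overhead portion by a factor of 20 while leaving the per-request work unchanged.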

Direct Data Transfer


By default, GridServer uses Direct Data Transfer (DDT) to transfer inputs and outputs between Drivers and
Engines. When Driver-Engine DDT is enabled, the Driver saves each request as a file and sends a URL to
the Broker. The Engine assigned to the request gets the URL from the Broker and reads the data directly
from the Driver. Engine-Driver DDT works the same way in the opposite direction. Without DDT, all data
must needlessly go through the Broker.
DDT is efficient for medium to large amounts of data, and prevents the Broker from becoming a bottleneck.
However, if the amount of data read and written is small, disabling DDT may boost performance.
Disable Driver-Engine DDT in the driver.properties file on the client. Disable Engine-Driver DDT from
the Engine Configuration page.

Shared Directories and DDT


In some network configurations, it may be more efficient to use a shared directory for DDT rather than the
internal fileservers included in the Drivers and Engines. In this case, the Driver and Engines are configured
to read and write requests and results to the same shared network directory, rather than transferring data over
HTTP. All Engines and the Driver must have read and write permissions on this directory. Shared directories
are configured at the Job and Service level with the SHARED_UNIX_DIR and SHARED_WIN_DIR options. If using
both Windows and Unix Engines and Drivers, you must configure both options to be directories that resolve
to the same directory location for the respective operating systems.

Caching
Service initialization data is effectively a caching mechanism for data whose lifetime corresponds to the
Service Session. Other caching mechanisms can be used for data with other lifetimes.
If the data is constant or rarely changing, use GridServer’s resource deployment mechanism to distribute it
to Engine disks before the computation begins. This is the most efficient form of data transfer, because the
transfer occurs before the application starts.
GridCache can also be used to cache data. GridCache data is stored on the Manager and cached by Engines
and other clients. GridCache can handle large amounts of frequently updated data. See Chapter 7,
“GridCache” on page 73 of the GridServer Developer’s Guide for more information.

Data References
GridServer supports Data References: remote pointers to data. A Data Reference is small, but can refer to
an arbitrary amount of data on another machine. Data References are helpful in reducing the number of
network hops a piece of data needs to make. For instance, imagine that an Engine has computed a result that
another Engine may want to use. It could write this result to GridCache. But if the result is large, it will travel
from the writing Engine to the GridCache repository on the Broker, and then to the reading Engine. If the
first Engine writes a Data Reference instead, the second Engine can read the data directly from the first
Engine. Data References hide this implementation from the programmer, making network programming
much simpler.
See Chapter 4, “Accessing Services” on page 39 of the GridServer Developer’s Guide or the GridServer API
for more information.

Tasks Per Message


In the Job model, messages are sent to the Engine when TaskInputs are created. To minimize message
overhead, a message is only sent for each 20 Tasks in a Job. You may find that when running Jobs with many
short-running tasks, message overhead can be minimized by setting the Job option TASKS_PER_MESSAGE to a
number higher than the default of 20.
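The number of messages for a Job follows directly from the setting. This Python sketch only illustrates the arithmetic; the function name is hypothetical, and the Job size and the higher setting of 100 are assumed example values.

```python
import math


def messages_needed(num_tasks, tasks_per_message=20):
    """Number of messages sent for a Job, given the tasks carried per message."""
    return math.ceil(num_tasks / tasks_per_message)


default_count = messages_needed(10_000)     # default of 20 tasks per message
tuned_count = messages_needed(10_000, 100)  # assumed higher setting
```

For a Job of 10,000 tasks, raising the setting from 20 to 100 reduces the message count from 500 to 100.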



Invocations Per Message


In the Services model, Drivers will send a message per invocation submitted to the Manager. To minimize
message overhead, more invocations can be sent in each message. This can increase submission speed on
Services when many invocations are submitted in bulk. The Service option INVOCATIONS_PER_MESSAGE can
be changed to a number greater than 1, so the Driver will buffer that number of invocations before
submitting to the Manager. The buffered invocations are also flushed to the Manager every second if the
buffered number doesn't reach the maximum number.
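The buffering and one-second flush described above can be modeled as follows. This Python sketch is a toy model under stated assumptions, not the Driver's implementation: the class and its methods are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
class InvocationBuffer:
    """Toy model of Driver-side buffering (hypothetical, not the Driver API)."""

    def __init__(self, invocations_per_message, flush_interval=1.0):
        self.limit = invocations_per_message
        self.flush_interval = flush_interval
        self.pending = []
        self.last_flush = 0.0
        self.sent = []  # each entry is one message (a list of invocations)

    def submit(self, invocation, now):
        """Buffer an invocation and send a message once the buffer is full."""
        self.pending.append(invocation)
        if len(self.pending) >= self.limit:
            self._flush(now)

    def tick(self, now):
        """Called periodically; flushes a partial buffer after the interval."""
        if self.pending and now - self.last_flush >= self.flush_interval:
            self._flush(now)

    def _flush(self, now):
        self.sent.append(self.pending)
        self.pending = []
        self.last_flush = now


buf = InvocationBuffer(invocations_per_message=3)
for i in range(4):
    buf.submit(i, now=0.1)
buf.tick(now=0.5)  # too early, nothing is flushed
buf.tick(now=1.2)  # the partial buffer is flushed
```

Four invocations with a setting of 3 produce two messages: one full message sent immediately and one partial message sent by the timed flush.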

Tuning for Large Grids


In GridServer installations with a large Grid, Manager performance may degrade significantly. For
example, the Broker Monitor may take several seconds to update.
The following changes can improve performance on large Grids:
• Increase the number of Resin request threads from the default of 200 to 300 or more. A good rule of
thumb is Resin Threads = Maximum Messaging Connections + Maximum Resource Download
Connections + 50. This ensures enough threads to handle all messaging, downloads, and browser
requests. To do this, edit the conf/resin.conf file at the top of the Broker's installation directory. Change
the line that reads:
<thread-max>200</thread-max>

to change the setting to 300 or more. Note that your Broker will restart when the resin.conf file is
modified.
• On the Brokers, increase the Engine “Max Millis Per Heartbeat” value to be at least 2 minutes; the default
is 30 seconds.
• Increase the SSL “Token Timeout,” which is actually in effect regardless of SSL, for both the “Broker
Resources” and “Director Resources” to be 5 minutes. The settings are on the Manager Configuration
page, in the SSL section, under the Resource Deployment heading.
• Increase the Assignment Timeout, on the Manager Configuration page, in the Services section, to
60000 ms. Increasing this allows more time for an Engine to connect and pick up an assigned task when
the Broker is under heavy load. This value should be increased if you see 'Task assignment expired:'...
messages often.
• On the Manager Configuration page, in the communication section, change Maximum Messaging
Connections to 200; change Messaging Retry Wait to 10000 ms; change Driver/Engine/Daemon Socket
Timeout to 120 seconds.
• Increase the heap size. The Java maximum heap size is set in the server.sh or server.bat file, and is 512
MB by default in GridServer 4.2. It can be increased by changing the environment variable MAX_HEAP in
the server.bat or server.sh file.
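The Resin thread rule of thumb from the list above can be checked with a one-line calculation. In this Python sketch the function name is hypothetical, and the figure of 100 resource download connections is an assumed value chosen only for illustration.

```python
def resin_threads(max_messaging_connections, max_resource_download_connections,
                  headroom=50):
    """Rule of thumb from the list above for the Resin thread-max setting."""
    return max_messaging_connections + max_resource_download_connections + headroom


# 200 messaging connections is the value suggested above; 100 download
# connections is an assumed figure for illustration only.
threads = resin_threads(200, 100)
```

With these inputs the rule gives 350 threads, which is consistent with the advice to raise the setting to 300 or more.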



Chapter 11
Diagnosing GridServer Issues
••••••

This chapter describes how to find the information needed to diagnose GridServer issues. It covers
troubleshooting your installation and gathering information that will be helpful if you contact
DataSynapse for support.

Troubleshooting
When troubleshooting a GridServer installation, try the following:
1. Search the GridServer Knowledge Base, located at customer.datasynapse.com. This contains known
issues, including those that have occurred since the publication of this guide, and is updated
frequently.
2. Check the state of your Grid:
• Check Engine Daemon state configuration.
• Is File Update enabled?
• Are Engine paths set as desired?
3. Read the log files, as described below.

Obtaining Log Files


There are several logs generated by GridServer. Depending on what kind of issue you are troubleshooting,
you may need to examine one or more logs. These include Manager, Driver, Engine, and Engine Daemon
logs.

Manager Logs
Manager Logs are generated on the console window on Windows machines if the Manager is not run as a
service, or on Unix machines if the Manager is run in the foreground on the console. Because GridServer is
usually run as a service or in the background, there are several other ways to view the manager log:
• In the GridServer Administration Tool, from the Admin menu, select Current Log. This displays new
lines of the log as they happen, in a new window. It doesn’t, however, display any historical information.
Click the Snapshot button to open a frozen duplicate of the current log window.
• Also in the Administration Tool, from the Admin menu, select Diagnostics. This page enables you to
search the Manager log and other logs, and then display the results or create a .ZIP file of them.
To view Manager Log results, select Manager Log in Choose Files, then select a time range in Choose
Manager Log Date/Time. You can then display the log on-screen by clicking Display Below, display it
in a new window with Display in Separate Popup Window, or save it in a compressed file with Create
.ZIP File.
• The Manager log is available directly at manager_root/webapps/livecluster/WEB-INF/log/server/* or
the location specified on the Manager Configuration page in the Logging section, on the Manager tab.


The Manager log can be set to different levels of granularity, ranging from Severe, which provides the least
amount of logging information, to Finest, which logs the most information. By default, this level is set at
Info. For debugging purposes, it may be necessary to set the level higher, to Finer or Finest.
To change the log level:
1. In the GridServer Administration Tool, select the Manager tab.
2. Select Manager Configuration.
3. Select Logging.
4. In Default Debug Level, select a new level.

Engine and Daemon Logs


Each Engine and Engine Daemon generates its own logs. These can be accessed directly on Engines.
However, because Engines are typically installed on several different machines, there are also methods to
view logs remotely from other computers. The following procedures describe how to read Engine logs.
To read the log in a scrolling window:
1. In the GridServer Administration Tool, select the Engine tab.
2. Select the Engine Admin page.
3. From the Actions menu, select Remote Log.
This will open a window that displays the log for the Engine. As new logging information is generated,
it is displayed. This does not, however, display any prior logging history.
To access previous logs:
1. In the GridServer Administration Tool, select the Engine tab.
2. Select the Engine Admin page.
3. From the Actions menu, select Log URL List.
This will open a window containing hyperlinks to each of the log files on the Engine. You can click on
each link to remotely view each log. Note that if you open a log and then more Engine activity occurs,
you will need to reload the log to view it.
To directly view log files, look in the following directories in each Engine install directory:
• Instance logs: work/name-instance/log/*
• Daemon logs: profiles/name/logs/engined.log
• Also examine other .log files in Engine tree
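When you have shell access to an Engine machine, the directories listed above can be scanned with a short script. This Python sketch is a convenience example, not a GridServer tool: the function name is hypothetical, and the wildcard patterns simply mirror the paths given above.

```python
import glob
import os


def find_engine_logs(engine_dir):
    """Return Engine instance and Daemon log paths under an Engine install directory."""
    patterns = [
        os.path.join(engine_dir, "work", "*", "log", "*"),                 # instance logs
        os.path.join(engine_dir, "profiles", "*", "logs", "engined.log"),  # Daemon logs
    ]
    found = []
    for pattern in patterns:
        # Keep regular files only; subdirectories are ignored.
        found.extend(p for p in glob.glob(pattern) if os.path.isfile(p))
    return sorted(found)
```

For example, calling find_engine_logs with the path to your Engine install directory returns a sorted list of log file paths that can then be copied or archived for a support request.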
To change the log level for Engines:
1. In the GridServer Administration Tool, select the Engine tab.
2. Select the Engine Configuration Page.
3. Select an Engine Configuration from the list.
4. In the Log section, select a new level in the Level list.
5. Change this setting in each Engine Configuration for which you want to change logging.



Driver Logs
Driver logs are displayed in the command or shell window when a Driver is running. They are also captured
in the logs subdirectory of the working directory.
For SOAP access, including Web Service and Batches, an embedded Driver on the Manager is used: no local
logs are generated.

Application Server Logs


The application server used to run the GridServer Manager also generates logs that can be helpful in
diagnosing issues. For Resin, the logs are in manager_root/log/error.log



Chapter 12
Administration Howto
••••••

This chapter contains several procedures that are commonly used when administering a GridServer
Manager. Most of the tasks outlined below use the GridServer Administration Tool, which is also described
in Chapter 6, “The GridServer Administration Tool” on page 35. Also, the Administration Tool has online
help, which further describes each page’s features.

Backup / Restore
Backing up and restoring GridServer managers requires doing little more than an OS level file copy of the
webapps/livecluster directory in your installation directory. On Director installations you may also have to
use the database repair scripts to back up or restore the internal and reporting databases.

Backup Procedure
To back up a GridServer installation:
1. Archive (with tar or zip) or simply copy the [GS Manager Root]/datasynapse/webapps/livecluster
directory. Exclude the subdirectories livecluster/dataTransfer and livecluster/localDriverDDT
from your archive process.
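The backup step can also be scripted. The Python sketch below is one possible approach using the standard tarfile module, not a DataSynapse-supplied tool; the function names are hypothetical, and you must supply the webapps directory and archive path for your own installation.

```python
import os
import tarfile

# Subdirectories that the procedure above says to leave out of the archive.
EXCLUDED = ("livecluster/dataTransfer", "livecluster/localDriverDDT")


def _skip_excluded(member):
    """tarfile filter: drop the excluded subdirectories and their contents."""
    for prefix in EXCLUDED:
        if member.name == prefix or member.name.startswith(prefix + "/"):
            return None
    return member


def backup_livecluster(webapps_dir, archive_path):
    """Archive webapps_dir/livecluster into a gzip-compressed tar file."""
    source = os.path.join(webapps_dir, "livecluster")
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source, arcname="livecluster", filter=_skip_excluded)
```

Write the archive to a location outside the livecluster directory so that the archive does not try to include itself.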

Restore Procedure
To restore a GridServer installation:
1. Unpack the original GridServer Manager installation using WinZip or a similar tool for Windows.
On a Unix system, do the following:
gzip -d -c GridServer_R4*gz | tar xvf -

2. Delete the livecluster directory from [GS Manager Root]/DataSynapse/webapps.


3. Copy the backup livecluster directory to [GS Manager Root]/DataSynapse/webapps.

Manager Configuration

Applying a patch or service pack to GridServer


To apply a patch or service pack to GridServer, do the following:
1. Shut down the GridServer Managers that will be updated.
2. Run the JAR file. The syntax for running the JAR is:
java -jar [Patch or Service Pack].jar [webapp_dir] [basedir1] [basedir2] ...
[webapp_dir] is the livecluster directory on your application server.
[basedirX] is the base directory for each Manager, if using alternate base dirs.
For example, to apply GridServer 3.2 patch 1 to GridServer 3.2 installed in c:\datasynapse:
java -jar GridServer-3_2-Patch1.jar C:\datasynapse\webapps\livecluster
Driver Upgrade:
Be sure to re-download the SDK and update all Drivers after a successful Manager update.
Note:
All files that are changed will be saved in the corresponding directory in [basedirX]\WEB-INF\uninstall.
For instance, the above example will save the old files in c:\datasynapse\webapps\livecluster\WEB-INF\uninstall\3_2-Patch1.

Importing and Exporting Manager Configuration


GridServer Managers support the ability to export the Director and Broker configurations and Engine
configuration profiles into a signed JAR file format and later import this same format to migrate settings
from one Manager to another. This can be used to migrate Engines from one Manager to another Manager
without reconfiguring all of the Engines, to simplify administration of multiple Manager systems, or to
disseminate an organization’s preferred default Engine configuration among all clusters in the organization.
To export a configuration:
1. In the GridServer Administration Tool, click the Admin tab and click the Import/Export page.
2. Select the configurations you would like to include in the JAR. This includes the Broker
configuration, Director configuration, and any Engine configuration profiles.
3. Click Export.
4. A File Download dialog box appears. Click Save to save the jar file.
To import a configuration:
1. In the GridServer Administration Tool, click the Admin tab and click the Import/Export page.
2. Next to the Provide File for import box, click Browse.
3. Browse to the location of the jar file containing the GridServer Manager configuration export.
4. Click Upload to begin the import.
5. A list of configurations found in the JAR file will be displayed, with configurations highlighted in
red if they will install over existing configurations. Select the configurations you wish to import,
then click Import.
When completed, the Manager may need to be restarted for changes to take effect; in this case, a message
will be displayed and the Manager will automatically shut down.

Installing Manager Licenses


Each GridServer Manager requires a valid license to function. Licenses are limited by date, hostname, and
number of Engines. By default, a demo license for four Engines is included with each Manager, but for
further evaluation or production use, you must obtain a license by contacting DataSynapse Support.


To view your Manager’s license information in the GridServer Administration Tool, click the Admin tab
and click the License Information page.
A Manager license consists of a single XML file, and is typically sent by DataSynapse Support via email as
an attached file. Licenses can also be downloaded at any time from the http://customer.datasynapse.com
customer support site. You can inspect the license with a text editor to determine its capacity, but you should
not make changes to the file.
To install the license:
1. In the GridServer Administration Tool, click the Admin tab and click the License Information page.
2. Copy the .ser file that was an attachment in your email message from DataSynapse or from a
download from the DataSynapse customer support site to a location accessible with your web
browser (either a local directory or a shared directory).
3. Click Browse.
4. Find the license file and click Open.
5. Click Upload New License.
If the license file is valid, it will overwrite the existing license and changes will take place immediately. If
it is expired, corrupt, or otherwise not valid, an error message will appear and your existing license will
remain in place.

Setting the SMTP host


The GridServer Administration Tool can be configured to send notifications via email, via the Event
Subscription page. To send the email, there must be an SMTP host configured for the Manager. This is
typically configured during Manager installation, but you can later add or change the value.
To set the SMTP host:
1. Click the Manager tab.
2. Click Manager Configuration.
3. Click Admin.
4. In the Mail heading, in SMTP Host, enter the name of your SMTP server. For many organizations,
this is simply mail.
5. In Contact Address, enter the email address of an administrative contact. A notification will be sent
to this address when new users are added to the Administration Tool.
6. Click Save.

Setting Up a Failover Broker


In the fault-tolerant configuration, some Brokers can be set up as Failover Brokers. When a Broker is
designated a Failover Broker, no Director will route Engines to that Broker unless there are no other active
Brokers. When there are no Jobs waiting for Service on a Failover Broker and other Brokers in the Grid are
available, the Failover Broker will “kick off” idle Engines, causing the Engines to log in to their Primary
Director and get reassigned to a non-Failover Broker in the Grid. By default, all Brokers are non-Failover
Brokers (they load-balance work). Designate one or more Brokers within the Grid as Failover Brokers when
you want those Brokers to remain idle during normal (non-failure) operation.


To set up a Failover Broker:


1. Log in to the GridServer Administration Tool.
2. Click the Admin tab, then click the Manager Reconfigure page.
3. Go through each configuration step. In the third step, set Broker to Failover.
4. After completing the eight steps of the Manager Reconfigure, click Start Installation. This will
reinstall GridServer and restart the Broker as a Failover Broker.

Configuring SNMP
The ServerEvent API supports the generation of SNMP traps on a per-event basis. For example, events such
as ‘Job Cancelled’ and ‘Engine Died’ can be sent as traps to an SNMP monitoring station. The SNMP
interface can be administered through an administrative plugin on the GridServer Manager. The traps
themselves are defined in the GridServer application MIB.
To configure and enable SNMP support for your Manager:
1. In the Administration Tool, click the Admin tab and click SNMP Configuration.
2. Enter the hostname and port of your SNMP server in the Host and Port fields, then click Add.
3. If you have multiple SNMP servers, repeat step 2 for each server.
4. In SNMP Version, select the version of the SNMP protocol your servers use.
5. Select each event in the event list for which you would like to have a trap generated.
6. Click the Manager tab, click Manager Configuration, and click Admin.
7. In the SNMP section, set enabled to True for the Broker, Director, or both.
The GridServer MIB can be found in [GS Manager Root]/webapps/livecluster/WEB-INF/etc/snmp.
Some SNMP events generate traps from the Broker, while others generate traps from the Director. The
following is a list of events that generate traps, sorted by Broker or Director:

Broker Trap Events      Director Trap Events
DriverAddedEvent        BrokerAddedEvent
DriverRemovedEvent      BrokerRemovedEvent
EngineAddedEvent        EngineDaemonAddedEvent
EngineDiedEvent         EngineDaemonRemovedEvent
EngineRemovedEvent      RemoteDatabaseBackupFailure
JobCancelled            LocalDatabaseBackupFailure
JobFinished             ServerStartedEvent
JobRunning
ServerStartedEvent
TaskFailed



Enabling Enhanced Task Instrumentation
Normally, a submitted task or remote Service Invocation’s execution time is measured only from start to
finish. But often it is useful to be able to track the time spent in the various stages of this process, including
input serialization, disk writing, task message submission, task queueing, task fetching, data transport, input
deserialization, task processing, output serialization, output transport, queuing, and so on. This will allow
you to understand the timing characteristics of distributed computing, optimize the process, and diagnose
problems with greater ease.
To enable enhanced task instrumentation:
1. In the Administration Tool, click the Manager tab, click Manager Configuration, then click
Services.
2. In Instrumentation, set Enable to True.
3. Click Save.
When enabled, task instrumentation applies to all Services on the Manager.

WARNING Task instrumentation will slow down the Manager, and also
requires additional disk space, so it is important to disable it after you have
completed using it. It is NOT recommended for production systems.

To view data generated by enhanced task instrumentation:


1. Click the Services tab, and click Service Session Admin.
2. Find the Service you wish to view, and select View Instrumentation from the Actions menu. Note
that this choice will only appear after the Service has finished running.
A new window will open, displaying a table of data collected by enhanced task instrumentation for the
Service. For more information on instrumentation, see Appendix A, “Task Instrumentation” on page 105 of
the GridServer Developer’s Guide.

Engine Management

Deploying Files to Engines


Directory Replication enables you to coordinate and synchronize files from your Manager to Engines. You
can use this to ensure that Engines all have the latest version of a library, file, data set, or other
resources needed to complete work.
By default, the Directory Replication mechanism automatically looks for files in a predefined directory
(typically deploy/resources, within the livecluster directory on the Manager). This contains six directories:
one for each supported OS and a shared directory that is replicated to all OSes. During each check of the directory
(the default is once per minute), if it notices changes, it sends the new files to each Engine. It also forces the
Engine to log out and log back in. This interrupts any current work, but it also ensures that work isn’t
completed with incorrect libraries or data.
You can also manually trigger a file update to ensure all Engines have the same files.



To upload files and manually trigger an update:


1. Click the Services tab.
2. Click Resource Deployment.
3. Add your files to the Manager by clicking directory names to navigate to a directory. Then click
Browse to find a file on your PC, and Upload to upload it to the Manager.
4. You can also place files in the livecluster/deploy/resources directories on the Manager. There
are OS-specific directories for Engines running on Linux, Solaris, and Win32 machines, and a
shared directory which is copied to all Engines.
5. Click the Update button.

Updating the Windows Engine JRE


By default, the 1.4.2_03 JRE is used for Windows Engines. You can change what version of the JRE is used.
For Windows, the JRE used resides on the Manager, and is updated on Engines, so you only need to change
the JRE once on the Manager.
It is not necessary to re-install the Engines after adding the new JRE because they will update themselves
automatically.
Note that when downloading a new JRE from Sun, you should download the SDK and use the JRE
contained within that package. There is also a downloadable JRE package, but the JRE it contains does not
contain the server version of a library required for Engines to run.
To change the JRE version:
1. Extract the JRE you wish to use into a temporary directory. Also, ensure that the JRE you have is the
Server version (included in the Java JDK) and not the client version (from the standalone JRE).
2. Download the Java Cryptography Extension (JCE) from Sun at
http://java.sun.com/j2se/1.4.2/download.html (at the bottom, under “Other Downloads”). This
download contains the two files local_policy.jar and US_export_policy.jar which should be copied
into the jre/lib/security directory.
3. Create a ZIP file of the directory containing the JRE files and additional files.
4. Replace the public_html/register/install/jre/jre.ZIP on the Manager with the ZIP file of the JRE
you created. Note: the file is case-sensitive, and ZIP must be uppercase. If you are using an alternate
base directory, a read-only installation, or running multiple Managers on one machine, make sure
to copy this file into the same location on each DS_BASEDIR directory.
5. Open the file engineUpdate/Win32/jre.dat in your GridServer distribution with a text editor.
6. Replace the 1.4.2_03 with the version number of the JRE you wish to use.

Updating the Unix Engine JRE


By default, the 1.4.2 JRE is used for Unix Engines. You can change what version of the JRE is used. For
Unix, there is no update mechanism similar to the one for Windows, so you need to update the JRE on each
Engine.



Note that when downloading a new JRE from Sun, you should download the SDK and use the JRE
contained within that package. There is also a downloadable JRE package, but the JRE it contains does not
contain the server version of a library required for Engines to run.
To change the JRE version:
1. Shut down any running daemons:
engine.sh stop
2. Change directories to the Engine home directory on the machine running the Engine, for example,
DSEngine.
3. Move the current JRE to a new directory named for its version, for example:
mv jre jre1_4_2

4. Unarchive the desired JRE into a new directory, such as jre1_4_3.


5. Download the Java Cryptography Extension (JCE) from Sun at
http://java.sun.com/j2se/1.4.2/download.html (at the bottom, under “Other Downloads”). This
download contains the two files local_policy.jar and US_export_policy.jar which should be
copied into the jre/lib/security directory.
6. Symlink the desired JRE to jre:
ln -s jre1_4_3 jre

Setting the Director Used by Engines


The primary and secondary Directors for an Engine are set during Engine installation. You can later change
the Directors to which an Engine reports, by changing the Engine Configuration used by the Engine.
To configure an Engine’s Directors:
1. Log in to the GridServer Administration Tool.
2. Click the Engine Tab, then click the Engine Configuration page.
3. Select the Engine distribution used by the Engine. This is typically the operating system of the
Engine.
4. Go to the Directors and Brokers heading and change Primary Director URL and Secondary
Director URL to the corresponding addresses and ports of the primary and secondary Directors, in
the format http(s)://address:port.
Note that this will change the Directors for all Engines using that Engine distribution.

Running Services

Running MPI Jobs using PDriver


PDriver, the Parametric Job Driver, has support for running MPI Jobs. The following two options in the PDS
language supported by PDriver are used when running MPI:
mpiEnabled - Boolean switch that indicates the job is to be run in MPI mode. An MPI mode job is based
on a group size (see below), and each “group step” is treated as a single step of the job. If a single task
in an MPI job fails, all other tasks in that “group step” are rescheduled.


mpiGroupsize - The number of nodes used in each MPI group step. The number of tasks for the job must be
evenly divisible by this setting.
For more information on writing PDS scripts for PDriver, see Chapter 6, “PDriver” on page 49 of the
GridServer Developer’s Guide.
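The divisibility requirement for mpiGroupsize can be checked before a job is submitted. This Python sketch shows only the arithmetic; the function name is hypothetical and is not part of PDriver or the PDS language.

```python
def mpi_group_steps(num_tasks, group_size):
    """Return the number of group steps, or raise if the tasks do not divide evenly."""
    if group_size <= 0:
        raise ValueError("group size must be positive")
    if num_tasks % group_size != 0:
        raise ValueError("number of tasks must be evenly divisible by the group size")
    return num_tasks // group_size


steps = mpi_group_steps(64, 8)  # 64 tasks in groups of 8 nodes
```

A job of 64 tasks with a group size of 8 runs as 8 group steps, whereas 10 tasks with a group size of 4 would be rejected by this check.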

Registering a Service Type


To use a Service, you must first register a Service Type from the GridServer Administration Tool.
To register a Service Type:
1. Log in to the GridServer Administration Tool.
2. Click the Services tab, then click the Service Type Registry page.
3. A list of existing Service Types appears on that page, along with a line for adding a new Service
Type.
4. Enter the Service Type Name on the blank line.
5. Select the Service Implementation, then click Add.
A window with several options appears after clicking the Add button.
6. For Java Service Types, enter the fully qualified class name for the service; for .NET, dynamic
libraries, or commands, enter the classname plus assembly name, library name, or command line,
respectively. The window also allows you to enter options for the Service Type.
Note that after you register a Service Type, you must deploy the implementation to your Engines.

Creating and Running a Batch


To run a Batch, you must first create a Batch Definition, which contains components that specify the
schedule used by a Batch and what Services or commands are executed.
To create a Batch Definition:
1. In the Administration Tool, click the Batch tab, and click Batch Registry.
2. Type a name for your Batch Definition in the blank box at the bottom of the list and click Add.
3. The Batch Editor dialog box will open. You can type values for the Batch and Schedule components,
and add additional components.
For example, to create a simple Batch Definition named NightlyBatch that runs a registered Service at
midnight, do the following:
1. In the Schedule object, select a type of cron.
2. In the Cron subheading, enter 0 for minute and hour. This specifies a starting time of 00:00 on a
daily basis, in cron format. You could change the values here to select a different time pattern, or
select a type of absolute to enter times in a string, like Sat, 12 Aug 1995 13:30:00 GMT.
3. In the Add Component list, select ServiceCommand.
4. In the ServiceCommand component, select a Service Type from the ServiceName list. This is a list
of all Service Types currently registered on your Manager.
5. In the ServiceCommand object, enter a MethodName, initData, inputData, or any other values that
will be needed by your Service.
6. Click Save.
7. In the Actions control next to your Batch Definition in the list, select Schedule Batch Definition.
8. Your Batch will now be on the Manager and viewable in the Batch Schedule page on the Batch tab.
It will wait until midnight, and then run the specified Service. When the Batch is running, you can
monitor it on the Batch Admin page.

Creating a native stack trace in Linux


Sometimes when you are troubleshooting native C/C++ code on Linux, you want to generate a stack trace,
for example when a SIGSEGV is raised. Since the JVM on the Engine already traps SIGSEGV and prints
out a Java (not native) stack trace, you need to override the actions of the JVM and install your own
SIGSEGV handler for debugging. The backtrace() and backtrace_symbols_fd() functions from glibc
can be used for this purpose.
To install your own SIGSEGV handler for debugging, add code to your tasklet or service initialization
method similar to this:
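#include <execinfo.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>
#define TRACE_DEPTH 50
// Must be declared static in the class: signal() requires a plain function pointer.
void MyService::segv_handler(int signum) {
    void *trace[TRACE_DEPTH];
    // Capture the return addresses of the current call stack.
    int depth = backtrace(trace, TRACE_DEPTH);
    // open() and close() are async-signal-safe; fopen() is not.
    int fd = open("trace.log", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd >= 0) {
        // Write the symbolic trace straight to the file descriptor.
        backtrace_symbols_fd(trace, depth, fd);
        close(fd);
    }
    abort();
}
void MyService::init() {
    // Replace the JVM's handlers so that native faults produce a native trace.
    signal(SIGSEGV, segv_handler);
    signal(SIGBUS, segv_handler);
}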

Attaching GDB to Engine native code on Linux


GDB can be used to debug native code in cppdriver or JNI in Linux. Also, GDB can be useful in identifying
unusual problems with the Linux JVM. However, there are some subtle issues when trying to use GDB on
a JVM, as is the case with the GridServer Engine.
First, when attaching GDB to the Engine, you must set LD_LIBRARY_PATH to include both the Engine
components and the JVM components. You must also obtain the process ID of a running “invoke” process
from the ps command. It is also somewhat easier to run GDB from the base directory of the Engine
install (typically DSEngine). The GDB command looks something like:
LD_LIBRARY_PATH=lib:jre/lib/i386:jre/lib/i386/native_threads:jre/lib/i386/server:resources/lib/linux gdb bin/invoke $INVOKEPID



This method of running GDB works well for troubleshooting those rare JVM problems. However, when you
are troubleshooting cppdriver code, you need a little more finesse. The issue is that cppdriver loads your
application shared objects only when the tasklet or service is instantiated, so it becomes difficult to set a
breakpoint in the application shared object. Further, attaching GDB to a running JVM often has undesired
side effects, including crashing the JVM depending on the versions of JVM, pthreads, and GDB being used.
One technique that works in this instance is to have your application tasklet or service method include some
conditional code to enter a loop checking some variable value that is never changed by the application code,
effectively creating an infinite loop. When you need to attach GDB, trigger the conditional that causes the
loop to be entered on the next invocation. Then attach GDB as above. You’ll see that the invoke process is
stopped while running in the loop. At that point you can change the loop evaluation value so that the infinite
loop is exited, and the code will continue to your breakpoint where you can continue debugging.
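
The hold loop described above might look like the following minimal sketch. The flag g_hold_for_debugger and the helper wait_for_debugger() are hypothetical names chosen for illustration; they are not part of the GridServer API, and how the condition is triggered is left to the application.

```cpp
#include <cassert>
#include <unistd.h>

// Hypothetical flag: application code never writes to it. volatile forces a
// fresh read on every pass, so a value changed from GDB is observed.
static volatile int g_hold_for_debugger = 1;

// Hypothetical helper: spins while `enabled` is true and the flag is non-zero.
// Returns the number of one-second waits that were performed.
// To release the loop from GDB: (gdb) set var g_hold_for_debugger = 0
int wait_for_debugger(bool enabled) {
    int waits = 0;
    while (enabled && g_hold_for_debugger != 0) {
        sleep(1);  // avoid burning CPU while waiting for GDB to attach
        ++waits;
    }
    return waits;
}
```

A service method could call wait_for_debugger() as its first statement, passing a condition of your choosing (for example, whether an environment variable that you define is set). After attaching, set the breakpoint in the now-loaded shared object, clear the flag with set var, and continue.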

Logging messages from a Native service to the Engine log


To log messages to the Engine log file, use the UtilFactory::log method. See the C++ API documentation
for more information. Alternatively, you may redirect your standard output to a separate log file. See
“Redirecting Engine Output” in “Log Overview” on page 21 of the GridServer Developer’s
Guide for more details.
Also, if you’re using C++ via JNI from a Java Tasklet on Linux Engines, you can log to stderr and it will
appear in profiles/.../engine.x.log.
Note that JNI C++ code cannot write to standard output on Windows Engines.

Running a .NET Driver from an Engine Service


To run a .NET Driver from an Engine Service, you must first deploy the driver.properties file to the
Engine, and then configure the Engine to use the new file. To do this:
1. If you haven’t already downloaded a copy of the driver.properties file, log in to the GridServer
Administration Tool, click the Driver tab, click SDK download, and download the
driver.properties file to your local machine.
2. In the GridServer Administration Tool, on the Service tab, click the Resource Deployment page.
3. Navigate to the resources\shared\config directory.
4. Click Browse and find your local copy of the driver.properties file, and click Upload.
5. On the Engine tab, click the Engine Configuration page.
6. Select the configuration that your Engines are currently configured to use or create a new one. If
you create a new configuration remember to change the Engines to use that configuration before you
test.
7. In the configuration editing screen there will be a section called Properties. In the Environment
Variables section, change the value of DSDRIVER_DIR to .\resources\shared\config.
8. Click Save.



Configuration Issues

Installation on Dual-Interface Machines


In some network configurations, a machine may have more than one network interface, and a GridServer
component may default to using the incorrect interface. This can be corrected by configuring the component
to use the correct interface.
Drivers: To configure the Driver to use a different network interface, set the DSLocalIPAddress property to
the IP address of the correct interface. For example:
DSLocalIPAddress=192.168.12.1

Engines: To configure the Engine to use a different network interface, select the Engine Configuration that
will be used by the Engine on the Engine Configuration page, and set the Net Mask value under the File
Server heading to match the network range on which the Engine should run.

Configuring the timeout period for the Administration Tool


For security purposes, the GridServer Administration Tool will time out and require users to log in again.
By default, the timeout period is 60 minutes.
To change the timeout period, log in to the Administration Tool and click the Manager tab. Click Manager
Configuration and click Security. In the Admin User Management section, type a time in seconds in
the Admin Browser Timeout box.

Reconfiguring Managers when Installing a secondary Director


When you install a Manager that includes a secondary Director, you must also configure the Manager
containing the primary Director. This will register the secondary Director’s address with the primary
Director, as well as reconfigure the Engine and Driver configurations.
To reconfigure the Manager containing the primary Director, click the Admin menu, click Manager
Reconfigure, and enter the secondary Director’s address and port on the corresponding page. This will
configure the primary Director to recognize the secondary Director and reconfigure the Engine and
Driver configurations accordingly.

Using UNC paths in a driver.properties file


It is possible to use UNC paths to specify a hostname or directory within a driver.properties file.
However, you will need to change all backslashes (\) to forward slashes (/) in the path.
For example, to change the input directory for task (Job) inputs to the UNC path \\homer\job1-dir, change
the following line:
DSWebserverDir=./ds-data

to this:
DSWebserverDir=//homer/job1-dir
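
If property values are generated by a program, the slash conversion can be done mechanically. The following is a minimal sketch; the helper name toPropertiesPath() is hypothetical and is not part of any GridServer API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: returns a copy of `path` with every backslash
// replaced by a forward slash, as required in driver.properties.
std::string toPropertiesPath(std::string path) {
    std::replace(path.begin(), path.end(), '\\', '/');
    return path;
}
```

Passing the UNC path \\homer\job1-dir (written "\\\\homer\\job1-dir" as a C++ string literal) yields //homer/job1-dir, which matches the value shown above. Paths that contain no backslashes are returned unchanged.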



Chapter 13
Database Administration

Introduction
Each GridServer Manager has an embedded database running on each Director. This internal, or admin,
database stores administrative data, such as User, Engine, Driver, and Broker information. An external
reporting database can be used to log events and statistics. By default, GridServer is not configured with a
reporting database; the included HSQLDB or a different external reporting database can be used.

Database Types
There are two databases used by the GridServer ManagerBroker, each of which is described below.

The Reporting Database


The external reporting database is optionally used to store events and statistics, which, depending on
configuration settings, can grow fairly quickly. A robust external database is recommended if you
are going to make extensive use of the reporting capabilities. The specific types of data that are stored
in the reporting database are configurable in the Database section of the Manager Configuration page. The
external database can be installed on any machine, provided that the ManagerBroker is able to create
connections to the database through a protocol such as JDBC.
For information on installing an external database for the reporting database, see Appendix B, “Database
Configuration” on page 61 of the GridServer Installation Guide.

The Internal Database


GridServer’s internal database stores admin data such as User, Engine, Driver, and Broker information. In
typical cases, the internal database is read at Manager startup, and only written to thereafter if user-driven
admin events occur, such as adding a user, Engine, Broker, or Driver profile. The internal database is
required in order to start the Manager. If it becomes unavailable or corrupt, the Manager will continue to
function, but a restart would be impossible until the database is available again. This database is an
embedded component of the GridServer software.

Internal Database Backup


The internal database used by GridServer is automatically backed up at a regular interval. The database
is backed up to the [GS Manager Root]/webapps/livecluster/WEB-INF/db/internal/backup directory and is
also replicated to the secondary Director if one is installed.
Backups take place based on the Backup Cron configuration option, located in the Database section of the
Manager Configuration page of the Manager tab in the GridServer Administration Tool. The cron setting
resembles traditional Unix cron settings. It is a string of the form “minute, hour, day of month, month,
day of week, year”. A field set to -1 matches every value of that field, so the backup repeats. For
instance, a setting of “00,23,-1,-1,-1,-1” means the backup will occur daily at 11 PM. A setting of
“00,23,1,-1,-1,-1” means the backup will occur on the first of every month at 11 PM.
Ranges are as follows:

Name Description
minute Minute of the backup. Allowed values 0-59.
hour Hour of the backup. Allowed values 0-23.
dayOfMonth Day of month of the backup (-1 if every day). This attribute is exclusive with dayOfWeek. Allowed values 1-31. If both dayOfMonth and dayOfWeek are restricted, each backup will be scheduled for the earlier match.
month Month of the backup (-1 if every month). Allowed values 0-11 (0 = January, 1 = February, ...). java.util.Calendar constants can be used.
dayOfWeek Day of week of the backup (-1 if every day). This attribute is exclusive with dayOfMonth. Allowed values 1-7 (1 = Sunday, 2 = Monday, ...). java.util.Calendar constants can be used. If both dayOfMonth and dayOfWeek are restricted, each backup will be scheduled for the earlier match.
year Year of the backup. When this field is not set (i.e. -1), the backup is repetitive (i.e. it is rescheduled when reached).

NOTE: Database backups can be very resource-intensive. It’s advisable to schedule them to occur during
off-peak hours when your Grid usage is minimal.
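
To check a Backup Cron string before entering it, the field ranges listed above can be encoded in a small validator. This is a sketch only: the BackupCron struct and the parseBackupCron() function are hypothetical names, not GridServer code. The table does not state whether -1 is accepted for minute and hour, so this sketch assumes it is not, and it places no range limit on year.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical holder for the six Backup Cron fields, in documented order.
struct BackupCron {
    int minute, hour, dayOfMonth, month, dayOfWeek, year;
};

// True if v is -1 ("every") or lies within [lo, hi].
static bool inRangeOrEvery(int v, int lo, int hi) {
    return v == -1 || (v >= lo && v <= hi);
}

// Returns true and fills `out` if `text` holds exactly six comma-separated
// integers within the ranges from the table above.
bool parseBackupCron(const std::string& text, BackupCron& out) {
    if (text.empty() || text.back() == ',') return false;
    std::vector<int> f;
    std::istringstream in(text);
    std::string token;
    while (std::getline(in, token, ',')) {
        try {
            std::size_t used = 0;
            int v = std::stoi(token, &used);
            if (used != token.size()) return false;  // trailing junk
            f.push_back(v);
        } catch (const std::exception&) {
            return false;  // empty or non-numeric field
        }
    }
    if (f.size() != 6) return false;
    if (f[0] < 0 || f[0] > 59) return false;        // minute (assumed no -1)
    if (f[1] < 0 || f[1] > 23) return false;        // hour (assumed no -1)
    if (!inRangeOrEvery(f[2], 1, 31)) return false; // dayOfMonth
    if (!inRangeOrEvery(f[3], 0, 11)) return false; // month, 0 = January
    if (!inRangeOrEvery(f[4], 1, 7)) return false;  // dayOfWeek, 1 = Sunday
    out = BackupCron{f[0], f[1], f[2], f[3], f[4], f[5]};
    return true;
}
```

With this sketch, both example settings from the text, “00,23,-1,-1,-1,-1” and “00,23,1,-1,-1,-1”, are accepted, while a string with a minute of 60, a month of 12, or only five fields is rejected.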



Appendix A
The grid-library.dtd

Introduction
The grid-library.xml configuration file in the root of a Grid Library must be a well-formed XML file. The
GridServer SDKs include a grid-library.dtd file that can be used to validate the XML file. The DTD is
also shown below.
Example A.1: grid-library.dtd
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Copyright 2004 DataSynapse. All Rights Reserved. -->

<!-- Grid-Library is in the root of the GL. -->
<!ELEMENT grid-library (grid-library-name, grid-library-version?, dependency*,
    conflict*, jar-path*, lib-path*, assembly-path*, command-path*, hooks-path*,
    environment-variables*, java-system-properties*)>
<!ATTLIST grid-library jre (true|false) "false">
<!ATTLIST grid-library bridge (true|false) "false">
<!ATTLIST grid-library os (win32|solaris|solarisX86|linux|linux64|plinux) #IMPLIED>
<!ATTLIST grid-library compiler (gcc2|gcc3|gcc34) #IMPLIED>

<!-- The library name. -->
<!ELEMENT grid-library-name (#PCDATA)>

<!-- The version. If not specified, 0 is implied. -->
<!ELEMENT grid-library-version (#PCDATA)>

<!-- A library dependency. Dependencies can be specified by package name and
optional version.
If the version is not specified, the latest version is chosen at load time.
-->
<!ELEMENT dependency (grid-library-name, grid-library-version?)>

<!-- A library conflict. Indicates that this library conflicts with the given
library.
If this library is NOT a dependency, and grid-library-name="*",
then it indicates that this library conflicts with all other libraries
aside from its own dependencies). -->
<!ELEMENT conflict (grid-library-name)>

<!-- The JAR path. If specified, all jars and classes in the path are loaded. -->
<!ELEMENT jar-path (pathelement*)>
<!ATTLIST jar-path os (win32|solaris|solarisX86|linux|linux64|plinux) #IMPLIED>
<!ATTLIST jar-path compiler (gcc2|gcc3|gcc34) #IMPLIED>

<!-- An element of a path, typically a directory. -->
<!ELEMENT pathelement (#PCDATA)>

<!-- Load library path. If not specified, it is assumed that no native libraries
are loaded by this GL.
If this is specified and it the library was not loaded at init time, the Engine
will restart, adding this path to the current path. -->
<!ELEMENT lib-path (pathelement*)>
<!ATTLIST lib-path os (win32|solaris|solarisX86|linux|linux64|plinux) #IMPLIED>
<!ATTLIST lib-path compiler (gcc2|gcc3|gcc34) #IMPLIED>

<!-- .NET assembly path. System.AppDomain.CurrentDomain.AppendPrivatePath(path)
will be called on this path,
which add it to the lookup location for assemblies. -->
<!ELEMENT assembly-path (pathelement*)>
<!ATTLIST assembly-path os (win32|solaris|solarisX86|linux|linux64|plinux) #IMPLIED>
<!ATTLIST assembly-path compiler (gcc2|gcc3|gcc34) #IMPLIED>

<!-- The path in which the Engine will search for Command Service executables. -->
<!ELEMENT command-path (pathelement*)>
<!ATTLIST command-path os (win32|solaris|solarisX86|linux|linux64|plinux) #IMPLIED>
<!ATTLIST command-path compiler (gcc2|gcc3|gcc34) #IMPLIED>

<!-- Engine hooks library path. Hook will be initialized as libraries are loaded.
-->
<!ELEMENT hooks-path (pathelement*)>
<!ATTLIST hooks-path os (win32|solaris|solarisX86|linux|linux64|plinux) #IMPLIED>
<!ATTLIST hooks-path compiler (gcc2|gcc3|gcc34) #IMPLIED>

<!-- Environment variables to set. Environment variables are set via JNI
immediately prior to executing a task using this library. -->
<!ELEMENT environment-variables (property*)>
<!ATTLIST environment-variables os (win32|solaris|solarisX86|linux|linux64|plinux) #IMPLIED>
<!ATTLIST environment-variables compiler (gcc2|gcc3|gcc34) #IMPLIED>

<!-- A property, used by env vars & system props. -->
<!ELEMENT property (name,value)>

<!-- The name for a property element. -->
<!ELEMENT name (#PCDATA)>

<!-- The value for a property element. -->
<!ELEMENT value (#PCDATA)>

<!-- Java system properties, which are set upon load. -->
<!ELEMENT java-system-properties (property*)>
<!ATTLIST java-system-properties os (win32|solaris|solarisX86|linux|linux64|plinux) #IMPLIED>
<!ATTLIST java-system-properties compiler (gcc2|gcc3|gcc34) #IMPLIED>

<!-- end of grid-library dtd -->



Appendix B
Reporting Database Tables

Introduction
GridServer uses a simple relational database to report Grid processing events for historical analysis. This
appendix describes the tables in the reporting and internal databases for use by external programs.

Batches
Batches that have been scheduled or executed
Database: reporting
Primary key: none

Column name Data type Description


server Varchar Manager where the Batch resided or ran
batch_id Bigint Unique ID number of the Batch Entry
time_stamp Timestamp Timestamp of the event
event Int Event code
class Varchar Class in the Batch
execution_id Bigint Unique ID number of the Batch Execution, if applicable
description Longvarchar Description of the Batch Event

Brokers
Table of all Brokers that have participated in this Grid.
Database: internal
Primary key: broker_id

Column name Data type Description


broker_id Int Broker ID #
broker_url Varchar Broker’s configured base URL
weight_0 Float Engine weight for Broker routing
weight_1 Float Driver weight for Broker routing
discriminator_0 Longvarchar Engine discriminator for Broker routing*
discriminator_1 Longvarchar Driver discriminator for Broker routing*
broker_name Varchar Name of the Broker
shared_brokers Longvarchar Comma-delimited list of Brokers that share Engines with this Broker
min_engines Int Minimum number of Engines allowed on the Broker
max_engines Int Maximum number of Engines allowed on the Broker

* stored as xml object

Broker_stats
All statistic reports from Brokers are stored in this table.
Database: reporting
Primary key: broker_id + time_stamp

Column name Data type Description


broker_id Int The unique id of the Broker
time_stamp Timestamp Timestamp of the report
num_busy_engines Int Number of Engines busy at report time
num_total_engines Int Number of Engines logged in at report time
num_drivers Int Number of Drivers logged in at report time
uptime_minutes Float Time since Broker start in minutes
num_jobs_running Int Number of jobs running at report time
num_tasks_pending Int Number of tasks pending (not yet assigned to Engines) at report time

Driver_events
Brokers report when a Driver logs in or out.
Database: reporting
Primary key: none

Column name Data type Description


username Varchar Driver user name
hostname Varchar Hostname Driver is running on
time_stamp Timestamp Timestamp of the report
broker_id Int ID of Broker where event occurred
event Int 0 for an add, or the reason code for a remove – map these to the event_codes table

Driver_profiles
Profiles that can be used by Drivers
Database: internal
Primary key: name

Column name Data type Description


name Varchar Profile name
driver_properties Longvarchar Internal properties*
permission_properties Longvarchar Permissions*
description_discriminator Longvarchar Job description discriminator*

* Stored as xml object

Driver_users
Driver users for internal use
Database: internal
Primary key: username

Column name Data type Description


username Varchar Driver username
password Varchar Driver password
hostname Varchar Hostname Driver is on
profile Varchar Driver profile used by Driver



Engine_events
The Brokers report when an Engine is added or removed; for example, when an Engine logs in or logs out.
Database: reporting
Primary key: none

Column name Data type Description


engine_id Bigint The unique id of the Engine
time_stamp Timestamp Timestamp of the report
broker_id Int ID of Broker where event occurred
event Int 0 for an add, or the reason code for a remove – map these to the event_codes table

Engine_info
This table contains administrative information for all Engines that have ever logged in to this Director.
Database: internal
Primary key: engine_id

Column name Data type Description


engine_id Bigint The unique ID of the Engine
username Varchar The username used by the Engine
guid Varchar Another unique identifier, such as a MAC address
IP Varchar The IP address used by the Engine
install_date Timestamp When the Engine was installed
last_logon_date Timestamp When the Engine last logged on*
last_file_update_date Timestamp The last successful file update to the Engine*
properties Longvarchar Administratively defined Engine properties**

* deprecated - fields are no longer updated


** stored as xml object

Engine_stats
All statistic reports from Engine Daemons are stored in this table.
Database: reporting
Primary key: none

Column name Data type Description


engine_id Bigint The unique ID of the Engine
time_stamp Timestamp Timestamp of the report
cpu_utilization Float %CPU total utilization
ds_cpu_utilization Float %CPU utilized by DataSynapse processes
total_ram_kb Bigint Installed RAM reported by the OS in kilobytes
free_ram_kb Bigint Free RAM reported by the OS in kilobytes
disk_mb Bigint Free disk reported by the OS in megabytes
num_invokes Int Number of Engine processes currently running

Event_codes
Table mapping event codes to reasons
Database: reporting or internal
Primary key: none

Column name Data type Description


code Int Numeric code
name Varchar Description

Job_status_codes
Table mapping numeric job status codes to descriptive text
Database: reporting or internal
Primary key: none

Column name Data type Description


code Int Numeric code
name Varchar Description

Jobs
Historical information about all jobs that have been run by GridServer
Database: reporting
Primary key: job_id+start_time

Column name Data type Description


job_id Bigint Job ID
service_type_name Varchar The Service Type used for the Service.
job_class Varchar Java or pseudo-java class used to create the job on the client
start_time Timestamp When job was started
end_time Timestamp When job finished
job_status Int Job status (see job_status_codes table)
num_tasks Int Number of tasks in the job
task_time_std Float Standard deviation of task completion time
task_time_avg Float Mean task completion time
priority Int Job priority when submitted
end_priority Int Job priority when complete
driver_username Varchar Submitting Driver username
driver_hostname Varchar Submitting Driver hostname
job_name Varchar Optional descriptive job name from JobDescription
app_name Varchar Optional descriptive application name from JobDescription
description Varchar Optional description from JobDescription
dept_name Varchar Optional descriptive department name from JobDescription
group_name Varchar Optional descriptive group name from JobDescription
indiv_name Varchar Optional descriptive individual name from JobDescription
broker_id Int ID of Broker that ran the job

Job_discriminators
Table of Job-based discriminators
Database: internal
Primary key: name

Column name Data type Description


name Varchar Name of discriminator
description_discriminator Longvarchar Discriminator on Job description to determine whether to attach job discriminator*
job_discriminator Longvarchar Engine discriminator for service

* Stored as xml object

Properties
Properties used by the Manager for its internal processing.
Database: internal
Primary key: none

Column name Data type Description


name Varchar The property name
value Longvarchar The property value as an XML object.

Tasks
Historical information about all tasks that have been run by GridServer
Database: reporting
Primary key: none

Column name Data type Description


job_id Bigint Job ID
task_id Int Task ID
engine_id Bigint Engine that (finally) ran task
start_time Timestamp When task was started
end_time Timestamp When task finished
task_status Int Task status (see task_status_codes table)
num_reschedules Int Number of times task was retried
engine_instance Int Number of Engine instance that ran task
task_info Varchar Task information

Task_status_codes
Table mapping numeric task status codes to descriptive text
Database: reporting or internal
Primary key: none

Column name Data type Description


code Int Numeric code
name Varchar Description

Users
Administrative users for internal use
Database: internal
Primary key: none

Column name Data type Description


username Varchar User name
user_access Int Authorized role
user_info Longvarchar Various internal info about the user
personalization Longvarchar UI personalization*

* Stored as xml object

User_events
Table stores historical user events.
Database: reporting
Primary key: none

Column name Data type Description


server Varchar Server where event occurred
username Varchar User recording event
time_stamp Timestamp When event occurred
handler Varchar Internal handler class that recorded event
event Longvarchar Description of event


