
YellowRiver: A Flexible High Performance Cluster Computing Service for Grid

Liang Peng, Lip Kian Ng, and Simon See
Asia Pacific Science and Technology Center, Sun Microsystems Inc.
Nanyang Center for Supercomputing and Visualization, Nanyang Technological University,
50 Nanyang Avenue, Singapore 639798
{pengliang,lkng,simon}@apstc.sun.com.sg

Abstract
Computational Grids provide an emerging, highly distributed computing platform for scientific computing. Recently, service oriented architecture (SOA) has become a trend in implementing software systems, including Grids. SOA provides more flexibility for Grid users at the service level. High performance computing (HPC) facilities such as HPC clusters, as building blocks of Grid computing, play an important role in computational Grids, and they are embracing SOA as they are integrated into Grids in the form of services. Currently, how to build flexible and easy-to-use HPC services for Grid computing remains an open topic, and little work has been done in this area. In this paper, we propose an HPC cluster service architecture for Grid computing and utility computing. It provides basic functions such as service deployment, service monitoring, and service execution. HPC cluster service deployment includes not only normal application deployment, but also operating system (currently OpenSolaris) deployment on demand. Based on Solaris JumpStart technology and some related tools, a prototype of this architecture has been developed and is running on HPC clusters. With our prototype, Grid users are able to deploy a basic HPC environment (e.g., OpenSolaris, MPICH, and the Sun N1 Grid Engine resource management tool) on the available cluster nodes. Our experiments show that our work provides great convenience and flexibility for users to set up and customize their preferred HPC cluster environment for their computation-intensive applications in Grid computing or utility computing.

Keywords: Grid computing, High Performance Cluster, Grid Service, Service Oriented Grid Architecture

1. Introduction
In recent years, Grid computing has developed rapidly as an approach to high performance scientific and engineering computation. While web services have become a fundamental concept for distributed computing over the Internet, service oriented computing is one of the major computing paradigms, and Service Oriented Architecture (SOA) [6] has become a trend in building software architecture, not only in existing enterprise IT environments but also in emerging Grid computing environments. HPC facilities, as important building blocks of Grid environments, are also becoming service oriented as they are integrated into Grid environments; HPC supercomputers and clusters will be able to appear as computing services in service oriented Grid architectures. In the meantime, different HPC users may have different requirements on an HPC facility depending on the applications they run, so it would be very useful if the HPC facility were reconfigurable by its users. This brings out the problem of how to build reconfigurable and easy-to-use HPC services and integrate them into a Grid service architecture. Challenging issues include the reconfigurability of the HPC service, the presentation of the HPC functionality, and the interface definition between the HPC service and the Grid architecture. So far, little work has been done to address this problem. In this paper we present our work on providing a Grid service that allows users to build an HPC cluster computing environment on demand and then run their applications on it. Our architecture provides basic functionality such as service deployment, service monitoring, and service execution. HPC cluster service deployment includes not only normal application deployment, but also operating system (currently OpenSolaris) deployment on demand. We implemented a prototype of this architecture based on Solaris JumpStart technology and some related tools. With our


prototype, Grid users are able to deploy an HPC environment (e.g., OpenSolaris [2], MPICH [1], and the N1 Grid Engine [8] resource management tool) on demand from the available options. Our work is useful for HPC service provisioning in Grid computing and utility computing. The remainder of this paper is organized as follows: Section 2 gives an overview of the reconfigurable HPC cluster service for the Grid; Section 3 describes some implementation details of our prototype as a proof of concept; a simple demonstration of a default HPC cluster deployment is shown in Section 4; some related work is introduced in Section 5; and finally we conclude in Section 6.

2. Overview of HPC Cluster Grid Service

The architecture of the YellowRiver HPC Cluster Grid service includes several components: the HPC clusters; the HPC cluster service implementation and the Globus service container in which it runs; the client program that calls the operations of the service; the client environment in which the client program runs; a Grid service registry server; and some security components. Figure 1 shows the relationships between these components.

[Figure 1. Overview of the YellowRiver HPC Cluster Service: the HPC cluster; the Globus gateway hosting the service security configuration, the HPCC service implementation, and the YellowRiver HPCC service WSDL file; the Grid service registry (ServiceRegister); and the client environment, in which the client program performs Grid service discovery and is driven through a web portal or command line by Grid service users.]

With this architecture, Grid users are able to:

- search for and discover the HPC cluster service through certain service discovery mechanisms;
- obtain credentials from the Globus gateway/server for security;
- call the provided service operations from the client program, for example to deploy a default HPC cluster (including the OS, the parallel computing environment, and the distributed resource manager) or some of these components selectively (a minimal client sketch follows at the end of this section);
- destroy the environment after the computing jobs are finished, so that the system can reclaim the computing resources for later use.

HPC job submission is outside the scope of this paper, since we focus here on how to set up and customize an HPC environment through Grid services. Once the HPC environment is set up, users should be able to run their jobs as they would on any other HPC system within the Grid environment.
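To make this workflow concrete, the sketch below shows a minimal client in Java. The stub types YellowRiverHPCCSServiceLocator and YellowRiverHPCCSPortType are hypothetical stand-ins for the classes a WSDL-to-Java tool would generate from the service description in Section 3.1; the endpoint URL is a placeholder, and credential acquisition from the Globus gateway, which would precede these calls in a real client, is elided.

package com.sun.apstc.hptc.client;

// Hypothetical stubs, assumed to be generated from the YellowRiver WSDL;
// the names mirror the portType in Section 3.1 but are illustrative only.
import com.sun.apstc.hptc.stubs.YellowRiverHPCCSServiceLocator;
import com.sun.apstc.hptc.stubs.YellowRiverHPCCSPortType;

public class DeployDefaultClusterClient {
    public static void main(String[] args) throws Exception {
        // Locate the service endpoint (placeholder address).
        YellowRiverHPCCSServiceLocator locator = new YellowRiverHPCCSServiceLocator();
        YellowRiverHPCCSPortType hpcc = locator.getYellowRiverHPCCSPort(
                new java.net.URL("https://gateway.example.org:8443/wsrf/services/YellowRiverHPCCS"));

        // Deploy a default HPC cluster: OS + MPICH + N1 Grid Engine.
        hpcc.DeployDefaultHPCC("OpenSolaris");

        // ... submit and run jobs through the usual Grid mechanisms ...

        // Tear the environment down so the nodes can be reclaimed.
        hpcc.DestroyHPCC("OpenSolaris");
    }
}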

3. Implementation of HPC Cluster Grid Service


We implemented a prototype of our architecture; a sample WSDL file and pseudo code of the service implementation are shown in the following subsections. Our development work is still in progress, and some other components, such as Grid service registration and Grid security, are not implemented yet.


3.1 YellowRiver HPC Cluster Service WSDL File


By defining the interface in the WSDL file, we specify what our service provides to its users. The basic operations include DeployDefaultHPCC (deploy a default HPC cluster), DeployOS (deploy the operating system only), DeployN1GE (deploy the Sun N1 Grid Engine resource manager), and DestroyHPCC (destroy the HPC cluster that has been built). The WSDL file is partially shown in Figure 2. These operations can also be combined for selective deployment, as the sketch below illustrates.
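The following fragment reuses the hypothetical client stubs from the Section 2 sketch; it first deploys only the operating system and then layers the resource manager on top once the nodes are up:

package com.sun.apstc.hptc.client;

// Reuses the hypothetical stub interface from the Section 2 sketch.
import com.sun.apstc.hptc.stubs.YellowRiverHPCCSPortType;

public class SelectiveDeployExample {
    // Deploy the OS first, then add Sun N1 Grid Engine separately.
    static void osThenN1GE(YellowRiverHPCCSPortType hpcc) throws Exception {
        hpcc.DeployOS("Solaris10");   // operating system only
        hpcc.DeployN1GE("Solaris10"); // resource manager on top of the OS
    }
}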


3.2 Service Implementation


The YellowRiver HPC Cluster service is implemented in Java. It also utilizes an available tool, JET (the JumpStart Enterprise Toolkit), which is based on Solaris JumpStart technology but makes it easier to use. In our Java implementation, we monitor the server's system log file while rebooting the other compute nodes (this can be done through the service processor) for installation. When a PXE boot request comes in, the service allocates an IP address via the DHCP server and calls the relevant JET utilities to generate the template files and transfer the OS image to the compute node.



<?xml version="1.0" encoding="UTF-8"?>
<definitions name="YellowRiverHPCCS"
    targetNamespace="http://apstc.sun.com.sg/namespaces/YellowRiverInstance"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:tns="http://apstc.sun.com.sg/namespaces/YellowRiverInstance"
    xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing"
    xmlns:wsrlw="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceLifetime-1.2-draft-01.wsdl"
    xmlns:wsrp="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.xsd"
    xmlns:wsrpw="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.wsdl"
    xmlns:wsntw="http://docs.oasis-open.org/wsn/2004/06/wsn-WS-BaseNotification-1.2-draft-01.wsdl"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- YellowRiver HPC Cluster Grid Service description file -->
  <wsdl:import
      namespace="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.wsdl"
      location="../../../wsrf/properties/WS-ResourceProperties.wsdl"/>
  ......
  <portType name="YellowRiverHPCCSPortType"
      gtwsdl:implements="wsntw:NotificationProducer
                         wsrlw:ImmediateResourceTermination
                         wsrlw:ScheduledResourceTermination"
      wsrp:ResourceProperties="tns:YellowRiverHPCCSRP">
    <!-- Operation invoked when creating the Grid service -->
    <operation name="DeployDefaultHPCC">
      <input message="tns:DeployDefaultHPCCRequest"/>
      <output message="tns:DeployDefaultHPCCResponse"/>
    </operation>
    <operation name="DeployOS">
      <input message="tns:DeployOSRequest"/>
      <output message="tns:DeployOSResponse"/>
    </operation>
    <operation name="DeployN1GE">
      <input message="tns:DeployN1GERequest"/>
      <output message="tns:DeployN1GEResponse"/>
    </operation>
    <operation name="DestroyHPCC">
      <input message="tns:DestroyHPCCInputMessage"/>
      <output message="tns:DestroyHPCCOutputMessage"/>
    </operation>
  </portType>
</definitions>

Figure 2. YellowRiver HPC Cluster Service WSDL file


package com.sun.apstc.hptc.YellowRiver;

import java.rmi.RemoteException;
import com.sun.apstc.wsrf.Resource;
import com.sun.apstc.wsrf.ResourceProperties;
......

public class YellowRiver implements Resource, ResourceProperties {
    ......
    public void DeployDefaultHPCC(String OStype) { ... }
    public void DeployOS(String OStype) { ... }
    public void DeployN1GE(String OStype) { ... }
    public void DestroyHPCC(String OStype) { ... }
}

Figure 3. YellowRiver HPC Cluster Service implementation pseudo code

The Grid service implementation pseudo code and class structure are shown in Figure 3. Our design makes the service flexible in the sense that users are able to add their own HPC components. Currently we have prepared two default HPC components, namely Sun N1 Grid Engine and MPICH. The whole HPC stack is implemented as a JET module, and users can upload their own software to the server as an HPC component so that it can be deployed to the cluster.
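To make the deployment flow concrete, the sketch below expands the pseudo code into a minimal, self-contained server-side loop. The log path, the matched keyword, the fixed node name, and the JET invocation are assumptions based on a standard Solaris/JET setup, not the exact code of our prototype.

package com.sun.apstc.hptc.YellowRiver;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Simplified sketch of the server-side deployment loop: tail the system
// log, detect a PXE boot request, then hand the node over to JET.
public class DeployDaemonSketch {

    private static final String SYSLOG = "/var/adm/messages"; // Solaris system log

    public static void main(String[] args) throws IOException, InterruptedException {
        BufferedReader log = new BufferedReader(new FileReader(SYSLOG));
        while (log.readLine() != null) {
            // Fast-forward past existing entries; only new ones matter.
        }
        while (true) {
            String line = log.readLine();
            if (line == null) {        // no new entries yet
                Thread.sleep(1000);
                continue;
            }
            // The keyword is illustrative; the real daemon parses the
            // DHCP/PXE messages that the DHCP server writes to the log.
            if (line.contains("PXEClient")) {
                deployWithJet("compute-0-1"); // node name fixed for the sketch
            }
        }
    }

    // Hand the node over to JET: register the client so that the OS image
    // is served on its next network boot. The path and flag are assumptions
    // based on a standard JET installation.
    private static void deployWithJet(String node)
            throws IOException, InterruptedException {
        Process p = Runtime.getRuntime().exec(
                new String[]{"/opt/SUNWjet/bin/make_client", "-f", node});
        p.waitFor();
    }
}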

4. Experiments
In this section, we demonstrate an example of a default HPC cluster deployment using the YellowRiver HPC Cluster Grid service. The current default deployment includes OpenSolaris/Solaris 10, Sun N1 Grid Engine, and the MPICH parallel programming libraries over the cluster. Our hardware testbed is a cluster of Sun Fire V20z machines. When the service is invoked, a YellowRiver daemon runs on the Globus gateway server (which is actually a front-end node of the cluster), monitoring the internal network for incoming PXE boot requests and acting accordingly. Figure 4 shows the service running on the server side, checking the system log file to monitor PXE boot requests from the compute nodes.

Figure 4. Service running on the server side to monitor PXE boot requests from the compute nodes

When a compute node is PXE booted, it sends a DHCP request to the server and waits for an IP address.


After that, Solaris boots from the network. Figure 5 shows Solaris network booting on the compute node after the server has allocated it an IP address.

Figure 5. Solaris network booting on the compute node

In the meantime, the daemon running on the server side acts accordingly. Figure 6 shows the daemon performing further operations on the server side after the compute node's PXE boot request has been detected, including generating the templates for the JumpStart installation.

Figure 6. The server performing further operations after detecting that a compute node is being installed

When the server is ready for remote installation (e.g., the tftp service has been set up), the OS image can be transferred to the compute node and installed. Figure 7 shows the Solaris installation in progress (package copying and installation) on the compute node.

Figure 7. Package installation in progress on the compute node (compute-0-1)

The installation on the compute node continues until all the software packages are installed and configured. Figure 8 shows that after installation there are two nodes in the cluster: thebe (the server) and compute-0-1 (the newly installed compute node).

Figure 8. A new compute node (compute-0-1) is added to the cluster

The whole procedure of a default setup (installing OpenSolaris, N1GE, and MPICH) on one compute node takes about 5 to 20 minutes, depending on the number of packages selected for the OS installation. For multiple compute nodes, the installations can largely overlap with one another.


5. Related Work
There have been many efforts on automatic HPC cluster deployment and management; however, few of them aim to provide it as a Grid service for Grid computing or utility computing. Rocks [4, 5, 7] is a typical tool for automatic cluster deployment and management. It provides a large collection of third party software that enables HPC on a cluster. However, Rocks is not provided as a service for Grid or utility computing. OSCAR [3] (Open Source Cluster Application Resources) is another cluster deployment and management tool. The major difference between OSCAR and Rocks is that OSCAR is separate from the OS, while the Rocks distribution is already integrated with the Red Hat Linux installation.


6. Conclusion
In this paper we presented an HPC cluster computing service for Grid and utility computing. With our proposed architecture, Grid users are able to deploy an HPC cluster (including Solaris 10/OpenSolaris, MPICH, and the N1GE resource manager) based on the requirements of their Grid jobs. A prototype has been implemented, and a demonstration of deploying a default HPC cluster has been shown. Our work is useful for HPC cluster service provisioning in Grid or utility computing environments, where the computing services can be reconfigured by the service consumers on demand. Facilities such as HPC benchmark centers with large clusters can also benefit from this solution. Our work is still in progress, and much future work needs to be done to make the service practical and more convenient. Some components are not implemented yet, such as the service registration mechanism, the security components, and a GUI or browser based interface. Moreover, a wider choice of operating systems may also be of interest in our future work.

References

[1] MPICH: A Portable Implementation of MPI. http://www-unix.mcs.anl.gov/mpi/mpich/.
[2] OpenSolaris. http://www.opensolaris.org/.
[3] Open Source Cluster Application Resources (OSCAR). http://oscar.openclustergroup.org/.
[4] Leveraging Standard Core Technologies to Programmatically Build Linux Cluster Appliances. In CLUSTER 2002: IEEE International Conference on Cluster Computing, Apr. 2002.
[5] P. M. Papadopoulos, M. J. Katz, and G. Bruno. NPACI Rocks: Tools and techniques for easily deploying manageable Linux clusters. Concurrency and Computation: Practice and Experience, Special Issue: Cluster 2001, June 2002.
[6] M. P. Papazoglou and D. Georgakopoulos. Service-Oriented Computing. Communications of the ACM, 46(10):25-28, Oct. 2003.
[7] F. D. Sacerdoti, M. J. Katz, M. L. Massie, and D. E. Culler. Wide area cluster monitoring with Ganglia. In IEEE International Conference on Cluster Computing, Hong Kong, Dec. 2003.
[8] Sun Microsystems, Inc. N1 Grid Engine. http://docs.sun.com/app/docs/coll/1017.3?q=N1GE.

