Liang Peng, Lip Kian Ng, and Simon See
Asia Pacific Science and Technology Center, Sun Microsystems Inc.
Nanyang Center for Supercomputing and Visualization, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
{pengliang,lkng,simon}@apstc.sun.com.sg
Abstract
Computational Grids provide an emerging, highly distributed computing platform for scientific computing. Recently, service-oriented architecture (SOA) has become a trend in implementing software systems, including Grids. SOA provides more flexibility for Grid users at the service level. High performance computing (HPC) facilities such as HPC clusters, as building blocks of Grid computing, play an important role in computational Grids, and they are embracing SOA as they are integrated into Grids in the form of services. However, how to build flexible and easy-to-use HPC services for Grid computing remains an open topic, and not much work has been done in this area. In this paper, we propose an HPC cluster service architecture for Grid computing and utility computing. It provides basic functions such as service deployment, service monitoring, and service execution. HPC cluster service deployment includes not only normal application deployment, but also operating system (currently OpenSolaris) deployment on demand. Based on Solaris Jumpstart technology and some related tools, a prototype of this architecture has been developed and is running on HPC clusters. With our prototype, Grid users are able to deploy a basic HPC environment (e.g., OpenSolaris, MPICH, and the Sun Grid Engine or N1 Grid Engine resource management tool) among the available cluster nodes. Our experiments show that our work provides great convenience and flexibility for users to set up and customize their preferred HPC cluster environment for their computation-intensive applications in Grid computing or utility computing.
Keywords: Grid computing, High Performance Cluster, Grid Service, Service Oriented Grid Architecture
1. Introduction
In recent years, Grid computing has developed rapidly as an approach to high performance scientific and engineering computation. While web services have become a fundamental concept for distributed computing over the Internet, service-oriented computing is one of the major computing paradigms, and Service Oriented Architecture (SOA) [6] has become a trend in building software architecture, not only in existing enterprise IT environments but also in emerging Grid computing environments. HPC facilities, as important building blocks of Grid environments, are also going service-oriented as they are integrated into Grids. HPC supercomputers and clusters will be able to appear as computing services in service-oriented Grid architectures. In the meantime, different HPC users may have different requirements on the HPC facility depending on the applications they run, so it would be very useful if the HPC facility were reconfigurable by its users. This brings out the problem of how to build reconfigurable, easy-to-use HPC services and integrate them into a Grid service architecture. Challenging issues include the reconfigurability of the HPC service, the presentation of the HPC functionality, and the interface definition between the HPC service and the Grid architecture. So far, little work has been done to address this problem. In this paper we present our work on providing a Grid service that allows users to build an HPC cluster computing environment on demand and then run their applications on it. Our architecture provides basic functionality such as service deployment, service monitoring, and service execution. HPC cluster service deployment includes not only normal application deployment, but also operating system (currently OpenSolaris) deployment on demand. We implemented a prototype of this architecture based on Solaris Jumpstart technology and some related tools. With our
Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region (HPCASIA05), 0-7695-2486-9/05 $20.00 © 2005 IEEE
prototype, Grid users are able to deploy an HPC environment (e.g., OpenSolaris [2], MPICH [1], and the N1 Grid Engine [8] resource management tool) on demand from the available options. Our work is useful for HPC service provisioning in Grid computing and utility computing. The remainder of this paper is organized as follows. Section 2 gives an overview of the reconfigurable HPC cluster service for Grids; Section 3 describes some implementation details of our prototype as a proof of concept; a simple demonstration of a default HPC cluster deployment is shown in Section 4; related work is discussed in Section 5; and finally we conclude in Section 6.
To use the HPC cluster service, a Grid user typically needs to:

- search for and discover the HPC cluster service through a service discovery mechanism;
- obtain credentials from the Globus gateway/server for security;
- call the provided service operations from the client program, such as deploying a default HPC cluster (including the OS, the parallel computing environment, and the Distributed Resource Manager), or only some of these components selectively;
- destroy the environment after the computing jobs are finished, so that the system can reclaim the computing resources for later use.

HPC job submission is out of the scope of this paper, since here we mainly focus on how to set up and customize an HPC environment through Grid services. Once the HPC environment is set up, the user should be able to run jobs on it as they do with other HPC systems within the Grid environment.
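The client-side workflow above can be sketched in Java as follows. The `HpccService` interface mirrors the four operations in the service's WSDL, but the interface itself, the `LoggingService` stub, and the session shape are illustrative assumptions; a real client would use Globus-generated stubs and present a security credential first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Globus-generated service stub; the real client
// would obtain an implementation from the service's WSDL after authenticating.
interface HpccService {
    void deployDefaultHPCC(String osType); // OS + resource manager + MPICH
    void deployOS(String osType);
    void deployN1GE(String osType);
    void destroyHPCC(String osType);
}

public class HpccClientSketch {
    // Test double that records the operations invoked, so the
    // client workflow can be inspected without a live gateway.
    static class LoggingService implements HpccService {
        final List<String> calls = new ArrayList<>();
        public void deployDefaultHPCC(String osType) { calls.add("DeployDefaultHPCC:" + osType); }
        public void deployOS(String osType)          { calls.add("DeployOS:" + osType); }
        public void deployN1GE(String osType)        { calls.add("DeployN1GE:" + osType); }
        public void destroyHPCC(String osType)       { calls.add("DestroyHPCC:" + osType); }
    }

    // A typical session: deploy a default cluster, run jobs (out of scope
    // here), then tear the environment down so the nodes can be reclaimed.
    static List<String> runSession(LoggingService svc) {
        svc.deployDefaultHPCC("OpenSolaris");
        svc.destroyHPCC("OpenSolaris");
        return svc.calls;
    }

    public static void main(String[] args) {
        System.out.println(runSession(new LoggingService()));
    }
}
```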
Figure 1. System overview: the client environment (client program, command line) invokes the HPCC service implementation through the Globus gateway (with service security configuration), which drives the HPC cluster.
<?xml version="1.0" encoding="UTF-8"?>
<definitions name="YellowRiverHPCCS"
    targetNamespace="http://apstc.sun.com.sg/namespaces/YellowRiverInstance"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:tns="http://apstc.sun.com.sg/namespaces/YellowRiverInstance"
    xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing"
    xmlns:wsrlw="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceLifetime-1.2-draft-01.wsdl"
    xmlns:wsrp="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.xsd"
    xmlns:wsrpw="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.wsdl"
    xmlns:wsntw="http://docs.oasis-open.org/wsn/2004/06/wsn-WS-BaseNotification-1.2-draft-01.wsdl"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- YellowRiver HPC Cluster Grid Service description file -->
  <wsdl:import
      namespace="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.wsdl"
      location="../../../wsrf/properties/WS-ResourceProperties.wsdl"/>
  ......
  <portType name="YellowRiverHPCCSPortType"
      gtwsdl:implements="wsntw:NotificationProducer
                         wsrlw:ImmediateResourceTermination
                         wsrlw:ScheduledResourceTermination"
      wsrp:ResourceProperties="tns:YellowRiverHPCCSRP">
    <!-- Operation invoked when creating the Grid service -->
    <operation name="DeployDefaultHPCC">
      <input message="tns:DeployDefaultHPCCRequest"/>
      <output message="tns:DeployDefaultHPCCResponse"/>
    </operation>
    <operation name="DeployOS">
      <input message="tns:DeployOSRequest"/>
      <output message="tns:DeployOSResponse"/>
    </operation>
    <operation name="DeployN1GE">
      <input message="tns:DeployN1GERequest"/>
      <output message="tns:DeployN1GEResponse"/>
    </operation>
    <operation name="DestroyHPCC">
      <input message="tns:DestroyHPCCInputMessage"/>
      <output message="tns:DestroyHPCCOutputMessage"/>
    </operation>
  </portType>
</definitions>
package com.sun.apstc.hptc.YellowRiver;

import java.rmi.RemoteException;
import com.sun.apstc.wsrf.Resource;
import com.sun.apstc.wsrf.ResourceProperties;
......

public class YellowRiver implements Resource, ResourceProperties {
    ......
    public void DeployDefaultHPCC(String OStype) { ... }
    public void DeployOS(String OStype) { ... }
    public void DeployN1GE(String OStype) { ... }
    public void DestroyHPCC(String OStype) { ... }
}
Figure 3. YellowRiver HPC Cluster Service implementation (pseudo code)

On the server side, the service checks the system log file to monitor PXE boot requests from compute nodes, then calls the relevant JET utilities to generate template files and transfer the OS image to the compute node. Some Grid-service implementation pseudo code and structure are shown in Figure 3. Our design aims at flexibility in the sense that users are able to add their own HPC components. Currently we have prepared two default HPC components, namely Sun N1 Grid Engine and MPICH. The whole HPC stack is implemented as a JET module, and users can upload their own software to the server as an HPC component so that it can be deployed to the cluster.
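The server-side log monitoring step described above can be sketched as follows. This is a minimal illustration, not the YellowRiver code: the syslog wording (`DHCPDISCOVER from <MAC>`) is an assumed dhcpd-style message, and a real daemon would tail the live log file rather than scan a fixed batch of lines.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PxeBootWatcher {
    // The exact syslog wording varies by DHCP server; this pattern is an
    // illustrative assumption matching a line that carries a MAC address
    // in a PXE/DHCP boot request.
    private static final Pattern BOOT_REQ =
        Pattern.compile("DHCPDISCOVER from ((?:[0-9a-f]{2}:){5}[0-9a-f]{2})");

    // Scans a batch of log lines and returns the MAC address of the first
    // node asking to be installed, if any.
    static Optional<String> firstBootRequest(List<String> logLines) {
        for (String line : logLines) {
            Matcher m = BOOT_REQ.matcher(line);
            if (m.find()) return Optional.of(m.group(1));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "Jun 1 10:00:01 thebe sshd[123]: session opened",
            "Jun 1 10:00:05 thebe dhcpd: DHCPDISCOVER from 00:09:3d:00:12:ab via bge0");
        System.out.println(firstBootRequest(sample).orElse("none"));
    }
}
```

On a match, the daemon would proceed to the JET template generation and image transfer described above.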
4. Experiments
In this section, we demonstrate an example of a default HPC cluster deployment using the YellowRiver HPC Cluster Grid service. The current default deployment includes OpenSolaris/Solaris 10, Sun N1 Grid Engine, and the MPICH parallel programming library over the cluster. Our hardware testbed is a cluster of Sun Fire V20z machines. When the service is invoked, a YellowRiver daemon runs on the Globus gateway server (which is actually a front-end node of the cluster), monitoring the internal network for incoming PXE boot requests and acting accordingly. Figure 4 shows the service running on the server side.
Figure 4. Service running on the server side to monitor PXE boot requests from compute nodes
When a compute node is PXE booted, it sends a DHCP request to the server and waits for an IP address.
Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region (HPCASIA05) 0-7695-2486-9/05 $20.00 2005
IEEE
After that, Solaris boots up from the network. Figure 5 shows Solaris network booting on the client after the server has allocated it an IP address.
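The boot sequence described in this section (PXE boot, DHCP address assignment, network boot, package installation) can be summarized as a simple state machine; the state names below are our own labels for the stages in the text, not identifiers from the implementation.

```java
public class ProvisioningStateMachine {
    // Illustrative provisioning stages for one compute node, following the
    // deployment sequence described in the paper.
    enum State { POWERED_ON, PXE_BOOT, IP_ASSIGNED, NET_BOOTING, INSTALLING, READY }

    static State next(State s) {
        switch (s) {
            case POWERED_ON:  return State.PXE_BOOT;    // firmware requests a network boot
            case PXE_BOOT:    return State.IP_ASSIGNED; // server answers the DHCP request
            case IP_ASSIGNED: return State.NET_BOOTING; // Solaris boots over the network
            case NET_BOOTING: return State.INSTALLING;  // packages copied and installed
            default:          return State.READY;       // node joins the cluster
        }
    }

    public static void main(String[] args) {
        State s = State.POWERED_ON;
        while (s != State.READY) {
            System.out.println(s);
            s = next(s);
        }
        System.out.println(s);
    }
}
```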
Figure 5. Solaris is network booting on the compute node

Figure 7. The compute node (compute-0-1) package installation is in progress
In the meantime, the daemon running on the server side also acts accordingly. Figure 6 shows the daemon performing further operations on the server side after a compute node's PXE boot request has been detected, including making templates for the Jumpstart installation.
The installation on the compute node continues until all the software packages are installed and configured. Figure 8 shows that after installation there are two nodes in the cluster: thebe (the server) and compute-0-1 (a newly installed compute node).
Figure 6. Server performing further operations after detecting that a compute node is being installed

Figure 8. A new compute node (compute-0-1) is added into the cluster
When the server is ready for remote installation (e.g., the tftp service has been set up), the OS image can be transferred to the compute node and installed. Figure 7 shows that the Solaris installation is in progress (package copying and installation).
The whole procedure of a default setup (installing OpenSolaris, N1GE, and MPICH) on one compute node takes about 5 to 20 minutes, depending on the number of packages selected for the OS installation. For multiple compute nodes, the installations can largely overlap in time.
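The benefit of overlapping installations can be illustrated with a back-of-envelope model, assuming (hypothetically) that per-node server-side work such as template generation is serialized while the node-side package installation runs in parallel; the constants below are illustrative, not measured values from the paper.

```java
public class InstallTimeModel {
    // Rough model: per-node server-side work (template generation, tftp
    // setup) is serialized, while package copying and installation on the
    // nodes proceeds in parallel across the cluster.
    static double totalMinutes(int nodes, double serialPerNode, double parallelPerNode) {
        if (nodes == 0) return 0.0;
        return nodes * serialPerNode + parallelPerNode;
    }

    public static void main(String[] args) {
        // With, say, 1 minute of serial server work per node and 15 minutes
        // of overlapped installation, 8 nodes finish in about 23 minutes,
        // versus roughly 128 minutes for a fully serial rollout.
        System.out.println(totalMinutes(8, 1.0, 15.0));
    }
}
```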
5. Related Work
There have been many efforts on automatic HPC cluster deployment and management; however, few of them aim to provide it as a Grid service for Grid computing or utility computing. Rocks [4, 5, 7] is a typical tool for automatic cluster deployment and management. It provides a lot of third-party software that enables HPC in a cluster. However, Rocks is not provided as a service for Grid or utility computing. OSCAR [3] (Open Source Cluster Application Resources) is another cluster deployment and management tool. The major difference between OSCAR and Rocks is that OSCAR is separate from the OS, while the Rocks distribution is already integrated with the RedHat Linux installation.
6. Conclusion

In this paper we presented an HPC cluster computing service for Grid and utility computing. With our proposed architecture, Grid users are able to deploy an HPC cluster (including Solaris 10/OpenSolaris, MPICH, and the N1GE resource manager) based on the requirements of their Grid jobs. A prototype has been implemented, and a demonstration of deploying a default HPC cluster has been shown. Our work is useful for HPC cluster service provisioning in Grid or utility computing environments, where computing services can be reconfigured by the service consumers on demand. Facilities such as HPC benchmark centers with large clusters can also benefit from this solution. Our work is still in progress, and much future work needs to be done to make the service practical and more convenient. Some components are not yet implemented, such as the service registration mechanism, security components, and a GUI or browser-based interface. Moreover, support for multiple operating systems may also be of interest in our future work.

References

[1] MPICH — A Portable Implementation of MPI. http://www-unix.mcs.anl.gov/mpi/mpich/.
[2] OpenSolaris. http://www.opensolaris.org/.
[3] Open Source Cluster Application Resources. http://oscar.openclustergroup.org/.
[4] Leveraging Standard Core Technologies to Programmatically Build Linux Cluster Appliances. In CLUSTER 2002: IEEE International Conference on Cluster Computing, Apr. 2002.
[5] P. M. Papadopoulos, M. J. Katz, and G. Bruno. NPACI Rocks: Tools and techniques for easily deploying manageable Linux clusters. Concurrency and Computation: Practice and Experience, Special Issue: Cluster 2001, June 2002.
[6] M. P. Papazoglou and D. Georgakopoulos. Service-Oriented Computing. Communications of the ACM, 46(10):25-28, Oct. 2003.
[7] F. D. Sacerdoti, M. J. Katz, M. L. Massie, and D. E. Culler. Wide area cluster monitoring with Ganglia. In IEEE International Conference on Cluster Computing, Hong Kong, Dec. 2003.
[8] Sun Microsystems, Inc. N1 Grid Engine. http://docs.sun.com/app/docs/coll/1017.3?q=N1GE.