
CS6703-GRID AND CLOUD COMPUTING

Dr.R.Anand
Associate Professor
Department of IT
UNIT I
INTRODUCTION
Evolution of Distributed computing: Scalable computing over the Internet – Technologies for network
based systems – clusters of cooperative computers - Grid computing Infrastructures – cloud computing -
service oriented architecture – Introduction to Grid Architecture and standards –Elements of Grid –
Overview of Grid Architecture.
1.1 Evolution of Distributed computing: Scalable computing over the Internet - Technologies for
network based systems - Clusters of cooperative computers
Certainly one can go back a long way to trace the history of distributed computing. Types of
distributed computing existed in the 1960s.
Many people were interested in connecting computers together for high performance computing
in the 1970s and in particular forming multicomputer or multiprocessor systems. From connecting
processors and computers together locally that began in earnest in the 1960s and 1970s, distributed
computing now extends to connecting computers that are geographically distant.
The distributed computing technologies that underpin Grid computing were developed
concurrently and rely upon each other.
There are three concurrent interrelated paths. They are:
•Networks
•Computing platforms
•Software techniques
Networks: Grid computing relies on high performance computer networks. The history of such networks
began in the 1960s with the development of packet switched networks. The most important and ground-
breaking geographically distributed packet-switched network was the DoD-funded ARPANET, with a
design speed of 50 Kbits/sec.
ARPANET became operational with four nodes (University of California at Los Angeles, Stanford
Research Institute, University of California at Santa Barbara and University of Utah) in 1969. TCP
(Transmission Control Protocol) was conceived in 1974 and became TCP/IP (Transmission Control
Protocol/Internet Protocol) in 1978. TCP/IP became universally adopted. TCP provided a protocol for
reliable communication while IP provided for network routing. Important concepts introduced include IP
addresses to identify hosts on the Internet and ports that identify end points (processes) for communication
purposes. The Ethernet was also developed in the early 1970s and became the principal way of
interconnecting computers on local networks.
It initially enabled multiple computers to share a single Ethernet cable and handled
communication collisions with a retry protocol although nowadays this collision detection is usually not
needed as separate Ethernet cables are used for each computer, with Ethernet switches to make
connections. Each Ethernet interface has a unique physical address to identify it for communication
purposes, which is mapped to the host's IP address.
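The following minimal sketch (Python, not part of the original text) illustrates these concepts: the IP address identifies the host and the port number identifies the end point (process) on that host. The address and port used are arbitrary example values, and for simplicity the client and server run on one machine.

import socket
import threading
import time

HOST, PORT = "127.0.0.1", 50007   # example (IP address, port) end point

def server():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((HOST, PORT))       # bind the server process to the end point
        s.listen(1)
        conn, addr = s.accept()    # addr is the client's own (IP, port) pair
        with conn:
            conn.sendall(b"echo: " + conn.recv(1024))

threading.Thread(target=server, daemon=True).start()
time.sleep(0.2)                    # give the server a moment to start listening

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as c:
    c.connect((HOST, PORT))        # TCP gives a reliable byte stream; IP routes the packets
    c.sendall(b"hello")
    print(c.recv(1024))            # prints b'echo: hello'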
The Internet began to be formed in the early 1980s using the TCP/IP protocol. During the 1980s, the
Internet grew at a phenomenal rate. Networks continued to improve and became more pervasive
throughout the world. In the 1990s, the Internet developed into the World-Wide Web. The browser and
the HTML markup language were introduced. The global network enables computers to be interconnected
virtually anywhere in the world.
Computing Platforms: Computing systems began as single processor systems. It was soon recognized
that increased speed could potentially be obtained by having more than one processor inside a single
computer system and the term parallel computer was coined to describe such systems.
Parallel computers were limited to applications that required computers with the highest
computational speed. It was also recognized that one could connect a collection of individual computer
systems together quite easily to form a multicomputer system for higher performance. There were many
projects in the 1970s and 1980s with this goal, especially with the advent of low cost microprocessors.



In the 1990s, it was recognized that commodity computers (PCs) provided the ideal cost-effective
solution for constructing multicomputer and the term cluster computing emerged. In cluster computing, a
group of computers are connected through a network switch as illustrated in the figure below. Specialized
high-speed interconnections were developed for cluster computing.
However, many chose to use commodity Ethernet as a cost-effective solution although Ethernet
was not developed for cluster computing applications and incurs a higher latency. The term Beowulf
cluster was coined to describe a cluster using off-the-shelf computers and other commodity components
and software, named after the Beowulf project at the NASA Goddard Space Flight Center, which started in
1993 (Sterling 2002). The original Beowulf project used Intel 486 processors, the free Linux operating
system and dual 10 Mbits/sec Ethernet connections.

Typical cluster computing configuration
As clusters were being constructed, work was done on how to program them. The dominant
programming paradigm for cluster computing was and still is message passing in which information is
passed between processes running on the computers in the form of messages. These messages are
specified by the programmer using message-passing routines.
The most notable library of message-passing routines was PVM (Parallel Virtual Machine)
(Sunderam 1990), which was started in the late 1980s and became the de facto standard in the early-to-mid
1990s. PVM included an implementation of the message-passing routines.
Subsequently, a standard definition for message passing libraries called MPI (Message Passing
Interface) was established (Snir et al. 1998), which laid down what the routines do and how they are
invoked but not the implementation. Several implementations were developed. Both PVM and MPI
routines could be called from C/C++ or Fortran programs for message passing and related activities.
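As a small illustration (not part of the original text), the following sketch uses mpi4py, a Python binding for MPI. It assumes mpi4py and an MPI runtime are installed and that the script is launched with, for example, mpiexec -n 2 python mpi_hello.py.

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()        # identity of this process within the communicator
size = comm.Get_size()        # total number of processes started by mpiexec

if rank == 0:
    # Process 0 sends a message (any picklable Python object) to process 1.
    comm.send({"payload": [1, 2, 3]}, dest=1, tag=11)
    print(f"rank 0 of {size}: message sent")
elif rank == 1:
    data = comm.recv(source=0, tag=11)
    print(f"rank 1 of {size}: received {data}")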
Several projects began in the 1980s and 1990s to take advantage of networked computers in
laboratories for high performance computing. A very important project in relation to Grid computing is
called Condor, which started in the mid-1980s with the goal to harness “unused” cycles of networked
computers for high performance computing.
In the Condor project, a collection of computers could be given over to remote access
automatically when they were not being used locally. The collection of computers (called a Condor pool)
then formed a high-performance multicomputer.



Multiple users could use such physically distributed computer systems. Some very important
ideas were employed in Condor including matching the job with the available resources automatically
using a description of the job and a description of the available resources. A job workflow could be
described in which the output of one job could automatically be fed into another job. Condor has become
mature and is widely used as a job scheduler for clusters in addition to its original purpose of using
laboratory computers collectively.
In Condor, the distributed computers need only be networked and could be geographically
distributed. Condor can be used to share campus-wide computing resources.
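The matching idea can be sketched conceptually as follows. This is a toy Python illustration of matching a job description against descriptions of available resources; it is not Condor's actual ClassAd language or API, and the names and numbers are invented.

# Toy matchmaking sketch: pick resources whose description satisfies the job's requirements.
job = {"min_memory_gb": 8, "os": "linux", "requested_cpus": 4}

resources = [
    {"name": "lab-pc-01", "memory_gb": 4,  "os": "linux",   "idle_cpus": 8},
    {"name": "lab-pc-02", "memory_gb": 16, "os": "linux",   "idle_cpus": 4},
    {"name": "lab-pc-03", "memory_gb": 32, "os": "windows", "idle_cpus": 16},
]

def matches(job, res):
    """Return True if the resource description satisfies the job description."""
    return (res["memory_gb"] >= job["min_memory_gb"]
            and res["os"] == job["os"]
            and res["idle_cpus"] >= job["requested_cpus"])

candidates = [r["name"] for r in resources if matches(job, r)]
print(candidates)   # ['lab-pc-02']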
Software Techniques: Apart from the development of distributed computing platforms, software
techniques were being developed to harness truly distributed systems.
The remote procedure call (RPC) was conceived in the mid-1980s as a way of invoking a
procedure on a remote computer, as an extension of executing procedures locally. The remote procedure
call was subsequently developed into object-oriented versions in the 1990s; one was CORBA (Common
Object Request Broker Architecture) and another was Java Remote Method Invocation (RMI).
The remote procedure call introduced the important concept of a service registry to locate remote
services. In a Grid computing environment, service registries are used not only to discover services but
also to discover how to invoke them.
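A minimal remote procedure call can be sketched with Python's standard xmlrpc modules, as below. For illustration, the "remote" server runs in a background thread of the same process; in practice the client and server would be on different machines.

import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

def add(x, y):
    """The procedure the client will invoke remotely."""
    return x + y

server = SimpleXMLRPCServer(("127.0.0.1", 8000), logRequests=False)
server.register_function(add, "add")                # expose the procedure under a name
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:8000/")
print(proxy.add(2, 3))                              # the call is shipped over the network; prints 5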
During the early development of the World-Wide Web, the HTML was conceived to provide a
way of displaying Web pages and connecting to other pages through now very familiar hypertext links.
Soon, a Web page became more than a means of simply displaying information; it became an interactive tool
whereby information could be entered and processed at either the client side or the server side. The
programming language JavaScript was introduced in 1995, mostly for causing actions to take place
specified in code at the client, whereas other technologies were being developed for causing actions to
take place at the server such as ASP first released in 1996.
In the 2000s, a very significant concept for distributed Internet-based computing called a Web service
was introduced. Web services have their roots in remote procedure calls and provide remote actions but
are invoked through standard protocols and Internet addressing. They also use XML (eXtensible Markup
Language), which was introduced in the late 1990s.
The Web service interface is defined in a language-neutral manner by the XML language WSDL.
Web services were adopted into Grid computing soon after their introduction as a flexible interoperable
way of implementing the Grid infrastructure and were potentially useful for Grid applications.
Grid Computing: The first large-scale Grid computing demonstration that involved geographically
distributed computers and the start of Grid computing proper was the Information Wide-Area Year (I-
WAY) demonstration at the Supercomputing 1995 Conference (SC’95).
Seventeen supercomputer sites were involved including five DOE supercomputer centers, four
NSF supercomputer centers, three NASA supercomputer sites and other large computing sites. Ten
existing ATM networks were interconnected with the assistance of several major network service
providers.
Over 60 applications were demonstrated in areas including astronomy and astrophysics, atmospheric
science, biochemistry, molecular biology and structural biology, biological and medical imaging,
chemistry, distributed computing, earth science, education, engineering, geometric modeling, material
science, mathematics, microphysics and macrophysics, neuroscience, performance analysis, plasma
physics, tele operations/telepresence, and visualization (DeFanti 1996). One focus was on virtual reality
environments. Virtual reality components included an immersive 3D environment. Separate papers in the
1996 special issue of International Journal of Supercomputer Applications described nine of the I-Way
applications.
I-Way was perhaps the largest collection of networked computing resources ever assembled for
such a significant demonstration purpose at that time. It explored many of the aspects now regarded as
central to Grid computing, such as security, job submission and distributed resource scheduling. It came
face-to-face with the “political and technical constraints” that made it infeasible to provide a single
scheduler (DeFanti 1996). Each site had its own job scheduler, and these had to be married together. The I-



Way project also marked the start of the Globus project (GlobusProject), which developed de facto
software for Grid computing. The Globus Project is led by Ian Foster, a co-developer of the I-Way
demonstration, and a founder of the Grid computing concept. The Globus Project developed a toolkit of
middleware software components for Grid computing infrastructure including for basic job submission,
security and resource management.
Globus has evolved through several implementation versions to the present time as standards
have evolved although the basic structural components have remained essentially the same (security, data
management, execution management, information services and run time environment). We will describe
Globus in a little more detail later.
Although the Globus software has been widely adopted and is the basis of the coursework
described in this book, there are other software infrastructure projects. The Legion project also envisioned
a distributed Grid computing environment. Legion was conceived in 1993, although work on the Legion
software did not begin until 1996 (Legion WorldWide Virtual Computer). Legion used an object-based
approach to Grid computing. Users could create objects in distant locations.
The first public release of Legion was at the Supercomputing 97 conference in November 1997.
The work led to the Grid computing company and software called Avaki in 1999. The company was
subsequently taken over by Sybase Inc.
In the same period, a European Grid computing project called UNICORE (UNiform Interface to
COmputing REsources) began, initially funded by the German Ministry for Education and Research
(BMBF) and continued with other European funding.
UNICORE is the basis of several of the European efforts in Grid computing and elsewhere,
including in Japan. It has many similarities to Globus, for example in its security model and its service-
based OGSA standard, but it is a more complete solution than Globus and includes a graphical interface. An
example project using UNICORE is EUROGRID, a Grid computing testbed developed in the period of
2000-2004.
A EUROGRID application project is OpenMolGRID (Open Computing GRID for Molecular
Science and Engineering), developed during the period 2002-2005 to “speed up, automatize and
standardize the drug-design using Grid technology” (OpenMolGRID).
The term e-Science was coined by John Taylor, the Director General of the United Kingdom’s
Office of Science and Technology, in 1999 to describe conducting scientific research using distributed
networks and resources of a Grid computing infrastructure. Another more recent European term is e-
Infrastructure, which refers to creating a Grid-like research infrastructure.
With the development of Grid computing tools such as Globus and UNICORE, a growing
number of Grid projects began to develop applications. Originally, these focused on computational
applications. They can be categorized as:
•Computationally intensive
•Data intensive
•Experimental collaborative projects
The computationally intensive category is traditional high performance computing addressing
large problems. Sometimes, it is not necessarily one big problem but a problem that has to be solved
repeatedly with different parameters (parameter sweep problems) to get to the solution. The data intensive
category includes computational problems but with the emphasis on large amounts of data to store and
process. Experimental collaborative projects often require collecting data from experimental apparatus
and very large amounts of data to study.
The potential of Grid computing was soon recognized by the business community for so-called e-
Business applications to improve business models and practices, sharing corporate computing resources
and databases and commercialization of the technology for business applications.
For e-Business applications, the driving motive was reduction of costs whereas for e-Science
applications, the driving motive was obtaining research results. That is not to say cost was not a factor in
e-Science Grid computing.



Large-scale research has very high costs and Grid computing offers distributed efforts and cost
sharing of resources. There are projects that are concerned with accounting such as GridBus mentioned
earlier.
The figure below shows the time lines for computing platforms, underlying software techniques
and networks discussed. Some see Grid computing as an extension of cluster computing, and it is true that in
the development of high performance computing, Grid computing has followed on from cluster
computing in connecting computers together to form a multicomputer platform, but Grid computing offers
much more.
The term cluster computing is limited to using computers that are interconnected locally to form a
computing resource. Programming is done mostly using explicit message passing. Grid computing
involves geographically distributed sites and invokes some different techniques. There is certainly a fine
line in the continuum of interconnected computers from locally interconnected computers in a small
room, through interconnected systems in a large computer room, then in multiple rooms and in different
departments within a company, through to computers interconnected on the Internet in one area, in one
country and across the world.
The early hype of Grid computing and marketing ploys in the late 1990s and early 2000s caused
some to call configurations Grid computing when they were just large computational clusters or they were
laboratory computers whose idle cycles were being used.
One classification that embodies the collaborative feature of Grid computing is:
•Enterprise Grids – Grids formed within an organization for collaboration.
•Partner Grids – Grids set up between collaborative organizations or institutions.
An Enterprise Grid might still cross the administrative domains of departments and requires departments to share
their resources. Some of the key features that are indicative of Grid computing are:
•Shared multi-owner computing resources.
•Use of Grid computing software such as Globus, with security and cross-management
mechanisms in place.
Grid computing software such as Globus provides the tools for individuals and teams to use
geographically distributed computers owned by others collectively.



Key concepts in the history of Grid computing.
Foster’s Check List: Ian Foster is credited with the development of Grid computing, and is sometimes called
the father of Grid computing. He proposed a simple checklist of aspects that are common to most true
Grids (Foster 2002):
•No centralized control
•Standard open protocols
•Non-trivial quality of service (QoS)
Grid Computing versus Cluster Computing: It is important not to think of Grid computing simply as a
large cluster because the potential and challenges are different. Courses on Grid computing and on cluster
computing are quite different. In cluster computing, one learns about message-passing programming
using tools such as MPI. Also shared memory programming is considered using threads and OpenMP,
given that most computers in a cluster today are now also multicore shared memory systems. In cluster
computing, network security is not a big issue that directly concerns the user.
Usually an ssh connection to the front-end node of the cluster is sufficient; the internal compute
nodes are reached from there. Clusters are usually Linux clusters, and they often have an NFS (Network File
System) shared file system installed across the compute resources. Accounts need to be present on all
systems in the cluster, and NIS (Network Information Service) may be used to provide consistent
configuration information on all systems, but not necessarily so.
NIS can increase the local network traffic and slow the start of applications. In Grid computing,
one looks at how to manage and use the geographically distributed sites (distributed resources). Users
need accounts on all resources but generally a shared file system is not present. Each site is typically a
high performance cluster. Being a distributed environment, one looks at distributed computing
techniques such as Web services and Internet protocols, and at network security, as well as how to actually
take advantage of the distributed resources.



Security is very important because the project may use confidential information and the
distributed nature of the environment opens up a much higher probability of a security breach.
There are things in common with both Grid computing and cluster computing. Both involve using
multiple compute resources collectively. Both require job schedulers to place jobs onto the best platform.
In cluster computing, a single job scheduler will allocate jobs onto the local compute resources. In Grid
computing, a Grid computing scheduler has to manage the geographically distributed resources owned by
others and typically interacts with local cluster job schedulers found on local clusters.
Grid Computing versus Cloud Computing: Commercialization of Grid computing is driven by a
business model that will make profits. The first widely publicized attempt was on-demand and utility
computing in the early 2000s, which attempted to sell computer time on a Grid platform constructed using
Grid technologies such as Globus. More recently, cloud computing is a business model in which services
are provided on servers that can be accessed through the Internet.
The common thread between Grid computing and cloud computing is the use of the Internet to
access the resources. Cloud computing is driven by the widespread access that the Internet and Internet
technologies provide.
However, cloud computing is quite distinct from the original purpose of Grid computing.
Whereas Grid computing focuses on collaborative and distributed shared resources, cloud computing
concentrates upon placing resources for paying users to access and share. The technology for cloud
computing emphasizes the use of services (software as a service, SaaS) and possibly the use of
virtualization .
A number of companies entered the cloud computing space in the mid-late 2000s. IBM was an
early promoter of on-demand Grid computing in the early 2000s and moved into cloud computing in a
significant way, opening a cloud computing center in Ireland in March 2008 (Dublin), and subsequently
in the Netherlands (Amsterdam), China (Beijing), and South Africa (Johannesburg) in June 2008.

Cloud computing using virtualized resources.


Other major cloud computing players include Amazon and Google who utilize their massive
number of servers. Amazon has the Amazon Elastic Compute Cloud (Amazon EC2) project for users to
buy time and resources through Web services and virtualization.



The cloud computing business model is one step further than hosting companies simply renting
servers they provide at their location, which became popular in the early-mid 2000s with many start-up
companies and continues to date.
1.2 Grid computing Infrastructures
Grid Computing is based on the concept of sharing resources in much the same way as information and
electricity are shared, allowing us to access heterogeneous and geographically separated resources.
Grid gives the sharing of:
1. Storage elements
2. Computational resources
3. Equipment
4. Specific applications
5. Other
Thus, Grid is based on:
• Internet protocols.
• Ideas of parallel and distributed computing.

A Grid is a system that,


1) Coordinates resources that are not subject to centralized control,
2) using standard, open, general-purpose protocols and interfaces,
3) to deliver nontrivial qualities of service.
Flexible, secure, coordinated resource sharing among individuals and institutions.
Enable communities (virtual organizations) to share geographically distributed resources in order to
achieve a common goal.
It is used for applications which cannot be solved by the resources of a single institution, or where the
results can be achieved faster and/or cheaper.
1.3 Cloud computing
Cloud Computing refers to manipulating, configuring, and accessing hardware and software
resources remotely. It provides online data storage, infrastructure and applications.



Cloud Computing
Cloud computing supports platform independency, as the software is not required to be installed locally
on the PC. Hence, cloud computing makes our business applications mobile and collaborative.
Characteristics of Cloud Computing
There are five key characteristics of cloud computing. They are shown in the following diagram:

On Demand Self Service


Cloud Computing allows the users to use web services and resources on demand. One can logon to a
website at any time and use them.
Broad Network Access
Since cloud computing is completely web based, it can be accessed from anywhere and at any time.
Resource Pooling



Cloud computing allows multiple tenants to share a pool of resources. One can share a single physical
instance of hardware, database and basic infrastructure.
Rapid Elasticity
It is very easy to scale the resources vertically or horizontally at any time. Scaling of resources means the
ability of resources to deal with increasing or decreasing demand.
The resources being used by customers at any given point of time are automatically monitored.
Measured Service
In this model, the cloud provider controls and monitors all aspects of the cloud service. Resource
optimization, billing, capacity planning, etc. depend on it.
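A toy Python illustration of metered billing follows; the resources, per-unit rates and usage figures are invented examples, not those of any real provider.

# Measured service sketch: usage is metered per resource and billed at a per-unit rate.
rates = {"cpu_hours": 0.05, "storage_gb_month": 0.02, "egress_gb": 0.09}     # example prices
metered_usage = {"cpu_hours": 720, "storage_gb_month": 250, "egress_gb": 40}

bill = {resource: metered_usage[resource] * rate for resource, rate in rates.items()}
print(bill)                            # per-resource charges
print(round(sum(bill.values()), 2))    # total charge for the billing period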
Benefits
1. One can access applications as utilities, over the Internet.
2. One can manipulate and configure the applications online at any time.
3. It does not require installing software to access or manipulate cloud applications.
4. Cloud Computing offers online development and deployment tools and a programming runtime environment
through the PaaS model.
5. Cloud resources are available over the network in a manner that provides platform-independent access to
any type of client.
6. Cloud Computing offers on-demand self-service. The resources can be used without interaction with
the cloud service provider.
Disadvantages of cloud computing
• Requires a high-speed internet connection
• Security and reliability of data
• Execution of HPC applications in cloud computing is not yet a solved problem
• Interoperability between cloud-based systems
1.4 Service oriented architecture
Service-Oriented Architecture helps to use applications as a service for other applications regardless of the
type of vendor, product or technology. Therefore, it is possible to exchange data between applications
of different vendors without additional programming or making changes to services.
The cloud computing service oriented architecture is shown in the diagram below.

Distributed computing such as Grid computing relies on causing actions to occur on remote computers.
Taking advantage of remote computers was recognized many years ago well before Grid computing. One



of the underlying concepts is the client-server model, as shown in the figure below. The client in this
context is a software component on one computer that makes an access to the server for a particular
operation.

Client-server model
The server responds accordingly. The request and response are transmitted through the network from the
client to the server.
An early form of client-server arrangement was the remote procedure call (RPC) introduced in the 1980s.
This mechanism allows a local program to execute a procedure on a remote computer and get back results
from that procedure. It is now the basis of certain network facilities such as mounting remote files in a
shared file system. For the remote procedure call to work, the client needs to:
•Identify the location of the required procedure.
•Know how to communicate with the procedure to get it to provide the actions required.
The remote procedure call introduced the concept of a service registry to provide a means of locating the
service (procedure). Using a service registry is now part of what is called a service-oriented architecture
(SOA) as illustrated in the figure below. The sequence of events is as follows:
•First, the server (service provider) publishes its services in a service registry.
•Then, the client (service requestor) can ask the service registry to locate a service.
•Then, the client (service requestor) binds with service provider to invoke a service.

Service-oriented architecture.
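The publish/find/bind sequence can be sketched with a toy in-memory registry, as in the following Python fragment. The class and method names are hypothetical and do not correspond to any real registry standard.

class ServiceRegistry:
    def __init__(self):
        self._services = {}

    def publish(self, name, endpoint):
        """The service provider publishes its service under a name."""
        self._services[name] = endpoint

    def find(self, name):
        """The service requestor asks the registry to locate a service."""
        return self._services[name]

# 1. The provider publishes its service in the registry.
registry = ServiceRegistry()
registry.publish("temperature-service", lambda city: {"city": city, "temp_c": 21})

# 2. The requestor asks the registry to locate the service ...
endpoint = registry.find("temperature-service")

# 3. ... and binds to the provider to invoke it.
print(endpoint("Chennai"))     # {'city': 'Chennai', 'temp_c': 21}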
Later forms of remote procedure calls in the 1990s introduced distributed objects, most notably CORBA
(Common Object Request Broker Architecture) and Java RMI (Remote Method Invocation).



A fundamental disadvantage of the remote procedure calls described so far is the need for the calling
program to know implementation-dependent details of the remote procedure call. A procedure call has
a list of parameters with specific meanings and types, and the return value(s) have specific meanings and
types.
All of these details need to be known by the calling program, and each remote procedure provided by
different programmers could have different and incompatible arrangements. This led to improvements, including
the introduction of interface definition (or description) languages (IDLs) that enabled the interface to be
described in a language-independent manner and would allow clients and servers to interact in different
languages (e.g., between C and Java). However, even with IDLs, these systems were not always
completely platform/language independent.
Some aspects for a better system include:
•Universally agreed-upon standardized interfaces.
•Inter-operability between different systems and languages.
•Flexibility to enable different programming models and message patterns.
•Agreed network protocols (Internet standards).
Web services with an XML interface definition language offer the solution.
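As a sketch of how a WSDL-described Web service might be consumed from Python, the following assumes the third-party zeep SOAP client (pip install zeep); the WSDL URL and the operation and parameter names are placeholders that would come from the actual service description.

from zeep import Client

def call_add(wsdl_url: str, a: int, b: int) -> int:
    # zeep parses the WSDL and exposes the operations it describes as Python calls.
    client = Client(wsdl_url)
    return client.service.Add(intA=a, intB=b)   # operation and parameter names come from the WSDL

# Example usage with a placeholder URL:
# print(call_add("http://example.org/calculator?wsdl", 2, 3))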
1.5 Introduction to Grid Architecture and standards



Basic pillars
 Data management
 Resource management
 Security
 Information services
Need of security
 No centralized control
 Distributed resources
 Different resource providers
 Each resource provider uses different security policies
Resource Management
The huge number and heterogeneity of Grid Computing resources make resource management a major
challenge in Grid Computing environments.
Resource management activities include resource discovery, resource inventories, fault
isolation, resource provisioning, resource monitoring, a variety of autonomic capabilities and service-
level management activities.
The most interesting aspect of the resource management area is the selection of the correct resource from
the grid resource pool, based on the service-level requirements, and then the efficient provisioning of that
resource to meet user needs.
Information Services
Information services concentrate on providing valuable information about the Grid Computing
infrastructure resources.
These services depend entirely on providers of information such as resource availability and
capacity utilization, to name just a few. This information is valuable and mandatory feedback for the
resource managers. These information services enable service providers to most efficiently
allocate resources for the variety of very specific tasks related to the Grid Computing infrastructure
solution.
Data Management
Data forms the single most important asset in a Grid Computing system. This data may be the input to a
resource or the results from a resource on the execution of a specific task.
If the infrastructure is not designed properly, the data movement in a geographically distributed system
can quickly cause scalability problems.



It is well understood that the data must be near to the computation where it is used. This data movement
in any Grid Computing environment requires absolutely secure data transfers, both to and from the
respective resources.
The current advances surrounding data management are tightly focused on virtualized data storage
mechanisms, such as storage area networks (SAN), network file systems, dedicated storage servers and
virtual databases.
These virtualization mechanisms in data storage solutions and common access mechanisms (e.g.,
relational SQLs, Web services, etc.) help developers and providers to design data management concepts
into the Grid Computing infrastructure with much more flexibility than traditional approaches.
Standards for GRID environment
 OGSA
 OGSI
 OGSA-DAI
 GridFTP
 WSRF and etc.
OGSA
The Global Grid Forum has published the Open Grid Services Architecture (OGSA). Addressing the
requirements of grid computing in an open and standard way requires a framework for distributed
systems that supports integration, virtualization and management. Such a framework requires a core set of
interfaces, expected behaviors, resource models and bindings.
OGSA defines requirements for these core capabilities and thus provides a general reference architecture
for grid computing environments. It identifies the components and functions that are useful, if not required,
for a grid environment.

OGSI
As grid computing has evolved it has become clear that a service-oriented architecture could provide
many benefits in the implementation of a grid infrastructure.
The Global Grid Forum extended the concepts defined in OGSA to define specific interfaces to various
services that would implement the functions defined by OGSA. More specifically, the Open Grid Services
Interface (OGSI) defines mechanisms for creating, managing exchanging information among Grid
services.
A Grid service is a Web service that conforms to a set of interfaces and behaviors that define how a client
interacts with a Grid service.
These interfaces and behaviors, along with other OGSI mechanisms associated with Grid service creation
and discovery, provide the basis for a robust grid environment. OGSI provides the Web Services
Description Language (WSDL) definitions for these key interfaces.
OGSA-DAI
The OGSA-DAI (data access and integration) project is concerned with constructing middleware to assist
with access and integration of data from separate data sources via the grid.
The project was conceived by the UK Database Task Force and is working closely with the Global Grid
Forum DAIS-WG and the Globus team.
GridFTP
GridFTP is a secure and reliable data transfer protocol providing high performance and optimized for
wide-area networks that have high bandwidth.
As one might guess from its name, it is based upon the Internet FTP protocol and includes extensions that
make it a desirable tool in a grid environment. The GridFTP protocol specification is a proposed
recommendation document in the Global Grid Forum (GFD-R-P.020). GridFTP uses basic Grid security
on both control (command) and data channels.



Features include multiple data channels for parallel transfers, partial file transfers, third-party transfers and
more.
WSRF
The Web Services Resource Framework (WSRF) defines a set of specifications for defining the relationship
between Web services (which are normally stateless) and stateful resources.
Web services related standards
Because Grid services are so closely related to Web services, the plethora of standards associated with
Web services also apply to Grid services.
We do not describe all of these standards in this document, but rather recommend that the reader become
familiar with the standards commonly associated with Web services, such as XML, WSDL, SOAP and
UDDI.
1.5.1 Elements of Grid - Overview of Grid Architecture
General Description
The Computing Element (CE) is a set of gLite services that provide access for Grid jobs to a local
resource management system (LRMS, batch system) running on a computer farm, or possibly to
computing resources local to the CE host. Typically the CE provides access to a set of job queues within
the LRMS.
Utilization Period
Booking Conditions
No particular booking is required to use this service. However, the user MUST have a valid grid
certificate from an accepted Certificate Authority and MUST be a member of a valid Virtual Organization
(VO).
The service is initiated by the respective commands, which can be submitted from any gLite User Interface
either interactively or through batch submission.
To run a job on the cluster the user must install their own gLite User Interface or at least have access to one.
Certificates can be requested, for example, at the German Grid Certificate Authority.
Deregistration
No particular deregistration is required for this service. A user with an expired Grid certificate or VO
membership is automatically blocked from accessing the CE.
IT-Security
The database and log files of the CEs contain information on the status and results of the jobs and the
certificate that was used to initiate the task.
The required data files themselves are stored on the worker nodes or in the Grid Storage Elements (SEs).
No other personal data is stored.
Technical requirements
To run a job at the Grid cluster of the Steinbuch Centre for Computing (SCC) the user needs:
1. A valid Grid user certificate.
2. Membership in a Virtual Organization (VO).
3. Their own User Interface, or at least access to one.



Overview of Grid Architecture



Unit-2
Introduction to Open Grid Services Architecture (OGSA) – Motivation – Functionality Requirements
– Practical & Detailed view of OGSA/OGSI – Data intensive grid service models – OGSA services.
What is the OGSA Standard?
 Acronym for Open Grid Service Architecture
 OGSA defines how different components in a grid interact
 Open Grid Services Architecture (OGSA) is a set of standards defining the way in which
information is shared among diverse components of large, heterogeneous grid systems. In this
context, a grid system is a scalable wide area network (WAN) that supports resource sharing
and distribution.
Architecture of OGSA
Comprised of 4 main layers
1. Physical and Logical Resources Layer
2. Web Service Layer
3. OGSA Architected Grid Services Layer
4. Grid Applications Layer
OGSA Architecture

OGSA Architecture - Physical and Logical Resources Layer


 Physical resources are: servers, storage, network
 Logical resources manage physical resources
 Examples of logical resources: database managers, workflow managers
OGSA Architecture - Web Services Layer
 A Web service is software available online that can interact with other software using XML



 Consists of the Open Grid Services Infrastructure (OGSI) sub-layer, which specifies grid services
and provides a consistent way to interact with grid services
 Also extends Web Service Capabilities
Consists of 5 interfaces:
1. Factory: provides a way to create new grid services
2. Life Cycle: manages grid service life cycles
3. State Management: manages grid service states
4. Service Groups: collections of indexed grid services
5. Notification: manages notifications between services & resources
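The Factory and Life Cycle ideas can be sketched conceptually in Python as follows; this is a toy illustration, not the actual OGSI interfaces or the Globus Toolkit API.

import time
import uuid

class GridServiceInstance:
    def __init__(self, lifetime_s):
        self.handle = str(uuid.uuid4())           # unique handle identifying the instance
        self.expires_at = time.time() + lifetime_s
        self.state = {}                           # per-instance service data (state)

    def alive(self):
        return time.time() < self.expires_at

class GridServiceFactory:
    def __init__(self):
        self.instances = {}

    def create(self, lifetime_s=60):
        """Factory interface: create a new, transient grid service instance."""
        inst = GridServiceInstance(lifetime_s)
        self.instances[inst.handle] = inst
        return inst.handle

    def destroy(self, handle):
        """Life Cycle interface: explicitly destroy an instance before it expires."""
        self.instances.pop(handle, None)

factory = GridServiceFactory()
h = factory.create(lifetime_s=5)
print(h, factory.instances[h].alive())    # handle and True (still within its lifetime)
factory.destroy(h)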
OGSA Architecture - Web Services Layer (OGSI)

OGSA Architecture – OGSA Architected Services – Layer


Classified into 3 service categories
1. Grid Core Services
2. Grid Program Execution Services
3. Grid Data Services
OGSA Architected Services – Grid Core Services
Composed of 4 main types of services:
1. Service Management: assist in installation, maintenance, & troubleshooting tasks in grid
system
2. Service Communication: include functions that allow grid services to communicate
3. Policy Services: Provide framework for creation, administration & management of policies
for system operation
4. Security Services: provide authentication & authorization mechanisms to ensure systems
interoperate securely
OGSA Architected Services – Grid Program Execution Services
 Supports unique grid systems in high performance computing, collaboration, parallelism
 Support virtualization of resource processing

OGSA Architected Services – Grid Data Services



 Support data virtualization
 Provide mechanism for access to distributed resources such as databases, files
OGSA Architecture – OGSA Architected Services – Layer

OGSA Architecture – Grid Applications Layer


 This layer comprises the applications that use the grid architected services
Open Grid Services Infrastructure (OGSI)
 Gives a formal and technical specification of what a grid service is.
 It is an excruciatingly detailed specification of how Grid Services work.
 GT3 includes a complete implementation of OGSI.
 It is a formal and technical specification of the concepts described in OGSA.
 The Globus Toolkit 3 is an implementation of OGSI.
 Some other implementations are OGSI::Lite (Perl) and the UNICORE OGSA demonstrator
from the EU GRIP project.
 OGSI specification defines grid services and builds upon web services.
Open Grid Services Infrastructure (OGSI)
 OGSI creates an extension model for WSDL called GWSDL (Grid WSDL). The reasons are:
 Interface inheritance
 Service Data (for expressing state information)
 Components:
 Lifecycle
 State management
 Service Groups
 Factory
 Notification
 Handle Map



Data intensive grid service models
Applications in the grid are normally grouped into two categories
 Computation-intensive and Data intensive
 Data intensive applications deal with massive amounts of data. The grid system must be
specially designed to discover, transfer and manipulate these massive data sets.
 Transferring massive data sets is a time-consuming task.
 A data access method known as caching is often applied to enhance data
efficiency in a grid environment.
 By replicating the same data block and scattering them in multiple regions in a grid, users can
access the same data with locality of references.
Data intensive grid service models
 Replication strategies determine when and where to create a replica of the data.
 The strategies of replication can be classified into dynamic and static.
Static method
 The locations and number of replicas are determined in advance and will not be modified.
 Replication operations require little overhead.
 Static strategies cannot adapt to changes in demand, bandwidth and storage variability.
 Optimization is required to determine the location and number of data replicas.
Dynamic strategies
 Dynamic strategies can adjust locations and number of data replicas according to change in
conditions
 Frequent data moving operations can result in much more overhead than static strategies.
 Optimization may be determined based on whether the data replica is being created, deleted
or moved.
 The most common replication strategies include preserving locality, minimizing update costs and
maximizing profits.
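A toy Python sketch of a dynamic replication decision follows: a data set is replicated to the sites whose observed access counts pass a threshold. The site names, counts and threshold are invented for illustration.

access_counts = {"site-A": 120, "site-B": 15, "site-C": 300}   # reads of one data set per site
current_replicas = {"site-A"}                                  # sites that already hold a copy
THRESHOLD = 100                                                # accesses before replicating

def plan_replication(access_counts, current_replicas, threshold):
    """Return the set of sites where a new replica should be created."""
    return {site for site, count in access_counts.items()
            if count >= threshold and site not in current_replicas}

print(plan_replication(access_counts, current_replicas, THRESHOLD))   # {'site-C'}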
Grid data Access models
In general there are four access models for organizing a data grid as listed here
1. Monadic method
2. Hierarchical model
3. Federation model
4. Hybrid model



Monadic method

 This is a centralized data repository model. All data is saved in central data repository.
 When users want to access some data, they have to submit a request directly to the central
repository.
 No data is replicated for preserving data locality.
 For a larger grid this model is not efficient in terms of performance and reliability.
 Data replication is permitted in this model only when fault tolerance is demanded.
Hierarchical model

 It is suitable for building a large data grid which has only one large data access directory
 Data may be transferred from the source to a second level center. Then some data in the
regional center is transferred to the third level centre.
 After being forwarded several times, specific data objects are accessed directly by users. A
higher-level data center has a wider coverage area.
 PKI security services are easier to implement in this hierarchical data access model



Federation model

 It is suited for designing a data grid with multiple source of data supplies.
 It is also known as a mesh model
 The data is shared; the data items are owned and controlled by their original owners.
 Only authenticated users are authorized to request data from any data source.
 This mesh model costs the most when the number of grid institutions becomes very large.
Hybrid model

 This model combines the best features of the hierarchical and mesh models.
 Traditional data transfer technology such as FTP applies to networks with lower bandwidth.
 Networks with high bandwidth are exploited by high-speed data transfer tools such as GridFTP,
developed with the Globus library.
 The cost of hybrid model can be traded off between the two extreme models of hierarchical
and mesh-connected grids.
Parallel versus Striped Data Transfers
 Parallel data transfer opens multiple data streams for passing subdivided segments of a file
simultaneously. Although the speed of each stream is the same as in sequential streaming, the
total time to move data in all streams can be significantly reduced compared to an FTP transfer.



 In striped data transfer, a data object is partitioned into a number of sections, and each section
is placed at an individual site in a data grid. When a user requests this piece of data, a data
stream is created for each site, and all the sections of the data object are transferred
simultaneously.
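The two transfer modes can be sketched conceptually in Python as follows; threads stand in for network data streams, and the fetch function is a placeholder for moving one section of the data object.

from concurrent.futures import ThreadPoolExecutor

data_object = bytes(range(240))               # the object to be moved

def split(data, n):
    """Divide the object into n roughly equal sections."""
    step = (len(data) + n - 1) // n
    return [data[i:i + step] for i in range(0, len(data), step)]

def fetch(section):
    return section                            # placeholder for one stream's transfer

# Parallel transfer: all sections come from the same source over several concurrent streams.
sections = split(data_object, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    received = list(pool.map(fetch, sections))
assert b"".join(received) == data_object

# Striped transfer: each section is stored at, and fetched from, a different site.
striped_sites = dict(zip(["site-1", "site-2", "site-3", "site-4"], sections))
with ThreadPoolExecutor(max_workers=len(striped_sites)) as pool:
    received = list(pool.map(fetch, striped_sites.values()))
assert b"".join(received) == data_object
print("reassembled", len(data_object), "bytes in both modes")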
Grid Services and OGSA
 Facilitate use and management of resources across distributed, heterogeneous environments
 Deliver seamless QoS
 Define open, published interfaces in order to provide interoperability of diverse resources
 Exploit industry-standard integration technologies
 Develop standards that achieve interoperability
 Integrate, virtualize, and manage services and resources in a distributed, heterogeneous
environment
 Deliver functionality as loosely coupled, interacting services aligned with industry- accepted
web service standards
 OGSA services fall into seven broad areas, defined in terms of capabilities frequently
required in a grid scenario. The figure below shows the OGSA architecture. These services are
summarized as follows:

OGSA services - seven broad areas


1. Infrastructure Services Refer to a set of common functionalities, such as naming, typically
required by higher level services.
2. Execution Management Services Concerned with issues such as starting and managing
tasks, including placement, provisioning, and life-cycle management. Tasks may range from
simple jobs to complex workflows or composite services.
3. Data Management Services Provide functionality to move data to where it is needed,
maintain replicated copies, run queries and updates, and transform data into new formats.
These services must handle issues such as data consistency, persistency, and integrity. An
OGSA data service is a web service that implements one or more of the base data interfaces to
enable access to, and management of, data resources in a distributed environment. The three
base interfaces, Data Access, Data Factory, and Data Management, define basic
operations for representing, accessing, creating, and managing data.
4. Resource Management Services Provide management capabilities for grid resources:
management of the resources themselves, management of the resources as grid components,
and management of the OGSA infrastructure. For example, resources can be monitored,
reserved, deployed, and configured as needed to meet application QoS requirements. It also
requires an information model (semantics) and data model (representation) of the grid
resources and services.
5. Security Services Facilitate the enforcement of security-related policies within a (virtual)
organization, and supports safe resource sharing. Authentication, authorization, and integrity
assurance are essential functionalities provided by these services.
6. Information Services Provide efficient production of, and access to, information about the
grid and its constituent resources. The term “information” refers to dynamic data or events
used for status monitoring; relatively static data used for discovery; and any data that is
logged. Troubleshooting is just one of the possible uses for information provided by these
services.
7. Self-Management Services Support service-level attainment for a set of services (or
resources), with as much automation as possible, to reduce the costs and complexity of
managing the system. These services are essential in addressing the increasing complexity of
owning and operating an IT infrastructure.



UNIT III VIRTUALIZATION
Cloud deployment models: public, private, hybrid, community – Categories of cloud
computing: Everything as a service: Infrastructure, platform, software - Pros and Cons of cloud
computing – Implementation levels of virtualization – virtualization structure – virtualization of CPU,
Memory and I/O devices – virtual clusters and Resource Management – Virtualization for data center
automation.
Definition of Cloud Computing
 The practice of using a network of remote servers hosted on the Internet to store, manage, and
process data, rather than a local server or a personal computer.
 Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access
to a shared pool of configurable computing resources (e.g., networks, servers, storage,
applications, and services) that can be rapidly provisioned and released with minimal
management effort or service provider interaction. This cloud model is composed of five
essential characteristics, three service models, and four deployment models.
The following figure shows that cloud computing is composed of five essential
characteristics, three service models, and four deployment models:

Cloud Ecosystem and Enabling Technologies

Cost Model



Cloud Design Objectives
1. Shifting computing from desktops to data centers
2. Service provisioning and cloud economics
3. Scalability in performance
4. Data privacy protection
5. High quality of cloud services
6. New standards and interfaces
Essential Characteristics:
 On-demand self-service. A consumer can unilaterally provision computing capabilities, such
as server time and network storage, as needed automatically without requiring human
interaction with each service provider.
 Broad network access. Capabilities are available over the network and accessed through
standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g.,
mobile phones, tablets, laptops, and workstations).
 Resource pooling. The provider’s computing resources are pooled to serve multiple
consumers using a multi-tenant model, with different physical and virtual resources
dynamically assigned and reassigned according to consumer demand. There is a sense of
location independence in that the customer generally has no control or knowledge over the
exact location of the provided resources but may be able to specify location at a higher level
of abstraction (e.g., country, state, or data center). Examples of resources include storage,
processing, memory, and network bandwidth.
 Rapid elasticity. Capabilities can be elastically provisioned and released, in some cases
automatically, to scale rapidly outward and inward commensurate with demand. To the
consumer, the capabilities available for provisioning often appear to be unlimited and can be
appropriated in any quantity at any time.
 Measured service. Cloud systems automatically control and optimize resource use by
leveraging a metering capability at some level of abstraction appropriate to the type of
service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can
be monitored, controlled, and reported, providing transparency for both the provider and
consumer of the utilized service.
Service Models



 Software as a Service (SaaS). The capability provided to the consumer is to use the provider’s
applications running on a cloud infrastructure. The applications are accessible from various
client devices through either a thin client interface, such as a web browser (e.g., web-based
email), or a program interface. The consumer does not manage or control the underlying
cloud infrastructure including network, servers, operating systems, storage, or even individual
application capabilities, with the possible exception of limited user-specific application
configuration settings.
 Platform as a Service (PaaS). The capability provided to the consumer is to deploy onto the
cloud infrastructure consumer-created or acquired applications created using programming
languages, libraries, services, and tools supported by the provider. The consumer does not
manage or control the underlying cloud infrastructure including network, servers, operating
systems, or storage, but has control over the deployed applications and possibly configuration
settings for the application-hosting environment.
 Infrastructure as a Service (IaaS). The capability provided to the consumer is to provision
processing, storage, networks, and other fundamental computing resources where the
consumer is able to deploy and run arbitrary software, which can include operating systems
and applications. The consumer does not manage or control the underlying cloud
infrastructure but has control over operating systems, storage, and deployed applications; and
possibly limited control of select networking components (e.g., host firewalls).
 Cloud service models offer customers varying levels of control over assets and services,
which presents performance visibility challenges.

The platform and ecosystem views of cloud computing represent a new paradigm, and promote a new
way of computing.



Deployment Models
 Private cloud. The cloud infrastructure is provisioned for exclusive use by a single
organization comprising multiple consumers (e.g., business units). It may be owned,
managed, and operated by the organization, a third party, or some combination of them, and it
may exist on or off premises.
 Community cloud. The cloud infrastructure is provisioned for exclusive use by a specific
community of consumers from organizations that have shared concerns (e.g., mission,
security requirements, policy, and compliance considerations). It may be owned, managed,
and operated by one or more of the organizations in the community, a third party, or some
combination of them, and it may exist on or off premises.
 Public cloud. The cloud infrastructure is provisioned for open use by the general public. It
may be owned, managed, and operated by a business, academic, or government organization,
or some combination of them. It exists on the premises of the cloud provider.
 Hybrid cloud. The cloud infrastructure is a composition of two or more distinct cloud
infrastructures (private, community, or public) that remain unique entities, but are bound
together by standardized or proprietary technology that enables data and application
portability (e.g., cloud bursting for load balancing between clouds).



Cloud deployment models
 The concept of cloud computing has evolved from cluster, grid and utility computing.
 Cluster and grid computing leverage the use of many computers in parallel to solve
problems of any size.
 Utility and Software as a Service (SaaS) provide computing resource as a service with
notation of pay per use.
 Cloud computing is a high throughput computing (HTC) paradigm whereby the
infrastructure provides the service through a large data centre or server farms.
 The cloud computing model enables users to share resources from anywhere at any
time through their connected devices.
 All computations in cloud applications are distributed to servers in a data centre; cloud
platforms are thus distributed systems built through virtualization.
Cloud deployment models

The major cloud deployment models are:



1. Public Cloud
2. Private Cloud
3. Hybrid Cloud
4. Community Cloud
Advantages of Cloud Computing
 Cost Savings : Perhaps, the most significant cloud computing benefit is in terms of IT cost
savings. Businesses, no matter what their type or size, exist to earn money while keeping
capital and operational expenses to a minimum. With cloud computing, you can save
substantial capital costs with zero in-house server storage and application requirements. The
lack of on-premises infrastructure also removes their associated operational costs in the form
of power, air conditioning and administration costs. You pay for what is used and disengage
whenever you like - there is no invested IT capital to worry about. It’s a common
misconception that only large businesses can afford to use the cloud, when in fact, cloud
services are extremely affordable for smaller businesses.
 Reliability: With a managed service platform, cloud computing is much more reliable and
consistent than in-house IT infrastructure. Most providers offer a Service Level Agreement
which guarantees 24/7/365 and 99.99% availability. Your organization can benefit from a
massive pool of redundant IT resources, as well as a quick failover mechanism - if a server
fails, hosted applications and services can easily be migrated to any of the available servers.
 Manageability :Cloud computing provides enhanced and simplified IT management and
maintenance capabilities through central administration of resources, vendor managed
infrastructure and SLA backed agreements. IT infrastructure updates and maintenance are
eliminated, as all resources are maintained by the service provider. You enjoy a simple web-
based user interface for accessing software, applications and services – without the need for
installation - and an SLA ensures the timely and guaranteed delivery, management and
maintenance of your IT services.
 Strategic Edge: Ever-increasing computing resources give you a competitive edge over
competitors, as the time you require for IT procurement is virtually nil. Your company can
deploy mission critical applications that deliver significant business benefits, without any
upfront costs and minimal provisioning time. Cloud computing allows you to forget about
technology and focus on your key business activities and objectives. It can also help you to
reduce the time needed to market newer applications and services.
 Lower computer costs:
 You do not need a high-powered and high-priced computer to run cloud computing's
web-based applications.
 Since applications run in the cloud, not on the desktop PC, your desktop PC does not
need the processing power or hard disk space demanded by traditional desktop
software.
 When you are using web-based applications, your PC can be less expensive, with a
smaller hard disk, less memory, more efficient processor...

 In fact, your PC in this scenario does not even need a CD or DVD drive, as no
software programs have to be loaded and no document files need to be saved.
 Improved performance:
 With fewer large programs hogging your computer's memory, you will see better
performance from your PC.
 Computers in a cloud computing system boot and run faster because they have fewer
programs and processes loaded into memory…
 Reduced software costs:
 Instead of purchasing expensive software applications, you can get most of what you
need for free or at very low cost.
 Most cloud computing applications today, such as the Google Docs suite, are free to use.
 This is often better than paying for similar commercial software, which alone may be
justification for switching to cloud applications.
 Instant software updates:
 Another advantage to cloud computing is that you are no longer faced with choosing
between obsolete software and high upgrade costs.
 When the application is web-based, updates happen automatically
 available the next time you log into the cloud.
 When you access a web-based application, you get the latest version
 without needing to pay for or download an upgrade.
 Improved document format compatibility.
 You do not have to worry about the documents you create on your machine being
compatible with other users' applications or OSes
 There are potentially no format incompatibilities when everyone is sharing
documents and applications in the cloud.
Disadvantages of Cloud Computing
 Downtime : As cloud service providers take care of a number of clients each day, they can
become overwhelmed and may even come up against technical outages. This can lead to your
business processes being temporarily suspended. Additionally, if your internet connection is
offline, you will not be able to access any of your applications, server or data from the cloud.
 Security :Although cloud service providers implement the best security standards and industry
certifications, storing data and important files on external service providers always opens up
risks. Using cloud-powered technologies means you need to provide your service provider
with access to important business data. Meanwhile, being a public service opens up cloud
service providers to security challenges on a routine basis. The ease in procuring and
accessing cloud services can also give nefarious users the ability to scan, identify and exploit
loopholes and vulnerabilities within a system. For instance, in a multi-tenant cloud
architecture where multiple users are hosted on the same server, a hacker might try to break

into the data of other users hosted and stored on the same server. However, such exploits and
loopholes are not likely to surface, and the likelihood of a compromise is not great.
 Vendor Lock-In: Although cloud service providers promise that the cloud will be flexible to
use and integrate, switching cloud services is something that hasn’t yet completely evolved.
Organizations may find it difficult to migrate their services from one vendor to another.
Hosting and integrating current cloud applications on another platform may throw up
interoperability and support issues. For instance, applications developed on Microsoft
Development Framework (.Net) might not work properly on the Linux platform.
 Limited Control :Since the cloud infrastructure is entirely owned, managed and monitored by
the service provider, it transfers minimal control over to the customer. The customer can only
control and manage the applications, data and services operated on top of that, not the
backend infrastructure itself. Key administrative tasks such as server shell access, updating
and firmware management may not be passed to the customer or end user.
 Requires a constant Internet connection:
 Cloud computing is impossible if you cannot connect to the Internet.
 Since you use the Internet to connect to both your applications and documents, if you
do not have an Internet connection you cannot access anything, even your own
documents.
 A dead Internet connection means no work and in areas where Internet connections
are few or inherently unreliable, this could be a deal-breaker.
 Can be slow:
 Even with a fast connection, web-based applications can sometimes be slower than
accessing a similar software program on your desktop PC.
 Everything about the program, from the interface to the current document, has to be
sent back and forth from your computer to the computers in the cloud.
 If the cloud servers happen to be backed up at that moment, or if the Internet is
having a slow day, you would not get the instantaneous access you might expect from
desktop applications.
 Does not work well with low-speed connections:
 Similarly, a low-speed Internet connection, such as that found with dial-up services,
makes cloud computing painful at best and often impossible.
 Web-based applications require a lot of bandwidth to download, as do large
documents.
 Features might be limited:
 This situation is bound to change, but today many web-based applications simply are
not as full-featured as their desktop-based counterparts.
 For example, you can do a lot more with Microsoft PowerPoint than with
Google Presentation's web-based offering
Implementation Levels of Virtualization

Virtualization technology benefits the computer and IT industries by enabling users to share
expensive hardware resources by multiplexing VMs on the same set of hardware hosts.
Virtual workspaces:
 An abstraction of an execution environment that can be made dynamically available
to authorized clients by using well-defined protocols,
 Resource quota (e.g. CPU, memory share),
 Software configuration (e.g. O/S, provided services).
 Implement on Virtual Machines (VMs):
 Abstraction of a physical host machine,
 Hypervisor intercepts and emulates instructions from VMs, and
allows management of VMs,
 VMWare, Xen, etc.
 Provide infrastructure API:
 Plug-ins to hardware/support structures

Virtual Machines:
 VM technology allows multiple virtual machines to run on a single physical machine.
Difference between Traditional and Virtual machines
 A traditional computer runs with a host operating system specially tailored for its hardware
architecture

 After virtualization, different user applications managed by their own operating systems
(guest OS) can run on the same hardware, independent of the host OS.
 The Virtualization layer is the middleware between the underlying hardware and virtual
machines represented in the system, also known as virtual machine monitor (VMM) or
hypervisor.

Virtualization Layers
The virtualization software creates the abstraction of VMs by interposing a virtualization layer at
various levels of a computer system. Common virtualization layers include
1. the instruction set architecture (ISA) level,
2. hardware level,
3. operating system level,
4. library support level, and
5. application level
Virtualization Ranging from Hardware to Applications in Five Abstraction Levels
1.Virtualization at Instruction Set Architecture (ISA) level:
 At the ISA level, virtualization is performed by emulating a given ISA by the ISA of the host
machine. Instruction set emulation leads to virtual ISAs created on any hardware machine.
e.g., MIPS binary code can run on an x86-based host machine with the help of ISA
emulation.
 With this approach, it is possible to run a large amount of legacy binary code written for
various processors on any given new hardware host machine.
 code interpretation – dynamic binary translation - virtual instruction set architecture (V-ISA)
 Advantage:
• It can run a large amount of legacy binary codes written for various processors on any
given new hardware host machines
• best application flexibility
 Shortcoming & limitation:
• One source instruction may require tens or hundreds of native target instructions to
perform its function, which is relatively slow.
• V-ISA requires adding a processor-specific software translation layer in the compiler.
2.Virtualization at Hardware Abstraction level:
 Hardware-level virtualization is performed right on top of the bare hardware.
 On the one hand, this approach generates a virtual hardware environment for a VM.
 On the other hand, the process manages the underlying hardware through virtualization.
 The idea is to virtualize a computer’s resources, such as its processors, memory, and I/O
devices. The intention is to upgrade the hardware utilization rate by multiple users
concurrently.
Advantage:
• Has higher performance and good application isolation
Shortcoming & limitation:
• Very expensive to implement (complexity)
3.Virtualization at Operating System (OS) level:
 OS-level virtualization creates isolated containers on a single physical server and the OS
instances to utilize the hardware and software in data centers. The containers behave like real
servers.

 OS-level virtualization is commonly used in creating virtual hosting environments to allocate
hardware resources among a large number of mutually distrusting users.
Advantage:
• Has minimal startup/shutdown cost, low resource requirement, and high scalability;
synchronize VM and host state changes.
Shortcoming & limitation:
• All VMs at the operating system level must have the same kind of guest OS
• Poor application flexibility and isolation.
Virtualization at OS Level:

Advantages of OS Extension for Virtualization


1. VMs at the OS level have minimal startup/shutdown costs
2. OS-level VMs can easily synchronize with their environment
Disadvantage of OS Extension for Virtualization
 All VMs in the same OS container must have the same or similar guest OS, which restricts
the application flexibility of different VMs on the same physical machine.
4.Library Support level:
 Since most systems provide well-documented APIs, such an interface becomes another
candidate for virtualization.
 Virtualization with library interfaces is possible by controlling the communication link
between applications and the rest of a system through API hooks.
 The software tool WINE has implemented this approach to support Windows applications on
top of UNIX hosts.
 Another example is the vCUDA which allows applications executing within VMs to leverage
GPU hardware acceleration.
Advantage:
• It has very low implementation effort

Shortcoming & limitation:
• poor application flexibility and isolation
5.User-Application Level
 Virtualization at the application level virtualizes an application as a VM. On a traditional OS,
an application often runs as a process.
 Therefore, application-level virtualization is also known as process-level virtualization.
 The most popular approach is to deploy high level language (HLL) VMs. In this scenario, the
virtualization layer sits as an application program on top of the operating system, and the
layer exports an abstraction of a VM that can run programs written and compiled to a
particular abstract machine definition.
 Other forms of application-level virtualization are known as
 application isolation,
 application sandboxing, or application streaming.
Advantage:
• has the best application isolation
Shortcoming & limitation:
• low performance, low application flexibility and high implementation complexity.
Virtualization Structures/Tools and Mechanisms:
 In general, there are three typical classes of VM architecture. The figure shows the architectures
of a machine before and after virtualization.
 Before virtualization, the operating system manages the hardware.
 After virtualization, a virtualization layer is inserted between the hardware and the operating
system. In such a case, the virtualization layer is responsible for converting portions of the
real hardware into virtual hardware.
 Therefore, different operating systems such as Linux and Windows can run on the same
physical machine, simultaneously.
 Depending on the position of the virtualization layer, there are several classes of VM
architectures, namely the hypervisor architecture, para-virtualization, and host-based
virtualization.
 The hypervisor is also known as the VMM (Virtual Machine Monitor). They both perform the
same virtualization operations.
Hypervisor:
 A hypervisor is a hardware virtualization technique allowing multiple operating systems,
called guests, to run on a host machine. This is also called the Virtual Machine Monitor
(VMM).
Type 1: bare metal hypervisor
• sits on the bare metal computer hardware like the CPU, memory, etc.

• All guest operating systems are a layer above the hypervisor.
• The original CP/CMS hypervisor developed by IBM was of this kind.
Type 2: hosted hypervisor
• Run over a host operating system.
• Hypervisor is the second layer over the hardware.
• Guest operating systems run a layer over the hypervisor.
• The OS is usually unaware of the virtualization
The XEN Architecture
 Xen is an open source hypervisor program developed by Cambridge University. Xen is a
micro-kernel hypervisor, which separates the policy from the mechanism.
 Xen does not include any device drivers natively. It just provides a mechanism by which a
guest OS can have direct access to the physical devices.
 As a result, the size of the Xen hypervisor is kept rather small. Xen provides a virtual
environment located between the hardware and the OS.

Binary Translation with Full Virtualization


 Depending on implementation technologies, hardware virtualization can be classified into two
categories: full virtualization and host-based virtualization.
 Full virtualization does not need to modify the guest OS. It relies on binary translation to trap
and virtualize the execution of certain sensitive, non-virtualizable instructions. The guest
OSes and their applications consist of noncritical and critical instructions.
 In a host-based system, both a host OS and a guest OS are used. A virtualization software
layer is built between the host OS and guest OS.
 These two classes of VM architecture are introduced next.
Binary Translation of Guest OS Requests Using a VMM
 This approach was implemented by VMware and many other software companies.
 VMware puts the VMM at Ring 0 and the guest OS at Ring 1. The VMM scans the
instruction stream and identifies the privileged, control-sensitive and behavior-sensitive instructions.

 When these instructions are identified, they are trapped into the VMM, which emulates the
behavior of these instructions.
 The method used in this emulation is called binary translation. Therefore, full virtualization
combines binary translation and direct execution.
Host-Based Virtualization
 An alternative VM architecture is to install a virtualization layer on top of the host OS. This
host OS is still responsible for managing the hardware.
 This host-based architecture has some distinct advantages. First, the user can install this VM
architecture without modifying the host OS. The virtualizing software can rely on the host OS
to provide device drivers and other low-level services. This will simplify the VM design and
ease its deployment.
 Second, the host-based approach appeals to many host machine configurations. Compared to
the hypervisor/VMM architecture, however, the performance of the host-based architecture may also be
lower.
Para-virtualization:
 Para-virtualization needs to modify the guest operating systems. A para-virtualized VM
provides special APIs requiring substantial OS modifications in user applications.
 Performance degradation is a critical issue of a virtualized system.
Full Virtualization vs. Para-Virtualization
Full virtualization
 Does not need to modify guest OS, and critical instructions are emulated by software through
the use of binary translation.
 VMware Workstation applies full virtualization, which uses binary translation to
automatically modify x86 software on-the-fly to replace critical instructions.
Advantage: no need to modify OS.
Disadvantage: binary translation slows down the performance.
Para virtualization

 Reduces the overhead, but cost of maintaining a paravirtualized OS is high.
 The improvement depends on the workload.
 Para-virtualization must modify the guest OS; non-virtualizable instructions are replaced by
hypercalls that communicate directly with the hypervisor or VMM.
 Para-virtualization is supported by Xen, Denali and VMware ESX.
Virtualization of CPU, Memory, and I/O Devices :
CPU Virtualization
 A VM is a duplicate of an existing computer system in which a majority of the VM
instructions are executed on the host processor in native mode. Thus, unprivileged
instructions of VMs run directly on the host machine for higher efficiency. Other critical
instructions should be handled carefully for correctness and stability.
 The critical instructions are divided into three categories: privileged instructions, control-
sensitive instructions, and behavior-sensitive instructions.
 Privileged instructions execute in a privileged mode and will be trapped if executed outside
this mode.
 Control-sensitive instructions attempt to change the configuration of resources used.
Behavior-sensitive instructions have different behaviors depending on the configuration of
resources, including the load and store operations over the virtual memory.
 A CPU architecture is virtualizable if it supports the ability to run the VM’s privileged
 and unprivileged instructions in the CPU’s user mode while the VMM runs in supervisor
mode.
 When the privileged instructions, including control- and behavior-sensitive instructions, of a
VM are executed, they are trapped in the VMM. In this case, the VMM acts as a unified
mediator for hardware access from different VMs to guarantee the correctness and stability of
the whole system. However, not all CPU architectures are virtualizable.
 RISC CPU architectures can be naturally virtualized because all control- and behavior-
sensitive instructions are privileged instructions.
 On the contrary, x86 CPU architectures are not primarily designed to support virtualization.

Memory Virtualization:
 Virtual memory virtualization is similar to the virtual memory support provided by modern
operating systems. In a traditional execution environment, the operating system maintains
mappings of virtual memory to machine memory using page tables, which is a one-stage
mapping from virtual memory to machine memory.
 However, in a virtual execution environment, virtual memory virtualization involves sharing
the physical system memory in RAM and dynamically allocating it to the physical memory
of the VMs.
 That means a two-stage mapping process should be maintained by the guest OS and the
VMM, respectively: virtual memory to physical memory and physical memory to machine
memory.
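The two-stage mapping can be pictured as the composition of two page-table lookups, as in the simplified sketch below. This is only an illustration of the idea (the class, field and method names are hypothetical); a real VMM collapses the two stages into shadow page tables or uses hardware-assisted nested paging for performance.

import java.util.HashMap;
import java.util.Map;

// Illustrative two-stage address mapping: guest-virtual -> guest-physical -> machine.
// All names here are hypothetical; production hypervisors do not translate this way at run time.
class TwoStageMapping {
    static final int PAGE_SIZE = 4096;

    Map<Long, Long> guestPageTable = new HashMap<>();   // maintained by the guest OS
    Map<Long, Long> vmmPageTable   = new HashMap<>();   // maintained by the VMM

    long translate(long guestVirtualAddr) {
        long offset = guestVirtualAddr % PAGE_SIZE;
        long guestVirtualPage = guestVirtualAddr / PAGE_SIZE;

        // Stage 1: the guest OS maps a guest-virtual page to a guest-physical page.
        Long guestPhysicalPage = guestPageTable.get(guestVirtualPage);
        if (guestPhysicalPage == null) throw new IllegalStateException("guest page fault");

        // Stage 2: the VMM maps the guest-physical page to a machine page allocated to this VM.
        Long machinePage = vmmPageTable.get(guestPhysicalPage);
        if (machinePage == null) throw new IllegalStateException("VMM page fault");

        return machinePage * PAGE_SIZE + offset;
    }
}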

I/O Virtualization:
 There are three ways to implement I/O virtualization: full device emulation, para-
virtualization, and direct I/O.
 Full device emulation is the first approach. Generally, this approach emulates well-known, real-world devices. All the
functions of a device or bus infrastructure, such as device enumeration, identification,
interrupts, and DMA, are replicated in software. This software is located in the VMM and
acts as a virtual device.
 The para-virtualization method of I/O virtualization is typically used in Xen. It is also known
as the split driver model, consisting of a frontend driver and a backend driver. Although it achieves better
device performance than full device emulation, it comes with a higher CPU overhead.
 Direct I/O virtualization lets the VM access devices directly. It can achieve close-to-native
performance without high CPU costs.
Virtual Clusters and Resource Management:
 A physical cluster is a collection of servers (physical machines) interconnected by a physical
network such as a LAN
 Virtual clusters are built with VMs installed at distributed servers from one or more physical
clusters. The VMs in a virtual cluster are interconnected logically by a virtual network across
several physical networks. Figure illustrates the concepts of virtual clusters and physical
clusters. Each virtual cluster is formed with physical machines or a VM hosted by multiple
physical clusters. The virtual cluster boundaries are shown as distinct boundaries.
Trust Management in Virtualized Data Centers:
 A VMM changes the computer architecture. It provides a layer of software between the
operating systems and system hardware to create one or more VMs on a single physical
platform.
 The VMM can provide secure isolation, and a VM accesses hardware resources through the control
of the VMM, so the VMM is the basis of the security of a virtual system. Normally, one VM
is taken as a management VM and is given some privileges such as creating, suspending,
resuming, or deleting a VM.
 Once a hacker successfully enters the VMM or management VM, the whole system is in
danger.

UNIT IV PROGRAMMING MODEL
Open source grid middleware packages – Globus Toolkit (GT4) Architecture , Configuration
– Usage of Globus – Main components and Programming model - Introduction to Hadoop
Framework - Mapreduce, Input splitting, map and reduce functions, specifying input and output
parameters, configuring and running a job – Design of Hadoop file system, HDFS concepts, command
line and java interface, dataflow of File read & File write.
Open source grid middleware packages
 The Open Grid Forum and the Object Management Group are two well-known organizations behind
the standards.
 Middleware is the software layer that connects software components. It lies between the
operating system and the applications.
 Grid middleware is a specially designed layer between hardware and software that enables the
sharing of heterogeneous resources and the management of virtual organizations created around the
grid.
 The popular grid middleware are
1. BOINC -Berkeley Open Infrastructure for Network Computing.
2. UNICORE - Middleware developed by the German grid computing community.
3. Globus (GT4) - A middleware library jointly developed by Argonne National Lab., Univ. of
Chicago, and USC Information Science Institute, funded by DARPA, NSF, and NIH.
4. CGSP in ChinaGrid - The CGSP (ChinaGrid Support Platform) is a middleware library
developed by 20 top universities in China as part of the ChinaGrid Project
5. Condor-G - Originally developed at the Univ. of Wisconsin for general distributed
computing, and later extended to Condor-G for grid job management.
6. Sun Grid Engine (SGE) - Developed by Sun Microsystems for business grid applications.
Applied to private grids and local clusters within enterprises or campuses.
7. gLite - Born from the collaborative efforts of more than 80 people in 12 different academic
and industrial research centers as part of the EGEE Project, gLite provided a framework for
building grid applications tapping into the power of distributed computing and storage
resources across the Internet.
The Globus Toolkit Architecture (GT4):
 The Globus Toolkit, is an open middleware library for the grid computing communities.
These open source software libraries support many operational grids and their applications on
an international basis.
 The toolkit addresses common problems and issues related to grid resource discovery,
management, communication, security, fault detection, and portability. The software itself
provides a variety of components and capabilities.
 The library includes a rich set of service implementations. The implemented software
supports grid infrastructure management, provides tools for building new web services in
Java, C, and Python, builds a powerful standards-based security infrastructure and client APIs (in
different languages), and offers comprehensive command-line programs for accessing various
grid services.

 The Globus Toolkit was initially motivated by a desire to remove obstacles that prevent
seamless collaboration, and thus sharing of resources and services, in scientific and
engineering applications. The shared resources can be computers, storage, data, services,
networks, science instruments (e.g., sensors), and so on. The Globus library, version GT4, is
conceptually shown in the figure.
The Globus Toolkit:
 GT4 offers the middle-level core services in grid applications.
 The high-level services and tools, such as MPI, Condor-G, and Nimrod/G, are developed by
third parties for general-purpose distributed computing applications.
 The local services, such as LSF, TCP, Linux, and Condor, are at the bottom level and are
fundamental tools supplied by other developers.
 As a de facto standard in grid middleware, GT4 is based on industry-standard web service
technologies.
Functionalities of GT4:
 Global Resource Allocation Manager (GRAM) - Grid Resource Access and Management
(HTTP-based)
 Communication (Nexus) - Unicast and multicast communication
 Grid Security Infrastructure (GSI) - Authentication and related security services
 Monitoring and Discovery Service (MDS) - Distributed access to structure and state
information
 Health and Status (HBM) - Heartbeat monitoring of system components
 Global Access of Secondary Storage (GASS) - Grid access of data in remote secondary
storage

 Grid File Transfer (GridFTP) - Inter-node fast file transfer
Globus Job Workflow:
 A typical job execution sequence proceeds as follows: The user delegates his credentials to a
delegation service.
 The user submits a job request to GRAM with the delegation identifier as a parameter.
 GRAM parses the request, retrieves the user proxy certificate from the delegation service,
and then acts on behalf of the user.
 GRAM sends a transfer request to the RFT (Reliable File Transfer) service, which applies GridFTP to
bring in the necessary files.
 GRAM invokes a local scheduler via a GRAM adapter, and the SEG (Scheduler Event
Generator) initiates a set of user jobs.
 The local scheduler reports the job state to the SEG. Once the job is complete, GRAM uses
RFT and GridFTP to stage out the resultant files. The grid monitors the progress of these
operations and sends the user a notification.
Client-Globus Interactions:
 There are strong interactions between provider programs and user code. GT4 makes heavy
use of industry-standard web service protocols and mechanisms in service description,
discovery, access, authentication, authorization.
 GT4 makes extensive use of Java, C, and Python to write user code. Web service mechanisms
define specific interfaces for grid computing.
 Web services provide flexible, extensible, and widely adopted XML-based interfaces.

Data Management Using GT4:
 Grid applications often need to provide access to and/or integrate large quantities of data at
multiple sites. The GT4 tools can be used individually or in conjunction with other tools to
develop interesting solutions to efficient data access. The following list briefly introduces
these GT4 tools:
 1. GridFTP supports reliable, secure, and fast memory-to-memory and disk-to-disk data
movement over high-bandwidth WANs. Based on the popular FTP protocol for internet file
transfer, GridFTP adds additional features such as parallel data transfer, third-party data
transfer, and striped data transfer. In addition, GridFTP benefits from using the strong
Globus Security Infrastructure for securing data channels with authentication and reusability.
It has been reported that the grid has achieved 27 Gbit/second end-to-end transfer speeds over
some WANs.
 2. RFT provides reliable management of multiple GridFTP transfers. It has been used to
orchestrate the transfer of millions of files among many sites simultaneously.
 3. RLS (Replica Location Service) is a scalable system for maintaining and providing access
to information about the location of replicated files and data sets.
 4. OGSA-DAI (Open Grid Services Architecture - Data Access and Integration) tools were developed by the UK
eScience program and provide access to relational and XML databases.
Hadoop:
 Apache top level project, open-source implementation of frameworks for reliable, scalable,
distributed computing and data storage.
 It is a flexible and highly-available architecture for large scale computation and data
processing on a network of commodity hardware.
 Hadoop offers a software platform that was originally developed by a Yahoo! group. The
package enables users to write and run applications over vast amounts of distributed data.
 Users can easily scale Hadoop to store and process petabytes of data in the web space.
 Hadoop is economical in that it comes with an open source version of MapReduce that
minimizes overhead in task spawning and massive data communication.
 It is efficient, as it processes data with a high degree of parallelism across a large number of
commodity nodes, and it is reliable in that it automatically keeps multiple data copies to
facilitate redeployment of computing tasks upon unexpected system failures.

 Hadoop:
 an open-source software framework that supports data-intensive distributed
applications, licensed under the Apache v2 license.
 Goals / Requirements:
 Abstract and facilitate the storage and processing of large and/or rapidly growing data
sets
 Structured and non-structured data
 Simple programming models
 High scalability and availability
 Use commodity (cheap!) hardware with little redundancy
 Fault-tolerance
 Move computation rather than data
Hadoop Framework Tools:

Hadoop’s Architecture:
 Distributed, with some centralization
 Main nodes of cluster are where most of the computational power and storage of the system
lies
 Main nodes run TaskTracker to accept and reply to MapReduce tasks, and also DataNode to
store needed blocks as closely as possible
 Central control node runs NameNode to keep track of HDFS directories & files, and
JobTracker to dispatch compute tasks to TaskTracker

 Written in Java, also supports Python and Ruby
Hadoop’s Architecture:

 Hadoop Distributed Filesystem


 Tailored to needs of MapReduce
 Targeted towards many reads of filestreams
 Writes are more costly
 High degree of data replication (3x by default)
 No need for RAID on normal nodes
 Large blocksize (64MB)

 Location awareness of DataNodes in network
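A minimal sketch of the HDFS Java interface for a file write followed by a file read is shown below; the HDFS path and the file contents are hypothetical, and error handling is omitted for brevity.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.IOException;

public class HdfsReadWrite {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();          // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);              // handle to the (distributed) file system
        Path path = new Path("/user/demo/hello.txt");      // hypothetical HDFS path

        // File write: the client asks the NameNode for target DataNodes,
        // then streams the block down the DataNode pipeline.
        FSDataOutputStream out = fs.create(path, true);
        out.writeUTF("hello hdfs");
        out.close();

        // File read: the client asks the NameNode for block locations,
        // then reads each block from the nearest DataNode replica.
        FSDataInputStream in = fs.open(path);
        System.out.println(in.readUTF());
        in.close();
        fs.close();
    }
}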
Hadoop’s Architecture:
NameNode:
 Stores metadata for the files, like the directory structure of a typical FS.
 The server holding the NameNode instance is quite crucial, as there is only one.
 Transaction log for file deletes/adds, etc. Does not use transactions for whole blocks or file-
streams, only metadata.
 Handles creation of more replica blocks when necessary after a DataNode failure
DataNode:
• Stores the actual data in HDFS
• Can run on any underlying filesystem (ext3/4, NTFS, etc)
• Notifies NameNode of what blocks it has
• NameNode replicates blocks 2x in local rack, 1x elsewhere

MapReduce Engine:
 JobTracker & TaskTracker
 JobTracker splits up data into smaller tasks (“Map”) and sends them to the TaskTracker process in
each node
 TaskTracker reports back to the JobTracker node and reports on job progress, sends data
(“Reduce”) or requests new jobs
 None of these components are necessarily limited to using HDFS
 Many other distributed file-systems with quite different architectures work

 Many other software packages besides Hadoop's MapReduce platform make use of HDFS
Hadoop in the Wild:
Hadoop is in use at most organizations that handle big data:
 Yahoo!
 Facebook
 Amazon
 Netflix
 Etc…
Some examples of scale:
 Yahoo!’s Search Webmap runs on 10,000 core Linux cluster and powers Yahoo! Web search
 FB’s Hadoop cluster hosts 100+ PB of data (July, 2012) & growing at ½ PB/day (Nov, 2012)
Three main applications of Hadoop:
 Advertisement (Mining user behavior to generate recommendations)
 Searches (group related documents)
 Security (search for uncommon patterns)
Hadoop Highlights
 Distributed File System
 Fault Tolerance
 Open Data Format
 Flexible Schema
 Queryable Database
Why use Hadoop?
 Need to process Multi Petabyte Datasets
 Data may not have strict schema
 Expensive to build reliability in each application
 Nodes fail every day
 Need common infrastructure
 Very Large Distributed File System
 Assumes Commodity Hardware
 Optimized for Batch Processing
 Runs on heterogeneous OS
DataNode:

 A Block Server
 Stores data in local file system
 Stores meta-data of a block - checksum
 Serves data and meta-data to clients
 Block Report
 Periodically sends a report of all existing blocks to NameNode
 Facilitate Pipelining of Data
 Forwards data to other specified DataNodes
Block Placement:
 Replication Strategy
 One replica on local node
 Second replica on a remote rack
 Third replica on same remote rack
 Additional replicas are randomly placed
 Clients read from nearest replica
Data Correctness:
 Use Checksums to validate data – CRC32
 File Creation
 Client computes a checksum per 512 bytes
 DataNode stores the checksum
 File Access
 Client retrieves the data and checksum from DataNode
 If validation fails, client tries other replicas
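The per-chunk checksum idea can be sketched with Java's built-in CRC32, as below. The 512-byte chunk size follows the description above; the class and method names are illustrative only and are not the actual HDFS implementation.

import java.util.zip.CRC32;

public class ChunkChecksum {
    static final int BYTES_PER_CHECKSUM = 512;   // chunk size mentioned above

    // Compute one CRC32 value per 512-byte chunk of a block's data.
    static long[] checksums(byte[] blockData) {
        int chunks = (blockData.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int from = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, blockData.length - from);
            CRC32 crc = new CRC32();
            crc.update(blockData, from, len);
            sums[i] = crc.getValue();            // stored by the DataNode, re-checked by the client on read
        }
        return sums;
    }
}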
Data Pipelining:
 Client retrieves a list of DataNodes on which to place replicas of a block
 Client writes block to the first DataNode
 The first DataNode forwards the data to the next DataNode in the Pipeline
 When all replicas are written, the client moves on to write the next block in file
MapReduce Usage:
 Log processing
 Web search indexing
 Ad-hoc queries

MapReduce Process (org.apache.hadoop.mapred) :
 JobClient
 Submit job
 JobTracker
 Manage and schedule job, split job into tasks
 TaskTracker
 Start and monitor the task execution
 Child
 The process that really executes the task
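A minimal sketch of configuring and submitting such a job with the classic org.apache.hadoop.mapred API is given below; WordCountMap and WordCountReduce are hypothetical user-written mapper and reducer classes (sketched after the Demo section below), and the input and output paths are taken from the command-line arguments.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SubmitJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SubmitJob.class);
        conf.setJobName("wordcount");

        // Specify output key/value types and the user map and reduce classes.
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(WordCountMap.class);        // hypothetical mapper class
        conf.setReducerClass(WordCountReduce.class);    // hypothetical reducer class

        // Specify input and output paths; the input is split into InputSplits for the map tasks.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Submit the job via JobClient and wait for completion (the JobTracker schedules the tasks).
        JobClient.runJob(conf);
    }
}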
Inter Process Communication:
 Protocol
 JobClient <-------------> JobTracker
 TaskTracker <------------> JobTracker
 TaskTracker <-------------> Child
 JobTracker implements both protocols and works as the server in both IPCs
 TaskTracker implements the TaskUmbilicalProtocol; Child gets task information and reports
task status through it.
JobClient.submitJob – 1:
 Check input and output, e.g. check if the output directory already exists
 job.getInputFormat().validateInput(job);
 job.getOutputFormat().checkOutputSpecs(fs, job);
 Get InputSplits, sort, and write output to HDFS
 InputSplit[] splits = job.getInputFormat().getSplits(job, job.getNumMapTasks());
 writeSplitsFile(splits, out); // out is $SYSTEMDIR/$JOBID/job.split
JobClient.submitJob - 2 :
 The jar file and configuration file will be uploaded to HDFS system directory
 job.write(out); // out is $SYSTEMDIR/$JOBID/job.xml
 JobStatus status = jobSubmitClient.submitJob(jobId);
 This is an RPC invocation, jobSubmitClient is a proxy created in the initialization
Job initialization on JobTracker - 1 :
 JobTracker.submitJob(jobID) <-- receive RPC invocation request
 JobInProgress job = new JobInProgress(jobId, this, this.conf)

 Add the job into Job Queue
 jobs.put(job.getProfile().getJobId(), job);
 jobsByPriority.add(job);
 jobInitQueue.add(job);
Job initialization on JobTracker - 2 :
 Sort by priority
 resortPriority();
 compare the JobPrioity first, then compare the JobSubmissionTime
 Wake JobInitThread
 jobInitQueue.notifyAll();
 job = jobInitQueue.remove(0);
 job.initTasks();
JobTracker Task Scheduling - 1 :
 Task getNewTaskForTaskTracker(String taskTracker)
 Compute the maximum tasks that can be running on taskTracker
 int maxCurrentMapTasks = tts.getMaxMapTasks();
 int maxMapLoad = Math.min(maxCurrentMapTasks,
(int) Math.ceil((double) remainingMapLoad / numTaskTrackers));
Demo:
 Word Count
 hadoop jar hadoop-0.20.2-examples.jar wordcount <input dir> <output dir>
 Hive
 hive -f pagerank.hive
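The word count demo above runs the example job bundled with Hadoop. For reference, a minimal sketch of what the map and reduce functions of such a job look like in the classic org.apache.hadoop.mapred API is shown below; these WordCountMap and WordCountReduce classes are illustrative, not the exact code of the bundled example.

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Map: for every line of the input split, emit (word, 1) pairs.
public class WordCountMap extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            output.collect(word, ONE);
        }
    }
}

// Reduce: sum the counts emitted for each word.
class WordCountReduce extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}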

UNIT V SECURITY
Trust models for Grid security environment – Authentication and Authorization
methods – Grid security infrastructure – Cloud Infrastructure security: network, host and application
level – aspects of data security, provider data and its security, Identity and access management
architecture, IAM practices in the cloud, SaaS, PaaS, IaaS availability in the cloud, Key privacy issues
in the cloud.
Definition of Trust
 Trust is the firm belief in the competence of an entity to behave as expected such that this
firm belief is a dynamic value associated with the entity and is subject to the entity’s behavior
and applies only within a specific context at a given time
Trust
 Trust value is a continuous and dynamic value in the range of [0,1]
 1 means very trustworthy
 0 means very untrustworthy
 It is built on past experience
 It is context based (under different context may have different trust value)
Trust models for Grid security environment:
 Many potential security issues may occur in a grid environment if qualified security
mechanisms are not in place. These issues include
1. network sniffers,
2. out-of-control access,
3. faulty operation,
4. malicious operation,
5. integration of local security mechanisms,
6. delegation,
7. dynamic resources and services,
8. attack provenance, and so on.
Security Demand (SD) and Trust Index (TI):
 On the one hand, a user job demands the resource site to provide security assurance by issuing
a security demand (SD).
 On the other hand, the site needs to reveal its trustworthiness, called its trust index (TI).
 These two parameters must satisfy a security assurance condition: TI ≥ SD during the job
mapping process.
 When determining its security demand, users usually care about some typical attributes.
These attributes and their values are dynamically changing and depend heavily on the trust
model, security policy, accumulated reputation, self-defense capability, attack history, and
site vulnerability.
 Three challenges are outlined below to establish the trust among grid sites
The first challenge is integration with existing systems and technologies.
 The resources sites in a grid are usually heterogeneous and autonomous.
 It is unrealistic to expect that a single type of security can be compatible with and adopted by
every hosting environment.
 At the same time, existing security infrastructure on the sites cannot be replaced overnight.
Thus, to be successful, grid security architecture needs to step up to the challenge of
integrating with existing security architecture and models across platforms and hosting
environments.
The second challenge is interoperability with different “hosting environments.”
 Services are often invoked across multiple domains, and need to be able to interact with one
another.
 The interoperation is demanded at the protocol, policy, and identity levels. For all these
levels, interoperation must be protected securely.
The third challenge is to construct trust relationships among interacting hosting environments.
 Grid service requests can be handled by combining resources on multiple security domains.
 Trust relationships are required by these domains during the end-to-end traversals.
 A service needs to be open to friendly and interested entities so that they can submit requests
and access securely.
Trust Model:
 Resource sharing among entities is one of the major goals of grid computing. A trust
relationship must be established before the entities in the grid interoperate with one another.
 To create the proper trust relationship between grid entities, two kinds of trust models are
often used.
1. One is the PKI -based model, which mainly exploits the PKI to authenticate and authorize
entities.
2. The other is the reputation-based model.

 The figure shows a general trust model with the three major factors which influence the
trustworthiness of a resource site.
 An inference module is required to aggregate these factors. The following are some existing
inference or aggregation methods. An intra-site fuzzy inference procedure is called to assess
defense capability and direct reputation.
 Defense capability is decided by the firewall, intrusion detection system (IDS), intrusion
response capability, and anti-virus capacity of the individual resource site.
 Direct reputation is decided based on the measured job success rate, site utilization, job turnaround
time, and job slowdown ratio.
 Recommended trust is also known as secondary trust and is obtained indirectly over the grid
network.
Reputation-Based Trust Model:
 In a reputation-based model, jobs are sent to a resource site only when the site is trustworthy
to meet users’ demands.
 The site trustworthiness is usually calculated from the following information: the defense
capability, direct reputation, and recommendation trust. The defense capability refers to the
site’s ability to protect itself from danger.
 It is assessed according to such factors as intrusion detection, firewall, response capabilities,
anti-virus capacity, and so on.
 Direct reputation is based on experiences of prior jobs previously submitted to the site.
 The reputation is measured by many factors such as prior job execution success rate,
cumulative site utilization, job turnaround time, job slowdown ratio, and so on.
 A positive experience associated with a site will improve its reputation. On the contrary, a
negative experience with a site will decrease its reputation.
A Fuzzy-Trust Model
 In this model, the job security demand (SD) is supplied by the user programs. The trust index
(TI) of a resource site is aggregated through the fuzzy-logic inference process over all related
parameters. Specifically, one can use a two-level fuzzy logic to estimate the aggregation of
numerous trust parameters and security attributes into scalar quantities that are easy to use in
the job scheduling and resource mapping process.
 The TI is normalized as a single real number with 0 representing the condition with the
highest risk at a site and 1 representing the condition which is totally risk-free or fully trusted.
 The fuzzy inference is accomplished through four steps: fuzzification, inference, aggregation,
and defuzzification.
 The second salient feature of the trust model is that if a site’s trust index cannot match the job
security demand (i.e., SD > TI ), the trust model could deduce detailed security features to
guide the site security upgrade as a result of tuning the fuzzy system.
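As a deliberately simplified illustration of the admission condition TI ≥ SD, the sketch below replaces the two-level fuzzy inference with a plain weighted average; the weights and attribute names are hypothetical and only show how an aggregated trust index would gate job mapping.

// Simplified stand-in for the fuzzy-trust model: not the actual inference procedure.
public class TrustCheck {

    // Aggregate normalized security attributes (each in [0,1]) into a trust index in [0,1].
    static double trustIndex(double defenseCapability, double directReputation,
                             double recommendedTrust) {
        double wDefense = 0.4, wDirect = 0.4, wRecommended = 0.2;   // hypothetical weights
        return wDefense * defenseCapability
             + wDirect * directReputation
             + wRecommended * recommendedTrust;
    }

    // A job may be mapped to the site only if the security assurance condition TI >= SD holds.
    static boolean canMapJob(double securityDemand, double trustIndex) {
        return trustIndex >= securityDemand;
    }

    public static void main(String[] args) {
        double ti = trustIndex(0.9, 0.7, 0.6);
        System.out.println("TI = " + ti + ", job accepted: " + canMapJob(0.65, ti));
    }
}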
Authentication and Authorization Methods
 The major authentication methods in the grid include passwords, PKI, and Kerberos. The
password is the simplest method to identify users, but the most vulnerable one to use.
 The PKI is the most popular method supported by GSI. To implement PKI, we use a trusted
third party, called the certificate authority (CA). Each user holds a unique pair of public and
private keys. The public key is certified by the CA, which issues a certificate after recognizing a
legitimate user.
 The private key is exclusive for each user to use, and is unknown to any other users. A digital
certificate in X.509 format consists of the user name, the user's public key, the CA name, and the
digital signature of the CA.
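The certificate fields listed above can be inspected with the standard Java security API. The sketch below is illustrative only; the certificate file names are hypothetical, and verifying the CA signature assumes the CA certificate is available locally.

import java.io.FileInputStream;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;

public class InspectCertificate {
    public static void main(String[] args) throws Exception {
        CertificateFactory cf = CertificateFactory.getInstance("X.509");

        // Load the user certificate and the CA certificate (hypothetical file names).
        X509Certificate userCert =
            (X509Certificate) cf.generateCertificate(new FileInputStream("usercert.pem"));
        X509Certificate caCert =
            (X509Certificate) cf.generateCertificate(new FileInputStream("cacert.pem"));

        System.out.println("Subject (user name): " + userCert.getSubjectX500Principal());
        System.out.println("Issuer (CA name):    " + userCert.getIssuerX500Principal());
        System.out.println("Public key algorithm: " + userCert.getPublicKey().getAlgorithm());

        // The CA's digital signature on the certificate is checked with the CA's public key.
        userCert.verify(caCert.getPublicKey());
        System.out.println("CA signature verified");
    }
}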
Authorization for Access Control
 The authorization is a process to exercise access control of shared resources.
 Decisions can be made either at the access point of service or at a centralized place.
 Typically, the resource is a host that provides processors and storage for services deployed on
it. Based on a set of predefined policies or rules, the resource may enforce access for local
services.
 The central authority is a special entity which is capable of issuing and revoking polices of
access rights granted to remote accesses.
 The authority can be classified into three categories:
1. Attribute authorities - issue attribute assertions
2. Policy authorities - authorization policies , and
3. Identity authorities - issue certificates
 The authorization server makes the final authorization decision.
Three Authorization Models:

Figure shows three authorization models.


1. The subject-push model
2. The resource-pulling model
3. The authorization agent model
 The subject-push model is shown at the top diagram. The user conducts handshake with the
authority first and then with the resource site in a sequence.
 The resource-pulling model puts the resource in the middle. The user checks the resource
first. Then the resource contacts its authority to verify the request, and the authority
authorizes at step 3. Finally the resource accepts or rejects the request from the subject at step
4.
 The authorization agent model puts the authority in the middle. The subject checks with the
authority at step 1 and the authority makes decisions on the access of the requested resources.
The authorization process is completed at steps 3 and 4 in the reverse direction.
Grid Security Infrastructure (GSI):
 The grid is increasingly deployed as a common approach to constructing dynamic, inter-
domain, distributed computing and data collaborations, yet there is still a “lack of security/trust
between different services”.
 This remains an important challenge of the grid.
 The grid requires a security infrastructure with the following properties:
1. easy to use;
2. conforms with the virtual organization (VO’s) security needs while working well with site
policies of each resource provider site; and
3. provides appropriate authentication and encryption of all interactions.
4. The GSI is an important step toward satisfying these requirements.
5. GSI is well-known security solution in the grid environment,
6. GSI is a portion of the Globus Toolkit and provides fundamental security services needed to
support grids, including supporting for message protection, authentication and delegation, and
authorization.
7. GSI enables secure authentication and communication over an open network, and permits
mutual authentication across and among distributed sites with single sign-on capability.
8. No centrally managed security system is required, and the grid maintains the integrity of its
members’ local policies.
9. GSI supports both message-level security, which supports the WS-Security standard and the
WS-Secure Conversation specification to provide message protection for SOAP messages,
and transport-level security, which means authentication via transport-level security (TLS)
with support for X.509 proxy certificates.
GSI Functional Layers:
 GT4 provides distinct WS and pre-WS authentication and authorization capabilities.
 Both build on the same base, namely the X.509 standard and entity certificates and proxy
certificates, which are used to identify persistent entities such as users and servers and to
support the temporary delegation of privileges to other entities, respectively. As shown in
Figure.
 GSI may be thought of as being composed of four distinct functions: message protection,
authentication, delegation, and authorization.
 TLS (transport-level security) or WS-Security and WS-Secure Conversation (message level)
are used as message protection mechanisms in combination with SOAP.
 X.509 End Entity Certificates or Username and Password are used as authentication
credentials. X.509 Proxy Certificates and WS-Trust are used for delegation.
 An Authorization Framework allows for a variety of authorization schemes, including a “grid-
mapfile” ACL, an ACL defined by a service, a custom authorization handler, and access to an
authorization service via the SAML protocol.
 In addition, associated security tools provide for the storage of X.509 credentials, the
mapping between GSI and other authentication mechanisms and maintenance of information
used for authorization.
 The web services portions of GT4 use SOAP as their message protocol for communication.
 Message protection can be provided either by
 transport-level security, which transports SOAP messages over TLS, or by
 message-level security, which is signing and/or encrypting Portions of the SOAP message
using the WS-Security standard.
 The X.509 certificates used by GSI are conformant to the relevant standards and conventions.

 Grid deployments around the world have established their own CAs based on third-party
software to issue the X.509 certificate for use with GSI and the Globus Toolkit.
 GSI also supports delegation and single sign-on through the use of standard X.509 proxy
certificates. Proxy certificates allow bearers of X.509 credentials to delegate their privileges temporarily to
another entity.
 For the purposes of authentication and authorization, GSI treats certificates and proxy
certificates equivalently. Authentication with X.509 credentials can be accomplished either via
TLS, in the case of transport-level security, or via signature as specified by WS-Security, in
the case of message-level security.
Authentication and Delegation:

 To reduce or even avoid the number of times the user must enter his passphrase when several
grids are used or have agents (local or remote) requesting services on behalf of a user, GSI
provides a delegation capability and a delegation service that provides an interface to allow
clients to delegate (and renew) X.509 proxy certificates to a service.
 The interface to this service is based on the WS-Trust specification. A proxy consists of a new
certificate and a private key. The key pair that is used for the proxy, that is, the public key
embedded in the certificate and the private key, may either be regenerated for each proxy or
be obtained by other means.
 The new certificate contains the owner’s identity, modified slightly to indicate that it is a
proxy. The new certificate is signed by the owner, rather than a CA
Trust Delegation:
 GSI has traditionally supported authentication and delegation through the use of X.509
certificate and public keys. As a new feature in GT4, GSI also supports authentication
through plain usernames and passwords as a deployment option.
 As a central concept in GSI authentication, a certificate includes four primary pieces of
information:
 (1) a subject name, which identifies the person or object that the certificate represents;
 (2) the public key belonging to the subject;
 (3) the identity of a CA that has signed the certificate to certify that the public key and the
identity both belong to the subject; and
 (4) the digital signature of the named CA. X.509 provides each entity with a unique identifier
(i.e., a distinguished name) and a method to assert that identifier to another party through the
use of an asymmetric key pair bound to the identifier by the certificate.
Risks and Security Concerns With Cloud Computing:
 Many of the risks associated with cloud computing are not new and can be found in existing computing
environments. Many companies and organizations already outsource significant parts of
their business due to globalization.
 Outsourcing to the cloud means not only using the services and technology of the cloud provider, but also raises many questions
about the way the provider runs its security policy.
 After performing an analysis, the top threats to cloud computing can be summarized as
follows:
1. Abuse and Unallowed Use of Cloud Computing;
2. Insecure Application Programming Interfaces;
3. Malicious Insiders;
4. Shared Technology Vulnerabilities;
5. Data Loss and Leakage
6. Account, Service and Traffic Hijacking;
7. Unknown Risk Profile.
Cloud Security Principles:
 Public cloud computing requires a security model that coordinates scalability and multi-
tenancy with the requirement for trust. As enterprises move their computing
environments with their identities, information and infrastructure to the cloud, they must
be willing to give up some level of control.
 In order to do so they must be able to trust cloud systems and providers, as well as to
verify cloud processes and events.
 Important building blocks of trust and verification relationships include access control,
data security, compliance and event management - all security elements well understood
by IT departments today, implemented with existing products and technologies, and
extendable into the cloud. The cloud security principles comprise three categories:
1. identity,
2. information and
3. infrastructure.
Identity security:
 End-to-end identity management, third-party authentication services and identity must
become a key element of cloud security. Identity security keeps the integrity and
confidentiality of data and applications while making access readily available to appropriate
users. Support for these identity management capabilities for both users and infrastructure
components will be a major requirement for cloud computing and identity will have to be
managed in ways that build trust. It will require:
 Stronger authentication: Cloud computing must move beyond username-and-password
authentication, which means adopting methods and technologies that are already IT standards, such
as strong authentication, coordination within and between enterprises, and risk-based
authentication, measuring behavior history, current context and other factors to assess the
risk level of a user request.
 Stronger authorization: Authorization can be stronger within an enterprise or a private
cloud, but in order to handle sensitive data and compliance requirements, public clouds
will need stronger authorization capabilities that can be constant throughout the lifecycle of
the cloud infrastructure and the data.
Information security:
 In the traditional data center, controls on physical access, access to hardware and software
and identity controls all combine to protect the data. In the cloud, that protective barrier that
secures infrastructure is diffused. The data needs its own security and will require

 Data isolation: In multi-tenancy environment data must be held securely in order to
protect it when multiple customers use shared resources. Virtualization, encryption and
access control will be workhorses for enabling varying degrees of separation between
corporations, communities of interest and users.
 Stronger data security: In existing data center environments the role-based access control
at the level of user groups is acceptable in most cases since the information remains within
the control of the enterprise. However, sensitive data will require security at the file, field or
block level to meet the demands of assurance and compliance for information in the cloud.
 Effective data classification: Enterprises will need to know what type of data is important
and where it is located as prerequisites to making performance cost-benefit decisions, as
well as ensuring focus on the most critical areas for data loss prevention procedures.
 Information rights management: often treated as a component of identity management
that governs what information users have access to. Stronger, data-centric security requires
policies and control mechanisms on the storage and use of information to be associated
directly with the information itself.
 Governance and compliance: A major requirement of corporate information governance
and compliance is the creation of management and validation information - monitoring and
auditing the security state of the information with logging capabilities. The cloud computing
infrastructures must be able to verify that data is being managed per the applicable local
and international regulations with appropriate controls, log collection and reporting.
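As one concrete illustration of the data isolation and field-level security points above, the sketch below encrypts a single sensitive field with a separate key per tenant. It assumes the widely used third-party cryptography package is installed; the in-memory key table is a stand-in for a real key-management service.

# Sketch: per-tenant, field-level encryption using the third-party
# "cryptography" package (pip install cryptography). The in-memory key table
# is an illustrative assumption; real systems would use a key-management service.
from cryptography.fernet import Fernet

tenant_keys = {
    "tenant-a": Fernet.generate_key(),   # one key per tenant gives cryptographic
    "tenant-b": Fernet.generate_key(),   # separation between customers
}

def encrypt_field(tenant_id, plaintext):
    return Fernet(tenant_keys[tenant_id]).encrypt(plaintext.encode())

def decrypt_field(tenant_id, token):
    return Fernet(tenant_keys[tenant_id]).decrypt(token).decode()

record = {"name": "Alice", "ssn": encrypt_field("tenant-a", "123-45-6789")}
print(decrypt_field("tenant-a", record["ssn"]))   # 123-45-6789
# Attempting to decrypt with tenant-b's key raises InvalidToken,
# which is exactly the isolation property wanted in a multi-tenant store.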
Cloud Infrastructure Security:
 IaaS application providers treat the applications within the customer virtual instance as a
black box and are therefore completely indifferent to the operations and management of the
customer's applications. The entire stack (customer application and runtime environment)
runs on the customer's virtual servers on the provider's infrastructure and is managed by
the customers themselves. For this reason the customer must take full responsibility for
securing their cloud-deployed applications.
 Cloud deployed applications must be designed for the internet threat model.
 They must be designed with standard security countermeasures to guard against the
common web vulnerabilities.
 Customers are responsible for keeping their applications up to date - and must therefore
ensure they have a patch strategy to ensure their applications are screened from malware
and hackers scanning for vulnerabilities to gain unauthorized access to their data within
the cloud.
 Customers should not be tempted to use custom implementations of Authentication,
Authorization and Accounting as these can become weak if not properly implemented.
 The foundational infrastructure for a cloud must be inherently secure whether it is a private
or public cloud or whether the service is SAAS, PAAS or IAAS.
 Inherent component-level security: The cloud needs to be architected to be secure, built
with inherently secure components, deployed and provisioned securely with strong
interfaces to other components and supported securely, with vulnerability-assessment and
change-management processes that produce management information and service-level
assurances that build trust.
 Stronger interface security: The points in the system where interaction takes place (user-to-
network, server-to-application) require stronger security policies and controls that ensure
consistency and accountability.
 Resource lifecycle management: The economics of cloud computing are based on multi-
tenancy and the sharing of resources. As customer needs and requirements change, a
service provider must provision and decommission the corresponding resources -
bandwidth, servers, storage and security. This lifecycle process must be
managed in order to build trust.
 Infrastructure security can be viewed, assessed and implemented according to its building
levels - the network, host and application levels.
Infrastructure Security - The Network Level:
 When looking at the network level of infrastructure security, it is important to distinguish
between public clouds and private clouds. With private clouds, there are no new attacks,
vulnerabilities, or changes in risk specific to this topology that information security personnel
need to consider.
 If public cloud services are chosen, changing security requirements will require changes to
the network topology, and the manner in which the existing network topology interacts with
the cloud provider's network topology must be taken into account. There are four
significant risk factors in this use case (a data-in-transit sketch follows the list):
1. Ensuring the confidentiality and integrity of organization’s data-in-transit to and from a
public cloud provider;
2. Ensuring proper access control (authentication, authorization, and auditing) to whatever
resources are used at the public cloud provider;
3. Ensuring the availability of the Internet-facing resources in a public cloud that are being
used by an organization, or have been assigned to an organization by public cloud providers;
4. Replacing the established model of network zones and tiers with domains.
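For the first risk factor above (confidentiality and integrity of data-in-transit), the sketch below opens a TLS connection with certificate and hostname verification using only the Python standard library. The endpoint name is a placeholder, not a real provider URL.

# Sketch: protecting data-in-transit to a cloud endpoint with verified TLS.
# The host name below is a placeholder, not an actual provider endpoint.
import socket
import ssl

HOST = "storage.example-cloud.com"

context = ssl.create_default_context()          # loads the trusted CA store
context.minimum_version = ssl.TLSVersion.TLSv1_2

with socket.create_connection((HOST, 443), timeout=5) as sock:
    # wrap_socket verifies the certificate chain and the host name;
    # a mismatch raises ssl.SSLCertVerificationError instead of connecting.
    with context.wrap_socket(sock, server_hostname=HOST) as tls:
        print("negotiated:", tls.version(), tls.cipher()[0])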
Infrastructure Security - The Host Level:
 When reviewing host security and assessing risks, the context of cloud services delivery
models (SaaS, PaaS, and IaaS) and deployment models public, private, and hybrid) should
be considered.
 The host security responsibilities in SaaS and PaaS services are transferred to the
provider of cloud services.
 IaaS customers are primarily responsible for securing the hosts provisioned in the cloud
(virtualization software security, customer guest OS or virtual server security).
Infrastructure Security - The Application Level:
 Application or software security should be a critical element of a security program. Most
enterprises with information security programs have yet to institute an application security
program to address this realm.
 Designing and implementing applications aimed at deployment on a cloud platform will
require existing application security programs to re-evaluate current practices and
standards.
 The application security spectrum ranges from standalone single-user applications to
sophisticated multiuser e-commerce applications. This level is responsible for managing:
1. Application-level security threats;
2. End user security;
3. SaaS application security;
4. PaaS application security;
5. Customer-deployed application security;
6. IaaS application security;
7. Public cloud security limitations.
 In summary, the issues of infrastructure security in cloud computing lie in defining and
providing the specific security aspects that each party delivers.
Aspects of Data Security:
Security for
1. Data in transit
2. Data at rest
3. Processing of data including multitenancy
4. Data Lineage
5. Data Provenance
6. Data remanence
 Solutions include encryption, identity management and data sanitization (a provenance and integrity sketch follows)
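The lineage, provenance and integrity items above can be illustrated with a small hash-chained record of every operation applied to a data set; the field names and actors below are assumptions made for illustration only.

# Sketch: a hash-chained provenance log, so every transformation of a data set
# can be traced and tampering with earlier entries becomes detectable.
import hashlib, json, time

def digest(data):
    return hashlib.sha256(data).hexdigest()

provenance = []   # append-only list of provenance entries

def record_step(actor, action, data):
    prev = provenance[-1]["entry_hash"] if provenance else ""
    entry = {"time": time.time(), "actor": actor, "action": action,
             "data_hash": digest(data), "prev_entry": prev}
    # Hash the entry itself (including the previous hash) to chain the records.
    entry["entry_hash"] = digest(json.dumps(entry, sort_keys=True).encode())
    provenance.append(entry)

raw = b"customer records v1"
record_step("ingest-service", "generation", raw)
record_step("etl-job-42", "transformation", raw.upper())
print(json.dumps(provenance, indent=2))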
Provider Data and its Security:
 What data does the provider collect – e.g., metadata, and how can this data be secured?
1. Data security issues
2. Access control
3. Key management for encryption
 Confidentiality, Integrity and Availability are the objectives of data security in the cloud
Identity and Access Management (IAM) in the Cloud:
Trust Boundaries and IAM
 In a traditional environment, trust boundary is within the control of the organization
 This includes the governance of the networks, servers, services, and applications
 In a cloud environment, the trust boundary is dynamic and moves within the control of the
service provider as well as the organization
 Identity federation is an emerging industry best practice for dealing with dynamic and loosely
coupled trust relationships in the collaboration model of an organization
 The core of the architecture is the directory service, which is the repository for identities,
credentials and user attributes
Why IAM?
 Improves operational efficiency and regulatory compliance management
 IAM enables organizations to achieve access control and operational security
 Cloud use cases that need IAM
 Organization employees accessing SaaS services using identity federation
 IT admins accessing the CSP management console to provision resources and access for
users using a corporate identity
 Developers creating accounts for partner users in PaaS
 End users accessing a storage service in a cloud
 Applications residing at a cloud service provider accessing storage from another cloud
service
IAM Challenges:
 Provisioning resources to users rapidly to accommodate their changing roles
 Handle turnover in an organization
 Disparate directories, identities and access rights
 Need standards and protocols that address the IAM challenges
IAM Definitions:
 Authentication
 Verifying the identity of a user, system or service
 Authorization
 Privileges that a user or system or service has after being authenticated (e.g., access
control)
 Auditing
 Examine what the user, system or service has carried out
 Check for compliance (a combined authentication/authorization/audit sketch follows this list)
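The three definitions above can be tied together in a toy example: a request is first authenticated, then checked against a permission table, and the outcome is written to an audit log. The user names, passwords and permissions here are illustrative assumptions, and a real system would use a salted password-hashing function rather than plain SHA-256.

# Toy sketch combining authentication, authorization and auditing.
# The user store, permissions and audit sink are illustrative assumptions.
import hashlib
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s AUDIT %(message)s")

USERS = {"alice": hashlib.sha256(b"s3cret").hexdigest()}   # authentication store
PERMISSIONS = {"alice": {"storage:read"}}                  # authorization store

def authenticate(user, password):
    # Real systems should use a salted KDF (bcrypt, scrypt, PBKDF2), not bare SHA-256.
    return USERS.get(user) == hashlib.sha256(password.encode()).hexdigest()

def authorize(user, permission):
    return permission in PERMISSIONS.get(user, set())

def access(user, password, permission):
    granted = authenticate(user, password) and authorize(user, permission)
    logging.info("user=%s permission=%s granted=%s", user, permission, granted)  # audit trail
    return granted

access("alice", "s3cret", "storage:read")     # granted=True
access("alice", "s3cret", "storage:delete")   # granted=False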
IAM Practice:
The IAM process consists of the following:
 User management (for managing identity life cycles),
 Authentication management,
 Authorization management,
 Access management,
 Data management and provisioning,
 Monitoring and auditing
 Provisioning,
 Credential and attribute management,
 Entitlement management,
 Compliance management,
 Identity federation management,
 Centralization of authentication and authorization.
IAM Practices in the Cloud:
 Cloud Identity Administration
 Life cycle management of user identities in the cloud
 Federated Identity (SSO)
 An enterprise identity provider within the organization's perimeter
 Cloud-based Identity provider
Cloud Authorization Management:
 XACML is the preferred model for authorization
 RBAC is also being explored (a reduced RBAC sketch follows this list)
 Dual roles: Administrator and User
 IAM support for compliance management
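The list above names XACML as the preferred authorization model. The sketch below is not an XACML engine but a reduced role-based access control (RBAC) check, the simpler model the notes say is being explored; the role and permission names are assumptions, not any provider's API.

# Reduced RBAC sketch: users map to roles, roles map to permissions.
# Role and permission names are illustrative assumptions.
ROLE_PERMISSIONS = {
    "administrator": {"vm:create", "vm:delete", "iam:manage"},
    "user":          {"vm:start", "vm:stop"},
}
USER_ROLES = {"bob": {"user"}, "carol": {"administrator", "user"}}

def is_permitted(user, permission):
    return any(permission in ROLE_PERMISSIONS.get(role, set())
               for role in USER_ROLES.get(user, set()))

print(is_permitted("bob", "vm:delete"))    # False - the "user" role cannot delete
print(is_permitted("carol", "vm:delete"))  # True  - granted via "administrator"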
Cloud Service Provider and IAM Practice:
 What is the responsibility of the CSP and the responsibility of the organization/enterprise?
 Enterprise IAM requirements
 Provisioning of cloud service accounts to users
 Provisioning of cloud services for service-to-service integration
 SSO support for users based on federation standards
 Support for international and regulatory policy requirements
 User activity monitoring
 How can enterprises extend their IAM requirements to SaaS, PaaS and IaaS?
Security Management in the Cloud:
 Security Management Standards
 Security Management in the Cloud
 Availability Management
 Access Control
 Security Vulnerability, Patch and Configuration Management
Security Management Standards:
 Security management has to be carried out in the cloud
 Standards include ITIL (Information Technology Infrastructure Library) and ISO
27001/27002
 What are the policies, procedures, processes and work instructions for managing security?
Security Management in the Cloud:
 Availability Management (ITIL)
 Access Control (ISO, ITIL)
 Vulnerability Management (ISO, IEC)
 Patch Management (ITIL)
 Configuration Management (ITIL)
 Incident Response (ISO/IEC)
 System use and Access Monitoring
Availability Management:
 SaaS availability
 Customer responsibility: Customer must understand SLA and communication
methods
 SaaS health monitoring
 PaaS availability
 Customer responsibility
 PaaS health monitoring
 IaaS availability
 Customer responsibility
 IaaS health monitoring (a simple customer-side health-check sketch follows this list)
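SaaS, PaaS and IaaS health monitoring from the customer side can be as simple as polling a provider status endpoint and comparing the result against the SLA. The sketch below uses only the standard library; the status URL is a placeholder, not a real endpoint.

# Sketch: customer-side availability (health) monitoring of a cloud service.
# The status URL is a placeholder, not a real provider endpoint.
import urllib.request
import urllib.error

HEALTH_URL = "https://status.example-cloud.com/health"

def service_healthy(url=HEALTH_URL, timeout=5.0):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False   # an unreachable endpoint is treated as unavailable

print("healthy" if service_healthy() else "unhealthy - check the SLA and escalate")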
Access Control Management in the Cloud:
 Who should have access and why?
 How is a resource accessed?
 How is the access monitored?
 Impact of access control on SaaS, PaaS and IaaS
Security Vulnerability, Patch and Configuration (VPC) Management
 How can security vulnerability, patch and configuration management for an organization be
extended to a cloud environment?
 What is the impact of VPC management on SaaS, PaaS and IaaS?
Privacy:
 Privacy and Data Life Cycle
 Key Privacy Concerns in the Cloud
 Who is Responsible for Privacy
 Privacy Risk Management and Compliance in the Cloud
 Legal and Regulatory Requirements
Privacy and Data Life Cycle:
 Privacy: Accountability of organizations to data subjects, as well as transparency about an
organization's practices around personal information
 Data Life Cycle
 Generation, Use, Transfer, Transformation, Storage, Archival, Destruction
 Need policies.
Privacy Risk Management and Compliance:
 Collection Limitation Principle
 Use Limitation Principle
 Security Principle
 Retention and Destruction Principle
 Transfer Principle
 Accountability Principle.
Legal and Regulatory Requirements:
 US Regulations
 Federal Rules of Civil Procedure
 US Patriot Act
 Electronic Communications Privacy Act
 FISMA
 GLBA
 HIPAA
 HITECH Act
 International regulations
 EU Directive
 APEC Privacy Framework
Audit and Compliance:
 Internal Policy Compliance
 Governance, Risk and Compliance (GRC)
 Control Objectives
 Regulatory/External Compliance
 Cloud Security Alliance
 Auditing for Compliance
Audit and Compliance:
 Define strategy
 Define requirements (provide services to clients)
 Define architecture (i.e., architect and structure services to meet requirements)
 Define policies
 Define processes and procedures
 Ongoing operations
 Ongoing monitoring
 Continuous improvement
Governance, Risk and Compliance:
 Risk assessment
 Key controls (to address the risks and compliance requirements)
 Monitoring
 Reporting
 Continuous improvement
 Risk assessment – new IT projects and systems
Regulatory/External Compliance:
 Sarbanes-Oxley Act
 PCI DSS
 HIPAA
 COBIT
 What is the impact of Cloud computing on the above regulations?
Cloud Security Alliance (CSA):
 Create and apply best practices to securing the cloud
 Objectives include
 Promote common level of understanding between consumers and providers
 Promote independent research into best practices
 Launch awareness and educational programs
 Create consensus
 The white paper produced by the CSA consists of 15 domains
 Architecture, risk management, legal, lifecycle management, application security,
storage, virtualization, among others
Auditing for Compliance:
 Internal and External Audits
 Audit Framework
 SAS 70
 SysTrust
 WebTrust
 ISO 27001 certification
 Relevance to Cloud
Cloud Service Providers:
1. Amazon Web Services (IaaS)
2. Google (SaaS, PaaS)
3. Microsoft Azure (SaaS, IaaS)
4. Proofpoint (SaaS, IaaS)
5. RightScale (SaaS)
6. Salesforce.com (SaaS, PaaS)
7. Sun Open Cloud Platform
8. Workday (SaaS)
Security as a Service:
 Email Filtering
 Web Content Filtering
 Vulnerability Management
 Identity Management
Impact of Cloud Computing:
 Benefits
 Low cost solution
 Responsiveness and flexibility
 IT expense matches transaction volume
 Business users are in direct control of technology decisions
 Line between home computing applications and enterprise applications will blur
 Threats
 Vested interest of cloud providers
 Less control over the use of technologies
 Perceived risk of using cloud computing
 Portability issues and lock-in to CSPs' proprietary systems
 Lack of integration and componentization