Dr.R.Anand
Associate Professor
Department of IT
UNIT I
INTRODUCTION
Evolution of Distributed Computing: Scalable computing over the Internet – Technologies for network-based systems – Clusters of cooperative computers – Grid computing infrastructures – Cloud computing – Service-oriented architecture – Introduction to Grid architecture and standards – Elements of Grid – Overview of Grid architecture.
1.1 Evolution of Distributed Computing: Scalable computing over the Internet – Technologies for network-based systems – Clusters of cooperative computers
The history of distributed computing can be traced back a long way; forms of distributed computing existed as early as the 1960s.
In the 1970s, many researchers became interested in connecting computers together for high-performance computing, and in particular in forming multicomputer or multiprocessor systems. From connecting processors and computers together locally, which began in earnest in the 1960s and 1970s, distributed computing now extends to connecting computers that are geographically distant.
The distributed computing technologies that underpin Grid computing were developed
concurrently and rely upon each other.
There are three concurrent interrelated paths. They are:
• Networks
• Computing platforms
• Software techniques
Networks: Grid computing relies on high-performance computer networks. The history of such networks began in the 1960s with the development of packet-switched networks. The most important and ground-breaking geographically distributed packet-switched network was the DoD-funded ARPANET, with a design speed of 50 Kbits/sec.
ARPANET became operational with four nodes (University of California at Los Angeles, Stanford Research Institute, University of California at Santa Barbara, and University of Utah) in 1969. TCP (Transmission Control Protocol) was conceived in 1974 and became TCP/IP (Transmission Control Protocol/Internet Protocol) in 1978. TCP/IP became universally adopted: TCP provided a protocol for reliable communication, while IP provided for network routing. Important concepts include IP addresses, which identify hosts on the Internet, and ports, which identify end points (processes) for communication purposes. Ethernet was also developed in the early 1970s and became the principal way of interconnecting computers on local networks.
It initially enabled multiple computers to share a single Ethernet cable and handled communication collisions with a retry protocol, although nowadays this collision detection is usually not needed, as a separate Ethernet cable is used for each computer, with Ethernet switches making the connections. Each Ethernet interface has a unique physical address to identify it for communication purposes, which is mapped to the host's IP address.
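The notions of IP address and port can be made concrete with a few lines of Python. This is an illustrative sketch only; the host name and port below are placeholders, not anything defined in these notes.

    import socket

    # Resolve a host name to an IP address (host name is a placeholder).
    ip = socket.gethostbyname("example.org")
    print("Resolved IP address:", ip)

    # A TCP end point is identified by an (IP address, port) pair;
    # port 80 is the well-known port for HTTP servers.
    with socket.create_connection((ip, 80), timeout=5) as sock:
        # The operating system assigns our own (address, port) end point.
        print("Connected from local end point:", sock.getsockname())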
The Internet began to take shape in the early 1980s using the TCP/IP protocol. During the 1980s, the Internet grew at a phenomenal rate, and networks continued to improve and became more pervasive throughout the world. In the 1990s, the World-Wide Web was developed on top of the Internet, introducing the browser and the HTML markup language. The global network now enables computers to be interconnected virtually anywhere in the world.
Computing Platforms: Computing systems began as single-processor systems. It was soon recognized that increased speed could potentially be obtained by having more than one processor inside a single computer system, and the term parallel computer was coined to describe such systems. Parallel computers were limited to applications that required the highest computational speed. It was also recognized that a collection of individual computer systems could be connected together quite easily to form a multicomputer system for higher performance. There were many projects with this goal in the 1970s and 1980s, especially with the advent of low-cost microprocessors.
Distributed computing such as Grid computing relies on causing actions to occur on remote computers. The value of taking advantage of remote computers was recognized many years before Grid computing appeared. One early and fundamental arrangement is the client-server model, illustrated below.
Figure: Client-server model.
In this model, the client sends a request to the server, and the server responds accordingly. The request and response are transmitted through the network between the client and the server.
An early form of client-server arrangement was the remote procedure call (RPC) introduced in the 1980s.
This mechanism allows a local program to execute a procedure on a remote computer and get back results
from that procedure. It is now the basis of certain network facilities such as mounting remote files in a
shared file system. For the remote procedure call to work, the client needs to:
• Identify the location of the required procedure.
• Know how to communicate with the procedure to get it to provide the actions required.
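As a concrete illustration of the remote procedure call idea, the sketch below uses Python's standard xmlrpc module. The procedure name, host, and port are invented for the example; this is not the original 1980s RPC mechanism itself, just the same pattern in a modern form.

    import threading
    from xmlrpc.server import SimpleXMLRPCServer
    from xmlrpc.client import ServerProxy

    # Server side: make a local procedure callable from remote clients.
    def add(x, y):
        return x + y

    server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
    server.register_function(add, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()

    # Client side: the client must know where the procedure is (host, port)
    # and how to communicate with it (here, XML-RPC over HTTP).
    proxy = ServerProxy("http://localhost:8000/")
    print(proxy.add(2, 3))  # executes add() on the server; prints 5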
The remote procedure call introduced the concept of a service registry to provide a means of locating the
service (procedure). Using a service registry is now part of what is called a service-oriented architecture
(SOA) as illustrated in the figure below. The sequence of events is as follows:
• First, the server (service provider) publishes its services in a service registry.
• Then, the client (service requestor) can ask the service registry to locate a service.
• Then, the client (service requestor) binds with the service provider to invoke a service.
Figure: Service-oriented architecture.
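The publish, find, and bind steps can be sketched with a toy in-memory registry. This is purely illustrative; a real SOA registry is a separate network service, and the service name below is invented.

    # Toy service registry illustrating publish, find, and bind.
    registry = {}

    def publish(name, endpoint):
        """Service provider publishes its service in the registry."""
        registry[name] = endpoint

    def find(name):
        """Service requestor asks the registry to locate a service."""
        return registry[name]

    # 1. The provider publishes a service under a well-known name.
    publish("temperature-service", lambda city: {"city": city, "temp_c": 21})

    # 2. The requestor locates the service, then 3. binds to it and invokes it.
    service = find("temperature-service")
    print(service("Chennai"))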
Later forms of remote procedure call in the 1990s introduced distributed objects, most notably CORBA (Common Object Request Broker Architecture) and Java RMI (Remote Method Invocation).
OGSI
As grid computing has evolved it has become clear that a service-oriented architecture could provide
many benefits in the implementation of a grid infrastructure.
The Global Grid Forum extended the concepts defined in OGSA to define specific interfaces to the various services that would implement the functions defined by OGSA. More specifically, the Open Grid Services Infrastructure (OGSI) defines mechanisms for creating, managing, and exchanging information among Grid services.
A Grid service is a Web service that conforms to a set of interfaces and behaviors that define how a client
interacts with a Grid service.
These interfaces and behaviors, along with other OGSI mechanisms associated with Grid service creation and discovery, provide the basis for a robust grid environment. OGSI provides the Web Services Description Language (WSDL) definitions for these key interfaces.
OGSA-DAI
The OGSA-DAI (data access and integration) project is concerned with constructing middleware to assist
with access and integration of data from separate data sources via the grid.
The project was conceived by the UK Database Task Force and is working closely with the Global Grid
Forum DAIS-WG and the Globus team.
GridFTP
GridFTP is a secure and reliable data transfer protocol providing high performance and optimized for
wide-area networks that have high bandwidth.
As one might guess from its name, it is based upon the Internet FTP protocol and includes extensions that
make it a desirable tool in a grid environment. The GridFTP protocol specification is a proposed
recommendation document in the Global Grid Forum (GFD-R-P.020). GridFTP uses basic Grid security
on both control (command) and data channels.
Monadic model
This is a centralized data repository model: all data is saved in a central data repository. When users want to access some data, they have to submit requests directly to the central repository. No data is replicated, which preserves data locality. For a larger grid, this model is not efficient in terms of performance and reliability; data replication is permitted only when fault tolerance is demanded.
Hierarchical model
This model is suitable for building a large data grid that has only one large data-access directory. Data may be transferred from the source to a second-level center; then some data in the regional center is transferred to third-level centers. After being forwarded several times, specific data objects are accessed directly by users. A higher-level data center covers a wider area. PKI security services are easier to implement in this hierarchical data-access model.
Federation model
This model is suited to designing a data grid with multiple sources of data supply; it is also known as a mesh model. The data is shared, and the data items are owned and controlled by their original owners. Only authenticated users are authorized to request data from any data source. The mesh model costs the most when the number of participating grid institutions becomes very large.
Hybrid model
This model combines the best features of the hierarchical and mesh models. Traditional data transfer technology such as FTP is used on networks with lower bandwidth, while high-bandwidth links are exploited by high-speed data transfer tools such as GridFTP, developed with the Globus library. The cost of the hybrid model can be traded off between the two extreme models of the hierarchical and mesh-connected grids.
Parallel versus Striped Data Transfers
Parallel data transfer opens multiple data streams for passing subdivided segments of a file simultaneously. Although the speed of each stream is the same as in sequential streaming, the total time to move the data across all streams can be significantly reduced compared to an FTP transfer. In striped data transfer, a data object is instead partitioned into sections that are placed on different sites; when the object is requested, a stream is opened to each site and all sections are transferred simultaneously.
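The effect of multiple parallel streams can be sketched in Python using HTTP range requests in place of GridFTP (the URL is a placeholder; real GridFTP transfers are normally driven through Globus tools). Each stream carries one subdivided segment of the file.

    import concurrent.futures
    import urllib.request

    URL = "https://example.org/largefile.bin"  # placeholder URL
    NUM_STREAMS = 4

    def fetch_range(start, end):
        # Fetch one segment of the file on its own stream.
        req = urllib.request.Request(URL, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            return start, resp.read()

    def parallel_download(total_size):
        chunk = total_size // NUM_STREAMS
        ranges = [(i * chunk,
                   total_size - 1 if i == NUM_STREAMS - 1 else (i + 1) * chunk - 1)
                  for i in range(NUM_STREAMS)]
        buf = bytearray(total_size)
        # All segments move concurrently, so the total transfer time drops even
        # though each individual stream runs at ordinary single-stream speed.
        with concurrent.futures.ThreadPoolExecutor(NUM_STREAMS) as pool:
            for start, data in pool.map(lambda r: fetch_range(*r), ranges):
                buf[start:start + len(data)] = data
        return bytes(buf)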
Cost Model
The platform and ecosystem views of cloud computing represent a new paradigm, and promote a new
way of computing.
Virtual Machines:
VM technology allows multiple virtual machines to run on a single physical machine.
Difference between Traditional and Virtual Machines
A traditional computer runs with a host operating system specially tailored to its hardware architecture, whereas with virtualization, different user applications, managed by their own operating systems (guest OSes), can run on the same hardware.
Virtualization Layers
The virtualization software creates the abstraction of VMs by interposing a virtualization layer at
various levels of a computer system. Common virtualization layers include
1. the instruction set architecture (ISA) level,
2. hardware level,
3. operating system level,
4. library support level, and
5. application level
Figure: Virtualization ranging from hardware to applications in five abstraction levels.
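As a toy illustration of the first (ISA) level, the sketch below interprets a made-up two-instruction machine in software. Interpreting a guest instruction set in this way is the essence of ISA-level emulation; the instruction set here is invented for the example.

    # Toy ISA-level emulator: each guest "instruction" is interpreted in
    # software rather than executed directly by the host hardware.
    def run(program):
        regs = {"r0": 0, "r1": 0}
        for op, *args in program:
            if op == "LOAD":    # LOAD reg, constant
                regs[args[0]] = args[1]
            elif op == "ADD":   # ADD dest, src
                regs[args[0]] += regs[args[1]]
        return regs

    print(run([("LOAD", "r0", 5), ("LOAD", "r1", 7), ("ADD", "r0", "r1")]))
    # {'r0': 12, 'r1': 7}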
Memory Virtualization:
Virtual memory virtualization is similar to the virtual memory support provided by modern operating systems. In a traditional execution environment, the operating system maintains mappings of virtual memory to machine memory using page tables, which is a one-stage mapping from virtual memory to machine memory.
However, in a virtual execution environment, virtual memory virtualization involves sharing the physical system memory in RAM and dynamically allocating it to the physical memory of the VMs.
That means a two-stage mapping process should be maintained by the guest OS and the VMM, respectively: virtual memory to physical memory, and physical memory to machine memory.
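The two-stage mapping can be sketched with two lookup tables; the page numbers below are invented, and real page tables are of course hardware-assisted structures, not Python dictionaries.

    # Stage 1 (maintained by the guest OS):
    # guest virtual page -> guest "physical" page
    guest_page_table = {0: 2, 1: 0, 2: 1}

    # Stage 2 (maintained by the VMM):
    # guest "physical" page -> real machine page
    vmm_page_table = {0: 7, 1: 4, 2: 9}

    def translate(virtual_page):
        # Two-stage translation: virtual -> physical -> machine memory.
        physical_page = guest_page_table[virtual_page]
        machine_page = vmm_page_table[physical_page]
        return machine_page

    print(translate(0))  # virtual page 0 -> physical page 2 -> machine page 9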
Hadoop’s Architecture:
• Distributed, with some centralization.
• The main nodes of the cluster are where most of the computational power and storage of the system lies.
• Main nodes run a TaskTracker to accept and reply to MapReduce tasks, and also a DataNode to store needed blocks as close as possible to where they are used.
• A central control node runs the NameNode, to keep track of HDFS directories and files, and the JobTracker, to dispatch compute tasks to the TaskTrackers.
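The division of labor between the NameNode and the DataNodes can be pictured with a toy model (the file name, block IDs, and node names are invented; this is not Hadoop's API):

    # The NameNode tracks which DataNodes hold each block of a file;
    # the DataNodes hold the actual block contents.
    namenode = {
        "/logs/day1.txt": [("blk_1", ["datanode1", "datanode3"]),
                           ("blk_2", ["datanode2", "datanode3"])],
    }
    datanodes = {
        "datanode1": {"blk_1": b"first half..."},
        "datanode2": {"blk_2": b"second half..."},
        "datanode3": {"blk_1": b"first half...", "blk_2": b"second half..."},
    }

    def read_file(path):
        # Ask the NameNode for block locations, then read each block
        # from the first available replica.
        data = b""
        for block_id, locations in namenode[path]:
            data += datanodes[locations[0]][block_id]
        return data

    print(read_file("/logs/day1.txt"))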
MapReduce Engine: JobTracker & TaskTracker
• The JobTracker splits up the input data into smaller tasks (“Map”) and sends them to the TaskTracker process in each node.
• Each TaskTracker reports back to the JobTracker node on job progress and sends data (“Reduce”) or requests new jobs.
• None of these components is necessarily limited to using HDFS; many other distributed file systems with quite different architectures also work.
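The map/reduce flow itself can be sketched locally in a few lines of Python. This shows only the programming model, not Hadoop's actual Java API; the input documents are invented.

    from collections import defaultdict

    documents = ["the grid", "the cloud and the grid"]  # toy input splits

    # "Map": each map task turns its input split into (key, value) pairs.
    mapped = [(word, 1) for doc in documents for word in doc.split()]

    # Shuffle: group the intermediate pairs by key.
    groups = defaultdict(list)
    for word, count in mapped:
        groups[word].append(count)

    # "Reduce": combine the values for each key.
    reduced = {word: sum(counts) for word, counts in groups.items()}
    print(reduced)  # {'the': 3, 'grid': 2, 'cloud': 1, 'and': 1}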
Grid deployments around the world have established their own CAs, based on third-party software, to issue the X.509 certificates used with GSI and the Globus Toolkit.
GSI also supports delegation and single sign-on through the use of standard X.509 proxy certificates. Proxy certificates allow bearers of X.509 credentials to delegate their privileges temporarily to another entity.
For the purposes of authentication and authorization, GSI treats certificates and proxy certificates equivalently. Authentication with X.509 credentials can be accomplished either via TLS, in the case of transport-level security, or via signature as specified by WS-Security, in the case of message-level security.
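Transport-level authentication with X.509 certificates can be seen in miniature with Python's standard ssl module; the host name below is a placeholder, and GSI's proxy-certificate handling is layered on top of plain TLS like this.

    import socket
    import ssl

    # Create a TLS context that verifies the server's X.509 certificate
    # against the trusted certificate authorities (CAs) on this system.
    context = ssl.create_default_context()

    with socket.create_connection(("example.org", 443), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname="example.org") as tls:
            cert = tls.getpeercert()  # the server's verified X.509 certificate
            print("TLS version:", tls.version())
            print("Certificate subject:", cert["subject"])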
Authentication and Delegation: