Set-1
Distributed Computing Systems : Over the past two decades, advancements in microelectronic technology have made fast, inexpensive processors available, and advancements in communication technology have made cost-effective, highly efficient computer networks available. Together, these advancements favour the use of interconnected multiple processors in place of a single high-speed processor. Computer architectures consisting of interconnected multiple processors are basically of two types.

In tightly coupled systems, there is a single system-wide primary memory (address space) that is shared by all the processors (Fig. 1.1). If one processor writes, for example, the value 100 to memory location x, any other processor subsequently reading from location x will get the value 100. In these systems, therefore, any communication between the processors usually takes place through the shared memory.

In loosely coupled systems, the processors do not share memory; each processor has its own local memory (Fig. 1.2). If a processor writes the value 100 to memory location x, this write operation changes only the contents of its own local memory and does not affect the memory of any other processor. Hence, if another processor reads memory location x, it gets whatever value was previously stored in that location of its own local memory. In these systems, all physical communication between the processors is done by passing messages across the network that interconnects them.

Usually, tightly coupled systems are referred to as parallel processing systems, and loosely coupled systems are referred to as distributed computing systems, or simply distributed systems. In contrast to tightly coupled systems, the processors of a distributed computing system can be located far from each other, covering a wider geographical area.
Furthermore, in tightly coupled systems, the number of processors that can be usefully deployed is usually small, limited by the bandwidth of the shared memory. This is not the case with distributed computing systems, which are more freely expandable and can, in principle, have an almost unlimited number of processors.
Hence, a distributed computing system is basically a collection of processors interconnected by a communication network, in which each processor has its own local memory and other peripherals, and communication between any two processors of the system takes place by message passing over the communication network. For a particular processor, its own resources are local, whereas the other processors and their resources are remote. Together, a processor and its resources are usually referred to as a node, site, or machine of the distributed computing system.
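The message-passing communication just described can be illustrated with a small sketch. The following Python fragment (an illustration only, not from the text) simulates two loosely coupled "nodes" as threads on one machine: node A updates its local variable x and then informs node B by sending a message over a socket; node B learns the new value only through that message, never by reading A's memory. The node roles, message format, and use of the loopback interface are all invented for illustration.

```python
import socket
import threading

def node_b(server_sock, received):
    # "Node B" has only its own local state; it learns of changes
    # at node A solely through messages arriving on the network.
    conn, _ = server_sock.accept()
    data = conn.recv(1024).decode()
    conn.sendall(f"ack:{data}".encode())   # reply is also a message
    received.append(data)
    conn.close()

# Set up node B's listening socket on the loopback interface.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))              # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

received = []
t = threading.Thread(target=node_b, args=(server, received))
t.start()

# "Node A" updates its local variable, then tells node B by message passing.
x = 100
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
client.sendall(f"x={x}".encode())
reply = client.recv(1024).decode()
client.close()
t.join()
server.close()

print(reply)
```

Node B's acknowledgement travels back the same way, underlining that in a loosely coupled system every exchange, in both directions, is an explicit message.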
Distributed Computing System Models : Distributed computing system models can be broadly classified into five categories: the minicomputer model, the workstation model, the workstation-server model, the processor-pool model, and the hybrid model.

Minicomputer Model :- The minicomputer model (Fig. 1.3) is a simple extension of the centralized time-sharing system. A distributed computing system based on this model consists of a few minicomputers (they may be large supercomputers as well) interconnected by a communication network. Each minicomputer usually has multiple users simultaneously logged onto it; for this, several interactive terminals are connected to each minicomputer. Each user is logged onto one specific minicomputer, with remote access to other minicomputers. The network allows a user to access remote resources that are available on some machine other than the one onto which the user is currently logged. The minicomputer model may be used when resource sharing with remote users is desired, such as sharing information databases of different types, with each type of database located on a different machine. The early ARPAnet is an example of a distributed computing system based on the minicomputer model.
Workstation Model:- A distributed computing system based on the workstation model (Fig. 1.4) consists of several workstations interconnected by a communication network. An organization may have several workstations located throughout a building or campus, each equipped with its own disk and serving as a single-user computer. It has often been found that, in such an environment, at any one time a significant proportion of the workstations are idle (not being used), wasting large amounts of CPU time. The idea of the workstation model is therefore to interconnect all these workstations by a high-speed LAN, so that idle workstations may be used to process the jobs of users who are logged onto other workstations and do not have sufficient processing power at their own workstations to get their jobs processed efficiently.
Workstation Server Model:- The workstation model is a network of personal workstations, each with its own disk and a local file system. A workstation with its own local disk is usually called a diskful workstation, and a workstation without a local disk is called a diskless workstation. With the proliferation of high-speed networks, diskless workstations have become more popular in network environments than diskful workstations, making the workstation-server model more popular than the workstation model for building distributed computing systems.
In this model, a user logs onto one of the workstations, called his or her "home" workstation, and submits jobs for execution. When the system finds that the user's workstation does not have sufficient processing power to execute the processes of the submitted jobs efficiently, it transfers one or more of those processes to some other workstation that is currently idle, executes them there, and finally returns the results of execution to the user's workstation.
In this model, a user logs onto a workstation called his or her home workstation. Normal computation activities required by the user's processes are performed at the user's home workstation, but requests for services provided by special servers (such as file servers or database servers) are sent to the appropriate server, which performs the requested activity and returns the result to the user's workstation.
Processor-Pool Model :- In the processor-pool model, the processors are pooled together to be shared by the users as needed; a user submits a job to the system, and an appropriate number of processors is temporarily allocated to the job from the pool. Amoeba, proposed by Mullender et al. in 1990, is an example of a distributed computing system based on the processor-pool model.

Hybrid Model:- Of the four models described above, the workstation-server model is the most widely used for building distributed computing systems. This is because a large number of computer users only perform simple interactive tasks such as editing jobs, sending electronic mail, and executing small programs, and the workstation-server model is ideal for such simple usage. However, in environments where groups of users often run jobs needing massive computation, a hybrid model that combines the workstation-server model with a pool of processors for such large jobs may be more suitable.
2. Discuss the implementation of the RPC mechanism in detail.

Ans:2 In computer science, a remote procedure call (RPC) is an inter-process communication mechanism that allows a computer program to cause a subroutine or procedure to execute in another address space (commonly on another computer on a shared network) without the programmer explicitly coding the details of the remote interaction. That is, the programmer writes essentially the same code whether the subroutine is local to the executing program or remote. When the software in question uses object-oriented principles, RPC is called remote invocation or remote method invocation. Note that there are many different (often incompatible) technologies commonly used to accomplish this.

Message passing: An RPC is initiated by the client, which sends a request message to a known remote server to execute a specified procedure with supplied parameters. The remote server sends a response to the client, and the application continues its process. There are many variations and subtleties in various implementations, resulting in a variety of different (incompatible) RPC protocols. While the server is processing the call, the client is blocked: it waits until the server has finished processing before resuming execution.

An important difference between remote procedure calls and local calls is that remote calls can fail because of unpredictable network problems. Also, callers generally must deal with such failures without knowing whether the remote procedure was actually invoked. Idempotent procedures (those that have no additional effects if called more than once) are easily handled, but enough difficulties remain that code to call remote procedures is often confined to carefully written low-level subsystems.

Sequence of events during an RPC:
1. The client calls the client stub. The call is a local procedure call, with parameters pushed onto the stack in the normal way.
2. The client stub packs the parameters into a message and makes a system call to send the message. Packing the parameters is called marshalling.
3. The kernel sends the message from the client machine to the server machine.
4. The kernel passes the incoming packets to the server stub.
5. Finally, the server stub unpacks the parameters and calls the server procedure.
The reply traces the same steps in the reverse direction.
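The sequence of events above can be sketched with Python's standard-library XML-RPC support, which provides ready-made stubs: the ServerProxy object acts as the client stub (marshalling the arguments into a request message and sending it), and SimpleXMLRPCServer acts as the server stub (unmarshalling the message and calling the real procedure). The add procedure and the single-machine loopback setup are chosen here purely for illustration.

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# --- Server side: the real procedure, registered with the server stub ---
def add(a, b):
    return a + b

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
port = server.server_address[1]        # actual port chosen by the OS
server.register_function(add, "add")

# Serve exactly one request in a background thread, then stop.
t = threading.Thread(target=server.handle_request)
t.start()

# --- Client side: the proxy is the client stub. Calling proxy.add()
# marshals the arguments into an XML request, sends it over the network,
# blocks until the reply arrives, and unmarshals the result. ---
proxy = ServerProxy(f"http://127.0.0.1:{port}")
result = proxy.add(2, 3)               # looks like a local call, runs remotely
t.join()
server.server_close()

print(result)
```

Note how the call site `proxy.add(2, 3)` is indistinguishable from a local call; all the marshalling, message transport, and blocking described above happen inside the stubs.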
3. Explain the following: Distributed Shared Memory Systems; Memory Consistency Models.

Ans:3
Distributed Shared Memory Systems : Distributed Shared Memory (DSM), also known as a distributed global address space (DGAS), is a concept in computer science that refers to a wide class of software and hardware implementations in which each node of a cluster has access to a shared memory in addition to its own non-shared private memory.
Memory Consistency Models : A memory consistency model specifies the order in which memory operations will appear to execute, and hence what value a read can return; it affects both ease of programming and performance. An implementation of a memory consistency model is often stricter than the model would allow. For example, sequential consistency (SC) allows the possibility of a read returning a value that hasn't been written yet, but clearly no implementation will ever exhibit an execution with such a history. In general, it is often simpler to implement a slightly stricter model than its definition would require. This is especially true for hardware realizations of shared memories [AHJ91, GLL+90].

The memory consistency model of a shared-memory multiprocessor provides a formal specification of how the memory system will appear to the programmer, eliminating the gap between the behavior expected by the programmer and the actual behavior supported by a system. Effectively, the consistency model places restrictions on the values that can be returned by a read in a shared-memory program execution. Intuitively, a read should return the value of the last write to the same memory location. In uniprocessors, "last" is precisely defined by program order, i.e., the order in which memory operations appear in the program. This is not the case in multiprocessors. For example, in Figure 1, the write and read of the Data field within a record are not related by program order because they reside on two different processors. Nevertheless, an intuitive extension of the uniprocessor model can be applied to the multiprocessor case. This model is called sequential consistency. Informally, sequential consistency requires that all memory operations appear to execute one at a time, and that the operations of a single processor appear to execute in the order described by that processor's program.
Referring back to the program in Figure 1, this model ensures that the reads of the Data field within a dequeued record will return the new values written by processor P1. Sequential consistency provides a simple and intuitive programming model. However, it disallows many hardware and compiler optimizations that are possible in uniprocessors, by enforcing a strict order among shared-memory operations. For this reason, a number of more relaxed memory consistency models have been proposed, including some that are supported by commercially available architectures such as Digital Alpha, SPARC V8 and V9, and IBM PowerPC. Unfortunately, the many relaxed consistency models proposed in the literature differ from one another in subtle but important ways.
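The flag-based publication pattern discussed above can be sketched in Python. In this illustrative fragment (not from the text), thread p1 plays the role of processor P1, writing a data value and then setting a ready flag, while p2 spins on the flag and then reads the data. Under sequential consistency, once p2 sees the flag set it must also see the new data. CPython happens to give this behaviour; on hardware with a relaxed memory model, the two writes could be reordered unless explicit fences or atomics are used.

```python
import threading

data = 0
ready = False

def p1():
    # Producer (processor P1): write the payload first...
    global data, ready
    data = 42
    ready = True     # ...then publish it by setting the flag

observed = []

def p2():
    # Consumer (processor P2): spin until the flag is seen,
    # then read the payload. Under SC this must yield 42.
    while not ready:
        pass
    observed.append(data)

t1 = threading.Thread(target=p1)
t2 = threading.Thread(target=p2)
t2.start()
t1.start()
t1.join()
t2.join()

print(observed[0])
```

The whole point of relaxed models is that, without synchronization, step "data = 42" could become visible to p2 after "ready = True"; real concurrent code should use locks, events, or atomic operations rather than rely on ordering luck.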
4. Discuss the following with respect to file systems: Stateful vs. Stateless Servers; Caching.

Ans:4
Stateful vs. Stateless Servers : The file servers that implement a distributed file service can be stateless or stateful. Stateless file servers do not store any session state. This means that every client request is treated independently, and not as part of a new or existing session. Stateful servers, on the other hand, do store session state. They may, therefore, keep track of which clients have opened which files, current read and write pointers for files, which files have been locked by which clients, and so on.

The main advantage of stateless servers is that they can easily recover from failure. Because there is no state that must be restored, a failed server can simply restart after a crash and immediately provide services to clients as though nothing happened. Furthermore, if clients crash, the server is not stuck with abandoned opened or locked files. Another benefit is that the server implementation remains simple, because it does not have to implement the state accounting associated with opening, closing, and locking files.

The main advantage of stateful servers, on the other hand, is that they can provide better performance for clients. Because clients do not have to provide full file information every time they perform an operation, the size of messages to and from the server can be significantly decreased. Likewise, the server can make use of its knowledge of access patterns to perform read-ahead and other optimizations. Stateful servers can also offer clients extra services such as file locking, and can remember read and write positions.

Caching : Besides replication, caching is often used to improve the performance of a DFS. In a DFS, caching involves storing either a whole file or the results of file service operations. Caching can be performed at two locations: at the server and at the client. Server-side caching makes use of file caching provided by the host operating system.
This is transparent to the server and helps to improve the server's performance by reducing costly disk accesses. Client-side caching comes in two flavours: on-disk caching and in-memory caching. On-disk caching involves the creation of (temporary) files on the client's disk. These can either be complete files (as in the upload/download model) or they can contain partial file state, attributes, etc. In-memory caching stores the results of requests in the client machine's memory. This can be process-local (in the client process), in the kernel, or in a separate dedicated caching process.

The issue of cache consistency in DFS has obvious parallels to the consistency issue in shared memory systems, but there are other tradeoffs (for example, disk access delays come into play, the granularity of sharing is different, sizes are different, etc.). Furthermore, because write-through caches are too expensive to be useful, the consistency of caches is weakened, which makes implementing Unix semantics impossible. Approaches used in DFS caches include delayed writes, where writes are not propagated to the server immediately but in the background later on, and write-on-close, where the server receives updates only after the file is closed. Adding a delay to write-on-close has the benefit of avoiding superfluous writes if a file is deleted shortly after it has been closed.
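A delayed-write (write-back) client cache of the kind described above can be sketched as follows. This toy Python class (its names and structure are invented for illustration) buffers writes locally and propagates them to the "server", represented here by a plain dictionary, only when flush() is called, much as write-on-close propagates updates when the file is closed.

```python
class DelayedWriteCache:
    """Toy client-side cache: reads are served locally when possible;
    writes are buffered and only propagated to the server on flush()."""

    def __init__(self, server_store):
        self.server = server_store   # dict standing in for the file server
        self.cache = {}              # clean entries fetched from the server
        self.dirty = {}              # local writes not yet propagated

    def read(self, name):
        if name in self.dirty:       # our own pending write wins
            return self.dirty[name]
        if name not in self.cache:   # cache miss: fetch from the server
            self.cache[name] = self.server[name]
        return self.cache[name]

    def write(self, name, value):
        self.dirty[name] = value     # delayed write: server not contacted

    def flush(self):
        # Propagate all pending writes at once (cf. write-on-close):
        # a file written and deleted before flush() never reaches the server.
        self.server.update(self.dirty)
        self.cache.update(self.dirty)
        self.dirty.clear()

server = {"f1": "old"}
c = DelayedWriteCache(server)
c.write("f1", "new")
print(server["f1"])   # still the old value: the write has not been propagated
c.flush()
print(server["f1"])   # updated only after the flush
```

The window between write() and flush() is exactly where the weakened consistency shows up: another client reading from the server during that window sees stale data, which is the tradeoff the text describes.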