
MC0085 Advanced Operating Systems (Distributed Systems)

Assignment Set 1
1. What is a message passing system? Discuss the desirable features of a message passing system. ANS:-

2. Discuss the implementation of RPC Mechanism in detail. ANS:-

In computer science, a remote procedure call (RPC) is an interprocess communication that allows a computer program to cause a subroutine or procedure to execute in another address space (commonly on another computer on a shared network) without the programmer explicitly coding the details for this remote interaction. That is, the programmer writes essentially the same code whether the subroutine is local to the executing program or remote. When the software in question uses object-oriented principles, RPC is called remote invocation or remote method invocation. Note that there are many different (often incompatible) technologies commonly used to accomplish this.

Message passing

An RPC is initiated by the client, which sends a request message to a known remote server to execute a specified procedure with supplied parameters. The remote server sends a response to the client, and the application continues its process. There are many variations and subtleties in various implementations, resulting in a variety of different (incompatible) RPC protocols. While the server is processing the call, the client is blocked (it waits until the server has finished processing before resuming execution).

An important difference between remote procedure calls and local calls is that remote calls can fail because of unpredictable network problems. Also, callers generally must deal with such failures without knowing whether the remote procedure was actually invoked. Idempotent procedures (those that have no additional effects if called more than once) are easily handled, but enough difficulties remain that code to call remote procedures is often confined to carefully written low-level subsystems.

Sequence of events during an RPC (a minimal sketch follows the list):
1. The client calls the client stub. The call is a local procedure call, with parameters pushed onto the stack in the normal way.
2. The client stub packs the parameters into a message and makes a system call to send the message. Packing the parameters is called marshalling.
3. The kernel sends the message from the client machine to the server machine.
4. The kernel passes the incoming packets to the server stub.
5. Finally, the server stub calls the server procedure. The reply traces the same steps in the reverse direction.
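The following toy sketch (Python, standard library only) walks through that sequence under simplifying assumptions: pickle stands in for a real marshalling format, a UDP socket for the kernel's message transport, and the host, port, and add procedure are hypothetical names invented for the example. Real RPC systems use generated stubs and far more robust (and secure) protocols.

    import pickle
    import socket

    HOST, PORT = "localhost", 9000   # hypothetical server address

    def add(a, b):                   # the "remote" procedure, on the server
        return a + b

    def server_stub():
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.bind((HOST, PORT))
            data, addr = s.recvfrom(4096)
            name, args = pickle.loads(data)        # unmarshal the request
            result = {"add": add}[name](*args)     # dispatch to the procedure
            s.sendto(pickle.dumps(result), addr)   # marshal and send the reply

    def client_stub(name, *args):
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.sendto(pickle.dumps((name, args)), (HOST, PORT))  # marshal + send
            s.settimeout(5)            # remote calls can fail: bound the wait
            data, _ = s.recvfrom(4096)
            return pickle.loads(data)  # unmarshal the reply

    # Run server_stub() in one process; then, in another:
    #   client_stub("add", 2, 3)  ->  5

Note how the caller of client_stub sees an ordinary local call; the timeout is the only visible trace of the failure modes discussed above.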
3. Discuss the following with respect to Distributed Shared Memory: a. Memory Coherence (Consistency) Models b. Memory Consistency Models c. Implementing Sequential Consistency d. Centralized Server Algorithm ANS:-

4. Explain the following with respect to Resource Management in Distributed Systems: a. Task Assignment Approach b. Load Balancing Approach c. Load Sharing Approach ANS:-

A) Task Assignment Approach

In this approach, each process is viewed as a collection of tasks, and these tasks are scheduled onto suitable processors to improve performance. It is not a widely used approach because:
- It requires the characteristics of all the processes to be known in advance.
- It does not take into consideration the dynamically changing state of the system.

A process is considered to be composed of multiple tasks, and the goal is to find an optimal assignment policy for the tasks of an individual process. Typical goals of the task assignment approach are (a toy illustration follows the list):
- Minimization of IPC cost (this problem can be modeled using a network flow model)
- Efficient resource utilization
- Quick turnaround time
- A high degree of parallelism
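As a toy illustration of the underlying optimization problem, the Python sketch below brute-forces the assignment of three tasks to two nodes so as to minimize execution cost plus IPC cost. All task names and cost figures are invented for the example; practical formulations model this as a network flow (min-cut) problem rather than enumerating every assignment.

    from itertools import product

    exec_cost = {"t1": [5, 10], "t2": [2, 3], "t3": [9, 4]}  # exec_cost[task][node]
    ipc_cost = {("t1", "t2"): 6, ("t2", "t3"): 4}  # paid only across nodes
    tasks = list(exec_cost)

    def total_cost(assignment):
        cost = sum(exec_cost[t][assignment[t]] for t in tasks)
        cost += sum(c for (a, b), c in ipc_cost.items()
                    if assignment[a] != assignment[b])
        return cost

    best = min((dict(zip(tasks, placement))
                for placement in product([0, 1], repeat=len(tasks))),
               key=total_cost)
    print(best, "cost =", total_cost(best))
    # -> {'t1': 0, 't2': 0, 't3': 1} cost = 15: t3 runs where it is cheap,
    #    at the price of the IPC between t2 and t3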

B) Load Balancing Approach

In this approach, the processes are distributed among nodes so as to equalize the load among all nodes. The scheduling algorithms that use this approach are known as load balancing or load leveling algorithms. These algorithms are based on the intuition that, for better resource utilization, it is desirable for the load in a distributed system to be balanced evenly. Thus, a load balancing algorithm tries to balance the total system load by transparently transferring workload from heavily loaded nodes to lightly loaded nodes, in an attempt to ensure good overall performance relative to some specific metric of system performance. We can have the following categories of load balancing algorithms (a short sketch contrasting them follows the list):

Static: Ignore the current state of the system. For example, if a node is heavily loaded, it picks a task at random and transfers it to a random node. These algorithms are simpler to implement, but their performance may not be good.

Dynamic: Use current system state information for load balancing. Although there is an overhead involved in collecting state information periodically, dynamic algorithms generally perform better than static ones.

Deterministic: Algorithms in this class use the processor and process characteristics to allocate processes to nodes.

Probabilistic: Algorithms in this class use information regarding static attributes of the system such as number of nodes, processing capability, etc.

Centralized: System state information is collected by a single node. This node makes all scheduling decisions.

Distributed: The most desirable approach. Each node is equally responsible for making scheduling decisions based on its local state and the state information received from other sites.

Cooperative: A distributed dynamic scheduling algorithm in which the distributed entities cooperate with each other to make scheduling decisions. Such algorithms are more complex and involve a larger overhead than non-cooperative ones, but their stability is better.

Non-cooperative: A distributed dynamic scheduling algorithm in which individual entities act autonomously, making scheduling decisions independently of the actions of other entities.
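The sketch promised above shows, in Python, a dynamic, centralized, threshold-based balancing pass: while any node's load is above a HIGH mark and another's is below a LOW mark, one process migrates from the heaviest node to the lightest. The thresholds and load figures are hypothetical.

    HIGH, LOW = 4, 2   # hypothetical thresholds (processes per node)

    def rebalance(loads):
        """One centralized balancing pass over a {node: load} map."""
        loads = dict(loads)
        while max(loads.values()) > HIGH and min(loads.values()) < LOW:
            heaviest = max(loads, key=loads.get)
            lightest = min(loads, key=loads.get)
            loads[heaviest] -= 1   # transfer one process downhill
            loads[lightest] += 1
        return loads

    print(rebalance({"n1": 7, "n2": 1, "n3": 3}))  # -> {'n1': 6, 'n2': 2, 'n3': 3}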

C) Load Sharing Approach

Several researchers believe that load balancing, with its implication of attempting to equalize the workload on all the nodes of the system, is not an appropriate objective. This is because the overhead involved in gathering the state information needed to achieve this objective is normally very large, especially in distributed systems having a large number of nodes. In fact, for the proper utilization of the resources of a distributed system, it is not required to balance the load on all the nodes; it is necessary and sufficient to prevent the nodes from being idle while some other nodes have more than two processes. This less ambitious goal is called Dynamic Load Sharing, as opposed to Dynamic Load Balancing.

The design of a load sharing algorithm requires that proper decisions be made regarding the load estimation policy, process transfer policy, state information exchange policy, priority assignment policy, and migration limiting policy. It is simpler to decide about most of these policies in the case of load sharing, because load sharing algorithms do not attempt to balance the average workload of all the nodes of the system; rather, they only attempt to ensure that no node is idle when another node is heavily loaded. The priority assignment policies and the migration limiting policies for load-sharing algorithms are the same as those of load-balancing algorithms.
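For contrast with the balancing sketch above, the following hypothetical fragment implements only the weaker load-sharing objective: it reacts solely to idle nodes, pulling one process from the most heavily loaded donor, and leaves any other imbalance alone. Names and numbers are again illustrative.

    def share(loads):
        """Load-sharing pass: only idle nodes trigger a transfer."""
        loads = dict(loads)
        for node in [n for n, load in loads.items() if load == 0]:
            donor = max(loads, key=loads.get)
            if loads[donor] >= 2:    # donor must have processes to spare
                loads[donor] -= 1
                loads[node] += 1
        return loads

    print(share({"n1": 7, "n2": 0, "n3": 3}))  # -> {'n1': 6, 'n2': 1, 'n3': 3}

Unlike rebalance above, share never touches n1's remaining surplus: once no node is idle, no further state collection or migration is needed.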
5. Explain the following with respect to Distributed File Systems: a. The Key Challenges of Distributed Systems b. Client's Perspective: File Services c. File Access Semantics d. Server's Perspective: Implementation e. Stateful Versus Stateless Servers ANS:-

6. Describe the Clock Synchronization Algorithms and Distributed Algorithms in the context of Synchronization. ANS:-

Clock Synchronization Algorithms

Clock synchronization algorithms may be broadly classified as centralized and distributed.

Centralized Algorithms

In centralized clock synchronization algorithms, one node has a real-time receiver. The clock time of this node, called the time server node, is regarded as correct and used as the reference time. The goal of these algorithms is to keep the clocks of all other nodes synchronized with the clock time of the time server node. Depending on the role of the time server node, centralized clock synchronization algorithms are of two types: passive time server and active time server.

1. Passive Time Server Centralized Algorithm: In this method, each node periodically sends a message ("time = ?") to the time server. When the time server receives the message, it quickly responds with a message ("time = T"), where T is the current time in the clock of the time server node. Assume that when the client node sends the "time = ?" message its clock time is T0, and when it receives the "time = T" message its clock time is T1. Since T0 and T1 are measured using the same clock, in the absence of any other information, the best estimate of the time required for the propagation of the "time = T" message from the time server node to the client's node is (T1 - T0)/2. Therefore, when the reply is received at the client's node, its clock is readjusted to T + (T1 - T0)/2.

2. Active Time Server Centralized Algorithm: In this approach, the time server periodically broadcasts its clock time ("time = T"). The other nodes receive the broadcast message and use the clock time in the message to correct their own clocks. Each node has a priori knowledge of the approximate time (Ta) required for the propagation of the "time = T" message from the time server node to its own node. Therefore, when a broadcast message is received at a node, the node's clock is readjusted to the time T + Ta. A major drawback of this method is that it is not fault tolerant: if the broadcast message reaches a node too late due to some communication fault, the clock of that node will be readjusted to an incorrect value. Another disadvantage of this approach is that it requires the network to support a broadcast facility.
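The adjustment rule of the passive time server method fits in a few lines of Python. The ask_time_server function below is a hypothetical stand-in for the actual network exchange; only the arithmetic T + (T1 - T0)/2 comes from the algorithm itself.

    import time

    def ask_time_server():
        # Stand-in for the "time = ?" round trip; returns the server's T.
        return time.time() + 0.05    # pretend the server's clock is 50 ms ahead

    T0 = time.time()                 # client clock when the request is sent
    T = ask_time_server()            # server's reply: its current clock time
    T1 = time.time()                 # client clock when the reply arrives

    adjusted = T + (T1 - T0) / 2     # estimate of the server's "now" at T1
    print(adjusted - time.time())    # offset the client clock should absorb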

Another active time server algorithm, which overcomes the drawbacks of the broadcast algorithm above, is the Berkeley algorithm, proposed by Gusella and Zatti for the internal synchronization of the clocks of a group of computers running Berkeley UNIX. In this algorithm, the time server periodically sends a message ("time = ?") to all the computers in the group. On receiving this message, each computer sends back its clock value to the time server. The time server has a priori knowledge of the approximate time required for the propagation of a message from each node to its own node. Based on this knowledge, it first readjusts the clock values in the reply messages. It then takes a fault-tolerant average of the clock values of all the computers (including its own): the time server chooses a subset of all clock values that do not differ from one another by more than a specified amount, and the average is taken only for the clock values in this subset. This approach eliminates readings from unreliable clocks, whose clock values could have a significant adverse effect if an ordinary average were taken. The calculated average is the current time to which all the clocks should be readjusted. The time server readjusts its own clock to this value; instead of sending the calculated current time back to the other computers, it sends each computer the amount by which that computer's clock requires adjustment. This can be a positive or negative value and is calculated based on the knowledge the time server has of the approximate time required for the propagation of a message from each node to its own node.
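The fault-tolerant average at the heart of the Berkeley algorithm can be sketched as follows, assuming the propagation-delay corrections have already been applied to the readings: only clock values within a given spread of one another enter the average, and each machine is sent the delta it must apply. The readings and the spread below are made-up numbers.

    def berkeley_adjustments(clocks, spread=2.0):
        """Return {node: correction} from delay-corrected clock readings."""
        values = sorted(clocks.values())
        subsets = ([v for v in values if lo <= v <= lo + spread] for lo in values)
        agreeing = max(subsets, key=len)      # largest mutually close subset
        avg = sum(agreeing) / len(agreeing)
        return {node: avg - t for node, t in clocks.items()}

    print(berkeley_adjustments({"server": 100.0, "a": 101.0, "b": 99.5, "c": 180.0}))
    # c's wildly wrong reading is excluded from the average,
    # yet c still receives a correction (roughly -79.8)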

Centralized clock synchronization algorithms suffer from two major drawbacks:

1. They are subject to single-point failure. If the time server node fails, the clock synchronization operation cannot be performed. This makes the system unreliable; ideally, a distributed system should be more reliable than its individual nodes, so that if one goes down, the rest continue to function correctly.

2. From a scalability point of view, it is generally not acceptable to have all time requests serviced by a single time server. In a large system, such a solution puts a heavy burden on that one process.

Distributed Algorithms

We know that externally synchronized clocks are also internally synchronized; that is, if each node's clock is independently synchronized with real time, all the clocks of the system remain mutually synchronized. Therefore, a simple method for clock synchronization is to equip each node of the system with a real-time receiver so that each node's clock can be independently synchronized with real time. Multiple real-time clocks (one for each node) are normally used for this purpose. Theoretically, internal synchronization of clocks is not required in this approach. However, in practice, due to the inherent inaccuracy of real-time clocks, different real-time clocks produce different times, so internal synchronization is normally performed for better accuracy. One of the following two approaches is used for internal synchronization in this case.

1. Global Averaging Distributed Algorithms: In this approach, the clock process at each node broadcasts its local clock time, in the form of a special "resync" message, when its local time equals T0 + iR for some integer i, where T0 is a fixed time in the past agreed upon by all nodes and R is a system parameter that depends on such factors as the total number of nodes in the system and the maximum allowable drift rate. That is, a resync message is broadcast from each node at the beginning of every fixed-length resynchronization interval. However, since the clocks of different nodes run at slightly different rates, these broadcasts will not happen simultaneously at all nodes. After broadcasting its clock value, the clock process of a node waits for time T, where T is a parameter to be determined by the algorithm. During this waiting period, the clock process records the time, according to its own clock, at which each resync message was received. At the end of the waiting period, the clock process estimates the skew of its clock with respect to each of the other nodes on the basis of the times at which it received their resync messages. It then computes a fault-tolerant average of the estimated skews and uses it to correct the local clock before the start of the next resynchronization interval.

Global averaging algorithms differ mainly in the manner in which the fault-tolerant average of the estimated skews is calculated. Two commonly used approaches are:

1. The simplest algorithm is to take the average of the estimated skews and use it as the correction for the local clock. However, to limit the impact of faulty clocks on the average value, the estimated skew with respect to each node is compared against a threshold, and skews greater than the threshold are set to zero before computing the average of the estimated skews.

2. In another algorithm, each node limits the impact of faulty clocks by first discarding the m highest and m lowest estimated skews and then calculating the average of the remaining skews, which is then used as the correction for the local clock. The value of m is usually decided based on the total number of clocks (nodes).
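The second variant is easy to state precisely. A small sketch, with an invented list of skews:

    def trimmed_average_skew(estimated_skews, m=1):
        """Discard the m highest and m lowest skews, average the rest."""
        s = sorted(estimated_skews)
        kept = s[m:len(s) - m] if len(s) > 2 * m else s
        return sum(kept) / len(kept)

    print(trimmed_average_skew([-0.4, -0.1, 0.0, 0.2, 9.0], m=1))
    # -> ~0.033; the faulty clock's skew of 9.0 never enters the average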
Assignment Set 2

1. In what aspects is the design of a distributed file system different from that of a centralized file system? ANS:-

2. What are the Issues in Load-Sharing Algorithms? Discuss in detail. ANS:-

Several researchers believe that load balancing, with its implication of attempting to equalize the workload on all the nodes of the system, is not an appropriate objective. This is because the overhead involved in gathering the state information needed to achieve this objective is normally very large, especially in distributed systems having a large number of nodes. In fact, for the proper utilization of the resources of a distributed system, it is not required to balance the load on all the nodes; it is necessary and sufficient to prevent the nodes from being idle while some other nodes have more than two processes. This less ambitious goal is called Dynamic Load Sharing, as opposed to Dynamic Load Balancing.

The design of a load sharing algorithm requires that proper decisions be made regarding the load estimation policy, process transfer policy, state information exchange policy, priority assignment policy, and migration limiting policy. It is simpler to decide about most of these policies in the case of load sharing, because load sharing algorithms do not attempt to balance the average workload of all the nodes of the system; rather, they only attempt to ensure that no node is idle when another node is heavily loaded. The priority assignment policies and the migration limiting policies for load-sharing algorithms are the same as those of load-balancing algorithms.

3. Explain the following with respect to Synchronization in Distributed Systems: a. Clock Synchronization b. Clock Synchronization Algorithms c. Distributed Algorithms d. Event Ordering ANS:-

4. Explain the following with respect to Naming in Distributed Systems: a. Desirable Features of a Good Naming System b. Fundamental Terminologies and Concepts c. System-Oriented Names ANS:-

5. Explain the following with respect to Security in Distributed Systems: a. Cryptography b. Authentication c. Access Control d. Digital Signatures ANS:-

6. Describe the following:

A) Process Migration B) Threads ANS:-

A) Process Migration:-

Migration of a process is a complex activity that involves proper handling of several sub-activities in order to meet the requirements of a good process migration mechanism. The four major sub-activities involved in process migration are as follows:
1. Freezing the process on its source node and restarting it on another node.
2. Transferring the process address space from its source node to its destination node.
3. Forwarding messages meant for the migrant process.
4. Handling communication between cooperating processes that have been separated as a result of process migration.

B) Threads:-

In computer science, a thread of execution is the smallest unit of processing that can be scheduled by an operating system. It generally results from a fork of a computer program into two or more concurrently running tasks. The implementation of threads and processes differs from one operating system to another, but in most cases a thread is contained inside a process. Multiple threads can exist within the same process and share resources such as memory, while different processes do not share these resources. In particular, the threads of a process share the latter's instructions (its code) and its context (the values its variables reference at any given moment). To give an analogy, multiple threads in a process are like multiple cooks reading off the same cookbook and following its instructions, not necessarily from the same page.

On a single processor, multithreading generally occurs by time-division multiplexing (as in multitasking): the processor switches between different threads. This context switching generally happens frequently enough that the user perceives the threads or tasks as running at the same time. On a multiprocessor or multi-core system, the threads or tasks will actually run at the same time, with each processor or core running a particular thread or task.

Many modern operating systems directly support both time-sliced and multiprocessor threading with a process scheduler. The kernel of an operating system allows programmers to manipulate threads via the system call interface. Some implementations are called kernel threads, whereas a lightweight process (LWP) is a specific type of kernel thread that shares the same state and information. Programs can also have user-space threads, implemented with timers, signals, or other methods to interrupt their own execution, performing a sort of ad hoc time-slicing.
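As a minimal illustration of the shared-memory model described above, the hypothetical Python fragment below runs two threads inside one process that increment the same counter; the lock guards the shared variable against lost updates when the scheduler interleaves them.

    import threading

    counter = 0                   # memory shared by all threads in the process
    lock = threading.Lock()       # protects the shared counter

    def worker(iterations):
        global counter
        for _ in range(iterations):
            with lock:            # prevent lost updates during interleaving
                counter += 1

    threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                  # wait for both threads to finish
    print(counter)                # -> 200000: both threads saw the same memory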
