
OS notes

25/01/10 Monday

A computer system can be divided roughly into four components:
1. Hardware
2. Operating system
3. Application programs
4. Users

Hardware, comprising the CPU, memory and I/O devices, provides the basic computing resources. Application programs define the ways in which these resources are used to solve the computing problems of the users. An operating system is system software (software consisting of control routines) for operating a computer and for providing an environment for the execution of programs.

An Operating System (OS):
- manages the computer hardware and ensures correct operation of the computer system
- acts as an intermediary (interface) between the user of a computer and the computer hardware
- provides an environment in which a user can execute programs in a convenient and efficient manner
- provides certain services to programs and to the users of those programs in order to make their tasks easier
- helps the user to use the system without knowing the entire hardware, thereby isolating the hardware from the user

From the single-user viewpoint, an OS is designed mostly for ease of use, with some attention paid to performance and none paid to resource utilization (how the various hardware and software resources are shared). From the terminal-user viewpoint (a user sitting at a terminal connected to a mainframe or minicomputer, with no resources of his own), an OS is designed to maximize resource utilization. From the workstation-user viewpoint (a user sitting at a workstation that has its own dedicated resources but also shares resources such as networking and servers), an OS is designed to compromise between individual usability and resource utilization. From the handheld-device viewpoint, an OS is designed mostly for individual usability, but performance per amount of battery life is important as well. For embedded systems, the OS is designed primarily to run without user intervention.

Internally (i.e. from the system viewpoint), the OS acts as a manager of the resources of the computer system, such as the processor, memory, files and I/O devices.

Facing numerous and possibly conflicting requests for resources, the OS must decide how to allocate them to specific programs and users.

A slightly different view of an OS emphasises the need to control the various I/O devices and user programs: an OS is a control program. A control program manages the execution of user programs to prevent errors and improper use of the computer. It is especially concerned with the operation and control of I/O devices.

Evolution of Operating Systems

Serial Processing: Before the 1950s programmers had to interact directly with the hardware; there was no operating system at that time. This mode of operation is difficult for the users and time-consuming, since the next program has to wait for the completion of the previous one. The programs are submitted to the machine one after another, manually, by the operator, which is why this method is said to be serial processing.

Batch Processing: Batch processing was introduced to reduce operator intervention during the processing of jobs by a computer. Many user jobs with similar needs are put together manually by the operator to form a batch, which is submitted to the computer as a whole. The primary function of a batch processing system is to perform an automatic transition from the execution of one job to the next job in the batch without requiring operator intervention. All functions of batch processing are implemented by the batch monitor, a system program residing in the system area of main memory.

In the two systems described above the CPU was often idle, because the speeds of the operator and of the mechanical I/O devices were much slower than that of the electronic CPU.

Multiprogramming: Multiprogramming is a technique to execute a number of programs simultaneously on a single processor. The OS keeps several jobs in memory (this set of jobs is a subset of the jobs kept in the job pool, which consists of all processes residing on disk awaiting allocation of main memory) and then picks and begins to execute one program at a time. In a non-multiprogramming system the CPU can execute only one program at a time; if the running program is waiting for any I/O device, the CPU becomes idle, which hurts CPU utilization. In a multiprogramming environment, when an I/O wait happens in a process, the CPU switches from that job to another, so the CPU is never idle. A process, however, continues to be executed until it is completed. Multiprogramming is the first instance where the OS must make decisions for the users. It is considerably more sophisticated than its predecessors, with features like job scheduling, memory management and CPU scheduling.

Timeshared Processing / Multitasking: This is also a multiprogramming system, one in which CPU time is shared among processes; it is a logical extension of multiprogramming. Multiple tasks are executed by the CPU, which switches between them. The CPU scheduler selects a job from the ready queue and switches the CPU to that job; when the time slot expires the CPU switches from this job to another. Since CPU time is shared by different tasks, it is called a timeshared system. The switching of jobs is done so frequently that the user can interact with each program while it is running. Thus it provides an interactive interface to the user, who can give instructions to the OS or to a program directly, using a keyboard or a mouse, and wait for immediate results. When a program is present in secondary memory, it is a job; when it reaches main memory it becomes part of the ready queue.

Advance of Technology & System Performance

Multiprocessor Systems: Such systems have more than one processor in close communication, sharing the computer bus, the clock, and sometimes memory and peripheral devices. They are also called parallel systems or tightly coupled systems. The advantages of multiprocessor systems are:
- Increased throughput
- Increased reliability
- Greater economy

There are two types of multiprocessing systems: symmetric multiprocessing (SMP) and asymmetric multiprocessing. In symmetric multiprocessing each processor runs an identical copy of the OS, and these copies communicate with one another as needed. In asymmetric multiprocessing each processor is assigned a specific task: a master processor controls the system, and the other processors either look to the master for instructions or have predefined tasks. This scheme defines a master-slave relationship.

Distributed Systems: Distributed systems depend on networking for their functionality. By being able to communicate, distributed systems can share computational tasks and provide a rich set of features to users. In contrast to tightly coupled systems, computer networks consist of a collection of processors that do not share memory or a clock; rather, each processor has its own. The processors communicate with one another through various communication lines, such as high-speed buses or telephone lines. Such systems are usually referred to as loosely coupled systems or distributed systems. A network operating system is an OS that provides features such as file sharing across the network and includes a communication scheme that allows different processes on different computers to exchange messages.

A distributed OS is a less autonomous environment: the different OSs communicate closely enough to provide the illusion that a single OS controls the entire network.

Real-Time Systems: A real-time system is used when rigid time requirements have been placed on the operation of a processor or on the flow of data; thus, it is often used as a control device in a dedicated application. Systems that control scientific experiments, medical imaging systems, industrial control systems, certain display systems, some automobile-engine fuel-injection systems, home-appliance controllers and weapon systems are real-time systems. A real-time system has well-defined, fixed time constraints. Processing must be done within the defined constraints, or the system will fail.

Real-time systems are of two types: hard real-time systems (hard RTOS) and soft real-time systems (soft RTOS). A hard real-time system guarantees that critical tasks are completed on time; it requires all the delays in the system to be strictly bounded. Secondary storage of any sort is usually limited or missing, with data instead being stored in short-term memory or in read-only memory (ROM). Hard real-time systems conflict with the operation of time-sharing systems. E.g.: blast-furnace control, robotics, traffic control. In a soft real-time system a critical real-time task gets priority over other tasks and retains that priority until it completes. However, a real-time task cannot be kept waiting indefinitely for the kernel to run it. Soft real-time systems need advanced OS features that cannot be supported by a hard RTOS. E.g.: weather forecasting, multimedia, virtual reality.

01/02/10 Monday

An operating system can be viewed as being partitioned into well-delineated portions, or system components, each with carefully defined inputs, outputs and functions. These are:
1. Process Management
2. Main-memory Management
3. File Management
4. I/O Management
5. Secondary Storage Management
6. Networking

Process Management: A process is a program in execution. It requires certain resources, like CPU time, memory, files and I/O devices, to accomplish its task. A program by itself is not a process; a program is a passive entity, while a process is an active entity, with a program counter specifying the next instruction to execute.


Processes can be either operating system processes (which execute system code) or user processes (which execute user code). The OS is responsible for the following activities in connection with process management:
1. Creation, deletion, suspension and resumption of processes
2. Providing mechanisms for process synchronization, process communication and deadlock handling

Main Memory Management: Main memory is a repository of quickly accessible data shared by the CPU and I/O devices. For a program to be executed, it must be mapped to absolute addresses and loaded into memory. To improve both the utilization of the CPU and the speed of the computer's response to its users, we must keep several programs in memory. The OS is responsible for the following activities in connection with memory management:
1. Keeping track of which parts of memory are currently being used and by whom
2. Deciding which processes are to be loaded into memory when memory space becomes available
3. Allocating and deallocating memory space as needed

File Management: File management is one of the most visible components of an operating system. The OS provides a uniform logical view of information storage, hiding the physical properties of its storage devices such as access speed, capacity, data-transfer rate and access method (sequential or random). It defines a logical storage unit, the file. The OS maps files onto physical media and accesses these files via the storage devices. The OS is responsible for the following activities in connection with file management:
1. Creating and deleting files and directories
2. Supporting primitives for manipulating files and directories
3. Mapping files onto secondary storage
4. Backing up files on stable (non-volatile) storage media

I/O System Management: One of the purposes of an OS is to hide the peculiarities of specific hardware devices from the user. I/O system management includes:
1. Managing buffering, caching and spooling
2. Providing a general device-driver interface
3. Managing drivers for specific hardware devices

Secondary-Storage Management: Since main memory is too small and volatile, the computer system must provide secondary storage to back up main memory. Most programs are stored on a secondary storage device, such as a disk, until loaded into memory, and then use the disk as both the source and destination of their processing.

The OS is responsible for:
1. Free-space management
2. Storage allocation
3. Disk scheduling

Networking: A distributed system is a collection of processors that do not share memory, peripheral devices or a clock. Each processor has its own local memory and clock, and the processors communicate with one another through various communication lines. The communication-network design must consider message routing and connection strategies, and the problems of contention and security. It should provide sharing of resources. The OS generalizes network access as a form of file access, with the details of networking being contained in the network interface's device driver.

System Calls: System calls provide the interface between a process and the operating system. These calls are generally available as assembly-language instructions. However, certain systems allow system calls to be made directly from a higher-level language program, in which case the calls normally resemble predefined function or subroutine calls. They may generate a call to a special run-time routine that makes the system call, or the system call may be generated directly in-line. UNIX system calls may be invoked directly from a C or C++ program. On the modern Microsoft Windows platform, system calls are part of the Win32 API (Application Programming Interface), which is available for use by all the compilers written for Microsoft Windows.

When system calls are made to the OS, parameters may have to be passed. Three general methods are used to pass parameters to the OS. The simplest method is to pass the parameters in registers. When there are more parameters, they are generally stored in a block or table in memory and the address of the block is passed as a parameter in a register. Parameters can also be pushed onto the stack by the program and popped off the stack by the OS.

System calls can be grouped roughly into five major categories:
1. Process control
2. File management
3. Device management
4. Information maintenance
5. Communications
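As a small illustration of how a program reaches the OS through a system call, a C program on UNIX can invoke write() directly; the library stub hands the three parameters (descriptor, buffer address, count) to the kernel using whichever of the conventions above the platform dictates. A minimal sketch (file descriptor 1 is standard output):

    #include <string.h>
    #include <unistd.h>

    int main(void) {
        const char *msg = "hello from a system call\n";
        /* write() traps into the kernel; the OS performs the actual I/O */
        write(1, msg, strlen(msg));
        return 0;
    }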


Process Management: A process is a program in execution which requires certain resources. These resources are allocated to the process either when it is created or while it is executing. (A batch system executes jobs, whereas a time-shared system has user programs or tasks.) A process has program code, known as the text section. It also includes the current activity, as represented by the value of the program counter and the contents of the processor's registers. It further includes the process stack, which contains temporary data (such as method parameters, return addresses and local variables), and a data section, which contains global variables.

Process state: The state of a process is defined in part by the current activity of that process. Each process may be in one of the following states:
New: The process is being created
Running: Instructions are being executed
Waiting: The process is waiting for some event to occur (such as an I/O completion)
Ready: The process is waiting to be assigned to a processor
Terminated: The process has finished execution

Only one process can be running on any processor at any instant, although many processes may be ready and waiting.

Process Control Block: Each process is represented in the OS by a data structure called the process control block (PCB), also called a task control block. It contains the basic information associated with a specific process: what it is, where it is going, how much processing has been completed, where it is stored and how much it has spent in using resources. A PCB typically holds:
- Process identification
- Process status / process state
- Process status word
- Register contents
- Main-memory management information
- Process priority
- Accounting information
(Fig: Process Control Block (PCB))
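A rough sketch of such a structure in C (the field names and sizes are illustrative, not taken from any particular OS):

    /* Illustrative process control block; real PCBs hold many more fields. */
    struct pcb {
        int pid;                                              /* process identification            */
        enum { NEW, READY, RUNNING, WAITING, TERMINATED } state;
        unsigned long program_counter;                        /* saved program counter             */
        unsigned long registers[16];                          /* saved register contents           */
        int priority;                                         /* scheduling priority               */
        void *mm_info;                                        /* memory-management info (page table, limits) */
        long cpu_time_used;                                   /* accounting information            */
        struct pcb *next;                                     /* link used when the PCB sits in a queue */
    };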

Process Scheduling: Process scheduling is the mechanism that determines the optimum sequence of processes and the timing of assigning them to the different components of the system.

Scheduling queues: As processes enter the system, they are put into a job queue, which consists of all processes in the system. The processes that are residing in main memory and are ready and waiting to execute are kept on a list called the ready queue, which is generally stored as a linked list. When a process makes an I/O request to a shared device, the device may be busy with the I/O request of some other process, since there are many processes in the system; the process then has to wait for the device in that device's queue. Each device has its own device queue. The records in the queues are generally the Process Control Blocks (PCBs) of the processes.
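Reusing the PCB sketch above (only the next link matters here), the ready queue can be kept as a simple linked list with enqueue at the tail and dequeue at the head; a minimal sketch:

    struct ready_queue {
        struct pcb *head, *tail;
    };

    /* Add a PCB at the tail of the ready queue. */
    void enqueue(struct ready_queue *q, struct pcb *p) {
        p->next = NULL;
        if (q->tail)
            q->tail->next = p;
        else
            q->head = p;
        q->tail = p;
    }

    /* Remove and return the PCB at the head, or NULL if the queue is empty. */
    struct pcb *dequeue(struct ready_queue *q) {
        struct pcb *p = q->head;
        if (p) {
            q->head = p->next;
            if (q->head == NULL)
                q->tail = NULL;
        }
        return p;
    }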

As shown in the figure, a new process is initially put in the ready queue. It waits there until it is selected for execution, or dispatched. Once the process has been allocated the CPU and is executing, one of several events may occur:
- The process issues an I/O request and is then placed in an I/O queue
- The process creates a new sub-process and waits for the sub-process's termination
- The process is removed forcibly from the CPU, as a result of an interrupt, and is put back in the ready queue
When a process terminates, it is removed from all queues and has its PCB and resources deallocated.

Types of Scheduling: There are 3 different types of scheduling:
1. Long-term scheduling / job scheduling
2. Medium-term scheduling / intermediate scheduling
3. Short-term / processor scheduling

The long-term scheduler selects jobs from the job pool on a mass-storage device and loads them into memory for execution. It controls the degree of multiprogramming (i.e. the number of processes in the ready queue). The long-term scheduler should select an optimal mix of I/O-bound and CPU-bound processes (an I/O-bound process spends more of its time doing I/O; a CPU-bound process spends more of its time doing computations). The long-term scheduler is absent or minimal in today's systems.


The intermediate (medium-term) scheduler is concerned with the decision to temporarily remove a process from memory and from active contention for the CPU (swap out), thus reducing the degree of multiprogramming, or to reintroduce it (swap in) so that its execution can continue where it left off. This scheme is called swapping. The short-term scheduler handles the decision on which ready process in the ready queue is to be assigned the processor.

Context switch: Context switching is the process of switching the CPU to another process by saving the context of the old process in its PCB and loading the saved context of the new process scheduled to run. The context of a process is represented in its PCB; it includes the values of the CPU registers, the process state and memory-management information. Context-switch times are pure overhead, because the system does no useful work while switching. The speed depends on:
1. Machine architecture
2. Memory speed
3. Number of registers to be copied
4. Existence of special instructions
5. OS structure
Context-switch times are highly dependent on hardware support. Also, the more complex the OS, the more work must be done during a context switch; extra memory-management techniques may require extra data to be switched with each context.


02/02/10 Tuesday

CPU Scheduler

The objective of multiprogramming is to have some process running at all times, so as to maximize CPU utilization. The objective of time-sharing is to switch the CPU among processes so frequently that users can interact with each program while it is running. A uniprocessor system can have only one running process; if more processes exist, the rest must wait until the CPU is free and can be rescheduled.

Process execution consists of a cycle of CPU execution (CPU burst) and I/O wait (I/O burst). Processes alternate between these two states, and a process's execution begins and ends with a CPU burst. An I/O-bound program typically has many very short CPU bursts, while a CPU-bound program might have a few very long CPU bursts. Whenever the CPU becomes idle, the OS must select one of the processes in the ready queue to be executed. This selection is carried out by the short-term scheduler (or CPU scheduler).

Pre-emptive Scheduling: In pre-emptive scheduling, a scheduling decision can be made even while the execution of a process is in progress. Consequently a process in execution may be forced to release the processor so that the execution of some other process can be undertaken. This kind of scheduling is needed either when a process switches from the running state to the ready state due to an interrupt, or when a process switches from the waiting state to the ready state when its I/O operation completes.

Non Pre-emptive Scheduling: In non pre-emptive scheduling, a scheduled process always completes before another scheduling decision is made; the processes are therefore finished in the order in which they are scheduled. Under non pre-emptive scheduling, once the CPU has been allocated to a process, the process keeps the CPU until it releases it, either by terminating or by switching to the waiting state due to an I/O request.

Scheduling criteria: CPU scheduling algorithms use many criteria for scheduling the processes. The criteria include the following:

CPU utilization: The CPU should be kept as busy as possible.
Throughput: If the CPU is busy executing processes, work is being done. Throughput is a measure of this work: the number of processes completed per unit time.
Turnaround time: The interval from the time of submission of a process to the time of its completion. It is the sum of the periods spent waiting in the job queue to get into memory, waiting in the ready queue, executing on the CPU and doing I/O.
Waiting time: The sum of the periods spent waiting in the ready queue alone. (A CPU scheduling algorithm does not affect the amount of time during which a process executes or does I/O.)
Response time: The time from the submission of a request until the first response is produced, i.e. the time it takes to start responding. (In an interactive system, turnaround time may not be the best criterion, since a process can produce some output fairly early and continue computing new results while previous results are being output to the user.)

Scheduling algorithms

First-Come, First-Served Scheduling (FCFS)
The process that requests the CPU first is allocated the CPU first. FCFS is implemented with a FIFO queue: when a new process enters the ready queue, its PCB is linked to the tail of the queue by the scheduler; when the CPU becomes free, the process at the head of the queue is allocated the CPU and its PCB is removed from the queue. E.g. batch processing systems.
ADVANTAGES:
1. Simple to understand
2. Easy to implement
DISADVANTAGES:
1. Average waiting time, and hence turnaround time, is often quite long
2. Inherently non pre-emptive
3. Suffers from the convoy effect, causing lower CPU and I/O device utilization
(The convoy effect is a situation that arises when there is one big CPU-bound process and many small I/O-bound processes. When the CPU-bound process gets hold of the CPU, the I/O-bound processes, after completing their I/O, wait for the CPU in the ready queue, leaving the I/O devices idle. When the CPU-bound process goes for its longer I/O burst, all the I/O-bound processes finish their shorter CPU bursts and move back to the I/O queues, leaving the CPU idle. After its I/O burst, the CPU-bound process enters its next CPU burst and holds the CPU, and again all the I/O-bound processes end up waiting in the ready queue until the CPU-bound process is done.)
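A rough sketch in C of how FCFS waiting times fall out of this rule (the arrival and burst values are just illustrative; with these numbers it prints 8.75 ms, matching the FCFS figure computed with arrival times in the worked example further below):

    #include <stdio.h>

    /* FCFS: processes are served strictly in arrival order. */
    int main(void) {
        int arrival[] = {0, 1, 2, 3};        /* illustrative arrival times (ms) */
        int burst[]   = {8, 4, 9, 5};        /* illustrative CPU bursts (ms)    */
        int n = 4, time = 0;
        double total_wait = 0;

        for (int i = 0; i < n; i++) {
            if (time < arrival[i])
                time = arrival[i];           /* CPU idles until the process arrives */
            total_wait += time - arrival[i]; /* time spent in the ready queue       */
            time += burst[i];                /* run the process to completion       */
        }
        printf("average waiting time = %.2f ms\n", total_wait / n);
        return 0;
    }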

Shortest Job First Scheduling (SJF)
When the CPU is available, it is assigned to the process in the ready queue that has the smallest next CPU burst (not the smallest total length). If two processes have the same length for their next CPU burst, FCFS scheduling is used to break the tie. The SJF algorithm may be either pre-emptive or non pre-emptive.
DISADVANTAGE:
1. It is difficult to determine the next CPU burst length of a process; hence SJF is used mainly for long-term scheduling rather than for short-term CPU scheduling.

Priority Scheduling
A priority is associated with each process and the CPU is allocated to the process with the highest priority; SJF is a special case of the priority scheduling algorithm. Equal-priority processes are scheduled in FCFS order. Priority scheduling can be either pre-emptive or non pre-emptive. A pre-emptive priority-scheduling algorithm will pre-empt the CPU if the priority of the newly arrived process is higher than the priority of the currently running process. A non pre-emptive algorithm will simply put the new process at the head of the ready queue, so that when the current process goes into an I/O wait the new process can get hold of the CPU.
DISADVANTAGE:
1. A lower-priority process may suffer indefinite blocking, or starvation: a situation where there is a steady supply of higher-priority processes and the CPU can never be allocated to the lower-priority process. (A solution to starvation of lower-priority processes is aging, a technique of gradually increasing the priority of processes that wait in the system for a long time.)

Round Robin Scheduling (RR)
The RR scheduling algorithm is designed especially for timesharing systems. It is similar to FCFS, but pre-emption is added to switch between processes. A small unit of time called the time quantum is defined, generally from 10 to 100 milliseconds, and the ready queue is treated as a circular queue. The CPU scheduler goes around the ready queue, allocating the CPU to each process for a time interval of up to 1 time quantum. To implement RR scheduling, the ready queue is kept as a FIFO queue of processes. New processes are added to the tail of the queue. The scheduler picks the first process from the head of the queue, sets a timer to interrupt after 1 time quantum and dispatches the process (the selected process is removed from the queue, as always). If the process has a CPU burst of less than 1 time quantum, the process itself releases the CPU voluntarily, so that the next process at the head of the queue can be scheduled. Otherwise, if the CPU burst of the current process is longer than 1 time quantum, the timer goes off and causes an interrupt to the OS; a context switch is executed and the process is put at the tail of the ready queue. The process at the head of the queue is then scheduled to run.
Performance issues of Round Robin:

1. The performance of RR depends heavily on the size of the time quantum. If it is too large, RR behaves like FCFS. If it is too small, the RR approach is called processor sharing: it gives the user the feeling that each of n processes has its own processor running at 1/n the speed of the real processor.
2. Performance also depends on the context-switch time. The time quantum should be much larger than the context-switch time; otherwise the context-switch overhead increases the average turnaround time of the processes.
ADVANTAGES:
1. The user can interact with any process at a time, giving him the illusion that his process is running continuously; hence RR is appropriate for timesharing systems
2. Inherently pre-emptive
DISADVANTAGES:
1. Average waiting time is often quite long
2. For a large time quantum it becomes FCFS
3. The time quantum should be greater than the context-switch time
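A rough sketch in C of the quantum-expiry behaviour (a simplification that assumes all processes arrive at time 0 and the ready queue is serviced in a fixed cyclic order; the bursts and the quantum are illustrative):

    #include <stdio.h>

    int main(void) {
        int burst[] = {8, 4, 9, 5};            /* illustrative CPU bursts (ms) */
        int n = 4, quantum = 4;                /* illustrative time quantum    */
        int remaining[4], finish[4];
        for (int i = 0; i < n; i++)
            remaining[i] = burst[i];

        int time = 0, done = 0;
        while (done < n) {
            for (int i = 0; i < n; i++) {      /* one pass around the circular queue */
                if (remaining[i] == 0)
                    continue;
                int slice = remaining[i] < quantum ? remaining[i] : quantum;
                time += slice;                 /* run for up to one quantum          */
                remaining[i] -= slice;
                if (remaining[i] == 0) {       /* burst finished: record completion  */
                    finish[i] = time;
                    done++;
                }
            }
        }
        for (int i = 0; i < n; i++)            /* waiting time = completion - burst (arrival 0) */
            printf("P%d: completes at %d ms, waited %d ms\n",
                   i + 1, finish[i], finish[i] - burst[i]);
        return 0;
    }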

EXAMPLE: Consider the following set of processes, with the arrival time and the length of the CPU burst given in milliseconds and the priority given as a ranking (a smaller number means a higher priority):

    Process    Arrival Time    Burst time    Priority
    P1         0               8             3
    P2         1               4             4
    P3         2               9             1
    P4         3               5             2

The results for the various scheduling algorithms (given that the processes entered the ready queue in the order P1, P2, P3 and P4) are as shown in the following Gantt charts.

For FCFS:
    |   P1   |   P2   |   P3   |   P4   |
    0        8        12       21       26

Average waiting time = [0 + 8 + 12 + 21] / 4 = 10.25 ms (without considering arrival times)
Average waiting time = [0 + (8-1) + (12-2) + (21-3)] / 4 = 8.75 ms (considering arrival times)
Average turnaround time = [8 + 12 + 21 + 26] / 4 = 16.75 ms (without considering arrival times)

For Non Pre-emptive SJF:
    |   P1   |   P2   |   P4   |   P3   |
    0        8        12       17       26
Average waiting time = [0 + (8-1) + (12-3) + (17-2)] / 4 = 7.75 ms

For Pre-emptive SJF:
    |  P1  |   P2   |   P4   |   P1   |   P3   |
    0      1        5        10       17       26
Average waiting time = [0 + (1-1) + (5-3) + (10-1) + (17-2)] / 4 = 6.5 ms

For Non Pre-emptive Priority Scheduling:
    |   P1   |   P3   |   P4   |   P2   |
    0        8        17       22       26
Average waiting time = [0 + (8-2) + (17-3) + (22-1)] / 4 = 10.25 ms

For Pre-emptive Priority Scheduling:
    |  P1  |   P3   |   P4   |   P1   |   P2   |
    0      2        11       16       22       26

Average waiting time = [0 + (2-2) + (11-3) + (16-2) + (22-1)] / 4 = 10.75 ms

Multilevel Queue Scheduling
A multilevel queue scheduling algorithm partitions the ready queue into several separate queues.

The processes are classified into several separate groups and are permanently assigned to particular queues, generally based on some property of the process, such as memory size, process priority or process type. Each queue has its own scheduling algorithm. For example, a common division is made between foreground (interactive) processes and background (batch) processes. They have different response-time requirements and different priorities, and so might have different scheduling needs; hence they can be kept in separate queues and scheduled by different scheduling algorithms, say one with RR and the other with FCFS. In addition there must be scheduling among the queues, which is commonly implemented as fixed-priority pre-emptive scheduling.

Multilevel Feedback Queue Scheduling
In multilevel queue scheduling, processes stay permanently in a particular queue and do not move to another. Thus, even though it has the advantage of low scheduling overhead, it has the disadvantage of being inflexible. Multilevel feedback queue scheduling instead allows a process to move between queues. The idea is to separate processes with different CPU-burst characteristics: if a process uses too much CPU time, it is moved to a lower-priority queue, and similarly, a process that waits too long in a lower-priority queue may be moved to a higher-priority queue. This form of aging prevents starvation.

08/02/10 Monday

Co-operating processes

The concurrent processes executing in an OS may be either independent processes or co-operating processes.

A process is independent if it cannot affect or be affected by the other processes executing in the system; any process that does not share any data with any other process is independent. A process is co-operating if it can affect or be affected by the other processes executing in the system; thus, any process that shares data with other processes is a co-operating process. There are several reasons to provide an environment that allows process co-operation:
Information sharing: Since several users may be interested in the same piece of information (like a shared file), concurrent access to it has to be provided.
Computation speedup: To run a task faster, it should be broken into subtasks, each of which executes in parallel with the others.
Modularity: We may want to construct the system in a modular fashion, dividing the system functions into separate processes or threads that execute concurrently.
Convenience: Even an individual user may have many tasks on which to work at one time. For example, a user may be editing, printing and compiling in parallel.

Concurrent execution of co-operating processes requires mechanisms that allow processes to communicate with one another and to synchronize their actions. Let's have a look at a classic co-operating-processes example, the Producer-Consumer problem.

Producer-Consumer problem: A producer process produces information that is consumed by a consumer process. For example, a print program produces characters that are consumed by a printer driver; a compiler may produce assembly code that is consumed by an assembler; the assembler, in turn, may produce object modules, which are consumed by the loader. To allow the producer and consumer to run concurrently:
1. There should be a buffer of items that can be filled by the producer and emptied by the consumer
2. The producer and consumer must be synchronized so that the consumer doesn't try to consume an item that has not yet been produced

In the case of an unbounded buffer, where the buffer size has no limit, the consumer may have to wait for the producer to put new items in the buffer, but the producer can always produce new items. In the case of a bounded buffer, where the buffer size is fixed, the consumer must wait if the buffer is empty and the producer must wait if the buffer is full. Solutions for the bounded buffer problem are:
1. Providing the buffer through the use of an Inter-process Communication (IPC) facility
2. Providing a buffer that is shared by explicit coding


Shared Memory Solution: The producer and consumer processes share the following variables:

    typedef struct {
        .....
    } item;

    item buffer[10];
    int in = 0;
    int out = 0;
    int counter = 0;

The shared buffer is implemented as a circular queue with two pointers, in and out, indicating the next free position and the first full position, respectively. The variable counter holds the number of items in the buffer that are yet to be consumed: the buffer is empty when counter == 0 and full when counter == 10.

Algorithm for the Producer process:

    item nextproduced;
    repeat
        produce an item in nextproduced;
        while (counter == 10) do no_op;
        buffer[in] = nextproduced;
        in = (in + 1) % 10;
        counter++;
    until false

Algorithm for the Consumer process:

    item nextconsumed;
    repeat
        while (counter == 0) do no_op;
        nextconsumed = buffer[out];
        out = (out + 1) % 10;
        counter--;
        consume the item in nextconsumed;
    until false

When the producer process increments counter and the consumer process decrements counter at the same time, there is a chance of arriving at an incorrect value of counter. The statement counter++ may be implemented in machine language as:

    register1 = counter
    register1 = register1 + 1
    counter = register1

Similarly, the statement counter-- is implemented as:

    register2 = counter
    register2 = register2 - 1
    counter = register2

Here register1 and register2 are local CPU registers. The concurrent execution of counter++ and counter-- is equivalent to a sequential execution in which the lower-level statements presented above are interleaved in some arbitrary order, such as:

    T0: producer executes register1 = counter        {register1 = 5}
    T1: producer executes register1 = register1 + 1  {register1 = 6}
    T2: consumer executes register2 = counter        {register2 = 5}
    T3: consumer executes register2 = register2 - 1  {register2 = 4}
    T4: producer executes counter = register1        {counter = 6}
    T5: consumer executes counter = register2        {counter = 4}

We arrived at this incorrect state (the counter value should have been 5 instead of 4) because we allowed both processes to manipulate the same data concurrently. A situation like this, where several processes access and manipulate the same data concurrently and the outcome of the execution depends on the particular order in which the accesses take place, is called a race condition. It is to be avoided by ensuring that only one process manipulates the common variable counter at a time; for this, process synchronization is required.

The Critical Section Problem: A critical section is that section of a process in which it may be changing a common variable. When one process is executing in its critical section, no other process is to be allowed to execute in its own critical section; the execution of critical sections by the processes must therefore be mutually exclusive (mutex) in time. The critical-section problem is to design a protocol that the processes can use to co-operate in order to avoid errors in execution. Each process must request permission to enter its critical section. The section of code implementing this request is the entry section, the critical section is followed by an exit section, and the remaining code is the remainder section. Thus the general structure of a typical process is as shown:
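In the repeat/until pseudocode used for the algorithms below, that structure is (a sketch):

    repeat
        entry section
        critical section
        exit section
        remainder section
    until false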

Solutions to the Critical Section Problem:
A solution to the critical-section problem must satisfy the following requirements.


1. Mutual exclusion: If one process is executing in its critical section, then no other process can be executing in its critical section (CS).
2. Progress: If no process is in its CS and some processes wish to enter their critical sections, then only those processes that are not executing in their remainder sections can participate in the decision on which process will enter its CS next, and this selection cannot be postponed indefinitely.
3. Bounded waiting: There must be a bound (a limit) on the number of times that other processes are allowed to enter their critical sections after a process has made a request to enter its critical section and before that request is granted.

Two-Process Solutions for the Critical Section Problem: Let us consider different solutions when there are only two processes, P0 and P1. When Pi represents one process, let Pj represent the other, i.e. j = 1 - i.

Algorithm 1:

    repeat
        while (turn != i) do no_op;
            Critical section
        turn = j;
            Remainder section
    until false

In this algorithm the processes share a common integer variable turn, initialized to 0 (or 1). A process Pi is allowed to enter its CS only when turn becomes i (i.e. when turn == i).
Performance:
- Ensures that only one process can be in its CS at a time, providing Mutual Exclusion.
- Does not satisfy the Progress requirement. The process Pi in its exit section is strictly forced to give the next turn to the other process Pj, regardless of the fact that Pj may still be in its remainder section and may not want to enter its CS now. This may lead to a situation where Pi completes its remainder section and requests its CS before Pj does; in this case Pi is not allowed to enter its CS even though no one else is in a CS.

Algorithm 2:

    repeat
        flag[i] = true;
        while (flag[j]) do no_op;
            Critical section
        flag[i] = false;
            Remainder section
    until false

The problem with algorithm 1 is that it does not retain sufficient information about the state of each process; it remembers only which process is allowed to enter its critical section.

To solve this problem, a boolean array flag[] is introduced instead of turn. The elements of the array are initialized to false; if flag[i] is true, this value indicates that Pi is ready to enter its CS.
Performance:
- Satisfies Mutual Exclusion.
- The Progress requirement is again not met, as both processes may set their flags to true at the same time, stopping each other from entering their CS. This situation is called Deadlock.

09/02/10 Monday

Algorithm 3 (Peterson's Solution):

    repeat
        flag[i] = true;
        turn = j;
        while (flag[j] && turn == j) do no_op;
            Critical section
        flag[i] = false;
            Remainder section
    until false

Algorithm 3 is a combination of the logic behind Algorithm 1 and Algorithm 2, and it gives a correct solution to the critical-section problem in which all three requirements are met. The processes share the two common variables flag[] and turn; the elements of flag[] are initialized to false and turn is either 0 or 1.
Performance:
- Since the value of turn can be either 0 or 1, but not both, only P0 or only P1 can have its turn to enter its critical section at a time. Even if both processes are in their entry sections, because of the position at which turn is assigned a value, one process gets stuck in the while loop while the other enters its critical section. Thus it provides Mutual Exclusion.
- flag[j] indicates the current state of Pj. Hence, when Pi requests entry to its CS, it checks the flag of Pj to see whether Pj is in its remainder section; if Pj is in its RS, Pi can enter its CS. Even if both processes are in their entry sections and set their flags to true at the same time, this never leads to a Deadlock or No Progress state, since turn is set after the flags are set, so either turn == i or turn == j. Thus it provides Progress.

Since Pi cannot change the value of turn while executing the while statement, Pi will enter its CS after at most one entry by Pj into its CS. Thus it provides Bounded Waiting.
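A rough, runnable sketch of Algorithm 3 in C, using two POSIX threads and C11 sequentially consistent atomics for flag and turn (with plain variables the algorithm is not reliable on modern hardware because of compiler and CPU reordering); the shared counter and the iteration count are illustrative:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    atomic_bool flag[2];          /* flag[i] == true: Pi wants to enter its CS */
    atomic_int  turn;             /* whose turn it is to wait                  */
    long counter = 0;             /* shared data protected by the algorithm    */

    static void enter_cs(int i) {
        int j = 1 - i;
        atomic_store(&flag[i], true);
        atomic_store(&turn, j);
        while (atomic_load(&flag[j]) && atomic_load(&turn) == j)
            ;                     /* busy wait (no_op) */
    }

    static void exit_cs(int i) {
        atomic_store(&flag[i], false);
    }

    static void *worker(void *arg) {
        int i = *(int *)arg;
        for (int k = 0; k < 100000; k++) {
            enter_cs(i);
            counter++;            /* critical section */
            exit_cs(i);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t0, t1;
        int id0 = 0, id1 = 1;
        pthread_create(&t0, NULL, worker, &id0);
        pthread_create(&t1, NULL, worker, &id1);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        printf("counter = %ld\n", counter);   /* expect 200000 */
        return 0;
    }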

Multiple-Process Solutions for the Critical Section Problem

Bakery Algorithm (Lamport's Solution): The bakery algorithm is based on a scheduling scheme commonly used in bakeries, ice-cream stores, deli counters, motor-vehicle registries and other locations where order must be made out of chaos. On entering the store, each customer receives a number, and the customer with the lowest number is served next. If more than one process (customer) receives the same number, the process with the lowest name is served first, i.e. if Pi and Pj have the same number and i < j, then Pi is served first.

    repeat
        choosing[i] = true;
        number[i] = max(number[0], number[1], ....., number[n-1]) + 1;
        choosing[i] = false;
        for (j = 0 to n-1) do
        begin
            while (choosing[j]) do no_op;
            while ((number[j] != 0) && ((number[j], j) < (number[i], i))) do no_op;
        end
            Critical section
        number[i] = 0;
            Remainder section
    until false

The common data structures are a boolean array choosing[n] and an integer array number[n], initialized to false and 0 respectively. For convenience, we define the following notation:
(a, b) < (c, d) if a < c, or if a == c and b < d
max(a0, a1, ..., an-1) is a number k such that k >= ai for i = 0, ..., n-1

To prove that the bakery algorithm is correct, we need to show that, if Pi is in its critical section and Pk (k != i) has already chosen its number (number[k] != 0), then (number[i], i) < (number[k], k). Given this result, we can show that mutual exclusion is observed: if Pi is in its CS and Pk tries to enter its CS, Pk will be trapped looping in the second while statement for j == i, since number[i] != 0 and (number[i], i) < (number[k], k). To show that the progress and bounded-waiting requirements are preserved, it is sufficient to observe that the processes enter their CSs on a first-come, first-served basis.

Semaphores (Dijkstra's Solution): The solutions to the critical-section problem given above do not generalize easily to more complex problems. To overcome this difficulty, a synchronization tool called the semaphore is used.


A semaphore S is an integer variable that, apart from initialization, is accessed only through two standard atomic operations: wait and signal.

The classical definition of wait:

    wait(S) {
        while (S <= 0) do no_op;
        S--;
    }

The classical definition of signal:

    signal(S) {
        S++;
    }

Modifications to the integer value of the semaphore in the wait and signal operations must be executed indivisibly, i.e. when one process modifies the semaphore value, no other process can simultaneously modify that same semaphore value. In addition, in the case of wait(S), the testing of the integer value of S (i.e. S <= 0) and its possible modification (S--) must also be executed without interruption.

Semaphore Solution for the n-process Critical Section Problem:

    repeat
        wait(S);
            Critical section
        signal(S);
            Remainder section
    until false

The n processes share a semaphore S initialized to 1. As wait is an indivisible operation, only one process will check and see that S is positive; that process decrements S and enters its CS. A wait operation by any other process will now make that process loop in the while until the process that entered its CS exits and increments S with a signal operation.

Implementation of Semaphore:

Type declaration for a semaphore:

    type semaphore = record
        value : integer
        list  : ...    { list of blocked processes }
    end;

Library procedure for wait:

    procedure wait(sem)
        if sem.value > 0
        then sem.value = sem.value - 1;
        else Add this process to the list of blocked processes on sem.list;
             block_me();    // process is sent from memory to disk (swap area)
    end;

Library procedure for signal:

    procedure signal(sem)
        if some processes are blocked on sem.list
        then Remove a process from sem.list;
             wake_up(p_id);    // process with the given p_id is brought back from disk to memory
        else sem.value = sem.value + 1;
    end;

The indivisibility of the wait and signal operations is ensured by the programming language or the operating system that implements them; it must be ensured that race conditions cannot arise over a semaphore. Processes use wait and signal operations to synchronize their execution with respect to each other. The initial value of the semaphore (sem.value) determines how many processes can get past the wait operation. A process that does not get past a wait operation is blocked on the semaphore and sent to sleep; this feature avoids busy waiting. When the process that entered its CS exits, it calls the signal procedure, which wakes up a process from the blocked list to enter its CS.

The definition of block_me() can be modified so that it places the PCB of the process into a waiting queue associated with the semaphore (sem.list) and switches the state of the process in its PCB to the waiting state. (The CPU scheduler schedules only processes in the ready queue.) The definition of wake_up() is likewise modified so that it moves the PCB of the process from the waiting queue associated with the semaphore (sem.list) to the ready queue and switches the state of the process in its PCB to the ready state.

Deadlocks and Starvation: The implementation of a semaphore with a waiting queue may result in a situation where two or more processes wait indefinitely for an event that can be caused only by one of the waiting processes; the event that may cause deadlock here is the execution of a signal() operation. For example, consider a system with two processes P0 and P1, each accessing two semaphores S and Q, set to the value 1. Consider the situation below:
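In outline (the standard illustration; the exact interleaving shown is one possibility), the two processes issue their wait() operations on S and Q in opposite orders:

    P0:              P1:
    wait(S);         wait(Q);
    wait(Q);         wait(S);
      ...              ...
    signal(S);       signal(Q);
    signal(Q);       signal(S);

Suppose P0 executes wait(S) and then P1 executes wait(Q). When P0 next executes wait(Q), it must wait until P1 signals Q; when P1 executes wait(S), it must wait until P0 signals S.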


Since the signal() operations cannot be executed, P0 and P1 are deadlocked.

02/03/10 Tuesday

Classic Problems of Synchronization

Solution for the Bounded Buffer Problem using Semaphores

Producer Process:

    repeat
        ...
        Produce an item in NextProd;
        ...
        wait(empty);
        wait(mutex);
        ...
        Add NextProd to buffer[in];
        ...
        signal(mutex);
        signal(full);
    until false

Consumer Process:

    repeat
        wait(full);
        wait(mutex);
        ...
        Take an item from buffer[out] to NextCons;
        ...
        signal(mutex);
        signal(empty);
        ...
        Consume the item in NextCons;
        ...
    until false

The mutex semaphore, initialized to 1, provides mutual exclusion for accesses to the buffer pool, which consists of n buffers. The empty and full semaphores count the number of empty and full buffers, respectively: the semaphore empty is initialized to the value n and the semaphore full is initialized to the value 0.

Solution for the Readers-Writers Problem using Semaphores

Problem Description: A data object (such as a file or record) is to be shared among several concurrent processes. Some of these processes may want only to read the content of the shared object (readers), whereas others may want to update (i.e. to read and write) the shared object (writers). If two readers access the shared data object simultaneously, no adverse effect will result. However, if a writer and some other process (either a reader or a writer) access the shared object simultaneously, chaos will ensue. To solve this problem, writers should have exclusive access to the shared object. Several variations of the readers-writers problem exist; the simplest one is the first readers-writers problem. It requires that no reader be kept waiting unless a writer has already obtained permission to use the shared object, i.e. no reader should wait for other readers to finish simply because a writer is waiting. A solution to such a problem may, however, lead to starvation of writers. The semaphore wrt is common to both readers and writers; mutex is used by the readers only (to protect readcount).

Solution Description:

Writer Process:
    wait(wrt);
    ...
    writing is performed
    ...
    signal(wrt);

Reader Process:

    wait(mutex);
    readcount++;
    if (readcount == 1)
        wait(wrt);
    signal(mutex);
    ...
    reading is performed
    ...
    wait(mutex);
    readcount--;
    if (readcount == 0)
        signal(wrt);
    signal(mutex);

The variable readcount is an integer that keeps track of how many readers are currently reading the object. The mutex semaphore is used to ensure mutual exclusion when the variable readcount is updated. The semaphore wrt functions as a mutual-exclusion semaphore for the writers. The semaphores mutex and wrt are initialized to 1; readcount is initialized to 0.

Solution for the Dining Philosophers Problem using Semaphores

Problem description: Consider 5 philosophers who spend their lives thinking and eating. The philosophers share a circular table surrounded by 5 chairs, each belonging to one philosopher. In the centre of the table is a bowl of rice, and the table is laid with 5 single chopsticks. When a philosopher thinks, she does not interact with her colleagues. From time to time, a philosopher gets hungry and tries to pick up the two chopsticks that are closest to her (the chopsticks between her and her left and right neighbours). A philosopher may pick up only one chopstick at a time, and she cannot pick up a chopstick that is already in the hand of a neighbour. When a hungry philosopher has both her chopsticks at the same time, she eats without releasing them until she finishes.

The dining philosophers problem is an example of a large class of concurrency-control problems. It is a simple representation of the need to allocate several resources among several processes in a deadlock-free and starvation-free manner.

Structure of Philosopher i:

    repeat
        wait(chopstick[i]);
        wait(chopstick[(i+1) % 5]);
        ...
        eat
        ...
        signal(chopstick[i]);
        signal(chopstick[(i+1) % 5]);
        ...
        think
        ...
    until false;

Each chopstick is represented by a semaphore. Thus the shared data are the semaphores chopstick[5], where all the elements of chopstick are initialized to 1. However, the above solution may create a deadlock when all philosophers grab one chopstick each and wait for the other to be released.

Monitors: If the sequence of semaphore operations (wait and signal) is not observed properly, timing errors can happen, and they are difficult to find. To deal with such errors, researchers have developed high-level language constructs; the monitor type is such a fundamental high-level synchronization construct. A type, or abstract data type, encapsulates private data with public methods to operate on that data. A monitor type presents a set of programmer-defined operations that are provided mutual exclusion within the monitor. The monitor type also contains the declaration of variables whose values define the state of an instance of that type, along with the bodies of procedures or functions that operate on those variables.

Deadlock-free Solution to the Dining Philosophers Problem using a Monitor
This solution imposes the restriction that a philosopher may pick up her chopsticks only if both of them are available. To distinguish between the 3 states of a philosopher we use the following data structure:

    enum { thinking, hungry, eating } state[5];

We also need to declare

    condition self[5];

The only operations that can be invoked on a condition variable are wait() and signal(). By calling self[i].wait(), philosopher i can delay herself when she is hungry but unable to obtain the chopsticks she needs. Calling self[i].signal() resumes exactly one waiting philosopher i; if i is not waiting, the signal() operation has no effect. The distribution of the chopsticks is controlled by the monitor dp, whose definition is as follows:
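A sketch of the monitor, following the standard outline (the helper operation test() is the part that enforces the pick-up-both-or-wait rule):

    monitor dp
    {
        enum { thinking, hungry, eating } state[5];
        condition self[5];

        void pickup(int i) {
            state[i] = hungry;
            test(i);
            if (state[i] != eating)
                self[i].wait();          /* wait until a neighbour signals */
        }

        void putdown(int i) {
            state[i] = thinking;
            test((i + 4) % 5);           /* let the left neighbour eat if she can  */
            test((i + 1) % 5);           /* let the right neighbour eat if she can */
        }

        void test(int i) {
            if ((state[(i + 4) % 5] != eating) &&
                (state[i] == hungry) &&
                (state[(i + 1) % 5] != eating)) {
                state[i] = eating;       /* both chopsticks are free: take them */
                self[i].signal();
            }
        }

        initialization_code() {
            for (int i = 0; i < 5; i++)
                state[i] = thinking;
        }
    }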


Each philosopher, before starting to eat, must invoke the operation pickup(), and after she finishes eating she must invoke the operation putdown(). This solution ensures that no deadlock will occur; however, a philosopher may still starve to death.

Deadlocks

A set of processes is in a deadlock state when every process in the set is waiting for an event that can be caused only by another process in the set. A deadlock situation can arise if the following 4 conditions hold simultaneously in a system:
1. Mutual exclusion: At least one resource must be held in a non-sharable mode, i.e. only one process at a time can use the resource.
2. Hold & wait: A process must be holding at least one resource and waiting to acquire additional resources that are already held by other processes.
3. No pre-emption: Resources cannot be pre-empted; a resource can be released only voluntarily by the process holding it, after that process has completed its task.
4. Circular wait: A set of waiting processes {P0, P1, ..., Pn} must exist such that P0 is waiting for a resource held by P1, P1 is waiting for a resource held by P2, ..., and Pn is waiting for a resource held by P0.

Resource Allocation Graph: Deadlocks can be described more precisely in terms of a directed graph called a system resource-allocation graph. This graph consists of a set of vertices V and a set of edges E. The set of vertices V can be partitioned into 2 different sets of nodes:
1. The set of all active processes in the system (P = {P1, P2, ..., Pn})
2. The set of all resource types in the system (R = {R1, R2, ..., Rn})
A directed edge from process Pi to resource type Rj (Pi → Rj) signifies that Pi has requested an instance of Rj and is currently waiting for it. Such an edge is called a request edge.

A directed edge from resource type Rj to process Pi (Rj → Pi) signifies that an instance of Rj has been allocated to Pi. Such an edge is called an assignment edge. Processes are represented by circles and resource types by rectangles; as a resource type Rj may have more than one instance, each instance is represented as a dot within the rectangle.

Note that a request edge points only to the rectangle, whereas an assignment edge must also designate one of the dots in the rectangle.

Methods for Handling Deadlocks:

Deadlock Prevention: Deadlock prevention provides a set of methods for ensuring that at least one of the 4 necessary conditions cannot hold. It prevents deadlocks by constraining how requests for resources can be made.

Protocol 1 (Mutual Exclusion): The mutual exclusion condition must hold for non-sharable resources such as a printer (non-sharable in the sense that it cannot be shared simultaneously by several processes).

Protocol 2 (Hold & Wait): Ensure that the hold & wait condition never occurs in the system. Each process must request and be allocated all its resources before it begins execution; this can be implemented by requiring that the system calls requesting resources for a process precede all other system calls. An alternative protocol is to allow a process to request resources only when it has none: a process may request some resources and use them, but before it can request any additional resources it must release all the resources it is currently allocated.
Disadvantages:
1. Resource utilization may be low (resources may be allocated but unused for a long time)

2. Starvation is possible (a process may have to wait indefinitely for a resource that always gets allocated to some other process)

Protocol 3 (No Pre-emption): If a process is holding some resources and requests another resource that cannot be immediately allocated to it, then all resources it currently holds are pre-empted (i.e. they are implicitly released). The process will be restarted only when it can regain its old resources as well as the new ones it is requesting. An alternative method is to pre-empt the allocated resources only if some other process requests them.

Protocol 4 (Circular Wait): Impose a total ordering on all resource types and require that each process requests resources in an increasing order of enumeration. That is, a process can initially request any number of instances of a resource type, say Ri; after that, the process can request instances of resource type Rj if and only if Rj comes after Ri in the ordering defined on the resources. If several instances of the same resource type are needed, a single request for all of them must be issued.
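With two mutexes standing in for two resource types, Protocol 4 amounts to every thread acquiring them in the same agreed order; a minimal sketch in C (the names are illustrative):

    #include <pthread.h>

    /* R1 is ordered before R2; every thread must lock them in this order. */
    pthread_mutex_t r1 = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_t r2 = PTHREAD_MUTEX_INITIALIZER;

    void use_both_resources(void) {
        pthread_mutex_lock(&r1);     /* always R1 first ...           */
        pthread_mutex_lock(&r2);     /* ... then R2: no circular wait */
        /* ... use both resources ... */
        pthread_mutex_unlock(&r2);
        pthread_mutex_unlock(&r1);
    }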

08/03/10 Monday

Deadlock avoidance: Deadlock prevention algorithms work by restraining at least one of the necessary conditions so that a deadlock cannot occur; but this may lead to low device utilization and reduced system throughput. An alternative method for avoiding deadlocks is to require additional information about how resources are going to be requested by the different processes. With knowledge of the complete sequence of requests and releases for each process, the system can decide, for each request, whether or not the process should wait in order to avoid a possible future deadlock. Each request requires that, in making this decision, the system consider the resources currently available, the resources currently allocated to each process, and the future requests and releases of each process.

Safe State Algorithm

A state is safe if the system can allocate resources to each process (up to its maximum) in some order and still avoid a deadlock. More formally, a system is in a safe state only if there exists a safe sequence. A sequence of processes <P1, P2, ..., Pn> is a safe sequence for the current allocation state if, for each Pi, the resource requests that Pi can still make can be satisfied by the currently available resources plus the resources held by all Pj with j < i. In this situation, if the resources that Pi needs are not immediately available, then Pi can wait until all Pj have finished. When they have finished, Pi can obtain all of its needed resources, complete its designated task, return its allocated resources, and terminate. When Pi terminates, Pi+1 can obtain its needed resources, and so on. If no such sequence exists, then the system state is said to be unsafe.

A safe state is not a deadlock state; conversely, a deadlocked state is an unsafe state. However, not every unsafe state leads to a deadlock, as the following example shows:

As long as the state is safe, the OS can avoid unsafe (and deadlocked) states. For e.g., consider a system with 12 magnetic tape drives and three processes: X, Y and Z. Process X requires 10 tape drives, process Y may need as many as 4 tape drives, and process Z may need up to 9 tape drives. Suppose that, at time t0, process X is holding 5, Y is holding 2 and Z is holding 2 tape drives. Thus there are 3 free tape drives at time t0.

Process   Maximum Needs   Currently Allocated
X         10              5
Y         4               2
Z         9               2

At time t0, the system is in a safe state. The sequence <Y, X, Z> satisfies the safety condition. Process Y can request and get its remaining 2 tape drives as there are 3 free. After its execution, Y returns them and thus 5 drives become free. Now X can execute and it can request and get its remaining 5 drives, if needed. After X there will be 10 free drives. Z can now ask for its remaining 7 and get them. A system can go from a safe state to an unsafe state. Suppose that, at time t1, Z requests and is allocated 1 more tape drive. At this point, only Y can be allocated all its tape drives, and even when it returns them, no other process can get enough drives to meet its maximum need as there
will be only 4 free. The system is now in an unsafe state which may lead to deadlock; there is no safe sequence.

The safe-state algorithm is a deadlock-avoidance algorithm that ensures the system will never deadlock by ensuring that it always remains in a safe state. Initially, the system is in a safe state. Whenever a process requests a resource that is currently available, the system must decide whether the resource can be allocated immediately or whether the process must wait. The request is granted only if the allocation leaves the system in a safe state.
Disadvantage: Resource utilization may be lower. Even if a process requests a resource that is currently available, it may still have to wait in order to preserve the safe state of the system.

Banker's Algorithm
The algorithm is so named because it works like a banking system that ensures the bank never allocates its available cash in such a way that it can no longer satisfy the needs of all its customers. When a new process enters the system, it must declare the maximum number of instances of each resource type that it may need. This number may not exceed the total number of resources in the system. When a user requests a set of resources, the system must determine whether the allocation of these resources will leave the system in a safe state. If it will, the resources are allocated; otherwise, the process must wait until some other process releases enough resources.

Data Structures to be maintained:
Let there be n processes in the system and m resource types. We need the following data structures:
Available: A vector of length m indicates the number of available resources of each type. If Available[j] equals k, there are k instances of resource type Rj available.
Max: An n x m matrix defines the maximum demand of each process. If Max[i][j] equals k, then process Pi may request at most k instances of resource type Rj.
Allocation: An n x m matrix defines the number of resources of each type currently allocated to each process. If Allocation[i][j] equals k, then process Pi is currently allocated k instances of resource type Rj.
Need: An n x m matrix indicates the remaining resource need of each process. If Need[i][j] equals k, then process Pi may need k more instances of resource type Rj to complete its task. Note that Need[i][j] = Max[i][j] - Allocation[i][j].
The data structures vary over time in both size and value. We can treat each row in the matrices Allocation and Need as vectors and refer to them as Allocation_i and Need_i. The vector Allocation_i specifies the resources currently allocated to process Pi; the vector Need_i specifies the additional resources that process Pi may still request to complete its task. A small sketch of a safety check built on these structures is given below; the formal algorithms follow.
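The following Python sketch shows how such a safety check might look, reduced to a single resource type and using the 12-tape-drive example above. The names and layout are illustrative only, not a full Banker's implementation.

    # Safety check for one resource type, using the tape-drive example.
    def is_safe(available, max_need, allocation):
        need = {p: max_need[p] - allocation[p] for p in max_need}
        finish = {p: False for p in max_need}
        work, order = available, []
        while True:
            candidate = next((p for p in max_need
                              if not finish[p] and need[p] <= work), None)
            if candidate is None:
                break
            work += allocation[candidate]   # Pi finishes and returns its drives
            finish[candidate] = True
            order.append(candidate)
        return all(finish.values()), order

    # State at time t0: 3 free drives, X/Y/Z holding 5/2/2 of maximum 10/4/9.
    print(is_safe(3, {"X": 10, "Y": 4, "Z": 9}, {"X": 5, "Y": 2, "Z": 2}))
    # -> (True, ['Y', 'X', 'Z'])   a safe sequence exists
    # State at time t1: Z has been given one more drive, so only 2 are free.
    print(is_safe(2, {"X": 10, "Y": 4, "Z": 9}, {"X": 5, "Y": 2, "Z": 3}))
    # -> (False, ['Y'])   no complete sequence: the state is unsafe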
Safety Algorithm:
1. Let Work and Finish be vectors of length m and n respectively. Initialize Work = Available and Finish[i] = false for i = 0, 1, ..., n-1.
2. Find an i such that both Finish[i] == false and Need_i <= Work. If no such i exists, go to step 4.
3. Work = Work + Allocation_i; Finish[i] = true; go to step 2.
4. If Finish[i] == true for all i, then the system is in a safe state.
This algorithm may require an order of m x n^2 operations to determine whether a state is safe.

Resource-Request Algorithm:
This algorithm determines whether requests can be safely granted. Let Request_i be the request vector for process Pi. If Request_i[j] == k, then process Pi wants k instances of resource type Rj. When a request for resources is made by process Pi, the following actions are taken:
1. If Request_i <= Need_i, go to step 2. Otherwise, raise an error condition, since the process has exceeded its maximum claim.
2. If Request_i <= Available, go to step 3. Otherwise, Pi must wait, since the resources are not available.
3. Have the system pretend to have allocated the requested resources to process Pi by modifying the state as follows:
   Available = Available - Request_i;
   Allocation_i = Allocation_i + Request_i;
   Need_i = Need_i - Request_i;
If the resulting resource-allocation state is safe, the transaction is completed, and process Pi is allocated its resources. However, if the new state is unsafe, then Pi must wait for Request_i, and the old resource-allocation state is restored.

Memory Management
The main purpose of a computer system is to execute programs. These programs, together with the data they access, must be in main memory (at least partially) during execution. As a result of CPU scheduling, we can improve both the utilization of the CPU and the speed of the computer's response to its users. To realize this increase in performance, however, we must keep several processes in memory; that is, we must share memory. Thus there is a need for memory management. Memory consists of a large array of words or bytes, each with its own address. The CPU fetches instructions from memory according to the value of the program counter. These instructions may cause additional loading from and storing to specific memory addresses. The memory unit sees only a stream of memory addresses; it does not know how they are generated or what they are for.

Memory Allocation Methods
1. Single Partition Systems
In a computer which is intended to run only one process at a time, memory management is simple: the process to be executed is loaded into the free space of main memory, so in general only a part of the memory space is used. Such an arrangement is clearly limited in capability and is used nowadays primarily in simple systems such as game computers. The early MS-DOS operating system operated in this way.

2. Fixed Multiple Partition Allocation
This is the first attempt to allow memory allocation for multiprogramming. The scheme is to divide memory into a number of fixed partitions, each of which will contain exactly one process. Thus, the degree of multiprogramming is bound by the number of partitions. It is also called static allocation because the size of each partition is fixed when the system is powered on. The partitions can only be reconfigured when the computer system is shut down, reconfigured and restarted. Thus, once the system is in operation, the partition sizes remain static.
Drawbacks: After process allocation each partition may contain an unused space within it, called internal fragmentation. The word internal refers to the wastage within the space allocated to a process. The total unused space cannot be used to introduce a new process for allocation.

3. Dynamic Allocation
With dynamic partitioning, available memory space is still kept in contiguous blocks, but processes are given only as much memory as they request when they are loaded for processing. At any time we have a set of holes of various sizes scattered throughout memory. When a process arrives it is put into an input queue. The system then searches for a hole that is large enough for this process and, if one is found, allocates it to the process so that it can then compete for the CPU. Memory is allocated to processes until the memory requirements of the next process cannot be satisfied; the OS then waits until a large enough block is available. When a process terminates, it releases its block of memory, which is then placed back in the set of holes. If the new hole is adjacent to other holes, these adjacent holes are merged together to form one large hole. At this point, the system has to check whether there are processes waiting for memory and whether this newly freed and recombined memory could satisfy the demands of any of these waiting processes. This procedure is a particular instance of the general dynamic storage-allocation problem, which concerns how to satisfy a request of size n from a list of free holes. The strategies commonly used to select a free hole from the set of available holes are (a sketch follows below):
First-fit: Allocate the first hole that is big enough.
Best-fit: Allocate the smallest hole that is big enough.
Worst-fit: Allocate the largest hole.
Although this is a significant improvement over fixed partitioning, as memory is not wasted internally, it doesn't entirely eliminate the problem.
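The three placement strategies can be sketched in Python as follows, assuming the free list is a set of (start address, size) holes kept in address order; the numbers are made up.

    # Hole selection for dynamic allocation; holes are (start, size) pairs.
    def pick_hole(holes, request, strategy):
        fitting = [h for h in holes if h[1] >= request]
        if not fitting:
            return None                               # no hole large enough; process waits
        if strategy == "first":
            return min(fitting, key=lambda h: h[0])   # first (lowest-address) hole that fits
        if strategy == "best":
            return min(fitting, key=lambda h: h[1])   # smallest hole that fits
        if strategy == "worst":
            return max(fitting, key=lambda h: h[1])   # largest hole
        raise ValueError(strategy)

    holes = [(0, 100), (300, 500), (900, 200)]
    print(pick_hole(holes, 150, "first"))   # (300, 500)
    print(pick_hole(holes, 150, "best"))    # (900, 200)
    print(pick_hole(holes, 150, "worst"))   # (300, 500)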
Drawbacks: As processes are loaded and removed from memory, the free memory space is broken into little pieces or fragments. External fragmentation exists when enough total memory space exists to satisfy a request but it is not contiguous, so it cannot be allocated. (Storage is fragmented into a large number of small holes.)
Solution: The fragmentation problem can be tackled by physically moving resident processes about in memory in order to fill up holes and bring the free space together into one large hole. This process is referred to as Compaction. Compaction is possible only if relocation is dynamic and is done at execution time. Compaction can be expensive. Other solutions are Paging, Segmentation or both.
The compaction decision can be made:
1. As soon as any process terminates
2. When a new process cannot be loaded due to fragmentation
3. At fixed intervals
4. When the users decide to

09/03/10 Tuesday

Paging
Paging is a memory-management scheme that permits the physical address space of a process to be non-contiguous. The basic method for implementing paging involves breaking physical memory into fixed-sized blocks called frames and dividing the process (logical memory) into pages of fixed length, in such a way that page size = frame size. In a paged system each process is divided into a number of fixed-sized blocks called pages, typically 4 KB in length. The memory space is also viewed as a set of page frames of the same size. The loading process now involves transferring each process page to a memory page frame.

Implementation of Paging
An address generated by the CPU is considered to have the form (p, d), where p is the number of the page containing the location and d is the displacement or offset of the location from the start of the page.

The page number is used as an index into the page table, which contains the base address of each page in physical memory. This base address is combined with the page offset to define the physical memory address that is sent to the memory unit. The page size (and hence frame size) is selected as a power of 2, since such a page size makes the translation of a logical address into a page number and page offset particularly easy. If the size of the logical address space is 2^m and the page size is 2^n addressing units (bytes or words), then the high-order m - n bits of a logical address designate the page number and the n low-order bits designate the page offset. A small sketch of this split is given below:
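    # Logical-to-physical translation with 4 KB pages (n = 12).
    # The page table below is a made-up mapping of page numbers to frame numbers.
    PAGE_SIZE = 4096                 # 2**12 bytes, so the offset is the low 12 bits
    page_table = {0: 5, 1: 9, 2: 1}  # illustrative page -> frame mapping

    def translate(logical_address):
        page   = logical_address // PAGE_SIZE    # high-order m - n bits
        offset = logical_address %  PAGE_SIZE    # low-order n bits
        frame  = page_table[page]
        return frame * PAGE_SIZE + offset        # physical address

    print(hex(translate(0x1234)))    # page 1, offset 0x234 -> frame 9 -> 0x9234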

For paging there is no trouble of external fragmentation as any free frame can be allocated to a process that needs it. However, it may have some internal fragmentation. If memory requirements of a process do not happen to coincide with page boundaries, the last frame allocated may not be completely full. When a process arrives in the system to be executed, its size, expressed in pages, is examined. Each page of the process needs one frame. Thus, if the process has n pages, at least n frames must be available in memory so that they can be allocated to the arriving process. When a page is loaded into one of the allocated frames, its frame number is put into the page table so that each logical address can be translated into its corresponding physical
address. This mapping is hidden from the user program and is controlled by the operating system. An important aspect of paging is the clear separation between the user's view of memory and the actual physical memory. The user program views memory as one single space, containing only this one program in contiguous locations. In fact, the user program is scattered throughout physical memory, which also holds other programs.

Segmentation
Segmentation presents an alternative method of dividing a process into variable-length blocks called segments. Segmentation is similar in some ways to paging, except that a process can be loaded as several variable-sized segments which are independently positioned in memory. It can provide more efficient utilization of free space. Segments can be of any length up to a maximum value determined by the design of the system.

A segment address reference requires the following:
1. Extract the segment number and displacement (offset) from the logical address.
2. Use the segment number to index the segment table and obtain the segment base address and length.
3. Check that the offset is not greater than the segment length; if it is, an invalid-address flag is signalled.
4. Generate the required physical address by adding the offset to the base address.

Virtual Memory
Virtual memory is a technique that allows execution of a process even when it is not completely in main memory. In other words, it allows execution of a process even when the logical address space is larger than the physical address space. The Virtual Address Space of a process refers to the logical (or virtual) view of how a process is stored in memory. Typically, this view is that a process begins at a certain logical address, say address 0, and exists in contiguous memory.
Advantages:
1. Efficient main memory utilization. Programs can be loaded partially in main memory.
2. More programs can be run at the same time, so efficient CPU utilization and better throughput are possible.
3. Virtual memory makes the task of programming much easier because the programmer no longer needs to worry about the amount of physical memory available.
4. Fewer I/O operations would be needed to load or swap each user program into memory, so each user program would run faster.
5. It allows processes to share files easily and to implement shared memory.
6. It provides an efficient mechanism for process creation.

Demand Paging:
Demand paging is the combination of paging and swapping. The criterion of this scheme is that a page is not loaded into main memory until it is needed (we may not need the entire program in memory). So a page is loaded into main memory on demand; hence this scheme is called demand paging and is commonly used in virtual memory systems.

Page Fault:
When the processor needs a page that is not available in main memory, the situation is called a page fault. If there are no free frames when a page fault happens, page replacement will be needed.

Page Replacement Algorithms:
1. FIFO Algorithm
A FIFO replacement algorithm associates with each page the time when that page was brought into memory. When a page is to be replaced, the oldest page is chosen.
BELADY's Anomaly: It reflects the fact that for some page replacement algorithms the page fault rate may increase as the number of allocated frames increases. This most unexpected result is known as Belady's anomaly. The FIFO algorithm suffers from Belady's anomaly.
2. Optimal Algorithm
The discovery of Belady's anomaly led to the search for an Optimal Page Replacement Algorithm. In this algorithm the page that will not be used for the longest period of time is chosen to be replaced. It has the lowest page-fault rate of all algorithms and will never suffer from Belady's anomaly.
3. LRU Algorithm
The Optimal Page Replacement Algorithm may not be feasible as it uses the time when a page is to be used in the future. In LRU, the page that has not been used for the longest period of time is chosen to be replaced.
E.g.: Consider the following process reference string: 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1. Assuming there are 3 frames, find the number of page faults for each algorithm.
1. FIFO Algorithm
For the FIFO algorithm we can see that there will be a total of 15 page faults as shown below:

2. Optimal Algorithm There will be a total of 9 page faults.

3. LRU Algorithm
There will be a total of 12 page faults. A short simulation reproducing these counts is given below.
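The FIFO and LRU counts above can be reproduced with a short Python simulation (the optimal algorithm is omitted here because it needs knowledge of future references):

    # Count page faults for FIFO and LRU on the reference string above, 3 frames.
    from collections import OrderedDict

    refs = [7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1]

    def fifo_faults(refs, nframes=3):
        frames, faults = [], 0
        for p in refs:
            if p not in frames:
                faults += 1
                if len(frames) == nframes:
                    frames.pop(0)                    # evict the oldest page
                frames.append(p)
        return faults

    def lru_faults(refs, nframes=3):
        frames, faults = OrderedDict(), 0
        for p in refs:
            if p in frames:
                frames.move_to_end(p)                # mark as most recently used
            else:
                faults += 1
                if len(frames) == nframes:
                    frames.popitem(last=False)       # evict the least recently used page
                frames[p] = True
        return faults

    print(fifo_faults(refs), lru_faults(refs))       # 15 and 12 page faults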

15/03/10 Monday

Thrashing:
If we increase the number of processes submitted to the CPU for execution, CPU utilization will also increase. But if the number of processes keeps increasing, at a certain point CPU utilization falls sharply and may even approach zero. This situation is called Thrashing. When many processes are running, it may happen that all frames are filled and all pages are in active use. If a process does not have the number of frames it needs to support the pages in active use, it will quickly page-fault. A global page-replacement algorithm replaces pages without regard to the process to which they belong. Thus on a page fault it takes frames away from other processes which may also need those pages, and so they also fault, taking frames from yet other processes. As the replaced pages will be needed again right away, each process quickly faults again, and again, replacing pages that it must bring back immediately. This high paging activity is called thrashing: more time is spent on paging than on execution.

Algorithms to solve thrashing: Locality model: The set of pages that are actively used together by a process at a time is called locality of that process. The locality model states that, as a process executes, it moves from locality to locality. A program is generally composed of several different localities which may overlap.

42

OS notes

The working set model is based on an assumption of locality. This model uses a parameter to define the working-set window. The idea is to examine the most recent page references. The set of pages in the most recent page references is the working set. If a page is in active use, it will be in the working set. If it is no longer being used, it will drop from the working set time units after its last reference. Thus, the working set is an approximation of the programs locality. Operating system monitors working set of each process and allocates to that working set enough frames to provide it with its working-set size. If there are enough extra frames, another process can be initiated. If the sum of the working set sizes increases, exceeding the total number of available frames, OS selects a process to suspend. The processs pages are written out (swapped out), and its frames are reallocated to other processes. The suspended processes can be restarted later. Page fault frequency (PFF) algorithm: Thrashing has a high page fault rate. Thus the page fault rate is required to be controlled. When its too high, the process needs more frames. Similarly, if the page fault rate is too low, the process may have too many frames and we can reduce it. OS can establish upper and lower bounds on desired page fault rate. If actual page fault rate exceeds upper limit, the OS can allocate another frame to that process and if page fault rate falls below lower limit OS can remove a frame of that process. Thus the OS can directly measure and control the page fault rate to prevent thrashing. If the page-fault rate increases and no free frames are available, OS must select some process and suspend it. The freed frames are then distributed to processes with high page-fault rates.

Storage Management Disk Management: Magnetic disks provide the bulk of secondary storage for modern computer systems. A disk is made up of several Platters with a diameter range of 1.8 - 5.25 inches. The two surfaces of a platter are covered with a magnetic material. Information is stored by recording it magnetically on the platters. The platters have a common axis called spindle.

A read-write head flies just above each surface of every platter. The heads are attached to a disk arm that moves all the heads as a unit. The surface of a platter is logically divided into circular tracks, which are subdivided into sectors. The set of tracks that are at one arm position makes up a cylinder. There may be thousands of concentric cylinders in a disk drive, and each track may contain hundreds of sectors. When the disk is in use, a drive motor spins it at high speed. Disk speed has 2 parts: Positioning time & Transfer rate.
- Transfer rate is the rate at which data flow between the drive and the computer.
- Positioning time (access time) consists of the time to move the disk arm to the desired cylinder, called Seek time, and the time for the desired sector to rotate under the disk head, called Rotational latency.

Disk Scheduling:
Whenever a process needs I/O to or from the disk, it issues a system call to the OS requesting it. If the desired disk drive or controller is busy, the request is placed in the queue of pending requests for that drive. When one request is completed, the OS chooses which pending request to service next. For this, several disk-scheduling algorithms can be used.
1. FCFS Scheduling:
The request that came first is scheduled first in the First-Come, First-Served (FCFS) algorithm. The algorithm is intrinsically fair, but it generally does not provide the fastest service. Consider the example below, with a request queue of cylinders 98, 183, 37, 122, 14, 124, 65, 67 and the disk head initially at cylinder 53:



Total head movement = (98-53) + (183-98) + (183-37) + (122-37) + (122-14) + (124-14) + (124-65) + (67-65) = 640 cylinders.
It is obvious that this algorithm may lead to increased head movement (like the wild swing from 122 down to 14 and then back up to 124), thus reducing performance.
2. SSTF Scheduling:
In the Shortest-Seek-Time-First (SSTF) algorithm, the request closest to the current head position is serviced first, before the head is moved far away to service other requests. Consider the example:

Total head movement = (65-53) + (67-65) + (67-37) + (37-14) + (98-14) + (122-98) + (124-122) + (183-124) = 236 cylinders.
Although this algorithm is an improvement over FCFS, it is not optimal. Better performance would have resulted if the head had gone from 53 to 37, even though the latter is not the closest, and then to 14, before turning around to service 65, 67, 98, 122, 124 and 183 (this would have reduced the total head movement to 208 cylinders). This algorithm may also cause starvation of some requests.
3. SCAN Scheduling:
The disk arm starts at one end of the disk and moves toward the other end, servicing requests as it reaches each cylinder, until it gets to the other end of the disk. At the other end, the direction of head movement is reversed, and servicing continues. The head continuously scans back and forth across the disk. This algorithm is also called the Elevator Algorithm, since the disk arm behaves just like an elevator.

If a request arrives in the queue just in front of the head, it will be serviced almost immediately; a request arriving just behind the head will have to wait until the arm moves to the end of the disk, reverses direction, and comes back. 4. C-SCAN Scheduling: Circular SCAN scheduling is a variant of SCAN designed to provide a more uniform wait time. Like SCAN, C-SCAN moves the head from one end of the disk to the other, however, it immediately returns to the beginning of the disk, without servicing any requests on the return trip. The C-SCAN scheduling algorithm essentially treats the cylinders as a circular list that wraps around from the final cylinder to the first one.

5. LOOK and C-LOOK Scheduling:
The arm goes only as far as the final request in each direction and then reverses direction immediately, without going all the way to the end of the disk. LOOK and C-LOOK are versions of SCAN and C-SCAN that look for a pending request before continuing to move in a given direction. A small sketch computing head movement for the example queue is given below.
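The head-movement totals quoted above for FCFS and SSTF can be reproduced with a short Python sketch using the same request queue:

    # Total head movement for FCFS and SSTF, head initially at cylinder 53.
    queue, start = [98, 183, 37, 122, 14, 124, 65, 67], 53

    def fcfs(requests, head):
        total = 0
        for r in requests:              # service in arrival order
            total += abs(r - head)
            head = r
        return total

    def sstf(requests, head):
        pending, total = list(requests), 0
        while pending:                  # always service the closest pending request
            nearest = min(pending, key=lambda r: abs(r - head))
            total += abs(nearest - head)
            head = nearest
            pending.remove(nearest)
        return total

    print(fcfs(queue, start))   # 640 cylinders
    print(sstf(queue, start))   # 236 cylinders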

16/03/10 Tuesday

Fields related to fragmentation
Identification: 16-bit field identifies a datagram originating from the source host. The combination of the identification and source IP address must uniquely define a datagram as it leaves the source host. It is implemented with the help of a counter which is initialized to a positive number. When the IP protocol sends a datagram, it copies the current value of the counter to the identification field and increments the counter by one. Fragments of a datagram are identified with this identification number, which will be copied to all fragments when the datagram is fragmented.
Flags: 3-bit field whose first bit is reserved, second bit is the "do not fragment" bit and third bit is the "more fragments" bit. If the second bit is 1, the machine should not fragment the datagram. If it cannot pass the datagram through any physical network, it discards the datagram and sends an ICMP error message to the source host. If the third bit is 1, it means that this is not the last fragment; there are more fragments after this one.
Fragmentation offset: 13-bit field shows the relative position of this fragment with respect to the whole datagram. It is the offset of the data in the original datagram measured in units of 8 bytes. Consider a fragmentation as shown below:

The original datagram, with a data size of 4000 bytes, is first fragmented into 3 fragments: Fragment 1, Fragment 2 and Fragment 3. Fragment 1 carries bytes 0 to 1399 and hence its offset is 0/8 = 0. Fragment 2 carries bytes 1400 to 2799 and
hence its offset is 1400/8 = 175. Fragment 3 carries bytes 2800 to 3999 and hence its offset is 2800/8 = 350. Fragment 2 is again fragmented into fragments 2.1 and 2.2. As Fragment 2.1 carries bytes 1400 to 2199, its offset is 1400/8 = 175. As Fragment 2.2 carries bytes 2200 to 2799, its offset is 2200/8 = 275. Note that the identification field remains the same for all fragments. Also note that the flags field has the more-fragments bit set for all fragments except the last. Thus the 4 fragments (Fragment 1, Fragment 2.1, Fragment 2.2 and Fragment 3) of the same datagram will reach the final destination. (A short sketch of this offset arithmetic appears after the option format below.)

Reassembly Logic:
Even if each fragment follows a different path and arrives out of order, the final destination host should be able to reassemble the original datagram from the fragments received. If none of them is lost, the following strategy can be used to reassemble the fragments:
- The first fragment has an offset field value of zero.
- Divide the length of the first fragment by 8. The second fragment has an offset value equal to that result.
- Divide the total length of the first and second fragments by 8. The third fragment has an offset value equal to that result.
- Continue the process. The last fragment has a more-fragments bit value of 0.

Options
The header of the IP datagram is made of two parts: a fixed part and a variable part. The variable part comprises the options, which can be a maximum of 40 bytes. Options, as the name implies, are not required for a datagram. They can be used for network testing and debugging.
Format:
Code: 8-bit field made up of three subfields:
- Copy: 1-bit subfield controls the presence of the option in fragmentation. If its value is 0, the option must be copied only to the first fragment. If its value is 1, the option must be copied to all fragments.
- Class: 2-bit subfield defines the general purpose of the option. 00 means that the option is used for datagram control, 10 means that the option is used for debugging and management. The other two possible values have not yet been defined.
- Number: 5-bit subfield defines the type of option. Although 5 bits can define up to 32 different types, currently only 6 types are in use.
Length: 8-bit field defines the total length of the option, including the code field and the length field itself.
Data: Variable-length field which contains the data that specific options require.
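The offset arithmetic of the fragmentation example above can be sketched in a few lines of Python; the 1400- and 800-byte fragment sizes simply mirror the example and are not fixed by IP.

    # Offsets (in 8-byte units) and more-fragments flags for a byte range.
    def fragment(first_byte, last_byte, max_data=1400, more_after=False):
        frags, start = [], first_byte
        while start <= last_byte:
            end = min(start + max_data - 1, last_byte)
            more = end < last_byte or more_after   # last sub-fragment keeps the parent's M bit
            frags.append((start // 8, more))
            start = end + 1
        return frags

    print(fragment(0, 3999))                           # [(0, True), (175, True), (350, False)]
    # Refragmenting Fragment 2 (bytes 1400-2799), whose own M bit was set:
    print(fragment(1400, 2799, 800, more_after=True))  # [(175, True), (275, True)]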

Option Types As mentioned previously, only 6 option types are currently being used. Two of them are 1byte options whereas the remaining four are multiple-byte options. They are as follows:

No Operation: 1-byte option used as filler between options as shown below:

End of Option: 1-byte option used for padding at the end of the option field. After this option, the receiver looks for payload data.

Record Route: Used to record the internet routers that handle the datagram. It can list up to 9 router IP addresses since the maximum size of the option part in header is 40 bytes. The addresses are filled by the visited routers as shown:

Strict Source Route: Used by the source to predetermine a route for the datagram as it travels through the Internet. All of the routers defined in the option must be visited by the datagram, and no others may be visited.
Loose Source Route: Similar to the strict source route option, but more relaxed. Each router in the list must be visited, but the datagram can visit other routers as well.
Timestamp: Used to record the time of datagram processing by a router. Time is expressed in milliseconds from midnight, Universal Time.

19/03/10 Friday

ICMP (Internet Control Message Protocol)
The IP protocol does not have an error-control mechanism or an assistance mechanism, i.e. there is no error-reporting or error-correcting mechanism. For e.g. IP cannot help a router report an error to the sender when it discards a packet. Errors may be as follows:
- A router may discard a datagram when it cannot find a route to the final destination.
- A router may discard a datagram if its TTL field is 0.
- The final destination may discard all fragments of a datagram if it has not received all fragments within a predetermined time.
The IP protocol also lacks a mechanism for host and management queries. For e.g. a host may need to know whether a router or another host is alive. ICMP is designed to compensate for these two deficiencies of IP by working as a companion to the IP protocol in the network layer. The ICMP messages are encapsulated in IP datagrams and the value of the protocol field is set to 1 to indicate that it is an ICMP message.

Transport Layer Protocols
A transport layer protocol has several responsibilities. One is to provide process-to-process communication. Another is to provide control mechanisms at the transport level. It should also provide a connection mechanism for processes. The transport layer at the sending station should make a connection with the receiver before it starts sending (connection oriented). The streams of data sent by processes to the transport layer must be chopped into transportable units, numbered and sent one by one. It is the responsibility of the transport layer at the receiving end to wait until all the different units belonging to the same process have arrived, check them and pass those that are error free, and deliver them to the receiving process as a stream. After the entire stream has been sent, the transport layer closes the connection.

User Datagram Protocol (UDP)
UDP is a transport layer protocol which serves as an intermediary between the application programs and the network operations. Its functions are as follows:
1. It creates process-to-process communication with port numbers.
2. It provides control mechanisms at a very minimal level.
3. It provides no flow control.
4. It gives no acknowledgement for received packets (unreliable communication).
5. It provides error control to some extent. If UDP detects an error in the received packet, it silently drops it.
6. It can only receive a data unit from the processes (instead of a stream of data) and deliver it unreliably to the receiver. The data unit must be small enough to fit in a UDP packet.
7. It does not create a connection before sending. Thus it is connectionless.
Thus UDP is a connectionless, unreliable transport protocol. It does not add anything to the services of IP except for providing process-to-process communication instead of host-to-host communication. Also, it provides very limited error checking.
Advantages:
1. UDP is a very simple protocol using a minimum of overhead.
2. If a process has a small message to send without caring much about reliability, it can use UDP.
3. Sending a small message using UDP takes much less interaction between the sender and the receiver than using TCP.

Process-to-Process Communication
As a network layer protocol, IP can deliver a message only to the destination computer. The message then needs to be handed over to the correct process; this is ensured by a transport layer protocol. UDP does this with the help of port numbers.
Port Numbers: The communication takes place between the local process on a local host and the remote process on a remote host. As we know, the local host and remote host are defined using their
IP addresses. To define the processes, we need second identifiers called Port Numbers. In TCP/IP protocol suite, the port numbers are integers between 0 and 65,535. Ephemeral Port Numbers: The client process defines itself in its host with a port number, called the Ephemeral port number, which can be randomly chosen (But unique in the host). The word ephemeral means short lived. The life of a client is normally short when compared to the server. Client processes are not needed to be always active. However, they will be active when they request the server process. ICANN recommends an ephemeral port number to be greater than 1023 for some client/server programs to work properly. Well-known Port Numbers: The server process also defines itself with a port number. However, it cannot be chosen randomly as this port number should be the one that the client processes expect the server process to be running on. Of course, another solution would be to send a packet and request the port number of a specific server, but it creates more overhead. TCP/IP has decided to use universal port numbers for servers; these are called Well-known port numbers. Every client process knows the well-known port number of the corresponding server process. Thus the IP addresses and port numbers play different roles in selecting the final destination of the data. The destination IP address defines the host among different hosts in the world. After a host has been selected, the port number defines one of the processes on this particular host.

20/03/10 Saturday ICANN Ranges ICANN has divided the port numbers into three ranges: Well-known, Registered and Dynamic (or private) as shown below:

Well-known and Registered ports are assigned and controlled by ICANN. Registered ports can only be registered with ICANN to prevent duplication. Dynamic or private ports are neither controlled nor registered. They can be used as temporary or private port numbers. Below given are the well-known ports used with UDP.

Socket Addresses: UDP needs two identifiers at each end for communication: the IP address and the port number. The combination of an IP address and a port number is called a Socket address. The client socket address defines the client process uniquely and the server socket address defines the server process uniquely.
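A minimal Python illustration of socket addresses using UDP sockets; the loopback address 127.0.0.1 and port 9999 are arbitrary choices for this sketch, not standard values.

    # Socket addresses are (IP address, port number) pairs.
    import socket

    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 9999))                 # server socket address

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.sendto(b"hello", ("127.0.0.1", 9999))     # OS picks an ephemeral port for the client

    data, client_addr = server.recvfrom(1024)
    print(data, client_addr)   # client_addr is the client socket address (IP, ephemeral port)
    client.close()
    server.close()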

The IP datagram header will keep the IP address part of socket address whereas the UDP header keeps the port number part. User Datagram UDP packets are called User datagrams. The format of a user datagram is as shown below:

Source port number: 16-bit field defines the port number used by the process running on the source host
Destination port number: 16-bit field defines the port number used by the process running on the destination host.
Length: 16-bit field defines the total length of the user datagram (header plus data) in bytes. However, the total length needs to be much less because a UDP user datagram is encapsulated in an IP datagram whose maximum length is 65,535 bytes. This field is actually not necessary: a user datagram is encapsulated in an IP datagram, which has one field that defines the total length and another that defines the length of the header. Thus the total length of the UDP datagram can be found as
UDP length = IP length - IP header's length
However, the UDP protocol designers felt it was more efficient for the destination UDP to get the total length information from the user datagram itself rather than asking the IP software to supply this information from the IP header, which it has already dropped (before giving the data part to the upper layer).
Checksum: 16-bit field used to detect errors over the entire user datagram (header + data).

Checksum Checksum calculation is done by keeping a pseudo header along with the UDP user datagram shown below:

The protocol field ensures that the packet belongs to UDP and not to TCP. Its value is 17 for UDP. Checksum calculation at Sender: The sender follows these 8 steps to calculate the checksum: 1. Add a pseudo header to the UDP user datagram 2. Fill the checksum field with zeroes 3. Divide the total bits into 16-bit (2-byte) words 4. If the total number of bytes is not even, add 1 byte of padding with all 0s. It will be discarded afterwards.
5. Add all 16-bit sections using 1's complement arithmetic
6. Complement the result, which is a 16-bit number, and insert it in the checksum field
7. Drop the pseudo header and any added padding
8. Deliver the UDP user datagram to the IP software for encapsulation
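The 16-bit 1's-complement arithmetic used in steps 5 and 6 can be sketched in Python as follows; the byte string stands for the pseudo header, UDP header (with the checksum field zeroed) and data already laid out by the earlier steps, and its contents are made up.

    # 1's-complement checksum over 16-bit words with end-around carry.
    def ones_complement_checksum(data: bytes) -> int:
        if len(data) % 2:
            data += b"\x00"                            # pad to an even number of bytes
        total = 0
        for i in range(0, len(data), 2):
            total += (data[i] << 8) | data[i + 1]      # next 16-bit word
            total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
        return ~total & 0xFFFF                         # complement of the sum

    words = bytes.fromhex("045f000d000f0000")          # made-up words, checksum field zeroed
    print(hex(ones_complement_checksum(words)))        # 0xfb84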

Consider the checksum calculation of a simple UDP user datagram given below:

Checksum calculation at Receiver: The receiver follows these six steps to calculate the checksum:
1. Add the pseudo header to the UDP user datagram
2. Add padding if needed
3. Divide the total bits into 16-bit sections
4. Add all 16-bit sections using 1's complement arithmetic
5. Complement the result
6. If the result is all 0s, drop the pseudo header and any added padding and accept the user datagram. If the result is anything else, discard the user datagram.

UDP Operation
Connectionless Service
UDP provides connectionless service. Hence each user datagram is an independent datagram, and no two datagrams are related, even if they come from the same source process and go to the same destination program. The user datagrams are not numbered. Also, there is no connection establishment or connection termination. Processes should try to send only short messages, which can fit into one user datagram, using UDP.

Flow and Error Control UDP is a very simple, unreliable transport protocol which has no flow control and hence no window mechanism. There is no error control mechanism in UDP except for the checksum. This means that the sender does not know if a message has been lost or duplicated. Encapsulation and Decapsulation

Encapsulation: When a process has a message to send through UDP, it passes the message to UDP along with a pair of socket addresses and the length of the data. UDP receives the data and adds the UDP header. UDP then passes the user datagram to IP with the socket addresses. IP adds its own header, setting the protocol field value to 17, and then passes the IP datagram to the data link layer. The data link layer adds its own header (and possibly a trailer for error control) and passes it to the physical layer, where it is encoded and sent to the remote machine.
Decapsulation: When the message arrives at the destination host, the physical layer decodes the signals into bits and passes them to the data link layer. The data link layer uses the header (and the trailer) to check the data. If there is no error, the header and trailer are dropped and the datagram is passed to IP. The IP software checks it and drops its header, and the user datagram is passed to UDP with the sender and receiver IP addresses. UDP uses the checksum to check the entire user datagram. If there is no error, the header is dropped and the application data, along with the sender socket address, is passed to the process. The sender socket address is passed to the process in case it needs to respond to the message received.

Queuing
In UDP, queues are associated with ports as shown below:

At the client site, when a process starts, it requests a port number from the operating system. Some implementations create both an incoming and an outgoing queue associated with each process. Other implementations create only an incoming queue associated with each process. However, one process will be given only one port number, one incoming queue and one outgoing queue. The queues opened by the client are, in most cases, identified by ephemeral port numbers. The queues function as long as the process is running. When the process terminates the queues are destroyed. However, the mechanism of creating queues at server site is different. In its simplest form, the server asks for incoming and outgoing queues using its well-known port when it starts running. The queues remain open as long as the server is running. All the messages sent by a process are sent to the end of its outgoing queue while all the messages received for a process are placed at the end of its incoming queue. When a message arrives for process and if there is no incoming queue for that process to receive it or if it is full, UDP discards the user datagram that brought the message and asks ICMP to send a port unreachable message to the sender. If a process tries to send a message to its outgoing queue when it is full, the operating system asks the process to wait before sending any more messages. Multiplexing and Demultiplexing In a host running a TCP/IP protocol suite, there is only one UDP but possibly several processes that may want to use the services of UDP. To handle this situation, UDP multiplexes and demultiplexes as shown below:

At the sender site UDP accepts messages from different processes, differentiated by their assigned port numbers. After adding the header, UDP passes the user datagram to IP. At the receiver site UDP receives a user datagram from IP. After error checking and dropping the header, UDP delivers the message to the appropriate process based on the port numbers.

Use of UDP
- Suitable for a process that requires simple request-response communication with little concern for flow and error control. It is not usually used for a process such as FTP that needs to send bulk data.
- Suitable for processes like TFTP (Trivial File Transfer Protocol) which have internal flow- and error-control mechanisms.
- A suitable transport protocol for multicasting. Multicasting capability is embedded in the UDP software but not in the TCP software.
- Used for management processes such as SNMP.
- Used for some route updating protocols such as the Routing Information Protocol (RIP).

30/03/10 Tuesday

Duties of the transport layer
1. Packetizing: The process of dividing a long message into smaller ones. These packets are then encapsulated into the data field of a transport layer packet and headers are added.
2. Connection Control: Transport layer protocols may be divided into the following categories:
a. Connection oriented: A connection-oriented transport layer protocol establishes a connection, i.e. a virtual path, between the sender and receiver. This is a virtual connection; a packet may still travel out of order. The packets are numbered consecutively and the communication is bidirectional.
b. Connectionless: A connectionless transport layer protocol sends packets independently. There is no connection between the sender and receiver. Each packet can take its own route.
3. Addressing: When an application process wishes to set up a connection to a remote application process, it must specify which one to connect to. The method normally used is to define transport addresses to which processes can listen for connection requests. In the Internet, these end points are (IP address, port number) pairs. Transport layer protocols use a socket addressing scheme to define the client process and the server process. The client socket address defines the client process uniquely and the server socket address defines the server process uniquely.
4. Providing Reliability: For high reliability, flow-control and error-control mechanisms should be incorporated. We know that the data link layer can provide flow control. Similarly the transport layer also provides flow control, which is end-to-end rather than across a single link. The transport layer can provide error control as well, but again end-to-end rather than across a link.
Quality of Service (QoS) Parameters The primary function of transport layer can be considered as enhancing the QoS (Quality of Service) provided by the network layer. The transport service may allow the user to specify preferred, acceptable, and minimum values for various service parameters at the time a connection is setup. Some important service parameters are: 1. Connection establishment delay: It is the amount of time elapsing between a transport connection being requested and the confirmation being received by the user of the transport service. The shorter the delay, the better the service. 2. Connection establishment failure probability: It is the probability that connection is not being established within the maximum connection establishment delay. This can be due to network congestion, lack of table space or some other internal problems. 3. Throughput: It measures the number of bytes of user data transferred per second, measured over some time interval. Throughput, measured in the number of bits per second, sometimes is called bit rate or bandwidth. 4. Transit delay: It measures the time between a message being sent by the transport user on the source machine and its being received by the transport user on the destination machine. 5. Residual error ratio: It measures the number of lost or garbled messages as a fraction of the total sent. In theory, the residual error rate should be zero, since it is the job of the transport layer to hide all network layer errors. In practice, it may have some finite value. 6. Protection parameter: This parameter is a way for the transport user to specify interest in having the transport layer provide protection against unauthorized third parties (wire tapers) reading or modifying the transmitted data. 7. Priority: It provides a way for a transport user to show that some of its connections are more important (a higher priority) than the other ones. This is important while handling congestions because the higher priority connections should get serviced before lower priority ones. 8. Resilience: It gives the probability of the transport layer itself spontaneously terminating a connection due to internal problems or congestion. Transport Service Primitives The ultimate goal of the transport layer is to provide efficient, reliable, and cost-effective service to its users, normally processes in the application layer. To achieve this goal, the transport layer makes use of the services provided by the network layer. The hardware and/or software within the transport layer that does the work are called the transport entity. The (logical) relationship of the network, transport, and application layers is illustrated below:

TPDU (Transport Protocol Data Unit) refers to the messages sent from transport entity to transport entity. Thus, TPDUs (exchanged by the transport layer) are contained in packets (exchanged by the network layer). In turn, packets are contained in frames (exchanged by the data link layer). There are transport service primitives to allow a transport user, such as an application program, to access the transport service. The transport interface allows the application program to establish, use and release connections.

S. No   Primitive    TPDU sent              Meaning
1       LISTEN       None                   Block until some process tries to connect
2       CONNECT      Connection request     Actively attempt to establish a connection
3       SEND         Data                   Send data
4       RECEIVE      None                   Block until a data TPDU arrives
5       DISCONNECT   Disconnection request  Release the connection

Consider an application with a server and a number of remote clients. To start with, the server executes a LISTEN primitive, blocking until a client turns up. When a client wants to talk to the server, it executes a CONNECT primitive. The transport entity carries out this primitive by blocking the caller (BLOCK) and sending a packet (SEND) to the server. Encapsulated in the payload of this packet is a transport layer message for the server's transport entity. A rough mapping of these primitives onto the socket interface is sketched below.
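Very roughly, these primitives correspond to the Berkeley socket calls. The Python sketch below runs the server side in a thread so the whole exchange fits in one script; the port 6000 is an arbitrary choice for this illustration.

    import socket, threading, time

    def server():
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.bind(("127.0.0.1", 6000))
        srv.listen()                               # LISTEN: wait for a connection attempt
        conn, addr = srv.accept()                  # blocks until a CONNECT arrives
        print("request:", conn.recv(1024))         # RECEIVE
        conn.send(b"reply")                        # SEND
        conn.close()                               # DISCONNECT
        srv.close()

    threading.Thread(target=server, daemon=True).start()
    time.sleep(0.5)                                # let the server reach listen()

    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect(("127.0.0.1", 6000))               # CONNECT: actively establish a connection
    cli.send(b"request")                           # SEND
    print("reply:", cli.recv(1024))                # RECEIVE: block until data arrives
    cli.close()                                    # DISCONNECT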

06/04/10 Tuesday

Transmission Control Protocol (TCP)
TCP, like UDP, is a process-to-process protocol. It also uses port numbers for communication. However, unlike UDP, TCP is a connection-oriented protocol; it creates a virtual connection between two TCPs to send data. In addition, TCP uses flow- and error-
control mechanisms at the transport level. Thus TCP is a connection-oriented, reliable transport protocol. TCP SERVICES Process-to-Process Communication TCP provides process-to-process communication using port numbers. Some of them, which are well-known ports, are given below:

Stream Delivery Service TCP, unlike UDP, is a stream-oriented protocol. It allows the sending process to deliver data as a stream of bytes and allows the receiving process to obtain data as a stream of bytes. TCP creates an environment in which two processes seem to be connected by an imaginary tube that carries data across the Internet as shown below:

The sending process produces (writes to) the stream of bytes and the receiving process consumes (reads from) them. Sending and Receiving Buffers As the sending and receiving processes may not write or read data at the same speed, TCP needs buffers for storage. There are sending and receiving buffers. These are also necessary
for flow and error control mechanisms used by TCP. The figure below shows buffers which are implemented as circular arrays:

The sending buffer may have chambers that are empty, that contain bytes to be sent and that contain bytes that have been sent but not yet acknowledged. The receiving buffer may have chambers that are empty and that contain bytes received from the network. As the IP layer have to send the data in packets and not as a stream of bytes, TCP groups a number of bytes together into a packet called a Segment. TCP adds a header to each segment (for control purposes) and delivers it to the IP layer for transmission. However, the segments are not necessarily of same size. Full-Duplex Communication TCP offers full-duplex service, where data can flow in both directions at the same time. Connection-oriented Service When a process at site A wants to send and receive data from another process at site B, the following occurs: 1. The two TCPs establish a connection between them 2. Data are exchanged in both directions 3. Connection is terminated This connection is not a physical connection, but a virtual connection. The TCP segment encapsulated in IP datagram can be sent out of order, or lost, or corrupted and then resent. Each may use a different path to reach the destination. Connection-oriented means that TCP takes the responsibility of delivering the bytes in order to the other site. Reliable Service TCP uses an acknowledgement mechanism to check the safe and sound arrival of data.

TCP FEATURES
Numbering system: Although the TCP software keeps track of the segments being transmitted or received, there is no field for a segment number value in the segment header. Instead, there are two fields called the Sequence number and the Acknowledgement number. However, these two fields refer to the byte number and not the segment number.
Byte Number: All the data bytes that are transmitted in a connection are numbered by TCP. When TCP receives bytes of data from a process, it stores them in the sending buffer and numbers them. TCP generates a random number between 0 and 2^32 - 1 for the number of the first byte.
Sequence Number: After the bytes have been numbered, TCP assigns each segment being sent a sequence number, which is the number of the first byte carried in that segment. If the randomly generated number is x, the first data byte is numbered x + 1. The byte x is considered a phony byte that is used as the sequence number for a control segment to open a connection. During connection establishment each party uses a random number generator to create its Initial Sequence Number (ISN).
Acknowledgement Number: When a receiving TCP receives a segment, it uses an acknowledgement number to confirm the bytes it has received. The acknowledgement number defines the number of the next byte that the receiver expects to receive. The receiver takes the number of the last byte that it has received, safe and sound, adds 1 to it, and uses it as the acknowledgement number.
Flow Control: TCP provides flow control, by which the receiver of the data controls how much data is to be sent by the sender. This is done to prevent the receiver from being overwhelmed with data. The numbering system allows TCP to use byte-oriented flow control.
Error Control: The error control provided by TCP is also byte-oriented, although it considers a segment as the unit of data for error detection.
Congestion Control: TCP, unlike UDP, takes into account congestion in the network. The amount of data sent by the sender is also determined by the level of congestion in the network.

TCP Segment
A packet in TCP is called a Segment.
Segment Format

The segment consists of a 20- to 60-byte header, followed by data. The fields in the header are:
Source port address: 16-bit field defines the port number of the process in the host that is sending the segment.
Destination port address: 16-bit field defines the port number of the process in the host that is receiving the segment.
Sequence number: 32-bit field defines the sequence number of the segment (the number of the first data byte it carries).
Acknowledgement number: 32-bit field defines the acknowledgement number, i.e. the number of the next byte expected from the other party.
Header length: 4-bit field indicates the length of the TCP header in units of 4 bytes.
Reserved: 6-bit field reserved for future use.
Control: 6-bit field defines 6 control bits, given below:

Window size: 16-bit field defines the size of the window, in bytes, that the other party must maintain. This is normally referred to as the receiving window (rwnd) and is determined by the receiver.
Checksum: 16-bit field contains the checksum. The calculation of the checksum is the same as that for UDP. For the TCP pseudo header, the protocol field has the value 6. However, unlike UDP, the inclusion of the checksum in TCP is mandatory.
Urgent pointer: 16-bit field, valid only if the urgent flag is set, used when the segment contains urgent data. It defines the number that must be added to the sequence number to obtain the last urgent byte in the data section.
Options: Variable-sized optional information with a maximum size of 40 bytes.
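A small Python sketch that packs the fixed 20-byte header described above, using made-up port numbers, sequence number and window size:

    # Pack the fixed TCP header (no options) in network byte order.
    import struct

    def pack_tcp_header(src_port, dst_port, seq, ack, flags, window,
                        checksum=0, urgent=0):
        data_offset = 5                                 # header length = 5 x 4 = 20 bytes
        offset_reserved_flags = (data_offset << 12) | flags
        return struct.pack("!HHIIHHHH", src_port, dst_port, seq, ack,
                           offset_reserved_flags, window, checksum, urgent)

    SYN, ACK = 0x02, 0x10                               # two of the six control bits
    header = pack_tcp_header(49152, 80, seq=8000, ack=0, flags=SYN, window=10000)
    print(len(header), header.hex())                    # 20 bytes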
TCP Connection TCP establishes a virtual path between the source and destination for communication. Using a single virtual pathway for the entire message facilitates the acknowledgement process as well as retransmission of damaged or lost frames. TCP uses the services of IP to deliver individual segments to the receiver, but it controls the connection itself. If a segment is lost or corrupted, it is retransmitted. If a segment arrives out of order, TCP holds it until the missing segments arrive. IP is unaware of retransmission or reordering. In TCP, connection-oriented transmission requires 3 phases: Connection establishment, Data transfer and Connection termination. Connection Establishment TCP transmits data in full-duplex mode. Hence each party must initialize communication and get approval from the other party before any data is transferred. Three-Way Handshaking: The connection establishment in TCP is called three-way handshaking. The process starts with the server process that tells its TCP that it is ready to accept a connection. This is called a request for a passive open. The client process that wishes to connect to an open server tells its TCP that it needs to be connected to that particular server by issuing a request for an active open. TCP now starts three-way handshaking process as shown below:

The 3 steps in this phase are as follows:
1. The client sends the first segment, a SYN segment, in which only the SYN flag is set. This segment is for synchronization of sequence numbers: the client chooses a random number as its Initial Sequence Number (ISN) and sends it to the server. A SYN segment is a control segment and does not carry any data.
2. The server sends the second segment, a SYN + ACK segment, with two flag bits set: SYN and ACK. This segment is a SYN segment for communication in the other direction as well as an acknowledgement of the SYN segment received from the client. The server also uses this segment to choose its own ISN for numbering the bytes sent from the server to the client. Since it contains an acknowledgement, it also defines the receive window size (rwnd) to be used by the client.
3. The client sends the third segment, an ACK segment, to acknowledge the receipt of the second segment. The sequence number is kept the same as that in the SYN segment, so that no extra sequence number is consumed just for an acknowledgement. The client also defines the server's window size in this segment.

SYN Flooding Attack: The connection establishment procedure in TCP is susceptible to a serious security problem called the SYN flooding attack. It happens when a malicious attacker sends a large number of SYN segments to a server, pretending that each of them comes from a different client by faking the source IP addresses in the datagrams. The server allocates the necessary resources for communication with each of these clients and sends SYN + ACK segments to the fake clients, which are lost. During this time a lot of resources are occupied without being used; if the number of SYN segments is large, the server eventually runs out of resources and may crash. This attack belongs to a group of attacks known as denial-of-service attacks, in which the attacker monopolizes a system with so many service requests that the system collapses and denies service to every legitimate request.
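At the socket API level, the passive and active opens correspond to the listen/accept and connect calls; the kernel's TCP carries out the SYN, SYN + ACK, ACK exchange itself. The sketch below shows both sides in one script using a thread; the loopback address and port 9000 are arbitrary choices for illustration.

import socket, threading

ready = threading.Event()

def server():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(('127.0.0.1', 9000))
    srv.listen()                      # passive open: the server is ready to accept a connection
    ready.set()
    conn, addr = srv.accept()         # returns once the three-way handshake has completed
    conn.close()
    srv.close()

threading.Thread(target=server).start()
ready.wait()                          # make sure the passive open has happened first

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(('127.0.0.1', 9000))      # active open: kernel sends SYN, receives SYN+ACK, sends ACK
cli.close()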

Data Transfer

After a connection is established, bidirectional data transfer can take place: the client and server can send data and acknowledgements in both directions. Data travelling in the same direction as an acknowledgement can be carried on the same segment; this is known as piggybacking.

Pushing data: In situations where delayed transmission and delayed delivery of data are not acceptable to the application program, the application program at the sending site can request a push operation. This means that the sending TCP must not wait for the window to be filled; it must create a segment and send it immediately, setting the push bit (PSH) to let the receiving TCP know that the segment carries data that must be delivered to the receiving application program as soon as possible, without waiting for more data to arrive.

Urgent data: On some occasions an application program needs to send urgent bytes, meaning that the sending application program wants a piece of data to be read out of order by the receiving application program. For example, the sending process may want to abort a remote process to which it has already sent a large amount of data; the abort command (Ctrl + C), if sent normally, would be stored at the end of the receiving buffer. The solution is to send a segment with the URG bit set. The sending application program tells its TCP that the piece of data is urgent, and the sending TCP creates a segment with the urgent data inserted at the beginning of the data section; the urgent pointer field defines the end of the urgent data. When the receiving TCP receives a segment with the URG bit set, it extracts the urgent data from the segment, using the value of the urgent pointer, and delivers it, out of order, to the receiving application program.
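A small sketch of the receiver-side handling of urgent data, following the definition of the urgent pointer given above (sequence number + urgent pointer = number of the last urgent byte). This is only an illustration: real stacks differ in how they interpret the pointer, and at the socket API level urgent data usually surfaces as out-of-band data (for example via the MSG_OOB flag).

def extract_urgent(data: bytes, urg_flag: bool, urgent_pointer: int) -> bytes:
    # Return the urgent bytes at the start of a segment's data section.
    if not urg_flag:
        return b''
    # With the definition above, the byte at offset urgent_pointer is the last urgent byte.
    return data[:urgent_pointer + 1]

segment_data = b'\x03normal data continues here'   # Ctrl+C (0x03) sent as a single urgent byte
print(extract_urgent(segment_data, True, 0))       # b'\x03'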

Connection Termination

Either of the two parties involved in exchanging data can close the connection.

Three-Way Handshaking: Most implementations today allow three-way handshaking for connection termination. The 3 steps in this phase are as follows:
1. In the normal situation, the client TCP, after receiving a close command from the client process, sends the first segment, a FIN segment, in which the FIN flag is set.
2. The server TCP, after receiving the FIN segment, informs its process of the situation and sends the second segment, a FIN + ACK segment, to confirm the receipt of the FIN segment from the client and, at the same time, to announce the closing of the connection in the other direction.
3. The client TCP sends the last segment, an ACK segment, to confirm the receipt of the FIN segment from the server TCP.

Half-Close: In TCP, one end can stop sending data while still receiving data; this is called a half-close. It is normally initiated by the client, and it can occur when the server needs all the data before processing can begin. The client, after sending all of its data, closes the connection in the outbound direction; the inbound direction, however, remains open to receive the processed data from the server.
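At the socket API level the half-close corresponds to shutting down only the outbound direction of a connected socket. The sketch below assumes an already connected TCP socket named sock (the helper name is made up): shutdown(SHUT_WR) causes a FIN to be sent while the socket can still receive the server's reply.

import socket

def send_all_then_half_close(sock: socket.socket, payload: bytes) -> bytes:
    sock.sendall(payload)
    sock.shutdown(socket.SHUT_WR)       # FIN: no more data will be sent in this direction
    chunks = []
    while True:
        data = sock.recv(4096)          # the inbound direction is still open
        if not data:                    # the server has closed its direction too
            break
        chunks.append(data)
    return b''.join(chunks)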

Connection Reset: The TCP at one end may deny a connection request, may abort an existing connection or may terminate an idle connection. All of these are done with the RST (reset) flag. A connection is denied when the TCP on one side has requested a connection to a nonexistent port on the other side; to deny the connection, the TCP at the other end sends a segment with the RST bit set. TCP can also abort a connection in an abnormal situation by sending an RST segment to close the connection, and if the TCP on one side discovers that the TCP on the other side has been idle for a long time, it may send an RST segment to destroy the idle connection.

State Transition Diagram

To keep track of all the different events happening during connection establishment, connection termination and data transfer, the TCP software is implemented as a finite state machine. At any moment the system is in one state, and it changes to a new state on an event, which is an input applied to that state; the change of state may also create an output. The states of the TCP finite state machine are CLOSED, LISTEN, SYN-SENT, SYN-RCVD, ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, LAST-ACK, CLOSING and TIME-WAIT.
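As a side note on aborting a connection: on many TCP implementations, enabling SO_LINGER with a zero linger time makes close() abort the connection with an RST instead of the normal FIN exchange. The helper below is only a sketch of that idea; the name is made up and the exact behaviour is platform-dependent.

import socket, struct

def close_with_reset(sock: socket.socket) -> None:
    linger = struct.pack('ii', 1, 0)    # l_onoff = 1, l_linger = 0 seconds
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger)
    sock.close()                        # on most stacks the connection is aborted with an RST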

In the TIME-WAIT state, a timer is set with a time-out value of twice the Maximum Segment Lifetime (MSL). The MSL is the maximum time a segment can exist in the Internet before it is dropped; its common value is between 30 seconds and 1 minute. The timer limits the time for which a party waits for a segment from the other side, and its expiry is taken as the event that finally closes the connection.
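A compact way to picture the finite state machine is as a table of (state, event) pairs. The sketch below models only the normal client path described above; the state and event names follow the text, and the 2 x MSL timer appears simply as a named event.

# Minimal sketch of TCP's client-side state machine for a normal open/close sequence.
TRANSITIONS = {
    ('CLOSED',      'active open / send SYN'):  'SYN-SENT',
    ('SYN-SENT',    'recv SYN+ACK / send ACK'): 'ESTABLISHED',
    ('ESTABLISHED', 'close / send FIN'):        'FIN-WAIT-1',
    ('FIN-WAIT-1',  'recv ACK'):                'FIN-WAIT-2',
    ('FIN-WAIT-2',  'recv FIN / send ACK'):     'TIME-WAIT',
    ('TIME-WAIT',   'timeout (2 * MSL)'):       'CLOSED',
}

def run(events, state='CLOSED'):
    for event in events:
        state = TRANSITIONS[(state, event)]   # unknown (state, event) pairs raise KeyError
        print(f'{event:28s} -> {state}')
    return state

run(['active open / send SYN', 'recv SYN+ACK / send ACK', 'close / send FIN',
     'recv ACK', 'recv FIN / send ACK', 'timeout (2 * MSL)'])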

Flow Control

Flow control regulates the amount of data a source can send before receiving an acknowledgement from the destination. Flow control should be balanced: it should neither slow down the communication nor allow the sender to overwhelm the receiver. TCP offers byte-oriented flow control with a mechanism called the sliding window.

Sliding Window Protocol

In this method a host uses a window for outbound communication. The window covers the portion of the sending buffer that contains bytes received from the process which are eligible to be sent, together with bytes that have been sent but not yet acknowledged. The window can be opened, closed or shrunk. These activities are controlled by the receiver (and also by the congestion of the network), not by the sender, who has to obey the receiver's commands in this matter. The size of the window is determined by the lesser of two values: the receiver window (rwnd) and the congestion window (cwnd). Opening the window means moving its right wall to the right, so that more new bytes become eligible for sending. Closing the window means moving its left wall to the right, so that bytes that have been acknowledged are removed from the window. Shrinking the window means moving the right wall to the left, which revokes the eligibility of some bytes for sending; this is strongly discouraged, because it creates a problem if the sender has already sent those bytes.
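The bookkeeping behind the sender's sliding window can be sketched in a few lines. The class below is illustrative only; the attribute names (rwnd, cwnd, last_byte_acked, last_byte_sent) mirror the terms used above, and the effective window is taken as the lesser of rwnd and cwnd.

class SendWindow:
    def __init__(self, isn, rwnd, cwnd):
        self.last_byte_acked = isn      # left wall of the window
        self.last_byte_sent = isn
        self.rwnd = rwnd                # advertised by the receiver
        self.cwnd = cwnd                # determined by congestion control

    def window_size(self):
        # Effective window is the lesser of the receiver and congestion windows
        return min(self.rwnd, self.cwnd)

    def usable(self):
        # Bytes that may still be sent without overflowing the receiver
        return self.window_size() - (self.last_byte_sent - self.last_byte_acked)

    def send(self, nbytes):
        nbytes = min(nbytes, self.usable())
        self.last_byte_sent += nbytes   # walls do not move; the usable space shrinks
        return nbytes

    def ack(self, ack_number, new_rwnd):
        # Closing: the left wall moves right past the acknowledged bytes
        self.last_byte_acked = max(self.last_byte_acked, ack_number - 1)
        self.rwnd = new_rwnd            # may open (or shut down) the window

w = SendWindow(isn=10000, rwnd=4000, cwnd=8000)
print(w.send(3000), w.usable())     # 3000 1000
w.ack(12001, new_rwnd=4000)         # bytes 10001..12000 acknowledged
print(w.usable())                   # 3000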

Even though shrinking the window is not recommended, there is one exception: the receiver can temporarily shut the window down by advertising an rwnd of 0. This happens when, for some reason, the receiver does not want to receive any data from the sender for a while, and it is called window shutdown. The sender, however, can always send a segment with one byte of data to prevent deadlock; this is called probing.

Silly Window Syndrome

A serious problem can arise in the sliding window operation when either the sending application program creates data slowly, the receiving application program consumes data slowly, or both. Either situation results in data being sent in very small segments, which reduces the efficiency of the operation. This problem is called the silly window syndrome.
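To make the probing idea concrete, the sketch below shows how a sender might decide how many bytes to transmit: when the advertised rwnd is 0 it sends at most a single probe byte, otherwise it respects the usual window limit. The function and its parameters are made up for illustration.

def bytes_to_send(rwnd: int, cwnd: int, unacked: int, queued: int) -> int:
    if rwnd == 0:
        return 1 if queued > 0 else 0        # probing: a single byte of data
    usable = min(rwnd, cwnd) - unacked       # normal sliding-window limit
    return max(0, min(usable, queued))

print(bytes_to_send(rwnd=0,    cwnd=8000, unacked=0,    queued=5000))  # 1 (probe)
print(bytes_to_send(rwnd=4000, cwnd=8000, unacked=1000, queued=5000))  # 3000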
