(MCA/PGDCA - 101)
1.0 Objectives
After studying this chapter, you will be able to:
Discuss the strengths of computers
Explain the limitations of computers
Discuss the fundamental uses of computers
Explain the developments of computers
Define the generations of computers
1.1 Introduction
The word "computer" comes from the word "compute", which means "to calculate". Hence, people
usually consider a computer to be a calculating device that can perform arithmetic operations at high speed. In fact,
the original objective for inventing the computer was to create a fast calculating machine. However, more
than 80% of the work done by computers today is of a non-mathematical or non-numerical nature. Hence, to
define a computer merely as a calculating device is to ignore over 80% of its functions.
2. Speed. A computer is a very fast device. It can perform in a few seconds the amount of work that a
human being could do in an entire year, working day and night and doing nothing else. In other words,
a computer can do in a few minutes what would take a person an entire lifetime.
While talking about the speed of a computer, we do not talk in terms of seconds or even milliseconds (10^-3 seconds),
but in terms of microseconds (10^-6), nanoseconds (10^-9), and even picoseconds (10^-12). A powerful computer
is capable of performing several billion (10^9) simple arithmetic operations per second.
3. Accuracy. In addition to being very fast, computers are very accurate. The accuracy of a computer is
consistently high, and the degree of its accuracy depends upon its design. A computer performs every
calculation with the same accuracy.
However, errors can occur in a computer. These errors are mainly due to human rather than technological
weaknesses. For example, errors may occur due to imprecise thinking by a programmer (a person who
writes instructions for a computer to solve a particular problem) or incorrect input data. We often refer to
computer errors caused due to incorrect input data or unreliable programs as garbage-in-garbage-out
(GIGO).
4. Diligence. Unlike human beings, a computer is free from monotony, tiredness, and lack of
concentration. It can continuously work for hours without creating any error and without grumbling.
Hence, computers score over human beings in doing routine jobs that require great accuracy. If ten
million calculations have to be performed, a computer will perform the last one with exactly the same
accuracy and speed as the first one.
5. Versatility. Versatility is one of the most wonderful things about a computer. One moment it is
preparing the results of an examination, the next moment it is preparing electricity bills, and in between it
may be helping an office secretary trace an important letter in seconds. All that is required to change its
talent is to slip a new program (a sequence of instructions for the computer) into it. In brief, a computer
is capable of performing almost any task, provided the task can be reduced to a finite series of logical steps.
6. Power of Remembering. As a human being acquires new knowledge, his/her brain subconsciously
selects what it feels to be important and worth retaining in memory. The brain relegates unimportant
details to the back of the mind or simply forgets them. This is not the case with computers. A computer can store and
recall any amount of information because of its secondary storage (a type of detachable memory)
capability. It can retain a piece of information as long as a user desires and the user can recall the
information whenever required. Even after several years, a user can recall exactly the same information
that he/she had stored in the computer several years ago. A computer forgets or loses certain information
only when a user asks it to do so. Hence, it is entirely up to the user to make a computer retain or forget
some information.
7. No IQ. A computer is not a magical device. It possesses no intelligence of its own. Its IQ is zero, at
least to date. It has to be told what to do and in what sequence. Hence, only users determine what tasks
a computer will perform. A computer cannot take its own decisions in this regard.
8. No Feelings. Computers are devoid of emotions. They have no feelings and no instincts because they
are machines. Although humans have succeeded in building a memory for computers, no computer
possesses the equivalent of a human heart and soul. Based on our feelings, taste, knowledge, and
experience we often make certain judgements in our day-to-day life whereas, computers cannot make such
judgements on their own. They make judgements based on the instructions given to them in the form of
programs that are written by us (human beings).
Caution
Excessive use of computers can cause various health problems, such as cervical and back pain, eye strain,
and headaches.
Communication
Thanks to computers and the Internet, the world has gotten much smaller in recent years. Many people use
their computers to keep in touch with friends and family using instant messenger programs as well as
email. A growing communication tool is social networking, with sites like Facebook and Twitter becoming
incredibly popular.
Games
PCs have long served as recreational devices with hundreds of games available each year. Gaming on a PC
can be an expensive hobby, with video cards ranging in price from INR 3,000 to more than INR 20,000 and
fully equipped gaming PCs costing in excess of INR 75,000 in many cases. Beyond the top-tier AAA titles,
there are many other games that users can find both pre-installed on PCs and online.
Entertainment
Almost all computers come with CD or DVD disk drives, which allow you to use the computer as a CD
player or DVD player. Some computers are also capable, with the proper hardware, of viewing and
recording television onto the machine's hard drives. With an Internet connection, users have a nearly
limitless supply of videos and music available online as well.
Work
Almost every working environment uses computers in one capacity or another. Office buildings use
computers to keep track of everything from wages to hours logged, retail stores use computers as cash
registers and industries such as construction and architecture use computers to help design buildings.
4. The EDVAC (1946-52). A major drawback of ENIAC was that its programs were wired on boards
that made it difficult to change the programs. Dr. John Von Neumann later introduced the "stored
program" concept that helped in overcoming this problem. The basic idea behind this concept is that a
sequence of instructions and data can be stored in the memory of a computer for automatically directing
the flow of operations. This feature considerably influenced the development of modern digital computers
because of the ease with which different programs can be loaded and executed on the same computer. Due
to this feature, we often refer to modern digital computers as stored program digital computers. The
Electronic Discrete Variable Automatic Computer (EDVAC) used the stored program concept in its
design. Von Neumann also has a share of the credit for introducing the idea of storing both instructions and
data in binary form (a system that uses only two digits - 0 and 1 to represent all characters), instead of
decimal numbers or human readable words.
5. The EDSAC (1947-49). Almost simultaneously with the EDVAC in the U.S.A., British scientists developed the
Electronic Delay Storage Automatic Calculator (EDSAC). The machine executed its first program in May
1949. In this machine, addition operations took 1500 microseconds and multiplication operations took
4000 microseconds. A group of scientists headed by Professor Maurice Wilkes at the Cambridge
University Mathematical Laboratory developed this machine.
6. The UNIVAC I (1951). The Universal Automatic Computer (UNIVAC) was the first digital
computer that was not "one of a kind". Many UNIVAC machines were produced, the first of which was
installed in the Census Bureau in 1951 and was used continuously for 10 years. In 1952, the International
Business Machines (IBM) Corporation introduced the IBM-701 commercial computer. In rapid succession,
improved models of the UNIVAC I and other 700-series machines were introduced. In 1953, IBM
produced the IBM-650, and sold over 1000 of these computers.
Digital Computers: They use digital circuits and are designed to operate on two states, namely bits 0 and
1. They are analogous to states ON and OFF. Data on these computers is represented as a series of 0s and
1s. Digital computers are suitable for complex computation and have higher processing speeds. They are
programmable. Digital computers are either general purpose computers or special purpose ones. Special
purpose computers, as their name suggests, are designed for specific types of data processing, while general
purpose computers are meant for general use.
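To illustrate the point that digital computers represent all data as a series of 0s and 1s, here is a minimal Python sketch; the 8-bit encoding shown is standard ASCII, and the helper name is ours:

```python
# Minimal sketch: how text can be represented as a series of 0s and 1s.
# Uses standard 8-bit ASCII codes; the helper name is illustrative.

def to_bits(text):
    """Return the 8-bit binary pattern of each character in text."""
    return [format(ord(ch), "08b") for ch in text]

print(to_bits("Hi"))  # ['01001000', '01101001']
```

Every character, number, image, or sound ultimately reduces to such bit patterns inside a digital computer.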
Hybrid Computers: These computers are a combination of both digital and analog computers. In this type
of computers, the digital segments perform process control by conversion of analog signals to digital ones.
Microcomputers: A computer with a microprocessor as its central processing unit is known as a
microcomputer. They do not occupy as much space as mainframes do. When supplemented with a
keyboard and a mouse, microcomputers can be called personal computers. A monitor, a keyboard and
other similar input-output devices, computer memory in the form of RAM and a power supply unit come
packaged in a microcomputer. These computers can fit on desks or tables and prove to be the best choice
for single-user tasks.
Netbooks: They fall in the category of laptops, but are inexpensive and relatively smaller in size. They
had a smaller feature set and lower capacities in comparison to regular laptops at the time they came into
the market. But with passing time, netbooks began featuring almost everything that notebooks had. By
the end of 2008, netbooks had begun to overtake notebooks in terms of market share and sales.
Personal Digital Assistants (PDAs): It is a handheld computer and popularly known as a palmtop. It has a
touch screen and a memory card for storage of data. PDAs can also be used as portable audio players, web
browsers and smartphones. Most of them can access the Internet by means of Bluetooth or Wi-Fi
communication.
Minicomputers: In terms of size and processing capacity, minicomputers lie in between mainframes and
microcomputers. Minicomputers are also called mid-range systems or workstations. The term began to be
popularly used in the 1960s to refer to relatively smaller third generation computers. They took up the
space that would be needed for a refrigerator or two and used transistor and core memory technologies.
The 12-bit PDP-8 minicomputer of the Digital Equipment Corporation was the first successful
minicomputer.
Servers: They are computers designed to provide services to client machines in a computer network. They
have larger storage capacities and powerful processors. Running on them are programs that serve client
requests and allocate resources like memory and time to client machines. Usually they are very large in
size, as they have large processors and many hard drives. They are designed to be fail-safe and resistant to
crashes.
Wearable Computers: A record-setting step in the evolution of computers was the creation of wearable
computers. These computers can be worn on the body and are often used in the study of behaviour
modelling and human health. Military and health professionals have incorporated wearable computers into
their daily routine, as a part of such studies. When the users' hands and sensory organs are engaged in other
activities, wearable computers are of great help in tracking human actions. Wearable computers do not
have to be turned on and off and remain in operation without user intervention.
Tablet Computers: Tablets are mobile computers that are very handy to use. They use touch-screen
technology. Tablets come with an onscreen keyboard or use a stylus or a digital pen. Apple's iPad
redefined the class of tablet computers.
Figure 1.1: Electronic devices used for manufacturing computers of different generations.
1.8 Summary
The word computer comes from the word 'compute', which means to calculate.
Computer generations include the First, Second, Third, Fourth and Fifth Generations.
Computers are emotionless. They have no feelings, such as like or dislike.
Blaise Pascal invented the first mechanical adding machine in 1642.
Charles Babbage, a nineteenth century Professor at Cambridge University, is considered the father of
modern digital computers.
1.9 Keywords
Generation: Originally, the term 'generation' was used to distinguish between varying hardware
technologies, but it has now been extended to include both the hardware and software that together make up a
computer system.
Graphical user interface (GUI): It enables new users to quickly learn how to use computers.
Integrated Circuits: They are usually called ICs or chips. They are complex circuits which have been
etched onto tiny chips of semiconductor (silicon). The chip is packaged in a plastic holder with pins spaced
on a 0.1''(2.54 mm) grid which will fit the holes on strip board and breadboards. Very fine wires inside the
package link the chip to the pins.
Medium-Scale Integration (MSI): A term used in the electronic chip manufacturing industry. An
integrated circuit that contained hundreds of transistors on each chip was called Medium-Scale
Integration (MSI).
Small-Scale Integration (SSI): The first integrated circuits contained only a few transistors. Called Small-
Scale Integration (SSI), they used circuits containing transistors numbering in the tens.
2.0 Objectives
After studying this chapter, you will be able to:
Discuss algorithms
Explain the personal computer
Discuss the uses of a personal computer
Define the components of personal computers
Discuss the evolution of PCs
Explain the development of processors
Describe the architecture of the Pentium IV
Discuss the configuration of a PC
2.1 Introduction
A PC (personal computer) is a microcomputer. A PC is a single-user system, designed to fit on a desk-top;
hence the word Personal. The IBM PC was introduced in the early eighties and since then has been
modified and improved. Subsequent PCs have been designed to run any software written for previous
versions of the PC. Many other manufacturers have produced compatible computers; that is, computers
which work in the same manner as the IBM PC and use the same software. These are often known as
clones.
2.2 Algorithm
This is a problem solving technique. An algorithm can be defined as a step by step procedure to solve a
particular problem. It consists of English-like statements. Each statement must be precise and well-defined
to perform a specific operation. When these statements are executed for a given set of conditions, they will
produce the required results. See Example:
Example: Write an algorithm to compute the area of a circle.
Algorithm: Area of a circle
Step 1: Read radius
Step 2: [Compute the area]
        Area = 3.142 x radius x radius
Step 3: [Print the area]
        Print 'Area of a circle =', Area
Step 4: [End of algorithm]
        Stop
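The algorithm above translates directly into code. Here is a minimal Python version, keeping the algorithm's approximation of pi (3.142); the radius is hard-coded for illustration:

```python
# Direct translation of the area-of-a-circle algorithm above.
# Keeps the algorithm's approximation of pi (3.142).

def circle_area(radius):
    # Step 2: compute the area
    return 3.142 * radius * radius

radius = 5                            # Step 1: read radius
area = circle_area(radius)
print("Area of a circle =", area)     # Step 3: print the area
```

Each step of the algorithm maps to one line of code, which is exactly what a well-defined algorithm should allow.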
Almost every other part of your computer connects to the system unit using cables. The cables plug into
specific ports (openings), typically on the back of the system unit. Hardware that is not part of the system
unit is sometimes called a peripheral device or device.
2.5.2 Storage
Your computer has one or more disk drives—devices that store information on a metal or plastic disk. The
disk preserves the information even when your computer is turned off. There are several types of storage
devices:
Caution
Be careful while writing data to a CD/DVD; if power fails, data may be lost.
2.5.3 Mouse
A mouse is a small device used to point to and select items on your computer screen. Although mice come
in many shapes, the typical mouse does look a bit like an actual mouse. It is small, oblong, and connected
to the system unit by a long wire that resembles a tail. Some newer mice are wireless.
A mouse usually has two buttons: a primary button (usually the left button) and a secondary button. Many
mice also have a wheel between the two buttons, which allows you to scroll smoothly through screens of
information.
When you move the mouse with your hand, a pointer (see Figure 2.4) on your screen moves in the same
direction. (The pointer's appearance might change depending on where it is positioned on your screen.)
When you want to select an item, you point to the item and then click (press and release) the primary
button. Pointing and clicking with your mouse is the main way to interact with your computer.
2.5.4 Keyboard
A keyboard (see Figure 2.5) is used mainly for typing text into your computer. Like the keyboard on a
typewriter, it has keys for letters and numbers, but it also has special keys:
The function keys, found on the top row, perform different functions depending on where they are
used.
The numeric keypad, located on the right side of most keyboards, allows you to enter numbers quickly.
The navigation keys, such as the arrow keys, allow you to move your position within a document or
webpage.
You can use your keyboard and mouse to perform many of the same tasks.
2.5.5 Monitor
A monitor (see Figure 2.6) displays information in visual form, using text and graphics. The portion of the
monitor that displays the information is called the screen. Like a television screen, a computer screen can
show still or moving pictures. There are two basic types of monitors: CRT (cathode ray tube) monitors and
LCD (liquid crystal display) monitors. Both types produce sharp images, but LCD monitors have the
advantage of being much thinner and lighter. CRT monitors, however, are generally more affordable.
2.5.6 Printer
A printer (see Figure 2.7) transfers data from a computer onto paper. You do not need a printer to use your
computer, but having one allows you to print e-mail, cards, invitations, announcements, and other
materials. Many people also like being able to print their own photos at home.
In 1993, Intel introduced the Pentium processor which has a speed of 60 MHz. This was followed by the
Pentium II which has a speed of 233 MHz, and the Pentium III which has a speed of 450 MHz, and the
Pentium 4 which has a speed of 1.3 GHz. Later, Intel brought out the Celeron processor, which has a speed
of 266 MHz and which is used in affordable low-end computers. In 2003, Intel inaugurated the Pentium M
processor, which ushered in a new era of mobile computing, under the Centrino platform. The Pentium M
is slower, at 900 MHz, so that energy consumption is reduced and the battery of the laptop lasts longer. In
2006, Intel introduced the Core processor which has a speed of 1.6 GHz. It has more than one core, like in
the case of Core Duo (which has two cores) and has virtualization capability which allows multiple copies
of an operating system to be run on the same computer.
While Intel is the leading company in the manufacturing of processors, there are other companies such as
AMD that make processors too. In 1991, AMD brought out the Am386 processor, which has a speed of 40
MHz. It is compatible with the Intel 386 processor. In 1999, AMD introduced the Athlon processor, which
has a speed of 500 MHz. Athlon was a legitimate competitor to Intel Pentium III because it was faster. As
a matter of fact, AMD Athlon was the first processor to reach the speed of 1 GHz. The future for the
computer processor industry is promising, as processors will continue to get faster and cheaper. According
to Moore's Law, the number of transistors on a chip doubled roughly every year until 1975, and every two
years thereafter.
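Moore's Law can be sketched as a simple doubling model. The starting point below (the Intel 4004's roughly 2,300 transistors in 1971) is a commonly cited figure; the projection is only a rough illustration, not real product data:

```python
# Rough illustration of Moore's Law with a two-year doubling period.
# Base figures (Intel 4004, ~2,300 transistors, 1971) are commonly cited;
# the model itself is only an approximation.

def transistors(year, base_year=1971, base_count=2300, doubling_years=2):
    return base_count * 2 ** ((year - base_year) / doubling_years)

for year in (1971, 1981, 1991, 2001):
    print(year, f"{transistors(year):,.0f}")
```

Even this crude model shows why transistor counts grew from thousands to tens of millions within three decades.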
In the future, processors will have more cores that will be blistering fast and reduce power consumption.
Software programmers will have to create multi-threaded applications to utilize the multiple cores.
Computers with such processors will be faster for multimedia applications such as graphics software, audio
players and video players. There is also a possibility that optical computing will increase processor speeds
exponentially. All these signs point to a brighter future for processors, which will be to the benefit of
everyone.
A typical pipeline has a fixed amount of work that is required to decode and execute an instruction. This
work is performed by individual logical operations called gates. Each logic gate consists of multiple
transistors. By increasing the stages in a pipeline, fewer gates are required per stage. Because each gate
requires some amount of time (delay) to provide a result, decreasing the number of gates in each stage
allows the clock rate to be increased. It allows more instructions to be in flight or at various stages of
decode and execution in the pipeline. Although these benefits are offset somewhat by the overhead of
additional gates required to manage the added stages, the overall effect of increasing the number of
pipeline stages is a reduction in the number of gates per stage, which allows a higher core frequency and
enhances scalability.
In absolute terms, the maximum frequency that can be achieved by a pipeline in an equivalent silicon
production process can be estimated as:
maximum frequency (MHz) = 1 / (pipeline time in ns / number of stages) x 1,000
Accordingly, the maximum frequency achievable by a five-stage, 10-ns pipeline is: 1 / (10/5) x 1,000 =
500 MHz.
In contrast, a 15-stage, 12-ns pipeline can achieve: 1 / (12/15) x 1,000 = 1,250 MHz, or 1.25 GHz.
Additional frequency gains can be achieved by changing the silicon process and/or using smaller
transistors to reduce the amount of delay caused by each gate.
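The frequency estimate above is easy to verify. A quick sketch of the same formula (the function name is ours):

```python
# Sketch of the maximum-frequency estimate quoted above:
# 1 / (pipeline time in ns / number of stages) * 1,000 -> MHz.

def max_frequency_mhz(pipeline_time_ns, stages):
    per_stage_delay_ns = pipeline_time_ns / stages
    return 1 / per_stage_delay_ns * 1000

print(max_frequency_mhz(10, 5))   # five-stage, 10-ns pipeline
print(max_frequency_mhz(12, 15))  # 15-stage, 12-ns pipeline
```

The two calls reproduce the 500 MHz and 1.25 GHz figures worked out in the text.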
Other new features introduced by the Pentium 4's new micro-architecture – dubbed NetBurst – include:
An innovative Level 1 cache implementation comprising – in addition to an 8KB data cache – an
Execution Trace Cache that stores up to 12K decoded x86 instructions (micro-ops), thus removing
the latency associated with the instruction decoder from the main execution loops.
A Rapid Execution Engine that pushes the processor's ALUs to twice the core frequency, resulting in
higher execution throughput and reduced latency of execution – the chip actually uses three separate
clocks: the core frequency, the ALU frequency and the bus frequency.
A very deep, out-of-order speculative execution engine – referred to as Advanced Dynamic Execution –
that avoids the stalls that can occur while instructions are waiting for their dependencies to resolve,
by providing a large window of instructions from which the execution units can choose.
A 256KB Level 2 Advanced Transfer Cache that provides a 256-bit (32-byte) interface that transfers
data on each core clock, thereby delivering a much higher data throughput channel – 44.8 GBps (32
bytes x 1 data transfer per clock x 1.4 GHz) – for a 1.4GHz Pentium 4 processor.
SIMD Extensions 2 (SSE2) – the latest iteration of Intel's Single Instruction Multiple Data technology –
which integrates 76 new SIMD instructions and improvements to 68 integer instructions, allowing the
chip to grab 128 bits at a time in both floating-point and integer operations, thereby accelerating
CPU-intensive encoding and decoding operations such as streaming video, speech, 3D rendering and
other multimedia procedures.
The industry's first 400MHz system bus, providing a 3-fold increase in throughput compared with
Intel's then-current 133MHz bus.
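The 44.8 GBps figure quoted for the Advanced Transfer Cache follows directly from the numbers given. A quick check (the function name is ours):

```python
# Quick check of the L2 Advanced Transfer Cache bandwidth quoted above:
# bytes per transfer x transfers per clock x core clock (GHz) = GB/s.

def cache_bandwidth_gbps(bytes_per_transfer, transfers_per_clock, core_ghz):
    return bytes_per_transfer * transfers_per_clock * core_ghz

# 32 bytes x 1 transfer per clock x 1.4 GHz
print(round(cache_bandwidth_gbps(32, 1, 1.4), 1))  # 44.8
```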
Based on Intel's ageing 0.18-micron process, the new chip comprised a massive 42 million transistors.
Indeed, the chip‘s original design would have resulted in a significantly larger chip still – and one that was
ultimately deemed too large to build economically at 0.18 micron. Features that had to be dropped from the
Willamette's original design included a larger 16KB Level 1 cache, two fully functional FPUs and 1MB of
external Level 3 cache. What this reveals is that the Pentium 4 really needs to be built on 0.13-micron
technology – something that was to finally happen in early 2002.
The first Pentium 4 shipments – at speeds of 1.4GHz and 1.5GHz – occurred in November 2000. Early
indications were that the new chip offered the best performance improvements on 3D applications – such
as games – and on graphics intensive applications such as video encoding. On everyday office applications
– such as word processing, spreadsheets, Web browsing and e-mail – the performance gain appeared much
less pronounced.
One of the most controversial aspects of the Pentium 4 was its exclusive support – via its associated
chipsets – for Direct Rambus DRAM (DRDRAM). This made Pentium 4 systems considerably more
expensive than systems from rival AMD that allowed use of conventional SDRAM, for little apparent
performance gain. Indeed, the combination of an AMD Athlon CPU and DDR SDRAM outperformed
Pentium 4 systems equipped with DRDRAM at a significantly lower cost.
During the first half of 2001 rival core logic providers SiS and VIA decided to exploit this situation by
releasing Pentium 4 chipsets that did support DDR SDRAM. Intel responded in the summer of 2001 with
the release of its i845 chipset. However, even this climb down appeared half-hearted, since the i845
supported only PC133 SDRAM and not the faster DDR SDRAM. It was not until the beginning of 2002
that the company finally went the whole hog, re-releasing the i845 chipset to extend support to DDR
SDRAM as well as PC133 SDRAM.
During the course of 2001 a number of faster versions of the Pentium 4 CPU were released. The 1.9GHz
and 2.0GHz versions released in the summer of 2001 were available in both the original 423-pin Pin Grid
Array (PGA) socket interface and a new Socket 478 form factor. The principal difference between the two
is that the newer format socket features a much more densely packed arrangement of pins known as a
micro Pin Grid Array (µPGA) interface. It allows both the size of the CPU itself and the space occupied
by the interface socket on the motherboard to be significantly reduced.
2.9 Configuration of PC
It can be safely assumed that an average home computer user uses his computer mostly to send/receive
mails, browse the net, access online applications, watch movies, listen to music, use some desktop
applications and work on documents or spreadsheets, in that order of decreasing frequency of use. Apart
from avid gamers, most users play only simple computer games like card games, puzzles, chess, etc.
A very small percentage of the population would use a home computer for programming.
Except for extreme gaming, rendering animations and heavy duty data processing, most of the processing
requirements of a typical user can be met by an entry level system. Of these three activities only the 2nd
and the 3rd could be considered of any practical value to the system (and to an extent to the user) and these
two are required by a very small percentage of the total population. We could possibly even conclude that
most of the home users could perform most of their normal computer uses using an entry level PC.
Consider the following configuration
AMD Sempron 3000
512 MB DDRII RAM
80 GB Harddisk
DVD Combo Drive
15" CRT Monitor
Multimedia Keyboard
Optical Mouse
Speakers
5. …………..stores information on a hard disk, a rigid platter or stack of platters with a magnetic surface.
(a) CD or DVD drive (b) Hard disk drive
(c) Floppy disk drives (d) None of these
All high performance systems will be parallel computer systems. High-end super computers will be the
Massively Parallel Processing (MPP) systems having thousands of processors interconnected. To perform
well, these parallel systems require an operating system radically different from current ones. Most
researchers in the field of operating systems have found that these new operating systems will have to be
much smaller than traditional ones to achieve the efficiency and flexibility needed. The solution appears to
be to have a new kind of OS that is effectively a compromise between having no OS at all and having a
large monolithic OS that does many things that are not needed.
Until recently, a mainframe system sat at the heart of Grammer‘s IT infrastructure, running a Production
Planning System (PPS) application and other applications for accounting and human resources. Although
Grammer had been very satisfied with this system, it was impossible to reengineer it to include the new,
complex enterprise resource planning applications that were needed to meet the demanding JIT
requirements. Additionally, with the need to adapt to the European Monetary Unit (EMU) in 1999 and the
Year 2000 issues to be faced, Grammer knew that it was time for a technology overhaul.
Enter IBM PC Server Systems and SAP R/3
Gunnar Blodig, IT manager at Grammer AG, set very high standards for the new hardware. It had to offer
not only a high level of performance, power and integration, but also support for the mission-critical SAP
R/3 applications essential in meeting the company's objectives. Grammer's complete deployment combined
13 IBM PC Server 704 systems running Microsoft Windows NT Server, SAP R/3 and various software
applications.
Reliability and manageability were critical for success. IBM‘s high performance Serial Storage
Architecture (SSA) hard disk storage and the IBM PC Server 704‘s proven track record were determining
factors in Grammer‘s decision to revamp its technology. The IBM PC Server 704 offered a strong platform
for Grammer to build its application-serving environment while SAP R/3 made the most sense when it
came to the best plan for its business. ―We were looking for something to cover all fields such as
commercial applications, human resources, or PPS throughout the whole company. International support
for all components was very important to us, because we are a presence in every part of the world,‖ Blodig
explains. SAP R/3 will allow Grammer to meet the demanding universe of close cooperation with motor
vehicle manufacturers and integrate the flow of data and information throughout its worldwide operating
group via datamining, datamarts and datawarehousing.
Questions
1. What are the advantages of IBM PC Server 704?
2. What was the view of Grammer about IBM personal computers?
2.12 Summary
A personal computer is a machine intended for individual use that receives and provides information,
calculates and manipulates data.
Algorithm consists of English like statements. Each statement must be precise and well-defined to
perform a specific operation.
A mouse is a small device used to point to and select items on your computer screen.
A keyboard is used mainly for typing text into your computer. Like the keyboard on a typewriter, it has
keys for letters and numbers, but it also has special keys.
A monitor displays information in visual form, using text and graphics.
A printer transfers data from a computer onto paper.
A modem is a device that sends and receives computer information over a telephone line or high-speed
cable.
Distributed computing systems run on hardware that is provided by many vendors, and use a variety of
standards-based software components.
2.13 Keywords
Algorithm: This is a problem solving technique, defined as a step by step procedure to solve a particular
problem.
Effectiveness: This means that operations must be simple and can be carried out in a finite time at one or more
levels of complexity. The algorithm should be effective whenever traced manually for the results.
Finiteness: It should be a sequence of finite instructions. That is, it should end after a fixed time. It should
not enter into an infinite loop.
Hardware: The physical parts of a computer, which you can see and touch, are collectively called
hardware.
Input: The value entered by a user to the system is called input.
Output: The reply given by the system as the answer of input is called output.
Personal computer (PC): Any general-purpose computer whose size, capabilities, and original sales price
make it useful for individuals, and which is intended to be operated directly by an end-user with no
intervening computer operator.
3.0 Objectives
After studying this chapter, you will be able to:
Understand the Boolean algebra
Understand the binary valued quantities and operator
Explain the basic postulates of Boolean algebra
Explain the theorems of Boolean algebra
Define De Morgan's theorems
3.1 Introduction
Boolean logic forms the basis for computation in modern binary computer systems. You can represent any
algorithm, or any electronic computer circuit, using a system of Boolean equations. This chapter provides a brief
introduction to Boolean algebra, truth tables, canonical representation of Boolean functions, Boolean
function simplification, logic design, combinatorial and sequential circuits, and hardware/software
equivalence.
3.2 Boolean Algebra
That framework is Boolean algebra. This chapter provides only an introduction to Boolean
algebra; refer to dedicated texts for a detailed discussion of the subject.
All arithmetic operations performed with Boolean quantities have but one of two possible outcomes: either
1 or 0. There is no such thing as "2" or "-1" or "1/2" in the Boolean world. It is a world in which all other
possibilities are invalid by fiat. As one might guess, this is not the kind of math you want to use when
balancing a checkbook or calculating current through a resistor. However, Claude Shannon of MIT fame
recognized how Boolean algebra could be applied to on-and-off circuits, where all signals are
characterized as either "high" (1) or "low" (0). His 1938 thesis, titled A Symbolic Analysis of Relay and
Switching Circuits, put Boole's theoretical work to use in a way Boole never could have imagined, giving
us a powerful mathematical tool for designing and analyzing digital circuits.
Caution
1. Remember that in the world of Boolean algebra, there are only two possible values for any quantity
and for any arithmetic operation: 1 or 0.
2. Remember to construct and consult the truth table while designing digital circuits.
Expression Result
1 && 0 False or 0
1 && 4 True or 1
0 && 0 False or 0
Figure: 1
3.3.2 Logical OR operator ||
The || (logical OR) operator indicates whether either operand is true.
In C, if either of the operands has a nonzero value, the result has the value 1. Otherwise, the result has the
value 0. The type of the result is int. Both operands must have an arithmetic or pointer type. The usual
arithmetic conversions on each operand are performed.
In C++ if either operand has a value of true, the result has the value true. Otherwise, the result has the
value false. Both operands are implicitly converted to bool and the result type is bool.
Unlike the | (bitwise inclusive OR) operator, the || operator guarantees left-to-right evaluation of the
operands. If the left operand has a nonzero (or true) value, the right operand is not evaluated.
The following examples show how expressions that contain the logical OR operator are evaluated:
Expression Result
1 || 0 True or 1
1 || 4 True or 1
0 || 0 False or 0
Figure: 2
The following example uses the logical OR operator to conditionally increment y:
++x || ++y;
The expression ++y is not evaluated when the expression ++x evaluates to a nonzero (or true) quantity.
3.4.1 Theorems
We now list a number of theorems of the resulting algebra.
1. a''= a, where a''= (a')'.
2. (aab)=a'.
3. (abc) = (acb).
4. (abc) = (cba) = (bca) = (acb) = (cab) = (bac).
5. [a'(abc)' (a'b'c')']=a.
6. [a(abc)' (a'b'c')']=a'.
7. (abc) = [(abd)'(abd')'c].
8. [d' (abc)'(a'b'c')']=d.
9. If (a'bc) =a for all a, then c = b'.
10. (abc)' = (a'b'c').
………………………………..………………………………………………………………………………
………………………………………………………………………………………………………………
Before introducing any more theorems, it should be pointed out that in applying theorems (1) through (8)
the variable x may actually represent an expression containing more than one variable. For example, if we
have A·B'·(A·B')', we can invoke theorem (4) by letting x = A·B'. Thus, we can say that A·B'·(A·B')' = 0. The
same idea can be applied to the use of any of these theorems.
Case 1. For x = 0, y = 0,
x+xy=x
0+0·0=0
0=0
Case 2. For x = 0, y = 1, x+xy=x
0+0·1=0
0+0=0
0=0
Case 3. For x = 1, y = 0, x+xy=x
1+1·0=1
1+0=1
1=1
Case 4. For x = 1, y = 1, x+xy=x
1+1·1=1
1+1=1
1=1
Theorem (14) can also be proved by factoring and using theorems (6) and (2) as follows:
x+xy=x(1+y)
=x·1 [using theorem (6)]
=x [using theorem (2)]
Example
Simplify the expression y = A·B'·D + A·B'·D'.
Example
Simplify z = (A' + B)(A + B).
Example
Simplify x = ACD + BCD.
………………………………..………………………………………………………………………………
………………………………………………………………………………………………………………
Figure 4 (a): Equivalent circuits implied by theorem (1); (b) alternative symbol for the NOR function.
Figure 5 (a): Equivalent circuits implied by theorem (2); (b) alternative symbol for the NAND function.
What this means is that an AND gate with INVERTERs on each of its inputs is equivalent to a NOR gate.
In fact, both representations are used to represent the NOR function. When the AND gate with inverted
inputs is used to represent the NOR function, it is usually drawn as shown in Figure 3.8(b), where the
small circles on the inputs represent the inversion operation.
Now consider theorem (2),
(x·y)' = x' + y'
The left side of the equation can be implemented by a NAND gate with inputs x and y. The right side can
be implemented by first inverting inputs x and y and then putting them through an OR gate. These two
equivalent representations are shown in Figure 6 (a). The OR gate with INVERTERs on each of its inputs is
equivalent to the NAND gate. In fact, both representations are used to represent the NAND function. When
the OR gate with inverted inputs is used to represent the NAND function, it is usually drawn with small
circles on the inputs to represent the inversion operation.
Example
Simplify the expression z = (A' + C)·(B + D') to one having only single variables inverted.
Example
Determine the output expression for the circuit and simplify it using De Morgan's theorems.
………………………………..………………………………………………………………………………
………………………………………………………………………………………………………………
In the above table, columns 8 and 9 are the same and columns 10 and 11 are the same; hence both of the above
statements are proved.
Note: (1) The duality theorem is useful for producing a new Boolean relation.
(2) The dual of SOP form is POS form and vice versa.
Example
F2 = Σm (0, 3, 5, 6)
Dual of F2 = F2D = m0D m3D m5D m6D
= M7-0 M7-3 M7-5 M7-6
= M7 M4 M2 M1
= m0 + m3 + m5 + m6
= Σm (0, 3, 5, 6) = F2
Hence F2 is a self dual function.
Figure 10: The Truth Table and Logic Symbols for NOR
We can build logic diagrams (which in turn lead to digital circuits) for any Boolean expression.
From Figure 18 it is evident that some minterms (such as A·B) are logically adjacent to one another. This
suggests an approach to simplifying the expression.
What if there are three variables? It is easy if we use three dimensions, then each minterm can be placed in
a box as shown in Figure 19.
Unfortunately, this approach is hard to do on paper and does not generalize to four or more variables.
Instead, we flatten Figure 19 by folding the back half of the cube to the front, giving us the map in Figure
20.
Figure 20: A Three Variable Map.
Note that the first and last columns in Figure 20 are not physically adjacent, though they were adjacent in
Figure 19. We must always remember that the first and last columns of a map are logically adjacent.
Figure: 21
In the top group of Figure 21, each minterm contains A and C' (the right two columns have A=1, and the
top row has C=0), so the term associated with that group is A·C'. In the second group, each minterm
contains B and C, so the term associated with that group is B·C. The sum of these terms gives
A·C' + B·C, which matches the result in Example 18.
Figure: 23
The input and output cards were both selected to be 24Vdc so that they may share a single 24Vdc power
supply. In this case the solenoid valve was wired directly to the output card, while the hydraulic pump was
connected indirectly using a relay (only the coil is shown for simplicity). This decision was primarily made
because the hydraulic pump requires more current than any PLC can handle, but a relay would be
relatively easy to purchase and install for that load. All of the input switches are connected to the same
supply and to the inputs.
Questions
1. Explain the need for a hydraulic press.
2. What is the purpose of the limit switches at the top and bottom of the press?
3.12 Summary
A Karnaugh map is a graphical device used to simplify a logic equation or to convert a truth table to its
corresponding logic circuit in a simple, orderly process.
Once the K map has been filled with 0s and 1s, the sum-of-products expression for the output X can be
obtained by ORing together those squares that contain a 1.
Looping an octet of 1s eliminates the three variables that appear in both complemented and uncomplemented form.
Minimal cost solution is a valid logic design with the minimum number of gates with the minimum
number of inputs.
Don't-care condition: an input-output condition that never occurs during normal operation. Since the
condition never occurs, you can use an X on the Karnaugh map. This X can be a 0 or a 1, whichever
you prefer.
The operation of ternary rejection in Boolean algebra is the operation ( ) given by (abc) = a'b' + b'c' +
c'a'.
3.13 Keywords
AND gate: The AND gate is so named because, if 0 is called "false" and 1 is called "true," the gate acts in
the same way as the logical "and" operator.
Boolean algebra: It is used to help analyze a logic circuit and express its operation mathematically and it
has its own unique identities based on the bivalent states of Boolean variables.
Boolean quantities: It has led to the simple rules of addition and multiplication, and has excluded both
subtraction and division as valid arithmetic operations.
Boolean theorem: It is useful in, simplifying a logic expression that is, in reducing the number of terms in
the expression.
De Morgan's theorems: These are extremely useful in simplifying expressions in which a product or sum
of variables is inverted.
Distributive law: It states that an expression can be expanded by multiplying term by term just the same as
in ordinary algebra.
Identity: It is a statement true for all possible values of its variable or variables.
Inverter: A logical inverter, sometimes called a NOT gate to differentiate it from other types of electronic
inverter devices, has only one input. It reverses the logic state.
Logic gate: A logic gate is an elementary building block of a digital circuit. Most logic gates have two
inputs and one output.
5. Simplify the following Boolean equation and write a ladder logic program to implement it.
4.0 Objectives
After studying this chapter, you will be able to:
Explain the digital and analog operations
Understand the binary data
Explain the Number system
Define the conversion of numbers
Discuss the coding system
Describe the error-detecting codes
4.1 Introduction
We are familiar with the decimal number system in which digits are 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9. The
computer uses binary digits for its operation. In the binary system there are only two digits 0 and 1. The
programmer feeds instructions and data in alphabets and decimal digits. But for the operation of the
computer these are converted to binary bits. This chapter deals with the conversion of binary numbers to
decimal numbers and vice versa. It also deals with the hexadecimal and octal systems. Computer circuitry
processes binary numbers; hexadecimal and octal serve as convenient shorthand notations for them.
Number systems are of two types:
1. Non-positional Number System
2. Positional Number System
is made to determine whether to reset the second MSB flip-flop. The process is repeated down to the LSB,
and at this time the desired number is in the counter. Since the conversion involves operating on one flip-
flop at a time, beginning with the MSB, a ring counter may be used for flip-flop selection.
The successive-approximation method thus is the process of approximating the analog voltage by
trying 1 bit at a time beginning with the MSB. The operation is shown in diagram form in Figure 4.2b. It
can be seen from this diagram that each conversion takes the same time and requires one conversion cycle
for each bit. Thus the total conversion time is equal to the number of bits, n, times the time required for one
conversion cycle. One conversion cycle normally requires one cycle of the clock. As an example, a 10-bit
converter operating with a 1-MHz clock has a conversion time of 10 * 10^-6 s = 10^-5 s = 10 µs.
Since the number system is represented in "sixteens", there are 10 numerals and 6 letters that can be a
value in each position of the base-16 number. Below are the symbols that each position can hold:
Caution
Be careful to use sixteen different symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F for hexadecimal numbers.
2. In the binary system there are only two symbols or possible digit values, ………..
(a) 1 and 2 (b) 8 and 10 (c) 0 and 1 (d) 8 and 2
3. The hexadecimal number system is also known as the base-16 number system, because each
position in the number represents an incremental number with a base of 16.
(a) True (b) False
4. The octal, or base ………., number system is a common system used with computers.
(a) 0 (b) 10 (c) 11 (d) 8
8. When ……………. is transmitted from one location to another there is always the possibility that an
error may occur.
(a) data (b) number (c) hexadecimal (d) None of these.
Example
110012=?10
Solution:
Step 1: Determine column values
16 8 4 2 1
Step 2: Multiply each bit by its column value
16 8 0 0 1
Step 3: Sum up the products
16+8+0+0+1=25
Hence, 110012=2510
Example
47068=? 10
Solution:
Step 1: Determine column values
512 64 8 1
Step 2: Multiply each digit by its column value and sum the products
4*512+7*64+0*8+6*1 = 2048+448+0+6 = 2502
Hence, 47068=250210
Example
1AC16=? 10
Solution: 1AC16=1*162+A *161+C*160
=1*256+10*16+12*1
=256+160+12
=42810
Example
40527=?10
Solution: 40527 =4*73+0*72+5*71+2*70
=4*343+0*49+5*7+2*1
= 1372+0+35+2
= 140910
Example
40526=?10
Solution: 40526=4*63+0*62+5*61+2*60
=4*216+0*36+5*6+2*1
=864+0+30+2
=89610
Example
1AC13=? 10
Solution: 1AC13=1*132+A*131+C*130
=1*169+10*13+12*1
=169+130+12
=31110
4.8.2 Converting from Decimal to another Base (Division-Remainder Technique)
The following steps are used to convert a base 10 (decimal) number to a number in another base
Step 1: Divide the decimal number by the value of the new base.
Step 2: Record the remainder from Step 1 as the rightmost digit (least significant digit) of the new base
number.
Step 3: Divide the quotient of the previous division by the new base.
Step 4: Record the remainder from Step 3 as the next digit (to the left) of the new base number. Repeat
Steps 3 and 4, recording remainders from right to left, until the quotient becomes zero in Step 3.
Note that the last remainder, thus obtained, will be the most significant digit of the new base number.
Example
2510=? 2
Solution:
Step 1: 25/2 = 12 and remainder 1
Step 2: 12/2 = 6 and remainder 0
Step 3: 6/2 = 3 and remainder 0
Step 4: 3/2 = 1 and remainder 1
Step 5: 1/2 = 0 and remainder 1
The remainders are now arranged in the reverse order, making the first remainder the least significant
digit (LSD) and the last remainder the most significant digit (MSD).
Hence, 2510= 110012
4.8.3 Converting from a Base Other Than 10 to another Base Other Than 10
The following steps are used to convert a number in a base other than 10 to a number in another base other than 10:
Step 1: Convert the original number to a base 10 (decimal) number.
Step 2: Convert the decimal number obtained in Step 1 to the new base number.
Example
5456=? 4
Solution: Step 1: Convert from base 6 to base 10
5456=5*62+4*61+5*60
= 5*36+4*6+5*1
=180+24+5
=20910
Step 2: Convert 20910 to base 4 using the division-remainder technique
4 209 Remainder
4 52  1
4 13  0
4 3   1
  0   3
20910=31014
Therefore, 5456=20910=31014
The following example illustrates the method of converting a binary number to an octal number.
Example
1011102=? 8
Solution: Step 1: Divide the binary digits into groups of 3, starting from the right (LSD).
101 110
Step 2: Convert each group into one digit of octal (use binary- to- decimal conversion method).
1012=1*22+0*21+1*20 1102=1*22+1*21+0*20
=4+0+1    =4+2+0
=58       =68
Hence, 1011102=568
Example
5628=? 2
Solution: Step 1: Convert each octal digit to 3 Binary digits
58=1012
68 = 1102
28=0102
Step 2: Combine the binary groups.
5628 = 5 6 2
= 101 110 010
Hence, 5628 = 1011100102
Example:
110100112=? 16
Solution: Step 1: Divide the binary digit into groups of 4, starting from the right (LSD)
1101 0011
Step 2: Convert each group of 4 binary digits to 1 hexadecimal digit.
11012=1*23+1*22+0*21+1*20    00112=0*23+0*22+1*21+1*20
=8+4+0+1                     =0+0+2+1
=1310                        =310
=D16                         =316
Hence, 110100112=D316
Check Your Progress 1
Note: i) Use the space below for your answer.
Ex1: Convert the number 10010110100112=? 16
……………………..…………………………………………………………………………………………
……………………………..…………………………………………………………………………………
……………………………………………………………………………………………………………
Table 4.4 summarizes the relationship among decimal, hexadecimal, binary, and octal number systems.
Note that the maximum value for a single digit of octal (7) is equal to the maximum value of three digits of
binary. The value range of one digit of octal duplicates the value range of three digits of binary. If we
substitute octal digits for binary digits, the substitution is on a one-to-three basis. Hence, computers that
print octal numbers instead of binary, while taking a memory dump, save one-third of printing space and time.
Similarly, note that the maximum value of one digit in hexadecimal is equal to the maximum value of four
digits in binary. Hence, the value range of one digit of hexadecimal is equivalent to the value range of four
digits of binary. Therefore, hexadecimal shortcut notation is a one-to-four reduction in space and time
required for memory dump.
Table 4.4: Relationship among Decimal, Hexadecimal, Binary, and Octal number systems
Decimal Hexadecimal Binary Octal
0 0 0 0
1 1 1 1
2 2 10 2
3 3 11 3
4 4 100 4
5 5 101 5
6 6 110 6
7 7 111 7
8 8 1000 10
9 9 1001 11
10 A 1010 12
11 B 1011 13
12 C 1100 14
13 D 1101 15
14 E 1110 16
15 F 1111 17
16 10 10000 20
Check Your Progress 2
Note: i) Use the space below for your answer.
Ex1: Convert the value 2AFCB16=? 2
………………………..………………………………………………………………………………………
………………………………..………………………………………………………………………………
………………………………………………………………………………………………………………
When you look at figure 4.7, you will notice that the four rightmost bits in EBCDIC are assigned values of
8, 4, 2, and 1. The next four bits to the left are called the zone bits. The EBCDIC coding chart for
uppercase and lowercase alphabetic characters and for the numeric digits 0 through 9 is shown in figure
4.8, with their hexadecimal equivalents. Hexadecimal is a number system used with some computer
systems. It has a base of 16 (0-9 and A-F). A represents 10; B represents 11; C represents 12; D represents
13; E represents 14; and F represents 15. In EBCDIC, the bit pattern 1100 is the zone combination used for
the alphabetic characters A through I, 1101 is used for the characters J through R, and 1110 is the zone
combination used for characters S through Z. The bit pattern 1111 is the zone combination used when
representing decimal digits. For example, the code 11000001 is equivalent to the letter A; the code
11110001 is equivalent to the decimal digit 1. Other zone combinations are used when forming special
characters. Not all of the 256 combinations of 8-bit code have been assigned characters. Figure 4.8
illustrates how the characters DP-3 are represented using EBCDIC.
Figure 4.8: Eight-bit EBCDIC coding chart (including hexadecimal equivalents).
Since one numeric character can be represented and stored using only four bits (8-4-2-1), using an 8-bit
code allows the representation of two numeric characters (decimal digits) as illustrated in figure 4.9.
Representing two numeric characters in one byte (eight bits) is referred to as packing or packed data. By
packing data (numeric characters only) in this way, it allows us to conserve the amount of storage space
required, and at the same time, increases processing speed.
DECIMAL VALUE     9 2        7 3
PACKED EBCDIC     1001 0010  0111 0011
BIT PLACE VALUES  8421 8421  8421 8421
                  BYTE 1     BYTE 2
Figure 4.9: Packed data.
In ASCII, rather than breaking letters into three groups, uppercase letters are assigned codes beginning
with hexadecimal value 41 and continuing sequentially through hexadecimal value 5A. Similarly,
lowercase letters are assigned hexadecimal values of 61 through 7A. The decimal digits 0 through 9 are
assigned the zone code 0011 in ASCII rather than 1111 as in EBCDIC. Figure 4.10 is the ASCII coding
chart showing uppercase and lowercase alphabetic characters and numeric digits 0 through 9.
At this point you should understand how coding systems are used to represent data in both EBCDIC and
ASCII. Regardless of what coding system is used, each character will have an additional bit called a check
bit or parity bit.
Questions
1. What are the differences between Indian and Chinese mathematics?
2. Why was the invention of calculus needed? How did it affect mathematics?
4.10 Summary
A number system is a way of representing a number. Every number system has a base (the number of
digits available).
Changing the number system in which a number is written does not change the value of the number, but
only the manner in which it is represented.
The decimal system is a positional-value system in which the value of a digit depends on its position.
The decimal point separates the positive powers of 10 from the negative powers.
The binary system is a positional-value system, wherein each binary digit has its own value or weight
expressed as a power of 2.
The hexadecimal number system is known as the base-16 number system, because each position in the
number represents an incremental number with a base of 16.
4.11 Keywords
Binary system: It is a positional-value system, where in each binary digit has its own value or weight
expressed as a power of 2.
Digital systems: Digital systems process digital signals which can take only a limited number of values
(discrete steps) usually just two values are used: the positive supply voltage (+Vs) and zero volts (0V).
Hexadecimal number system: It is known as the base-16 number system, because each position in the
number represents an incremental number with a base of 16.
Number system: It is a basis for counting various items. On hearing the word number, all of us immediately
think of the familiar decimal number system with its 10 digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9.
Octal: The octal, or base 8, number system is a common system used with computers. Because of its
relationship with the binary system, it is useful in programming some types of computers.
5.0 Objectives
After studying this chapter, you will be able to:
Discuss about the data organization
Explain the data representation
Describe the binary arithmetic
Explain the character representation
Check the result of binary arithmetic
5.1 Introduction
This chapter discusses several important concepts including the binary and hexadecimal numbering
systems, binary data organization (bits, nibbles, bytes, words, and double words), signed and unsigned
numbering systems, arithmetic, logical, shift, and rotate operations on binary values, bit fields and packed
data, and the ASCII character set. This is basic material and the remainder of this text depends upon your
understanding of these concepts.
5.2.1 Bits
The smallest "unit" of data on a binary computer is a single bit. Since a single bit is capable of representing
only two different values (typically zero or one), you may get the impression that there is only a very small
number of items you can represent with a single bit. That is not true. There are an infinite number of items you can
represent with a single bit. You can represent any two distinct items. Examples include zero or one, true or
false, on or off, male or female, and right or wrong. However, you are not limited to representing binary
data types (that is, those objects which have only two distinct values). You could use a single bit to
represent the numbers 723 and 1,245 or perhaps 6,254 and 5. You could also use a single bit to represent
the colours red and blue. You could even represent two unrelated objects with a single bit. For example,
you could represent the colour red and the number 3,256 with a single bit. You can represent any two
different values with a single bit. However, you can represent only two different values with a single bit.
5.2.2 Nibbles
A nibble is a collection of four bits. It would not be a particularly interesting data structure except for two
items: BCD (binary coded decimal) numbers and hexadecimal numbers. It takes four bits to represent a
single BCD or hexadecimal digit. With a nibble, we can represent up to 16 distinct values. In the case of
hexadecimal numbers, the values 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F are represented with four
bits. BCD uses ten different digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) and requires four bits. In fact, any sixteen
distinct values can be represented with a nibble, but hexadecimal and BCD digits are the primary items we
can represent with a single nibble.
5.2.3 Bytes
Without question, the most important data structure used by the 80x86 microprocessor is the byte. A byte
consists of eight bits and is the smallest addressable datum (data item) on the 80x86 microprocessor. Main
memory and I/O addresses on the 80x86 are all byte addresses. This means that the smallest item that can
be individually accessed by an 80x86 program is an eight-bit value. To access anything smaller requires
that you read the byte containing the data and mask out the unwanted bits. The bits in a byte are normally
numbered from zero to seven using the convention shown in Figure 5.1:
Bit 0 is the low order bit or least significant bit; bit 7 is the high order bit or most significant bit of the byte.
We will refer to all other bits by their number.
Note that a byte also contains exactly two nibbles. (see Figure 5.2)
5.2.4 Words
A word is a group of 16 bits. We will number the bits in a word starting from zero on up to fifteen. The bit
numbering appears as shown in Figure 5.3.
Figure 5.3: Bit numbering of a word.
Like the byte, bit 0 is the low order bit and bit 15 is the high order bit. When referencing the other bits in a
word use their bit position number.
Notice that a word contains exactly two bytes. Bits 0 through 7 form the low order byte, and bits 8 through 15
form the high order byte (see Figure 5.4):
………………………………..………………………………………………………………………………
………………………………………………………………………………………………………………
5.4.1 Addition
Binary addition is performed in the same manner as decimal addition. However, since binary number
system has only two digits, the addition table for binary arithmetic is very simple and consists of only four
entries. The complete table for binary addition is as follows:
0+0=0
0+1=1
1+0=1
1+1=0 plus a carry of 1 to next higher column
Carry-overs are performed in the same manner as in decimal arithmetic. Since 1 is the largest digit in
binary number system, any sum greater than 1 requires a digit to be carried over. For instance, 10 plus 10
binary requires addition of two 1s in the second position. Since 1+ 1= 0 plus a carry-over of 1, the sum of
10 + 10 is 100 in binary.
By repeated use of the above rules, any two binary numbers can be added together by adding two bits at a
time. The following example illustrates the exact procedure.
Example: Add binary numbers 101 and 10 in both decimal and binary forms.
Solution: Binary Decimal
101 5
+10 +2
111 7
5.4.2 Subtraction
The principles of decimal subtraction can as well be applied to subtraction of numbers in other number
systems. It consists of two steps that are repeated for each column of the numbers. The first step is to
determine if it is necessary to borrow. If the subtrahend (the lower digit) is larger than the minuend (the
upper digit), it is necessary to borrow from the column to the left. It is important to note here that the value
borrowed depends upon the base of the number system and is always the decimal equivalent of the base.
Hence, in decimal 10 is borrowed, in binary 2 is borrowed, in octal 8 is borrowed, and in hexadecimal 16 is
borrowed. The second step is simply to subtract the lower value from the upper value. The complete table
for binary subtraction is as follows:
Observe that the only case in which it is necessary to borrow is when 1 is subtracted from 0. The examples
given here illustrate the exact procedure.
Example:
Subtract 011102 from 101012.
Solution:
In the first column (from right to left), 0 is subtracted from 1. No borrow is required in this case and the
result is 1. In the second column, we have to subtract 1 from 0. A borrow is necessary to perform this
subtraction. Hence, a 1 is borrowed from the third column that becomes 2 (binary 10) in the second column
because the base is 2. The third column now becomes 0. Now in the second column, we subtract 1 from 2
giving 1. Since the third column is 0 due to the earlier borrow, we have to subtract 1 from 0 for which borrow
is required. The fourth column contains a 0 and hence, has nothing to borrow. Therefore, we have to
borrow from the fifth column. Borrowing 1 from the fifth column gives 2 in the fourth column and the fifth
column becomes 0. Now the fourth column has something to borrow. When 1 of the 2 in the fourth column is
borrowed, it becomes 2 in the third column and 1 remains in the fourth column. Now in the third column,
we subtract 1 from 2, giving 1. Subtraction of the fourth column is now 1 from 1, giving 0 and in the fifth
column, subtraction is 0 from 0, giving 0. Hence, the result of subtraction is 001112. The result may be
verified by subtracting 1410 (= 011102) from 2110 (= 101012), which gives 710 (= 001112).
5.4.3 Multiplication
Multiplication in binary number system also follows the same general rules as multiplication in decimal
number system. However, learning binary multiplication is a trivial task because the table for binary
multiplication is very short, with only four entries, instead of 100 entries necessary for decimal
multiplication. The complete table for binary multiplication is as follows:
The example illustrates the method of binary multiplication. It is only necessary to copy the multiplicand,
if the digit in the multiplier is 1 and to copy all 0s, if the digit in the multiplier is 0. The ease with which
each step of the operation is performed is apparent.
5.4.4 Division
Once again, division in binary number system is very simple. As in decimal number system (or in any
other number system), division by zero is meaningless. A computer deals with this problem by raising an
error condition called 'Division by zero' error. Hence the complete table for binary division is as follows:
Binary division is performed in a manner similar to decimal division. The rules for binary division are:
1. Start from the left of the dividend.
2. Perform a series of subtractions, in which the divisor is subtracted from the dividend.
3. If subtraction is possible, put a 1 in the quotient and subtract the divisor from the corresponding digits of
dividend.
4. If subtraction is not possible (divisor greater than remainder), record a 0 in the quotient.
5. Bring down the next digit to append to the remainder digits. Proceed as before in a manner similar to long division.
Verify the result by dividing 3310 (1000012) by 610 (1102), which gives a quotient of 510 (1012) and a
remainder of 310(112).
5.5 Character Representation
Character data is not just alphabetic characters, but also numeric characters, punctuation, spaces, etc. Most
keys on the central part of the keyboard (except shift, caps lock) are characters.
As with signed and unsigned integers, characters need to be represented; in particular, they need to be represented in binary. After all, computers store and manipulate only 0s and 1s (and even those 0s and 1s are just abstractions; the implementation is typically voltages).
Unsigned binary and two's complement are used to represent unsigned and signed integers respectively, because they have nice mathematical properties; in particular, you can add and subtract as you would expect.
However, there are no such properties for character data, so assigning binary codes to characters is somewhat arbitrary. The most common character representation is ASCII, which stands for American Standard Code for Information Interchange.
There are two reasons to use ASCII. First, we need some way to represent characters as binary numbers
(or, equivalently, as bit string patterns). There is not much choice about this since computers represent
everything in binary.
If you have noticed a common theme, it is that we need representation schemes for everything. However,
most importantly, we need representations for numbers and characters. Once you have that (and perhaps
pointers), you can build up everything you need.
The other reason we use ASCII is the letter "S" in ASCII, which stands for "standard". Standards are good because they allow for common formats that everyone can agree on.
Unfortunately, there is also the letter "A", which stands for "American". ASCII is clearly biased towards the English-language character set. Other languages may have their own character sets, even though English dominates most of the computing world (at least in programming and software).
The difference in the ASCII codes between an uppercase letter and its corresponding lowercase letter is 20₁₆ (32₁₀). This makes it easy to convert between lowercase and uppercase in hex (or binary).
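That 20₁₆ gap corresponds to a single bit (0x20), so case conversion is just a bit operation (a small illustrative sketch, assuming plain ASCII letters):

```python
# The 20₁₆ (32₁₀) gap between cases means a single bit (0x20) separates
# 'A' (41₁₆) from 'a' (61₁₆). A sketch of case conversion by bit
# manipulation on single ASCII letters:

def to_upper(ch: str) -> str:
    """Convert a lowercase ASCII letter to uppercase by clearing bit 0x20."""
    return chr(ord(ch) & ~0x20)

def to_lower(ch: str) -> str:
    """Convert an uppercase ASCII letter to lowercase by setting bit 0x20."""
    return chr(ord(ch) | 0x20)

print(hex(ord("A")), hex(ord("a")))  # 0x41 0x61 -> they differ by 0x20
print(to_upper("g"), to_lower("G"))  # G g
```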
char as a one-byte int
It turns out that C supports two char types: char (which is usually considered signed) and unsigned char, which is unsigned.
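The practical difference can be sketched by reading the same byte both ways; Python's struct module mirrors C's two views (an illustration of the idea, not C code itself):

```python
# A sketch of the signed/unsigned char distinction: the same byte pattern
# (here 0xFF) reads as -1 when treated as a signed char and as 255 when
# treated as an unsigned char.

import struct

raw = bytes([0xFF])                      # one byte: 1111 1111
signed, = struct.unpack("b", raw)        # 'b' = C signed char
unsigned, = struct.unpack("B", raw)      # 'B' = C unsigned char
print(signed, unsigned)                  # -1 255
```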
Caution
Transcoding could result in character data loss when encodings are incompatible.
5. When arithmetic operations are performed on binary numbers, the results are in 0s and 1s.
(a) True (b) False
The answer (100100₂), interpreted with the sixth bit as the -32₁₀ place, is actually equal to -28₁₀, not the +36₁₀ we should get when +17₁₀ and +19₁₀ are added together. Obviously, this is not correct. What went wrong? The answer lies in the restrictions of the six-bit number field within which we are working. Since the magnitude of the true and proper sum (36₁₀) exceeds the allowable limit for our designated bit field, we have an overflow error. Simply put, six places do not give enough bits to represent the correct sum, so whatever figure we obtain using the strategy of discarding the left-most "carry" bit will be incorrect.
A similar error will occur if we add two negative numbers together to produce a sum that is too low for our six-bit binary field. Let us try adding -17₁₀ and -19₁₀ together to see how this works (or does not work, as the case may be):

  -17₁₀ = 101111₂    -19₁₀ = 101101₂

   1 1111    <--- Carry bits (showing sign bits)
    101111
  + 101101
  --------
  1|011100
    Discard extra bit
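The overflow in both examples can be reproduced in a short sketch (illustrative code; the six-bit field is simulated with masking, and the helper names are my own):

```python
# A sketch reproducing the six-bit two's complement overflow above:
# numbers are added modulo 2**6 (discarding any carry out of the field),
# and a result is negative when its sign bit (the -32 place) is set.

BITS = 6

def to_twos_complement(n: int) -> int:
    """Encode n in a 6-bit field, discarding any carry out of the field."""
    return n & ((1 << BITS) - 1)

def from_twos_complement(v: int) -> int:
    """Decode a 6-bit pattern, treating bit 5 as the -32 place."""
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v

s = to_twos_complement(to_twos_complement(17) + to_twos_complement(19))
print(format(s, "06b"), from_twos_complement(s))   # 100100 -28 (not +36: overflow)

s = to_twos_complement(to_twos_complement(-17) + to_twos_complement(-19))
print(format(s, "06b"), from_twos_complement(s))   # 011100 28 (not -36: overflow)
```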
5.7 Summary
There are 128 defined codes in the ASCII character set. IBM uses the remaining 128 possible values
for extended character codes including European characters, graphic symbols, Greek letters, and math
symbols.
Unsigned binary and two's complement are used to represent unsigned and signed integers respectively.
Character data is at least as important as numeric data. Like numeric data, character data is represented using 0s and 1s.
The most commonly used character representation is ASCII. Unicode is gaining popularity, and should
eventually become the standard character set in programming languages.
No bit overflow error occurs when two numbers of opposite signs are added together.
Bit overflow occurs when the magnitude of a number exceeds the range allowed by the size of the bit
field.
5.8 Keywords
Bits: The smallest "unit" of data on a binary computer is called a single bit. A single bit is capable of representing only two different values, zero or one.
Byte: It consists of eight bits and is the smallest addressable datum (data item) on the 80x86
microprocessor.
Data: It is information that has been translated into a form that is more convenient to move or process.
Nibble: It is a collection of four bits. It would not be a particularly interesting data structure except for two
items: BCD (binary coded decimal) numbers and hexadecimal numbers.
Word: A word is a group of 16 bits. It represents integer values in the range 0 to 65,535 (unsigned) or -32,768 to 32,767 (signed).
6.0 Objectives
After studying this chapter, you will be able to:
Discuss the typing and input device
Explain the pointing input devices
Discuss the scanning input devices
Explain the audio visual input devices
6.1 Introduction
Information and programs are entered into the computer through input devices such as the keyboard and disks, or through other computers via network connections or modems connected to the Internet. Input devices are also used to retrieve information from disks.
6.4.1 Mouse
A mouse is an input device used to control the motion of a pointer on the screen. A mouse has two or three buttons, called the left, right and middle buttons, which are used to perform different functions.
6.4.3 Joystick
A joystick is an input device used mainly to play games on a computer.
5. A ……………. is an input device which is used to control the movement of the pointer to select items on a display screen.
(a) Pointing input device (b) Keyboard device
(c) Scanning device (d) None of these
Caution
While scanning any document, make sure the scanner cover is closed properly, as the scanner's light may harm your eyes.
6.6.3 Microphone
A microphone is an input device that converts sound into a signal that can be fed into a computer. The signal from a microphone is usually analogue, so before it can be processed by a computer, it must be converted into digital data. An analogue-to-digital converter (ADC) is used for this (usually built into the computer's sound card). Many headphones now come with microphones to allow them to be used with chat and phone applications.
Caution
The signal from a microphone is analogue and must be converted into digital data before it is processed by a computer; otherwise, the computer will not respond properly.
6.7 Summary
The work of a computer is characterized by an input-process-output model in which a program
receives input from an input device.
Users employ a variety of input devices to interact with the computer, but most user interfaces today
are based upon a keyboard and a mouse pointing input device.
A keyboard consists of a number of switches and a keyboard controller. The keyboard controller is
built into the keyboard itself.
Keyboard scan codes are sent to the computer via a serial port.
A digital camera records and stores photographic images in digital form that can be fed to a computer for viewing and printing.
6.8 Keywords
Data scanning devices: Input devices used for direct data entry into a computer system from source documents.
Graphics tablet: It consists of a special pen, called a stylus, and a flat pad. The image is created on the monitor screen as the user draws it on the pad with the stylus.
Input device: An electromechanical device that accepts data from the outside world and translates it into a form the computer can interpret.
Keyboard devices: The most commonly used input devices today. They allow data entry into a computer system by pressing a set of keys (labeled buttons) neatly mounted on a keyboard connected to the computer system.
Pointing stick: A pressure-sensitive small nub (similar to a pencil eraser) used like a joystick.
7.0 Objectives
After studying this chapter, you will be able to:
Discuss the output devices
Differentiate the soft and hard copy output
Explain the monitor
Discuss the electrostatic technique
Explain the special purpose output equipments
7.1 Introduction
An output device is a piece of hardware that is used to receive information from a computer. Some devices are both input and output devices and can transfer information in either direction depending on the current situation; a disk drive is an example of an input/output device. Other devices can be used only for output - for example, a printer or a monitor. The monitor is the most commonly used output device. It displays what you have typed or otherwise entered into the computer on the screen in front of you. Monitors can be monochrome (black and white, black and green, or black and amber) or colour. Colour monitors come in various types; each type has a different number of colours to use and a different quality of picture (resolution). The higher the resolution, the better the quality of the picture on the screen. Resolution is measured in pixels: the screen is divided into a grid, and each square on the grid is a pixel.
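The resolution idea above amounts to a simple calculation: the screen is a grid, so the total number of pixels is width times height (a small sketch; the example resolutions are illustrative, not taken from the text):

```python
# The screen is a grid of pixels, so total pixel count is simply
# width x height. Higher resolution means more pixels and a sharper
# picture. (Illustrative resolutions below.)

def pixel_count(width: int, height: int) -> int:
    """Total number of pixels in a width x height grid."""
    return width * height

print(pixel_count(640, 480))    # 307200
print(pixel_count(1920, 1080))  # 2073600 -- higher resolution, better picture
```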
7.2 Output Devices
An output device is an electromechanical device that accepts data from a computer and translates them into
a form suitable for use by outside world (users). Several output devices are available today. They can be
broadly classified into following categories:
1. Monitors
2. Printers
3. Plotters
4. Screen image projectors
5. Voice response systems
7.4 Monitor
Monitors are the most popular output devices used today for producing soft-copy output. They display the
generated output on a television-like screen. A monitor is usually associated with a keyboard, and together
they form a video display terminal (VDT). A VDT (often referred to as just terminal) is the most popular
input/output (I/O) device used with today‘s computers. It serves as both an input and output device. The
keyboard is used for input to a computer and the monitor is used to display the output from the computer.
The name ―terminal‖ comes from the fact that a terminal is at the terminus or end of a communication
path.
Two basic types of monitors used today are the cathode-ray tube (CRT) and the LCD (Liquid Crystal Display) flat panel. CRT monitors work much like a television screen and are used with non-portable computer
systems. On the other hand, LCD flat-panel monitors are thinner, lighter and are used commonly with
portable computer systems like notebook computers. With gradual reduction in price of LCD flat panel
monitors, they are used increasingly with non-portable desktop computer systems also. They are also
preferred because they occupy less table space.
………………………………..………………………………………………………………………………
………………………………………………………………………………………………………………
7.5 Printers
Printers are the most popular output devices used today for producing hard-copy output.
Different types of printers are described below:
7.5.1 Dot-Matrix Printers
Dot-matrix printers are character printers that print one character at a time. They form characters and all
kinds of images as patterns of dots. Figure 7.3 shows how various types of characters can be formed as
patterns of dots. A dot-matrix printer has a print head that moves horizontally (left to right and right to
left) across the paper. The print head contains an array of pins that can be activated independently of each other to extend and strike against an inked ribbon to form patterns of dots on the paper. To print a character, the printer activates the appropriate set of pins as the print head moves horizontally. For faster printing, many dot-matrix printers print both ways - while the print head moves from left to right and while it moves from right to left on the return. This method is called bidirectional printing.
Inkjet printers are slower than dot-matrix printers, with printing speeds ranging from 40 to 300 characters per second. Typically, an inkjet printer is more expensive than a dot-matrix printer. They are preferred when speed of printing is not an important factor.
Chain/band printers are impact printers because they print by hammering the paper and inked ribbon against the characters embossed on the chain/band. Hence, they can be used to produce multiple copies by
using carbon paper or its equivalent. Due to impact printing, chain/band printers are noisy in operation and
often use a cover to reduce the noise level. Printing speeds of chain/band printers are in the range of 400 to
3000 lines per minute.
Caution
In a laser printer, the toner, composed of oppositely charged ink particles, sticks to the drum and is then transferred to the paper, where it is fused permanently.
4. A ……………. storage system consists of a rotating disk coated with a thin metal.
(a) Memory disc (b) Optical disk
(c) Hard disk (d) All of these
7. A color inkjet printer comes with two ink cartridges - black and ……..
(a) Tricolor (b) Blue, green, white
(c) Orange (d) None of these
7.6 Electrostatic Technique
An electrostatic technique (better known as a speech synthesizer) converts text information into spoken sentences. To produce speech, these devices combine basic sound units called phonemes. From given text information, sequences of words are converted into phonemes, amplified, and output through a speaker attached to the system. These devices are still in their infancy because currently they can produce only a limited number of unique sounds with limited vocal inflections and phrasing. However, they are very useful in a wide range of applications:
1. For reading out text information to blind persons. For example, a recently published book may be
scanned using a scanner, converted into text using OCR software, and read out to blind persons using a
speech synthesizer.
2. For allowing persons who cannot speak to communicate effectively. For example, a person with this type of disability simply types the information, and the device converts it into spoken words.
3. For translation systems that convert entered text into spoken words in a selected language. For example, a foreigner coming to India may enter text he/she wants to communicate to an Indian, and the system converts it into spoken words of the selected Indian language.
7.7.3 Plotter
We learnt earlier that dot-matrix, inkjet, and laser printers are capable of producing graphics output. However, engineering design applications like the architectural plan of a building or the design of the mechanical components of an aircraft or a car often require high-quality, perfectly proportioned graphic output on large sheets. The various types of printers discussed above are not suitable for meeting this output requirement. A special type of output device, called a plotter, is used for this purpose. Plotters are an ideal output device for architects, engineers, city planners, and others who need to routinely generate high-precision, hard-copy graphic output of widely varying sizes. Two commonly used types of plotters are the drum plotter and the flatbed plotter.
Drum Plotter
In a drum plotter, the paper on which the design is to be made is placed over a drum that can rotate in both clockwise and anti-clockwise directions to produce vertical motion. The mechanism also consists of one or more penholders mounted perpendicular to the drum's surface. The pen(s) clamped in the holder(s) can move left to right or right to left to produce horizontal motion. A graph-plotting program controls the movements of the drum and pen(s); that is, under computer control, the drum and pen(s) move simultaneously to draw designs and graphs on a sheet placed on the drum. The plotter can also annotate the designs and graphs so drawn by using the pen to draw characters of various sizes. Since each pen is program-selectable, pens having ink of different colors can be mounted in different holders to produce multi-colored designs. Figure 7.9 shows a drum plotter.
Finding a Solution
The ability to use the C9600 Series printer, incorporating Heidelberg's RIP, the world's most advanced
color-conversion processor, has changed this situation completely. The solution directly controls
MICROLINE's colors when processing proofs. It is also equipped with lithography technology. As a result,
it can be used for simple proofing and can also be deployed at the actual printing site. In addition, print
shops can now review work in the same environment throughout the process since many designers are also
OKI users. The solution's processing speed is attractive, too. OKI's printers have a reputation for fast
processing anyway. Now it seems that the engine itself is faster still. In addition, it is very easy to bind-
print with OKI's page printers, enabling rapid delivery to customers. And spotting imposition mistakes is
easier than ever. In Japan, OKI's printers have been widely used as standardized machines at graphic
designers' and prepress sites. In fact, they are known as the first PostScript printers in Japan. And now,
following a further refinement in technology and performance, they are helping to support seamless and
smooth digital workflows.
In the future, the ability of print shops, production firms and project owners to install and use the same brand of printer will help to enable remote color proofing, ultimately resulting in even greater efficiency levels.
Questions
1. What was the main problem with OKI’s C9600 printers?
2. How was the problem with OKI's C9600 printers solved?
7.8 Summary
The plotters annotate the designs and graphs so drawn by using the pen to draw characters of various
sizes.
Monitors are the most popular output device used today for producing soft-copy output.
The drum and pen(s) move simultaneously to draw designs and graphs on a sheet placed on the drum.
Plotters are an ideal output device for architects, engineers, city planners, and others who need to routinely generate high-precision, hard-copy graphic output of widely varying sizes.
Digitizers are used commonly in the area of computer aided design (CAD) by architects and engineers
to design cars, buildings, medical devices, robots, mechanical parts, etc.
7.9 Keywords
Flash memory: A non-volatile storage technology based on electrically erasable programmable read-only memory (EEPROM) chips.
Hot-spot of graphics: The graphics cursor, irrespective of its size and shape, has a pixel-size point that is
considered the point of reference to decide where the cursor is positioned on the screen. This point is called
hot-spot of the graphics cursor.
LCD: LCD stands for Liquid Crystal Display, referring to the technology behind these popular flat panel
monitors.
Plotter: A special type of output device used to produce high-precision, hard-copy graphic output. Plotters are an ideal output device for architects, engineers, city planners, and others who need to routinely generate such output in widely varying sizes.
Terminal: A monitor is associated usually with a keyboard and together they form a video display terminal
(VDT). A VDT (often referred to as just terminal) is the most popular input/output (I/O) device used with
today's computers.
8.0 Objectives
After studying this chapter, you will be able to:
Understand the central processing unit
Discuss the concept of arithmetic and logic unit
Describe the control unit
Explain the registers
Understand the instruction set
Describe processor speed
8.1 Introduction
Central processing unit (CPU) is an older term for processor and microprocessor, the central unit in a computer containing the logic circuitry that performs the instructions of a computer's programs. A CPU, otherwise known as a processor, is an electronic circuit that can execute computer programs. Both the
miniaturization and standardization of CPUs have increased their presence far beyond the limited
application of dedicated computing machines. Modern microprocessors appear in everything from
automobiles to mobile phones. The clock rate is one of the main characteristics of the CPU when
performance is concerned. Clock rate is the fundamental rate in cycles per second (measured in hertz,
kilohertz, megahertz or gigahertz) for the frequency of the clock in any synchronous circuit. A single clock
cycle (typically shorter than a nanosecond in modern non-embedded microprocessors) toggles between a
logical zero and a logical one state.
Engineers are working hard to push the boundaries of the current architectures and are constantly searching
for new ways to design CPUs that tick a little quicker or use slightly less energy per clock. This produces
new cooler CPUs that can run at higher clock rates.
Scientists also continue to search for new designs that allow CPUs to run at the same or a lower clock rate than older CPUs, but which get more instructions completed per clock cycle.
The clock rate of a processor is only useful for providing comparisons between computer chips in the same
processor family and generation. Clock rates can be very misleading since the amount of work different
computer chips can do in one cycle varies. Clock rates should not be used when comparing different
computers or different processor families. Rather, some kind of software benchmarks should be used.
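The point about clock rate versus work per cycle can be made concrete with a toy model: throughput is roughly clock rate times instructions per cycle (IPC). The chips and numbers below are hypothetical, purely for illustration:

```python
# A sketch of why clock rate alone is misleading: useful work per second
# is roughly clock_rate x instructions-per-cycle (IPC). A slower-clocked
# CPU with higher IPC can outperform a faster-clocked one.

def instructions_per_second(clock_hz: float, ipc: float) -> float:
    """Approximate throughput: clock cycles per second times instructions per cycle."""
    return clock_hz * ipc

chip_a = instructions_per_second(3.0e9, 1.0)  # 3.0 GHz, 1 instruction per cycle
chip_b = instructions_per_second(2.0e9, 2.0)  # 2.0 GHz, 2 instructions per cycle
print(chip_a, chip_b)  # the "slower" 2.0 GHz chip completes more instructions
```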
Smartphones are equipped with more advanced embedded chipsets that can do many different tasks
depending on their programming.
The performance of the CPU that is at the core of the chipset is vital for the daily user experience and the
general computing performance of the Smartphone. People tend to use the clock rate of the main CPU to
compare the performance of competing end products. But as we already pointed out, the clock rate of a
processor is only useful for providing performance comparisons between computer chips in the same
processor family and generation. For all other purposes, it's best to use software benchmarks for
determining comparative performance.
The design of the ALU is obviously a critical part of the processor and new approaches to speeding up
instruction handling are continually being developed.
The ALU is the part where actual computations take place. It consists of circuits which perform arithmetic operations (e.g. addition, subtraction, multiplication, division) on data received from memory and which are capable of comparing numbers.
8.5 Registers
A register is a special, high-speed storage area within the CPU. For example, if two numbers are to be
multiplied, both numbers must be in registers, and the result is also placed in a register. (The register can
contain the address of a memory location where data is stored rather than the actual data itself.)
The number of registers that a CPU has and the size of each (number of bits) help determine the power and
speed of a CPU. For example a 32-bit CPU is one in which each register is 32 bits wide. Therefore, each
CPU instruction can manipulate 32 bits of data.
Usually, the movement of data in and out of registers is completely transparent to users, and even to
programmers. Only assembly language programs can manipulate registers. In high-level languages, the
compiler is responsible for translating high-level operations into low-level operations that access registers.
While performing these operations the ALU takes data from the temporary storage area inside the CPU
named registers. Registers are a group of cells used for memory addressing, data manipulation and
processing. Some of the registers are general purpose and some are reserved for certain functions. It is a
high-speed memory which holds only data for immediate processing and results of this processing. If these
results are not needed for the next instruction, they are sent back to the main memory and registers are
occupied by the new data used in the next instruction.
Caution
Before data can be processed, it must be represented in a register.
3. CPUs with a small instruction set, fixed-length instructions, and reduced references to memory to
retrieve operands are said to employ RISC (Reduced Instruction Set Computer) architecture.
(a). True (b) False
5. The number of registers that a CPU has and the size of each (number of bits) help determine the power
and speed of a CPU.
(a). True (b) False
EPIC Processors
The Explicitly Parallel Instruction Computing (EPIC) technology breaks through the sequential nature of
conventional processor architectures by allowing the software to communicate explicitly to the processor
when operations can be done in parallel. For this, it uses tighter coupling between the compiler and the
processor. It enables the compiler to extract maximum parallelism in the original code and explicitly
describe it to the processor. Processors based on EPIC architecture are simpler and more powerful than
traditional CISC or RISC processors. These processors are mainly targeted to next-generation, 64-bit, high-
end server and workstation market (not for personal computer market).
CMOS versions
The 68HC000, the first CMOS version of the 68000, was designed by Hitachi and jointly introduced in
1985. Motorola's version was called the MC68HC000, while Hitachi's was the HD68HC000. The
68HC000 was eventually offered at speeds of 8-20 MHz. Except for using CMOS circuitry, it behaved
identically to the HMOS MC68000, but the change to CMOS greatly reduced its power consumption. The
original HMOS MC68000 consumed around 1.35 watts at an ambient temperature of 25 °C, regardless of
clock speed, while the MC68HC000 consumed only 0.13 watts at 8 MHz and 0.38 watts at 20 MHz.
(Unlike CMOS circuits, HMOS still draws power when idle, so power consumption varies little with clock
rate.) Apple selected the 68HC000 for use in the Macintosh Portable.
Motorola replaced the MC68008 with the MC68HC001 in 1990. This chip resembled the 68HC000 in
most respects, but its data bus could operate in either 16-bit or 8-bit mode, depending on the value of an
input pin at reset. Thus, like the 68008, it could be used in systems with cheaper 8-bit memories.
The later evolution of the 68000 focused on more modern embedded control applications and on-chip
peripherals. The 68EC000 chip and SCM68000 core expanded the address bus to 32 bits, removed the
M6800 peripheral bus, and excluded the MOVE from SR instruction from user mode programs. In 1996,
Motorola updated the standalone core with fully static circuitry drawing only 2 µW in low-power mode,
calling it the MC68SEC000.
Motorola ceased production of the HMOS MC68000 and MC68008 in 1996, but its spin-off company,
Freescale Semiconductor, is still producing the MC68HC000, MC68HC001, MC68EC000, and
MC68SEC000, as well as the MC68302 and MC68306 microcontrollers and later versions of the
DragonBall family. The 68000's architectural descendants, the 680x0, CPU32, and ColdFire families, are
also still in production.
As a microcontroller core
After being succeeded by "true" 32-bit microprocessors, the 68000 was used as the core of many
microcontrollers. In 1989, Motorola introduced the MC68302 communications processor.
Questions
1. Explain the brief history of Motorola 68000 CPU.
2. Discuss the CMOS versions of Motorola 68000.
8.8 Summary
Registers are a group of cells used for memory addressing, data manipulation and processing.
The CPU can be thought of as the "brains" of the device.
The control unit directs and controls the activities of the internal and external devices.
The control unit is the component of the CPU that implements the microprocessor's instruction set.
The three commonly known processor architectures are CISC (Complex Instruction Set Computer),
RISC (Reduced Instruction Set Computer), and EPIC (Explicitly Parallel Instruction Computing).
8.9 Keywords
Arithmetic and logic unit: One of the two basic components of a CPU, the other being the control unit.
Control Unit: A control unit in general is a central (or sometimes distributed but clearly distinguishable)
part of the machinery that controls its operation, provided that a piece of machinery is complex and
organized enough to contain any such unit.
CPU: It is the brain of a computer system. All major calculations and comparisons performed by a
computer are carried out inside its CPU. CPU is also responsible for activating and controlling the
operations of other unit of the computer system. Hence, no other single component of a computer
determines its overall performance as much as its CPU.
Registers: Special, high-speed storage areas within the CPU.
System clock and clock cycles: The CU and ALU perform operations at incredible speed. These operations
are usually synchronized by a built-in electronic clock (known as system clock) that emits millions of
regularly spaced electric pulses per second (known as clock cycles).
9.0 Objectives
After studying this chapter, you will be able to:
Discuss the basic concept of storage and its needs
Explain the Brain versus Memory
Understand about the Storage Evaluation Units
9.1 Introduction
Computer storage devices are used to store huge amounts of data and information permanently. If you want any of your data kept safe and lasting, these devices should be your choice. Usually these kinds of devices are called secondary storage or permanent storage.
When we choose storage devices, we need to understand their characteristics. The three main
characteristics of storage media are access method, capacity and portability. Access method refers to
how data is accessed from storage devices. Sequential and direct are the two kinds of methods used to
access data from secondary devices.
Storage devices are the building blocks of storage in disk subsystems as well as being used as standalone
products in server systems. Disk drive technology is used far more than any other.
The terms main storage and auxiliary storage originated in the days of the mainframe computer to
distinguish the more immediately accessible data storage from storage that required input/output
operations. An earlier term for main storage was core in the days when the main data storage contained
ferrite cores.
Primary storage is sometimes used to mean storage for data that is in active use, in contrast to storage that is used for backup purposes. In this usage, primary storage largely corresponds to what is otherwise called secondary storage. (Although these two meanings conflict, the appropriate meaning is usually apparent from the context.)
Hard disk drives usually have multiple disks, called platters, that are stacked on top of each other and spin
in unison, each with two sides on which the drive stores data. Most drives have two or three platters,
resulting in four or six sides, but some PC hard disks have up to 12 platters and 24 sides with 24 heads to
read them (Seagate Barracuda 180). The identically aligned tracks on each side of every platter together
make up a cylinder. A hard disk drive usually has one head per platter side, with all the heads mounted on
a common carrier device or rack. The heads move radially across the disk in unison; they can't move
independently because they are mounted on the same carrier or rack, called an actuator.
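The geometry just described also determines a drive's raw capacity: cylinders times heads (platter sides) times sectors per track times bytes per sector. A small sketch, with illustrative cylinder/head/sector figures that are not taken from the text:

```python
# A sketch of how CHS geometry determines raw capacity:
# cylinders x heads (platter sides) x sectors-per-track x bytes-per-sector.
# The figures below are illustrative, not a real drive's specification.

def chs_capacity(cylinders: int, heads: int, sectors: int,
                 bytes_per_sector: int = 512) -> int:
    """Raw capacity in bytes of a drive described by CHS geometry."""
    return cylinders * heads * sectors * bytes_per_sector

# e.g. a hypothetical drive: 16383 cylinders, 16 heads, 63 sectors/track
print(chs_capacity(16383, 16, 63))  # 8455200768 bytes, about 8.4 GB
```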
Originally, most hard disks spun at 3,600rpm—approximately 10 times faster than a floppy disk drive. For
many years, 3,600rpm was pretty much a constant among hard drives. Now, however, most drives spin
even faster. Although speeds can vary, modern drives typically spin the platters at either 4,200rpm;
5,400rpm; 7,200rpm; 10,000rpm; or 15,000rpm. Most standard-issue drives found in PCs today spin at
5,400rpm, with high performance models spinning at 7,200rpm. Some of the small 2 1/2'' notebook drives
run at only 4,200rpm to conserve power, and the 10,000rpm or 15,000rpm drives are usually found only in
very high-performance workstations or servers, where their higher prices, heat generation, and noise can be
more easily dealt with. High rotational speeds combined with a fast head-positioning mechanism and more
sectors per track are what make one hard disk faster overall than another.
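The relationship just described between rotational speed, sectors per track, and throughput can be sketched with a rough calculation. The drive geometry below is hypothetical, chosen only for illustration:

```python
# Rough, illustrative estimate of a drive's media transfer rate from its
# rotational speed and sectors per track (geometry values are hypothetical).

def media_transfer_rate(rpm, sectors_per_track, bytes_per_sector=512):
    """Bytes per second streamed from one track at full rotational speed."""
    revolutions_per_second = rpm / 60
    return revolutions_per_second * sectors_per_track * bytes_per_sector

# A 7,200rpm drive with 1,000 sectors per track:
rate = media_transfer_rate(7200, 1000)
print(f"{rate / 1_000_000:.1f} MB/s")  # 61.4 MB/s
```

Doubling either the spin rate or the sectors per track doubles this raw rate, which is why both figure in overall drive performance.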
The heads in most hard disk drives do not (and should not!) touch the platters during normal operation.
However, on most drives, the heads do rest on the platters when the drive is powered off. In most drives,
when the drive is powered off, the heads move to the innermost cylinder, where they land on the platter
surface. This is referred to as contact start stop (CSS) design. When the drive is powered on, the heads
slide on the platter surface as they spin up, until a very thin cushion of air builds up between the heads and
platter surface, causing the heads to lift off and remain suspended a short distance above or below the
platter. If the air cushion is disturbed by a particle of dust or a shock, the head can come into contact with
the platter while it is spinning at full speed. When contact with the spinning platters is forceful enough to
do damage, the event is called a head crash. The result of a head crash can be anything from a few lost
bytes of data to a completely ruined drive. Most drives have special lubricants on the platters and hardened
surfaces that can withstand the daily "takeoffs and landings" as well as more severe abuse.
Some newer drives do not use CSS design and instead use a load/unload mechanism that does not allow
the heads to contact the platters, even when the drive is powered off. First used in the 2 1/2'' form factor
notebook or laptop drives where resistance to mechanical shock is more important, traditional load/unload
mechanisms use a ramp positioned just off the outer part of the platter surface, whereas some newer
designs position the ramp near the spindle. When the drive is powered off or in a power saving mode, the
heads ride up on the ramp. When powered on, the platters are allowed to come up to full speed before the
heads are released down the ramp, allowing the airflow (air bearing) to prevent any head/platter contact.
Because the platter assemblies are sealed and nonremovable, the track densities on the disk can be very
high. Hard drives today have up to 96,000 or more tracks per inch (TPI) recorded on the media (Hitachi
Travelstar 80GN). Head disk assemblies (HDAs), which contain the platters, are assembled and sealed in
clean rooms under absolutely sanitary conditions. Because few companies repair HDAs, repair or
replacement of the parts inside a sealed HDA can be expensive. Every hard disk ever made eventually
fails. The only questions are when the failure will occur and whether your data is backed up.
Optical Disk
An optical disk is mounted on an optical disk drive for reading/writing of information on it. An optical disk
drive contains all the mechanical, electrical, and electronic components for holding an optical disk and for
reading/writing of information on it. That is, it contains the tray on which the disk is kept, read/write laser
beams assembly, and a motor to rotate the disk. Figure 5 shows an optical disk drive.
Zip Drive: A Zip drive is a small, portable disk drive used primarily for backing up and archiving personal
computer files. The trademarked Zip drive was developed and is sold by Iomega Corporation. Zip drives
and disks come in two sizes. The 100 megabyte size actually holds 100,431,872 bytes of data, or the
equivalent of about 70 floppy diskettes. There is also a 250 megabyte drive and disk. The Iomega Zip drive
comes with a software utility that lets you copy the entire contents of your hard drive to one or more Zip
disks.
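The floppy-equivalence figure above can be checked with simple arithmetic (1,474,560 bytes is the formatted capacity of a 1.44MB floppy):

```python
# Checking the Zip-disk figure: a "100MB" Zip disk actually holds
# 100,431,872 bytes; a 1.44MB floppy holds 1,474,560 bytes formatted.
zip_bytes = 100_431_872
floppy_bytes = 1_474_560

print(zip_bytes // floppy_bytes)  # 68 -- i.e. roughly 70 floppies
```

The chapter rounds 68 up to "70 floppy diskettes", which is fair for a back-of-the-envelope comparison.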
Flash Drive: A flash drive is a compact device of the size of a pen. It comes in various shapes and stylish
designs (such as pen shape, wallet shape, etc.), and may have different added features (such as a built-in
camera, or built-in MP3/WMA/FM radio playback for music on the go). It enables easy
transport of data from one computer to another. It is a plug-and-play device that simply plugs into a USB
(Universal Serial Bus) port of a computer. The computer detects it automatically as a removable drive.
One can read, write, copy, delete, and move data from the computer's hard disk drive to the flash drive or
from the flash drive to the hard disk drive. One can even run applications, view videos, or play MP3 files
from it directly. Once done, it can simply be unplugged from the USB port of the computer and kept in a
pocket to be carried anywhere. A flash drive does not require any battery, cable, or software, and is
compatible with most desktop and laptop computers with a USB 2.0 port. All these features make it an
ideal external data storage device for mobile people to carry or transfer data from one computer to another. As the
name implies, it is based on flash memory storage technology. Recall that flash memory is a non-volatile,
Electrically Erasable Programmable Read-Only Memory (EEPROM) chip. It is a solid-state storage device
with a data retention capability of more than 10 years.
Available storage capacities are 8MB, 16MB, 64MB, 128MB, 256MB, 512MB, 1GB, 2GB, 4GB, and
8GB. A flash drive of 8GB capacity has about 5600 times the storage capacity of a 1.44MB floppy disk.
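The capacity comparison can be verified with quick arithmetic; using decimal units, an 8GB drive holds on the order of the 5600 floppies' worth of data cited above (the exact ratio depends on whether decimal or binary units are used):

```python
# Capacity ratio of an 8GB flash drive to a 1.44MB floppy, decimal units.
flash_bytes = 8 * 1000**3     # 8GB
floppy_bytes = 1_440_000      # 1.44MB

print(flash_bytes // floppy_bytes)  # 5555 -- about the 5600x figure quoted
```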
Figure 9.7 shows a flash drive. It has a main body and usually a port connector cover. The cover is
removed or the port connector is pushed out when the drive is to be plugged into the USB port of a computer.
The main body usually has a write protect tab, a read/write LED (Light Emitting Diode) indicator, and a
strap hole. Some manufacturers also provide software to be used with the drive.
Blu-ray Disc: Blu-ray Disc (BD) is a next-generation optical disc format meant for storage of high-
definition video and high-density data. The Blu-ray standard was jointly developed by a group of consumer
electronics and PC companies called the Blu-ray Disc Association (BDA). As compared to the HD DVD
format, its main competitor, Blu-ray has more information capacity per layer, 25 instead of 15 gigabytes,
but may initially be more expensive to produce.
2. Which would be the most appropriate device for storing a 2 hour film?
(a) CD-ROM (b) Zip disk
(c) DVD (d) Hard disk
3. Which would be the most appropriate device for transferring files between school and home?
(a) Zip disk (b) Hard disk
(c) CD-ROM (d) Flash memory stick
4. Which would be the most appropriate to store the computer's BIOS instructions?
(a) Flash memory stick (b) RAM
(c) ROM (d) Hard disk
5. Which of these would be the most likely storage device for a music album?
(a) CD-ROM (b) Hard disk
(c) Flash memory (d) RAM
9.8 Summary
Secondary storage of a computer system is non-volatile and has a low cost per bit stored, but it generally
has an operating speed far slower than that of primary storage.
A computer's main memory is built of volatile RAM chips.
Memory cards operate like a common hard disk but have less storage space, consume less power, and
allow data to be accessed quickly.
Computer memory is organized into a hierarchy. At the highest level are the processor registers.
Any storage unit of a computer system is characterized and evaluated based on the following properties:
storage capacity, access time, cost per bit of storage, volatility, and random access.
The primary storage or main memory of a computer system is made up of several small storage areas
called locations or cells. Each of these locations stores a fixed number of bits, called the word length of
the memory.
Memory storage devices use flash memory technology for secondary storage. Two popular memory
storage devices are the flash drive (pen drive) and the memory card.
9.9 Keywords
Blu-ray Disc: BD is a next-generation optical disc format meant for storage of high-definition video and
high-density data.
Digital Video (or Versatile) Disk: DVD was designed primarily to store and distribute movies. However, it
is fast becoming the mainstream optical disk as prices reduce and the need for large-capacity storage
increases.
Flash Drive: A compact plug-and-play device that enables easy transport of data from one computer to another.
Hard Disk Operations: The basic physical construction of a hard disk drive consists of spinning disks with
heads that move over the disks and store data in tracks and sectors. The heads read and write data in
concentric rings called tracks, which are divided into segments called sectors, which typically store 512
bytes each.
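The tracks/sectors layout just described lends itself to a simple capacity calculation: multiply cylinders, heads (platter sides), and sectors per track by the 512-byte sector size. The geometry below is the classic ATA CHS limit, used here only as an example:

```python
# Illustrative cylinder/head/sector (CHS) capacity calculation.
def disk_capacity(cylinders, heads, sectors_per_track, bytes_per_sector=512):
    """Total bytes addressable with the given CHS geometry."""
    return cylinders * heads * sectors_per_track * bytes_per_sector

# 16,383 cylinders x 16 heads x 63 sectors/track (classic ATA CHS limit):
print(disk_capacity(16383, 16, 63))  # 8455200768 bytes, the familiar ~8.4GB limit
```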
Primary storage: The primary storage of a computer system has limited capacity and is volatile. Hence,
additional memory, called auxiliary memory or secondary storage, is used with most computer systems.
Video Compact Disc: VCD stands for 'Video Compact Disc' and basically it is a CD that contains moving
pictures and sound.
10.0 Objectives
After studying this chapter, you will be able to:
Define software
Explain the type of software
Describe the open source software
Explain the integrated development environment
Understand the need for software
10.1 Introduction
Software is a general term used to describe a collection of computer programs, procedures and
documentation that perform some tasks on an operating system. Software is the way to perform different
tasks electronically. Software is a set of rules to perform a specific task. The software is the information
that the computer uses to get the job done. Software needs to be accessed before it can be used. There are
many terms used for the process of accessing software including running, executing, starting up, opening,
and others. Computer programs allow users to complete tasks. A program can also be referred to as an
application and the two words are used interchangeably.
10.2 Software
A computer cannot do anything on its own. It must be instructed to do a job desired by us. Hence, it is
necessary to specify a sequence of instructions a computer must perform to solve a problem. Such a
sequence of instructions written in a language understood by a computer is called a computer program. A
program controls a computer‘s processing activity, and the computer performs precisely what the program
wants it to do. When a computer is running a program to perform a task, we say it is running or executing
that program. Hardware you can touch; software you cannot.
The term software refers to a set of computer programs, procedures, and associated documents (flowcharts,
manuals, etc.) describing the programs, and how they are to be used.
Software package is a group of programs that solve a specific problem or perform a specific type of job.
For example, a word-processing package may contain programs for text editing, text formatting, drawing
graphics, spelling checking, etc. Hence, a multipurpose computer system, like a personal computer in your
home, has several software packages, one each for every type of job it can perform. Software is a
collection of instructions that enables a user to interact with the computer or have the computer perform
specific tasks for them. Without any software the computer would be useless.
Linkers
A linker or link editor is a program that takes one or more objects generated by a compiler and combines
them into a single executable program.
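What a linker does, resolving the symbols one object file needs against those another defines, can be sketched with a toy model. The dictionaries below are stand-ins for real object files, not an actual object-file format:

```python
# Toy model of symbol resolution: each "object file" defines some symbols
# and needs (references) others defined elsewhere.
obj_a = {"defines": {"main": "<machine code>"}, "needs": ["helper"]}
obj_b = {"defines": {"helper": "<machine code>"}, "needs": []}

symbol_table = {}
for obj in (obj_a, obj_b):
    symbol_table.update(obj["defines"])   # collect every definition

# Any needed symbol with no definition is an unresolved reference.
unresolved = [name for obj in (obj_a, obj_b)
              for name in obj["needs"] if name not in symbol_table]
print(unresolved)  # [] -- every reference resolved, so linking succeeds
```

A real linker additionally patches addresses into the combined machine code, but the symbol-resolution step shown here is the heart of the job.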
Text editors
A text editor is a type of program used for editing plain text files. Text editors are often provided with
operating systems or software development packages, and can be used to change configuration files and
programming language source code.
2. Software refers to a set of computer programs, procedures, and associated documents describing the
programs, and how they are to be used.
(a) True (b) False
3. All computer setups will not require at least a disk drive, display, keyboard, memory, motherboard,
processor, power supply, and video card in order to function properly.
(a) True (b) False
5. ...................are a set of programs that help users in system maintenance tasks, and in performing tasks of
routine nature.
(a) Utility programs (b) Operating system
(c) System programs (d) None of these
Caution
Be cautious before using any software: it should be compatible with the hardware, otherwise it may not run
properly.
Solution
Having a look at the technical challenges, our designers proposed the following solution to Client:
Development and runtime platform was proposed to be Java in order to avoid any kind of platform
variances. Java in itself is a full-fledged operating platform which provides an abstraction layer between
the application and the native operating system, allowing applications to execute in a platform-independent manner.
User Interface was proposed to be built in Java Swing using one of the available Plastic Look and
Feels, in order to overcome any kind of user interface differences across the platforms.
Database employed for this application was selected as HSQLDB. Being a cross-platform database, it
has the ability to operate in both server and embedded modes. Also, being open source and free, it
was the best choice to use at the back end.
As far as productivity tools are concerned, we proposed OpenOffice.org as the solution of choice.
OpenOffice.org is an open source project and has its own Java wrappers in order to make it
programmable. It is also available for major operating systems including Windows, Linux, and Mac.
We proposed a tight integration with the OpenOffice.org suite in order to follow the "going beyond the
platform" objective.
As far as e-mail is concerned, we could have easily integrated the CRM application with Outlook, but
that would result in a Windows-based solution. Therefore we opted to create our own e-mail
engine in order to provide more value to the system.
We also proposed creating add-ons for the application at a later date, in case customization is
required where integration with native productivity tools is needed.
Benefits
The solutions proposed by our designers and thinkers and the implementation of these solutions by our
developers completed the project successfully. Following are the positives of these development efforts:
Platform/Database Independent Solution
On time delivery of the solution.
Objectives were met successfully.
Future plans to start with Web Based and PDA version of the CRM application.
Third Party Tools and Software
We used the following software and tools to develop the software in order to meet all the requirements of
the project:
Development IDE: NetBeans
Development platform: Java
Database Layer: Hibernate
Back end: HSQLDB
GUI: Swing component suites
Other utilities and libraries:
The OpenOffice.org developer API was used to create the client-to-OpenOffice.org bridge.
The Calendar API was used to create the Calendar module of the application.
The Mail API was used to create the E-Mail module.
Jasper Reports was used to create reporting engine of the application.
Question
1. Differentiate between a technical and a business situation.
2. Discuss third party tools and software.
10.7 Summary
Software is a general term used to describe a collection of computer programs, procedures and
documentation that perform some tasks on an operating system.
Computer hardware is any physical device, something that you are able to touch and software is a
collection of instructions and code installed into the computer and cannot be touched.
System software makes the operation of a computer system more effective and efficient. It helps the
hardware components work together, and provides support for the development and execution of
application software (programs).
Personal assistance software allows us to use personal computers for storage and retrieval of our
personal information, as well as planning and management of our schedules, contacts, finances, and
inventory of important items.
Open source software (OSS) is computer software that has its underlying 'source code' made available
under a license.
An integrated development environment (IDE) is a programming environment that has been packaged
as an application program, typically consisting of a code editor, a compiler, a debugger, and a
graphical user interface (GUI) builder.
10.8 Keywords
Application software: It is a set of one or more programs designed to solve a specific problem, or do a
specific task.
Compiler: It is a computer program (or set of programs) that transforms source code written in a
programming language (the source language) into another computer language.
Database software: A database is a collection of related data stored and treated as a unit for information
retrieval purposes.
Integrated development environment (IDE): It is a software application that provides comprehensive
facilities to computer programmers for software development.
Operating Systems: Operating system software takes care of effective and efficient utilization of all
hardware and software components of a computer system.
Software: It is a collection of instructions that enables a user to interact with the computer or have the
computer perform specific tasks for them. Without any software the computer would be useless.
System software: It is a set of one or more programs designed to control the operation and extend the
processing capability of a computer system.
11.0 Objectives
After studying this chapter, you will be able to:
Explain the use of operating system
Discuss the function and types of operating system
Explain the types of reboot
Explain briefly the booting process
11.1 Introduction
Modern general-purpose computers, including personal computers and mainframes, have an operating
system to run other programs, such as application software. Examples of operating systems for personal
computers include Microsoft Windows, Mac OS (and Darwin), UNIX, and Linux. The lowest level of
any operating system is its kernel. This is the first layer of software loaded into memory when a system
boots or starts up. The kernel provides access to various common core services to all other system and
application programs. These services include, but are not limited to: disk access, memory management,
task scheduling, and access to other hardware devices.
As well as the kernel, an operating system is often distributed with tools for programs to display and
manage a graphical user interface (although Windows and the Macintosh have these tools built into the
operating system), as well as utility programs for tasks such as managing files and configuring the
operating system. They are also often distributed with application software that does not relate directly to
the operating system's core function. Kernel designs differ: various camps advocate microkernels, monolithic kernels, and so on.
Operating systems are used on most, but not all, computer systems. The simplest computers, including the
smallest embedded systems and many of the first computers did not have operating systems. Instead, they
relied on the application programs to manage the minimal hardware themselves, perhaps with the aid of
libraries developed for the purpose. Commercially-supplied operating systems are present on virtually all
modern devices described as computers, from personal computers to mainframes, as well as mobile
computers such as PDAs and mobile phones.
11.2.2 Unix-like
The UNIX-like family is a diverse group of operating systems, with several major subcategories including
System V, BSD (Berkeley Software Distribution), and Linux. The name "UNIX" is a trademark of
The Open Group, which licenses it for use with any operating system that has been shown to conform to the
definitions that they have cooperatively developed. The name is commonly used to refer to the large set of
operating systems which resemble the original UNIX. UNIX systems run on a wide variety of machine
architectures. They are used heavily as server systems in business, as well as workstations in academic and
engineering environments. Free software UNIX variants, such as Linux and BSD, are increasingly popular.
They are used in the desktop market as well, for example Ubuntu, but mostly by hobbyists. Some UNIX
variants like HP‘s HP-UX and IBM‘s AIX are designed to run only on that vendor‘s proprietary hardware.
Others, such as Solaris, can run on both proprietary hardware and on commodity x86 PCs. Apple's Mac
OS X, a BSD variant derived from NeXTSTEP, Mach, and FreeBSD, has replaced Apple's
earlier (non-UNIX) Mac OS. Over the past several years, free UNIX systems have supplanted proprietary
ones in most instances. For instance, scientific modeling and computer animation were once the province
of SGI's IRIX; today they are dominated by Linux-based systems.
The team at Bell Labs who designed and developed UNIX went on to develop Plan 9 and Inferno, which
were designed for modern distributed environments. They had graphics built in, unlike UNIX counterparts
that added graphics to the design later. Plan 9 did not become popular because, unlike many UNIX
distributions, it was not originally free.
The hardware-the central processing unit (CPU), the memory, and the input/output (I/O) devices-provides
the basic computing resources. The application programs-such as word processors, spreadsheets,
compilers, and web browsers-define the ways in which these resources are used to solve the computing
problems of the users. The operating system controls and coordinates the use of the hardware among the
various application programs for the various users. The components of a computer system are its hardware,
software, and data.
The operating system provides the means for the proper use of these resources in the operation of the
computer system. Operating systems can be explored from two viewpoints: that of the user and that of the system.
2. The commonly used UNIX commands like date, ls, cat, etc. are stored in
(a) /dev directory (b) /bin and /usr/bin directories
(c) /unix directory (d) /tmp directory
3. When a computer is first turned on or restarted, a special type of absolute loader called ____ is executed
(a) Compile and Go loader (b) Boot loader
(c) Bootstrap loader (d) Relocating loader
4. Which of the following Operating systems is better for implementing a Client-Server network
(a) MS DOS (b) Windows 95
(c) Windows 98 (d) Windows 2000
11.7.4 Multi-user
Multi-user defines operating system or application software that allows concurrent access by multiple users
of a computer. A multi-user operating system allows many different users to take advantage of the
computer‘s resources simultaneously. The operating system must make sure that the requirements of the
various users are balanced, and that each of the programs they are using has sufficient and separate
resources so that a problem with one user does not affect the entire community of users. UNIX, VMS and
mainframe operating systems, such as MVS, are examples of multi-user operating systems. Time-sharing
systems are multi-user systems. Most batch processing systems for mainframe computers may also be
considered "multi-user", to avoid leaving the CPU idle while it waits for I/O operations to complete.
………………………………..………………………………………………………………………………
………………………………………………………………………………………………………………
11.8 Summary
An operating system (OS) is a software program that manages the hardware and software resources of a computer.
A better method of implementing multitasking is for an operating system to employ preemptive multitasking.
Multiprogramming is the interleaved execution of two or more different and independent programs by a
computer.
The operating system provides strict system security.
An operating system is an integrated set of programs that controls the resources (CPU, memory, I/O
devices, etc.) of a computer system.
11.9 Keywords
Asymmetric multiprocessing: Asymmetric hardware systems commonly dedicate individual processors
to specific tasks.
Multitasking: An operating system that utilizes multitasking is one that allows more than one program to
run simultaneously.
Operating System: An operating system (OS) is a software program that manages the hardware and
software resources of a computer.
Real time operating system (RTOS): Real-time operating systems are used to control machinery, scientific
instruments and industrial systems such as embedded systems.
Symmetric multiprocessing: SMP involves a multiprocessor computer architecture where two or more
identical processors can connect to a single shared main memory.
UNIX-like: The UNIX-like family is a diverse group of operating systems, with several major subcategories
including System V, BSD, and Linux.
12.0 Objectives
After studying this chapter, you will be able to:
Discuss the versions of DOS
Explain the DOS
Explain the DOS system files
Discuss about the DOS commands
12.1 Introduction
DOS stands for "Disk Operating System." DOS was the first operating system used by IBM-
compatible computers. It was originally available in two versions that were essentially the same, but
marketed under two different names. "PC-DOS" was the version developed by IBM and sold to the first
IBM-compatible manufacturers. "MS-DOS" was the version that Microsoft bought the rights to, and was
bundled with the first versions of Windows.
DOS uses a command line, or text-based interface, that allows the user to type commands. By typing
simple instructions such as dir (list directory contents) and cd (change directory), the user can browse
the files on the hard drive, open files, and run programs. While the commands are simple to type, the user
must know the basic commands in order to use DOS effectively (similar to Unix). This made the operating
system difficult for novices to use, which is why Microsoft later bundled the graphic-based Windows
operating system with DOS.
The first versions of Windows (through Windows 95) actually ran on top of the DOS operating system.
This is why so many DOS-related files (such as .INI, .DLL, and .COM files) are still used by Windows.
However, the Windows operating system was rewritten for Windows NT (New Technology), which
enabled Windows to run on its own, without using DOS. Later versions of Windows, such as Windows
2000, XP, and Vista, also do not require DOS.
DOS is still included with Windows, but is run from the Windows operating system instead of the other
way around. The DOS command prompt can be opened in Windows by selecting "Run..." from the Start
Menu and typing cmd.
To specify the same path at the command prompt, you would type it as shown in the illustration:
Caution
Without using an emulator, DOS applications cannot run under 64-bit versions of Windows, such as the
Windows XP x64, Windows Vista x64, and Windows 7 x64 editions, as these do not contain NTVDM;
16-bit DOS applications also cannot run directly because COMMAND.COM is missing.
The File Allocation Table (FAT) file system uses a file allocation table which records which clusters are
used and unused, and where files are located within the clusters.
NTFS is a file system introduced by Microsoft and it has a number of advantages over the previous file
system, named FAT32 (File Allocation Table).
One major advantage of NTFS is that it includes features to improve reliability. For example, the new
technology file system includes fault tolerance, which automatically repairs hard drive errors without
displaying error messages. It also keeps detailed transaction logs, which tracks hard drive errors. This can
help prevent hard disk failures and makes it possible to recover files if the hard drive does fail.
NTFS also allows permissions (such as read, write, and execute) to be set for individual directories and
files.
3. The maximum length allowed for primary name of a computer file under DOS is …………
(a) 8 (b) 12 (c) 3 (d) None of these.
6. What is the name given to something that the computer will automatically use unless you tell it
otherwise?
(a) a specification (b) a wildcard (c) a default (d) a rule.
12.7 Summary
The operating system is used for operating the system or the computer. It is a set of computer programs
and is also known as DOS.
Basically, DOS is the medium through which the user and external devices attached to the system
communicate with the system.
In DOS, programs are started by typing their name into the command line. A directory is just like a file
folder, which contains all the logically related files. Data is stored in individual 512-byte sectors on the
hard disk, but for allocation purposes the hard disk is broken into larger pieces called clusters or,
alternatively, allocation units.
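The sector/cluster relationship just described can be illustrated with a short calculation. The 4KB cluster size below (eight 512-byte sectors) is only an assumed example; real cluster sizes depend on the file system and partition size:

```python
import math

# A file always occupies whole clusters, so part of its last cluster is
# wasted ("slack"). Assume 4KB clusters, i.e. eight 512-byte sectors.
def clusters_used(file_size, cluster_size=4096):
    """Number of whole clusters a file of file_size bytes occupies."""
    return math.ceil(file_size / cluster_size)

file_size = 10_000                      # a 10,000-byte file
on_disk = clusters_used(file_size) * 4096
print(on_disk - file_size)              # 2288 bytes of slack
```

Larger clusters waste more space per file but keep the allocation table smaller, which is the trade-off DOS-era file systems had to make.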
The modern DOS operating system is distributed on 3-5 high density floppy disks.
12.8 Keywords
Backup: It lets the user back up hard disk files to floppies.
File Allocation Table (FAT): A file system that uses a table to record which clusters are used and
unused, and where files are located within the clusters.
NTFS: A file system introduced by Microsoft that has a number of advantages over the previous file
system, FAT32 (File Allocation Table).
Hard Disk Drive: A hard disk drive is a device for storing and retrieving digital information, primarily
computer data.
Path: It is used to search for the executable files in the directories specified.
Prompt: It changes the appearance of the command prompt or displays the current prompt.
12.9 Review questions
1. What is the history of disk operating system?
2. How many versions of the disk operating system are there?
3. What is the physical structure of disk? Explain the disk name.
4. Discuss about the FAT file system.
5. What is the common DOS Windows file?
6. What are the rules for DOS file and directory name creation?
7. Discuss about the Long File Names (LFNS).
8. Describe the steps in the DOS boot process.
9. How many types of files are in the core DOS operating system?
10. What are the DOS Commands? Explain briefly.
13.0 Objectives
After studying this chapter, you will be able to:
Explain data, information, and knowledge.
Discuss the characteristics of information.
Discuss the comparison between human language and computer language.
Define the program and programming language
Explain the program development cycle and algorithms.
Discuss the program flowcharts.
Define the pseudocode.
Explain the approaches and programming paradigms.
Explain the types of programming language.
Discuss about the third/fourth generation language.
13.1 Introduction
A program is a set of instructions that tell the computer to do various things; sometimes the instruction it
has to perform depends on what happened when it performed a previous instruction. This section gives an
overview of the two main ways in which you can give these instructions, or "commands" as they are
usually called. One way uses an interpreter, the other a compiler. As human languages are too difficult for
a computer to understand in an unambiguous way, commands are usually written in one or another
language specially designed for the purpose.
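The interpreter/compiler distinction can be glimpsed with Python's own built-ins, which expose a translation step (compile) separate from an execution step (exec). This is only a small illustration, not how every language toolchain works:

```python
# compile() translates source text into a code object once;
# exec() then runs that code object.
source = "total = sum(range(10))"

code_object = compile(source, "<example>", "exec")  # "compilation" step
namespace = {}
exec(code_object, namespace)                        # "execution" step
print(namespace["total"])                           # 45
```

A true compiler writes the translated program out for later runs, whereas an interpreter repeats the translate-and-run cycle each time; the two steps above are simply made visible here.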
13.2.2 Information
Information is a flow of messages. The patterns and relationships in the data are pointed out and discussed.
To become informative, data must be put into a context and linked to other data.
Information can be considered as an aggregation of data (processed data) which makes decision
making easier.
Information has usually got some meaning and purpose.
13.2.3 Knowledge
Knowledge is a multifaceted concept with multilayered meanings. The history of philosophy since the
classical Greek period can be regarded as a never-ending search for the meaning of knowledge.
By knowledge we mean human understanding of a subject matter that has been acquired through
proper study and experience.
Knowledge is usually based on learning, thinking, and proper understanding of the problem area.
Knowledge is not information and information is not data.
Knowledge is derived from information in the same way information is derived from data.
We can view it as an understanding of information based on its perceived importance or relevance to a
problem area.
It can be considered as the integration of human perceptive processes that helps them to draw
meaningful conclusions.
13.3 Characteristics of Information
Good information is that which is used and which creates value. Experience and research show that good
information has numerous qualities. Good information is relevant for its purpose, sufficiently accurate for
its purpose, complete enough for the problem, reliable, and targeted to the right person. It is also
communicated in time for its purpose, contains the right level of detail, and is communicated by an
appropriate channel, i.e. one that is understandable to the user.
Further details of these characteristics related to organizational information for decision-making follows.
13.3.1 Availability/Accessibility
Information should be easy to obtain or access. Information kept in a book of some kind is only available
and easy to access if you have the book to hand. A good example of availability is a telephone directory, as
every home has one for its local area. It is probably the first place you look for a local number. But
nobody keeps the whole country's telephone books, so for numbers further afield you probably phone a
directory enquiry number. For business premises, say for a hotel in London, you would probably use the
Internet.
Businesses used to keep customer details on a card-index system at the customer's branch. If the customer
visited a different branch, a telephone call would be needed to check details. Now, with centralized
computer systems, businesses like banks and building societies can access any customer's data from any
branch.
13.3.2 Accuracy
Information needs to be accurate enough for the use to which it is going to be put. To obtain information
that is 100% accurate is usually unrealistic as it is likely to be too expensive to produce on time. The
degree of accuracy depends upon the circumstances. At operational levels information may need to be
accurate to the nearest penny on a supermarket till receipt, for example. Accuracy is important. As an
example, if government statistics based on the last census wrongly show an increase in births within an
area, plans may be made to build schools and construction companies may invest in new housing
developments. In these cases any investment may not be recouped.
13.3.4 Relevance/Appropriateness
Information should be relevant to the purpose for which it is required. It must be suitable. What is relevant
for one manager may not be relevant for another. The user will become frustrated if information contains
data irrelevant to the task in hand.
For example, a market research company may give information on users‘ perceptions of the quality of a
product. This is not relevant for the manager who wants to know opinions on relative prices of the product
and its rivals. The information gained would not be relevant to the purpose.
13.3.5 Completeness
Information should contain all the details required by the user. Otherwise, it may not be useful as the basis
for making a decision. For example, if an organization is supplied with information regarding the costs of
supplying a fleet of cars for the sales force, and servicing and maintenance costs are not included, then a
costing based on the information supplied will be considerably underestimated.
Ideally all the information needed for a particular decision should be available. However, this rarely
happens; good information is often incomplete. To meet all the needs of the situation, you often have to
collect it from a variety of sources.
13.3.7 Presentation
The presentation of information is important to the user. Information can be more easily assimilated if it is
aesthetically pleasing. For example, a marketing report that includes graphs of statistics will be more
concise as well as more aesthetically pleasing to the users within the organization. Many organizations use
presentation software and show summary information via a data projector. These presentations have
usually been well thought out to be visually attractive and to convey the correct amount of detail.
13.3.8 Timing
Information must be on time for the purpose for which it is required. Information received too late will be
irrelevant. For example, if you receive a brochure from a theatre and notice there was a concert by your
favorite band yesterday, then the information is too late to be of use.
Symbols
A typical flowchart has the following kinds of symbols:
Start and end symbols: Represented as circles, ovals or rounded rectangles, usually containing the word
"Start" or "End", or another phrase signaling the start or end of a process, such as "submit enquiry" or
"receive product".
Arrows: Showing "flow of control". An arrow coming from one symbol and ending at another symbol
represents that control passes to the symbol the arrow points to.
Generic processing steps: Represented as rectangles. Examples: "Add 1 to X"; "replace identified part";
"save changes" or similar.
Subroutines: Represented as rectangles with double-struck vertical edges; these are used to show complex
processing steps which may be detailed in a separate flowchart. Example: PROCESS-FILES. One subroutine
may have multiple distinct entry points or exit flows; if so, these are shown as labeled 'wells' in the
rectangle, and control arrows connect to these 'wells'.
Input/Output: Represented as a parallelogram. Examples: Get X from the user; display X.
Prepare conditional: Represented as a hexagon. Shows operations which have no effect other than
preparing a value for a subsequent conditional or decision step.
Conditional or decision: Represented as a diamond (rhombus) showing where a decision is necessary,
commonly a Yes/No question or True/False test. The conditional symbol is peculiar in that it has two
arrows coming out of it, usually from the bottom point and right point, one corresponding to Yes or True,
and one corresponding to No or False. (The arrows should always be labeled.) More than two arrows can
be used, but this is normally a clear indicator that a complex decision is being taken, in which case it may
need to be broken down further or replaced with the "predefined process" symbol.
Junction symbol: Generally represented with a black blob, showing where multiple control flows converge
in a single exit flow. A junction symbol will have more than one arrow coming into it, but only one going
out. In simple cases, one may simply have an arrow point to another arrow instead. These are useful to
represent an iterative process (what in Computer Science is called a loop). A loop may, for example,
consist of a connector where control first enters, processing steps, a conditional with one arrow exiting the
loop, and one going back to the connector. For additional clarity, wherever two lines accidentally cross in
the drawing, one of them may be drawn with a small semicircle over the other, showing that no junction is
intended.
Labeled connectors: Represented by an identifying label inside a circle. Labeled connectors are used in
complex or multi-sheet diagrams to substitute for arrows. For each label, the "outflow" connector must
always be unique, but there may be any number of "inflow" connectors. In this case, a junction in control
flow is implied.
Concurrency symbol: Represented by a double transverse line with any number of entry and exit arrows.
These symbols are used whenever two or more control flows must operate simultaneously. The exit flows
are activated concurrently when all of the entry flows have reached the concurrency symbol. A
concurrency symbol with a single entry flow is a fork; one with a single exit flow is a join.
Data-flow extensions: A number of symbols have been standardized for data flow diagrams to represent
data flow, rather than control flow. These symbols may also be used in control flow charts (e.g. to
substitute for the parallelogram symbol).
A Document represented as a rectangle with a wavy base;
A Manual input represented by a quadrilateral, with the top irregularly sloping up from left to right. An
example would be to signify data entry from a form;
A Manual operation represented by a trapezoid with the longest parallel side at the top, to represent an
operation or adjustment to process that can only be made manually.
A Data File represented by a cylinder.
Types of flowchart
Flowcharts can be modelled from the perspective of different user groups, and there are four general
types:
Document flowcharts, showing controls over a document-flow through a system
Data flowcharts, showing controls over a data-flow in a system
System flowcharts showing controls at a physical or resource level
Program flowchart, showing the controls in a program within a system
Figure.13.1 A flowchart for computing the factorial of N (10!) where N! = (1*2*3*4*5*6*7*8*9*10)
Caution
An algorithm is a precise list of precise steps; the order of computation will always be critical to the
functioning of the algorithm.
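The loop in Figure 13.1 can also be traced in code. The following minimal Python sketch mirrors the boxes of a typical factorial flowchart (initialization, decision diamond, processing steps, output); the variable names are illustrative, not taken from the figure:

```python
# Compute N! following the flowchart structure:
# start -> initialize -> decision -> process -> loop back -> output
def factorial(n):
    result = 1              # process box: "Set RESULT = 1"
    i = 1                   # process box: "Set I = 1"
    while i <= n:           # decision diamond: "Is I <= N?"
        result = result * i # process box: "RESULT = RESULT * I"
        i = i + 1           # process box: "I = I + 1"
    return result           # output parallelogram: "Display RESULT"

print(factorial(10))        # 3628800, i.e. 1*2*3*...*10
```

Note how the order of the two process boxes inside the loop matters: incrementing I before the multiplication would compute a different product.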
13.8 Pseudocode
In computer science and numerical computation, pseudocode is a compact and informal high-level
description of the operating principle of a computer program or other algorithm. It uses the structural
conventions of a programming language, but is intended for human reading rather than machine reading.
Pseudocode typically omits details that are not essential for human understanding of the algorithm, such as
variable declarations, system-specific code and some subroutines. The programming language is
augmented with natural-language descriptions of details, where convenient, or with compact mathematical
notation. The purpose of using pseudocode is that it is easier for people to understand than conventional
programming language code, and that it is an efficient and environment-independent description of the key
principles of an algorithm. It is commonly used in textbooks and scientific publications that are
documenting various algorithms, and also in planning of computer program development, for sketching out
the structure of the program before the actual coding takes place.
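For example, the factorial algorithm used elsewhere in this chapter could be written in pseudocode as follows (the keywords shown are a common convention, not a fixed standard):

```
BEGIN
    READ N
    SET RESULT := 1
    FOR I := 1 TO N DO
        SET RESULT := RESULT * I
    END FOR
    WRITE RESULT
END
```

Notice that declarations and input validation are omitted: only the steps a human needs in order to understand the algorithm are kept.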
It can be shown that anything solvable using one of these paradigms can be solved using the others;
however, certain types of problems lend themselves more naturally to specific paradigms.
13.10.1 Imperative
The imperative programming paradigm assumes that the computer can maintain, through environments of
variables, any changes in a computation process. Computations are performed through a guided sequence
of steps, in which these variables are referred to or changed. The order of the steps is crucial, because a
given step will have different consequences depending on the current values of variables when the step is
executed.
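A short sketch of the imperative style in Python (illustrative only): the program is an explicit sequence of steps that read and overwrite variables, so reordering the steps would change the outcome.

```python
# Imperative style: state (total, count) is mutated step by step.
total = 0
count = 0
for mark in [70, 85, 90]:
    total = total + mark   # change state: accumulate the marks
    count = count + 1      # change state: order matters relative to total
average = total / count    # the result depends on the current variable values
print(average)             # 81.666...
```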
Imperative Languages
Popular programming languages are imperative more often than they are any other paradigm studied in this
course. There are two reasons for such popularity:
The imperative paradigm most closely resembles the actual machine itself, so the program is much
closer to the machine;
Because of such closeness, the imperative paradigm was the only one efficient enough for widespread use
until recently.
Advantages
Efficient
Close to the machine
Popular
Familiar
Disadvantages
The semantics of a program can be complex to understand or prove, because referential
transparency does not hold (due to side effects).
Side effects also make debugging harder.
Abstraction is more limited than with some paradigms.
Order is crucial, which does not always suit the problem at hand.
13.10.2 Logical
The Logical Paradigm takes a declarative approach to problem-solving. Various logical assertions about a
situation are made, establishing all known facts. Then queries are made. The role of the computer becomes
maintaining data and logical deduction.
Logical Paradigm Programming
A logical program is divided into three sections:
A series of definitions/declarations that define the problem domain
Statements of relevant facts
Statement of goals in the form of a query
Any deducible solution to a query is returned. The definitions and declarations are constructed entirely
from relations, i.e. X is a member of Y, or X is in the interval between a and b, etc.
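Prolog is the best-known language of this paradigm. As an illustration only (the text does not prescribe a language), the same flavor can be mimicked in Python: facts are stated as data, a rule is a relation over those facts, and a query simply asks what satisfies the relation, with no step-by-step recipe.

```python
# Facts: the "parent" relation stated as data, not as procedure.
parent = {("tom", "bob"), ("bob", "ann"), ("bob", "pat")}

# Rule: X is a grandparent of Z if X is a parent of some Y
# who is in turn a parent of Z.
def grandparents(z):
    return {x for (x, y) in parent
              for (y2, z2) in parent
              if y == y2 and z2 == z}

# Query: who are the grandparents of "ann"?
print(grandparents("ann"))   # {'tom'}
```

The system (here, the set comprehension) does the deduction; the programmer only states facts, a rule, and a goal.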
Advantages
The advantages of logic-oriented programming are as follows:
The system solves the problem, so the programming steps themselves are kept to a minimum;
Proving the validity of a given program is simple.
13.10.3 Functional
The Functional Programming paradigm views all subprograms as functions in the mathematical sense:
informally, they take in arguments and return a single solution. The solution returned is based entirely on
the input, and the time at which a function is called has no relevance. The computational model is therefore
one of function application and reduction.
Languages
Functional languages are created based on the functional paradigm. Such languages permit functional
solutions to problems by permitting a programmer to treat functions as first-class objects (they can be
treated as data, assumed to have the value of what they return; therefore, they can be passed to other
functions as arguments or returned from functions).
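As a small sketch (in Python, which supports but does not enforce the functional style), functions being first-class objects means they can be passed as arguments and returned as results like any other value:

```python
# Functions as first-class objects: compose returns a brand-new function.
def compose(f, g):
    return lambda x: f(g(x))   # the result is itself a function

double = lambda x: x * 2
increment = lambda x: x + 1

double_then_increment = compose(increment, double)
print(double_then_increment(5))   # increment(double(5)) = 11

# No shared state is assigned: each call's result depends only on its
# input, so calls could be evaluated in any order (or in parallel).
```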
Advantages
The following are desirable properties of a functional language:
The high level of abstraction, especially when functions are used, suppresses many of the details of
programming and thus removes the possibility of committing many classes of errors.
The lack of dependence on assignment operations, allowing programs to be evaluated in many
different orders. This evaluation order independence makes function-oriented languages good
candidates for programming massively parallel computers.
The absence of assignment operations makes the function-oriented programs much more amenable to
mathematical proof and analysis than are imperative programs, because functional programs possess
referential transparency.
Disadvantages
Perhaps less efficiency.
Problems involving many variables or a lot of sequential activity are sometimes easier to handle
imperatively or with object-oriented programming.
13.10.4 Object-Oriented
Object Oriented Programming (OOP) is a paradigm in which real-world objects are each viewed as
separate entities having their own state which is modified only by built in procedures, called methods.
Because objects operate independently, they are encapsulated into modules which contain both local
environments and methods. Communication with an object is done by message passing.
Objects are organized into classes, from which they inherit methods and instance variables. The object-
oriented paradigm provides key benefits of reusable code and code extensibility.
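A minimal sketch of these ideas in Python (the class names are illustrative): each object keeps its own state, that state is modified only through methods, and a subclass inherits methods from its class.

```python
class Account:
    """Encapsulation: the balance is modified only via methods."""
    def __init__(self, balance=0):
        self._balance = balance       # the object's local environment

    def deposit(self, amount):        # method = message the object accepts
        self._balance += amount

    def balance(self):
        return self._balance

class SavingsAccount(Account):
    """Inherits deposit() and balance(), adds a method of its own."""
    def add_interest(self, rate):
        self._balance += self._balance * rate

acct = SavingsAccount(100)
acct.deposit(50)          # message passing: inherited method
acct.add_interest(0.10)   # (100 + 50) plus 10% interest
print(acct.balance())
```

Code reuse comes from inheritance (deposit is written once in Account) and extensibility from the ability to add SavingsAccount without touching Account.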
4. Most popular object-oriented programming languages include ………., C#, C++, and Python.
(a) Java (b) None of these
(c) Visual Basic (d) Both of these
A third generation language improves over a second generation language by having the computer take care
of non-essential details, not the programmer. "High-level language" is a synonym for third-generation
programming language.
First introduced in the late 1950s, FORTRAN, ALGOL, and COBOL are early examples of this sort of
language. Most popular languages today, such as C, C++, C#, Java, BASIC and Delphi, are also third-
generation languages.
Most 3GLs support structured programming. A fourth-generation programming language (4GL) is a
programming language or programming environment designed with a specific purpose in mind, such as the
development of commercial business software. In the history of computer science, the 4GL followed the
3GL in an upward trend toward higher abstraction and statement power. The 4GL was followed by efforts
to define and use a 5GL.
The natural-language, block-structured mode of the third-generation programming languages improved the
process of software development. However, 3GL development methods can be slow and error-prone. It
became clear that some applications could be developed more rapidly by adding a higher-level
programming language and methodology which would generate the equivalent of very complicated 3GL
instructions with fewer errors. In some senses, software engineering arose to handle 3GL development.
4GL and 5GL projects are more oriented toward problem solving and systems engineering. All 4GLs are
designed to reduce programming effort, the time it takes to develop software, and the cost of software
development.
A quantitative definition of 4GL has been set by Capers Jones, as part of his work on function point
analysis. Jones defines the various generations of programming languages in terms of developer
productivity, measured in function points per staff-month. A 4GL is defined as a language that supports
12–20 function points per staff month. This correlates with about 16–27 lines of code per function point
implemented in a 4GL.
13.13 Summary
Computer software is a set of programming instructions. Before starting coding, programmers must
understand the user requirements and the flow of logic of the program.
Assembly language is easier to use than machine language, as a programmer can use mnemonic
symbols to represent program instructions.
Fourth-generation languages free programmers from worrying about the procedures to be followed to
solve a problem
The design is then broken down into modules to facilitate programming.
13.14 Keywords
Compiler: It supports the assembler instructions. If an exceptional speed of execution of a part of a code is
required, and the user possesses the corresponding knowledge of the microcontroller architecture and
assembler instructions, then the critical part of the program could be written in the assembler (user-
optimized parts of the code).
Modularity: SDF definitions can be modular because they accept all context-free languages, including the
ambiguous ones. This will help you compose embedded languages and deal with language dialects in a
natural manner.
Object-oriented programs: The designer specifies both the data structures and the types of operations that
can be applied to those data structures.
Programming languages: It usually has several kinds of identifiers. Consider Java for example, it has
class names, variable names, package names, etc.
Structured programming: It requires that programmers break program structure into small pieces of code
that are easily understood.
14.0 Objectives
After studying this chapter, you will be able to:
Explain the history of viruses
Discuss the mechanism of a virus
Understand how a virus spreads
Understand how a virus is named
Explain a few prominent viruses
Discuss the types of computer virus
Understand Norton antivirus
Understand the execution of Norton antivirus
14.1 Introduction
A person might have a computer virus infection when the computer starts acting differently: for instance,
it gets slow, or when they turn the computer on it says that all the data is erased, or when they start writing
a document it looks different, some chapters might be missing, or something else abnormal has happened.
The next thing that usually happens is that the person whose computer might be infected panics. The
person might think that all the work that has been done is missing. That could be true, but in most cases
viruses have not done any harm yet; however, starting to do something without being sure of what you are
doing can be harmful. When some people try to get rid of viruses, they delete files or might even format
the whole hard disk.
4. Viruses that replicate themselves via e-mail or over a computer network cause the subsidiary problem of
increasing the amount of…………..
(a) Internet (b) data
(c) network traffic (d) Both (a) and (c)
5. A ……'s mission is to hop from one program to another, and this should happen as quickly as possible.
(a) Antivirus (b) Virus
(c) Program (d) None of these
Troj/Invo-Zip
W32/Netsky
Mal/EncPk-EI
Troj/Pushdo-Gen
Troj/Agent-HFU
Mal/Iframe-E
Troj/Mdrop-BTV
Troj/Mdrop-BUF
Troj/Agent-HFZ
Troj/Agent-HGT
Caution
Always scan email and instant messages for viruses before opening any attachments, as they may
contain harmful viruses.
14.7.6 Worms
A worm is a program that scans a company's network, or the Internet, for another computer that has a
specific security hole. It copies itself to the new machine (through the security hole), and then starts
replicating itself there. Worms replicate themselves very quickly; a network infected with a worm can be
brought to its knees within a matter of hours.
Worms do not even have to be delivered via conventional programs; so-called "fileless" worms are recent
additions to the virus scene. While in operation, these programs exist only in system memory, making
them harder to identify than conventional file-hosted worms. These worms, such as the CodeRed and
CodeBlue viruses, could cause considerable havoc in the future.
Caution
For surfing on Internet your computer must be virus protected.
Unfortunately, after installation, you hit a slight roadblock in the form of forced product activation. There
was no way to close the activation window, and it remained on the screen until it was given an email
address to tag the serial number to. It is bad enough being bugged to activate a product after installation,
but when, for whatever reason, you have no choice but to register, it is all the more irritating. It seems
Norton wanted to twist your arm on this one, as if that were not already clear from the lack of an exit
button on the window.
Melissa‘s attack begins as an infected Microsoft Office file that takes advantage of the interoperability of
Microsoft software. It copies itself to various files on the infected machine, then emails itself to entries
found in address books on the machine with an attachment bearing the Microsoft .doc extension.
Originally, Smith‘s attachment was passed off as a list of names and passwords to get access to
pornographic websites. Once a machine became infected Melissa could send out any Office file as the
attachment, so in just a few hours every .doc attachment was suspect.
In addition to reproducing and emailing itself, Melissa can also modify the infected Office documents in a
variety of ways including data corruption, replacing the current data with something completely unrelated,
damaging macros or adding its own, even harvesting data found in some documents. This was Dan's
experience with the Office documents on his machine. Another variant, Melissa.U, went so far as to
change the properties of Windows system files and then delete them, rendering the machine un-bootable as
soon as it was shut down. Fortunately, Dan was not struck with this variant.
Removal of the virus needed to be done manually since Dan's antivirus vendor had not yet released an
automatic removal tool. The technician first needed to isolate Melissa's original source file, usually found
still residing in the email folders. That source file had to be deleted along with any copies it made of itself
and placed elsewhere on the machine, but unfortunately no source file was found initially. Next, all
documents had to be scanned and cleaned where possible, deleted when cleaning was not possible.
Finally, the tech had to clean the system registry and the Microsoft Office preferences. Melissa modified a
registry entry that was originally produced by the operating system. This modification told the virus
whether or not it had mailed itself out previously. Oddly enough, the author programmed Melissa to run
the email only once. As for the Office preferences, Melissa disabled macro tools, macro virus protection,
verification of template saving, and confirmation of document conversion. Disabling these options allowed
the virus to modify documents without the knowledge of the user. All these features were turned back on
as part of the removal process.
Once removal was complete and the computer returned, Dan and the tech needed to figure out where the
infection came from in order to kill the source and prevent a second attack. The usual suspects were
checked first; teenagers in the house who frequently exchanged files, unusual email attachments that had
been opened, questionable websites that might have been visited. Yet all of these possibilities came up
empty.
Dan mentioned he had been off work all week due to a mandatory facility furlough and was looking
forward to returning in a couple of days. He had brought some work home with him the previous Friday so
he would not be behind after the furlough, but the computer being down prevented him from doing much
work. As it turned out, in his remarks Dan had revealed the source: the documents he had brought home
from work. The floppy disk was checked and there it was; a file Dan had received in his email and which
he brought home and opened on his computer.
The lab where Dan worked had been infected, but due to the week-long furlough it had not been able to do
significant damage to the system. A call was placed to the lab's IT department who went in immediately
and cleaned all the computers. When the doors opened the following Monday it was business as usual,
thanks to a dedicated employee and a repair tech who knew what he was doing.
ZSecurity detects and cleans thousands of computer viruses, including Melissa and its variants. Make sure
your program is updated and running at all times.
Questions
1. Write the brief conclusion of the case study.
2. How does Melissa's attack begin as an infected Microsoft Office file? Discuss.
14.10 Summary
Viruses that replicate themselves via e-mail or over a computer network cause the subsidiary problem
of increasing the amount of Internet and network traffic.
Viruses are nasty little bits of computer code, designed to inflict as much damage as possible, and to
spread to as many computers as possible—a particularly vicious combination.
Antivirus vendors generally assign virus names consisting of a prefix, the name, and a suffix.
The most "traditional" form of computer virus is the file infector virus, which hides within the code of
another program.
Boot sector viruses reside in the part of the disk that is read into memory and executed when your
computer first boots up.
Norton antivirus makes online shopping, banking, and browsing safer and more convenient than ever.
14.11 Keywords
Master Boot Record: The MBR is a type of boot sector popularized by the IBM Personal Computer.
Prefix: In the virus naming the prefix identifies the type of virus or malware.
Script Viruses: Script viruses are based on common scripting languages, which are macro-like pseudo-
programming languages typically used on Web sites and in some computer applications.
Virus: The term ―virus‖ is commonly but erroneously used to refer to other types of malware, including
but not limited to adware and spyware programs that do not have the reproductive ability.
Wild virus: The first virus found "in the wild" infected Apple II floppy disks in 1981.
Worms: A worm is a program that scans a company‘s network, or the Internet, for another computer that
has a specific security hole.
14.12 Review Questions
1. Discuss the history of viruses in brief.
2. Explain file infector viruses.
3. What do you understand by the mechanism of a virus?
4. Write five reasons for the spread of viruses.
5. Explain the concept of virus naming with a suitable example.
6. Differentiate between boot sector and macro viruses.
7. What do you understand by chat and instant messaging viruses?
8. Explain in brief about antivirus software.
9. Write five tips to keep a computer safe from viruses.
10. How do viruses activate? Explain.
15.0 Objectives
After studying this chapter, you will be able to:
Explain Network
Discuss MODEM
Understand Types of Modem
Transmission directional capability: The direction in which information can be transmitted over a channel
depends on whether the channel is simplex, half-duplex or full-duplex.
Simplex: Information can be transmitted only in one direction.
Half-duplex: Information can be transmitted in both directions, but only in one direction at a time.
Full-duplex: Information can be transmitted in both directions simultaneously.
Signal type: There are two signal types: analog and digital. It is a little hard to understand the exact
difference without discussing a lot of electrical engineering and physics, so we will not go there. What
you need to take away is that:
Analog signals are 'continuous' (they take on a wide range of values) and digital signals are
'discrete', and binary.
Digital signals are more 'natural' for computer networks, since, as we know, computers represent all
information in binary.
The reason why we have to worry about analog signals is that the communications channels that
predated computer networks (like telephone lines, cable TV lines and radio transmitters) were all
designed to carry analog signals.
In the past, two parallel flat wires were used for communication. However, electromagnetic
interference from devices such as a motor can create noise over those wires. If the two wires are
parallel, the wire closest to the source of the noise gets more interference and ends up with a
higher voltage level than the wire farther away, which results in an uneven load and a damaged
signal (see Figure 15.3).
Advantages of UTP are its cost and ease of use. UTP is cheap, flexible, and easy to install. Higher
grades of UTP are used in many LAN technologies, including Ethernet and Token Ring. Figure 15.5
shows a cable containing five unshielded twisted pairs.
The Electronic Industries Association (EIA) has developed standards to grade UTP cables by
quality. Categories are determined by cable quality, with 1 as the lowest and 5 as the highest.
Each EIA category is suitable for certain uses and not for others:
Category 1: The basic twisted-pair cabling used in telephone systems. This level of quality is fine
for voice but inadequate for all but low-speed data communication.
Category 2: The next higher grade, suitable for voice and for data transmission of up to 4 Mbps.
Category 3: Required to have at least three twists per foot and can be used for data transmission
of up to 10 Mbps. It is now the standard cable for most telephone systems.
Category 4: Must also have at least three twists per foot, as well as other conditions, to bring the
possible transmission rate to 16 Mbps.
Category 5: Used for data transmission up to 100 Mbps.
UTP Connectors
UTP is most commonly connected to network devices via a type of snap-in plug like that used
with telephone jacks. Connectors are either male (the plug) or female (the receptacle). Male
connectors snap into female connectors and have a repressible tab (called a key) that locks them
in place. Each wire in a cable is attached to one conductor (or pin) in the connector. The most
frequently used of these plugs is an RJ45 connector with eight conductors, one for each wire of
four twisted pairs (see Figure 15.6).
Materials and manufacturing requirements make STP more expensive than UTP but less
susceptible to noise.
Coaxial Cable
Coaxial cable (or coax) carries signals of higher frequency ranges than twisted-pair cable (see
Figure 15.8), in part because the two media are constructed quite differently. Instead of having
two wires, coax has a central core conductor of solid or stranded wire (usually copper) enclosed in
an insulating sheath, which is, in turn, encased in an outer conductor of metal foil, braid, or a
combination of the two (also usually copper). The outer metallic wrapping serves both as a shield
against noise and as the second conductor which completes the circuit. This outer conductor is
also enclosed in an insulating sheath, and the whole cable is protected by a plastic cover (see
Figure 15.9).
Figure 15.8: Frequency range of coaxial cable.
Optical Fiber
Up until this point, we have discussed conductive (metal) cables that transmit signals in the form
of current. Optical fiber, on the other hand, is made of glass or plastic and transmits signals in the
form of light. To understand optical fiber, we first need to explore several aspects of the nature of
light.
The Nature of Light
Light is a form of electromagnetic energy. It travels at its fastest in a vacuum: 300,000
kilometers/second (approximately 186,000 miles/second). The speed of light depends on the
density of the medium through which it is travelling (the higher the density, the slower the speed).
Refraction
Light travels in a straight line as long as it is moving through a single uniform substance. If a ray
of light travelling through one substance suddenly enters another (more or less dense) substance,
its speed changes abruptly, causing the ray to change direction. This change is called refraction. A
straw sticking out of a glass of water appears bent, or even broken, because the light by which we
see it changes direction as it moves from the air to the water.
The direction in which a light ray is refracted depends on the change in density encountered. A
beam of light moving from a less dense into a more dense medium is bent toward the vertical axis
(examine Figure 15.10). The two angles made by the beam of light in relation to the vertical axis
are called I, for incident, and R, for refracted. In Figure 15.10a, the beam travels from a less dense
medium into a denser medium. In this case, angle R is smaller than angle I. In Figure 15.10b,
however, the beam travels from a denser medium into a less dense medium. In this case, the value
of I is smaller than the value of R. In other words, when light travels into a denser medium, the
angle of incidence is greater than the angle of refraction; and when light travels into a less dense
medium, the angle of incidence is less than the angle of refraction.
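The behaviour described above is governed by Snell's law, n1·sin(I) = n2·sin(R) (the formula is standard optics, not stated in the text). A small sketch makes the two cases concrete:

```python
import math

def refracted_angle(n1, n2, incident_deg):
    """Snell's law: n1*sin(I) = n2*sin(R). Returns R in degrees, or None
    when no refracted ray exists (total internal reflection)."""
    s = n1 * math.sin(math.radians(incident_deg)) / n2
    if s > 1:
        return None  # beyond the critical angle: the ray reflects instead
    return math.degrees(math.asin(s))
```

Entering a denser medium (say air, n = 1.0, into water, n = 1.33) bends the ray toward the vertical, so R < I; going the other way bends it away, so R > I, matching Figures 15.10a and 15.10b.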
Fiber-optic technology takes advantage of the properties shown in Figure 15.10b to control the
propagation of light through the fiber channel.
Critical Angle
Now examine Figure 15.11. Once again we have a beam of light moving from a denser into a less
dense medium. In this example, however, we gradually increase the angle of incidence measured
from the vertical. As the angle of incidence increases, so does the angle of refraction. It, too,
moves away from the vertical and closer and closer to the horizontal.
Optical fibers use reflection to guide light through a channel. A glass or plastic core is surrounded
by a cladding of less dense glass or plastic. The difference in density of the two materials must be
such that a beam of light moving through the core is reflected off the cladding instead of being
refracted into it. Information is encoded onto a beam of light as a series of on-off flashes that
represent 1 and 0 bits.
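The critical angle itself follows directly from Snell's law: it is the incidence angle at which the refracted ray would graze the boundary (R = 90°). A sketch, with illustrative index values (not quoted from the text):

```python
import math

def critical_angle_deg(n_core, n_cladding):
    """Angle of incidence (measured from the normal) beyond which light in
    the denser core is totally reflected at the core/cladding boundary."""
    return math.degrees(math.asin(n_cladding / n_core))

# Example: a glass core (n = 1.48) with slightly less dense cladding (n = 1.46).
angle = critical_angle_deg(1.48, 1.46)
```

Because the core and cladding densities are close, the critical angle is large, so only light travelling nearly parallel to the fiber axis is trapped and guided.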
15.3 MODEM
The need to communicate between distant computers led to the use of the existing phone network
for data transmission. Most phone lines were designed to transmit analog information (voices),
while computers and their devices work in digital form (pulses). So, in order to use an analog
medium, a converter between the two systems is needed. This converter is the MODEM which
performs MODulation and DEModulation of transmitted data. It accepts serial binary pulses from
a device, modulates some property (amplitude, frequency, or phase) of an analog signal in order
to send the signal over an analog medium, and performs the opposite process, enabling the analog
information to arrive as digital pulses at the computer or device on the other side of the connection.
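One of the properties named above, frequency, gives the simplest modulation scheme to sketch. The following is a toy illustration of frequency-shift keying, where each bit selects one of two tones (the tone frequencies and rates are illustrative, not from the text):

```python
import math

def fsk_modulate(bits, f0=1200.0, f1=2200.0, rate=8000, baud=300):
    """Toy frequency-shift keying: each bit selects one of two carrier tones,
    sampled at `rate` Hz for one bit period (1/baud seconds)."""
    samples_per_bit = rate // baud
    out = []
    for bit in bits:
        f = f1 if bit else f0          # a 1 bit uses the higher tone
        for n in range(samples_per_bit):
            out.append(math.sin(2 * math.pi * f * n / rate))
    return out

wave = fsk_modulate([1, 0, 1, 1])
```

Demodulation on the far side is the inverse: detect which tone is present in each bit period and emit the corresponding digital pulse.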
Modems, in the beginning, were used mainly to communicate between DATA TERMINALS and
a HOST COMPUTER. Later, the use of modems was extended to communicate between END
COMPUTERS. This required more speed, and data rates increased from 300 bps in the early days
to 28.8 kbps today. Today's transmission involves data compression techniques, which increase
the effective rates, and error detection and correction for more reliability.
In order to enable modems of various types and from different manufacturers to communicate,
interface standards were developed by standards organizations.
Today's modems are used for different functions. They act as textual and voice mail systems and
facsimiles, and are connected or integrated into cellular phones and notebook computers, enabling
data to be sent from anywhere. The future might lead to new applications. Modem speeds are not
expected to increase much beyond today's 28.8 kbps. Further dramatic speed increases will
require digital phone technology such as ISDN and fiber-optic lines.
New applications might be implemented such as simultaneous voice and data. Videophones are an
example of this.
Modems can be characterised by the following properties:
Internal / External / PCMCIA modem
An internal modem is installed in one of the computer's expansion slots.
External modems are fully functioning external devices. The external modem is connected
to a computer using a serial cable to one of the computer's serial ports, and draws power
from an external power source.
PCMCIA - Personal Computer Memory Card International Association. (Or People Can't
Memorise Computer Industry Acronyms)
Transmission speed
Error detection and correction
Compression
3. ……… the central connection point for network cables that connect to computers or other devices on a
network.
(a). Network (b). Network hub
(c). Network adapter cards (d). None of these
4. ……… expansion cards that provide the physical connection between each computer and the network.
(a). Network cards (b). Pen cards
(c). Network adapter cards (d). None of these
5. ……… are more 'natural' for computer networks, since, as we know, computers represent all information
in binary.
(a). Analog signals (b). Network signals
(c). Digital signals (d). None of these
6. ……… are 'continuous' (they take on a wide range of values) and digital signals are 'discrete', and
binary.
(a). Analog signals (b). Digital signals
(c). Communication (d). None of these
7. ……… Information can be transmitted in both directions simultaneously.
(a). Half-duplex (b). Full-duplex
(c). Signals (d). None of these
15.5 Summary
The modern form of communication like e-mail and Internet is possible only because of computer
networking.
Data Routing is the process of finding the most efficient route between source and destination before
sending the data.
In simplex mode the communication takes place in one direction only. The receiver receives the signal
from the transmitting device.
In half-duplex mode the communication channel is used in both directions, but only in one direction at
a time. Thus a half-duplex line can alternately send and receive data.
The computer that provides resources to other computers on a network is known as server.
In the network the individual computers, which access shared network resources, are known as nodes.
15.6 Keywords
Communication Satellite: The problems of line-of-sight transmission and repeaters are overcome by using
satellites, which are the most widely used data transmission media in modern days.
Data sequencing: A long message to be transmitted is broken into smaller packets of fixed size for error
free data transmission.
Internet: The newest type of network to be used within an organisation is an internet or Internet Web.
Such networks enable computers (or networks) of any type to communicate easily.
Transmission: Communication of data achieved by the processing of signals.
Teleconferencing: It refers to electronic meetings that involve people who are at physically different sites.
Telecommunication technology allows participants to interact with one another without travelling to the
same location.
16.0 Objectives
After studying this chapter, you will be able to:
Explain Internet vs. Intranet
Discuss Network Topology
Understand Network Devices
Understand how a virus is named
16.1 Introduction
In the information age we live in today, the speed at which information can travel inside a company
often indicates the productivity of that company. It is often necessary to create an environment where
the flow of data is unimpeded and the intended recipient gets it instantaneously. Computers make this
possible, and there are multiple ways to implement such a network.
An Intranet is a computer network that is designed to work like the internet but on a much smaller scale
and is restricted only to the employees of the company. It is possible to run FTP, HTTP, and mail servers
on the intranet that are independent of, and inaccessible from, the internet without proper authorization.
This allows the
employees to send progress reports to their manager even when they cannot meet in person. Workers could
also work collaboratively on a certain project while keeping their paperwork properly synchronized. It is
often necessary to have access to the internet from within your intranet, which is why intranets are placed
behind a firewall. Some companies even deploy two firewalls and place some services inside the DMZ in
order to raise their security further.
An intranet, although very helpful, wouldn't be very effective if it were totally removed from the internet.
The internet is a massive network of computers from all around the world. It allows people to reach
virtually any point in the world at very minimal cost. Services like email and VoIP have allowed many
people to keep in touch despite differences in geographical location and time zone.
Being connected to the internet, a company can allow its people in the field, or those working from home,
to do what they would usually do inside the office. They can connect to
services inside the intranet and submit their work or contact their coworkers and superiors. They can even
call online if their office supports IP-PABX systems.
The Intranet and the Internet are two domains that are very alike but are often segregated in order to
maintain security. If properly configured and guarded, an Intranet that is connected to the Internet could
raise your company's productivity by leaps and bounds, not to mention cutting the cost of traditional
communications. Done haphazardly, however, it could also open the door to malicious people who can do
major damage or even steal confidential company data. It is up to the management to make sure that all
precautions are taken.
The physical topology of a network refers to the configuration of cables, computers, and other peripherals.
Physical topology should not be confused with logical topology, which is the method used to pass
information between workstations.
Main Types of Network Topologies
In networking, the term "topology" refers to the layout of connected devices on a network. This section
introduces the standard topologies of computer networking.
One can think of a topology as a network's virtual shape or structure. This shape does not necessarily
correspond to the actual physical layout of the devices on the network. For example, the computers on a
home LAN may be arranged in a circle in a family room, but it would be highly unlikely to find an actual
ring topology there.
Star Topology
Many home networks use the star topology. A star network features a central connection
point called a "hub" that may be a hub, switch or router. Devices typically connect to the hub with
Unshielded Twisted Pair (UTP) Ethernet.
Compared to the bus topology, a star network generally requires more cable, but a failure in any star
network cable will only take down one computer's network access and not the entire LAN. (If the hub fails,
however, the entire network also fails.)
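The failure behaviour of a star can be sketched in a few lines (the names and structure here are illustrative, not from the text):

```python
def reachable_in_star(hub_ok, failed_links, nodes):
    """Nodes that still have network access in a star topology.
    A node is cut off if the hub is down or its own cable to the hub failed."""
    if not hub_ok:
        return set()  # hub failure takes down the entire LAN
    return {n for n in nodes if n not in failed_links}

nodes = {"A", "B", "C", "D"}
```

A single failed cable isolates only one node, while a hub failure isolates everything, exactly the trade-off described above.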
Figure: Star topology.
Star-Wired Ring
A star-wired ring topology may appear (externally) to be the same as a star topology. Internally, the
MAU (multistation access unit) of a star-wired ring contains wiring that allows information to pass from
one device to another in a circle or ring (see Fig. 3). The Token Ring protocol uses a star-wired ring
topology.
Ring Topology
In a ring network, every device has exactly two neighbors for communication purposes. All
messages travel through a ring in the same direction (either "clockwise" or "counterclockwise"). A failure
in any cable or device breaks the loop and can take down the entire network.
To implement a ring network, one typically uses FDDI, SONET, or Token Ring technology. Ring
topologies are found in some office buildings or school campuses.
Figure: Ring topology.
Bus Topology
Bus networks (not to be confused with the system bus of a computer) use a common backbone to
connect all devices. A single cable, the backbone, functions as a shared communication
medium that devices attach or tap into with an interface connector. A device wanting to communicate with
another device on the network sends a broadcast message onto the wire that all other devices see, but only
the intended recipient actually accepts and processes the message.
Ethernet bus topologies are relatively easy to install and don't require much cabling compared to the
alternatives. 10Base-2 ("ThinNet") and 10Base-5 ("ThickNet") both were popular Ethernet cabling options
many years ago for bus topologies. However, bus networks work best with a limited number of devices. If
more than a few dozen computers are added to a network bus, performance problems will likely result. In
addition, if the backbone cable fails, the entire network effectively becomes unusable.
Figure: Bus topology.
Advantages of a Linear Bus Topology:
a. Easy to connect a computer or peripheral to a linear bus.
b. Requires less cable length than a star topology.
Disadvantages of a Linear Bus Topology:
a. Entire network shuts down if there is a break in the main cable.
b. Terminators are required at both ends of the backbone cable.
c. Difficult to identify the problem if the entire network shuts down.
d. Not meant to be used as a stand-alone solution in a large building.
Tree Topology
Tree topologies integrate multiple star topologies together onto a bus. In its simplest form,
only hub devices connect directly to the tree bus, and each hub functions as the "root" of a tree of devices.
This bus/star hybrid approach supports future expandability of the network much better than a bus (limited
in the number of devices due to the broadcast traffic it generates) or a star (limited by the number of hub
connection points) alone.
Figure: Tree topology.
Mesh Topology
Mesh topologies involve the concept of routes. Unlike each of the previous topologies,
messages sent on a mesh network can take any of several possible paths from source to destination. (Recall
that even in a ring, although two cable paths exist, messages can only travel in one direction.) Some
WANs, most notably the Internet, employ mesh routing.
A mesh network in which every device connects to every other is called a full mesh. As shown in the
illustration below, partial mesh networks also exist in which some devices connect only indirectly to
others.
Figure: Mesh topology.
Hybrid Topology
A combination of any two or more network topologies. Note 1: Instances can occur where two basic
network topologies, when connected together, can still retain the basic network character, and therefore not
be a hybrid network. For example, a tree network connected to a tree network is still a tree network.
Therefore, a hybrid network arises only when two basic networks are connected and the resulting
network topology fails to meet one of the basic topology definitions. For example, two star networks
connected together exhibit a hybrid network topology. Note 2: A hybrid topology always arises when two
different basic network topologies are connected.
Network hub: the central connection point for network cables that connect to computers or other devices
on a network. The hub has several network cable jacks or ports that you use to connect network cables to
computers. The hub contains circuitry that enables each computer to communicate with any other
computer connected to the hub (see Figure ).
Network cables: special, unshielded twisted-pair (UTP) cables used to connect each computer to the hub.
The cable you need is Category 5 UTP cable with a square plastic RJ-45 connector on each end.
Figure : Network cable with RJ-45 connector.
All the networking hardware described here is known as Ethernet. Ethernet is the industry-wide standard
for computer networks. Standard Ethernet networks transmit data at 10 million bits per second (Mbps). A
newer Ethernet standard, called Fast Ethernet, transmits data at 100 Mbps. Computer networks often
contain a mixture of 10 Mbps and 100 Mbps devices.
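The practical difference between 10 Mbps and 100 Mbps is easy to quantify. A rough calculation, ignoring protocol overhead:

```python
def transfer_seconds(size_bytes, rate_mbps):
    """Rough time to move a file over a link (8 bits per byte, no overhead)."""
    return size_bytes * 8 / (rate_mbps * 1_000_000)

# A 1 MB file: 0.8 s on standard Ethernet, 0.08 s on Fast Ethernet.
t_standard = transfer_seconds(1_000_000, 10)
t_fast = transfer_seconds(1_000_000, 100)
```

Real transfers are slower because of framing, acknowledgements, and shared-medium contention, but the tenfold ratio between the two standards holds.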
Suppose you want to network a few computers together in a small area where it would be expensive to
have network cabling installed in an existing building. Or perhaps you just have a desktop computer and a
notebook computer at home and you would like to be able to roam the house with the notebook computer
and perhaps even browse the Web from the hammock in the back yard. Wireless Ethernet makes all this
possible. You can install wireless adapters in each computer and form a wireless network (see Figure).
Recommendations
If you are installing a new network, the best choice is standard Ethernet hardware. This is the same
networking hardware used by thousands of businesses and corporations to connect millions of computers
together. Ethernet networking components are standardized, inexpensive, dependable, and easy to install
and maintain. Ethernet hardware is widely available. You can find network hubs, adapters, and cables at
most stores that specialize in computer sales. Because all manufacturers of Ethernet hardware adhere to the
Ethernet standards, you can buy any component from any manufacturer and connect it to Ethernet
components you already have. Wireless Ethernet is the best choice if you are installing a wireless network.
To make sure the hardware is 802.11b compatible, look for the Wi-Fi logo on the product box. The Wi-Fi
logo indicates the product is certified by the Wireless Ethernet Compatibility Alliance (WECA). Because
these products are standardized, you can buy products from different manufacturers and use them together.
Wireless Ethernet products have become widely available and continue to drop in price. If you use
standard Ethernet and 802.11 wireless networking products, you can easily connect wireless and wired
networks together using a wireless access point.
New Technologies
New Ethernet standards support even higher data rates for both wired and wireless networks.
Gigabit Ethernet: This new Ethernet standard transfers data at 1000 Mbps (1 Gbps) using standard
Category 5 networking cables. If you install this cable today, you can migrate to the faster hardware should
the need arise. Gigabit adapters, hubs and switches are available today, but Fast Ethernet is likely to
provide adequate bandwidth for most networking applications on a small network. In most cases, the
Ethernet hardware that you purchase today will be able to interoperate with newer Gigabit hardware.
802.11a: This new wireless standard supports speeds up to 54 Mbps. It uses technology similar to 802.11b,
but operates at 5 GHz rather than at the 2.4 GHz band used for 802.11b. The higher frequency makes
802.11a less susceptible to interference from other devices such as cell phones, cordless phones, and
microwave ovens. An 802.11a network can operate without interference in the same location as an 802.11b
network, or near Bluetooth devices, which operate in the same frequency spectrum as 802.11b.
802.11g: This new wireless standard also supports speeds up to 54 Mbps. It is an extension of 802.11b and
operates in the same RF spectrum as 802.11b. While 802.11g offers a clean upgrade path from 802.11b,
802.11a is less likely to be affected by interference. It is likely that one of these two competing
technologies will become widely adopted. These new technologies are likely to be more expensive until
their use becomes widespread. If you choose to use any of these new technologies, make sure your new
hardware is compatible with any existing hardware you have. If you choose 802.11a or 802.11g, you may
want to choose adapters that are compatible with 802.11b. Compatibility with 802.11b will let you connect
to networks that do not support the newer technology.
In addition to these types, the following characteristics are also used to categorize different types of
networks.
Topology: The geometric arrangement of a computer system. Common topologies include bus, star, and
ring.
Protocol
The protocol defines a common set of rules and signals that computers on the network use to communicate.
One of the most popular protocols for LANs is called Ethernet. Another popular LAN protocol for PCs is
the IBM token-ring network.
Architecture
Networks can be broadly classified as using either peer-to-peer or client/server architecture. Computers on
a network are sometimes called nodes. Computers and devices that allocate resources for a network are
called servers.
The types of networks can be further classified into two more divisions:
………………………………..………………………………………………………………………………
………………………………………………………………………………………………………………
B) Multipoint Connection.
A multipoint connection is a link between three or more devices. It is also known as Multi-drop
configuration. The networks having multipoint configuration are called broadcast networks. In a broadcast
network, a message or a packet sent by any machine is received by all other machines in the network.
The packet contains an address field that specifies the receiver. Upon receiving a packet, every machine checks
the address field of the packet. If the transmitted packet is for that particular machine, it processes it;
otherwise it just ignores the packet.
A broadcast network provides the provision for broadcasting and multicasting. Broadcasting is the process in
which a single packet is received and processed by all the machines in the network. It is made possible by
using a special code in the address field of the packet. When a packet is sent to a subset of the machines,
i.e., only to a few machines in the network, it is known as multicasting. Historically, multipoint connections
were used to attach central CPUs to distributed dumb terminals. In today's LAN environments, multipoint
connections link many network devices in various configurations.
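The address-field check described above can be sketched as follows (the address values, field names, and broadcast code are illustrative):

```python
BROADCAST = "*"  # special code in the address field meaning "all machines"

class Machine:
    def __init__(self, address):
        self.address = address
        self.received = []

    def on_wire(self, packet):
        # Every machine on the shared link sees every packet; it processes
        # the packet only if it is the addressee, the packet is a broadcast,
        # or it belongs to the multicast subset.
        dst = packet["dst"]
        if dst == BROADCAST or dst == self.address or \
           (isinstance(dst, set) and self.address in dst):
            self.received.append(packet["data"])
        # otherwise the packet is simply ignored

def send(machines, packet):
    """Multipoint link: one transmission is delivered to every machine."""
    for m in machines:
        m.on_wire(packet)
```

A unicast packet is processed by one machine, a broadcast by all, and a multicast by the named subset, even though every machine physically receives every transmission.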
16.7 Summary
Signals travel from transmitter to receiver via a path. This path, called the medium, may be guided
or unguided.
A guided medium is contained within physical boundaries, while an unguided medium is
boundless.
Radio waves are used to transmit data. These unguided waves are usually propagated through
the air.
Fiber-optic cables are composed of a glass or plastic inner core surrounded by cladding, all
encased in an outside jacket.
Satellite communication uses a satellite in geosynchronous orbit to relay signals. A system of
three correctly spaced satellites covers most of the earth.
The Shannon capacity is a formula to determine the theoretical maximum data rate for a
channel.
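The Shannon formula mentioned above is C = B · log2(1 + S/N), with bandwidth B in hertz and S/N as a linear (not decibel) ratio. A quick sketch, using a typical voice-grade phone channel as an illustrative example:

```python
import math

def shannon_capacity_bps(bandwidth_hz, snr_linear):
    """Theoretical maximum data rate of a channel: C = B * log2(1 + S/N)."""
    return bandwidth_hz * math.log2(1 + snr_linear)

# A ~3000 Hz phone channel with S/N of about 3162 (35 dB)
# has a theoretical ceiling of roughly 35 kbps.
c = shannon_capacity_bps(3000, 3162)
```

This is why analog-modem speeds stalled near the rates discussed earlier in this chapter: they were approaching the theoretical limit of the voice channel itself.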
16.8 Keywords
Cellular telephony: Cellular telephony is moving fast toward integrating the existing system with
satellite communication.
Guided media: These provide a conduit from one device to another and include twisted-pair cable,
coaxial cable, and fiber-optic cable. A signal travelling along any of these media is directed and
contained by the physical limits of the medium.
Optical fiber: Optical fiber is a glass or plastic cable that accepts and transports signals in the
form of light.
Reflection: When the angle of incidence becomes greater than the critical angle, a new
phenomenon occurs, called reflection.
Satellite transmission: Satellite transmission is much like line-of-sight microwave transmission in
which one of the stations is a satellite orbiting the earth.
1.0 Objectives
After studying this chapter, you will be able to:
1.1 Introduction
Office software forms a critical link between the primary systems in your day-to-day work. The initial
choice of office package has far-reaching consequences, both for the future selection of additional
software and for the ease with which documents and information can be shared throughout the organization.
Application software uses the computer system to perform useful work or provide entertainment
functions beyond the basic operation of the computer itself.
System software is designed to operate the computer hardware, to provide basic functionality, and to
provide a platform for running application software. System software includes:
Operating system, an essential collection of computer programs that manages resources and provides
common services for other software. Supervisory programs, boot loaders, shells and window systems
are core parts of operating systems. In practice, an operating system comes bundled with additional
software (including application software) so that a user can potentially do some work with a computer
that only has an operating system.
Device driver, a computer program that operates or controls a particular type of device that is attached
to a computer. Each device needs at least one corresponding device driver; thus, a computer typically
needs many device drivers.
Utilities, software designed to assist users in maintenance and care of their computers.
Malicious software or malware, computer software developed to harm and disrupt computers. As such,
malware is undesirable. Malware is closely associated with computer-related crimes, though some
malicious programs may have been designed as practical jokes.
The great advantage of word processing over using a typewriter is that you can make changes without retyping
the entire document. If you make a typing mistake, you simply back up the cursor and correct your mistake. If
you want to delete a paragraph, you simply remove it, without leaving a trace. It is equally easy to insert a
word, sentence, or paragraph in the middle of a document. Word processors also make it easy to move sections
of text from one place to another within a document, or between documents. When you have made all the
changes you want, you can send the file to a printer to get a hardcopy.
Word processors vary considerably, but all word processors support the following basic features:
Insert text: Allows you to insert text anywhere in the document.
Delete text: Allows you to erase characters, words, lines, or pages as easily as you can cross them out on
paper.
Cut and paste: Allows you to remove (cut) a section of text from one place in a document and insert (paste) it
somewhere else.
Copy: Allows you to duplicate a section of text.
Page size and margins: Allows you to define various page sizes and margins, and the word processor will
automatically readjust the text so that it fits.
Search and replace: Allows you to direct the word processor to search for a particular word or phrase. You
can also direct the word processor to replace one group of characters with another everywhere that the first
group appears.
Word wrap: The word processor automatically moves to the next line when you have filled one line with text,
and it will readjust text if you change the margins.
Print: Allows you to send a document to a printer to get hardcopy.
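Several of the basic features listed above reduce to simple operations on the document's text buffer. A toy sketch (the sample text is illustrative):

```python
doc = "The quick brown fox jumps over the lazy dog."

# Search and replace: substitute every occurrence of a word.
doc = doc.replace("quick", "slow")

# Insert text: splice new characters in at a chosen position.
pos = doc.index("fox")
doc = doc[:pos] + "red " + doc[pos:]

# Delete text: remove a span without leaving a trace.
doc = doc.replace("lazy ", "")
```

Real word processors of course operate on richer structures than a flat string (to track formatting, pages, and undo history), but the user-visible operations are exactly these.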
Word processors that support only these features (and maybe a few others) are called text editors. Most word
processors, however, support additional features that enable you to manipulate and format documents in more
sophisticated ways. These more advanced word processors are sometimes called full-featured word processors.
Full-featured word processors usually support the following features:
File management: Many word processors contain file management capabilities that allow you to create,
delete, move, and search for files.
Font specifications: Allows you to change fonts within a document. For example, you can specify bold,
italics, and underlining. Most word processors also let you change the font size and even the typeface.
Footnotes and cross-references: Automates the numbering and placement of footnotes and enables you to
easily cross-reference other sections of the document.
Graphics: Allows you to embed illustrations and graphs into a document. Some word processors let you create
the illustrations within the word processor; others let you insert an illustration produced by a different
program.
Headers, footers, and page numbering: Allows you to specify customized headers and footers that the word
processor will put at the top and bottom of every page. The word processor automatically keeps track of page
numbers so that the correct number appears on each page.
Layout: Allows you to specify different margins within a single document and to specify various methods for
indenting paragraphs.
Macros: A macro is a character or word that represents a series of keystrokes. The keystrokes can represent
text or commands. The ability to define macros allows you to save yourself a lot of time by replacing common
combinations of keystrokes.
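A macro facility can be sketched as a table mapping a short trigger to the stored keystrokes. The %name% trigger syntax below is an assumption for illustration, not something the text specifies:

```python
def expand(text, macros):
    """Replace each macro trigger, written %name%, with its stored keystrokes."""
    for name, body in macros.items():
        text = text.replace("%" + name + "%", body)
    return text

# One hypothetical macro: typing %sig% inserts a saved signature block.
macros = {"sig": "Best regards,\nJ. Smith"}
```

Typing the short trigger saves the user from re-entering the full combination of keystrokes each time.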
Merges: Allows you to merge text from one file into another file. This is particularly useful for generating
many files that have the same format but different data. Generating mailing labels is the classic example of
using merges.
Spell checker: A utility that allows you to check the spelling of words. It will highlight any words that it does
not recognize.
Tables of contents and indexes: Allows you to automatically create a table of contents and index based on
special codes that you insert in the document.
Thesaurus: A built-in thesaurus that allows you to search for synonyms without leaving the word processor.
Windows: Allows you to edit two or more documents at the same time. Each document appears in a separate
window. This is particularly valuable when working on a large project that consists of several different files.
WYSIWYG (what you see is what you get): With WYSIWYG, a document appears on the display screen
exactly as it will look when printed.
The line dividing word processors from desktop publishing systems is constantly shifting. In general, though,
desktop publishing applications support finer control over layout, and more support for full-color documents.
1.4 Spreadsheet
A table of values arranged in rows and columns. Each value can have a predefined relationship to the other
values. If you change one value, therefore, you may need to change other values as well.
Spreadsheet applications (sometimes referred to simply as spreadsheets) are computer programs that let you
create and manipulate spreadsheets electronically. In a spreadsheet application, each value sits in a cell. You
can define what type of data is in each cell and how different cells depend on one another. The relationships
between cells are called formulas, and the names of the cells are called labels.
Once you have defined the cells and the formulas for linking them together, you can enter your data. You can
then modify selected values to see how all the other values change accordingly. This enables you to study
various what-if scenarios.
A simple example of a useful spreadsheet application is one that calculates mortgage payments for a house.
You would define five cells:
1. Total cost of the house
2. Down payment
3. Mortgage rate
4. Mortgage term
5. Monthly payment
Once you had defined how these cells depend on one another, you could enter numbers and play with various
possibilities. For example, keeping all the other values the same, you could see how different mortgage rates
would affect your monthly payments.
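The mortgage what-if scenario above can be worked through directly. The sketch below uses the standard fixed-rate amortization formula, M = P * r * (1 + r)^n / ((1 + r)^n - 1), where P is the principal, r the monthly rate, and n the number of payments; the specific dollar figures are hypothetical and not from the text.

```python
# The five "cells" of the mortgage spreadsheet become function parameters;
# the monthly payment is the "formula cell" computed from the other four.
def monthly_payment(total_cost, down_payment, annual_rate, years):
    """Monthly payment for a fixed-rate mortgage (standard amortization)."""
    principal = total_cost - down_payment
    r = annual_rate / 12        # monthly interest rate
    n = years * 12              # total number of monthly payments
    if r == 0:                  # interest-free edge case
        return principal / n
    return principal * r * (1 + r) ** n / ((1 + r) ** n - 1)

# What-if analysis: hold everything else fixed and vary the mortgage rate.
for rate in (0.05, 0.06, 0.07):
    print(rate, round(monthly_payment(250000, 50000, rate, 30), 2))
```

This mirrors the spreadsheet workflow: change one input "cell" (the rate) and observe how the dependent value (the payment) changes.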
There are a number of spreadsheet applications on the market, Lotus 1-2-3 and Excel being among the most
famous. The more powerful spreadsheet applications support graphics features that enable you to produce
charts and graphs from the data.
Most spreadsheet applications are multidimensional, meaning that you can link one spreadsheet to another. A
three-dimensional spreadsheet, for example, is like a stack of spreadsheets all connected by formulas. A
change made in one spreadsheet automatically affects other spreadsheets.
Figure 1.3: The Word Ribbon puts the most necessary items on the Home tab.
Figure 1.4: The Excel ribbon houses Excel-specific tasks.
The Ribbon provides a contextual experience for your users. This means that the tabs available on the Ribbon
change based on the document context. If a user is working with a table, for example, a Table Tools section is
added to the Ribbon with Design and Layout tabs. These new tabs are visible only when your insertion point
is within a table, and stay out of your way at other times. Figure 1.5 shows an example of the Table Tools
context-sensitive tabs.
Context-sensitive tabs keep the clutter out of your interface when it is not needed. If you are more
comfortable working with a traditional dialog box, these have not been eliminated from Office. In fact,
many of the most common dialog boxes are accessible via a single click of the mouse. Take a look back at
Figures 1.3, 1.4, and 1.5. In the lower right-hand corner of most of the various sections of the Ribbon, take
note of the small arrow pointing down and to the right. These icons open up the associated traditional dialog
box. For example, if you click on the arrow icon in the Font section of the Ribbon in Word, the Font dialog
box will open. Since not every single option will fit on the Ribbon, these dialog boxes remain useful.
In the Ribbon bar, on the Home tab, you can also see the most obvious example of galleries. A gallery is
basically an example of what a particular style will look like. Word, Excel and PowerPoint make liberal use of
galleries. Word uses them to give you a look at what would happen if you applied a particular style to your
document. Excel uses them to apply formatting to your spreadsheets and PowerPoint uses them so you can get
a look at what a particular template might look like.
To use a gallery, just hover your mouse pointer over one of the representations in the Ribbon. In all Office
programs that have a gallery, hovering the mouse pointer over the sample actually temporarily applies that
style to your work. As you move across the gallery, you can see each style in turn. To apply a particular style
to your work, click the style.
2. ………………….. is word processor software produced by IBM's Lotus Software group for use on
Microsoft Windows-compatible computers and on IBM OS/2 Warp.
(a) Lotus Word Pro (b) Word processors
(c) Word document (d) Word Ribbon.
4. ………………is a part of the Lotus SmartSuite office suite for Microsoft Windows.
(a) Word wraps (b) Lotus Freelance Graphics
(c) Lotus Word Pro (d) Word Ribbon.
Word Pro was based upon Ami Pro but was substantially rewritten (including a new native document format).
Lotus obtained Ami Pro by acquiring Samna to round out its office suite, and continued to develop the product
further, with version 3 becoming a 32-bit application available for Microsoft Windows and IBM OS/2. Word
Pro lets you create reports, documents, and proposals quickly, with a word processor built for an
Internet-centered world. Switching to Word Pro? You will feel comfortable right away: Word Pro offers
excellent file compatibility with Microsoft Word, plus a choice of other menu formats, including Lotus Ami
Pro, Microsoft Word, and WordPerfect.
1.7.2 Lotus 1-2-3—Spreadsheet
Lotus 1-2-3 is a spreadsheet program from Lotus Software (now part of IBM). It was the IBM PC's first
"killer application". The Lotus Development Corporation was founded by Mitchell Kapor, a friend of the
developers of VisiCalc. 1-2-3 was originally written by Jonathan Sachs, who had written two spreadsheet
programs previously while working at Concentric Data Systems, Inc.
Unlike Microsoft Multiplan, it stayed very close to the model of VisiCalc, including the "A1" letter and
number cell notation, and slash-menu structure. It was free of notable bugs, and was very fast because it was
programmed entirely in x86 assembly language and bypassed the slower DOS screen input/output functions in
favor of writing directly to memory-mapped video display hardware.
The name "1-2-3" stemmed from the product's integration of three main capabilities. Along with being a
spreadsheet, it also offered integral charting/graphing and rudimentary database operations. Data features
included sorting data in any defined rectangle, by order of information in one or two columns in the
rectangular area. Justifying text in a range into paragraphs allowed it to be used as a primitive word processor.
Lotus Freelance Graphics is a part of the Lotus SmartSuite office suite for Microsoft Windows. (Previous
versions were also released for OS/2.) It allows users to create and compile text, digital images, diagrams,
basic drawings, and charts (such as bar charts and pie charts) into a digital slide show.
Lotus SmartCenter is a toolbar that lets users quickly access programs, their calendar, Internet bookmarks,
and other resources.
Lotus Approach is a relational database management system included in IBM's Lotus SmartSuite for
Microsoft Windows.
Lotus Approach is the award-winning relational database designed to manage, analyze and report on business
information. It offers breakthrough ease of use, unprecedented cross-product integration, connectivity, and
outstanding power and analysis capabilities. Computing features maximize the sharing of information in the
organization. Approach offers tight integration with Lotus Notes, making it an excellent tool for reporting on,
analyzing and updating Notes data. Approach lets users seamlessly connect to all data, whether it is stored in
dBASE, DB2, Oracle, Lotus Notes or almost anywhere else.
Lotus Organizer is a personal information manager package. It was initially developed by Threads, a small
British software house, reaching version 3.0. Organizer was subsequently acquired by Lotus Development
Corporation, for whom the package was a Windows-based replacement for Lotus Agenda. For several years it
was the unquestioned market leader before it was gradually overtaken by Microsoft's Outlook. It is also the
only PIM package recommended by the British Philosophical Association.
It is notable for using the organizer graphical metaphor for its user interface and is often bundled within Lotus
SmartSuite.
Organizer was among the first and most important software packages used as an electronic agenda, and its
usability was so good that it is still appreciated even now. The so-called current version is actually more than
10 years old: Lotus and IBM never substantially updated the software after version 5, and the 6 and 6.x
releases are really minor upgrades.
The well-known Covey organizer has a current software version that is clearly inspired by Lotus Organizer.
It is surprising that Lotus and IBM passed up the chance to fill a market niche that Outlook could never reach.
IBM continues to support and ship Lotus Organizer. Version 6.1 is the most recent version, with support for
Windows 2000 and Windows XP. It is an electronic day planner with tabs for each section and pages that
turn. You can quickly see all your calendar, contacts, to-dos, calls, notes, Web information and more at a
glance. No more looking for sticky-note reminders or lost scraps of paper; it is all there, right before your eyes.
Lotus FastSite — web design software - .htm files
Lotus ScreenCam — recording of screen activity for demos and tutorials - .scm, .exe, wav files
Caution
Do not use the 64-bit version of Office SharePoint Server 2007 to crawl Lotus Notes because the Lotus C++
API Toolkit is available only in 32-bit.
Autopilots to guide you through creating new documents and importing data.
Charts and equations
Data source connection capabilities for easy mail merges and access to your existing databases
XML file formats for easy opening by other applications, plus extremely small file sizes
Easy, high-quality conversion to and from Microsoft Office and other files
HTML hotlinks from text or buttons
A huge gallery of clip art you can use in your documents, modify, and add to
Animation in presentations, plus animated GIFs
Available in many languages, plus Asian language support
Figure 1.9: Animation in presentations, including animated GIFs.
Anytime, anywhere access: Web-based Google Docs safely stores documents online, making them accessible
to authorized users from any computer or mobile device, whenever they are needed. There is no need to save
files to a USB thumb drive; you can always access your files from any Internet browser.
Collaboration support: Google Docs lets users easily invite others to work on the same document, at the same
time, without the hassle of attaching and sending documents. Sharing privileges ensure access by only the right
people or groups, and allow either editing or read-only access.
Auto save and revision history: Continuous auto save ensures that current work stays safe, preserving
ongoing drafts and edits. A complete revision history makes it easy to review, compare, or revert to a prior
version at any point.
Shared collections: Files and docs that are regularly used by teams or groups stay organized and up to date
without the need to manage and communicate changes.
Templates: Ready-made templates covering a wide range of document and report types help jump-start writing
projects. You can also create and publish your own document templates to establish assignment structures for
your students. Templates can be copied with one click and then modified like any other document.
Questions
1. What was the purpose of developing the Google Docs service?
2. Are there any disadvantages to using Google Docs? If yes, discuss them briefly.
1.10 Summary
Word processors make it easy to move sections of text from one place to another within a document, or
between documents.
Making presentation using slides prepared in presentation graphics software is fast becoming one of the
modern ways of exchanging ideas between the speaker and his audience.
A database is a collection of information that is organized so that it can easily be accessed, managed, and
updated.
A distributed database is one that is dispersed or replicated among different points in a network.
Lotus is most commonly known for the Lotus 1-2-3 spreadsheet application; SmartSuite is an office suite
from Lotus Software.
Lotus Organizer is a personal information manager package.
1.11 Keywords
Database: A database is a collection of information that is organized so that it can easily be accessed,
managed, and updated.
Macros: A macro is a character or word that represents a series of keystrokes. The keystrokes can represent
text or commands.
Open Office: It is a volunteer-run project whose aim is to build a world-class office suite, available to all.
Spell checker: A utility that allows you to check the spelling of words. It will highlight any words that it does
not recognize.
Star Office: It is a full featured office suite that you can use to create text documents and Web pages,
spreadsheets, slide presentations, and drawings and images.
Thesaurus: A built-in thesaurus that allows you to search for synonyms without leaving the word processor.
2.0 Objectives
After studying this chapter, you will be able to:
Discuss the basics of MS Word
Explain the menus and commands in MS Word
Describe the Microsoft Office template wizard
Explain different page views and layouts
Work with styles in MS Word
2.1 Introduction
Microsoft Office 2007 Professional Software contains five programs: Word is the word processing software
that has replaced the typewriter. It is commonly used to create letters, mass mailings, resumes, newsletters and
so on.
Excel is a program used to create spread sheets. Spread sheets are commonly used to create payroll, balance a
check book or track an organization's finances.
PowerPoint is used to create a slideshow that helps address the topics being covered. It is commonly used to
help discuss a topic or provide training.
Access is a database management program. It allows large quantities of information to be easily searched,
referenced, compared, changed or otherwise manipulated without a lot of work.
Outlook is an e-mail software program that allows users to send and receive e-mail. It also allows you to keep
a personal calendar and/or group schedule, personal contacts, personal tasks and has the ability to collaborate
and schedule with other users.
Microsoft Works is best described as a less expensive, slimmed down version of Word/Office. It often comes
with the purchase of a home computer that contains a Home Edition of Windows XP or Windows Vista.
Although some commands are similar in Works and Word, they are different programs. Works is not
commonly used by professional organizations and there may be compatibility issues if you try to exchange
documents with Word users. The same is true for other programs such as WordPerfect.
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
With this menu, you can create a new document or file, open an existing document or file,
save a file, and perform many other tasks like printing etc.
We can also display menus similar to previous versions of Word (like MS Word 97) with all the choices listed
initially:
1. Select View > Toolbars > Customize from the menu bar. The Customize dialog box (see Figure 2.13) will
appear.
2. Click on the Options tab.
3. Uncheck the Menus show recently used commands first check box.
4. Click on Close to close the Customize dialog box.
Rulers
The rulers display horizontal and vertical scales that reflect the width and height of your typing area. The
horizontal scale is invaluable when you want to quickly set tabs, margins, and indents. If you do not see the
rulers, select View > Ruler. In Normal view you will see only the horizontal ruler; to see both rulers you
should be in Page Layout view. If you do not want to see the ruler, select View > Ruler again to turn it off.
When you start OTW for the first time, you have to define at least one profile in the user profile window. This
is the information that will be used during the generation of the Word document or PowerPoint presentation
when the OTW has been finished successfully.
After selection of the template or presentation, you are still able to make temporary changes to the profile
definitions (see Figure 2.19).
You are also able to select the languages in which the document or presentation must be defined. Depending
on the employee's role, the language can be selected and defined in different user profiles. In Figure 2.21 the
same Word template is chosen, but now defined in the Dutch language.
The language in which the OTW is presented depends on the language of your Microsoft operating system:
if you have a Dutch operating system, OTW will be presented in Dutch; if you have a French operating
system, OTW will be presented in French.
Caution
Be careful to restore the document to normal mode after adding merge fields. If you forget to restore the
document to normal mode, the Document Server can stop responding when correspondence is generated.
When you click on the Print Layout button in the Document Views section this will change the view of the
document you are working on to look just like the document will print. The next button, Full Screen Reading,
changes the view of the document to a larger view that takes up most of the screen and removes the buttons at
the top to maximize the view for easy reading and editing. If you choose this view click the close button at the
top right corner to return to the normal view. The Web Layout button will change the view of the document to
appear as it would if the pages were turned into a web page. The outline button will show your document as an
outline then give you another tab with more outlining tools. The last button, Draft, will give you a chance to
view your document as a draft for quick editing. This view removes elements of the document such as headers
and footers for easy editing. Next we look at the Word 2007 Page Layout tab. We will go through it step by
step, explaining how all of the buttons in each section work. The Page Layout tab is where you can change
the appearance of the entire Word document. Open the greeting card we were working on, then click the
Page Layout tab and we will get started.
The first section of the Page Layout tab is Themes (see Figure 2.25). Themes is a great feature if you are
typing an elaborate document, want to use a variety of fonts and colors, and then want to duplicate those fonts
and colors in another document or throughout a long document. A document theme is a set of formatting
choices that includes a set of theme colors, a set of theme fonts (for which you can specify a heading and body
text font), and a set of theme effects (lines and fill effects). We are not going to use the Themes section on our
greeting card, but we want you to understand what the feature does. Click the down arrow under Themes (see
Figure 2.26).
Figure 2.26: Built in themes.
You will get a list of pre-designed themes you can apply to your document. Each theme will include font
colors, font styles, font sizes, and effects, including lines, fill effects, and colors. If you already selected a
theme for your document and no longer want to use it, click the Reset to Theme from Template option. If you
do not like any of the built-in themes, you can click More Themes on Microsoft Office Online and there will
be many more to choose from. Or you can create your own theme with the other options in the Themes
section: click back on Themes and, at the bottom, click Save Current Theme; you will be prompted for a file
name. Once you give your theme a name, it will be available to use on other documents you create. Now we
will go over how to create a custom theme with the other features in this category. Click on the down arrow
next to the square made up of 4 colors to learn how to change the theme color.
When you click the custom color theme drop-down arrow, you will get a list of built-in color themes for
your document (see Figure 2.27). These colors are used for a variety of things, including heading, body, and
accent colors. Now click Create New Theme Colors (see Figure 2.28).
Figure 2.28: Create New Theme Colors window.
The Create New Theme Colors window will appear. Here you get a better idea of what each line of colors is
going to do. Not only can you now see what each color is for, but you can also modify the colors to your
liking: click the dropdown arrow next to the color you want to change and select a new color. When you are
finished modifying the colors, type a theme color name in the Name box and click Save. Your new modified
color theme will appear in the list of built-in color themes (see Figure 2.29).
Next is the Font theme selector. Click the dropdown arrow next to the box with an A.
This menu works just like the colors menu, but changes the fonts instead. It has the same built-in selections
Microsoft Word 2007 provides, plus the Create New Theme Fonts option. Click Create New Theme Fonts.
Figure 2.30: Create New Theme Fonts window.
When the Create New Theme Fonts window (see Figure 2.30) opens, go ahead and play around with the
Heading font and the Body font by clicking the dropdown arrow next to the font names. It will show you a preview of
your selections in the Sample section. If you want to save your selections type a name in the Name section and
click save.
The last button in the Themes section is the Effects button (see Figure 2.31). Theme effects are sets of lines and fill
effects used on shapes and graphics you use in your document. Click the drop down arrow to see your list of
choices.
The next button is Change Styles (see Figure 2.32). This button gives you the opportunity to customize the style
you choose. Click the down arrow to see the options. The first selection is Style Set. A style set is the
combination of formatting changes you make to a document. Place your mouse over Style Set to see a list of
options. These options will change the style selections you can choose from. If you have made style changes to
your document you can click the selections at the bottom of the list to reset changes made to a template, reset
the document to quick styles, or save your customized style as a Quick Style set.
The next selection in the Change Styles button is Colors. Place your mouse over Colors. You will see a list of
preselected color combinations. These color combinations are for different text colors throughout your
document. If you select the Create New Theme Color at the bottom of the menu you will see a list of the
different types of text you can change the colors see Figure 2.33.
Figure 2.33: Change Styles button.
Next is the Fonts selection in the Change Styles button. Click on the Fonts selection to see a list of default
font combinations. The top font is the Heading font and the bottom is the text for the body of the document.
Use the scroll bar on the right-hand side to scroll through the selections. At the bottom of the Fonts menu you
will see Create New Theme Fonts. Click on this option. The Create New Theme Fonts window will open (see
Figure 2.34).
In the Create New Theme Fonts window you can customize a font theme. Simply use the down arrows to
select a font for the Heading and/or the Body. Name your font theme and click Save. The last option in the
Change Styles button is the Set as Default selection. This will take the current theme of your document and
set it as the default, so every time you start a new document the theme you have created will be used.
To correct a word or sentence:
1. Select the word or sentence
2. Type the new word or sentence
objects on the page. The first three buttons are list buttons. The first button is a bulleted list.
Click on the dropdown menu to see your selections for bullets (see Figure 2.35).
These are just a few of your options; if you click Define New Bullet, you have endless options to create
your own look and feel.
When you click Define New Bullet, the window shown in Figure 2.39 pops up and you can create your own
bullet by using the Symbol, Picture, or Font button. The Alignment dropdown menu lets you choose where on
the page you would like your list to appear, and the Preview section will show you what your bullet will look
like before you click OK and start your bulleted list. Once you select your bullet style, a bullet is
automatically placed in your document. Type your text after your bullet and hit Enter to create the next item
in your list. When you are finished with your list, hit Enter twice and the bullets will be discontinued. The
same rules apply to the numbered list. Click the drop-down menu on the numbered list button to get your
options (see Figure 2.40).
When you click Define Your Own Format, a window appears where you can customize your own format
instead of using a predefined one. Just enter your customizations in each category, preview the result in the
preview pane until you are satisfied, and click OK.
The numbers work the same way as the bullets: after typing your text, hit Enter and the next number will
appear. When your list is complete, press Enter twice and your numbered list will end. The last list button is
the multilevel list. This is great for outlines. It works the same as the lists above, with all of the options and
customizations, but it has one difference: to get to the next level in the list, press Tab and your list item will
tab over and change to the second-level format.
Once you have tabbed over to the second, third, or a deeper level, when you hit Enter the list will stay at that
level until you hold down the Shift key and press Tab. This keyboard command brings your list up a level. To
discontinue the list, simply hit Enter twice and you can continue your document without continuing the list.
The next two buttons increase and decrease the indent at the beginning of a paragraph. Notice the hourglass
marker at the top left of your Word document in the ruler (see Figure 2.42).
Now click on the Increase Indent button. Did you notice how the hourglass moved to the right? This sets a
tab: each time you finish a paragraph and press Tab on your keyboard, the indent will move to the location of
the hourglass. By clicking the Increase Indent button more than once, the hourglass moves further to the
right. The Decrease Indent button moves the tab back.
The Sort button will alphabetize a list of words or sort numbers for you. To use this feature, simply select the
text you would like to organize by clicking in front of the text and dragging the mouse to the end of the text,
then click the button. It will alphabetize the list of words for you.
The next button, Show/Hide ¶, is one we use all of the time to help with formatting. It displays formatting
symbols in your document, allowing you to see if you have an extra space between words (shown as dots) or
an extra line break (shown as a paragraph symbol).
The four alignment buttons justify your text on the page. Either click a button before you start typing, or
highlight text that has already been typed and click the justification you want.
The next button controls line spacing. To use this feature, either click the drop-down menu and select an
option, or highlight your text, then click and select. Most users will only need one of the default options
listed.
If you are interested in other options besides the default line spacing selections, click Line Spacing Options
(see Figure 2.43). This will open the Paragraph window (see Figure 2.44). In this window you can do the
same things we have been using the buttons for.
Under Line spacing you have a drop down menu where you can select from single, double, 1.5 lines, At Least,
Exactly, or Multiple. The At selection is where you can input your custom settings.
The Preview section at the bottom will let you see what the spacing will look like in your document. Click the
OK button to return to your document and make the changes.
The Fill button and Borders button can be used on individual lines of text. If you would like to change the
background color of an area of text in your document, click the Fill button before you start typing and choose
a color from the drop-down menu. When you are finished typing, click the Fill button again and the
background will return to the original color (see Figure 2.45).
Figure 2.45: Text background color.
Highlight the text you would like to have a different background, select a color from the dropdown menu,
and the background of the selected text will change.
The Borders button works the same way as the background fill, but places a border around the text. Click the
dropdown menu to see all of your options in Figure 2.46.
Figure 2.47: The insertion point is a flashing vertical line; the end-of-file marker (appearing only in Draft or
Outline view) is a horizontal, non-flashing line.
Text you type always appears at the insertion point. To enter text, just type as you would in any program. The
following keys have specific functions:
Enter: Press this key to start a new paragraph.
Shift+Enter: Press this key combination to start a new line within the same paragraph.
Ctrl+Enter: Press this key combination to start a new page.
Tab: Press this key to move to the next tab stop (by default every 0.5").
Backspace: Press this key to delete a single character to the left of the insertion point.
Delete: Press this key to delete a single character to the right of the insertion point.
You can also delete a text selection of any size, including text and/or other objects, by pressing the Delete or
Backspace key.
Line Breaks versus Paragraph Breaks: A surprising number of people have trouble understanding the
difference between a new paragraph and a new line. Yes, starting a new paragraph does also start a
new line, so on the surface they seem to be doing the same thing. But if you turn on the Show/Hide ¶
feature (on the Home tab), you will see that two completely different symbols are inserted.
A paragraph break (¶ symbol) creates a whole new paragraph, which can have its own indentation, bullets and
numbering, line spacing, and other paragraph-level settings.
A line break (↵ symbol) is like any other character of text within the paragraph, except instead of printing a
letter on the screen, it moves the insertion point to the next line. The text after the line break has the exact
same paragraph-level formatting as the text before the break, because it is all one paragraph.
Line breaks come in handy whenever you do not want the stylistic attributes of multiple paragraphs. For
example, suppose you want to create a bulleted list of mailing addresses, with each complete address as a
separate bullet point. If you press Enter between the lines of each address, each line will have its own bullet
character, like this:
John Smith
240 W. Main Street
Macon, IL 62544
By using line breaks instead, you can create a single bulleted item with multiple lines, like this:
John Smith
240 W. Main Street
Macon, IL 62544
Caution
Be careful: even if the active document does not have Track Changes enabled, you still cannot access that
check box if any open document is tracking changes.
Figure 2.48: The Undo button undoes the last action when clicked; it also has a drop-down list from which
you can choose to undo multiple actions at once.
The Repeat feature enables you to repeat an operation such as typing, formatting, inserting, and so on. The
Repeat button looks like a U-turn arrow, and appears in place of the Redo button on the Quick Access toolbar,
when available. Its shortcut is also Ctrl+Y; this works because Repeat and Redo are not available at the same
time (see Figure 2.49).
Figure 2.49: The Repeat button makes it easy to repeat the last action you took.
The single and double quotation marks in Table 2.2 are typographical—that is, they differ depending on
whether they are at the beginning or end of the quoted phrase. This is different from the straight quotation
marks and apostrophes that you can directly type from the keyboard.
There are no AutoCorrect entries for the dashes and the quotation marks because they are not needed. Word
automatically converts straight quotes to typographical ones (Word calls these "smart quotes") and two
hyphens in a row to a dash. If you do not want such a change to occur, use Undo (Ctrl+Z) immediately after
Word makes the change to reverse it. Undo also reverses any of the other AutoCorrect conversions if you
catch them immediately after they occur.
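The on-the-fly substitution described above can be illustrated with a simplified Python sketch. This is not Word's actual AutoFormat logic (the real rules are more involved); it only looks at the character before each quote to decide whether it opens or closes, and replaces double hyphens with a dash.

```python
# Simplified "smart quotes" conversion: a straight quote at the start of the
# text or after whitespace becomes an opening curly quote; any other straight
# quote becomes a closing one. Two hyphens in a row become a dash.
def smarten(text):
    out = []
    for i, ch in enumerate(text):
        if ch == '"':
            opening = i == 0 or text[i - 1].isspace()
            out.append("\u201c" if opening else "\u201d")   # " or "
        elif ch == "'":
            opening = i == 0 or text[i - 1].isspace()
            out.append("\u2018" if opening else "\u2019")   # ' or '
        else:
            out.append(ch)
    return "".join(out).replace("--", "\u2014")             # -- becomes a dash

print(smarten('He said "hi" -- and left.'))
```

Note how an apostrophe inside a word ("don't") correctly becomes a closing-style curly quote, because it does not follow whitespace.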
To disable an AutoCorrect entry, or to turn off the automatic conversion of straight quotes to smart quotes or
of two hyphens to a dash, use the AutoCorrect Options dialog box.
Figure 2.50: Symbols can be inserted from the Symbol drop-down list on the Insert tab.
If the symbol you want does not appear, click More Symbols to open the Symbol dialog box, shown in Figure
2.51. From here you can select any character from any installed font, including some of the alternative
characters that do not correspond to a keyboard key, such as letters with accent symbols over them.
Figure 2.51: The Symbol dialog box can be used to insert any character from any font.
For a wide choice of interesting and unique symbols, check out the Wingdings fonts, which you can select
from the Font drop-down menu.
You can also find a symbol by its character code, which is a numeric identifier of a particular symbol in a
particular coding system. The two main coding systems are ASCII and Unicode. ASCII is the older system,
and characters can be identified using either decimal or hexadecimal numbering in it. Unicode is the Windows
standard for character identification, and it uses only hex numbering. Select the desired coding system from
the From drop-down list and then type the character code in the Character Code box.
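The decimal/hexadecimal relationship between a character and its code can be demonstrated with Python's built-in ord() and chr(), which map between characters and their Unicode code points:

```python
# ASCII: 'A' is decimal 65, which is hexadecimal 41.
print(ord('A'))          # 65
print(hex(ord('A')))     # 0x41

# Unicode character codes are given in hex; the em dash is U+2014.
print(hex(ord('\u2014')))  # 0x2014

# chr() goes the other way: from a code to the character.
print(chr(0x00E9))       # é (Latin small letter e with acute)
```

This is the same lookup the Symbol dialog performs when you type a value in the Character Code box.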
On the Special Characters tab of the dialog box are some of the most common typographical characters, along
with reminders of their keyboard shortcuts. If you need to insert one of these common characters, finding it on
the Special Characters tab can be easier than trying to wade through all the characters in a font for it.
6......................display the same types of controls as a dialog box, such as command buttons, options, and list
boxes.
(a) Dockers (b) Property Bar
(c) Colour Palette (d) Toolbox
2.13 Bullets in Word 2007
When you are writing a document, you need to make it easy to read. Professional writers sometimes talk about
"entry points": points where a reader can quickly and easily start reading your document and pick up what you
are trying to say. Bullet points (see Figure 2.54) allow you to quickly structure information so your reader can
easily grasp exactly what you mean.
That is the easy way to use bullets in your document. But what if you do not want to use the standard black dot
bullet? Maybe you want to use something with a little more flair?
Now, select your first heading, click Multilevel List in the Home tab, and choose "Define New Multilevel
List..." (see Figure 2.56).
On the bottom left of the "Define new Multilevel list" window, click the "More >>" button. Figure 2.57 shows
what you will see:
Figure 2.57: Multilevel List Dialog.
Here we can select a level of the list (in the top left) and change its style.
Let us say you want each Heading 1 to include "Chapter". Simply enter "Chapter" before the number in the
"Enter formatting for number:" text field. But aside from styling your list, the important step is to link this
style to your heading. You do this by selecting "Heading 1" in the "Link level to style:" drop-down list (see
Figure 2.58).
For levels other than 1, you can include the number of the parent level. This is very useful when you want
subsections to look like "1.2.1 Title". Simply edit those levels and select the level you want in the "Include
level number from:" drop-down list. Do not forget to link each level to the appropriate heading style (see
Figure 2.59).
Figure 2.59: Include Level Number.
You should now have each heading numbered correctly! (see Figure 2.60.)
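The "include level number from" behavior amounts to prefixing each heading's own counter with the counters of its ancestors. A minimal Python sketch of that logic (illustrative only; this is not how Word stores lists):

```python
def number_headings(levels):
    """levels: heading levels (1, 2, 3, ...) in document order.
    Returns numbers like '1', '1.1', '1.2.1'."""
    counters = []
    result = []
    for level in levels:
        if level > len(counters):
            # entering a deeper level: add a fresh counter
            counters += [0] * (level - len(counters))
        else:
            # returning to a shallower level: drop deeper counters
            counters = counters[:level]
        counters[level - 1] += 1
        result.append(".".join(str(c) for c in counters))
    return result


print(number_headings([1, 2, 2, 3, 1, 2]))
# ['1', '1.1', '1.2', '1.2.1', '2', '2.1']
```

Note how moving back to Heading 1 resets all deeper counters, which is exactly what you see when Word renumbers subsections under a new chapter.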
While this will do the trick, you have to set the new style manually. Create a new style by opening the "Styles"
window and clicking "New Style". Name it "Appendix", set the "Style type:" to "Linked (paragraph and
character)" and "Style for following paragraph:" to "Normal" (see Figure 2.62).
Now, this is where the magic happens: set the "Style based on:" to "Heading 1" and click "OK". This links the
style with Heading 1, which means two things: a) any changes to Heading 1 will be applied to Appendix as
well, and b) titles using this style will appear at the same level as Heading 1 in the Table of Contents.
Figure 2.62: Appendix Style.
If you now apply the Appendix style to your titles, you will see that "Chapter" appears before them. To change
this to "Appendix", select your title and define a new multilevel list. Next, select the fourth level and set the
"Link level to style:" to "Appendix". Make any changes to the style you want (like setting the text to
"Appendix") and you are done!
See Figure 2.63.
When you generate your Table of Contents, you will see that appendices are at the same level as Heading 1. If
you change the Appendix style to be based on Heading 2, this will be reflected in the Table of Contents as
well (see Figure 2.64).
Figure 2.64: Table of Contents.
Most options on the AutoFormat tab are also found on the AutoFormat As You Type tab.
List styles: Applies list styles to numbered, bulleted, outlines, and other lists. It replaces any numbers or
bullets that were inserted manually
Other paragraph styles: Applies styles other than for headings and lists (e.g., body text)
Preserve styles: Retains the styles you have already applied in your document
Plain text e-mail documents: Formats e-mail messages when they are opened
Click the Options button in the bottom left corner of the dialog box. This opens the Word Options dialog box
(see Figure 2.68).
You will see the Display section of the Word Options box, which offers a number of options. You can choose
to print hidden text, backgrounds, drawings, and properties. You can also have Word update links and form
fields before printing. If you want more options, open the Advanced section of the Word Options box (see
Figure 2.69) and scroll down to Print.
Figure 2.69: Advanced section of the Word Options box.
You can set options for print order, duplex printing and also print quality. When you are done, click OK. The
options you select will be kept until you change them again.
2.17 Summary
Word is the word processing software that has replaced the typewriter. It is commonly used to create
letters, mass mailings, resumes, newsletters and so on.
Word 2007 is full of new tools and options, expanded capabilities, and significant changes.
Themes include predesigned settings for colors, fonts, and effects, and things like sidebars and quotes have
their own styles as well.
Microsoft Word is word processing software. It is used to create and edit texts, letters, reports, and
graphics.
The vertical scroll bar is located along the right side of the screen. The horizontal scroll bar is located just
below your document. The horizontal scroll bar is only visible when your document is larger than your
screen.
The shortcut menus are helpful because they display only those options that can be applied to the item that
was right-clicked and, therefore, prevent searching through the many menu options
The rulers display horizontal and vertical scales that reflect the width and height of your typing area.
The Word 2007 Styles section is used to quickly format an entire document.
2.18 Keywords
ASCII: It is the older system, and characters can be identified using either decimal or hexadecimal numbering
in it.
Microsoft Word: It is a word processing program that allows you to create, revise, and save documents for
printing and future retrieval.
Paragraph break: A paragraph break creates a whole new paragraph, which can have its own indentation,
bullets and numbering, line spacing, and other paragraph-level settings.
Ruler: The Ruler is used as a quick way to adjust margins. Margins may also be adjusted by using a preset
option provided by Word, or through the Page Setup dialog box.
Style: A style is a set of formatting characteristics such as font size, color, paragraph alignment, spacing, and
shading.
Text area: The text area is basically where you type in your texts (letters and numbers). It is the open area
with white background (depending on your chosen color) the blinking vertical line in the upper-left corner of
the text area is the cursor.
Unicode: It is the Windows standard for character identification, and it uses only hex numbering.
3.0 Objectives
After studying this chapter, you will be able to:
Discuss spell checking
Explain the thesaurus
Discuss find and replace
Explain headers and footers
Explain working with columns
Discuss tabs and indents
Explain creating and working with tables
3.1 Introduction
Some of the advanced features of Microsoft Office Word 2007 offer ways in which you can automate and
streamline the way you work. You can use macros in Word 2007 to easily automate repetitive, complex tasks.
A macro is a set of instructions that can group a series of actions and keystrokes as a single command. You can
also simplify the management of your Word 2007 files by using master documents. Master documents divide
large files into related subdocuments through a series of links. This course demonstrates how to create, edit,
copy and delete macros through the use of the macro recorder. In addition, aspects of master and subordinate
documents are explained, including outline levels, rearranging and restructuring subordinate documents in the
master, and converting, deleting, merging, and locking subdocuments, as well as checking spelling.
Any errors will display a dialog box that allows you to choose a more appropriate spelling or phrasing.
If you wish to check the spelling of an individual word, you can right click any word that has been underlined
by Word and choose a substitution.
Figure 3.3: Suggested spelling list.
3.3 Thesaurus
The Thesaurus allows you to view synonyms. To use the thesaurus:
Click the Review Tab of the Ribbon
Click the Thesaurus Button on the Proofing Group.
The thesaurus tool will appear on the right side of the screen, where you can view suggested synonyms.
You can also access the thesaurus by right-clicking any word and choosing Synonyms on the menu.
The same Find and Replace window pops up, except the Replace tab is selected. This feature is useful if you
have dates or names in a form letter you need to change. Type the word you would like to change in the Find
what text box, and type the word you would like to change it to in the Replace with text box.
The Replace, Replace All, and Find Next buttons will no longer be grayed out. If you would like to replace the
words one at a time, click Find Next; if it is a word you want replaced, click Replace, and continue that way
through the document.
If you know you want every word replaced, click Replace All and each occurrence in the document will be
replaced.
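The difference between Replace and Replace All can be illustrated with Python's str.replace, which takes an optional count of how many occurrences to replace (the sample text is invented for illustration):

```python
text = "Dear NAME, your order, NAME, has shipped."

# "Replace All": every occurrence at once
print(text.replace("NAME", "Alice"))
# Dear Alice, your order, Alice, has shipped.

# "Find Next" + "Replace": one occurrence at a time
print(text.replace("NAME", "Alice", 1))
# Dear Alice, your order, NAME, has shipped.
```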
Click Edit Header or Edit Footer. Type text or insert graphics and other content by using the options in the
Insert group on the Design tab, under the Header and Footer Tools tab. If you do not see a gallery of
header or footer designs, there might be a problem with the Building Blocks template on your computer.
To save the header or footer that you created to the gallery of header or footer options, select the text or
graphics in the header or footer, and then click Save Selection as New Header or Save Selection as New
Footer.
Make the first page header or footer different from the rest of the pages
On the first page of the document, double click the header or footer area.
Under Header and Footer Tools, on the Design tab, in the Options group, select the Different First Page
check box.
If your document includes a cover page from the gallery of cover pages in Office Word 2007, the Different
First Page option is already turned on. Inserting or editing a header or footer on this page does not affect the
other pages in the document.
Create a header or footer, or make changes to the existing header or footer, on the first page.
Make the header or footer different for odd and even pages
For example, you can use the title of the document on odd-numbered pages, and the chapter title on even-
numbered pages. Or, for a booklet, you can place page numbers on odd-numbered pages to be on the right side
of the page and page numbers on even-numbered pages to be on the left side of the page. This way, the page
numbers are always on the outside edge when the pages are printed on both sides of the paper.
Create Odd and Even Headers or Footers in A Document That Does Not Yet Use Headers or Footers
Click an odd-numbered page, such as the first page of your document.
On the Insert tab, in the Header and Footer group, click Header or Footer.
In the gallery of headers or footers, click a design labeled (Odd Page), such as Austere (Odd Page).
If you do not see a gallery of header or footer designs, there might be a problem with the Building Blocks
template on your computer
Under Header and Footer Tools, on the Design tab, in the Options group, select the Different Odd and Even
Pages check box.
Under Header and Footer Tools, on the Design tab, in the Navigation group, click Next Section to advance
the cursor to the header or footer for even-numbered pages.
Under Header and Footer Tools, on the Design tab, in the Header and Footer group, click Header or Footer.
In the gallery of headers or footers, click a design labeled (Even Page), such as Austere (Even Page).
If necessary, you can format text in the header or footer by selecting the text and using the formatting options
on the Office Fluent Mini toolbar.
If you want to switch to a different predefined header or footer, repeat these steps, and choose a different
header or footer from the gallery.
Create odd and even headers or footers in a document that already has headers or footers
Double-click in the header or footer area.
Under Header and Footer Tools, on the Design tab, in the Options group, select the Different Odd and
Even Pages check box.
The existing header or footer is now configured for odd-numbered pages only.
Under Header and Footer Tools, on the Design tab, in the Navigation group, click Next Section to
advance the cursor to the header or footer for even-numbered pages, and then create the header or footer
for even-numbered pages.
Click Top of Page, Bottom of Page, or Page Margins, depending on where you want page numbers to
appear in your document.
Choose a page number design from the gallery of designs.
Choose a design that positions the page number where you want it. For example, if you want some header
content aligned on the left margin and the page number aligned on the right margin, choose a right-aligned
page number design.
Do one of the following:
o To insert header or footer content before the page number, press the HOME key, enter the content, and
then press TAB to position the content.
o To insert header or footer content after the page number, press the END key, press TAB, and then enter
the content.
o If you chose a design from the Page Margins designs, click in the header or footer, and add the content that
you want.
1. To select the number of columns, in the Number of columns text box, use the nudge buttons or type the
desired number of columns
OR
Within the Presets section, select the desired option
4. From the Apply to pull-down list, select Selected Text
Your choices will depend upon whether you selected the text or placed your insertion point in the text to create
columns.
5. Click OK
The columns are applied to the selected text only.
2. Point to the column boundary until the mouse pointer changes to a double-headed arrow
3. Click and drag the column boundary for the appropriate column width
Adjusting Column Width: Column Dialog Box Option
Place the insertion point in the document that is formatted into columns OR
Select the text that is formatted into columns
Windows: From the Page Layout command tab, within the Page Setup section, click Columns » select More
Columns...
OPTIONAL: To create columns of unequal width, make sure that Equal column width is not selected
NOTE: The option is not selected when no checkmark appears
Under Width and spacing, use the nudge buttons or type values for the column attributes you want to change
NOTE: The Width will alter the width of the column and the Spacing will alter the space between the columns.
Click OK
Adding Lines
Adding lines between columns can add an element of design to your document. You may want to add lines to
your column if you are following a style similar to that of a newsletter or bulletin. The following feature
automatically adds lines between all columns.
1. Place the insertion point within the column text
2. Windows: From the Page Layout command tab, within the Page Setup section, click Columns » select
More Columns...
3. Select Line between
4. Click OK
Inserting Column Breaks
Insert a column break when you want to force the end of a column and the beginning of another.
Place the insertion point at the point in the text where you want the column to break
Windows: From the Page Layout command tab, within the Page Setup section, select Breaks » select Column
Balancing Column Endings
When using columns, often the text in the last column is of uneven length with the previous column. Inserting
a continuous column break will balance the column lengths, giving your document a finished, professional
look.
Place the insertion point after the last character in the last column
Windows: From the Page Layout command tab, within the Page Setup section, select Breaks » select
Continuous
Caution
Be careful when sizing columns: if a cell is selected and you attempt to drag the sizing tool to change the
column width, only the width of the row holding the selected cell will change. Make sure no cells are selected
if you want to size the entire column.
3.6.3 Deleting Columns
You can choose to delete all columns in a document or only a section of columns.
Deleting Columns: Button Option
Place the insertion point in the document that is formatted into columns
OR
Select the text that is formatted into columns
Windows: From the Page Layout command tab, within the Page Setup section, click COLUMNS
Deleting Columns: Dialog Box Option
Deleting All Columns
Windows: From the Home command tab, within the Editing section, click Select » select Select All
OR
Windows: Press [Ctrl]+[A]
Windows: From the Page Layout command tab, within the Page Setup section, click Columns » select More
Columns
Within the Presets section, select One
From the Apply To pull-down list, select Whole document
Click OK
Deleting Columns from a Section
Select the text that you want changed to one column
Windows: From the Page Layout command tab, within the Page Setup section, click Columns » select More
Columns... The Columns dialog box opens.
Within the Presets section, select One
From the Apply To pull-down list, select This section
Click OK
2. ………………..includes many predesigned headers or footers that you can insert into your document.
(a) Microsoft office word 2003 (b) Microsoft office word 2010
(c) Microsoft office word 2007 (d) Microsoft office word 97
3. Using the button option to ……………..is quick and easy, whereas the Columns dialog box requires more
steps but offers more options for modification.
(a) Create columns (b) Modifying columns
(c) Deleting columns (d) Inserting columns
If you do not see the ruler at the top of your document workspace (below the Ribbon), you will want to turn it
on.
3.7.1 Tabs
Tabs are set, by default, at every ½ inch between your margins (until you start setting custom tabs). You do not
have to do anything special to use the default tabs except press the [Tab] key on your keyboard.
When you do, you can see tabs in the form of arrows ( ) on your screen. (You may need
to click the Show/Hide button in the Paragraph group on the Home tab to see the [Tab] characters.)
If you look closely at the ruler bar, you can see small tick marks at every ½ inch along the bottom. Those are
the default tab stops. However, when you begin to set custom tabs, any default tabs to the left of (or before) the
custom tab stop are automatically deleted. The tab stop indicator is located on the very left edge of your ruler,
just under the Ribbon. Typically, it displays the Left-Aligned Tab symbol ( ). In addition to [Tab] stops that
align along the left, you can create [Tab] stops that align in the center, at the right or with a decimal.
Symbol Means
Left-Aligned Tab
Center-Aligned Tab
Right-Aligned Tab
Decimal-Aligned Tab
Bar Tab
3.7.2 Indents
Whether you know it or not, you create and work with indents every time you create a bulleted or numbered
list.
While you can set your indents in the Paragraph Dialog Box, it is certainly easier to set them using the Ruler.
Navigating in a Table
Please see below to learn how to move around inside a table.
To Insert a Row:
1) Position the cursor in the table where you would like to insert a row
2) Select the Layout tab on the Ribbon
3) Click either the Insert Row Above or the Insert Row Below button in the
Rows & Columns group
To Insert a Column
1) Position the cursor in the table where you would like to insert a column
2) Select the Layout tab on the Ribbon
3) Click either the Insert Columns to Left button or the Insert Columns to
Right button in the Rows & Columns group
To Delete a Row
1) Position your cursor in the row that you would like to delete
2) Select the Layout tab on the Ribbon
3) Click the Delete button in the Rows & Columns group
4) Select Delete Rows
To Delete a Column
1) Position your cursor in the column that you would like to delete
2) Select the Layout tab on the Ribbon
3) Click the Delete button in the Rows & Columns group
4) Select Delete Columns
Formatting a Table
Using Microsoft Word you are able to format a table by changing table lines and colors, shading tables,
adjusting row and column size as well as alignment.
Note: You are able to format data in a table the same way you format it in a document
To Merge Cells in a Table
1) Select the cells that you would like to merge in the table
2) Click on the Layout tab on the ribbon
3) Click the Merge Cells button in the Merge group
7) Click OK
3. When you click the margin type that you want, your entire document automatically changes to the margin
type that you have selected.
You can also specify your own margin settings. Click Margins, click Custom Margins, and then in the Top,
Bottom, Left, and Right boxes, enter new values for the margins.
If you are creating a New Source, choose the type of source (book, article, etc.)
Complete the Create Source Form
If you need additional fields, be sure to click the Show All Bibliography Fields check box
Click OK
Placeholders
Placeholders can be utilized when there is a reference to be cited, but you do not have all of the information on
the source. To insert a Placeholder:
Click Insert Citation
Click Add New Placeholder
Manage Sources
Once you have completed a document you may need to add or delete sources, modify existing sources, or
complete the information for the placeholders. To Manage Sources:
Click the References Tab on the Ribbon
Click the Manage Sources Button on the Citations and Bibliography Group
From this menu you can Add, Delete, and Edit sources (note: you can preview the source in the bottom
pane of the window).
Figure 3.40: Source Manager.
Bibliography
To add a Bibliography to the document:
Place the cursor in the document where you want the bibliography
Click the References Tab on the Ribbon
Click the Bibliography Button on the Citations and Bibliography Group
Choose Insert Built-in Bibliography/Works Cited or Insert Bibliography
To insert a picture:
Place your cursor in the document where you want the illustration/picture
Click the Insert Tab on the Ribbon
Click the Picture Button
Browse to the picture you wish to include
Click the Picture
Click Insert
Smart Art is a collection of graphics you can utilize to organize information within your document. It
includes timelines, processes, workflows, and more. To insert SmartArt:
Place your cursor in the document where you want the illustration/picture
Click the Insert Tab on the Ribbon
Click the SmartArt button
Click the SmartArt you wish to include in your document
Click the arrow on the left side of the graphic to insert text or type the text in the graphic.
Resize Graphics
All graphics can be resized by clicking the image, then clicking and dragging one of its corners until the
picture is the size you want.
The recipient list can be refined by Sort, Filter, Find Duplicates, Find Recipient and Validate addresses
options.
Step 3: Design your data document by combining ordinary document features with Word merge fields.
Placeholders can be used when designing the data document for information pertaining to the intended
recipient. When you are done, edit your document and substitute Merge Fields for the placeholders. To insert a
merge field, position the insertion point where you want the field to appear. In the Mailings tab, choose Insert
Merge Field in the Write and Insert Fields group. Click on the field you want to insert. Special sets of merge
fields like Address Block and Greeting Line can be inserted to save time!
Figure 3.53: Insert Merge Field.
Step 4: Preview the finished document by testing to see how it looks with different data records.
Click the Preview Results button in the Preview Results group of the Mailings tab. Navigation buttons help
you to traverse through the records.
Step 5: Finish the process. Merge the data document with the data source, creating a printed result, a saved
document or an e-mailed document.
Your other option is to use the Mail Merge Wizard! In the Start Mail Merge group of the Mailings tab, click
the Start Mail Merge button and choose Step by Step Mail Merge Wizard.
Figure 3.56: Mail Merge Wizard.
Click the Start Mail Merge button and from the drop down list click Labels… to make a page of mailing
labels. The Letters option is also available to you at this point if you are planning on having a form letter with
a personalized salutation.
For mailing labels; From the Label Options window choose the Label vendor and product number of the labels
you will be using. The label number will appear on the outside of the box of labels you purchase from the
store. After your selection click OK.
Figure 3.59: Label option.
You are now ready to get the information from the database with the contact information. Click the Select
Recipients button and pick Use Existing List from the drop down menu.
You will be prompted to select the source of your database. You will need to change the Look in: field to the
folder where your list is stored and then select your list and choose Open.
Word will bring up an Insert Merge Field window. This will have a list of all the fields you have entered into
your database. We are going to use First Name, Last Name, Address, City, State, and Zip. If your database has
extra fields, like phone number or e-mail address, do not select those for the mailing label. To add a field to a
label, double-click the field name, or click the field name and then press the Insert button. After all the
required fields have been selected, close the window by pressing the Close button.
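Conceptually, a merge field is a named placeholder filled in from each record of the data source, producing one document per record. A minimal sketch using Python's string.Template (the field names and sample records here are invented for illustration):

```python
from string import Template

# The template plays the role of the data document with merge fields.
label = Template("$first $last\n$address\n$city, $state $zip")

# Each record in the data source produces one personalized label.
recipients = [
    {"first": "Ada", "last": "Lovelace", "address": "1 Analytical Way",
     "city": "London", "state": "UK", "zip": "00001"},
    {"first": "Alan", "last": "Turing", "address": "2 Enigma Rd",
     "city": "Bletchley", "state": "UK", "zip": "00002"},
]

for person in recipients:
    print(label.substitute(person))
    print("---")
```

Word's mail merge does the same substitution for every row of your recipient list, which is why extra database fields you never reference (such as phone number) simply go unused.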
5. Continuous feed printers are usually used with dot matrix printers, while page printers are not typically laser
jets or ink jets.
(a) True (b) False
3.13 Summary
Smart Art is a collection of graphics you can utilize to organize information within a document.
A watermark is a translucent image that appears behind the primary text in a document.
The existing header or footer is now configured for odd-numbered pages only.
Columns are a good way to separate sections of your document on one page.
Mail Merge is an automated feature of MS Word that enables you to merge a data source into a copy of a
document to customize or personalize the document.
Many header and footer designs are laid out in a table, and pressing TAB moves the cursor without
inserting a tab stop.
3.14 Keywords
Bibliography: A bibliography is a list of the sources, such as books and articles, that were cited or consulted in
creating a document, typically placed at the end of the document.
Header and Footer: Headers and footers are those little identifiers that run across the top and bottom of your
document, providing important background information about it.
Mail Merge: Mail merge is a software function which allows you to create multiple (and potentially large
numbers of) documents from a single template form and a structured data source.
Master documents: A Master Document is a document that contains a set of related documents called Sub
Documents.
Page Layout: Page layout is the process of placing and arranging and rearranging text and graphics on the
page to produce documents such as newsletters, brochures, books, etc.
4.0 Objectives
After studying this chapter, you will be able to:
Discuss Excel working area
Explain working with Excel‘s windows
Understand working with rows and columns
Discuss concepts of workbooks and worksheets
Explain moving around a worksheet
Understand creating your first Excel worksheet
Discuss different views of worksheets
Define cell formatting
4.4.5 SmartArt
Excel 2007 still includes a wide assortment of Shapes that you can use to create visual diagrams, such as flow
charts, org charts, or diagrams that depict relationships. But the new SmartArt feature is a much better tool for
such tasks. You can quickly add shadows, reflection, glow, and other special effects.
4.4.6 Formula AutoComplete
Entering formulas in Excel 2007 can be a bit less cumbersome, thanks to the new Formula AutoComplete
feature. When you begin typing a formula, Excel displays a continually updated drop-down list of matching
items, including a description of each item. When you see the item you want, press Tab to enter it into your
formula. The items in this list consist of functions, defined names, and table references.
Figure 4.1: The Excel screen has many useful elements that you will use often.
Table 4.2 Parts of Excel Screen
Caution
To change the active cell, you must click a new cell after scrolling.
2. To enter a formula to calculate the projected sales for February, move to cell B3 and enter the following:
=B2*103.5%. When you press Enter, the cell will display 51750. The formula returns the contents of cell B2,
multiplied by 103.5%. In other words, February sales are projected to be 3.5% greater than January sales.
3. The projected sales for subsequent months will use a similar formula. But rather than retyping the formula
for each cell in column B, once again take advantage of the AutoFill feature. Make sure that cell B3 is
selected. Click the cell‘s fill handle, drag down to cell B13, and release the mouse button.
At this point, your worksheet should resemble the one shown in Figure 4.3. Keep in mind that, except for cell
B2, the values in column B are calculated with formulas. To demonstrate, try changing the projected sales
value for the initial month, January (in cell B2). You will find that the formulas recalculate and return different
values. But these formulas all depend on the initial value in cell B2.
Figure 4.3: Your worksheet, after creating the formulas.
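Because each formula multiplies the cell above it by 103.5%, column B holds a compound-growth series. The same projection can be written in Python (assuming January sales of 50,000, consistent with the 51750 shown for February):

```python
january = 50000
sales = [january]
for _ in range(11):                  # February through December
    sales.append(sales[-1] * 1.035)  # each month is 3.5% above the last

print(round(sales[1]))   # 51750, matching cell B3
print(round(sales[11]))  # projected December sales
```

Changing the `january` value recalculates every later month, just as editing cell B2 causes all the dependent formulas in the worksheet to return different values.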
7. The ................ key on your keyboard controls how the keys on the numeric keypad behave.
(a) PgUp (b) Num Lock
(c) PgDn (d) None of these.
4.10 Summary
Excel is the world‘s most widely used spreadsheet program and is part of the Microsoft Office suite.
Excel is very useful for non-numerical applications.
Every worksheet has exactly 1,048,576 rows and 16,384 columns, and these values cannot be changed.
A workbook saved in an older (XLS) file format has a smaller grid; to get the additional rows and columns,
save the workbook as an Excel 2007 XLSX file and then reopen it.
A chart sheet displays a single chart and is accessible by clicking a tab.
4.11 Keywords
Automating complex tasks: Perform a tedious task with a single mouse click with Excel‘s macro capabilities.
Chart Sheet: It displays a single chart and is also accessible by clicking a tab.
Conditional Formatting: It refers to the ability to format a cell based on its value
Table: It is a rectangular range of cells that contain column headers.
Workbook: It is comprised of one or more worksheets, and each worksheet is made up of individual cells.
4.12 Review Questions
1. Excel is for non-numerical applications. Explain.
2. How is Excel 2007 different from previous versions of Excel?
3. What is the importance of table in worksheet?
4. What is the difference between worksheet and workbook?
5. Explain the process of moving and resizing the windows.
6. Discuss the process to increase the number of rows and columns in a worksheet.
7. How can we hide the rows and columns? Explain.
8. Define SmartArt.
9. What are the new features in Excel 2007? Discuss.
10. What are the tools through which you can move around a worksheet? Discuss each of them.
Answers for Self Assessment Questions
1 (a) 2 (b) 3 (c) 4 (a)
5 (d) 6 (a) 7 (b) 8 (c)
5.0 Objectives
After studying this chapter, you will be able to:
Understand the work area of MS PowerPoint
Describe starting and exiting PowerPoint
Describe creating a new presentation
Discuss closing and reopening presentations
Describe creating a new slide
Understand inserting content from external sources
5.1 Introduction
A presentation is any kind of interaction between a speaker and audience, but it usually involves one or more
of the following visual aids: 35 mm slides, overhead transparencies, computer-based slides (either local or at a
Web site or other network location), hard-copy handouts, and speaker notes. PowerPoint 2007 can create all of
these types of visual aids, plus many other types that you learn about as we go along. Like other programs in
the Office 2007 suite, PowerPoint 2007 takes a radical and innovative new approach to its user interface.
Although it is very convenient to use once you master it, even experienced users of earlier versions might need
some help getting started.
Figure 5.2: Many more effects are available for drawn lines and shapes.
PowerPoint 2007 removes the distinction between WordArt text and regular text, so the full gamut of
formatting features is available to all text, regardless of position or usage. You can format individual words
as separate pieces of WordArt, or entire text boxes by using a common WordArt style. In Figure 5.3, the slide
title "Green Hill Shelties" is regular text, and appears on the presentation outline, but it also benefits from
WordArt formatting effects.
Figure 5.3: WordArt can now be applied to regular text, including slide titles.
Figure 5.4: Choose colours for text and graphic objects from a colour picker that focuses on theme-based
colour choices.
Font themes apply one font for headings and another for body text. In PowerPoint 2007 it is usually best not to
apply a specific font to any text, but instead to apply either (Body) or (Heading) to it. Then you can let the font
theme dictate the font choices, so that they will update automatically when you choose a different theme. On the
Font drop-down list, the top choices are now (Body) and (Heading). The font listed next to them is the font
that happens to be applied with the current theme.
Effect themes apply shadows and 3-D effects to graphic objects. PowerPoint 2007's new gallery of effects is
impressive, and can make plain lines and shapes appear to pop off the screen with textures that simulate glass,
metal, or other surfaces.
SmartArt
SmartArt uses groups of lines and shapes to present text information in a graphical, conceptually meaningful
way. Experts have been saying for years that people respond better to information when it is presented
graphically, but the difficulty in constructing attractive diagrams has meant that most people used plain
bulleted lists for everything. SmartArt can convert a bulleted list into a conceptual diagram in just a few clicks.
Figure 5.5 shows a plain bulleted list (left) and a SmartArt diagram constructed from it. The SmartArt is not
only more interesting to look at, but it also conveys additional information—it shows that the product life
cycle repeats continuously.
Figure 5.5: SmartArt diagrams are easy to create and make information more palatable and easy to
understand.
Figure 5.7 Select a view from the View tab or from the viewing controls in the bottom-right corner of the
screen.
5.3.2 Normal View
Normal view, shown in Figure 5.8, is a very flexible view that contains a little of everything. In the center is
the active slide, below it is a Notes pane, and to its left is a dual-use pane with two tabs: Outline and Slides.
(Figure 5.7 shows Slides, and Figure 5.8 shows Outline.)
When the Outline tab is selected, the text from the slides appears in an outline form. When the Slides tab is
selected, thumbnail images of all the slides appear.
Each of the panes in Normal view has its own scroll bar, so you can move around in the outline, the slide, and
the notes independently of the other panes. You can resize the panes by dragging the dividers between the
panes. For example, to give the notes area more room, point the mouse pointer at the divider line between it
and the slide area so that the mouse pointer becomes a double-headed arrow, and then hold down the left
mouse button as you drag the line up to a new spot.
Figure 5.8: Normal view, the default, offers access to the outline, the slide, and the notes all at once.
The Slides/Outline pane is useful because it lets you jump quickly to a specific slide by clicking on it. For
example, in Figure 5.7 you can click on any of the slide thumbnails on the Slides tab to display it in the Slide
pane. Or in Figure 5.8 you can click some text anywhere in the outline to jump to the slide containing that text.
You can turn the Slides/Outline pane off completely by clicking the X button in its top-right corner. This gives
maximum room to the Slides pane. When you turn it off, the Notes pane disappears too; they cannot be turned
on/off separately. To get the extra panes back, reapply Normal view.
Figure 5.10: Notes Page view offers a special text area for your notes, separate from the slides.
5.3.6 Zooming In and Out
If you need a closer look at your presentation, you can zoom the view in or out to accommodate almost any
situation. For example, if you have trouble placing a graphic exactly at the same vertical level as some text in a
box next to it, you can zoom in for more precision. You can view your work at various magnifications on-
screen without changing the size of the surrounding tools or the size of the print on the printout. In Normal
view, each of the panes has its own individual zoom. To set the zoom for the Slides/Outline pane only, for
example, select it first; then choose a zoom level. Or to zoom only in the Slide pane, click it first.
In a single-pane view like Notes Page or Slide Sorter, a single zoom setting affects the entire work area. The
larger the zoom number, the larger the details on the display. A zoom of 10% would make a slide so tiny that
you could not read it. A zoom of 400% would make a few letters on a slide so big they would fill the entire
pane. The easiest way to set the zoom level is to drag the Zoom slider in the bottom-right corner of the
PowerPoint window, or click its plus or minus buttons to increment the zoom level. See Figure 5.11.
To resize the current slide so that it is as large as possible while still fitting completely in the Slides pane, click
the Fit Slide to Current Window button, or click the Fit to Window button in the Zoom group on the View tab.
Figure 5.11: Zoom in or out to see more or less of the slide(s) at once.
Another way to control the zoom is with the Zoom dialog box. On the View tab, in the Zoom group, click the
Zoom button. (You can also open that dialog box by clicking the % next to the Zoom slider.) Make your
selection, as shown in Figure 5.12, by clicking the appropriate button, and then click OK. Notice that you can
type a precise zoom percentage in the Percent text box. You can specify any percentage you like, but some
panes and views will not go higher than 100%.
Figure 5.12: You can zoom with this Zoom dialog box rather than the slider if you prefer.
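Conceptually, the slider and the Zoom dialog box both set a single number, clamped to a supported range. A tiny sketch (the 10% and 400% limits come from the text above; the clamping rule itself is an assumption about how such a control typically behaves):

```python
MIN_ZOOM, MAX_ZOOM = 10, 400  # percentage limits mentioned in the text

def set_zoom(percent):
    """Clamp a requested zoom percentage to the supported range."""
    return max(MIN_ZOOM, min(MAX_ZOOM, percent))

print(set_zoom(150))  # 150 - within range, used as-is
print(set_zoom(5))    # 10  - too small, clamped up
print(set_zoom(999))  # 400 - too large, clamped down
```

This is why typing an out-of-range percentage into the Percent text box does not produce an arbitrarily large or small view.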
Enabling Optional Display Elements
PowerPoint has a lot of optional screen elements that you may (or may not) find useful, depending on what
you are up to at the moment. The following sections describe them.
Ruler
Vertical and horizontal rulers around the slide pane can help you place objects more precisely. To toggle them
on or off, mark or clear the Ruler check box on the View tab. Rulers are available only in Normal and Notes
Page views. The rulers help with positioning no matter what content type you are working with, but when you
are editing text in a text frame they have an additional purpose as well. The horizontal ruler shows the frame's
paragraph indents and any custom tab stops, and you can drag the indent markers on the ruler just like you can
in Word.
Gridlines
Gridlines are non-printing dotted lines at regularly spaced intervals that can help you line up objects on a slide.
Figure 5.13 shows gridlines (and the ruler) enabled.
To turn gridlines on or off, use any of these methods:
Press Shift+F9.
On the View tab, in the Show/Hide group, mark or clear the Gridlines check box.
On the Design tab, in the Arrange group, choose Align➪Show Gridlines.
There are many options you can set for the gridlines, including whether objects snap to it, whether the grid is
visible, and what the spacing should be between the gridlines.
To set grid options, follow these steps:
1. On the Home tab, in the Drawing group, choose Arrange➪Align➪Grid Settings, or right-click the slide
background and choose Grid and Guides. The Grid and Guides dialog box opens (see Figure 5.14).
2. In the Snap To section, mark or clear these check boxes.
Snap Objects to Grid: Specifies whether or not objects will automatically align with the grid.
Snap Object to Other Objects: Specifies whether or not objects will automatically align with other objects.
3. In the Grid Settings section, enter the amount of space between gridlines desired.
4. Mark or clear the Display Grid on Screen check box to display or hide the grid.
5. Click OK.
Figure 5.13: Gridlines and the ruler help align objects on a slide.
Figure 5.14: Set grid options and spacing.
Guides
Guides are like gridlines except they are individual lines, rather than a grid of lines, and you can drag them to
different positions on the slide. As you drag a guide, a numeric indicator appears to let you know the ruler
position. See Figure 5.15. Use the Grid and Guides dialog box to turn guides on/off, or press Alt+F9.
Figure 5.15: Guides are movable, non-printing lines that help with alignment.
You can create additional sets of guide lines by holding down the Ctrl key while dragging a guide (to copy it).
You can have up to eight horizontal and vertical guides, all at positions you specify.
When you are finished, click the Back to Colour View button on the Grayscale tab. Changing the Black and
White or Grayscale settings does not affect the colours on the slides; it only affects how the slides will look
and print in black and white or grayscale.
Figure 5.17: Select Blank Presentation from the New Presentation dialog box.
Figure 5.20: Select a data file from some other program as the basis of a new presentation.
Self Assessment Questions
1. A .................is a set of formatting specifications that are applied to objects and text consistently throughout
the presentation.
(a) Colour (b) Font (c) Themes (d) Text.
2. SmartArt uses groups of lines and shapes to present ..............information in a graphical, conceptually
meaningful way.
(a) Colour (b) Font (c) Themes (d) Text.
3. ........................rulers around the slide pane can help you place objects more precisely.
(a) Vertical and horizontal (b) Left and right
(c) Centre (d) Up and down.
4. .......................are non-printing dotted lines at regularly spaced intervals that can help you line up objects on
a slide.
(a) View tab (b) Gridlines (c) Guides (d) Ruler.
Figure 5.21: Save your work by specifying a name for the presentation file.
3. Click Save. Your work is saved.
Filenames can be up to 255 characters. For practical purposes, however, keep the names short. You can
include spaces in the filenames and most symbols except <, >, ?, *, /, and \. However, if you plan to post the
file on a network or the Internet at some point, you should avoid using spaces; use the underscore character
instead to simulate a space if needed. There have also been problems reported with files that use exclamation
points in their names, so beware of that. Generally it is best to avoid punctuation marks in names.
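The naming rules above can be automated. The following Python sketch is hypothetical (the helper name is mine); the character set it strips comes from the guidance in this section, including the exclamation point:

```python
import re

FORBIDDEN = r'[<>?*/\\!]'  # symbols the text warns against, plus "!"

def safe_filename(name, max_len=255):
    """Replace spaces with underscores and strip risky punctuation."""
    name = name.replace(" ", "_")      # spaces cause trouble on networks
    name = re.sub(FORBIDDEN, "", name)
    return name[:max_len]              # filenames can be up to 255 characters

print(safe_filename("Q3 sales review!.pptx"))  # Q3_sales_review.pptx
```

A helper like this is useful when generating presentation files in bulk, where a single bad character can break a web link.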
Figure 5.22: Choose a different format, if needed, from the Save As Type drop-down list.
Most of the other choices from Table 5.2 are special-purpose, and not suitable for everyday use.
Table 5.2 PowerPoint 2007 Features Not Supported in Previous PowerPoint Versions
Figure 5.23: Set Save Options to match the way you want PowerPoint to save your work.
Then set any of the options desired. They are summarized in Table 5.3. Click OK when you are finished. One
of the most important features described in Table 5.3 is AutoRecover, which is turned on by default.
This means if a system error or power outage causes PowerPoint to terminate unexpectedly, you do not lose all
of the work you have done. The next time you start PowerPoint, it opens the recovered file and asks if you
want to save it.
Caution
AutoRecover is not a substitute for saving your work the regular way. It does not save in the same sense that
the Save command does; it only saves a backup version as PowerPoint is running. If you quit PowerPoint
normally, that backup version is erased. The backup version is available for recovery only if PowerPoint
terminates abnormally (because of a system lockup or a power outage).
3. If you want to save your changes, click Yes. If the presentation has already been saved once, you are done.
4. If the presentation has not been saved before, the Save As dialog box appears. Type a name in the File
Name text box and click Save.
To open more than one presentation at once, hold down the Ctrl key as you click each file you want to open.
Then, click the Open button and they all open in their own windows.
The Open button in the Open dialog box has its own drop-down list from which you can select commands that
open the file in different ways. See Figure 5.24, and refer to Table 5.4 for an explanation of the available
options.
Figure 5.24: The Open button's menu contains several special options for opening a file.
Figure 5.24: To open files from different programs, change the File Type setting to All Files.
5.7.4 Finding a presentation file to open
If you have forgotten where you saved a particular presentation file, you're not out of luck. The Open dialog
box (under Windows Vista) includes a Search box that can help you locate it. See Figure. 5.25.
To search for a file, follow these steps:
1. Choose Office Button➪Open to display the Open dialog box.
2. Navigate to a location that you know the file is in. For example, if you know it is on the C:
drive, click Computer in the Favourite Links list and then double-click the C: drive.
3. Click in the Search box and type part of the filename (if you know it) or a word or phrase used in the file.
4. Press Enter. A list of files appears that match that specification.
5. Open the file as you normally would.
Figure 5.25: When you type text into the Outline pane, it automatically appears on the current slide.
5.9 Inserting Content from External Sources
Many people find that they can save a lot of time by copying text or slides from other programs or from other
PowerPoint presentations to form the basis of a new presentation. There is no need to reinvent the wheel each
time! The following sections look at various ways to bring in content from external sources.
5. (Optional) If you want to keep the source formatting when copying slides, select the Keep Source
Formatting check box at the bottom of the task pane.
6. (Optional) You can move the cursor over a slide to see an enlarged image of it.
7. Do any of the following:
To insert a single slide, click it.
To insert all slides at once, right-click any slide and choose Insert All Slides.
To copy only the theme (not the content), right-click any slide and choose Apply Theme to All Slides, or
Apply Theme to Selected Slides.
Caution
Copying the theme with the Apply Theme to All Slides or Apply Theme to Selected Slides command does not
copy the background graphics, layouts, or anything else other than the three elements that are included in a
theme: font choices, colour choices, and effect choices. If you want to copy all of the formatting, select the
Keep Source Formatting checkbox and insert one or more slides.
Figure 5.28: You can insert a picture by using the Insert Picture from File content placeholder icon.
3. Select the picture to import. See Figure 5.29. You can switch the view by using the View (or
Views) button in the dialog box to see thumbnails or details, whichever is more effective in helping you
determine which file is which.
4. Click Insert. The picture is inserted.
5.12 Handouts
If you are presenting a live show, the centerpiece of your presentation is your slides. Whether you show them
using a computer screen, a slide projector, or an overhead projector, the slides—combined with your own
dazzling personality—make the biggest impact. But if you rely on your audience to remember everything you
say, you may be disappointed. With handouts, the audience members can follow along with you during the
show and even take their own notes. They can then take the handouts home with them to review the
information again later.
You probably want a different set of support materials for yourself than you want for the audience. Support
materials designed for the speaker‘s use are called speaker notes. In addition to small printouts of the slides,
the speaker notes contain any extra notes or background information that you think you may need to jog your
memory as you speak. Some people get very nervous when they speak in front of a crowd; speaker notes can
remind you of the joke you wanted to open with or the exact figures behind a particular pie chart.
Presentation professionals are divided about how and when to use handouts most effectively. Here are some of
the many conflicting viewpoints. The bottom line is that each of them is an opinion on how much power and
credit to give to the audience; your answer may vary depending on the audience you are addressing.
You should give handouts at the beginning of the presentation. The audience can absorb the information
better if they can follow along on paper.
This approach makes a lot of sense. Research has proven that people absorb more facts if presented with them
in more than one medium. This approach also gives your audience free will; they can listen to you or not, and
they still have the information. It is their choice, and this can be extremely scary for less-confident speakers. It
is not just a speaker confidence issue in some cases, however. If you plan to give a lot of extra information in
your speech that is not on the handouts, people might miss it if you distribute the handouts at the beginning
because they're reading ahead.
You should not give the audience handouts because they would not pay as close attention to your speech if
they know that the information is already written down for them.
This philosophy falls at the other end of the spectrum. It gives the audience the least power and shows the least
confidence in their ability to pay attention to you in the presence of a distraction (handouts). If you truly do not
trust your audience to be professional and listen, this approach may be your best option. However, do not let
insecurity as a speaker drive you prematurely to this conclusion.
The fact is that people take away less knowledge about the topic without handouts than they do when you
provide handouts. So, ask yourself if your ultimate goal is to fill the audience with knowledge or
to make them pay attention to you.
You should give handouts at the end of the presentation so that people will have the information to take
home but not be distracted during the speech.
This approach attempts to solve the dilemma with compromise. The trouble with it, as with all compromises, is
that it does an incomplete job from both angles. Because audience members cannot follow along on the
handouts during the presentation, they miss the opportunity to jot notes on the handouts. And because the
audience knows that handouts are coming, they might nod off and miss something important. The other
problem is that if you do not clearly tell people that handouts are coming later, some people spend the entire
presentation frantically copying down each slide on their own notepaper.
Figure 5.30: Choose Handouts to print and specify which handout layout you want.
7. Open the Slides Per Page drop-down list and choose the number of slides per page you want.
8. If available, choose an Order: Horizontal or Vertical. Not all number-of-slide choices (from Step 7) support
an Order choice.
9. Open the Colour/Grayscale drop-down list and select the colour setting for the printouts:
Colour: Sends the data to the printer assuming that colour will be used. When you use this setting with a
black-and-white printer, it results in slides with grayscale or black backgrounds.
Use this setting if you want the handouts to look as much as possible like the onscreen slides.
Grayscale: Sends the data to the printer assuming that colour will not be used. Coloured backgrounds are
removed, and if text is normally a light colour on a dark background, that is reversed. Use this setting if you
want PowerPoint to optimize the printout for viewing on white paper.
Pure Black and White: This format hides most shadows and patterns. It is good for faxes and overhead
transparencies.
10. Mark any desired checkboxes at the bottom of the dialog box:
Scale to Fit Paper: Enlarges the slides to the maximum size they can be and still fit on the layout
Frame Slides: Draws a black border around each slide image. Useful for those slides being printed with white
backgrounds.
Print Comments: Prints any comments that you have inserted with the Comments feature in PowerPoint.
Print Hidden Slides: Includes hidden slides in the printout. This option is not available if you do not have any
hidden slides in your presentation.
High Quality: Optimizes the appearance of the printout in small ways, such as allowing text shadows to print.
11. (Optional) Click the Preview button to see a preview of your handouts; then click the Print button to return
to the Print dialog box.
12. Click OK. The handouts print, and you are ready to roll!
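For reference, converting a colour to grayscale means reducing each RGB value to a single brightness. PowerPoint's exact mapping is not documented here; the Rec. 601 luma weights below are simply a common convention, used only to illustrate why a dark coloured background prints as dark gray:

```python
def to_gray(r, g, b):
    """Approximate perceived brightness using Rec. 601 luma weights."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)

print(to_gray(255, 255, 255))  # 255 - white stays white
print(to_gray(255, 0, 0))      # 76  - saturated red prints as a dark gray
```

This also explains the Grayscale option's reversal of light text on dark backgrounds: without it, both would map to similar gray levels and the text would be hard to read on paper.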
5.13.3 Setting printer-specific options
In addition to the controls in the Print dialog box in PowerPoint, there are controls you can set that affect the
printer you have chosen. In the Printer section of the Print dialog box, you can open the Name drop-down list
and choose the printer you want to use to print the job.
Figure 5.31: Notes Page view is one of the best ways to work with your speaker notes.
If you recorded the sound clip, the sound will play during a slide show only when you click the sound icon.
(You can modify this behaviour using the Custom Animation task pane.) If, however, you inserted a sound clip
from the Clip Organizer or from a sound file, PowerPoint will display a message box letting you choose when
the sound clip will play.
To have the clip play automatically when the slide is displayed in a slide show, click the Automatically
button.
To have the clip play only after you click the sound icon, click the When Clicked button.
Modify the Way the Sound Clip Plays During a Slide Show
1. Right-click the sound icon in your slide.
2. Do any or all of the following:
To adjust the volume or change the sound object display options, on the shortcut menu, click Edit Sound
Object and then select the options you want. The Sound Options dialog box also indicates where the clip is
stored. If it is stored within the presentation file, the dialog box will display the location Contained In
Presentation. If it is stored in a separate linked file, it will display the file path. (PowerPoint normally
stores a sound clip in a separate file if it is larger than 100 KB.) In the latter case, if you are going to
present your slide show on another computer, you will need to take the linked file with you.
The easiest way to copy a presentation plus all linked files to a portable medium that you can bring to
another computer is to use the new Package for CD feature.
To modify the way the sound clip plays during a slide show, on the shortcut menu, click Custom
Animation, and then use the controls in the Custom Animation task pane.
To control the action that takes place when you either click the sound icon or move the mouse pointer
over it, on the shortcut menu, click Action Settings.
To preview the sound clip, on the shortcut menu, click Play Sound.
5.15 Summary
PowerPoint 2007 adds new text formatting capabilities to help users further polish their work.
In earlier versions, regular text could not receive WordArt formatting such as reshaping, stretching, and distortion; in PowerPoint 2007 it can.
SmartArt uses groups of lines and shapes to present text information in a graphical, conceptually
meaningful way.
SmartArt can convert a bulleted list into a conceptual diagram in just a few clicks.
Notes Page view is accessible only from the View tab.
Graphics and videos can be added to the presentation.
5.16 Keywords
Gridlines: These are non-printing dotted lines at regularly spaced intervals that can help you line up objects on
a slide.
Single File Web Page: It creates a single .mht document that contains all of the HTML codes and all of the
slides.
Slide Show: The view you use to show the presentation on-screen. Each slide fills the entire screen in its turn.
SmartArt: It can convert a bulleted list into a conceptual diagram in just a few clicks.
Template: It is a file that contains starter settings on which you can base new presentations.
6.0 Objectives
After studying this chapter, you will be able to:
Explain how to set up an e-mail account with Outlook
Discuss sending and receiving mail through Outlook
Explain the concepts of Cc and Bcc
Explain forwarding mail
Explain draft messages
Discuss formatting e-mail messages
Explain the concept of MIME
Discuss the Outlook protocols
Discuss attaching files and items to messages
Understand inserting hyperlinks using Outlook
6.1 Introduction
Outlook Express is a free online communication tool from Microsoft that you can use for e-mail or
newsgroups. It is included with Microsoft Internet Explorer 6 for Windows operating systems. With Outlook
Express, you can download your e-mail messages from the UH mail server onto your computer's local hard
drive. Outlook Express also allows you to view old mail messages and compose new mail messages off-line
and simplifies reading and sending attachments. It allows you to receive mail from multiple e-mail accounts,
as well as create Inbox rules that allow you to manage and organize your e-mail.
2. On the Account Configuration page, select Yes to indicate you want to configure an e-mail account, and then
click Next.
Figure 6.2: Account Configuration page.
4. At the bottom of the page, select manually configure server settings or additional server types, and then
click Next.
5. On the choose e-mail service page, select Internet e-mail, and then click Next.
Figure 6.4: Choose e-mail service page.
6. On the Internet e-mail Settings page, enter your e-mail account information as follows:
Your Name
Enter your first and last name.
E-mail Address
Enter your e-mail address.
Account Type
Select POP3.
Incoming mail server
Type pop-1.mail.vi.net for your incoming mail server.
Outgoing mail server (SMTP)
Type smtp-1.mail.vi.net for your outgoing mail server.
User Name
Enter your e-mail address again.
Password
Enter the password you created for your e-mail account.
7. Select the Remember Password checkbox, and then click More Settings.
Select the option in the bottom left corner to manually configure server settings or additional server settings.
Click Next.
Click More Settings. In the General tab, enter your name and a reply e-mail address. Usually, this is the same
as your e-mail address.
Click the Outgoing Server tab. If you are using this service's server for sending e-mail, select the option: My
outgoing server (SMTP) requires authentication.
If you are using your Internet Service Provider's SMTP server, it is likely you will not need to enable this
option, but check with them to verify the settings to use.
Figure 6.13: Verify the settings to use.
Click the Advanced tab. Verify that the port numbers are set to 110 and 25. Make sure the other options are
not selected.
Click OK and then click Finish.
Test the new e-mail account to verify that you can send and receive mail, by clicking Send/Receive.
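The settings collected in the steps above amount to a small record. This hypothetical Python helper (the field names and the validation logic are mine, not Outlook's) shows the shape of the data and the default ports the chapter specifies:

```python
REQUIRED = {"name", "email", "pop3_server", "smtp_server", "username", "password"}

def validate_account(cfg):
    """Check that all required fields are present; fill in default ports."""
    missing = REQUIRED - cfg.keys()
    if missing:
        raise ValueError(f"missing settings: {sorted(missing)}")
    cfg.setdefault("pop3_port", 110)  # default POP3 port from the text
    cfg.setdefault("smtp_port", 25)   # default SMTP port from the text
    return cfg

account = validate_account({
    "name": "Navneet", "email": "nav@example.com",
    "pop3_server": "pop-1.mail.vi.net", "smtp_server": "smtp-1.mail.vi.net",
    "username": "nav@example.com", "password": "secret",
})
print(account["pop3_port"], account["smtp_port"])  # 110 25
```

Ports 110 and 25 are the standard POP3 and SMTP ports, which is why the Advanced tab asks you to verify those exact numbers.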
Did You Know?
The maximum size of one piece of mail is 10 MB for both transmission and reception.
The e-mail's headers also tell me that the message was sent to nav@sngt.com and copied to nav@asngt.com at
the same time.
From: "Navneet" editor@sngt.com
To: nav@sngt.com
Cc: k.rah@gmail.com
Subject: test
(You can check an e-mail's headers in Outlook Express by right-clicking on the e-mail in your inbox and then
clicking Properties.)
Now if Norrie sees that, he should right away be able to tell that his e-mail address has been put into the Bcc
field. It is pretty obvious as an e-mail has appeared in his inbox that is clearly addressed to someone else. This
may ring a bell for some of you who have noticed you have received Spam messages that do not appear to be
addressed to you. Obviously they are being sent to someone else and your e-mail address has been included in
the Bcc field.
Received: by xyz@abc.com
Delivered-To: tqs@nmj.com
Message-ID: <011201c1b34d$48eed570$4d1560cb@ZORNCAT>
From: "nav" editor@abc.com
To: vb@abc.com
Subject: test Bcc
This information tells him that the e-mail was delivered to him even though it was addressed to someone else,
a clear sign that his e-mail address was in the Bcc field.
If that is too confusing, then perhaps we can simplify it as follows. If you do not want someone to know that
the e-mail to them is being copied to someone else, put their e-mail address in the "To" field. The person
whose e-mail address you put into the Bcc field will know what is going on.
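Python's standard `email` library makes the mechanics concrete: Bcc addresses are never written into the headers a recipient sees; a mail client adds them only to the delivery envelope. (The addresses below reuse the chapter's examples as placeholders.)

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "editor@sngt.com"
msg["To"] = "nav@sngt.com"
msg["Cc"] = "k.rah@gmail.com"
msg["Subject"] = "test"
msg.set_content("Hello")

# Bcc recipients are kept out of the headers entirely; the sending client
# supplies them only in the SMTP envelope (the actual delivery list).
bcc = ["hidden@sngt.com"]
envelope = [msg["To"], msg["Cc"], *bcc]

print("Bcc" in msg)   # False - nothing in the message reveals the Bcc
print(len(envelope))  # 3 - yet three mailboxes still receive it
```

This is exactly why the Bcc'd recipient can tell what happened: the message arrives even though no visible header carries their address.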
Figure 6.19: Click the Microsoft Office Button, and then click Save.
To return to a saved draft so that you can continue composing the message, do the following:
In Mail, in All Mail Items, look for a folder called Drafts, and then double-click the message that was
saved in that folder.
Caution
When creating a long mail message, save a draft as you go. If a message being composed remains unsaved
for 120 minutes, the session may be interrupted and the message may be lost.
Mail
Microsoft Outlook works with Microsoft Exchange, Microsoft's proprietary mail server for businesses.
Outlook can also handle popular e-mail protocols including POP3 (Post Office Protocol), IMAP (Internet
Message Access Protocol), and SMTP (Simple Mail Transfer Protocol). Users can also access Windows Live
Hotmail accounts from Outlook. Active mail accounts will show up in the Mail section of Outlook.
Calendar
Outlook Calendar can handle calendars served by Microsoft Exchange as well as Internet Calendar (.ics)
format files. Internet Calendar event items may be sent via e-mail as a one-time calendar snapshot. Whole
calendars can be subscribed to over the Internet. Calendar event items and subscribed calendars will show up
in the user's Outlook calendar.
Other
Outlook can also subscribe to RSS (Really Simple Syndication) feeds to keep track of updates to websites
such as news sites and blogs. Subscribed feeds show up in the Mail section of Outlook.
3. In the Insert File dialog box, browse to and select the file that you want to attach, and then click Insert.
If you do not see the file that you want in the appropriate folder, make sure that All files (*.*) is selected in the
Files of type box, and that Windows Explorer is configured to show file name extensions.
Caution
By default, Outlook blocks potentially unsafe attachments (including .bat, .exe, .vbs, and .js files) that might
contain viruses. If you attach such a file to an e-mail message, you will be asked whether you want to send a
potentially unsafe attachment. If you answer Yes, Outlook will send the attachment. If you answer No, you can
then remove the potentially unsafe attachment.
You can attach multiple files simultaneously by selecting the files and dragging them from a folder on your
computer to an open message in Outlook.
6.10.2 Attach a message or other item to a message
You can attach items and other messages to a new message:
1. On the File menu, click New, and then click Mail Message.
2. On the Message tab, in the Include group, click Attach Item.
3. In the Look in list, click the folder that contains the item that you want to attach.
4. Click the item that you want, and then click OK.
4. Populate the To, CC, Subject fields as you normally would and type in a sentence of text including the
word that you want to change into a link.
6. In the "Address" field, type the URL of the website that you want to link to and click the "OK" button.
Next, you will want to choose Select Members from the Members group on the Ribbon.
I have double-clicked both Administrator and Citrix Resource Manager Accounts, at which point they show
in the "Members ->" area at the bottom of the screen. (Pick someone you actually want to e-mail.) Once you
have a nice list of people that you would like to include in your group, click OK.
The secret to finding your new group is to click the drop-down arrow and choose the address book that holds
the group you created. (In this case Outlook Contacts instead of our Global Address List):
Figure 6.38: Outlook Contacts instead of our Global Address List.
And there it is! Double click it to select it, and then click OK.
You will notice that your e-mail is now addressed to the group. Now do you remember who you put in the
group? Ok, you probably do at this stage, but as time goes on, you may forget and want to verify it. (Or maybe
it is your joke list, and you want to forward it to everyone except the person who just e-mailed the joke to you.)
Do you see the little + sign just in front of the group name?
Figure 6.40: Messages Title name.
Go ahead and give it a click.
The e-mail group automatically expands to show you the names of all of the members! You will never wonder
who you are sending e-mail to again! And if you want to remove one, just highlight the name and press the
delete key. It is just that easy!
Part 3–Editing Existing Groups
Assume now that months have gone by, and you want to make a change to your group either adding or
removing someone. How do you do it? The first step is to locate the group in your contacts list. Once you have
found it, right click it, and choose Open.
Figure 6.42: The first step is to locate the group in your contacts list. Once you have found it, right click it,
and choose Open.
Your "Distribution List" will open, and show who the members are.
6. Select the reason you want to apply a flag from the…………….drop-down menu.
(a) Follow up (b) Flag to
(c) Custom (d) Actions.
7. Outlook can be customized to search particular address lists first when you use the……………….
(a) Messages (b) Tools
(c) Address Book (d) Actions.
To open the attachment in the default application, just hit the Enter key. You will probably get a message
similar to this one:
3. Type the e-mail address of the person you want to forward the e-mail to and press the "Send" button on the
menu bar.
4. This task should now be complete. If not, review and repeat the steps as needed. Submit any questions using
the section at the bottom of this page.
6.18.7 To Add a Person to your Contacts (Address book) from Directory Services
1. Once you have used the above steps to find a contact, right-click their name and choose Add to Contacts
or click Add to Contacts on the toolbar.
2. Your Contacts or Address Book will open and you will see the information there. You can edit the
contact information if necessary.
6.20 Summary
The rich text format is proprietary to Microsoft e-mailing software.
Outlook is smart enough to transmit messages in HTML, plain text, or rich text format when you reply to a
message that was sent to you in that format.
Microsoft Outlook is the collaboration application in Microsoft‘s Office productivity suite.
The contacts folder is integrated with the inbox and the calendar for sending mail and scheduling
meetings.
Outlook Express does not display the Bcc field for e-mails by default.
6.21 Keywords
E-mail address: The POP e-mail address; enter the same e-mail address you use for your user name
when accessing the Admin Console.
Internet Message Access Protocol (IMAP): Internet message access protocol is one of the two most prevalent
Internet standard protocols for e-mail retrieval, the other being the Post Office Protocol.
Internet Service Provider (ISP): An Internet service provider is a company that provides access to the
Internet. Access ISPs directly connect customers to the Internet using copper wires, wireless or fiber-optic
connections.
Quick Access Toolbar: The Quick Access Toolbar is a customizable toolbar that contains a set of commands
that are independent of the tab that is currently displayed.
Rich text format: The Rich Text Format is a proprietary document file format with published specification
developed by Microsoft Corporation since 1987 for Microsoft products and for cross-platform document
interchange.
1.0 Objectives
After studying this chapter, you will be able to:
Define the database
Discuss the three-level architecture proposal
Explain the purpose of database system
Discuss the data model and abstraction
1.1 Introduction
A database management system (DBMS) is a set of software programs that allows users to create, edit and
update data in database files, and store and retrieve data from those database files. Data in a database can be
added, deleted, changed, sorted or searched, all using a DBMS. If you are an employee of any large
organization, the information about you is likely stored in different files that are linked together. One
file about you would pertain to your skills and abilities, another file to your income tax status, another to your
home and office address and telephone number, and another to your annual performance ratings. By cross-
referencing these files, someone could change a person‘s address in one file and it would automatically be
reflected in all the other files.
1.2 Database
A database is a collection of related files that are usually integrated, linked or cross-referenced to one another.
The advantage of a database is that data and records contained in different files can be easily organized and
retrieved using specialized database management software called a database management system (DBMS) or
database manager.
1.2.1 Views of Data
A DBMS is a collection of interrelated files and a set of programs that allow users to access and modify these files.
A major purpose of a database system is to provide users with an abstract view of the data. That is, the system
hides certain details of how the data are stored and maintained.
To understand the purpose of database systems, consider a bank that keeps customer information in
permanent files. System programmers wrote application programs to meet the needs of the bank. New application
programs are added to the system as the need arises. For example, suppose that a savings bank decides to offer
checking accounts. As a result, the bank creates new permanent files that contain information about all the
checking accounts maintained in the bank, and it may have to write new application programs to deal with
situations that do not arise in savings accounts, such as overdrafts. Thus, as time goes by, the system acquires
more files and more application programs.
This typical file-processing system is supported by a conventional operating system. The system stores
permanent records in various files, and it needs different application programs to extract records from, and add
records to, the appropriate files. Before database management systems (DBMSs) came along, organizations
usually stored information in such file-processing systems. Understanding the shortcomings of those systems
makes the purpose of database systems clear.
Keeping organizational information in a file-processing system has a number of major disadvantages:
Data redundancy and inconsistency: Since different programmers create the files and application programs
over a long period, the various files are likely to have different structures and the programs may be written in
several programming languages. Moreover, the same information may be duplicated in several places (files).
For example, the address and telephone number of a particular customer may appear in a file that consists of
savings-account records and in a file that consists of checking-account records. This redundancy leads to
higher storage and access cost. In addition, it may lead to data inconsistency; that is, the various copies of the
same data may no longer agree. For example, a changed customer address may be reflected in savings-account
records but not elsewhere in the system.
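The inconsistency described above can be sketched in a few lines, with the two account files modelled as Python dictionaries (customer numbers, names and addresses are invented for this illustration):

```python
# Sketch of data inconsistency in a file-processing system: the same
# customer's address is duplicated in two "files", and updating one
# copy does not update the other.

savings_records = {"C001": {"name": "Asha", "address": "12 Park Road"}}
checking_records = {"C001": {"name": "Asha", "address": "12 Park Road"}}

# The customer moves: the savings application updates its file...
savings_records["C001"]["address"] = "7 Lake View"
# ...but nothing forces the checking file to follow.

print(savings_records["C001"]["address"])   # 7 Lake View
print(checking_records["C001"]["address"])  # 12 Park Road -- stale copy
```

A DBMS avoids this by storing the address once and letting every application read the same copy.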
Difficulty in accessing data: Suppose that one of the bank officers needs to find out the names of all
customers who live within a particular postal-code area. The officer asks the data-processing department to
generate such a list. Because the designers of the original system did not anticipate this request, there is no
application program on hand to meet it. There is, however, an application program to generate the list of all
customers. The bank officer has now two choices: either obtain the list of all customers and extract the needed
information manually or ask a system programmer to write the necessary application program. Both
alternatives are obviously unsatisfactory. Suppose that such a program is written, and that, several days later,
the same officer needs to trim that list to include only those customers who have an account balance of
INR100000 or more. As expected, a program to generate such a list does not exist. Again, the officer has the
preceding two options, neither of which is satisfactory. The point here is that conventional file-processing
environments do not allow needed data to be retrieved in a convenient and efficient manner. More responsive
data-retrieval systems are required for general use.
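By contrast, a DBMS answers such ad-hoc requests with short queries instead of new application programs. A sketch using SQLite (the table, names, postal codes and balances are illustrative, not from any real bank system):

```python
import sqlite3

# An in-memory database standing in for the bank's customer data.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customer
                (name TEXT, postal_code TEXT, balance INTEGER)""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [("Asha", "110001", 150000),
                  ("Ravi", "110001", 40000),
                  ("Meena", "560001", 200000)])

# Request 1: customers who live within a particular postal-code area.
rows = conn.execute(
    "SELECT name FROM customer WHERE postal_code = '110001'").fetchall()
print([r[0] for r in rows])

# Request 2 (days later): trim the list to balances of INR100000 or more.
rows = conn.execute("""SELECT name FROM customer
                       WHERE postal_code = '110001'
                         AND balance >= 100000""").fetchall()
print([r[0] for r in rows])
```

Each new request is one query, not a new program written by a system programmer.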
Data isolation: Because data are scattered in various files, and files may be in different formats, writing new
application programs to retrieve the appropriate data is difficult.
Integrity problems: The data values stored in the database must satisfy certain types of consistency
constraints. For example, the balance of certain types of bank accounts may never fall below a prescribed
amount (say, INR250). Developers enforce these constraints in the system by adding appropriate code in the
various application programs. However, when new constraints are added, it is difficult to change the programs
to enforce them. The problem is compounded when constraints involve several data items from different files.
Atomicity problems: A computer system, like any other mechanical or electrical device, is subject to failure. In
many applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that existed
prior to the failure. Consider a program to transfer INR500 from account A to account B. If a system failure
occurs during the execution of the program, it is possible that the INR500 was removed from account A but was
not credited to account B, resulting in an inconsistent database state. Clearly, it is essential to database
consistency that either both the credit and debit occur, or that neither occur. That is, the funds transfer must be
atomic-it must happen in its entirety or not at all. It is difficult to ensure atomicity in a conventional file-
processing system.
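A minimal sketch of the atomic funds transfer described above, using SQLite transactions (account names and balances are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A", 2000), ("B", 1000)])
conn.commit()

def transfer(amount, fail_midway=False):
    """Debit A and credit B atomically; roll back on any failure."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = 'A'",
            (amount,))
        if fail_midway:
            raise RuntimeError("simulated system failure")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = 'B'",
            (amount,))
        conn.commit()  # both halves become visible together
    except RuntimeError:
        conn.rollback()  # undo the partial debit

transfer(500, fail_midway=True)
print(dict(conn.execute("SELECT id, balance FROM account")))
# {'A': 2000, 'B': 1000} -- the failed transfer left the database consistent
```

Because the debit and credit sit in one transaction, the failure rolls both back; neither account shows a partial result.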
Concurrent-access anomalies: For the sake of overall performance of the system and faster response, many
systems allow multiple users to update the data simultaneously. Indeed, today, the largest Internet retailers
may have millions of accesses per day to their data by shoppers. In such an environment, interaction of
concurrent updates is possible and may result in inconsistent data. Consider bank account A, containing INR5000.
If two customers withdraw funds (say INR500 and INR1000, respectively) from account A at about the same
time, the result of the concurrent executions may leave the account in an incorrect (or inconsistent) state.
Suppose that the programs executing on behalf of each withdrawal read the old balance, reduce that value by
the amount being withdrawn, and write the result back. If the two programs run concurrently, they may both
read the value INR5000, and write back INR4500 and INR4000, respectively. Depending on which one writes the
value last, the account may contain either INR4500 or INR4000, rather than the correct value of INR3500. To
guard against this possibility, the system must maintain some form of supervision. But supervision is difficult
to provide because data may be accessed by many different application programs that have not been coordinated
previously.
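The lost-update interleaving described above can be sketched in a few lines of plain Python, with the two withdrawal "programs" interleaved by hand:

```python
# Both withdrawals read the old balance INR5000 before either
# writes its result back, so one update is lost.
balance = 5000

# Both programs read the balance first (interleaved execution).
read_by_p1 = balance
read_by_p2 = balance

# Each computes and writes back its own result.
balance = read_by_p1 - 500    # P1 writes INR4500
balance = read_by_p2 - 1000   # P2 writes INR4000 last, overwriting P1

print(balance)  # 4000 -- the INR500 withdrawal is lost; correct value is 3500
```

A DBMS prevents this by supervising concurrent transactions, for example with locking, so that the second withdrawal sees the balance left by the first.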
Security problems: Not every user of the database system should be able to access all the data. For example in
a banking system, payroll personnel need to see only that part of the database that has information about the
various bank employees. They do not need access to information about customer accounts. But since
application programs are added to the file-processing system in an ad hoc manner, enforcing such security
constraints is difficult.
These difficulties, among others, prompted the development of database systems. In what follows, we shall see the concepts and
algorithms that enable database systems to solve the problems with file-processing systems. In most of this
book, we use a bank enterprise as a running example of a typical data-processing application found in a
corporation.
In addition to entities and relationships, the E-R model represents certain constraints to which the contents of a
database must conform. One important constraint is mapping cardinalities, which express the number of
entities to which another entity can be associated via relationship set. The overall logical structure of a
database can be expressed graphically by an E-R diagram, which is built up from the following components:
Rectangles, which represent entity sets
Ellipses, which represent attributes
Diamonds, which represent relationships among entity sets
Lines, which link attributes to entity sets and entity sets to relationships
Each component is labelled with the entity or relationship that it represents. As an illustration, consider part of a
database banking system consisting of customers and of the accounts that these customers have. The
corresponding E-R diagram is shown in Figure 1.3.
2. Unlike entities in the E-R model, each object has its own unique identity, independent of the values it
contains:
Two objects containing the same values are distinct.
Distinction is created and maintained at the physical level by assigning distinct object identifiers.
Caution
At the physical level, we must define algorithms that allow efficient access to data.
2. A ………is partitioned into modules that deal with each of the responsibilities of the overall system.
(a) Database system (b) Overall system
(c) Data manipulation language (d) None of these.
3. The portion of a DML that involves information retrieval is called a query language.
(a). True (b). False
4. The retrieval of information stored in the………
(a). Management (b). Database
(c). Relational database (d). None of these
5. A data-manipulation language (DML) is a language that enables users to access or manipulate data as
organized by the appropriate data model.
(a). True (b). False
6. Nonprocedural DMLs are usually easier to learn and use than are procedural DMLs.
(a). True (b). False
Schema Definition: The DBA creates the original database schema by writing a set of definitions that is
translated by the DDL compiler to a set of tables that is stored permanently in the data dictionary.
Storage Structure and Access-method Definition: The DBA creates appropriate storage structures and access
methods by writing a set of definitions. This is translated by the data-storage and data-definition-language
compiler.
Granting of Authorization for Data Access: The granting of different types of authorization allows the
database administrator to regulate which parts of the database various users can access. The authorization
information is kept in a special system structure that is consulted by the database system whenever access to
data is attempted in the system.
Integrity-constraint Specification: The data values stored in the database must satisfy certain consistency
constraints. For example, the number of hours an employee may work in one week may
not exceed a specified limit (say, 80 hours). Such a constraint must be specified explicitly by the database
administrator. The integrity constraints are kept in a special system structure that is consulted by the database
system whenever an update takes place in the system.
1.8 Database Users
A primary goal of a database system is to provide an environment for retrieving information from and storing
new information into the database. There are four different types of database-system users, differentiated by
the way that they expect to interact with the system.
1. Application programmers are computer professionals who interact with the system through DML calls,
which are embedded in a program written in a host language (for example, COBOL, PL/1, Pascal, C).
These programs are commonly referred to as application programs. Examples in a banking system include
programs that generate payroll checks that debit accounts, that credit accounts or that transfer funds
between accounts.
Since the DML syntax is usually markedly different from the host language syntax, DML calls are usually
prefaced by a special character so that the appropriate code can be generated. A special pre-processor,
called the DML pre-compiler, converts the DML statements to normal procedure calls in the host
language. The resulting program is then run through the host-language compiler, which generates
appropriate object code.
There are special types of programming languages that combine control structures of Pascal-like languages
with control structures for the manipulation of database objects (for example, relations). These languages,
sometimes called fourth-generation languages, often include special features to facilitate the generation of
forms and the display of data on the screen. Most major commercial database systems include a fourth-
generation language.
2. Sophisticated users interact with the system without writing programs. Instead, they form their requests in
a database query language. Each such query is submitted to a query processor whose function is to break
down DML statements into instructions that the storage manager understands. Analysts who submit queries
to explore data in the database fall in this category.
3. Specialized users are sophisticated users who write specialized database applications that do not fit into the
traditional data-processing framework. Among these applications are computer-aided design systems,
knowledgebase and expert systems, systems that store data with complex data types (for example, graphics
data and audio data), and environment-modelling systems.
4. Naive users are unsophisticated users who interact with the system by invoking one of the permanent
application programs that have been written previously. For example, a bank teller who needs to transfer
INR2250 from account A to account B invokes a program called transfer. This program asks the teller for the
amount of money to be transferred, the account from which the money is to be transferred, and the account
to which the money is to be transferred.
DML compiler, which translates DML statements in a query language into low-level instructions that the
query evaluation engine understands. In addition, the DML compiler attempts to transform a user's
request into an equivalent but more efficient form, thus finding a good strategy for executing the query.
Embedded DML Pre-compiler, which converts DML statements, embedded in an application program
to normal procedure calls in the host language. The pre-compiler must interact with the DML compiler
to generate the appropriate code.
DDL interpreter, which interprets DDL statements and records them in a set of tables containing
metadata.
Query evaluation engine, which executes low-level instructions generated by the DML compiler.
The storage manager components provide the interface between the low-level data stored in the database and
the application programs and queries submitted to the system. The storage manager components include:
Authorization and integrity manager, which tests for the satisfaction of integrity constraints and
checks the authority of users to access data.
Transaction manager, which ensures that the database remains in a consistent (correct) state despite
system failures, and that concurrent transaction executions proceed without conflicting.
File manager, which manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.
Buffer manager, which is responsible for fetching data from disk storage into main memory, and
deciding what data to cache in memory.
In addition, several data structures are required as part of the physical system implementation:
Data files, which store the database itself.
Data dictionary, which stores metadata about the structure of the database. The data dictionary is used
heavily. Therefore, great emphasis should be placed on developing a good design and efficient
implementation of the dictionary.
Indices, which provide fast access to data items that hold particular values.
Statistical data, which store statistical information about the data in the database. This information is
used by the query processor to select efficient ways to execute a query.
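As an illustrative sketch of the role of indices, SQLite reports the index it chooses for a lookup (the table and index names are invented for this example):

```python
import sqlite3

# An index gives fast access to rows holding particular values;
# the query planner consults it automatically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT, balance INTEGER)")
conn.execute("CREATE INDEX idx_account_id ON account (id)")

# EXPLAIN QUERY PLAN shows the chosen access path for the lookup.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT balance FROM account WHERE id = 'A'"
).fetchall()
print(plan[0][-1])  # detail string names idx_account_id as the access path
```

Without the index, the same query would require scanning every row of the table.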
Figure 1.4: System structure.
Caution
The design of a database system must include consideration of the interface between the database system and
the operating system.
Examples:
The examples of integrity constraints are:
(i) 'Issue Date' in a library system cannot be later than the corresponding 'Return Date' of a book.
(ii) Maximum obtained marks in a subject cannot exceed 100.
(iii) Registration number of BCS and MCS students must start with 'BCS' and 'MCS' respectively etc.
There are also some standard constraints that are intrinsic in most DBMSs. These are:
PRIMARY KEY: Designates a column or combination of columns as the primary key; values of those
columns cannot be repeated or left blank.
FOREIGN KEY: Relates one table with another table.
UNIQUE: Specifies that values of a column or combination of columns cannot be repeated.
NOT NULL: Specifies that a column cannot contain empty values.
CHECK: Specifies a condition which each row of a table must satisfy.
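As a sketch, these standard constraints can be expressed in SQLite DDL. The table and column names below are illustrative, chosen to echo the marks and registration-number examples above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE department (dept_no TEXT PRIMARY KEY)")
conn.execute("""CREATE TABLE student (
    reg_no   TEXT PRIMARY KEY,                   -- PRIMARY KEY
    email    TEXT UNIQUE,                        -- UNIQUE
    name     TEXT NOT NULL,                      -- NOT NULL
    marks    INTEGER CHECK (marks <= 100),       -- CHECK
    dept_no  TEXT REFERENCES department(dept_no) -- FOREIGN KEY
)""")

conn.execute("INSERT INTO department VALUES ('BCS')")
conn.execute(
    "INSERT INTO student VALUES ('BCS-01', 'a@x.com', 'Asha', 92, 'BCS')")

# Each constraint rejects invalid data; for example, marks above 100:
try:
    conn.execute(
        "INSERT INTO student VALUES ('BCS-02', 'b@x.com', 'Ravi', 120, 'BCS')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The invalid row never reaches the table; the DBMS enforces the constraint so that no application program has to.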
6. Data Security:
Data security is the protection of the database from unauthorized users. Only the authorized persons are
allowed to access the database. Some of the users may be allowed to access only a part of database i.e., the
data that is related to them or related to their department. Mostly, the DBA or head of a department can access
all the data in the database. Some users may be permitted only to retrieve data, whereas others are allowed to
retrieve as well as to update data. Database access is controlled by the DBA, who creates the accounts of
users and grants rights to access the database. Typically, users or groups of users are given usernames protected
by passwords.
7. Data Atomicity:
A transaction in commercial databases is referred to as atomic unit of work. For example, when you purchase
something from a point of sale (POS) terminal, a number of tasks are performed, such as:
Company stock is updated.
Amount is added to the company's account.
Salesperson's commission is increased, etc.
All these tasks collectively are called an atomic unit of work, or transaction. These tasks must be completed
in full; otherwise, partially completed tasks are rolled back. Thus, through the DBMS, it is ensured that only
consistent data exists within the database.
9. Development of Application:
The cost and time for developing new applications is also reduced. The DBMS provides tools that can be used
to develop application programs. For example, some wizards are available to generate Forms and Reports.
Stored procedures (stored on server side) also reduce the size of application programs.
1.11 Summary
A database system is partitioned into modules that deal with each of the responsibilities of the overall
system.
The DBMS is defined as an interface between application programs and the operating system for accessing
and manipulating the database.
The collection of information stored in the database at a particular moment is called an instance of the
database.
The overall design of the database is called the schema. It is of three types: physical schema, conceptual
schema and external schema.
A database administrator (DBA) directs or performs all activities related to maintaining a successful
database environment. Responsibilities include designing, implementing, and maintaining the database
system.
1.12 Keywords
Abstraction: It is the process of taking away or removing characteristics from something in order to reduce it
to a set of essential characteristics.
Database Management System (DBMS): A collection of interrelated data and a set of programs to access
those data. A DBMS contains information about a particular enterprise.
Data Manipulation Language (DML): A language for accessing and manipulating the data organized by the
appropriate data model. A DML is also known as a query language.
Data Definition Language: A DDL is a language used to define data structures within a database. It is
typically considered to be a subset of SQL, the Structured Query Language, but can also refer to languages that
define other types of data.
Data Models: A collection of tools for describing data relationships, data semantics, and data constraints.
1.13 Review Questions
1. Describe the three levels of data abstraction.
2. What are the views of data?
3. What is the three-level architecture proposal?
4. Describe the instances and schemas.
5. Explain the purpose of database system and data abstraction.
6. What is data independence?
7. Differentiate between DDL and DML.
8. Discuss the role of database administrator and users.
9. Explain the overall structure of DBMS.
10. What are the advantages and disadvantages of DBMS?
2.0 Objectives
After studying this chapter, you will be able to:
Define the E-R model
Explain the types of keys
Discuss the relationship sets
Explain the mapping constraints
2.1 Introduction
The entity-relationship (E-R) data model is based on a perception of a real world that consists of a set of basic
objects called entities, and of relationships among these objects. It was developed to facilitate database design
by allowing the specification of an enterprise schema, which represents the overall logical structure of a
database. The E-R data model is one of several semantic data models; the semantic aspect of the model lies in
the attempt to represent the meaning of the data. The E-R model is extremely useful in mapping the meanings
and interactions of real-world enterprises onto a conceptual schema. Because of this utility, many database-
design tools draw on concepts from the E-R model.
2.2 Concept of Entity-relationship Model
When a relational database is to be designed, an entity-relationship diagram is drawn at an early stage and
developed as the requirements of the database and its processing become better understood. Drawing an entity-
relationship diagram aids understanding of an organization's data needs and can serve as a schema diagram for
the required system's database. A schema diagram is any diagram that attempts to show the structure of the
data in a database. Nearly all systems analysis and design methodologies contain entity-relationship
diagramming as an important part of the methodology and nearly all CASE (Computer Aided Software
Engineering) tools contain the facility for drawing entity-relationship diagrams. An entity-relationship diagram
could serve as the basis for the design of the files in a conventional file-based system as well as for a schema
diagram in a database system.
There are three basic notions that the E-R data model employs: entity sets, relationship sets, and attributes.
Null attributes: A null value is used when an entity does not have a value for an attribute. As an illustration, if
a particular employee has no dependents, the dependent-name value for that employee will be null, and will
have the meaning of "not applicable." Null can also designate that an attribute value is unknown. An unknown
value may be either missing (the value does exist, but we do not have that information) or not known (we do
not know whether or not the value actually exists). For instance, if the social-security value for a particular
customer is null, we assume that the value is missing, since it is required for tax reporting. A null value for the
apt-number attribute could mean that the address does not include an apartment number, that an apartment
number exists but we do not know what it is, or that we do not know whether or not an apartment number is
part of the customer‘s address.
Figure 2.3: An entity type CUSTOMER and one of its attributes Cus_no.
In Figure 2.3, the attribute CUS_NO is shown. Assuming the organization storing the data ensures that each
customer is allocated a different cus_no, that attribute could act as the primary key, since it identifies each
customer; it distinguishes each customer from all the rest. No two customers have the same value for the
attribute cus_no. Some people would say that an attribute is a candidate for being a primary key because it is
'unique'. They mean that no two entities within that entity type can have the same value of that attribute. In
practice it is best not to use that word because it has other connotations.
As already mentioned, you may need to have a group of attributes to form a primary key, rather than just one
attribute, although the latter is more common. For example if the organization using the CUSTOMER entity
type did not allocate a customer number to its customers, then it might be necessary to use a composite key,
for example one consisting of the attributes SURNAME and INITIALS together, to distinguish between
customers with common surnames such as Smith. Even this may not be sufficient in some cases.
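The composite-key idea can be sketched directly in SQL. The following is a minimal illustration, using Python's sqlite3 module; the table, column names, and sample values are hypothetical, not taken from the text:

```python
import sqlite3

# Hypothetical CUSTOMER table keyed on (surname, initials) together,
# since no single attribute distinguishes customers on its own.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        surname  TEXT NOT NULL,
        initials TEXT NOT NULL,
        city     TEXT,
        PRIMARY KEY (surname, initials)   -- composite primary key
    )
""")
conn.execute("INSERT INTO customer VALUES ('Smith', 'J', 'Leeds')")
conn.execute("INSERT INTO customer VALUES ('Smith', 'K', 'York')")  # same surname is fine
try:
    # Duplicate (surname, initials) pair: rejected by the composite key.
    conn.execute("INSERT INTO customer VALUES ('Smith', 'J', 'Hull')")
except sqlite3.IntegrityError:
    pass  # each (surname, initials) pair must be distinct
```

As the text warns, even this pair may not be sufficient in practice; a real design would normally fall back on an allocated customer number.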
Primary keys are not the only attributes you might want to show on the entity-relationship diagram. For
example, in a manufacturing organization you might have an entity type called COMPONENT and you want
to make it clear on the entity-relationship diagram that the entities within the type are not single components
but a component type such as a BC109 transistor. There are thousands of BC109s in stock and any one will do
for any application. It is therefore not necessary to identify each BC109 differently (they all look and work the
same). However you might want to distinguish BC109s from another transistor type BC108. To make it clear
that you are considering all the BC109s as one entity and all the BC108s as another entity, you might put the
attribute QIS (quantity in stock) on the entity-relationship diagram as in Figure 2.4. This makes it clearer at
the entity-relationship model level that each entity in the entity type is in fact a stock item of which there will
be several in stock. Any doubts on this point should be resolved by inspecting the entity description, which
shows all the attributes of the entity type and (ideally) their meaning. The primary key might be STOCK_NO
and one of the attributes QIS, which should remove any doubt on this point.
Figure 2.4: A well-placed attribute may clarify the meaning of an entity type.
Did You Know?
The database concept has evolved since the 1960s to ease increasing difficulties in designing, building, and
maintaining complex information systems.
If the relationship set R had any attributes, these are assigned to entity set E; otherwise, a special identifying
attribute is created for E (since every entity set must have at least one attribute to distinguish members of the
set). For each relationship (ai, bi, ci) in the relationship set R, we create a new entity ei in the entity set E.
Then, in each of the three new relationship sets, we insert a relationship as follows:
(ei, ai) in RA
(ei, bi) in RB
(ei, ci) in RC
We can generalize this process in a straightforward manner to n-ary relationship sets. Thus, conceptually, we
can restrict the E-R model to include only binary relationship sets. However, this restriction is not always
desirable.
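The construction just described can be sketched in a few lines of plain Python (the data values are invented for illustration): each ternary relationship (ai, bi, ci) yields one new entity ei plus one tuple in each of the binary relationship sets RA, RB, and RC.

```python
# Replace a ternary relationship set R over (A, B, C) with an entity
# set E and three binary relationship sets RA, RB, RC.
R = [("a1", "b1", "c1"), ("a2", "b1", "c2")]  # ternary relationships (ai, bi, ci)

E, RA, RB, RC = [], [], [], []
for i, (a, b, c) in enumerate(R, start=1):
    e = f"e{i}"          # special identifying attribute for the new entity
    E.append(e)
    RA.append((e, a))    # (ei, ai) in RA
    RB.append((e, b))    # (ei, bi) in RB
    RC.append((e, c))    # (ei, ci) in RC

print(E)   # ['e1', 'e2']
print(RA)  # [('e1', 'a1'), ('e2', 'a2')]
```

The same loop generalizes to n-ary relationship sets by appending one pair per participating entity set.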
An identifying attribute may have to be created for the entity set created to represent the relationship set.
This attribute, along with the extra relationship sets required, increases the complexity of the design and the
overall storage requirements.
An n-ary relationship set shows more clearly that several entities participate in a single relationship. In the
corresponding design using binary relationships, it is more difficult to enforce this participation constraint.
The appropriate mapping cardinality for a particular relationship set is obviously dependent on the real-world
situation that is being modelled by the relationship set.
As an illustration, consider the borrower relationship set. If, in a particular bank, a loan can belong to only one
customer, and a customer can have several loans, then the relationship set from customer to loan is one to
many. If a loan can belong to several customers (as can loans taken jointly by several business partners), the
relationship set is many to many.
Figure 2.6: Mapping cardinalities. (a) One to one. (b) One to many.
The cardinality ratio of a relationship can affect the placement of relationship attributes. Attributes of one-to-
one or one-to-many relationship sets can be associated with one of the participating entity sets, rather than with
the relationship set. For instance, let us specify that depositor is a one-to-many relationship set such that one
customer may have several accounts, but each account is held by only one customer. In this case, the attribute
access-date could be associated with the account entity set, as depicted in Figure 2.8; to keep the figure simple,
only some of the attributes of the two entity sets are shown. Since each account entity participates in a
relationship with at most one instance of customer, making this attribute designation would have the same
meaning as would placing access-date with the depositor relationship set. Attributes of a one-to-many
relationship set can be repositioned to only the entity set on the "many" side of the relationship. For one-to-one
relationship sets, the relationship attribute can be associated with either one of the participating entities.
Figure 2.7: Mapping cardinalities. (a) Many to one. (b) Many to many.
The design decision of where to place descriptive attributes in such cases (as a relationship or an entity attribute)
should reflect the characteristics of the enterprise being modelled. The designer may choose to retain access-
date as an attribute of depositor to express explicitly that an access occurs at the point of interaction between
the customer and account entity sets.
The choice of attribute placement is more clear-cut for many-to-many relationship sets. Returning to our
example, let us specify the perhaps more realistic case that depositor is a many-to-many relationship set
expressing that a customer may have one or more accounts, and that an account can be held by one or more
customers. If we are to express the date on which a specific customer last accessed a specific account, access-
date must be an attribute of the depositor relationship set, rather than either one of the participating entities. If
access-date were an attribute of account, for instance, we could not determine which customer made the most
recent access to a joint account. When an attribute is determined by the combination of participating entity sets,
rather than by either entity separately, that attribute must be associated with the many-to-many relationship set.
The placement of access-date as a relationship attribute is depicted in Figure 2.8; again, to keep the figure
simple, only some of the attributes of the two entity sets are shown.
The participation of an entity set E in a relationship set R is said to be total if every entity in E participates in at
least one relationship in R. If only some entities in E participate in relationships in R, the participation of entity
set E in relationship R is said to be partial. Total participation is closely related to existence dependency. For
example, since every payment entity must be related to some loan entity by the loan-payment relationship, the
participation of payment in the relationship set loan-payment is total. In contrast, an individual can be a bank
customer whether or not she has a loan with the bank. Hence, it is possible that only a partial set of the
customer entities relate to the loan entity set.
……..………………………………………………………………………………………………………………
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
As depicted in Figure 2.10, attributes of an entity set that are members of the primary key are underlined.
Consider the entity-relationship diagram in Figure 2.10, which consists of two entity sets, customer and
loan, related through a binary relationship set borrower. The attributes associated with customer are
customer-name, social-security, customer-street, and customer-city. The attributes associated with loan are
loan-number and amount.
An undirected line from the relationship set borrower to the entity set loan specifies that borrower is either
a many-to-many or a one-to-many relationship set from customer to loan.
Returning to the E-R diagram of Figure 2.10, we see that the relationship set borrower is many to many. If the
relationship set borrower were one to many, from customer to loan, then the line from borrower to customer
would be directed, with an arrow pointing to the customer entity set (Figure 2.11a). Similarly, if the
relationship set borrower were many to one from customer to loan, then the line from borrower to loan would
have an arrow pointing to the loan entity set (Figure 2.11b). Finally, if the relationship set borrower were one
to one, then both lines from borrower would have arrows: one pointing to the loan entity set, and one pointing
to the customer entity set (Figure 2.12).
If a relationship set also has some attributes associated with it, then we link these attributes to that relationship
set. For example, in Figure 2.13, we have the access-date descriptive attribute attached to the relationship set
depositor to specify the most recent date on which a customer accessed that account.
We indicate roles in E-R diagrams by labelling the lines that connect diamonds to rectangles. Figure 2.14
shows the role indicators manager and worker between the employee entity set and the works for relationship
set.
Non-binary relationship sets can be specified easily in an E-R diagram. Figure 2.15 consists of the three entity
sets customer, loan, and branch, related through the relationship set CLB. This diagram specifies that a
customer may have several loans, and that a loan may belong to several different customers. Further, the arrow
pointing to branch indicates that each customer-loan pair is associated with a specific bank branch. If the
diagram had an arrow pointing to customer, in addition to the arrow pointing to branch, the diagram would
specify that each loan is associated with a specific customer and a specific bank branch.
As an illustration, consider the entity set payment, which has the three attributes: payment-number, payment-
date, and payment-amount. Although each payment entity is distinct, payments for different loans may share
the same payment number. Thus, this entity set does not have a primary key; it is a weak entity set. (For a
weak entity set to be meaningful, it must be part of a one-to-many relationship set. This relationship set should
have no descriptive attributes, since any required attributes can be associated with the weak entity set.)
Although a weak entity set does not have a primary key, we nevertheless need means of distinguishing among
all those entities in the entity set that depend on one particular strong entity. The discriminator of a weak entity
set is a set of attributes that allows this distinction to be made. For example, the discriminator of the weak
entity set payment is the attribute payment-number, since, for each loan, a payment number uniquely identifies
one single payment for that loan. The discriminator of a weak entity set is also called the partial key of the
entity set.
The primary key of a weak entity set is formed by the primary key of the strong entity set on which the weak
entity set is existence dependent, plus the weak entity set's discriminator. In the case of the entity set payment, its
primary key is {loan-number, payment-number}, where loan-number identifies the dominant entity of a
payment, and payment-number distinguishes payment entities within the same loan.
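In table form, the weak entity set payment therefore gets a composite primary key built from the owner's key plus the discriminator. A minimal sketch using Python's sqlite3 module (attribute names from the text, with hyphens changed to underscores for SQL; sample values invented):

```python
import sqlite3

# The weak entity set payment: its primary key is the owner's key
# (loan_number) plus the discriminator (payment_number).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loan (
        loan_number TEXT PRIMARY KEY,
        amount      REAL
    );
    CREATE TABLE payment (
        loan_number    TEXT NOT NULL REFERENCES loan,
        payment_number INTEGER NOT NULL,   -- discriminator (partial key)
        payment_date   TEXT,
        payment_amount REAL,
        PRIMARY KEY (loan_number, payment_number)
    );
""")
conn.execute("INSERT INTO loan VALUES ('L-17', 1000.0)")
conn.execute("INSERT INTO loan VALUES ('L-23', 2000.0)")
# Payments for different loans may share the same payment number:
conn.execute("INSERT INTO payment VALUES ('L-17', 1, '2024-01-05', 50.0)")
conn.execute("INSERT INTO payment VALUES ('L-23', 1, '2024-01-06', 75.0)")
```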
The identifying (dominant) entity set is said to own the weak entity set that it identifies. The relationship that
associates the weak entity set with an owner is the identifying relationship. In our example, loan-payment is
the identifying relationship for payment.
In some cases, the database designer may choose to express a weak entity set as a multivalued, composite
attribute of the owner entity set. In our example, this alternative would require that the entity set loan have a
multivalued, composite attribute payment, consisting of payment-number, payment-date, and payment-amount.
A weak entity set may be more appropriately modelled as an attribute if it participates in only the identifying
relationship, and if it has few attributes. Conversely, a weak-entity-set representation will more aptly model a
situation where the set participates in relationships other than the identifying relationship, and where the weak
entity set has several attributes.
2.7.2 Generalization: Generalization and specialization are inverses of each other; they differ only in the
direction of the design process. Specialization is a top-down design process, whereas generalization is a
bottom-up design process: you first design the sub-groupings such as officer and temp-staff, then move
upwards to 'employee' and 'customer', and finally design the 'person' entity. In the E-R diagram,
generalization and specialization are both represented in exactly the same way.
Exercise: Check Your Progress 2
Note: i) Use the space below for your answer.
Ex1: What are Strong and Weak Entities?
……..………………………………………………………………………………………………………………
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
……..………………………………………………………………………………………………………………
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
2.8 Aggregation
The E-R model cannot express relationships among relationships. When would we need such a thing?
Consider a DB with information about employees who work on a particular project and use a number of
machines doing that work. We get the E-R diagram shown in Figure
Figure 2.17: E-R diagram with redundant relationships.
Relationship sets work and uses could be combined into a single set. However, they shouldn't be, as this
would obscure the logical structure of this scheme.
The solution is to use aggregation.
Through an abstraction relationships are treated as higher-level entities. For our example, we treat the
relationship set work and the entity sets employee and project as a higher-level entity set called work.
Figure 2.18 shows the E-R diagram with aggregation.
Transforming an E-R diagram with aggregation into tabular form is easy. We create a table for each entity and
relationship set as before. The table for relationship set uses contains a column for each attribute in the primary
key of machinery and work.
2.9 Reduction of an E-R Schema to Tables
A database that conforms to an E-R database schema can be represented by a collection of tables. For each
entity set and for each relationship set in the database, there is a unique table that is assigned the name of the
corresponding entity set or relationship set. Each table has multiple columns, each of which has a unique
name.
Both the E-R model and the relational-database model are abstract, logical representations of real-world
enterprises. Because the two models employ similar design principles, we can convert an E-R design into a
relational design. Converting a database representation from an E-R diagram to a table format is the basis for
deriving a relational-database design from an E-R diagram. Although important differences exist between a
relation and a table, informally, a relation can be considered to be a table of values. We describe how an E-R
schema can be represented by tables; we show how to generate a relational-database schema from an E-R
schema.
As another example, consider the entity set customer of the E-R diagram shown in Figure 2.10. This entity set
has the attributes customer-name, social-security, customer-street, and customer city. The table corresponding
to customer has four columns, as shown in Figure 2.22.
As an illustration, consider the relationship set borrower in the E-R diagram of Figure 2.10. This relationship
set involves the following two entity sets:
customer, with the primary key social-security
loan, with the primary key loan-number
Since the relationship set has no attributes, the borrower table has two columns labeled social-security and
loan-number, as shown in Figure 2.22.
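A sketch of this reduction, using Python's sqlite3 module (the attribute names come from the text, with hyphens changed to underscores for SQL): one table per entity set, and one table for the relationship set whose columns are the two primary keys.

```python
import sqlite3

# Reduction of the customer-borrower-loan E-R design to tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        social_security TEXT PRIMARY KEY,
        customer_name   TEXT,
        customer_street TEXT,
        customer_city   TEXT
    );
    CREATE TABLE loan (
        loan_number TEXT PRIMARY KEY,
        amount      REAL
    );
    -- borrower has no attributes of its own, so its columns are just
    -- the primary keys of the two participating entity sets.
    CREATE TABLE borrower (
        social_security TEXT REFERENCES customer,
        loan_number     TEXT REFERENCES loan
    );
""")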
Combination of Tables
Consider a many-to-one relationship set AB from entity set A to entity set B. Using our table-construction
scheme outlined previously, we get three tables: A, B, and AB. However, if there is an existence dependency
of A on B (that is, for each entity a in A, the existence of a depends on the existence of some entity b in B),
then we can combine the tables A and AB to form a single table consisting of the union of columns of both
tables.
As an illustration, consider the E-R diagram of Figure 2.23. The relationship set account-branch is many to
one from account to branch. Further, the double line in the E-R diagram indicates that the participation of
account in the account-branch is total. Hence, an account cannot exist without being associated with a
particular branch. Therefore, we require only the following two tables:
account, with attributes account-number, balance, and branch-name
branch, with attributes branch-name, branch-city, and assets
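A sketch of the combined design in Python's sqlite3 (names adapted from the text, hyphens changed to underscores): note that there is no separate account_branch table, because account absorbs the relationship.

```python
import sqlite3

# Because participation of account in account-branch is total, the
# relationship collapses into a branch_name column on account.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch (
        branch_name TEXT PRIMARY KEY,
        branch_city TEXT,
        assets      REAL
    );
    CREATE TABLE account (
        account_number TEXT PRIMARY KEY,
        balance        REAL,
        branch_name    TEXT NOT NULL REFERENCES branch  -- total participation
    );
""")
```

The NOT NULL on branch_name mirrors the double line in the E-R diagram: an account cannot exist without an associated branch.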
Multivalued Attributes
We have seen that attributes in an E-R diagram generally map directly into columns for the appropriate tables.
Multivalued attributes, however, are an exception; new tables are created for these attributes.
For a multivalued attribute M, we create a table T with a column C that corresponds to M and columns
corresponding to the primary key of the entity set or relationship set of which M is an attribute. As an
illustration, consider the E-R diagram depicted in Figure 2.19. The diagram includes the multivalued attribute
dependent-name. For this multivalued attribute, we create a table dependent-name, with two columns: dname,
referring to the dependent-name attribute, and e-social-security, representing the primary key of the entity set
employee. Each dependent of an employee is represented as a unique row in the table.
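A sketch of the multivalued-attribute table in Python's sqlite3 (names adapted from the text; the sample values are invented):

```python
import sqlite3

# The multivalued attribute dependent-name becomes its own table,
# with one row per dependent of each employee.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        e_social_security TEXT PRIMARY KEY,
        employee_name     TEXT
    );
    CREATE TABLE dependent_name (
        e_social_security TEXT REFERENCES employee,
        dname             TEXT,
        PRIMARY KEY (e_social_security, dname)
    );
""")
conn.execute("INSERT INTO employee VALUES ('123-45-6789', 'Jones')")
conn.execute("INSERT INTO dependent_name VALUES ('123-45-6789', 'Ann')")
conn.execute("INSERT INTO dependent_name VALUES ('123-45-6789', 'Ben')")
```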
……..………………………………………………………………………………………………………………
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
2.10 Summary
The entity-relationship (E-R) data model is based on a perception of a real world that consists of a set of
basic objects called entities, and of relationships among these objects
A superkey is a set of one or more attributes that, taken collectively, allows us to identify uniquely an
entity in the entity set
Aggregation is an abstraction through which relationships are treated as higher-level entities.
Mapping cardinalities, or cardinality ratios, express the number of entities to which another entity can be
associated via a relationship set.
A null value is used when an entity does not have a value for an attribute; such an attribute is called a null attribute.
2.11. Keywords
Attributes: Descriptive properties possessed by each member of an entity set.
Database: A database thus includes a collection of entity sets each of which contains any number of entities of
the same type.
Entity: It is a ―thing‖ or ―object‖ in the real world that is distinguishable from all other objects
Key: A key (primary, candidate, and super) is a property of the entity set.
Null attributes: A null value is used when an entity does not have a value for an attribute.
3.0 Objectives
After studying this chapter, you will be able to:
Discuss the set theory concepts in RDBMS
Explain extension and intension
Understand database relationships
Discuss the integrity rules
3.1 Introduction
RDBMS stands for Relational Database Management System. The relational model is the basis for SQL and for
all modern database systems such as MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access.
The model is based on branches of mathematics called set theory and predicate logic. The basic idea behind
the relational model is that a database consists of a series of unordered tables (or relations) that can be
manipulated using non-procedural operations that return tables. This model was in stark contrast to the more
traditional database theories of the time that were much more complicated, less flexible and dependent on the
physical storage methods of the data.
It is commonly thought that the word relational in the relational model comes from the fact that you relate
together tables in a relational database. Although this is a convenient way to think of the term, it is not
accurate. Instead, the word relational has its roots in the terminology that Codd used to define the relational
model. The table in Codd's writings was actually referred to as a relation (a related set of information). In fact,
Codd (and other relational database theorists) use the terms relations, attributes and tuples where most of us
use the more common terms tables, columns and rows, respectively (or the more physical—and thus less
preferable for discussions of database design theory—files, fields and records).
The relational model can be applied to both databases and database management systems (DBMS) themselves.
The relational fidelity of database programs can be compared using Codd's 12 rules (since Codd's seminal
paper on the relational model, the number of rules has been expanded to 300) for determining how DBMS
products conform to the relational model.
Candidate keys for tblCustomer might include CustomerId, (LastName + FirstName), Phone#, (Address, City,
and State), and (Address + ZipCode). Following Pascal's guidelines, you would rule out the last three
candidates because addresses and phone numbers can change fairly frequently. The choice between CustomerId
and the name composite key is less obvious and would involve tradeoffs. How likely is a customer's name to
change (e.g., marriages cause names to change)? Will misspellings of names be common? How likely is it that
two customers will have the same first and last names? How familiar will CustomerId be to users? There is no
right answer, but most developers favor numeric primary keys because names do sometimes change and because
searches and sorts of numeric columns are more efficient than those of text columns in most databases.
Note: In many situations, it is best to use some sort of arbitrary static whole number (e.g., employee ID, order
ID, etc.) as a primary key rather than a descriptive text column. This avoids the problem of misspellings and
name changes. Also, do not use real numbers as primary keys since they are inexact.
Caution
A relational database management system does not allow duplicate rows; permitting them would cause data
ambiguity.
CustomerId is considered a foreign key in tblOrder since it can be used to refer to a given customer (i.e., a row
in the tblCustomer table). It is important that both foreign keys and the primary keys that are used to reference
share a common meaning and draw their values from the same domain. Domains are simply pools of values
from which columns are drawn. For example, CustomerId is of the domain of valid customer ID numbers, which
in this case might be Long Integers ranging between 1 and 50,000. Similarly, a column named Sex might be
based on a one-letter domain equalling 'M' or 'F'. Domains can be thought of as user-defined column types
whose definition implies certain rules that the columns must follow and certain operations that you can
perform on those columns.
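The foreign key and domain ideas can be sketched together in Python's sqlite3; the CHECK constraints below stand in for the domains just described (table names follow the text, sample values are invented):

```python
import sqlite3

# Foreign key drawing its values from the same domain as the primary
# key it references, plus a one-letter domain for Sex.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript("""
    CREATE TABLE tblCustomer (
        CustomerId INTEGER PRIMARY KEY CHECK (CustomerId BETWEEN 1 AND 50000),
        Sex        TEXT CHECK (Sex IN ('M', 'F'))
    );
    CREATE TABLE tblOrder (
        OrderId    INTEGER PRIMARY KEY,
        CustomerId INTEGER REFERENCES tblCustomer(CustomerId)
    );
""")
conn.execute("INSERT INTO tblCustomer VALUES (1, 'F')")
conn.execute("INSERT INTO tblOrder VALUES (100, 1)")
try:
    # Value outside the domain of existing customer IDs: rejected.
    conn.execute("INSERT INTO tblOrder VALUES (101, 999)")
except sqlite3.IntegrityError:
    pass  # the foreign key must match an existing CustomerId
```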
3.2.5 Terminologies
There are two sets of relational database terminology in use. The original developers of the relational theory
approached it from a theoretical perspective and used terminology that came from set theory and formal logic.
These terms never caught on among practitioners, who preferred to use more intuitive and practical terms,
which were eventually enshrined in the SQL standard.
Most modern publications about databases use the SQL terms as described below, but you should be aware of
the different terminology. The pairs of terms are not entirely synonymous, so some writers on relational theory
prefer to use the strict relational terminology.
Domains
Domains are the set of allowable data values for a Column. For example, the FiveDigitZipCode Column on the
customer entity can be in the integer domain. As such, the database would not allow you to place values like
123.45 (floating point) or ABC (character) into that Column.
Some authors draw a distinction between a domain and a type in the fact that a type is a fundamental concept
built into the DBMS (e.g. string, integer, floating point) while a domain can have additional business rules
about what values are acceptable. For example, if you have a database storing scores in ten-pin bowling, the
score for a game will be of integer type, but the rules of the game (it is impossible to score more than 300)
mean that the domain of the score would be integers between 0 and 300. The additional constraints on the
domain make it harder for bad data to be inserted into the database.
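The bowling example maps naturally onto a CHECK constraint; a sketch in Python's sqlite3 (the table name is invented):

```python
import sqlite3

# The column's type is integer, but the domain adds the business rule
# that a game score lies between 0 and 300.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE game (
        game_id INTEGER PRIMARY KEY,
        score   INTEGER CHECK (score BETWEEN 0 AND 300)
    )
""")
conn.execute("INSERT INTO game VALUES (1, 300)")      # perfect game: allowed
try:
    conn.execute("INSERT INTO game VALUES (2, 301)")  # outside the domain
except sqlite3.IntegrityError:
    pass  # bad data is kept out of the database
```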
Columns
Columns are the attributes that describe an entity in the database model. For example, the customer
entity may have attributes for First Name, Last Name, Address, City, State, and FiveDigitZipCode.
Tables
Tables are collections of Rows that act as logical groupings of entities.
Databases
A collection of related Tables and any supporting objects (e.g. stored procedures) is often referred to as a
Database (or schema). Multiple Databases are usually logically separate from one another.
The term ‗database‘ is sometimes used loosely to refer to the software that manages the database. To avoid
ambiguity it is standard in more formal contexts to refer to the software as a Database Management System or
DBMS—or more specifically a Relational Database Management System or RDBMS.
Note that the definition says zero or more: a set with zero members is still a set, even though it is empty. The
set with zero elements is written as a pair of empty braces {} and is often represented by the symbol ∅.
If two sets have the exact same elements, then they are the same set. There is nothing special about one set that
can distinguish it from others, apart from the elements it contains. The order of the elements is not important,
so the sets are the same no matter what order we choose to write the elements in.
Anything can be put into a set, not just mathematical concepts such as numbers. You could have the set of the
days of the week, and deal with it in set theory the same as any other set:
S= {Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday}
Even sets can be members of other sets.
Sets can be infinite, for example the set of all positive whole numbers.
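These set properties map directly onto Python's built-in set type, which makes a convenient way to experiment with them:

```python
# The set properties described above, using Python's built-in set type.
weekdays = {"Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"}

empty = set()                 # the set with zero elements
assert len(empty) == 0

# Order does not matter: two sets with the same elements are equal.
assert {1, 2, 3} == {3, 2, 1}

# A set is determined only by its elements, so duplicates collapse.
assert {1, 1, 2} == {1, 2}

# Membership testing works the same for any set.
assert "Monday" in weekdays
```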
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
3.4.2 Intension
The intension of a given relation is independent of time. It is the permanent part of the relation. It corresponds
to what is specified in the relational schema. The intension thus defines all permissible extensions. The
intension is a combination of two things: a structure and a set of integrity constraints.
The naming structure consists of the relation name plus the names of the attributes (each with its associated
domain name).
The integrity constraints can be subdivided into key constraints, referential constraints, and other constraints.
For example,
Employee (EmpNo Number(4) Not NULL, EName Char(20), Age Number(2), Dept Char(4))
This is the intension of Employee relation.
3.5 Relationships
You define foreign keys in a database to model relationships in the real world. Relationships between real-
world entities can be quite complex, involving numerous entities each having multiple relationships with each
other. For example, a family has multiple relationships between multiple people—all at the same time. These
tables can be related in one of three different ways: one-to-one, one-to-many or many-to-many.
3.5.1 One-to-Many Relationships
In Figure 3.3 the procedure for deriving the degree of a relationship type and putting it on the entity
relationship diagram is shown. The example concerns part of a sales ledger system. Customers may have
received zero or more invoices from us. The relationship type is thus called 'received' and is from
CUSTOMER to INVOICE. The arrow shows the direction. The minimum number of invoices the customer
has received is zero and thus the 'received' relationship type is optional. This is shown by the zero on the line.
The maximum number of invoices the customer may have received is 'many'. This is shown by the crow's
foot. This is summarized in Figure 3.3(a). To complete the definition of the relationship type the next step is to
name the inverse relationship type. Clearly if a customer received an invoice, the invoice was sent to the
customer and this is an appropriate name for this inverse relationship type. Now consider the degree of the
inverse relationship type. The minimum number of customers you would send an invoice to is one; you would
not send it to no-one. The optionality is thus one. The inverse relationship type is mandatory. The maximum
number of customers you would send an invoice to is also one so the cardinality is also one. This is
summarized in Figure 3.3(b). Figure 3.3(b) shows the completed relationship.
……..………………………………………………………………………………………………………………
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
3.6 Integrity Rules
The relational model defines several integrity rules that, while not part of the definition of the normal forms,
are nonetheless a necessary part of any relational database. There are two types of integrity rules: entity
integrity and referential integrity.
Caution
If two entities are not distinguishable from each other, then by definition there are not two entities but only one;
storing them as two would introduce data redundancy.
2. The relational model is based on branches of mathematics called set theory and predicate logic.
(a) True (b) False
3. The relational model can be applied to both databases and database management systems.
(a) True (b) False
We explain the other terms informally here, and then go on to give more formal definitions in subsequent
sections. Briefly, if we think of a relation as a table, then a tuple corresponds to a row of such a table and an
attribute to a column; the number of tuples is called the cardinality and the number of attributes is called the
degree; and a domain is a pool of values, from which the values of specific attributes of specific relations are
taken. The domain labelled S#, for example, is the set of all possible supplier numbers, and every S# value
appearing in the suppliers relation is some value from that set. Please understand, however, that the
"equivalences" are all only approximate (the formal relational terms have precise definitions, while the informal
"equivalents" have only rough and ready definitions).
A domain is nothing more nor less than a data type (type for short), possibly a simple system-defined type like
INTEGER or CHAR, more generally a user-defined type like S# or P# or WEIGHT or QTY in the suppliers
and parts database. Indeed, we can use the terms type and domain interchangeably. (Though we prefer the term
type; when we use the term domain, we do so mainly for historical reasons).
Among other things, it is a set of values-all possible values of the type in question. The type INTEGER for
example, is the set of all possible integers; the type S# is the set of all possible supplier numbers; and so on.
Also, along with the notion of a given type is the associated notion of the valid operators that can legally be
applied to values of that type; i.e., values of that type can be operated upon solely by means of the operators
defined for that type. For example, consider the type INTEGER (which we assume for simplicity is system-defined):
The system provides operators "=", "<", and so on, for comparing integers;
It also provides operators "+", "*", and so on, for performing arithmetic on integers;
It does not provide operators "||" (concatenate), SUBSTR (substring), and so on, for performing string
operations on integers. In other words, string operations on integers are not supported.
In the third relationship, ―departments‖ is on the ―many‖ side, so a department table, DEPARTMENT, is
defined.
Many-to-many relationships
A relationship that is multi-valued in both directions is a many-to-many relationship. An employee can work
on more than one project, and a project can have more than one employee. For example, the questions "What
does Dolores Quintana work on?" and "Who works on project IF1000?" both yield multiple answers. A
many-to-many relationship can be expressed in a table with a column for each entity ("employees" and
"projects"), as shown in the following example.
Table 3.9 shows how a many-to-many relationship (an employee can work on many projects, and a project can
have many employees working on it) is represented.
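The two-column table described above can be sketched in Python as a list of (employee, project) pairs. Only "Dolores Quintana" and "IF1000" come from the text; the other names and project codes are made-up sample data.

```python
# Many-to-many relationship stored as a table with a column for each entity
# ("employees" and "projects"). Sample rows are hypothetical.
emp_proj = [
    ("Dolores Quintana", "AD3100"),
    ("Dolores Quintana", "IF1000"),
    ("Heather Nicholls", "IF1000"),
    ("Bruce Adamson",    "OP2010"),
]

def projects_of(employee):
    """Answers 'What does this employee work on?'"""
    return {p for e, p in emp_proj if e == employee}

def employees_on(project):
    """Answers 'Who works on this project?'"""
    return {e for e, p in emp_proj if p == project}
```

Both questions can yield multiple answers from the same table: `projects_of("Dolores Quintana")` returns two projects, and `employees_on("IF1000")` returns two employees.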
One-to-one relationships
One-to-one relationships are single-valued in both directions. A manager manages one department; a
department has only one manager. The questions "Who is the manager of Department C01?" and "What
department does Sally Kwan manage?" both have single answers. The relationship can be assigned to either
the DEPARTMENT table or the EMPLOYEE table. Because all departments have managers, but not all
employees are managers, it is most logical to add the manager to the DEPARTMENT table, as shown in the
following example.
Table 3.10 shows the representation of a one-to-one relationship.
When you retrieve information about an entity from more than one table, ensure that equal values represent the
same entity. The connecting columns can have different names (like WORKDEPT and DEPTNO in the
previous example), or they can have the same name (like the columns called DEPTNO in the department and
project tables).
Caution
If your data files contain integrity constraints, do not use your operating environment commands to copy,
move, or delete your data files. Doing so can cause data loss.
Example:
The definition of candidate keys can be illustrated with the following (abstract) example. Consider a relation
variable (relvar) R with attributes (A, B, C, D) that has only the following two legal values r1 and r2.
Table 3.13: r1
A B C D
a1 b1 c1 d1
a1 b2 c2 d1
a2 b1 c2 d1
Table 3.14: r2
A B C D
a1 b1 c1 d1
a1 b2 c2 d1
a1 b1 c2 d2
Here r2 differs from r1 only in the A and D values of the last tuple.
For r1 the following sets have the uniqueness property, i.e., there are no two distinct tuples in the instance with
the same values for the attributes in the set.
{A,B}, {A,C}, {B,C}, {A,B,C}, {A,B,D}, {A,C,D}, {B,C,D}, {A,B,C,D}
For r2 the uniqueness property holds for the following sets.
{B,C}, {B,D}, {C,D}, {A,B,C}, {A,B,D}, {A,C,D}, {B,C,D}, {A,B,C,D}
Since super keys of a relvar are those sets of attributes that have the uniqueness property for all legal values of
that relvar and because we assume that r1 and r2 are all the legal values that R can take, we can determine the
set of super keys of R by taking the intersection of the two lists.
{B,C}, {A,B,C}, {A,B,D}, {A,C,D}, {B,C,D}, {A,B,C,D}
Finally, we need to select those sets for which there is no proper subset in the list. In this case these are:
{B,C}, {A,B,D}, {A,C,D}
These are indeed the candidate keys of relvar R.
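The derivation above can be reproduced mechanically. The following Python sketch computes, for r1 and r2, the attribute sets with the uniqueness property, intersects the two lists to obtain the super keys, and then keeps the minimal sets as candidate keys.

```python
from itertools import combinations

# The relvar R(A, B, C, D) with its two legal values r1 and r2 from the text.
ATTRS = ("A", "B", "C", "D")
r1 = [("a1", "b1", "c1", "d1"), ("a1", "b2", "c2", "d1"), ("a2", "b1", "c2", "d1")]
r2 = [("a1", "b1", "c1", "d1"), ("a1", "b2", "c2", "d1"), ("a1", "b1", "c2", "d2")]

def unique_sets(rel):
    """All non-empty attribute sets with the uniqueness property in rel."""
    out = set()
    for k in range(1, len(ATTRS) + 1):
        for combo in combinations(range(len(ATTRS)), k):
            proj = [tuple(t[i] for i in combo) for t in rel]
            if len(set(proj)) == len(proj):          # no two tuples agree on these attrs
                out.add(frozenset(ATTRS[i] for i in combo))
    return out

# Super keys: uniqueness must hold in every legal value of the relvar.
superkeys = unique_sets(r1) & unique_sets(r2)
# Candidate keys: super keys with no proper subset that is also a super key.
candidate_keys = {s for s in superkeys if not any(t < s for t in superkeys)}
```

Running this yields exactly the sets derived in the text: the candidate keys of R are {B,C}, {A,B,D} and {A,C,D}.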
We have to consider all the relations that might be assigned to a relvar to determine whether a certain set of
attributes is a candidate key. For example, if we had considered only r1 then we would have concluded that
{A, B} is a candidate key, which is incorrect. However, we might be able to conclude from such a relation that
a certain set is not a candidate key, because that set does not have the uniqueness property (for example, {A, D}
for r1). Note that the existence of a proper subset of a set that has the uniqueness property cannot in general be
used as evidence that the superset is not a candidate key. In particular, note that in the case of an empty
relation, every subset of the heading has the uniqueness property, including the empty set.
Caution
If you want to redefine the primary key, any relationships to the existing primary key must be deleted before
the new primary key can be created; otherwise they will be deleted automatically as part of this process.
Insert Rule
The insert rule of a referential constraint is that a non-null insert value of the foreign key must match some
value of the parent key of the parent table. The value of a composite foreign key is null if any component of
the value is null. This rule is implicit when a foreign key is specified.
Update Rule
The update rule of a referential constraint is specified when the referential constraint is defined. The choices
are NO ACTION and RESTRICT. The update rule applies when a row of the parent or a row of the dependent
table is updated. In the case of a parent row, when a value in a column of the parent key is updated, the
following rules apply:
If any row in the dependent table matches the original value of the key, the update is rejected when the
update rule is RESTRICT.
If any row in the dependent table does not have a corresponding parent key when the update statement is
completed (excluding AFTER triggers), the update is rejected when the update rule is NO ACTION.
The value of a composite foreign key is null if any component of the value is null.
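The insert rule, including its composite-null exception, can be sketched as a simple check. The parent-key values here are made-up sample data; in a real DBMS the check is enforced internally when a foreign key is specified.

```python
# Sketch of the insert rule of a referential constraint: a non-null insert
# value of the foreign key must match some value of the parent key.
parent_keys = {("D01",), ("D11",), ("D21",)}   # parent table's key values (assumed)

def insert_allowed(fk_value):
    """fk_value is a tuple of foreign-key components (None marks a null)."""
    if any(component is None for component in fk_value):
        return True          # a composite FK is null if any component is null
    return fk_value in parent_keys

assert insert_allowed(("D11",))       # matches a parent key: accepted
assert insert_allowed((None,))        # null foreign key: the rule does not apply
assert not insert_allowed(("D99",))   # no matching parent key: rejected
```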
Delete Rule
The delete rule of a referential constraint is specified when the referential constraint is defined. The choices are
NO ACTION, RESTRICT, CASCADE, or SET NULL. SET NULL can be specified only if some column of
the foreign key allows null values. If the identified table or the base table of the identified view is a parent, the
rows selected for delete must not have any dependents in a relationship with a delete rule of RESTRICT, and
the DELETE must not cascade to descendent rows that have dependents in a relationship with a delete rule of
RESTRICT.
If the delete operation is not prevented by a RESTRICT delete rule, the selected rows are deleted. Any rows
that are dependents of the selected rows are also affected:
The nullable columns of the foreign keys of any rows that are their dependents in a relationship with a
delete rule of SET NULL are set to the null value.
Any rows that are their dependents in a relationship with a delete rule of CASCADE are also deleted, and
the above rules apply, in turn, to those rows.
The delete rule of NO ACTION is checked to enforce that any non-null foreign key refers to an existing parent
row after the other referential constraints have been enforced.
The delete rule of a referential constraint applies only when a row of the parent table is deleted. More
precisely, the rule applies only when a row of the parent table is the object of a delete or propagated delete
operation and that row has dependents in the dependent table of the referential constraint. Consider an example
where P is the parent table, D is the dependent table, and p is a parent row that is the object of a delete or
propagated delete operation. The delete rule works as follows.
With RESTRICT or NO ACTION, an error occurs and no rows are deleted.
With CASCADE, the delete operation is propagated to the dependents of p in table D.
With SET NULL, each nullable column of the foreign key of each dependent of p in table D is set to null.
Any table that can be involved in a delete operation on P is said to be delete-connected to P. Thus, a table is
delete-connected to table P if it is a dependent of P, or a dependent of a table to which delete operations from P
cascade.
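The behavior of the three delete-rule outcomes for a parent row p can be sketched as follows. The table contents are hypothetical, and a real DBMS enforces these rules internally; the sketch only mirrors the description above.

```python
# Sketch of delete-rule behavior when a parent row p has dependents in table D.
def delete_parent(rule, parent, parent_rows, dependents, fk_col):
    """Apply `rule` while deleting `parent`; returns new (parents, dependents)."""
    has_deps = any(d[fk_col] == parent for d in dependents)
    if rule in ("RESTRICT", "NO ACTION") and has_deps:
        raise RuntimeError("delete rejected: dependent rows exist")
    if rule == "CASCADE":
        # The delete is propagated to the dependents of p.
        dependents = [d for d in dependents if d[fk_col] != parent]
    elif rule == "SET NULL":
        # Each nullable FK column of each dependent of p is set to null.
        dependents = [dict(d, **{fk_col: None}) if d[fk_col] == parent else d
                      for d in dependents]
    return [r for r in parent_rows if r != parent], dependents

parents = ["D01", "D11"]
deps = [{"emp": "e1", "dept": "D01"}, {"emp": "e2", "dept": "D11"}]

_, after_cascade = delete_parent("CASCADE", "D01", parents, deps, "dept")
_, after_setnull = delete_parent("SET NULL", "D01", parents, deps, "dept")
```

With CASCADE the dependent row for D01 disappears; with SET NULL it survives but its foreign key becomes null; with RESTRICT or NO ACTION the delete raises an error and nothing is removed.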
The following restrictions apply to delete-connected relationships.
When a table is delete-connected to itself in a referential cycle of more than one table, the cycle must not
contain a delete rule of either RESTRICT or SET NULL.
A table must not both be a dependent table in a CASCADE relationship (self-referencing or referencing
another table) and have a self-referencing relationship with a delete rule of either RESTRICT or SET
NULL.
When a table is delete-connected to another table through multiple relationships where such relationships
have overlapping foreign keys, these relationships must have the same delete rule and none of these can be
SET NULL.
When a table is delete-connected to another table through multiple relationships where one of the
relationships is specified with delete rule SET NULL, the foreign key definition of this relationship must
not contain any distribution key.
When two tables are delete-connected to the same table through CASCADE relationships, the two tables
must not be delete-connected to each other where the delete connected paths end with delete rule
RESTRICT or SET NULL.
A FOREIGN KEY constraint does not have to be linked only to a PRIMARY KEY constraint in another table;
it can also be defined to reference the columns of a UNIQUE constraint in another table. A FOREIGN KEY
constraint can contain null values; however, if any column of a composite FOREIGN KEY constraint contains
null values, verification of all values that make up the FOREIGN KEY constraint is skipped. To make sure
that all values of a composite FOREIGN KEY constraint are verified, specify NOT NULL on all the
participating columns.
Proposal Database
The Proposal Database of EUVE pointed observations is designed to provide not only information about the
target, the observation, and the principal investigator for the data, but also information about the observation
schedule, the location of the acquired data, and various types of historical data such as the software version
used for processing. Over time, the design expanded to include the state of the data reduction processing. In
general, the Proposal Database handles higher-level information than the information found in
the Archive Database. The Proposal database has two components: an observation database and a proposal
database.
The proposal component of the Proposal Database tracks high-level information about investigators, their
proposals, the related observation requests, and completed observations. This database has the greatest amount
of operator data entry. Several mechanisms have been used to check the consistency of the data in this
database, since errors can directly affect the productivity of the EUVE mission. Moreover, the information
changes frequently as adjustments are made to observing proposals. Runtime exceptions occur frequently and
are handled by operators using a WWW interface. This interface is closely related to the schema structure of
the database. Operators must be familiar with the schema structure in order to appropriately correct data in the
database. However, this cost is small compared to the cost of identifying and anticipating the multitude of
exception conditions that may occur in this database.
Discussion
The EUVE mission has used RDBMS technology in mission-critical processing. The features used are
transactions and synchronization (particularly suited to distributed systems). In some cases the relational
features of the RDBMS were also used. However, in many cases complex logical expressions were handled in
the application software instead of the RDBMS, and the RDBMS was used as a reliable data store. Overall
design, development, and testing of the databases were not difficult.
Providing operators with the capability to access and manipulate data has been an ongoing problem in working
with databases on the EUVE mission. We experimented with several commercial products, as well as the
Astronomical Data System, in an attempt to provide an operator front-end to the RDBMS. None of these
systems provided an appropriate solution. Typically such products are designed to provide complete solutions
that require extensive development of interface specifications intimately connected to the schema of the
database. This type of extensive development was never justified, nor did it appear maintainable since the
database schemas have continuously evolved. The evolution of the database schema is a natural result of the
evolution of the scientific goals of the mission. Therefore, instead of providing a complete solution only
applicable at a given moment, the EUVE mission has concentrated on providing general-purpose, partial
solutions. Specifically, the WWW interface used in the Proposal Database is derived from the database
schema. This interface requires operators be trained in the structure of the database, but dramatically reduces
development and maintenance costs. More recently, the EUVE mission has developed a WWW server
prototype (named xdb) that provides structured access to databases based solely on their metadata
information.
Questions
1. Write a brief history of the EUVE mission.
2. Why was RDBMS technology used for the mission?
3.9 Summary
RDBMS is the basis for SQL, and for all modern database systems like MS SQL Server, IBM DB2,
Oracle, MySQL, and Microsoft Access.
The relational model can be applied to both databases and database management systems (DBMS)
themselves.
A well-designed database takes time and effort to conceive, build and refine.
Primary keys become essential, however, when you start to create relationships that join together multiple
tables in a database.
A relational database is a collection of data organized in two-dimensional tables consisting of named
columns and rows.
In set theory, columns are known as attributes and rows are known as tuples.
A composite key is a key that contains more than one attribute.
A foreign key is a field in a relational table that matches the primary key column of another table. The
foreign key is used to cross-reference tables.
The insert rule of a referential constraint is that a non-null insert value of the foreign key must match some
value of the parent key of the parent table.
The primary key is usually the key selected to identify a row when the database is physically implemented.
For example, a part number is selected instead of a part description.
Many relational database management systems include mechanisms that enforce a database's referential
integrity.
Referential integrity is another measure of the consistency of the data in databases.
3.10 Keywords
Alternate Key: All candidate keys excluding the primary key are known as alternate keys.
Artificial Key: If no obvious key, either standing alone or compound, is available, then the last resort is simply
to create a key by assigning a unique number to each record or occurrence. This is known as creating an
artificial key.
Compound Key: If no single data element uniquely identifies occurrences within a construct, then combining
multiple elements to create a unique identifier for the construct is known as creating a compound key.
Foreign key: It is an attribute (or set of attributes) that appears (usually) as a non key attribute in one relation
and as a primary key attribute in another relation.
Partial Key: It is a set of attributes that can uniquely identify weak entities that are related to the same owner
entity. It is sometimes called a discriminator.
Arity: Arity refers to the number of columns in a table.
Cardinality: Cardinality refers to the number of elements in a set.
Columns: Columns are the attributes that describe an entity in the database model.
Domains: Domains are the set of allowable data values for a Column.
Entity Integrity: It says that no component of a primary key may be null. All entities must be distinguishable. That
is, they must have a unique identification of some kind.
Foreign Key: A foreign key is a column in a table used to reference a primary key in another table.
Referential Integrity: The referential integrity constraint is specified between two relations and is used to
maintain the consistency among tuples of the two relations.
Tables: Tables are collections of Rows that act as logical groupings of entities.
Tuple: A row or tuple is a complete set of Columns that describe the entity that you are trying to model.
4.0 Objectives
After studying this chapter, you will be able to:
• Define the concept of normalization
• Explain the database anomalies
• Discuss the decomposition
Examples
4.4 Decomposition
The relational database design algorithms start with a single universal relation schema, R = {A1, A2,
A3, ..., An}, which includes all the attributes of a database. The database designers specify the set F of
functional dependencies, which holds true for all the attributes of R. This set F of functional dependencies is
also provided to the design algorithms. With the help of the functional dependencies, these algorithms
decompose the universal relation schema R into a set of relation schemas, D = {R1, R2, ..., Rm}, which
becomes the relational database schema. In this case, D is referred to as a decomposition of R. The properties
of decomposition are as follows:
• Attribute preservation: It involves preserving all the attributes of the relation being decomposed by the
design algorithms. While decomposing a relation, you need to make sure that each attribute in R exists in at
least one relation schema Ri.
• Lossless-join decomposition: It ensures that no information is lost, i.e., that joining the decomposed
relations yields exactly the relation that existed before the decomposition. The decomposition of the relation
R into several relations R1, R2, ..., Rn is called a lossless-join decomposition if the relation R is the natural
join of the relations R1, R2, ..., Rn. To test whether a given decomposition into two schemas R1 and R2 is a
lossless join for a given set F of functional dependencies, check that at least one of the following conditions
holds:
o (R1 ∩ R2) → (R1 - R2)
o (R1 ∩ R2) → (R2 - R1)
• Dependency preservation: It states that each functional dependency X → Y specified in F either directly
appears in one of the relation schemas Ri in the decomposition D or can be inferred from the dependencies
that appear in some Ri. The need for dependency preservation arises because each dependency in F
represents a constraint on the database. When a decomposition does not preserve a dependency, that
dependency is lost in the decomposition. You can check for a lost dependency by creating a join of two
or more relations in the decomposition to get a relation that includes all the left-hand and right-hand side
attributes of the lost dependency, and then checking whether or not the dependency is preserved on the
result of the join.
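The lossless-join property can be demonstrated by projecting a small relation and re-joining the projections. In the sketch below (data values are made up), splitting R(A, B, C) into (A, B) and (A, C) is lossless because the shared attribute A determines the rest, while splitting into (A, B) and (B, C) produces spurious tuples.

```python
# Demonstrating lossless vs. lossy binary decomposition by project-then-join.
R = {("a1", "b1", "c1"), ("a2", "b1", "c2")}   # relation over (A, B, C), sample data

def project(rel, idxs):
    return {tuple(t[i] for i in idxs) for t in rel}

def join(r_left, r_right, on_left, on_right):
    """Natural join of two binary relations on one shared attribute position."""
    return {lt + tuple(v for i, v in enumerate(rt) if i != on_right)
            for lt in r_left for rt in r_right if lt[on_left] == rt[on_right]}

# Decomposition 1: (A, B) and (A, C); shared attribute A determines B and C.
good = join(project(R, (0, 1)), project(R, (0, 2)), 0, 0)
# Decomposition 2: (A, B) and (B, C); shared attribute B determines nothing here.
bad = join(project(R, (0, 1)), project(R, (1, 2)), 1, 0)
```

Re-joining the first decomposition reconstructs R exactly; the second yields four tuples where R had two, so it is lossy: the extra (spurious) tuples mean information about which A went with which C has been lost.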
MID Database Instructor
1 Access Anurag
8 Access Samar
1 Access Roshan
1 DB2 Gita
1 DB2 John
8 Oracle Ram
The redundancy of data is easily perceived: for each MID, multiple Instructors and multiple Databases are
defined. This is a perfect example of a multi-valued dependency. Figure 4.1 shows the fourth normal form of
the instructor table.
MID DATA Table
MID Database
1 Access
8 Access
1 DB2
8 Oracle
Instructor MID City
Sumit 1 Delhi
Nina 2 Kanpur
Karan 1 Kanpur
If you were to add the MID 2 to Kanpur, you would have to add a line to the table for each instructor located
in Kanpur. If Jones were certified for MID 2 and could travel to Kanpur, you would have to add two lines to
reflect this.
4.5 Normalization
Data normalization is a process in which data attributes within a data model are organized to increase the
cohesion of entity types. In other words, the goal of data normalization is to reduce and even eliminate data
redundancy, an important consideration for application developers because it is incredibly difficult to store
objects in a relational database that maintains the same information in several places. Table 4.1 summarizes
the three most common forms of normalization (First normal form (1NF), Second normal form (2NF), and
Third normal form (3NF)), describing how to put entity types into a series of increasing levels of
normalization. With respect to terminology, a data schema is considered to be at the level of normalization of
its least normalized entity type. For example, if all of your entity types are at second normal form (2NF) or
higher, then we say that your data schema is at 2NF.
In this chapter, we will also discuss database anomalies and database decomposition.
2. The primary key of a relational table uniquely identifies each …………….. in a table.
(a) Row (b) Column
(c) Both (a) and (b) (d) None of these
Primary Key
The primary key of a relational table uniquely identifies each row in a table. A primary key is either a column
in a table that is unique, such as an identification number or social security number, or it is generated by the
DBMS, such as a Globally Unique Identifier (GUID). A primary key consists of a single column or multiple
columns from a table. For example, consider a student records database that contains tables related to student
information. The first table, STUDENTS, contains a record for each student at the university. The table
STUDENTS consists of various attributes such as student_id, first_name, last_name and student_stream.
Table 4.2 lists the various attributes in the STUDENTS table.
A unique student_id number of a student is a primary key in the STUDENTS table. You cannot make the
first_name or last_name of a student a primary key because more than one student can have the same first
name and the same stream.
Functional Dependency
A functional dependency is termed as a constraint between two sets of attributes of the database. Functional
dependency is represented by X→Y between two attributes, X and Y, in a table. The functional dependency
X→Y implies that Y is functionally dependent on X. Table 4.3 lists the various attributes in the EMPLOYEE
table.
In this table, the various attributes of the EMPLOYEE are Employee_id, Employee_name and
Employee_dept. You can state that:
Employee_id→Employee_name
In the above representation, the Employee_name attribute is functionally dependent on Employee_id. This
implies that the name of an employee can be uniquely identified from the id of the employee. However, you
cannot uniquely identify the Employee_id from the Employee_name column, because more than one employee
can have the same name; each employee, however, has a different value in the Employee_id column.
Functional dependencies are a type of constraint based on keys such as a primary key or foreign key. For a
relational table R, a column Y is said to be functionally dependent on a column X of the same table if each
value of the column X is associated with only one value of the column Y at a given time. All the columns in
the relational table R should be functionally dependent on X if the column X is a primary key.
If the columns X and Y are functionally dependent, the functional dependency can be represented as:
R. X→R. Y
For example, consider the following functional dependency in a table:
Employee_id → Salary
The column Employee_id functionally determines the Salary column, because the salary of each employee is
unique and remains the same each time that employee appears in the table.
A functional dependency X → Y between two sets of attributes X and Y that are subsets of R is termed a
trivial functional dependency if Y is a subset of X. For example, {Employee_id, Project} → Project is a trivial
functional dependency.
A functional dependency X → Y between two sets of attributes X and Y that are subsets of R is termed a
non-trivial functional dependency if at least one of the attributes of Y is not among the attributes of X. For
example, Employee_id → Salary is a non-trivial functional dependency.
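A functional dependency X → Y can be tested mechanically: it holds exactly when no X value is associated with two different Y values. A minimal sketch, with made-up employee rows:

```python
# Checking whether X -> Y holds in a table: each X value must map to one Y value.
employees = [  # (Employee_id, Employee_name, Salary) - hypothetical sample data
    ("E1", "Asha", 50000),
    ("E2", "Ravi", 60000),
    ("E3", "Asha", 55000),   # two employees share a name, but ids are unique
]

def fd_holds(rel, x_idxs, y_idxs):
    seen = {}
    for t in rel:
        x = tuple(t[i] for i in x_idxs)
        y = tuple(t[i] for i in y_idxs)
        if seen.setdefault(x, y) != y:
            return False     # one X value is associated with two Y values
    return True

assert fd_holds(employees, (0,), (1,))       # Employee_id -> Employee_name holds
assert not fd_holds(employees, (1,), (0,))   # Employee_name does not determine id
```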
In Table 4.7, the information provided is redundant. The multiple values of the same type, such as quantity and
price of two items, are stored in different columns.
The requirements of the first normal form are:
• Eliminate the multi-valued fields from the table
• Each column in the table must be atomic
• The table must have a key, such as a primary key, that identifies each row uniquely
• Remove repeated information from the table
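The first requirement, eliminating multi-valued fields, can be sketched as follows. The ISBNs echo the Books example, while the author names and the exact shape of the unnormalized records are assumptions.

```python
# Bringing a table with a multi-valued field (authors) into 1NF by producing
# one atomic row per value. Sample data is hypothetical.
books_unnormalized = [
    {"isbn": "8790478", "authors": ["Singh", "Rao"], "price": 35},
    {"isbn": "8790388", "authors": ["Mehta"],        "price": 25},
]

# One row per (isbn, author, price): every column value is now atomic.
books_1nf = [(b["isbn"], author, b["price"])
             for b in books_unnormalized
             for author in b["authors"]]
```

The book with two authors becomes two rows; no cell holds more than one value, so the result satisfies the atomicity requirement of 1NF.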
In Table 4.8, a book can have more than one author and can also be included in different categories; therefore,
columns that consist of multi-valued elements should be removed from the table. After this, the Books table
should contain the Book_ISBNno, Book_price and Book_publisher columns.
Table 4.9 lists the various attributes of the Books table after the multi-valued elements are removed.
Table 4.9: The Books Table after the Multi-valued Elements are removed
Book_ ISBN no Book_price Book_publisher
8790478 35 ABC
8790388 25 PQR
8790689 77 ABC
Caution
Each column has a unique name, and the content within it must be of the same type. Content of a different
type will be treated as invalid and will not be accepted by the database.
Cust_id Stock Stock_price
C012 Stk1 15
C013 Stk2 10
C014 Stk3 20
In Table 4.14, suppose cust_id and stock are identified as the primary key for the Stocks table. However, the
column stock_price is partially dependent on the primary key because only the stock column determines the
stock_price. Also, the values in the stock_price column do not need the cust_id column to uniquely identify
the price of the stocks. Therefore, you need to make a separate table for the stock_price where the stock
column is the primary key. In the new table, partial dependency is eliminated because the stock_price column
is entirely dependent on the primary key.
Partial dependencies can only occur when more than one field constitutes the primary key. If there is only one
field in the primary identifier, then partial dependencies cannot occur.
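The partial dependency in the Stocks example can be confirmed, and eliminated, with a short sketch. The first three rows mirror the table above; the fourth customer row is an assumed addition to make the repetition visible.

```python
# With composite key (cust_id, stock), stock_price depends on stock alone:
# a partial dependency on the primary key.
stocks = [  # (cust_id, stock, stock_price)
    ("C012", "Stk1", 15),
    ("C013", "Stk2", 10),
    ("C014", "Stk3", 20),
    ("C015", "Stk1", 15),   # hypothetical row: same stock, same price again
]

def determines(rel, x, y):
    """True if column x alone fixes column y in every row."""
    seen = {}
    for t in rel:
        if seen.setdefault(t[x], t[y]) != t[y]:
            return False
    return True

partial = determines(stocks, 1, 2)            # stock -> stock_price holds
# Eliminate the partial dependency: a separate table keyed by stock alone.
stock_price = {t[1]: t[2] for t in stocks}
```

Because `stock` alone determines `stock_price`, the price belongs in its own table keyed by `stock`; in the new table the non-key column depends on the entire (single-column) key, as 2NF requires.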
The table conforms to 1NF since it does not contain repeated values, and Emp_id and Proj_no are identified as
the primary key for the table. However, the table is not in 2NF because some columns of the table depend on
only a part of the primary key, which comprises Emp_id and Proj_no. For example, the column Emp_name
depends only on Emp_id and does not depend on the Proj_no part of the primary key. Similarly, the Proj_name
column depends only on the Proj_no column and not on Emp_id.
Therefore, to apply 2NF to the employee_project table, you need to make a separate table for columns that
depend on only a part of the primary key. The new table should contain columns that are dependent on the
entire primary key identified for the table. The tables formed after applying 2NF to the employee_project table
are the emp_proj table, the emp table and the proj table.
Table 4.16 lists the various attributes in the emp_proj table.
Table 4.16: The Emp_project table
Emp_ id Proj_ no Proj_ hrs
H76320 W36 08
H76321 W37 02
Similarly, consider an ORDERS table that you need to normalize to 2NF. Table 4.19 lists the various attributes
in the ORDERS table.
Table 4.19: The ORDERS Table
Order_no Item_no Customer Item Qty Price
In Table 4.19, Order_no and Item_no are identified as the primary key for the table. Also, the table conforms
to 1NF since it does not contain repeated values. However, to apply 2NF to the ORDERS table, you need to
create a separate table for the columns that do not depend on the entire Order_no and Item_no primary key.
The tables, which are created after 2NF is applied to the ORDERS table, are order_cust table and orders table.
Table 4.20 lists the various attributes in the Order_cust table.
Order_no Customer
H76321 XYZ Co
In the above Order_cust table, the Customer column is dependent on the primary key Order_no. Similarly,
another table is created in which the columns Item, Qty and Price are dependent on the composite primary key
made up of Order_no and Item_no.
Table 4.21 lists the various attributes in the orders table.
Caution
For a database to be in 2NF, it must first fulfil all the criteria of a 1NF database.
4.6.7 Third Normal Form
A table is said to be in third normal form, or 3NF, if the table satisfies the requirements of 2NF and the
non-key columns are functionally dependent only on the primary key. The third normal form is based on the
concept of transitive dependency. A functional dependency A → B in a relation R is a transitive dependency
if the following conditions are satisfied:
• A column or set of columns, C, exists in the table that is neither a candidate key of R nor a subset of
any key of R.
• The functional dependencies A → C and C → B hold in the table.
For example, consider a Subject table with attributes such as Subject_no and Chapter_name. Table 4.23 lists
the various attributes in the Subject table.
Table 4.23: Subject table
Subject _no Chapter_ name Instructor Department
H76320 Data structure ABC Computer
H76320 Communication XYZ Electronics
In the above table, Subject_no is the only candidate key. Therefore, the following functional dependency exists
for the Subject table.
4.6.8 Boyce-Codd Normal Form
Boyce-Codd Normal Form (BCNF) is stricter than the third normal form. Every relation that is in BCNF is
also in Third Normal Form (3NF), but a relation in 3NF is not necessarily in BCNF. In 3NF, if a relation has
more than one candidate key then anomalies can occur; in the case of overlapping candidate keys, 3NF is
unable to stop the occurrence of anomalies. This provides the basis for BCNF, which rests on the concept of a
determinant. A determinant is an attribute (or set of attributes) on which some other attribute is fully
functionally dependent. The following shows a relation and its determinants:
R(a, b, c, d)
a, c → b, d
a, d → b
In the above, the first determinant states that you can change the primary key of relation R from a, b to a, c;
after applying this change, you can still determine all the non-key attributes present in relation R. The second
determinant indicates that a, d determine b, but as a, d do not determine all the non-key attributes of R, a, d
cannot be considered a primary key of R. This implies that the first determinant is a candidate key but the
second determinant is not, hence this relation is not in BCNF but is in 3NF.
To be in BCNF, every determinant of the relation has to be a candidate key. The definition of BCNF specifies
that a relation schema R is in BCNF if, whenever a non-trivial functional dependency X → A holds in R, X is
a super-key of R.
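This definition can be checked mechanically using attribute closures. The sketch below applies it to the R(a, b, c, d) example above, where a, c is a candidate key but the determinant a, d is not a super-key, so the relation fails BCNF.

```python
# BCNF test: every determinant (LHS of a non-trivial FD) must be a super-key,
# i.e. its attribute closure must cover all attributes of the relation.
ATTRS = frozenset("abcd")
FDS = [(frozenset("ac"), frozenset("bd")),   # a, c -> b, d
       (frozenset("ad"), frozenset("b"))]    # a, d -> b

def closure(x, fds):
    """Attribute closure of x under the given functional dependencies."""
    x, changed = set(x), True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return frozenset(x)

def is_bcnf(attrs, fds):
    return all(closure(lhs, fds) == attrs for lhs, _ in fds)
```

Here `closure({a, c})` covers all four attributes, so a, c is a super-key, but `closure({a, d})` yields only {a, b, d}; one determinant is not a super-key, so `is_bcnf` returns False, matching the text's conclusion.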
Caution
The form of a string, security mechanisms, or character-validation algorithms should usually be implemented
after normalization, because normalization can result in unexpected changes.
After applying First Normal Form, Repeating Groups are eliminated. Suppose the above relation is named as
PATIENT and the primary key of this relation is selected as combination of PatientID and VisitDate attributes.
PATIENT(PatientID, Name, Address, VisitDate, Physician, Diagnosis, Treatment)
In the relation PATIENT2, the primary key is PatientID and all other attributes are functionally dependent on
this primary key. Similarly, in the relation PATIENT HISTORY, the primary key is PatientID, VisitDate and
all other attributes are functionally dependent on this primary key. Therefore, the relations PATIENT2 and
PATIENT HISTORY are in 2NF.
Third Normal Form
A relation is in Third Normal Form (3NF) if it is in Second Normal Form and no transitive dependency exists.
If we look at the relation PATIENT HISTORY, it is in 2NF. It is obvious that the 'Physician' and 'Diagnosis'
attributes directly depend on the primary key, but 'Treatment' depends on the primary key only indirectly:
'Treatment' is transitively dependent via 'Diagnosis'. Therefore, we split the relation into two relations to
get the relations in 3NF. Suppose these relations are named PAT-HISTORY and DIAGNOSIS. The
relations with sample data are given below.
Diagnosis Treatment
Chest Infection Free
Cold Free
Hepatitis-A Paid
Eyes Infection Free
Bone Fracture Paid
Cough Free
Flu Free
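The 3NF split described above can be sketched with SQLite. This is a minimal illustration, not the chapter's own code: the lowercase table and column names and the sample visit rows are assumptions; only the Diagnosis/Treatment pairs come from the table above.

```python
import sqlite3

# Sketch of the 3NF decomposition: Treatment depends on Diagnosis alone,
# so it moves into its own DIAGNOSIS relation instead of being repeated
# for every visit in PAT-HISTORY.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE pat_history (
    patient_id INTEGER, visit_date TEXT, physician TEXT, diagnosis TEXT,
    PRIMARY KEY (patient_id, visit_date))""")
con.execute("CREATE TABLE diagnosis (diagnosis TEXT PRIMARY KEY, treatment TEXT)")
con.executemany("INSERT INTO diagnosis VALUES (?, ?)",
                [("Chest Infection", "Free"), ("Hepatitis-A", "Paid")])
con.executemany("INSERT INTO pat_history VALUES (?, ?, ?, ?)",
                [(1, "2004-01-10", "Dr. Rao", "Chest Infection"),
                 (2, "2004-02-01", "Dr. Rao", "Hepatitis-A")])
# Treatment is now reached through Diagnosis, not stored per visit.
rows = con.execute("""SELECT p.patient_id, d.treatment
                      FROM pat_history p JOIN diagnosis d ON p.diagnosis = d.diagnosis
                      ORDER BY p.patient_id""").fetchall()
print(rows)   # -> [(1, 'Free'), (2, 'Paid')]
```

Because treatment is stored once per diagnosis, changing a treatment no longer requires updating every matching visit row.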
Questions
1. Explain the database arrangement of the hospital.
2. Which normal forms are used to arrange the hospital's database?
4.7 Summary
The goal of designing a database schema is to minimize the redundancy of the stored data, and hence the
storage space it occupies on disk.
Database anomalies are errors in the data contained in the database, which reduce the performance of the
Database Management System (DBMS).
Normalization is a process of eliminating the redundancy of data in a database. A relational table in a
database is said to be in a normal form if it satisfies certain constraints.
The normalization process involves various levels of normal forms that allow you to separate the data into
multiple related tables. The various normal forms are first normal form (1NF), second normal form (2NF),
third normal form (3NF), fourth normal form (4NF) and fifth normal form (5NF).
The primary key of a relational table uniquely identifies each row in a table.
4.8 Keywords
Candidate key: If there is more than one key in a relation, the keys are called candidate keys.
Functional dependency: A constraint between two sets of attributes of the database.
Key: A set of attributes that uniquely and minimally identifies a tuple of a relation.
1NF: A table is said to be in 1NF if the data in the table has an identifying key and does not include repeating
groups of data.
Super key: One or more columns that together identify a unique row within a table.
5.0 Objectives
After studying this chapter, you will be able to:
Define the relational algebra
Explain the select operation
Discuss the project operation
Explain the join operation
5.1 Introduction
This chapter begins a study of database programming, that is, how the user can ask queries of the database and
can modify the contents of the database. Our focus is on the relational model and in particular on a notation for
describing queries about the content of relations called "relational algebra".
While ODL uses methods that, in principle, can perform any operation on data, and the E/R model does not
embrace a specific way of manipulating data, the relational model has a concrete set of "standard" operations
on data. Surprisingly, these operations are not "Turing complete" the way ordinary programming languages
are. Thus, there are operations that could be expressed in an ordinary programming language but cannot be
expressed in relational algebra. This situation is not a defect of the relational model or relational algebra, because
the advantage of limiting the scope of operations is that it becomes possible to optimize queries written in a
very high-level language such as SQL.
Similarly, to retrieve those rows from the 'Marks' table that have a value greater than 60 in the 'Phy' attribute and
a value greater than 70 in the 'Comp' attribute, the Selection operation is written as:
σ Phy>60 AND Comp>70 (Marks)
Table 5.3 shows the actual 'Marks' table, while Table 5.4 shows the result of the statement.
Table 5.3: Marks Table
Roll_No Phy Math Comp
1 86 58 86
2 78 75 78
3 96 74 54
4 54 76 78
Table 5.4: Result
Roll_No Phy Math Comp
1 86 58 86
2 78 75 78
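The Selection above can be sketched in plain Python, treating each tuple of the Marks table as a dictionary. This is an illustrative sketch, not part of the chapter:

```python
# The Marks relation from Table 5.3, one dict per tuple.
marks = [
    {"Roll_No": 1, "Phy": 86, "Math": 58, "Comp": 86},
    {"Roll_No": 2, "Phy": 78, "Math": 75, "Comp": 78},
    {"Roll_No": 3, "Phy": 96, "Math": 74, "Comp": 54},
    {"Roll_No": 4, "Phy": 54, "Math": 76, "Comp": 78},
]

def select(relation, predicate):
    """Sigma: keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

# sigma Phy>60 AND Comp>70 (Marks)
result = select(marks, lambda t: t["Phy"] > 60 and t["Comp"] > 70)
print([t["Roll_No"] for t in result])   # -> [1, 2]
```

Rows 3 and 4 are dropped because each fails one of the two conditions, matching Table 5.4.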
Projection operation can also be used to change the order of attributes in a relation. The resulting relation has
the attributes in the same order as specified in the projection operation.
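Projection, including the attribute reordering just described and the elimination of duplicate tuples, can be sketched the same way (again an illustrative sketch with made-up sample data):

```python
def project(relation, attributes):
    """Pi: keep only the named attributes, in the order given,
    and drop duplicate tuples (a relation is a set of tuples)."""
    seen, out = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attributes, row)))
    return out

marks = [{"Roll_No": 1, "Phy": 86, "Comp": 86},
         {"Roll_No": 2, "Phy": 78, "Comp": 78}]
# Listing Comp before Roll_No reorders the columns of the result.
reordered = project(marks, ["Comp", "Roll_No"])
print(reordered)
```

Here `reordered` contains the same tuples with only the Comp and Roll_No attributes, in that order.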
Customer-name Branch-name
Johnson Downtown
Smith Mianus
Hayes Perryridge
Samar Round Hill
Williams Perryridge
Lindsay Redwood
Samar Brighton
Brighton
It may surprise you to discover that, given a division operation and the schemas of the relations, we can, in
fact, define the division operation in terms of the fundamental operations. Let r(R) and s(S) be given, with S ⊆
R:
r ÷ s = Π R−S (r) − Π R−S ((Π R−S (r) × s) − Π R−S,S (r))
To see that this expression is correct, we observe that Π R−S (r) gives us all tuples t that satisfy the first condition
of the definition of division. The expression on the right side of the set difference operator,
Π R−S ((Π R−S (r) × s) − Π R−S,S (r)),
serves to eliminate those tuples that fail to satisfy the second condition of the definition of division. Let us see
how it does so. Consider Π R−S (r) × s. This relation is on schema R, and pairs every tuple in Π R−S (r) with every
tuple in s. The expression Π R−S,S (r) merely reorders the attributes of r.
Thus, (Π R−S (r) × s) − Π R−S,S (r) gives us those pairs of tuples from Π R−S (r) and s that do not appear in r. If a
tuple tj is in
Π R−S ((Π R−S (r) × s) − Π R−S,S (r)),
then there is some tuple ts in s that does not combine with tuple tj to form a tuple in r. Thus, tj holds a value for
attributes R − S that does not appear in r ÷ s. It is these values that we eliminate from Π R−S (r).
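Rather than the algebraic identity, division can also be checked against a direct set-based sketch. The account tuples below reuse names from the customer/branch table above but are otherwise illustrative:

```python
def divide(r, s):
    """r / s: a value x over attributes R - S is in the result iff
    the pair (x, y) is in r for EVERY tuple y of s."""
    candidates = {x for (x, y) in r}
    return {x for x in candidates if all((x, y) in r for y in s)}

# Which customers have an account at ALL the listed branches?
r = {("Samar", "Round Hill"), ("Samar", "Brighton"),
     ("Hayes", "Perryridge"), ("Lindsay", "Redwood")}
s = {"Round Hill", "Brighton"}
print(divide(r, s))   # -> {'Samar'}
```

Hayes and Lindsay each miss at least one branch of s, so only Samar survives, exactly the tuples the subtracted expression in the identity would have eliminated.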
5. Relational Algebra is
(a) Data Definition Language (b) Meta Language
(c) Procedural Query Language (d) None of these
5.8 Set Operators
The three most common operations on sets are union, intersection, and difference. We assume the reader is
familiar with these operations, which are defined as follows on arbitrary sets R and S:
R ∪ S, the union of R and S, is the set of elements that are in R or S or both. An element appears only once in
the union even if it is present in both R and S.
R ∩ S, the intersection of R and S, is the set of elements that are in both R and S.
R − S, the difference of R and S, is the set of elements that are in R but not in S. Note that R − S is different
from S − R; the latter is the set of elements that are in S but not in R.
When we apply these operations to relations, we need to put some conditions on R and S:
1. R and S must have schemas with identical sets of attributes, and the types (domains) for each attribute must
be the same in R and S.
2. Before we compute the set-theoretic union, intersection, or difference of sets of tuples, the columns of R and
S must be ordered so that the order of attributes is the same for both relations.
Sometimes we would like to take the union, intersection, or difference of relations that have the same number
of attributes, with corresponding domains, but that use different names for their attributes. If so, we may use
the renaming operator to change the schema of one or both relations and give them the same set of attributes.
5.9 Summary
The union of two relations is formed by adding the tuples of the first relation to those of the second relation,
producing a third, resultant relation.
The intersection of two relations returns a relation that includes all those common tuples.
The difference of two relations returns a relation that includes all those tuples that occur in the first
relation but not in the second.
The projection operation is used to select data of particular attributes (columns) from a single relation and
discards the other columns.
The Selection Operation is used to select a subset of tuples (or horizontal subset or rows) from a single
relation that satisfy the given selection condition.
5.10 Keywords
Cartesian product or Cross product (×): The Cartesian product of two relations is the concatenation of tuples
belonging to the two relations, consisting of all possible combinations of the tuples. R = P × Q
Join (⋈): Allows the combining of two relations to form a single new relation.
Selection (σ): Selects only some of the tuples, those that satisfy given criteria, from the relation. It yields a
horizontal subset of a given relation. R = σ B (P)
Natural join: A join in which the duplicated column is eliminated in the resultant relation.
Union (∪): Selects tuples that are in either P or Q or in both of them. Duplicate tuples are eliminated. R =
P ∪ Q
6.0 Objectives
After studying this chapter, you will be able to:
Explain the tuple relational calculus.
Discuss the domain relational calculus.
Compare TRC, DRC and RA
6.1 Introduction
The relational calculus is a non-procedural query language, whereas relational algebra is a procedural query
language. The two languages take logically different approaches to expressing queries.
In a non-procedural query language, the user is not concerned with the details of how to obtain the end result,
whereas in a procedural query language, we define each step in order to obtain the end result. In relational
calculus, a query is expressed as a formula consisting of variables. There is no mechanism to specify how a
formula should be evaluated. Relational calculus is of two types:
(i) Tuple Relational Calculus
(ii) Domain Relational Calculus
In this chapter we discuss both the tuple relational calculus and the domain relational calculus. We will also
discuss example queries in each calculus.
We now illustrate the calculus through several examples, using the instances B1 of Boats, R2 of Reserves, and
S3 of Sailors shown in Figures 6.2, 6.3, and 6.4. We will use parentheses as needed to make our formulas
unambiguous. Often, a formula p(R) includes a condition R ∈ Rel, and the meaning of the phrases some tuple R
and for all tuples R is intuitive. We will use the notation ∃ R ∈ Rel(p(R)) for ∃ R(R ∈ Rel ∧ p(R)).
Similarly, we use the notation ∀ R ∈ Rel(p(R)) for ∀ R(R ∈ Rel ⟹ p(R)).
(Q) Find the names and ages of sailors with a rating above 7.
{P | ∃ S ∈ Sailors (S.rating > 7 ∧ P.name = S.sname ∧ P.age = S.age)}
This query illustrates a useful convention: P is considered to be a tuple variable with exactly two fields, which
are called name and age, because these are the only fields of
P that are mentioned and P does not range over any of the relations in the query; that is, there is no subformula
of the form P ∈ Relname. The result of this query is a relation with two fields, name and age. The atomic
formulas P.name = S.sname and P.age = S.age give values to the fields of an answer tuple P. On instances B1,
R2, and S3, the answer is the set of tuples.
(Q) Find the sailor name, boat id, and reservation date for each reservation.
{P | ∃ R ∈ Reserves ∃ S ∈ Sailors (R.sid = S.sid ∧ P.bid = R.bid ∧ P.day = R.day ∧ P.sname = S.sname)}
For each Reserves tuple, we look for a tuple in Sailors with the same sid. Given a pair of such tuples, we
construct an answer tuple P with fields sname, bid, and day by copying the corresponding fields from these two
tuples. This query illustrates how we can combine values from different relations in each answer tuple. The
answer to this query on instances B1, R2, and S3 is shown in Figure 6.1.
(Q) Find the names of sailors who have reserved boat 103.
(Q) Find the names of sailors who have reserved at least two boats.
{P | ∃ S ∈ Sailors ∃ R1 ∈ Reserves ∃ R2 ∈ Reserves
(S.sid = R1.sid ∧ R1.sid = R2.sid ∧ R1.bid ≠ R2.bid ∧ P.sname = S.sname)}
Contrast this query with the algebra version and see how much simpler the calculus version is. In part, this
difference is due to the cumbersome renaming of fields in the algebra version, but the calculus version really is
simpler.
(Q) Find the names of sailors who have reserved all boats.
{P | ∃ S ∈ Sailors ∀ B ∈ Boats (∃ R ∈ Reserves (S.sid = R.sid ∧ R.bid = B.bid ∧ P.sname = S.sname))}
This query was expressed using the division operator in relational algebra. Notice how easily it is expressed in
the calculus. The calculus query directly reflects how we might express the query in English: "Find sailors S
such that for all boats B there is a Reserves tuple showing that sailor S has reserved boat B."
(Q) Find sailors who have reserved all red boats.
{S | S ∈ Sailors ∧ ∀ B ∈ Boats
(B.color = 'red' ⟹ (∃ R ∈ Reserves (S.sid = R.sid ∧ R.bid = B.bid)))}
This query can be read as follows: For each candidate (sailor), if a boat is red, the sailor must have reserved it.
That is, for a candidate sailor, a boat being red must imply the sailor having reserved it. Observe that since we
can return an entire sailor
tuple as the answer instead of just the sailor's name, we have avoided introducing a new free variable (e.g., the
variable P in the previous example) to hold the answer values. On instances B1, R2, and S3, the answer
contains the Sailors tuples with sids 22 and 31.
We can write this query without using implication, by observing that an expression of the form p ⟹ q is
logically equivalent to ¬p ∨ q: {S | S ∈ Sailors ∧ ∀ B ∈ Boats (B.color ≠ 'red' ∨ (∃ R ∈ Reserves (S.sid =
R.sid ∧ R.bid = B.bid)))}
This query should be read as follows: "Find sailors S such that for all boats B, either the boat is not red or a
Reserves tuple shows that sailor S has reserved boat B."
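The implication-free form can be transcribed almost literally into Python. The tuples below are hypothetical stand-ins for instances B1, R2, and S3 (the actual figures are not reproduced here); the query logic is a direct translation:

```python
# Hypothetical sample instances standing in for B1, R2, S3.
boats = [{"bid": 101, "color": "red"}, {"bid": 102, "color": "green"},
         {"bid": 103, "color": "red"}]
reserves = [{"sid": 22, "bid": 101}, {"sid": 22, "bid": 103},
            {"sid": 31, "bid": 101}]
sailors = [{"sid": 22, "sname": "Dustin"}, {"sid": 31, "sname": "Lubber"}]

# For every boat B: either B is not red, OR some Reserves tuple links S to B.
answer = [S["sname"] for S in sailors
          if all(B["color"] != "red"
                 or any(R["sid"] == S["sid"] and R["bid"] == B["bid"]
                        for R in reserves)
                 for B in boats)]
print(answer)   # -> ['Dustin']
```

Sailor 22 has reserved both red boats (101 and 103) and qualifies; sailor 31 misses boat 103 and is excluded, exactly as the universally quantified formula dictates.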
The Domain Relational Calculus: An expression of the domain calculus is of the form
{x1, x2, …, xn | COND(x1, x2, …, xn, xn+1, xn+2, …, xn+m)}
where x1, x2, …, xn, xn+1, xn+2, …, xn+m are domain variables that range over domains and COND is a
condition or formula of the domain relational calculus.
Retrieve the birthdate and address of the employee whose name is 'John B. Smith'.
{uv | (∃q)(∃r)(∃s)(∃t)(∃w)(∃x)(∃y)(∃z) (EMPLOYEE(qrstuvwxyz) and q = 'John' and r = 'B' and s = 'Smith')}
This differs from the TRC version in giving each attribute a (variable) name. The condition (I, N, T, A) ∈
Sailors ensures that the domain variables I, N, T, and A are restricted to be fields of the same tuple. In
comparison with the TRC query, we can say T > 7 instead of S.rating > 7, but we must specify the tuple (I, N,
T, A) in the result, rather than just S.
(Q) Find the names of sailors who have reserved boat 103.
Notice that only the sname field is retained in the answer and that only N is a free variable. We use the
notation ∃ Ir, Br, D(…) as shorthand for ∃ Ir(∃ Br(∃ D(…))). Very often, all the quantified variables appear in a
single relation, as in this example. An even more compact notation in this case is ∃ (Ir, Br, D) ∈ Reserves.
With this notation, which we will use henceforth, the above query would be as follows:
The comparison with the corresponding TRC formula should now be straightforward. This query can also be
written as follows; notice the repetition of variable I and the use of the constant 103:
(Q) Find the names of sailors who have reserved a red boat.
(Q) Find the names of sailors who have reserved at least two boats.
Notice how the repeated use of variable I ensures that the same sailor has reserved both the boats in question.
This query can be read as follows: Find all values of N such that there is some tuple (I, N, T, A) in Sailors
satisfying the following condition: for every (B, BN, C), either this is not a tuple in Boats or there is some
tuple (Ir, Br, D) in Reserves that proves that sailor I has reserved boat B.
The ∀ quantifier allows the domain variables B, BN, and C to range over all values in their respective attribute
domains, and the pattern '¬((B, BN, C) ∈ Boats) ∨' is necessary to restrict attention to those values that appear
in tuples of Boats. This pattern is common in DRC formulas, and the notation ∀ (B, BN, C) ∈ Boats can be used
as shorthand instead. This is similar to the notation introduced earlier for ∃. With this notation the query would
be written as follows:
Here, we find all sailors such that for every red boat there is a tuple in Reserves that shows the sailor has
reserved it.
6.5 Summary
The relational calculus is a non-procedural query language, whereas relational algebra is a procedural
query language; the calculus takes a different approach than the algebra.
A tuple variable is a variable that takes on tuples of a particular relation schema as values.
The domain relational calculus uses domain variables that take on values from an attribute's domain rather
than values for an entire tuple.
In a non-procedural query language, the user is not concerned with the details of how to obtain the end result.
Each variable in a TRC formula has a well-defined domain from which values for the variable are drawn.
Each variable in a TRC formula has a well-defined domain from which values for the variable are drawn.
6.6 Keywords
Domain variable: A domain variable is a variable that ranges over the values in the domain of some attribute.
Query: A query is a way to solve problems using SQL commands.
Relational calculus: The relational calculus is a non-procedural query language that takes a different approach
than relational algebra, which is a procedural query language.
Schema: The overall design of the database is called the database schema.
Tuple variable: A tuple variable is a variable that takes on tuples of a particular relation schema as values.
7.0 Objectives
After studying this chapter, you will be able to:
Describe the data definition language
Discuss the data manipulation language
Discuss the characteristics of SQL
Understand the advantages of SQL
Define SQL data types and literals
Understand the types of SQL commands
Explain SQL operators and their procedures
Discuss embedded SQL
7.1 Introduction
A SQL VIEW can be thought of as a saved query that returns a virtual table. This virtual table can be treated
like a real or regular database table. In other words, the VIEW's results can be presented to an end user as is, or
they can be re-queried to further limit the rows returned or to apply grouping and ordering clauses. So we can
create a T-SQL statement such as "SELECT * FROM myView ORDER BY col". In addition, data can be
added to the database through a VIEW. This chapter will examine the syntax and options used in creating SQL
VIEWs.
A VIEW is a convenient way to give a user only partial access to a table. The VIEW can restrict the rows
being returned as well as the available columns. So granting the user access to the VIEW rather than the table
will effectively restrict their access. VIEWs are also a handy method for hiding a complex statement and only
presenting the end user with a simple one-table result set.
Defining views can be very simple, but managing and using them can become quite complex. Many rules
govern view creation and usage. This section focuses on view creation, modification, and usage, starting with the
definition and advantages of views.
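A view that restricts both the rows and the columns a user can see, as described above, can be sketched with SQLite. The table, column names, and data here are illustrative assumptions, not from the chapter:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary INTEGER)")
con.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(1, "Rahul", 50000), (2, "Pankaj", 90000)])
# The view hides the salary column and filters out high-salary rows,
# so a user granted access only to the view sees neither.
con.execute("""CREATE VIEW staff AS
               SELECT id, name FROM employee WHERE salary < 60000""")
# The view's results can be re-queried like a regular table.
rows = con.execute("SELECT * FROM staff ORDER BY id").fetchall()
print(rows)   # -> [(1, 'Rahul')]
```

Granting access to `staff` instead of `employee` would give the partial access the text describes: Pankaj's row and every salary value stay invisible.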
Examples
Rule 1. You can't delete any of the rows in the CarType table that are visible in the picture since all the car
types are in use in the Car table.
Rule 2. You can't change any of the model_ids in the CarType table since all the car types are in use in the Car
table.
Rule 3. The values that you can enter in the model_id field in the Car table must be in the model_id field in the
CarType table.
Rule 4. The model_id field in the Car table can have a null value, which means that the car type of that car is
not known.
SQL Standards
An official standard for SQL was initially published by the American National Standards Institute (ANSI) and
the International Organization for Standardization (ISO) in 1986, and was expanded in 1989 and again in 1992 and
1999. SQL is also a U.S. Federal Information Processing Standard (FIPS), making it a key requirement for
large government computer contracts. Over the years, other international, government, and vendor groups have
pioneered the standardization of new SQL capabilities, such as call-level interfaces or object-based extensions.
Many of these new initiatives have been incorporated into the ANSI/ISO standard over time. The evolving
standards serve as an official stamp of approval for SQL and have speeded its market acceptance.
Relational Foundation
SQL is a language for relational databases, and it has become popular along with the relational database
model. The tabular, row/column structure of a relational database is intuitive to users, keeping the SQL
language simple and easy to understand. The relational model also has a strong theoretical foundation that has
guided the evolution and implementation of relational databases. Riding a wave of acceptance brought about
by the success of the relational model, SQL has become the database language for relational databases.
Client/Server Architecture
SQL is a natural vehicle for implementing applications using a distributed, client/ server architecture. In this
role, SQL serves as the link between "front-end" computer systems optimized for user interaction and "back-
end" systems specialized for database management, allowing each system to do what it does best. SQL also
allows personal computers to function as front-ends to network servers or to larger minicomputer and
mainframe databases, providing access to corporate data from personal computer applications.
Code:
Sign Exponent Fraction Total
Single-Precision 1 8 23 32
Double-Precision 1 11 52 64
With the double precision standard, the mantissa precision can go up to 52 binary digits, about 15 decimal
digits.
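As a quick cross-check of the table above, Python exposes the IEEE 754 double-precision parameters directly; note that the 53 reported for the mantissa includes the implicit leading bit on top of the 52 stored fraction bits:

```python
import sys

# Double precision: 52 stored fraction bits + 1 implicit bit = 53,
# which guarantees 15 significant decimal digits.
print(sys.float_info.mant_dig)   # -> 53
print(sys.float_info.dig)        # -> 15
```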
5. Date and Time - A date and time value is usually stored in memory as an exact integer number of 8 bytes,
representing an instant by measuring the time period between this instant and a reference time point with
millisecond precision (a second-fraction precision of 3). How does MySQL store date and time values? We will
try to find out later.
2. Hex String Literals are used to construct character strings and exact numbers. The syntax rules for hex string
literals are also very simple:
A hex string literal is a sequence of hex digits enclosed by quote characters and prefixed with "x".
The quote character is the single quote character "'".
Examples of hex string literals:
Code:
x'41424344'
x'31323334'
x'01'
x'0001'
x'ff'
x'ffffffff'
x'ffffffffffffffff'
3. Numeric Literals are used to construct exact numbers and approximate numbers. Syntax rules of numeric
literals are:
A numeric literal can be written in signed integer form, signed real numbers without exponents, or real
numbers with exponents.
Examples of numeric literals:
Quote:
1
-22
33.3
-44.44
55.555e5
-666.666e-6
4. Date and Time Literals are used to construct date and time values. The syntax rules of date and time literals are:
A date literal is written in the form of "DATE 'yyyy-mm-dd'".
A time literal is written in the form of "TIMESTAMP 'yyyy-mm-dd hh:mm:ss'".
Examples of date and time literals:
Quote:
DATE '1999-01-01'
TIMESTAMP '1999-01-01 01:02:03'
Self Assessment Questions
1. The original version was developed at ………………… San Jose Research Laboratory (now the Almaden
Research Center).
(a) IBM‘s (b) Microsoft
(c) HCL (d) None of these.
2. The SQL …………..includes a query language based on both the relational algebra and the tuple
relational calculus.
(a) DDL (b) DML
(c) Embedded DML (d) None of these.
3. The SQL DDL includes commands for specifying integrity constraints that the data stored in the
database must satisfy. Updates that violate integrity constraints are disallowed.
(a) True (b) False
4. The SQL is a standard interactive and programming language for querying and modifying data and
managing……………..
(a) data warehouse (b) table
(c) databases (d) None of these
7 =, eq equals
  ¬=, ^=, <>, ne does not equal
  >, gt is greater than
  <, lt is less than
  >=, ge is greater than or equal to
  <=, le is less than or equal to
  =* sounds like (use with character operands only)
  eqt equal to truncated strings (use with character operands only)
  gtt greater than truncated strings
  ltt less than truncated strings
  get greater than or equal to truncated strings
  let less than or equal to truncated strings
  net not equal to truncated strings
8 ¬, ^, NOT indicates logical NOT
9 &, AND indicates logical AND
10 |, OR indicates logical OR
7.10 Table
Tables are the basic structure where data is stored in the database. Given that in most cases there is no way for
the database vendor to know ahead of time what your data storage needs are, chances are that you will need to
create tables in the database yourself. Many database tools allow you to create tables without writing SQL, but
given that tables are the container of all the data, it is worth knowing how to create them in SQL as well.
The foundation of every Relational Database Management System is a database object called table. Every
database consists of one or more tables, which store the database's data/information. Each table has its own
unique name and consists of columns and rows.
The database table columns (also called table fields) have their own unique names and a pre-defined data
type. Table columns can have various attributes defining the column functionality (the column is a primary
key, there is an index defined on the column, the column has a certain default value, etc.). While the table columns
describe the data types, the table rows contain the actual data for the columns. It is therefore important to know the
CREATE TABLE syntax.
Sometimes, we want to provide a default value for each column. A default value is used when you do not
specify a column's value when inserting data into the table. To specify a default value, add "DEFAULT [value]"
after the data type declaration. In the example, if we want to default column "Address" to "Unknown" and
"City" to "Mumbai", we would type in
CREATE TABLE customer
(First_Name char(50),
Last_Name char(50),
Address char(50) DEFAULT 'Unknown',
City char(50) DEFAULT 'Mumbai',
Country char(25),
Birth_Date date)
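The DEFAULT clause above can be exercised with SQLite (the extra Country and Birth_Date columns are omitted here for brevity):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE customer (
    First_Name char(50), Last_Name char(50),
    Address char(50) DEFAULT 'Unknown',
    City char(50) DEFAULT 'Mumbai')""")
# Insert a row without specifying Address or City: the defaults apply.
con.execute("INSERT INTO customer (First_Name, Last_Name) VALUES ('Kamal', 'Kumar')")
row = con.execute("SELECT Address, City FROM customer").fetchone()
print(row)   # -> ('Unknown', 'Mumbai')
```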
Here is an example of a simple database table containing customers' data. The first row, listed in bold,
contains the names of the table columns:
Customers table
FirstName LastName Email DOB Phone
Kamal Kumar kamal.Kumar@yahoo.com 2/4/1968 626 222-2222
Satish Sharma s.satish@gmail.com 4/4/1974 323 455-4545
Paula Bhayel pb@hotmail.com 5/24/1978 416 323-3232
Rinku Kumar rrk@nic.co.in 20/10/1980 416 323-8888
7.10.2 Update Statement
The SQL UPDATE statement is used to update one or more columns of a table with specified values. You can
update one or multiple columns at the same time. For instance, you might want to update a customer's address;
in this case you would update several columns, such as AddressLine1, AddressLine2, City, and PostCode, usually
with either hard-coded values or values provided from an end-user application. You could also update values
using calculated fields. For instance, you might want to update employee holiday entitlement (once a year); you
could use the employee start date to calculate the number of years the employee has worked for the company and
use some "IF" (CASE in SQL) logic to specify the correct holiday entitlement. Very often you might want to
perform updates using another related table. In our holiday entitlement case, for instance, we could use a
HolidayEntitlement table, match it against the years worked (calculated from EmployeeStartDate), and take the
matching row from the related HolidayEntitlement table.
This statement would update all supplier names in the suppliers table from IBM to HP.
You may wish to update records in one table based on values in another table. Since you cannot list more than
one table in the UPDATE statement, you can use the EXISTS clause.
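Both updates described above can be sketched with SQLite. The IBM-to-HP rename follows the text; the `preferred` table and `status` column in the EXISTS example are illustrative assumptions:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE suppliers (id INTEGER, name TEXT)")
con.executemany("INSERT INTO suppliers VALUES (?, ?)", [(1, "IBM"), (2, "Dell")])
# Simple update: rename all suppliers called IBM to HP.
con.execute("UPDATE suppliers SET name = 'HP' WHERE name = 'IBM'")

# Update driven by another table: since a second table cannot be listed in
# the UPDATE statement itself, a correlated EXISTS subquery is used.
con.execute("CREATE TABLE preferred (supplier_id INTEGER)")
con.execute("INSERT INTO preferred VALUES (2)")
con.execute("ALTER TABLE suppliers ADD COLUMN status TEXT")
con.execute("""UPDATE suppliers SET status = 'preferred'
               WHERE EXISTS (SELECT 1 FROM preferred p
                             WHERE p.supplier_id = suppliers.id)""")
rows = con.execute("SELECT name, status FROM suppliers ORDER BY id").fetchall()
print(rows)   # -> [('HP', None), ('Dell', 'preferred')]
```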
Example: To delete an employee with id 100 from the employee table, the SQL delete query would be like,
DELETE FROM employee WHERE id = 100;
To delete all the rows from the employee table, the query would be like,
DELETE FROM employee;
SQL DELETE Example
The "Persons" table:
P_Id LastName FirstName Address City
1 Kumar Rahul Sector 10 Kota
2 Singh Satyendra Borgvn 23 Kota
3 Sharma Pankaj Sector 20 Kanpur
4 Verma Johan Bakken 2 Kanpur
5 Tjessem Jakob Sector 67 Kota
DELETE Statement: This command deletes only the rows from the table based on the condition given in the
where clause or deletes all the rows from the table if no condition is specified. But it does not free the space
containing the table.
TRUNCATE statement: This command is used to delete all the rows from the table and free the space
containing the table.
SQL DROP Statement:
The SQL DROP command is used to remove an object from the database. If you drop a table, all the rows in
the table are deleted and the table structure is removed from the database. Once a table is dropped we cannot get
it back, so be careful while using the DROP command. When a table is dropped, all the references to the table
will no longer be valid.
Syntax to drop a SQL table structure:
DROP TABLE table_name;
Example: To drop the table employee, the query would be like
DROP TABLE employee;
Caution
Changing any part of an object name can break scripts and stored procedures. We recommend you do not use
this statement to rename stored procedures, triggers, user-defined functions, or views; instead, drop the object
and re-create it with the new name.
Derived Tables
A Derived table is a table expression that appears in the FROM clause of a query. Derived tables can be used
when the use of column aliases is not possible because another clause is processed before the alias name.
Example:
1: USE AdventureWorks
2: SELECT MONTH(HireDate) as Hire_Month
3: FROM HumanResources.Employee
4: GROUP BY Hire_Month;
When we execute above query, we will get the following result
Msg 207, Level 16, State 1, Line 4
Invalid column name 'Hire_Month'.
The reason for the above error message is that the GROUP BY clause is processed before the SELECT clause,
so the alias name is not yet known when GROUP BY is processed.
We can solve the above error by re-writing the above query using Derived Tables.
1: USE AdventureWorks
2: SELECT Hire_Month
3: FROM (SELECT MONTH(HireDate) as Hire_Month
4: FROM HumanResources.Employee) AS m
5: GROUP BY Hire_Month;
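The derived-table rewrite can be sketched in SQLite with a stand-in table, since the AdventureWorks schema is not available here. (SQLite itself would also accept the alias directly in GROUP BY, unlike SQL Server, but the derived-table form is portable.)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (id INTEGER, hire_date TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [(1, "2001-03-15"), (2, "2001-03-20"), (3, "2002-07-01")])
# The inner SELECT computes the alias; the outer query can then group by it.
rows = con.execute("""
    SELECT hire_month, COUNT(*) FROM
        (SELECT strftime('%m', hire_date) AS hire_month FROM employee) AS m
    GROUP BY hire_month ORDER BY hire_month""").fetchall()
print(rows)   # -> [('03', 2), ('07', 1)]
```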
The result of a table expression is always a table; a subquery that returns a single value, as in the next example, can also be used as a scalar expression.
Example:
1: SELECT d.StartDate, (SELECT EmployeeID
2: FROM HumanResources.Employee e WHERE e.EmployeeID = d.EmployeeID)
3: AS EmployeeID
4: FROM HumanResources.EmployeeDepartmentHistory d
5: WHERE d.StartDate IN ('1998-01-11 00:00:00.000',
6: '1997-02-26 00:00:00.000');
Common Table Expressions
A common table expression (CTE) is a named table expression supported by Transact-SQL. It is similar to a
derived table, but it is not stored as an object and lasts only for the duration of the query.
Common Table Expressions can be used in two types of queries
1. Non-Recursive
2. Recursive
CASE expression
WHEN value THEN result
[WHEN ...]
[ELSE result]
END
This "simple" CASE expression is a specialized variant of the general form above. The expression is
computed and compared to all the values in the WHEN clauses until one is found that is equal. If no match is
found, the result in the ELSE clause (or a null value) is returned. This is similar to the switch statement in C.
The example above can be written using the simple CASE syntax:
=> SELECT a,
CASE a WHEN 1 THEN 'one'
WHEN 2 THEN 'two'
ELSE 'other'
END
FROM test;
a | case
---+-------
1 | one
2 | two
3 | other
COALESCE
COALESCE(value [, ...])
The COALESCE function returns the first of its arguments that is not null. This is often useful to substitute a
default value for null values when data is retrieved for display, for example:
SELECT COALESCE(description, short_description, '(none)') ...
NULLIF
NULLIF(value1, value2)
The NULLIF function returns a null value if and only if value1 and value2 are equal. Otherwise it returns
value1. This can be used to perform the inverse operation of the COALESCE example given above:
SELECT NULLIF(value, '(none)') ...
COALESCE and NULLIF are just shorthand for CASE expressions. They are actually converted into CASE
expressions at a very early stage of processing, and subsequent processing thinks it is dealing with CASE.
Thus an incorrect COALESCE or NULLIF usage may draw an error message that refers to CASE.
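The pair of behaviors can be verified with a short runnable sketch, here against an in-memory SQLite database (table and column names invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (description TEXT, short_description TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [("long text", "short"), (None, "short only"), (None, None)])

# COALESCE returns the first non-null argument for each row.
rows = conn.execute(
    "SELECT COALESCE(description, short_description, '(none)') FROM items"
).fetchall()
print([r[0] for r in rows])  # ['long text', 'short only', '(none)']

# NULLIF performs the inverse substitution: equal values collapse to NULL.
back = conn.execute(
    "SELECT NULLIF('(none)', '(none)'), NULLIF('x', '(none)')"
).fetchone()
print(back)  # (None, 'x')
```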
7.10.6 Join
The join keyword is used in an SQL statement to query data from two or more tables, based on a relationship
between certain columns in these tables. Tables in a database are often related to each other with keys.
A primary key is a column (or a combination of columns) with a unique value for each row. Each primary key
value must be unique within the table. The purpose is to bind data together, across tables, without repeating all
of the data in every table.
With the help of joins, you can retrieve data from two or more tables on the basis of the relationships between
the tables. The following are the various types of JOIN you can use, and the differences between them:
Inner join
Left join
Right join
Full join
Table 7.2: The "Persons" table
P_Id LastName FirstName Address City
1 Kumar Rahul Sector 10 Kota
2 Singh Satyendra Borgvn 23 Kota
3 Sharma Pankaj Sector 20 Kanpur
Note that the "P_Id" column is the primary key in the "Persons" Table 7.2. This means that no two rows can
have the same P_Id; the P_Id distinguishes two persons even if they have the same name.
Note also that the "O_Id" column is the primary key in the "Orders" Table 7.3 and that its "P_Id" column refers
to the persons in the "Persons" table without using their names.
Inner Join
The inner join keyword returns rows when there is at least one match in both tables. Inner join is the same as
join.
SQL INNER JOIN Syntax
SELECT column_name(s)
FROM table_name1
INNER JOIN table_name2
ON table_name1.column_name=table_name2.column_name
Example
The "Persons" table:
P_Id LastName FirstName Address City
1 Kumar Rahul Sector 10 Kota
2 Singh Satyendra Borgvn 23 Kota
3 Sharma Pankaj Sector 20 Kanpur
The inner join keyword returns rows when there is at least one match in both tables. If there are rows in
"Persons" that do not have matches in "Orders", those rows will NOT be listed.
Left join
The left join keyword returns all rows from the left table (table_name1), even if there are no matches in the
right table (table_name2). In some databases left join is called left outer join.
Now we want to list all the persons and their orders - if any, from the tables above.
We use the following SELECT statement:
SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo
FROM Persons
Left Join Orders
ON Persons.P_Id=Orders.P_Id
ORDER BY Persons.LastName
The LEFT JOIN keyword returns all the rows from the left table (Persons), even if there are no matches in the
right table (Orders).
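The contrast between INNER JOIN and LEFT JOIN can be seen in a runnable sketch. This uses an in-memory SQLite database with tables modeled on the "Persons"/"Orders" example; the Orders rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons (P_Id INTEGER, LastName TEXT, FirstName TEXT);
    INSERT INTO Persons VALUES
        (1,'Kumar','Rahul'), (2,'Singh','Satyendra'), (3,'Sharma','Pankaj');
    CREATE TABLE Orders (O_Id INTEGER, OrderNo INTEGER, P_Id INTEGER);
    INSERT INTO Orders VALUES (1, 77895, 3), (2, 44678, 3), (3, 22456, 1);
""")

# INNER JOIN: only persons who have at least one matching order appear.
inner = conn.execute("""
    SELECT Persons.LastName, Orders.OrderNo
    FROM Persons INNER JOIN Orders ON Persons.P_Id = Orders.P_Id
    ORDER BY Persons.LastName, Orders.OrderNo
""").fetchall()
print(inner)  # [('Kumar', 22456), ('Sharma', 44678), ('Sharma', 77895)]

# LEFT JOIN: every person appears; Singh has no order, so OrderNo is NULL.
left = conn.execute("""
    SELECT Persons.LastName, Orders.OrderNo
    FROM Persons LEFT JOIN Orders ON Persons.P_Id = Orders.P_Id
    ORDER BY Persons.LastName, Orders.OrderNo
""").fetchall()
print(left)  # inner rows plus ('Singh', None)
```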
Right Join
The right join keyword returns all the rows from the right table (table_name2), even if there are no matches in
the left table (table_name1). In some databases right join is called right outer join.
SQL Right Join Syntax
SELECT column_name(s)
FROM table_name1
RIGHT JOIN table_name2
ON table_name1.column_name=table_name2.column_name
Example
The "Persons" table:
P_Id LastName FirstName Address City
1 Kumar Rahul Sector 10 Kota
2 Singh Satyendra Borgvn 23 Kota
3 Sharma Pankaj Sector 20 Kanpur
Now we want to list all the orders with their corresponding persons, if any, from the tables above.
We use the following SELECT statement:
SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo
FROM Persons
RIGHT JOIN Orders
ON Persons.P_Id=Orders.P_Id
ORDER BY Persons.LastName
The result-set will look like this:
The right join keyword returns all the rows from the right table (Orders), even if there are no matches in the
left table (Persons).
Full Join
The full join keyword returns rows when there is a match in either of the tables.
SQL FULL JOIN Syntax
SELECT column_name(s)
FROM table_name1
FULL JOIN table_name2
ON table_name1.column_name=table_name2.column_name
Example
The "Persons" table:
P_Id LastName FirstName Address City
1 Kumar Rahul Sector 10 Kota
2 Singh Satyendra Borgvn 23 Kota
3 Sharma Pankaj Sector 20 Kanpur
Now we want to list all the persons and their orders, and all the orders with their persons.
We use the following SELECT statement:
SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo FROM Persons
FULL JOIN Orders
ON Persons.P_Id=Orders.P_Id
ORDER BY Persons.LastName
The result-set will look like this:
The full join keyword returns all the rows from the left table (Persons) and all the rows from the right table
(Orders). If there are rows in "Persons" that do not have matches in "Orders", or rows in "Orders"
that do not have matches in "Persons", those rows will be listed as well.
7.10.7 Union
The UNION operator is used to combine the result-sets of two or more SELECT statements. Note that each
SELECT statement within the UNION must have the same number of columns. The columns must also have
similar data types, and the columns in each SELECT statement must be in the same order.
The purpose of the SQL UNION query is to combine the results of two queries. In this respect,
UNION is somewhat similar to JOIN in that both are used to relate information from multiple tables. One
restriction of UNION is that all corresponding columns need to be of the same data type. Also, when
using UNION, only distinct values are selected (similar to SELECT DISTINCT).
"Employees_Norway":
E_ID E_Name
01 Kumar, Rahul
02 Singh, Satyendra
03 Singh, Stephen
04 Sharma, Pankaj
"Employees_USA":
E_ID E_Name
01 Turner, Sally
02 Kent, Clark
03 Singh, Stephen
04 Scott, Stephen
Now we want to list all the different employees in Norway and USA.
We use the following SELECT statement:
SELECT E_Name FROM Employees_Norway
UNION
SELECT E_Name FROM Employees_USA
The result-set will look like this:
E_Name
Kumar, Rahul
Singh, Satyendra
Singh, Stephen
Sharma, Pankaj
Turner, Sally
Kent, Clark
Scott, Stephen
Note: This command cannot be used to list all employees in Norway and USA. In the example above we have
two employees with equal names, and only one of them will be listed. The UNION command selects only
distinct values.
SQL UNION ALL Example
Now we want to list all employees in Norway and USA:
SELECT E_Name FROM Employees_Norway UNION ALL
SELECT E_Name FROM Employees_USA
Result
E_Name
Kumar, Rahul
Singh, Satyendra
Singh, Stephen
Sharma, Pankaj
Turner, Sally
Kent, Clark
Singh, Stephen
Scott, Stephen
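The distinct-versus-all behavior can be checked with a runnable sketch against an in-memory SQLite database, using the same two employee lists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees_Norway (E_Name TEXT);
    INSERT INTO Employees_Norway VALUES
        ('Kumar, Rahul'), ('Singh, Satyendra'), ('Singh, Stephen'), ('Sharma, Pankaj');
    CREATE TABLE Employees_USA (E_Name TEXT);
    INSERT INTO Employees_USA VALUES
        ('Turner, Sally'), ('Kent, Clark'), ('Singh, Stephen'), ('Scott, Stephen');
""")

# UNION removes duplicates across the two result sets.
distinct = conn.execute(
    "SELECT E_Name FROM Employees_Norway UNION SELECT E_Name FROM Employees_USA"
).fetchall()

# UNION ALL keeps every row, including the duplicated 'Singh, Stephen'.
everyone = conn.execute(
    "SELECT E_Name FROM Employees_Norway UNION ALL SELECT E_Name FROM Employees_USA"
).fetchall()

print(len(distinct), len(everyone))  # 7 8
```

'Singh, Stephen' exists in both tables, so UNION lists it once (7 rows) while UNION ALL lists it twice (8 rows), matching the two result sets shown above.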
7.10.8 Intersections
Similar to the UNION command, INTERSECT also operates on two SQL statements. The difference is that,
while UNION essentially acts as an OR operator (value is selected if it appears in either the first or the second
statement), the INTERSECT command acts as an AND operator (value is selected only if it appears in both
statements). The INTERSECT query allows you to return the results of two or more "select" queries. However, it
only returns the rows selected by all queries. If a record exists in one query and not in the other, it will be
omitted from the INTERSECT results. Each SQL statement within the INTERSECT query must have the same
number of fields in the result sets with similar data types.
The syntax is as follows:
[SQL Statement 1]
INTERSECT
[SQL Statement 2]
Let's assume that we have the following two tables,
Table Store Information
and we want to find out all the dates where there are both store sales and internet sales. To do so, we use the
following SQL statement:
SELECT Date FROM Store_Information
INTERSECT
SELECT Date FROM Internet_Sales
Result:
Date
Jan-07-1999
7.10.9 Minus
The SQL MINUS (or EXCEPT) operator works on two table expressions. It takes all the records from the first
table expression and then subtracts the ones that also appear in the second table expression to produce the
final answer. If the second table expression includes records that do not appear in the first table expression,
those records are simply ignored.
The syntax is as follows:
[SQL Statement 1]
MINUS
[SQL Statement 2]
Let's continue with the same example:
Table Store_Information
Table Internet_Sales
Date Sales
Jan-07-1999 $250
Jan-10-1999 $535
Jan-11-1999 $320
Jan-12-1999 $750
and we want to find out all the dates where there are store sales, but no internet sales. To do so, we use the
following SQL statement:
SELECT Date FROM Store_Information
MINUS
SELECT Date FROM Internet_Sales
Result:
Date
Jan-05-1999
Jan-08-1999
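Both set operations can be sketched in a runnable form. This uses an in-memory SQLite database, where MINUS is spelled EXCEPT (as it also is in SQL Server); the Internet_Sales rows come from the table above, while the Store_Information sales figures are invented to match the dates in the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Store_Information (Date TEXT, Sales INTEGER);
    INSERT INTO Store_Information VALUES
        ('Jan-05-1999', 1500), ('Jan-07-1999', 250), ('Jan-08-1999', 700);
    CREATE TABLE Internet_Sales (Date TEXT, Sales INTEGER);
    INSERT INTO Internet_Sales VALUES
        ('Jan-07-1999', 250), ('Jan-10-1999', 535),
        ('Jan-11-1999', 320), ('Jan-12-1999', 750);
""")

# INTERSECT: dates with BOTH store sales and internet sales.
both = conn.execute(
    "SELECT Date FROM Store_Information INTERSECT SELECT Date FROM Internet_Sales"
).fetchall()
print(both)  # [('Jan-07-1999',)]

# EXCEPT (MINUS): dates with store sales but NO internet sales.
store_only = conn.execute(
    "SELECT Date FROM Store_Information EXCEPT SELECT Date FROM Internet_Sales"
).fetchall()
print(sorted(store_only))  # [('Jan-05-1999',), ('Jan-08-1999',)]
```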
7.10.10 Views
A SQL view is a virtual table based on a SQL SELECT query. Essentially a view is very close to a
real database table (it has columns and rows just like a regular table), except that real tables
store data while views do not. The view's data is generated dynamically when the view is referenced. A
view references one or more existing database tables or other views. In effect, every view is a filter of the table
data referenced in it, and this filter can restrict both the columns and the rows of the referenced tables.
A view consists of columns from one or more tables. Though it is similar to a table, it does not store data; it is
a query stored in the database as an object. Hence, a view is an object that derives its data from one or more
tables, which are referred to as base or underlying tables. A view serves as a security mechanism: it ensures
that users can retrieve and modify only the data visible to them, and cannot see or access the remaining data in
the underlying tables. A view also serves as a mechanism to simplify query execution: complex queries can be
stored in the form of a view, and data from the view can be extracted using simple queries.
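These two properties, filtering and dynamic generation, can be demonstrated with a short sketch against an in-memory SQLite database (table, view, and column names invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductID INTEGER, Name TEXT, ListPrice REAL);
    INSERT INTO Product VALUES (1,'Bolt',2.5), (2,'Frame',120.0), (3,'Wheel',45.0);
    -- The view filters both rows (ListPrice > 40) and columns (no ProductID).
    CREATE VIEW Expensive AS
        SELECT Name, ListPrice FROM Product WHERE ListPrice > 40;
""")

expensive = conn.execute("SELECT Name FROM Expensive ORDER BY Name").fetchall()
print(expensive)  # [('Frame',), ('Wheel',)]

# A view stores the query, not the data: a row inserted into the base table
# is visible through the view immediately, since the view's rows are
# generated dynamically when the view is referenced.
conn.execute("INSERT INTO Product VALUES (4, 'Fork', 60.0)")
count_after = conn.execute("SELECT COUNT(*) FROM Expensive").fetchone()[0]
print(count_after)  # 3
```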
7.10.11 Indexes
Indexes are created on columns in tables or views. The index provides a fast way to look up data based on the
values within those columns. For example, if you create an index on the primary key and then search for a row
of data based on one of the primary key values, SQL Server first finds that value in the index, and then uses
the index to quickly locate the entire row of data. Without the index, a table scan would have to be performed
in order to locate the row, which can have a significant effect on performance.
You can create indexes on most columns in a table or a view. The exceptions are primarily those columns
configured with large object (LOB) data types, such as image, text, and varchar (max). You can also create
indexes on XML columns, but those indexes are slightly different from the basic index and are beyond the
scope of this section. Instead, we will focus on those indexes that are implemented most commonly in a SQL
Server database.
Index in SQL is created on existing tables to retrieve the rows quickly. When there are thousands of records in
a table, retrieving information will take a long time. Therefore indexes are created on columns which are
accessed frequently, so that the information can be retrieved quickly. Indexes can be created on a single
column or a group of columns. When an index is created, it first sorts the data and then assigns a ROWID to
each row.
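A sketch of creating an index and confirming that a lookup uses it: the paragraph above describes SQL Server, but the same effect can be shown in an in-memory SQLite database, where EXPLAIN QUERY PLAN reports whether a query searches an index or scans the table (table contents invented).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductID INTEGER, Name TEXT, ListPrice REAL)")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)",
                 [(i, f"item{i}", i * 1.5) for i in range(1000)])

# Index the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_listprice ON Product (ListPrice)")

# The query plan's detail text names the index instead of a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT Name FROM Product WHERE ListPrice = 750.0"
).fetchall()
print(plan[0][3])
```

Dropping the CREATE INDEX line and re-running shows the plan falling back to "SCAN", which is the table scan the text warns about.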
The following SELECT statement finds the product ID, name, and list price of any products whose unit price
exceeds 40:
SELECT ProductID, Name, ListPrice
FROM Production.Product
WHERE ListPrice > 40
ORDER BY ListPrice ASC
The column names listed after the SELECT keyword (ProductID, Name, and ListPrice) form the select list.
This list specifies that the result set has three columns, and each column has the name, data type, and size of
the associated column in the Product table. Because the FROM clause specifies only one base table, all column
names in the SELECT statement refer to columns in that table.
The FROM clause lists the Product table as the one table from which the data is to be retrieved.
The WHERE clause specifies the condition that the only rows in the Product table that qualify for this
SELECT statement are those rows in which the value of the ListPrice column is more than 40.
The ORDER BY clause specifies that the result set is to be sorted in ascending sequence (ASC) based on the
value in the ListPrice column.
Sub Query
A subquery is a query that is nested inside a SELECT, INSERT, UPDATE, or DELETE statement, or inside
another subquery. A subquery can be used anywhere an expression is allowed. In this example a subquery is
used as a column expression named MaxUnitPrice in a SELECT statement.
USE Anuragi;
GO
SELECT Ord.SalesOrderID, Ord.OrderDate,
(SELECT MAX(OrdDet.UnitPrice)
FROM Anuragi.Sales.SalesOrderDetail AS OrdDet
WHERE Ord.SalesOrderID = OrdDet.SalesOrderID) AS MaxUnitPrice
FROM Anuragi.Sales.SalesOrderHeader AS Ord
A subquery is also called an inner query or inner select, while the statement containing a subquery is also
called an outer query or outer select. Many Transact-SQL statements that include subqueries can be
alternatively formulated as joins. Other questions can be posed only with subqueries. In Transact-SQL, there is
usually no performance difference between a statement that includes a subquery and a semantically equivalent
version that does not. However, in some cases where existence must be checked, a join yields better
performance. Otherwise, the nested query must be processed for each result of the outer query to ensure
elimination of duplicates. In such cases, a join approach would yield better results. The following is an
example showing both a subquery SELECT and a join SELECT that return the same result set:
/* SELECT statement built using a subquery. */
SELECT Name
FROM Anuragi.Production.Product
WHERE ListPrice =
(SELECT ListPrice
FROM Anuragi.Production.Product
WHERE Name = 'Chainring Bolts' );
6. The ………………..is the command used to insert new data (a new row) into a table by specifying a list of
values to be inserted into each table column.
(a) Update command (b) Insert command
(c) Delete command (d) None of these
7. The ……………….. is used to update table columns with specified values. You can update one or multiple
columns at the same time.
(a) Delete command (b) Insert command
(c) Update command (d) None of these
9. A Derived table is a table expression that appears in the FROM clause of a query.
(a) True (b) False
Retaining of duplicates is important in computing an average. Suppose that the account balances at the (small)
Brighton branch are 1000, 3000, 2000, and 1000. The average balance is 7000/4 = 1750.00. If duplicates were
eliminated, we would obtain the wrong answer (6000/3 =2000).
There are cases where we must eliminate duplicates prior to computing an aggregate function. If we do want to
eliminate duplicates, we use the keyword distinct in the aggregate expression. An example arises in the query
―Find the number of depositors for each branch.‖ In this case, a depositor counts only once, regardless of the
number of accounts that depositor may have. We write this query as follows:
select branch-name, count (distinct customer-name)
from depositor, account
where depositor.account-number = account.account-number
group by branch-name
At times, it is useful to state a condition that applies to groups rather than to tuples. For example, we might be
interested in only those branches where the average account balance is more than $1200. This condition does
not apply to a single tuple; rather, it applies to each group constructed by the group by clause. To express such
a query, we use the having clause of SQL. Predicates in the having clause are applied after the formation of
groups, so aggregate functions may be used. We express this query in SQL as follows:
select branch-name, avg (balance) from account group by branch-name having avg (balance) > 1200
At times, we wish to treat the entire relation as a single group. In such cases, we do not use a group by clause.
Consider the query ―Find the average balance for all accounts.‖ We write this query as follows:
select avg (balance) from account
We use the aggregate function count frequently to count the number of tuples in a relation. The notation for this
function in SQL is count (*). Thus, to find the number of tuples in the customer relation, we write
select count (*) from customer
SQL does not allow the use of distinct with count(*). It is legal to use distinct with max and min, even
though the result does not change. We can use the keyword all in place of distinct to specify duplicate
retention, but, since all is the default, there is no need to do so.
If a where clause and a having clause appear in the same query, the predicate in the where clause is applied first.
Tuples satisfying the where predicate are then placed into groups by the group by clause. The having
clause, if it is present, is then applied to each group; the groups that do not satisfy the having clause
predicate are removed. The remaining groups are used by the select clause to generate tuples of the
result of the query.
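The group by/having behavior described above can be checked with a runnable sketch, here using the Brighton balances from the averaging example in an in-memory SQLite database (hyphens in the relation names are replaced by underscores, since "-" is an operator in SQL identifiers; the Redwood row is invented as a group that fails the having predicate):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (branch_name TEXT, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("Brighton", 1000), ("Brighton", 3000), ("Brighton", 2000),
                  ("Brighton", 1000), ("Redwood", 800)])

# Groups are formed first; the HAVING predicate is then applied per group,
# so the aggregate AVG(balance) may appear in it.
rows = conn.execute("""
    SELECT branch_name, AVG(balance)
    FROM account
    GROUP BY branch_name
    HAVING AVG(balance) > 1200
""").fetchall()
print(rows)  # [('Brighton', 1750.0)] -- Redwood's group (avg 800) is removed
```

Brighton's average is 7000/4 = 1750.00, matching the duplicate-retaining calculation above.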
Aggregate functions (listed in Table 7.4) can be used as expressions only in the following:
The select list of a SELECT statement (either a subquery or an outer query).
A COMPUTE or COMPUTE BY clause.
A HAVING clause.
Table 7.4: COUNT, COUNT_BIG, GROUPING, GROUPING_ID, MAX, ROWCOUNT_BIG, STDEV, STDEVP, SUM, VAR, VARP
First you declare the name of the cursor, cursor_name, after the keyword CURSOR. The cursor's name can
be up to 30 characters in length and follows the rules for identifiers in PL/SQL. It is important to note
that a cursor's name is not a variable, so you cannot use it as one, such as assigning it to another cursor or
using it in an expression.
parameter1, parameter2, ... form an optional section in the cursor declaration. These parameters allow you to
pass arguments into the cursor.
RETURN return_specification is an optional part.
Next you specify the valid SQL statement that returns the result set the cursor points to.
Finally, you can list the columns you want to update after FOR UPDATE OF. This part is optional, so you can
omit it in the CURSOR declaration.
PL/SQL Cursor Declaration Example
CURSOR cur_chief IS
SELECT first_name,
last_name,
department_name
FROM employees e
INNER JOIN departments d ON d.manager_id = e.employee_id;
Opening a PL/SQL Cursor
After declaring a cursor, you can open it using the following syntax:
Opening PL/SQL Cursor Syntax
OPEN cursor_name [ ( argument_1 [, argument_2 ...] ) ];
You specify the cursor's name, cursor_name, after the keyword OPEN. If the cursor was defined with a
parameter list, you also need to pass corresponding arguments to the cursor. When you OPEN the cursor,
PL/SQL executes the SQL SELECT statement and identifies the active result set. Note that the OPEN action
does not actually retrieve records from the database; that happens in the FETCH step. If the cursor was
declared with the FOR UPDATE clause, PL/SQL locks all the records in the result set.
We can open our cursor cur_chief above as follows:
Open PL/SQL Cursor Example
OPEN cur_chief;
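PL/SQL itself cannot be run here, but as an analogy, the declare/open/fetch/close life cycle described above maps directly onto the DB-API cursor in Python. This sketch mirrors the cur_chief query against an in-memory SQLite database with invented table contents:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, first_name TEXT);
    CREATE TABLE departments (department_id INTEGER, department_name TEXT,
                              manager_id INTEGER);
    INSERT INTO employees VALUES (1, 'Rahul'), (2, 'Pankaj');
    INSERT INTO departments VALUES (10, 'Sales', 1), (20, 'IT', 2);
""")

cur = conn.cursor()          # "declare" the cursor
# "open": run the SELECT and identify the active result set
cur.execute("""
    SELECT e.first_name, d.department_name
    FROM employees e
    INNER JOIN departments d ON d.manager_id = e.employee_id
    ORDER BY e.employee_id
""")
rows = cur.fetchall()        # "fetch": only now are rows actually retrieved
cur.close()                  # release the cursor
print(rows)  # [('Rahul', 'Sales'), ('Pankaj', 'IT')]
```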
7.15 Keywords
Cursor: It is a temporary work area created in the system memory when a SQL statement is executed.
DDL: It provides commands for defining relation schemas, deleting relations, creating indices, and
modifying relation schemas.
Join: It is used in an SQL statement to query data from two or more tables, based on a relationship between
certain columns in these tables.
SQL (Structured Query Language): It is a computer language aimed to store, manipulate, and query data
stored in relational databases.
UNION operator: It is used to combine the result-sets of two or more SELECT statements. Each SELECT
statement within the UNION must have the same number of columns.
View: It is a virtual table based on a SQL SELECT query.
8.0 Objectives
After studying this chapter, you will be able to:
Define database administration
Discuss about the failure classification in database administration
Describe RAID
Discuss about the transaction model
Explain the data dictionary storage
8.1 Introduction
Many database administrators and programmers are faced with tables and structures designed by others,
perhaps created many years ago. In order to conceptualize database objects and structures, programmers need
to understand the capabilities of modern database systems and how to retrieve database metadata.
In this chapter, students will learn how to manipulate the data stored in tables and to return meaningful results
to help analyze the data stored. From beginning to end, participants will learn by doing SQL-based projects in
their own MySQL shell, and then handing them in for instructor feedback. These projects, as well as the final
project (developing tables for a blog), will add to the student's portfolio and will contribute to certificate
completion.
System Error: The system has entered an undesirable state (for example, deadlock), as a result of which a
transaction cannot continue with its normal execution. The transaction, however, can be re-executed at a later
time.
System Crash: There is a hardware malfunction, or a bug in the database software or the operating system, that
causes the loss of the content of volatile storage, and brings transaction processing to a halt. The content of the
nonvolatile storage remains intact, and is not corrupted.
Disk Failure: A disk block loses its contents as a result of either a head crash or failure during a data transfer.
Copies of data on other disks, or archival backups on tertiary media, such as tapes, are used to recover from
the failure.
Magnetic Disks: A read-write head positioned very close to the platter surface (almost touching it) reads or
writes magnetically encoded information. The surface of a platter is divided into circular tracks; there are over
16,000 tracks per platter on typical hard disks. Each track is divided into sectors. A sector is the smallest unit
of data that can be read or written; sector size is typically 512 bytes, with typically 200 sectors per track on
inner tracks and 400 on outer tracks. To read or write a sector, the disk arm swings to position the head on the
right track; the platter spins continually, and data is read or written as the sector passes under the head.
Head-disk assemblies have multiple disk platters on a single spindle (typically 2 to 4), with one head per
platter, mounted on a common arm. A cylinder consists of the ith track of all the platters.
Earlier-generation disks were susceptible to head crashes: their surfaces had metal-oxide coatings which
would disintegrate on a head crash and damage all data on the disk. Current-generation disks are less
susceptible to such disastrous failures, although individual sectors may still get corrupted.
The disk controller interfaces between the computer system and the disk drive hardware. It accepts high-level
commands to read or write a sector and initiates actions such as moving the disk arm to the right track and
actually reading or writing the data. It computes and attaches checksums to each sector to verify that data is
read back correctly; if data is corrupted, with very high probability the stored checksum will not match the
recomputed checksum. It ensures successful writing by reading back a sector after writing it, and it performs
remapping of bad sectors.
Optimization of disk-block access. A block is a contiguous sequence of sectors from a single track; data is
transferred between disk and main memory in blocks, with sizes ranging from 512 bytes to several kilobytes.
Smaller blocks mean more transfers from disk; larger blocks mean more space wasted due to partially filled
blocks. Typical block sizes today range from 4 to 16 kilobytes. Disk-arm-scheduling algorithms order pending
accesses to tracks so that disk arm movement is minimized: the elevator algorithm moves the disk arm in one
direction (from outer to inner tracks or vice versa), processing the next request in that direction until no more
requests remain in that direction, then reverses direction and repeats.
File organization also optimizes block access time, by organizing blocks to correspond to how data will be
accessed, e.g. storing related information on the same or nearby cylinders. Files may get fragmented over
time, e.g. if data is inserted into or deleted from the file, or if the free blocks on disk are scattered and a newly
created file has its blocks scattered over the disk. Sequential access to a fragmented file results in increased
disk arm movement; some systems have utilities to defragment the file system in order to speed up file access.
8.5 RAID
RAID stands for Redundant Arrays of Independent Disks. RAID is the use of multiple disks and data-
distribution techniques to get better resilience and/or performance. RAID can be implemented in software, in
hardware, or in any combination of both. This section is a simple introduction to the RAID levels, with some
information on caching and different I/O profiles.
Redundant Arrays of Independent Disks are disk organization techniques that manage a large number of
disks, providing the view of a single disk of high capacity and high speed by using multiple disks in parallel,
and high reliability by storing data redundantly, so that data can be recovered even if a disk fails. The chance
that some disk out of a set of N disks will fail is much higher than the chance that a specific single disk will
fail; e.g., a system with 100 disks, each with an MTTF of 100,000 hours (approx. 11 years), will have a system
MTTF of 1,000 hours (approx. 41 days). Techniques for using redundancy to avoid data loss are therefore
critical with large numbers of disks. RAID was originally a cost-effective alternative to large, expensive disks;
the "I" in RAID originally stood for "inexpensive". Today RAIDs are used for their higher reliability and
bandwidth, and the "I" is interpreted as "independent".
Improvement of Reliability via Redundancy: Redundancy means storing extra information that can be used to
rebuild information lost in a disk failure, e.g., mirroring (or shadowing). Mirroring duplicates every disk: a
logical disk consists of two physical disks, every write is carried out on both disks, and reads can take place
from either disk. If one disk in a pair fails, the data is still available on the other; data loss would occur only if
a disk fails and its mirror disk also fails before the system is repaired. The probability of this combined event
is very small, except for dependent failure modes such as fire, building collapse, or electrical power surges.
Mean time to data loss depends on the mean time to failure and the mean time to repair: e.g., an MTTF of
100,000 hours and a mean time to repair of 10 hours give a mean time to data loss of 500 * 10^6 hours (about
57,000 years) for a mirrored pair of disks (ignoring dependent failure modes).
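The mirrored-pair figure can be checked with a short calculation. Under the standard independence assumption, the mean time to data loss for a mirrored pair is approximately MTTF^2 / (2 * MTTR):

```python
mttf = 100_000   # hours: mean time to failure of one disk
mttr = 10        # hours: mean time to repair

# A second failure loses data only if it hits the surviving disk during the
# repair window, giving roughly MTTF^2 / (2 * MTTR) hours to data loss.
mtdl_hours = mttf ** 2 / (2 * mttr)
print(mtdl_hours)                      # 500000000.0, i.e. 500 * 10**6 hours
print(mtdl_hours / (24 * 365))         # roughly 57,000 years
```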
Improvement in Performance via Parallelism: Two main goals of parallelism in a disk system:
1. Load balance multiple small accesses to increase throughput
2. Parallelize large accesses to reduce response time.
We can improve the transfer rate by striping data across multiple disks. Bit-level striping splits the bits of
each byte across multiple disks: in an array of eight disks, write bit i of each byte to disk i. Each access can
read data at eight times the rate of a single disk, but seek/access time is worse than for a single disk, and bit-
level striping is not used much anymore. Block-level striping: with n disks, block i of a file goes to disk
(i mod n) + 1. Requests for different blocks can run in parallel if the blocks reside on different disks, and a
request for a long sequence of blocks can utilize all disks in parallel.
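The block-placement rule above can be sketched directly:

```python
# Block-level striping: with n disks, block i of a file goes to
# disk (i mod n) + 1, so consecutive blocks land on different disks
# and can be read in parallel.
def disk_for_block(i: int, n: int) -> int:
    return (i % n) + 1

n = 4
placement = [disk_for_block(i, n) for i in range(8)]
print(placement)  # [1, 2, 3, 4, 1, 2, 3, 4]
```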
RAIDs are Redundant Arrays of Inexpensive Disks. There are six levels of organizing these disks:
0 -Non-redundant Striping
1 - Mirrored Disks
2 - Memory Style Error Correcting Codes
3 - Bit Interleaved Parity
4 - Block Interleaved Parity
5 - Block Interleaved Distributed Parity
6 - P + Q Redundancy
RAID Levels Schemes to provide redundancy at lower cost by using disk striping combined with parity bits
Different RAID organizations, or RAID levels, have differing cost, performance and reliability characteristics:
RAID Level 0: Block striping; non-redundant. Used in high-performance applications where data loss is not
critical. RAID Level 1: Mirrored disks with block striping. Offers the best write performance; popular for
applications such as storing log files in a database system.
RAID Level 2: Memory-style error-correcting codes (ECC) with bit striping. RAID Level 3: Bit-interleaved
parity; a single parity bit is enough for error correction, not just detection, since we know which disk has
failed. When writing data, the corresponding parity bits must also be computed and written to a parity-bit disk.
To recover the data on a damaged disk, compute the XOR of the bits from the other disks (including the
parity-bit disk).
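The XOR recovery step can be sketched with invented disk contents: the parity disk stores the XOR of the data disks, so a lost disk's contents are the XOR of all the survivors.

```python
# Invented contents of three data disks (one word each, for illustration).
data_disks = [0b1011, 0b0110, 0b1100]

# The parity disk holds the XOR of all data disks.
parity = 0
for d in data_disks:
    parity ^= d

# Pretend disk 1 fails; XOR the surviving disks and the parity disk
# to reconstruct its contents.
lost = data_disks[1]
recovered = data_disks[0] ^ data_disks[2] ^ parity
print(recovered == lost)  # True
```

This works because XOR-ing a value into a running XOR twice cancels it out, leaving exactly the missing disk's bits.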
RAID Level 3 offers faster data transfer than a single disk, but fewer I/Os per second, since every disk has to
participate in every I/O; it subsumes Level 2 (provides all its benefits at lower cost). RAID Level 4: Block-
interleaved parity; uses block-level striping and keeps a parity block on a separate disk for the corresponding
blocks from the N other disks. When writing a data block, the corresponding block of parity bits must also be
computed and written to the parity disk.
Level 4 provides higher I/O rates for independent block reads than Level 3, since a block read goes to a single
disk, so blocks stored on different disks can be read in parallel; it also provides higher transfer rates for reads
of multiple blocks than no striping. Before writing a block, the parity data must be computed. This can be
done using the old parity block, the old value of the current block, and the new value of the current block
(2 block reads + 2 block writes), or by recomputing the parity value using the new values of all blocks
corresponding to the parity block, which is more efficient when writing large amounts of data sequentially.
The parity block becomes a bottleneck for independent block writes, since every block write also writes to the
parity disk.
RAID Level 5: Block-interleaved distributed parity; partitions data and parity among all N + 1 disks, rather
than storing data on N disks and parity on 1 disk. E.g., with 5 disks, the parity block for the nth set of blocks is
stored on disk (n mod 5) + 1, with the data blocks stored on the other 4 disks. Level 5 offers higher I/O rates
than Level 4: block writes occur in parallel if the blocks and their parity blocks are on different disks. It
subsumes Level 4, providing the same benefits while avoiding the bottleneck of the parity disk.
RAID Level 6: P+Q Redundancy scheme; similar to Level 5, but stores extra redundant information to guard
against multiple disk failures. Better reliability than Level 5 at a higher cost; not used as widely.
Choice of RAID Level. Factors in choosing a RAID level include monetary cost; performance (number of I/O
operations per second, and bandwidth) during normal operation; performance during failure; and performance
during rebuild of a failed disk, including the time taken to rebuild it. RAID 0 is used only when data safety is
not important, e.g. when data can be recovered quickly from other sources. Levels 2 and 4 are never used,
since they are subsumed by Levels 3 and 5. Level 3 is not used anymore, since bit striping forces single-block
reads to access all disks, wasting disk arm movement, which block striping (Level 5) avoids. Level 6 is rarely
used, since Levels 1 and 5 offer adequate safety for almost all applications. So the competition is between
Levels 1 and 5 only.
Level 1 provides much better write performance than Level 5: Level 5 requires at least 2 block reads and 2
block writes to write a single block, whereas Level 1 requires only 2 block writes. Level 1 is therefore
preferred for high-update environments such as log disks. Level 1 has a higher storage cost than Level 5, but
disk-drive capacities are increasing rapidly (50% per year) whereas disk access times have decreased much
less (about 3x in 10 years), and I/O requirements have increased greatly, e.g. for Web servers. When enough
disks have been bought to satisfy the required rate of I/O, they often have spare storage capacity, so there is
often no extra monetary cost for Level 1. Level 5 is preferred for applications with a low update rate and large
amounts of data; Level 1 is preferred for all other applications.
Hardware Issues
Software RAID: RAID implementations done entirely in software, with no special hardware support. RAID implementations with special hardware use non-volatile RAM to record writes that are being executed.
Beware: a power failure during a write can result in a corrupted disk, e.g., a failure after writing one block but before writing the second in a mirrored system. Such corrupted data must be detected when power is restored; recovery from corruption is similar to recovery from a failed disk. NV-RAM helps to efficiently detect potentially corrupted blocks; otherwise, all blocks of the disk must be read and compared with the mirror/parity blocks.
As a transaction is executing, it locks the part of the database that it is modifying, making the data unavailable
for modifications by other transactions. Isolation of data from outside interference is one of the defining
qualities of transactions.
If the transaction‘s operations succeed, the transaction completes by writing changes to disk in a commit
operation. A commit operation releases the transaction‘s locks, making the updated data available to other
transactions.
If an error occurs partway through execution, making it impossible for the entire transaction to succeed, the
entire transaction fails. Rather than leave partial results in the database, the transaction backs out the changes
that it has already made, leaving the database with the values that it had before the transaction started. The
back-out process is called rollback.
Another type of database transaction, two-phase commit, is adapted for distributed use across multiple
databases. Two-phase commit uses a transaction monitor to coordinate concurrent updates. The transaction
monitor first checks that all databases can make the desired change. Even if all can make the change, they
must wait for the transaction monitor's signal before doing so. If all cannot make the change, none do.
Every transaction must maintain data consistency in its database, but the two-phase commit protocol extends
the scope of this requirement beyond individual databases. A common example is a funds transfer, in which
the transaction monitor ensures that the funds debited from one account are credited in the other account or
that neither account is modified.
The ACID model is one of the oldest and most important concepts of database theory. It sets forward four
goals that every database management system must strive to achieve: atomicity, consistency, isolation and
durability. No database that fails to meet any of these four goals can be considered reliable.
Atomicity
Atomicity states that database modifications must follow an "all or nothing" rule. Each transaction is said to be "atomic." If one part of the transaction fails, the entire transaction fails. It is critical that the database
management system maintain the atomic nature of transactions in spite of any DBMS, operating system or
hardware failure.
Consistency
Consistency states that only valid data will be written to the database. If, for some reason, a transaction is
executed that violates the database‘s consistency rules, the entire transaction will be rolled back and the
database will be restored to a state consistent with those rules. On the other hand, if a transaction successfully
executes, it will take the database from one state that is consistent with the rules to another state that is also
consistent with the rules.
Isolation
Isolation requires that multiple transactions occurring at the same time not impact each other‘s execution. For
example, if Joe issues a transaction against a database at the same time that Mary issues a different transaction,
both transactions should operate on the database in an isolated manner. The database should either perform
Joe‘s entire transaction before executing Mary‘s or vice-versa. This prevents Joe‘s transaction from reading
intermediate data produced as a side effect of part of Mary‘s transaction that will not eventually be committed
to the database. Note that the isolation property does not ensure which transaction will execute first, merely
that they will not interfere with each other.
Durability
Durability ensures that any transaction committed to the database will not be lost. Durability is ensured
through the use of database backups and transaction logs that facilitate the restoration of committed
transactions in spite of any subsequent software or hardware failures.
Caution
Any unsecured data in any data dictionary table can be easily altered or deleted by any user.
Reliability
o data loss on power failure or system crash
o physical failure of the storage device
Classification:
1. volatile storage: loses contents when power is turned off
2. non-volatile storage: contents persist when power is switched off
Storage Hierarchy
Primary Storage: Fastest media but volatile (cache, main memory)
Secondary Storage: next lower level in hierarchy, non-volatile, moderately fast access time, sometimes
also called on-line storage (magnetic disks, flash memory)
Tertiary Storage: lowest level in hierarchy, non-volatile, slow access time, also called off-line storage
(magnetic tape, optical storage)
RAID
Redundant Arrays of Independent Disks: disk organization techniques that utilize large numbers of
inexpensive, mass-market disks
Main Idea: Improve reliability via redundancy, i.e., store extra information that can be used to
rebuild information lost in case of a disk failure. One approach is mirroring (or shadowing): duplicate every disk
(a logical disk then consists of two physical disks)
Different RAID levels (0-6) have different cost, performance, and reliability characteristics.
Storage Access
A database file is partitioned into fixed-length storage units called blocks (or pages). Blocks/pages
are units of both storage allocation and data transfer.
Database system seeks to minimize the number of block transfers between disk and main memory.
Transfer can be reduced by keeping as many blocks as possible in main memory.
Buffer Pool: Portion of main memory available to store copies of disk blocks.
Buffer Manager: System component responsible for allocating and managing buffer space in main
memory.
Buffer Manager
Program calls on buffer manager when it needs block from disk
The requesting program is given the address of the block in main memory, if it is already present in
the buffer.
If the block is not in the buffer, the buffer manager allocates space in the buffer for the block,
replacing (throwing out) some other blocks, if necessary to make space for new blocks.
The block that is thrown out is written back to the disk only if it was modified since the most recent
time that it was written to/fetched from the disk.
Once space is allocated in the buffer, the buffer manager reads in the block from the disk to the buffer,
and passes the address of the block in the main memory to the requesting program.
Sequential file
A sequential file maintains the records in the logical sequence of its primary key values.
A sequential file can be stored on devices like magnetic tape that allow sequential access.
In this organization records are written consecutively when the file is created. Records in a sequential
file can be stored in two ways.
• Pile file: Records are placed one after another as they arrive (no sorting of any kind).
• Sorted file: Records are placed in ascending or descending values of the primary key.
File Reorganization: In file reorganization, all records which are marked to be deleted are removed, and all
inserted records are moved to their correct place (sorting).
5. Files in this type are stored in direct-access storage devices such as magnetic disks, using an identifying key.
6. The identifying key relates to its actual storage position in the file.
7. The computer can directly locate the key to find the desired record without having to search through any
other records first.
8. Here the records are stored randomly, hence the name random file.
9. It is used in online systems where response and updating must be fast.
8.10 Summary
A database administrator (short form DBA) is a person responsible for the installation, configuration,
upgrade, administration, monitoring and maintenance of physical databases.
DBA is usually expected to have experience with one or more of the major database management
products, such as Microsoft SQL Server, SAP, and Oracle-based database management software.
A disk block loses its contents as a result of either a head crash or failure during a data transfer. Copies of
data on other disks, or archival backups on tertiary media, such as tapes, are used to recover from the
failure.
There is a hardware malfunction, or a bug in the database software or the operating system, that causes the
loss of the content of volatile storage, and brings transaction processing to a halt. The content of the non-
volatile storage remains intact, and is not corrupted.
RAID Levels: Schemes to provide redundancy at lower cost by using disk striping combined with parity
bits. Different RAID organizations, or RAID levels, have differing cost, performance, and reliability
characteristics.
8.11 Keywords
Atomicity: It states that database modifications must follow an "all or nothing" rule. Each transaction is said to
be "atomic."
Database Administrator (DBA): It is the person (or group of people) responsible for overall control of the
database system.
Logical Error: The transaction cannot continue with its normal execution because of such things as bad input,
data not found, or resource limit exceeded.
RAID: It stands for Redundant Arrays of Independent Disks. RAID is the use of multiple disks and data
distribution techniques to get better Resilience and/or Performance.
System Crash: There is a hardware malfunction, or a bug in the database software or the operating system, that
causes the loss of the content of volatile storage, and brings transaction processing to a halt.
9.0 Objectives
After studying this chapter, you will be able to:
Discuss the database system architectures
Explain the centralized system
Discuss the client-server system
Explain the parallel and distributed database system
9.1 Introduction
In this chapter we are going to discuss the database system architecture. We also discuss the basic structure of
distributed systems. Unlike parallel systems, in which the processors are tightly coupled and constitute a single
database system, a distributed database system consists of loosely coupled sites that share no physical
components. Furthermore, the database systems that run on each site may have a substantial degree of mutual
independence.
Each site may participate in the execution of transactions that access data at one site, or several sites. The main
difference between centralized and distributed database systems is that, in the former, the data reside in one
single location, whereas in the latter, the data reside in several locations. This distribution of data is the cause
of many difficulties in transaction processing and query processing. In this chapter, we address these
difficulties. The database systems at different sites may also differ in their execution environments and in the
schemas under which data are stored. A multidatabase system is a software layer that enables such a
heterogeneous collection of databases to be treated like a homogeneous distributed database.
Caution
Exercise caution when placing mission-critical applications on a client/server system. The end-user computing
evolution provided computing power at the workplace, and resulted in end-user demand for access to corporate
data with little regard for the security of that data.
Caution
You must always create a logical-log backup after a parallel archive to make sure that you can restore the
database in the event of a failure. Unlike a sequential archive, a parallel archive cannot be used for a restore
without the accompanying logical-log files, so a logical-log backup is essential.
4. The statement in SQL that allows you to change the definition of a table is…….
(a). Alter (b). Update (c). Create (d). Select
Distributed database design: The methodology used for the logical design of a centralized database applies to
the design of the distributed one as well. However, for a distributed database three additional factors have to be
considered.
Data Fragmentation: Before we decide how to distribute the data we must determine the logical units of
distribution. The database may be broken up into logical units called fragments which will be stored at
different sites. The simplest logical units are the tables themselves.
Horizontal fragmentation: A horizontal fragment of a table is a subset of rows in it. So horizontal
fragmentation divides a table 'horizontally' by selecting the relevant rows, and these fragments can be assigned
to different sites in the distributed system.
Vertical fragmentation: A vertical fragment of a table keeps only certain attributes of it. It divides a table
vertically by columns. It is necessary to include the primary key of the table in each vertical fragment so that
the full table can be reconstructed if needed.
Mixed fragmentation: In a mixed fragmentation, each fragment can be specified by a SELECT-PROJECT
combination of operations. In this case the original table can be reconstructed by applying union and natural
join operations in the appropriate order.
Data Replication: A copy of each fragment can be maintained at several sites. Data replication is the design
process of deciding which fragments will be replicated.
Data Allocation: Each fragment has to be allocated to one or more sites, where it will be stored. There are
three strategies regarding the allocation of data:
Fragmented (or partitioned): The database is partitioned into disjoint fragments, with each fragment assigned
to one site (no replication). This is also called 'non-redundant allocation'.
Complete replication: A complete copy of the database is maintained at each site (no fragmentation). Here,
storage costs and communication costs for updates are most expensive. To overcome some of these problems,
snapshots are sometimes used. A snapshot is a copy of the data at a given time. Copies are updated
periodically. Selective replication: A combination of fragmentation and replication.
Note
You can use a (Data Source Name) DSN connection (in which case you must ensure that your data source
name is unique), or you can use a DSN-less connection. You use a DSN-less connection by coding your ASP
scripts to issue commands to your Access database directly. This bypasses the ODBC software on the server,
and allows for faster connection and execution time on the database interaction.
Web appliance supports only System DSNs because these are the only ones accessible to Microsoft Windows
2000 applications (such as IIS), and hence to remote servers.
While all drivers require that you provide the database location, some drivers require additional parameters.
These are often specified by the database. For example, Microsoft Access DSNs must refer to database files on
the local file system.
The ODBC Manager allows you to view, add, modify, and delete data sources. It offers three options as
follows:
View Data Source List: The View Data Source List option allows you to view, modify, and delete data
sources.
Add SQL Server Data Source: The Add SQL Server Data Source option allows you to add SQL Server
data sources.
Add Access Data Source: The Add Access Data Source option allows you to add Access data sources.
Did You Know?
ODBC defines a standard C API for accessing a relational DBMS. It was developed by the SQL Access Group
in 1992 to standardize the use of a DBMS by an application.
9.9 Summary
A single database is a collection of tables, which are related to each other with the help of common fields.
RDBMS includes features such as data independence and data abstraction that help in efficiently
organizing the data.
Prevention of data redundancy and persistent storage makes RDBMS useful for storing data.
Data Source Name provides connectivity to a database through an ODBC driver. The DSN contains
database name, directory, database driver, User ID, password, and other information.
A client is defined as a requester of services and a server is defined as the provider of services.
9.10 Keywords
Centralization: Is the process by which the activities of an organization, particularly those regarding planning
and decision-making become concentrated within a particular location and/or group.
Collusion: This group may decide to collude in order to inflate their own trust values and deflate trust values
for peers that are not in the collective. Therefore, a certain level of resistance needs to be in place to limit the
effect of malicious collectives.
Distributed Database: Is a collection of multiple logically interrelated databases distributed over a computer
network and a distributed database management system is a software system that manages a distributed
database.
Fragmentation: The relation is partitioned into several fragments. Each fragment is stored at one or more sites.
Heterogeneous: Sites may run different DBMS products, which need not be based on the same underlying
data model; the system may thus be composed of relational, network, hierarchical, and object-oriented DBMSs.
1.0 Objectives
After studying this chapter, you will be able to:
Discuss the basic concept of C programming
Explain the C character set
Understand about the data types used in C
Define and declare the various variables in C
Discuss the different operators used in C
Explain the various arithmetic expressions in C
Understand the operator precedence and their associativity
1.1 Introduction
C is a programming language developed at AT&T's Bell Laboratories in the USA in 1972. It was designed
and written by a man named Dennis Ritchie. In the late seventies C began to replace the more familiar
languages of that time, like PL/I and ALGOL. No one pushed C; it was not made the 'official' Bell Labs
language. Thus, without any advertisement, C's reputation spread and its pool of users grew. Ritchie seems to
have been rather surprised that so many programmers preferred C to older languages like FORTRAN or PL/I,
or the newer ones like Pascal and APL. But, that is what happened.
main()
{
statement1;
statement2;
……;
}

function1()
{
statement1;
statement2;
……;
}

function2()
{
statement1;
statement2;
……;
}
Programmers are free to name C program functions (except the main () function).
Learning any programming language becomes easy with a hands-on approach, so let us get right to it. The
following is a simple C program that prints the message "Hello, world" on the screen.
/* First program of C language */
#include <stdio.h>
main()
{
printf("Hello, world");
}
Type this program in any text editor and then compile and run it using a C compiler. However, your task will
become much easier if you are using an IDE such as Turbo C.
Caution
Using more than one main( ) function in a C program is illegal and causes an error when the program is compiled.
Letters
Uppercase A to Z
Digits
All decimal digits 0 to 9
Special Characters
,  comma               &  ampersand
.  period              ^  caret
;  semicolon           *  asterisk
:  colon               -  minus sign
?  question mark       +  plus sign
"  quotation mark      <  opening angle bracket
!  exclamation mark    >  closing angle bracket
/  slash               (  left parenthesis
_  underscore          )  right parenthesis
$  dollar sign         {  opening brace
%  percent sign        }  closing brace
#  number sign
White Spaces
Blank space
Horizontal tab
New line
Table 1.2: ANSI C trigraph sequences
Trigraph Sequence    Translation
??=                  #  number sign
??(                  [  left bracket
??)                  ]  right bracket
??<                  {  opening brace
??>                  }  closing brace
??!                  |  vertical bar
??/                  \  backslash
??'                  ^  caret
??-                  ~  tilde
Identifiers refer to the names of variables, functions, and arrays. They are user-defined names consisting of a
sequence of letters and digits, with a letter as the first character. Both uppercase and lowercase letters are
permitted, though lowercase letters are commonly used. The underscore character is also permitted in identifiers.
3. A program that translates from a low level language to a higher level one is
(a) Decompiler (b) Compiler
(c) Interpreter (d) None of these.
1.4 Constants
Constants in C refer to fixed values that do not change during the execution of a program.
Integer Constants
An integer constant refers to a sequence of digits. There are three types of integer constants, namely decimal,
octal, and hexadecimal. A decimal integer consists of a set of digits, 0 through 9. Valid examples of decimal
integer constants are:
123
-321
0
+78
Embedded spaces, commas, and non-digit characters are not permitted between digits. For example,
15 750
20,000
$1000
are illegal.
An octal integer constant consists of any combination of digits from the set 0 through 7 with a leading 0. Some
examples of octal integers are:
037
0
0435
0551
A sequence of digits preceded by 0x or 0X is considered a hexadecimal integer. It may also include the
letters A through F or a through f. The letters A through F represent the numbers 10 through 15. The following
are examples of valid hexadecimal integers:
0x2
0x9F
0Xbcd
0x
We rarely use octal and hexadecimal numbers in programming. The largest integer value that can be stored is
machine-dependent. It is 32767 on 16-bit machines and 2,147,483,647 on 32-bit machines. It is also possible to
store larger integer constants on these machines by appending qualifiers such as U, L and UL to the constants.
For example:
56789U or 56789u (unsigned integer)
987612347UL or 987612347ul (unsigned long integer)
9876543L or 9876543l (long integer)
The output below shows that integer values larger than 32767 are not properly stored on a 16-bit machine.
However, when they are qualified as long integers (by appending L), the values are stored correctly.
Program
main( )
{
printf("Integer values\n\n");
printf("%d %d %d\n", 32767, 32767 + 1, 32767 + 10);
printf("\n");
printf("Long integer values\n\n");
printf("%ld %ld %ld\n", 32767L, 32767L + 1L, 32767L + 10L);
}
Output
Integer values
32767 –32768 –32759
Long integer values
32767 32768 32777
Real Constants
Integer numbers are inadequate to represent quantities that vary continuously, such as distances, heights,
temperatures, prices, and so on. These quantities are represented by numbers containing fractional parts, like
17.548. Such numbers are called real constants. Further examples of real constants are:
0.0083
-0.75
435.36
+247.0
These numbers are shown in decimal notation, having a whole number followed by a decimal point and the
fractional part. It is possible to omit digits before the decimal point or digits after the decimal point. That is,
215.
.95
.71
+.5
are all valid real numbers. A real number may also be expressed in exponential (or scientific) notation. For
example, the value 215.65 may be written as 2.1565e2 in exponential notation. The general form is:
mantissa e exponent
The mantissa is either a real number expressed in decimal notation or an integer. The exponent is an integer
number with an optional plus or minus sign. The letter e separating the mantissa and the exponent can be
written in either lowercase or uppercase. Since the exponent causes the decimal point to "float", this notation
is said to represent a real number in floating-point form. Examples of legal floating-point constants are:
0.65e4
12e-2
1.5e+5
3.18E3
-1.2E-1
Embedded white space is not allowed.
Exponential notation is useful for representing numbers that are very large or very small in magnitude. For
example, 7500000000 may be written as 7.5e9, and -0.000000368 is equivalent to -3.68e-7. Floating-point
constants are normally represented as double-precision quantities. However, the suffixes f or F may be used to
force single precision, and l or L to extend the precision further. Some examples of valid and invalid numeric
constants are given in Table 1.4.
String Constants
A string constant is a sequence of characters enclosed in double quotes. The characters may be letters,
numbers, special characters, and blank spaces. Examples are:
"Hello!"
"1987"
"WELL DONE"
"?...!"
"5+3"
"X"
Remember that a character constant (e.g., 'X') is not equivalent to the single-character string constant (e.g.,
"X"). Further, a single-character string constant does not have an equivalent integer value, while a character
constant has an integer value. Character strings are often used in programs to produce meaningful output.
1.6 Variables
A variable is a data name that may be used to store a data value. Unlike constants that remain unchanged
during the execution of a program, a variable may take different values at different times during execution.
A variable name can be chosen by the programmer in a meaningful way so as to reflect its function or nature
in the program, for example:
Average
height
Total
Counter_1
class_strength
As mentioned earlier, variable names may consist of letters, digits, and the underscore (_) character, subject to
the following conditions:
1. They must begin with a letter. Some systems permit an underscore as the first character.
2. The ANSI standard recognizes a length of 31 characters; however, many compilers treat only the first eight
characters as significant.
3. White space is not allowed.
Some examples of valid variable names are:
Akash Value T_raise
Delhi x1 ph_value
Mark sum1 distance
Invalid examples include:
123 (area)
% 25th
Further examples of variable names and their correctness are given in Table 1.8.
If only the first eight characters are recognized by a compiler, then the two names
average_height
average_weight
mean the same thing to the computer. Such names can be rewritten as
avg_height and avg_weight
ht_average and wt_average
without changing their meanings.
The program segment given in Figure 1.10 illustrates the declaration of variables. main( ) is the beginning of the
program. The opening brace { signals the start of the program. Declaration of variables is usually done
immediately after the opening brace of the program. Variables can also be declared outside (either before or
after) the main function. The importance of the place of declaration will be dealt with in detail later, while
discussing functions.
main( ) /*..........Program Name........... ...........*/
{
/*......................... Declaration....................... */
float x, y;
int code;
short int count;
long int amount;
double deviation;
unsigned n;
char c;
/*...............Computation.....................*/
…..
……..
……
} /*..............Program ends.....................*/
Figure 1.10: Declaration of variables.
When a qualifier short, long, or unsigned is used without a basic data-type specifier, C compilers treat the data
type as an int. If we want to declare a character type together with a qualifier, we must include the word char,
as in unsigned char.
The compiler automatically assigns integer values beginning with 0 to all the enumeration constants. That is,
the enumeration constant value1 is assigned 0, value2 is assigned 1, and so on. However, the automatic
assignments can be overridden by assigning values explicitly to the enumeration constants. For example:
enum day {Monday = 1, Tuesday, …….. Sunday};
Here, the constant Monday is assigned the value 1. The remaining constants are assigned values that increase
successively by 1.
The definition and declaration of enumerated variables can be combined in one statement.
Example:
enum day {Monday, ... Sunday} week_st, week_end;
The storage class is another qualifier (like long or unsigned) that can be added to a variable declaration as
shown below:
auto int count;
register char ch;
static int x;
extern long total;
Static and external (extern) variables are automatically initialized to zero. Automatic (auto) variables contain
undefined values (known as 'garbage') unless they are initialized explicitly.
The control string contains the format of the data being received. The ampersand symbol & before each variable
name is an operator that specifies the variable's address. We must always use this operator; otherwise,
unexpected results may occur. Let us look at an example:
scanf("%d", &number);
When this statement is encountered by the computer, execution stops and the computer waits for the value of
the variable number to be typed in. Since the control string "%d" specifies that an integer value is to be read from
the terminal, we have to type in the value in integer form. Once the number is typed in and the 'Return' key is
pressed, the computer proceeds to the next statement. Thus, the use of scanf provides an interactive
feature and makes the program 'user friendly'.
Example
The program in Figure 1.12 illustrates the use of the scanf function.
The first executable statement in the program is a printf, requesting the user to enter an integer number. This is
known as a "prompt message" and appears on the screen like
Enter an integer number
As soon as the user types in an integer number, the computer proceeds to compare the value with 100. If the
value typed is less than 100, then a message
Your number is smaller than 100
is printed on the screen. Otherwise, the message
Your number contains more than two digits
is printed. Outputs of the program, run for two different inputs are also shown in Figure 1.12.
Program
/************************************************************/
/* INTERACTIVE COMPUTING USING scanf FUNCTION */
/************************************************************/
main( )
{
int number;
printf("Enter an integer number\n");
scanf("%d", &number);
if ( number < 100 )
printf("Your number is smaller than 100\n\n");
else
printf("Your number contains more than two digits\n");
}
Output
Enter an integer number
54
Your number is smaller than 100
Enter an integer number
108
Your number contains more than two digits
Figure 1.12: Use of scanf function.
Some compilers permit the use of the 'prompt message' as a part of the control string in scanf, like
scanf("Enter a number %d", &number);
Here we have used a decision statement (if…else) to decide whether the number is less than 100.
In this case, the computer requests the user to input the values of the amount to be invested, the interest rate,
and the period of investment by printing the prompt message
Input amount, interest rate, and period
and then waits for the input values. As soon as we finish entering the three values
Program
/***************************************************/
/* INTERACTIVE INVESTMENT PROGRAM */
/***************************************************/
main( )
{
int year, period ;
float amount, inrate, value ;
printf("Input amount, interest rate, and period\n\n");
scanf("%f %f %d", &amount, &inrate, &period);
printf("\n");
year = 1;
while( year <= period )
{
value = amount + inrate * amount ;
printf("%2d Rs %8.2f\n", year, value);
amount = value ;
year=year+1;
}
}
Output
Input amount, interest rate, and period
10000 0.14 5
1 Rs 11400.00
2 Rs 12996.00
3 Rs 14815.44
4 Rs 16889.60
5 Rs 19254.15
Input amount, interest rate, and period
20000 0.12 7
1 Rs 22400.00
2 Rs 25088.00
3 Rs 28098.56
4 Rs 31470.39
5 Rs 35246.84
6 Rs 39476.46
7 Rs 44213.63
Figure 1.13: Interactive investment program.
corresponding to the three variables amount, inrate, and period, the computer begins to calculate the amount at
the end of each year, up to 'period' years, and produces the output.
Note that the scanf function contains three variables. In such cases, care should be exercised to see that the
values entered match the order and type of the variables in the list. Any mismatch might lead to unexpected
results. The compiler may not detect such errors.
1.8 Operators
C supports a rich set of operators. We have already used several of them, such as =, +, -, *, & and <. An
operator is a symbol that tells the computer to perform certain mathematical or logical manipulations.
Operators are used in programs to manipulate data and variables. They usually form a part of mathematical
or logical expressions.
The C operators can be classified into a number of categories. They include:
1. Arithmetic operators.
2. Relational operators.
3. Logical operators.
4. Assignment operators.
5. Increment and decrement operators.
6. Conditional operators.
7. Bitwise operators.
8. Special operators.
Integer division truncates any fractional part. The modulo division produces the remainder of an integer
division. Examples of arithmetic operators are:
a–b a+b
a*b a/b
a%b -a * b
Here a and b are variables and are known as operands. The modulo division operator % cannot be used on
floating point data.
Note that C does not have an operator for exponentiation.
Integer Arithmetic
When both the operands in a single arithmetic expression such as a+b are integers, the expression is called an
integer expression and the operation is called integer arithmetic. Integer arithmetic always yields an integer
value. The largest integer value depends on the machine, as pointed out earlier. In the above examples, if a and
b are integers, then for a = 14 and b = 4
we have the following results:
a – b = 10
a + b = 18
a * b = 56
a / b = 3 (decimal part truncated)
a % b = 2 (remainder of division)
During integer division, if both the operands are of the same sign, the result is truncated towards zero. If one of
them is negative, the direction of truncation is implementation dependent. That is,
6/7 = 0 and -6/-7 = 0
but -6/7 may be zero or -1. (Machine dependent)
Similarly, during modulo division, the sign of the result is always the sign of the first operand (the dividend).
That is
-14 % 3 = -2
-14 % -3 = -2
14 % -3 = 2
Example: The program in Figure 1.15 shows the use of integer arithmetic to convert a given number of days
into months and days.
The variables months and days are declared as integers. Therefore, the statement
months = days/30;
truncates the decimal part and assigns the integer part to months. Similarly, the statement
days = days%30;
assigns the remainder part of the division to days. Thus the given number of days is converted into an
equivalent number of months and days and the result is printed as shown in the output.
Program
/*************************************************************************/
/* PROGRAM TO CONVERT DAYS TO MONTHS AND DAYS */
/************************************************************************/
#include <stdio.h>
main( )
{
int months, days;
printf("Enter days\n");
scanf("%d", &days);
months = days/30;
days = days % 30;
printf("Months = %d Days = %d", months, days);
}
Output
Enter days
265
Months = 8 Days = 25
Enter days
364
Months = 12 Days = 4
Enter days
45
Months = 1 Days = 15
Figure 1.15: Illustration of integer arithmetic.
Real Arithmetic
An arithmetic operation involving only real operands is called real arithmetic. A real operand may assume
values either in decimal or exponential notation. Since floating point values are rounded to the number of
significant digits permissible, the final value is an approximation of the correct result. If x, y, and z are floats,
then we will have:
x = 6.0/7.0 = 0.857143
y = 1.0/3.0 = 0.333333
z = -2.0/3.0 = -0.666667
The operator % cannot be used with real operands.
Mixed-mode Arithmetic
When one of the operands is real and the other is integer, the expression is called a mixed-mode arithmetic
expression. If either operand is of the real type, then only the real operation is performed and the result is
always a real number. Thus
15/10.0 = 1.5
whereas
15/10 = 1
Example:
Output of the program in Figure 1.16 shows round-off errors that can occur in the computation of floating
point numbers.
Program
/************************************************************************ /
/* PROGRAM SHOWING ROUND-OFF ERRORS */
/* Sum of n terms of 1/n */
/************************************************************************/
#include <stdio.h>
main( )
{
float sum, n, term;
int count = 1;
sum = 0;
printf("Enter value of n\n");
scanf("%f", &n);
term = 1.0/n;
while( count <= n )
{
sum = sum + term;
count++;
}
printf("Sum = %f\n", sum);
}
Output
Enter value of n
99
Sum = 1.000001
Enter value of n
143
Sum = 0.999999
Figure 1.16: Round-off errors in floating point computations.
We know that the sum of n terms of 1/n is 1. However, due to errors in floating point representation, the result
is not always 1.
C supports six relational operators in all. These operators and their meanings are shown in Table 1.17.
A simple relational expression contains only one relational operator and takes the following form:
ae-1 relational operator ae-2
ae-1 and ae-2 are arithmetic expressions, which may be simple constants, variables, or combinations of them.
Given below are some examples of simple relational expressions and their values:
4.5 <= 10      True
4.5 < -10      False
-35 >= 0       False
10 < 7+5       True
a+b == c+d     True
only if the sum of values of a and b is equal to the sum of values of c and d. When arithmetic expressions are
used on either side of a relational operator, the arithmetic expressions will be evaluated first and then the
results compared. That is, arithmetic operators have a higher priority over relational operators.
Relational expressions are used in decision statements such as, if and while to decide the course of action of a
running program. We have already used the while statement.
a > b && x == 10
An expression of this kind which combines two or more relational expressions is termed as a logical
expression or a compound relational expression. Like the simple relational expressions, a logical expression
also yields a value of one or zero, according to the truth table shown in Table 1.18. The logical expression
given above is true only if a > b is true and x == 10 is true. If either (or both) of them are false, the expression
is false.
Table 1.18: Truth Table
It is easier to read and understand, and is more efficient because the expression 5*i-2 is evaluated only once.
Example: The program of Figure 1.20 prints a sequence of squares of numbers. Note the use of the shorthand
operator *=.
The program attempts to print a sequence of squares of numbers starting from 2. The statement
a *= a;
which is identical to
a = a*a;
replaces the current value of a by its square. When the value of a becomes equal to or greater than N (=100),
the while loop is terminated. Note that the output contains only three values 2, 4 and 16.
Program
/*******************************************************************/
/* PROGRAM TO SHOW USE OF SHORTHAND OPERATORS */
/*******************************************************************/
#include <stdio.h>
#define N 100
#define A 2
main()
{
int a;
a = A;
while( a < N )
{
printf("%d\n", a);
a *= a;
}
}
Output
2
4
16
Figure 1.20: Use of shorthand operator.
Comma Operator
The comma operator can be used to link related expressions together. A comma-linked list of expressions
is evaluated left to right, and the value of the rightmost expression is the value of the combined expression. For
example, the statement
value = (x = 10, y = 5, x+y);
first assigns the value 10 to x, then assigns 5 to y, and finally assigns 15 (i.e, 10+5) to value.
Since comma operator has the lowest precedence of all operators, the parentheses are necessary.
Some applications of comma operator are:
In for loops:
for (n = 1, m = 10; n <= m; n++, m++)
In while loops:
while (c = getchar(), c != '\n')
Exchanging values:
t = x, x = y, y = t;
10. #include <stdio.h> is a:
(a) Preprocessor directive (b) Function
(c) File name (d) Comment
1.9 Summary
C is a programming language developed at AT&T's Bell Laboratories in the USA in 1972.
ANSI C standard—in 1983, the American National Standards Institute (ANSI) commissioned a
committee, X3J11, to standardize the C language.
The character set used to form words, numbers and expressions depends on the computer on which the
programs run. Constants in C refer to fixed values.
C is rich in its data types; their storage representation and the machine instructions that handle
constants and variables vary from machine to machine.
A variable of character (char) type holds a single character.
1.10 Keywords
ANSI: In 1983, the American National Standards Institute (ANSI) commissioned a committee, X3J11, to
standardize the C language.
C character set: The characters that can be used to form words, numbers and expressions depend upon the
computer on which the program is run.
Constants: Constants in C refer to fixed values that do not change during the execution of a program.
Data-types: Data-type helps the programmer to select the type appropriate to the needs of the application as
well as the machine.
Decompiler: A program that translates from a low level language to a higher level one is a decompiler.
Integer Constants: An integer constant refers to a sequence of digits.
Keywords and identifiers: Keywords serve as basic building blocks for program statements, whereas
identifiers refer to the names of variables, functions and arrays.
main(): The main() function is the most important function and must be present in every C program.
Object Oriented Programming (OOP): Object-oriented programming (OOP) is a programming paradigm
using ―objects‖ – data structures consisting of data fields and methods together with their interactions.
Real Constants: Numbers containing fractional parts, like 17.548, are called real constants.
String Constants: A string constant is a sequence of characters enclosed in double quotes.
Tokens: In a passage of text, individual words and punctuation marks are called tokens.
Trigraph Character: Trigraph sequences provide a way to enter certain characters that are not available on
some keyboards.
Variables: A variable is a data name that may be used to store a data value.
2.0 Objectives
After studying this chapter, you will be able to:
Discuss the sequential statements in C
Understand the unformatted I/O functions
Explain the formatted input
Explain the formatted output
Define the branching statements in C
Discuss the switch statement
2.1 Introduction
"Decision making" is one of the most important concepts of computer programming. Programs should be able
to make logical (true/false) decisions based on the condition they are in; every program has one or a few
problems to solve, and depending on the nature of the problems, important decisions have to be made in order to
solve those particular problems.
In C programming a "selection construct" or "conditional statement" is used for decision making. Figure 2.1
illustrates the "selection construct".
Figure 2.1: Simple selection construct.
Conditional statement is the term used by many programming languages. The importance of conditional
statements should not be ignored, because every program has an example of these statements. The "if statement"
and the "switch statement" are the most popular conditional statements used in C.
Branching
Branch is the term given to the code executed in sequence as a result of change in the program‘s flow; the
program‘s flow can be changed by conditional statements in that program. Figure 2.2 shows the link between
selection (decision making) and branching (acting).
The control string specifies the field format in which the data is to be entered, and the arguments arg1,
arg2, ..., argn specify the addresses of locations where the data is stored. The control string and the arguments are
separated by commas.
Control string contains field specifications which direct the interpretation of input data. It may include:
Field (or format) specifications, consisting of the conversion character %, a data type character (or type
specifier), and an optional number specifying the field width.
Blanks, tabs, or newlines.
Blanks, tabs, and newlines are ignored. The data type character indicates the type of data that is to be assigned
to the variable-associated with the corresponding argument. The field width specifier is optional. The
discussions that follow will clarify these concepts.
The percent sign (%) indicates that a conversion specification follows. w is an integer number that specifies
the field width of the number to be read and d, known, as data type character, indicates that the number to be
read is in integer mode. Consider the following example:
scanf("%2d %5d", &num1, &num2);
Data line:
50 31426
The value 50 is assigned to num1 and 31426 to num2. Suppose the input data is as follows:
31426 50
The variable num1 will be assigned 31 (because of %2d) and num2 will be assigned 426 (the unread part of
31426). The value 50 that remains unread will be assigned to the first variable in the next scanf call. This kind of
error may be eliminated if we use the field specifications without the field width specifications. That is, the
statement
scanf("%d %d", &num1, &num2);
will read the data
31426 50
correctly and assign 31426 to num1 and 50 to num2.
What happens if we enter a floating point number instead of an integer? The fractional part may be stripped
away! Also, scanf may skip reading further input.
When the scanf reads a particular value, reading of the value will terminate as soon as the number of
characters specified by the field width is reached (if specified) or until a character that is not valid for the value
being read is encountered. In the case of integers, valid characters are an optionally signed sequence of digits.
An input field may be skipped by specifying * in the place of field width. For example, the statement
scanf("%d %*d %d", &a, &b);
will assign the data
123 456 789
as follows:
123 to a
456 skipped (because of *)
789 to b
The data type character d may be preceded by 'l' (letter ell) to read long integers.
Example :
Various input formatting options for reading integers are experimented with in the program shown in Figure
2.3.
The first scanf requests input data for three integer values a, b, and c, and accordingly three values 1, 2, and 3
are keyed in. Because of the specification %*d the value 2 has been skipped and 3 is assigned to the variable
b. Notice that since no data is available for c, it contains garbage.
The second scanf specifies the format %2d and %4d for the variables x and y respectively. Whenever we
specify field width for reading integer numbers, the input numbers should not contain more digits than the
specified size. Otherwise, the extra digits on the right-hand side will be truncated and assigned to the next
variable in the list. Thus, the second scanf has truncated the four digit number 6789 and assigned 67 to X and
89 to y. the value 4321 has assigned to the first variable in the immediately following scanf statement.
main()
{
int a,b,c,x,y,z;
int p,q,r;
Caution
Input data items must be separated by spaces, tabs or newlines. Punctuation marks do not count as separators,
because when the scanf function searches the input data line for a value to be read, it will always bypass any
white space characters.
#include <stdio.h>
main()
{
int no;
char name1[15], name2[15], name3[15];
printf("Enter serial number and name one\n");
scanf("%d %15c", &no, name1);
printf("%d %15s\n\n", no, name1);
printf("Enter serial number and name two\n");
scanf("%d %s", &no, name2);
printf("%d %15s\n\n", no, name2);
printf("Enter serial number and name three\n");
scanf("%d %15s", &no, name3);
printf("%d %15s\n\n", no, name3);
}
Output
Enter serial number and name one
1 123456789012345
1 123456789012345r
Enter serial number and name two
2 New York
2 New
Enter serial number and name three
2 York
Enter serial number and name one
1 123456789012
1 123456789012 r
Enter serial number and name two
2 New-York
2 New-York
Enter serial number and name three
3 London
3 London
Figure 2.5: Reading of strings.
The specification %[^characters] does exactly the reverse. That is, the characters specified after the
circumflex (^) are not permitted in the input string. The reading of the string will be terminated at the
encounter of one of these characters.
We have just seen that the %s specifier cannot be used to read strings with blank spaces. But this can be done with
the help of the %[ ] specification. Blank spaces may be included within the brackets, thus enabling scanf to
read strings with spaces. Remember that lowercase and uppercase letters are distinct.
Example
The program in Figure 2.6 illustrates the function of the %[ ] specification.
will read the data correctly and assign the values to the variables in the order in which they appear.
15 p 1.575 coffee
Some systems accept integers in the place of real numbers and vice-versa, and the input data is converted to
the type specified in the control string.
#include <stdio.h>
main()
{
char address[80];
printf("Enter address\n");
scanf("%[a-z]", address);
printf("%-80s\n\n", address);
}
Output
Enter address
new delhi 110 002
new delhi
#include <stdio.h>
main()
{
char address[80];
printf("Enter address\n");
scanf("%[^\n]", address);
printf("%-80s\n\n", address);
}
Output
Enter address
New Delhi 110 002
New Delhi 110 002
Figure 2.6: Illustration of the conversion specification %[ ] for strings.
The scanf function returns the number of data items that are successfully read. This value can be used to test
whether any errors occurred in reading the input. For example, the statement
scanf("%d %f %s", &a, &b, name);
will return the value 3 if the following data is typed in:
20 150.25 motor
and will return the value 1 if the following line is entered
20 motor 150.25
This is because the function would encounter a string when it was expecting a floating point value, and would
therefore terminate its scan after reading the first value.
Example: The program presented in Figure 2.7 illustrates testing the correctness of data read by the
scanf function.
#include <stdio.h>
main()
{
int a;
float b;
char c;
printf("Enter values of a, b and c\n");
if (scanf("%d %f %c", &a, &b, &c) == 3)
printf("a = %d b = %f c = %c\n", a, b, c);
else
printf("Error in input.\n");
}
Output
Enter values of a, b and c
12 3.45 A
a = 12 b = 3.450000 c = A
Enter values of a, b and c
23 78 9
a = 23 b = 78.000000 c = 9
Enter values of a, b and c
8 A 5.25
Error in input.
Enter values of a, b and c
Y 12 67
Error in input.
Enter values of a, b and c
15.75 23 X
a = 15 b = 0.750000 c = 2
Figure 2.7: Detection of errors in scanf input.
The function scanf is expected to read three items of data and therefore, when the values for all three
variables are read correctly, the program prints out their values. During the third run, the second item does not
match the type of the variable, so the reading is terminated and the error message is printed. The same
is the case with the fourth run.
In the last run, although the data items do not match the variables, no error message has been printed. When we
attempt to read a real number for an int variable, the integer part is assigned to the variable and the truncated
decimal part is assigned to the next variable. Note that the character '2' is assigned to the character
variable c.
Commonly used scanf format codes are given in Table 2.8
The following letters may be used as prefixes for certain conversion characters:
h for short integers
l for long integers or double
L for long double
Points to Remember While Using scanf
New features are added to these routines from time to time as new versions of systems are released. We should
consult the system reference manual before using these routines. Given below are some of the general points to
keep in mind while writing a scanf statement.
1. All function arguments, except the control string, must be pointers to variables.
2. Format specifications contained in the control string should match the arguments in order.
3. Input data items must be separated by spaces and must match the variables receiving the input in the same
order.
4. When searching for a value, scanf ignores line boundaries and simply looks for the next appropriate
character.
5. Any unread data items in a line will be considered as a part of the data input line to the next scanf call.
6. When the field width specifier w is used, it should be large enough to contain the input data size.
Caution
When scanf encounters a mismatch of data or a character that is not valid for the value being read,
the reading will be terminated.
printf("%-10.2e", y)    9.88e+01
printf("%e", y)         9.876540e+01
Some systems also support a special field specification character that lets the user define the field size at run-
time. This takes the following form:
In this case, both the field width and the precision are given as arguments which will supply the values for w
and p. For example,
printf("%*.*f", 7, 2, number);
is equivalent to
printf(―%7.2f‖,number);
The advantage of this format is that the values for width and precision may be supplied at run-time, thus
making the format a dynamic one. For example, the above statement can be used as follows:
int width = 7;
int precision = 2;
printf(―%*.*f‖, width, precision, number);
Example:
All the options of printing a real number are illustrated in Figure 2.10.
#include <stdio.h>
main( )
{
float y = 98.7654;
printf("%7.4f\n", y);
printf("%f\n", y);
printf("%7.2f\n", y);
printf("%-7.2f\n", y);
printf("%07.2f\n", y);
printf("%*.*f", 7, 2, y);
printf("\n");
printf("%10.2e\n", y);
printf("%12.4e\n", -y);
printf("%-10.2e\n", y);
printf("%e\n", y);
}
Output
98.7654
98.765404
98.77
98.77
0098.77
98.77
9.88e+001
–9.8765e+001
9.88e+001
9.876540e+001
Figure 2.10: Formatted output of real numbers.
Did You Know?
Microsoft C supports only three digits in the exponent part.
#include <stdio.h>
main( )
{
char x = 'A';
static char name[20] = "ANIL KUMAR GUPTA";
printf("OUTPUT OF CHARACTERS\n\n");
printf("%c\n%3c\n%5c\n", x, x, x);
printf("%3c\n%c\n", x, x);
printf("\n");
printf("OUTPUT OF STRINGS\n\n");
printf("%s\n", name);
printf("%20s\n", name);
printf("%20.10s\n", name);
printf("%.5s\n", name);
printf("%-20.10s\n", name);
printf("%5s\n", name);
}
Output
OUTPUT OF CHARACTERS
A
A
A
A
A
OUTPUT OF STRINGS
ANIL KUMAR GUPTA
ANIL KUMAR GUPTA
ANIL KUMAR
ANIL
ANIL KUMAR
ANIL KUMAR GUPTA
Figure 2.12: Printing of characters and strings.
Commonly used printf format codes are given in Table 2.12 and format flags in Table 2.13
Code Meaning
%c print a single character
%d print a decimal integer
%e print a floating point value in exponent form
%f print a floating point value without exponent
%g print a floating point value either e-type or f-type depending on value
%i print a signed decimal integer
%o print an octal integer, without leading zero
%s print a string
%u print an unsigned decimal integer
%x print a hexadecimal integer, without leading 0x
The following letters may be used as prefix for certain conversion characters.
h for short integers,
l for long integers or double,
L for long double
Table 2.13: Output Format Flags
Flag Meaning
– Output is left-justified within the field. Remaining field will be blank
+ + or – will precede the signed numeric item.
0 Causes leading zeroes to appear.
# (with o or x) Causes octal and hex items to be preceded by 0 and 0x, respectively.
# (with e,f or g) Causes a decimal point to be present in all floating point numbers,
even if it is whole number. Also prevents the truncation of trailing zeros in
g-type conversion.
2.6.1 if Statement
The if statement is a powerful decision making statement and is used to control the flow of execution of
statements. It is basically a two-way decision statement and is used in conjunction with an expression. It takes
the following form:
if (test expression)
It allows the computer to evaluate the expression first and then, depending on whether the value of the
expression (relation or condition) is 'true' (non-zero) or 'false' (zero), it transfers the control to a particular
statement. At this point the program has two paths to follow, one for the true condition and the other for the false
condition, as shown in Figure 2.14.
Figure 2.14: Flowchart of the two-way if branch (the test expression routes control to the true or false path).
Simple if Statement
The general form of a simple if statement is
if(test expression)
{
statement-block;
}
statement-x;
The ‗statement-block‘ may be a single statement or a group of statements. If the test expression is true, the
statement-block will be executed; otherwise the statement-block will be skipped and the execution will jump
to the statement-x. Remember, when the condition is true both the statement-block and the statement-x are
executed in sequence. This is illustrated in Figure. 2.15.
Consider the following segment of a program that is written for processing of marks
obtained in an entrance examination.
if (category == SPORTS)
{
marks = marks + bonus_marks;
}
printf("%f", marks);
The program tests the category of the student. If the student belongs to the SPORTS
category, then additional bonus_marks are added to his marks before they are printed. For
others, bonus_marks are not added.
Example:
The program in Figure 2.16 reads four values a, b, c, and d from the terminal, evaluates the ratio of (a+b) to
(c-d), and prints the result if c-d is not equal to zero.
The program has been run with two sets of data to see that both paths function properly. The
result of the first run is printed as
Ratio = -3.181818
The second run has produced neither a result nor a message. During the second run, the value of (c-d) is
equal to zero, and therefore the statements contained in the statement-block are skipped. Since no other
statement follows the statement-block, the program stops without producing any output.
Program
/* *********** Illustration of if Statement **************/
#include <stdio.h>
main( )
{
int a, b, c, d;
float ratio;
printf("Enter four integer values\n");
scanf("%d %d %d %d", &a, &b, &c, &d);
if (c-d != 0)
{
ratio = (float) (a+b) / (float) (c-d);
printf("Ratio = %f\n", ratio);
}
}
Output
if (test expression)
{
True-block statement(s)
}
else
{
False-block statement(s)
}
statement-x;
If the test expression is true, then the true-block statement(s), immediately following the if statement, are
executed; otherwise, the false-block statement(s) are executed. In either case, either the true-block or the
false-block will be executed, not both. This is illustrated in Figure 2.18. In both cases, the control is transferred
subsequently to statement-x.
Figure 2.18: Flowchart of if else control.
Let us consider an example of counting the number of boys and girls in a class. We code 1 for a boy and 2 for
a girl. The program statement to do this may be written as follows:
if (code == 1)
boy = boy + 1;
if (code == 2)
girl = girl + 1;
The first test determines whether or not the student is a boy. If yes, the number of boys is increased by 1, and
then the second test is performed. The second test again determines whether the student is a girl. This is unnecessary. Once a
student is identified as a boy, there is no need to test again for a girl. A student can be either a boy or a girl, not
both. The program segment can be modified using the else clause as follows:
if (code == 1)
boy = boy + 1;
else
girl = girl + 1;
xxxxxxxxxx
Here, if the code is equal to 1, the statement boy = boy + 1; is executed and the control is transferred to the
statement xxxxxx, after skipping the else part. If the code is not equal to 1, the statement boy = boy + 1; is
skipped and the statement in the else part girl = girl + 1; is executed before the control reaches the statement
xxxxxx.
Consider the program given in Figure 2.16. When the value (c-d) is zero, the ratio is not calculated and the
program stops without any message. In such cases we may not know whether the program stopped due to a
zero value or some other error. This program can be improved by adding the else clause as follows:
if (c–d != 0)
{
ratio = (float)(a+b)/(float)(c–d);
printf(―Ratio = %f\n‖, ratio);
}
else
printf(―c–d is zero\n‖);
If Tn-1 (usually known as the previous term) is known, then Tn (known as the present term) can be easily found by
multiplying the previous term by x/n. Then
e^x = T0 + T1 + T2 + ... + Tn = sum
The program uses count to count the number of terms added. The program stops when the value of the term is
less than 0.0001 (ACCURACY). Note that when a term is less than ACCURACY, the value of n is set equal to
999 (a number higher than 100) and therefore the while loop terminates. The results are printed outside the
while loop.
Output
Enter value of x:0
Terms = 2 Sum = 1.000000
Enter value of x:0.1
Terms = 5 Sum = 1.105171
Enter value of x:0.5
Terms = 7 Sum = 1.648720
Enter value of x:0.75
Terms = 8 Sum = 2.116997
Enter value of x:0.99
Terms = 9 Sum = 2.691232
Enter value of x:1
Terms = 9 Sum = 2.718279
Figure 2.19: Illustration of if else statement.
The logic of execution is illustrated in Figure 2.17. If condition-1 is false, statement-3 will be
executed; otherwise the program continues to perform the second test. If condition-2 is true, statement-1 will be
executed; otherwise statement-2 will be executed, and then the control is transferred to statement-x.
A commercial bank has introduced an incentive policy of giving bonus to all its deposit holders. The policy is
as follows: A bonus of 2% of the balance held on 31st December is given to everyone, irrespective of their
balance, and 5% is given to female account holders if their balance is more than Rs.5000. This logic can be
coded as follows:
if (sex is female)
if (balance > 5000)
bonus = 0.05 * balance;
else
bonus = 0.02 * balance;
balance = balance + bonus;
There is an ambiguity as to which if the else belongs to. In C, an else is linked to the closest non-terminated
if. Therefore, the else is associated with the inner if and there is no else option for the outer if. This
means that the computer is trying to execute the statement
balance = balance + bonus;
without really calculating the bonus for the male account holders.
Consider another alternative which also looks correct;
if (sex is female)
{
if (balance > 5000)
bonus = 0.05*balance;
}
else
bonus = 0.02 * balance;
balance = balance + bonus;
In this case, else is associated with the outer if and therefore bonus is calculated for the male account holders.
However, bonus for the female account holders, whose balance is equal to or less than 5000 is not calculated
because of the missing else option for the inner if.
Example: The program in Figure 2.18 selects and prints the largest of the three numbers using nested
if....else statements.
Program
/**********************************/
/* selecting the largest of three values */
/*********************************/
#include <stdio.h>
main( )
{
float A, B, C;
printf("Enter three values\n");
scanf("%f %f %f", &A, &B, &C);
printf("\nLargest value is ");
if (A > B)
{
if (A > C)
printf("%f\n", A);
else
printf("%f\n", C);
}
else
{
if (C > B)
printf("%f\n", C);
else
printf("%f\n", B);
}
}
Output
Enter three values
23445 67379 88843
Largest value is 88843.000000
1
0 1
1 0 1
0 1 0 1
1 0 1 0 1
The expression is an integer expression or characters. value-1, value-2, ... are constants or constant
expressions (evaluable to an integral constant) and are known as case labels. Each of these values should be
unique within a switch statement. block-1, block-2, ... are statement lists and may contain zero or more
statements. There is no need to put braces around these blocks. Note that case labels end with a colon (:).
When the switch is executed, the value of the expression is successively compared against the values value-1,
value-2, .... If a case is found whose value matches the value of the expression, then the block of
statements that follows that case is executed.
Program
/****************************************/
/* Use of else...if ladder */
/****************************************/
main( )
{
int units, custnum;
float charges;
printf("Enter CUSTOMER NO. and UNITS consumed\n");
scanf("%d %d", &custnum, &units);
if (units <= 200)
charges = 0.5 * units;
else if (units <= 400)
charges = 100 + 0.65 * (units - 200);
else if (units <= 600)
charges = 230 + 0.8 * (units - 400);
else
charges = 390 + (units - 600);
printf("\n\nCustomer No: %d Charges = %.2f\n", custnum, charges);
}
}
Output
Enter CUSTOMER NO. and UNITS consumed 101 150
Customer No: 101 Charges = 75.00
Enter CUSTOMER NO. and UNITS consumed 202 225
Customer No: 202 Charges = 116.25
Enter CUSTOMER NO. and UNITS consumed 303 375
Customer No: 303 Charges = 213.75
Enter CUSTOMER NO. and UNITS consumed 404 520
Customer No: 404 Charges = 326.00
Enter CUSTOMER NO. and UNITS consumed 505 625
Customer No: 505 Charges = 415.00
Figure 2.23: Illustration of else…if ladder.
switch (expression)
{
case value-1:
block-1
break;
case value-2:
block-2
break;
default:
default-block
break;
}
Statement-x
Figure 2.24: General form of the switch statement.
The break statement at the end of each block signals the end of a particular case and causes an exit from the
switch statement, transferring the control to the statement-x following the switch.
The default is an optional case. When present, it will be executed if the value of the expression does not match
with any of the case values. If not present, no action takes place if all matches fail and the control goes to the
statement-x.
The selection process of switch statement is illustrated in the flowchart shown in Figure 2.25.
The switch statement can be used to grade the students. This is illustrated below:
index= marks/10;
switch (index)
{
case 10:
case 9:
case 8:
grade = ―Honours‖;
break;
case 7:
case 6:
grade = ―First Division‖;
break;
case 5:
grade = ―Second Division‖;
break;
case 4:
grade = ―Third Division‖;
break;
default:
grade = ―Fail‖;
break;
}
printf(―%s\n‖, grade);
Figure 2.25: Selection process of the switch statement.
Marks    Index
100      10
90-99    9
80-89    8
70-79    7
60-69    6
50-59    5
40-49    4
0-39     0-3
This segment of the program illustrates two important features. First, it uses empty cases. The first three cases
will execute the same statements
grade = ―Honours‖;
break;
Same is the case with case 7 and case 6. Second, the default condition is used for all other cases where marks is
less than 40.
The switch statement is often used for menu selection. For example:
printf("          TRAVEL GUIDE\n\n");
printf("     A    Air Timings\n");
printf("     T    Train Timings\n");
printf("     B    Bus Service\n");
printf("     X    To skip\n");
printf("\n     Enter your choice\n");
character = getchar( );
switch (character)
{
case 'A':
air_display( );
break;
case 'B':
bus_display( );
break;
case 'T':
train_display( );
break;
default:
printf("     No choice\n");
}
It is possible to nest the switch statements. That is, a switch may be part of a case block.
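As a sketch of such nesting, the fragment below places an inner switch inside a case block of an outer switch; the mode letters, sub-type codes, and the helper name describe are all invented for illustration:

```c
/* A sketch of a nested switch: the outer switch selects a travel mode,
   an inner switch (inside a case block) selects a sub-type. */
const char *describe(char mode, int subtype)
{
    switch (mode)
    {
    case 'T':                          /* train timings */
        switch (subtype)               /* inner switch inside a case block */
        {
        case 1:  return "Express train";
        case 2:  return "Passenger train";
        default: return "Unknown train";
        }
    case 'B':                          /* bus services */
        switch (subtype)
        {
        case 1:  return "City bus";
        default: return "Unknown bus";
        }
    default:
        return "No such mode";
    }
}
```

Each inner switch behaves exactly as if it stood alone; its break (or, here, return) ends only the case of the switch it belongs to.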
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
2.8 Summary
―if statement‖ and ―switch statement‖ are the most popular conditional statements used in C.
Branch is the term given to the code executed in sequence as a result of change in the program‘s flow; the
program‘s flow can be changed by conditional statements in that program.
Sequential statements are those statements in a program that execute one by one in a given sequence.
Input data items must be separated by spaces, tabs or newlines; punctuation marks do not count as
separators.
The function scanf is expected to read three items of data and therefore, when the values for all the three
variables are read correctly, the program prints out their values.
Computer outputs are used as information for analysing certain relationships between variables and for
making decisions.
The C language programs follow a sequential form of execution of statements.
The if statement is a powerful decision statement used to control the flow of execution.
The switch statement tests the value of a given variable (or expression) against a list of case values and,
when a match is found, executes the block of statements associated with that case.
2.9 Keywords
if Statement: The if statement is a powerful decision making statement and is used to control the flow of
execution of statements.
Printf: The printf function is just a useful function from the standard library of functions that are accessible by
C programs. The behavior of printf is defined in the ANSI standard.
Putchar: There is an analogous function putchar for writing characters one at a time to the terminal.
Scanf: Is an input function which can read data from a terminal.
Switch Statement: When the switch is executed, the value of the expression is successively compared against
the values value-1, value-2 ...If a case is found whose value matches with the value of the expression, then the
block of statement.
The else…if Ladder: The conditions are evaluated from the top (of the ladder), downwards. As soon as a true
condition is found, the statement associated with it is executed and the control is transferred to the statement-x,
skipping the rest (of the ladder). When all the n conditions become false, then the final else containing the
default-statement will be executed.
3.0 Objectives
After studying this chapter, you will be able to:
Differentiate between while-loop and do-while loop
Discuss the control statement
Explain the while statement
Explain the for statement
Discuss the nesting of for loops
Explain the do statement
Define the while statement
Define the ?: operator
Discuss the jumping statement
Discuss the control transfer statement
3.1 Introduction
We have seen that a C program is a set of statements which are normally executed sequentially in the order in
which they appear. This happens when no options or no repetitions of certain calculations are necessary.
However, in practice, we have a number of situations where we may have to change the order of execution of
statements based on certain conditions, or repeat a group of statements until certain specified conditions are
met. This involves a .kind of decision making to see whether a particular condition has occurred or not and then
direct the computer to execute certain statements accordingly.
The loop is executed 10 times. This number can be decreased or increased easily by modifying the relational
expression appropriately in the statement if (n == 10). On such occasions, where the exact number of
repetitions is known, there are more convenient methods of looping in C. These looping capabilities enable us to
develop concise programs containing repetitive processes without the use of goto statements.
In looping, a sequence of statements is executed until some conditions for termination of the loop are satisfied.
A program loop therefore consists of two segments, one known as the body of the loop and the other known as
the control statement. The control statement tests certain conditions, and then directs the repeated execution of
the statements contained in the body of the loop.
Depending on the position of .the control statement in the loop, a control structure may be classified either as the
entry-controlled loop or as the exit-controlled loop. The flowcharts in Figure 3.1 illustrate these structures. In
the entry-controlled loop, the control conditions are tested before the start of the loop execution. If the conditions
are not satisfied, then the body of the loop will not be executed. In the case of an exit-controlled loop, the test is
performed at the end of the body of the loop and therefore the body is executed unconditionally for the first time.
Figure 3.1: Loop control structure.
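The difference between the two structures can be seen in a small sketch: when the test condition is false from the very start, an entry-controlled loop skips its body entirely, while an exit-controlled loop still executes it once. The two helper functions below are illustrative only, not part of the text's programs:

```c
/* Entry-controlled: the test is made before the first iteration,
   so with n = 0 the body never runs and count stays 0. */
int while_body_runs(void)
{
    int n = 0, count = 0;
    while (n > 0)
        count++;
    return count;          /* 0: body skipped entirely */
}

/* Exit-controlled: the body runs once before the test is made,
   so count becomes 1 even though the condition is false. */
int do_body_runs(void)
{
    int n = 0, count = 0;
    do
        count++;
    while (n > 0);
    return count;          /* 1: body ran exactly once */
}
```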
The test conditions should be carefully stated in order to perform the desired number of loop executions. It is
assumed that the test condition will eventually transfer the control out of the loop. In case, due to some reason it
does not do so, the control sets up an infinite loop and the body is executed over and over again.
main()
{
long int p;
int n;
double q;
printf(―------------------------------------------\n‖);
printf(― 2 to power n n 2 to power -n\n‖);
printf(―------------------------------------------\n‖);
p = 1;
for (n = 0; n < 21 ; ++n) /* LOOP BEGINS */
{
if (n == 0)
p = 1;
else
p = p * 2;
q = 1.0/(double)p ;
printf(―%10ld %10d %20.12lf\n‖, p, n, q);
} /* LOOP ENDS */
printf(―------------------------------------------\n‖);
}
Output
-----------------------------------------------
2 to power n n 2 to power -n
-----------------------------------------------
1 0 1.000000000000
2 1 0.500000000000
4 2 0.250000000000
8 3 0.125000000000
16 4 0.062500000000
32 5 0.031250000000
64 6 0.015625000000
128 7 0.007812500000
256 8 0.003906250000
512 9 0.001953125000
1024 10 0.000976562500
2048 11 0.000488281250
4096 12 0.000244140625
8192 13 0.000122070313
16384 14 0.000061035156
32768 15 0.000030517578
65536 16 0.000015258789
131072 17 0.000007629395
262144 18 0.000003814697
524288 19 0.000001907349
1048576 20 0.000000953674
Figure 3.3: Program to print ‗Power of 2‘ table using for loop.
Notice that the initialization section may contain more than one part, for example p = 1 and n = 0, separated by a
comma. Like the initialization section, the increment section may also have more than one part. For example, the loop
for (n=1, m=50; n<=m; n=n+1, m=m-1) {
p = m/n;
printf(―%d %d %d\n‖, n, m, p); }
is perfectly valid. The multiple arguments in the increment section are separated by commas.
The third feature is that the test-condition may have any compound relation and the testing need not be limited
only to the loop control variable. Consider the example below:
sum = 0;
for (i = 1; i < 20 && sum < 100; ++i)
{
sum = sum+i; printf(―%d %d\n‖, sum);
}
The loop uses a compound test condition with the control variable i and external variable sum. The loop is
executed as long as both the conditions i < 20 and sum < 100 are true. The sum is evaluated inside the loop.
It is also permissible to use expressions in the assignment statements of initialization and increment sections.
For example, a statement of the type
for (x = (m+n)/2; x > 0; x = x/2)
is perfectly valid.
Another unique aspect of for loop is that one or more sections can be omitted, if necessary. Consider the
following statements:
m = 5;
for(;m != 100;) {
printf(―%d\n‖, m);
m = m+5;
}
Both the initialization and increment sections are omitted in the for statement. The initialization has been done
before the for statement and the control variable is incremented inside the loop. In such cases, the sections are
left blank. However, the semicolons separating the sections must remain. If the test-condition is not present,
the for statement sets up an infinite loop. Such loops can be broken using break or goto statements in the loop.
We can set up time delay loops using the null statement as follows:
for (j = 1000; j > 0; j = j-1)
;
This loop is executed 1000 times without producing any output; it simply causes a time delay. Notice that the
body of the loop contains only a semicolon, known as a null statement. This can also be written as
for (j = 1000; j > 0; j = j-1);
This implies that the C compiler will not give an error message if we place a semicolon by mistake at the end
of a for statement. The semicolon will be considered as a null statement and the program may produce some
nonsense.
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
do {
body of the loop
}
while (test-condition);
On reaching the do statement, the program proceeds to evaluate the body of the loop first. At the end of the
loop, the test-condition in the while statement is evaluated. If the condition is true, the program continues to
evaluate the body of the loop once again. This process continues as long as the condition is true. When the
condition becomes false, the loop will be terminated and the control goes to the statement that appears
immediately after the while statement.
main()
{
int count, n;
float x, y;
printf(―Enter the values of x and n : ―);
scanf(―%f %d‖, &x, &n);
y = 1.0;
count = 1; /* Initialisation */
/* LOOP BEGINs */
while ( count <= n) /* Testing */
{
y = y*x;
count++; /* Incrementing */
}
/* END OF LOOP */
printf(―\nx = %f; n = %d; x to power n = %f\n‖,x,n,y);
}
Output
Enter the values of x and n : 2.5 4
x = 2.500000; n = 4; x to power n = 39.062500
Enter the values of x and n : 0.5 4
x = 0.500000; n = 4; x to power n = 0.062500
Figure 3.5: Program to compute x to the power n using while loop.
Since the test-condition is evaluated at the bottom of the loop, the do. .while construct provides an exit-
controlled loop and therefore the body of the loop is always executed at least once.
A simple example of a do.. .while loop is:
do {
printf(―Input a number\n‖);
number = getnum( );
}
while(number>0);
This segment of a program reads a number from the keyboard until a zero or a negative number is keyed in.
The test conditions may have compound relations as well. For instance, the statement while (number > 0 &&
number < 100);
in the above example would cause the loop to be executed as long as the number keyed in lies between 0 and
100.
Consider another example:
i = 1;
sum = 0;
do {
sum = sum + i;
i = i + 2;
} while (sum < 40 || i < 10);
printf("%d %d\n", i, sum);
The loop will be executed as long as one of the two relations is true.
Caution
Make sure that something inside the loop eventually makes the test condition false; otherwise the program will never terminate.
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
Ex2: Define the do statement.
……..………………………………………………………………………………………………………………
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
The while is an entry-controlled loop statement. The test-condition is evaluated and if the condition is true,
then the body of the loop is executed. After execution of the body, the test condition is once again evaluated and
if it is true, the body is executed once again. This process of repeated execution of the body continues until
the test-condition finally becomes false and the control is transferred out of the loop. On exit, the program
continues with the statement immediately after the body of the loop. The body of the loop may have one or
more statements. The braces are needed only if the body contains two or more statements. However, it is a
good practice to use braces even if the body has only one statement.
sum = 0; n = 1;
while (n <= 10) {
sum = sum + n * n;
n = n + 1;
}
printf("sum = %d\n", sum);
The body of the loop is executed 10 times for n = 1, 2, ................. 10 each time adding the square of the value of n,
which is incremented inside the loop. The test condition may also be written as n < 11; the result would be
the same.
Another example of while statement which uses the keyboard input is shown below:
character = ' ';
while (character != 'Y')
character = getchar();
First the character is initialized to ' '. The while statement then begins by testing whether character is not
equal to Y. Since the character was initialized to ' ', the test is true and the loop statement
character = getchar();
xxxxxxx;.
is executed. Each time a letter is keyed in, the test is carried out and the loop statement is executed until
the letter Y is pressed. When Y is pressed, the condition becomes false because character equals Y, and the
loop terminates, thus transferring the control to the statement xxxxxxx;.
Example:
A program to evaluate the equation
y = x^n
where n is a non-negative integer, is given in Figure 3.5.
The variable y is initialized to 1 and then multiplied by x, n times using the while loop. The loop control
variable, count is initialized outside the loop and incremented inside the loop. When the value of count
becomes greater than n, the control exits the loop.
2. To print out a and b given below, which of the following printf() statement will you use?
#include<stdio.h>
float a=3.14;
double b=3.14;
(a) printf(―%f %lf‖, a, b); (b) printf(―%Lf %f‖, a, b);
(c) printf(―%Lf %Lf‖, a, b); (d) printf(―%f %Lf‖, a, b);
3. The for loop is another entry-controlled loop that provides a more concise loop control structure.
(a) True. (b) False
During the running of a program, when a statement like goto begin; is met, the flow of control will jump to the
statement immediately following the label begin:. This happens unconditionally.
Note that a goto breaks the normal sequential execution of the program. If the label: is placed before the statement
goto label;, a loop will be formed and some statements will be executed repeatedly. Such a jump is known as a
backward jump. On the other hand, if the label: is placed after the goto label;, some statements will be skipped
and the jump is known as a forward jump. A goto is often used at the end of a program to direct the control to
the input statement, to read further data. Consider the following example:
#include <math.h>
main( )
{
double x, y;
read:
scanf("%lf", &x);
if (x < 0) goto read;
y = sqrt(x);
printf("%lf %lf\n", x, y);
goto read;
}
This program is written to evaluate the square root of a series of numbers read from the terminal. The program
uses two goto statements, one at the end, after printing the results, to transfer the control back to the input
statement, and the other to skip any further computation when the number is negative. Due to the unconditional
goto statement at the end, the control is always transferred back to the input statement. In fact, this program
puts the computer in a permanent loop known as an infinite loop. The computer goes round and round until we
take some special steps to terminate the loop. Such infinite loops should be avoided. The following example
illustrates how such infinite loops can be eliminated.
Example: The program presented in Figure 3.6 illustrates the use of the goto statement.
The program evaluates one square root for five numbers. The variable count keeps the count of numbers read.
When count is less than or equal to 5, goto read; directs the control to the label read; otherwise, the program
prints a message and stops.
Program
/***********************************/
/* Use of the goto statement */
/***********************************/
#include <math.h>
main( )
{
double x, y;
int count;
count = 1;
printf("Enter FIVE real values in a LINE\n");
read:
scanf("%lf", &x);
printf("\n");
if (x < 0)
printf("Item-%d is negative\n", count);
else
{
y = sqrt(x);
printf("%lf\t %lf\n", x, y);
}
count = count + 1;
if (count <= 5)
goto read;
printf("\nEnd of computation");
}
Output
Enter FIVE real values in a LINE
50.75 40 -36 75 11.25
50.750000 7.123903
40.000000 6.324555
Item -3 is negative
75.000000 8.660254
11.250000 3.354102
End of computation
Another use of the goto statement is to transfer the control out of a loop (or nested loops) when certain peculiar
conditions are encountered.
Example:
while ( )
{
for ( )
{
if ( ) goto end_of_program;
}
}
end_of_program:
Jumping out of loops
We should try to avoid using goto as far as possible. But there is nothing wrong, if we use it to enhance the
readability of the program or to improve the execution speed.
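A runnable version of the skeleton above might look like the following sketch, which searches a small two-dimensional table; the table shape and the helper name find_in_table are invented for illustration:

```c
/* Jumping out of nested loops with goto: a single break would leave only
   the inner for loop, but goto escapes both loops at once. */
int find_in_table(int table[2][3], int target, int *row, int *col)
{
    int i, j;
    for (i = 0; i < 2; i++)
        for (j = 0; j < 3; j++)
            if (table[i][j] == target)
                goto found;             /* exit both loops at once */
    return 0;                           /* target not present */
found:
    *row = i;
    *col = j;
    return 1;
}
```

This is the one situation where goto is widely considered acceptable: standard C has no multi-level break, so the alternatives (extra flag variables tested in every loop header) are often less readable.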
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
3.8 Control Transfer Statement
Loops perform a set of operations repeatedly until the control variable fails to satisfy the test-condition. The
number of times a loop is repeated is decided in advance and the test condition is written to achieve this.
Sometimes, when executing a loop it becomes desirable to skip a part of the loop or to leave the loop as soon
as a certain condition occurs. For example, consider the case of searching for a particular name in a list
containing, say, 100 names. A program loop written for reading and testing the names 100 times must be
terminated as soon as the desired name is found. C permits a jump from one statement to another within a
loop as well as a jump out of a loop.
main()
{
int m;
float x, sum, average;
printf(―This program computes the average of a set of numbers\n‖);
printf(―Enter values one after another\n‖);
printf(―Enter a NEGATIVE number at the end.\n\n‖);
sum = 0;
for (m = 1 ; m < = 1000 ; ++m)
{
scanf(―%f‖, &x);
if (x < 0)
break;
sum += x;
}
average = sum/(float)(m - 1);
printf(―\n‖);
printf(―Number of values = %d\n‖, m - 1);
printf(―Sum = %f\n‖, sum);
printf(―Average = %f\n‖, average);
}
Output
This program computes the average of a set of numbers
Enter values one after another
Enter a NEGATIVE number at the end.
21 23 24 22 26 22 –1
Number of values = 6
Sum = 138.000000
Average = 23.000000
We have used the for statement to perform the repeated addition of each of the terms in the series. Since it is
an infinite series, the evaluation of the function is terminated when the term x^n reaches the desired accuracy.
The value of n that decides the number of loop operations is not known and therefore we have arbitrarily
decided on a value of 100, which may or may not result in the desired level of accuracy.
#define LOOP 100
#define ACCURACY 0.0001
main()
{
int n;
float x, term, sum;
printf(―Input value of x : ―);
scanf(―%f‖, &x);
sum = 0;
for (term = 1, n = 1 ; n <= LOOP ; ++n)
{
sum += term ;
if (term <= ACCURACY)
goto output; /* EXIT FROM THE LOOP */
term *= x ;
}
printf(―\nFINAL VALUE OF N IS NOT SUFFICIENT\n‖);
printf(―TO ACHIEVE DESIRED ACCURACY\n‖);
goto end;
output:
printf(―\nEXIT FROM LOOP\n‖);
printf(―Sum = %f; No.of terms = %d\n‖, sum, n);
end:
; /* Null Statement*/
}
Output
Input value of x : .21
EXIT FROM LOOP
Sum = 1.265800; No.of terms = 7
Input value of x : .75
EXIT FROM LOOP
Sum = 3.999774; No.of terms = 34
Input value of x : .99
FINAL VALUE OF N IS NOT SUFFICIENT
TO ACHIEVE DESIRED ACCURACY
Figure 3.8: Use of goto to exit from a loop.
The test of accuracy is made using an if statement and the goto statement exits the loop as soon as the accuracy
condition is satisfied. If the number of loop repetitions is not large enough to produce the desired accuracy, the
program prints an appropriate message.
Break statement is not very convenient to use here. Both the normal exit and the break exit will transfer the
control to the same statement that appears next to the loop. But, in the present problem, the normal exit prints
the message.
―FINAL VALUE OF N IS NOT SUFFICIENT TO ACHIEVE DESIRED ACCURACY‖
And the forced exit prints the results of evaluation. Notice the use of a null statement at the end. This is
necessary because a program should not end with a label.
Like the break statement, the C supports another similar statement called the continue statement. However,
unlike the break which causes the loop to be terminated, the continue, as the name implies, causes the loop to
be continued with the next iteration after skipping any statements in between. The continue statement tells the
compiler, ―SKIP THE FOLLOWING STATEMENTS AND CONTINUE WITH THE NEXT ITERATION‖.
The format of the continue statement is simply:
continue;
The use of the continue statement in loops is illustrated in the program below. In while and do loops, continue causes
the control to go directly to the test-condition and then to continue the iteration process. In the case of the for loop,
the increment section of the loop is executed before the test-condition is evaluated.
Example: The program illustrates the use of continue statement.
The program evaluates the square root of a series of numbers and prints the results. The process stops
when the number 9999 is typed in.
In case the series contains any negative numbers, the process of evaluation of square root should be
bypassed for such numbers, because the square root of a negative number is not defined. The continue
statement is used to achieve this. The program also prints a message saying that the number is negative
and keeps an account of negative numbers. The final output includes the number of positive values
evaluated and the number of negative items encountered.
#include <math.h>
main()
{
int count, negative;
double number, sqroot;
printf(―Enter 9999 to STOP\n‖);
count = 0 ;
negative = 0 ;
while (count <= 100)
{
printf(―Enter a number : ―);
scanf(―%lf‖, &number);
if (number == 9999)
break; /* EXIT FROM THE LOOP */
if (number < 0)
{
printf(―Number is negative\n\n‖);
negative++ ;
continue; /* SKIP REST OF THE LOOP*/
}
sqroot = sqrt(number);
printf(―Number = %lf\n Square root = %lf\n\n‖, number, sqroot);
count++ ;
}
printf(―Number of items done = %d\n‖, count);
printf(―\n\nNegative items = %d\n‖, negative);
printf(―END OF DATA\n‖);
}
Output
Enter 9999 to STOP
Enter a number : 25.0
Number = 25.000000
Square root = 5.000000
3.9 Summary
The for loop is an entry-controlled loop that provides a more concise loop control structure.
The simplest of all the looping structures in C is the while statement.
When the break statement is encountered inside a loop, the loop is immediately exited and the program
continues with the statement immediately following the loop.
The continue, as the name implies, causes the loop to be continued with the next iteration after skipping any
statements in between.
The goto requires a label in order to identify the place where the branch is to be made.
3.10 Keywords
continue statement: During the loop operations, it may be necessary to skip a part of the body of the loop
under certain conditions. For example, in processing of applications for some job, we might like to exclude the
processing of data of applicants belonging to a certain category.
if else statement: The basic operation of if else statement is that a statement or group of statements is executed
under if.
goto statement: Even in a highly structured language like C, there may be occasions when the use of
goto might be desirable.
for loop: The for loop is another entry-controlled loop that provides a more concise loop control structure.
while loop: Used to execute a block of code as long as some condition is true.
4.0 Objectives
After studying this chapter, you will be able to:
Discuss the single-dimensional arrays
Understand how to perform the operations on array
Defined the examples of complex programs with array
Explain the multi-dimensional arrays
4.1 Introduction
An array is a data structure used to store a collection of data items all of the same type.
The name of the array is associated with the collection of data. To access an individual data item, you need to
indicate to the computer which array element you want. This is indicated using an array index (or.subscript).
Why are arrays useful? Suppose you want to write a program which accepts 5 integers input by the user, and
prints them out in reverse order. You could do it like this:
int first, second, third, fourth, fifth;
printf("enter 5 integers, separated by spaces: ");
scanf("%d %d %d %d %d", &first, &second, &third, &fourth, &fifth);
printf("in reverse order: %d, %d, %d, ", fifth, fourth, third);
printf("%d, %d\n", second, first); /* output is all on 1 line */
This works as required. But – what if you had 50 inputs? Or 500?! Or…
Using integer variables would become very cumbersome…
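With an array, the reversal reduces to a loop whose bound is a single number. The helper reverse_into below is a sketch of the idea, not part of the text's program:

```c
/* Copy the n elements of in[] into out[] in reverse order. The same two
   lines serve for 5, 50 or 500 values; only the caller's n changes. */
void reverse_into(const int in[], int out[], int n)
{
    int i;
    for (i = 0; i < n; i++)
        out[i] = in[n - 1 - i];   /* last element first */
}
```

A direct printing loop such as `for (i = n - 1; i >= 0; i--) printf("%d ", value[i]);` achieves the same effect without a second array.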
Example:
int a[3];
In the above example, a is an array of type integer which has a storage size of 3 elements. Assuming 2-byte
integers, the total size would be 3 * 2 = 6 bytes.
* MEMORY ALLOCATION :
Program :
#include <stdio.h>
#include <conio.h>
void main()
{
int a[3], i;
clrscr();
printf("\n\t Enter three numbers : ");
for(i=0; i<3; i++)
{
scanf("%d", &a[i]); // read array
}
printf("\n\n\t Numbers are : ");
for(i=0; i<3; i++)
{
printf("\t %d", a[i]); // print array
}
getch();
}
Output :
Features :
Array size should be a positive number only.
A string array always terminates with the null character ('\0').
Array elements are counted from 0 to n-1.
Useful for multiple reading of elements (numbers).
Disadvantages :
There is no easy method to initialize a large number of array elements.
It is difficult to initialize selected elements.
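Both limitations are usually worked around with a loop; the illustrative helper below fills any number of elements with one value:

```c
/* There is no single-statement initializer for "set every element to
   some value", but a loop fills 10 or 10000 elements just as easily. */
void fill(int a[], int n, int value)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] = value;
}
```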
Caution
All elements of an array must share the declared data type; attempting to mix data types among the elements causes a declaration error.
Here is a more complex program that will demonstrate how to read, write and traverse the integer arrays
#include <stdio.h>
void intSwap(int *x, int *y);
int getIntArray(int a[], int nmax, int sentinel);
void printIntArray(int a[], int n);
void reverseIntArray(int a[], int n);
int main(void) {
int x[10];
int hmny;
hmny = getIntArray(x, 10, 0);
printf(―The array was: \n‖);
printIntArray(x,hmny);
reverseIntArray(x,hmny);
printf(―after reverse it is:\n‖);
printIntArray(x,hmny);
}
void intSwap(int *x, int *y)
/* It swaps the content of x and y */
{
int temp = *x;
*x = *y;
*y = temp;
}
/* n is the number of elements in the array a.
These values are printed out, five per line. */
void printIntArray(int a[], int n){
int i;
for (i=0; i<n; ){
printf(―\t%d ―, a[i++]);
if (i%5==0)
printf(―\n‖);
}
printf(―\n‖);
}
Output:
Enter the size of an array: 5
Enter the elements in ascending order: 4 7 8 11 21
Enter the number to be search: 11
The number is found.
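The output above evidently comes from a binary search whose listing is not reproduced here. A minimal reconstruction (my sketch, not the original program) of such a search over a sorted array:

```c
/* Binary search over a sorted array: repeatedly halve the search
   interval. Returns the index of key, or -1 if key is absent. */
int binary_search(const int a[], int n, int key)
{
    int low = 0, high = n - 1;
    while (low <= high)
    {
        int mid = (low + high) / 2;     /* middle of current interval */
        if (a[mid] == key)
            return mid;
        else if (a[mid] < key)
            low = mid + 1;              /* key lies in the upper half */
        else
            high = mid - 1;             /* key lies in the lower half */
    }
    return -1;
}
```

For the sample run above, binary_search on {4, 7, 8, 11, 21} with key 11 returns index 3, so "The number is found." is printed.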
Linear search: A linear search is the most basic search algorithm. A linear search
sequentially moves through a collection (or data structure) looking for a matching value.
Here is a C program to search for an element in an array using linear search:
Example:
#include<stdio.h>
main()
{
int array[100], search, c, number;
printf(―Enter the number of elements in array\n‖);
scanf(―%d‖,&number);
printf(―Enter %d numbers\n‖, number);
for ( c = 0 ; c < number ; c++ )
scanf(―%d‖,&array[c]);
printf(―Enter the number to search\n‖);
scanf(―%d‖,&search);
for ( c = 0 ; c < number ; c++ )
{
if ( array[c] == search ) /* if required element found */
{
printf(―%d is present at location %d.\n‖, search, c+1);
break;
}
}
if ( c == number )
printf(―%d is not present in array.\n‖, search);
return 0;
}
Output:
Enter the number of elements in array
5
Enter 5 numbers
123
56
99
–4568
957
Enter the number to search
99
99 is present at location 3.
Example:
#include<stdio.h>
int main(){
int i,j,s,temp,a[20];
printf(―Enter total elements: ―);
scanf(―%d‖,&s);
printf(―Enter %d elements: ―,s);
for(i=0;i<s;i++)
scanf(―%d‖,&a[i]);
for(i=1;i<s;i++){
temp=a[i];
j=i-1;
while((j>=0)&&(temp<a[j])){
a[j+1]=a[j];
j=j-1;
}
a[j+1]=temp;
}
}
printf(―After sorting: ―);
for(i=0;i<s;i++)
printf(― %d‖,a[i]);
return 0;
}
Output:
Enter total elements: 5
Enter 5 elements: 3 7 9 0 2
After sorting: 0 2 3 7 9
Bubble Sort
Bubble sort, also known as sinking sort, is a simple sorting algorithm that works by repeatedly stepping
through the list to be sorted, comparing each pair of adjacent items and swapping them if they are in the wrong
order. The pass through the list is repeated until no swaps are needed, which indicates that the list is sorted.
Here is an example of simple bubble sort implementation using array ascending order in c programming
language
Example:
#include<stdio.h>
int main(){
int s, temp, i, j, a[20];
printf(―Enter total numbers of elements: ―);
scanf(―%d‖,&s);
printf(―Enter %d elements: ―,s);
for(i=0;i<s;i++)
scanf(―%d‖,&a[i]);
for(i=0;i<s-1;i++)
for(j=0;j<s-i-1;j++)
if(a[j]>a[j+1]){temp=a[j];a[j]=a[j+1];a[j+1]=temp;}
printf(―After sorting: ―);
for(i=0;i<s;i++)
printf(― %d‖,a[i]);
return 0;
}
Output:
Enter total numbers of elements: 5
Enter 5 elements: 6 2 0 11 9
After sorting: 0 2 6 9 11
Caution
Be careful when using pointers with arrays; it is easy to make errors, most of which are not detectable by the
compiler and cause the program to malfunction in a place that may be distant from the instruction that caused the problem.
Exercise: Check Your Progress 1
Note: i) Use the space below for your answer.
Ex1:Define Single-Dimensional Arrays.
……..………………………………………………………………………………………………………………
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
Example:
int a[3][3];
In the above example, a is an array of type integer with storage for a 3 * 3 matrix. Assuming 2-byte integers,
the total size would be 3 * 3 * 2 = 18 bytes.
It is also called a 'multidimensional array.'
* MEMORY ALLOCATION :
Program :
#include <stdio.h>
#include <conio.h>
void main()
{
int a[3][3], i, j;
clrscr();
printf("\n\t Enter matrix of 3*3 : ");
for(i=0; i<3; i++)
{
for(j=0; j<3; j++)
{
scanf("%d",&a[i][j]); //read 3*3 array
}
}
printf("\n\t Matrix is : \n");
for(i=0; i<3; i++)
{
for(j=0; j<3; j++)
{
printf("\t %d",a[i][j]); //print 3*3 array
}
printf("\n");
}
getch();
}
Output :
Filling with user input - When working with two-dimensional arrays (such as accessing, filling, printing, etc.),
it is necessary to use nested loops. The outer loop controls the number of rows and the inner loop controls the
number of columns.
Manipulating a matrix - Suppose you want to save the information for 30 students and 3 exam grades for each
student entered at the keyboard. In addition, you want to find the average (which could be a decimal value)
for each student, and then store this average in a fourth column of the same matrix. Remember, you will need
to obtain the grades before you can compute the average. Here is one possibility:
import java.io.*;
import BreezyGUI.*;
• Length: Just as a command such as list.length returns the length of a one dimensional array, scores.length
will return the number of rows in this two-dimensional array. scores[ i ].length will return the number of
columns of the row with subscript i in a two-dimensional array.
Working with Strings - Create a matrix of String values, fill the matrix by list, and print the matrix. Notice
that the "internal" arrays are of differing sizes. Notice how the .length is used to deal with these varying
lengths during printing.
3. main()
{
char thought[2][30]={"Do not walk in front of me..","I am not follow"};
printf("%c%c",*(thought[0]+9),*(*(thought+0)+5));
}
What is the output of this program?
(a) k k (b) Do not walk in front of me
(c) I may not follow (d) K
4. What will be output if you will execute following c code?
#include<stdio.h>
#include<conio.h>
void main(){
int a[]={0,1,2,3,4,5,6,7,8,9,10};
int i=0,num;
num=a[++i+a[++i]]+a[++i];
printf("%d",num);
}
(a) 6 (b) 7
(c) 8 (d) 9
5. When array elements are passed to a function with call by reference, function has pointer arguments.
(a) True (b) False
6. When array is declared with rows and columns it is called as 2-D i.e. two dimensional array
(a) True (b) False
The CAT function is a useful tool for building multidimensional arrays. B = cat(DIM,A1,A2,...) builds a
multidimensional array by concatenating A1, A2 ... along the dimension DIM
Accessing Elements
To access a single element of a multidimensional array, use integer subscripts. For example, D(1,2,2,22), using
a previously defined array D, returns 6.
Let A be a 3 by 3 by 2 array. PERMUTE(A,[2 1 3]) returns an array with the row and column subscripts
reversed (dimension 1 is the row, dimension 2 is the column, dimension 3 is the depth and so on). Similarly,
PERMUTE(A,[3,2,1]) returns an array with the first and third subscripts interchanged.
Functions like EIG that operate on planes or 2D matrices do not accept multi-dimensional arrays as arguments.
To apply such functions to different planes of the multidimensional arrays, use indexing or FOR loops. For
example:
INTERP3, INTERPN, and NDGRID are examples of interpolation and data gridding functions that operate
specifically on multidimensional data. Here is an example of NDGRID applied to an N-dimensional matrix.
You can build multidimensional cell arrays and multidimensional structure arrays, and can also convert
between multidimensional numeric and cell arrays.
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
4.8 Summary
An array is a collection of similar elements. These similar elements could be all integers or all floats or all
characters etc.
Array size must be declared using a constant value before initialization.
A single dimensional array will be useful for simple grouping of data that is relatively small in size.
Sorting is the process to arrange the array elements in ascending or descending order.
Arrays provide a simple mechanism for cases where more than one element of the same type is to be used.
4.9 Keywords
Array: An array is the collection of elements with contiguous memory allocation.
Binary Search: It is a scheme for locating a specific object in a large sorted set. Each object in the set is given
a key, which helps in searching for particular objects in the collection.
Bubble sort: It is a simple sorting algorithm that works by repeatedly stepping through the list to be sorted,
comparing each pair of adjacent items and swapping them if they are in the wrong order.
Insertion sort: It is one of the basic algorithms that exist among hundreds of sorting algorithms. It only
performs n–1 passes, where n is the number of elements to sort.
Linear Search: It is the most basic search algorithm you can have. A linear search sequentially moves
through a collection (or data structure) looking for a matching value.
Multi-Dimensional Arrays: Multidimensional arrays operate on the same principle as single-dimensional
arrays, extended to two or more dimensions (subscripts).
5.0 Objectives
After studying this chapter, you will be able to:
Discuss the concept of string and string variable
Explain the string input/output functions
Understand the arrays of strings
Define and declare string handling functions
5.1 Introduction
In C language, strings are stored in an array of char type along with the null terminating character "\0" at the
end. In other words, to create a string in C you create an array of chars and set each element in the array to a
char value that makes up the string. When sizing the string array you need to add one to the actual size of
the string to make space for the null terminating character, "\0".
Syntax to declare a string in C:
char fname[4];
The above statement declares a string called fname that can take up to 3 characters. It can be indexed just as a
regular array as well.
char fname[4] = {'t', 'w', 'o', '\0'};
Character t w o \0
ASCII Code 116 119 111 0
The last character is the null character having ASCII value zero.
5.2 Concepts of String and String Variable
Strings in C are represented by arrays of characters. The end of the string is marked with a special character,
the null character, which is simply the character with the value 0. (The null character has no relation except in
name to the null pointer. In the ASCII character set, the null character is named NUL.) The null or string-
terminating character is represented by another character escape sequence, \0.
Because C has no built-in facilities for manipulating entire arrays (copying them, comparing them, etc.), it also
has very few built-in facilities for manipulating strings.
In fact, C's only truly built-in string handling is that it allows us to use string constants (also called string
literals) in our code. Whenever we write a string enclosed in double quotes, C automatically creates an
array of characters for us, containing that string, terminated by the \0 character. For example, we can declare
and define an array of characters, and initialize it with a string constant:
char string[] = "Hello, world!";
In this case, we can leave out the dimension of the array, since the compiler can compute it for us based on the
size of the initializer (14, including the terminating \0). This is the only case where the compiler sizes a string
array for us, however; in other cases, it will be necessary that we decide how big the arrays and other data
structures we use to hold strings are.
To do anything else with strings, we must typically call functions. The C library contains a few basic string
manipulation functions, and to learn more about strings, we will be looking at how these functions might be
implemented.
Example:
/* to check whether a string is palindrome*/
#include <stdio.h>
#include <string.h>
#define FALSE 0
main()
{
int flag=1;
int right, left, n;
char w[50]; /* maximum width of string 50*/
puts("Enter string to be checked for palindrome");
gets(w);
n=strlen(w)-1;
for (left=0, right=n; left<=n/2; ++left, --right) {
if (w[left]!=w[right])
{
flag=FALSE;
break;
}
}
if (flag)
{
puts(w);
puts(" is a palindrome");
}
else
printf("%s is NOT a palindrome", w);
}
Output
Enter string to be checked for palindrome
palap
palap
is a palindrome
5.4 Arrays of Strings
We can define arrays of strings in two ways:
Single dimensional,
Two dimensional (Multidimensional).
The following single-dimensional example builds a string with strcat:
char string5[20] = "Hello, ";
char string6[] = "world!";
printf("%s\n", string5);
strcat(string5, string6);
printf("%s\n", string5);
Arrays of strings (arrays of character arrays) can be declared and handled in a similar manner to that described
for 2-D arrays. Consider the given example:
#include< stdio.h>
void main(void)
{
char names[2][8] = {"Frans", "Coenen"};
/* Output */
/* Output initials */
Here we declare a 2-D character array comprising two "rows" and 8 "columns". We then initialize this array
with two character strings. To output the array we need to index into each row; using the 2-D array name on
its own as a pointer causes only the first element ("row") to be produced. Note that we can still index
to individual elements using index pairs. The output from the above will be:
names = Frans, Coenen
names = Frans
Initials = F. C.
Caution
String comparison operators can be confusing when you are comparing numeric strings, if you are used to
thinking of them as numbers rather than strings. This can be a source of errors.
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
strlen()
Syntax: len = strlen(ptr);
where len is an integer and ptr is a pointer to char. strlen() returns the length of a string, excluding the null.
The following code will result in len having the value 13.
int len;
char str[15];
strcpy(str, "Hello, world!");
len = strlen(str);
strcpy()
Syntax: strcpy(ptr1, ptr2);
where ptr1 and ptr2 are pointers to char. strcpy() is used to copy a null-terminated string into a variable. Given
the following declarations, several things are possible.
char S[25];
char D[25];
Putting text into a string:
strcpy(S, "This is String 1.");
Copying a whole string from S to D:
strcpy(D, S);
Copying the tail end of string S to D:
strcpy(D, &S[8]);
Ensure that the source string is null-terminated; otherwise very strange and sometimes very ugly things may result.
strncpy()
Syntax: strncpy(ptr1, ptr2, n);
where n is an integer and ptr1 and ptr2 are pointers to char
strncpy() is used to copy a portion of a possibly null-terminated string into a variable. Care must be taken
because the '\0' is put at the end of the destination string only if it is within the part of the string being copied.
Given the following declarations, several things are possible.
char S[25];
char D[25];
Assume that the following statement has been executed before each of the remaining code fragments.
Putting text into the source string:
strcpy(S, "This is String 1.");
Copying four characters from the beginning of S to D and placing a null at the end:
strncpy(D, S, 4);
D[4] = '\0';
Copying two characters from the middle of string S to D:
strncpy(D, &S[5], 2);
D[2] = '\0';
Copying the tail end of string S to D:
strncpy(D, &S[8], 15);
which produces the same result as strcpy(D, &S[8]);
Caution
Be aware that strncpy will not automatically append a null terminator when the source is longer than n
characters, which means that you can go from a regular, null-terminated string to a non-null-terminated
string at the destination.
strcat()
Syntax: strcat(ptr1, ptr2);
where ptr1 and ptr2 are pointers to char
strcat() is used to concatenate a null-terminated string to end of another string variable. This is equivalent to
pasting one string onto the end of another, overwriting the null terminator. There is only one common use for
strcat().
char S[25] = "world!";
char D[25] = "Hello, ";
Concatenating the whole string S onto D:
strcat(D, S);
strncat()
Syntax: strncat(ptr1, ptr2, n);
where n is an integer and ptr1 and ptr2 are pointers to char
strncat() is used to concatenate a portion of a possibly null-terminated string onto the end of another string
variable. Care must be taken because some earlier implementations of C do not append the '\0' at the end of
destination string. Given the following declarations, several things are possible, but only one is commonly
used.
char S[25] = "world!";
char D[25] = "Hello, ";
Concatenating five characters from the beginning of S onto the end of D and placing a null at the end:
strncat(D, S, 5);
strncat(D, S, strlen(S) - 1);
Both would result in D containing "Hello, world".
strcmp()
Syntax: diff = strcmp(ptr1, ptr2);
where diff is an integer and ptr1 and ptr2 are pointers to char
strcmp() is used to compare two strings. The strings are compared character by character starting at the
characters pointed at by the two pointers. If the strings are identical, the integer value zero (0) is returned. As
soon as a difference is found, the comparison is halted and if the ASCII value at the point of difference in the
first string is less than that in the second (e.g. 'a' 0x61 vs. 'e' 0x65) a negative value is returned; otherwise, a
positive value is returned. Examine the following examples.
char s1[25] = "pat";
char s2[25] = "pet";
diff will have a negative value after the following statement is executed.
diff = strcmp(s1, s2);
diff will have a positive value after the following statement is executed.
diff = strcmp(s2, s1);
diff will have a value of zero (0) after the execution of the following statement, which compares s1 with itself.
diff = strcmp(s1, s1);
strncmp()
Syntax: diff = strncmp(ptr1, ptr2, n);
where diff and n are integers and ptr1 and ptr2 are pointers to char. strncmp() is used to compare the first n
characters of two strings. The strings are compared character by character starting at the characters pointed at
by the two pointers. If the first n characters are identical, the integer value zero (0) is returned. As soon as a
difference is found, the comparison is halted and if the ASCII value at the point of difference in the first string
is less than that in the second (e.g. 'a' 0x61 vs. 'e' 0x65) a negative value is returned; otherwise, a positive
value is returned. Examine the following examples.
char s1[25] = "pat";
char s2[25] = "pet";
diff will have a negative value after the following statement is executed.
diff = strncmp(s1, s2, 2);
diff will have a positive value after the following statement is executed.
diff = strncmp(s2, s1, 3);
diff will have a value of zero (0) after the following statement.
diff = strncmp(s1, s2, 1);
The following examples show the conditions that occur when handling single characters within strings:
char str[25] = "cot";
char ch = 'u';
char D[25] = "pat";
Replacing a single character using a char variable:
D[1] = ch;
This would result in D containing "put".
Replacing a single character using a char literal:
D[1] = 'e';
This would result in D containing "pet".
Replacing a single character using a single character from a string variable:
D[1] = str[1];
This would result in D containing "pot".
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
Ex2: Write a program to compare the first four characters of two strings.
……..………………………………………………………………………………………………………………
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
5.6 Summary
Strings in C are represented by arrays of characters.
The gets and puts are unformatted I/O functions; there are no format specifications associated with them.
Unlike printf and scanf, they handle only one string at a time, so to input or output more than one variable,
separate statements have to be written for each variable.
In the array of strings we defined the strings in two ways: Single dimensional and two dimensional
(Multidimensional).
In C, a string is stored as a null-terminated char array.
5.7 Keywords
Null-terminated String: A string is terminated by a special character called the null terminator or null
character (\0).
strcat(): It is used to concatenate a null-terminated string to end of another string variable. This is equivalent to
pasting one string onto the end of another, overwriting the null terminator.
String: C string is defined as an array of characters or a pointer to characters.
strlen(): It returns the length of a string, excluding the null terminator.
strncmp(): It is used to compare the first n characters of two strings. The strings are compared character by
character starting at the characters pointed at by the two pointers.
6.0 Objectives
After studying this chapter, you will be able to:
Discuss the basic concept of elements of user-defined functions
Explain the categories of functions
Define and passing parameters to functions
Understand about the arrays in functions
Explain the nesting of functions
Define the recursion
Explain the command line arguments
Discuss the storage classes
6.1 Introduction
In C, functions can be classified into two categories, namely, library functions and user-defined functions. main
is an example of a user-defined function. printf and scanf belong to the category of library functions. The main
distinction between user-defined and library functions is that the latter are not required to be written by the user
while the former have to be developed by the user at the time of writing a program. However, a user-defined
function can become a part of the C program library.
/*Example */
/* A function called many times */
#include <stdio.h>
main ( )
{
float a, b, c, d, sum1, sum2, sum3;
float add(float a, float b); /*function declaration*/
printf("enter 2 float numbers\n");
scanf("%f%f", &a, &b);
sum1 =add(a, b); /*function call*/
printf("enter 2 more float numbers\n");
scanf("%f%f", &c, &d);
sum2 =add(c, d); /*function call*/
sum3 =add(sum1, sum2); /*function call*/
printf("sum of %f and %f =%f\n", a, b, sum1);
printf("sum of %f and %f =%f\n", c, d, sum2);
printf("sum of %f and %f =%f\n", sum1, sum2, sum3);
}
/*function definition*/
float add (float c, float d) /*function declarator*/
{
float e;
e=c+d;
return e;
}
Result of program
enter 2 float numbers
1.5 3.7
enter 2 more float numbers
5.6 8.9
sum of 1.500000 and 3.700000 =5.200000
sum of 5.600000 and 8.900000 =14.500000
sum of 5.200000 and 14.500000 =19.700000
We have defined sum1, sum2 and sum3 as float variables.
We are calling function add three times with the following assignment statements:
sum1 =add(a, b);
sum2 = add(c, d);
sum3 = add(sum1, sum2);
Thus the program goes back and forth between main and add as given below:
main()
add(a, b)
main()
add(c, d)
main()
add (sum1, sum2)
main()
Had we not used the function add, we would have had to write the statements pertaining to add 3 times in the
main program; such a program would be large and difficult to read. In this method we have to code add only
once, and hence the program size is small. This is one of the reasons for the usage of functions.
In the Example, we could add another function call: add(10.005, 3.1125); This statement will also work
perfectly. After the function is executed, the sum will be returned to the main function. Therefore, both
variables and constants can be passed to a function by making use of the same function declaration.
We have used arguments to send values to the called function, in the same way we can also use arguments to
send back information to the calling function. The arguments that are used to send back data are called Output
Parameters.
It is a bit difficult for a novice because this type of function uses pointers. Let's see an example:
#include<stdio.h>
#include<conio.h>
void calc(int x, int y, int *add, int *sub) {
*add = x+y;
*sub = x–y;
}
void main()
{
int a=20, b=11, p,q;
clrscr();
calc(a,b,&p,&q);
printf("Sum = %d, Sub = %d",p,q);
getch();
}
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
Ex2: Define function that return multiple values.
……..………………………………………………………………………………………………………………
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
6.7 Recursion
Recursion is a powerful tool that can really simplify your code if you find that you have a problem that can be
solved by using it. A recursive function is one that calls itself one or more times. One example of recursive
functions is operations on binary trees. Binary trees are an advanced data structure, but all operations on them
are considered recursive in nature. Traversing linked lists is another recursive problem, but linked lists too are
an advanced topic.
Recursion is a powerful tool, but it takes a lot of careful planning so it can be difficult to implement, and many
programmers will simply pass it up because of this. In order to successfully implement a recursive function,
you must identify one or more exit conditions for stopping the recursive calls. If you do this wrong, your code
will enter an endless loop and cause a stack overflow because of all the function calls.
Recursion is never strictly necessary, and many never use it in practice because the time it takes to design a
recursive solution is often longer than just coding it iteratively. In these days where agile development is king, time is
everything. It is still good to know in case you do come across a problem recursive in nature, or see it in
somebody else‘s code.
Most examples of recursion are rather contrived, and to be honest this programmer has never used it in C since
learning it in college. Calculating factorials, tower of Hanoi, and the Sieve of Eratosthenes are common ones
for explaining recursion without going into too advanced of concepts. Calculating factorials is simple enough
to explain the concept so we are going to write a very quick sample to do just that.
You may remember the definition of factorials from math courses, you may not. The point to take home is that
the recursive function calc_factorial in calc_factorial.c calls itself until the base case is reached, at which point
it returns 1 instead of recursing again on n - 1.
#include <stdio.h>
int calc_factorial(int n);
int main() {
int i;
int n_values[5] = {1, 2, 5, 3, 9};
int factorials[5];
for (i = 0; i < 5; i++) {
factorials[i] = calc_factorial(n_values[i]);
}
for (i = 0; i < 5; i++)
{
printf("Factorial of %d is %d", n_values[i], factorials[i]);
printf("\n");
}
}
int calc_factorial(int n) {
int n_minus_one;
int next_n;
//Base case for exiting the recursion is a value of 1.
if (n <= 1) {
return 1;
} else
{
//Otherwise return the next iteration's n value.
n_minus_one = n - 1;
next_n = n * calc_factorial(n_minus_one);
return next_n;
}}
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
6.8 Command Line Arguments
In C it is possible to accept command line arguments. Command-line arguments are given after the name of a
program in command-line operating systems like DOS or Linux, and are passed in to the program from the
operating system. To use command line arguments in your program, you must first understand the full
declaration of the main function, which previously has accepted no arguments. In fact, main can actually
accept two arguments: one argument is the number of command line arguments, and the other argument is a full
list of all of the command line arguments.
The full declaration of main looks like this:
int main ( int argc, char *argv[] )
The integer, argc is the argument count. It is the number of arguments passed into the program from the
command line, including the name of the program.
The array of character pointers is the listing of all the arguments. argv[0] is the name of the program, or an
empty string if the name is not available. After that, every element number less than argc is a command line
argument. You can use each argv element just like a string, or use argv as a two dimensional array. argv[argc]
is a null pointer.
How could this be used? Almost any program that wants its parameters to be set when it is executed would use
this. One common use is to write a function that takes the name of a file and outputs the entire text of it onto
the screen.
#include <stdio.h>
int main ( int argc, char *argv[] )
{
if ( argc != 2 ) /* argc should be 2 for correct execution */
{
/* We print argv[0] assuming it is the program name */
printf( "usage: %s filename", argv[0] );
}
else
{
// We assume argv[1] is a filename to open
FILE *file = fopen( argv[1], "r" );
/* fopen returns 0, the NULL pointer, on failure */
if ( file == 0 )
{
printf( "Could not open file\n" );
}
else
{
int x;
/* read one character at a time from file, stopping at EOF, which indicates the end of the file. Note that the
idiom of "assign to a variable, check the value" used below works because the assignment statement
evaluates to the value assigned. */
while ( ( x = fgetc( file ) ) != EOF )
{
printf( "%c", x );
}
fclose( file );
}
}
}
This program is fairly short, but it incorporates the full version of main and even performs a useful function. It
first checks to ensure the user added the second argument, theoretically a file name. The program then checks
to see if the file is valid by trying to open it. This is a standard operation, and if it results in the file being
opened, then the return value of fopen will be a valid FILE*; otherwise, it will be 0, the NULL pointer. After
that, we just execute a loop to print out one character at a time from the file.
6.9.1 Automatic Variables
(i) Storage location: Except for register variables, the other three types will be stored in memory.
(ii) Scope: Auto variables are declared within a function and are local to the function. This means the value
will not be available in other functions.
Auto variables defined in different functions will be independent of each other, even if they have the same
name
Auto variables are local to the block in a function. If an auto variable is defined on top of the function after the
opening brace, then it is available for the entire function. If it is defined later in a block after another opening
brace, it will be valid only till the end of the block i.e., up to the corresponding closing brace.
Execute the program and you will get the following results:
x = 20 in the first block
y = 120 in the function
x = 120 after the return from f2
x = 10 after the first block
x = 30 in the second block
x = 10 after the second block
x = 5.555000 in the function
x = 10 after return from function will be 10
Caution
Be careful before using register variables in a program; the number of CPU registers is limited, and the compiler may ignore the request and treat the variable as an ordinary auto variable.
Example: The program presented in the figure converts the given temperature in Fahrenheit to Celsius
using the following conversion formula: C = (F - 32)/1.8
Program
/*******************************************************************/
/* FAHRENHEIT CELSIUS CONVERSION TABLE */
/*****************************************************************/
#define F_LOW 0 /*****************************/
#define F_MAX 250 /* SYMBOLIC CONSTANTS */
#define STEP 25 /*****************************/
main()
{
typedef float REAL ; /* TYPE DEFINITION */
REAL Fahrenheit, Celsius; /* DECLARATION */
Fahrenheit = F_LOW; /* INITIALIZATION */
printf("Fahrenheit Celsius\n\n");
while(Fahrenheit <= F_MAX )
{ Celsius = ( Fahrenheit - 32.0)/1.8;
printf(" %5.1f %7.2f\n", Fahrenheit, Celsius);
Fahrenheit = Fahrenheit + STEP;
}
}
Output
Fahrenheit Celsius
0.0 -17.78
25.0 -3.89
50.0 10.00
75.0 23.89
100.0 37.78
125.0 51.67
150.0 65.56
175.0 79.44
200.0 93.33
225.0 107.22
250.0 121.11
The program prints a conversion table for reading temperatures in Celsius, given the Fahrenheit values. The
minimum and maximum values and the step size are defined as symbolic constants.
These values can be changed by redefining the #define statements. A user-defined data type name REAL is
used to declare the variables Fahrenheit and Celsius.
The format specifications %5.1f and %7.2f in the second printf statement produce the two-column output
shown.
2. The reverse function reverses the number and sends it back to the ……. function.
(a) reverse (b) main
(c) add (d) None of above.
4. The arguments declared as part of the prototype are also known as ……. parameters.
(a) informal (b) formal
(c) multiple (d) single
5. Function definition consists of two parts i.e., function declarator and …………
(a). Function prototype (b). function body
(c). Function calling (d). None of these.
6.10 Summary
In C, functions are classified into two categories, namely, library functions and user-defined functions.
The elements of a user-defined function are the function declaration, the function definition, and the function call.
A function declaration is called a function prototype.
A recursive function is one that calls itself one or more times.
Command-line arguments are given after the name of a program in command-line operating systems like
DOS or Linux, and are passed in to the program from the operating system.
6.11 Keywords
Data type: Data type specifies the types of data stored in a variable.
Formal parameters: The arguments declared as part of the prototype are also known as formal parameters.
Function declarator: The function declarator is a replica of the function declaration.
Register variables: Register variables are a special case of automatic variables. Automatic variables are
allocated storage in the memory of the computer.
Static variables: Static variables are local to the functions and exist till the termination of the program.
main( )
{
int x = 10;
int y = 20;
int p,q;
p = prod(x,y);
q = prod (p,prod(x,2));
printf("%d %d\n", p,q);
}
prod(a,b)
int a,b;
{
return (a*b);
}
8. Write a function that will generate and print the first n Fibonacci numbers.
9. Distinguish between the following:
a) Global and local variables
b) Automatic and static variables
10. Which of the following function headers is invalid? And why?
a) Average (x,y,z);
b) Power (a, n–1)
c) product (m, 10)
d) double minimum (float a; float b;)
7.0 Objectives
After studying this chapter, you will be able to:
Discuss the concepts of pointer
Explain about the pointer variables
Understand how to declare and initialize the pointers
Discuss pointers on pointer
Explain the compatibility and application of pointers
Discuss the memory allocation functions and memory mapping
Explain the memory management functions
7.1 Introduction
Pointers are widely used in programming; they are used to refer to memory location of another variable
without using variable identifier itself. They are mainly used in linked lists and call by reference functions.
Figure 7.1 illustrates the concept of pointers. As you can see here; Yptr is pointing to memory address 100.
Figure 7.1: Concept of pointers.
Note: The 10th element will have the subscript 9, since the 1st element has the subscript 0.
Example: If the 10th element of an array of long double (assume 10 bytes per element) is stored from
location 2000 onwards, find the location of the 15th element and the first element.
The 1st element (subscript 0) will be stored at location 2000 - 9 * 10 = 1910.
The 15th element (subscript 14) will be stored at 1910 + 14 * 10 = 2050.
Let us now consider a pointer to an integer. Let the integer be mark. Then the address will be denoted as
&mark. Note that all addresses will be in integers for all data types. In the case of pointers, the address of mark
will be stored in another location. The pointer is a variable that contains the address of the variable. We can
assign the address of the integer to an integer pointer. Usually we declare: int mark; mark =75; we can also
declare int*ip; This means ip is a pointer to the integer. We can assign ip = &mark; i.e. we have assigned the
address of mark to ip. Let us pictorially explain this.
Address    Contents
1011       75       (mark)
1030       1011     (ip)
Here mark = 75 and the address of mark is 1011. Therefore ip = 1011. This value will also be stored at another
location 1030. Here ip points to an integer mark, and holds the address of mark. Since the pointer is also a
variable, it will be stored in another location. The * is called the indirection or dereferencing operator.
Similarly we can write
float f = 101.23;
float * fp;
fp = &f;
Here fp points to a float because we have assigned the address of f to fp. Remember that a pointer can point
to any type of variable, such as a float, char or int. Whatever it points to, the value a pointer holds is a memory
address, an integer-like quantity; the pointer's declared type records the type of data stored at that address.
It is necessary to become familiar with pointers. Therefore let us apply the concepts learnt.
We can have the definitions of the following types:
int i=204;
float f = 101.23;
int *ip;   /* ip is a pointer to integer */
float *fp; /* fp is a pointer to float */
This is carried out as follows:
ip = &i;
fp = &f;
By assigning the address of i to ip, ip points to the integer i. Similarly, fp points to the float f. Suppose we now assign:
i = 100;
*ip now automatically yields 100.
Similarly if we assign
f = 100.05; then *fp yields the new value. What actually happens? The variables i and f are assigned storage
locations; ip holds the address where i is stored, and fp holds the address of f. When we assign new values to i
and f, the values stored in ip and fp are not affected. They continue to point to i and f, but the values of i and f
have actually changed.
If we now add the following assignment statements
int a[5];
ip = &a[0];
We have defined an array of integer a with 5 elements. When we assign the address of a [0] i.e., the 0th
element of a to ip, ip will point to the array. The old assignment to ip is lost. It is irrecoverable.
We can also perform arithmetic operations on pointer variables, such as:
ip = ip + 5; /* pointer moved up by 5 locations */
ip = ip - 10; /* ip moved down by 10 locations */
ip--; /* pointer decremented */
ip++; /* pointer incremented */
(*ip)++; /* value incremented (note the parentheses: *ip++ would advance the pointer instead) */
(*ip)--; /* value decremented */
However, such operations on pointers are limited. We cannot carry out the following operations on pointers:
ip + fp; /* invalid */
ip * fp; /* invalid */
ip * 2; /* invalid */
fp / 10; /* invalid */
ip = ip * 10; /* invalid */
If we say ip = (int *) fp; (a cast is required, since ip and fp have different pointer types), then both fp and ip will
hold the same address, and hence ip will refer to the same location pointed to by fp.
Caution
We must assign pointers the addresses of specific integers and floats; otherwise they will not point to a valid value.
The fourth property of a variable is its data type. A pointer has all four properties of a variable. The value of
every pointer is a memory address, and memory addresses are integer-like quantities; they are not floats or any
other data type. The pointer itself, however, may point to an integer or a float or a character or a function, etc.
Pointers have a name. They have a value. For
instance the following is a valid declaration of a pointer to an integer.
int * ip;
Here ip is the name of a pointer. It points to or it contains the address of an integer, which is the value. It will
also be stored in, another location in memory like any other variables.
The first line declares qty and m as integer variables and q as a pointer variable pointing to an integer. The
second line assigns the value 165 to qty and the third line assigns the address of qty to the pointer variable q.
The fourth line contains the indirection operator *. When the operator * is placed before a pointer variable in
an expression, the expression yields the value of the variable pointed to. The * can be read as 'value at address'. Thus
the value of *q would be 165. The two statements
q = &qty;
m = *q;
are equivalent to
m = qty;
Exercise: Check Your Progress 1
Note: i) Use the space below for your answer.
Ex1: Define pointer.
……..………………………………………………………………………………………………………………
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
Example:
int *ptr; float *string;
Suppose num is an integer variable holding the value 45, stored at address 21260, and ptr holds that address.
The benefit over storing 21260 in a regular (non-pointer) variable is that we can also refer to the pointed-to
value as *ptr; the asterisk tells the computer that we are not interested in the value 21260 but in the value
stored in that memory location. While the value of ptr is 21260, the value of num is 45. We can also assign a
value through the pointer, as in *ptr = 45.
This places the value 45 in the memory address pointed to by the variable ptr. Since the pointer contains the
address 21260, the value 45 is placed in that memory location. And since this is the location of the variable
num, the value of num also becomes 45. This shows how we can change the value of a variable indirectly,
using a pointer and the indirection operator.
/* Program to display the contents of variables and their addresses using pointer variables */
#include <stdio.h>
int main()
{
int num, *intptr;
float x, *floptr;
char ch, *cptr;
num = 123;
x = 12.34;
ch = 'a';
intptr = &num;
cptr = &ch;
floptr = &x;
printf("Num %d stored at address %p\n", *intptr, (void *)intptr);
printf("Value %f stored at address %p\n", *floptr, (void *)floptr);
printf("Character %c stored at address %p\n", *cptr, (void *)cptr);
return 0;
}
Caution
Do not try to perform arithmetic such as division, multiplication, or modulus on pointers; only adding
(incrementing) and subtracting (differencing) pointers are acceptable.
The abundance of C operators is another cause of confusion that leads to errors. Expressions such as
*ptr++, *p[] and (*ptr).member should be used carefully, with a proper understanding of the precedence and
associativity rules.
Here is a simple and trivial example to give you a quick idea of how you might see calloc and realloc in action.
You will have many chances for malloc viewing as it is the most popular of the three by far.
#include <stdio.h>
#include <stdlib.h>
int main() {
char *ptr, *retval;
ptr = (char *)calloc(10, sizeof(char));
if (ptr == NULL)
printf("calloc failed\n");
else
printf("calloc successful\n");
retval = realloc(ptr, 5);
if (retval == NULL) {
printf("realloc failed\n");
free(ptr);      /* realloc failed: the original block is still valid */
} else {
printf("realloc successful\n");
free(retval);   /* realloc succeeded: the old pointer must not be freed again */
}
return 0;
}
First we declared two pointers and allocated a block of memory the size of 10 chars for ptr using the calloc
function. The second pointer retval receives the return value from the call to realloc. Then we reallocate the
block to 5 chars instead of 10. After checking whether that call succeeded, we free the memory exactly once:
if realloc succeeded, the old pointer ptr must not be freed again, since the block it pointed to has been taken
over by retval.
You can play around with the values of size passed to either of the memory allocation functions to see how big
a chunk you can ask for before it fails on you. Do not worry, your operating system has the ability to keep your
program in check, you will not hurt it this way.
The built-in memory management logic implements a nibble-allocation memory management algorithm that
provides superior performance to calling malloc and free directly. This algorithm causes memory blocks to be
allocated up front in larger sizes and then subsequently split up when future allocation requests are received.
These blocks can be reset and reused in applications that are constantly allocating and freeing memory.
The key memory management function that a user might use is the following:
rtxMemAlloc: This function allocates a block of memory in much the same way malloc would. The only
difference from the user's perspective is that a pointer to a context structure is required as an argument.
The allocated memory is tracked within this context.
rtxMemFreePtr: This function releases the memory held by a pointer in much the same way the C free
function would. The only difference from a user's perspective is that a pointer to a context structure is
required as an argument. This context must have been used in the call to rtxMemAlloc at the time the
memory was allocated.
rtxMemFree: This function releases all memory held within a context.
rtxMemReset: This function resets all memory held within a context. The difference between this and
the rtxMemFree function is that this function does not actually free the blocks that were previously
allocated. It only resets the pointers and indexes within those blocks to allow the memory to be reused.
rtxMemRealloc: This function works in the same way as the C realloc function. It reallocates an existing
block of memory. As in the other cases above, a pointer to a context structure is a required argument.
Note that these memory management functions are only used in the generation of C code, not C++ (although a
user can use them in a C++ application). For C++, the built-in new and delete operators are used to ensure
constructors and destructors are properly executed.
7.10 Summary
The memory locations are arranged in increasing order of addresses, starting from 0000 and increasing one
by one.
The & operator is used to obtain the address of a variable, for example in the scanf function.
A variable has four properties: name, value, address and data type.
A pointer is declared by specifying the type of data stored in the location identified by the pointer.
The compiler allocates a base address and a sufficient amount of storage to contain all the elements of the
array in contiguous memory locations.
Pointers provide enormous power and flexibility to programmers, but they can cause disasters if not
properly handled.
A string is an array of characters terminated with a null character. A void pointer is a C convention for a
raw address.
7.11 Keywords
Array: An array is a variable that holds multiple values of the same type.
Character string: A series of characters manipulated as a group. A character string differs from a name in that
it does not represent anything; a name stands for some other object.
Compiler: A compiler is a computer program that transforms human readable source code of another computer
program into the machine readable code that a CPU can execute.
Pointer: A pointer is a variable that contains the address of another variable.
Ragged arrays: The character arrays with the rows of varying length are called ragged arrays.
Void pointer: A void pointer is used for working with raw memory or for passing a pointer to an unspecified
type.
8.0 Objectives
After studying this chapter, you will be able to:
Understand the definition of structure
Explain the structures and functions
Discuss how to pass structures to functions
Explain passing structures through pointers
Define the uses of structures
Discuss the difference between structures and arrays
Differentiate between structure and unions
Explain the pointer to structures and derived data types
Define the derived data types
Discuss the enumerated data types
8.1 Introduction
Arrays and structures have similarities as well as differences. Both arrays and structures
represent collections of a number of items. While an array is a collection of items of the same data type, this
does not hold good for a structure. For instance, int x[10]; defines an array with dimension 10, i.e., 10 items,
all of the same data type, namely integers. However, structures can represent items of varying data types
pertaining to one entity. The only similarity between an array and a structure lies in the fact that there can be a
collection of structures, which is known as an array of structures.
The structure declaration above is similar to the prototype in a function in so far as memory allocation is
considered. The system does not allocate memory as soon as it finds structure declaration, which is for
information and checking consistency later on. The allocation of memory takes place only when structure
variables are declared. What is a structure variable? It is similar to other variables. For instance int i means that
i is an integer variable. Similarly the given is a structure variable declaration.
Here s1 is a variable of type struct book. Suppose we define:
struct book s1, s2;
This means that there are two variables s1 and s2 of type struct book. These variables can hold different values
for their members.
Another point to be noted is that the structure declaration appears above all other declarations.
An example which does nothing, but define structure and declare structure variables is given as:
main ()
{
struct book
{
char title [25];
char author [15];
char publisher [25];
float price;
unsigned year;
};
struct book s1, s2, s3;
}
If you want to define a large number of books, then how will you modify the structure variable declaration? It
will be as follows:
struct book s[1000];
This will allocate space for storing 1000 structures or records of books. However, how much storage space
will be allocated for each element of the array? It will be the sum of storage spaces required for each member.
In struct book the storage space required will be as given below:
title 25 + 1 (for null to indicate end of string)
author 15 + 1
publisher 25 + 1
price 4
year 2
Therefore the system allots space for 1000 structure variables, each with the above requirement. Space is
allocated only after seeing the structure variable declaration.
Let us take another example to make the concept clear. You know that the bank account of each account
holder is a record. Let us define a structure for it.
struct account
{
unsigned number;
char name [15];
int balance ;
} al, a2;
Instead of declaring separate structure variables such as struct account al, a2; we can use coding as in the
example given. Here the variables are declared just after the closing brace of the structure declaration and
terminated with a semicolon. This is perfectly correct. The declaration of the members of the structure is clear;
the balance has been declared as an integer instead of a float to make it simple. This means that the minimum
transaction is a rupee.
To access the members of a structure, you use the "." (member, or dot) operator. Shown here is an example of
how you can accomplish initialization by assigning values using the dot operator.
struct object player1;
player1.id = "player1";
player1.xpos = 0;
player1.ypos = 0;
struct second_structure_type {
double double_member;
struct first_structure_type struct_member;
};
The first structure type is incorporated as a member of the second structure type. You can initialize a variable
of the second type as follows:
struct second_structure_type demo;
demo.double_member = 12345.6789;
demo.struct_member.integer_member = 5;
demo.struct_member.float_member = 1023.17;
The member operator is used to access members of structures that are themselves members of a larger
structure. No parentheses are needed to force a special order of evaluation; a member operator expression is
simply evaluated from left to right.
In principle, structures can be nested indefinitely. Statements such as the following are syntactically
acceptable, but bad style.
my_structure.member1.member2.member3.member4 = 5;
What happens if a structure contains an instance of its own type, however? For example:
struct regression
{
int int_member;
struct regression self_member;
};
In order to compile a statement of this type, your compiler would theoretically need an infinite amount of
memory. In practice, you will simply receive an error message; with GCC, for instance, something along the
lines of "field 'self_member' has incomplete type".
8.5 Array of Structures
Let us now create an array of structures for the account. This is nothing but an array of accounts, declared with
size. Let us restrict the size to 5. The records will be created by using keyboard entry.
The program is given as:
/* program to demonstrate structures */
#include <stdio.h>
int main()
{
struct account
{
unsigned number;
char name[15];
int balance;
} a[5];
int i;
for (i = 0; i <= 4; i++)
{
printf("a/c no:=\t name:=\t balance:=\n");
scanf("%u%s%d", &a[i].number, a[i].name, &a[i].balance);
}
for (i = 0; i <= 4; i++)
{
printf("a/c No:= %u\t name:= %s\t balance:= %d\n", a[i].number, a[i].name, a[i].balance);
}
return 0;
}
Output:
a/c No:= name:= balance:=
1 suresh 5000
a/c No:= name:= balance: =
2. Lesley 3000
a/c No: = name: = balance: =
3. ahmed 5500
a/c No: = name: = balance: =
4. lakshmi 10900
a/c no: = name: = balance: =
5. Thomas 29000
a/c no: =1 name: =Suresh balance: =5000
a/c no: =2 name: =Lesley balance: =3000
a/c no: =3 name: =Ahmed balance: =5500
a/c no: =4 name: =Lakshmi balance: =10900
a/c no:=5 name: =Thomas balance: =29000
The structure array has been declared as part of the structure declaration as a[5]. You will see that the
individual elements of the 5 accounts are scanned and printed in the same order.
When we scan a name, we do not give the address but the actual name of the variable, as in a[i].name, since it
is a string (character array). Remember this uniqueness. This program basically gets the 5 structures, or records,
pertaining to 5 account holders. Thereafter, the details of the 5 accounts are printed using the for statement.
The first half of the result was typed by the user and the last 5 lines are the output of the program.
Caution
Be cautious about errors, because errors/bugs are very common while developing a program. If you do not
detect them and correct them, they cause a program to produce wrong results.
The union above could be used either to store the current time accurate to a second, or to hold time accurate
to a millisecond. Presumably there are times when you would want one or the other, but not both. This
declaration should look familiar: it is the same as a struct definition, but with the keyword union instead of
struct.
Caution
Assignment of a struct should not be confused with the requirement of memory management when dealing
with a pointer to a struct.
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
Ex2: What is the Difference between Structures and Arrays?
……..………………………………………………………………………………………………………………
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
The pointer r is a pointer to a structure. Note that r is a pointer, and therefore takes the same few bytes of
memory as any other pointer (typically four or eight). However, the malloc statement allocates 45 bytes of
memory from the heap. *r is a structure just like any other structure of type Rec. The following code shows
typical uses of the pointer variable:
strcpy((*r).name, "Leigh");
strcpy((*r).city, "Raleigh");
strcpy((*r).state, "NC");
printf("%s\n", (*r).city);
free(r);
You deal with *r just like a normal structure variable, but you have to be careful with the precedence of
operators in C. If you were to leave off the parentheses around *r, the code would not compile because the "."
operator has a higher precedence than the "*" operator. Because it gets tedious to type so many parentheses
when working with pointers to structures, C includes a shorthand notation that does exactly the same thing:
strcpy(r->name, "Leigh");
The r->name notation is exactly equivalent to (*r).name, but takes two fewer characters.
Example: Bookshop Inventory. A bookshop uses a personal computer to maintain the inventory of books
being sold at the shop. The list includes details such as author, title, price, publisher, stock position, etc.
Whenever a customer wants a book, the shopkeeper inputs the title and author of the book and the system
replies whether it is in the list or not. If it is not, an appropriate message is displayed. If the book is in the list,
the system displays the book details and asks for the number of copies. If the requested copies are available,
the total cost of the books is displayed; otherwise the message "Required copies not in stock" is displayed.
The program uses a template to define the structure of the book. Note that the date of publication, a member of
record structure, is also defined as a structure. When the title and author of a book are specified, the program
searches for the book in the list using the function
look_up(table, s1, s2, m)
The parameter table which receives the structure variable book is declared as type struct record. The
parameters s1 and s2 receive the string values of title and author while m receives the total number of books in
the list. Total number of books is given by the expression
sizeof(book)/sizeof(struct record)
The search ends when the book is found in the list and the function returns the serial number of the book;
the serial number of the first book in the list is zero. The program terminates when we respond "NO" to the question
Do you want other book?
Note that we use the function
get(string)
to get the title, author, etc. from the terminal. This enables us to input strings with spaces, such as "C Language".
We cannot use scanf to read this string since it contains two words. Since we are reading the quantity as a string
using the get(string) function, we have to convert it to an integer before using it in any expression. This is
done using the atoi() function.
#include <stdio.h>
#include <string.h>
#include <stdlib.h> /* for atoi() */
struct record
{
char author[20];
char title[30];
float price;
struct
{
char month[10];
int year;
}
date;
char publisher[10];
int quantity;
};
int look_up(struct record table[], char s1[], char s2[], int m);
void get(char string[]);
int main()
{
char title[30], author[20];
int index, no_of_records; char response[10], quantity[10];
struct record book[] = {
{"Ritche", "C Language", 45.00, "May", 1977, "PHI", 10},
{"Kochan", "Programming in C", 75.50, "July", 1983, "Hayden", 5},
{"Balagurusamy", "BASIC", 30.00, "January", 1984, "TMH", 0},
{"Balagurusamy", "COBOL", 60.00, "December", 1988, "Macmillan", 25}
};
no_of_records = sizeof(book) / sizeof(struct record);
do
{
printf("Enter title and author name as per the list\n");
printf("\nTitle: ");
get(title);
printf("Author: ");
get(author);
index = look_up(book, title, author, no_of_records);
if(index != -1) /* Book found */
{
printf("\n%s %s %.2f %s %d %s\n\n",
book[index].author,
book[index].title,
book[index].price,
book[index].date.month,
book[index].date.year,
book[index].publisher);
printf("Enter number of copies:");
get(quantity);
if(atoi(quantity) <= book[index].quantity) /* requested copies available */
printf("Cost of %d copies = %.2f\n", atoi(quantity),
book[index].price * atoi(quantity));
else
printf("\nRequired copies not in stock\n\n");
}
else
printf("\nBook not in list\n\n");
printf("\nDo you want any other book? (YES / NO):");
get(response);
}
while(response[0] == 'Y' || response[0] == 'y');
printf("\n\nThank you. Good bye!\n");
}
void get(char string[])
{
char c;
int i = 0;
do
{
c = getchar();
string[i++] = c;
}
while(c != '\n');
string[i - 1] = '\0';
}
int look_up(struct record table[], char s1[], char s2[], int m)
{
int i;
for(i = 0; i < m; i++)
if(strcmp(s1, table[i].title) == 0 && strcmp(s2, table[i].author) == 0)
return(i); /* book found */
return(-1); /* book not found */
}
Output
Enter title and author name as per the list
Title: BASIC
Author: Balagurusamy
Balagurusamy BASIC 30.00 January 1984 TMH
Enter number of copies:5
Required copies not in stock
Do you want any other book? (YES / NO):y
Enter title and author name as per the list
Title: COBOL
Author: Balagurusamy
Balagurusamy COBOL 60.00 December 1988 Macmillan
Enter number of copies:7
Cost of 7 copies = 420.00
Do you want any other book? (YES / NO):y
Enter title and author name as per the list
Title: C Programming
Author: Ritche
Book not in list
Do you want any other book? (YES / NO): n
Thank you. Good bye!
8.14 Summary
A structure is usually used when we wish to store dissimilar data together.
Structure elements can be accessed through a structure variable using a dot (.) operator.
It is possible to create an array of structure.
The data type of the component is any of the intrinsic data types, or a previously defined derived data type.
Derived data types and structures allow programmer to group different kinds of information that belong to
a single entity.
8.15 Keywords
Array of Structures: An array whose elements are structures; each element holds the members of the structure.
Nested structures: Structures can contain other structures as members; in other words, structures can nest.
Structure Elements: These are the members of the structures declared as different types.
Structure: A structure is a collection of one or more variables, possibly of different data types, grouped
together under a single name for convenient handling.
Unions: A union is a collection of variables of different types, like a structure, except that all members share the same storage, so only one member holds a value at a time.
9.0 Objectives
After studying this chapter, you will be able to:
Explain the file system basics
Understand about the standard streams in C
Define the file pointers
Explain the file handling functions
Discuss the getw and putw functions
Understand about the input/output operations on file
Explain working with strings using fputs() and fgets()
Define and declare the fprintf and fscanf functions
Explain the direct access file
9.1 Introduction
A file represents a sequence of bytes on the disk where a group of related data is stored. A file is created for
permanent storage of data. It is a ready-made structure. In C, we use a structure pointer of FILE type to declare a
file.
C provides a number of functions that help to perform basic file operations, such as fopen(), fclose(), getc(),
putc(), getw(), putw(), fprintf(), fscanf() and fseek().
Most operating systems allow the user to redirect the input or output of these streams to other files. Some
implementations run on hardware that lacks keyboards or screens; in that case the predefined streams may be
mapped to a serial line, printer or other file.
Suppose the input file in.list contains:
foo 70
bar 98
biz A+
If the format string expects a name and an integer, then on the last line fscanf() will not be able to read the
second value (since there is no integer to read) and it will not advance to the next line in the file. For this error,
fscanf() will not return EOF (it is not at the end of the file).
Errors like that will at least mess up how the rest of the file is read. In some cases, they will cause an infinite
loop. One solution is to test the number of values fscanf() reports reading each time. Since our
format is "%s %d", we expect it to read 2 values, so our condition could be fscanf(ifp, "%s %d", username, &score) == 2.
Now, if we get 2 values, the loop continues. If we do not get 2 values, either because we are at the end of the
file or some other problem occurred (e.g., it sees a letter when it is trying to read in a number with %d), then
the loop will end.
Another way to test for end of file is with the library function feof(). It just takes a file pointer and returns a
true/false value based on whether we are at the end of the file.
To use it in the above example, you would do:
while (!feof(ifp)) {
if (fscanf(ifp, "%s %d", username, &score) != 2)
break;
fprintf(ofp, "%s %d", username, score + 10);
}
Note that, like testing != EOF, it might cause an infinite loop if the format of the input file was not as
expected. However, we can add code to make sure it reads in 2 values.
When you use fscanf(...) != EOF or feof(...), they will not detect the end of the file until they try to read past it.
In other words, they would not report end-of-file on the last valid read, only on the one after it.
Caution
Not all functions in string.h preserve the '\0' terminator; this can have surprising results, particularly
when copying a string into a character array too small to hold it.
Exercise: Check Your Progress 1
Note: i) Use the space below for your answer.
Ex1: Define file handling.
……..………………………………………………………………………………………………………………
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
putw(integer,fp);
getw(fp);
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
When you press the Enter key at the beginning of a line, fgets() reads the newline and places it into the first
element of the array line. Use that fact to terminate the input loop. Encountering end-of-file also terminates it.
2. getc and putc functions are used to read and write values of type
(a) integer (b) string
(c) character (d) None of these.
fprintf(f1, "%s %d %f", name, age, 7.5);
Here name is an array variable of type char and age is an int variable.
The general format of fscanf is
fscanf(fp, "control string", list);
This statement causes the reading of items according to the control string.
Example:
fscanf(f2, "%s %d", item, &quantity);
Like scanf, fscanf also returns the number of items that are successfully read.
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
………..………………………………………………………………………………………………………...….
……………………………………………………………………………………………………………………..
9.11 Summary
C communicates with files using a new data type called a file pointer.
File I/O is done using a buffer to improve efficiency.
A file is a text file or a binary file depending upon its contents.
Library functions convert \n to \r\n, or vice versa, while writing to or reading from a file.
The fgets() function reads input through the first newline character.
9.12 Keywords
FILE: It is a file data structure and must be written in upper-case letters.
fprintf(), fputc() and fputs(): All these functions are just like the normal output functions.
fscanf(): It reads formatted data from a file.
fseek(): This function is used to move the file position to a desired location within the file.
Pointer_to_file: It is a pointer variable which holds the starting address of a data file.
10.0 Objectives
After studying this chapter, you will be able to:
Discuss the definition and properties of algorithm
Explain the flow chart symbols
Discuss the conversion of flow chart to language
Discuss the example of simple algorithms
Explain the program design
Define the errors in C programming
10.1 Introduction
When you write a program, you have to tell the computer every small detail of what to do. And you have to get
everything exactly right, since the computer will blindly follow your program exactly as written. How, then,
do people write any but the most simple programs? It is not a big mystery, actually. It is a matter of learning to
think in the right way. A program is an expression of an idea. A programmer starts with a general idea of a
task for the computer to perform. Presumably, the programmer has some idea of how to perform the task by
hand, at least in general outline. The problem is to flesh out that outline into a complete, unambiguous, step-
by-step procedure for carrying out the task. Such a procedure is called an "algorithm". (Technically, an
algorithm is an unambiguous, step-by-step procedure that terminates after a finite number of steps; we do not
want to count procedures that go on forever.) An algorithm is not the same as a program. A program is written
in some particular programming language. An algorithm is more like the idea behind the program, but it is the
idea of the steps the program will take to perform its task, not just the idea of the task itself. When describing
an algorithm, the steps do not necessarily have to be specified in complete detail, as long as the steps are
unambiguous and it is clear that carrying out the steps will accomplish the assigned task. An algorithm can be
expressed in any language, including English. Of course, an algorithm can only be expressed as a program if
all the details have been filled in.
As mentioned earlier, an algorithm can be analyzed in terms of time efficiency or space utilization. We will
consider only the former right now. The running time of an algorithm is influenced by several factors:
1) Speed of the machine running the program
2) Language in which the program was written. For example, programs written in assembly language
generally run faster than those written in C or C++, which in turn tend to run faster than those written in
Java.
3) Efficiency of the compiler that created the program
4) The size of the input: processing 1000 records will take more time than processing 10 records.
5) Organization of the input: if the item we are searching for is at the top of the list, it will take less time to
find it than if it is at the bottom.
The first three items in the list are problematic. We don't want to use an exact measurement of running time:
To say that a particular algorithm written in Java and running on a Pentium IV takes some number of
milliseconds to run tells us nothing about the general time efficiency of the algorithm, because the
measurement is specific to a given environment. The measurement will be of no use to someone in a different
environment. We need a general metric for the time efficiency of an algorithm; one that is independent of
processor or language speeds, or compiler efficiency.
The fourth item in the list is not environment-specific, but it is an important consideration. An algorithm will
run slower if it must process more data but this decrease in speed is not because of the construction of the
algorithm. It's simply because there is more work to do. As a result of this consideration, we usually express
the running time of an algorithm as a function of the size of the input. Thus, if the input size is n, we express
the running time as T(n). This way we take into account the input size but not as a defining element of the
algorithm.
Finally, the last item in the list requires us to consider another aspect of the input, which again is not part of the
actual algorithm. To account for this, we express timing analyses in terms of "worst case", "average case" or
"best case" based on the organization of the data, or the probability of finding an element quickly. For our
purposes in the following sections, we will assume a "worst case" organization (i.e., we will not worry about
the organization of the input for now).
How does the execution time of an algorithm grow as the input size grows?
Does one grow more quickly than the other – how much more? The ideal situation is one where the running
time grows very slowly as you add more input. So, rather than deal with exact values, we keep it general by
comparing the growth of the running time as the input grows, to the growth of known functions. The
following functions are the ones typically used:
input size n    1    log n    n        n log n    n^2      n^3      2^n
5               1    3        5        15         25       125      32
10              1    4        10       33         100      10^3     10^3
100             1    7        100      664        10^4     10^6     10^30
1000            1    10       1000     10^4       10^6     10^9     10^300
10000           1    13       10000    10^5       10^8     10^12    10^3000
For small inputs, the difference in running times would hardly be noticeable on a fast computer. For inputs of
100,000, however, (if we assume one microsecond per instruction) an algorithm comparable to the (n log n)
function will take about 1.7 CPU seconds, an n^2 algorithm will take about 2.8 CPU hours, which is
unacceptable, and an n^3 algorithm would take 31.7 CPU years. There is no way we would ever use such an
algorithm to deal with large inputs.
Statements 1, 2, and 3 are each executed once. Statements 5, 6, and 7 are each executed n times. Statement 4
(which controls the loop) is executed n + 1 times (one additional check is required – why?), and statement 8 is
executed once. This is summarized below:
Thus, the computing time for this algorithm in terms of input size n is: T(n) = 4n + 5. We can see intuitively,
that as n grows, the value of this expression grows linearly. We say T(n) has an "order of magnitude (or rate
of growth) of n". We usually denote this using big-Oh notation: T(n) = O(n), and we say that the algorithm
has a complexity of O(n). In some cases, we might also say the algorithm has a "time complexity" of O(n) to
distinguish the growth rate for the running time of the algorithm from the amount of memory, or space, that
the algorithm would use during its execution. Of course, intuition is not enough for us skeptical computer
science types – we must be able to show mathematically which of the standard functions given in the table
above indicates the correct rate of growth.
How should this expression be judged? Is this good or bad? If we double the number of strings to be sorted, the
computing time quadruples! If we increase it ten-fold, it takes 100 = 10^2 times longer until the program has
terminated! All this is caused by the term n^2. One says: sorting by minimum search has quadratic
complexity. This gives us a first indication that this method is unsuitable for large amounts of data because it
simply takes far too much time.
So it would be a fallacy here to say: "For a lot of money, we'll simply buy a machine which is twice as fast,
then we can sort twice as many strings (in the same time)." Theoretical running time considerations offer
protection against such fallacies.
The number of (machine) instructions which a program executes during its running time is called its time
complexity in computer science. This number depends primarily on the size of the program's input, that is
approximately on the number of the strings to be sorted (and their length) and the algorithm used. So
approximately, the time complexity of the program "sort an array of n strings by minimum search" is
described by the expression c·n^2.
c is a constant which depends on the programming language used, on the quality of the compiler or interpreter,
on the CPU, on the size of the main memory and the access time to it, on the knowledge of the programmer,
and last but not least on the algorithm itself, which may require simple but also time consuming machine
instructions. So while one can make c smaller by improvement of external circumstances (and thereby often
investing a lot of money), the term n^2, however, always remains unchanged.
Definition 1 (big-Oh): f(n) = O(g(n)) (read "f of n is big oh of g of n" or "f is big oh of g") if there is a
positive integer C such that f(n) <= C * g(n) for all positive integers n.
The basic idea of big-Oh notation is this: Suppose f and g are both real-valued functions of a real variable x.
If, for large values of x, the graph of f lies closer to the horizontal axis than the graph of some multiple of g,
then f is of order g, i.e., f(x) = O(g(x)). So, g(x) represents an upper bound on f(x).
Example 1:
Suppose f(n) = 5n and g(n) = n. To show that f = O(g), we have to show the existence of a constant C as given
in Definition 1. Clearly 5 is such a constant so f(n) = 5 * g(n). We could choose a larger C such as 6, because
the definition states that f(n) must be less than or equal to C * g(n), but we usually try and find the smallest
one. Therefore, a constant C exists (we only need one) and f = O(g).
Example 2:
In the previous timing analysis, we ended up with T(n) = 4n + 5, and we concluded intuitively that T(n) = O(n)
because the running time grows linearly as n grows. Now, however, we can prove it mathematically:
To show that f(n) = 4n + 5 = O(n), we need to produce a constant C such that:
4n + 5 <= C * n for all n.
If we try C = 4, this doesn't work because 4n + 5 is not less than or equal to 4n. We need C to be at least 9 to
cover all n. If n = 1, C has to be 9, but C can be smaller for greater values of n (if n = 100, C can be 5). Since
the chosen C must work for all n, we must use 9:
4n + 5 <= 4n + 5n = 9n
Since we have produced a constant C that works for all n, we can conclude:
T(n) = 4n + 5 = O(n).
Example 3:
Say f(n) = n^2. We will prove that f(n) ≠ O(n). To do this, we must show that there cannot exist a constant C
that satisfies the big-Oh definition. We will prove this by contradiction.
Suppose there is a constant C that works; then, by the definition of big-Oh: n^2 <= C * n for all n. Suppose n
is any positive real number greater than C; then: n * n > C * n, or n^2 > C * n. So there exists a real number n
such that n^2 > C * n. This contradicts the supposition, so the supposition is false: there is no C that can work
for all n, and f(n) ≠ O(n) when f(n) = n^2.
Example 4:
Suppose f(n) = n^2 + 3n - 1. We want to show that f(n) = O(n^2).
f(n) = n^2 + 3n - 1
     < n^2 + 3n          (dropping the -1 only makes the value larger)
     <= n^2 + 3n^2       (since n <= n^2 for all integers n >= 1)
     = 4n^2
Therefore, if C = 4, we have shown that f(n) = O(n^2). Notice that all we are doing is finding a simple function
that is an upper bound on the original function. Because of this, we could also say that f(n) = O(n^3), since n^3
is an upper bound on n^2. This would be a much weaker description, but it is still valid.
Example 5:
Show: f(n) = 2n^7 - 6n^5 + 10n^2 - 5 = O(n^7)
f(n) < 2n^7 + 6n^5 + 10n^2
     <= 2n^7 + 6n^7 + 10n^7
     = 18n^7
Thus, with C = 18, f(n) = O(n^7). In general, since n^j <= n^d whenever j <= d (for n >= 1),
we can change the exponents of all the terms to the highest degree (the original function must be less than this
too). Finally, we add the coefficients of these terms together to get the constant C; this gives a function that is
an upper bound on the original one.
Big-Oh, therefore is a useful method for characterizing an approximation of the upper bound running time of
an algorithm. By ignoring constants and lower degree terms, we end up with the approximate upper bound. In
many cases, this is sufficient. Sometimes, however, it is insufficient for comparing the running times of two
algorithms. For example, if one algorithm runs in O(n) time and another runs in O(n^2) time, you cannot be
sure which one is fastest for large n. Presumably the first is, but perhaps the second algorithm was not
analyzed very carefully. It is important to remember that big-Oh notation gives no tight information on how
good an algorithm is, it just gives an upper bound on how bad it can be.
There is another notation that is sometimes useful in the analysis of algorithms. To specify a lower bound on
the growth rate of T(n), we can use big-Omega (Ω) notation. If the lower bound running time for an algorithm
is g(n), then we say T(n) = Ω(g(n)). As an example, any algorithm with m inputs and n outputs that uses all the
inputs to generate the output would require at least Ω(m + n) work. In a rough sense, big-Omega notation tells
us that an algorithm requires at least this much time to run; hence it is a lower bound on the running time, or
alternatively can be thought of as the best case running time for the algorithm.
One last variation on this theme is big-Theta (Θ) notation. Big-Θ bounds a function from both above and
below, so two constants must be provided rather than one if we are doing formal proofs of the bounds of a
given function.
We'll discuss big-Omega and big-Theta notation a bit more below. In the meantime, there is one adjustment
that we must make to our definition of big-Oh. You may have noticed that we have been avoiding logarithms
in the discussion so far. We cannot avoid them for long, however, because many algorithms have a rate of
growth that matches logarithmic functions (a few of these include binary search, MergeSort, and QuickSort).
As a brief review, recall that log2 n is the number of times we have to divide n by 2 to get 1; or alternatively,
the number of 2's we must multiply together to get n:
n = 2^k  ⟺  log2 n = k
Many "Divide and Conquer" algorithms solve a problem by dividing it into 2 smaller problems, then into 2
even smaller problems. You keep dividing until you get to the point where solving the problem is trivial. This
constant division by 2 suggests a logarithmic running time.
Consider f(n) = 1. Since log(1) = 0, there is no constant C such that 1 <= C * log(n) for all n. Note, however,
that for n >= 2, it is the case that 1 <= log(n), and so the constant C = 1 works for sufficiently large n (larger
than 1). This suggests that we need a stronger definition of big-Oh than the one given previously.
Definition 2 (big-Oh, revised): f(n) = O(g(n)) if there are positive integers C and N such that f(n) <= C * g(n)
for all integers n >= N.
Using this more general definition for big-Oh, we can now say that if we have f(n) = 1, then f(n) = O(log(n))
since C = 1 and N = 2 will work.
With this definition, we can clearly see the difference between the three types of notation:
In all three graphs above, n0 is the minimal possible value to get valid bounds, but any greater value will work.
Figure A shows big-Θ, which bounds a function between two constant factors (i.e., giving a tight lower and
upper bound). We can write f(n) = Θ(g(n)) if there exist positive constants n0, c1 and c2 such that to the right
of n0, the value of f(n) always lies between c1 * g(n) and c2 * g(n) (inclusive). Thus, in order to prove that a
function is Θ(g(n)), we must prove that the function is both O(g(n)) and Ω(g(n)).
Figure B shows big-Oh, which gives an upper bound for a function to within a constant factor. We can write
f(n) = O(g(n)) if there exist positive constants n0 and c such that to the right of n0, f(n) always lies on or below
c * g(n).
Finally, Figure C shows big-Ω, which gives a lower bound for a function to within a constant factor. We can
write f(n) = Ω(g(n)) if there exist positive constants n0 and c such that to the right of n0, f(n) always lies on or
above c * g(n).
There is a handy theorem that relates these notations:
Theorem: For any two functions f(n) and g(n), f(n) = Θ(g(n)) if and only if f(n) = O(g(n)) and f(n) = Ω(g(n)).
Example 6:
Show: f(n) = 3n^3 + 3n - 1 = Θ(n^3)
As implied by the theorem above, to show this result, we must show two properties:
(i) f(n) = O(n^3)
(ii) f(n) = Ω(n^3)
First, we show (i), using the same techniques we've already seen for big-Oh. We consider N = 1, and thus we
only consider n >= 1 to show the big-Oh result.
f(n) = 3n^3 + 3n - 1
     < 3n^3 + 3n + 1
     <= 3n^3 + 3n^3 + n^3     (since n <= n^3 and 1 <= n^3 for all n >= 1)
     = 7n^3
Thus, with C = 7 and N = 1, we have shown that f(n) = O(n^3).
Next, we show (ii); here we need a constant C such that f(n) >= C * n^3 for all n >= N:
f(n) = 3n^3 + 3n - 1
     >= 3n^3 - 3n + 1         (since 3n - 1 >= -3n + 1 for all n >= 1)
     > 3n^3 - n^3             (since n^3 > 3n - 1 for any n >= 2)
     = 2n^3
Thus, with C = 2 and N = 2, we have shown that f(n) = Ω(n^3), since f(n) is shown to always be greater than
2n^3 for n >= 2.
Definition (Omega): Consider a function f(n) which is non-negative for all integers n >= 0. We say that
"f(n) is omega g(n)", which we write f(n) = Ω(g(n)), if there exists an integer n0 and a constant c > 0 such
that for all integers n >= n0, f(n) >= c * g(n).
The definition of omega is almost identical to that of big oh. The only difference is in the comparison: for big
oh it is f(n) <= c * g(n); for omega, it is f(n) >= c * g(n). All of the same conventions and caveats apply to
omega as they do to big oh.
Example:
Consider the function f(n) = 5n^2 - 64n + 256, which is shown in the figure. Clearly, f(n) is non-negative for
all integers n >= 0. We wish to show that f(n) = Ω(n^2). According to the definition, in order to show this
we need to find an integer n0 and a constant c > 0 such that for all integers n >= n0, f(n) >= c * n^2.
As with big oh, it does not matter what the particular constants are, as long as they exist! For example,
suppose we choose c = 1. Then
f(n) - c * n^2 = 5n^2 - 64n + 256 - n^2 = 4n^2 - 64n + 256 = 4(n - 8)^2 >= 0.
So, we have that for c = 1 and n0 = 0, f(n) >= c * n^2 for all integers n >= n0. Hence, f(n) = Ω(n^2). The graph
clearly shows that the function n^2 is less than the function f(n) = 5n^2 - 64n + 256 for all values of n.
Of course, there are many other values of c and n0 that will do. For example, c = 2 and n0 = 16.
10.5 Summary
A program is an expression of an idea. A programmer starts with a general idea of a task for the computer
to perform.
A computer is essentially a physical device designed to carry out a collection of primitive actions.
A program specification is a more formal and detailed description of the program's operation.
Top-down design is useful for defining the overall structure of a program, and the dependencies between
functions.
10.6 Keywords
Action: An action symbol contains a set of instructions in C code. The syntax is that of the C language.
Assembly language: An assembly language is a low-level programming language for computers,
microprocessors, microcontrollers, and other programmable devices.
Pseudocode: Pseudocode is an artificial and informal language that helps programmers develop algorithms.
Pseudocode is a "text-based" detail (algorithmic) design tool.
Top-down design: Top-down design is a methodology that starts at the highest level of a design concept and
proceeds towards the lowest level.
1.0 Objectives
After studying this chapter, you will be able to:
Explain the data structures
Discuss the operations of linked list
Explain the types of linked list
Discuss the header node of linked list
Explain the applications on linked lists
Evaluate a polynomial
The Static Segment holds the global variables and the Heap Segment holds the dynamic variables. Dynamic
memory allocation is done by using the malloc( ) function in C.
Output:
1 value of i is = 0
2 value of i is = 1
3 value of i is = 2
4 value of i is = 3
5 value of i is = 4
Caution
Array size must be declared using constant value before initialization.
#include <stdio.h>
#include <stdlib.h>
/* structure containing a data part and link part */
struct node
{
int data ;
struct node * link ;
};
void append ( struct node **, int ) ;
void addatbeg ( struct node **, int ) ;
void addafter ( struct node *, int, int ) ;
void display ( struct node * ) ;
int count ( struct node * ) ;
void del ( struct node **, int ) ;
int main( )
{
struct node *p ;
p = NULL; /* empty linked list */
Output:
14 30 25 42 17
777 888 99914 30 25 4217
777 888 999 1 14 30 99 25 42 17 0
No. of elements in the Linked List = 11
Element 10 not found
777 888 999 14 30 25 42 17 0
No. of elements in the linked list = 9
To begin with we have defined a structure for a node. It contains a data part and a link part. The variable p has
been declared as pointer to a node. We have used this pointer as pointer to the first node in the linked list. No
matter how many nodes get added to the linked list, p would continue to point to the first node in the list.
When no node has been added to the list, p has been set to NULL to indicate that the list is empty.
The append( ) function has to deal with two situations:
(a) The node is being added to an empty list.
(b) The node is being added at the end of an existing list.
In the first case, the condition
if (*q== NULL) gets satisfied. Hence, space is allocated for the node using malloc( ).
Data and the link part of this node are set up using the statements.
temp -> data = num;
temp -> link = NULL;
Lastly, p is made to point to this node, since the first node has been added to the list and p must always point to
the first node. Note that *q is nothing but p.
In the other case, when the linked list is not empty, the condition
if (*q == NULL)
would fail, since *q (i.e. p) is non-NULL. Now temp is made to point to the first node in the list through the
statement.
temp = *q ;
Then using temp we have traversed through the entire linked list using the statements.
while (temp -> link != NULL)
temp = temp -> link ;
The position of the pointers before and after traversing the linked list is shown in Figure 1.1.
Instead of showing the links to the next node we have shown the addresses of the next node in the link part of
each node.
For adding a new node at the beginning, firstly space is allocated for this node and data is stored in it through
the statement
temp -> data = num;
now we need to make the link part of this node point to the existing first node. This has been achieved through
the statement
temp -> link = *q;
Lastly, this new node must be made the first node in the list. This has been attained through the statement
*q = temp;
The addafter( ) function permits us to add a new node after a specified number of nodes in the linked list.
To begin with, through a loop we skip the desired number of nodes after which a new node is to be added.
Suppose we wish to add a new node containing data as 99 after the third node in the list. The position of
pointers once the control reaches outside of for loop is shown in Figure 1.4. Now space is allocated for the
node to be inserted and 99 is stored in the data part of it.
Figure 1.4: Position of pointers once the control reaches outside of for loop.
All that remains to be done is readjustment of links such that 99 goes in between 30 and 25. This is achieved
through the statements.
r -> link = temp -> link;
temp -> link = r;
The first statement makes link part of node containing 99 to point to the node containing 25. The second
statement ensures that the link part of node containing 30 points to the node containing 99. In execution of the
second statement the earlier link between 30 and 25 is severed. So now 30 no longer points to 25, it points to
99. The display( ) and count( ) functions are straight forward. We leave them for you to understand.
That brings us to the last function in the program i.e. del( ). In this function through the while loop, we have
traversed through the entire linked list, checking at each node whether it is the node to be deleted. If so, we have
checked if the node being deleted is the first node in the linked list. If it is so, we have simply shifted p (which
is same as *q) to the next node and then deleted the earlier node.
If the node to be deleted is an intermediate node, then the position of various pointers and links before and
after the deletion is shown in Figure 1.5.
Figure 1.5: Position of various pointers and links before and after the deletion.
Caution
Make sure that your linked list functions work sensibly with the empty list. A function that fails for the empty
list is a common source of bugs.
/* merges the two linked lists, restricting the common elements to occur only once in the final list */
void merge ( struct node *p, struct node *q, struct node **s)
{
struct node *z;
z = NULL;
/* traverse both linked lists till the end. If end of any one list is reached loop is terminated */
while ( p!= NULL && q!= NULL)
{
/* if the node being added is the first node */
if (*s == NULL)
{
*s = (struct node*) malloc (sizeof (struct node));
z=*s;
}
else
{
z -> link =(struct node*) malloc (sizeof (struct node));
z=z->link;
}
if (p-> data < q-> data)
{
z->data=p->data;
p=p->link;
}
else
{
if (q-> data < p-> data)
{
z->data=q->data;
q=q->link;
}
else
{
if (p->data ==q->data)
{
z -> data = q -> data ;
p=p->link;
q=q->link;
}
}
}
}
/* if end of first list has not been reached */
while (p!= NULL)
{
z -> link = (struct node*) malloc (sizeof ( struct node));
z = z -> link ;
z -> data = p -> data ; p = p -> link ;
}
/* if end of second list has not been reached */
while (q!= NULL)
{
z -> link = (struct node*) malloc (sizeof ( struct node));
z = z -> link ;
z -> data = q -> data ; q = q -> link ;
}
z -> link = NULL ;
While merging the two lists it is assumed that the lists themselves are in ascending order. While building the
two lists the add( ) function makes sure that when a node is added the elements in the lists are maintained in
ascending order.
The function merge( ) receives three parameters. The first two parameters p and q are of the type struct node *
which point to the two lists that are to be merged. The third parameter is of the type struct node ** which
holds the address of pointer third which is a pointer to the resultant merged list. Before calling merge( ) third
contains a NULL value.
First of all we check if both the lists that are to be merged, are empty or not. If the lists are empty then the
control simply returns from the function. Otherwise, a loop is executed to traverse the lists that are pointed to
by p and q. If end of any of the list is reached then the loop is terminated.
To begin with, a NULL value is stored in z, which is going to point to the resultant merged list. Inside the
while loop, we check the special case of adding the first node to the merged list pointed to by z. If the node
being added is the first node then z is made to point to the first node of the merged list through the statement
z = *s;
Next, the data from both the lists are compared and whichever is found to be smaller is stored in the data part
of the first node of the merged list. The pointers that point to the merged list and to the list from where we
copied the data are incremented appropriately.
During the next iteration of the while loop, the condition for the first node fails and we reach the else block. Here
we allocate the memory for the new node and its address is stored in z -> link. Then z is made to point to this
node, through the statement
z = z -> link;
While comparing the data, if we find that the data of both the lists are equal then the data is added only once to
the merged list and pointers of all the three lists are incremented. This is done through the statements.
if (p -> data == q-> data)
{
z -> data = q -> data;
p = p -> link;
q = q -> link;
}
The procedure of comparing, adding the data to the merged list and incrementing the pointer of the merged list
and the list from where the data is added is repeated till any of the list ends.
If we reach the end of the first and/or second list, the while loop terminates. If we have reached the end of only one list, then
the remaining elements of the other list are simply dumped in the merged list as they are already in the
ascending order. The working of the merge function is shown in Figure 1.6. Figure 1.7(a, b, c, d, e) shows the
steps to merging two linked lists.
Example: The program shows the implementation of a singly-linked list consisting of four nodes. The program
displays the value present in each node.
#include<stdio.h>
struct new_list
{
int value;
struct new_list *next_element;
} n1, n2, n3, n4; //Creates four nodes of type new_list
void main()
{
int j;
n1.value = 200; //Assigning value to node1
n2.value = 400; //Assigning value to node2
n3.value = 600; //Assigning value to node3
n4.value = 800; //Assigning value to node4
n1.next_element = &n2; //Assigning address of node2 to node1
n2.next_element = &n3; //Assigning address of node3 to node2
n3.next_element = &n4; //Assigning address of node4 to node3
n4.next_element = 0; //Assigning 0 to node4 to indicate the end of the list
j = n1.next_element->value; //Storing the value of node2 in variable j
printf("%d\n", j); /* you can use this statement to print the value present in node2, or print the expression
directly as depicted in the statement below */
printf("%d\n", n1.next_element->value); //Printing the value of node2
printf("%d\n", n2.next_element->value); //Printing the value of node3
printf("%d\n", n3.next_element->value); //Printing the value of node4 (n4.next_element is 0 and must not be dereferenced)
}
Output:
After you compile the program, you will get the following output:
400
400
600
800
In this example:
1. First a structure named new_list is created. The list contains an integer data variable named value to store
data and a pointer variable named next_element to point to next node.
2. Then, four objects namely, n1, n2, n3, and n4 are created to access the structure elements. In the program
they act as nodes in a list.
3. In the main () function, the value for the four nodes n1, n2, n3, and n4 are assigned.
4. Then, the address of n2 is stored in n1, address of n3 is stored in n2, and address of n4 is stored in n3. The
address of n4 is assigned zero to depict the end of the list.
5. Finally, the values of n2 (printed twice, once through j and once directly), n3, and n4 are printed.
Example: The program shows the implementation of a doubly-linked list consisting of three nodes. The
program displays the value present in each node.
#include<stdio.h>
struct list
{
int value;
struct list *next; //Creating a pointer to point to the next element
struct list *previous;//Creating a pointer to point to the previous element
} n1, n2, n3; //Creating three nodes of type list
void main()
{
int j;
n1.value = 20; //Assigning value to node1
n2.value = 40; //Assigning value to node2
n3.value = 60; //Assigning value to node3
n1.next = &n2; //Assigning address of node2 to node1
n2.next = &n3; //Assigning address of node3 to node2
n2.previous = &n1; //Assigning address of node1 to node2
n3.previous = &n2; //Assigning address of node2 to node3
n3.next = 0; //Assigning 0 to node3 to indicate the end of the list
n1.previous = 0; //Assigning 0 to node1 to indicate there are no elements present before node1
j = n1.next->value; //Storing the value of node2 in variable j
printf ("%d\n", j);
printf ("%d\n", n1.next->value); // you can use this statement to print the value of node2 or print j
directly as depicted in the above statement
printf ("%d\n", n2.next->value); //Printing the value of node3
printf ("%d\n", n2.previous->value); //Printing the value of node1
printf ("%d\n", n3.previous->value); //Printing the value of node2
}
Output:
After you compile the program, you will get the following output:
40
40
60
20
40
In this example:
1. First, a structure named list is created. The list contains an integer data variable named value to store data,
a pointer variable named next to point to the next node, and a pointer variable named previous to point to
the previous node.
2. Then, the three objects namely, n1, n2, and n3 are created to access the structure elements. In the program
they act as nodes in a list.
3. In the main () function, the value for nodes n1, n2, and n3 are assigned.
4. Then, the address of n2 is stored in n1 and the address of n3 is stored in n2. In order to traverse backwards,
the address of n1 is stored in n2 and the address of n2 is stored in n3. The next pointer of n3 is assigned 0
to depict the end of the list.
5. Finally, the value present in n1, n2, and n3 are printed.
Example: The program shows the implementation of a circular-linked list consisting of three nodes. The
program displays the value present in each node.
#include<stdio.h>
struct list
{
int value;
struct list *next_element;
} n1, n2, n3; //Creates three nodes of type list
void main()
{
int j;
n1.value = 35; // Assigning value to node1
n2.value = 65; // Assigning value to node2
n3.value = 85; //Assigning value to node3
n1.next_element = &n2; //Assigning address of node2 to node1
n2.next_element = &n3; //Assigning address of node3 to node2
n3.next_element = &n1; //Assigning address of node1 to node3, making the list circular
j = n1.next_element->value; //Storing the value of node2 in variable j
printf("%d\n", j); //Printing the value of j
/* you can use this statement to print the value present in node2 */
printf("%d\n", n1.next_element->value);
printf("%d\n", n2.next_element->value); //Printing the value of node3
printf("%d\n", n3.next_element->value); //Printing the value of node1
}
Output:
After you compile the program, you will get the following output:
65
65
85
35
In this example:
1. First, a structure named list is created. The list contains an integer data variable named value to store data
and a pointer variable named next_element to point to next node.
2. Then, the three objects namely, n1, n2, and n3 are created to access the structure elements. In the program
they act as nodes in a list.
3. In the main () function, the value for nodes n1, n2 and n3 are assigned.
4. Then, the address of n2 is stored in n1 and the address of n3 is stored in n2. Since it is a circular list, the
address of n1 is assigned to n3 instead of a NULL value.
5. Finally, the value present in n1, n2, and n3 are printed.
Example: The program shows the implementation of a circular doubly-linked list consisting of three nodes and
a HEAD node. The program displays the value present in each node.
#include<stdio.h>
struct list
{
int value;
struct list *next; //Creating a pointer to point to the next element
struct list *previous; //Creating a pointer to point to the previous element
} n1, n2, n3, h; //Creates four nodes of type list
void main()
{
int j;
n1.value = 10; //Assigning value to node1
n2.value = 15; //Assigning value to node2
n3.value = 20; //Assigning value to node3
h.value = 3; //Assigning value to HEAD node
n1.next = &n2; //Assigning address of node2 to node1
n2.next = &n3; //Assigning address of node3 to node2
n3.next = &h; //Assigning address of HEAD node to node3
h.next = &n1; //Assigning address of node1 to HEAD node
n1.previous = &h; //Assigning address of HEAD node to node1
n2.previous = &n1; //Assigning address of node1 to node2
n3.previous = &n2; //Assigning address of node2 to node3
h.previous = &n3; //Assigning address of node3 to HEAD node
j = n1.next->value; //Storing the value of the node n1 points to in variable j
printf("%d\n", j);
printf("%d\n", n1.next->value); // you can use this statement to print that value, or print j directly as depicted in the above statement
printf("%d\n", n2.next->value); //Printing the value of node2
printf("%d\n", n3.next->value); //Printing the value of node3
printf("%d\n", h.next->value); //Printing the value of HEAD node
printf("%d\n", n1.previous->value); //Printing the previous value of node1
printf("%d\n", n2.previous->value); //Printing the previous value of node2
printf("%d\n", n3.previous->value); //Printing the previous value of node3
printf("%d\n", h.previous->value); //Printing the previous value of HEAD node
}
Output:
After you compile the program, you will get the following output:
15
20
3
10
3
10
15
20
In this example:
1. First, a structure named list is created. The list contains an integer data variable named value to store data,
a pointer variable named next to point to the next node, and a pointer variable named previous to point to
the previous node.
2. Then, four objects, namely n1, n2, n3, and h, are created to access the structure elements. In the
program, they act as nodes in a list. The HEAD node (h) contains the total number of nodes present in the
list.
3. In the main() function, the values for the nodes n1, n2, n3, and h are assigned.
4. Then, the address of n2 is stored in n1 and the address of n3 is stored in n2. To allow backward traversal,
the previous pointers are set as well; the address of h is stored in n3 and the address of n1 is stored in h,
making the list circular.
5. Finally, the values present in n1, n2, n3, and h are printed.
……..………………………………………………………………………………………………………………
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
Ex2: What are the advantages of a doubly linked list over a singly linked list?
……..………………………………………………………………………………………………………………
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
Self Assessment Questions
3. Which of the following operations is performed more efficiently by a doubly linked list than by a singly linked list?
(a) Deleting a node whose location is given.
(b) Searching an unsorted list for a given item.
(c) Inserting a new node after a node whose location is given.
(d) Traversing the list to process each node.
6. Linked lists are not suitable data structures for which one of the following problems?
(a) Insertion sort (b) Binary search (c) Radix sort (d) Polynomial manipulation
1.6.2 Traversal
Now we want to see the information stored inside the linked list. We create a node pointer *temp1 and transfer
the address stored in *head to *temp1, so *temp1 also points at the front of the linked list. Suppose the linked list has 3 nodes.
We can get the data from the first node using temp1->data. To get data from the second node, we shift *temp1 to the
second node, and then we can read its data. Figure 1.13 shows the traversal of a linked list.
while ( temp1 != NULL )
{
printf("%d\n", temp1->data); // show the data in the linked list
temp1 = temp1->next; // transfer the address of 'temp1->next' to 'temp1'
}
This process will run until the linked list's next pointer is NULL.
Now node *temp1 points at the last node and *old_temp points at the node just before it.
The rest of the work is very simple. The next pointer of old_temp is set to NULL, so old_temp becomes the
last node of the linked list, and the space allocated for the old last node is freed (see Figure 1.17).
old_temp->next = NULL; // previous node of the last node is null
free(temp1);
Lab Exercise
1. Write a C program to store 20 integers in linked list in descending order.
2. Write a C program to evaluate a third degree polynomial.
#include <stdio.h>
#include <stdlib.h>
typedef struct Node {
unsigned char c;
struct Node *next;
}Node;
typedef Node *slist;
slist reverse(slist);
Node *makeNode(unsigned char);
/*
*/
……..………………………………………………………………………………………………………………
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
We assume the coefficients to be representable in the floating-point number system of the host computer.
The algorithm achieves maximum accuracy, even in the neighborhood of a root where cancellation dooms an
ordinary floating-point evaluation.
with maximum accuracy. Except for the special cases and , which can be
calculated directly, an iterative solution method is used. We first compute a floating-point approximation of
. We then carry out a residual iteration by solving a linear system of equations. The new solution interval
determined in the next step is checked for being of maximum accuracy, i.e. for being exact to one unit in the
last place of the mantissa (1 ulp).
The following C program is a correct one, but the point to emphasize is the trouble caused by the use of global
variables. Let us go through the C code first:
#include<stdio.h>
#include<conio.h>
int horner(int,int);
int count=0;
void main()
{
/*Horner‘s rule for evaluating a polynomial */
/* let us take a0=0,a1=1,a2=2.. and so on */
int n,x,h=0; //n is order, x is value of X in polynomial.
scanf("%d %d",&n,&x);
h=horner(n,x);
printf("%d",h);
getch();
}
int horner(int n, int x)
{
int i;
if(count!=n)
{
i=count;
count++;
printf("%d+%d*(", i, x);
return (i + x*horner(n, x));
}
else
{
printf("%d))=",count);
return count;
}
}
Initially there was a big problem with the above program when the local variable 'i' was not used in the
'horner' function. You can check the output with and without the use of 'i'. The problem was that all the return
statements were evaluated only after the last recursive call to 'horner' had completed, and in the
meanwhile the global variable 'count' changed every time 'horner' was entered. Since each return
statement depends upon 'count', the output deviated from the expected value depending upon
the size of the input. Hence the local variable 'i' came into existence.
GetNth()
Write a GetNth() function that takes a linked list and an integer index and returns the data value stored in the
node at that index position. GetNth() uses the C numbering convention that the first node is index 0, the
second is index 1, ... and so on. So for the list {42, 13, 666} GetNth() with index 1 should return 13. The index
should be in the range [0…length-1]. If it is not, GetNth() should assert() fail (or you could implement some
other error-case strategy).
void GetNthTest() {
struct node* myList = BuildOneTwoThree(); // build {1, 2, 3}
int lastNode = GetNth(myList, 2); // returns the value 3
}
Essentially, GetNth() is similar to an array[i] operation — the client can ask for elements by index number.
However, GetNth() on a list is much slower than [ ] on an array. The advantage of the linked list is its much
more flexible memory management — we can Push() at any time to add more elements and the memory is
allocated as needed.
// Given a list and an index, return the data
// in the nth node of the list. The nodes are numbered from 0.
// Assert fails if the index is invalid (outside 0…length-1).
int GetNth(struct node* head, int index) {
// Your code
}
DeleteList()
Write a function DeleteList() that takes a list, deallocates all of its memory and sets its head pointer to NULL
(the empty list).
void DeleteListTest() {
struct node* myList = BuildOneTwoThree(); // build {1, 2, 3}
DeleteList(&myList); // deletes the three nodes and sets myList to NULL
}
The DeleteList() implementation will need to use a reference parameter just like Push() so that it can change
the caller‘s memory (myList in the above sample). The implementation also needs to be careful not to access
the .next field in each node after the node has been deallocated.
void DeleteList(struct node** headRef) {
// Your code
}
1.9 Summary
A data structure is an arrangement of data in a computer‘s memory or even disk storage.
A list is a collection of scalar variables (possibly heterogeneous), which are arranged in an order.
An array is a data structure of multiple elements with the same data type. Array elements are
accessed using subscript.
A linked list is one of the fundamental data structures used in computer programming.
A linked list is called a self-referential data structure.
Linked lists are used as a building block for many other data structures, such as stacks, queues
and their variations.
1.10 Keywords
Association lists: Linked lists are used to implement associative arrays, and are in this context called
association lists.
Doubly-linked lists: Node has two links, one to the previous node and one to the next.
Linear linked lists: It has one link per node.
Linked list: It is one of the fundamental data structures used in computer programming.
Pointer: A pointer is a variable that contains the address of a variable.
2.0 Objectives
After studying this chapter, you will be able to:
Discuss the concept of stacks
Explain the basic operations of stack
Discuss the array implementation of a stack
Define the stack as a linked list
Explain the stack as an abstract data structure
Discuss the applications of stacks
2.1 Introduction
Stacks are simple data structures and important tools in programming language. Stacks are linear lists which
have restrictions on the insertion and deletion operations. These are special cases of ordered list in which
insertion and deletion is done only at the ends. The basic operations performed on stack are push and pop. The
stack implementation can be done in two ways - static implementation or dynamic implementation. Stack can
be represented in the memory using a one-dimensional array or a singly linked list.
A stack is simply a list of elements with insertions and deletions permitted at one end, called the stack top. This
means that elements are removed from a stack in the reverse order of their insertion into
the stack. Thus, a stack data structure exhibits the LIFO (last in first out) property. Push and pop are the
operations that are provided for insertion of an element into the stack and the removal of an element from the
stack, respectively. Shown in Figure 2.2 are the effects of push and pop operations on the stack.
Figure 2.2 shows the push operation in a stack. The stack has two elements, 45 and 36. The 'Top' points to '36'
as it is the last item in the stack. Element 52 is added to the stack through the push operation. The 'Top' points to
'52' after the push operation as it is the item most recently added. After adding 52, the stack is full, i.e. it is in the
stack overflow condition. No more items can be added to this stack. The syntax used for the push operation is
PUSH (stack, item).
Figure 2.3 shows the pop operation in a stack. The stack initially has three items, 25, 37 and 18. The 'Top'
points to the last item, 18. After the pop operation, item 18 is deleted from the stack. Now, the 'Top' points to 37.
The syntax used for the pop operation is POP (stack).
case 1:
printf("\tElement to be Pushed :");
scanf("%d",&val);
push(val); //value to be added of int data type and return void
break;
case 2:
val=pop();
if(val!= -1) //check the condition val is not equal to -1 in case of underflow
printf("\tPopped Element : %d\n",val);
break;
case 3:
display();
break;
case 4:
break;
default:
printf("\tWrong Choice");
break;
}
}while(choice!=4);
return 0;
}
step 2: The main( ) method of the program is called. //entry level of the program
If the user enters choice 3, case 3 is selected and the display function is called. //returns void.
Go to step 1.
If the user enters choice 4, case 4 at once terminates the program and returns 0.
If the user enters any other number, the message "Wrong Choice" is displayed.
Step 6: Repeat step 5
/*End of program*/
……..………………………………………………………………………………………………………………
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
2.5 Stack as a Linked List
Initially the list is empty, so the top pointer is NULL. The push function takes a pointer to an existing list as
the first parameter and a data value to be pushed as the second parameter, creates a new node by using the data
value, and adds it to the top of the existing list. A pop function takes a pointer to an existing list as the first
parameter, and a pointer to a data object in which the popped value is to be returned as a second parameter.
Thus it retrieves the value of the node pointed to by the top pointer, makes the top pointer point to the next node, and
destroys the node that was previously pointed to by the top.
If this strategy is used for creating a stack with the previously used four data values: 10, 20, 30, and 40, then
the stack is created as shown in Figure 2.4.
Program: A complete C program for implementation of a stack using the linked list is given here:
# include <stdio.h>
# include <stdlib.h>
struct node
{
int data;
struct node *link;
};
struct node *push(struct node *p, int value)
{
struct node *temp;
temp=(struct node *)malloc(sizeof(struct node));
/* creates new node
using data value
passed as parameter*/
if(temp==NULL)
{
printf("No Memory available Error\n");
exit(0);
}
temp->data = value;
temp->link = p;
p = temp;
return(p);
}
struct node *pop(struct node *p, int *value)
{
struct node *temp;
if(p==NULL)
{
printf("The stack is empty cannot pop Error\n");
exit(0);
}
*value = p->data;
temp = p;
p = p->link;
free(temp);
return(p);
}
void main()
{
struct node *top = NULL;
int n,value;
do
{
do
{
printf("Enter the element to be pushed\n");
scanf("%d",&value);
top = push(top,value);
printf("Enter 1 to continue\n");
scanf("%d",&n);
} while(n == 1);
printf("Enter 1 to pop an element\n");
scanf("%d",&n);
while( n == 1)
{
top = pop(top,&value);
printf("The value popped is %d\n",value);
printf("Enter 1 to pop an element\n");
scanf("%d",&n);
}
printf("Enter 1 to continue\n");
scanf("%d",&n);
}
while(n == 1);
}
Example: Input and Output
Enter the element to be pushed
10
Enter 1 to continue
1
Enter the element to be pushed
20
Enter 1 to continue
0
Enter 1 to pop an element
1
The value popped is 20
Enter 1 to pop an element
1
The value popped is 10
Enter 1 to pop an element
0
Enter 1 to continue
1
Enter the element to be pushed
30
Enter 1 to continue
1
Enter the element to be pushed
40
Enter 1 to continue
0
Enter 1 to pop an element
1
The value popped is 40
Enter 1 to pop an element
0
Enter 1 to continue
1
Enter the element to be pushed
50
Enter 1 to continue
0
Enter 1 to pop an element
1
The value popped is 50
Enter 1 to pop an element
1
The value popped is 30
Enter 1 to pop an element
0
Enter 1 to continue
0
Applications: the page-visited history in a web browser (the 'back' button), the undo sequence in a text editor, and the
program function-call stack.
Table 2.5: If the stack is array-based, we need to consider whether or not the stack is full in the push( )
operation.
Array-Based Implementation
Program
// define stack exception
#include <stdexcept>
#include <string>
using namespace std;
class StackException : public runtime_error
{
public:
StackException(const string& message = "")
: runtime_error(message)
{}
}; // end StackException
// ************************************************
// Header file StackA.h for the ADT stack.
// Array-based implementation.
// ************************************************
#include "StackException.h"
const int MAX_STACK = maximum-size-of-stack;
typedef desired-type-of-stack-item StackItemType;
class Stack
{
public:
// constructors and destructor:
Stack(); // default constructor
// copy constructor and destructor are supplied by the compiler
// stack operations:
bool isEmpty() const;
// Determines whether a stack is empty.
void push(StackItemType newItem) throw(StackException);
// Adds an item to the top of a stack.
// Exception: Throws StackException if the item cannot
// be placed on the stack
void pop() throw(StackException);
// Removes the top of a stack.
// Exception: Throws StackException if the stack is empty.
void pop(StackItemType& stackTop) throw(StackException);
// Retrieves and removes the top of a stack.
// Exception: Throws StackException if the stack is empty.
void getTop(StackItemType& stackTop) const
throw(StackException);
// Retrieves the top of a stack.
// Exception: Throws StackException if the stack is empty.
private:
StackItemType items[MAX_STACK]; // array of stack items
int top; // index to top of stack
}; // end class
// End of header file.
2. If a, b and c are integer variables with the values a = 8, b = 3 and c = –5. Then what is the value of the
arithmetic expression: 2 * b + 3 * (a – c)
(a) 45 (b) 6 (c) –16 (d) –1
3. If the variables i, j and k are assigned the values 5, 3 and 2 respectively, then the expression i = j + (k++ =
6)+7
(a) gives an error message (b) assigns a value 16 to i
(c) assigns a value 18 to i (d) assigns a value 19 to i
There's no real reason to put the operator between the variables or values. It can just as well precede or
follow the operands. You should note the advantage of prefix and postfix: the need for precedence rules and
parentheses is eliminated.
The time complexity is O(n) because each operand is scanned once, and each operation is performed once.
A more formal algorithm:
create a new stack
while(input stream is not empty)
{
token = getNextToken();
if(token instanceof operand){
push(token);
}
else if(token instanceof operator){
op2 = pop();
op1 = pop();
result = calc(token, op1, op2);
push(result);
}
}
return pop();
Demonstration with 2 3 4 + * 5 -
2.7.2 Backtracking
Backtracking is used in algorithms in which there are steps along some path (state) from some starting point to
some goal.
1. Find your way through a maze.
2. Find a path from one point in a graph (roadmap) to another point.
3. Play a game in which there are moves to be made (checkers, chess).
In all of these cases, there are choices to be made among a number of options. We need some way to
remember these decision points in case we want/need to come back and try the alternative.
Consider the maze. At a point where a choice is made, we may discover that the choice leads to a dead-end.
We want to retrace back to that decision point and then try the other (next) alternative. Again, stacks can be
used as a part of the solution. Recursion is another, typically more favored, solution, which is actually
implemented by a stack.
While the method executes, the local variables and parameters are simply found by adding a constant
associated with each variable/parameter to the Base Pointer. When a method returns:-
1. Get the program counter from the activation record and replace what's in the PC.
2. Get the base pointer value from the AR and replace what's in the BP.
3. Pop the AR entirely from the stack.
Example: If the infix expression is a * b + c / d, then different snapshots of the algorithm, while scanning the
expression from right to left, are shown in Table 2.2.
The final prefix output that we get is d c / b a * +, whose reverse is + * a b / c d, which is the prefix equivalent
of the input infix expression a * b + c / d. Note that all the operands are simply pushed to the queue in steps 1,
3, 5, and 7. In step 2, the operator / is pushed to the empty stack of operators. In step 4, the operator + is
checked against the elements in the stack. Since / (division) has higher priority than + (addition), the queue is
emptied to the prefix output (thus we get 'd c' as the output) and then the operator / is written (thus we get 'd c /'
as the output). The operator + is then pushed to the stack. In step 6, the operator * is checked against the stack
elements. Since * (multiplication) has a higher priority than + (addition), * is pushed to the stack. Step 8
signifies that all of the infix expression is scanned. Thus, the queue of operands is emptied to the prefix output
(to get 'd c / b a'), followed by the emptying of the stack of operators (to get 'd c / b a * +').
……..………………………………………………………………………………………………………………
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
while ( *( p -> s ) )
{
if ( *( p -> s ) == ' ' || *( p -> s ) == '\t' )
{
p -> s++ ;
continue ;
}
if ( isdigit ( *( p -> s ) ) || isalpha ( *( p -> s ) ) )
{
while ( isdigit ( *( p -> s ) ) || isalpha ( *( p -> s ) ) )
{
*( p -> t ) = *( p -> s ) ;
p -> s++ ;
p -> t++ ;
}
}
if ( *( p -> s ) == '(' )
{
push ( p, *( p -> s ) ) ;
p -> s++ ;
}
if ( *( p -> s ) == '*' || *( p -> s ) == '+' || *( p -> s ) == '/' || *( p -> s ) == '%' || *( p -> s ) == '-' || *( p -> s ) == '$' )
{
if ( p -> top != -1 )
{
opr = pop ( p ) ;
while ( priority ( opr ) >= priority ( *( p -> s ) ) )
{
*( p -> t ) = opr ;
p -> t++ ;
opr = pop ( p ) ;
}
push ( p, opr ) ;
push ( p, *( p -> s ) ) ;
}
else
push ( p, *( p -> s ) ) ;
p -> s++ ;
}
if ( *( p -> s ) == ')' )
{
opr = pop ( p ) ;
while ( ( opr ) != '(' )
{
*( p -> t ) = opr ;
p -> t++ ;
opr = pop ( p ) ;
}
p -> s++ ;
}
}
*( p -> t ) = '\0' ;
} /* returns the priority of an operator */
int priority ( char c )
{
if ( c == '$' )
return 3 ;
if ( c == '*' || c == '/' || c == '%' )
return 2 ;
else
{
if ( c == '+' || c == '-' )
return 1 ;
else
return 0 ;
} } /* displays the postfix form of given expr. */
void show ( struct infix p )
{
printf ( "%s", p.target ) ;
}
3) Convert expression in postfix form to prefix form
#include <stdio.h>
#include <conio.h>
#include <string.h>
#define MAX 50
struct postfix
{ char stack[MAX][MAX], target[MAX] ;
char temp1[2], temp2[2] ;
char str1[MAX], str2[MAX], str3[MAX] ;
int i, top ;
};
void initpostfix ( struct postfix * ) ;
void setexpr ( struct postfix *, char * ) ;
void push ( struct postfix *, char * ) ;
void pop ( struct postfix *, char * ) ;
void convert ( struct postfix * ) ;
void show ( struct postfix ) ;
void main( )
{
struct postfix q ;
char expr[MAX] ;
clrscr( ) ;
initpostfix ( &q ) ;
printf ( "\nEnter an expression in postfix form: " ) ;
gets ( expr ) ;
setexpr ( &q, expr ) ;
convert ( &q ) ;
printf ( "\nThe Prefix expression is: " ) ;
show ( q ) ;
getch( ) ;
} /* initializes the elements of the structure */
void initpostfix ( struct postfix *p )
{ p -> i = 0 ;
p -> top = -1 ;
strcpy ( p -> target, "" ) ;
} /* copies given expr. to target string */
void setexpr ( struct postfix *p, char *c )
{ strcpy ( p -> target, c ) ;
}/* adds an operator to the stack */
void push ( struct postfix *p, char *str )
{
if ( p -> top == MAX - 1 )
printf ( "\nStack is full." ) ;
else {
p -> top++ ;
strcpy ( p -> stack[p -> top], str ) ;
}
} /* pops an element from the stack */
void pop ( struct postfix *p, char *a )
{ if ( p -> top == -1 )
printf ( "\nStack is empty." ) ;
else
{
strcpy ( a, p -> stack[p -> top] ) ;
p -> top-- ; }
} /* converts given expr. to prefix form */
void convert ( struct postfix *p )
{ while ( p -> target[p -> i] != '\0' )
{ /* skip whitespace, if any */
if ( p -> target[p -> i] == ' ' )
p -> i++ ;
if( p -> target[p -> i] == '%' || p -> target[p -> i] == '*' ||
p -> target[p -> i] == '-' || p -> target[p -> i] == '+' ||
p -> target[p -> i] == '/' || p -> target[p -> i] == '$' )
{ pop ( p, p -> str2 ) ;
pop ( p, p -> str3 ) ;
p -> temp1[0] = p -> target[ p -> i] ;
p -> temp1[1] = '\0' ;
strcpy ( p -> str1, p -> temp1 ) ;
strcat ( p -> str1, p -> str3 ) ;
strcat ( p -> str1, p -> str2 ) ;
push ( p, p -> str1 ) ;
}
else
{
p -> temp1[0] = p -> target[p -> i] ;
p -> temp1[1] = '\0' ;
strcpy ( p -> temp2, p -> temp1 ) ;
push ( p, p -> temp2 ) ;
}
p -> i++ ;
} }/* displays the prefix form of expr. */
void show ( struct postfix p )
{
char *temp = p.stack[0] ;
while ( *temp ) {
printf ( "%c", *temp ) ;
temp++;
}
}
……..………………………………………………………………………………………………………………
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
Caution
Reserving only a small amount of space for each stack may increase the number of times overflow
occurs, and the time required for resolving an overflow, such as by adding space to the stack, may be more
expensive than the space saved.
Program
#define MAX 10
#include<stdio.h>
#include<conio.h>
int stack[MAX],topA=-1,topB=MAX;
void pushA(int no) // The push operation adds a new item to the top of the stack
{ if(topA==topB) //check for the overflow
{ printf("\n OVERFLOW");
return;
} stack[++(topA)]=no;
} int popA() // The pop operation removes the top element of stack A and returns it
{ if(topA== -1) //check for the underflow
{
printf("\n UNDERFLOW");
return -999;
} return stack[(topA)--];
} void showA()
{
int i;
if(topA== -1) // check the condition for stack empty
{ printf("\n stack Empty");
return;
} for(i=topA;i>=0;i--)
{ printf("\n %d",stack[i]); // print the stack elements in a for loop
}} void pushB(int no)
{ if(topB-1==topA) //check for collision of stack B with stack A
{ printf("\n OVERFLOW");
return;
} stack[--(topB)]=no;
} int popB()
{ if(topB==MAX)
{ printf("\n UNDERFLOW");
return -999;
} return stack[(topB)++];
} void showB()
{ int i;
if(topB==MAX)
{ printf("\n stack Empty");
return;
} for(i=topB;i<MAX;i++)
{ printf("\n %d",stack[i]);
}
} void main() //program starts from here
{ int ch,val;
clrscr();
do // do-while loop start
{ printf("\n\n\n 1 PUSH A");
printf("\n 2 PUSH B");
printf("\n 3 POP A");
printf("\n 4 POP B");
printf("\n 5 Show A");
printf("\n 6 Show B");
printf("\n 0 EXIT");
printf("\nEnter your Choice");
scanf("%d",&ch);
switch(ch)
{ case 1:printf("\n enter Number");
scanf("%d",&val);
pushA(val); //pushA() method called and go to declaration of pushA()
break;
case 2: printf("\n enter Number");
scanf("%d",&val);
pushB(val); //pushB() method called and go to declaration of pushB()
break;
case 3: val=popA();
if(val!=-999)
printf("%d popped",val);
break;
case 4 : val=popB();
if(val!= -999)
printf("%d popped",val);
break;
case 5: showA();
break;
case 6: showB();
break;
case 0:break;
default:printf("\n Invalid choice");
}}while(ch!=0);
getch();
}
Step 2: The pushA() method is declared with an int parameter no. //void return type Check the condition:
if(topA==topB) // when both tops meet, the array is full
Print overflow and return void
return stack[++(topA)]=no; //increment the top of stack A by 1 in pushA()
Step 3: Declare the popA() method with int return type.
Check the condition: if(topA== -1) //no element
Print underflow and return -999
Return stack[(topA)--]; // decrement the top of stack A by 1 in popA()
Step 6: Declare the pushB() method with an int parameter no. //return type void
Check the condition: if(topB-1==topA) // the top of stack B would collide with the top of stack A
If true print overflow
Return stack[--(topB)]=no; //decrement the top of stack B, then store no
Step 10: Now print all the method pushA, popA, showA and so on //Menu base
……..………………………………………………………………………………………………………………
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
Lab Exercise
1. Implement a stack in C in which each item on the stack is a varying number of integers. Choose a C
data structure for such a stack and design push and pop routines for it.
2. Write a program for copying contents of one stack to another.
2.10 Summary
A stack is simply a list of elements with insertions and deletions permitted at one end, called
the stack top. Thus a stack data structure exhibits the LIFO (last in first out) property.
Push and pop are the operations that are provided for insertion and the removal of an
element from the stack.
A multiple stack program is a program with more than one stack.
Stack provides a restricted subset of basic container functionality. It provides insertion,
removal and inspection of the element at the top of the stack. It does not allow iteration through its elements.
Stack is a container adapter, meaning that it is implemented on top of some underlying
container type. By default that underlying type is deque, but vector or list may be selected explicitly.
2.11 Keywords
ADT Stack: A stack is an ordered list in which insertions and deletions are made at one end called the "top". It is
also known as a Last-In-First-Out (LIFO) list.
Backtracking: Backtracking is used in algorithms in which there are steps along some path (state) from some
starting point to some goal.
Multi stacks: A multiple stack program is a program with more than one stack.
Operation on Stacks: In modern computer languages, the stack is usually implemented with more operations
than just "push" and "pop".
Stacks: A stack is simply a list of elements with insertions and deletions permitted at one end, called the stack
top.
3.0 Objectives
After studying this chapter, you will be able to:
Explain concept of queues
Discuss types of queue
Understand linear queues
Understand queues as a linked list and abstract data structure
Explain applications of queues
3.1 Introduction
A queue is a linear list of elements that has two ends known as front and rear. We can delete elements
from the front end and insert elements at the rear end of a queue. A queue in an application is used to maintain
a list of items that are ordered not by their values but by their order of arrival.
A queue is a non-primitive linear data structure, where homogeneous data elements are stored in sequence.
In a queue, data elements are inserted at one end and deleted from the other end. Hence, it is also called a
First-In First-Out (FIFO) list. Figure 3.1 shows a queue with 4 elements, where 55 is the front element and 65
is the rear element. Elements can be added at the rear and deleted from the front.
A queue is also a list of elements with insertions permitted at one end called the rear, and deletions permitted
from the other end called the front. This means that the removal of elements from a queue is possible in the
same order in which the insertion of elements is made into the queue. Thus, a queue data structure exhibits the
FIFO (first in first out) property. Insert and delete are the operations that are provided for insertion of elements
into the queue and the removal of elements from the queue.
Output:
1. Insert
2. Delete
3. Display
4. Quit
Enter your choice: 1
Enter an element to add in the queue: 25
Enter your choice: 1
Enter an element to add in the queue: 36
Enter your choice: 3
Elements in the queue: 25, 36
Enter your choice: 2
Element deleted from the queue is: 25
In this example:
1. The preprocessor directives #include are given. MAXSIZE is defined as equal to 50 using the #define
statement.
2. The queue is declared as an array using the declaration int queue_arr[MAX].
3. The while loop displays the different options on the screen and accepts the value entered in the variable
choice.
4. The switch case compares the value entered and calls the method corresponding to it. If the value entered is
invalid, it displays the message "Wrong choice".
5. Insert method: The insert method inserts an item in the queue. The if condition checks whether the queue is full
or not. If the queue is full, the "Queue overflow" message is displayed. If the queue is not full, the item is
inserted in the queue and the rear is incremented by 1.
6. Delete method: The delete method deletes an item from the queue. The if condition checks whether the queue
is empty or not. If the queue is empty, the "Queue underflow" message is displayed. If the queue is not
empty, the item is deleted and the front is incremented by 1.
7. Display method: The display method displays the contents of the queue. The if condition checks whether the
queue is empty or not. If the queue is not empty, it displays all the items in the queue.
Each time a customer makes payment for their goods (or the machine part is removed from the line, or a
person steps off the escalator, and so on) that object leaves from the front of the queue. This represents
the "dequeue" function of the queue. Each time another customer or object enters the line to wait, they join the
end of the line. This represents the "enqueue" function of the queue. The "size" function of the queue returns
the length of the line and the "empty" function returns true only if the line is empty. Figure 3.4 depicts how a
queue is represented.
Figure 3.4: Representation of a Queue
#include<stdio.h>
#include<conio.h>
#define SIZE 5
int Q_F(int R)
{ return (R==SIZE-1)?1:0; }
int Q_E(int F, int R)
{ return(F>R)?1:0; }
void front_insert(int num, int Q[], int *F, int *R)
{ if(*F==0 || *R== -1)
{
Q[++(*R)]=num;
return;
}
if(*F!=0)
{
Q[--(*F)]=num;
return;
}
printf("Front insertion not possible\n");
}
void rear_delete(int Q[], int *F, int *R)
{ if(Q_E(*F, *R))
{
printf(―Queue underflow\n‖);
return;
}
printf(―The element deleted is %d\n‖, Q[(*R)– –]);
if(*F>*R)
{ *F=0, *R= –1; }
}
void display(int Q[], int F, int R)
{
int i;
if(Q_E(F, R))
{
printf("Queue is empty\n");
return;
}
printf("Contents of the queue is:\n");
for(i=F; i<=R; i++)
{ printf("%d\n", Q[i]); }
}
void main()
{
int choice, num, F, R, Q[10];
F=0; R=-1;
for(;;)
{
printf("1. Insert at front end\n");
printf("2. Delete at rear end\n");
printf("3. Display\n");
printf("4. Exit\n");
scanf("%d", &choice);
switch(choice)
{
case 1: printf("Enter the number to be inserted\n");
scanf("%d", &num);
front_insert(num, Q, &F, &R);
break;
case 2: rear_delete(Q, &F, &R);
break;
case 3: display(Q, F, R);
break;
default: exit(0);
}
}
}
Output:
1. Insert at front end
2. Delete at rear end
3. Display
4. Exit
1
Enter the number to be inserted
30
1. Insert at front end
2. Delete at rear end
3. Display
4. Exit
1
Enter the number to be inserted
40
1. Insert at front end
2. Delete at rear end
3. Display
4. Exit
3
The contents of the queue is
30 40
1. Insert at front end
2. Delete at rear end
3. Display
4. Exit
2
In this example:
1. The header files are included and a constant value 5 is defined for SIZE using the #define statement. SIZE
defines the size of the queue.
2. Five functions are created, namely, Q_F( ), Q_E( ), front_insert( ), rear_delete( ), and display( ). The user
has to select an appropriate function to be performed.
3. The switch statement is used to call the front_insert( ), rear_delete( ), and display( ) functions.
4. When the user enters 1, the front_insert( ) function is called. In the front_insert( ) function, the first if
statement checks if the F pointer is equal to 0 or the R pointer is equal to -1. If the result is true, then the R
pointer is incremented, the value entered by the user (num) is assigned to Q[R], and the function returns. The
second if statement checks if the F pointer is not equal to 0. If the result is true, then the F pointer is
decremented, the value entered by the user (num) is assigned to Q[F], and the function returns. Else, the
program prints the message "Front insertion not possible".
5. When the user enters 2, the rear_delete( ) function is called. In the rear_delete( ) function, the if statement
calls the Q_E( ) function with the current values of F and R. If the queue is empty, the program prints the
message "Queue underflow" and returns. Otherwise, the program prints the deleted element and decrements
R; if F becomes greater than R, the queue is reset by setting F to 0 and R to -1.
6. When the user enters 3, the function display( ) is called. In the function display( ), the if statement checks
whether the queue is empty. If the queue is not empty, the program displays the elements.
7. When the user enters 4, the program exits.
In this example:
1. The header files are included and a constant value 5 is defined for SIZE using the #define statement. SIZE
defines the size of the queue.
2. A queue is created using an array named Q with an element capacity of 20. A variable named COUNT is
declared to store the count of the number of elements present in the queue.
3. Five functions are created, namely, Q_F(), Q_E(), rear_insert(), front_delete(), and display(). The user has
to select an appropriate function to be performed.
4. The switch statement is used to call the rear_insert(), front_delete(), and display() functions.
5. When the user enters 1, the rear_insert() function is called. In the rear_insert() function, the if statement
checks if the queue is full. If the condition is true, then the program prints a message "Queue overflow". Else,
it checks the value of R and assigns the element (num) entered by the user to the position R. Initially, when
there are no elements in the queue, the R value will be 0. After every insertion the variable COUNT is
incremented.
6. When the user enters 2, the front_delete() function is called. In this function, the if statement checks if the
variable COUNT is 0. If the condition is true, then the program prints a message "Queue underflow". Else,
the element in the 0th position is deleted, F is updated, and COUNT is decremented by 1.
7. When the user enters 3, the display() function is called. In this function, the if statement checks if the value
of COUNT is 0. If the condition is true, the program prints a message "Queue is empty". Else, the value of F
is assigned to the variable i. The for loop then displays the elements present in the queue.
8. When the user enters 4, the program exits.
Output:
1. INSERT
2. DELETE
3. DISPLAY
4. EXIT
1. INSERT
2. DELETE
3. DISPLAY
4. EXIT
Enter your choice:
1
Enter the element to be inserted
20
Enter a priority
2
1. INSERT
2. DELETE
3. DISPLAY
4. EXIT
Caution
While implementing a queue in an array, avoid shifting the remaining elements towards the front on every
deletion. Shifting is not O(1) but O(n), and it might be too slow if the queue gets large.
2. Identify the data structure which allows deletions at both ends of the list but insertion at only one end.
(a) Input-restricted deque (b) Output-restricted deque
(c) Priority queues (d) None of these.
4. What should be the value of the R pointer before inserting elements into the queue?
(a) -1 (b) 0 (c) 1 (d) No value is set
Public Queues
In a domain environment, public queues are queues that are published in Active Directory and hence are
replicated throughout your Windows Server 2003 family forest. Note that only the properties for these queues
are replicated, not the actual queues themselves or their contents. Any computer within your forest can
potentially access information regarding public queues provided the user has sufficient permissions to access
the applicable queue objects. Generally speaking, any user in the forest with access to Active Directory and the
Send To permission for a given public queue can send messages to it. In an Active Directory environment,
defining public queues ensures that queues are registered in the directory service, and that their registration is
backed up. They are persistent and available to other applications.
Receiving Messages
You cannot retrieve a message from a remote transactional queue or from a local nontransactional queue
within a transaction. Message Queuing does not provide support for transactional remote-read operations
(accessing the contents of a queue from a computer other than the one on which the queue is located).
However, this same functionality can be achieved by using a transactional read-response application.
Note that nontransactional receive operations are allowed from any queue (transactional or nontransactional,
local or remote). In particular, a nontransactional receive operation from a transactional queue is essentially a
transaction consisting of a single receive operation.
Caution
While using a transaction to retrieve messages, the queue must be a local transactional queue.
3.7 Recursion
• Recursion is a problem-solving approach in which a problem is solved by repeatedly applying the same
solution to smaller instances.
• Each instance of the problem has a size.
• An instance of size n can be solved by putting together solutions of instances of size at most n-1.
• An instance of size 1 or 0 can be solved very easily.
The internal execution of this call begins with a character input, and then a second call to "print_backwards()"
(at this point, nothing has been output to the screen). Again, space is set aside for this second call:
The process repeats, but inside the third call to "print_backwards()" a full-stop character is input, thus allowing
the third call to terminate with no further function calls:
This allows the second call to "print_backwards()" to terminate by outputting an "i" character, which in turn
allows the first call to terminate by outputting an "H" character:
3.8 Fibonacci Series Using Recursion
As a simple rule of recursion, any function can be computed using a recursive routine if:
1. The function can be expressed in its own form.
2. There exists a termination step, the point at which f(x) is known for a particular 'x'.
Therefore, to write a recursive program to find the nth term of the Fibonacci series, we have to express the
Fibonacci sequence in a recursive form using the above 2 rules:
1. fib(n) = fib(n-1) + fib(n-2) (recursive definition of the Fibonacci series).
2. if n=0 or n=1, return n (termination step).
Using these 2 rules, the recursive program for finding the nth term of the Fibonacci series can be coded very
easily as shown.
#include <stdio.h>
#include <conio.h>
int fib(int n)
{ if(n==1 || n==0)
return n;
return fib(n-1)+fib(n-2);
}
int main()
{ int a,r;
clrscr();
printf("Enter any number : ");
scanf("%d",&a);
r=fib(a);
printf("The no. at position %d is %d",a,r);
getch();
return 0;
}
6. Queue is a
(a) Linear data structure (b) Non-linear data structure
(c) Both (a) and (b) (d) None of these.
8. The end from which an element gets removed from the queue is called
(a) front (b) rear (c) top (d) bottom
Solution
Suppose we are moving discs from tower 1 to tower 2 with the help of tower 3. We can see that the biggest
disc at the bottom can only be moved when all the discs on top of it are moved out of tower 1. As this is the
biggest disc, it can only go to an empty rod. So this creates a subproblem: to move all the discs from 1 to 2,
we need to move n-1 discs from 1 to 3 using 2. After moving the top n-1 discs, we can move the bottom one
to tower 2 as it is empty at that point. After that, we need to move all the discs from tower 3 to tower 2 with
the help of tower 1, as tower 1 is currently empty.
Lab Exercise
1. Write a C program to implement a double-ended queue, which is a queue in which insertions and deletions
may be performed at either end. Use a linked representation.
2. Write a program to implement a queue and insert data at the front end.
3.10 Summary
A queue is an ordered collection of items in which deletion takes place at the front and insertion at the rear
of the queue.
The basic operations performed on a queue include inserting an element at the rear end and deleting an
element at the front end.
Representing a queue in the memory includes representing the way in which the elements are stored in the
memory and naming the address to which the front and rear pointers point to.
The different types of queues are double ended queue, circular queue, and priority queue.
A queue is a list of elements with insertion permitted at one end called the rear and deletion permitted
from the other called the front.
The purpose of initializing the queue is served by assigning –1 (as a sentinel value) to the front and rear
variables.
An application of the queue data structure is in the implementation of the priority queues required to be
maintained by the scheduler of an operating system.
3.11 Keywords
Circular queue: It is easier to represent a queue as a circular list than as a linear list. As a linear list, a queue is
specified by two pointers, one to the front of the list and the other to its rear.
Dequeue: Process of deleting elements from the queue.
Enqueue: Process of inserting elements into queue.
Front end: It refers to the first node in the queue.
Header node: An alternative method is to place a header node as the first node of a circular list. This list
header may be recognized by a special value in its info field that cannot be the valid contents of a list node in
the context of the problem, or it may contain a flag marking it as a header.
Rear end: It refers to the last node in the queue.
4.0 Objectives
After studying this chapter, you will be able to:
Explain the representation of tree
Define the binary tree
Understand the representation of binary tree
Discuss the basic operation on binary tree
Explain the creation of binary search tree
4.1 Introduction
We all know that data structure is a set of data elements grouped together under one name. A data structure
can be considered as a set of rules that hold the data together. Almost all computer programs use data
structures. Data structures are an essential part of algorithms. We can use them to manage huge amounts of
data in large databases. Some modern programming languages emphasize data structures more than
algorithms.
Choosing the best data structure for a program is a challenging task. Similar tasks may require different data
structures. We derive new data structures for complex tasks using the already existing ones. We need to
compare the characteristics before choosing the right data structure. A tree is a hierarchical data structure
suitable for representing hierarchical information. The tree data structure has the characteristics of quick
search, quick inserts, and quick deletes.
In the hierarchical organization of books shown in Figure 4.1, Books is the root of the tree. Books can be
classified as Fiction and Non-fiction. Non-fiction books can be further classified as Realistic and Non-realistic
which are the leaves of the tree. Thus, it forms a complete tree structure.
Trees are primarily treated as data structures rather than as data types.
A tree is a widely-used data structure that depicts a hierarchical tree structure with a set of linked nodes. The
elements of data structure in a tree are arranged in a non-linear fashion i.e. they use two dimensional
representations. Thus, trees are known as non-linear data structures. This data structure is more efficient in
inserting additional data, deleting unnecessary data, and searching new data.
This is a tree because it is a set of nodes {A,B,C,D,E,F,G,H,I}, with node A as a root node and the remaining
nodes partitioned into three disjointed sets {B,G,H,I}, {C,E,F} and {D}, respectively. Each of these sets is a
tree because each satisfies the aforementioned definition properly.
Shown in Figure 4.3 is a structure that is not a tree.
Even though this is a set of nodes {A,B,C,D,E,F,G,H,I}, with node A as a root node, this is not a tree because
the fact that node E is shared makes it impossible to partition nodes B through I into disjointed sets.
Degree of a Node of a Tree
The degree of a node of a tree is the number of subtrees having this node as a root. In other words, the degree
is the number of immediate descendants (children) of a node. If the degree is zero, the node is called a
terminal or leaf node of a tree.
Degree of a Tree
The degree of a tree is defined as the maximum of degree of the nodes of the tree, that is, degree of tree = max
(degree (node i) for i = 1 to n).
Level of a Node
We define the level of the node by taking the level of the root node as 1, and incrementing it by 1 as we move
from the root towards the subtrees. So the level of all the descendants of the root nodes will be 2. The level of
their descendants will be 3, and so on. We then define the depth of the tree to be the maximum value of the
level of the node of the tree.
Figure 4.5: A Binary Tree.
In Figure 4.5, the node A is the root node. The nodes B and D belong to the left sub-tree and the nodes C, E,
F and G belong to the right sub-tree.
A full binary tree is a binary tree of depth k having 2^k − 1 nodes. If it has fewer than 2^k − 1 nodes, it is not
a full binary tree. For example, for k = 3, the number of nodes = 2^k − 1 = 2^3 − 1 = 8 − 1 = 7. A full binary
tree with depth k = 3 is shown in Figure 4.7.
Figure 4.9: An array representation of a complete binary tree having 5 nodes and depth 3.
Shown in Figure 4.10 is another example of an array representation of a complete binary tree with depth k = 3,
with the number of nodes n = 4.
Figure 4.10: An array representation of a complete binary tree with 4 nodes and depth 3.
In general, any binary tree can be represented using an array. We see that an array representation of a complete
binary tree does not lead to the waste of any storage. But if you want to represent a binary tree that is not a
complete binary tree using an array representation, then it leads to the waste of storage as shown in Figure
4.11.
Figure 4.11: An array representation of a binary tree.
A tree representation that uses this node structure is shown in Figure 4.12.
Preorder Traversal
The preorder traversal of a non-empty binary tree is defined as follows:
1. First, visit the root node.
2. Next, traverse the left sub-tree of root node in preorder.
3. Finally, traverse the right sub-tree of root node in preorder.
The Figure 4.13 depicts the functioning of preorder traversal.
In the Figure 4.14, T1 is the root node; T2 and T3 are the sub-trees of the root node. The preorder traversal for
a binary tree traverses first the root node, then the left sub-tree and finally the right sub-tree. Since the
traversing process is in the order of root node, left and right sub-trees, let us assign the alphabet N for visiting
root node, L for visiting left sub-tree and R for visiting right sub-tree. The term T1NLR indicates that node T1
is the root node of the binary tree and subscript NLR indicates preorder tree traversal. The following represents
the preorder traversal for binary tree present in the Figure 4.14.
Hence, the preorder traversal for the tree shown in Figure 4.14 is T1 T2 T4 T7 T8 T3 T5 T9 T6. The traversal
in preorder starts with traversing root, left sub-tree and right sub-tree. But this traversing happens only during
the downward movement of the traverse operation in a binary tree. If the upward traversing is required in a
tree, then it takes place in a reverse manner. Due to this reason, a stack is required to save pointer variable
during the tree traversal. This mode of traversing is known as iterative traverse. The general form of iterative
traversal for preorder using stack is as follows:
Step 1: If the tree is empty //Check if the tree is empty
then write ("empty tree") // If tree is empty, write "empty tree"
return
else
Place the pointer to the root of the tree on the stack // Push the pointer to the root onto the stack
Step 2:
Repeat step 3 while stack is not empty.
Step 3: Pop the top of the stack.
Repeat while the pointer value is not NULL.
Write (Node containing data).
If the right sub-tree is not empty, then stack the pointer to the right sub-tree and set pointer value to left sub-
tree.
Inorder Traversal
The inorder traversal of a non-empty tree is defined as follows:
1. First, traverse the left sub-tree of the root node in inorder.
2. Next, visit the root node.
3. Finally, traverse the right sub-tree of the root node.
The Figure 4.15 depicts the functioning of inorder traversal.
Hence, the inorder traversal for the tree shown in Figure 4.14 is T7 T4 T8 T2 T1 T5 T9 T3 T6.
Example: Program for Inserting Elements into the Tree and Traversing in Inorder.
#include<stdio.h>
#include<conio.h>
#include<stdlib.h> //For malloc( ) and exit( )
/*Define tree as a structure with data and pointers to the left and right sub-tree*/
struct tree
{long info; struct tree *left;
struct tree *right;};
/* bintree is declared as the datatype tree and initialized to Null*/
struct tree *bintree=NULL;
/*Global declaration of function insert which returns a pointer to the tree structure and accepts a pointer to tree
and a long digit as parameters */
struct tree *insert(struct tree*bintree, long digit);
/*Global declaration of function inorder which does not return any value and accepts a pointer to tree as a
parameter*/
void inorder (struct tree*bintree);
void main() // Define main function
{
long digit;
clrscr();
puts("Enter integers and 0 to quit");
scanf("%ld",&digit); //Reads the first number to be inserted
while (digit!=0)
{
bintree=insert(bintree,digit); // Inserts the number entered
scanf("%ld",&digit); }
puts("Inorder traversing of bintree:\n");
inorder(bintree); //Calling inorder function to traverse the tree
}
struct tree* insert(struct tree* bintree, long digit) //insert function is defined
{
if(bintree==NULL) //checks if the tree is empty
{
bintree=(struct tree*) malloc(sizeof(struct tree)); //Allocates memory for the tree
bintree->left=bintree->right=NULL; //Left and right sub-trees is set to NULL
/* The digit entered is assigned to the info element of the tree node*/
bintree->info=digit;
}
else
{
if(digit<bintree->info) //If the entered number is less than the data of the node
bintree->left=insert(bintree->left,digit); //insert the digit in the left sub-tree
else
/*If the entered number is greater than the data of the node*/
if(digit>bintree->info)
bintree->right=insert(bintree->right,digit); //insert the digit in the right sub-tree
else
if(digit==bintree->info) //If entered number is equal to data of the node
{ //exits program after printing that a duplicate node is present
puts("Duplicate node: program exited");
exit(0);
}
}
return(bintree);
}
void inorder(struct tree*bintree) //Defining inorder function
{
if(bintree!=NULL) //Checks if tree is empty
{
inorder(bintree->left); //Calls the inorder function for left sub-trees
printf("%4ld",bintree->info); // Prints data of the node
inorder(bintree->right); // Inorder function of right sub-trees
}
}
Output:
Enter integers and 0 to quit
6 1 2 3 7 8 9 0
Inorder traversing of bintree
1 2 3 6 7 8 9
In this program,
1. First the structure tree is defined. It contains a variable info of long type and pointers to the right and left
sub-trees.
2. The variable bintree is declared as data type tree and initialized to NULL.
3. The function insert and inorder are globally declared.
4. In the main() function,
(a) First, the numbers to be entered are declared using long data type.
(b) Then, the digit entered is read by the computer.
(c) If the digit is not 0, the insert() function is called to insert the entered digit into the binary tree. Step b, and
c are executed repeatedly until the digit entered is 0.
(d) Finally, the inorder() function is called to traverse the tree.
5. The insert() function is defined. It accepts a pointer to a tree and a digit to be inserted in the tree as
parameters. The insert function performs the following steps:
(a) It checks if the tree is empty or non-empty.
(b) If the tree is empty it assigns memory to the node, sets the left and right pointers of the node as NULL and
assigns the digit to the info variable of the node.
(c) If the tree is non-empty, then it performs the following steps:
(i) If the digit entered is less than the info stored in the node, it recursively calls itself to enter the digit in
the left sub-tree.
(ii) If the digit entered is greater than the info stored in the node, it recursively calls itself to enter the digit
in the right sub-tree.
(iii) If the entered digit is equal to the info stored in the node, then it displays the message "Duplicate node:
program exited" and exits.
6. The function inorder() is then defined. It accepts a pointer to a tree as a parameter.
(i) It checks if the tree is empty or non-empty.
(ii) If the tree is non-empty, it traverses first the left sub-tree, then prints the value of the variable info
stored in the node and then traverses the right sub-tree.
In the inorder traversal, the nodes are visited in the order left sub-tree, root, and right sub-tree. While
traversing using a stack, the left sub-tree of a binary tree is traversed by moving down the tree towards the
left and pushing each node onto the stack until the left sub-tree pointer of a node is NULL. Once the left path
is exhausted and the stack is non-empty, pop a node from the stack, print its data, and then move the pointer
to the right sub-tree. This process continues until the pointer is NULL and the stack is empty. The algorithm
for inorder traversal using a stack is as follows:
Step 1: If the tree is empty then
{
write ("empty tree")
return
} else
Set the pointer to the root of the tree.
Step 2: Repeat step 3 while the stack is not empty or the pointer value is not NULL.
Step 3: Repeat while the pointer value is not NULL: stack the pointer and set the pointer to the left sub-tree.
Pop the top of the stack.
Write (node containing data).
Set the pointer to the right sub-tree.
Postorder Traversal
The postorder traversal of a non-empty tree is defined as follows:
1. First, traverse the left sub-tree of the root node in postorder.
2. Next, traverse the right sub-tree of the root node in postorder.
3. Finally, visit the root node.
The Figure 4.16 depicts the functioning of postorder traversal.
Example: Consider the binary tree shown in the Figure 4.14. The postorder traversing depends on traversing
first the left sub-tree, right sub-tree and then the node. Since, the traversing process starts with the left sub-tree,
right sub-tree and root node, let us consider the alphabets L, R, N for postorder traversal.
The term T1LRN indicates that node T1 is the root node of the binary tree and subscript LRN indicates
postorder tree traversal.
Finally the postorder traversal for the tree shown in figure 4.14 is T7 T8 T4 T2 T9 T5 T6 T3 T1. The post
order traversal starts with left sub-tree, then moves to the right sub-tree, and then finally to the root.
Considering the postorder traversal using a stack, each node is stacked twice, once during the traversal of its
left sub-tree and once during the traversal of its right sub-tree. To distinguish between the two, a traversing
flag is used: when a node is stacked for the traversal of its right sub-tree, the flag is set by stacking the
negative value of the pointer. If the flag of a node is negative, then its right sub-tree has already been
traversed; else, the left sub-tree is traversed. The algorithm for postorder traversal of a binary tree using a
stack is as follows:
Step 1: If the tree is empty then
{
write ("Empty tree")
return
}
else
Initialize the stack and pointer value to the root of tree.
Step 2: Start an infinite loop and repeat till step 5.
Step 3: Repeat while the pointer value is not NULL: stack the current pointer value and set the pointer value
to the left sub-tree.
Step 4: Repeat while top pointer on stack is negative.
Pop pointer off stack
write (value of pointer)
If the stack is empty
then return
Step 5: Set pointer value to the right sub-tree of the value on top of the stack.
Step 6: Stack the negative value of the pointer to right sub-tree.
Example
Program for Inorder, Preorder and Postorder Tree Traversals.
#include<stdio.h>
#include<conio.h>
#include<stdlib.h> //For malloc( ) and exit( )
struct node //Declare node as a structure
{
int data; //Declare data with data type int
struct node *right, *left; //The node stores pointers to the right and left sub-trees
}root,*p,*q;
/*Declare root as a variable of type node and p and q as pointers to node*/
struct node *make(int y)
{
struct node *newnode; //Declare newnode as a pointer to struct node
newnode=(struct node *) malloc(sizeof(struct node)); //Allocate space in memory
/*Assign object data to newnode and initialize to variable y*/
newnode->data=y;
/*Declare right newnode and left newnode to NULL*/
newnode->right=newnode->left=NULL;
return(newnode);
}
void left(struct node *r,int x) // Define left sub-tree function
{
/*Checks if left sub-trees is not equal to NULL*/
if(r->left!=NULL)
printf("\n Invalid !"); // Prints invalid
else
r->left=make(x); //Initialize left sub-tree
}
void right(struct node *r, int x) //Define right sub-tree
{
/*Checks if right sub-tree is not equal to NULL*/
if(r->right!=NULL)
printf("\n Invalid !"); // Prints invalid
else
r->right=make(x); //Initialize right sub-tree
}
void inorder(struct node *r) //Define inorder traversal function
{
/*Conditional statement, check if r is not equal to NULL*/
if(r!=NULL)
{
/*Recursively call inorder passing the address of the left sub-tree*/
inorder(r->left);
printf("\t %d", r->data); //Prints the data of the node
/*Recursively call inorder passing the address of the right sub-tree*/
inorder(r->right);
}
}
void preorder(struct node *r) //Define preorder function
{
/*Checks if r is not equal to NULL*/
if(r!=NULL)
{
printf("\t %d", r->data); //Prints the data of the node
/*Recursively call preorder passing the address of the left sub-tree*/
preorder(r->left);
/*Recursively call preorder passing the address of the right sub-tree*/
preorder(r->right);
}
}
void postorder(struct node *r) //Define postorder function
{
if(r!=NULL) //Checks if r is not equal to NULL
{
/*Recursively call postorder passing the address of the left sub-tree*/
postorder(r->left);
/*Recursively call postorder passing the address of the right sub-tree*/
postorder(r->right);
printf("\t %d", r->data); //Prints the data of the node
}
}
void main()
{
int no; //Declare variable no
int choice; //Declare variable choice
clrscr();
printf("\n Enter the root:");
scanf("%d",&no); //Reads the number entered
root=make(no); //Initialize the number to root
p=root; // Value of root is then assigned to variable p
while(1) //Checks the conditions provided in while loop
{
/*Prints the statement "Enter another number"*/
printf("\n Enter another number:");
scanf("%d", &no); //Reads the number entered
/*Conditional statement, check if no is equal to -1*/
if(no==-1)
break; //If condition is true, the if loop breaks
p=root; //Assign value of root to p variable
q=root; //Assign value of root to q variable
/*Check if no is not equal to variable p and q not equal to NULL*/
while(no!=p->data && q!=NULL)
{
p=q;
if(no<p->data) //Check if no is less than variable p
/*Set q to the left sub-tree of p*/
q=p->left;
else
/*Set q to the right sub-tree of p*/
q=p->right;
}
/*Check if variable no is less than p variable with data*/
if(no<p->data)
{
/*prints the node of left tree*/
printf("\n Left branch of %d is %d", p->data, no);
left(p, no);
}
else
{
right(p,no);
/*prints the node of right tree*/
printf("\n Right branch of %d is %d", p->data, no);
}
while(1)
{
printf("\n 1.Inorder Traversal \n 2.Preorder Traversal \n 3.Postorder Traversal \n 4.Exit");
printf("\n Enter choice: ");
scanf("%d", &choice); //Reads the choice entered
switch(choice)
{
case 1: inorder(root); //Inorder traversal
break;
case 2: preorder(root); //Preorder traversal
break;
case 3: postorder(root); //Postorder traversal
break;
case 4: exit(0); //Exits the program
default: printf("\n Wrong choice !"); //Error message for an invalid choice
}
}
}
Output:
Enter the root: 5
Enter another number: 7
Right branch of 5 is 7
1. Inorder traversal
2. Preorder traversal
3. Postorder traversal
4. Exit
Enter choice: 1
5 7
1. Inorder traversal
2. Preorder traversal
3. Postorder traversal
4. Exit
Enter choice: 2
5 7
1. Inorder traversal
2. Preorder traversal
3. Postorder traversal
4. Exit
Enter choice: 3
7 5
In this program,
1. First, the header files stdio.h and conio.h are included using the #include directive.
(a) The variable node is defined as a structure. It has an integer variable data and pointers to its left and right
sub-trees. root is declared as a variable of type node. The variables p and q are declared as pointers to node.
2. Then, the make() function is defined. It returns a pointer to the structure node and accepts as parameter an
integer variable y. It executes the following steps:
(a) It declares newnode as pointer to the structure node and assigns memory to it.
(b) It assigns the integer y to the data variable of the newnode.
(c) It sets the right and left sub-tree pointers of the newnode to NULL
3. Then, the left() sub-tree function is defined. It accepts as parameters r which is a pointer to the structure
node and an integer variable x. It executes the following steps:
(a) It checks if the left pointer of r is empty. If it is empty, it calls the make() function passing x as a
parameter and attaches the new node as the left child; otherwise, it prints "Invalid !".
4. Then, the right() sub-tree function is defined. It accepts as parameters r which is a pointer to the structure
node and an integer variable x.
(a) It checks if the right pointer of r is empty. If it is empty, it calls the make() function passing x as a
parameter and attaches the new node as the right child; otherwise, it prints "Invalid !".
5. Then, the inorder() function is defined. It accepts as parameters r which is a pointer to the structure node.
(a) It checks if r is non-empty. If it is non-empty, it then traverses the left sub-tree, prints the data and then
traverses the right sub-tree.
6. Then, the preorder() function is defined. It accepts as parameters r which is a pointer to the structure node.
(a) It checks if r is non-empty. If it is non-empty, it prints the data, traverses the left sub-tree, and then
traverses the right sub-tree.
7. Then, the postorder() function is defined. It accepts as parameters r which is a pointer to the structure node.
(a) It checks if r is non-empty. If it is non-empty, it then traverses the left sub-tree, then the right sub-tree and
then prints the data.
8. In the main() function,
(a) The variables no, choice are declared as integer variables.
(b) The value for the root node is accepted and added to the tree by calling the make() function.
(c) The value of root is then assigned to variable p and q.
(d) The program execution enters a while loop in which the following steps are performed:
(i) First, it accepts another integer no.
(ii) Then, the while loop is exited if no is equal to –1.
(iii) Then, the following steps are repeatedly performed while the number entered is not equal to the data
variable of p and q is not equal to NULL.
I. First, p is assigned the value of q.
II. If the number is less than the data of p, q is assigned the address of the left sub-tree; otherwise, q
is assigned the address of the right sub-tree.
(iv) If no is less than data of variable p the function left() is called passing p and no as the parameters
else the function right () is called.
(e) A while loop is used to obtain the choice of traversal.
(i) If 1 is entered, inorder traversal is selected and the inorder function is executed.
(ii) If 2 is entered, preorder traversal is selected and preorder function is executed.
(iii) If 3 is entered, postorder traversal is selected and the postorder function is executed.
(iv) If 4 is entered, the while loop is exited.
(v) If any other digit is entered, an error message is printed on the screen.
(f) The getch() function waits for the user to press a key before the program exits.
Binary search trees provide an efficient way to search through an ordered collection of objects. Consider
searching an ordered list. The search must proceed successively from one end of the list to the other. On
average, n/2 nodes must be compared for an ordered list that contains n nodes. In the worst case, all n nodes
need to be compared. For a large collection of objects, this is very expensive.
A binary search tree enables quick searching through the nodes. The longest search path is equal to the
height of the tree, so the efficiency of a binary search tree depends on its height. For a tree with
n nodes, the smallest possible height is log n, and that is the number of comparisons needed on
average to search the tree.
Thus, binary search trees are node based data structures used in many system programming applications for
managing dynamic sets. Another example of a binary search tree is given in Figure 4.18. We know that all
the elements in the left sub-tree are less than the root node and the elements in the right sub-tree are greater
than the root node.
In the binary search tree represented in Figure 4.18, 10 is the root node; all the elements in the left sub-tree
are less than 10 and the elements in the right sub-tree are greater than 10. Every node in the tree satisfies this
condition for the existing left and right sub-trees.
Caution
A tree must be balanced to obtain the smallest height, i.e., both the left and right sub-trees must have the same
number of nodes.
3. The nodes which have the same parent node are known as …………………...
(a) Root (b) Siblings (c) Left nodes (d) Right nodes
In the Figure 4.19, the node T1 at level 0 represents the root node. The two sub nodes T2 and T3 are the child
nodes at level 1. The successor nodes T4, T5, T6 and T7 are the terminal nodes. The number of nodes at level
1 is equal to 2. Similarly the number of nodes at level 2 is equal to 4.
The main advantage of a complete binary tree is that the position of the parent node and child node can be
mapped easily in an array. The mapping for a binary tree is defined by assigning a number to every node in the
tree. The root node is assigned the number 1. For the other nodes, if i is its number, then the left child node is
assigned the position 2i and the right child node is assigned the position 2i+1. The mapping of binary tree
provides a simple form of array storage representation. Hence the nodes can be stored in an array as a[i],
where a[ ] is the array.
Caution
In a complete binary tree, level 0 contains exactly one node, the root node; the root can have at most two sub
nodes at level 1, and level 2 can contain at most 4 sub nodes.
Example: Figure 4.20 depicts the representation of complete binary tree in an array. An array representation of
binary tree allocates nodes of the tree in a memory. Each node is indexed such that the nodes are associated
with the index number of the array for allocation.
In the Figure 4.21, nodes a and b have two child nodes. The parent node a has two child nodes b and c forming
the left sub-tree and right sub-tree respectively. Similarly, node b is the parent node for nodes d and e. A
binary tree in which its non-leaf nodes possess exactly two child nodes represents a strictly binary tree.
Example: Consider Figure 4.22, which depicts a binary tree with single-child nodes. Such a tree represents an
initial phase of designing the binary tree; it can be extended into a complete binary tree by adding the missing
child nodes.
In the Figure 4.22, the parent nodes b and c have only one child each, nodes d and e respectively. The node e
also has a single child node, f. By adding another child node to the parent nodes b, c and e, we can obtain an
extended binary tree.
Figure 4.23 depicts an extended binary tree.
Figure 4.23: Extended Binary Tree.
Thus, 9 must be copied at the position where the value of node was 8 and the left pointer of 10 must be set as
NULL. This completes the entire deletion procedure.
The C function to find the height of a tree is shown in the given example.
Example
In a binary search tree, a node with minimum value is found by traversing and obtaining the extreme left node
of the tree. If there is no left sub-tree, then the root is returned as the node which holds the item of the least
value.
6. In a binary tree, at level 1, there must be only one node known as root node.
(a) True (b) False
7. Which among the following traversal of binary tree starts with traversing the root node?
(a) Preorder (b) Inorder (c) Postorder (d) All of these.
8. Which among the following operations is performed before performing insertion operation?
(a) Searching (b) Deletion (c) Modification (d) None of these.
Consider the binary tree in Figure 4.34. The figure is a simple binary tree with four levels of nodes.
In the Figure 4.34, the inorder traversal of the binary tree is "H D I B E A F C G". The equivalent threaded
binary tree is shown in Figure 4.35.
In a threaded binary tree, threads are set by considering the leaf nodes. In Figure 4.35, if we consider leaf node
I, then the inorder predecessor of I is D and the left thread will point at node D.
Similarly, the inorder successor of I is B and the right thread will point at node B. All the nodes in the binary
tree will be traversed in the similar format.
However, the left thread of node H has no inorder predecessor to point to, and the right thread of node G has
no inorder successor. In such situations, the threads have no node to point to. To solve this problem, the
threaded binary tree uses an extra node called the head node. The head node has the same structure as the
normal tree nodes. If the tree is non-empty, the left child pointer of the head node points at the root of the tree,
and the left thread of node H and the right thread of node G point to the head node.
Using the symbols and frequencies from the Table 4.36, we create the leaf nodes and then sort them. Symbols
D and E have the least frequency, 8; these two nodes are combined to make a node DE having frequency
8+8=16. This new node DE is the parent node of the nodes D and E, and DE replaces D and E as shown in
Figure 4.37.
Again we sort the nodes based on their frequency of occurrence. Now DE and C have the least frequencies, i.e.,
16 and 10 respectively. This time we combine DE and C to create a new node DEC having frequency 26. Nodes DE
and C are replaced by their parent DEC as depicted in Figure 4.38.
Similarly, combine B with frequency 12 and DEC with frequency 26 to create BDEC. BDEC becomes the
parent of B and DEC with frequency 38. At last only two nodes are left namely, BDEC and A. We again sort
them and combine both to form ABDEC which has a frequency count of 62.
Figure 4.39: The Huffman Tree.
After making ABDEC parent of A and BDEC and replacing them with ABDEC, we have created the Huffman
tree for the symbols in Table 4.1. Node ABDEC is the root of the tree. The Figure 4.38 shows the Huffman
tree thus constructed.
Figure 4.41: Two example expression trees; an expression tree's leaves are operands, its internal nodes
operators.
An expression tree need not store parentheses because the correct order of operations inheres in the tree's
structure. Preorder, inorder, and postorder traversals of an expression tree yield prefix, infix, and postfix
expressions, respectively. Huffman coding trees and expression trees are examples of full binary trees. Any
node of a full binary tree has either zero or two children. Some books reverse the definitions of full and
complete binary trees presented here.
A multiway tree of order m is an ordered tree where each node has at most m children. For each node, if k is
the actual number of children in the node, then k – 1 is the number of keys in the node. If the keys and subtrees
are arranged in the fashion of a search tree, then this is called a multiway search tree of order m. For example,
the following is a multiway search tree of order . Note that the first row in each node shows the keys, while the
second row shows the pointers to the child nodes. Of course, in any useful application there would be a record
of data associated with each key, so that the first row in each node might be an array of records where each
record contains a key and its associated data. Another approach would be to have the first row of each node
contain an array of records where each record contains a key and a record number for the associated data
record, which is found in another file. This last method is often used when the data records are large. The
example software will use the first method.
In a balanced tree each node must be in one of these three states. If there exists a node in a tree where this is
not true, then such a tree is said to be unbalanced.
A new node is inserted at the leaf or terminal node level. The only nodes which can have their balance
indicator changed by such an insertion are those which lie on a path between the root of the tree and the newly
inserted leaf. The possible changes which can occur to a node on this path are as follows:
1. The node was either left or right heavy and has now become balanced.
2. The node was balanced and has now become left or right heavy.
3. The node was heavy and the new node has been inserted in the heavy subtree, thus creating an unbalanced
subtree. Such a node is said to be a critical node.
If condition 1 applies to a current node, then the balance indicators of all ancestor nodes of this node remain
unchanged, since the longest path in the subtree remains unchanged. When condition 2 applies to a current
node, then the balance indicators of the ancestors will change. If condition 3 applies to a current node, then the
tree has become unbalanced and this node has become critical.
In the case of rebalancing a tree when a critical node has been encountered, there are 2 broad cases which can
arise, each of which can be further subdivided into 2 essentially similar subcases. A general representation of
case 1 is given in the following figure, where the rectangles labeled T1, T2, and T3 represent trees and the
node labeled NEW denotes the node being inserted.
The expression at the bottom of each rectangle denotes the maximum path length in that tree after insertions.
For example in figure (a) since node X is critical, the node Y must have been balanced prior to insertion. This
case covers the situation when Y has become heavy in the same direction that X was heavy. A concrete
example of the second possibility for case 1 is exemplified in figure (c). The PATH and DIRECTION vectors
are defined in the next algorithm.
Figure 4.44: Case of rebalancing a tree.
In the second case, which is given in figure (d), Y becomes heavy in the direction opposite to that in which X
was heavy. It is clear that node Z must have been balanced prior to insertion. A specific example of case 2b is
given in figure (e). Again PATH and DIRECTION refer to vectors that are associated with the next algorithm.
(e) Example of case 2
The node structure of a tree will consist of a left pointer (LPTR), a right pointer (RPTR), a key field (K), a
balance indicator (BI), and an information field (DATA). The name of the node structure is NODE. A list head
for the tree is assumed with its left pointer containing the address of the root of the actual tree. A general
algorithm for inserting a node into a height balanced tree is as follows:
1. If this is the first insertion then allocate a node, set its fields and exit
2. If the name is not already in the tree then attach the new node to the existing tree else write item already
present and exit
3. Search for an unbalanced node
4. Adjust the balance indicators, if there is no critical node, exit
5. If the node was balanced and then becomes heavy or the node was heavy and becomes balanced then adjust
the balance indicators and exit
6. Rebalance the tree and exit.
Lab Exercise
1. Write a C program to represent the linked list representation of binary tree into sequential representation.
2. Write a recursive function to add the first 'n' natural numbers. [Hint: sum = 1 + 2 + … + n]
Situation: A Turning Tide of Customers. When Binary Tree opened its doors in 1993, it helped companies
migrate to Lotus Notes. By 1999, Binary Tree was noticing a new trend in the migration market: an increase
in demand for migrations to Microsoft. Some of Binary Tree's former customers who originally migrated to
Lotus Notes were now pounding on Binary Tree's doors to demand a complete overhaul to the Microsoft
solution. "Those requests kept escalating until they reached a point where it was obvious that this was a big
opportunity for us," said Steven Pivnik, CEO of Binary Tree. "We have learned to listen to our customers and
there are a number of reasons why they want to migrate to the Microsoft platform. Many of them cite that the
Microsoft platform is maturing substantially and that they are looking for reduced costs, better productivity,
and lower costs of ownership."
4.12 Summary
A tree structure is a way of presenting the hierarchical nature of a structure in a graphical form.
The two ways to represent trees are linked representation and array representation.
To evaluate an expression tree, we recursively evaluate the left and right sub-trees and then apply the
operator at the root.
The three types of graphs are directed graphs, undirected graphs, and mixed graphs.
A binary tree is a finite set of data elements and each node contains a maximum of two branches.
A binary tree is called a strictly binary tree when the non-terminal nodes have exactly two child nodes
forming left and the right sub-trees.
In a threaded binary tree, the pointers are represented as threads such that the threads point to other nodes in
the binary tree during traversal operations.
4.13 Keywords
Converse inorder traversal: An inorder traversal performed in reverse order, i.e., right sub-tree, root, then left
sub-tree.
Infinite loop: A series of instructions in a computer program which, on execution, result in a cyclic repetition
of the same instructions.
Leaf nodes: The nodes without any successors.
Node predecessor: Node representing the parent node.
Node successor: Node representing the child node.
Ordered list: An ordered list is one which is maintained in some predefined order, such as alphabetical or
numerical order.
Threads: Pointers that link a node to other nodes in the tree, such as its inorder predecessor or successor.
Traversing flag: Indicates if the node was visited during traversal.
5. Write a procedure for linked list representation of binary tree. Consider the array representation provided
in question 2 for constructing a linked representation.
6. Represent the following binary tree in an array.
7. "In threaded binary trees, threads are used instead of pointers." Justify with example.
8. In the binary tree given, delete node D and insert node I.
10. "Binary search trees have more advantages when compared to other data structures." Justify the statement.
5.0 Objectives
After studying this chapter, you will be able to:
Define the graphs
Discuss the undirected and directed graph or digraph
Explain the graph representation
Discuss the breadth first traversal and depth first traversal
Explain the adjacency matrix and adjacency list
Explain the orthogonal representation of graph
Discuss the adjacency multilist representation
Discuss the graph traversals
Explain the transitive closure
5.1 Introduction
In computer science, a graph is an abstract data structure that is meant to implement the graph concept from
mathematics.
A graph data structure consists mainly of a finite (and possibly mutable) set of ordered pairs, called edges or
arcs, of certain entities called nodes or vertices. As in mathematics, an edge (x, y) is said to point or go from x
to y. The nodes may be part of the graph structure, or may be external entities represented by integer indices or
references.
A graph data structure may also associate to each edge some edge value, such as a symbolic label or a numeric
attribute (cost, capacity, length, etc.).
It is often useful to bound the running time of graph algorithms. Unlike most other computational problems,
for a graph G = (V, E) there are two relevant parameters describing the size of the input: the number |V| of
vertices in the graph and the number |E| of edges in the graph. Inside asymptotic notation (and only there), it is
common to use the symbols V and E when one really means |V| and |E|. We adopt this convention here
to simplify asymptotic functions and make them easily readable. The symbols V and E are never used inside
asymptotic notation with their literal meaning, so this abuse of notation does not risk ambiguity. For example,
O(E + V log V) means O(|E| + |V| log |V|). Another common convention, referring to the values |V| and |E|
by the names n and m respectively, sidesteps this ambiguity.
Example: Figure 5.1 is a connected graph. It has only one connected component, namely itself. Figure 5.2 is a
graph with two connected components.
A (simple) cycle in a graph is a (simple) path of length three or more that connects a vertex to itself. We do not
consider paths of the form v (path of length 0), v, v (path of length 1), or v, w, v (path of length 2) to be cycles.
A graph is cyclic if it contains at least one cycle. A connected, acyclic graph is sometimes called a free tree.
Figure 5.2 shows a graph consisting of two connected components where each connected component is a free
tree. A free tree can be made into an ordinary tree if we pick any vertex we wish as the root and orient each
edge from the root.
1. Every free tree with n ≥ 1 vertices contains exactly n-1 edges.
2. If we add any edge to a free tree, we get a cycle.
We can prove (1) by induction on n, or what is equivalent, by an argument concerning the "smallest
counterexample." Suppose G = (V, E) is a counterexample to (1) with the fewest vertices, say n vertices. Now
n cannot be 1, because the only free tree on one vertex has zero edges, and (1) is satisfied. Therefore, n must
be greater than 1.
Now claim that in the free tree there must be some vertex with exactly one incident edge. In proof, no vertex
can have zero incident edges, or G would not be connected. Suppose every vertex has at least two edges
incident. Then, start at some vertex v1, and follow any edge from v1. At each step, leave a vertex by a different
edge from the one used to enter it, thereby forming a path v1, v2, v3, . . ..
Since there are only a finite number of vertices in V, all vertices on this path cannot be distinct; eventually, we
find vi = vj for some i < j. We cannot have i = j-1 because there are no loops from a vertex to itself, and we
cannot have i = j-2 or else we entered and left vertex vi+1 on the same edge. Thus, i ≤ j-3, and we have a cycle
vi, vi+1, . . . , vj = vi. Thus, we have contradicted the hypothesis that G had no vertex with only one edge
incident, and therefore conclude that such a vertex v with edge (v, w) exists. Now consider the graph G'
formed by deleting vertex v and edge (v, w) from G. G' cannot contradict (1), because if it did, it would be a
smaller counterexample than G. Therefore, G' has n-1 vertices and n-2 edges. But G has one more edge and
one more vertex than G', so G has n-1 edges, proving that G does indeed satisfy (1). Since there is no smallest
counterexample to (1), we conclude there can be no counterexample at all, so (1) is true. We can now easily prove
statement (2), that adding an edge to a free tree forms a cycle. If not, the result of adding the edge to a free tree
of n vertices would be a graph with n vertices and n edges. This graph would still be connected, and we
supposed that adding the edge left the graph acyclic. Thus we would have a free tree whose vertex and edge
count did not satisfy condition (1).
Figure 5.3: A digraph with four vertices and five arcs.
Notice that the "arrowhead" is at the vertex called the "head" and the tail of the arrow is at the vertex called
the "tail." We say that arc v –> w is from v to w, and that w is adjacent to v.
Example:
The vertices of a digraph can be used to represent objects, and the arcs relationships between the objects. For
example, the vertices might represent cities and the arcs airplane flights from one city to another. As another
example, a digraph can be used to represent the flow of control in a computer
program. The vertices represent basic blocks and the arcs possible transfers of flow of control.
A path in a digraph is a sequence of vertices v1, v2, . . . , vn, such that v1 –> v2, v2 –> v3, . . . , vn-1 –> vn are
arcs. This path is from vertex v1 to vertex vn, and passes through vertices v2, v3, . . . , vn-1, and ends at vertex vn.
The length of a path is the number of arcs on the path, in this case, n-1. As a special case, a single vertex v by
itself denotes a path of length zero from v to v. In Figure 5.4 the sequence 1, 2, 4 is a path of length 2 from
vertex 1 to vertex 4.
A path is simple if all vertices on the path except possibly the first and last, are distinct. A simple cycle is a
simple path of length at least one that begins and ends at the same vertex. In Figure 5.4, the path 3, 2, 4, 3 is a
cycle of length three. In many applications it is useful to attach information to the vertices and arcs of a
digraph. For this purpose we can use a labelled digraph, a digraph in which each arc and/or each vertex can
have an associated label. A label can be a name, a cost, or a value of any given data type.
Closely related is the labelled adjacency matrix representation of a digraph, where A[i, j] is the label on the arc
going from vertex i to vertex j. If there is no arc from i to j, then a value that cannot be a legitimate label must
be used as the entry for A[i, j].
Caution
For a simple graph with no self-loops, the adjacency matrix must have 0s on the diagonal.
2. A graph is said to be……….if there is a path between any two of its nodes
(a) connected (b) complete (c) balanced (d) binary.
4. The graph G is said to be ……….. if each edge in the graph is assigned a non-negative numerical value called
the weight or length of the edge
(a) complete (b) weighted (c) balanced (d) tree.
Here
1. M = One bit mark field to be used to indicate whether or not the edge has been examined.
2. vi = Vertex in graph such that there is an edge joining vi to vj.
3. vj = Vertex in graph such that there is an edge joining vi to vj.
4. LINK i for vi = Link to some other node representing an edge incident to vi.
5. LINK j for vj = Link to some other node representing an edge incident to vj.
Proof.
Let VY = {v1, . . . , vk}. Let X be constructed from Y by adding vertices vk+1, . . . , vn such that for m > k, vm is
adjacent to all but at most one of v1, . . . , vm−1. Assuming that an orthogonal representation of X[v1, . . . ,
vm−1] in Rd has been constructed satisfying (2), we show there is an orthogonal representation of X[v1, . . . , vm]
in Rd satisfying (2). If vm is adjacent to v1, . . . , vm−1 then choose as m any vector in
Otherwise, let vs be the only vertex of X[v1, . . . , vm−1] not adjacent to vm in X[v1, . . . , vm]. We want to choose
a vector m such that
We can conclude the desired vector exists, since clearly none of the subspaces Ai, Bi is equal to W. Thus we
have constructed an orthogonal representation of X in Rd such that the vectors representing u and v are
linearly independent for any distinct vertices u, v of X.
Ex2: What is a directed graph?
As for depth-first search, we can build a spanning forest when we perform a breadth-first search. In this case,
we consider edge (x, y) a tree edge if vertex y is first visited from vertex x in the inner loop of the search
procedure BFS.
It turns out that for the breadth-first search of an undirected graph, every edge that is not a tree edge is a cross
edge, that is, it connects two vertices neither of which is an ancestor of the other. The breadth-first search
algorithm given below inserts the tree edges into a set T, which we assume is initially empty. Every
entry in the array mark is assumed to be initialized to the value unvisited; the algorithm works on one connected
component. If the graph is not connected, BFS must be called on a vertex of each component. Note that in a
breadth-first search we must mark a vertex visited before enqueuing it, to avoid placing it on the queue more
than once.
Example: The breadth-first spanning tree for the graph G is shown in algorithm. We assume the search began
at vertex a. As before, we have shown tree edges solid and other edges dashed. We have also drawn the tree
with the root at the top and the children in the left-to-right order in which they were first visited.
Breadth-First Algorithm
The time complexity of breadth-first search is the same as that of depth-first search.
procedure bfs ( v );
{ bfs visits all vertices connected to v using breadth-first search }
var
Q: QUEUE of vertex;
x, y: vertex;
begin
mark[v] := visited;
ENQUEUE(v, Q);
while not EMPTY(Q) do begin
x := FRONT(Q);
DEQUEUE(Q);
for each vertex y adjacent to x do
if mark[y] = unvisited then begin
mark[y] := visited;
ENQUEUE(y, Q);
INSERT((x, y), T)
end
end
end; { bfs }
Each vertex visited is placed in the queue once, so the body of the while loop is executed once for each vertex.
Each edge (x, y) is examined twice, once from x and once from y. Thus, if a graph has n vertices and e edges,
the running time of BFS is O(max(n, e)) if we use an adjacency list representation for the edges. Since e ≥ n is
typical, we shall usually refer to the running time of breadth-first search as O(e), just as we did for depth-first
search. Depth-first search and breadth-first search can be used as frameworks around which to design efficient
graph algorithms. For example, either method can be used to find the connected components of a graph, since
the connected components are the trees of either spanning forest. We can test for cycles using breadth-first
search in O(n) time, where n is the number of vertices, independent of the number of edges. As we discussed
in any graph with n vertices and n or more edges must have a cycle. However, a graph could have n-1 or fewer
edges and still have a cycle, if it had two or more connected components. One sure way to find the cycles is to
build a breadth-first spanning forest. Then, every cross edge (v, w) must complete a simple cycle with the tree
edges leading to v and w from their closest common ancestor, as shown in Figure 5.8.
Since all vertices on the adjacency list at C have now been exhausted, the search returns to B, from which the
search proceeds to D. Vertices A and C on the adjacency list of D were already visited, so the search returns to
B and then to A.
At this point the original call of dfs(A) is complete. However, the digraph has not been entirely searched;
vertices E, F and G are still unvisited. To complete the search, we can call dfs(E).
The APSP problem is to find for each ordered pair of vertices (v, w) the smallest length of any path from v to
w. We could solve this problem using Dijkstra's algorithm with each vertex in turn as the source. A more
direct way of solving the problem is to use the following algorithm due to R. W. Floyd. For convenience, let us
again assume the vertices in V are numbered 1, 2 , . . . , n. Floyd's algorithm uses an n x n matrix A in which to
compute the lengths of the shortest paths. We initially set A[i, j] = C[i, j] for all i ≠ j. If there is no arc from i to
j, we assume C[i, j] = ∞. Each diagonal element is set to 0. We then make n iterations over the A matrix. After
the kth iteration, A[i, j] will have for its value the smallest length of any path from vertex i to vertex j that does
not pass through a vertex numbered higher than k. That is to say, i and j, the end vertices on the path, may be
any vertex, but any intermediate vertex on the path must be less than or equal to k.
In the kth iteration we use the following formula to compute A:
Ak[i, j] = min(Ak-1[i, j], Ak-1[i, k] + Ak-1[k, j])
The subscript k denotes the value of the A matrix after the kth iteration, and it should not be assumed that there
are n different matrices. We shall eliminate these subscripts shortly. This formula has the simple interpretation
shown in Figure.5.10.
To compute Ak[i, j] we compare Ak-1[i, j], the cost of going from i to j without going through k or any higher-
numbered vertex, with Ak-1[i, k] + Ak-1[k, j], the cost of going first from i to k and then from k to j, without
passing through a vertex numbered higher than k. If passing through vertex k produces a cheaper path than
what we had for Ak-1[i, j], then we choose that cheaper cost for Ak[i, j].
Figure 5.10: Including k among the vertices to go from i to j
Example: Consider the weighted digraph shown in Figure 5.10. The values of the A matrix initially and after
the three iterations are shown in Figure 5.11.
Since Ak[i, k] = Ak–1[i, k] and Ak[k, j] = Ak–1[k, j], no entry with either subscript equal to k changes during the
kth iteration. Therefore, we can perform the computation with only one copy of the matrix. A program to
perform this computation on n x n matrices is easy to write, and its running time is clearly O(n3), since the
program is basically nothing more than a triply nested for-loop. To verify that this program works, it is easy to
prove by induction on k that after k passes through the triple for-loop, A[i, j] holds the length of the shortest
path from vertex i to vertex j that does not pass through a vertex numbered higher than k.
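The program itself is not reproduced here; a minimal C sketch of the computation might look as follows (the vertex count N and the INF sentinel standing in for "no arc" are assumptions of this sketch):

```c
#include <stdio.h>

#define N 3
#define INF 1000000L /* stands in for "infinity": no arc */

/* Floyd's algorithm: A[i][j] becomes the smallest length of any
   path from i to j; a single copy of the matrix suffices. */
void floyd(long A[N][N], long C[N][N])
{
    int i, j, k;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            A[i][j] = (i == j) ? 0 : C[i][j];
    for (k = 0; k < N; k++)
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                if (A[i][k] + A[k][j] < A[i][j])
                    A[i][j] = A[i][k] + A[k][j];
}
```

The triply nested loop makes the O(n3) bound immediate.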
Since the adjacency-matrix version of Dijkstra finds shortest paths from one vertex in O(n2) time, it, like
Floyd's algorithm, can find all shortest paths in O(n3) time. The compiler, machine, and implementation
details will determine the constants of proportionality. Experimentation and measurement are the easiest way
to ascertain the best algorithm for the application at hand. If e, the number of edges, is very much less than n2,
then despite the relatively low constant factor in the O(n3) running time of Floyd, we would expect the
adjacency-list version of Dijkstra, taking O(ne log n) time to solve the APSP, to be superior, at least for large
sparse graphs.
Warshall's algorithm applies the same idea to compute the transitive closure of a digraph: here A is a boolean
matrix, and Ak[i, j] is true if and only if either
1. There is already a path from i to j not passing through a vertex numbered higher than k–1, or
2. There is a path from i to k not passing through a vertex numbered higher than k–1 and a path from k to j not
passing through a vertex numbered higher than k–1.
As before, Ak[i, k] = Ak–1[i, k] and Ak[k, j] = Ak–1[k, j], so we can perform the computation with only one copy
of the A matrix. The resulting Pascal program, named Warshall after its discoverer, is shown below.
Warshall's algorithm for transitive closure
procedure Warshall ( var A: array[1..n, 1..n] of boolean;
C: array[1..n, 1..n] of boolean );
{ Warshall makes A the transitive closure of C }
var
i, j, k: integer;
begin
for i := 1 to n do
for j := 1 to n do
A[i, j] := C[i, j];
for k := 1 to n do
for i := 1 to n do
for j := 1 to n do
if A[i, j] = false then A[i, j] := A[i, k] and A[k, j]
end; { Warshall }
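For readers following the C programs later in the text, the same procedure can be rendered in C (the fixed size N is an assumption of this sketch):

```c
#include <stdbool.h>

#define N 4

/* Warshall's algorithm: A becomes the transitive closure of C,
   i.e. A[i][j] ends up true iff there is a nonempty path from i to j. */
void warshall(bool A[N][N], bool C[N][N])
{
    int i, j, k;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            A[i][j] = C[i][j];
    for (k = 0; k < N; k++)
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                if (!A[i][j])
                    A[i][j] = A[i][k] && A[k][j];
}
```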
Lab Exercise
1. Write a complete program for Dijkstra's algorithm using a partially ordered tree as a priority queue and
linked adjacency lists.
2. Write a program to compute the transitive reduction of a digraph. What is the time complexity of your
program?
Case Study: Finding Strong Components- By performing two depth-first searches, we can test
whether a directed graph is strongly connected, and if it is not, we can actually produce the subsets of
vertices that are strongly connected to each other. This can also be done in only one depth-first search, but the
method used here is much simpler to understand.
First, a depth-first search is performed on the input graph G. The vertices of G are numbered by a post order
traversal of the depth-first spanning forest, and then all edges in G are reversed, forming Gr. The graph Figure
1 represents Gr for the graph G shown in Figure 2; the vertices are shown with their numbers.
The algorithm is completed by performing a depth-first search on Gr, always starting a new depth first search at
the highest-numbered vertex. Thus, we begin the depth-first search of Gr at vertex G, which is numbered 10.
This leads nowhere, so the next search is started at H. This call visits I and J. The next call starts at B and visits
A, C, and F. The next calls after this are dfs(D) and finally dfs(E). The resulting depth-first spanning forest is
shown in Figure3.
Each of the trees (this is easier to see if you completely ignore all non tree edges) in this depth-first spanning
forest forms a strongly connected component. Thus, for our example, the strongly connected components are
{G}, {H, I, J}, {B, A, C, F}, {D}, and {E}.
To see why this algorithm works, first note that if two vertices v and w are in the same strongly connected
component, then there are paths from v to w and from w to v in the original graph G, and hence also in Gr.
Now, if two vertices v and w are not in the same depth-first spanning tree of Gr, clearly they cannot be in the
same strongly connected component.
To prove that this algorithm works, we must show that if two vertices v and w are in the same depth-first
spanning tree of Gr, there must be paths from v to w and from w to v. Equivalently, we can show that if x is the
root of the depth-first spanning tree of Gr containing v, then there is a path from x to v and from v to x.
Applying the same logic to w would then give a path from x to w and from w to x. These paths would imply
paths from v to w and w to v (going through x).
Since v is a descendant of x in Gr's depth-first spanning tree, there is a path from x to v in Gr and thus a path
from v to x in G. Furthermore, since x is the root, x has the higher postorder number from the first depth-first
search. Therefore, during the first depth-first search, all the work processing v was completed before the work
at x was completed. Since there is a path from v to x, it follows that v must be a descendant of x in the
spanning tree for G otherwise v would finish after x. This implies a path from x to v in G and completes the
proof.
Figure 1: Gr numbered by postorder traversal of G
Figure 3: Depth-first search of Gr strong components are {G}, {H, I, J}, {B, A, C, F}, {D},{E}
5.7 Summary
A path is simple if all vertices on the path, except possibly the first and last, are distinct.
In the adjacency matrix representation, the time required to access an element of an adjacency matrix is
independent of the size of V and E.
The adjacency-matrix version of Dijkstra finds shortest paths from one vertex in O(n2) time; it, like Floyd's
algorithm, can find all shortest paths in O(n3) time.
The center of graph G is a vertex of minimum eccentricity. Thus, the center of a digraph is a vertex that is
closest to the vertex most distant from it.
During a depth-first traversal of a directed graph, certain arcs, when traversed, lead to unvisited vertices.
The arcs leading to new vertices are called tree arcs.
5.8 Keywords
Directed Graph (digraph for short): G consists of a set of vertices V and a set of arcs E. The vertices are also
called nodes or points; the arcs could be called directed edges or directed lines.
Graph G: It is said to be complete if every node u in G is adjacent to every other node v in G.
Loops: An edge e is called a loop if it has identical endpoints, that is, if e=[u, u].
Mixed Graph: A mixed graph G contains both directed and undirected edges
Multiple Edges: Distinct edges e and e' are called multiple edges if they connect the same endpoints, that is, if
e = [u, v] and e' = [u, v].
6.0 Objectives
After studying this chapter, you will be able to:
• Define the bubble and selection sort
• Explain the merge and quick sort
• Explain the insertion and shell sort
• Discuss the address calculation and radix sort
• Explain the comparison of sorting methods
• Discuss the hash table and collision resolution techniques
• Explain the Linear Search (Sequential Search)
• Define the binary search
• Discuss searching an ordered table
6.1 Introduction
Finding better algorithms to sort a given set of data is an ongoing problem in the field of computer science.
Sorting is placing a given set of data in a particular order. Simple sorts place data in ascending or descending
order. For discussion purposes, we will look at sorting data in ascending order. However, you may modify the
code to sort the data in descending order by reversing the relational operators (i.e., change 'nums[j] < nums[j–
1]' to 'nums[j] > nums[j–1]'). In this lesson we will analyze sorts of different efficiency, and discuss when and
where they can be used. In order to simplify the explanation of certain algorithms, we will assume a swap( )
function exists that switches the values of two variables.
An example of such a function for int variables is void swap(int &item1, int &item2). Its reference parameters
point directly to the storage locations of the variables passed; no local copies are made, so the swapped values
persist after the function returns.
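Since the programs in this chapter are written in C, which has no reference parameters, the same effect is obtained by passing pointers; a minimal sketch:

```c
/* Swap two ints through pointers: the C counterpart of the
   reference-parameter swap described above. */
void swap(int *item1, int *item2)
{
    int temp = *item1;
    *item1 = *item2;
    *item2 = temp;
}
```

A call then takes the variables' addresses: swap(&a, &b);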
Implementation
void bubbleSort(int numbers[], int array_size)
{
int i, j, temp;
for (i = 0; i < array_size - 1; i++) //each pass bubbles the largest remaining value to the end
for (j = 0; j < array_size - 1 - i; j++)
if (numbers[j] > numbers[j+1])
{
temp = numbers[j]; //swap the adjacent pair
numbers[j] = numbers[j+1];
numbers[j+1] = temp;
}
}
Parallel Analysis
Steps 1–10 form one big loop that is repeated n–1 times. Therefore, the parallel time complexity is O(n). In
this algorithm, odd-numbered steps need (n/2)–2 processors and even-numbered steps require (n/2)–1
processors. Therefore, this needs O(n) processors.
Suppose an array consists of the 5 numbers 25, 17, 31, 13 and 2. The selection sort algorithm works as follows:
1. In the first iteration the 0th element 25 is compared with the 1st element 17, and since 25 is greater than 17,
they are interchanged.
2. Now the 0th element 17 is compared with the 2nd element 31. But 17 being less than 31, they are not
interchanged.
3. This process is repeated till the 0th element is compared with the rest of the elements. During the comparison,
if the 0th element is found to be greater than the compared element, then they are interchanged; otherwise not.
4. At the end of the first iteration, the 0th element holds the smallest number.
5. Now the second iteration starts with the 1st element 25. The above process of comparison and swapping is
repeated.
6. So if there are n elements, then after (n – 1) iterations the array is sorted.
Program
#include <stdio.h>
void main( ) //entry level of the program
{
int arr[5] = { 25, 17, 31, 13, 2 };//initialize the array
int i, j, temp;
for ( i = 0; i <= 3; i++ )//outer loop from 0 to 3
{
for ( j = i + 1 ; j <= 4 ; j++) //inner loop from 1 to 4
{
if (arr[i] > arr[j]) //compare the ith and jth elements
{
temp = arr[i];
arr[i] = arr[j]; //interchange the array[j] to array[i]
arr[j] = temp;
}
}
}
printf ("\n Array after sorting:\n");
for ( i = 0; i <= 4; i++ )
printf ("%d\t", arr[i]); //print the sorted array
}
Algorithm of Program
Step 1: The main() method of program is called. // program start from here
Step 2: Initialization
Set arr[5]<-{ 25, 17, 31, 13, 2};
and var i, j, temp;
Step 3: loop:
Outer loop start: from 0 to 3 // for row
Inner loop start: from 1 to 4 // for column
Compare: if ( arr[i] > arr[j] )
Set temp <- arr[i]
Set arr[i] <- arr[j] // interchange the value of one position with the other
Then set arr[j] <- temp
End of inner loop
End of outer loop
Step 4: Now print the sorted array through for loop from 0 to 4 times.
//End of algorithm
printf("Before");
printf("%d", ary[j]);
printf("\n");
merge_sort(ary, 0, MAX_ARY - 1); //method is called with the array and its bounds and returns void
Explanation
1. The merging of two sub lists, the first running from index l to m, and the second running from
index (m + 1) to n, requires no more than (n – l + 1) iterations.
2. So if l = 1, then no more than n iterations are required, where n is the size of the list to be sorted.
3. Therefore, if n is the size of the list to be sorted, every pass that a merge routine performs requires time
proportional to O(n), and the number of passes required to be performed is log2 n.
4. The time complexity of the algorithm is O(n log2(n)), for both average-case and worst case.
5. The merge sort requires an additional list of size n.
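The merging step that points 1–3 analyze can be sketched as follows (the function name and the caller-supplied buffer are assumptions of this sketch):

```c
/* Merge two adjacent sorted runs a[l..m] and a[m+1..n] (inclusive
   bounds) through a temporary buffer; at most n - l + 1 iterations. */
void merge_runs(int a[], int tmp[], int l, int m, int n)
{
    int i = l, j = m + 1, k = l;
    while (i <= m && j <= n)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i <= m) tmp[k++] = a[i++]; /* drain the left run */
    while (j <= n) tmp[k++] = a[j++]; /* drain the right run */
    for (k = l; k <= n; k++)          /* copy back into place */
        a[k] = tmp[k];
}
```

The extra buffer tmp is the "additional list of size n" that point 5 refers to.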
The following points explain the algorithm of insertion sort for an array of 5 elements:
1. In the first iteration the 1st element 17 is compared with the 0th element 25. Since 17 is smaller than 25, 17 is
inserted in the 0th place. The 0th element 25 is shifted one position to the right.
2. In the second iteration, the 2nd element 31 and the 0th element are compared. Since 31 is greater than 17,
nothing is done. Then the 2nd element 31 is compared with the 1st element. Again no action is taken, as
25 is less than 31.
3. In the third iteration, the 3rd element 13 is compared with the 0th element 17. Since 13 is smaller than 17,
13 is inserted at the 0th place in the array and all the elements from the 0th till the 2nd position are shifted to the
right by one position.
4. In the fourth iteration the 4th element 2 is compared with the 0th element 13. Since 2 is smaller than
13, the 4th element is inserted at the 0th place in the array and all the elements from the 0th till the 3rd are
shifted right by one position. As a result, the array now becomes a sorted array.
On the other hand, if the input is pre-sorted, the running time is O(n), because the test in the inner for loop
always fails immediately. Indeed, if the input is almost sorted, insertion sort will run quickly. Because of this
wide variation, it is worth analyzing the average-case behaviour of this algorithm.
Program
#include<stdio.h>
#include<conio.h>
void main()
{
int A[20], N, Temp, i, j;
clrscr();
printf ("\n\n\t Enter The Number Of Terms... ");
scanf ("%d", &N);
printf ("\n\t Enter The Elements Of The Array...");
for (i=0; i<N; i++)
{
gotoxy (25,11+i);
scanf ("\n\t\t%d", &A[i]);
}
for (i=1; i<N; i++)
{
Temp = A[i];
j = i–1;
while (j>=0 && Temp<A[j]) //test j>=0 first so A[j] is never read out of bounds
{
A[j+1] = A[j];
j = j–1;
}
A[j+1] = Temp;
}
printf ("\n\t The Ascending Order List is...:\n");
for (i=0; i<N; i++)
printf ("\n\t\t\t %d", A[i]);
getch();
}
Example:
An array,
25 57 48 37 12 92 86 33
Let us create ten sub-files, one for each of the ten possible first digits. Initially, each of these sub-files is
empty. Consider an array of pointers f[10], where f[i] points to the first element in the sub-file whose first digit
is i. After scanning the first element (i.e., 25), it is placed into the sub-file pointed to by f[2]. Each of the
sub-files is maintained as a sorted linked list of the original array elements.
3. The sort which inserts each element A(K) into its proper position in the previously sorted
sub array A(1), ..., A(K–1) is:
(a). Insertion sort (b). Radix sort (c). Merge sort (d). Bubble sort
Program
#include <stdio.h>
#include <malloc.h>
#include <stdlib.h>
/* Initialization of variables and functions */
int check_order(unsigned int *, int);
void lsd_radix_sort(unsigned int *, int);
int main(int argc, char *argv[]) {
int i, nvals = 10000;
unsigned int *array;
if(argc > 1) // optional command-line override of the element count
nvals = atoi(argv[1]);
array = malloc(nvals * sizeof(unsigned int)); // allocate the array
for (i = 0; i < nvals; i++) // fill with random values
array[i] = random();
lsd_radix_sort(array, nvals);
if((i = check_order(array, nvals)) != 0)
printf ("%d misorderings\n", i);
else
printf ("array is in order\n");
}
int check_order(unsigned int *ip, int n)
{
int i, nrev = 0;
for (i = 1; i < n; i++)
if (ip[i–1] > ip[i])
nrev++; // incrementing the count
return (nrev); // returning the count
}
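The lsd_radix_sort routine itself is not shown in the fragment above; one plausible byte-at-a-time counting implementation is sketched below (a sketch under that assumption, not necessarily the author's version):

```c
#include <stdlib.h>
#include <string.h>

/* LSD radix sort on unsigned ints: one stable counting pass
   per byte, least significant byte first. */
void lsd_radix_sort(unsigned int *a, int n)
{
    unsigned int *tmp = malloc(n * sizeof(unsigned int));
    int shift, i, b;
    for (shift = 0; shift < 32; shift += 8) {
        int count[256] = {0}, start[256], pos = 0;
        for (i = 0; i < n; i++)                 /* count each byte value */
            count[(a[i] >> shift) & 0xff]++;
        for (b = 0; b < 256; b++) {             /* starting offset per bucket */
            start[b] = pos;
            pos += count[b];
        }
        for (i = 0; i < n; i++)                 /* stable distribution pass */
            tmp[start[(a[i] >> shift) & 0xff]++] = a[i];
        memcpy(a, tmp, n * sizeof(unsigned int));
    }
    free(tmp);
}
```

Because each pass is stable, sorting on successively more significant bytes leaves the array fully sorted after the last pass.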
A third method is folding, in which the identifier is partitioned into several parts, all but the last part being of
the same length. These parts are then added together to obtain the hash value. To store the name or to add
attributes of the name, we compute the hash value of the name, and place the name or attributes, as the case
may be, at that place in the table whose index is the hash value of the name. To retrieve the attribute values of
a name kept in the symbol table, we apply the hash function to the name to obtain the index of the table at which
its attributes are stored. So we find that no comparisons are required to be done; the time required for the
retrieval is independent of the table size. The retrieval is possible in a constant amount of time, which will be
the time taken for computing the hash function. Therefore a hash table seems to be the best for realization of
the symbol table, but there is one problem associated with the hashing, and that is collision. Hash collision
occurs when the two identifiers are mapped into the same hash value. This happens because a hash function
defines a mapping from a set of valid identifiers to the set of those integers that are used as indices of the table.
Therefore we see that the domain of the mapping defined by the hash function is much larger than the range of
the mapping, and hence the mapping is of a many-to-one nature. Therefore, when we implement a hash table, a
suitable collision-handling mechanism is to be provided, which will be activated when there is a collision.
Collision handling involves finding an alternative location for one of the two colliding symbols. For example,
if x and y are different identifiers and h(x) = h(y), then x and y are the colliding symbols. If x is encountered
before y, and h(x) = i, then the ith entry of the table will be used for accommodating the symbol x, but later on when y
comes, there is a hash collision. Therefore we have to find a suitable alternative location either for x or y. This
means we can either accommodate y in that location, or we can move x to that location and place y in the ith
location of the table. Various methods are available to obtain an alternative location to handle the collision.
They differ from each other in the way in which a search is made for an alternative location. The following are
commonly used collision-handling techniques:
Rehashing
In rehashing we find an alternative empty location by modifying the hash function and applying the modified
hash function to the colliding symbol. For example, if x is the symbol and h(x) = i, and if the i th location is
already occupied, then we modify the hash function h to h1, and find out h1(x), if h1(x) = j. If the j th location is
empty, then we accommodate x in the j th location. Otherwise, we once again modify h 1 to some h2 and repeat
the process until the collision is handled. Once the collision is handled, we revert to the original hash function
before considering the next symbol.
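As an illustration only, the common special case in which each modified function hm(x) is h(x) + m modulo the table size (linear probing) can be sketched as follows (the table size and all names here are assumptions):

```c
#define TABSIZE 11

/* Insert key into an open-addressed table by rehashing:
   try h(key), h(key)+1, ... modulo TABSIZE until a free slot
   is found. Returns the slot used, or -1 if the table is full. */
int rehash_insert(int table[], int occupied[], int key)
{
    int m, h = key % TABSIZE;
    for (m = 0; m < TABSIZE; m++) {
        int j = (h + m) % TABSIZE;
        if (!occupied[j]) {
            table[j] = key;
            occupied[j] = 1;
            return j;
        }
    }
    return -1; /* every slot probed; collision could not be handled */
}
```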
Overflow chaining
Overflow chaining (see Figure 6.6) is a method of implementing a hash table in which the collisions are
handled automatically. In this method, we use two tables: a symbol table to accommodate identifiers and their
attributes, and a hash table, which is an array of pointers pointing to symbol table entries. Each symbol table
entry is made of three fields: the first for holding the identifier, the second for holding the attributes, and the
third for holding the link or pointer that can be made to point to any symbol table entry.
Figure 6.7: Hash table implementation using overflow chaining for collision handling.
Program
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define HASHSIZE 1000
#define MAXLINE 1024
typedef struct tnode {
char *data;
struct tnode *next;
} node;
void htable_init(node *hashtable); // fire up hashtable
void htable_insert(node *hashtable, char *str); // insert data into hashtable
void htable_resolve(node *hashtable, int loc, char *str); // resolve collisions in hashtable
void htable_display(node *hashtable); // display hashtable
int htable_delete(node *hashtable, char *str); // delete an entry from hashtable
int htable_hash(char *str); // hash data for hashtable
int main(void) {
char line[MAXLINE];
node *hashtable;
hashtable = (node *)malloc(HASHSIZE * sizeof(node));
htable_init(hashtable);
while((fgets(line, MAXLINE, stdin)) != NULL)
htable_insert(hashtable, line);
htable_display(hashtable);
return 0;
}
/* fire up hashtable */
void htable_init(node *hashtable) {
int i = 0;
for(i = 0; i < HASHSIZE; i++)
hashtable[i].data = NULL, hashtable[i].next = NULL;
}
/* insert data into hashtable */
void htable_insert(node *hashtable, char *str) {
int index = 0;
// determine hash function
index = htable_hash(str);
if(hashtable[index].data != NULL) {
// collision occurs - resolve by chaining
htable_resolve(hashtable, index, str);
} else {
hashtable[index].data = calloc(strlen(str) + 1, sizeof(char));
strcpy(hashtable[index].data, str);
}
}
/* hash data for hashtable */
int htable_hash(char *str) {
int index = 0;
// sum the character codes; str is only read, so no copy is needed
while(*str) {
index += *str;
str++;
}
index = index % HASHSIZE;
return index;
}
/* resolve collisions in hashtable */
void htable_resolve(node *hashtable, int loc, char *str) {
node *tmp;
tmp = hashtable + loc;
while(tmp->next != NULL)
tmp = tmp->next;
tmp->next = (node *)malloc(sizeof(node));
tmp->next->data = calloc(strlen(str) + 1, sizeof(char));
strcpy(tmp->next->data, str);
tmp->next->next = NULL;
}
/* display hashtable */
void htable_display(node *hashtable)
{
int i = 0;
node *target;
Example
Input
Enter the number of elements in the list, max = 10
10
Enter the elements
23
1
45
67
90
100
432
15
77
55
Output
The list before sorting is:
The elements of the list are:
23 1 45 67 90 100 432 15 77 55
Enter the element to be searched
100
The element whose value is 100 is present at position 5 in list
Input
Enter the number of elements in the list max = 10
10
Enter the elements
23
1
45
67
90
101
23
56
44
22
Output
The list before sorting is:
The elements of the list are:
23 1 45 67 90 101 23 56 44 22
Enter the element to be searched
100
The element whose value is 100 is not present in the list
Explanation
1. In the best case, the search procedure terminates after one comparison only, whereas in the worst case, it
will do n comparisons.
2. On average, it will do approximately n/2 comparisons, since the search time is proportional to the number
of comparisons required to be performed.
3. The linear search requires an average time proportional to O(n) to search one element. Therefore to search
n elements, it requires a time proportional to O(n2).
4. We conclude that this searching technique is preferable when the value of n is small. The reason for this is
the difference between n and n2 is small for smaller values of n.
Program
#include <stdio.h>
#define MAX 10
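The body of the program is not reproduced above; the sequential search routine it would use can be sketched as (the function name is an assumption):

```c
/* Sequential (linear) search: compare val with each element in
   turn; returns the 0-based position, or -1 when val is absent. */
int linear_search(const int a[], int n, int val)
{
    int i;
    for (i = 0; i < n; i++)
        if (a[i] == val)
            return i;
    return -1;
}
```

On the ten-element list of the earlier example, searching for 100 returns position 5, matching the sample output.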
Example
Input
Enter the number of elements in the list, max = 10
10
Enter the elements
34
2
1
789
99
45
66
33
22
11
Output
The elements of the list before sorting are:
34 2 1 789 99 45 66 33 22 11
1 2 3 4 5 6 7 8 9 10
Enter the element to be searched:
99
The element whose value is 99 is present at position 5 in the list
Input
When we search for data stored in an arbitrary order in an array, we cannot expect the search key to be
in any particular region of the array. However, when the data is stored in sorted order, we can first compare the
search key with the middle element. If it matches, the search terminates. When the search key is greater than
the key in the middle element, the middle element along with all the elements below it can be eliminated,
since the key can only lie above the middle element. Similarly, when the search key is less than the middle
element's key, the middle element along with all the elements above it are eliminated. In either case, no more
than half of the elements remain, and we apply the same procedure to the remaining elements: elements
1 to [mid–1] or elements [mid+1] to n, where the variable pointing to the middle element is called mid.
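The halving strategy just described is binary search; a minimal sketch in C:

```c
/* Binary search on a sorted array a[0..n-1]; returns the index
   of val, or -1 when it is not present. */
int binary_search(const int a[], int n, int val)
{
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2; /* avoids overflow of low + high */
        if (a[mid] == val)
            return mid;
        else if (a[mid] < val)
            low = mid + 1;  /* discard the middle element and everything below it */
        else
            high = mid - 1; /* discard the middle element and everything above it */
    }
    return -1;
}
```

Since each iteration discards at least half of the remaining elements, at most log2 n comparisons are needed.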
Lab Exercise
1. Write a program that implements insertion sort algorithm for a linked list of integers.
2. Write a program that sorts the elements of a two-dimensional array row wise and column wise.
while(i<isize)
{
if(val == index[i])
{
pos = 8 * i;
return pos;
}
if(val < index[i])
{
low = 8 * (i–1);
high = 8 * i;
break;
}
else
{
low = 8 * i;
high = 8 * (i+1);
}
i++;
}
while(low < high)
{
if(val == arr[low])
return low;
else
low++;
}
return –1;
}
int main()
{
int arr[MAX]={8,14,26,38,72,115,306,
321,329,387,409,512,540,567,583,592,602,611,618,741,798,811,814,876};
int index[(MAX/8)+1]={0};
createIndex(&index[0],(MAX/8)+1,&arr[0],MAX);
int opt=0, pos=0;
while(opt < MAX)
{
pos = indexSeqSearch(arr[opt], &index[0], (MAX/8)+1, &arr[0], MAX);
if( pos != -1)
{
printf("\n%d found at position %d", arr[opt], pos);
}
else
printf("\n%d not found", arr[opt]);
opt++;
}
return 0;
}
Program
#include <stdio.h>
#include <stdlib.h>
#define MAX 5
int interpolationsearch(int a[], int low, int high, int x)
{
int mid;
while(low<=high)
{
mid=low+((high-low)*(x-a[low]))/(a[high]-a[low]); /* multiply before dividing: integer division would truncate the ratio to 0 */
if(x==a[mid])
return mid+1;
if(x<a[mid])
high=mid–1;
else
low=mid+1;
}
return –1;
}
int main()
{
int arr[MAX];
int i, n;
int val, pos;
printf("\n Enter total elements (n <= %d) :", MAX);
scanf("%d", &n);
printf("Enter %d Elements :", n);
for(i=0; i<n; i++)
scanf("%d", &arr[i]);
printf("\n LIST:");
for(i=0; i<n; i++)
printf("%d\t", arr[i]);
printf("\n Search For :");
scanf("%d", &val);
pos=interpolationsearch(&arr[0], 0, n-1, val); /* high index is n-1, the last element */
if(pos == -1)
printf("\n Element %d not found\n", val);
else
printf("\n Element %d found at position %d\n", val, pos);
return 0;
}
7. The running time of which of the following sorting algorithms depends on whether the
partitioning is balanced or unbalanced?
(a). Insertion sort (b). Selection sort (c). Quick sort (d). Merge sort.
6.18 Keywords
Card Sort: Another name for radix sort.
Hashing: A data object called a symbol table is required to be defined and implemented in
many applications, such as compiler/assembler writing.
Insertion Sort: Insertion sort is implemented by inserting a particular element at the appropriate position.
Merge Sort: This is another sorting technique having the same average-case and worst-case time complexities,
but requiring an additional list of size n.
Radix Sort: Radix sort is sometimes known as card sort, because it was used, until the advent of modern
computers, to sort old-style punch cards.
1.0 Objectives
After studying this chapter, you will be able to:
Discuss the history of operating systems
Define Operating system
Define the types of operating system
Discuss the system components and their services
Explain the system calls
Understand the system programs
1.1 Introduction
Modern general-purpose computers, including personal computers and mainframes, have an operating
system to run other programs, such as application software. Examples of operating systems for
personal computers include Microsoft Windows, Mac OS (and Darwin), UNIX, and Linux. The lowest
level of any operating system is its kernel. This is the first layer of software loaded into memory
when a system boots or starts up. The kernel provides access to various common core services to all
other system and application programs. These services include, but are not limited to: disk access,
memory management, task scheduling, and access to other hardware devices.
As well as the kernel, an operating system is often distributed with tools for programs to display and
manage a graphical user interface (although Windows and the Macintosh have these tools built into
the operating system), as well as utility programs for tasks such as managing files and configuring the
operating system. They are also often distributed with application software that does not relate
directly to the operating system's core function, but which the operating system distributor finds
advantageous to supply with the operating system.
The delineation between the operating system and application software is not precise, and is
occasionally subject to controversy. From commercial or legal points of view, the delineation can
depend on the contexts of the interests involved. For example, one of the key questions in the United
States v. Microsoft antitrust trial was whether Microsoft's Web browser was part of its operating
system, or whether it was a separable piece of application software.
Like the term "operating system" itself, the question of what exactly should form the "kernel" is
subject to some controversy, with debates over whether things like file systems should be included in
the kernel. Various camps advocate microkernels, monolithic kernels, and so on. Operating systems
are used on most, but not all, computer systems. The simplest computers, including the smallest
embedded systems and many of the first computers did not have operating systems. Instead, they
relied on the application programs to manage the minimal hardware themselves, perhaps with the aid
of libraries developed for the purpose. Commercially-supplied operating systems are present on
virtually all modern devices described as computers, from personal computers to mainframes, as well
as mobile computers such as PDAs and mobile phones.
Table 1
1.4.1 Hardware
In order to function properly, a computer system must have all four types of hardware: input,
processing, output, and storage.
Figure 1
In this example, the mouse and keyboard are the input devices and the monitor and speakers are
output devices. The processor is contained inside the tower unit and the storage devices are the hard
drive, CD-ROM drive and the diskette drive. Let us explore each of the devices in detail.
Input devices accept data in a form that the computer can utilize. Also, the input devices send the
data or instructions to the processing unit to be processed into useful information. There are many
examples of input devices, but the most commonly used input devices are shown below:
Figure 2
Figure 3
The input device feeds data, raw unprocessed facts, to the processing unit. The role of the processing
unit or central processing unit is to use a stored program to manipulate the input data into the
information required. In looking at the computer system below, the Central Processing Unit (CPU) is
not exactly visible. The CPU is found inside the tall, vertical unit, called a tower, located just to the
right of the monitor.
Figure 4
The CPU is the brain of the computer. The CPU consists of electronic circuits that interpret and
execute instructions; it communicates with the input, output, and storage devices. The CPU, with the
help of memory, executes instructions in the repetition of machine cycles.
A machine cycle consists of four steps:
1. The control unit fetches an instruction and data associated with it from memory.
2. The control unit decodes the instruction.
3. The arithmetic/logic unit executes the instruction.
4. The arithmetic/logic unit stores the result in memory.
The first two steps are called instruction time, I-time. Steps 3 and 4 are called execution time,
E-time. The speed of a computer is measured in megahertz, MHz.
A MHz is a million machine cycles per second. A personal computer listed at 500 MHz has a
processor capable of handling 500 million machine cycles per second. Another measure of speed is
gigahertz (GHz), a billion machine cycles per second. A third measure of speed is a megaflop, which
stands for one million floating-point operations per second. It measures the ability of the computer to
perform complex mathematical operations.
Memory, or primary storage, works with the CPU to hold instructions and data in order to be
processed. Memory keeps the instructions and data for whatever programs you happen to be using at
the moment. Memory is the first place data and instructions are placed after being input; processed
information is placed in memory to be returned to an output device. It is very important to know that
memory can hold data only temporarily because it requires a continuous flow of electrical current. If
current is interrupted, data is lost. Memory is in the form of a semiconductor or silicon chip and is
contained inside the computer.
Figure 5
There are two types of memory: ROM and RAM. ROM is read only memory. It contains programs
and data that are permanently recorded when the computer is manufactured. It is read and used by the
processor, but cannot be altered by the user. RAM is random access memory. The user can access
data in RAM memory randomly. RAM can be erased or written over at will by the computer program
or the computer user. The amount of RAM has increased dramatically in recent years.
Memory is measured in bytes. A byte is usually made up of 8 bits and represents one character—a
letter, digit, or symbol. The number of bytes that can be held is a measure of the memory and storage
capacity. Bytes are usually measured in groups of kilobytes, megabytes, gigabytes, and terabytes. The
following chart defines each term.
Table 2
Memory is usually measured in megabytes; a typical personal computer will have 64 MB or more.
Storage is usually measured in gigabytes.
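The chart's units can be expressed as the conventional powers of two (1 KB = 2^10 bytes). The constant names below are illustrative, not standard library definitions.

```python
# Conventional binary sizes of the memory/storage units in the chart.
KB = 2**10   # kilobyte: 1,024 bytes
MB = 2**20   # megabyte: 1,024 kilobytes
GB = 2**30   # gigabyte: 1,024 megabytes
TB = 2**40   # terabyte: 1,024 gigabytes

# A typical personal computer of this era: 64 MB of memory.
memory_bytes = 64 * MB
print(memory_bytes)   # 67108864
print(GB // MB)       # 1024: a gigabyte is roughly a thousand megabytes
```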
Since we have said that memory is in the form of chips and must maintain a constant flow of
electricity, there must be a more permanent form of storage that does not depend on a constant flow
of electricity. That form of storage is called secondary or auxiliary storage. The benefits of secondary
storage are large space capacity, reliability, convenience and economy.
Magnetic disk storage is a very popular type of secondary storage: the floppy disk drive is an
external disk drive, while a hard disk drive is an internal disk drive. The floppy disk drive is usually a
3½" drive and uses a diskette made of flexible Mylar coated with iron oxide, a substance that
can be magnetized. A diskette records data as magnetized spots on the tracks of its surface. A floppy
disk can hold 1.44 MB, while a 'Zip' drive can hold 100 MB. A hard disk, an internal disk, is a metal
platter coated with magnetic oxide that can be magnetized to represent data. Hard disks come in a
variety of sizes and can be assembled into a disk pack. Hard disks for personal computers are 3½"
disks in sealed modules. A hard disk is capable of holding a great deal more than a floppy disk. Hard
disks for personal computers are measured in gigabytes. (Remember, a gigabyte is roughly a thousand
megabytes or a thousand floppy disks.)
While the size or data capacity of a hard drive is very important, the speed of accessing that data is
equally as important. Files on hard drives can be accessed significantly faster and more conveniently
than floppy drives.
Figure 6
The ever-demanding need for storage has required even better storage capacity than that of magnetic
disks. Optical disk technology meets that need. Included in the list of this type of technology is the
optical disk, the CD-ROM or DVD-ROM. The CD-ROM, compact disk read-only memory, can hold
up to 660 MB per disk, or the equivalent of more than 400 standard 3½" diskettes. The new storage
technology that outpaces all others is called DVD-ROM, digital versatile disk. The DVD has a 4.7
GB capacity, which is about seven times that of the CD-ROM.
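The "about seven times" figure can be checked with simple arithmetic, assuming decimal megabytes and gigabytes as is customary for disc capacities:

```python
# Comparing the stated capacities: CD-ROM (660 MB) vs DVD-ROM (4.7 GB).
cd_mb = 660
dvd_mb = 4.7 * 1000   # 4.7 GB expressed in (decimal) megabytes

ratio = dvd_mb / cd_mb
print(round(ratio, 1))   # 7.1, i.e. about seven times the CD-ROM's capacity

# And the CD-ROM against 1.44 MB floppy diskettes:
print(cd_mb // 1.44)     # 458.0, i.e. more than 400 standard diskettes
```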
A backup system is a way of storing data in more than one location. Magnetic tape is usually used for
this purpose. Magnetic tape is an inexpensive type of storage; it looks like the tape used in
audiocassettes. Finally, the last component of a computer system is the output device. An output
device displays the processed information to the user. The two most popular forms of output devices
are the printer and the monitor. The monitor produces output that is temporary; the output is lost
when it is rewritten or erased or when power is lost. Monitor output is called softcopy. The printer
displays output in a permanent manner; it is called hardcopy. Other types of output devices include
voice output and music output devices.
Caution
In order to protect the data on your hard drive, you should have a backup system.
Word Processor           Provides the tools for entering and revising text, adding
                         graphical elements, and formatting and printing documents.
Spreadsheets             Provides the tools for working with numbers, allowing you
                         to create and edit electronic spreadsheets for managing
                         and analyzing information.
Presentation Graphics    Provides the tools for creating graphics that represent
                         data in a visual, easily understood format.
Communication Software   Provides the tools for connecting one computer with
                         another, enabling you to send and receive information and
                         share files and resources.
As important as applications software may be, it is not able to communicate directly with hardware
devices. Another type of software is required: operating systems software.
Operating Systems software is the set of programs that lies between applications software and the
hardware devices. Think of the cross section of an onion (see Figure 7). The inner core of the onion
represents the hardware devices, and the applications software represents the outside layer. The
middle layer is the operating systems software. Instructions must pass from the outer layer
through the middle layer before reaching the inner layer.
All computers, regardless of size, require the operating systems software. As soon as your personal
computer is turned on, the operating systems software is loaded into RAM in order to use your
computer devices and other software. A few short years ago, personal computers used an operating
system called MS-DOS, the Microsoft Disk Operating System. This was a command-driven program in
which you needed to know command names and syntax. The need for a more user-friendly system
brought about Microsoft Windows operating systems software. Icons or pictures, requiring no
knowledge of spelling or syntax, drive Windows operating systems software. Windows is a GUI,
graphical user interface. A GUI uses graphic symbols, icons, in its interface. Further, Windows
allows you to multitask, which means that you may use more than one program at the same time. The
newest version of Windows is Windows 8.
Here are the expected features of Windows 8:
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
The most important system program for an operating system is its command interpreter. It is the
program which reads and interprets the commands given by the user. This program is also known as
control card interpreter or command line interpreter or the console command processor (in CP/M) or
the shell (in UNIX). Its function is simple: get the next command and execute it. The commands given
to the command interpreter are implemented in two ways. In one approach the command interpreter
itself contains the code to execute the command, so the number of commands that can be given
determines the size of the command interpreter. An alternative approach implements all commands as
special system programs, so the command interpreter merely uses the command to identify a file to
be loaded into memory and executed. Thus a command delete X would search for a file called delete,
load it into memory, and pass it the parameter X. In this approach new commands can easily be
added to the system by creating new files of the proper name. The command interpreter, which can
now be quite small, need not be changed in order to add new commands.
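The second approach can be sketched as follows. A dictionary of functions stands in for the separate executable files on disk, and the command names and behaviour are purely illustrative.

```python
# Toy command interpreter: the interpreter contains no command code itself.
# Each command is a separate "program" (here, a function in a registry that
# stands in for executable files on disk). Names and behaviour are invented.
commands = {
    "delete": lambda *params: "deleted " + ", ".join(params),
    "date":   lambda *params: "Mon Jan 01 00:00:00 2024",
}

def interpret(line):
    """Identify the 'file' named by the command and pass it the parameters."""
    name, *params = line.split()
    program = commands[name]   # locate the program to load
    return program(*params)    # 'execute' it with the parameters

print(interpret("delete X"))   # deleted X
```

New commands are added simply by registering a new entry (on a real system, by creating a new file of the proper name); the interpreter itself never changes.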
2. The commonly used UNIX commands like date, ls, cat, etc. are stored in
(a). /dev directory (b). /bin and /usr/bin directories
(c). /UNIX directory (d). /tmp directory
3. When a computer is first turned on or restarted, a special type of absolute loader called ............. is
executed.
(a). Compile and Go loader (b). Boot loader
(c). Bootstrap loader (d). Relocating loader
4. Which of the following operating systems is better for implementing a client-server network?
(a). MS DOS (b). Windows 95
(c). Windows 98 (d). Windows 2000
Simple View
An Operating System is the layer between the hardware and software, as in Figure 8.
Figure 8: Operating System is the layer between the hardware and software.
An Operating System is responsible for the following functions
Device management using device drivers
Process management using processes and threads
Inter-process communication
Memory management
File systems
In addition, all operating systems come with a set of standard utilities. The utilities allow common
tasks to be performed such as
being able to start and stop processes
being able to organise the set of available applications
organise files into sets such as directories
view files and sets of files
edit files
rename, copy, delete files
communicate between processes
Kernel
The kernel of an operating system is the part responsible for all other operations. When a computer
boots up, it goes through some initialisation functions, such as checking memory. It then loads the
kernel and switches control to it. The kernel then starts up all the processes needed to communicate
with the user and the rest of the environment (e.g. the LAN).
The kernel is always loaded into memory, and kernel functions always run, handling processes,
memory, files and devices.
The traditional structure of a kernel is a layered system, such as UNIX. In this, all layers are part of
the kernel, and each layer can talk to only a few other layers. Application programs and utilities live
above the kernel.
The UNIX kernel (see Figure 9)
Most of the Operating Systems being built now use instead a micro-kernel, which minimises the size
of the kernel. Many traditional services are made into user-level services. Communication between
services is often by an explicit message-passing mechanism.
The major micro-kernel Operating System is Mach. Many others use the concepts of Mach (see
Figure 10).
Figure 10: Micro-kernel Operating System.
Some systems, such as Windows NT, use a mixed approach (see Figure 11).
Fourth Generation
With the development of LSI (Large Scale Integration) circuits and chips, operating systems entered
the personal computer and workstation age. Microprocessor technology evolved to the point that it
became possible to build desktop computers as powerful as the mainframes of the 1970s. Two
operating systems have dominated the personal computer scene: MS-DOS, written by Microsoft, Inc.
for the IBM PC and other machines using the Intel 8088 CPU and its successors, and UNIX, which is
dominant on the large personal computers using the Motorola 68000 CPU family.
It was then revealed that Google, together with other manufacturers, had been developing a
new open-source OS for mobile devices; this was the beginning of the Open Handset Alliance
(OHA). This group included LG, HTC, T-Mobile, Samsung, Motorola, Intel, and Qualcomm amongst
others. They all got together to develop an open standard for mobile devices based on Linux.
The OS has been available since October 2008 for developers as open-source software.
The mobile OS was first introduced to buyers in late 2008 with the HTC Dream (also called the
T-Mobile G1), and started getting really popular with the launch of the HTC Hero in July 2009.
Many started calling the new OS a serious rival to the iPhone, and by February 2010 there were
60,000-plus Android phones shipping daily. The App Market now boasts over 30,000 apps to download.
It still has a way to go behind the iPhone, but app creators now bring out apps for both systems.
Android has had many updates from v1.0. Version 1.5 added camcorder functionality
and home screen widgets. Then v1.6 came along with voice search and an improved Android App
Market. Version 2.0 brought support for larger screens and satellite navigation. Then came v2.1,
which added extra home screens and animated wallpapers amongst other features.
Currently there are many manufacturers developing and releasing Android hardware, from smart
phones and tablet PCs to e-book readers, not forgetting Google TV.
Back in March 2010, Google partnered with Sony and Intel to create a TV platform powered by
the Android OS. Its mission is to merge the TV and the Internet, bringing them into the home
through a new generation of televisions, Blu-ray players and set-top boxes. Google has also partnered
with Dish Network to integrate Google TV within their DVRs.
Questions
1. How was the Android operating system first introduced to buyers?
2. Write the benefits of Android operating system.
1.9 Summary
Kernel is a program that constitutes the central core of a computer operating system.
Operating system (OS) is a software program that manages the hardware and software resources
of a computer.
System call is the mechanism used by an application program to request service from the
operating system.
An operating system that utilizes multitasking is one that allows more than one program to run
simultaneously.
Symmetric multiprocessing (SMP) involves a multiprocessor computer architecture where two or
more identical processors connect to a single shared main memory.
1.10 Keywords
Asymmetric multiprocessing: Asymmetric hardware systems commonly dedicate individual
processors to specific tasks.
MISD multiprocessing: Multiple Instructions, Single Data is a type of parallel computing
architecture where many functional units perform different operations on the same data.
Real time operating system (RTOS): Real-time operating systems are used to control machinery,
scientific instruments and industrial systems such as embedded systems.
SIMD multiprocessing: In a single instruction stream, multiple data stream computer one processor
handles a stream of instructions, each one of which can perform calculations in parallel on multiple
data locations.
UNIX-like: UNIX-like family is a diverse group of operating systems, with several major
subcategories including System V, BSD, and Linux.
2.0 Objectives
After studying this chapter, you will be able to:
Define process
Describe the process state model
Explain the description and control of processes
Describe the PCB and process control
Understand threads
Explain threads in Linux
2.1 Introduction
A process is a program that is running on your computer. This can be anything from a small
background task, such as a spell-checker or system events handler, to a full-blown application like
Internet Explorer or Microsoft Word. All processes are composed of one or more threads.
Since most operating systems have many background tasks running, your computer is likely to have
many more processes running than actual programs. For example, you may only have three programs
running, but there may be twenty active processes. You can view active processes in Windows by
opening the Task Manager (press Ctrl-Alt-Delete and click Task Manager). On a Mac, you can see
active processes by opening Activity Monitor (in the Applications→Utilities folder).
The term "process" can also be used as a verb, which means to perform a series of operations on a set
of data. For example, your computer's CPU processes information sent to it by various programs.
A process has several components: the object program, i.e. the code to be executed; the data used in
executing the program; the resources the program may require while executing; and status information
used for verifying the state of the process execution. A process can run to completion only when all
requested resources have been allocated to it. Two or more processes could be executing the same
program, each using their own data and resources.
Suspended Processes
Characteristics of a suspended process:
1. A suspended process is not immediately available for execution.
2. The process may or may not be waiting on an event.
3. To prevent its execution, a process may be suspended by the OS, by its parent process, by the
process itself, or by an agent.
4. A process may not be removed from the suspended state until the agent that suspended it orders
the removal.
Swapping is used to move all of a process from main memory to disk. The OS swaps out a
process by putting it in the suspended state and transferring it to disk.
1. Pointer: Pointer points to another process control block. Pointer is used for maintaining
the scheduling list.
2. Process State: Process state may be new, ready, running, waiting and so on.
3. Program Counter: It indicates the address of the next instruction to be executed for this
process.
4. Event information: For a process in the blocked state this field contains information
concerning the event for which the process is waiting.
5. CPU registers: These include general-purpose registers, stack pointers, index registers,
accumulators, etc. The number and type of registers depend entirely on the
computer architecture.
1. Memory Management Information: This information may include the values of the base and
limit registers. This information is useful for deallocating the memory when the process
terminates.
2. Accounting Information: This information includes the amount of CPU and real time used,
time limits, job or process numbers, account numbers, etc.
Process control block also includes information about CPU scheduling, I/O resource
management, file management information, priority and so on. The PCB simply serves as
the repository for any information that may vary from process to process.
When a process is created, hardware registers and flags are set to the values provided by
the loader or linker. Whenever that process is suspended, the contents of the processor
registers are usually saved on the stack and the pointer to the related stack frame is stored
in the PCB. In this way, the hardware state can be restored when the process is scheduled
to run again.
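A minimal PCB can be modelled directly from the fields listed above. The field names and the save/restore helpers below are illustrative, not taken from any real kernel.

```python
# Minimal process control block with the fields described above.
from dataclasses import dataclass, field

@dataclass
class PCB:
    pid: int
    state: str = "new"            # new, ready, running, waiting, ...
    program_counter: int = 0      # address of the next instruction
    registers: dict = field(default_factory=dict)
    event_info: str = ""          # event a blocked process is waiting for
    next_pcb: object = None       # pointer used to build scheduling lists

def save_context(pcb, pc, regs):
    """On suspension: preserve the hardware state in the PCB."""
    pcb.program_counter = pc
    pcb.registers = dict(regs)

def restore_context(pcb):
    """On rescheduling: hand the saved state back to the 'hardware'."""
    return pcb.program_counter, pcb.registers

p = PCB(pid=1)
save_context(p, 0x400, {"acc": 42, "sp": 0x7FF0})
print(restore_context(p))  # (1024, {'acc': 42, 'sp': 32752})
```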
An operating system, as a service provider, naturally also imposes control on processes that consume
its services. That is, both processes and resources are under the operating system's control. To enable
this control, some facilities must be provided; these are the control tables we cover here.
Control Structures
The operating system constructs and maintains tables of information about each entity that it is
managing. Figure 2.4 illustrates that four different types of tables are maintained by the operating
system:
• Memory tables: are used to keep track of both main memory and secondary memory. Part of main
memory is reserved for use by the operating system; the remainder is available for use by processes.
• I/O tables: are used by the operating system to manage I/O devices. They should record:
The availability of each particular device
The status of I/O operations relating to each device and the location in main memory being
used as the source or destination of the I/O transfer.
• File tables: provide information about
the existence of files
their location on secondary memory
their current status and attributes
• Process tables: contain what the operating system must know to manage and control processes,
including:
Process location
First, a program statically consists of a set of instructions that manipulate data; thus the operating
system needs to allocate space for its code and data. In addition, the dynamic execution of a program
requires a stack that is used to keep track of procedure calls and parameter passing between
procedures.
Finally, each process has associated with it a number of attributes that are used by the operating
system for process control. Typically the operating system needs to maintain a structure called a
process control block (PCB) containing these attributes. The collection of the code, data, stack, and
attributes is referred to as process image.
The location of a process image depends on how the memory management is implemented. Most
modern operating systems use a memory management scheme in which a process image consists of a
set of blocks that need not be stored contiguously. The blocks may be variable length (usually called
segments), or fixed length (called pages), or a combination. This scheme allows the operating system
to bring in only a portion of any particular process. Therefore process tables must show the location
of each segment and/or page of each process image. Figure 2.5 depicts a primary process table with
one entry for each process.
Process attributes
Process attributes are stored in PCB. Different systems organize this information in different ways, so
here we examine only what type of information should be included as attributes, instead of how. The
typical elements of a PCB are:
Process identification
All operating systems need to assign a unique identifier to each process in the systems so as to refer
to them conveniently. The identifier may be numeric, for example simply the index of the
corresponding entry in the primary process table; otherwise, a mapping must be available to allow the
operating system to locate the right entry based on the identifier.
Almost every process is created by another process, its parent process, on behalf of a user. The parent's
identifier should also be present in the PCB of the child process.
Process state information
It consists of the contents of processor registers. Typically the context includes user-visible registers,
control and status registers, and stack pointers.
Caution
When a process is interrupted, the context of the processor must be saved so that it can be restored
when the process resumes execution.
Process creation
We have previously discussed in rough terms what happens when a process is created; now that we
know the control structures for processes, it is clearer what is involved. First, a unique identifier is
assigned to the new process and a new entry is added to the primary process table. Second, the space
for all elements of a process image is allocated, and then the PCB is initialized. The values for some
fields of the PCB structure are already known, e.g. the ID of the parent process, program counter, and
system stack pointers, while other fields will use default values, e.g. the process state initialized to
Ready or Ready/Suspend, or simply be filled with zero. Finally the PCB will be linked into some data
structures and other structures may be created for billing and performance assessment purposes.
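The creation steps just described can be sketched in a few lines. All names here (`process_table`, `create_process`, the PCB fields) are hypothetical, chosen only to mirror the description above.

```python
# Sketch of process creation: assign an ID, initialise a PCB with known
# values and defaults, and link it into the primary process table.
process_table = {}   # primary process table: one entry per process
_next_pid = 1

def create_process(parent_pid=None):
    global _next_pid
    pid = _next_pid                 # 1. assign a unique identifier
    _next_pid += 1
    pcb = {                         # 2. allocate the image, initialise the PCB
        "pid": pid,
        "parent": parent_pid,       # known at creation time
        "program_counter": 0,
        "stack": [],
        "state": "Ready",           # default initial state
    }
    process_table[pid] = pcb        # 3. link the PCB into system structures
    return pid

first = create_process()
child = create_process(parent_pid=first)
print(process_table[child]["parent"])  # 1
```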
Process switching
When to switch processes
As mentioned before, multiprogramming systems perform process switching when the currently
running process has to wait for the completion of an I/O operation, thus keeping the processor busy for
higher efficiency. Upon the arrival of an interrupt signal, the operating system will move all the blocked
processes waiting for the signal to the Ready state, and then decide whether to resume execution of the
process currently in the Running state or to pre-empt that process for a higher-priority Ready process.
This relates to process scheduling, which we will discuss next.
Non-process kernel
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
Naturally, the Not Running state is to be split into two states: Ready and Blocked. Thus if the
creation period and termination period of a process are also considered states, we will obtain a
five-state model as depicted in Figure 2.7.
During the lifespan of a process, its execution status may be in one of five states (associated with
each state is usually a queue on which the process resides):
Executing: the process is currently running and has control of a CPU
Waiting: the process is currently able to run, but must wait until a CPU becomes available
Blocked: the process is currently waiting on I/O, either for input to arrive or output to be sent.
Suspended: the process is currently able to run, but for some reason the OS has not placed the
process on the ready queue.
Ready: the process is in memory, will execute given CPU time.
Processes in the Blocked state cannot execute until some event occurs, such as the completion of I/O
operation, while a ready process is always prepared to execute when given the opportunity.
Correspondingly, the queuing diagram in Figure 2.7 (b) may be extended to reflect this five-state
model. Figure 2.7 (a) differentiates the blocked processes and the ones that may be dispatched again
immediately by giving two paths from the Running state to the Ready state. An additional queue is
set up for blocked processes. It means that when an event occurs, the dispatcher will go all the way
through the queue for those processes waiting for that event. In some cases, there may be hundreds or
even more processes in that queue, therefore it would be more efficient to have a number of queues,
one for each event. Thus Figure 2.8 (b) is obtained.
When a process has to wait for the completion of I/O operation and thus gives up the processor,
dispatching control to another process may avoid the idle of the processor. But this arrangement does
not entirely solve the problem. The processor may be so much faster than I/O that all of the processes
in memory are waiting for I/O. Thus even with multiprogramming, a processor could be idle for a
long time.
Why not expand main memory so that more processes may be accommodated? It is possible, but
cannot be a once-for-all solution since more memory means higher cost and with more memory the
average size of programs is also likely to increase.
A real solution is swapping, which involves moving part or all of a blocked process from main
memory to disk. Thus memory space may be freed for the system to bring in new processes to run.
With swapping, a new state, Suspend, must be added to the process behaviour model, as depicted in
Figure 2.9 (a).
However a single Suspend state is still not enough, since the system needs to distinguish the
suspended processes that remain blocked from those that, though suspended and residing in
secondary memory, are available for execution as soon as they are loaded into
main memory.
Accordingly, two Suspend states, Blocked/Suspend and Ready/Suspend, are introduced in Figure 2.9
(b). A process may move from Blocked/Suspend to Ready/Suspend when the event for which it has
been waiting happens. All processes in either state may be brought back into main memory.
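The behaviour just described, the basic states extended with the two Suspend states, can be captured as a small transition table. The event names are illustrative; real schedulers define more transitions than these.

```python
# Legal transitions in the extended process state model (illustrative
# event names; a sketch of Figure 2.9 (b), not a complete scheduler).
transitions = {
    ("New", "admit"):                    "Ready",
    ("Ready", "dispatch"):               "Running",
    ("Running", "timeout"):              "Ready",
    ("Running", "event wait"):           "Blocked",
    ("Running", "exit"):                 "Exit",
    ("Blocked", "event occurs"):         "Ready",
    ("Blocked", "suspend"):              "Blocked/Suspend",
    ("Ready", "suspend"):                "Ready/Suspend",
    ("Blocked/Suspend", "event occurs"): "Ready/Suspend",
    ("Ready/Suspend", "activate"):       "Ready",
    ("Blocked/Suspend", "activate"):     "Blocked",
}

def next_state(state, event):
    """Return the state a process moves to; KeyError if the move is illegal."""
    return transitions[(state, event)]

print(next_state("Blocked/Suspend", "event occurs"))  # Ready/Suspend
```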
3. Switching the CPU to another process, which requires saving the state of the old process and
loading the saved state of the new process, is called …………………………..
(a) Process blocking (b) context switch
(c) Time sharing (d) none of the above.
Process state: The state may be new, ready, running, waiting, halted, and so on.
Program counter: The counter indicates the address of the next instruction to be executed for this
process.
CPU registers: The registers vary in number and type, depending on the computer architecture. They
include accumulators, index registers, stack pointers, and general -purpose registers, plus any
condition-code information.
Along with the program counter, this state information must be saved when an interrupt occurs, to
allow the process to be continued correctly afterward (Figure 2.11).
CPU-scheduling information: This information includes a process priority, pointers to scheduling
queues, and any other scheduling parameters.
Memory-management information: This information may include such information as the value of
the base and limit registers, the page tables, or the segment tables, depending on the memory system
used by the operating system.
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
2.6 Threads
A thread is the smallest unit of processing that can be performed in an OS. In most modern operating
systems, a thread exists within a process - that is, a single process may contain multiple threads.
2.6.1 Processes Vs Threads
As we mentioned earlier, in many respects threads operate in the same way as processes.
Some of the similarities and differences are:
Similarities
Like processes, threads share the CPU, and only one thread is active (running) at a time.
Like processes, threads within a process execute sequentially.
Like processes, threads can create children.
And like processes, if one thread is blocked, another thread can run.
Differences
Unlike processes, threads are not independent of one another.
Unlike processes, all threads can access every address in the task.
Unlike processes, threads are designed to assist one another. Note that processes might or might not
assist one another because processes may originate from different users.
Why Threads?
Following are some reasons why we use threads in designing operating systems.
1. A process with multiple threads makes a great server, for example a print server.
2. Because threads can share common data, they do not need to use inter-process communication.
3. Because of their very nature, threads can take advantage of multiprocessors.
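Point 2 above, threads sharing common data without any inter-process communication, can be seen with Python's standard threading module. The lock serialises updates so the shared counter stays consistent.

```python
# Threads in one process share the address space: no pipes or shared
# memory needed, just a lock to guard the shared counter against races.
import threading

counter = 0
lock = threading.Lock()

def worker(increments):
    global counter
    for _ in range(increments):
        with lock:              # shared data: serialise the updates
            counter += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 4000: all four threads updated the same variable
```

Two separate processes doing the same work would each see their own private copy of `counter` and would need pipes or shared memory to combine the results.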
2.7 Threads in Linux
Threads are "lightweight processes" (LWPs). The idea is that a process has five fundamental parts: code
("text"), data (VM), stack, file I/O, and signal tables. "Heavyweight processes" (HWPs) have a
significant amount of overhead when switching: all the tables have to be flushed from the processor
for each task switch. Also, the only way to achieve shared information between HWPs is through
pipes and "shared memory". If a HWP spawns a child HWP using fork(), the only part that is shared
is the text.
Threads reduce overhead by sharing fundamental parts. By sharing these parts, switching happens
much more frequently and efficiently. Also, sharing information is not so "difficult" anymore:
everything can be shared. There are two types of threads: user-space and kernel-space.
Disadvantages: User-space threads have a problem that a single thread can monopolize the time slice,
thus starving the other threads within the task. Also, they have no way of taking advantage of SMPs
(Symmetric Multiprocessor systems, e.g. dual-/quad-Pentiums). Lastly, when a thread becomes I/O
blocked, all other threads within the task lose the time slice as well.
Solutions/workarounds: Some user-thread libraries have addressed these problems with several
workarounds. First, time slice monopolization can be controlled with an external monitor that uses its own
clock tick. Second, some SMPs can support user-space multithreading by firing up tasks on specified
CPUs then starting the threads from there [this form of SMP threading seems tenuous, at best]. Third,
some libraries solve the I/O blocking problem with special wrappers over system calls, or the task can
be written for non-blocking I/O.
Advantages: Since the clock tick will determine the switching times, a task is less likely to hog the
time slice from the other threads within the task. Also I/O blocking is not a problem. Lastly, if
properly coded, the process can automatically take advantage of SMPs and will run incrementally
faster with each added CPU.
2.7.3 Combination
Some implementations support both user- and kernel-space threads. This gives the advantages of each
to the running task. However, since Linux's kernel-space threads perform nearly as well as user-space
threads, the only advantage of using user-threads would be the cooperative multitasking.
Several problems with threads originate from a classic view and its intrinsic concurrency complexity.
2.7.4 Classic View
In many other multithreaded OSs, threads are not processes, merely parts of a parent task. Therefore,
the question of "what happens if a thread calls fork(), or (worse) if a thread execve()'s some external
program" becomes problematic: the whole task could be replaced. The POSIX 1c standard defines a
thread calling fork() to duplicate only the calling thread in the new process, and an execve() from a
thread stops all threads of that process.
Having two different implementations and schedulers for processes is a flaw that has perpetuated
from implementation to implementation. In fact, some multitasking OSs have opted not to support
threads due to these problems (not to mention the effort needed to make the kernel and libraries 100%
re-entrant). For example, Windows NT does not support POSIX threads (Windows NT does support
threads, but they are not POSIX compliant).
Achieving this vision is based on a single business operating system known as COS, the Cummins
Operating System, utilized throughout the company. This incorporates 10 major practices, as
illustrated, with Six Sigma as the primary process improvement method. Through COS practices,
Cummins understands the requirements of both internal and external customers. Having understood
these requirements, Cummins can then compare its desired and current performance.
Six Sigma is not simply a quality initiative, nor is it just about solving problems. It is about
making major process improvements in key areas of the business where a gap exists between desired
and current levels of performance. This starting point provides an opportunity for Cummins to look
carefully at every input into its business processes.
Six Sigma therefore helps Cummins to understand the key processes in helping it to deliver results;
and provides an appreciation that the business outputs are determined by the processes that the inputs
go through to become outputs. Using Six Sigma to improve business processes for selected projects
thus enables positive gains to be made in key areas of the business.
Questions
1. Explain the Cummins operating system business model.
2. What do you understand by Six Sigma?
2.8 Summary
Processes access all operating system resources through a well-defined system call interface with
the operating system.
Depending on the operating system (OS), a process may be made up of multiple threads of
execution that execute instructions concurrently.
A process consists of a set of instructions, which may include ones related to I/O operations.
Process Control is the active changing of the process based on the results of process monitoring.
Operating systems can be considered managers of the underlying hardware resources; however,
this management is not the ultimate goal, but a means by which processes may access the
resources reasonably.
The five-state model provides a systematic way of modelling the behaviour of processes. Many
operating systems are indeed constructed using this model.
A proxy server satisfying the requests of a number of computers on a LAN would benefit from
being a multi-threaded process.
Accounting information includes the amount of CPU and real time used, time limits, account
numbers, job or process numbers, and so on.
2.9 Keywords
Buffering: A buffer is a temporary storage location for data while the data is being transferred.
CPU Registers: The central processing unit (CPU) contains a number of memory locations which
are individually addressable and reserved for specific purpose. These memory locations are called
registers.
Process Control Block (PCB): The PCB is a data structure that allows the operating system to locate
key information about a process.
Program Counter: Program instructions uniquely identified by their program counters (PCs) provide a
convenient and accurate means of recording the context of program execution, and PC-based
prediction techniques have been widely used for performance optimizations at the architectural level.
Process Management: The operating system manages many kinds of activities ranging from user
programs to system programs like printer spooler, name servers, file server etc. Each of these
activities is encapsulated in a process.
Process State: The process state consists of everything necessary to resume the process execution if it
is somehow put aside temporarily.
Synchronization: In computer science, especially parallel computing, synchronization means the
coordination of simultaneous threads or processes to complete a task in order to get correct runtime
order and avoid unexpected race conditions.
Thread: A thread is a single sequence stream within a process. Because threads have some of the
properties of processes, they are sometimes called lightweight processes. Threads allow multiple
streams of execution within a process.
3.0 Objectives
After studying this chapter, you will be able to:
Understand the types of scheduler
Explain the scheduling criteria
Discuss uniprocessor scheduling
Discuss multiprocessor scheduling
Understand the algorithm evaluation
Explain the process scheduling in Linux
3.1 Introduction
Back in the old days of batch systems with input in the form of card images on a magnetic tape, the
scheduling algorithm was simple: just run the next job on the tape. With timesharing systems, the
scheduling algorithm became more complex, because there were generally multiple users waiting for
scheduling algorithm became more complex, because there were generally multiple users waiting for
service. There may be one or more batch streams as well (e.g., at an insurance company, for
processing claims). On a personal computer you might think there would be only one active process.
After all, a user entering a document on a word processor is unlikely to be simultaneously compiling
a program in the background. However, there are often background jobs, such as electronic mail
daemons sending or receiving e-mail. You might also think that computers have gotten so much faster
over the years that the CPU is rarely a scarce resource any more. However, new applications tend to
demand more resources. Processing digital photographs and watching real-time video are examples.
Which process should be created is another issue that long-term scheduling needs to deal with. The
decision may be made on a first-come-first-served basis, or it can be a tool to manage system
performance. For example, if the information is available, the scheduler may attempt to keep a mix of
processor-bound and I/O-bound processes. A processor-bound process is one that mainly performs
computational work and only occasionally uses I/O devices, while an I/O-bound process is one that uses
I/O devices more than the processor.
Short-term scheduling: Short-term scheduling is the most common use of the term scheduling, i.e.
deciding which ready process to execute next. The short-term scheduler, also known as the
dispatcher, is invoked whenever an event occurs that may lead to the suspension of the current
process or that may provide an opportunity to pre-empt a currently running process in favour of
another.
Figure 2 associates the three levels of scheduling with different storage facilities: the outermost
storage devices accommodate file systems for executable programs, the virtual memory space holds
blocked processes, and main memory holds processes that may be executed right away.
3.4.2 Pre-emption
Another issue relating to scheduling is whether a running process could be pre-empted or not. There
are two categories:
Non pre-emptive: In this case, a running process continues to execute until (a) it terminates or (b)
blocks itself to wait for I/O or to request some operating system service.
Pre-emptive: The currently running process may be interrupted and moved to the Ready state by the
operating system. Pre-emption may occur due to the arrival of a new process, or the occurrence of
an interrupt that places a blocked process in the Ready state.
Pre-emptive policies incur greater overhead than non pre-emptive ones but may be preferred since
they prevent some processes from monopolizing the processor for a long time.
3.4.3 Priority
In many systems, a process is assigned a priority and the scheduler will always choose a process of
higher priority over one of lower priority. Figure 4 depicts the revised process queue model with the
consideration of priority.
If we use turnaround time to measure the performance of various algorithms, we may obtain Table 2,
which also includes the so-called normalized turnaround time, which is the ratio of turnaround time
to service time. It indicates the relative delay experienced by a process. The minimum possible value
for this ratio is of course 1.0; increasing values correspond to a decreasing level of service.
First-Come-First-Served
First-come-first-served (FCFS) is the simplest scheduling policy, also known as FIFO. With this
policy, when a process becomes ready, it joins the ready queue and when the currently running
process finishes, the process at the head of the ready queue, which has waited there for the longest
time, is selected for execution.
As we can see from Table 2, though process E requires only 2 units of service time, it has to wait
until all the preceding processes complete. When we line up to check out books in the library, we
are of course unwilling to be behind someone who is borrowing a hundred books. So FCFS performs
much better for long processes and is quite unfair to short ones.
Thus a system with FCFS tends to favour processor-bound processes over I/O-bound ones, since the
latter, though requiring relatively light use of the processor, have to wait a long time for the
processes before them in the ready queue to complete before being dispatched for a short time and
then blocked for an I/O operation. The same thing may happen repeatedly during the execution of an
I/O-bound process. Noticeably, while the process is waiting, the I/O devices that are supposed to be
used will be idle, leading to inefficient use of the I/O devices.
Hence FCFS is not an attractive alternative on its own, but it may be combined with a priority scheme
to provide an effective scheduler.
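As an illustration, FCFS behaviour and the turnaround measures discussed above can be sketched in a few lines of Python. The process list used here is made up for illustration (it is not the example of Table 2), and the function name is ours:

```python
# A minimal sketch of FCFS scheduling: processes are served in arrival
# order, and each one's turnaround time is its finish time minus its
# arrival time. Normalized turnaround = turnaround / service time.

def fcfs(processes):
    """processes: list of (name, arrival_time, service_time), sorted by arrival."""
    clock = 0
    results = {}
    for name, arrival, service in processes:
        start = max(clock, arrival)          # CPU may sit idle until arrival
        finish = start + service
        turnaround = finish - arrival
        # The minimum possible normalized turnaround is 1.0
        results[name] = (turnaround, turnaround / service)
        clock = finish
    return results

# A long process arriving first badly delays the short one behind it:
print(fcfs([("A", 0, 100), ("E", 1, 2)]))
```

Running this, the short process E ends up with a normalized turnaround of 50.5 while the long process A gets the ideal 1.0, which is exactly the unfairness described above.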
Round Robin
A straightforward way to reduce the suffering of short processes is to use a time-sharing scheme,
called round robin, with which the operating system assigns short time slices to each process and if
the slices allocated for a process are not enough for it to complete, then the process has to wait until
its time slice comes again. With the help of a clock, whenever a clock interrupt occurs, the operating
system will check if the time slice for the current process ends.
If yes, then another process will be scheduled and allocated a time slice. With round robin, the key
design issue is the length of the time slice or quantum. If the slice is short, then short processes tend
to get a chance to run early. However, very short time slices should be avoided, since there is overhead
involved in handling the clock interrupt and performing the scheduling. In principle, a time slice
should be slightly greater than the time required for a typical interaction. Figure 6 illustrates the
effect the decision has on response time. Figure 4 and Table 2 show the results for our example using
time quanta q of 1 and 4 time units respectively. The round robin policy is generally more effective
than FCFS; however, the I/O-bound processes are still to some extent treated unfairly, because these
processes are very likely to be blocked before they use up a complete time quantum, while the
processor-bound processes generally make full use of their time slices.
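The quantum-by-quantum behaviour described above can be sketched as a small simulation. The process data is illustrative (all processes arrive at time 0), and the function name is ours:

```python
from collections import deque

# A minimal round-robin sketch with a fixed time quantum q. Each ready
# process runs for at most q units; if it still needs service it is
# re-queued at the back, otherwise its completion time is recorded.

def round_robin(processes, q):
    """processes: list of (name, service_time); all arrive at time 0."""
    queue = deque(processes)
    clock = 0
    finish = {}
    while queue:
        name, remaining = queue.popleft()
        run = min(q, remaining)       # run for one quantum or until done
        clock += run
        if remaining > run:
            queue.append((name, remaining - run))   # not finished: requeue
        else:
            finish[name] = clock      # equals turnaround, since arrival = 0
    return finish

print(round_robin([("A", 3), ("B", 6), ("C", 4)], q=2))
```

With q = 2, the shortest process A finishes first (at time 7) and the longest, B, finishes last (at time 13), showing how round robin reduces the suffering of short processes compared with FCFS.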
The selection function here is the response ratio (w + s) / s, where w is the time since the process was
created and s is the expected service time. Then whenever the current process is blocked or completes,
the process with the greatest value will be scheduled to run.
Since a smaller denominator (a smaller expected service time) in the fraction results in a greater value,
shorter processes are favoured. And again, as with SRT and SPN, the expected service time needs to be
estimated based on history.
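This selection rule (often called highest response ratio next) can be sketched directly. The example values are illustrative, and the function name is ours:

```python
# Sketch of the selection rule described above: for each ready process
# compute R = (w + s) / s, where w is the time spent waiting and s is
# the (estimated) service time, and pick the process with the greatest R.

def select_hrrn(ready):
    """ready: list of (name, waiting_time, expected_service_time)."""
    def ratio(entry):
        _, w, s = entry
        return (w + s) / s
    return max(ready, key=ratio)[0]

# The short job (s = 2) is chosen even though it has waited less:
print(select_hrrn([("long", 10, 20), ("short", 4, 2)]))
```

Here the long job's ratio is (10 + 20)/20 = 1.5 while the short job's is (4 + 2)/2 = 3.0, so the short job is scheduled; but as w grows the long job's ratio rises too, so it cannot starve.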
Feedback
Another approach to favour shorter processes is to penalize processes that have been running longer,
thus avoiding predicting the expected processing time.
To do so, a dynamic priority mechanism is used. As Figure 3.7 shows, the ready process queue is
split into several queues, each with a different priority. When a process first enters the system, it is
put in queue RQ0, which has the highest priority. After its first execution, when it becomes
READY again, it is placed in RQ1, and so on. The feedback approach is also pre-emptive. Like
round robin, a specific quantum of time is allocated to each scheduled process. When the time is up,
the current process is pre-empted and another process is chosen from the queues on a
highest-priority-first basis. The processes in the same queue follow a FCFS policy. Figure 4 and
Table 2 show our example in the case when the quantum is one unit of time.
Obviously, shorter processes are favoured over longer ones since the latter tend to gradually drift
downward and cannot get a chance to run for a long time until there are no processes of higher
priority. Starvation is also possible in the feedback policy if new processes come in frequently.
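The multilevel feedback mechanism just described can be sketched as follows. The number of queue levels, the quantum, and the process data are all illustrative:

```python
from collections import deque

# A minimal multilevel-feedback sketch: new processes enter the
# highest-priority queue RQ0; a process pre-empted at the end of its
# quantum drops one level. The scheduler always serves the highest
# non-empty queue, FCFS within a queue.

def feedback(processes, levels=3, q=1):
    """processes: list of (name, service_time); all arrive at time 0."""
    queues = [deque() for _ in range(levels)]
    for p in processes:
        queues[0].append(p)
    clock, finish = 0, {}
    while any(queues):
        level = next(i for i, rq in enumerate(queues) if rq)
        name, remaining = queues[level].popleft()
        clock += min(q, remaining)
        if remaining > q:
            # demote, but never below the lowest queue
            queues[min(level + 1, levels - 1)].append((name, remaining - q))
        else:
            finish[name] = clock
    return finish

print(feedback([("A", 1), ("B", 3)]))
```

The short process A completes within its first quantum in RQ0, while the longer B drifts down through the queues, finishing only at time 4, which is the penalizing effect the text describes.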
3.5.1 Timesharing
Let us first address the case of scheduling independent processes; later we will consider how to
schedule related processes. The simplest scheduling algorithm for dealing with unrelated processes
(or threads) is to have a single system wide data structure for ready processes, possibly just a list, but
more likely a set of lists for processes at different priorities as depicted in Figure 9 (a). Here the 16
CPUs are all currently busy, and a prioritized set of 14 processes are waiting to run. The first CPU to
finish its current work (or have its process block) is CPU 4, which then locks the scheduling queues
and selects the highest priority process, A, as shown in Figure 9 (b). Next, CPU 12 goes idle and
chooses process B, as illustrated in Figure 9 (c). As long as the processes are completely unrelated,
doing scheduling this way is a reasonable choice.
Figure 9: Using a single data structure for scheduling a multiprocessor.
Having a single scheduling data structure used by all CPUs timeshares the CPUs, much as they would
be in a uniprocessor system. It also provides automatic load balancing because it can never happen
that one CPU is idle while others are overloaded. Two disadvantages of this approach are the
potential contention for the scheduling data structure as the number of CPUs grows, and the usual
overhead in doing a context switch when a process blocks for I/O.
It is also possible that a context switch happens when a process's quantum expires. On a
multiprocessor, that has certain properties not present on a uniprocessor. Suppose that the process
holds a spin lock, not unusual on multiprocessors, as discussed above. Other CPUs waiting on the
spin lock just waste their time spinning until that process is scheduled again and releases the lock. On
a uniprocessor, spin locks are rarely used, so if a process is suspended while it holds a mutex, and
another process starts and tries to acquire the mutex, it will be immediately blocked, so little time is
wasted.
To get around this anomaly, some systems use smart scheduling, in which a process acquiring a spin
lock sets a process-wide flag to show that it currently has a spin lock. When it releases the lock, it
clears the flag. The scheduler then does not stop a process holding a spin lock, but instead gives it a
little more time to complete its critical region and release the lock. Another issue that plays a role in
scheduling is the fact that while all CPUs are equal, some CPUs are more equal. In particular, when
process A has run for a long time on CPU k, CPU k's cache will be full of A's blocks. If A gets to
run again soon, it may perform better if it is run on CPU k, because k's cache may still contain some
of A's blocks. Having cache blocks preloaded will increase the cache hit rate and thus the process's
speed. In addition, the TLB may also contain the right pages, reducing TLB faults. Some
multiprocessors take this effect into account and use what is called affinity scheduling. The basic idea
here is to make a serious effort to have a process run on the same CPU it ran on last time. One way to
create this affinity is to use a two-level scheduling algorithm. When a process is created, it is
assigned to a CPU, for example based on which one has the smallest load at that moment. This
assignment of processes to CPUs is the top level of the algorithm. As a result, each CPU acquires its
own collection of processes.
Figure 10: A set of 32 CPUs split into four partitions, with two CPUs available.
Periodically, scheduling decisions have to be made. In uniprocessor systems, shortest job first is a
well-known algorithm for batch scheduling. The analogous algorithm for a multiprocessor is to
choose the process needing the smallest number of CPU cycles, that is, the process whose CPU count
multiplied by run time is the smallest of the candidates. However, in practice, this information is rarely
available, so the algorithm is hard to carry out.
To see the kind of problem that can occur when the threads of a process (or processes of a job) are
independently scheduled, consider a system with threads A 0 and A 1 belonging to process A and
threads B 0 and B 1 belonging to process B. threads A 0 and B 0 are timeshared on CPU 0; threads A 1 and
B1 are timeshared on CPU 1. Threads A 0 and A 1 need to communicate often. The communication
pattern is that A 0 sends A 1 a message, with A 1 then sending back a reply to A 0 , followed by another
such sequence. Suppose that luck has it that A 0 and B 1 start first, as shown in Figure: 11.
Figure: 11 Communication between two threads belonging to process A that are running out of
phase.
In time slice 0, A 0 sends A 1 a request, but A 1 does not get it until it runs in time slice 1 starting at 100
msec. It sends the reply immediately, but A 0 does not get the reply until it runs again at 200 msec.
The net result is one request-reply sequence every 200 msec.
The solution to this problem is gang scheduling, which is an outgrowth of co-scheduling. Gang
scheduling has three parts:
Groups of related threads are scheduled as a unit, a gang.
All members of a gang run simultaneously, on different timeshared CPUs.
All gang members start and end their time slices together.
The trick that makes gang scheduling work is that all CPUs are scheduled synchronously. This means
that time is divided into discrete quanta as we had in Figure 3.10. At the start of each new quantum,
all the CPUs are rescheduled, with a new thread being started on each one. At the start of the
following quantum, another scheduling event happens. In between, no scheduling is done. If a thread
blocks, its CPU stays idle until the end of the quantum.
The idea of gang scheduling is to have all the threads of a process run together, so that if one of them
sends a request to another one, it will get the message almost immediately and be able to reply almost
immediately. In Figure 12, since all the A threads run together during one quantum, they may send
and receive a very large number of messages, thus eliminating the problem of Figure 11.
4. One measure of work is the number of processes that are completed per time unit, called...........
(a). Efficiency (b). State (c). Waiting time (d). Throughput
5. The dispatcher always chooses a process in the queue with lowest priority to execute.
(a). True (b). False
3.6.3 Simulation
To get a more accurate evaluation of scheduling algorithms, we can use simulations. Simulations
involve programming a model of the computer system.
3.6.4 Implementation
The only real way to test an operating system is to write the code and run it. However, this approach
is very expensive.
Caution
Be aware that a pre-empted process is not suspended, since it remains in the TASK_RUNNING state;
it simply no longer uses the CPU.
Questions
1. Write a brief discussion of a lead-generation campaign for an SAP partner.
2. Explain the concept of process scheduling.
3.8 Summary
Real-time processes should never be blocked by lower-priority processes; they should have a
short response time and, most important, such response time should have a minimum variance.
The mid-term scheduler temporarily removes processes from main memory and places them on
secondary memory (such as a disk drive) or vice versa.
Dispatcher is invoked whenever an event occurs that may lead to the suspension of the current
process or that may provide an opportunity to pre-empt a currently running process in favour of
another.
The scheduling algorithm of traditional UNIX operating systems must fulfil several conflicting
objectives: fast process response time, good throughput for background jobs.
The scheduling policy is also based on ranking processes according to their priority.
3.9 Keywords
Nonpreemptive: In this case, a running process continues to execute until (a) it terminates or (b)
blocks itself to wait for I/O or to request some operating system service.
Pre-emptive: The currently running process may be interrupted and moved to the Ready state by the
operating system.
Throughput: One measure of work is the number of processes that are completed per time unit,
called throughput.
Turnaround time: Turnaround time is the sum of the periods spent waiting to get into memory,
waiting in the ready queue, executing on the CPU, and doing I/O.
Waiting time: Waiting time is the sum of the periods spent waiting in the ready queue.
4.0 Objectives
After studying this chapter, you will be able to:
Explain the critical-section problem
Discuss the synchronization hardware
Explain the classical problems of synchronization
Define the critical regions
Explain the deadlocks-system model
4.1 Introduction
Among the problems that need to be addressed by computer scientists in order for sophisticated
operating systems to be built are deadlock and process synchronization. Deadlock occurs when two or
more processes (programs in execution) request the same resources and are allocated them in such a
way that a circular chain of processes is formed, where each process is waiting for a resource held by
the next process in the chain. As a result, no process can continue; they are deadlocked. An operating
system can handle this situation with various prevention or detection and recovery techniques. For
example, resources might be numbered 1, 2, 3, and so on. If they must be requested by each process
in this order, it is impossible for a circular chain of deadlocked processes to develop. Another
approach is simply to allow deadlocks to occur, detect them by examining inactive processes and
the resources they are holding, and break any deadlock by aborting one of the processes in the chain
and releasing its resources.
Process synchronization is required when one process must wait for another to complete some
operation before proceeding. For example, one process (called a writer) may be writing data to a
certain main memory area, while another process (a reader) may be reading data from that area and
sending it to the printer. The reader and writer must be synchronized so that the writer does not
overwrite existing data with new data until the reader has processed it. Similarly, the reader should
not start to read until data has actually been written to the area. Various synchronization techniques
have been developed. In one method, the operating system provides special commands that allow one
process to signal to the second when it begins and completes its operations, so that the second knows
when it may start. In another approach, shared data, along with the code to read or write them, are
encapsulated in a protected program module. The operating system then enforces rules of mutual
exclusion, which allow only one reader or writer at a time to access the module. Process
synchronization may also be supported by an interprocess communication facility, a feature of the
operating system that allows processes to send messages to one another.
One day, Mohan checks the balance and, seeing that it is 4,500, decides to add 2,250 to the account.
Unfortunately, access to the account is not locked, so just then a poor student withdraws 900 from the
account and the new balance is recorded as 3,600. After adding the 2,250, Mohan records the balance
as 6,750 rather than 5,850 as it should be.
The nature of the problem is clearer when we examine the assembly language code for such an
operation:
The root of the problem stems from a context switch occurring in the middle of the execution of the
critical section.
We only need to enforce mutual exclusion with a single lock or semaphore to correctly solve this
problem. In Python, a lock object from the threading module is a semaphore where s == 1; that is,
only one thread is allowed to hold the lock at a time (mutual exclusion).
Bounded Buffer
This problem is also called the producers and consumers problem. A finite supply of containers is
available. Producers take an empty container and fill it with a product. Consumers take a full
container, consume the product and leave an empty container. The main complexity of this problem is
that we must maintain counts of both the empty and the full containers that are available.
Producers produce a product and consumers consume the product, but both use one of the
containers each time. Here is a solution to the bounded buffer problem with N containers using a
monitor:
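The monitor code itself does not survive in this text; as a stand-in, here is a minimal Python sketch in which a threading.Condition plays the role of the monitor (its lock gives mutual exclusion, wait()/notify() give the condition variables). Class and method names are ours:

```python
import threading

# A bounded-buffer (producer/consumer) monitor sketch with n containers.
# put() blocks while every container is full; get() blocks while every
# container is empty.

class BoundedBuffer:
    def __init__(self, n):
        self.buffer = []
        self.n = n
        self.cond = threading.Condition()

    def put(self, item):                        # producer fills a container
        with self.cond:
            while len(self.buffer) == self.n:   # all containers full
                self.cond.wait()
            self.buffer.append(item)
            self.cond.notify_all()

    def get(self):                              # consumer empties a container
        with self.cond:
            while not self.buffer:              # all containers empty
                self.cond.wait()
            item = self.buffer.pop(0)
            self.cond.notify_all()
            return item

buf = BoundedBuffer(2)
results = []
consumer = threading.Thread(target=lambda: results.extend(buf.get() for _ in range(3)))
consumer.start()
for i in range(3):
    buf.put(i)        # blocks whenever both containers are full
consumer.join()
print(results)   # [0, 1, 2]
```

Note the `while` (not `if`) around each wait: a woken thread must re-check the buffer state before proceeding, since waking does not guarantee the condition still holds.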
Writers have mutual exclusion, but multiple readers are allowed at the same time. Here is a basic
Python class that implements a solution to readers and writers using simple locks:
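The class itself is missing from the source; a sketch of the classic first-readers-writers scheme using two plain locks, with class and method names of our choosing, might look like this:

```python
import threading

# Readers-writers with two simple locks: the first reader to enter
# locks out writers, and the last reader to leave lets them back in.
# Writers simply take the write lock, giving them exclusive access.

class ReadWriteLock:
    def __init__(self):
        self.readers = 0
        self.counter_lock = threading.Lock()   # protects the reader count
        self.write_lock = threading.Lock()     # held while anyone writes

    def acquire_read(self):
        with self.counter_lock:
            self.readers += 1
            if self.readers == 1:              # first reader blocks writers
                self.write_lock.acquire()

    def release_read(self):
        with self.counter_lock:
            self.readers -= 1
            if self.readers == 0:              # last reader admits writers
                self.write_lock.release()

    def acquire_write(self):
        self.write_lock.acquire()

    def release_write(self):
        self.write_lock.release()

rw = ReadWriteLock()
rw.acquire_read(); rw.acquire_read()                 # two readers together: fine
assert not rw.write_lock.acquire(blocking=False)     # a writer is locked out
rw.release_read(); rw.release_read()
rw.acquire_write()                                   # now the writer gets in
rw.release_write()
print("ok")
```

This variant favours readers: a steady stream of readers can starve a writer, a trade-off the literature addresses with writer-preference variants.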
Dining Philosophers
Five philosophers (the tasks) spend their time thinking and eating. They eat at a round table with five
individual seats. To eat, each philosopher needs two forks (the resources). There are five forks on the
table, one to the left and one to the right of each seat. When a philosopher cannot grab both forks, he
sits and waits. Eating takes a random time, and then the philosopher puts the forks down and leaves the
dining room. After spending some random time thinking about the nature of the universe, he again
becomes hungry, and the cycle repeats itself.
A philosopher needs both the fork on his left and the fork on his right to eat. The forks are shared with
the neighbours on either side.
It can be observed that a straightforward solution, when forks are implemented by semaphores, is
exposed to deadlock. There exist two deadlock states when all five philosophers are sitting at the
table holding one fork each. One deadlock state is when each philosopher has grabbed the fork to the
left of him, and the other is when each has the fork on his right.
Here is a basic monitor-based solution:
import threading
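The listing is truncated in the source after its first line; a minimal sketch of what such a monitor-based solution could look like, with a Condition playing the monitor and class names of our choosing, is:

```python
import threading

# Monitor-style dining philosophers: a philosopher may start eating only
# when neither neighbour is eating, i.e. both forks are taken atomically
# inside the monitor. This rules out the two circular-wait deadlock
# states described above, where everyone holds exactly one fork.

N = 5

class DiningMonitor:
    def __init__(self):
        self.eating = [False] * N
        self.cond = threading.Condition()

    def pick_up(self, i):
        with self.cond:
            left, right = (i - 1) % N, (i + 1) % N
            while self.eating[left] or self.eating[right]:
                self.cond.wait()        # a neighbour holds a shared fork
            self.eating[i] = True       # takes both forks at once

    def put_down(self, i):
        with self.cond:
            self.eating[i] = False
            self.cond.notify_all()      # hungry neighbours may try again

table = DiningMonitor()
table.pick_up(0)
table.pick_up(2)                        # 0 and 2 are not neighbours: both eat
assert table.eating == [True, False, True, False, False]
table.put_down(0); table.put_down(2)
print("no circular wait possible")
```

Because both forks are acquired in one atomic monitor operation, no state can arise where each philosopher holds exactly one fork, although an unlucky philosopher can still starve if his neighbours alternate eating.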
Ex2: What is a semaphore?
4.5 Critical Regions
1. Motivation: Time-dependent errors can easily arise when semaphores are used to solve the
critical section problem. To overcome this difficulty a new language construct, the critical
region, was introduced.
2. Definition and notation
A variable v of type T, which is to be shared among many processes, can be declared:
VAR v: SHARED T;
The variable v can be accessed only inside a region statement of the following form:
REGION v DO S;
This construct means that while statement S is being executed, no other process can access the
variable v. Thus, if the two statements below are executed concurrently in distinct sequential
processes, the result will be equivalent to the sequential execution S1 followed by S2, or S2
followed by S1:
REGION v DO S1;
REGION v DO S2;
To illustrate this construct, consider the frames class defined earlier as an abstract data type. Since
mutual exclusion is required when accessing the array free, we need to declare it as a shared array.
VAR free: SHARED ARRAY [1..n] OF Boolean;
The acquire procedure must be rewritten accordingly.
The critical-region construct guards against some simple errors associated with the semaphore
solution to the critical section problem which may be made by a programmer.
3. Compiler implementations of the critical region construct. For each declaration VAR v: SHARED
T; the compiler generates a semaphore v-mutex initialized to 1. For each statement, REGION v
DO S; the compiler generates the following code:
P(v-mutex);
S;
V(v-mutex);
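This compiler translation can be mimicked in Python, where each shared variable carries its own mutex and the with-statement plays the role of REGION v DO. The Shared wrapper class is our own illustration, not part of the text:

```python
import threading

# Mimic of the compiler translation above: each SHARED variable gets a
# v-mutex (a Lock, i.e. a binary semaphore) initialized to 1. Entering
# the region performs P(v-mutex); leaving it performs V(v-mutex).

class Shared:
    """VAR v: SHARED T  ->  a value paired with its own v-mutex."""
    def __init__(self, value):
        self.value = value
        self.mutex = threading.Lock()   # v-mutex, initialized to 1

    def __enter__(self):                # P(v-mutex)
        self.mutex.acquire()
        return self

    def __exit__(self, *exc):           # V(v-mutex)
        self.mutex.release()

v = Shared(0)
# REGION v DO v := v + 1;
with v:
    v.value += 1
print(v.value)   # 1
```

Nesting two with-statements over different shared variables in opposite orders in two threads reproduces exactly the deadlock risk of nested critical regions noted below.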
Critical regions may also be nested. In this case, however, deadlocks may result, as in the following
example of deadlock:
VAR x, y: SHARED T;
PARBEGIN
Q: REGION x DO REGION y DO S1;
R: REGION y DO REGION x DO S2;
PAREND;
However, the class concept alone cannot guarantee that such sequences will be observed.
A process might operate on the file without first gaining access permission to it
A process might never release the file once it has been granted access to it
A process might attempt to release a file that it never requested
A process might request the same file twice
4.5.2 Monitors
A monitor is a collection of procedures, variables, and data structures grouped together.
Processes can call the monitor procedures but cannot access the internal data structures.
Only one process at a time may be active in a monitor.
"Active in a monitor" means being in the ready queue or on the CPU with the program counter
somewhere in a monitor method.
A monitor is a language construct.
Compare this with semaphores, which are usually an OS construct.
The compiler usually enforces mutual exclusion.
Condition variables allow for blocking and unblocking
o cv.wait() blocks a process.
The process is said to be waiting for (or waiting on) the condition variable cv.
o cv.signal() (also called cv.notify) unblocks a process waiting for the condition variable cv.
When this occurs, we still need to ensure that only one process is active in the monitor. This can be
done in several ways:
o On some systems the old process (the one executing the signal) leaves the monitor and the
new one enters
o On some systems the signal must be the last statement executed inside the monitor
o On some systems the old process will block until the monitor is available again
o On some systems the new process (the one unblocked by the signal) will remain blocked until
the monitor is available again.
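The wait/signal behaviour listed above can be sketched with Python's threading.Condition, where signal is spelled notify. Names other than the Condition API are our own:

```python
import threading

# Sketch of cv.wait()/cv.signal(): the waiter blocks inside the
# "monitor" (the condition's lock) until another thread signals it.
# wait() releases the monitor lock while blocked and reacquires it
# before returning, so only one thread is active in the monitor.

cond = threading.Condition()
ready = False
log = []

def waiter():
    global ready
    with cond:                 # enter the monitor
        while not ready:       # re-check: waking does not guarantee the state
            cond.wait()        # cv.wait(): block, releasing the monitor
        log.append("woken")

t = threading.Thread(target=waiter)
t.start()
with cond:
    ready = True
    cond.notify()              # cv.signal(): unblock one waiting process
t.join()
print(log)   # ['woken']
```

Python follows the last option in the list above (Mesa semantics): the signalled thread stays blocked until the monitor is available again, which is why the waiter must re-test its condition in a loop.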
Figure 4.7: The relative priority of these queues determines the operation of the monitor
implementation.
Detection schemes find the circular chain. At least one victim is chosen; its partial service is
rolled back and its resources freed, in an attempt to break the deadlock and allow the other
processes to complete. The victim(s) can be restarted later, without alteration, but the rolled-back
service must be repeated.
Holt provides an example of a single process in deadlock: process revenge is infinitely suspended
by a wait for an event that never occurs. Similarly, a kernel process is said to be in deadlock if it
is permanently blocked waiting for a resource that will never become available. In accordance
with the halting problem, however, this type of dead state is impossible, in general, to prevent or
even detect with current technology. A deadlock cannot occur with only one process. Deadlock is
an anomaly of traffic control (i.e., competition synchronization) and involves at least two
competing processes.
Holt also considers deadlock as a circular wait with consumable resources, such as messages, that
can be dynamically created and destroyed. Message deadlocks, however, are controllable neither
by serial execution nor by most of the techniques discussed previously.
Researchers in the field of distributed databases typically limit their applications to requests for
data locks, and their treatments are consistent with deadlock material. Additional restrictions are
required to obtain a global knowledge of a deadlock, however, and fungible resources (basically
OR-requests) are generally not included in detection algorithms.
Some additional remarks will be useful. Processes in deadlock may be suspended in an inactive
wait or looping endlessly in a busy wait. They may be holding resource buffers, "soft" resources,
which are typically fungible. In an attempt to differentiate between different types of dead states,
some of the literature uses the term resource deadlock. In the remainder of this paper, we
adopt this terminology, although dead states involving resources are obviously not always resource
deadlocks.
Communication Deadlock
A dead state can occur with a circular chain of incorrectly written processes, each waiting for a
message from another in the chain before sending the message that process awaits. For example, two processes will
never complete if they are programmed to wait for a (consumable) resource instance before they
create the resource instance requested by the other. The producer/consumer example previously cited
also contains errors in the cooperation mechanism, preventing the sending of the awaited signal.
These communication deadlocks are not caused by interleaved code execution, nor can they be
prevented by resource pre-allocation, serial execution, or maintenance of safe states. Typically, after
some period of time, a circular wait is detected or assumed and the processes are aborted.
Scheduling Deadlock
A circular chain of four cars infinitely waiting behind stop signs at each of the four corners of an
intersection has been called a resource deadlock. A dead state with two trains approaching on
different tracks, where each gives precedence to the other at a crossing, has also been incorrectly
labelled a deadlock. These cars and trains are not waiting for resources held by others; the scheduler has failed to
assign a single process the priority required for accessing the resource. Recovery does not entail roll
back and the resultant repetition of service; it is sufficient to dynamically modify the scheduling
algorithm at the present point of service, perhaps by waving one vehicle through. These scheduling
deadlocks, caused by incomplete competition mechanisms, do not require any alteration of processes.
Interleaved Deadlock
Interleaved deadlock is caused by a combination of competition and cooperation mechanisms, such
that only the competition mechanisms are incorrect. Requests from at least two processes compete for
the same resource elements and the scheduler interleaves their execution, allocating a resource to
each. Each process is restricted from releasing the held resource while waiting for an event to occur
at another process. In resource deadlock, the scheduler continually assigns highest priority to the
earlier requests (resources cannot be pre-empted), blocking later (active or suspended) competing
requests. Each high priority request waits for the service of its process's blocked request(s). A
circular chain exists of processes that are waiting for resources held by other processes in the chain.
Resource deadlock can be controlled by mechanisms of either the processes or of the scheduler. For
example, processes can agree to access resources in a linear order or the scheduler can maintain a
safe state. When resource deadlock is rare, control systems may avoid the overhead of prevention
algorithms. Detection schemes find the circular wait. After roll back, one or more processes are
restarted for repeated service, without correction, in the hope that the interleaved execution that
caused the dead state does not recur.
Unacceptable Waits
Some of the literature defines deadlock as a state in which processes wait forever, and thus never
complete service. We expand study to include unacceptable waits, states in which some processes do
not complete service either by the end of time or by a shorter time restriction. (It does not concern us
if a wait is unbounded, such as in the protocols of stop signs or slotted Aloha, only that the wait is
beyond a limit imposed by the user or the resource system.)
The producer-consumer problem of the introduction contains two processes, each with a follow
within wait, one with a blocked by wait, and the other with a contingency containing a request of the
other process. This anomaly is not a resource deadlock. Confusion occurs because the same construct
has been used to implement both competition and cooperation synchronization.
Consider a dead state that appears ambiguous. In priority deadlock, a system pre-allocates resource
buffers for message reassembly. All buffers at a receiving router have been reserved for low-priority
messages. High-priority packets are scheduled onto all of the bandwidth, but are discarded by the
receiving router because of the lack of buffers. The sending router keeps timing out and resending the
high-priority packets, so that low-priority packets cannot reach their pre-allocated buffers and
complete their reassembly. A circular dead state occurs due to interleaved service of packets that are
assigned resources needed by others in the circular chain. We see that the prevention schemes of
(total) resource pre-allocation as well as resource pre-emption are applicable. Yet, are there
inconsistencies? Can the scheduler break the dead state without roll back by simply raising the
priority of low-priority packets? In addition, it appears that bandwidth is pre-empted, violating one of
the preconditions. On further examination, we see that bandwidth is repeatedly reassigned to the
high-priority packets. In addition, high-priority packets are indeed rolled back each time they are
refused buffer space at the receiving node. The deadlock can be broken only following roll back. This
resource deadlock indeed satisfies our definition. Low-priority requests are blocked by high-priority
requests for bandwidth, while high-priority requests are blocked by low-priority requests for buffer
space. Messages cannot complete service, since each of the blocking packets has been assigned a
contingency containing a blocked packet of its message.
4.6.1 Characterization
Deadlock occurs if the following four conditions take place simultaneously in a system:
Mutual Exclusion: At least one resource must be held in a non-sharable mode. It means that only one
process at a time can use the resource. If another process requests the resource, the requesting process
must be delayed until the resource has been released.
Hold and Wait: A process must be holding at least one resource and waiting to acquire additional
resources that are currently being held by other processes.
No Pre-emption: Resources cannot be pre-empted. A resource can be released only voluntarily by the
process holding it after that process has completed its task.
Circular Wait: A set {P0, P1, P2, ..., Pn} of waiting processes must exist such that P0 is waiting for a
resource that is held by P1, P1 is waiting for a resource that is held by P2, ..., Pn-1 is waiting for a
resource that is held by Pn, and Pn is waiting for a resource that is held by P0.
All four conditions must hold for a deadlock to occur.
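Because all four conditions are necessary, denying any one of them prevents deadlock. A common way to deny circular wait is to impose a total order on resources and require every process to acquire them in that order; the following sketch (names illustrative, not from the text) uses the lock objects' ids as the ordering key.

```python
import threading

# Sketch: breaking the circular-wait condition by imposing a total order
# on resources. Every thread acquires locks in the same global order (here,
# ascending id()), so no cycle of "holds A, wants B" can ever form.
def acquire_in_order(*locks):
    ordered = sorted(locks, key=id)      # the fixed global order
    for lock in ordered:
        lock.acquire()
    return ordered

def release_all(locks):
    for lock in reversed(locks):
        lock.release()

lock_a, lock_b = threading.Lock(), threading.Lock()
counter = [0]

def worker():
    for _ in range(1000):
        held = acquire_in_order(lock_a, lock_b)  # same order in every thread
        counter[0] += 1
        release_all(held)

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter[0])   # 2000: both threads completed; no deadlock occurred
```

Had each thread instead acquired the two locks in opposite orders, the hold-and-wait and circular-wait conditions could both be satisfied at once, and the program could hang.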
Caution
Processes in communication deadlock must be reconceived by their users before they can complete
service.
2. A process Pi that is waiting for some currently unavailable resource is said to be............
(a). Operating system (b). blocked
(c). Both(a) and (b) (d). None of these.
5. A deadlock.................... algorithm requires each process to make known in advance the maximum
number of resources of each type that it may need.
(a). Banker's (b). avoidance
(c). resource request (d). None of these.
                                    Wait/Die      Wound/Wait
O needs a resource held by Y        O waits       Y dies
Y needs a resource held by O        Y dies        Y waits
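The table above (O = older process, Y = younger process) can be sketched as two timestamp comparisons. This is a hedged illustration, assuming smaller timestamps mean older processes; a "dying" process is rolled back and restarted later with its original timestamp.

```python
# Sketch of the two timestamp-based schemes from the table. Smaller
# timestamp = older process. The requester either waits or is rolled
# back ("dies"), depending on the scheme.
def wait_die(requester_ts, holder_ts):
    # Wait/Die: an older requester may wait; a younger requester dies.
    return "wait" if requester_ts < holder_ts else "die"

def wound_wait(requester_ts, holder_ts):
    # Wound/Wait: an older requester wounds (pre-empts) the holder;
    # a younger requester waits.
    return "wound holder" if requester_ts < holder_ts else "wait"

O, Y = 1, 2   # O started earlier, so it is older than Y
print(wait_die(O, Y))     # O needs a resource held by Y -> "wait"
print(wait_die(Y, O))     # Y needs a resource held by O -> "die"
print(wound_wait(O, Y))   # O needs a resource held by Y -> "wound holder"
print(wound_wait(Y, O))   # Y needs a resource held by O -> "wait"
```

In both schemes the process allowed to wait is chosen consistently by age, so a cycle of waiting processes cannot arise.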
Banker's Algorithm
The algorithm was developed by Edsger Dijkstra during the design of the THE operating system. The name
is by analogy with the way that bankers account for liquidity constraints.
Algorithm
The Banker's algorithm is run by the operating system whenever a process requests resources. The
algorithm avoids deadlock by denying or postponing the request if it determines that accepting the
request could put the system in an unsafe state (one where deadlock could occur). When a new
process enters the system, it must declare the maximum number of instances of each resource type
that it may need; this maximum may not exceed the total number of resources in the system. Also,
when a process gets all its requested resources, it must return them in a finite amount of time.
For the Banker‘s algorithm to work, it needs to know three things:
How much of each resource each process could possibly request
How much of each resource each process is currently holding
How much of each resource the system currently has available
Resources may be allocated to a process only if it satisfies the following conditions:
request ≤ max, else set error condition as process has crossed maximum claim made by it.
request ≤ available, else process waits until resources are available.
Some of the resources that are tracked in real systems are memory, semaphores and interface access.
The Banker's algorithm derives its name from the fact that this algorithm could be used in a banking
system to ensure that the bank never runs out of resources, because the bank would never allocate
its money in such a way that it could no longer satisfy the needs of all its customers. By using the
Banker's algorithm, the bank ensures that when customers request money the bank never leaves a
safe state. If the customer's request does not cause the bank to leave a safe state, the cash is
allocated; otherwise the customer must wait until some other customer deposits enough.
Basic data structures to be maintained to implement the Banker's algorithm:
Let n be the number of processes in the system and m be the number of resource types. Then we need
the following data structures:
Available: A vector of length m indicates the number of available resources of each type. If
Available[j] = k, there are k instances of resource type Rj available.
Max: An n×m matrix defines the maximum demand of each process. If Max[i,j] = k, then Pi may
request at most k instances of resource type Rj.
Allocation: An n×m matrix defines the number of resources of each type currently allocated to
each process. If Allocation[i,j] = k, then process Pi is currently allocated k instances of resource
type Rj.
Need: An n×m matrix indicates the remaining resource need of each process. If Need[i,j] = k,
then Pi may need k more instances of resource type Rj to complete its task.
Note: Need = Max - Allocation.
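The data structures above can be put to work in a safety check, which is the heart of the Banker's algorithm: the system is safe if some sequence exists in which every process can obtain its remaining need and finish. The following is a minimal sketch using the classic five-process, three-resource example.

```python
# A minimal sketch of the Banker's safety check using the structures just
# defined (Available, Max, Allocation, and Need = Max - Allocation).
# Returns a safe sequence of process indices, or None if the state is unsafe.
def safe_sequence(available, max_demand, allocation):
    n, m = len(max_demand), len(available)
    need = [[max_demand[i][j] - allocation[i][j] for j in range(m)]
            for i in range(n)]
    work = list(available)          # resources currently free
    finished = [False] * n
    sequence = []
    while len(sequence) < n:
        progressed = False
        for i in range(n):
            if not finished[i] and all(need[i][j] <= work[j] for j in range(m)):
                # Pretend Pi runs to completion and releases its allocation.
                for j in range(m):
                    work[j] += allocation[i][j]
                finished[i] = True
                sequence.append(i)
                progressed = True
        if not progressed:
            return None             # remaining processes can never finish
    return sequence

# Classic textbook example: 5 processes, 3 resource types.
available  = [3, 3, 2]
max_demand = [[7, 5, 3], [3, 2, 2], [9, 0, 2], [2, 2, 2], [4, 3, 3]]
allocation = [[0, 1, 0], [2, 0, 0], [3, 0, 2], [2, 1, 1], [0, 0, 2]]
print(safe_sequence(available, max_demand, allocation))   # [1, 3, 4, 0, 2]
```

A resource request is granted only if, after tentatively subtracting it from Available and adding it to the process's Allocation, this check still finds a safe sequence.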
4.8.2 Detection
In deadlock detection, deadlocks are allowed to occur. The state of the system is then examined to
detect that a deadlock has occurred, and the deadlock is subsequently corrected. An algorithm is
employed that tracks resource allocation and process states; it rolls back and restarts one or more
of the processes in order to remove the detected deadlock. Detecting a deadlock that has already
occurred is easily possible, since the resources that each process has locked and/or currently
requested are known to the resource scheduler of the operating system. This approach is simpler
than deadlock avoidance or deadlock prevention, because predicting a deadlock before it happens is
difficult: it is in general an undecidable problem, as it reduces to the halting problem. However, in
specific environments, using specific means of locking resources, deadlock detection may be
decidable. In the general case, it is not possible to distinguish between algorithms that are merely
waiting for a very unlikely set of circumstances to occur and algorithms that will never finish
because of deadlock.
Deadlock detection techniques include, but are not limited to, model checking. This approach
constructs a finite state model on which it performs a progress analysis and finds all possible
terminal sets in the model. Each of these then represents a deadlock.
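For single-instance resources, detection is commonly done on a wait-for graph: an edge from process u to process v means u is waiting for a resource held by v, and any cycle is a deadlock. A hedged sketch, with illustrative process names:

```python
# Sketch: deadlock detection on a wait-for graph (single-instance resources).
# Nodes are processes; an edge u -> v means "u waits for a resource held by v".
# Any cycle in the graph is a deadlock; the function returns one such cycle.
def find_cycle(wait_for):
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on path / done
    color = {p: WHITE for p in wait_for}

    def dfs(u, path):
        color[u] = GREY
        for v in wait_for.get(u, []):
            if color.get(v) == GREY:      # back edge: cycle found
                return path[path.index(v):] + [v]
            if color.get(v, WHITE) == WHITE:
                found = dfs(v, path + [v])
                if found:
                    return found
        color[u] = BLACK
        return None

    for p in list(wait_for):
        if color[p] == WHITE:
            found = dfs(p, [p])
            if found:
                return found
    return None

# P1 waits for P2, P2 for P3, P3 for P1: a circular wait.
print(find_cycle({"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]}))
```

Once a cycle is reported, recovery proceeds by the methods described next: terminating a victim on the cycle or pre-empting one of its resources.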
After a deadlock is determined, it can be corrected by using one of the following methods:
Process Termination: One or more processes involved in the deadlock may be aborted. We can choose
to abort all processes involved in the deadlock. This ensures that the deadlock is resolved with
certainty and speed, but the expense is high, as partial computations will be lost. Alternatively, we
can choose to abort one process at a time until the deadlock is resolved. This approach has high
overhead, because after each abortion an algorithm must determine whether the system is still in
deadlock. Several factors must be considered while choosing a candidate for termination, such as
the priority and age of the process.
Resource Pre-emption: Resources allocated to various processes may be successively pre-empted and
allocated to other processes until the deadlock is broken.
7. If the system does not ensure that a deadlock will be prevented or avoided, then a deadlock
may occur.
(a). True (b). False
8. ......................is a variant of the resource allocation graph, which can be used to detect deadlocks
in the system that has resources, all of which have only single instances.
(a). Wait-for Graph (b). Prevention
(c). avoidance (d). None of these.
10. Code that references one or more variables in a................fashion while any of those variables is
possibly being altered by another thread.
(a). Read-update-write (b). write-update-read
(c). Both (a) and (b) (d). None of these.
4.11 Summary
The use of locks is actually very easy. Every thread must acquire the lock before it accesses the
set of data items. If it succeeds, the thread enters the critical section and the lock is held.
The key to preventing trouble involving shared storage is to find some way to prohibit more than
one process from reading and writing the shared data simultaneously.
In process synchronization there is a rule of thumb which states that production of information is
done by the producer process and consumption is done by the consumer process.
Monitors are implemented by using queues to keep track of the processes attempting to become
active in the monitor.
4.12 Keywords
Bounded Waiting: There exists a limit on how many other processes can enter their critical
sections after a process requests entry into its critical section and before that request is granted.
Circular Wait: A circular chain of waiting, in which each process is waiting for a resource held by
the next process in the chain.
Deadlock: A condition that occurs when two processes are each waiting for the other to complete
before proceeding.
Mutual Exclusion: The requirement that no more than one process can execute in its critical section at one time.
Operating System: The most important program that runs on a computer. Every general-purpose
computer must have an operating system to run other programs.
5.0 Objectives
After studying this chapter, you will be able to:
• Discuss protection of a system
• Discuss the domain of protection
• Define the implementation of access matrix
5.1 Introduction
OS Security revolves around the appropriate protection of four elements. Confidentiality prevents or
minimizes unauthorized access and disclosure of data and information. Integrity makes sure that the
data being worked with is actually the correct data. Availability is the property of a system or system
resource being accessible and usable upon demand by an authorized system entity, according to
the performance specification for the system. Authenticity makes it possible for a computer system
to verify the identity of a user.
5.2 Protection
5.2.1 Goals of Protection
As computer systems have become more sophisticated and pervasive in their applications, the need to
protect their integrity has also grown. Protection was originally conceived as an adjunct to
multiprogramming operating systems, so that untrustworthy users might safely share a common
logical name space, such as a directory of files, or share a common physical name space, such as
memory. Modern protection concepts have evolved to increase the reliability of any complex system
that makes use of shared resources.
There are several reasons for providing protection. Most obvious is the need to prevent mischievous,
intentional violation of an access restriction by a user. Of more general importance, however, is the
need to ensure that each program component active in a system uses system resources only in ways
consistent with the stated policies for the uses of these resources. This requirement is an absolute
one for a reliable system.
Protection can improve reliability by detecting latent errors at the interfaces between component
subsystems. Early detection of interface errors can often prevent contamination of a healthy
subsystem by a subsystem that is malfunctioning.
An unprotected resource cannot defend against use (or misuse) by an unauthorized or incompetent
user. A protection-oriented system provides means to distinguish between authorized and
unauthorized usage.
Caution
Inflexibility of a protection system can make it impossible to adequately secure a system or its data.
Rings are numbered from 0 to 7, with outer rings having a subset of the privileges of the inner
rings.
Each file is a memory segment, and each segment description includes an entry that indicates the
ring number associated with that segment, as well as read, write, and execute privileges.
Each process runs in a ring, according to the current-ring-number, a counter associated with each
process.
A process operating in one ring can only access segments associated with higher (farther out)
rings, and then only according to the access bits. Processes cannot access segments associated
with lower rings.
Domain switching is achieved by a process in one ring calling upon a process operating in a lower
ring, which is controlled by several factors stored with each segment descriptor:
o An access bracket, defined by integers b1 <= b2.
o A limit b3 > b2
o A list of gates, identifying the entry points at which the segments may be called.
If a process operating in ring i calls a segment whose bracket is such that b1 <= i <= b2, then the
call succeeds and the process remains in ring i.
Otherwise a trap to the OS occurs, and is handled as follows:
o If i < b1, then the call is allowed, because we are transferring to a procedure with fewer
privileges. However if any of the parameters being passed are of segments below b1, then
they must be copied to an area accessible by the called procedure.
o If i > b2, then the call is allowed only if i <= b3 and the call is directed to one of the entries
on the list of gates.
Overall this approach is more complex and less efficient than other protection schemes.
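The call rules above can be condensed into a single decision function. The following is a hedged sketch of the ring-bracket check, with hypothetical parameter names (the text's b1, b2, b3, and a set of gate entry points); it is an illustration, not the actual MULTICS mechanism.

```python
# Hypothetical sketch of the ring-call rules just described. A process in
# ring i calls a segment with access bracket (b1, b2), limit b3, and a
# list of gate entry points.
def ring_call(i, b1, b2, b3, gates, entry):
    assert b1 <= b2 < b3                  # the text requires b1 <= b2 and b3 > b2
    if b1 <= i <= b2:
        # Within the access bracket: the call succeeds, same ring.
        return "allowed, caller stays in ring %d" % i
    if i < b1:
        # Transfer to a less-privileged procedure: allowed, but any
        # parameters referring to segments below b1 must be copied.
        return "allowed, parameters below b1 must be copied"
    if i <= b3 and entry in gates:
        # Above the bracket but within the limit, entering at a gate.
        return "allowed through gate"
    return "trap: call rejected"

print(ring_call(4, 3, 5, 7, gates={0x100}, entry=0x100))  # within bracket
print(ring_call(6, 3, 5, 7, gates={0x100}, entry=0x200))  # not a gate entry
```

The gate list is what keeps outer-ring code from jumping into the middle of privileged procedures: a call from beyond the bracket must land on one of the declared entry points.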
Domain switching can be easily supported under this model, simply by providing "switch" access to
other domains:
The ability to copy rights is denoted by an asterisk, indicating that processes in that domain have the
right to copy that access within the same column, i.e. for the same object. There are two important
variations:
o If the asterisk is removed from the original access right, then the right is transferred, rather than
being copied. This may be termed a transfer right as opposed to a copy right.
o If only the right, and not the asterisk, is copied, then the access right is added to the new domain,
but it may not be propagated further. That is, the new domain does not also receive the right to
copy the access. This may be termed a limited copy right, as shown in Figure 5.5:
Figure 5.5: Access matrix with copy rights.
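The copy and limited-copy behaviour can be sketched with a dictionary-based access matrix. This is a hedged illustration with hypothetical names (domains D1, D2, object F1); a right ending in "*" carries the ability to copy it within the same column.

```python
# Sketch of an access matrix with copy rights. A right stored as "read*"
# may be copied by its holder to another domain for the same object; a
# plain "read" may be exercised but not propagated (a limited copy).
class AccessMatrix:
    def __init__(self):
        self.rights = {}                        # (domain, obj) -> set of rights

    def grant(self, domain, obj, *rights):
        self.rights.setdefault((domain, obj), set()).update(rights)

    def copy_right(self, src, dst, obj, right, limited=False):
        if right + "*" not in self.rights.get((src, obj), set()):
            raise PermissionError("domain %s has no copy right on %s" % (src, obj))
        # A limited copy grants the right without the ability to copy it on.
        self.grant(dst, obj, right if limited else right + "*")

m = AccessMatrix()
m.grant("D1", "F1", "read*")
m.copy_right("D1", "D2", "F1", "read", limited=True)
print(m.rights[("D2", "F1")])   # {'read'}: D2 may read F1 but not copy the right
```

A transfer right would be modelled the same way, with the additional step of removing "read*" from the source domain after the copy.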
The owner right adds the privilege of adding new rights or removing existing ones:
5.5.5 Comparison
Each of the methods here has certain advantages or disadvantages, depending on the particular
situation and task at hand.
Many systems employ some combination of the listed methods.
2. …………………. makes sure that the data being worked with is actually the correct data.
(a) Integrity (b) Assurance (c) Confidentiality (d) None of these.
3. …………..is the property of a system or system resource being accessible and usable upon demand
by an authorized system entity, according to performance specification for the system.
(a) Integrity (b) Assurance (c) Availability (d) None of these
4. Authenticity makes it possible for a computer system to verify the identity of a user.
(a) True (b) False
With an access list scheme revocation is easy, immediate, and can be selective, general, partial, total,
temporary, or permanent, as desired.
With capabilities lists the problem is more complicated, because access rights are distributed
throughout the system. A few schemes that have been developed include:
o Reacquisition - Capabilities are periodically revoked from each domain, which must then
re-acquire them.
o Back-pointers - A list of pointers is maintained from each object to each capability which is held
for that object.
o Indirection - Capabilities point to an entry in a global table rather than to the object. Access rights
can be revoked by changing or invalidating the table entry, which may affect multiple processes,
which must then re-acquire access rights to continue.
o Keys - A unique bit pattern is associated with each capability when created, which can be neither
inspected nor modified by the process.
A master key is associated with each object.
When a capability is created, its key is set to the object's master key.
As long as the capability's key matches the object's key, the capability remains valid.
The object master key can be changed with the set-key command, thereby invalidating all
current capabilities.
More flexibility can be added to this scheme by implementing a list of keys for each object,
possibly in a global table.
The concept of incorporating protection mechanisms into programming languages is in its infancy,
and still remains to be fully developed. However the general goal is to provide mechanisms for three
functions:
Distributing capabilities safely and efficiently among customer processes. In particular a user
process should only be able to access resources for which it was issued capabilities.
Specifying the type of operations a process may execute on a resource, such as reading or writing.
Specifying the order in which operations are performed on the resource, such as opening before
reading.
5.8 Security
Security, on the other hand, requires not only an adequate protection system, but also consideration
of the external environment within which the system operates. Internal protection is not useful if the
operator's console is exposed to unauthorized personnel, or if files (stored, for example, on tapes and
disks) can simply be removed from the computer system and taken to a system that has no protection.
These security problems are essentially management, rather than operating-system, problems.
The information stored in the system (both data and code), as well as the physical resources of the
computer system, need to be protected from unauthorized access, malicious destruction or alteration,
and accidental introduction of inconsistency. In this chapter, we examine the ways in which
information may be misused or intentionally made inconsistent. We then present mechanisms to
guard against such occurrences.
Absolute protection of the system from malicious abuse is not possible, but the cost to the perpetrator
can be made sufficiently high to deter most, if not all, attempts to access, without proper authority,
the information residing in the system.
5.9 Authentication
A major security problem for operating systems is the authentication problem. The protection system
depends on an ability to identify the programs and processes that are executing. This ability, in turn,
eventually rests on our power to identify each user of the system. A user normally identifies himself
or herself.
How do we determine whether a user's identity is authentic? Generally, authentication is based on
one or more of three items: user possession (a key or card), user knowledge (a user identifier and
password), and a user attribute (fingerprint, retina pattern, or signature).
5.9.1 Passwords
The most common approach to authenticating a user identity is the use of passwords. When the
user identifies herself by user ID or account name, she is asked for a password. If the user-supplied
password matches the password stored in the system, the system assumes that the user is legitimate.
Passwords are often used to protect objects in the computer system, in the absence of more complete
protection schemes. They can be considered a special case of either keys or capabilities. For instance,
a password could be associated with each resource (such as a file). Whenever a request is made to use
the resource, the password must be given. If the password is correct, access is granted. Different
passwords may be associated with different access rights.
For example, different passwords may be used for each of reading, appending, and updating a file.
There are two common ways to guess a password. One is for the intruder (either human or program)
to know the user or to have information about the user. All too frequently, people use obvious
information (such as the names of their cats or spouses) as their passwords. The other way is to use
brute force: trying all possible combinations of letters, numbers, and punctuation until the password
is found. Short passwords do not leave enough choices to prevent their being guessed by repeated
trials. For example, a four-digit password provides only 10,000 variations. On average, guessing
5,000 times would produce a correct hit. If a program could be written that would try a password
every millisecond, it would take only about 5 seconds to guess a four-digit password.
Longer passwords are less susceptible to being guessed by enumeration. Systems that differentiate
between uppercase and lowercase letters, and that allow the use of numbers and all punctuation
characters in passwords, make the task of guessing the password much more difficult. Of course,
users must take advantage of the large password space and must not, for example, use only
lowercase letters.
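The arithmetic behind these claims is worth making concrete. A short sketch, assuming on average half the password space must be searched and one guess per millisecond as in the text:

```python
import string

# Worked numbers for the guessing argument above: on average an attacker
# must try half the password space; the text assumes 1,000 guesses/second.
def avg_crack_seconds(alphabet_size, length, guesses_per_sec=1000):
    space = alphabet_size ** length
    return (space / 2) / guesses_per_sec

print(avg_crack_seconds(10, 4))     # four-digit password: 5.0 seconds on average

# Mixed case, digits and punctuation grow the space enormously:
full = len(string.ascii_letters + string.digits + string.punctuation)   # 94
print(avg_crack_seconds(full, 8) / (3600 * 24 * 365))   # roughly 10^5 years
```

The contrast between the two results is the whole argument for long passwords over a large alphabet: the search space grows exponentially in length, not linearly.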
This function is used to encode all passwords. Only the encoded passwords are stored. When a user
presents a password, it is encoded and compared against the stored encoded password. Even if the
stored encoded password is seen, it cannot be decoded, so the password cannot be determined. Thus,
the password file does not need to be kept secret. The function f(j) is typically an encryption
algorithm that has been designed and tested rigorously.
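A minimal sketch of such a one-way scheme follows, using the Python standard library. Note one deliberate addition not in the text: a per-user random salt, the standard modern defence against the dictionary attack discussed next.

```python
import hashlib
import os

# Sketch of a one-way password-encoding function f. Only the salt and the
# digest are stored, never the password itself; the salt (an addition
# beyond the text's plain f) defeats precomputed-dictionary attacks.
def store_password(password):
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def check_password(password, salt, digest):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return candidate == digest

salt, digest = store_password("correct horse")
print(check_password("correct horse", salt, digest))   # True
print(check_password("guess", salt, digest))           # False
```

The 100,000 iterations make each guess deliberately slow, which directly raises the cost of the brute-force attack estimated earlier.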
The flaw in this method is that the system no longer has control over the passwords. Although the
passwords are encrypted, anyone with a copy of the password file can run fast encryption routines
against it, encrypting each word in a dictionary, for instance, and comparing the results against file
passwords.
Caution
Security at both levels must be maintained if operating-system security is to be ensured. A weakness
at a high level of security (physical or human) allows circumvention of strict low-level (operating-
system) security measures. Ignoring either level can compromise the whole system.
The seed is the authentication challenge from the computer. The secret and the seed are used as input
to the function f(secret, seed). The result of this function is transmitted as the password to the
computer. Because the computer also knows the secret and the seed, it can perform the same
computation. If the results match, the user is authenticated. The next time that the user needs to be
authenticated, another seed is generated and the same steps ensue. This time, the password is
different.
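The exchange just described can be sketched concretely. The text leaves f unspecified; an HMAC is one common concrete choice (an assumption, not the text's own f):

```python
import hashlib
import hmac
import secrets

# Sketch of the one-time-password exchange above: f(secret, seed) is any
# function both sides can compute but an eavesdropper cannot invert.
# An HMAC is used here as an illustrative choice of f.
def f(secret, seed):
    return hmac.new(secret, seed, hashlib.sha256).hexdigest()

secret = b"shared-secret"                # known to user device and computer
seed = secrets.token_bytes(16)           # the computer's fresh challenge
password = f(secret, seed)               # what the user's device transmits

# The computer repeats the computation and compares:
print(hmac.compare_digest(password, f(secret, seed)))   # True: authenticated
```

Because a fresh seed is issued for each login, a password captured on the wire is useless for the next authentication, which is the whole point of the scheme.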
Inside a text-editor program, for example, there may be code to search the file to be edited for certain
keywords. If any are found, the entire file may be copied to a special area accessible to the creator of
the text editor. A code segment that misuses its environment is called a Trojan horse. The Trojan-
horse problem is exacerbated by long search paths (such as are common on UNIX systems). The
search path lists the set of directories to search when an ambiguous program name is given. The path
is searched for a file of that name, and the file is executed. All the directories in the search path must
be secure, or a Trojan horse could be slipped into the user's path and executed accidentally.
For instance, consider the use of the "." character in a search path. The "." tells the shell to include
the current directory in the search. Thus, if a user has "." in her search path, has set her current
directory to a friend's directory, and enters the name of a normal system command, the command
may be executed from the friend's directory instead. The program would run within the user's
domain, allowing it to do anything the user can do, including deleting the user's files, for instance.
5.12.1 Worms
A worm is a process that uses the spawn mechanism to clobber system performance.
The worm spawns copies of itself, using up system resources and perhaps locking out system use by
all other processes. On computer networks, worms are particularly potent, since they may reproduce
themselves among systems and thus shut down the entire network. Such an event occurred in 1988 to
UNIX systems on the worldwide Internet, causing millions of dollars of lost system and
programmer time. The Internet links thousands of government, academic, research, and industrial
computers internationally, and serves as the infrastructure for electronic exchange of scientific
information. At the close of the workday on November 2, 1988, Robert Tappan Morris, Jr., a first-year
Cornell graduate student, unleashed a worm program on one or more hosts connected to the
Internet. Targeting Sun Microsystems' Sun 3 workstations and VAX computers running variants of
Version 4 BSD UNIX, the worm quickly spread over great distances; within a few hours of its
release, it had consumed system resources to the point of bringing down the infected machines.
Although Robert Morris designed the self-replicating program for rapid reproduction and
distribution, some of the features of the UNIX networking environment provided the means to
propagate the worm throughout the system.
It is likely that Morris chose for initial infection an Internet host left open for, and accessible to,
outside users. From there, the worm program exploited flaws in the UNIX operating system's security
routines and took advantage of UNIX utilities that simplify resource sharing in local-area networks to
gain unauthorized access to thousands of other connected sites. Morris' methods of attack are
outlined next.
5.12.2 Viruses
Another form of computer attack is a virus. Like worms, viruses are designed to spread into other
programs and can wreak havoc in a system, including modifying or destroying files and causing
system crashes and program malfunctions.
Whereas a worm is structured as a complete, standalone program, a virus is a fragment of code
embedded in a legitimate program. Viruses are a major problem for computer users, especially users
of microcomputer systems.
Multiuser computers, generally, are not prone to viruses because the executable programs are
protected from writing by the operating system. Even if a virus does infect a program, its powers are
limited, because other aspects of the system are protected. Single-user systems have no such
protections and, as a result, a virus has free run.
Viruses are usually spread by users downloading viral programs from public bulletin boards or
exchanging floppy disks containing an infection. A case from February 1992 involving two Cornell
University students provides an illustration. The students had developed three Macintosh game
programs with an embedded virus that they distributed to worldwide software archives via the
Internet. The virus was discovered when a mathematics professor in Wales downloaded the games,
and antivirus programs on his system alerted him to an infection. Some 200 other users had also
downloaded the games. Although the virus was not designed to destroy data, it could spread to
application files and cause such problems as long delays and program malfunctions.
Caution
Be careful while using the Internet; viruses mainly enter a system through Internet access. Some
viruses can harm software as well as hardware.
The problem is how can trusted computers be connected safely to an untrustworthy network? One
solution is the use of a firewall to separate trusted and untrusted systems. A firewall is a computer or
router that sits between the trusted and the untrusted. It limits network access between the two
security domains, and monitors and logs all connections. For instance, web servers use the http
protocol to communicate with web browsers. A firewall therefore may need to allow http to pass. The
Morris Internet worm used the finger protocol to break into computers, so finger would not be
allowed to pass. In fact, a firewall can separate a network into multiple domains. A common
implementation has the Internet as the untrusted domain; a semitrusted and semisecure network,
called the demilitarized zone (DMZ), as another domain; and a company's computers as a third
domain (see Figure 5.10). Connections are allowed from the Internet to the DMZ computers and from
the company computers to the Internet, but are not allowed from the Internet or DMZ computers to
the company computers.
Figure 5.10: Network security through domain separation via firewall.
Optionally, controlled communications may be allowed between the DMZ and one or more company
computers. For instance, a web server on the DMZ may need to query a database server on the
corporate network. In this manner, all access is contained, and any DMZ systems that are broken into
based on the protocols allowed through the firewall still are unable to access the company computers.
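The domain-separation policy described above can be sketched as a simple rule table. This is a minimal illustration, not a real firewall configuration; the domain names and allowed pairs are assumptions:

```python
# A minimal sketch of firewall-style domain separation (hypothetical policy).
# A connection is permitted only if (source domain, destination domain) is listed.
ALLOWED = {
    ("internet", "dmz"),       # e.g., outside users reaching a DMZ web server
    ("company", "internet"),   # internal hosts browsing the web
    ("company", "dmz"),        # internal hosts reaching DMZ services
}

def connection_allowed(src: str, dst: str) -> bool:
    """Return True if the firewall policy permits src -> dst."""
    return (src, dst) in ALLOWED

print(connection_allowed("internet", "dmz"))      # True: allowed
print(connection_allowed("internet", "company"))  # False: blocked
print(connection_allowed("dmz", "company"))       # False: blocked
```

Note how even a compromised DMZ host cannot open a connection to the company domain, which is exactly the containment property described above.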
5.14 Encryption
The various provisions that an operating system may make for authorization may not offer sufficient
protection for highly sensitive data. Moreover, as computer networks gain popularity, more sensitive
(classified) information is being transmitted over channels where eavesdropping and message
interception are possible. To keep such sensitive information secure, we need mechanisms to allow a
user to protect data that are transferred over the network.
Encryption is one common method of protecting information transmitted over unreliable links. The
basic mechanism works as follows:
1. The information (text) is encrypted (encoded) from its initial readable form (called clear text), to
an internal form (called cipher text). This internal text form, although readable, does not make
any sense.
2. The cipher text can be stored in a readable file, or transmitted over unprotected channels.
3. To make sense of the cipher text, the receiver must decrypt (decode) it back into clear text.
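The three steps above can be illustrated with a toy XOR cipher. This is purely for illustration of the encrypt/transmit/decrypt cycle; real systems use vetted algorithms such as AES, never a scheme like this:

```python
# Toy symmetric cipher illustrating encrypt -> transmit -> decrypt (NOT secure).
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the repeating key.
    Applying the same operation twice restores the original bytes."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

clear_text = b"attack at dawn"
key = b"secret"

cipher_text = xor_cipher(clear_text, key)   # step 1: encrypt to cipher text
# step 2: cipher_text may be stored or sent over an unprotected channel
recovered = xor_cipher(cipher_text, key)    # step 3: receiver decrypts

print(cipher_text != clear_text)  # True: cipher text makes no sense alone
print(recovered == clear_text)    # True: decryption restores the clear text
```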
7. Security provided by the kernel offers better protection than that provided by a compiler.
(a) True (b) False
The Problem
Data theft, one of the major security issues facing companies, could lead to heavy financial and
economic loss to organizations, according to experts.
Apart from the more prevalent forms of data theft, such as online hacking of an organization's
networks and stealing of hard copies of an organization's files, companies were now waking up to yet
another 'physical' way of data theft via innocent-looking IT gadgets like iPods, digital cameras, MP3
players, and smartphones. The expenses incurred on preventing data theft were slowly taking a
major part of the IT budget of many organizations.
A New HR Dilemma
The possible threats posed by these portable devices were also becoming an HR dilemma for
organizations. Insider information was the biggest threat to the safety of the data resources of the
organization.
Corporate insiders could easily evade the ring of security.
Statistics showed that internal security breaches were growing faster than external security
breaches, and they constituted almost half of the total security breaches in organizations.
Questions
1. What is pod slurping?
2. How can data security be improved in an organization?
5.15 Summary
• Computer systems contain many objects. These objects need to be protected from misuse. Objects
may be hardware (such as memory, CPU time, or I/O devices) or software (such as files,
programs, and abstract data types).
• The access matrix is sparse. It is normally implemented either as access lists associated with each
object, or as capability lists associated with each domain.
• Real systems are much more limited, and tend to provide protection only for files.
• UNIX is representative, providing read, write, and execution protection separately for the owner,
group, and general public for each file.
• Protection is an internal problem. Security must consider both the computer system and the
environment (people, buildings, businesses, valuable objects, and threats) within which the
system is used.
• The data stored in the computer system must be protected from unauthorized access, malicious
destruction or alteration, and accidental introduction of inconsistency.
• The various authorization provisions in a computer system may not confer sufficient protection
for highly sensitive data.
5.16 Keywords
Access Matrix: The access matrix is a general model of protection. The access matrix provides a
mechanism for protection without imposing a particular protection policy on the system or its users.
Access Right: An access right is permission to perform an operation on an object.
Domain: A domain is a set of access rights. Processes execute in domains and may use any of the
access rights in the domain to access and manipulate objects.
Encryption: Encryption is the conversion of data into a form, called ciphertext, that cannot be easily
understood by unauthorized people.
Virus: A virus is a fragment of code embedded in a legitimate program. Viruses are a major problem
for computer users, especially users of microcomputer systems.
Worm: A worm is a process that uses the spawn mechanism to clobber system performance. The
worm spawns copies of itself, using up system resources and perhaps locking out system use by all
other processes.
5.17 Review Questions
1. What are the goals and principles of protection?
2. What are the protection mechanisms and OS-mode protection?
3. What are the main differences between capability lists and access lists?
4. Explain the domain of protection.
5. What is the access matrix? How is it implemented?
6. How can access rights be revoked?
7. What is language-based protection? Explain with an example.
8. What is security? Explain the security problem. What is authentication?
9. Write short notes on:
a. One time password
b. Program threats
c. System threats
d. Threat monitoring
10. What are two advantages of encrypting data stored in the computer system?
6.0 Objectives
After studying this chapter, you will be able to:
Understand memory management
Explain the memory management requirements
Describe the address space
Explain linking and loading
Define swapping
Describe memory partitioning
Understand paging
Explain segmentation
6.1 Introduction
Memory management is the act of managing computer memory. In its simplest form, this involves
providing ways to allocate portions of memory to programs at their request, and freeing them for reuse
when no longer needed. The management of main memory is critical to the computer system.
Virtual memory systems separate the memory addresses used by a process from actual physical
addresses, allowing separation of processes and increasing the effectively available amount of RAM
using disk swapping. The quality of the virtual memory manager can have a big impact on overall
system performance.
Garbage collection is the automated allocation and deallocation of computer memory resources for a
program. This is generally implemented at the programming language level and stands in opposition to
manual memory management, the explicit allocation and deallocation of computer memory
resources. Region-based memory management is an efficient variant of explicit memory management
that can deallocate large groups of objects simultaneously.
A program's machine language code must be in the computer's main memory in order to execute.
Assuring that at least the portion of code to be executed is in memory when a processor is assigned to
a process is the job of the memory manager of the operating system. This task is complicated by two
other aspects of modern computing systems.
The first is multiprogramming. From its definition, we know that multiprogramming means that
several (at least two) processes can be active within the system during any particular time interval.
But these multiple active processes result from various jobs entering and leaving the system in an
unpredictable manner. Pieces, or blocks, of memory are allocated to these processes when they enter
the system, and are subsequently freed when the process leaves the system. Therefore, at any given
moment, the computer's memory, viewed as a whole, consists of a patchwork of blocks, some allocated to
processes active at that moment, and others free and available to a new process which may, at any
time, enter the system.
In general, then, programs designed to execute in this multiprogramming environment must be
compiled so that they can execute from any block of storage available at the time of the program's
execution. Such programs are called relocatable programs, and the idea of placing them into any
currently available block of storage is called relocation.
The second aspect of modern computing systems affecting memory management is the need to allow
the programmer to use a range of program addresses which may be larger, perhaps significantly
larger, than the range of memory locations actually available. That is, we want to provide the
programmer with a virtual memory, with characteristics (especially size) different from actual
memory, and provide it in a way that is invisible to the programmer. This is accomplished by
extending the actual memory with secondary memory such as disk. Providing an efficiently operating
virtual memory is another task for the memory management facility.
Physical memory is divided into fixed-size blocks called frames. Logical memory is also divided into
blocks of the same fixed size, called pages. When a program is to be executed, its pages are loaded
into any available memory frames from the disk. The disk is also divided into fixed-size blocks of the
same size as the memory frames.
The logical address space is the set of addresses seen by the user (or user process).
The physical address space is the set of addresses in physical memory (RAM). The two address
spaces need not be the same size, and usually are not in most modern systems. In systems with
virtual memory, the logical address space is typically much larger than the physical address space.
(see Figure 6.2)
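Because pages and frames share one fixed size, a logical address splits cleanly into a page number and an offset within the page. A minimal sketch, assuming a 4 KB page size (a common but not universal choice):

```python
# Sketch: decompose a logical address under paging (assumed 4 KB pages).
PAGE_SIZE = 4096  # bytes; an assumption for illustration

def split_address(logical_addr: int) -> tuple[int, int]:
    """Return (page number, offset within page) for a logical address."""
    return logical_addr // PAGE_SIZE, logical_addr % PAGE_SIZE

page, offset = split_address(10000)
print(page, offset)  # 2 1808, since 10000 = 2*4096 + 1808
```

The page number indexes the page table to find a frame; the offset is carried over unchanged into that frame.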
Caution
The code and data for a process must be in RAM before it can run.
3. The main memory and the floppy disk have less storage capacity than the................
(a). Floppy disk (b). hard disk
(c). Both (a) and (b) (d). None of these
4. The access speed of ........................is also much faster than a hard disk.
(a). main memory (b). read only memory
(c). random access memory (d). None of these
Ex2: Identify Address Spacing.
……..………………………………………………………………………………………………………………
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
6.5.4 Linking
A linker joins several object modules into a single load module. Input to the linker is a set of object
modules. Each module has been translated with relative addresses, relative to 0 as the start address of
the module.
References from one module to another (function calls, data references, etc.) are still symbolic.
A dynamic linker postpones some of the linkage functions until run time. Dynamic linking can be
done at load time or at run time. Load-time dynamic linking prepares a relocatable load module in the
normal way but leaves some external references unresolved. These references are usually to system
utilities or language libraries. At load time, system copies of the target modules can be linked.
Utilities can be changed without forcing programmers to re-link existing load modules. Shared code
can be linked to more than one program. Run-time dynamic linking goes a step further: some
modules are not linked in until they are actually called.
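Run-time dynamic linking can be illustrated by analogy with Python's import machinery, which likewise defers loading a module and resolving a symbol until the moment of the call (this is an analogy for the concept, not how a native dynamic linker is implemented):

```python
# Sketch: run-time "linking" analogy using Python's import machinery.
# As with run-time dynamic linking, the module is not loaded until needed.
import importlib

def lazy_call(module_name: str, func_name: str, *args):
    """Resolve the module and the symbol only at call time."""
    module = importlib.import_module(module_name)  # loaded on first demand
    return getattr(module, func_name)(*args)       # symbol resolved here

print(lazy_call("math", "sqrt", 16.0))  # 4.0; math was brought in when called
```

The benefit mirrors the text above: the "library" can be updated independently, and it is never loaded at all if the call site is never reached.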
Caution
The loader must generate a single contiguous module in which all the external references have been
resolved.
6.5.5. Overlaying
Overlaying is a technique that is used to execute a process even if memory is not sufficient. The
programmer can define two or more overlays. These overlays can execute in memory
independently. The operating system can swap overlays and manage the memory. The disadvantage
of overlaying is that it requires extensive involvement of the programmer, who has to identify
and define overlays efficiently. This is a very difficult task.
Following the creation of a high-level language (HLL) source program, several processing steps occur
before we can get a process, as shown in Figure 6.4.
6.6 Swapping
Swapping is exchanging data between the hard disk and the RAM.
The goal of the virtual memory technique is to make an application think that it has more memory
than actually exists. The virtual memory
manager (VMM) creates a file on the hard disk called a swap file. Basically, the swap file (also
known as a paging file) allows the application to store any extra data that cannot be stored in the RAM,
because the RAM has limited capacity. Keep in mind that an application program can only use the
data when it is actually in the RAM. Data can be stored in the paging file on the hard disk, but it is
not usable until that data is brought into the RAM. Together, the data stored on the hard disk
combined with the data stored in the RAM comprise the entire data set needed by the
application program. So, the way virtual memory works is that whenever a piece of data needed by an
application program cannot be found in the RAM, the program knows that the data must be in
the paging file on the hard disk.
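The RAM/paging-file interplay can be modelled with a toy simulation. The page names, their contents, and the three-frame RAM size are all illustrative assumptions:

```python
# Toy model of swapping: pages live either in RAM or in the paging file.
RAM_CAPACITY = 3          # assumed number of page frames in RAM
ram = {}                  # page -> data currently resident in RAM
paging_file = {"A": 1, "B": 2, "C": 3, "D": 4}  # pages held on disk

def access(page: str):
    """Use a page; if absent from RAM, bring it in (evicting if RAM is full)."""
    if page not in ram:                            # page fault
        if len(ram) >= RAM_CAPACITY:
            victim = next(iter(ram))               # evict the oldest-loaded page
            paging_file[victim] = ram.pop(victim)  # write it back to disk
        ram[page] = paging_file.pop(page)          # bring the needed page in
    return ram[page]

for p in ["A", "B", "C", "D"]:
    access(p)
print(sorted(ram))  # ['B', 'C', 'D']: page 'A' was swapped back to disk
```

Accessing D once RAM is full forces an exchange in both directions, which is exactly the "swapping" the section describes.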
The base register holds the entry point of the program, and may be added to a relative address to
generate an absolute address. The bounds register indicates the ending location of the program, which
is compared with each physical address generated.
If the latter is within bounds, then execution may proceed; otherwise, an interrupt is generated,
indicating illegal access to memory. Relocation can be easily supported with this mechanism, with
the new starting address and ending address assigned to the base register and the bounds
register respectively.
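A minimal sketch of the base/bounds check (the register values are made-up examples):

```python
# Sketch of base/bounds address translation and protection.
BASE = 40000    # assumed program entry point (start of its memory block)
BOUNDS = 50000  # assumed ending location of the program

def translate(relative_addr: int) -> int:
    """Map a relative address to an absolute one, trapping if out of bounds."""
    absolute = BASE + relative_addr
    if absolute >= BOUNDS:
        # models the interrupt indicating illegal access to memory
        raise MemoryError("illegal access: address out of bounds")
    return absolute

print(translate(1234))  # 41234
# translate(12000) would trap: 52000 lies beyond the bounds register
```

Relocating the program is then just a matter of reloading BASE and BOUNDS; the program's relative addresses never change.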
Placement algorithm
Different strategies may be taken as to how space is allocated to processes:
• First fit: Allocate the first hole that is big enough. Searching may start either at the beginning of the
set of holes or where the previous first-fit search ended.
• Best fit: Allocate the smallest hole that is big enough. The entire list of holes must be searched
unless it is sorted by size. This strategy produces the smallest leftover hole.
• Worst fit: Allocate the largest hole. In contrast, this strategy aims to produce the largest leftover
hole, which may be big enough to hold another process.
Experiments have shown that both first fit and best fit are better than worst fit in terms of speed
and storage utilization.
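The three placement strategies can be compared side by side with a small sketch (the hole sizes and the request are made up for illustration):

```python
# Sketch: first-fit, best-fit, and worst-fit hole selection (sizes assumed).
holes = [20, 50, 10, 90, 35]   # free block sizes, in KB, in memory order
request = 30                   # a process needs 30 KB

first_fit = next(h for h in holes if h >= request)          # first big enough
best_fit = min(h for h in holes if h >= request)            # smallest leftover
worst_fit = max(h for h in holes if h >= request)           # largest leftover

print(first_fit)  # 50: the first hole that is big enough
print(best_fit)   # 35: the smallest hole that is big enough
print(worst_fit)  # 90: the largest hole
```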
Ex2: Write a short note about Swapping.
……..………………………………………………………………………………………………………………
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
6.8 Paging
Paging is a memory-management scheme that permits the physical-address space of a process to be
non-contiguous. Paging avoids the considerable problem of fitting varying-sized memory chunks
onto the backing store, from which most of the previous memory-management schemes suffered.
When some code fragments or data residing in main memory need to be swapped out, space must be
found on the backing store. The fragmentation problems discussed in connection with main memory
are also prevalent with backing store, except that access is much slower, so compaction is impossible.
Because of its advantages over the previous methods, paging in its various forms is commonly used
in most operating systems.
(a) Paging.
(b) Segmentation.
Figure 6.11: Address translation with paging and segmentation.
Traditionally, support for paging has been handled by hardware. However, recent designs have
implemented paging by closely integrating the hardware and operating system, especially on 64-bit
microprocessors. (See Figure 6.12)
Figure 6.12: Paging hardware.
6. .........................if it is known in advance that a program will reside at a specific location of main
memory
(a). Compiler time (b). Run time
(c). Both (a).and (b) (d). None of these
8...................or dynamic relocation defers the process of determining absolute addresses until the
address is used.
(a). memory management (b). dynamic loading
(c). file management (d). None of these
6.9 Segmentation
Segmentation avoids the internal fragmentation present in both fixed partitioning and paging,
but like dynamic partitioning, it suffers from external fragmentation. However, the problem is not as
serious, because a process may be broken into a number of smaller pieces and the resulting external
holes will be much smaller.
Unlike paging, which is invisible to the programmer and the compiler, segmentation is
usually visible, and is actually based on the programmer's logical view of programs. Typically,
programmers will assign programs and data to different segments, leading to a major advantage
of segmentation: protection and sharing may be easily supported. With segmentation, the
logical addresses and physical addresses no longer have a simple relationship, as with
partitioning and paging. Each logical address is explicitly expressed as a two-tuple: (segment-number,
offset). The operating system maintains a segment table for each process and a list of free blocks of
main memory. Each segment table entry gives the starting address in main memory
of the corresponding segment, as well as the length of the segment, to assure that invalid addresses are
not used. Paging and segmentation nowadays have been combined to eliminate both external and
internal fragmentation, and to facilitate programmers' control of programs. An example of this is the
architecture of the Intel 80x86. An important aspect of memory management that became unavoidable
with paging is the separation of the user's view of memory from the actual physical memory. The
user's view of memory is not the same as the actual physical memory. The user's view is mapped
onto physical memory. The mapping allows differentiation between logical memory and physical
memory.
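Segment-table translation of a (segment-number, offset) pair can be sketched as follows (the table contents are illustrative values, not from any real system):

```python
# Sketch: translating a (segment-number, offset) logical address.
# Each segment table entry holds (base address, length); values assumed.
segment_table = {
    0: (1400, 1000),  # e.g., a code segment
    1: (6300, 400),   # e.g., a data segment
}

def translate(segment: int, offset: int) -> int:
    """Return the physical address, trapping on an invalid offset."""
    base, length = segment_table[segment]
    if offset >= length:
        raise MemoryError("trap: offset exceeds segment length")
    return base + offset

print(translate(1, 53))  # 6353
# translate(1, 500) would trap: offset 500 >= segment length 400
```

The length check is the protection advantage mentioned above: every reference is validated against the bounds of the segment the programmer defined.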
Question
1. What are the solution and key benefits of the fine soft studio memory manager?
2. Explain the business description of the fine soft studio memory manager.
6.10 Summary
Segmentation memory management scheme is used to divide a program into a number of smaller
blocks, called segments.
With segmentation, the logical addresses and physical addresses do not have a simple relationship
any more like with partitioning and paging.
Logical memory is broken into blocks of the same size called pages.
An important aspect of paging is the clear separation between the user‘s view of memory and the
actual physical memory.
The simplest partitioning method is dividing memory into several fixed-sized partitions in
advance, called fixed partitioning.
With dynamic partitioning, the processor provides hardware support for address translation,
protection, and relocation.
Swapping is a general term for exchanging blocks of program code or data between main and
secondary memory.
The range of virtual addresses that the operating system assigns to a user or separately running
program is called an address space.
6.11 Keywords
First fit: Allocate the first hole that is big enough. Searching may start either at the beginning of the
set of holes or where the previous first-fit search ended.
Memory Management: It is about sharing memory so that the largest number of processes can run in
the most efficient way.
Paging: It is a memory-management scheme that permits the physical-address space of a process to
be non-contiguous.
Segmentation: It avoids the internal fragmentation present in both fixed partitioning and
paging, but like dynamic partitioning, it suffers from external fragmentation.
Swapping: It is a general term for exchanging blocks of program code or data between main and
secondary memory.
7.0 Objectives
After studying this chapter, you will be able to:
Understand the concept of virtual memory
Discuss demand paging
Explain page replacement
Understand thrashing
Explain demand segmentation
7.1 Introduction
Processes in a system share the CPU and main memory with other processes. However, sharing the
main memory poses some special challenges. As demand on the CPU increases, processes slow down
in some reasonably smooth way. But if too many processes need too much memory, then some of
them will simply not be able to run. When a program is out of space, it is out of luck. Memory is also
vulnerable to corruption.
If some process inadvertently writes to the memory used by another process, that process might fail in
some bewildering fashion totally unrelated to the program logic. In order to manage memory more
efficiently and with fewer errors, modern systems provide an abstraction of main memory known as
virtual memory (VM). Virtual memory is an elegant interaction of hardware exceptions, hardware
address translation, main memory, disk files, and kernel software that provides each process with a
large, uniform, and private address space.
With one clean mechanism, virtual memory provides three important capabilities.
(1). It uses main memory efficiently by treating it as a cache for an address space stored on disk,
keeping only the active areas in main memory, and transferring data back and forth between disk and
memory as needed.
(2). It simplifies memory management by providing each process with a uniform address space.
(3). It protects the address space of each process from corruption by other processes.
Caution
To restart the instruction, we must reset the two registers to the values they had before we started the
execution of the instruction.
It is important to realize that these bits must be updated on every memory reference, so it is essential
that they be set by the hardware. Once a bit has been set to 1, it stays 1 until the operating system
resets it to 0 in software. If the hardware does not have these bits, they can be simulated as follows.
When a process is started up, all of its page table entries are marked as not in memory. As soon as
any page is referenced, a page fault will occur. The operating system then sets the R bit (in its
internal tables), changes the page table entry to point to the correct page, with mode READ ONLY,
and restarts the instruction. If the page is subsequently written to, another page fault will occur,
allowing the operating system to set the M bit and change the page's mode to READ/WRITE. The R
and M bits can be used to build a simple paging algorithm as follows. When a process is started up,
both page bits for all its pages are set to 0 by the operating system. Periodically (e.g., on each clock
interrupt), the R bit is cleared, to distinguish pages that have not been referenced recently from those
that have been. When a page fault occurs, the operating system inspects all the pages and divides
them into four categories based on the current values of their R and M bits:
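This four-way classification (the basis of the NRU, not-recently-used, algorithm) can be sketched directly; the page names and bit values below are illustrative:

```python
# Sketch: classify pages by their R (referenced) and M (modified) bits.
# Class 0 pages are the best eviction candidates, class 3 the worst.
def page_class(r: int, m: int) -> int:
    """Class 0: not referenced, not modified ... Class 3: referenced and modified."""
    return 2 * r + m

pages = {"P0": (0, 0), "P1": (0, 1), "P2": (1, 0), "P3": (1, 1)}
for name, (r, m) in pages.items():
    print(name, "class", page_class(r, m))

# NRU evicts a page from the lowest nonempty class.
victim = min(pages, key=lambda p: page_class(*pages[p]))
print("evict", victim)  # evict P0: neither referenced nor modified
```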
The new one goes on the back of the list; the one at the front of the list is dropped. As a page
replacement algorithm, the same idea is applicable. The operating system maintains a list of all pages
currently in memory, with the page at the head of the list the oldest one and the page at the tail the
most recent arrival. On a page fault, the page at the head is removed and the new page added to the
tail of the list. When applied to a store's stock, FIFO might remove mustache wax, but it might also remove
flour, salt, or butter. When applied to computers the same problem arises. For this reason, FIFO in its
pure form is rarely used.
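A FIFO page replacement sketch; the three-frame memory size and the reference string are assumptions for illustration:

```python
# Sketch: FIFO page replacement with an assumed 3-frame memory.
from collections import deque

def fifo(references, frames=3):
    """Count page faults when pages are evicted strictly oldest-first."""
    memory, faults = deque(), 0
    for page in references:
        if page not in memory:
            faults += 1
            if len(memory) == frames:
                memory.popleft()   # evict the page at the head (oldest)
            memory.append(page)    # newest page goes to the tail
    return faults

print(fifo([0, 1, 2, 3, 0, 1, 4]))  # 7 faults: every reference misses
```

Note that pages 0 and 1 fault a second time precisely because FIFO evicted them on age alone, ignoring that they were still in use.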
7.4.4 The Second Chance Page Replacement Algorithm
A simple modification to FIFO that avoids the problem of throwing out a heavily used page is to
inspect the R bit of the oldest page. If it is 0, the page is both old and unused, so it is replaced
immediately. If the R bit is 1, the bit is cleared, the page is put onto the end of the list of pages, and
its load time is updated as though it had just arrived in memory. Then the search continues.
The operation of this algorithm, called second chance, is shown in Figure 7.4. In Figure 7.4(a) we see
pages A through H kept on a linked list and sorted by the time they arrived in memory.
Figure 7.4: Operation of second chance, (a) Pages sorted in FIFO order, (b) Page list if a page fault
occurs at time 20 and A has its R bit set. The numbers above the pages are their loading times.
Suppose that a page fault occurs at time 20. The oldest page is A, which arrived at time 0, when the
process started. If A has the R bit cleared, it is evicted from memory, either by being written to the
disk (if it is dirty), or just abandoned (if it is clean). On the other hand, if the R bit is set, A is put
onto the end of the list and its "load time" is reset to the current time (20). The R bit is also cleared.
The search for a suitable page continues with B. What second chance is doing is looking for an old
page that has not been referenced in the previous clock interval. If all the pages have been referenced,
second chance degenerates into pure FIFO. Specifically, imagine that all the pages in Figure 7.4(a)
have their R bits set. One by one, the operating system moves the pages to the end of the list, clearing
the R bit each time it appends a page to the end of the list. Eventually, it comes back to page A,
which now has its R bit cleared. At this point A is evicted. Thus the algorithm always terminates.
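Second chance can be sketched by extending FIFO's queue with an R bit check; the page list contents below are assumed for illustration:

```python
# Sketch: second chance page replacement. Each entry is [page, R bit].
from collections import deque

def second_chance_evict(memory: deque):
    """Evict the oldest page whose R bit is 0; referenced pages get a second chance."""
    while True:
        page, r_bit = memory.popleft()
        if r_bit:
            memory.append([page, 0])  # clear R and move to the tail, as if newly loaded
        else:
            return page               # old and unused: evict it

memory = deque([["A", 1], ["B", 0], ["C", 1]])
print(second_chance_evict(memory))  # B: A was referenced, so B is evicted instead
print([p for p, _ in memory])       # ['C', 'A'], with A's R bit now cleared
```

If every page had R set, the loop would clear each bit in turn and finally evict the original head, which is the degeneration to pure FIFO described above.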
Not surprisingly, this algorithm is called clock. It differs from second chance only in the
implementation.
The workings of this algorithm are given in Figure 7.6 for four page frames and page references in
the order
0, 1, 2, 3, 2, 1, 0, 3, 2, 3
After page 0 is referenced, we have the situation of Figure 7.6(a). After page 1 is referenced, we have
the situation of Figure 7.6(b), and so forth.
Figure 7.6: LRU using a matrix when pages are referenced in the order 0, 1, 2, 3, 2, 1, 0, 3, 2, 3.
When a page fault occurs, the page with the lowest counter is chosen for replacement. The main
problem with NFU is that it never forgets anything. For example, in a multi-pass compiler, pages that
were heavily used during pass 1 may still have a high count well into later passes. In fact, if pass 1
happens to have the longest execution time of all the passes, the pages containing the code for
subsequent passes may always have lower counts than the pass 1 pages. Consequently, the operating
system will remove useful pages instead of pages no longer in use. Fortunately, a small modification
to NFU makes it able to simulate LRU quite well. The modification has two parts. First, the counters
are each shifted right 1 bit before the R bit is added in. Second, the R bit is added to the leftmost,
rather than the rightmost, bit. Figure 7.7 illustrates how the modified algorithm, known as aging,
works.
Suppose that after the first clock tick the R bits for pages 0 to 5 have the values 1, 0, 1, 0, 1, and 1,
respectively (page 0 is 1, page 1 is 0, page 2 is 1, etc.). In other words, between tick 0 and tick 1,
pages 0, 2, 4, and 5 were referenced, setting their R bits to 1, while the other ones remain 0. After the
six corresponding counters have been shifted and the R bit inserted at the left, they have the values
shown in Figure 7.7(a). The four remaining columns show the six counters after the next four clock
ticks.
Figure 7.7: The aging algorithm simulates LRU in software. Shown are six pages for five clock
ticks. The five clock ticks are represented by (a) to (e).
When a page fault occurs, the page whose counter is the lowest is removed. It is clear that a page that
has not been referenced for, say, four clock ticks will have four leading zeros in its counter and thus
will have a lower value than the counter of a page that has not been referenced for three clock ticks.
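The aging update can be sketched with software counters; the 8-bit width and the per-tick R bit values are assumptions for illustration:

```python
# Sketch: the aging algorithm with assumed 8-bit software counters.
COUNTER_BITS = 8

def age(counter: int, r_bit: int) -> int:
    """Shift the counter right 1 bit and insert the R bit at the leftmost position."""
    return (counter >> 1) | (r_bit << (COUNTER_BITS - 1))

counters = [0, 0, 0]                              # three pages, counters start at 0
r_bits_per_tick = [(1, 0, 1), (1, 1, 0), (0, 1, 1)]  # R bits observed each tick
for r_bits in r_bits_per_tick:
    counters = [age(c, r) for c, r in zip(counters, r_bits)]

print([format(c, "08b") for c in counters])
# ['01100000', '11000000', '10100000']: page 0 has the lowest counter
# and would be the replacement victim on a page fault.
```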
It has long been known that most programs do not reference their address space uniformly; instead,
the references tend to cluster on a small number of pages. A memory reference may fetch an
instruction, it may fetch data, or it may store data. At any instant of time, t, there exists a set
consisting of all the pages used by the k most recent memory references. This set, w(k, t), is the
working set. Because the k + 1 most recent references must include all the pages used by the k
most recent references, and possibly others, w(k, t) is a monotonically nondecreasing function of k.
The limit of w(k, t) as k becomes large is finite because a program cannot reference more pages than
its address space contains, and few programs will use every single page. Figure 7.8 depicts the size of
the working set as a function of k.
Figure 7.8: The working set is the set of pages used by the k most recent memory references.
The function w (k, t) is the size of the working set at time t.
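Computing w(k, t) from a reference string can be sketched directly; the reference string below is made up for illustration:

```python
# Sketch: the working set w(k, t) is the set of pages touched by the
# k most recent memory references (reference string assumed).
def working_set(references, k):
    """Return the set of distinct pages among the last k references."""
    return set(references[-k:]) if k > 0 else set()

refs = [1, 2, 1, 3, 2, 2, 4]   # page references up to time t
for k in (1, 3, 7):
    print(k, sorted(working_set(refs, k)))
# k=1 -> [4]; k=3 -> [2, 4]; k=7 -> [1, 2, 3, 4]
# The set only grows as k increases: w(k, t) is nondecreasing in k.
```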
The algorithm works as follows. The hardware is assumed to set the R and M bits, as we have
discussed before. Similarly, a periodic clock interrupt is assumed to cause software to run that clears
the Referenced bit on every clock tick. On every page fault, the page table is scanned to look for a
suitable page to evict. As each entry is processed, the R bit is examined. If it is 1, the current virtual
time is written into the Time of last use field in the page table, indicating that the page was in use at
the time the fault occurred. Since the page has been referenced during the current clock tick, it is
clearly in the working set and is not a candidate for removal (τ is assumed to span multiple clock
ticks).
If R is 0, the page has not been referenced during the current clock tick and may be a candidate for
removal. To see whether or not it should be removed, its age, that is, the current virtual time minus
its Time of last use is computed and compared to τ. If the age is greater than τ, the page is no longer
in the working set. It is reclaimed and the new page loaded here. The scan continues updating the
remaining entries. However, if R is 0 but the age is less than or equal to τ, the page is still in the
working set. The page is temporarily spared, but the page with the greatest age (smallest value of
Time of last use) is noted. If the entire table is scanned without finding a candidate to evict, that
means that all pages are in the working set. In that case, if one or more pages with R = 0 were found,
the one with the greatest age is evicted. In the worst case, all pages have been referenced during the
current clock tick (and thus all have R = 1), so one is chosen at random for removal, preferably a
clean page, if one exists.
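The scan described above can be sketched in Python. This is an illustrative simplification, not the book's implementation: page-table entries are modelled as dictionaries with an 'r' bit and a 'tlu' (Time of last use) field, and a random choice stands in for "one is chosen at random" when every R bit is set.

```python
import random

def ws_evict(pages, current_time, tau):
    """Return the index of the page to evict, per the working-set rules:
    R = 1 pages get their time stamp refreshed; an R = 0 page older than
    tau is reclaimed at once; otherwise the oldest R = 0 page is taken;
    if all pages have R = 1, one is picked at random."""
    oldest = None
    for i, p in enumerate(pages):
        if p['r'] == 1:
            p['tlu'] = current_time        # referenced this tick: in working set
            continue
        age = current_time - p['tlu']
        if age > tau:
            return i                       # outside the working set: reclaim it
        if oldest is None or p['tlu'] < pages[oldest]['tlu']:
            oldest = i                     # spared, but note the greatest age
    if oldest is not None:
        return oldest                      # all in working set: take greatest age
    return random.randrange(len(pages))    # all R = 1: choose at random

pages = [{'r': 1, 'tlu': 9}, {'r': 0, 'tlu': 2}, {'r': 0, 'tlu': 7}]
assert ws_evict(pages, current_time=10, tau=5) == 1   # age 8 exceeds tau
```

Note that, as in the text, scanning with R = 1 has the side effect of writing the current virtual time into the entry.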
Figure 7.10: Operation of the WSClock algorithm. (a) and (b) give an example of what happens
when R = 1. (c) and (d) give an example of R = 0.
What happens if the hand comes all the way around to its starting point?
There are two cases to distinguish:
1. At least one write has been scheduled.
2. No writes have been scheduled.
In the former case, the hand just keeps moving, looking for a clean page. Since one or more writes
have been scheduled, eventually some write will complete and its page will be marked as clean. The
first clean page encountered is evicted.
This page is not necessarily the first write scheduled because the disk driver may reorder writes in
order to optimize disk performance.
In the latter case, all pages are in the working set; otherwise at least one write would have been
scheduled. Lacking additional information, the simplest thing to do is claim any clean page and use
it. The location of a clean page could be kept track of during the sweep. If no clean pages exist, then
the current page is chosen and written back to disk.
The first minicomputer to introduce virtual memory was the Norwegian NORD-1; during the 1970s,
other minicomputers implemented virtual memory, notably VAX models running VMS.
Self Assessment Questions
1. ………….is an elegant interaction of hardware address translation and kernel software that
provides each process with a uniform and private address space.
(a). Main memory (b). Secondary memory
(c). Hard disk (d). Virtual memory
2. A .................manipulates entire processes, whereas a pager is concerned with the individual pages
of a process.
(a). swapper (b). segmentation
(c). paging (d). None of these
4. A good approximation to the optimal algorithm is based on the observation that pages that have
been heavily used in the last few instructions will probably be heavily used again in the next few.
(a). True (b). False
5. An improved algorithm that is based on the clock algorithm but also uses the working set
information is called………….
(a). counter clock (b). WSClock
(c). paging (d). None of these
7.5 Thrashing
If the number of frames allocated to a low-priority process falls below the minimum number required
by the computer architecture, we must suspend that process execution. We should then page out its
remaining pages, freeing all its allocated frames. This provision introduces a swap-in, swap-out level
of intermediate CPU scheduling. In fact, look at any process that does not have "enough" frames.
Although it is technically possible to reduce the number of allocated frames to the minimum, there is
some (larger) number of pages in active use. If the process does not have this number of frames, it
will quickly page fault. At this point, it must replace some page. However, since all its pages are in
active use, it must replace a page that will be needed again right away. Consequently, it quickly faults
again, and again, and again. The process continues to fault, replacing pages for which it then faults
and brings back in right away. This high paging activity is called thrashing.
The average service time for a page fault will increase, due to the longer average queue for the
paging device. Thus, the effective access time will increase even for a process that is not thrashing.
To prevent thrashing, we must provide a process with as many frames as it needs. But how do we know
how many frames it "needs"? There are several techniques. The working-set strategy starts by
looking at how many frames a process is actually using. This approach defines the locality model of
process execution.
Caution
Faulting processes must use the paging device to swap pages in and out.
The specific problem is how to prevent thrashing. Thrashing has a high page-fault rate. Thus, we
want to control the page-fault rate. When it is too high, we know that the process needs more frames.
Similarly, if the page-fault rate is too low, then the process may have too many frames. We can
establish upper and lower bounds on the desired page-fault rate. If the actual page-fault rate exceeds
the upper limit, we allocate that process another frame; if the page-fault rate falls below the lower
limit, we remove a frame from that process. Thus, we can directly measure and control the page-fault
rate to prevent thrashing.
As with the working-set strategy, we may have to suspend a process. If the page-fault rate increases
and no free frames are available, we must select some process and suspend it. The freed frames are
then distributed to processes with high page-fault rates.
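The page-fault-frequency idea above can be captured in a few lines. The function below is a hypothetical sketch; the 2% and 10% bounds are illustrative choices, not values from the text.

```python
def adjust_frames(frames, fault_rate, lower=0.02, upper=0.10):
    """Page-fault-frequency control: grow the allocation above the upper
    bound, shrink it below the lower bound, else leave it alone.
    The lower/upper bounds here are made-up illustrative values."""
    if fault_rate > upper:
        return frames + 1              # process needs more frames
    if fault_rate < lower:
        return max(1, frames - 1)      # process may have too many frames
    return frames

assert adjust_frames(8, 0.15) == 9     # too many faults: add a frame
assert adjust_frames(8, 0.01) == 7     # too few faults: reclaim a frame
assert adjust_frames(8, 0.05) == 8     # within bounds: no change
```

In a real kernel the fault rate would be measured per interval, and a process would be suspended (as the text notes) when the rate is high but no free frames remain.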
Questions
1. How do you change the Virtual Memory Settings?
2. What are the basic Reasons for Low Virtual Memory?
7.7 Summary
Virtual memory is an elegant interaction of hardware exceptions, hardware address translation,
main memory, disk files, and kernel software that provides each process with a large, uniform,
and private address space.
A swapper manipulates entire processes, whereas a pager is concerned with the individual pages
of a process.
A page fault could occur at any memory reference. If the page fault occurs on the instruction
fetch, we can restart by fetching the instruction again.
Demand paging can have a significant effect on the performance of a computer system. To see
why, let us compute the effective access time for a demand paged memory.
The working-set model is based on the assumption of locality. This model uses a parameter to
define the working-set window.
7.8 Keywords
Demand Paging: A demand-paging system is similar to a paging system. Processes reside on
secondary memory.
Locality Model: The locality model states that, as a process executes, it moves from locality to
locality. A locality is a set of pages that are actively used together.
Secondary memory: This memory holds those pages that are not present in main memory.
Thrashing: High paging activity in which a process continuously faults, replacing pages for which it
then faults again and must bring back in right away.
Virtual Memory: Modern systems provide an abstraction of main memory known as virtual memory
(VM) in order to manage memory more efficiently and with fewer errors.
8.0 Objectives
After studying this chapter, you will be able to:
Describe the input-output devices
Explain the hardware support for I/O
Describe the I/O communication techniques
Explain the I/O software device drivers
Define a performance consideration
8.1 Introduction
One of the main functions of operating systems is to control all the computer's input/output (I/O)
devices. It must issue commands to the devices, catch interrupts (I/O), and handle errors. It should
also provide an interface between the devices and the rest of the system that is simple and easy to
use. Different people look at I/O hardware in different ways. Electrical engineers look at it in terms of
chips, wires, power supplies, motors and all the other physical components that make up the
hardware. Programmers look at the interface presented to the software: the commands the hardware
accepts, the functions it carries out, and the errors that can be reported back.
The control of devices connected to the computer is a major concern of operating-system designers.
Because I/O devices vary so widely in their function and speed (consider a mouse, a hard disk, and a
CD-ROM jukebox), a variety of methods are needed to control them. These methods form the I/O
sub-system of the kernel, which separates the rest of the kernel from the complexity of managing I/O
devices.
The I/O-device technology exhibits two conflicting trends. On one hand, we see increasing
standardization of software and hardware interfaces. This trend helps us to incorporate improved
device generations into existing computers and operating systems. On the other hand, we see an
increasingly broad variety of I/O devices. Some new devices are so unlike previous devices that it is
a challenge to incorporate them into computers and operating systems.
This challenge is met by a combination of hardware and software techniques. The basic I/O hardware
elements, such as ports, buses, and device controllers, accommodate a wide variety of I/O devices. To
encapsulate the details and oddities of different devices, the kernel of an operating system is
structured to use device driver modules. The device drivers present a uniform device access interface
to the I/O subsystem, much as system calls provide a standard interface between the application and
the operating system.
If you look closely, the boundary between devices that are block addressable and those that are not is
not well defined. Everyone agrees that a disk is a block addressable device because no matter where
the arm currently is, it is always possible to seek another cylinder and then wait for the required
block to rotate under the head. Now consider a magnetic tape containing blocks of 1K bytes. If the
tape drive is given a command to read block N, it can always rewind the tape and go forward until it
comes to block N. This operation is analogous to a disk doing a seek, except that it takes much longer.
Also, it may or may not be possible to rewrite one block in the middle of a tape. Even if it were
possible to use magnetic tapes as block devices, that is stretching the point somewhat: they are
normally not used that way.
The other type of I/O device is the character device. A character device delivers or accepts a stream
of characters, without regard to any block structure. It is not addressable and does not have any seek
operation. Terminals, line printers, paper tapes, punched cards, network interfaces, mice (for
pointing), and most other devices that are not disk-like can be seen as character devices.
This classification scheme is not perfect. Some devices just do not fit in. However, the model of
block and character devices is general enough that it can be used as a basis for making the I/O system
device independent. A typical microcomputer system consists of a microprocessor plus memory and
I/O interface. The various components that form the system are linked through buses that transfer
instructions, data, addresses and control information among the components. The block diagram of a
microcomputer system is shown in Figure 8.1.
Figure 8.1: Block Diagram of a Microcomputer System.
Using device controllers for connecting I/O devices to a computer system instead of connecting them
directly to the system bus has the following advantages:
A device controller can be shared among multiple I/O devices allowing many I/O devices to be
connected to the system.
I/O devices can be easily upgraded or changed without any change in the computer system.
I/O devices of manufacturers other than the computer manufacturer can be easily plugged in to
the computer system. This provides more flexibility to the users in buying I/O devices of their
choice.
Caution
Do not attempt to resize a partition on a device that is in use; it may cause data loss.
A disk controller has microcode and a processor to do many tasks, such as bad-sector mapping,
prefetching, buffering, and caching. Because a common type of software fault is a write through an incorrect
pointer to an unintended region of memory, a memory-mapped device register is vulnerable to
accidental modification. Of course, protected memory helps to reduce this risk.
8.3.1 Polling
The complete protocol for interaction between the host and a controller can be intricate, but the basic
handshaking notion is simple. We explain handshaking with an example. Assume that 2 bits are used
to coordinate the producer consumer relationship between the controller and the host. The controller
indicates its state through the busy bit in the status register. (Recall that to set a bit means to write a 1
into the bit, and to clear a bit means to write a 0 into it.) The controller sets the busy bit when it is busy
working, and clears the busy bit when it is ready to accept the next command. The host signals its
wishes via the command-ready bit in the command register. The host sets the command-ready bit
when a command is available for the controller to execute. For this example, the host writes output
through a port, coordinating with the Controller by handshaking as follows.
1. The host repeatedly reads the busy bit until that bit becomes clear.
2. The host sets the write bit in the command register and writes a byte into the data-out register.
3. The host sets the command-ready bit.
4. When the controller notices that the command-ready bit is set, it sets the busy bit.
5. The controller reads the command register and sees the write command. It reads the data-out
register to get the byte, and does the I/O to the device.
6. The controller clears the command-ready bit, clears the error bit in the status register to indicate
that the device I/O succeeded, and clears the busy bit to indicate that it is finished.
This loop is repeated for each byte. In step 1, the host is busy-waiting or polling: It is in a loop,
reading the status register over and over until the busy bit becomes clear. If the controller and device
are fast, this method is a reasonable one. But if the wait may be long, the host should probably switch
to another task. But then how does the host know when the controller has become idle? For some
devices, the host must service the device quickly, or data will be lost. For instance, when data are
streaming in on a serial port or from a keyboard, the small buffer on the controller will overflow and
data will be lost if the host waits too long before returning to read the bytes.
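The six handshaking steps can be simulated directly. The sketch below is a toy software model, not real hardware access: the `Controller` class stands in for the busy, command-ready, and data-out registers, and `host_write` plays the host's role, busy-waiting (polling) exactly as in step 1.

```python
class Controller:
    """Toy device controller with busy / command-ready bits and a
    data-out register; 'step' advances the controller one action."""
    def __init__(self):
        self.busy = 0
        self.command_ready = 0
        self.data_out = None
        self.log = []            # bytes "written to the device"

    def step(self):
        if self.command_ready and not self.busy:
            self.busy = 1                    # step 4: notice command, go busy
        elif self.busy:
            self.log.append(self.data_out)   # step 5: do the I/O
            self.command_ready = 0           # step 6: clear bits when done
            self.busy = 0

def host_write(ctrl, byte):
    while ctrl.busy:                 # step 1: poll until busy clears
        ctrl.step()
    ctrl.data_out = byte             # step 2: write byte to data-out
    ctrl.command_ready = 1           # step 3: signal command-ready
    while ctrl.command_ready:        # busy-wait for completion
        ctrl.step()

c = Controller()
for b in b"hi":
    host_write(c, b)
assert c.log == [104, 105]           # both bytes reached the "device"
```

The inner `while` loops are precisely the busy-waiting the text warns about: if `step` modelled a slow device, the host would burn CPU time there, which motivates interrupts in the next section.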
8.3.2 Interrupts
The basic interrupt mechanism works as follows. The CPU hardware has a wire called the interrupt
request line that the CPU senses after executing every instruction. When the CPU detects that a
controller has asserted a signal on the interrupt request line, the CPU saves a small amount of state,
such as the current value of the instruction pointer, and jumps to the interrupt-handler routine at a
fixed address in memory. The interrupt handler determines the cause of the interrupt, performs the
necessary processing, and executes a return from interrupt instruction to return the CPU to the
execution state prior to the interrupt. We say that the device controller raises an interrupt by asserting a
signal on the interrupt request line, that the CPU catches the interrupt and dispatches to the interrupt
handler, and that the handler clears the interrupt by servicing the device. Figure 8.5 summarizes the
interrupt-driven I/O cycle. This basic interrupt mechanism enables the CPU to respond to an
asynchronous event, such as a device controller becoming ready for service. In a modern operating
system, we need more sophisticated interrupt-handling features. First, we need the ability to defer
interrupt handling during critical processing. Second, we need an efficient way to dispatch to the
proper interrupt handler for a device, without first polling all the devices to see which one raised the
interrupt. Third, we need multilevel interrupts, so that the operating system can distinguish between
high- and low-priority interrupts and can respond with the appropriate degree of urgency.
During I/O, interrupts are raised by the various device controllers when they are ready for service.
These interrupts signal that output has completed, or that input data are available, or that a failure has
been detected. The interrupt mechanism is also used to handle a wide variety of exceptions, such as
dividing by zero, accessing a protected or nonexistent memory address, or attempting to execute a
privileged instruction from user mode. The events that trigger interrupts have a common property:
They are occurrences that induce the CPU to execute an urgent, self-contained routine. An operating
system has other good uses for an efficient hardware mechanism that saves a small amount of
processor state, and then calls a privileged routine in the kernel. For example, many operating
systems use the interrupt mechanism for virtual-memory paging. A page fault is an exception that
raises an interrupt. The interrupt suspends the current process and jumps to the page fault handler in
the kernel. This handler saves the state of the process, moves the process to the wait queue, performs
page-cache management, schedules an I/O operation to fetch the page, schedules another process to
resume execution, and then returns from the interrupt.
Another example is found in the implementation of system calls. A system call is a function that is
called by an application to invoke a kernel service.
In programmed I/O, the I/O operations are completely controlled by the processor. The processor
executes a program that initiates, directs and terminates an I/O operation. It requires little special I/O
hardware, but is quite time consuming for the processor since the processor has to wait for slower I/O
operations to complete.
I/O Commands
There are four types of I/O commands that an I/O interface may receive when it is addressed by a
processor:
Control: These commands are device specific and are used to provide specific instructions to the
device, e.g. a magnetic tape requiring rewinding and moving forward by a block.
Test: This command checks the status such as if a device is ready or not or is in error condition.
Read: This command is useful for input of data from an input device.
Write: This command is used for output of data to an output device.
I/O Instructions
An I/O instruction is stored in the memory of the computer and is fetched and executed by the
processor producing an I/O-related command for the I/O interface. With programmed I/O, there is a
close correspondence between the I/O-related instructions and the I/O commands that the processor
issues to an I/O interface to execute the instructions. In systems with programmed I/O, the I/O
interface, the main memory and the processors normally share the system bus. Thus, each I/O
interface should interpret the address lines to determine if the command is for itself. There are two
methods for doing so. These are called memory-mapped I/O and isolated I/O.
With memory-mapped I/O, there is a single address space for memory locations and I/O devices. The
processor treats the status and data registers of the I/O interface as memory locations and uses the same
machine instructions to access both memory and I/O devices. For memory-mapped I/O only a single
read and a single write line are needed for memory or I/O interface read or write operations. These
lines are activated by the processor for either memory access or I/O device access. With isolated I/O,
there are separate control lines for both memory and I/O device read or write operations. Thus a
memory reference instruction does not affect an I/O device. In isolated I/O, the I/O devices and
memory are addressed separately; hence separate input/output instructions are needed which cause
data transfer between the addressed I/O interface and the processor.
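A toy model can make the memory-mapped decoding concrete. The address range, register layout, and names below are illustrative assumptions, not from the text; the point is that a single store routine serves both RAM and the device, with only the address deciding where the data goes.

```python
# Hypothetical sketch: a 1 KB address space in which addresses
# 0x3F8-0x3FF are routed to a device's registers instead of RAM.
IO_BASE, IO_TOP = 0x3F8, 0x400     # made-up device register window
ram = bytearray(1024)
device_regs = bytearray(8)

def store(addr, value):
    """The same 'machine instruction' accesses RAM or the device,
    depending only on the address -- no separate I/O instructions,
    which is the essence of memory-mapped I/O."""
    if IO_BASE <= addr < IO_TOP:
        device_regs[addr - IO_BASE] = value   # decoded as an I/O access
    else:
        ram[addr] = value

store(0x100, 0xAB)     # ordinary memory write
store(0x3F8, 0x41)     # lands in the device's first register
assert ram[0x100] == 0xAB
assert device_regs[0] == 0x41
```

Isolated I/O would instead give `store` a sibling, say an `out` routine with its own address space, matching the text's separate input/output instructions.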
Interrupt-Processing
The occurrence of an interrupt triggers a number of events, both in the processor hardware and
software.
When an I/O device completes an I/O operation, the following sequence of hardware events occurs:
1. The device issues an interrupt signal to the processor.
2. The processor finishes execution of the current instruction before responding to the interrupt.
3. The processor tests for the interrupts and sends an acknowledgement signal to the device that
issued the interrupt.
4. The minimum information required to be stored for the currently executing task, before the
CPU starts executing the interrupt routine (using its registers), is:
(a) The status of the processor, which is contained in the register called program status word
(PSW).
(b) The location of the next instruction to be executed, of the currently executing program,
which is contained in the program counter (PC).
5. The processor now loads the PC with the entry location of the interrupt-handling program that
will respond to this interrupting condition. Once the PC has been loaded, the processor proceeds
to execute the next instruction, that is, the next instruction cycle, which begins with an instruction
fetch. Because the instruction fetch is determined by the contents of the PC, the result is that
control is transferred to the interrupt-handler program.
6. The PC and PSW relating to the interrupted program have already been saved on the system stack.
The contents of the processor registers also need to be saved on the stack that is used by the
called interrupt servicing routine because these registers may be modified by the interrupt-
handler. Here a user program is interrupted after the instruction at location N. The contents of all
of the registers plus the address of the next instruction (N+1) are pushed on to the stack.
7. The interrupt handler next processes the interrupt. This includes determining the event that
caused the interrupt and examining the status information relating to the I/O operation.
8. When interrupt processing is complete, the saved register values are retrieved from the stack and
restored to the registers.
9. The final step is to restore the values of PSW and PC from the stack. As a result, the instruction
to be executed will be from the previously interrupted program.
Thus, interrupt handling involves interruption of the currently executing program, execution of
interrupt servicing program and restart of interrupted program from the point of interruption.
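Steps 4-9 above amount to a save/restore discipline around the handler call. The sketch below is a software analogy, not real processor behaviour: the CPU is modelled as a dictionary holding the PC and PSW, and the system stack as a list.

```python
def handle_interrupt(cpu, stack, handler_entry, service):
    """Sketch of steps 4-9: push PSW and PC on the stack, run the
    handler, then pop them back so the interrupted program resumes
    exactly where it left off."""
    stack.append(cpu['psw'])       # step 4(a): save processor status word
    stack.append(cpu['pc'])        # step 4(b): save return address
    cpu['pc'] = handler_entry      # step 5: jump to the handler
    service(cpu)                   # steps 6-8: handler services the device
    cpu['pc'] = stack.pop()        # step 9: restore PC ...
    cpu['psw'] = stack.pop()       # ... and PSW; execution resumes at N+1

cpu, stack = {'pc': 101, 'psw': 0b0010}, []
handle_interrupt(cpu, stack, handler_entry=0x5000, service=lambda c: None)
assert cpu['pc'] == 101 and cpu['psw'] == 0b0010 and stack == []
```

Note the strict last-in, first-out order: the PC is pushed last, so it must be popped first, which is why step 9 in the text restores PC and PSW from the stack in the reverse of the order they were saved.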
Design issues
Two design issues arise in implementing interrupt-driven I/O:
1) How does the processor determine which device issued the interrupt?
2) If multiple interrupts have occurred, how does the processor decide which one should be processed
first?
To solve these problems, four general categories of techniques are in common use:
Multiple Interrupt Lines
The simplest solution to the problems above is to provide multiple interrupt lines, which will result in
immediate recognition of the interrupting device. Priorities can be assigned to various interrupts and
the interrupt with the highest priority should be selected for service in case a multiple interrupt
occurs. But providing multiple interrupt lines is an impractical approach because only a few lines of
the system bus can be devoted to interrupts.
Software Poll
In this scheme, on the occurrence of an interrupt, the processor jumps to an interrupt service program
or routine whose job it is to poll (roll call) each I/O interface to determine which I/O interface has
caused the interrupt. This may be achieved by reading the status register of the I/O interface. Once
the correct interface is identified, the processor branches to a device-service routine specific to that
device. The disadvantage of the software poll is that it is time consuming.
Daisy Chain
This scheme provides a hardware poll. With this technique, an interrupt acknowledge line is chained
through various interrupt devices. All I/O interfaces share a common interrupt request line. When the
processor senses an interrupt, it sends out an interrupt acknowledgement. This signal passes through
all the I/O devices until it gets to the requesting device. The first device that has made the interrupt
request senses the signal and responds by putting a word, normally the address of an interrupt
servicing program or a unique identifier, on the data lines. This word is referred to as the
interrupt vector. This address or identifier in turn is used for selecting an appropriate interrupt-
servicing program. Daisy chaining has an in-built priority scheme, which is determined by the
sequence of devices on the interrupt acknowledge line.
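The daisy-chain priority rule can be modelled as a walk along the chain. This is an illustrative sketch (the vector values are made up): the acknowledge signal visits devices in order, and the first requester claims it and puts its vector on the data lines, which is exactly why position in the chain determines priority.

```python
def daisy_chain_ack(devices):
    """Pass the interrupt acknowledge along the chain; the first
    requesting device claims it and returns its interrupt vector
    (the address of its service routine). Devices earlier in the
    list implicitly have higher priority."""
    for dev in devices:
        if dev['requesting']:
            dev['requesting'] = False
            return dev['vector']       # placed on the data lines
    return None                        # no device was requesting

devices = [
    {'vector': 0x20, 'requesting': False},
    {'vector': 0x24, 'requesting': True},    # closer to the CPU: wins
    {'vector': 0x28, 'requesting': True},
]
assert daisy_chain_ack(devices) == 0x24      # nearer requester served first
assert daisy_chain_ack(devices) == 0x28      # then the farther one
```

A software poll would instead read each interface's status register in a loop, which is functionally similar but consumes processor time rather than a pass of a hardware signal.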
Bus Arbitration
In this scheme, the I/O interface first needs to gain control of the bus, and only after that can it
request an interrupt. Since only one of the interfaces can control the bus at a time, only one
request can be made at a time. The interrupt request is acknowledged by the CPU, in response to
which the I/O interface places the interrupt vector on the data lines. An interrupt vector normally
contains the address of the interrupt-servicing program. An example of interrupt vectors can be seen in a
personal computer, where there are several IRQs (interrupt requests) for specific types of interrupt.
The DMA interface transfers the entire block of data, one word at a time, directly to or from memory,
without going through the processor. When the transfer is complete, the DMA interface sends an
interrupt signal to the processor. Thus, in DMA the processor's involvement can be restricted to the
beginning and end of the transfer.
The DMA mechanism can be configured in a variety of ways. In the simplest configuration, all
interfaces share the same system bus. The DMA interface acts as a supporting processor and uses
programmed I/O for exchanging data between memory and an I/O interface through the DMA
interface. But this spoils the basic advantage of DMA, since extra bus cycles are used for
transferring information between memory and the DMA interface, and between the DMA interface
and the I/O interface. Other configurations offer advantages over this one. In these systems
a path is provided between the I/O interface and the DMA interface that does not include the system bus.
The DMA logic may become part of an I/O interface and can control one or more I/O interfaces. In an
extended concept, an I/O bus can be connected to the DMA interface. Such a configuration is quite
flexible and can be extended very easily. In both these configurations, the added advantage is that
data between the I/O interface and the DMA interface is transferred off the system bus, thus eliminating the
disadvantage we have witnessed for the first configuration.
8.5 I/O Software Device Drivers
A device driver is a program routine that links a peripheral device to an operating system of a
computer. It is essentially a software program that allows a user to employ a device, such as a printer,
monitor, or mouse. It is written by programmers who have detailed knowledge of the
device's command language and characteristics, and contains the specific machine language necessary
to perform the functions requested by the application. When a new hardware device is added to the
computer, such as a CD-ROM drive, a printer, or a sound card, its driver must be installed in order to
run it. The operating system "calls" the driver, and the driver "drives" the device. In Windows, for
example, everything that is seen on the screen is the result of the display driver (video driver). The
display driver effectuates the visual appearance of the screen according to the precise commands that
Windows issues to it.
The driver is the link between the operating system and the peripheral device. If the peripheral device
is changed, or if a bug is found in the driver, the driver must also be changed. A new version of the
driver is then written and released by the manufacturer of the device.
The basic input/output (I/O) hardware features, such as ports, buses, and device controllers,
accommodate a wide variety of I/O devices. To encapsulate the details and unique features of
different devices, the kernel of an operating system is set up to use device driver modules. The device
drivers present a uniform device-access interface to the I/O subsystem.
Each of the different types of I/O devices is accessed through a standardized set of functions: an
interface. The tangible differences are encapsulated in kernel modules (i.e., device drivers) that
internally are customized for each device, but that export and utilize one of the standard interfaces. A
device driver sets the direct memory access (DMA) control registers to use appropriate source and
destination addresses, and transfer length. The DMA controller is then instructed to begin the I/O
operation. Refer to Figure 8.9 to see how a device driver (a mouse driver in this example) relates to
the structure of the operating system.
Figure 8.9: Mouse and mouse device driver in a kernel input/output structure of an operating
system.
Device drivers are saved as files, and are called upon when a particular peripheral or hardware device
is needed. On the Macintosh, for instance, they are stored in the Extensions folder;
their features are preset and cannot be modified. Once they are installed, the devices they control
become available for use.
To provide an efficient and convenient access to the hard disk, the operating system requires the file
system to allow the data to be stored, located, and retrieved easily. The file system is composed of
several different levels. The lowest level (see Figure 8.10) is the input/output (I/O) control, and
consists of device drivers and interrupt handlers to transfer information between the memory and the
hard disk. A device driver is basically a translator. Its input consists of high-level commands,
and its output consists of low-level, hardware-specific instructions, which are utilized by the
hardware controller that interfaces the I/O device to the rest of the operating system. The device
driver usually writes specific bit patterns to designated locations in the I/O controller's memory to let
the controller know on which device location to act and what subsequent actions to provide.
Caution
There are no user-level file permissions. All file locations specified by the UTL_FILE_DIR
parameters are valid, for both reading and writing, for all users of the file I/O procedures. This can
override operating system file permissions.
2. The I/O interface provides a method for transferring information between…………and external I/O
devices.
(a) internal storage (b) external storage
(c) virtual storage (d) cash memory
4. An I/O interface is a bridge between the processor and I/O devices. It controls the data exchange
between the external devices and …………………..
(a) main memory (b) external device
(c) processor registers (d) All of these
5. Programmed input/output is not a useful I/O method for computers where hardware costs need to
be minimized.
(a) True (b) False
System configuration
The system configuration is:
Uniprocessor 80486DX2 running at 66MHz.
EISA bus.
32MB of RAM.
64MB of swap space.
NBUF set to 3000.
Two 1GB SCSI-2 hard disks.
One 16-port and one 8-port non-intelligent serial card using 16450 UARTs.
22 ASCII terminals and PCs running terminal emulation software.
One V.42 fax modem.
Defining a Performance Goal
The system administrator is tasked with improving the interactive performance of the system. Funds
are available for upgrading the machine's subsystems if sufficient need is demonstrated. Any change
to the system must be undertaken with minimal disruption to the users.
Collecting Data
The administrator ensures that system accounting is enabled using sar enable(ADM), and produces
reports of system activity at five-minute intervals during the working week by placing the following
line in root's crontab(C) file:
0 8-18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:00 -i 300 -A
The administrator notes the times at which users report that the system response is slow and
examines the corresponding operating system activity in the report (effectively using sar -u):
08:00:00 %usr %sys %wio %idle
11:10:00 42 46 4 8
11:15:00 40 49 6 5
11:20:00 38 50 7 5
11:25:00 41 47 5 7
The system is spending a large amount of time in system mode and little time idle or waiting for I/O.
The length of the run queue shows that an unacceptably large number of user processes are lined up
for running (sar -q statistics):
08:00:00 runq-sz %runocc swpq-sz %swpocc
11:10:00 4.3 85
11:15:00 7.8 98
11:20:00 5.0 88
11:25:00 3.5 72
An acceptable number of processes on the run queue would be two or fewer.
At times when the system response seems acceptable, the system activity has the following pattern:
08:00:00 %usr %sys %wio %idle
16:40:00 55 20 0 25
16:45:00 52 25 2 21
16:50:00 59 20 1 20
16:55:00 54 21 2 23
This shows that the system spends little time waiting for I/O and a large proportion of time in user
mode. The %idle figure shows more than 20% spare CPU capacity on the system. The run queue
statistics also show that user processes are getting fair access to run on the CPU:
08:00:00 runq-sz %runocc swpq-sz %swpocc
16:40:00 1.0 22
16:45:00 2.1 18
16:50:00 1.6 9
16:55:00 1.1 12
Formulating a Hypothesis
From the CPU utilization statistics, it looks as though the system is occasionally spending too much
time in system mode. This could be caused by memory shortages or too much overhead placed on the
CPU by peripheral devices. The low waiting on I/O figures imply that memory shortage is not a
problem. If the system were swapping or paging, this would usually generate much more disk
activity.
The administrator next examines the performance of the memory, disk and serial I/O subsystems to
check on their performance.
The zero values for swpot/s and bswot/s indicate that there was no swapping out activity. Examining
the sar -q, sar -r and sar -w reports at other times shows occasional short periods of paging activity
but these are correlated with batch payroll runs. It should be possible to reduce the impact of these on
the system by rescheduling the jobs to run overnight.
The administrator next examines the buffer cache usage statistics for the same period (sar -b
statistics):
08:00:00 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
11:10:00 27 361 93 5 16 68 0 0
11:15:00 35 320 89 7 22 66 0 0
11:20:00 22 275 92 5 15 65 0 0
11:25:00 22 282 96 9 27 67 0 0
These figures show hit rates on the buffer cache of about 90% for reads and 65% for writes.
Approximately 30KB of data (bread/s + bwrit/s) is being read from or written to disk per second.
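The hit-rate percentages in the sar -b report follow directly from the logical and physical transfer counters. A minimal sketch of the arithmetic (the function name is ours, not part of sar):

```python
def cache_hit_rate(logical_per_s, physical_per_s):
    """Percentage of logical transfers satisfied by the buffer cache.

    A logical read (lread/s) that misses the cache becomes a physical
    read (bread/s), so the hit rate is the fraction not sent to disk.
    """
    if logical_per_s == 0:
        return 0.0
    return 100.0 * (logical_per_s - physical_per_s) / logical_per_s

# First sample row from the sar -b report above:
read_hit = cache_hit_rate(361, 27)    # ~92.5%, in line with the reported 93
write_hit = cache_hit_rate(16, 5)     # 68.75%, in line with the reported 68
```

The small differences from the printed figures come from sar rounding its displayed counters.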
Disk performance is examined next using the statistics provided by sar -d:
08:00:00 device %busy avque r+w/s blks/s avwait avserv
11:10:00 Sdsk-0 0.91 3.70 2.37 13.15 12.42 4.60
Sdsk-1 25.01 1.62 11.39 55.21 3.26 5.30
11:15:00 Sdsk-0 0.57 2.58 1.37 6.98 13.05 8.26
Sdsk-1 24.10 1.43 10.93 50.42 3.11 7.23
11:20:00 Sdsk-0 0.81 2.42 1.98 11.01 9.55 6.72
Sdsk-1 21.77 1.85 6.05 39.11 4.54 5.37
11:25:00 Sdsk-0 0.76 3.90 2.00 9.52 14.18 4.89
Sdsk-1 20.24 2.07 5.83 34.87 10.60 9.91
These results show that the busiest disk (Sdsk-1) has acceptable performance with a reasonably short
request queue, acceptable busy values, and low wait and service times. The pattern of activity on the
root disk (Sdsk-0) is such that the request queue is longer since requests are tending to arrive in
bursts. There is no evidence that the system is disk I/O bound though it may be possible to improve
the interactive performance of some applications by increasing the buffer cache hit rates.
Questions
1. What is a multiuser system?
2. What is the idle figure in an I/O-bound multiuser system?
8.7 Summary
The Input/Output interface provides a method for transferring information between internal
storage and external I/O devices.
The Input/Output subsystem of a computer, referred to as I/O, provides an efficient mode of
communication between the central system and the outside environment.
A device driver is a program routine that links a peripheral device to an operating system of a
computer. It is essentially a software program that allows a user to employ a device, such as a
printer, monitor, or mouse.
An I/O instruction is stored in the memory of the computer and is fetched and executed by the
processor producing an I/O-related command for the I/O interface.
Data buffering is quite useful for smoothing out the gap between the speed of the processor and
the speed of the I/O devices. The data buffers are registers, which hold the I/O information temporarily.
The performance considerations present a number of issues regarding the implementation of
Linux-based solutions on the System platform.
8.8 Keywords
Block Device: It is one that stores information in fixed-size blocks, each one with its own address.
Character Device: It delivers or accepts a stream of characters, without regard to any block
structure.
Controller: It is a collection of electronics that can operate a port, a bus, or a device.
Data Buffers: These are registers that hold the I/O information temporarily.
I/O Interface: It is a bridge between the processor and I/O devices.
Page Fault: It is an exception that raises an interrupt.
9.0 Objectives
After studying this chapter, you will be able to:
Discuss the characteristics of disk drives
Explain the disk scheduling
Describe the disk management
Explain the disk reliability
Explain the swap space management
Discuss the stable storage implementation
9.1 Introduction
Modern disk drives are addressed as large one-dimensional arrays of logical blocks, where the logical
block is the smallest unit of transfer. The size of a logical block is usually 512 bytes, although some
disks can be low-level formatted to have a different logical block size, such as 1,024 bytes. The one-
dimensional array of logical blocks is mapped onto the sectors of the disk sequentially. Sector 0 is the
first sector of the first track on the outermost cylinder. The mapping proceeds in order through that
track, then through the rest of the tracks in that cylinder, and then through the rest of the cylinders
from outermost to innermost.
By using this mapping, we can—at least in theory—convert a logical block number into an old-style
disk address that consists of a cylinder number, a track number within that cylinder, and a sector
number within that track. In practice, it is difficult to perform this translation, for two reasons. First,
most disks have some defective sectors, but the mapping hides this by substituting spare sectors from
elsewhere on the disk. Second, the number of sectors per track is not a constant on some drives.
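Ignoring those two complications (no spared sectors, a constant number of sectors per track), the in-theory translation can be sketched as follows; the geometry figures in the example are illustrative only:

```python
def lba_to_chs(lba, heads, sectors_per_track):
    """Convert a logical block number to a (cylinder, head, sector)
    triple, assuming a constant number of sectors per track."""
    cylinder = lba // (heads * sectors_per_track)   # whole cylinders consumed
    head = (lba // sectors_per_track) % heads       # track within that cylinder
    sector = lba % sectors_per_track                # sector 0 .. spt-1 within the track
    return cylinder, head, sector

# Block 0 is the first sector of the first track on the outermost cylinder:
assert lba_to_chs(0, heads=4, sectors_per_track=32) == (0, 0, 0)
# One full track later, the mapping moves to the next head in the same cylinder:
assert lba_to_chs(32, heads=4, sectors_per_track=32) == (0, 1, 0)
```

On a real drive, zoning and sparing make this arithmetic invalid, which is exactly why the translation is left to the disk controller.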
Seeking
The speed of head movement, or seeking, is limited by the power available for the pivot motor
(halving the seek time requires quadrupling the power) and by the arm‘s stiffness. Accelerations of
30–40 g are required to achieve good seek times, and too flexible an arm can twist and bring the head
into contact with the platter surface. Smaller diameter disks have correspondingly reduced distances
for the head to move. These disks have smaller, lighter arms that are easier to stiffen against
flexing—all contributing to shorter seek times.
A seek is composed of:
A speedup, where the arm is accelerated until it reaches half of the seek distance or a fixed
maximum velocity,
A coast for long seeks, where the arm moves at its maximum velocity,
A slowdown, where the arm is brought to rest close to the desired track, and
A settle, where the disk controller adjusts the head to access the desired location.
Very short seeks (less than, say, two to four cylinders) are dominated by the settle time (1–3 ms). In
fact, a seek may not even occur; the head may just resettle into position on a new track. Short seeks
(less than 200–400 cylinders) spend almost all of their time in the constant acceleration phase, and
their time is proportional to the square root of the seek distance plus the settle time. Long seeks spend
most of their time moving at a constant speed, taking time that is proportional to distance plus a
constant overhead. As disks become smaller and track densities increase, the fraction of the total seek
time attributed to the settle phase increases.
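The behaviour described above is often summarized in a two-regime seek-time model: settle time plus a square-root term for short seeks, and a linear term for long ones. The following sketch uses purely illustrative coefficients, not those of any real drive:

```python
import math

def seek_time_ms(distance, short_limit=400,
                 settle=1.5, a=0.1, b=3.0, c=0.02):
    """Illustrative two-regime seek-time model in milliseconds.

    Short seeks: settle + a * sqrt(distance)   (acceleration-dominated)
    Long seeks:  b + c * distance              (coast-dominated)
    All coefficients are made up for illustration.
    """
    if distance == 0:
        return 0.0                       # no seek: the head may just resettle
    if distance < short_limit:
        return settle + a * math.sqrt(distance)
    return b + c * distance
```

A measured seek-time-versus-distance profile from the manufacturer would replace the constants here, which is why the text encourages including such profiles in disk specifications.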
"Average" seek times are commonly used as a figure of merit for disk drives, but they can be
misleading. Such averages are calculated in various ways, a situation further complicated by the fact
that independent seeks are rare in practice. Shorter seeks are much more common, although their
overall frequency is very much a function of the workload and the operating system driving the disk.
If disk requests are completely independent of one another, the average seek distance will be one
third of the full stroke. Thus, some sources quote the one-third-stroke seek time as the "average".
Others simply quote the full-stroke time divided by three. Another way is to sum the times needed to
perform one seek of each size and divide this sum by the number of different seek sizes. Perhaps the
best of the commonly used techniques is to weight the seek time by the number of possible seeks of
each size: Thus, there are N–1 different single-track seeks that can be done on a disk with N
cylinders, but only one full-stroke seek. This emphasizes the shorter seeks, providing a somewhat
better approximation to measured seek-distance profiles. What matters to people building models,
however, is the seek-time-versus-distance profile. We encourage manufacturers to include these in
their disk specifications, since the only alternative is to determine them experimentally.
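The weighting scheme described above can be written out directly: on a disk with N cylinders there are N-d possible seeks of distance d. The toy seek-time function in the example stands in for a measured profile:

```python
def weighted_average_seek(n_cylinders, seek_time):
    """Average seek time weighted by the number of possible seeks of
    each distance: there are N-d distinct seeks of distance d."""
    total_time = 0.0
    total_seeks = 0
    for d in range(1, n_cylinders):
        count = n_cylinders - d          # N-d seeks of this distance exist
        total_time += count * seek_time(d)
        total_seeks += count             # sums to N*(N-1)/2
    return total_time / total_seeks

# With a toy linear model t(d) = d on a 4-cylinder disk:
# (3*1 + 2*2 + 1*3) / 6 = 1.67, below the unweighted mean of seek sizes (2.0),
# because the weighting emphasizes the more numerous short seeks.
avg = weighted_average_seek(4, lambda d: d)
```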
Data Layout
A SCSI disk appears to its client computer as a linear vector of addressable blocks, each typically
256–1,024 bytes in size. These blocks must be mapped to physical sectors on the disk, which are the
fixed-size data-layout units on the platters. Separating the logical and physical views of the disk in
this way means that the disk can hide bad sectors and do some low-level performance optimizations,
but it complicates the task of higher level software that is trying to second-guess the controller (for
example, the 4.2 BSD UNIX fast file system).
Zoning
Tracks are longer at the outside of a platter than at the inside. To maximize storage capacity, linear
density should remain near the maximum that the drive can support; thus, the amount of data stored
on each track should scale with its length. This is accomplished on many disks by a technique called
zoning, where adjacent disk cylinders are grouped into zones. Zones near the outer edge have more
sectors per track than zones on the inside. There are typically 3–20 zones, and the number is likely to
double by the end of the decade. Since the data transfer rate is proportional to the rate at which the
media passes under the head, the outer zones have higher data transfer rates. For example, on a
Hewlett-Packard C2240 3.5-inch disk drive, the burst transfer rate (with no intertrack head switches)
varies from 3.1 MB per second at the inner zone to 5.3 MB per second at the outermost zone.
Track Skewing
Faster sequential access across track and cylinder boundaries is obtained by skewing logical sector
zero on each track by just the amount of time required to cope with the most likely worst-case head-
or track-switch times. This means that data can be read or written at nearly full media speed. Each
zone has its own track and cylinder skew factors.
Sparing
It is prohibitively expensive to manufacture perfect surfaces, so disks invariably have some flawed
sectors that cannot be used. Flaws are found through extensive testing during manufacturing, and a
list is built and recorded on the disk for the controller‘s use.
So that flawed sectors are not used, references to them are remapped to other portions of the disk.
This process, known as sparing, is done at the granularity of single sectors or whole tracks. The
simplest technique is to remap a bad sector or track to an alternate location. Alternatively, slip
sparing can be used, in which the logical block that would map to the bad sector and the ones after it
are ―slipped‖ by one sector or by a whole track. Many combinations of techniques are possible, so
disk drive designers must make a complex trade-off involving performance, expected bad-sector rate,
and space utilization. A concrete example is the HP C2240 disk drive, which uses both forms of
track-level sparing: slip-track sparing at disk format time and single-track remapping for defects
discovered during operation.
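Slip sparing at sector granularity can be sketched as follows; the defect list and block numbers here are hypothetical, and real controllers keep the list in a reserved area of the disk:

```python
def slip_map(logical_block, bad_sectors):
    """Map a logical block to a physical sector, slipping past each
    flawed sector recorded in the defect list: every logical block at
    or beyond a bad sector is pushed one sector further along."""
    physical = logical_block
    for bad in sorted(bad_sectors):
        if physical >= bad:
            physical += 1      # slip past this flawed sector
    return physical

# With sector 3 flawed, blocks 0-2 are unchanged and blocks 3-4 slip by one:
assert [slip_map(b, [3]) for b in range(5)] == [0, 1, 2, 4, 5]
```

The alternative of remapping only the bad sector to a spare keeps most blocks in place but turns a sequential read into a seek whenever the remapped sector is touched, which is the performance trade-off the text mentions.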
Bus Interface
The most important aspects of a disk drive‘s host channel are its topology, its transfer rate, and its
overhead. The SCSI is currently defined as a bus, although alternative versions are being discussed,
as are encapsulations of the higher levels of the SCSI protocol across other transmission media, such
as Fibre Channel. Most disk drives use the SCSI bus operation‘s synchronous mode, which can run at
the maximum bus speed. This was 5 MB per second with early SCSI buses; differential drivers and the
"fast SCSI" specification increased this to 10 MB per second a couple of years ago. Disks are now
appearing that can drive the bus at 20 MB per second ("fast, wide"), and the standard is defined up to
40 MB per second. The maximum
bus transfer rate is negotiated between the host computer SCSI interface and the disk drive. It appears
likely that some serial channel such as Fibre Channel will become a more popular transmission
medium at the higher speeds, partly because it would have fewer wires and requires a smaller
connector. Because SCSI is a bus, more than one device can be attached to it. SCSI initially
supported up to eight addresses, a figure recently doubled with the use of wide SCSI. As the number
of devices on the bus increases, contention for the bus can occur, leading to delays in executing data
transfers. This matters more if the disk drives are doing large transfers or if their controller overheads
are high. In addition to the time attributed to the transfer rate, the SCSI bus interfaces at the host and
disk also require time to establish connections and decipher commands. On SCSI, the cost of the
low-level protocol for acquiring control of the bus is on the order of a few microseconds if the bus is idle.
The SCSI protocol also allows a disk drive to disconnect from the bus and reconnect later once it has
data to transfer. This cycle may take 200 µs but allows other devices to access the bus while the
disconnected device processes data, resulting in a higher overall throughput.
In older channel architectures, there was no buffering in the disk drive itself. As a result, if the disk
was ready to transfer data to a host whose interface was not ready, then the disk had to wait an entire
revolution for the same data to come under the head again before it could retry the transfer. In SCSI,
the disk drive is expected to have a speed-matching buffer to avoid this delay, masking the
asynchrony between the bus and the mechanism. Since most SCSI drives take data off the media more
slowly than they can send it over the bus, the drive partially fills its buffer before attempting to
commence the bus data transfer. The amount of data read into the buffer before the transfer is
initiated is called the fence; its size is a property of the disk controller, although it can be specified
on modern SCSI disk drives by a control command. Write requests can cause the data transfer to the
disk‘s buffer to overlap the head repositioning, up to the limit permitted by the buffer‘s size. These
interactions are illustrated in Figure 9.2.
Caching of Requests
The functions of the speed-matching buffer in the disk drive can be readily extended to include some
form of caching for both reads and writes. Caches in disk drives tend to be relatively small (currently
64 kilobytes to 1 megabyte) because of space limitations and the relatively high cost of the
dual-ported static RAM needed to keep up with both the disk mechanism and the bus interface.
Read-ahead
A read that hits in the cache can be satisfied "immediately," that is, in just the time needed for the
controller to detect the hit and send the data back across the bus. This is usually much quicker than
seeking to the data and reading it off the disk, so most modern SCSI disks provide some form of read
caching. The most common form is read-ahead—actively retrieving and caching data that the disk
expects the host to request momentarily.
As we will show, read caching turns out to be very important when it comes to modelling a disk
drive, but it is one of the least well specified areas of disk system behaviour. For example, a read that
partially hits in the cache may be partially serviced by the cache (with only the non-cached portion
being read from disk), or it may simply bypass the cache altogether. Very large read requests may
always bypass the cache. Once a block has been read from the cache, some controllers discard it;
others keep it in case a subsequent read is directed to the same block.
Some early disk drives with caches did on-arrival read-ahead to minimize rotation latency for whole
track transfers; as soon as the head arrived at the relevant track, the drive started reading into its
cache. At the end of one revolution, the full track‘s worth of data had been read, and this could then
be sent to the host without waiting for the data after the logical start point to be reread. (This is
sometimes—rather unfortunately—called a "zero-latency read" and is also why disk cache memory is
often called a track buffer.) As tracks get longer but request sizes do not, on-arrival caching brings
less benefit; for example, with 8-Kbyte accesses to a disk with 32 KB tracks, the maximum benefit is
only 25% of a rotation time.
On-arrival caching has been largely supplanted by simple read-ahead, in which the disk continues to
read where the last host request left off. This proves to be optimal for sequential reads and allows
them to proceed at the full disk bandwidth. (Without read-ahead, two back-to-back reads would be
delayed by almost a full revolution because the disk and host processing time for initiating the second
read request would be larger than the inter-sector gap.) Even here there is a policy choice: Should the
read-ahead be aggressive, crossing track and cylinder boundaries, or should it stop when the end of
the track is reached? Aggressive read-ahead is optimal for sequential access, but it degrades random
accesses because head and track switches typically cannot be aborted once initiated, so an unrelated
request that arrives while the switch is in progress can be delayed.
Figure 9.2: Overlap of bus phases and mechanism activity. The low-level details of bus
arbitration and selection have been elided for simplicity.
A single read-ahead cache can provide effective support for only a single sequential read stream. If
two or more sequential read streams are interleaved, the result is no benefit at all. This can be
remedied by segmenting the cache so that several unrelated data items can be cached. For example, a
256 KB cache might be split into eight separate 32KB cache segments by appropriate configuration
commands to the disk controller.
Write Caching
In most disk drives, the cache is volatile, losing its contents if power to the drive is lost. To perform
write caching and prevent data loss, this kind of cache must be managed carefully. One technique is
immediate reporting, which the HP-UX file system uses to allow back-to-back writes for user data. It
allows selected writes to the disk to be reported as complete as soon as they are written into the
disk‘s cache. Individual writes can be flagged "must not be immediate-reported"; otherwise, a write
is immediately reported if it is the first write since a read or a sequential extension of the last write.
This technique optimizes a particularly common case—large writes that the file system has split into
consecutive blocks. To protect itself from power failures, the file system disables immediate
reporting on writes to metadata describing the disk layout. Combining immediate reporting with read
ahead means that sequential data can be written and read from adjacent disk blocks at the disk‘s full
throughput.
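The immediate-reporting rule just described can be expressed as a small predicate. The argument names below are ours; the real decision is made inside the file system and drive firmware:

```python
def report_immediately(write_start, last_op, last_write_end,
                       must_not_immediate=False):
    """Decide whether a write may be reported complete as soon as it
    enters the disk's cache: allowed unless flagged otherwise, and only
    if it is the first write since a read or a sequential extension
    (starting exactly where the last write ended)."""
    if must_not_immediate:
        return False               # e.g. a metadata write
    first_write_since_read = (last_op == "read")
    sequential_extension = (write_start == last_write_end)
    return first_write_since_read or sequential_extension
```

This captures why large writes split into consecutive blocks benefit: each piece starts where the previous one ended, so every piece after the first qualifies as a sequential extension.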
Volatile write-cache problems go away if the disk‘s cache memory can be made non-volatile. One
technique is battery-backed RAM, since a lithium cell can provide 10-year retention. Thus equipped,
the disk drive is free to accept all the write requests that will fit in its buffer and acknowledge them
all immediately. In addition to the reduced latency for write requests, two throughput benefits also
result:
(1) Data in a write buffer are often overwritten in place, reducing the amount of data that must be
written to the mechanism, and
(2) The large number of stored writes makes it possible for the controller to schedule them in
near-optimal fashion, so that each takes less time to perform. These issues are discussed in more detail
elsewhere.
As with read caching, there are several possible policies for handling write requests that hit data
previously written into the disk‘s cache. Without non-volatile memory, the safest solution is to delay
such writes until the first copy has been written to disk. Data in the write cache must also be scanned
for read hits; in this case, the buffered copy must be treated as primary, since the disk may not yet
have been written to.
Command Queuing
With SCSI, support for multiple outstanding requests at a time is provided through a mechanism
called command queuing. This allows the host to give the disk controller several requests and let the
controller determine the best execution order—subject to additional constraints provided by the host,
such as "do this one before any of the others you already have." Letting the disk drive perform the
sequencing gives it the potential to do a better job by using its detailed knowledge of the disk‘s
rotation position.
As you can see, SSTF is a marked improvement over simple FIFO. However, if a process requests
many nearby tracks it can dominate disk activity and greatly increase the latency for other processes
(whose tracks are more distantly located). This condition is known as starvation, because one process
is preventing the other processes from accessing the disk (starving them from disk access). This may
be optimal; however, it is not fair to the other processes. Fairness dictates that the latency be as
evenly spread among running processes as possible. So other algorithms have been developed to
prevent processes with high track locality from starving the other processes.
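For reference, SSTF itself can be sketched in a few lines; the track numbers in the example are arbitrary. Note how a stream of requests near the head would keep pushing a distant request to the back of the order, which is exactly the starvation problem described above:

```python
def sstf(head, requests):
    """Shortest Seek Time First: always service the pending request
    closest to the current head position."""
    pending = list(requests)
    order = []
    while pending:
        nearest = min(pending, key=lambda track: abs(track - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest             # the head is now at the serviced track
    return order

# Head at 50: 55 is nearest (5 away); from 55, 90 (35 away) beats 10 (45 away):
assert sstf(50, [10, 55, 90]) == [55, 90, 10]
```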
9.3.3 SCAN/LOOK
Starvation is a bad thing, so OS developers devised a scheduling algorithm based on the elevator
algorithm. The SCAN services tracks in only one direction (either increasing or decreasing track
number). When SCAN reaches the edge of the disk (or track 0), it reverses direction. The LOOK is
the obvious optimization of having the read/write head reversed when the last track in that direction
is serviced.
The LOOK behaves almost identically to SSTF, but avoids the starvation problem of SSTF. This is
because LOOK is biased against the area recently traversed, and heavily favours tracks clustered at
the outermost and innermost edges of the platter. The LOOK is also biased towards more recently
arriving jobs (on average).
9.3.4 C-LOOK
The C-LOOK (circular LOOK) is an effort to remove the bias in LOOK for track clusters at the edges
of the platter. The C-LOOK basically only scans in one direction. Either you sweep from the inside
out, or the outside in. When you reach the end, you just swing the head all the way back to the
beginning. This actually takes advantage of the fact that many drives can move the read/write head at
high speeds if you are moving across a large number of tracks (e.g. the seek time from the last track
to track 0 is smaller than you would expect and usually considerably less than the time it would take
to seek there one track at a time).
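A minimal C-LOOK sketch, assuming the sweep runs in the direction of increasing track numbers and that the swing back to the lowest pending track is a single fast seek:

```python
def c_look(head, requests):
    """C-LOOK: service tracks at or beyond the head in increasing
    order, then wrap around to the lowest pending track and continue
    sweeping in the same direction."""
    ahead = sorted(t for t in requests if t >= head)    # current sweep
    behind = sorted(t for t in requests if t < head)    # after the wrap
    return ahead + behind

# Head at 50: sweep up through 60 and 80, then wrap back to 10 and 20:
assert c_look(50, [10, 20, 60, 80]) == [60, 80, 10, 20]
```

Because every track is always approached from the same direction, no region of the platter is favoured, which is the bias C-LOOK was designed to remove.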
Even though the average number of tracks traversed is the same as LOOK in the worst case, N and F
LOOK are, in some sense, more fair than plain old LOOK. The subqueue system caps the maximum
latency a process can expect between a request and its being serviced (unlike SSTF, which can starve
processes for arbitrary lengths of time).
Most hard disks are low-level formatted at the factory as part of the manufacturing process. This
formatting enables the manufacturer to test the disk, and to initialize the mapping from logical block
numbers to defect-free sectors on the disk. For many hard disks, when the disk controller is instructed
to low-level format the disk, it can also be told how many bytes of data space to leave between the
header and trailer of all sectors. It is usually possible to choose among a few sizes, such as 256, 512,
and 1024 bytes. Formatting a disk with a larger sector size means that fewer sectors can fit on each
track, but that also means fewer headers and trailers are written on each track, and thus increases the
space available for user data. Some operating systems can handle only a sector size of 512 bytes.
To use a disk to hold files, the operating system still needs to record its own data structures on the
disk. It does so in two steps. The first step is to partition the disk into one or more groups of
cylinders. The operating system can treat each partition as though the latter were a separate disk.
For most computers, the bootstrap is stored in read-only memory (ROM). This location is convenient,
because ROM needs no initialization, and is at a fixed location that the processor can start executing
when powered up or reset. And, since ROM is read only, it cannot be infected by a computer virus.
The problem is that changing this bootstrap code requires changing the ROM hardware chips. For this
reason, most systems store a tiny bootstrap loader program in the boot ROM, whose only job is to
bring in a full bootstrap program from disk. The full bootstrap program can be changed easily: A new
version is simply written onto the disk. The full bootstrap program is stored in a partition called the
boot blocks, at a fixed location on the disk. A disk that has a boot partition is called a boot disk or
system disk.
The code in the boot ROM instructs the disk controller to read the boot blocks into memory (no
device drivers are loaded at this point), and then starts executing that code. The full bootstrap
program is more sophisticated than the bootstrap loader in the boot ROM, and is able to load the
entire operating system from a non-fixed location on disk, and to start the operating system running.
Even so, the full bootstrap code may be small. For example, MS-DOS uses one 512 byte block for its
boot program (see Figure 9.3).
Several improvements in disk-use techniques have been proposed. These methods involve the use of
multiple disks working cooperatively. To improve speed, disk striping (or interleaving) uses a group
of disks as one storage unit. Each data block is broken into several sub blocks, with one sub block
stored on each disk. The time required to transfer a block into memory improves dramatically,
because all the disks transfer their sub blocks in parallel. If the disks have their rotations
synchronized, the performance improves further, because all the disks become ready to transfer their
sub blocks at the same time, rather than waiting for the slowest rotational latency. The larger the
number of disks that are striped together, the larger the total transfer rate of the system.
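The sub-block layout can be sketched as follows; the block contents and disk count are arbitrary, and the sketch assumes the block size divides evenly by the number of disks:

```python
def stripe_block(block, n_disks):
    """Split one data block into n_disks equal sub-blocks, one per
    disk, so that all disks can transfer their parts in parallel."""
    sub_size = len(block) // n_disks
    return [block[i * sub_size:(i + 1) * sub_size]
            for i in range(n_disks)]

# A 12-byte block striped over 4 disks gives four 3-byte sub-blocks:
subs = stripe_block(b"ABCDEFGHIJKL", 4)
assert subs == [b"ABC", b"DEF", b"GHI", b"JKL"]
assert b"".join(subs) == b"ABCDEFGHIJKL"   # reassembly recovers the block
```

Since each disk transfers only 1/n of the block, the transfer time shrinks by roughly that factor, which is the speedup the paragraph describes.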
Caution
A partition cannot be made larger than the space available on the device.
Self Assessment Questions
1. The application of digital signal processing may soon increase channel speeds beyond their
current...............................per second.
(a) 90 MB (b) 100 MB
(c) 110 MB (d) 120 MB
2. The speed of head movement or..................... is limited by the power available for the pivot motor
and by the arm‘s stiffness.
(a) sparing (b) track skewing
(c) zoning (d) seeking
4....................... algorithm is based on the observation that seek times are lower for nearby tracks.
(a) FIFO (b) SSTF
(c) SCAN (d) LOOK
9.7 Stable Storage Implementation
The concept of a write-ahead log requires the availability of stable storage. By definition, information
residing in stable storage is never lost. To implement such storage, we need to replicate
the needed information on multiple storage devices with independent failure modes. We need to
coordinate the writing of updates in a way that guarantees that a failure during an update does not
leave all the copies in a damaged state, and that, when we are recovering from a failure, we can force
all copies to a consistent and correct value, even if there is another failure during the recovery.
A disk write results in one of three outcomes:
Successful completion: The data were written correctly on disk.
Partial failure: A failure occurred in the midst of transfer, so only some of the sectors were
written with the new data, and the sector being written during the failure may have been
corrupted.
Total failure: The failure occurred before the disk write started, so the previous data values on the
disk remain intact.
An output operation is executed as follows:
1. Write the information onto the first physical block.
2. When the first write completes successfully, write the same information onto the second physical
block.
3. Declare the operation complete only after the second write completes successfully.
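The three steps can be sketched as follows; the in-memory pair below stands in for two physical blocks on devices with independent failure modes:

```python
class StablePair:
    """Two replicated blocks written in the order described above:
    the second copy is written only after the first write completes,
    so a failure during an update damages at most one copy, and
    recovery can always restore a consistent value from the other."""
    def __init__(self, initial):
        self.first = initial
        self.second = initial

    def write(self, data):
        self.first = data       # step 1: write the first physical block
        # step 2: only after the first write completes successfully...
        self.second = data      # ...write the second physical block
        return True             # step 3: declare the operation complete

pair = StablePair(b"old")
pair.write(b"new")
assert pair.first == pair.second == b"new"
```

During recovery, if the two copies differ, the update was interrupted between the two writes, and the recovery procedure forces both copies back to a single consistent value.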
9.8 Summary
The track following system is used to perform a head switch.
The maximum bus transfer rate is negotiated between the host computer SCSI interface and the
disk drive.
The LOOK is the obvious optimization of having the read/write head reversed when the last track
in that direction is serviced.
The C-LOOK (circular LOOK) is an effort to remove the bias in LOOK for track clusters at the
edges of the platter.
The operating system is responsible for several other aspects of disk management.
The ECC processing is done automatically by the controller whenever a sector is read or written.
A small fraction of the disk space is used to hold parity blocks.
Swap space is used in various ways by different operating systems, depending on the
implemented memory management algorithms.
9.9 Keywords
Disk Drives: A disk drive is a device implementing such a storage mechanism with fixed or
removable media; with removable media the device is usually distinguished from the media as in
compact disc drive and the compact disc.
Error-correcting Code (ECC): ECC is used to verify data transmissions by locating and correcting
transmission errors. It is commonly used by RAM chips that include forward error correction, which
ensures all the data being sent to and from the RAM is transmitted correctly.
Redundant Array of Independent Disks (RAID): Redundant array of independent disks is a storage
technology that combines multiple disk drive components into a logical unit.
Shortest Seek Time First (SSTF): Shortest seek first (or shortest seek time first) is a secondary
storage scheduling algorithm to determine the motion of the disk‘s arm and head in servicing read
and write requests.
Small Computer System Interface (SCSI): Small computer system interface is a set of standards for
physically connecting and transferring data between computers and peripheral devices.
10.0 Objectives
After studying this chapter, you will be able to:
Explain the concepts of file
Discuss the directory structure
Define the file sharing
Explain the protection of file system
Discuss the file system in Linux
10.1 Introduction
File management, formerly known as data management, is the part of the operating system that
controls the storing and accessing of data by an application program. The data may be on internal
storage (for example, database), on external media (diskette, tape, printer), or on another system. File
management, then, provides the functions that an application uses in creating and accessing data on
the system and ensures the integrity of the data according to the definitions of the application. File
management provides functions that allow you to manage files (create, change, override, or delete)
using CL commands, and create and access data through a set of operations (for example, read, write,
open, or close). File management also provides with the capabilit y to access external devices and
control the use of their attributes for creating and accessing data. If you want to make more efficient
use of printers and diskette devices, file management provides the capability of spooling data for
input or output. For example, data being written to a printer can be held on an output queue until the
printer is available for printing. On the IBM AS/400 system, each file (also called a file object) has a
description that describes the file characteristics and how the data associated with the file is
organized into records, and, in many cases, the fields in the records. Whenever a file is processed, the
operating system (the Operating System/400 or OS/400 program) uses this description. You can
create and access data on the system by using these file objects. File management defines and
controls several different types of files. Each file type has associated CL commands to create and
change the file, and you can also create and access data through the operations provided by file
management.
File name
The symbolic file name is the only information kept in human-readable form. A file name helps users
differentiate between various files, and generally consists of a string of characters. The string of
characters before the "." is called the filename, and the part after it is called the file extension (or
file type), which differentiates between different types of files. We can have files with the same name
but different extensions, so we generally refer to a file by its name along with its extension; together
they form the complete file name.
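The split between a filename and its extension can be demonstrated with Python's standard library; the file name used here is the same illustrative name discussed under file types below.

```python
import os.path

# os.path.splitext divides a complete file name at the last "."
# and keeps the dot with the extension (file type).
name, ext = os.path.splitext("cs384report.doc")
print(name)  # cs384report
print(ext)   # .doc
```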
File type
A file type is required for systems that support different types of files. As discussed earlier, the file
type is a part of the complete file name. We might have two different files, say "cs384report.doc" and
"cs384report.txt". The file type is therefore an important attribute that helps differentiate
between files based on their types. File types indicate which application should be used to open a
particular file.
Location
This is a pointer to the device and location on that device of the file. As it is clear from the attribute
name, it specifies where the file is stored.
Size
Size attribute keeps track of the current size of a file in bytes, words or blocks.
Protection
Protection attribute of a file keeps track of the access-control information that controls who can do
reading, writing, executing, and so on.
Usage count
This value indicates the number of processes that are currently using (have opened) a particular file.
Time, Date, and Process Identification
This information may be kept for creation, last modification, and last use. Data provided by this
attribute is often helpful for protection and usage monitoring. Each process has its own identification
number which contains information about file hierarchy.
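Several of the attributes described above (size, protection bits, and timestamps) can be inspected on a real system through the operating system's stat interface; this is a minimal Python sketch, and the file name is hypothetical.

```python
import os
import time

# Create a small file so there is something to inspect.
path = "example.txt"  # hypothetical file name
with open(path, "w") as f:
    f.write("hello")

st = os.stat(path)
print(st.st_size)               # size attribute, in bytes -> 5
print(oct(st.st_mode & 0o777))  # protection (access-control) bits
print(time.ctime(st.st_mtime))  # time of last modification

os.remove(path)
```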
Creating a file
When creating a file, a space in the file system must be found for the file and then an entry for the
new file must be made in the directory. The directory entry records the name of the file and the
location in the file system.
Writing a file
To write a file, a system call is made specifying both the name of the file and the information to be
written to the file. Given the name of the file, the system searches the directory to find the location of
the file. The directory entry will need to store a pointer to the current block of the file (usually the
beginning of the file). Using this pointer, the address of the next block can be computed where the
information will be written. It is also important to make sure that the file is not overwritten in case of
an append operation, i.e. when we are adding a block of data at the end of an already existing file.
Reading a file
To read a file, a system call is made that specifies the name of the file and where (in memory) the
next block of the file should be put. Again, the directory is searched for the associated directory
entry, and the directory will need a pointer to the next block to be read. Once the block is read, the
pointer is updated.
Repositioning a file
When repositioning a file, the directory is searched for the appropriate entry, and the current file
position is set to a given value. This file operation is also called a file seek.
Truncating a file
The user may erase some contents of a file but keep its attributes. Rather than forcing the user to
delete the file and then recreate it, this operation allows all the attributes to remain unchanged,
except the file size.
Deleting a file
To delete a file, the directory is searched for the named file. Having found the associated directory
entry, the system releases the space allocated to the file (so it can be reused by other files) and
invalidates the directory entry.
The six operations described comprise only the minimal set of required file operations. More
commonly, we might also want to edit the file and modify its contents. A special case of editing a file
is appending new information at the end of the file. Copies of the file can also be created, and since
files are named objects, renaming an existing file may also be needed. If the file is in a binary object
format, we may also want to execute it. Also of use are facilities to lock sections of an open file for
multiprogramming access, to share sections, and even to map sections into memory on virtual-memory
systems. This last function allows a part of the virtual address space to be logically associated with a
section of a file. Reads and writes to that memory region are then treated as reads and writes to the
file.
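The six minimal operations can be sketched through Python's thin wrappers over the underlying system calls; the file name is illustrative.

```python
import os

path = "demo.dat"  # hypothetical file name

fd = os.open(path, os.O_CREAT | os.O_RDWR)  # create: allocate space, add directory entry
os.write(fd, b"hello world")                # write: store data at the write pointer
os.lseek(fd, 0, os.SEEK_SET)                # reposition: a "file seek" back to the start
data = os.read(fd, 5)                       # read: fetch the next block of the file
os.ftruncate(fd, 5)                         # truncate: keep attributes, shrink the size
os.close(fd)
os.remove(path)                             # delete: release space, invalidate the entry

print(data)  # b'hello'
```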
Caution
The write pointer must be updated after each write so that successive writes store a sequence of
blocks to the file.
Single-level directory
In a single-level directory system, all the files are placed in one directory. This is very common on
single-user operating systems. A single-level directory has significant limitations when the number of
files increases or when there is more than one user. Since all files are in the same directory, they must
have unique names. If two users give their data files the same name, the unique-name rule is
violated. Even with a single user, as the number of files increases, it becomes difficult to remember
the names of all the files in order to create only files with unique names. Figure 10.2 below shows the
structure of a single-level directory system.
Two-level directory
In the two-level directory system, the system maintains a master block that has one entry for each
user. This master block contains the addresses of the directories of the users. There are still problems
with the two-level directory structure. This structure effectively isolates one user from another. This is an
advantage when the users are completely independent, but a disadvantage when the users want to
cooperate on some task and access the files of other users. Some systems simply do not allow local files
to be accessed by other users (see Figure 10.3).
Tree-structured Directories
In the tree-structured directory, directories themselves are considered files. This leads to the
possibility of having sub-directories that can contain files and subdirectories. An interesting policy
decision in a tree-structured directory is how to handle the deletion of a directory. If a
directory is empty, its entry in its containing directory can simply be deleted. However, if the
directory to be deleted is not empty but contains several files or sub-directories, deletion becomes a bit
problematic. Some systems will not delete a directory unless it is empty. Thus, to delete a directory,
someone must first delete all the files in that directory. If there are any subdirectories, this procedure
must be applied recursively to them so that they can be deleted too. This approach may result in a
substantial amount of work. An alternative approach is simply to assume that when a request is made to
delete a directory, all of that directory's files and sub-directories are also to be deleted. A typical
tree-structured directory system is shown in Figure 10.4.
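The recursive-deletion policy just described can be sketched in a few lines of Python; the directory and file names are hypothetical.

```python
import os

def delete_tree(path):
    """Delete a directory even if non-empty: recurse into subdirectories
    first, delete the files, then remove the now-empty directory entry."""
    for name in os.listdir(path):
        entry = os.path.join(path, name)
        if os.path.isdir(entry):
            delete_tree(entry)   # apply the procedure recursively
        else:
            os.remove(entry)     # delete a regular file
    os.rmdir(path)               # the directory is now empty; remove its entry

# Usage sketch with a small tree.
os.makedirs("top/sub")
open("top/sub/file.txt", "w").close()
delete_tree("top")
print(os.path.exists("top"))  # False
```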
Acyclic-graph Directories
The acyclic-graph directory structure is an extension of the tree-structured directory structure. In the tree-
structured directory, files and directories starting from some fixed directory are owned by one
particular user. In the acyclic structure, this restriction is removed, and thus a directory or file under a
directory can be shared by several users. Figure 10.5 shows an acyclic-graph directory structure.
2. The part of the operating system that maps various files to the ………………is called the file
management system
(a). read only memory (b). physical devices
(c). random access memory (d). None of these.
4. A…………….is required for the systems that support different types of files.
(a). file type (b). data type (c). file management (d). None of these
Contiguous Allocation
The contiguous allocation method requires each file to occupy a set of contiguous addresses on the
disk. Disk addresses define a linear ordering on the disk. With this ordering, accessing block b+1
after block b normally requires no head movement. When head movement is needed (from the last
sector of one cylinder to the first sector of the next cylinder), it is only one track. Thus, the number
of disk seeks required for accessing contiguously allocated files is minimal. Contiguous allocation of a
file is defined by the disk address of the first block and the length (in blocks). If the file is n blocks long and
starts at location b, then it occupies blocks b, b+1, b+2, …, b+n-1. The directory entry for each file
indicates the address of the starting block and the length of the area allocated for this file.
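The block range occupied by a contiguously allocated file follows directly from the starting block and length; a one-line Python check:

```python
def contiguous_blocks(start, length):
    """A file starting at block b with length n occupies b, b+1, ..., b+n-1."""
    return list(range(start, start + length))

print(contiguous_blocks(14, 3))  # [14, 15, 16]
```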
Indexed Allocation
The indexed allocation method is a solution to the problems of both contiguous and linked
allocation. This is done by bringing all the pointers together into one location called the index block.
The index block occupies some space and could thus be considered an overhead of the method.
In indexed allocation, each file has its own index block, which is an array of disk-block addresses.
Indexed allocation supports direct access without suffering from external fragmentation. Any free
block anywhere on the disk may satisfy a request for more space.
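A minimal sketch of indexed allocation, with hypothetical class and block numbers: the per-file index block is just an array of disk-block addresses, any free block can satisfy a growth request, and the i-th block is reached directly through the index.

```python
class File:
    def __init__(self):
        self.index_block = []  # array of disk-block addresses for this file

free_blocks = {7, 12, 31, 48}  # hypothetical free-block list

def grow(file, free):
    """Any free block anywhere on the disk may satisfy the request."""
    block = free.pop()
    file.index_block.append(block)
    return block

def read_block(file, i):
    """Direct access: the i-th data block is found straight from the index."""
    return file.index_block[i]

f = File()
grow(f, free_blocks)
grow(f, free_blocks)
print(f.index_block)
```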
Sorts of Files
Most files are just files, called regular files; they contain normal data, for example text files,
executable files or programs, input for or output from a program and so on.
While it is reasonably safe to suppose that everything you encounter on a Linux system is a file, there
are some exceptions.
Directories: Files that are lists of other files.
Special Files: The mechanism used for input and output.
Links: A system to make a file or directory visible in multiple parts of the system‘s file tree.
(Domain) Sockets: A special file type, similar to TCP/IP sockets, providing inter-process
networking protected by the file system's access control.
Named Pipes: Act more or less like sockets and form a way for processes to communicate with
each other, without using network socket semantics.
Partition
Most people have a vague knowledge of what partitions are, since every operating system has the
ability to create or remove them. It may seem strange that Linux uses more than one partition on the
same disk, even when using the standard installation procedure, so some explanation is called for.
One of the goals of having different partitions is to achieve higher data security in case of disaster.
By dividing the hard disk into partitions, data can be grouped and separated. When an accident occurs,
only the data in the partition that got the hit will be damaged, while the data on the other partitions
will most likely survive. This principle dates from the days when Linux did not have journalled file
systems and power failures might have led to disaster. The use of partitions remains for security and
robustness reasons, so that a breach on one part of the system does not automatically mean that the whole
computer is in danger. This is currently the most important reason for partitioning.
A simple example: a user creates a script, a program or a web application that starts filling up the
disk. If the disk contains only one big partition, the entire system will stop functioning when the disk is
full. If the user stores the data on a separate partition, then only that (data) partition will be affected,
while the system partitions and possibly other data partitions keep functioning. Mind that having a
journalled file system only provides data security in case of power failure and sudden disconnection of
storage devices. It does not protect your data against bad blocks and logical errors in the file
system. In those cases, you should use a RAID (Redundant Array of Inexpensive Disks) solution.
Partition Layout and Types
There are two kinds of major partitions on a Linux system:
Data Partition: Normal Linux system data, including the root partition containing all the data to
start-up and run the system.
Swap Partition: Expansion of the computer‘s physical memory, extra memory on hard disk.
6. ……………….attribute keeps track of the current size of a file in bytes, words or blocks.
(a). Size (b). Location (c). Protection (d). None of these
7. The operating system provides systems calls to create, write, read, reposition, truncate and delete
files.
(a). True (b). False
9. The most common directory structures used by multi-user systems are………………
(a). single-level directory (b). two-level directory
(c). Both (a) and (b) (d). None of these
10. A…………….has significant limitations when the number of files increases or when there is more
than one user.
(a). single-level directory (b). two-level directory
(c). tree-structured directory (d). None of these
10.7 Summary
The file management, formerly known as data management, is the part of the operating system
that controls the storing and accessing of data by an application program.
In a single-level directory system, all the files are placed in one directory.
Using memory on a hard disk is naturally slower than using the real memory chips of a
computer.
The kernel is on a separate partition as well in many distributions, because it is the most
important file of the system.
One of the goals of having different partitions is to achieve higher data security in case of
disaster.
10.8 Keywords
Data Partition: The normal Linux system data, including the root partition containing all the data to
start up and run the system.
File Management: It provides functions that allow you to manage files (create, change, override, or
delete) using CL commands, and create and access data through a set of operations (for example,
read, write, open, or close).
Operating System: It provides systems calls to create, write, read, reposition, truncate and delete
files.
Swap Partition: It is expansion of the computer‘s physical memory, extra memory on hard disk.
Sub Directory: A directory under the root (/) directory is a subdirectory, which can be created and
renamed by the user.
1.0 Objectives
After studying this chapter, you will be able to:
Discuss the introduction to network
Define the computer networks
Explain the need and uses of computer network
Discuss the applications of network and criteria
1.1 Introduction
Today computers are available in many offices and homes, and therefore there is a need to share data and
programs among various computers. With the advancement of data communication facilities,
communication between computers has increased, extending the power of the computer beyond the
computer room. Now a user sitting at one place can communicate with computers at any remote site through a
communication channel. The aim of this chapter is to introduce you to the various aspects of computer networks.
In the world of computers, networking is the practice of linking two or more computing devices together for
the purpose of sharing data. Networks are built with a mix of computer hardware and computer software.
1.2 Network
A network comprises two or more computers that have been connected in order to enable them to
communicate with each other, and share resources and files.
1.2.1 Computer Networks
A computer network is an interconnection of various computer systems located at different places. In a computer
network, two or more computers are linked together with a medium and data communication devices for the
purpose of communicating data and sharing resources. The computer that provides resources to other
computers on a network is known as a server. The individual computers on the network, which access shared
network resources, are known as nodes.
Parts of a network
There are five basic components of a network: clients, servers, channels, interface devices and operating
systems.
Servers: Sometimes called host computers, servers are powerful computers that store data or applications and
connect to resources that are shared by the users of a network.
Clients: These computers are used by the users of the network to access the servers and shared resources (such
as hard disks and printers). These days, it is typical for a client to be a personal computer that the users also
use for their own non-network applications.
Channels: Also called the network circuit, the channel is the pathway over which information travels between the
different computers (clients and servers) that comprise the network.
Interface devices: These are hardware devices that connect clients and servers (and sometimes other networks)
to the channel. Examples include modems and network interface cards.
Operating systems: The network operating system is the software of the network. It serves a purpose
similar to that of the operating system in a stand-alone computer.
Transmission medium: This is the actual physical medium of the channel. Computer network channels use
either wire line or wireless media.
Wire line media: Also called guided media and line-based media. In networks that use wire line media, the
transmission of information takes place on a wire or cable. The three types of wire line media are twisted-pair
wire, coaxial cable and fibre-optic cable. (Try to find examples of each of these media, and their relative
speeds.) While twisted-pair and coaxial cable are more commonly used today, fibre-optic cables are
becoming increasingly popular.
Wireless media: Also called radiated media. As the name indicates, in networks that use wireless media, there
is no physical wire along which information travels; instead, information is transmitted through the air, from
one transmission station to the next. Networking examples include radio, cellular, microwave and satellite.
Broadcast TV and FM radio use wireless transmission as well (though the underlying engineering is a little
different).
Transmission rate or bandwidth: This property of a network channel describes how fast information can be
transmitted over the channel. It is measured in bits per second. People very commonly use the term bandwidth to
mean transmission rate.
Transmission directional capability: The direction in which information can be transmitted over a channel
depends on whether the channel is simplex, half-duplex or full-duplex.
Simplex: Information can be transmitted only in one direction.
Half-duplex: Information can be transmitted in both directions, but only in one direction at a time.
Full-duplex: Information can be transmitted in both directions simultaneously.
Signal type: There are two signal types: analog and digital. It is a little hard to understand the exact difference
without discussing a lot of electrical engineering and physics, so we will not go there. What you need to
take away is that:
Analog signals are 'continuous' (they take on a wide range of values) and digital signals are 'discrete',
and binary.
Digital signals are more 'natural' for computer networks, since, as we know, computers represent all
information in binary.
The reason we have to worry about analog signals is that the communication channels that
predated computer networks (like telephone lines, cable TV lines and radio transmitters) were all designed
to carry analog signals.
Communication and Access to Information: The primary purpose of computer networking is to facilitate
communication. A network allows a user to instantly connect with another user, or network, and send and
receive data. It allows remote users to connect with one another via videoconferencing, virtual meetings and
digital emails. Computer networks provide access to online libraries, journals, electronic newspapers, chat
rooms, social networking Websites, email clients and the World Wide Web. Users can benefit from making
online bookings for theatres, restaurants, hotels, trains and airplanes. They can shop and carry out banking
transactions from the comfort of their homes.
Computer networks allow users to access interactive entertainment channels, such as video on demand,
interactive films, interactive and live television, multi-person real-time games and virtual-reality models.
Resource Sharing: Computer networks allow users to share files and resources. They are popularly used in
organizations to cut costs and streamline resource sharing. A single printer attached to a small local area
network (LAN) can effectively service the printing requests of all computer users on the same network. Users
can similarly share other network hardware devices, such as modems, fax machines, hard drives and
removable storage drives.
Networks allow users to share software applications, programs and files. They can share documents (such as
invoices, spreadsheets and memos), word processing software, videos, photographs, audio files, project
tracking software and other similar programs. Users can also access, retrieve and save data on the hard drive of
the main network server.
Centralized Support and Administration: Computer networking centralizes support, administration and
network support tasks. Technical personnel manage all the nodes of the network, provide assistance, and
troubleshoot network hardware and software errors. Network administrators ensure data integrity and devise
systems to maintain the reliability of information through the network. They are responsible for providing
high-end antivirus, anti-spyware and firewall software to the network users. Unlike a stand-alone system, a
networked computer is fully managed and administered by a centralized server, which accepts all user requests
and services them as required.
Performance
Performance can be measured in many ways, including transit time and response time. Transit time is the
amount of time required for a message to travel from one device to another. Response time is the elapsed time
between an inquiry and a response. The performance of a network depends on a number of factors, including
the number of users, the type of transmission medium, the capabilities of the connected hardware, and the
efficiency of the software.
Number of users: Having a large number of concurrent users can slow response time in a network not
designed to coordinate heavy traffic loads. The design of a given network is based on an assessment of the
average number of users that will be communicating at any one time. In peak load periods, however, the
actual number of users can exceed the average and thereby decrease performance. How a network
responds to loading is a measure of its performance.
Type of transmission medium: The medium defines the speed at which data can travel through a
connection (the data rate). Today‘s networks are moving to faster and faster transmission media, such as
fiber-optic cabling. A medium that can carry data at 100 megabits per second is 10 times faster
than a medium that can carry data at only 10 megabits per second. However, the speed of light imposes an
upper bound on the data rate.
Hardware: The types of hardware included in a network affect both the speed and capacity of
transmission. A higher-speed computer with greater storage capacity provides better performance.
Software: The software used to process data at the sender, receiver, and intermediate nodes also affects
network performance. Moving a message from node to node through a network requires processing to
transform the raw data into transmittable signals, to route these signals to the proper destination, to ensure
error-free delivery, and to recast the signals into a form the receiver can use. The software that provides
these services affects both the speed and the reliability of a network link. Well-designed software can
speed the process and make transmission more effective and efficient.
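The factor-of-10 claim for transmission media above can be checked with a quick calculation; the message size used here is illustrative.

```python
def transfer_time_seconds(size_bits, rate_bps):
    """Time to move a message over a channel = message size / data rate."""
    return size_bits / rate_bps

size = 8_000_000  # a 1-megabyte message, expressed in bits
print(transfer_time_seconds(size, 10_000_000))   # 10 Mbps medium  -> 0.8 s
print(transfer_time_seconds(size, 100_000_000))  # 100 Mbps medium -> 0.08 s
```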
Reliability: In addition to accuracy of delivery, network reliability is measured by frequency of failure, the
time it takes a link to recover from a failure, and the network‘s robustness in a catastrophe.
Frequency of failure: All networks fail occasionally. A network that fails often, however, is of little value to a
user.
Recovery time of a network after a failure: A network that recovers quickly is more useful than one that does
not.
Security: Network security issues include protecting data from unauthorized access and viruses.
Unauthorized access: For a network to be useful, sensitive data must be protected from unauthorized
access. Protection can be accomplished at a number of levels. At the lowest level are user identification
codes and passwords. At a higher level are encryption techniques. In these mechanisms, data are
systematically altered in such a way that if they are intercepted by an unauthorized user, they will be
unintelligible.
Viruses: Because a network is accessible from many points, it can be susceptible to computer viruses. A
virus is an illicitly introduced code that damages the system. A good network is protected from viruses by
hardware and software designed specifically for that purpose.
Caution
Networks must be protected from catastrophic events such as fire, earthquake, or theft. One protection against
unforeseen damage is a reliable system to back up network software.
Network hub: The central connection point for network cables that connect to computers or other devices on a
network. The hub has several network cable jacks or ports that you use to connect network cables to
computers. The hub contains circuitry that enables each computer to communicate with any other computer
connected to the hub (see Figure 1.3).
All the networking hardware described here is known as Ethernet. Ethernet is the industry-wide standard for
computer networks. Standard Ethernet networks transmit data at 10 million bits per second (Mbps). A newer
Ethernet standard, called Fast Ethernet, transmits data at 100 Mbps. Computer networks often contain a
mixture of 10 Mbps and 100 Mbps devices.
Suppose you want to network a few computers together in a small area where it would be expensive to have
network cabling installed in an existing building. Or perhaps you just have a desktop computer and a notebook
computer at home and you would like to be able to roam the house with the notebook computer and perhaps
even browse the Web from the hammock in the back yard. Wireless Ethernet makes all this possible. You can
install wireless adapters in each computer and form a wireless network (see Figure 1.6).
Caution
While protecting the network, sensitive data must be protected from unauthorized access.
Topology: The geometric arrangement of a computer network. Common topologies include bus, star, and ring.
Protocol
The protocol defines a common set of rules and signals that computers on the network use to communicate.
One of the most popular protocols for LANs is called Ethernet. Another popular LAN protocol for PCs is the
IBM token-ring network.
Architecture
Networks can be broadly classified as using either peer-to-peer or client/server architecture. Computers on a
network are sometimes called nodes. Computers and devices that allocate resources for a network are called
servers.
The types of networks can be further classified into two more divisions:
The term middleware is used to describe separate products that serve as the glue between two applications. It
is, therefore, distinct from import and export features that may be built into one of the applications.
Middleware is sometimes called plumbing because it connects two sides of an application and passes data
between them. Common middleware categories include:
TP monitors
DCE environments
RPC systems
Object Request Brokers (ORBs)
Database access systems
Message Passing
Audio/Video Servers
Audio/video servers bring multimedia capabilities to Web sites by enabling them to broadcast streaming
multimedia content. Streaming is a technique for transferring data such that it can be processed as a steady and
continuous stream. Streaming technologies are becoming increasingly important with the growth of the
Internet because most users do not have fast enough access to download large multimedia files quickly. With
streaming, the client browser or plug-in can start displaying the data before the entire file has been
transmitted.
For streaming to work, the client side receiving the data must be able to collect the data and send it as a steady
stream to the application that is processing the data and converting it to sound or pictures. This means that if
the streaming client receives the data more quickly than required, it needs to save the excess data in a
buffer. If the data does not come quickly enough, however, the presentation of the data will not be smooth.
There are a number of competing streaming technologies emerging. For audio data on the Internet, the de facto
standard is Progressive Network‘s Real Audio.
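The buffering behaviour described above can be sketched with a simple queue; the function names and frame data are illustrative.

```python
from collections import deque

buffer = deque()  # holds excess data arriving faster than it is played

def receive(chunk):
    """Incoming data is saved in the buffer rather than discarded."""
    buffer.append(chunk)

def play_next():
    """Drain the buffer at a steady rate; an empty buffer means
    the presentation of the data will not be smooth."""
    if buffer:
        return buffer.popleft()
    return None  # underrun: data did not arrive quickly enough

receive(b"frame1")
receive(b"frame2")
print(play_next())  # b'frame1'
print(play_next())  # b'frame2'
print(play_next())  # None
```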
Chat Servers
Chat servers enable a large number of users to exchange information in an environment similar to Internet
newsgroups that offer real-time discussion capabilities. Real time means occurring immediately. The term is
used to describe a number of different computer features. For example, real-time operating systems are
systems that respond to input immediately. They are used for such tasks as navigation, in which the computer
must react to a steady flow of new information without interruption. Most general-purpose operating systems
are not real-time because they can take a few seconds, or even minutes, to react.
Real time can also refer to events simulated by a computer at the same speed that they would occur in real life.
In graphics animation, for example, a real-time program would display objects moving across the screen at the
same speed that they would actually move.
Fax Servers
A fax server is an ideal solution for organizations looking to reduce incoming and outgoing telephone
resources but that need to fax actual documents.
FTP Servers
One of the oldest of the Internet services, File Transfer Protocol makes it possible to move one or more files
securely between computers while providing file security and organization as well as transfer control.
Groupware Servers
A groupware server is software designed to enable users to collaborate, regardless of location, via the Internet
or a corporate intranet, and to work together in a virtual atmosphere.
IRC Servers
An option for those seeking real-time capabilities, Internet Relay Chat consists of various separate networks
(or ―nets‖) of servers that allow users to connect to each other via an IRC network.
List Servers
List servers offer a way to better manage mailing lists, whether they are interactive discussions open to the
public or one-way lists that deliver announcements, newsletters, or advertising.
Mail Servers
Almost as ubiquitous and crucial as Web servers, mail servers move and store mail over corporate networks
via LANs and WANs and across the Internet.
News Servers
News servers act as a distribution and delivery source for the thousands of public news groups currently
accessible over the USENET news network. USENET is a worldwide bulletin board system that can be
accessed through the Internet or through many online services. The USENET contains more than 14,000
forums called newsgroups that cover every imaginable interest group. It is used daily by millions of people
around the world.
Proxy Servers
Proxy servers sit between a client program (typically a Web browser) and an external server (typically another
server on the Web) to filter requests, improve performance, and share connections.
Telnet Servers
A Telnet server enables users to log on to a host computer and perform tasks as if they were working on the
remote computer itself.
Web Servers
At its core, a Web server serves static content to a Web browser by loading a file from a disk and serving it
across the network to a user's Web browser. This entire exchange is mediated by the browser and server
talking to each other using HTTP.
3……..the central connection point for network cables that connect to computers or other devices on a
network.
(a). Network (b). Network hub
(c). Network adapter cards (d). None of these
4…………expansion cards that provide the physical connection between each computer and the network.
(a). Network cards (b). Pen cards
(c). Network adapter cards (d). None of these
5. ………are more 'natural' for computer networks, since, as we know, computers represent all information in
binary.
(a). Analog signals (b). Network signals
(c). Digital signals (d). None of these
1.9 Summary
The modern form of communication like e-mail and Internet is possible only because of computer
networking.
Data Routing is the process of finding the most efficient route between source and destination before
sending the data.
In simplex mode the communication takes place in one direction. The receiver receives the signal from the
transmitting device.
In half-duplex mode the communication channel is used in both directions, but only in one direction at a
time. Thus a half-duplex line can alternately send and receive data.
The computer that provides resources to other computers on a network is known as server.
In the network the individual computers, which access shared network resources, are known as nodes.
1.10 Keywords
Communication Satellite: The problem of line-sight and repeaters are overcome by using satellites which are
the most widely used data transmission media in modern days.
Data sequencing: A long message to be transmitted is broken into smaller packets of fixed size for error free
data transmission.
Internet: The newest type of network to be used within an organisation is an internet or Internet Web. Such
networks enable computers (or network) of any type to communicate easily.
Transmission: Communication of data achieved by the processing of signals.
Teleconferencing: It refers to electronic meetings that involve people who are at physically different sites.
Telecommunication technology allows participants to interact with one another without travelling to the same
location.
1.11 Review Questions
1. What is the communication model and what are its tasks?
2. What are the needs of computer networks?
3. Differentiate between half-duplex and full-duplex.
4. What are the uses of computer networks?
5. What are the applications of computer networks?
6. Differentiate between LAN and WAN.
7. How many types of servers and networks are there?
8. What are networking software and hardware?
9. What are the network criteria?
10. What are hubs and network cables?
2.0 Objectives
After studying this chapter, you will be able to:
Explain the Base Band, Broadband
Discuss the Analog and Digital Data
Define the transmission impairment
Discuss the Shannon capacity
2.1 Introduction
Data transmission is the transfer of data from point-to-point often represented as an electro-magnetic signal
over a physical point-to-point or point-to-multipoint communication channel. Examples of such channels are
copper wires, optical fibers, wireless communication channels, and storage media. The term usually refers to
digital communications (i.e. digital bit stream), but may include analog data transmission as well.
Data transmission is a subset of the field of data communications, which also includes computer networking or
computer communication applications and networking protocols, for example routing and switching.
Data transmission is the sending and receiving of data via cables (e.g., telephone lines or fiber optics) or wireless relay
systems. Because ordinary telephone circuits pass signals that fall within the frequency range of voice
communication (about 300–3,500 hertz), the high frequencies associated with data transmission suffer a loss of
amplitude and transmission speed. Digital computers therefore use a modem to transform outgoing digital electronic
data into analog signals; a similar system at the receiving end translates the incoming signal back to the original electronic data.
Specialized data-transmission links carry signals at frequencies higher than those used by the public telephone
network.
The purpose of a network is to transmit information from one computer to another. To do this, you first have to
decide how to encode the data to be sent, in other words its computer representation. This will differ according
to the type of data, which could be:
Audio data
Text data
Graphical data
Video data
Data representation can be divided into two categories:
Digital representation: which means that the information is encoded as a set of binary values, in other
words a sequence of 0s and 1s
Analogue representation: which means that the data will be represented by the variation in a
continuous physical quantity
Broadcast Networks
A broadcast network has a single communication channel that is shared by all the machines on the network.
All the machines on the network send and receive messages, called packets. Each packet contains an address
field that specifies the intended recipient. Upon receiving a packet, a machine checks the address field. If the
packet is intended for itself, it processes the packet; if not, the packet is simply ignored. A packet transmitted
by one machine is received by all the machines on the network; this mode of operation is known as Broadcast
Mode. Some broadcast systems also support transmission to a subset of machines, something known as
Multicasting.
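The address check described above can be sketched in a few lines of Python. This is a toy model, not a real network stack; the addresses, the packet layout, and the broadcast address "FF" are invented for illustration:

```python
BROADCAST = "FF"  # hypothetical all-stations address

def handle_packet(my_address, packet):
    """Every machine on a broadcast network sees every packet, but only
    processes those addressed to it (or to all stations)."""
    if packet["dest"] in (my_address, BROADCAST):
        return f"processed payload: {packet['payload']}"
    return "ignored"  # not addressed to us: drop silently

packet = {"dest": "87", "payload": "hello"}
print(handle_packet("87", packet))  # the addressed station processes it
print(handle_packet("10", packet))  # every other station ignores it
```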
Point-to-point networks
This type of network consists of many connections between individual pairs of machines. To go from the
source to the destination, a packet of information on this type of network may have to first visit one or more
intermediate machines. Often multiple routes, of different length are possible, so routing algorithms play an
important role in point-to-point networks. A network based on point-to-point communication is shown in Fig
2.
Digital data take on discrete values. For example, data are stored in computer memory in the form of 0s and
1s. They can be converted to digital signal or modulated into an analog signal for transmission across a
medium. Eg., Text and Character Strings. A number of codes have been devised by which characters are
represented by a sequence of bits. The most commonly used text code nowadays is the International Reference
Alphabet (IRA). The U.S. national version of IRA is referred to as the American Standard Code for
Information Interchange (ASCII).
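As a small illustration of such character coding, the Python sketch below renders a string as 7-bit ASCII code words (the helper name is our own):

```python
def to_bits(text):
    """Render each character as its 7-bit ASCII code word."""
    return " ".join(format(ord(ch), "07b") for ch in text)

# 'H' is code point 72, 'i' is 105
print(to_bits("Hi"))  # 1001000 1101001
```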
Suppose a manager has to write several letters to various clients. First he has to use his PC and a word
processing package to prepare the letters. If his PC is connected to all the clients' PCs through a network, he
can send the letters to all the clients within minutes. Thus, irrespective of geographical areas, if PCs are
connected through communication channel, the data and information, computer files and any other programs
can be transmitted to other computer systems within seconds. The modern form of communication like e-mail
and Internet is possible only because of computer networking.
In data communication four basic terms are frequently used. They are:
Data: A collection of facts in raw forms that become information after processing.
Signals: Electric or electromagnetic encoding of data.
Signalling: Propagation of signals across a communication medium.
Transmission: Communication of data achieved by the processing of signals.
Figure. 3 Simplex
2. Half-duplex: In half-duplex mode the communication channel is used in both directions, but only in one
direction at a time. Thus a half-duplex line can alternately send and receive data.
As the signal travels along the network cable, it gradually decreases in strength and can become distorted. If
the cable length is too long, the received signal can be unrecognizable or misinterpreted.
As a safeguard, baseband systems sometimes use repeaters to receive incoming signals and retransmit them at
their original strength and definition.
This increases the practical length of a cable.
If sufficient total bandwidth is available, multiple analog transmission systems, such as cable television and
network transmissions, can be supported simultaneously on the same cable.
Each transmission system is allocated a part of the total bandwidth. All devices associated with a given
transmission system, such as all computers using a LAN cable, must then be tuned so that they use only the
frequencies that are within the allocated range.
While baseband systems use repeaters, broadband systems use amplifiers to regenerate analog signals at their
original strength.
In broadband transmission, signals flow in one direction only, so there must be two paths for data flow in order
for a signal to reach all devices. There are two common ways to do this:
1. In mid-split broadband configuration, the bandwidth is divided into two channels, each using a
different frequency or range of frequencies. One channel transmits signals; the other receives signals.
2. In dual-cable broadband configuration, each device is attached to two cables. One cable is used to send, and
the other is used to receive.
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
Exercise: Check Your Progress 2
Case Study-History of Network Transmission
AT&T built its original long distance network from copper wire strung on telephone poles. Telephone calls
travelled down these wires as analog signals, electrical waves of analogous form to the original voices. Each
call required two wires to form a complete electrical circuit. Telephone signals weaken from electrical
resistance as they travel down the wires. But thicker wires have lower resistance, so wires as thick as 1/6 inch
were used on circuits such as the original New York-Chicago line of 1892.
After 1904, properly spaced loading coils, which effectively reduce the resistance in the line, allowed for
longer lines and thinner wires. Vacuum-tube repeaters, introduced in 1914, made it possible to amplify or
strengthen the signals, allowing for still longer lines and still thinner wires. An electrical trick called a
"phantom circuit" allowed two pairs of wires to carry three calls.
Demand grows
In the 1910s, AT&T developed several new technologies to meet the growing demand on major routes.
Underground cables carried more wires in less space and provided protection from the weather. AT&T
installed its first underground cable between Philadelphia and Washington, D.C., in 1912. Carrier-current
systems sent several calls down a single pair of wires by superimposing the calls on higher frequency currents,
rather than transmitting the signals on their natural voice frequencies. AT&T installed its first carrier system
between Baltimore and Pittsburgh in 1918. That system carried four calls on a single pair of wires.
Broadband begins
Still higher calling volume, and the beginning of work on television, fueled AT&T's invention of the first
broadband transmission medium, broadband copper coaxial cable. AT&T installed its first experimental
coaxial cable between New York and Philadelphia in 1936. The first "regular" installation connected
Minneapolis, Minn., and Stevens Point, Wis., in 1941. This L1 coaxial-cable system could carry 480 telephone
conversations or one television program. Subsequent coaxial-cable systems had much higher capacity. The L5
systems of the 1970s could carry 132,000 calls or more than 200 television programs.
Microwave radio relay
Coaxial-cable systems developed in tandem with microwave radio relay, a broadband system by which
conversations and television travelled via radio along a series of towers. The first such system, with seven
towers on seven hilltops, connected New York and Boston in 1947. This system carried 2,400 conversations;
later systems carried as many as 19,200.
Microwave relay had lower construction and maintenance costs than coaxial cable, particularly across difficult
terrain. By the 1970s, radio-relay systems carried 70% of AT&T's voice and 95% of its television traffic.
Fiber-optic systems
In the 1980s, both coaxial cable and microwave relay gave way to an entirely new system - fiber-optics. Fiber-
optic systems use rapid pulses of light traveling on fibers of ultra-pure glass. It was a digital rather than an
analog medium, and particularly well suited for transmitting data as well as voice.
Glass fibers, as thin as a human hair, make up the highways for modern fiber optic communication systems.
AT&T installed its first fiber-optic route between Washington, D.C., and New York in 1983. In 1989, AT&T
announced that it would retire all its analog transmission facilities. Within a few years, the analog coaxial-
cable and radio-relay systems were relegated to back-up duty. Meanwhile, continuing advances in fiber-optic
technology greatly increased the capacity of the new systems, a process that continues today.
Questions
1. Discuss the brief history of Network Transmission.
2. Write the history of broadband.
2.8 Summary
Signals travel from transmitter to receiver via a path. This path, called the medium, can be guided or unguided.
A guided medium is contained within physical boundaries, while an unguided medium is boundless.
Radio waves are used to transmit data. These unguided waves are usually propagated through the air.
Fiber-optic cables are composed of a glass or plastic inner core surrounded by cladding, all encased in an
outside jacket.
Satellite communication uses a satellite in geosynchronous orbit to relay signals. A system of three
correctly spaced satellites covers most of the earth.
The Shannon capacity is a formula to determine the theoretical maximum data rate for a channel.
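The Shannon capacity referred to above is C = B log2(1 + S/N), where B is the bandwidth in hertz and S/N is the signal-to-noise ratio as a plain ratio (not in dB). A quick sketch with illustrative figures:

```python
import math

def shannon_capacity(bandwidth_hz, snr):
    """Theoretical maximum data rate in bits per second:
    C = B * log2(1 + S/N), with SNR given as a plain ratio."""
    return bandwidth_hz * math.log2(1 + snr)

# A ~3,000 Hz telephone channel with an SNR of 3162 (about 35 dB)
# has a theoretical limit of roughly 35 kbps.
print(round(shannon_capacity(3000, 3162)))
```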
2.9 Keywords
Cellular telephony: Cellular telephony is moving fast toward integrating the existing system with satellite
communication.
Guided media: It provides a conduit from one device to another; include twisted-pair cable, coaxial cable, and
fiber-optic cable. A signal travelling along any of these media is directed and contained by the physical limits
of the medium.
Optical fiber: Optical fiber is a glass or plastic cable that accepts and transports signals in the form of light.
Reflection: When the angle of incidence becomes greater than the critical angle, a new phenomenon occurs,
called reflection.
Satellite transmission: Satellite transmission is much like line-of-sight microwave transmission in which one
of the stations is a satellite orbiting the earth.
3.0 Objectives
After studying this chapter, you will be able to:
Explain the open system interconnection model
Discuss the functions of the ISO/OSI layers
3.1 Introduction
An ISO standard that covers all aspects of network communications is the Open Systems Interconnection
(OSI) model. An open system is a model that allows any two different systems to communicate regardless of
their underlying architecture. Vendor-specific protocols close off communication between unrelated systems.
The purpose of the OSI model is to open communication between different systems without requiring changes
to the logic of the underlying hardware and software. The OSI model is not a protocol; it is a model for
understanding and designing a network architecture that is flexible, robust, and interoperable.
Upon reaching its destination, the signal passes into layer 1 and is transformed back into bits. The data units
then move back up through the OSI layers. As each block of data reaches the next higher layer, the headers
and trailers attached to it at the corresponding sending layer are removed, and actions appropriate to that layer
are taken. By the time it reaches layer 7, the message is again in a form appropriate to the application and is
made available to the recipient.
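The journey described above can be modeled with string prefixes standing in for headers. This is a toy sketch (real headers are binary and layer-specific), using the six header-adding layers below the application data:

```python
LAYERS = ["application", "presentation", "session",
          "transport", "network", "data-link"]

def send(message):
    """Encapsulation: each layer wraps the unit it receives in its own header."""
    for layer in LAYERS:
        message = f"[{layer}]{message}"
    return message

def receive(wire):
    """De-encapsulation: each layer strips the header its peer attached."""
    for layer in reversed(LAYERS):  # outermost header first
        wire = wire.removeprefix(f"[{layer}]")
    return wire

wire = send("hello")
print(wire)           # the data-link header ends up outermost
print(receive(wire))  # hello
```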
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
Caution
While synchronization of bits, bits must be encoded into signals-electrical or optical.
Example:
In Figure 3.6 a node with physical address 10 sends a frame to a node with physical address 87. The two nodes
are connected by a link. At the data link level this frame contains physical (link) addresses in the header. These
are the only addresses needed. The rest of the header contains other information needed at this level. The
trailer usually contains extra bits needed for error detection.
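The error-detection bits in the trailer can be illustrated with a CRC-32 checksum, a common error-detecting code; the frame layout here is invented for the example:

```python
import zlib

def make_frame(src, dest, payload):
    """Toy data-link frame: physical addresses in the header,
    a CRC-32 checksum in the trailer for error detection."""
    body = f"{dest}|{src}|{payload}"
    return body + "|" + format(zlib.crc32(body.encode()), "08x")

def check_frame(frame):
    """Recompute the checksum and compare it with the trailer."""
    body, _, trailer = frame.rpartition("|")
    return format(zlib.crc32(body.encode()), "08x") == trailer

frame = make_frame("10", "87", "T2")
print(check_frame(frame))                      # True: frame intact
print(check_frame(frame.replace("T2", "T3")))  # False: payload corrupted
```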
Routing: When independent networks or links are connected together to create an internetwork (a network of
networks) or a large network, the connecting devices (called routers or gateways) route the packets to their
final destination. One of the functions of the network layer is to provide this mechanism.
Example:
Now imagine that in Figure 3.8 we want to send data from a node with network address A and physical address
10, located on one local area network, to a node with network address P and physical address 95, located
on another local area network. Because the two devices are located on different networks, we cannot use
physical addresses only; the physical addresses have only local jurisdiction. What we need here are universal
addresses that can pass through the boundaries of local area networks. The network (logical) addresses have
this characteristic. The packet at the network layer contains the logical addresses, which remain the same from
the original source to the final destination. They will not change when we go from network to network.
However, the physical addresses will change when the packet moves from one network to another. The box
with the R is a router (internetwork device).
For added security, the transport layer may create a connection between the two end ports. A connection is a
single logical path between the source and destination that is associated with all packets in a message. Creating
a connection involves three steps: connection establishment, data transfer, and connection release. By
confining transmission of all packets to a single pathway, the transport layer has more control over sequencing,
flow, and error detection and correction.
Specific responsibilities of the transport layer include the following:
Service-point addressing: Computers often run several programs at the same time. For this reason, source-to-
destination delivery means delivery not only from one computer to the next but also from a specific process
(running program) on one computer to a specific process (running program) on the other. The transport layer
header therefore must include a type of address called a service-point address (or port address). The network
layer gets each packet to the correct computer; the transport layer gets the entire message to the correct process
on that computer.
Example:
Figure 3.10 shows an example of a transport layer. Data coming from the upper layers have service-point
(port) addresses j and k (j is the address of the sending application and k is the address of the receiving
application). Since the data size is larger than the network layer can handle, the data are split into two packets,
each packet retaining the service-point addresses (j and k). Then in the network layer, network addresses (A
and P) are added to each packet. The packets may travel on different paths and arrive at the destination either
in order or out of order. The two packets are delivered to the destination network layer, which is responsible
for removing the network layer headers. The two packets are now passed to the transport layer, where they are
combined for delivery to the upper layers.
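The flow in Figure 3.10 can be mimicked in a toy sketch: each segment keeps the port addresses j and k and a sequence number so the receiver can reorder before delivery. The field names and the 4-character segment size are invented for illustration:

```python
def segment(data, src_port, dst_port, size=4):
    """Transport layer: split a message into numbered segments, each
    retaining the service-point (port) addresses."""
    return [{"src_port": src_port, "dst_port": dst_port, "seq": i,
             "payload": data[off:off + size]}
            for i, off in enumerate(range(0, len(data), size))]

def reassemble(segments):
    """Receiver: reorder by sequence number, then deliver upward."""
    return "".join(s["payload"]
                   for s in sorted(segments, key=lambda s: s["seq"]))

segs = segment("HELLOWORLD", src_port="j", dst_port="k")
segs.reverse()           # simulate out-of-order arrival
print(reassemble(segs))  # HELLOWORLD
```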
Translation: The processes (running programs) in two systems are usually exchanging information in the form
of character strings, numbers, and so on. Because different computers use different encoding systems, the
presentation layer is responsible for interoperability between these different encoding methods. The
presentation layer at the sender changes the, information from its sender-dependent format into a common
format. The presentation layer at the receiving machine changes the common format into its receiver-
dependent format.
Encryption: To carry sensitive information, a system must be able to assure privacy. Encryption means that
the sender transforms the original information to another form and sends the resulting message out over the
network. Decryption reverses the original process to transform the message back to its original form.
Compression: Data compression reduces the number of bits to be transmitted. Data compression becomes
particularly important in the transmission of multimedia such as text, audio, and video.
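As a concrete illustration, the sketch below uses zlib (a standard lossless compressor) to show the reduction in bits on repetitive data; the sample text is arbitrary:

```python
import zlib

text = b"Data compression reduces the number of bits to be transmitted. " * 20
packed = zlib.compress(text)

print(len(text), "->", len(packed))     # repetitive data shrinks a lot
print(zlib.decompress(packed) == text)  # True: lossless, original restored
```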
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
3. The end-to-end delivery of the entire message is the responsibility of the ………layer.
(a) network (b). transport
(c). session (d). presentation
OSI/ISO
Physical layer – controls electrical and mechanical aspects of data transmission, e.g., voltage levels, cable
lengths, and so on.
Data-link layer – addresses the transmission of data frames (or packets) over a physical link between
network entities, includes error correction.
Network layer – establishes paths for data between computers and determines switching among routes
between computers, determines how to disaggregate messages into individual packets.
Transport layer – deals with data transfer between end systems and determines flow control.
Session layer – creates and manages sessions when one application process requests access to another
application process, e.g., MS Word importing a spreadsheet from Excel.
Presentation layer – determines syntactic representation of data, e.g., agreement on character code like
ASCII/Unicode.
Application layer – establishes interface between a user and a host computer, e.g., searching in a database
application.
TCP/IP
Physical layer – not really part of this model, since TCP and IP deal with software; usually thought to refer
to all hardware beneath the network layer.
Network or data link layer – defined by whatever the Internet Protocol will run over, e.g., a token-ring
network.
Internet or network layer – provides network addressing and routing, providing a common address space
and connecting heterogeneous networks. IP runs here.
Transport layer – manages data consistency by providing a reliable byte stream between nodes on a
network. TCP and User Datagram Protocol (UDP) run here.
Process and applications layer – provides application services to users and programs.
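A minimal demonstration of the reliable byte stream that TCP provides between nodes: the sketch below opens a loopback connection (the addresses and the payload are arbitrary choices for the example):

```python
import socket

# Listener: port 0 lets the OS pick any free port on the loopback interface.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

# Client connects to whatever address/port the OS assigned.
client = socket.create_connection(server.getsockname())
conn, _ = server.accept()

client.sendall(b"ping")   # TCP delivers these bytes reliably, in order
data = conn.recv(1024)
print(data)               # b'ping'

for s in (client, conn, server):
    s.close()
```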
Questions
1 Sketch a basic OSI model to show the various layers in the correct order.
2 Label the diagram to show the function of each layer.
3.5 Summary
The International Standards Organization (ISO) created a model called the Open Systems Interconnection
(OSI), which allows diverse systems to communicate.
The seven-layer OSI model provides guidelines for the development of universally compatible
architecture, hardware, and software.
The transport layer links the network support layers and the user support layers.
The data link layer is responsible for delivering data units from one station to the next without errors.
The session layer establishes, maintains, and- synchronizes the interactions between communicating
devices.
The TCP/IP, a five-layer hierarchical protocol suite developed before the OSI model, is the protocol suite
used in the Internet.
3.6 Keywords
Application layer: The application layer enables the users to access the network.
Network layer: The network layer is responsible for the source-to-destination delivery of a packet across
multiple network links.
Physical layer: The physical layer coordinates the functions required to transmit a bit stream over a physical
medium.
Presentation layer: The presentation layer ensures interoperability between communicating devices through
transformation of data into a mutually agreed-upon format.
Transport layer: The transport layer is responsible for the source-to-destination delivery of the entire message.
4.0 Objectives
After studying this chapter, you will be able to:
Discuss the Ethernet
Discuss FDDI (fiber distributed data interface)
Discuss the network operation
Define the ATM (asynchronous transfer mode)
Explain the ATM service categories
4.1 Introduction
The main interest in real-world networks has been in understanding the structural properties and patterns in the
evolution of large graphs and networks. What does a "normal" social network look like? How will it evolve
over time? How can we spot "abnormal" interactions (e.g., spam) in a time-evolving e-mail graph? The study of
large networks can be divided into two parts:
The study of statistical properties and models that govern the generation and evolution of large real-world
networks. We view the network as a big complex system, observe its static and temporal properties and
patterns to design models that capture and help us understand the temporal and static patterns of real-world
networks.
The study of the network by starting from individual nodes and small communities. We are especially
interested in modeling the spread of influence and information over the network and the substructures of
the network, called cascades, which this process creates. We aim to find common and abnormal sub-
network patterns and understand the propagation of influence, information, diseases and computer viruses
over the network. Once we know the propagation patterns and structure, we devise algorithms for
efficiently finding influential nodes.
4.2 Ethernet
IEEE 802.3 supports a LAN standard originally developed by Xerox and later extended by a joint venture
between Digital Equipment Corporation, Intel Corporation, and Xerox. This was called Ethernet.
IEEE 802.3 defines two categories: baseband and broadband, as shown in Figure 4.1. The word base
specifies a digital signal (in this case, Manchester encoding). The word broad specifies an analog signal (in
this case, PSK encoding). IEEE divides the baseband category into five different standards: 10Base5,
10Base2, 10Base-T, 1Base5, and 100Base-T. The first number (10, 1, or 100) indicates the data rate in
Mbps. The last number or letter (5, 2, 1, or T) indicates the maximum cable length or the type of cable.
IEEE defines only one specification for the broadband category: 10Broad36. Again, the first number (10)
indicates the data rate. The last number defines the maximum cable length. However, the maximum cable
length restriction can be changed using networking devices such as repeaters or bridges.
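The shorthand described above (rate, then Base or Broad, then a length or cable-type suffix) can be decoded mechanically. A small illustrative parser, not part of any standard library:

```python
def parse_802_3_name(name):
    """Split an IEEE 802.3 shorthand like '10Base5' or '100Base-T4'
    into data rate, signaling category, and medium suffix."""
    for kind, signaling in (("Base", "baseband"), ("Broad", "broadband")):
        if kind in name:
            rate, tail = name.split(kind)
            return {"rate_mbps": int(rate), "signaling": signaling,
                    "medium": tail.lstrip("-")}
    raise ValueError(f"not an 802.3 shorthand: {name}")

print(parse_802_3_name("10Base5"))    # 10 Mbps baseband, '5' = 500 m segments
print(parse_802_3_name("10Broad36"))  # 10 Mbps broadband
```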
Implementation
Although the bulk of the IEEE Project 802 standard focuses on the data link layer of the OSI model, the 802
model also defines some of the physical specifications for each of the protocols defined in the MAC layer. In
the 802.3 standard, the IEEE defines the types of cable, connections, and signals that are to be used in each of
five different Ethernet implementations. All Ethernet LANs are configured as logical buses, although they may
be physically implemented in a bus or star topology.
The physical connectors and cables utilized by 10Base5 include coaxial cable, network interface cards,
transceivers, and attachment unit interface (AUI) cables.
RG-8 Cable RG-8 cable (RG stands for radio government) is a thick coaxial cable that provides the backbone
of the IEEE 802.3 standard.
Transceiver Each station is attached by an AUI cable to an intermediary device called a medium attachment
unit (MAU) or, more commonly, a transceiver (short for transmitter-receiver). The transceiver performs the
CSMA/CD function of checking for voltages and collisions on the line and may contain a small buffer. It also
serves as the connector that attaches a station to the thick coaxial cable itself via a tap.
AUI Cables Each station is linked to its corresponding transceiver by an attachment unit interface (AUI), also
called a transceiver cable. An AUI is a 15-wire cable with plugs that performs the physical layer interface
functions between the station and the transceiver. Each end of an AUI terminates in a DB-15 (15-pin)
connector. One connector plugs into a port on the NIC, the other into a port on the transceiver. AUIs are
restricted to a maximum length of 50 meters, allowing for some flexibility in placement of stations relative to
the 10BASE5 backbone cable.
Transceiver Tap Each transceiver contains a connecting mechanism, called a tap because it allows the
transceiver to tap into the line at any point. The tap is a thick cable-sized well with a metal spike in the centre.
The spike is attached to wires inside the transceiver. When the cable is pressed into the well, the spike pierces
the jacket and sheathing layers and makes an electrical connection between the transceiver and the cable. This
kind of connector is often called a vampire tap because it bites the cable.
The physical layout of 10Base2 is illustrated in Figure 4.6. The connectors and cables utilized are: NICs, thin
coaxial cable, and BNC-T connectors. In this technology the transceiver circuitry has moved into the NIC, and
the transceiver tap has been replaced by a connector that splices the station directly into the cable, eliminating
the need for AUI cables.
As Figure 4.8 shows, each station contains an NIC. A length of four-pair UTP of not more than 100 meters
connects the NIC in the station to the appropriate port in the 10Base-T hub.
The weight and flexibility of the cable and the convenience of the RJ-45 jack and plug make 10Base-T the
easiest of the 802.3 LANs to install and reinstall. When a station needs to be replaced, a new station can
simply be plugged in.
1Base5: StarLAN
StarLAN is an AT&T product used infrequently today because of its slow speed. At only 1 Mbps, it is 10 times
slower than the three standards discussed above.
What is interesting about StarLAN is its range, which can be increased by a mechanism called daisy chaining.
Like 10Base-T, StarLAN uses twisted-pair cable to connect stations to a central intelligent hub. Unlike
10Base-T, which requires that each station have its own dedicated cable into the hub, StarLAN allows as many
as 10 stations to be linked, each to the next, in a chain in which only the lead device connects to the hub (see
Figure 4.9).
Figure 4.9: 1Base5.
Switched Ethernet
Switched Ethernet is an attempt to improve the performance of 10Base-T. The 10Base-T Ethernet is a shared
media network, which means that the entire medium is involved in each transmission. This is because the topology,
though physically a star is logically a bus. When a station sends a frame to a hub, the frame is sent out from all
ports (interfaces) and every station will receive it. In this situation, only one station can send a frame at any
time. If two stations try to send frames simultaneously, there is a collision.
Figure 4.10 shows this situation. Station A is sending a frame to station E. The frame is received by the hub
and is sent to every station. All of the cabling in the system is involved in this transmission. Another way to
think about this is that one transmission uses the entire capacity of 10 Mbps; if one station uses it, no other
station can.
However, if we replace the hub with a switch, a device that can recognize the destination address and can route
the frame to the port to which the destination station is connected, the rest of the media are not involved in the
transmission process. This means that the switch can receive another frame from another station at the same
time and can route this frame to its own final destination. In this way, theoretically, there is no collision.
Using a switch instead of a hub, we can theoretically increase the capacity of a network with N devices to N x
10 Mbps; moreover, because 10Base-T uses two pairs of UTP, each link can operate in full-duplex mode.
Figure 4.11 shows a Switched Ethernet. When station A is sending a frame to station E, station B can also
send a frame to station D without any collision.
Figure 4.11: An Ethernet using a switch.
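The forwarding behaviour described above can be sketched in a few lines of code. This is a toy model for illustration only (the class and method names are invented, not from any real switch firmware): a hub-like flood happens only while the destination is unknown; once the switch has learned which port an address sits on, frames go out of that single port, leaving the rest of the media free.

```python
class LearningSwitch:
    """Toy model of a switched-Ethernet forwarding table (illustrative only)."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.table = {}  # station address -> port number

    def receive(self, frame_src, frame_dst, in_port):
        # Learn: the source address is reachable via the incoming port.
        self.table[frame_src] = in_port
        # Forward: to the known port only, or flood out all other ports.
        if frame_dst in self.table:
            return [self.table[frame_dst]]
        return [p for p in range(self.num_ports) if p != in_port]

sw = LearningSwitch(4)
print(sw.receive("A", "E", 0))  # E unknown: flooded, hub-like
sw.table["E"] = 3               # suppose E has since been learned on port 3
print(sw.receive("B", "E", 1))  # now forwarded only to port 3
```

A hub, by contrast, would always take the "flood" branch, which is why only one transmission can be in flight at a time on a hub-based 10Base-T segment.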
Fast Ethernet
With new applications such as computer-aided design (CAD), image processing, and real-time audio and video
being implemented on LANs, there is a need for a LAN with a data rate higher than 10 Mbps. Fast Ethernet
operates at 100 Mbps.
In the physical layer, the specification developed for Fast Ethernet is a star topology similar to 10Base-T;
however, to match the physical layer to the different cabling available, IEEE has designed two categories of
Fast Ethernet: 100Base-X and 100Base-T4. The first uses two cables between the station and the hub; the
second uses four. 100Base-X itself is divided into two types: 100Base-TX and 100Base-FX (see Figure 4.12).
100Base-TX
The 100Base-TX design uses two category 5 unshielded twisted-pair (UTP) or shielded twisted-pair (STP)
cables to connect a station to the hub. One pair is used to carry frames from the station to the hub and the other
to carry frames from the hub to the station. The encoding is 4B/5B to handle the 100 Mbps; the signaling is
NRZ-I. The distance between the station and the hub (or switch) should be less than 100 meters (see Figure 4.13).
100Base-FX
The 100Base-FX design uses two optical fibers, one to carry frames from the station to the hub and the other
from the hub to the station. The encoding is 4B/5B and the signaling is NRZ-I. The distance between the
station and the hub (or switch) should be less than 2000 meters (see Figure 4.14).
100Base-T4
The 100Base-T4 scheme was designed in an effort to avoid rewiring. It requires four pairs of category 3 (voice-
grade) UTP, which are already available for telephone service inside most buildings. Two of the four pairs are
bidirectional; the other two are unidirectional. This means that in each direction, three pairs are used at the
same time to carry data. Because a 100-Mbps data rate cannot be handled by a voice-grade UTP, the
specification splits the 100-Mbps flow of data into three 33.33-Mbps flows. To reduce the baud rate of the
transmission, a method called 8B/6T (eight binary/six ternary) is used, in which each block of eight bits is
transformed into six bauds of three voltage levels (positive, negative, and zero). Figure 4.15 shows the scheme
and an encoding example.
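The arithmetic behind these figures can be checked directly. The sketch below reproduces only the rate calculations implied by the text, not the real 8B/6T code table: splitting 100 Mbps across three pairs gives roughly 33.33 Mbps each, and mapping 8 bits onto 6 ternary symbols cuts the baud rate to 6/8 of the bit rate.

```python
# Rate arithmetic for 100Base-T4's 8B/6T scheme (illustration only).
bit_rate_per_pair = 100e6 / 3        # the 100-Mbps flow split over 3 pairs
bauds_per_bit = 6 / 8                # 8 binary bits -> 6 ternary symbols
baud_rate_per_pair = bit_rate_per_pair * bauds_per_bit

print(round(bit_rate_per_pair / 1e6, 2))   # 33.33 (Mbps per pair)
print(round(baud_rate_per_pair / 1e6))     # 25 (Mbaud per pair)
```

The 25-Mbaud result is what lets voice-grade category 3 cable, which cannot carry 100 Mbps of binary signaling, handle its share of the flow.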
Figure 4.14: 100Base-FX.
Gigabit Ethernet
The migration from 10 Mbps to 100 Mbps encouraged the IEEE 802.3 committee to design Gigabit Ethernet,
which has a data rate of 1000 Mbps or 1 Gbps. The strategy is the same: the MAC layer and the access method
remain the same, but the collision domain is reduced. The physical layer (the transmission media and the
encoding system), however, changes. Gigabit Ethernet is mainly designed to use optical fiber, although the
protocol does not eliminate the use of twisted-pair cables. Gigabit Ethernet usually serves as a backbone to
connect Fast Ethernet networks. An example is shown in Figure 4.16.
Figure 4.16: Use of Gigabit Ethernet.
Four implementations have been designed for Gigabit Ethernet: 1000Base-LX, 1000Base-SX, 1000Base-CX,
and 1000Base-T. The encoding is 8B/10B, which means a group of 8 binary bits is encoded into a group of
10 binary bits. Table 4.1 shows the features of the four implementations.
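One consequence of 8B/10B worth making explicit is the line-rate overhead: since every 8 data bits are sent as 10 code bits, the medium must signal at 10/8 of the data rate. The arithmetic below is a simple illustration of that ratio.

```python
# 8B/10B overhead: 8 data bits become 10 code bits on the wire.
data_rate = 1_000_000_000        # 1 Gbps of user data
line_rate = data_rate * 10 / 8   # bits actually signaled on the medium

print(line_rate / 1e9)           # 1.25 (i.e. a 25% overhead)
```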
Example:
Figure 4.17 shows how FDDI access works. We have simplified the example by showing only four stations and
making the following assumptions: the TTRT is 30 time units; the time required for the token to go from one
station to the next is 1 time unit; each station is allowed to send two synchronous data units per turn; and each
station has a lot of asynchronous data to send (waiting in buffers).
In the same round, station 2 follows the same procedure. The token arriving time is now 31 because the token
arrived at station 1 at time 4, it was held 26 time units (2 for synchronous data and 24 for asynchronous data),
and it took 1 time unit for the token to travel between stations (4 + 26 + 1 = 31).
Note that the asynchronous allocation time is almost equally distributed between stations. In round 1, station 1
had the opportunity to send 24 time-unit equivalents of asynchronous data, but the other stations did not have
such an opportunity. However, in rounds 2, 3, and 4, station 1 was deprived of this privilege, but the other
stations (one in each round) had the opportunity to send. In round 2, station 2 sent 16; and in round 4, station 4 sent 16.
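The bookkeeping in this example can be reproduced numerically. The sketch below uses the example's figures (TTRT = 30, token travel = 1 unit, 2 synchronous units per turn); the allowance formula is a simplification for illustration, not the full FDDI timer rules.

```python
# Timed-token arithmetic for station 1's first turn in the example above.
TTRT = 30     # target token rotation time, in time units
SYNC = 2      # synchronous data units allowed per turn
TRAVEL = 1    # token travel time between adjacent stations

arrival_at_1 = 4
# Asynchronous allowance: whatever is left of the TTRT after the token's
# lateness and the synchronous allocation.
async_1 = TTRT - arrival_at_1 - SYNC            # 24 units, as in the text
holding_1 = SYNC + async_1                      # 26 units held in total
arrival_at_2 = arrival_at_1 + holding_1 + TRAVEL

print(async_1, holding_1, arrival_at_2)         # 24 26 31
```

The final sum matches the 4 + 26 + 1 = 31 computed in the text for the token's arrival at station 2.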
Addressing
FDDI uses a six-byte address, which is imprinted on the NIC, similar to Ethernet addresses.
The reason for this extra encoding step is that, although NRZ-I provides adequate synchronization under
average circumstances, the sender and receiver may go out of synchronization anytime the data includes a long
sequence of 0s. 4B/5B encoding transforms each four-bit data segment into a five-bit unit that contains no
more than two consecutive 0s. Each of the 16 possible four-bit patterns is assigned a five-bit pattern to
represent it. These five-bit patterns have been carefully selected so that even sequential data units cannot result
in sequences of more than three 0s (none of the five-bit patterns starts with more than one 0 or ends with more
than two 0s); see Table 4.2.
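The zero-run property is easy to verify mechanically. The sketch below uses the 4B/5B data-symbol table as commonly published (it should match Table 4.2, but treat the literal values here as transcribed rather than authoritative) and checks that no concatenation of codes produces more than three consecutive 0s.

```python
# The 16 data symbols of 4B/5B, as commonly published.
FOUR_B_FIVE_B = {
    0x0: "11110", 0x1: "01001", 0x2: "10100", 0x3: "10101",
    0x4: "01010", 0x5: "01011", 0x6: "01110", 0x7: "01111",
    0x8: "10010", 0x9: "10011", 0xA: "10110", 0xB: "10111",
    0xC: "11010", 0xD: "11011", 0xE: "11100", 0xF: "11101",
}

def encode(nibbles):
    """Concatenate the five-bit codes for a sequence of four-bit values."""
    return "".join(FOUR_B_FIVE_B[n] for n in nibbles)

def longest_zero_run(bits):
    return max(len(run) for run in bits.split("1"))

# Worst-case pairing: a code ending in "00" followed by one starting in "0".
stream = encode([0x2, 0x1])      # "10100" + "01001"
print(longest_zero_run(stream))  # 3
```

Because no code starts with more than one 0 or ends with more than two, the run of 0s across any code boundary is at most 2 + 1 = 3, which keeps the NRZ-I receiver synchronized.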
Five-bit codes that have not been assigned to represent a four-bit counterpart are used for control (see Table
4.3). The SD field contains the J and K codes, and the ED field contains the symbols TT. To guarantee that
these control codes do not endanger synchronization or transparency, the designers specify bit patterns that can
never occur in the data field. In addition, their order is controlled to limit the number of sequential bit patterns
possible. A K always follows a J, and an H is never followed by an R.
Table 4.3: 4B/5B control symbols.
The FDDI standard divides transmission functions into four protocols: physical medium dependent (PMD),
physical (PHY), media access control (MAC), and logical link control (LLC). These protocols correspond to
the physical and data link layers of the OSI model (see Figure 4.19). In addition, the standard specifies a fifth
protocol (used for station management).
Each frame is preceded by 16 idle symbols (1111), for a total of 64 bits, to initialize clock synchronization
with the receiver.
Frame Fields There are eight fields in the FDDI frame:
• Start delimiter (SD). The first byte of the field is the frame's starting flag. As in Token Ring, these bits are
replaced in the physical layer by the control codes (violations) J and K (the five-bit sequences used to
represent J and K are shown in Table 4.3).
• Frame control (FC). The second byte of the frame identifies the frame type.
• Addresses. The next two fields are the destination and source addresses. Each address consists of two to six
bytes.
• Data. Each data frame can carry up to 4500 bytes of data.
• CRC. FDDI uses the standard IEEE four-byte cyclic redundancy check.
• End delimiter (ED). This field consists of half a byte in the data frame or a full byte in the token frame. It is
changed in the physical layer with one T violation symbol in the data/command frame or two T symbols in the
token frame. (The code for the T symbol is shown in Table 4.3.)
• Frame status (FS). The FDDI FS field is similar to that of Token Ring. It is included only in the
data/command frame and consists of 1.5 bytes.
Dual Ring
FDDI is implemented as a dual ring (see Figure 4.21). In most cases, data transmission is confined to the
primary ring. The secondary ring is provided in case the primary fails.
The secondary ring makes FDDI self-healing. Whenever a problem occurs on the primary ring, the secondary
can be activated to complete data circuits and maintain service (see Figure 4.22).
Figure 4.22: FDDI ring after a failure.
Nodes connect to one or both rings using a media interface connector (MIC) that can be either male or female,
depending on the requirements of the station.
Nodes
FDDI defines three types of nodes: dual attachment station (DAS), single attachment station (SAS), and dual
attachment concentrator (DAC); see Figure 4.23.
DAS: A dual attachment station (DAS) has two MICs (called MIC A and MIC B) and connects to both
rings. To do so requires an expensive NIC with two inputs and two outputs. The connection to both rings gives
it improved reliability and throughput; these improvements, however, are predicated on the station remaining
on. Faults are bypassed by a station making a wrap connection from the primary ring to the secondary to
switch signals from one input to another output. However, for DAS stations to make this switch, they must be
active (turned on).
SAS: Most workstations, servers, and minicomputers attach to the ring in single attachment station (SAS)
mode. An SAS has only one MIC (called MIC S) and therefore can connect only to one ring. Robustness is
achieved by connecting SASs to intermediate ring nodes, called dual attachment concentrators (DACs), rather
than to the FDDI ring directly. This configuration allows each workstation to operate through a simple NIC
with only one input and one output. The concentrator (DAC) provides the connection to the dual ring. Faulty
stations can be turned off and bypassed to keep the ring alive.
DAC: As mentioned above, a dual attachment concentrator (DAC) connects an SAS to the dual ring. It
provides wrapping (diverting traffic from one ring to the other to bypass a failure) as well as control functions.
It uses MIC M to connect to an SAS.
4.4.3 Versatility
New technology is easily integrated into client-server network connections because operation is controlled
centrally. Of course, when new technology is integrated into the system, staff must then be trained to use it,
which can be time-consuming and have a few pitfalls as workers integrate the new system into existing
procedures. Peer-to-peer systems depend largely on the existing software platforms installed on the computers
linked to the network; while systems for the entire network cannot be changed, each user is able to customize a
workstation to optimize personal efficiency.
Caution
If your computer is part of a large network, you must verify with your network administrator the computer
names, domain name, and other information used in setting up Windows Server 2003, Windows XP, and ISA
Server 2004; otherwise it will conflict with network operations.
Caution
Switches, multiplexers, and routers must incorporate elaborate software systems to manage the various sizes of
packets in the mixed network traffic.
Figure 4.28 shows the relationship of different classes to the total capacity of the network.
Network-Related Attributes
The network-related attributes are those that define characteristics of the network. The following are some
network-related attributes:
CLR: The cell loss ratio (CLR) defines the fraction of cells lost (or delivered so late that they are considered
lost) during transmission. For example, if the sender sends 100 cells and one of them is lost, the CLR is
CLR = 1/100 = 10^-2
CTD: The cell transfer delay (CTD) is the average time needed for a cell to travel from source to destination.
The maximum CTD and the minimum CTD are also considered attributes.
CDV: The cell delay variation (CDV) is the difference between the CTD maximum and the CTD minimum.
CER: The cell error ratio (CER) defines the fraction of the cells delivered in error.
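The four network-related attributes above can be computed from simple counts and delay samples. The sketch below is a simplified reading of the definitions, with made-up measurement values; the variable names are illustrative, not part of any ATM API.

```python
# Computing the network-related attributes from hypothetical measurements.
sent = 100_000
lost = 10
errored = 5
delays_ms = [1.9, 2.0, 2.1, 2.4]   # per-cell transfer delays, in ms

clr = lost / sent                      # cell loss ratio
cer = errored / (sent - lost)          # cell error ratio (of delivered cells)
ctd = sum(delays_ms) / len(delays_ms)  # average cell transfer delay
cdv = max(delays_ms) - min(delays_ms)  # cell delay variation (max - min CTD)

print(clr, round(ctd, 2), round(cdv, 2))
```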
4.7 ARCNET
ARCNET, once quite popular in office automation, has reinvented itself into an embedded networking
technology that is frequently found in applications such as industrial control, building automation,
transportation, robotics and gaming. Like Ethernet and Controller Area Network (CAN), ARCNET is a data-
link layer technology with no defined application layer. Designers write their own application layer to meet
their particular needs and frequently do not advertise the fact that ARCNET is being used in their product.
ARCNET incorporates a token-passing protocol where media access is determined by the station with the
token. When a station receives the token, it can either initiate a transmission to another station or it must pass
the token to its logical neighbour. All stations are considered peers and no one station can consume all the
bandwidth since only one packet can be sent each token pass. This scheme avoids collisions and gives
ARCNET its greatest advantage in real-time applications—it is deterministic! By being deterministic, the
designer can accurately predict the time it takes for a particular station to gain access to the network and send a
message. This is of particular importance for control or robotic applications where timely responses or
coordinated motion are needed.
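The determinism claimed above has a simple quantitative form: because each of the N stations may send at most one packet per token pass, the worst-case wait for the token is bounded. The sketch below uses made-up timing figures purely to illustrate the bound.

```python
# Why token passing is deterministic: the worst-case token wait is bounded.
def worst_case_wait(stations, packet_time, token_pass_time):
    # In the worst case, every other station sends one packet and
    # passes the token once before it returns to us.
    return (stations - 1) * (packet_time + token_pass_time)

# e.g. 10 stations, 1.0 ms per packet, 0.1 ms to pass the token:
print(round(worst_case_wait(10, 1.0, 0.1), 1))  # 9.9 (ms), a hard upper bound
```

A CSMA/CD network offers no such bound: under contention, a station may lose the backoff lottery repeatedly, which is exactly why ARCNET's scheme suits control and robotics applications.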
4.8 AppleTalk
AppleTalk was designed with a transparent network interface. That is, the interaction between client
computers and network servers requires little interaction from the user. In addition, the actual operations of the
AppleTalk protocols are invisible to end users, who see only the result of these operations. Two versions of
AppleTalk exist: AppleTalk Phase 1 and AppleTalk Phase 2.
AppleTalk Phase 1, which is the first AppleTalk specification, was developed in the early 1980s strictly for
use in local workgroups. Phase 1 therefore has two key limitations: its network segments can contain no more
than 127 hosts and 127 servers, and it can support only nonextended networks.
AppleTalk Phase 2, which is the second enhanced AppleTalk implementation, was designed for use in larger
internetworks. Phase 2 addresses the key limitations of AppleTalk Phase 1 and features a number of
improvements over Phase 1. In particular, Phase 2 allows any combination of 253 hosts or servers on a single
AppleTalk network segment and supports both nonextended and extended networks.
Figure 4.31: Socket clients use sockets to send and receive datagrams.
Nodes
An AppleTalk node is a device that is connected to an AppleTalk network. This device might be a Macintosh
computer, a printer, an IBM PC, a router, or some other similar device. Within each AppleTalk node exist
numerous software processes called sockets. As discussed earlier, the function of these sockets is to identify
the software processes running in the device. Each node in an AppleTalk network belongs to a single network
and a specific zone.
Networks
An AppleTalk network consists of a single logical cable and multiple attached nodes. The logical cable is
composed of either a single physical cable or multiple physical cables interconnected by using bridges or
routers. AppleTalk networks can be nonextended or extended. Each is discussed briefly in the following
sections.
Nonextended Networks
A nonextended AppleTalk network is a physical network segment that is assigned only a single network
number, which can range between 1 and 1,024. Network 100 and Network 562, for example, are both valid
network numbers in a nonextended network. Each node number in a nonextended network must be unique, and
a single nonextended network segment cannot have more than one AppleTalk zone configured on it. (A zone is
a logical group of nodes or networks.) AppleTalk Phase 1 supports only nonextended networks, but as a rule,
nonextended network configurations are no longer used in new networks because they have been superseded
by extended networks. Figure 4.32 illustrates a nonextended AppleTalk network.
Figure 4.32: A nonextended network is assigned only one network number.
Extended Networks
An extended AppleTalk network is a physical-network segment that can be assigned multiple network
numbers. This configuration is known as a cable range. AppleTalk cable ranges can indicate a single network
number or multiple consecutive network numbers. The cable ranges Network 3-3 (unary) and Network 3-6, for
example, are both valid in an extended network. Just as in other protocol suites, such as TCP/IP and IPX, each
combination of network number and node number in an extended network must be unique, and its address
must be unique for identification purposes. Extended networks can have multiple AppleTalk zones configured
on a single network segment, and nodes on extended networks can belong to any single zone associated with
the extended network. Extended network configurations have, as a rule, replaced nonextended network
configurations. Figure 4.33 illustrates an extended network.
Zones
An AppleTalk zone is a logical group of nodes or networks that is defined when the network administrator
configures the network. The nodes or networks need not be physically contiguous to belong to the same
AppleTalk zone. Figure 4.34 illustrates an AppleTalk internetwork composed of three noncontiguous zones.
Figure 4.34: Nodes or networks in the same zone need not be physically contiguous.
Exercise: Check Your Progress 3
Note: i) Use the space below for your answer.
Ex1: Draw the hierarchy of AppleTalk internetwork consists of components.
……..………………………………………………………………………………………………………………
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
4.9 Summary
In Fast Ethernet, the data rate is increased to 100 Mbps, but the collision domain is reduced to 250 meters.
Asynchronous Transfer Mode (ATM) is the cell relay protocol designed by the ATM Forum and adopted
by the ITU-T. The combination of ATM and B-ISDN will allow high-speed interconnection of all the
world's networks.
User-related attributes are those attributes that define how fast the user wants to send data.
AppleTalk was designed with a transparent network interface: the interaction between client computers and
network servers requires little interaction from the user.
AppleTalk networks are arranged hierarchically. Four basic components form the basis of an AppleTalk
network: sockets, nodes, networks, and zones.
ARCNET incorporates a token-passing protocol where media access is determined by the station with the
token.
4.10 Keywords
Cell Relay: Asynchronous Transfer Mode (ATM) is the cell relay protocol designed by the ATM Forum and
adopted by the ITU-T.
FDDI: Fiber distributed data interface (FDDI) is a local area network protocol using optical fiber as a
medium, with a 100-Mbps data rate.
Gigabit Ethernet: It is mainly designed to use optical fiber although the protocol does not eliminate the use of
twisted pair cables.
NIC: Each station on an Ethernet network has its own network interface card (NIC). The NIC usually fits inside
the station and provides the station with a six-byte physical address.
WAN: ATM is potentially as effective a LAN and short-haul mechanism as it is a WAN mechanism.
5.0 Objectives
After studying this chapter, you will be able to:
Define LAN architecture
Discuss the IEEE 802 standards
Explain wireless LANs
Describe bridges
5.1 Introduction
A Local Area Network (LAN) is a group of computers and associated devices that share a common
communications line or wireless link. Typically, connected devices share the resources of a single processor or
server within a small geographic area (for example, within an office building). Usually, the server has
applications and data storage that are shared in common by multiple computer users. A local area network may
serve as few as two or three users (for example, in a home network) or as many as thousands of users (for
example, in an FDDI network).
The Institute of Electrical and Electronics Engineers (IEEE) has produced a set of standards for LAN
architectures. Although token ring and Ethernet were both created before the IEEE standards, the IEEE
specifications for IEEE 802.3 (Ethernet) and IEEE 802.5 (token ring) now provide vendor-neutral standards
for these important LAN technologies.
IEEE 802.3 is a collection of IEEE standards defining the physical layer and the media access control
(MAC) sub-layer of the data link layer of wired Ethernet.
This is generally a LAN technology with some WAN applications. Physical connections are made between
nodes and/or infrastructure devices (hubs, switches, routers) by various types of copper or fiber cable.
IEEE 802.3 is a technology that can support the IEEE 802.1 network architecture.
The maximum packet size is 1518 bytes, although to allow the Q-tag for Virtual LAN and priority data in
802.3ac it is extended to 1522 bytes. If the upper layer protocol submits a PDU (Protocol data unit) less
than 64 bytes, 802.3 will pad the data field to achieve the minimum 64 bytes.
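The padding rule can be sketched concretely. In the 64-byte minimum frame, 18 bytes are header and trailer (6 + 6 address bytes, 2 length bytes, 4 FCS bytes), so the data field itself must reach 46 bytes; the function below (an illustration, not a real driver routine) pads a short payload accordingly.

```python
# Padding a short payload to meet 802.3's 64-byte minimum frame size.
MIN_DATA = 46   # 64-byte minimum minus 14-byte header and 4-byte FCS

def pad_payload(payload: bytes) -> bytes:
    if len(payload) < MIN_DATA:
        # Append zero bytes until the data field reaches the minimum.
        payload = payload + b"\x00" * (MIN_DATA - len(payload))
    return payload

print(len(pad_payload(b"hello")))     # 46: a 5-byte PDU is padded up
print(len(pad_payload(b"x" * 100)))   # 100: long payloads are untouched
```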
Physical Connections
IEEE 802.3 specifies several different physical layers, whereas Ethernet defines only one. Each IEEE 802.3
physical layer protocol has a name that summarizes its characteristics. The coded components of an IEEE
802.3 physical-layer name are shown in Figure 5.2.
Figure 5.2: IEEE 802.3 Physical-Layer Name Components.
A summary of Ethernet Version 2 and IEEE 802.3 characteristics appears in Table 5.1.
Ethernet is most similar to IEEE 802.3 10Base5. Both of these protocols specify a bus topology network with a
connecting cable between the end stations and the actual network medium. In the case of Ethernet, that cable is
called a transceiver cable. The transceiver cable connects to a transceiver device attached to the physical
network medium. The IEEE 802.3 configuration is much the same, except that the connecting cable is referred
to as an attachment unit interface (AUI), and the transceiver is called a medium attachment unit (MAU). In
both cases, the connecting cable attaches to an interface board (or interface circuitry) within the end station.
Frame Formats
Ethernet and IEEE 802.3 frame formats are shown in Figure 5.3.
The probabilistic nature of CSMA/CD leads to uncertainty about the delivery time, which created the
need for a different protocol.
The token ring, on the other hand, is very vulnerable to failure.
Token bus provides deterministic delivery time, which is necessary for real-time traffic.
Token bus is also less vulnerable compared to token ring.
Functions of a Token Bus
Token bus is the technique in which the stations on a bus or tree form a logical ring; that is, the stations are
assigned positions in an ordered sequence, with the last member of the sequence followed by the first, as shown
in Figure 5.4. Each station knows the identity of the station following it and preceding it.
A control packet known as a token regulates the right to access. When a station receives the token, it is
granted control of the medium for a specified time, during which it may transmit one or more packets and may
poll stations and receive responses. When the station is done, or if its time has expired, it passes the token to
the next station in the logical sequence. Hence, the steady state consists of alternating phases of token passing
and data transfer.
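The logical ring above can be sketched in a few lines (station names and the list-based representation are illustrative, not part of the standard): each station's successor is the next position in the ordered sequence, with the last wrapping to the first.

```python
# A token bus logical ring: ordered stations on a bus, last wraps to first.
stations = ["A", "B", "C", "D"]        # assigned positions in the sequence

def successor(station):
    i = stations.index(station)
    return stations[(i + 1) % len(stations)]   # last member wraps to first

# Passing the token one full rotation visits every station exactly once:
holder, order = "A", []
for _ in stations:
    order.append(holder)
    holder = successor(holder)
print(order, holder)   # visits A, B, C, D, then the token is back at A
```

Note that the logical order need not match the physical order of the stations on the cable; only the successor relationship matters.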
The MAC sub layer consists of four major functions: the interface machine (IFM), the access control machine
(ACM), the receiver machine (RxM) and the transmit machine (TxM).
The IFM interfaces with the LLC sublayer. LLC sublayer frames are passed on to the ACM by the IFM, and if
a received frame is of LLC type, it is passed from the RxM component to the LLC sublayer. The IFM also
provides quality of service.
The ACM is the heart of the system. It determines when to place a frame on the bus and is responsible for the
maintenance of the logical ring, including error detection and fault recovery. It also cooperates with other
stations' ACMs to control access to the shared bus, controls the admission of new stations, and attempts
recovery from faults and failures.
The responsibility of the TxM is to transmit frames to the physical layer. It accepts a frame from the ACM and
builds a MAC protocol data unit (PDU) as per the format.
The RxM accepts data from the physical layer and identifies a full frame by detecting the SD and ED (start and
end delimiter). It also checks the FCS field to validate an error-free transmission.
Frame Format
The frame format of the Token Bus is shown in Figure 5.5. Most of the fields are the same as in Token Ring,
so we shall just look at the Frame Control field in Table 5.2.
This arrangement imposes reliability in an elegant manner. Although logically the network remains a ring,
physically each station is connected to a wire center with two twisted pairs for two-way communication. Inside
the wire center, bypass relays are used to isolate a broken wire or a faulty station. This topology is known as a
star-connected ring.
Frame Format
Token Ring and IEEE 802.5 support two basic frame types: tokens and data/command frames. Tokens are 3
bytes in length and consist of a start delimiter, an access control byte, and an end delimiter. Data/command
frames vary in size, depending on the size of the Information field. Data frames carry information for upper-
layer protocols, while command frames contain control information and have no data for upper-layer
protocols.
Token Frame Fields
Start delimiter (1 byte): Alerts each station of the arrival of a token (or data/command frame). This field
includes signals that distinguish the byte from the rest of the frame by violating the encoding scheme used
elsewhere in the frame.
Access-control (1 byte): Contains the Priority field (the most significant 3 bits) and the Reservation field
(the least significant 3 bits), as well as a token bit (used to differentiate a token from a data/command
frame) and a monitor bit (used by the active monitor to determine whether a frame is circling the ring
endlessly).
End delimiter (1 byte): Signals the end of the token or data/command frame. This field also contains bits to
indicate a damaged frame and identify the frame that is the last in a logical sequence.
Data/command frames have the same three fields as Token Frames, plus several others. The Data/command
frame fields are described below:
Frame-control byte (1 byte)—Indicates whether the frame contains data or control information. In control
frames, this byte specifies the type of control information.
Destination and source addresses (2-6 bytes)—Consist of two address fields (each 2 to 6 bytes) that identify
the destination and source station addresses.
Data (up to 4500 bytes)—The length of this field is limited by the ring token holding time, which
defines the maximum time a station can hold the token.
Frame-check sequence (FCS, 4 bytes)—Is filled by the source station with a calculated value dependent on
the frame contents. The destination station recalculates the value to determine whether the frame was
damaged in transit. If so, the frame is discarded.
Frame Status (1 byte)—This is the terminating field of a command/data frame. The Frame Status field
includes the address-recognized indicator and frame-copied indicator.
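The bit layout of the access-control byte described above (3 priority bits most significant, 3 reservation bits least significant, token and monitor bits in between) can be unpacked with simple shifts and masks. The function name and dictionary layout below are illustrative choices, not part of the 802.5 standard.

```python
# Unpacking a Token Ring access-control byte: PPP T M RRR.
def parse_access_control(ac: int):
    return {
        "priority":    (ac >> 5) & 0b111,  # most significant 3 bits
        "token_bit":   (ac >> 4) & 0b1,    # distinguishes token vs frame
        "monitor_bit": (ac >> 3) & 0b1,    # set by the active monitor
        "reservation": ac & 0b111,         # least significant 3 bits
    }

print(parse_access_control(0b011_1_0_101))
# priority 3, token bit set, monitor bit clear, reservation 5
```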
Differences between Ethernet and IEEE 802.3 LANs are subtle. Ethernet provides services corresponding to
Layers 1 and 2 of the OSI reference model, while IEEE 802.3 specifies the physical layer (Layer 1) and the
channel-access portion of the link layer (Layer 2), but does not define a logical link control protocol. Both
Ethernet and IEEE 802.3 are implemented in hardware. Typically, the physical manifestation of these
protocols is either an interface card in a host computer or circuitry on a primary circuit board within a host
computer.
2. …………..define physical network interfaces such as network interface cards, bridges, routers, connectors,
cables, and all the signaling and access methods associated with physical network connections.
(a) TCP/IP (b) IEEE 802 standards
(c) FTP (d) None of these
4. ……………..is a collection of IEEE standards defining the physical layer and the media access control
(MAC) sub-layer of the data link layer of wired Ethernet.
(a) IEEE 802.2 (b) IEEE 802.6
(c) IEEE 802.4 (d) IEEE 802.3
5. …………………..defines the medium access control(MAC) layer for bus networks that use a token-passing
mechanism (token bus networks).
(a) IEEE 802.4 (b) IEEE 802.3
(c) IEEE 802.5 (d) IEEE 802.7
6. ……………..provides deterministic delivery time, which is necessary for real time traffic.
(a) Token ring (b) Token bus
(c) Both (a) and (b) (d) None of these
5.5 Summary
IEEE has produced a set of standards for LAN architectures. Although token ring and Ethernet were both
created before the IEEE standards, the IEEE specifications for IEEE 802.3 (Ethernet) and IEEE 802.5
(token ring) now provide vendor-neutral standards for these important LAN technologies.
LAN (Local Area Network) refers to a group of computers interconnected into a network so that they are
able to communicate, exchange information and share resources (e.g. printers, application programs,
database etc).
IEEE 802.3 is a collection of IEEE standards defining the physical layer and the media access control
(MAC) sub-layer of the data link layer of wired Ethernet.
The MAC sublayer consists of four major functions: the interface machine (IFM), the access control
machine (ACM), the receiver machine (RxM) and the transmit machine (TxM).
5.6 Keywords
Bridge: It is a network communication device that is used to connect one segment of the network with another
that uses the same protocol.
CRC check: It is a mathematical formula that uses the data as input and produces a numeric result that is
almost as unique as the input data.
IEEE (Institute of Electrical and Electronic Engineers): It is a technical association of industry professionals
with a common interest in advancing all communications technologies.
LAN: It is a group of computers and associated devices that share a common communications line or wireless
link.
MAC addresses: These are ‗burned‘ into the Network Interface Card (NIC) and cannot be changed. ARP and
RARP define how IP addresses are translated into MAC addresses and vice versa.
Wireless LANs: These networks are set up to provide wireless connectivity within a finite coverage area.
6.0 Objectives
After studying this chapter, you will be able to:
Discuss the overview of TCP/IP
Explain the TCP/IP protocol
Define internet control message protocol
Explain the reverse address resolution protocol
6.1 Introduction
Transmission Control Protocol/Internet Protocol (TCP/IP) is an industry-standard suite of protocols designed
for large networks consisting of network segments that are connected by routers. TCP/IP is the protocol that is
used on the Internet, which is the collection of thousands of networks worldwide that connect research
facilities, universities, libraries, government agencies, private companies, and individuals.
TCP/IP is a combination of two acronyms: TCP, for Transmission Control Protocol, and IP, for Internet
Protocol. TCP handles packet flow between systems, and IP handles the routing of packets. However, that is a
simplistic answer that we will expand on further.
All modern networks are now designed using a layered approach. Each layer presents a predefined interface to
the layer above it. By doing so, a modular design can be developed so as to minimize problems in the
development of new applications or in adding new interfaces.
The ISO/OSI protocol with seven layers is the usual reference model. Since TCP/IP was designed before the
ISO model was developed it has four layers; however the differences between the two are mostly minor.
UDP Ports
To use UDP, an application must supply the IP address and UDP port number of the source and destination
applications. A port provides a location for sending messages. A unique number identifies each port. UDP
ports are distinct and separate from TCP ports even though some of them use the same number. Just like TCP
ports, UDP port numbers below 1024 are well-known ports that IANA assigns. Table 6.2 lists a few well-
known UDP ports.
Table 6.2 Well-known UDP ports
UDP Port Number Description
53 Domain Name System (DNS) name queries
69 Trivial File Transfer Protocol (TFTP)
137 NetBIOS name service
138 NetBIOS datagram service
161 SNMP
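To see UDP ports in action, the short Python sketch below binds a receiving socket and sends it a datagram over the loopback interface. It uses an OS-assigned high port rather than one of the well-known ports in Table 6.2, since binding to ports below 1024 normally requires administrator privileges; the message contents are illustrative.

```python
import socket

# UDP is connectionless: each sendto() names the destination explicitly.
# We bind a "server" socket to a high port on loopback; well-known ports
# (below 1024, such as DNS on 53) would require elevated privileges.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))          # 0 = let the OS pick a free port
server_port = server.getsockname()[1]  # the port the OS assigned

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"name query", ("127.0.0.1", server_port))

data, addr = server.recvfrom(1024)     # datagram plus the sender's address
print(data)                            # b'name query'

client.close()
server.close()
```

The same pattern, pointed at port 53 of a name server, is how a DNS query is carried over UDP.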
Caution
Be aware while accessing Internet Websites wrong data accessing can get the virus may be cause to harm the
system.
ICMP contains a series of defined Destination Unreachable messages. Table 6.4 lists and describes the most
common messages.
ICMP does not make IPv4 a reliable protocol. ICMP attempts to report errors and provide feedback on specific
conditions. ICMP messages are carried as unacknowledged IPv4 packets and are themselves unreliable.
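The ICMP header itself is simple enough to build by hand. The following Python sketch (an illustration, not taken from the text) packs a Destination Unreachable message (type 3, code 0) and computes the Internet checksum that ICMP uses: the one's-complement sum of the message's 16-bit words.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words (RFC 1071), as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"                      # pad to an even length
    total = sum(struct.unpack(f"!{len(data)//2}H", data))
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# ICMP header: type (1 byte), code (1 byte), checksum (2 bytes), rest (4 bytes).
# Type 3 = Destination Unreachable, code 0 = Net Unreachable.
icmp_type, icmp_code, unused = 3, 0, 0
header = struct.pack("!BBHI", icmp_type, icmp_code, 0, unused)
checksum = internet_checksum(header)
packet = struct.pack("!BBHI", icmp_type, icmp_code, checksum, unused)

# A receiver verifies the packet by checksumming it whole: the result is 0.
assert internet_checksum(packet) == 0
```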
ARP Process
When sending the initial packet as the sending host or forwarding the packet as a router, IPv4 sends the IPv4
packet, the next-hop IPv4 address, and the next-hop interface to ARP. Whether performing a direct or indirect
delivery, ARP performs the following process:
1. Based on the next-hop IPv4 address and interface, ARP checks the appropriate ARP cache for an entry that
matches the next-hop IPv4 address. If ARP finds an entry, ARP skips to step 6.
2. If ARP does not find an entry, ARP builds an ARP Request frame. This frame contains the MAC and IPv4
addresses of the interface from which the ARP request is being sent and the IPv4 packet‘s next-hop IPv4
address. ARP then broadcasts the ARP Request frame from the appropriate interface.
3. All nodes on the subnet receive the broadcasted frame and process the ARP request. If the next-hop address
in the ARP request corresponds to the IPv4 address assigned to an interface on the subnet, the receiving node
updates its ARP cache with the IPv4 and MAC addresses of the ARP requestor. All other nodes silently
discard the ARP request.
4. The receiving node that is assigned the IPv4 packet‘s next-hop address formulates an ARP reply that
contains the requested MAC address and sends the reply directly to the ARP requestor.
5. When the ARP requestor receives the ARP reply, the requestor updates its ARP cache with the address
mapping. With the exchange of the ARP request and the ARP reply, both the ARP requestor and ARP
responder have each other‘s address mappings in their ARP caches.
6. The ARP requestor sends the IPv4 packet to the next-hop node by addressing it to the resolved MAC
address. Figure 6.2 shows this process.
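The six steps above can be sketched as a small simulation. The following Python is a toy model only; the node addresses and the dictionary-based "subnet" are hypothetical stand-ins for a real broadcast domain.

```python
# A toy model of the ARP lookup in steps 1-6 above: check the cache first,
# otherwise "broadcast" a request to every node on the subnet and cache the
# reply. All addresses here are hypothetical illustrations.
arp_cache = {}  # next-hop IPv4 address -> MAC address

# Simulated subnet: each node knows its own IPv4 and MAC addresses.
subnet_nodes = {
    "192.168.1.1":  "aa:bb:cc:00:00:01",
    "192.168.1.20": "aa:bb:cc:00:00:14",
}

def resolve(next_hop_ip: str) -> str:
    # Step 1: a cache hit skips the request/reply exchange entirely.
    if next_hop_ip in arp_cache:
        return arp_cache[next_hop_ip]
    # Steps 2-3: broadcast; only the node that owns the address answers,
    # all other nodes silently discard the request.
    for node_ip, node_mac in subnet_nodes.items():
        if node_ip == next_hop_ip:
            # Steps 4-5: the owner replies and the requestor caches it.
            arp_cache[next_hop_ip] = node_mac
            return node_mac
    raise LookupError(f"no ARP reply for {next_hop_ip}")

mac = resolve("192.168.1.20")        # first call: request/reply exchange
mac_again = resolve("192.168.1.20")  # second call: served from the cache
```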
A reverse address resolution protocol (RARP) is used for diskless computers to determine their IP address
using the network. The RARP message format is very similar to the ARP format. When the booting computer
sends the broadcast ARP request, it places its own hardware address in both the sending and receiving fields in
the encapsulated ARP data packet. The RARP server will fill in the correct sending and receiving IP addresses
in its response to the message. This way, the booting computer will know its IP address when it gets the
message from the RARP server. RARP request packet is usually generated during the booting sequence of a
host. A host must determine its IP address during the booting sequence. The IP address is needed to
communicate with other hosts in the network. When a RARP server receives a RARP request packet, it
performs the following steps:
1. The MAC address in the request packet is looked up in the configuration file and mapped to the
corresponding IP address.
2. If the mapping is not found, the packet is discarded.
3. If the mapping is found, a RARP reply packet is generated with the MAC and IP address. This packet is sent
to the host, which originated the RARP request.
When a host receives a RARP reply packet, it gets its IP address from the packet and completes the booting
process. This IP address is used for communicating with other hosts, till it is rebooted. The length of a RARP
request or a RARP reply packet is 28 bytes.
The ‗operation‘ field in the RARP packet is used to differentiate between a RARP request and a RARP reply
packet. In an RARP request packet, the source and destination IP address values are undefined. In a RARP
reply packet, the source IP address is the IP address of the RARP server responding to the RARP request and
the destination IP address is the IP address of the host that sent the RARP request.
Since a RARP request packet is a broadcast packet, it is received by all the hosts in the network. But only a
RARP server processes a RARP request packet; all the other hosts discard the packet. The RARP reply packet
is not broadcast; it is sent directly to the host that sent the RARP request. If more than one RARP server
responds to a RARP request, then only the first RARP reply received is used. All other replies are discarded. If
a RARP reply is not received within a reasonable amount of time, the host, which sent the RARP request, will
not be able to complete its booting sequence. Usually, the host will again retry sending the RARP request after
a timeout period.
The BOOTP and DHCP protocols can be used instead of RARP to get the IP address from the MAC address.
Protocol Structure - RARP (Reverse Address Resolution Protocol). RARP and ARP have the same structure.
RARP packet:
Table 6.5: RARP packet
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|         Hardware type         |         Protocol type         |
| Hardware addr | Protocol addr |            Opcode             |
|    length     |    length     |                               |
|              Source hardware address (6 bytes)                |
|              Source protocol address (4 bytes)                |
|           Destination hardware address (6 bytes)              |
|           Destination protocol address (4 bytes)              |
Hardware type - Specifies a hardware interface type for which the sender requires a response.
Protocol type - Specifies the type of high-level protocol address the sender has supplied.
Hlen - Hardware address length.
Plen - Protocol address length.
Operation - The values are as follows:
1 - ARP request.
2 - ARP response.
3 - RARP request.
4 - RARP response.
5 - Dynamic RARP request.
6 - Dynamic RARP reply.
7 - Dynamic RARP error.
8 - InARP request.
9 - InARP reply.
Sender hardware address - HLen bytes in length.
Sender protocol address - PLen bytes in length.
Target hardware address - HLen bytes in length.
Target protocol address - PLen bytes in length.
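As a rough sketch of Table 6.5, the 28-byte packet can be packed with Python's struct module. The field values below are illustrative (hardware type 1 for Ethernet, protocol type 0x0800 for IPv4, opcode 3 for a RARP request); the MAC address is made up.

```python
import struct

# Packing the 28-byte ARP/RARP packet laid out in Table 6.5.
hw_type, proto_type = 1, 0x0800   # Ethernet, IPv4
hw_len, proto_len = 6, 4          # MAC = 6 bytes, IPv4 address = 4 bytes
opcode = 3                        # RARP request

sender_mac = bytes.fromhex("aabbcc000001")
sender_ip  = bytes(4)             # undefined (0.0.0.0) in a RARP request
target_mac = bytes.fromhex("aabbcc000001")  # own MAC in both fields
target_ip  = bytes(4)

packet = struct.pack("!HHBBH6s4s6s4s",
                     hw_type, proto_type, hw_len, proto_len, opcode,
                     sender_mac, sender_ip, target_mac, target_ip)

# The text above gives 28 bytes for a RARP request or reply packet.
assert len(packet) == 28
```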
Exercise: Check Your Progress 2
Note: i) Use the space below for your answer.
Ex1: List the advantages and disadvantages of RARP.
……..………………………………………………………………………………………………………………
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
The DHCP client computer finishes initializing the TCP/IP protocol on the interface. Once complete, the client
can use all TCP/IP services and applications for normal network communications and connectivity to other
IPv4 hosts. Figure 6.5 shows the basic DHCP process.
If a computer has multiple network adapters, the DHCP process occurs separately over each network adapter
that is configured for automatic TCP/IP addressing until each network adapter in the computer has been
allocated a unique IPv4 address configuration.
Figure 6.6 shows DHCP client states and messages, which are discussed in detail in the following sections.
Computers running Windows XP or Windows Server 2003 use an additional DHCP message, the DHCP
Inform message, to request and obtain information from a DHCP server for the following purposes:
To detect authorized DHCP servers in an environment that includes the Active Directory directory service.
To obtain updated addresses for DNS servers and WINS servers and a DNS domain name when making a
remote access connection.
To obtain additional configuration parameters.
Initializing State
In the Initializing state, the DHCP client is trying to initialize TCP/IP and it does not yet have an IPv4 address
configuration. This state occurs the first time the TCP/IP protocol stack is initialized after being configured for
automatic configuration and when the DHCP client cannot renew the lease on an IPv4 address configuration.
When the DHCP client is in the Initializing state, its IPv4 address is 0.0.0.0, also known as the unspecified
address. The DHCP client‘s first task is to obtain an IPv4 address configuration by broadcasting a
DHCPDiscover message from UDP port 68 (the client port) to UDP port 67 (the server port). Because the DHCP client does not yet have an
IPv4 address and has not determined the IPv4 addresses of any DHCP servers, the source IPv4 address for the
DHCPDiscover broadcast is the unspecified address, 0.0.0.0, and the destination is the limited broadcast
address, 255.255.255.255. The DHCPDiscover message contains the DHCP client‘s media access control
(MAC) address and computer name.
If a DHCP server is on the DHCP client‘s subnet, the server receives the broadcast DHCPDiscover message. If
no DHCP server is on the DHCP client's subnet (a more typical configuration), a DHCP relay agent on the
DHCP client‘s subnet receives the broadcast DHCPDiscover message and relays it as a unicast DHCPDiscover
message from the DHCP relay agent to one or more DHCP servers. Before forwarding the original
DHCPDiscover message, the DHCP relay agent makes the following changes:
Increments the Hops field in the DHCP header of the DHCPDiscover message. The Hops field, which is
separate from the Time to Live (TTL) field in the IPv4 header, indicates how many DHCP relay agents
have handled this message. Typically, only one DHCP relay agent is located between any DHCP client
and any DHCP server.
If the value of the Giaddr (Gateway IP Address) field in the DHCP header of the DHCPDiscover message
is 0.0.0.0 (as set by the originating DHCP client), changes the value to the IPv4 address of the interface on
which the DHCP Discover message was received. The Giaddr field records the IPv4 address of an
interface on the subnet of the originating DHCP client. The DHCP server uses the value of the Giaddr field
to determine the address range, known as a scope, from which to allocate an IPv4 address to the DHCP
client.
Changes the source IPv4 address of the DHCP Discover message to an IPv4 address assigned to the DHCP
relay agent.
Changes the destination IPv4 address of the DHCP Discover message to the unicast IPv4 address of a
DHCP server.
The DHCP relay agent sends the DHCPDiscover message as a unicast IPv4 packet rather than as an IPv4 and
MAC-level broadcast. If the DHCP relay agent is configured with multiple DHCP servers, it sends each DHCP
server a copy of the DHCPDiscover message.
Figure 6.7 shows the sending of the DHCPDiscover message by a DHCP relay agent that is configured with
two DHCP servers.
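The four relay-agent changes listed above can be sketched as a function over a simplified message. The field names below are hypothetical shorthand for the real DHCP header fields.

```python
# A sketch of the relay-agent changes described above, applied to a
# DHCPDiscover message modeled as a plain dict.
def relay_discover(message: dict, relay_interface_ip: str,
                   relay_ip: str, dhcp_server_ip: str) -> dict:
    relayed = dict(message)
    # 1. Count this relay hop (separate from the IPv4 TTL).
    relayed["hops"] = message["hops"] + 1
    # 2. Record the client's subnet in Giaddr if the client left it 0.0.0.0.
    if relayed["giaddr"] == "0.0.0.0":
        relayed["giaddr"] = relay_interface_ip
    # 3-4. Re-address the packet: relay as source, server as unicast target.
    relayed["src_ip"] = relay_ip
    relayed["dst_ip"] = dhcp_server_ip
    return relayed

# The client's original broadcast, before the relay agent touches it.
discover = {"hops": 0, "giaddr": "0.0.0.0",
            "src_ip": "0.0.0.0", "dst_ip": "255.255.255.255"}
relayed = relay_discover(discover, "192.168.1.1",
                         "192.168.1.1", "10.0.0.5")
```

With two configured servers, the agent would simply call this once per server, varying the destination address.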
Selecting State
In the Selecting state, the DHCP client can select from the set of IPv4 address configurations that the DHCP
servers offered. All DHCP servers that receive the DHCPDiscover message and that have a valid IPv4 address
configuration for the DHCP client respond with a DHCPOffer message from UDP port 67 to UDP port 68. A
DHCP server can receive the DHCPDiscover message either as a broadcast (because the DHCP server is on
the same subnet as the DHCP client) or as a unicast from a DHCP relay agent.
The DHCP server uses the following process to determine the scope on the DHCP server from which an IPv4
address for the DHCP client is to be selected and included in the DHCPOffer message:
1. If the Giaddr field is set to 0.0.0.0, set the value of the Giaddr field to the IPv4 address of the interface on
which the DHCPDiscover message was received.
2. For each scope on the DHCP server, perform a bit-wise logical AND of the value in the Giaddr field with
the subnet mask of the scope. If the result matches the subnet prefix of the scope, the DHCP server
allocates an IPv4 address from that scope. To obtain the subnet prefix of the scope, the DHCP server
performs a bit-wise logical AND of the subnet mask of the scope with any address in the scope.
If the DHCPDiscover message was received as a broadcast, the DHCP server sends the DHCPOffer message
to the DHCP client using the offered IPv4 address as the destination IPv4 address and the client‘s MAC
address as the destination MAC address. If the DHCPDiscover message was received as a unicast, the DHCP
server sends the DHCPOffer message to the DHCP relay agent. The DHCP relay agent uses the Giaddr value
to determine the interface to use to forward the DHCPOffer message. The DHCP relay agent then forwards the
DHCPOffer message to the client using the offered IPv4 address as the destination IPv4 address and the
client‘s MAC address as the destination MAC address.
Figure 6.8 shows the sending of the DHCPOffer message.
The DHCPOffer messages contain the DHCP client‘s MAC address, an offered IPv4 address, appropriate
subnet mask, a server identifier (the IPv4 address of the offering DHCP server), the length of the lease, and
other configuration parameters. When a DHCP server sends a DHCPOffer message offering an IPv4 address,
the DHCP server reserves the IPv4 address so that it will not be offered to another DHCP client.
The DHCP client selects the IPv4 address configuration of the first DHCPOffer message it receives. If the
DHCP client does not receive any DHCPOffer messages, it continues to retry sending DHCPDiscover
messages for up to one minute. After one minute, a DHCP client based on Windows Server 2003 or Windows
XP configures an alternate configuration, either through APIPA or an alternate configuration that has been
configured manually.
Requesting State
In the Requesting state, the DHCP client requests a specific IP address configuration by broadcasting a
DHCPRequest message. The client must use a broadcast because it does not yet have a confirmed IPv4 address
configuration. Just as in the DHCPDiscover message, the DHCP client sends the DHCPRequest message from
UDP port 68 to UDP port 67 using the source IPv4 address of 0.0.0.0 and the destination IPv4 address of
255.255.255.255.
If the DHCP client does not have a DHCP server on its subnet, a DHCP relay agent on its subnet receives the
broadcast DHCPRequest message and relays it as a unicast DHCPRequest message from the DHCP relay
agent to one or more DHCP servers.
The data in the DHCPRequest message varies in the following way, depending on how the requested IPv4
address was obtained:
If the IPv4 address configuration of the DHCP client was just obtained with a DHCPDiscover/DHCPOffer
message exchange, the DHCP client includes the IPv4 address of the server from which it received the
offer in the DHCPRequest message. This server identifier causes the specified DHCP server to respond to
the request and all other DHCP servers to retract their DHCP offers to the client. These retractions make
the IPv4 addresses that the other DHCP servers offered immediately available to the next DHCP client.
If the IPv4 address configuration of the client was previously known (for example, the computer was
restarted and is trying to renew its lease on its previous address), the DHCP client does not include the
IPv4 address of the server from which it received the IPv4 address configuration. This condition ensures
that when restarting, the DHCP client can renew its IPv4 address configuration from any DHCP server.
Figure 6.9 shows the sending of the DHCPRequest message.
Bound State
In the Bound state, the DHCP client receives confirmation that the DHCP server has allocated and reserved the
offered IPv4 address configuration to the DHCP client. The DHCP server that leased the requested IPv4
address responds with either a successful acknowledgment (DHCPAck) or a negative acknowledgment
(DHCPNak). The DHCP server sends the DHCPAck message from UDP port 67 to UDP port 68, and the
message contains a lease period for the requested IPv4 address configuration as well as any additional
configuration parameters.
If the DHCPRequest message was received as a broadcast, the DHCP server sends the DHCPAck message to
the DHCP client using the offered IPv4 address as the destination IPv4 address and the client‘s MAC address
as the destination MAC address. If the DHCPRequest was received as a unicast, the DHCP server sends the
DHCPAck message to the DHCP relay agent. The DHCP relay agent uses the Giaddr value to determine the
interface to use to forward the DHCPAck message. The DHCP relay agent then forwards the DHCPAck
message to the DHCP client using the offered IPv4 address as the destination IPv4 address and the DHCP
client‘s MAC address as the destination MAC address.
Figure 6.10 shows the sending of the DHCPAck message.
Figure 6.10: Sending the DHCPAck message.
When the DHCP client receives the DHCPAck message, it enters the Bound state. The DHCP client completes
the initialization of TCP/IP, which includes verifying that the IPv4 address is unique on the subnet. If the IPv4
address is unique, the DHCP client computer can use TCP/IP to communicate. If the IPv4 address is not
unique, the DHCP client broadcasts a DHCPDecline message and returns to the Initializing state. The DHCP
server receives the DHCPDecline message either as a broadcast or as a unicast through a DHCP relay agent.
When the DHCP server receives the DHCPDecline message, it marks the offered IPv4 address as unusable.
The DHCPNak message is forwarded to the DHCP client‘s subnet using the same method as the DHCPAck
message. When the DHCP client receives a DHCPNak, it returns to the Initializing state.
Renewing State
In the Renewing state, a DHCP client is attempting to renew the lease on its IPv4 address configuration by
communicating directly with its DHCP server. By default, DHCP clients first try to renew their lease when
50% of the lease time has expired. To renew its lease, a DHCP client sends a unicast DHCPRequest message
to the DHCP server from which it obtained the lease.
The DHCP server automatically renews the lease by responding with a DHCPAck message. This DHCPAck
message contains the new lease and additional configuration parameters so that the DHCP client can update its
settings. For example, the network administrator might have updated settings on the DHCP server since the
lease was acquired or last renewed. When the DHCP client has renewed its lease, it returns to the Bound state.
Figure 6.11 shows the DHCP renewing process.
Figure 6.11: DHCP renewing process.
When the DHCP server receives the DHCPRequest message, it compares the subnet prefix of the client's
previously allocated IPv4 address to the subnet prefix of the IPv4 address stored in the Giaddr field and does
the following:
If the two subnet prefixes are the same and the IPv4 address can be reallocated to the DHCP client, the
DHCP server sends a DHCPAck to the DHCP relay agent. When the DHCP relay agent receives the
DHCPAck, the agent re-addresses the message to the client‘s current IPv4 address and MAC address.
If the two subnet prefixes are the same and the IPv4 address cannot be reallocated to the DHCP client, the
DHCP server sends a DHCPNak to the DHCP relay agent. When the DHCP relay agent receives the
DHCPNak, it sends the message to the client‘s current IPv4 address and MAC address. At this point, the
DHCP client goes into the Initializing state.
If the two subnet prefixes are not the same, the DHCP client has moved to a different subnet, and the
DHCP server sends a DHCPNak to the DHCP relay agent. When the DHCP relay agent receives the
DHCPNak, the agent sends the message to the client‘s current IPv4 address and MAC address. At this
point, the DHCP client goes into the Initializing state.
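One way to summarize the client states and messages discussed in this section is as a transition table. The sketch below is a simplification; the event names are shorthand for the DHCP messages the client sends or receives, not real protocol identifiers.

```python
# A minimal sketch of the DHCP client state machine described above.
# Transitions are keyed by (state, event).
TRANSITIONS = {
    ("INITIALIZING", "send_discover"): "SELECTING",
    ("SELECTING",    "recv_offer"):    "REQUESTING",
    ("REQUESTING",   "recv_ack"):      "BOUND",
    ("REQUESTING",   "recv_nak"):      "INITIALIZING",
    ("BOUND",        "lease_50pct"):   "RENEWING",      # renew at 50% of lease
    ("RENEWING",     "recv_ack"):      "BOUND",
    ("RENEWING",     "recv_nak"):      "INITIALIZING",
    ("BOUND",        "send_decline"):  "INITIALIZING",  # duplicate address
}

def step(state: str, event: str) -> str:
    return TRANSITIONS[(state, event)]

# A successful lease acquisition followed by a successful renewal:
state = "INITIALIZING"
for event in ["send_discover", "recv_offer", "recv_ack",
              "lease_50pct", "recv_ack"]:
    state = step(state, event)
# The client ends up back in the Bound state after renewing.
```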
Caution
Check the destination IP address carefully when sending a file; a mistake can cause loss of data.
How It Works
Currently, there are three versions of NFS. NFS version 2 (NFSv2) is older and is widely supported. NFS
version 3 (NFSv3) has more features, including 64-bit file handles, safe asynchronous writes, and more robust error
handling. NFS version 4 (NFSv4) works through firewalls and on the Internet, no longer requires portmapper,
supports ACLs, and utilizes stateful operations. Red Hat Enterprise Linux supports NFSv2, NFSv3, and
NFSv4 clients, and when mounting a file system via NFS, Red Hat Enterprise Linux uses NFSv3 by default, if
the server supports it.
All versions of NFS can use Transmission Control Protocol (TCP) running over an IP network, with NFSv4
requiring it. NFSv2 and NFSv3 can use the User Datagram Protocol (UDP) running over an IP network to
provide a stateless network connection between the client and server.
6.13 Summary
Transmission Control Protocol/Internet Protocol (TCP/IP) is an industry-standard suite of protocols
designed for large networks consisting of network segments that are connected by routers.
RARP is used by many diskless systems to obtain their IP address when bootstrapped. The RARP packet
format is nearly identical to the ARP packet. An RARP request is broadcast, identifying the sender‘s
hardware address, asking for anyone to respond with the sender‘s IP address. The reply is normally
unicast.
RARP is available for Ethernet, Fiber Distributed-Data Interface, and Token Ring LANs. ARP (Address
Resolution Protocol) performs the opposite function to RARP: mapping an IP address to a physical
machine address.
SMTP (Simple Mail Transfer Protocol) is a TCP/IP protocol used in sending and receiving e-mail.
File Transfer Protocol (FTP) is a standard Internet protocol for transmitting files between computers on the
Internet.
DHCP is a TCP/IP standard that reduces the complexity and administrative overhead of managing network
client IPv4 addresses and other configuration parameters.
6.14 Keywords
FTP: It is a protocol used for transferring files from one computer to another typically from your computer to
a Web server.
RARP: It is used for diskless computers to determine their IP address using the network.
SMTP: It is used as the common mechanism for transporting electronic mail among different hosts within the
Department of Defense Internet protocol.
Sockets: It is a name given to the package of subroutines that provide access to TCP/IP on most systems.
TCP/IP: It is the protocol that is used on the Internet, which is the collection of thousands of networks
worldwide that connect research facilities, universities, libraries, government agencies, private companies, and
individuals.
UDP: It provides a connectionless datagram service that offers unreliable, best-effort delivery of data
transmitted in messages.
7.0 Objectives
After studying this chapter, you will be able to:
Discuss about the IP.
Explain the domain name system
Describes the uniform resource locator
Define electronic mail
7.1 Introduction
The following gives an introduction to IP addresses and subnetting on local area networks. If you want to find
out about the advantages of using private network IP addresses on your local area network, or what subnetting
can do for you, the explanation is here. You can also find the recipe for how you calculate a subnet mask, a
network address and broadcast address. An IP address is an address used to uniquely identify a device on an IP
network. The address is made up of 32 binary bits, which can be divided into a network portion and a host
portion with the help of a subnet mask. The 32 binary bits are broken into four octets (1 octet = 8 bits). Each
octet is converted to decimal and separated by a period (dot). For this reason, an IP address is said to be
expressed in dotted decimal format (for example, 172.16.81.100). The value in each octet ranges from 0 to 255
decimal, or 00000000 − 11111111 binary.
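The dotted-decimal notation described above is easy to verify with a few lines of Python; this sketch converts each octet to its 8-bit binary form and back.

```python
# Convert a dotted-decimal IP address to its 32 binary bits and back.
def to_binary(dotted: str) -> str:
    return ".".join(f"{int(octet):08b}" for octet in dotted.split("."))

def to_dotted(binary: str) -> str:
    return ".".join(str(int(octet, 2)) for octet in binary.split("."))

bits = to_binary("172.16.81.100")
# 172 = 10101100, 16 = 00010000, 81 = 01010001, 100 = 01100100
```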
7.2 Introduction to IP
Networks provide communication between computing devices. To communicate properly, all computers
(hosts) on a network need to use the same communication protocols. An Internet Protocol network is a
network of computers using the Internet Protocol as their communication protocol.
All computers within an IP network must have an IP address that uniquely identifies that individual host. An
Internet Protocol-based network (an IP Network) is a group of hosts that share a common physical connection
and that use Internet Protocol for network layer communication. The IP addresses in an IP network are
contiguous, that is, one address follows right after the other with no gaps.
Even without subnetting, hosts on the Internet or any other IP network are assigned a network number.
Network numbering allows a group of hosts (peers) to communicate efficiently with each other. Hosts on the
same network may be computers located in the same facility or all computers used by a workgroup, for
example. Multi-homed hosts, which contain multiple network adapters, can belong to multiple networks, but
each adapter is assigned exactly one network number.
Network numbers look very much like IP addresses, but the two should not be confused. Consider for example
the host IP address 10.0.0.1, an address commonly used on private networks. Because it is a Class A address,
with no subnetting employed, its leftmost byte (eight bits) by default refers to the network address and all
other bits remain set at zero. Thus, 10.0.0.0 is the network number corresponding to IP address 10.0.0.1.
The portion of the IP address that does not refer to the network refers instead to the host address: literally, the
unique identifier of the host on that network. In the above example, the host address becomes '0.0.0.1' or
simply '1'. Also note that a network address becomes a reserved address that should not be assigned to any
actual host. Configuring a live host at 10.0.0.0 in the example above could impact communications for all
hosts on that network.
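The 10.0.0.1 example can be reproduced with Python's standard ipaddress module: AND-ing the host address with the default Class A mask yields the network number.

```python
import ipaddress

# Deriving the network number from a host address and a default class mask,
# following the 10.0.0.1 example above.
host = ipaddress.ip_address("10.0.0.1")
mask = ipaddress.ip_address("255.0.0.0")   # default Class A mask

# Bit-wise AND of host address and mask yields the network number.
network = ipaddress.ip_address(int(host) & int(mask))
# network is 10.0.0.0: the reserved network address that should not be
# assigned to any actual host.
```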
Table 7.1 illustrates the default numbering scheme for Class A, B, and C networks.
In general, a network address uses the leftmost byte of its hosts' addressing if the hosts fall within the Class A
range, the leftmost two bytes for hosts in Class B, and the leftmost three bytes for hosts in Class C. This
algorithm is applied in practice through the use of a network mask. Table 7.1 shows the decimal
representation of the default network masks commonly used by network operating systems. Note that
the decimal value '255' corresponds to one byte that has all bits set to one (11111111).
Broadcast Address
192 . 168 . 1 . 255 (decimal)
11000000 . 10101000 . 00000001 . 11111111 (binary)
All ones in the host portion of the address mark it as the broadcast address.
Example:
In this example, we borrow 2 bits from what would normally be the host portion and use them as bits that
indicate the network portion. This makes a smaller network of just 64 addresses, of which 62 are usable for hosts.
Remember, the first address in a range of IP addresses is reserved for the network address. The last address is
reserved for the broadcast address.
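The 64-address example above can be checked with Python's standard ipaddress module:

```python
import ipaddress

# Borrowing 2 bits from the host portion of a /24 gives a /26: 64 addresses,
# of which 62 are usable (network and broadcast addresses are reserved).
subnet = ipaddress.ip_network("192.168.1.0/26")

first_host = subnet.network_address + 1    # 192.168.1.1
last_host = subnet.broadcast_address - 1   # 192.168.1.62
usable = list(subnet.hosts())              # excludes network and broadcast
```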
Caution
Assigning the same IP address to more than one device on a network can cause problems when accessing the Internet.
7.3 Domain Name System (DNS)
The DNS is a widely used naming service on the Internet and other TCP/IP networks. The network protocols,
data and file formats, and other aspects of the DNS are Internet Standards, specified in a number of RFC
documents. The DNS has a distributed, client-server architecture. There are reference implementations for the
server and client, but these are not part of the standard. There are a number of additional implementations
available for many platforms.
TOP-LEVEL .org
|
MID-LEVEL .diverge.org
______________________|________________________
| | |
BOTTOM-LEVEL strider.diverge.org samwise.diverge.org wormtongue.diverge.org
The system can also be logically divided even further if one wishes at different points. The example shown
above shows three nodes on the diverge.org domain, but we could divide diverge.org into subdomains
such as 'strider.net1.diverge.org', 'samwise.net2.diverge.org' and 'wormtongue.net2.diverge.org'; in this
case, two nodes reside in 'net2.diverge.org' and one in 'net1.diverge.org'.
There are directories of names, some of which may be sub-directories of further names. These directories are
sometimes called zones. There is provision for symbolic links, redirecting requests for information on one
name to the records bound to another name. Each name recognized by the DNS is called a Domain Name,
whether it represents information about a specific host, or a directory of subordinate Domain Names (or both,
or something else).
Unlike most file system naming schemes, however, Domain Names are written with the innermost name on
the left and progressively higher-level domains to the right, all the way up to the root directory if necessary.
The separator used when writing Domain Names is a period, '.'.
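The innermost-first ordering can be illustrated by reversing a name's labels to walk the tree from the top level down, using the diverge.org example above:

```python
# Reading a Domain Name's labels from the right walks down the tree
# from the top-level domain toward the host.
def labels_top_down(domain: str) -> list:
    return list(reversed(domain.split(".")))

path = labels_top_down("strider.diverge.org")
# Walk: org (top level) -> diverge (mid level) -> strider (host)
```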
7.3.3 Delegation
Using NS records, authority for portions of the DNS namespace below a certain point in the tree can be
delegated, and further sub-parts below that delegated again. It is at this point that the distinction between a
domain and a zone becomes important. Any name in the DNS is called a domain, and the term applies to that
name and to any subordinate names below that one in the tree. The boundaries of a zone are narrower, and are
defined by delegations. A zone starts with a delegation (or at the root), and encompasses all names in the
domain below that point, excluding names below any subsequent delegations.
7.3.4 Delegation to Multiple Servers
For redundancy, it is common (and often administratively required) that there be more than one name server
providing information on a zone. It is also common that at least one of these servers be located at some
distance (in terms of network topology) from the others, so that knowledge of that zone does not become
unavailable in case of connectivity failure. Each nameserver will be listed in an NS record bound to the name
of the zone, stored in the parent zone on the server responsible for the parent domain. In this way, those
searching the name hierarchy from the top down can contact any one of the servers to continue narrowing their
search. This is occasionally called walking the tree.
There are a number of name servers on the Internet which are called root nameservers. These servers provide
information on the very top levels of the domain namespace tree. These servers are special in that their
addresses must be pre-configured into nameservers as a place to start finding other servers. Isolated networks
that cannot access these servers may need to provide their own root name servers.
Caution
Save and scan any attachments before opening them, they are also a common source of viruses and can harm
the system.
2. A host‘s IP address is the address of a specific host on an IP network. All hosts on a network must have a
……………..IP address.
(a) Two (b) Unique
(c) Three (d) None of these
3. The………….is reserved and allows a single host to make an announcement to all hosts on the network.
(a) Network address (b) Host address
(c) Broadcast address (d) None of these
4. …………… client is an application that is used to read, write and send email. In simple terms it is the user
interface to the email system.
(a) SMTP (b) IP
(c) FTP (d) Email
Exercise: Check Your Progress 2
Note: i) Use the space below for your answer.
Ex1: List the limitation of emails.
……..………………………………………………………………………………………………………………
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
7.7 Summary
Subnets are created by using a so-called subnet mask to divide a single Class A, B, or C network number
into smaller pieces, thus allowing an organization to add subnets without having to obtain a new network
number through an Internet service provider.
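As a sketch of this idea, Python's standard `ipaddress` module can divide an illustrative Class C network number into four smaller pieces by lengthening the prefix (equivalent to applying the subnet mask 255.255.255.192):

```python
import ipaddress

# One Class C network number (/24) split into four /26 subnets.
network = ipaddress.ip_network("192.168.1.0/24")
subnets = [str(s) for s in network.subnets(new_prefix=26)]
print(subnets)
# ['192.168.1.0/26', '192.168.1.64/26', '192.168.1.128/26', '192.168.1.192/26']
```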
URL contains the name of the protocol to be used to access the file resource, a domain name that identifies
a specific computer on the Internet, and a pathname, a hierarchical description that specifies the location of
a file in that computer.
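Those three parts can be pulled out of an illustrative URL with Python's standard `urllib.parse` module (the URL itself is made up for this example):

```python
from urllib.parse import urlparse

# An invented URL, split into protocol, domain name, and pathname.
url = urlparse("http://www.example.com/docs/guide/index.html")
print(url.scheme)  # http                   (protocol)
print(url.netloc)  # www.example.com        (domain name)
print(url.path)    # /docs/guide/index.html (pathname)
```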
Electronic mail (email) is the term given to an electronic message, usually a form of simple text message
that a user types at a computer system and is transmitted over some form of computer network to another
user.
An internetwork is a collection of individual networks, connected by intermediate networking devices, that
functions as a single large network.
A routing protocol is a protocol that specifies how routers communicate with each other, disseminating
information that enables them to select routes between any two nodes on a computer network, the choice
of the route being done by routing algorithms.
The Simple Network Management Protocol (SNMP) is an application-layer protocol that facilitates the
exchange of management information between network devices.
The SNMP message format specifies which fields to include in the message and in what order. Ultimately,
the message is made of several layers of nested fields.
MIME stands for Multi-purpose Internet Mail Extensions or Multimedia Internet Mail Extensions. At first
it was used as a way of sending more than just text via email. The protocol was extended to manage file
typing by Web servers.
The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative,
hypermedia information systems. HTTP is the foundation of data communication for the World Wide
Web.
7.8 Keywords
Broadcast IP address: It is the last IP address in the range of IP addresses. To be more precise, the broadcast
address is the IP address in which all binary bits in the host portion of the IP address are set to one.
Connectionless: It describes communication between two network end points in which a message can be
sent from one end point to another without prior arrangement.
Domain name: This is a name that a company has registered so that it can use it on the Internet. Examples
include apple.com and microsoft.com.
IP multicast: It is a method of sending Internet Protocol (IP) datagrams to a group of interested receivers in a
single transmission.
IP Network: It is a group of hosts that share a common physical connection and that use Internet Protocol for
network layer communication.
Network address: It is the first IP address in the range of IP addresses. To be more precise, the network
address is the address in which all binary bits in the host portion of the IP address are set to zero.
Simple Network Management Protocol (SNMP): It is an "Internet-standard protocol for managing devices on
IP networks. Devices that typically support SNMP include routers, switches, servers, workstations, printers, and
modem racks."
Subnet: It is a segment of a network. Subnetting is a technique that allows a network administrator to divide
one physical network into smaller logical networks and, thus, control the flow of traffic for security or
efficiency reasons.
URL: It is one type of Uniform Resource Identifier (URI); the generic term for all types of names and
addresses that refer to objects on the World Wide Web.
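The Network address and Broadcast IP address entries above (host bits all zero versus all one) can be checked with a short sketch; the network used is illustrative:

```python
import ipaddress

net = ipaddress.ip_network("192.168.1.0/24")

# Network address: every host bit set to zero.
# Broadcast address: every host bit set to one, i.e. the network
# address OR-ed with the host mask.
broadcast = int(net.network_address) | int(net.hostmask)

print(net.network_address)              # 192.168.1.0
print(ipaddress.ip_address(broadcast))  # 192.168.1.255
```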
8.0 Objectives
After studying this chapter, you will be able to:
Discuss the need for security
Discuss the common threats and security barriers in network pathway
Explain the classification of attacks
Discuss the approaches to network security and levels
8.1 Introduction
When a computer connects to a network and begins communicating with others, it is taking a risk. Network security
involves the protection of a computer's Internet account and files from intrusion by unknown users. Basic
security measures involve protection by well-selected passwords, control of file permissions, and backup of
the computer's data.
Security concerns are in some ways peripheral to normal business working, but serve to highlight just how important it
is that business users feel confident when using IT systems. Security will probably always be high on the IT agenda
simply because cyber criminals know that a successful attack is very profitable. This means they will always strive to
find new ways to circumvent IT security, and users will consequently need to be continually vigilant. Whenever
decisions need to be made about how to enhance a system, security will need to be held uppermost among its
requirements. The Web has become an integral part of the Internet. The Web facility on the Internet is made
up of a collection of servers and clients that can exchange information.
8.4 Attacks
A network attack can be defined as any method, process, or means used to maliciously attempt to compromise
network security.
There are a number of reasons that individuals would want to attack corporate networks. The individuals
performing network attacks are commonly referred to as network attackers, hackers, or crackers.
External Threats: Individuals carry out external threats or network attacks without assistance from internal
employees or contractors. A malicious and experienced individual, a group of experienced individuals, an
experienced malicious organization, or inexperienced attackers (script kiddies) carry out these attacks. Such
attackers usually have a predefined plan and the technologies (tools) or techniques to carry out the attack.
One of the main characteristics of external threats is that they usually involve scanning and gathering
information. Users can therefore detect an external attack by scrutinizing existing firewall logs. Users can
also install an Intrusion Detection System to quickly identify external threats.
External threats can be further categorized into either structured threats or unstructured threats:
Structured External Threats: These threats originate from a malicious individual, a group of malicious
individual(s), or a malicious organization. Structured threats are usually initiated from network attackers that
have a premeditated thought on the actual damages and losses that they want to cause. Possible motives for
structured external threats include greed, politics, terrorism, racism, and criminal payoffs. These attackers are
highly skilled in network design, avoiding security measures, Intrusion Detection Systems (IDSs), access
procedures, and hacking tools. They have the necessary skills to develop new network attack techniques and
the ability to modify existing hacking tools for their exploitations. In certain cases, an internal authorized
individual may assist the attacker.
Unstructured External Threats: These threats originate from an inexperienced attacker, typically from a
script kiddie. Script kiddie refers to an inexperienced attacker who uses cracking tools or scripted tools
readily available on the Internet to perform a network attack. Script kiddies are usually inadequately skilled
to create the threats on their own. They can be considered bored individuals seeking some form of fame by
attempting to crash Websites and other public targets on the Internet. External attacks can also occur either
remotely or locally:
Remote external attacks: These attacks are usually aimed at the services that an organization offers to the
public. The various forms that remote external attacks can take are:
Remote attacks aimed at the services available for internal users; these usually occur when there is no firewall
solution implemented to protect the internal services.
Remote attacks aimed at locating modems to access the corporate network.
Denial of service (DoS) attacks that place an exceptional processing load on servers in an attempt to prevent
authorized user requests from being serviced.
War dialling of the corporate private branch exchange (PBX).
Attempts to brute-force password-authenticated systems.
Local External Attacks: These attacks typically originate from situations where computing facilities are
shared and access to the system can be obtained.
Internal Threats: Internal attacks originate from dissatisfied or unhappy inside employees or contractors.
Internal attackers have some form of access to the system and usually try to hide their attack as a normal
process. For instance, internal disgruntled employees have local access to some resources on the internal
network already. They could also have some administrative rights on the network. One of the best means to
protect against internal attacks is to implement an Intrusion Detection System and to configure it to scan for
both external and internal attacks. All forms of attacks should be logged and the logs should be reviewed and
followed up. With respect to network attacks, the core components that should be included when users
design network security are:
1. Network attack prevention
2. Network attack detection
3. Network attack isolation
4. Network attack recovery
2. The Internet comprises many different computers, all of which fall into…….categories.
(a). Three (b). Two
(c). Four (d).None of these
4. …………is concerned with making sure that unwanted people cannot read or modify messages intended for
other receivers.
(a). Network (b). Server
(c). Security (d). None of these
5. Three common factors emerge when dealing with network security: vulnerability, threat, and
attack.
(a). True (b). False
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
8.8 Firewall
One of the primary security issues in computer networks is access control; besides login and password security,
it is firewall security that firms use to protect their private networks from the rest of the Internet.
A firewall is a method for implementing security policies designed to keep a network secure from intruders.
It can be a single router that filters out unwanted packets or a combination of routers and servers each
performing some type of firewall processing.
A firewall is widely used to give users secure access to the Internet as well as to separate a company's public Web
server from its internal network.
A firewall is either a hardware device or a software package running on a specially configured computer that sits
between a secured network (your internal network) and an unsecured network (the Internet).
A firewall performs many tasks, including preventing unauthorised access to your network, limiting incoming and
outgoing traffic, authenticating users, logging traffic information, and producing reports.
Its fundamental role is not only to monitor traffic but also to block certain kinds of traffic completely.
Basically, a firewall is a barrier to keep destructive forces away from your property. In fact, that is why it is
called a firewall. Its job is similar to a physical firewall that keeps a fire from spreading from one area to the next.
There are several types of firewall techniques:
1. Packet filters: Looks at each packet entering or leaving the network and accepts or rejects it based on user-
defined rules. Packet filtering is fairly effective and transparent to users, but it is difficult to configure. In
addition, it is susceptible to IP spoofing.
2. Application gateway: Applies security mechanisms to specific applications, such as FTP and Telnet servers.
This is very effective, but can impose performance degradation.
3. Circuit-level gateway: Applies security mechanisms when a TCP or UDP connection is established. Once
the connection has been made, packets can flow between the hosts without further checking.
4. Proxy server: Intercepts all messages entering and leaving the network. The proxy server effectively hides
the true network addresses.
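The packet-filter technique in item 1 amounts to checking each packet against user-defined rules in order; a minimal sketch (the rule set and addresses are invented for illustration):

```python
# Each rule: (source-address prefix, destination port, verdict).
RULES = [
    ("10.0.0.", 23, "reject"),  # block Telnet from this internal range
    ("", 80, "accept"),         # allow HTTP from anywhere
]

def filter_packet(src_ip, dst_port):
    """Return the verdict of the first matching rule; deny by default."""
    for prefix, port, verdict in RULES:
        if src_ip.startswith(prefix) and dst_port == port:
            return verdict
    return "reject"  # default-deny is the usual safe policy

print(filter_packet("10.0.0.5", 23))     # reject
print(filter_packet("203.0.113.9", 80))  # accept
```

Real packet filters match on more fields (protocol, flags, direction), but the rule-by-rule decision shown here is the core idea.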
8.9.1 Architecture
There are generally four types of firewalls: Packet Filtering Firewalls, Circuit Level Gateways, Application Level
Gateways, and Stateful Multilevel Inspection Firewalls. These firewall designs are in increasing order of complexity
and evolution.
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
Caution
A firewall can cause significant problems for users who run their PC as a server, such as for network file sharing or
an FTP service. Connections into your PC from outside hosts may be stopped. Running network services on
PCs can be dangerous; the firewall may be a reminder that you have to reconfigure with security in mind.
Caution
Do not configure a default route on your internal and DMZ interfaces. Your firewall should have exactly one
default route, via your ISP's router. A wrong configuration may cause harm to the system.
VLANs allow a network manager to logically segment a LAN into different broadcast domains (Figure 8.3).
Since this is a logical segmentation and not a physical one, workstations do not have to be physically located
together. Users on different floors of the same building, or even in different buildings, can now belong to the
same LAN.
Physical View
Logical View
Figure 8.3: Physical and logical view of a VLAN.
VLANs also allow broadcast domains to be defined without using routers. Bridging software is used instead
to define which workstations are to be included in the broadcast domain. Routers would only have to be used
to communicate between two VLANs.
Port  VLAN
1     1
2     1
3     2
4     1
The main disadvantage of this method is that it does not allow for user mobility. If a user moves to a different
location away from the assigned bridge, the network manager must reconfigure the VLAN.
MAC Address       VLAN
1212354145121     1
2389234873743     2
3045834758445     2
5483573475843     1
The main problem with this method is that VLAN membership must be assigned initially. In networks with
thousands of users, this is no easy task. Also, in environments where notebook PCs are used, the MAC address is
associated with the docking station and not with the notebook PC. Consequently, when a notebook PC is moved to
a different docking station, its VLAN membership must be reconfigured.
Layer 2 VLAN: Membership by Protocol Type
VLAN membership for Layer 2 VLANs can also be based on the protocol type field found in the Layer 2
header (Figure 8.6).
Protocol  VLAN
IP        1
IPX       2
IP Subnet   VLAN
23.2.24     1
26.21.35    2
Although VLAN membership is based on Layer 3 information, this has nothing to do with network routing and
should not be confused with router functions. In this method, IP addresses are used only as a mapping to
determine membership in VLANs. No other processing of IP addresses is done. In Layer 3 VLANs, users
can move their workstations without reconfiguring their network addresses. The only problem is that it
generally takes longer to forward packets using Layer 3 information than using MAC addresses.
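The membership tables above are, in effect, simple lookups; a sketch of the IP-subnet-to-VLAN mapping using the prefixes from the table:

```python
# VLAN membership as a mapping from IP subnet prefix to VLAN number.
SUBNET_VLANS = {
    "23.2.24": 1,
    "26.21.35": 2,
}

def vlan_for(ip):
    """Return the VLAN of the subnet whose prefix matches the address."""
    for prefix, vlan in SUBNET_VLANS.items():
        if ip.startswith(prefix + "."):
            return vlan
    return None  # no VLAN assigned

print(vlan_for("23.2.24.7"))    # 1
print(vlan_for("26.21.35.40"))  # 2
```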
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
8.14 Summary
Security issues for networks are visible and important, but their analysis is similar to the analysis done for
other aspects of security.
Networks usually employ many copies of the same or similar software, with a copy on each of several (or
all) machines in the network.
A network‘s security depends on all the cryptographic tools at our disposal, good program development
processes, operating system controls, trust and evaluation and assurance methods, and inference and
aggregation controls.
An application gateway sits between the Internet and a company's internal network and provides middleman
services to users on either side.
The packet filtering firewall was the first firewall architecture and was developed by Cisco.
8.15 Keywords
Authentication: The process of establishing the identities of the communicating parties; the receiver must be
assured that the sender is the one it claims to be and is not posing as somebody else.
Diversionary Tactics: Hackers may strike a set of servers in a target company and then when security
administrators are busy putting out that fire, they slip in and attack another part of the network.
Firewall: The security barrier that, besides login and password security, firms use to protect their private
networks from the rest of the Internet; access control is one of the primary security issues in computer
networks.
Security Attack: Any action that compromises the security of information owned by the organization or
individual.
Security service: A service that enhances the security of the data processing system and information transfers of
an organisation.
Vulnerabilities: A hacker who worms his way into the VPN has free and easy access to the network.
9.0 Objectives
After studying this chapter, you will be able to
Explain the virus and threats
Discuss the malicious programs
Explain the types of viruses
Discuss the virus countermeasures and antivirus approach
Explain the advanced antivirus techniques
Explain the distributed denial of service attacks and description
9.1 Introduction
VIRUS is sometimes expanded as Vital Information Resource Under Seize. A virus is a piece of executable
code that performs some unwanted or undesirable action in a system that is harmful to it. This harmful code
can be written in different languages, such as batch programming, C, and VB. There are various types of
viruses today, such as file viruses and boot sector viruses. A file virus infects files of all types, while a boot
sector virus is a powerful piece of code that may go undetected because it hides itself in the boot sector of the
operating system, so that each time the computer starts, the virus executes itself and harms the system at
boot time.
Today popular viruses are built in C, and most are in VB. Earlier viruses were built in batch programming,
but those viruses were not very efficient at harming the system and were quite easy for antivirus software to
detect and remove; even a good computer engineer could find and remove them. In the case of VBScript
viruses, however, it is very difficult to find and remove the virus from the system.
Some virus programs appear to have been sent by a friend or a company you have emailed before. Most come
from unknown sources. Once opened, they work by pulling names out of a computer's address book and using
them to further spread the virus. You can set your email to accept plain text only and block or remove emails
that contain file attachments. However, some viruses are programmed so that your PC shows you only the
plain text but can still infect your computer with hidden malicious code.
Anti-virus programs hunt for viruses and clean attachments if possible. When a file cannot be cleaned, the
anti-virus program will isolate the file. The anti-virus program uses the definition list you download from the
program's Website, or it matches up a general pattern of what a virus looks like. The schedule on which the
anti-virus definitions are updated can vary, and you may get caught in the window of vulnerability between
the virus appearing and its being added to the list. Software companies use patches to correct a problem or a
weakness that people can take advantage of. Patching is a necessity and will be an ongoing method for
preventing computer system weaknesses.
The Internet is not always what you see. Scan all of your email attachments as you download them. This
should be done with files you download directly from internet sites as well as music files, programs, e-books,
games, etc.
Caution
Be careful sharing files with others, such as MP3, videos, programs, pictures, etc. Downloadable data can
contain malicious code that you download without knowing it, and will infect your computer.
9.3.2 Trojans
Trojans are malicious programs that perform actions which are not authorized by the user: they delete, block,
modify or copy data, and they disrupt the performance of computers or computer networks. Unlike viruses and
worms, the threats that fall into this category are unable to make copies of themselves or self-replicate.
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
9.4 Types of Viruses
Viruses usually consume a large amount of computer memory, resulting in system crashes. Viruses are
categorized into several types based on their features.
Boot Sector Virus
File Infection Virus
Multipartite Virus
Network Virus
E-mail Virus
Macro Virus
Anti-Virus Software
The basic countermeasure to prevent infection with a computer virus is to install anti-virus software. Such
software identifies and removes known viruses.
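The identification step is classically signature-based: the scanner searches files for byte patterns known to belong to particular viruses. A toy sketch (the signatures and names below are made up):

```python
# Toy signature database: virus name -> known byte pattern (invented).
SIGNATURES = {
    "demo-virus-a": b"\xde\xad\xbe\xef",
    "demo-virus-b": b"EVILCODE",
}

def scan(data):
    """Return the names of all known signatures found in the data."""
    return [name for name, pattern in SIGNATURES.items() if pattern in data]

print(scan(b"header \xde\xad\xbe\xef payload"))  # ['demo-virus-a']
print(scan(b"clean file"))                       # []
```

Real products also use heuristics and behavioural analysis, since pure signature matching cannot catch viruses that are not yet in the definition list.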
Firewalls
Firewalls prevent unauthorized access of your computer through a network connection. They guard against
someone installing software remotely on your computer or viewing your private data. While anti-virus
software looks for viruses already in your computer, firewalls block network traffic that might contain viruses.
Passwords
One way a virus might infect your computer is through someone copying files containing a virus from a digital
drive or CD onto your computer. A good countermeasure is to limit access to your computer with a log-in
including a password. When only people who have the password can access your computer, you can make sure
they check their files for viruses before copying them onto your machine.
Secure Wireless Networks
Even with a firewall, someone might gain access to your computer via an unsecured wireless network. When
you first set up a wireless network, a password is not required for access. Anyone who can find the signal can
use the wireless network, and try to gain access to your computer without your knowledge. Securing your
wireless network with a password is a countermeasure that prevents unauthorized use, and protects all the
computers that are connected to the network.
Email
Email is a common vulnerability for all computers that use it. Email enters through firewalls, and attachments
may masquerade as safe files when they really contain a virus. Countermeasures include not opening
attachments to emails if you do not recognize the sender, and not opening attachments that have strange names
or endings.
Downloading
Downloading files and installing software represents a key vulnerability for most computers. When you
download and install software that contains a virus, you by-pass most of the other countermeasures. Anti-virus
software may detect the virus once it is active but, if the virus is new, the software may not recognize it.
Countermeasures specifically aimed at downloading malicious files include only downloading files from
reputable Websites, and limiting downloading to clearly identified files for specific purposes.
2. A ……….involves sending mangled IP fragments with overlapping, over-sized payloads to the target
machine.
(a). Teardrop attack (b). Peer-to-peer attacks
(c). Worm (d). None of these
5. A good countermeasure is to limit access to your computer with a log-in including a……….
(a). Log-in (b). Password.
(c). Enter (d). None of these
Method of Infection
Rogue scanners may exploit security vulnerabilities to forcibly install, or they may attempt to trick users into
installing by masquerading as a free security scan or system tune-up. Rogue scanners are also distributed via
email scams, often masquerading as breaking news alerts.
Prevention
Follow these computer safety tips to prevent unintended installations of rogue scanners and other forms of
malware. Before installing any new program, search the Web and read reviews.
Removal
The free SmitFraudFix tool is capable of detecting and removing many rogue scanners; see the tool's
documentation for instructions.
9.8 Distributed Denial of Service Attacks
A denial-of-service attack (DoS attack) or distributed denial-of-service attack (DDoS attack) is an attempt to
make a computer or network resource unavailable to its intended users. Although the means to carry out,
motives for, and targets of a DoS attack may vary, it generally consists of the concerted efforts of a person, or
multiple people to prevent an Internet site or service from functioning efficiently or at all, temporarily or
indefinitely. Perpetrators of DoS attacks typically target sites or services hosted on high-profile Web servers
such as banks, credit card payment gateways, and even root name servers. The term is generally used relating
to computer networks, but is not limited to this field; for example, it is also used in reference to CPU resource
management.
One common method of attack involves saturating the target machine with external communications requests,
such that it cannot respond to legitimate traffic, or responds so slowly as to be rendered effectively
unavailable. Such attacks usually lead to a server overload. In general terms, DoS attacks are implemented
either by forcing the targeted computers to reset, by consuming their resources so that they can no longer
provide their intended service, or by obstructing the communication media between the intended users and
the victim so that they can no longer communicate adequately.
Denial-of-service attacks are considered violations of the IAB's Internet proper use policy, and also violate the
acceptable use policies of virtually all Internet service providers. They also commonly constitute violations of
the laws of individual nations. When a DoS attacker sends many packets of information and requests to a
single network adapter, each computer in the network experiences effects from the DoS attack.
Caution
Use a suitable downloader to download any software to avoid the virus attack on your system.
A system may also be compromised with a Trojan, allowing the attacker to download a zombie agent.
Attackers can also break into systems using automated tools that exploit flaws in programs that listen for
connections from remote hosts. This scenario primarily concerns systems acting as servers on the Web.
Stacheldraht is a classic example of a DDoS tool. It utilizes a layered structure where the attacker uses a client
program to connect to handlers, which are compromised systems that issue commands to the zombie agents,
which in turn facilitate the DDoS attack. Agents are compromised via the handlers by the attacker, using
automated routines to exploit vulnerabilities in programs that accept remote connections running on the
targeted remote hosts. Each handler can control up to a thousand agents.
These collections of compromised systems are known as botnets. DDoS tools like Stacheldraht still use classic
DoS attack methods centered on IP spoofing and amplification, such as Smurf attacks and fraggle attacks
(also known as bandwidth consumption attacks). SYN floods (also known as resource starvation attacks)
may also be used.
Reflected/Spoofed Attack
A distributed reflected denial of service attack (DRDoS) involves sending forged requests of some type to a
very large number of computers that will reply to the requests. Using Internet Protocol address spoofing, the
source address is set to that of the targeted victim, which means all the replies will go to (and flood) the target.
ICMP Echo Request attacks (Smurf Attack) can be considered one form of reflected attack, as the flooding
hosts send Echo Requests to the broadcast addresses of mis-configured networks, thereby enticing many hosts
to send. Many services can be exploited to act as reflectors, some harder to block than others. DNS
amplification attacks involve a new mechanism that increased the amplification effect, using a much larger list
of DNS servers.
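The appeal of amplification to an attacker is easy to quantify: a small spoofed request elicits a much larger reply, so the traffic hitting the victim is the attacker's own bandwidth multiplied by the amplification factor. The sizes below are illustrative, not measured values:

```python
# Illustrative sizes of a small spoofed query and a large reply.
request_bytes = 60
response_bytes = 3000

amplification = response_bytes / request_bytes  # 50x in this example
attacker_mbps = 10
victim_mbps = attacker_mbps * amplification

print(amplification)  # 50.0
print(victim_mbps)    # 500.0
```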
Degradation-of-service Attacks
"Pulsing" zombies are compromised computers that are directed to launch intermittent and short-lived
flooding of victim Websites with the intent of merely slowing them rather than crashing them. This type of
attack, referred to as "degradation-of-service" rather than "denial-of-service", can be more difficult to detect
than regular zombie invasions and can disrupt and hamper connections to Websites for prolonged periods of
time, potentially causing more disruption than concentrated floods. Exposure of degradation-of-service attacks
is complicated further by the matter of discerning whether the attacks really are attacks or just healthy and
likely desired increases in Website traffic.
Unintentional Denial of Service
This describes a situation where a Website ends up denied, not due to a deliberate attack by a single individual
or group of individuals, but simply due to a sudden enormous spike in popularity. This can happen when an
extremely popular Website posts a prominent link to a second, less well-prepared site, for example, as part of a
news story. The result is that a significant proportion of the primary site's regular users (potentially hundreds
of thousands of people) click that link in the space of a few hours, having the same effect on the target
Website as a DDoS attack.
Denial-of-Service Level II
The goal of a DoS L2 (possibly DDoS) attack is to cause the launching of a defense mechanism which blocks
the network segment from which the attack originated. In the case of a distributed attack or IP header
modification (depending on the kind of security behaviour), it will fully block the attacked network from the
Internet, but without a system crash.
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
9.9 Summary
Email programs are the most common means of propagation, delivering around 75% of the 50 top virus-
like threats.
Malware attacks are increasing in both frequency and sophistication, thus posing a serious threat to the
internet economy and to national security.
Anti-virus professionals use bait files to take a sample of a virus. It is more practical to store and exchange
a small, infected bait file, than to exchange a large application program that has been infected by the virus.
Application viruses spread from one application to another on the computer. Each time an infected
application program is run, the virus takes control and spreads to other applications.
Script viruses infect other script files on the computer. Script viruses, which are written in high-level script
languages such as Perl or Visual Basic, gain control when a user runs an infected script file.
9.10 Keywords
Bootsector Virus: A virus which attaches itself to the first part of the hard disk that is read by the computer
upon bootup. These are normally spread by floppy disks.
Malicious Tools: Malicious tools are malicious programs designed to automatically create viruses, worms, or
Trojans, conduct DoS attacks on remote servers, hack other computers, etc.
Malicious Programs: Programs that are specifically designed to delete, block, modify, or copy data or to
disrupt the performance of computers and computer networks.
Network Viruses: This type of virus spreads rapidly through a local area network and eventually across the
Internet, but in the majority of cases it spreads within shared resources such as folders and drives.
Peer-To-Peer Attacks: These are different from regular botnet-based attacks. With peer-to-peer there is no
botnet and the attacker does not have to communicate with the clients it subverts.
10.0 Objectives
After studying this chapter, you will be able to:
Explain the encryption and decryption
Discuss the cryptography terminology
Explain the classification of cryptography
Discuss the security of algorithms
Define the steganography
Explain the steganography versus cryptography
Explain the public key encryption
Discuss the comparison of symmetric and asymmetric key cryptography
Discuss the public key cryptanalysis
10.1 Introduction
A message in its original form is called plaintext or clear text. The process of disguising a message in such a way as to hide its substance is called encryption. An encrypted message is called cipher text. The process of turning cipher text back into plaintext is called decryption. The art and science of keeping messages secure is called cryptography, and it is practiced by cryptographers. Cryptanalysts are practitioners of cryptanalysis, the art and science of breaking cipher text, i.e. seeing through the disguise. The branch of mathematics embodying both cryptography and cryptanalysis is called cryptology, and its practitioners are called cryptologists.
PGP uses public-key encryption to protect e-mail and data files. It lets you communicate securely with people you have never met, with no secure channels needed for a prior exchange of keys. PGP is well featured and fast, with sophisticated key management, digital signatures, data compression, and good ergonomic design.
Pretty Good Privacy (PGP) is a high security cryptographic software application for MS-DOS, UNIX,
VAX/VMS, and other computers. PGP allows people to exchange files or messages with privacy,
authentication, and convenience. Privacy means that only those intended to receive a message can read it.
Authentication means that messages that appear to be from a particular person can only have originated from
that person. Convenience means that privacy and authentication are provided without the hassles of managing
keys associated with conventional cryptographic software. No secure channels are needed to exchange keys
between users, which makes PGP much easier to use. This is because PGP is based on a powerful new technology called "public key" cryptography.
Implementations of symmetric-key encryption can be highly efficient, so that users do not experience any
significant time delay as a result of the encryption and decryption. Symmetric-key encryption also provides a
degree of authentication, since information encrypted with one symmetric key cannot be decrypted with any
other symmetric key. Thus, as long as the symmetric key is kept secret by the two parties using it to encrypt
communications, each party can be sure that it is communicating with the other as long as the decrypted
messages continue to make sense.
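The idea that one shared key both encrypts and decrypts can be sketched in a few lines of Python. This toy XOR cipher is for illustration only and is not secure; the function name `xor_crypt` and the 16-byte random key are our own choices for the example, not part of any standard:

```python
import os

def xor_crypt(data: bytes, key: bytes) -> bytes:
    """XOR each byte with the repeating key. XOR is its own inverse,
    so the very same function both encrypts and decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = os.urandom(16)                      # the shared secret key
ciphertext = xor_crypt(b"attack at dawn", key)
assert xor_crypt(ciphertext, key) == b"attack at dawn"
```

As long as both parties hold the same `key` and no one else does, a message that decrypts to something sensible is evidence it came from the other key holder.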
Symmetric-key encryption is effective only if the symmetric key is kept secret by the two parties involved. If
anyone else discovers the key, it affects both confidentiality and authentication. A person with an unauthorized
symmetric key not only can decrypt messages sent with that key, but can encrypt new messages and send them
as if they came from one of the legitimate parties using the key.
Symmetric-key encryption plays an important role in SSL communication, which is widely used for
authentication, tamper detection, and encryption over TCP/IP networks. SSL also uses techniques of public-
key encryption.
The scheme shown in figure 10.2 ―Public-Key Encryption‖ allows public keys to be freely distributed, while
only authorized people are able to read data encrypted using this key. In general, to send encrypted data to someone, the data is encrypted with that person's public key, and the person receiving the encrypted data decrypts it with the
corresponding private key. Compared with symmetric-key encryption, public-key encryption requires more
processing and may not be feasible for encrypting and decrypting large amounts of data. However, it is
possible to use public-key encryption to send a symmetric key, which can then be used to encrypt additional
data. This is the approach used by the SSL/TLS protocols.
The reverse of the scheme shown in figure 10.2, "Public-Key Encryption", also works: data encrypted with a
private key can be decrypted only with the corresponding public key. This is not a recommended practice to
encrypt sensitive data, however, because it means that anyone with the public key, which is by definition
published, could decrypt the data. Nevertheless, private-key encryption is useful because it means the private
key can be used to sign data with a digital signature, an important requirement for electronic commerce and
other commercial applications of cryptography.
Caution
Changing the encryption algorithm may cause disruptions to the system.
2. Encryption is the process used to convert ........into a form that is not readable without knowledge of the
rules and key to decrypt.
(a). Cipher text (b). Plaintext
(c). Encrypt (d). None of these
3. In symmetric cryptography, the same key is used for both encryption and decryption by..................
(a). Sender (b). Receiver
(c). E-mail (d). None of these
5. ……..allows the recipient of information to determine its origin by confirming the sender's identity
(a). Authentication (b). Authorized
(c). Recipient (d). None of these
In cryptographic systems, the term key refers to a numerical value used by an algorithm to alter information,
making that information secure and visible only to individuals who have the corresponding key to recover the
information.
Secret key cryptography is also known as symmetric key cryptography. With this type of cryptography, both
the sender and the receiver know the same secret code, called the key. Messages are encrypted by the sender
using the key and decrypted by the receiver using the same key. This method works well if you are
communicating with only a limited number of people, but it becomes impractical to exchange secret keys with
large numbers of people. In addition, there is also the problem of how you communicate the secret key
securely. Public key cryptography, also called asymmetric encryption, uses a pair of keys for encryption and
decryption. With public key cryptography, keys work in pairs of matched public and private keys.
The public key can be freely distributed without compromising the private key, which must be kept secret by
its owner. Because these keys work only as a pair, encryption initiated with the public key can be decrypted
only with the corresponding private key.
The major advantage asymmetric encryption offers over symmetric key cryptography is that senders and
receivers do not have to communicate keys up front. Provided the private key is kept secret, confidential
communication is possible using the public keys.
Substitution Cipher
A substitution cipher uses a key to determine how the substitution should be carried out. In the Caesar cipher, each letter is replaced with the letter three places beyond it in the alphabet. This is referred to as a shift alphabet.
If the Caesar cipher is used with the English alphabet, when George wants to encrypt a message of "FBI," the encrypted message would be "IEL." Substitution is used in today's algorithms, but it is extremely complex compared to this example. Many different types of substitutions take place, usually with more than one alphabet. This example is only meant to show the concept of how a substitution cipher works in its most simplistic form.
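The Caesar shift described above is simple enough to write out directly. The sketch below handles uppercase letters and passes other characters through unchanged; the function name is our own:

```python
def caesar_encrypt(plaintext: str, shift: int = 3) -> str:
    """Replace each letter with the one `shift` places beyond it in the alphabet."""
    out = []
    for ch in plaintext:
        if ch.isalpha():
            # map A..Z to 0..25, shift with wrap-around, map back to a letter
            out.append(chr((ord(ch.upper()) - ord('A') + shift) % 26 + ord('A')))
        else:
            out.append(ch)       # leave spaces and punctuation as they are
    return ''.join(out)

print(caesar_encrypt("FBI"))     # IEL
```

Decryption is the same operation with a shift of -3, which is why a shift alphabet offers essentially no security: there are only 25 possible keys to try.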
Transposition Cipher
In a transposition cipher, permutation is used, meaning that letters are scrambled. The key determines the
positions that the characters are moved to, as illustrated in figure 10.3. This is a simplistic example of a
transposition cipher and only shows one way of performing transposition. When introduced with complex
mathematical functions, transpositions can become quite sophisticated and difficult to break. Most ciphers
used today use long sequences of complicated substitutions and permutations together on messages. The key
value is inputted into the algorithm and the result is the sequence of operations (substitutions and
permutations) that are performed on the plaintext.
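A columnar transposition is one simple way to realize the permutation idea described above. In this sketch (our own illustration, not the exact scheme of figure 10.3) the key is a permutation of column indices, and the ciphertext is produced by reading out whole columns in key order:

```python
def transpose_encrypt(plaintext: str, key: tuple) -> str:
    """Write the text row by row into a grid of len(key) columns, then
    read out whole columns in the order given by `key`."""
    cols = len(key)
    padded = plaintext + 'X' * (-len(plaintext) % cols)   # pad the last row
    rows = [padded[i:i + cols] for i in range(0, len(padded), cols)]
    return ''.join(row[c] for c in key for row in rows)

print(transpose_encrypt("HELLOWORLD", (2, 0, 1)))
```

Every letter of the plaintext is still present in the ciphertext; only the positions have changed, which is exactly what distinguishes transposition from substitution.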
Simple substitution and transposition ciphers are vulnerable to attacks that perform frequency analysis. In
every language, there are words and patterns that are used more often than others. For instance, in the English language, the words "the," "and," "that," and "is" are very frequent patterns of letters used in messages and conversation. The beginning of a message usually starts with "Hello" or "Dear" and ends with "Sincerely" or "Goodbye." These patterns help attackers figure out the transformation from plaintext to cipher text, which enables them to figure out the key that was used to perform the transformation. It is important for
cryptosystems to not reveal these patterns.
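The frequency analysis attack mentioned above starts with nothing more than a letter tally. A short sketch, using only the standard library:

```python
from collections import Counter

def letter_frequencies(text: str):
    """Tally the letters in a text; in a simple substitution cipher the
    plaintext language's frequency profile (E, T, A, ... in English)
    leaks straight through into the ciphertext."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    return Counter(letters).most_common()
```

If the ciphertext came from English prose under a simple substitution, the most frequent ciphertext symbol very likely stands for E, the next few for T, A, O, and so on, which quickly narrows down the key.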
10.5.1 Authentication
DSA
This digital signature algorithm (DSA) is used for generating digital signatures in digital certificates. Only
someone who has a public-private key pair can generate a digital signature.
A digital signature consists of two integers, called 's' (signature) and 'r' (verification), which are sent to the client for authentication. These integers are generated from several random integers.
First, two prime integers 'p' and 'q' are taken. Then two random integers 'h' and 'k' are selected. Here 'h' is in the range of 1 to p-1, while 'k' is a value greater than 0 and less than 'q'. Subsequently, another value 'g' is calculated using 'h', 'p' and 'q'. Finally, 'r' is calculated using 'g', 'p' and 'q'.
For generating 's', first a random message 'm' is created. Then its hash is calculated using a hashing algorithm like MD5. Finally, 's' is generated using 'k', the hashed message, the private key, 'r' and 'q'.
The digital signature along with ‗p‘, ‗q‘ and ‗g‘ is sent to the client for verifying its identity. The hashing
algorithm used, the message ‗m‘ and the public key are also sent. On the client side the message ‗m‘ is first
subjected to the hashing algorithm. Then a value ‗v‘ (called verifier) is calculated from this hashed message,
‗s‘, ‗p‘, ‗q‘, and the public key. Now if ‗v‘ is equal to ‗r‘, then the digital signature is verified.
MD5
MD5 (Message Digest) is a hashing algorithm used in generating digital signatures. The output of MD5 is a
message digest, which can be used to authenticate the owner of a private key.
The MD5 algorithm first pads the message with extra bits so that its length falls 64 bits short of a multiple of 512 (that is, 448 bits modulo 512). Then the length of the original message is encoded as a 64-bit value and appended, completing a block of 512 bits. Each 512-bit block is broken into sixteen 32-bit message words. Four separate 32-bit variables A, B, C, and D with standard initial values are taken, and their values are copied to four working variables, say a, b, c, and d. Then, in a loop, new values are calculated for a, b, c, and d using the sixteen message words and the a, b, c, and d values themselves; a different equation is used for each of the four variables. Finally, A, B, C, and D are incremented with the new values of a, b, c, and d.
The final A, B, C, and D, totaling 128 bits (32x4), form the calculated hash, which is also called a message digest.
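In practice you never implement these rounds by hand; Python's standard `hashlib` module provides MD5 directly. The input "abc" below is a published test vector for the algorithm. (Note that MD5 is considered broken for security purposes today and survives mainly as a checksum.)

```python
import hashlib

digest = hashlib.md5(b"abc").hexdigest()   # the 128-bit message digest, in hex
print(digest)                              # 900150983cd24fb0d6963f7d28e17f72
print(len(digest) * 4)                     # 32 hex characters = 128 bits
```

Any change to the input, however small, produces a completely different 128-bit digest, which is what makes the digest usable for authentication.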
10.5.2 Encryption
Broadly speaking, there are two encryption techniques used for secure communication: symmetric and asymmetric. In symmetric encryption, the same key is used for both encryption as well as decryption. This
is known as the private key. Consider two parties, A and B, wanting to engage in an encrypted communication.
Party A generates a private key and sends its copy to party B. Hence both parties use this key to encrypt as
well as decrypt messages.
In asymmetric encryption, party A generates a public-private key pair, and sends just the public key to party B.
When B wants to send a secret message to A, it encrypts the message using A‘s public key. When A receives
this encrypted message, it can only decrypt it with its corresponding private key. Similarly, the reverse can
also happen. This procedure is also known as PKI or Public Key Infrastructure.
RSA
RSA, which is named after its developers (Rivest, Shamir, Adleman), is an asymmetric or public key
algorithm. In this, the public-private key pair has a fixed length in bits, which can be decided at the time of their generation, such as 512, 768, 1,024, or 2,048, with higher numbers corresponding to stronger encryption. When
the public key is generated, it consists of the key size and a positive integer called public exponent, which has
some typical standard values. The private key when generated includes these two along with a private
exponent and two prime numbers. The two prime numbers are chosen such that their product (the modulus) has a bit length equal to the key size. In RSA, the key size is the same for both keys. The private exponent in the private key is calculated from
the public exponent and the two prime numbers.
Once the keys have been generated, they are ready for encrypting or decrypting data or message. The number
of bits in the message being encrypted must be less than or equal to the key size. If not, the message is broken
into separate blocks and then encrypted. If the message size is smaller than the key size then some extra bits
are padded to the message.
The encrypted message is created using the original message itself, public exponent, and the key size
information in the public key. When the encrypted message is received on the other end, the private exponent
and the key size is used to decrypt it. Since the private exponent is calculated using the public exponent, only
the correct private key can decrypt the message. The encryption and decryption of the message require a lot of modular exponentiation, so RSA, and public key encryption in general, is relatively slow.
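The whole RSA mechanism fits in a few lines once the numbers are small. The sketch below uses classic textbook-sized primes (real keys use primes hundreds of digits long, and `pow(e, -1, phi)` needs Python 3.8 or later):

```python
p, q = 61, 53                  # the two secret primes
n = p * q                      # the modulus; its size is the "key size"
phi = (p - 1) * (q - 1)        # Euler's totient of n
e = 17                         # public exponent (a typical small standard value)
d = pow(e, -1, phi)            # private exponent: the modular inverse of e

m = 65                         # the message, which must be smaller than n
c = pow(m, e, n)               # encrypt: m^e mod n
assert pow(c, d, n) == m       # decrypt: c^d mod n recovers the message
```

Security rests on the fact that recovering `d` from `(n, e)` requires factoring `n` into `p` and `q`, which is easy here but computationally infeasible for 2,048-bit moduli.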
DES
DES (Data Encryption Standard) was developed by IBM. It is a symmetric key encryption technique that
encrypts messages in 64-bit chunks. Though the actual key size is 64-bits, it only uses 56 bits for
encryption/decryption. The remaining 8 bits are used for checking whether the key has changed during its
transmission either accidentally or intentionally.
Both the 56-bit key and the 64-bit data go through a process of permutation and transformation. The objective
is to create sixteen 48-bit subkeys using the 56-bit key and the 64-bit data in 16 loops. The following is the
explanation of one loop.
The 56-bit key is first changed according to a key permutation table. Permutation tables change the bit
positions. Then the changed key is divided into two 28-bit halves. The bits in each half are then shifted left by two places (the shift is by one place in the first, second, ninth, and sixteenth rounds). Then from
these two halves, a 48-bit key is chosen using a compression permutation table.
The 64-bit data is divided into two 32-bit halves called the left half L and the right half R. Now R is subjected to
another permutation table called the expansion permutation table where each 32-bit block is expanded to 48
bits (by padding and repeating some bits).
After this, R is XORed (pronounced Exclusive OR, which is a digital gate function) with the 48-bit sub key
generated from the 56-bit key. The result of this is fed to 8 permutation tables known as S-boxes. Each S-box
accepts 6 bits (8x6=48) and generates a 4-bit output. The total output from the eight S-boxes is then combined,
resulting in a 32-bit chunk. This 32-bit chunk is then fed to another permutation table called a P-box. The P-
box also produces a 32-bit chunk, which is then XORed with L. Finally if this is not the sixteenth round, L
becomes R and vice versa. This swapping is called transformation. The 64-bit data undergoes 15 more such
rounds for encryption. During decryption the opposite process is repeated. Since the algorithm involves just
XORing and changes in bit positions, DES is relatively faster.
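The swap-and-XOR round structure described above is called a Feistel network, and its key property, that the same machinery run with the subkeys in reverse order decrypts, can be shown in miniature. This is a toy illustration only: the round function here is a bare XOR with the subkey, standing in for DES's expansion, S-boxes, and P-box.

```python
def feistel_round(left: int, right: int, subkey: int) -> tuple:
    """One Feistel round: new left = old right;
    new right = old left XOR F(old right, subkey)."""
    return right, left ^ (right ^ subkey)   # F(r, k) is just r XOR k here

def feistel(block: tuple, subkeys: list) -> tuple:
    left, right = block
    for k in subkeys:
        left, right = feistel_round(left, right, k)
    return right, left                      # undo the final swap, as DES does

keys = [0x1A, 0x2B, 0x3C]
ct = feistel((0x12, 0x34), keys)            # encrypt a two-half "block"
pt = feistel(ct, keys[::-1])                # decrypt: same code, reversed subkeys
assert pt == (0x12, 0x34)
```

Notice that decryption never inverts the round function F; the XOR structure alone guarantees reversibility, which is why DES can use non-invertible S-box transformations internally.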
Caution
Be sure to remember the encryption code you entered. If you need to enter the encryption code again for some
reason and you do not enter the same encryption code, all the data stored on the hard disk will be overwritten
as a security precaution.
10.6 Steganography
The objective of steganography is to hide a secret message within a cover-media in such a way that others
cannot discern the presence of the hidden message. Technically, in simple words, "steganography means hiding one piece of data within another". Modern steganography takes the opportunity of hiding information in digital multimedia files and also at the network packet level.
Hiding information in a medium requires the following elements:
The cover media (C) that will hold the hidden data
The secret message (M), which may be plain text, cipher text or any other type of data
The stego function (Fe) and its inverse (Fe-1)
An optional stego-key (K) or password that may be used to hide and unhide the message
The stego function operates over cover media and the message (to be hidden) along with a stego-key
(optionally) to produce a stego media (S). The schematic of steganographic operation is shown in Figure 10.4.
Steganography and cryptography are great partners in spite of functional difference. It is common practice to
use cryptography with steganography.
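A common stego function for multimedia covers is least-significant-bit (LSB) embedding. The sketch below hides one secret byte in the LSBs of eight cover values (think of them as pixel intensities); each cover value changes by at most 1, which the eye cannot discern. The function names and sample values are our own illustration:

```python
def hide_byte(cover: list, secret: int) -> list:
    """Store the 8 bits of `secret` in the least-significant bits of the
    first 8 cover values; each value changes by at most 1."""
    bits = [(secret >> i) & 1 for i in range(8)]
    return [(c & 0xFE) | b for c, b in zip(cover, bits)] + cover[8:]

def reveal_byte(stego: list) -> int:
    """Reassemble the hidden byte from the least-significant bits."""
    return sum((s & 1) << i for i, s in enumerate(stego[:8]))

cover = [200, 135, 76, 90, 33, 18, 255, 64]   # made-up pixel values
stego = hide_byte(cover, ord('A'))
assert reveal_byte(stego) == ord('A')
```

Combining this with cryptography, as the text suggests, means encrypting the message first and then embedding the cipher text, so that even a discovered payload remains unreadable.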
10.10 Summary
Encryption is the process used to convert plaintext into a form that is not readable without knowledge of
the rules and key to decrypt.
Decryption is the process used to convert from an encrypted form back to plaintext, using a key.
A code book uses a series of code values to use as replacements. In traditional code book models, a
message is converted by mapping from alphabet letters to numbers or other letters, according to a pattern
in the code book.
Symmetric encryption (private key encryption) is a type of encryption where the same secret key is used to
encrypt and decrypt information or there is a simple transform between the two keys.
Asymmetric encryption (Public Key Encryption) uses different keys for encryption and decryption. The
encryption key is public so that anyone can encrypt a message.
10.11 Keywords
Digital Signatures: A message signed with a sender's private key can be verified by anyone who has access to the sender's public key, thereby proving that the sender had access to the private key and that the message has not been tampered with.
Public-key: Systems that use two keys, a public key known to everyone and a private key that only the
recipient of messages uses.
Public-key Cryptography: Refers to a cryptographic system requiring two separate keys, one to lock or
encrypt the plaintext, and one to unlock or decrypt the cyphertext.
Recipient decrypts: The received message using their own secret key, identifies the sender from their now-
cleartext signature, and then decrypts the result using the sender‘s public key.
Symmetric-key Algorithms: Algorithms that use the same secret key for both encryption and decryption; unlike a public key algorithm, they require a secure initial exchange of one, or more, secret keys between the sender and receiver.
11.0 Objectives
After studying this chapter, you will be able to:
Understand the concept of digital signature
Discuss the requirements of digital signature
Explain the types of digital signature
Discuss the authentication protocol
Explain the symmetric encryption approach
Discuss the public-key encryption approach
11.1 Introduction
A digital signature authenticates electronic documents in a similar manner to the way a handwritten signature authenticates printed documents. This signature cannot be forged, and it asserts that a named person wrote or
otherwise agreed to the document to which the signature is attached. The recipient of a digitally signed
message can verify that the message originated from the person whose signature is attached to the document
and that the message has not been altered either intentionally or accidentally since it was signed. Also, the
signer of a document cannot later disown it by claiming that the signature was forged. In other words, digital
signatures enable the ―authentication‖ and ―non-repudiation‖ of digital messages, assuring the recipient of a
digital message of both the identity of the sender and the integrity of the message.
Performance
The digital signature system should minimize network traffic.
Where
P = the first party
A = Arbiter
M = Message
Here P sends a message containing: the message, plus [the hash code of the message and his identifier], all encrypted with a secret key Kpa that only P and A share (the part in the brackets is the signature). This way the arbiter can decrypt the signature with the secret key that they share and confirm that the message (by checking and comparing the hash value) and the user P (if he was able to decrypt the signature with the key provided for that user) are correct and have not been tampered with.
Here A sends a message to Q that is encrypted with Kaq, which only Q and A know, and that includes: the identifier of P, the message, the signature and a timestamp. The timestamp is there to assure Q of a timely transfer and not an old replay. Now Q can store the message, and in any further dispute it can use it to prove it came from P by sending back the same message, just without the timestamp. Then A can decrypt it by using Kaq and check the signature to see if it really came from P.
In this dialogue, as we said, both parties need to have high trust in the arbiter in order for the scheme to function. If this is true, then both sides can be safe in knowing that their signatures cannot be forged and that messages really come from the other party.
11.5.1 CHAP
The Challenge Handshake Authentication Protocol (CHAP) is an authentication protocol that is primarily used
for remote access PPP connections. CHAP is the successor of the Plain Authentication Protocol (PAP), which
transmits the username and password in clear text over the network media. CHAP uses a more secure method;
when a client logs on, the server sends a challenge request to the client, the client replies with a challenge
response that is a hashed (one-way encrypted) value based on the username/password-combination and a
random number. The server performs the same encryption and if the resulting value matches the response from
the client, the client is authenticated. The actual password is not transmitted across the network.
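The challenge-response exchange can be sketched in a few lines. This is a simplification of real CHAP (which also mixes in an identifier byte, per RFC 1994); the function name and the sample secret are our own:

```python
import hashlib
import os

def chap_response(challenge: bytes, secret: bytes) -> bytes:
    """The client hashes the server's random challenge together with the
    shared secret; the secret itself never crosses the network."""
    return hashlib.md5(challenge + secret).digest()   # CHAP traditionally uses MD5

challenge = os.urandom(16)                        # server picks a fresh challenge
response = chap_response(challenge, b"s3cret")    # client computes the reply
# server performs the same computation and compares the results
assert response == chap_response(challenge, b"s3cret")
assert response != chap_response(challenge, b"wrong-password")
```

Because the challenge is random and fresh for each login, a captured response is useless for a later replay: the next challenge will be different.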
11.5.2 MS-CHAP
MS-CHAP is the Microsoft version of CHAP and provides the technology from CHAP combined with
Microsoft authentication and encryption. MS-CHAP is available on Microsoft Windows 95, NT, 2000 and
later versions. Windows 2003 includes MS-CHAP v2, which provides stronger security for the handshake
process and mutual authentication. The latter means the client authenticates itself to the server, and the server
authenticates itself to the client. While CHAP requires the password to be stored in plain text on the
authentication server, an MS-CHAP password can be the user‘s Windows password stored on a domain
controller. This allows centralized management of the username and password and offers a ‗single sign-on‘ to
connect to the remote access server and access resources in the remote network.
Caution
When data is not available block by block (as with data that comes from a network stream), the encryption system must retain data in memory until all blocks arrive. This could be dangerous from a security point of view.
A means of associating public and private key pairs to the corresponding users is required. That is, there must
be a binding of a user‘s identity and the user‘s public key. This binding may be certified by a mutually trusted
party. For example, a certifying authority could sign credentials containing a user‘s public key and identity to
form a certificate. Systems for certifying credentials and distributing certificates are beyond the scope of this
standard. NIST intends to publish separate document(s) on certifying credentials and distributing certificates.
The integers p, q, and g can be public and can be common to a group of users. A user's private and public keys are x and y, respectively. They are normally fixed for a period of time. Parameters x and k are used for signature generation only, and must be kept secret. Parameter k must be regenerated for each signature. Parameters p, q, x, and k shall be generated as specified in the standard, or using other FIPS-approved security methods.
While Diffie and Hellman provided a general model for digital signatures of any kind, the method developed
by Rivest, Shamir, and Adleman in 1977, known as ―RSA,‖ has become the most proven and most popular,
and achieved the widest adoption by standards bodies and in practice. Two other methods, discrete logarithm
cryptography (including the Digital Signature Algorithm and the Diffie-Hellman key agreement method) and
elliptic curve cryptography have also been embodied in several standards, but neither has yet been as widely
adopted in practice as RSA.
The RSA digital signature scheme applies the sender‘s private key to a message to generate a signature. The
signature can then be verified by applying the corresponding public key to the message and the signature
through the verification process, providing either a valid or invalid result. These two operations — sign and
verify — comprise the RSA digital signature scheme.
Any signature generated by the first operation will always verify correctly with the second operation if the
corresponding public key is used. If the signature was generated differently or if the message was altered after
being signed, then the chances of the second operation verifying correctly are extremely small; with typical
parameters, the chance is roughly 1 in 2^160, or essentially zero. Although there are better ways to forge a
signature than just guessing, the use of a sufficiently large key ensures security by making it computationally
impractical to do so. For instance, it has been estimated to take thousands or even millions of years to break a
given 1024-bit key (find the private key, given the public key), depending on the amount of computing power
applied.
Taking a closer look at the signature generation portion of the process, the first step in generating an RSA
signature is applying a cryptographic hash function to the message. The hash function is specifically designed
to reduce a message of any length to a short number, called the ―hash value‖ (typically 160 bits long), and to
do it in a way such that two conditions are satisfied:
It is difficult to find a message with a specific hash value.
It is difficult to find two messages with the same hash value (an easier problem to solve).
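Both conditions rest on the hash function's avalanche behaviour: any change to the message, however small, produces an unrelated hash value. A quick demonstration using SHA-256 from the standard library (a modern substitute for the 160-bit hashes mentioned above):

```python
import hashlib

h1 = hashlib.sha256(b"pay Bob $100").hexdigest()
h2 = hashlib.sha256(b"pay Bob $900").hexdigest()
# a single changed character yields a completely different digest, which is
# why finding two messages with the same hash value is so hard
assert h1 != h2
```

This is what lets the signature scheme sign the short hash instead of the whole message: altering even one character of a signed document breaks the signature check.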
Caution
Care should be taken while generating encryption/decryption keys; each key must be unique for security reasons.
The first requirement is needed, for example, in financial systems. When a customer‘s computer orders a
bank‘s computer to buy a ton of gold, the bank‘s Computer needs to be able to make sure that the computer
giving the order really belongs to the company whose account is to be debited.
What happens if Alice later denies sending the message? Step 1 is that everyone sues everyone. Finally, when
the case comes to court and Alice vigorously denies sending Bob the disputed message, the judge will ask
Bob how he can be sure that the disputed message came from Alice and not from Trudy. Bob first points out
that BB will not accept a message from Alice unless it is encrypted with KA, so there is no possibility of Trudy
sending BB a false message from Alice.
Bob then dramatically produces Exhibit A, KBB (A, t, P). Bob says that this is a message signed by BB which
proves Alice sent P to Bob. The judge then asks BB (whom everyone trusts) to decrypt Exhibit A. When BB
testifies that Bob is telling the truth, the judge decides in favour of Bob. Case dismissed.
One potential problem with the signature protocol of Figure11.4 is Trudy replaying either message. To
minimize this problem, timestamps are used throughout. Furthermore, Bob can check all recent messages to
see if RA was used in any of them. If so, the message is discarded as a replay. Note that Bob will reject very
old messages based on the timestamp. To guard against instant replay attacks, Bob just checks the RA of every
incoming message to see if such a message has been received from Alice in the past hour: If not, Bob can
safely assume this is a new request.
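Bob's bookkeeping, remember recent nonces and reject anything too old or already seen, can be sketched as a small class. This is our own illustration of the checks described above, not a protocol implementation:

```python
import time

class ReplayGuard:
    """Reject a message whose nonce (RA) was already seen within the window,
    or whose timestamp is older than the window."""
    def __init__(self, window: float = 3600.0):    # one hour, as in the text
        self.window = window
        self.seen = {}                             # nonce -> time of arrival

    def accept(self, nonce: str, timestamp: float) -> bool:
        now = time.time()
        # forget nonces older than the window; they are rejected by timestamp anyway
        self.seen = {n: t for n, t in self.seen.items() if now - t < self.window}
        if now - timestamp > self.window:
            return False                           # too old: reject outright
        if nonce in self.seen:
            return False                           # seen before: a replay
        self.seen[nonce] = now
        return True

guard = ReplayGuard()
assert guard.accept("RA-1", time.time())           # first use of this nonce
assert not guard.accept("RA-1", time.time())       # the instant replay is rejected
```

The timestamp bound is what keeps the nonce cache small: anything older than the window is rejected without needing to be remembered.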
How the Stroke objects that constitute a signature should be collected, stored, and authenticated is a topic area that is not discussed in the Tablet PC Platform SDK or other Microsoft software documentation. This case
provides some initial context for your own research and discussion around these tasks.
The four core discussion and policy areas you should consider with respect to e-signatures are:
Collection—can your e-signatures be collected with a simple Ink Picture control, or with a component that
you have built to match your business needs, or with a third-party product?
Protection—how can the expropriation and misuse of the e-signature‘s Strokes be prevented?
Storage—how and where should the e-signature‘s Strokes be saved, if at all?
Validation—how can your application perform signature recognition (validation) of the collected Strokes?
Questions
1. What is an e-signature?
2. Explain the digital ink signature.
11.11 Summary
Authentication is the process of verifying that information is coming from a trusted source.
CHAP is the successor of the Plain Authentication Protocol (PAP), which transmits the username and
password in clear text over the network media.
Digital Signature Standard (DSS) is the digital signature algorithm (DSA) developed by the U.S. National
Security Agency (NSA) to generate a digital signature for the authentication of electronic documents.
A digital signature is basically a way to ensure that an electronic document (e-mail, spreadsheet, text file,
etc.) is authentic.
Cryptography is a security-related technology.
11.12 Keywords
Cipher text: The encrypted form of a message; it is unreadable until it has been converted back into plain text (decrypted).
Hash function: It is any algorithm or subroutine that maps large data sets, called keys, to smaller data sets.
Plain text: It is clear text.
Private Key: It is a secret key, used in asymmetric encryption.
Public Key: It is a publicly distributed key, used in asymmetric encryption.
RADIUS: It is an authentication protocol.
12.0 Objectives
After studying this chapter, you will be able to:
Explain the business and computer
Discuss the e-mail and e-commerce
Explain the project management
Discuss the computers in personnel administration
Discuss the accounting and marketing
Explain the computers in cost and budget control
Discuss the manufacturing and materials management
Explain the banking and insurance and stock broking
Explain the purchasing and computers in warehousing
12.1 Introduction
Using a computer and the internet, you will learn about information technology and online conferencing by
interacting with other students and your course teacher. The first module in the course teaches you the basics
of information technology, the business role of computers, and how to understand computers. In the second
module of the course, you will learn about the various computer operating systems, applications software, and
hardware add-ons to make computers more effective. The third module of the course deals with
communication systems, computer networks, and the Internet. The final module discusses workplace
implications and issues associated with the World Wide Web.
12.3 E-Mail
Electronic mail, commonly known as email or e-mail, is a method of exchanging digital messages from an
author to one or more recipients. Modern email operates across the Internet or other computer networks. Some
early email systems required that the author and the recipient both be online at the same time, in common with
instant messaging. Today‘s email systems are based on a store-and-forward model. Email servers accept,
forward, deliver and store messages. Neither the users nor their computers are required to be online
simultaneously; they need connect only briefly, typically to an email server, for as long as it takes to send or
receive messages.
An email message consists of three components: the message envelope, the message header, and the message
body. The message header contains control information, including, minimally, an originator‘s email address
and one or more recipient addresses. Usually descriptive information is also added, such as a subject header
field and a message submission date/time stamp.
Originally a text-only (7-bit ASCII and others) communications medium, email was extended to carry multi-
media content attachments, a process standardized in RFC 2045 through 2049. Collectively, these RFCs have
come to be called Multipurpose Internet Mail Extensions (MIME).
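The structure described above (header fields, a text body, and a MIME attachment) can be sketched with Python's standard email library; the addresses, subject, and attachment contents below are invented for illustration:

```python
from email.message import EmailMessage

# Build a message with the components described above:
# header fields (From, To, Subject) and a message body.
msg = EmailMessage()
msg["From"] = "author@example.com"        # originator's address (hypothetical)
msg["To"] = "recipient@example.com"       # one or more recipient addresses
msg["Subject"] = "Meeting agenda"         # descriptive subject header field
msg.set_content("Please find the agenda attached.")  # plain-text body

# MIME extends the original text-only medium to carry attachments.
msg.add_attachment(b"fake pdf bytes", maintype="application",
                   subtype="pdf", filename="agenda.pdf")

print(msg["Subject"])
print(msg.get_content_type())  # becomes multipart once an attachment is added
```

Note that adding the attachment converts the message to a multipart MIME structure, exactly the extension that RFC 2045 through 2049 standardized.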
Three types of email marketing:
Direct Email
Direct email involves sending a promotional message in the form of an email. It might be an announcement of
a special offer, for example. Just as you might have a list of customer or prospect postal addresses to send your
promotions to, so you can collect a list of customer or prospect email addresses.
Retention Email
Instead of promotional email designed only to encourage the recipient to take action (buy something, sign-up
for something, etc.), you might send out retention emails. These usually take the form of regular emails known
as newsletters. A newsletter may carry promotional messages or advertisements, but will aim at developing a
long-term impact on the readers. It should provide the readers with value, which means more than just sales
messages. It should contain information which informs, entertains or otherwise benefits the readers.
Advertising in Other People’s Emails
Instead of producing your own newsletter, you can find newsletters published by others and pay them to put
your advertisement in the emails they send their subscribers. Indeed, there are many email newsletters that are
created for just this purpose to sell advertising space to others.
Caution
Be careful when forwarding mail. Forwarding email is so simple that viruses can quickly infect many
machines. Most viruses do not even require users to forward the email: they scan a user's computer for email
addresses and automatically send the infected message to all of the addresses they find.
12.4 E-Commerce
In its simplest form, ecommerce is the buying and selling of products and services by businesses and consumers
over the Internet. People often use the term "ecommerce" to describe encrypted payments on the Internet.
Sometimes these transactions include the real-time transfer of funds from buyer to seller, and sometimes this is
handled manually through an EFTPOS terminal once a secure order is received by the merchant.
Internet sales are increasing rapidly as consumers take advantage of the lower prices offered by wholesalers
retailing their products. This trend is set to strengthen as Web sites address consumer security and privacy concerns.
Mobile Computing
Mobile Computing involves the development and deployment of specialized software and technologies that
enable mobile and hand-held computing devices, such as smart phones, PDAs, and pocket PCs, to function. In
this knowledge domain, students apply their software development and deployment knowledge to many
different business and scientific areas such as the internet, desktop computing and enterprise-scale
communications.
Students in this area learn the theory and practice of creating highly available and secure voice and data
networks. Areas of emphasis include routing and switching, system and network administration, and system
management.
Wireless Networking
In Wireless Networking, students have the opportunity to apply their computing fundamentals to the growing
world of wireless communications and technologies. This includes all aspects of radio-frequency-based
communications, the technologies used for multi-point digital communications, and the application of these
technologies to corporate, IT project and process management.
In project and process management, students will augment their fundamental skills in computing and
information technology through advanced courses and ongoing research in the skills needed for successful
project management. These include human resource management, change control, risk management, and the
tools and techniques required for scope definition and management.
In this knowledge domain, students extend their skills in database systems to develop specialized data
management solutions for both business management efforts and discovery support systems in the life
sciences.
12.7 Accounting
Accounting is the systematic recording, reporting, and analysis of the financial transactions of a business. The
person in charge of accounting is known as an accountant, and this individual is typically required to follow a
set of rules and regulations, such as the generally accepted accounting principles. Accounting allows a company
to analyze the financial performance of the business and look at statistics such as net profit.
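As a minimal sketch of this idea (all figures invented), a list of recorded transactions can be summarized to derive a statistic such as net profit:

```python
# Hypothetical recorded transactions: positive amounts are revenue,
# negative amounts are expenses.
transactions = [
    ("sale", 12000.0),
    ("sale", 8500.0),
    ("rent", -3000.0),
    ("wages", -6500.0),
]

# Summarize the transactions into the figures an accountant reports.
revenue = sum(amount for _, amount in transactions if amount > 0)
expenses = -sum(amount for _, amount in transactions if amount < 0)
net_profit = revenue - expenses

print(revenue, expenses, net_profit)  # 20500.0 9500.0 11000.0
```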
12.7.1 Purpose of Accounting
1. Economists may define it as the practical application of economic theory in that it measures income and
values assets.
2. Corporate managers may define it as a set of timely gauges that helps them actually manage the
organisation.
3. Labour unions may see it as a monitor of an organisations activities and performance, particularly in
relation to the benefits secured by employees versus owners.
4. A Board of directors or a Chief Executive Officer (CEO) may see accounting as a data process and
reporting system that provide the information needed for sound financial or economic decision making for
their organisation.
5. Banks and other providers of loan funds may see it as a process of providing reports showing the financial
position of an organisation in relation to the assets owned, amounts owed to others and monies invested as
well as the profitability of the organisation‘s operations in relation to repaying the loan with interest.
6. Governments may see it as a way of making organisations accountable to the general community by way
of taxation contributions and transparency in the outcomes from their decision making.
7. Potential investors may see it as a method of evaluating an organisation‘s effectiveness in relation to
industry benchmarks and the investor‘s required returns.
8. Investors in some failed enterprises may sadly call it a method of fooling some of the people, some of the
time.
Financial Accounting
Financial accounting is focused on producing a limited set of specific prescribed financial statements in
accordance with generally accepted accounting principles. The central outputs from financial accounting are
audited financial statements such as the balance sheet and income statement that provide a scorecard by
which a company's overall past performance can be judged by outsiders.
This branch of accounting targets those external stakeholders that have an interest in the reporting enterprise,
but that are not involved in the day-to-day operations. The reports produced by this branch are used for so
many different purposes that it is often called "general-purpose accounting". In addition to the financial
statements, external stakeholders also have access to financial reporting via press releases that are sent directly
to investors and creditors or via the open communications of the internet.
The emphasis in financial accounting is on summaries of the financial consequences of past activities and
decisions. So only summarized data covering the entire organization is prepared. The data prepared must be
objective, precise, and verifiable, usually by an outside 'auditor'. This style of reporting must follow the
generally accepted accounting principles that are set by peak accounting bodies in conjunction with
government agencies. The numbers used in financial accounting are historical in nature. Whilst appearing
set in stone, financial statements are actually based on estimates, judgments, and assumptions. This is why
financial statements usually include 'notes to the accounts', which are explanations from management that
help interpret the numerical information. A more specialised area of financial accounting is tax
accounting.
Management Accounting
Managerial accounting deals with information that is not made public and is used for internal decision making
only. These reports are far more detailed than financial accounting and can cover performances and activities
by departments, products, customers, and employees. It is an accounting system that helps management
achieve the goals and objectives of the organisation with an emphasis on the measurement, analysis,
communication and the control of financial and non-financial information. This branch of accounting is
primarily interested in assisting the organisation‘s department heads, division managers, and supervisors make
better decisions about the day-to-day operations of the business and in particular, those relating to the planning
and control decisions. The essential data is conveyed in a wide variety of reports and is specifically targeted at
those who direct and control the organisation. These reports help to promote more efficient and effective
planning, organizing of resources, directing and motivating of personnel, performance evaluation, and
operational control. Unlike financial accounting, there are no external rules governing management accounting. The
emphasis in this branch is on making decisions that affect the future with results being compared to budgets,
activity-based costing, and financial planning or to industry benchmarks. These reports are delivered
frequently and in a timely way according to the requirements of management. Most reports are analytical in
nature with a heavy emphasis on variances in the key indicators that monitor the financial performance of the
business unit. A more specialised area of management accounting is cost accounting.
Caution
Transactions should be made secure to prevent accounts from being hacked.
Budgetary Control
No system of planning can be successful without having an effective and efficient system of control.
Budgeting is closely connected with control. The exercise of control in the organization with the help of
budgets is known as budgetary control. The process of budgetary control includes:
1. Preparation of various budgets.
2. Continuous comparison of actual performance with budgetary performance.
3. Revision of budgets in the light of changed circumstances.
A system of budgetary control should not become rigid. There should be enough scope for flexibility to provide
for individual initiative and drive. Budgetary control is an important device for making the organization more
efficient on all fronts. It is an important tool for controlling costs and achieving the overall objectives.
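Step 2 of the budgetary control process, the continuous comparison of actual performance with budgeted performance, can be sketched as follows (the budget heads and figures are invented):

```python
# Budgeted versus actual figures for a few hypothetical expense heads.
budget = {"materials": 50000, "labour": 30000, "overheads": 12000}
actual = {"materials": 54000, "labour": 28500, "overheads": 12000}

# A positive variance means actual spending exceeded the budget,
# which would trigger investigation or a revision of the budget (step 3).
variances = {head: actual[head] - budget[head] for head in budget}
for head, v in variances.items():
    print(f"{head}: variance {v:+d}")
```

In practice such comparisons are produced periodically, and persistent variances feed back into the revision of budgets in the light of changed circumstances.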
2. Which one of the following is NOT a benefit for the entrepreneur from the accounting process?
(a) It is a reality check (b) It helps calculate the tax assessment
(c) It only has to report the good results (d) It helps monitor key indicators
4. Which one of the following is NOT a function usually performed by the accountant? ..............
(a) Cash flow and profit forecasting
(b) Recording the day-to-day financial transactions
(c) Resolving complex financial reporting issues
(d) Tax planning
5. Computer-based control systems can be combined with manufacturing technology, such as robots, machine
tools, and automated guided vehicles, to improve manufacturing operations.
(a) True (b) False
12.10.2 Banking
Computers are getting more sophisticated. They have given banks a potential they could only dream about and
have given bank customers high expectations. The changes that new technologies have brought to banking are
enormous in their impact on officers, employees, and customers of banks. Advances in technology are
allowing for delivery of banking products and services more conveniently and effectively than ever before thus
creating new bases of competition. Rapid access to critical information and the ability to act quickly and
effectively will distinguish the successful banks of the future. The bank gains a vital competitive advantage by
having a direct marketing and accountable customer service environment and new, streamlined business
processes. Consistent management and decision support systems provide the bank that competitive edge to
forge ahead in the banking marketplace.
Major Applications
The advantages accruing from computerization are three-directional: to the customer, to the bank, and to the
employee.
For the Customer
Banks are aware of customer's need for new services and plan to make them available. IT has increased the
level of competition and forced them to integrate the new technologies in order to satisfy their customers. They
have already developed and implemented a certain number of solutions among them:
Self-inquiry Facility: Facility for logging into specified self-inquiry terminals at the branch to inquire and
view the transactions in the account.
Remote Banking: Remote terminals at the customer site connected to the respective branch through a
modem, enabling the customer to make inquiries regarding his accounts, on-line, without having to move
from his office.
Anytime Banking- anywhere Banking: Installation of ATMs which offer non-stop cash withdrawal,
remittances and inquiry facilities. Networking of computerized branches inter-city and intra-city will
permit customers of these branches, when interconnected, to transact from any of these branches.
Tele Banking: A 24-hour service through which inquiries regarding balances and transactions in the
account can be made over the phone.
Electronic Banking: This enables the bank to provide corporate or high value customers with Graphical
User Interface (GUI) software on a PC, to inquire about their financial transactions and accounts, cash
transfers, cheque book issue and inquiry on rates without visiting the bank. Moreover, LC text and details
on bills can be sent by the customer, and the bank can download the same. The technology used to provide
this service is called electronic data interchange (EDI). It is used to transmit business transactions in
computer-readable form between organizations and individuals in a standard format.
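Real EDI uses standardized formats such as ANSI X12 or UN/EDIFACT. As a loose illustration of the underlying idea only (not EDI syntax), a business transaction expressed in a machine-readable, agreed-upon structure, with all field names and values invented, might look like:

```python
import json

# A hypothetical payment order as a structured record. Both parties
# must agree on the field names and meanings in advance, which is
# what an EDI standard formalizes.
transaction = {
    "type": "payment_order",
    "payer_account": "0000111122",
    "payee_account": "0000333344",
    "amount": 2500.00,
    "currency": "INR",
}

# Serialize to a machine-readable wire format and parse it back,
# as the receiving organization's system would.
wire_format = json.dumps(transaction, sort_keys=True)
received = json.loads(wire_format)
print(received["amount"])
```

The point is that the format is unambiguous to software at both ends, so transactions can be processed without manual re-keying.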
12.11.1 Purchasing
Purchasing is the activity of acquiring goods or services to accomplish the goals of an organization.
The major objectives of purchasing are to:
1. Maintain the quality and value of a company's products.
2. Minimize cash tied-up in inventory.
3. Maintain the flow of inputs to maintain the flow of outputs.
4. Strengthen the organization‘s competitive position.
Purchasing may also involve
Development and review of the product specifications
Receipt and processing of requisitions
Advertising for bids
Bid evaluation
Award of supply contracts
Inspection of goods received
Their appropriate storage and release
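The bid evaluation and contract award steps above can be sketched with a deliberately simple rule: among bids that meet the product specification, award the supply contract to the lowest price. The suppliers, prices, and rule itself are invented for illustration; real evaluation usually weighs quality, delivery, and supplier reliability as well.

```python
# Hypothetical bids received after advertising for bids.
bids = [
    {"supplier": "A", "price": 980.0, "meets_spec": True},
    {"supplier": "B", "price": 850.0, "meets_spec": False},  # rejected on spec
    {"supplier": "C", "price": 910.0, "meets_spec": True},
]

# Bid evaluation: screen out bids that fail the product specification,
# then award to the lowest-priced qualified bid.
qualified = [b for b in bids if b["meets_spec"]]
winner = min(qualified, key=lambda b: b["price"])
print(winner["supplier"])  # C
```

Note how the cheapest bid (B) loses because it fails the specification review, reflecting the objective of maintaining quality and value, not just minimizing cost.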
12.13 Summary
Computers are used to help design products using computer-generated models and 3D drawings. This reduces
the need to build physical models, known as prototypes, to test certain conditions.
Manufacturing operational control focuses on day-to-day operations, and the central idea of this process is
effectiveness and efficiency.
Maintains a thorough knowledge of all (Electronic Banking) EB policies and procedures, protocols,
authorizations, interfaces with external EB systems, as well as emerging EB technology and applications.
Managerial accounting deals with information that is not made public and is used for internal decision
making only.
Inventory control and management is a crucial process, especially in establishments related to retail and
production.
12.14 Keywords
Banks: Are aware of customer's need for new services and plan to make them available. IT has increased the
level of competition and forced them to integrate the new technologies in order to satisfy their customers.
Computer-integrated Manufacturing (CIM): Computers control the whole production line. Best example is in
car production where robots undertake much of the work, reducing the need for labour to perform boring,
routine tasks.
Data Warehouses: Contain a wide variety of data that present a coherent picture of business conditions at a
single point in time.
Information Technology: Concerned with improvements in a variety of human and organizational problem-
solving endeavors through the design, development, and use of computer-based systems.
Internal Storage: It allows transferring metadata together with the data it describes; thus, metadata is always
at hand and can be manipulated easily.
1.0 Objectives
After studying this chapter, you will be able to:
Understand system concepts
Discuss the characteristics of system
Explain types of system
Discuss management information system (MIS)
Discuss decision support system (DSS)
Discuss enterprise resource planning (ERP) systems
1.1 Introduction
The concept "system" refers to wholeness, interrelationships between parts or elements, and self-regulation.
It is a systematic organisation of elements that operate in a unique way. Take the example of a motorbike:
it has different parts, viz. brake, handle, gear, battery, etc.
All these parts have their own functions; they are the elements of the system. If any part does not function,
the other parts are also affected and the bike cannot function. This shows that the elements are interrelated
and interdependent, functioning together towards the bike's effective operation. With all these
characteristics, the bike becomes a system. Hence a system has a number of elements functioning
together in an interrelated and interdependent manner towards the attainment of certain functions of
the system as a whole. A system is also dependent on its surroundings. Man is a social animal who
lives in a more or less organised group of people known as society. If we apply the concept
of system as described above, society can be considered a system, with a set of goals to be achieved and
different sections with different functions working towards these common goals. A
society has one set of elements working toward the goal of managing funds for the welfare of
the people, another set for taking care of the health of the people, another set for the education and
employment of the people, and so on. Unless all these different sections
of the society work effectively in a coordinated fashion, the goal of the society, i.e. its successful
operation, cannot be achieved.
Theoretical Framework
An open system exchanges matter and energy with its surroundings. Most systems are open systems,
like a car, coffeemaker, or computer. A closed system exchanges energy, but not matter, with its
environment, like Earth or the project Biosphere 2. An isolated system exchanges neither matter
nor energy with its environment. A theoretical example of such a system is the Universe.
Process and Transformation Process
A system can also be viewed as a bounded transformation process, that is, a process or collection of
processes that transforms inputs into outputs. Inputs are consumed; outputs are produced. The
concept of input and output here is very broad. For example, an output of a passenger ship is the
movement of people from departure to destination.
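The transformation view can be sketched as a toy function: inputs are consumed and an output is produced. The coffeemaker model and its 90% yield figure below are invented purely for illustration:

```python
# A system viewed as a bounded transformation process:
# inputs are consumed, outputs are produced.
def coffeemaker(inputs):
    """Transform water and ground coffee (inputs) into brewed coffee (output)."""
    consumed = {"water_ml": inputs["water_ml"], "coffee_g": inputs["coffee_g"]}
    # Assume some water is retained by the grounds, so output volume is lower.
    output = {"coffee_ml": consumed["water_ml"] * 0.9}
    return output

result = coffeemaker({"water_ml": 500, "coffee_g": 30})
print(result)  # {'coffee_ml': 450.0}
```

The same shape fits any system in this view: a passenger ship "consumes" people at the departure port and "produces" them at the destination.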
Subsystem
A subsystem is a set of elements, which is a system itself, and a component of a larger system.
System Model
A system comprises multiple views. For the man-made systems it may be such views as planning,
requirement (analysis), design, implementation, deployment, structure, behaviour, input data, and
output data views. A system model is required to describe and represent all these multiple views.
System Architecture
System architecture, using one single integrated model for the description of multiple views such as
planning, requirement (analysis), design, implementation, deployment, structure, behaviour, input
data, and output data views, is a kind of system model. Scholars in various disciplines who are
concerned about the tendency toward the fragmentation of knowledge and the increasing complexity
of phenomena have sought a unifying approach to knowledge. The biologist Ludwig von Bertalanffy
developed a general systems theory that applies to any arrangement of elements such as cells, people,
societies or even planets. Norbert Wiener, a mathematician, observed that information and communications
provide connecting links for unifying fragments or elements. His systems concept of information theory,
which shows the parallel between the functioning of human beings and electronic systems, laid the
foundation for today's computer systems.
Systems analysis and information systems were founded in general systems theory, which emphasizes
a close look at all parts of a system. Too often analysts focus on only one component and overlook
other equally important components. General systems theory is concerned with "developing a
systematic, theoretical framework upon which to make decisions". It discourages thinking in a
vacuum and encourages consideration of all the activities of the organization and its external
environment. Pioneering work in general systems theory emphasized that organizations be viewed as
total systems. The idea of systems has become most practical and necessary in conceptualizing the
interrelationships and integration of operations, especially when using computers. Thus a system is a
way of thinking about organizations and their problems. It also involves a set of techniques that helps
in solving problems.
Caution
The important point is that users must know the central objective of a computer application early in
the analysis for a successful design and conversion.
MIS structure
The concept of the MIS has evolved over a period of time, comprising many different facets of the
organizational function. An MIS is a necessity for all organizations.
The initial concept of MIS was to process data from the organization and present it in the form of
reports at regular intervals. The system was largely capable of handling the data from collection to
processing. It was more impersonal, requiring each individual to pick and choose the processed data
and use it for his requirements. This concept was further modified when a distinction was made
between data and information. Information is a product of an analysis of data; the relationship is
similar to that between a raw material and the finished product. What is needed is information, not a
mass of data. However, data can be analyzed in a number of ways, producing different shades and
specifications of information as a product. It was, therefore, demanded that the system concept be
individual-oriented, as each individual may have a different orientation towards the information.
The concept was further modified so that the system should present information in such a form and
format that it creates an impact on its user, provoking a decision or an investigation. It was later
realized that even though such an impact was a welcome modification, some sort of selective
approach was necessary in the analysis and reporting. Hence, the concept of exception reporting was
imbibed in MIS. A norm for an exception had to evolve in the organization, and the concept remained
valid only to the extent that the norm remained true and effective. Since the environment is
competitive and ever changing, fixing the norm for an exception becomes a futile exercise, at least
for the people in the higher echelons of the organization. The concept then evolved that the system
should be capable of handling need-based exception reporting. The need may be that of an individual
or a group of people. This called for keeping all data together in such a form that it can be accessed
by anybody and processed to suit his needs. The concept is that the data is one, but it can be viewed
by different individuals in different ways. This gave rise to the concept of the database, and the MIS
based on the database proved much more effective.
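The idea that the data is one but can be viewed differently by different individuals is exactly what a database supports. A minimal sketch using Python's built-in sqlite3 module, with an invented sales table queried two ways:

```python
import sqlite3

# One shared data store (in memory for this sketch).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("North", "widgets", 100.0),
    ("North", "gadgets", 250.0),
    ("South", "widgets", 175.0),
])

# A regional manager views the same data by region...
north_total = con.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'North'").fetchone()[0]
# ...while a product manager views it by product.
widget_total = con.execute(
    "SELECT SUM(amount) FROM sales WHERE product = 'widgets'").fetchone()[0]

print(north_total, widget_total)  # 350.0 275.0
```

Neither user needs a private copy of the data: each processes the single shared database to suit his own needs.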
Over the period of time when these conceptual developments were taking place, the concept of
end-user computing using multiple databases emerged. This brought a fundamental change in
MIS: the decentralization of the system, with the user of the information becoming independent of
computer professionals. When this became a reality, the concept of MIS changed to that of a
decision-making system. The job of the computer department is to manage the information resource
and leave the task of information processing to the user. The concept of MIS in today's world is a
system which handles the databases, provides computing facilities to the end user, and gives a
variety of decision-making tools to the user of the system.
The concept of MIS gives high regard to the individual and his ability to use information. An MIS
gives information through data analysis. While analyzing the data, it relies on many academic
disciplines, including the theories, principles, and concepts of management science, psychology,
and human behaviour, making the MIS more effective and useful. These academic disciplines are
used in designing the MIS and in evolving the decision support tools for modelling and decision
making.
The foundation of MIS is the principles of management and its practices. An MIS can be evolved for
a specific objective only if it is evolved after systematic planning and design. It calls for an analysis
of the business, management views and policies, organization culture, and the management style.
This is possible only when it is conceptualized as a system with an appropriate design. The MIS,
therefore, relies heavily on systems theory, which offers solutions to handle the complex situations
of input and output flows. It uses theories of communication, which help to evolve a system design
capable of handling data inputs, processing, and outputs with the least possible noise or distortion in
transmitting the information from a source to a destination. It uses the principles of system design,
viz., the ability of continuous adjustment or correction in the system in line with the environmental
changes in which the MIS operates. Such a design helps to keep the MIS tuned to the business
management needs of the organization.
The concept, therefore, is a blend of the principles, theories and practices of Management, Information
and Systems, giving rise to a single product known as the Management Information System (MIS).
The conceptual view of the MIS is shown as a pyramid in Figure 1.1.
The physical view of the MIS can be seen as an assembly of several subsystems based on the databases
in the organization. These subsystems range from data collection and transaction processing to
validating, processing, analyzing, and storing the information in databases. A subsystem could be at
a functional level or at the corporate level. The information evolved through them serves functional
or departmental management, and it provides the information for the management of business at the
corporate level. The physical view of the MIS is shown in Figure 1.2.
Figure 1.2: The physical view of the MIS.
The MIS is a product of a multi-disciplinary approach to business management. The MIS differs from
one organization to another because the people in two organizations involved in the same business are
different. The MIS is for the people in the organization; the MIS model may be the same, but the
contents differ greatly.
The MIS, therefore, is a dynamic concept, subject to change time and again with changes in the
business management process. It continuously interacts with the internal and the external
environment of the business and provides a corrective mechanism in the system so that the changed
needs of information are met effectively. The MIS, therefore, is a dynamic design whose primary
objective is to provide information for decision making, and it is developed considering the
organizational fabric, giving due regard to the people in the organization, the management functions,
and managerial control.
The MIS model of the organization changes over time as the business passes through several phases
of its developmental growth cycle. It supports the management of the business in each phase by giving
the information which is crucial in that phase. Every phase of the growth cycle has its critical success
factors, and the MIS model gives more information on these critical success factors for decision
making.
Caution
MIS needs to be kept under a constant review and modification to meet the corporate needs of the
information, prescribed in product design for the organization.
2. The concept 'system' refers to wholeness, interrelationships between parts or ............ and self-regulation.
(a) object (b) system
(c) elements (d) function
3. Successful functioning of each section is determinant for maintaining the continuity of the society.
(a) True (b) False
4. In computer science and information science, a system is a software system which has components
as its ................. inter-process communications as its behavior.
(a) structure (b) observable
(c) structure and observable (d) None of these
(1) Monetary cost: The decision support system requires investing in an information system to collect
data from many sources and analyze it to support decision making. Some analyses for a Decision
Support System need advanced data analysis, statistics, econometrics, and information systems,
so it is costly to hire the specialists needed to set up the system.
(2) Overemphasize decision making: Clearly the focus of those of us interested in computerized
decision support is on decisions and decision making. Implementing Decision Support System may
reinforce the rational perspective and overemphasize decision processes and decision making. It is
important to educate managers about the broader context of decision making and the social, political
and emotional factors that impact organizational success. It is especially important to continue
examining when and under what circumstances Decision Support System should be built and used.
We must continue asking if the decision situation is appropriate for using any type of Decision
Support System and if a specific Decision Support System is or remains appropriate to use for
making or informing a specific decision.
(3) Assumption of relevance: According to researchers, "Once a computer system has been installed it
is difficult to avoid the assumption that the things it can deal with are the most relevant things for the
manager's concern." The danger is that once Decision Support Systems become common in
organizations, managers will use them inappropriately. There is limited evidence that this occurs.
Again, training is the only way to avoid this potential problem.
(4) Transfer of power: Building Decision Support System, especially knowledge-driven Decision
Support System, may be perceived as transferring decision authority to a software program. This is
more a concern with decision automation systems than with Decision Support System. We advocate
building computerized decision support systems because we want to improve decision making while
keeping a human decision maker in the "decision loop". In general, we value the "need for human
discretion and innovation" in the decision making process.
(5) Unanticipated effects: Implementing decision support technologies may have unanticipated
consequences. It is conceivable, and it has been demonstrated, that some Decision Support Systems
reduce the skill needed to perform a decision task. Some Decision Support Systems overload decision
makers with information and actually reduce decision-making effectiveness. We are sure that other
such unintended consequences have been documented. Nevertheless, most of the examples seem
correctable, avoidable or subject to remedy if and when they occur.
(6) Obscuring responsibility: The computer does not make a "bad" decision; people do.
Unfortunately some people may deflect personal responsibility to a Decision Support System.
Managers need to be continually reminded that the computerized decision support system is an
intermediary between the people who built the system and the people who use the system. The entire
responsibility associated with making a decision using a Decision Support System resides with people
who built and use the system.
(7) False belief in objectivity: Managers who use a Decision Support System may or may not be more
objective in their decision making. Computer software can encourage more rational action, but
managers can also use decision support technologies to rationalize their actions. It is an
overstatement to suggest that people using a Decision Support System are more objective and rational
than managers who are not using computerized decision support.
(8) Status reduction: Some managers argue that using a Decision Support System will diminish their
status and force them to do clerical work. This perceptual problem can be a disadvantage of
implementing a Decision Support System. Managers and IS staff who advocate building and using
computerized decision support need to deal with any status issues that may arise. This perception
may be, or should be, less common now that computer usage is common and accepted in organizations.
(9) Information overload: Too much information is a major problem for people, and many Decision
Support Systems increase the information load. Although this can be a problem, Decision Support
Systems can help managers organize and use information. A Decision Support System can actually
reduce and manage the information load of a user. Decision Support System developers need to try to
measure the information load created by the system and Decision Support System users need to
monitor their perceptions of how much information they are receiving. The increasing ubiquity of
handheld, wireless computing devices may exacerbate this problem and disadvantage.
6. A system is a set of ..........................which are different from relationships of the set or its
elements to other elements or sets.
(a) elements (b) relationships
(c) Both (a) and (b) (d) None of these
7. Natural systems may not have an apparent objective but their outputs can be interpreted as
purposes.
(a) True (b) False
8. Human-made systems are made with purposes that are achieved by the delivery of...............
(a).inputs (b) inputs and outputs
(c) process (d) outputs
Components of ERP:
Customer Relationship Management (CRM) - Sales Force Automation, Quoting & Estimating, Order
Entry
Manufacturing - Forecasting, Material & Production Planning (MPP), Shop Floor Control, Routings,
Capacity Planning & Scheduling, Purchasing, Lot/Serial Control, Inventory, Workflow
Supply Chain - Demand Planning, Purchasing, Supplier Management, Purchasing to Jobs/Projects
Financials - Costing, Accounts Receivable (AR), Accounts Payable (AP), General Ledger (GL)
Human Resources - Labor Collection, Payroll, Benefits
Business Performance Management (BPM) - Business Intelligence (BI), Multi-Entity Consolidation,
Corporate Governance, Reporting
Engineering & Product Lifecycle Management (EPLM) - Parts & Bills of Materials (BOMs), CAD
Interface, Routings, Parts & Product Attributes, Change Management
Business Benefits:
- Synchronization - End-to-end business function integration
- Accessibility - All key business information in one place
- Responsiveness - Real-time workflow and reporting tools
- Decisiveness - Informed decision making
- Consistency - Everyone in the company is on the same page
- Efficiency - Elimination of most or all side systems and manual calculations
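The modular composition of ERP listed above can be sketched as a simple data structure. This is a minimal illustration; the module and function names below follow the list in this section, not any real ERP product, and the lookup helper is hypothetical.

```python
# Sketch of ERP modules mapped to the business functions they provide.
# Names are illustrative only, taken from the component list above.
ERP_MODULES = {
    "CRM": ["Sales Force Automation", "Quoting & Estimating", "Order Entry"],
    "Manufacturing": ["Forecasting", "Material & Production Planning",
                      "Shop Floor Control", "Capacity Planning", "Inventory"],
    "Supply Chain": ["Demand Planning", "Purchasing", "Supplier Management"],
    "Financials": ["Costing", "Accounts Receivable", "Accounts Payable",
                   "General Ledger"],
    "Human Resources": ["Labor Collection", "Payroll", "Benefits"],
}

def modules_supporting(function_name):
    """Return the modules that provide a given business function."""
    return [m for m, funcs in ERP_MODULES.items() if function_name in funcs]

print(modules_supporting("Payroll"))  # → ['Human Resources']
```

Such a mapping makes the "accessibility" benefit concrete: because the modules share one integrated structure, any business function can be located without consulting a side system.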
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
1.8 Summary
Systems analysis and information systems were founded in general systems theory, which
emphasizes a close look at all parts of a system.
An open system has many interfaces with its environment, i.e., it is a system that interacts freely with its
environment, taking input and returning output.
A closed system does not interact with the environment; changes in the environment and
adaptability are not issues for a closed system.
Management information systems provide information that is needed to manage organizations
efficiently and effectively. Management information systems involve three primary resources:
people, technology, and information for decision making.
The initial concept of MIS was to process data from the organization and present it in the form of
reports at regular intervals.
Computerized decision support systems became practical with the development of minicomputers,
timeshare operating systems and distributed computing.
1.9 Keywords
Closed System: A system that cannot exchange matter with its surroundings.
Decision Support System: It is a collection of integrated software applications and hardware that
form the backbone of an organization's decision making process.
Enterprise Resource Planning: It is business management software that allows an organization to use a
system of integrated applications to manage the business.
Management Information System: It is a product of a multidisciplinary approach to business
management.
Physical Systems: These are tangible entities that may be static or dynamic in operation.
Subsystem: It is a set of elements, which is a system itself, and a component of a larger system.
2.0 Objectives
After studying this chapter, you will be able to:
Define system
Explain the concept of system development life cycle
Describe the phases of the system development life cycle
Explain the considerations for candidate systems
2.1 Introduction
The systems development life cycle (SDLC) is the process of understanding how an information
system (IS) can support business needs, designing the system, building it, and delivering it to users.
If you have taken a programming class or have programmed on your own, this probably sounds pretty
simple.
Most of us would like to think that these problems only occur to "other" people or "other"
organizations, but they happen in most companies, as any sampling of significant IT project failures
shows. Even Microsoft has a history of failures and overdue projects (e.g., Windows 1.0, Windows 95).
Although we would like to promote this chapter as a "silver bullet" that will keep you from experiencing
failed IS projects, we must admit that a silver bullet guaranteeing IS development success does not exist.
Instead, this chapter will provide you with several fundamental concepts and many practical
techniques that you can use to improve the probability of success.
The key person in the SDLC is the systems analyst who analyzes the business situation, identifies
opportunities for improvements, and designs an information system to implement them. Being a
systems analyst is one of the most interesting, exciting, and challenging jobs around. As a systems
analyst, you will work with a variety of people and learn how they conduct business. Specifically,
you will work with a team of systems analysts, programmers, and others on a common mission.
In a system the different components are connected with each other and they are interdependent. For
example, the human body represents a complete natural system. We are also bound by many national
systems such as the political system, economic system, educational system and so forth. The objective of
the system demands that some output is produced as a result of processing suitable inputs. A well-
designed system also includes an additional element referred to as 'control' that provides feedback
to achieve desired objectives of the system.
The systems development life cycle (SDLC) is a conceptual model used in project management that
describes the stages involved in an information system development project, from an initial
feasibility study through maintenance of the completed application.
Various SDLC methodologies have been developed to guide the processes involved, including the
waterfall model (which was the original SDLC method); rapid application development (RAD); joint
application development (JAD); the fountain model; the spiral model; build and fix; and synchronize-
and-stabilize. Frequently, several models are combined into some sort of hybrid methodology.
Documentation is crucial regardless of the type of model chosen or devised for any application, and is
usually done in parallel with the development process. Some methods work better for specific types
of projects, but in the final analysis, the most important factor for the success of a project may be
how closely the particular plan was followed.
Caution
Users of the system should be kept up-to-date concerning the latest modifications and procedures.
2. In a system the different components are connected with each other and they are..................
(a) interdependent (b) dependent
(c) inter-process (d) None of these
3. The objective of the system demands that some output is produced as a result of processing the
suitable...................
(a) output (b) inputs
(c) process (d) None of these
5. In the system analysis and design terminology, the system development life cycle also
means..........................
(a) design (b) software development life cycle
(c) Both (a) and (b) (d) None of these
All the data and the findings must be documented in the form of detailed data flow diagrams (DFDs),
a data dictionary, logical data structures and mini-specifications.
The main points to be discussed in this stage are:
Specification of what the new system is to accomplish based on the user requirements.
Functional hierarchy showing the functions to be performed by the new system and their
relationship with each other.
Functional networks, which are similar to the function hierarchy but highlight the functions
which are common to more than one procedure.
List of attributes of the entities – these are the data items which need to be held about each entity
(record).
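One of the deliverables above, the data dictionary, records each entity and its attributes in a uniform format. A minimal sketch, assuming a hypothetical CUSTOMER entity with invented attribute names; real data dictionaries record many more properties per item:

```python
# A minimal data dictionary: one entity (record) with the data items
# held about it. Entity and attribute names are hypothetical.
data_dictionary = {
    "CUSTOMER": {
        "cust_id": {"type": "int",     "length": 6,  "description": "Unique customer number"},
        "name":    {"type": "char",    "length": 40, "description": "Customer full name"},
        "balance": {"type": "decimal", "length": 10, "description": "Outstanding balance"},
    }
}

def attributes_of(entity):
    """List the attribute names held about an entity."""
    return list(data_dictionary[entity].keys())

print(attributes_of("CUSTOMER"))  # → ['cust_id', 'name', 'balance']
```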
2.4.4 System Analysis
Systems analysis is a process of collecting factual data, understanding the processes involved,
identifying problems and recommending feasible suggestions for improving the system's functioning.
This involves studying the business processes, gathering operational data, understanding the information
flow, finding out bottlenecks and evolving solutions for overcoming the weaknesses of the system so
as to achieve the organizational goals. Systems analysis also includes subdividing the complex processes
involving the entire system, and identification of data stores and manual processes.
The major objectives of systems analysis are to find answers for each business process: What is being
done? How is it being done? Who is doing it? When is it being done? Why is it being done? And how can it
be improved? It is more of a thinking process and involves the creative skills of the systems analyst.
It attempts to give birth to a new, efficient system that satisfies the current needs of the user and has
scope for future growth within the organizational constraints. The result of this process is a logical
system design. Systems analysis is an iterative process that continues until a preferred and acceptable
solution emerges.
Preliminary or General Design: In the preliminary or general design, the features of the new system
are specified. The costs of implementing these features and the benefits to be derived are estimated.
If the project is still considered to be feasible, we move to the detailed design stage.
Structured or Detailed Design: In the detailed design stage, computer oriented work begins in
earnest. At this stage, the design of the system becomes more structured. Structured design is a
blueprint of a computer system solution to a given problem, having the same components and
interrelationships among the same components as the original problem. Input, output, databases, forms,
codification schemes and processing specifications are drawn up in detail.
In the design stage, the programming language and the hardware and software platform in which the
new system will run are also decided. There are several tools and techniques used for describing the
design of the system.
These tools and techniques are:
Flowchart
Data flow diagram (DFD)
Data dictionary
Structured English
Decision table
Decision tree
The system design involves:
Defining precisely the required system output
Determining the data requirement for producing the output
Determining the medium and format of files and databases
Devising processing methods and use of software to produce output
Determining the methods of data capture and data input
Designing Input forms
Designing Codification Schemes
Detailed manual procedures
Documenting the Design
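One of the design tools listed above, the decision table, maps combinations of conditions onto actions. A minimal sketch, using a hypothetical credit-approval rule invented for illustration:

```python
# Decision table: each row pairs a combination of condition outcomes
# with the action to take. The rule itself is hypothetical.
# Conditions: (customer_in_good_standing, order_within_credit_limit)
decision_table = {
    (True,  True):  "approve order",
    (True,  False): "refer to credit manager",
    (False, True):  "require prepayment",
    (False, False): "reject order",
}

def decide(good_standing, within_limit):
    """Look up the action for a given combination of conditions."""
    return decision_table[(good_standing, within_limit)]

print(decide(True, False))  # → refer to credit manager
```

Because every combination of conditions appears exactly once, a decision table makes it easy to check that no case has been overlooked, which is harder to verify in free-form prose or nested if-statements.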
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………….
2.4.6 Coding
The system design needs to be implemented to make it a workable system. This demands the coding
of design into computer understandable language, i.e., programming language. This is also called the
programming phase in which the programmer converts the program specifications into computer
instructions, which we refer to as programs. It is an important stage where the defined procedures are
transformed into control specifications by the help of a computer language. The programs coordinate
the data movements and control the entire process in a system. It is generally felt that the programs
must be modular in nature. This helps in fast development, maintenance and future changes, if
required.
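The modularity recommended above can be sketched in miniature: each defined procedure becomes a small unit with one responsibility, and the main program only coordinates them. The function names and the billing rule are hypothetical, chosen purely for illustration:

```python
# Modular structure: input, processing and output are separate units
# that can be maintained or replaced independently. Names are invented.

def read_input(raw):
    """Input module: parse a raw comma-separated record into fields."""
    return raw.strip().split(",")

def process(fields):
    """Processing module: apply the business rule (here, total a bill)."""
    quantity, unit_price = int(fields[1]), float(fields[2])
    return quantity * unit_price

def write_output(item, total):
    """Output module: format the result as a report line."""
    return f"{item}: {total:.2f}"

# The main program only coordinates the modules.
fields = read_input("widget,3,2.50\n")
print(write_output(fields[0], process(fields)))  # → widget: 7.50
```

If the billing rule changes, only `process` is edited; input and output modules are untouched, which is exactly the maintenance benefit the paragraph above describes.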
2.4.7 Testing
Before actually implementing the new system into operation, a test run of the system is done for
removing the bugs, if any. It is an important phase of a successful system. After codifying the whole
programs of the system, a test plan should be developed and run on a given set of test data. The
output of the test run should match the expected results. Sometimes, system testing is considered a
part of the implementation process.
Using the test data, the following test runs are carried out:
Program test
System test
Program test: When the programs have been coded, compiled and brought to working condition,
they must be individually tested with the prepared test data. Any undesirable behaviour must be
noted and debugged (error correction).
System test: After the program test has been carried out for each program of the system and the errors
have been removed, the system test is done. At this stage the test is done on actual data. The complete
system is executed on the actual data. At each stage of the execution, the results or output of the system
are analysed. During the result analysis, it may be found that the outputs do not match the expected
output of the system. In such cases, the errors in the particular programs are identified, fixed
and further tested for the expected output.
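A program test as described above can be sketched with plain assertions: the program is run against prepared test data and its output is compared with the expected results. The function under test and its overtime rule are hypothetical:

```python
# Program test sketch: run one program against prepared test data
# and compare actual output with expected results.

def compute_gross_pay(hours, rate):
    """Program under test (hypothetical): overtime above 40 hours at 1.5x."""
    if hours <= 40:
        return hours * rate
    return 40 * rate + (hours - 40) * rate * 1.5

# Prepared test data with expected results.
test_cases = [
    (40, 10.0, 400.0),  # boundary: exactly 40 hours
    (45, 10.0, 475.0),  # overtime case
    (0,  10.0, 0.0),    # degenerate case
]

for hours, rate, expected in test_cases:
    actual = compute_gross_pay(hours, rate)
    assert actual == expected, f"failed for {hours}h: got {actual}"
print("all program tests passed")
```

The system test then repeats the same idea at a larger scale: instead of one program against prepared data, the complete chain of programs is run against actual data.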
When it is ensured that the system is running error-free, the users are called in with their own actual
data so that the system can be shown running as per their requirements.
2.4.8 Implementation
After having the user acceptance of the new system developed, the implementation phase begins.
Implementation is the stage of a project during which theory is turned into practice.
The major steps involved in this phase are:
Acquisition and Installation of Hardware and Software
Conversion
User Training
Documentation
The hardware and the relevant software required for running the system must be made fully
operational before implementation. The conversion is also one of the most critical and expensive
activities in the system development life cycle. The data from the old system needs to be converted to
operate in the new format of the new system.
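Data conversion as described above can be sketched as a small migration routine: records in the old system's format are read, reshaped, and written in the new system's format. The fixed-width layout and field names below are invented for illustration:

```python
# Conversion sketch: migrating records from a hypothetical old
# fixed-width layout ('NAME' in columns 0-19, 'BALANCE' in 20-27)
# into the new system's record format.

def convert_record(old_line):
    """Reshape one old-format line into a new-format record."""
    name = old_line[:20].strip()
    balance = float(old_line[20:28])
    return {"name": name, "balance": balance}

old_file = [
    "ACME SUPPLIES       00123.50",
    "NORTHERN TOOLS      00047.25",
]
new_records = [convert_record(line) for line in old_file]
print(new_records[0])  # → {'name': 'ACME SUPPLIES', 'balance': 123.5}
```

In practice the conversion run is verified record by record (counts and control totals before and after), which is one reason the text calls conversion among the most critical and expensive activities in the life cycle.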
The database needs to be set up with security and recovery procedures fully defined. During this
phase, all the programs of the system are loaded onto the user's computer. After loading the system,
training of the user starts.
Main topics of such type of training are:
How to execute the package
How to enter the data
How to process the data (processing details)
How to generate the reports
After the users are trained on the computerized system, operations have to shift from manual to
computerized working. This process is called 'changeover'.
2.4.9 Maintenance
Maintenance is necessary to eliminate errors in the system during its working life and to tune the
system to any variations in its working environments. It has been seen that there are always some
errors found in the systems that must be noted and corrected. It also means the review of the system
from time to time.
The review of the system is done for:
Knowing the full capabilities of the system
Knowing the required changes or the additional requirements
Studying the performance
If a major change to a system is needed, a new project may have to be set up to carry out the change.
The new project will then proceed through all the above life cycle phases.
2.5.3 Prototyping
Prototyping is especially useful in situations where the requirements (and therefore the costs) are
poorly defined or when speed is needed. However, it requires effective management to make sure that
the iterations of prototyping do not continue indefinitely. It is important to have tools such as 4GLs
and screen generators when using this approach. If the project is large, it is probably better to
establish the information requirements through prototyping and then use a more formal SDLC to
complete the system.
Prototyping is the process of building a model of a system. In terms of an information system,
prototypes are employed to help system designers build an information system that is intuitive and
easy for end users to manipulate. Prototyping is an iterative process that is part of the analysis phase of the
systems development life cycle. During the requirements determination portion of the systems
analysis phase, system analysts gather information about the organization's current procedures and
business processes related to the proposed information system. In addition, they study the current
information system, if there is one, and conduct user interviews and collect documentation. This
helps the analysts develop an initial set of system requirements. Prototyping can augment this process
because it converts these basic, yet sometimes intangible, specifications into a tangible but limited
working model of the desired information system. The user feedback gained from developing a
physical system that the users can touch and see facilitates an evaluative response that the analyst can
employ to modify existing requirements as well as to develop new ones. Prototyping comes in many
forms – from low-tech sketches or paper screens (PICTIVE), onto which users and developers can paste
controls and objects, to high-tech operational systems using CASE (computer-aided software
engineering) tools or fourth-generation languages, and everything in between. Many organizations use
multiple prototyping tools. For example, some will use paper in the initial analysis to facilitate
concrete user feedback and then later develop an operational prototype using fourth generation
languages, such as Visual Basic, during the design stage.
7. The...............may also reject the proposal or request some modifications in the proposal.
(a) organization (b) management
(c) system analysis (d) None of these
8. The feasibility study is basically the test of the proposed system in the light of its workability,
meeting user's requirements, effective use of resources and the cost effectiveness.
(a) True (b) False
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
2.6 Summary
The systems development life cycle (SDLC) is a conceptual model used in project management
that describes the stages involved in an information system development project, from an initial
feasibility study through maintenance of the completed application.
Preliminary system study is the first stage of the system development life cycle. It is a brief
investigation of the system under consideration and gives a clear picture of what the physical
system actually is.
The main goal of the feasibility study is not to solve the problem but to determine its scope. In the
process of the feasibility study, the costs and benefits are estimated with greater accuracy to find the
Return on Investment (ROI).
Maintenance is necessary to eliminate errors in the system during its working life and to tune the
system to any variations in its working environments.
Prototyping is especially useful in situations where the requirements (and therefore the costs) are
poorly defined or when speed is needed.
2.7 Keywords
Feasibility Study: It is basically the test of the proposed system in the light of its workability,
meeting user's requirements, effective use of resources and, of course, the cost effectiveness.
Program Test: When the programs have been coded, compiled and brought to working condition,
they must be individually tested with the prepared test data.
Prototyping: It is the process of building a model of a system. In terms of an information system,
prototypes are employed to help system designers build an information system that is intuitive and
easy for end users to manipulate.
Structured Design: It is a blueprint of a computer system solution to a given problem, having the same
components and interrelationships among the same components as the original problem.
System Life Cycle: It is an organizational process of developing and maintaining systems. It helps in
establishing a system project plan, because it gives an overall list of processes and sub-processes
required for developing a system.
Systems Development Life Cycle (SDLC): It is the process of understanding how an information
system (IS) can support business needs, designing the system, building it, and delivering it to users.
3.0 Objectives
After studying this chapter, you will be able to:
Explain the concept of the systems analyst
Discuss the historical perspective of the systems analyst
Understand the role and working of a systems analyst
Explain how systems analysis and feasibility studies work
3.1 Introduction
System development can generally be thought of as having two major components: systems analysis and
systems design. In systems analysis, more emphasis is given to understanding the details of an
existing system or a proposed one, and then deciding whether the proposed system is desirable or not
and whether the existing system needs improvements. Thus, systems analysis is the process of
investigating a system, identifying problems, and using the information to recommend improvements
to the system.
The systems analyst is the person (or persons) who guides the development of an information
system. In performing these tasks the analyst must always match the information system objectives
with the goals of the organization. The role of the systems analyst differs from organization to
organization. The most common responsibilities of a systems analyst are the following:
Caution
The systems analyst must have a solid understanding of computer hardware and software and should
keep up to date with all the latest technologies.
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
3. From time to time, the users update the analyst with the necessary information for developing the
system.
(a) True (b) False
Caution
Systems analysts must be familiar with an organization's needs from top to bottom in order to set up
a computer network that performs all the jobs a company requires.
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
But they were all able to see and grasp big-picture concepts very quickly, and break them down into
subcomponents. People who have a computer science or math background tend to be very technical,
and sometimes that can be a hindrance." Systems analysts need to be independent thinkers – people
who can "think out of the box" by grasping concepts quickly and seeing the big picture as opposed to
the small details. "We also look for someone who is self-motivated – here, take the ball and run with it,
and come back if you have any issues," says one employer who heads up a technology group.
Many systems analysts come from creative backgrounds; some return to those fields, while others
combine their artistic passions with Internet opportunities. "If I left my position and was able to do
anything, I would go back to photography or painting, or apply those talents to Web design," says one
systems analyst.
3.10 Summary
A systems analyst researches problems, plans solutions, recommends software and systems, and
coordinates development to meet business or other requirements.
The systems analyst is the person (or persons) who guides the development of an
information system.
A successful systems analyst must acquire four skills: analytical, technical, managerial, and
interpersonal.
The systems analyst role leads and coordinates requirements elicitation and use-case modelling by
outlining the system's functionality and delimiting the system.
The primary objective of any system analyst is to identify the need of the organization by
acquiring information by various means and methods.
3.11 Keywords
End User: It is the person for whom a software program or hardware device is designed.
Computer Platform: It includes a hardware architecture and a software framework (including
application frameworks), where the combination allows software, particularly application software, to
run.
Interpersonal Skills: Such skills are required at various stages of development process for interacting
with the users and extracting the requirements out of them.
Problem Solving Skills: A systems analyst needs strong problem-solving skills for defining
alternative solutions to the system and for handling the problems occurring at the various stages of the
development process.
System Analyst: It is the person who selects and configures computer systems for an organization or
business.
4.0 Objectives
After studying this chapter, you will be able to:
Understand system planning
Explain why system planning is necessary
Define strategic MIS planning
Discuss managerial and operational MIS planning
Determine the user's requirements
Explain strategies for determining information requirements
4.1 Introduction
Planning is fundamental to the way our cities, towns and villages look, the way they work, and the
way they relate to each other. When we get planning right, our goals for society become easier to achieve.
Good planning can have a huge beneficial effect on the way we live our lives. It must have a vision of
how physical development can improve a community. We need a simpler, faster, more accessible
system that serves both business and the community.
Figure 4.2 shows the basic process for this phase of the Systems Development Life Cycle (SDLC).
The second major task is to perform a preliminary investigation of the selected project or projects.
Figure 4.4 shows this second task. The review process is typically initiated by someone submitting a
systems request proposal. The proposal is then evaluated.
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
Their treatment in my posts will be focused on how they are applied during systems planning.
Each planning activity is described in terms of:
Objectives
Responsibilities
Inputs
Method
Working documents
Tools
Deliverables
Variations
Non-analytical duties that form part of the planner's responsibilities within a project management
administrative framework, such as progress reporting and budgetary control, are covered in Project
Management posts.
4.3.1 Relationship to Systems Integration Life Cycle
System planning is the first phase of the life cycle, and when it is completed, enough work will have
been done to initiate the individual projects identified in the plan. These projects will, themselves,
involve additional analysis and design, leading to development, implementation and operation of
systems. Systems integration, in other words, is a process of progressively more detailed analysis and
design that leads to working system solutions.
The relationship between the depth of analysis carried out in systems planning and in subsequent
phases of the life cycle is shown in the Figure 4.6.
The Figure 4.6 illustrates the relationship for business processes, but a similar relationship holds true
for all aspects of the systems plan (e.g., data, hardware and communications technologies, and
applications, organization of human resources to implement the plan, and so forth).
Limited Resources
Information management covers a wide range of activities. It is impossible to address all areas of
information management at the same time on a continuous basis. Developing a plan will help you set
priorities and improve the management of information to support your ministry's business objectives.
Coordination
Developing ministry information management plans can lead to greater coordination across the
ministry. This coordination can help you identify needs as well as implement and leverage best
practices across the ministry.
Caution
Attempting to change information management practices prematurely can lead to failure in an
organization.
Define vision and future state. Before developing the plan, the team should define the vision and
future state that you are trying to achieve. The vision and guiding principles of the Information
Management Framework can set the stage for this discussion. However, it is important to define the
future state in terms of the ministry‘s business objectives and operating environment.
Assess current state. It is important to understand current practices related to information
management. This will help you identify your ministry's information volumes and growth rates, its
strengths (i.e., good practices that are in place) and any gaps, and will help you to recognize where you
are starting from.
Identify gaps and set priorities. By analyzing the results of the interviews and the current state
assessment, the project team can then begin to identify gaps between current practices and the planned
future state. For each gap, you will need to assess "what needs to be done," "what is the level of effort
required," and "what the likely impact on the organization is."
Develop the action plan. Based on your assessment of priorities, the next stage is to develop the
action plan. The action plan should address all gaps. Typically, the action plan will be a three to five
year strategic and tactical plan to improve information management practices in your ministry with
the outcome of moving closer to the future state identified earlier in the process. The action plan
should include an estimate of resources – human and financial – required to carry out the plan.
Validate the plan. Because information management occurs in a distributed manner in most
organizations, it is important to validate your plan with staff. This will allow you the opportunity to
gain "buy-in" and to show how the plan will improve productivity and help the ministry achieve its
business objectives. Based on the validation, you may end up making adjustments to the plan.
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
2. One of the major reasons for the form is to have a ....................in place in the case of very large
number of requests.
(a) information system (b) tracking mechanism
(c) system mechanism (d) None of these
In a specific case, one of the strategies may be used as the primary strategy; others may be used as
supplementary strategies. The set of four strategies is applicable both to organizational information
requirements determination and to application requirements. For each strategy, there are a number of
methods and methodologies that are in use (or have been proposed). In the discussion of strategies,
some methods or methodologies will be used as illustrations; no attempt will be made to provide a
comprehensive list. In addition to strategies and methods for eliciting requirements, there are also
strategies and methods for obtaining assurance that requirements are complete and correct and that
systems as implemented meet those requirements. A complete strategy for information system analysis,
design, and implementation should include both an eliciting strategy and a quality assurance strategy.
The selection of an assurance strategy has been described elsewhere; this paper focuses only on the
strategy for eliciting or determining the information requirements. It is not directed at life cycle or
other methodologies for assurance.
4.9 Prototyping
Prototyping is a method used by designers to acquire feedback from users about future designs.
Prototypes are similar to mock-ups, but are usually not as low-fidelity as mock-ups and appear
slightly later in the design process. Prototypes may be horizontal or vertical: A horizontal prototype
appears to have a very broad range of the intended future features, but only very little of the actual
functionality of the features is implemented. For example, a horizontal prototype of a computer
application may have a very well-developed and broad user interface (the horizontal dimension) but
not much of the underlying functionality implemented (the vertical dimension, i.e. the deeper
layers of the software). Correspondingly, a vertical prototype has only very few features, which on
the other hand are almost fully implemented, or at least so-called "walking skeletons".
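The contrast between the two prototype types can be sketched in code. The classes, feature names, and sample data below are invented purely for illustration:

```python
# A minimal sketch contrasting horizontal and vertical prototypes.
# All class names, methods, and data here are hypothetical illustrations.

class HorizontalPrototype:
    """Broad but shallow: every feature is visible, none is implemented."""

    def search_orders(self):
        return "NOT IMPLEMENTED"   # the screen exists, the logic does not

    def print_invoice(self):
        return "NOT IMPLEMENTED"

    def export_report(self):
        return "NOT IMPLEMENTED"


class VerticalPrototype:
    """Narrow but deep: only one feature, implemented end to end."""

    def __init__(self):
        # stand-in for a real data store
        self._orders = {101: "10 widgets", 102: "5 gears"}

    def search_orders(self, order_id):
        # the single "walking skeleton" slice, working through all layers
        return self._orders.get(order_id, "order not found")


if __name__ == "__main__":
    print(HorizontalPrototype().search_orders())   # broad, shallow
    print(VerticalPrototype().search_orders(101))  # narrow, deep
```

A user testing the horizontal prototype sees the breadth of the planned interface; a user testing the vertical prototype exercises one feature through all of its layers.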
7. A ........................only has very few features and is almost fully implemented or at least so-called
"walking skeletons".
(a) clustering reports (b) vertical prototype
(c) horizontal prototype (d) All of these
3. The Project Team: This ad hoc team will consist of representatives from the user area, systems
analyst(s), and any necessary technical resources that may be required as the information systems
project proceeds through the SDLC stages.
Operational feasibility investigates whether there will be sufficient support for the project from
management and users.
Technical feasibility determines if the necessary technology exists and is capable of providing
adequate service.
Economic feasibility represents a general determination of whether the resulting benefits will exceed
the anticipated costs of the solution.
Schedule feasibility determines if the resulting solution can be implemented within a time frame that
will prove beneficial to the organization.
An initial investigation report is prepared at the end of this stage to document the specific problem
and what has been determined through the initial investigation. This report represents closure to a
reported problem and allows management, through the steering committee, to make a decision about
the allocation of scarce resources to the resolution of a business problem. The result of this initial
investigation will be a recommendation to either take no action, resolve the problem through
personnel changes (either reassignment or training), or continue with a subsequent stage of the
SDLC.
4.11 Summary
The system planning phase is the starting point for the systems analysis and design process.
Depending on the size of the organization, this proposal review process may be formal or
informal.
A preliminary investigation is performed where the facts surrounding the project are researched.
A business functional model is defined by analyzing major functional areas of a business.
Data architecture is derived from the business function model by combining information
requirements into generic data entities and subject databases.
4.12 Keywords
Alignment Methodologies: The methodologies in the "alignment" category align IS objectives with
organizational goals.
Impact Methodologies: It helps to create and justify new uses of IT.
Life Cycle: Planning can help assure you that you are addressing issues of information management
throughout the life cycle of information assets.
Prototyping: It is a method used by designers to acquire feedback from users about future designs.
System Planning: It is the first phase of the life cycle, and when it is completed, enough work will
have been done to initiate the individual projects identified in the plan.
5.0 Objectives
After studying this chapter, you will be able to:
Understand the nature of information
Discuss about information gathering technique
Explain about samples of existing document
Understand the research and site inspection
Explain about site observation
Explain about interviews
Define types of interviews
Understand the conducting an interview
5.1 Introduction
Information gathering is a key part of the feasibility analysis process. Information gathering is
both an art and a science. It is a science because it requires a proper methodology and tools in order
to be effective. It is an art too, because it requires a sort of mental dexterity to achieve the best
results.
Information gathering is therefore the process of collecting information about the present system. We
must know what information to gather, where to find it, how to collect it, and ultimately how to
process the collected information.
Value of information
Unlike other tangible resources, information is not readily quantifiable; that is, it is impossible to
predict the ultimate value of information to its users. Also, over time, there is no predictable change
in the value of information.
Multiplicative quality of information
The results produced by the use of information differ greatly from those produced by the use of other
resources. For instance, information is not lost when given to others, and does not decrease when
'consumed': sharing information will almost always cause it to increase; that is, information has a
self-multiplicative quality.
Dynamics of information
Information cannot be regarded as a static resource to be accumulated and stored within the confines
of a static system. It is a dynamic force for change to the system within which it operates. It adds
value to an organization through encouraging innovation and change without being tangible.
Individuality of information
Information comes in many different forms, and is expressed in many different ways. Information can
take on any value in the context of an individual situation. This proves that, as a resource,
information is different from most other resources. The very fact that information is characterized as
a dynamic force, "constantly altering and extending a store of knowledge", corresponds with
situations in development in which outside information is offered to focus groups to alter their
understanding of certain practices, which in turn can help them solve problems (such as improving
food security or standards of living).
Apart from the attributes identified above, the following, which also contain elements of
intangibility, may be added to the list:
Alleviation of uncertainty
Information is the resolution of uncertainty. This is perhaps one of the intangible attributes best
known among a variety of researchers.
Interdependency
Information almost always forms part of technology - it is the "soft" part. Without its information
component, technology has little value as a resource for potential users who are not familiar with its
workings or its background. With regard to developing rural communities, one should bear in mind
that it is not necessarily new technology that brings about these achievements. All outside technology
applied for the first time could be viewed as new to the user group or that particular situation, and
could have similar effects.
Context dependency
The value of information as a resource in rural development depends largely on situation-specific
issues: for example, one could argue that agriculture-related information is mostly technical in
nature. However, people with little exposure to modern society have many related issues they need to
know about. For example, certain types of basic information needed for the development of
crop production by traditional farmers have been identified; inter alia, information about
agricultural input (seeds, fertilizer, etc.), extension, technology (farming equipment, etc.),
implementation techniques (ploughing, sowing, pest and weed control), soil, water and climatic
conditions, conservation, credit, marketing and infrastructure.
Culture dependency
Another attribute of information that can influence its usefulness as a development resource is that it
is culture dependent - involving conceptual and cognitive differentiation. Information is culture
specific; it is incommunicable unless acculturated, that is, adapted for the cultural environment or
the cultural mind-set of the recipient group. Moreover, information is not totally value-free, but is
socially conditioned and shaped by the social structures that apply it. This aspect has serious
implications for developers' efforts to transfer information to the rural communities of developing
countries.
Medium dependency
Information is not only culture dependent, but also medium dependent. Once information is
concretized outside the human memory it should be packaged in some or other format (i.e., print,
images, sound, electronic digits, etc.) to be communicated to someone else. Unless receivers know
how to use that particular format, the information will remain inaccessible and rendered useless; for
example, an electronic medium directed at users who are unfamiliar with such facilities can impede
access to available information. Thus, medium dependency of information can have serious
implications for quite a number of rural people who are dependent on oral communication, owing to
their oral tradition and the fact that many of them are not literate. This attribute could cause
information to be a less useful resource when compared with other resources needed for development
purposes.
Conversion dependency
It is a well-known fact that information is not used only in the original form offered by its creator;
often, it needs to be adapted to suit a particular situation or specific circumstances. It can also happen
that only a small chunk of the original information is used together with other chunks of information
to form a new information package needed for a particular situation. In this way, more value can be
added to the appropriateness of information. Particularly in a situation where outside information
from the industrialised world is used to improve a practice in rural development, the information
content needs to be adapted to bring it to the level of understanding of potential recipients.
Caution
Information should be viewed as a corporate resource which, like other resources such as people,
money, raw materials, equipment and energy, must be managed to give a competitive edge.
5.3.2 Reporters
One of the most reliable sources of information (although not completely reliable) is other
journalists. They may be your colleagues or reporters from a news agency which supplies your
organisation. If they are well trained, experienced and objective, their reports will usually be accurate
and can be trusted. However, if any essential facts are missing from their reports, either they will
have to provide them or you will have to find the missing facts yourself. Mistakes can happen. This
is why news organisations should have a system for checking facts. In small newsrooms, where the
reporter may also be the editor or newsreader, the reporter must be especially careful in checking
facts.
There is also the danger that reporters misinterpret what they think they see and then present that as a
fact. This often happens when reporting such things as the size of a crowd. Unable to count every
person in it, they make an estimate, often sharing their guesses with other journalists on the scene.
This is just an estimate, and any report which says "there were 40,000 people present" should be
treated with caution, unless the reporter knows the exact number who came through the gate.
All sources, including reporters, are said to be reliable if we think they can be believed consistently.
If a source is always correct in the information they provide, we will believe them next time. If they
make a mistake, we may doubt what they say. Reliability is built up over time.
Your personal reliability as a journalist is important. If you have a good record for fair and accurate
reporting, you will be believed. If you get a reputation for being careless in your work or biased in
your interpretation, your colleagues, readers or listeners will not be able to rely upon you. In all cases
it is better only to report what you know and make it clear in your report that everything else is either
an estimate, an opinion or the word of someone else, perhaps a witness. You must always try to give
precise facts and attributed opinion. If you cannot do that, you can use phrases like "it is believed that
..." or "it appears that ...". It is better to do this than to leave your readers or listeners believing that
what you have said is a proven fact.
Caution
A reporter's story should be checked by the news editor and then the sub-editor.
The key here consists of the PART and WAREHOUSE fields together, but WAREHOUSE-
ADDRESS is a fact about the WAREHOUSE alone. The basic problems with this design are:
The warehouse address is repeated in every record that refers to a part stored in that warehouse.
If the address of the warehouse changes, every record referring to a part stored in that warehouse
must be updated.
Because of the redundancy, the data might become inconsistent, with different records showing
different addresses for the same warehouse.
If at some point in time there are no parts stored in the warehouse, there may be no record in
which to keep the warehouse's address.
To satisfy second normal form, the record shown above should be decomposed into (replaced by) the
two records:
PART  WAREHOUSE  QUANTITY
WAREHOUSE  WAREHOUSE-ADDRESS
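The decomposition can be illustrated with a short sketch. The field names follow the example in the text; the sample parts, warehouses, and addresses are invented:

```python
# Unnormalised records: WAREHOUSE-ADDRESS repeats for every part stored
# in a warehouse. Sample data is invented for illustration.
unnormalised = [
    {"part": "P1", "warehouse": "W1", "quantity": 20, "warehouse_address": "12 Dock Rd"},
    {"part": "P2", "warehouse": "W1", "quantity": 35, "warehouse_address": "12 Dock Rd"},
    {"part": "P1", "warehouse": "W2", "quantity": 10, "warehouse_address": "9 Canal St"},
]

# Second normal form: split into PART/WAREHOUSE/QUANTITY records and a
# separate WAREHOUSE/WAREHOUSE-ADDRESS record per warehouse.
stock = [
    {"part": r["part"], "warehouse": r["warehouse"], "quantity": r["quantity"]}
    for r in unnormalised
]
warehouses = {r["warehouse"]: r["warehouse_address"] for r in unnormalised}

# The address is now stored once per warehouse, so a change of address
# is a single update instead of one update per stocked part.
warehouses["W1"] = "14 Dock Rd"
print(warehouses)  # {'W1': '14 Dock Rd', 'W2': '9 Canal St'}
```

Note that the warehouse record also survives even when no parts are currently stored in that warehouse, which resolves the last problem listed above.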
The EMPLOYEE field is the key. If each department is located in one place, then the LOCATION
field is a fact about the DEPARTMENT -- in addition to being a fact about the EMPLOYEE. The
problems with this design are the same as those caused by violations of second normal form:
The department's location is repeated in the record of every employee assigned to that
department.
If the location of the department changes, every such record must be updated.
Because of the redundancy, the data might become inconsistent, with different records showing
different locations for the same department.
If a department has no employees, there may be no record in which to keep the department's
location.
To satisfy third normal form, the record shown above should be decomposed into the two records:
EMPLOYEE  DEPARTMENT
DEPARTMENT  LOCATION
There is a slight technical difference between functional dependencies and single-valued facts as we
have presented them. Functional dependencies only exist when the things involved have unique and
singular identifiers (representations). For example, suppose a person's address is a single-valued fact,
i.e., a person has only one address. If we do not provide unique identifiers for people, then there will
not be a functional dependency in the data:
PERSON ADDRESS
Ram Sharma 123 Main St., United States
Ram Sharma 321 Center St., India
Although each person has a unique address, a given name can appear with several different addresses.
Hence we do not have a functional dependency corresponding to our single-valued fact.
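This check can be made concrete with a small sketch. The `holds` helper below is a hypothetical name, and the data comes from the table above:

```python
# A single-valued fact only yields a functional dependency when the
# left-hand side uniquely identifies the right-hand value.
rows = [
    ("Ram Sharma", "123 Main St., United States"),
    ("Ram Sharma", "321 Center St., India"),
]

def holds(pairs):
    """Return True if the left-hand side functionally determines the right."""
    seen = {}
    for lhs, rhs in pairs:
        # setdefault records the first value seen for this lhs;
        # any later, different value breaks the dependency
        if seen.setdefault(lhs, rhs) != rhs:
            return False
    return True

print(holds(rows))  # False: 'Ram Sharma' appears with two different addresses
```

With unique person identifiers instead of names, the same data would satisfy the dependency, which is exactly the distinction the paragraph above draws.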
Note that other fields, not involving multi-valued facts, are permitted to occur in the record, as in the
case of the QUANTITY field in the earlier PART/WAREHOUSE example.
The main problem with violating fourth normal form is that it leads to uncertainties in the
maintenance policies. Several policies are possible for maintaining two independent multi-valued
facts in one record:
(1) A disjoint format, in which a record contains either a skill or a language, but not both:
This is not much different from maintaining two separate record types. (We note in passing that such
a format also leads to ambiguities regarding the meanings of blank fields. A blank SKILL could mean
the person has no skill, or the field is not applicable to this employee, or the data is unknown, or, as
in this case, the data may be found in another record.)
(2) A random mix, with three variations:
(a) Minimal number of records, with repetitions:
(b) Minimal number of records, with null values:
(c) Unrestricted:
(3) A "cross-product" form, where for each employee, there must be a record for every possible
pairing of one of his skills with one of his languages:
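As a rough sketch, the cross-product policy can be generated mechanically. The employee name, skills, and languages below are invented examples:

```python
from itertools import product

# Hypothetical employee with two independent multi-valued facts.
skills = ["cook", "type"]
languages = ["French", "German", "Greek"]

# "Cross-product" maintenance policy: one record for every possible
# pairing of a skill with a language.
records = [("Smith", s, l) for s, l in product(skills, languages)]

print(len(records))  # 2 skills x 3 languages = 6 records for one employee
```

The record count grows multiplicatively with each independent fact, which is why fourth normal form instead splits the two facts into separate record types.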
4. ......................of information can have serious implications for quite a number of rural people who
are dependent on oral communication.
(a). Culture dependency (b). Medium dependency
(c). Context dependency (d). Conversion dependency
5.9 Questionnaires
A questionnaire is a self-administered tool that is more economical and requires less skill to
administer than the interview. Unlike the interview, feedback from many respondents can be
collected at the same time. Since questionnaires are usually self-administered, it is critical that
questions be clear and unambiguous.
5.10 Interview
5.10.1 Meaning of an Interview
An interview can be defined as an oral tool to test a candidate's traits for employment or admission to
a premiere institution of learning. Being an oral test, it calls for your skills of oral and non-verbal
communication to support your performance before a panel of experts. There are different types of
interviews, such as the panel interview. Each type of interview requires your attention for a careful
application of a particular set of communication skills.
5.10.2 Purpose of an Interview
The interview is one of the most important phases of the job search process. Your resume and cover
letter are simply tools to get you to the interviewing stage. The interview is your opportunity to
convince an employer that you are the right person for the job.
As the interviewee, the main purposes of the interview are to:
Communicate information about yourself, your experience and your abilities
Seek further information about the position and the organization
Evaluate the match between your needs and what the job offers
The main purposes of the interview for the interviewer are to gather relevant information about the
candidate‘s:
Interview preparation––interest in and knowledge of the industry, the position and the
organization.
Communication skills––oral presentation skills and the ability to interact with others.
Qualifications––academic, work, volunteer and other experience
Leadership potential and teamwork––demonstrated ability to work with others and to get others to
work together.
Clear and realistic career goals––future plans and awareness of career paths
Self-awareness––realistic appraisal of self
Motivation and success potential––enthusiasm for the position; demonstrated patterns of
accomplishment
Work ethic––acceptance of responsibility, ability to keep commitments and attitude of the
importance of work.
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
5.12.2 Preparation
Do as much research as possible in advance on the person and/or topic you are working on. Sources
might include the library, public records, the internet and people you know who can provide
background information. Prepare your questions in advance in writing and bring them to the
interview. Refer to them but do not show them to the interviewee, because it creates too formal an
atmosphere. Ask other questions as they might arise, based on what the interviewee says or
something new that might come to you on the spur of the moment. Bring two pencils (or pens) and
paper. A stenographer's notebook is usually easier to handle than a large pad but use whatever is
comfortable. Bring a tape recorder if you can but be sure to get the permission to use it from the
person you are interviewing. You also should take notes, because it will help in the reconstruction
phase, and, yes, tape recorders fail occasionally.
5.12.4 Reconstruction
As soon as it is practical after the interview, find a quiet place to review your handwritten notes. In
your haste while taking notes, you may have written abbreviations for words that would not mean
anything to you a day or two later. Or some of your scribbling may need deciphering, and, again, it is
more likely you will be better able to understand the scribbles soon after the interview. Underline or
put stars alongside quotes that seemed most compelling. One star for a good quote, two stars for a
very good one, etc. It will speed the process when you get to the writing stage. One other thing to
look for in your notes: the quote you wrote down might not make a lot of sense, unless you remember
what specific question it was responding to. In short, fill in whatever gaps exist in your notes that
will help you better understand them when writing.
7. In a ......................observation the observer looks for and records a specific action
(a). indirect (b). obtrusive
(c). structured (d). natural
8. ....................is a self-administered tool that is more economical and requires less skill to
administer than the interview.
(a). Questionnaire (b). Information
(c). Interview (d). None of these
5.13 Summary
Information can be seen as something tangible, physical and concrete, while viewpoints
from within the information profession emphasise the intangibility of information.
Medium dependency of information can have serious implications for quite a number of rural
people who are dependent on oral communication, owing to their oral tradition and the fact that
many of them are not literate.
The main sources of information are users of the system, forms and documents used in the
organization, procedure manuals, rule books etc, reports used by the organization and existing
computer programs.
The normalization rules are designed to prevent update anomalies and data inconsistencies. With
respect to performance tradeoffs, these guidelines are biased toward the assumption that all non-key
fields will be updated frequently.
Member Site Inspection Panels are used as one means of ensuring that Members have sufficient
information about the site and proposal to reach a decision at a subsequent Development
Management Committee meeting.
5.14 Keywords
Culture Dependency: Attribute of information that can influence its usefulness as a development
resource is that it is culture dependent.
Frequent Complaint: A frequent complaint is that information is often denied its role as a resource,
even though its effect is evident when looking at development situations.
Information: Information may be processed and refined, so that raw materials (e.g., databases) are
converted into finished products.
Questionnaires: Questionnaires are useful for collecting statistical data. Sometimes the
questionnaires are not promptly replied and several follow-ups/personal interviews may be required
to get questionnaires back from respondents.
Randomization: Randomization is a sampling technique characterized as having no predetermined
pattern or plan for selecting sample data.
Sampling: Sampling is the process of collecting sample documents, forms, and records.
Stratification: Stratification is a systematic sampling technique that attempts to reduce the variance
of the estimates by spreading out the sampling.
6.0 Objectives
After studying this chapter, you will be able to:
Explain the structured analysis
Discuss the types of charts
Explain the data flow diagram
Discuss the guidelines for drawing dataflow diagrams
Explain the logical and physical data flow diagram
Discuss the data dictionary
Explain the decision trees and structured English
6.1 Introduction
Structured analysis is a set of techniques and graphical tools used by the analyst for applying a
systematic approach to systems analysis. The traditional approach focuses on cost/benefit and
feasibility analyses, project management, hardware and software selection, and personnel
considerations. In contrast, structured analysis uses graphical tools such as data flow diagram, data
dictionary, structured English, decision tree, and decision tables. The outcome of structured analysis
is a new document, called system specifications, which provides the basis for design and
implementation.
Bar Charts
A bar chart is a little more complex. It shows the opening and closing prices, as well as the highs and
lows. The bottom of the vertical bar indicates the lowest traded price for that time period, while the
top of the bar indicates the highest price paid. The vertical bar itself indicates the currency pair‘s
trading range as a whole. The horizontal hash on the left side of the bar is the opening price, and the
right-side horizontal hash is the closing price.
Figure 6.1 shows an example of a bar chart for EUR/USD:
Figure 6.1: A bar chart; the word "bar" here refers to a single piece of data on the chart.
A bar is simply one segment of time, whether it is one day, one week, or one hour. When you see the
word 'bar' going forward, be sure to understand what time frame it is referencing. Bar charts are also
called "OHLC" charts, because they indicate the open, the high, the low, and the close for that
particular currency.
Figure 6.2 shows an example of a price bar:
Figure 6.2: The price bar.
Open: The little horizontal line on the left is the opening price
High: The top of the vertical line defines the highest price of the time period
Low: The bottom of the vertical line defines the lowest price of the time period
Close: The little horizontal line on the right is the closing price
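The four values of a price bar follow directly from the trades within one time period. A minimal sketch, with invented sample prices:

```python
# Computing the open, high, low, and close of one price bar from the
# sequence of trades in a single time period. Prices are invented samples.
trades = [1.1050, 1.1072, 1.1041, 1.1065, 1.1058]

bar = {
    "open": trades[0],     # first traded price of the period
    "high": max(trades),   # top of the vertical bar
    "low": min(trades),    # bottom of the vertical bar
    "close": trades[-1],   # last traded price of the period
}
print(bar)
```

Running the same calculation over each successive time segment (hour, day, week) yields the sequence of bars that makes up an OHLC chart.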
Line Charts
A simple line chart draws a line from one closing price to the next closing price. When strung
together with a line, we can see the general price movement of a currency pair over a period of time.
Figure 6.3 shows an example of a line chart for EUR/USD:
Pie Charts
A pie chart compares parts to a whole. As such it shows a percentage distribution. The entire pie
represents the total data set and each segment of the pie is a particular category within the whole. So,
to use a pie chart, the data you are measuring must depict a ratio or percentage relationship. You must
always use the same unit of measure within a pie chart.
The pie chart in Figure 6.4 shows where ABC enterprise‘s sales come from.
Figure 6.4: Example of a pie chart.
Note 1: Be careful not to use too many segments in your pie chart. More than about six and it gets far
too crowded. Here it is better to use a bar chart instead.
Note 2: If you want to emphasize one of the segments, you can detach it a bit from the main pie. This
visual separation makes it stand out.
Note 3: For all their obvious usefulness, pie charts do have limitations, and can be misleading.
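The parts-to-whole requirement can be sketched as a short calculation. The sales figures below are invented for illustration:

```python
# Pie-chart data must form a ratio of parts to a whole, expressed in one
# unit of measure. The regional sales figures are invented samples.
sales = {"North": 40000, "South": 25000, "East": 20000, "West": 15000}

total = sum(sales.values())
percentages = {region: round(100 * v / total, 1) for region, v in sales.items()}

# The segments must account for the entire pie.
assert abs(sum(percentages.values()) - 100) < 0.5
print(percentages)  # {'North': 40.0, 'South': 25.0, 'East': 20.0, 'West': 15.0}
```

With four segments this is well under the roughly-six-segment limit suggested in Note 1; with many more categories, a bar chart would be the better choice.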
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
3. Structured analysis and design are broken into four ..................domains within application
architecture
(a). secondary (b). primary
(c). storage (d). None of these.
Figure 6.7 shows a logical DFD and a physical DFD for a grocery store cashier. The customer brings
the items to the register; prices for all items are looked up and then totalled; next, payment is given
to the cashier; finally, the customer is given a receipt. The logical DFD illustrates the processes
involved without going into detail about the physical implementation of activities.
The physical DFD, by contrast, shows that the price code found on most grocery store items is used.
In addition, the physical DFD mentions manual processes such as scanning, explains that a temporary
file is used to keep a subtotal of items, and indicates that the payment could be made by cash, check,
or debit card. Finally, it refers to the receipt by its name, cash register receipt.
Figure 6.7: The physical data flow diagram shows certain details not found on the logical data flow
diagram.
Steps of Developing DFD
First, we need to conceptualize data flows from a top-down perspective. To begin with, we make a
list of business activities and use it to determine the various:
External entities
Data flows
Process
Data stores
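As one possible way to record the results of this inventory, the elements of the grocery-store DFD above might be captured in a simple structure. The names below are illustrative, not part of any standard notation:

```python
# A hypothetical inventory of DFD elements for the grocery store example:
# external entities, processes, data stores, and the flows between them.
dfd = {
    "external_entities": ["Customer"],
    "processes": ["Look up prices", "Total items", "Take payment", "Issue receipt"],
    "data_stores": ["Price file", "Subtotal (temporary file)"],
    "data_flows": [
        ("Customer", "Look up prices"),   # items brought to the register
        ("Total items", "Take payment"),  # amount due
        ("Issue receipt", "Customer"),    # cash register receipt
    ],
}

# A basic consistency check: every flow should connect elements that
# appear somewhere in the inventory.
known = set(dfd["external_entities"]) | set(dfd["processes"])
assert all(src in known and dst in known for src, dst in dfd["data_flows"])
```

Listing the elements first, before drawing, makes it easier to spot flows that reference a process or entity nobody has defined.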
Caution
Altering or manipulating the data in data dictionary tables can permanently and detrimentally affect
the operation of a database.
5. Structure charts are used to define the summary structure flow from......................
(a). not process (b). one process to another
(c). Both (a) and (b) (d). None of these
6. These models are joined with ..........................to define the events of an application.
(a). data flow models (b). data flow diagram
(c). structure analysis (d). None of these
7. The structured systems analysis and design method is an approach to designing and ....................
(a). Design (b). analyzing information systems
(c). structure analysis (d). None of these
6.10 Summary
Structured analysis is the process that is used for documenting this complexity.
Oracle reads the data dictionary to ascertain that schema objects exist and that users have proper
access to them.
Structured analysis and design are broken into four primary domains within application
architecture. These are the data flows, data models, structure charts, and state models.
The structured systems analysis and design method (SSADM) is an approach to designing and
analyzing information systems.
The traditional approach to analysis is focused on cost/benefit and feasibility analysis, hardware
and software selection and personnel considerations.
6.11 Keywords
Data Flow Diagram: A two-dimensional diagram that explains how data is processed and
transferred in a system.
Data Stores: Data stores may be local to a specific level in the set of DFDs. A data store is used only
if it is referenced by more than one process.
Graphic: It refers to an image or visual representation of an item.
Representation: A person induced into a contract on the basis of an untrue or misleading
representation may sue for rescission of the contract and/or for damages.
Structured Analysis: It is a software engineering technique that uses graphical diagrams to develop
and portray system specifications that are easily understood by users.
7.0 Objectives
After studying this chapter, you will be able to:
Discuss the need of feasibility study
Explain the types of feasibility
Explain the steps of feasibility study
7.1 Introduction
A feasibility study's main goal is to assess the economic viability of the proposed business. The
feasibility study needs to answer the question "Does the idea make economic sense?" The study
should provide a thorough analysis of the business opportunity, including a look at all the possible
roadblocks that may stand in the way of the cooperative's success. The outcome of the feasibility
study will indicate whether or not to proceed with the proposed venture. If the results of the
feasibility study are positive, then the cooperative can proceed to develop a business plan.
If the results show that the project is not a sound business idea, then the project should not be
pursued. Although it is difficult to accept a feasibility study that shows these results, it is much better
to find this out sooner rather than later, when more time and money would have been invested and
lost. It is tempting to overlook the need for a feasibility study. Often, the steering committee may
face resistance from potential members on the need to do a feasibility study. Many people will feel
that they know the proposed venture is a good idea, so why carry out a costly study just to prove what
they already know? The feasibility study is important because it forces the NGC (new generation
cooperative) to put its ideas on paper and to assess whether or not those ideas are realistic. It also
forces the NGC to begin formally evaluating which steps to take next.
The NGC‘s organizers will typically hire a consultant to conduct the feasibility study. Because the
consultant is independent of the cooperative, he or she is in a better position to provide an objective
analysis of the proposed venture. He or she should have previous experience in directly related work.
To get an estimate of the cost of a feasibility study, prepare a rough outline of the work that needs
to be done.
It might be tempting to choose the lowest cost consultant or a personal acquaintance of one of the
NGC‘s organizers, but always remember that quality work is the most important factor when
choosing a consultant. Make sure that the consultant can provide an independent assessment of the
business opportunity. For instance, hiring an engineering firm or an equipment manufacturer to
conduct market analysis may lead to biased results in favor of proceeding with the venture.
Engineering firms and equipment manufacturers may have an incentive to show positive results so
they can obtain contracts with the cooperative once it chooses to start up operations. Engineering
firms and equipment manufacturers are needed in order to provide information about equipment
requirements and costs, but an independent consultant should conduct the overall feasibility study.
Caution
The consultant should have a good understanding of the industry as well as the new generation
cooperative model of business.
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
3.………………..is a determination of whether the proposed system will be acceptable to the people
or not.
(a) Social feasibility (b) Legal feasibility
(c) Management feasibility (d) Time feasibility
The first package was installed in January 2002. More than 200 packages have been installed to date.
The next two candidate systems are similarly described. The information along with additional data
available through the vendor highlights the positive and negative features of each system. The
constraints unique to each system are also specified. For example, in the IBM PC package, the lack of
an available source code means that the user has to secure a maintenance contract that costs 18% of
the price of the package per year. In contrast, the HP 100 package is less expensive and offers a
source code to the user. A maintenance contract (optional) is available at 18% of the price of the
package.
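The maintenance arithmetic above can be sketched in a few lines of code. The package prices below are hypothetical (the text does not give them); only the 18%-per-year contract rate comes from the example.

```python
# Hedged sketch: total cost of ownership for two candidate packages when a
# maintenance contract is charged at 18% of the package price per year.
# Prices are invented for illustration; only the 18% rate is from the text.

def total_cost(package_price, years, rate=0.18, take_contract=True):
    """Package price plus yearly maintenance over a planning horizon."""
    maintenance = package_price * rate * years if take_contract else 0.0
    return package_price + maintenance

# IBM PC package: no source code, so the contract is effectively mandatory.
ibm = total_cost(10_000, years=5)                      # 10000 + 9000 = 19000
# HP 100 package: source code supplied, so the contract is optional.
hp = total_cost(8_000, years=5, take_contract=False)   # 8000
print(ibm, hp)
```

Over a multi-year horizon a mandatory maintenance contract adds materially to the purchase price, which is why such constraints are recorded per candidate system.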
7.4.5 Determine and Evaluate Performance and Cost Effectiveness of Each Candidate System
Each candidate system's performance is evaluated against the system performance requirements set
prior to the feasibility study. Whatever the criteria, there has to be as close a match as practicable,
although trade-offs are often necessary to select the best system. In the safe deposit case, the criteria
chosen in advance were accuracy, growth potential, a response time of less than five seconds,
expandable main and auxiliary storage, and user-friendly software. Often these characteristics do not
lend themselves to quantitative measures. They are usually evaluated in qualitative terms (excellent,
good, etc.) based on the subjective judgment of the project team.
The cost encompasses both designing and installing the system. It includes user training, updating the
physical facilities, and documentation. System performance criteria are evaluated against the cost of
each system to determine which system is likely to be the most cost effective while also meeting the
performance requirements. The safe deposit problem is easy. The analyst can plot performance
criteria and costs for each system to determine how each fares. Costs are more easily determined when
the benefits of the system are tangible and measurable. An additional factor to consider is the cost of
the study design and development. The cost estimate of each phase of the safe deposit project was
determined for the candidate system (IBM PC). In many respects, the cost of the study phase is a
"sunk cost" (fixed cost). Including it in the project cost estimate is optional.
Disapproval of the feasibility report is rare if the study has been conducted properly. When a feasibility
team has maintained good rapport with the user and his/her staff, the recommendations are easier to
approve. Technically, the report is only a recommendation, but it is an authoritative one. Management
has the final say; its approval is required before system design is initiated.
Oral Presentation
The feasibility report is a good written presentation documenting the activities involving the
candidate system. The pivotal step, however, is selling the proposed change. Invariably the project
leader or analyst is expected to give an oral presentation to the end user. Although it is not as
polished as the written report, the oral presentation has several important objectives. The most
critical requirements for the analyst who gives the oral presentation are:
(1) Communication skills and knowledge about the candidate system that can be translated into
language understandable to the user and
(2) The ability to answer questions, clarify issues, maintain credibility, and pick up on any new
ideas or suggestions.
The substance and form of the presentation depend largely on the purposes sought. Table 7.2 suggests
a general outline. The presentation may aim at informing, confirming, or persuading.
1. Informing. This simply means communicating the decisions already reached on system
recommendations and the resulting action plans to those who will participate in the implementation.
2. Confirming. A presentation with this purpose verifies facts and recommendations already
discussed and agreed upon. Unlike the persuading approach, no supportive evidence is presented to
sell the proposed change, nor is there elaborate reasoning behind recommendations and conclusions.
Although the presentation is not detailed, it should be complete. Confirming is itself part of the
process of securing approval. It should reaffirm the benefits of the candidate system and provide a
clear statement of results to be achieved.
3. Persuading. This is a presentation pitched toward selling ideas; it attempts to convince executives
to take action on recommendations for implementing a candidate system.
Regardless of the purpose sought, the effectiveness of the oral presentation depends on how
successful the project team has been in gaining the confidence of frontline personnel during the
initial investigation. How the recommendations are presented also has an impact. Here are some
pointers on how to give an oral presentation:
1. Rehearse and test your ideas before the presentation. Show that you are in command. Appear
relaxed.
2. Final recommendations are more easily accepted if they are presented as ideas for discussion,
even though they seem to be settled and final.
3. The presentation should be brief, factual and interesting. Clarity and persuasiveness are critical.
Skill is needed to generate enthusiasm and interest throughout the presentation.
4. Use good organization. Distribute relevant material to the user and other parties in advance.
5. Visual aids (graphs, charts) are effective if they are simple, meaningful and imaginative. An
effective graph should teach or tell what is to be communicated.
6. Most important, present the report in an appropriate physical environment, where good acoustics,
a suitable seating pattern, visual aid technology and refreshments are available.
The most important element to consider is the length of the presentation. The duration often depends
on the complexity of the project, the interest of the user group and the competence of the project
team. A study that has company-wide applications and took months to complete would require hours
of presentation or longer. The user group that was involved at the outset would likely permit a lengthy
presentation, although familiarity with the project often dictates a brief presentation. Unfortunately,
many oral presentations tend to be a rehash of the written document with little flare or excitement.
Also, when the analyst or the project leader has a good reputation and success record from previous
projects, the end user may request only a brief presentation.
Figure 1 typifies the processes of a generic membership system. You can see the effect e-enablement
has on the current process on the right side of the illustration.
Running costs
These are the upkeep and maintenance costs of the web server.
Running costs for change process
This is the cost of training your employees and helping them adapt to the newly introduced
technology, mainly the strategies used to make the change as smooth as possible.
Additionally, being on the Internet means your company has to become accustomed to responding to
emails, queries, and complaints that require instant or quick responses, as opposed to replying to a
customer or client by letter. To be successful online, your company would have to address this issue
of change management by incorporating processes into its business that guide the company to
successfully maximize its effectiveness on the Internet.
"Business is streamlined and service is almost instantaneous when it is done on the Web."
Financial Benefits
Improve Cash Flow
Online payment means that payments for membership are received within the same day of the
application being made, rather than after the average 14-day delay. E-enabling the membership
process is not just about reengineering a process so that it is quicker than before; it results in a
complete overhaul of the previous way of managing membership. Figure 1 illustrates the current
membership process on the left and the e-enabled equivalent on the right. As you can see, in the
e-enabled process the application and payment are made online. Once the application has been
completed, the documentation is sent by email to the member, adding value by cutting the waiting
time. This also saves on printing and postage for the Society. Even if the member does not have an
email address, the documentation will be available on the web site for registered members to download.
Increase Revenues
The Internet will increase the volume of members. By going online with your business, you will
generate revenue from places you never imagined.
Non-financial Benefits
Communication
Direct email marketing incurs little or no cost compared to traditional direct mail marketing. It also
allows the flexibility of sending the company's message day or night, exactly when the company
wants.
Transparency
The Information Management Website will allow the membership process to become transparent. For
example, for the first time ever, Management will be able to know as a matter of fact:
1. The total number of members
2. Those members who need to renew their membership
3. Those members who are in arrears with their membership fees
4. Total number of members and accredited members
5. A forecast of the expected revenue to be generated, along with historic monthly generated
revenues.
Exposure
The Internet means that your company will become a global business, attracting potential members
internationally. "Using the Web to sell your products removes the physical boundaries
from your customer base. Customers from all over the world can learn about and purchase your
products online.‖
Wider Considerations
When prospective members sign up for membership, they provide personal data. This data includes
the member's name, full address, and credit card details. This means that the Society should register
under the Data Protection Act 1998, which has a one-off fee of £35.
Furthermore, your company must take measures to secure the line of communication between the
visitor's computer and your web site. An SSL (Secure Sockets Layer) connection ensures that this
communication cannot be intercepted by a third party, such as a hacker.
7.5 Summary
A feasibility study's main goal is to assess the economic viability of the proposed business.
The feasibility study is an important step in any software development process, because it analyzes
different aspects such as the cost required for developing and executing the system, the time
required for each phase of the system, and so on.
A feasibility study is made on the system being developed to analyze whether the system
development process requires training of personnel.
Feasibility is the determination of whether or not a project is worth doing.
A good feasibility study reviews a company's strengths and weaknesses, its position in the
marketplace, and its financial situation.
Each candidate system‘s performance is evaluated against the system performance requirements
set prior to the feasibility study.
7.6 Keywords
Economic Feasibility: It is the most frequently used technique for evaluating a proposed system. It is
also called Cost/Benefit Analysis.
Legal Feasibility: It is a determination of whether the proposed project is under legal obligation of
known Acts, Statutes, etc.
Management Feasibility: It is a determination of whether the proposed system is acceptable to the
management of the organization.
Operation Feasibility: It is related to human organizational aspects. The points to be considered here
are - what changes will be brought with the system?, what new skills will be required?, do the
existing staff members have these skills and can they be trained?
Social Feasibility: It is a determination of whether the proposed system will be acceptable to the
people or not. It finds out the probability of the project being accepted by the group of people who
are directly affected by the changed system.
Technical Feasibility: It is concerned with specifying the equipment and the computer system that
will satisfy and support the proposed user requirements.
Time Feasibility: It is a determination of whether the project will be completed within a specified
time period. If the project takes too much time, it is likely to be rejected.
8.0 Objectives
After studying this chapter, you will be able to:
Explain the data analysis
Describe the classifications of costs and benefits
Describe the cost categories
Describe the process of determining costs/benefits
Explain the system proposal
8.1 Introduction
A cost benefit analysis is done to determine how well, or how poorly, a planned action will turn out.
Although a cost benefit analysis can be used for almost anything, it is most commonly done on
financial questions. Since the cost benefit analysis relies on the addition of positive factors and the
subtraction of negative ones to determine a net result, it is also known as running the numbers.
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
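The "running the numbers" idea above can be shown as a few lines of code: sum the positive factors and subtract the negative ones. All figures below are invented for illustration.

```python
# Minimal sketch of a cost benefit analysis: add the positive factors
# (benefits) and subtract the negative ones (costs) to get a net result.
# All figures are illustrative.

def net_result(benefits, costs):
    return sum(benefits) - sum(costs)

benefits = [12_000, 3_500]   # e.g. added revenue, staff time saved
costs = [8_000, 2_000]       # e.g. development, training
print(net_result(benefits, costs))  # a positive net result favours the action
```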
Caution
Tangible and intangible costs and benefits should be considered in the evaluation process.
Tangible Costs
Tangible costs include the types of things a business writes checks for: salaries and wages, leases,
operational inputs, employee medical benefits, transportation and commercial insurance. These costs
have a clear place in the general ledger. The company cannot conduct business or produce a quality
product without spending on tangible costs. They are also easy to quantify, so management tends to
focus on the manipulation of tangible costs.
Intangible Costs
Intangible costs are less easily measured. Some key and common intangible costs might include a
drop in employee morale, dissatisfaction with working conditions or customer disappointment with a
decline in service or product quality. Intangible costs result from an identifiable source, but the costs
are often not predicted. They may occur after a new practice or policy is put into effect, such as a cut
in staffing levels or in employee benefits. Managers can try to estimate intangible costs as soon as
they see a pattern of loss. This estimate will be the basis of a decision to either change or continue a
practice that frustrates employees or customers. If a new procedure has injured an employee, the
company may need to act quickly to avoid government fines and inspections.
Direct Costs
Direct costs can be defined as costs which can be accurately traced to a cost object with little effort.
Cost object may be a product, a department, a project, etc. Direct costs typically benefit a single cost
object; therefore, the classification of any cost as either direct or indirect is done by taking the cost
object into perspective. A particular cost may be a direct cost for one cost object but an indirect cost for
another cost object. Most direct costs are variable but this may not always be the case. For example,
the salary of a supervisor for a month who has only supervised the construction of a single building is
a direct fixed cost incurred on the building.
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
Indirect Costs
Costs which cannot be accurately attributed to specific cost objects are called indirect costs. These
typically benefit multiple cost objects and it is impracticable to accurately trace them to individual
products, activities or departments etc.
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
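Because indirect costs cannot be accurately traced to a single cost object, in practice they are apportioned on some basis instead. The sketch below allocates an indirect cost in proportion to each object's direct costs; the proportional basis and all figures are illustrative conventions, not something this chapter prescribes.

```python
# Apportioning an indirect cost (e.g. factory insurance) across cost objects
# in proportion to their direct costs. Basis and figures are illustrative.

def allocate_indirect(indirect_total, direct_costs):
    """direct_costs maps each cost object to its traceable direct cost."""
    total_direct = sum(direct_costs.values())
    return {obj: indirect_total * d / total_direct
            for obj, d in direct_costs.items()}

shares = allocate_indirect(900, {"cupboards": 100, "wardrobes": 200})
print(shares)  # cupboards bear 300.0, wardrobes bear 600.0
```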
Ex2: Following costs are incurred by a factory on the production of identical cupboards:
1. Laborers' wages 2. Synthetic wood
3. Power consumption 4. Glass
5. Nails and screws 6. Factory insurance
7. Handles, locks and hinges 8. Wood
9. Supervisors' salaries 10. Factory depreciation
11. Varnish, glue, paints 12. Factory manager's salary
Classify the above costs as direct or indirect.
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
Solution
1. Direct 2. Direct
3. Indirect 4. Direct
5. Indirect 6. Indirect
7. Direct 8. Direct
9. Indirect 10. Indirect
11. Indirect 12. Indirect
2. Customer support is a range of…………..to assist customers in making cost effective and correct
use of a product.
(a) vendor service (b) customer services
(c) consumer service (d) Both (a) and (c).
Hardware Costs
These relate to the actual purchase or lease of the computer and peripherals (for example, printer, disk
drive, tape unit). Determining the actual cost of hardware is generally more difficult when the system
is shared by various users than for a dedicated stand-alone system. In some cases, the best way to
control for this cost is to treat it as an operating cost.
Personnel Costs
These include EDP staff salaries and benefits (health insurance, vacation time, sick pay, etc.) as well
as pay for those involved in developing the system. Costs incurred during development of a system are
one-time costs and are labelled developmental costs. Once the system is installed, the costs of
operating and maintaining the system become recurring costs.
Facility Costs
These are expenses incurred in the preparation of the physical site where the application or the
computer will be in operation. This includes wiring, flooring, acoustics, lighting and air conditioning.
These costs are treated as one-time costs and are incorporated into the overall cost estimate of the
candidate system.
Operating Costs
These include all costs associated with the day-to-day operation of the system; the amount depends on
the number of shifts, the nature of the applications, and the calibre of the operating staff. There are
various ways of covering operating costs. One approach is to treat operating costs as overhead.
Another approach is to charge each authorized user for the amount of processing they request from
the system.
The amount charged is based on computer time, staff time and volume of the output produced. In any
case, some accounting is necessary to determine how operating costs should be handled.
Supply Costs
These are variable costs that increase with increased use of paper, ribbons, disks, and the like. They
should be estimated and included in the overall cost of the system.
A system is also expected to provide benefits. The first task is to identify each benefit and then assign
a monetary value to it for cost/benefit analysis. Benefits may be tangible or intangible, direct or
indirect.
The two major benefits are improved performance and minimized processing cost. The performance
category emphasizes improvement in the accuracy of or access to information and easier access to the
system by authorized users. Minimizing costs through an efficient system (error control or reduction
of staff) is a benefit that should be measured and included in the cost/benefit analysis.
6. ………… analysis is a procedure that gives a picture of the various costs, benefits and rules
associated with a system.
(a) Cost (b) Benefits
(c) Both (a) and (b) (d) Data
Cover Letter: It should list the people who did the study and summarize the objectives of the study.
Title Page of Project: It includes the name of the project, the names of the team members, and the
date submitted.
Table of Contents: It is useful to readers of long proposals; omit it if the proposal is less than 10 pages.
Executive Summary: It precisely provides the who, what, when, where, why, and how of the proposal.
Outline of Systems Study with Appropriate Documentation: It provides information about all the
methods used in the study and who or what was studied.
Detailed Results of the Systems Study: It describes what was found out about human and systems
needs through all the methods described above.
Systems Alternatives: It presents two or three alternatives that directly address the problem.
Systems Analysts' Recommendations: It gives the recommended solution.
Summary: It is a brief statement that mirrors the content of the executive summary. Conclude the
proposal on a positive note.
Appendices: They can include any information that may be of interest.
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
8.7 Summary
Cost benefit analysis is done to determine how well, or how poorly, a planned action will turn
out. Although a cost benefit analysis can be used for almost anything, it is most commonly done
on financial questions.
Data analysis is a prerequisite to cost/ benefit analysis. System investigation and data gathering
lead to an assessment of current findings.
The outlay of cash for a specific item or activity is referred to as a tangible cost. They are usually
shown as disbursements on the books.
Cost/ benefit analysis is a procedure that gives a picture of the various costs, benefits and rules
associated with a system.
System proposal is presented to management for determining whether a candidate system should
be designed.
8.8 Keywords
Hardware Costs: These relate to the actual purchase or lease of the computer and peripherals.
Indirect Costs: These refer to results of operations that are not directly associated with a given system
or activity.
Operating Costs: These include all costs associated with the day-to-day operation of the system.
Supply Costs: These are variable costs that increase with increased use of paper, ribbons, disks, etc.
Tangibility: It refers to the ease with which costs or benefits can be measured.
9.0 Objectives
After studying this chapter, you will be able to:
Discuss the design process
Explain the phases of design
Discuss the module coupling and cohesion
Explain the prototyping model
Discuss the joint application development
Discuss the object-oriented design process
Discuss the processing controls and data
9.1 Introduction
The purpose of system design is to create a technical solution that satisfies the functional
requirements for the system. At this point in the project lifecycle there should be a functional
specification, written primarily in business terminology, containing a complete description of the
operational needs of the various organizational entities that will use the new system. The challenge is
to translate all of this information into technical specifications that accurately describe the design of
the system, and that can be used as input to system construction.
The functional specification produced during system requirements analysis is transformed into a
physical architecture. System components are distributed across the physical architecture, usable
interfaces are designed and prototyped and technical specifications are created for the application
developers, enabling them to build and test the system. Many organizations look at system design
primarily as the preparation of the system component specifications; however, constructing the
various system components is only one of a set of major steps in successfully building a system. The
preparation of the environment needed to build the system, the testing of the system, and the
migration and preparation of the data that will ultimately be used by the system are equally
important. In addition to designing the technical solution, system design is the time to initiate focused
planning efforts for both the testing and data preparation activities.
The System Requirements Document drives system design, which consists of the following process
steps:
Establish functional requirements from the analysis of system requirements
Determine design requirements – technology needs, solution space
Iteratively consider design requirements and architecture
Determine an initial design approach
Analyze risks in the development and implementation of the proposed design approach
Initial estimate of performance
Establish an error budget
Initial estimate of cost and schedule
Plan for work to address research needs and reduce/mitigate risks
Trade studies – iterate on cost/performance trade-offs
Figure 9.2 is a process diagram that illustrates the system design process.
There are four major documentation products that result from the system design phase:
1. System Design Proposal
The System Design Proposal establishes the scope of work, WBS and cost estimate for the system
design phase.
2. Systems Engineering Management Plan
The Systems Engineering Management Plan (SEMP) describes the management processes that will be
used throughout the new instrument development project.
The SEMP includes the following:
Staff and organization
Roles and responsibilities
Project work flow
Decision making process
Reporting and documentation requirements
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
There are two broad categories of design methodologies: the systematic and the formal types. As the
name implies, the formal type makes extensive use of mathematical notations for the object
transformations and for checking consistencies. The systematic types are less mathematical and
consist of a procedural component, which prescribes what action or task to perform, and a
representation component, which prescribes how the software structure should be represented.
Generally, techniques from the systematic design methodologies can be integrated and can utilize
representation schemes from other techniques when and as appropriate. Because methodologies have
been developed in different milieus, specifically to address certain problems or groups of problems,
there is no common baseline on which to evaluate or compare the methodologies against each other.
However, the underlying principles of the methodologies can be analyzed and examined for a better
understanding of the basis for each methodology. With a better understanding of the methodology, its
domain of application can be more effectively applied or more accurately defined. Generally,
alternative design allows for important trade off analysis before coding the software. Thus,
familiarity with several methodologies makes creating competitive designs more logical and
systematic with less reliance on inspiration. It is not the intention of this section to explain the
detailed mechanics of each of the methodologies but to discuss specific principles of each
methodology.
Stepwise refinement begins with specifications obtained from requirement analysis. The solution to
the problem is first broken down into a few major modules or processes that will demonstrably solve
the problem. Then, through successive refinement, each module is decomposed until there is
sufficient detail that implementation in a programming language is straightforward. In this way, a
problem is segmented into smaller, manageable units, and the amount of detail that has to be
focused on at any point in time is minimized. This allows the designer to channel his resources to a
specific issue at the proper time. As stepwise refinement begins at the top level, success is highly
dependent on the designer's conceptual understanding of the complete problem and the desired
solution.
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
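Stepwise refinement can be illustrated with a toy problem (the payroll example below is hypothetical): the top-level module states the solution in terms of a few subordinate modules, each of which is refined until it maps directly onto code.

```python
# Stepwise refinement, top down: payroll_report() demonstrably solves the
# problem in terms of compute_payroll(), which is in turn refined into
# gross_pay(), a leaf simple enough to implement directly.

def gross_pay(hours, rate):
    # fully refined leaf module
    return hours * rate

def compute_payroll(records):
    # mid-level module, expressed in terms of the leaf
    return {name: gross_pay(h, r) for name, (h, r) in records.items()}

def payroll_report(records):
    # top-level module: a few lines that evidently solve the problem
    return sorted(compute_payroll(records).items())

print(payroll_report({"ann": (40, 10), "bob": (35, 12)}))
```

Each level can be checked for correctness before the next refinement, which is what keeps the amount of detail in focus at any one time small.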
You will most likely encounter this Anti Pattern in a C shop that has recently gone to C++, or has
tried to incorporate CORBA interfaces, or has just implemented some kind of object tool that is
supposed to help them. It is usually cheaper in the long run to spend the money on object-oriented
training or just hire new programmers who think in objects.
Specified Disaster
Sometimes, those who generate specifications and requirements do not necessarily have real
experience with object-oriented systems. If the system they specify makes architectural commitments
prior to requirements analysis, it can and often does lead to Anti Patterns such as Functional
Decomposition.
Example
Functional Decomposition is based upon discrete functions for the purpose of data manipulation, for
example, the use of Jackson Structured Programming. Functions are often methods within an
object-oriented environment. The partitioning of functions is based upon a different paradigm, which leads
to a different grouping of functions and associated data.
The simple example in Figure 9.4 shows a functional version of a customer loan scenario:
Highly Coupled
When the modules are highly dependent on each other then they are called highly coupled.
Loosely Coupled
When the modules are dependent on each other but the interconnection among them is weak then they
are called loosely coupled.
Uncoupled
When the different modules have no interconnection among them, they are called uncoupled
modules.
Factors affecting coupling between modules:
The various factors which affect the coupling between modules are depicted in Table 9.1.
1. Data Coupling
Two modules are data coupled if they communicate using an elementary data item that is passed as a
parameter between the two; for example, an integer, a float, a character, etc. This data item should be
problem related and not used for a control purpose.
When a non-global variable is passed to a module, the modules are called data coupled. It is the lowest
form of coupling. For example, passing a variable from one module in C and receiving the
variable by value (i.e., call by value).
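A minimal data-coupling sketch (function and variable names are invented): the modules exchange only elementary, problem-related values.

```python
# Data coupling: modules communicate only through elementary data items
# passed as parameters - the lowest, most desirable form of coupling.

def area(width, height):
    # receives two elementary, problem-related values
    return width * height

def rental_charge(rate_per_sq_m, square_metres):
    # again, only elementary parameters cross the interface
    return rate_per_sq_m * square_metres

print(rental_charge(5.0, area(3, 4)))  # neither module knows the other's internals
```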
2. Stamp Coupling
Two modules are stamp coupled if they communicate using a composite data item, such as a record,
structure, object, etc. When a module passes a non-global data structure or an entire structure to
another module, they are said to be stamp coupled.
For example, passing a record in Pascal or a structure variable in C or an object in C++ language to a
module.
3. Control Coupling
Control coupling exists between two modules if data from one module is used to direct the order of
instruction execution in another. An example of control coupling is a flag set in one module that is
tested in another module.
Figure 9.9: Control coupling.
The sending module must know a great deal about the inner workings of the receiving module. A
variable that controls decisions in subordinate module C is set in super-ordinate module A and then
passed to C.
4. External Coupling
It occurs when modules are tied to an environment external to the software. External coupling is essential
but should be limited to a small number of modules with structures.
5. Common Coupling
Two modules are common coupled if they share some global data items (e.g., Global variables).
Diagnosing problems in structures with considerable common coupling is time-consuming and
difficult. However, this does not mean that the use of global data is necessarily "bad."
Caution
A software designer must be aware of potential consequences of common couplings and take special
care to guard against them.
6. Content Coupling
Content coupling exists between two modules if their code is shared; for example, a branch from one
module into another module. It is when one module directly refers to the inner workings of another
module. Modules are highly interdependent on each other. It is the highest form of coupling. It is also
the least desirable coupling as one component actually modifies another and thereby the modified
component is completely dependent on the modifying one.
High coupling among modules not only makes a design difficult to understand and maintain, but it
also increases development effort as the modules having high coupling cannot be developed
independently by different team members. Modules having high coupling are difficult to implement
and debug.
9.6.2 Cohesion
Cohesion is a measure of the relative functional strength of a module. The cohesion of a component is
a measure of the closeness of the relationships between its components. A cohesive module performs
a single task within a software procedure, requiring little interaction with procedures being performed
in other parts of a program. A strongly cohesive module implements functionality that is related to
one feature of the solution and requires little or no interaction with other modules. This is shown in
Figure 9.11. Cohesion may be viewed as the glue that keeps the module together. It is a measure of
the mutual affinity of the components of a module.
Thus, we want to maximize the interaction within a module. Hence, an important design objective is
to maximize the module cohesion and minimize the module coupling.
Types of Cohesion
There are seven levels of cohesion in decreasing order of desirability, which are as follows:
1. Functional Cohesion
Functional cohesion is said to exist if different elements of a module cooperate to achieve a single
function (e.g., managing an employee‘s payroll). When a module displays functional cohesion, and if
we are asked to describe what the module does, we can describe it using a single sentence.
2. Sequential Cohesion
A module is said to possess sequential cohesion if its elements form a sequence in which the output
from one element is the input to the next; for example, a module that updates a record and then
writes it out.
3. Communicational Cohesion
A module is said to have communicational cohesion if all the functions of the module refer to or
update the same data structure; for example, the set of functions defined on an array or a stack. All
the modules in communicational cohesion are bound tightly because they operate on the same input
or output data.
4. Procedural Cohesion
A module is said to possess procedural cohesion if the set of functions of the module are all part of a
procedure (algorithm) in which a certain sequence of steps has to be carried out for achieving an
objective; for example, the algorithm for decoding a message.
5. Temporal Cohesion
When a module contains functions that are related by the fact that all the functions must be executed
in the same time span, the module is said to exhibit temporal cohesion. The set of functions
responsible for initialization, start up, shutdown of some process, etc., exhibit temporal cohesion.
6. Logical Cohesion
A module is said to have logical cohesion if all of its elements perform similar but logically distinct
operations, such as all input routines or all error-handling routines, with the particular operation
selected by a control flag passed to the module.
7. Coincidental Cohesion
A module is said to have coincidental cohesion if it performs a set of tasks that relate to each other
very loosely. In this case, the module contains a random collection of functions. It means that the
functions have been put in the module out of pure coincidence without any thought or design. It is the
worst type of cohesion.
A module designed with high cohesion and low coupling can be treated as a black box when the
entire structure of the system is described. Each module can then be dealt with separately when the
module functionality is described.
2.......................... consist of several functions that pass data along, for example, update and write a
record.
(a) Functional (b) Clustered (c) Sequential (d) Communicational
5.........................is essential but should be limited to a small number of modules with structures.
(a) Common coupling (b) Content coupling (c) Control coupling (d) External coupling
2. Data Modelling
The information flow defined as part of the business modelling phase is refined into a set of data
objects that are needed to support the business. The characteristics of each object are identified and
the relationships between these objects are defined.
3. Process Modelling
In this model, information flows from object to object to implement a business function. Processing
descriptions are created in this phase for adding, modifying, deleting, or retrieving a data object.
4. Application Generation
The RAD assumes the use of fourth generation techniques. The RAD process works to reuse existing
program components or create reusable components. To facilitate the construction of the software
using the above cases, automated tools are used.
5. Testing and Turnover
In this phase we have to test the programs, but we use some already existing programs which are
already tested, so the time involved in testing is less. Only the new programs or components must be
tested.
The separate stages in the design process are described later in this section. However, you should not assume from
this that design is a simple, well structured process. In reality, you develop a design by proposing
solutions and refining these solutions as information becomes available. You inevitably have to
backtrack and retry when problems arise. Sometimes you explore options in detail to see if they
work; at other times you ignore details until late in the process. These process activities are
illustrated by developing an example of an object-oriented design. The example used to illustrate object-oriented design is
part of a system for creating weather maps using automatically collected meteorological data. The
detailed requirements for such a weather mapping system would take up many pages. However,
overall system architecture can be developed from a relatively brief system description:
A weather mapping system is required to generate weather maps on a regular basis using data
collected from remote, unattended weather stations and other data sources such as weather observers,
balloons and satellites. Weather stations transmit their data to the area computer in response to a
request from that machine.
The area computer system validates the collected data and integrates the data from different sources.
The integrated data is archived and, using data from this archive and a digitized map database, a set
of local weather maps is created. Maps may be printed for distribution on a special-purpose map
printer or may be displayed in a number of different formats.
This description shows that part of the overall system is concerned with collecting data, part with
integrating the data from different sources, part with archiving that data and part with creating
weather maps. Figure 9.24 illustrates a possible system architecture that can be derived from this
description. This is a layered architecture that reflects the different stages of processing in the system
namely data collection, data integration, data archiving and map generation. A layered architecture is
appropriate in this case because each stage only relies on the processing of the previous stage for its
operation. Figure 9.24 shows the different layers, with each layer name included in a UML
package symbol denoted as a subsystem. A UML package represents a collection of
objects and other packages. It is used here to show that each layer includes a number of other components.
Figure 9.25 expands on this abstract architectural model by showing the components of
the subsystems. Again, these are very abstract and they have been derived from the information in the
description of the system. The design example is explained by focusing on the weather station
subsystem that is part of the data collection layer.
The system context and the model of system use represent two complementary models of the
relationships between a system and its environment:
1. The system context is a static model that describes the other systems in that environment.
2. The model of the system use is a dynamic model that describes how the system actually interacts
with its environment.
The context model of a system may be represented using associations where, essentially, a simple
block diagram of the overall system architecture is produced. This can be expanded by representing a
subsystem model using UML packages as shown in Figure 9.25. This illustrates that the context of
the weather station system is within a subsystem concerned with data collection. It also shows other
subsystems that make up the weather mapping system.
Figure 9.26: Use cases for the weather station.
When you model the interactions of a system with its environment you should use an abstract
approach that does not include too much detail of these interactions. The approach that is proposed in
the UML is to develop a use case model where each use case represents an interaction with the
system. In use case models, each possible interaction is named in an ellipse and the external entity
involved in the interaction is represented by a stick figure. In the case of the weather station system,
this external entity is not a human but the data processing system for the weather data.
A use case model for the weather station is shown in Figure 9.26. This shows that the weather station
interacts with external entities for start-up and shutdown, for reporting the weather data that has been
collected and for instrument testing and calibration.
Each of these use cases can be described using a simple natural language description. This helps
designers identify objects in the system and gives them an understanding of what the system is
intended to do. It uses a stylised form of this description that clearly identifies what information is
exchanged, how the interaction is initiated, etc. This is shown in Figure 9.27, where the Report use
case from Figure 9.26 is described.
The use case description helps to identify objects and operations in the system. From the description
of the Report use case, it is obvious that objects representing the instruments that collect weather data
will be required as will an object representing the summary of the weather data. Operations to request
weather data and to send weather data are required.
9.11.2 Architectural Design
Once the interactions between the software system that is being designed and the system‘s
environment have been defined, you can then use this information as a basis for designing the system
architecture. Of course, you need to combine this with your general knowledge of principles of
architectural design and with more detailed domain knowledge.
The automated weather station is a relatively simple system and its architecture can again be
represented as a layered model. This is illustrated in Figure 9.28 as three UML packages within
the more general Weather station package. Notice how UML annotations (text in boxes
with a folded corner) are used to provide additional information here.
The three layers in the weather station software are:
1. The interface layer which is concerned with all communications with other parts of the system and
with providing the external interfaces of the system.
2. The data collection layer which is concerned with managing the collection of data from the
instruments and with summarising the weather data before transmission to the mapping system.
3. The instruments layer which is an encapsulation of all of the instruments that are used to collect
raw data about the weather conditions.
In general, you should try and decompose a system so that architectures are as simple as possible. A
good rule of thumb is that there should not be more than seven fundamental entities included in an
architectural model. Each of these entities can be described separately but, of course, you may choose
to reveal the structure of the entities as is done in Figure 9.25.
These approaches help you get started with object identification. In practice, many different sources
of knowledge have to be used to discover objects and object classes. Objects and operations that are
initially identified from the informal system description can be a starting point for the design. Further
information from application domain knowledge or scenario analysis may then be used to refine and
extend the initial objects. This information may be collected from requirements documents, from
discussions with users and from an analysis of existing systems.
There are five object classes shown in Figure 9.29. The Ground thermometer, Anemometer and
Barometer represent application domain objects and the WeatherStation and WeatherData objects
have been identified from the system description and the scenario (use case) description.
These objects are related to the different levels in the system architecture.
1. The WeatherStation object class provides the basic interface of the weather station with its
environment. Its operations therefore reflect the interactions shown in Figure 9.26. In this case,
we use a single object class to encapsulate all of these interactions but, in other designs, it may be
more appropriate to use several classes to provide the system interface.
Database management systems can provide further data processing security for complex transactions.
A single transaction may result in updates to a number of different files; for example, a sale
will affect both the accounts receivable and inventory files. In this case, all the required updates must
be made successfully before the transaction is committed to the database; if an error occurs at any
stage after the start of processing, then all the updates will be rolled back (i.e. undone), to prevent the
possibility of inconsistent entries. Additional management controls should be implemented to reduce
data processing risk. Employees should be thoroughly trained in the software and procedures that
they are expected to use. Regular backups should be made and stored in secure off-site locations.
Separation of duties among employees reduces the risk of deliberate fraud, and awareness of ethical
standards should be a part of company policy.
Component Testing
Component testing is also known as unit testing. The aim of the tests carried out in this testing type is
to search for defects in the software component. At the same time, it also verifies the functioning of
the different software components, like modules, objects, classes, etc., which can be tested
separately.
Integration Testing
This is an important part of the software validation model, where the interaction between the different
interfaces of the components is tested. Along with the interaction between the different parts of the
system, the interaction of the system with the computer operating system, file system, hardware and
any other software system it might interact with is also tested.
System Testing
System testing, also known as functional and system testing, is carried out when the entire software
system is ready. The concern of this testing is to check the behavior of the whole system as defined
by the scope of the project. The main concern of system testing is to verify the system against the
specified requirements. While carrying out system testing, the tester is not concerned with the
internals of the system, but checks whether the system behaves as per expectations.
Acceptance Testing
Here the tester has to think like the client and test the software with respect to
user needs, requirements and business processes and determine, whether the software can be handed
over to the client. At this stage, often a client representative is also a part of the testing team, so that
the client has confidence in the system.
There are different types of acceptance testing:
Operational acceptance testing
Compliance acceptance testing
Alpha testing
Beta testing
Often when validation testing interview questions are asked, they revolve around the different types
of validation testing. The difference between verification and validation is also a common software
validation testing question.
8. The JAD provides a working environment in which to accelerate methodology activities and
deliverables.
(a) True (b) False
One way to overcome these constraints is to keep a file on all transactions as they occur. For
example, transactions can be recorded on tape, which can be an input to an audit program. The
program pulls selected transactions and makes them available for tracing their status. The systems
analyst must be familiar with basic auditing or work closely with an auditor to ensure an effective
audit trail during the design phase.
The proper audit of a system also requires documentation. Documentation is the basis for the review
of internal controls by internal or independent auditors. It also provides a reference for system
maintenance. Preparing documentation occupies much of the analyst‘s time. When the
implementation deadline is tight, documentation is often the first item to be ignored.
Documentation may be internal (in-program documentation) or external hard-copy documentation. It
must be complete and consistent for all systems prepared according to standards. So a plan to approve
a new design should include documentation standards before programming and conversion.
The primary purpose of auditing is to check that controls built into the design of candidate systems
ensure its integrity.
Caution
Audit considerations must be incorporated at an early stage in the system development so that
changes can be made easily; otherwise, it can be time-consuming and confusing.
9.15 Summary
Preliminary system study is the first stage of system development life cycle.
Systems analysis is a process of collecting factual data, understanding the processes involved,
identifying problems and recommending feasible suggestions for improving the system
functioning.
Transform analysis is a set of design steps that map DFDs with transform characteristics into a
design structure chart.
Structured analysis and design technique is a data flow-oriented design approach.
The SADT methodology provides a precise and concise representation scheme and a set of
techniques to graphically define complex system requirements.
The Jackson system development method is a data structure-oriented design approach.
Functional decomposition is good in a procedural programming environment.
The OOD methodology is a recent development as such it is still dynamic and evolving.
Cohesion is a measure of the relative functional strength of a module.
The basic concept of information engineering is that information systems should be engineered
like other products.
9.16 Keywords
Coupling: Coupling is a measure of the degree of interdependence between software modules; it
describes how strongly one module depends on the data or internal workings of another.
Data Flow Diagrams (DFDs): A data flow diagram is a graphical representation of the flow of data
through an information system, modelling its process aspects.
Jackson System Development (JSD): Jackson system development is a method of system
development that covers the software life cycle either directly by providing a framework into which
more specialized techniques can fit.
Object-oriented Design (OOD): Object-oriented design is the process of planning a system of
interacting objects for the purpose of solving a software problem. It is one approach to software
design.
Structured Analysis and Design Technique (SADT): Structured analysis and design technique is a
software engineering methodology for describing systems as a hierarchy of functions.
Structured Design (SD): Structured design is the art of designing the components of a system and the
interrelationship between those components in the best possible way.
10.0 Objectives
After studying this chapter, you will be able to:
Discuss the input design of system
Discuss the output design of system
Explain graphics design
Understand about desktop publishing
Discuss the form design
Explain layout considerations of form
Discuss how to design an automated form
Explain form controls
10.1 Introduction
In this chapter we define systems design as the process of developing specifications for a candidate
system that meet the criteria established in systems analysis. A major step in design is the preparation
of input and the design of output reports in a form acceptable to the user. This chapter reviews input
and output design and the basics of forms design. As we shall see, these steps are necessary for
successful implementation.
10.4 Graphics
Get a better understanding of the basics of graphic design by studying the elements and principles of
graphic design that govern effective design and page layout. Graphic design is the process and art of
combining text and graphics and communicating an effective message in the design of logos,
graphics, brochures, newsletters, posters, signs, and any other type of visual communication.
Designers achieve their goals by utilizing the elements and principles of graphic design.
Graphic design is almost everywhere. Crammed into our homes, all over our cities and dotted around
the countryside, its images, letters, colours and shapes are consciously put together to perform all
sorts of functions.
In short, graphic design is visual communication. It employs lots of different techniques and modes,
but is very seldom purely decorative: graphic design has a job to do and graphic designers are in the
employ of their clients. The graphic designer may be briefed to create a piece of work which catches
a customer‘s eye in a busy supermarket, or they may be required to herald the formation of a new
business. Their client may want their work to impart cultural knowledge at a museum or help foreign
tourists find their way to the bus station. Or graphic designers could be employed for something as
run of the mill as creating a new look for the company stationery. Using an array of visual elements –
including type, colour, shape, photography, illustration, painting, and digital imagery and so on –
graphic designers work with their clients to deliver the required message in the most effective way.
What is Graphic Design: It is the process and art of combining text and graphics and communicating
an effective message in the design of logos, graphics, brochures, newsletters, posters, signs, and any
other type of visual communication. Desktop publishing software is a tool for graphic designers and
non-designers to create visual communications.
Types of Forms
Forms are classified into several categories:
Flat Forms,
Unit Set/Snap out Forms
Continuous Strip/Fanfold Forms,
NCR Paper
Pre-printed Forms.
The application of the following design rules for simple forms will usually result in an efficient, easy
to use product:
Study the purpose and use of the form and design it with the user in mind.
Keep the design simple. Use a minimum of type fonts and sizes; eliminate unnecessary
information and lines.
Include a form number and name on each form.
Use standard sizes of paper (or screens, if automated) where practical.
Use standard terminology in wording instructions.
Arrange items in a logical sequence.
Arrange items in the sequence in which the information will be extracted during processing.
Preprint constant data (such as agency name) so as to keep variable (fill-in) data to a minimum.
Allow sufficient spacing for the method of fill-in (manual, typewriter, and computer).
10.7.2 Identification
When using a form for the first time, a person reads the title first, in order to gain an idea of the
purpose of the form. Some kind of identification is needed to make the purpose and function
understandable to the user. In addition to the title, identification will include:
Agency name.
Form number, date of edition.
Any supersession notices ("This form replaces RMD 101").
Any internal control symbols.
Different users will pay attention to different parts of the identification. A member of the public is
interested in the agency name and form title; a stock clerk is interested in the form number, edition
date, and supersession notice; a file clerk is primarily interested in the form number. Place the
identification information on the form to make it accessible to all users, no matter what their
emphasis.
10.7.3 Title and Subtitle
Where the title is placed can be determined by how the form is used. Top left is often used when the
upper right corner is reserved for filing data. Or the title can be centered across the top to increase its
visibility in filing equipment or to eliminate a break in typing sequence. A subtitle is helpful,
especially if the form is used by the public, to explain or qualify the main title. If there is more than
one type of a category of form, for example, "Daily Contact Report," each form should be
distinguished by a subtitle under the main title, such as "Requests from Field Offices" or
"Shipments."
10.7.9 Instructions
Place brief instructions at the top of the form below or near the title to tell the user how many copies
are required, who should submit the form, and where, when, and to whom copies are to be sent.
If detailed instructions are included elsewhere, such as on the reverse, direct the user to that area
("See reverse for instructions.") Short instructions that relate to a specific section should be placed at
the top of that section.
Longer, more involved instructions are placed:
On the front of the form if there is enough room for both the instructions and the fill-in data.
On the back, if there is not sufficient space on the front.
On a separate sheet or in a booklet.
In an administrative directive or agency operations manual.
10.7.10 Routing
You can significantly simplify the handling of forms by incorporating effective routing and mailing
design techniques. These techniques can also reduce the chance of errors and speed the delivery of
the mail.
When possible, make the form self-routing, eliminating the need for a routing slip or transmittal
letter. This can be done in several ways, such as:
A ―to/from‖ line or lines which the user fills in.
A routing name or address pre-printed, for constant routing.
A pre-printed multiple arrangement of departments, with instructions to route in the order listed.
10.7.11 Arrangement
The arrangement of the items on the form, other than the identification data, should facilitate the
entering and retrieving of information. There should be a discernible sequence of the fields or items,
so the user does not have to move from one area of the form to another and back again to either
complete or interpret the form. Three basic arrangement factors are involved:
1) Grouping data.
2) Establishing item sequence.
3) Aligning data.
Grouping Data: If different people are to be entering data on the same form, have the item
arrangement match the sequence of the processing steps so that the last person to enter data on the
form is entering it at the bottom or on the last page of the form. This eliminates the need for
backtracking or searching the form for the correct entry area. If the form is used as a source
document to collect data on different types of materials, group the related items together.
It is helpful to identify the groupings either by numbering or lettering. You can identify subgroups
the same way you would in an outline, with letters following numbers in the progression of
importance.
Establishing Item Sequence: Having grouped related items, put them in a sequence that will allow the user
to move from one to the next without having to look back up the page. This can aid the person filling
in the form as well as the person transcribing or reviewing the information. Numbering the items can
inform the user that there is a prescribed sequence to follow.
Aligning Data: Arrange the items to follow people‘s visual habits, from left to right and from top to
bottom. By arranging the form in this manner, you can relieve the user of wasted motion. If the
information is to be entered using a typewriter, punch machine, or computer, help the user by
aligning items vertically to reduce the number of tabular or marginal stops.
10.7.13 Margins
Some space should be allowed around the text area of a form for utility as well as appearance. Some
printers and reproduction shops require margins as working space for the sprocket holes that permit
machines to grip the paper during printing, or for trimming the paper when several copies of a form
are printed on larger sheets. Allow at least 1/3 inch at the top, 1/2 inch at the bottom, and 3/10 inch at
the sides. If using card stock, allow 1/8 inch on all sides.
Sometimes it is necessary for the image on the form to extend to the edge of the paper, for example,
if the form consists of several pages put together in an overlapping configuration to indicate
comparative or cumulative figures. Printing the image to the edge requires printing on a sheet of
paper larger than the finished form size, then trimming the printed page to the finished size. This
process is called bleeding, which means to run off the edge of the trimmed printed sheet. If the form
is designed for offset printing, a good practice is to draw lines beyond the image size. When trimmed,
the lines will bleed off the edge of the paper, leaving a clean edge. Because of the extra handling and
trimming, bleeding can be expensive, so use it only if you need it for making a form more effective.
10.7.14 Spacing
Space requirements will be determined by the amount of fill-in data needed as well as the amount of
printed material such as captions, headings, and instructions. The writing method (hand, typewriter,
computer or other office machine) determines the amount of space you should allow for fill-in data,
and the number of characters per inch of typeface used determines the amount of space needed for
printed matter.
Horizontal spacing is based on the number of characters written per inch and is determined by the
writing method. Vertical spacing is based on the number of writing lines that can be written per inch.
Many forms are typewritten, some are handwritten, and a small percentage combines the two
methods. Since most forms created on computer for computer entry are designed to be compatible
with the machine spacing, incorrect horizontal or vertical spacing is rarely a problem.
Typewritten Spacing
Horizontal Spacing: There are 12 characters of elite type and 10 characters of pica type to the
horizontal inch on standard typewriters. Accordingly, when counting horizontal spaces, allow 1/12
inch for elite and 1/10 inch for pica type; 1/10 inch accommodates either elite or pica type and allows
maximum entry space. Whenever possible, add a minimum of one extra space to the required number
of characters to prevent crowding.
Vertical Spacing: There are six vertical lines per inch on the standard typewriter, elite or pica.
Accordingly, 1/6 inch, or a multiple of 1/6 inch, should be allowed for each line of typing. By
measuring spacing this way, you require the user to adjust the form in the typewriter for the first line
of typing only, after which no further adjustments are needed.
Handwritten Spacing
Horizontal Spacing: Provide 1/10 to 1/6 inch per character to avoid crowding or excessive
abbreviation.
Vertical Spacing: Provide 1/4 inch to 1/3 inch per line. When using a box design, allow 1/3 inch.
Otherwise, 1/4 inch is usually plenty of space for handwritten entries.
If a form is filled in by both typewriter and handwritten methods, determine the horizontal space by
hand fill-in requirements and the vertical space by typewriter requirements. The 1/3 inch vertical
spacing will accommodate either method of entry.
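The spacing rules above are simple arithmetic and can be sketched in code. The constants below follow the figures in the text; the function name and the one-extra-character default are assumptions made for illustration.

```python
# Sketch: compute minimum field dimensions (in inches) from the spacing
# rules above. Names and structure are illustrative, not from the text.

SPACING = {
    "elite":       {"h": 1/12, "v": 1/6},   # 12 chars/inch, 6 lines/inch
    "pica":        {"h": 1/10, "v": 1/6},   # 10 chars/inch, 6 lines/inch
    "handwritten": {"h": 1/6,  "v": 1/3},   # generous hand fill-in room
}

def field_size(chars, lines, method="pica", extra_chars=1):
    """Return (width, height) in inches for a fill-in field.

    extra_chars adds the recommended minimum of one spare character
    to prevent crowding.
    """
    s = SPACING[method]
    width = (chars + extra_chars) * s["h"]
    height = lines * s["v"]
    return round(width, 3), round(height, 3)

# A 20-character, single-line name field typed in pica:
print(field_size(20, 1, "pica"))         # (2.1, 0.167)
# The same field filled in by hand needs more room:
print(field_size(20, 1, "handwritten"))  # (3.5, 0.333)
```

A form combining both entry methods would, per the text, take the handwritten horizontal figure and the 1/3 inch vertical figure.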
Caution
Lack of information about any section of a form can cause problems when the form is filled in.
5. A source document collects...................an input action, and provides a record of the original
transaction.
(a) input data (b) triggers
(c) authorizes an input action (d) All of these
10.8 Automated Form Design
These are basically online forms, such as reservation forms, online shopping forms, etc.
The automated or electronic forms are created through some software such as Microsoft Word, Adobe
Designer, etc.
10.8.1 Creating an Automated Form in Word 2007
This example describes how to create a simple form in a Word document that automatically prompts a
user to fill in information.
How to Create the Template
To create a template with automatic FILLIN fields, follow these steps:
1. On the File menu, click New.
2. In the new document task pane, click general templates under new from template. In the templates
dialog box, select the template that you want to use. Under Create New, click template. Click OK.
3. Create the FILLIN fields. To do this, use either of the following methods.
Method 1: Create a Field by Using the Menus
o Position the insertion point where you want to insert the text field.
o On the Insert menu, click Field.
o In the Categories list, click Mail Merge.
o In the Field Names list, click Fill-in.
o In the Field properties box, type quotation marks around the message that you want to display.
For example, use the following syntax to display a message that prompts users to enter a first and last
name:
FILLIN "Please enter your first and last name."
o Click OK.
A sample of the message appears. Click OK to return to your document.
NOTE: To view the field code that you just inserted, press ALT+F9.
o Repeat the steps above for every place in the document where you want to insert a FILLIN
field.
Method 2: Create a Field by Using Keystrokes
o Position the insertion point where you want to insert the field.
o Press CTRL+F9.
Field braces ({ }) appear in the document.
o Position the insertion point inside the field braces.
o Type the following
FILLIN "message"
Where message is the instruction that Word prompts the user with (about what to enter in this
field).
NOTE: If you press F9 while the insertion point is still on the field, you can see a sample of the
message that will be displayed. You do not have to follow this step to create the FILLIN field.
4. On the File menu, click Save As.
5. Name the template appropriately.
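Outside Word, the behaviour of a FILLIN field (pause at each field, prompt the user, substitute the answer) can be mimicked in a few lines. This is only a sketch of the idea: the plain-text FILLIN "…" syntax and the fill_template helper are invented here; real FILLIN fields live inside Word field braces, not plain text.

```python
# Sketch of the FILLIN idea: scan a template, prompt the user at each
# field, and substitute the answer. The plain-text syntax below is an
# illustration, not Word's actual field-code storage format.
import re

FIELD = re.compile(r'FILLIN "([^"]*)"')

def fill_template(template, ask=input):
    """Replace every FILLIN "prompt" with the answer that ask() returns."""
    return FIELD.sub(lambda m: ask(m.group(1) + " "), template)

letter = 'Dear FILLIN "Please enter your first and last name.",'
# fill_template(letter) prompts once, then returns the completed line.
```

Passing a different ask callable (as a test harness might) lets the same template be filled non-interactively.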
How to Use the Template
To create new documents based on the new template, follow these steps:
1. On the file menu, click New.
2. In the new document task pane, click general templates.
3. In the templates dialog box, select your template, and then click OK.
Word creates a new document, automatically searches for all FILLIN fields in the document, and then
starts to prompt the user for input.
The results are not perfect, but it is a great time-saver. Often, for a simple form, you only have to do
minor tweaking to get a useful form.
If a form is to be revised, the responsible official usually must approve the final revisions. It is natural for this
manager, supervisor, or management officer to feel that the daily responsibilities of the job are
enough to handle without worrying about the analysis and improvement of forms. As a result, agency
officials may be overlooking that unsuitable forms, or the need for more or fewer forms, may actually
be causing operating problems in the agency. Supervisors should be aware of the time that they and
their staff spend both in studying forms for possible changes and in handling problems caused by
inefficient forms. Suggestions arising from these problems should be forwarded to the forms
management staff (or the records management officer) for a detailed forms analysis.
Many forms travel from the originating office to other departments for preparation and use. In the
majority of agencies, most forms are administrative, and the various departments have similar
operations. This often results in forms from the different departments being similar or redundant. The
analyst can assist with standardization of these forms or with the elimination of needless forms.
An important benefit of maintaining a functional control file is the detection and elimination of
unauthorized forms (often called "bootleg forms"). These are forms that have been designed or
reproduced outside of the established forms management program. By knowing which forms are
approved for use within the agency, personnel can eliminate those forms which should not be used.
The functional classification file can be difficult to develop, but it is an excellent type of control file
to use in analyzing the agency's forms and their use.
7. A memory form is a record of historical data that remains in a file, is used for reference and serves as
control on details.
(a) True (b) False
10.10 Summary
Input design is the process of converting user-originated input to a computer-based format. In the
system design phase, the expanded data flow diagram identifies logical data flows, data stores,
sources and destinations.
Computer output is the most important and direct source of information for the user. Efficient,
intelligible output design should improve the system's relationship with the user and help in
decision making.
Graphic design is the process and art of combining text and graphics and communicating an
effective message in the design of logos, graphics, brochures, newsletters, posters, signs, and any
other type of visual communication.
Desktop publishing software allows the user to rearrange text and graphics on screen, change
typefaces as easily as changing shoes, and resize graphics on the fly, before finally committing a
design to paper.
Automated forms are the infrastructure used to submit information or data. These are basically
online forms, such as reservation forms, online shopping forms, etc.
10.11 Keywords
Control File: It provides a complete profile of each form from its creation to its current status.
Desktop Publishing: It is the process of using the computer and specific types of software to combine
text and graphics to produce documents such as newsletters, brochures, books, etc.
Graphic Design: It is the process and art of combining text and graphics and communicating an
effective message in the design of logos etc.
Systems Flowchart: It specifies master files (databases), transaction files, and computer
programs.
Video display terminal: It is a computer terminal having a video display that uses a cathode-ray tube.
11.0 Objectives
After studying this chapter, you will be able to:
Explain the supplier and types
Discuss the software industry
Explain the role of consultant
Understand the post installation review
Discuss about the hardware and software selection
Define ownership
Understand the financial consideration in selection
Explain about used computer
Define computer contract
11.1 Introduction
In the computer software stream, students learn the basics of operating system structures, memory
management, compilers, middleware, etc. Computers today are designed in conjunction with compiler
technology and almost all make use of an operating system - this includes laptops, cell phones, and
PDAs. Students will also study the basics of data structures, programming languages, databases,
security, and software engineering.
Students in the computer hardware stream will learn the basics of digital design at the gate and
system/architectural level. Most people will spend their entire life no more than one meter away from
some type of digital system (e.g. laptop, cell phone, PDA, iPod, GPS, auto, controllers, etc.). Digital hardware surrounds
us all and affords many interesting careers. Students in this stream will study computer hardware,
computer architecture, and digital systems design.
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
Caution
The document produced during the workflow and process planning stage should be used as a guide
against which the system will be tested.
2. The procedures for obtaining any further information from you should the suppliers
have......................
(a). queries (b). system analysis
(c). design (d). None of these.
3. This process should usually include a demonstration of their proposed offering by each of
the........................
(a). selling (b). buying (c). suppliers (d). None of these.
4. ...................industry encompasses all the activities and businesses involved with development,
maintenance and distribution of computer software
(a). Software (b). Hardware
(c). Both (a) and (b) (d). None of these.
Software selection is a critical aspect of system development. The search starts with the software,
followed by the hardware. There are two ways of acquiring software: custom-made or "off-the-shelf"
packages. Today's trend is toward purchasing packages, which represent roughly 10% of what
it costs to develop the same software in house. In addition to reduced cost, there are other advantages:
1. A good package can get the system running in a matter of days rather than the weeks or months
required for "home-grown" packages.
2. MIS personnel are released for other projects.
3. Packages are generally reliable and perform according to stated documentation.
4. Minimal risks are usually associated with large-scale systems and programming efforts.
5. Delays in completing software projects in house often occur because programmers quit in
midstream.
6. It is difficult to predict the cost of "home-grown" software.
7. The user has a chance of seeing how well the package performs before purchasing it.
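The build-versus-buy arithmetic described above can be sketched directly, using the text's figure that a package costs about 10% of in-house development. All other numbers below are hypothetical inputs, not figures from the text.

```python
# Sketch: build-vs-buy cost comparison using the 10% rule of thumb
# quoted in the text. Input figures are invented for illustration.

def package_cost(in_house_cost, ratio=0.10):
    """Estimated price of an off-the-shelf package for comparable scope."""
    return in_house_cost * ratio

in_house = 250_000                 # hypothetical estimate to build in house
package = package_cost(in_house)   # roughly 25,000 under the 10% rule
savings = in_house - package
print(f"package ~ {package:,.0f}, savings ~ {savings:,.0f}")
```

The non-monetary advantages listed above (faster start-up, freed MIS staff, lower risk) would of course weigh on the same side as the savings.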
11.7 Ownership
The primary objective of policies concerning ownership of intellectual property must be to preserve,
protect and foster the open and creative expression and exchange of information, ideas and research
results. This is not only the responsibility of a public educational institution; it is the basic premise
on which a university must exist.
To encourage the production and distribution of creative works, our legal system has established
property rights for inventions and writings through patents and copyrights. Ownership of these
properties is reserved to the creator for a limited time during which the creator may sell, lease or
distribute the product of his/her efforts. The purpose of these limited rights is to establish an
incentive to make the fruits of individual creativity available to society at large.
Although governmental agencies and most businesses assert ownership of the intellectual property
created by their employees, the University of Wisconsin System has not typically done so. Such
ownership provides the opportunity to withhold as well as disseminate. Ownership of intellectual
property by the University would provide a general right and perhaps a responsibility to censor; this
runs counter to the University mission to engage in open innovation and inquiry. Individual
ownership is also more consistent with the self-directed nature of much university work and of
Wisconsin traditions in particular.
The intellectual property in original works of authorship such as books, articles and similar works is
protected by copyright, which is held to exist at the point the material is created. As with patents,
ownership at the University is normally vested in the creator. For example, faculty have ordinarily
owned rights to books created in the course of their scholarly activities, regardless of the funding mix
supporting their work and the extent to which University resources have been used in their
preparation. Copyright law has been amended recently to include computer software. One provision
called "work for hire" states that when an employee is specifically directed to produce a software
product as a condition of employment, ownership rights including copyright rest with the employer.
11.8.1 Rental
Computer rental is for the short-term use of a system, generally from 1 to 12 months. Each month a
payment is made for the use of the equipment. Both the user and supplier have the option of
cancelling the rental with advance notice, usually 30 or 60 days ahead of the termination date.
Because the commitment is short-term, the renter has a great deal of flexibility. The decision to
purchase a system can be delayed until financing is adequate, until a new generation of equipment is
available, or until such time as the organization wishes, for whatever reason. Flexibility can be
particularly important when an organization is experiencing planned rapid growth and will outgrow a
specific system in a brief period, when important reorganizations of divisions and departments that
will affect computing resources are in progress, or when the enterprise is in a period of dynamic
change.
11.8.2 Lease
A lease is a commitment to use a system for a specific time, generally from three to seven years.
Payments are predetermined and do not change throughout the course of the lease. Depending on the
terms of the lease, payments are monthly, quarterly, semi-annual, or annual and include the cost of
equipment service and maintenance. At the end of the lease period the lessee generally does not own
the equipment. (If that is not the case, and the equipment becomes the property of the lessee, the
Internal Revenue Service considers the agreement a conditional sale and the entire transaction must
then be treated as a purchase.)
11.8.3 Purchase
The ownership of computers through outright purchase is the most common method of computer
acquisition and is increasing in popularity as lease costs rise. Over time, the purchase option
frequently costs the least, especially in light of the tax advantages that can sometimes be gained.
Under purchase, the organization takes title to the equipment. Of course, the money for the purchase
must be taken from operating funds or borrowed. And, in a sense, the organization is locked into the
system it purchases, since changing to a different computer system is more difficult; either the system
must be sold or arrangements must be negotiated to trade it in on a different computer.
The organization must acquire its own maintenance services (for parts and labour), usually from the
manufacturer, and pay the monthly charges, which fluctuate from year to year. In addition, if the
equipment was financed, payment on the loan must be made periodically. The cash outflow still may
be lower than with renting or leasing, depending on the terms arranged by the purchaser. In return for
the outgoing cash, purchase offers specific tax advantages:
1. The monthly maintenance charges are deductible as a business expense.
2. Interest on any loan to finance the purchase is deductible as a business expense.
3. The cost of the equipment can be depreciated over time; this also lowers the taxable income and
therefore the income taxes paid.
4. Local, state, and federal taxes paid on the purchase may be deductible from income taxes.
Only the purchase option permits the use of depreciation to reduce taxes. In a sense then, depreciation
deductions on income tax reduce the cost of the computer to the organization. Normally, this benefit
is not possible under lease agreements and it is never feasible for short-term rentals. Of course, the
tax benefits described apply only to firms that operate for profit. Nonprofit firms that do not pay
income taxes thus do not receive tax benefits from computer purchase.
Table 11.1 Comparison of Computer Systems Financing Options
Functionality
Most of the time, refurbished computers are just customer returns. In some cases, there could be
nothing wrong with the computers at all. The customer just did not like the products, so they returned
them within the warranty period. In cases like this, the computer manufacturers will sell these
computers as refurbished. They cannot sell them as new because they have been opened and used
slightly for a short period of time.
The other instance of refurbished computers is defective parts. This could be anything from a
RAM module to the motherboard. When the manufacturers refurbish these computers, they normally
replace the faulty part with a brand new one. If the issue is with installation, they will properly
reinstall the part.
Manufacturers also test for like-new functionality before selling. Not only do they test the
replacement parts, but everything else as well. Expect refurbished computers to function just as well
as new ones.
Warranty
Large computer companies will offer the exact same warranty on refurbished products as they will on
new products. So as far as the warranty goes, you really are losing nothing by purchasing refurbished.
If a warranty is important to you, make sure that you buy refurbished from the manufacturer rather
than some independent reseller that refurbishes computers. This will give you the full warranty that
you want.
Price
Computer manufacturers cannot sell any product as new unless it is truly 100% new. This means that
customer returned computers, extra computers and previous generation computers, must be sold under
a different name. So, manufacturers sell all these types of products with a refurbished label.
They also know that consumers will not pay nearly as much for a refurbished computer as they would
for a brand new one. In turn, refurbished computers are often offered at rock-bottom prices, often
up to 20% off the new price.
Here is a good example of when to buy refurbished: Apple originally sold a MacBook for 80000,
new. The company also offered the same laptop for $1300 refurbished. A month later, Apple gets
newer processors in and overhauls all of its notebooks. The original laptop is now offered at 45000
refurbished.
Accessories
One question you may have with buying refurbished is: Do you receive everything that comes with a
new computer? Yes. Computer manufacturers will provide you with all original documentation, user
manuals, power cords and other accessories. The one downfall is that your item will usually come in
a plain brown packaging rather than the original box.
On-Premises Software Deployment: Before the widespread availability, affordability and adoption of
networks, particularly the Internet, on-premises software deployments were virtually the only choice
for businesses. Some examples include large-scale enterprise resource planning (ERP) and customer
relationship management (CRM) systems as well as single-user programs such as QuickBooks or
Microsoft Office. As its name implies, an on-premises implementation is software that's installed and
operated on computer(s) located on the premises of the software licensee, rather than at a remote
facility. This model largely defined and drove the first generation of business computing.
However, on-premises software is limited in its ability to support remote access to computing
services. Customizations, if allowed, can be difficult and expensive. Software vendors also make
significant investments in legacy code that tend to work poorly in off-premises configurations.
11.13.2 Acceptance
In today‘s marketplace, removing all potential objections to accepting hardware solutions is critical.
Designers are under increasing pressure to develop products faster and cheaper. Potential customers
are increasingly demanding complete solutions to their problems. Hardware alone is rarely the
answer.
In order to provide a more complete solution to customers, significant software support including
software drivers, network protocol support, configuration, management and control applications,
reference software, or development tools are needed to improve the ability of a customer to
successfully move beyond just a device to a real solution. In many situations, software can be the
differentiating factor in making a component choice.
6. The software industry is primarily concerned with the development of two types of software.
(a). True (b). False.
7. India's software exporting industry is one of the world's most successful information technology
industries.
(a). True (b). False.
11.14 Warranties
An implied warranty is one that arises from the nature of the transaction, and the inherent
understanding by the buyer, rather than from the express representations of the seller.
11.14.1 Covered (and is free)
All individual hardware parts are covered for a minimum of one year, including the cost of
removing the faulty part at our base and reconnecting the new one. Parts need to be demonstrably
faulty, either through a manufacturer's software test routine, or other repeatable test, which takes
less than 30 minutes. Less frequent faults (e.g. occurring once or twice a day) would be classified as
reliability issues.
Where individual parts have a manufacturer's warranty greater than a year (e.g. SCSI disks), then
the one year minimum is extended accordingly. Where the terms of the manufacturer's warranty
exceed our own (e.g. on-site swap-out of monitors), you will be entitled to this service provided
you have complied with the manufacturer's requirements (e.g. registering your product with them).
At our discretion and where applicable and mutually convenient, repair work may be carried out
on site. The inclusion of on-site installation at the time of initial purchase does not constitute an
entitlement to subsequent on-site support unless specifically itemised on the invoice.
Faulty or damaged items should be notified within seven days of receipt of goods, and will be
dealt with in accordance with the returns procedure laid out by the manufacturer. All goods
returned must be in the manufacturers' original packaging complete with all ancillary items. The
company reserves the right to refuse returns for items which have become obsolete or were part of
a special order, regardless of the time the return is requested or the condition of the goods.
The failure of some of these services may go unnoticed by some or all of your applications, depending upon
when the failure occurs. For example, if the Registry Service crashes after your consumer has
successfully obtained all necessary EPR information for the services it needs in order to function,
then it will have no adverse effect on your application. However, if it fails before this point, your
application will not be able to make forward progress. Therefore, in any determination of reliability
guarantees it is necessary to consider when failures occur as well as the types of those failures.
It is never possible to guarantee 100% reliability and fault tolerance. The laws of physics (namely
thermodynamics and the always increasing nature of entropy) mean that hardware degrades and
human error is inevitable. All we can ever do is offer a probabilistic approach: with a high degree of
probability, a system will tolerate failures and ensure data consistency/make forward progress.
Furthermore, providing fault-tolerance techniques such as transactions or replication comes at a price:
performance. This trade-off between performance and fault-tolerance is best managed with
application knowledge: any attempts at opaquely imposing a specific approach will inevitably lead to
poorer performance in situations where it is simply not necessary.
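The probabilistic framing above can be made concrete with the standard independence calculation: if each of n independent replicas fails over some interval with probability p, then all n fail with probability p^n. The replica counts and the 1% failure probability below are illustrative, and real failures are rarely fully independent.

```python
# Sketch: availability under replication, assuming independent failures.
# The 1% per-interval failure probability is an invented figure.

def p_all_fail(p_single: float, n: int) -> float:
    """Probability that every one of n independent replicas fails."""
    return p_single ** n

def availability(p_single: float, n: int) -> float:
    """Probability that at least one replica survives the interval."""
    return 1 - p_all_fail(p_single, n)

for n in range(1, 5):
    # Each extra replica buys roughly two more "nines" here, but also
    # adds replication traffic, the performance price noted above.
    print(n, f"{availability(0.01, n):.8f}")
```

This is exactly the "probabilistic approach" the text describes: never a 100% guarantee, only a failure probability driven down at a performance cost.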
Exercise: Check Your Progress 2
Case Study-Computer Software History
There is a high likelihood that every individual from developed countries has to deal with some kind
of software. Computer software is a rather broad term that is used to encompass the different types of
software based on which a computer actually works. Yet, even though software engineering has
become a large and profitable industry, few people know how it all started. Here one can find
out more about the history of software and how it all came together.
It all started with Alan Turing, who proposed a theory about software in 1935 in his essay
"On Computable Numbers, with an Application to the Entscheidungsproblem". The term software was,
however, not used in the written literature until 23 years later, when John W. Tukey used it in print. The term is
commonly used to describe application software, but in computer engineering this word encompasses
all the information that is processed by the computer system, programs in general and data. It is
believed that the history of software as we know it began in 1946, when the first software bug was
recorded. Over time, software, like hardware, has become cheaper and faster. At first some elements
of the computer were considered to be software, but they later joined the ranks of hardware.
Software has constantly increased in popularity as the importance of computers has increased.
Moreover, individuals started to want more from computers and this caused a need to further progress
and development of the software that was being produced. For a long time however, software was
bundled with the hardware by original equipment manufacturers. This meant that a new computer
would not have come with pre-installed software, but that the software had to be installed by
specialists working for the original equipment manufacturers. Nowadays things are much simpler,
since people can download software from the internet whenever they need a new program.
Questions
1. Who was the founder of computer software?
2. When did the software industry begin?
11.16 Summary
Software is supplied in accordance with the publisher's license agreement, and we do not offer
any warranties beyond the scope of that user license, which the purchaser is deemed to have
accepted.
IT suppliers range from small local outfits to global organizations. Even the largest suppliers can
provide systems, services and consultancy to small businesses.
Client/Server computing is a technique in which an application is shared between a desktop "client"
and one or more network-attached "servers".
The Post Installation Review (PIR) is also an ideal opportunity for the client to identify any
additional requirements such as training or report writing.
The primary objective of policies concerning ownership of intellectual property must be to
preserve, protect and foster the open and creative expression and exchange of information, ideas
and research results.
11.17 Keywords
Capacity: Capacity refers to the capability of the software package to handle the user's requirements
for size of files, number of data elements, volume of transactions and reports and number of
occurrences of data elements.
Delivery Model: A "delivery model" refers to the approach taken to "deliver" enterprise software. It
is usually used when referring to a software application.
Lease: A lease is a commitment to use a system for a specific time, generally from three to seven
years.
Proprietary: A proprietary design or technique is one that is owned by a company. It also implies that
the company has not divulged specifications that would allow other companies to duplicate the
product.
Reliability: It is the probability that the software will execute for a specified time period without a
failure. It is particularly important to the professional user.
System integrators: System integrators select the appropriate hardware and software for your specific
needs and deliver an integrated, working system.
12.0 Objectives
After studying this chapter, you will be able to:
Understand system security
Explain why system security is an important concern
Discuss the threats to system security
Explain personal computer and system integrity
Define risk analysis
12.1 Introduction
Disaster recovery closely parallels computer security operations in several functional areas. Threat
evaluation, risk assessment, mitigation, and service priorities are only a few of the items that are
on the event horizon. Traditional disaster recovery procedure looks at the varying aspects of planning
and implementation from an administrative perspective, focusing primarily on physical infrastructure,
backup and restoration procedure, staffing, logistical operations, and connectivity. Attention to
computer security must be given at all levels of recovery to ensure the integrity of the system(s).
Computer security is not restricted to these three broad concepts. Additional ideas that are often
considered part of the taxonomy of computer security include:
Access control: Ensuring that users access only those resources and services that they are entitled
to access and that qualified users are not denied access to services that they legitimately expect to
receive.
Nonrepudiation: Ensuring that the originators of messages cannot deny that they in fact sent the
messages.
Availability: Ensuring that a system is operational and functional at a given moment, usually
provided through redundancy; loss of availability is often referred to as "denial-of-service".
Privacy: Ensuring that individuals maintain the right to control what information is collected
about them, how it is used, who has used it, who maintains it, and what purpose it is used for.
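A toy example may make the access-control idea concrete: users reach only the resources they are entitled to, and entitled users are not refused. The sketch below checks a user against a per-resource access list; the resource names and user names are invented for illustration.

```python
# Sketch: a minimal access-control list (ACL) check. Contents invented.

ACL = {
    "payroll.db": {"alice", "hr_admin"},
    "backup.log": {"operator"},
}

def can_access(user, resource):
    """True iff the user appears on the resource's access list."""
    return user in ACL.get(resource, set())

assert can_access("alice", "payroll.db")    # qualified user admitted
assert not can_access("bob", "payroll.db")  # unentitled user denied
```

Real systems layer groups, roles, and audit logging on top of this basic membership test, but the entitlement check itself reduces to the same question.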
The effects of various threats vary considerably: some affect the confidentiality or integrity of data
while others affect the availability of a system.
12.4.1 Errors and Omissions
Errors and omissions are an important threat to data and system integrity. These errors are caused not
only by data entry clerks processing hundreds of transactions per day, but also by all types of users
who create and edit data. Many programs, especially those designed by users for personal computers,
lack quality control measures. However, even the most sophisticated programs cannot detect all types
of input errors or omissions. A sound awareness and training program can help an organization
reduce the number and severity of errors and omissions.
Users, data entry clerks, system operators, and programmers frequently make errors that contribute
directly or indirectly to security problems. In some cases, the error is the threat, such as a data entry
error or a programming error that crashes a system. In other cases, the errors create vulnerabilities.
Errors can occur during all phases of the systems life cycle.
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………
Potential exposures may be classified as natural, technical, or human threats. Examples include:
Natural Threats: internal flooding, external flooding, internal fire, external fire, seismic activity,
high winds, snow and ice storms, volcanic eruption, tornado, hurricane, epidemic, tidal wave,
typhoon.
Technical Threats: power failure/fluctuation, heating, ventilation or air conditioning failure,
malfunction or failure of CPU, failure of system software, failure of application software,
telecommunications failure, gas leaks, communications failure, nuclear fallout.
Human Threats: robbery, bomb threats, embezzlement, extortion, burglary, vandalism, terrorism,
civil disorder, chemical spill, sabotage, explosion, war, biological contamination, radiation
contamination, hazardous waste, vehicle crash, airport proximity, work stoppage (Internal/External),
computer crime.
6. Trojan horses commonly use network services to propagate to other host systems.
(a) True (b) False
7. ........................is one of the most insidious threats that enterprises encounter today.
(a) Malware (b) Virus (c) Worm (d) Trojan Horse
12.7.2 Records
Records can be classified into one of three categories: vital records, important records, and useful
records.
Vital records are irreplaceable. Important records can be obtained or reproduced at considerable
expense and only after considerable delay. Useful records would cause inconvenience if lost, but can
be replaced without considerable expense.
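As a hypothetical sketch, this classification might be modelled as a lookup table; the record names below are invented for illustration, not taken from the text:

```python
from enum import Enum

class RecordCategory(Enum):
    VITAL = "vital"          # irreplaceable
    IMPORTANT = "important"  # reproducible only at considerable expense and delay
    USEFUL = "useful"        # inconvenient to lose, but cheap to replace

# Hypothetical classification of a few record types.
RETENTION_POLICY = {
    "articles_of_incorporation": RecordCategory.VITAL,
    "accounts_receivable_ledger": RecordCategory.IMPORTANT,
    "internal_phone_directory": RecordCategory.USEFUL,
}

def needs_protected_duplicate(record_type: str) -> bool:
    """Vital and important records should be duplicated and stored
    in an area protected from fire or its effects."""
    category = RETENTION_POLICY[record_type]
    return category in (RecordCategory.VITAL, RecordCategory.IMPORTANT)
```

A retention schedule built this way would still need approval by legal counsel, as the text notes.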
Vital and important records should be duplicated and stored in an area protected from fire or its
effects.
Protection of records also depends on the particular threat that is present. An important consideration
is the speed of onset and the amount of time available to act. This could range from gathering papers
hastily and exiting quickly to an orderly securing of documents in a vault. Identifying records and
information is most critical for ensuring the continuity of operations.
A systematic approach to records management is also an important part of the risk analysis process
and business recovery planning. Additional benefits include: reduced storage costs, expedited service,
federal and state statutory compliance.
Records should not be retained only as proof of financial transactions, but also to verify compliance
with legal and statutory requirements. In addition, businesses must satisfy retention requirements as
an organization and employer. These records are used for independent examination and verification
of sound business practices.
Federal and state requirements for records retention must be analyzed. Each organization should have
its legal counsel approve its own retention schedule. As well as retaining records, the organization
should be aware of the specific record salvage procedures to follow for different types of media after
a disaster.
Caution
Records kept in the computer room should be minimized and should be stored in closed metal files or
cabinets. Records stored outside the computer room should be in fire-resistant file cabinets with fire
resistance of at least two hours.
12.8.1 Recovery
Recovery of database integrity has the highest priority; if a database transaction fails or must be
cancelled, the effects of the transaction must be removed and the database must be restored to its
exact condition before the transaction began.
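This rollback behaviour can be illustrated with any transactional database; the sketch below uses Python's built-in sqlite3 (not Adabas), with an invented account table and an invented business rule:

```python
import sqlite3

# If a transaction fails, rollback removes its effects and restores
# the database to its exact state before the transaction began.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 100)")
conn.commit()

try:
    conn.execute("UPDATE account SET balance = balance - 150 WHERE name = 'A'")
    (bal,) = conn.execute(
        "SELECT balance FROM account WHERE name = 'A'").fetchone()
    if bal < 0:  # hypothetical rule: balances must not go negative
        raise ValueError("insufficient funds")
    conn.commit()
except ValueError:
    conn.rollback()  # remove the effects of the failed transaction

(bal,) = conn.execute("SELECT balance FROM account WHERE name = 'A'").fetchone()
print(bal)  # 100: the pre-transaction state is restored
```

Adabas achieves the same guarantee through its ET logic and backout processing, described next.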
The standard Adabas system provides transaction logic (called ET logic), extensive
checkpoint/logging facilities, and transaction-reversing backout processing to ensure database
integrity.
Restarting the database following a system failure means reconstructing the task sequence from a
saved level before the failure, up to and including the step at which the failure occurred, including, if
possible, successfully completing the interrupted operation and then continuing normal database
operation. Adabas provides a recovery aid that reconstructs a recovery job stream to recover the
database.
Recoverability is often an implied objective. Everyone assumes that whatever happens, the system
can be systematically recovered and restarted. There are, however, specific facts to be determined
about the level of recovery needed by the various users of the system. Recoverability is an area where
the DBA needs to take the initiative and establish necessary facts. Initially, each potential user of the
system should be questioned concerning his recovery/restart requirements. The most important
considerations are:
how long the user can manage without the system;
how long each phase can be delayed;
what manual procedures, if any, the user has for checking input/output and how long these take;
what special procedures, if any, need to be performed to ensure that data integrity has been
maintained in a recovery/restart situation.
The RLOG holds a minimum of four consecutive generations, up to a maximum value specified when
the RLOG is activated; the maximum is 32. If RLOG space is not sufficient to hold the specified
number of generations, the oldest generation is overwritten with the newest in wraparound fashion.
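The wraparound behaviour can be sketched with a bounded buffer; this is a hypothetical Python illustration of the policy described, not the actual RLOG implementation:

```python
from collections import deque

class GenerationLog:
    """Sketch of wraparound generation logging: once `maximum`
    generations are held, the oldest is overwritten by the newest."""
    def __init__(self, maximum: int = 32):
        # Per the text, the RLOG holds between 4 and 32 generations.
        assert 4 <= maximum <= 32
        self.generations = deque(maxlen=maximum)  # drops oldest when full

    def log(self, generation):
        self.generations.append(generation)

log = GenerationLog(maximum=4)
for g in range(1, 7):          # log six generations into space for four
    log.log(g)
print(list(log.generations))   # [3, 4, 5, 6]: generations 1 and 2 overwritten
```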
The RLOG file is formatted like other database components by running the ADAFRM utility (SIZE
parameter), and then defined using the PREPARE function of the Recovery Aid ADARAI utility
(with the RLOGSIZE parameter). The space required for the RLOG file is approximately 10 cylinders
of 3380 or equivalent device space.
The ADARAI PREPARE function must be performed just before the ADASAV SAVE run that begins
the first generation to be logged. After ADARAI PREPARE is executed, all subsequent nucleus and
utility jobs that update the database must specify the RLOG file. Of course, the RLOG file can be
included in any or all job streams, if desired.
The RLOG file job statement should be similar to the following:
//DDRLOGR1 DD DISP=SHR,DSN=... .RLOGR1
Note that the node you specify may be any active cluster node; it does not have to be the node
identified as the lead node on the command line when the cluster was originally started. Also, the
deployment file you specify must be the currently active deployment settings for the running database
cluster.
If security is enabled for the cluster, you must also specify a username and, optionally, a password on
the command line. (If you specify a username but not a password, you will be prompted for the
password.) The full syntax for specifying the node to reconnect to is as follows. You only need to
specify the port number if the server was started using a different port number than the default.
username:password@nodename:port
For example, the following command attempts to rejoin the current system to the cluster that includes
the node voltserver2 using the username operator. VoltDB will prompt for the password.
$ voltdb rejoinhost operator@voltserver2 \
deployment mydeployment.xml
12.12 Plans
A disaster recovery plan is a written plan describing the steps a company or individual would take to
restore computer operations in the event of a disaster. Every company, and each department or
division within an enterprise, usually has its own disaster recovery plan. A disaster recovery plan
contains four major components: the emergency plan, the backup plan, the recovery plan, and the
test plan.
Hot site––is a separate facility that mirrors the systems and operations of the critical site. The hot site
always operates concurrently with the main site. This type of backup site is the most expensive to
operate. Hot sites are popular with stock exchanges and other financial institutions that may need to
evacuate due to potential bomb threats and must resume normal operations as soon as possible.
Warm site––A warm site is a location to which the business can relocate after the disaster, already
stocked with computer hardware similar to that of the original site, but it does not contain backed-up
copies of data and information.
Cold site––is a site that mirrors some of the critical site hardware, but does not become operational
until the critical site becomes unavailable. It is the most inexpensive type of backup site for a
business to operate. It does not include backed-up copies of data and information from the original
location of the business, nor does it include hardware already set up.
The location of the alternate site facility is important. It should be close enough to be convenient, yet
not so close that a single disaster, such as an earthquake, could destroy both facilities. All sites
should have high-speed Internet services.
12.13 Team
General roles and responsibilities for teams involved in systems recovery are defined below. Specific
tasks for each team related to recovery in the event of an incident are listed in section 8.0. Individuals
currently filling these positions, along with their contact information, can be found in Appendix B.
Team Members:
Associate Director
Facilities Manager
Telecom Manager
Internet Services Manager
Data Services Manager
Responsibilities:
This team is responsible for the overall coordination of responses to all emergencies affecting
information and telecommunication systems.
Readiness Responsibilities
o Support related training.
o Test and update the System Recovery Plan.
o Manage the Assets and Services Database.
o Conduct an annual review of the Systems Recovery Plan along with a reassessment of risks
and an update of the Risk Mitigation Plan.
o Market the IT Systems Recovery Plan to campus.
Recovery Responsibilities
o Assess severity of service interruption and declare disaster if warranted.
o Initiate action by appropriate recovery team(s).
o Manage communications amongst recovery teams.
o Manage communications to Campus.
o Co-ordinate resources and financial requirements needed to effect recovery.
o Declare return to service under temporary operations.
o Declare return to service under normal operations.
o Assess recovery process after return to normal operation.
o Implement updates or improvements to the SRP.
12.13.2 Facilities Recovery Team
Team Leader – Telecom Manager
Backup Team Leader – Facilities Manager
Team Members:
Facilities Manager
Network Analyst
Telecom Electrician
Telecom Technician
Physical Plant Person(s)
Responsibilities
This team is responsible for responding to emergencies which physically impact the computer rooms,
network and telephone equipment rooms, cabling and wiring infrastructure, ancillary equipment such
as UPS and air-conditioning, servers and network devices. The focus of this recovery team is to
provide the facilities and devices necessary to restore services that have been disrupted.
Readiness Responsibilities
o Review backup power and cooling capabilities.
o Ensure related contracts are in place.
o Ensure network and wiring diagrams are up-to-date and a printed copy is available.
o Ensure information from Physical Plant is available as needed.
o Conduct annual review of IT facilities with regard to power, cooling, security, fire detection
and suppression and water detection.
o Ensure that Asset and Services Management Database (ASMD) is up to date and a copy is
available.
o Document power management grid layout for data centre.
o Ensure an appropriate level of spares is in place.
Recovery Responsibilities
o Perform assessment and recovery tasks as outlined in section 8.4.
o Proactively communicate to Recovery Management Team.
Team Members:
Data Services Manager
Facilities Manager
System Support Specialist
System Support Specialist
System Support Specialist
System Support Specialist
Responsibilities
This team is responsible for response and resolution of all emergencies affecting the server hardware,
operating system and applications as identified in the asset and service management database.
Examples of these services include Email, Banner, WebCT.
Readiness Responsibilities
o Ensure operating system, data and application files are backed up.
o Conduct periodic test to ensure backup and recovery procedures are current and tested.
o Ensure currency of configuration parameters, procedures, tools, and processes for re-build of
services.
o Ensure that service contracts are up-to-date.
o Train backup personnel in rebuild procedures.
o Keep Assets and Services Management Database up to date as systems are added, changed or
removed.
Recovery Responsibilities
o Perform assessment and recovery tasks as outlined in section 8.5.
o Proactively communicate to Recovery Management Team.
Team Members:
Lab Supervisor
Desktop Team Lead
Help Desk Team Lead
Responsibilities
This team is responsible for response and resolution of all emergencies affecting all desktops except
where units have their own desktop support staff. Information Technology will work with any such
units to assist them with their own system recovery plan.
Readiness Responsibilities
o Identify source of replacement desktops.
o Keep inventory of assets up to date.
o Ensure ghost images of default configurations are in place.
Recovery Responsibilities
o Perform assessment and recovery tasks as outlined in section 8.6.
o Proactively communicate to Recovery Management Team.
Recovery Responsibilities
o Perform assessment and recovery tasks as outlined in section 8.7.
o Proactively communicate to Recovery Management Team.
It is encouraging that in all of the ethics codes of the computer professional societies there is an
emphasis on the relationship and interaction of the computer professional with other people, rather
than with machines. This properly places the focus of ethical behaviour upon ethical or right dealings
with people, rather than upon the technology. One reason that the four codes are not only similar to
each other, but also very similar to codes of non-computer professionals is that they take a generic
approach to ethics. With the exception of the concern raised about privacy and the confidentiality of
data, the codes could have been written to cover most professions and do not fully reflect the unique
ethical problems raised by computer technology.
12.17 Summary
Security is "freedom from risk or danger." In the context of computer science, security is the
prevention of access by unauthorized recipients.
Computer security is frequently associated with three core areas: confidentiality, integrity,
and authentication.
Privacy is a property of individuals; confidentiality is a property of data; and security is a
property assigned to computer hardware and software systems.
Analyzing security by function can be a valuable part of the security planning process.
The term malicious hackers refers to those who break into computers without authorization.
12.18 Keywords
Errors and omissions are an important threat to data and system integrity.
Network spoofing: In network spoofing, a system presents itself to the network as though it were a
different system.
Packet replay: This refers to the recording and retransmission of message packets in the network.
Trojan horse: A program that performs a desired task, but that also includes unexpected functions.
Virus: A code segment that replicates by attaching copies of itself to existing executables.
12.19 Review Questions
1. Define the term security system.
2. Why is system security an important concern?
3. What are the various threats to system security?
4. What do you mean by system integrity?
5. How can risk be analysed in system security?
6. Where and when may recovery fail?
7. Why is disaster/recovery planning useful?
8. What are the various ethics in system development?
9. Briefly explain the term security vulnerabilities.
10. What precautions can protect our systems from malware?
13.0 Objectives
After studying this chapter, you will be able to:
Discuss the data processing concept
Differentiate between data and information
Discuss the characteristics of useful information
Define data processing
Explain the need and approaches of data processing
Discuss the types of data processing
Define data management
Explain data organization
Discuss the database management systems
13.1 Introduction
The Electronic Data Processing (EDP) division provides computerized services for all departments in
the TRC. TRC departments have direct access to the data with their personal computers (PC). The
EDP division is continuing to give data management support including data entry/verification to
various studies undertaken in the centre. This division also generates reports, prepares pre-printed
forms for field activity, and supplies data tabulations for monitoring the studies and publication
of research work. It also helps in the preparation of employees' payroll, income-tax sheets, loan
schedules and central bills.
This division serves all the departments in their computing and data-sharing needs and helps provide
internet access throughout the centre. At present the EDP division supports three server systems and
four network printers, and caters to over 80 Pentium computers. The EDP division protects data
through frequent backups. Apart from data processing and data management, this division maintains
all the servers, PCs and printers under a comprehensive maintenance contract service to avoid
breakdowns. Annual procurement of computer consumables for user departments is also done by
making indents through this division.
At present, six data entry/verification operators, six data processing assistants and one EDP
in-charge work in this division.
Figure 13.2: Sales per employee for each of the ROBCOR's two divisions.
When data is stored electronically in files, it can be used as input for an information system. An
information system has programs to process (or transform) data to produce information as an output;
see Figure 13.3. Information reveals the meaning of data.
For example, a student's data values such as ID, name, address, major, and phone number represent
raw facts. A class roll is a list showing the IDs and names of the students who are enrolled in a
particular class.
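The data-to-information transformation can be sketched in Python; the student names, IDs, and class code below are invented for illustration:

```python
# Raw facts (data): student records with ID, name, major, and enrolled classes.
students = [
    {"id": "S01", "name": "Asha",  "major": "CS",   "classes": ["MCA-101"]},
    {"id": "S02", "name": "Ravi",  "major": "Math", "classes": ["MCA-102"]},
    {"id": "S03", "name": "Meena", "major": "CS",   "classes": ["MCA-101"]},
]

def class_roll(students, class_code):
    """Process raw data into information: the roll for one class."""
    return [(s["id"], s["name"]) for s in students
            if class_code in s["classes"]]

print(class_roll(students, "MCA-101"))  # [('S01', 'Asha'), ('S03', 'Meena')]
```

The same raw facts yield different information depending on the processing applied to them.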
13.4.1 Input
The term input refers to the activities required to record data and to make it available for processing.
The input can also include the steps necessary to check, verify and validate data contents.
Figure 13.4: Data processing cycle.
13.4.2 Processing
The term processing denotes the actual data manipulation techniques such as classifying, sorting,
calculating, summarizing, comparing, etc. that convert data into information.
13.4.3 Output
It is a communication function which transmits the information, generated after processing of data, to
persons who need the information. Sometimes output also includes decoding activity which converts
the electronically generated information into human-readable form.
13.4.4 Storage
It involves the filing of data and information for future use. The above mentioned four basic
functions are performed in a logical sequence, as shown in Figure 13.4, in all data processing systems.
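The four-step cycle (input, processing, output, storage) can be sketched as a minimal Python pipeline; the transaction data and field names are invented for illustration:

```python
def input_step(raw_lines):
    """Input: record data and validate it (here: amount must parse as int)."""
    records = []
    for line in raw_lines:
        name, amount = line.split(",")
        records.append({"name": name.strip(), "amount": int(amount)})
    return records

def processing_step(records):
    """Processing: summarize the validated records into information."""
    total = sum(r["amount"] for r in records)
    return {"count": len(records), "total": total}

def output_step(summary):
    """Output: present the information in human-readable form."""
    return f"{summary['count']} transactions, total {summary['total']}"

def storage_step(summary, store):
    """Storage: file the information for future use."""
    store.append(summary)

store = []
data = input_step(["pens, 120", "paper, 80"])
info = processing_step(data)
storage_step(info, store)
print(output_step(info))  # 2 transactions, total 200
```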
Data Conversion
Data conversion is the process of converting one form of data into another, for example converting
data from paper source to digital database or converting from one operating system to another.
Data Entry
Data entry services cover manual entry, automated data capture, quality checks and proofreading.
Skilled staff deliver highly accurate output, entering data from handwritten or printed documents,
electronic data, scanned images and other document types, often with 24 x 7 support and assistance.
Process of Data
Data processing involves digitizing, capturing and handling of data, including word processing,
form processing, image processing, data entry, etc., from different sources, as well as converting
them into a database for effective analysis and research.
Image Processing
Image processing involves analysing and manipulating images to achieve a required format or
quality, or reporting based on the study of images. It also involves converting an image from one
format to another according to an organization's requirements.
Form Processing
Form processing and survey processing are important in major domains such as banking,
insurance, health care, billing, etc. Optical character recognition (OCR), intelligent character
recognition (ICR) and intelligent mark recognition software are used for quick processing. In many
cases this software cannot capture data owing to unreadable handwriting or similar problems. In such
situations, manual data entry is undertaken, and the manually processed forms are verified for
accuracy before final production.
Caution
The database must be secured with a proper security system; do not grant every user permission to
access all data.
3. The term.....................refers to the activities required to record data and to make it available for
processing.
(a) input (b) output
(c) process (d) storage
All the calculations on data are performed manually. This is a slow method and errors may occur.
It is an old method, used before the invention of calculators, but data is still processed
manually in many small shops.
Example: A book seller (a small book shop) records his daily transactions manually. He prepares bills
with pen, paper and carbon paper (no doubt, the brain is the main data processor in this case). At the
end of the day he uses the carbon copies made on that date to know how many books he sold and how
much income he earned.
Example: The book seller can use a calculator to speed up his data processing system. There will be
less chance of errors in calculations, and bill calculations will be much faster and easier too.
Example: Suppose there are 800 students in a college with a manual library system. If we want to
know which students have not returned books for over a year, we would have to search the registers
for all 800 students' records. A computer can do this job within seconds.
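The library search can be sketched in Python; the 800 loan records below are fabricated purely to make the example runnable:

```python
from datetime import date, timedelta

# Hypothetical loan records for the library example: (student_id, issue_date).
today = date(2024, 6, 1)
loans = [("S%03d" % i, today - timedelta(days=30 * (i % 20)))
         for i in range(800)]

# Which students have not returned books for over a year?
overdue = [sid for sid, issued in loans if (today - issued).days > 365]
print(len(overdue))  # the computer scans all 800 records in a fraction of a second
```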
13.8.2 Field
Data items are physically arranged as fields in a computer file. Their length may be fixed or variable.
Since all individuals have 3-digit employee numbers, a 3-digit field is required to store this
particular data; hence it is a fixed field. In contrast, since a customer's name varies considerably
from one customer to another, a variable amount of space must be available to store this element.
This is called a variable field.
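Fixed and variable fields can be illustrated with Python's struct module; the employee number and name are invented, and the length-prefix layout for the variable field is one possible convention, not a standard:

```python
import struct

def pack_record(emp_no: str, name: str) -> bytes:
    """Fixed field: the employee number is always exactly 3 bytes.
    Variable field: the name is stored with a 1-byte length prefix."""
    assert len(emp_no) == 3, "employee numbers are 3-digit fixed fields"
    encoded = name.encode("ascii")
    return struct.pack(f"3sB{len(encoded)}s",
                       emp_no.encode(), len(encoded), encoded)

def unpack_record(blob: bytes):
    emp_no = blob[:3].decode()
    name_len = blob[3]                      # length prefix of the variable field
    name = blob[4:4 + name_len].decode()
    return emp_no, name

blob = pack_record("101", "Pankaj")
print(unpack_record(blob))  # ('101', 'Pankaj')
```

The fixed field can be located by position alone; the variable field needs the length prefix to find its end.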
13.8.3 Record
A record is a collection of related data items or fields. Each record normally corresponds to a specific
unit of information. For example, various fields in the record, illustrated in Figure 13.9 are employee
number, employee's name, basic salary and house rent allowance. This is the data used to produce the
payroll register report. The first record contains all the data concerning the employee Pankaj. The
second record contains all the data concerning the employee Rekha. Each subsequent record contains
all the data for a given employee. It can be seen how each related item is grouped together to form a
record.
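A minimal Python sketch of such records, using the field names from the text; the salary figures are invented, since Figure 13.9 is not reproduced here:

```python
from dataclasses import dataclass

@dataclass
class PayrollRecord:
    """One record groups the related fields for one employee."""
    employee_number: str
    employee_name: str
    basic_salary: int         # figures below are invented for illustration
    house_rent_allowance: int

payroll_file = [  # a file is a collection of related records
    PayrollRecord("101", "Pankaj", 30000, 6000),
    PayrollRecord("102", "Rekha", 32000, 6400),
]

# Producing a payroll register line from each record:
for rec in payroll_file:
    print(rec.employee_name, rec.basic_salary + rec.house_rent_allowance)
```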
13.8.5 Database
The collection of related files is called a database. A database contains all the related files for a
particular application.
In essence, data warehousing solutions are meant to enhance data collection and integration to enable
accurate and timely reporting. Since good design translates to improved information handling and
management, it supports quick, efficient and informed business analysis and decision-making, which
are essential to staying competitive and profitable. With such clear benefits to data warehousing,
companies should commit resources and develop a strong enterprise vision to ensure that a
workable data warehouse is put into place and maintained.
13.11 Future Trends in Data Warehousing
13.11.1 Top Ten Trends in Data Warehousing
Although data warehousing has greatly matured as a technology discipline over the past ten years,
enterprises that undertake data warehousing initiatives continue to face fresh challenges that evolve
with the changing business and technology environment. The data warehouse is being called on to
support new initiatives, such as customer relationship management and supply chain management,
and has also been directly impacted by the rise of e- business. Data warehousing vendors have
developed new and more sophisticated technologies and have acquired and merged with other
vendors. The number of home-grown and packaged software implementations throughout the average
enterprise has grown rapidly, creating more data sources and information delivery options. With all
of the activity surrounding data warehousing, it is hard to sort out which issues and trends are most
pressing for enterprises. To that end, this section presents insights into the ten biggest data
warehousing challenges facing organizations.
What problems and challenges have made these do-overs necessary? There are some common pitfalls
that many enterprise data warehousing initiatives have fallen into:
Many organizations undertake data warehousing projects with a build-it-and-they-will-come
attitude. Unfortunately, this philosophy has doomed many a data warehouse to failure. Data
warehousing projects need to involve end users from the beginning to ensure buy -in when the
data warehouse is complete. Some organizations also fail to create the killer apps that actually
deliver the benefits of the data warehouse to end users.
Another pitfall is not architecting the data warehouse for performance, scalability and reliability.
Many enterprises do not take future needs into account when building their initial data warehouse
and fail to anticipate the demands of warehouse operations. They are forced to rebuild their data
warehouse from the ground up when data volumes and user demands overwhelm their original
systems.
Data quality issues are often ignored in initial data warehouse implementations. Enterprises do
not feel the negative impact of poor data quality until after their data warehouse is already up and
running. Many are now re-examining the quality of the data in their warehouses and are
undertaking the painful process of resolving data quality problems.
Some data warehouses are unsuccessful because their sponsors did not take the time to define
success at the outset of the project. According to META Group, only 40% of enterprises measure
ROI for their data warehousing initiatives. Without a clear definition of success, it is hard to
determine whether the data warehouse is delivering real business benefits.
Finally, many data warehousing projects simply fall into the late-and-over-budget trap.
Enterprises fail to anticipate the scope of their data warehousing projects and do not implement
proper project planning.
The good news behind past data warehousing "failures" is that enterprises have learned from their
mistakes and are developing a set of best practices as they correct the problems. This means more
successful implementations in the future as newcomers to data warehousing learn from those who
have been there before.
Outsourcing
Although enterprises have not yet begun to outsource their actual data warehouses, they are
outsourcing other applications and, by extension, the data used and generated by those applications.
The use of outsourcing is growing rapidly. Gartner, Inc. estimates that by 2003, 45% of large
enterprises will host or rent some form of business application with an application service provider
(ASP). ASPs offer fast application deployment and application expertise that an enterprise might not
possess. While the benefits can be great, enterprises that use ASPs must manage the risks inherent in
outsourcing data. First, enterprises should make sure that their ASP is taking adequate security
measures to keep data separate and private from the data of the ASP's other customers. Second, the
enterprise should ensure that the ASP has experience with moving large volumes of data so that
migration of data to and from the ASP will go smoothly. Third, the ASP should have proven
experience in backup and recovery for the database(s) being used. Finally, enterprises should ensure
that the flow of data between the enterprise's internal systems and the ASP can be kept intact.
Enterprises can handle the growing number of end users through the use of several techniques
including parallelism and scalability, optimized data partitioning, aggregates, cached result sets and
single-mission data marts. These techniques allow a large number of employees to concurrently
access the data warehouse without compromising performance. Accommodating the different needs
of various end-user groups will require as much of an organizational solution as a technical one. Data
warehousing teams should involve end users from the beginning in order to determine the types of
data and applications necessary to meet their decision-making needs.
More Complex Queries
In addition to becoming more numerous, queries against the data warehouse will also become more
complex. User expectations are growing in terms of the ability to get exactly the type of information
needed, when it is needed. Simple data aggregation is no longer enough to satisfy users who want to
be able to drill down on multiple dimensions. For example, it may not be enough to deliver a regional
sales report every week. Users may want to look at the data by customized dimensions – perhaps by a
certain customer characteristic, a specific sales location or the time of purchase.
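Drill-down along such customized dimensions can be sketched as a grouped aggregation; the sales facts and dimension names below are invented for illustration:

```python
from collections import defaultdict

# Invented sales facts: (region, location, week, amount).
sales = [
    ("North", "Store-1", 22, 500), ("North", "Store-2", 22, 300),
    ("South", "Store-3", 22, 700), ("North", "Store-1", 23, 400),
]

def aggregate(facts, *dims):
    """Summarize the amount measure along any chosen dimensions."""
    index = {"region": 0, "location": 1, "week": 2}
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[index[d]] for d in dims)
        totals[key] += fact[3]
    return dict(totals)

print(aggregate(sales, "region"))          # simple regional totals
print(aggregate(sales, "region", "week"))  # drill down: totals per region per week
```

The same facts support both the coarse weekly regional report and the finer per-dimension views users now expect.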
Users are also demanding more sophisticated business intelligence tools. According to Gartner, data
mining is the most rapidly growing business intelligence technology. Other sophisticated
technologies are also becoming more popular. Vendors are developing software that can monitor data
repositories and trigger reactions to events on a real-time basis. For example, if a telecom customer
calls to cancel his call-waiting feature, real-time analytic software can detect this and trigger a
special offer of a lower price in order to retain the customer. Vendors are also developing a new
generation of data mining algorithms, featuring predictive power combined with explanatory
components, robustness and self-learning features. These new algorithms automate data mining and
make it more accessible to mainstream users by providing explanations with results, indicating when
results are not reliable and automatically adapting to changes in underlying predictive models and/or
data structures.
Enterprises can handle complex queries and the demands of advanced analytic technologies by
implementing some of the same techniques used to handle the increasing number of users, including
parallelism. These techniques ensure that complex queries will not compromise data warehouse
performance. In trying to meet end-user demands, enterprises will also need to address data
warehouse availability. In global organizations, users need 24 x 7 uptime in order to get the
information they need. In enterprises with moderate data volumes, high availability is easily
implemented with high redundancy levels. In enterprises with large data volumes, however, systems
must be carefully engineered for robustness through the use of well-designed parallel frameworks.
As the complexity, variety, and penetration of such services grow, data centres will continue to grow
and proliferate. Several forces are shaping the data centre landscape, and we expect future data centres
to be a lot more than simply bigger versions of those existing today. Data centres are evolving into
distributed, virtualized, multi-layered infrastructures that pose a variety of difficult challenges.
In particular, we consider a layered model of virtualized data centres and discuss storage, networking,
management, and power/thermal issues for such a model. Because of the vastness of the space, we
shall avoid detailed treatment of certain well researched issues. In particular, we do not delve into the
intricacies of virtualization techniques, virtual machine migration and scheduling in virtualized
environments.
The bottom layer in this conceptual model is the Physical Infrastructure Layer (PIL), which manages
the physical infrastructure (often known as a "server farm") installed in a given location. Because of
the increasing cost of the power consumed, space occupied, and management personnel required, server
farms are already being located closer to sources of cheap electricity, water, land, and manpower.
These locations are by their nature geographically removed from areas of heavy service demand, and
thus the developments in ultra-high-speed networking over long distances are essential enablers of
such remotely located server farms. In addition to the management of physical computing hardware,
the PIL can allow for larger-scale consolidation by providing capabilities to carve out well-isolated
sections of the server farm (or "server patches") and assign them to different "customers." In this
case, the PIL will be responsible for management of boundaries around the server patch in terms of
security, traffic firewalling, and reserving access bandwidth. For example, set-up and management of
virtual LANs will be done by the PIL.
The next layer is the Virtual Infrastructure Layer (VIL), which exploits the virtualization capabilities
available in individual servers, network and storage elements to support the notion of a virtual
cluster, i.e., a set of virtual or real nodes along with QoS controlled paths to satisfy their
communication needs. In many cases, the VIL will be internal to an organization that has leased an
entire physical server patch to run its business. However, it is also conceivable that VIL services are
actually under the control of an infrastructure provider that effectively presents a virtual server patch
abstraction to its customers. This is similar to cloud computing, except that the subscriber to a virtual
server patch would expect explicit SLAs in terms of computational, storage and networking
infrastructure allocated to it and would need enough visibility to provide its own next level
management required for running multiple services or applications.
The third layer in our model is the Virtual Infrastructure Coordination Layer (VICL) whose purpose
is to tie up virtual server patches across multiple physical server farms in order to create a
geographically distributed virtualized data centre (DVDC). This layer must define and manage virtual
pipes between various virtual data centres. This layer would also be responsible for cross-geographic
location application deployment, replication and migration whenever that makes sense. Depending on
its capabilities, VICL could be exploited for other purposes as well, such as reducing energy costs by
spreading load across time-zones and utility rates, providing disaster or large scale failure tolerance,
and even enabling truly large-scale distributed computations.
Finally, the Service Provider Layer (SPL) is responsible for managing and running applications on
the DVDC constructed by the VICL. The SPL would require substantial visibility into the physical
configuration, performance, latency, availability and other aspects of the DVDC so that it can manage
the applications effectively. It is expected that SPL will be owned by the customer directly.
The model in Figure 13.15 subsumes everything from a non-virtualized, single-location data centre
entirely owned by a single organization all the way up to a geographically distributed,
fully virtualized data centre where each layer possibly has a separate owner. The latter extreme
provides a number of advantages in terms of consolidation, agility, and flexibility, but it also poses a
number of difficult challenges in terms of security, SLA definition and enforcement, efficiency and
issues of layer separation. For this reason, real data centres are likely to be limited instances of this
general model.
In subsequent sections, we shall address the needs of such DVDCs where relevant, although many of
the issues apply to traditional data centres as well.
13.16 Requirements for Modern Data Centres
Businesses and other organizations of all sizes rely heavily on their data centre in order to run their
operations. In this section, we shall take a look at seven of the most crucial requirements of modern
data centres.
1. Continuity
For many organizations, if a system were to become unavailable even for a short period of time, it
could have devastating effects on their ability to function, with costs potentially running into the
millions. As a result, one of the most important factors to consider in the modern data centre is
continuity. A data centre must minimize or, ideally, eliminate the potential for downtime, for example
by providing emergency backup power generation should the data centre suffer a power outage.
2. Security
Another key concern is security. With so much critical and often classified information being stored
in data centres, it is important that this information is protected from unauthorized
access. Another security concern relates to the threat of an unexpected disaster such as fire or flood.
Data centres should have backup options available should the system fail, and should remain secure
at all times.
3. Individual or co-located?
Data centres can be split into two categories. The first category involves data centres that serve the
needs of an individual company (a large data centre located on the premises of a large business, for
example, and which has been customized to suit that business‘s purposes). The second involves data
centre services where equipment and bandwidth can be rented by many different customers or
businesses. This is known as a ―co-located‖ data centre.
4. Low costs
With the data centre such a crucial part of so many businesses, it‘s obvious that the costs associated
with installing, maintaining and upgrading a data centre are going to be significant. As a result, any
steps that can be taken to lower the costs of data centres can result in huge savings for a company:
savings that have the potential to give it a competitive advantage over other organizations.
5. Environmentally friendly
Data centres are one of the most energy-intensive parts of any organization. In fact, data centres
alone make up about 2% of the world‘s annual electricity bill. Because data centres rely so heavily on
energy, it means that their carbon footprint is quite high. With governments around the world taking
steps to minimize carbon emissions, companies are now expected to find ways to lower the energy
output of their data centres. This can have tremendous cost savings as well, both in terms of
electricity bill spending and also through avoiding impending government carbon taxes.
6. Scalability
When a data centre is being installed, one of the most important considerations is its scalability. After
all, as a company grows, data centres may reach capacity, at which point the data centre will need to
be upgraded in order for an organization to continue its growth. Scalability must be planned for from
the outset, otherwise companies may find themselves needing to replace their data centre altogether,
which can be quite costly.
7. Automation
Data centres that can almost eliminate the need to be accessed by IT personnel (except under special
circumstances) are known as "dark data centres", and these can have tremendous benefits for the
business. Many of the maintenance processes associated with such data centres are automated, saving
the cost of man hours and also saving electricity due to eliminating the need for lighting.
8. The EDP programs and data files cannot be changed without the use of EDP equipment.
(a) True (b) False
9. The DBMS can maintain the integrity of the database by allowing more than one user to update the
same record at the same time.
(a) True (b) False
10. Data centres are not one of the most energy-intensive parts of any organization.
(a) True (b) False
13.17 Summary
Data processing is the conversion of data from one format into another for better maintenance as
well as more effective analysis and study.
Data management is a discipline that focuses on the proper generation, storage, and retrieval of
data.
Data processing is a process of converting data into information; it can also convert
information into data.
The data management system is the set of procedures and people through which information is
processed.
In mechanical data processing data is processed with the help of devices or machines.
Data verification is the process of checking a copy of data to make sure that it is exactly equal to
the original copy of the data.
Data validation procedures use data validation rules (or check routines) to ensure the validity
(mostly correctness and meaningfulness) of data.
13.18 Keywords
Customer Relationship Management (CRM): Customer relationship management is a widely
implemented strategy for managing a company‘s interactions with customers, clients and sales
prospects. It involves using technology to organize, automate, and synchronize business processes
principally sales activities, but also those for marketing, customer service, and technical support.
Data Definition Languages (DDL): A DDL is a language used to define data structures within a
database. It is typically considered to be a subset of SQL, the Structured Query Language, but can
also refer to languages that define other types of data.
Data Processing: Data Processing System is a system which processes data which has been captured
and encoded in a format recognizable by the data processing system, or has been created and stored by
another unit of an information processing system.
Database Maintenance: Database maintenance is an activity which is designed to keep a database
running smoothly.
Database Management System (DBMS): A Database Management System is computer software
designed for the purpose of managing databases based on a variety of data models.
Electronic Data Processing (EDP): Electronic data processing is use of computers in recording,
classifying, manipulating, and summarizing data.
1.0 Objectives
After studying this chapter, you will be able to:
Discuss the software crisis
Explain the object-oriented programming paradigm
Discuss the basic concepts of OOP
Discuss the advantages/benefits of OOP
Define the application of OOP
1.1 Introduction
A computer is a machine that receives instructions and produces a result after performing an
appropriate assignment. Since it is a machine, it expects good and precise directives in order to do
something. The end result depends on various factors ranging from the particular capabilities of the
machine, the instructions it received, and the expected result.
Computer programming is the art of writing instructions (programs) that ask the computer to do
something and give a result. A computer receives instructions in many different forms.
Some of the operating systems on the market are: Microsoft Windows 3.X, Corel Linux, IBM OS/2,
Microsoft Windows 9X, Apple OS 10, Red Hat Linux, Microsoft Windows Millennium, BeOS,
Caldera Linux, and Microsoft Windows 2000. A particular OS (for example Microsoft Windows
98) running on a particular processor (for example an Intel Pentium) is sometimes referred to as a
platform. Some of the computer languages running on Microsoft Windows operating systems are C++,
Pascal, Basic, and their variants.
There are various computer languages, for different reasons, capable of doing different things.
Fortunately, the computer can distinguish between different languages and perform accordingly. These
instructions are given by the programmer who is using compilers, interpreters, etc, to write programs.
Examples of those languages are Basic, C++, Pascal, etc.
C++ is an extension to the C programming language. It was developed at AT&T Bell Laboratories in the
early 1980s by Bjarne Stroustrup. It is a deviation from traditional procedural languages in the sense
that it follows object-oriented programming (OOP) approach which is quite suitable for managing
large and complex programs.
An object-oriented language combines the data to its function or code in such a way that access to data
is allowed only through its function or code. Such combination of data and code is called an object.
For example, an object called Tutor may contain data and function.
The data part contains the Name, Dept and Employee code. The function part consists of three
functions: To_pay ( ), Deductions ( ) and Net_pay ( ).
Many software products are either not finished, or not used, or else are delivered with major errors.
Figure 1.1 shows the fate of the defence software projects undertaken in the 1970s: around 50% of the
software products were never delivered, and one-third of those which were delivered were never used.
It is interesting to note that only 2% were used as delivered, without being subjected to any changes.
This illustrates that the software industry has a remarkably bad record in delivering products.
Changes in user requirements have always been a major problem. Another study (Figure 1.2) shows
that more than 50% of the systems required modifications due to changes in user requirements and
data formats. It only illustrates that, in a changing world with a dynamic business environment,
requests for change are unavoidable and therefore systems must be adaptable and tolerant to changes.
The data of an object can be accessed only by the functions associated with that object. However,
functions of one object can access the functions of other objects.
Some of the striking features of object-oriented programming are:
Emphasis is on data rather than procedure.
Programs are divided into what are known as objects.
Data structures are designed such that they characterize the objects.
Functions that operate on the data of an object are tied together in the data structure.
Data is hidden and cannot be accessed by external functions.
Objects may communicate with each other through functions.
New data and functions can be easily added whenever necessary.
Follows bottom-up approach in program design.
Object-oriented programming, the most recent concept among programming paradigms, means
different things to different people. It is therefore important to have a working definition of object-oriented
programming before we proceed further. Our definition of object-oriented programming is as follows:
―Object-oriented programming is an approach that provides a way of modelling programs by creating
partitioned memory area for both data, and functions that can be used as templates for creating copies
of such modules on demand.‖
That is, an object is considered to be a partitioned area of computer memory that stores data and a set of
operations that can access that data. Since the memory partitions are independent, the objects can be used
in a variety of different programs without modification.
Object-oriented programming (OOP) has taken the best ideas of structured programming and
combined them with several powerful new concepts that encourage you to approach the task of
programming in a new way. In general, when programming in an object-oriented fashion you break
down a problem into subgroups of related parts that take into account both the code and the data related to
each group, and you organize these subgroups into a hierarchy of objects. For all intents and purposes, an object
is a variable of a user-defined type. It may seem strange at first to think of an object, which links both
code and data, as a variable. However, in object-oriented programming, this is precisely the case. When
you define an object, you are implicitly creating a new data type.
1.4.1 Objects
Objects are the basic run-time entities in an object-oriented system. They may represent a person,
a bank account, a table of data or any item that the program must handle. They may also represent user-defined
data such as vectors, time and lists. A programming problem is analyzed in terms of objects and
the nature of communication between them. Program objects should be chosen such that they match
closely with the real-world objects. As pointed out earlier, objects take up space in the memory and
have an associated address like a record in Pascal, or a structure in C.
When a program is executed, the objects interact by sending messages to one another. For example if
―customer‖ and ―account.‖ are two objects in a program, then the customer object may send a message
to the account object requesting for the bank balance. Each object contains data and code to
manipulate the data. Objects can interact without having to know details of each other‘s data or code.
It is sufficient to know the type of message accepted and the type of response returned by the objects.
Although different authors represent them differently, Figure 1.4 shows two notations that are
popularly used in object-oriented analysis and design.
1.4.2 Classes
We just mentioned that objects contain data and code to manipulate that data. The entire set of data
and code of an object can be made a user-defined data type with the help of a class. In fact, objects are
variables of type class. Once a class has been defined, we can create any number of objects belonging
to that class. Each object is associated with the data of the class with which it is created. A class is
thus a collection of objects of similar type. For example, mango, apple and orange are members of the
class fruit. Classes are user-defined data types and behave like the built-in types of a programming
language. For example, the syntax used to create an object is no different than the syntax used to
create an integer object in C.
If fruit has been defined as a class, then the statement

fruit mango;

will create an object mango belonging to the class fruit.
Caution
Be careful not to introduce local variables with the same names as the instance fields in the class. For
example, the following constructor will not set the salary.
public Employee(String n, double s, . . .)
{
String name = n; // ERROR
double salary = s; // ERROR
...
}
1.4.3 Data Abstraction
The wrapping up of data and functions into a single unit (called class) is known as encapsulation. Data
encapsulation is the most striking feature of a class. The data is not accessible to the outside world and
only those functions which are wrapped in the class can access it. These functions provide the
interface between the object's data and the program. This insulation of the data from direct access by
the program is called ―data hiding‖ or ―information hiding‖.
Abstraction refers to the act of representing essential features without including the background
details or explanations. Classes use the concept of abstraction and are defined as a list of abstract
attributes such as size, weight and cost, and functions to operate on these attributes. They encapsulate
all the essential properties of the objects that are to be created. The attributes are sometimes called data
members because they hold information. The functions that operate on these data are sometimes called
methods or member functions. Since the classes use the concept of data abstraction, they are known as
Abstract Data Types (ADT).
1.4.4 Inheritance
Inheritance is the process by which objects of one class acquire the properties of objects of another
class. It supports the concept of hierarchical classification. For example, the bird robin is a part of the
class flying bird which is again a part of the class bird. The principle behind this sort of division is
that each derived class shares common characteristics with the class from which it is derived, as
illustrated in Figure 1.5.
In OOP, the concept of inheritance provides the idea of reusability. This means that we can add
additional features to an existing class without modifying it. This is possible by deriving a new class
from the existing one. The new class will have the combined features of both the classes. The real
appeal and power of the inheritance mechanism is that it allows the programmer to reuse a class that is
almost, but not exactly, what he wants, and to tailor the class in such a way that it does not introduce
any undesirable side effects into the rest of the classes.
Note that each sub-class defines only those features that are unique to it. Without the use of
classification, each class would have to explicitly include all of its features.
1.4.5 Dynamic Binding
Binding refers to the linking of a procedure call to the code to be executed in response to the call. It is
associated with polymorphism and inheritance. A function call associated with a polymorphic
reference depends on the dynamic type of that reference.
Consider a procedure draw, for example. Every object will have this procedure. Its algorithm is, however, unique to each object and so the draw
procedure will be redefined in each class that defines the object. At run-time, the code matching the
object under current reference will be called.
1.4.7 Polymorphism
Polymorphism is another important OOP concept. Polymorphism means the ability to take more than
one form. For example, an operation may exhibit different behaviour in different instances. The
behaviour depends upon the types of data used in the operation. For example, consider the operation
of addition. For two numbers, the operation will generate a sum. If the operands are strings, then the
operation would produce a third string by concatenation.
Figure 1.6 illustrates that a single function name can be used to handle different numbers and
different types of arguments. This is something similar to a particular word having several different
meanings depending on the context.
Polymorphism plays an important role in allowing objects having different internal structures to share
the same external interface. This means that a general class of operations may be accessed in the same
manner even though the specific actions associated with each operation may differ. Polymorphism is
extensively used in implementing inheritance. Object-oriented programming languages support
polymorphism, which is characterized by the phrase "one interface, multiple methods". In simple terms,
polymorphism is an attribute that allows one interface to be used with a general class of actions.
Polymorphism helps in reducing complexity by allowing the same interface to specify a general class
of actions. It is the compiler's job to select the specific action as it applies to each situation; the
programmer does not need to make this selection manually. Operator overloading and function
overloading are examples of polymorphism.
In a multi-function program, many important data items are placed as global so that they may be
accessed by all the functions. Each function-may have its own local data. Global data are more
vulnerable to an inadvertent change by a function. In a large program it is very difficult to identify
what data is used by which function. In case we need to revise an external data structure, we should
also revise all functions that access the data. This provides an opportunity for bugs to creep in.
Another serious drawback with the procedural approach is that it does not model real world problems
very well. This is because functions are action-oriented and do not really correspond to the
elements of the problem.
Some characteristics exhibited by procedure-oriented programming are:
Emphasis is on doing things (algorithms).
Large programs are divided into smaller programs known as functions.
Most of the functions share global data.
Data move openly around the system from function to function. Functions transform data from
one form to another.
Employs top-down approach in program design.
Objects communicate with one another by sending and receiving information much the same way as
people pass messages to one another. The concept of message passing makes it easier to talk about
building systems that directly model or simulate their real-world counterparts.
A message for an object is a request for execution of a procedure, and therefore will invoke a function
(procedure) in the receiving object that generates the desired result. Message passing involves
specifying the name of the object, the name of the function (message) and the information to be sent.
Example
Objects have a life cycle. They can be created and destroyed. Communication with an object is
feasible as long as it is alive.
4………………. is the process by which objects of one class acquire the properties of objects of
another class.
(a) Inheritance (b) Data abstraction
(c) Polymorphism (d) Objects
5. Major features that are not required for object-based programming are:
(a) Data encapsulation
(b) Data hiding and access mechanisms
(c) Software complexity can be easily managed
(d) Operator overloading
The object-oriented paradigm sprang from the language, has matured into design, and has recently
moved into analysis. It is believed that the richness of OOP environment will enable the software
industry to improve not only the quality of software systems but also its productivity. Object-oriented
technology is certainly going to change the way the software engineers think, analyze, design and
implement future systems.
Caution
Be careful before installing C++ software on your computer: install only a C++ version supported
by your operating system.
Solution
We provide them with the option to hire our resources on a 'Time' and 'Resource' basis; under the
contract we would depute full-time developers and assist them with all kinds of technical support from
our senior management. The other options include an 'Offshore Development Centre' (ODC) and ODC
with 'Build Operate Transfer'. The Client opted for us to set up a dedicated 'Offshore
Development Centre' for them.
The project management happens from the Client's office; all the Project Managers operate from
the Client divisions and most of them have never set foot in India. This is the beginning of a
process which would later be considered in parallel with Change Management.
The senior team provides approximately a man-week of free support to the team working in the office
for the client, while the developers take ideas and build an understanding of the project on a regular basis from
the Project Managers in the Client's country. Their main support comes from the Technical Head of
Radix.
To ensure smooth project coordination, an experienced project coordinator has been
appointed for the task. Though the technical communicator works only part-time between the teams,
the objective of proper and smooth communication is always maintained.
Technical Supremacy
Project execution is managed by the elite core of both Radix and the Client, and the developers
themselves are a highly capable cadre in C and C++ programming, combining experience, knowledge
and effort. The Client can now leverage a new level of technical superiority in the market.
Risks
The pricing offered to the Client is very competitive; it covers the salary and operational costs of Radix,
and payments have to be made at the month end. Project execution calls for a collection of skills
much in demand: development skills, analytical skills and, importantly, quality
control.
No single resource would be able to carry out all of these demands; task bifurcation is therefore
essential, or else the efficient output of the resources could not be maintained.
Cost Benefits
Such procedures help the Client utilize the same set of resources in a clearly productive manner with
a minimum of hassle that would otherwise have called for additional resource hiring. The Client
therefore achieved significant cost savings even while outsourcing the project and building the dedicated
"Offshore Development Centre".
Build Operate and Transfer
Based on the success of such contracts, this setup can eventually migrate to a Build Operate Transfer
(BOT) model, with a company set up to mirror development operations in India.
Questions
1. Describe the role of developers and programmers in Offshore Development Centre.
2. Describe the Offshore Development Centre.
1.7 Summary
Polymorphism means one name, multiple forms. It allows us to have more than one function with
the same name in a program.
Dynamic binding means that the code associated with a given procedure is not known until the
time of the call at run-time.
Message passing involves specifying the name of the object, the name of the function (message)
and the information to be sent.
Object-oriented technology offers several benefits over conventional programming methods,
the most important one being reusability.
OOP technology has gained importance in almost all areas of computing,
including real-time business systems.
1.8 Keywords
Assembly Language: An assembly language is a low-level programming language for computers,
microprocessors, microcontrollers, and other programmable devices in which each statement
corresponds to a single machine language instruction.
Function Overloading: Function overloading is one of the most powerful features of C++
programming language. It forms the basis of polymorphism (compile -time polymorphism).
Machine language: Machine language is the programming language the computer understands; its
native tongue. Machine language instructions are written with binary numbers, 0 and 1.
Object-oriented Programming: Object-oriented programming (OOP) is a programming paradigm
using "objects" (data structures consisting of data fields and methods) together with their interactions
to design applications and computer programs.
Polymorphism: Polymorphism is a programming language feature that allows values of different data
types to be handled using a uniform interface.
2.0 Objectives
After studying this chapter, you will be able to:
Discuss the C++ program development environment
Explain the programming language and C++ standards
Discuss about various C++ compilers
Explain the C++ Standard Library
Understand the prototype of main () function
Explain about standard I/O operators
2.1 Introduction
An integrated development environment (IDE) is a programming environment that has been packaged
as an application program, typically consisting of a code editor, a compiler, a debugger, and a
graphical user interface (GUI) builder. The IDE may be a standalone application or may be included
as part of one or more existing and compatible applications. The BASIC programming language, for
example, can be used within Microsoft Office applications, which makes it possible to write a
WordBasic program within the Microsoft Word application. IDEs provide a user -friendly framework
for many modern programming languages, such as Visual Basic, Java, and PowerBuilder.
IDEs for developing HTML applications are among the most commonly used. For example, many
people designing Web sites today use an IDE (such as HomeSite, DreamWeaver, or FrontPage) for
Web site development that automates many of the tasks involved.
In this chapter we will discuss the C++ programming development environment in which we
develop different types of software.
Let us consider the steps in creating and executing a C++ application using a C++ development
environment (illustrated in Figure 2.1). C++ systems generally consist of three parts: a program
development environment, the language and the C++ Standard Library. C++ programs typically go
through six phases: edit, pre-process, compile, link, load and execute. The following discussion
explains a typical C++ program development environment.
Figure 2.1: C++ environment.
Phase 4: Linking
Phase 4 is called linking. C++ programs typically contain references to functions and data defined
elsewhere, such as in the standard libraries or in the private libraries of groups of programmers
working on a particular project. The object code produced by the C++ compiler typically contains
"holes" due to these missing parts. A linker links the object code with the code for the missing
functions to produce an executable image (with no missing pieces). If the program compiles and links
correctly, an executable image is produced.
Phase 5: Loading
Before a program can be executed, it must first be placed in memory. This is done by the loader,
which takes the executable image from disk and transfers it to memory. Additional components from
shared libraries that support the program are also loaded.
Phase 6: Execution
Finally, the computer executes the program.
Most programs in C++ input and/or output data. Certain C++ functions take their input from cin (the
standard input stream; pronounced "see-in"), which is normally the keyboard, but cin can be
redirected to another device. Data is often output to cout (the standard output stream; pronounced
"see-out"), which is normally the computer screen, but cout can be redirected to another device.
When we say that a program prints a result, we normally mean that the result is displayed on a screen.
Data may be output to other devices, such as disks and hardcopy printers. There is also a standard
error stream referred to as cerr. The cerr stream (normally connected to the screen) is used for
displaying error messages. It is common for users to assign cout to a device other than the screen
while keeping cerr assigned to the screen, so that normal outputs are separated from errors.
Caution
Every program must be translated into a machine language that the computer can understand.
This translation is performed by compilers, interpreters, and assemblers.
For the purposes of this International Standard, the definitions given in ISO/IEC 2382 and the
following definitions apply. Terms that are used only in a small portion of this International Standard
are defined where they are used and italicized where they are defined.
Version history
The following is a rough outline of product release information.
Year Version
1997 1
1998 3
1999 4 (released as Inprise)
2000 5
2002 6
2003 X
2005 2006 (10)
2007 2007 (11)
Aug. 2008 2009 (12)
24 Aug. 2009 2010 (14)
30 Aug. 2010 XE (15)
31 Aug. 2011 XE2 (16)
Notice that this pseudo-coded algorithm is valid for a group of elements, regardless of how exactly
those elements are stored (of course, provided that we are able to perform the required tests).
Let us try to implement it for both linked lists and arrays:
Linked lists:
struct Element // This is an extremely simplified definition,
{ // but enough for this example.
int value;
struct Element * next;
};
int high = list->value; // list points to the first element
struct Element * current = list->next;
// refers (points) to second element
while (current != NULL) // test if within the group of elements
{
if (current->value > high)
{
high = current->value;
}
current = current->next; // Advance to next element
}
Arrays:
int high = *array;
int * one_past_end = array + size;
int * current = array + 1; // starts at second element
while (current != one_past_end) // test if within the group of elements
{
if (*current > high)
{
high = *current;
}
current++; // Advance to the next element
}
Surprise! Both fragments of code are almost identical. It is just the syntax that we use to manipulate
and access the elements that changes. Notice that in both cases we have a pointer pointing to the
current element. This pointer is compared to a particular value to test if we are within the group of
values. Also, the pointer is dereferenced (in different concrete ways in both cases, but both are
dereferencing operations) to obtain the particular value. This pointer allows us to advance to the next
element (again, in different concrete ways, but still, in both cases we make the pointer point to the
next element).
There is one important detail that makes the two examples conceptually identical: in this case, both
data structures (array and linked list) are treated as a sequential group of elements; in both cases, the
operations required are:
1. Point to a particular element
2. Access the element that is pointed
3. Point to the next element
4. Test if we are pointing within the group of elements
Notice that with these operations, we can implement any algorithm that requires sequential access to
the elements of a group.
class to_lower
{
public:
char operator() (char c) const // notice the return type
{
return tolower(c);
}
};
string lower (const string & str)
{
string lcase = str;
transform (str.begin(), str.end(), lcase.begin(), to_lower());
return lcase;
}
The transform line could have been:
transform (lcase.begin(), lcase.end(), lcase.begin(), to_lower())
(remember that the output sequence can be the same input sequence, if we want in -place
transformations)
For instance, we could use this function object greater to sort a sequence in descending order:
vector<int> values;
// ... add elements...
sort(values.begin(), values.end(), greater<int>());
The trick is that the third parameter is an operation that will be used instead of direct comparison, and
that operation is supposed to emulate the "less-than" comparison. If we use "greater-than" instead,
we are "lying" to the algorithm and always giving the opposite result; the outcome is that the
sequence ends up sorted in the exact opposite order.
The function objects representing arithmetic operations include plus, minus, multiplies, and divides
(and a couple of others that we will omit). These are binary operations that return the sum, difference,
product, or division of the first argument and the second (in that order). You can imagine that their
implementation is also straightforward.
We can use the multiplies function object to obtain the product of all the numbers in a sequence as
shown below:
list<double> values;
// ... add elements ...
double product = accumulate (values.begin(), values.end(), 1.0,
multiplies<double>());
The trick here is that the user-provided operation is supposed to replace direct addition (e.g., we may
want to accumulate the grades of all the students, or accumulate the lengths of a group of strings,
etc.). We provide an operation that multiplies instead of adding.
int main(void)
int main(int argc, char **argv)
int main(int argc, char *argv[])
int main()
The parameters argc, argument count, and argv, argument vector, respectively give the number and
value of the program‘s command-line arguments. The names of argc and argv may be any valid
identifier in C, but it is common convention to use these names. In C++, the names are to be taken
literally, and the "void" in the parameter list is to be omitted, if strict conformance is desired. Other
platform-dependent formats are also allowed by the C and C++ standards, except that in C++ the
return type must stay int; for example, Unix (though not POSIX.1) and Microsoft Windows have a
third argument giving the program‘s environment, otherwise accessible through getenv in stdlib.h:
int main(int argc, char **argv, char **envp)
To print the content of a variable the double quotes are not used. Take a look at an example:
#include<iostream>
using namespace std;
int main()
{
char Yes = 'y';
cout << Yes;
return 0;
}
Note: If you do not want to use the namespace std, you could write std::cout << Yes;
The << operator can be used multiple times in a single statement. Take a look at an example:
#include<iostream>
using namespace std;
int main()
{
cout << "Hello," << "this is a test" << "string.";
return 0;
}
2.7.2 Standard input (cin)
The standard input device is the keyboard. With the cin and >> operators it is possible to read input
from the keyboard.
Take a look at an example:
#include<iostream>
using namespace std;
int main()
{
char MY_CHAR;
cout << "Press a character and press return:";
cin >> MY_CHAR;
cout << MY_CHAR;
return 0;
}
Note: The input is processed by cin after the return key is pressed.
The cin operator will always return the variable type that you use with cin. So if you request an
integer you will get an integer, and so on. This can cause an error when the user of the program does
not enter the type that you are expecting. (Example: you ask for an integer and you get a string of
characters.)
Later on we will offer a solution to this problem.
The cin operator is also chainable. For example:
cin >> first_value >> second_value; // reads two values in one statement
In this case the user must give two input values that are separated by any valid blank separator (tab,
space or new-line).
Caution
The header file iostream must be included to make use of the input/output (cin/cout) operators.
endl Manipulator
This manipulator has the same functionality as the '\n' newline character.
For example:
Sample Code
cout << "India" << endl;
cout << "Training";
produces the output:
India
Training
setw Manipulator
This manipulator sets the minimum field width on output. The syntax is:
setw(x)
Here setw causes the number or string that follows it to be printed within a field x characters wide,
where x is the argument set in the setw manipulator. The header file that must be included while using
the setw manipulator is <iomanip>.
Sample Code:
#include <iostream>
#include <iomanip>
using namespace std;
int main()
{
int x1 = 12345, x2 = 23456, x3 = 7892;
cout << setw(8) << "India" << setw(20) << "Values" << endl
<< setw(8) << "E1234567" << setw(20) << x1 << endl
<< setw(8) << "S1234567" << setw(20) << x2 << endl
<< setw(8) << "A1234567" << setw(20) << x3 << endl;
return 0;
}
setfill Manipulator
This is used after setw manipulator. If a value does not entirely fill a field, then the character
specified in the setfill argument of the manipulator is used for filling the fields.
Sample Code
#include <iostream>
#include <iomanip>
using namespace std;
int main()
{
cout << setw(10) << setfill('$') << 50 << 33 << endl;
return 0;
}
The output of the above example is:
$$$$$$$$5033
This is because the setw sets 10 for the width of the field and the number 50 has only 2 positions in
it. So the remaining 8 positions are filled with the $ symbol which is specified in the setfill argument.
The first cout statement contains fixed notation and the setprecision manipulator contains argument 3.
This means that three digits appear after the decimal point, and in fixed notation the first cout
statement outputs 0.100. The second cout produces the output in scientific notation. The default
precision is used since no setprecision value is provided.
A comment can also start with //, extending to the end of the line. For example:
#include <iostream>
using namespace std;
int main()
{
cout << "Hello World"; // prints Hello World
return 0;
}
When the above code is compiled, the compiler will ignore // prints Hello World and the final
executable will produce the following result:
Hello World
Within a /* and */ comment, // characters have no special meaning. Within a // comment, /* and */
have no special meaning. Thus, you can "nest" one kind of comment within the other kind. For
example:
/* Comment out printing of Hello World:
cout << "Hello World"; // prints Hello World
*/
cout << "Age:" << age << endl;
cout << "Salary:" << salary << endl;
cout << "Code:" << code << endl;
}
2.10.2 Derived Data Types
These are the data types which are derived from the fundamental data types.
It is further divided into two categories:
(i) built-in and (ii) user-defined, which are discussed below as separate topics.
Pointer: A pointer is a variable that holds the memory address of another variable. Pointers also have
data types; for example, a char pointer can store the address of only char variables, an int pointer can
store the address of int variables, and so on.
Reference: A reference in the simplest sense is an alias or alternate name for a previously defined
variable.
// Program to illustrate references
#include <iostream>
int main()
{
int var;            // var must be declared before a reference to it
int &refvar = var;  // refvar is a reference (an alias) for var
return 0;
}
User-Defined Derived Data Types
1. Class: A class is a collection of variables and functions under one reference name. It is a way of
separating and storing similar data together. Member functions are often the means of accessing,
modifying and operating on the data members (i.e. variables). It is one of the most important features
of C++ since OOP is usually implemented through the use of classes.
2. Structure: In C++, structure and class are the same except for some very minor differences.
3. Union: A union is a memory location shared by two or more different variables, generally of
different data types.
4. Enumerations: It can be used to assign names to integer constants.
#include <iostream>
using namespace std;
enum grade {POOR, GOOD, EXCELLENT}; // declaration assumed; the names denote 0, 1 and 2
int main()
{
int var;
var=POOR;//this makes programs more understandable
cout<<var<<endl;
var=GOOD;
cout<<var<<endl;
var=EXCELLENT;
cout<<var;
return 0;
}
2. A class is a collection of variables and functions under one reference name; it is a way of
separating and storing similar data together.
(a). True (b). False.
3. ................are means to identify the type of data and associated operations of handling it. C++
provides a predefined set of data types for handling the data it uses.
(a). Variables (b). Data types
(c). Int (d). Float
5. The..............argc, argument count, and argv, argument vector, respectively give the number and
value of the program‘s command-line arguments.
(a). Parameters (b). int
(c). long (d). None of these
6. The Standard Template Library provides a number of useful, generic algorithms to perform the
most commonly used operations on groups/sequences of elements.
(a).True (b). False
2.11 Summary
C++ is a general-purpose programming language based on the C programming language as
described in ISO/IEC 9899:1990 (Programming languages - C).
The term computer language is sometimes used interchangeably with programming language.
A complete specification for a programming language includes a description, possibly idealized,
of a machine or processor for that language.
The programming language is a notation for writing programs, which are specifications of a
computation or algorithm.
The earliest programming languages predate the invention of the computer, and were used to
direct the behaviour of machines such as Jacquard looms and player pianos.
2.12 Keywords
Abstractions: Programming languages usually contain abstractions for defining and manipulating
data structures or controlling the flow of execution.
C++: It is an object-oriented programming (OOP) language that is viewed by many as the best
language for creating large-scale applications. C++ is a superset of the C language.
Structure: In C++, structure and class are the same except for some very minor differences.
Unsigned: A signed int can hold negative, zero or positive numbers, whereas an unsigned int can hold
only zero or positive numbers.
Variable: It is a way of referring to a memory location used in a computer program. This memory
location holds values- perhaps numbers or text or more complicated types of data like a payroll
record.
3.0 Objectives
After studying this chapter, you will be able to:
Discuss the turbo C++ IDE
Explain the creating, compiling and running a C++ program using IDE
Define the elements of C++ language
Explain the C++ tokens
Explain the type conversion in expressions
3.1 Introduction
C++ is a third generation programming language. When computers were first invented, they were
programmed with very simple, low-level commands. A programmer would design a program, and
then translate the program into a specific set of codes, known as machine language. These codes
would be fed into a computer with switches, punch-cards, or primitive keypads. These programs were
cumbersome to write, and very hard to debug. (Debugging is the act of removing mistakes in a
program.) Machine code is considered the first generation of programming languages. C++ is a
programming language substantially different from C. Many see C++ as "a better C than C," or as C
with some add-ons. C++ shares the same low-level constructs as C, however, and we assume some
knowledge of C in this course. You might want to have a look at the C introduction course to get up
to speed on that language. C++ is a programming language of many different dialects, similar to the
way that each spoken language has many different dialects. In C++, dialects exist not because the
speakers live in the North or the South, but because there are many different compilers that
support slightly different features. There are several common compilers: in particular, Borland C++,
Microsoft C++, and GNU C++. There are also many front-end environments for the different
compilers; the most common is Dev-C++, built around GNU's g++ compiler. Some, such as g++, are
free, while others are not.
3.3 Creating, Compiling and Running a C++ Program Using IDE and
Command Line
3.3.1 Creating and Compile C++ Programs Using IDE
In order to run a program and see it doing wonderful things, you should first write the program. The
program can be written in any text editor, such as vi and emacs in a Unix environment, or using the
command prompt in DOS. There are also several Integrated Development Environment (IDE)
packages available which provide a complete programming environment for C++ in which you can
write, compile, run, and debug your program.
C++ programs are saved with extensions .C, .cc, .cpp, .cxx depending on the platform you are
working upon.
Once you have saved a program, next stage is compiling it using a compiler which converts your C++
program into the object code which the computer can understand.
1. Choose Empty Project from the New Project dialog, choose a name for the program (in our case,
first) and click Ok.
2. Write the code of the program and save it.
3. Click on the Compile button (third row, first button) to compile your source code. If there are any
errors in your program, then a window at the bottom will list the warnings.
4. After the program is compiled, click on the Run button (next to Compile).
5. However, DevC++ has a quirk: as soon as you run the program, the output window opens
momentarily and then closes. To work around this, set a breakpoint at the end of the main
function and then click on Debug instead of Run.
Output:
When you run the program, output window will show the string.
Compilation
There are many C compilers around, cc being the default Sun compiler. The GNU C compiler gcc
is popular and available for many platforms. PC users may also be familiar with the Borland bcc
compiler.
There are also equivalent C++ compilers, which are usually denoted by CC (note the upper case CC).
For example, Sun provides CC. The GNU C++ compiler is denoted by g++.
For the sake of compactness in the basic discussions of compiler operation we will simply refer to the
cc compiler -- other compilers can simply be substituted in place of cc unless otherwise stated.
To compile your program simply invoke the command cc. The command must be followed by the
name of the (C) program you wish to compile. A number of compiler options can be specified also.
Thus, the basic compilation command is:
cc program.c
where program.c is the name of the file.
If there are obvious errors in your program (such as mistypings, misspelling one of the key words or
omitting a semi-colon), the compiler will detect and report them.
There may, of course, still be logical errors that the compiler cannot detect. You may be telling the
computer to do the wrong operations.
When the compiler has successfully digested your program, the compiled version, or executable, is
left in a file called a.out or if the compiler option -o is used: the file listed after the -o.
It is more convenient to use a -o and filename in the compilation as in
cc -o program program.c
which puts the compiled program into the file program (or any file you name following the "-o"
argument) instead of putting it in the file a.out .
Caution
A missing semicolon after any statement causes a compile-time error.
3.4.2 Identifiers
A C++ identifier is a name used to identify a variable, function, class, module, or any other user-
defined item. An identifier starts with a letter A to Z or a to z or an underscore (_) followed by zero
or more letters, underscores, and digits (0 to 9).
C++ does not allow punctuation characters such as @, $, and % within identifiers. C++ is a case
sensitive programming language. Thus Manpower and manpower are two different identifiers in C++.
Here are some examples of acceptable identifiers:
mohd zara abc move_name a_123
myname50 _temp j a23b9 retVal
3.4.3 Trigraphs
A few characters have an alternative representation, called a trigraph sequence. A trigraph is a three-
character sequence that represents a single character and the sequence always starts with two question
marks.
Trigraphs are expanded anywhere they appear, including within string literals and character literals,
in comments, and in preprocessor directives.
??= #
??/ \
??' ^
??( [
??) ]
??! |
??< {
??> }
??- ~
Not all compilers support trigraphs, and their use is not advised because of their
confusing nature.
3.4.4 Whitespace
A line containing only whitespace, possibly with a comment, is known as a blank line, and C++
compiler totally ignores it. Whitespace is the term used in C++ to describe blanks, tabs, newline
characters and comments. Whitespace separates one part of a statement from another and enables the
compiler to identify where one element in a statement, such as int, ends and the next element begins.
Therefore, in the statement,
int age;
There must be at least one whitespace character (usually a space) between int and age for the
compiler to be able to distinguish them. On the other hand, in the statement
fruit = apples + oranges; // Get the total fruit
No whitespace characters are necessary between fruit and =, or between = and apples, although you
are free to include some if you wish for readability purpose.
3. C++ programs are saved with extensions…………. depending on the platform you are working
upon.
(a) .cc (b) .c and .cxx
(c) .cpp (d) All of these
4. The GNU …………….. gcc is popular and available for many platforms.
(a) C++ compiler (b) G++ compiler
(c) C compiler (d) DevC++ compiler
3.5.2 Functions
In C++, statements are typically grouped into units called functions. A function is a collection of
statements that executes sequentially. Every C++ program must contain a special function called
main (). When the C++ program is run, execution starts with the first statement inside of main ().
Functions are typically written to do a very specific job. For example, a function named Max() might
contain statements that figure out which of two numbers is larger. A function named CalculateGrade()
might calculate a student's grade.
3.5.3 Libraries
Libraries are groups of functions that have been "packaged up" for reuse in many different programs.
The core C++ language is actually very small and minimalistic — however, C++ comes with a bunch
of libraries, known as the C++ standard libraries, that provide programmers with lots of extra
functionality. For example, the iostream library contains functions for doing input and output. During
the link stage of the compilation process, the runtime support libraries from the C++ standard
library are linked into the program.
Now that you have a brief understanding of what statements, functions, and libraries are, let us take a
look at a simple hello world program.
Consider our hello world program:
#include <iostream>
int main()
{
using namespace std;
cout << "Hello world!" << endl;
return 0;
}
3.6.2 Identifiers
Identifiers refer to the names of variables, functions, arrays, classes, etc. created by the programmer.
They are the fundamental requirement of any language. Each language has its own rules for naming
these identifiers.
The following rules are common to both C and C++:
Only alphabetic characters, digits and underscores are permitted.
The name cannot start with a digit.
Uppercase and lowercase letters are distinct.
A declared keyword cannot be used as a variable name.
Symbolic names can be used in C++ for various data items used by a programmer in his program. For
example, if a programmer wants to store a value 50 in a memory location, he/she can choose any
symbolic name (say MARKS) and use it as given below:
MARKS = 50
The symbol '=' is an assignment operator. The significance of the above statement is that 'MARKS'
is a symbolic name for a memory location where the value 50 is being stored.
A symbolic name is generally known as an identifier. The identifier is a sequence of characters taken
from C++ character set.
The rules for the formation of an identifier are:
(i) An identifier can consist of alphabets, digits and/or underscores.
(ii) It must not start with a digit.
(iii) C++ is case sensitive, i.e., upper case and lower case letters are considered different from each
other. It may be noted that TOTAL and total are two different identifier names.
(iv) It should not be a reserved word.
3.6.3 Literals
Literals (often referred to as constants) are data items that never change their value during the
execution of the program.
The following types of literals are available in C++.
(i) integer-constants
(ii) character-constants
(iii) floating-constants
(iv) string-literals
(i) Integer constants
Integer constants are whole numbers without any fractional part. An integer constant may contain a +
or - sign, but no decimal point or commas may appear in it. C++ allows three types of integer
constants.
Decimal (Base 10)
Octal (Base 8)
Hexadecimal (Base 16)
(ii)Character constants
A character constant in C++ must contain one or more characters and must be enclosed in single
quotation marks, for example 'A', '9', etc. C++ allows non-graphic characters which cannot be typed
directly from the keyboard, e.g., backspace, tab, carriage return etc. These characters can be
represented by using an escape sequence. An escape sequence represents a single character. Table 3.2 gives a
listing of common escape sequences.
3.6.4 Punctuators
The following characters are used as punctuators in C++. (Table 3.3)
3.6.5 Operators
Operators are special symbols used for specific purposes. C++ provides the following types of operators.
Arithmetical operators
Relational operators
Logical operators
Unary operators
Assignment operators
Conditional operators
Comma operator
Arithmetical operators
An operator that performs an arithmetic (numeric) operation: +, -, *, /, or %. For these operations
always two or more than two operands are required. Therefore these operators are called binary
operators. Table 3.4 shows the arithmetic operators.
Relational Operators
The relational operators are used to test the relation between two values. All relational operators are
binary operators and therefore require two operands. A relational expression returns zero when the
relation is false and a non-zero when it is true.
Table 3.5 shows the relational operators.
Example:
int x = 2; int l = 1;
int y = 3;
int z = 5;
Logical Operators
The logical operators are used to combine one or more relational expressions. Table 3.6 shows the
logical operators.
Table 3.6: Logical operators
The NOT operator is called a unary operator because it requires only one operand.
Example
int x = 5; int z = 9; int y = 7;
(x > y) && (z > y)
The first expression (x > y) evaluates to false and second expression (z > y) evaluates to true.
Therefore, the final expression is false.
In an AND operation, if any one of the expressions is false, the entire expression is false.
In an OR operation, if any one of the expressions is true, the entire expression is true.
In NOT operation, only one expression is required.
If the expression is true, the NOT operation of true is false and vice versa.
Unary Operators
C++ provides two unary operators for which only one variable is required.
Example
a = - 50; a = - b;
a = + 50; a = + b;
Here plus sign (+) and minus sign (-) are unary because they are not used between two variables.
Assignment Operator
The assignment operator '=' stores the value of the expression on the right hand side of the equal sign
to the operand on the left hand side.
Example
int m = 5, n = 7;
int x, y, z;
x = y = z = 0;
In addition to standard assignment operator shown above, C++ also supports compound assignment
operators.
C++ provides two special operators viz '++' and '--' for incrementing and decrementing the value
of a variable by 1. The increment/decrement operator can be used with any type of variable but it
cannot be used with any constant.
With the prefix version of these operators, C++ performs the increment or decrement operation before
using the value of the operand. For instance the following code.
int sum, ctr;
sum = 12;
ctr = 4;
sum = sum + (++ctr);
will produce the value of sum as 17 because ctr will be first incremented and then added to sum
producing value 17.
Similarly, the following code
sum = 12;
ctr = 4;
sum = sum + (--ctr);
will produce the value of sum as 15 because ctr will be first decremented and then added to sum
producing value 15.
With the postfix version of these operators, C++ first uses the value of the operand in evaluating the
expression before incrementing or decrementing the operand‘s value.
For example, the following code
sum = 12;
ctr = 4;
sum = sum + (ctr++);
will produce the value of sum as 16 because ctr will be first used in the expression producing the
value of sum as 16, and then the value of ctr is incremented by 1 (ctr now becomes 5).
Similarly, the following code
sum = 12;
ctr = 4;
sum = sum + (ctr--) will produce the value of sum as 16 because ctr will be first used with its value
4 producing value of sum as 16 and then decrement the value of ctr by 1 (ctr becomes 3).
Let us study the use of compound assignment operators in the Table 3.7:
Example:
int x = 2;  // first
x += 5;     // second
In the second statement, the value of x is 7.
Conditional Operator
The conditional operator ?: is called a ternary operator as it requires three operands. The format of the
conditional operator is:
Conditional_expression ? expression1 : expression2;
If the value of conditional_expression is true then the expression1 is evaluated,
otherwise expression2 is evaluated.
Example:
int a = 5;
int b = 6;
big = (a > b) ? a : b;
The condition evaluates to false, therefore big gets the value from b and it becomes 6.
Comma Operator
The comma operator gives left to right evaluation of expressions. It enables us to put more than one
expression, separated by commas, on a single line.
Example
int i = 20, j = 25;
int sq = i * i, cube = j * j * j;
In the above statements, comma is used as a separator between two statements / expressions.
There is one additional special case: If one operand is long and the other is unsigned int, and if the
value of the unsigned int cannot be represented by a long, both operands are converted to unsigned
long. Once these conversion rules have been applied, each pair of operands is of the same type and
the result of each operation is the same as the type of both operands. For example, consider the type
conversions that occur in Figure 3.1. First, the character ch is converted to an integer. Then the
outcome of ch/i is converted to a double because f*d is double. The outcome of f+i is float, because f
is a float. The final result is double.
The process in which one pre-defined type of expression is converted into another type is called
conversion. There are two types of conversion in C++.
Implicit conversion
Explicit conversion
The int value of b is converted to type float and stored in a temporary variable before being multiplied
by the float variable c. The result is then converted to double so that it can be assigned to the double
variable a.
7. Decimal integer constants consist of sequence of digits and should begin with 0 (zero).
(a) True (b) False
The functionality of these explicit conversion operators is enough for most needs with fundamental
data types.
For example, the following code is syntactically correct:
// class type-casting
#include <iostream>
using namespace std;
class CDummy {
float i,j;
};
class CAddition {
int x,y;
public:
CAddition (int a, int b) { x=a; y=b; }
int result() { return x+y;}
};
int main () {
CDummy d;
CAddition * padd;
padd = (CAddition*) &d;
cout << padd->result();
return 0;
}
The program declares a pointer to CAddition, but then assigns to it the address of an object of an
incompatible type using explicit type-casting:
padd = (CAddition*) &d;
Traditional explicit type-casting allows us to convert any pointer into any other pointer type,
independently of the types they point to. The subsequent call to the member result will produce either a
run-time error or an unexpected result.
In order to control these types of conversions between classes, we have four specific casting
operators: dynamic_cast, reinterpret_cast, static_cast and const_cast. Their format is the new type
enclosed in angle brackets (<>), followed immediately by the expression to be converted enclosed in
parentheses.
dynamic_cast <new_type> (expression)
reinterpret_cast <new_type> (expression)
static_cast <new_type> (expression)
const_cast <new_type> (expression)
Questions:
1. What does CJIS stand for?
2. What is string indexing?
3.8 Summary
C++ is currently one of the most popular third generation languages.
C++ is a case sensitive programming language.
A function is a collection of statements that executes sequentially. Every C++ program must
contain a special function called main ().
A C++ program is written using these tokens, white spaces, and the syntax of the language.
Whitespace is the term used in C++ to describe blanks, tabs, newline characters and comments.
Libraries are groups of functions that have been "packaged up" for reuse in many different
programs.
3.9 Keywords
Function: A function is a collection of statements that executes sequentially.
Identifiers: A C++ identifier is a name used to identify a variable, function, class, module, or any
other user-defined item.
Libraries: Libraries are groups of functions that have been "packaged up" for reuse in many different
programs.
Token: A token is a group of characters that logically belong together.
Trigraph: A trigraph is a three-character sequence that represents a single character and the sequence
always starts with two question marks.
White space: Whitespace is the term used in C++ to describe blanks, tabs, newline characters and
comments.
4.0 Objectives
After studying this chapter, you will be able to:
Explain the sequential statements
Discuss the mathematical functions
Explain the branching and looping statements
Discuss the nested loops
4.1 Introduction
A C++ program must be able to perform different sets of actions depending on the circumstances. A
C++ program is a set of statements which are normally executed sequentially in the order in which
they appear. This happens when no options or no repetitions of certain calculations are necessary.
However, in practice, we have a number of situations where we may have to change the order of
execution of statements based on certain conditions; repeat a group of statements until certain
specified conditions are met. This involves a kind of decision making to see whether a particular
condition has occurred or not and then direct the computer to execute certain statements accordingly.
C++ has 4 decision making instructions, they are:
1. If-else statement
2. Switch statement
3. Conditional operator statement
4. Goto statement.
4.2.1 If Statement
The if statement is a powerful decision-making statement and is used to control the flow of execution
of statements. It is basically a two-way decision statement and is used in conjunction with an
expression. It takes the following form:
if (test expression)
It allows the computer to evaluate the expression first and then, depending on whether the value of
the expression (relation or condition) is ―true‖ (non-zero) or ―false‖ (zero), it transfers the control to
a particular statement.
This point of program has two paths to follow, one for the true condition and the other for the false
condition as shown in Figure 4.1.
The if statement may be implemented in different forms depending on the complexity of conditions to
be tested.
Simple if statement
If else statement
nested if....else statement
else if ladder.
Simple If Statement
The general form of a simple if statement is
if(test expression)
{
statement-block;
}
statement-x;
The "statement-block" may be a single statement or a group of statements. If the test expression is
true, the statement-block is executed; otherwise the statement-block is skipped and execution jumps
to statement-x. Remember, when the condition is true, both the statement-block and statement-x are
executed in sequence. This is illustrated in Figure 4.2.
The logic of execution is illustrated in Figure 4.4. If condition 1 is false, statement 3 is
executed; otherwise the second test is performed. If condition 2 is true, statement 1
is evaluated; otherwise statement 2 is evaluated, and control is then transferred to
statement-x.
Example:
#include <iostream>
#include <iomanip>
using namespace std;
int main ()
{
int first, second;
cout << "Enter two integers." << endl;
cout << "First" << setw (3) << ": ";
cin >> first;
cout << "Second " << setw (2) << ": ";
cin >> second;
if (first > second)
cout << "first is greater than second." << endl;
return 0;
}
switch (expression)
{
case value-1:
block-1
break;
case value-2:
block-2
break;
default:
default-block
break;
}
statement-x;
The break statement at the end of each block signals the end of a particular case and causes an exit
from the switch statement, transferring control to the statement-x following the switch.
The default is an optional case. When present, it will be executed if the value of the expression does
not match any of the case values. If it is not present, no action takes place when all matches fail and
control goes to statement-x.
Example:
#include <iostream>
using namespace std;
int main ()
{
char permit;
cout << "Are you sure you want to quit? (y/n) : ";
cin >> permit;
switch (permit)
{
case 'y' :
cout << "Hope to see you again!" << endl;
break;
case 'n' :
cout << "Welcome back!" << endl;
break;
default:
cout << "What? I do not get it!" << endl;
}
return 0;
}
Caution
In the switch statement, the default case does not have to appear at the end. It can appear anywhere in
the switch block.
All the math functions require the header <cmath>. (C programs must use the header file math.h.)
In declaring the math functions, this header defines the macro called HUGE_VAL. The macros
EDOM and ERANGE are also used by the math functions. These macros are defined in the header
<cerrno> (or the file errno.h). If an argument to a math function is not in the domain for which it is
defined, an implementation-defined value is returned, and the built-in global integer variable errno is
set equal to EDOM. If a routine produces a result that is too large to be represented, an overflow
occurs. This causes the routine to return HUGE_VAL, and errno is set to ERANGE, indicating a
range error. If an underflow happens, the function returns zero and sets errno to ERANGE. All angles
are in radians. Originally, the mathematical functions were specified as operating on values of type
double, but Standard C++ added overloaded versions to explicitly accommodate values of type float
and long double.
The operation of the functions is otherwise unchanged.
acos
#include <cmath>
float acos(float arg);
double acos (double arg);
long double acos(long double arg);
The acos() function returns the arc cosine of arg. The argument to acos() must be in the range -1 to 1;
otherwise a domain error will occur.
Related functions are asin(), atan(), atan2(), sin(), cos(), tan(), sinh(), cosh(), and tanh().
asin
#include <cmath>
float asin(float arg);
double asin(double arg);
long double asin(long double arg);
The asin() function returns the arc sine of arg. The argument to asin() must be in the range –1 to 1;
otherwise a domain error will occur. Related functions are acos(), atan(), atan2(), sin(), cos(), tan(),
sinh(), cosh(), and tanh().
atan
#include <cmath>
float atan(float arg);
double atan(double arg);
long double atan(long double arg);
The atan() function returns the arc tangent of arg. Related functions are asin(), acos(), atan2(), tan(),
cos(), sin(), sinh(), cosh(), and tanh().
atan2
#include <cmath>
float atan2(float y, float x);
double atan2(double y, double x);
long double atan2(long double y, long double x);
The atan2() function returns the arc tangent of y/x. It uses the signs of its arguments to compute the
quadrant of the return value. Related functions are asin(), acos(), atan(), tan(), cos(), sin(), sinh(),
cosh(), and tanh().
ceil
#include <cmath>
float ceil(float num);
double ceil(double num);
long double ceil(long double num);
The ceil() function returns the smallest integer (represented as a floating-point value) not less than
num. For example, given 1.02, ceil() would return 2.0. Given –1.02, ceil() would return –1.0. Related
functions are floor() and fmod().
cos
#include <cmath>
float cos(float arg);
double cos(double arg);
long double cos(long double arg);
The cos() function returns the cosine of arg. The value of arg must be in radians. Related functions
are asin(), acos(), atan2(), atan(), tan(), sin(), sinh(), cosh(), and tanh().
cosh
#include <cmath>
float cosh(float arg);
double cosh(double arg);
long double cosh(long double arg);
The cosh() function returns the hyperbolic cosine of arg. Related functions are asin(), acos(), atan2(),
atan(), tan(), sin(), sinh(), and tanh().
exp
#include <cmath>
float exp(float arg);
double exp(double arg);
long double exp(long double arg);
The exp() function returns the natural logarithm base e raised to the arg power. A related function is
log().
fabs
#include <cmath>
float fabs(float num);
double fabs(double num);
long double fabs(long double num);
The fabs() function returns the absolute value of num. A related function is abs().
floor
#include <cmath>
float floor(float num);
double floor(double num);
long double floor(long double num);
The floor() function returns the largest integer (represented as a floating-point value) not greater than
num. For example, given 1.02, floor() would return 1.0. Given –1.02, floor() would return –2.0.
Related functions are ceil() and fmod().
fmod
#include <cmath>
float fmod(float x, float y);
double fmod(double x, double y);
long double fmod(long double x, long double y);
The fmod() function returns the remainder of x/y. Related functions are ceil(), floor(), and fabs().
frexp
#include <cmath>
float frexp(float num, int * exp);
double frexp(double num, int * exp);
long double frexp(long double num, int * exp);
The frexp() function decomposes the number num into a mantissa in the range 0.5 to less than 1, and
an integer exponent such that num = mantissa * 2^exp. The mantissa is returned by the function, and
the exponent is stored in the variable pointed to by exp. A related function is ldexp().
ldexp
#include <cmath>
float ldexp(float num, int exp);
double ldexp(double num, int exp);
long double ldexp(long double num, int exp);
The ldexp() function returns the value of num * 2^exp. If overflow occurs, HUGE_VAL is returned.
Related functions are frexp() and modf().
log
#include <cmath>
float log(float num);
double log(double num);
long double log(long double num);
The log() function returns the natural logarithm for num. A domain error occurs if num is negative,
and a range error occurs if the argument is zero. A related function is log10().
log10
#include <cmath>
float log10(float num);
double log10(double num);
long double log10(long double num);
The log10() function returns the base 10 logarithm for num. A domain error occurs if num is
negative, and a range error occurs if the argument is zero. A related function is log().
modf
#include <cmath>
float modf(float num, float * i);
double modf(double num, double * i);
long double modf(long double num, long double * i);
The modf() function decomposes num into its integer and fractional parts. It returns the fractional
portion and places the integer part in the variable pointed to by i. Related functions are frexp() and
ldexp().
pow
#include <cmath>
float pow(float base, float exp);
float pow(float base, int exp);
double pow(double base, double exp);
double pow(double base, int exp);
long double pow(long double base, long double exp);
long double pow(long double base, int exp);
The pow() function returns base raised to the exp power (base^exp). A domain error may occur if base
is zero and exp is less than or equal to zero. It will also happen if base is negative and exp is not an
integer. An overflow produces a range error. Related functions are exp(), log(), and sqrt().
sin
#include <cmath>
float sin(float arg);
double sin(double arg);
long double sin(long double arg);
The sin() function returns the sine of arg. The value of arg must be in radians.
Related functions are asin(), acos(), atan2(), atan(), tan(), cos(), sinh(), cosh(), and tanh()
sinh
#include <cmath>
float sinh(float arg);
double sinh(double arg);
long double sinh(long double arg);
The sinh() function returns the hyperbolic sine of arg. Related functions are asin(), acos(), atan2(),
atan(), tan(), cos(), tanh(), cosh(), and sin().
sqrt
#include <cmath>
float sqrt(float num);
double sqrt(double num);
long double sqrt(long double num);
The sqrt() function returns the square root of num. If it is called with a negative argument, a domain
error will occur. Related functions are exp(), log(), and pow().
tan
#include <cmath>
float tan(float arg);
double tan(double arg);
long double tan(long double arg);
The tan() function returns the tangent of arg. The value of arg must be in radians. Related functions
are acos(), asin(), atan(), atan2(), cos(), sin(), sinh(), cosh(), and tanh().
tanh
#include <cmath>
float tanh(float arg);
double tanh(double arg);
long double tanh(long double arg);
The tanh() function returns the hyperbolic tangent of arg. Related functions are acos(), asin(), atan(),
atan2(), cos(), sin(), cosh(), sinh(), and tan().
2. The....... statement is a powerful decision making statement and is used to control the flow of
execution of statements.
(a) if (b) break
(c) return (d) None of these.
Output
Enter a small number: 2
Enter a large number: 20
Enter a skip number: 4
Enter a target number: 6
skipping on 4
skipping on 8
Small: 10 Large: 8
Caution
The continue statement is only valid inside a loop. If you write it outside the loop, your program will
generate a compile error.
4.4.3 Return Statement
The expression clause, if present, is converted to the type specified in the function declaration, as if
an initialization were being performed. Conversion from the type of the expression to the return type
of the function can create temporary objects.
The value of the expression clause is returned to the calling function. If the expression is omitted, the
return value of the function is undefined. Constructors and destructors, and functions of type void,
cannot specify an expression in the return statement. Functions of all other types must specify an
expression in the return statement.
When the flow of control exits the block enclosing the function definition, the result is the same as it
would be if a return statement without an expression had been executed. This is invalid for functions
that are declared as returning a value.
A function can have any number of return statements.
The following example uses an expression with a return statement to obtain the larger of two
integers.
The following function searches through an array of integers to determine if a match exists for the
variable number. If a match exists, the function match returns the value of i. If a match does not exist,
the function match returns the value -1 (negative one).
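Neither example is actually printed in the text; minimal sketches consistent with the two descriptions (the names max, match, number, and the index i follow the wording above) might look like:

```cpp
// Returns the larger of two integers via a return-statement expression.
int max(int a, int b)
{
    if (a > b)
        return a;
    return b;
}

// Searches an array of integers for `number`.
// Returns the index i of the first match, or -1 if no match exists.
int match(int number, const int arr[], int size)
{
    for (int i = 0; i < size; i++)
        if (arr[i] == number)
            return i;
    return -1;
}
```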
while(test condition)
{
Body of loop
}
First, the condition is evaluated; if it is true, the body of the loop is executed. The condition is then
evaluated again, and the body is executed again as long as the condition remains true. When the
condition becomes false, control exits the loop.
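As an illustrative sketch (not from the text), the following while loop sums the integers 1 through 5:

```cpp
int sum_to_five()
{
    int i = 1, sum = 0;
    while (i <= 5)   // condition tested before each iteration
    {
        sum += i;    // body of loop
        i++;
    }
    return sum;
}
```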
do
{
Body of loop
}
while(condition);
In the above syntax, the body of the loop is executed at least once, even if the condition is not
satisfied, because the condition is evaluated after the body; the do-while is therefore an exit-controlled loop.
Initialization: In this part of the loop, the control variable is initialized, such as i = 0; this
tells where the loop starts.
Condition: The condition is a relational expression in which the value of the control variable is tested,
for example i <= 5.
Increment/decrement: After the statements in the body of the loop are executed, control transfers
back to the loop and the value of the control variable is incremented. The value can be incremented
by one or more according to our need. If we want to increment the variable by one, we can use i++.
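Combining the three parts just described, here is an illustrative sketch (not from the text, using the i = 0, i <= 5, and i++ values mentioned above):

```cpp
int count_iterations()
{
    int count = 0;
    // initialization: i = 0;  condition: i <= 5;  increment: i++
    for (int i = 0; i <= 5; i++)
        count++;      // body runs once for each i = 0 .. 5
    return count;
}
```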
Caution
The do-while loop will be executed at least once, whereas, depending upon the expression, the while
loop may not be executed at all.
The syntax for a nested do...while loop statement in C++ is as follows:
do
{
statement(s); // you can put more statements.
do
{
statement(s);
}while( condition );
}while( condition );
Example:
The following program uses a nested for loop to find the prime numbers from 2 to 100:
#include <iostream>
using namespace std;
int main ()
{
int i, j;
for(i=2; i<100; i++) {
for(j=2; j <= (i/j); j++)
if(!(i%j)) break; // if factor found, not prime
if(j > (i/j)) cout << i << " is prime\n";
}
return 0;
}
That is,
a becomes !a
!a becomes a
&& becomes ||
|| becomes &&
Examples:
!(a && b || !c) becomes (!a || !b) && c
!(a <= 0 || !condition) becomes a > 0 && condition
Question
1. What are rules for switch statement?
2. Discuss De Morgan's rules for decision making.
5. The condition for the loop is tested and if satisfied then it executes the body of the loop.
(a) True (b) False
4.7 Summary
A sequential statement is used in the body of a process described in C++.
The if statement is a powerful decision-making statement and is used to control the flow of
execution of statements.
The while loop is the simplest form of loop; it is an entry-controlled loop statement.
In a for loop, the continue statement causes the conditional test and then the re-initialization
portion of the loop to be executed.
C++ has a built-in multi-way decision statement known as switch.
4.8 Keywords
Decision-making: The statements are needed to alter the sequence of the statements in the program
depending upon certain circumstances.
Do-while: The do-while loop is mostly used for writing menu driven programs.
Loops: It is used to repeat a block of code. Being able to have your program repeatedly execute a
block of code is one of the most basic but useful tasks in programming.
Looping: Many programs require that a group of instructions be executed repeatedly, until some
particular condition has been satisfied. This process is known as looping.
Switch-Case Statement: It is a multi-way decision making statement.
5.0 Objectives
After studying this chapter, you will be able to:
Explain the arrays
Discuss the User Defined Functions
Define the return values and their types
Explain the function calls
Discuss the passing parameters to functions
5.1 Introduction
An array is a data type used to represent a large number of values of the same type. An array might
be used to represent all the salaries in a company or all the weights of participants in a fitness
program. Each element in an array has a position, with the initial element having position zero. An
array element‘s position can be used as an index or subscript to access that element. The elements of
an array are randomly accessible through the use of subscripts.
A function is a group of statements that together perform a task. Every C++ program has at least one
function, main(), and all but the most trivial programs define additional functions.
You can divide your code into separate functions. How you divide the code among different
functions is up to you, but logically the division usually is such that each function performs a specific task.
5.2 Arrays
5.2.1 The Meaning of an Array
An array is a collection of elements of the same data type that are referenced by a common name. Each
element of an array can be referred to by the array name and a subscript or index. Arrays can be one-dimensional or two-dimensional.
Arrays of all types are possible, including arrays of arrays. A typical array declaration allocates
memory starting from a base address. An array name is, in effect, a pointer constant to this base
address.
To illustrate some of these ideas, let us write a small program that fills an array, prints out values,
and sums the elements of the array:
Initializing arrays
When declaring a regular array of local scope (within a function, for example), if we do not specify
otherwise, its elements will not be initialized to any value by default, so their content will be
undetermined until we store some value in them. The elements of global and static arrays, on the
other hand, are automatically initialized with their default values, which for all fundamental types
means they are filled with zeros.
In both cases, local and global, when we declare an array, we have the possibility to assign initial
values to each one of its elements by enclosing the values in braces { }. For example:
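The declaration itself is missing here; one consistent with the sentence that follows (five int initializers; these particular values are illustrative, not from the source) would be:

```cpp
int abc[] = { 10, 20, 30, 40, 50 };  // 5 initializers, so abc holds 5 ints
```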
After this declaration, array abc would be 5 ints long, since we have provided 5 initialization values.
char cArr[10];
The array subscript starts from zero. Therefore, cArr[2] would refer to the third element in the array
cArr where 2 is the array subscript.
#include <iostream>
using namespace std;
const int ROW = 4;
const int COLUMN = 3;
int main()
{
int i, j;
int India[ROW][COLUMN];
for(i=0; i < ROW; i++) //goes through the ROW elements
for(j=0; j < COLUMN; j++) //goes through the COLUMN elements
{
cout << "Enter value of Row " << i+1;
cout << " Column " << j+1 << ": ";
cin >> India[i][j];
}
cout << "\n\n\n";
cout << " COLUMN\n";
cout << " 1 2 3";
for(i=0; i < ROW; i++)
{
cout << "\nROW " << i+1;
for(j=0; j < COLUMN; j++)
cout << " " << India[i][j];
}
return 0;
}
Example:
Following is the source code for a function called max(). This function takes two parameters, num1
and num2, and returns the larger of the two:
// function returning the max between two numbers
int max(int num1, int num2)
{
// local variable declaration
int result;

if (num1 > num2)
result = num1;
else
result = num2;

return result;
}
C++ allows programmers to define their own functions. For example the following is a definition of a
function which given the co-ordinates of a point (x, y) will return its distance from the origin.
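The definition promised in the preceding sentence is not shown in the text; a sketch of such a function (the name distance_from_origin is an assumption, not from the source) might be:

```cpp
#include <cmath>

// Returns the distance of the point (x, y) from the origin,
// computed as sqrt(x^2 + y^2).
double distance_from_origin(double x, double y)
{
    return sqrt(x * x + y * y);
}
```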
void FunctionName();
A function that does not return a value is declared and defined as void. Here is an example:
void Introduction()
{
cout << "This program is used to calculate the areas of some shapes.\n"
<< "The first shape will be a square and the second, a rectangle.\n"
<< "You will be requested to provide the dimensions and the program "
<< "will calculate the areas";
}
Any function can be of type void as long as you are not expecting it to return a specific value. A void
function with a more specific assignment could be used to calculate and display the area of a square.
Here is an example:
void SquareArea()
{
double Side;
cout << "\nEnter the side of the square: ";
cin >> Side;
cout << "\nSquare characteristics:";
cout << "\nSide = " << Side;
cout << "\nArea = " << Side * Side;
}
When a function is of type void, its result cannot be displayed on the same line with the cout insertion
operator, and it cannot be assigned to a variable (since it does not return a value). Therefore, a void
function can only be called.
A return value, if not void, can be any of the data types we have studied so far. This means that a
function can return a char, an int, a float, a double, a bool, or a string. Here are examples of declaring
functions by defining their return values:
double FunctionName();
char FunctionName();
bool FunctionName();
string FunctionName();
If you declare a function that returns something (a function that is not void), the compiler will need to
know what value the function returns. The return value must be of the same type as declared. The value
is set with the return keyword.
If a function is declared as a char, make sure it returns a character (only one character). Here is an
example:
char Answer()
{
char a;
cout << "Do you consider yourself a reliable employee (y=Yes/n=No)? ";
cin >> a;
return a;
}
A good function can also handle a complete assignment and hand only a valid value to other calling
functions. Imagine you want to process members' applications at a sports club. You can define a
function that would request the first and last names; other functions that need a member‘s full name
would request it from such a function without worrying whether the name is complete. The following
function is in charge of requesting both names. It returns a full name that any desired function can
use:
string GetMemberName()
{
string FName, LName, FullName;
cout << "New Member Registration.\n";
cout << "First Name: ";
cin >> FName;
cout << "Last Name: ";
cin >> LName;
FullName = FName + " " + LName;
return FullName;
}
The return value can also be an expression. Here is an example:
double SquareArea(double Side)
{
return (Side * Side);
}
A return value could also be a variable that represents the result. Here is an example:
double SquareArea(double Side)
{
double Area;
Area = Side * Side;
return Area;
}
If a function returns a value (other than void), a calling function can assign its result to a local
variable like this:
Major = GetMajor();
Here is an example:
#include <iostream>
using namespace std;
int GetMajor()
{
int Choice;
cout << "\n1 - Business Administration";
cout << "\n2 - History";
cout << "\n3 - Geography";
cout << "\n4 - Education";
cout << "\n5 - Computer Sciences";
cout << "\nYour Choice: ";
cin >> Choice;
return Choice;
}
int main()
{
int Major;
cout << "Welcome to the student orientation program.";
cout << "Select your desired major: ";
Major = GetMajor();
cout << "You selected " << Major; cout << "\n";
return 0;
}
The compiler treats the calling of a function depending on where the function is declared with regards
to the caller. It can declare a function before calling it. Here is an example:
#include <iostream>
using namespace std;
void Message()
{
cout << "This is C++ in its truest form.";
}
int main()
{
Message(); // Calling the Message() function
return 0;
}
#include <iostream>
using namespace std;
int main()
{
void Message();
cout << "We will start with the student registration process.";
Message(); // Calling the Message() function
return 0;
}
void Message()
{
cout << "Welcome to the Red Oak High School.";
}
To use any of the functions that ship with the compiler, first include the library in which the function
is defined, then call the necessary function. Here is an example that calls the getchar() function:
#include <iostream>
#include <cstdio>
using namespace std;
int main()
{
cout << "This is C++ in its truest form...\n\n";
getchar();
return 0;
}
2. A declaration for an external variable can look just like a declaration for a variable that occurs
.............. a function or a block.
(a) inside (b) outside
(c) Both (a) and (b) (d) None of these
#include <iostream>
using namespace std;
int sqr(int x);
int main(void)
{
int t=10;
cout << sqr(t) << " " << t;
return 0;
}
int sqr(int x)
{
x = x*x;
return(x);
}
In this example, the value of the argument to sqr(), 10, is copied into the parameter x. When the
assignment x = x*x takes place, only the local variable x is modified. The variable t, used to call
sqr(), still has the value 10. Hence, the output is 100 10.
Remember that it is a copy of the value of the argument that is passed into the function. What occurs
inside the function has no effect on the variable used in the call.
swap() is able to exchange the values of the two variables pointed to by x and y because their
addresses (not their values) are passed. Thus, within the function, the contents of the variables can be
accessed using standard pointer operations, and the contents of the variables used to call the function
are swapped. Remember that swap() (or any other function that uses pointer parameters) must be
called with the addresses of the arguments. The following program shows the correct way to call
swap():
In this example, the variable i is assigned the value 10 and j is assigned the value 20. Then swap() is
called with the addresses of i and j. (The unary operator & is used to produce the address of the
variables.) Therefore, the addresses of i and j, not their values, are passed into the function swap().
5.7 Recursion
A function can call itself. This is called recursion, and recursion can be direct or indirect. It is direct
when a function calls itself; it is indirect recursion when a function calls another function that then
calls the first function.
Some problems are most easily solved by recursion, usually those in which you act on data and then
act in the same way on the result. Both types of recursion, direct and indirect, come in two varieties:
those that eventually end and produce an answer, and those that never end and produce a runtime
failure. Programmers think that the latter is quite funny (when it happens to someone else).
The local variables in the second version are independent of the local variables in the first, and they
cannot affect one another directly, any more than the local variables in main() can affect the local
variables in any function it calls.
When implementing a recursive solution one usually has at least two cases:
Base Case
General Case
For a function/method to be called recursive, it usually has a call to itself within its code in the
general case, with a smaller case being passed through.
A typical example to illustrate a recursive function is the factorial function (i.e. 5!)
//************************************
// Returns the factorial of a number
//************************************
int fact(int num)
{
if(num <= 0)
return 1; // Base Case
else
return num * fact(num-1); // General Case - also note it calls itself
}
void CountDown(int nValue)
{
using namespace std;
cout << nValue << endl;
if (nValue > 0) // base case: stop at 0; without this test the recursion would never end
CountDown(nValue-1);
}
int main(void)
{
CountDown(10);
return 0;
}
Caution
It is important to note that when a function calls itself, a new copy of that function is run.
The keyword auto can be used to explicitly specify the storage class. An example is
auto int a, b, c;
auto float f = 7.78;
Because the storage class is automatic by default, the keyword auto is seldom used.
The auto storage-class specifier declares an automatic variable, a variable with a local lifetime. It is
the default storage-class specifier for block-scoped variable declarations.
Few programmers use the auto keyword in declarations because all block-scoped objects not
explicitly declared with another storage class are implicitly automatic.
Therefore, the following two declarations are equivalent:
// auto_keyword.cpp
int main()
{
auto int i = 0; // Explicitly declared as auto.
int j = 0; // Implicitly auto.
}
{
for (register int i = 0; i < LIMIT; ++i) {
// ...
}
}
Caution
The compiler does not honor user requests for register variables; instead, it makes its own register
choices when global register-allocation optimization is on.
int f()
{
static int called = 0;
++called;
// ...
return called;
}
Caution
Assigning to a static local variable is not thread safe and is not recommended as a programming
practice.
Task
Write a program that declares a static variable in C++.
5.9.4 Storage Class extern
One method of transmitting information across blocks and functions is to use external variables.
When a variable is declared outside a function at the file level, storage is permanently assigned to it,
and its storage class keyword is extern. A declaration for an external variable can look just like a
declaration for a variable that occurs inside a function or a block. Such a variable is considered to be
global to all functions declared after it. On block exit or function exit, the external variable remains
in existence. Such variables cannot have automatic or register storage class.
// external.cpp
// defined in another translation unit
extern int DefinedElsewhere;
int main() {
int DefinedHere;
{
// refers to DefinedHere in the enclosing scope
extern int DefinedHere;
}
}
Questions
1. What is the difference between the auto and static storage classes?
2. When must a variable be declared with the extern specifier?
5.10 Summary
Array is a data structure which allows a collective name to be given to a group of elements which
all have the same type. An individual element of an array is identified by its own unique index (or
subscript).
Command line argument is a parameter which is passed to a program at the time or instant when it
is invoked or executed from the command line.
When a function calls itself, this is called recursion. Recursion may be direct or indirect: it is direct
when a function calls itself; it is indirect when a function calls another function that then calls
the first function.
C++ provides a data structure, the array, which stores a fixed-size sequential collection of
elements of the same type.
A User-Defined Function is a function provided by the user of a program or environment, in a
context where the usual assumption is that functions are built into the program or environment.
5.11 Keywords
Call by Reference: In this method, the address of an argument is copied into the parameter.
Call by Value: This method copies the value of an argument into the formal parameter of the
subroutine. Changes made to the parameter have no effect on the argument.
Extern: When a variable is declared outside a function at the file level, storage is permanently assigned to
it, and its storage class keyword is extern.
Function Body: It contains a collection of statements that define what the function does.
Parameter List: It refers to the type, order, and number of the parameters of a function.
void: Function that does not return a value is declared and defined as void.
6.0 Objectives
After studying this chapter, you will be able to:
Discuss the classes
Explain structures, unions, and classes
Discuss the friend function and friend classes
Explain the inline function
Define the scope resolution operator
Explain the static class and data members
Define the static member function
Explain passing objects to functions
Discuss returning objects
6.1 Introduction
In object-oriented programming languages like C++, the data and functions (procedures to manipulate
the data) are bundled together as a self-contained unit called an object. A class is an extension of the
structure concept of the C programming language; a C structure, however, describes only data properties.
In C++ programming language, class describes both the properties (data) and behaviours (functions) of
objects. Classes are not objects, but they are used to instantiate objects.
6.2 Classes
A class is a way to bind the data and its associated functions together. It allows the data (and
function) to be hidden, if necessary, from external use. When defining a class, we are creating a new
abstract data type that can be treated like any other built-in data type. Generally, a class specification
has two parts:
1. Class declaration
2. Class function definitions
The class declaration describes the type and scope of its members. The class function definition
describes how the class functions are implemented.
The general form of a class declaration is:
class class_name
{
private:
variable declarations;
function declarations;
public:
variable declarations;
function declaration;
};
The class declaration is similar to a structure declaration. The keyword class specifies that what
follows is an abstract data of type class_name. The body of a class is enclosed within braces and
terminated by a semicolon. The class body contains the declaration of variables and functions. These
functions and variables are collectively called class members. They are usually grouped under two
sections, namely, private and public to denote which of the members are private and which of them
are public. The keywords private and public are known as visibility labels. Note that these keywords
are followed by a colon.
The class members that have been declared as private can be accessed only from within the class. On
the other hand, public members can be accessed from outside the class also. The data hiding (using
private declaration) is the key feature of object-oriented programming. The use of the keyword
private is optional. By default, the members of a class are private. If both the labels are missing, then,
by default, all the members are private. Such a class is completely hidden from the outside world and
does not serve any purpose.
The variables declared inside the class are known as data members and the functions are known as
member functions. Only the member functions can have access to the private data members and
private functions. However, the public members (both functions and data) can be accessed from
outside the class. This is illustrated in Figure 6.1. The binding of data and the functions together into
a single class-type variable is referred to as encapsulation.
class item
{
int number; // variables declaration
float cost; // private by default
public:
void getdata(int a, float b); // functions declaration
void putdata(void); // using prototype
};
We usually give a class some meaningful name, such as item. This name now becomes a new type
identifier that can be used to declare instances of that class type. The class item contains two data
members and two function members. The data members are private by default while both the
functions are public by declaration. The function getdata() can be used to assign values to the
member variables number and cost, and putdata() for displaying their values. These functions provide
the only access to the data members from outside the class. This means that the data cannot be
accessed by any function that is not a member of the class item. Note that the functions are declared,
not defined. Actual function definitions will appear later in the program. The data members are
usually declared as private and the member functions as public.
Figure 6.2 shows two different notations used by the OOP analysts to represent a class.
Figure 6.2: Representation of a class.
There are no limits to how many members, or what data types, a structure can hold. Remember, though,
that a structure can become quite large, because its size in memory is the sum of the memory sizes of
all of its member variables.
Now that we have all of these objects, what do we do with them? Accessing a member variable is very
simple and uses dot-notation: object.member_variable
TIME newTime() {
TIME time; // assumes a structure such as: struct TIME { int hour; int min; ... };
cout << "Hour:";
cin >> time.hour;
cout << "Minute:";
cin >> time.min;
//... and so on for the remaining fields
return time;
}
Now let us take a look at pointer objects. When any object, a structure, union, or class, is declared as a
pointer, it no longer uses dot-notation, but pointer-notation: object->member_variable. The arrow or
"pointer" is used to signify the difference between a normal object and a pointer.
DATE newDate() {
DATE date {07, 06, 2002};
/* A struct can be initialized in this manner:
month=07, day=06, year=2002
*/
char dummy;
cout << "Enter new date (mm/dd/yyyy):";
cin >> date.month >> dummy >> date.day >> dummy >> date.year;
return date;
}
As an alternative, the above code can request the date format to be (mm dd yyyy) and eliminate the
need for the char dummy variable.
6.3.2 Classes
A class is where C++ becomes the most object-oriented that it can be. Classes offer many abilities and
methods of storing, handling, and retrieving data. A class is very similar to a structure in that it can
have member variables, but it can also have member functions. These are functions within the class.
Member functions are called the same way member variables are:
object.member_function (...)
class CAT {
int lives;
int getLives(); //returns the number of lives
};
In a class, as well as in a structure, there are levels of access: public, private, and protected.
We will not worry about protected access here, though, since that pertains to inheritance, which is
outside the scope of this tutorial.
Anything in the public section of the class can be referenced and changed by anything outside
of the class, like a normal variable. In the private section, the opposite is true: nothing but member
functions of the class can access, change, or call its member variables or functions.
class CAT {
public:
int getLives(); //returns the number of lives
private:
int lives;
};
6.3.3 Relationship between Class, Structure and Union
The separating factor between a structure and a union is that all members of a union share the same
storage, so only one member can hold a value at a time, whereas the members of a structure each have
their own storage. The difference between a structure and a class is that all member functions
and variables in a structure are public by default, but in a class they default to private, as previously
discussed.
It is often a good idea to use constructors to initialize the member variables of a structure. Beyond
that, though, many coding standards discourage giving a structure other member functions; a type
with substantial behaviour is usually better written as a class.
Caution
A union member cannot be a class object that has a constructor, destructor, or overloaded copy
assignment operator, nor can it be of reference type. A union member cannot be declared with the
keyword static.
//friend_functions.cpp
//compile with: /EHsc
#include <iostream>
using namespace std;
class Point
{
friend void ChangePrivate( Point &);
public:
Point( void ) : m_i(0) {}
void PrintPrivate( void ){cout << m_i << endl; }
private:
int m_i;
};
void ChangePrivate ( Point &i ) { i.m_i++; }
int main()
{
Point sPoint;
sPoint.PrintPrivate();
ChangePrivate(sPoint);
sPoint.PrintPrivate();
}
Did You Know?
C++ was developed at Bell Labs by Bjarne Stroustrup in 1983-1985. Bjarne Stroustrup, after
completing his doctoral degree at the Computing Laboratory of Cambridge University, joined the
Bell Laboratories.
The friend declarations can go in the public, private, or protected section of a class; it does not
matter where they appear. In particular, specifying a friend in the section marked protected does not
prevent the friend from also accessing private fields.
Now, Node does not need to provide any means of accessing the data stored in the tree. The
BinaryTree class that will use the data is the only class that will ever need access to the data or key.
(The BinaryTree class needs to use the key to order the tree, and it will be the gateway through which
other classes can access data stored in any particular node.)
Now in the BinaryTree class, you can treat the key and data fields as though they were public:
class Node
{
private:
int key;
int data;
Node *left;
Node *right;
friend class BinaryTree; // BinaryTree may access key and data directly
};
class BinaryTree
{
private:
Node *root;
int find(int key);
};
int BinaryTree::find(int key)
{
// check root for NULL...
if(root->key == key)
{
// no need to go through an accessor function
return root->data;
}
// perform rest of find
}
6.6 Inline Function
An inline function is a function whose code is expanded in place at the point of call, instead of being
invoked through the normal function-call mechanism.
Reason for the need of inline functions:
Normally, a function call transfers control from the calling program to the function and, after
execution, returns control to the calling program. These conventions save program space and memory
space, because the function is stored in only one place and is executed only when it is called. This
execution may be time consuming, however, since the registers and other state must be saved before
the function is called.
The extra time and the saving of state are acceptable for larger functions. If the function is short, the
programmer may wish to place the code of the function directly in the calling program so that it is
executed in place. This is exactly what an inline function provides. In this situation, the programmer
may wonder, "why not write the short code repeatedly inside the program wherever needed instead of
using an inline function?" Although this could accomplish the task, the problem lies in the loss of
clarity: if the program repeats the same code many times, the program becomes harder to read. The
alternative approach is to use inline functions, which achieve the same purpose while preserving the
concept of functions.
The inline function takes the same format as a normal function, but when it is compiled it is compiled
as inline code. The function is still written separately as an inline function, which adds readability to
the source program. When the program is compiled, the code in the function body replaces the
function call.
The general format of an inline function is as follows:
inline return_type function_name(argument list)
{
// function body
}
The keyword inline designates the function as an inline function. For example, if a programmer
wishes to have an inline function named india, with an integer return value and one integer argument,
it is written as follows:
Example:
The concept of inline functions:
#include <iostream>
using namespace std;
inline int india(int);
int main( )
{
int x;
cout << "\n Enter the Input Value:";
cin >> x;
cout << "\n The Output is:" << india(x);
return 0;
}
inline int india(int x1)
{
return 5*x1;
}
The output would be the same even when the inline function is written solely as a function. The
concept, however, is different. When the program is compiled, the code present in the inline function
india( ) is replaced in the place of function call in the calling program. The concept of inline function
is used in this example because the function is a small line of code.
A programmer must make wise choices when to use inline functions. Inline functions will save time
and are useful if the function is very small. If the function is large, use of inline functions must be
avoided.
2. The data members are ……………. by default while both the functions are public by declaration.
(a) public (b) private
(c) protected (d) None of these
4. The .....................is placed only in the function declaration of the friend function and not in the
function definition.
(a) friend class (b) friend function
(c) friend declarations (d) keyword friend
5. The inline function takes the format as a normal function but when it is compiled it is compiled
as..........................
(a) outline code (b) function code
(c) inline code (d) data member code
6. The concept of inline function is used in the example above because the function is a small piece of code.
(a) True (b) False
The declaration in the inner block hides the declaration of the same variable in the outer block. This
means that, within the inner block, the variable x refers to the data object declared therein. To access
the global version of the variable, C++ provides the scope resolution operator (::). For example:
int x = 10; // global x
int main()
{
int x = 20; // local x hides the global x
cout << x << "\n"; // prints 20 (local x)
cout << ::x << "\n"; // prints 10 (global x)
return 0;
}
In the above example, x has a value of 20 but ::x has a value of 10. Similarly, this operator is used when a
member function is defined outside the class.
For example:
class MyClass
{
int n1, n2;
public:
void func1(); // function declaration
};
void MyClass::func1() // scope resolution operator used to write the function definition outside the class definition
{
// Function Code
}
class c{
int i;
int j;
static int m;
static int n;
public:
void zap();
static void clear();
};
void c::zap() {
i = 0; j = 0; m = 0; n = 0;
}
void c::clear() {
m = 0; n = 0;
}
There i and j are instance variables and m and n are class variables. Every object of class c will have
its own private i and j, which can have different values for different objects; however, all objects will
access the same m and n, which will, of course, have the same values for all objects.
Static variables are like non-inline member functions in that they are declared in a class declaration
and defined in the corresponding source file. To define static variables m and n, the source file for
class c must contain the following declarations (which are also definitions):
int c::m;
int c::n;
Caution
The static member variables must be defined outside the class.
class item
{
int number;
static int count;
public:
void getdata()
{
++count;
number=count;
}
void putdata(void)
{
cout << "Count is " << count << "\n";
cout << "Number is " << number << "\n";
}
};
int item::count=0;
int main()
{
item x,y,z; //Three object created from class item
x.getdata();
y.getdata();
z.getdata();
x.putdata(); // count is 3 and number is 1
y.putdata(); // count is 3 and number is 2
z.putdata(); // count is 3 and number is 3
return 0;
}
class find
{
static int count;
int code;
public:
static void showcount(void)
{
cout << "Count is " << count << "\n";
}
void setcode(void)
{
code = ++count;
}
void setcount(void)
{
cout << "Code is " << code << "\n";
}
};
int find::count=0;
int main()
{
find x,y,z;
x.setcode(); //Code and count is 1.
y.setcode(); //Code and count is 2
find::showcount(); //Count is 2
z.setcode(); //Code and count is 3
find::showcount(); //Count is 3
x.setcount(); //Code is 1 for object x.
y.setcount(); //Code is 2 for object y.
z.setcount(); //Code is 3 for object z.
x.setcode(); //Code and count is 4.
y.setcode(); //Code and count is 5
z.setcode(); //Code and count is 6
find::showcount(); //Count is 6
//The value of code will increase from its previous value of that object.
x.setcount(); //Code is 4 for object x.
y.setcount(); //Code is 5 for object y.
z.setcount(); //Code is 6 for object z.
return 0;
}
Since count is declared as static, it has only one copy irrespective of the number of objects created.
Each call to setcode() increments the single shared count, so count is uniform across all objects. The
data member code, however, is unique for each object created, because it is not a static data member.
Therefore, in the above example there is only one value of the count data member irrespective of the
number of objects constructed, since it is a static data member, whereas the value of the code data
member is different for each object.
#include <iostream>
using namespace std;
class time
{
int hours;
int minutes;
public:
void gettime(int h, int m)
{ hours = h; minutes = m; }
void puttime(void)
{
cout << hours << " hours and ";
cout << minutes << " minutes" << "\n";
}
void sum (time, time); //declaration with objects as arguments
};
void time::sum(time t1, time t2) // t1, t2 are objects
{
minutes = t1.minutes + t2.minutes;
hours = minutes/60;
minutes = minutes%60;
hours = hours + t1.hours + t2.hours;
}
int main()
{
time T1, T2, T3;
T1.gettime (2, 45); //get T1
T2.gettime (3, 30); //get T2
T3.sum(T1, T2); //T3 = T1 + T2
cout << "T1 = "; T1.puttime(); //display T1
cout << "T2 = "; T2.puttime(); //display T2
cout << "T3 = "; T3.puttime(); //display T3
return 0;
}
Since the member function sum ( ) is invoked by the object T3, with the objects T1 and T2 as
arguments, it can directly access the hours and minutes variables of T3. But, the members of T1 and
T2 can be accessed only by using the dot operator (like T1.hours and T1.minutes). Therefore, inside
the function sum ( ), the variables hours and minutes refer to T3, T1.hours and T1.minutes refer to T1,
and T2.hours and T2.minutes refer to T2. Figure 6.3 illustrates how the members are accessed inside
the function sum ( ).
An object can also be passed as an argument to a non-member function. However, such functions can
have access to the public member functions only through the objects passed as arguments to it. These
functions cannot have access to the private data members.
Example:
class AnimalLister
{
public:
Animal* getNewAnimal()
{
Animal* animal1 = new Animal();
return animal1;
}
};
If I create an instance of AnimalLister and get an Animal pointer from it, then where am I supposed to
delete it?
int main() {
AnimalLister al;
Animal *a1, *a2;
a1 = al.getNewAnimal();
a2 = al.getNewAnimal();
}
The problem here is that AnimalLister has no way to track the list of Animals it creates, so how does
one change the logic of such code to have a way to delete the objects created?
Example:
#include <iostream>
using namespace std;
class MyClass
{
int a, b;
public:
void setAB(int i, int j) { a = i, b = j; }
void display() {
cout << "\n a is " << a << "\n";
cout << "\n b is " << b << "\n";
}
};
int main()
{
MyClass ob1, ob2;
ob1.setAB(10, 20);
ob2.setAB(0, 0);
cout << "ob1 before assignment:";
ob1.display();
cout << "ob2 before assignment:";
ob2.display();
ob2 = ob1; // default member-wise assignment of one object to another
cout << "ob1 after assignment:";
ob1.display();
cout << "ob2 after assignment:";
ob2.display();
ob1.setAB(-1, -1); // change ob1 only
cout << "ob1 after changing ob1:";
ob1.display();
cout << "ob2 after changing ob1:";
ob2.display();
return 0;
}
Output:
ob1 before assignment:
a is 10
b is 20
ob2 before assignment:
a is 0
b is 0
ob1 after assignment:
a is 10
b is 20
ob2 after assignment:
a is 10
b is 20
ob1 after changing ob1:
a is -1
b is -1
ob2 after changing ob1:
a is 10
b is 20
6.14 Summary
A class is a combination of member functions (Methods) and member variables (fields)
The class declaration is similar to a structure declaration.
The data hiding is the key feature of object-oriented programming.
Data-encapsulation is key to the functionality of many high level or later generation programming
languages such as C++ and Java.
Data hiding is the principle of using private member variables within a class.
The keyword friend is placed only in the function declaration of the friend function and not in the
function definition.
A class is an extension to the structure data type. A class can have both variables and functions as
members.
6.15 Keywords
Class: A class is very similar to a structure in that it can have member variables, but it can also have
member functions.
Friend Class: A class can also be declared to be the friend of some other class. When we create a
friend class then all the member functions of the friend class also become the friend of the other
class.
Friend Function: A friend function is used for accessing the non-public members of a class. A class
can allow non-member functions and other classes to access its own private data, by making them
friends.
Inline Function: Inline expansion is an optimization technique used by compilers. One can simply
prepend the inline keyword to a function prototype to request that a function be made inline.
Static Class Members: Static classes and class members are used to create data and functions that can
be accessed without creating an instance of the class.
Static Data Members: Static data members are subject to class-member access rules, so private
access to static data members is allowed only for class -member functions and friends.
6.16 Review Questions
1. How are objects returned from class methods?
2. How are objects of any data type passed to functions?
3. What is a class? How does it accomplish data hiding?
4. What is a Union?
5. How is a member function of a class defined?
6. How does a C++ structure differ from a C++ class?
7. What is a friend function? What are the merits and demerits of using friend functions?
8. What is the main reason for using structure?
9. When do we declare a member of a class static?
10. Describe the mechanism of accessing data members and member functions in the following cases:
(a) Inside the main program.
(b) Inside a member function of the same class.
(c) Inside a member function of another class.
7.0 Objectives
After studying this chapter, you will be able to:
Understand the Arrays of Objects
Identifying and classifying the pointers to objects
Discuss the type checking C++ pointers
Explain the this pointer
Discuss the pointers to derived types
7.1 Introduction
Pointers and arrays were examined as they relate to C++'s built-in types. Here, they are discussed
relative to objects. This chapter also looks at a feature related to the pointer called a reference. The
chapter concludes with an examination of C++'s dynamic allocation operators.
A pointer is a variable that is used to store a memory address. The address is the location of the
variable in the memory. Pointers help in allocating memory dynamically. Pointers improve execution
time and save space. A pointer points to a particular data type.
The general form of declaring a pointer is:
type *variable_name;
type is the base type of the pointer and variable_name is the name of the variable of the pointer. For
example,
int *x;
x is the variable name and it is the pointer of type integer.
It is possible to have arrays of objects. The syntax for declaring and using an object array is exactly
the same as it is for any other type of array. For example, this program uses a three-element array of
objects:
#include <iostream>
using namespace std;
class cl {
int i;
public:
void set_i(int j) { i=j; }
int get_i() { return i; }
};
int main()
{
cl ob[3];
int i;
for(i=0; i<3; i++) ob[i].set_i(i+1);
for(i=0; i<3; i++)
cout << ob[i].get_i() << "\n";
return 0;
}
class cl {
int i;
public:
cl(int j) { i=j; }
int get_i() { return i; }
};
Here, the constructor function defined by cl requires one parameter. This implies that any array
declared of this type must be initialized. That is, it precludes this array declaration:
cl a[9]; // error, constructor requires initializers
The reason that this statement is not valid (as cl is currently defined) is that it implies that cl has a
parameterless constructor, because no initializers are specified. However, as it stands, cl does not have
a parameterless constructor. Because there is no valid constructor that corresponds to this
declaration, the compiler will report an error. To solve this problem, you need to overload the
constructor function, adding one that takes no parameters. In this way, arrays that are initialized and
those that are not are both allowed.
class cl {
int i;
public:
cl() { i=0; } // called for non-initialized arrays
cl(int j) { i=j; } // called for initialized arrays
int get_i() { return i; }
};
Given this class, both of the following statements are permissible:
cl a1[3] = {3, 5, 6}; // initialized
cl a2[34]; // uninitialized
#include <iostream>
using namespace std;
class cl {
int i;
public:
cl(int j) { i=j; }
int get_i() { return i; }
};
int main()
{
cl ob(88), *p;
p = &ob; // get address of ob
cout << p->get_i(); // use -> to call get_i()
return 0;
}
When a pointer is incremented, it points to the next element of its type. For example, an integer
pointer will point to the next integer. In general, all pointer arithmetic is relative to the base type of
the pointer. (That is, it is relative to the type of data that the pointer is declared as pointing to.) The
same is true of pointers to objects. For example, this program uses a pointer to access all three
elements of array ob after being assigned ob's starting address:
#include <iostream>
using namespace std;
class cl {
int i;
public:
cl() { i=0; }
cl(int j) { i=j; }
int get_i() { return i; }
};
int main()
{
cl ob[3] = {1, 2, 3};
cl *p;
int i;
p = ob; // get start of array
for(i=0; i<3; i++) {
cout << p->get_i() << "\n";
p++; // point to next object
}
return 0;
}
Given these two pointer declarations:
int *pi;
float *pf;
the following assignment is illegal in C++:
pi = pf; // error -- type mismatch
Of course, you can override any type incompatibility using a cast, but doing so bypasses C++'s
type-checking mechanism.
3. If a variable is a pointer to a structure, then which of the following operators is used to access data
members of the structure through the pointer variable?
(a). . (b). &
(c). * (d). ->
4. What is (void*)0?
(a). Representation of NULL pointer (b). Representation of void pointer
(c). Error (d). None of above
#include <iostream>
using namespace std;
class pwr {
double b;
int e;
double val;
public:
pwr(double base, int exp);
double get_pwr() { return val; }
};
pwr::pwr(double base, int exp)
{
b = base;
e = exp;
val = 1;
if(exp==0) return;
for( ; exp>0; exp--) val = val * b;
}
int main()
{
pwr x(4.0, 2), y(2.5, 1), z(5.7, 0);
cout << x.get_pwr() << " ";
cout << y.get_pwr() << " ";
cout << z.get_pwr() << "\n";
return 0;
}
Within a member function, the members of a class can be accessed directly, without any object or
class qualification. Thus, inside pwr(), the statement b = base; means that the copy of b associated
with the invoking object will be assigned the value contained in base.
However, the same statement can also be written like this:
this->b = base;
The this pointer points to the object that invoked pwr(). Thus, this->b refers to that
object's copy of b. For example, if pwr() had been invoked by x (as in x(4.0, 2)), then
this in the preceding statement would have been pointing to x. Writing the statement
without using this is really just shorthand.
Here is the entire pwr() function written using the this pointer:
pwr::pwr(double base, int exp)
{
this->b = base;
this->e = exp;
this->val = 1;
if(exp==0) return;
for( ; exp>0; exp--) this->val = this->val * this->b;
}
Actually, no C++ programmer would write pwr() as just shown, because nothing is gained and the
standard form is easier. However, the this pointer is very important when operators are overloaded and
whenever a member function must utilize a pointer to the object that invoked it.
Remember that the this pointer is automatically passed to all member functions.
7.6 Pointers to Derived Types
A pointer of one type cannot point to an object of a different type. However, there is an important
exception to this rule that relates only to derived classes. To begin, assume two classes called B and
D. Further, assume that D is derived from the base class B. In this situation, a pointer of type B *
may also point to an object of type D. More generally, a base class pointer can also be used as a
pointer to an object of any class derived from that base.
Although a base class pointer can be used to point to a derived object, the opposite is not true. A
pointer of type D * may not point to an object of type B. Further, although you can use a base pointer
to point to a derived object, you can access only the members of the derived type that were imported
from the base. That is, you would not be able to access any members added by the derived class.
(You can cast a base pointer into a derived pointer and gain full access to the entire derived class,
however.)
Here is a short program that illustrates the use of a base pointer to access derived objects.
#include <iostream>
using namespace std;
class base {
int i;
public:
void set_i(int num) { i=num; }
int get_i() { return i; }
};
class derived: public base {
int j;
public:
void set_j(int num) { j=num; }
int get_j() { return j; }
};
int main()
{
base *bp;
derived d;
bp = &d; // base pointer points to derived object
// access derived object using base pointer
bp->set_i(10);
cout << bp->get_i() << " ";
/* the following would not work. You cannot access element of
a derived class using a base class pointer.
bp->set_j(88); // error
cout << bp->get_j(); // error
*/
return 0;
}
As you can see, a base pointer is used to access an object of a derived class. Although you must be
careful, it is possible to cast a base pointer into a pointer of the derived type to access a member of
the derived class through the base pointer.
For example, this is valid C++ code:
// cast the base pointer to a derived pointer to gain full access
((derived *)bp)->set_j(88);
cout << ((derived *)bp)->get_j();
It is important to remember that pointer arithmetic is relative to the base type of the pointer. For this
reason, when a base pointer is pointing to a derived object, incrementing the pointer does not cause it
to point to the next object of the derived type. Instead, it will point to what it thinks is the next object
of the base type. This, of course, usually spells trouble. For example, this program, while
syntactically correct, contains this error.
// This program contains an error.
#include <iostream>
using namespace std;
class base {
int i;
public:
void set_i(int num) { i=num; }
int get_i() { return i; }
};
class derived: public base {
int j;
public:
void set_j(int num) {j=num;}
int get_j() {return j;}
};
int main()
{
base *bp;
derived d[2];
bp = d;
d[0].set_i(1);
d[1].set_i(2);
cout << bp->get_i() << " ";
bp++; // relative to base, not derived
cout << bp->get_i(); // garbage value displayed
return 0;
}
The use of base pointers to derived types is most useful when creating run-time polymorphism
through the mechanism of virtual functions.
Caution
The size of an array must be specified when the array is declared, because an array's size is fixed
and must be known at compile time.
Did You Know?
In 1957, the first of the major languages appeared in the form of FORTRAN.
#include <iostream>
using namespace std;
class cl {
public:
cl(int i) { val=i; }
int val;
int double_val() { return val+val; }
};
int main()
{
int cl::*data; // data member pointer
int (cl::*func)(); // function member pointer
cl ob1(1), ob2(2); // create objects
data = &cl::val; // get offset of val
func = &cl::double_val; // get offset of double_val()
cout << "Here are values: ";
cout << ob1.*data << " " << ob2.*data << "\n";
cout << "Here they are doubled: ";
cout << (ob1.*func)() << " ";
cout << (ob2.*func)() << "\n";
return 0;
}
In main(), this program creates two member pointers: data and func.
Note carefully the syntax of each declaration. When declaring pointers to members, you must specify
the class and use the scope resolution operator. The program also creates objects of cl called ob1 and
ob2. As the program illustrates, member pointers may point to either functions or data. Next, the
program obtains the addresses of val and double_val(). As stated earlier, these "addresses" are really
just offsets into an object of type cl, at which point val and double_val() will be found. Next, to
display the values of each object's val, each is accessed through data. Finally, the program uses func
to call the double_val() function. The extra parentheses are necessary in order to correctly associate
the .* operator.
When you are accessing a member of an object by using an object or a reference, you must use the .*
operator. However, if you are using a pointer to the object, you need to use the ->* operator, as
illustrated in this version of the preceding program:
#include <iostream>
using namespace std;
class cl {
public:
cl(int i) { val=i; }
int val;
int double_val() { return val+val; }
};
int main()
{
int cl::*data; // data member pointer
int (cl::*func)(); // function member pointer
cl ob1(1), ob2(2); // create objects
cl *p1, *p2;
p1 = &ob1; // access objects through a pointer
p2 = &ob2;
data = &cl::val; // get offset of val
func = &cl::double_val; // get offset of double_val()
cout << "Here are values: ";
cout << p1->*data << " " << p2->*data << "\n";
cout << "Here they are doubled: ";
cout << (p1->*func)() << " ";
cout << (p2->*func)() << "\n";
return 0;
}
In this version, p1 and p2 are pointers to objects of type cl. Therefore, the ->* operator is used to
access val and double_val(). Remember, pointers to members are different from pointers to specific
instances of elements of an object. Consider this fragment (assume that cl is declared as shown in the
preceding programs):
int cl::*d;
int *p;
cl o;
p = &o.val; // this is address of a specific val
d = &cl::val; // this is offset of generic val
Here, p is a pointer to an integer inside a specific object. However, d is simply an offset that indicates
where val will be found in any object of type cl. In general, pointer-to-member operators are applied
in special-case situations. They are not typically used in day-to-day programming.
7.8 References
C++ contains a feature that is related to the pointer called a reference. A reference is essentially an
implicit pointer. There are three ways that a reference can be used: as a function parameter, as a
function return value, or as a stand-alone reference. Each is examined here.
Of course, the type of the initializer must be compatible with the type of data for which memory is
being allocated.
This program gives the allocated integer an initial value of 87:
#include <iostream>
#include <new>
using namespace std;
int main()
{
int *p;
try {
p = new int (87); // initialize to 87
} catch (bad_alloc xa) {
cout << "Allocation Failure\n";
return 1;
}
cout << "At " << p << " ";
cout << "is the value " << *p << "\n";
delete p;
return 0;
}
delete [ ] p_var;
Here, the [ ] informs delete that an array is being released.
For example, the next program allocates a 10-element integer array.
#include <iostream>
#include <new>
using namespace std;
int main()
{
int *p, i;
try {
p = new int [10]; // allocate 10 integer array
} catch (bad_alloc xa) {
cout << "Allocation Failure\n";
return 1;
}
for(i=0; i<10; i++ )
p[i] = i;
for(i=0; i<10; i++)
cout << p[i] << " ";
delete [] p; // release the array
return 0;
}
7.10 Summary
Pointer is a variable that is used to store a memory address. The address is the location of the
variable in the memory. Pointers help in allocating memory dynamically.
Reference is a relation between objects in which one object designates, or acts as a means by
which to connect to or link to, another object. The first object in this relation is said to refer to the
second object.
The this pointer is accessible only within the nonstatic member functions of a class, struct, or
union type. It points to the object for which the member function is called. Static member
functions do not have a this pointer.
Dynamic allocation is one of the ways of using memory provided by the C++ standard. In C this
is accomplished with the malloc() function; C++ provides the new operator for the same purpose.
Memory is allocated and deallocated dynamically by using the new and the delete operators.
7.11 Keywords
Array of Objects: An array of variables of type "class" is known as an "array of objects". The
"identifier" used to refer to the array of objects is a user-defined data type.
malloc(): Allocates a block of size bytes of memory, returning a pointer to the beginning of the
block. The content of the newly allocated block is not initialized; it holds indeterminate values.
new Operator: Pointers provide the necessary support for C++'s powerful dynamic memory
allocation system. Dynamic allocation is the means by which a program obtains memory while it is
running.
Pointer: It is a variable which contains the address in memory of another variable. We can have a
pointer to any variable type.
this Pointer: Every class member function has a hidden parameter, the this pointer, which points
to the individual object.
8.0 Objectives
After studying this chapter, you will be able to:
Explain the constructors
Discuss the default constructor
Define the parameterized constructors
Understand the concept of copy constructors
8.1 Introduction
It is very common for some part of an object to require initialization before it can be used. For
example, think back to the stack class developed earlier. Before the stack could be used, tos had to be
set to zero. This was performed by using the function init(). Because the requirement for initialization
is so common, C++ allows objects to initialize themselves when they are created. This automatic
initialization is performed through the use of a constructor function.
Because classes have complicated internal structures, including data and functions, object
initialization and cleanup for classes is much more complicated than it is for simple data structures.
Constructors and destructors are special member functions of classes that are used to construct and
destroy class objects. Construction may involve memory allocation and initialization for objects.
Destruction may involve cleanup and deallocation of memory for objects.
8.2 Constructors
A class defines a constructor by declaring the constructor as a function inside the class. In other
words, a function bearing the name of the class is a constructor, with a few exceptions from the rules
for regular functions. A constructor is a special member function and must have the same name as
the class. A constructor can be declared with or without arguments: the form without arguments is
called a default constructor, while the form with arguments is known as a parameterized constructor.
The constructor is used to initialize the class; as seen earlier, a class cannot be initialized directly,
and this limitation is overcome by providing a constructor inside the class. Constructors can be
overloaded.
Rules of Constructor:
The constructors must have the same name as the class.
The constructors will take the form of function prototype.
The constructors are used to initialize the class and its objects.
The constructors are invoked automatically as soon as the objects are created.
The constructors do not have a return type, not even void.
The constructors cannot be inherited.
The constructors can have a default argument.
The constructors may or may not have argument.
The constructors can also be defined as inline function.
The constructors cannot have its class as argument.
The constructors can have its class as argument but only through reference. This is known as copy
constructor.
Constructor parameters can be declared const, so that the arguments are never changed inside the constructor.
A constructor can be empty, with no statements inside it. Such a do-nothing constructor must often
be created when an overloaded operator is defined in the class.
A constructor can also contain statements other than initialization.
A constructor is defined as follows, illustrating the rules above.
class integer
{
int m, n;
public:
integer(void);
};
integer::integer(void)
{
m=0; n=0;
}
Calling a constructor:
1. Explicit calling. 2. Implicit calling.
Explicit calling combines the object declaration with a call to the constructor, passing the arguments
directly:
integer int1 = integer(10, 20);
Implicit calling is the shorter form of the same statement:
integer int1(10, 20);
Caution
The constructors must be declared in the public section of the class.
Example:-
class abc
{
int m,n;
public:
abc(int x, int y); // parameterized constructor
................
.................
};
abc::abc(int x,int y)
{
m=x; n=y;
}
If only the derived class's constructor takes parameters, it simply receives them in the usual way. But
what if both the base and derived class contain parameterized constructors?
// this code contains ERRORS
// base class
class base
{
int a;
public:
base(int n)
{
//...
}
};
// derived class
class derived:public base
{
int b;
public:
derived(int m)
{
//...
}
};
//main
void main()
{
base b(10); //ok
We need to introduce the expanded form of constructor declaration (of the derived class).
derived-constructor (arg): base1 (arg),
base2 (arg),
...
baseN (arg)
{
...
...
}
Here base1, base2 etc. are constructor functions of base classes, which the derived class inherits.
// this code is ok
// base class
class base
{
int a;
public:
base(int n){a=n;}
};
// derived class
class derived:public base
{
int b;
public:
// since this class is derived
// from only one class 'base'
// therefore only one constructor
// is listed
derived(int m1,int m2):base(m2)
{b=m1;}
};
// main
void main()
{
base b(10); //ok
Caution
Constructors cannot return values. Specifying a constructor with a return type is an error, as is taking
the address of a constructor.
8.5 Copy Constructors
A copy constructor is a special constructor in the C++ programming language used to create a new
object as a copy of an existing object. This constructor takes a single argument: a reference to the
object to be copied. Normally the compiler automatically creates a copy constructor for each class
(known as an implicit copy constructor) but for special cases the programmer creates the copy
constructor, known as an explicit copy constructor. In such cases, the compiler does not create one.
The copy constructor is a constructor which creates an object by initializing it with an object of the
same class, which has been created previously.
The copy constructor is used to:
Initialize one object from another of the same type.
Copy an object to pass it as an argument to a function.
Copy an object to return it from a function.
If a copy constructor is not defined in a class, the compiler itself defines one. If the class has pointer
variables and has some dynamic memory allocations, then it is a must to have a copy constructor.
The most common form of copy constructor is shown here:
classname (const classname &obj) {
// body of constructor
}
Here, obj is a reference to an object that is being used to initialize another object.
#include <iostream>
using namespace std;
class Line
{
public:
int getLength( void );
Line( int len ); // simple constructor
Line( const Line &obj); // copy constructor
~Line(); // destructor
private:
int *ptr;
};
// Member functions definitions including constructor
Line::Line(int len)
{
cout << "Normal constructor allocating ptr" << endl;
// allocate memory for the pointer;
ptr = new int;
*ptr = len;
}
Line::Line(const Line &obj)
{
cout << "Copy constructor allocating ptr." << endl;
ptr = new int;
*ptr = *obj.ptr; // copy the value
}
Line::~Line(void)
{
cout << "Freeing memory!" << endl;
delete ptr;
}
int Line::getLength( void )
{
return *ptr;
}
display(line);
return 0;
}
When the above code is compiled and executed, it produces the following result:
#include <iostream>
using namespace std;
class complex
{
float x, y;
public:
complex() { } // constructor - no args
complex(float a) { x = y = a; } // constructor - one arg
complex(float real, float imag) // constructor - two args
{ x = real; y = imag; }
friend complex sum(complex, complex);
friend void show(complex);
};
complex sum(complex c1, complex c2) // friend
{
complex c3;
c3.x = c1.x + c2.x;
c3.y = c1.y + c2.y;
return c3;
}
void show(complex c) //friend
{
cout << c.x << " + j" << c.y << "\n";
}
int main()
{
complex A(2.7, 3.5); // define & initialize
complex B(1.6); // define & initialize
complex C; // define
C = sum(A, B); // sum() is a friend
cout << "A = "; show(A); // show() is also a friend
cout << "B = "; show(B);
cout << "C = "; show(C);
// Another way to give initial values (second method)
complex P, Q, R; // define P, Q and R
P = complex(2.5, 3.9); // initialize P
Q = complex(1.6, 2.5); // initialize Q
R = sum(P, Q);
cout << "\n";
cout << "P = "; show(P);
cout << "Q = "; show(Q);
cout << "R = "; show(R);
return 0;
}
8.9 Destructors
As opposed to a constructor, a destructor is called when a program has finished using an instance of
an object. A destructor does the cleaning behind the scenes. Like the default constructor, the compiler
always creates a default destructor if you don't create one. Like the constructor, a destructor
also has the same name as its class. This time, the name of the destructor starts with
a tilde.
To create your own destructor, in the header file, type ~ followed by the name of the object. Here is
an example:
#ifndef BricksH
#define BricksH
class TBrick
{
public:
TBrick();
TBrick(double L, double h, double t);
TBrick(const TBrick &Brk);
~TBrick();
double getLength() const;
void setLength(const double l);
double getHeight() const;
void setHeight(const double h);
double getThickness() const;
void setThickness(const double t);
double CementVolume();
void ShowProperties();
private:
double Length;
double Height;
double Thickness;
};
#endif
As done with a default constructor, you don't need to put anything in the implementation of a
destructor. In fact, when a program terminates, the compiler can itself destroy all of the objects and
variables that your program has used. The only true time you will be concerned with destroying
objects is if the objects were created dynamically, which we will learn when studying pointers.
You can implement your destructor in the header file by simply giving it an empty body:
#ifndef BricksH
#define BricksH
class TBrick
{
public:
...
~TBrick() {}
...
private:
...
};
#endif
All classes are implemented as mixin classes and are encapsulated by a Mixin Layer. In other
words, this Mixin Layer implements a basic stock information broker feature (BasicSIB).
1 class StockInformationBroker {
2   DBBroker m_db;
3 public:
4   StockInfo &collectInfo(StockInfoRequest &req) {
5     string *stocks = req.getStocks();
6     StockInfo *info = new StockInfo();
7     for (unsigned int i = 0; i < req.num(); i++)
8       info->addQuote(stocks[i], m_db.get(stocks[i]));
9     return *info; }
10 };
11
12 class Client {
13   StockInformationBroker &m_broker;
14 public:
15   void run(string *stocks, unsigned int num) {
16     StockInfo &info = m_broker.collectInfo(StockInfoRequest(stocks, num));
17     ... }
18 };
Pricing Feature as Mixin Layer: Now, we want to add a Pricing feature that charges the client's
account depending on the received stock quotes. The listing below depicts this feature implemented
using common FOP concepts. Client is refined by an account management (Lines 16-23), SIR is
refined by a price calculation (Lines 2-5), and SIB charges the client's account when passing
information to the client (Lines 10-12). There are several problems with this approach:
(1) The Pricing feature is expressed in terms of the structure of the BasicSIB feature. This problem
is caused because FOP can only express hierarchy-conform refinements. It would be better to
describe the pricing feature using abstractions such as product and customer.
(2) The interface of collectInfo was extended. Therefore, the Client must override the method run in
order to pass a reference of itself to the SIB. This is an inelegant workaround and increases the
complexity.
(3) The charging procedure of the clients cannot be altered depending on the runtime control flow.
Moreover, it is assigned to the SIB which is clearly not responsible for this function.
(4) A hypothetical accounting functionality that traces and logs the transfers (not depicted) suffers
from excessive method shadowing, because all affected methods, e.g., collectInfo, price, balance,
etc., have to be shadowed.
Pricing Feature as Aspectual Mixin Layer: The listing below depicts the pricing feature
implemented by an Aspectual Mixin Layer. The key difference is the Charging aspect. It serves as an
observer of calls to the method collectInfo. Every call to this method is intercepted and the client is
charged depending on its request. This solves the problem of the extended interface because the client
is charged by the aspect instead of by the SIB. An alternative is to pass the client's
1 refines class StockInfoRequest {
2   float basicPrice();
3   float calculateTax();
4 public:
5   float price();
6 };
7
8 refines class StockInformationBroker {
9 public:
10   StockInfo &collectInfo(Client &c, StockInfoRequest &req) {
11     c.charge(req);
12     return super::collectInfo(req); }
13 };
14
15 refines class Client {
16   float m_balance;
17 public:
18   float balance();
19   void charge(StockInfoRequest &req);
20   void run(string *stocks, unsigned int num) {
21     StockInfo &info = super::m_broker.collectInfo(*this,
22       StockInfoRequest(stocks, num));
23     ... }
24 };
reference to the extended collectInfo method (not depicted). In both cases, the Client does not need to
override the run method. A further advantage is that the charging of clients' accounts can be made
dependent on the control flow (using the cflow pointcut). This makes it possible to implement the
charging function variably. In this context, method shadowing is prevented by using wildcard
expressions in pointcuts, e.g., for capturing calls to all methods which are relevant for price transfer
(an accounting feature for tracing and logging transfers). Finally, our example shows that using
Aspectual Mixin Layers we were able to refine only those classes that play the roles of product (SIR)
and customer (Client).
Questions
1. What are Aspectual Mixin Layers? Clarify their use in FeatureC++.
2. What is the pricing feature as a mixin layer?
8.10 Summary
Constructors are called automatically by the compiler when defining class objects. The
destructors are called when a class object goes out of scope.
A copy constructor is a special constructor in the C++ programming language for creating a new
object as a copy of an existing object.
Destructors are usually used to deallocate memory and do other cleanup for a class object and its
class members when the object is destroyed.
In computer programming languages, the term "default constructor" refers to a constructor that is
automatically generated in the absence of explicit constructors; this automatically provided
constructor is usually a nullary constructor.
A destructor is called for a class object when that object passes out of scope or is explicitly
deleted.
8.11 Keywords
Constructor: If, however, any kind of constructor is declared by the programmer, then a default
constructor is not supplied by the compiler.
Default Arguments: C++ allows a function to assign default values to parameters. The default value
is assigned when no argument corresponding to that parameter is specified in the call to that function.
Default Constructor: It is a constructor in C++ that has no parameters or where it has parameters
they are all defaulted. If no constructor is supplied then the compiler will supply a default.
Destructor: It is used to destroy the objects that have been created by a constructor.
Parameterized Constructors: The constructors that take arguments are called parameterized
constructors.
9.0 Objectives
After studying this chapter, you will be able to:
Describe the function overloading
Explain the overloading constructor function
Explain the finding address of overloaded function
Describe the operator overloading
9.1 Introduction
Function overloading is one of the most powerful features of C++ programming language. It forms
the basis of polymorphism (compile-time polymorphism). Most of the time you will be overloading
the constructor function of a class.
Function overloading is a feature of C++ that allows us to create multiple functions with the same
name, so long as they have different parameters.
C++ permits the use of two functions with the same name. However, such functions must have
different argument lists. The difference can be in terms of the number or type of arguments, or both.
This process of using two or more functions with the same name but differing in signature is called
function overloading. Overloading functions that differ only in their return types is not allowed.
#include <iostream>
using namespace std;
int myfunc(int i); // these differ in types of parameters
double myfunc(double i);
int main()
{
cout << myfunc(10) << " "; // calls myfunc(int i)
cout << myfunc(5.4); // calls myfunc(double i)
return 0;
}
double myfunc(double i)
{
return i;
}
int myfunc(int i)
{
return i;
}
The next program overloads myfunc() using a different number of parameters:
#include <iostream>
using namespace std;
int myfunc(int i); // these differ in number of parameters
int myfunc(int i, int j);
int main()
{
cout << myfunc(10) << " "; // calls myfunc(int i)
cout << myfunc(4, 5); // calls myfunc(int i, int j)
return 0;
}
int myfunc(int i)
{
return i;
}
int myfunc(int i, int j)
{
return i*j;
}
#include <iostream>
#include <cstdio>
using namespace std;
class date {
int day, month, year;
public:
date(char *d);
date(int m, int d, int y);
void show_date();
};
// Initialize using string.
date::date(char *d)
{
sscanf(d, "%d%*c%d%*c%d", &month, &day, &year);
}
// Initialize using integers.
date::date(int m, int d, int y)
{
day = d;
month = m;
year = y;
}
void date::show_date()
{
cout << month << "/" << day;
cout << "/" << year << "\n";
}
int main()
{
date ob1(12, 4, 2001), ob2("10/22/2001");
ob1.show_date();
ob2.show_date();
return 0;
}
In this program, you can initialize an object of type date, either by specifying the date using three
integers to represent the month, day, and year, or by using a string that contains the date in this
general form:
mm/dd/yyyy
#include <iostream>
#include <new>
using namespace std;
class powers {
int x;
public:
// overload constructor two ways
powers() { x = 0; } // no initializer
powers(int n) { x = n; } // initializer
int getx() { return x; }
void setx(int i) { x = i; }
};
int main()
{
powers ofTwo[] = {1, 2, 4, 8, 16}; // initialized
powers ofThree[5]; // uninitialized
powers *p;
int i;
// show powers of two
cout << "Powers of two: ";
for(i=0; i<5; i++) {
cout << ofTwo[i].getx() << " ";
}
cout << "\n\n";
// set powers of three
ofThree[0].setx(1);
ofThree[1].setx(3);
ofThree[2].setx(9);
ofThree[3].setx(27);
ofThree[4].setx(81);
// show powers of three
cout << "Powers of three: ";
for(i=0; i<5; i++) {
cout << ofThree[i].getx() << " ";
}
cout << "\n\n";
// dynamically allocate an array
try {
p = new powers[5]; // no initialization
} catch (bad_alloc xa) {
cout << "Allocation Failure\n";
return 1;
}
// initialize dynamic array with powers of two
for(i=0; i<5; i++) {
p[i].setx(ofTwo[i].getx());
}
// show powers of two
cout << "Powers of two: ";
for(i=0; i<5; i++) {
cout << p[i].getx() << " ";
}
cout << "\n\n";
delete [] p;
return 0;
}
In this example, both constructors are necessary. The default constructor is used to construct the
uninitialized ofThree array and the dynamically allocated array. The parameterized constructor is
called to create the objects for the ofTwo array.
The following example demonstrates implicitly defined and user-defined copy constructors:
#include <iostream>
using namespace std;
struct A {
int i;
A() : i(10) { }
};
struct B {
int j;
B() : j(20) {
cout << "Constructor B(), j = " << j << endl;
}
B(B& arg) : j(arg.j) {
cout << "Copy constructor B(B&), j = " << j << endl;
}
B(const B&, int val = 30) : j(val) {
cout << "Copy constructor B(const B&, int), j = " << j << endl;
}
};
struct C {
C() { }
C(C&) { }
};
int main() {
A a;
A a1(a);
B b;
const B b_const;
B b1(b);
B b2(b_const);
const C c_const;
// C c1(c_const);
}
The following is the output of the above example:
Constructor B(), j = 20
Constructor B(), j = 20
Copy constructor B(B&), j = 20
Copy constructor B(const B&, int), j = 30
The statement A a1(a) creates a new object from a with an implicitly defined copy constructor. The
statement B b1(b) creates a new object from b with the user-defined copy constructor B::B(B&). The
statement B b2(b_const) creates a new object with the copy constructor B::B(const B&, int). The
compiler would not allow the statement C c1(c_const) because a copy constructor that takes as its
first parameter an object of type const C& has not been defined.
struct X {
int f(int) { return 0; }
static int f(char) { return 0; }
};
int main() {
int (X::*a)(int) = &X::f;
// int (*b)(int) = &X::f;
}
The compiler will not allow the initialization of the function pointer b. No nonmember function or
static function of type int(int) has been declared.
If f is a template function, the compiler will perform template argument deduction to determine which
template function to use. If successful, it will add that function to the list of viable functions. If there
is more than one function in this set, including a non-template function, the compiler will eliminate
all template functions from the set and choose the non-template function. If there are only template
functions in this set, the compiler will choose the most specialized template function. The following
example demonstrates this:
class Complex
{
public:
Complex(double re,double im)
:real(re),imag(im)
{};
Complex operator+(const Complex& other);
Complex operator=(const Complex& other);
private:
double real;
double imag;
};
Complex Complex::operator+(const Complex& other)
{
double result_real = real + other.real;
double result_imaginary = imag + other.imag;
return Complex( result_real, result_imaginary );
}
The assignment operator can be overloaded similarly. Notice that we did not have to call any accessor
functions in order to get the real and imaginary parts from the parameter other, since the overloaded
operator is a member of the class and has full access to all private data. Alternatively, we could have
defined the addition operator globally and called a member function to do the actual work. In that
case, we would also have to make the function a friend of the class, or use an accessor method to get
at the private data:
friend Complex operator+(Complex);
Why would you do this? When the operator is a class member, the first object in the expression must
be of that particular type:
Complex a( 1, 2 );
Complex b( 2, 2 );
Complex c = a + b; // equivalent to a.operator+( b )
When it is a global function, the implicit or user -defined conversion can allow the operator to act
even if the first operand is not exactly of the same type:
Complex c = 2 + b; // valid if the integer 2 can be converted to Complex
By the way, the number of operands to an operator function is fixed; that is, a binary operator takes
two operands, a unary operator only one, and you cannot change it. The same is true for the
precedence of operators; for example, the multiplication operator is called before addition. Some
operators need the first operand to be assignable, such as operator=, operator(), operator[] and
operator->, so their use is restricted: just like non-static member functions, they cannot be
overloaded globally. The operator=, operator& and operator, (sequencing) have already-defined
meanings by default for all objects, but their meanings can be changed by overloading or removed by
making them private.
string prefix("de");
string word("composed");
string composed = prefix + word;
Caution
Operator overloading should only be utilized when the meaning of the overloaded operator‘s
operation is unambiguous.
9.6.1 Creating Prefix and Postfix Forms of the Increment (++) and Decrement (--) Operators
(Overloading Unary Operator)
In this program, only the prefix form of the increment operator was overloaded. However, Standard
C++ allows you to explicitly create separate prefix and postfix versions of the increment or decrement
operators. To accomplish this, you must define two versions of the operator++() function. One is
defined as shown in the foregoing program. The other is declared like this:
loc operator++(int x);
If the ++ precedes its operand, the operator++() function is called. If the ++ follows its operand,
operator++(int x) is called and x has the value zero.
This example can be generalized. Here are the general forms for the prefix and postfix ++ and --
operator functions.
//Prefix increment
type operator++( ) {
//body of prefix operator
}
//Postfix increment
type operator++(int x) {
//body of postfix operator
}
//Prefix decrement
type operator--( ) {
//body of prefix operator
}
//Postfix decrement
type operator--(int x) {
//body of postfix operator
}
When overloading one of these operators, keep in mind that you are simply combining an assignment
with another type of operation.
Caution
Be aware when working with older C++ versions where the increment and decrement operators are
concerned. In older versions of C++, it was not possible to specify separate prefix and postfix
versions of an overloaded ++ or --. The prefix form was used for both.
#include <iostream>
using namespace std;
class loc {
int longitude, latitude;
public:
loc() {} // needed to construct temporaries
loc(int lg, int lt) {
longitude = lg;
latitude = lt;
}
void show() {
cout << longitude << " ";
cout << latitude << "\n";
}
friend loc operator+(loc op1, loc op2); // now a friend
loc operator-(loc op2);
loc operator=(loc op2);
loc operator++();
};
// Now, + is overloaded using friend function.
loc operator+(loc op1, loc op2)
{
loc temp;
temp.longitude = op1.longitude + op2.longitude;
temp.latitude = op1.latitude + op2.latitude;
return temp;
}
// Overload - for loc.
loc loc::operator-(loc op2)
{
loc temp;
// notice order of operands
temp.longitude = longitude - op2.longitude;
temp.latitude = latitude - op2.latitude;
return temp;
}
// Overload assignment for loc.
loc loc::operator=(loc op2)
{
longitude = op2.longitude;
latitude = op2.latitude;
return *this; // i.e., return object that generated call
}
// Overload ++ for loc.
loc loc::operator++()
{
longitude++;
latitude++;
return *this;
}
int main()
{
loc ob1(10, 20), ob2( 5, 30);
ob1 = ob1 + ob2;
ob1.show();
return 0;
}
There are some restrictions that apply to friend operator functions. First, you may not overload the =,
( ), [ ], or -> operators by using a friend function. Second, when overloading the increment or
decrement operators, you will need to use a reference parameter when using a friend function.
2. The default constructor is used to construct the uninitialized of..........array and the dynamically
allocated array.
(a) zero (b) one
(c) three (d) multi
The type size_t is a defined type capable of containing the largest single piece of memory that can be
allocated. (size_t is essentially an unsigned integer.) The parameter size will contain the number of
bytes needed to hold the object being allocated. This is the amount of memory that your version of new
must allocate. The overloaded new function must return a pointer to the memory that it allocates, or
throw a bad_alloc exception if an allocation error occurs. Beyond these constraints, the overloaded
new function can do anything else you require. When you allocate an object using new, the object's
constructor is automatically called.
The delete function receives a pointer to the region of memory to be freed. It then releases the
allocated memory back to the system. When an object is deleted, its destructor function is
automatically called.
The new and delete operators may be overloaded globally so that all uses of these operators call
custom versions. They may also be overloaded relative to one or more classes. Let us begin with an
example of overloading new and delete relative to a class. For the sake of simplicity, no new
allocation scheme will be used. Instead, the overloaded operators will simply invoke the standard
library functions malloc() and free().
To overload the new and delete operators for a class, simply make the overloaded operator functions
class members. For example, here the new and delete operators are overloaded for the loc class:
#include <iostream>
#include <cstdlib>
#include <new>
using namespace std;
class loc {
int longitude, latitude;
public:
loc() {}
loc(int lg, int lt) {
longitude = lg;
latitude = lt;
}
void show() {
cout << longitude << " ";
cout << latitude << "\n";
}
void *operator new(size_t size);
void operator delete(void *p);
};
// new overloaded relative to loc.
void *loc::operator new(size_t size)
{
void *p;
cout << "In overloaded new.\n";
p = malloc(size);
if(!p) {
bad_alloc ba;
throw ba;
}
return p;
}
// delete overloaded relative to loc.
void loc::operator delete(void *p)
{
cout << "In overloaded delete.\n";
free(p);
}
int main()
{
loc *p1, *p2;
try {
p1 = new loc (10, 20);
} catch (bad_alloc xa) {
cout << "Allocation error for p1.\n";
return 1;
}
try {
p2 = new loc (-10, -20);
} catch (bad_alloc xa) {
cout << "Allocation error for p2.\n";
return 1;
}
p1->show();
p2->show();
delete p1;
delete p2;
return 0;
}
Output from this program is shown here.
In overloaded new.
In overloaded new.
10 20
-10 -20
In overloaded delete.
In overloaded delete.
type class-name::operator[](int i)
{
// . . .
}
Technically, the parameter does not have to be of type int, but an operator[ ]() function is typically
used to provide array subscripting, and as such, an integer value is generally used.
Given an object called O, the expression
O[3]
translates into this call to the operator[ ]() function:
O.operator[](3)
That is, the value of the expression within the subscripting operators is passed to the operator[ ]()
function in its explicit parameter. The this pointer will point to O, the object that generated the call.
In the following program, atype declares an array of three integers. Its constructor function
initializes each member of the array to the specified values. The overloaded operator[ ]() function
returns the value of the array as indexed by the value of its parameter.
#include <iostream>
using namespace std;
class atype {
int a[3];
public:
atype(int i, int j, int k) {
a[0] = i;
a[1] = j;
a[2] = k;
}
int operator[](int i) { return a[i]; }
};
int main()
{
atype ob(1, 2, 3);
cout << ob[1]; // displays 2
return 0;
}
#include <iostream>
using namespace std;
class loc {
int longitude, latitude;
public:
loc() {}
loc(int lg, int lt) {
longitude = lg;
latitude = lt;
}
void show() {
cout << longitude << " ";
cout << latitude << "\n";
}
loc operator+(loc op2);
loc operator()(int i, int j);
};
// Overload ( ) for loc.
loc loc::operator()(int i, int j)
{
longitude = i;
latitude = j;
return *this;
}
// Overload + for loc.
loc loc::operator+(loc op2)
{
loc temp;
temp.longitude = op2.longitude + longitude;
temp.latitude = op2.latitude + latitude;
return temp;
}
int main()
{
loc ob1(10, 20), ob2(1, 1);
ob1.show();
ob1(7, 8); // can be executed by itself
ob1.show();
ob1 = ob2 + ob1(10, 10); // can be used in expressions
ob1.show();
return 0;
}
#include <iostream>
using namespace std;
int main()
{
char str1[30], str2[30], str3[60];
int i, j;
cout << "Enter first string: ";
cin.getline(str1, 30);
cout << "\nEnter second string: ";
cin.getline(str2, 30);
for(i = 0; str1[i] != '\0'; ++i)
str3[i] = str1[i];
for(j = 0; str2[j] != '\0'; ++j)
str3[i+j] = str2[j];
str3[i+j] = '\0';
cout << "\nThe concatenated string is " << str3;
return 0;
}
Here is a program that illustrates the effect of overloading the comma operator.
#include <iostream>
using namespace std;
class loc {
int longitude, latitude;
public:
loc() {}
loc(int lg, int lt) {
longitude = lg;
latitude = lt;
}
void show() {
cout << longitude << " ";
cout << latitude << "\n";
}
loc operator+(loc op2);
loc operator,(loc op2);
};
// overload comma for loc
loc loc::operator,(loc op2)
{
loc temp;
temp.longitude = op2.longitude;
temp.latitude = op2.latitude;
cout << op2.longitude << " " << op2.latitude << "\n";
return temp;
}
// Overload + for loc
loc loc::operator+(loc op2)
{
loc temp;
temp.longitude = op2.longitude + longitude;
temp.latitude = op2.latitude + latitude;
return temp;
}
int main()
{
loc ob1(10, 20), ob2( 5, 30), ob3(1, 1);
ob1.show();
ob2.show();
ob3.show();
cout << "\n";
ob1 = (ob1, ob2+ob2, ob3);
ob1.show(); // displays 1 1, the value of ob3
return 0;
}
This program displays the following output:
10 20
5 30
1 1
10 60
1 1
1 1
#include <iostream>
using namespace std;
class Distance
{
private:
int feet; // 0 to infinite
int inches; // 0 to 12
public:
// required constructors
Distance(){
feet = 0;
inches = 0;
}
Distance(int f, int i){
feet = f;
inches = i;
}
friend ostream &operator<<( ostream &output,
const Distance &D )
{
output << "F : " << D.feet << " I : " << D.inches;
return output;
}
friend istream &operator>>( istream &input, Distance &D )
{
input >> D.feet >> D.inches;
return input;
}
};
int main()
{
Distance D1(11, 10), D2(5, 11), D3;
cout << "Enter the value of object : " << endl;
cin >> D3;
cout << "First Distance : " << D1 << endl;
cout << "Second Distance : " << D2 << endl;
cout << "Third Distance : " << D3 << endl;
return 0;
}
Output:
Enter the value of object :
70
10
First Distance : F : 11 I : 10
Second Distance : F : 5 I : 11
Third Distance : F : 70 I : 10
9.13 Summary
Function overloading is a feature of C++ that allows us to create multiple functions with the same
name.
The default constructor is used to construct the uninitialized objects of the three arrays and the
dynamically allocated array. The parameterized constructor is called to create the objects of the two arrays.
Overloading the plus operator (+) is as simple as declaring a function named operator+, giving it
two parameters of the type of the operands.
C++ is able to input and output the built-in data types using the stream extraction operator >> and
the stream insertion operator <<.
The -> pointer operator is called the class member access operator, and is considered a unary
operator when overloading.
9.14 Keywords
Comma Operator: It is a binary operator, used to separate expressions or arguments.
Friend Function: It is used for accessing the non-public members of a class.
Function Overloading: It is the process of using the same name for two or more functions.
Parameterized Constructor: It is a constructor that accepts parameters.
String Concatenation: It consists of adding one string to another.
String Objects: They refer to a special type of container, specifically designed to operate with sequences
of characters.
10.0 Objectives
After studying this chapter, you will be able to:
Describe the features or advantages of inheritance
Explain the type of inheritance
Describe base classes and derived classes
Explain the inheriting multiple base classes
Describe the constructors, destructors and inheritance
10.1 Introduction
In the lesson on composition, you learned how to construct complex classes by combining simpler
classes. Composition is perfect for building new objects that have a has -a relationship with their
subobjects. However, composition (and aggregation) is just one of the two major ways that C++ lets
you construct complex classes. The second way is through inheritance.
Unlike composition, which involves creating new objects by combining and connecting other objects,
inheritance involves creating new objects by directly acquiring the attributes and behaviors of other
objects and then extending or specializing them. Like composition, inheritance is everywhere in real
life. You inherited your parents' genes, and acquired physical attributes from both of them.
Technological products (computers, cell phones, etc…) often inherit features from their predecessors.
C++ inherited many features from C, the language upon which it is based, and C itself inherited many
of its features from the programming languages that came before it.
Multilevel Inheritance
In this type of inheritance there are a number of levels; it is used in cases where we want properties
to be passed down through several levels as required. For example, class A is inherited by class B,
and class B is inherited by class C, and so on. Here A is the base class for B; B is a derived class of A
and the base class for C; A is an indirect base class for C; and C is an indirect derived class of A (see
the Figure 10.2):
Multiple Inheritance
In this type of inheritance, a number of classes are inherited by a single class. The two or more
classes are known as base classes and the one is the derived class (see the Figure 10.3):
Hierarchical Inheritance
This type of inheritance helps us to create a base class for a number of classes, and each of those
classes can in turn have further branches of classes (see the Figure 10.4).
Figure 10.4: Hierarchical inheritance.
Hybrid Inheritance
In this type of inheritance, we can have a mixture of several kinds of inheritance, but this can
generate an error when the same function name is inherited from more than one class, leaving the
compiler unable to decide which function to use. Therefore, it will generate errors in the program.
This is known as ambiguity or duplicity (see the Figure 10.5):
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
10.4 Base Classes and Derived Classes
Base Class
A class is a mechanism for creating user-defined data types. It is similar to the C language structure
data type. In C, a structure is composed of a set of data members. In C++, a class type is like a C
structure, except that a class is composed of a set of data members and a set of operations that can be
performed on the class.
In C++, a class type can be declared with the keywords union, struct, or class. A union object can
hold any one of a set of named members. Structure and class objects hold a complete set of members.
Each class type represents a unique set of class members including data members, member functions,
and other type names. The default access for members depends on the class key:
The members of a class declared with the keyword class are private by default. A class is inherited
privately by default.
The members of a class declared with the keyword struct are public by default. A structure is
inherited publicly by default.
The members of a union (declared with the keyword union) are public by default. A union cannot be
used as a base class in derivation.
Once you create a class type, you can declare one or more objects of that class type. For example:
class X
{
/* define class members here */
};
int main()
{
X xobject1; // create an object of class type X
X xobject2; // create another object of class type X
}
Derived Class
A derived class is a class that inherits the properties of its super class. For example, Cat is a
super class and Manx cat is a derived class, which has all the properties of a Cat but does not have a
tail. A concrete derived class is a derived class which implements all the functionality that is missing
in the super class.
Explanation of a derived class, with an example using C++:
Inheritance is one of the important features of OOP which allows us to make hierarchical
classifications of classes. In this, we can create a general class which defines the most common
features. Other, more specific classes can inherit this class to define those features that are unique to
them. In this case, the classes which inherit from other classes are referred to as derived classes.
For example, a general class vehicle can be inherited by more specific classes car and bike. The
classes car and bike are derived classes in this case.
class vehicle
{
int fuel_cap;
public:
void drive();
};
int main() { }
Class A contains one protected data member, an integer i. Because B derives from A, the members of
B have access to the protected member of A. Function f() is a friend of class B:
The compiler would not allow pa->i = 1 because pa is not a pointer to the derived class B.
The compiler would not allow int A::* point_i = &A::i because i has not been qualified with
the name of the derived class B.
Function g() is a member function of class B. The list of remarks about which statements the
compiler would and would not allow apply for g() except for the following:
The compiler allows i = 2 because it is equivalent to this ->i = 2.
Function h() cannot access any of the protected members of A because h() is neither a friend nor a
member of a derived class of A.
class b { };
class d : public b // public derivation
{ };
We can use both a structure and a class as base classes in the base list of a derived class declaration:
If the derived class is declared with the keyword class, the default access specifier in its base list
specifiers is private.
If the derived class is declared with the keyword struct, the default access specifier in its base list
specifiers is public.
In the following example, private derivation is used by default because no access specifier is used in
the base list and the derived class is declared with the keyword class:
struct B
{ };
class D : B // private derivation
{ };
Members and friends of a class can implicitly convert a pointer to an object of that class to a pointer
to either:
A direct private base class
A protected base class (either direct or indirect)
Caution
The comma (,) character separates the base class names. And do not forget the public keyword; it
should appear in front of every base class name. If the public keyword is omitted from one or more
base class names, those base classes become private base classes.
The order in which base classes are specified is not significant except in certain cases where
constructors and destructors are invoked. In these cases, the order in which base classes are specified
affects the following:
The order in which initialization by constructor takes place. If your code relies on the Book portion
of CollectionOfBook to be initialized before the Collection part, the order of specification is
significant. Initialization takes place in the order the classes are specified in the base-list.
The order in which destructors are invoked to clean up. Again, if a particular "part" of the class must
be present when the other part is being destroyed, the order is significant. Destructors are called in
the reverse order of the classes specified in the base-list.
When specifying the base-list, you cannot specify the same class name more than once. However, it is
possible for a class to be an indirect base to a derived class more than once.
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
class X {
public:
X(); // constructor for class X
};
Constructors are used to create, and can initialize, objects of their class type.
You cannot declare a constructor as virtual or static, nor can you declare a constructor as const,
volatile, or const volatile.
You do not specify a return type for a constructor. A return statement in the body of a constructor
cannot have a return value.
Destructors
Destructors are used to control the behaviour of an object when it passes out of scope or is otherwise
to be discarded. If the class is simple then a destructor may not be necessary: the default is simply to
discard the store occupied by the data members. However, if pointers are involved, then you must
explicitly delete objects pointed to so that memory leakage does not occur.
Inheritance
Inheritance is the process of creating new classes from the existing class or classes.
Using inheritance, one can create general class that defines traits common to a set of related items.
This class can then be inherited (reused) by the other classes by using the properties of the existing
ones with the addition of its own unique properties.
The old class is referred to as the base class and the new classes, which are inherited from the base
class, are called derived classes.
Forms of Inheritance :
Single Inheritance - If a class is derived from a single base class, it is called as single inheritance.
Multiple Inheritance - If a class is derived from more than one base class, it is known as multiple
inheritance.
Multilevel Inheritance - The classes can also be derived from the classes that are already derived.
This type of inheritance is called multilevel inheritance.
Hierarchical Inheritance - If a number of classes are derived from a single base class, it is called as
hierarchical inheritance.
For example:
class Base {
public:
Base (const char *str, ...);
};
I want to have Derived's constructor call Base's constructor, passing all the optional parameters. Right now I
am splitting things up into a separate initialization function that takes a va_list, but I'd really like to avoid
having to separate it out:
class Base {
public:
Base (void);
Base (const char *str, ...); // <-- this would call Init()
protected:
void Init (va_list args);
};
class Derived : public Base {
public:
Derived (const char *str, ...) {
// calls Base's Init with a va_list...
va_list args;
va_start(args, str);
Init(args);
va_end(args);
}
};
Example:
#include <iostream>
using namespace std;
class base {
int i; // private to base
public:
int j, k;
void seti(int x) { i = x; }
int geti() { return i; }
};
return 0;
}
In the example, class X has two sub objects of class V, one that is shared by classes B1 and B2 and
one through class B3.
7. Constructor and destructors are usually defined as......... members of their class and may never
possess a return value.
(a) private (b) public
(c) protected (d) All of these
8. The constructors of any...........base classes are called first in the order of inheritance from the
ultimate base class to the lowest virtual class in the inheritance hierarchy.
(a) virtual (b) static
(c) dynamic (d) All of these
9. The purpose of destructor is to release the memory when the compiler memory is reduced or
insufficient to execute certain program.
(a) True (b) False
10. The derived class inherit some or all of the properties of the base class.
(a) True (b) False
Results
The global software company supplements its workforce with an average of 150 IT and business
consultants drawn from Manpower Professional‘s broad talent pipeline.
Manpower Professional‘s on site management team generates streamlined, efficient recruiting and
hiring in locations across the U.S. The onsite team conducts quarterly reviews aimed at continuous
improvement in quality recruitment and client satisfaction. With a solid partnership in place, the firm
is well positioned for new growth opportunities.
Manpower Professional
Manpower Professional‘s seasoned recruiters play a central role in flexible workforce strategy for
many technology industry companies. We use a consultative approach and our knowledge of complex
IT environments to deliver specialized IT talent at all levels for short - and long-term projects, project
solutions and permanent hiring solutions.
Question
1. Describe the challenge faced by the global software company.
2. What is the manpower professional in IT sector?
……………………………………………………………………………………………………………………..
..……………………………………………………………………………………………………………………
…..…………………………………………………………………………………………………………………
……..………………………………………………………………………………………………………………
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
10.11 Summary
Inheritance is the process of creating new classes, called derived classes, from existing or base
classes.
Derived class has access to most of the functions and variables of the base class.
Constructor is a special member function, and it must have an identical name to its class.
Constructor is an extension of the class with additional features and certain limitations.
The purpose of the destructor is to release the memory occupied by an object when it is no longer
needed.
The constructor and destructor of a base class are not inherited; the assignment operator is not
inherited; the friend functions and friend classes of the base class are also not inherited.
10.12 Keywords
Derived Class: It refers to a class that inherits some or all of its members from another class.
Multiple Inheritance: It refers to a number of classes being inherited by a single class.
Private Class: It is accessible only to the members and friend functions of the class.
Public Class: It is accessible from anywhere the object is visible.
Reusability: It refers to the ability of multiple programmers to use the same written and debugged
existing class of data.
11.0 Objectives
After studying this chapter, you will be able to:
Discuss the concept and type of polymorphism
Explain the virtual functions and polymorphism
Discuss the pure virtual functions
Explain the virtual functions
Discuss the early versus late binding
11.1 Introduction
Polymorphism is the ability to use an operator or method in different ways. Polymorphism gives
different meanings or functions to operators or methods. Poly, referring to many, signifies the many
uses of these operators and methods. A single method usage or an operator functioning in many ways
can be called polymorphism. Polymorphism refers to codes, operations or objects that behave
differently in different contexts.
Polymorphism is a powerful feature of the object-oriented programming language C++. A single
operator + behaves differently in different contexts such as integer, float or strings, reflecting the
concept of polymorphism. The concept of overloading is also a branch of polymorphism. When an
existing operator or function operates on a new data type, it is overloaded. This feature of
polymorphism leads to the concept of virtual methods.
Function Overloading
Function overloading means multiple functions of same name with different arguments. C++ allows
functions to be overloaded, that is the same function to have more than one definition.
Operator Overloading
Operators are similar to functions: they take operands and return a value. For example, the + operator
can be used to add two integers, two reals, or two addresses.
Runtime Polymorphism
If the member function is selected when the program is running then it called Runtime
Polymorphism. This is also called late binding or dynamic binding. Runtime Polymorphism achieves
the concept of Virtual Function.
Virtual Function
Virtual function is a member function of a class whose functionality can be overridden in its derived
classes. It is declared with the virtual keyword. Virtual member functions are resolved during run-time.
This mechanism is known as dynamic binding. The non-virtual member functions are resolved at
compile time. This mechanism is called static binding.
When people talk about polymorphism in C++ they usually mean the thing of using a derived class
through the base class pointer or reference, which is called subtype polymorphism. But they often
forget that there are all kinds of other polymorphisms in C++, such as parametric polymorphism, ad-hoc
polymorphism and coercion polymorphism.
These polymorphisms also go by different names in C++,
Subtype polymorphism is also known as runtime polymorphism.
Parametric polymorphism is also known as compile-time polymorphism.
Ad-hoc polymorphism is also known as overloading.
Coercion is also known as (implicit or explicit) casting.
Here we illustrate all the polymorphisms through examples in C++ language and also give insight on
why they have various other names.
Since they are all of Felidae biological family, and they all should be able to meow, they can be
represented as classes inheriting from Felid base class and overriding the meow pure virtual function,
// file cats.h
#include <iostream>
class Felid {
public:
virtual void meow() = 0;
};
class Cat : public Felid {
public:
void meow() { std::cout << "Meowing like a regular cat! meow!\n"; }
};
class Tiger : public Felid {
public:
void meow() { std::cout << "Meowing like a tiger! MREOWWW!\n"; }
};
class Ocelot: public Felid {
public:
void meow() { std::cout << "Meowing like an ocelot! mews!\n"; }
};
Now the main program can use Cat, Tiger and Ocelot interchangeably through Felid (base class)
pointer,
#include <iostream>
#include "cats.h"
void do_meowing(Felid *cat) {
cat->meow();
}
int main() {
Cat cat;
Tiger tiger;
Ocelot ocelot;
do_meowing(&cat);
do_meowing(&tiger);
do_meowing(&ocelot);
}
Here the main program passes pointers to cat, tiger and ocelot to do_meowing function that expects a
pointer to Felid. Since they are all Felids, the program calls the right meow function for each felid
and the output is:
Meowing like a regular cat! meow!
Meowing like a tiger! MREOWWW!
Meowing like an ocelot! mews!
Subtype polymorphism is also called runtime polymorphism for a good reason. The resolution of
polymorphic function calls happens at runtime through an indirection via the virtual table. Another
way of explaining this is that the compiler does not locate the address of the function to be called at
compile-time; instead, when the program is run, the function is called by dereferencing the right
pointer in the virtual table.
In type theory it is also known as inclusion polymorphism.
Caution
When you want to achieve polymorphism, the signature of the function must be the same in the base
and derived classes; otherwise the derived-class function hides rather than overrides, and
polymorphism will not work.
#include <iostream>
#include <string>
int add(int a, int b) {
return a + b;
}
std::string add(const char *a, const char *b) {
std::string result(a);
result += b;
return result;
}
int main() {
std::cout << add(5, 9) << std::endl;
std::cout << add("hello", "world") << std::endl;
}
Ad-hoc polymorphism also appears in C++ if you specialize templates. Returning to the previous
example about the max function, here is how you would write a max for two char *,
template <>
const char *max(const char *a, const char *b) {
return strcmp(a, b) > 0 ? a : b;
}
Now you can call ::max("foo", "bar") to find the maximum of strings "foo" and "bar".
11.2.4 Coercion Polymorphism (Casting)
Coercion happens when an object or a primitive is cast into another object type or primitive type. For
example,
float b = 6; // int gets promoted (cast) to float implicitly
int a = 9.99; // float gets demoted to int implicitly
Explicit casting happens when you use C's type-casting expressions, such as (unsigned int *) or (int),
or C++'s static_cast, const_cast, reinterpret_cast, or dynamic_cast.
Coercion also happens if the constructor of a class is not explicit, for example,
#include <iostream>
class A {
int foo;
public:
A(int ffoo) : foo(ffoo) {}
void giggidy() { std::cout << foo << std::endl; }
};
void moo(A a) {
a.giggidy();
}
int main() {
moo(55); // prints 55
}
If you made the constructor of A explicit, that would no longer be possible. It is always a good idea
to make your constructors explicit to avoid accidental conversions.
Also if a class defines conversion operator for type T, then it can be used anywhere where type T is
expected.
For example,
class CrazyInt {
int v;
public:
CrazyInt(int i) : v(i) {}
operator int() const { return v; } // conversion from CrazyInt to int
};
The CrazyInt defines a conversion operator to type int. Now if we had a function, let us say, print_int
that took int as an argument, we could also pass it an object of type CrazyInt,
#include <iostream>
void print_int(int a) {
std::cout << a << std::endl;
}
int main() {
CrazyInt b = 55;
print_int(999); // prints 999
print_int(b); // prints 55
}
Subtype polymorphism, discussed earlier, is actually also coercion polymorphism because the
derived class gets converted into the base class type.
Caution
Be aware that with runtime polymorphism only member functions can be overridden, not data members.
Next, p is assigned the address of b, and vfunc() is called via p. Since p is pointing to an object of
type base, that version of vfunc() is executed. Next, p is set to the address of d1, and again vfunc() is
called by using p. This time p points to an object of type derived1. This causes derived1::vfunc() to
be executed. Finally, p is assigned the address of d2, and p->vfunc() causes the version of vfunc()
redefined inside derived2 to be executed. The key point here is that the kind of object to which p
points determines which version of vfunc() is executed. Further, this determination is made at run
time, and this process forms the basis for run-time polymorphism.
Although you can call a virtual function in the "normal" manner by using an object's name and the
dot operator, it is only when access is through a base class pointer (or reference) that run-time
polymorphism is achieved. For example, assuming the preceding example, this is syntactically valid:
d2.vfunc(); // calls derived2's vfunc()
………..……………………………………………………………………………………………………………
…………………………………………………………………………………………………………………...
When a virtual function is made pure, any derived class must provide its own definition. If the
derived class fails to override the pure virtual function, a compile-time error will result.
The following program contains a simple example of a pure virtual function. The base class, number,
contains an integer called val, the function setval() , and the pure virtual function show(). The derived
classes hextype, dectype, and octtype inherit number and redefine show() so that it outputs the value
of val in each respective number base (that is, hexadecimal, decimal, or octal).
#include <iostream>
using namespace std;

class number {
protected:
    int val;
public:
    void setval(int i) { val = i; }
    // show() is a pure virtual function
    virtual void show() = 0;
};

class hextype : public number {
public:
    void show() {
        cout << hex << val << "\n";
    }
};

class dectype : public number {
public:
    void show() {
        cout << val << "\n";
    }
};

class octtype : public number {
public:
    void show() {
        cout << oct << val << "\n";
    }
};

int main()
{
    dectype d;
    hextype h;
    octtype o;

    d.setval(20);
    d.show();   // displays 20 - decimal

    h.setval(20);
    h.show();   // displays 14 - hexadecimal

    o.setval(20);
    o.show();   // displays 24 - octal

    return 0;
}
Starting Population
If we want to reproduce five solutions into the new generation, we sum up the fitnesses of all the
current solutions (750 in this example) and then generate a random number that tells us which
solution is to survive. The fitness of a particular solution determines its proportional chance of
reproducing into the next generation.
Fitness Graph
Once we have a new set of solutions, we then apply the cross-over principle. We pick two solutions
at random, and then "breed" them. Let us imagine our solutions are represented by an array of four
numbers. To breed two solutions, we pick a random crossover point and then swap the portions of
their arrays beyond that point.
Questions
1. Describe the benefit of a genetic algorithm (GA).
2. What is the importance of genetic algorithm for polymorphism?
11.6 Summary
Polymorphism is one of the features of OOP. It simply means one name, many forms.
A virtual function is a member function of a class whose functionality can be overridden in its
derived classes.
In C++, parametric polymorphism is implemented via templates.
Operators are similar to functions: they take operands and return a value.
Virtual member functions are resolved during run-time. This mechanism is known as dynamic
binding.
11.7 Keywords
Abstract Classes: An abstract class is a class that is designed to be specifically used as a base class.
An abstract class contains at least one pure virtual function.
Compile Time Polymorphism: Compile time polymorphism is functions and operators overloading.
Function Overloading: Function overloading is one of the most powerful features of the C++
programming language. It forms the basis of compile-time polymorphism.
Polymorphism: Polymorphism is a programming language feature that allows values of different data
types to be handled using a uniform interface.
Runtime Polymorphism: Runtime polymorphism is achieved using inheritance and virtual
functions.
Virtual Function: A virtual function or virtual method is a function or method whose behaviour can
be overridden within an inheriting class by a function with the same signature.
12.0 Objectives
After studying this chapter, you will be able to:
Discuss the formatted I/O
Describe the using manipulators to format I/O
Explain the overloading operators
12.1 Introduction
Bjarne Stroustrup observed in The C++ Programming Language that "designing and implementing a
general input/output facility for a programming language is notoriously difficult". He did an
excellent job, and the C++ IOstreams library is part of the reason for C++'s success. IOstreams
provide an incredibly flexible yet simple way to design the input/output routines of any application.
IOstreams can be used for a wide variety of data manipulations thanks to the following features:
A 'stream' is internally nothing but a series of characters. The characters may be either normal
characters (char) or wide characters (wchar_t). Streams provide you with a universal character-
based interface to any type of storage medium (for example, a file), without requiring you to
know the details of how to write to the storage medium. Any object that can be written to one
type of stream can be written to all types of streams. In other words, as long as an object has a
stream representation, any storage medium can accept objects with that stream representation.
Streams work with built-in data types, and you can make user-defined types work with streams by
overloading the insertion operator (<<) to put objects into streams, and the extraction operator
(>>) to read objects from streams.
The stream library's unified approach makes it very friendly to use. Using a consistent interface
for outputting to the screen and sending files over a network makes life easier. The programs
below will show you what is possible.
Streams cin, cout, and cerr correspond to C's stdin, stdout, and stderr.
By default, the standard streams are used to communicate with the console. However, in
environments that support I/O redirection (such as DOS, Unix, OS/2, and Windows), the standard
streams can be redirected to other devices or files. For the sake of simplicity, the examples here
assume that no I/O redirection has occurred. Standard C++ also defines these four additional
streams: wcin, wcout, wcerr, and wclog. These are wide-character versions of the standard streams.
Wide characters are of type wchar_t and are generally 16-bit quantities. Wide characters are used to
hold the large character sets associated with some human languages.
Caution
If you use a string object to store the file name, older stream implementations require you to convert
it with c_str() when connecting the stream, because their open() functions accept only a
const char *; omitting the conversion causes a compile error.
12.4 Formatting Using the IOS Members
Each stream has associated with it a set of format flags that control the way information is formatted.
The ios class declares a bitmask enumeration called fmtflags in which the following values are
defined. (Technically, these values are defined within ios_base, which, as explained earlier, is a base
class for ios.)
These values are used to set or clear the format flags. If you are using an older compiler, it may not
define the fmtflags enumeration type. In this case, the format flags will be encoded into a long
integer.
When the skipws flag is set, leading white-space characters (spaces, tabs, and newlines) are
discarded when performing input on a stream. When skipws is cleared, white-space characters are
not discarded.
When the left flag is set, output is left justified. When right is set, output is right justified. When the
internal flag is set, a numeric value is padded to fill a field by inserting spaces between any sign or
base character. If none of these flags are set, output is right justified by default. By default, numeric
values are output in decimal. However, it is possible to change the number base. Setting the oct flag
causes output to be displayed in octal. Setting the hex flag causes output to be displayed in
hexadecimal. To return output to decimal, set the dec flag.
Setting showbase causes the base of numeric values to be shown. For example, if the conversion base
is hexadecimal, the value 1F will be displayed as 0x1F.
By default, when scientific notation is displayed, the e is in lowercase. Also, when a hexadecimal
value is displayed, the x is in lowercase. When uppercase is set, these characters are displayed in
uppercase.
A call to unsetf() clears the flags specified by its flags argument. (All other flags are unaffected.)
The following program illustrates unsetf(). It first sets both the uppercase and scientific flags. It then
outputs 100.12 in scientific notation. In this case, the "E" used in the scientific notation is in
uppercase. Next, it clears the uppercase flag and again outputs 100.12 in scientific notation, using a
lowercase "e".
#include <iostream>
using namespace std;

int main()
{
    cout.setf(ios::uppercase | ios::scientific);
    cout << 100.12;               // displays 1.0012E+02
    cout.unsetf(ios::uppercase);  // clear uppercase
    cout << "\n" << 100.12;       // displays 1.0012e+02
    return 0;
}
In this version, only the flags specified by flags2 are affected. They are first cleared and then set
according to the flags specified by flags1. Note that even if flags1 contains other flags, only those
specified by flags2 will be affected. For example,
#include <iostream>
using namespace std;

int main()
{
    cout.setf(ios::showpoint | ios::showpos, ios::showpoint);
    cout << 100.0; // displays 100.0, not +100.0
    return 0;
}
Here, the basefield flags (i.e., dec, oct, and hex) are first cleared and then the hex flag is set.
Remember, only the flags specified in flags2 can be affected by flags specified by flags1. For
example, in this program, the first attempt to set the showpos flag fails.
Keep in mind that most of the time you will want to use unsetf() to clear flags and the single
parameter version of setf() (described earlier) to set flags. The setf(fmtflags, fmtflags) version of
setf() is most often used in specialized situations, such as setting the number base. Another good use
may involve a situation in which you are using a flag template that specifies the state of all format
flags but wish to alter only one or two. In this case, you could specify the template in flags1 and use
flags2 to specify which of those flags will be affected.
3. …………………..are used to hold the large character sets associated with some human languages.
(a) 8-bit (b) 16-bit
(c) Wide characters (d) White-space characters
5. The ……………. function has a second form that allows you to set all format flags associated with
a stream.
(a) setprecision () (b) main() (c) flags() (d) unsetf()
fmtflags flags();
The following program uses flags() to display the setting of the format flags relative to cout. Pay
special attention to the showflags() function. You might find it useful in programs you write.
#include <iostream>
using namespace std;

void showflags();

int main()
{
    // show default condition of format flags
    showflags();
    cout.setf(ios::right | ios::showpoint | ios::fixed);
    showflags();
    return 0;
}

// A possible definition of showflags(), omitted in the original text:
// it reads cout's flags and reports the state of a few of them.
void showflags()
{
    ios::fmtflags f = cout.flags();
    if (f & ios::right) cout << "right on\n"; else cout << "right off\n";
    if (f & ios::showpoint) cout << "showpoint on\n"; else cout << "showpoint off\n";
    if (f & ios::fixed) cout << "fixed on\n"; else cout << "fixed off\n";
    cout << "\n";
}
The width() function sets the minimum field width. Its prototype is shown here:
streamsize width(streamsize w);
Here, w becomes the field width, and the previous field width is returned. In some implementations,
the field width must be set before each output. If it is not, the default field width is used. The
streamsize type is defined as some form of integer by the compiler.
After you set a minimum field width, when a value uses less than the specified width, the field will
be padded with the current fill character (space, by default) to reach the field width. If the size of the
value exceeds the minimum field width, the field will be overrun. No values are truncated.
When outputting floating-point values, you can determine the number of digits to be displayed after
the decimal point by using the precision() function. Its prototype is shown here:
streamsize precision(streamsize p);
Here, the precision is set to p, and the old value is returned. The default precision is 6.
In some implementations, the precision must be set before each floating-point output. If it is not, then
the default precision will be used.
By default, when a field needs to be filled, it is filled with spaces. You can specify the fill character
by using the fill() function. Its prototype is
char fill(char ch);
After a call to fill(), ch becomes the new fill character, and the old one is returned.
Here is a program that illustrates these functions:
#include <iostream>
using namespace std;

int main()
{
    cout.precision(4);
    cout.width(10);
    cout << 10.12345 << "\n";  // displays      10.12

    cout.fill('*');            // fill() takes a character, not a string
    cout.width(10);
    cout << 10.12345 << "\n";  // displays *****10.12

    // field width applies to strings, too
    cout.width(10);
    cout << "Hi!" << "\n";     // displays *******Hi!

    cout.width(10);
    cout.setf(ios::left);      // left justify
    cout << 10.12345;          // displays 10.12*****
    return 0;
}
This program's output is shown here:
     10.12
*****10.12
*******Hi!
10.12*****
There are overloaded forms of width(), precision(), and fill() that obtain but do not change the
current setting. These forms are shown here:
char fill();
streamsize width();
streamsize precision();
Caution
Because of the precedence of the << and >> operators, conditional and certain arithmetic
expressions used in an I/O statement must be enclosed in parentheses.
#include <iostream>
#include <iomanip>
using namespace std;
// A simple output manipulator.
ostream &sethex(ostream &stream)
{
stream.setf(ios::showbase);
stream.setf(ios::hex, ios::basefield);
return stream;
}
int main()
{
cout << 256 << " " << sethex << 256; // displays 256 0x100
return 0;
}
Be careful when seeking the put pointer into the middle of the stream. If you put anything in the
stream, it will be written directly into the stream at the put location, overwriting what is there. In
other words, if you need to insert data in the middle of a stream, you have to manually move the
data that would be overwritten. As a side note, if you find yourself doing that too often, you may
want to use a string representation of your data, which can simplify this kind of random access
operation.
3. Once you are at the right location in the stream, input and output are done through the << and >>
operators. To write an object to the stream, use the << operator; to read one back, use the >>
operator. The class for your object must, of course, have provided overloads for these operators.
Here is a short example:
// Inserts var into the stream (just as objects are displayed
// by putting them to cout)
output_stream << var;

// Reads the characters positioned after the get pointer
// and puts the resulting value into var.
input_stream >> var;
If var is an object (of either a built-in type or a user-defined type), the exact process of the input or
output depends on the overloaded >> or << operator, respectively.
Questions
1. What do input and output really mean?
2. How do streams work?
12.11 Summary
A stream is a logical device that either produces or consumes information.
The setf(fmtflags, fmtflags) version of setf() is most often used in specialized situations, such as
setting the number base.
The << output operator is referred to as the insertion operator because it inserts characters into a
stream.
Encapsulation is an essential component of object-oriented programming.
An output manipulator is particularly useful for sending special codes to a device.
The standard streams are used to communicate with the console.
12.12 Keywords
Field Width: Describes the width of the next element to be output. This value can be obtained or
modified by calling the member function width or the parameterized manipulator setw.
Format Flags: A set of internal indicators describing how certain input/output operations shall be
interpreted or generated. The state of these indicators can be obtained or modified by calling the
members flags, setf, and unsetf, or by using manipulators.
I/O Stream Library: The IO stream library is an object-oriented library that provides input and output
functionality using streams.
Manipulators: Manipulators are functions specifically designed to be used in conjunction with the
insertion (<<) and extraction (>>) operators on stream objects.
setw(): The setw() manipulator is used to set the width of the field in which the next value is
displayed on screen.