
Evolution Of Computers:

o A Brief History Of Computers

Prepared by Mubeen Ahmed

A Brief History Of Computers

• History reveals a clear pattern in the evolution of computers. Processing power
increases rapidly after the introduction of a new technology. The rate of growth
eventually slows down as the technology is exploited to its full potential.
Meanwhile, in the background, other technologies are being nurtured; one ultimately
supersedes the other to become the dominant technology, and the cycle is
repeated.

• Under the right conditions, the shift to the new technology can lead to an
increase in processor speed of a hundred to a thousand times.

Electromechanical Computer
All-electronic computer with vacuum tubes
Fully transistorized computer
Scalable massive parallelism
Abacus from China: a machine for computational assistance

1642: Blaise Pascal made the first machine that could add
1672: Leibniz made a machine that could perform all four basic functions
1822: Charles Babbage
    +, -, /, * and solve polynomial equations
    Idea of a programmable machine
    Never succeeded in building it, but designed an "Analytical Machine"

Input -> control -> processor -> store -> output

Later inventors were inspired but made little improvement.

The design also inspired a brilliant countess, Lady Ada Lovelace. She thought about the
analytical design and realized that DO and IF constructs would be necessary.

British mathematician George Boole started to study the foundations of logic.
An argument could be represented by x or y, but the result could only be True or False.
He studied this in detail and found that AND, OR and NOT could be used together to
analyze any proposition logically.

1886: American logician Charles Sanders Peirce observed that the upcoming electrical
ON/OFF technology could be intertwined with Boole's work.

1937: George Stibitz of Bell Laboratories put this into practice, making an adder, then a
multiplier, etc., using Boole's and Peirce's work.
Howard Aiken used wheels controlled by electrical impulses. This was the beginning of
electrical computational machines. MARK I was made during World War II.

Mauchly and Eckert, electrical engineers at UPenn, were given a project to make the first
completely electrical machine using vacuum tubes. ENIAC was made after the war ended.
It was a massive machine.

Von Neumann met Herman Goldstine by chance.
He collaborated extensively with the ENIAC team.
His efforts were directed at using the computer to solve real-world problems.
This collaboration led to the most influential paper, which formed the basis for the
VON NEUMANN ARCHITECTURE.

FOUR STEP SYSTEM

Extract Input One -> Extract Input Two -> Extract The Instruction -> Store the output

SCALAR PROCESSING

FLOPS (floating point operations per second) is a term used to compare the processing
power of machines.
Transistors, invented at Bell Labs by John Bardeen, Walter Brattain and William Shockley,
were introduced in the 1950s.
More transistors could be placed on one chip, and they were very much faster.
The US Govt intervened to accelerate the development: Remington Rand and IBM were
given the challenge to make the first all-transistor machine.
Remington Rand won the contract and made LARC with 60,000 transistors.
IBM worked in the background and made a machine with 169,100 transistors.
But they were unable to reach the required speed.
After losing millions of dollars, both companies decided to move to a more lucrative
business market.
This left a vacuum on the high-computation side, which was later filled by Control Data
Corporation, where Seymour Cray led machine design, and which would lead the market
for the next two decades.

Integrated circuits, and then processors on a single chip, were introduced.
Power consumption decreased.
These integrated circuits marked the beginning of speed increases that came more from
design.
Seymour Cray implemented what was known as vectorization in processor design.

Task: multiply 100 numbers and output the result

Scalar processing: 9 * 100 instructions
Vector processing: 100 + 9 instructions
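
The two instruction counts above can be reproduced with simple arithmetic. The sketch below assumes one reading of the slide's numbers: in scalar mode a 9-instruction sequence is repeated for each of the 100 numbers, while in vector mode those 9 instructions are paid once and each number then costs one operation. The function and parameter names are illustrative only and do not come from the original slides.

```python
def scalar_instruction_count(n_elements, instructions_per_element=9):
    """Scalar processing: the full instruction sequence is repeated per element."""
    return instructions_per_element * n_elements


def vector_instruction_count(n_elements, setup_instructions=9):
    """Vector processing: setup is paid once, then one operation per element."""
    return n_elements + setup_instructions


n = 100  # "multiply 100 numbers"
scalar = scalar_instruction_count(n)
vector = vector_instruction_count(n)
print(f"scalar: {scalar}, vector: {vector}, ratio: {scalar / vector:.2f}")
```

Under these assumptions the saving grows with the number of elements, because the fixed setup cost is spread over more work.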
Computer Architectures
Taxonomy of Architectures

For computer architectures, Flynn proposed that the two dimensions be termed Instruction and
Data, and that, for both of them, the two values they could take be Single or Multiple.

Single Instruction, Single Data (SISD)

This is the oldest style of computer architecture, and still one of the most important: all
personal computers fit within this category. Single instruction refers to the fact that
there is only one instruction stream being acted on by the CPU during any one clock tick;
single data means, analogously, that one and only one data stream is being employed as
input during any one clock tick. These factors lead to two very important characteristics
of SISD style computers:

• Serial: instructions are executed one after the other, in lock-step
• Deterministic
• Examples: most non-supercomputers

Multiple Instruction, Single Data (MISD)

Few actual examples of computers in this class exist.

However, special-purpose machines are certainly conceivable that would fit into this
niche: multiple frequency filters operating on a single signal stream, or multiple
cryptography algorithms attempting to crack a single coded message. Both of these are
examples of this type of processing, where multiple, independent instruction streams are
applied simultaneously to a single data stream.

Single Instruction, Multiple Data (SIMD)

A very important class of architectures in the history of computation,
single-instruction/multiple-data machines are capable of applying the exact same
instruction stream to multiple streams of data simultaneously. For certain classes of
problems, e.g., those known as data-parallel problems, this type of architecture is
perfectly suited to achieving very high processing rates, as the data can be split into
many different independent pieces, and the multiple processing units can all operate on
them at the same time.

• Synchronous (lock-step)
• Deterministic

Multiple Instruction, Multiple Data (MIMD)

Many believe that the next major advances in computational capabilities will be enabled
by this approach to parallelism, which provides for multiple instruction streams
simultaneously applied to multiple data streams. The most general of all of the major
categories, a MIMD machine is capable of being programmed to operate as if it were in
fact any of the four.

• Synchronous or asynchronous

MIMD instruction streams can potentially be executed either synchronously or
asynchronously, i.e., either in tightly controlled lock-step or in a more loosely bound "do
your own thing" mode. Some kinds of algorithms require one or the other, and
different kinds of MIMD systems are better suited to one or the other; optimum
efficiency depends on making sure that the system you run your code on reflects the
style of synchronicity required by your code.

• Non-deterministic
• Multiple Instruction or Single Program

MIMD-style systems are capable of running in true "multiple-instruction" mode,
with every processor doing something different, or every processor can be given
the same code; this latter case is called SPMD, "Single Program Multiple Data",
and is a generalization of SIMD-style parallelism, with much less strict
synchronization requirements.
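
The SPMD idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: Python threads stand in for processors (so it shows the structure, not real speedup), and the names `program`, `run_spmd`, `rank` and `size` are hypothetical. Every worker runs the same code and uses its rank to pick its own share of the data.

```python
from concurrent.futures import ThreadPoolExecutor


def program(rank, size, data):
    """The single program every worker runs; `rank` selects its share of the data."""
    my_share = data[rank::size]          # worker `rank` takes every size-th item
    return sum(x * x for x in my_share)  # same calculation, different data


def run_spmd(data, size=4):
    """Start `size` workers on the same program and combine their partial results."""
    with ThreadPoolExecutor(max_workers=size) as pool:
        partials = list(pool.map(lambda rank: program(rank, size, data), range(size)))
    return sum(partials)


total = run_spmd(list(range(10)))
print(total)
```

Note that the workers never wait for each other until the final combination, which is the "much less strict synchronization" that separates SPMD from lock-step SIMD.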
Terminology of Parallelism

Task : A logically discrete section of computational work.

Parallel Tasks : Tasks whose computations are independent of each other, so that
all such tasks can be performed simultaneously with correct results.

Parallelizable Problem : A problem that can be divided into parallel tasks. This
may require changes in the code and/or the underlying algorithm.

Example of a Parallelizable Problem:
Calculate the potential energy for each of several thousand independent
conformations of a molecule; when done, find the minimum energy conformation.

Example of a Non-parallelizable Problem:
Calculation of the Fibonacci series (1,1,2,3,5,8,13,21,...) by use of the formula:
F(k + 2) = F(k + 1) + F(k)
A non-parallelizable problem, such as the calculation of the Fibonacci sequence
above, would entail dependent calculations rather than independent ones.
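
The contrast between the two examples can be made concrete with a small sketch. The energy function below is a hypothetical stand-in (a sum of squared coordinates), not a real molecular model, and threads are used only to show which work is independent. Each conformation's energy can be computed on its own, whereas each Fibonacci term needs the two before it.

```python
from concurrent.futures import ThreadPoolExecutor


def potential_energy(conformation):
    """Hypothetical stand-in for a real energy function."""
    return sum(c * c for c in conformation)


def minimum_energy_conformation(conformations, workers=4):
    """Independent energy evaluations run as parallel tasks; the minimum comes last."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        energies = list(pool.map(potential_energy, conformations))
    best = min(range(len(conformations)), key=energies.__getitem__)
    return conformations[best], energies[best]


def fibonacci(n):
    """First n terms via F(k + 2) = F(k + 1) + F(k); each step depends on the last two."""
    terms = [1, 1]
    while len(terms) < n:
        terms.append(terms[-1] + terms[-2])
    return terms[:n]


print(minimum_energy_conformation([(3, 4), (1, 1), (2, 0)]))
print(fibonacci(8))
```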

Types of Parallelism: There are two basic ways to partition computational work among
parallel tasks:

Data parallelism: each task performs the same series of calculations, but applies them to
different data. For example, four processors can search census data looking for people
above a certain income; each processor does the exact same operations, but works on
different parts of the database.
Functional parallelism: each task performs different calculations, i.e., carries out
different functions of the overall problem. This can be on the same data or different data.
For example, 5 processors can model an ecosystem, with each processor simulating a
different level of the food chain (plants, herbivores, carnivores, scavengers, and
decomposers).
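
As a rough illustration of the two ways of partitioning work, the sketch below applies both to a small list of incomes. It assumes threads as stand-ins for processors; the function names and the sample figures are made up for the example. In the data-parallel version every task runs the same count on a different slice; in the functional version every task runs a different function on the same data.

```python
from concurrent.futures import ThreadPoolExecutor


def count_above(incomes, threshold):
    """The one calculation every data-parallel task performs."""
    return sum(1 for income in incomes if income > threshold)


def data_parallel_count(incomes, threshold, workers=4):
    """Same operation, different parts of the data."""
    chunk = max(1, -(-len(incomes) // workers))  # ceiling division, at least 1
    parts = [incomes[i:i + chunk] for i in range(0, len(incomes), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda part: count_above(part, threshold), parts))


def functional_parallel_summary(incomes):
    """Different functions, here applied to the same data."""
    tasks = {"lowest": min, "highest": max, "total": sum}
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {name: pool.submit(fn, incomes) for name, fn in tasks.items()}
        return {name: future.result() for name, future in futures.items()}


incomes = [10, 50, 70, 20, 90, 30, 80, 60]
print(data_parallel_count(incomes, threshold=55))
print(functional_parallel_summary(incomes))
```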

Observed speedup of a code which has been parallelized =


wall-clock time of serial execution
---------------------------------------
wall-clock time of parallel execution
Synchronization
The temporal coordination of parallel tasks. It involves waiting until two or more
tasks reach a specified point (a sync point) before continuing any of the tasks.
Synchronization is needed to coordinate information exchange among tasks; e.g.,
the previous example finding minimum energy conformation: all of the
conformations had to be completed before the minimum could be found, so any task
that was dependent upon finding that minimum would have had to wait until it was
found before continuing.
Synchronization can consume wall-clock time because processor(s) sit idle waiting
for tasks on other processors to complete.
Synchronization can be a major factor in decreasing parallel speedup, because, as
the previous point illustrates, the time spent waiting could have been spent in useful
calculation, were synchronization not necessary.
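
A sync point of the kind described above can be sketched with `threading.Barrier` from the Python standard library. This is a minimal illustration, assuming threads as tasks and a squared value as a placeholder for the real computation; `run_with_sync_point` is a hypothetical name. No task reads the minimum until every task has stored its own result.

```python
import threading


def run_with_sync_point(values):
    """Each task computes its result, waits at the barrier, then reads the minimum."""
    n = len(values)
    results = [None] * n
    seen_minimum = [None] * n
    barrier = threading.Barrier(n)

    def task(i):
        results[i] = values[i] ** 2     # independent computation (placeholder)
        barrier.wait()                  # sync point: idle until all tasks arrive
        seen_minimum[i] = min(results)  # safe: every result is now in place

    threads = [threading.Thread(target=task, args=(i,)) for i in range(n)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return seen_minimum


print(run_with_sync_point([3, -1, 2]))
```

The time each thread spends inside `barrier.wait()` is exactly the idle time the text warns about.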

Parallel Overhead

Time to start a task
This involves, among other things:
• identifying the task
• locating a processor to run it
• loading the task onto the processor
• putting whatever data the task needs onto the processor
• actually starting the task

Time to terminate a task
Termination isn't a simple chore, either: at the very least, results have to be
combined or transferred, and operating system resources have to be freed before the
processor can be used for other tasks.

Synchronization time, as previously explained.
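
To see how these overheads eat into the observed speedup defined earlier, here is a deliberately simple cost model. It is an assumption made for illustration, not something the notes state: the serial work is taken to divide evenly over the processors, and start-up, termination and synchronization times are simply added on. All numbers are hypothetical.

```python
def parallel_wall_clock(serial_seconds, processors,
                        startup=0.0, termination=0.0, synchronization=0.0):
    """Assumed model: evenly divided work plus additive overhead."""
    return serial_seconds / processors + startup + termination + synchronization


def observed_speedup(serial_seconds, parallel_seconds):
    """Wall-clock time of serial execution over wall-clock time of parallel execution."""
    return serial_seconds / parallel_seconds


serial = 100.0
parallel = parallel_wall_clock(serial, processors=4,
                               startup=2.0, termination=1.0, synchronization=2.0)
print(parallel, observed_speedup(serial, parallel))
```

With these made-up figures, four processors give a speedup of about 3.3 rather than 4; the gap is entirely overhead.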
Classification By Memory

Shared Memory

• The same memory is accessible to multiple processors


• Synchronization is achieved by tasks' reading from and writing to the
shared memory.
• A shared memory location must not be changed by one task while
another, concurrent task is accessing it.
• Data sharing among tasks is fast (speed of memory access)
• Disadvantage: scalability is limited by number of access pathways to
memory
• User is responsible for specifying synchronization, e.g., locks
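
The rule that a shared location must not be changed while another task is accessing it is usually enforced with a lock. The sketch below uses `threading.Lock` from the Python standard library; the counter and the function name are illustrative only.

```python
import threading


def increment_shared_counter(workers=4, increments=1000):
    """Several tasks update one shared memory location, guarded by a lock."""
    counter = {"value": 0}   # the shared memory location
    lock = threading.Lock()  # user-specified synchronization

    def task():
        for _ in range(increments):
            with lock:       # only one task may read-modify-write at a time
                counter["value"] += 1

    threads = [threading.Thread(target=task) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter["value"]


print(increment_shared_counter())
```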

Distributed Memory

• Memory is physically distributed among processors; each local memory is directly
accessible only by its processor.
• Synchronization is achieved by moving data (even if it's just the message itself)
between processors (communication).
• A major concern is data decomposition -- how to divide arrays among local CPUs
to minimize communication
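
One common way to divide an array among local CPUs is a contiguous block decomposition, sketched below. This is only one possible scheme, chosen for illustration; `block_decomposition` is a hypothetical helper. Each processor gets one contiguous index range, and the ranges differ in length by at most one element.

```python
def block_decomposition(n_items, n_procs):
    """Return one (start, stop) index range per processor, as evenly sized as possible."""
    base, extra = divmod(n_items, n_procs)
    ranges = []
    start = 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)  # the first `extra` ranks get one more
        ranges.append((start, start + size))
        start += size
    return ranges


print(block_decomposition(10, 4))
```

Contiguous blocks keep neighbouring elements on the same processor, which tends to reduce communication when a calculation only needs nearby values.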
Parallel Program Design:

• First we cover the ideal goals for a parallel solution. We review functional and
data parallelism, and SPMD and Master Worker.
• Then we walk through 5 problem examples showing diagrams of possible parallel
solutions.
• Problems faced in parallel programming

Goals (ideal)

Ideal (read: unrealistic) goals for writing a program with maximum speedup and
scalability:
• Each process has a unique bit of work to do, and does not have to redo any other work in
order to get its bit done.
• Each process stores the data needed to accomplish that work, and does not require anyone
else's data.
• A given piece of data exists only on one process, and each bit of computation only needs
to be done once, by one process.
• Communication between processes is minimized.
• Load is balanced; each process should be finished at the same time.

Usually it is much more complicated than this!


Keep in mind that:
• There may be several parallel solutions to your problem.
• The best parallel solution may not flow directly from the best serial solution.
Major Decisions

Functional Parallelism?

• Partition by task (functional parallelism)
• Each process performs a different "function" or executes a
different code section
• First identify functions, then look at the data requirements

Data Parallelism?

• Each process does the same work on a unique piece of data


• "Owner computes"
• First divide the data. Each process then becomes responsible for
whatever work is needed to process that data.
• Data placement is an essential part of a data-parallel algorithm
• Data parallelism is probably more scalable than functional parallelism
Distributed memory programming models

Distributed memory architectures are fertile grounds for the use of many different styles
of parallel programming, from those emphasizing homogeneity of process but
heterogeneity of data, to full heterogeneity of both.

Data parallel
Many significant problems, over the entire computational
complexity scale, fall into the data parallel model, which basically
stands for "do the same thing to all this data":
Explicit data distribution (via directives)
The data is assumed to have some form of regularity, some
geometric shape or other such characteristic by which it may be
subdivided among the available processors, usually by use of
directives commonly hidden from the executable code within
program comment statements.

Single thread of control


Each processor in the distributed environment is loaded with a copy
of the same code, hence single thread of control; it is not necessary,
nor expected, that all processors will be synchronized in their
execution of this code, although the amount of instruction-
separation is generally kept as small as possible in order to, among
other things, maintain high levels of processor efficiency (i.e., if
some processors have much more work to do than others, even
though they're all running off the same code, then it'll turn out that
some processors get finished long before the others do, and will
simply be sitting there spinning, soaking up cycles and research
bucks, until the other processors complete their tasks ... this is
known as load-imbalance, and we'll talk more about this later, but it
should be obvious even now that it is a bad thing).
Examples:
HPF
High Performance Fortran (HPF) is a standard in this sort of
work
Key principles in explicit message passing programming

We're now going to discuss some general issues relevant to the construction of well-
designed distributed applications which rely on explicit message passing for data- and
control-communications. These principles are largely concerned with issues you should
be focusing on as you consider the parallelization of your application:
• How is memory going to be used, and from where?
• How will the different parts of the application be coordinated?
• What kinds of operations can be done collectively?
• When should communications be blocking, and when non-blocking?
• What kinds of synchronization considerations need to be addressed, and when?
• What kinds of common problems could be encountered, and how can they be
avoided?
As has been mentioned before, and as will be mentioned again:
There's no substitute for a good design ... and the worse your design, the more time
you'll spend debugging it.
"It must be emphasized that the machine does not think for itself. It may exercise some degree of
judgment and discrimination, but the situations in which these are required, the criteria to be
applied, and the actions to be taken according to the criteria, have all to be foreseen in the
program of operating instructions furnished to the machine. Use of the machine is no substitute
for thought on the basic organization of a computation, only for the labour of carrying out the
details of the application of that thought."
Douglas R. Hartree, Moore School lecture, Univ. of Penn., 9 July 1946

Addressability

As one module in a distributed application, knowing what you know and, for what you
don't, whom to ask is one of the central issues in message passing applications. "What you
know" is the data you have resident on your own processor; "what you don't know" is
anything that resides elsewhere but that you've discovered you need to find out.

CPU can issue load/store operations involving local memory space only

Requests for any data stored in a remote processor's memory must be converted by the
programmer or run-time library into message passing calls which copy data between local
memories.

You not only have to know that you don't know something, or that something that you
used to know is now out-of-date and needs refreshing ... you also need to know where to
go to get the latest version of the information you're interested in.
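
The conversion of a remote read into message passing calls can be sketched as a request and a reply. This is an illustration only: `queue.Queue` and threads from the Python standard library stand in for the network and the processors, and `owner` and `remote_read` are hypothetical names. The requester must know which process owns the data, and every send has a matching receive.

```python
import queue
import threading


def owner(local_memory, inbox):
    """Serve requests for data that lives only in this process's local memory."""
    while True:
        message = inbox.get()            # explicit receive
        if message is None:              # shutdown message
            break
        key, reply_to = message
        reply_to.put(local_memory[key])  # send the value back to the requester


def remote_read(key, owner_inbox):
    """What a remote load turns into: send a request, then receive the reply."""
    reply_box = queue.Queue()
    owner_inbox.put((key, reply_box))    # send
    return reply_box.get()               # paired receive


owner_inbox = queue.Queue()
server = threading.Thread(target=owner, args=({"x": 42}, owner_inbox))
server.start()
value = remote_read("x", owner_inbox)
owner_inbox.put(None)                    # tell the owner to stop
server.join()
print(value)
```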

No shared variables or atomic global updates (e.g. counters, loop indices)

Synchronization is going to cost you, because there's no easy way to quickly get this kind
of information to everybody ... that's just one of the defining characteristics of this model of
operation, and if its implications are too detrimental to the effectiveness of your
application, that's a good enough reason to explore other alternatives.

Communication and Synchronization

The act of communicating within a distributed computing environment is very much a
team effort, and has implications beyond that of simply getting information from
processor-a to processor-b.

On multicomputers, all interprocessor communication, including synchronization, is
implemented by passing messages (copying data) between processors.

Making sure that everyone is using the right value of variable x is, without question, a
very important aspect of distributed computing; but so is making sure that no one tries to
use that value before the rest of the pieces are in place, a matter of synchronization.
Given that the only point of connection among all of the processing elements in a
distributed environment lies in the messages that are exchanged, synchronization, then,
must also be a matter of message-passing.

In fact, synchronization is very often seen as a separable subset of all communication
traffic, more a matter of control information than data.

Keep your synchronization requirements to the absolute minimum, and code them to be
lean-and-mean so that as little time is taken up in synchronization (and consequently
away from meaningful computation) as possible.

All messages must be explicitly received (sends and receives must be paired).

Just like the junk mail that piles up in your mailbox and obscures the really important stuff
(like your tax return, or the latest edition of TV-Guide), messages that are sent but never
explicitly received are a drain on network resources.
Grain Size

Grain size loosely refers to the amount of computation that is done between
communication or synchronization.
( T + S ) * equally shared load
So S is important.

Starvation

The amount of time a processor is interrupted to report its present state should not be
large, or the processor will not have time to compute.

Flooding and Throttling

For many parallel problems, the problem is broken down into further parallel tasks.
This should not go so far that the number of tasks exceeds the number of processors; if
this happens, the forward execution of the program will be severely impaired.
Dynamic switching is a technique that might be used to jump between the two.

Load Balancing

We can distribute the load by (N/P) (floor or ceiling).
Ceiling has the advantage that one processor does not become the bottleneck.

Partitioning and Scheduling

One of the most important tasks.
Scheduling might be static or dynamic.
Job Jar technique.

Deadlock

A set of processes is deadlocked if each process in the set holds resources and none
will release them until it has been granted the other resources it is waiting for.
You can try to detect a deadlock and kill a process, but this requires a monitoring
system.
You can make deadlock impossible if you number your resources and request
resources in ascending order.
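
The resource-numbering rule for avoiding deadlock can be sketched as follows. This is a minimal illustration, assuming `threading.Lock` objects as the numbered resources; the helper names are hypothetical. Two workers ask for the same two resources in opposite orders, which could deadlock if each simply locked in the order it asked, but both actually acquire in ascending order.

```python
import threading


def acquire_in_order(locks, needed):
    """Acquire the numbered resources in `needed`, always in ascending order."""
    ordered = sorted(needed)
    for number in ordered:
        locks[number].acquire()
    return ordered


def release_all(locks, held):
    """Release the held resources in reverse order of acquisition."""
    for number in reversed(held):
        locks[number].release()


def ordered_locking_demo(rounds=200):
    locks = {1: threading.Lock(), 2: threading.Lock()}
    completed = []

    def worker(name, needed):
        for _ in range(rounds):
            held = acquire_in_order(locks, needed)
            completed.append(name)  # work done while holding both resources
            release_all(locks, held)

    a = threading.Thread(target=worker, args=("a", [1, 2]))
    b = threading.Thread(target=worker, args=("b", [2, 1]))  # opposite request order
    a.start()
    b.start()
    a.join()
    b.join()
    return len(completed)


print(ordered_locking_demo())
```

Because no worker ever holds a higher-numbered resource while waiting for a lower-numbered one, a cycle of waiting processes cannot form.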

Communication Bottlenecks

Which is the bottleneck of the parallel computation, and how to remove it.
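
The Job Jar technique named under Partitioning and Scheduling is a form of dynamic scheduling. The sketch below assumes the usual meaning of the term (idle workers pull the next job from a shared pool) since the slides do not spell it out; `run_job_jar` is a hypothetical name and squaring stands in for real work.

```python
import queue
import threading


def run_job_jar(jobs, n_workers=3):
    """Workers repeatedly take the next job from the jar until it is empty."""
    jar = queue.Queue()
    for index, job in enumerate(jobs):
        jar.put((index, job))
    results = [None] * len(jobs)

    def worker():
        while True:
            try:
                index, job = jar.get_nowait()  # grab whatever job is next
            except queue.Empty:
                return                         # jar is empty: this worker is done
            results[index] = job * job         # placeholder for the real work

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


print(run_job_jar([1, 2, 3, 4, 5]))
```

A worker that draws short jobs simply comes back for more, so the load balances itself without a fixed up-front partition.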
Costs of Parallel Processing

By this point, I hope you will have gotten the joint message that:
Parallel processing can be extremely useful, but...
There Ain't No Such Thing As A Free Lunch

Programmer's time
As the programmer, your time is largely going to be spent doing the following:

Analyzing code for parallelism


The more significant parallelism you can find, not simply in the existing code, but even
more importantly in the overall task that the code is intended to address, the more
speedup you can expect to obtain for your efforts.

Recoding
Having discovered the places where you think parallelism will give results, you now have
to put it in. This can be a very time-consuming process.
Complicated debugging

Debugging a parallel application is at least an order of magnitude more infuriating than
debugging a serial one, because you not only have multiple instruction streams running
around doing things at the same time, you've also got information flowing amongst them
all, again all at the same time, and who knows!?! what's causing the errors you're seeing?
It really is that bad. Trust me.
Do whatever you can to avoid having to debug parallel code:
• consider a career change;
• hire someone else to do it;
• or write the best, self-debugging, modular and error-correcting code you
possibly can, the first time.
If you decide to stick with it, and follow the advice in that last point, you'll find that the
time you put into writing good, well-designed code has a tremendous impact on how
quickly you get it running correctly. Pay the price up front.
and only for as long as you actually need them.