
SCHOOL OF COMPUTER SCIENCES

UNIVERSITI SAINS MALAYSIA

CST131 - COMPUTER ORGANISATION


SEM I 2009/2010

ASSIGNMENT 2

NAME                      MATRIC NO.
DARWAESH SHANMUGAM        106705
SHARVESH S. SIVASHAREN    106458
KAVINDRAN GUNASELIN       102771

6th OCTOBER 2009


CST 131 - Assignment 2

Assignment 2
(CST131 – Computer Organisation)
Sharvesh S.Sivasharen (106458), Darwaesh Shanmugam (106705), Kavindran Gunaselin
(102771).
School of Computer Sciences,
Universiti Sains Malaysia,
11800 USM Penang.

Abstract

The purpose of this assignment is to discuss how the design of memory and of the CPU structure has varied in response to the rapid increase in processor speed. We introduce memory and CPU structure and explain our motive for choosing these two aspects. Our main discussion concerns the changes that have occurred in the design of memory and CPU structure as a result of increasing processor speed. We also discuss the pros and cons of previous efforts, as well as the possibilities for future efforts. By the end of the assignment we present the current effects of increasing processor speed on the design of memory and CPU structure, together with future efforts that processor manufacturers could apply to keep pace with it.


1. Introduction and the rationale for choosing the two aspects.

The increase in processor speed places significant demands on the memory system and the central processing unit (CPU) structure, and manufacturers have made dramatic changes in the production of memories and CPUs. Larger pieces of hardware are generally slower than smaller ones, because in a smaller piece of hardware the logic and memory are placed closer together on more densely packed chips; the electrical path length is shortened, which increases the operating speed. There are two reasons why this simple principle applies to memories built from similar technologies. First, larger memories have more signal delay and require more levels of logic to decode addresses and fetch the required datum. Second, in most technologies smaller memories can be made much faster than larger ones, primarily because the designer can use more power per memory cell in a smaller design. The fastest memories are normally available with fewer bits per chip at any point in time, and they cost substantially more per byte [1]. This situation keeps changing over time, and memory design varies with it; this is the reason that encouraged us to choose memory as the first aspect for this assignment. Since there are limits to making a CPU smaller, the CPU structure is often modified instead to cope with the demand for speed and smaller size; new technologies and designs such as pipelining and hyper-threading have been implemented in CPUs. The CPU can be considered the heart of the machine: without it, a computer is just a useless metal box full of electronic chips. Because the CPU does all the processing, we have chosen CPU structure as the second aspect for this assignment.

As we know, processor speed increases from year to year, and processor manufacturers take plenty of action to keep the rest of the system up with it. The main rationale behind choosing memory and CPU structure for this assignment is that both contribute to the "bottleneck problem": if the memory can hold and supply many instructions but the CPU is slow and can execute only a few at a time (or vice versa), there is no overall increase in processing speed. When processor speed increases, the designs of the memory and of the CPU structure are the ones affected the most.


2. Memories and the effects of the increase in processor speed on their design

2.1 Cache Memory

This is the memory closest to the CPU; it holds the most recently accessed code or data, and it is small and fast. Caches are made of SRAM (static random access memory). When the CPU finds the requested data item in the cache, this is called a cache hit. When the CPU does not find the data item it needs in the cache, the result is a cache miss, and the data item must be fetched from main memory. Caches can vary widely in size and organization, and there may be more than one level of cache in a hierarchy [2].
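
To make the hit/miss behaviour concrete, the following is a minimal sketch of our own (not taken from the references) of a direct-mapped cache: each memory block maps to exactly one cache line, and a lookup hits only when that line already holds the requested block.

```python
def simulate_cache(addresses, num_lines=4, block_size=1):
    """Direct-mapped cache: count hits and misses for an address trace."""
    lines = [None] * num_lines          # tag currently stored in each line
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size      # memory block containing the address
        index = block % num_lines       # the one line this block can occupy
        tag = block // num_lines        # identifies which block is in the line
        if lines[index] == tag:
            hits += 1                   # cache hit
        else:
            misses += 1                 # cache miss: fetch from main memory
            lines[index] = tag
    return hits, misses

# A loop that reuses the same four addresses hits on every repeat:
print(simulate_cache([0, 1, 2, 3] * 2))     # (4, 4)
# Two addresses that map to the same line evict each other every time:
print(simulate_cache([0, 4, 0, 4]))         # (0, 4)
```

The second trace illustrates a conflict miss: both addresses compete for the same line, so the cache never hits even though it has free lines elsewhere.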

The increase in processor speed has motivated the creation of multi-level caches. Nowadays many caches have a two-level structure, with the level closer to the processor designated L1 and the next level designated L2. Larger caches improve effective bandwidth by sending fewer requests (misses) across the interconnection. On-chip caches are now growing into the megabyte range, yet some programs have working sets that are too large to fit even in these caches, and large on-chip caches also increase the system cost [3]. Multi-level hierarchies have since grown to include L3 and L4 levels, driven by the continuing increase in processor speed.

Figure 1.1 The process that occurs in cache memory
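
The benefit of a second cache level can be estimated with the standard average memory access time (AMAT) formula. The latencies and miss rates in the sketch below are illustrative assumptions of ours, not figures from the text.

```python
def amat_two_level(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_time):
    """AMAT for a two-level hierarchy: an L1 miss pays the L2 access time,
    and an L2 miss additionally pays the main-memory access time."""
    l2_miss_penalty = l2_hit + l2_miss_rate * mem_time
    return l1_hit + l1_miss_rate * l2_miss_penalty

# Assumed: L1 hit 1 cycle, 5% of accesses miss L1; L2 hit 10 cycles,
# 20% of those misses also miss in L2; main memory 100 cycles.
print(amat_two_level(1, 0.05, 10, 0.20, 100))   # about 2.5 cycles on average
```

Without the L2 level, every L1 miss would pay the full 100-cycle memory access, giving 1 + 0.05 × 100 = 6 cycles on average; the second level cuts this to roughly 2.5.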


2.2 Internal / Main Memory

Internal memory serves the needs of the caches and interfaces with the I/O system. Main memory is usually made of DRAM (dynamic random access memory) and has a relatively large storage capacity compared with the (SRAM) caches, but DRAMs also have larger access times than SRAMs.

The rate of increase in processor speed has exceeded the rate of improvement in DRAM speed. Separate chips allow microprocessors to use expensive packages that dissipate high power and provide many pins for wide connections to external memory, while the DRAMs use inexpensive packages that dissipate low power and need only a few dozen pins. The increase in processor speed has also led computer designers to scale the number of memory chips independently of the number of processors [2].

Figure 1.2 Processes that occurs in the internal memory


3. CPU structures and the effects of the increase in processor speed on their design.

3.1 Pipelining

Pipelining is a technique in which the execution of several instructions is overlapped [1]. In Intel's 8086/8088 processors, each instruction was fetched as soon as the previous one completed; execution proceeded step by step. For example, if there are 9 instructions and each instruction takes 6 units of time, the last instruction finishes at the 54th time unit. With a pipeline, by contrast, several instructions can be in progress at the same time. The instruction cycle is divided into six stages, so a six-stage pipeline allows up to six instructions to be in execution simultaneously, with each part of the cycle corresponding to one stage:

Fetch Instruction (FI): read the next instruction
Decode Instruction (DI): determine the opcode and the operand specifiers
Calculate Operands (CO): calculate the effective address of each operand (addressing mode)
Fetch Operands (FO): fetch each operand from memory (not needed for register operands)
Execute Instruction (EI): perform the operation
Write Operand (WO): store the result in the destination

The process is faster when there are more stages in the pipeline, as this decreases the total execution time for a stream of instructions: only the first and last stages use the external buses, so the cycle of the next instruction can begin while the current instruction is still being decoded internally.


Figure 2.1 Six stage pipeline.
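
The overlap shown in the figure can be reproduced in a few lines. This is our own sketch, assuming one time unit per stage and the six-stage decomposition used in the text: instruction i enters stage s at time unit i + s, so once the pipeline is full, one instruction completes every time unit.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]   # the six pipeline stages

def pipeline_schedule(n_instructions):
    """Return, for each instruction, the time unit at which it occupies
    each stage: instruction i is in stage s during time unit i + s."""
    return [[i + s for s in range(len(STAGES))] for i in range(n_instructions)]

# Print a small timing diagram for the first three instructions.
for i, times in enumerate(pipeline_schedule(3), start=1):
    row = "  ".join(f"{stage}@t{t}" for stage, t in zip(STAGES, times))
    print(f"I{i}: {row}")
```

For 9 instructions, the last one occupies the final stage at time unit 13, i.e. after 14 time units in total, matching the figure.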

As shown in the figure above, a six-stage pipeline reduces the execution time from T1 = nk = 9 instructions × 6 stages = 54 time units to Tk = k + (n − 1) = 6 + (9 − 1) = 14 time units. The speedup is therefore S = T1/Tk = 54/14 ≈ 3.86. This shows how pipelining increases effective processing speed.
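
The arithmetic above can be checked directly; this short sketch simply evaluates the same formulas T1 = nk and Tk = k + (n − 1).

```python
def pipeline_times(n, k):
    """Execution time for n instructions on a k-stage pipeline,
    assuming one time unit per stage and no stalls."""
    t_sequential = n * k          # T1: no overlap between instructions
    t_pipelined = k + (n - 1)     # Tk: fill the pipeline, then one per unit
    return t_sequential, t_pipelined, t_sequential / t_pipelined

t1, tk, speedup = pipeline_times(9, 6)
print(t1, tk, round(speedup, 2))   # 54 14 3.86
```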


Figure 2.2 Graph of speedup factor versus number of instructions.

Figure 2.3 Graph of speedup factor versus number of stages.

The two graphs above show that processing speed increases as the number of stages per instruction cycle increases, since more instructions can be in execution at once. Note that pipelining does not decrease the time needed to process a single instruction; it only increases the throughput of the system when processing a stream of instructions.


3.2 Hyper-Threading

Hyper-threading enables a processor to function as two logical processors [5]. A CPU has many parts, and hyper-threading enables those parts to work on different tasks concurrently. Hyper-threading is the next level beyond super-threading, which is itself an improvement over a single-threaded CPU. The difference between hyper-threading and super-threading is that hyper-threading removes the restriction that all the instructions issued by the front end on each clock must come from the same thread. The figure below illustrates a single-threaded CPU, a super-threaded CPU and a hyper-threaded CPU.

Figure 3.1 Illustration of a single-threaded, a super-threaded and a hyper-threaded processor (unused issue slots are pipeline bubbles).

Figure 3.1 clearly shows that a single-threaded CPU runs instructions from only one program while instructions from other programs wait for it to finish, leaving many pipeline bubbles (unused stages in the pipeline). A super-threaded processor can execute more than one thread at a time, so it can run instructions from several programs at once; however, pipeline bubbles remain. Super-threading also carries a restriction: all the instructions issued by the front end on each clock must come from the same thread. The arrows in the super-threaded CPU (figure 3.1) show this limitation on mixing instructions. In the hyper-threaded illustration (figure 3.1), the CPU's front end is fully utilized and there are no pipeline bubbles. The processor can now execute more instructions per cycle, and with hyper-threading it behaves much as if two CPUs were running [6].
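
The difference between the three designs can be caricatured numerically. The toy model below is our own illustration (not taken from [6]): each cycle the front end has four issue slots; a single-threaded front end can fill them only from thread 0, a super-threaded one from the single best thread that cycle, and a hyper-threaded one from both threads at once, so it leaves the fewest bubbles.

```python
import random

def count_bubbles(width=4, cycles=200, stall_p=0.3, seed=1):
    """Count unused issue slots (pipeline bubbles) under three front-end
    policies, using the same random per-cycle thread state for all three."""
    rng = random.Random(seed)
    bubbles = {"single": 0, "super": 0, "hyper": 0}
    for _ in range(cycles):
        # Each thread is either stalled (0 ready instructions) or has
        # between 1 and `width` instructions ready this cycle.
        ready = []
        for _ in range(2):
            stalled = rng.random() < stall_p
            n = rng.randint(1, width)
            ready.append(0 if stalled else n)
        bubbles["single"] += width - min(width, ready[0])    # thread 0 only
        bubbles["super"] += width - min(width, max(ready))   # one thread/cycle
        bubbles["hyper"] += width - min(width, sum(ready))   # mix both threads
    return bubbles

print(count_bubbles())
```

Because the hyper-threaded front end may combine instructions from both threads in the same cycle, its bubble count can never exceed the super-threaded count, which in turn can never exceed the single-threaded count.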


4. Discussion.

4.1 Cache Memory

4.1.1 Pros of Increment in Cache Memory

Increasing the cache memory lets data be transferred much faster, because the system bus is not used. Effective bandwidth also improves, since fewer requests (misses) are sent across the interconnection, and a larger cache can hold the working sets of more programs.

4.1.2 Cons of increment in Cache Memory

Some programs have working sets that still cannot fit in the caches, and increasing the caches raises the cost of the system.

4.2 Internal / Main Memory

4.2.1 Pros of Internal / Main Memory

Increasing the size of main memory goes together with providing more pins, which widens the connection between the memory and the processor and makes the memory respond faster; wider transfers mean fewer separate accesses are needed to fill the cache, which makes processing faster.

4.2.2 Cons of Internal / Main Memory

There is a limit to how much main memory can be added; once it is exceeded, the system must rely more on virtual memory, and the cost of the system rises as processor speed increases.


4.3 Pipelining

4.3.1 Pros of Pipelining

Pipelining reduces the cycle time of the processor, which increases the instruction issue rate. Pipelining can also save circuitry compared with implementing the same function as a single, more complex combinational circuit.

4.3.2 Cons of Pipelining

In a non-pipelined processor, only a single instruction is executed at a time; this avoids branch-prediction delays, but it is not efficient. The latency of an individual instruction is slightly lower in a non-pipelined processor than in a pipelined one. A pipelined processor is also less stable in its bandwidth than a non-pipelined processor, because throughput varies from program to program and is hard to predict.

4.4 Hyper-Threading

4.4.1 Pros of Hyper-Threading

Hyper-threading improves support for multi-threaded code by enabling multiple threads to execute simultaneously. Reaction and response times are much faster than with a single thread, and there is no performance loss when only one thread is active. Hyper-threading increases performance with multiple threads and gives better resource utilization.


4.4.2 Cons of Hyper-Threading

Hyper-threading makes one physical processor behave like a dual processor, but its performance is not as good as that of a true dual-core processor. To take advantage of hyper-threading, code cannot be purely serial; threads are non-deterministic, involve extra design effort, and add overhead. Another disadvantage of hyper-threading is that the shared resources may cause conflicts between threads.


5. Future Trends.

5.1 Future of Memory

Figure 5.1 Illustration of Quantum dots

Quantum dots could well be the future of memory. Quantum dots, currently in development, are tiny islands of semiconductor capable of forming storage that is faster and longer lasting than current memories on the market such as DRAM. Research has recently shown that quantum-dot memory can store 1 terabyte (1000 gigabytes) of data per square inch, a vast improvement over the technology we have now, and that information can be written to this memory in a fraction of a second. Although this technology is far from reaching the consumer market, quantum dots may well be the future of memory technology; the approach looks very promising and could replace all of the different kinds of memory existing today if it can be mass-manufactured [11].

5.2 MRAM

The second technology currently being developed is MRAM (magnetic random access memory), which combines information storage in magnetic materials with the semiconductor technology used in today's memory chips, such as DRAM. MRAM exploits the magnetization (spin) of electrons: spin-polarized electrons entering a semiconductor from a magnet carry the information stored in the magnet, which can then be read out electronically at ultra-high speed. The information density of such an MRAM technology would be much higher than that of today's RAM, and its access times may be even smaller. The resulting memory technology offers many advantages, including a less complex structure than today's DRAM and better capabilities for downscaling. There is every reason to expect that MRAM will take over as the standard memory chip in computing technology when CMOS technology reaches its limits. [12]

Figure 5.2 Illustration of MRAM

5.3 Pipelining and Instructions

The future of pipelining involves using more pipelines where possible, exploiting parallelism if the technology permits, with each pipeline being superscalar. This approach moves and processes multiple instructions simultaneously through the fetch, analyze, execute, and store stages. Instructions run through the stages in parallel, so every clock cycle is used to process instructions. As one stage of a pipeline in a computer's CPU finishes manipulating an instruction, that instruction is passed on to the next stage, and the stage takes another instruction from the stage before it, moving several instructions along the pipeline simultaneously. This is more efficient than requiring each instruction to start at the first stage only after the previous instruction has finished the final stage. The more pipelines a CPU has, the faster it can execute instructions. Hazards would arise with this technology, but they can be mitigated by a branch prediction unit that guesses the correct instruction sequence.

Figure 5.3 Comparison of RISC, Superscalar, VLIW pipelining


6. Conclusion

By doing this assignment we were able to identify two main aspects affected by processor speed, and to show how the designs of memory and CPU structure vary with its increase. We also discussed the advantages and disadvantages of these effects and the possibilities for future efforts. The future of memory and CPU-structure design is determined by how fast processors advance, which in turn requires memory and CPU structure to be developed to match the capability of processors. With each passing year we can only predict the future; continuous development of technology will determine how far it leads us in terms of processor speed.

References

[1] John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, CA, 1996.

[2] David Patterson, Thomas Anderson et al. A Case for Intelligent RAM: IRAM. IEEE Micro, April 1997.

[3] Nihar R. Mahapatra and Balakrishna Venkatrao. The Processor-Memory Bottleneck: Problems and Solutions. www.acm.org/crossroads/xrds5-3/pmgap.html. 1996.

[4] Web Media Brands Inc. http://www.webopedia.com/TERM/I/pipelining.html. Last modified: March 2003.

[5] Intel Corporation. Intel® Pentium® 4 Processor, Supporting Hyper-Threading Technology. Document Number: 303128-004. http://download.intel.com/design/Pentium4/datashts/303128.pdf.

[6] Jon "Hannibal" Stokes. Introduction to Multi-threading, Super-threading and Hyper-threading. http://arstechnica.com/old/content/2002/10/hyperthreading.ars. 2002.

[7] Susan Eggers, Hank Levy and Steve Gribble. Simultaneous Multithreading Project. University of Washington. 2007.

[8] D. Burger, J. R. Goodman and A. Kagi. Memory Bandwidth Limitations of Future Microprocessors. Proc. 23rd Ann. Int'l Symp. Computer Architecture, Assoc. of Computing Machinery, pp. 79-90, Aug. 1996.

[9] Wikimedia Foundation, Inc. Hyper-Threading. en.wikipedia.org/wiki/Hyper-threading. 2009.

[10] Answers Corporation. Computer Desktop Encyclopedia: Hyper-threading. www.answers.com/topic/hyper-threading. 2009.

[11] Chris Lee. The Perfect Computer Memory. http://arstechnica.com/hardware/news/2007/12/the-perfect-computer-memory.ars. December 26, 2007.

[12] William Stallings. Computer Organization and Architecture, Sixth Edition. Pearson Education International, 2003.

[13] V. C. Hamacher. Computer Organization, Fifth Edition. McGraw Hill, 2002.

[14] D. Hammerstrom and E. Davidson. Information Content of CPU Memory Referencing Behavior. Proc. 4th Ann. Int'l Symp. Computer Architecture, Assoc. of Computing Machinery, pp. 184-192, March 1977.

[15] L. Rudolph and D. Criton. Creating a Wider Bus using Caching Techniques. Proc. 1st Int'l Symp. High Performance Computer Architecture, IEEE Computer Society Press, pp. 90-99, 1995.

[16] T. Mudge and P. Bird. An Instruction Stream Compression Technique. Proc. of Micro-30, Dec. 1997.

[17] Wm. A. Wulf and Sally A. McKee. Hitting the Memory Wall: Implications of the Obvious. Computer Architecture News, 23(1), pp. 20-24, March 1995.
