
http://www.eastaughs.fsnet.co.uk/cpu/index.htm

The microprocessor is sometimes referred to as the 'brain' of the personal computer,
and is responsible for the processing of the instructions which make up computer
software. It houses the central processing unit, commonly referred to as the CPU,
and as such is a crucially important part of the home PC. However, how many people
really understand how the chip itself works?

This tutorial aims to provide an introduction to the various parts of the
microprocessor, and to teach the basics of the architecture and workings of the CPU
across three specific sections:

CPU Structure
This section, using a simplified model of a central processing unit as an example,
takes you through the role of each of the major constituent parts of the CPU. It also
looks more closely at each part, and examines how they are constructed and how
they perform their role within the microprocessor.

Instruction Execution
Once you are familiar with the various elements of the processor, this section looks
at how they work together to process and execute a program. It looks at how the
various instructions that form the program are recognised, together with the
processes and actions that are carried out during the instruction execution cycle
itself.

Further Features
Now that the basics have been covered, this section explores the further
advancements in the field of microprocessor architecture that have occurred in recent
years. Explanations of such techniques as pipelining and hyperthreading are
provided, together with a look at cache memory and trends in CPU architecture.

Each section also concludes with a multiple choice quiz with which you can test your
knowledge, while some also contain interactive animations in order to improve your
learning experience. These animations are in Macromedia Flash format, and will
require Flash Player to be installed on your computer. If it is not, please visit the
Macromedia website in order to download and install the browser plug-in.

The first section of this tutorial relates to the structure of the central processing unit.
Please click the button marked with the next arrow below to proceed.
As there are a great many variations in
architecture between the different kinds of
CPU, we shall begin by looking at a
simplified model of the structure. The
model to be used can be seen on the right
of this page, and is a good basis on which
to build your knowledge of the workings of
a microprocessor. The simplified model
consists of five parts, which are:

Arithmetic & Logic Unit (ALU)
The part of the central processing unit that deals with operations such as
addition, subtraction, and multiplication of integers and Boolean operations. It
receives control signals from the control unit telling it to carry out these
operations. For more, click the title above.

The simplified model of the central processing unit. Click on an area for more details.

Control Unit (CU)
This controls the movement of instructions in and out of the processor, and also
controls the operation of the ALU. It consists of a decoder, control logic circuits, and
a clock to ensure everything happens at the correct time. It is also responsible for
performing the instruction execution cycle. More on the control unit can be
discovered by clicking the title above.

Register Array
This is a small amount of internal memory that is used for the quick storage and
retrieval of data and instructions. All processors include some common registers used
for specific functions, namely the program counter, instruction register, accumulator,
memory address register and stack pointer. For more, click the title above.

System Bus
This comprises the control bus, data bus and address bus. It is used for
connections between the processor, memory and peripherals, and for the transfer of
data between the various parts. Click the title above for more.

Memory
The memory is not an actual part of the CPU itself, and is instead housed elsewhere
on the motherboard. However, it is here that the program being executed is stored,
and as such is a crucial part of the overall structure involved in program execution.
For further information on the memory, please see the separate tutorial if available.

For more information on these parts of the CPU, click the corresponding title of the
description above. You could also click on the part in question on the diagram to the
right. Alternatively, click the right arrow button below to move on to the next page,
which looks at the arithmetic and logic unit.
The ALU, or the arithmetic and logic unit,
is the section of the processor that is
involved with executing operations of an
arithmetic or logical nature. It works in
conjunction with the register array for
many of these, in particular, the
accumulator and flag registers. The
accumulator holds the results of
operations, while the flag register contains
a number of individual bits that are used
to store information about the last
operation carried out by the ALU. More on
these registers can be found in the
register array section.

You can look at the ALU as comprising
many subcomponents for each specific
task that it is required to perform. Some
of these tasks and their appropriate
subcomponents are:

The simplified model of the central processing unit, with the arithmetic & logic unit
highlighted in red. Click on a different section for more information.

Addition and subtraction
These two tasks are performed by constructs of logic gates, such as half adders and
full adders. While they may be termed 'adders', with the aid of inverters and 'two's
complement' arithmetic they can also perform subtraction.
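To make the idea concrete, here is a minimal sketch (not from the tutorial itself) of how full adders built from logic gates can both add and, via inverters and two's complement, subtract. The 8-bit width is an assumption for illustration.

```python
# Illustrative sketch: subtraction on an 8-bit ALU using full adders plus
# two's complement, i.e. a - b = a + (~b + 1).

BITS = 8

def full_adder(a, b, carry_in):
    """One full adder built from XOR, AND and OR gates."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out

def ripple_add(x, y, carry_in=0):
    """A chain of BITS full adders (a ripple-carry adder)."""
    result, carry = 0, carry_in
    for i in range(BITS):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry

def subtract(x, y):
    """x - y: invert y's bits and add 1 by setting the initial carry."""
    inverted_y = y ^ ((1 << BITS) - 1)   # the inverters
    result, _ = ripple_add(x, inverted_y, carry_in=1)
    return result

print(subtract(9, 4))   # 5
print(subtract(4, 9))   # 251, which is -5 in 8-bit two's complement
```

Note that the same adder hardware serves both operations; only the inverters and the initial carry differ.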

The topic of logic gates is too expansive and detailed to be covered in full here. Many
resources exist on the internet and elsewhere relating to this topic, however, so it is
recommended that you read further into the areas outlined above to aid with your
learning.

Multiplication and division
In most modern processors, the multiplication and division of integer values is
handled by dedicated hardware within the CPU. Earlier processors used either
additional chips known as maths co-processors, or a completely different method
to perform the task.

Logical tests
Further logic gates are used within the ALU to perform a number of different logical
tests, including seeing if an operation produces a result of zero. Most of these logical
tests are used to then change the values stored in the flag register, so that they may
be checked later by separate operations or instructions. Others produce a result
which is then stored, and used later in further processing.

Comparison
Comparison operations compare values in order to determine such things as whether
one number is greater than, less than or equal to another. These operations can be
performed by subtraction of one of the numbers from the other, and as such can be
handled by the aforementioned logic gates. However, it is not strictly necessary for
the result of the calculation to be stored in this instance, as the amount by which the
values differ is not required. Instead, the appropriate status flags in the flag register
are set and checked to determine the result of the operation.
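A hypothetical sketch of this idea: the comparison subtracts one value from the other, discards the numeric result, and keeps only the status flags. The flag names Z and N and the 8-bit width are assumptions for illustration.

```python
# Comparison via subtraction: only the status flags matter, the numeric
# difference itself is thrown away (8-bit values assumed).

BITS = 8
MASK = (1 << BITS) - 1

def compare(a, b):
    """Subtract b from a, set zero/negative flags, discard the result."""
    diff = (a - b) & MASK
    return {
        "Z": diff == 0,                        # zero flag: a == b
        "N": bool(diff & (1 << (BITS - 1))),   # sign bit set: a < b (for in-range values)
    }

flags = compare(3, 7)   # a < b, so N is set and Z is clear
```

A later conditional instruction would then check these flags rather than any stored difference.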
Bit shifting
Shifting operations move bits left or right
within a word, with different operations
filling the gaps created in different ways.
This is accomplished via the use of a shift
register, which uses pulses from the clock
within the control unit to trigger a chain
reaction of movement across the bits that
make up the word. Again, this is a quite
complicated logical procedure, and further
reading may aid your understanding.
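As a rough sketch of "different operations filling the gaps in different ways", the example below contrasts logical shifts (fill with zero) with an arithmetic right shift (repeat the sign bit). The 8-bit word size is an assumption for illustration.

```python
# Shift operations on an 8-bit word: logical shifts fill the vacated bit
# with 0, while an arithmetic right shift repeats the sign bit.

BITS = 8
MASK = (1 << BITS) - 1

def shift_left(word):
    return (word << 1) & MASK     # bit 7 falls off, bit 0 filled with 0

def logical_shift_right(word):
    return word >> 1              # bit 0 falls off, bit 7 filled with 0

def arithmetic_shift_right(word):
    sign = word & (1 << (BITS - 1))   # preserve the sign bit
    return (word >> 1) | sign

print(f"{shift_left(0b01001101):08b}")              # 10011010
print(f"{logical_shift_right(0b10011010):08b}")     # 01001101
print(f"{arithmetic_shift_right(0b10011010):08b}")  # 11001101
```

In real hardware each of these is one clock-pulse-driven step of a shift register rather than a Python expression, but the bit patterns produced are the same.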

Click the next button below to move on


and look at the control unit, or
alternatively click on a section of the
diagram above to view a different section.

The control unit is arguably the most
complicated part of this model CPU, and is
responsible for controlling much of the
operation of the rest of the processor. It
does this by issuing control signals to the
other areas of the processor, instructing
them on what should be performed next.

The simplified model of the central processing unit, with the control unit highlighted
in red. Click on a different section for more information.

Similarly to the arithmetic and logic unit,
the control unit can be broken down further for easier understanding. As such, the
three main elements of the control unit are as follows:

Decoder
This is used to decode the instructions that make up a program when they are being
processed, and to determine what actions must be taken in order to process them.
These decisions are normally taken by looking at the opcode of the instruction,
together with the addressing mode used. This is covered in greater detail in the
instruction execution section of this tutorial.

Timer or clock
The timer or clock ensures that all processes and instructions are carried out and
completed at the right time. Pulses are sent to the other areas of the CPU at regular
intervals (related to the processor clock speed), and actions only occur when a pulse
is detected. This ensures that the actions themselves also occur at these same
regular intervals, meaning that the operations of the CPU are synchronised.

Control logic circuits
The control logic circuits are used to create the control signals themselves, which are
then sent around the processor. These signals inform the arithmetic and logic unit
and the register array what actions and steps they should be performing, what
data they should be using to perform said actions, and what should be done with the
results.

Further detail is not required at this stage on the control unit, though it is clear that
there is much detail at lower levels that has yet to be touched on. However, to move
on to the next element of the processor
(the register array), please click the next
button below.

A register is a memory location within the
CPU itself, designed to be quickly accessed
for purposes of fast data retrieval.
Processors normally contain a register
array, which houses many such registers.
These contain instructions, data and other
values that may need to be quickly
accessed during the execution of a
program.

Many different types of registers are
common between most microprocessor
designs. These are:

The simplified model of the central processing unit, with the register array
highlighted in red. Click on a different section for more information.

Program Counter (PC)
This register is used to hold the memory address of the next instruction that has to
be executed in a program. This is to ensure that the CPU knows at all times where it
has reached, that it is able to resume following an execution at the correct point, and
that the program is executed correctly.

Instruction Register (IR)
This is used to hold the current instruction in the processor while it is being decoded
and executed, in order to reduce the time taken by the whole execution process.
This is because the time needed to access the instruction register is much less than
that needed for continual checking of the memory location itself.

Accumulator (A, or ACC)
The accumulator is used to hold the result of operations performed by the arithmetic
and logic unit, as covered in the section on the ALU.

Memory Address Register (MAR)
Used for storage of memory addresses, usually the addresses involved in the
instructions held in the instruction register. The control unit then checks this register
when needing to know which memory address to check or obtain data from.

Memory Buffer Register (MBR)
When an instruction or data is obtained from the memory or elsewhere, it is first
placed in the memory buffer register. The next action to take is then determined and
carried out, and the data is moved on to the desired location.

Flag register / status flags
The flag register is specially designed to contain all the appropriate 1-bit status flags,
which are changed as a result of operations involving the arithmetic and logic unit.
Further information can be found in the section on the ALU.
Other general purpose registers
These registers have no specific purpose,
but are generally used for the quick
storage of pieces of data that are required
later in the program execution. In the
model used here these are assigned the
names A and B, with suffixes of L and U
indicating the lower and upper sections of
the register respectively.
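The register set described above can be summarised in a small sketch. This is an illustrative model only, using this tutorial's register names; the 16-bit register width, the flag names and the lower/upper helper methods (standing in for the L and U suffixes) are assumptions.

```python
# A toy model of the register array from this tutorial's simplified CPU.

class RegisterArray:
    def __init__(self):
        self.PC = 0      # program counter: address of the next instruction
        self.IR = 0      # instruction register: instruction being executed
        self.ACC = 0     # accumulator: results from the ALU
        self.MAR = 0     # memory address register
        self.MBR = 0     # memory buffer register
        self.flags = {"Z": False, "N": False}   # status flags (assumed names)
        self.A = 0       # general-purpose register A
        self.B = 0       # general-purpose register B

    @staticmethod
    def lower(value):
        """Lower half of a 16-bit register, e.g. AL."""
        return value & 0xFF

    @staticmethod
    def upper(value):
        """Upper half of a 16-bit register, e.g. AU."""
        return (value >> 8) & 0xFF

regs = RegisterArray()
regs.A = 0x12AB
# regs.lower(regs.A) is 0xAB and regs.upper(regs.A) is 0x12
```

The point of the model is simply that every value a program is actively working with has a named, instantly accessible home inside the CPU.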

The final main area of the model
microprocessor being used in this tutorial
is the system bus. Click the next arrow
button below in order to read more.

The system bus is a set of parallel wires
which carries data communication between
the major components of the computer,
including the microprocessor. Not all of the
communication that uses the bus involves
the CPU, although naturally the examples
used in this tutorial will centre on such
instances.

The simplified model of the central processing unit, with the system bus highlighted
in red. Click on a different section for more information.

The system bus consists of three different
groups of wiring, called the data bus, control bus and address bus. These all have
separate responsibilities and characteristics, which can be outlined as follows:

Control Bus
The control bus carries the signals relating to the control and co-ordination of the
various activities across the computer, which can be sent from the control unit within
the CPU. Different architectures result in differing numbers of lines of wire within the
control bus, as each line is used to perform a specific task. For instance, different,
specific lines are used for each of read, write and reset requests.

Data Bus
This is used for the exchange of data between the processor, memory and
peripherals, and is bi-directional so that it allows data flow in both directions along
the wires. Again, the number of wires used in the data bus (sometimes known as the
'width') can differ. Each wire is used for the transfer of signals corresponding to a
single bit of binary data. As such, a greater width allows greater amounts of data to
be transferred at the same time.

Address Bus
The address bus contains the connections between the microprocessor and memory
that carry the signals relating to the addresses which the CPU is processing at that
time, such as the locations that the CPU is reading from or writing to. The width of
the address bus corresponds to the maximum addressing capacity of the bus, or the
largest address within memory that the bus can work with. The addresses are
transferred in binary format, with each line of the address bus carrying a single
binary digit. Therefore the maximum address capacity is equal to two to the power of
the number of lines present (2^lines).
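The 2^lines relationship can be checked with a couple of lines of code. The widths chosen below are examples, not taken from the tutorial's model.

```python
# Each extra address line doubles the number of distinct memory locations
# the address bus can refer to.

def address_capacity(lines):
    """Maximum number of addressable locations for a bus of this width."""
    return 2 ** lines

print(address_capacity(16))   # 65536 locations for a 16-bit address bus
print(address_capacity(32))   # 4294967296 locations for a 32-bit address bus
```

This is why, for instance, a processor with a 16-bit address bus can address at most 64 KiB of byte-addressed memory.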
This concludes the look at the simplified model processor that will be used for the
remainder of this tutorial. The next section will look at the instruction execution
process, and how these different parts work together to execute programs. However,
before that, there's a chance to test what you've learnt in this section regarding
processor architecture. Click the next arrow below to take a short quiz relating to this
section of the tutorial.

Following on from looking at the structure and architecture of the central processing
unit itself, we shall now look at how the CPU is used to execute programs and make
the computer as a whole run smoothly and efficiently. To do this, we must take a
step back from concentrating solely on the processor, and look at the complete
computer unit.

A flow diagram illustrating the flow of data within the PC during program execution and the
saving of data. Further explanation can be found below.

When software is installed onto a modern day personal computer (most commonly
from a CD-ROM, though other media or downloading from the internet is also
common), code comprising the program and any associated files is stored on the
hard drive. This code consists of a series of instructions for performing designated
tasks, and data associated with these instructions. The code remains there until the
user chooses to execute the program in question, at which point sections of the code
are loaded into the computer's memory.

The CPU then executes the program from memory, processing each instruction in
turn. Of course, in order to execute the instructions, it is necessary for the CPU to
understand what the instruction is telling it to do. Therefore, recognition of the
instructions that could be encountered needs to be programmed into the processor.
The instructions that can be recognised by a processor are referred to as an
'instruction set', and are described in greater detail on the next page of the tutorial.

Once the instruction has been recognised, and the actions that should be carried out
are decided upon, the actions are then performed before the CPU proceeds on to the
next instruction in memory. This process is called the 'instruction execution cycle',
and is also covered later on in this tutorial. Results can then be stored back in the
memory, and later saved to the hard drive and possibly backed up onto removable
media or in separate locations. This is the same flow of information as when a
program is executed only in reverse, as illustrated in the diagram above.
On the next page of this tutorial is a more in-depth look at instruction sets. Click the
next arrow below to proceed.

As outlined in the introduction to this section, for a processor to be able to process
an instruction, it needs to be able to determine what the instruction is asking to be
carried out. For this to occur, the CPU needs to know what actions it may be asked to
perform, and have pre-determined methods available to carry out these actions. It is
this idea which is the reasoning behind the 'instruction set'.

When a processor is executing a program, the program is in machine language.
However, programmers almost never write their programs directly in this form.
While a program may not have been originally written this way, it is translated to
machine language at some point before execution so that it is understandable by the
CPU. Machine language can be directly interpreted by the hardware itself, and is
easily encoded as a string of binary bits and sent via electrical signals.

The instruction set is a collection of pre-defined machine codes, which the CPU is
designed to expect and be able to act upon when detected. Different processors have
different instruction sets, to allow for greater features, easier coding, and to cope
with changes in the actual architecture of the processor itself. Each machine code of
an instruction set consists of two separate fields:

Opcode | Operand(s)

The opcode is a short code which indicates what operation is expected to be
performed. Each operation has a unique opcode. The operand, or operands, indicate
where the data required for the operation can be found and how it can be accessed
(the addressing mode, which is discussed in full later). The length of a machine code
can vary - common lengths vary from one to twelve bytes in size.

The exact format of the machine codes is again CPU dependent. For the purpose of
this tutorial, we will presume we are using a 24-bit CPU. This means that the
minimum length of the machine codes used here should be 24 binary bits, which in
this instance are split as shown below:

Opcode - 6 bits (18-23) - allows for 64 unique opcodes (2^6)
Operand(s) - 18 bits (0-17) - 16 bits (0-15) for address values, and 2 bits (16-17)
for specifying the addressing mode to be used

Opcodes are also given mnemonics (short names) so that they can be easily referred
to in code listings and similar documentation. For example, an instruction to store
the contents of the accumulator in a given memory address could be given the
binary opcode 000001, which may then be referred to using the mnemonic STA
(short for STore Accumulator). Such mnemonics will be used for the examples on
upcoming pages.
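The bit layout above can be sketched as a small decoder. The only mapping taken from the text is STA = 000001; everything else (the dictionary, the example word) is illustrative.

```python
# Decoding the tutorial's 24-bit machine code format: bits 18-23 hold the
# opcode, bits 16-17 the addressing mode, bits 0-15 the address value.

MNEMONICS = {0b000001: "STA"}   # opcode 000001 -> STore Accumulator (from the text)

def decode(machine_code):
    opcode = (machine_code >> 18) & 0b111111   # top 6 bits
    mode = (machine_code >> 16) & 0b11         # next 2 bits
    operand = machine_code & 0xFFFF            # low 16 bits
    return MNEMONICS.get(opcode, "???"), mode, operand

# Build an example word: STA, addressing mode 0, address 255
word = (0b000001 << 18) | (0b00 << 16) | 0x00FF
print(decode(word))   # ('STA', 0, 255)
```

Masking and shifting like this is essentially what the control unit's decoder does in hardware.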

Now we know what form the data is in when it is read by the CPU, it is necessary to
learn about the cycle by which the instructions of a program are executed. This is the
topic of the next page of the tutorial,
which can be accessed by clicking the next
arrow below.

Once a program is in memory it has to be
executed. To do this, each instruction
must be looked at, decoded and acted
upon in turn until the program is
completed. This is achieved by the use of
what is termed the 'instruction execution
cycle', which is the cycle by which each
instruction in turn is processed. However,
to ensure that the execution proceeds
smoothly, it is also necessary to
synchronise the activities of the processor.

Diagram showing the basics of the instruction execution cycle. Each instruction is
fetched from memory, decoded, and then executed.

To keep the events synchronised, the clock located within the CPU control unit is
used. This produces regular pulses on the system bus at a specific frequency, so that
each pulse is an equal time following the last. This clock pulse frequency is linked to
the clock speed of the processor - the higher the clock speed, the shorter the time
between pulses. Actions only occur when a pulse is detected, so that commands can
be kept in time with each other across the whole computer unit.

The instruction execution cycle can be clearly divided into three different parts, which
will now be looked at in more detail. For more on each part of the cycle click the
relevant heading, or use the next arrow as before to proceed through each stage in
order.

Fetch Cycle
The fetch cycle takes the instruction required from memory, stores it in the
instruction register, and moves the program counter on one so that it points to the
next instruction.

Decode Cycle
Here, the control unit checks the instruction that is now stored within the instruction
register. It determines which opcode and addressing mode have been used, and as
such what actions need to be carried out in order to execute the instruction in
question.

Execute Cycle
The actual actions which occur during the execute cycle of an instruction depend on
both the instruction itself, and the addressing mode specified to be used to access
the data that may be required. However, four main groups of actions do exist, which
are discussed in full later on.
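The three stages above can be sketched as a loop. This is a deliberately simplified toy, not a real encoding: memory here holds (mnemonic, operand) pairs rather than 24-bit binary words, and the MOV/ADD/END mnemonics are assumed for illustration.

```python
# A toy fetch-decode-execute loop for the model CPU.

memory = [
    ("MOV", 7),    # place the value 7 in the accumulator (immediate)
    ("ADD", 5),    # add 5 to the accumulator
    ("END", None), # marks the end of the program
]

pc, acc = 0, 0
running = True
while running:
    ir = memory[pc]          # fetch: copy the instruction into the IR
    pc += 1                  # advance the program counter
    op, operand = ir         # decode: split into opcode and operand
    if op == "MOV":          # execute: act on the decoded opcode
        acc = operand
    elif op == "ADD":
        acc += operand
    elif op == "END":
        running = False

print(acc)   # 12
```

Note that the program counter is advanced during the fetch stage, before execution, exactly as described in the fetch cycle above.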

Clicking the next arrow below will take you to further information relating to the
fetch cycle.

The first part of the instruction execution cycle is the fetch cycle. To best illustrate
the actions that occur within the fetch cycle, there is an interactive animation below.
Once the instruction has been fetched and stored in the instruction register, it must
then be decoded. The decoding process is detailed on the next page, which can be
accessed by clicking the next arrow below.

Once the instruction has been fetched and is stored, the next step is to decode the
instruction in order to work out what actions should be performed to execute it. This
involves examining the opcode to see which of the machine codes in the CPU's
instruction set it corresponds to, and also checking which addressing mode needs to
be used to obtain any required data. Therefore, using the CPU model from this
tutorial, bits 16 to 23 should be examined.

Once the opcode is known, the execution cycle can occur. Different actions need to
be carried out dependent on the opcode, with no two opcodes requiring the same
actions to occur. However, there are generally four groups of different actions that
can occur:

• Transfer of data between the CPU and memory.
• Transfer of data between the CPU and an input or output device.
• Processing of data, possibly involving the use of the arithmetic and logic unit.
• A control operation, in order to change the sequence of subsequent
operations. These can possibly be conditional, based on the values stored at
that point within the flag register.

For greater simplicity, and as describing all the possible instructions is unnecessary,
the following tutorial pages will only look at a few possible instructions. These are:

Mnemonic - Description
MOV - Moves a data value from one location to another
ADD - Adds two data values using the ALU, and returns the result to the accumulator
STO - Stores the contents of the accumulator in the specified location
END - Marks the end of the program in memory
The four instructions used in the examples for the remainder of this section of the tutorial

The following three pages of the tutorial will look at the first two of these
instructions, and how they are executed in each of the three main addressing modes.
These addressing modes are:

Immediate addressing
With immediate addressing, no lookup of data is actually required. The data is
located within the operands of the instruction itself, not in a separate memory
location. This is the quickest of the addressing modes to execute, but the least
flexible. As such it is the least used of the three in practice.

Direct addressing
For direct addressing, the operands of the instruction contain the memory address
where the data required for execution is stored. For the instruction to be processed
the required data must be first fetched from that location.
Indirect addressing
When using indirect addressing, the operands give a location in memory similarly to
direct addressing. However, rather than the data being at this location, there is
instead another memory address given where the data actually is located. This is the
most flexible of the modes, but also the slowest as two data lookups are required.
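The three modes can be contrasted in a short sketch. The toy memory contents and the mode names as strings are assumptions; the # and @ notation follows the tutorial's own convention.

```python
# The three addressing modes, applied to a toy memory.

memory = {10: 42, 20: 10}

def fetch_operand(mode, operand):
    if mode == "immediate":   # '#': the operand itself IS the data
        return operand
    if mode == "direct":      # the operand is the address of the data
        return memory[operand]
    if mode == "indirect":    # '@': operand -> address -> data (two lookups)
        return memory[memory[operand]]
    raise ValueError(f"unknown addressing mode: {mode}")

print(fetch_operand("immediate", 10))  # 10: no memory access at all
print(fetch_operand("direct", 10))     # 42: one lookup, memory[10]
print(fetch_operand("indirect", 20))   # 42: memory[memory[20]] = memory[10]
```

The number of memory lookups (zero, one or two) is exactly what makes immediate the fastest mode and indirect the slowest but most flexible.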

The next page looks at immediate addressing. Click the next arrow below to proceed.

The first of the three addressing modes to be looked at is immediate addressing.
When writing out the code in mnemonic form, operands that require this mode are
marked with a # symbol. With immediate addressing, the data required for
execution of the instruction is located directly within the operands of the instruction
itself. No lookup of data from memory is required.

To best illustrate the methods used by immediate addressing there is an interactive
animation below.

The next of the three addressing modes that will be looked at is direct addressing. To
proceed, click the next arrow below.

The second of the three addressing modes to be looked at is direct addressing.
When writing out the code in mnemonic form, no symbol is required to mark
operands which use this form. Direct addressing means that the operands of the
instruction hold the address of the location in memory where the data required can
be found. The data is then fetched from this location in order to allow the instruction
to be executed.

To best illustrate the methods used by direct addressing there is an interactive
animation below.

The final of the three modes of addressing to be looked at is indirect addressing. To
proceed to the next page where this mode is covered, please click the next arrow
button below.

The final of the three addressing modes to be looked at is indirect addressing.
When writing out the code in mnemonic form, operands that require this mode are
marked with an @ symbol. Indirect addressing means that the memory address given
in the operands of the instruction is not the location of the actual data required.
Instead, that address holds a further address, at which the data is stored. This can
prove useful if decisions need to be made within the execution, as the memory
address used in processing can be changed during execution.

To best illustrate the methods used by indirect addressing there is an interactive
animation below.

Now that we have covered all the stages of the instruction execution process, and
also the three main addressing modes that are used, we are able to examine the full
execution of simple programs. The next page of the tutorial shows the full execution
of one such simple program, and is available by clicking on the next arrow button
below.
In the previous two sections the basics of the workings and architecture of the
central processing unit have been explained. There has been a general look at a
simple processor architecture, an explanation of the method by which instructions
are executed, and how the various different addressing modes affect how the CPU
processes instructions. However, modern CPUs are very rarely as simple as the ones
that have been discussed thus far.

While the information covered up to this point is still applicable and relevant to the
majority of microprocessors, many refinements to the workings and architecture
have also been implemented. In this final section of the tutorial there will be a brief
look at three main areas where these refinements have occurred:

Pipelining
This is a method by which the processor can be involved in the execution of more
than a single instruction at one time. Understandably, this enables the execution of
the program to be completed with greater speed, but is not without complications
and problems. These have to be overcome by careful design.

CISC and RISC architectures
Over the course of the development of the modern day processor, two competing
architectures have emerged. CISC and RISC have several major differences in
features and ideas, but both were designed with the intention of improving CPU
performance. Current processors tend not to be strictly adherent to either
architecture, instead being a mix of the two ideals.

Modern architectures
Outside of pipelining, RISC and CISC, many other improvements to the general
architecture of the microprocessor have been developed. These are in many differing
areas such as cache memory and specialised instruction set extensions. New
advancements are added with each new generation of processors.

The first of these areas to be covered is the topic of pipelining. Click the next arrow
below to read more about the topic.

Up until this point in the tutorial we have assumed that the processor is only able to
process one instruction at a time. All examples have shown an instruction having to
be executed in full before the next one can be started on. However, this is not how
modern CPUs work. Pipelining is the name given to the process by which the
processor can be working on more than one instruction at once.

The simplest way to approach pipelining is to consider the three stage fetch, decode
and execute instruction execution cycle outlined earlier. There are times during each
of these subcycles of the main cycle where the main memory is not being accessed,
and the CPU could be considered 'idle'. The idea, therefore, is to begin the fetch
stage for a second instruction while the first stage is being decoded. Then, when
instruction one is being executed and instruction two is being decoded, a third
instruction can be fetched.

Below is an interactive animation that demonstrates the benefits which this simple
form of pipelining can produce.
Across the nine time cycles shown above, the non pipelined method manages to
completely execute three instructions. With pipelining, seven instructions are
executed in full - and another two are started. However, pipelining is not without
problems, and does not necessarily work as well as this. For more on the problems
associated with pipelining and how they can be overcome, click the next arrow
below.
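The instruction counts quoted above can be reproduced with simple arithmetic, assuming an idealised three-stage pipeline with one stage per time cycle.

```python
# Instruction throughput over a fixed number of cycles: serial execution
# versus an ideal 3-stage pipeline.

STAGES = 3
CYCLES = 9

# Serial: each instruction occupies all three stages before the next starts.
completed_serial = CYCLES // STAGES            # 9 // 3 = 3 instructions

# Pipelined: after a (STAGES - 1)-cycle fill, one instruction finishes per cycle.
completed_pipelined = CYCLES - (STAGES - 1)    # 9 - 2 = 7 instructions
in_flight = STAGES - 1                         # 2 more started but unfinished

print(completed_serial, completed_pipelined, in_flight)   # 3 7 2
```

This idealised model ignores the stalls and hazards discussed next, which is exactly why real pipelines fall short of these figures.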

While pipelining can dramatically cut the time taken to execute a program, there are
problems that cause it to not work as well as it perhaps should. The three stages of
the instruction execution process do not necessarily take an equal amount of time,
with the time taken for 'execute' being generally longer than 'fetch'. This makes it
much harder to synchronise the various stages of the different instructions. Also,
some instructions may be dependent on the results of other earlier instructions. This
can arise when data produced earlier needs to be used, or when a conditional branch
based on a previous outcome is used.

One of the simplest ways to reduce the effects of these problems is to break the
instruction execution cycle into stages that are more likely to be of equal
duration. For example, the diagram below shows how the cycle can be broken down
into six stages rather than three:

Diagram showing the differences between the common 3 stage model of the instruction
execution cycle, and the 6 stage model used in more advanced pipelining.

However, while this may solve some of the problems outlined above, it creates
further problems of its own. Firstly, it is not always the case that an instruction
will use all six of these stages. Simple load instructions, for example, will not
require the final 'write operand' stage, which may upset the synchronisation. There
is also the matter of potential conflicts within the memory system, as three of the
above stages (fetch instruction, fetch operands, write operand) require access to
memory. Many memory management systems will not allow three separate instructions to
access memory at once, and hence the pipelining is not as beneficial as it would
first seem.
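The cost of that memory contention can be estimated with a rough back-of-envelope model. This is not a cycle-accurate simulation: it simply assumes that a single memory port can admit one new instruction every three cycles in steady state, since each instruction needs three memory accesses.

```python
STAGES = 6       # fetch instr, decode, calc addresses, fetch operands, execute, write operand
MEM_STAGES = 3   # the three stages that need a memory access

def unpipelined_cycles(n):
    """Each instruction runs all six stages before the next begins."""
    return STAGES * n

def ideal_cycles(n):
    """Perfect six-stage pipeline: one instruction completes per cycle once full."""
    return STAGES + (n - 1)

def port_limited_cycles(n):
    """Single memory port: in steady state a new instruction can only be
    admitted every MEM_STAGES cycles, so the last of n instructions
    finishes roughly MEM_STAGES cycles after its predecessor."""
    return STAGES + (n - 1) * MEM_STAGES

for n in (1, 4, 10):
    print(n, unpipelined_cycles(n), port_limited_cycles(n), ideal_cycles(n))
```

Even under this memory-port limit the pipeline still beats the unpipelined processor, but it achieves only a third of the ideal throughput - which is why real designs add separate instruction and data paths or multiple memory ports.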

On top of this, the problem of conditional branching and result-dependent
instructions remains. This means that the processor needs to be carefully designed
in order to cope with these potential interruptions to the flow of data. As you can
tell, there are many issues which need to be taken into consideration when using the
technique of pipelining. While it is a powerful technique for increasing CPU
performance, it requires careful design and consideration in order to achieve the
best possible results.
Next, we shall move on to look at the competing architectures of CISC and RISC.

Years of development have been undertaken into improving the architecture of the
central processing unit, with the main aim of improving performance. Two competing
architectures were developed for this purpose, and different processors conformed to
each one. Both had their strengths and weaknesses, and as such also had supporters
and detractors.

CISC: Complex Instruction Set Computers


Earlier developments were based around the idea that making the CPU more
complex and supporting a larger number of potential instructions would lead to
increased performance. This idea is at the root of CISC processors, such as the Intel
x86 range, which have very large instruction sets reaching up to and above three
hundred separate instructions. They also have increased complexity in other areas,
with many more specialised addressing modes and registers, and instruction codes of
variable length.

Performance was improved here by allowing the simplification of program compilers,
as the range of more advanced instructions available meant fewer refinements had to
be made during compilation. However, the resulting complexity of the processor
hardware and architecture can make such chips difficult to understand and program
for, and also expensive to produce.

RISC: Reduced Instruction Set Computers


In opposition to CISC, the mid-1980s saw the beginnings of the RISC philosophy.
The idea here was that the best way to improve performance would be to simplify
the processor workings as much as possible. RISC processors, such as the IBM
PowerPC processor, have a greatly simplified and reduced instruction set, numbering
in the region of one hundred instructions or fewer. Addressing modes are simplified
back to four or fewer, and the length of the instruction codes is fixed in order to
allow standardisation across the instruction set.

Changing the architecture to this extent means that fewer transistors are needed to
produce the processors, which makes RISC chips much cheaper to produce than their
CISC counterparts. The reduced instruction set also means that the processor can
execute each instruction more quickly, potentially allowing for greater speeds.
However, only allowing such simple instructions places a greater burden upon the
software itself: fewer instructions in the instruction set mean a greater emphasis
on writing efficient software with the instructions that are available. Supporters
of the CISC architecture point out that their processors offer good enough
performance and cost to make such efforts not worth the trouble.

CISC                 | Feature               | RISC
Large (100 to 300)   | Instruction set       | Small (100 or less)
Complex (8 to 20)    | Addressing modes      | Simple (4 or less)
Specialised          | Instruction format    | Simple
Variable             | Code lengths          | Fixed
Variable             | Execution cycles      | Standard for most
Higher               | Cost / CPU complexity | Lower
Compilation          | Simplifies            | Processor design
Processor design     | Complicates           | Software
Summary of the main differences between the two competing architectures

Looking at the most modern processors, it becomes evident that the rivalry between
CISC and RISC is no longer of great importance. This is because the two
architectures are converging, with CPUs from each side incorporating ideas from the
other. CISC processors now use many of the same techniques as RISC ones, while the
reduced instruction sets of RISC processors contain similar numbers of instructions
to those found in certain CISC chips. However, it is still important that you
understand the ideas behind these two differing architectures, and why each design
path was chosen.

The final page of this section of the tutorial looks at other improvements to
modern CPU architectures.

On top of the topics already covered in this section, there are many other ways in
which companies who manufacture microprocessors have attempted to improve the
performance of their CPUs. These are generally at too high a level to be discussed
within this tutorial, but this page contains a brief introduction to three of the
most common of these other techniques.

The Pentium 4 is the first of Intel's processors to make use of the new
hyperthreading technology

Cache memory
This is a small amount of high-speed memory used specifically as a fast and effective
method of storage for commonly used instructions. Most programs end up accessing
the same data and instructions over and over again at some point in their execution.
Placing these in higher speed storage, such as a cache, provides a great
improvement in the time taken for processing over continual accessing from the
main memory at a slower speed.
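This effect of locality can be illustrated with a toy direct-mapped cache model (a sketch of the principle only - real caches track address tags and whole cache lines rather than single addresses):

```python
class DirectMappedCache:
    """Toy direct-mapped cache: each address maps to exactly one slot."""
    def __init__(self, n_slots):
        self.n_slots = n_slots
        self.slots = [None] * n_slots   # the address currently held in each slot
        self.hits = self.misses = 0

    def access(self, address):
        slot = address % self.n_slots
        if self.slots[slot] == address:
            self.hits += 1              # fast: already in the cache
        else:
            self.misses += 1            # slow: fetch from main memory...
            self.slots[slot] = address  # ...and keep a copy for next time

cache = DirectMappedCache(8)
# A small loop body touches the same few addresses over and over:
for _ in range(100):
    for addr in (0, 1, 2, 3):
        cache.access(addr)
print(cache.hits, cache.misses)  # 396 4 -- only the very first pass misses
```

Because the program keeps revisiting the same four addresses, only 1% of the accesses have to go all the way to the slower main memory.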

Home computer processors have traditionally implemented the cache directly within
the processor's own architecture, in what is known as a 'Level 1' cache. The most
modern CPUs also make use of external caches, referred to as 'Level 2' caches,
which are much larger than 'Level 1' caches. More recent processors have larger
caches - for instance, the Intel 486 had a cache of only eight kilobytes, while the
Pentium II used multiple stores totalling up to two megabytes of storage space.

Specialised instruction set extensions


The most commonly known extensions to the traditional CPU instruction set are
Intel's MMX and AMD's 3DNow! technology. These both come into use when the
processor is asked to perform operations involving graphics, audio and video, and
consist of a number of specific instructions specialised to perform the short
repetitive tasks that make up the large majority of multimedia processing. These
extensions use SIMD (Single Instruction, Multiple Data) instructions in order to
greatly reduce the time taken, as such instructions perform their operations on
multiple pieces of data at the same time.
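The SIMD idea can be illustrated in plain Python using 'SIMD within a register': pack four 8-bit values into one 32-bit word, then add all four lanes with a handful of word-wide operations. This is a sketch of the principle only, not how MMX or 3DNow! instructions are actually programmed:

```python
HI = 0x80808080  # the top bit of each of the four 8-bit lanes

def pack(lanes):
    """Pack four 8-bit values into a single 32-bit word."""
    word = 0
    for v in lanes:
        word = (word << 8) | (v & 0xFF)
    return word

def unpack(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]

def simd_add(x, y):
    """Add all four lanes at once (each mod 256), masking the lane top
    bits so that no carry leaks from one lane into the next."""
    low = (x & ~HI) + (y & ~HI)   # add the low 7 bits of every lane in one go
    return low ^ ((x ^ y) & HI)   # then restore each lane's top bit

a = pack([1, 2, 3, 250])
b = pack([5, 6, 7, 10])
print(unpack(simd_add(a, b)))  # [6, 8, 10, 4] -- the last lane wraps: 260 mod 256
```

One 'instruction' here does the work of four separate additions, which is exactly the saving SIMD hardware provides - except that the hardware does it in dedicated circuitry on much wider registers.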

MMX makes use of fifty-seven SIMD instructions, while the Pentium 4 raises this
number to one hundred and forty-four. This includes further extensions to improve
operations relating to internet activity, such as the streaming of music and video
files. The improved 3DNow! technology found in the AMD Athlon processor also
contains SIMD instructions for this purpose. Such extensions ultimately enhance the
performance of the processor in activities relating to gaming, multimedia
applications, and use of the internet and other forms of communication.

Hyperthreading
Hyperthreading is a new technology, introduced by Intel with their most recent
Pentium 4 processors. It works by using what is known as 'simultaneous
multithreading' to make a single processor appear to the computer's operating
system as multiple logical processors. This enables the CPU, via the use of shared
hardware resources, to execute multiple separate parts of a program (or 'threads')
at the same time.

This technology does not provide the same performance increase as actual separate
processors would, but it provides a considerable boost for less cost and power
consumption than multiple processors would require. Current processors such as the
aforementioned Pentium 4 split the CPU into two logical processors. Intel are
working on further advancements which will enable higher numbers of threads to be
executed simultaneously.
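Hyperthreading itself is invisible to ordinary programs: what software sees is simply extra logical processors for the operating system to schedule threads onto. A minimal sketch of a program split into threads, using Python's standard library (the function and variable names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """One thread of the program: total up its own slice of the data."""
    return sum(chunk)

data = list(range(1000))
halves = [data[:500], data[500:]]

# Two workers, mirroring the two logical processors that a hyperthreaded
# Pentium 4 presents to the operating system.
with ThreadPoolExecutor(max_workers=2) as pool:
    total = sum(pool.map(partial_sum, halves))

print(total)  # 499500 -- the same result as summing the data in one thread
```

On a hyperthreaded CPU the operating system can run two such threads at once on the shared hardware resources of a single physical core.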

This concludes the section on the further features of the modern microprocessor.
Following this page is a multiple-choice quiz with which you can test your
knowledge from this section.

You have reached the conclusion of the microprocessor tutorial. Hopefully you should
now have a greater understanding of the architecture of the microprocessor and how
it works.
