CPU Structure
This section, using a simplified model of a central processing unit as an example,
takes you through the role of each of the major constituent parts of the CPU. It also
looks more closely at each part, and examines how they are constructed and how
they perform their role within the microprocessor.
Instruction Execution
Once you are familiar with the various elements of the processor, this section looks
at how they work together to process and execute a program. It looks at how the
various instructions that form the program are recognised, together with the
processes and actions that are carried out during the instruction execution cycle
itself.
Further Features
Now that the basics have been covered, this section explores the further
advancements in the field of microprocessor architecture that have occurred in recent
years. Explanations of such techniques as pipelining and hyperthreading are
provided, together with a look at cache memory and trends in CPU architecture.
Each section also concludes with a multiple choice quiz with which you can test your
knowledge, while some also contain interactive animations in order to improve your
learning experience. These animations are in Macromedia Flash format, and will
require Flash Player to be installed on your computer. If it is not, please visit the
Macromedia website in order to download and install the browser plug-in.
The first section of this tutorial relates to the structure of the central processing unit.
Please click the button marked with the next arrow below to proceed.
As there are a great many variations in
architecture between the different kinds of
CPU, we shall begin by looking at a
simplified model of the structure. The
model to be used can be seen on the right
of this page, and is a good basis on which
to build your knowledge of the workings of
a microprocessor. The simplified model
consists of five parts, which are:
Register Array
This is a small amount of internal memory that is used for the quick storage and
retrieval of data and instructions. All processors include some common registers used
for specific functions, namely the program counter, instruction register, accumulator,
memory address register and stack pointer. For more, click the title above.
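As a rough sketch, the register array described above can be modelled as a simple structure. The names follow the registers listed in the text, but the initial values and the class itself are purely illustrative, not a description of any real processor.

```python
# A minimal sketch of the common registers described above.
class RegisterArray:
    def __init__(self):
        self.pc = 0    # program counter: address of the next instruction
        self.ir = 0    # instruction register: the instruction being executed
        self.acc = 0   # accumulator: holds results of ALU operations
        self.mar = 0   # memory address register: address currently being accessed
        self.sp = 0    # stack pointer: top of the stack in memory

regs = RegisterArray()
regs.pc += 1           # e.g. advancing to the next instruction after a fetch
```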
System Bus
This comprises the control bus, data bus and address bus. It is used for
connections between the processor, memory and peripherals, and for the transfer of data
between the various parts. Click the title above for more.
Memory
The memory is not an actual part of the CPU itself, and is instead housed elsewhere
on the motherboard. However, it is here that the program being executed is stored,
and as such is a crucial part of the overall structure involved in program execution.
For further information on the memory, please see the separate tutorial if available.
For more information on these parts of the CPU, click the corresponding title of the
description above. You could also click on the part in question on the diagram to the
right. Alternatively, click the right arrow button below to move on to the next page,
which looks at the arithmetic and logic unit.
The ALU, or the arithmetic and logic unit,
is the section of the processor that is
involved with executing operations of an
arithmetic or logical nature. It works in
conjunction with the register array for
many of these, in particular, the
accumulator and flag registers. The
accumulator holds the results of
operations, while the flag register contains
a number of individual bits that are used
to store information about the last
operation carried out by the ALU. More on
these registers can be found in the
register array section.
The topic of logic gates is too expansive and detailed to be covered in full here. Many
resources exist on the internet and elsewhere relating to this topic, however, so it is
recommended that you read further into the areas outlined above to aid with your
learning.
Logical tests
Further logic gates are used within the ALU to perform a number of different logical
tests, including seeing if an operation produces a result of zero. Most of these logical
tests are used to then change the values stored in the flag register, so that they may
be checked later by separate operations or instructions. Others produce a result
which is then stored, and used later in further processing.
Comparison
Comparison operations compare values in order to determine such things as whether
one number is greater than, less than or equal to another. These operations can be
performed by subtraction of one of the numbers from the other, and as such can be
handled by the aforementioned logic gates. However, it is not strictly necessary for
the result of the calculation to be stored in this instance, as the amount by which the
values differ is not required. Instead, the appropriate status flags in the flag register
are set and checked to determine the result of the operation.
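The zero test and comparison-by-subtraction described above can be sketched as follows. This is an illustrative model rather than the circuitry of any real ALU, and the flag names used are assumptions.

```python
# Illustrative model of comparison by subtraction: the difference
# itself is discarded, and only the status flags are kept.
def compare(a, b):
    result = a - b
    return {
        "zero": result == 0,      # set when the two values are equal
        "negative": result < 0,   # set when a is less than b
    }

flags = compare(3, 7)   # 3 < 7: zero flag clear, negative flag set
```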
Bit shifting
Shifting operations move bits left or right
within a word, with different operations
filling the gaps created in different ways.
This is accomplished via the use of a shift
register, which uses pulses from the clock
within the control unit to trigger a chain
reaction of movement across the bits that
make up the word. Again, this is a quite
complicated logical procedure, and further
reading may aid your understanding.
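The shifting operations described above can be sketched in code. An 8-bit word width is an arbitrary choice for illustration; the mask models bits "falling off" the end of the shift register, with the gaps filled by zeros.

```python
# Logical shifts on an 8-bit word.
WIDTH = 8
MASK = (1 << WIDTH) - 1    # 0b11111111 for an 8-bit word

def shift_left(word, places=1):
    return (word << places) & MASK   # bits shifted past bit 7 are lost

def shift_right(word, places=1):
    return (word & MASK) >> places   # bits shifted past bit 0 are lost

value = 0b00010110             # 22
left = shift_left(value)       # 0b00101100 (44): shifting left doubles
right = shift_right(value)     # 0b00001011 (11): shifting right halves
```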
Decoder
This is used to decode the instructions that make up a program when they are being
processed, and to determine what actions must be taken in order to process them.
These decisions are normally taken by looking at the opcode of the instruction,
together with the addressing mode used. This is covered in greater detail in the
instruction execution section of this tutorial.
Timer or clock
The timer or clock ensures that all processes and instructions are carried out and
completed at the right time. Pulses are sent to the other areas of the CPU at regular
intervals (related to the processor clock speed), and actions only occur when a pulse
is detected. This ensures that the actions themselves also occur at these same
regular intervals, meaning that the operations of the CPU are synchronised.
No further detail on the control unit is required at this stage, though it is clear that
there is much detail at lower levels that has yet to be touched on. However, to move
on to the next element of the processor
(the register array), please click the next
button below.
Control Bus
The control bus carries the signals relating to the control and co-ordination of the
various activities across the computer, which can be sent from the control unit within
the CPU. Different architectures result in differing numbers of lines of wire within the
control bus, as each line is used to perform a specific task. For instance, different,
specific lines are used for each of read, write and reset requests.
Data Bus
This is used for the exchange of data between the processor, memory and
peripherals, and is bi-directional so that it allows data flow in both directions along
the wires. Again, the number of wires used in the data bus (sometimes known as the
'width') can differ. Each wire is used for the transfer of signals corresponding to a
single bit of binary data. As such, a greater width allows greater amounts of data to
be transferred at the same time.
Address Bus
The address bus contains the connections between the microprocessor and memory
that carry the signals relating to the addresses which the CPU is processing at that
time, such as the locations that the CPU is reading from or writing to. The width of
the address bus corresponds to the maximum addressing capacity of the bus, or the
largest address within memory that the bus can work with. The addresses are
transferred in binary format, with each line of the address bus carrying a single
binary digit. Therefore the maximum address capacity is equal to two to the power of
the number of lines present (2^lines).
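This relationship can be checked with a quick calculation; the 16-line bus is simply an illustrative example, not a claim about any specific processor.

```python
# Maximum addressing capacity is two to the power of the number of
# address lines, as stated above.
lines = 16
capacity = 2 ** lines      # number of distinct addresses the bus can carry
print(capacity)            # 65536 locations, addresses 0 to 65535
```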
This concludes the look at the simplified model processor that will be used for the
remainder of this tutorial. The next section will look at the instruction execution
process, and how these different parts work together to execute programs. However,
before that, there's a chance to test what you've learnt in this section regarding
processor architecture. Click the next arrow below to take a short quiz relating to this
section of the tutorial.
Following on from looking at the structure and architecture of the central processing
unit itself, we shall now look at how the CPU is used to execute programs and make
the computer as a whole run smoothly and efficiently. To do this, we must take a
step back from concentrating solely on the processor, and look at the complete
computer unit.
A flow diagram illustrating the flow of data within the PC during program execution and the
saving of data. Further explanation can be found below.
When software is installed onto a modern day personal computer (most commonly
from a CD-ROM, though other media or downloading from the internet is also
common), code comprising the program and any associated files is stored on the
hard drive. This code comprises a series of instructions for performing designated
tasks, and data associated with these instructions. The code remains there until the
user chooses to execute the program in question, at which point sections of the code
are loaded into the computer's memory.
The CPU then executes the program from memory, processing each instruction in
turn. Of course, in order to execute the instructions, it is necessary for the CPU to
understand what the instruction is telling it to do. Therefore, recognition of the
instructions that could be encountered needs to be programmed into the processor.
The instructions that can be recognized by a processor are referred to as an
'instruction set', and are described in greater detail on the next page of the tutorial.
Once the instruction has been recognized, and the actions that should be carried out
are decided upon, the actions are then performed before the CPU proceeds on to the
next instruction in memory. This process is called the 'instruction execution cycle',
and is also covered later on in this tutorial. Results can then be stored back in the
memory, and later saved to the hard drive and possibly backed up onto removable
media or in separate locations. This is the same flow of information as when a
program is executed, only in reverse, as illustrated in the diagram above.
On the next page of this tutorial is a more in-depth look at instruction sets. Click the
next arrow below to proceed.
The instruction set is a collection of pre-defined machine codes, which the CPU is
designed to expect and be able to act upon when detected. Different processors have
different instruction sets, to allow for greater features, easier coding, and to cope
with changes in the actual architecture of the processor itself. Each machine code of
an instruction set consists of two separate fields:
Opcode Operand(s)
The exact format of the machine codes is again CPU dependent. For the purpose of
this tutorial, we will presume we are using a 24-bit CPU. This means that the
minimum length of the machine codes used here should be 24 binary bits, which in
this instance are split as shown below:

Bits 16 to 23   Opcode
Bits 0 to 15    Operand(s)
Opcodes are also given mnemonics (short names) so that they can be easily referred
to in code listings and similar documentation. For example, an instruction to store
the contents of the accumulator in a given memory address could be given the
binary opcode 000001, which may then be referred to using the mnemonic STA
(short for STore Accumulator). Such mnemonics will be used for the examples on
upcoming pages.
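The STA example above can be sketched in code. The 8-bit opcode field in bits 16 to 23 and the 16-bit operand field are features of this tutorial's simplified 24-bit model, and the operand address used here is an arbitrary illustration.

```python
# Encoding a machine code for the assumed 24-bit format: an 8-bit
# opcode in bits 16 to 23 and a 16-bit operand field in bits 0 to 15.
STA = 0b000001                      # mnemonic STA -> opcode 000001

def encode(opcode, operand):
    return (opcode << 16) | (operand & 0xFFFF)

machine_code = encode(STA, 0x01A4)  # STore Accumulator at an arbitrary address
print(f"{machine_code:024b}")       # the full 24-bit machine code
```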
Now we know what form the data is in when it is read by the CPU, it is necessary to
learn about the cycle by which the instructions of a program are executed. This is the
topic of the next page of the tutorial,
which can be accessed by clicking the next
arrow below.
To keep the events synchronised, the clock located within the CPU control unit is
used. This produces regular pulses on the system bus at a specific frequency, so that
each pulse is an equal time following the last. This clock pulse frequency is linked to
the clock speed of the processor - the higher the clock speed, the shorter the time
between pulses. Actions only occur when a pulse is detected, so that commands can
be kept in time with each other across the whole computer unit.
The instruction execution cycle can be clearly divided into three different parts, which
will now be looked at in more detail. For more on each part of the cycle click the
relevant heading, or use the next arrow as before to proceed through each stage in
order.
Fetch Cycle
The fetch cycle takes the instruction required from memory, stores it in the
instruction register, and increments the program counter so that it points to the next
instruction.
Decode Cycle
Here, the control unit checks the instruction that is now stored within the instruction
register. It determines which opcode and addressing mode have been used, and as
such what actions need to be carried out in order to execute the instruction in
question.
Execute Cycle
The actual actions which occur during the execute cycle of an instruction depend on
both the instruction itself, and the addressing mode specified to be used to access
the data that may be required. However, four main groups of actions do exist, which
are discussed in full later on.
Clicking the next arrow below will take you to further information relating to the
fetch cycle.
The first part of the instruction execution cycle is the fetch cycle. To best illustrate
the actions that occur within the fetch cycle, there is an interactive animation below.
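In case the animation is unavailable, the fetch cycle can be sketched under this tutorial's simplified model. Memory here is a plain list standing in for main memory, and the machine code values are illustrative.

```python
# A sketch of the fetch cycle: the program counter addresses memory,
# the instruction is copied into the instruction register, and the
# program counter is incremented.
memory = [0x0101A4, 0x020010, 0x000000]   # illustrative machine codes

pc = 0             # program counter holds the address of the next instruction
ir = memory[pc]    # fetch: copy the addressed instruction into the instruction register
pc = pc + 1        # increment the program counter to point at the next instruction
```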
Once the instruction has been fetched and stored in the instruction register, it must
then be decoded. The decoding process is detailed on the next page, which can be
accessed by clicking the next arrow below.
Once the instruction has been fetched and is stored, the next step is to decode the
instruction in order to work out what actions should be performed to execute it. This
involves examining the opcode to see which of the machine codes in the CPU's
instruction set it corresponds to, and also checking which addressing mode needs to
be used to obtain any required data. Therefore, using the CPU model from this
tutorial, bits 16 to 23 should be examined.
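Extracting those bits can be sketched with a shift and a mask. The instruction value used here is illustrative, and the field widths follow the assumed 24-bit model described earlier.

```python
# Decoding sketch: isolate the opcode (bits 16 to 23) and the operand
# field (bits 0 to 15) of a 24-bit machine code.
instruction = 0x0101A4                  # an illustrative machine code
opcode = (instruction >> 16) & 0xFF     # bits 16 to 23
operands = instruction & 0xFFFF         # bits 0 to 15
# opcode is now 0b00000001, ready to be matched against the instruction set
```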
Once the opcode is known, the execute cycle can occur. Different actions need to
be carried out depending on the opcode, with each opcode requiring its own
sequence of actions. However, there are generally four groups of different actions that
can occur:
For greater simplicity, and as describing all the possible instructions is unnecessary,
the following tutorial pages will only look at a few possible instructions. These are:
Mnemonic   Description
MOV        Moves a data value from one location to another
ADD        Adds two data values using the ALU, and returns the result to the accumulator
STO        Stores the contents of the accumulator in the specified location
END        Marks the end of the program in memory
The four instructions used in the examples for the remainder of this section of the tutorial
The following three pages of the tutorial will look at the first two of these
instructions, and how they are executed in each of the three main addressing modes.
These addressing modes are:
Immediate addressing
With immediate addressing, no lookup of data is actually required. The data is
located within the operands of the instruction itself, not in a separate memory
location. This is the quickest of the addressing modes to execute, but the least
flexible. As such it is the least used of the three in practice.
Direct addressing
For direct addressing, the operands of the instruction contain the memory address
where the data required for execution is stored. For the instruction to be processed
the required data must be first fetched from that location.
Indirect addressing
When using indirect addressing, the operands give a location in memory similarly to
direct addressing. However, rather than the data being at this location, there is
instead another memory address given where the data actually is located. This is the
most flexible of the modes, but also the slowest as two data lookups are required.
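The three modes can be contrasted with a short sketch. The memory contents and addresses are illustrative, and a Python dictionary stands in for main memory.

```python
# Fetching an operand's value under each of the three addressing modes.
memory = {20: 99, 30: 20}

def immediate(operand):
    return operand                  # no lookup: the operand *is* the data

def direct(operand):
    return memory[operand]          # one lookup: the operand is the data's address

def indirect(operand):
    return memory[memory[operand]]  # two lookups: operand -> address -> data

immediate(20)   # 20
direct(20)      # 99
indirect(30)    # 99  (30 holds address 20, which holds the data 99)
```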
The next page looks at immediate addressing. Click the next arrow below to proceed.
The next of the three addressing modes that will be looked at is direct addressing. To
proceed, click the next arrow below.
Now that we have covered all the stages of the instruction execution process, and
also the three main addressing modes that are used, we are able to examine the full
execution of simple programs. The next page of the tutorial shows the full execution
of one such simple program, and is available by clicking on the next arrow button
below.
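As a stand-in for that page, the full cycle can be sketched as a tiny interpreter using the four example instructions with direct addressing. The tuple encoding and the data addresses are simplifications for readability, not the 24-bit machine codes themselves.

```python
# Executing a small program: fetch, decode and execute each
# instruction in turn until END is reached.
memory = {0x10: 7, 0x11: 5, 0x12: 0}   # data area (illustrative addresses)
program = [
    ("MOV", 0x10),   # move the value at 0x10 into the accumulator
    ("ADD", 0x11),   # add the value at 0x11; result to the accumulator
    ("STO", 0x12),   # store the accumulator at 0x12
    ("END",),        # marks the end of the program
]

acc = 0
pc = 0
while True:
    instruction = program[pc]    # fetch
    pc += 1
    op = instruction[0]          # decode
    if op == "MOV":              # execute
        acc = memory[instruction[1]]
    elif op == "ADD":
        acc = acc + memory[instruction[1]]
    elif op == "STO":
        memory[instruction[1]] = acc
    elif op == "END":
        break

print(memory[0x12])   # 12: the sum of the two stored values
```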
In the previous two sections the basics of the workings and architecture of the
central processing unit have been explained. There has been a general look at a
simple processor architecture, an explanation of the method by which instructions
are executed, and how the various different addressing modes affect how the CPU
processes instructions. However, modern CPUs are very rarely as simple as the ones
that have been discussed thus far.
While the information covered up to this point is still applicable and relevant to the
majority of microprocessors, many refinements to the workings and architecture
have also been implemented. In this final section of the tutorial there will be a brief
look at three main areas where these refinements have occurred:
Pipelining
This is a method by which the processor can be involved in the execution of more
than a single instruction at one time. Understandably, this enables the execution of
the program to be completed with greater speed, but is not without complications
and problems. These have to be overcome by careful design.
Modern architectures
Outside of pipelining, RISC and CISC, many other improvements to the general
architecture of the microprocessor have been developed. These are in many differing
areas such as cache memory and specialised instruction set extensions. New
advancements are added with each new generation of processors.
The first of these areas to be covered is the topic of pipelining. Click the next arrow
below to read more about the topic.
Up until this point in the tutorial we have assumed that the processor is only able to
process one instruction at a time. All examples have shown an instruction having to
be executed in full before the next one can be started on. However, this is not how
modern CPUs work. Pipelining is the name given to the process by which the
processor can be working on more than one instruction at once.
The simplest way to approach pipelining is to consider the three stage fetch, decode
and execute instruction execution cycle outlined earlier. There are times during each
of these subcycles of the main cycle where the main memory is not being accessed,
and the CPU could be considered 'idle'. The idea, therefore, is to begin the fetch
stage for a second instruction while the first is being decoded. Then, when
instruction one is being executed and instruction two is being decoded, a third
instruction can be fetched.
Below is an interactive animation that demonstrates the benefits which this simple
form of pipelining can produce.
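In case the animation is unavailable, its comparison can be reproduced with a quick calculation based on the three-stage cycle described above.

```python
# Instructions completed in nine time cycles, without and with a
# three-stage (fetch, decode, execute) pipeline.
CYCLES = 9
STAGES = 3

# Without pipelining, each instruction occupies all three stages in turn.
completed_serial = CYCLES // STAGES            # 3 instructions

# With pipelining, the first instruction completes at cycle 3, and one
# more completes on every cycle after that.
completed_pipelined = CYCLES - (STAGES - 1)    # 7 instructions
started_unfinished = STAGES - 1                # 2 more are in flight

print(completed_serial, completed_pipelined, started_unfinished)
```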
Across the nine time cycles shown above, the non-pipelined method manages to
completely execute three instructions. With pipelining, seven instructions are
executed in full - and another two are started. However, pipelining is not without
problems, and does not necessarily work as well as this. For more on the problems
associated with pipelining and how they can be overcome, click the next arrow
below.
While pipelining can greatly cut the time taken to execute a program, there are
problems that cause it to not work as well as it perhaps should. The three stages of
the instruction execution process do not necessarily take an equal amount of time,
with the time taken for 'execute' being generally longer than 'fetch'. This makes it
much harder to synchronise the various stages of the different instructions. Also,
some instructions may be dependent on the results of other earlier instructions. This
can arise when data produced earlier needs to be used, or when a conditional branch
based on a previous outcome is used.
One of the simplest ways in which the effects of these problems can be reduced is by
breaking the instruction execution cycle into stages that are more likely to be of an
equal duration. For example, the diagram below shows how the cycle can be broken
down into six stages rather than three:
Diagram showing the differences between the common 3 stage model of the instruction
execution cycle, and the 6 stage model used in more advanced pipelining.
However, while this may solve some of the problems outlined above, it is not without
creating further problems of its own. Firstly, it is not always the case that an
instruction will use all six of these stages. Simple load instructions, for example, will
not require the use of the final 'write operand' stage, which would possibly upset the
synchronisation. There is also the matter of potential conflicts within the memory
system, as three of the above stages (fetch instruction, fetch operands, write
operand) require access to the memory. Many memory management systems would
not allow three separate instructions to be accessing the memory at once, and hence
the pipelining would not be as beneficial as it would first seem.
Years of development have been undertaken into improving the architecture of the
central processing unit, with the main aim of improving performance. Two competing
architectures were developed for this purpose, and different processors conformed to
each one. Both had their strengths and weaknesses, and as such also had supporters
and detractors.
Changing the architecture to this extent means that fewer transistors are used to
produce the processors. This means that RISC chips are much cheaper to produce
than their CISC counterparts. Also the reduced instruction set means that the
processor can execute the instructions more quickly, potentially allowing for greater
speeds. However, only allowing such simple instructions means a greater burden is
placed upon the software itself. Fewer instructions in the instruction set mean a
greater emphasis on the efficient writing of software with the instructions that are
available. Supporters of the CISC architecture will point out that their processors are
of good enough performance and cost to make such efforts not worth the trouble.
                       CISC                  RISC
Instruction Set        Large (100 to 300)    Small (100 or less)
Addressing Modes       Complex (8 to 20)     Simple (4 or less)
Instruction Format     Specialised           Simple
Code Lengths           Variable              Fixed
Execution Cycles       Variable              Standard for most
Cost / CPU Complexity  Higher                Lower
Simplifies             Compilation           Processor design
Complicates            Processor design      Software
Summary of the main differences between the two competing architectures
Looking at the most modern processors, it becomes evident that the whole rivalry
between CISC and RISC is now not of great importance. This is because the two
architectures are converging closer to each other, with CPUs from each side
incorporating ideas from the other. CISC processors now use many of the same
techniques as RISC ones, while the reduced instruction sets of RISC processors
contain similar numbers of instructions to those
found in certain CISC chips. However, it is still
important that you understand the ideas behind
these two differing architectures, and why each
design path was chosen.
Cache memory
This is a small amount of high-speed memory used specifically as a fast and effective
method of storage for commonly used instructions. Most programs end up accessing
the same data and instructions over and over again at some point in their execution.
Placing these in higher speed storage, such as a cache, provides a great
improvement in the time taken for processing over continual accessing from the
main memory at a slower speed.
Home computer processors have traditionally implemented the cache directly in
their architecture, in what is known as a 'Level 1' cache. The most modern CPUs also
make use of external caches, referred to as 'Level 2' caches, which are much
larger in size than 'Level 1' caches. More recent processors have larger caches - for
instance, the Intel 486 had a cache of only eight kilobytes, while the Pentium II used
multiple stores totalling up to two megabytes of storage space.
MMX makes use of fifty-seven SIMD instructions, while the Pentium 4 raises this
number to one hundred and forty-four. This includes further extensions to improve
operations relating to internet-related activity, such as the streaming of music and
video files. The improved 3DNow! technology found in the AMD Athlon processor also
contains SIMD instructions for this purpose. Such extensions ultimately enhance the
performance of the processor in activities relating to gaming, multimedia applications,
and use of the internet and other forms of communication.
Hyperthreading
Hyperthreading is a new technology, introduced by Intel with their most recent
Pentium 4 processors. It works by using what is known as 'simultaneous
multithreading' to make the single processor appear to the computer operating
system as multiple logical processors. This enables the CPU via use of shared
hardware resources to execute multiple separate parts of a program (or 'threads') at
the same time.
This technology does not provide the same performance increase as actual separate
processors would, but provides a considerable boost for less cost and power
consumption than multiple processors would require. Processors such as the
aforementioned Pentium 4 currently split the CPU into two logical processors.
Intel are working on further advancements which will enable higher numbers of
threads to be executed simultaneously.
This concludes the section on the further features of the more modern
microprocessor. Following this page is a multiple-choice quiz with which you can test
your knowledge from this section. Click the next arrow below to continue.
You have reached the conclusion of the microprocessor tutorial. Hopefully you should
now have a greater understanding of the architecture of the microprocessor and how
it works.