Architecture 3 PDF

e-PG PATHSHALA- Computer Science
Computer Architecture
Module 3
Quadrant – I – E-text
Performance Metrics
The objectives of this module are to identify and evaluate the performance metrics for a
processor and also discuss the CPU performance equation.
Computer engineering methodology is an iterative process. Technology trends and
improvements in technology give rise to newer and newer architectures. You have to
evaluate the existing systems for bottlenecks and then try to come up with better
architectures, and this process continues. To evaluate the existing systems for
bottlenecks, you need certain metrics and certain benchmarks on which to base the
evaluation.
You should basically be able to
• measure performance,
• report performance and
• summarise performance.
These steps are necessary because they help you make intelligent choices about the
computer systems that you want to purchase. They help you see through the marketing
hype: unless you know the basics of computer system performance, you will not be able
to make a judicious choice when purchasing systems. Understanding performance measures
is also key to understanding the underlying organizational motivation, that is, the
factors that lead designers to introduce particular modifications and innovations in
order to improve performance. While discussing performance, you should be able to
answer questions like these:
• Why is some hardware better than others for different programs?

• What factors of system performance are hardware related?
(e.g., Do we need a new machine, or a new operating system?)
• How does the machine's instruction set affect performance?
Performance is important from both the purchaser's perspective and the designer's
perspective. From the purchaser's perspective, given a collection of machines, you have
to be able to decide which has the best performance, the least cost, and the best
cost/performance ratio. Similarly, from the designer's perspective, you are faced with
several design options and have to decide which gives the best performance improvement,
the least cost, and the best cost/performance ratio. Both purchasers and designers
therefore need some knowledge of performance metrics, and both require these metrics
for comparison.
Our goal is to understand which factors in the architecture contribute to the
overall system performance, and the relative importance and cost of these factors.
Performance means different things to different people. Take an analogy from the
airline industry. If you have to choose between different types of aircraft, which
factors do you have to consider? Do you worry only about the cruising speed (how fast
the aircraft flies), about the flight range (how far the aircraft will fly), or about
the capacity (how many people can be transported at one time from one place to
another)? You cannot expect a particular aircraft to satisfy all these requirements, so
you have to decide which factor is more important than the others. All three factors
are important, but they may not be equally important. The criteria for performance
evaluation differ among users and designers. The same holds for the computer industry.
There are different classes of computer systems, and certain performance criteria that
are important for some types of applications may not be so important for others. You
should be able to decide which criterion is important for which type of processor. You
should also never let an engineer get away with simply presenting the data: always
insist that he or she lead off with the conclusions to which the data led, and justify
why the data came out that way. Only when you understand the internal architecture of
the processor will you be able to make a judicious choice.
Different things affect the performance of a computer system. The instructions that you
use and the implementation of these instructions, the memory hierarchy, and the way the
I/O is handled may all contribute to performance. The primary factor in computer
performance is time: all of us are concerned about how fast the program executes. When
time is the most important factor, are you looking at response time, or at something
else? Response time is the latency: you ask the processor to execute a particular task,
and the response time is how long it takes to get a response from the processor.
• How long does it take for my job to run?

• How long does it take to execute a job?
• How long must I wait for the database query?
The other important time factor is throughput. It is the total amount of work done in a
given time.
• How many jobs can the machine run at once?

• What is the average execution rate?
• How much work is getting done?
Response time (execution time), the time between the start and the completion of a
task, is important to individual users. Throughput (bandwidth), the total amount of
work done in a given time, is important to data center managers. We need different
performance metrics, as well as a different set of applications, to benchmark embedded
and desktop computers, which are more focused on response time, versus servers,
which are more focused on throughput.
To maximize performance, we need to minimize execution time. Performance is inversely
related to execution time:
$$\text{Performance} = \frac{1}{\text{Execution time}}$$
If a processor X is n times faster than Y, then
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
Decreasing response time almost always improves throughput.
As an example, if computer A runs a program in 10 seconds and computer B runs the
same program in 20 seconds, how much faster is A than B?
Speedup of A over B = 20 / 10 = 2, indicating that A is two times faster than B.
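The relative-performance calculation can be sketched in a few lines of Python. This is only an illustration; the function name `speedup` is an assumption and does not come from the module.

```python
def speedup(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y (execution times in seconds)."""
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    # Performance = 1 / execution time, so
    # Performance_X / Performance_Y = time_y / time_x.
    return time_y / time_x


# Computer A takes 10 s and computer B takes 20 s for the same program.
print(speedup(10.0, 20.0))  # 2.0
```

A result below 1 means that X is the slower of the two machines.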
CPU execution time is the time the CPU spends working on the task; it does not include
the time spent waiting for I/O or running other programs. The processor does not run
only your program. It may be running other programs as well, and when there is an I/O
transfer, it may block this program and switch over to a different one. We do not
consider the time taken for the I/O operations and are concerned only with the CPU
execution time, that is, the time that the CPU spends on a particular program.
To determine the CPU execution time for a program, you can find the total number
of clock cycles that the program takes and multiply it by the clock cycle time. Each
program is made up of a number of instructions, and each instruction takes a number of
clock cycles to execute. If you know the total number of clock cycles for the program
and the clock cycle time, the CPU execution time is simply the product of the two.
Because the clock cycle time and the clock rate are inversely related, this can also be
written as the CPU clock cycles for a program divided by the clock rate.
$$\text{CPU execution time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$
or
$$\text{CPU execution time} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$
Since the CPU execution time is a product of these two factors, you can improve
performance by reducing either the length of the clock cycle or the number of
clock cycles required for a program. A clock cycle is the basic unit of time to execute
one operation, pipeline stage, etc. The clock rate (clock cycles per second, in MHz or
GHz) is the inverse of the clock cycle time (clock period): CC = 1 / CR.
The clock rate depends on the specific CPU organization, whether it is pipelined or
non-pipelined, and on the hardware implementation technology, that is, the VLSI
technology that is used. A 10 ns clock cycle corresponds to a 100 MHz clock rate, a
5 ns clock cycle to a 200 MHz clock rate, and a 250 ps clock cycle to a 4 GHz clock
rate. The higher the clock frequency, the shorter the clock cycle.
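The relation CC = 1 / CR can be checked with a short Python sketch. The helper names `rate_from_period` and `period_from_rate` are illustrative assumptions; all quantities are in SI units (seconds and hertz).

```python
def rate_from_period(period_s: float) -> float:
    """Clock rate in Hz for a given clock cycle time in seconds."""
    if period_s <= 0:
        raise ValueError("clock cycle time must be positive")
    return 1.0 / period_s


def period_from_rate(rate_hz: float) -> float:
    """Clock cycle time in seconds for a given clock rate in Hz."""
    if rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    return 1.0 / rate_hz


# The three cycle times quoted in the text: 10 ns, 5 ns and 250 ps.
for period in (10e-9, 5e-9, 250e-12):
    print(f"{period * 1e9:g} ns -> {rate_from_period(period) / 1e6:.0f} MHz")
```

Converting a rate back with `period_from_rate` returns the original cycle time, up to floating-point rounding.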
As an example, consider the following problem:
A program runs on computer A with a 2 GHz clock in 10 seconds. What clock rate must
a computer B run at to run this program in 6 seconds? Unfortunately, to accomplish
this, computer B will require 1.2 times as many clock cycles as computer A to run the
program.
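The number of clock cycles the program takes on computer A is
$$\text{Clock cycles}_A = \text{CPU time}_A \times \text{Clock rate}_A = 10\,\text{s} \times 2 \times 10^{9}\,\text{Hz} = 20 \times 10^{9}$$
Computer B needs 1.2 times as many clock cycles and must finish in 6 seconds, so
$$\text{Clock rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = 4 \times 10^{9}\,\text{Hz} = 4\,\text{GHz}$$
The second computer must therefore run at a clock rate of 4 GHz to finish the program
in 6 seconds.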
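The clock-rate problem above can be checked numerically. The following is a minimal sketch; the function name `required_clock_rate` and its parameter names are assumptions made for illustration.

```python
def required_clock_rate(time_a_s: float, rate_a_hz: float,
                        cycle_ratio: float, target_time_s: float) -> float:
    """Clock rate (Hz) that B needs to finish the program in target_time_s.

    cycle_ratio is the number of clock cycles B needs relative to A.
    """
    cycles_a = time_a_s * rate_a_hz      # clock cycles = CPU time x clock rate
    cycles_b = cycle_ratio * cycles_a    # B needs more cycles than A
    return cycles_b / target_time_s      # clock rate = clock cycles / CPU time


# A: 10 s at 2 GHz. B: 1.2 times as many cycles, target time 6 s.
rate_b = required_clock_rate(10.0, 2e9, 1.2, 6.0)
print(f"{rate_b / 1e9:.1f} GHz")
```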
When you compute the total execution time as the total number of clock cycles
multiplied by the clock cycle period, you face the problem of calculating the total
number of clock cycles. Not all instructions take the same amount of time to execute,
so you have to know the number of clock cycles that each instruction takes and add them
up. Another way to think about execution time is that it equals the number of
instructions multiplied by the average time per instruction. If we know the average
time per instruction, we can calculate the execution time. A machine (ISA) instruction
comprises a number of elementary operations, or micro-operations, which vary in number
and complexity depending on the instruction and the exact CPU organization (design). A
micro-operation is an elementary hardware operation that can be performed during one
CPU clock cycle. It corresponds to one micro-instruction in microprogrammed CPUs.
Examples are register operations (shift, load, clear, increment) and ALU operations
(add, subtract, etc.). Thus, a single machine instruction may take one or more CPU
cycles to complete; this number is termed the Cycles Per Instruction (CPI). The average
(or effective) CPI of a program is the average CPI of all instructions executed in the
program on a given CPU design.
Example problem:
• Computers A and B implement the same ISA. Computer A has a clock cycle
time of 250 ps and an effective CPI of 2.0 for some program and computer B has
a clock cycle time of 500 ps and an effective CPI of 1.2 for the same program.
Which computer is faster and by how much?
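Each computer executes the same number of instructions, I, so
$$\text{CPU time}_A = I \times 2.0 \times 250\,\text{ps} = 500 \times I\,\text{ps}$$
$$\text{CPU time}_B = I \times 1.2 \times 500\,\text{ps} = 600 \times I\,\text{ps}$$
Computer A is therefore faster, by a factor of
$$\frac{\text{CPU time}_B}{\text{CPU time}_A} = \frac{600 \times I\,\text{ps}}{500 \times I\,\text{ps}} = 1.2$$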
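As a quick numerical check of this example, the comparison can be sketched in Python. Because both computers execute the same number of instructions, it is enough to compare the time per instruction (CPI × clock cycle time). The function name `time_per_instruction_ps` is an illustrative assumption.

```python
def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    """Average time per instruction in picoseconds: CPI x clock cycle time."""
    return cpi * cycle_time_ps


time_a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250.0)  # computer A
time_b = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500.0)  # computer B

# The instruction count I is the same on both machines, so it cancels in the ratio.
if time_a < time_b:
    faster, ratio = "A", time_b / time_a
else:
    faster, ratio = "B", time_a / time_b
print(f"Computer {faster} is faster by a factor of {ratio:.2f}")
```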
The overall effective CPI is computed by looking at the different types of
instructions and their individual cycle counts and taking the weighted average:
$$\text{Overall effective CPI} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{IC}_i \right)$$
where IC_i is the count (as a percentage) of the instructions of class i executed,
CPI_i is the (average) number of clock cycles per instruction for that instruction
class and n is the number of instruction classes.
The overall effective CPI varies with the instruction mix, which is a measure of the
dynamic frequency of instructions across one or many programs.
To look at an example, consider an instruction mix with an overall effective CPI of
2.2, in which loads make up 20% of the instructions at 5 cycles each and branches make
up 20% of the instructions at 2 cycles each.
How much faster would the machine be if a better data cache reduced the average load
time to 2 cycles?
– Load → 20% × 2 cycles = 0.4
– Total CPI 2.2 → 1.6
– Relative performance is 2.2 / 1.6 = 1.38
How does this compare with reducing the branch instruction to 1 cycle?
– Branch → 20% × 1 cycle = 0.2
– Total CPI 2.2 → 2.0
– Relative performance is 2.2 / 2.0 = 1.1
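The two what-if calculations can be reproduced with a small Python sketch. The baseline values are assumptions inferred from the figures above rather than taken from a table: a load cost of 5 cycles and a branch cost of 2 cycles follow from the stated CPI changes, and the remaining instruction classes are lumped into a single contribution of 0.8 cycles so that the baseline CPI comes to 2.2. All names in the code are illustrative.

```python
# Assumption: combined CPI contribution of every class other than load and
# branch, inferred as 2.2 - (0.20 * 5) - (0.20 * 2) = 0.8.
OTHER_CPI_CONTRIBUTION = 0.8


def weighted_cpi(mix: dict) -> float:
    """Sum of (instruction fraction x cycles per instruction) over the classes."""
    return sum(fraction * cycles for fraction, cycles in mix.values())


def total_cpi(load_cycles: float, branch_cycles: float) -> float:
    """Overall effective CPI for the assumed mix (20% loads, 20% branches)."""
    mix = {"load": (0.20, load_cycles), "branch": (0.20, branch_cycles)}
    return weighted_cpi(mix) + OTHER_CPI_CONTRIBUTION


baseline = total_cpi(load_cycles=5, branch_cycles=2)
better_cache = total_cpi(load_cycles=2, branch_cycles=2)
faster_branch = total_cpi(load_cycles=5, branch_cycles=1)

print(f"baseline CPI: {baseline:.1f}")
print(f"better data cache: CPI {better_cache:.1f}, speedup {baseline / better_cache:.3f}")
print(f"faster branch: CPI {faster_branch:.1f}, speedup {baseline / faster_branch:.3f}")
```

With the same instruction count and clock rate, relative performance is simply the ratio of the CPI values, which is why the better data cache (2.2 / 1.6 ≈ 1.375) helps more than the faster branch (2.2 / 2.0 = 1.1).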
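We can now write the basic performance equation as:
$$\text{Execution time} = \frac{1}{\text{Performance}} = (\text{Instr. count}) \times (\text{CPI}) \times (\text{cycle time})$$
or
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$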
These equations separate the three key factors that affect performance:
• The CPU execution time can be measured by running the program.
• The clock rate is usually given.
• The overall instruction count can be measured by using profilers/simulators
without knowing all of the implementation details.
• CPI varies by instruction type and ISA implementation, for which we
must know the implementation details.
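The performance equation and the points above can be sketched in Python: given a measured CPU time, the clock rate and an instruction count from a profiler, the average CPI can be derived without knowing the implementation details. The function names and the measurement values below are hypothetical and chosen only for illustration.

```python
def cpu_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time (= 1 / clock rate)."""
    return instruction_count * cpi / clock_rate_hz


def average_cpi(measured_time_s: float, instruction_count: float,
                clock_rate_hz: float) -> float:
    """Solve the performance equation for CPI from measured quantities."""
    return measured_time_s * clock_rate_hz / instruction_count


# Hypothetical measurements: 1.5 s of CPU time on a 2 GHz clock,
# with 2 billion instructions reported by a profiler.
cpi = average_cpi(measured_time_s=1.5, instruction_count=2e9, clock_rate_hz=2e9)
print(cpi)  # 1.5

# Plugging the derived CPI back in recovers the measured CPU time.
print(cpu_time_s(2e9, cpi, 2e9))  # 1.5
```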
To conclude, three factors affect the CPU execution time: the clock cycle time, the
average number of clock cycles per instruction (the CPI value) and the instruction
count. The various factors that affect these three parameters are:
• Instruction count depends first on the way the program is written. If you are a
skilled programmer, use a crisp algorithm and code it appropriately, the program
will use fewer instructions. So the first factors are the algorithm that you use
and the skill of the programmer who writes the code. Second, once you have written
the code, the compiler is responsible for translating it into machine instructions.
An optimizing compiler translates the code into fewer machine instructions. The
compiler definitely has a role to play in reducing the instruction count, but
remember that the compiler can only use the instructions that are supported in
your instruction set architecture. So the instruction set architecture also plays
a role in reducing the instruction count. In the previous session, we looked at
how the same operation can be implemented as different sequences of instructions
depending upon the ISA. With the help of the ISA, the compiler will be able to
generate code which uses fewer machine instructions.
• Clock cycle time depends upon the CPU organization and upon the technology that
is used. By organization, we mean whether the instruction unit is implemented as a
pipelined unit or a non-pipelined unit. Pipelining splits the execution of an
instruction into multiple shorter stages, which reduces the clock cycle time. This
will be dealt with in detail in the subsequent modules.
• CPI, which is the average number of clock cycles per instruction, depends upon
the program, because you may use complicated instructions, which have a number of
elementary operations, or simple instructions. Similarly, the compiler may
translate the program using complicated instructions instead of simpler ones. So
the compiler also has a role to play, and because the compiler only uses the
instructions in your ISA, the ISA has a role to play as well. Finally, the CPU
organization also has a role to play in deciding the CPI values.
Having identified the various parameters that will affect the three factors constituting the
CPU performance equation, computer designers should strive to take appropriate
design measures to reduce these factors, thereby reducing the execution time and thus
improving performance.
To summarize, we have looked at how to define the performance of a processor and why
performance evaluation is necessary for a computer system. We have pointed out
different performance metrics and looked at the CPU performance equation and the
factors that affect it. This module also provided examples that illustrate the
calculation of the CPU execution time using the CPU performance equation.
