501-Midterm Ps

CIS 501 Mid-Term Exam
Fall 2001 Name:__________________________ SSN: __________________________
Problem 1: ____ (of 20)
Problem 2: ____ (of 15)
Problem 3: ____ (of 13)
Problem 4: ____ (of 25)
Problem 5: ____ (of 27)
Extra Credit: ____ (of 13)
Final Score: _____ (of 100)
This is a 90 minute, closed book exam.
November 1, 2001
1 True or False (20 points, 2 points each question)

Indicate whether each of these statements is true or false. If the statement is false, give a reason why it is so (hint: "because it is not true" is not a good reason).
a. Increasing the size of a cache results in lower miss rates and higher performance.
b. Amdahl's law states that transistor densities will increase exponentially.
c. Three of the six principles of the RISC movement are 1) few addressing modes, 2) single-cycle instruction execution, and 3) reliance on compiler optimizations.
d. Given a five-stage DLX pipeline (IF, ID, EX, MEM, WB) in which branches complete in the EX stage and are handled by stalling, converting all branches to delayed branches reduces the average penalty (number of cycles added to execution) per branch from two cycles to one.
e. If processor A has a higher clock rate than processor B and a higher MIPS rating than processor B, then it will execute a given program as fast or faster than processor B.
f. Processors don't implement Belady's cache replacement algorithm because it slows down the hit time of the cache.
g. One of the advantages of an accumulator architecture over a load/store architecture is smaller code size for programs.
h. It is impossible to have WAR hazards in an in-order DLX pipeline with single-cycle operations. The possibility for WAR hazards arises only when multi-cycle operations (like floating-point operations and cache misses) are introduced.
i. Address synonyms can be avoided simply by using a physically tagged cache.
j. Increased transistor counts have made CISC architectures more attractive than RISC architectures.
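Several of the statements above turn on two textbook formulas: Amdahl's law (overall speedup from improving only part of the execution time) and the CPU performance equation (time = instruction count x CPI / clock rate). The minimal Python sketch below encodes both; the two machines and all their numbers are hypothetical, chosen only to show that clock rate and MIPS leave out instruction count.

```python
# Two standard performance formulas (a sketch; machine numbers are made up).

def amdahl_speedup(fraction_enhanced: float, local_speedup: float) -> float:
    """Overall speedup when only `fraction_enhanced` of the original execution
    time is made `local_speedup` times faster (Amdahl's law)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / local_speedup)


def cpu_time_s(instr_count: int, cpi: float, clock_hz: float) -> float:
    """CPU time = instruction count x cycles per instruction / clock rate."""
    return instr_count * cpi / clock_hz


def native_mips(cpi: float, clock_hz: float) -> float:
    """MIPS = clock rate / (CPI x 10^6); instruction count does not appear."""
    return clock_hz / (cpi * 1e6)


# Hypothetical machines running the same source program, compiled for each.
# A: faster clock and lower CPI, but its ISA needs 5x more instructions.
A = {"instr_count": 2_000_000, "cpi": 1.0, "clock_hz": 200e6}
# B: slower clock, higher CPI, denser code.
B = {"instr_count": 400_000, "cpi": 2.0, "clock_hz": 100e6}

for name, m in (("A", A), ("B", B)):
    print(name, "MIPS:", native_mips(m["cpi"], m["clock_hz"]),
          "time (s):", cpu_time_s(**m))
```

Here A wins on both clock rate and MIPS, yet takes longer, because it executes more instructions.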
2 Short Answer (15 points, 3 points each question)

Answer each of the following questions in a few sentences. Use examples if needed.
a. One of the criteria for a good instruction set is programmability. What is programmability? The RISC movement started as the definition of programmability was changing. What was this change? According to Wulf, what are two instruction set design principles that improve programmability under this new definition?
b. Pipelining is used because it improves instruction throughput. Increasing the level of pipelining cuts the amount of work performed at each pipeline stage, allowing more instructions to exist in the processor at the same time and instructions to complete at a more rapid rate. However, throughput will not improve as pipelining is increased indefinitely. Give two reasons for this.
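A toy model makes the diminishing returns concrete. The sketch assumes a fixed amount of logic delay split evenly across the stages plus a fixed latch/clock-skew overhead per stage, and CPI = 1; the 10 ns and 0.5 ns figures are hypothetical. It deliberately ignores hazards and stalls, which tend to grow with pipeline depth.

```python
def cycle_time_ns(logic_ns: float, stages: int, latch_ns: float) -> float:
    """Clock period when `logic_ns` of logic is split evenly over `stages`
    and each stage adds a fixed latch/clock-skew overhead `latch_ns`."""
    return logic_ns / stages + latch_ns


def ideal_throughput(logic_ns: float, stages: int, latch_ns: float) -> float:
    """Instructions completed per ns assuming CPI = 1 (no stalls at all)."""
    return 1.0 / cycle_time_ns(logic_ns, stages, latch_ns)


# Hypothetical numbers: 10 ns of logic, 0.5 ns of overhead per stage.
for n in (1, 5, 10, 20, 40, 80):
    print(n, round(ideal_throughput(10.0, n, 0.5), 3))
```

Even with no hazards, throughput in this model is bounded above by 1 / latch_ns, so each doubling of depth buys less.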
c. In benchmarking, it is sometimes useful to summarize the performance of a group of benchmarks into a single number. There are three potential functions that can perform this summary: arithmetic mean, harmonic mean, and geometric mean. Which should be used?
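The sketch below, using Python's standard `statistics` module and two hypothetical machines, shows the usual observations about these three means: the arithmetic mean of raw times (or the harmonic mean of rates, for equal work per benchmark) tracks total execution time, while the geometric mean of normalized ratios gives the same verdict whichever machine is the reference. The arithmetic mean of normalized ratios does not.

```python
from statistics import geometric_mean, harmonic_mean, mean

# Hypothetical execution times (seconds) of two benchmarks on two machines.
m1 = [1.0, 1000.0]
m2 = [10.0, 100.0]


def ratios(times, reference):
    """Per-benchmark execution time normalized to a reference machine."""
    return [t / r for t, r in zip(times, reference)]


# Arithmetic mean of normalized times: each machine looks about 5x slower
# than the other, depending only on which one is chosen as the reference.
am_m2_over_m1 = mean(ratios(m2, m1))
am_m1_over_m2 = mean(ratios(m1, m2))

# Geometric mean of normalized times is consistent either way,
# but it does not track total execution time.
gm_m2_over_m1 = geometric_mean(ratios(m2, m1))
gm_m1_over_m2 = geometric_mean(ratios(m1, m2))

# Arithmetic mean of raw times, and harmonic mean of rates for equal work,
# both track total execution time.
am_time_m1, am_time_m2 = mean(m1), mean(m2)
hm_rate_m1 = harmonic_mean([1.0 / t for t in m1])  # rate = 1 unit of work / t

print(am_m2_over_m1, am_m1_over_m2, gm_m2_over_m1, gm_m1_over_m2)
print(am_time_m1, am_time_m2, hm_rate_m1)
```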
d. Why is miss rate not a good metric for evaluating cache performance? What is the appropriate metric? Give its definition. What is the reason for using a combination of first- and second-level caches rather than using the same chip area for a larger first-level cache?
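Average memory access time (AMAT = hit time + miss rate x miss penalty) composes naturally across levels: the L1 miss penalty is itself the AMAT of the L2, computed with the L2's local miss rate. A small sketch follows; every cycle count and miss rate in it is hypothetical, including the assumption that a larger single L1 built from the same area would have a slower hit time.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty


def amat_two_level(l1_hit: float, l1_miss_rate: float,
                   l2_hit: float, l2_local_miss_rate: float,
                   memory_latency: float) -> float:
    """The L1 miss penalty is itself the AMAT of the L2 (local miss rate)."""
    l1_miss_penalty = amat(l2_hit, l2_local_miss_rate, memory_latency)
    return amat(l1_hit, l1_miss_rate, l1_miss_penalty)


# Hypothetical cycle counts and miss rates, for illustration only.
small_l1_plus_l2 = amat_two_level(1, 0.05, 10, 0.20, 100)
bigger_slower_l1 = amat(2, 0.03, 100)   # same area spent on one larger L1
print(small_l1_plus_l2, bigger_slower_l1)
```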
e. The original motivation for using virtual memory was compatibility. What does that mean in this context? What are two other motivations for using virtual memory?
3 Branch Prediction (13 points)

Consider the following sequence of actual outcomes for a single static branch. T means the branch is taken; N means the branch is not taken. For this question, assume that this is the only branch in the program.
TTTNTNTTTNTNTTTNTN
a. (5 points) Assume that we try to predict this sequence using a BHT with one-bit counters. The counters in the BHT are initialized to the N state. Which of the branches in this sequence would be mispredicted? Use this table for your answer.
predictor state before prediction | _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
branch outcome                    | T T T N T N T T T N T N T T T N T N
misprediction?                    | _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
b. (5 points) Now, assume a two-level branch predictor that uses one bit of branch history, i.e., a one-bit BHR. Since there is only one branch in the program, it does not matter how the BHR is concatenated with the branch PC to index the BHT. Assume that the BHT uses one-bit counters and that, again, all entries are initialized to N. Which of the branches in this sequence would be mispredicted? Again, use this table.
predictor state before prediction | _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
branch outcome                    | T T T N T N T T T N T N T T T N T N
misprediction?                    | _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
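Both predictors above are small enough to simulate directly. The sketch below is a minimal model under one stated assumption: the question does not say how the one-bit BHR starts, so it is assumed to start at N. Returned positions are 0-based.

```python
from typing import List

SEQUENCE = "TTTNTNTTTNTNTTTNTN"  # outcomes of the single static branch


def one_bit_bht(outcomes: str, init: str = "N") -> List[int]:
    """One 1-bit counter (there is only one branch): predict the previous
    outcome. Returns 0-based positions of mispredicted branches."""
    state = init
    missed = []
    for i, actual in enumerate(outcomes):
        if state != actual:
            missed.append(i)
        state = actual  # a 1-bit counter just remembers the last outcome
    return missed


def two_level_one_bit(outcomes: str, init: str = "N",
                      init_history: str = "N") -> List[int]:
    """A 1-bit BHR selects one of two 1-bit BHT entries (one per history value).
    ASSUMPTION: the BHR starts at N; the question only specifies the BHT."""
    bht = {"N": init, "T": init}
    history = init_history
    missed = []
    for i, actual in enumerate(outcomes):
        if bht[history] != actual:
            missed.append(i)
        bht[history] = actual  # train the entry that made the prediction
        history = actual       # shift the outcome into the 1-bit history
    return missed


print("1-bit BHT mispredicts at:", one_bit_bht(SEQUENCE))
print("1-bit BHR + BHT mispredicts at:", two_level_one_bit(SEQUENCE))
```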
c. (3 points) What is a return address stack? When is a return address stack updated?
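A return address stack can be sketched in a few lines: push the address after the call when a call is seen, pop it to predict the target when a return is seen. In this sketch the depth of 8, the 4-byte instruction size (DLX instructions are 32 bits), and the drop-the-oldest overflow policy are assumptions; it also omits the repair that real designs perform after fetching down a mispredicted path.

```python
from collections import deque
from typing import Deque, Optional


class ReturnAddressStack:
    """Fixed-depth return address stack; on overflow the oldest entry is lost."""

    def __init__(self, depth: int = 8, instr_bytes: int = 4) -> None:
        self._entries: Deque[int] = deque(maxlen=depth)
        self._instr_bytes = instr_bytes

    def on_call(self, call_pc: int) -> None:
        """Push the return address (the instruction after the call)."""
        self._entries.append(call_pc + self._instr_bytes)

    def on_return(self) -> Optional[int]:
        """Pop and return the predicted target, or None if the stack is empty."""
        return self._entries.pop() if self._entries else None


ras = ReturnAddressStack(depth=2)
for pc in (0x100, 0x200, 0x300):   # three nested calls, one more than the depth
    ras.on_call(pc)
print([ras.on_return() for _ in range(3)])
```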
4 Caches and Address Translation (25 points)

a. (3 points) Which of the following techniques are aimed at reducing the cost of a miss: dividing the current block into sub-blocks, a larger block size, the addition of a second-level cache, the addition of a victim buffer, early restart with critical word first, a writeback buffer, skewed associativity, software prefetching, the use of a TLB, and multi-porting?
b. (3 points) Why are the first-level caches usually split (instructions and data are in different caches) while the L2 is usually unified (instructions and data are both in the same cache)?
For the rest of the question, consider a 64-byte cache with 8-byte blocks, an associativity of 2, and LRU block replacement. Virtual addresses are 16 bits. The cache is physically tagged. The processor has 16KB of physical memory.
c. (3 points) What is the total number of tag bits?
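The address split behind the tag-bit count can be computed mechanically. This sketch assumes the physical address width is log2 of the physical memory size and counts only address-tag bits (no valid, dirty, or LRU state); `cache_geometry` is a hypothetical helper, not something defined in the question.

```python
def log2_exact(n: int) -> int:
    """log2 of a power of two, as an int."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def cache_geometry(cache_bytes: int, block_bytes: int, assoc: int,
                   phys_mem_bytes: int) -> dict:
    """Address split and tag storage for a physically tagged cache."""
    num_blocks = cache_bytes // block_bytes
    num_sets = num_blocks // assoc
    pa_bits = log2_exact(phys_mem_bytes)   # physical address width
    offset_bits = log2_exact(block_bytes)
    index_bits = log2_exact(num_sets)
    tag_bits = pa_bits - index_bits - offset_bits
    return {
        "blocks": num_blocks,
        "sets": num_sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits_per_block": tag_bits,
        "total_tag_bits": tag_bits * num_blocks,
    }


print(cache_geometry(cache_bytes=64, block_bytes=8, assoc=2,
                     phys_mem_bytes=16 * 1024))
```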
d. (3 points) Assuming there are no special provisions for avoiding synonyms, what is the minimum page size?
e. (3 points) Assume each page is 64 bytes. How large would a single-level page table be? Each page requires 4 protection bits, and entries must be an integral number of bytes.
f. (10 points) For the following sequence of references, label the cache misses. Using Mark Hill's 3C model, label each miss as either a compulsory miss, a capacity miss, or a conflict miss. The addresses are given in octal (each digit represents 3 bits). Assume the cache initially contains block addresses 000, 010, 020, 030, 040, 050, 060, and 070, which were accessed in that order.
cache state prior to access |
reference address           | 024 100 270 570 074 272 004 044 640 000 410 710 550 570 410
miss?                       |
which?                      |
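The 3C classification can be checked with a small simulator. This sketch uses the common operational form of the model: a miss is compulsory if the block was never referenced before, capacity if it would also miss in a fully associative LRU cache of the same size, and conflict otherwise. It treats the given addresses as physical addresses (an assumption) and warms the cache by accessing the initial blocks in order.

```python
from typing import List, Optional


def classify_3c(refs_octal: List[str], warm_octal: List[str],
                cache_bytes: int = 64, block_bytes: int = 8,
                assoc: int = 2) -> List[Optional[str]]:
    """Return None for a hit, else 'compulsory', 'capacity' or 'conflict'."""
    num_blocks = cache_bytes // block_bytes
    num_sets = num_blocks // assoc
    sets: List[List[int]] = [[] for _ in range(num_sets)]  # each list: LRU first
    fully_assoc: List[int] = []  # same-capacity fully associative LRU cache
    seen = set()                 # every block ever referenced

    def touch(lru: List[int], block: int, capacity: int) -> bool:
        """Reference `block` in an LRU list; return True on a hit."""
        hit = block in lru
        if hit:
            lru.remove(block)
        elif len(lru) == capacity:
            lru.pop(0)           # evict the least recently used block
        lru.append(block)        # most recently used goes last
        return hit

    def access(addr: int) -> Optional[str]:
        block = addr // block_bytes
        hit = touch(sets[block % num_sets], block, assoc)
        fa_hit = touch(fully_assoc, block, num_blocks)
        first_time = block not in seen
        seen.add(block)
        if hit:
            return None
        if first_time:
            return "compulsory"
        return "conflict" if fa_hit else "capacity"

    for a in warm_octal:         # initial contents, accessed in the given order
        access(int(a, 8))
    return [access(int(a, 8)) for a in refs_octal]


WARM = ["000", "010", "020", "030", "040", "050", "060", "070"]
REFS = ["024", "100", "270", "570", "074", "272", "004", "044",
        "640", "000", "410", "710", "550", "570", "410"]

for ref, kind in zip(REFS, classify_3c(REFS, WARM)):
    print(ref, kind or "hit")
```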
5 Instruction Sets and Pipelining (27 points)

In class, we have talked about the classic 5-stage DLX pipeline: IF, ID, EX, MEM, WB. The DLX pipeline is designed specifically to execute the DLX instruction set. DLX is a load/store architecture that performs at most one memory operation per instruction, hence a single MEM stage in the pipeline suffices. Also, its most common addressing mode is register displacement addressing. The EX stage is placed before the MEM stage so that it can be used for address calculation.

In this question we will consider a variation on the DLX instruction set and the interactions of this variation with the pipeline structure. The variation swaps the MEM and EX stages, creating a pipeline that looks like this: IF, ID, MEM, EX, WB. This change has two effects on the instruction set. First, it prevents us from using register displacement addressing (there is no longer an EX stage in front of MEM to compute the address). However, in return we can use instructions with one memory input operand, i.e., register-memory instructions. For instance, multf_m f0,f2,(r2) multiplies the contents of register f2 by the value at the memory location pointed to by r2 and puts the result in f0.
a. (3 points) Dropping the register displacement addressing mode is potentially a big loss, since it is the mode most frequently used in DLX. Why is it so frequent? Give two popular software constructs whose implementation uses register displacement addressing (i.e., displacement addressing with non-zero displacements).
b. (3 points) What is the difference between a dependence and a hazard?
c. (5 points) In this question we will work with the SAXPY loop.

do I = 0,N
    Z[I] = A*X[I] + Y[I]
Here is the new assembly code.

        // I is in r1
        // A is in f0
        // N is in r5
0: slli    r2,r1,#3
1: addi    r3,r2,#X
2: multf_m f2,f0,(r3)
3: addi    r4,r2,#Y
4: addf_m  f4,f2,(r4)
5: addi    r4,r2,#Z
6: sf      f4,(r4)
7: addi    r1,r1,#1
8: slei    r6,r1,r5
9: bnez    r6,#0
Using the instruction numbers, label the data and control dependences. For 3 extra credit points, account for cross-iteration hazards (hazards between instructions from different iterations) and hazards through memory if any.
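Register data dependences within one iteration can be found mechanically from each instruction's read and write sets. The sketch below hand-encodes those sets for the SAXPY code and reports RAW, WAR, and WAW pairs with no intervening write to the same register. It is only a partial aid: it does not track memory operands (X, Y, and Z are assumed not to overlap), control dependences on the branch, or cross-iteration dependences.

```python
from typing import List, Set, Tuple

# (assembly text, registers written, registers read) for one SAXPY iteration.
# Memory operands are not tracked here.
LOOP: List[Tuple[str, Set[str], Set[str]]] = [
    ("slli r2,r1,#3",      {"r2"}, {"r1"}),
    ("addi r3,r2,#X",      {"r3"}, {"r2"}),
    ("multf_m f2,f0,(r3)", {"f2"}, {"f0", "r3"}),
    ("addi r4,r2,#Y",      {"r4"}, {"r2"}),
    ("addf_m f4,f2,(r4)",  {"f4"}, {"f2", "r4"}),
    ("addi r4,r2,#Z",      {"r4"}, {"r2"}),
    ("sf f4,(r4)",         set(),  {"f4", "r4"}),
    ("addi r1,r1,#1",      {"r1"}, {"r1"}),
    ("slei r6,r1,r5",      {"r6"}, {"r1", "r5"}),
    ("bnez r6,#0",         set(),  {"r6"}),
]


def register_dependences(code) -> List[Tuple[str, int, int, str]]:
    """(kind, earlier, later, register) for RAW/WAR/WAW pairs in one pass,
    keeping only pairs with no intervening write to that register."""
    deps = []
    for j, (_, writes_j, reads_j) in enumerate(code):
        for i in range(j):
            _, writes_i, reads_i = code[i]
            written_between = set().union(*(code[k][1] for k in range(i + 1, j)))
            for kind, regs in (("RAW", writes_i & reads_j),
                               ("WAR", reads_i & writes_j),
                               ("WAW", writes_i & writes_j)):
                for reg in sorted(regs - written_between):
                    deps.append((kind, i, j, reg))
    return deps


for dep in register_dependences(LOOP):
    print(dep)
```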
d. (10 points) Fill in the pipeline diagram for the code of the new SAXPY loop. Label the stalls as d* for data-hazard stalls and s* for structural stalls. What is the latency of a single iteration (the number of cycles between the completion of two successive #0 instructions)? For this question, assume that FP addition takes 2 cycles, FP multiplication takes 3 cycles, and all other operations take a single cycle. The functional units are not pipelined. The FP adder, FP multiplier, and integer ALU are all separate functional units, so there are no structural hazards between them. As in DLX, the register file is written by the WB stage in the first half of a clock cycle and is read by the ID stage in the second half of a clock cycle. In addition, the processor has full forwarding. The processor stalls on branches until the outcome is available, which is at the end of the EX stage. The processor has no provisions for maintaining precise state.
instruction            | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
0: slli r2,r1,#3       |
1: addi r3,r2,#X       |
2: multf_m f2,f0,(r3)  |
3: addi r4,r2,#Y       |
4: addf_m f4,f2,(r4)   |
5: addi r4,r2,#Z       |
6: sf f4,(r4)          |
7: addi r1,r1,#1       |
8: slei r6,r1,r5       |
9: bnez r6,#0          |
0: slli r2,r1,#3       |
e. (3 points) In DLX, what is the reason for forcing non-memory operations to go through the MEM stage rather than proceeding directly to the WB stage?
f. (3 points) Aside from the direct loss of register displacement addressing and the subsequent instructions required to explicitly compute addresses, what are two other disadvantages of this sort of pipeline?
h. (10 extra credit points) Reduce the stalls by pipeline scheduling a single loop iteration. Show the resulting code and fill in the pipeline diagram. You do not need to show the optimal schedule for a correct response.
instruction            | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24