Instructor: S. D. Cotofana
IN4343
Contents
Contact Information
Introduction
Other areas of optimization
Modified Multiplier Design
Modified Divider Design
Appendix A: 32 bit two's complementation logic
Contact Information
Name: Stefan van Breukelen
E-mail: s van breukelen@hotmail.com
Student number: 4192591

Name: Wouter van Teijlingen
E-mail: wouter@van-teijlingen.nl
Student number: 4170377

Name: Remco de Wit
E-mail: r.dewit-1@student.tudelft.nl
Student number: 4179889
Introduction
The goal of the Processor Design Project is to improve the LEON3 core performance. This can be done by optimizing the design of the core in terms of performance (number of clock cycles, data path delay etc.), area and/or power dissipation. Depending on the requirements of the application, a designer can choose to optimize the design for any of these characteristics. Our proposal to improve the LEON3 core is based on optimizing the design in terms of performance. We believe that performance is still a key requirement in many applications, and that on an FPGA, improving the design in terms of area or power will yield a lower gain than it would in an ASIC design. One is therefore better off choosing an ASIC design when optimizing for area or power. Because of these factors, area and power consumption will not be taken into account when improving the performance, as long as they stay within reasonable¹ limits. Due to the frequent use of arithmetic operations like multiplication and division in modern computers, we propose to modify the multiplier and divider. Besides the multiplier and divider, other computer architecture and computer arithmetic aspects were considered in order to improve the performance of the LEON3 core.
¹ That is, not doubling the resources.
Adder
One of the possible areas of optimization was the adder module of the processor. This is an obvious contender for optimization because addition is one of the most frequently used operations. To improve the performance of the adder one can either reduce the number of cycles or reduce the data path delay. To determine whether the adder module was suitable for optimization, a testbench was created. This testbench helped with understanding the adder and determining the number of cycles required. The testbench showed that additions are done in one cycle, so no clock cycles could be gained. It became clear that little performance could be gained by modifying the adder module.
Superscalar architecture
In a superscalar architecture several instructions can be initiated simultaneously and executed independently, which allows for a higher throughput. A superscalar design resembles pipelining in that several instructions are in flight at once, but in a pipeline these instructions have to be in different pipeline stages at any given time.
Out-Of-Order execution
Another way of improving the performance of the LEON3 core is to reorder the instructions. By reordering the instructions the processor can avoid being idle and thereby improve the performance. This reordering of instructions is called Out-Of-Order (OOO) execution. The processor executes the instructions in order of availability of their data or operands, and so avoids being idle while data is retrieved for the next instruction.
Conclusion
Due to the complexity of these optimizations and the small time window, the focus will lie on the multiplier and divider. The optimizations discussed above are left for future work.
Partial Product
0 x Multiplicand
1 x Multiplicand
1 x Multiplicand
2 x Multiplicand
-2 x Multiplicand
-1 x Multiplicand
-1 x Multiplicand
0 x Multiplicand
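The eight partial products listed above match, row for row, the selection table of a radix-4 modified Booth recoding when the rows are indexed by the bit triplets 000 to 111. The Python sketch below works under that assumption (the triplet column is not present in the text above). It only illustrates how such a table selects partial products; it is not the hardware design of this report, and the names `BOOTH_DIGIT`, `booth_radix4_digits` and `booth_multiply` are hypothetical.

```python
# Assumed row order: triplets (b[2i+1], b[2i], b[2i-1]) = 000, 001, ..., 111.
BOOTH_DIGIT = [0, 1, 1, 2, -2, -1, -1, 0]


def booth_radix4_digits(multiplier, width=32):
    """Return the radix-4 Booth digits (least significant first) of a
    signed two's complement multiplier that is `width` bits wide."""
    m = multiplier & ((1 << width) - 1)
    extended = m << 1  # implicit 0 appended below the least significant bit
    return [BOOTH_DIGIT[(extended >> (2 * i)) & 0b111]
            for i in range(width // 2)]


def booth_multiply(multiplicand, multiplier, width=32):
    """Sum the selected partial products, each weighted by 4**i."""
    digits = booth_radix4_digits(multiplier, width)
    return sum(d * multiplicand * 4 ** i for i, d in enumerate(digits))
```

With a 32 bit multiplier this yields 16 partial products, each of which is 0, ±1 or ±2 times the multiplicand. The negative multiples are the reason a fast two's complementer is of interest.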
one. In an interesting paper titled "A Logarithmic Time Method for Two's Complementation" by Kang and Gaudiot [2], an O(log n) algorithm is proposed for two's complementation, where, in this case, n is the length of the multiplicand. Their proposed solution can be generalized to any number of bits in the system. It is implemented for 32 bit values in this design because it was considered an interesting approach to the complementation of values. Normally one inverts the value and adds one. With that conventional method the carry bit may have to propagate through the full width of the value. In this design, which is based on simple logic gates, there are 5 stages before complementation is finished. In Figure 2 a simple 8 bit complementer is shown.
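The idea behind a logarithmic-time complementer can be sketched in Python as follows. Two's complementation leaves every bit up to and including the rightmost 1 unchanged and inverts every bit to the left of it. The "a 1 occurs somewhere below this position" flag can be computed for all positions with a parallel-prefix OR in log2(n) doubling steps. This is a behavioural sketch under that assumption, not the exact gate structure of the cited paper or of this design, and the function name `twos_complement_log` is hypothetical.

```python
def twos_complement_log(x, width=32):
    """Return (two's complement of x, number of prefix stages used)."""
    mask = (1 << width) - 1
    x &= mask
    # flags bit i will become the OR of x bits 0 .. i-1.
    flags = (x << 1) & mask  # after wiring only: flags[i] = x[i-1]
    shift = 1
    stages = 0
    while shift < width:
        # Each stage doubles the number of lower bits covered by the OR.
        flags |= (flags << shift) & mask
        shift <<= 1
        stages += 1
    # Invert exactly those bits that have a 1 somewhere below them.
    return (x ^ flags) & mask, stages
```

For `width=32` the loop runs 5 times, in line with the 5 stages mentioned above, and for `width=8` it runs 3 times.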
are calculated, we are required to sum the products and produce the final result. This is discussed in the following section.
Evaluation of performance
In this section we evaluate the performance.
Barrel shifter
Because the SRT divider requires the dividend and divisor to be normalized, a barrel shifter is implemented. A barrel shifter is a circuit that can shift data by a specified number of bits in a single cycle, which makes it well suited for use in a fast SRT divider. Because of the normalization and the 32 bit size of the divisor, a left shifter with a maximum shift of 31 bits is required. The barrel shifter is built from a sequence of two-to-one multiplexers. An example of an 8-bit barrel shifter is given in Figure x.

The number of two-to-one multiplexers required for an n-bit input with m selectable shift amounts (0 up to m - 1) is n log2 m, and the number of levels in the design is log2 m. The 8-bit barrel shifter in Figure x, which shifts by at most 7 bits (m = 8), therefore requires 24 (8 log2 8) two-to-one multiplexers arranged in 3 (log2 8) levels.

For the SRT divider two barrel shifters are required: a 32 bit barrel shifter for the divisor and a 64 bit barrel shifter for the dividend. Both need a maximum shift of 31 bits in order to normalize the data. A larger shift would be redundant because the dividend would be too large and an overflow would be inevitable. The 32 bit barrel shifter requires 160 (32 log2 32) multiplexers and the 64 bit barrel shifter requires 320 (64 log2 32) multiplexers. Both barrel shifters have 5 (log2 32) levels and thus require 5 bits to select the shift amount. The bits that are shifted out are used to determine whether an overflow condition has occurred: when the dividend is positive, an overflow occurs when a 1 is shifted out, and when the dividend is negative, an overflow occurs when a 0 is shifted out.
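The multiplexer counts and the level-by-level shifting described above can be checked with a short behavioural model in Python. This is a sketch under stated assumptions, not the VHDL used in the design: each level is modelled as one row of two-to-one multiplexers that either passes the data through or shifts it left by 2^level, and the function names `mux_count`, `barrel_shift_left` and `normalization_overflow` are hypothetical.

```python
def mux_count(n_bits, shift_amounts):
    """Return (two-to-one multiplexers, levels) for an n-bit shifter with
    `shift_amounts` selectable shifts (0 .. shift_amounts - 1)."""
    levels = (shift_amounts - 1).bit_length()  # log2 for powers of two
    return n_bits * levels, levels


def barrel_shift_left(value, shift, n_bits=32, levels=5):
    """Shift left level by level; return (result, bits shifted out)."""
    if not 0 <= shift < (1 << levels):
        raise ValueError("shift does not fit in the select bits")
    mask = (1 << n_bits) - 1
    value &= mask
    shifted_out = 0
    for level in range(levels):
        step = 1 << level
        if (shift >> level) & 1:  # select bit for this multiplexer row
            shifted_out = (shifted_out << step) | (value >> (n_bits - step))
            value = (value << step) & mask
    return value, shifted_out


def normalization_overflow(shifted_out, shift, dividend_negative):
    """Overflow if a 1 leaves a positive dividend or a 0 a negative one."""
    expected = (1 << shift) - 1 if dividend_negative else 0
    return shifted_out != expected
```

Calling `mux_count(32, 32)` and `mux_count(64, 32)` gives 160 and 320 multiplexers with 5 levels each, and `mux_count(8, 8)` gives 24 multiplexers with 3 levels, which agrees with the figures in the text.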
References
[1] J.-Y. Kang and J.-L. Gaudiot, "A Simple High-Speed Multiplier Design," IEEE Trans. Computers, vol. 55, no. 10, pp. 1253-1258, Oct. 2006.
[2] J.-Y. Kang and J.-L. Gaudiot, "A Logarithmic Time Method for Two's Complementation," Proc. Int'l Conf. Computational Science, pp. 212-219, 2005.
[3] J.-Y. Kang and J.-L. Gaudiot, "A Fast and Well-Structured Multiplier," Proc. Euromicro Symp. Digital System Design, pp. 508-515, Sept. 2004.
[4] M. Prajapati and S. K. Lenka, "Efficient Implementation of a Well-Structured Modified Booth Multiplier Design," Int. Journ. of VLSI and Embedded Systems, ISSN: 2249-6556, vol. 04, issue 03, May 2013.
[5] R. P. Rajput and M. N. Shanmukha Swamy, "High speed Modified Booth Encoder multiplier for signed and unsigned numbers," 2012 14th International Conference on Modelling and Simulation, 978-0-7695-4682-7/12, IEEE, 2012.
[6] S.-R. Kuang, J.-P. Wang and C.-Y. Guo, "Modified Booth multipliers with a regular partial product array," IEEE Trans. on Circuits and Systems, vol. 56, issue 5, pp. 404-408, May 2009.