
3rd Seminar on Science and Technology, 10-11 August 2004, Kota Kinabalu, Sabah

A NEW 16-BITS RISC PROCESSOR ARCHITECTURE: CONTROLLER STATE MACHINES AND FUNCTIONAL VERIFICATION USING VERILOG HDL

Ismail Saad, Pukhraj Vaya, Abu Bakar A.R, Wan Hoong Wai
School of Engineering and Information Technology
University Malaysia Sabah, Locked Bag 2073, 88999 Kota Kinabalu, Sabah, Malaysia
Tel: +60-8-832-0000 x 3147/3066, Fax: +60-8-832-0348
(e-mail: ismail_s@ums.edu.my, vaya@ums.edu.my, abubakar@seit.ums.edu.my)

ABSTRACT

This paper presents the design and simulation of a new 16-bit RISC microprocessor architecture. It emphasises the Controller State Machine (CSM) model and the verification of the processor's functionality using the Verilog Hardware Description Language (HDL). The processor system consists of ROM, RAM, I/O and a CPU. The CPU module is merely a shell that instantiates the actual processor definition in the cpu_core.v, control.v, datapath.v and alu.v modules. The paper describes in detail the design and verification of the CSM, which is the core mechanism of the control unit. The processor offers 36 instructions to the programmer. Its functionality is verified with the VCS (Verilog Compiler Simulator) simulator by executing all 36 instructions, four of which are discussed in this paper.

Key words: Verilog HDL, RISC, Datapath, Behavioural Model, VCS Simulator


1. Introduction

Microprocessors are not limited to personal computers. They are also used in specific fields such as robotics, communications and control systems [1-5]. However, the existing process of designing very-large-scale ICs, such as a new microprocessor for a specific application, is complicated, time-consuming and prone to human error. We have therefore adopted a design methodology based on the Verilog Hardware Description Language (HDL) for our new 16-bit RISC microprocessor architecture.

Verilog simplifies the design process by allowing the designer to describe a design at high levels of abstraction (behavioral and register-transfer level) [8-10]. The design can be verified by simulation before it is sent for fabrication, which saves both cost and time [6-7]. However, the success of such a processor design depends mainly on an accurate design of the state controller in the control unit [12-14]. In this paper, we present the design and simulation of a new 16-bit processor architecture based on an HDL design methodology using Verilog. The emphasis is on the design and verification of the controller state machine and of the processor functionality.


2. Processor Architecture

The new 16-bit RISC processor has a multiplexed 16-bit data and address path. Instructions have variable length:

- one word for instructions that operate on registers only;
- two words for register/memory and register/immediate instructions.

The 16-bit instruction word consists of the following fields, for a total of 2 + 1 + 1 + 3 + 3 + 3 + 3 = 16 bits:

- a 2-bit mode field;
- 1 bit each for the set condition (set_bit) and the test condition (test_bit);
- a 3-bit ALU function field (ALU_func);
- 3 bits each for the destination register (Rd), source register 1 (Rs1) and source register 2 (Rs2).

The processor can execute 36 instructions, grouped into two types: arithmetic/logical and load/store. It has six registers. Three are general-purpose registers (R1, R2, R3). The other three are dedicated registers: the PC (Program Counter), the IR (Instruction Register) and the DR (Direct Register). In addition, the register file includes a dummy register R0, which always reads as zero, following the RISC convention [15].
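A small decoder makes the field layout concrete. The sketch below is a Python behavioural model, not the authors' Verilog. It assumes the fields are packed MSB-first in the order the text lists them: mode bits in [15:14], then set_bit, test_bit, ALU_func, Rd, Rs1 and Rs2. Only the mode-bit position is corroborated by the instruction words 4040 and c00b discussed in Section 4.2. The names `decode`, `Instruction` and `INSTRUCTION_CLASS` are hypothetical.

```python
from dataclasses import dataclass

# ASSUMED layout, MSB first, in the order the fields are listed in the text:
#   [15:14] mode | [13] set_bit | [12] test_bit | [11:9] alu_func
#   [8:6]  rd    | [5:3] rs1    | [2:0] rs2
# Only the mode-bit position is checked against instruction words in the paper.


@dataclass(frozen=True)
class Instruction:
    mode: int
    set_bit: int
    test_bit: int
    alu_func: int
    rd: int
    rs1: int
    rs2: int


# Instruction class selected by the two mode bits (see the CSM description).
INSTRUCTION_CLASS = {
    0b00: "register-register (one word)",
    0b01: "register-immediate (two words)",
    0b10: "load/store (two words)",
    0b11: "load/store (two words)",
}


def decode(word: int) -> Instruction:
    """Split a 16-bit instruction word into its fields (hypothetical helper)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction word must fit in 16 bits")
    return Instruction(
        mode=(word >> 14) & 0b11,
        set_bit=(word >> 13) & 0b1,
        test_bit=(word >> 12) & 0b1,
        alu_func=(word >> 9) & 0b111,
        rd=(word >> 6) & 0b111,
        rs1=(word >> 3) & 0b111,
        rs2=word & 0b111,
    )


if __name__ == "__main__":
    for word in (0x4040, 0xC00B):
        insn = decode(word)
        print(f"{word:04x}: {INSTRUCTION_CLASS[insn.mode]}, {insn}")
```

Under this assumed layout, 4040 decodes to mode 01 with Rd = 1 and Rs1 = 0. That would match the register-immediate test instruction R1 ← R0 + 259 of Section 5.1. The agreement is suggestive only; it does not confirm the layout.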

3. Processor Verilog Module Systems

The top module of the processor system is defined in system.v. It consists of:

- the CPU;
- 256 words of ROM (addresses 0-255);
- 256 words of RAM (addresses 256-511);
- an I/O module with a bank of 16 switches (mapped at address 512) and a bank of 16 LEDs (mapped at address 513);
- a transparent address latch that stores the address;
- a decoder module that selects the ROM, RAM or I/O module.

The next module in the hierarchy is cpu.v. It is merely a shell that models the pad ring and instantiates the actual processor definition in cpu_core.v. The cpu_core.v module instantiates the control.v and datapath.v definitions, and datapath.v in turn instantiates the ALU defined in alu.v, as shown in Fig. 1. A monitor.v module is also written to monitor the activity of the processor design. The control.v, datapath.v, monitor.v and alu.v modules include an opcodes.v file, which defines the operation codes and operands of the processor architecture.


Fig. 1: Verilog Module Structure for Processor (modules shown: cpu_core.v, control.v, datapath.v, alu.v)
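The address decoding described in Section 3 can be summarised as a pure function of the address. The Python sketch below models only the selection logic. The function name and return labels are hypothetical. Treating addresses above 513 as unmapped is an assumption, because the paper does not say how the decoder handles them.

```python
# Memory map from Section 3 (word addresses).
ROM_RANGE = range(0, 256)      # 256 words of ROM: 0-255
RAM_RANGE = range(256, 512)    # 256 words of RAM: 256-511
SWITCH_ADDR = 512              # bank of 16 switches (input)
LED_ADDR = 513                 # bank of 16 LEDs (output)


def select_device(address: int) -> str:
    """Return the module the address decoder would enable (illustrative)."""
    if address in ROM_RANGE:
        return "ROM"
    if address in RAM_RANGE:
        return "RAM"
    if address == SWITCH_ADDR:
        return "SWITCHES"
    if address == LED_ADDR:
        return "LEDS"
    return "UNMAPPED"  # assumption: no module is selected


if __name__ == "__main__":
    for addr in (0, 255, 256, 259, 511, 512, 513, 514):
        print(addr, select_device(addr))
```

For example, address 259, which the store and load tests in Section 5 use, falls in the RAM range.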

4. Processor Control Unit Design

The control unit is the core of the microprocessor. It takes as input the signals needed to operate the controller and produces as output all the control signals needed to carry out each operation. It therefore has two main functions:

- to execute operations in the proper sequence by means of the CSM;
- to interpret each instruction word and generate the control signals that cause the instruction to be executed.

Our control unit consists of a 16-bit Instruction Register (IR), a 1-bit Zero Flag register, the Controller State Machine with its memory-cycle sub-states, and the logic that generates the various control signals, as illustrated in Fig. 2.



Fig. 2: Processor Control Module Architecture.
- IR fields: testbit, setbit, ModeBit, Opcode, ALUfunc, Rd, Rs1.
- Zero flag register: zero_flag_reg.
- CSM states: 0: Fetch1, 3: Fetch2, 1: Execute.
- Sub-states: 0: address_setup, 1: address_hold, 3: data_setup, 2: data_hold.
- Control outputs: ReadPC_1/2, ReadR0_1 to ReadR3_1, ReadR0_2 to ReadR3_2, WriteR1 to WriteR3, WritePC, LoadPC, LoadDR, PC_inc, Rs2_sel, TrisPC, TrisALU, TrisRs2, TrisRd, nTrisRd, Zero.

4.1 Controller State Machine

The CSM has three states, Fetch1 (00), Fetch2 (11) and Execute (01), whose encodings follow a Gray code. The controller state machine is a Mealy machine, following [1,14]. The state transitions are shown in the state diagram in Fig. 3.
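Fig. 3: Controller State Machine State Diagram. Each edge is labelled (data_hold, ModeBit):

    Fetch1 (00)  -> Fetch1 (00):  TRUE, 00  or  FALSE, 00/01
    Fetch1 (00)  -> Execute (01): TRUE, 01
    Fetch1 (00)  -> Fetch2 (11):  TRUE, 10/11
    Fetch2 (11)  -> Fetch2 (11):  FALSE, 10/11
    Fetch2 (11)  -> Execute (01): TRUE, 10/11
    Execute (01) -> Execute (01): FALSE, XX
    Execute (01) -> Fetch1 (00):  TRUE, XX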


In addition, each memory cycle is divided into four sub-states: address_setup (00), address_hold (01), data_setup (11) and data_hold (10). Two inputs determine the transitions between states: the data_hold sub-state of the memory cycle and the 2-bit mode field of the instruction. In Fig. 3, TRUE or FALSE indicates whether the memory cycle is in data_hold, the 2-bit value (00, 01, 11, 10) gives the mode bits, and XX denotes a don't-care condition. The transitions are detailed below.

Fetch1 state: When data_hold is TRUE, the next state depends on the mode bits. With mode bits 00 the CSM remains in Fetch1. With 01 it jumps to Execute, and with 10 or 11 it jumps to Fetch2. While data_hold is FALSE (mode bits 00 or 01), the CSM remains in Fetch1.

Execute state: The CSM remains in Execute while data_hold is FALSE, whatever the mode bits (XX). When data_hold becomes TRUE, the next state is Fetch1, again regardless of the mode bits.

Fetch2 state: Load and store operations use both the Fetch2 and Execute states. When data_hold is TRUE and the mode bits are 10 or 11, the next state is Execute. When data_hold is FALSE with mode bits 10 or 11, the CSM remains in Fetch2.
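The transition rules can be captured in a behavioural reference model and checked edge by edge against Fig. 3. The Python sketch below is an illustration, not the authors' code. `next_state`, `FIG3_EDGES` and `mismatches` are hypothetical names. Recovering from the unused encoding 10 to Fetch1 is an assumption.

```python
# State encodings from Section 4.1 (Gray-coded).
FETCH1, EXECUTE, FETCH2 = 0b00, 0b01, 0b11


def next_state(state: int, data_hold: bool, mode: int) -> int:
    """Next CSM state given the current state, data_hold and the two mode bits."""
    if state == FETCH1:
        if not data_hold:
            return FETCH1
        if mode == 0b00:
            return FETCH1          # register-register instruction completes
        if mode == 0b01:
            return EXECUTE         # register-immediate
        return FETCH2              # mode 10 or 11: load/store
    if state == FETCH2:
        if data_hold and mode in (0b10, 0b11):
            return EXECUTE
        return FETCH2
    if state == EXECUTE:
        return FETCH1 if data_hold else EXECUTE   # mode bits are don't-care
    return FETCH1                  # unused encoding 10: recover (assumption)


# Edges of Fig. 3 as (from, data_hold, modes, to); XX expands to all four modes.
ALL_MODES = (0b00, 0b01, 0b10, 0b11)
FIG3_EDGES = [
    (FETCH1, True, (0b00,), FETCH1),
    (FETCH1, False, (0b00, 0b01), FETCH1),
    (FETCH1, True, (0b01,), EXECUTE),
    (FETCH1, True, (0b10, 0b11), FETCH2),
    (FETCH2, True, (0b10, 0b11), EXECUTE),
    (FETCH2, False, (0b10, 0b11), FETCH2),
    (EXECUTE, False, ALL_MODES, EXECUTE),
    (EXECUTE, True, ALL_MODES, FETCH1),
]


def mismatches():
    """List every Fig. 3 edge that the model does not reproduce."""
    return [(src, hold, mode, dst)
            for src, hold, modes, dst in FIG3_EDGES
            for mode in modes
            if next_state(src, hold, mode) != dst]


if __name__ == "__main__":
    print("mismatches:", mismatches())
```

An empty mismatch list means the model agrees with every edge label in the diagram. A model of this kind could serve as a golden reference when checking the simulated control.v waveforms.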

Register-register instructions are executed entirely in Fetch1 and take 4 clock cycles (one memory cycle). Register-immediate instructions pass through Fetch1 and Execute and take 8 clock cycles (two memory cycles). Load and store instructions are the longest: they pass through Fetch1, Fetch2 and Execute and take 3 memory cycles, or 12 clock cycles. Hence, every instruction completes within at most 12 clock cycles.

Gray-code state assignments are used so that state transitions change as few bits as possible. Every step of the sub-state cycle, and every main-state transition except Fetch1 to Fetch2 (00 to 11), changes only one bit. This choice reduces glitches while the state register is changing. The controller state machine is coded in Verilog using a case statement, as follows:
    // CSM next-state logic; the nonblocking assignments place it inside a clocked always block
    case (state)
      `Fetch1:
        if (sub_state == `data_hold && ModeBit == 2'b00)
          state <= `Fetch1;                  // register-register: done
        else if (sub_state == `data_hold && ModeBit == 2'b01)
          state <= `Execute;                 // register-immediate
        else if (sub_state == `data_hold && (ModeBit == 2'b10 || ModeBit == 2'b11))
          state <= `Fetch2;                  // load/store
        else
          state <= `Fetch1;                  // wait for data_hold
      `Fetch2:
        if (sub_state == `data_hold && (ModeBit == 2'b10 || ModeBit == 2'b11))
          state <= `Execute;
        else
          state <= `Fetch2;
      `Execute:
        if (sub_state == `data_hold)
          state <= `Fetch1;
        else
          state <= `Execute;
      default:
        state <= `Fetch1;                    // unused encoding 2'b10: recover
    endcase
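Stepping this case statement together with the four-phase sub-state cycle reproduces the quoted clock counts. The Python sketch below is a hypothetical behavioural model, not the authors' testbench. It rests on three assumptions:

- each sub-state lasts one clock;
- every memory cycle starts at address_setup;
- an instruction is counted as finished at the first data_hold that leaves the CSM in Fetch1.

```python
# Memory-cycle sub-states in the order they occur, one clock each (assumed).
SUB_STATES = ("address_setup", "address_hold", "data_setup", "data_hold")

# Next CSM state taken at data_hold, keyed by (state, mode bits), mirroring the
# case statement; outside data_hold the state is held.
ON_DATA_HOLD = {
    ("Fetch1", 0b00): "Fetch1",
    ("Fetch1", 0b01): "Execute",
    ("Fetch1", 0b10): "Fetch2",
    ("Fetch1", 0b11): "Fetch2",
    ("Fetch2", 0b10): "Execute",
    ("Fetch2", 0b11): "Execute",
}
for _mode in range(4):
    ON_DATA_HOLD[("Execute", _mode)] = "Fetch1"   # Execute -> Fetch1 for any mode


def clocks_per_instruction(mode: int) -> int:
    """Clocks from the start of Fetch1 until the CSM is back in Fetch1."""
    state, clocks = "Fetch1", 0
    while True:
        for sub_state in SUB_STATES:               # one memory cycle
            clocks += 1
            if sub_state == "data_hold":
                state = ON_DATA_HOLD[(state, mode)]
        if state == "Fetch1":
            return clocks


if __name__ == "__main__":
    for mode in range(4):
        print(f"mode {mode:02b}: {clocks_per_instruction(mode)} clocks")
```

Running it gives 4, 8, 12 and 12 clocks for mode bits 00, 01, 10 and 11. These match the register-register, register-immediate and load/store figures in the text.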

4.2 Verification of the Controller State Machine

The CSM is verified by simulating the whole control unit while it executes instructions stored in the ROM. State transitions take place only when data_hold is TRUE. Excluding the self-loops, in which the state does not change, three of the state transitions are discussed below. The states and sub-states of the processor are defined in the Verilog control module as follows:
    `define Fetch1        0
    `define Execute       1
    `define Fetch2        3
    `define address_setup 0
    `define address_hold  1
    `define data_setup    3
    `define data_hold     2
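Using these encodings, the number of bits that change on each transition can be counted directly. In the Python sketch below, the helper names are hypothetical. The sub-state order is assumed to follow the listing address_setup, address_hold, data_setup, data_hold and then wrap around. The main-state transitions are those of Fig. 3, excluding self-loops.

```python
# Encodings from the `define statements above.
STATE = {"Fetch1": 0, "Execute": 1, "Fetch2": 3}
SUB_STATE = {"address_setup": 0, "address_hold": 1, "data_setup": 3, "data_hold": 2}

# Main-state transitions of Fig. 3 (self-loops excluded).
STATE_TRANSITIONS = [("Fetch1", "Execute"), ("Fetch1", "Fetch2"),
                     ("Fetch2", "Execute"), ("Execute", "Fetch1")]
# Assumed order of the memory cycle; the last sub-state wraps to the first.
SUB_STATE_ORDER = ["address_setup", "address_hold", "data_setup", "data_hold"]


def bits_changed(a: int, b: int) -> int:
    """Hamming distance between two encodings."""
    return bin(a ^ b).count("1")


def state_flips() -> dict:
    return {(s, t): bits_changed(STATE[s], STATE[t]) for s, t in STATE_TRANSITIONS}


def sub_state_flips() -> dict:
    wrapped = SUB_STATE_ORDER[1:] + SUB_STATE_ORDER[:1]
    return {(s, t): bits_changed(SUB_STATE[s], SUB_STATE[t])
            for s, t in zip(SUB_STATE_ORDER, wrapped)}


if __name__ == "__main__":
    print(state_flips())
    print(sub_state_flips())
```

Every sub-state step changes exactly one bit. Among the main-state transitions, Fetch1 to Fetch2 (0 to 3) changes two bits and the others change one.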


Fig. 4.0: Fetch1 (0) to Execute (1) state transition

Fig. 4.0 shows that the transition from Fetch1 (0) to Execute (1) occurs when data_hold is TRUE (sub_state = 2) and the mode bits equal 01. It also shows that the Execute (register-immediate) operation takes 8 clock cycles, marked by the c1-c2 range.

Fig. 4.1: Execute (1) to Fetch1 (0) state transition

Fig. 4.1 shows that the transition from Execute (1) to Fetch1 (0) occurs when data_hold is TRUE (sub_state = 2). Here the mode bits equal 00, although this transition does not depend on them. At the first data_hold, a state transition occurs because the previous instruction (4040) is an Execute operation, which takes 8 clock cycles, as marked by the red line in Fig. 4.1. The Fetch1 (register-register) operation is the shortest instruction type, taking only 4 clock cycles, as marked by the c1-c2 range.

Fig. 4.2: Fetch1 (0) to Fetch2 (3) state transition


Fig. 4.2 shows that the transition from Fetch1 (0) to Fetch2 (3) occurs when data_hold is TRUE (sub_state = 2) and the mode bits equal 11. The executed instruction (c00b) belongs to the longest instruction type, which is used for load and store operations. It requires the Fetch2 (3) and Execute (1) states and a total of 12 clock cycles, as marked by the c1-c2 range.


5. Processor Functionality Verification

The processor functionality is verified for the basic operations, which include arithmetic, logic and shift operations. The architecture offers 36 instructions. At the simulation level, the functionality is verified through the timing diagrams of each module generated in the VCS simulator windows. Four instructions are shown here as examples: Register + Immediate, Register + Register, Store and Load.

5.1 Register + Immediate Operation Test

Test instruction: R1 ← R0 + 259 (0x103); format Rd ← Rs1 + Imm (Addi)

This instruction verifies the add operation between the dummy register R0 (always zero) and the immediate value 259. The result, 259, is stored in Register1 (R1). Details of the process are shown in Fig. 5.0.

Fig.5.0: Timing Diagram of Register + Immediate Operation



5.2 Register + Register Operation Test

Test instruction: R3 ← R1 + R2; format Rd ← Rs1 + Rs2 (Addr)

This instruction verifies addition between registers: Register1 and Register2 are added and the result is stored in Register3. Register1 holds 259 (0x0103), written by the previous immediate test. Register2 holds 93 (0x005D), stored initially. Their sum, 352 (0x0160), is stored in Register3. Details of the process are shown in Fig. 5.1.

Fig.5.1: Timing Diagram of Register + Register Operation
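The arithmetic in the two add tests can be reproduced with a minimal register-file and ALU model. In the Python sketch below, `RegisterFile`, `alu_add`, `addi` and `add` are hypothetical names. The sketch assumes results wrap modulo 2^16 and ignores carry and flag updates, which the paper does not describe for these tests.

```python
MASK16 = 0xFFFF  # 16-bit datapath


class RegisterFile:
    """R0-R3; R0 is the dummy register that always reads as zero."""

    def __init__(self) -> None:
        self._regs = [0, 0, 0, 0]

    def read(self, index: int) -> int:
        return 0 if index == 0 else self._regs[index]

    def write(self, index: int, value: int) -> None:
        if index != 0:                      # writes to R0 are discarded
            self._regs[index] = value & MASK16


def alu_add(a: int, b: int) -> int:
    """16-bit addition; carry-out is dropped in this sketch."""
    return (a + b) & MASK16


def addi(rf: RegisterFile, rd: int, rs1: int, imm: int) -> None:
    rf.write(rd, alu_add(rf.read(rs1), imm))            # Rd <- Rs1 + Imm


def add(rf: RegisterFile, rd: int, rs1: int, rs2: int) -> None:
    rf.write(rd, alu_add(rf.read(rs1), rf.read(rs2)))   # Rd <- Rs1 + Rs2


if __name__ == "__main__":
    rf = RegisterFile()
    rf.write(2, 0x005D)          # R2 preloaded with 93, as in the test
    addi(rf, 1, 0, 259)          # R1 <- R0 + 259
    add(rf, 3, 1, 2)             # R3 <- R1 + R2
    print(f"R1 = {rf.read(1):#06x}, R3 = {rf.read(3):#06x}")
```

It prints R1 = 0x0103 and R3 = 0x0160, matching the values given in the text for the two tests.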


5.3 Store Operation Test

Test instruction: mem[R0 + 259] ← R1; format mem[Rs1 + Imm] ← Rd

This instruction verifies the store operation. In this example, the content of Register2 is stored into memory location [259]. Because the dummy register R0 is always 0, the effective address is 0 + 259 = 259. Once the Write signal is enabled, the content of Register2 is written to memory address [259]. Details of the process are shown in Fig. 5.3.


Fig.5.3: Timing Diagram of Store Operation

After the store operation, memory location [259] contains 10, as shown in Fig. 5.4.

Fig.5.4: Interactive Display of Memory Content


5.4 Load Operation Test

Test instruction: R2 ← mem[R0 + 259]; format Rd ← mem[Rs1 + Imm]

This instruction verifies the load operation. In this example, the content of memory location [259] is loaded into the destination register, Register2. Details of the process are shown in Fig. 5.5.


Fig.5.5: Timing Diagram of Load Operation
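The store and load tests together form a round trip through RAM location 259. The Python sketch below models that round trip at the behavioural level. The class and function names are hypothetical. Rejecting accesses outside the RAM range is an assumption made to keep the sketch small; ROM and I/O are not modelled. Register2 is given the value 10 because that is the memory content reported in Fig. 5.4 after the store.

```python
MASK16 = 0xFFFF
RAM_BASE, RAM_WORDS = 256, 256          # RAM occupies addresses 256-511


class Ram:
    """Word-addressed RAM; only the 256-511 window is modelled."""

    def __init__(self) -> None:
        self._words = [0] * RAM_WORDS

    def _offset(self, address: int) -> int:
        if not RAM_BASE <= address < RAM_BASE + RAM_WORDS:
            raise ValueError(f"address {address} is not in RAM")
        return address - RAM_BASE

    def read(self, address: int) -> int:
        return self._words[self._offset(address)]

    def write(self, address: int, value: int) -> None:
        self._words[self._offset(address)] = value & MASK16


def store(ram: Ram, regs: list, rd: int, rs1: int, imm: int) -> None:
    """mem[Rs1 + Imm] <- Rd"""
    ram.write((regs[rs1] + imm) & MASK16, regs[rd])


def load(ram: Ram, regs: list, rd: int, rs1: int, imm: int) -> None:
    """Rd <- mem[Rs1 + Imm]; a load into R0 is discarded."""
    if rd != 0:
        regs[rd] = ram.read((regs[rs1] + imm) & MASK16)


if __name__ == "__main__":
    ram = Ram()
    regs = [0, 0, 10, 0]                        # R0..R3; R2 holds 10
    store(ram, regs, rd=2, rs1=0, imm=259)      # mem[R0 + 259] <- R2
    regs[2] = 0                                 # clobber R2 so the load is visible
    load(ram, regs, rd=2, rs1=0, imm=259)       # R2 <- mem[R0 + 259]
    print(ram.read(259), regs[2])
```

Because R0 always reads as zero, the effective address in both instructions is simply the immediate value 259.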


6. Conclusion

A new 16-bit processor architecture has been designed using an HDL-based methodology and fully simulated with the Synopsys VCS simulator to verify its functionality. The success of the processor depends on the state controller of the system. The controller state machine presented in this paper controls the state transitions. Its functionality has been verified by executing all 36 instructions, four of which are explained in detail as test-bench cases.




References

[1] D. D. Gajski, Principles of Digital Design, Prentice Hall, 1997.
[2] M. Zwolinski, Digital System Design with VHDL, Prentice Hall, 2000.
[3] D. A. Patterson and J. L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann, 1999.
[4] M. Morris Mano, Digital Logic and Computer Design, Prentice Hall, 1997.
[5] G. H. Miller, Microcomputer Engineering, 2nd edition, Prentice Hall, 1998.
[6] W. J. Dally and A. Chang, The Role of Custom Design in ASIC Chips, Proceedings of the 37th Conference on Design Automation, ACM Press, pp. 643-647, 2000.
[7] M. J. Flynn and R. I. Winner, ASIC microprocessor, Proceedings of the 22nd Annual International Workshop on Microprogramming and Microarchitecture, ACM Press, pp. 237-243, 1989.
[8] Samir Palnitkar, Verilog HDL: A Guide to Digital Design and Synthesis, Prentice Hall, 1995.
[9] D. Lioupis, A. Papagiannis and D. Psihogiou, A Systematic Approach to Software Peripherals for Embedded Systems, Proceedings of the Ninth International Symposium on Hardware/Software Codesign, ACM Press, pp. 14-145, 2001.
[10] J. C. Diaz, P. Plaza, L. A. Merayo, P. Scarfone and M. Zamboni, Design and validation with HDL of a complex input/output processor for an ATM switch: the CMC, Proceedings of the Verilog HDL Conference, pp. 67-71, 1995.
[11] A. E. Mahdi and I. A. Grout, PLL based ASIC system for DSP real-time analogue interface, www.ece.ul.ie/hompage/ian_grout/publications.html, 2002.
[12] M. G. Arnold, T. A. Bailey, J. R. Cowles, J. J. Cupal and A. W. Wallace, A purely data structure for accurate high level timing simulation of synchronous designs, Verilog HDL Conference, pp. 101-107, 1994.
[13] O. Hebert, I. C. Kraljic and Y. Savaria, A Method to Derive Application-Specific Embedded Processing Cores, International Conference on Hardware Software Codesign, San Diego, California, United States, ACM Press, pp. 88-92, 2000.
[14] S. Golson, State machine design techniques for Verilog and VHDL, Synopsys Journal of High-Level Design, pp. 1-20, September 1994.
[15] D. A. Patterson and C. H. Sequin, RISC I: A Reduced Instruction Set VLSI Computer, International Symposium on Computer Architecture (selected papers), Spain, pp. 216-230, 1998.