
16-Bit RISC PROCESSOR

by

Michael Telcide
Nemanja Stojanovic
Cleiton Juffo

Final Project Report


ECE 414: Computer Organization and Design
December 17, 2013

Project directed by
Dr. Onur Tigli
Asst. Professor of Electrical and Computer Engineering

TABLE OF CONTENTS

1. Table of Contents
2. Abstract
3. Introduction
4. Design and Results
5. State Machine Diagram
6. CPU Diagram
7. Simulations
8. Schematics
9. Conclusions and References
10. Appendix [codes, power data, testbench]

ABSTRACT
In this project, we designed a 16-bit RISC processor and implemented it in Verilog. This
report presents the structural design and the functional characteristics of a general-purpose
RISC processor. The entire project followed a bottom-up design approach: we started with the
basic sequential and combinational building blocks of a non-pipelined processor and built up
to more complex blocks. The 16-bit RISC processor architecture features 16-bit instruction
words, 16 internal general-purpose registers, each of which can hold a 16-bit data word,
6 external address lines to ROM, and 6 external address lines to an external memory (RAM).
Each module was designed, synthesized, and tested at each level of implementation.
Afterwards, the modules were interconnected and integrated in a top-level simulation by
appropriate port mapping.


INTRODUCTION
Among all kinds of CPUs in use today, the Reduced Instruction Set Computer (RISC) CPU has
the majority market share. It is most commonly used in embedded systems, which are found in
almost every consumer product on the market. RISC CPUs are basic in nature and offer low
power consumption and small size. They are sometimes referred to as load-store processors
because of the basic mechanism by which they operate. The idea of the RISC CPU is to reduce
the complexity of the system and increase its speed. Any complex operation can be split into
smaller chunks that can, in most cases, be calculated simultaneously. Other important
features of the RISC CPU include uniform instruction coding, which allows faster decoding.
A good example is that the op-code is always in the same bit position in each instruction,
which is always one word long. Another advantage is a homogeneous register set, which allows
any register to be used in any context and simplifies compiler design. Lastly, complex
addressing modes are replaced by sequences of simple arithmetic instructions. The
convenience of the RISC processor directly explains why it dominates the CPU market.

DESIGN AND RESULTS
Below is the state machine that represents our design. It includes five states. Once the
machine has started or has completed all of its tasks, it enters the Idle state. At the
beginning of the program, the first state is Fetch; this state gets the instruction from ROM
and sends it to the instruction register (IR). The program counter (PC) is also incremented
at this stage. In the second state, Load, the instruction is decoded by the IR and the
addresses of the operands are sent to the register file (RF). In the next state, Execute,
the operands are loaded into the ALU from the RF; a signal is sent to enable the ALU for a
specific operation, and another signal may be sent to enable RAM if we have to write to or
read from it. In the Store state, all results are stored in either the RF or RAM. If there
was a jump instruction, the PC is incremented by the offset value.
State machine

RESET: entry point of the diagram
Idle (5'b00001): do nothing; all signals are off
Fetch (5'b00010): retrieve instruction from ROM and load it into IR; increment PC
Load (5'b00100): send addresses into RF
Execute (5'b01000): load operands from RF; perform ALU operation; enable RAM read/write;
send address to RAM
Store (5'b10000): send/receive data to/from RAM; store results into RF; add offset to PC;
send next instruction address to ROM
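
The five states and the one-hot encodings shown in the state diagram can be sketched as a small Verilog state register. This is a minimal sketch, not the project's control unit: the module name fsm_sketch, the run input, the synchronous active-high reset, and the transition conditions are assumptions made for illustration. Only the state names and the 5-bit encodings are taken from the diagram.

```verilog
// Sketch only: one-hot state register using the encodings from the state
// diagram. The port names (clk, reset, run, state) and the transition
// conditions are assumptions, not the project's actual control unit.
module fsm_sketch (
    input  wire       clk,
    input  wire       reset,   // assumed synchronous, active high
    input  wire       run,     // hypothetical "keep executing" input
    output reg  [4:0] state
);
    localparam IDLE    = 5'b00001;  // do nothing, all signals off
    localparam FETCH   = 5'b00010;  // ROM -> IR, increment PC
    localparam LOAD    = 5'b00100;  // send operand addresses to RF
    localparam EXECUTE = 5'b01000;  // ALU operation, RAM enable/address
    localparam STORE   = 5'b10000;  // write back to RF/RAM, add offset to PC

    always @(posedge clk) begin
        if (reset)
            state <= IDLE;
        else begin
            case (state)
                IDLE:    state <= run ? FETCH : IDLE;
                FETCH:   state <= LOAD;
                LOAD:    state <= EXECUTE;
                EXECUTE: state <= STORE;
                STORE:   state <= run ? FETCH : IDLE;
                default: state <= IDLE;  // recover from an illegal encoding
            endcase
        end
    end
endmodule
```

With a one-hot encoding, each control signal can be derived from a single state bit (for example, state[1] for the Fetch actions), at the cost of five flip-flops instead of three.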

The 16-bit computer will be able to perform the following operations. Based on this table,
we designed our computer to operate under these three categories: arithmetic, logic, and
branch operations.
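
As a minimal sketch of how a fixed-position op-code can be split out of a 16-bit instruction word, the module below separates an instruction into fields. The field boundaries are an assumption inferred from the one annotated instruction in the testbench (16'b1000000000000101, commented "LI R0 = 5"), which suggests a 4-bit op-code, a 4-bit register index (enough for 16 registers), and an 8-bit immediate. The module and port names are hypothetical, and other instruction formats may use bits 11:0 differently.

```verilog
// Sketch only: splits a 16-bit instruction word into fixed-position fields.
// The field boundaries are an assumption inferred from the LI example in the
// testbench; the module and port names are hypothetical.
module decode_sketch (
    input  wire [15:0] instr,
    output wire [3:0]  opcode,  // assumed: bits 15:12
    output wire [3:0]  rd,      // assumed: bits 11:8, destination register
    output wire [7:0]  imm      // assumed: bits 7:0, immediate for LI
);
    assign opcode = instr[15:12];
    assign rd     = instr[11:8];
    assign imm    = instr[7:0];
endmodule
```

Under these assumptions, the word 16'b1000000000000101 yields opcode = 4'b1000, rd = 4'b0000 (R0), and imm = 8'b00000101 (5), which matches the testbench comment.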

The most important part of our design was to figure out all the necessary components that
would be needed to implement the 16-bit RISC computer and how they would interact with one
another. The drawing below illustrates our logic, the components, and the datapath we thought
would be needed in this project.


After conceptually verifying that our logic works, our next biggest task was to figure out
how to implement it in Verilog. After a lot of thinking and a lot of debugging, we arrived at
the final code, which is given in the next section. The top module is the CPU, and all the
other modules inside it are what is needed to make it functional.


Simulation Results
Upon completion of our design, we implemented our CPU as specified in the guidelines. This
consisted of using several modules and files given to us to simulate the ROM, RAM, and UART
components. Synthesizing as specified allowed us to simulate a working CPU using Xilinx, and
we then generated a bit file. Upon loading the bit file onto the FPGA board, we realized our
program was working, but not optimally. It executed several instructions, ranging from load
immediate to ALU operations, and was also able to write a register value into memory at a
specified location. Even so, using the FPGA board, PuTTY, and the UART module, we could only
see the first memory value and not the next 64. We therefore decided to remove the modules
and files given to us and re-run the simulations in Xilinx with just our CPU and the relevant
components. This is the simulation that follows:
Sample Program

We wrote the sample program above to test our CPU components using the op-codes given to us
in the operations table in the Design and Results section of this report. Using the testbench
code (which is provided in the appendix), we were able to simulate our CPU.
Simulation

In this simulation, we see that btn_press has to be held high for an amount of time and
then set low to mimic a de-bounce. After this stage, we begin to receive data from the ROM
unit. All the instructions listed in the sample program table are completed, and we send
data to be written to the RAM (highlighted by the yellow cursor line). Since we do not have
a RAM module attached, we cannot simulate reading from memory.
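
Reading from memory could be exercised in simulation by attaching a small behavioral RAM to the testbench. The module below is a minimal sketch under stated assumptions: the module name, the clocked write, and the active-high enables are hypothetical and are not taken from the RAM files supplied with the project. Only the 6-bit address and the 16-bit bidirectional data bus follow the testbench signals.

```verilog
// Sketch only: behavioral 64 x 16 RAM for simulation. The module name, the
// clocked write, and the active-high enables are assumptions; the project's
// real RAM interface may differ.
module ram_model_sketch (
    input  wire        clk,
    input  wire        write_enable,  // assumed active high
    input  wire        read_enable,   // assumed active high
    input  wire [5:0]  address,       // 6 address lines -> 64 words
    inout  wire [15:0] data           // shared bidirectional data bus
);
    reg [15:0] mem [0:63];

    // Drive the bus only during a read; otherwise release it (high-Z)
    // so the CPU can drive it during a write.
    assign data = (read_enable && !write_enable) ? mem[address] : 16'bz;

    // Capture the value on the bus on a write.
    always @(posedge clk) begin
        if (write_enable)
            mem[address] <= data;
    end
endmodule
```

In the testbench, such a model would be connected to the existing signals data_ram, address_to_ram, write_enable_to_ram, and read_enable_to_ram, so that a value written by one instruction can be read back by a later one.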

RTL and Tech Schematic
From the synthesized CPU, we were then able to generate an RTL/technology schematic of our
components. Due to the size of these components, the image was very detailed and complex.

CONCLUSION
Designing this 16-bit processor gave us a lot of insight into computer architecture in
general. Not only did we learn about each component and how to implement it in Verilog, we
also learned how the components relate to one another and how the overall operation should
work. Due to the scale of this project, we spent many hours coding and debugging. We ran
into multiple problems while designing the CPU; for example, we realized that we needed
additional states in order to execute all instructions. As we went step by step through our
implementation of the design, we realized how massive and intense this project was. It gave
us great insight into how much hard work and sacrifice must have gone into making such
processors possible prior to tools such as FPGAs and Xilinx. Although our CPU did not
function properly, we feel confident in saying that this was a tremendous learning
experience.

REFERENCES
R. Parihar and S. Reddy, "Design of 16 Bit RISC Processor," Birla Institute of Technology
and Science, May 2006.


APPENDIX
Verilog HDL Code
Due to space constraints, all code will be sent electronically. The files will be in their
original .v (Verilog) format.
Testbench Code
`timescale 1ns / 1ps
// TOP LEVEL TESTBENCH FOR SIMULATION
module tb_cpu;
    // Inputs
    reg [15:0] data_from_rom;
    reg reset;
    reg clk;
    reg btn_press;
    // Outputs
    wire [5:0] address_to_rom;
    wire enable_to_rom;
    wire write_enable_to_ram;
    wire [5:0] address_to_ram;
    wire read_enable_to_ram;
    wire enable_ram_read;
    // Bidirs
    wire [15:0] data_ram;

    // Instantiate the Unit Under Test (UUT)
    CPU uut (
        .data_from_rom(data_from_rom),
        .reset(reset),
        .clk(clk),
        .btn_press(btn_press),
        .address_to_rom(address_to_rom),
        .enable_to_rom(enable_to_rom),
        .data_ram(data_ram),
        .write_enable_to_ram(write_enable_to_ram),
        .address_to_ram(address_to_ram),
        .read_enable_to_ram(read_enable_to_ram),
        .enable_ram_read(enable_ram_read)
    );

    // Clock generation: toggle clk every 10 ns (20 ns period)
    always begin
        #10 clk = ~clk;
    end

    initial begin
        // Initialize Inputs
        reset = 0;
        clk = 0;
        btn_press = 1;
        #200;            // hold btn_press high for 200 ns, then release it
        btn_press = 0;

        // Instruction words presented on the ROM data input, one per 100 ns
        data_from_rom = 16'b1000000000000101; // LI R0 = 5 IN DECIMAL
        #100;
        data_from_rom = 16'b0000000100000001;
        #100;
        data_from_rom = 16'b1010000000010001;
        #100;
        data_from_rom = 16'b0110001000010000;
        #100;
        data_from_rom = 16'b1000000000000101;
        #100;
        data_from_rom = 16'b1001001100010000;
        #300;
        data_from_rom = 16'b0001010000100011;
        #100;
        data_from_rom = 16'b0110001100100000;
        #100;
        data_from_rom = 16'b1010000000100011;
        #100;
        data_from_rom = 16'b1001010100100000;
        #100;
        data_from_rom = 16'b1011000000000011;
        #100;
        data_from_rom = 16'b1100000000000011;
        #100;
        data_from_rom = 16'b1000000000000101;
        #100;
        data_from_rom = 16'b1011000000000011;
        #100;
        data_from_rom = 16'b1100000000000011;
        #100;
        data_from_rom = 16'b0000000000000011;
        #100;
        data_from_rom = 16'b1010000001010000;
        #100;
        data_from_rom = 16'b0110001000010000;
        #100;
        $finish;         // end the simulation after the last instruction word
    end
endmodule

Synthesis Report Design Summary

CPU Project Status (11/25/2013 - 16:09:36)

Project File: CPU.xise
Parser Errors: No Errors
Module Name: CPU
Implementation State: Placed and Routed
Target Device: xa7a100t-2Icsg324
Product Version: ISE 14.2
Design Goal: Balanced
Design Strategy: Xilinx Default (unlocked)
Environment: System Settings
Errors:
Warnings:
Routing Results: All Signals Completely Routed
Timing Constraints: All Constraints Met
Final Timing Score: 0 (Timing Report)

Device Utilization Summary

Slice Logic Utilization                                             Used   Available  Utilization  Note(s)
Number of Slice Registers                                            459     126,800           1%
  Number used as Flip Flops                                          439
  Number used as Latches                                              20
  Number used as Latch-thrus                                           0
  Number used as AND/OR logics
Number of Slice LUTs                                                 661      63,400           1%
  Number used as logic                                               657      63,400           1%
    Number using O6 output only                                      625
    Number using O5 output only                                        0
    Number using O5 and O6                                            32
    Number used as ROM
  Number used as Memory                                                       19,000           0%
  Number used exclusively as route-thrus
    Number with same-slice register load
    Number with same-slice carry load
    Number with other load
Number of occupied Slices                                            215      15,850           1%
Number of LUT Flip Flop pairs used                                   721
  Number with an unused Flip Flop                                    271         721          37%
  Number with an unused LUT                                           60         721           8%
  Number of fully used LUT-FF pairs                                  390         721          54%
  Number of unique control sets                                       16
  Number of slice register sites lost to control set restrictions     29     126,800           1%
Number of bonded IOBs                                                 51         210          24%
Number of RAMB36E1/FIFO36E1s                                                     135           0%
Number of RAMB18E1/FIFO18E1s                                                     270           0%
Number of BUFG/BUFGCTRLs                                                          32           3%
  Number used as BUFGs
  Number used as BUFGCTRLs
Number of IDELAYE2/IDELAYE2_FINEDELAYs                                           300           0%
Number of ILOGICE2/ILOGICE3/ISERDESE2s                                           300           0%
Number of ODELAYE2/ODELAYE2_FINEDELAYs
Number of OLOGICE2/OLOGICE3/OSERDESE2s                                           300           0%
Number of PHASER_IN/PHASER_IN_PHYs                                                24           0%
Number of PHASER_OUT/PHASER_OUT_PHYs                                              24           0%
Number of BSCANs                                                                               0%
Number of BUFHCEs                                                                 96           0%
Number of BUFRs                                                                   24           0%
Number of CAPTUREs                                                                             0%
Number of DNA_PORTs                                                                            0%
Number of DSP48E1s                                                               240           0%
Number of EFUSE_USRs                                                                           0%
Number of FRAME_ECCs                                                                           0%
Number of ICAPs                                                                                0%
Number of IDELAYCTRLs                                                                          0%
Number of IN_FIFOs                                                                24           0%
Number of MMCME2_ADVs                                                                          0%
Number of OUT_FIFOs                                                               24           0%
Number of PCIE_2_1s                                                                            0%
Number of PHASER_REFs                                                                          0%
Number of PHY_CONTROLs                                                                         0%
Number of PLLE2_ADVs                                                                           0%
Number of STARTUPs                                                                             0%
Number of XADCs                                                                                0%
Average Fanout of Non-Clock Nets                                    5.93

Performance Summary

Final Timing Score: 0 (Setup: 0, Hold: 0, Component Switching Limit: 0)
Pinout Data: Pinout Report
Routing Results: All Signals Completely Routed
Clock Data: Clock Report
Timing Constraints: All Constraints Met


Detailed Reports

Report Name                     Status    Generated                  Errors   Warnings              Infos
Synthesis Report                Current   Mon Dec 16 19:45:37 2013            46 Warnings (1 new)   25 Infos (6 new)
Translation Report              Current   Mon Dec 16 19:51:15 2013
Map Report                      Current   Mon Dec 16 19:51:50 2013
Place and Route Report          Current   Mon Dec 16 19:52:14 2013            1 Warning (1 new)     1 Info (0 new)
                                Current   Mon Dec 16 19:52:31 2013            1 Warning (1 new)     3 Infos (0 new)
Power Report
Post-PAR Static Timing Report
Bitgen Report

Secondary Reports

Report Name           Status       Generated
ISIM Simulator Log    Out of Date  Mon Dec 16 18:42:33 2013
WebTalk Report        Out of Date  Tue Dec 10 07:29:01 2013
WebTalk Log File      Out of Date  Tue Dec 10 07:29:10 2013