HW/SW Co-Design For Soc With Vivado HLS: Example 6-1 Serial Addition

Example 6-1 serial addition
Serial addition is implemented as a plain for loop. Because the loop is already sequential, the code needs no modification.
Figure 1: Serial adder DFG
The code is as follows:
example6-1.h:
#include <ap_int.h>
#include <stdio.h>
#define WIDTH 8
#define IN_NUMBER 8
typedef ap_uint<WIDTH> int8;
int8 sum ( int8 r[IN_NUMBER]);
As shown above, the arbitrary-precision library is used to define a type called int8 that is 8 bits wide (unsigned).
The function prototype is also declared here.
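To see what an 8-bit unsigned accumulator implies, the following minimal sketch models the int8 type with the standard std::uint8_t. That substitution is an assumption made so the snippet compiles without the Vivado HLS headers, and the names u8_model and wrap8_sum are hypothetical. Like ap_uint<8> in its default mode, the stand-in wraps modulo 256 when a sum exceeds 255.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for the document's int8 (ap_uint<8>); an assumption for portability.
typedef std::uint8_t u8_model;

// Sums n values into an 8-bit accumulator; the result wraps modulo 256.
u8_model wrap8_sum(const u8_model *r, std::size_t n) {
    u8_model acc = 0;
    for (std::size_t k = 0; k < n; ++k) {
        acc = static_cast<u8_model>(acc + r[k]);
    }
    return acc;
}
```

With eight inputs of 49 each the true sum is 392, so the 8-bit result is 392 - 256 = 136. A reference model that uses the same 8-bit type wraps in the same way.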
example6-1.cpp:
#include "example6-1.h"
int8 sum ( int8 r[IN_NUMBER])
{
    int8 sum = 0;
    for (int k = 0; k < IN_NUMBER; k++)
    {
        sum += r[k];
    }
    return sum;
}
This code represents our intended HW: a simple for loop that calculates the sum of the array
contents and returns the result.
Tb-6-1.cpp:
#include "example6-1.h"
#include <iostream>
#include <stdlib.h>
using namespace std;
int main ()
{
    int8 sw_result, hw_result;
    int8 r[IN_NUMBER];
    // Fill r with random numbers < 50, print them on screen,
    // and calculate the golden reference.
    sw_result = 0;
    for (int k = 0; k < IN_NUMBER; k++)
    {
        r[k] = rand() % 50;
        cout << "r[" << k << "] = " << r[k] << endl;
        sw_result += r[k];
    }
    // DUT
    hw_result = sum(r);
    cout << "sw_result = " << sw_result << endl;
    cout << "hw_result = " << hw_result << endl;
    // Return 0 on a match so the simulation passes; non-zero fails it.
    if (hw_result == sw_result) return 0;
    else return 1;
}
This is the test bench used for both simulation and co-simulation. If the HW result equals the SW result, main
returns 0 and the simulation passes. Otherwise, the simulation fails.
A successful simulation will output the following:

INFO: [SIM 211-2] *************** CSIM start ***************
WARNING: [SIM 211-51] HLS only supports CLANG compiler in Linux.
INFO: [SIM 211-4] CSIM will launch GCC as the compiler.
Compiling ../../../example6-1.cpp in debug mode
Generating csim.exe
r[0] = 41
r[1] = 17
r[2] = 34
r[3] = 0
r[4] = 19
r[5] = 24
r[6] = 28
r[7] = 8
sw_result = 171
hw_result = 171
INFO: [SIM 211-1] CSim done with 0 errors.
INFO: [SIM 211-3] *************** CSIM finish ***************
Finished C simulation.
We define sum as the top function. After a successful synthesis, the report shows:
1. General Report
2. Performance Estimates
a. Timing
b. Latency
i. Latency: the number of clock cycles needed to output the results.
ii. Interval: the number of clock cycles needed before the next set of inputs can be read.
3. Utilization Estimates
4. Interface
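As a rough aid for reading the latency figure, the sketch below models a rolled loop as trip count times cycles per iteration and converts the result to time. It is a simplification under stated assumptions: it ignores loop entry and exit overhead, its inputs are chosen by the reader rather than taken from the report, and the function names are hypothetical (they are not part of any Vivado HLS API). The synthesis report remains the authoritative source.

```cpp
// Rough cycle model of a rolled (not unrolled, not pipelined) loop.
// Both arguments are reader-supplied; they are not values from the report.
unsigned rolled_loop_cycles(unsigned trip_count, unsigned cycles_per_iteration) {
    return trip_count * cycles_per_iteration;
}

// Converts a cycle count to nanoseconds for a given clock period.
double cycles_to_ns(unsigned cycles, double clock_period_ns) {
    return cycles * clock_period_ns;
}
```

For example, 8 iterations at 2 cycles each give 16 cycles, which is 160 ns with a 10 ns clock.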
Figure 2: Serial adder performance estimates
Note: To reduce the latency, the design should be made concurrent by unrolling the loop.
Figure 3: the utilization estimates of serial adder

Figure 4: serial adder interface
Vivado HLS provides clk and rst ports and handshaking ports by default. The array is treated as a
block memory. Note that a BRAM has only 2 ports, so it may become a bottleneck in some
designs, which then require array partitioning to be optimized.
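The port bottleneck can be estimated with a short calculation. The sketch below is a minimal model, assuming that accesses are spread evenly over the memories; the function name min_read_cycles and its parameters are hypothetical and are not part of the tool.

```cpp
// Minimum number of cycles needed to read n_elements when the array is
// split across `banks` memories, each with `ports_per_bank` read ports.
// Assumes the reads are distributed evenly over the banks.
unsigned min_read_cycles(unsigned n_elements, unsigned banks,
                         unsigned ports_per_bank) {
    unsigned reads_per_cycle = banks * ports_per_bank;
    // Ceiling division: a partially filled last cycle still costs a cycle.
    return (n_elements + reads_per_cycle - 1) / reads_per_cycle;
}
```

With a single dual-port memory, 8 elements need at least 4 cycles; with 4 dual-port memories they can all be read in 1 cycle.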
Figure 5: analysis of serial adder RTL
The analysis perspective shows how the HW works. c0, c1, and c2 are the control
states; they are similar to FSM states, but there is no one-to-one mapping between them. The
read operation takes 2 clock cycles and the addition requires one. The orange cell refers to the
loop.
Example 6-2 concurrent design using adder tree
Adder tree
Figure 6: adder tree
To make the design concurrent, the for loop must be unrolled, so this directive is added:
#pragma HLS UNROLL
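HLS treats arrays as memories, and a memory has at most 2 read ports. To perform
8 memory reads at the same time, we therefore want 4 memories of length 2. Note that
the directive below names no partition type (block or cyclic), so Vivado HLS applies its default
type, complete, which is why Figure 8 reports a completely partitioned array. The directive is: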
#pragma HLS ARRAY_PARTITION variable=r factor=4 dim=1
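To make the idea of 4 memories of length 2 concrete, the sketch below shows where element k would land under the two non-complete partition types with factor 4. This is an illustration only: the directive above does not name a type, so cyclic and block are assumptions here, and the names BankSlot, cyclic_slot, and block_slot are hypothetical.

```cpp
// Position of one array element after partitioning: which memory (bank)
// it lives in and its index inside that memory.
struct BankSlot {
    unsigned bank;
    unsigned index;
};

// Cyclic partitioning interleaves elements: 0,1,2,3 go to banks 0,1,2,3,
// then element 4 goes back to bank 0, and so on.
BankSlot cyclic_slot(unsigned k, unsigned factor) {
    BankSlot s;
    s.bank = k % factor;
    s.index = k / factor;
    return s;
}

// Block partitioning keeps consecutive elements together: with n = 8 and
// factor = 4, elements 0-1 go to bank 0, 2-3 to bank 1, and so on.
BankSlot block_slot(unsigned k, unsigned factor, unsigned n) {
    unsigned block_size = (n + factor - 1) / factor;
    BankSlot s;
    s.bank = k / block_size;
    s.index = k % block_size;
    return s;
}
```

In both cases each of the 4 memories holds 2 of the 8 elements, so two ports per memory are enough to read everything in one cycle.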
After synthesis the result will be an adder tree. The code:

int8 sum ( int8 r[IN_NUMBER])
{
#pragma HLS ARRAY_PARTITION variable=r factor=4 dim=1
    int8 sum = 0;
    sum_label1: for (int k = 0; k < IN_NUMBER; k++)
    {
#pragma HLS UNROLL
        sum += r[k];
    }
    return sum;
}
Figure 7: comparison between Serial and concurrent adders
As shown in the figure, the latency and interval have dropped significantly, at the cost of
increased LUT usage.
Figure 8: Adder tree Interfaces, notice that the array was completely partitioned
Since the array is completely partitioned, all of its elements can be read at the same time.
Without partitioning, the memory ports would be a bottleneck and the design would not perform as expected.
Figure 9: the analysis perspective of adder tree
As shown in the analysis perspective, all the functionality occurs in one clock cycle. We also
see that there are 7 adder units, which together compose the adder tree.
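The count of 7 adders follows from reducing 8 inputs pairwise: 4 additions, then 2, then 1. The sketch below is a plain software model of such a tree, not the generated RTL. It assumes std::uint8_t as a stand-in for the 8-bit int8 type, and the names N_INPUTS, TreeResult, and adder_tree_sum are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

const std::size_t N_INPUTS = 8;  // mirrors IN_NUMBER in the header

struct TreeResult {
    std::uint8_t sum;   // 8-bit result, wraps modulo 256
    unsigned adders;    // number of two-input additions performed
    unsigned levels;    // depth of the tree
};

// Software model of a balanced adder tree over N_INPUTS values.
TreeResult adder_tree_sum(const std::uint8_t r[N_INPUTS]) {
    std::uint8_t level[N_INPUTS];
    for (std::size_t i = 0; i < N_INPUTS; ++i) {
        level[i] = r[i];
    }
    TreeResult out = {0, 0, 0};
    // Each pass halves the number of live values by adding neighbours.
    for (std::size_t width = N_INPUTS; width > 1; width /= 2) {
        for (std::size_t i = 0; i < width / 2; ++i) {
            level[i] = static_cast<std::uint8_t>(level[2 * i] + level[2 * i + 1]);
            ++out.adders;
        }
        ++out.levels;
    }
    out.sum = level[0];
    return out;
}
```

For 8 inputs the model performs 4 + 2 + 1 = 7 additions in 3 levels, which matches the 7 adder units seen in the analysis perspective.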
Example 6-3 minimum area design
When minimum area is required, a serial architecture is used, as in the following code:
header.h:
#include <stdio.h>
int powerof4 ( int a);
main.cpp for serial Datapath design

#include "header.h"
int powerof4 ( int a)
{
    int result = 1;
    for (int i = 0; i < 4; i++)
        result *= a;
    return result;
}
The test bench used to test the design is:

#include "header.h"
#include <iostream>
#include <stdlib.h>
#include <cmath>
using namespace std;
int main ()
{
    int a, sw_result, hw_result;
    unsigned int err_cnt = 0;
    sw_result = 0;
    for (int k = 0; k < 100; k++)
    {
        a = rand() % 50;
        // Software reference
        sw_result = pow(a, 4);
        // DUT
        hw_result = powerof4(a);
        cout << "a = " << a;
        cout << " sw_result = " << sw_result;
        cout << " hw_result = " << hw_result << endl;
        if (hw_result != sw_result) err_cnt++;
    }
    // A non-zero return value fails the simulation.
    return err_cnt;
}
Comparing the latency and resource utilization of the serial and concurrent architectures for
the power of 4 gives this table:
Figure 10: the serial architecture uses half the number of DSPs
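The concurrent version of the power-of-4 function is not listed in this document. As one possible illustration of the trade-off, the sketch below puts a serial model next to a tree-shaped one that squares twice. Both function names are hypothetical, and the tree form is an assumption about how a concurrent datapath could look, not the code that was actually synthesized.

```cpp
// Serial model: one multiplier reused over four loop iterations.
int powerof4_serial_model(int a) {
    int result = 1;
    for (int i = 0; i < 4; i++) {
        result *= a;
    }
    return result;
}

// Tree-shaped model (assumed): two multiplications, a*a and then its square.
// In hardware the multiplications map to separate units instead of one
// shared unit, trading area for latency.
int powerof4_tree_model(int a) {
    int square = a * a;
    return square * square;
}
```

Both models return the same value for the test bench range of a < 50, where the largest result, 49 to the power 4 = 5764801, still fits in a 32-bit int.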
Example 6-3 serial power of 4

It is a simple for loop:
int powerof4 ( int16 a)
{
    int result = 1;
    power_loop: for (int i = 0; i < 4; i++)
        result *= a;
    return result;
}
