Proceeding Paper

A High Level Synthesis Approach for RISC-V RV32I Based SoC and Its FPGA Implementation †

Onur Toker

Electrical and Computer Engineering, Florida Polytechnic University, Lakeland, FL 33805, USA;
[email protected]
† Presented at the 10th International Electronic Conference on Sensors and Applications (ECSA-10), 15–30
November 2023; Available online: https://ecsa-10.sciforum.net/.

Abstract: In this paper, we present a RISC-V RV32I based System on Chip (SoC) design approach
using the Vivado High Level Synthesis (HLS) tool. The proposed approach consists of three separate
levels: The first one is an HLS design and simulation purely in C++. The second one is a Verilog
simulation of the HLS generated Verilog implementation of the CPU core, a RAM unit initialized
with a short assembly code, and a simple output port which simply forwards the output data to the
simulation console. Finally, the third level is an implementation and testing of this SoC on a low-cost
FPGA board (Basys3) running at a clock speed of 100 MHz. A sample C code is compiled using the
GNU RISC-V compiler tool chain and tested on the HLS generated RISC-V RV32I core as well. The
HLS design consists of a single C++ file with less than 300 lines, a single header file, and a testbench
in C++. Our design objectives are (1) The C++ code should be easy to read for an average engineer,
and (2) The coding style should dictate minimal area, i.e., minimal resource utilization, without
significantly degrading the code readability. The proposed system is implemented for two different
I/O bus alternatives: (1) A traditional single clock cycle delay memory interface, and (2) The industry
standard AXI bus. We present timing closure, resource utilization, and power consumption estimates.
Furthermore, by using the open-source synthesis tool yosys, we generate a CMOS gate-level design
and provide gate count details. All design, simulation, and constraint files are publicly available in a
GitHub repo. We also present a simple dual-core SoC design, but detailed multi-core designs and
other advanced features are planned for future research.

Keywords: High Level Synthesis; RISC-V; System on Chip; FPGA; multi-core architectures

Citation: Toker, O. A High Level Synthesis Approach for RISC-V RV32I Based SoC and Its FPGA
Implementation. Eng. Proc. 2023, 56, 0. https://doi.org/

Academic Editor: Firstname Lastname

Published: 15 November 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open
access article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Eng. Proc. 2023, 56, 0. https://doi.org/10.3390/0 https://www.mdpi.com/journal/engproc

1. Introduction

In this paper, we present a RISC-V RV32I based System on Chip (SoC) design and
implementation using a High Level Synthesis (HLS) approach. The complete core design
is done in HLS, and then simulated at the C level, then at the Verilog level, and finally
tested on a low-cost FPGA board at 100 MHz clock speed. Both assembly programs and C
programs compiled with the GNU RISC-V toolchain are used as RAM images for testing
the HLS generated core. The proposed HLS core design has a single C++ file with fewer than
300 lines, and is designed to be both highly readable and to use minimal hardware resources.
There are several published papers on CPU design in different hardware description
languages (HDL). In [1], a very simple reduced instruction set (RISC) processor design
is presented with about 120 lines of Verilog code. See [2–4] and references therein for
related work. RISC-V is a free and open source instruction set architecture [5? ]. The
standard defines various ISAs starting with the base architecture RV32I. There are numerous
Verilog implementations of RISC-V architectures, with varying degrees of performance and
resource utilization. The paper [6] presents a review of some of the well-known open source
designs, and links to the relevant GitHub repos for source code. One disadvantage of these
Verilog implementations is the length of the source code, which is the main motivation
for the HLS based approach adopted in this work. The HLS based approach can be quite
useful for rapid prototyping of complex ideas, especially for systems with complex state
machines. To the best of the author's knowledge, there is limited published work in which an
HLS based approach is used for a RISC-V core design. In [7], an HLS design is presented,
but the source code is split into multiple files, making it difficult to read. What is presented
in this work is a single-file design which is relatively short, easily readable, and yet suitable
for an FPGA implementation with a clock speed of 100 MHz. Open source RISC-V cores can
also be quite useful for computer architecture education, see [8]. The proposed HLS RISC-V
RV32I core source code is available in the public GitHub repo [? ]. Finally, the author
would like to cite [? ] as a source of inspiration for this work.
This paper is organized as follows: In Section 2, we summarize the RISC-V RV32I
instruction set architecture. In Section 3, a high level synthesis approach for design and
simulation is presented. Verilog simulations of our RISC-V SoC are presented in Section 4,
the CMOS gate-level design using the open-source synthesis tool yosys and gate count details
are given in Section 5, and the FPGA implementation and testing are presented in Section 6.
In Section 7, a sample C program is used for testing the HLS generated core. A multi-core
RISC-V SoC approach is outlined in Section 8, and finally some concluding remarks are
made in Section 9.

2. RISC-V RV32I Architecture

In this section, we summarize the RISC-V RV32I instruction set architecture (ISA) [5? ].
From a programming perspective, there are 32 registers x0, ..., x31 and a program counter
PC, all 32 bits wide. The register x0 is hardwired to 0, and the instructions are divided
into six different groups: R (Register), I (Immediate), S (Store), B (Branch), J (Jump), and
U (Upper) [5? ]. Full details of the instruction encoding and instruction fields are given
in [5? ]. For the HLS implementation of the instruction decoder stage, we divide I type
instructions into IA (Immediate arithmetic), IM (Immediate memory), IJ (Immediate jump),
and IE (Immediate exception) groups. Furthermore, U type instructions are divided into U1
(Upper1) and U2 (Upper2). All of the instructions are 32 bits in size and have a 7-bit opcode
field located in bits 6 down to 0. Furthermore, there are 3-bit func3, 7-bit func7, and
imm fields, but not all instructions have all three of these additional fields [5? ].
The sra and srai instructions use the most-significant bit (MSB) extension rule,
whereas the instructions sltu, sltiu, lbu, lhu, bltu, and bgeu use the zero extension rule.
The instructions ecall and ebreak are implemented as trap/halt. All unaligned memory
accesses are also implemented as trap/halt.
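
As a small illustration of the fixed field positions described above, the following plain C++ sketch extracts the opcode, the register indices, and the func3/func7 fields from a 32-bit instruction word. The names Fields, bits, decode_fields, and check_decode_fields are hypothetical and are not taken from riscv32i.cc; the bit positions are those of the RV32I base encoding.

```cpp
#include <cassert>
#include <cstdint>

// Fixed-position fields of an RV32I instruction word.
// Not every instruction type uses every field.
struct Fields {
    uint32_t opcode;  // bits 6..0
    uint32_t rd;      // bits 11..7
    uint32_t func3;   // bits 14..12
    uint32_t rs1;     // bits 19..15
    uint32_t rs2;     // bits 24..20
    uint32_t func7;   // bits 31..25
};

// Bits hi..lo (inclusive) of a word; only used here for fields narrower than 32 bits.
inline uint32_t bits(uint32_t word, unsigned hi, unsigned lo) {
    return (word >> lo) & ((1u << (hi - lo + 1)) - 1u);
}

inline Fields decode_fields(uint32_t insn) {
    Fields f;
    f.opcode = bits(insn, 6, 0);
    f.rd     = bits(insn, 11, 7);
    f.func3  = bits(insn, 14, 12);
    f.rs1    = bits(insn, 19, 15);
    f.rs2    = bits(insn, 24, 20);
    f.func7  = bits(insn, 31, 25);
    return f;
}

// Hypothetical self-check using "sub x3,x1,x2", which encodes as 0x402081b3.
inline void check_decode_fields() {
    const Fields f = decode_fields(0x402081b3u);
    assert(f.opcode == 0x33u && f.rd == 3u && f.func3 == 0u);
    assert(f.rs1 == 1u && f.rs2 == 2u && f.func7 == 0x20u);
}
```

In the HLS source the same selections are written with the slicing operator, e.g., insn(6,0) for the opcode.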

Table 1. RV32I instructions [5? ].

Inst    Type   Description
add     R      rd = rs1 + rs2
sub     R      rd = rs1 - rs2
xor     R      rd = rs1 ^ rs2
or      R      rd = rs1 | rs2
and     R      rd = rs1 & rs2
sll     R      rd = rs1 << rs2
srl     R      rd = rs1 >> rs2
sra     R      rd = rs1 >> rs2 (MSB extension)
slt     R      rd = (rs1 < rs2)?1:0
sltu    R      rd = (rs1 < rs2)?1:0 (zero extension)
addi    IA     rd = rs1 + imm
xori    IA     rd = rs1 ^ imm
ori     IA     rd = rs1 | imm
andi    IA     rd = rs1 & imm
slli    IA     rd = rs1 << imm[0:4]
srli    IA     rd = rs1 >> imm[0:4]
srai    IA     rd = rs1 >> imm[0:4] (MSB extension)
slti    IA     rd = (rs1 < imm)?1:0
sltiu   IA     rd = (rs1 < imm)?1:0 (zero extension)
lb      IM     rd = M[rs1+imm][0:7]
lh      IM     rd = M[rs1+imm][0:15]
lw      IM     rd = M[rs1+imm][0:31]
lbu     IM     rd = M[rs1+imm][0:7] (zero extension)
lhu     IM     rd = M[rs1+imm][0:15] (zero extension)
sb      S      M[rs1+imm][0:7] = rs2[0:7]
sh      S      M[rs1+imm][0:15] = rs2[0:15]
sw      S      M[rs1+imm][0:31] = rs2[0:31]
beq     B      if(rs1 == rs2) PC += imm
bne     B      if(rs1 != rs2) PC += imm
blt     B      if(rs1 < rs2) PC += imm
bge     B      if(rs1 >= rs2) PC += imm
bltu    B      if(rs1 < rs2) PC += imm (zero extension)
bgeu    B      if(rs1 >= rs2) PC += imm (zero extension)
jal     J      rd = PC+4; PC += imm
jalr    IJ     rd = PC+4; PC = rs1 + imm
lui     U1     rd = imm << 12
auipc   U2     rd = PC + (imm << 12)
ecall   IE     Trap/Halt
ebreak  IE     Trap/Halt
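
Several rows of Table 1 share the same C-like description and differ only in the extension rule. The sketch below, in plain C++ with hypothetical helper names, shows one way to express the difference between srl and sra, slt and sltu, and lb and lbu on 32-bit values. It is an illustration of the semantics only, not the code used in the HLS design.

```cpp
#include <cassert>
#include <cstdint>

// srl: logical right shift, zeros enter from the left. Only rs2[4:0] is used.
inline uint32_t srl(uint32_t rs1, uint32_t rs2) { return rs1 >> (rs2 & 31u); }

// sra: arithmetic right shift, copies of the MSB enter from the left.
inline uint32_t sra(uint32_t rs1, uint32_t rs2) {
    const uint32_t sh = rs2 & 31u;
    const uint32_t fill =
        ((rs1 & 0x80000000u) != 0u && sh != 0u) ? ~(0xffffffffu >> sh) : 0u;
    return (rs1 >> sh) | fill;
}

// slt: signed compare. Flipping the sign bit maps signed order onto unsigned order.
inline uint32_t slt(uint32_t a, uint32_t b) {
    return ((a ^ 0x80000000u) < (b ^ 0x80000000u)) ? 1u : 0u;
}

// sltu: unsigned compare.
inline uint32_t sltu(uint32_t a, uint32_t b) { return (a < b) ? 1u : 0u; }

// lb: the loaded byte is extended with its MSB; lbu: extended with zeros.
inline uint32_t lb(uint32_t byte)  { return (byte & 0xffu) | ((byte & 0x80u) ? 0xffffff00u : 0u); }
inline uint32_t lbu(uint32_t byte) { return byte & 0xffu; }

// Hypothetical self-check of the differences.
inline void check_extension_rules() {
    assert(srl(0x80000000u, 4) == 0x08000000u);
    assert(sra(0x80000000u, 4) == 0xf8000000u);
    assert(slt(0xffffffffu, 1u) == 1u);   // -1 < 1 when signed
    assert(sltu(0xffffffffu, 1u) == 0u);  // 4294967295 > 1 when unsigned
    assert(lb(0x80u) == 0xffffff80u && lbu(0x80u) == 0x80u);
}
```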

3. HLS Approach for Design and Simulation

The HLS design consists of the C++ file riscv32i.cc and the header file riscv32i.h.
There is also a C-simulation testbench file riscv32i_tb.cc. In this section, we simply
summarize the main design ideas, but the full source code is available in our GitHub repo [? ].
We start with the outline of the design file riscv32i.cc, see Outline I. This file has
only the cpu() function, which has two pointer arguments. For C-simulation, they have the
usual semantics, but for hardware synthesis, the first one is interpreted as a single-port RAM
and the other is implemented as a 4-bit write-strobe signal. The local array reg_file[]
is interpreted as a multi-port RAM for hardware synthesis, which corresponds to the
internal register file. The HLS tool has a standard C compiler which works according
to standard semantic rules for simulations, but for hardware synthesis the semantic details
are different and can be controlled by using the #pragma HLS directives. Full details are
available in the Vivado HLS User Guide [? ].

Outline I: Outline of the design file riscv32i.cc

#include "riscv32i.h"
#include <stdio.h>
#include <stdint.h>

// Write strobe
#define wstrb (*pstrb)

void cpu(arch_t mem[MEM_SIZE], volatile strb_t* pstrb) {

#pragma HLS RESOURCE variable=mem core=RAM_1P_BRAM
#pragma HLS INTERFACE ap_none port=pstrb

    // Register file
    arch_t reg_file[REGFILE_SIZE];

    for (int i = 0; i < REGFILE_SIZE; i++)
        reg_file[i] = 0;

    arch_t pc = 0;

    PROGRAM_LOOP: while (true) {

        // Fetch
        arch_t insn = mem[pc >> 2];

        // Decode
        opcode_t opcode = insn(6,0);
        ...

        // Execute
        switch (opcode) {
        case OPCODE_R:
        case OPCODE_IA:
            switch(...) {
            ...
            }
            break;
        ...
        }

        // Write back to reg_file or memory or PC

        // Branch handling
    }
}

As seen in Outline I, immediately after reset the program counter and all of the
registers are initialized to zero. There is an infinite while loop, which is exited if
an ECALL or EBREAK instruction is executed or an unaligned memory access is requested,
basically causing the CPU core to halt.
The HLS tool converts this while loop to a state machine with 11 states using
one-hot encoding. Inside the loop, we have the usual instruction fetch, decode, execute,
write-back, and branch handling. For example, insn = mem[pc >> 2] will be synthesized as
a memory read operation, and opcode = insn(6,0) will be synthesized as selecting the
least significant 7 bits of the 32-bit value read from the memory. Note that, by using the
operator overloading features of C++, we are able to express slicing and concatenation in
C++, see [? ? ] for full details. For example, in the instruction decode stage, we have the
lines

immI = ( ((ap_int<ARCH>) insn) >> 20 );
immS = ( immI(31,5), insn(11,8), insn(7,7) );
immB = ...
immJ = ...
immU = ( insn(31,12), ((ap_uint<12>) 0) );

which correspond to generating the 32-bit immediate value for the various types of
instructions. Note that ap_uint<p> is used for p-bit unsigned integers, insn(p,q) corresponds to
slicing, and ( ... , ... , ... ) corresponds to concatenation. These are possible
because of the standard operator overloading features of C++. Note that the C simulation
semantics and the hardware synthesis semantics are different.
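
For readers without the HLS arbitrary-precision types at hand, the immediate generation can also be sketched in plain C++ with shifts and masks. The sketch below follows the RV32I immediate layouts for the I, S, B, J, and U formats; the function names (sext, imm_i, and so on) are hypothetical and this is not the code of riscv32i.cc, which uses slicing and concatenation instead.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low `width` bits of v (width < 32) to a 32-bit signed value.
inline int32_t sext(uint32_t v, unsigned width) {
    const uint32_t sign = 1u << (width - 1);
    v &= (sign << 1) - 1u;
    return static_cast<int32_t>((v ^ sign) - sign);
}

// I type: imm[11:0] = insn[31:20]
inline int32_t imm_i(uint32_t insn) { return sext(insn >> 20, 12); }

// S type: imm[11:5] = insn[31:25], imm[4:0] = insn[11:7]
inline int32_t imm_s(uint32_t insn) {
    return sext(((insn >> 25) << 5) | ((insn >> 7) & 0x1fu), 12);
}

// B type: imm[12|10:5] = insn[31|30:25], imm[4:1|11] = insn[11:8|7], imm[0] = 0
inline int32_t imm_b(uint32_t insn) {
    return sext((((insn >> 31) & 1u) << 12) | (((insn >> 7) & 1u) << 11) |
                (((insn >> 25) & 0x3fu) << 5) | (((insn >> 8) & 0xfu) << 1), 13);
}

// J type: imm[20|10:1|11|19:12] = insn[31|30:21|20|19:12], imm[0] = 0
inline int32_t imm_j(uint32_t insn) {
    return sext((((insn >> 31) & 1u) << 20) | (((insn >> 12) & 0xffu) << 12) |
                (((insn >> 20) & 1u) << 11) | (((insn >> 21) & 0x3ffu) << 1), 21);
}

// U type: imm[31:12] = insn[31:12], low 12 bits are zero
inline int32_t imm_u(uint32_t insn) {
    return static_cast<int32_t>(insn & 0xfffff000u);
}

// Hypothetical self-check with hand-assembled instruction words.
inline void check_immediates() {
    assert(imm_i(0x3fc00093u) == 1020);       // addi x1,x0,1020
    assert(imm_s(0xfe20ae23u) == -4);         // sw x2,-4(x1)
    assert(imm_b(0xfe000ee3u) == -4);         // beq x0,x0,-4
    assert(imm_j(0xff5ff06fu) == -12);        // jal x0,-12
    assert(imm_u(0x123452b7u) == 0x12345000); // lui x5,0x12345
}
```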
There are various switch statements, which are synthesized as wide multiplexers.
Nested switch statements correspond to cascaded multiplexers. To make sure that a minimal
number of adders, comparators, barrel shifters, etc. are synthesized, and no hardware
resources are wasted or underutilized, we first define the program variables src1, src2, res
and then write a series of switch statements. This coding style may look a bit unusual, but it is
still highly readable, and is adopted purely for optimal hardware synthesis. In other words,
the C++ coding style used in HLS greatly affects the final generated hardware, and we tried
to keep a reasonable balance between C++ code readability and hardware optimality.
The HLS tool automatically generates Verilog files in human readable format, but
also allows C-simulation based testing using the file riscv32i_tb.cc. This C-simulation
testbench reads a text file of hexadecimal values in human readable format, initializes the
memory by using these values and passes the control to the cpu() function. Immediately
after return, all register values and the memory are dumped to separate text files. In
Figure 1, Vivado HLS C-simulation for the following short assembly program is given:

li x1,1020
sw x0,0(x1)
loop: lw x2,0(x1)
addi x2,x2,1
sw x2,0(x1)
j loop

Values stored in registers and memory as well as internal signals are displayed in the
debug window. The hexadecimal values for each instruction are written to the file mem.txt, and
the conversion is done by using an online assembler tool. See [? ] for full details.
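
As an alternative to an online assembler, the six instructions of the short program above can be hand-assembled. The following C++ sketch encodes them from the RV32I I, S, and J formats, assuming that li x1,1020 expands to addi x1,x0,1020 and j loop to jal x0,-12 (the loop label is 12 bytes before the jump). All function names are hypothetical, and the one-word-per-line hexadecimal format is an assumption about mem.txt.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// I type: imm[11:0] | rs1 | funct3 | rd | opcode
inline uint32_t enc_i(uint32_t opcode, uint32_t funct3, uint32_t rd, uint32_t rs1, int32_t imm) {
    return ((static_cast<uint32_t>(imm) & 0xfffu) << 20) | (rs1 << 15) |
           (funct3 << 12) | (rd << 7) | opcode;
}

// S type (opcode 0x23): imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode
inline uint32_t enc_s(uint32_t funct3, uint32_t rs1, uint32_t rs2, int32_t imm) {
    const uint32_t u = static_cast<uint32_t>(imm);
    return (((u >> 5) & 0x7fu) << 25) | (rs2 << 20) | (rs1 << 15) |
           (funct3 << 12) | ((u & 0x1fu) << 7) | 0x23u;
}

// J type (jal, opcode 0x6f): imm[20|10:1|11|19:12] | rd | opcode
inline uint32_t enc_j(uint32_t rd, int32_t imm) {
    const uint32_t u = static_cast<uint32_t>(imm);
    return (((u >> 20) & 1u) << 31) | (((u >> 1) & 0x3ffu) << 21) |
           (((u >> 11) & 1u) << 20) | (((u >> 12) & 0xffu) << 12) | (rd << 7) | 0x6fu;
}

// The counter program, one 32-bit word per instruction.
inline std::array<uint32_t, 6> counter_program() {
    return {{
        enc_i(0x13u, 0u, 1u, 0u, 1020),  // li x1,1020   (addi x1,x0,1020)
        enc_s(2u, 1u, 0u, 0),            // sw x0,0(x1)
        enc_i(0x03u, 2u, 2u, 1u, 0),     // loop: lw x2,0(x1)
        enc_i(0x13u, 0u, 2u, 2u, 1),     // addi x2,x2,1
        enc_s(2u, 1u, 2u, 0),            // sw x2,0(x1)
        enc_j(0u, -12),                  // j loop       (jal x0,-12)
    }};
}

// Eight lowercase hex digits, the assumed per-line format of the RAM image.
inline std::string hex_word(uint32_t w) {
    char buf[9];
    std::snprintf(buf, sizeof buf, "%08x", static_cast<unsigned>(w));
    return std::string(buf);
}
```

The first word produced this way, 3fc00093, agrees with the commented-out initialization of ram[0] in the RAM module outline of Section 4.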

Figure 1. Vivado HLS C-Simulation.

4. RISC-V SoC Simulation in Verilog

In this section, we present the Verilog simulation of the HLS generated RISC-V core.
One can either copy the HLS generated Verilog files to the Vivado project folder, or create
an IP object for block diagram based design. In this section, we simply copy
the generated Verilog files from one folder to the other, but in the FPGA based
design of Section 6 we use the block diagram based design approach for better visualization of the
overall system.

Outline II: Outline of the SystemVerilog testbench

module sys_tb();

    localparam T=10;

    logic clk, reset, start, done, idle, ready, we, ce, vld;
    logic [3:0] wstrb;
    logic [9:0] addr;
    logic [31:0] val_i, val_o;

    cpu U1(
        .ap_clk(clk),
        .ap_rst(reset),
        .ap_start(start),
        .ap_idle(idle),
        .ap_ready(ready),
        .mem_V_address0(addr),
        .mem_V_ce0(ce),
        .mem_V_we0(we),
        .mem_V_d0(val_i),
        .mem_V_q0(val_o),
        .pstrb_V(wstrb)
    );

    mem U3(.clk(clk), .we(we), .addr(addr),
           .din(val_i), .dout(val_o), .wstrb(wstrb) );

    //SRAM U4 (.clka(clk), .wea({4{we}} & wstrb), .addra(addr),
    //         .dina(val_i), .douta(val_o) );

    initial clk = 0;
    always #(T/2) clk = ~clk;

    initial
    begin
        ...
        wait(idle==1);
        $stop;
    end
endmodule

The simulation testbench outline is given in Outline II, and the RAM with the I/O
devices is presented in Outline III. Basically, we have a simple system on chip consisting
of a single RISC-V RV32I core, a 4 KB RAM with single clock cycle read/write delay, and a
32-bit output port at memory address 0x0ff. This is a word address, i.e., byte address 1020 as
used in the assembly program of Section 3.

Outline III: Outline of the RAM and I/O devices

module mem(clk, we, addr, din, dout, wstrb);

    input clk, we;
    input [3:0] wstrb;
    input [9:0] addr;
    input [31:0] din;
    output [31:0] dout;

    logic [9:0] read_addr;       // registered read address (one clock cycle read delay)
    logic [31:0] ram [0:1023];   // 1024 words of 32 bits, i.e., 4 KB

    always @(posedge clk)
    begin
        if (we) begin
            // Byte-wise write, one write-strobe bit per byte lane
            if (wstrb[0]) ram[addr][ 7: 0] <= din[ 7: 0];
            if (wstrb[1]) ram[addr][15: 8] <= din[15: 8];
            if (wstrb[2]) ram[addr][23:16] <= din[23:16];
            if (wstrb[3]) ram[addr][31:24] <= din[31:24];
            /* add memory-mapped IO here */
            if (addr == 255)
                $write("%c", din[7:0]); // Change %c to %x
        end
        read_addr <= addr;
    end
    assign dout = ram[read_addr];

    initial
        $readmemh("C:/Users/onur/Desktop/MyWork/vivado/RISCV32I_HLS/mem.txt", ram);

    //initial begin
    //    ram[0] = 32'h3fc00093; // li x1,1020
    //    ram[1] = ...
    //end
endmodule

In Figure 2, Verilog simulation results are shown. We are using the assembly program
given in the previous section, which basically writes the values 0, 1, 2, ... to the
address 0x0ff. The program counter PC is shown in the timing diagram, and the values
written to the output port at address 0x0ff are shown both in the simulation console and
the timing diagram. There is a specific reason why $write("%c", din[7:0]) is used for
the memory mapped I/O at address 0x0ff. If we use a C compiler and implement putc()
as a write to the I/O address 0x0ff, then all printf(...) and cout << ... calls will write to
the Verilog simulation console. This allows testing of more complex C/C++ programs with
the HLS generated RISC-V core.
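
The behavior of the RAM with byte write strobes and the console output port can also be mimicked in software. The following C++ sketch is a hypothetical model (the names Mem, kOutPortWord, and console are not from the paper) of a 1024-word memory in which each strobe bit enables one byte lane and every write to word address 255 appends the low byte to a console string, similar to the $write call in the Verilog module.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// Word address of the output port (0x0ff), as in the Verilog RAM module.
constexpr uint32_t kOutPortWord = 255;

struct Mem {
    std::array<uint32_t, 1024> ram{};  // 1024 x 32 bits = 4 KB, zero-initialized
    std::string console;               // stands in for the simulation console

    // Write `din` to word `addr`; bit k of `wstrb` enables byte lane k.
    void write(uint32_t addr, uint32_t din, uint32_t wstrb) {
        addr &= 1023u;  // 10-bit address bus
        uint32_t mask = 0;
        for (unsigned lane = 0; lane < 4; ++lane)
            if (wstrb & (1u << lane)) mask |= 0xffu << (8 * lane);
        ram[addr] = (ram[addr] & ~mask) | (din & mask);
        // Memory-mapped I/O: forward the low byte of every write to the port.
        if (addr == kOutPortWord) console.push_back(static_cast<char>(din & 0xffu));
    }

    uint32_t read(uint32_t addr) const { return ram[addr & 1023u]; }
};

// Hypothetical self-check: partial-word write and console output.
inline void check_mem_model() {
    Mem m;
    m.write(3, 0xaabbccddu, 0xfu);  // full word
    m.write(3, 0x11223344u, 0x5u);  // only byte lanes 0 and 2
    assert(m.read(3) == 0xaa22cc44u);
    m.write(kOutPortWord, 'R', 0xfu);
    m.write(kOutPortWord, 'V', 0xfu);
    assert(m.console == "RV");
}
```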

Figure 2. Verilog simulation.

In our simulation testbench, we also have a block RAM option, shown as SRAM. This
allows testing the HLS generated RISC-V core using block RAMs available on most Xilinx
FPGAs, see Figure 3.

Figure 3. Block RAM should have single clock cycle read/write delay.

5. RISC-V RV32I Core Synthesis Gate Counts

In this short section, we present gate count results for the CMOS gate-level design
generated by the open-source synthesis tool yosys. The following script is used for the
synthesis tool

read_verilog cpu.v cpu_reg_file_V.v

hierarchy -check
proc; opt; fsm; opt; memory; opt
techmap; opt
read_liberty -lib cmos_cells.lib
abc -liberty cmos_cells.lib
splitnets -ports; opt
stat

and the following gate-count results are reported by the synthesis tool:

=== cpu ===

Number of wires: 8282

Number of cells:
$_DFF_P_ 321
NAND 2689
NOR 3714
NOT 924

=== cpu_reg_file_V ===

Number of wires: 7714

Number of cells:
$_DFF_P_ 1056
NAND 4726
NOR 1505
NOT 387

In summary, a total of 1377 D-type flip-flops are used including the register file of
depth 32 and width 32. We have forced the synthesis tool to design using only two input
NAND and NOR gates, and with that constraint the total number of two-input NAND
gates is 7415, two-input NOR gates is 5219, and NOT gates is 1311.
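
The totals quoted above follow directly from the two per-module reports. As a quick cross-check, the short C++ sketch below adds the cell counts of the cpu and cpu_reg_file_V modules; the struct and function names are hypothetical, and the numbers are copied from the yosys output listed above.

```cpp
#include <cassert>

// Cell counts of one module, as reported by the yosys stat command.
struct CellCounts {
    int dff;    // $_DFF_P_
    int nand2;  // NAND
    int nor2;   // NOR
    int inv;    // NOT
};

const CellCounts kCpu{321, 2689, 3714, 924};
const CellCounts kRegFile{1056, 4726, 1505, 387};

inline CellCounts total(const CellCounts& a, const CellCounts& b) {
    return CellCounts{a.dff + b.dff, a.nand2 + b.nand2, a.nor2 + b.nor2, a.inv + b.inv};
}

// Number of combinational gates (everything except flip-flops).
inline int logic_gates(const CellCounts& c) { return c.nand2 + c.nor2 + c.inv; }

// Self-check against the totals stated in the text.
inline void check_gate_totals() {
    const CellCounts t = total(kCpu, kRegFile);
    assert(t.dff == 1377 && t.nand2 == 7415 && t.nor2 == 5219 && t.inv == 1311);
}
```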

6. RISC-V SoC Implementation on an FPGA

In this section, we will present a simple RISC-V SoC implemented on an FPGA. High
level details are presented in Figure 4, and elaborated design is shown in Figure 5.

[Block diagram. Blocks: RISCV_I32 core (riscv_H1), SRAM_4K (Block RAM), CONCAT_0 (Concat),
VAND_0 (Utility Vector Logic), SLICE_0 (Slice), and REG_0 driving LED[7:0] (Output port).]

Figure 4. A RISC-V SoC block diagram for FPGA implementation.

The elaborated design has 1296 cells, and 1968 nets.


Figure 5. Elaborated design of the RISC-V SoC. The blue box on the right corresponds to the
register file.

Resource utilization of the implemented design is 1078 LUTs (5.18%), 326 FFs (0.78%),
and 3% of the BRAM. The final system has a worst negative slack (WNS) of 1.41 ns for the setup
time with a 100 MHz clock, i.e., the setup timing is met with a margin of 1.41 ns. The power
consumption is estimated as 81 mW at a 100 MHz clock.
Figure 6 shows the FPGA implementation of the SoC for the Basys3 board. Note that the
whole SoC design fits into a portion of the clock region X0Y0. The large rectangular block at
the center of Figure 6 is the 4 KB RAM used for the system on chip.

Figure 6. RISC-V SoC FPGA implementation for the Basys3 board fits into a portion of the clock
region X0Y0.

We use the same assembly program given in Section 3, and make sure that the hex
values corresponding to the assembly instructions are loaded into the SoC RAM. After the
system is reset using the button btnC, the CPU core can be started using the button btnU.
Figure 7 shows a Basys3 board implementation of our RISC-V SoC with the output port
connected to the on-board LEDs. Note that bits 20 down to 13 of the 32-bit value written to
memory are routed to the I/O port using the slice block shown in Figure 4. The assembly
program given in Section 3 has a loop execution time of 170 ns, i.e., 17 clock cycles
per iteration. The slicing block effectively slows down the counting speed so that
the counting can be observed with the naked eye.
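
The effect of the slice block can be worked out from the numbers above. Assuming a constant 17 clock cycles per loop iteration at 100 MHz, the counter is incremented every 170 ns, so counter bit k changes once every 2^k increments. The following C++ sketch, with hypothetical names, computes these intervals and the value seen on the LEDs.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr double kClockHz = 100e6;       // 100 MHz clock
constexpr unsigned kCyclesPerLoop = 17;  // from the measured loop time
constexpr unsigned kSliceLsb = 13;       // LED[0] is counter bit 13, LED[7] is bit 20

// Time for one loop iteration, i.e., one counter increment.
constexpr double loop_time_s() { return kCyclesPerLoop / kClockHz; }

// Time between two changes of counter bit `bit`.
constexpr double toggle_interval_s(unsigned bit) {
    return loop_time_s() * static_cast<double>(1u << bit);
}

// 8-bit value routed to the LEDs for a given counter value.
constexpr uint32_t led_value(uint32_t counter) { return (counter >> kSliceLsb) & 0xffu; }

// Self-check of the derived numbers.
inline void check_led_timing() {
    assert(std::fabs(loop_time_s() - 170e-9) < 1e-15);               // 170 ns
    assert(std::fabs(toggle_interval_s(13) - 1.39264e-3) < 1e-9);    // about 1.39 ms
    assert(std::fabs(toggle_interval_s(20) - 0.17825792) < 1e-9);    // about 0.18 s
    assert(led_value(1u << 13) == 1u);
}
```

Under these assumptions, the least significant LED changes about every 1.39 ms and the most significant one about every 0.18 s, so the whole 8-bit pattern repeats roughly every 0.36 s.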

Figure 7. RISC-V SoC implemented on a Basys3 board.

7. Testing with a Sample C Program

In this section, we use a short C program for simulating the RISC-V H1 core designed
earlier. Our test code is given below:

#include <stdint.h>

// Byte address of the output port (word address 0x0ff)
#define OUTPORT (0x3fc)

void main(void);
void main(void) {
    *((volatile uint32_t*)OUTPORT) = 'R';
    *((volatile uint32_t*)OUTPORT) = 'I';
    *((volatile uint32_t*)OUTPORT) = 'S';
    *((volatile uint32_t*)OUTPORT) = 'C';
    *((volatile uint32_t*)OUTPORT) = '\n';
}

It is compiled with the GNU RISC-V compiler to generate the RAM image. As shown in
Outline III, we have a $readmemh to initialize the RAM for the Verilog simulation. Again as
shown in Outline III, all writes to address 0x0ff are forwarded to the simulation console
using the $write command. In summary, when the SoC is simulated with the RAM image generated
by the GNU RISC-V compiler, we see the string 'RISC' written to the console followed
by a newline, which serves as another verification of the H1 core. In a future version of the
paper, we will use longer C programs for more comprehensive testing.

8. A Multicore RISC-V SoC

In this section, we briefly summarize our multicore RISC-V SoC implementation. We
start by changing the memory interface from block RAM to an AXI master, i.e., change the
HLS directive to #pragma HLS INTERFACE m_axi depth=1024 port=mem. This results in
a different RISC-V RV32I core equipped with the AXI master interface. Vivado HLS
generates a Verilog implementation with 42 states, which we call the H2 core. For this
AXI equipped H2 core, we need to delete the write-strobe port, wstrb, and use

(mem[addr >> 2])( 7,0) = res;
(mem[addr >> 2])(15,0) = res;
(mem[addr >> 2])(31,0) = res;

to implement byte, half-word, and word sized memory write operations, respectively.
Note that the bit-slicing operator (.,.) can be used on both the left-hand and right-hand sides
of expressions.
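
In plain C++, without the HLS arbitrary-precision types, assigning to a bit range of a memory word amounts to a masked read-modify-write. The sketch below shows this for 8-bit, 16-bit, and 32-bit stores into the low end of a word; set_range and check_set_range are hypothetical names, and the sketch only illustrates what a slice assignment does, not how the H2 core is synthesized.

```cpp
#include <cassert>
#include <cstdint>

// Replace bits hi..lo (inclusive) of `word` with the low bits of `value`,
// similar in effect to writing word(hi,lo) = value with the slicing operator.
inline uint32_t set_range(uint32_t word, unsigned hi, unsigned lo, uint32_t value) {
    const unsigned width = hi - lo + 1;
    const uint32_t field = (width == 32) ? 0xffffffffu : ((1u << width) - 1u);
    const uint32_t mask = field << lo;
    return (word & ~mask) | ((value << lo) & mask);
}

// Self-check: 8-bit, 16-bit, and 32-bit stores into an existing word.
inline void check_set_range() {
    const uint32_t old_word = 0xaabbccddu;
    const uint32_t res = 0x11223344u;
    assert(set_range(old_word, 7, 0, res) == 0xaabbcc44u);   // 8-bit store
    assert(set_range(old_word, 15, 0, res) == 0xaabb3344u);  // 16-bit store
    assert(set_range(old_word, 31, 0, res) == 0x11223344u);  // 32-bit store
}
```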
In Figure 8 we have a dual-core RISC-V RV32I system with 8K on-chip RAM, two
8-bit output ports, a 16-bit input port, and a single UART port. The H2 core does not have
a tightly coupled memory (TCM) inside the unit, but this will be addressed in a future
version of the paper. Basically, in the current implementation both cores are using the
on-chip static RAM over the AXI bus. All GPIOs and the UART unit are also on the AXI-bus.
We have added a JTAG to AXI unit which can be used for debugging and initialization of

the on-chip static RAM. For this dual-core SoC to function properly, both cores should have
different reset vectors so that they can execute different programs independently.
[Block diagram. Blocks: cpu_0 and cpu_1 (riscv32i H2), axi_smc (AXI SmartConnect),
axi_bram_ctrl_0 (AXI BRAM Controller) with axi_bram_ctrl_0_bram (Block Memory Generator),
axi_gpio_0 and axi_gpio_1 (AXI GPIO, outputs LED0[7:0] and LED1[7:0]), axi_gpio_2 (AXI GPIO,
input sw[15:0]), axi_uartlite_0 (AXI Uartlite, usb_uart), jtag_axi_0 (JTAG to AXI Master),
clk_wiz (Clocking Wizard), and rst_clk_wiz_100M (Processor System Reset).]

Figure 8. A dual-core RISC-V SoC for FPGA implementation.

Based on our preliminary results, we see that the dual-core RISC-V system shown in
Figure 8 does fit into a Basys3 board.

9. Conclusions
In this paper, we have presented a high level synthesis approach for RISC-V RV32I
system design. The CPU core is designed and simulated at the C level, then the HLS
generated Verilog code is tested with RAM and I/O devices at the Verilog simulation
level. Finally, the complete system on chip design with memory and I/O devices is
implemented and tested on a low-cost FPGA board. Timing closure, resource utilization,
and power consumption estimates are also presented. The CMOS gate-level design and gate
counts are generated by using an open-source synthesis tool. We have also outlined a
dual-core system design. The HLS generated CPU core has 14 states for a traditional
single clock cycle delay memory interface, and 42 states if AXI bus support is needed.
For such more complex systems, a design in Verilog will be more demanding and error prone
compared to an HLS based approach. A detailed analysis of multi-core designs is planned
for future research.

Funding: Funding is provided by NSF-1919855, Advanced Mobility Institute grants GR-2000028,
GR-2000029, and Florida Polytechnic University startup grant GR-1900022.
Acknowledgments: The author would like to acknowledge the support from NSF-1919855, Florida
Polytechnic University, and AMI.

References
1. Depablo, S.; Cebrián, J.A.; Herrero-de Lucas, L.C.; Rey-Boué, A.B. A very simple 8-bit RISC processor for FPGA. In Proceedings
of the FPGAworld Conference 2006, Stockholm, Sweden, 2006; pp. 9–15.
2. Archana, H.R.; Sanjana, T.; Bhavana, H.T.; Sunil, S.V. System Verification and Analysis of ALU for RISC Processor. In Proceedings
of the 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India,
19–20 March 2021; Volume 1, pp. 1785–1789. https://doi.org/10.1109/ICACCS51430.2021.9442045.
3. Wang, L.; Yu, Z.; Zhang, D.; Qin, G. Research on Multi-Cycle CPU Design Method of Computer Organization Principle
Experiment. In Proceedings of the 2018 13th International Conference on Computer Science Education (ICCSE), Colombo, Sri
Lanka, 8–11 August 2018; pp. 1–6. https://doi.org/10.1109/ICCSE.2018.8468694.
4. Eljhani, M.M.; Kepuska, V.Z. Reduced Instruction Set Computer Design on FPGA. In Proceedings of the 2021 IEEE 1st
International Maghreb Meeting of the Conference on Sciences and Techniques of Automatic Control and Computer Engineering
MI-STA, Tripoli, Libya, 25–27 May 2021; pp. 316–321. https://doi.org/10.1109/MI-STA52233.2021.9464409.
5. Waterman, A.; Asanović, K. The RISC-V Instruction Set Manual, Volume I: Unprivileged ISA version 20191213; RISC-V International:
2021.
6. Höller, R.; Haselberger, D.; Ballek, D.; Rössler, P.; Krapfenbauer, M.; Linauer, M. Open-Source RISC-V Processor IP Cores for
FPGAs — Overview and Evaluation. In Proceedings of the 2019 8th Mediterranean Conference on Embedded Computing
(MECO), Budva, Montenegro, 10–14 June 2019; pp. 1–6. https://doi.org/10.1109/MECO.2019.8760205.
7. Rokicki, S.; Pala, D.; Paturel, J.; Sentieys, O. What You Simulate Is What You Synthesize: Design of a RISC-V Core from C++
Specifications. In Proceedings of the RISC-V Workshop 2019, Zurich, Switzerland, 2019; pp. 1–2.
8. Harris, S.L.; Chaver, D.; Piñuel, L.; Gomez-Perez, J.; Liaqat, M.H.; Kakakhel, Z.L.; Kindgren, O.; Owen, R. RVfpga: Using
a RISC-V Core Targeted to an FPGA in Computer Architecture Education. In Proceedings of the 2021 31st International
Conference on Field-Programmable Logic and Applications (FPL), Dresden, Germany, 30 August–3 September 2021; pp. 145–150.
https://doi.org/10.1109/FPL53798.2021.00032.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.

No ratings yet
Framework and Tools For Undergraduates Designing RISC-V Processors On An FPGA in Computer Architecture Education
4 pages
Conference Template A4
No ratings yet
Conference Template A4
4 pages
Block Diagram of a RISC-lab-ex
No ratings yet
Block Diagram of a RISC-lab-ex
7 pages
Controlunit PPT - PPX
No ratings yet
Controlunit PPT - PPX
31 pages
Introduction To Logic Circuit Design With VHDL
From Everand
Introduction To Logic Circuit Design With VHDL
Bilgehan Erkal
No ratings yet
PLC: Programmable Logic Controller – Arktika.: EXPERIMENTAL PRODUCT BASED ON CPLD.
From Everand
PLC: Programmable Logic Controller – Arktika.: EXPERIMENTAL PRODUCT BASED ON CPLD.
MARIO FRANCO
No ratings yet
Computer Science: Learn about Algorithms, Cybersecurity, Databases, Operating Systems, and Web Design
From Everand
Computer Science: Learn about Algorithms, Cybersecurity, Databases, Operating Systems, and Web Design
Jonathan Rigdon
No ratings yet
Rust for Embedded Systems
From Everand
Rust for Embedded Systems
James Oakton
No ratings yet
Practical Reverse Engineering: x86, x64, ARM, Windows Kernel, Reversing Tools, and Obfuscation
From Everand
Practical Reverse Engineering: x86, x64, ARM, Windows Kernel, Reversing Tools, and Obfuscation
Bruce Dang
No ratings yet
04 PULP Chips
No ratings yet
04 PULP Chips
49 pages
Brainstorming and Research: Core Technical Highlights
No ratings yet
Brainstorming and Research: Core Technical Highlights
9 pages
Optimizing_CNN_Computation_Using_RISC-V_Custom_Instruction_Sets_for_Edge_Platforms
No ratings yet
Optimizing_CNN_Computation_Using_RISC-V_Custom_Instruction_Sets_for_Edge_Platforms
14 pages
Verification Mag 2018 Issue
No ratings yet
Verification Mag 2018 Issue
39 pages
Buy ebook (Ebook) Guide to Computer Processor Architecture: A RISC-V Approach, with High-Level Synthesis by Bernard Goossens ISBN 9783031180224, 3031180224 cheap price
100% (6)
Buy ebook (Ebook) Guide to Computer Processor Architecture: A RISC-V Approach, with High-Level Synthesis by Bernard Goossens ISBN 9783031180224, 3031180224 cheap price
65 pages
Riscv Crypto Spec v0.9.0 Scalar
No ratings yet
Riscv Crypto Spec v0.9.0 Scalar
52 pages
Risc V
No ratings yet
Risc V
66 pages
Tue1224 SiFive Freedom U500 Kang
No ratings yet
Tue1224 SiFive Freedom U500 Kang
12 pages
Unpriv Isa Asciidoc
No ratings yet
Unpriv Isa Asciidoc
236 pages
Chapter 02 RISC V
No ratings yet
Chapter 02 RISC V
92 pages
exploring-instruction-set-architectural-variations-x86-arm-and-riscv-in-computeintensive-applications
No ratings yet
exploring-instruction-set-architectural-variations-x86-arm-and-riscv-in-computeintensive-applications
6 pages
Rolling The Dice With Random Instructions Is The Safe Bet On Risc V Verification
No ratings yet
Rolling The Dice With Random Instructions Is The Safe Bet On Risc V Verification
9 pages
Guide To Computer Processor Architecture A Riscv Approach With Highlevel Synthesis Bernard Goossens instant download
No ratings yet
Guide To Computer Processor Architecture A Riscv Approach With Highlevel Synthesis Bernard Goossens instant download
82 pages
A RISC-V Matrix Multiplier Using Systolic Arrays
No ratings yet
A RISC-V Matrix Multiplier Using Systolic Arrays
41 pages
Riscv Crypto Spec Scalar v1.0.1
No ratings yet
Riscv Crypto Spec Scalar v1.0.1
105 pages
Improving The Efficiency of Cryptography Algorithms On Resource-Constrained Embedded Systems Via RISC-V Instruction Set Extensions
No ratings yet
Improving The Efficiency of Cryptography Algorithms On Resource-Constrained Embedded Systems Via RISC-V Instruction Set Extensions
6 pages
Complete Download (Ebook) The RISC-V Instruction Set Manual: Volume II: Privileged Architecture by Andrew Waterman, Krste Asanović PDF All Chapters
100% (10)
Complete Download (Ebook) The RISC-V Instruction Set Manual: Volume II: Privileged Architecture by Andrew Waterman, Krste Asanović PDF All Chapters
55 pages
RV32 Reference 2
No ratings yet
RV32 Reference 2
18 pages
Computer Organization and Design RISC V Edition The Hardware Software Interface David A. Patterson instant download
100% (1)
Computer Organization and Design RISC V Edition The Hardware Software Interface David A. Patterson instant download
59 pages
labtask5
No ratings yet
labtask5
12 pages
Cheriot-Iot Security
No ratings yet
Cheriot-Iot Security
159 pages
CH32V201
No ratings yet
CH32V201
59 pages
Pulp Intro KGF
No ratings yet
Pulp Intro KGF
65 pages
Avispado 222 DataSheet
No ratings yet
Avispado 222 DataSheet
16 pages
Lec12 Datapath - Key
No ratings yet
Lec12 Datapath - Key
43 pages
Christopher Celio, Krste Asanovic, David Palerson
No ratings yet
Christopher Celio, Krste Asanovic, David Palerson
35 pages
Lab1 Spec
No ratings yet
Lab1 Spec
6 pages


Citation: Toker, O. A High Level

Eng. Proc. 2023, 56, 0. https://doi.org/10.3390/0 https://www.mdpi.com/journal/engproc

2. RISC-V RV32I Architecture

Table 1. RV32I instructions [5? ].

Inst Type Description

3. HLS Approach for Design and Simulation

Outline I: Outline of the design file riscv32i.cc

void cpu(arch_t mem[MEM_SIZE], volatile strb_t* pstrb) {

for (int i = 0; i < REGFILE_SIZE; i++)

PROGRAM_LOOP: while (true) {

// Write back to reg_file or memory or PC

immI = ( ((ap_int<ARCH>) insn) >> 20 );
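The `immI` line above extracts the I-type immediate, which in RV32I occupies instruction bits 31:20 and is sign-extended to 32 bits. The sketch below shows the same extraction in plain C++ without the HLS `ap_int` type. The names `imm_i` and `check_imm_i` and the sample encodings are illustrative assumptions, not taken from the paper's source file.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative helper (name is an assumption): return the sign-extended
// I-type immediate held in bits 31:20 of an RV32I instruction word.
constexpr std::int32_t imm_i(std::uint32_t insn) {
    // Shift the 12-bit field down, flip its sign bit, then subtract the
    // sign bit's weight: a portable way to sign-extend 12 bits.
    return static_cast<std::int32_t>((insn >> 20) ^ 0x800u) - 0x800;
}

// Small self-check using two hand-encoded ADDI instructions.
inline void check_imm_i() {
    assert(imm_i(0x00500113u) == 5);    // addi x2, x0, 5
    assert(imm_i(0xFFF00093u) == -1);   // addi x1, x0, -1
}
```

The outline reaches the same value by casting the instruction word to a signed type before shifting, so that the right shift itself performs the sign extension.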

Figure 1. Vivado HLS C-Simulation.

4. RISC-V SoC Simulation in Verilog

Outline II: Outline of the SystemVerilog testbench

mem U3(.clk(clk), .we(we), .addr(addr),

//SRAM U4 (.clka(clk), .wea({4{we}} & wstrb), .addra(addr),

Outline III: Outline of the RAM and I/O devices

input clk, we;

always @(posedge clk)

Figure 2. Verilog simulation.

5. RISC-V RV32I Core Synthesis Gate Counts

read_verilog cpu.v cpu_reg_file_V.v

=== cpu ===

Number of wires: 8282

=== cpu_reg_file_V ===

Number of wires: 7714

6. RISC-V SoC Implementation on an FPGA

Figure 4. A RISC-V SoC block diagram for FPGA implementation.

The elaborated design has 1296 cells, and 1968 nets.

Figure 7. RISC-V SoC implemented on a Basys3 board.

7. Testing with a Sample C Program

8. A Multicore RISC-V SoC

(mem[addr >> 2])( 7,0) = res;
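The range assignment above overwrites only bits 7:0 of the 32-bit word selected by `addr >> 2` and leaves the upper three bytes unchanged. A minimal plain C++ equivalent is sketched below, assuming a `uint32_t` array stands in for the HLS memory. The helper names are hypothetical and do not appear in the paper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: replace bits 7:0 of `word` with the low byte of `res`.
constexpr std::uint32_t set_low_byte(std::uint32_t word, std::uint32_t res) {
    return (word & ~0xFFu) | (res & 0xFFu);
}

// Hypothetical helper: same effect as (mem[addr >> 2])(7,0) = res on a
// word-organized memory that is indexed with a byte address.
inline void store_low_byte(std::uint32_t* mem, std::uint32_t addr,
                           std::uint32_t res) {
    mem[addr >> 2] = set_low_byte(mem[addr >> 2], res);
}

// Small self-check: byte address 4 selects the second word.
inline void check_store_low_byte() {
    std::uint32_t mem[2] = {0x11223344u, 0xAABBCCDDu};
    store_low_byte(mem, 4, 0x1FFu);      // only the low 8 bits of res are used
    assert(mem[0] == 0x11223344u);       // first word untouched
    assert(mem[1] == 0xAABBCCFFu);       // low byte replaced, rest preserved
}
```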


Figure 8. A dual-core RISC-V SoC for FPGA implementation.

Funding: Funding is provided by NSF-1919855, Advanced Mobility Institute grants GR-2000028,
