ROCCC 2.0 User's Manual - Revision 0.6: February 9, 2011
Contents
1 Changes
1.1 Revision 0.6 Added Features
1.2 Revision 0.6 Bug Fixes
1.3 Revision 0.5.2 Added Features
1.4 Revision 0.5.2 Bug Fixes
1.5 Revision 0.5.1 Added Features
1.6 Revision 0.5.1 Bug Fixes
1.7 Revision 0.5 Added Features
1.8 Revision 0.5 Bug Fixes
1.9 Revision 0.4.2 Added Features
1.10 Revision 0.4.2 Bug Fixes
1.11 Revision 0.4.1 Added Features
1.12 Revision 0.4.1 Bug Fixes
1.13 Revision 0.4 Added Features
1.14 Revision 0.4 Bug Fixes
1.15 Revision 0.3
2 Installation
3 GUI
3.1 Installing The Plugin
3.2 Preparing the GUI for using ROCCC
3.3 GUI Menu Overview
3.3.1 ROCCC Menu
3.3.2 ROCCC Toolbar
3.3.3 ROCCC Context Menu
3.4 Loading the Example Files
3.5 IP Cores View
3.6 Creating a New ROCCC Project
3.7 Build to Hardware
3.8 High Level Compiler Optimizations
3.8.1 System Specific Optimizations
3.8.2 Optimizations for both Systems and Modules
3.9 Add IPCores
3.10 Create New Module
3.11 Create New System
3.12 Import Module
3.13 Import System
3.14 Intrinsics Manager
3.15 Open "roccc-library.h"
3.16 Reset Compiler
3.17 Testbench Generation
3.18 Platform Generation
3.19 Updating
4 C Code Construction
4.1 General Code Guidelines
4.1.1 Limitations
4.2 Module Code
4.3 System Code
4.3.1 Windows and Generated Addresses
4.3.2 N-dimensional arrays
4.3.3 Feedback detection
4.3.4 Summation reduction
4.4 Instantiating Modules
4.4.1 Inlining Modules
4.5 Control Flow
4.6 Legacy Code
4.6.1 Legacy Module Code
4.6.2 Legacy System Code
4.7 Compiling
4.8 Hardware Specific Optimizations
4.8.1 Specifying Bit Width
4.8.2 Systolic Array Generation
4.8.3 Temporal Common Subexpression Elimination
4.8.4 Arithmetic Balancing
4.8.5 Copy Reduction
4.8.6 Fanout Tree Generation
4.8.7 Smart Buffers
6 Generated Specific Hardware Connections
6.1 Basic Assumptions
6.2 Values created by optimizations
7 Examples Provided
7.1 Module Examples
7.2 System Examples
8 Troubleshooting
8.1 Hi-CIRRF Failure
8.2 Lo-CIRRF Failure
List of Figures
1 Copying the Plugins into Eclipse
2 ROCCC 2.0 Registration Window
3 Location of the ROCCC 2.0 Preferences
4 The ROCCC Preferences Page
5 ROCCC Menu Items
6 ROCCC Toolbar
7 ROCCC Context Menu
8 Importing the Examples
9 The ROCCCExamples Project
10 IP Cores View
11 Creating a New Project
12 High-Level Optimizations Page
13 Low-Level Optimizations Page
14 Basic Control of the Pipelining Phase
15 Advanced Control of the Pipelining Phase
16 Stream Accessing Management Page
17 Successful compilation
18 VHDL Subdirectory Created
19 Add Component Wizard
20 New Module Wizard
21 Module Skeleton Code for MACC
22 New System Wizard
23 System Skeleton Code for WithinBounds
24 Intrinsics Manager
25 Testbench Generation
26 Generate a PCore
27 Dependent Files Window
28 Generated PCore Folders
29 Check For Updates
30 (a) Module Code in C and (b) generated hardware
31 (a) Using a loop in module code and (b) resulting hardware
32 (a) System Code in C and (b) generated hardware
33 Accessing a 3x3 Window
34 A system with a three dimensional input and output stream
35 (a) System Code That Contains Feedback and (b) Generated Hardware
36 System Code That Results in a Summation Reduction
37 (a) Code That Instantiates a Module, (b) the Generated Hardware, and (c) Generated Hardware After Inlining
38 Boolean Select Control Flow. (a) In the original C, (b) in the intermediate representation, and (c) in the generated hardware datapath.
39 Predicated Control Flow (A) in the original C, (B) in the intermediate representation, and (C) in the generated hardware.
40 Legacy Module Code
41 Declaring And Using A Twelve Bit Integer Type
42 C Code To Generate A Systolic Array
43 Block Diagram Of Max Filter System
44 Block Diagram Of Max Filter System After TCSE
45 System Code That Accesses a 3x3 Window
46 3x3 Smart Buffer Sliding Along a 5x5 Memory
47 System Code That Reads From A Fifo
48 Memory Fetches When Using A FIFO
49 Timing Diagram Of A System With Both Input Scalars And Input Streams
50 Block Diagram Of A Generated Module
51 Block Diagram Of A Generated System
52 C Code That Writes To Three Locations In The Same Stream Each Loop Iteration
53 Block Diagram Of Generated Hardware For Code That Writes To Three Locations Each Loop Iteration
54 Timing Diagram Of Module Use
55 Timing Diagram Of Generated Code Reading From A Stream With Memory Addresses
56 Timing Diagram Of Generated Code Reading From A Stream With Multiple Outstanding Memory Requests
57 Timing Diagram Of Generated Code Reading From A Stream With Multiple Channels
58 Timing Diagram of Output Streams
59 Timing Diagram Of The End Of A System’s Processing
60 C Code For MaxFilterSystem Which Uses A 3x3 Window
61 Basic Dataflow
62 Medium Dataflow
63 High fanout a) before registering and b) after registering
64 Generated Systolic Array Hardware
65 Theoretical Interface to a 32-bit Floating Point Divide IPCore
66 Wrapper for the Theoretical 32-bit Floating Point Divide
67 System Code Sections Translated Into Hardware
68 C Code That Infers Ports
69 Generated Ports
2 Installation
Installation and execution of ROCCC has been tested on the following systems:
• 32-bit Ubuntu Linux
• 64-bit Ubuntu Linux
• 32-bit CentOS Linux
• 64-bit CentOS Linux
• 64-bit OpenSuse Linux
• Macintosh Snow Leopard
Other systems are not supported.
The installation requires gcc 3.4 or above with g++, flex, bison, autoconf,
patch, python, and Eclipse 3.5.1 or higher. When uncompressed, the ROCCC
distribution folder should have the following directories:
• Documentation
The location of this user manual and the developer’s manual.
• Examples
A directory to be imported into the Eclipse framework that contains all
of the example code.
• GUI
The location of the Eclipse plugin .jar files.
• Install
The default location where ROCCC will be installed.
• ReferenceFiles
A directory containing the files necessary for PCore generation.
• Scripts
This directory contains scripts used in the install process.
• tmp
This directory is used for temporary storage when compiling with ROCCC.
Also, the ROCCC distribution folder will contain the following files:
• InferredBRAMFifo.vhdl
A VHDL file that is necessary for synthesis and simulation of ROCCC
generated system code.
• ROCCChelper.vhdl
A VHDL file that is necessary for synthesis and simulation of ROCCC
generated code. This file will also be placed in every vhdl subdirectory
upon compilation.
• StreamHandler.vhdl
A VHDL file that is necessary for simulation of systems using the test-
benches created from the GUI.
• roccc-library.h
A link to the local copy of the header file that contains declarations of all
available modules and IP.
• vhdlLibrary.sql3
A link to the local copy of the database used to store all available modules
and IP.
• warning.log
This file will contain all warnings and errors encountered during instal-
lation. If installation was successful, this file can be removed without
consequence.
In order to install ROCCC, run the bash script "rocccInstall.sh". This
script will untar, compile, and initialize all of the packages necessary for ROCCC.
If your system is missing an essential element for the compilation of ROCCC,
an error message will be displayed and ROCCC will not be installed.
The rocccInstall.sh script takes two optional parameters, -s and -l, which
specify where to install the source files and local files respectively. By default,
both locations are the install directory.
Included in this distribution are Eclipse plugins that control access to all
of the ROCCC functionality. The plugins are located in the GUI directory. You
are responsible for moving the files into the appropriate plugin directory on
your system and for removing any previously installed ROCCC plugins that may
exist.
If you experience any failures in the installation procedure, consult the trou-
bleshooting section at the end of this document.
3 GUI
The ROCCC GUI is a plugin designed for the Eclipse IDE that works on both
Linux and Mac systems. The user must have at least Eclipse version 3.5.1
installed. ROCCC currently supports the C++ and Java versions of Eclipse.
Eclipse can be downloaded for free at www.eclipse.org.
The ROCCC GUI plugin is continually evolving and may function slightly
differently in future releases.
Once you have moved the ROCCC plugins into the plugins folder inside
Eclipse, ROCCC should be ready to run. The first time you run
Eclipse with the ROCCC plugins installed, ROCCC will set up the perspective
best suited for working with ROCCC. It will also open a page welcoming you
to ROCCC 2.0 and asking if you would like to register for updates and news, as
shown in Figure 2.
Figure 2: ROCCC 2.0 Registration Window
Once this is done, a preference page will pop up asking for the ROCCC
distribution path. Set the preference value to the location where you uncompressed
the ROCCC distribution folder. The validity of the chosen folder can be checked
by clicking the "Verify ROCCC Distribution Folder" button on the preference
page, as shown in Figure 4.
Once that is done, the ROCCC GUI should be ready to use. If you ever
try to use any of the ROCCC functionality and this preference is not set or
that directory is incorrect, the GUI will tell you and ask if you want to set the
ROCCC distribution folder in the Preference menu.
Figure 4: The ROCCC Preferences Page
• Build: Compile the open module or system file.
• New
• Add
– IP Core: Add an IP Core directly to the database for future use.
• Import
• View
– IP Cores: Opens the IP Cores view to see available cores in the
ROCCC database.
– roccc-library.h: Open the roccc-library.h file in the default editor.
• Manage
– Intrinsics: Open the intrinsic window to add, edit, or delete intrin-
sics.
• Generate
– PCore Interface: Generate a PCore for a ROCCC module.
– Testbench: Generate a hardware testbench file for a ROCCC com-
ponent.
• Settings
– Reset Database: Reset the database back to its installation con-
figuration.
– Preferences: Open the preference page to manage preferences.
• Help
– User Manual: Opens the ROCCC user manual.
– Load Examples: Loads the ROCCC examples in an Eclipse project.
– Check for Updates: Check if a new version of ROCCC is available.
– ROCCC Webpage: Opens the ROCCC webpage.
– About ROCCC: View which version of ROCCC you are using.
Figure 7: ROCCC Context Menu
Figure 8: Importing the Examples
3.5 IP Cores View
ROCCC maintains its own database of compiled modules that can be viewed
at any time. To view the contents of the database, click ROCCC → View →
IPCores on the Eclipse menu. The ROCCC IPCores view will open and display
all of the modules stored in the database.
You can view what ports are on a specific module in the database by selecting
a component in the IPCores view. The neighboring table will then display all the
port names, directions, port sizes, and types for that selected component. You
can delete a compiled component from the database by clicking the component
name in the IPCores View and pressing the Delete key. The component will
also be removed from the roccc-library.h file.
You can also use any of the components in the ROCCC database from within
your code. With a valid module or system file open and selected, move the cursor
to where you want to insert a call to a module and double click the desired
component in the IPCores view. This will add a function call to that component
in the open ROCCC file and will add #include roccc-library.h to the top of the
file. All that remains is to fill in the variables you wish to pass to the
component function call.
A window will pop up asking for the name of the new project. Type
in the desired name and press "Ok". A new project with the name you chose
will then show up in the project explorer. To add modules or systems to the
project, you can either import them from existing files or create new ones
from scratch. To import existing modules or systems, use Import → Module
and Import → System under the ROCCC menu. To create new modules or systems,
use New → Module and New → System under the ROCCC menu.
On the first page you can select which high-level compiler optimizations to
perform. Depending on whether you are compiling a module or a system,
you will see a different list of available optimizations.
The second page asks which low-level compiler optimizations to use, as in
Figure 13. These flags are the same whether you are compiling a module or a
system.
Figure 13: Low-Level Optimizations Page
The third optimization page available when compiling controls the extent
of pipelining in the generated hardware. As shown in Figure 14, the pipelining
may be controlled with a slider that adjusts the generated pipeline from fully
pipelined on the left to fully compacted on the right. When fully pipelined,
every operation will be placed into a separate pipeline stage, resulting in the
largest area but fastest clock. When fully compacted, the compiler will attempt
to put every operation into one pipeline stage, resulting in the slowest clock
speed but smaller area. When fully compacting code, instantiated modules will
retain their delay.
However, not all operations take the same amount of time to execute. If the
compiler packed operations together without considering how expensive each
operation is, the results would be inconsistent across different
components. Because of this, ROCCC allows you to specify weight values for
each basic operation in the advanced mode, as shown in Figure 15. A larger
weight means that the operation is more expensive in terms of execution time on
the desired platform. To edit these values, click the advanced tab at the top of
the Area vs Frequency page.
These weight values have no absolute meaning; they only have meaning
relative to each other. For example, if the Mult operation takes twice as long
as Add, the weight value for Mult must be twice
that of Add. This can be done as (100 and 50) or (50 and 25); it does not
matter as long as the weights are proportional to each other. In this case, when
compaction occurs, the compiler would attempt to allow two chained additions
to happen together for every multiplication that is done.
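To make the role of relative weights concrete, the following sketch models compaction as greedily packing a chain of operations into pipeline stages under a per-stage weight budget. This is only an illustrative model and is not taken from the ROCCC implementation; the function name countStages and the idea of a fixed per-stage budget are assumptions. It is written to compile as C++.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model (not ROCCC's actual algorithm): walk a chain of
// operations in order and place each one in the current pipeline stage
// unless doing so would push the stage's summed weight above `budget`.
// Returns the number of stages used.
int countStages(const std::vector<int>& weights, int budget)
{
    int stages = 0 ;
    int used = 0 ;
    for (std::size_t i = 0 ; i < weights.size() ; ++i)
    {
        // Open a new stage if none exists yet or the operation does not fit
        if (stages == 0 || used + weights[i] > budget)
        {
            ++stages ;
            used = 0 ;
        }
        used += weights[i] ;
    }
    return stages ;
}
```

In this model, with Add = 50, Mult = 100, and a budget of 100, the chain Add, Add, Mult occupies two stages: the two additions share one stage and the multiplication takes the other. Scaling everything to Add = 25, Mult = 50, and a budget of 50 gives the same packing, which illustrates why only the ratios between weights matter.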
If all the weights have the same value, all operations are treated as taking the same
amount of execution time. Again, this can be achieved by setting the weights
to all 1s or even all 500s, as long as they are all the same value. The default
weights distributed with ROCCC are the values we came up with for
targeting 150 MHz on an LX-330. These weights combined with the pipeline
slider give you precise control over how to tune your component in terms of
area and frequency.
Figure 14: Basic Control of the Pipelining Phase
Also available in the advanced view is control over the maximum allowable
fanout. When generating a circuit, if any register has a fanout larger than the
specified number, registers are inserted along the paths in order to ease routing
constraints.
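As a rough way to reason about the effect of the maximum fanout setting, the sketch below estimates how many levels of registers would need to be inserted if the extra registers were arranged as a balanced tree. The balanced-tree arrangement and the name insertedRegisterLevels are assumptions made for illustration; the manual does not state how the compiler places the registers.

```cpp
// Hypothetical estimate: number of register levels that must be inserted
// so that no register drives more than maxFanout destinations, assuming
// the inserted registers form a balanced tree.
// Expects fanout >= 1 and maxFanout >= 2.
int insertedRegisterLevels(int fanout, int maxFanout)
{
    int levels = 0 ;
    long reachable = maxFanout ; // destinations reachable with no inserted registers
    while (reachable < fanout)
    {
        reachable *= maxFanout ; // each inserted level multiplies the reach
        ++levels ;
    }
    return levels ;
}
```

Under these assumptions, a register with a fanout of 10 and a maximum fanout of 4 needs one inserted level, while a fanout of 4 needs none.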
If you are compiling a system, there is a fourth page in the compilation wizard for
managing the ways streams are accessed, as shown in Figure 16. From here
you can select "Add" to add managing info for either input or output streams.
A page will open asking for the stream name, the number of stream
channels, and, for input streams, the maximum number of outstanding memory
requests at any time. After you press "Finish", the values will be added to the
table on the stream management page that corresponds to the "Add" you pressed.
Once these values are in the table, you can edit them by double
clicking individual cells and changing the values. The number of outstanding
memory requests must be equal to or greater than the number of stream channels.
Also, the number of stream channels must be a factor of both the window the
data is being accessed from for that stream and the step size of the loop.
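The constraints on stream settings can be summarized as a small check. The sketch below is one reading of the rules above, assuming that "a factor of the window" means the channel count divides the number of elements in the window; the function and parameter names are hypothetical and do not correspond to anything in the ROCCC GUI or compiler.

```cpp
// Hypothetical check of the stream management constraints.
// channels:            number of stream channels
// outstandingRequests: maximum outstanding memory requests (input streams)
// windowSize:          number of elements in the window read from the stream
// stepSize:            step size of the loop
bool streamSettingsValid(int channels, int outstandingRequests,
                         int windowSize, int stepSize)
{
    if (channels <= 0)
    {
        return false ;
    }
    // Outstanding requests must be at least the number of channels
    if (outstandingRequests < channels)
    {
        return false ;
    }
    // The channel count must divide both the window size and the step size
    return (windowSize % channels == 0) && (stepSize % channels == 0) ;
}
```

For example, under this reading two channels with four outstanding requests, a window of four elements, and a step size of two would be accepted, while three channels with a step size of one would not.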
Once you have selected which optimizations to use and have set the arguments
for the optimizations that require them, select Finish. This will run the
ROCCC toolchain on the selected open file inside the Eclipse editor. All output
from the compilation will be written to the console inside of Eclipse, as shown
in Figure 17.
Figure 15: Advanced Control of the Pipelining Phase
Figure 17: Successful compilation
If the compilation finished successfully, you will see a VHDL folder in the
project directory next to the file you compiled that will have the generated
VHDL code for that system or module, as shown in Figure 18.
The selected flags for each file are saved, so when you recompile a file
the flags used during the previous compile are loaded.
The other way to compile a file is to right-click the desired file in the Project
Navigator and select Build to Hardware in the ROCCC submenu, as shown in
Figure 7.
3.8.1 System Specific Optimizations
• Systolic Array Generation: Transform a wavefront algorithm that
works over a 2-dimensional array into a one-dimensional hardware struc-
ture with feedback at every stage in order to increase the throughput while
reducing hardware.
Note: This optimization cannot be combined with other optimizations.
• Temporal Common Sub Expression Elimination: Detection and
removal of common code across loop iterations to reduce the size of the
generated hardware.
• LoopFusion: Merge successive loops with the same bounds and no de-
pendencies.
• LoopInterchange: Switch the loop induction variables of two nested
loops.
• Loop Unrolling: Unroll the loop at the given C label by a specified
amount. If the loop has constant bounds, the loop can be fully unrolled.
Arguments:
Loop Label - The loop specified by the C label in the code.
Number of times to unroll - The number of times to unroll the loop. If
the loop has constant bounds, you can set the value to FULLY to fully
unroll the loop. If a system has all of its loops completely unrolled, it will
be transformed and compiled as a module.
• FullyUnroll: Fully unroll all loops in the original C code. If any of the
loops have variable bounds, this pass will stop compilation.
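To illustrate the Loop Unrolling arguments, the sketch below shows a loop carrying a C label and a hand-written equivalent of unrolling it. The function names, the label name L1, and the constants are made up for this example, and it assumes that an unroll amount of 2 means two copies of the loop body per iteration. The unrolled version is written by hand to show the intent and is not compiler output.

```cpp
// Original loop. The C label "L1" is what would be entered as the
// Loop Label argument (the label name itself is arbitrary).
void ScaleOriginal(int* In, int* Out)
{
    int i ;
    L1: for (i = 0 ; i < 8 ; ++i)
    {
        Out[i] = In[i] * 3 ;
    }
}

// Hand-written equivalent of unrolling loop L1 by 2: each iteration
// now computes two consecutive elements, and the step size doubles.
void ScaleUnrolledByTwo(int* In, int* Out)
{
    int i ;
    for (i = 0 ; i < 8 ; i += 2)
    {
        Out[i] = In[i] * 3 ;
        Out[i + 1] = In[i + 1] * 3 ;
    }
}
```

Because the loop bounds here are constant, this loop could also be fully unrolled, leaving eight independent multiplications and no loop control.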
3.9 Add IPCores
When working on a ROCCC project, you may want to integrate hardware
modules that you have access to outside of ROCCC. To use such a component,
you must first insert it into the ROCCC database so that the compiler can
incorporate it in this and future compilations. To
do this, select Add → IPCore in the ROCCC menu. A window will pop up
asking for the details of the component, as shown in Figure 19.
First, specify the name and latency of the component. Next, you need to
add all of the ports for the added component. You need to specify at least one
input port and one output port before you can click Finish. If you need to edit
one of the already added ports, simply double click on the field you wish to
edit and you will be able to change the value of that field. Once everything is
added correctly, click Finish and the component will be added to the ROCCC
database. The component will now also be found in the IPCores view.
Figure 20: New Module Wizard
Input the name of the module and which project to add the new module to.
Next add all the ports that this module will have. If you ever need to edit an
already added port, simply double click the field you wish to edit and you will
be able to change the value of that field. Once everything is added correctly,
click Finish and the module will be added to the project. The new file will
open in the editor with the necessary starter code to begin coding the module
as shown in Figure 21.
Figure 22: New System Wizard
new Project section. Once you have a valid project open, select New → System
under the ROCCC menu or toolbar to begin creating the new system. A new
window will open asking for the details of the new system.
Input the name of the system and which project to add the new system
to. Lastly, select how many stream dimensions the system will have. Once
everything is added correctly, click Finish and the system will be added to the
project. The new file will open in the editor with the necessary starter code to
begin coding the system as shown in Figure 23.
First, browse for the desired ROCCC module file to import. Second, type
the name of the module you are importing. Lastly, select which project to
import the module into. Once finished, click the Finish button at the bottom
and the selected module will be imported into the project and will show up in
the Project Navigator view. This does not add the module to the database; it
only adds the module C code to the project.
Figure 24: Intrinsics Manager
Figure 25: Testbench Generation
Figure 27: Dependent Files Window
PCores can be generated for any module, but currently not for systems.
3.19 Updating
There are a few ways to keep the ROCCC toolset up to date with the most
current version available. The first is by having the ROCCC GUI automatically
check for updates each time on startup. You can change whether or not you want
ROCCC checking for updates at startup in the preference page as in Figure 4.
The other way to check for updates is to manually check for updates by selecting
’Help → Check for Updates’ in the ROCCC menu as in Figure 29.
In both of these cases, ROCCC will check to see if there is a new version of
the compiler and if there is a new version of the GUI plugins. All messaging
about checking for updates will show up in the Eclipse console.
If there is an update for the compiler, ROCCC will ask if you would like to update.
If you select "Yes", ROCCC will start patching the compiler to the latest version.
If there is an update for the GUI plugins and you have chosen
to update, ROCCC will download the latest plugins to the "GUI" folder of the
distribution directory you installed. To complete installation of the plugins,
you must move the downloaded plugins from the "GUI" folder of the ROCCC
distribution into the "plugins" folder of the Eclipse directory.
It is also suggested that you delete the old plugins from the Eclipse plugins folder.
Once this is done, restart Eclipse using the command "./eclipse -clean"
in the terminal, which should reload any new plugins, and installation should be
complete.
Figure 29: Check For Updates
4 C Code Construction
4.1 General Code Guidelines
ROCCC supports two styles of C programs, which we refer to as modules and
systems. Modules represent concrete hardware implementations of purely com-
putational functions. Modules can be constructed using instantiations of other
modules in order to create larger components that describe a specific architec-
ture.
System code performs repeated computation on streams of data. System
code consists of loops that iterate over arrays. System code may or may not
instantiate modules. System code represents the topmost perspective and gen-
erates hardware that interfaces to memory systems.
4.1.1 Limitations
ROCCC is not designed to compile entire applications into hardware and has
certain general restrictions on both module and system code. ROCCC is con-
tinually in development, so these restrictions may fluctuate or be eliminated
entirely in future releases. ROCCC 2.0 currently does not support:
• Logical operators that perform short circuit evaluation. The "&" and "|"
operators do work and should be used in place of "&&" and "||".
• Generic pointers
• Non-component functions, including C-library calls
• Shifting by a variable amount
• Non-for loops
• Variables named ’C’
• The ternary operator (?:)
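As an illustration of working within these restrictions, the sketch below writes a range check using "&" in place of "&&" and an if/else in place of the ternary operator. The module name InRange and its parameters are hypothetical and are not one of the distributed examples.

```cpp
// Hypothetical module: sets result to 1 when lo <= x <= hi and to 0
// otherwise. Inputs come before the output in the parameter list.
void InRange(int x, int lo, int hi, int& result)
{
    // "&" is used instead of "&&"; both comparisons are always evaluated
    if ((x >= lo) & (x <= hi))
    {
        result = 1 ;
    }
    else
    {
        // The if/else replaces a ternary such as: result = cond ? 1 : 0
        result = 0 ;
    }
}
```

Since each comparison yields 0 or 1 and has no side effects, the "&" form gives the same result as "&&" here; the two can differ only when an operand has side effects or is not a plain 0/1 value.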
// Example module code
// Input parameters must
// come before output
// parameters
void FIR(int A0, int A1,
int A2, int A3,
int A4,
int& result)
{
const int T[5] =
{3,5,7,9,11} ;
result = A0 * T[0] +
A1 * T[1] +
A2 * T[2] +
A3 * T[3] +
A4 * T[4] ;
}
Figure 30: (a) Module Code in C and (b) generated hardware
and cannot have arrays as input or output variables. Internal variables may be
created but are not visible outside of the module.
Figure 30a shows a simple FIR filter written as a module. This code takes five
inputs and computes a single output. When compiled, the hardware generated
will resemble the circuit shown in Figure 30b. The interface to the module is
exactly as described by the parameter list; the integer array T is not visible
outside of the module.
Modules do not generate addresses or fetch values from memory, but instead
have data pushed onto them, and then output scalar values after all computation
has been performed. They are completely pipelined and can support processing
new data every clock cycle.
If a module contains a loop, it will automatically be fully unrolled. Hence,
any loop inside of a module must have an end bound that can be statically
determined. Figure 31a provides an example of the supported loop structure
inside modules.
After unrolling and constant and copy propagation, we end up with the hardware
shown in Figure 31b, which is a single multiply, as we would expect. There is
no loop control or other control created, as the loop has been removed.
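The transformation can be followed step by step in source form. The sketch below repeats the Squared module from Figure 31a and then shows hand-written equivalents after full unrolling and after constant and copy propagation. The names SquaredUnrolled and SquaredPropagated are made up for this illustration; the compiler performs these steps internally and does not emit such functions.

```cpp
// Module from Figure 31a, repeated so the sketch is self-contained
void Squared(int x, int& y)
{
    int total ;
    int i ;
    total = 1 ;
    for (i = 0 ; i < 2 ; ++i)
    {
        total *= x ;
    }
    y = total ;
}

// Step 1: the two-iteration loop fully unrolled
void SquaredUnrolled(int x, int& y)
{
    int total ;
    total = 1 ;
    total *= x ;
    total *= x ;
    y = total ;
}

// Step 2: after constant and copy propagation, 1 * x * x
// reduces to a single multiply
void SquaredPropagated(int x, int& y)
{
    y = x * x ;
}
```

All three functions compute the same value for any input, but only the last form corresponds directly to a datapath with one multiplier and no loop control.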
// This module contains a loop, it will
// automatically be fully unrolled
void Squared(int x, int& y)
{
int total ;
int i ;
total = 1 ;
for (i = 0 ; i < 2 ; ++i)
{
total *= x ;
}
y = total ;
}
Figure 31: (a) Using a loop in module code and (b) resulting hardware
to modules, input scalars are read once at the beginning of computation and
output scalars are only generated once at the end of computation.
Similar to module code, system code is written as a void function that takes
input and output parameters. Input scalars are passed by value, output scalars
are passed by reference, and both input and output streams are passed as point-
ers. The function definition must declare inputs before outputs. Although
passed as pointers, the internal use of streams must be through array accesses.
An example of system code is shown in Figure 32a. This code takes a single
input scalar that is used to determine the length of the incoming streams, two
input streams V1 and V2, and an output stream Sum. The computation adds
corresponding elements of the two input vectors and writes each result to the
Sum stream. Like module code, all inputs must be declared in the parameter list
before any outputs.
The generated hardware is shown in Figure 32b. Each stream specified in the
C code generates a memory interface that includes an address generator (AG)
and a BRAM FIFO structure. The specifics of the hardware communication
protocols are discussed in Section 5. Data reuse is handled through the creation
of smart buffers, which is detailed in Section 4.8.7. The code located in the
innermost loop will be translated into a datapath that is separate from the
control.
// Example system code
// Streams are passed as
// pointers but treated
// as arrays
void VectorAdd(int N,
int* V1,
int* V2,
int* Sum)
{
int i ;
for (i = 0 ; i < N ; ++i)
{
Sum[i] = V1[i] + V2[i] ;
}
}
(a) (b)
// Two-dimensional streams are passed as pointers to pointers
void WindowSystem(int** A, int** B)
{
  int i, j ;
  for (i = 0 ; i < 10 ; ++i)
  {
    for (j = 0 ; j < 10 ; ++j)
    {
      B[i][j] = A[i][j] + A[i+2][j+2] ;
    }
  }
}
The addresses we generate are the same as those C would generate. Note that
running this code in C on a 10x10 array produces undefined results, because the
accesses to A[i+2][j+2] read past the bounds of the array.
When fetching the first window, we will therefore generate the offsets 0, 13,
and 26 for the first column and NOT 0, 10, 20. Similarly, we will generate the
offsets 1, 14, and 27 for the second column, and 2, 15, and 28 for the third
column of the window.
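The offsets quoted above follow the usual row-major rule, offset = row * row length + column. The sketch below reproduces them, assuming the row length of 13 that the quoted offsets imply; ROW_LENGTH and window_offset are hypothetical names used only for this illustration.

```c
/* Row length implied by the offsets quoted in the text (0, 13, 26).
   This constant is an assumption of the sketch, not a ROCCC parameter. */
#define ROW_LENGTH 13

/* Linear offset of window element (row, col), relative to the first
   element of the window, using row-major addressing. */
int window_offset(int row, int col)
{
  return row * ROW_LENGTH + col;
}
```

Calling window_offset with rows 0 through 2 and column 0 gives the first-column offsets; columns 1 and 2 give the second and third columns.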
Additionally, we perform a normalization step on the window accesses to
adjust for negative offsets. If the C code accesses an array with a negative
offset, for example A[i-2] and A[i-1], we normalize these accesses to start at
location 0, meaning the previous accesses are adjusted to A[i] and A[i+1]. After
normalization, we determine the size of the memory rows being accessed
exactly as described above.
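The normalization step can be pictured as shifting every constant offset by the most negative one. The following is a minimal sketch of that idea, assuming offsets are only shifted when at least one of them is negative; normalize_offsets is a hypothetical helper, not part of the compiler.

```c
/* Shift a list of constant index offsets so the most negative one
   becomes 0. Offsets that are already non-negative are left alone
   (an assumption of this sketch). */
void normalize_offsets(int *offsets, int count)
{
  int i;
  int min;
  if (count <= 0)
  {
    return;
  }
  min = offsets[0];
  for (i = 1; i < count; ++i)
  {
    if (offsets[i] < min)
    {
      min = offsets[i];
    }
  }
  if (min < 0)
  {
    for (i = 0; i < count; ++i)
    {
      offsets[i] -= min;
    }
  }
}
```

Applied to the offsets {-2, -1} of A[i-2] and A[i-1], this yields {0, 1}, matching the adjusted accesses A[i] and A[i+1].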
// Example N-Dimensional code
void NDimensional(int*** A, int*** B)
{
int i, j, k ;
for (i = 0 ; i < 10 ; ++i)
{
for (j = 0 ; j < 10 ; ++j)
{
for (k = 0 ; k < 10 ; ++k)
{
B[i][j][k] = A[i][j][k] ;
}
}
}
}
Figure 34: A system with a three dimensional input and output stream
// Example code with feedback
void MaxSystem(int N, int* A,
int& final)
{
int i ;
int currentMax ;
for (i = 0 ; i < N ; ++i)
{
if (A[i] > currentMax)
{
currentMax = A[i] ;
}
else
{
currentMax = currentMax ;
}
final = currentMax ;
}
}
(a) (b)
Figure 35: (a) System Code That Contains Feedback and (b) Generated Hardware
If you wish to accomplish this, you must declare an intermediate temporary
variable, assign the output of the module to this variable, and then assign
the variable to the output array. This is shown in Figure 37a, where the output
of FIR is assigned to the variable tmp, which is then assigned to the output
stream B.
void FIRSystem(int* A, int* B)
{
int i ;
int tmp ;
for (i = 0 ; i < 10; ++i)
{
// Module instantiation
FIR(A[i], A[i+1], A[i+2],
A[i+3], A[i+4], tmp) ;
B[i] = tmp ;
}
}
Figure 37: (a) Code That Instantiates a Module, (b) the Generated Hardware,
and (c) Generated Hardware After Inlining
(a)
if (value > 5)
{
  x = 1 ;
}
else
{
  x = 2 ;
}
(b)
x = ROCCCBoolSelect(1, 2, (value > 5)) ;
Figure 38: Boolean Select Control Flow. (a) In the original C, (b) in the
intermediate representation, and (c) in the generated hardware datapath.
(a)
if (value > 5)
{
  x = 1 ;
}
(b)
pred = (value > 5) ;
x = ROCCCBoolSelect(1, x, pred) ;
Figure 39: Predicated Control Flow. (a) In the original C, (b) in the intermediate
representation, and (c) in the generated hardware.
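The two control-flow forms in Figures 38 and 39 can be checked in software with a small model of the select operation. In the sketch below, bool_select is a hypothetical stand-in for ROCCCBoolSelect; its argument order (value when true, value when false, predicate) is inferred from Figure 38 and should be treated as an assumption.

```c
/* Software model of the select: returns if_true when pred is nonzero,
   otherwise if_false. Hypothetical stand-in for ROCCCBoolSelect. */
int bool_select(int if_true, int if_false, int pred)
{
  return pred ? if_true : if_false;
}

/* Figure 38: both branches assign x, so both select inputs are constants. */
int select_both_branches(int value)
{
  return bool_select(1, 2, value > 5);
}

/* Figure 39: there is no else branch, so the previous value of x is
   fed back as the "false" input of the select. */
int select_predicated(int value, int x)
{
  int pred = (value > 5);
  return bool_select(1, x, pred);
}
```

In the predicated form the old value of x passes through unchanged whenever the condition is false, which is why x appears as an argument of the select.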
typedef struct
{
int A0_in ;
int A1_in ;
int A2_in ;
int A3_in ;
int A4_in ;
int result_out ;
} FIR_t ;
FIR_t FIR(FIR_t t)
{
const int T[5] = { 3, 5, 7, 9, 11 } ;
t.result_out = t.A0_in * T[0] + t.A1_in * T[1] +
t.A2_in * T[2] + t.A3_in * T[3] + t.A4_in * T[4] ;
return t ;
}
to the module. Input ports must be identified by adding the suffix "_in" and
output ports must be identified by adding the suffix "_out".
The implementation function must be a function that returns and receives
an instance of this struct by value. Any return statements that are not at the
end of the function are ignored and cannot be used as a form of control flow.
All computation inside this function will be translated to hardware.
The FIR filter shown in Figure 40 is written in this style. Note that the
hardware generated for this code is nearly identical to the hardware generated
for the same code written in Figure 30. The only difference will be in the
ordering of the ports once compiled.
IMPORTANT NOTE: When compiling Legacy ROCCC modules, the order
in which you pass the parameters is not necessarily the order in which you
declared them in the struct. The order in which you pass parameters must
match the order in which they appear in the struct as exported in the
"roccc-library.h" file. If using the GUI, this ordering is available by
double-clicking the module in the IPCores view. Modules written in the new
style will have the parameters in the same order as written.
4.7 Compiling
Compiling should be handled through the GUI.
In order to compile without using the GUI, you must call the program
"createScript," located in the Install/roccc compiler/src/tools directory. This
program takes two arguments, the C file and a file listing optimizations to perform.
A script file "compile_suif2hicirrf.sh" is then generated. Run this script and
then the script "compile_llvmtovhdl.sh" on the hi_cirrf file to generate VHDL.
Details on this process are available in the Developer’s Manual.
typedef int ROCCC_int12 ;
In order for systolic array generation to recognize the optimization, the outer
loop must be labelled as shown in Figure 42.
The current version of systolic array generation only transforms a precise
software architecture into a specific instance of a systolic array. The code must
have a single two-dimensional array where the value of every cell is based upon
some function of the cells located to the north, west, and northwest. Optionally,
the C code may have a constant array of values based upon the outer loop
bounds and a single dimensional input array based upon the loop bounds of
the innermost loop, as seen in the Smith-Waterman example. Any other software
architecture is not currently supported by the systolic array generation
optimization.
After transformation, the resulting hardware expects a one-dimensional
input array (A_input) and produces a one-dimensional output array (A_output)
in place of the original two-dimensional array. The input stream A_input should
contain the values of the topmost row of the original two-dimensional array. The
output stream A_output will carry the bottom row of the original two-dimensional
array. All of the intermediate values are discarded and are not output by the
generated hardware structure. Additionally, the first column of the
original two-dimensional array must be passed in as scalars to the resulting
hardware.
Figure 43: Block Diagram Of Max Filter System
generating hardware, we take advantage of this fact and create feedback
variables that eliminate redundant computations.
TCSE can only be performed on system code. The code does not have to
be written in any special way to take advantage of TCSE.
An example of the difference in the hardware generated can be seen in Figures
43 and 44. These block diagrams show the original structure of the Max Filter
System hardware, which contains four Max Filter modules and operates on a
sliding 3x3 window, and the Max Filter System after TCSE has been performed.
After TCSE, the generated hardware has only two Max Filter modules; the other
two have been replaced with feedback variables.
The generated hardware does require initial values for each piece of hardware
eliminated, so you might have to change the way you pass data into the hardware
depending on whether you perform TCSE or not.
bitwise AND, OR, and XOR are balanced. For example, the statement "a =
b + c + d + e" in software will be calculated serially. By performing arithmetic
balancing, the statement is changed into "a = (b + c) + (d + e)", with
"b+c" and "d+e" calculated in parallel. Because floating point operations are
not strictly associative, and because the order of execution matters when
dealing with overflow, this optimization may change the final result when using
floating point values.
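The effect of regrouping on floating point results can be reproduced in ordinary C. The sketch below compares the serial and balanced groupings of the same four operands; serial_sum and balanced_sum are names chosen for this illustration, and the behavior described assumes IEEE 754 double precision arithmetic.

```c
/* Serial evaluation order used by software: ((b + c) + d) + e */
double serial_sum(double b, double c, double d, double e)
{
  return ((b + c) + d) + e;
}

/* Balanced evaluation order: (b + c) and (d + e) can run in parallel */
double balanced_sum(double b, double c, double d, double e)
{
  return (b + c) + (d + e);
}
```

With b = 1e20, c = 1.0, d = -1e20, and e = 1.0, the serial form gives 1.0 while the balanced form gives 0.0, because adding 1.0 to a value of magnitude 1e20 is lost to rounding. For small integer-valued operands such as 1, 2, 3, and 4, both forms agree.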
for (i = 0 ; i < 5 ; ++i)
{
  for (j = 0 ; j < 5 ; ++j)
  {
    // Sum each of the three rows of the 3x3 window
    row1 = A[i][j] + A[i][j+1] + A[i][j+2] ;
    row2 = A[i+1][j] + A[i+1][j+1] + A[i+1][j+2] ;
    row3 = A[i+2][j] + A[i+2][j+1] + A[i+2][j+2] ;
    B[i][j] = row1 + row2 + row3 ;
  }
}
(shown with X’s in the diagram). The smart buffer initially reads nine values
from memory and exports all nine to the datapath for the first loop iteration,
and for subsequent iterations only three are read for each loop iteration.
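To make the saving concrete, the sketch below counts memory fetches for a run of loop iterations, taking the figures quoted above at face value (nine values for the first iteration, three for each later one) and comparing them with a buffer that refetches the whole 3x3 window every time. Both function names are hypothetical, and the model ignores any extra fetches at row boundaries.

```c
/* Fetches when the window elements shared with the previous
   iteration are reused: 9 for the first iteration, 3 for each later one. */
int fetches_with_reuse(int iterations)
{
  if (iterations <= 0)
  {
    return 0;
  }
  return 9 + 3 * (iterations - 1);
}

/* Fetches when every iteration reloads the entire 3x3 window. */
int fetches_without_reuse(int iterations)
{
  if (iterations <= 0)
  {
    return 0;
  }
  return 9 * iterations;
}
```

For five iterations this model gives 21 fetches with reuse against 45 without.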
ROCCC analyzes the code shown in Figure 47 and determines that no reuse
occurs between loop iterations. In this case, a FIFO interface is generated.
For each loop iteration, two new elements are read in, as in Figure 48, and no
data can be reused between consecutive loop iterations.
for (i = 0 ; i < 5; i += 2)
{
B[i] = A[i] + A[i+1] ;
}
Figure 48: Memory Fetches When Using A FIFO
5 Interfacing With Generated Hardware
5.1 Port Descriptions
The VHDL generated by ROCCC communicates with the external platform in
a variety of ways described in this section. All inputs and outputs that connect
to ROCCC code are assumed to be active-high.
Figure 49: Timing Diagram Of A System With Both Input Scalars And Input
Streams
finished processing all of the input it was designed to process and remains
high until the reset signal is asserted.
• stall
The stall port is used by the interfacing code to stall the pipeline of the
generated hardware.
• Registers
For each input register, a single data port will be generated. When gen-
erating modules, all inputs are treated as registers. When generating
systems, any single variable that acts as input to the main loop will be
treated as an input register.
For each output register, a single data port will be generated. When
generating modules, all outputs are treated as registers. When generating
systems, any single variable that acts as output to the main loop will be
treated as an output register.
Figure 50: Block Diagram Of A Generated Module
Figure 51: Block Diagram Of A Generated System
Figure 52: C Code That Writes To Three Locations In The Same Stream Each
Loop Iteration
Figure 53: Block Diagram Of Generated Hardware For Code That Writes To
Three Locations Each Loop Iteration
Figure 56: Timing Diagram Of Generated Code Reading From A Stream With
Multiple Outstanding Memory Requests
Figure 57: Timing Diagram Of Generated Code Reading From A Stream With
Multiple Channels
all channel data has been fetched, the interfacing code should set valid high and
hold it high until pop is seen to be high. An example of this timing protocol is
shown in Figure 57.
5.2.5 Done
The done signal behaves differently depending on whether it comes from module
or system code. Module code will drive the done signal high as soon as the first
value is processed; this can safely be ignored by any code interfacing with a
ROCCC module, as modules are stateless and can never be considered done.
System code will drive the done signal high on the rising edge of the clock after
the last output values are set. Figure 59 provides an example of the done signal’s
Figure 59: Timing Diagram Of The End Of A System’s Processing
5.2.6 Stall
The stall signal allows the interfacing code to stall the datapath in both modules
and systems. Stalls are not instantaneous: it takes 1-2 clock cycles for the stall
signal to propagate all the way up the datapath to both the input and output
controllers. In hardware, a common use for a stall signal is when interfacing with
memory that may become full. However, both input and output streams use
two-way handshakes, and any stream can be "stalled" by simply not completing
the handshake. For this reason, and because stalls are not instantaneous, stalls
should be reserved for the case when there is no alternative.
When the stall signal is brought high, both input and output streams will
continue to interact with any interfacing code. However, the datapath will be
frozen, and data will not be pushed onto the datapath. Again, prefer to handle
full memory in the stream interface, and not with the stall signal.
for(i = 0 ; i < height ; ++i)
{
for (j = 0 ; j < width ; ++j)
{
MAX(window[i][j], window[i][j+1], window[i][j+2], maxCol1) ;
MAX(window[i+1][j], window[i+1][j+1], window[i+1][j+2], maxCol2) ;
MAX(window[i+2][j], window[i+2][j+1], window[i+2][j+2], maxCol3) ;
// Find the maximum of the three columns
MAX(maxCol1, maxCol2, maxCol3, finalOutput) ;
}
}
begin returning valid data to the component’s request for window elements; not
setting height and width to the correct values will result in the wrong addresses
being generated.
Figure 61: Basic Dataflow
5.4 Pipelining
Pipelining in ROCCC is guided by user-provided weights of basic operations.
By varying these numbers, along with a desired clock cycle weight, the aggres-
siveness of pipelining can be controlled by the user. Under ROCCC, the data
flow graph representing each loop body contains no initial registers. Registers
are then inserted into the data flow graph until no register to register path has
a total weight greater than the desired clock cycle weight.
In Figure 61, the leftmost mux has a critical path of one addition operation
(assuming Weight(add) > Weight(compare)), while the rightmost mux has
a critical path of one addition operation and one comparison. By choosing a
desired delay d such that Weight(mux) + Weight(add) < d < Weight(mux) +
Weight(add) + Weight(compare), registers were inserted after the leftmost
mux, but before the rightmost mux. This can be seen in Figure 62. When
dealing with complicated multi-operation datapaths and a large pipeline depth,
this sort of timing analysis is difficult and error-prone when performed by hand,
and time consuming when done at the gate level on large graphs by the synthesis
tool.
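The register insertion rule can be sketched for the simplest case of a single linear chain of operations. The function below is only an illustration under that simplifying assumption (ROCCC works on a full data flow graph); count_pipeline_registers and the example weights are hypothetical and are not taken from the compiler.

```c
/* Walk a linear chain of operations and count the registers needed so
   that no register-to-register segment has a total weight greater than
   max_weight. A register is placed in front of the first operation that
   would push the current segment over the limit. */
int count_pipeline_registers(const int *weights, int count, int max_weight)
{
  int registers = 0;
  int segment = 0;   /* weight accumulated since the last register */
  int i;
  for (i = 0; i < count; ++i)
  {
    if (segment > 0 && segment + weights[i] > max_weight)
    {
      ++registers;   /* cut the path before this operation */
      segment = 0;
    }
    segment += weights[i];
  }
  return registers;
}
```

For example, with assumed weights add = 3, mux = 1, and compare = 2, the chain add, mux, compare, mux and a desired weight of 5 needs one register, placed after the first mux. Raising the desired weight to 7 removes the register, and lowering it to 3 requires two.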
Figure 62: Medium Dataflow
5.6 Intrinsics
Unlike in C, integer division, modulus, and floating point operations are
expensive to do in hardware. In fact, there is no way to specify "add two 32-bit
floats" or "multiply two 16-bit floats", other than implementing the algorithm
yourself or using a hardware IPCore specifically designed for that purpose. These
operations are significantly more complex than simple operations such as addition,
and because there are several ways to implement division, the synthesis tool
does not blindly infer a solution.
In order to simulate or synthesize code generated with ROCCC that uses
integer division, integer modulus, or floating point operations, it is necessary
Figure 63: High fanout a) before registering and b) after registering
entity fp_div_gen32 is
port (
a : in STD_LOGIC_VECTOR(31 downto 0); --dividend
b : in STD_LOGIC_VECTOR(31 downto 0); --divisor
clk : in STD_LOGIC; --clock signal
ce : in STD_LOGIC; --clock enable, brought low to stall the core
result : out STD_LOGIC_VECTOR(31 downto 0) --quotient
);
end fp_div_gen32;
entity fp_div32 is
port (
clk : in STD_LOGIC; --clock signal
rst : in STD_LOGIC;
inputReady : in STD_LOGIC;
outputReady : out STD_LOGIC;
done : out STD_LOGIC;
stall : in STD_LOGIC;
a : in STD_LOGIC_VECTOR(31 downto 0);
b : in STD_LOGIC_VECTOR(31 downto 0);
result : out STD_LOGIC_VECTOR(31 downto 0)
);
end fp_div32;
component fp_div_gen32 IS
port (
a: IN std_logic_VECTOR(31 downto 0);
b: IN std_logic_VECTOR(31 downto 0);
operation_rfd: OUT std_logic;
clk: IN std_logic;
ce: IN std_logic;
result: OUT std_logic_VECTOR(31 downto 0)
);
END component;
begin
inv_stall <= not stall; --when we need to stall, we just stop enabling the clock
U0 : fp_div_gen32 port map ( a => a, b => b, clk => clk,
ce => inv_stall, result => result);
end Behavioral;
Figure 66: Wrapper for the Theoretical 32-bit Floating Point Divide
void SystemCode(int**A, int**B)
{
int i ;
int j ;
int x ;
x = 5 ; // Ignored
for (i = 0 ; i < 10 ; ++i)
{
for(j = 0 ; j < 10; ++j)
{
B[i][j] = A[i][j] + x ; // Only statement translated into hardware
}
}
x = B[9][9] ; // Ignored
}
void SystemCode()
{
int i ;
int endValue ; // Read and not written in the innermost loop,
// is an input scalar
entity SystemCode is
port (
-- Default signals
clk : in STD_LOGIC ;
rst : in STD_LOGIC ;
inputReady : in STD_LOGIC ;
outputReady : out STD_LOGIC ;
done : out STD_LOGIC ;
stall : in STD_LOGIC ;
-- Input Scalars
x_in : in STD_LOGIC_VECTOR(31 downto 0) ;
endValue_in : in STD_LOGIC_VECTOR(31 downto 0) ;
-- Output Scalars
z_out : out STD_LOGIC_VECTOR(31 downto 0)
) ;
end SystemCode ;
Systolic Array generation will turn the original two dimensional array into a
one dimensional array input (which corresponds to the first row of the two-
dimensional array) and will create initialization input ports for every element
in the first column of the original two-dimensional array.
7 Examples Provided
Twenty-five different example codes are provided to demonstrate the current
capabilities of ROCCC 2.0. These are located in the Examples subdirectory.
Additionally, legacy versions of each of these examples are included in the Ex-
amples subdirectory. The Examples subdirectory contains a directory with all
of the Module examples, a directory with all of the System examples, and a di-
rectory that contains C code to verify software functionality of all the examples.
• FFTOneStage
This example combines three stages of the FFTOneStage examples into a
complete butterfly operation to perform the butterfly operation of the FFT
on streams of data. Only compile this example after the FFT example.
• FIR
This example performs a five-tap finite impulse response filter on five in-
puts. This example shows how to create a module with internal constants
that are propagated in the hardware. This module should be compiled
before the FIRSystem example.
• Histogram
The histogram example shows the supported uses of "if" statements in
the C code. Currently, if statements that provide one of two values to
a variable are supported and converted into boolean select logic in the
generated hardware. The histogram code generates a hardware module.
• MAC
The MAC example creates a hardware module for use in systems that
performs arithmetic on integers.
• MaxFilter
The MaxFilter example creates a hardware module that takes three values
and returns the maximum. This shows the mixing of supported "if" statements
as well as internal registers not visible outside the module. This
code should be compiled before the MaxFilterSystem example.
• MD
The MD example performs a subset of the calculations necessary for a
single timestep in a molecular dynamics simulation. Data for two atoms
are passed in, and the Coulombic forces in the X, Y, and Z directions are
calculated. The MD module should be compiled before the MDComplete
example.
• MDFloat
The MDFloat example performs the same calculations as the MD example,
but uses single precision floating point calculations. The hardware module
generated creates instances of the default floating point cores as generated
by Xilinx Core Generator. If you wish to simulate or synthesize, you must
provide a VHDL mapping file that maps the stubs ROCCC uses with the
local copies of the floating point cores on your machine.
• ModuleWithALoop
This example shows the use of loops in modules. The loops must be fully
unrolled in order to compile. When no optimizations are selected this will
currently fail to compile.
• Pow10
Contains a loop that will automatically be fully unrolled. This example
will take a value and return the value raised to the tenth power.
• QuadraticFormula
This example performs the quadratic formula on complex numbers. This
example shows the usage of if statements that get transformed into
predication.
• SingleCell
This example performs the calculations necessary for a single cell of a
wavefront algorithm like Smith-Waterman. This code can then be used as
a module in a larger systolic array generation.
• ModularSystolicArray
This version of systolic array uses a module for each individual cell of
the wavefront algorithm. You must compile this system with the systolic
array generation optimization selected.
• SmithWaterman
An implementation of the Smith-Waterman algorithm that can be com-
piled with the Systolic Array Generation optimization to create an efficient
hardware solution.
• VarianceFilter
This example takes a one dimensional input stream and calculates the vari-
ance among every twenty elements. The output is placed in an outgoing
stream. This example shows how integer division is treated.
8 Troubleshooting
When installing, the following messages may be output. If any of them occur,
then ROCCC is not correctly installed and will not function. In this case,
please note which message occurred, keep the file "warning.log" generated during
compilation, and visit the discussion board for further help.
• Compilation of gcc 4.0.2 failed
• Installation of gcc 4.0.2 failed
• Hi CIRRF compilation failed
• SQLite 3 compilation failed
• LLVM compilation failed
After installation, you may receive an error during compilation. Most errors
attempt to diagnose how they occurred, but some errors may exist that do not.
For these, please visit the discussion board for help.
If you have installed and receive an error in compilation, it will be reported
to be either a Hi-CIRRF compilation error or Lo-CIRRF compilation error. The
following sections deal with common errors at each stage.
8.2 Lo-CIRRF Failure
• "Unknown component name!"
Make sure all function calls exist in the database before compiling.