Digital Design with Chisel
Fifth Edition
Martin Schoeberl
Copyright © 2016–2023 Martin Schoeberl
This work is licensed under a Creative Commons Attribution-ShareAlike
4.0 International License. http://creativecommons.org/licenses/by-sa/4.0/
Email: [email protected]
Visit the source at https://github.com/schoeberl/chisel-book
Published 2019 by Kindle Direct Publishing,
https://kdp.amazon.com/
Contents
Foreword
Preface
1 Introduction
1.1 Installing Chisel and FPGA Tools
1.1.1 macOS
1.1.2 Linux/Ubuntu
1.1.3 Windows
1.1.4 FPGA Tools
1.2 Hello World
1.3 Chisel Hello World
1.4 An IDE for Chisel
1.5 Source Access and eBook Features
1.6 Further Reading
1.7 Exercises
2 Basic Components
2.1 Chisel Types and Constants
2.2 Combinational Circuits
2.2.1 Multiplexer
2.3 Registers
2.3.1 Counting
2.4 Structure with Bundle and Vec
2.4.1 Bundle
2.4.2 Vec
2.5 Wire, Reg, and IO
2.6 Chisel Generates Hardware
2.7 Exercises
4 Components
4.1 Components in Chisel are Modules
4.2 Nested Components
4.3 An Arithmetic Logic Unit
4.4 Bulk Connections
4.5 External Modules
6.2.4 A Timer
6.2.5 Pulse-Width Modulation
6.3 Shift Registers
6.3.1 Shift Register with Parallel Output
6.3.2 Shift Register with Parallel Load
6.4 Memory
6.5 Exercises
7 Input Processing
7.1 Asynchronous Input
7.2 Debouncing
7.3 Filtering of the Input Signal
7.4 Combining the Input Processing with Functions
7.5 Synchronizing Reset
7.6 Exercise
12 Interconnect
12.1 A Classic Microprocessor Bus
12.2 An On-Chip Bus
12.2.1 Combinational Handshake
12.2.2 Pipelined Handshake
12.2.3 Example IO Device
12.2.4 Memory Mapped Devices
12.3 Bus and Interface Standards
12.3.1 Wishbone
12.3.2 AXI
12.3.3 Open Core Protocol
12.3.4 Further Bus Specifications
16 Summary
C Acronyms
Bibliography
Index
List of Figures
2.1 Logic for the expression (a & b) | c. The wires can be a single bit
or multiple bits. The Chisel expression, and the schematics are the
same.
2.2 A basic 2:1 multiplexer.
2.3 A D flip-flop based register with a synchronous reset to 0.
2.4 A vector wrapped in a Wire is just a multiplexer.
2.5 A vector of registers.
9.1 The light flasher split into a Master FSM and a Timer FSM.
9.2 The light flasher split into a Master FSM, a Timer FSM, and a
Counter FSM.
9.3 A state machine with a datapath.
9.4 State diagram for the popcount FSM.
9.5 Datapath for the popcount circuit.
9.6 The ready/valid flow control.
9.7 Data transfer with a ready/valid interface, early ready.
9.8 Data transfer with a ready/valid interface, late ready.
9.9 Single cycle ready/valid and back-to-back transfers.
Foreword
It is an exciting time to be in the world of digital design. With the end of Dennard
Scaling and the slowing of Moore’s Law, there has perhaps never been a greater
need for innovation in the field. Semiconductor companies continue to squeeze out
every drop of performance they can, but the cost of these improvements has been
rising drastically. Chisel reduces this cost by improving productivity. If designers
can build more in less time, while amortizing the cost of verification through reuse,
companies can spend less on Non-Recurring Engineering (NRE). In addition, both
students and individual contributors can innovate more easily on their own.
Chisel is unlike most languages in that it is embedded in another programming
language, Scala. Fundamentally, Chisel is a library of classes and functions repre-
senting the primitives necessary to express synchronous, digital circuits. A Chisel
design is really a Scala program that generates a circuit as it executes. To many, this
may seem counterintuitive: “Why not just make Chisel a stand-alone language like
VHDL or SystemVerilog?” My answer to this question is as follows: the software
world has seen a substantial amount of innovation in design methodology in the
past couple of decades. Rather than attempting to adapt these techniques to a new
hardware language, we can simply use a modern programming language and gain
those benefits for free.
A longstanding criticism of Chisel is that it is “difficult to learn.” Much of this
perception is due to the prevalence of large, complex designs created by experts to
solve their own research or commercial needs. When learning a popular language
like C++, one does not start by reading the source code of GCC. Rather, there are a
plethora of courses, textbooks, and other learning materials that cater toward new-
comers. In Digital Design with Chisel, Martin has created an important resource for
anyone who wishes to learn Chisel.
Martin is an experienced educator, and it shows in the organization of this book.
Starting with installation and primitives, he builds the reader’s understanding like a
building, brick-by-brick. The included exercises are the mortar that solidifies under-
standing, ensuring that each concept sets in the reader’s mind. The book culminates
with hardware generators like a roof giving the rest of the structure purpose. At
the end, the reader is left with the knowledge to build a simple, yet useful design: a
RISC processor.
In Digital Design with Chisel, Martin has laid a strong foundation for productive
digital design. What you build with it is up to you.
Jack Koenig
Chisel and FIRRTL Maintainer
Staff Engineer, SiFive
Preface
For the fourth edition we have switched to the current Chisel version 3.5.4. We
have added arbiter, priority encoder, and comparator to the chapter on combinational
building blocks. We have extended the hardware generation chapter with
more functional examples, including building a fair arbitration tree out of simple
2-to-1 arbitration circuits. We have added a new chapter on interconnect, bus interfaces,
and how to connect an IO device as a memory-mapped device. We have
started a new chapter on debugging, testing, and verification. The plan is to extend
the chapter on this important topic in the next edition. We have extended the processor
chapter with a gentler introduction to microprocessors, including a figure
of the datapath.
Translations
This book has been translated to Chinese, Japanese, and Vietnamese. All translations
are available as free PDFs from the book's web page. I would like to thank
Yuda Wang, Qiwei Sun, and Yun Chen for the Chinese translation; Seiji Munetoh,
Masatoshi Tanabata, and Takaaki Hagino for the Japanese translation; and VieLe
Duc Hung for the Vietnamese translation. If you are interested in translating this book
into another language, feel free to do so and publish it under the same license. Please
contact me then, so I can point to your translation.
Acknowledgements
I want to thank everyone who has worked on Chisel for creating such a cool hardware
construction language. Chisel is so joyful to use and therefore worth writing a
book about. I am thankful to the whole Chisel community, which is so welcoming
and friendly and never tires of answering questions on Chisel.
I would also like to thank my students in the last years of an advanced computer
architecture course where most of them picked up Chisel for the final project. Thank
you for moving out of your comfort zone and taking up the journey of learning and
using a bleeding-edge hardware description language. Many of your questions have
helped to shape this book.
It was a pleasure to use Chisel in the last three years of teaching a digital electronics
course at the Technical University of Denmark. I know it is a challenge to
pick up Chisel and Java in parallel in the second semester. Thank you to all students
from this course, who had an open mind to pick up a modern programming language
for hardware description.
For the third edition, I would like to acknowledge Hans Jakob Damsgaard (@hansemandse)
for rewriting all test code of this book to ChiselTest, adding ChiselTest to
the testing chapter, adding the black box description, and an example for a multi-clock
memory.
1 Introduction
This book is an introduction to digital system design using a modern hardware con-
struction language, Chisel [5]. In this book, we focus on a higher abstraction level
than usual in digital design books, to enable you to build more complex, interacting
digital systems in a shorter time.
This book and Chisel are targeting two groups of developers: (1) hardware de-
signers and (2) software programmers. Hardware designers who are fluent in VHDL
or Verilog and using other languages such as Python, Java, or Tcl to generate hard-
ware can move to a single hardware construction language where hardware gen-
eration is part of the language. Software programmers may become interested in
hardware design, e.g., as future chips from Intel will include programmable hard-
ware to speed up programs. It is perfectly fine to use Chisel as your first hardware
description language.
Chisel brings advances in software engineering, such as object-oriented and
functional programming, into the digital design domain. Chisel not only allows
you to express hardware at the register-transfer level but also allows you to write
hardware generators.
Hardware is now commonly described with a hardware description language. The
time of drawing hardware components, even with CAD tools, is over. Some high-
level schematics can give an overview of the system but are not intended to describe
the system. The two most common hardware description languages are Verilog and
VHDL. Both languages are old, contain many legacies, and have a moving line
of what constructs of the language are synthesizable to hardware. Do not get me
wrong: VHDL and Verilog are perfectly able to describe a hardware block that can
be synthesized into an ASIC. For hardware design in Chisel, Verilog serves as an
intermediate language for testing and synthesis.
This book is not a general introduction to digital design and the fundamentals
of it. For an introduction of the basics in digital design, such as how to build a
gate out of CMOS transistors, refer to other digital design books. However, this
book intends to teach digital design at an abstraction level that is current practice
1 As the author is more familiar with FPGAs than ASICs as target technology, some design optimiza-
tions shown in this book are targeting FPGA technology.
1.1 Installing Chisel and FPGA Tools
1.1.1 macOS
Install the Java OpenJDK 8 (or 11) from AdoptOpenJDK. On macOS, with the
package manager Homebrew, sbt and git can be installed with:
$ brew install sbt git
Install GTKWave and IntelliJ (the community edition). When importing a project,
select the JDK you installed before.
1.1.2 Linux/Ubuntu
Install Java and useful tools in Ubuntu with:
$ sudo apt install openjdk-8-jdk git make gtkwave
For Ubuntu, which is based on Debian, programs are usually installed from a
Debian file (.deb). However, as of the time of this writing, sbt is not available as
a ready-to-install package. Therefore, the installation process is a little bit more
involved. Follow the instructions from sbt download.
1.1.3 Windows
Install the Java OpenJDK (8 or 11) from AdoptOpenJDK. Chisel and Scala can also
be installed and used under Windows. Install GTKWave and IntelliJ (the community
edition). When importing a project, select the JDK you installed before. sbt can
be installed with a Windows installer, see: Installing sbt on Windows. Install a git
client.
However, is this Chisel? Is this hardware generated to print a string? No, this is plain
Scala code and not a representative Hello World program for a hardware design.
1.4 An IDE for Chisel
line and an editor of your choice. In the tradition of other books, all commands that
you shall type in a shell/terminal/CLI are preceded by a $ character, which you shall
not type in. As an example, here is the Unix ls command, which lists files in the
current folder:
$ ls
Visual Studio Code is another option for a Chisel IDE. The Scala Metals exten-
sion provides Scala support. On the left bar select Extensions and search for Metals
and install Scala (Metals). To import an sbt-based project, open the folder with File
- Open.
This book is freely available as a PDF eBook and in classical printed form on Amazon.
The eBook version features links to further resources and Wikipedia entries.
We use Wikipedia entries for background information (e.g., binary number system)
that does not directly fit into this book. We optimized the format of the eBook for
reading on a tablet, such as an iPad.
1.6 Further Reading
The official Chisel documentation and further documents are available online:
• The Chisel home page is the official starting point to download and learn
Chisel.
• The schoeberl/chisel-lab GitHub repo contains Chisel exercises for the course
Digital Electronics 2. The exercises also fit well for self-study with this book.
• The empty Chisel project is a good starting point with a very minimal hardware
design (an adder) and a test. That project is a GitHub template on which you can
base your own GitHub repository.
• The Chisel3 Cheat Sheet summarizes the main constructs of Chisel on a single
page.
• Scott Beamer’s course Agile Hardware Design contains advanced Chisel ex-
amples. The lectures include executable source examples and are available as
Jupyter notebooks.
• The Chisel Tutorial provides a ready setup project containing small exercises
with testers and solutions. However, it is a bit outdated.
1.7 Exercises
Each chapter ends with hands-on exercises. For the introduction exercise, we will
use an FPGA board to get one LED blinking.6 As a first step clone (or fork) the
chisel-examples repository from GitHub. The Hello World example is in the folder
hello-world, set up as a minimal project. You can explore the Chisel code of the
blinking LED in src/main/scala/Hello.scala. Compile the blinking LED with
the following steps:
After some initial downloading of Chisel components, this will produce the Verilog
file Hello.v. Explore this Verilog file. You will see that it contains two inputs,
clock and reset, and one output, io_led. When you compare this Verilog file with
the Chisel module, you will notice that the Chisel module does not contain clock
or reset. Those signals are implicitly generated, and in most designs, it is con-
venient not to need to deal with these low-level details. Chisel provides register
components, and those are connected automatically to clock and reset (if needed).
The next step is to set up an FPGA project file for the synthesis tool, assign the
pins, compile the Verilog code, and configure the FPGA with the resulting bitfile.
We cannot provide the details of these steps. Please consult the manual of your Intel
Quartus or AMD Vivado tool. However, the examples repository contains some
ready-to-use Quartus projects in the folder quartus for several popular FPGA boards
(e.g., DE2-115). If the repository contains support for your board, start Quartus,
open the project, compile it by pressing the Play button, and configure the FPGA
board with the Programmer button. One of the LEDs should then blink.
Congratulations! You managed to get your first design in Chisel running in
an FPGA!
If the LED is not blinking, check the status of reset. On the DE2-115 configura-
tion, the reset input is connected to SW0.
Now change the blinking frequency to a slower or a faster value and rerun the
build process. Blinking frequencies and also blinking patterns communicate different
“emotions”. For example, a slow blinking LED signals that everything is ok, a
fast blinking LED signals an alarm state. Explore which frequencies best express
those two different emotions.
6 If you have no FPGA board available at the moment, continue to read as we will show you a simulation
As a more challenging extension to the exercise, generate the following blinking
pattern: the LED shall be on for 200 ms every second. For this pattern, you might
decouple the change of the LED blinking from the counter reset. You will need a
second constant where you change the state of the blkReg register. What kind of
emotion does this pattern produce? Is it alarming or more like a sign-of-life signal?
If you do not (yet) have an FPGA board, you can still run the blinking LED
example. You will use the Chisel simulation. To avoid an overly long simulation time,
change the clock frequency in the Chisel code from 50000000 to 50000. Execute
the following instruction to simulate the blinking LED:
$ sbt test
This will execute the tester that runs for one million clock cycles. The blinking
frequency depends on the simulation speed, which depends on the speed of your
computer. Therefore, you might need to experiment a little bit with the assumed
clock frequency to see the simulated blinking LED.
2 Basic Components
In this section, we introduce the basic components for digital design: combinational
circuits and flip-flops. These essential elements can be combined to build larger,
more interesting circuits.
Digital systems, in general, use binary signals, which means a single bit or signal
can only have one of two possible values. These values are often called 0 and 1.
However, we also use the following terms: low/high, false/true, and deasserted/asserted.
These terms mean the same two possible values of a binary signal.
The width of a vector of bits is defined by a Chisel width type (Width). The fol-
lowing expression casts the Scala integer n to a Chisel width, which we use for the
definition of the Bits vector:
n.W
Bits(n.W)
Constants can be defined by using a Scala integer and converting it to a Chisel type:
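As a minimal sketch of this conversion, the following module drives two outputs with constants created from Scala integers. The module name Constants, the port names, and the 8-bit port widths are assumptions made only for this illustration.

```scala
import chisel3._

// Hypothetical module that only exists to show constant definitions.
class Constants extends Module {
  val io = IO(new Bundle {
    val u = Output(UInt(8.W))
    val s = Output(SInt(8.W))
  })

  // .U converts the Scala integer 0 into an unsigned Chisel constant.
  io.u := 0.U
  // .S converts the Scala integer -3 into a signed Chisel constant.
  io.s := -3.S
}
```

The constants themselves carry an inferred width; connecting them to the wider ports extends them to 8 bits.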
1 The type Bits in the current version of Chisel is missing operations and therefore not very useful for
user code.
Constants can also be defined with a width, by using the Chisel width type:
3.U(4.W) // A 4-bit constant of 3
If you find the notation of 3.U and 4.W a little bit funny, consider it as a variant of
an integer constant with a type. This notation is similar to 3L, representing a long
integer constant in C, Java, and Scala.
Possible pitfall: One possible error when defining constants with a dedicated
width is missing the .W specifier for a width. E.g., 1.U(32) will not define a 32-bit
wide constant representing 1. Instead, the expression (32) is interpreted as bit
extraction from position 32, which results in a single bit constant of 0. This is
probably not what the programmer intended.
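The difference can be put side by side in a small sketch. The module name WidthPitfall and the port name wide are hypothetical. The erroneous form is kept as a comment on purpose, so that the sketch does not depend on how a particular Chisel version treats the out-of-range bit extraction.

```scala
import chisel3._

// Hypothetical module contrasting a width specifier with bit extraction.
class WidthPitfall extends Module {
  val io = IO(new Bundle {
    val wide = Output(UInt(32.W))
  })

  // Intended form: a 32-bit wide constant with the value 1.
  io.wide := 1.U(32.W)

  // Pitfall (deliberately commented out): without .W the argument is a
  // bit index, so this would extract bit 32 of the literal 1 instead of
  // setting a width:
  //   io.wide := 1.U(32)
}
```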
Chisel benefits from Scala’s type inference and in many places type information
can be left out. The same is also valid for bit widths. In many cases, Chisel will
automatically infer the correct width. Therefore, a Chisel description of hardware
is more concise and more readable than VHDL or Verilog.
For constants defined in other bases than decimal, the constant is defined in a
string with a preceding h for hexadecimal (base 16), o for octal (base 8), and b
for binary (base 2). The following example shows the definition of constant 255 in
different bases. In this example we omit the bit width and Chisel infers the minimum
width to fit the constants in, in this case 8 bits.
"hff".U // hexadecimal representation of 255
"o377".U // octal representation of 255
"b1111_1111".U // binary representation of 255
The above code shows how to use an underscore to group digits in the string that
represents a constant. The underscore is ignored.
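The value and the minimum width of these three literals can be cross-checked in plain Scala, without Chisel. The sketch below is only a software model: the object LiteralCheck and its parse function are hypothetical helpers, not part of Chisel, and they handle just the three prefixes mentioned above.

```scala
// Plain Scala model (no Chisel needed) of the three string literals.
object LiteralCheck {
  // Strip the base prefix (h, o, or b) and the underscores, then parse.
  // Any other prefix is not handled and raises a MatchError.
  def parse(lit: String): BigInt = {
    val digits = lit.tail.replace("_", "")
    lit.head match {
      case 'h' => BigInt(digits, 16)
      case 'o' => BigInt(digits, 8)
      case 'b' => BigInt(digits, 2)
    }
  }

  val values: Seq[BigInt] = Seq("hff", "o377", "b1111_1111").map(parse)

  // The minimum width of a nonzero unsigned value is its bit length.
  val widths: Seq[Int] = values.map(_.bitLength)
}
```

All three strings parse to 255, and the bit length of 255 is 8, which matches the inferred width stated in the text.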
Characters to represent text (in ASCII encoding) can also be used as constants in
Chisel:
val aChar = 'A'.U
To represent logic values, Chisel defines the type Bool. Bool can represent a
true or false value. The following code shows the definition of type Bool and the
definition of Bool constants, by converting the Scala Boolean constants true and
false to Chisel Bool constants.
Bool()
true.B
false.B
2.2 Combinational Circuits
Figure 2.1: Logic for the expression (a & b) | c. The wires can be a single bit or
multiple bits. The Chisel expression, and the schematics are the same.
Figure 2.1 shows the schematic of this combinatorial expression. Note that this
circuit may be for a vector of bits and not only single wires that are combined with
the AND and OR circuits.
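As a sketch of how this expression could sit in a complete design, we wrap it into a module with a width parameter. The module name AndOr, the parameter n, and the port names are assumptions for illustration only.

```scala
import chisel3._

// Hypothetical wrapper module around the expression of Figure 2.1.
class AndOr(n: Int) extends Module {
  val io = IO(new Bundle {
    val a = Input(UInt(n.W))
    val b = Input(UInt(n.W))
    val c = Input(UInt(n.W))
    val logic = Output(UInt(n.W))
  })

  // Bitwise AND followed by bitwise OR, applied to all n bits in parallel.
  io.logic := (io.a & io.b) | io.c
}
```

With n = 1 this describes the single-wire circuit; with a larger n the same line describes n parallel AND and OR gates.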
In this example, we do not define the type nor the width of signal logic. Both are
inferred from the type and width of the expression. The standard logic operations in
Chisel are:
val and = a & b // bitwise and
val or = a | b // bitwise or
val xor = a ^ b // bitwise xor
val not = ~a // bitwise negation
The resulting width of the operation is the maximum width of the operands for addition
and subtraction, the sum of the two widths for the multiplication, and usually
the width of the numerator for divide and modulo operations.2
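The width rules can be worked through with concrete numbers in a plain Scala model. The object ResultWidth below is a hypothetical helper that merely restates the rules from the text; it is not Chisel API, and it leaves out the modulo case.

```scala
// Plain Scala model of the width rules stated above; not Chisel API.
object ResultWidth {
  // Addition and subtraction: the maximum of the two operand widths.
  def addSub(wa: Int, wb: Int): Int = math.max(wa, wb)

  // Multiplication: the sum of the two operand widths.
  def mul(wa: Int, wb: Int): Int = wa + wb

  // Division: (usually) the width of the numerator.
  def div(wNum: Int, wDen: Int): Int = wNum

  // An addition that keeps only w bits drops the carry and wraps around.
  def wrapAdd(a: BigInt, b: BigInt, w: Int): BigInt =
    (a + b) & ((BigInt(1) << w) - 1)
}
```

For an 8-bit and a 4-bit operand this gives 8 bits for the sum, 12 bits for the product, and 8 bits for the quotient. As the sum keeps the maximum operand width, a carry out of the top bit is lost: wrapAdd(255, 1, 8) evaluates to 0.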
A signal can also first be defined as a Wire of some type. Afterward, we can
assign a value to the wire with the := update operator.
val w = Wire(UInt())
w := a & b
Table 2.1 shows the full list of operators (see also builtin operators). The Chisel
operator precedence is determined by the evaluation order of the circuit, which fol-
lows the Scala operator precedence. If in doubt, it is always a good practice to use
parentheses.4
Table 2.2 shows various functions defined on and for Chisel data types.
ware nodes is created by executing the Scala operators. The Scala operator precedence is similar but
not identical to Java/C. Verilog has the same operator precedence as C, but VHDL has a different
one. Verilog has precedence ordering for logic operations, but in VHDL those operators have the
same precedence and are evaluated from left to right.
Figure 2.2: A basic 2:1 multiplexer.
2.2.1 Multiplexer
A multiplexer is a circuit that selects between alternatives. In the most basic form,
it selects between two alternatives. Figure 2.2 shows such a 2:1 multiplexer, or mux
for short. Depending on the value of the select signal (sel), signal y will represent
signal a or signal b.
A multiplexer can be built from logic. However, as multiplexing is such a stan-
dard operation, Chisel provides a multiplexer,
val result = Mux(sel, a, b)
where a is selected when sel is true.B, otherwise b is selected. The type of
sel is a Chisel Bool; the inputs a and b can be any Chisel base type or aggregate
(bundles or vectors) as long as they are the same type.
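A minimal sketch of both variants, the Mux component and a multiplexer built from logic, is shown below. The module name Mux2, the port names, and the 8-bit width are assumptions; Fill from chisel3.util replicates the select bit to the width of the data.

```scala
import chisel3._
import chisel3.util.Fill

// Hypothetical 2:1 multiplexer module with two equivalent outputs.
class Mux2 extends Module {
  val io = IO(new Bundle {
    val sel = Input(Bool())
    val a = Input(UInt(8.W))
    val b = Input(UInt(8.W))
    val y = Output(UInt(8.W))
    val yLogic = Output(UInt(8.W))
  })

  // The Chisel multiplexer: a when sel is true, otherwise b.
  io.y := Mux(io.sel, io.a, io.b)

  // The same function built from logic: mask a with sel and b with
  // the negated sel, then OR the two masked values.
  io.yLogic := (Fill(8, io.sel) & io.a) | (Fill(8, !io.sel) & io.b)
}
```

Both outputs carry the same value in every cycle; the Mux form is shorter and also works for aggregate types.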
With logical and arithmetical operations and a multiplexer, every combinational
circuit can be described. However, Chisel provides further components and control
abstractions for a more elegant description of a combinational circuit, which are
described in Chapter 5.
The second basic component needed to describe a digital circuit is a state element,
also called register, which is described next.
2.3 Registers
Chisel provides a register, which is a collection of D flip-flops. The register is im-
plicitly connected to a global clock and is updated on the rising edge. When an ini-
tialization value is provided at the declaration of the register, it uses a synchronous
reset connected to a global reset signal. A register can be any Chisel type that can
Figure 2.3: A D flip-flop based register with a synchronous reset to 0.
An input is connected to the register with the := update operator and the output of
the register can be used just with the name in an expression:
reg := d
val q = reg
Figure 2.3 shows the circuit of our register definition with a clock, a synchronous
reset to 0.U, input d, and output q. The global signals clock and reset are implicitly
connected to each register defined.
A register can also be connected to its input and given a constant as initial value at its
definition:
val bothReg = RegNext(d, 0.U)
coming from Java and Scala, is to use camelCase for identifiers consisting of several
words. The convention is to start functions and variables with a lower case letter
and classes (types), e.g., a Module name, with an upper case letter.
In Chisel you are relatively free to name your identifiers. However, use taste and
descriptive names. Furthermore, several words are reserved. They are listed in
Appendix A.
2.3.1 Counting
Counting is a fundamental operation in digital systems. One might count events.
However, more often counting is used to define a time interval: we count the clock
cycles and trigger an action when the time interval has expired.
A simple approach is counting up to a value. However, in computer science, and
digital design, counting starts at 0. Therefore, if we want to count 10 clock cycles,
we count from 0 to 9. The following code shows such a counter that counts to 9
and wraps around to 0 when reaching 9.
val cntReg = RegInit(0.U(8.W))
cntReg := Mux(cntReg === 9.U, 0.U, cntReg + 1.U)
2.4 Structure with Bundle and Vec
2.4.1 Bundle
A Chisel Bundle groups several signals. The entire bundle can be referenced as a
whole, or individual fields can be accessed by their name. A Bundle is similar to
a struct in C and SystemVerilog or a record in VHDL. We can define a bundle
(collection of signals) by defining a class that extends Bundle and lists the fields as
vals within the constructor block.
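A bundle definition along these lines could look as follows. The field names data and valid follow their use in the code of this section; the 32-bit width of data is an assumption made for this sketch.

```scala
import chisel3._

// Sketch of a Channel bundle: a data field plus a valid flag.
// The 32-bit width of data is an assumption.
class Channel() extends Bundle {
  val data = UInt(32.W)
  val valid = Bool()
}
```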
To use a bundle, we create it with new and wrap it into a Wire. The fields are accessed
with the dot notation:
val ch = Wire(new Channel())
ch.data := 123.U
ch.valid := true.B
val b = ch.valid
2.4.2 Vec
A Chisel Vec (a vector) represents a collection of Chisel types of the same type.
Each element can be accessed by an index. A Chisel Vec is similar to array data
structures in other programming languages.5
A Vec is used for three different purposes: (1) dynamic addressing in hardware,
which is a multiplexer; (2) a register file, which includes multiplexing the read and
generating the enable signal for the write; (3) parametrization of the number of ports
of a Module. For other collections of things, be it hardware elements or other
generator data, it is better to use the Scala collection Seq.
Combinational Vec
A Vec is created by calling the constructor with two parameters: the number of
elements and the type of the elements. A combinational Vec needs to be wrapped
into a Wire:
val v = Wire(Vec(3, UInt(4.W)))
5 The name Array is already used in Scala.
Figure 2.4: A vector wrapped in a Wire is just a multiplexer.
Individual elements are accessed with (index). A vector wrapped into a Wire is just
a multiplexer.
v(0) := 1.U
v(1) := 3.U
v(2) := 5.U
Here is another example of using Vec as a multiplexer. The three inputs are con-
nected to the three wires x, y, and z. The select wire selects which input is used
and connects it to muxOut.
val vec = Wire(Vec(3, UInt(8.W)))
vec(0) := x
vec(1) := y
vec(2) := z
val muxOut = vec(select)
Figure 2.4 shows the resulting schematic of the above code snippet.
Similar to using a WireDefault, we can set default values of a Vec with VecInit.
The following code represents a 3:1 multiplexer with three constant defaults. Note
that we specify the size (3 bits) of the UInt data types with the first constant. With
the condition (cond) we can overwrite those default values. This overwrite hardware
itself consists of three 2:1 multiplexers. The last line selects one of the three inputs
of the defVec multiplexer. Note that VecInit already returns Chisel hardware, so
we do not need to wrap it in a Wire.6
val defVec = VecInit(1.U(3.W), 2.U, 3.U)
when(cond) {
  defVec(0) := 4.U
  defVec(1) := 5.U
  defVec(2) := 6.U
}
val vecOut = defVec(sel)
It is not only possible to set initial constants (like in WireDefault) for the Vec
input, but we can also connect signals (wires) with VecInit to the inputs of the Vec.
The following example connects the wires d, e, and f to the three inputs of the Vec.
val defVecSig = VecInit(d, e, f)
val vecOutSig = defVecSig(sel)
Register Vec
We can also wrap a Vec into a register to define an array of registers. The following
code shows a vector of three registers.
val regVec = Reg(Vec(3, UInt(8.W)))
Figure 2.5 shows the schematic of that circuit. It contains three registers. The read
index (rdIdx) selects the multiplexer connected to the output of the three registers.
The output signal is dout. The write index (wrIdx) selects which register will be
written with the data from din. wrIdx is driving a decoder which selects one of the
three enable signals of the registers.
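The read and write paths described here can be sketched as a complete module. The signal names rdIdx, wrIdx, din, and dout are taken from the description above; the module name RegVec3 and the port widths are assumptions.

```scala
import chisel3._

// Hypothetical module around the vector of three registers.
class RegVec3 extends Module {
  val io = IO(new Bundle {
    val rdIdx = Input(UInt(2.W)) // 2 bits address 3 elements; 3 is unused
    val wrIdx = Input(UInt(2.W))
    val din = Input(UInt(8.W))
    val dout = Output(UInt(8.W))
  })

  val regVec = Reg(Vec(3, UInt(8.W)))

  // Reading with a hardware index generates the output multiplexer.
  io.dout := regVec(io.rdIdx)

  // Writing with a hardware index generates the decoder that drives
  // the enable signals of the individual registers.
  regVec(io.wrIdx) := io.din
}
```

In this sketch a write happens in every clock cycle; a write enable input would be a natural extension.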
The following example defines a register file for a processor: 32 registers, each
32 bits wide, as used in a classic 32-bit RISC processor such as the 32-bit version of
RISC-V.
val registerFile = Reg(Vec(32, UInt(32.W)))
6 This is different from a plain Vec that needs to be wrapped into a Wire. We could wrap the VecInit
into a WireDefault, but this is an uncommon coding style.
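[Figure 2.5: schematic of the register vector. din feeds the three registers (0, 1, 2), wrIdx drives a decoder that generates the enable signals (en), and rdIdx selects which register output drives dout.]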
An element of that register file is accessed with an index and used as a normal
register.
registerFile(index) := dIn
val dOut = registerFile(index)
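Putting these pieces together, a register file with one write port and one read port might be sketched as follows. The module name, the port names, and the write enable are assumptions for illustration and do not come from the text above.

```scala
import chisel3._

// Hypothetical wrapper around the register vector; all port names are illustrative.
class RegisterFile extends Module {
  val io = IO(new Bundle {
    val wrEna = Input(Bool())
    val wrIdx = Input(UInt(5.W))
    val din = Input(UInt(32.W))
    val rdIdx = Input(UInt(5.W))
    val dout = Output(UInt(32.W))
  })

  // 32 registers, each 32 bits wide, without a reset value
  val registerFile = Reg(Vec(32, UInt(32.W)))

  // Indexing with a hardware signal on the left side describes the write decoder
  when(io.wrEna) {
    registerFile(io.wrIdx) := io.din
  }

  // Indexing on the right side describes the read multiplexer
  io.dout := registerFile(io.rdIdx)
}
```

The write enable is added here because a register file that is written on every clock cycle is rarely useful; without it, the write assignment would simply be unconditional.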
A vector of registers can also be initialized. This is then the value that the registers
are reset to. To initialize the register file, we use a VecInit with the constants for
reset, wrapped into a RegInit. The inputs of the three registers are then connected to
wires d, e, and f.
val initReg = RegInit(VecInit(0.U(3.W), 1.U, 2.U))
val resetVal = initReg(sel)
initReg(0) := d
initReg(1) := e
initReg(2) := f
If we want to reset all elements of a large register file to the same value (probably
0), we can use a Scala sequence Seq. VecInit can be constructed with a sequence
containing Chisel types. Seq contains a creation function fill to initialize a se-
quence with identical values. The following code constructs a register file contain-
ing 32 registers, each 32-bit wide and reset to 0:
val resetRegFile =
  RegInit(VecInit(Seq.fill(32)(0.U(32.W))))
val rdRegFile = resetRegFile(sel)
We can freely mix bundles and vectors. When creating a vector with a bundle type,
we need to pass a prototype for the vector fields. Using our Channel, which we
defined above, we can create a vector of channels with:
val vecBundle = Wire(Vec(8, new Channel()))
When we want a register of a bundle type that needs a reset value, we first create
a Wire of that bundle, set the individual fields as needed, and then pass this bundle
to a RegInit:
val initVal = Wire(new Channel())
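The line above covers only the first of the three steps. A fuller sketch of all three steps is given here, assuming that Channel has a 32-bit data field and a Bool valid field; the field names, the widths, and the surrounding module are assumptions made for illustration.

```scala
import chisel3._

// Assumed definition of Channel; field names and widths are illustrative.
class Channel extends Bundle {
  val data = UInt(32.W)
  val valid = Bool()
}

// Hypothetical module that holds one Channel value in a register.
class ChannelRegExample extends Module {
  val io = IO(new Bundle {
    val in = Input(new Channel())
    val out = Output(new Channel())
  })

  // Step 1: create a Wire of the bundle type
  val initVal = Wire(new Channel())
  // Step 2: set the individual fields to the desired reset values
  initVal.data := 0.U
  initVal.valid := false.B
  // Step 3: pass the wire to RegInit to get a register with that reset value
  val channelReg = RegInit(initVal)

  channelReg := io.in
  io.out := channelReg
}
```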
With combinations of Bundles and Vecs we can define our own data structures,
which are powerful abstractions.
Possible pitfall: In Chisel 3, partial assignments are not allowed, although they
have been allowed in Chisel 2 and are possible in Verilog and VHDL. The following
The argument is that it would be better to use bundles for this use case. One possible
workaround for this issue is to create a (local) bundle, create a Wire from that bundle,
assign the individual fields, cast that bundle with asUInt() to a UInt, and assign
this value to the target UInt. Note that we define here a Bundle as a local data
structure as we need it only locally.
val assignWord = Wire(UInt(16.W))
The small drawback of this solution is that one needs to know in which order the bundle
fields are merged into a single bit vector. Another option is to use a vector of Bool to
individually assign values and then convert it to a UInt.
val vecResult = Wire(Vec(4, Bool()))
// example assignments
vecResult(0) := data(0)
vecResult(1) := data(1)
vecResult(2) := data(2)
vecResult(3) := data(3)
// convert to a UInt; element 0 becomes the least significant bit
val uintResult = vecResult.asUInt
2.5 Wire, Reg, and IO
You can later assign (or reassign) a value or expression to a Wire, Reg, or IO with
the Chisel operator :=
number := 10.U
reg := value - 3.U
Note the small difference between the Scala assignment operator “=” and the Chisel
operator “:=”. You use Scala’s “=” operator when creating a hardware object (and
giving it a name) but you use Chisel’s “:=” operator when assigning or reassigning
a value to an existing hardware object.
Combinational values can be conditionally assigned, but they need to be assigned
in every branch of the condition. Otherwise, one would describe a latch, which
the Chisel compiler will reject. The best practice is to define a default value at the
creation of the Wire. Therefore, the former code is better rewritten as follows.
val number = WireDefault(10.U(4.W))
Although Chisel infers the needed bit width for signals and registers, it is also a good
practice to specify the intended bit width at the creation of the hardware object. In
most cases it is also good practice to set registers to known initial values on reset:8
val reg = RegInit(0.S(8.W))
7 Scala also supports mutable variables with var, but those are of no use when describing hardware in
Chisel.
8 Leaving the register value undefined on reset may save some load on the reset wire. However, testing
2.7 Exercises
In the introduction you implemented a blinking LED on an FPGA board (from
chisel-examples), which is a reasonable hardware Hello World example. It used
only internal state, a single LED output, and no input. Copy that project into a
new folder and extend it by adding some inputs to the io Bundle with val sw =
Input(UInt(2.W)).
For those switches, you also need to assign the pins for the FPGA board. You can
find examples of pin assignments in the Quartus project files of the ALU project
(e.g., for the DE2-115 FPGA board).
When you have defined those inputs and the pin assignment, start with a simple
test: drop all blinking logic from the design and connect one switch to the LED
output; compile and configure the FPGA device. Can you switch the LED on and
off with the switch? If yes, you now have inputs available. If not, you need to
debug your FPGA configuration. The pin assignment can also be done with the
GUI version of the tool.
Now use two switches and implement one of the basic combinational functions,
e.g., AND two switches and show the result on the LED. Change the function. The
next step involves three input switches to implement a multiplexer: one acts as a
select signal, and the other two are the two inputs for the 2:1 multiplexer.
Now you have been able to implement simple combinational functions and test
them in real hardware in an FPGA. As a next step, we will take a first look at how the
build process works to generate an FPGA configuration. Furthermore, we will also
explore a simple testing framework from Chisel, which allows you to test circuits
without configuring an FPGA and toggling switches.
3 Build Process and Testing
To get started with more interesting Chisel code we first need to learn how to com-
pile Chisel programs, how to generate Verilog code for execution in an FPGA, and
how to write tests for debugging and to verify that our circuits are correct.
Chisel is written in Scala, so any build process that supports Scala is possible with
a Chisel project. One popular build tool for Scala is sbt, which stands for the Scala
interactive build tool. Besides driving the build and test process, sbt also downloads
the correct version of Scala and the Chisel libraries.
[Figure 3.1: folder structure of a Chisel project]
project
src
  main
    scala
      package
        sub-package
  test
    scala
      package
target
generated
containing the hardware sources and test containing testers. The next folder in both
cases is scala, as Chisel is based on Scala. If you want to include Java code, which
may be useful for hardware generators, you would add a java folder. Chisel inherits
from Scala, which itself inherits from Java, the organization of source in packages.
Packages organize your Chisel code into namespaces. Packages can also contain
sub-packages. The folder target contains the class files and other generated files.
I recommend also using a folder for generated Verilog files, which is usually called
generated.
To use the facility of namespaces in Chisel, you need to declare that a class/mod-
ule is defined in a package, in this example in mypack:
package mypack
import chisel3._
Note that in this example we see the import of the chisel3 package to use Chisel
classes.
To use the module Abc in a different context (package namespace), the components
of package mypack need to be imported. The underscore (_) acts as a wildcard,
meaning that all classes of mypack are imported.
import mypack._
It is also possible to not import all types from mypack, but use the fully qualified
name mypack.Abc to refer to the module Abc in the package mypack.
class AbcUser2 extends Module {
val io = IO(new Bundle {})
It is also possible to import just a single class and create an instance of it:
import mypack.Abc
$ sbt run
This command will compile all your Chisel code from the source tree and search
for classes that contain an object that either has a main method or extends App. If
there is more than one such object, all objects are listed and one can be selected.
You can also directly specify the object that shall be executed as a parameter to sbt:
By default, sbt searches only the main part of the source tree and not the test
part.2 To execute tests based on ChiselTest you can simply run them with
$ sbt test
If you have a test that does not follow the ChiselTest convention and it contains
a main function, but is placed in the test part of the source tree you can execute it
with the following sbt command:
Using the default version of emitVerilog() will put the generated files into the
root folder of our project (where we run the sbt command). To put the generated
files into a subfolder, we need to specify options to emitVerilog(). I recommend
specifying a folder generated, as shown in Figure 3.1. The build options can be set as
a second argument, which is an array of Strings. The following code will generate
the Verilog file Hello.v in the subfolder generated.
object HelloOption extends App {
  emitVerilog(new Hello(), Array("--target-dir", "generated"))
}
2 It is a convention from Java/Scala that the test folder contains unit tests and not objects with a main.
You can also request the Verilog code as a Scala String without writing a file.
You can simply print out the string for testing.
object HelloString extends App {
  val s = getVerilogString(new Hello())
  println(s)
}
This form of output is popular when showing small Chisel examples in Scastie, a
web-based Scala compiler and runtime. See Hello World on Scastie for an example.
[Figure: Chisel tool flow with the elements scalac, Hello.class, Chisel, FIRRTL, Hello.fir, Verilog Emitter, Chisel Tester, Treadle, GTKWave, Circuit Synthesis, and Hello.bit.]
in a software simulator and compare the simulation of the hardware with the soft-
ware simulation. This method is very efficient when testing an implementation of a
processor [14].
3.2.1 ScalaTest
ScalaTest is a testing tool for Scala (and Java). ChiselTest is an extension of ScalaTest.
Therefore, we first explore a simple ScalaTest example. To use it, include the
library in your build.sbt with the following line:
libraryDependencies += "org.scalatest" %% "scalatest" % "3.1.4" % "test"
Tests are usually found in src/test/scala and the entire test suite can be run with:
$ sbt test
A minimal test (a testing hello world) to test a Scala integer addition and a multipli-
cation looks as follows:
import org.scalatest._
import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers
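The body of the test class is not reproduced here. A minimal sketch that is consistent with the test output shown further down (class name ExampleTest, two tests on Integers) could look like this; the concrete values are assumptions.

```scala
import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

// Sketch of the ScalaTest hello world; the operands are illustrative.
class ExampleTest extends AnyFlatSpec with Matchers {
  "Integers" should "add" in {
    val i = 2
    val j = 3
    i + j should be (5)
  }

  "Integers" should "multiply" in {
    val a = 3
    val b = 4
    a * b should be (12)
  }
}
```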
ScalaTest enables simple unit tests that read like an executable specification. The
example above contains two tests and the output of the test run will repeat the spec-
ification and show that both tests passed:
[info] ExampleTest:
[info] Integers
[info] - should add
[info] Integers
[info] - should multiply
[info] ScalaTest
[info] Run completed in 119 milliseconds.
[info] Total number of tests run: 2
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 2, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 2, Failed 0, Errors 0, Passed 2
sbt test executes all available tests, which is useful for regression tests.3 How-
ever, if you want to run just a single test (suite) you can do this with:
$ sbt "testOnly ExampleTest"
If you misspell the class name, for example, Exampletest, there will be a relatively
silent error message: No tests were executed.
3.2.2 ChiselTest
ChiselTest is the standard testing tool for Chisel modules based on the ScalaTest
tool for Scala and Java, which we can use to run Chisel tests. To use it, include the
chiseltest library in your build.sbt with the following line:
Including ChiselTest this way automatically includes the necessary version of Sca-
laTest. Therefore, you do not need to include a line for the ScalaTest library. To use
ChiselTest, the following packages need to be imported:
import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec
3 Try sbt test in the repository of this book and you will see more than 90 tests passing.
Testing a circuit contains (at least) two components: the device under test (often
called DUT) and the testing logic, also called a test bench. Tests are started with
sbt test. No object with a main function is needed.
The following code shows our simple design under test. It contains two 2-bit input
ports and two output ports: one 2 bits wide and one Bool. The circuit computes
the bit-wise AND of its inputs a and b, outputs the result on out, and tests the two
inputs for equality:
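class DeviceUnderTest extends Module {
  val io = IO(new Bundle {
    val a = Input(UInt(2.W))
    val b = Input(UInt(2.W))
    val out = Output(UInt(2.W))
    val equ = Output(Bool())
  })
  // bit-wise AND and equality comparison of the two inputs
  io.out := io.a & io.b
  io.equ := io.a === io.b
}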
The test bench for this DUT extends AnyFlatSpec with ChiselScalatestTester,
which provides ChiselTest functionality within ScalaTest. The method test() is
invoked with the DUT as parameter and the test code as a function literal.
class SimpleTest extends AnyFlatSpec with
    ChiselScalatestTester {
  "DUT" should "pass" in {
    test(new DeviceUnderTest) { dut =>
      dut.io.a.poke(0.U)
      dut.io.b.poke(1.U)
      dut.clock.step()
      println("Result is: " + dut.io.out.peekInt())
      dut.io.a.poke(3.U)
      dut.io.b.poke(2.U)
      dut.clock.step()
      println("Result is: " + dut.io.out.peekInt())
    }
  }
}
The input and output ports of the DUT are accessed with dut.io. You can set
values via a poke on a port, which takes the value as a Chisel type of the input port
as parameter. An output port can be read by invoking peekInt() or peekBoolean()
on the port, which will return the value as a Scala type. The tester advances the sim-
ulation by one clock cycle with dut.clock.step(). For advancing the simulation
by several clock cycles, we can provide a parameter to step(). We can print the
values of the outputs with println().
When you run the test
$ sbt "testOnly SimpleTest"
you will see the results printed to the terminal (besides other information):
...
Result is: 0
Result is: 2
[info] SimpleTest:
[info] DUT
[info] - should pass
...
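Instead of printing values and inspecting them by hand, a test can state the expected output with expect(). The following sketch is a variant of the test above; the class name SimpleTestExpect is taken from the test output below, the expected values follow from the AND function, and the DUT is repeated (with an assumed body) only to keep the sketch self-contained.

```scala
import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec

// DUT repeated here so that the sketch is self-contained.
class DeviceUnderTest extends Module {
  val io = IO(new Bundle {
    val a = Input(UInt(2.W))
    val b = Input(UInt(2.W))
    val out = Output(UInt(2.W))
    val equ = Output(Bool())
  })
  io.out := io.a & io.b
  io.equ := io.a === io.b
}

class SimpleTestExpect extends AnyFlatSpec with ChiselScalatestTester {
  "DUT" should "pass" in {
    test(new DeviceUnderTest) { dut =>
      dut.io.a.poke(0.U)
      dut.io.b.poke(1.U)
      dut.clock.step()
      dut.io.out.expect(0.U) // 0 AND 1 is 0
      dut.io.a.poke(3.U)
      dut.io.b.poke(2.U)
      dut.clock.step()
      dut.io.out.expect(2.U) // 3 AND 2 is 2
    }
  }
}
```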
Executing this test does not print any values from the hardware; it only reports that
all tests passed, as all expected values are correct.
[info] SimpleTestExpect:
[info] DUT
[info] - should pass
[info] ScalaTest
[info] Run completed in 1 second, 85 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1
A failed test, when either the DUT or the test bench contains an error, produces
an error message describing the difference between the expected and actual value.
In the following, we changed the test bench to expect a 4, which is an error:
[info] SimpleTestExpect:
[info] DUT
[info] - should pass *** FAILED ***
[info] io_out=2 (0x2) did not equal expected=4 (0x4)
(lines in testing.scala: 27) (testing.scala:35)
[info] ScalaTest
[info] Run completed in 1 second, 214 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 0, failed 1, canceled 0, ignored 0, pending 0
[info] *** 1 TEST FAILED ***
[error] Failed: Total 1, Failed 1, Errors 0, Passed 0
[error] Failed tests:
[error] SimpleTestExpect
The peek() function returns a Chisel type, which would need conversion to be
used as a Scala type. To simplify using test values in Scala land, ChiselTest supports
peekInt() and peekBoolean(). The following test example reads the output with
peekInt(), which returns a Scala integer4 that is used in the assert() statement.
Similarly, we can read the equ output into a Scala Boolean, directly used in the assert
statement.
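A sketch of such a test is shown here. The class name SimpleTestPeek and the chosen input values are assumptions; the DUT is repeated, with an assumed body, to keep the sketch self-contained.

```scala
import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec

// DUT repeated here so that the sketch is self-contained.
class DeviceUnderTest extends Module {
  val io = IO(new Bundle {
    val a = Input(UInt(2.W))
    val b = Input(UInt(2.W))
    val out = Output(UInt(2.W))
    val equ = Output(Bool())
  })
  io.out := io.a & io.b
  io.equ := io.a === io.b
}

// Hypothetical test name; reads DUT outputs into Scala values.
class SimpleTestPeek extends AnyFlatSpec with ChiselScalatestTester {
  "DUT" should "pass" in {
    test(new DeviceUnderTest) { dut =>
      dut.io.a.poke(0.U)
      dut.io.b.poke(1.U)
      dut.clock.step()
      // peekInt() returns a Scala BigInt
      val res = dut.io.out.peekInt()
      assert(res == 0)
      // peekBoolean() returns a Scala Boolean; the inputs 0 and 1 differ
      val equ = dut.io.equ.peekBoolean()
      assert(!equ)
    }
  }
}
```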
4 To support arbitrarily wide integer values, the return value is a Scala BigInt instead of a Scala Int.
This example is a bit too simple to see the benefit of reading values from the DUT
into Scala types. However, with more complex tests, e.g., looping till some value is
true, these functions become useful.
In this section, we described the basic testing facility with Chisel for simple tests.
However, keep in mind that the full power of Scala is available to write testers. This
includes, for example, writing a reference model of your hardware in Scala to test
the DUT against.
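As a small illustration of the last point, a software reference model of the 2-bit AND and equality circuit can be written in plain Scala. The object and function names below are hypothetical; the sketch only shows the idea of computing the expected values in software.

```scala
// Software reference model of the DeviceUnderTest (names are illustrative).
object RefModel {
  // Bit-wise AND, masked to the 2-bit output width
  def and2(a: Int, b: Int): Int = (a & b) & 0x3

  // Equality of the two inputs
  def equ(a: Int, b: Int): Boolean = a == b

  // Expected (a, b, out, equ) tuples for all combinations of the two 2-bit inputs
  val table: Seq[(Int, Int, Int, Boolean)] =
    for (a <- 0 until 4; b <- 0 until 4) yield (a, b, and2(a, b), equ(a, b))
}
```

Inside a ChiselTest test body, one could iterate over RefModel.table, poke a and b, and compare the DUT outputs with dut.io.out.expect(out.U) and dut.io.equ.expect(e.B) for each entry.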
3.2.3 Waveforms
Testers, as described above, work well for small designs and for unit testing, as
is common in software development. A collection of unit tests can also serve
as a regression test suite. However, for debugging more complex designs, one would like
to investigate several signals at once. A classic approach to debug digital designs is
displaying the signals in a waveform. In a waveform the signals are displayed over
time.
Chisel testers can generate a waveform that includes all registers and all IO signals.
In the following examples, we show waveform testers for the DeviceUnderTest from
the former example (the 2-bit AND function). To generate a waveform for a test, pass
a definition of writeVcd=1 to the test, as shown in the following sbt command:
$ sbt "testOnly SimpleTest -- -DwriteVcd=1"
You can view the waveform with the free viewer GTKWave or with ModelSim.
Start GTKWave and select File – Open New Window and navigate to the folder
where the Chisel tester put the .vcd file. By default, the generated files are in
test_run_dir, in a subfolder named after the description of the test. Within this folder, you should be
able to find DeviceUnderTest.vcd. You can select the signals from the left side and
drag them into the main window. If you want to save a configuration of signals you
can do so with File – Write Save File and load it later with File – Read Save File.
The generation of waveforms can also be initiated by passing the WriteVcdAnnotation
annotation to the test() function.5 We start with a simple tester that pokes values to
the inputs and advances the clock with step. We do not read any output or compare
it with expect. Instead, we will inspect the generated waveform in the .vcd file.
class WaveformTest extends AnyFlatSpec with
    ChiselScalatestTester {
  "Waveform" should "pass" in {
    test(new DeviceUnderTest)
      .withAnnotations(Seq(WriteVcdAnnotation)) { dut =>
      dut.io.a.poke(0.U)
      dut.io.b.poke(0.U)
      dut.clock.step()
      dut.io.a.poke(1.U)
      dut.io.b.poke(0.U)
      dut.clock.step()
      dut.io.a.poke(0.U)
      dut.io.b.poke(1.U)
      dut.clock.step()
      dut.io.a.poke(1.U)
      dut.io.b.poke(1.U)
      dut.clock.step()
    }
  }
}
Explicitly enumerating all possible input values does not scale. Therefore, we
will use some Scala code to drive the DUT. The following tester enumerates all
possible values for the two 2-bit input signals.
class WaveformCounterTest extends AnyFlatSpec with
    ChiselScalatestTester {
  "WaveformCounter" should "pass" in {
    test(new DeviceUnderTest)
      .withAnnotations(Seq(WriteVcdAnnotation)) { dut =>
      for (a <- 0 until 4) {
        for (b <- 0 until 4) {
          dut.io.a.poke(a.U)
          dut.io.b.poke(b.U)
          dut.clock.step()
        }
      }
    }
  }
}
5 This is an alternative to using the command line options.
When testing this module with the counter-based tester, which iterates over all possible
values, we get the following output, verifying that the AND function is correct:
Elaborating design...
Done elaborating.
dut: 0 0 0
dut: 0 1 0
dut: 0 2 0
dut: 0 3 0
dut: 1 0 0
dut: 1 1 1
dut: 1 2 0
dut: 1 3 1
dut: 2 0 0
dut: 2 1 0
dut: 2 2 2
dut: 2 3 2
dut: 3 0 0
dut: 3 1 1
dut: 3 2 2
dut: 3 3 3
dut: 0 0 0
test DeviceUnderTestPrintf Success: 0 tests passed in 18 cycles
in 0,031521 seconds 571,04 Hz
3.3 Exercises
For this exercise, we will revisit the blinking LED from chisel-examples and explore
Chisel testing.
Then follows the hardware description, as shown in Listing 1.1. To generate the
Verilog description, we need an application. The only action of this application is
to create a new Hello object and pass it to the emitVerilog() function.
object Hello extends App {
  emitVerilog(new Hello())
}
and explore the generated Hello.v with an editor. The generated Verilog code
may not be very readable, but we can find out some details. The file starts with
a module Hello, which is the same name as our Chisel module. We can identify
our LED port as output io_led. Pin names are the Chisel names with a prepended
io_. Besides our LED pin, the module also contains clock and reset input signals.
Those two signals are added automatically by Chisel.
Furthermore, we can identify the definition of our two registers cntReg and blkReg.
We may also find the reset and update of those registers at the end of the module
definition. Note that Chisel generates a synchronous reset.
For sbt to be able to fetch the correct Scala compiler and the Chisel library, we
need a build.sbt:
scalaVersion := "2.13.8"
Note that in this example, we have a concrete Chisel version number to avoid check-
ing on each run for a new version (which will fail if we are not connected to the
Internet, e.g., when doing hardware design during a flight). Additionally, we have
added the Chisel3 compiler plugin which is needed since Chisel 3.5. Change the
build.sbt configuration to use the latest Chisel version by changing the library
dependency to
libraryDependencies += "edu.berkeley.cs" %% "chisel3" % "latest.release"
and rerun the build with sbt. Is there a newer version of Chisel available and will it
be automatically downloaded?
For convenience, the project also contains a Makefile. It just contains the sbt
command, so we do not need to remember it and can generate the Verilog code
with:
make
Besides a README file, the example project also contains project files for different
FPGA boards. E.g., in quartus/altde2-115 you can find the two project files to define
a Quartus project for the DE2-115 board. The main definitions (source files, device,
pin assignments) can be found in a plain text file hello.qsf. Explore the file and find
out which pins are connected to which signals. If you need to adapt the project to a
different board, this is where the changes are applied. If you have Quartus installed,
open that project, compile with the green Play button, and then configure the FPGA.
Note that the Hello World is a minimal Chisel project. More realistic projects
have their source files organized in packages and contain testers.
make
make test
You can also execute these tasks by running the sbt and git commands directly.
4 Components
A larger digital design is structured into a set of components, often in a hierarchical
way. Each component has an interface with input and output wires, sometimes
called ports. These are similar to input and output pins on an integrated circuit (IC).
Components are connected by wiring up the inputs and outputs. Components may
contain subcomponents to build the hierarchy. The outermost component, which is
connected to physical pins on a chip, is called the top-level component.
In this chapter, we will explain how components are described in Chisel and
provide several simple examples of components. Note that the components in this
section are very small (e.g., just an adder), just to show the principles: how to
define, instantiate, and connect components. Real-world examples shall contain
more “meat” than just a single line for an adder.
[Figure: an adder component with output y and a register component with input d and output q.]
4.1 Components in Chisel are Modules
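[Figure: the Count10 component, built from an adder (inputs a and b, output y, named result), a multiplexer with the constant 0 as second input (output next), and a register (input d, output q, named count) that drives dout.]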
io.dout := count
}
[Figure 4.4: a design built from the components CompA, CompB, CompC, and CompD.]
called next is the input for the register component. The output of the register is the
count value and also the output of the Count10 component (dout).
Listing 4.3 shows the Chisel code for the Count10 component. The two compo-
nents are instantiated by creating them with new, wrapping them into a Module(),
and assigning them the names add and reg. In this example, we give the output of
the register (reg.io.q) the name count.
We connect 1.U and count to the two inputs of the adder component. We give the
output of the adder component the name result. The multiplexer selects between
0.U and result depending on the current counter value count. We name the out-
put of the multiplexer next and connect it to the input of the register components.
Finally, we connect the counter value count to the single output of the Count10
component, io.dout.
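The description above can be sketched in code as follows. The 8-bit width, the reset value of the register, and the wrap-around at 9 (suggested by the name Count10) are assumptions; the port names a, b, y, d, q, and dout follow the text.

```scala
import chisel3._

// Adder component; the 8-bit width is an assumption.
class Adder extends Module {
  val io = IO(new Bundle {
    val a = Input(UInt(8.W))
    val b = Input(UInt(8.W))
    val y = Output(UInt(8.W))
  })
  io.y := io.a + io.b
}

// Register component with an assumed reset value of 0.
class Register extends Module {
  val io = IO(new Bundle {
    val d = Input(UInt(8.W))
    val q = Output(UInt(8.W))
  })
  val reg = RegInit(0.U(8.W))
  reg := io.d
  io.q := reg
}

class Count10 extends Module {
  val io = IO(new Bundle {
    val dout = Output(UInt(8.W))
  })

  val add = Module(new Adder())
  val reg = Module(new Register())

  // the register output is the current counter value
  val count = reg.io.q

  // connect the adder
  add.io.a := 1.U
  add.io.b := count
  val result = add.io.y

  // the multiplexer restarts the counter at 0 after it has reached 9
  val next = Mux(count === 9.U, 0.U, result)
  reg.io.d := next

  io.dout := count
}
```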
// function of A
}
// function of B
}
  // connect A
  compA.io.a := io.inA
  compA.io.b := io.inB
  io.outX := compA.io.x
  // connect B
  compB.io.in1 := compA.io.y
  compB.io.in2 := io.inC
  io.outY := compB.io.out
}
Listing 4.4 shows the definition of the two example components A and B from
Figure 4.4. Component A has two inputs, named a and b, and two outputs, named
x and y. For the ports of component B we chose the names in1, in2, and out.
All ports use an unsigned integer (UInt) with a bit width of 8. As this example
code is about connecting components and building a hierarchy, we do not show any
implementation within the components. The implementation of the component is
written at the place where the comment states “function of X”. As we have no
function associated with those example components, we used generic port names.
For a real design, use descriptive port names such as data, valid, or ready.
Component C, shown in Listing 4.5, has three input and two output ports. It is
built out of components A and B. We show how A and B are connected to the ports
of C and also the connection between an output port of A and an input port of B.
Components are created with new, e.g., new CompA(), and need to be wrapped
into a Module(). The reference to that module is stored in a local variable, in this
example val compA = Module(new CompA()).
  // function of D
}
With this reference, we can access the IO ports by dereferencing the io field of
the module and then the individual fields of the IO Bundle.
The simplest component (D) in our design, shown in Listing 4.6, has just an
input port, named in, and an output port named out. The final missing piece of
our example design is the top-level component, which itself is assembled out of
components C and D, shown in Listing 4.7.
Good component design is similar to the good design of functions or methods in
software design. One of the main questions is how much functionality we should put
into a component and how large a component should be. The two extremes are tiny
components, such as an adder, and huge components, such as a full microprocessor.
Beginners in hardware design often start with tiny components. The problem is
that digital design books use tiny components to show the principles. The sizes of
the examples (in those books, and also in this book) are small to fit onto a page and
to avoid distracting details.
The interface to a component is a little bit verbose (with types, names, directions,
IO construction). As a rule of thumb, I propose that the core of the component, the
function, should be at least as long as the interface of the component.
For tiny components, such as a counter, Chisel provides a more lightweight way
to describe them as functions that return hardware.
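As an illustration of this idea, the following sketch defines two small Scala functions that return hardware and uses them inside a module. The module name, the port names, and the 8-bit width are assumptions.

```scala
import chisel3._

// Hypothetical module that uses functions as lightweight components.
class FunctionExample extends Module {
  val io = IO(new Bundle {
    val a = Input(UInt(8.W))
    val b = Input(UInt(8.W))
    val sum = Output(UInt(8.W))
    val delayed = Output(UInt(8.W))
  })

  // A combinational adder described as a function that returns hardware
  def adder(x: UInt, y: UInt): UInt = x + y

  // A function may also contain state: a delay of one clock cycle
  def delay(x: UInt): UInt = RegNext(x)

  io.sum := adder(io.a, io.b)
  io.delayed := delay(io.sum)
}
```

Each call of such a function creates a new instance of the hardware it describes, just as instantiating a module would, but without the overhead of declaring an IO bundle.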
  // create C and D
  val c = Module(new CompC())
  val d = Module(new CompD())
  // connect C
  c.io.inA := io.inA
  c.io.inB := io.inB
  c.io.inC := io.inC
  io.outM := c.io.outX
  // connect D
  d.io.in := c.io.outY
  io.outN := d.io.out
}
[Figure: an ALU component with the function input fn, operand inputs, and the output y.]
switch (io.fn) {
is (0.U) { io.y := io.a + io.b }
is (1.U) { io.y := io.a - io.b }
is (2.U) { io.y := io.a | io.b }
is (3.U) { io.y := io.a & io.b }
}
}
In this example, we use a new Chisel construct, the switch/is construct to describe
the table that selects the output of our ALU. To use this utility function, we need to
import another Chisel package:
import chisel3.util._
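For reference, a complete ALU module around the switch statement might look like the following sketch. The 16-bit operand width, the 2-bit fn input, and the default assignment to io.y are assumptions; only the four operations are taken from the fragment above.

```scala
import chisel3._
import chisel3.util._

// Sketch of a small ALU; widths are assumptions.
class Alu extends Module {
  val io = IO(new Bundle {
    val a = Input(UInt(16.W))
    val b = Input(UInt(16.W))
    val fn = Input(UInt(2.W))
    val y = Output(UInt(16.W))
  })

  // default value, so that io.y is assigned under all conditions
  io.y := 0.U

  // the function table of the ALU
  switch(io.fn) {
    is(0.U) { io.y := io.a + io.b }
    is(1.U) { io.y := io.a - io.b }
    is(2.U) { io.y := io.a | io.b }
    is(3.U) { io.y := io.a & io.b }
  }
}
```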
To connect all three stages we need just two <> operators. We can also connect
the port of a submodule with the parent module.
val fetch = Module(new Fetch())
val decode = Module(new Decode())
val execute = Module(new Execute)
4.5 External Modules
Blackboxes, on the other hand, can represent any component. They can be declared
in three different ways with their source either inlined or available in a separate file.
As an example, consider a 32-bit adder with the following IO.
class BlackBoxAdderIO extends Bundle {
  val a = Input(UInt(32.W))
  val b = Input(UInt(32.W))
  val cin = Input(Bool())
  val c = Output(UInt(32.W))
  val cout = Output(Bool())
}
Blackboxes are instantiated the same way as other modules by wrapping them as
Module(new BlackBoxModule). They cannot be tested directly but must be wrapped
either in a named class or in an anonymous class in the tester. Both are allowed to
have the same IO as the blackbox.
class InlineAdder extends Module {
  val io = IO(new BlackBoxAdderIO)
  val adder = Module(new InlineBlackBoxAdder)
  io <> adder.io
}
test(new Module {
  val io = IO(new BlackBoxAdderIO)
  val adder = Module(new InlineBlackBoxAdder)
  io <> adder.io
})
5 Combinational Building Blocks
In this chapter, we explore various combinational circuits, basic building blocks that
we can use to construct more complex systems. In principle, all combinational cir-
cuits can be described with Boolean equations. However, more often, a description
in the form of a table is more efficient. We let the synthesis tool extract and minimize
the Boolean equations. Two basic circuits, best described in a table form, are
a decoder and an encoder.
The Boolean expression is given a name (e) by assigning it to a Scala value. The
expression can be reused in other expressions:
val f = ~e
[Figure: a chain of multiplexers with the select signals cond and cond2 that drives w.]
w := 0.U
when (cond) {
w := 3.U
}
The logic of the circuit is a multiplexer, where the two inputs are the constants 0
and 3 and the select signal is the condition cond. Keep in mind that we describe
hardware circuits and not a software program with conditional execution.
The Chisel condition construct when also has a form of else, which is called .otherwise.
By assigning a value under every condition, we can omit the default value
assignment:
val w = Wire(UInt())
when (cond) {
  w := 1.U
} .otherwise {
  w := 2.U
}
Chisel also supports a chain of conditionals (like an if/else if/else chain) with .elsewhen:
val w = Wire(UInt())
when (cond) {
  w := 1.U
} .elsewhen (cond2) {
  w := 2.U
} .otherwise {
  w := 3.U
}
when (cond) {
w := 3.U
}
// ... and some more complex conditional assignments
One might ask, why do we use when, .elsewhen, and .otherwise when Scala
has if, else if, and else? Those Scala statements are for conditional execution of
Scala code, not generating Chisel (multiplexer) hardware. Those Scala conditionals
have their use in Chisel when we write circuit generators, which take parameters to
conditionally generate different hardware instances.
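The difference between the two kinds of conditionals can be sketched in a small generator. The module name OptionalRegMux and the parameter registered are hypothetical and only serve as illustration: the Chisel when always describes a multiplexer, while the Scala if decides at circuit generation time which hardware is built.

```scala
import chisel3._

// Hypothetical generator, for illustration only.
// `registered` is a Scala value, evaluated once at circuit generation time.
class OptionalRegMux(registered: Boolean) extends Module {
  val io = IO(new Bundle {
    val cond = Input(Bool())
    val a = Input(UInt(8.W))
    val b = Input(UInt(8.W))
    val out = Output(UInt(8.W))
  })

  // Chisel `when`: describes a multiplexer selected by the hardware signal cond.
  val sel = Wire(UInt(8.W))
  when(io.cond) {
    sel := io.a
  }.otherwise {
    sel := io.b
  }

  // Scala `if`: selects one of two circuits; no multiplexer is generated here.
  if (registered) {
    io.out := RegNext(sel) // variant with an output register
  } else {
    io.out := sel          // purely combinational variant
  }
}
```

Instantiating the module with true or false results in two different circuits, whereas cond is evaluated by the hardware in every clock cycle.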
5.2 Decoder
A decoder converts a binary number of n bits to an m-bit signal, where m ≤ 2^n. The
output is one-hot encoded (where exactly one bit is one). Figure 5.2 shows a 2-bit to
4-bit decoder. We can describe the function of the decoder with a truth table, such
as Table 5.1.
A Chisel switch statement describes the logic as a truth table. To use the switch
statement we need to import the package chisel3.util.
[Figure 5.2: A 2-bit to 4-bit decoder with inputs a0, a1 and outputs b0 to b3.]

Table 5.1: Truth table of a 2-bit to 4-bit decoder.
a    b
00   0001
01   0010
10   0100
11   1000
The following code uses the switch statement of Chisel to describe a decoder:
result := 0.U
switch (sel) {
is (0.U) { result := 1.U}
is (1.U) { result := 2.U}
is (2.U) { result := 4.U}
is (3.U) { result := 8.U}
}
The above switch statement lists all possible values of the sel signal and assigns the
decoded value to the result signal. Note that even if we enumerate all possible input
values, Chisel still needs us to assign a default value, as we do by assigning an initial
0 to result. This assignment will never be active and is therefore optimized away by
the synthesis tool. It is intended to avoid situations with incomplete assignments
for combinational circuits (in Chisel a Wire) that will result in unintended latches in
hardware description languages such as VHDL and Verilog. Chisel does not allow
incomplete assignments.
In the example before we used unsigned integers for the signals. Maybe a clearer
representation of a decoder circuit uses binary notation:
switch (sel) {
is ("b00".U) { result := "b0001".U}
is ("b01".U) { result := "b0010".U}
is ("b10".U) { result := "b0100".U}
is ("b11".U) { result := "b1000".U}
}
A table gives a very readable representation of the decoder function but is also
a little bit verbose. When examining the table, we see a regular structure: a 1 is
shifted left by the number represented by sel. Therefore, we can express a decoder
with the Chisel shift operation <<.
result := 1.U << sel
Decoders are used as a building block for a multiplexer by using the output as
an enable with an AND gate for the multiplexer data input. However, in Chisel, we
do not need to construct a multiplexer because a Mux is available in the core library.
Decoders can also be used for address decoding of some bits of an address bus of a
microprocessor. The outputs are used as select signals for memories and different
IO devices connected to the microprocessor (see Section 12.1).
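As a sketch of how the shift expression scales to wider decoders, the following module wraps it with a width parameter. The module name Decoder and the parameter n are our own choices for this illustration.

```scala
import chisel3._

// A sketch of a parameterized decoder: n input bits, 2^n one-hot output bits.
class Decoder(n: Int) extends Module {
  require(n > 0)
  val io = IO(new Bundle {
    val sel = Input(UInt(n.W))
    val result = Output(UInt((1 << n).W))
  })

  // A single 1 is shifted left by the value of sel.
  io.result := 1.U << io.sel
}
```

The package chisel3.util also contains UIntToOH, which provides a one-hot conversion as a library function.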
5.3 Encoder
An encoder converts a one-hot encoded input signal into a binary encoded output
signal. The encoder does the inverse operation of a decoder.
Figure 5.3 shows a 4-bit one-hot input to a 2-bit binary output encoder, and Ta-
ble 5.2 shows the truth table of the encode function. However, an encoder works
only as expected when the input signal is one-hot coded. For all other input values,
the output is undefined. As we cannot describe a function with undefined outputs,
we use a default assignment that catches all undefined input patterns.
The following Chisel code assigns a default value of 0 and then uses the switch
statement for the legal input values.
b := "b00".U
switch (a) {
  is ("b0001".U) { b := "b00".U }
  is ("b0010".U) { b := "b01".U }
  is ("b0100".U) { b := "b10".U }
  is ("b1000".U) { b := "b11".U }
}

[Figure 5.3: A 4-bit one-hot to 2-bit binary encoder with inputs a0 to a3 and outputs b0, b1.]

Table 5.2: Truth table of the encoder.
a      b
0001   00
0010   01
0100   10
1000   11
????   ??
For the decoder, we found an elegant single-line statement to express the logic.
This also enables us to describe a wide decoder. However, we are not aware of such
an expression for the encoder.
To express larger encoders, we need to write a (simple) hardware generator.
Therefore, we need to introduce the Scala loop construct. The following two lines
of Scala code express a loop, counting from 0 to 9.
// Loops i from 0 to 9
for (i <- 0 until 10) {
// use i to index into a Wire or Vec
}
The loop variable i can be used to index individual bits from a Wire or Reg; or
an element in a Vec. This Scala generator loop is the simplest form of describing
a hardware generator. Chapter 10 describes how to write hardware generators in
more detail. Note that the loop is executed at circuit generation time. This is not a
hardware counter.
For the encoder generator we will use a Vec, where each element represents one
column of the encoder table. The following code shows a 16-bit encoder, where the
output is 4 bits wide:
val v = Wire(Vec(16, UInt(4.W)))
v(0) := 0.U
for (i <- 1 until 16) {
  v(i) := Mux(hotIn(i), i.U, 0.U) | v(i - 1)
}
val encOut = v(15)
The input of the encoder is hotIn and the output is encOut. Vec element 0 is the
default case (0), and also represents the output value when the least significant bit
(LSB) is set in hotIn.
Vec elements 1 to 15 are each driven by a multiplexer. If the bit at position i is
set in hotIn, the multiplexer output is the index; otherwise it is 0. For the correct
behavior of our encoder we assume that the input signal is one-hot encoded. Finally,
we need to merge all vector elements into a single output. As the vector elements are
0 when the corresponding bit in the input is 0, we can simply combine all elements
with an OR function. In the loop we OR the current element with the vector element
before it (... | v(i-1)). Combining several elements with one function is
also called a reduce operation. Therefore, here we perform an OR reduction.
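The OR reduction can also be written with the Scala reduce function instead of the explicit chain. The following is a sketch under our own naming (module Encoder, parameter n), not a listing from the original text; it assumes a one-hot input, as before.

```scala
import chisel3._
import chisel3.util._

// A sketch of a parameterized encoder using an OR reduction.
class Encoder(n: Int) extends Module {
  require(n > 1)
  val w = log2Ceil(n) // number of output bits
  val io = IO(new Bundle {
    val hotIn = Input(UInt(n.W))
    val encOut = Output(UInt(w.W))
  })

  // One candidate per input bit: the index when the bit is set, otherwise 0.
  val v = VecInit(Seq.tabulate(n) { i =>
    Mux(io.hotIn(i), i.U(w.W), 0.U(w.W))
  })

  // Combine all candidates with OR; correct only for one-hot inputs.
  io.encOut := v.reduce(_ | _)
}
```

For one-hot inputs, the library function OHToUInt in chisel3.util provides a similar conversion.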
5.4 Arbiter
We use an arbiter to arbitrate requests from several clients to a single shared re-
source. An example would be several processor cores sharing a single serial port
(UART).
Figure 5.4 shows the schematic of a 4-bit arbiter. It consists of four request lines
(r0–r3) and four grant lines (g0–g3). The arbiter grants only a single request. For
example, a request input of 0101 will result in a grant output of 0001. The arbiter
prioritizes the lower inputs. Therefore, we call it a priority arbiter. The lower the bit
number, the higher the priority.
[Figure 5.4: A 4-bit arbiter with request inputs r0 to r3 and grant outputs g0 to g3.]
To build a fair arbiter, we need to add state to remember the last arbitration. We
will present a fair arbiter in Section 10.6.2.
Figure 5.5 shows the schematic of a 4-bit arbiter. The individual grant requests
need to check if a lower bit has already won the arbitration. For the first request
the grant g0 depends only on the request r0. The second grant can only win the
arbitration when request r1 is asserted and request r0 is deasserted. For the next
requests, the lookup is further chained.
The following code shows an arbiter for 3 clients.
val grant = VecInit (false.B, false.B, false.B)
val notGranted = VecInit (false.B, false.B)
The code is the same as the schematic in Figure 5.5 (except we showed the code only
for a 3-bit arbiter). We use vectors of Bool to represent the request, grant, and not-
granted chain. We can see that grant(0) depends only on request(0). notGranted
is used to chain the information that no lower bit requests have been granted.
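The chaining described above can be sketched for three clients as follows. This is an illustration based on the description, wrapped into a hypothetical module Arbiter3 so that it is self-contained; the IO names are assumptions.

```scala
import chisel3._

// A sketch of a 3-client priority arbiter; request(0) has the highest priority.
class Arbiter3 extends Module {
  val io = IO(new Bundle {
    val request = Input(Vec(3, Bool()))
    val grant = Output(Vec(3, Bool()))
  })

  val grant = VecInit(false.B, false.B, false.B)
  val notGranted = VecInit(false.B, false.B)

  // The first grant depends only on the first request.
  grant(0) := io.request(0)
  notGranted(0) := !grant(0)
  // Each further grant needs its request and no grant at a lower index.
  grant(1) := io.request(1) && notGranted(0)
  notGranted(1) := !grant(1) && notGranted(0)
  grant(2) := io.request(2) && notGranted(1)

  io.grant := grant
}
```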
Small arbiters can also be directly described with a logic table. The following
code shows the table for a 3-bit arbiter.
val grant = WireDefault("b000".U(3.W))
switch (request) {
  is ("b000".U) { grant := "b000".U }
  is ("b001".U) { grant := "b001".U }
  // ... remaining request patterns
}
[Figure 5.5: Schematic of the 4-bit priority arbiter with requests r0 to r3 and grants g0 to g3.]
However, for larger arbitration circuits we will use our newly learned trick of a
for loop as a generator loop. The following code shows a parameterized arbiter for
n requests and grants. Here we again use a Vec of Bool.
The code shown above is the loop version of the initial arbiter version. It generates
the arbitration circuit for n requests. The small difference to the manual version
(unrolled loop) is that we generate a notGranted wire also for the last request (n -
1). That wire is not used and the synthesis tool will optimize it away.
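A loop version along these lines might look like the following sketch. The module wrapper and the IO names are assumptions made for this illustration; the chain itself follows the description of grant and notGranted.

```scala
import chisel3._

// A sketch of a parameterized priority arbiter for n requests.
class Arbiter(n: Int) extends Module {
  require(n > 0)
  val io = IO(new Bundle {
    val request = Input(Vec(n, Bool()))
    val grant = Output(Vec(n, Bool()))
  })

  val grant = VecInit(Seq.fill(n)(false.B))
  val notGranted = VecInit(Seq.fill(n)(false.B))

  grant(0) := io.request(0)
  notGranted(0) := !grant(0)
  // The Scala loop runs at generation time and unrolls into the chain.
  for (i <- 1 until n) {
    grant(i) := io.request(i) && notGranted(i - 1)
    notGranted(i) := !grant(i) && notGranted(i - 1) // unused for i = n - 1
  }

  io.grant := grant
}
```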
5.5 Priority Encoder
With our original encoder design we had to assume that the input is one-hot encoded,
meaning only one bit is allowed to be 1. Inputs with several bits set are illegal and
lead to undefined behavior.
We can solve this problem by combining the encoder with an arbitration circuit,
which selects only the highest-priority bit set. When we feed the output of the arbiter
into an encoder we create a priority encoder. Figure 5.6 shows the schematic.
Figure 5.6: With an arbiter and an encoder we can build a priority encoder.
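The combination of the two stages can be sketched in a single module. The name PriorityEnc and the IO names are hypothetical; the arbiter stage is written here in a compact functional style instead of the explicit notGranted chain.

```scala
import chisel3._
import chisel3.util._

// A sketch of a priority encoder: an arbiter stage followed by an encoder stage.
class PriorityEnc(n: Int) extends Module {
  require(n > 1)
  val w = log2Ceil(n)
  val io = IO(new Bundle {
    val request = Input(UInt(n.W))
    val encOut = Output(UInt(w.W))
  })

  val req = io.request.asBools

  // Arbiter stage: bit i wins only if no lower bit is requesting.
  val grant = Seq.tabulate(n) { i =>
    if (i == 0) req(0) else req(i) && !req.take(i).reduce(_ || _)
  }

  // Encoder stage: grant is one-hot (or all zero), so an OR reduction is safe.
  io.encOut := Seq.tabulate(n) { i =>
    Mux(grant(i), i.U(w.W), 0.U(w.W))
  }.reduce(_ | _)
}
```

Note that chisel3.util also offers a ready-made PriorityEncoder function that returns the index of the least significant set bit.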
[Figure 5.7: A comparator with inputs a and b and outputs equ (a == b) and gt (a > b).]
5.6 Comparator
As the last circuit for the chapter of combinational building blocks, we present the
comparator. Figure 5.7 shows the schematic of a comparator. It has two multi-bit
inputs and compares those two values. It has two outputs: (1) equ, signaling that a and b
are equal, and (2) gt, signaling that a is greater than b. These two outputs are enough
for all possible comparisons. For example, if equ or gt is asserted we know that the
condition a >= b is true. For the condition a <= b we test for not gt.
The following code snippet shows the comparator. As you can see, these are just
two lines of Chisel code. Therefore, compare functions are usually just directly used
in other components and not wrapped into a module.
val equ = a === b
val gt = a > b
5.7 Exercise
Describe a combinational circuit to convert a 4-bit binary input to the encoding of
a 7-segment display. You can either define the codes for the decimal digits, which
was the initial usage of a 7-segment display, or additionally, define encodings for
the remaining bit pattern to be able to display all 16 values of a single digit in
hexadecimal. When you have an FPGA board with a 7-segment display, connect
4 switches or buttons to the input of your circuit and the output to the 7-segment
display.
6 Sequential Building Blocks
Sequential circuits are circuits where the output depends on the input and previous
values. As we are interested in synchronous design (clocked designs), we mean
synchronous sequential circuits when we talk about sequential circuits.1 To build
sequential circuits, we need elements that can store state: the so-called registers.
6.1 Registers
The fundamental elements for building sequential circuits are registers. A register is
a collection of D flip-flops. A D flip-flop captures the value of its input at the rising
edge of the clock and stores it at its output. In other words, the register updates its
output with the value of the input on the rising edge of the clock.
1 We can also build sequential circuits with asynchronous logic and feedback, but this is a niche topic
and cannot be easily expressed in Chisel.

[Figure 6.1: A D flip-flop based register with input D, output Q, and a clock input.]

Figure 6.1 shows the schematic symbol of a register. It contains an input D and an
output Q. Each register also contains an input for a clock signal. As this global clock
signal is connected to all registers in a synchronous circuit, it is usually not drawn
in the schematics. The little triangle on the bottom of the box symbolizes the clock
input and tells us that this is a register. We omit the clock signal in the following
schematics. The omission of the global clock signal is also reflected by Chisel where
no explicit connection of a signal to the register’s clock input is needed. In Chisel a
register with input d and output q is defined with:
val q = RegNext (d)
Note that we do not need to connect a clock to the register; Chisel implicitly does
this. A register’s input and output can be arbitrarily complex types made out of a
combination of vectors and bundles.
A register can also be defined and used in two steps:
val delayReg = Reg(UInt (4.W))
delayReg := delayIn
First, we define the register and give it a name. Second, we connect the signal
delayIn to the input of the register. Note also that the name of the register contains
the string Reg. To easily distinguish between combinational circuits and sequential
circuits, it is common practice to have the marker Reg as part of the name. Also,
note that names in Scala (and therefore also in Chisel) are usually in CamelCase.
Variable names start with a lowercase letter and class names start with an uppercase
letter.
A register can be initialized on reset. The reset signal is, like the clock signal,
implicit in Chisel. We supply the reset value, for example, zero, as a parameter
to the register constructor RegInit. The input for the register is connected with a
Chisel assignment statement.
val valReg = RegInit (0.U(4.W))
valReg := inVal
[Figure 6.2: A register with a synchronous reset to an initial value (init), implemented with a multiplexer.]

[Figure 6.3: A waveform diagram for a register with a reset (signals clock, reset, inVal, regVal).]
Sequential circuits change their value over time. Therefore, their behavior can be
described by a diagram showing the signals over time. Such a diagram is called a
waveform or timing diagram.
Figure 6.3 shows a waveform for the register with a reset and some input data
applied to it. Time advances from left to right. On top of the figure, we see the
clock that drives our circuit. In the first clock cycle (1), before a reset, the register
content is undefined. In the second clock cycle reset is asserted high, and on the
rising edge of this clock cycle the register takes the initial value 0. Input inVal is
ignored. In the next clock cycle reset is 0, and the value of inVal is captured on
the next rising edge. From then on reset stays 0, as it should be, and the register
output follows the input signal with one clock cycle delay.
Waveforms are an excellent tool to specify the behavior of a circuit graphically.
Especially in more complex circuits where many operations happen in parallel and
data moves pipelined through the circuit, timing diagrams are convenient. Chisel
testers can also produce waveforms during testing that can be displayed with a
waveform viewer and used for debugging.

[Figure 6.4: A register with an enable signal.]

[Figure 6.5: A waveform diagram for a register with enable (signals clock, enable, inVal, regEnable).]
A typical design pattern is a register with an enable signal. Only when the enable
signal is true (high), the register captures the input; otherwise, it keeps its old value.
The enable can be implemented, similar to the synchronous reset, with a multiplexer
at the input of the register. One input to the multiplexer is the feedback of the output
of the register.
Figure 6.4 shows the schematic of a register with an enable signal. As this is also
a common design pattern, modern FPGA flip-flops contain a dedicated enable input,
and no additional (LUT) resources are needed to implement the register enable.
Figure 6.5 shows an example waveform for a register with enable. Most of the
time, enable is high (true) and the register follows the input with one clock cycle
delay. Only in the fourth clock cycle enable is low, and the register keeps its value
(5) in clock cycle 5.
A register with an enable can be described in a few lines of Chisel code with a
conditional update:
val enableReg = Reg(UInt (4.W))
when ( enable ) {
enableReg := inVal
}
Using an enable signal for a register is so common that Chisel defines RegEnable
where the second parameter is the enable signal:
val enableReg2 = RegEnable(inVal, enable)
A register with an enable can also be initialized on reset:
val resetEnableReg = RegInit(0.U(4.W))
when (enable) {
  resetEnableReg := inVal
}
The functionality of register enable and initialization at reset can be combined when
using the three-parameter version of RegEnable. The first parameter is the input
signal, the second parameter is the initialization value, and the third parameter is
the enable signal:
val resetEnableReg2 = RegEnable (inVal , 0.U(4.W), enable )
A register can also be part of an expression, without giving it a name. The fol-
lowing circuit detects the rising edge of a signal by comparing its current value with
the one from the last clock cycle (the delayed value).
val risingEdge = din & !RegNext(din)
Now that we have explored all basic uses of a register, we put those registers to
good use and build more interesting sequential circuits. For the next schematics we
will further simplify the register symbol and omit the D for the input and the Q for
the output.
6.2 Counters
One of the most basic sequential circuits is a counter. In its simplest form, a counter
is a register where the output is connected to an adder and the adder’s output is
connected to the input of the register. Figure 6.6 shows such a free-running counter.
A free-running counter with a 4-bit register counts from 0 to 15 and then wraps
around to 0 again. A counter shall also be reset to a known value.
val cntReg = RegInit(0.U(4.W))
cntReg := cntReg + 1.U
conditional statement.
val cntReg = RegInit (0.U(8.W))
If we are in the mood of counting down, we start by resetting the counter register
with the maximum value and reset the counter to that value when reaching 0.
val cntReg = RegInit (N)
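Both counting directions can be sketched side by side. The module name UpDownCounters, the 8-bit width, and the IO names are assumptions for this illustration; N is the maximum counter value, as in the text.

```scala
import chisel3._

// A sketch of an up counter (0 to N) and a down counter (N to 0).
class UpDownCounters(n: Int) extends Module {
  require(n > 0 && n < 256) // must fit into 8 bits
  val io = IO(new Bundle {
    val up = Output(UInt(8.W))
    val down = Output(UInt(8.W))
  })
  val N = n.U(8.W)

  // Counting up: restart at 0 after reaching N.
  val upReg = RegInit(0.U(8.W))
  upReg := upReg + 1.U
  when(upReg === N) {
    upReg := 0.U
  }

  // Counting down: reset to N and reload N after reaching 0.
  val downReg = RegInit(N)
  downReg := downReg - 1.U
  when(downReg === 0.U) {
    downReg := N
  }

  io.up := upReg
  io.down := downReg
}
```

In both cases the later conditional assignment overrides the unconditional increment or decrement, which is the usual last-assignment-wins rule in Chisel.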
As we are coding and using more counters, we can define a function with a param-
eter to generate a counter for us.
The last statement of the function genCounter is the return value of the function, in
this example, the output of the counting register cntReg.
Note that in all the examples our counter had values between 0 and N, including
N. If we want to count 10 clock cycles we need to set N to 9. Setting N to 10 would
be a classic example of an off-by-one error.
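A counter generator function as described above might look like the following sketch. The surrounding module Counters and its outputs are hypothetical and only make the example self-contained; genCounter and cntReg are the names used in the text.

```scala
import chisel3._
import chisel3.util._

// A sketch of a function that generates a counter from 0 to n (inclusive).
class Counters extends Module {
  val io = IO(new Bundle {
    val cnt10 = Output(UInt(8.W))
    val cnt100 = Output(UInt(8.W))
  })

  def genCounter(n: Int) = {
    val cntReg = RegInit(0.U(unsignedBitLength(n).W))
    cntReg := Mux(cntReg === n.U, 0.U, cntReg + 1.U)
    cntReg // the last expression is the return value
  }

  // Each call generates a separate counter in hardware.
  io.cnt10 := genCounter(9)   // counts 10 clock cycles: 0 to 9
  io.cnt100 := genCounter(99) // counts 100 clock cycles: 0 to 99
}
```

Passing 9 to count 10 cycles reflects the off-by-one consideration from the paragraph above.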
Figure 6.8: A waveform diagram for the generation of a slow frequency tick.

[Figure 6.9: A waveform diagram of the tick and the slow counter (signals clock, reset, tick, slow cnt).]
This logical timing of one tick every n clock cycles can then be used to advance
other parts of our circuit with this slower, logical clock. In the following code, we
use just another counter that increments by 1 every n clock cycles.
val lowFrequCntReg = RegInit (0.U(4.W))
when (tick) {
lowFrequCntReg := lowFrequCntReg + 1.U
}
Figure 6.9 shows the waveform of the tick and the slow counter that increments
every tick (n clock cycles).
Examples of the usage of this slower logical clock are: blinking an LED, generating
the baud rate for a serial bus, generating signals for 7-segment display multiplexing,
and subsampling input values for debouncing of buttons and switches.
Although width inference should size the registers, it is better to explicitly specify
the width with the type at register definition or with the initialization value. Explicit
width definition can avoid surprises when a reset value of 0.U results in a counter
with a width of a single bit.
Some of us feel like being a nerd, sometimes. For example, we want to design a
highly optimized version of our counter/tick generation. A standard counter needs
the following resources: one register, one adder (or subtractor), and a comparator. We
cannot do much about the register or the adder. If we count up, we need to compare
against a number, which is a bit string. The comparator can be built out of inverters
for the zeros in the bit string and a large AND gate. When counting down to zero,
the comparator is a large NOR gate, which might be a little bit cheaper than the
comparator against a constant in an ASIC. In an FPGA, where logic is built out of
lookup tables, there is no difference between comparing against 0 or 1. The resource
requirement is the same for the up and down counter.
However, there is still one more trick a clever hardware designer can pull off.
Counting up or down needed a comparison against all counting bits, so far. What if
we count from N-2 down to -1? A negative number has the most significant bit set
to 1, and a positive number has this bit set to 0. We need to check this bit only to
detect that our counter reached -1. Here it is, the counter created by a nerd:
val MAX = (N - 2).S(8.W)
val cntReg = RegInit (MAX)
io.tick := false.B
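The snippet above stops after the default assignment of the tick. A complete version could continue as in the following sketch, which is our reading of the description (count from N-2 down to -1 and test only the sign bit); the module name and the range check are assumptions.

```scala
import chisel3._

// A sketch of the sign-bit counter: one tick every n clock cycles.
class NerdCounter(n: Int) extends Module {
  require(n >= 2 && n <= 129) // n - 2 must fit into a signed 8-bit value
  val io = IO(new Bundle {
    val tick = Output(Bool())
  })

  val MAX = (n - 2).S(8.W)
  val cntReg = RegInit(MAX)
  io.tick := false.B

  cntReg := cntReg - 1.S
  // Bit 7 is the sign bit: it is 1 only when the counter has reached -1.
  when(cntReg(7)) {
    cntReg := MAX
    io.tick := true.B
  }
}
```

The counter takes the n values N-2, ..., 0, -1, so the tick is asserted once every n clock cycles without a full-width comparator.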
[Figure 6.10: Block diagram of a one-shot timer with inputs din and load and output done.]
6.2.4 A Timer
Another form of timer we can create is a one-shot timer. A one-shot timer is like
a kitchen timer: you set the number of minutes and press start. When the specified
amount of time has elapsed, the alarm sounds. The digital timer is loaded with the
time in clock cycles. Then it counts down until reaching zero. At zero the timer
asserts done.
Figure 6.10 shows the block diagram of a timer. The register can be loaded with
the value of din by asserting load. When the load signal is deasserted counting
down is selected (by selecting cntReg - 1 as the input for the register). When the
counter reaches 0, the signal done is asserted and the counter stops counting by
selecting the 0 input of the multiplexer.
Listing 6.1 shows the Chisel code for the timer. We use an 8-bit register cntReg
that is reset to 0. The boolean value done is the result of comparing cntReg with
0. For the input multiplexer we introduce the wire next with a default value of 0.
The when/elsewhen block introduces the other two inputs with the select function.
The signal load has priority over the decrement selection. The last line connects the
multiplexer, represented by next, to the input of the register cntReg.
If we aim for a bit more concise code, we can directly assign the multiplexer
values to the register cntReg, instead of using the intermediate wire next.
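Based on the description of the timer, the circuit can be sketched as follows. The module name Timer and the IO bundle are assumptions; cntReg, done, next, load, and din follow the names used in the text.

```scala
import chisel3._

// A sketch of the one-shot timer with an explicit multiplexer wire.
class Timer extends Module {
  val io = IO(new Bundle {
    val din = Input(UInt(8.W))
    val load = Input(Bool())
    val done = Output(Bool())
  })

  val cntReg = RegInit(0.U(8.W))
  val done = cntReg === 0.U

  // Multiplexer input selection; the default 0 keeps the timer stopped.
  val next = WireDefault(0.U(8.W))
  when(io.load) {
    next := io.din       // load has priority
  }.elsewhen(!done) {
    next := cntReg - 1.U // count down until 0
  }
  cntReg := next

  io.done := done
}
```

In the more concise variant, the two assignments to next would go directly to cntReg, and the register keeps its value 0 when neither condition holds.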
1.U)
din > cntReg
}
We use a function for the PWM generator to provide a reusable, lightweight com-
ponent. The function has two parameters: a Scala integer configuring the PWM
with the number of clock cycles (nrCycles), and a Chisel wire (din) that gives the
duty cycle (pulse width) for the PWM output signal. We use a multiplexer in this
example to express the counter. The last line of the function compares the counter
value with the input value din to return the PWM signal. The last expression in a
Chisel function is the return value, in our case the wire connected to the compare
function.
We use the function unsignedBitLength(n) to specify the number of bits for
the counter cntReg needed to represent unsigned numbers up to (and including) n.3
Chisel also has a function signedBitLength to provide the number of bits for a
signed representation of a number.
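As an illustration of the description above, a PWM generator function might look like the following sketch. The function name pwm, the wrapper module, and the chosen period of 10 clock cycles are assumptions; nrCycles, din, and cntReg are the names from the text.

```scala
import chisel3._
import chisel3.util._

// A sketch of a PWM generator as a lightweight function.
class PwmExample extends Module {
  val io = IO(new Bundle {
    val din = Input(UInt(4.W))
    val dout = Output(Bool())
  })

  def pwm(nrCycles: Int, din: UInt) = {
    // The counter runs from 0 to nrCycles - 1.
    val cntReg = RegInit(0.U(unsignedBitLength(nrCycles - 1).W))
    cntReg := Mux(cntReg === (nrCycles - 1).U, 0.U, cntReg + 1.U)
    din > cntReg // output is high for din out of nrCycles clock cycles
  }

  // A period of 10 clock cycles; din = 3 gives a 30 % duty cycle.
  io.dout := pwm(10, io.din)
}
```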
One application of a PWM signal is to dim an LED. In that case the eye serves
as a low-pass filter. We expand the above example to drive the PWM generation by a
triangular function. The result is an LED with continuously changing intensity.
val FREQ = 100000000 // a 100 MHz clock input
val MAX = FREQ /1000 // 1 kHz
We use two registers for the modulation: (1) modulationReg for counting up and
down and (2) upReg as a flag to determine if we shall count up or down. We count
up to the frequency of our clock input (100 MHz in our example), which results in a
signal of 0.5 Hz. The lengthy when/.elsewhen/.otherwise expression handles the
up- or down-counting and the switch of the direction.
As our PWM counts only up to the 1000th of the frequency to generate a 1 kHz
signal, we need to divide the modulation signal by 1000. As real division is very
expensive in hardware, we simply shift by 10 to the right, which equates to a division
by 2^10 = 1024. As we have defined the PWM circuit as a function, we can simply
instantiate that circuit with a function call. Wire sig represents the modulated PWM
signal.
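The numbers in this example can be checked with a few lines of plain Scala. The object name ModulationScaling and the helper functions are hypothetical; FREQ and MAX are the constants from the code above.

```scala
// Plain Scala arithmetic (no hardware) for the modulation example.
object ModulationScaling {
  val FREQ = 100000000  // 100 MHz clock input
  val MAX = FREQ / 1000 // PWM period in clock cycles for 1 kHz

  // The hardware shifts right by 10 instead of dividing by 1000.
  def scale(x: Int): Int = x >> 10

  // Counting up to FREQ and down again takes 2 * FREQ clock cycles.
  def modulationPeriodSeconds: Double = 2.0 * FREQ / FREQ
}
```

The largest scaled value is 100000000 >> 10 = 97656, slightly below MAX = 100000, so the LED reaches about 98 % of full intensity. The modulation period of 2 seconds corresponds to the 0.5 Hz stated above.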
6.3 Shift Registers
Shift registers are often used to convert from serial data to parallel data or from
parallel data to serial data. Section 11.2 shows a serial port that uses shift registers
for the receive and send functions.
Figure 6.13 shows a 4-bit shift register with a parallel output function.
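[Figure 6.13: A 4-bit shift register with serial input serIn and parallel outputs q3 to q0.]

[Figure 6.14: A 4-bit shift register with parallel load (inputs load and d3 to d0, output serOut).]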
Note that we are now shifting to the right, filling in zeros at the MSB.
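The two shift register variants can be sketched in Chisel as follows. The module ShiftRegs, its IO names, and the shift direction of the first register are assumptions made for this illustration; the second register shifts to the right and fills in zeros at the MSB, as noted above.

```scala
import chisel3._

// A sketch of two 4-bit shift registers.
class ShiftRegs extends Module {
  val io = IO(new Bundle {
    val serIn = Input(Bool())
    val q = Output(UInt(4.W))
    val load = Input(Bool())
    val d = Input(UInt(4.W))
    val serOut = Output(Bool())
  })

  // Serial in, parallel out: the new bit enters at the MSB (assumed direction).
  val outReg = RegInit(0.U(4.W))
  outReg := io.serIn ## outReg(3, 1)
  io.q := outReg

  // Parallel load, serial out: shift right and fill in zeros at the MSB.
  val loadReg = RegInit(0.U(4.W))
  when(io.load) {
    loadReg := io.d
  }.otherwise {
    loadReg := 0.U(1.W) ## loadReg(3, 1)
  }
  io.serOut := loadReg(0)
}
```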
6.4 Memory
A memory can be built out of a collection of registers, in Chisel a Reg of a Vec.
However, this is expensive in hardware, and larger memory structures are built as
SRAM. For an ASIC, a memory compiler constructs memories. FPGAs contain on-
chip memory blocks, also called block RAMs. Those on-chip memory blocks can
be combined for larger memories. Memories in an FPGA usually have one read and
one write port, or two ports that can be switched between read and write at runtime.
FPGAs (and also ASICs) usually support synchronous memories. Synchronous
memories have registers on their inputs (read and write address, write data, and
write enable). That means the read data is available one clock cycle after setting the
address.

[Figure 6.15: A synchronous memory with inputs rdAddr, wrAddr, wrData, and wrEna and output rdData.]
Figure 6.15 shows the schematics of such a synchronous memory. The memory
is dual-ported with one read port and one write port. The read port has a single
input, the read address (rdAddr) and one output, the read data (rdData). The write
port has three inputs: the address (wrAddr), the data to be written (wrData), and a
write enable (wrEna). Note that for all inputs, there is a register within the memory
showing the synchronous behavior.
To support on-chip memory, Chisel provides the memory constructor SyncReadMem.
Listing 6.2 shows a component Memory that implements 1 KiB of memory with byte-
wide input and output data and a write enable.
  when(io.wrEna) {
    mem.write(io.wrAddr, io.wrData)
  }
}
An interesting question is which value is returned from a read when in the same
clock cycle a new value is written to the same address that is read out. We are
interested in the read-during-write behavior of the memory. There are three possibilities:
the newly written value, the old value, or undefined (which might be a mix of some
bits from the old value and some of the newly written data). Which possibility is
available in an FPGA depends on the FPGA type and sometimes can be specified.
Chisel documents that the read data is undefined.
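A complete memory component along the lines of the description (1 KiB, byte-wide data, one read and one write port) can be sketched as follows. The port names follow Figure 6.15; the address width of 10 bits is derived from the 1024 entries.

```scala
import chisel3._

// A sketch of a 1 KiB synchronous memory with one read and one write port.
class Memory extends Module {
  val io = IO(new Bundle {
    val rdAddr = Input(UInt(10.W))
    val rdData = Output(UInt(8.W))
    val wrAddr = Input(UInt(10.W))
    val wrData = Input(UInt(8.W))
    val wrEna = Input(Bool())
  })

  val mem = SyncReadMem(1024, UInt(8.W))

  // The read data is available one clock cycle after the address is applied.
  io.rdData := mem.read(io.rdAddr)

  when(io.wrEna) {
    mem.write(io.wrAddr, io.wrData)
  }
}
```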
If we want to read out the newly written value, we can build a forwarding circuit
that detects that the addresses are equal and forwards the write data. Figure 6.16
shows the memory with the forwarding circuit. Read and write addresses are com-
pared and gated with the write enable to select between the forwarding path of the
write data or the memory read data. The write data is delayed by one clock cycle
with a register.
Listing 6.3 shows the Chisel code for a synchronous memory including the for-
warding circuit. We need to store the write data into a register (wrDataReg) to be
available in the next clock cycle in order to fit the synchronous memory that also
provides the read value in the next clock cycle. We compare the two input addresses
(wrAddr and rdAddr) and check if wrEna is true for the forwarding condition. That
condition is also delayed by one clock cycle. A multiplexer selects between the
forwarding (write) data or the read data from memory.
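The forwarding circuit can be sketched as below. The module name ForwardingMemory and the register name doForwardReg are assumptions; wrDataReg and the port names are taken from the text.

```scala
import chisel3._

// A sketch of a synchronous memory that returns the newly written value
// on a read-during-write to the same address.
class ForwardingMemory extends Module {
  val io = IO(new Bundle {
    val rdAddr = Input(UInt(10.W))
    val rdData = Output(UInt(8.W))
    val wrAddr = Input(UInt(10.W))
    val wrData = Input(UInt(8.W))
    val wrEna = Input(Bool())
  })

  val mem = SyncReadMem(1024, UInt(8.W))

  // Delay write data and forwarding condition by one clock cycle,
  // to line up with the registered read of the memory.
  val wrDataReg = RegNext(io.wrData)
  val doForwardReg = RegNext(io.wrAddr === io.rdAddr && io.wrEna)

  val memData = mem.read(io.rdAddr)

  when(io.wrEna) {
    mem.write(io.wrAddr, io.wrData)
  }

  // Select between the forwarded write data and the memory read data.
  io.rdData := Mux(doForwardReg, wrDataReg, memData)
}
```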
[Figure 6.16: The memory with the forwarding circuit (address comparison, AND with wrEna, and output multiplexer for dout).]

when(io.wrEna) {
  mem.write(io.wrAddr, io.wrData)
}

Chisel also provides Mem, which represents a memory with synchronous write and
an asynchronous read. As this memory type is usually not directly available in an
FPGA, the synthesis tool will build it out of flip-flops. Therefore, we recommend
using SyncReadMem. If asynchronous read behavior is needed and the resources
are available in the FPGA you are using (e.g., in the shape of LUTRAM on Xilinx
FPGAs), you can manually implement this as a BlackBox. Vendors typically provide
code templates that can be used directly for this.
Memories in FPGAs can be initialized with either binary or hexadecimal initial-
ization files. The files are simple ASCII text files with the same number of lines as
there are entries in the corresponding memory. Each character represents either a
single bit or four bits. Traditionally, binary files use the .bin file extension, while
hexadecimal files use .hex. Using loadMemoryFromFile results in the emission of a
separate Verilog file and works in ChiselTest. The initialization is based on calls
to the Verilog system tasks $readmemb or $readmemh.
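Such initialization files are easy to produce with a few lines of plain Scala. The following sketch formats byte values as text lines, one memory entry per line; the object MemInit and its functions are hypothetical helpers, not part of Chisel.

```scala
// Plain Scala helpers (no hardware) to format memory initialization data.
object MemInit {
  // Hexadecimal format: two characters per byte, four bits per character.
  def toHexLines(data: Seq[Int]): Seq[String] =
    data.map(d => f"${d & 0xff}%02x")

  // Binary format: eight characters per byte, one bit per character.
  def toBinLines(data: Seq[Int]): Seq[String] =
    data.map(d => (d & 0xff).toBinaryString.reverse.padTo(8, '0').reverse)

  // Writes the lines to a text file, e.g., "init.hex" (a placeholder name).
  def writeFile(name: String, lines: Seq[String]): Unit = {
    val w = new java.io.PrintWriter(name)
    try lines.foreach(l => w.println(l)) finally w.close()
  }
}
```

For example, the values 0, 10, and 255 result in the hexadecimal lines 00, 0a, and ff.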
6.5 Exercises
Use the 7-segment encoder from the last exercise and add a 4-bit counter as input to
switch the display from 0 to F. When you directly connect this counter to the clock
of the FPGA board, you will see all 16 numbers overlapped (all 7 segments will
light up). Therefore, you need to slow down the counting. Create another counter
that can generate a single-cycle tick signal every 500 milliseconds. Use that signal
as enable signal for the 4-bit counter.
Construct a PWM waveform with a generator function and set the threshold with
a function (triangular or a sine function). A triangular function can be created by
counting up and down. A sine function can be created with a lookup table that you
can generate with a few lines of Scala code (see Section 10.4). Drive an LED on an
FPGA board with that modulated PWM function. What frequency shall your PWM
signal be? What frequency is the driver running?
Digital designs are often sketched as a circuit on paper. Not all details need to be
shown. We use block diagrams, like in the figures in this book. It is an important
skill to be able to fluently translate between a schematic representation of the circuit
and a Chisel description. Sketch the block diagram for the following circuits:
val dout = WireDefault (0.U)
switch (sel) {
is (0.U) { dout := 0.U }
is (1.U) { dout := 11.U }
  is (2.U) { dout := 22.U }
  // ...
}
switch (sel) {
is (0.U) { regAcc := regAcc }
is (1.U) { regAcc := 0.U}
is (2.U) { regAcc := regAcc + din}
is (3.U) { regAcc := regAcc - din}
}
7 Input Processing
Input signals from the external world into our synchronous circuit are usually not
synchronous to the clock; they are asynchronous. An input signal may come from a
source that does not have a clean transition from 0 to 1 or 1 to 0. An example is a
bouncing button or switch. Input signals may be noisy with spikes that could trigger
a transition in our synchronous circuit. This chapter describes circuits that deal with
such input conditions.
The latter two issues, debouncing switches and filtering noise, can also be solved
with external, analog components. However, it is more (cost-)efficient to deal with
those issues in the digital domain.
[Figure: Input synchronizer between the external world (btn) and the synchronous circuit (btnSync).]
7.2 Debouncing
Switches and buttons may need some time to transition between on and off. During
the transition, the switch may bounce between those two states. If we use such
a signal without further processing, we might detect more transition events than
we want to. One solution is to use time to filter out this bouncing. Assuming a
maximum bouncing time of t_bounce we will sample the input signals with a period
T > t_bounce. We will only use the sampled signal further downstream.
When sampling the input with this long period, we know that on a transition from
0 to 1 only one sample may fall into the bouncing region. The sample before will
safely read a 0, and the sample after the bouncing region will safely read a 1. The
sample in the bouncing region will either be 0 or a 1. However, this does not matter
as it then belongs either to the still 0 samples or to the already 1 samples. The
2 The exception is when the input signal is dependent on a synchronous output signal, and we know the
maximum propagation delay. A classic example is the interfacing of an asynchronous SRAM to a
synchronous circuit, e.g., by a microprocessor.
[Figure: waveform with the signals bouncing in, debounced A, and debounced B.]
First, we need to decide on the sampling frequency. The above example assumes
a 100 MHz clock and results in a sampling frequency of 100 Hz (assuming that the
bouncing time is below 10 ms). The maximum counter value is fac, the division
factor. We define a register btnDebReg for the debounced signal, without a reset
value. The register cntReg serves as counter, and the tick signal is true when the
counter has reached the maximum value. In that case, the when condition is true and
(1) the counter is reset to 0 and (2) the debounce register stores the input sample.
In our example, the input signal is named btnSync as it is the output from the input
synchronizer shown in the previous section.
The debouncing circuit comes after the synchronizer circuit. First, we need to
synchronize in the asynchronous signal, then we can further process it in the digital
domain.
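The description above can be sketched as a small module. This is a minimal sketch, not the original listing: the module name Debounce, the port names, and the 32-bit counter width are assumptions; fac, btnDebReg, cntReg, tick, and btnSync follow the text.

```scala
import chisel3._

// Sketch: samples an already synchronized button once every fac clock cycles.
// Assumption: 100 MHz clock and 100 Hz sampling, as in the text.
class Debounce(fac: Int = 100000000 / 100) extends Module {
  val io = IO(new Bundle {
    val btnSync = Input(Bool())  // output of the input synchronizer
    val btnDeb = Output(Bool())  // debounced signal
  })

  // Register for the debounced signal, without a reset value
  val btnDebReg = Reg(Bool())

  // Counter that reaches its maximum value every fac clock cycles
  val cntReg = RegInit(0.U(32.W))
  val tick = cntReg === (fac - 1).U

  cntReg := cntReg + 1.U
  when(tick) {
    cntReg := 0.U            // (1) reset the counter
    btnDebReg := io.btnSync  // (2) store the input sample
  }

  io.btnDeb := btnDebReg
}
```

A smaller value for fac, e.g., new Debounce(4), keeps a simulation of this sketch short.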
[Figure: majority voting circuit. A three-stage shift register (a, b, c), enabled by
tick, samples din; the majority voting logic drives dout.]
when (tick) {
  // shift left and input in LSB
  shiftReg := shiftReg(1, 0) ## btnDebReg
}
// Majority voting
val btnClean = (shiftReg(2) & shiftReg(1)) | (shiftReg(2) & shiftReg(0)) |
  (shiftReg(1) & shiftReg(0))
To use the output of our carefully processed input signal, we first detect the rising
edge with a RegNext delay element and then compare this signal with the current
value of btnClean. In this example, we use the single-cycle risingEdge signal to
increment a counter.
val risingEdge = btnClean & !RegNext(btnClean)
def tickGen() = {
  val reg = RegInit(0.U(log2Up(fac).W))
  val tick = reg === (fac - 1).U
  reg := Mux(tick, 0.U, reg + 1.U)
  tick
}
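The individual steps of the input processing chain can each be wrapped into such a small function. The following is a sketch under assumptions: the module name DebounceFunc, the helper names sync, rising, and filter, and the 8-bit counter output led are hypothetical; tickGen is taken from the code above.

```scala
import chisel3._
import chisel3.util._

class DebounceFunc(fac: Int = 100000000 / 100) extends Module {
  val io = IO(new Bundle {
    val btn = Input(Bool())
    val led = Output(UInt(8.W))
  })

  // Two flip-flops synchronize the asynchronous input
  def sync(v: Bool) = RegNext(RegNext(v))

  // Single-cycle pulse when v changes from 0 to 1
  def rising(v: Bool) = v & !RegNext(v)

  // Tick generator: true for one clock cycle every fac cycles
  def tickGen() = {
    val reg = RegInit(0.U(log2Up(fac).W))
    val tick = reg === (fac - 1).U
    reg := Mux(tick, 0.U, reg + 1.U)
    tick
  }

  // Three-stage shift register with majority voting
  def filter(v: Bool, t: Bool) = {
    val reg = RegInit(0.U(3.W))
    when(t) {
      reg := reg(1, 0) ## v
    }
    (reg(2) & reg(1)) | (reg(2) & reg(0)) | (reg(1) & reg(0))
  }

  val btnSync = sync(io.btn)
  val tick = tickGen()
  val btnDeb = Reg(Bool())
  when(tick) {
    btnDeb := btnSync
  }
  val btnClean = filter(btnDeb, tick)
  val risingEdge = rising(btnClean)

  // Count the cleaned button presses
  val cntReg = RegInit(0.U(8.W))
  when(risingEdge) {
    cntReg := cntReg + 1.U
  }
  io.led := cntReg
}
```

Each function call creates its own hardware instance, so the top-level code reads like the processing chain itself: synchronize, sample, filter, detect the edge.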
asynchronous input to the circuit. That means when directly connected to the reset
of a flip-flop it may violate timing constraints. In case of a synchronous reset it may
violate setup and hold times of the flip-flop. Also when used as an asynchronous
reset input, it still needs to be synchronized to the clock. Specifically, the release
of the reset signal needs to be synchronized to the clock. Another failure with an
asynchronous reset can be that different parts of the circuit may be reset in two
different clock cycles and therefore be inconsistent.
The solution for this issue is to synchronize the reset signal in the very same way
as any other asynchronous input with two flip-flops.
The reset and clock signals are usually hidden from the Chisel design. However,
it is possible to access and set those signals. Each module has an implicit field
reset. The solution is to have a top-level module that performs the synchronizing
of the external reset signals and connects that synchronized signal to the reset input
of the contained module.
class SyncReset extends Module {
  val io = IO(new Bundle() {
    val value = Output(UInt())
  })
In the above example SyncReset is the top level module that contains a counter
(WhenCounter). The reset signal of the top-level module is called reset and is
connected to the input synchronizer (RegNext(RegNext(reset))). The output of
that input synchronizer (syncReset) is connected to the reset input of the counter
(cnt.reset := syncReset).
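A self-contained sketch of this structure follows. The counter SmallCounter is a hypothetical stand-in for WhenCounter, the 8-bit width is an assumption, and the module is named SyncResetSketch to mark it as an illustration; reset.asBool is used here to obtain a Bool from the implicit reset.

```scala
import chisel3._

// Hypothetical stand-in for the counter contained in the top level
class SmallCounter extends Module {
  val io = IO(new Bundle {
    val cnt = Output(UInt(8.W))
  })
  val cntReg = RegInit(0.U(8.W))
  cntReg := cntReg + 1.U
  io.cnt := cntReg
}

class SyncResetSketch extends Module {
  val io = IO(new Bundle {
    val value = Output(UInt(8.W))
  })

  // Two flip-flops without a reset value synchronize the external reset
  val syncReset = RegNext(RegNext(reset.asBool))

  val cnt = Module(new SmallCounter)
  // Override the implicit reset connection of the contained module
  cnt.reset := syncReset

  io.value := cnt.io.cnt
}
```

The two synchronizer registers deliberately have no reset value; otherwise they would depend on the very signal they are supposed to synchronize.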
7.6 Exercise
Build a counter that is incremented by an input button. Display the counter value
in binary with the LEDs on an FPGA board. First observe if there are issues with
a bouncing input button. Then resolve that issue by building the complete input
processing chain with: (1) an input synchronizer, (2) a debouncing circuit, (3) a
majority voting circuit to suppress noise, and (4) an edge detection circuit to trigger
the increment of the counter.
As there is no guarantee that a modern button will always bounce, you can simulate
the bouncing and the spikes by pressing the button manually in fast succession
and using a low sampling frequency. Select, e.g., a sampling frequency of 1 Hz
(a sampling period of one second), i.e., if the input clock runs at 100 MHz, divide
it by 100,000,000. Simulate a bouncing button by pressing several times in fast
succession before settling to a stable press. Test your circuit without and with
the debouncing circuit sampling at 1 Hz. With the majority voting, you need to
press between one and two seconds for a reliable increment of the counter. Also,
the release of the button is majority voted. Therefore, the circuit only recognizes
the release when it is longer than 1–2 seconds.
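8 Finite-State Machines
[Figure: schematic of a Moore FSM. The next state logic computes nextState from
state and in; the output logic computes out from state.]
[Figure 8.2: state diagram of the alarm FSM with the states green, orange, and red
(output: ring bell). Reset enters state green; clear leads back to green.]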
output logic computes the output (out). As the output depends on the current state
only, this state machine is called a Moore machine.
A state diagram describes the behavior of such an FSM visually. In a state dia-
gram, individual states are depicted as circles labeled with the state names. State
transitions are shown with arrows between states. The guard (or condition) when
this transition is taken is drawn as a label for the arrow.
Figure 8.2 shows the state diagram of a simple example FSM. The FSM has three
states: green, orange, and red, indicating a level of alarm. The FSM starts at the
green level. When a bad event happens the alarm level is switched to orange. On a
second bad event, the alarm level is switched to red. In that case, we want to ring
a bell; ring bell is the only output of this FSM. We add the output to the red state.
The alarm can be reset with a clear signal.
Although a state diagram may be visually pleasing and the function of an FSM
can be grasped quickly, a state table may be quicker to write down. Table 8.1 shows
the state table for our alarm FSM. We list the current state, the input values, the
resulting next state, and the output value for the current state. In principle, we
would need to specify all possible inputs for all possible states. This table would
have 3 × 4 = 12 rows. We simplify the table by indicating that the clear input is
a don’t care when a bad event happens. That means bad event has priority over
clear. The output column has some repetition. If we have a larger FSM and/or more
outputs, we can split the table into two, one for the next state logic and one for the
output logic.
Finally, after all the design of our warning level FSM, we shall code it in Chisel.
Listing 8.1 shows the Chisel code for the alarm FSM. Note that we use the Chisel
type Bool for the inputs and the output of the FSM. To use the switch control
instruction, we need to import chisel3.util._.
import chisel3._
import chisel3.util._
The complete Chisel code for this simple FSM fits into one page. Let us step
through the individual parts. The FSM has two input signals and a single output
signal, captured in a Chisel Bundle:
val io = IO(new Bundle {
  val badEvent = Input(Bool())
  val clear = Input(Bool())
  val ringBell = Output(Bool())
})
At this place we could spend some discussion on optimal state encoding. Two common
options are binary or one-hot encoding. However, we leave those low-level
optimizations to the synthesis tool and aim for readable code.1 Therefore, we use
the enumeration type ChiselEnum with symbolic names for the states:
object State extends ChiselEnum {
  val green, orange, red = Value
}
import State._
The individual state values are enumerated in a comma separated list, followed by
an assignment of Value. The register holding the state is defined with the green state
as the reset value:
1 In the current version of Chisel, the ChiselEnum type represents states in binary
encoding. If we want a different encoding, e.g., one-hot encoding, we can define
Chisel constants for the state names.
The meat of the FSM is in the next state logic. We use a Chisel switch on the state
register to cover all states. Within each is branch we code the next state logic, which
depends on the inputs, by assigning a new value to the state register:
switch(stateReg) {
  is(green) {
    when(io.badEvent) {
      stateReg := orange
    }
  }
  is(orange) {
    when(io.badEvent) {
      stateReg := red
    } .elsewhen(io.clear) {
      stateReg := green
    }
  }
  is(red) {
    when(io.clear) {
      stateReg := green
    }
  }
}
Last, but not least, we code our ringing bell output to be true when the state is red.
io.ringBell := stateReg === red
Note that we did not introduce a nextState signal for the register input, as is
common practice in Verilog or VHDL. Registers in Verilog and VHDL are described
in a special syntax and cannot be assigned (and reassigned) within a combinational
block. Therefore, the additional signal, computed in a combinational block, is in-
troduced and connected to the register input. In Chisel a register is a base type and
can be freely used and assigned within a combinational block.
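Putting the individual parts of this section together gives a module along the following lines. This is a sketch: the module name AlarmFsm is hypothetical, and the state register definition RegInit(green) is written from the description above (green as the reset value). It assumes a Chisel version where ChiselEnum is available through import chisel3._; older versions may need an additional import from chisel3.experimental.

```scala
import chisel3._
import chisel3.util._

class AlarmFsm extends Module {
  val io = IO(new Bundle {
    val badEvent = Input(Bool())
    val clear = Input(Bool())
    val ringBell = Output(Bool())
  })

  // The three states
  object State extends ChiselEnum {
    val green, orange, red = Value
  }
  import State._

  // The state register with green as the reset value
  val stateReg = RegInit(green)

  // Next state logic; badEvent has priority over clear
  switch(stateReg) {
    is(green) {
      when(io.badEvent) {
        stateReg := orange
      }
    }
    is(orange) {
      when(io.badEvent) {
        stateReg := red
      } .elsewhen(io.clear) {
        stateReg := green
      }
    }
    is(red) {
      when(io.clear) {
        stateReg := green
      }
    }
  }

  // Output logic: depends on the state only (Moore)
  io.ringBell := stateReg === red
}
```

When no condition in a branch is true, the register has no new assignment and simply keeps its value, which gives the implicit self transitions of the state diagram.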
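[Figure 8.3: schematic of the rising edge detector. The input din and the inverted
(NOT) output of the delay register feed an AND gate that drives risingEdge.]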
Figure 8.3 shows the schematic of the rising edge detector. The output becomes
1 for one clock cycle when the current input is 1 and the input in the last clock cycle
was 0. The state register is just a single D flip-flop where the next state is just the
input. We can also consider this as a delay element of one clock cycle. The output
logic compares the current input with the current state.
When the output depends also on the input, i.e., there is a combinational path
between the input of the FSM and the output, this is called a Mealy machine.
Figure 8.4 shows the schematic of a Mealy type FSM. Similar to the Moore FSM,
the register contains the current state, and the next state logic computes the next
state value (nextState) from the current state and the input (in). On the next clock
tick, state becomes nextState. The output logic computes the output (out) from
the current state and the input to the FSM.
Figure 8.5 shows the state diagram of the Mealy FSM for the edge detector. As
the state register consists just of a single D flip-flop, only two states are possible,
which we name zero and one in this example. As the output of a Mealy FSM does
not only depend on the state, but also on the input, we cannot describe the output as
part of the state circle. Instead, the transitions between the states are labeled with
the input value (condition) and the output (after the slash). Note also that we now
need to draw self transitions, e.g., in state zero when the input is 0 the FSM stays
in state zero, and the output is 0. The rising edge FSM generates the 1 output only
on the transition from state zero to state one. In state one, which represents that the
input is now 1, the output is 0. We only want a single (cycle) pulse for each rising
edge of the input.
[Figure 8.4: schematic of a Mealy FSM. The next state logic computes nextState from
state and in; the output logic computes out from state and in.]
Figure 8.5: The state diagram of the rising edge detector as Mealy FSM.
[Transitions are labeled input/output. In state zero: 0/0 (stay) and 1/1 (to one).
In state one: 1/0 (stay) and 0/0 (to zero). Reset enters state zero.]
Figure 8.6: The state diagram of the rising edge detector as Moore FSM.
Listing 8.2 shows the Chisel code for the rising edge detection with a Mealy
machine. As in the previous example, we use the Chisel type Bool for the single-bit
input and output. The output logic is now part of the next state logic; on the
transition from zero to one, the output is set to true.B. Otherwise, the default
assignment to the output (false.B) applies.
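A sketch of such a Mealy edge detector, written from the description and the state diagram, is given here. The module name RisingMealySketch is hypothetical; the port names din and risingEdge and the states zero and one follow the text. As before, ChiselEnum is assumed to be available through import chisel3._.

```scala
import chisel3._
import chisel3.util._

class RisingMealySketch extends Module {
  val io = IO(new Bundle {
    val din = Input(Bool())
    val risingEdge = Output(Bool())
  })

  // The two states
  object State extends ChiselEnum {
    val zero, one = Value
  }
  import State._

  val stateReg = RegInit(zero)

  // Default value for the output
  io.risingEdge := false.B

  // Next state and output logic
  switch(stateReg) {
    is(zero) {
      when(io.din) {
        stateReg := one
        io.risingEdge := true.B  // output on the transition zero -> one
      }
    }
    is(one) {
      when(!io.din) {
        stateReg := zero
      }
    }
  }
}
```

The assignment to io.risingEdge inside the when block is the combinational path from input to output that makes this a Mealy machine.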
One can ask if a full-blown FSM is the best solution for the edge detection
circuit, especially as we have seen a Chisel one-liner for the same functionality.
The hardware cost is similar. Both solutions need a single D flip-flop for the
state. The combinational logic for the FSM is probably a bit more complicated, as
the state change depends on the current state and the input value. For this
function, the one-liner is easier to write and easier to read, which is more
important. Therefore, the one-liner is the preferred solution.
We have used this example to show one of the smallest possible Mealy FSMs.
FSMs shall be used for more complex circuits with three or more states.
import chisel3._
import chisel3.util._
Figure 8.7: Mealy and a Moore FSM waveform for rising edge detection.
[Signals shown: clock, din, risingEdge Mealy, risingEdge Moore.]
cycle pulse. The FSM stays in state puls just one clock cycle and then proceeds
either back to the start state zero or to the one state, waiting for the input to become
0 again. We show the input condition on the state transition arrows and the FSM
output within the state representing circles.
Listing 8.3 shows the Moore version of the rising edge detection circuit. It uses
twice as many D flip-flops as the Mealy or directly coded version. The
resulting next state logic is therefore also larger than in the Mealy or directly
coded version.
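The three-state Moore machine can be sketched as follows. The module name RisingMooreSketch is hypothetical; the states zero, puls, and one and the output assignment follow the text. ChiselEnum is again assumed to be available through import chisel3._.

```scala
import chisel3._
import chisel3.util._

class RisingMooreSketch extends Module {
  val io = IO(new Bundle {
    val din = Input(Bool())
    val risingEdge = Output(Bool())
  })

  // Three states need two flip-flops in binary encoding
  object State extends ChiselEnum {
    val zero, puls, one = Value
  }
  import State._

  val stateReg = RegInit(zero)

  // Next state logic
  switch(stateReg) {
    is(zero) {
      when(io.din) {
        stateReg := puls
      }
    }
    is(puls) {
      // Stay in puls for exactly one clock cycle
      when(io.din) {
        stateReg := one
      } .otherwise {
        stateReg := zero
      }
    }
    is(one) {
      when(!io.din) {
        stateReg := zero
      }
    }
  }

  // Output logic: depends on the state only
  io.risingEdge := stateReg === puls
}
```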
Figure 8.7 shows the waveform of a Mealy and a Moore version of the rising edge
detection FSM. We can see that the Mealy output closely follows the input rising
edge, while the Moore output rises after the clock tick. We can also see that the
Moore output is one clock cycle wide, where the Mealy output is usually less than
a clock cycle.
From the above example, one is tempted to find Mealy FSMs the better FSMs
as they need less state (and therefore logic) and react faster than a Moore FSM.
However, the combinational path within a Mealy machine can cause trouble in
larger designs. First, with a chain of communicating FSMs (see next chapter), this
combinational path can become lengthy. Second, if the communicating FSMs form
a cycle, the result is a combinational loop, which is an error in synchronous design.
Due to a cut in the combinational path with the state register in a Moore FSM, all
the above issues do not exist for communicating Moore FSMs.
In summary, Moore FSMs combine better for communicating state machines;
they are more robust than Mealy FSMs. Use Mealy FSMs only when the reaction
within the same clock cycle is of utmost importance. Small circuits such as the
rising edge detection, which are practically Mealy machines, are fine as well.
import chisel3._
import chisel3.util._
  // Output logic
  io.risingEdge := stateReg === puls
}
8.4 Exercise
In this chapter, you have seen many examples of very small FSMs. Now it is time to
write some real FSM code. Pick a slightly more complex example, implement
the FSM, and write a test bench for it.
A classic example of an FSM is a traffic light controller (see [6, Section 14.3]).
A traffic light controller has to ensure that on a switch from red to green there is a
phase in between where both roads in the intersection have a no-go light (red and
orange). To make this example a little bit more interesting, consider a priority road.
The minor road has two car detectors (on both entries into the intersection). Switch
to green for the minor road only when a car is detected and then switch back to
green for the priority road.
• when start is high for one clock cycle, the flashing sequence starts;
• where the light goes on for six clock cycles, and the light goes off for four
clock cycles between flashes;
• after the sequence, the FSM switches the light off and waits for the next
start.
The FSM for a direct implementation1 has 27 states: one initial state that is wait-
ing for the input, 3 × 6 states for the three on states and 2 × 4 states for the off states.
We do not show the code for this simple-minded implementation of the light flasher.
The problem can be solved more elegantly by factoring this large FSM into two
smaller FSMs: the master FSM implements the flashing logic, and the timer FSM
implements the waiting. Figure 9.1 shows the composition of the two FSMs.
1 The state diagram is shown in [6, p. 376].
Figure 9.1: The light flasher split into a Master FSM and a Timer FSM.
[Signals shown: start and light at the Master FSM; timerSelect, timerLoad, and
timerDone between the Master FSM and the Timer FSM.]
The timer FSM counts down for 6 or 4 clock cycles to produce the desired timing.
The timer specification is as follows:
• when timerLoad is asserted, the timer loads a value into the down counter,
independent of the state;
• timerDone is asserted when the counter has completed the countdown and
remains asserted.
The following code shows the timer FSM of the light flasher:
val timerReg = RegInit(0.U)
timerDone := timerReg === 0.U
    timerReg := 3.U
  }
}
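A complete timer that follows the specification above can be sketched as a small module. The module name FlasherTimer, the 3-bit register width, and the load value 5 for the six-cycle interval are assumptions (counting from 5 down to 0 takes six clock cycles, just as 3 down to 0 takes four); the signal names follow the text.

```scala
import chisel3._

class FlasherTimer extends Module {
  val io = IO(new Bundle {
    val timerLoad = Input(Bool())
    val timerSelect = Input(Bool())  // true: 6 cycles, false: 4 cycles
    val timerDone = Output(Bool())
  })

  val timerReg = RegInit(0.U(3.W))
  io.timerDone := timerReg === 0.U

  // Count down until zero and then stay there, so done remains asserted
  when(!io.timerDone) {
    timerReg := timerReg - 1.U
  }

  // A load overrides the countdown, independent of the state
  when(io.timerLoad) {
    when(io.timerSelect) {
      timerReg := 5.U
    } .otherwise {
      timerReg := 3.U
    }
  }
}
```

As the load is the last assignment to timerReg, it wins over the decrement in the same clock cycle.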
Listing 9.1 shows the master FSM. It has a starting state off and states for the
complete blinking sequence. In each state it waits for the timer to be done. The
timer is loaded whenever it is done and in the initial off state. Signal timerSelect
selects the value for the next state down counter.
// Timer connection
val timerLoad = WireDefault(false.B) // start timer
val timerSelect = WireDefault(true.B) // 6 or 4 cycles
val timerDone = Wire(Bool())
timerLoad := timerDone
// Master FSM
switch(stateReg) {
  is(off) {
    timerLoad := true.B
    timerSelect := true.B
    when(start) { stateReg := flash1 }
  }
  is(flash1) {
    timerSelect := false.B
    light := true.B
    when(timerDone) { stateReg := space1 }
  }
  is(space1) {
    when(timerDone) { stateReg := flash2 }
  }
  is(flash2) {
    timerSelect := false.B
    light := true.B
    when(timerDone) { stateReg := space2 }
  }
  is(space2) {
    when(timerDone) { stateReg := flash3 }
  }
  is(flash3) {
    timerSelect := false.B
    light := true.B
    when(timerDone) { stateReg := off }
  }
}
Figure 9.2: The light flasher split into a Master FSM, a Timer FSM, and a Counter
FSM.
[Signals shown: start and light at the Master FSM; timerSelect, timerLoad, and
timerDone to the Timer; cntLoad, cntDecr, and cntDone to the Counter.]
This solution with a master FSM and a timer still has redundancy in the code of
the master FSM. States flash1, flash2, and flash3 perform the same function,
as do states space1 and space2. We can factor out the number of remaining
flashes into a second counter. Then the master FSM is reduced to three states: off,
flash, and space.
Figure 9.2 shows the design with a master FSM and two FSMs that count: one
FSM to count clock cycles for the interval length of on and off; the second FSM to
count the remaining flashes. Listing 9.2 shows the down counter FSM.
Note that the counter is loaded with 2 for 3 flashes, as it counts the remaining flashes
and is decremented in state space when the timer is done. Listing 9.3 shows the
master FSM for the double refactored flasher.
// Timer connection
val timerLoad = WireDefault(false.B) // start timer with a load
val timerSelect = WireDefault(true.B) // select 6 or 4 cycles
val timerDone = Wire(Bool())
// Counter connection
val cntLoad = WireDefault(false.B)
val cntDecr = WireDefault(false.B)
val cntDone = Wire(Bool())
timerLoad := timerDone
switch(stateReg) {
  is(off) {
    timerLoad := true.B
    timerSelect := true.B
    cntLoad := true.B
Besides having a master FSM that is reduced to just three states, our current
solution is also more configurable. No FSM needs to be changed if we want to
change the length of the on or off intervals or the number of flashes.
In this section, we have explored communicating circuits, especially FSMs, that
only exchange control signals. To perform computation we can combine an FSM
with a datapath, as discussed in the next section.
[Figure: popcount circuit built from an FSM and a datapath, with the signals
dinValid, popCntValid, din, and popCnt.]
handshake signals are connected to the FSM. The FSM is connected with the dat-
apath with control signals towards the datapath and with status signals from the
datapath.
We will co-design the FSM and the datapath. Figure 9.4 shows the state diagram
of the FSM and Figure 9.5 shows the datapath for the popcount circuit. The FSM
starts in state Idle, where the FSM waits for input. When data arrives, as signaled
with an asserted dinValid, the FSM loads the shift register and advances to state
Count. The data is loaded into the shf register. On the load also the cnt register is
reset to 0.
In state Count, the number of ‘1’s is counted sequentially. We use a shift register,
an adder, an accumulator register, and a down counter (not shown in the datapath)
to perform the computation. To count the number of ‘1’s, the shf register is shifted
right, and the least significant bit is added to cnt each clock cycle. A counter, not
shown in the figure, counts down until all bits have been shifted through the least
significant bit. When the counter reaches zero, the popcount has finished. The FSM
switches to state Done and signals the result by asserting popCntValid. When the
result is read, signaled by asserting popCntReady, the FSM switches back to Idle,
ready to compute the next popcount.
The top-level component, shown in Listing 9.4, instantiates the FSM and the
datapath components and connects them. Listing 9.5 shows the Chisel code for the
datapath of the popcount circuit. On a load signal, the dataReg register is loaded
with the input, the popCntReg register is reset to 0, and the counter register
counterReg is set to the number of shifts to be performed. Otherwise, the dataReg
register is shifted to the right, the least significant bit of the dataReg register
is added to the popCntReg register, and the counter is decremented until it is 0.
When the counter is 0, the output contains the popcount.
[Figure 9.4: state diagram of the popcount FSM with the states Idle, Count, and Done
and the transitions Valid, Finished, and Result read.]
[Figure 9.5: datapath of the popcount circuit with the input din, the shift register
shf, the adder, and the accumulator register cnt.]
  data.io.din := io.din
  io.popCnt := data.io.popCnt
  data.io.load := fsm.io.load
  fsm.io.done := data.io.done
}
when(io.load) {
  dataReg := io.din
  popCntReg := 0.U
  counterReg := 8.U
}
// debug output
printf("%x %d\n", dataReg, popCntReg)
    switch(stateReg) {
      is(idle) {
        io.dinReady := true.B
        when(io.dinValid) {
          io.load := true.B
          stateReg := count
        }
      }
      is(count) {
        when(io.done) {
          stateReg := done
        }
      }
      is(done) {
        io.popCntValid := true.B
        when(io.popCntReady) {
          stateReg := idle
        }
      }
    }
  }
}
[Figure: the ready/valid interface between a Sender and a Receiver with the signals
valid, ready, and data.]
Listing 9.6 shows the code of the FSM. The FSM starts in state idle. On a valid
signal for the input data (dinValid) it switches to the count state and waits till the
datapath has finished counting. When the popcount is done, the FSM switches to
state done and waits till the popcount is read (signaled by popCntReady).
The popcount example consumed data (a word) and produced data (the pop-
count). For the coordinated exchange of data, we use handshake signals. The next
section describes the ready/valid interface for flow control of unidirectional data
exchange.
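[Three timing diagrams of the ready/valid interface over clock cycles 1 to 7, each
with the signals clock, ready, valid, and data. The first two transfer a single data
word D; the third, Figure 9.9, transfers D1, D2, and D3.]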
signals valid (from clock cycle 2 on) before the receiver is ready. The data transfer
happens in clock cycle 4. From clock cycle 5 on neither the sender has data nor the
receiver is ready for the next transfer. Similar to the “always ready” interface we
can envision an always valid interface. However, in that case the data will probably
not change on signaling ready and we would simply drop the handshake signals.
Figure 9.9 shows further variations of using the ready/valid interface. In clock
cycle 2 it happens that both signals (ready and valid) become asserted just for a
single clock cycle and the data transfer of D1 happens. Data can be transferred
back-to-back (in every clock cycle) as shown in clock cycles 5 and 6 with the
transfer of D2 and D3.
To make this interface composable, neither ready nor valid is allowed to depend
combinationally on the other signal. As this interface is so common, Chisel defines
the DecoupledIO bundle, similar to the following:
class DecoupledIO[T <: Data](gen: T) extends Bundle {
  val ready = Input(Bool())
  val valid = Output(Bool())
  val bits = Output(gen)
}
The DecoupledIO bundle is parameterized with the type for the data. The interface
defined by Chisel uses the field bits for the data. DecoupledIO is part of the package
chisel3.util.
One open question is whether ready or valid may be deasserted after being asserted
when no data transfer has happened. For example, a receiver might be ready for some
time without receiving data and then, due to some other event, become not ready.
The same can be envisioned for the sender, having valid data only for some clock
cycles and becoming non-valid without a data transfer having happened. Whether this
behavior is allowed is not part of the ready/valid interface, but needs to be defined
by the concrete usage of the interface.
Chisel places no requirements on the signaling of ready and valid when using the
class DecoupledIO. However, the class IrrevocableIO places the following
restrictions on the sender:
Note that this is just a convention that cannot be enforced just by using the class
IrrevocableIO.
The AXI bus [3] uses one ready/valid interface for each of the following parts
of the bus: read address, read data, write address, and write data. AXI restricts the
interface such that once the sender asserts valid, it is not allowed to deassert it
until the data transfer has happened. This is the same restriction as just described
in the comment of the IrrevocableIO interface. Furthermore, the sender is not allowed
to wait for a receiver's ready to assert valid. The receiver side is more relaxed. If
ready is asserted, it is allowed to deassert it before valid is asserted. Furthermore,
the receiver is allowed to wait for an asserted valid before asserting ready.
Listing 9.7 shows an example of using the ready/valid interface. The circuit
represents a buffer built out of a register. The buffer has a ready/valid interface
(DecoupledIO) at the input and one at the output. The DecoupledIO bundle is de-
fined from the sender’s viewpoint. Therefore, the input of the buffer (in) needs to
change the direction with Flipped.
The module contains a register for the data (dataReg) and a single bit register
(emptyReg) signaling if the buffer is empty or full. This single bit represents a two
state Moore FSM with states empty and full. The input ready signal and the output
valid signal depend only on the state of emptyReg. There is no combinational path
between the input and the output of the buffer.
When the buffer is empty and there is valid data at the input, the data is registered
and the state changed to full. When the buffer is full and the consumer side signals
to be ready, the data is considered read and the buffer is empty again.
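The buffer described above can be sketched as follows. This is written from the description, not copied from the listing: the module name ReadyValidBuffer and the 8-bit data width are assumptions; dataReg, emptyReg, in, out, DecoupledIO, and Flipped follow the text.

```scala
import chisel3._
import chisel3.util._

class ReadyValidBuffer extends Module {
  val io = IO(new Bundle {
    // DecoupledIO is defined from the sender's viewpoint, so flip the input
    val in = Flipped(new DecoupledIO(UInt(8.W)))
    val out = new DecoupledIO(UInt(8.W))
  })

  val dataReg = Reg(UInt(8.W))
  val emptyReg = RegInit(true.B)

  // Both handshake outputs depend only on the state (Moore)
  io.in.ready := emptyReg
  io.out.valid := !emptyReg
  io.out.bits := dataReg

  // Empty and valid data at the input: register the data, become full
  when(emptyReg & io.in.valid) {
    dataReg := io.in.bits
    emptyReg := false.B
  }
  // Full and the consumer is ready: the data is read, become empty
  when(!emptyReg & io.out.ready) {
    emptyReg := true.B
  }
}
```

With a single register, the buffer accepts a new word at most every second clock cycle, as it needs one cycle to be emptied before it can be filled again.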
In Chisel we use vals to name hardware components. Note that the := operator is a
Chisel operator and not a Scala operator.
Scala also provides the more classic version of a mutable variable as var. The
following code defines an integer variable and reassigns it a new value:
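A minimal sketch of such a definition and reassignment, where the name x and the values are only illustrative:

```scala
// A mutable Scala variable of inferred type Int
var x = 2
// Reassignment is a plain Scala operation; no hardware is generated
x = 3
```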
1 The link points to the Scala 2 version of the book, as Chisel is still based on Scala 2.
We will need Scala vars to write hardware generators, but never need vars to name
a hardware component.
You may have wondered what type those variables have. As we assigned an
integer constant in the above example, the type of the variable is inferred; it is a
Scala Int type. In most cases the Scala compiler is able to infer the type. However,
if we are in the mood of being more explicit, we can explicitly state the type as
follows:
val number: Int = 42
We use a loop for circuit generators. The following loop connects individual bits
of a shift register.
val regVec = Reg(Vec(8, UInt(1.W)))
Note that this is not the most concise expression of a shift register. It is better to
use a plain UInt with the right size and assign the new value for the register with an
expression using the ## operator and proper indexing. This code snippet is just to
show how a Scala for loop can be used for circuit generation.
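A sketch of such a loop inside a complete module is given here. The module name ShiftRegLoop and the ports din and dout are hypothetical; regVec follows the code above.

```scala
import chisel3._

class ShiftRegLoop extends Module {
  val io = IO(new Bundle {
    val din = Input(UInt(1.W))
    val dout = Output(UInt(1.W))
  })

  val regVec = Reg(Vec(8, UInt(1.W)))

  // The first stage is fed by the input
  regVec(0) := io.din
  // The loop runs at circuit generation time and creates seven connections
  for (i <- 1 until 8) {
    regVec(i) := regVec(i - 1)
  }

  io.dout := regVec(7)
}
```

The loop variable i is a Scala Int and does not exist in the generated hardware; only the connections between the register stages remain.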
Conditions are expressed with if and else. Note that this condition is evaluated
at Scala runtime during circuit generation. This construct does not create a
multiplexer, but allows us to write configurable hardware generators.
for (i <- 0 until 10) {
  print(i)
  if (i % 2 == 0) {
    println(" is even")
  } else {
    println(" is odd")
  }
}
Scala has the notion of a tuple. A tuple can hold a sequence of different types.
The tuple is built by placing the individual fields within parentheses. The fields are
then accessed with ._n, starting with 1 for the first field. The following code creates
a tuple to represent a city with the zip code and the name.
val city = (2000, "Frederiksberg")
val zipCode = city._1
val name = city._2
Tuples are useful when we want to return more than one value from a function.
Tuples allow us to represent Chisel components with more than one output as a
lightweight function instead of a full-blown module.
Scala has a powerful collection library. One of the simpler collection types is Seq,
an ordered collection of elements (also called a sequence). The default implemen-
tation is immutable. We index into a Seq with (), with zero-based indexing. Seq is
a base class with several different implementations. However, for most Chisel hard-
ware generators direct use of Seq is the preferred choice. The following code shows
how to create a Seq that holds four Scala Int values. The second line accesses the
second element, and second will be 15.
val numbers = Seq(1, 15, -2, 0)
val second = numbers(1)
The return value of a function in Scala is the result of the last expression.2 We can
then create two adders by simply calling the function adder.
val x = adder(a, b)
// another adder
val y = adder(c, d)
Note that this is a hardware generator. That code is not executing any add operation
during elaboration, but creates two adders (hardware instances). Or in other words,
it returns a wire to the output of the adder. We have written our first hardware
generator!
The adder is an artificial example to keep it simple. Chisel already has an adder
generator function, like +(that: UInt).
Functions, as lightweight hardware generators, can also contain state (using a reg-
ister). The following example returns a one clock cycle delay element (a register).
If a function has just a single statement, we can write it in one line and omit the
curly braces ({}).
def delay(x: UInt) = RegNext(x)
Calling the function with its own result as the argument generates a two clock
cycle delay.
val delOut = delay(delay(delIn))
Again, this is too short an example to be useful, as RegNext() is already that function
that creates the register for the delay.
Functions return only one value. In order to provide more than one output, we can
wrap several output wires into a Scala tuple. The following code generates hardware
that compares two inputs and has two outputs.
def compare(a: UInt, b: UInt) = {
  val equ = a === b
  val gt = a > b
  (equ, gt)
}
2 Scala also contains a return statement. The code could have been written a bit
more verbose as return x + y.
With the parentheses, we wrap the two wires that are connected to the outputs of the
comparator circuit into a Scala tuple.
When creating a comparator component with the compare function, it returns a
tuple of two wires. We can access the two wires with the ._n syntax.
val cmp = compare(inA, inB)
val equResult = cmp._1
val gtResult = cmp._2
However, we can directly decompose the tuple into two wires, in this case equ and
gt, with the following syntax.
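A minimal sketch of this decomposition, wrapped in a hypothetical module CompareSketch with assumed 8-bit ports so that it can be elaborated on its own:

```scala
import chisel3._

class CompareSketch extends Module {
  val io = IO(new Bundle {
    val inA = Input(UInt(8.W))
    val inB = Input(UInt(8.W))
    val equ = Output(Bool())
    val gt = Output(Bool())
  })

  // Lightweight comparator with two outputs, returned as a tuple
  def compare(a: UInt, b: UInt) = {
    val equ = a === b
    val gt = a > b
    (equ, gt)
  }

  // Scala pattern matching decomposes the tuple into two named wires
  val (equ, gt) = compare(io.inA, io.inB)

  io.equ := equ
  io.gt := gt
}
```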
If we need more parameters, we can simply add additional parameters to the
constructor of the Chisel module. However, if we pass those parameters through several
constructors it might become tedious to use them. Furthermore, when changing the
number or type of parameters, we need to edit several places.
Scala has a very light-weight construct to package several fields into a class: a
case class. Case classes are like regular Scala classes, but with a very light-weight
definition. The following code defines a case class to represent three parameters. It
might be used for a device with a transmit (tx) buffer and a receive (rx) buffer of a
certain width.
case class Config(txDepth: Int, rxDepth: Int, width: Int)
An object of that case class is created by simply calling the constructor. The fields
are immutable and can be read by accessing them:
val param = Config(4, 2, 16)
We can also add code to the case class to check that the parameters are valid.
case class SaveConf(txDepth: Int, rxDepth: Int, width: Int) {
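The body of such a case class can hold require statements, which Scala evaluates when an object is constructed. The following is a minimal sketch; the concrete checks and messages are assumptions for illustration only.

```scala
case class SaveConf(txDepth: Int, rxDepth: Int, width: Int) {
  // The checks run in the constructor, before any hardware is generated
  require(txDepth > 0 && rxDepth > 0, "buffer depth must be positive")
  require(width > 0, "width must be positive")
}

// A valid configuration is created as before
val ok = SaveConf(4, 2, 16)
// SaveConf(4, 0, 16) would throw an IllegalArgumentException
```

A failing require stops the hardware generator early with a readable message instead of producing a broken circuit.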
Chisel allows parameterizing functions with types, in our case with Chisel types.
The expression in the square brackets [T <: Data] defines a type parameter T that
is Data or a subclass of Data. Data is the root of the Chisel type system.
Our multiplexer function has three parameters: the Boolean condition, one parameter
for the true path, and one parameter for the false path. Both path parameters are
of type T, which is provided at the function call. The function itself is straightforward:
we define a wire with the default value of fPath and change the value to tPath if the
condition is true. This construct is a classic multiplexer. At the end
of the function, we return the multiplexer hardware (the output). We can use our
multiplexer function with simple types such as UInt:
val resA = myMux(selA, 5.U, 10.U)
The types of the two multiplexer paths need to be the same. The following wrong
usage of the multiplexer results in a runtime error:
val resErr = myMux(selA, 5.U, 10.S)
To show a more complex multiplexer, we define a new type as a Bundle with two
fields:
class ComplexIO extends Bundle {
  val d = UInt(10.W)
  val b = Bool()
}
We can define Bundle constants by first creating a Wire of the Bundle and then set-
ting the subfields. Then we can use our parameterized multiplexer with this complex
type.
val tVal = Wire(new ComplexIO)
tVal.b := true.B
tVal.d := 42.U
val fVal = Wire(new ComplexIO)
fVal.b := false.B
fVal.d := 13.U
In our initial design of the function, we used WireDefault to create a wire with
the type T with a default value. If we need to create a wire just of the Chisel type
without using a default value, we can use fPath.cloneType to get the Chisel type.
The following function shows the alternative way to code the multiplexer.
def myMuxAlt[T <: Data](sel: Bool, tPath: T, fPath: T): T = {
parameter of that type. Additionally, in this example, we also make the number of
router ports configurable.
class NocRouter[T <: Data](dt: T, n: Int) extends Module {
  val io = IO(new Bundle {
    val inPort = Input(Vec(n, dt))
    val address = Input(Vec(n, UInt(8.W)))
    val outPort = Output(Vec(n, dt))
  })
To use our router, we first need to define the data type we want to route, e.g., as a
Chisel Bundle:
class Payload extends Bundle {
  val data = UInt(16.W)
  val flag = Bool()
}
We create a router by passing an instance of the user-defined Bundle and the number
of ports to the constructor of the router:
val router = Module(new NocRouter(new Payload, 2))
The Bundle has a parameter of type T, which is a subtype of Chisel’s Data type.
Within the bundle, we define a field data by invoking cloneType on the parameter.
and instantiate that router with a Port that takes a Payload as a parameter:
val router = Module(new NocRouter2(new Port(new Payload), 2))
We can use the full power of Scala to generate our logic (tables). For example,
we can generate a table of fixed-point constants to represent a trigonometric function,
compute constants for digital filters, or write an assembler in Scala to generate code
for a microprocessor written in Chisel. All those functions are in the same code base
(same language) and can be executed during hardware generation.
A classic example for a table generation is the conversion of a binary number into
a binary-coded decimal (BCD) representation. BCD is used to represent a number
in a decimal format using 4 bits for each decimal digit. For example, decimal 13
is in binary 1101 and BCD encoded as 1 and 3 in binary: 00010011. BCD allows
displaying numbers in decimal, a more user-friendly number representation than
hexadecimal.
When using a classic hardware description language, such as Verilog or VHDL,
we would use another scripting or programming language to generate such a table.
We can write a Java program that computes the table to convert binary to BCD.
That Java program prints out VHDL code that can be included in a project. The
Java program is about 100 lines of code; most of the code generating VHDL strings.
However, the key part of the conversion is just two lines of code. With Chisel, we
can compute this table directly as part of the hardware generation. Listing 10.1
shows the table generation for the binary to BCD conversion.
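The core of such a conversion can be sketched in plain Scala. The example assumes two decimal digits (values 0 to 99) and an 8-bit BCD result; the names BcdTable and toBcd are chosen for this illustration and are not taken from Listing 10.1:

```scala
object BcdTable {
  // Tens digit in the upper nibble, ones digit in the lower nibble.
  def toBcd(i: Int): Int = ((i / 10) << 4) + i % 10

  // Table for all two-digit values, computed at generation time.
  val table: Seq[Int] = (0 until 100).map(toBcd)
}
```

For decimal 13 the entry is 0x13, which is the bit pattern 00010011 from the example in the text. In a Chisel design, each entry of such a Scala sequence would then be converted into a hardware constant.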
We can also generate a logic table from a Scala Array. We may have data in a file
that we want to read in during hardware generation time for the logic table.
Listing 10.2 shows how to use the Scala Source class from the Scala standard library to
read the file data.txt, which contains integer constants in a textual representation.
A few words on the maybe slightly intimidating expression:
val table = VecInit(array.map(_.U(8.W)))
A Scala Array can be implicitly converted to a Scala sequence (Seq), which supports
the mapping function map. map invokes a function on each element of the sequence
and returns a sequence of the return values of the function. Our function _.U(8.W)
represents each Int value from the Scala array as _ and performs the conversion
from a Scala Int value to a Chisel UInt literal with a size of 8 bits. The Chisel
object VecInit creates a Chisel Vec from a sequence Seq of Chisel types.
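The file-reading step can be tried out in plain Scala without any hardware. The sketch below assumes one integer per line; the object ReadTable, the helper readInts, and the temporary file used in demo are made up for this illustration:

```scala
import java.nio.file.Files
import scala.io.Source

object ReadTable {
  // Reads one integer per line; blank lines are skipped.
  def readInts(path: String): Array[Int] = {
    val source = Source.fromFile(path)
    try source.getLines().map(_.trim).filter(_.nonEmpty).map(_.toInt).toArray
    finally source.close()
  }

  // Self-contained demo: write a temporary file, then read it back.
  def demo(): Array[Int] = {
    val tmp = Files.createTempFile("data", ".txt")
    Files.write(tmp, "3\n14\n15\n".getBytes("UTF-8"))
    try readInts(tmp.toString) finally Files.delete(tmp)
  }
}
```

The resulting Array[Int] is the kind of value that the expression VecInit(array.map(_.U(8.W))) discussed above takes as its starting point.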
We can use the initialization of a Chisel Vec from a Scala sequence to represent
a message that we may send out to a serial port. The following code converts the
standard greeting from a Scala/Java String to a Chisel Vec:
val msg = "Hello World!"
val text = VecInit(msg.map(_.U))
val len = msg.length.U
The Scala string msg can be used as a sequence and therefore, the map function is
available to map each Scala Char to a Chisel UInt. This code is extracted from the
serial port example, which is used later in this text to send a welcome message.
import chisel3._
import scala.io.Source
val N = (n - 1).U
io.tick := tick
}
the Chisel code for the tester. The TickerTester has several parameters: (1) the
type parameter [T <: Ticker] to accept a Ticker or any class that inherits from
Ticker, (2) the design under test, being of type T or a subtype thereof, and (3)
the number of clock cycles we expect for each tick. The tester waits for the first
occurrence of a tick (the start might be different for different implementations) and
then checks that tick repeats every n clock cycles.
With a first, easy implementation of the ticker, we can test the tester itself, prob-
ably with some println debugging. When we are confident that the simple ticker
and the tester are correct, we can proceed and explore two more versions of the
ticker. Listing 10.6 shows the tick generation with a counter counting down to 0.
Listing 10.7 shows the nerd version of counting down to -1 to use less hardware by
avoiding the comparator.
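The three counting schemes can be compared with a small software model before writing any hardware. The following plain Scala sketch is an assumption-laden stand-in for the listings: each function models one ticker variant cycle by cycle, and tickDistances measures the spacing of the ticks. All names are made up for this illustration:

```scala
object TickerModel {
  // Counts up from 0 and ticks when n - 1 is reached.
  def upTicker(n: Int, cycles: Int): Seq[Boolean] = {
    var cnt = 0
    (0 until cycles).map { _ =>
      val tick = cnt == n - 1
      cnt = if (tick) 0 else cnt + 1
      tick
    }
  }

  // Counts down from n - 1 and ticks at 0.
  def downTicker(n: Int, cycles: Int): Seq[Boolean] = {
    var cnt = n - 1
    (0 until cycles).map { _ =>
      val tick = cnt == 0
      cnt = if (tick) n - 1 else cnt - 1
      tick
    }
  }

  // Counts down from n - 2 to -1; the tick is just the sign of the counter.
  def nerdTicker(n: Int, cycles: Int): Seq[Boolean] = {
    var cnt = n - 2
    (0 until cycles).map { _ =>
      val tick = cnt < 0
      cnt = if (tick) n - 2 else cnt - 1
      tick
    }
  }

  // Distances in clock cycles between consecutive ticks.
  def tickDistances(ticks: Seq[Boolean]): Seq[Int] = {
    val idx = ticks.zipWithIndex.collect { case (true, i) => i }
    idx.zip(idx.drop(1)).map { case (a, b) => b - a }
  }
}
```

In the model, the down-to-minus-one variant needs only the sign of the counter to produce the tick, which corresponds to the saved comparator mentioned in the text.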
We can test all three versions of the ticker by using ScalaTest specifications, cre-
ating instances of the different versions of the ticker and passing them to the generic
test bench. Listing 10.8 shows the test specification. We run only the ticker tests
with:
sbt "testOnly TickerTest"
import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec
trait TickerTestFunc {
  def testFn[T <: Ticker](dut: T, n: Int) = {
    // -1 means that no ticks have been seen yet
    var count = -1
    for (_ <- 0 to n * 3) {
      // Check for correct output
      if (count > 0)
        dut.io.tick.expect(false.B)
      else if (count == 0)
        dut.io.tick.expect(true.B)
val N = (n -1).U
val N = n
First we define the hardware for the adder in function add. The vector (Chisel
type Vec) is located in vec. The Scala method reduce() combines all elements of a
collection with a binary operation, producing a single value. The reduce() method
reduces the sequence starting from the first element. It takes the first two elements
and performs the operation. The result is then combined with the next element, until
a single result is left.
The function to combine two elements is provided as a parameter to reduce, in
our case add, which returns an adder. The resulting hardware is a chain of adders
computing the sum of the elements of vector vec. Instead of defining the (simple)
add function, we can provide the addition as an anonymous function and use the Scala
wildcard “_” to represent the two operands.
val sum = vec.reduce(_ + _)
With this one-liner we have generated the chain of adders. For the sum function a
chain is not the ideal configuration; a tree will have a shorter combinational delay. If
we do not trust the synthesis tool to rearrange our adder chain, we can use Chisel’s
reduceTree method to generate a tree of adders:
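The difference in combinational depth can be illustrated with a plain Scala model. In this sketch, Node records how many adders lie on the longest path to a value, and treeReduce is a hypothetical stand-in for a balanced reduction written for this example; it is not Chisel's implementation of reduceTree:

```scala
object ReduceDepth {
  // A value together with the number of adders on its longest input path.
  final case class Node(value: Int, depth: Int)

  def add(a: Node, b: Node): Node =
    Node(a.value + b.value, math.max(a.depth, b.depth) + 1)

  // Hypothetical balanced reduction: combine neighbours pairwise, level by level.
  def treeReduce[A](xs: Seq[A])(f: (A, A) => A): A = {
    require(xs.nonEmpty)
    if (xs.length == 1) xs.head
    else treeReduce(xs.grouped(2).map {
      case Seq(a, b) => f(a, b)
      case Seq(a)    => a
    }.toSeq)(f)
  }

  val leaves: Seq[Node] = (1 to 8).map(Node(_, 0))
  val chain: Node = leaves.reduce(add)      // adders connected in a chain
  val tree: Node = treeReduce(leaves)(add)  // adders connected as a tree
}
```

For eight inputs both variants compute the same sum, but the chain has seven adders on its longest path while the tree has three.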
The function literal for the minimum function uses two parameters x and y and
returns a multiplexer (Mux) that compares the two parameters and returns the smaller
value.
val min = vec.reduceTree((x, y) => Mux(x < y, x, y))
Let us extend this circuit to return not only the minimal value from the vec, but
also the position (index) of that minimal value in the vec. To return two values we
define the Bundle Two to hold the value and the index. We declare the vecTwo Vec
that can hold these bundles and connect them in a loop to the original input and the
index within the Vec, as shown in Listing 10.9.
As before, we use a function literal in the reduceTree method of the vecTwo,
comparing the value field within the bundle and returning the multiplexer for the
complete bundle. Value res points to the bundle containing the minimum value and
the position.
As a more advanced variation of the minimum search circuit, we will use more
Scala features to avoid creating the bundle to return the value and index. We will
use a tuple to represent both values. The following code shows the application of a
val res = vecTwo.reduceTree((x, y) => Mux(x.v < y.v, x, y))
different types at different positions. Therefore, it cannot function as, for example,
a multiplexer. However, we can use it as an indexable collection during hardware
generation.
val scalaVector = vec.zipWithIndex
  .map((x) => MixedVecInit(x._1, x._2.U(8.W)))
val resFun2 = VecInit(scalaVector)
  .reduceTree((x, y) => Mux(x(0) < y(0), x, y))
In the above example we create a Scala Vector of the values with their index, but
now using Chisel’s “tuple”. We then convert the Scala Vector into a Chisel Vec.
Then we can again perform a tree-based reduction. Another benefit of this version
is that we have only one multiplexer, which selects between two Chisel “tuples”
that are actually MixedVecs. The result in resFun2 is a MixedVec with two elements,
accessed with an index, like a “normal” Vec.
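The zipWithIndex idea can be tried on ordinary Scala collections first. The sketch below works on Int values instead of hardware signals; minWithIndex is a made-up name, and the comparison mirrors the Mux condition above by keeping the left element only when it is strictly smaller:

```scala
object MinSearch {
  // Pair each value with its index, then keep the pair with the smaller value.
  // The input must not be empty, as reduce needs at least one element.
  def minWithIndex(values: Seq[Int]): (Int, Int) =
    values.zipWithIndex.reduce((x, y) => if (x._1 < y._1) x else y)
}
```

For the input Seq(7, 3, 9, 5), the result is the pair (3, 1): the minimum value 3 at index 1.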
With our tree reduction function we can build an arbitration tree out of just 2:1
arbiters. We can generate the arbitration circuit as follows:
class Arbiter[T <: Data: Manifest](n: Int, private val gen: T) extends Module {
  val io = IO(new Bundle {
    val in = Flipped(Vec(n, new DecoupledIO(gen)))
    val out = new DecoupledIO(gen)
  })
The input is a Vec of ready/valid interfaces and the output a single ready/valid inter-
face. We just need a function that provides arbitration between two requests.
Simple Arbitration
Fair Arbitration
when(regReadyA) {
  regData := a.bits
  regEmpty := false.B
  regReadyA := false.B
}
when(regReadyB) {
  regData := b.bits
  regEmpty := false.B
  regReadyB := false.B
}
out.bits := regData
out
}
to handle the case when both inputs are valid in the same clock cycle.
When a request is accepted, it stores the data and switches to one of the full
states (hasA or hasB). When the consumer of the output accepts the data, the arbiter
switches back to an idle state. It switches to the idle state that will accept a pending
request from the other input in the next clock cycle.
[Figure: a writer, a FIFO, and a reader. The writer side uses the signals write, full, and din; the reader side uses read, empty, and dout.]
11 Example Designs
The reader side provides data with dout and the read is initiated with read. The
empty signal is responsible for the flow control at the reader side.
Listing 11.1 shows a single buffer. The buffer has an enqueueing port enq of type
WriterIO and a dequeueing port deq of type ReaderIO. The state elements of the
buffer are one register that holds the data (dataReg) and one state register for the
simple FSM (stateReg). The FSM has only two states: either the buffer is empty
or full. If the buffer is empty, a write will register the input data and change to
the full state. If the buffer is full, a read will consume the data and change to the
empty state. The IO ports full and empty represent the buffer state for the writer
and the reader.
Listing 11.2 shows the complete FIFO. The complete FIFO has the same IO in-
terface as the individual FIFO buffers. BubbleFifo has as parameters the size of the
data word and depth for the number of buffer stages. We can build a bubble FIFO
with depth stages out of depth FifoRegisters. We create the stages by filling them into
a Scala Array. The Scala array has no hardware meaning; it just serves as a
container to hold references to the created buffers. In a Scala for loop we connect the
individual buffers. The first buffer’s enqueueing side is connected to the enqueue-
ing IO of the complete FIFO and the last buffer’s dequeueing side to the dequeueing
1 For completeness, the Chisel book repository contains a copy of the FIFO code as well.
[Figure 11.2: timing diagram of one transmitted byte with the data bits b0 to b7.]
8-bit data, least significant bit first, and then one or two stop bits (1). When no data
is transmitted, the output is 1. Figure 11.2 shows the timing diagram of one byte
transmitted.
We design our UART in a modular way with minimal functionality per module.
We present a transmitter (TX), a receiver (RX), a buffer, and then usage of those
base components.
First, we need an interface, a port definition. For the UART design, we use a
ready/valid handshake interface (extending DecoupledIO), with a data size of 8 bits.
class UartIO extends DecoupledIO(UInt(8.W)) {
}
The convention of a ready/valid interface is that the data is transferred when both
ready and valid are asserted.
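This convention can be captured in a few lines of plain Scala. The sketch models each clock cycle of a channel as a record of the three signals and extracts the words that are actually transferred; Cycle and transferred are names invented for this illustration:

```scala
object ReadyValid {
  // One clock cycle of a ready/valid channel.
  final case class Cycle(valid: Boolean, ready: Boolean, bits: Int)

  // A word is transferred exactly in the cycles where both signals are asserted.
  def transferred(trace: Seq[Cycle]): Seq[Int] =
    trace.collect { case Cycle(true, true, bits) => bits }

  // Example trace: the producer holds 65 until the consumer becomes ready.
  val trace = Seq(
    Cycle(valid = true,  ready = false, bits = 65), // consumer not ready: no transfer
    Cycle(valid = true,  ready = true,  bits = 65), // both asserted: 65 is transferred
    Cycle(valid = false, ready = true,  bits = 0),  // nothing offered: no transfer
    Cycle(valid = true,  ready = true,  bits = 66)  // both asserted: 66 is transferred
  )
}
```

Applied to the example trace, transferred returns the two words 65 and 66, one for each cycle in which both valid and ready are asserted.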
Listing 11.3 shows a bare-bone serial transmitter (Tx). The IO ports are the txd
port, where the serial data is sent and a channel where the transmitter can receive
the characters to serialize and send. To generate the timing, we compute a constant
for the time in clock cycles for one serial bit.
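As a worked example of such a constant, the following plain Scala sketch computes the number of clock cycles per serial bit with rounding to the nearest integer. The rounding scheme, the function names, and the example values of 50 MHz and 115200 baud are assumptions for this illustration and may differ from Listing 11.3:

```scala
object UartTiming {
  // Clock cycles per serial bit, rounded to the nearest integer.
  def cyclesPerBit(frequency: Long, baudRate: Long): Long =
    (frequency + baudRate / 2) / baudRate

  // Reload value for a counter that counts down to 0 (0 is part of the period).
  def bitCnt(frequency: Long, baudRate: Long): Long =
    cyclesPerBit(frequency, baudRate) - 1
}
```

With a 50 MHz clock and 115200 baud, one bit lasts 434 clock cycles, so a down counter that includes 0 in its period is reloaded with 433.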
We use three registers: (1) a register to shift the data, i.e., to serialize it (shiftReg),
(2) a counter to generate the correct baud rate (cntReg), and (3) a counter for the
number of bits that still need to be shifted out (bitsReg). No additional state register
or FSM is needed; all state is encoded in those three registers.
Counter cntReg is continuously running (counting down to 0 and reloaded with
the start value when 0). All action is only done when cntReg is 0. As we build a
minimal transmitter, we have only the shift register to store the data. Therefore, the
channel is only ready when cntReg is 0 and no bits are left to shift out. The IO port
txd is directly connected to the least significant bit of the shift register.
When there are more bits to shift out (bitsReg =/= 0.U), we shift the bits to the
right and fill with 1 (the idle level of a transmitter). If no more bits need to be shifted
out, we check if the channel contains data (signaled with the io.channel.valid
input). If so, the bit string to be shifted out is constructed with one start bit (0), the
8-bit data, and two stop bits (1). Therefore, the bit count is set to 11.
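The construction of the 11-bit frame and the shifting can be checked with a plain Scala model on integers. The sketch follows the description above (start bit at the least significant position, filling with 1 from the top); the object and function names are made up for this illustration:

```scala
object UartFrame {
  // 11-bit frame: two stop bits (11), 8 data bits, one start bit (0) at the LSB.
  def frame(data: Int): Int = (3 << 9) | ((data & 0xff) << 1)

  // One shift step: move right by one and fill the top bit with the idle level 1.
  def shift(reg: Int): Int = (1 << 10) | (reg >> 1)

  // Bits as they appear on txd, first transmitted bit first.
  def serialize(data: Int): Seq[Int] =
    Iterator.iterate(frame(data))(shift).take(11).map(_ & 1).toSeq
}
```

For the character 'A' (0x41), the transmitted sequence is the start bit 0, the data bits 1, 0, 0, 0, 0, 0, 1, 0 (least significant bit first), and the two stop bits 1, 1. After eleven shifts the register holds only ones, which is the idle level.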
io.channel.ready := (cntReg === 0.U) && (bitsReg === 0.U)
io.txd := shiftReg(0)
when(cntReg === 0.U) {
  cntReg := BIT_CNT
  when(bitsReg =/= 0.U) {
    val shift = shiftReg >> 1
    shiftReg := 1.U ## shift(9, 0)
    bitsReg := bitsReg - 1.U
  } .otherwise {
    when(io.channel.valid) {
      // two stop bits, data, one start bit
      shiftReg := 3.U ## io.channel.bits ## 0.U
      bitsReg := 11.U
    } .otherwise {
      shiftReg := 0x7ff.U
    }
  }
} .otherwise {
  cntReg := cntReg - 1.U
}
}
This very minimal transmitter has no additional buffer and can accept a new
character only when the shift register is empty and at the clock cycle when cntReg is 0.
Accepting new data only when cntReg is 0 means that the ready flag is also
deasserted when there would be space in the shift register. However, we do not want
to add this “complexity” to the transmitter but delegate it to a buffer.
Listing 11.4 shows a single byte buffer, similar to the FIFO register for the bub-
ble FIFO. The input and the output are UartIOs. The buffer contains the mini-
mal state machine to indicate empty or full. The buffer driven handshake signals
(io.in.ready and io.out.valid) depend on the state register.
When the state is empty, and data on the input is valid, we register the data and
switch to state full. When the state is full, and the downstream receiver is ready,
the downstream data transfer happens, and we switch back to state empty.
With that buffer we can extend our bare-bone transmitter. Listing 11.5 shows
the combination of the transmitter Tx with a single-buffer in front. This buffer now
relaxes the issue that Tx was ready only for single clock cycles. We delegated the
solution of this issue to the buffer module. An extension of the single word buffer to
a real FIFO can easily be done and needs no change in the transmitter or the single
byte buffer.
Listing 11.6 shows the code for the receiver (Rx). A receiver is a little bit tricky,
as it needs to reconstruct the timing of the serial data. The receiver waits for the
falling edge of the start bit. From that event, the receiver waits 1.5 bit times to
position itself into the middle of bit 0. Then it samples and shifts in the bits every
bit time. You can observe these two waiting times as BIT_CNT and START_CNT. For
both sample times, the same counter (cntReg) is used. After 8 bits are shifted in,
validReg signals an available byte.
Listing 11.7 shows the usage of the serial port transmitter by sending out a friendly
message. We define the message as a Scala string (msg) and convert it to a Chisel
Vec of UInt. A Scala string is a sequence that supports the map method. The map
method takes as argument a function literal, applies this function to each element,
and builds a sequence of the function’s return values. If the function literal has only
one argument, as it is in this case, the argument can be represented by _. Our
function literal calls the Chisel method .U to convert the Scala Char to a Chisel UInt.
The sequence is then passed to VecInit to construct a Chisel Vec. We index into
the vector text with the counter cntReg to provide the individual characters to the
buffered transmitter. With each ready signal we increase the counter until the full
string is sent out. The sender keeps valid asserted until the last character has been
sent out.
Listing 11.8 shows the usage of the receiver and the transmitter by connecting
them together. This connection generates an Echo circuit where each received char-
io.txd := tx.io.txd
In Section 11.1 we defined our own types for the interface with common names
for signals, such as write, full, din, read, empty, and dout. The input and the out-
put of such a buffer consists of data and two signals for handshaking (for example,
we write into the FIFO when it is not full).
Here we can generalize this handshaking to the so-called ready/valid interface.
We can enqueue an element (write into the FIFO) when the FIFO is ready. We
signal this at the writer side with valid. As this ready/valid interface is so common,
Chisel provides a definition of this interface in DecoupledIO as follows:2
class DecoupledIO[T <: Data](gen: T) extends Bundle {
  val ready = Input(Bool())
  val valid = Output(Bool())
  val bits = Output(gen)
}
With the DecoupledIO interface we define the interface for our FIFOs: a FifoIO with
an enqueue (enq) and a dequeue (deq) port consisting of ready/valid interfaces. The
DecoupledIO interface is defined from the writer’s (producer’s) viewpoint.
Therefore, the enqueue port of the FIFO needs to flip the signal directions.
class FifoIO[T <: Data](private val gen: T) extends Bundle {
  val enq = Flipped(new DecoupledIO(gen))
  val deq = new DecoupledIO(gen)
}
With the abstract base class and an interface we can specialize for different FIFO
implementations optimized for different parameters (speed, area, power, or just sim-
plicity).
when(fullReg) {
  when(io.deq.ready) {
    fullReg := false.B
  }
} .otherwise {
  when(io.enq.valid) {
    fullReg := true.B
    dataReg := io.enq.bits
  }
}
Listing 11.9 shows the refactored bubble FIFO with a ready/valid interface. Note
that we put the Buffer component inside BubbleFifo as a private class. This helper
class is only needed for this component, and therefore we hide it and avoid polluting
the name space. The buffer class has also been simplified. Instead of an FSM we
use only a single bit (fullReg) for the state of the buffer: full or empty.
The bubble FIFO is simple, easy to understand, and uses minimal resources.
However, as each buffer stage has to toggle between empty and full, the maximum
switch(stateReg) {
  is(empty) {
    when(io.enq.valid) {
      stateReg := one
      dataReg := io.enq.bits
    }
  }
  is(one) {
    when(io.deq.ready && !io.enq.valid) {
      stateReg := empty
    }
    when(io.deq.ready && io.enq.valid) {
      stateReg := one
      dataReg := io.enq.bits
    }
    when(!io.deq.ready && io.enq.valid) {
      stateReg := two
      shadowReg := io.enq.bits
    }
  }
  is(two) {
    when(io.deq.ready) {
      dataReg := shadowReg
      stateReg := one
    }
  }
}
Listing 11.10 shows the double buffer FIFO. As each buffer element can store
two entries, we need only half of the buffer elements (depth/2). The DoubleBuffer
contains two registers, dataReg and shadowReg. The consumer is always served
from dataReg. The double buffer has three states: empty, one, and two, which signal
the fill level of the double buffer. The buffer is ready to accept new data when it is
in state empty or one. The buffer has valid data when it is in state one or two.
If we run the FIFO at full speed, and the consumer is always ready, the steady
state of the double buffers is one. Only when the consumer deasserts ready, the
queue fills up and the buffers enter state two. However, compared to a single bubble
FIFO, a restart of the queue takes only half the number of clock cycles for the same
buffer capacity. Similarly, the fall-through latency is half of that of the bubble FIFO.
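The three-state behavior can be expressed as a next-state function in plain Scala, which is handy for checking the transitions without a simulator. The sketch follows the state machine described above; the type and function names (Regs, step, enqReady, deqValid) are chosen for this illustration:

```scala
object DoubleBufferModel {
  sealed trait State
  case object Empty extends State
  case object One extends State
  case object Two extends State

  final case class Regs(state: State, data: Int, shadow: Int)

  // Register values after one clock cycle with the given handshake inputs.
  def step(r: Regs, enqValid: Boolean, enqBits: Int, deqReady: Boolean): Regs =
    r.state match {
      case Empty =>
        if (enqValid) r.copy(state = One, data = enqBits) else r
      case One =>
        if (deqReady && !enqValid) r.copy(state = Empty)
        else if (deqReady && enqValid) r.copy(state = One, data = enqBits)
        else if (!deqReady && enqValid) r.copy(state = Two, shadow = enqBits)
        else r
      case Two =>
        // Not ready for new input; the shadow word moves up when the consumer reads.
        if (deqReady) r.copy(state = One, data = r.shadow) else r
    }

  def enqReady(r: Regs): Boolean = r.state == Empty || r.state == One
  def deqValid(r: Regs): Boolean = r.state == One || r.state == Two
}
```

Starting from the empty state, two writes without a read lead to state two; a following read moves the shadow word into the data register and returns to state one.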
switch(op) {
  is("b00".U) {}
  is("b01".U) { // read
    when(!emptyReg) {
      fullReg := false.B
      emptyReg := nextRead === writePtr
      incrRead := true.B
    }
  }
  is("b10".U) { // write
    when(!fullReg) {
      doWrite := true.B
      emptyReg := false.B
when(doWrite) {
  memReg(writePtr) := io.enq.bits
}
As there are two pointers that are incremented on an action and wrap around at
the end of the buffer, we define a function counter() that implements those wrap-
ping counters. With log2Ceil(depth).W we compute the bit length for the counter.
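The two ingredients of such a pointer, the bit width and the wrap-around, can be sketched in plain Scala. The function bitsFor is a stand-in written for this example that is meant to mirror what log2Ceil computes for positive integers; next is a hypothetical software model of one counter step:

```scala
object WrapCounter {
  // Number of bits needed to represent the values 0 until n (for n >= 1).
  def bitsFor(n: Int): Int = {
    require(n >= 1)
    32 - Integer.numberOfLeadingZeros(n - 1)
  }

  // Next pointer value: increment and wrap around at the end of the buffer.
  def next(ptr: Int, depth: Int): Int =
    if (ptr == depth - 1) 0 else ptr + 1
}
```

A depth of 8 needs 3 bits, and so does a depth of 5. The explicit comparison with depth - 1 matters for depths that are not a power of two, where the natural overflow of the counter would not wrap at the right value.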
We can deconstruct such a tuple by using the parenthesis notation on the left hand
side of the assignment:
val (x1 , x2) = t
val readCond =
  !outputValidReg && ((readPtr =/= writePtr) || fullReg)
when(io.enq.fire) {
  emptyReg := false.B
  fullReg := (nextWrite === readPtr) & !read
  incrWrite := true.B
  doWrite := true.B
}
}
The handling of the read and write pointers is identical to the register memory FIFO.
However, a synchronous on-chip memory delivers the result of a read in the next
clock cycle, whereas the read of the register file was available in the same clock cycle.
Therefore, we need an additional register to handle this latency.
when(p.we) {
ram(p.addr) := p.datai
}
}
p.datao := datao
}
}
}
11.5 Exercises
This exercise section is a little bit longer as it contains two exercises: (1) exploring
the bubble FIFO and implementing a different FIFO design; and (2) exploring the
UART and extending it. Source code for both exercises is included in the
chisel-examples repository.
The FIFO source also includes a tester that provokes different read and write behav-
ior and generates a waveform in the value change dump (VCD) format. The VCD
file can be viewed with a waveform viewer, such as GTKWave. Explore the FifoS-
pec in the repository. The repository contains a Makefile to run the examples, for
the FIFO example just type:
$ make fifo
This make command will compile the FIFO, run the test, and start GTKWave for
waveform viewing.3 Explore the tester and the generated waveform.
In the first cycles, the tester writes a single word. We can observe in the waveform
how that word bubbles through the FIFO, hence the name bubble FIFO. This
bubbling also means that the latency of a data word through the FIFO is equal to the
depth of the FIFO.
The next test fills the FIFO until it is full. A single read follows. Notice how
the empty word bubbles from the reader side of the FIFO to the writer side. When
a bubble FIFO is full, it takes a latency of the buffer depth for a read to affect the
writer side.
The end of the test contains a loop that tries to write and read at maximum speed.
We can see the bubble FIFO running at maximum bandwidth, which is two clock
cycles per word. A buffer stage always has to toggle between empty and full for a
single word transfer.
A bubble FIFO is simple and for small buffers has a low resource requirement.
The main drawbacks of an n stage bubble FIFO are: (1) maximum throughput is
one word every two clock cycles, (2) a data word has to travel n clock cycles from
the writer end to the reader end, and (3) a full FIFO needs n clock cycles for the
restart.
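The first two drawbacks can be reproduced with a small cycle-based model in plain Scala. The sketch assumes that every stage is either empty or full, that a full stage hands its word to an empty successor in one clock cycle, and that the writer and the reader are always willing to transfer. The object BubbleFifoModel and its function run are made up for this illustration:

```scala
object BubbleFifoModel {
  // Returns (cycle, word) pairs for every word the reader consumes.
  def run(depth: Int, cycles: Int): Seq[(Int, Int)] = {
    var full = Vector.fill(depth)(false)
    var data = Vector.fill(depth)(0)
    var nextWord = 0
    val out = scala.collection.mutable.ArrayBuffer.empty[(Int, Int)]
    for (cycle <- 0 until cycles) {
      // Handshake signals derived from the current register state.
      val write = (0 until depth).map(i => if (i == 0) true else full(i - 1))
      val din = (0 until depth).map(i => if (i == 0) nextWord else data(i - 1))
      val read = (0 until depth).map(i => if (i == depth - 1) true else !full(i + 1))
      // The reader consumes the word of the last stage when that stage is full.
      if (full(depth - 1)) out += ((cycle, data(depth - 1)))
      // The writer's word is accepted when the first stage is empty.
      if (!full(0)) nextWord += 1
      // All stages update at the same clock edge.
      val newFull = (0 until depth).map(i => if (full(i)) !read(i) else write(i)).toVector
      val newData = (0 until depth).map(i => if (!full(i) && write(i)) din(i) else data(i)).toVector
      full = newFull
      data = newData
    }
    out.toSeq
  }
}
```

For a depth of 4, the first word reaches the reader in clock cycle 4, and further words follow every second clock cycle, matching drawbacks (1) and (2).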
These drawbacks can be solved by a FIFO implementation with a circular buffer.
The circular buffer can be implemented with a memory and read and write pointers.
Rerun/rewrite the test with the other FIFO implementation and compare the band-
width and latency. Synthesize the different FIFO versions and compare the resource
requirements.
$ make uart
Then use your synthesis tool to synthesize the design. The repository contains a
Quartus project for the DE2-115 FPGA board. With Quartus use the play button to
synthesize the design and then configure the FPGA. After configuration, you should
see a greeting message in the terminal.
Extend the blinking LED example with a UART and write 0 and 1 to the serial
line when the LED is off and on. Use the BufferedTx, as in the Sender example.
With the slow output of characters (two per second), you can write the data to
the UART transmit register and can ignore the ready/valid handshake. Extend the
example by writing repeated numbers 0-9 as fast as the baud rate allows. In this
case, you have to extend your state machine to poll the UART status to check if the
transmit buffer is free.
The example code contains only a single buffer for the Tx. Feel free to add the
FIFO that you have implemented to add buffering to the transmitter and receiver.
12 Interconnect
Figure 12.1: A classic computer consisting of a processor (CPU), memory, and I/O;
connected via address, data, and control buses.
Modern computers have different buses for different peripheral devices, for ex-
ample, a dedicated memory bus for external memory and I/O buses for peripheral
devices. Furthermore, modern I/O buses, such as PCI Express, are serial buses and
use point-to-point connections.
Nevertheless, the notion of the classic processor bus with an address bus, a data
bus, and chip select signals is still the mainstream mindset for core
interconnections. We will derive an adaptation of this concept for on-chip interconnect in the
next section.
Figure 12.2: The translation of the off-chip bus concept to an on-chip “bus”.
[Figure: timing diagram of a read transaction with the signals clock, address, rd, ack, and data over clock cycles 1 to 5.]
Figure 12.2 shows the implementation of the bus concept within a chip. The
address, data output, and control signals are connected from the CPU to all periph-
erals. For the data input we use a multiplexer (instead of a tri-state bus). The address
decoder, besides generating the chip select signals, drives the selection of the data
input multiplexer.
With that simple setup we assume that each operation (read or write) can be ex-
ecuted in a single clock cycle. This is only possible for very small systems. We
can extend this by defining that we expect the read result in the next clock cycle,
following the read request. This fits well for on-chip memories with usually syn-
chronous reads that have one clock cycle latency. For IO devices this additional
clock cycle latency relaxes the timing constraints as well. We still assume that a
write is performed in one clock cycle.
If we want to communicate with devices with different or even varying latency,
we need to introduce handshaking. The processor signals the start of a transaction
with a read or write request, and the memory or peripheral device signals the end of
a transaction with an acknowledgment signal.
[Figure: timing diagram of read transactions with the signals clock, address (A1, A2, A3), rd, ack, and data (D1, D2, D3) over clock cycles 1 to 8.]
Data and the acknowledgment are valid for a single clock cycle. The benefit of that
protocol specification is that it allows for a single-cycle transaction. However, the
price is that the handshake process, including decoding is a combinational circuit,
which can lead to issues with the maximum frequency. The standard Wishbone [12]
protocol uses same-cycle acknowledgement. The newer version of Wishbone added
a pipelined protocol.
Same-cycle acknowledgement (or ready signal) has been criticized in [13]. A
single-cycle transaction is usually not realistic in a larger system. Therefore, we
can define a specification where the acknowledge (or busy or ready) signal does not
need to be valid in the request cycle. That paper proposes SimpCon, a protocol
that enables pipelined transactions and avoids the combinational path between the
processor, the address decoding, and the peripheral device.
of the peripheral device. The request from the processor is only a single clock
cycle long. The address bus and the read signal do not need to be driven until the
acknowledgment. Compared to the former protocol, the ack signal needs to be valid
(low or high) no earlier than one clock cycle after the rd command, in clock cycle
3. The first read sequence has two clock cycles latency in this example. It has the
same latency as in the former example. However, as the request needs to be valid
only one clock cycle, we can pipeline requests. Read of addresses A2 and A3 can be
requested back to back, allowing a throughput of 1 data word per clock cycle.
The Patmos processor [20] uses an OCP version with exactly this protocol for ac-
cessing IO devices. Memory is connected via a burst interface. The Patmos Hand-
book [16] gives a detailed description of the used OCP interfaces. Furthermore, we
have started a Chisel repository with multicore devices, such as a network-on-chip,
that implement the described pipelined interface.
The on-chip version of an interconnect definition can be generalized to a point-
to-point connection. The processor and peripheral devices are connected with such
a point-to-point interface to a switching fabric. If the system contains more than
one processor (or master) we need arbitration within the switching fabric to decide
which master is allowed to issue read and write commands.
Listing 12.1 shows an IO device that implements the specification of the pipelined
interconnect. The IO device contains four loadable counters. To address those four
counters we need two address bits. We read the value from the counter with a read
transaction (rd is asserted) and get the result in the next clock cycle (in dout). We
write to a counter with wr asserted and the value set in din.
To implement the delayed acknowledge, we use a single bit register (ackReg) to
delay any asserted rd or wr. As we provide the read result in the clock cycle that
follows the read command and the address is only valid during this command cycle,
we need to store the address in addrReg.
The counters themselves consist of a small register file of 4 elements (a Reg of a
Vec). The counters are initialized to zero by using a Scala Seq, created with fill,
containing the reset values as Chisel constants. That Seq is the input to the VecInit.
The counters are free running and increment by one each clock cycle, except when
a new value is written.
io.ack := ackReg
}
Address          Device
0x0000–0x0fff    ROM
0x1000–0x1fff    RAM
0xf000           UART
0xf010           LEDs
0xf020           Keys
In our example system all devices, whether they be memory or IO devices, are
connected to shared address lines. Therefore, they appear in the shared address
space. To select individual devices, we use address decoding of some upper bits.
Such devices are called memory-mapped devices, and as part of the system design we decide
on an address mapping.
Table 12.1 shows an example address map for a (16-bit) microcontroller. We
assume 16-bit addresses, therefore the range of the addresses is between 0x0000
and 0xffff. At the lowest address (where program execution starts) we map a
read-only memory (ROM) that contains the program. In the next memory area we
map a writable memory (RAM) for the data. We decide to map all IO devices into
the upper area of the address space (above 0xf000), so they are out of the way, in
case we want to extend the memory. In this example we reserve 16 bytes of address
for each IO device. Note that this is a made-up example and that we have all the
flexibility when deciding on an address map.
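The address decoding for this map can be sketched as a plain Scala software model. This is only an illustration of the decoding rule, not hardware from the book: the object name AddressMap, the Device type, and the decode function are hypothetical, and we assume that each IO device occupies the full 16-byte slot reserved for it.

```scala
object AddressMap {
  sealed trait Device
  case object Rom extends Device
  case object Ram extends Device
  case object Uart extends Device
  case object Leds extends Device
  case object Keys extends Device

  // Decode a 16-bit address into the selected device, if any.
  // The upper four address bits select the memory area; within the IO
  // area, bits 11..4 select one 16-byte device slot (an assumption).
  def decode(addr: Int): Option[Device] = {
    require(addr >= 0 && addr <= 0xffff, "16-bit addresses only")
    (addr >> 12) match {
      case 0x0 => Some(Rom) // 0x0000-0x0fff
      case 0x1 => Some(Ram) // 0x1000-0x1fff
      case 0xf =>
        ((addr >> 4) & 0xff) match {
          case 0x00 => Some(Uart) // 0xf000-0xf00f
          case 0x01 => Some(Leds) // 0xf010-0xf01f
          case 0x02 => Some(Keys) // 0xf020-0xf02f
          case _    => None       // unused IO slot
        }
      case _ => None // unmapped area, free for memory extension
    }
  }
}
```

In hardware the same decision is a comparison on a few upper address bits, which is why reserving power-of-two sized slots keeps the decoder small.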
Some IO devices do not have memory-mapped registers, like the counter example
device, but a ready/valid interface, as explained in Section 9.3. The UART, as presented
in Section 11.2, for example, has two such ready/valid interfaces: one for
writing and one for reading a value. A common solution is to map the write and read
channels to one address and drive the corresponding signals on the write or read command.
To signal whether the write channel is ready to receive a new data word or the read
channel has valid data, we map those two signals into a status register at a different
address.
Table 12.2 shows an address mapping for the UART. At the base address (0xf000)
we access a status register on a read and an optional control register on a write. At
the next address (0xf001) we read from a read buffer and write into a transmit buffer.
Bit  Name  Status
0    TDRE  Transmit (TX) data register empty
1    RDRF  Receive (RX) data register full
Table 12.3 shows the mapping of two flags into the status register. Both bits
signal that we can perform a write or a read transaction. When the transmit data
register is empty (TDRE), we can write (send) new data to the transmitter (TX).
When the receive data register is full (RDRF), we can read data from the receiver (RX). The
terminology might sound dated, and for our interface it is.
In fact, this is exactly the mapping of the first serial port for the IBM PC built with
the 8250 chip, and it is still valid.
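The mapping of the two flags into the status register can be sketched in plain Scala as follows. The object UartStatus and its functions are hypothetical names for illustration; only the bit positions of TDRE and RDRF come from the table.

```scala
object UartStatus {
  val TDRE = 0 // bit 0: transmit data register empty
  val RDRF = 1 // bit 1: receive data register full

  // Build the status word from the ready signal of the transmitter
  // and the valid signal of the receiver.
  def pack(txReady: Boolean, rxValid: Boolean): Int =
    (if (txReady) 1 << TDRE else 0) | (if (rxValid) 1 << RDRF else 0)

  // Software that polls the status register uses these two checks.
  def canWrite(status: Int): Boolean = (status & (1 << TDRE)) != 0
  def canRead(status: Int): Boolean = (status & (1 << RDRF)) != 0
}
```

A polling driver would read the status register in a loop and access the data register only once the matching check returns true.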
Note that to use ready and valid in a status register for polling, the ready signal
from the transmitter and the valid signal from the receiver are not allowed to be
deasserted once they have been asserted. If this cannot be guaranteed, two single-
word buffers, as shown in Listing 9.7, can be inserted between the IO interface and
device with the ready/valid interface.
For our memory-mapped devices we define a bundle:
class MemoryMappedIO extends Bundle {
  val address = Input(UInt(4.W))
  val rd = Input(Bool())
  val wr = Input(Bool())
  val rdData = Output(UInt(32.W))
  val wrData = Input(UInt(32.W))
  val ack = Output(Bool())
}
Listing 12.2 shows the memory mapped interface to a streaming device, like a
serial port.
12.3.1 Wishbone
The Wishbone [12] specification is a definition of a point-to-point communication
and not a bus in the classic sense. Wishbone is a public domain standard used by
several open-source IP cores. The Wishbone interface specification is still in the
tradition of microcomputer or backplane buses. However, for an SoC interconnect,
which is usually point-to-point1, this is not the best approach. The master is required
to hold the address and data valid through the whole read or write cycle.
This complicates the connection to a master that has the data valid only for one
cycle. In this case the address and data have to be registered before the Wishbone
connection, or an expensive (in terms of time and resources) multiplexer has to be used.
A register results in one additional cycle of latency. A better approach would be to
register the address and data in the slave. In that case the address decoding in the
slave can be performed in the same cycle as when the address is registered. A sim-
ilar issue, with respect to the master, exists for the output data from the slave: As
it is only valid for a single cycle the data has to be registered by the master when
the master is not reading it immediately. Therefore, the slave should keep the last
valid data at its output even when the Wishbone strobe signal (wb.stb) is no longer
asserted. Holding the data in the slave is usually free in terms of hardware
complexity; it is just a specification issue. In the classic Wishbone specification
there is no way to perform a pipelined read or write. However, the latest Wishbone
specification (B4) also contains a pipelined definition. Note that the specification
now contains two different, not necessarily compatible, definitions.
1 Multiplexers are used instead of buses to connect several slaves and masters.
// ack
ackReg := io.mem.rd || io.mem.wr
io.mem.ack := ackReg
// write to tx
io.tx.bits := io.mem.wrData
io.tx.valid := io.mem.wr
}
12.3.2 AXI
The Advanced Microcontroller Bus Architecture (AMBA) [2] is an interconnection
definition from ARM. The specification defines three different buses: Advanced
High-performance Bus (AHB), Advanced System Bus (ASB), and Advanced Peripheral
Bus (APB). The AHB is used to connect on-chip memory, cache, and external
memory to the processor. Peripheral devices are connected to the APB. A
bridge connects the AHB to the lower-bandwidth APB. An AHB bus transfer can
be one cycle with burst operation. With the APB a bus transfer requires two cycles
and no burst mode is available. Peripheral bus cycles with wait states were added in
version 3 of the APB specification. ASB is the predecessor of AHB and is not
recommended for new designs (ASB uses both clock phases for the bus signals, which is
very uncommon in today’s synchronous designs).
AMBA AXI (Advanced eXtensible Interface) and ACE version 4 [3] is the latest
extension to AMBA. AXI introduces out-of-order transaction completion with the
help of a 4-bit transaction ID tag. A ready signal acknowledges the transaction start.
The master has to hold the transaction information (e.g. address) till the interconnect
signals ready. This enhancement ruins the elegant single-cycle address phase from
the original AHB specification.
The AXI bus uses ready/valid handshaking for all signals (read address, read data,
write address, write data, and write response). The decoupling of the write address
and the write data needs a more complex slave that can accept any order of arriving
address and data.
13.1 Debugging
During your design and coding phase you often debug your design. Debugging is the
process of finding defects in your code. Those defects are called bugs. Debugging
is often performed in parallel with writing new code.
One can debug a program by using a debugger or simply by printing interesting
values to the terminal, called printf debugging. In hardware, elements execute
in parallel. Therefore, a common form of hardware debugging is generating waveforms
and watching how signals of interest evolve over time. We call this waveform
debugging.1
A Chisel tester can generate waveforms, which can be viewed, for example, with
GTKWave. However, for quick checks it is also possible to print signal values
during simulation of the circuit. Values are printed at the rising edge of the clock.
1 There is no entry in Wikipedia for this; we should create one.
Alternatively, you can use the behavior of "module name" syntax and refer to
the module with it. This is useful when you have several tests for a single module.
Simple tests start by writing test vectors with poke to the DUT, advancing the
clock, and testing the outputs with an expect. For debugging purposes we can also
peek values and print them out for manual inspection. The code in Listing 13.1 tests
the counter device that we introduced in Chapter 12 as an example IO device.
As you can see, the test covers only a few cases, but is already very long to read.
All those pokes and expects are cumbersome. As a first step, we shall introduce
functions to represent a read and a write request. Those functions abstract away the
manual “bit banging” at the interface pins in the testing code. Listing 13.2 shows a
test with those functions. As a shortcut we also define the function step to advance
the clock.
The read function takes an address as parameter and returns the read value. After
poking the address and the read signal, the function advances the clock by one clock
cycle and deasserts the read signal. In our example device, the read value should
be available after one clock cycle. However, we generalize the read function also for
devices with longer latencies: the read function waits in a loop until ack
becomes true. Note that we use peekBoolean to read a Scala Boolean. However,
if a device has the fault of never asserting ack after a request, the test will hang in an
endless loop. A more robust read function should contain a timeout for the ack polling.
Finally, we read the data from rdData with peekInt() to read a Scala integer value
(concretely, a BigInt to express integers of any size).
The write function takes an address and the data parameters as Scala Int. Similar
to the read function, the values are poked into the device, the clock is advanced by
one clock cycle, and then the write signal is deasserted. Here we also wait in an
endless loop for ack to become true.
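The timeout for the ack polling mentioned above can be factored into a small helper. The following is a minimal sketch in plain Scala, independent of any test framework; the object Polling and the function waitFor are hypothetical names. We assume the caller passes one function that samples ack and one that advances the clock by one cycle.

```scala
object Polling {
  // Advance the simulation until `done` returns true.
  // Returns the number of cycles waited and throws an exception
  // when `maxCycles` have elapsed without `done` becoming true.
  def waitFor(maxCycles: Int)(done: () => Boolean)(step: () => Unit): Int = {
    var cycles = 0
    while (!done()) {
      if (cycles >= maxCycles)
        throw new RuntimeException(s"no ack after $maxCycles cycles")
      step()
      cycles += 1
    }
    cycles
  }
}
```

Within the read and write functions, done would peek the ack signal with peekBoolean and step would advance the clock by one cycle, so a faulty device that never asserts ack makes the test fail instead of hang.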
With those three functions available, we can write more readable tests with fewer
lines of code. This testing code already covers more cases than the original bit-
banging tester.2
If you have a large test suite, you may wish to run only a subset of your tests as
part of a continuous integration run. The easiest way to achieve this, while still
running only a single sbt command, is by tagging your tests.
object Unnecessary extends Tag("Unnecessary")
By default, all tests are run using sbt test or sbt testOnly *. To leave out tests
tagged with, for example, Unnecessary, you can run:
$ sbt "testOnly * -- -l Unnecessary"
When you run the command, the test will show up as ignored in the terminal:
[info] TagTest:
[info] Integers
...
[info] No tests were executed.
If your tests (and tags) are part of a package, remember to provide the full refer-
ence path to both. The following subsections present advanced testing techniques
that you likely do not need yet. You can skip ahead to the exercise and return later
if you find the need.
Multiple threads are spawned with stacked calls to fork. The spawned threads
represent a hierarchy in which the first thread should not finish before any of the
subsequent threads.
Additional flexibility arises from the ability to supply your own switches to the
simulator command that starts the backend. This is done by using VerilatorFlags
to add switches to the Verilator simulation command, or VerilatorCFlags to add
switches to GCC. They should be in the list of annotations along with the backend
annotation. You need to refer to the tool’s user manual to find a detailed list of com-
mand line arguments. Note that VerilatorFlags and VerilatorCFlags annotations
are advanced features that should generally not be needed. Furthermore, the flags
are not guaranteed to remain stable.
Note that ChiselTest 0.3.4 and later support code coverage measures directly in
simulation. To support this, make sure to install Verilator version 4.028 or newer.
Also, beware that different simulators work in different ways. Verilator is a so-
called synchronous simulator, which means that it runs updates only at the rising
edge of the clock and thus does not support latches. It also does not officially support
multiple clocks. VCS, on the other hand, is an event-based simulator, which is
significantly more detailed in its simulations and supports all synthesizable Verilog
constructs. Generally, for single-clock circuits, Verilator is the fastest and most
widely available tool.
13.5 Exercise
Extreme programming is an agile software development style, focusing on quick
turnaround times and a strong dependency on unit tests. In its pure form one writes
the tests first, before implementing a feature. This style is not used very often in real
life. However, exploring it may help to focus on testing as an important part of
developing artifacts.
Therefore, the proposed exercise is to write test benches for designs that you
have not yet implemented. Pick one of the small projects from Chapter 7, e.g., the
debouncing circuit or the majority-based filtering design, and write tests for it. Then
implement the hardware design itself.
Explore the experience of this little experiment. Did you implement tests that
found errors in your design? If all tests pass, are you sure you have tests that cover
a reasonable design space? How do you test your tests? Add a fault into your DUT
and see if your tests will catch it.
As you work through this exercise you may experience an unpleasant feeling that
testing is hard and it is probably impossible to catch all errors.3 However, there is
hope in recent development in formal verification to complement testing. The topic
of formal verification with Chisel will be covered in a future edition of this book.
3 A famous quote by Dijkstra is “Program testing can be used to show the presence of bugs, but never
to show their absence!”
14 Design of a Processor
cumulator is 0. Leros branches are relative to the current instruction and can branch
forward and backward around 2000 instructions. For larger control flow changes
and for function calls and returns, Leros has a jump-and-link (jal) instruction. That
instruction jumps to the address that is in the accumulator and stores the address of
the following instruction into a register. That value can then be used to return from
a function with jal.
The accumulator and the register file are 32 bits wide in our current
implementation.1
Table 14.1 shows the instruction set of Leros. A represents the accumulator, PC
is the program counter, i is an immediate value (0 to 255), Rn a register n (0 to 255),
o a branch offset relative to the PC, and AR an address register for memory access.
The following code snippet shows examples of Leros instructions in assembly:
loadi 1
addi 2
ori 0x50
andi 0x1f
subi 0x13
loadi 0xab
addi 0x01
subi 0xac
scall 0
We can see that each instruction consists of the instruction name (also called opcode
mnemonic) and a constant. The constant can be written in decimal or hexadecimal
notation. The code shows immediate versions of load, arithmetic, and logic instructions.
The last instruction (scall 0) is a system call and ends the execution (or
simulation). This short program is part of the Leros test suite. The convention of the test
is that at the end of the program the accumulator shall contain 0.
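To see that the program above indeed ends with 0 in the accumulator, we can trace it with a few lines of plain Scala. This is a sketch of the immediate instructions only, not the Leros simulator: the object LerosImmModel is a hypothetical name, and as a simplification we treat the constants as plain non-negative values without sign extension.

```scala
object LerosImmModel {
  // Execute (mnemonic, constant) pairs on a 32-bit accumulator that
  // starts at 0. Execution stops at the first scall instruction.
  def run(prog: Seq[(String, Int)]): Int =
    prog.takeWhile(_._1 != "scall").foldLeft(0) { case (acc, (op, i)) =>
      op match {
        case "loadi" => i
        case "addi"  => acc + i
        case "subi"  => acc - i
        case "andi"  => acc & i
        case "ori"   => acc | i
        case "xori"  => acc ^ i
        case other   => throw new IllegalArgumentException(s"unknown: $other")
      }
    }

  // The assembly program from the text.
  val example: Seq[(String, Int)] = Seq(
    "loadi" -> 1, "addi" -> 2, "ori" -> 0x50, "andi" -> 0x1f, "subi" -> 0x13,
    "loadi" -> 0xab, "addi" -> 0x01, "subi" -> 0xac, "scall" -> 0)
}
```

The first five instructions compute 1 + 2 = 3, 3 | 0x50 = 0x53, 0x53 & 0x1f = 0x13, and 0x13 - 0x13 = 0; the last three compute 0xab + 1 - 0xac = 0.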
Instructions are 16 bits wide. The higher byte is used to encode the instruc-
tion, the lower byte contains either an immediate value, a register number, or a
branch offset (part of the branch offset uses also bits in the upper byte). For example
00001001.00000010 is an add immediate instruction that adds 2 to the accumulator,
whereas 00001000.00000011 adds the content of R3 to the accumulator. For branches
we use 3 of the instruction bits for larger offsets.
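The two example encodings can be reproduced with a small plain Scala sketch. The object LerosEncode and its functions are hypothetical names; the opcode values are taken from the two bit patterns in the text, with the unused bit set to 0.

```scala
object LerosEncode {
  val ADD = 0x08  // 00001000: add a register
  val ADDI = 0x09 // 00001001: add an immediate value

  // Upper byte: opcode, lower byte: immediate value or register number.
  def encode(opcode: Int, operand: Int): Int =
    ((opcode & 0xff) << 8) | (operand & 0xff)

  def opcodeOf(instr: Int): Int = (instr >> 8) & 0xff
  def operandOf(instr: Int): Int = instr & 0xff

  // Print an instruction as two bytes in binary, separated by a dot.
  def toBits(instr: Int): String = {
    val s = ("0" * 16 + Integer.toBinaryString(instr & 0xffff)).takeRight(16)
    s.substring(0, 8) + "." + s.substring(8)
  }
}
```

For example, toBits(encode(ADDI, 2)) gives the add immediate pattern from the text, and toBits(encode(ADD, 3)) gives the register variant.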
Listing 14.1 shows the encoding of the instructions in the upper 8 bits of each
instruction. Not all instruction bits are currently used (unused bits are marked with -).
1 We try to keep it configurable to be able to also implement 16-bit or 64-bit versions of Leros.
+--------+----------+
|00000---| nop      |
|000010-0| add      |
|000010-1| addi     |
|000011-0| sub      |
|000011-1| subi     |
|00010---| sra      |
|00011---| -        |
|00100000| load     |
|00100001| loadi    |
|00100010| and      |
|00100011| andi     |
|00100100| or       |
|00100101| ori      |
|00100110| xor      |
|00100111| xori     |
|00101001| loadhi   |
|00101010| loadh2i  |
|00101011| loadh3i  |
|00110---| store    |
|001110-?| out      |
|000001-?| in       |
|01000---| jal      |
|01001---| -        |
|01010---| ldaddr   |
|01100-00| ldind    |
|01100-01| ldindb   |
|01100-10| ldindh   |
|01110-00| stind    |
|01110-01| stindb   |
|01110-10| stindh   |
|1000nnnn| br       |
|1001nnnn| brz      |
|1010nnnn| brnz     |
|1011nnnn| brp      |
|1100nnnn| brn      |
|11111111| scall    |
+--------+----------+
[Figure: Leros datapath. Labeled blocks: PC, Instr. Memory, Decode, ALU, A, AR, Data Memory, IO; labeled signals: address, rdData, wrData, wrEna.]
// Alu ops
val nop = 0
val add = 1
val sub = 2
val and = 3
val or = 4
val xor = 5
val ld = 6
val shr = 7
An ALU usually has two operand inputs (call them a and b), an operation op (or
opcode) input to select the function and an output y. Listing 14.2 shows the ALU.
We first define shorter names for the three inputs. The switch statement defines
the logic for the computation of res. Therefore, it gets a default assignment of 0.
The switch statement enumerates all operations and assigns the expression accord-
ingly. All operations map directly to a Chisel expression. In the end, we assign the
result res to the ALU output y.
val op = io.op
val a = accuReg
val b = io.din
val res = WireDefault(a)
switch (op) {
is(nop.U) {
res := a
}
is(add.U) {
res := a + b
}
is(sub.U) {
res := a - b
}
is(and.U) {
res := a & b
}
is(or.U) {
res := a | b
}
is(xor.U) {
res := a ^ b
}
is(shr.U) {
res := a >> 1
}
is(ld.U) {
res := b
}
}
io.accu := accuReg
}
For the testing, we write the ALU function in plain Scala, as shown in List-
ing 14.3.
While this duplication of the hardware written in Chisel by a Scala implementation does
not detect errors in the specification, it is at least a sanity check. We use some
corner case values as the test vector:
op match {
case 0 => a
case 1 => a + b
case 2 => a - b
case 3 => a & b
case 4 => a | b
case 5 => a ^ b
case 6 => b
case 7 => a >>> 1
case _ => -123 // This shall not happen
}
}
Full, exhaustive testing for 32-bit arguments is not possible, which was the reason
we selected some corner cases as input values. Beside testing against corner cases,
it is also useful to test against random inputs:
val randArgs =
  Seq.fill(10)(scala.util.Random.nextInt())
test(randArgs)
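Because the Scala reference model is itself code that can be wrong, it can help to check it against algebraic identities over the same corner case and random arguments. The following plain Scala sketch is one possible way to do this; the object AluModelCheck, the identities chosen, and the concrete list of corner cases are our assumptions and are not taken from the Leros code base.

```scala
object AluModelCheck {
  // Reference model, using the same numbering as the ALU operation constants.
  def alu(op: Int, a: Int, b: Int): Int = op match {
    case 0 => a
    case 1 => a + b
    case 2 => a - b
    case 3 => a & b
    case 4 => a | b
    case 5 => a ^ b
    case 6 => b
    case 7 => a >>> 1
  }

  // Hypothetical corner cases for 32-bit arguments.
  val corners: Seq[Int] = Seq(0, 1, -1, Int.MaxValue, Int.MinValue)

  // Identities that hold for every pair of 32-bit arguments.
  def holds(a: Int, b: Int): Boolean =
    alu(2, alu(1, a, b), b) == a &&               // (a + b) - b == a, also on overflow
    alu(5, alu(5, a, b), b) == a &&               // applying xor twice restores a
    alu(7, a, b) >= 0 &&                          // the logical shift clears the sign bit
    (alu(3, a, b) | alu(5, a, b)) == alu(4, a, b) // (a & b) | (a ^ b) == a | b

  // Check all pairs of the given arguments.
  def checkAll(args: Seq[Int]): Boolean =
    (for (a <- args; b <- args) yield holds(a, b)).forall(identity)
}
```

A call such as checkAll(corners ++ Seq.fill(10)(scala.util.Random.nextInt())) combines both kinds of inputs. Such checks do not replace the comparison against the hardware, but they catch typos in the model itself.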
You can run the tests within the Leros project with
[info] AluAccuTest:
[info] AluAccu
[info] - should pass
[info] Run completed in 1 second, 794 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
For the decode component, we define a Bundle for the output, which is later used in
the execution state and fed partially into the ALU. The DecodeOut Bundle contains
more fields, not shown here. See the original Leros code base for the details.
class DecodeOut extends Bundle {
  val operand = UInt(32.W)
  val enaMask = UInt(4.W)
  val op = UInt()
  val off = SInt(10.W)
  val isRegOpd = Bool()
  val useDecOpd = Bool()
  val isStore = Bool()
  // ... and more fields
}
We also define a companion object for the DecodeOut class that includes a function
default() to create a DecodeOut object and sets all fields to default values.
object DecodeOut {
Decode takes as input an 8-bit opcode and delivers the decoded signals as output.
Those driving signals are assigned a default value by using the default function to
create that object.
class Decode() extends Module {
  val io = IO(new Bundle {
    val din = Input(UInt(16.W))
    val dout = Output(new DecodeOut)
  })
  import DecodeOut._
The decoding itself is just a large switch statement on the part of the instruction that
represents the opcode (in Leros, for most instructions, the upper 8 bits).
switch(instr(15, 8)) {
  is(ADD.U) {
    d.op := add.U
    d.enaMask := MaskAll
    d.isRegOpd := true.B
  }
  is(ADDI.U) {
    d.op := add.U
    d.enaMask := MaskAll
    d.useDecOpd := true.B
  }
  is(SUB.U) {
    d.op := sub.U
    d.enaMask := MaskAll
    d.isRegOpd := true.B
  }
  is(SUBI.U) {
    d.op := sub.U
    d.enaMask := MaskAll
    d.useDecOpd := true.B
  }
  is(SHR.U) {
    d.op := shr.U
    d.enaMask := MaskAll
  }
  // ...
Additionally, the decode module also generates a sign-extended version of the constant
in the instruction and computes the offset for the indirect load and store
instructions.
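Sign extension itself is simple: the most significant bit of the narrow value is copied into all upper bits. A minimal plain Scala sketch, with the hypothetical names SignExtend and sext, uses a pair of shifts for this:

```scala
object SignExtend {
  // Sign extend the lower `bits` bits of `value` to a 32-bit Int.
  // The left shift moves the sign bit of the narrow value to bit 31,
  // the arithmetic right shift then copies it into the upper bits.
  def sext(value: Int, bits: Int): Int = {
    require(bits > 0 && bits <= 32, "width must be between 1 and 32")
    val shift = 32 - bits
    (value << shift) >> shift
  }
}
```

With this model, an 8-bit constant of 0xff becomes -1, while 0x7f stays 127.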
define a function getProgram that calls the assembler. For branch destinations, we
need a symbol table, which we collect in a Map. A classic assembler runs in two
passes: (1) collect the values for the symbol table and (2) assemble the program
with the symbols collected in the first pass. Therefore, we call assemble twice with
a parameter to indicate which pass it is.
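def getProgram(prog: String) = {
  assemble(prog, false) // pass 1: collect the symbol table
  assemble(prog, true)  // pass 2: assemble with the collected symbols
}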
The assemble function starts with opening the source file and defining two helper
functions to parse the two possible operands: (1) an integer constant (allowing dec-
imal or hexadecimal notation) and (2) to read a register number.
def assemble(prog: String, pass2: Boolean): Array[Int] = {
Listing 14.4 shows the core of the assembler for Leros. A Scala match expression
covers the core of the assembly function.
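The two-pass idea can be shown with a strongly reduced assembler in plain Scala. This is a sketch and not the Leros assembler from Listing 14.4: the object TwoPassAsm is a hypothetical name, the program is passed as a string instead of a file name, only a handful of instructions are supported, and we assume that a branch carries a 12-bit offset relative to the branch instruction in its lower bits.

```scala
import scala.collection.mutable

object TwoPassAsm {
  // Upper-byte opcodes for a small subset, taken from the encoding table
  // (unused bits set to 0).
  private val opcodes = Map("nop" -> 0x00, "addi" -> 0x09, "subi" -> 0x0d,
    "loadi" -> 0x21, "scall" -> 0xff)
  // Branch opcodes occupy the upper four bits of the instruction.
  private val branches = Map("br" -> 0x8, "brz" -> 0x9, "brnz" -> 0xa)

  // Symbol table: label name -> instruction address.
  private val symbols = mutable.Map[String, Int]()

  // Parse a decimal or hexadecimal constant.
  private def toInt(s: String): Int =
    if (s.startsWith("0x")) Integer.parseInt(s.substring(2), 16) else s.toInt

  def assemble(prog: String, pass2: Boolean): Array[Int] = {
    val code = mutable.ArrayBuffer[Int]()
    var pc = 0
    for (line <- prog.split("\n")) {
      val word: Option[Int] = line.trim.split("\\s+").filter(_.nonEmpty) match {
        case Array() => None // empty line
        case Array(label) if label.endsWith(":") =>
          if (!pass2) symbols(label.dropRight(1)) = pc // collect in pass 1
          None
        case Array(op, target) if branches.contains(op) =>
          // Symbols are only complete in pass 2; use a dummy offset before.
          val off = if (pass2) symbols(target) - pc else 0
          Some((branches(op) << 12) | (off & 0xfff))
        case Array(op, arg) => Some((opcodes(op) << 8) | (toInt(arg) & 0xff))
        case Array(op) => Some(opcodes(op) << 8)
        case _ => throw new IllegalArgumentException(s"cannot parse: $line")
      }
      word.foreach { w => code += w; pc += 1 }
    }
    code.toArray
  }

  def getProgram(prog: String): Array[Int] = {
    symbols.clear()
    assemble(prog, false) // pass 1: collect labels
    assemble(prog, true)  // pass 2: generate code
  }
}
```

The first pass is needed for forward branches: when the assembler reaches a br to a label further down, the address of that label is only known after the whole program has been scanned once.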
size tools cannot map to an on-chip memory. Using the MLIR backend shall fix this issue. Another
workaround for this issue, as done in the Patmos project for the bootloader, is to generate Verilog
code that fits the synthesis tools and include it as a black box.
switch(stateReg) {
  is(fetch) {
    stateReg := execute
  }
  is(execute) {
    stateReg := fetch
  }
}
The state machine just switches between the two states. In the fetch state we fetch
an instruction from the instruction memory and also decode that instruction. We
also start a read operation from the data memory in the fetch state, as the data
memory is synchronous and needs one clock cycle to deliver the read result.
In the execute state we compute a new value for the accumulator or store the read
result from the data memory into the accumulator. We also perform a write in the
execute state.
The following code shows the instantiation of the ALU including the accumula-
tor and the two main state registers: the program counter (pcReg) and the address
register (addrReg).
The following code shows the instantiation of the instruction memory. Note that
the instruction memory has as parameter the file name of the program.
val mem = Module(new InstrMem(memAddrWidth, prog))
mem.io.addr := pcNext
val instr = mem.io.instr
The following code shows the instantiation of the decode module. The input of
the decode module is the instruction from the fetch module and the outputs are the
decode signals. As we need those signals in the execute state, they are registered in
decReg.
Listing 14.6 shows the data memory of Leros. The memory is organized in 32-
bit words. To enable byte access to those 32-bit words, the word is split into a
Vec of four 8-bit bytes. A read() operation returns a vector of four bytes that we
concatenate to a 32-bit word with the ## operator. For the write, we split the write
word into those four bytes and use the write mask (wrMask) to select which bytes
are written. The SyncReadMem component contains a write() function that takes a
vector and a write mask as parameters.
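The effect of a masked write on one memory word can be modeled in plain Scala as follows. This is a software sketch of the behavior, not the Chisel code of Listing 14.6; the object ByteMaskModel and its functions are hypothetical names, and we assume that byte 0 is the least significant byte of the word.

```scala
object ByteMaskModel {
  // Split a 32-bit word into four bytes; index 0 is the least significant byte.
  def split(word: Int): Vector[Int] =
    Vector.tabulate(4)(i => (word >>> (8 * i)) & 0xff)

  // Concatenate four bytes back into a 32-bit word.
  def concat(bytes: Seq[Int]): Int =
    bytes.zipWithIndex.map { case (b, i) => (b & 0xff) << (8 * i) }.reduce(_ | _)

  // Write `data` over `old`, but only in the bytes whose mask bit is set.
  def write(old: Int, data: Int, mask: Seq[Boolean]): Int = {
    val merged = split(old).zip(split(data)).zip(mask).map {
      case ((o, d), m) => if (m) d else o
    }
    concat(merged)
  }
}
```

For a byte store, only one of the four mask bits is set, so the other three bytes of the word keep their old values.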
The following code shows the instantiation of the data memory and the connec-
tions of the ports.
val dataMem = Module(new DataMem(memAddrWidth))
14.9 Exercise
This exercise assignment in one of the last chapters is in a very free form. You are
at the end of your learning tour through Chisel and ready to tackle design problems
that you find interesting.
One option is to reread the chapter and read along with all the source code in the
Leros repository, run the test cases, fiddle with the code by breaking it and see that
tests fail.
Another option is to write your own implementation of Leros. The implemen-
tation in the repository is just one possible organization. You could write a Chisel
simulation version of Leros with just a single pipeline stage, or go crazy and super-
pipeline Leros for the highest possible clocking frequency.
A third option is to design your processor from scratch. Maybe the demonstration
of how to build the Leros processor and the needed tools has convinced you that
processor design and implementation is no magic art, but engineering that can be
very joyful.
When you develop a circuit in open-source and share it, for example on GitHub,
this is a very educational act, as others can learn to describe hardware in Chisel from
your code example. However, sharing just the source code forces others to copy
your code into their project. This leads to at least two problems: (1) Two copies are
never the same.1 This means that changes will happen to the source of one copy and
they are then not in sync anymore. (2) It is cumbersome to update the copy when
the original design has been improved with a bug fix or a new feature.
A better approach is to publish that open-source circuit as a library. Compiled
Chisel code consists simply of Java class files, and those class files are platform independent.
Therefore, this is an ideal way to share Chisel libraries. Java (and Scala) have
a long tradition and good infrastructure to support public sharing of libraries with
unique group identifiers and version control. That is also the way Chisel itself and
some support libraries are published.
This section describes the steps needed to publish a Chisel library. As the tools
you use for publishing may change quickly, consider finding the latest information
on the Internet. A good blog entry on the topic can be found here.
1 I learned this phrase from Doug Locke during discussion sessions developing the safety-critical
specification for Java.
15 Contributing to Chisel
The ip-contributions library also contains the UART and the FIFOs, described
in this book. Modern IDEs let you automatically download the source code of the
library for inspection, when configured in build.sbt.
If you have a Chisel circuit that you would like to share, consider contributing it
to ip-contributions. Contribution starts with a git pull request of your addition.
This will start a friendly review process.
15.1.2 Prerequisite
Maven Central is one of the largest repositories for hosting software libraries. Pub-
lishing to Maven Central is easiest via Sonatype. Sonatype offers free hosting
of open-source projects via the Sonatype Repository. The following initial steps are
needed before publishing a library:
2. You need a unique groupId, which is usually a domain name in reverse order,
e.g., edu.berkeley.cs. You can also use your GitHub account as a groupId,
for example, mine is io.github.schoeberl. You register this groupId by
opening an issue. This is a manual process where you get a request to prove
that you own the requested groupId. When using the GitHub domain name,
you are requested to set up a repository to show your ownership.
credentials += Credentials(
"Sonatype Nexus Repository Manager",
"oss.sonatype.org",
"<user name>",
"<password>"
)
4. All artefacts must be signed with a PGP key pair. You can use the open-source
GNU Privacy Guard. You can create, list, and upload your public PGP key
with:
gpg --gen-key
gpg --list-keys
gpg --keyserver keyserver.ubuntu.com --send-keys keyID
2. Add information about the library into build.sbt. Here as an example the
relevant section of build.sbt in ip-contributions:
name := "ip-contributions"
version := "0.4.0"
publishTo := Some(
if (isSnapshot.value)
Opts.resolver.sonatypeSnapshots
else
Opts.resolver.sonatypeStaging
)
sbt publishSigned
In my setup the signing from sbt does not work, so I have to copy out the pgp
command to sign, something similar to:
sbt sonatypeRelease
Watch out during the publish local command for the version string of the pub-
lished library, which contains the string SNAPSHOT. If you use the tester and the
published version is not compatible with the Chisel SNAPSHOT, fork and clone the
chisel-tester repo as well and publish it locally.
To test your changes in Chisel, you probably also want to set up a Chisel project,
for example, by forking/cloning an empty Chisel project, renaming it, and removing
the .git folder from it.
Change the build.sbt to reference the locally published version of Chisel. Compile
your Chisel test application and take a close look to ensure that it picks up the
locally published version of the Chisel library (there is also a SNAPSHOT version
published, so if, for example, the Scala version is different between your Chisel library
and your application code, it picks up the SNAPSHOT version from the server
instead of your locally published library).
See also some notes at the Chisel repo.
15.2.2 Testing
When you change the Chisel library, you should run the Chisel tests. In an sbt-based
project, they are usually run with:
$ sbt test
Furthermore, if you add functionality to Chisel, you should also provide tests for
the new features.
15.3 Exercise
Invent a new operator for the UInt type, implement it in the Chisel library, and
write some usage/test code to explore the operator. It does not need to be a useful
operator; just anything will be good, for example, a ? operator that delivers the
left-hand side if it is different from 0 and the right-hand side otherwise. Sounds like
a multiplexer, right? How many lines of code did you need to add?3
As simple as this was, please do not be tempted to fork the Chisel project and add
your little extensions. Changes and extensions shall be coordinated with the main
developers. This exercise was just a simple exercise to get you started.
If you are getting bold, you could pick one of the open issues and try to solve
it. Then contribute with a pull request to Chisel. However, probably first watch
the style of development in Chisel by watching the GitHub repositories. See how
changes and pull requests are handled in the Chisel open-source project.
3 A quick and dirty implementation needs just two lines of Scala code.
Source Access
This book is available in open source. The repository also contains slides for a
digital design course with Chisel and all Chisel examples:
https://github.com/schoeberl/chisel-book
A collection of medium-sized examples, most of which are referenced in the
book, is also available in open source. This collection also contains projects for various
popular FPGA boards: https://github.com/schoeberl/chisel-examples
A Reserved Keywords
Several keywords are reserved in Chisel (and Scala) and cannot be used as identifiers
for your hardware design. Table A.1 lists the reserved words from Scala.
Table A.2 lists the reserved words added by the Chisel library. In contrast to
the Scala reserved word listing, it also contains type/class names defined by Chisel.
Although technically possible, you should also avoid using Chisel (and Scala) op-
erators, such as + or <<, for example.
B Chisel Projects
Chisel is not (yet) used in many projects. Therefore, open-source Chisel code to
learn the language and the coding style is rare. Here we list several projects we are
aware of that use Chisel and are in open source.
NoC is a state-of-the-art design with wormhole routing, credits for flow con-
trol, and virtual channels. OpenSoC Fabric is still using Chisel 2.
DANA is a neural network accelerator [7] that integrates with the RISC-V Rocket
processor using the Rocket Custom Coprocessor (RoCC) interface [8]. DANA
supports inference and learning.
Chiselwatt is an implementation of the POWER Open ISA. It includes instructions
to run Micropython.
VTA Hardware Design Stack is an accelerator for machine learning for the Apache
TVM machine learning compiler framework.
If you know an open-source project that uses Chisel, please drop me a note so I
can include it in a future edition of the book.
C Acronyms
FF flip-flop
IC instruction count
IC integrated circuit
IO input/output
JIT just-in-time
LC logic cell
MMIO memory-mapped IO
MUX multiplexer
OO object oriented
OOO out-of-order
OS operating system
Bibliography
[3] ARM. AMBA AXI and ACE protocol specification AXI3, AXI4, and AXI4-Lite
ACE and ACE-Lite. https://developer.arm.com/documentation/ihi0022/e/, 2011.
[4] Krste Asanović, Rimas Avizienis, Jonathan Bachrach, Scott Beamer, David
Biancolin, Christopher Celio, Henry Cook, Daniel Dabbelt, John Hauser,
Adam Izraelevitz, Sagar Karandikar, Ben Keller, Donggyu Kim, John Koenig,
Yunsup Lee, Eric Love, Martin Maas, Albert Magyar, Howard Mao, Miquel
Moreto, Albert Ou, David A. Patterson, Brian Richards, Colin Schmidt,
Stephen Twigg, Huy Vo, and Andrew Waterman. The rocket chip generator.
Technical Report UCB/EECS-2016-17, EECS Department, University of
California, Berkeley, Apr 2016.
[5] Jonathan Bachrach, Huy Vo, Brian Richards, Yunsup Lee, Andrew Waterman,
Rimas Avizienis, John Wawrzynek, and Krste Asanović. Chisel: constructing
hardware in a Scala embedded language. In Patrick Groeneveld, Donatella
Sciuto, and Soha Hassoun, editors, The 49th Annual Design Automation
Conference (DAC 2012), pages 1216–1225, San Francisco, CA, USA, June 2012.
ACM.
[6] William J. Dally, R. Curtis Harting, and Tor M. Aamodt. Digital design using
VHDL: A systems approach. Cambridge University Press, 2016.
[7] Schuyler Eldridge, Amos Waterland, Margo Seltzer, Jonathan Appavoo, and
Ajay Joshi. Towards general-purpose neural network computing. In 2015
International Conference on Parallel Architecture and Compilation, PACT 2015,
San Francisco, CA, USA, October 18-21, 2015, pages 99–112, 2015.
[8] Schuyler Eldridge, Amos Waterland, Margo Seltzer, Jonathan Appavoo, and
Ajay Joshi. Towards general-purpose neural network computing.
In 2015 International Conference on Parallel Architecture and Compilation
(PACT), pages 99–112, Oct 2015.
[10] IBM. On-chip peripheral bus architecture specifications v2.1, April 2001.
[14] Martin Schoeberl. Lipsi: Probably the smallest processor in the world. In
Architecture of Computing Systems – ARCS 2018, pages 18–30. Springer
International Publishing, 2018.
[15] Martin Schoeberl, Sahar Abbaspour, Benny Akesson, Neil Audsley, Raffaele
Capasso, Jamie Garside, Kees Goossens, Sven Goossens, Scott Hansen,
Reinhold Heckmann, Stefan Hepp, Benedikt Huber, Alexander Jordan, Evangelia
Kasapaki, Jens Knoop, Yonghui Li, Daniel Prokesch, Wolfgang Puffitsch,
Peter Puschner, André Rocha, Cláudio Silva, Jens Sparsø, and Alessandro
Tocchi. T-CREST: Time-predictable multi-core architecture for embedded
systems. Journal of Systems Architecture, 61(9):449–471, 2015.
[16] Martin Schoeberl, Florian Brandner, Stefan Hepp, Wolfgang Puffitsch, and
Daniel Prokesch. Patmos reference handbook. Technical report, Technical
University of Denmark, 2014.
[17] Martin Schoeberl, David VH Chong, Wolfgang Puffitsch, and Jens Sparsø.
A time-predictable memory network-on-chip. In Proceedings of the 14th
International Workshop on Worst-Case Execution Time Analysis (WCET 2014),
pages 53–62, Madrid, Spain, July 2014.
[18] Martin Schoeberl and Morten Borup Petersen. Leros: The return of the
accumulator machine. In Martin Schoeberl, Thilo Pionteck, Sascha Uhrig, Jürgen
Brehm, and Christian Hochberger, editors, Architecture of Computing Systems
- ARCS 2019 - 32nd International Conference, Proceedings, pages 115–127.
Springer, 2019.
[19] Martin Schoeberl, Luca Pezzarossa, and Jens Sparsø. A minimal network
interface for a simple network-on-chip. In Martin Schoeberl, Thilo Pionteck,
Sascha Uhrig, Jürgen Brehm, and Christian Hochberger, editors, Architecture
of Computing Systems - ARCS 2019, pages 295–307. Springer, 2019.
[20] Martin Schoeberl, Wolfgang Puffitsch, Stefan Hepp, Benedikt Huber, and
Daniel Prokesch. Patmos: A time-predictable microprocessor. Real-Time
Systems, 54(2):389–423, Apr 2018.
[21] Martin Schoeberl, Tórur Biskopstø Strøm, Oktay Baris, and Jens Sparsø.
Scratchpad memories with ownership. In 2019 Design, Automation & Test
in Europe Conference & Exhibition (DATE), 2019.
[22] Bill Venners, Lex Spoon, and Martin Odersky. Programming in Scala, 3rd
Edition. Artima Inc, 2016.
[23] Andrew Waterman, Yunsup Lee, David A. Patterson, and Krste Asanović.
The RISC-V instruction set manual, volume I: Base user-level ISA. Technical
Report UCB/EECS-2011-62, EECS Department, University of California,
Berkeley, May 2011.
[24] Jerry Zhao, Animesh Agrawal, Borivoje Nikolic, and Krste Asanović.
Constellation: An open-source SoC-capable NoC generator. In 2022 15th IEEE/ACM
International Workshop on Network on Chip Architectures (NoCArc), pages
1–7, 2022.
[25] Michael Zimmer. Predictable Processors for Mixed-Criticality Systems and
Precision-Timed I/O. PhD thesis, EECS Department, University of California,
Berkeley, Aug 2015.
Index
UART, 162
Vcd, 40
VCS, 205
Vec, 18
Vector, 18
Verification, 199
Verilator, 205
Verilog, 32
Waveform, 40
Waveform diagram, 77
when, 64
Wire, 14, 25
Wishbone, 194