Berkeley Analog Generator
Nicholas Werblun
Vladimir Stojanovic, Ed.
May 1, 2019
Copyright © 2019, by the author(s).
All rights reserved.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission.
Closing the Analog Design Loop with the Berkeley Analog Generator
by
Nicholas Werblun
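Master of Science
in
Electrical Engineering and Computer Sciences
in the
Graduate Division
of the
University of California, Berkeley
Committee in charge:
Professor Vladimir Stojanović, Chair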
Spring 2019
Abstract
Closing the Analog Design Loop with the Berkeley Analog Generator
by
Nicholas Werblun
Master of Science in Electrical Engineering and Computer Sciences
University of California, Berkeley
Professor Vladimir Stojanović, Chair
Analog and mixed-signal IC design is notoriously difficult and slow, due in large part to
layout. In modern fabrication processes with very small devices, interconnect parasitics can
be significant and can drastically change the behavior of a circuit. As a result, simulations
of a circuit's behavior are unreliable until the interconnect parasitics have been extracted
from the layout and included in the simulation.
The Berkeley Analog Generator (BAG) is a Python-based tool that interfaces with the
Cadence Virtuoso software [3] and aims to solve the above problem. BAG lets the user
write parametrizable generator scripts that automatically generate the entire layout and
schematic, run the layout-versus-schematic (LVS) and post-layout extraction (PEX) tools,
and export the results in a time that ranges from seconds to minutes depending on circuit
complexity. Designers who have decided on a topology can write a layout and schematic
generator script once, in a high-level programming language with a class-based hierarchy;
any later change to the circuit then only requires changing the corresponding parameter
file containing the circuit specifications. Additionally, BAG allows automation of simulation
and of post-processing of simulation data, supports higher-level design scripts that
encapsulate designer insights and methodology, and opens the door to automated,
optimizer-driven circuit design.
This report shows examples of many common circuit blocks and their BAG implementation
in an advanced process node, as well as an example of how BAG can be used to speed up
the design process. Although generator scripting offers implementation in a higher-level
language, certain implementation strategies and methodologies work better than others.
This report therefore presents a systematic generator-writing methodology and illustrates
it on a set of typical analog and mixed-signal blocks found in a high-speed link front end.
In three months, a library of generators ranging from small basic circuits to entire receiver
chains was written; then, in roughly two weeks, an LVS/PEX-verified design for a 25Gbps
optical communication link receiver in a 14nm FinFET process was created using BAG cells
and test benches. Further possibilities and uses of BAG are also discussed.
Acknowledgments
Thank you to Professor Vladimir Stojanović for the advising and guidance through this
project. Starting is often the hardest part, as is finding a direction to follow. A huge thank
you to Sidney Buchbinder and Ruocheng Wang for mentoring me through my degree. Thank
you for answering my hundreds of questions and teaching me roughly one semester's worth
of information in just a few weeks. I asked for design experience and I got it, and you
helped me through it.
Additional thanks to Olivia for telling me to join the group. Thanks to Kourosh for
making the test bench generators and explaining them to me. Thanks also to all the rest of
team Vlada: Krishna, Panos, Christos, Pavan, Nandish, Taewhan and Eric for making my
research experience go smoothly. Finally, thank you to Eric Chang et al. for creating
BAG so that I never had to do layout by hand.
Chapter 1
What Makes Analog Design Difficult?
If you know anyone who fits into the category of analog, RF or mixed signal designers, you
may already be familiar with the disheartened, defeated look they often display. Is it that IC
design attracts people of this nature? Or is it that IC design slowly gnaws at the existence
of engineers until they are but a shell of their former selves? For this report, we will assume
the latter. We will also assume this is due to standard practices more than any deep-rooted
truths about IC design or people in general.
1. Simulate the chosen design across all specs to ensure proper behavior.
If the circuit fails at any of these steps, the design must be modified and the steps
repeated. Only when the design has passed potentially hundreds of different tests will the
chip be sent for fabrication. Depending on the complexity of the chip and the size of the
team, this process can take anywhere from months to a year or more.
before they can start the cycle again. By Amdahl's law¹ [2], speeding up the layout
should allow one to close the design loop faster and reduce the pain of the IC design process.
¹ Amdahl's law is generally referenced in the context of computer program runtime, but the
concept of speeding up fractions of a process applies here as well.
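The Amdahl's law argument can be made concrete with a small calculation. The sketch below is illustrative only: the function name and the numbers (layout taking 60% of one design iteration, a 100x layout speedup) are assumptions, not measurements from this work.

```python
def loop_speedup(layout_fraction: float, layout_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the layout portion gets faster."""
    if not 0.0 <= layout_fraction <= 1.0:
        raise ValueError("layout_fraction must be between 0 and 1")
    if layout_speedup <= 0.0:
        raise ValueError("layout_speedup must be positive")
    return 1.0 / ((1.0 - layout_fraction) + layout_fraction / layout_speedup)


# Hypothetical numbers: layout is 60% of one iteration and becomes 100x faster.
overall = loop_speedup(layout_fraction=0.6, layout_speedup=100.0)
print(f"overall design-loop speedup: {overall:.2f}x")
```

The overall gain is bounded by the part of the loop that is not layout, which is why the later chapters also automate simulation and post-processing.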
Chapter 2
The Berkeley Analog Generator
As discussed in [3], BAG is a framework that allows users to create, use, and test
process-portable analog generators. Designers can create template schematics and write
a scripted layout generator that incorporates their design methodology in a
technology-agnostic, parametrized way. At the highest level, the user enters parameters
such as device dimensions and passive component values into a specification file (examples
are shown later in this chapter and in Chapter 3), and a script generates a schematic, an
LVS-tested layout, and a PEX netlist. The main advantage is that the delay of the design
loop discussed in Chapter 1 is significantly reduced, since post-layout effects can be
directly included in the flow.
As will be discussed in later sections, there are a few main components to a “complete”
generator. At the lowest levels are the layout generator and schematic generator. These are
responsible for the physical process of generating the circuit representation and layout. In
order to use these, there is also a top level script that reads a parameter file and runs the
schematic/layout generators with these specifications. The top level is also responsible for
deciding whether or not to run LVS/PEX on the generated instance.
At an even higher level is the notion of a design script and design managers. The design
manager is a class responsible for using the top-level generator mentioned previously and
overseeing the process of running tests and post-processing the test results. The design
manager has associated test bench scripts, which are responsible for connecting the
generated device into a previously made test harness. The test bench script maps the pins
of the instance to the pins of the test harness and runs predetermined SPICE simulations
(e.g., AC, transient, S-parameters) before exporting the results to Python. The design
manager can then pass the results to a measurement manager, which can process them, plot
them, and so on. The entire process can be visualized as in Figure 2.1.
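The division of labor described above can be sketched in plain Python. This is not BAG's API: every class and function name below (`GeneratedInstance`, `run_top_level`, `design_manager`, and the stub test bench and measurement functions) is hypothetical and only mirrors the roles named in the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class GeneratedInstance:
    """Stand-in for what a top-level generator run produces (hypothetical)."""
    params: Dict[str, float]
    lvs_passed: bool = False
    pex_netlist: str = ""


def run_top_level(params: Dict[str, float], run_lvs_pex: bool = True) -> GeneratedInstance:
    """Hypothetical top level: 'generate' schematic and layout, optionally run LVS/PEX."""
    inst = GeneratedInstance(params=dict(params))
    if run_lvs_pex:
        inst.lvs_passed = True                # a real flow would invoke the LVS tool here
        inst.pex_netlist = "dut_pex.netlist"  # placeholder name for the extracted netlist
    return inst


def design_manager(params, testbenches: Dict[str, Callable], measure: Callable):
    """Generate one instance, run every test bench on it, then post-process."""
    inst = run_top_level(params)
    if not inst.lvs_passed:
        raise RuntimeError("LVS failed; refusing to simulate")
    raw = {name: tb(inst) for name, tb in testbenches.items()}
    return measure(raw)


# Stub test bench and measurement, standing in for SPICE runs and data processing.
def ac_testbench(inst: GeneratedInstance) -> Dict[str, float]:
    return {"dc_gain": 2.0 * inst.params["nf"]}


def measurement_manager(raw):
    return {"gain_ok": raw["ac"]["dc_gain"] >= 10.0}


report = design_manager({"nf": 8}, {"ac": ac_testbench}, measurement_manager)
print(report)
```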
The notion of a design script is an even higher-level concept, which allows a designer to
encode their design procedure in a closed-loop script. The user can write a script that
computes passive and transistor sizings, lets the design manager generate and test the
post-PEX netlist, and iterates based on the results. The possibility of incorporating
a design script will be discussed later, although a design script is not presented in this work.
Design scripts are discussed in further detail in [3] and [6].
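As a rough illustration of what a closed-loop design script might look like, the sketch below iterates on a finger count until a bandwidth target is met. The `evaluate` callback stands in for "generate, extract, and simulate"; here it is replaced by a toy analytic model with made-up constants, so the names and numbers are assumptions rather than part of BAG or of this work.

```python
import math


def toy_post_layout_bandwidth(nf: int) -> float:
    """Toy stand-in for a post-PEX simulation: gm grows with nf, so does capacitance."""
    gm = 1e-3 * nf                # siemens, hypothetical per-finger transconductance
    cap = 20e-15 + 2e-15 * nf     # farads, fixed load plus per-finger parasitic
    return gm / (2.0 * math.pi * cap)


def design_loop(evaluate, target_bw_hz, nf_start=2, nf_max=64, nf_step=2):
    """Increase the finger count until the evaluated bandwidth meets the target."""
    history = []
    nf = nf_start
    while nf <= nf_max:
        bw = evaluate(nf)
        history.append((nf, bw))
        if bw >= target_bw_hz:
            return nf, history
        nf += nf_step
    return None, history  # no size in range met the target


chosen_nf, history = design_loop(toy_post_layout_bandwidth, target_bw_hz=20e9)
print(chosen_nf, [f"{bw / 1e9:.1f} GHz" for _, bw in history])
```

In a real flow each call to `evaluate` would cost seconds to minutes, so a smarter search than a linear sweep may be worthwhile.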
This template holds only a human-readable description of the connections. The schematic
will be copied over with actual values filled in by BAG afterward. One thing to note is the
presence of seemingly useless transistors, like in the bottom right. These transistors are
used by BAG to properly create dummy transistors in the layout, and anything else can be
removed if not used in the schematic generator. Additionally, the transistor in the bottom
left used for adding stabilization capacitance can also be removed if desired. Example layouts
below will not have this transistor.
Using the schematic template, the user then decides how many rows of transistors will
be in the layout, and assigns a number of transistors to each row. There are a number of
helper functions to generate a data structure containing information about which row the
transistor is in, the drain/source metal directions, number of fingers, etc. which is called the
“initialization step”. An example of how one might set up the rows and initialization for the
amplifier in Figure 2.2 is shown in Listing 2.1 and Listing 2.2. Note that these code blocks
are only portions of the full layout generator and there are, in general, a small number of
extra lines required. This section highlights only the most important portions of a generator.
################################################################################
# 2:
# Initialize the transistors in the design
# Storing each transistor's information (name, location, row, size, etc) in a
# dictionary object allows for convenient use later in the code, and also
# greatly simplifies the schematic generation
# The initialization sets the transistor's row, width, and source/drain net names
# for proper dummy creation
################################################################################
tail_l = layout_helper.initialize_tx(name='tail_l', row=row_tail,
                                     fg_spec='bottom_tail',
                                     deff_net='TAIL')
tail_r = layout_helper.initialize_tx(name='tail_r', row=row_tail,
                                     fg_spec='bottom_tail',
                                     deff_net='TAIL')
bias = layout_helper.initialize_tx(name='bias', row=row_tail,
                                   fg_spec='bottom_bias',
                                   deff_net='IBIAS_TAIL')
in_left = layout_helper.initialize_tx(name='in_left', row=row_input,
                                      fg_spec='bottom_in',
                                      seff_net='TAIL', deff_net='VOUT_N')
in_right = layout_helper.initialize_tx(name='in_right', row=row_input,
                                       fg_spec='bottom_in',
                                       seff_net='TAIL', deff_net='VOUT_P')
top_left = layout_helper.initialize_tx(name='top_left', row=row_mirror,
                                       fg_spec='top',
                                       deff_net='VOUT_N')
top_right = layout_helper.initialize_tx(name='top_right', row=row_mirror,
                                        fg_spec='top',
                                        deff_net='VOUT_P')
# Calculate positions of transistors
# This uses helper functions to place each transistor within a stack/column of a
# specified starting index and
# width, and with a certain alignment (left, right, centered) within that column
layout_helper.assign_tx_column(tx=bias, offset=col_mid, fg_col=fg_mid, align=0)
layout_helper.assign_tx_column(tx=tail_l, offset=col_stack_left,
                               fg_col=fg_stack, align=0)
layout_helper.assign_tx_column(tx=in_left, offset=col_stack_left,
                               fg_col=fg_stack, align=0)
layout_helper.assign_tx_column(tx=top_left, offset=col_stack_left,
                               fg_col=fg_stack, align=0)
layout_helper.assign_tx_column(tx=tail_r, offset=col_stack_right,
                               fg_col=fg_stack, align=0)
layout_helper.assign_tx_column(tx=in_right, offset=col_stack_right,
                               fg_col=fg_stack, align=0)
layout_helper.assign_tx_column(tx=top_right, offset=col_stack_right,
                               fg_col=fg_stack, align=0)
After initializing all transistors, the user must then specify their locations in the layout by
creating fictitious columns in which stacks of transistors will be placed. Based on the number
of fingers each transistor has, the technology-required spacing, and any other spacing (to
route dummies, etc.), the user can compute columns in units of fingers and assign the
transistors as in Listing 2.3.
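The column bookkeeping described here amounts to simple integer arithmetic in units of fingers. The sketch below shows one way it might be done; `plan_columns` and `place_in_column` are hypothetical helpers (not the layout helper used in the listings), and the finger counts and spacing are made-up values.

```python
def plan_columns(stack_fingers, spacing_fingers, start=0):
    """Return (offsets, widths) in fingers for a list of transistor stacks.

    stack_fingers: one list of finger counts per column; a column is as wide
    as the widest transistor stacked in it.
    """
    offsets, widths = [], []
    col = start
    for stack in stack_fingers:
        width = max(stack)
        offsets.append(col)
        widths.append(width)
        col += width + spacing_fingers
    return offsets, widths


def place_in_column(col_offset, col_width, nf, align="left"):
    """Starting finger index of an nf-finger transistor inside a column."""
    if nf > col_width:
        raise ValueError("transistor is wider than its column")
    if align == "left":
        return col_offset
    if align == "center":
        return col_offset + (col_width - nf) // 2
    if align == "right":
        return col_offset + col_width - nf
    raise ValueError(f"unknown alignment: {align}")


# Hypothetical differential amplifier: left stack, bias device, right stack.
offsets, widths = plan_columns([[4, 8, 6], [2], [4, 8, 6]], spacing_fingers=2)
print(offsets, widths)
print(place_in_column(offsets[0], widths[0], nf=4, align="center"))
```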
The final transistor placement step is to set the drain and source orientations. A
transistor's source can be routed up or down, which affects where the gate connections are
placed, and the first diffusion region of each transistor can be either source or drain to
make alignment simpler. There is also a helper function that automatically computes, based
on the number of fingers, which region should be the source or drain relative to another
transistor the user wishes to align to. This is shown in Listing 2.4.
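The parity reasoning behind source/drain alignment can be sketched as follows. Diffusion regions alternate between source and drain along a multi-finger device, so whether two stacked devices line up depends only on their starting columns and on which region each starts with. The functions below are hypothetical and are not the helper functions used in the listings.

```python
def other(region: str) -> str:
    """Swap 's' and 'd'."""
    return "d" if region == "s" else "s"


def diffusion_pattern(nf: int, first: str):
    """Region types of the nf + 1 diffusions of an nf-finger transistor."""
    return [first if i % 2 == 0 else other(first) for i in range(nf + 1)]


def first_region_to_match(ref_first, ref_start_col, tx_start_col, aligned=True):
    """Region type the transistor's leftmost diffusion needs.

    aligned=True puts the same region type in the same column as the reference;
    aligned=False puts the opposite type there (e.g. a source over a drain).
    """
    flipped = (tx_start_col - ref_start_col) % 2 == 1
    ref_at_col = other(ref_first) if flipped else ref_first
    return ref_at_col if aligned else other(ref_at_col)


print(diffusion_pattern(4, "s"))
# Input device starts with source at column 0; the tail device below it should be
# anti-aligned so that its drain sits under the input source.
print(first_region_to_match("s", ref_start_col=0, tx_start_col=0, aligned=False))
```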
################################################################################
# 4: Assign the transistor directions (s/d up vs down)
#
# Specify the directions that connections to the source and connections to the drain
# will go (up vs down). Doing so will also determine how the gate is aligned
# (ie will it be aligned to the source or drain)
# See the bootcamp for more details
# The helper functions used here help to abstract away whether the intended
# source/drain diffusion region of a transistor occurs on the even or odd
# columns of that device (BAG always considers the even columns of a
# device to be the 's'). These helper functions allow a user to specify
# whether the even columns should be the transistors effective source or
# effective drain, so that the user does not need to worry about BAG's notation.
################################################################################

# Set tail bias tx to have source on the leftmost diffusion (arbitrary)
# and source going down
layout_helper.set_tx_directions(tx=bias, seff='d', seff_dir=0)
# Assign the input to be anti-aligned, so that the input source and tail
# drain are vertically aligned
layout_helper.set_tx_directions(tx=in_left, seff='s', seff_dir=0)
layout_helper.set_tx_directions(tx=in_right, seff='s', seff_dir=0)

layout_helper.assign_tx_matched_direction(target_tx=tail_l, source_tx=in_left,
                                          seff_dir=0, aligned=False)
layout_helper.assign_tx_matched_direction(target_tx=tail_r, source_tx=in_right,
                                          seff_dir=0, aligned=False)

layout_helper.assign_tx_matched_direction(target_tx=top_left,
                                          source_tx=in_left, seff_dir=2)
layout_helper.assign_tx_matched_direction(target_tx=top_right,
                                          source_tx=in_right, seff_dir=2)
Finally, the difficult setup is complete, and the user can call the function self.draw_base()
with the information about rows, transistors, etc. set up in the previous steps. BAG will
then draw all of the required polygons for the metal connections to the MOS devices and
everything else. An example base layout of the schematic in Figure 2.2, with no wiring done,
is shown in Figure 2.3.
# Connect up bias gates + drain
warr_bias_in = self.connect_to_tracks(
    [tail_l['g'], tail_r['g'], bias['d'], bias['g']],
    tid_tail_gate
)
# connect tail drains to input sources (tail node)
warr_tail = self.connect_to_tracks(
    [tail_l['d'], tail_r['d'], in_right['s'], in_left['s']],
    tid_tail_ds
)
After drawing the base layout, users can then specify how to connect wires and ports by
specifying a metal layer, a wire width and a track. Tracks are an invisible grid that spans the
design space used by BAG to place wires properly. Listing 2.5 shows an example command
of how to connect elements. Note that this code makes no reference to anything specific
nor does it “hardcode” any parameters. Everything is generic to the specified parameters.
Figure 2.4 shows the connections made by BAG.
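The idea of tracks as an invisible routing grid can be illustrated with a minimal model: a track index maps to a coordinate through a pitch and an offset, and a coordinate maps back to the nearest index. The class below is a hypothetical sketch with made-up numbers; it is not BAG's routing grid implementation, although the method name `coord_to_nearest_track` echoes the helper that appears in a later listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackGrid:
    """Uniform track grid on one metal layer, in integer resolution units."""
    pitch: int   # distance between adjacent track centers
    offset: int  # coordinate of the center of track 0

    def track_to_coord(self, idx: int) -> int:
        return self.offset + idx * self.pitch

    def coord_to_nearest_track(self, coord: int) -> int:
        # Integer arithmetic; exact ties round toward the higher track index.
        return (coord - self.offset + self.pitch // 2) // self.pitch


grid = TrackGrid(pitch=90, offset=45)  # hypothetical layer geometry
print(grid.track_to_coord(3))           # center of track 3
print(grid.coord_to_nearest_track(361))
```

Because wires are specified by track index rather than by coordinate, the same generator code can follow the grid of a different technology.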
Lastly, the user finalizes connections and adds pins to wires. In the final step, BAG draws
dummy transistors and creates straps for the power supplies, automatically routing the
dummies' connections to the supply, as in Figure 2.5. With a finished layout generator,
the user can arbitrarily change the transistor specifications, and the changes are reflected
automatically in the wiring and sizing. Figure 2.6 shows the same generator with various
width and number-of-fingers choices. An important note is that the user can do all this
without knowing any of the myriad layout design rules, since BAG handles these
complexities internally and abstracts them away from the user.
Figure 2.6: Differential amplifier layout with various widths and number of finger choices.
BAG also offers hierarchy in layout using TemplateBase [3]. Once a library of smaller
cells has been created, the user can "stamp" these cells into a larger unit and connect them
together to form more complex systems. Listing 2.6 demonstrates the code to insert a double
tail sense amplifier into a circuit. The user first creates a template, then computes a location
where the circuit should be placed, using helper functions to guarantee everything is aligned
to the grid. Lastly, with the computed coordinates, the user can instantiate the circuit in
the layout. An example of a circuit containing a differential transimpedance amplifier (TIA)
and continuous-time linear equalizer (CTLE) is shown in Figure 2.7.
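Computing a grid-aligned placement location is mostly a matter of rounding up to a pitch. The sketch below places a row of blocks left to right, snapping each origin to a placement pitch. The function names, block widths, and pitch are assumptions for illustration; BAG's own placement helpers are not shown in this excerpt.

```python
def snap_up(value: int, pitch: int) -> int:
    """Smallest multiple of pitch that is greater than or equal to value."""
    return -(-value // pitch) * pitch


def place_row(widths, pitch, gap=0):
    """Left-to-right origins for blocks of the given widths, all on the pitch grid."""
    origins = []
    x = 0
    for width in widths:
        origins.append(x)
        x = snap_up(x + width + gap, pitch)
    return origins


# Hypothetical block widths in resolution units, placed on a 48-unit pitch.
print(place_row([130, 200, 75], pitch=48))
```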
The benefit of codifying the layout procedure is that configuration can be automatically
included. For example, should a designer want variable resistors in their circuit, they may
opt for a resistor DAC. Resistor DACs are often an arbitrary number of arrayed unit resistors
with digitally controlled switches. One method of implementing such a resistor DAC is a
set of series resistors, each with a parallel switch that can short out or connect that
resistor. Using TemplateBase, we can place any number of template layouts, which
allows a single generator with a large degree of freedom to create such a circuit. The user
can choose how many bits, what type of switch to use (NMOS, PMOS, passgate), and even
whether or not to include local inversion for the passgate. Examples of resistor DAC
layouts of this style are shown in Figure 2.8, and a small portion of the configuration
file is shown in Listing 2.7.
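The electrical behavior of the series-resistor DAC described above can be checked with a short calculation: each resistor is in parallel with its switch, and the segments add in series. The sketch below is a behavioral model under stated assumptions (a fixed switch on-resistance, an ideal open switch by default, and arbitrary example resistor values); it says nothing about the layout generator itself.

```python
import math


def parallel(a: float, b: float) -> float:
    """Parallel combination that tolerates 0 and infinity."""
    if a == 0.0 or b == 0.0:
        return 0.0
    if math.isinf(a):
        return b
    if math.isinf(b):
        return a
    return a * b / (a + b)


def rdac_resistance(code, resistors, r_on=0.0, r_off=math.inf):
    """Total resistance of series resistors, each bypassed by a switch.

    code[i] is True when switch i is closed (resistor i is shorted out).
    """
    if len(code) != len(resistors):
        raise ValueError("need exactly one control bit per resistor")
    total = 0.0
    for closed, r in zip(code, resistors):
        total += parallel(r, r_on if closed else r_off)
    return total


resistors = [100.0, 200.0, 400.0]  # hypothetical values in ohms
print(rdac_resistance([False, True, False], resistors))             # ideal switches
print(rdac_resistance([False, True, False], resistors, r_on=50.0))  # lossy switch
```

The second call shows why the switch sizing parameters matter: a finite on-resistance leaves part of a "shorted" resistor in the path.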
params:
  output_bits_dir: 'left'  # where to drag the control bits to
  bits: 1
  switch_params:
    switch_type: 'transmission'
    include_inv: True
    switch_params:
      lch: !!float 14e-9
      guard_ring_nf: 0
      ptap_w: 6
      ntap_w: 6
      w_dict:  # Width of each row. Each row needs the width specified
        nmos: 6
        pmos: 12

      th_dict:  # Threshold information / thick ox / etc for each row
        nmos: 'standard'
        pmos: 'standard'

      seg_dict:  # Number of fingers of each transistor
        nmos: 8
        pmos: 16
  ...
  ...
  ...
  res_params:
    show_pins: True
    l: 0.5e-6
    w: 1.0e-6
    sub_type: 'ptap'
    threshold: 'standard'
    nser: 1
    npar: 1
    ndum: 1
    port_layer: 5
  ...
  ...
  ...
[Figure 2.8: Resistor DAC layouts: (a) 1 bit, (b) 3 bits, (c) 5 bits.]
[Figure: layout panels (a) 1 bit, (b) 5 bits.]
An example test bench manager, generic AC TB,¹ applies an AC voltage or current at the
input and runs an AC simulation and a noise simulation. The corresponding measurement
manager sifts through the data and (regardless of the transfer function shape) computes the
DC gain and overall bandwidth. The noise data is integrated and reported, along with the CMRR.
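The kind of post-processing a measurement manager performs can be sketched with the standard library alone. The functions below take sampled AC magnitude and noise data and return a DC gain, a -3 dB bandwidth (first downward crossing, linearly interpolated), and an integrated RMS noise. They are illustrative assumptions about what such a script might compute, not the measurement code used in this work, and the test data is a synthetic single-pole response.

```python
import math


def dc_gain(mags):
    """Gain at the lowest simulated frequency."""
    return mags[0]


def bandwidth_3db(freqs, mags):
    """First frequency where the magnitude drops below dc_gain / sqrt(2)."""
    target = mags[0] / math.sqrt(2.0)
    for i in range(1, len(freqs)):
        if mags[i] < target:
            f0, f1, m0, m1 = freqs[i - 1], freqs[i], mags[i - 1], mags[i]
            return f0 + (target - m0) * (f1 - f0) / (m1 - m0)
    return None  # never crossed within the simulated range


def integrated_noise_rms(freqs, psd):
    """RMS noise from a power spectral density (V^2/Hz), trapezoidal rule."""
    total = 0.0
    for i in range(1, len(freqs)):
        total += 0.5 * (psd[i] + psd[i - 1]) * (freqs[i] - freqs[i - 1])
    return math.sqrt(total)


# Synthetic single-pole response: gain 10, corner at 1 GHz.
freqs = [i * 1e7 for i in range(501)]
mags = [10.0 / math.sqrt(1.0 + (f / 1e9) ** 2) for f in freqs]
bw = bandwidth_3db(freqs, mags)
noise = integrated_noise_rms([0.0, 5e9, 1e10], [1e-18, 1e-18, 1e-18])
print(dc_gain(mags), bw, noise)
```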
Since BAG is written in Python, users can easily extend or add features to BAG. An
example is the extension of DesignManager to SweepDesignManager. This subclass inherits
all the basic properties and functions in design manager, but allows the user to specify a
set of variables in the parameter file to sweep. BAG will then automatically generate one
instance per parameter value in the range, and simulate them all in parallel. This allows
for the same type of sweeps one would do manually, but additionally includes parasitics and
LVS.
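A parameter sweep of this kind boils down to expanding the swept variables into one parameter set per instance and evaluating the sets concurrently. The sketch below shows that pattern with the standard library. It is not the SweepDesignManager implementation: `expand_sweep`, `run_sweep`, the parameter names, and the stub `evaluate` function are all hypothetical.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def expand_sweep(base_params, sweep):
    """One parameter dict per combination of the swept values."""
    keys = sorted(sweep)
    combos = []
    for values in itertools.product(*(sweep[k] for k in keys)):
        params = dict(base_params)
        params.update(zip(keys, values))
        combos.append(params)
    return combos


def run_sweep(base_params, sweep, evaluate, max_workers=4):
    """Evaluate every combination in parallel; results keep the combination order."""
    combos = expand_sweep(base_params, sweep)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(evaluate, combos))
    return list(zip(combos, results))


def evaluate(params):
    """Stub standing in for generate + LVS/PEX + simulate on one instance."""
    return params["nf"] * params["r_fb"]


results = run_sweep({"lch": 14e-9}, {"nf": [2, 4], "r_fb": [100.0, 200.0]}, evaluate)
for params, value in results:
    print(params, value)
```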
¹ This test bench, and all others used in this report, come courtesy of Kourosh H. of team Vlada.
Chapter 3
Example Generator Implementations
tr_widths:
  # How wide to make the actual wires
  # format is {metal layer: track width, metal layer: width in tracks}
  bias: {4: 2, 5: 2}
  clk: {4: 1, 5: 1}
  sig: {4: 1, 5: 1}
tr_spaces:
  # How wide to make the spaces between each wire of a certain type
  # same formatting. {metal layer: width in tracks}
  !!python/tuple ['bias', '']: {4: 1, 5: 1}
  !!python/tuple ['clk', '']: {4: 3, 5: 3}
  !!python/tuple ['clk', 'clk']: {4: 2, 5: 2}
  !!python/tuple ['clk', 'bias']: {4: 3, 5: 3}
  !!python/tuple ['sig', '']: {4: 2, 5: 2}
# Create a track to put a wire on for connecting the resistor output
# terminal to the input terminal
tid_extend_res_pin_p = TrackID(
    # A way of specifying I want the next horizontal layer above my lowest horz. layer
    horz_conn_layer + 2,
    track_idx=self.grid.coord_to_nearest_track(  # Helper function
        layer_id=horz_conn_layer + 2,
        # This represents the coordinate of the resistor port wire
        coord=(res_pin_p.get_bbox_array(self.grid).bottom_unit +
               diff_amp_vout_p.get_bbox_array(self.grid).top_unit) / 2,
        unit_mode=True
    ),
    # This is where the parametrization comes in. By changing the width corresponding
    # to 'bias' in the spec file, the width will automatically change here.
    width=tr_manager.get_width(horz_conn_layer + 2, 'bias')
)
also includes an option to generate input pair offset correction circuits. These circuits are
implemented as a current that subtracts from the input pair's current during the integration
step of operation (discussed more in Chapter 4). The generator automatically accounts
for how the setup changes when offset correction is included and adds the extra
pins/labels to the layout, as shown in Figure 3.4. Finally, the output of a comparator like
this is only valid for a short time, so we need a latch to store the value in between
evaluation cycles. Thankfully, this is possible with BAG. Using TemplateBase we can attach
a previously made latch to the output of the DTSA, as in Figure 3.5.
entire layout in seconds, which truly allows faster design iteration by removing the bottleneck
almost entirely.
The first front end (Figure 3.6) is a chain of a TIA followed by a Cherry-Hooper amplifier
stage, then a CTLE with resistor DACs and a capacitor DAC and two parallel preamplifiers
also with resistor DAC loads. As will be discussed in Chapter 4, there are also current
DACs to establish a common mode output in the first stage. The important thing to note
about these generated layouts is how much control the user has over the dimensions of the
blocks. Extremely wide or tall, or balanced layouts are possible. The aspect ratio is fully
customizable, and can be changed at a whim with a simple rerun of the script.
There are also versions that include DACs for every passive element as well as current
DACs for every bias input. There are versions that remove the Cherry-Hooper stage and
include a comparator. These variants are also used for different simulation stages of
design and verification.
Another example of a front end is the one that will be used in Chapter 4. This AFE is a
quad data rate AFE similar to the previous AFE and comprises the chain: TIA, CTLE,
2x parallel passive diff amps, 4x samplers. The architecture and operation are explained
in Chapter 4. The layout is shown in Figure 3.7, and of course has the same degree of
customizability alluded to throughout this chapter. The main point of demonstrating these
layouts is that with a library of "leaf cells," composing large circuits is a very feasible task
that would take an experienced user only one to two days to implement. With BAG, layout
is no longer a painstaking process.
Chapter 4
Design Problem: An Optical Receiver
Silicon photonics is a relatively new field, but the increasing possibility of incorporating
photonic elements on chip with the electronics has made optical circuits a research
hotspot.
To demonstrate the capabilities of BAG, an optical receiver design is shown from concept
to verification.
• VDD = 0.8V
There are no power or architecture constraints, other than that any chosen architecture
must be implemented in BAG. We will assume the photonics are already implemented, and
we will not simulate temperature, voltage or process variations. Each transistor will be of
a unit size, and only the number of fingers will change. Noise in general should be low, but
there is no strict value requirement. We will see in the "Future Work" section of Chapter 5
how this could be extended, along with an example of how one could generalize the design
procedure automatically.
This receiver is a quad data rate (QDR) receiver, with each comparator operating at 90°
phase offsets at a quarter of the clock rate to relax the comparator constraints. Technically,
only a single comparator clocked at 25GHz is necessary; however, the time required for
a comparator to decide between a 1 or 0 bit depends on the available voltage drive and hence
impacts the receiver current sensitivity. To meet the target speed and sensitivity
specifications, we use four comparators that each operate only a quarter of the time, which
allows enough time for decision making and regeneration. This will be further discussed in
the following parts of this chapter.
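The timing relief from quad data rate operation follows directly from the numbers in the text. The sketch below computes the unit interval, the per-comparator clock rate, and the clock phases for a 25Gbps link split four ways. The function and key names are hypothetical; the "cycle time" reported is simply one period of the slower comparator clock, which has to cover reset as well as decision and regeneration.

```python
def qdr_plan(data_rate_bps: float, n_phases: int = 4):
    """Timing summary for a receiver that interleaves n_phases comparators."""
    unit_interval = 1.0 / data_rate_bps
    return {
        "unit_interval_s": unit_interval,
        "comparator_clock_hz": data_rate_bps / n_phases,
        "comparator_cycle_s": n_phases * unit_interval,
        "phases_deg": [360.0 * k / n_phases for k in range(n_phases)],
    }


plan = qdr_plan(25e9)
print(plan)
```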
Another design choice made is to drive the 0° and 180° offset comparators with the same
preamp, and 90° and 270° together. This was done to reduce the effect of back-injection of
the clock into the circuit elements before it.
The TIA is the main gain stage and is used to convert the incoming current waveform
into a voltage. Since the TIA is the first block in the chain, the formula for cascaded noise
factors
\[ F_{total} = F_1 + \frac{F_2 - 1}{G_1} + \frac{F_3 - 1}{G_1 G_2} + \dots \tag{4.1} \]
(where $F_i$ and $G_i$ are the noise factor and gain of stage $i$, respectively) tells us we want
the TIA to be high gain and low noise in order to reduce the overall noise factor of the
system.
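Equation 4.1 is easy to evaluate numerically, which makes the "first stage dominates" argument tangible. The sketch below implements the cascade formula for linear noise factors and linear power gains; the stage values are made up for illustration and do not describe the receiver designed here.

```python
def cascaded_noise_factor(noise_factors, gains):
    """Equation 4.1: total noise factor of a cascade.

    noise_factors: linear noise factor F_i of each stage, in order.
    gains: linear power gain G_i of every stage except (optionally) the last.
    """
    if len(gains) < len(noise_factors) - 1:
        raise ValueError("need a gain for every stage before the last one")
    total = noise_factors[0]
    gain_so_far = 1.0
    for factor, gain in zip(noise_factors[1:], gains):
        gain_so_far *= gain
        total += (factor - 1.0) / gain_so_far
    return total


# Hypothetical three-stage chain: a quiet, high-gain first stage hides later noise.
print(cascaded_noise_factor([2.0, 4.0, 10.0], [10.0, 5.0]))
# Same stages with a low-gain first stage: later stages now contribute much more.
print(cascaded_noise_factor([2.0, 4.0, 10.0], [1.0, 5.0]))
```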
The TIA is followed by a CTLE. A CTLE has a zero in its transfer function that can
be placed at a specific frequency, which theoretically allows the designer to extend the
bandwidth of previous stages as in Figure 4.2. One concern with the CTLE, however, is that
it can only, in general, shape the energy of the frequency spectrum. This means that usually
to increase the gain at high frequency, we must throw away DC gain, as shown below:
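One common textbook model of this trade-off is a differential pair with a source degeneration resistor and capacitor, which gives one zero and two poles. The sketch below evaluates that model; it is an assumption that this model resembles the CTLE used here, and the component values are arbitrary. In this model the ratio of peak gain to DC gain equals $1 + g_m R_S / 2$, so more boost is bought with a lower DC gain.

```python
import math


def ctle_dc_gain(gm, r_d, r_s):
    """DC gain of a source-degenerated differential pair (textbook model)."""
    return gm * r_d / (1.0 + gm * r_s / 2.0)


def ctle_response(f_hz, gm, r_d, r_s, c_s, c_l):
    """Complex gain: zero from R_S*C_S, one pole from degeneration, one from the load."""
    s = 2j * math.pi * f_hz
    w_z = 1.0 / (r_s * c_s)
    w_p1 = (1.0 + gm * r_s / 2.0) / (r_s * c_s)
    w_p2 = 1.0 / (r_d * c_l)
    return ctle_dc_gain(gm, r_d, r_s) * (1.0 + s / w_z) / ((1.0 + s / w_p1) * (1.0 + s / w_p2))


# Hypothetical values: gm = 10 mS, R_D = 200 ohm, R_S = 400 ohm, C_S = 50 fF, C_L = 10 fF.
args = dict(gm=10e-3, r_d=200.0, r_s=400.0, c_s=50e-15, c_l=10e-15)
gain_dc = abs(ctle_response(0.0, **args))
gain_20g = abs(ctle_response(20e9, **args))
print(f"DC gain {gain_dc:.3f}, gain at 20 GHz {gain_20g:.3f}, ideal peak {10e-3 * 200.0:.3f}")
```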
Since we are targeting 25Gbps, the empirical target bandwidth that the receiver front end
needs for a relatively optimal trade-off between power and introduced ISI is ≈ 0.7× the data
rate, or roughly 17GHz [11]. The bare minimum would be roughly half the data rate, or
12.5GHz. We will target somewhere in between. A good rule of thumb for comparators is that
they need millivolts of signal swing to consistently decide correctly. The gain-bandwidth
product of the TIA is unlikely to be large enough to amplify a 40µA signal to roughly 10mV
across a frequency range of 0Hz to 17GHz with low noise, so we increase the TIA gain and
lower its bandwidth; even with the CTLE's DC gain reduction, we can then still get decent
gain in conjunction with the CTLE's bandwidth extension.
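The numbers in this paragraph can be reproduced with a few lines of arithmetic. The sketch below computes the bandwidth targets from the data rate and the overall transimpedance needed to turn a 40µA input into 10mV at the comparator. The function names are illustrative, and the 0.7 and 0.5 factors are simply the rules of thumb quoted in the text.

```python
def bandwidth_targets(data_rate_bps: float):
    """Rule-of-thumb front-end bandwidths: about 0.7x (target) and 0.5x (minimum)."""
    return 0.7 * data_rate_bps, 0.5 * data_rate_bps


def required_transimpedance(i_in_amps: float, v_swing_volts: float) -> float:
    """Overall current-to-voltage gain (ohms) needed to reach the desired swing."""
    return v_swing_volts / i_in_amps


bw_target, bw_min = bandwidth_targets(25e9)
r_total = required_transimpedance(40e-6, 10e-3)
print(f"target {bw_target / 1e9:.1f} GHz, minimum {bw_min / 1e9:.1f} GHz, "
      f"overall gain {r_total:.0f} ohm")
```

The required overall gain is modest in absolute terms; the difficulty noted in the text is achieving it together with the bandwidth and noise targets.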
The final stage before the comparators is a set of passively loaded differential amplifiers.
Their purpose is threefold. The CTLE would have to simultaneously drive four comparators,
which is a fairly large capacitive load. This limits the maximum achievable peaking gain by
pushing in the second pole. Additionally, the comparators tend to inject their clock signal
backwards which can impact the CTLE’s operation periodically. The amplifiers serve as
buffers which will isolate the kickback, and act as an intermediate step to reduce the amount
of capacitance the CTLE has to drive. The DC gain is also expected to be too low due to
the CTLE’s reduction, so the amplifiers will provide a relatively small gain (≈ 2×) to get as
much swing as possible.
[Figure: TIA schematic, with labels $V_{DD}$, $R_{fb}$, $C_{pd}$, $I_{DC}$, and $V_{out}$.]
The purpose of implementing the TIA in such a fashion is partly due to the work in [10].
This project report began as a project to port the design in [10] into a more generic BAG style
to demonstrate the process-portability of BAG. Additionally, [8] shows that this architecture
can be used with a scheme that splits the photodiode differentially to take advantage of the
dummy block. At DC, if IDC = 0 then the input and output common mode voltages are
equal. The purpose of the DC current source is to force the output common mode to be
higher if desired. Since the output will sit about mid-rail, that may not be enough to bias
the input of the CTLE, so we will need DC current through Rfb to force this difference.
If we assume the inverters are an amplifier with gain −Av then the input impedance can
be found by using Miller’s theorem.
$$Z_{in} = Z_{C_{pd}} \parallel \frac{R_{fb}}{1 + A_v} \tag{4.2}$$
which simplifies to
$$Z_{in} = \frac{\frac{R_{fb}}{1 + A_v}}{1 + j\omega \frac{R_{fb}}{1 + A_v} C_{pd}} \tag{4.3}$$
Thus the input pole should be roughly at
$$\omega_p = \frac{1 + A_v}{R_{fb} C_{pd}} \tag{4.4}$$
assuming that Cpd is the photodiode capacitance plus any TIA input and wiring capacitance
lumped together. From the small signal model of one half of the circuit (shown in Figure
4.4, ignoring Cpd ) we can derive the gain. KCL at the output node gives
$$\frac{v_{out}}{r_{on} \parallel r_{op}} + (g_{mn} + g_{mp})(i_{in} R_{fb} + v_{out}) - i_{in} = 0 \tag{4.5}$$
CHAPTER 4. DESIGN PROBLEM: AN OPTICAL RECEIVER 37
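[Figure 4.4: Small-signal model of one half of the TIA, with input current iin, feedback resistor Rfb, controlled sources gmn·vgs and gmp·vgs, output resistance ron||rop, and output vout.]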
which simplifies to
$$\frac{v_{out}}{i_{in}} = \frac{R_{fb}(g_{mn} + g_{mp}) - 1}{-(g_{mn} + g_{mp}) - \frac{1}{r_{on} \parallel r_{op}}} \tag{4.6}$$
If we assume (gmn + gmp)Rfb and (gmn + gmp)(ron||rop) are both much larger than 1, then the
transfer function simplifies to
$$\left| \frac{v_{out}}{i_{in}} \right| \approx R_{fb} \tag{4.7}$$
The gain is then approximately just Rfb. For a given choice of Rfb, we then use BAG's rapid
iteration to sweep for transistor widths. As the size increases, device parasitics become
dominant over the external capacitances, so there is an optimal size for bandwidth. Once
the maximum bandwidth is found, this sets the device sizes. Since we know the CTLE can
only get a couple of GHz of bandwidth extension, we can calculate and then sweep the TIA
resistor to see what resistance gives a bandwidth of around 9GHz. If we assume the CTLE
will cut the DC gain by roughly a third, and we can get about 1.5x amplification from the
preamps, then the overall gain should still be high enough for the comparator.
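Equations 4.4 and 4.6 can be checked numerically before any sweep is run. The following is a minimal sketch, not the generator's own code: the component values are placeholders chosen for illustration, and the inverter gain is approximated as (gmn + gmp)(ron||rop), ignoring the loading of Rfb.

```python
import math


def tia_gain_exact(r_fb, gm_total, r_out):
    """Transimpedance from equation 4.6; gm_total = gmn + gmp, r_out = ron || rop."""
    return (r_fb * gm_total - 1.0) / (-gm_total - 1.0 / r_out)


def tia_input_pole_hz(r_fb, c_pd, a_v):
    """Input pole from equation 4.4, converted from rad/s to Hz."""
    return (1.0 + a_v) / (2.0 * math.pi * r_fb * c_pd)


# Placeholder values for illustration only (not taken from the design).
r_fb = 1500.0      # feedback resistor, ohms
gm_total = 10e-3   # gmn + gmp, siemens
r_out = 800.0      # ron || rop, ohms
c_pd = 30e-15      # lumped input capacitance, farads

gain = tia_gain_exact(r_fb, gm_total, r_out)
a_v = gm_total * r_out  # approximate inverter gain magnitude (assumption)
f_pole = tia_input_pole_hz(r_fb, c_pd, a_v)

print(f"exact |vout/iin| = {abs(gain):.0f} ohms vs. R_fb = {r_fb:.0f} ohms")
print(f"input pole       ~ {f_pole / 1e9:.1f} GHz")
```

With a finite inverter gain the exact transimpedance falls somewhat below Rfb, which is one reason the resistor value is swept in simulation rather than fixed by hand.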
To set the output common mode, we can assume the input will be around mid-rail, so
the output can be approximated as follows:
$$\frac{V_o - \frac{V_{DD}}{2}}{R_{fb}} = I_{DC} \tag{4.8}$$
In order to bias the CTLE input, we want this to potentially be a little higher than mid-rail
since the VGS needs to be large enough to give headroom to the tail transistors. Plugging into
equation 4.8 gives a starting point that can then be swept using BAG for better accuracy.
Post-PEX hand simulation is then used to fine-tune the current to get the desired result.
Note that increasing the output common mode can reduce the gain, which means that the
resistance might need to be higher than anticipated. This is also determined by simulating
BAG-generated instances.
We were able to achieve a gain of 1500Ω with a bandwidth of 8.55GHz. In order to shift
the common mode, IDC was set to 55µA.
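Equation 4.8 gives a quick estimate of the common-mode shift. The sketch below is only an estimate: it assumes a 1.0V supply and uses the reported 1500Ω gain as a stand-in for Rfb, neither of which is stated as such in the text. Only the 55µA value of IDC is taken directly from the result above.

```python
def output_common_mode(vdd, i_dc, r_fb):
    """Equation 4.8 solved for Vo, assuming the input sits at mid-rail."""
    return vdd / 2.0 + i_dc * r_fb


def dc_current_for_shift(v_shift, r_fb):
    """DC current needed to raise the output common mode by v_shift volts."""
    return v_shift / r_fb


vdd = 1.0       # assumed supply voltage, volts
r_fb = 1500.0   # assumed feedback resistance, ohms
i_dc = 55e-6    # DC current reported in the text, amps

v_o = output_common_mode(vdd, i_dc, r_fb)
shift = v_o - vdd / 2.0

print(f"output common mode ~ {v_o * 1e3:.1f} mV ({shift * 1e3:.1f} mV above mid-rail)")
print(f"current for a 100 mV shift ~ {dc_current_for_shift(0.1, r_fb) * 1e6:.1f} uA")
```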
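[Figure: CTLE schematic. Differential pair with inputs Vin and Vip, load resistors Rd, source degeneration Rs and Cs, tail current sources Ibias, and differential output Vout.]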
Since we know the pole location of the TIA after simulation, we can design the CTLE
to have its zero in close proximity. Firstly, we draw the approximate differential mode half
circuit:
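[Figure: CTLE differential-mode half circuit with load resistor Rd, degeneration resistor Rs/2, degeneration capacitor 2Cs, input Vin, and output Vout.]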
$$G_m = \frac{2 g_m (1 + j\omega R_s C_s)}{2 + g_m R_s + 2j\omega R_s C_s} \tag{4.10}$$
We can also determine the output impedance by inspection
$$R_o \approx R_d \parallel \frac{Z_{C_l}}{2} = \frac{R_d}{1 + 2j\omega R_d C_l} \tag{4.11}$$
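So the overall gain is then the product of the transconductance and the output impedance,
$$A_v = G_m R_o = \frac{2 g_m R_d (1 + j\omega R_s C_s)}{(2 + g_m R_s + 2j\omega R_s C_s)(1 + 2j\omega R_d C_l)}$$
which, above the zero and below the poles, approaches a peak gain of
$$A_{peak} = g_m R_d \tag{4.16}$$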
From these equations, we can choose the values of the degeneration components and pullup
resistors to place the zero and set the peak gain. We want the second pole to be far away at
roughly 20GHz or more, which fixes Rd for a given Cl . If we want a peak gain of around 2,
that then fixes the required gm , which for an assumed Vov of 200mV, fixes the bias current
and transistor sizes. We will (quite conservatively) assume the input to each amplifier is
10fF, so the total load is 20fF. Through BAG, we can sweep component values near the
desired points to get more accurate results.
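The sizing steps above can be turned into a first-pass calculation. This is a minimal sketch under stated assumptions: it uses the text's targets (second pole at 20GHz, 20fF total load, peak gain of 2, Vov of 200mV), takes the second pole as 1/(Rd × total load) per equation 4.11, and assumes a square-law device so that gm = 2·ID/Vov. The zero and first pole are read off equation 4.10. The Rs and Cs values are placeholders.

```python
import math


def ctle_design(f_p2_hz, c_load_total, a_peak, v_ov):
    """First-pass CTLE values: load resistor, gm, and branch current."""
    r_d = 1.0 / (2.0 * math.pi * f_p2_hz * c_load_total)  # second pole
    g_m = a_peak / r_d                                    # equation 4.16
    i_d = g_m * v_ov / 2.0                                # square-law assumption
    return r_d, g_m, i_d


def ctle_zero_and_pole_hz(r_s, c_s, g_m):
    """Zero and first pole implied by equation 4.10, in Hz."""
    f_z = 1.0 / (2.0 * math.pi * r_s * c_s)
    f_p1 = (1.0 + g_m * r_s / 2.0) * f_z
    return f_z, f_p1


def ctle_dc_gain(g_m, r_d, r_s):
    """Low-frequency gain from equations 4.10 and 4.11."""
    return g_m * r_d / (1.0 + g_m * r_s / 2.0)


r_d, g_m, i_d = ctle_design(20e9, 20e-15, 2.0, 0.2)
r_s, c_s = 400.0, 50e-15  # placeholder degeneration values
f_z, f_p1 = ctle_zero_and_pole_hz(r_s, c_s, g_m)

print(f"Rd ~ {r_d:.0f} ohms, gm ~ {g_m * 1e3:.2f} mS, ID ~ {i_d * 1e3:.2f} mA")
print(f"zero ~ {f_z / 1e9:.2f} GHz, first pole ~ {f_p1 / 1e9:.2f} GHz")
print(f"DC gain ~ {ctle_dc_gain(g_m, r_d, r_s):.2f}")
```

These numbers are only starting points; the sweep through BAG refines them against extracted parasitics.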
Using the above steps, we were able to achieve a bandwidth extension of about 4GHz
with a DC gain of roughly 0.6. When placed in succession to the TIA, the overall gain
reduces to about 850 with a bandwidth extension to about 13.5GHz.
4.4 Preamplifiers
The preamplifiers are implemented as passively-loaded differential amplifiers as shown below.
The main purpose is to serve as a buffer between the CTLE and the DTSAs. The DTSAs
will provide a fairly substantial capacitive load to the CTLE as well as back-inject their
clock, which is undesired. The preamplifiers ideally will have a gain of around 2, but we will
target a gain greater than 1.
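[Figure: Preamplifier schematic. Differential pair with inputs Vin and Vip, load resistors Rd, tail current source Ibias, and differential output Vout.]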
Assuming the ro of the transistors is fairly large, the gain of this amplifier is known
to be
$$A_v = g_m R_d \tag{4.17}$$
with the unity gain frequency at
$$\omega_u = \frac{g_m}{C_l} \tag{4.18}$$
Once we know the bandwidth of the CTLE, we can determine the required unity gain
bandwidth for an overall gain of 2 at this frequency. Again assuming each DTSA provides
10fF of load, then we know Cl . This fixes the required gm and therefore the device sizes and
bias current.
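The sizing procedure in this paragraph follows directly from equations 4.17 and 4.18. The sketch below assumes a single-pole amplifier (so the unity gain frequency is the gain times the bandwidth), a 10fF load per output, and a square-law device with a 200mV overdrive carried over from the CTLE discussion. The load per preamplifier output and the overdrive are assumptions, not values confirmed for this stage.

```python
import math


def preamp_sizing(gain, bandwidth_hz, c_load, v_ov):
    """Return unity gain frequency, gm, Rd and branch current for a preamp."""
    f_u = gain * bandwidth_hz            # single-pole gain-bandwidth product
    g_m = 2.0 * math.pi * f_u * c_load   # equation 4.18: omega_u = gm / Cl
    r_d = gain / g_m                     # equation 4.17: Av = gm * Rd
    i_d = g_m * v_ov / 2.0               # square-law assumption
    return f_u, g_m, r_d, i_d


f_u, g_m, r_d, i_d = preamp_sizing(gain=2.0, bandwidth_hz=13.5e9,
                                   c_load=10e-15, v_ov=0.2)

print(f"unity gain frequency ~ {f_u / 1e9:.1f} GHz")
print(f"gm ~ {g_m * 1e3:.2f} mS, Rd ~ {r_d:.0f} ohms, ID ~ {i_d * 1e6:.0f} uA")
```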
Following these steps, the amplifiers were able to obtain a gain of 1.5 up to 13.5GHz.
This sets the entire front end chain to have an overall bandwidth of 13.5GHz and a gain
of 1200. Assuming a 40µA peak-to-peak input current swing, this should be plenty for the
comparator.
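As a sanity check on the chain, the stage gains quoted in this chapter can be multiplied together and compared with the reported overall figure. This is only a back-of-the-envelope sketch: the simple product ignores any inter-stage loading, so it is expected to come out somewhat higher than the simulated value.

```python
def cascade_gain(*stage_gains):
    """Multiply the gains of cascaded stages."""
    total = 1.0
    for g in stage_gains:
        total *= g
    return total


tia_gain_ohms = 1500.0   # TIA transimpedance reported in the text
ctle_dc_gain = 0.6       # CTLE DC gain reported in the text
preamp_gain = 1.5        # preamplifier gain reported in the text
i_in_pp = 40e-6          # peak-to-peak input current, amps

ideal_total = cascade_gain(tia_gain_ohms, ctle_dc_gain, preamp_gain)
reported_total = 1200.0  # overall gain reported in the text
swing_pp = reported_total * i_in_pp

print(f"ideal product   ~ {ideal_total:.0f} ohms")
print(f"reported gain   = {reported_total:.0f} ohms")
print(f"comparator swing ~ {swing_pp * 1e3:.0f} mV peak-to-peak")
```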
4.5 Comparator
The comparator is implemented as a double tail sense amplifier.
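[Figure: Double-tail sense amplifier (DTSA) schematic with clocked input pair (Vin, Vip), intermediate nodes Din and Dip, devices clocked by CLK and its complement, and outputs Von and Vop.]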
The DTSA consists of a clocked differential pair that feeds into cross-coupled inverters.
Much like the classic StrongArm sense-amp [9], the DTSA first precharges the Din and Dip
nodes up to VDD . After CLK goes high, the input pairs activate and begin to drain charge
from the parasitic capacitance at nodes Din and Dip . Depending on which input is larger,
one side will discharge faster which will turn off one of the switches connected to those
nodes faster than the other. This sets the value in the cross-coupled inverters which is then
reinforced through positive feedback.
When CLK goes low, the circuit resets and precharges the Dip /Din nodes for the next
cycle.
A typical cycle of the DTSA with a differential input of 10mV is plotted in Figure 4.9.
The clock is generated by taking ideal clocks and passing them through a chain of inverters
to generate both the real CLK and inverted CLK signals.
The comparator usually is the first thing one would design, as its limitations generally
set the required gain and noise specs for the front end. In this project, the comparator was
designed last, since there is no power constraint. The rate at which voltage is reduced from
Dip /Din is based on the current through the input pairs. By increasing the size of the input
pairs, we can create a voltage difference much more quickly (assuming there is no process
offset in the input pairs) which allows even small inputs to be sensed before the cycle ends.
Thus, as long as our input is on the order of millivolts, we should be able to sense this.
As a test, we determine what the approximate noise tolerance of the DTSA is. We input
a small differential signal (about 1mV) and run a transient simulation with noise over many
This implies that ±0.5mV input is approximately 3σn . Although this includes both static
overdrive and noise contribution, a previously run automated overdrive test determined that
the static overdrive needed is much smaller than this input. This means that we can assume
the majority of the input is the effect of noise. Thus, we should budget about 2 × 9σn,
or ±3mV of the eye opening to noise tolerance for the DTSA for a BER of 10⁻¹². This,
of course, ignores process variation. Process variation introduces an offset that affects the
current in each branch. From a Monte Carlo simulation, one can find the average offset which
can then additionally be added to the eye opening budget. There are also topologies that
can correct for offset, which would not completely eliminate the issue, but would certainly
reduce it.
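The noise budget above can be reproduced in a few lines. The sketch takes the ±0.5mV ≈ 3σn observation and the 2 × 9σn budget directly from the text. For reference it also computes the number of standard deviations that a purely Gaussian one-sided tail needs for a BER of 10⁻¹², which suggests that the budget used here is on the conservative side. The helper names are illustrative.

```python
from statistics import NormalDist


def sigma_from_three_sigma(v_three_sigma):
    """Noise standard deviation given the input level that equals 3 sigma."""
    return v_three_sigma / 3.0


def gaussian_q_for_ber(ber):
    """Sigmas whose one-sided Gaussian tail probability equals ber."""
    return -NormalDist().inv_cdf(ber)


sigma_n = sigma_from_three_sigma(0.5e-3)  # +/-0.5 mV is about 3 sigma
budget = 2 * 9 * sigma_n                  # budget used in the text
q_ref = gaussian_q_for_ber(1e-12)         # Gaussian reference value

print(f"sigma_n          ~ {sigma_n * 1e3:.3f} mV")
print(f"noise budget     ~ +/-{budget * 1e3:.1f} mV")
print(f"Gaussian Q value ~ {q_ref:.2f} sigma for BER 1e-12")
```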
To determine how much noise the amplifier chain introduces, an AC noise simulation was
also run. The output power spectral density is plotted in Figure 4.12.
Using Python to integrate this function up to 25GHz gives an integrated noise power of
4.6µV² (4.6 × 10⁻⁶ V²), so the expected rms noise is then roughly 2mV.
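A minimal sketch of this integration step is shown below. The simulated spectral density is not reproduced here, so the example substitutes a flat placeholder density scaled to integrate to 4.6 × 10⁻⁶ V² over 25GHz. Only the procedure (trapezoidal integration followed by a square root) is meant to carry over, not the data.

```python
import math


def integrate_trapezoid(xs, ys):
    """Trapezoidal integral of ys over xs."""
    total = 0.0
    for i in range(1, len(xs)):
        total += 0.5 * (ys[i] + ys[i - 1]) * (xs[i] - xs[i - 1])
    return total


# Placeholder data: flat PSD (V^2/Hz) standing in for the simulated curve.
n_points = 1001
f_max = 25e9
freqs = [f_max * i / (n_points - 1) for i in range(n_points)]
psd = [4.6e-6 / f_max] * n_points

noise_power = integrate_trapezoid(freqs, psd)  # V^2
v_rms = math.sqrt(noise_power)                 # V

print(f"integrated noise power ~ {noise_power * 1e6:.2f} uV^2")
print(f"rms noise              ~ {v_rms * 1e3:.2f} mV")
```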
From the previous sections, we know that the minimum input is approximately 1mV with
a noise tolerance of 3mV for a BER of 10⁻¹². For an input of 40µA peak-to-peak, we achieve
a gain of 1200 which puts the input amplitude at roughly 50mV peak-to-peak. This should
be plenty. Unfortunately, the desired bandwidth was not met, although the bandwidth does
surpass the theoretical minimum. This will mostly affect ISI performance, which will be seen
in the next simulation.
To determine over many cycles how the AFE performs, we measure the eye diagram at
the sampler inputs. This is shown in Figure 4.13. As can be seen, there is a fairly large eye
opening which surpasses the minimum required swing for proper evaluation.
Lastly, a PRBS32 pattern is fed into the receiver, and the sampler outputs are monitored
in a transient simulation. Note that this architecture would not operate correctly on its own,
because the waveforms need to be adjusted in time to center the eye opening with the
clock edges. This adjustment was done manually for this test. Also, since each sampler is
only sampling once every 4 bits, the outputs are sliced and manually stitched together. A
portion of the test input and output are shown in Figure 4.14. The patterns are clearly
identical with only a time offset through the front end chain.
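The slicing and stitching of the four sampler outputs can be illustrated as follows. This sketch is not the script used for the test: it uses a short PRBS7 sequence as a stand-in for the pattern in the text, and it models each sampler as an ideal decimator that captures every fourth bit.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a PRBS7 sequence (x^7 + x^6 + 1) from a 7-bit LFSR."""
    state = seed & 0x7F
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return bits


def deinterleave(bits, n_samplers=4):
    """Split a bit stream into the streams seen by each time-interleaved sampler."""
    return [bits[k::n_samplers] for k in range(n_samplers)]


def stitch(streams):
    """Re-interleave the per-sampler streams back into one bit stream."""
    out = []
    for group in zip(*streams):
        out.extend(group)
    return out


tx_bits = prbs7(128)                     # stand-in test pattern
sampler_outputs = deinterleave(tx_bits)  # what each of the 4 samplers sees
rx_bits = stitch(sampler_outputs)

print("patterns match:", rx_bits == tx_bits)
```

In a real comparison the received stream would also need to be shifted by the front-end latency before the two patterns are compared.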
As a final summary of the design, Table 4.1 shows the tabulated performance of the
design. Measurements for energy and power are for the entire circuit, including biasing, clock
buffers and the front-end up to the samplers. The measures were collected by simulating the
transient current waveform and using Python to find the time average. The time average
multiplied by VDD gives the average power consumption, which is then divided by the
bit rate of the PRBS32 pattern input to get energy/bit.
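The energy-per-bit calculation described above can be sketched as below. The waveform here is a constant placeholder current rather than simulation data, and the 1.0V supply and 25 Gb/s bit rate are assumptions made for the example.

```python
def time_average(times, values):
    """Trapezoidal time average of a sampled waveform."""
    area = 0.0
    for i in range(1, len(times)):
        area += 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
    return area / (times[-1] - times[0])


def energy_per_bit(i_avg, vdd, bit_rate):
    """Average power divided by bit rate, in joules per bit."""
    return i_avg * vdd / bit_rate


# Placeholder waveform: constant 10 mA supply current sampled over 1 ns.
times = [i * 1e-12 for i in range(1001)]
current = [10e-3] * len(times)

i_avg = time_average(times, current)
power = i_avg * 1.0                                  # assumed VDD of 1.0 V
e_bit = energy_per_bit(i_avg, vdd=1.0, bit_rate=25e9)

print(f"average power ~ {power * 1e3:.1f} mW")
print(f"energy/bit    ~ {e_bit * 1e12:.2f} pJ/bit")
```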
Chapter 5
And here we are at the finale. For those who made it this far, congrats! To those who read
only the abstract before this, welcome. I hope it’s interesting!
As is to be expected, BAG’s rapid iteration opens the door for machine learning. BagNet
in [6] demonstrates that BAG generators can be used as a tool to solve a constrained
optimization problem with evolutionary algorithms coupled to deep-neural network
discriminators using some of the generators demonstrated in this very work. BagNet shows the
feasibility of designing complex analog/mixed signal circuits with sample-efficient, unsupervised
learning by taking advantage of BAG’s rapid iteration abilities. The author also compares
the performance of BagNet solutions to a design script written by an expert designer.
Bibliography
[1] Phillip Allen. “THE PRACTICE OF ANALOG IC DESIGN”. en. In: (2004), p. 23.
[2] Amdahl’s Law. url: https://home.wlu.edu/~whaleyt/classes/parallel/topics/amdahl.html
(visited on 04/09/2019).
[3] Eric Chang et al. “BAG2: A process-portable framework for generator-based AMS
circuit design”. en. In: 2018 IEEE Custom Integrated Circuits Conference (CICC).
San Diego, CA: IEEE, Apr. 2018, pp. 1–8. isbn: 978-1-5386-2483-8.
doi: 10.1109/CICC.2018.8357061. url: https://ieeexplore.ieee.org/document/8357061/
(visited on 04/03/2019).
[4] P. Chiu, B. Zimmer, and B. Nikolić. “A double-tail sense amplifier for low-voltage
SRAM in 28nm technology”. In: 2016 IEEE Asian Solid-State Circuits Conference
(A-SSCC). Nov. 2016, pp. 181–184. doi: 10.1109/ASSCC.2016.7844165.
[5] Scott Elder. The REAL Cost for a Custom IC.
url: https://www.planetanalog.com/author.asp?section_id=526&doc_id=559840 (visited on 04/09/2019).
[6] Koroush Hakhamaneshi et al. “Late Breaking Results: Analog Circuit Generator based
on Deep Neural Network enhanced Combinatorial Optimization”. In: Design Automation
Conference. (Accepted for publication in DAC 2019, Las Vegas, NV, June 2-6),
p. 2.
[7] Nuno Lourenço, Ricardo Martins, and Nuno Horta. “Layout-Aware Sizing of Analog
ICs using Floorplan & Routing Estimates for Parasitic Extraction”. en. In: Design,
Automation & Test in Europe Conference & Exhibition (DATE), 2015. Grenoble,
France: IEEE Conference Publications, 2015, pp. 1156–1161. isbn: 978-3-9815370-4-8.
doi: 10.7873/DATE.2015.0411.
url: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7092562 (visited on 04/09/2019).
[8] Nandish Mehta et al. “A 12Gb/s, 8.6uApp input sensitivity, monolithic-integrated fully
differential optical receiver in CMOS 45nm SOI process”. en. In: ESSCIRC Conference
2016: 42nd European Solid-State Circuits Conference. Lausanne, Switzerland: IEEE,
Sept. 2016, pp. 491–494. isbn: 978-1-5090-2972-3. doi: 10.1109/ESSCIRC.2016.7598348.
url: http://ieeexplore.ieee.org/document/7598348/ (visited on 04/10/2019).
[9] Behzad Razavi. “The StrongARM Latch [A Circuit for All Seasons]”. en. In: IEEE
Solid-State Circuits Magazine 7.2 (2015), pp. 12–17. issn: 1943-0582.
doi: 10.1109/MSSC.2015.2418155. url: http://ieeexplore.ieee.org/document/7130773/
(visited on 04/10/2019).
[10] Krishna Settaluri. “Photonic Links – From Theory to Automated Design”. PhD thesis.
EECS Department, University of California, Berkeley, Apr. 2019.
url: http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-8.html.
[11] Krishna T. Settaluri et al. “First Principles Optimization of Opto-Electronic
Communication Links”. en. In: IEEE Transactions on Circuits and Systems I: Regular Papers
64.5 (May 2017), pp. 1270–1283. issn: 1549-8328, 1558-0806. doi: 10.1109/TCSI.2016.2633942.
url: http://ieeexplore.ieee.org/document/7807207/ (visited on 04/09/2019).