Lecture 4 - Synthesis - Part 2 2022
Disclaimer: This course was prepared, in its entirety, by Zvi Webb. Many materials were copied from sources freely available on the internet. When possible, these sources have been cited;
however, some references may have been cited incorrectly or overlooked. If you feel that a picture, graph, or code example has been copied from you and either needs to be cited or removed,
please feel free to email [email protected] and I will address this as soon as possible.
Lecture Outline
• Boolean Minimization
• Constraint Definition
• Technology Mapping
• Verilog for Synthesis – revisited
• Timing Optimization
• Design Compiler (DC) Flow
What have we discussed so far?
• Not too much…
• We briefly discussed compilation.
• And then we really dove down and dirty into standard cell libraries.
• So at this point:
  • We have loaded our design into the synthesizer.
  • And we have loaded our standard cell library and IPs.
• We can move on to discuss the “brains” of the synthesis process.
Synthesis flow: Syntax Analysis → Library Definition → Elaboration and Binding → Pre-mapping Optimization → Constraint Definition → Technology Mapping → Post-mapping Optimization → Report and export
Boolean Minimization
• This is the core of synthesis and has been a very central subject of research in computer science since the eighties.
[Figure: RTL → Compilation → Binding → Optimization → Generic Netlist]
Elaboration Illustrated
x1 x2 x3 | f(x1,x2,x3)
 0  0  0 | 1
 0  0  1 | 1
 0  1  0 | 1
 0  1  1 | 0
 1  0  0 | 0
 1  0  1 | 0
 1  1  0 | 1
 1  1  1 | 0
[Figure: elaboration yields inferred registers and two-level logic, which is then passed to Boolean logic optimization]
F1=ACB’+DEF’+A’BCF+…
F2=C’B’+D’GH’+A’FG’+…
…
Two-Level Logic
• During elaboration, primary inputs and outputs (ports) are defined and sequential elements (flip-flops, latches) are inferred.
• This results in a set of combinational logic clouds with:
  • Input ports and register outputs are inputs to the logic
  • Output ports and register inputs are the outputs of the logic
• The outputs can be described as Boolean functions of the inputs.
f = A’C’ + C’D + AC + CD’
K-map of f (columns: AB, rows: CD):
CD\AB  00 01 11 10
00      1  1  0  0
01      1  1  1  1
11      0  0  1  1
10      1  1  1  1
ESPRESSO(F) {
  do {
    reduce(F);
    expand(F);
    irredundant(F);
  } while (F smaller);
  verify(F);
}
[Figure: four K-map panels of the same function]
• Initial set of primes found by Steps 1 and 2 of the Espresso method
• Result of REDUCE: shrink primes while still covering the ON-set
• Second EXPAND generates a different set of prime implicants
• IRREDUNDANT COVER found by the final step of Espresso
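The K-map can be checked mechanically. The following is a minimal Python sketch, not Espresso itself: it assumes the four-term cover f = A’C’ + C’D + AC + CD’ read off the map, and the names KMAP, COVER, minterms and redundant are illustrative. It verifies that the cover matches the ON-set and asks, for each cube, the question that IRREDUNDANT asks: do the other cubes still cover the ON-set without it?

```python
from itertools import product

# K-map from the slide: rows are CD, columns are AB (Gray order 00, 01, 11, 10).
KMAP = {"00": "1100", "01": "1111", "11": "0011", "10": "1111"}
COLS = ["00", "01", "11", "10"]
ON_SET = {(int(ab[0]), int(ab[1]), int(cd[0]), int(cd[1]))
          for cd, row in KMAP.items()
          for ab, bit in zip(COLS, row) if bit == "1"}

# A cube maps a variable index (A=0, B=1, C=2, D=3) to its required value.
COVER = {"A'C'": {0: 0, 2: 0}, "C'D": {2: 0, 3: 1},
         "AC": {0: 1, 2: 1}, "CD'": {2: 1, 3: 0}}

def minterms(cubes):
    """All input points covered by at least one cube."""
    return {p for p in product((0, 1), repeat=4)
            if any(all(p[i] == v for i, v in c.items()) for c in cubes)}

def redundant(name):
    """A cube is redundant if the remaining cubes still cover the ON-set."""
    rest = [c for n, c in COVER.items() if n != name]
    return minterms(rest) == ON_SET

print(minterms(COVER.values()) == ON_SET)     # cover is exact
print([n for n in COVER if redundant(n)])     # cubes that could be dropped
```

For this function no cube can be dropped, so the four-term cover is already irredundant.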
• BDDs are DAGs that represent the truth table of a given function
x1 x2 x3 | f(x1,x2,x3)
 0  0  0 | 1
 0  0  1 | 1
 0  1  0 | 1
 0  1  1 | 0
 1  0  0 | 0
 1  0  1 | 0
 1  1  0 | 1
 1  1  1 | 0
[Figure: decision tree for f(x1,x2,x3) with root node x1, followed by x2 nodes and x3 nodes; the x1=0 branch is the cofactor ~(x2x3) and the x1=1 branch is the cofactor x2~x3]
• Given a Boolean function f(x1,x2,…,xi,…,xn)
• Positive cofactor: fi1 = f(x1,x2,…,1,…,xn)
• Negative cofactor: fi0 = f(x1,x2,…,0,…,xn)
• Shannon’s expansion theorem states that
f = xi·fi1 + ~xi·fi0
f(x1, x2, x3) = ~x1~x2~x3 + ~x1~x2x3 + ~x1x2~x3 + x1x2~x3 = ~x1~x2 + ~x1x2~x3 + x1x2~x3
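The cofactor definitions and the expansion theorem can be tried out on the example function. This is a small Python sketch; the helper names cofactor and shannon are illustrative, and f is encoded directly from its four ON-set minterms.

```python
from itertools import product

def f(x1, x2, x3):
    # ON-set of the example: minterms 000, 001, 010 and 110.
    return int((x1, x2, x3) in {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 1, 0)})

def cofactor(func, index, value):
    """Return func with argument number `index` fixed to `value`."""
    def restricted(*args):
        fixed = list(args)
        fixed[index] = value
        return func(*fixed)
    return restricted

def shannon(func, index):
    """Rebuild func as xi*f_i1 + ~xi*f_i0."""
    pos, neg = cofactor(func, index, 1), cofactor(func, index, 0)
    def expanded(*args):
        xi = args[index]
        return (xi & pos(*args)) | ((1 - xi) & neg(*args))
    return expanded

# The expansion reproduces f for every choice of the splitting variable.
for i in range(3):
    g = shannon(f, i)
    assert all(g(*p) == f(*p) for p in product((0, 1), repeat=3))

# Cofactors with respect to x1, listed for (x2, x3) = 00, 01, 10, 11.
print([cofactor(f, 0, 0)(0, a, b) for a, b in product((0, 1), repeat=2)])
print([cofactor(f, 0, 1)(0, a, b) for a, b in product((0, 1), repeat=2)])
```

The two printed lists are [1, 1, 1, 0] and [0, 0, 1, 0], which are the functions ~(x2x3) and x2~x3.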
[Figure: reduction of the decision tree for f(x1,x2,x3); root x1 with sub-functions ~(x2x3) and x2~x3]
f(x1, x2, x3) = ~x1~x2 + ~x1x2~x3 + x1x2~x3
[Figure: reduced BDD for f(x1,x2,x3) in which the x3 nodes implementing ~x3 are shared]
• Given a BDD for a function f, the BDD for f’ can be obtained by interchanging the terminal nodes.
• Equivalence check.
  • Two functions f and g are equivalent if their BDDs (under the same variable ordering) are the same.
• An Important Point:
  • The size of a BDD can vary drastically if the order in which the variables are expanded is changed.
  • The number of nodes in the BDD can be exponential in the number of variables in the worst case, even after reduction.
[Figure: BDD over variables b, c, d with sub-functions c+bd and c+d and terminal nodes 0 and 1]
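The equivalence check can be sketched with a tiny reduced ordered BDD. This is a minimal Python illustration, not a production BDD package: the class BDD and its methods mk and build are hypothetical names, the variable order is fixed to x1, x2, x3, and the function is built by exhaustive Shannon expansion, which is only practical for a handful of variables.

```python
class BDD:
    """Tiny reduced ordered BDD; node ids 0 and 1 are the terminals."""
    def __init__(self, nvars):
        self.nvars = nvars
        self.nodes = {}      # (var, low, high) -> node id (unique table)

    def mk(self, var, low, high):
        if low == high:                      # redundant test: skip the node
            return low
        key = (var, low, high)
        if key not in self.nodes:            # share isomorphic subgraphs
            self.nodes[key] = len(self.nodes) + 2
        return self.nodes[key]

    def build(self, func, prefix=()):
        """Shannon-expand func on variables in the order x1, x2, ..."""
        if len(prefix) == self.nvars:
            return int(bool(func(*prefix)))
        low = self.build(func, prefix + (0,))
        high = self.build(func, prefix + (1,))
        return self.mk(len(prefix), low, high)

bdd = BDD(3)
# Two different-looking descriptions of the same example function.
sop = lambda x1, x2, x3: (not x1 and not x2) or (not x1 and x2 and not x3) or (x1 and x2 and not x3)
alt = lambda x1, x2, x3: (not x1 and not (x2 and x3)) or (x1 and x2 and not x3)
root_sop = bdd.build(sop)
root_alt = bdd.build(alt)
print(root_sop == root_alt, len(bdd.nodes))
```

Because both descriptions share one unique table and one variable order, equal functions end up at the same root node, so the check reduces to comparing two integers.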
Constraint Definition
Technology Mapping
• Technology mapping is the phase of logic synthesis when gates are selected from a technology library to implement the circuit.
• Why technology mapping?
  • Straight implementation may not be good.
  • For example, F=abcdef as a 6-input AND gate causes a long delay.
  • Gates in the library are pre-designed, they are usually optimized in …
[Figure: library gates with their costs: xor (5), nor3 (3), nor2 (2)]
2. Tree-ifying
We get 3 trees
$$\mathrm{cost}(i) = \min_{k}\Big\{\mathrm{cost}(g_k) + \sum_{i}\mathrm{cost}(k_i)\Big\}$$
• where ki are the inputs to gate g.
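[Figure: tree-covering example on a subject tree of NAND2 (N) and inverter (I) nodes with inputs A, B, C, D, annotated with the best cost at each node; library patterns: NOT, NAND2, AND2, NOR2, AOI21]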
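The dynamic-programming recurrence for tree covering can be sketched as follows. This is a minimal Python illustration under stated assumptions: the subject tree is built only from NAND2 and inverter nodes, the library costs are hypothetical (they are not the values from the figure), AOI21 is left out to keep the pattern set short, and the names LIBRARY, matches and best are illustrative.

```python
from functools import lru_cache

# Subject tree built from NAND2 ('N', a, b) and inverters ('I', a); strings are inputs.
# Library patterns use 'x' for any subtree; the costs are hypothetical.
LIBRARY = {
    "NOT":   (("I", "x"), 2),
    "NAND2": (("N", "x", "x"), 3),
    "AND2":  (("I", ("N", "x", "x")), 4),
    "NOR2":  (("I", ("N", ("I", "x"), ("I", "x"))), 5),
}

def matches(pattern, node):
    """Yield, for every way pattern fits at node, the subtrees left as gate inputs."""
    if pattern == "x":
        yield (node,)
    elif not isinstance(node, str) and pattern[0] == node[0]:
        if pattern[0] == "I":
            yield from matches(pattern[1], node[1])
        else:  # NAND2 is commutative, so try both input orders
            for a, b in ((node[1], node[2]), (node[2], node[1])):
                for left in matches(pattern[1], a):
                    for right in matches(pattern[2], b):
                        yield left + right

@lru_cache(maxsize=None)
def best(node):
    """Return (cost, gate) of the cheapest cover of the tree rooted at node."""
    if isinstance(node, str):
        return (0, None)          # primary input: nothing to implement
    options = []
    for gate, (pattern, gate_cost) in LIBRARY.items():
        for inputs in matches(pattern, node):
            options.append((gate_cost + sum(best(k)[0] for k in inputs), gate))
    return min(options)

nor_tree = ("I", ("N", ("I", "a"), ("I", "b")))
print(best(nor_tree))                               # (5, 'NOR2')
print(best(("N", ("I", ("N", "a", "b")), "c")))     # (7, 'NAND2')
```

With these costs the first tree is covered by a single NOR2 (cost 5) rather than NOT + NAND2 + two NOTs (cost 9), and the second is a NAND2 driven by an AND2 (3 + 4 = 7).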
Verilog for Synthesis - revisited
Some things we may have missed
• Now that we’ve seen how synthesis works, let’s revisit some of
the things we may have skipped or only briefly mentioned earlier…
• Let’s take a simple 4-to-2 encoder as an example:
• Take a one-hot encoded vector and output the position of the ‘1’ bit.
• One possibility would be to describe this logic with a nested if-else block:
always @(x)
begin : encode
if (x == 4'b0001) y = 2'b00;
else if (x == 4'b0010) y = 2'b01;
else if (x == 4'b0100) y = 2'b10;
else if (x == 4'b1000) y = 2'b11;
else y = 2'bxx;
end
Some things we may have missed
• In the previous example, if the encoding was wrong (i.e., not one-hot), we would
have propagated an x in the logic simulation.
• But what if we guarantee that the input is one-hot encoded?
• Then we could write our code differently…
always @(x)
begin : encode
if (x[0]) y = 2'b00;
else if (x[1]) y = 2'b01;
else if (x[2]) y = 2'b10;
else if (x[3]) y = 2'b11;
else y = 2'bxx;
end
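The difference between the two descriptions shows up only on inputs that are not one-hot. The following Python sketch is a behavioral model of the two always blocks (not synthesizable code); the function names are illustrative and the string "x" stands in for the Verilog unknown value.

```python
def encode_exact(x):
    """First version: compare against each legal one-hot code."""
    table = {0b0001: 0, 0b0010: 1, 0b0100: 2, 0b1000: 3}
    return table.get(x, "x")        # "x" models the Verilog unknown

def encode_priority(x):
    """Second version: test one bit at a time, lowest index first."""
    for position in range(4):
        if (x >> position) & 1:
            return position
    return "x"

one_hot = [0b0001, 0b0010, 0b0100, 0b1000]
print(all(encode_exact(x) == encode_priority(x) for x in one_hot))   # True
print(encode_exact(0b0110), encode_priority(0b0110))                 # x 1
```

On legal inputs the two versions agree, so the second one is free to use much simpler comparisons; on an illegal input such as 0110 the first propagates x while the second silently returns the lowest set position.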
A few points about operators
• Logical operators map into primitive logic gates
• Arithmetic operators map into adders, subtractors, …
  • Unsigned 2’s complement
• Shift by a constant, e.g. Y = ~X << 2
  • No logic involved
  • Variable shift amounts are a whole different story (a shifter is needed)
[Figure: bits of X (X[3] …) wired to the bits of Y (Y[5] … Y[0])]
Datapath Synthesis
• Complex operators (Adders, Multipliers, etc.) are implemented in a special way
Clock Gating
• Local clock gating: 3 methods
  • Logic synthesizer finds and implements local gating opportunities
  • RTL code explicitly specifies clock gating
  • Clock gating cell explicitly instantiated in RTL
• Global clock gating: 2 methods
• Conventional RTL Code
//always clock the register
always @ (posedge clk) begin
  if (enable) q <= din;
end
• Low Power Clock Gated RTL
//only clock the ff when enable is true
assign gclk = enable && clk;
always @ (posedge gclk) begin
  q <= din;
end
Solution: Glitch-free Clock Gate
• By latching the enable signal while the clock is low, so that it is held stable during the positive phase, we can eliminate glitches:
//clock gating with glitch prevention latch
always @ (enable or clk)
begin
  if (!clk)
    en_out <= enable;
end
assign gclk = en_out && clk;
[Figure: waveforms of clk, en, en_out and gclk]
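The effect of the latch can be seen on a sampled waveform. This is a small Python sketch of the two gating styles, assuming two samples per clock phase and an enable that changes while clk is high; the waveform values and the helper names are made up for illustration.

```python
def pulse_widths(signal):
    """Lengths of each run of consecutive 1s in a sampled waveform."""
    widths, run = [], 0
    for value in signal + [0]:
        if value:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    return widths

def and_gate(clk, enable):
    """Plain AND gating: gclk = enable && clk."""
    return [c & e for c, e in zip(clk, enable)]

def latch_gate(clk, enable):
    """AND gating with a latch on enable that is transparent while clk is low."""
    gclk, en_out = [], 0
    for c, e in zip(clk, enable):
        if not c:
            en_out = e
        gclk.append(en_out & c)
    return gclk

clk    = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
enable = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # changes while clk is high
print(pulse_widths(and_gate(clk, enable)))      # [1, 1]
print(pulse_widths(latch_gate(clk, enable)))    # [2]
```

Plain AND gating produces two truncated pulses, one sample wide each, while the latched version produces a single full-width clock pulse.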
Merging clock enable gates
• Clock gates with common enable can be merged
• Lower clock tree power, fewer gates
• May impact enable signal timing and skew.
[Figure: several clock gates (E) driven by clk and a common enable are merged into a single clock gate]
Data Gating
• While clock gating is very well understood and automated, a similar situation
occurs due to the toggling of data signals that are not used.
• These situations should be
recognized and data gated.
Timing Optimization
How can we optimize timing?
• There are many ‘transforms’ that the synthesizer applies to the logic to improve the cost function:
  • Resize cells
  • Buffer or clone to reduce load on critical nets
  • Decompose large cells
  • Swap connections on commutative pins or among equivalent nets
  • Move critical signals forward
  • Pad early paths
  • Area recovery
• Simple example:
  • Double inverter removal transform: Delay = 4 before, Delay = 2 after
Resizing, Cloning and Buffering
• Resize a logic gate to better drive a load:
• Or make a copy (clone of the gate) to distribute the load:
• Or just buffer the fanout net:
[Figure: delay d versus load (0 to 1) for gate sizes A, B and C; a gate with inputs a, b driving loads d, e, f, g, h of 0.2 each, shown resized, cloned and buffered]
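Redesign Fan-In/Fan-out Trees
• Redesign Fan-In Tree
[Figure: inputs with arrival times Arr(a)=4, Arr(b)=3, Arr(c)=1, Arr(d)=0 feed a tree of unit-delay gates; restructuring the tree improves the output from Arr(e)=6 to Arr(e)=5. A second figure is labeled Longest Path = 5.]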
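The arrival times in the fan-in example can be reproduced with a few lines of Python. This sketch assumes 2-input gates with a unit delay (the 1s in the figure) and compares a balanced tree with one that always combines the two earliest signals; the function names are illustrative.

```python
import heapq

GATE_DELAY = 1   # assumed unit delay per 2-input gate

def balanced(arrivals):
    """Pair neighbouring signals level by level, ignoring arrival times."""
    level = list(arrivals)
    while len(level) > 1:
        nxt = [max(level[i:i + 2]) + GATE_DELAY for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]

def arrival_aware(arrivals):
    """Always combine the two earliest signals, so late ones see fewer gates."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        first, second = heapq.heappop(heap), heapq.heappop(heap)
        heapq.heappush(heap, max(first, second) + GATE_DELAY)
    return heap[0]

arr = [4, 3, 1, 0]                          # Arr(a), Arr(b), Arr(c), Arr(d)
print(balanced(arr), arrival_aware(arr))    # 6 5
```

The balanced tree gives Arr(e)=6, while placing the late signals a and b close to the output gives Arr(e)=5.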
Decomposition and Swapping
• Consider decomposing complex gates into less complex ones:
Retiming
• Given the following network:
[Figure: three flip-flops (FF) on a common clock with combinational blocks of delay 6, 4, 2, 4, 4; Clock Cycle = 10]
• How would you meet the 10ns clock cycle time?
• Re-order sequential elements and combinational logic
[Figure: the same blocks (6, 4, 2, 4, 4) with the middle flip-flop moved; Clock Cycle = 10]
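A rough numeric view of the retiming example, as a Python sketch. It assumes that the five numbers in the figure are delays of logic blocks connected in series between the first and last flip-flop, and that the middle flip-flop originally sits after the third block; both are assumptions about the figure, and the function names are illustrative.

```python
def stage_delays(delays, cut):
    """Sum the logic delay on each side of a register placed after block `cut`."""
    return sum(delays[:cut]), sum(delays[cut:])

def best_cut(delays):
    """Register position that minimizes the slower of the two stages."""
    return min(range(1, len(delays)), key=lambda c: max(stage_delays(delays, c)))

delays = [6, 4, 2, 4, 4]          # block delays from the figure, assumed in series
print(stage_delays(delays, 3))    # assumed original register position: (12, 8)
cut = best_cut(delays)
print(cut, stage_delays(delays, cut))   # 2 (10, 10)
```

Under these assumptions the original split (12, 8) misses the 10ns cycle, while moving the register one block earlier gives two stages of exactly 10.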
Topographical Synthesis
• Also called Physical Aware Synthesis
Design Compiler (DC) Flow
DC NXT Transformations
Translation → Logic Optimization → Gate Mapping
1 Library (set target_library, set link_library, create_lib)
2 Translate (analyze, elaborate)
3 Floorplan (read_floorplan)
4 Constrain (source)
5 Compile: Logic optimization + gate mapping (compile_ultra)
6 Save (write_icc2_files)
Inputs: RTL Source, Logic Libraries (db), design library, Physical Library (ndm), Timing Constraints (create_clock …, set_input_delay …)
RTL source example:
residue = 16'h0000;
if (high_bits == 2'b10)
  residue = state_table[index];
else
  state_table[index] = 16'h0000;
[Figure: generic Boolean (GTECH, unmapped) gates are turned into technology-specific placed gates labeled 1x, 2x, 3x, 4x and 8x]
DC NXT Physical Synthesis Flow
Inputs: RTL Design, Design Constraints, Logic Libraries (db), design library, Physical Library (ndm), Floorplan
DC NXT Physical Synthesis steps:
1. Load Libraries
2. Load RTL Code
3. Load Floorplan
4. Apply Constraints
5. Synthesize the Design
6. Analyze Results
7. Write out Netlist with Cell Placement
Output: Netlist with standard cell placement (“SPG” placement), handed to Design Planning & Placement (ICC II DP / ICC II)
Target Library: Used to Select Technology Specific Cells
• The target_library content is used during compile to create a technology-specific gate-level netlist
• DC NXT optimization selects the smallest technology-specific gates that meet the required DRCs, timing and logic functionality
• Default setting: (printvar target_library)
target_library = your_library.db   (non-existent default library name)
• Before compile, specify the actual standard cell logic library file(s) provided by the silicon vendor or library group
Reading RTL File(s) with analyze + elaborate
CWD or PWD: The directory that DC NXT is invoked from, e.g. risc_design (CWD)
The “Chicken and Egg” Problem
• Physical synthesis requires a floorplan
• Generating a floorplan requires a netlist
• But synthesizing a netlist requires a floorplan …
• But generating a floorplan requires a netlist ….
[Figure: loop of RTL Design → Physical Synthesis → Netlist → Design Planning (ICC II DP) → Floorplan → Physical Synthesis]
Modified FP Constraints – Pre-Floorplan Synthesis
Prior to having a finalized floorplan available, if any floorplan constraints are expected to be significantly different from the default, specify them before the 1st synthesis run (after loading the RTL design).
[Figure: floorplan containing IP and RAM blocks]
Default Design Scenario
[Figure: JANE’s_DESIGN (assumed external launching circuitry) contains FF1 and logic M; MY_DESIGN (current design) contains logic N, FF2, logic X, FF3 and logic S; JOE’s_DESIGN (assumed external capturing circuitry) contains logic T and FF4; all flip-flops are clocked by Clk]
Timing Path Definition
[Figure: CURRENT_DESIGN with ports A, B, C, D, CLK_IN and CLK_OUT, logic N, X, S and F, and flip-flops FF2 and FF3; four paths are marked: Path 1, Path 2, Path 3 and Path 4]
DC NXT breaks designs into timing paths, each with a:
Startpoint
  Input port (other than a Clock port)
  Clock pin of Flip-Flop or register
Endpoint
  Output port (other than a Clock port)
  Any input pin of a sequential device, except clock pin
Constraining Reg-to-Reg Paths: Example
Spec: Clock Period = 2ns
Unit of time is 1ns in this example (defined in the technology library).
[Figure: FF1 and logic M in JANE’s_DESIGN drive port A of MY_DESIGN, followed by logic N, FF2, logic X, FF3 and logic S, all clocked by Clk; labels: 0.6ns, Tmax, TSetup,FF2 = 0.2ns]
Constraining Output Paths:
Spec:
Latest Data Arrival Time at Port B, before Joe’s capturing clock = 0.8ns
mydesign.con
create_clock -period 2 [get_ports Clk]
set_clock_uncertainty -setup 0.3 [get_clocks Clk]
set_input_delay -max 0.6 -clock Clk [get_ports A]
set_output_delay -max 0.8 -clock Clk [get_ports B]
[Figure: output port B of MY_DESIGN drives logic T and FF4 in JOE’s_DESIGN; TT + Tsetup = 0.7ns + 0.1ns; Tmax is the time available inside MY_DESIGN]
DC NXT calculates how much time is left for the internal logic.
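The time left for the internal logic follows from simple arithmetic on the constraint values. This Python sketch uses the numbers from mydesign.con and assumes the usual setup budget, period minus setup uncertainty minus external delay; the function name is illustrative, and the result is a budget, not what a DC NXT timing report prints.

```python
def internal_budget(period, uncertainty, external_delay):
    """Time left inside the block for a path constrained by an I/O delay."""
    return round(period - uncertainty - external_delay, 3)

PERIOD, UNCERTAINTY = 2.0, 0.3          # create_clock, set_clock_uncertainty -setup
print(internal_budget(PERIOD, UNCERTAINTY, 0.6))   # input path from port A: 1.1
print(internal_budget(PERIOD, UNCERTAINTY, 0.8))   # output path to port B: 0.9
```

The input-path budget still has to accommodate the library setup time of the capturing flip-flop, and the output-path budget includes the clock-to-Q delay of the launching flip-flop.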
Effect of Output Capacitive Load
[Figure: FF3 in MY_DESIGN drives output port B, loaded with 3fF or 30fF; waveforms of Clk (0ns to 2ns) and B, with 1.2ns measured to the 50% point of B]
Capacitive loading on an output port affects the transition time, and thereby the cell delay, of the output driver.
By default DC NXT assumes zero capacitive loading on outputs. It is therefore important to accurately model capacitive loading on all outputs.
Modeling Output Capacitive Load: Example 1
Spec: Maximum capacitive load on output port B = 30fF
Unit of capacitance is 1pF in this example (defined in the technology library).
mydesign.con
create_clock -period 2 [get_ports Clk]
set_input_delay -max 0.6 -clock Clk [get_ports A]
set_output_delay -max 0.8 -clock Clk [get_ports B]
set_load -max [expr {30.0/1000}] [get_ports B]
Effect of Input Transition Time
[Figure: input port A of MY_DESIGN feeds FF2; waveforms of Clk (0ns to 2ns), data at A and data at the FF2 D-pin, showing the input data arrival time (0.6ns), a fast or slow transition, 1.4ns, and the FF2 setup time]
Rise and fall transition times on an input port affect the cell delay of the input gate.
Modeling Input Transition: Example 1
Spec: Maximum rise/fall input transition on input port A = 0.12ns
mydesign.con
create_clock -period 2 [get_ports Clk]
set_input_delay -max 0.6 -clock Clk [get_ports A]
set_output_delay -max 0.8 -clock Clk [get_ports B]
set_load -max [expr {30.0/1000}] [get_ports B]
set_input_transition -max 0.12 [get_ports A]
Default compile_ultra Optimizations
compile_ultra performs three levels of optimization:
  Architectural level or high level synthesis
    Only performed when compiling RTL or unmapped ddc
  Logic level or GTECH optimization
  Gate-level or mapping optimization
Optimization priority: Logic DRCs and timing
  Minimizes area without impacting other constraints
When used in Topo mode, performs under-the-hood placement and routing estimation to calculate net RCs
Requires an Ultra as well as a DesignWare Foundation license
Invokes additional high performance optimization algorithms (examples follow)
[Figure: RTL Description or unmapped ddc → Architectural → Logic level → Gate level → Optimized Netlist]
What is the DesignWare Library?
A collection of soft IP blocks and Datapath components:
Technology independent, pre-verified, reusable, parameterizable, synthesizable
Accessing the Right Component:
Operator inferencing for arithmetic operators (architectural level)
+, -, *, >, =, <
Operators greater than 4 bits wide infer a hierarchical sub-block
Instantiation for a wide variety of standard IP
DW_fifo_..., DW_shiftreg, DW_div_seq, DW_ram_...
compile_ultra
• Combinational logic optimization
• Sequential logic optimization: the DFF-ENBL combination is logically equivalent to the U7 MUX with the FF2 D-FF, but smaller and faster
• Auto-ungrouping allows improved combo and sequential logic optimization
Note: May affect formal verification tools, and RTL-based testbenches!
[Figure: TOP containing SUB1 with FF1, cell U56 and an enable flip-flop (EDFF) with ENBL input]
Boundary Optimization - ON by Default
[Figure: TOP with sub-blocks SUB1 to SUB4, inputs In2, In3 and In4, cells U0 to U7, a MUX (0/1) in front of FF2, and FF1; a timing-critical path crosses the block boundaries where boundary optimization is applied]
compile_ultra -no_autoungroup
Note: May affect formal verification tools, and RTL-based testbenches!
[Figure: TOP with sub-blocks SUB1 to SUB4 kept as hierarchy, containing FF1, U5, U6, U7, the MUX (0/1), FF2 and input In4]
Generate a Constraint Report After Compile
compile_ultra ...
report_constraint -all_violators
max_delay/setup ('Clk' group)
Required Actual
Endpoint Path Delay Path Delay Slack
-----------------------------------------------------------------
I_COUNT/PCint_reg[4]/D 1.71 1.75 r -0.03 (VIOLATED)
I_COUNT/PCint_reg[6]/D 1.71 1.75 r -0.03 (VIOLATED)
I_COUNT/PCint_reg[2]/D 1.71 1.75 r -0.03 (VIOLATED)
I_COUNT/PCint_reg[3]/D 1.71 1.74 r -0.03 (VIOLATED)
I_COUNT/PCint_reg[0]/D 1.71 1.73 f -0.02 (VIOLATED)
I_COUNT/PCint_reg[7]/D 1.71 1.73 r -0.02 (VIOLATED)
I_COUNT/PCint_reg[1]/D 1.71 1.72 r -0.01 (VIOLATED)
max_capacitance
Required Actual
Net Capacitance Capacitance Slack
---------------------------------------------------------------
CurrentState[0] 0.20 0.24 -0.04 (VIOLATED)
Timing Report: Path Delay Section
[Figure: timing report with two annotated columns: the individual contribution to the path delay, and the running total of the path delay]
Data Needed for Physical Design or Layout
From DC NXT to IC Compiler II:
write_icc2_files \
  -out DESIGN_icc2
From DC NXT to a 3rd Party Layout tool:
write_file -f verilog ..
write_sdc ..
write_scan_def ..
Main References
• Rob Rutenbar “From Logic to Layout”
• IDESA
• Rabaey, “Low Power Design Essentials”
• vlsicad.ucsd.edu ECE 260B – CSE 241A
• Roy Shor, BGU
• Synopsys slides