
Lecture 4 - Synthesis - Part 2 2022

The lecture outline discusses Boolean minimization, constraint definition, technology mapping, Verilog for synthesis, timing optimization, and the Design Compiler flow. The document then explains that after loading the design and library, the synthesis process performs Boolean minimization by compiling the RTL into a Boolean data structure, binding non-Boolean modules to cells, and optimizing the Boolean logic. This results in a generic netlist that can be mapped to specific gates in a target technology.

Uploaded by

Shay Samia
Copyright © All Rights Reserved

Digital VLSI Design

Lecture 4: Logic Synthesis


Part 2
Semester B, 2021-22
Lecturer: Mr. Zvika Webb
14 March 2022

Disclaimer: This course was prepared, in its entirety, by Zvi Webb. Many materials were copied from sources freely available on the internet. When possible, these sources have been cited;
however, some references may have been cited incorrectly or overlooked. If you feel that a picture, graph, or code example has been copied from you and either needs to be cited or removed,
please feel free to email [email protected] and I will address this as soon as possible.
Lecture Outline

• Boolean Minimization
• Constraint Definition
• Technology Mapping
• Verilog for Synthesis – revisited
• Timing Optimization
• Design Compiler (DC) Flow

What have we discussed so far?
• Not too much…
• We briefly discussed compilation.
• And then we really got down and dirty with standard cell libraries.
• So at this point:
• We have loaded our design into the synthesizer.
• And we have loaded our standard cell library and IPs.
• We can now move on to discuss the “brains” of the synthesis process.
Synthesis flow: Syntax Analysis → Library Definition → Elaboration and Binding → Pre-mapping Optimization → Constraint Definition → Technology Mapping → Post-mapping Optimization → Report and export
Boolean Minimization
Mapping to Generics and Libs, Basics of Boolean Minimization (BDDs, Two-Level Logic, Espresso)
Elaboration and Binding
• During the next step of logic synthesis, the tool:
• Compiles the RTL into a Boolean data structure (elaboration),
• Binds the non-Boolean modules to leaf cells (binding), and
• Optimizes the Boolean logic (minimization).
• The resulting design is mapped to generic, technology-independent logic gates.
• This is the core of synthesis and has been a central subject of research in computer science since the eighties.
[Figure: RTL → Compilation → Binding → Optimization → Generic Netlist]
Elaboration Illustrated
x1 x2 x3 | f(x1,x2,x3)
0  0  0  | 1
0  0  1  | 1
0  1  0  | 1
0  1  1  | 0
1  0  0  | 0
1  0  1  | 0
1  1  0  | 1
[Figure: elaboration turns the RTL into Boolean functions and inferred registers; Boolean logic optimization then yields two-level logic such as:]
F1 = ACB’ + DEF’ + A’BCF + …
F2 = C’B’ + D’GH’ + A’FG’ + …
Two-Level Logic
• During elaboration, primary inputs and outputs (ports) are defined and sequential elements (flip-flops, latches) are inferred.
• This results in a set of combinational logic clouds with:
• Input ports and register outputs as the inputs to the logic
• Output ports and register inputs as the outputs of the logic
• The outputs can be described as Boolean functions of the inputs.
• The goal of Boolean minimization is to reduce the number of literals in the output functions.
• Many different data structures are used to represent the Boolean functions:
• Truth tables, cubes, Binary Decision Diagrams, equations, etc.
• Much of the research was built on the SOP or POS representation, better known as “Two-Level Logic”.
[Figure: combinational logic clouds between registers; a Boolean cube over x1, x2, x3 illustrating f = x1x2]
Two-Level Logic Minimization
• In our freshman year we learned about Karnaugh maps:
• For n inputs, the map contains 2^n entries
• The objective is to find the minimum prime cover
        AB
CD      00  01  11  10
00      X   1   0   1
01      0   1   1   1
11      0   X   X   0
10      0   1   0   1
• However…
• Difficult to automate (NP-complete)
• The number of cells is exponential in n, so maps are practical only for fewer than 6 variables
• A different approach is the Quine-McCluskey method
• Easy to implement in software
• BUT its computational complexity is too high
• Some Berkeley students fell asleep while solving a Quine-McCluskey exercise.
They needed a shot of Espresso.
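The first step of the Quine-McCluskey method (repeatedly merging cubes that differ in a single variable until nothing merges) is easy to sketch in Python. This is a minimal illustration of prime-implicant generation only: it skips the covering step that picks a minimum subset of primes, which is where the high complexity shows up. Cubes are written as strings over 0, 1 and - (don't care); the function names and the example function are our own choices.

```python
from itertools import combinations


def combine(a, b):
    """Merge two cubes that differ in exactly one fixed bit; return None otherwise."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None


def prime_implicants(n, minterms, dont_cares=()):
    """Quine-McCluskey step 1: merge adjacent cubes; cubes that never merge are primes."""
    cubes = {format(m, f"0{n}b") for m in set(minterms) | set(dont_cares)}
    primes = set()
    while cubes:
        merged, used = set(), set()
        for a, b in combinations(sorted(cubes), 2):
            c = combine(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= cubes - used      # nothing left to merge with: prime
        cubes = merged
    return primes


# f = sum of minterms (0, 1, 2, 5, 6, 7) over three variables, bit order A C D
print(sorted(prime_implicants(3, [0, 1, 2, 5, 6, 7])))
# ['-01', '-10', '0-0', '00-', '1-1', '11-']
```

Reading the bit order as A, C, D, the six primes are C’D, CD’, A’D’, A’C’, AD and AC. These are exactly the primes of the function in the Espresso example that follows (that function does not depend on B), and they show why choosing a minimum cover among primes is the hard part.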
Espresso Heuristic Minimizer
• Start with an SOP solution.
• Expand
• Make each cube as large as possible without covering a point in the OFF-set.
• Irredundant
• Throw out redundant cubes.
• Remove smaller cubes whose points are covered by larger cubes.
• Reduce
• The cubes in the cover are reduced in size.
• Increases the number of literals (a temporarily worse solution).
• In general, the new cover will be different from the initial cover.
• The “expand” and “irredundant” steps can then possibly find a new way to cover the points in the ON-set.
• Hopefully, the new cover will be smaller.
ESPRESSO(F) {
  do {
    reduce(F);
    expand(F);
    irredundant(F);
  } while (fewer terms in F);
  verify(F);
}
Espresso Example
Karnaugh map of f (both maps on the slide show this same function):
        AB
CD      00  01  11  10
00      1   1   0   0
01      1   1   1   1
11      0   0   1   1
10      1   1   1   1
Initial set of primes found by steps 1 and 2 of the Espresso method:
$f = A'C' + C'D + AC + CD'$
4 primes, irredundant cover, but not a minimal cover!
Result of REDUCE (shrink primes while still covering the ON-set):
$f = A'C' + AC'D + AC + A'CD'$
The choice of order in which to perform the shrink is important.
Espresso Example (cont.)
Second EXPAND generates a different set of prime implicants:
$f = A'C' + AD + AC + CD'$
IRREDUNDANT COVER found by the final step of Espresso:
$f = A'C' + AD + CD'$
Only three prime implicants, and only 6 literals!
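To check the Espresso example by brute force, the Python sketch below evaluates every intermediate cover over all 16 input points and counts its literals. It encodes the K-map’s four OFF-set points directly (this map has no don’t-cares); the data layout and names are our own choices.

```python
from itertools import product

VARS = "ABCD"
# OFF-set read from the Karnaugh map, as (A, B, C, D) tuples
OFF_SET = {(0, 0, 1, 1), (0, 1, 1, 1), (1, 0, 0, 0), (1, 1, 0, 0)}

# Each cube maps a variable to its required value; absent variables are free.
COVERS = {
    "initial primes": [{"A": 0, "C": 0}, {"C": 0, "D": 1}, {"A": 1, "C": 1}, {"C": 1, "D": 0}],
    "after REDUCE": [{"A": 0, "C": 0}, {"A": 1, "C": 0, "D": 1}, {"A": 1, "C": 1}, {"A": 0, "C": 1, "D": 0}],
    "after 2nd EXPAND": [{"A": 0, "C": 0}, {"A": 1, "D": 1}, {"A": 1, "C": 1}, {"C": 1, "D": 0}],
    "IRREDUNDANT": [{"A": 0, "C": 0}, {"A": 1, "D": 1}, {"C": 1, "D": 0}],
}


def cube_contains(cube, point):
    return all(point[VARS.index(v)] == val for v, val in cube.items())


def implements_f(cover):
    """True if the cover is 1 exactly on the ON-set (every point outside OFF_SET)."""
    return all(any(cube_contains(c, p) for c in cover) == (p not in OFF_SET)
               for p in product((0, 1), repeat=4))


def literal_count(cover):
    return sum(len(cube) for cube in cover)


for name, cover in COVERS.items():
    print(f"{name:17s} valid={implements_f(cover)} literals={literal_count(cover)}")
```

All four covers implement the same function; the literal count goes 8, 10 (REDUCE temporarily makes things worse), 8 and finally 6.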


Multi-level Logic Minimization
• Two-level logic minimization has been widely researched and many famous methods have come out of it.
• However, it is often better and/or more practical to use many levels of logic (remember logical effort?).
• Therefore, a whole new optimization regime, known as multi-level logic minimization, was developed.
• We will not cover multi-level minimization in this course; however, you should be aware that the output of logic minimization will generally be multi-level and not two-level.
Multi-level Logic Minimization
• For example:
• Given the following logic set (17 literals):
t1 = a + bc;
t2 = d + e;
t3 = ab + d;
t4 = t1t2 + fg;
t5 = t4h + t2t3;
F = t5’;
• Multi-level logic minimization can result in (13 literals):
t1 = d + e;
t2 = b + h;
t3 = at2 + c;
t4 = t1t3 + fgh;
F = t4’;
[Figure: each version drawn as a network of logic nodes driving F]
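The 17 and 13 literal counts above come from counting every variable occurrence on the right-hand sides (an intermediate signal such as t2 counts as one literal, and the complement in F = t5’ does not add one). A small Python sketch of that bookkeeping, with the equations typed in as strings, is shown below; the regular expression assumes single-letter inputs a to h and intermediates named t followed by digits.

```python
import re

ORIGINAL = {
    "t1": "a + bc", "t2": "d + e", "t3": "ab + d",
    "t4": "t1t2 + fg", "t5": "t4h + t2t3", "F": "t5'",
}
OPTIMIZED = {
    "t1": "d + e", "t2": "b + h", "t3": "at2 + c",
    "t4": "t1t3 + fgh", "F": "t4'",
}

# A literal is an intermediate signal (t1, t2, ...) or a primary input a..h.
LITERAL = re.compile(r"t\d+|[a-h]")


def count_literals(network):
    return sum(len(LITERAL.findall(rhs)) for rhs in network.values())


print(count_literals(ORIGINAL), count_literals(OPTIMIZED))   # 17 13
```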
Binary Decision Diagrams (BDD)
• BDDs are directed acyclic graphs (DAGs) that represent the truth table of a given function.
x1 x2 x3 | f(x1,x2,x3)
0  0  0  | 1
0  0  1  | 1
0  1  0  | 1
0  1  1  | 0
1  0  0  | 0
1  0  1  | 0
1  1  0  | 1
1  1  1  | 0
[Figure: full binary decision tree for f(x1,x2,x3) with root node x1 followed by x2 and x3 levels; the x1 = 0 branch is labeled ~(x2x3) and the x1 = 1 branch x2~x3; leaves 1 1 1 0 0 0 1 0]
f(x1, x2, x3) = ~x1~x2~x3 + ~x1~x2x3 + ~x1x2~x3 + x1x2~x3
Binary Decision Diagrams (BDD)
• The Shannon expansion of a function relates the function to its cofactors:
• Given a Boolean function $f(x_1, x_2, \ldots, x_i, \ldots, x_n)$
• Positive cofactor: $f_{i1} = f(x_1, x_2, \ldots, 1, \ldots, x_n)$
• Negative cofactor: $f_{i0} = f(x_1, x_2, \ldots, 0, \ldots, x_n)$
• Shannon’s expansion theorem states that
• $f = x_i' f_{i0} + x_i f_{i1}$
• $f = (x_i + f_{i0})(x_i' + f_{i1})$
• This leads to the formation of a BDD:
• Example: $f = ac + bc + a'b'c'$
$= a'(b'c' + bc) + a(c + bc)$
$= a'(b'c' + bc) + a(c)$
[Figure: root node a with its 0-branch b’c’ + bc and its 1-branch c]
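The Shannon expansion of the example can be checked exhaustively. The Python sketch below builds the two cofactors of f = ac + bc + a’b’c’ with respect to a and confirms both forms of the theorem, and the cofactors b’c’ + bc and c, on all eight input combinations. The helper name cofactor is our own.

```python
from itertools import product


def f(a, b, c):
    """f = ac + bc + a'b'c' from the example above."""
    return (a and c) or (b and c) or (not a and not b and not c)


def cofactor(func, index, value):
    """Return func with argument `index` fixed to `value` (a Shannon cofactor)."""
    def restricted(*args):
        args = list(args)
        args[index] = value
        return func(*args)
    return restricted


f_a0 = cofactor(f, 0, False)   # negative cofactor with respect to a
f_a1 = cofactor(f, 0, True)    # positive cofactor with respect to a

for a, b, c in product((False, True), repeat=3):
    expected = bool(f(a, b, c))
    sop_form = (not a and f_a0(a, b, c)) or (a and f_a1(a, b, c))   # a' f_a0 + a f_a1
    pos_form = (a or f_a0(a, b, c)) and (not a or f_a1(a, b, c))    # (a + f_a0)(a' + f_a1)
    assert bool(sop_form) == bool(pos_form) == expected
    assert bool(f_a0(a, b, c)) == ((not b and not c) or (b and c))  # b'c' + bc
    assert bool(f_a1(a, b, c)) == c                                 # c
print("Shannon expansion verified for all 8 input combinations")
```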
Reduced Ordered BDD (ROBDD)
• BDDs can get very big.
• So let’s see if we can provide a reduced representation.
• Reduction Rule 1: Merge equivalent leaves
[Figure: the decision tree of f(x1,x2,x3) before and after merging all 1-leaves into a single terminal and all 0-leaves into another]
f(x1, x2, x3) = ~x1~x2~x3 + ~x1~x2x3 + ~x1x2~x3 + x1x2~x3 = ~x1~x2 + ~x1x2~x3 + x1x2~x3
Reduced Ordered BDD (ROBDD)
• Reduction Rule 2: Merge isomorphic nodes
[Figure: generic rule (two nodes testing x with the same children y and z are merged into one), applied to the BDD of f(x1,x2,x3) so that the identical ~x3 sub-graphs are shared]
Reduced Ordered BDD (ROBDD)
• Reduction Rule 3: Eliminate Redundant Tests
[Figure: generic rule (a node testing x whose two edges both lead to y is removed), applied to the BDD of f(x1,x2,x3): the x3 test under ~x1~x2 disappears]
f(x1, x2, x3) = ~x1~x2 + ~x1x2~x3 + x1x2~x3
Binary Decision Diagrams (BDD)
• Some benefits of BDDs:
• Checking for a tautology is trivial.
• The BDD is the constant 1.
• Complementation.
• Given a BDD for a function f, the BDD for f’ can be obtained by interchanging the terminal nodes.
• Equivalence check.
• Two functions f and g are equivalent if their BDDs (under the same variable ordering) are the same.
• An important point:
• The size of a BDD can vary drastically if the order in which the variables are expanded is changed.
• The number of nodes in the BDD can be exponential in the number of variables in the worst case, even after reduction.
[Figure: BDD of f = ab + a’c + a’bd with root node a; the a = 0 branch is c + bd, which splits on b into c and c + d; terminals 0 and 1]
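The three reduction rules and the benefits listed above can all be seen in a small Python ROBDD manager. This is our own minimal sketch, not a production BDD package: it builds each diagram by brute-force Shannon expansion over all 2^n assignments, so it only suits a handful of variables. A unique table enforces rules 2 and 3, and having only two terminal ids enforces rule 1. In a shared manager like this one, complementation rebuilds the graph with the terminals swapped (reusing existing nodes) rather than relabeling in place. The function g = a1b1 + a2b2 + a3b3 is a standard ordering-sensitivity example and is not from the slides.

```python
class BDD:
    """Minimal reduced ordered BDD manager (illustration only).

    Ids 0 and 1 are the terminals (rule 1: equal leaves are one node).
    Internal nodes are (level, low, high) triples kept in a unique table.
    """

    def __init__(self, order):
        self.order = list(order)      # variable order, root variable first
        self.nodes = {}               # id -> (level, low, high)
        self.unique = {}              # (level, low, high) -> id

    def mk(self, level, low, high):
        if low == high:               # rule 3: eliminate a redundant test
            return low
        key = (level, low, high)
        if key not in self.unique:    # rule 2: reuse an isomorphic node
            node_id = len(self.nodes) + 2
            self.nodes[node_id] = key
            self.unique[key] = node_id
        return self.unique[key]

    def build(self, func, level=0, assignment=()):
        """Build by recursive Shannon expansion on self.order[level]."""
        if level == len(self.order):
            return 1 if func(dict(zip(self.order, assignment))) else 0
        low = self.build(func, level + 1, assignment + (0,))
        high = self.build(func, level + 1, assignment + (1,))
        return self.mk(level, low, high)

    def negate(self, root, memo=None):
        """Complement: the same graph with the 0 and 1 terminals interchanged."""
        if root in (0, 1):
            return 1 - root
        memo = {} if memo is None else memo
        if root not in memo:
            level, low, high = self.nodes[root]
            memo[root] = self.mk(level, self.negate(low, memo), self.negate(high, memo))
        return memo[root]

    def size(self, root):
        """Number of internal (non-terminal) nodes reachable from root."""
        seen, stack = set(), [root]
        while stack:
            n = stack.pop()
            if n in (0, 1) or n in seen:
                continue
            seen.add(n)
            _, low, high = self.nodes[n]
            stack.extend((low, high))
        return len(seen)


def f(v):   # the slides' SOP for f(x1, x2, x3)
    return ((not v["x1"] and not v["x2"] and not v["x3"]) or (not v["x1"] and not v["x2"] and v["x3"])
            or (not v["x1"] and v["x2"] and not v["x3"]) or (v["x1"] and v["x2"] and not v["x3"]))


def g(v):   # a1b1 + a2b2 + a3b3
    return any(v[f"a{i}"] and v[f"b{i}"] for i in (1, 2, 3))


bdd = BDD(["x1", "x2", "x3"])
root = bdd.build(f)
print(bdd.size(root))                                          # internal nodes of f
print(root == bdd.build(lambda v: (not v["x1"] and not v["x2"]) or (v["x2"] and not v["x3"])))
print(bdd.build(lambda v: True) == 1)                          # tautology check
print(bdd.negate(root) == bdd.build(lambda v: not f(v)))       # complementation

good = BDD(["a1", "b1", "a2", "b2", "a3", "b3"])               # interleaved order
bad = BDD(["a1", "a2", "a3", "b1", "b2", "b3"])                # all a's first
print(good.size(good.build(g)), bad.size(bad.build(g)))
```

The ROBDD of f has 4 internal nodes, and equivalence with the simplified form ~x1~x2 + x2~x3 reduces to comparing root ids. For g, the interleaved order gives 6 internal nodes while putting all a’s first gives 14, and that gap grows exponentially with the number of pairs.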
Constraint Definition
Constraint Definition
• Following elaboration, the design is loaded into the synthesis tool and stored inside a data structure.
• Hierarchical ports (inputs/outputs) and registers can be accessed by name:
set in_ports [get_ports IN*]
set regs [get_cells -hier *_reg]
• At this point, we can load the design constraints in SDC format, as we will learn in Lecture 5:
read_sdc -verbose sdc/constraints.sdc
• For example, to create a clock and define the target frequency:
create_clock -period $PERIOD -name $CLK_NAME [get_ports $CLK_PORT]
• Carefully check that all constraints were accepted by the tool!
Design Objects
• Design: A circuit description that performs one or more logical functions (i.e., a Verilog module).
• Cell: An instantiation of a design within another design (i.e., a Verilog instance).
• Called an inst in Stylus Common UI.
• Reference: The original design that a cell “points to” (i.e., a Verilog sub-module).
• Called a module in Stylus Common UI.
• Port: The input, output or inout port of a Design.
• Pin: The input, output or inout pin of a Cell in the Design.
• Net: The wire that connects Ports to Pins and/or Pins to each other.
• Clock: Port of a Design or Pin of a Cell explicitly defined as a clock source.
• Called a clock_tree in Stylus Common UI.
module foo (a,b,out);                        // Design
  input a, b;                                // Ports
  output out;
  wire n1;                                   // Net
  INVx1 U1 (.in(a),.out(n1));                // Cell (inst) U1 of reference (module) INVx1; .in/.out are Pins
  NANDX3 U2 (.in1(n1),.in2(b),.out(out));
endmodule
Technology Mapping
Technology Mapping
• Technology mapping is the phase of logic synthesis in which gates are selected from a technology library to implement the circuit.
• Why technology mapping?
• A straight implementation may not be good.
• For example, F=abcdef as a 6-input AND gate causes a long delay.
• Gates in the library are pre-designed; they are usually optimized in terms of area, delay, power, etc.
• Use the fastest gates along the critical path and area-efficient gates (combination) off the critical path.
• A minimum-cost tree-covering algorithm can be applied to solve this problem.
Technology Mapping Algorithm
• Using a recursive tree-covering algorithm, we can easily, and almost optimally, map a logic network to a technology library.
• This process involves three steps:
• Map the netlist and the tech library to simple gates
• Describe the netlist with only NAND2 and NOT gates
• Describe the SC library with NAND2 and NOT gates and associate a cost with each gate
• Tree-ify the input netlist
• Tree covering can only be applied to trees!
• Split the tree at every node with fanout > 1
• Minimum-cost tree matching
• For each node in your tree, recursively find the minimum cost target pattern at that node.
• Let us briefly go through these steps.
1. Simple Gate Mapping
• Apply De Morgan laws to your Boolean function to make it a collection of NAND2 and NOT gates.
• Let’s take the example of multi-level logic minimization:
t1 = d + e;
t2 = b + h;
t3 = at2 + c;
t4 = t1t3 + fgh;
F = t4’;
$t_1 = d + e = \mathrm{NAND}(d', e')$
$t_2 = b + h = \mathrm{NAND}(b', h')$
$t_3 = a t_2 + c = \mathrm{NAND}(\mathrm{NAND}(a, t_2), c')$
$t_4 = t_1 t_3 + fgh = \mathrm{NAND}((t_1 t_3)', (fgh)')$
$(fgh)' = \mathrm{NAND}(fh, g) = \mathrm{NAND}(\mathrm{NAND}(f, h)', g)$
$F = t_4'$
[Figure: the resulting NAND2/NOT network from inputs a to h to the output F]
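The De Morgan rewrites above can be verified exhaustively. The Python sketch below evaluates the original multi-level network and its NAND2/NOT version on all 256 input combinations; the NAND and NOT helpers are plain 0/1 models, not library cells.

```python
from itertools import product


def NAND(x, y):
    return 1 - (x & y)


def NOT(x):
    return 1 - x


def original(a, b, c, d, e, f, g, h):
    t1 = d | e
    t2 = b | h
    t3 = (a & t2) | c
    t4 = (t1 & t3) | (f & g & h)
    return NOT(t4)


def nand_not(a, b, c, d, e, f, g, h):
    t1 = NAND(NOT(d), NOT(e))                  # d + e
    t2 = NAND(NOT(b), NOT(h))                  # b + h
    t3 = NAND(NAND(a, t2), NOT(c))             # a t2 + c
    fgh_bar = NAND(NOT(NAND(f, h)), g)         # (fgh)'
    t4 = NAND(NAND(t1, t3), fgh_bar)           # t1 t3 + fgh
    return NOT(t4)                             # F = t4'


mismatches = [v for v in product((0, 1), repeat=8) if original(*v) != nand_not(*v)]
print(len(mismatches))   # 0
```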
1. Simple Gate Mapping
• And then, given a set of gates (standard cell library) with cost metrics (area/delay/power):
• We need to define the gates with the same NAND2/NOT set:
[Figure: NAND2/NOT pattern graphs of the library cells, with costs inv (1), nand2 (2), nand3 (3), nor2 (2), nor3 (3), aoi21 (3), oai22 (4), xor (5)]
2. Tree-ifying
• To apply a tree covering algorithm, we must work on a tree!
• Is any given logic network a tree?
• No!
• We must break the network at every node with fanout > 1.
[Figure: the example network split at its multiple-fanout nodes]
We get 3 trees.
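Tree-ifying can be sketched in a few lines of Python: every gate that drives more than one input, plus every primary output, becomes the root of its own tree, and a tree stops growing when it reaches a primary input or another root. The netlist format and the small example network are hypothetical, chosen only to illustrate the split.

```python
from collections import Counter


def treeify(netlist, outputs):
    """Split a DAG into trees rooted at outputs and at nodes with fanout > 1.

    netlist maps each gate output to the list of signals feeding it;
    any signal that is not a key of netlist is a primary input.
    """
    fanout = Counter(src for ins in netlist.values() for src in ins)
    roots = [n for n in netlist if n in outputs or fanout[n] > 1]
    trees = {}
    for root in roots:
        members, stack = [], [root]
        while stack:
            node = stack.pop()
            members.append(node)
            for src in netlist[node]:
                # stop at primary inputs and at other roots (tree boundaries)
                if src in netlist and src not in roots:
                    stack.append(src)
        trees[root] = sorted(members)
    return trees


# Hypothetical network: n1 feeds both n2 and n3, so it becomes its own tree.
netlist = {"n1": ["a", "b"], "n2": ["n1"], "n3": ["n1", "c"], "out": ["n2", "n3"]}
print(treeify(netlist, outputs={"out"}))   # {'n1': ['n1'], 'out': ['n2', 'n3', 'out']}
```

Each tree can then be covered independently, with the signal at a split point treated as a free input of the trees that read it.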
3. Minimum Tree Covering
• Now, we can apply a recursive algorithm to achieve a minimum cover:
• Start at the output of the graph.
• For each node, find all the matching target patterns.
• The cost of node i, minimized over the gates g that match at i, is:
$\mathrm{cost}(i) = \min_{g}\Big\{\mathrm{cost}(g) + \sum_{k_i} \mathrm{cost}(k_i)\Big\}$
• where k_i are the inputs to gate g.
• For simplicity, we will redraw our graph and show an example:
• Every NOT is just an empty circle: I
• Every NAND is just a full circle: N
• Every input is just a box: A
3. Minimum Tree Covering - Example
Library (cost): NOT (2), NAND2 (3), AND2 (4), NOR2 (6), AOI21 (6)
Minimum cost of each node of the subject tree, from the inputs (A, B, C, D) up to the output F:
• x: NAND2: 3
• z: NAND2: 3 + min(x) = 3 + 3 = 6
• y: NOT: 2
• w: NAND2: 3 + min(y) + min(z) = 3 + 2 + 6 = 11
• f (output F), candidate matches:
• NOT: 2 + min(w) = 2 + 11 = 13
• AND2: 4 + min(y) + min(z) = 4 + 2 + 6 = 12
• AOI21: 6 + min(x) = 6 + 3 = 9
• The minimum at f is 9, using AOI21.
[Figure: the subject tree drawn with I (NOT), N (NAND) and input boxes, next to the NAND2/NOT patterns of NOT, NAND2, AND2, NOR2 and AOI21]
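The recursive covering above is a small dynamic program. The Python sketch below implements it for a subject tree that we reconstructed (an assumption, since only the costs survive in the text) so that it reproduces every number in the example: y = NOT(A), x = NAND(C, D), z = NAND(x, B), w = NAND(z, y) and f = NOT(w). Patterns are matched positionally; a real mapper would also try swapped inputs of the commutative NAND gates.

```python
from functools import lru_cache

# Subject tree in NAND2/NOT form: node -> (operator, inputs).
# Names missing from SUBJECT are primary inputs (hypothetical reconstruction).
SUBJECT = {
    "f": ("NOT", ("w",)),
    "w": ("NAND", ("z", "y")),
    "z": ("NAND", ("x", "B")),
    "y": ("NOT", ("A",)),
    "x": ("NAND", ("C", "D")),
}

# Library cells in the same NAND2/NOT form; bare strings are pattern inputs.
LIBRARY = {
    "NOT": (2, ("NOT", "a")),
    "NAND2": (3, ("NAND", "a", "b")),
    "AND2": (4, ("NOT", ("NAND", "a", "b"))),
    "NOR2": (6, ("NOT", ("NAND", ("NOT", "a"), ("NOT", "b")))),
    "AOI21": (6, ("NOT", ("NAND", ("NAND", "a", "b"), ("NOT", "c")))),
}


def match(pattern, node):
    """Return the subject signals bound to the pattern's inputs, or None."""
    if isinstance(pattern, str):        # pattern input: binds to any signal
        return [node]
    if node not in SUBJECT:             # a primary input cannot match a gate
        return None
    op, inputs = SUBJECT[node]
    if op != pattern[0] or len(inputs) != len(pattern) - 1:
        return None
    bound = []
    for sub_pattern, child in zip(pattern[1:], inputs):
        leaves = match(sub_pattern, child)
        if leaves is None:
            return None
        bound.extend(leaves)
    return bound


@lru_cache(maxsize=None)
def best(node):
    """(minimum cost, gate chosen) for the sub-tree rooted at node."""
    if node not in SUBJECT:
        return (0, None)                # primary inputs cost nothing
    options = []
    for gate, (cost, pattern) in LIBRARY.items():
        leaves = match(pattern, node)
        if leaves is not None:
            options.append((cost + sum(best(k)[0] for k in leaves), gate))
    return min(options)


for node in ("x", "y", "z", "w", "f"):
    print(node, best(node))
```

At f the three candidate matches cost 13 (NOT), 12 (AND2) and 9 (AOI21), so the cover uses one AOI21 plus the NAND2 at x, matching the hand calculation.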
Verilog for Synthesis – revisited
Some things we may have missed
• Now that we’ve seen how synthesis works, let’s revisit some of the things we may have skipped or only briefly mentioned earlier…
• Let’s take a simple 4-to-2 encoder as an example:
• Take a one-hot encoded vector and output the position of the ‘1’ bit.
• One possibility would be to describe this logic with a nested if-else block:
always @(x)
begin : encode
  if (x == 4'b0001) y = 2'b00;
  else if (x == 4'b0010) y = 2'b01;
  else if (x == 4'b0100) y = 2'b10;
  else if (x == 4'b1000) y = 2'b11;
  else y = 2'bxx;
end
• The result is known as “priority logic”
• i.e., some bits have priority over others…
Some things we may have missed
• It would have been better to use a case construct:
• All cases are matched in parallel
• And better yet, synthesis can optimize away the constants and other Boolean equalities:
always @(x)
begin : encode
  case (x)
    4'b0001: y = 2'b00;
    4'b0010: y = 2'b01;
    4'b0100: y = 2'b10;
    4'b1000: y = 2'b11;
    default: y = 2'bxx;
  endcase
end
Some things we may have missed
• In the previous example, if the encoding was wrong (i.e., not one-hot), we would have propagated an x in the logic simulation.
• But what if we guarantee that the input is one-hot encoded?
• Then we could write our code differently…
always @(x)
begin : encode
  if (x[0]) y = 2'b00;
  else if (x[1]) y = 2'b01;
  else if (x[2]) y = 2'b10;
  else if (x[3]) y = 2'b11;
  else y = 2'bxx;
end
• In fact, we have implemented a “priority encoder” (the least significant ‘1’ gets priority).
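The behavioral difference between the two coding styles is easiest to see on an input that is not one-hot. The Python sketch below models them, using None as a stand-in for 2'bxx; it is a behavioral model of the simulation semantics only and says nothing about the gates synthesis would produce.

```python
def encode_equality(x):
    """Models the if/else-if on x == 4'b0001 ... (and the equivalent case version)."""
    table = {0b0001: 0b00, 0b0010: 0b01, 0b0100: 0b10, 0b1000: 0b11}
    return table.get(x)              # None stands in for 2'bxx


def encode_lsb_priority(x):
    """Models if (x[0]) ... else if (x[1]) ...: the least significant '1' wins."""
    for bit in range(4):
        if (x >> bit) & 1:
            return bit
    return None                      # no bit set: 2'bxx


for x in (0b0001, 0b0010, 0b0100, 0b1000, 0b0110, 0b0000):
    print(f"{x:04b}", encode_equality(x), encode_lsb_priority(x))
```

On one-hot inputs the two models agree; on 4'b0110 the equality version propagates x while the bit-test version silently reports position 1. That is why the second style is only safe when the one-hot guarantee really holds.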
A few points about operators
• Logical operators map into primitive logic gates
• Arithmetic operators map into adders, subtractors, …
• Unsigned 2’s complement
• Model carry: the target is one bit wider than the source
• Watch out for *, %, and /
• Relational operators generate comparators
• Shifts by a constant amount are just wire connections
• No logic involved
• Variable shift amounts are a whole different story → shifter
• A conditional expression generates logic or a MUX
[Figure: Y = ~X << 2 drawn as wiring from X[3:0] to Y[5:2]; Y[1:0] receive no X bit]
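Two of the points above, shifts by a constant being pure wiring and the carry needing one extra bit, can be illustrated with a tiny bit-level Python model. The bit lists are LSB first and the 6-bit target width follows the figure; the helper names are our own.

```python
def shl_const(x_bits, shift, width_out):
    """Y = X << shift as wiring only: Y[i] = X[i - shift]; the low `shift` bits are 0."""
    return [x_bits[i - shift] if shift <= i < shift + len(x_bits) else 0
            for i in range(width_out)]


def to_int(bits):
    return sum(b << i for i, b in enumerate(bits))


X = [0, 1, 0, 1]                 # X = 4'b1010, LSB first
not_x = [1 - b for b in X]       # the ~ needs inverters; the shift itself does not
Y = shl_const(not_x, 2, 6)       # Y = ~X << 2 into a 6-bit target
print(Y, to_int(Y))              # [0, 0, 1, 0, 1, 0] 20

# Carry modeling: the largest sum of two 4-bit operands needs a 5-bit target.
print(max(a + b for a in range(16) for b in range(16)).bit_length())   # 5
```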
Datapath Synthesis
• Complex operators (adders, multipliers, etc.) are implemented in a special way.
• Pre-written descriptions can be found in the Synopsys DesignWare or Cadence ChipWare IP libraries.
Clock Gating
• As you know, since a clock is continuously toggling, it is a major consumer of dynamic power.
• Therefore, in order to save power, we will try to turn off the clock for gates that are not in use.
• Block level (Global) clock gating
• If certain operating modes do not use an entire module/component, a clock gate should be defined in the RTL.
• Register level (Local) clock gating
• However, even at the register level, if a flip-flop doesn’t change its output, internal power is still dissipated due to the clock toggling.
• This is very typical of enabled signal sampling, and therefore can be automatically detected and gated by the synthesis tool.
[Figure, Global Clock Gating: clk gated separately by enF, enE and enM for the FSM, Execution Unit and Memory Control blocks]
[Figure, Local Clock Gating: an enabled flip-flop (din, en, clk, dout) built with a feedback multiplexer, versus one whose clock is gated by en]
Clock Gating
• Local clock gating: 3 methods
• Logic synthesizer finds and implements local gating opportunities
• RTL code explicitly specifies clock gating
• Clock gating cell explicitly instantiated in RTL
• Global clock gating: 2 methods
• RTL code explicitly specifies clock gating
• Clock gating cell explicitly instantiated in RTL
• Conventional RTL Code
//always clock the register
always @ (posedge clk) begin
  if (enable) q <= din;
end
• Low Power Clock Gated RTL
//only clock the ff when enable is true
assign gclk = enable && clk;
always @ (posedge gclk) begin
  q <= din;
end
• Instantiated Clock Gating Cell
//instantiate a clock gating cell
clkgx1 i1 (.en(enable), .cp(clk), .gclk_out(gclk));
always @ (posedge gclk) begin
  q <= din;
end
Clock Gating – Glitch Problem
• What happens if there is a glitch on the enable signal?
[Waveforms: clk, en and gclk]
• What if the glitch happened during the high phase?
Ah, we live in a perfect world!
Not so fast!
Maybe the world ain’t so perfect after all…
Solution: Glitch-free Clock Gate
• By latching the enable signal in a latch that is transparent while clk is low, en_out is held stable throughout the positive phase, which eliminates glitches:
//clock gating with glitch prevention latch
always @ (enable or clk)
begin
  if (!clk)
    en_out <= enable;
end
assign gclk = en_out && clk;
[Waveforms: clk, en and gclk]
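The glitch problem and the latch-based fix can be reproduced in a discrete-time Python model, where each list entry is one time sample and both gates are assumed ideal (zero delay). A glitch on en during the high phase produces an extra clock edge through a plain AND gate but is blocked by the latch, while a clean enable raised in the low phase passes through both.

```python
def gate_and(clk, en):
    """Plain AND clock gate, gclk = en & clk (glitch-prone)."""
    return [c & e for c, e in zip(clk, en)]


def gate_latch(clk, en):
    """Latch-based gate: en_out follows en while clk is low and holds while clk is high."""
    en_out, gclk = 0, []
    for c, e in zip(clk, en):
        if c == 0:
            en_out = e            # latch is transparent in the low phase
        gclk.append(en_out & c)   # gclk = en_out && clk
    return gclk


def rising_edges(wave):
    return sum(1 for a, b in zip(wave, wave[1:]) if (a, b) == (0, 1))


clk = [0, 0, 0, 0, 1, 1, 1, 1] * 2     # two clock periods, 8 samples each
glitch = [0] * 16
glitch[5] = 1                           # enable glitch during a high phase
enable = [0] * 8 + [1] * 8              # clean enable raised during a low phase

print(rising_edges(gate_and(clk, glitch)), rising_edges(gate_latch(clk, glitch)))   # 1 0
print(rising_edges(gate_and(clk, enable)), rising_edges(gate_latch(clk, enable)))   # 1 1
```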
Merging clock enable gates
• Clock gates with a common enable can be merged
• Lower clock tree power, fewer gates
• May impact enable signal timing and skew.
[Figure: several clock gating cells (E) sharing the same enable, merged into a single gate driven by clk and enable]
Data Gating
• While clock gating is very well understood and automated, a similar situation occurs due to the toggling of data signals that are not used.
• These situations should be recognized and the data gated.
Without data gating:
assign add_out = A+B;
assign shift_out = A<<B;
assign out = shift_add ? shift_out : add_out;
With data gating (the shifter inputs are held at 0 while the adder result is selected):
assign shift_in_A = {`WIDTH{shift_add}} & A;
assign shift_in_B = {`WIDTH{shift_add}} & B;
assign shift_out = shift_in_A << shift_in_B;
assign out = shift_add ? shift_out : add_out;
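A rough Python model of the data-gating example shows where the savings come from: the output is unchanged, but the shifter’s inputs toggle less. The workload of (A, B, shift_add) tuples is hypothetical, and counting bit flips at the shifter inputs is only a proxy for its dynamic power.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1


def popcount(x):
    return bin(x).count("1")


def shifter_inputs(ops, gated):
    """Values seen by the shifter's A/B inputs in each cycle."""
    if not gated:
        return [(a, b) for a, b, _ in ops]
    return [(a, b) if shift_add else (0, 0) for a, b, shift_add in ops]   # {WIDTH{shift_add}} & A


def toggles(values):
    """Total bit flips on the shifter inputs between consecutive cycles."""
    return sum(popcount(a0 ^ a1) + popcount(b0 ^ b1)
               for (a0, b0), (a1, b1) in zip(values, values[1:]))


def out(a, b, shift_add, gated):
    sa, sb = (a, b) if (shift_add or not gated) else (0, 0)
    shift_out = (sa << sb) & MASK
    add_out = (a + b) & MASK
    return shift_out if shift_add else add_out


# Hypothetical workload: (A, B, shift_add); the shifter is used in one cycle only.
ops = [(0b1010, 0b0011, 0), (0b0101, 0b0001, 0), (0b1111, 0b0010, 1), (0b0000, 0b0001, 0)]
print(toggles(shifter_inputs(ops, gated=False)))   # 15
print(toggles(shifter_inputs(ops, gated=True)))    # 10
print(all(out(*op, gated=False) == out(*op, gated=True) for op in ops))   # True
```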
Design and Verification – HDL Linting
• HDL linting tools provide a quick, easy check of likely coding inconsistencies:
• Simulation problems
• Synthesis problems
• Simulation/synthesis mismatches
• Clock gating
• Latch inference
• Clock domain crossing issues
• Nonsensical assignments / implicit bit-width issues
• Not for checking syntactic correctness
• Use your simulator for that (it will generally be more helpful).
• Alternatively, some synthesis tools will give you basic lint warnings for simulation-synthesis mismatch errors.
Simulation/Synthesis Mismatches:
always @(a)
  z = a & b;
Latch Inference:
always @(a or b or c)
  if (c) z = a & b;
Clock Gating:
assign clka = clk & cond;
always @(posedge clka)
  z <= a & b;
Timing Optimization
How can we optimize timing?
• There are many ‘transforms’ that the synthesizer applies to the logic to improve the cost function:
• Resize cells
• Buffer or clone to reduce load on critical nets
• Decompose large cells
• Swap connections on commutative pins or among equivalent nets
• Move critical signals forward
• Pad early paths
• Area recovery
• Simple example:
• Double inverter removal transform: Delay = 4 → Delay = 2
Resizing, Cloning and Buffering
• Resize a logic gate to better drive a load:
• Or make a copy (clone) of the gate to distribute the load:
• Or just buffer the fanout net:
[Figure: a gate with inputs a and b driving loads d to h (0.2 each), shown resized (candidate sizes A, B, C), cloned, and buffered; plot of delay versus load for gate sizes A, B and C]
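Redesign Fan-In/Fan-out Trees
• Redesign Fan-In Tree
[Figure: inputs with arrival times Arr(a)=4, Arr(b)=3, Arr(c)=1 and Arr(d)=0 and unit gate delays; the original tree gives Arr(e)=6, while the restructured tree, with the late input a entering at the last gate, gives Arr(e)=5]
• Redesign Fan-Out Tree
[Figure: two fan-out buffer trees with longest paths of 4 and 5; annotation: slowdown of buffer due to load]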
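The fan-in numbers can be reproduced with a short arrival-time calculation. The Python sketch below assumes (our reading of the figure) that the original is a balanced tree of 2-input gates with unit delay, and that restructuring chains the inputs so the latest-arriving signal enters at the gate nearest the output. That reordering is only legal for associative and commutative functions such as AND or OR.

```python
ARRIVAL = {"a": 4, "b": 3, "c": 1, "d": 0}   # arrival times from the slide
GATE_DELAY = 1                                 # assumed unit delay per 2-input gate


def arrival(tree):
    """tree is an input name or a pair (left, right) feeding one 2-input gate."""
    if isinstance(tree, str):
        return ARRIVAL[tree]
    left, right = tree
    return max(arrival(left), arrival(right)) + GATE_DELAY


def chain_latest_last(signals):
    """Simple greedy restructuring: early signals first, the latest one at the output gate."""
    ordered = sorted(signals, key=ARRIVAL.get)
    tree = ordered[0]
    for s in ordered[1:]:
        tree = (tree, s)
    return tree


balanced = (("a", "b"), ("c", "d"))
chained = chain_latest_last("abcd")
print(chained, arrival(balanced), arrival(chained))   # ((('d', 'c'), 'b'), 'a') 6 5
```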
Decomposition and Swapping
• Consider decomposing complex gates into less complex ones:
• Swap commutative pins:
• Simple sorting on arrival times and delays can help
[Figure: inputs a, b and c reassigned among commutative pins with different delays]
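The “simple sorting” idea for commutative pins can be sketched as follows: give the latest-arriving signal the fastest pin, the next-latest the next-fastest pin, and so on. Pairing one list sorted descending with the other sorted ascending minimizes the latest output arrival. The arrival times and pin delays below are hypothetical, not taken from the figure.

```python
def output_arrival(assignment, arrival, pin_delay):
    """Latest output arrival when signal s drives pin assignment[s]."""
    return max(arrival[s] + pin_delay[p] for s, p in assignment.items())


def swap_pins(arrival, pin_delay):
    """Pair the latest-arriving signals with the fastest commutative pins."""
    signals = sorted(arrival, key=arrival.get, reverse=True)   # latest first
    pins = sorted(pin_delay, key=pin_delay.get)               # fastest first
    return dict(zip(signals, pins))


# Hypothetical numbers, not taken from the slide's figure.
arrival = {"a": 2, "b": 1, "c": 0}
pin_delay = {"p1": 1, "p2": 2, "p3": 3}
naive = {"a": "p3", "b": "p2", "c": "p1"}
swapped = swap_pins(arrival, pin_delay)
print(output_arrival(naive, arrival, pin_delay), output_arrival(swapped, arrival, pin_delay))   # 5 3
```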
Retiming
• Given the following network:
[Figure: three flip-flops with combinational delays 6, 4, 2, 4, 4 between them; clock cycle = 10]
• How would you meet the 10ns clock cycle time?
• Re-order sequential elements and combinational logic
[Figure: the same network after moving a flip-flop; clock cycle = 10]
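A brute-force Python sketch of the retiming question follows. It assumes that the five delays form one chain between an input register and an output register, with a single intermediate register whose position we are free to choose; that is our reading of the figure, not a statement of the slide’s exact topology. Only one position splits the 20 ns of logic into two stages that each fit in the 10 ns cycle.

```python
DELAYS = [6, 4, 2, 4, 4]   # combinational delays from the slide, assumed in series
CYCLE = 10


def stage_delays(delays, cuts):
    """Stage delays when a register is placed right after delays[i] for each i in cuts."""
    stages, start = [], 0
    for cut in sorted(cuts):
        stages.append(sum(delays[start:cut + 1]))
        start = cut + 1
    stages.append(sum(delays[start:]))
    return stages


def retime(delays, cycle):
    """All single-register positions whose stages all fit in the cycle."""
    return [cut for cut in range(len(delays) - 1)
            if max(stage_delays(delays, [cut])) <= cycle]


for cut in range(len(DELAYS) - 1):
    print(cut, stage_delays(DELAYS, [cut]))
print(retime(DELAYS, CYCLE))   # [1]: register between the 4 and the 2 gives 10 + 10
```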
Topographical Synthesis
• Also called Physically Aware Synthesis
Design Compiler (DC) Flow
DC NXT Transformations
Translation → Logic Optimization → Gate Mapping
1 Library: set target_library, set link_library, create_lib (logic libraries: db; design library: ndm; physical library)
2 Translate: analyze, elaborate (RTL source → generic Boolean, GTECH or unmapped gates)
3 Floorplan: read_floorplan
4 Constrain: source (timing constraints such as create_clock …, set_input_delay …)
5 Compile: logic optimization + gate mapping (compile_ultra) → technology-specific placed gates
6 Save: write_icc2_files
Example RTL source:
residue = 16'h0000;
if (high_bits == 2'b10)
  residue = state_table[index];
else
  state_table[index] = 16'h0000;
[Figure: generic gates mapped to technology cells of different drive strengths (1x, 2x, 3x, 4x, 8x)]
DC NXT Physical Synthesis Flow
Inputs: RTL design, design constraints, logic libraries (db), design library (ndm), physical library, floorplan
DC NXT physical synthesis steps: Load Libraries → Load RTL Code → Load Floorplan → Apply Constraints → Synthesize the Design → Analyze Results → Write out Netlist with Cell Placement
The netlist then goes to Design Planning & “SPG” Placement (ICC II DP / ICC II), which produces the floorplan and the standard cell placement.
Target Library: Used to Select Technology-Specific Cells
• The target_library content is used during compile to create a technology-specific gate-level netlist.
• DC NXT optimization selects the smallest technology-specific gates that meet the required DRCs, timing and logic functionality.
• Default setting (printvar target_library), a non-existent default library name:
target_library = your_library.db
• Before compile, specify the actual standard cell logic library file(s) provided by the silicon vendor or library group:
set_app_var target_library libs/20nm_wc.db
• set_app_var sets the value of an application variable (a reserved DC NXT variable); typos generate an “Error” message, so it is safer than the Tcl “set” command.
• Specify the library characterized for the appropriate PVT corner for setup timing optimization.
Reading RTL File(s) with analyze + elaborate
• CWD or PWD: the directory that DC NXT is invoked from
[Directory tree: risc_design (CWD) contains dc_setup.tcl, cons/, rtl/ (TOP.v) and libs/ (20nm_wc.db, IP.db)]
UNIX% cd risc_design
UNIX% dcnxt_shell -topo
dcnxt_shell-topo> source dc_setup.tcl
dcnxt_shell-topo> analyze -format verilog TOP.v
Compiling source file './rtl/TOP.v'
dcnxt_shell-topo> elaborate MY_TOP
Loading db file './libs/20nm_wc.db'
Loading db file './libs/IP.db'
Loading db file '.../libraries/syn/gtech.db'
Loading db file '.../libraries/syn/standard.sldb'
Loading link library 'gtech'
Elaborated 1 design.
Current design is now 'MY_TOP'.
Presto compilation completed successfully. (MY_TOP MY_A MY_B)
dc_setup.tcl:
set_app_var search_path "$search_path cons rtl libs"
set_app_var target_library 20nm_wc.db
set_app_var link_library "* $target_library IP.db"
Good Practice: Save the Design Before Compile
[Directory tree: risc_design (CWD) contains unmapped/ (MY_TOP.ddc) and rtl/ (TOP.v, A.v, B.v)]
analyze -f verilog {A.v B.v TOP.v}
elaborate MY_TOP
link
check_design
write_file -format ddc -hierarchy -output unmapped/MY_TOP.ddc
write_file -f verilog -hier -out unmapped/MY_TOP.v
source TOP.con
...
compile_ultra
• Constraining (source TOP.con) is discussed later.
• Good practice: save the design in ddc format before constraining/compiling.
• analyze/elaborate translates RTL into unmapped ddc format:
• Translation of large designs may take a long time
• You may need to re-read the uncompiled design in the future
• read_ddc is faster → save the unmapped ddc
The “Chicken and Egg” Problem
• Physical synthesis requires a floorplan
• Generating a floorplan requires a netlist
• But synthesizing a netlist requires a floorplan …
• But generating a floorplan requires a netlist ….
[Figure: RTL Design → Physical Synthesis → Netlist → Design Planning (ICC II DP) → Floorplan → back to Physical Synthesis]
What comes first: the netlist or the floorplan?
Modified FP Constraints – Pre-Floorplan Synthesis
Prior to having a finalized floorplan available, if any floorplan constraints are expected to be significantly different from the default:
→ Specify them before the 1st synthesis run (after loading the RTL design)
• Core size/shape, macro cell and pin placement constraints
[Figure: example DC NXT interconnect estimate using default floorplan constraints, versus an estimate using modified floorplan constraints (with IP and RAM macros placed)]
Default Design Scenario
[Figure: JANE’s_DESIGN (assumed external launching circuitry: FF1 → M) drives MY_DESIGN (current design: N → FF2 → X → FF3 → S), which drives JOE’s_DESIGN (assumed external capturing circuitry: T → FF4); all flip-flops are clocked by Clk]
• DC NXT assumes a “synchronously-clocked” environment
• By default:
• Input data arrives from a pos-edge clocked device
• Output data goes to a pos-edge clocked device
Timing Path Definition
[Figure: CURRENT_DESIGN with four timing paths. Path 1: input port A → logic N → FF2. Path 2: FF2 → logic X → FF3. Path 3: FF3 → logic S → output port C. Path 4: input port B → logic F → output port D. Clock signals: CLK_IN, CLK_OUT.]
DC NXT breaks designs into timing paths, each with a:
• Startpoint
  • Input port (other than a clock port)
  • Clock pin of a flip-flop or register
• Endpoint
  • Output port (other than a clock port)
  • Any input pin of a sequential device, except the clock pin
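The startpoint/endpoint rules above can be written as two predicates. This is a minimal sketch for illustration, not how DC NXT implements path tracing; the Pin record, its fields, and pin names such as FF2/CLK are hypothetical, and CLK_IN is assumed to be a clock input port.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pin:
    name: str
    direction: str               # "in" or "out"
    is_port: bool                # top-level port (True) or cell pin (False)
    is_clock: bool = False
    is_sequential: bool = False  # pin of a flip-flop or register


def is_startpoint(p: Pin) -> bool:
    if p.is_port:
        # Input port, other than a clock port
        return p.direction == "in" and not p.is_clock
    # Clock pin of a flip-flop or register
    return p.is_sequential and p.is_clock


def is_endpoint(p: Pin) -> bool:
    if p.is_port:
        # Output port, other than a clock port
        return p.direction == "out" and not p.is_clock
    # Any input pin of a sequential device, except the clock pin
    return p.is_sequential and p.direction == "in" and not p.is_clock


# Pins along Paths 1-4 of the figure (pin names are hypothetical)
pins = [
    Pin("A", "in", True),
    Pin("B", "in", True),
    Pin("CLK_IN", "in", True, is_clock=True),
    Pin("FF2/CLK", "in", False, is_clock=True, is_sequential=True),
    Pin("FF2/D", "in", False, is_sequential=True),
    Pin("FF2/Q", "out", False, is_sequential=True),
    Pin("FF3/CLK", "in", False, is_clock=True, is_sequential=True),
    Pin("FF3/D", "in", False, is_sequential=True),
    Pin("FF3/Q", "out", False, is_sequential=True),
    Pin("C", "out", True),
    Pin("D", "out", True),
]
startpoints = [p.name for p in pins if is_startpoint(p)]
endpoints = [p.name for p in pins if is_endpoint(p)]
print("startpoints:", startpoints)
print("endpoints:  ", endpoints)
```

The four startpoints and four endpoints pair up into Paths 1 to 4. Flip-flop Q pins and the clock port are neither, because a path through them always begins at the flip-flop clock pin.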
Constraining Reg-to-Reg Paths: Example
Spec:
Clock Period = 2ns
Unit of time is 1ns in this example (defined in the technology library).

create_clock -period 2 [get_ports Clk]

[Figure: in MY_DESIGN, FF2 launches on Clk, the data passes through logic X (delay Tmax) to FF3, whose setup time is TSetup,FF3 = 0.2ns. Timeline: 0ns, 1ns, 2ns.]

What is the maximum delay requirement Tmax for the register-to-register path through X in MY_DESIGN? __________________
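Here is the arithmetic behind the question above, as a minimal sketch. It assumes an ideal clock with no skew and a single-cycle path, and it treats Tmax as covering FF2's clock-to-Q delay plus logic X. The function name is hypothetical. Clock uncertainty defaults to zero because this slide does not specify one.

```python
def reg_to_reg_tmax(period_ns: float, setup_ns: float, uncertainty_ns: float = 0.0) -> float:
    """Max FF2 clock-to-Q + logic X delay that still meets FF3 setup (ideal clock assumed)."""
    return period_ns - uncertainty_ns - setup_ns


tmax = reg_to_reg_tmax(period_ns=2.0, setup_ns=0.2)
print(f"Tmax = {tmax:.2f} ns")  # Tmax = 1.80 ns
```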
Constraining Input Paths
Spec:
Latest data arrival time at port A, after Jane's launching clock edge = 0.6ns

mydesign.con
create_clock -period 2 [get_ports Clk]
set_clock_uncertainty -setup 0.3 [get_clocks Clk]
set_input_delay -max 0.6 -clock Clk [get_ports A]

[Figure: FF1 in JANE's_DESIGN launches data through M, which arrives at port A 0.6ns after the clock edge. Inside MY_DESIGN the data passes through logic N (delay Tmax) to FF2, whose setup time is TSetup,FF2 = 0.2ns.]

What is the maximum delay Tmax for the input path N in MY_DESIGN? __________________________
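The input-path budget follows the same idea. The external delay from set_input_delay and the setup clock uncertainty are subtracted from the period, along with FF2's setup time. This is a minimal sketch using the values in mydesign.con; it assumes an ideal clock, and the function name is hypothetical.

```python
def input_path_tmax(period_ns, input_delay_ns, setup_ns, uncertainty_ns=0.0):
    """Time left for logic N between port A and FF2/D."""
    return period_ns - uncertainty_ns - input_delay_ns - setup_ns


tmax_n = input_path_tmax(period_ns=2.0, input_delay_ns=0.6, setup_ns=0.2, uncertainty_ns=0.3)
print(f"Tmax(N) = {tmax_n:.2f} ns")  # Tmax(N) = 0.90 ns
```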
Constraining Output Paths
Spec:
Latest data arrival time at port B = 0.8ns before Joe's capturing clock edge

mydesign.con
create_clock -period 2 [get_ports Clk]
set_clock_uncertainty -setup 0.3 [get_clocks Clk]
set_input_delay -max 0.6 -clock Clk [get_ports A]
set_output_delay -max 0.8 -clock Clk [get_ports B]

[Figure: in MY_DESIGN, FF3 launches data through logic S (delay Tmax) to port B. In JOE's_DESIGN the data passes through logic T (TT = 0.7ns) to FF4 (Tsetup = 0.1ns), so TT + Tsetup = 0.8ns.]

What is the maximum delay Tmax for the output path S in MY_DESIGN? __________________________
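For the output path, the external portion (Joe's logic T plus FF4 setup) is the set_output_delay value. What remains after subtracting it and the clock uncertainty from the period is available inside MY_DESIGN. This is a minimal sketch under the same assumptions as before (ideal clock, Tmax includes FF3 clock-to-Q); the function name is hypothetical.

```python
def output_path_tmax(period_ns, output_delay_ns, uncertainty_ns=0.0):
    """Time left for FF3 clock-to-Q + logic S before port B."""
    return period_ns - uncertainty_ns - output_delay_ns


# set_output_delay 0.8 = Joe's logic T (0.7ns) + FF4 setup (0.1ns)
output_delay = 0.7 + 0.1
tmax_s = output_path_tmax(period_ns=2.0, output_delay_ns=output_delay, uncertainty_ns=0.3)
print(f"Tmax(S) = {tmax_s:.2f} ns")  # Tmax(S) = 0.90 ns
```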
Exercise: Combinational Design
[Figure: FF1 in JANE's_DESIGN launches data through M (TM = 0.4ns) to input port A. Inside MY_DESIGN, the purely combinational logic Combo connects A to output port B. In JOE's_DESIGN, logic T (TT = 0.2ns) feeds FF4 (Tsetup = 0.1ns). The external flip-flops run at 500 MHz, modeled by the virtual clock VClk.]

How do you constrain the Combo path? What is the maximum delay through Combo?

create_clock -period 2 -name VClk
set_clock_uncertainty -setup 0.3 [get_clocks VClk]
set_input_delay -clock VClk -max 0.4 [get_ports A]
set_output_delay -clock VClk -max 0.3 [get_ports B]

TCombo,max = ________________________
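A purely combinational path is bounded at both ends by external logic. Both the input delay and the output delay come out of the same clock period. This is a minimal sketch of that budget, with the period derived from the 500 MHz spec; it assumes an ideal virtual clock, and the function names are hypothetical.

```python
def period_ns_from_mhz(freq_mhz: float) -> float:
    """Clock period in ns for a frequency in MHz."""
    return 1000.0 / freq_mhz


def combo_tmax(period_ns, input_delay_ns, output_delay_ns, uncertainty_ns=0.0):
    """Time left for the combinational path A -> Combo -> B."""
    return period_ns - uncertainty_ns - input_delay_ns - output_delay_ns


period = period_ns_from_mhz(500.0)             # 2.0 ns, as in create_clock -period 2
t_combo = combo_tmax(period,
                     input_delay_ns=0.4,        # TM
                     output_delay_ns=0.2 + 0.1, # TT + Tsetup
                     uncertainty_ns=0.3)
print(f"TCombo,max = {t_combo:.2f} ns")  # TCombo,max = 1.00 ns
```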
Timing Constraint Summary
[Figure: the default design scenario, JANE's_DESIGN → MY_DESIGN → JOE's_DESIGN, with FF1 to FF4 clocked by Clk.]
• All input paths are constrained by set_input_delay
• All register-to-register paths are constrained by create_clock
• All output paths are constrained by set_output_delay

You specify how much time is used by external logic...
DC NXT calculates how much time is left for the internal logic.
Effect of Output Capacitive Load
[Figure: FF3 in MY_DESIGN drives output port B, whose load may be 3fF or 30fF. Waveforms over a 2ns Clk period mark the data at B at 1.2ns, measured at the 50% point of the transition.]
Capacitive loading on an output port affects the transition time, and thereby the cell delay, of the output driver.
By default, DC NXT assumes zero capacitive loading on outputs. It is therefore important to model the capacitive load on all outputs accurately.
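To see why the load matters, a first-order (lumped RC) model treats the output driver as a fixed intrinsic delay plus a drive resistance that charges the load. This is a simplification, since library timing is normally table-based. The intrinsic delay and drive resistance below are hypothetical values, chosen only to compare the 3fF and 30fF cases.

```python
def driver_delay_ps(intrinsic_ps: float, r_drive_kohm: float, c_load_ff: float) -> float:
    """Lumped RC estimate of driver delay; kOhm x fF = ps."""
    return intrinsic_ps + r_drive_kohm * c_load_ff


# Hypothetical output driver: 50 ps intrinsic delay, 4 kOhm drive resistance
for c_ff in (0.0, 3.0, 30.0):   # DC NXT's default (zero), then the two loads in question
    print(f"{c_ff:4.0f} fF -> {driver_delay_ps(50.0, 4.0, c_ff):5.0f} ps")
```

With these assumed values, 30fF adds 120 ps of delay compared with 12 ps for 3fF. Leaving the default zero load would therefore make the output path look faster than it is.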
Modeling Output Capacitive Load: Example 1
Spec: Maximum capacitive load on output port B = 30fF
Unit of capacitance is 1pF in this example (defined in the technology library).

mydesign.con
create_clock -period 2 [get_ports Clk]
set_input_delay -max 0.6 -clock Clk [get_ports A]
set_output_delay -max 0.8 -clock Clk [get_ports B]
set_load -max [expr {30.0/1000}] [get_ports B]

TCL: arithmetic expression. [expr {30.0/1000}] converts the 30fF spec into the library unit: 0.03pF.
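Tcl's expr performs integer division when both operands are integers, so writing 30/1000 would set a load of zero. The 30.0 forces floating-point division. The same arithmetic in Python, where // plays the role of integer division, looks like this (the constant and function names are hypothetical):

```python
FF_PER_LIB_UNIT = 1000.0  # library capacitance unit is 1pF = 1000fF


def ff_to_lib_units(c_ff: float) -> float:
    """Convert a load in fF into the library's pF unit."""
    return c_ff / FF_PER_LIB_UNIT


load = ff_to_lib_units(30.0)   # value passed to set_load -max
print(load)                    # 0.03
print(30 // 1000)              # 0: what integer division (Tcl: expr {30/1000}) gives
```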

What if an absolute capacitive load value is not available?


69
Modeling Output Capacitive Load: Example 2
Spec: Maximum load on output port B = 1 "AN2" gate load, or = 3 "inv1a0" gate loads

Use load_of lib/cell/pin to place the load of a gate from the technology library on the port:
[Figure: port B of MY_DESIGN loaded by input pin A of one AN2 gate, or by the A pins of three inv1a0 inverters.]

set_load -max [load_of my_lib/AN2/A] [get_ports B]

set_load -max [expr {[load_of my_lib/inv1a0/A] * 3}] \
    [get_ports B]
Effect of Input Transition Time
[Figure: data at input port A of MY_DESIGN travels to the D pin of FF2. Waveforms over a 2ns Clk period show the input data arrival time at A (0.6ns) and the data at FF2's D pin (1.4ns), which must arrive before the FF2 setup time. Is the transition at A fast or slow?]

Rise and fall transition times on an input port affect the cell delay of the input gate.
By default, DC NXT assumes zero transition times on inputs. It is therefore important to model transition times on all inputs accurately.
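Libraries typically characterize cell delay as a table indexed by input transition and output load, and timing tools interpolate between the entries. The sketch below uses a hypothetical 2x2 table and plain bilinear interpolation. It shows that a slower input transition (0.12ns, as in the next example, versus DC NXT's default of zero) increases the delay of the input gate. Real tables are larger, and the tool's exact interpolation scheme is not modeled here.

```python
# Hypothetical 2x2 delay table (ns) for one timing arc of the gate behind port A.
SLEW_INDEX_NS = (0.0, 0.5)      # input transition time
LOAD_INDEX_PF = (0.0, 0.05)     # output load
DELAY_NS = ((0.05, 0.15),       # row: slew = 0.0ns
            (0.12, 0.24))       # row: slew = 0.5ns


def lerp(x0: float, x1: float, y0: float, y1: float, x: float) -> float:
    """Linear interpolation (also extrapolates outside [x0, x1])."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def cell_delay_ns(slew_ns: float, load_pf: float) -> float:
    """Bilinear interpolation: along the load axis first, then along the slew axis."""
    fast = lerp(*LOAD_INDEX_PF, DELAY_NS[0][0], DELAY_NS[0][1], load_pf)
    slow = lerp(*LOAD_INDEX_PF, DELAY_NS[1][0], DELAY_NS[1][1], load_pf)
    return lerp(*SLEW_INDEX_NS, fast, slow, slew_ns)


ideal = cell_delay_ns(slew_ns=0.0, load_pf=0.01)
modeled = cell_delay_ns(slew_ns=0.12, load_pf=0.01)
print(f"zero transition: {ideal:.3f} ns, 0.12ns transition: {modeled:.3f} ns")
```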
Modeling Input Transition: Example 1
Spec: Maximum rise/fall input transition on input port A = 0.12ns

mydesign.con
create_clock -period 2 [get_ports Clk]
set_input_delay -max 0.6 -clock Clk [get_ports A]
set_output_delay -max 0.8 -clock Clk [get_ports B]
set_load -max [expr {30.0/1000}] [get_ports B]
set_input_transition -max 0.12 [get_ports A]

What if a specific transition time value is not known?
Modeling Input Transition: Example 2
Spec: Driving cell on input port A = OR3B gate, or = Qn pin of FD1 flip-flop
[Figure: input port A of MY_DESIGN driven by an OR3B gate, or by the Qn pin of an FD1 flip-flop.]

mydesign.con
create_clock -period 10 [get_ports Clk]
set_input_delay -max 3 -clock Clk [get_ports A]
set_output_delay -max 4 -clock Clk [get_ports B]
set_load -max [expr {30.0/1000}] [get_ports B]
set_driving_cell -max -lib_cell OR3B [get_ports A]
  or
set_driving_cell -max -lib_cell FD1 -pin Qn [get_ports A]

If no pin is given, DC NXT will use the first output pin listed in the library cell definition!
Note: Port A will also inherit any logic DRCs (max_tran/cap) defined on the driving cell's output pin (in the library).
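The default-pin rule can be pictured as a lookup over the library cell's pins in definition order. This is a hypothetical model, not DC NXT code. The pin lists are invented apart from FD1's Q and Qn, and whether Q precedes Qn in a real FD1 definition depends on the library. The point is that omitting -pin on a multi-output cell silently selects whichever output is listed first.

```python
from typing import Optional

# Hypothetical library excerpt: (pin, direction) in definition order.
LIB_CELLS = {
    "OR3B": [("A", "input"), ("B", "input"), ("C", "input"), ("Y", "output")],
    "FD1": [("D", "input"), ("CP", "input"), ("Q", "output"), ("Qn", "output")],
}


def driving_pin(lib_cell: str, pin: Optional[str] = None) -> str:
    """Return the pin used to model the drive; default to the first output pin."""
    outputs = [name for name, direction in LIB_CELLS[lib_cell] if direction == "output"]
    if not outputs:
        raise ValueError(f"{lib_cell} has no output pin")
    if pin is None:
        return outputs[0]
    if pin not in outputs:
        raise ValueError(f"{pin} is not an output pin of {lib_cell}")
    return pin


print(driving_pin("FD1"))        # Q (not Qn!)
print(driving_pin("FD1", "Qn"))  # Qn
print(driving_pin("OR3B"))       # Y
```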
Default compile_ultra Optimizations
• compile_ultra → performs three levels of optimization:
  • Architectural level or high-level synthesis
    • Only performed when compiling RTL or unmapped ddc
  • Logic level or GTECH optimization
  • Gate-level or mapping optimization
• Optimization priority: logic DRCs and timing
  • Minimizes area without impacting other constraints
• When used in Topo mode, performs under-the-hood placement and routing estimation to calculate net RCs
• Requires an Ultra as well as a DesignWare Foundation license
• Invokes additional high-performance optimization algorithms (examples follow)
[Figure: RTL description or unmapped ddc → Architectural → Logic level → Gate level → Optimized netlist]
What is the DesignWare Library?
• A collection of soft IP blocks and datapath components:
  • Technology-independent, pre-verified, reusable, parameterizable, synthesizable
• Accessing the right component:
  • Operator inferencing for arithmetic operators (architectural level): +, -, *, >, =, <
    • Operators greater than 4 bits wide infer a hierarchical sub-block
  • Instantiation for a wide variety of standard IP: DW_fifo_..., DW_shiftreg, DW_div_seq, DW_ram_...
• Running compile_ultra requires a DesignWare Foundation license
  • Enables arithmetic/datapath optimizations (shown next)
  • Allows access to the DesignWare IP
  • The DesignWare library is automatically included in the synthetic_library and link_library variables during compile_ultra
User Controllable compile_ultra Optimizations
• Auto-ungrouping
• Boundary optimization
• Test-ready synthesis
• Adaptive retiming
Auto-Ungrouping Example
[Figure, before: TOP contains sub-blocks SUB1 to SUB4 with cells U0 to U7 and flip-flops FF1 and FF2. A timing-critical path crosses the sub-block boundaries, and U7 is a 2:1 MUX in front of FF2. After compile_ultra auto-ungrouping: TOP keeps SUB1 (FF1), and the remaining logic is merged into U56 and an enable flip-flop (EDFF with an ENBL input).]
• Auto-ungrouping allows improved combinational and sequential logic optimization
• Sequential logic optimization: the DFF-ENBL combination is logically equivalent to the U7 MUX with the FF2 D-FF, but smaller and faster
• Note: May affect formal verification tools and RTL-based testbenches!
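The claim that the MUX-plus-D-FF pair and the enable flip-flop are interchangeable can be checked exhaustively, because both are described by a next-state function of (Q, D, ENBL). This is a minimal sketch. It assumes ENBL=1 selects the new data on the U7 MUX, since the figure's 0/1 labeling is not fully recoverable. It checks functional equivalence only, not the area and speed benefit.

```python
from itertools import product


def mux_then_dff(q: int, d: int, enbl: int) -> int:
    """Next state of FF2 when the 2:1 MUX (U7) feeds its D pin: Y = ENBL ? D : Q."""
    return (enbl & d) | ((1 - enbl) & q)


def enable_dff(q: int, d: int, enbl: int) -> int:
    """Next state of an enable flip-flop: load D when ENBL=1, otherwise hold Q."""
    return d if enbl == 1 else q


# Same next-state function for every (Q, D, ENBL) combination implies the two
# circuits produce identical state sequences from the same initial state.
equivalent = all(
    mux_then_dff(q, d, e) == enable_dff(q, d, e)
    for q, d, e in product((0, 1), repeat=3)
)
print(equivalent)  # True
```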
Boundary Optimization - ON by Default
[Figure, before: the same TOP design (sub-blocks SUB1 to SUB4, inputs In2, In3 and In4) with a timing-critical path crossing the sub-block boundaries.]
• Complement propagation: connects to the complement signal to reduce logic
• Constant propagation: removes redundant gates with tie-hi/lo inputs
• Unconnected pin propagation: removes redundant gates with unconnected outputs

compile_ultra -no_autoungroup

[Figure, after compile_ultra -no_autoungroup: TOP keeps the SUB1 to SUB4 hierarchy; the remaining cells are FF1, U5, U6, U7 and FF2, driven by In4.]
Note: May affect formal verification tools and RTL-based testbenches!
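Constant and complement propagation rest on simple Boolean identities, which the sketch below checks exhaustively. The gate functions are plain Python stand-ins chosen for illustration, since the figure does not say which gate types U0 to U7 are. Unconnected pin propagation is a structural rule (a gate whose output drives nothing can be removed), so it is not modeled here.

```python
from itertools import product


def and2(a: int, b: int) -> int:
    return a & b


def nand2(a: int, b: int) -> int:
    return 1 - (a & b)


def inv(a: int) -> int:
    return 1 - a


def same_function(f, g, n_inputs: int) -> bool:
    """True if f and g agree on every input combination (exhaustive check)."""
    return all(f(*bits) == g(*bits) for bits in product((0, 1), repeat=n_inputs))


checks = {
    # Constant propagation: NAND2 with a tie-hi input reduces to an inverter
    "nand2_tie_hi_is_inv": same_function(lambda x: nand2(x, 1), inv, 1),
    # Constant propagation: AND2 with a tie-lo input is constant 0, so it can be removed
    "and2_tie_lo_is_0": same_function(lambda x: and2(x, 0), lambda x: 0, 1),
    # Complement propagation: an inverter on each side of a boundary cancels out
    "double_inversion_is_wire": same_function(lambda x: inv(inv(x)), lambda x: x, 1),
}
print(checks)
```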
Generate a Constraint Report After Compile
compile_ultra ...
report_constraint -all_violators

max_delay/setup ('Clk' group)
                          Required     Actual
Endpoint                  Path Delay   Path Delay   Slack
-----------------------------------------------------------------
I_COUNT/PCint_reg[4]/D    1.71         1.75 r       -0.03 (VIOLATED)
I_COUNT/PCint_reg[6]/D    1.71         1.75 r       -0.03 (VIOLATED)
I_COUNT/PCint_reg[2]/D    1.71         1.75 r       -0.03 (VIOLATED)
I_COUNT/PCint_reg[3]/D    1.71         1.74 r       -0.03 (VIOLATED)
I_COUNT/PCint_reg[0]/D    1.71         1.73 f       -0.02 (VIOLATED)
I_COUNT/PCint_reg[7]/D    1.71         1.73 r       -0.02 (VIOLATED)
I_COUNT/PCint_reg[1]/D    1.71         1.72 r       -0.01 (VIOLATED)

max_capacitance
                          Required      Actual
Net                       Capacitance   Capacitance   Slack
---------------------------------------------------------------
CurrentState[0]           0.20          0.24          -0.04 (VIOLATED)

Summarizes all violating constraints. If no violations are reported, no further analysis or optimization is needed. Use report_timing for detailed timing path information.
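When the violator list is long, a summary helps. This small sketch parses the setup section of the report above and computes the worst negative slack (WNS) and the total negative slack (TNS). It assumes the whitespace-separated layout shown, with the slack immediately before "(VIOLATED)", and it is not a Synopsys utility.

```python
REPORT = """\
I_COUNT/PCint_reg[4]/D 1.71 1.75 r -0.03 (VIOLATED)
I_COUNT/PCint_reg[6]/D 1.71 1.75 r -0.03 (VIOLATED)
I_COUNT/PCint_reg[2]/D 1.71 1.75 r -0.03 (VIOLATED)
I_COUNT/PCint_reg[3]/D 1.71 1.74 r -0.03 (VIOLATED)
I_COUNT/PCint_reg[0]/D 1.71 1.73 f -0.02 (VIOLATED)
I_COUNT/PCint_reg[7]/D 1.71 1.73 r -0.02 (VIOLATED)
I_COUNT/PCint_reg[1]/D 1.71 1.72 r -0.01 (VIOLATED)
"""


def parse_slacks(text: str) -> dict:
    """Map each violating endpoint to its slack (the field before '(VIOLATED)')."""
    slacks = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "(VIOLATED)":
            slacks[fields[0]] = float(fields[-2])
    return slacks


slacks = parse_slacks(REPORT)
wns = min(slacks.values())   # worst negative slack
tns = sum(slacks.values())   # total negative slack
print(f"{len(slacks)} violators, WNS = {wns:.2f}, TNS = {tns:.2f}")
```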
Generate Timing Reports for More Detail
report_timing

Point                                    Incr       Path
-----------------------------------------------------------
clock Clk (rise edge)                    0.00       0.00
clock network delay (ideal)              0.00       0.00
input external delay                     1.20       1.20 r
Neg_Flag (in)                            0.06       1.26 r
U102/ZN (nd02d0)                         0.13 *     1.38 f
U97/ZN (nd02d2)                          0.09 *     1.48 r
U98/ZN (invbd4)                          0.06 *     1.54 f
U159/ZN (nd02d0)                         0.07 *     1.61 r
U50/Z (an03d1)                           0.14 *     1.75 r
I_COUNT/PCint_reg[4]/D (dfnrn4)          0.00 *     1.75 r
data arrival time                                   1.75

clock Clk (rise edge)                    2.00       2.00
clock network delay (ideal)              0.00       2.00
clock uncertainty                       -0.20       1.80
I_COUNT/PCint_reg[4]/CP (dfnrn4)         0.00       1.80 r
library setup time                      -0.09       1.71
data required time                                  1.71
-----------------------------------------------------------
data required time                                  1.71
data arrival time                                  -1.75
-----------------------------------------------------------
slack (VIOLATED)                                   -0.03

Timing reports will be discussed in the next unit.
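The required-time half of the report is simple arithmetic on the capture clock edge. This is a minimal sketch that recomputes it from the printed values (variable names are hypothetical):

```python
# Values copied from the report above (ns); the tool's unrounded values are not available.
clock_edge = 2.00
clock_network_delay = 0.00   # ideal clock
clock_uncertainty = 0.20
library_setup = 0.09
data_arrival = 1.75

data_required = clock_edge + clock_network_delay - clock_uncertainty - library_setup
slack = data_required - data_arrival
print(f"data required time = {data_required:.2f}")
print(f"slack = {slack:.2f}")
```

Recomputed from the printed two-decimal values, the slack comes out as -0.04, while the report shows -0.03. The likely reason is that the tool works with unrounded delays and rounds only for display. The same effect appears in the arrival column, where 1.26 + 0.13 is printed as 1.38.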
Timing Report: Path Delay Section
Incr: individual contribution to the path delay. Path: running total of the path delay.

Point                                    Incr       Path
---------------------------------------------
clock clk (rise edge)                    0.00       0.00
clock network delay (ideal)              0.50       0.50
input external delay                     1.00       1.50 f
data1 (in)                               0.04       1.54 f
u2/Y (inv1a1)                            0.12 *     1.66 r
u12/Y (or2a1)                            0.26 *     1.92 r
u23/Y (mx2d2)                            0.23 *     2.15 r
XYZ_reg[14]/D (fdef1a1)                  0.00 *     2.15 r
data arrival time                                   2.15

• The r/f suffix gives the signal transition (rise/fall) at each point
• "data arrival time" is the arrival time at the endpoint
• Each cell's Incr value is its net + cell delay
[Figure: ABC_reg[7], clocked by clk, launches data1 → u2 → u12 → u23 → XYZ_reg[14]/D. The annotations 0.01 and 0.11 sum to u2's Incr of 0.12.]
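The Path column is the running sum of the Incr column. This short sketch reproduces it with itertools.accumulate, using the values from the report above:

```python
from itertools import accumulate

points = [
    ("clock clk (rise edge)", 0.00),
    ("clock network delay (ideal)", 0.50),
    ("input external delay", 1.00),
    ("data1 (in)", 0.04),
    ("u2/Y (inv1a1)", 0.12),
    ("u12/Y (or2a1)", 0.26),
    ("u23/Y (mx2d2)", 0.23),
    ("XYZ_reg[14]/D (fdef1a1)", 0.00),
]

# Running total of the individual contributions = the report's Path column
running = list(accumulate(incr for _, incr in points))
for (name, incr), path in zip(points, running):
    print(f"{name:<28}{incr:6.2f}{path:8.2f}")
```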
Data Needed for Physical Design or Layout
From DC NXT:

For IC Compiler II:
write_icc2_files -out DESIGN_icc2
Directory containing:
- Gate-level Verilog netlist file
- SDC constraints file
- Floorplan DEF and Tcl files
- SCAN-DEF file
- ….

For a 3rd-party layout tool:
write_file -f verilog ..
write_sdc ..
write_scan_def ..
Individual required files:
- Gate-level netlist
- SDC constraints
- SCAN-DEF file
- ….
Main References
• Rob Rutenbar, "From Logic to Layout"
• IDESA
• Rabaey, "Low Power Design Essentials"
• vlsicad.ucsd.edu, ECE 260B – CSE 241A
• Roy Shor, BGU
• Synopsys slides
