VLSI Design Flow - Comprehensive Overview

The document provides an overview of the VLSI design flow, from initial ideas to a manufactured chip. It discusses key stages in the process like system-level design, register transfer level (RTL) design, transforming the RTL to a GDS file layout, fabrication, and testing. Photolithography is described as a crucial process that uses masks and light to pattern circuit designs onto silicon wafers in multiple layers. Performance, power, and area are important metrics that the design flow aims to optimize.


VLSI Design Flow
RTL to GDS Overview
Comprehensive Insights Notebook

References
These insights are inspired by the VLSI Design Flow course by Professor Sneh Saurabh.

🔗 NPTEL Course:
https://onlinecourses.nptel.ac.in/noc23_ee137/preview

🔗 Links to Lectures:
https://sites.google.com/site/snehsaurabhhome/teaching/nptel-vlsi-design-flow-rtl-to-gds?authuser=0

🔗 Text Book:
Saurabh, S. (2023). Introduction to VLSI Design Flow. Cambridge University Press.

Further references are provided at the end of each lecture of the course.
Aspects Covered
- Key Concepts and Terminology
- Photolithography: "Art of Copying"
- System-Level Design: HW/SW Partitioning
- High-Level Synthesis: Maximum Clock Frequency
- Logic Synthesis: RTL to Netlist
- Physical Design: Netlist to GDS
- Verification: Ensure Design Integrity
- Manufacturing Test: Assure Chip Quality
- Post-GDS Processes: Fabrication

Key Concepts and Terminology
VLSI Design Flow, or Very Large
Scale Integration Design Flow,
is the road map that guides the
transformation of a brilliant
idea into a tangible silicon
reality. In the world of
microelectronics, it's the
essential playbook that takes
us from abstract concepts to
intricate, high-performance
integrated circuits.
This journey involves numerous meticulous steps, from defining a chip's functionality to its physical realization in silicon. Each stage has a specific purpose, and together they ensure that the end product meets the "PPA" requirements: Performance, Power, and Area.
[Figure: VLSI design flow from idea to chip: Idea → Pre-RTL flow → RTL (module … endmodule) → RTL-to-GDS flow → GDS → Post-GDS flow (fabrication) → Chip.]
Idea: Introducing the initial concept.
Pre-RTL: Conceptualization of logic and functionality.
RTL (Register Transfer Level): The design is expressed in RTL, describing how data is transferred and processed within the chip.
RTL to GDS: Development of the physical layout in the form of a GDS file.
GDS (Graphic Data System): The layout is finalized, encompassing the placement, routing, and physical details required for manufacturing.
Post-GDS: The physical chip is manufactured and tested.
Chip: The semiconductor end product.
The VLSI design flow employs abstractions to simplify and manage the complexity of the chip design process, dividing it into manageable stages that enable designers to move from high-level concepts to lower-level details. This approach aims to reach specific Figures of Merit (FoMs) defining circuit quality and efficiency.
* Performance: Achieving optimal speed and efficiency (clock frequency).
* Power: Minimizing energy consumption (static and dynamic power).
* Area: Ensuring that the chip occupies the smallest possible area (silicon wafer space).
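Static power is leakage that flows regardless of activity, while dynamic power depends on switching. A minimal sketch of the standard first-order dynamic power estimate, P = α·C·V²·f (this formula is common background, not given in the source):

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """First-order dynamic (switching) power: P = alpha * C * Vdd^2 * f.

    alpha  -- activity factor (fraction of capacitance switched per cycle)
    c_load -- total switched capacitance in farads
    vdd    -- supply voltage in volts
    freq   -- clock frequency in hertz
    """
    return alpha * c_load * vdd ** 2 * freq

# Example: 10% activity, 1 nF switched capacitance, 0.9 V supply, 1 GHz clock
p_watts = dynamic_power(0.1, 1e-9, 0.9, 1e9)  # 0.081 W, i.e. 81 mW
```

The quadratic dependence on Vdd is why lowering the supply voltage is one of the strongest power-reduction levers in the flow.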
The VLSI design flow is a
methodology to design an IC such
that it delivers the required
functionality or behaviour. It
depends on:
- Scope of Application:
* ASICs (Application Specific
Integrated Circuits): These are
custom-designed ICs tailored for
specific applications, such as
digital cameras and security
chips. They are not software
programmable.
* General-purpose Integrated
Circuits: These include
microprocessors, memory, and
FPGAs. They are versatile and
software programmable, making
them suitable for a wide range
of applications.
- Design Style:
* Full-Custom Design: involves creating custom layouts and designs at the transistor level, providing the highest level of optimization and performance.
* Standard-Cell Based Design: pre-designed logic elements are used to build complex circuits, balancing customization and design efficiency.
* Gate-Array Based Design: predefined logic gates are interconnected in various ways to implement specific functions, combining flexibility with moderate performance.
* FPGA Based Design: programmable devices allow logic elements to be reconfigured to adapt to different functions.
Photolithography
Art of Copying
Photolithography is a pivotal semiconductor manufacturing process that uses light to define, replicate, and scale down precise patterns onto silicon wafers. It plays a critical role in achieving higher performance and speed in a monolithic silicon chip: shrinking transistors results in faster ICs due to shorter interconnects and reduced capacitance. This process is instrumental in compactly integrating all electronic components onto a single piece of silicon, contributing to improved energy efficiency, a critical factor for battery-operated devices, while reducing the cost per transistor.
Why are so many layers needed for an IC?

[Figure: Two devices, A (pins A1, A2) and B (pins B1, B2), must be interconnected within a limited area; routing on two layers (layer 1 and layer 2), connected through vias, avoids shorts.]

Making the interconnections within the limited area, between devices A and B without shorting them, requires two layers that enable routing on separate planes. Connections between the layers are then established using vias.
Photolithography is the process
of transferring precise
geometric patterns defining the
components and interconnections
onto silicon wafers. These
patterns, often designed on a
mask, serve as a stencil that
guides the exposure of the wafer
to light. As a result, the
silicon wafer acquires these
precise shapes, forming layers
within the structure of an IC.

[Figure: A mask with opaque and transparent regions; UV light passes through the transparent regions to pattern the silicon wafer.]

Photolithography Process

1. Film Deposition

[Figure: A deposited film on top of the substrate, the silicon wafer (the semiconductor base material).]

Film deposition on the silicon wafer refers to a metallization process where metallic thin-films are deposited on the wafer to form the conductive circuit. This deposited film, often called an "Anti-Reflective Coating" (ARC), reduces reflections of light during the photolithography process while providing good contact points to the underlying layers.
2. Photoresist Application

[Figure: A photoresist layer applied over the deposited film on the substrate.]

Photoresist application is the process of applying a light-sensitive material, known as a photoresist, onto the substrate (the silicon wafer). This material undergoes chemical changes upon exposure to UV light through a mask. These changes create precise patterns on the photoresist, which are then transferred to the underlying layer, typically the deposited film.
3. Exposure

[Figure: Light shines through a mask onto the photoresist covering the deposited film and substrate.]

The exposure step involves illuminating the photoresist with light through a mask, which carries the pattern to be transferred onto the substrate. The mask, typically made of glass, comprises opaque and transparent regions that allow light to interact with the photoresist, inducing changes in its properties.
4. Development

[Figure: The exposed photoresist regions are removed, leaving the patterned photoresist on the deposited film and substrate.]

In the development phase, the wafer, coated with the exposed photoresist, is immersed in a developer solution. This solution interacts with the exposed regions of the photoresist, causing them to dissolve and be removed, leaving behind the desired pattern on the wafer's surface, on top of the deposited film.
5. Etching

[Figure: The deposited film is etched away wherever it is not protected by the remaining photoresist.]

The etching phase involves exposing the wafer to a specialized etching chemical that selectively reacts with the deposited film but not with the remaining photoresist, creating the desired pattern on the film.

6. Photoresist Removal

[Figure: Only the patterned deposited film remains on the substrate.]

This step finalizes the process by removing the photoresist, leaving the desired pattern.
[Figure: Photolithography process summary: Film Deposition → Photoresist Application → Exposure → Development → Etching → Photoresist Removal.]
The exceptional purity and
crystalline structure of silicon
wafers make them the preferred
choice for building ICs.
These silicon wafers, serving as
a substrate for a series of
manufacturing steps, are thin,
flat discs meticulously sliced
from cylindrical single crystals
of silicon known as silicon
ingots.
[Figure: A cylindrical silicon ingot sliced into thin silicon wafers.]
On a single silicon wafer, we
create multiple integrated
circuits, often referred to as
Dies. These dies are essentially
individual rectangular slices of
the wafer. Each die on the wafer
functions as a separate and
complete IC, and they are later
separated and packaged as
distinct semiconductor devices.

[Figure: A wafer containing many dies.]

Packaged dies are generally known as Chips.
This manufacturing process takes
place in semiconductor foundries
using specialized facilities
responsible for creating ICs
based on specific designs.
Semiconductor foundries play a
critical role by enforcing
manufacturing constraints
outlined in a Process Design Kit
(PDK) file. These constraints
are essential considerations
that designers must adhere to
during the design phase.

[Figure: The foundry supplies the PDK to the design team; the design team delivers the design to the foundry.]
System-Level Design
HW/SW Partitioning
System-Level Design is a phase
in the product development
process where an entire system,
consisting of various
interconnected components, is
designed and modeled to achieve
a specific functionality.
At this higher level of
abstraction, a large number of
solutions can be analyzed in
less time, and the results of
optimization conducted at this
level are expected to be better.
This phase considers the high-
level architecture, component
interactions, and overall system
behavior, aiming to ensure that
all components work together
cohesively and efficiently to
meet the desired objectives.
System-Level Design: Top View

[Figure: Idea → Evaluation of Idea → Specification → HW/SW Partitioning. The HW portion of the spec goes through IC Design and Fabrication, yielding fabricated ICs, SoCs, and peripherals; the SW portion goes through Software Development, yielding executables, firmware, and device drivers. Together with existing SW/HW components, everything enters System Integration, Validation and Test, producing the Product.]
Evaluation of Idea: The idea is
evaluated based on market
demand, financial viability and
technical feasibility.
Specification: Product Specs are
created, determining the
required features, relevant
figures of merit (Performance,
Power and Area: PPA) as well as
the time to market.
HW/SW Partitioning: The decision
is made, after identifying
various components required for
the system, regarding whether
each component will be
implemented in hardware “HW
portion of Spec” or software “SW
portion of Spec” based on the
project's requirements and
constraints.
Once the decision on HW/SW
partitioning is made, hardware
components enter the IC Design
and Fabrication phase, focusing
on realizing features in
physical hardware. Meanwhile,
software components proceed to
Software Development, where
functionalities are implemented
through code.
With all components ready and
available, including existing or
Intellectual Properties (IPs)
components that can be directly
reused at the system level, we
proceed to System Integration,
Validation, and Testing. This
phase is essential to confirm
the system's reliability and
performance before delivering
the final product.
HW/SW Partitioning
The motivation behind HW/SW
partitioning is to leverage the
advantages of both hardware and
software components by selecting
the optimal combination to
implement a specific function.
This decision-making process
ensures that each function is
executed efficiently and
effectively by either hardware
or software, depending on its
requirements and constraints.

                  Hardware   Software
Performance       High       Low
Cost              High       Low
Risk due to bug   High       Low
Customization     Low        High
Development time  High       Low
Performance: Hardware can deliver much higher speed than software, as it can incorporate parallel circuits that work concurrently, enabling tasks to be completed more quickly than in software, where processes typically run sequentially on a processor.
Cost: Software implementation is
cost-effective compared to
hardware as it involves code
rather than physical circuits.
Risk due to bug: Software
carries lower risk due to its
flexibility and ease of
customization. In the event of
errors, debugging and code
modification are
straightforward. In contrast,
hardware errors necessitate
costly and time-consuming
redesign and refabrication.
HW/SW Partitioning: Example
Take a video compression
algorithm with two main
components:
1. Computing the Discrete
Cosine Transform (DCT), which is
the bottleneck as it’s performed
multiple times during runtime.
2. Frame handling and other
computations.
To address this, the DCT part is
implemented in hardware with
parallel circuits for enhanced
speed and energy efficiency,
while frame handling and other
computations are handled in
software on a general-purpose
processor, ensuring flexibility
for potential configuration
changes.
HW/SW Partitioning: Example

[Figure: The Discrete Cosine Transform is implemented in dedicated hardware, referred to as a hardware accelerator because it is designed to speed up and optimize the DCT operation. A general-purpose microprocessor runs the frame handling and other computations. Both share a memory and communicate over a bus.]
HW/SW Partitioning: Methodology

[Figure: Flowchart. Start by implementing all functionality in software: S = all functions, H = {}. Compute Speed = Evaluate(H, S). If the speed is acceptable, the required partition is found. Otherwise Profile(H, S) and loop over the software functions i = 1 to N: identify the i-th bottleneck function fi, move it to hardware (S = S - fi, H = H ∪ fi), and compute New_Speed = Evaluate(H, S). If the new speed is acceptable, stop. On completion of the loop, continue iterating only if New_Speed > Speed (setting Speed = New_Speed); otherwise no partition is found.]
Initially, the entire system
functionality is implemented in
software. By evaluating the
system's performance relative to
the specified threshold outlined
in the constraints, we make
decisions about implementing
certain functions, identified as
bottlenecks, in hardware, in
order to achieve the desired
performance and speed.
After evaluating the system's
performance Evaluate(H,S)
measuring the speed, the results
are compared to a predefined
threshold. If the performance
meets the desired level, the
desired partitioning is achieved
without the need for further
Profiling.
Profiling involves measuring the
duration of each function call,
essentially determining how much
runtime is consumed by each
function in a given executable.
These time-consuming functions
are classified by consumption,
and they are systematically set
to be implemented in hardware.
Since the hardware does not exist yet, the new system speed is estimated and assessed (using RTL design simulation or FPGAs) combining the SW and HW modules. If it meets the required threshold, the desired partitioning is achieved. Otherwise, the process continues with the remaining bottlenecks until the desired speed is attained or the set of SW functions is exhausted.
On completion of the loop, the
last recorded New Speed becomes
the speed reference if it
surpasses the previous evaluated
speed. This decision determines
whether to continue
partitioning. If the new speed
does not exceed the previous
evaluated speed, no partition
can be found, and the system
cannot be further optimized.
One of the reasons why achieving
optimization can be challenging
is the presence of communication
time delays between software and
newly implemented hardware.
These delays can cause hardware
to wait for necessary data from
software, impacting overall
system performance.
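The methodology above can be sketched as a greedy loop. This is a minimal illustration, assuming hypothetical `evaluate` and `profile` callables that stand in for speed estimation (RTL simulation/FPGA) and runtime profiling:

```python
def partition(functions, evaluate, profile, required_speed):
    """Greedy HW/SW partitioning sketch following the flowchart above.

    functions      -- names of all system functions (initially all in SW)
    evaluate       -- callable (hw, sw) -> estimated system speed
    profile        -- callable (hw, sw) -> SW functions, bottlenecks first
    required_speed -- speed threshold from the specification
    Returns (hw, sw) when an acceptable partition is found, else None.
    """
    hw, sw = set(), set(functions)          # start with everything in SW
    speed = evaluate(hw, sw)
    while speed < required_speed:
        improved = False
        for f in profile(hw, sw):           # loop over SW bottlenecks
            sw.remove(f); hw.add(f)         # S = S - fi, H = H U fi
            new_speed = evaluate(hw, sw)
            if new_speed >= required_speed:
                return hw, sw               # required partition found
            if new_speed > speed:           # progress: keep fi in hardware
                speed = new_speed
                improved = True
                break                       # re-profile the new partition
            sw.add(f); hw.remove(f)         # no gain: undo the move
        if not improved:
            return None                     # no partition can be found
    return hw, sw
```

The early `return None` reflects the flowchart's termination condition: if moving a bottleneck to hardware no longer raises the speed (for example, because HW/SW communication delays dominate), no acceptable partition exists.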
High-Level Synthesis
Maximum Clock
Frequency
After completing the Hardware-
Software partitioning and
deciding on the system functions
to be implemented in hardware,
the next step involves creating
a Functional Specification for
the desired hardware, forming
the basis of the VLSI design
flow. High-Level Synthesis
(HLS), also known as Behavioral
Synthesis, plays a pivotal role
in this phase. It automates the
design process by generating the
Register-Transfer Level (RTL)
design, typically expressed in
hardware description languages
like Verilog, SystemVerilog, or
VHDL. HLS achieves this by
translating the Functional Spec
of hardware components, written
in high-level language (e.g. C,
C++, or Matlab), into RTL.
RTL: General Structure

[Figure: General RTL structure. In the Data Path, data signals from ports or registers feed an ALU; its output passes through a MUX into D flip-flop registers, whose outputs feed back into the data path. In the Control Path, an FSM driven by ports and register feedback generates the control signals for the multiplexers. A common clock drives all registers. ALU: Arithmetic Logic Unit; MUX: Multiplexer; FSM: Finite State Machine.]
In the RTL design, the flow of
data between registers is
modeled through two distinct
paths. The Data Path is where
the primary computations occur,
typically performed by
Arithmetic Logic Units (ALUs).
The results are Data Signals
driven to Registers where data
is temporarily held for
sequential transfer, and the
selection of which data goes
where is determined by a
Multiplexer (MUX). The Control
Signals for these multiplexers
are generated by the Finite
State Machine (FSM) in the
Control Path. The FSM represents
the system with a finite number
of states and transitions
between these states based on
input conditions.
Functional Specification to RTL

[Figure: The Functional Specification (C, C++, SystemC, Matlab) is converted to RTL (Verilog, SystemVerilog, VHDL) through Manual Coding, IP Assembly, or Behavioral Synthesis.]

In the process of translating the Functional Specification to RTL, one, two, or a combination of the following methods is used:
Manual Coding: This involves
directly writing RTL code using
a hardware description language.
It doesn't rely on high-level
language assistance and requires
detailed knowledge of the
hardware description language.
IP Assembly: This approach
includes reusing pre-designed
and pre-verified RTL sub-systems
or blocks called Intellectual
Properties such as processors,
hardware accelerators, device
drivers, RTOS… etc. This method
is particularly popular in
System-on-Chip (SoC) design,
where a complete system is
integrated on a single chip,
incorporating various predefined
components. It minimizes the
effort needed to redesign system
components and allows for the
use of internally developed or
purchased IP blocks, whether
they are hardware components or
software modules. This approach
enhances efficiency and reduces
development time and costs in
complex designs.
Behavioral Synthesis: or High
Level Synthesis, involves
translating untimed high-level
language algorithms, which are
not considering constraints of
time and clock cycles, into
timed RTL designs expressed in
hardware description languages,
meeting specified constraints.

[Figure: The HLS tool takes the Design of Algorithm (C, C++, SystemC, Matlab), Constraints (resource usage, frequency, latency, power), and a Library of Resources as inputs, and produces the RTL (Verilog, VHDL).]
The Behavioral Synthesis tool,
providing as output the RTL
design, takes three main inputs:
- Design of Algorithm: The
Algorithm describing the
Functional Specification is
written in a High-Level Language
like C, C++, SystemC or Matlab.
- Constraints: They specify the
requirements and limitations of
the design. They include:
* Resource Usage: It defines
the type and quantity of
resources like logic gates or
memory blocks that the design
can use, enabling efficient
exploitation of Area.
* Frequency: The maximum
operating frequency at which the
design can function efficiently.
* Latency: The maximum
allowable delay or processing
time for the design.
* Power: Power consumption
constraints to ensure the design
is energy-efficient.
- Library of Resources: It's a
repository of pre-designed
hardware components including IP
blocks, minimizing the necessity
to redesign common components.

Given that untimed algorithms, written in high-level languages, can be implemented in many ways, the behavioral synthesis tool evaluates the cost of each implementation, selecting the one that best adheres to specified constraints, such as the maximum operating clock frequency.
Maximum Clock Frequency
Given a synchronous circuit, the
maximum clock frequency defines
the speed at which the circuit
can reliably operate. It is
typically calculated using
Static Timing Analysis, which
assesses the timing constraints
within the circuit, such as the
combinational logic delay on
which the maximum frequency is
roughly estimated. Let’s first
get to know what’s a
combinational path.
Path

[Figure: A signal path through combinational logic and a flip-flop (pins D, Q, CP); the portion without sequential elements is a combinational path.]
Path: Sequence of pins through
which a signal can propagate.
Combinational Path: A path that
does not contain any sequential
circuit such as a flip-flop.
Flip-flops are Sequentially
adjacent if the output of one
flip-flop is fed as an input to
the other flip-flop through a
combinational path as shown in
the following figure.

[Figure: A launch flip-flop drives combinational logic with maximum delay dMAX into a capture flip-flop; both are clocked by CLK.]

In synchronous circuits, data launched must be captured by the sequentially adjacent flip-flop in the next clock cycle. The clock period should therefore be greater than the delay of the longest combinational path.
TCLK > dMAX, hence FCLK < 1/dMAX

The combinational path with the largest delay is called the critical path; it sets the circuit's maximum frequency.
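The bound FCLK < 1/dMAX can be computed directly from the path delays. A small sketch of this simplified view (setup/hold times and clock skew, discussed next, are deliberately ignored):

```python
def max_clock_frequency(path_delays_ns):
    """Estimate the maximum clock frequency from combinational path delays.

    The critical path is the combinational path with the largest delay;
    FCLK must stay below 1/dMAX. Setup time, hold time, and clock skew
    are ignored in this first estimate.
    """
    d_max = max(path_delays_ns)       # critical path delay (ns)
    return d_max, 1.0 / d_max         # frequency bound in GHz (1/ns)

# Example: of three paths, the 2.5 ns one is critical, so FCLK < 0.4 GHz
d_max, f_max = max_clock_frequency([1.2, 2.5, 0.8])
```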

In addition to the combinational


logic delay that accounts for
signal propagation time, other
factors crucial for meeting
timing constraints are
considered in the Static Timing
Analysis (STA), including setup
time, hold time, logic gates
fanout… etc.
Behavioral Synthesis: Illustration

Given the algorithmic untimed behavior: Y = a + b + c
Cost metrics: circuit elements used, latency, and delay of the combinational path.

[Figure: Three timed RTL implementations of Y = a + b + c.
- RTL-1: adders A1 and A2 in series compute a + b + c within one cycle; flip-flop FF1 captures Y. Resources: 2 adders, 1 flip-flop. Latency: 1 clock cycle. Worst delay: 2 adders.
- RTL-2: adder A1 computes a + b into FF1; adder A2 then adds c, and FF2 captures Y. Resources: 2 adders, 2 flip-flops. Latency: 2 clock cycles. Worst delay: 1 adder.
- RTL-3: a single adder A1 is shared across two cycles through multiplexers M1 and M2; inverter I1 toggles the select signal (initially 0), and flip-flops FF1 and FF2 hold the intermediate and final results. Resources: 1 adder, 2 flip-flops, 2 multiplexers, 1 inverter. Latency: 2 clock cycles. Worst delay: 1 adder + 1 MUX.]

Let's compute the area, latency, and critical path delay for the three RTL implementations, given the cell characteristics:

              Area (µm2)   Delay (ns)
Adder         200          100
Flip-Flop     12           0
Multiplexer   6            10
Inverter      1            1

              RTL-1      RTL-2      RTL-3
Area (µm2)    412        424        237
Latency       1 cycle    2 cycles   2 cycles
Delay (ns)    200        100        110

Which RTL will be generated when:
- Area is to be minimized? RTL-3
- Latency is to be minimized? RTL-1
- Frequency is to be maximized? RTL-2

Three different timed implementations are illustrated here; there can be several others. The Behavioral Synthesis tool will choose the best possible implementation satisfying the constraints, making trade-offs between some Figures of Merit to improve others.
Logic Synthesis
RTL to NETLIST
After completing the pre-RTL
phase and obtaining the RTL
(Register-Transfer Level) design,
the next step is logic synthesis.
In the Logic Synthesis process,
the RTL design is transformed
into an equivalent circuit. This
transformation involves mapping
the logic gates and their
interconnections, resulting in a
netlist as the output. The
Netlist represents the logical
relationships between various
components of the circuit,
effectively specifying how the
hardware elements are connected
and work together.
This step is crucial for further
refining the design and preparing
it for physical implementation.
The logic synthesis tool takes three key inputs: the RTL Design that implements the functional description, the Library comprising the standard cells, macros, and memories necessary for the synthesis, and the Constraints that define design goals, such as the expected timing behavior specified in the Synopsys Design Constraints (SDC) file.

[Figure: The Logic Synthesis tool takes the RTL Design (Verilog/VHDL), a Library (Liberty), and Constraints (SDC) as inputs, and produces a Netlist (Verilog/Schematic).]
The logic synthesis tool
generates the Netlist in
Verilog, which comprises the
final circuit's entities and
their interconnections. This
Netlist can be depicted as a
schematic by the logic
synthesizer.
The true challenge in logic
synthesis lies not in merely
translating Verilog or VHDL
constructs into a corresponding
logic circuit, but rather in the
art of optimization. It involves
making strategic choices and
connecting cells picked from the
technology library to ensure
equivalence of the intended
functionality between the RTL
and the Netlist while minimizing
the cost in terms of power,
performance, and area (PPA).
Logic Synthesis: Illustration

RTL design (Verilog):

module rtl_design(a, b, clk, select, out);
  input a, b, clk, select;
  output out;
  reg out;
  wire y;
  assign y = (select) ? b : a;
  always @(posedge clk)
  begin
    out <= y;
  end
endmodule

After logic synthesis, the netlist (Verilog):

module rtl_design(a, b, clk, select, out);
  input a, b, clk, select;
  output out;
  wire y;
  MUX2 INST1(.A(a), .B(b), .S(select), .Y(y));
  DFF INST2(.D(y), .CP(clk), .Q(out));
endmodule

[Figure: Schematic of rtl_design: MUX2 instance INST1 selects between a (A) and b (B) using select (S); its output Y drives net y into DFF instance INST2 (D), clocked by clk (CP), whose output Q is out.]
Netlist Terminologies

[Figure: Design MYDESIGN. Inputs in1 and in2 drive nets N1 and N2 into AN2 instance INST1; its output (net N3) feeds NOT instance INST2 (net N4), which drives the DFF instances OUT1_REG and OUT2_REG, producing out1 (net N7) and out2 (net N8). CLK is buffered by BUF instance INST3 (nets N5, N6) to clock both DFFs. The cells AN2, NOT, BUF, and DFF are instantiated in the design from the technology library, which describes each cell and its pins.]

Design: Top-level entity or schematic that represents the circuit "MYDESIGN".
Net: Wire that connects
different cells instances and
ports “N1, N2,… N8”.
Ports: Interfaces of the design
through which it communicates
with the external world “in1,
in2, CLK, out1, out2”.
Primary Inputs: Signals entering
the design “in1, in2, CLK”.
Primary Outputs: Signals exiting
the design “out1, out2”.
Cells: Basic entities contained
in libraries, delivering
combinational or sequential
function “AN2, NOT, BUF, DFF”.
Instances: Cells when used
inside a design “INST1, INST2,
INST3, OUT1_REG, OUT2_REG”. The
same cell can be instantiated
multiple times.
Pin: Interface of a library cell or instance through which it communicates with other components. A cell instance pin is typically specified as "I1/A, I1/Y".
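The terminology above maps naturally onto a small data model; a minimal sketch (these classes are illustrative only, not any EDA tool's actual API):

```python
from dataclasses import dataclass, field

@dataclass
class Instance:
    """A library cell used inside a design (e.g. INST1 of cell AN2).
    pins maps each pin name to the net it connects to."""
    name: str
    cell: str
    pins: dict

@dataclass
class Design:
    """Top-level entity: ports plus cell instances connected by nets."""
    name: str
    ports: list
    instances: list = field(default_factory=list)

    def nets(self):
        """All nets referenced by instance pins."""
        return {net for inst in self.instances for net in inst.pins.values()}

# A fragment of MYDESIGN from the figure above
design = Design("MYDESIGN", ports=["in1", "in2", "CLK", "out1", "out2"])
design.instances.append(
    Instance("INST1", "AN2", {"A": "N1", "B": "N2", "Y": "N3"}))
design.instances.append(
    Instance("INST2", "NOT", {"A": "N3", "Y": "N4"}))
```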
Logic Synthesis Tasks

[Figure: RTL → RTL Synthesis → netlist (generic gates) → Logic Optimization → optimized netlist (generic gates) → Technology Mapping → netlist (standard cells) → Technology-dependent Optimization → optimized netlist (standard cells).]
RTL Synthesis: Initial part
consisting of translating RTL to
a Netlist of Generic Logic Gates
and their connections (Nets).
These Generic Logic Gates don’t
have a fixed transistor level
implementation at this stage.
Therefore, it's not possible to
make accurate decisions about
area, delay, and power since the
physical standard cells have not
been selected yet.

[Figure: An inverter as a generic gate versus an inverter as a standard cell.]
Logic Optimization: Performed at
the generic gate level,
involving the analysis of RTL
code, checking connections, and
optimizing arithmetic
operations. The goal is to
reduce the number of logic gates
to enhance efficiency.
Technology Mapping: Involves the
process of mapping a netlist of
generic logic gates to standard
cells within a given technology
library. This mapping requires
careful selection, resulting in
a netlist of standard cells.
For a given generic gate,
multiple standard cells are
available in the library, and
selecting the appropriate one
requires meticulous
consideration of timing, area,
and power constraints.
For instance, if an inverter is
intended to operate in a highly
timing-critical path, a higher
drive-strength cell may be
necessary to reduce signal
propagation delay, even though
it occupies a significant area.
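This drive-strength trade-off can be sketched as a tiny selection rule. The cell names and numbers below are hypothetical, invented for illustration:

```python
# Hypothetical library variants for one generic inverter:
# cell name -> (area in um^2, delay in ns at the expected load)
INV_CELLS = {"INV_X1": (1.0, 0.30), "INV_X2": (1.8, 0.18),
             "INV_X4": (3.2, 0.10)}

def map_inverter(timing_budget_ns):
    """Pick the smallest inverter whose delay fits the path's timing budget.

    On a timing-critical path a higher drive-strength (faster but larger)
    cell is selected; on a relaxed path the smallest cell wins.
    """
    fitting = [(a, n) for n, (a, d) in INV_CELLS.items()
               if d <= timing_budget_ns]
    if not fitting:
        # Nothing meets timing: fall back to the fastest available cell.
        return min(INV_CELLS, key=lambda n: INV_CELLS[n][1])
    return min(fitting)[1]              # smallest area that meets timing

relaxed = map_inverter(0.50)   # relaxed path  -> smallest cell, "INV_X1"
critical = map_inverter(0.12)  # critical path -> strongest cell, "INV_X4"
```

Real technology mappers weigh power and load-dependent delay tables as well, but the area-versus-drive-strength tension is the same.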
Technology-dependent Optimization: After technology mapping, PPA estimation becomes more accurate because the geometrical and physical properties of the standard cells (power consumption, time delays, and area) are known. This task results in a final optimized netlist, providing a solid foundation for the subsequent physical design phase, where the circuit layout ("GDS file") is obtained.
Physical Design
NETLIST to GDS
After the Netlist emerges from
the Logic Synthesis phase, the
subsequent stage, known as
Physical Design, takes the chip
design process to a tangible
level. Here, the Netlist — a
representation of the chip's
logic elements — is translated
into an actual physical layout.
This intricate process involves
organizing the Netlist standard
cells, interconnections, and
other components within a silicon
chip. The output of this phase, a
layout GDS (Graphic Data System)
file, comprises the detailed
geometric patterns that serve as
masks necessary for the
fabrication of the chip. This
phase significantly influences
the chip's performance, power
consumption and area.
The Physical Design phase, also
known as “Place and Route (P&R)”
orchestrates the placement and
routing of individual components
on the chip's layout. The
physical design tool,
responsible for generating the
Layout GDS file, requires four
essential inputs as depicted in
the figure below.

[Figure: The Physical Design tool takes the Netlist, Library, Constraints, and Floorplan as inputs, and produces the Layout (GDS).]
Netlist: Gate level input design
(interconnected chip components).
Library: Comprises the technology
library used in logic synthesis,
containing general component
information “outputs behavior
according to inputs” in a (.lib)
liberty file, and the physical
library housing detailed
component specifics “geometrical
dimensions, pins location …” in a
(.lef) Library Exchange Format
file, to effectively position and
connect components in the design.
Constraints: Define design goals,
expected timing behavior (maximum
operable frequency), and the
environment within which signals
interact with the design,
influencing the expected behavior
of generated signals.
Constraints are articulated within a modified version of the SDC (Synopsys Design Constraints) file, similar to the one used in logic synthesis, providing additional information about wires, routing, and other physical aspects of cells.
Floorplan: Captures the intent of the physical design; it encompasses the required layout size (die size), shape, and predefined locations for global entities such as input-output blocks, macros, and memories.
Layout: Where all design entities
are positioned and interconnected
following the input netlist's
connectivity. This process
involves placement and routing as
the primary steps in the physical
design phase.
Physical Design Major Tasks

[Figure: Chip Planning (→ floorplan & PDN) → Placement (→ placed cells) → Clock-Tree Synthesis (→ clock tree) → Global Routing (→ global routes) → Detailed Routing (→ detailed routes) → Engineering Change Order (ECO) → Write GDS (→ mask layers).]
Chip Planning: In this phase,
significant decisions are made
regarding the overall circuit
layout. For larger designs, the
partitioning into subsystems and
major blocks is an initial step.
Placement of major blocks, such
as the CPU core, memory modules,
controller blocks, and macros,
involves arranging these blocks
in specific locations on the die,
including the allocation of rows
for standard cells within the
layout. Manual intervention is often necessary at this stage, as the floorplanning tool may not always generate the most optimal solution. Manual adjustments can improve signal integrity and delays by reducing interference between critical circuit elements, yielding a better Quality of Results (QoR). Additionally, placement of I/O cells and power planning, aka the Power Delivery Network (PDN), ensures a sufficient and stable power supply (VDD & GND) across the chip by keeping voltage drops within acceptable limits. Congestion management is equally crucial: careful placement of global blocks prevents overly dense areas, easing data-flow and timing concerns.
Placement: involves locating
standard cells in dedicated rows
within the layout. Handled by the
physical design tool, it aims to
estimate and minimize wire
lengths between standard cells,
meeting timing constraints by
placing cells closer to reduce
delays along critical paths while
ensuring minimal congestion.
[Figure: layout after placement: I/O cells at the periphery, major blocks placed, and standard cells arranged in standard-cell rows.]
Clock-Tree Synthesis: involves determining the topology of the clock network, defining how the clock reaches synchronous circuits (from source to sinks). It aims to minimize clock skew, the difference in clock arrival times at the sinks, which can be caused by variations in buffer delays. Building a symmetrical structure, so that the clock path lengths to parallel sinks are nearly identical, helps reduce clock skew.
[Figure: a clock source drives two sinks, a and b, through buffers. Differing buffer delays give arrival times of 10 ps at sink a and 15 ps at sink b, so skew = 5 ps, even though skew was assumed null in logic synthesis.]

Additionally, CTS focuses on minimizing power dissipation, since the clock network consumes a large fraction of total power due to continuous switching (charging/discharging of capacitances). Techniques like clock gating address this by disabling the clock signal to specific circuit portions when they are not in use, avoiding unnecessary power consumption during idle states.
Routing: involves the creation of
wire layouts for all nets,
excluding clock (done in CTS) and
power supply (done in PDN). It
aims to minimize wire length (to
reduce delays), routing areas (to
mitigate congestion), and vias
(to decrease design complexity).
The routing process is typically
divided into two stages:
* Global routing: serves as a
planning stage in the routing
process where the actual layout
wires aren't yet created. It
involves generating a routing
plan for a net by dividing the
routing region into tiles known
as global bins. These bins are
utilized by the global routing
tool to define the connection
path for a specific net.
[Figure: the entire routing region is partitioned into rectangular tiles called "global bins". Global routing assigns a set of global bins that guides the connection path for a given net; planned wire paths are expressed in terms of these bins.]

* Detailed routing: decides the actual wires of each net within the pre-assigned global bins, allocating wires on the metal layers of the layout and switching between metal layers using vias.
[Figure: actual wires placed in a routing channel. Feedthrough cells allow signals to pass straight through standard-cell rows without changing direction.]
ECO: This phase ensures the design functions as required and accommodates any new requirements that emerge during the process. If issues arise or alterations are needed, they are identified during verification, which is carried out after each step of physical design. Controlled modifications are then applied using the Engineering Change Order method, allowing small, targeted adjustments to the design.
Write GDS: involves dumping the
design layouts of each layer in a
GDS file, outlining mask patterns
essential for fabrication. This
file is shipped to the foundry,
marking the conclusive phase
known as "Tapeout" before the
design fabrication begins.
Optimizations are conducted
incrementally between each
physical design task to maintain
small and controlled changes,
thereby enhancing Power,
Performance, and Area (PPA)
aspects. For instance,
adjustments may involve buffer
insertions on specific nets,
resizing cells, changing cell
placement, or modifying net
routing. Verification tasks are executed concurrently with the physical design implementation, aiming to achieve design closure by meeting constraints like timing, power, and signal integrity. However, attaining design closure is challenging because estimated parameters (timing, size, power) may turn out to be inaccurate.
In cases of issues, an iterative
flow is adopted, forming loops
within the physical design flow
to address changes required from
previous tasks.
For instance, in the detailed
routing phase, if congestion in a
specific routing region prevents
the routing of a particular net,
it might necessitate revisiting
the placement phase to alleviate
congestion within the entities
positioned in that particular
region.
[Figure: iterative loops in the physical design flow: Chip Planning, Placement, Clock-Tree Synthesis, Global Routing, and Detailed Routing, with back edges to earlier steps when issues are found.]
Verification
Ensure Design
Integrity
During the transformation
process from RTL to GDS layout,
the design traverses several
Electronic Design Automation
(EDA) tools and different teams,
which may introduce errors into
the design. These errors could
stem from various sources, such
as miscommunications among
teams, human mistakes, or
unexpected behaviors or bugs
within the tools. To ensure the
absence of errors and verify the
integrity of the design,
verification steps are crucially
integrated into the design
implementation process.
A considerable amount of effort
and time is dedicated to
verification, emphasizing its
significance in ensuring a
robust and error-free design.
Verification steps are integral
to ensuring the design's
correctness and functionality,
much like quality checks
conducted to confirm an idea's
realization aligns with the
original vision. This process is
carried out iteratively
throughout the VLSI design flow,
especially after any
modifications to the design.
Detecting and addressing issues
early on significantly reduces
the effort and time needed for
corrective actions. Verification
encompasses Functional, Timing,
Power, and Physical Verification
methods validating functionality
against design requirements,
ensuring timing closure, and
meeting power and manufacturing
constraints.
Functional Verification
Functional verification aims to
confirm that the RTL design
accurately reflects the intended
functionality.
- Simulation:
Simulation is a crucial technique
employed in the functional
verification process, utilizing
test vectors composed of
sequences of zeroes and ones,
along with associated timing
information. These test vectors
serve as stimuli applied to the
design to verify its functional
correctness. Through simulation,
the RTL design's behavior and
response to various inputs are
examined, ensuring its adherence
to the functionality as described
in the specifications.
[Figure: test vectors drive both the specification (golden model) and the design (RTL). The computed expected response and the simulated output response are compared: Pass or Fail.]

Simulation involves comparing the output responses obtained from the RTL simulation within an EDA tool against the expected responses computed from the specification using a behavioral high-level model (in C, C++, or Matlab), validating the design's functionality and behavior.
The advantage of simulation-based
verification lies in its speed,
involving running executables to
obtain responses for comparison.
However, it has limitations,
being an incomplete technique.
Test vectors cannot cover all
possible scenarios, and they are
chosen for design areas where
there might be bugs, particularly
focusing on computation-heavy
regions prone to errors.
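The compare-against-golden-model loop can be sketched in a few lines. Here both the "golden model" and the "RTL" are stand-in Python functions for a 4-bit adder, purely for illustration:

```python
# Simulation-based verification sketch: apply test vectors to a
# design and compare against a golden model. Both functions below
# are stand-ins modeling a 4-bit adder, not real RTL.
def golden_adder(a, b):      # high-level golden model
    return (a + b) & 0xF     # result wraps at 4 bits

def rtl_adder(a, b):         # stand-in for the simulated RTL design
    return (a + b) % 16

test_vectors = [(0, 0), (7, 8), (15, 1), (9, 9)]  # chosen stimuli
for a, b in test_vectors:
    expected, actual = golden_adder(a, b), rtl_adder(a, b)
    assert actual == expected, f"mismatch for {a}+{b}"
print("all vectors pass")
```

Note the limitation the text describes: only the four chosen vectors are checked, not all 256 possible input pairs.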
- Model Checking:
Model checking ensures the correctness of the RTL using formal methods that establish proof of specific properties through mathematical deduction. For instance, if f(x) implies g(x) and g(x) implies f(x), then f(x) must equal g(x) regardless of the value of x.
Once a design property is
mathematically proven using model
checking, it guarantees
correctness for all possible test
scenarios. The advantage of model
checking-based verification lies
in its completeness, ensuring
coverage across various
properties or functions. However,
establishing mathematical proofs
for all types of properties can
be computationally demanding.
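For a finite combinational circuit, "proven for all scenarios" can be demonstrated by exhausting the input space, which is the contrast with sampled simulation vectors. A toy sketch checking De Morgan's law as a circuit property (real model checkers use symbolic methods, not enumeration):

```python
from itertools import product

# Formal-flavored check: verify a property for *every* input, not a
# sampled subset. Toy property: NAND(a, b) equals OR of the inverted
# inputs (De Morgan), proven exhaustively over the finite domain.
def property_holds(f, g, n_inputs):
    return all(f(*v) == g(*v) for v in product([0, 1], repeat=n_inputs))

f = lambda a, b: 1 - (a & b)        # NAND gate
g = lambda a, b: (1 - a) | (1 - b)  # OR of inverted inputs
print(property_holds(f, g, 2))  # True: holds for all 4 input cases
```

The exhaustive approach is complete but blows up exponentially with input count, mirroring the text's point about computational cost.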
CEC - Combinational Equivalence
Checking
CEC involves verifying the
functional equivalence between
two design models, typically the
RTL description and the Netlist
generated by a logic synthesis
tool. This process ensures that
both representations produce
identical functionality.
CEC is executed in parallel to
the design flow to ensure that
the final Netlist aligns with the
intended functionality derived
from the verified RTL design, and
helps catch any potential bugs.

[Figure: CEC is run in parallel with the flow, comparing the RTL against the design after each stage: logic synthesis (Netlist), floorplanning (floorplanned design), placement (placed design), CTS (clock-tree inserted design), global routing (globally-routed design), detailed routing (detail-routed design), and ECO (design after ECO).]
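At its core, CEC asks whether two representations compute the same Boolean function. A toy sketch comparing an RTL-level 2:1 mux with a gate-level version (production CEC tools use BDDs or SAT rather than enumeration, but the question being answered is the same):

```python
from itertools import product

# CEC sketch: exhaustively compare an RTL-style expression with its
# gate-level (netlist) counterpart for a 2:1 multiplexer.
def rtl_mux(sel, a, b):
    return a if sel else b           # behavioral RTL description

def netlist_mux(sel, a, b):
    return (sel & a) | ((1 - sel) & b)  # AND/OR/invert gate structure

equivalent = all(rtl_mux(s, a, b) == netlist_mux(s, a, b)
                 for s, a, b in product([0, 1], repeat=3))
print(equivalent)  # True: the two models are functionally identical
```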
STA – Static Timing Analysis
STA is pivotal in guaranteeing
synchronous data transfer between
flip-flops within a design. It
verifies that data launched by a
flip-flop is captured by the
subsequent flip-flop in the
following clock cycle.

[Figure: a launch flip-flop drives a capture flip-flop through combinational logic; both flip-flops share the clock Clk.]

STA is crucial for ensuring deterministic synchronous timing behavior, maintaining compliance with setup and hold time constraints: the times data must be stable before and after the clock's active edge, respectively, for correct capture by the flip-flop.
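The basic setup check reduces to simple arithmetic on the launch-to-capture path. A sketch with assumed delay numbers (all in picoseconds, not from any real library):

```python
# Setup-check sketch: data launched at one flop must settle at the
# capture flop at least t_setup before the next clock edge.
# slack = T_clk - (t_clk_to_q + t_comb + t_setup); values assumed.
def setup_slack(t_clk, t_clk_to_q, t_comb, t_setup):
    return t_clk - (t_clk_to_q + t_comb + t_setup)

# 1 GHz clock (1000 ps period) with illustrative path delays.
slack = setup_slack(t_clk=1000, t_clk_to_q=120, t_comb=700, t_setup=50)
print(slack)  # 130 ps of margin; a negative value is a violation
```

A timing report is essentially this computation repeated for every launch/capture path, flagging paths whose slack goes negative.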
Once the Netlist is generated,
STA is executed throughout the
subsequent design steps. This
process takes inputs such as
timing Constraints specifying the
desired operational frequency,
input signal characteristics, and
output signal behavior.

[Figure: at each stage (Netlist, floorplanned design, placed design, clock-tree inserted design, globally-routed design, detail-routed design, design after ECO), the STA tool takes the design, the Library, and the Constraints, and produces a Timing Report.]
The Library provides information about the standard cells used in the netlist, so that the STA tool can perform its calculations and generate the Timing Report, allowing identification and resolution of any timing violations in the design.
Physical Design Verification
This verification phase ensures
the layout is free from
manufacturing and connectivity
issues to maintain high yield
during fabrication. Yield, which
represents the percentage of
good, functional chips obtained
from a wafer after fabrication,
is crucial in measuring the
efficiency of the manufacturing
process. A higher yield means a
greater number of defect-free
chips in one silicon wafer.
The physical design verification
involves checking a defined set
of rules prior to sending the
layout to the foundry:
- Design Rule Check (DRC):
Checking for violations against
foundry-defined manufacturing
technology rules. All DRC
violations must be rectified
before sending the layout for
fabrication.
- Electrical Rule Check (ERC):
Verifying proper connectivity,
such as preventing short circuits
between distinct signal lines.
- Layout Versus Schematic (LVS)
Check: Validating that the
transistor-level Layout
accurately reflects the original
functionality outlined in the
gate-level Netlist.
Rule Checking
This phase ensures the adherence
of design entities, such as RTL,
Constraints, and Netlist, to
specific VLSI design flow rules.
- In RTL, these rules ensure that
constructs do not generate
simulation or synthesis issues in
later stages. While the RTL
design language may allow certain
constructs, the RTL rule checker
flags those that could
potentially cause problems.
- In Constraints, rule checks
ensure the absence of conflicting
or any missing constraints in the
SDC file.
- In the Netlist, rules verify that the connectivity of cell instances does not compromise testability or create issues downstream in the design flow.
Manufacturing Test
Assure Chip Quality
Throughout the VLSI design
journey, verification stages
meticulously ensure the GDS
layout mirrors the original
circuit specifications,
guaranteeing accuracy before
fabrication. After chips are
fabricated, the focus shifts to
Manufacturing Tests. These tests
validate that the chips conform
to the design layout, adhering to
predefined constraints, and
ascertain the absence of any
manufacturing defects. Post-fabrication testing is carefully considered during the design phase, particularly from RTL to GDS, where Design for Testability (DFT) methodologies are implemented, enabling easy diagnosis, debugging, and early rectification of defects.
Manufacturing defects are
permanent physical flaws found in
fabricated chips. Even though
manufacturing process occurs in
controlled cleanroom environments
to mitigate impurities, static
electricity, vibrations, and
temperature variations, defects
can arise due to statistical
deviations in fabrication
materials, variations in process
parameters, airborne particles,
or inconsistencies in mask
features. Large-area defects, often from wafer mishandling or mask misalignment, can be readily addressed. Conversely, spot (small-area) defects are random in nature, and their expected count grows with die area, posing significant challenges for testing and quality assurance.
Manufacturing defects in
fabricated chips can manifest as
functional failures, like short-
circuits or open-circuits, or
altering circuit parameters such
as signal propagation delays.

[Figure: two copies of a wire between nodes A, B and GND, one with a short-circuit defect and one with an open-circuit defect.]

Additionally, distortions, like those from photolithography during manufacturing, are inevitable and result in distorted features on a die due to optical effects. These can often go undetected through regular testing, so special techniques are needed to rectify them and ensure proper chip functionality.
Inconsequential flaws may occur,
representing deviations from the
ideal circuit but without causing
any measurable change in
functionality, such as a slightly
smaller layout particle size.
These minor flaws aren't the
primary focus of testing, which
instead aims to identify defects
that could impact the circuit's
behavior or performance.
Quality of Process: Yield
Yield refers to the proportion of
functional dies on a wafer,
without manufacturing defects.
It's influenced by the complexity
and maturity of the manufacturing
process. For example, if 300 dies
are good out of 400 manufactured,
then the yield is:
Yield = (300 / 400) x 100 = 75%
Factors affecting Yield include:
- Die area: larger areas tend to
decrease yield.
- Defect density: average number
of defects per chip area unit.
- Clustering: how defects are
distributed across the chip area.
Yield: Dependency on Clustering
Clustered defects lie within a small region, while unclustered defects are distributed over a large area. For the same number of defects, clustering spoils fewer dies.

[Figure: two wafers of 21 dies with the same number of defects. Clustered defects leave 17 good dies: yield = (17/21) x 100 ≈ 80.95%. Unclustered defects leave only 15 good dies: yield = (15/21) x 100 ≈ 71.42%.]
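These dependencies are often captured with standard analytic yield models: the Poisson model for independently scattered defects, and the negative binomial model, whose clustering parameter alpha predicts higher yield for the same defect density, matching the wafer comparison above. The die area and defect density below are illustrative assumptions:

```python
import math

# Two standard yield models (illustrative values):
#  - Poisson: Y = exp(-A*D), defects spread independently.
#  - Negative binomial: Y = (1 + A*D/alpha)^(-alpha); a small
#    clustering parameter alpha models clustered defects.
def yield_poisson(area_cm2, density_per_cm2):
    return math.exp(-area_cm2 * density_per_cm2)

def yield_neg_binomial(area_cm2, density_per_cm2, alpha):
    return (1 + area_cm2 * density_per_cm2 / alpha) ** (-alpha)

A, D = 1.0, 0.5  # assumed: 1 cm^2 die, 0.5 defects per cm^2
print(round(yield_poisson(A, D), 3))          # 0.607
print(round(yield_neg_binomial(A, D, 2), 3))  # 0.64: clustering helps
```

Both models also show the die-area effect from the factor list above: increasing A drives yield down in either formula.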
Testing Technique
One of the testing paradigms uses Automatic Test Equipment (ATE): a system comprising hardware and software to examine die functionality and performance.
[Figure: the ATE applies test patterns to the fabricated die and compares the actual responses with the expected responses. If they match, the test passes (accept); otherwise it fails (reject and diagnose).]
Test patterns (test vectors) are
delivered to the die via probe
needles establishing contact with
test pads on the Design Under
Test (DUT). Electrical paths
formed by this contact allow the
application of test vectors, and
the resultant output values,
actual responses, obtained
through these needles, are
compared against expected values
to determine whether the test is
accepted or rejected, aiding in
diagnostic analysis.

[Figure: probe needles contacting test pads on the die.]
Test programs control this
testing process by loading test
patterns to the ATE and
performing the comparison task.
Test patterns are designed such that a good circuit and a defective one produce different output responses. A failed chip can be diagnosed to find the root cause of the problem, and corrective measures can be taken after diagnosis to reduce the defect's impact further down the fabrication flow.
Fault Coverage & Defect Level
Fault coverage (FC) evaluates the quality of testing by measuring the ability of a set of test patterns to detect a class of faults:

Fault Coverage = (# faults detectable) / (# faults possible)
Attaining 100% fault coverage is
quite challenging in practice.
Usually, aiming for over 99%
coverage is pursued. However, due
to the limitations of the test
patterns, there might be
instances where defective
products could reach end-users as
not all possible faults are
covered.
The quality of a chip is assessed through a metric known as the Defect Level (DL), which depends on both yield and fault coverage.
DL is measured in parts per
million (ppm) and represents the
ratio of defective chips among
those that have successfully
passed the tests. It’s also an
indicator of test effectiveness;
a perfectly effective test
results in a DL of 0.
Defect Level: Example

[Figure: 100 chips manufactured on a wafer at 90% yield give 90 good and 10 defective chips. With 60% fault coverage, 6 faulty chips are caught by the tests while 4 faulty chips pass. The 94 chips released to the end-user therefore include 4 defective chips identified as defect-free.]

DL = (4 / 94) x 10^6 ppm

94 chips are released to the end-user, including 4 faulty chips that escaped the tests. If the test were fully effective (FC = 100%), the DL would be zero. For commercial chips, DL < 500 ppm.
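The example's arithmetic can be traced step by step in a short sketch:

```python
# Defect-level arithmetic for the example: 100 dies, 90% yield,
# 60% fault coverage applied to the 10 defective dies.
manufactured = 100
good = 90                      # 90% yield
defective = manufactured - good
caught = int(0.6 * defective)  # 6 faulty dies rejected (60% FC)
escaped = defective - caught   # 4 faulty dies pass the tests
shipped = good + escaped       # 94 dies reach end-users
dl_ppm = escaped / shipped * 1e6
print(round(dl_ppm))  # 42553 ppm
```

Setting the coverage to 1.0 makes escaped = 0 and dl_ppm = 0, matching the "fully effective test" remark above.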
Post-GDS Processes
Fabrication
Once the GDS file is finalized,
detailing the layout of the
design after traversing the
complete design flow, it becomes
imperative to grasp fundamental
concepts regarding fabrication.
These concepts are pivotal in
overcoming challenges involved in
fabrication during design steps.

[Figure: the complete VLSI design flow from idea to chip: the Pre-RTL flow, the RTL (Module … EndModule), the RTL-to-GDS flow producing the GDS, and the Post-GDS processes including fabrication.]
Mask Fabrication
A mask functions as a replica of
patterns present on a specific
layer of the layout. It is
constructed on a substrate made
of materials like glass or
silica. These masks play a
critical role in transferring
geometric patterns onto the
silicon wafer during the
photolithography process.
Fabricating masks includes
several steps:
- Data Preparation: involves
translating layout-specified mask
information to a format
comprehended by a mask writing
tool, by converting and
fracturing complicated polygon
shapes into simpler rectangles
and trapeziums, enabling accurate
rendering on the silicon wafer.
- Mask Writing and Chemical
Processing: During this phase, a
layer of chromium and photoresist
is applied to a glass or quartz
substrate, forming what’s known
as a blank (1). The mask pattern
is then written onto this blank
by exposing it to a laser or an
electron beam (2). Following
exposure, and using chemical
solutions, the photoresist is
developed (3), the chromium layer
is etched (4), and the remaining
photoresist is removed (5).
[Figure: mask-writing steps (1)-(5): the blank (photoresist over chromium over the substrate), exposure to light, photoresist development, chromium etching, and photoresist removal, yielding the finished mask.]
- Quality Checks and Protection:
The mask surface is thoroughly
inspected for defects using
scanning methods, comparing the
surface to a reference image. If
a defect exceeds tolerance
levels, laser repair is employed
for corrections. Finally, a thin
polymer protective film called a
Pellicle is applied to shield the
mask from dust particles.
Resolution Enhancement Techniques
In photolithography, utilizing a
193nm wavelength for exposure can
lead to diffraction effects when
the mask patterns are smaller
than the wavelength of light.
This causes distortions in the
final features produced on the
silicon wafer. To counteract
this, Resolution Enhancement
Techniques (RET) are employed.
RET involves adding distortions
intentionally to the mask, so
that the pre-compensated mask
help offset distortions like
optical diffractions during
manufacturing, ensuring that the
final features on the silicon
align more closely with the
intended layout design. Examples
of RET are Optical Proximity
Correction (OPC) and
Double/Multi-Patterning.

[Figure: without RET, the mask is the same as in the GDS (desired layout), and manufacturing distortions yield distorted features on silicon. With RET, a pre-compensated mask offsets the distortions and the silicon features come out as desired.]
RET: Optical Proximity Correction

[Figure: a mask without OPC prints with line-end pullback and corner rounding; a mask with OPC adds serifs, hammerheads, and mousebites, so the actual shape better matches the desired shape.]

Diffraction causes line-end pullback and corner rounding. Modifying patterns with serifs, hammerheads, and mousebites compensates for diffraction in photolithography.
RET: Double/Multi-Patterning
The limited resolution of photolithography makes it challenging to print closely spaced features on a die without overlaps. One solution is to decompose these features into two or more layouts (assigning colors to features), then fabricate each with a separate mask; the decreased pattern density on each mask relaxes the exposure-resolution requirement.

[Figure: double patterning decomposes the layout into Mask 1 and Mask 2, each carrying only half of the closely spaced features.]

Wafer Fabrication and Die Testing
Fabrication is conducted layer by layer, proceeding sequentially through various steps like photolithography, oxidation, diffusion, and ion implantation.
Front End Of the Line (FEOL)
processes involve fabricating
circuit elements such as
resistors, capacitors, diodes,
transistors on lower layers.
Back End Of the Line (BEOL)
processes involve creating
interconnections using metallic
layers at the top of the wafer.
Following die fabrication,
incorporating all designed
features, comes the step of
testing where each die is tested
and compared with expected
patterns. Faulty dies are
discarded and not packaged.
Packaging
Dies are encapsulated within a
protective housing to form a
chip. This packaging provides
pins facilitating connections to
the external environment. The
characteristics of these pins
significantly affect signal delay
entering or leaving the chip,
demanding careful design
consideration. The package serves
to dissipate heat while
safeguarding against mechanical
damage and corrosion. A variety
of package types and materials
exist such as Dual Inline Package
(DIP) or Ball Grid Array (BGA).

[Figure: a Dual Inline Package (DIP) and a Ball Grid Array (BGA).]
Final Testing and Binning
The final testing stage ensures
that the packaging step hasn't
introduced any errors or faults.
Typically, after packaging, the
die undergoes testing to verify
if it's still functioning
correctly and producing the
required output. Additionally, a
burn-in test is conducted by
subjecting the chip to high
voltage and temperature.
The burn-in test aims to identify
any latent defects that might not
have been discovered during
manufacturing tests, preventing
potential issues (issues of
infant mortality) at the early
stages of the product's life
cycle, before reaching end-users,
(see the bathtub curve).
Furthermore, chips are
categorized through a process
called binning, sorting them
based on their performance
metrics. Chips are expected to
maintain consistent performance
after fabrication, but
manufacturing variations can
cause some chips to underperform.
Measurements are conducted using
On-chip measurement circuitry,
assigning performance-based price
points to different chips.
Finally, these chips can be sent
to market for sale or integrated
with other chips to form larger
systems to be marketed for sale.
To be continued …
