
Advanced Fusion Compiler Synthesis and P&R Technologies to Drive Performance and Turnaround Time

Keerthi Penmetsa
Synopsys

SNUG SILICON VALLEY 2023 1


Agenda
• Low-Power Technology
– Clock Gating Improvements
– Combinational Multibit
– Self-Gating Support in place_opt
– saif_map Support for commit_block
• Placement Technology
– Direct Congestion Driven Placement
– Advanced Density Control
• Optimization Technology
– IO Priority
– Endpoint-Based Bottleneck Optimization
– Targeted Endpoint-Based Optimization
– Improved Constant Propagation
– Log File Messages for Constant and Unloaded Registers
– Concurrent Legalization Optimization
– Wire-Opt Enhancements
– Improved Vt Handling
– Hold Closure Update
• Clock Tree Synthesis
– Multithreaded Global Skew Optimization
– Scenario Reduction in Area Recovery
– Next-Gen Regular Multisource Clock Tree Synthesis
• Reference Methodology
• Summary
Synopsys Digital Design Family
• System design: 3DIC Compiler
• Digital design (with Test Fusion): RTL Architect, Design Compiler NXT, TestMAX, Fusion Compiler, IC Compiler II
• Signoff (with Signoff Fusion): PrimeTime, PrimeShield, StarRC, PrimePower, IC Validator NXT, PrimeLib, Formality / Formality ECO, Tweaker ECO, RedHawk Analysis Fusion
• Post silicon: Monitor IP, SiliconDash, Yield Explorer
• Fusion #1
– Anchors: synthesis, P&R, signoff
– Architecture: fusion of algorithms, engines, data model
– Two fusion types: test and signoff
– Industry-unique Fusion Compiler
• Innovative products: Design Compiler NXT, TestMAX, IC Validator NXT, PrimeShield, PrimeClosure, RTL Architect
• Market leadership
– ML-enhanced tools, AI-driven apps
– Accelerating AI, automotive, and multi-die systems
– Cloud-ready
20% Better Quality-of-Results
2X Faster Time-to-Results
Fusion: Better, Faster, Predictable Results
Full-Flow Customer Designs: Across Design Styles, Across Processes
• Design area: 7% smaller on average
• Total power: 16% lower on average
• Active power: 15% lower on average
• Idle power: 30% lower on average
• Completion time: 2-3X faster
[Figure: per-design distributions of the area, power, and completion-time improvements listed above]


Synopsys Confidential Information
Key Technologies and Enhancements
[Figure: technologies mapped to flow stages (compile_fusion / place_opt, clock_opt build_clock, clock_opt final_opto, route_auto, route_opt / hyper_route_opt) and to benefit categories (Performance, Power, OOTB / Runtime). Technologies shown: Self-Gating Support in place_opt, Clock Gating Improvements, Combinational Multibit, TEP-Based Optimization, Improved Vt Handling, Wire-Opt, saif_map Support for commit_block, IO Priority, Endpoint-Based Bottleneck Optimization, Improved Constant Propagation, Auto Density Control, Multithreaded Clock Tree Optimization, Concurrent Legalization Optimization, Next-Gen Regular MSCTS, Hold Fixing Effort, Minimum Hold Fixing Threshold]
Low-Power Technology



T-2022.03-SP3

Clock Gating Improvements

Automatic Timing-Driven Clock-Gate Splitting
• Placement and timing driven
• Considers the trade-off between clock and enable timing
Automatic Timing-Driven Ungating
• Full ungating of critical clock gates
• Selective ungating of critical sinks
• Creates better skewing opportunities
[Figure: clock gate CG_A (clock CK, enable E) split into CG_A and CG_A_1 with a new clock latency; an un-gated register whose clock latency is updated]
T-2022.03-SP3

Clock Gating Improvements

Estimate Clock-Gate Latency Improvements
• Accurate latency computation
– Buffer or inverter selection
– Via resistance awareness
• Consistent CTS constraint handling
Self-Gating QoR Improvements
• Technique: area versus power trade-off
• Improved clustering for further power reduction
• Placement constraints for self-gated banks
[Figure: diminished power returns compared to area overhead]
T-2022.03-SP3

Self-Gating Support in place_opt

• Reduce dynamic power by turning off the clock signal when the register data remains unchanged
• A self-gating cell can be shared across a handful of registers or multibit banks
– A combined enable is created by implementing a comparator tree
– There is a trade-off between the number of banked registers and the quality of the enable
– The tool can also automatically choose between XOR, OR, or NAND for a comparator tree based on the static probability of each gated register
• Starting with version T-2022.03-SP3, the tool supports self-gating as part of place_opt
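The shared-enable idea above can be illustrated with a small behavioral sketch. It assumes the XOR flavor of comparator (one XOR per register, OR-reduced into a single enable) and independent per-register toggle probabilities; the tool's XOR/OR/NAND selection based on static probability is not modeled, and the function names are hypothetical.

```python
from functools import reduce
from math import prod
from operator import or_, xor


def self_gate_enable(d_bits, q_bits):
    """Enable for one shared self-gating cell: XOR each D against its Q, then OR-reduce.

    Returns 1 when at least one register in the bank would capture a new value,
    0 when every D equals its Q, so the clock to the whole bank can be switched off.
    """
    return reduce(or_, (xor(d, q) for d, q in zip(d_bits, q_bits)), 0)


def gated_fraction(toggle_probs):
    """Expected fraction of cycles in which the shared gate turns the clock off.

    Assumes each register's data changes independently with probability p per cycle.
    """
    return prod(1.0 - p for p in toggle_probs)


print(self_gate_enable([1, 0, 1], [1, 0, 1]))  # whole bank idle: clock can be gated
print(self_gate_enable([1, 0, 1], [1, 1, 1]))  # one bit changes: clock must run
print(gated_fraction([0.1] * 4), gated_fraction([0.1] * 8))
```

The second function shows the banking trade-off named on the slide: sharing one gate across more registers amortizes its cost, but the probability that every register is idle in the same cycle, and therefore the quality of the enable, drops as the bank grows.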
T-2022.03-SP3

Self-Gating Support in place_opt

• To ensure QoR improvements, the self-gating algorithm takes timing and power into consideration:
– A self-gating cell is inserted for candidate registers when there is enough timing slack available at the register's data pin and the internal dynamic power of the circuit is reduced
– Candidate registers are grouped smartly
• To enable self-gating inside place_opt, set the place_opt.flow.enable_self_gating application option to true:

set_app_options -name place_opt.flow.enable_self_gating -value true
place_opt -from initial_place -to initial_drc
place_opt -from initial_opto -to initial_opto

[Figure: insertion of self gates]
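The two insertion criteria can be expressed as a simple filter. This is a hypothetical model, not the tool's algorithm: the `Candidate` fields, the single comparator-delay number, and the power estimates are illustrative assumptions. The timing check reflects that the D signal now also feeds the comparator that drives the gate enable, so (as a simplification) its slack must absorb the comparator delay.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical per-register data a self-gating pass might examine."""
    name: str
    d_slack_ns: float            # setup slack at the register's data (D) pin
    clock_power_saved_uw: float  # estimated clock power saved while the gate is off
    gating_power_uw: float       # estimated power added by the comparator and gate


def should_self_gate(c: Candidate, comparator_delay_ns: float) -> bool:
    # Timing criterion: enough D-pin slack to also drive the comparator tree.
    timing_ok = c.d_slack_ns >= comparator_delay_ns
    # Power criterion: gate only if net internal dynamic power goes down.
    power_ok = c.clock_power_saved_uw > c.gating_power_uw
    return timing_ok and power_ok


cands = [
    Candidate("r_cfg", 0.30, 12.0, 4.0),   # slack and power both favorable
    Candidate("r_crit", 0.02, 15.0, 4.0),  # too little slack
    Candidate("r_busy", 0.40, 3.0, 4.0),   # toggles often, so gating costs power
]
print([c.name for c in cands if should_self_gate(c, comparator_delay_ns=0.08)])
```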
T-2022.03-SP3

Combinational Multibit Support


• Combinational multibit cells are library cells in which a single cell implements multiple logic functions
• Fusion Compiler can
– Bank single-bit combinational cells into multibit combinational cells
– Debank cells as needed during the delay, area, or power recovery steps, via restructuring, to improve timing and power
• Provides area or power savings without degrading timing QoR
[Figure: combinational multibit library cells with unshared inputs, with shared inputs, and NAND/NOR (multioutput); a 2-bit mux with non-inverted/inverted outputs and inputs D00, D01, D10, D11]


T-2022.03-SP3

Combinational Multibit Support


Flow
• To enable combinational multibit mapping, use the following:

set_app_options -list {
    opt.common.enable_combinational_multibit true
}

– Enables combinational multibit mapping during the initial_opto and final_opto stages
• compile_fusion stages: initial_map, logic_opto, initial_place, initial_drc, initial_opto, final_place, final_opto; followed by clock_opt
• At initial_opto and final_opto:
– Placement-aware banking based on physical proximity
– Placement-aware debanking (via restructuring)
– Cells banked: all combinational cells, with and without shared inputs
• At clock_opt:
– Placement-aware debanking (via restructuring)
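Placement-aware banking based on physical proximity can be sketched as a greedy pairing problem. This is an illustrative assumption, not Fusion Compiler's algorithm: it only pairs cells with the same logic function into 2-bit banks, closest pairs first, within a distance limit, and ignores timing, which the real flow also weighs (and may later undo by debanking).

```python
from math import dist


def bank_by_proximity(cells, max_dist):
    """Greedily pair same-function single-bit cells into 2-bit banks by placement distance.

    cells: list of dicts with "name", "func" (logic function), and "xy" placement.
    Returns (banks, unbanked), where banks is a list of name pairs.
    """
    candidates = sorted(
        (dist(a["xy"], b["xy"]), a["name"], b["name"])
        for i, a in enumerate(cells)
        for b in cells[i + 1:]
        if a["func"] == b["func"] and dist(a["xy"], b["xy"]) <= max_dist
    )
    banks, used = [], set()
    for _, a, b in candidates:  # closest pairs first
        if a not in used and b not in used:
            banks.append((a, b))
            used.update((a, b))
    unbanked = [c["name"] for c in cells if c["name"] not in used]
    return banks, unbanked


cells = [
    {"name": "u1", "func": "NAND2", "xy": (0.0, 0.0)},
    {"name": "u2", "func": "NAND2", "xy": (1.0, 0.0)},
    {"name": "u3", "func": "NAND2", "xy": (10.0, 0.0)},  # too far from the others
    {"name": "u4", "func": "MUX2", "xy": (0.0, 1.0)},
    {"name": "u5", "func": "MUX2", "xy": (0.0, 2.0)},
]
print(bank_by_proximity(cells, max_dist=2.0))
```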


T-2022.03-SP3

saif_map Support for commit_block


Eases Writing PrimePower Mapping Files in the Hierarchical Flow
• Loading and implementing full-chip design data is resource intensive in terms of memory and turnaround time
– Many users therefore resort to the hierarchical flow
• The saif_map command tracks all name changes before a block is committed
– In previous releases, the saif_map database was not pushed down when you executed the commit_block command
• In version T-2022.03-SP3, the saif_map database is transferred to the new block when you execute the commit_block command
– You can now use the RTL saif_map database in a committed block that underwent name changes and write a PrimePower mapping file
[Figure: flow: Setup Libraries, Read and Elaborate RTL, saif_map -start, Apply Timing Constraints and Load Power Intent, Ungrouping or Grouping, Netlist Changes, commit_block]
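The idea of tracking name changes and then handing them down to a committed block can be sketched with a plain dictionary. This is a hypothetical model for illustration only; it does not reflect the saif_map database or the PrimePower mapping file format, and entries whose renamed object leaves the block's hierarchy are simply dropped.

```python
def start_map(rtl_names):
    """Begin tracking: every RTL name initially maps to itself."""
    return {n: n for n in rtl_names}


def record_rename(name_map, old, new):
    """Apply one netlist rename, composing chains so each RTL name points at its latest name."""
    for rtl, current in name_map.items():
        if current == old:
            name_map[rtl] = new


def push_down(name_map, block_inst):
    """Hand the map to a committed block: keep entries under the block, made block-relative."""
    prefix = block_inst + "/"
    return {
        rtl[len(prefix):]: cur[len(prefix):]
        for rtl, cur in name_map.items()
        if rtl.startswith(prefix) and cur.startswith(prefix)
    }


m = start_map(["u_core/a_reg", "u_core/b_reg", "c_reg"])
record_rename(m, "u_core/a_reg", "u_core/a_reg_0_")
record_rename(m, "u_core/a_reg_0_", "u_core/a_reg_0__mb")  # a second, chained rename
print(push_down(m, "u_core"))
```

Composing renames as they happen means the committed block receives one direct RTL-to-current mapping per object instead of a chain of intermediate names.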


Placement Technology



T-2022.03-SP3

Core Placer Innovations

Direct Congestion Driven Placement
• Congestion optimization dynamically during placement
• Improved corner congestion and routability
• Result: improved routability
Advanced Density Control
• Density flattening
• Footprint expansion
• Single-sided density cost
• Result: better QoR
[Figure: density hotspots impact routability]


U-2022.12

Direct Congestion Driven Placement (DCDP)

• Overview
– Current congestion-driven placement uses a cell expansion technique to spread cells in congested areas, so placement optimizes congestion only indirectly
– This feature adds a new metric to the placer that measures the number of wires in a region, referred to as Net Density. The placer then sets limits on Net Density, which leads it to spread nets and their connected cells
• Benefits
– DCDP focuses on improving corner congestion and routability
– No need for custom placement blockages (hard, soft, or partial) at congested corners to improve routability
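A net-density metric of the kind described above can be approximated by counting, for each placement bin, how many net bounding boxes overlap it. The bounding-box model, bin grid, and limit check below are assumptions for illustration; the slide does not define how the placer computes Net Density.

```python
def net_density(nets, cols, rows, bin_size):
    """Count how many net bounding boxes overlap each placement bin.

    nets: list of pin-coordinate lists [(x, y), ...]; returns a rows x cols grid of counts.
    """
    grid = [[0] * cols for _ in range(rows)]
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        c0, c1 = int(min(xs) // bin_size), int(max(xs) // bin_size)
        r0, r1 = int(min(ys) // bin_size), int(max(ys) // bin_size)
        for r in range(max(r0, 0), min(r1, rows - 1) + 1):
            for c in range(max(c0, 0), min(c1, cols - 1) + 1):
                grid[r][c] += 1
    return grid


def bins_over_limit(grid, limit):
    """Bins whose net density exceeds the limit: candidates for spreading nets and cells."""
    return [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v > limit]


nets = [[(0, 0), (3, 0)], [(1, 0), (1, 3)], [(3, 3), (3, 3)]]
grid = net_density(nets, cols=2, rows=2, bin_size=2)
print(grid, bins_over_limit(grid, limit=1))
```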


T-2022.03-SP3

Advanced Density Control

• Background and overview
– The tool supports an Auto Density Control feature that controls cell density for both good timing and good congestion, but it still needs manual adjustment of the density settings for better QoR
– This new density handling feature achieves better QoR without manual tuning
• Benefits
– Focuses on mitigating density hotspots, controlling oscillations between the spreading and clumping density objectives, and more targeted congestion expansion
– Density flattening reduces local density hotspots by targeting them dynamically
– Footprint expansion increases the accuracy of cell expansion by accounting for anticipated changes in local density
– Single-sided density cost avoids oscillations of cells spreading and clumping over the placement steps in the flow
[Figure: the baseline shows density hotspots, which cause routability issues]
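The slide does not give the density cost function. As a generic illustration, assuming a quadratic penalty, the sketch below contrasts a two-sided cost, which also penalizes under-filled bins and so pulls cells back into them (clumping), with a single-sided cost that penalizes only overfilled bins.

```python
def two_sided_cost(densities, target):
    """Penalize deviation in both directions: sparse bins also pull cells back in."""
    return sum((d - target) ** 2 for d in densities)


def single_sided_cost(densities, target):
    """Penalize only overfilled bins, so under-filled bins exert no clumping pull."""
    return sum(max(0.0, d - target) ** 2 for d in densities)


bins = [0.9, 0.5, 0.2]  # per-bin cell density (fraction of bin area)
print(two_sided_cost(bins, 0.7), single_sided_cost(bins, 0.7))
```

With the two-sided form, the 0.5 and 0.2 bins contribute most of the cost, so an optimizer is rewarded for moving cells into them even where that recreates a hotspot; with the single-sided form, only the 0.9 bin matters, which is one way a placer can avoid alternating between spreading and clumping.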
Optimization Technology



T-2022.03-SP3

IO Priority
Consistent de-prioritization of I/O through the RTL2GDS flow

• Many users manually deprioritize I/O paths
– To avoid burning area and power on I/O
– To avoid degrading register-to-register (R2R) TNS in order to improve I/O TNS
• Goal: improve out-of-the-box results by handling I/O consistently through the whole flow
– All engines are impacted: Optimization, Placer, Router, DRC, CUS, CCD, CTS, …
– Works in Fusion Compiler and IC Compiler II throughout the RTL2GDS flow
• Benefits
– Consistent handling of I/O paths through the flow
– Script simplification: no need for custom I/O path groups and weights to drive the QoR trajectory
– Transparent to users
T-2022.03-SP4

IO Priority
User Interface

• In version T-2022.03-SP4, use the new flow.common.io_priority application option:

set_app_options -name flow.common.io_priority -value <value>

high (default): I/O paths and R2R paths have equal priority (existing behavior)
medium: R2R paths are prioritized over I/O paths by the different engines

• Requirements
– time.enable_io_path_groups must be true (the default)


T-2022.03-SP3

Endpoint-Based Bottleneck Optimization

• Bottleneck Driver Selector
– Focuses optimization on pins with high endpoint impact
– Improves timing with less area and power penalty
• Sensitivity-Based Global Costing
– Accounts for endpoint impact during local costing
– Improves local-global timing correlation and PPA
• This feature is only available in compile_fusion / place_opt
[Figure: Bottleneck Driver Selector and Sensitivity-Based Global Costing]
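"Endpoint impact" can be approximated by counting the violating endpoints in each node's fanout cone; nodes shared by many failing paths are natural bottleneck candidates. This is a textbook-style sketch under that assumption, with hypothetical node names; the slide does not describe how the Bottleneck Driver Selector scores pins.

```python
def endpoint_impact(fanout, endpoint_slack):
    """Number of violating endpoints in each node's fanout cone (timing graph assumed acyclic)."""
    memo = {}

    def cone(node):
        if node not in memo:
            hits = {node} if endpoint_slack.get(node, 0.0) < 0 else set()
            for succ in fanout.get(node, ()):
                hits |= cone(succ)
            memo[node] = frozenset(hits)
        return memo[node]

    nodes = set(fanout) | {s for succs in fanout.values() for s in succs}
    return {n: len(cone(n)) for n in nodes}


def bottleneck_drivers(fanout, endpoint_slack, top_k):
    """Highest-impact nodes first; ties broken by name for a stable order."""
    impact = endpoint_impact(fanout, endpoint_slack)
    return sorted(impact, key=lambda n: (-impact[n], n))[:top_k]


fanout = {"u1": ["u2", "u3"], "u2": ["ff1/D", "ff2/D"], "u3": ["ff3/D"], "u4": ["ff3/D"]}
endpoint_slack = {"ff1/D": -0.12, "ff2/D": -0.05, "ff3/D": 0.04}
print(bottleneck_drivers(fanout, endpoint_slack, top_k=2))
```

Fixing `u2` (or `u1`) can help two failing endpoints at once, whereas `u3` and `u4` only feed a passing endpoint, so spending area or power on them buys nothing.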


T-2022.03-SP3

Targeted Endpoint (TEP)-Based Optimization


• Enable TEP-based optimization for further convergence on a specific metric (setup, hold, or ldrc)
– Available after clock_opt final_opto, route_opt, and hyper_route_opt
– The new -auto option directs the engine to work on all violating endpoints without requiring you to create a collection of endpoints manually; its value can be setup, hold, or ldrc

clock_opt final_opto
Use the set_clock_opt_target_endpoints command to specify the exact list of objects, or use the -auto option for all the violations of a specific metric:

set_clock_opt_target_endpoints -setup_endpoints_collection $setup_collection
clock_opt -from final_opto
set_clock_opt_target_endpoints -hold_endpoints_collection $hold_collection
clock_opt -from final_opto
…
OR
set_clock_opt_target_endpoints -auto setup
clock_opt -from final_opto
set_clock_opt_target_endpoints -auto hold
clock_opt -from final_opto

route_opt / hyper_route_opt
Use the set_route_opt_target_endpoints command to specify the exact list of objects, or use the -auto option for all the violations of a specific metric:

set_route_opt_target_endpoints -setup_endpoints_collection $setup_collection
route_opt
set_route_opt_target_endpoints -hold_endpoints_collection $hold_collection
route_opt
…
OR
set_route_opt_target_endpoints -auto setup
route_opt
set_route_opt_target_endpoints -auto hold
route_opt
T-2022.03-SP3

Improved Constant Propagation


Early Detection Improves QoR and Runtime

• In previous versions, detection of constants for propagation depended on combinational optimization steps such as redundancy removal
• In version T-2022.03-SP3, improved constant propagation is on by default
• Constant detection occurs early in the flow
– Provides less dependency on other optimization steps
– Helps find constants that might previously have been missed
• Constant detection and register merging infrastructure are combined
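Early constant detection is, at its core, forward constant propagation through the netlist. The sketch below is a standard textbook version over a tiny hypothetical gate list in topological order, not the tool's engine; it flags registers whose D input becomes constant as candidates for later removal, ignoring the reset and initial-value checks a real tool must also make.

```python
def propagate_constants(gates, ties):
    """Forward constant propagation over gates listed in topological order.

    gates: (output_net, op, input_nets) with op in {"AND", "OR", "NOT", "BUF"}.
    ties: nets known to be constant, for example {"tie0": 0}.
    """
    known = dict(ties)
    for out, op, ins in gates:
        vals = [known.get(n) for n in ins]  # None means "not known to be constant"
        if op == "AND":
            if 0 in vals:
                known[out] = 0  # a controlling 0 decides the output
            elif all(v == 1 for v in vals):
                known[out] = 1
        elif op == "OR":
            if 1 in vals:
                known[out] = 1  # a controlling 1 decides the output
            elif all(v == 0 for v in vals):
                known[out] = 0
        elif op == "NOT" and vals[0] is not None:
            known[out] = 1 - vals[0]
        elif op == "BUF" and vals[0] is not None:
            known[out] = vals[0]
    return known


def constant_registers(reg_d_nets, known):
    """Registers whose D input is constant: candidates for later constant-register removal."""
    return {reg: known[d] for reg, d in reg_d_nets.items() if d in known}


gates = [
    ("n1", "AND", ["a", "tie0"]),
    ("n2", "OR", ["n1", "b"]),
    ("n3", "NOT", ["n1"]),
    ("n4", "OR", ["n3", "c"]),
]
known = propagate_constants(gates, {"tie0": 0})
print(constant_registers({"r1": "n4", "r2": "n2"}, known))
```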


T-2022.03-SP3

Log File Messages for Constant and Unloaded Registers
• Optimization techniques such as redundancy removal, which optimize the logic around registers, can make registers constant or unloaded
• In version T-2022.03-SP3, the redundancy removal step prints a log message when a register becomes constant or unloaded
• To enable the feature, set the compile.flow.print_messages_for_redundant_registers application option to true
Log file message:
Info: Register pin A/B_reg/D has been identified as constant by the redundancy removal engine and may be removed as constant register in later optimization steps.' (CGRR-0001)
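An "unloaded" register is one whose output can no longer reach any observable point. A minimal way to find such registers, assuming a simple graph where register nodes connect to the gates they drive, is a backward reachability walk from the primary outputs. The netlist and names below are hypothetical, and this is not how the redundancy removal engine is implemented.

```python
def unloaded_registers(fanout, registers, outputs):
    """Registers whose output cannot reach any primary output, even through other registers."""
    # Build reverse edges, then walk backward from the outputs; anything not reached is unloaded.
    fanin = {}
    for src, dsts in fanout.items():
        for dst in dsts:
            fanin.setdefault(dst, []).append(src)
    live, stack = set(outputs), list(outputs)
    while stack:
        node = stack.pop()
        for src in fanin.get(node, ()):
            if src not in live:
                live.add(src)
                stack.append(src)
    return sorted(r for r in registers if r not in live)


fanout = {
    "r1": ["g1"], "g1": ["out"],  # r1 reaches a primary output
    "r2": ["g2"], "g2": [],       # g2 drives nothing after optimization
    "r3": ["r4"], "r4": ["g3"],   # r3 only feeds r4, which is itself unloaded
}
print(unloaded_registers(fanout, ["r1", "r2", "r3", "r4"], outputs=["out"]))
```

Walking backward from the outputs catches chains of dead registers (here `r3` feeding `r4`) in one pass, which a check that only looks for a register with zero fanout would miss.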


T-2022.03-SP3

Convergence - CLO Improvements


Concurrent Legalization Optimization
• Integrated pin access optimization in CLO
– Cells are placed at legal locations
– Cell placement is optimized for routability
• Improved handling through Always Legal CLO
– An extra legalize placement call is not required after optimization
– Spacing rules
– Cross-row VT rules
– NDR-related PG-DRC rules
[Figure: CLO in compile_fusion / clock_opt / route_opt; the CLO optimization engine uses ALMap to honor spacing, cross-row VT, and NDR rules, compared with batch legalization]


T-2022.03-SP3

Wire-Opt Enhancements
Better timing (R2R TNS) at end-of-flow

• Leverage new abilities to:
– Auto-apply non-default routing rules (NDRs) on tool-selected nets
– Prioritize such nets internally throughout optimization and routing
– Let the router auto-derive design- and process-specific NDR options
– Optimize via ladder insertion through the updated engine
– Use a new VL flow that changes when Pattern Must Join (PMJ), Electromigration Via Ladders (EMVL), and performance Via Ladders (VL) are inserted
– Update the cover rate incrementally during extraction, taking this change into account
• In version T-2022.03, use the new enable_wireopt_improvements command in Fusion Compiler / IC Compiler II
– Automates the above enhancements with the -mode vlo or -mode andr argument
– Enables both with the -mode all argument
– Stays enabled across stages through route_opt / hyper_route_opt


T-2022.03-SP3

Improved VT Handling During Optimization


Overview

• In version T-2022.03-SP3, engine-level enhancements improve VT usage in the flow
– Restricts the use of leaky cells early in the flow through dynamic VT clustering, along with optimization improvements
– Limits lower-VT cell usage to timing-critical paths
– Helps leakage-power-sensitive designs; the gain depends on the power versus timing trade-off
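The principle of limiting low-VT cells to timing-critical paths can be illustrated with a per-cell selection rule: use the least leaky VT flavor that still leaves the path with non-negative slack. The VT table, delays, and leakage numbers below are hypothetical, and a single-cell greedy choice ignores the path sharing and clustering the tool actually accounts for.

```python
# Hypothetical VT flavors for one cell: (name, delay_ns, leakage_nw), least leaky first.
VT_OPTIONS = [("HVT", 0.050, 1.0), ("SVT", 0.040, 3.0), ("LVT", 0.032, 10.0)]


def pick_vt(path_slack_ns, current_delay_ns):
    """Pick the least leaky VT whose delay still leaves the path with non-negative slack."""
    for name, delay_ns, _leakage in VT_OPTIONS:
        if path_slack_ns + current_delay_ns - delay_ns >= 0:
            return name
    return VT_OPTIONS[-1][0]  # nothing closes timing: fall back to the fastest flavor


for slack in (0.020, 0.0, -0.005, -0.100):
    print(slack, pick_vt(slack, current_delay_ns=0.040))
```

Only the cells on violating paths end up as LVT; cells with positive slack move to a higher VT, which is where the leakage saving comes from.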


T-2022.03-SP3

Hold Fixing Effort


Controls hold fixing effort for the runtime versus QoR trade-off
• Balances hold fixing runtime versus QoR when there are large incoming hold violations
• To enable this feature, set the following before clock_opt, route_opt, and hyper_route_opt:

set_app_options -name opt.common.hold_effort -value <value>

high (default): targets normal designs with reasonable hold QoR
medium: targets designs with high local density, where it is challenging to insert hold buffers
low: for early or dirty designs with big hold violations

• Setup TNS, power, and routability are comparable or improved when low or medium hold effort is used
• "low" effort is applied automatically when set_qor_strategy -mode early_design is used


CTS



T-2022.03-SP3

Multithreaded Global Skew Optimization

• Starting with version T-2022.03, a new clock tree optimization (CTO) engine is built on top of the new infrastructure to perform multiple optimizations in parallel
– Global routing (GR) and timing evaluations for different problems are multithreaded
– Scenario reduction for clock optimization further improves runtime
• This technology is being implemented in the following CTO steps:
– Initial DRC fixing
– Global skew and latency optimization
– Area recovery
– Final DRC fixing
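The pattern of evaluating independent optimization problems concurrently can be sketched with Python's `concurrent.futures`. The sub-problem structure and candidate fields are hypothetical. Note that in CPython, threads do not speed up pure-Python CPU-bound work because of the global interpreter lock; the sketch only shows the decomposition into independent, order-preserving evaluations, not the tool's native multithreading.

```python
from concurrent.futures import ThreadPoolExecutor


def best_candidate(problem):
    """Score one independent sub-problem (for example, one clock subtree) and return its best option."""
    name, candidates = problem
    best = min(candidates, key=lambda c: (c["skew_ps"], c["area"]))  # lowest skew, then least area
    return name, best["id"]


def solve_all(problems, workers=4):
    # Independent sub-problems are evaluated concurrently; map() preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(best_candidate, problems))


problems = [
    ("clkA_sub0", [{"id": "buf_x4", "skew_ps": 12, "area": 4.0},
                   {"id": "buf_x8", "skew_ps": 8, "area": 8.0}]),
    ("clkB_sub0", [{"id": "inv_x2", "skew_ps": 5, "area": 2.0},
                   {"id": "inv_x4", "skew_ps": 5, "area": 3.0}]),
]
print(solve_all(problems))
```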
T-2022.03-SP3

Scenario Reduction in Area Recovery

• In version T-2022.03, scenario reduction was introduced, on by default, during the skew optimization stage of clock_opt build_clock
• Starting with version T-2022.03-SP3, scenario reduction is also on by default in the area recovery step of clock_opt build_clock
• With scenario reduction in both the skew optimization and area recovery steps, runtime is expected to improve or stay neutral, depending on the number of corners and clocks
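The slides do not say how scenarios are chosen. A common generic heuristic, shown here purely as an assumption, is dominance-based reduction: keep only the scenarios that own the worst value of some metric (here, worst slack per clock) and skip the rest during the expensive optimization step. Scenario and clock names are hypothetical.

```python
def reduce_scenarios(slack_by_scenario):
    """Keep each scenario that owns the worst slack for at least one clock.

    slack_by_scenario: {scenario: {clock: worst_slack_ps}}; lower slack is worse.
    """
    clocks = {clk for per_clock in slack_by_scenario.values() for clk in per_clock}
    keep = set()
    for clk in clocks:
        owners = [s for s in slack_by_scenario if clk in slack_by_scenario[s]]
        keep.add(min(owners, key=lambda s: (slack_by_scenario[s][clk], s)))
    return sorted(keep)


slacks = {
    "ss_cmax": {"clkA": -20, "clkB": 5},
    "ff_cmin": {"clkA": 10, "clkB": -3},
    "tt_typ": {"clkA": 0, "clkB": 2},  # never the worst, so it can be skipped
}
print(reduce_scenarios(slacks))
```

This also illustrates why the runtime benefit depends on the number of corners and clocks: with many clocks, more scenarios tend to be "worst" somewhere and fewer can be dropped.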


T-2022.03-SP3

Next Generation: H-Tree-Based Multisource Clock Trees
Fully Automated H-Tree with Improved Routing and Latency-Driven Tap Assignment

Automated Flexible H-Tree Synthesis
• Automatic derivation of tap drivers and configuration
• Single-pass regular MSCTS setup
• Latency-aware tap insertion
• Minimum clock insertion delay
• No user-driven exploration flows

Enhanced Pin Routing
• Targeted guidance for pin connection to Zroute
• Improved handshaking between the custom router and detail route
• Improved pin connection
• Improved physical DRC convergence during pin connection

Latency-Driven Tap Assignment
• Latency-aware tap assignment
• Latency and wire-length awareness for improved sink distribution
• Reduced wire length for similar or better latency

Results: faster turnaround time and QoR; better clock latency and clock wire length
[Figure: flow from placed block and CTS setup through H-tree setup, single-pass H-tree synthesis, tap assignment, and clock tree synthesis; H-tree trunk (GCR) with sub-optimal trunk-to-pin connections versus guided, improved pin connections; tap drivers feeding sinks]
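Two ideas from this slide can be sketched geometrically: an ideal H-tree, where each level splits every node into the four corners of an "H" at half the previous size, and latency-aware tap assignment, where a sink goes to the tap with the lowest estimated arrival time rather than simply the nearest tap. The linear wire-delay model (picoseconds per unit of Manhattan distance) and all numbers are assumptions for illustration; real unbuffered wire delay grows faster than linearly with length.

```python
def h_tree_taps(cx, cy, half_w, half_h, levels):
    """Tap points of an ideal H-tree: each level splits every node into the four corners of an 'H'."""
    points = [(cx, cy)]
    for _ in range(levels):
        points = [(x + dx, y + dy) for x, y in points
                  for dx in (-half_w, half_w) for dy in (-half_h, half_h)]
        half_w, half_h = half_w / 2, half_h / 2
    return points


def assign_sinks(sinks, taps, tap_latency_ps, ps_per_unit):
    """Assign each sink to the tap minimizing tap latency plus (assumed linear) wire delay."""
    assignment = {}
    for name, (sx, sy) in sinks.items():
        assignment[name] = min(
            range(len(taps)),
            key=lambda i: (tap_latency_ps[i]
                           + ps_per_unit * (abs(taps[i][0] - sx) + abs(taps[i][1] - sy)), i),
        )
    return assignment


print(h_tree_taps(0.0, 0.0, 2.0, 2.0, levels=1))
# s1 is closer to tap 0, but tap 1's lower latency still gives it the earlier arrival.
print(assign_sinks({"s1": (4, 0), "s2": (0, 0)}, [(0, 0), (10, 0)], [100, 60], ps_per_unit=5))
```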


Reference Methodology



General Recommendations
• Use the latest tool versions
– Take advantage of the latest tool enhancements and improvements
– Default tool results are continuously improved
• Use the Fusion Compiler Reference Methodology (RM)
– From R-2020.09-SP3: Fusion Compiler GUI "Help" -> "Generate RM scripts"
– Optionally, you can download the Fusion Compiler RM scripts from SolvNet: https://solvnet.synopsys.com/rmgen
• Apply mega switches for better results out of the box
– Technology mega switch: set_technology -node <name>
– Reference methodology mega switch: set_qor_strategy, set_stage
– ARM core mega switch: set_hpc_options -core <name>
– Runtime mega switch: enable_runtime_improvements


General Recommendation
• In the Fusion Compiler RM, use the mega switches to quickly configure your design
• The set_technology mega switch is used first to ensure all node-specific recommended settings are applied
• The RM 2.0 flow uses the set_qor_strategy command to apply the best tool settings learned from customer engagements
• The set_qor_strategy mega switch will be updated with new content as new features and techniques become available to help push your design metrics
• The set_stage command is run after the set_qor_strategy command to apply step-dependent settings and RM special features

Fusion Compiler-RM Flow
init_design
set_technology -node $node
(technology-specific side files)
set_qor_strategy -stage synthesis -metric timing/total_power/leakage_power
set_stage -step synthesis
compile_fusion -to logic_opto
insert_dft
compile_fusion -to initial_opto
set_stage -step compile_place
compile_fusion -from final_place
set_qor_strategy -stage pnr -metric timing/total_power/leakage_power
set_stage -step cts
clock_opt -from build_clock -to route_clock
set_stage -step post_cts_opto
clock_opt -from final_opto
set_stage -step route
route_auto
set_stage -step post_route
route_opt
endpoint_opt


Key Technologies and Enhancements
[Figure: summary map of the technologies presented, by flow stage (compile_fusion / place_opt, clock_opt build_clock, clock_opt final_opto, route_auto, route_opt / hyper_route_opt) and by benefit category (Performance, Power, OOTB / Runtime)]
THANK YOU
YOUR INNOVATION, YOUR COMMUNITY
