arc22-ai-creating_optimized_ai_soc_architecture-virtual-prototyping-mojin-kottarathil

The document discusses the creation of optimized AI System-on-Chip (SoC) architectures using virtual prototyping, highlighting advancements in embedded AI applications and the challenges faced in design and verification. It outlines the use of Synopsys Virtual Prototyping for early architecture analysis and optimization, along with a case study of an AI SoC platform utilizing ARC Processor IP. The presentation concludes with insights on how to get started with faster development of AI SoCs using Synopsys tools and services.

Creating Optimized AI SoC Architecture

Using Virtual Prototyping

Mojin Kottarathil, Staff Applications Engineer


Synopsys ARC® Processor Summit 2022
Agenda

• Recent advancements in embedded AI applications and architectures


• Challenges in the design and verification of AI SoCs
• Synopsys Virtual Prototyping for early architecture analysis and optimization
• AI SoC platform case-study with ARC Processor IP
• How to get started

Processor Summit © 2022 Synopsys, Inc. 2


AI SoCs: A New Golden Age for Computer Architecture
• Applications becoming smart
– autonomous vehicles, smart IoT, robots, etc.
– AI moving to the client for better cost, latency, reliability
• Neural Networks are getting bigger
– More accurate results, higher image size, complex NLP models
• Software is often the hardest part
– Need optimizing compilers to map applications to custom chips
– ResNet-50 is easy, real workloads are hard
• Moore’s Law winds down - Domain-Specific Architectures gain
– Custom accelerators/data-paths/instructions, SIMD
– Many startups, semiconductors, super-scalers build AI SoCs
[Figure: stack of AI enabled Applications, Neural Network, Neural Network Compiler, AI SoC]


AI SoC Design Challenges
Brute-force Processing of Huge Data Sets
• Choosing the right algorithm and architecture: CPU, vector DSP, ASIP, DNN accelerator
– DNN graphs are evolving fast, need short time to market and cannot optimize for one single graph
– Joint design of AI algorithm, compiler and SoC architecture
– Joint optimization of power, performance, accuracy, and cost

• Highly parallel compute drives memory requirements


– E.g. in computer vision: higher resolution, higher frame-rate, more cameras
– High on-chip and chip to chip bandwidth at low latency
– High memory bandwidth requirements for parameters and layer to layer communication

• Power & Performance analysis requires realistic workloads to consider dynamic effects
– Scheduling of AI operators on parallel processing elements
– Unpredictable interconnect and memory access latencies
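
The memory-requirement bullets above (higher resolution, higher frame-rate, more cameras) can be made concrete with a little arithmetic. The sketch below computes only the raw, uncompressed input bandwidth of a camera pipeline; the function name and every resolution, pixel size, frame-rate and camera count in it are illustrative assumptions, not figures from the presentation. Parameter and layer-to-layer traffic would come on top of this.

```python
def camera_bandwidth_bytes_per_s(width, height, bytes_per_pixel, fps, cameras):
    """Raw (uncompressed) input bandwidth of a multi-camera vision pipeline."""
    return width * height * bytes_per_pixel * fps * cameras


# Hypothetical configurations, chosen only for illustration.
hd = camera_bandwidth_bytes_per_s(1920, 1080, 3, 30, 1)
uhd_multi = camera_bandwidth_bytes_per_s(3840, 2160, 3, 60, 4)

print(f"1080p30, 1 camera : {hd / 1e6:.1f} MB/s")
print(f"2160p60, 4 cameras: {uhd_multi / 1e6:.1f} MB/s")
print(f"scaling factor    : {uhd_multi / hd:.0f}x")
```

Each of the three factors multiplies the others, which is why input bandwidth grows much faster than any single one of them.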

Large Design Space Drives Differentiation by AI Algorithm & Architecture


Shift Left Architecture Analysis of AI SoCs
[Figure labels: Analytical Performance Model, Architecture spec]

Model-based Architecture Simulation
• APM-based Workload Model
– Partitioning and exploration
– Interconnect/memory analysis
• Fast Performance Model
– HW/SW co-optimization
– Performance/power analysis

RTL-based HW/SW co-verification
• RTL Emulation
– HW/SW co-verification
– Power characterization
• RTL Prototyping
– HW/SW co-verification
– KPI validation


Use-cases for Architecture Analysis with Virtual Prototyping

Early architecture partitioning, exploration and optimization with workload models, calibrated from APM
• KPI capture and sensitivity analysis
• Traffic and application workload modeling
• HW/SW partitioning, architecture specification
• Power/performance analysis

Performance validation and optimization with Software
• KPI tracking and validation
• IP selection and benchmarking
• SoC performance validation
• L1/L2 cache & cache coherency optimization

[Figure: a Workload model is mapped to and executed on a Hardware resource model; Application Software executes on a Near cycle accurate Hardware model]


Platform Architect Power and Performance Analysis Flow
[Figure: flow diagram with the labels Application specification, Software and traces, Application workload, map, Hardware platform, Model libraries, Power models and characterization, Workload trace and statistics, Root-cause analysis, Sensitivity analysis, Parallel parameter sweep, Design space exploration]


Platform Architect Based Workload Modelling
• Analytic Performance Model (APM)
– Used internally by Synopsys NPX System Architecture Team
• Workload Model generated from APM
– Calibrated tasks for in-DMA, out-DMA, and processing
• SoC Platform Model
– Accurate SystemC Transaction Level Models (TLM) of processing elements, interconnect and memory
• Map workload to NPX6 VPU (Virtual Processing Unit) model
– Processing VPU has the execution time of the layer group
– DMA execution times are based on actual bus and memory delays
• Analyze performance metrics
– End-to-end performance
– Workload activity
– Utilization of resources
– Interconnect metrics
– Latency, Throughput
– Contention, Outstanding transactions

[Figure: workload task graph with Coef inDMA, inDMA, Proc and outDMA tasks, annotated with (cycles: 0, rd_bytes: 0x200, wr_bytes: 0), (cycles: 2000, rd_bytes: 0, wr_bytes: 0) and (cycles: 0, rd_bytes: 0, wr_bytes: 0x200), mapped to an NPX6 VPU Model (inDMA, outDMA, NPU w L1 and L2 mem) connected through a NoC / Bus to Host, SRAM and DDR]
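
A workload model of this kind can be pictured as a list of tasks, each carrying a cycle count and read/write byte counts. The sketch below is a plain-Python illustration, not Platform Architect code. It assumes that the three annotations in the figure belong to the in-DMA, processing and out-DMA tasks respectively, and it replaces the simulated bus and memory delays with a single hypothetical `bytes_per_cycle` figure. The `Task` class and both helper functions are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cycles: int    # fixed processing cycles (calibrated value in the real flow)
    rd_bytes: int  # bytes read from memory
    wr_bytes: int  # bytes written to memory


def task_cycles(task: Task, bytes_per_cycle: int) -> int:
    """Processing cycles plus transfer cycles at an assumed fixed bandwidth."""
    moved = task.rd_bytes + task.wr_bytes
    transfer = -(-moved // bytes_per_cycle)  # integer ceiling division
    return task.cycles + transfer


def serial_latency(tasks, bytes_per_cycle: int) -> int:
    """Latency if tasks run strictly one after another (no overlap)."""
    return sum(task_cycles(t, bytes_per_cycle) for t in tasks)


# Values taken from the figure annotations; the task assignment is assumed.
layer_group = [
    Task("inDMA", cycles=0, rd_bytes=0x200, wr_bytes=0),
    Task("Proc", cycles=2000, rd_bytes=0, wr_bytes=0),
    Task("outDMA", cycles=0, rd_bytes=0, wr_bytes=0x200),
]

for bpc in (8, 16):  # hypothetical interconnect widths in bytes per cycle
    print(f"{bpc} B/cycle -> {serial_latency(layer_group, bpc)} cycles")
```

In the flow described on the slide, the DMA times come from the simulated interconnect and memory rather than from a constant, and DMA and processing may overlap, so the serial sum here is only an upper-bound style approximation.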


ARC Processor Simulation Models
Support for building virtual prototypes
• nSIM NCAM has
– SystemC wrapper
– Model Libraries for Platform Architect and Virtualizer
– For easy deployment in Synopsys Virtual Prototyping tools
– Instrumented for debug and analysis

• Allows for easy creation of your own Virtual Platform

• Integration of MetaWare Debugger (mdb) into PA and Virtualizer


– For debugging complete systems containing ARC IP models

• Accurate model of ARC STU with non-blocking FT-AXI interfaces



ARC AI Fast Performance Model (FPM) in Platform Architect
Whitepaper "Performance Analysis Using ARC EV7x Fast Performance Model"
[Figure: flow from Neural Network, DNN Graph Mapping Tool, Runtime, Libraries, Compiler and Computer Vision to an ARC binary image and shared library, into the Platform Architect SoC performance model; Platform Architect Analysis shows SW trace, DNN task trace, DNN address trace, DNN utilization, DDR utilization, bus throughput, DNN power]

• Use MetaWare production build flow to compile DNN model and ARC Vector DSP binary image
• Use Platform Architect to execute application on cycle-approximate performance model in context of SoC platform
• Analyze AI application and SoC power and performance metrics,
– e.g. ARC function profile, DNN trace, utilization, and address pattern, SoC bus and memory throughput and latency


Accuracy of FPM with FT interfaces in Platform Architect
Interconnect & memory models are crucial to achieve high accuracy for multi-core systems



AI SoC platform case-study
with Fast Performance Model of ARC AI processor IP
• Capture an AI SoC platform with ARC AI processor IP,
a Network-on-Chip, and DDR and SRAM memory hierarchy
• Analysis and optimization of IP-level and SoC architecture configurations



AI SoC Platform Case-study with ARC AI Subsystem

Goals:
• 4 ms latency for inference of 5 frames
• Minimize DNN power and energy
Optimize Hardware configuration:
– IP configuration
– Speed of DDR memory
– Interconnect, buffers, transactions

[Figure: Platform Architect running MobileNet on an AI SoC with ARC IP & LP-DDR5, with Root-cause analysis and Sensitivity analysis views]


Platform Architect with ARC AI sub-system and DWC LPDDR5
[Screenshot: Platform Architect window with the Model Library, Block diagram, Parameters, Components and Connections panes]


Video 1: Platform creation and tracing

• Example Platform creation


• Software tracing
• Hardware tracing

What We Just Learned
Platform creation and tracing

We learned how to:
✓ Create demo platform with ARC Fast Performance Model and DesignWare LPDDR5 memory controller
✓ Use ARC VPX Function Trace to analyze Software activity
✓ Correlate Software trace with Hardware traces from DNN accelerator and interconnect


Video 2: Performance Analysis

• Performance analysis of initial result


• Change architecture configuration
• Compare results from different simulations

What We Just Learned
Performance Analysis

We learned how to:
✓ Analyze activity and stall cycles of ARC AI accelerator, correlate DNN activity with interconnect and LPDDR analysis views
✓ Change bus and LPDDR5 controller configuration to increase memory bandwidth
✓ Compare results from multiple runs, new results show diminishing returns from higher memory bandwidth
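
One way to see why higher memory bandwidth eventually stops helping is a roofline-style bound: a layer takes at least as long as the slower of its compute time and its data-transfer time. This is a textbook simplification substituted here for illustration, not the model Platform Architect uses, and the compute time, byte count and bandwidth values below are hypothetical.

```python
def layer_time_us(compute_us: float, bytes_moved: float, bandwidth_bytes_per_us: float) -> float:
    """Lower bound on layer time: limited by compute or by memory traffic."""
    return max(compute_us, bytes_moved / bandwidth_bytes_per_us)


# Hypothetical layer: 100 us of compute, 1 MB of memory traffic.
COMPUTE_US = 100.0
BYTES_MOVED = 1_000_000.0

times = [layer_time_us(COMPUTE_US, BYTES_MOVED, bw) for bw in (5_000.0, 10_000.0, 20_000.0)]
for bw, t in zip((5_000, 10_000, 20_000), times):
    print(f"{bw} B/us -> {t:.0f} us")
```

Doubling the bandwidth from 5,000 to 10,000 B/us halves the time, but doubling it again changes nothing because the layer has become compute-bound. A simulation with real interconnect and memory models captures the same effect with contention and latency included.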


AI SoC Block Diagram in Platform Architect
Scaling AI Sub-system and LPDDR5 memory controller

Single-core sub-system
– 1 ARC VPX core
– 1 DNN slice
Dual-core sub-system
– 2 ARC VPX cores
– 2 DNN slices
Quad-core sub-system
– 4 ARC VPX cores
– 4 DNN slices

DesignWare LPDDR5 Memory Controller
– Multi-port LPDDR5 Mctrl
– Parallel AXI bus fabric


AI SoC Architecture Sweep
Goal: 4 ms inference latency, minimize power & energy
Sweep parameters
– AI configuration: 1, 2, 4 DNN slices
– Outstanding transactions: 16, 32, 64
– LPDDR5 memory speed: 3733, 4800, 6400
– Interconnect/LPDDR controller: single port, multi-port
– LPDDR controller scheduler queue: 32, 64
– LPDDR channels: 2, 4
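
The size of this design space is easy to underestimate. The sketch below enumerates the full cross-product of the sweep parameters listed above using the Python standard library; the dictionary keys are names invented for this example, and whether the presenter ran the full factorial or only a subset is not stated in the slides.

```python
from itertools import product

# Parameter values from the slide; the key names are hypothetical.
sweep = {
    "dnn_slices": [1, 2, 4],
    "outstanding_transactions": [16, 32, 64],
    "lpddr5_speed": [3733, 4800, 6400],
    "controller_ports": ["single", "multi"],
    "scheduler_queue": [32, 64],
    "lpddr_channels": [2, 4],
}

# One dict per configuration, covering every combination of values.
configs = [dict(zip(sweep, values)) for values in product(*sweep.values())]

print(f"{len(configs)} configurations in the full sweep")
print(configs[0])
```

Six parameters with only two or three values each already yield a few hundred simulations, which is why the flow relies on parallel parameter sweeps.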

[Figure: Sensitivity analysis and Root-Cause Analysis views]


Analysis and Optimization of Architecture Configurations
Inference latency for 5 frames vs. DNN power and energy consumption
[Figure: chart of Latency [us] against Power and Energy for 1, 2 and 4 DNN slices, with annotations for Outstanding transactions, LPDDR channels and LPDDR speed, and a marked region of Sufficient performance]

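Selecting a configuration from such a chart amounts to filtering by the latency goal and then minimizing energy among the survivors. The sketch below shows that selection step only. The 4 ms goal comes from the slides, but the result records, their field names and all latency and energy numbers are hypothetical placeholders, not results from the presentation.

```python
GOAL_LATENCY_US = 4000  # 4 ms for inference of 5 frames, from the slides

# Hypothetical sweep results, NOT data from the presentation.
results = [
    {"name": "A", "latency_us": 6500, "energy_uj": 900},
    {"name": "B", "latency_us": 3900, "energy_uj": 1100},
    {"name": "C", "latency_us": 3200, "energy_uj": 1500},
    {"name": "D", "latency_us": 3950, "energy_uj": 1300},
]


def best_config(results, goal_us):
    """Return the lowest-energy result that meets the latency goal, or None."""
    feasible = [r for r in results if r["latency_us"] <= goal_us]
    if not feasible:
        return None
    return min(feasible, key=lambda r: r["energy_uj"])


choice = best_config(results, GOAL_LATENCY_US)
print(choice["name"] if choice else "no configuration meets the goal")
```

Configuration A uses the least energy but misses the goal, so the cheapest feasible point is chosen instead; this is the "sufficient performance" idea marked in the chart.
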
Example Summary

Goals:
• 4 ms latency for inference of 5 frames
• Minimize DNN power and energy
Optimized Hardware configuration:
– AI configuration: 1, 2, 4 DNN slices
– Outstanding transactions: 16, 32, 64
– LPDDR memory speed: 3733, 4800, 6400
– Interconnect/LPDDR controller: single port, multi-port
– LPDDR controller scheduler queue: 32, 64

[Figure: Platform Architect running MobileNet, with Root-cause analysis and Sensitivity analysis views]


How To Get Started?
Faster Development of AI SoCs with Synopsys IP, tools, and services

Deep Knowledge in:
– AI Frameworks, AI & CNN Graphs, Graph Compression, and Mapping Tools
– Class leading CNN, State of the art Vector DSP, & ASIP capabilities
– Leading edge processor IP and SW (ARC)
– Mastery of key support IP (HBM, PCIe, DDR, MIPI)
– Foundry Process, Memory Compilers and Logic Libraries

Architecture Exploration & Optimization: Platform Architect
• Exploration and optimization flows
• Power and performance analysis
• Tooling for model creation and platform assembly
• Rich model library

Verification, Emulation & Prototyping: ZeBu/HAPS
• SoC verification
• Software development & bring-up
• Hybrid emulation
• Power & performance analysis
• AI benchmarks

Services
• Architectural tradeoffs
• IP subsystems
• ASIP design
• System verification
• Early Software development


Thank You!
• Further resources
• Landing page: DesignWare IP for Artificial Intelligence
• Landing page: Platform Architect

• Further questions
[email protected]
