arc22-ai-creating_optimized_ai_soc_architecture-virtual-prototyping-mojin-kottarathil
• Power and performance analysis requires realistic workloads to account for dynamic effects
– Scheduling of AI operators on parallel processing elements
– Unpredictable interconnect and memory access latencies
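To illustrate why operator scheduling counts as a dynamic effect, the following is a minimal sketch of a greedy list scheduler that hands each operator to the processing element that becomes free first. The operator durations and the `schedule_makespan` helper are hypothetical and not taken from the presentation; the toy also ignores operator dependencies and memory stalls, which a realistic workload model would capture.

```python
import heapq

def schedule_makespan(op_durations, num_pes):
    """Greedy list scheduling of independent operators on parallel PEs.

    op_durations: operator run times in arbitrary time units (hypothetical).
    num_pes: number of parallel processing elements.
    Returns the makespan, i.e. the time at which the last PE finishes.
    """
    if num_pes < 1:
        raise ValueError("need at least one processing element")
    # Min-heap holding the time at which each PE becomes free.
    free_at = [0.0] * num_pes
    heapq.heapify(free_at)
    # Placing the longest operators first tends to balance the load.
    for duration in sorted(op_durations, reverse=True):
        start = heapq.heappop(free_at)          # earliest-free PE
        heapq.heappush(free_at, start + duration)
    return max(free_at)

ops = [4, 3, 3, 2, 2, 2]  # hypothetical operator durations
for pes in (1, 2, 4):
    print(pes, schedule_makespan(ops, pes))
```

With these made-up durations, going from 2 to 4 processing elements does not halve the makespan, because the work cannot be split evenly. This is the kind of effect that a static operation count would miss.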
• APM-based Workload Model
– Partitioning and exploration
– Interconnect/memory analysis
• Fast Performance Model
– HW/SW co-optimization
– Performance/power analysis
• RTL Emulation
– HW/SW co-verification
– Power characterization
• RTL Prototyping
– HW/SW co-verification
– KPI validation
[Figure: flow diagram with "map" and "execute" steps. Labels: Software and traces; Power models and characterization; Sensitivity analysis; Parallel parameter sweep; …]
Design space exploration
– End-to-end performance
– Workload activity
– Utilization of resources
– Interconnect metrics
– Latency, throughput
– Contention, outstanding transactions
[Figure labels: NoC / Bus, Host, SRAM, DDR]
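The interconnect metrics listed above are linked: by Little's law, the average number of transactions in flight equals the transaction rate times the average latency. The sketch below uses that relation to estimate how many outstanding transactions are needed to sustain a target bandwidth. The function name and the example numbers (20 GB/s, 100 ns, 64-byte transactions) are assumptions chosen for illustration, not figures from the presentation.

```python
import math

def required_outstanding(bandwidth_bytes_per_s, latency_s, bytes_per_transaction):
    """Estimate in-flight transactions needed to sustain a bandwidth.

    Little's law: in-flight = arrival rate * average latency.
    All inputs are averages; the result is rounded up to a whole transaction.
    """
    transactions_per_s = bandwidth_bytes_per_s / bytes_per_transaction
    return math.ceil(transactions_per_s * latency_s)

# Hypothetical example: 20 GB/s sustained, 100 ns average latency, 64-byte bursts.
print(required_outstanding(20e9, 100e-9, 64))
```

If an initiator supports fewer outstanding transactions than this estimate, throughput is bounded by latency rather than by the memory itself, which is one reason the outstanding-transaction limit is worth including in a sweep.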
• Use MetaWare production build flow to compile DNN model and ARC Vector DSP binary image
• Use Platform Architect to execute application on cycle-approximate performance model in context of SoC platform
• Analyze AI application and SoC power and performance metrics
– e.g., ARC function profile; DNN trace, utilization, and address pattern; SoC bus and memory throughput and latency
Goals:
4 ms latency for inference of 5 frames
minimize DNN power and energy
Optimize hardware configuration:
– IP configuration
– Speed of DDR memory
– Interconnect, buffers, transactions
[Figure panels: Root-cause analysis; Sensitivity analysis. Labels: Model Library, Block diagram, Parameters, Components, Connections; outstanding transactions, LPDDR channels, LPDDR speed, sufficient performance; 1 DNN slice, 2 DNN slices, 4 DNN slices]
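A sensitivity analysis of this kind can be sketched as a one-at-a-time variation around a baseline configuration. In the sketch below, `one_at_a_time_sensitivity`, the parameter names, and especially `toy_latency_ms` are hypothetical: the toy function is a placeholder standing in for a simulation run and does not represent measured data or the behavior of any real tool.

```python
def one_at_a_time_sensitivity(evaluate, baseline, alternatives):
    """Vary one parameter at a time around a baseline configuration.

    evaluate: callable mapping a config dict to a metric (e.g. latency in ms).
              In practice this would wrap a simulator run (assumption).
    baseline: dict of parameter name -> baseline value.
    alternatives: dict of parameter name -> list of values to try.
    Returns {parameter: max(metric) - min(metric)} over that parameter's values.
    """
    spread = {}
    for name, values in alternatives.items():
        metrics = []
        for value in values:
            config = dict(baseline, **{name: value})  # override one parameter
            metrics.append(evaluate(config))
        spread[name] = max(metrics) - min(metrics)
    return spread

def toy_latency_ms(config):
    # Placeholder model, NOT measured data: stands in for a simulation result.
    return 8.0 / config["dnn_slices"] + 64.0 / config["outstanding"]

baseline = {"dnn_slices": 2, "outstanding": 32}
alternatives = {"dnn_slices": [1, 2, 4], "outstanding": [16, 32, 64]}
sensitivity = one_at_a_time_sensitivity(toy_latency_ms, baseline, alternatives)
print(sensitivity)
```

The parameter with the largest spread is the first candidate for root-cause analysis. One-at-a-time variation is cheap but misses interactions between parameters, which a full sweep would expose.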
Processor Summit © 2022 Synopsys, Inc. 23
Example Summary
Goals:
4 ms latency for inference of 5 frames
minimize DNN power and energy
[Figure panels: Root-cause analysis; Sensitivity analysis]
Optimized hardware configuration:
– AI configuration: 1, 2, 4 DNN slices
– Outstanding transactions: 16, 32, 64
– LPDDR memory speed: 3733, 4800, 6400
– Interconnect/LPDDR controller: single port, multi-port
– LPDDR controller scheduler queue: 32, 64
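The parameter values in the summary list span a full-factorial design space that can be enumerated as sketched below. The dictionary keys, `enumerate_configs`, and `pick_best` are hypothetical names; the sketch assumes each configuration is simulated elsewhere and that the results come back as (config, latency in ms, energy) tuples. It does not model how any particular tool runs the sweep.

```python
from itertools import product

# Parameter values as listed in the example summary; key names are hypothetical.
SWEEP = {
    "dnn_slices": [1, 2, 4],
    "outstanding_transactions": [16, 32, 64],
    "lpddr_speed": [3733, 4800, 6400],
    "controller_ports": ["single port", "multi-port"],
    "scheduler_queue": [32, 64],
}

def enumerate_configs(sweep):
    """Return every combination of parameter values as a list of dicts."""
    names = list(sweep)
    return [dict(zip(names, combo))
            for combo in product(*(sweep[name] for name in names))]

def pick_best(results, latency_goal_ms=4.0):
    """Pick the lowest-energy result that meets the latency goal.

    results: iterable of (config, latency_ms, energy) tuples from simulation.
    Returns the winning tuple, or None if no configuration meets the goal.
    """
    feasible = [r for r in results if r[1] <= latency_goal_ms]
    if not feasible:
        return None
    return min(feasible, key=lambda r: r[2])

configs = enumerate_configs(SWEEP)
print(len(configs))  # 3 * 3 * 3 * 2 * 2 = 108 configurations
```

Because the runs are independent of each other, a sweep of this size lends itself to parallel execution. The selection step mirrors the stated goals: meet the 4 ms latency target first, then minimize energy among the feasible configurations.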
Services
• Architectural tradeoffs
• IP subsystems
• ASIP design
• System verification
• Early software development
• Further questions
• [email protected]