Xilinx Versal AI Compute Solution Brief
CHALLENGE
Applied machine learning techniques have now become pervasive
across a wide range of applications, with tremendous growth in vision
and video in particular. FPGA-based AI/ML acceleration has already
shown performance and latency advantages over GPU accelerators,
but next-generation CNN-based workloads demand compute density
beyond what traditional FPGA programmable logic and multipliers
can offer. Fabric-based DSP blocks offer flexible precision and are still
capable accelerators, but the bit-level interconnect and fine-grained
programmability come with overhead that limits scalability for the
most compute-intensive CNN-based workloads.
SOLUTION
Within the Versal platform is a unique architecture for AI inference: the AI Engines, an array of software-programmable vector processors with flexible interconnect and tightly coupled local memory. Ideal for CNN-based inference, they deliver 2.7X performance/watt over competing 10nm FPGAs.1 AI Engines deliver compute density, power efficiency, and low latency not possible with GPUs or traditional FPGA architectures, all while retaining hardware adaptability to evolve with AI algorithms.
Versal AI Core device for AI Accelerator Cards
Whole Application Acceleration
Machine learning is typically integrated into a larger application rather than a stand-alone
workload. As a complete heterogeneous compute platform, the Versal AI Core series leverages
its diverse engines to infuse deep learning as “an element” of a larger application that has other
pre/post-processing requirements, delivering end-to-end application acceleration.
PLATFORM HIGHLIGHTS
Adaptable Engines
> Custom memory hierarchy optimizes data movement and management for accelerator kernels
> Pre- and post-processing functions including neural network RT compression and image scaling
AI Engines
> Tiled array of vector processors, flexible interconnect, and local memory enabling massive parallelism
> Up to 133 INT8 TOPS with the Versal AI Core VC1902 device, scaling up to 405 INT4 TOPS across the portfolio
> Compiles models in minutes from TensorFlow, PyTorch, and Caffe using Python or C++ APIs
> Ideal for neural networks including CNNs, RNNs, and MLPs; hardware adaptable to optimize for evolving algorithms
Scalar Engines
> Arm processing subsystem for queue management and Kubernetes orchestration
> Platform management controller for security, power management, and bitstream management
Integrated Shell
> Comprises hardened host interface, programmable NoC, and Scalar Engines
> Ensures streamlined device bring-up and connectivity to off-chip interfaces, making the platform available at boot
> Delivers pre-engineered timing closure and logic resource savings, simplifying development of accelerator cards
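The INT8 and INT4 TOPS figures above rest on quantizing network weights and activations to low-precision integers before inference. As a minimal sketch of the idea (illustrative only: the `quantize_int8` helper and example values are not from this brief, and a production flow would use the Vitis AI quantizer rather than hand-rolled code), symmetric per-tensor INT8 quantization maps floats onto the [-128, 127] range via a single scale factor:

```python
import numpy as np

def quantize_int8(x, scale):
    # Symmetric per-tensor quantization: round to the nearest step of
    # `scale`, then saturate into the signed 8-bit range.
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    # Recover an approximation of the original floats.
    return q.astype(np.float32) * scale

weights = np.array([0.5, -1.2, 0.03, 2.0], dtype=np.float32)
scale = np.abs(weights).max() / 127.0   # map the largest magnitude to 127
q = quantize_int8(weights, scale)
recon = dequantize(q, scale)
```

The reconstruction error of each element is bounded by half a quantization step (scale / 2), which is why CNN inference typically tolerates INT8 with little accuracy loss while gaining the compute density the table above quotes.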
BENCHMARK
ResNet50 v1.5 Performance Comparison
Shown below is a comparison of measured results on Versal devices, as submitted to MLPerf Data Center v1.0, against the projected performance of competing 10nm Intel Agilex FPGAs.
[Bar chart: relative ResNet50 v1.5 performance/watt. Intel Agilex AGF027-2: 1X; Versal AI Core VC1902: 2.7X]
Peak INT8 TOPS: Intel Agilex AGF027-2, 61 TOPS2; Versal AI Core VC1902, 133 TOPS (a 2.2X advantage)
2: Assumes 30% compute efficiency for Intel Agilex FPGA 18x19 multipliers and 40% compute efficiency for AI Engines
3: Integrated shell reduces logic required for connectivity, 45K LUTs required for run-time SW & deep-learning processor support
4: Based on Quartus Power & Thermal Calculator 2021.2, assumes SmartVID and claimed static power savings
5: Device power estimates, based on Xilinx Power Estimator (XPE) available at https://www.xilinx.com/products/technology/power/xpe.html
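The efficiency assumptions in footnote 2 can be checked with a few lines of arithmetic. The sketch below uses only the peak TOPS and efficiency figures quoted above; note that the headline 2.7X is a measured/projected performance-per-watt figure, distinct from the raw effective-compute ratio these assumptions imply:

```python
# Peak INT8 TOPS from the comparison above.
agilex_peak_tops = 61    # Intel Agilex AGF027-2
versal_peak_tops = 133   # Versal AI Core VC1902

# Effective compute under footnote 2's efficiency assumptions:
# 30% for Agilex 18x19 multipliers, 40% for AI Engines.
agilex_effective = agilex_peak_tops * 0.30   # 18.3 effective TOPS
versal_effective = versal_peak_tops * 0.40   # 53.2 effective TOPS

peak_ratio = versal_peak_tops / agilex_peak_tops        # ~2.2X, as quoted
compute_ratio = versal_effective / agilex_effective     # ~2.9X effective
print(round(peak_ratio, 1), round(compute_ratio, 1))
```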
> To start designing for cloud acceleration and edge computing, visit www.xilinx.com/vck5000
> To contact your local AMD sales representative, visit Contact Sales
DISCLAIMERS
The information contained herein is for informational purposes only and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical
inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect
to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for purposes, with respect to the
operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations
applicable to the purchase or use of AMD’s products are as set forth in a signed agreement between the parties or in AMD’s Standard Terms and Conditions of Sale.
COPYRIGHT NOTICE
© 2023 Advanced Micro Devices, Inc. All rights reserved. Xilinx, the Xilinx logo, AMD, the AMD Arrow logo, Alveo, Artix, Kintex, Kria, Spartan, Versal, Vitis, Virtex, Vivado, Zynq, and other designated brands included herein
are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. AMBA, AMBA Designer, ARM,
ARM1176JZ-S, CoreSight, Cortex, and PrimeCell are trademarks of ARM in the EU and other countries. PCIe, and PCI Express are trademarks of PCI-SIG and used under license. PID# 231846771-B