Qi Xie (qi.xie@intel.com)
Hao Cheng (hao.cheng@intel.com)
Quanfu Wang (quanfu.wang@intel.com)
FPGA-BASED ACCELERATION
ARCHITECTURE FOR SPARK SQL
LEGAL NOTICES
• You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning
Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter
drafted which includes subject matter disclosed herein.
• No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
• Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness
for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing,
or usage in trade.
• This document contains information on products, services and/or processes in development. All information provided here is
subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and
roadmaps.
• The products and services described may contain defects or errors known as errata which may cause deviations from
published specifications. Current characterized errata are available on request.
• Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725
or by visiting www.intel.com/design/literature.htm.
• Intel, the Intel logo, Intel® are trademarks of Intel Corporation in the U.S. and/or other countries.
• *Other names and brands may be claimed as the property of others.
• Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
• Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software,
operations and functions. Any change to any of those factors may cause the results to vary. You should consult other
information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of
that product when combined with other products. For more complete information visit www.intel.com/benchmarks.
• Copyright © 2017 Intel Corporation.
About me
• Software engineer from Intel Big Data Engineering Spark team
• Focused on Spark optimization for Intel Architecture
Outline
• What’s an FPGA
• Intel FPGA Platform
• Workload & Benchmark Introduction
• Baseline Profile - Hotspot Analysis
• FPGA Acceleration Arch Overview
• Performance Comparison
• Future Works
What is an FPGA?
• Field Programmable Gate Array
‒ Configurable Logic Blocks (CLB)
‒ Embedded Memory
‒ Digital signal processing (DSP) blocks
‒ I/O pads
‒ Hard IP (PCIe, DDR, GigE, etc.)
Why FPGA?
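[Figure: inputs a, b, c drive a block labelled "Required Function", which produces output y = f(a, b, c).]
Truth Table
a b c | y
0 0 0 | 1
0 0 1 | 0
0 1 0 | 1
0 1 1 | 1
1 0 0 | 1
1 0 1 | 0
1 1 0 | 1
1 1 1 | 1
Programmed LUT
The y column (1, 0, 1, 1, 1, 0, 1, 1) is stored in the LUT; a MUX addressed by a, b, c selects the output y.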
‒ Reconfigurable architecture
A CLB consists of LUTs. A LUT is a RAM with a data width of 1 bit.
Its contents are programmed at power-up.
‒ Low power and energy efficient compared with CPU/GPU
A high degree of customization; well positioned to deliver both high performance and flexibility.
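The LUT description above can be made concrete with a small software model. This is only a sketch in Python, not hardware code, and the names `LUT_CONTENTS`, `lut_read`, and `reprogram` are illustrative rather than taken from the slides. It stores the eight y values of the truth table as a 1-bit-wide memory and uses the inputs a, b, c as the read address, which is the role of the MUX.

```python
# Software model of a 3-input LUT: an 8 x 1-bit memory addressed by (a, b, c).
# The contents are the y column of the truth table, in row order 000..111.
LUT_CONTENTS = [1, 0, 1, 1, 1, 0, 1, 1]


def lut_read(a: int, b: int, c: int) -> int:
    """Return the stored bit selected by the inputs, as the MUX would."""
    address = (a << 2) | (b << 1) | c  # a is the most significant address bit
    return LUT_CONTENTS[address]


def reprogram(contents: list[int]) -> None:
    """Load a different 8-entry table, i.e. a different logic function."""
    if len(contents) != 8 or any(bit not in (0, 1) for bit in contents):
        raise ValueError("a 3-input LUT holds exactly eight 1-bit entries")
    LUT_CONTENTS[:] = contents


if __name__ == "__main__":
    for address in range(8):
        a, b, c = (address >> 2) & 1, (address >> 1) & 1, address & 1
        print(a, b, c, "->", lut_read(a, b, c))
```

For the table shown, the stored function is equivalent to b OR (NOT c). Loading a different eight-entry table changes the function without changing the structure, which is the sense in which the architecture is reconfigurable.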
Discrete and Integrated FPGA platforms
Intel Accelerator Abstraction Layer (AAL)
End User Programming Interfaces
[Figure: CPU and FPGA programming interface stack. Blocks shown: User Application (CPU); FPGA Runtime Software (Accelerator Abstraction Layer); Infrastructure IP (UPI, PCIe*, HSSI, FPGA Management); FPGA IP (Acceleration Function Unit); user software interface; core cache interface; UPI/PCIe and HSSI links. Legend: Intel-provided infrastructure (new blocks that simplify code development) and user-developed application-specific functions.]
Traditional FPGA Development Approach
[Figure: two development flows, each ending with an Application on AAL Software on the CPU and with Blue and Green Bitstreams on the FPGA.
HDL Programming: C goes through the SW Compiler to an exe; HDL goes through Syn./PAR (Altera® Quartus Prime Pro) to an AFU Bitstream; the AFU Simulation Environment (ASE) and AAL are from Intel.
OpenCL Programming: Host code goes through the SW Compiler to an exe; Kernels go through the OpenCL Compiler to an AFU Bitstream; the OpenCL Emulator and the OpenCL BSP are also shown.]
Workload Introduction
The test case comes from a customer. It uses a SQL query to compute accounting summaries by
USER_ID over a big table. The SQL query contains heavy expression evaluation.
Accounting Big Table (1.6 x 10^8 rows, 38 columns):
TIME_ID   MBUSER_ID      OPER_TID              SUM_TIMES  CHARGE1  …
20140407  2700007679977  5B013363363w          3          0        …
20140407  2704012998344  31011G13iG0           48         57180    …
20140407  2704040114238  31Q11512ZT0           1          180      …
20140407  2700007012466  31011G13iG0           8          52320    …
20140407  2700001523491  1T0311G80610ydH10G00  2          0        …
20140407  2700000765632  310103015G0           1          30       …
20140407  2700007800325  4562210021            1          0        …
…
• SQL queries summarize customers' consumption characteristics using billing data.
• 5 GB in Parquet format stored on HDFS, 160 million rows.
Workload code snippet
// Prepare: load the Parquet input and register it as a temporary view
// (assumes an existing SparkSession named spark)
val parquet = spark.read.parquet("/mnt/nvme/inputParquet/")
parquet.createOrReplaceTempView("inputTable")
// Query
A very long SQL statement that makes intensive use of built-in functions:
Function                                      Count
Max                                           13
Sum                                           155
Substr                                        329
Case                                          133
Implicit data type cast (String to Double)    n/a
Total                                         630
SQL Query Physical Execution Plan
The plan has two stages separated by a shuffle (data is exchanged across the network). The map stage contains the file scan, projection,
and partial aggregation; the reduce stage completes the aggregation by merging the partial aggregation results.
Stage 1 (Map)
• File Scan: read data from the source.
• Projection: expression evaluation consumes most of the CPU cycles.
• Partial Aggregation: aggregate per partition.
Shuffle
Stage 2 (Reduce)
• Full Aggregation: a tiny task that consumes few CPU cycles.
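The two-stage plan can be illustrated with a plain Python stand-in. This is a sketch, not Spark code: the expression in `project` is a made-up example in the style of the workload (the customer query itself is not shown), the split into two partitions is arbitrary, and one row is hypothetical so that a key appears in more than one partition.

```python
from collections import defaultdict

# Rows are (MBUSER_ID, OPER_TID, SUM_TIMES, CHARGE1), taken from the sample table,
# split into two hypothetical partitions.
PARTITIONS = [
    [
        ("2700007679977", "5B013363363w", 3, 0),
        ("2704012998344", "31011G13iG0", 48, 57180),
        ("2704040114238", "31Q11512ZT0", 1, 180),
    ],
    [
        ("2700007012466", "31011G13iG0", 8, 52320),
        ("2704012998344", "310103015G0", 1, 30),  # hypothetical row: repeats a user id
    ],
]


def project(row):
    """Projection: evaluate the expressions for one row (illustrative expression only)."""
    user_id, oper_tid, sum_times, charge1 = row
    # CASE WHEN substr(OPER_TID, 1, 1) = '3' THEN CHARGE1 ELSE 0 END
    charge_if_3 = charge1 if oper_tid[0:1] == "3" else 0
    return user_id, charge_if_3, sum_times


def partial_aggregate(partition):
    """Map stage: scan, project, and aggregate within one partition."""
    acc = defaultdict(lambda: [0, 0])  # user_id -> [sum(charge_if_3), max(sum_times)]
    for row in partition:
        user_id, charge, times = project(row)
        acc[user_id][0] += charge
        acc[user_id][1] = max(acc[user_id][1], times)
    return dict(acc)


def final_aggregate(partials):
    """Reduce stage: merge the partial results that the shuffle grouped by key."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for user_id, (charge, times) in partial.items():
            merged[user_id][0] += charge
            merged[user_id][1] = max(merged[user_id][1], times)
    return {key: tuple(value) for key, value in merged.items()}


result = final_aggregate(partial_aggregate(p) for p in PARTITIONS)
```

The per-row work sits in `project`, which is why projection dominates the map stage, while `final_aggregate` only touches one small record per key and partition.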
Benchmark H/W Setup
A single server is used for profiling and performance evaluation.
• MCP (Skylake-FPGA Multi-Chip Package)
o CPU
Intel Xeon Skylake-P, 2 sockets x 14 cores @ 2.8 GHz, 56 hyper-threads
o FPGA
1 x Arria 10 GX, 427,200 ALMs, 8 MB RAM (10AX115U3F45E2SG)
o DMA Channels
1 x UPI (80 Gbps)
• Memory
384 GB, DDR4 @ 2133 MHz
• Disk
1 x Intel SSD P3700, 1.6 TB, SR: 2800 MB/s, SW: 1900 MB/s, RR: 450K IOPS, RW: 150K IOPS
Baseline Profile - CPU, The Bottleneck
• PAT (Performance Analysis Tool) shows that the CPU is heavily utilized (54 of 56 virtual cores are assigned to
Spark). The total query execution time is 85 seconds.
Note: Measurement starts from the 2nd run (the 1st run warms up the Linux file system cache), so there is essentially no disk access
bandwidth in use.
[Figure callout: the Reduce stage does very simple aggregation and takes little CPU time (~1 s).]
*For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
Baseline Profile - CPU, The Bottleneck, Contd.
• The VisualVM CPU breakdown of a map task shows that projection consumes 66.7% of the CPU.
Arch Overview – Typical SQL Query Operators
[Figure: typical SQL query operators, with a callout marking "This POC Target".]
Arch Overview - S/W Stack
[Figure: software stack across three layers (JVM, Native, HW). Components shown: Spark; Spark FPGA Adaptor; FPGA Project; InternalRow to FPGA Batch; FPGA Batch to InternalRow; FPGA Java Wrapper; Java Native Interface (JNI); FPGA Driver; Pattern Configure; DMA Configure; Huge Page Memory Pool; Computation Starter, Monitor; Accelerator Abstraction Layer (AAL); FPGA; DMA RX/TX (RTL); SQL Engine (RTL).]
• Spark FPGA Adaptor
‒ Identifies the expressions in the projection and exports them as FPGA SQL engine instructions.
‒ Converts data between Spark Internal Rows and FPGA Batches.
• FPGA Driver
‒ Configures the SQL Engine patterns according to the instructions from the Spark FPGA Adaptor.
‒ Triggers the FPGA computation and collects the results.
‒ Manages huge-page memory.
‒ Configures the DMA channel between main memory and the FPGA.
• AAL
‒ FPGA runtime library.
‒ Low-level API to the FPGA Driver.
• FPGA SQL Engine (RTL)
‒ Configurable SQL expression pattern units.
‒ DMA RX: the FPGA reads input data from main memory.
‒ DMA TX: the FPGA writes results to main memory.
Arch Overview - S/W Stack, Contd.
[Figure: record layouts and conversion steps.]
Internal Row: null bit set (1 bit/field) | values (8 bytes/field) | variable length portion
FPGA Input Batch / FPGA Output Batch: fixed-width fields laid out back to back, such as 4 bytes (TIME_ID), 8 bytes (MBUSER_ID), and 12 bytes (ACC_NBR), ……, followed by padding for 4xCL alignment (64 bytes in one batch and 4 bytes in the other).
Conversion steps (FPGA Java Wrapper, FPGA Project):
1. Get HugePage wrapped in DirectByteBuffer
2. Data Conversion: Internal Rows to FPGAInputBatch
3. Engine Configuration, Start
4. Input for Computation
5. Collect computation result
6. Data Conversion: FPGAOutputBatch to Internal Rows
7. Free HugePage wrapped in DirectByteBuffer
• Internal Row
Spark's representation of one record; flexible enough to represent fixed- and variable-length fields.
• FPGA Input Batch
For memory and computation efficiency, fields are placed sequentially in physical memory.
• FPGA Output Batch
Similar to the FPGA Input Batch.
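The row-to-batch conversion can be sketched as packing fixed-width fields back to back into one contiguous buffer and padding the end. This is a Python illustration under stated assumptions, not the PoC's Java code: a 64-byte cache line, "4xCL alignment" read as padding to a multiple of four cache lines, little-endian integers, and a `bytearray` standing in for the huge page wrapped in a DirectByteBuffer. The names `RECORD`, `to_input_batch`, and `from_batch` are hypothetical.

```python
import struct

CACHE_LINE = 64                # bytes; assumed cache-line size
ALIGNMENT = 4 * CACHE_LINE     # assumed meaning of "4xCL alignment"
RECORD = struct.Struct("<iq")  # TIME_ID: 4-byte int, MBUSER_ID: 8-byte int, no padding


def to_input_batch(rows):
    """Pack (time_id, mbuser_id) rows back to back, then pad to the alignment."""
    payload = len(rows) * RECORD.size
    padded = -(-payload // ALIGNMENT) * ALIGNMENT  # round up to a multiple of ALIGNMENT
    buf = bytearray(padded)                        # stand-in for the huge-page buffer
    for i, row in enumerate(rows):
        RECORD.pack_into(buf, i * RECORD.size, *row)
    return buf


def from_batch(buf, count):
    """Unpack the first `count` records from a batch back into row tuples."""
    return [RECORD.unpack_from(buf, i * RECORD.size) for i in range(count)]


rows = [(20140407, 2700007679977), (20140407, 2704012998344)]
batch = to_input_batch(rows)
```

With two 12-byte records the payload is 24 bytes and the batch is padded to 256 bytes under these assumptions. The sequential layout lets the DMA engine stream the buffer without following per-field pointers.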
Arch Overview - Engine Pipeline, Data Flow
[Figure: data flow and control flow between CPU and FPGA. CPU side: Data Source, Spark, Spark FPGA Adaptor, FPGA Adapter & Driver, Input Buffers, Output Buffers. FPGA side: DMA RX, several parallel rows of Engine Units forming a 169-level pipeline, DMA TX. Control flow: Pattern Configure, Computation Control.]
• Engine Pipeline
The Spark FPGA SQL Engine is designed as pipelines of Engine Units. Every Engine Unit performs a single computation; different Engine Units are
assembled (as configured by Spark) to perform a complex computation and work as a pipeline. Many pipelines (say N
pipelines) can be constructed to perform N computations in parallel, so that N records can be processed in a single FPGA cycle.
• Data Flow
Spark pulls data from the Data Source, converts it into the format the FPGA requires, and puts it into the InputBuffer Array.
The FPGA then reads the input data via DMA RX and feeds it into the Engine Pipelines. The results of the Engine Pipelines are written to the OutBuffer
Array via DMA TX. Finally, Spark converts the data back into the format Spark SQL needs.
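A software analogy may help with the pipeline idea. The sketch below is Python, not RTL, and every name in it (`substr_unit`, `in_set_unit`, `to_int_unit`, `build_pipeline`, `run_parallel`) is hypothetical. Each unit performs one computation, units are chained into a pipeline, and N copies of the pipeline take up to N records per modelled cycle. It models throughput only and ignores pipeline fill latency.

```python
from functools import reduce


# Each "engine unit" performs one computation on a value and passes it on.
def substr_unit(start, length):
    # SQL-style substring: positions are 1-based
    return lambda value: value[start - 1:start - 1 + length]


def in_set_unit(members):
    return lambda value: value in members


def to_int_unit():
    return lambda flag: 1 if flag else 0


def build_pipeline(units):
    """Chain engine units so the output of one feeds the next."""
    return lambda record: reduce(lambda acc, unit: unit(acc), units, record)


def run_parallel(pipelines, records):
    """Model N identical pipelines: each step consumes up to N records at once."""
    n = len(pipelines)
    results = []
    for base in range(0, len(records), n):  # one modelled FPGA cycle per iteration
        chunk = records[base:base + n]
        results.extend(p(r) for p, r in zip(pipelines, chunk))
    return results


units = [substr_unit(1, 1), in_set_unit({"1", "7"}), to_int_unit()]
pipelines = [build_pipeline(units) for _ in range(4)]  # N = 4 is an arbitrary choice
oper_tids = ["5B013363363w", "31011G13iG0", "1T0311G80610ydH10G00",
             "4562210021", "310103015G0"]
flags = run_parallel(pipelines, oper_tids)
```

Here five records need two modelled cycles with N = 4, and only the third OPER_TID value starts with '1' or '7'.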
Arch Overview – SQL Engine Micro Architecture
• Every SQL expression evaluation engine is configurable.
• Every engine contains at most four pattern engines. The input data is fed to the pattern engines in parallel,
and the final result is the combination of the pattern engine results.
• Pattern Engine 1 is configured to
evaluate the SQL expression
Substr(oper_tid, 1, 1) IN ('1', '7')
• Pattern Engine 2 is configured to
evaluate the SQL expression
Substr(oper_tid, 2, 1) IN ('o')
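The engine structure can be sketched in Python as follows. The slides do not say how the pattern engine results are combined, so the combiner is a parameter here and the default of logical OR (`any`) is purely an assumption; `SqlEngine` and `substr_in` are hypothetical names.

```python
MAX_PATTERN_ENGINES = 4  # from the slide: at most four pattern engines per engine


def substr_in(start, length, members):
    """Pattern: Substr(value, start, length) IN members, with 1-based SQL positions."""
    def pattern(value):
        return value[start - 1:start - 1 + length] in members
    return pattern


class SqlEngine:
    def __init__(self, patterns, combine=any):
        # `combine` is an assumption: the slides do not say how results are merged.
        if not 1 <= len(patterns) <= MAX_PATTERN_ENGINES:
            raise ValueError("an engine holds between one and four pattern engines")
        self.patterns = list(patterns)
        self.combine = combine

    def evaluate(self, value):
        # Every pattern engine receives the same input value.
        return self.combine(p(value) for p in self.patterns)


engine = SqlEngine([
    substr_in(1, 1, {"1", "7"}),  # Pattern Engine 1
    substr_in(2, 1, {"o"}),       # Pattern Engine 2
])
```

Passing `combine=all` instead models a conjunction of the two patterns; which of these the hardware implements would need to be confirmed against the design.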
Performance Comparison - FPGA vs Baseline
• The FPGA-accelerated version significantly reduced the total execution time of the end-to-end benchmark, from 86
seconds (baseline) to 44 seconds.
Speedup ratio: 86 s / 44 s ≈ 2x
Baseline: 86 s
FPGA: 44 s
Performance Comparison - FPGA vs Baseline, Contd.
• The FPGA-accelerated version reduced the CPU time spent on expression evaluation in the Map stage
from 66.7% (baseline) to less than 6.6%.
Projection in Baseline: 66.7%
Projection with FPGA: less than 6.6%
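The reported figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the slides. The last quantity is an Amdahl-style upper bound that rests on an assumption not made in the slides: that the projection's share of map-task CPU approximates its share of run time, and that the offloaded part could become free.

```python
baseline_s = 86.0         # end-to-end baseline time from the slide
fpga_s = 44.0             # end-to-end FPGA-accelerated time from the slide
projection_share = 0.667  # projection share of map-task CPU in the baseline

speedup = baseline_s / fpga_s
time_saved = 1 - fpga_s / baseline_s

# Amdahl-style bound, ASSUMING the CPU share approximates the share of run time
# and that the offloaded projection costs nothing on the CPU.
upper_bound = 1 / (1 - projection_share)

print(f"speedup      {speedup:.2f}x")
print(f"time saved   {time_saved:.1%}")
print(f"upper bound  {upper_bound:.2f}x")
```

Under that assumption, the measured speedup of roughly 2x sits below the bound of about 3x, which is consistent with the remaining map-stage work (file scan, data conversion, partial aggregation) still running on the CPU.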
Future Works
• Fully Configurable FPGA SQL Acceleration Engine
‒ In this PoC, we identified the SQL expression patterns manually in the
frontend and configured them into the FPGA SQL Engine units at
runtime. However, the FPGA SQL engines currently support only
some typical expression patterns; arbitrary combinations of SQL
expressions are not supported yet.
• More Operators Support
‒ SQL expression evaluation in Projection is the first step; other
typical operators such as Aggregation, Sort, and Join can probably also
be offloaded to the FPGA.
‒ The CPU can also compute expression evaluations when FPGA
resources are fully occupied.
qi.xie@intel.com
hao.cheng@intel.com
quanfu.wang@intel.com
Thank You
Ad

Recommended

Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...
Databricks
 
Apache Spark Core – Practical Optimization
Apache Spark Core – Practical Optimization
Databricks
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Databricks
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
Databricks
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Cloudera, Inc.
 
Common Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta Lakehouse
Databricks
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Databricks
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data Science
Databricks
 
Spark
Spark
Heena Madan
 
Understanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And Profit
Spark Summit
 
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
Databricks
 
Hyperspace: An Indexing Subsystem for Apache Spark
Hyperspace: An Indexing Subsystem for Apache Spark
Databricks
 
Spark overview
Spark overview
Lisa Hua
 
Enabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache Spark
Kazuaki Ishizaki
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark Summit
 
Spark architecture
Spark architecture
GauravBiswas9
 
Apache Flink internals
Apache Flink internals
Kostas Tzoumas
 
Dongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of Flink
Flink Forward
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
Databricks
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
Databricks
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
Anton Kirillov
 
Best Practices for Enabling Speculative Execution on Large Scale Platforms
Best Practices for Enabling Speculative Execution on Large Scale Platforms
Databricks
 
Dive into PySpark
Dive into PySpark
Mateusz Buśkiewicz
 
Introduction to apache spark
Introduction to apache spark
Aakashdata
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IO
Databricks
 
Spark SQL Join Improvement at Facebook
Spark SQL Join Improvement at Facebook
Databricks
 
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...
Databricks
 
Using a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application Performance
Odinot Stanislas
 

More Related Content

What's hot (20)

Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Databricks
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data Science
Databricks
 
Spark
Spark
Heena Madan
 
Understanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And Profit
Spark Summit
 
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
Databricks
 
Hyperspace: An Indexing Subsystem for Apache Spark
Hyperspace: An Indexing Subsystem for Apache Spark
Databricks
 
Spark overview
Spark overview
Lisa Hua
 
Enabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache Spark
Kazuaki Ishizaki
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark Summit
 
Spark architecture
Spark architecture
GauravBiswas9
 
Apache Flink internals
Apache Flink internals
Kostas Tzoumas
 
Dongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of Flink
Flink Forward
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
Databricks
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
Databricks
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
Anton Kirillov
 
Best Practices for Enabling Speculative Execution on Large Scale Platforms
Best Practices for Enabling Speculative Execution on Large Scale Platforms
Databricks
 
Dive into PySpark
Dive into PySpark
Mateusz Buśkiewicz
 
Introduction to apache spark
Introduction to apache spark
Aakashdata
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IO
Databricks
 
Spark SQL Join Improvement at Facebook
Spark SQL Join Improvement at Facebook
Databricks
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Databricks
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data Science
Databricks
 
Understanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And Profit
Spark Summit
 
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
Databricks
 
Hyperspace: An Indexing Subsystem for Apache Spark
Hyperspace: An Indexing Subsystem for Apache Spark
Databricks
 
Spark overview
Spark overview
Lisa Hua
 
Enabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache Spark
Kazuaki Ishizaki
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark Summit
 
Apache Flink internals
Apache Flink internals
Kostas Tzoumas
 
Dongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of Flink
Flink Forward
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
Databricks
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
Databricks
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
Anton Kirillov
 
Best Practices for Enabling Speculative Execution on Large Scale Platforms
Best Practices for Enabling Speculative Execution on Large Scale Platforms
Databricks
 
Introduction to apache spark
Introduction to apache spark
Aakashdata
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IO
Databricks
 
Spark SQL Join Improvement at Facebook
Spark SQL Join Improvement at Facebook
Databricks
 

Similar to FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang (20)

Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...
Databricks
 
Using a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application Performance
Odinot Stanislas
 
FPGAs for Supercomputing: The Why and How
FPGAs for Supercomputing: The Why and How
DESMOND YUEN
 
Introduction to FPGA acceleration
Introduction to FPGA acceleration
Marco77328
 
Challenges and Opportunities of FPGA Acceleration in Big Data
Challenges and Opportunities of FPGA Acceleration in Big Data
IRJET Journal
 
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
Edge AI and Vision Alliance
 
FPGAs and Machine Learning
FPGAs and Machine Learning
inside-BigData.com
 
High Performance Computing: The Essential tool for a Knowledge Economy
High Performance Computing: The Essential tool for a Knowledge Economy
Intel IT Center
 
FPGA MeetUp
FPGA MeetUp
Moya Brannan
 
Using Xeon + FPGA for Accelerating HPC Workloads
Using Xeon + FPGA for Accelerating HPC Workloads
inside-BigData.com
 
Driving Industrial InnovationOn the Path to Exascale
Driving Industrial InnovationOn the Path to Exascale
Intel IT Center
 
INFN Advanced ML Hackaton 2022 Talk
INFN Advanced ML Hackaton 2022 Talk
Mirko Mariotti
 
E3MV - Embedded Vision - Sundance
E3MV - Embedded Vision - Sundance
Sundance Multiprocessor Technology Ltd.
 
The basic graphics architecture for all modern PCs and game consoles is similar
The basic graphics architecture for all modern PCs and game consoles is similar
dinosocrates
 
What should be done to IR algorithms to meet current, and possible future, ha...
What should be done to IR algorithms to meet current, and possible future, ha...
Simon Lia-Jonassen
 
SoC FPGA Technology
SoC FPGA Technology
Siraj Muhammad
 
Search and fpga
Search and fpga
Arvind Rapaka
 
DATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe Conference
LEGATO project
 
AI Crash Course- Supercomputing
AI Crash Course- Supercomputing
Intel IT Center
 
Fpgas for-dummies-ebook
Fpgas for-dummies-ebook
Chichan Ibn Adam
 
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...
Databricks
 
Using a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application Performance
Odinot Stanislas
 
FPGAs for Supercomputing: The Why and How
FPGAs for Supercomputing: The Why and How
DESMOND YUEN
 
Introduction to FPGA acceleration
Introduction to FPGA acceleration
Marco77328
 
Challenges and Opportunities of FPGA Acceleration in Big Data
Challenges and Opportunities of FPGA Acceleration in Big Data
IRJET Journal
 
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
Edge AI and Vision Alliance
 
High Performance Computing: The Essential tool for a Knowledge Economy
High Performance Computing: The Essential tool for a Knowledge Economy
Intel IT Center
 
Using Xeon + FPGA for Accelerating HPC Workloads
Using Xeon + FPGA for Accelerating HPC Workloads
inside-BigData.com
 
Driving Industrial InnovationOn the Path to Exascale
Driving Industrial InnovationOn the Path to Exascale
Intel IT Center
 
INFN Advanced ML Hackaton 2022 Talk
INFN Advanced ML Hackaton 2022 Talk
Mirko Mariotti
 
The basic graphics architecture for all modern PCs and game consoles is similar
The basic graphics architecture for all modern PCs and game consoles is similar
dinosocrates
 
What should be done to IR algorithms to meet current, and possible future, ha...
What should be done to IR algorithms to meet current, and possible future, ha...
Simon Lia-Jonassen
 
DATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe Conference
LEGATO project
 
AI Crash Course- Supercomputing
AI Crash Course- Supercomputing
Intel IT Center
 
Ad

More from Spark Summit (20)

VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Spark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
Spark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
Spark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
Spark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Spark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
 
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
Spark Summit
 

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang

  • 1. Qi Xie (qi.xie@intel.com), Hao Cheng (hao.cheng@intel.com), Quanfu Wang (quanfu.wang@intel.com). FPGA-BASED ACCELERATION ARCHITECTURE FOR SPARK SQL
  • 2. LEGAL NOTICES
      • You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.
      • No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
      • Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.
      • This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.
      • The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.
      • Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm.
      • Intel, the Intel logo, Intel® are trademarks of Intel Corporation in the U.S. and/or other countries.
      • *Other names and brands may be claimed as the property of others.
      • Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
      • Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.
      • Copyright © 2017 Intel Corporation.
  • 3. About me
      • Software engineer from Intel Big Data Engineering Spark team
      • Focused on Spark optimization for Intel Architecture
  • 4. Outline
      • What’s an FPGA
      • Intel FPGA Platform
      • Workload & Benchmark Introduction
      • Baseline Profile - Hotspot Analysis
      • FPGA Acceleration Arch Overview
      • Performance Comparison
      • Future Works
  • 6. What is an FPGA?
      • Field Programmable Gate Array
          ‒ Configurable Logic Blocks (CLB)
          ‒ Embedded Memory
          ‒ Digital signal processing (DSP) blocks
          ‒ I/O pads
          ‒ Hard IP (PCIe, DDR, GigE, etc.)
  • 7. Why FPGA?
      ‒ Reconfigurable architecture: a CLB consists of LUTs. A LUT is a RAM with a data width of 1 bit; its contents are programmed at power-up.
      ‒ Low power and energy efficient compared with CPU/GPU.
      ‒ Extreme degree of customization; well positioned for high performance while providing flexibility.
      [Figure: a required function y of inputs a, b, c is implemented as a programmed LUT; a MUX addressed by a, b, c selects the stored bit as y.]
      Truth table (the y column is the programmed LUT content 1 0 1 1 1 0 1 1):
      a  b  c  |  y
      0  0  0  |  1
      0  0  1  |  0
      0  1  0  |  1
      0  1  1  |  1
      1  0  0  |  1
      1  0  1  |  0
      1  1  0  |  1
      1  1  1  |  1
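The LUT idea on this slide can be modeled in a few lines of Scala. This is only an illustrative software sketch, not how the hardware is described in the deck: the object name `Lut3` and the method `eval` are hypothetical, and the only data taken from the slide is the programmed content 1 0 1 1 1 0 1 1.

```scala
// Sketch: a 3-input lookup table (LUT) modeled as an 8-entry, 1-bit-wide RAM.
// Names (Lut3, eval) are hypothetical; the contents come from the slide's truth table.
object Lut3 {
  // Contents "programmed at power-up": the y column of the truth table,
  // ordered by address abc = 000, 001, ..., 111.
  val programmed: Vector[Int] = Vector(1, 0, 1, 1, 1, 0, 1, 1)

  // The inputs a, b, c form the read address; the stored bit is the output y.
  def eval(a: Int, b: Int, c: Int): Int = {
    require(Seq(a, b, c).forall(bit => bit == 0 || bit == 1), "inputs must be 0 or 1")
    programmed((a << 2) | (b << 1) | c)
  }
}
```

Reprogramming the function means replacing the eight stored bits, with no change to the surrounding logic. For the contents shown, the table happens to equal `b OR (NOT c)`.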
  • 9. Discrete and Integrated FPGA platforms
  • 10. Intel Accelerator Abstraction Layer (AAL)
      [Figure: AAL software stack on top of the FPGA hardware]
  • 11. End User Programming Interfaces
      [Figure: CPU and FPGA blocks with links labeled UPI/PCIe and HSSI. Diagram labels:]
      • User Application
      • FPGA Runtime Software (Accelerator Abstraction Layer)
      • Infrastructure IP (UPI, PCIe*, HSSI, FPGA Management)
      • FPGA IP (Acceleration Function Unit)
      • Interfaces: User Software Interface, Core Cache Interface
      • Legend: Intel-Provided Infrastructure; User Developed Application Specific Functions; "= New blocks that simplify code development."
      Intel® Confidential
  • 12. Traditional FPGA Development Approach
      [Figure: two development flows. Tooling labels: ASE from Intel, AAL from Intel, Altera® Quartus Prime Pro, OpenCL BSP.]
      • HDL programming: HDL -> synthesis and place-and-route (Syn., PAR) -> AFU bitstream; C -> SW compiler -> exe; AFU Simulation Environment (ASE).
      • OpenCL programming: kernels -> OpenCL compiler -> AFU bitstream; host -> SW compiler -> exe; OpenCL emulator.
      • In both flows the application runs on the CPU on top of AAL software, and the FPGA holds a blue bitstream and a green bitstream.
  • 14. Workload Introduction
      The test case comes from a customer. It uses a SQL query to compute accounting summaries per USER_ID over a big table, and the query contains heavy expression evaluation.
      • SQL queries summarize customers' consumption characteristics from billing data.
      • 5 GB in Parquet format, stored on HDFS; 160 million rows.
      Accounting big table (1.6x10^8 rows, 38 columns):
      TIME_ID   MBUSER_ID      OPER_TID              SUM_TIMES  CHARGE1  …
      20140407  2700007679977  5B013363363w          3          0        …
      20140407  2704012998344  31011G13iG0           48         57180    …
      20140407  2704040114238  31Q11512ZT0           1          180      …
      20140407  2700007012466  31011G13iG0           8          52320    …
      20140407  2700001523491  1T0311G80610ydH10G00  2          0        …
      20140407  2700000765632  310103015G0           1          30       …
      20140407  2700007800325  4562210021            1          0        …
      …
  • 15. Workload code snippet
      Built-in function usage in the query:
      Function                                    Count
      Max                                         13
      Sum                                         155
      Substr                                      329
      Case                                        133
      Implicit data type cast (String to Double)  n/a
      Total                                       630

      // Prepare
      val parquet = spark.read.parquet("/mnt/nvme/inputParquet/")
      parquet.createOrReplaceTempView("inputTable")
      // Query: a very long SQL statement that makes intensive use of built-in functions
  • 16. SQL Query Physical Execution Plan
      The plan has two stages separated by a shuffle (data exchanged across the network). The map stage performs file scan, projection, and partial aggregation; the reduce stage completes the aggregation by merging the partial aggregation results.
      • Stage 1 (Map)
          o File Scan: reads data from the source.
          o Projection: expression evaluation; consumes most of the CPU cycles.
          o Partial Aggregation: aggregates per partition.
      • Shuffle
      • Stage 2 (Reduce)
          o Full Aggregation: a tiny task that consumes few CPU cycles.
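The map-side partial aggregation and reduce-side merge described above can be sketched in plain Scala, without Spark. This is a simplified model under stated assumptions: the `Record` type, its two fields, and the single `sum` aggregate are hypothetical stand-ins for the 38-column table and the real query.

```scala
// Sketch of two-stage aggregation: per-partition partial sums, then a merge.
// Record and its fields are hypothetical simplifications of the real table.
object TwoStageAggregation {
  final case class Record(userId: String, charge: Double)

  // Map stage: aggregate within a single partition (no network traffic).
  def partialAggregate(partition: Seq[Record]): Map[String, Double] =
    partition.groupBy(_.userId).map { case (user, rows) => user -> rows.map(_.charge).sum }

  // Reduce stage: after the shuffle, merge the partial results per key.
  def merge(partials: Seq[Map[String, Double]]): Map[String, Double] =
    partials.flatten.groupBy(_._1).map { case (user, kvs) => user -> kvs.map(_._2).sum }
}
```

Because each partition ships only one row per key to the reducer, the reduce stage stays small, which is consistent with the slide's remark that it consumes few CPU cycles.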
  • 18. Benchmark H/W Setup
      A single server is used for profiling and performance evaluation.
      • MCP (Skylake-FPGA Multi-Chip Package)
          o CPU: Intel Xeon Skylake-P, [email protected], 56 hyper-threads
          o FPGA: 1x Arria 10 GX, 427,200 ALMs, 8 MB RAM (10AX115U3F45E2SG)
          o DMA channels: 1x UPI (80 Gbps)
      • Memory: 384 GB, DDR4 @ 2133 MHz
      • Disk: 1x Intel SSD P3700, 1.6 TB, SR: 2800 MB/s, SW: 1900 MB/s, RR: 450K IOPS, RW: 150K IOPS
  • 19. Baseline Profile - CPU, The Bottleneck
      • PAT (Performance Analysis Tool) shows that the CPU is heavily utilized (54 of 56 virtual cores are assigned to Spark). The total query execution time is 85 seconds.
      • Note: measurement starts from the 2nd run (the 1st run warms up the Linux file system cache), so there is essentially no disk access.
      • The reduce stage does a very simple aggregation and takes little CPU time (~1 s).
      *For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 20. Baseline Profile - CPU, The Bottleneck, Contd.
      • The VisualVM CPU breakdown of a map task shows that projection consumes 66.7% of the CPU.
  • 22. Arch Overview – Typical SQL Query Operators
      [Figure: typical SQL query operators, with the target of this PoC marked]
  • 23. Arch Overview - S/W Stack
      [Figure: software stack. Diagram labels by layer:]
      • JVM: Spark; Spark FPGA Adaptor (FPGA Project, InternalRow to FPGA Batch, FPGA Batch to InternalRow); FPGA Java Wrapper
      • Native: Java Native Interface (JNI); FPGA Driver (Pattern Configure, DMA Configure, Huge Page Memory Pool, Computation Starter/Monitor); Accelerator Abstraction Layer (AAL)
      • HW: FPGA with DMA RX/TX (RTL) and SQL Engine (RTL)
      Components:
      • Spark FPGA Adaptor
          o Identifies the expressions in the projection and exports them as FPGA SQL engine instructions.
          o Converts data between Spark Internal Rows and FPGA Batches, in both directions.
      • FPGA Driver
          o Configures the SQL Engine patterns according to the instructions from the Spark FPGA Adaptor.
          o Triggers the FPGA computation and collects the results.
          o Manages huge-page memory.
          o Configures the DMA channel between main memory and the FPGA.
      • AAL
          o FPGA runtime library.
          o Low-level API used by the FPGA Driver.
      • FPGA SQL Engine (RTL)
          o Configurable SQL expression pattern units.
          o DMA RX: the FPGA reads input data from main memory.
          o DMA TX: the FPGA writes results to main memory.
  • 24. Arch Overview - S/W Stack, Contd.
      [Figure: data flow and control flow through the Spark FPGA Adaptor (FPGA Project, InternalRow to FPGA Batch, FPGA Batch to InternalRow) and the FPGA Java Wrapper.]
      Steps:
      1. Get a HugePage wrapped in a DirectByteBuffer.
      2. Data conversion: Internal Rows to FPGAInputBatch.
      3. Engine configuration and start.
      4. Input for computation.
      5. Collect the computation result.
      6. Data conversion: FPGAOutputBatch to Internal Rows.
      7. Free the HugePage wrapped in the DirectByteBuffer.
      Data layouts:
      • Internal Row: Spark's representation of one record, flexible enough for fixed- and variable-length fields. Layout: null bit set (1 bit/field), values (8 bytes/field), variable-length portion.
      • FPGA Input Batch: for memory and computation efficiency, fields are placed sequentially in physical memory. Field labels in the figure: 4 bytes (TIME_ID), 8 bytes (MBUSER_ID), 12 bytes (ACC_NBR), …, plus padding for 4xCL alignment (labels show 64 bytes and 4 bytes).
      • FPGA Output Batch: similar to the FPGA Input Batch. Field label in the figure: 8 bytes (MBUSER_ID), ….
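Step 2 (Internal Rows to FPGAInputBatch) can be illustrated with a direct `ByteBuffer`. This is a sketch under several assumptions that are not in the deck: `ByteBuffer.allocateDirect` stands in for the huge-page buffer, each record is padded to a fixed 64-byte stride (the slide's exact "4xCL alignment" rule is not reproduced), byte order is little-endian, and the names `Row`, `toInputBatch` and `readUserId` are hypothetical. Only the field widths (4, 8 and 12 bytes) come from the slide.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.charset.StandardCharsets

// Sketch: pack fixed-width fields sequentially into off-heap memory.
object FpgaBatchSketch {
  // Field widths from the slide.
  val TimeIdBytes = 4
  val UserIdBytes = 8
  val AccNbrBytes = 12
  // Assumption: one record per 64-byte slot; the real alignment rule may differ.
  val RecordStride = 64

  // Hypothetical three-column view of a record.
  final case class Row(timeId: Int, userId: Long, accNbr: String)

  def toInputBatch(rows: Seq[Row]): ByteBuffer = {
    // allocateDirect zero-fills the buffer, so padding bytes are already 0.
    val buf = ByteBuffer.allocateDirect(rows.size * RecordStride).order(ByteOrder.LITTLE_ENDIAN)
    rows.zipWithIndex.foreach { case (row, i) =>
      val base = i * RecordStride
      buf.putInt(base, row.timeId)
      buf.putLong(base + TimeIdBytes, row.userId)
      val acc = row.accNbr.getBytes(StandardCharsets.US_ASCII)
      require(acc.length <= AccNbrBytes, "ACC_NBR longer than its fixed slot")
      var j = 0
      while (j < acc.length) {
        buf.put(base + TimeIdBytes + UserIdBytes + j, acc(j))
        j += 1
      }
    }
    buf
  }

  // Reading a field back needs only the record index and the fixed offset.
  def readUserId(batch: ByteBuffer, index: Int): Long =
    batch.getLong(index * RecordStride + TimeIdBytes)
}
```

Fixed offsets are what make this layout convenient for hardware: every field can be located without parsing, unlike the variable-length portion of an Internal Row.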
  • 25. Arch Overview - Engine Pipeline, Data Flow
      [Figure: on the CPU side, Spark and the FPGA Adapter & Driver move data from the Data Source into Input Buffers and read results from Output Buffers. On the FPGA side, DMA RX feeds parallel pipelines of Engine Units (169 pipeline levels) and DMA TX returns the results. Data flow and control flow (pattern configure, computation control) are drawn separately.]
      • Engine Pipeline: the Spark FPGA SQL Engine is designed as pipelines of Engine Units. Each Engine Unit performs a single computation; different Engine Units are assembled (as configured by Spark) into a pipeline that performs a complex computation. N such pipelines can be built to run N computations in parallel, so that N records are consumed in a single FPGA cycle.
      • Data Flow: Spark reads data from the Data Source, converts it into the format the FPGA requires, and places it in the InputBuffer array. The FPGA fetches the input via DMA RX and feeds it into the Engine Pipelines. The pipeline results are written to the OutputBuffer array via DMA TX. Finally, Spark converts the data back into the format Spark SQL needs.
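The idea of assembling single-purpose Engine Units into a pipeline, and running N pipelines side by side, can be modeled in software as function composition. This is only an analogy, assuming string-in/string-out units: the names `EngineUnit`, `substr`, `inSet`, `pipeline` and `run` are hypothetical, and the "lanes" are processed sequentially here rather than in one hardware cycle.

```scala
// Sketch: engine units as functions, a pipeline as their composition,
// and N lanes as groups of N records handled per step.
object EnginePipelineSketch {
  type EngineUnit = String => String

  // Hypothetical units; substr uses 1-based positions like SQL SUBSTR.
  def substr(start: Int, len: Int): EngineUnit =
    s => s.slice(start - 1, start - 1 + len)
  def inSet(values: Set[String]): EngineUnit =
    s => if (values.contains(s)) "1" else "0"

  // "Configuring" a pipeline means choosing which units to chain, in order.
  def pipeline(units: Seq[EngineUnit]): EngineUnit = Function.chain(units)

  // Each group of `lanes` records models one step of N parallel pipelines.
  def run(records: Seq[String], p: EngineUnit, lanes: Int): Vector[String] =
    records.grouped(lanes).flatMap(group => group.map(p)).toVector
}
```

For example, chaining `substr(1, 1)` with `inSet(Set("1", "7"))` evaluates the first-character membership test from the workload on each record.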
  • 26. Arch Overview – SQL Engine Micro Architecture
      • Every SQL expression evaluation engine is configurable.
      • Every engine contains at most four pattern engines. The input data is fed to the pattern engines in parallel, and the final result is the combination of the pattern engine results.
          o Pattern Engine 1 is configured to evaluate the SQL expression Substr(oper_tid, 1, 1) IN ('1', '7')
          o Pattern Engine 2 is configured to evaluate the SQL expression Substr(oper_tid, 2, 1) IN ('o')
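A configurable engine with up to four pattern engines can be sketched as data plus a small evaluator. The two configurations below are the ones named on the slide; everything else is an assumption for illustration: the names `PatternConfig` and `SqlEngine` are hypothetical, and the pattern results are combined with logical AND because the slide does not say how they are combined.

```scala
// Sketch: each pattern engine evaluates SUBSTR(field, start, len) IN (values).
object PatternEngineSketch {
  final case class PatternConfig(start: Int, len: Int, values: Set[String]) {
    // 1-based start position, as in SQL SUBSTR.
    def matches(field: String): Boolean =
      values.contains(field.slice(start - 1, start - 1 + len))
  }

  final case class SqlEngine(patterns: Seq[PatternConfig]) {
    require(patterns.size <= 4, "an engine contains at most four pattern engines")
    // Assumption: the combined result is the AND of all pattern results.
    def eval(field: String): Boolean = patterns.forall(_.matches(field))
  }

  // The two pattern engines configured on the slide.
  val slideEngine: SqlEngine = SqlEngine(Seq(
    PatternConfig(1, 1, Set("1", "7")),
    PatternConfig(2, 1, Set("o"))
  ))
}
```

Changing the query then amounts to supplying a different list of `PatternConfig` values, which mirrors the slide's point that the engine is reconfigured rather than rebuilt.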
  • 28. Performance Comparison - FPGA vs Baseline
      • In the end-to-end benchmark, the FPGA-accelerated version reduced the total execution time from 86 seconds (baseline) to 44 seconds.
      • Speedup ratio: 86 s / 44 s ≈ 2x.
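The arithmetic behind the speedup figure, together with a rough sanity check against the 66.7% projection share from the baseline profile, can be written out as follows. The upper bound is an Amdahl-style estimate that is not in the deck; it assumes, for illustration only, that projection's share of map-task CPU equals its share of elapsed time.

```scala
// Sketch: measured speedup and a rough upper bound from the profile numbers.
object SpeedupCheck {
  val baselineSeconds = 86.0   // from the slide
  val fpgaSeconds = 44.0       // from the slide
  val projectionShare = 0.667  // projection's CPU share in the baseline map task

  // Measured end-to-end speedup, about 1.95.
  val measuredSpeedup: Double = baselineSeconds / fpgaSeconds

  // Assumption: if projection cost dropped to zero and CPU share equaled
  // time share, the best possible speedup would be 1 / (1 - share), about 3.
  val upperBound: Double = 1.0 / (1.0 - projectionShare)
}
```

Under that assumption the measured speedup of roughly 2x sits below the roughly 3x ceiling, which is what one would expect when data conversion and DMA transfers still take time.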
  • 29. Performance Comparison - FPGA vs Baseline, Contd.
      • The FPGA-accelerated version reduced the share of CPU time spent on expression evaluation (projection) in the map stage from 66.7% (baseline) to less than 6.6%.
  • 31. Future Works
      • Fully configurable FPGA SQL acceleration engine
          o In this PoC, we identified the SQL expression patterns manually in the frontend and configured them into the FPGA SQL Engine units at runtime. The FPGA SQL engines are limited to some typical expression patterns; arbitrary combinations of SQL expressions are not supported yet.
      • Support for more operators
          o SQL expression evaluation in projection is the first step; other typical operators such as Aggregation, Sort, and Join can probably also be offloaded to the FPGA.
          o The CPU can also evaluate expressions when the FPGA resources are fully occupied.