FlinkCL: An OpenCL-Based In-Memory Computing Architecture on Heterogeneous CPU-GPU Clusters for Big Data
Abstract:
Research on in-memory big data management and processing has been prompted by the increase in main
memory capacity and the explosion of big data. By offering an efficient in-memory distributed execution
model, existing in-memory cluster computing platforms such as Flink and Spark have proven to be
outstanding at processing big data. This paper proposes FlinkCL, an in-memory computing architecture
on heterogeneous CPU-GPU clusters based on OpenCL that enables Flink to utilize the massive parallel
processing ability of GPUs. Our proposed architecture employs four techniques: a heterogeneous distributed
abstract model (HDST), a Just-In-Time (JIT) compiling scheme, a hierarchical partial reduction (HPR) scheme
and a heterogeneous task management strategy. Using FlinkCL, programmers only need to write Java
code against simple interfaces; the Java code is compiled to OpenCL kernels and executed on CPUs
and GPUs automatically. In the HDST, a novel memory mapping scheme avoids serialization and
deserialization between Java Virtual Machine (JVM) objects and OpenCL structs. We have comprehensively
evaluated FlinkCL with a set of representative workloads to show its effectiveness. Our results show that
FlinkCL improves performance by up to 11× for some computationally heavy algorithms while retaining
minor performance improvements for an I/O-bound algorithm.
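The abstract does not spell out the programming interfaces; the sketch below only illustrates the described style of writing plain Java functions that a JIT compiler could translate into OpenCL kernels. The hMap/hReduce operator names appear later in the paper, but the interface and class names here are assumptions made for illustration.

// Hypothetical FlinkCL-style interfaces, assumed for illustration only.
interface HMapFunction<IN, OUT> {
    OUT hMap(IN value);
}

interface HReduceFunction<T> {
    T hReduce(T left, T right);
}

public class SumOfSquaresExample {
    // User-defined functions built from primitive arithmetic: the kind of
    // restricted Java a Java-to-OpenCL JIT compiler can translate into kernels.
    static class Square implements HMapFunction<Float, Float> {
        @Override
        public Float hMap(Float x) {
            return x * x;
        }
    }

    static class Sum implements HReduceFunction<Float> {
        @Override
        public Float hReduce(Float a, Float b) {
            return a + b;
        }
    }

    public static void main(String[] args) {
        float[] data = {1f, 2f, 3f, 4f};
        HMapFunction<Float, Float> map = new Square();
        HReduceFunction<Float> reduce = new Sum();

        // CPU fallback path: a sequential loop. On a GPU, the same functions
        // would instead be compiled to OpenCL kernels and executed in parallel.
        float acc = 0f;
        for (float x : data) {
            acc = reduce.hReduce(acc, map.hMap(x));
        }
        System.out.println("sum of squares = " + acc);   // 30.0
    }
}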
Existing System:
By offering an efficient in-memory distributed execution model, existing in-memory cluster computing
platforms such as Flink and Spark have proven to be outstanding at processing big data. However, these
platforms execute user functions on CPUs only and cannot exploit the massive parallel processing ability
of the GPUs present in heterogeneous CPU-GPU clusters.
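For concreteness, the kind of CPU-only pipeline that the existing Flink DataSet API executes looks like the following minimal sketch (an illustrative word-length sum, not an example from the paper).

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class FlinkWordLengthSum {
    public static void main(String[] args) throws Exception {
        // Standard Flink entry point for batch (DataSet) programs.
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> words = env.fromElements("flink", "spark", "opencl", "gpu");

        // Both operators run on CPU task slots; the engine offers no GPU offloading.
        DataSet<Integer> totalLength = words
                .map(new MapFunction<String, Integer>() {
                    @Override
                    public Integer map(String word) {
                        return word.length();
                    }
                })
                .reduce(new ReduceFunction<Integer>() {
                    @Override
                    public Integer reduce(Integer a, Integer b) {
                        return a + b;
                    }
                });

        // print() triggers execution and prints the result.
        totalLength.print();
    }
}

FlinkCL keeps this Flink-style programming model while adding GPU execution behind it.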
Proposed System:
Our proposed architecture employs four techniques: a heterogeneous distributed abstract model (HDST), a
Just-In-Time (JIT) compiling scheme, a hierarchical partial reduction (HPR) scheme and a heterogeneous
task management strategy. Using FlinkCL, programmers only need to write Java code against simple
interfaces; the Java code is compiled to OpenCL kernels and executed on CPUs and GPUs automatically.
In the HDST, a novel memory mapping scheme avoids serialization and deserialization between Java
Virtual Machine (JVM) objects and OpenCL structs. We have comprehensively evaluated FlinkCL with a
set of representative workloads to show its effectiveness. Our results show that FlinkCL improves
performance by up to 11× for some computationally heavy algorithms while retaining minor performance
improvements for an I/O-bound algorithm.
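The section above does not detail the HDST mapping scheme; as a rough illustration of the idea of keeping JVM data in a layout that OpenCL can consume directly, a record such as struct Point { float x; float y; } could be backed by an off-heap buffer in native byte order instead of serialized Java objects. The class and method names below are assumptions, not the paper's actual API.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: a collection of points stored in an off-heap buffer
// whose byte layout matches the OpenCL struct
//     struct Point { float x; float y; };
// so the buffer can be handed to the OpenCL host API as-is, without Java
// object serialization or deserialization.
public class PointBuffer {
    private static final int FLOATS_PER_POINT = 2;                      // x and y
    private static final int BYTES_PER_POINT = FLOATS_PER_POINT * Float.BYTES;

    private final ByteBuffer buffer;

    public PointBuffer(int capacity) {
        // Direct buffer in native byte order: its bytes are already laid out
        // the way the device side expects.
        this.buffer = ByteBuffer.allocateDirect(capacity * BYTES_PER_POINT)
                                .order(ByteOrder.nativeOrder());
    }

    public void set(int index, float x, float y) {
        int offset = index * BYTES_PER_POINT;
        buffer.putFloat(offset, x);
        buffer.putFloat(offset + Float.BYTES, y);
    }

    public float getX(int index) {
        return buffer.getFloat(index * BYTES_PER_POINT);
    }

    public float getY(int index) {
        return buffer.getFloat(index * BYTES_PER_POINT + Float.BYTES);
    }

    public ByteBuffer raw() {
        return buffer;   // passed directly to the OpenCL host API
    }
}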
Conclusion:
GPUs have become efficient accelerators for HPC. This paper has proposed FlinkCL, which harnesses the
high computational power of GPUs to accelerate in-memory cluster computing with an easy programming
model. FlinkCL is based on four proposed core techniques: an HDST, a JIT compiling scheme, an HPR
scheme and a heterogeneous task management strategy. By using these techniques, FlinkCL remains
compatible with both the compile time and the runtime of the original Flink. To further improve the
scalability of FlinkCL, a pipeline scheme similar to those introduced in related work could be considered.
Such a pipeline would overlap the communication between cluster nodes with the computation within a
node. In addition, by using an asynchronous execution model, PCIe transfers and GPU executions can
also be overlapped. In the current implementation, data in GPU memory must be moved into host memory
before it can be sent over the network; a future research direction could involve enabling GPU-to-GPU
communication via GPUDirect RDMA to further improve performance. Another optimization could be a
software cache scheme that caches intermediate data in GPUs to avoid unnecessary data transfers over
PCIe. In the current design, the hMap and hReduce functions are compiled to separate kernels. Where
possible, our JIT compiler could fuse these kernels; by adopting this scheme, kernel invocation overhead
can be decreased and some PCIe data transfers can be avoided, as sketched below.
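As a rough host-side illustration of the intended kernel fusion (the actual JIT output would be OpenCL C, and the method names below are placeholders), fusing hMap into hReduce replaces two passes and an intermediate buffer with a single pass.

public class FusionSketch {
    // Unfused: one pass materializes the mapped values (extra memory traffic,
    // and on a GPU an extra kernel launch plus possible PCIe transfers).
    static float mapThenReduce(float[] input) {
        float[] mapped = new float[input.length];
        for (int i = 0; i < input.length; i++) {
            mapped[i] = input[i] * input[i];        // hMap kernel
        }
        float acc = 0f;
        for (float v : mapped) {
            acc += v;                               // hReduce kernel
        }
        return acc;
    }

    // Fused: the map body is inlined into the reduction loop, so the
    // intermediate buffer and the second kernel invocation disappear.
    static float fusedMapReduce(float[] input) {
        float acc = 0f;
        for (float x : input) {
            acc += x * x;                           // hMap fused into hReduce
        }
        return acc;
    }

    public static void main(String[] args) {
        float[] data = {1f, 2f, 3f, 4f};
        System.out.println(mapThenReduce(data));    // 30.0
        System.out.println(fusedMapReduce(data));   // 30.0
    }
}

On a GPU, the fused version saves one kernel launch and avoids materializing the intermediate array, which is where the reduced PCIe traffic would come from.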