GPUs
GraalVM
Juan J.
Fumero
Introduction
API
Runtime Code
Generation
Data
Management
Results
Conclusion
Runtime and Data Management for
Heterogeneous Computing in Java
Juan José Fumero, Toomas Remmelg, Michel Steuwer,
Christophe Dubach
The University of Edinburgh
Principles and Practices of Programming on the Java
Platform 2015
1 Introduction
2 API
3 Runtime Code Generation
4 Data Management
5 Results
6 Conclusion
Heterogeneous Computing
NBody App (NVIDIA SDK): ~105x speedup over sequential execution
LU Decomposition (Rodinia Benchmark): ~10x speedup over 32 OpenMP threads
Cool, but how to program?
Example in OpenCL
// create host buffers
int *A, ....
// Initialization
...
// platform
cl_uint numPlatforms = 0;
cl_platform_id *platforms;
status = clGetPlatformIDs(0, NULL, &numPlatforms);
platforms = (cl_platform_id *) malloc(numPlatforms * sizeof(cl_platform_id));
status = clGetPlatformIDs(numPlatforms, platforms, NULL);
cl_uint numDevices = 0;
cl_device_id *devices;
status = clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_ALL, 0, NULL, &numDevices);
// Allocate space for each device
devices = (cl_device_id *) malloc(numDevices * sizeof(cl_device_id));
// Fill in devices
status = clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_ALL, numDevices, devices, NULL);
cl_context context;
context = clCreateContext(NULL, numDevices, devices, NULL, NULL, &status);
cl_command_queue cmdQ;
cmdQ = clCreateCommandQueue(context, devices[0], 0, &status);
cl_mem d_A, d_B, d_C;
d_A = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, datasize, A, &status);
d_B = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, datasize, B, &status);
d_C = clCreateBuffer(context, CL_MEM_WRITE_ONLY, datasize, NULL, &status);
...
// Check errors
...
Example in OpenCL
const char *sourceFile = "kernel.cl";
source = read_source(sourceFile);
program = clCreateProgramWithSource(context, 1, (const char **)&source, NULL, &status);
cl_int buildErr;
buildErr = clBuildProgram(program, numDevices, devices, NULL, NULL, NULL);
// Create a kernel
kernel = clCreateKernel(program, "vecadd", &status);

status = clSetKernelArg(kernel, 0, sizeof(cl_mem), &d_A);
status |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &d_B);
status |= clSetKernelArg(kernel, 2, sizeof(cl_mem), &d_C);

size_t globalWorkSize[1] = { ELEMENTS };
size_t local_item_size[1] = { 5 };

clEnqueueNDRangeKernel(cmdQ, kernel, 1, NULL, globalWorkSize, NULL, 0, NULL, NULL);

clEnqueueReadBuffer(cmdQ, d_C, CL_TRUE, 0, datasize, C, 0, NULL, NULL);

// Free memory
OpenCL example
kernel void vecadd(
    global int *a,
    global int *b,
    global int *c) {

    int idx = get_global_id(0);
    c[idx] = a[idx] * b[idx];
}
• Hello world App: ~250 lines of code (including error checking)
• Low-level and specific code
• Knowledge about target architecture
• If GPU/accelerator changes, tuning is required
OpenCL programming is hard and error-prone!!
Higher levels of abstraction
Programming for Heterogeneous
Computing
Higher levels of abstraction
Similar works
• Sumatra API (discontinued): Stream API for HSAIL
• AMD Aparapi: Java API for OpenCL
• NVIDIA Nova: functional programming language for CPU/GPU
• Copperhead: subset of Python that can be executed on heterogeneous platforms.
Our Approach
Three levels of abstraction:
• Parallel Skeletons: API based on functional programming
style (map/reduce)
• High-level optimising library which rewrites operations to
target specific hardware
• OpenCL code generation and runtime with data
management for heterogeneous architecture
Example: Saxpy
// Computation function
ArrayFunc<Tuple2<Float, Float>, Float> mult = new MapFunction<>(t -> 2.5f * t._1() + t._2());

// Prepare the input (each element must be allocated before it is used)
Tuple2<Float, Float>[] input = new Tuple2[size];
for (int i = 0; i < input.length; ++i) {
    input[i] = new Tuple2<>((float) (i * 0.323), (float) (i + 2.0));
}

// Computation
Float[] output = mult.apply(input);
If an accelerator is enabled, the map expression is automatically rewritten into lower-level operations.
map(λ) = MapAccelerator(λ) = CopyIn().computeOCL(λ).CopyOut()
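The rewrite rule above can be read as plain function composition. The following is a minimal sketch of that reading, not the authors' implementation: the class MapRewriteSketch, the DeviceBuffer type, and the copyIn, compute, and copyOut helpers are all hypothetical, and the "device" is just a primitive array on the host.

```java
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from the library shown in the slides.
final class MapRewriteSketch {

    // Stand-in for device memory: a primitive array that lives on the host.
    static final class DeviceBuffer {
        final float[] data;
        DeviceBuffer(float[] data) { this.data = data; }
    }

    // CopyIn: unbox the host array into a "device" buffer.
    static Function<Float[], DeviceBuffer> copyIn() {
        return host -> {
            float[] raw = new float[host.length];
            for (int i = 0; i < host.length; i++) raw[i] = host[i];
            return new DeviceBuffer(raw);
        };
    }

    // Compute: apply the user function to every element.
    // A real backend would launch a generated kernel here instead of a loop.
    static Function<DeviceBuffer, DeviceBuffer> compute(Function<Float, Float> f) {
        return in -> {
            float[] out = new float[in.data.length];
            for (int i = 0; i < out.length; i++) out[i] = f.apply(in.data[i]);
            return new DeviceBuffer(out);
        };
    }

    // CopyOut: box the results back into a host array.
    static Function<DeviceBuffer, Float[]> copyOut() {
        return dev -> {
            Float[] host = new Float[dev.data.length];
            for (int i = 0; i < host.length; i++) host[i] = dev.data[i];
            return host;
        };
    }

    // map(f) expressed as copyIn, then compute(f), then copyOut.
    static Function<Float[], Float[]> mapAccelerator(Function<Float, Float> f) {
        return copyIn().andThen(compute(f)).andThen(copyOut());
    }

    public static void main(String[] args) {
        Float[] result = mapAccelerator(x -> 2.5f * x + 1.0f).apply(new Float[] {0f, 1f, 2f});
        System.out.println(Arrays.toString(result)); // prints [1.0, 3.5, 6.0]
    }
}
```

Keeping the three stages as separate functions is what would let a runtime swap or fuse them, for example skipping the copy stages when the data is already in a device-friendly layout.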
Our Approach
Overview
[Class diagram: Function, ArrayFunc, and the ArrayFunc variants Map, MapThreads, MapOpenCL, Reduce, ...]
Map:
apply() {
    for (i = 0; i < size; ++i)
        out[i] = f.apply(in[i]);
}
MapThreads:
apply() {
    for (thread : threads)
        thread.performMapSeq();
}
MapOpenCL:
apply() {
    copyToDevice();
    execute();
    copyToHost();
}
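To make the sequential and threaded apply() variants concrete, here is a minimal sketch under simplifying assumptions: it works on double[] rather than generic arrays, and the names MapSkeletonsSketch, ArrayFunc (as a local interface), MapSeq, and MapThreads are illustrative stand-ins, not the classes from the authors' library.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch of two map variants behind one interface.
final class MapSkeletonsSketch {

    interface ArrayFunc {
        double[] apply(double[] in);
    }

    // Sequential map: one loop over the whole input.
    static final class MapSeq implements ArrayFunc {
        private final DoubleUnaryOperator f;
        MapSeq(DoubleUnaryOperator f) { this.f = f; }

        public double[] apply(double[] in) {
            double[] out = new double[in.length];
            for (int i = 0; i < in.length; ++i) out[i] = f.applyAsDouble(in[i]);
            return out;
        }
    }

    // Threaded map: each thread runs the sequential loop over its own chunk.
    static final class MapThreads implements ArrayFunc {
        private final DoubleUnaryOperator f;
        private final int numThreads; // assumed to be at least 1

        MapThreads(DoubleUnaryOperator f, int numThreads) {
            this.f = f;
            this.numThreads = numThreads;
        }

        public double[] apply(double[] in) {
            double[] out = new double[in.length];
            Thread[] threads = new Thread[numThreads];
            int chunk = (in.length + numThreads - 1) / numThreads;
            for (int t = 0; t < numThreads; ++t) {
                final int from = Math.min(in.length, t * chunk);
                final int to = Math.min(in.length, from + chunk);
                threads[t] = new Thread(() -> {
                    for (int i = from; i < to; ++i) out[i] = f.applyAsDouble(in[i]);
                });
                threads[t].start();
            }
            for (Thread thread : threads) {
                try {
                    thread.join(); // join makes the chunk's writes visible to the caller
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        double[] in = {1.0, 2.0, 3.0, 4.0, 5.0};
        System.out.println(Arrays.toString(new MapSeq(x -> x * 2.0).apply(in)));
        System.out.println(Arrays.toString(new MapThreads(x -> x * 2.0, 3).apply(in)));
    }
}
```

Because both variants share one interface, user code that calls apply() does not change when the runtime picks a different backend.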
Runtime Code Generation
Workflow
Java source: Map.apply(f)
Java bytecode:
...
10: aload_2
11: iload_3
12: aload_0
13: getfield
16: aaload
18: invokeinterface#apply
23: aastore
24: iinc
27: iload_3
...
Graal VM:
1. Type inference
2. IR generation: CFG + Dataflow (Graal IR)
3. Optimizations
4. Kernel generation
OpenCL Kernel:
void kernel (
global float* input,
global float* output) {
...;
...;
}
Runtime Code Generation
[Figure: Graal IR for the lambda at three stages]
Graph 1: Param, StartNode, MethodCallTarget, Invoke#Integer.intValue, DoubleConvert, Const (2.0), *, MethodCallTarget, Invoke#Double.valueOf, Return
Graph 2: Param, StartNode, IsNull, GuardingPi (NullCheckException), Unbox, DoubleConvert, Const (2.0), *, Box, Return
Graph 3: Param, StartNode, DoubleConvert, Const (2.0), *, Return
Generated code:
inline double lambda0 ( int p0 ) {
double cast_1 = ( double ) p0 ;
double result_2 = cast_1 * 2.0;
return result_2 ;
}
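The node names on this slide (Invoke#Integer.intValue, Unbox, Box, Invoke#Double.valueOf) suggest that the IR is simplified from a boxed form to a purely primitive one before code is generated. The sketch below shows the same idea at the Java source level, assuming the user lambda is x -> x * 2.0 on Integer inputs; the class and member names are hypothetical.

```java
import java.util.function.Function;
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch: the same computation in boxed, explicit, and primitive form.
final class BoxingEliminationSketch {

    // What the user writes: a generic lambda over boxed types.
    static final Function<Integer, Double> BOXED = x -> x * 2.0;

    // The same lambda with the implicit calls spelled out.
    static Double explicit(Integer p0) {
        int unboxed = p0.intValue();          // Invoke#Integer.intValue (unbox)
        double converted = (double) unboxed;  // DoubleConvert
        double product = converted * 2.0;     // * Const (2.0)
        return Double.valueOf(product);       // Invoke#Double.valueOf (box)
    }

    // With boxing and the null check removed: primitive in, primitive out.
    static final IntToDoubleFunction PRIMITIVE = p0 -> (double) p0 * 2.0;

    public static void main(String[] args) {
        System.out.println(BOXED.apply(21));               // prints 42.0
        System.out.println(explicit(21));                  // prints 42.0
        System.out.println(PRIMITIVE.applyAsDouble(21));   // prints 42.0
    }
}
```

The primitive form matters for a GPU target because the generated lambda0 takes and returns plain values, with no object allocation on the device.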
OpenCL code generated
double lambda0(float p0) {
    double cast_1 = (double) p0;
    double result_2 = cast_1 * 2.0;
    return result_2;
}
kernel void lambdaComputationKernel(
    global float *p0,
    global int *p0_index_data,
    global double *p1,
    global int *p1_index_data) {
    int p0_dim_1 = 0; int p1_dim_1 = 0;
    int gs = get_global_size(0);
    int loop_1 = get_global_id(0);
    for ( ; ; loop_1 += gs) {
        int p0_len_dim_1 = p0_index_data[p0_dim_1];
        bool cond_2 = loop_1 < p0_len_dim_1;
        if (cond_2) {
            float auxVar0 = p0[loop_1];
            double res = lambda0(auxVar0);
            p1[p1_index_data[p1_dim_1 + 1] + loop_1] = res;
        } else { break; }
    }
}
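The generated kernel uses a strided loop: each work-item starts at its global id and advances by the global size until it runs past the input length. The sketch below emulates that access pattern in plain Java so it can be stepped through without a GPU. It is a simplification under stated assumptions: work-items run one after another, the index-data arrays are replaced by the array length and a zero output offset, and all names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical host-side emulation of the kernel's strided loop.
final class StrideLoopSketch {

    // Same computation as the generated lambda0.
    static double lambda0(float p0) {
        return (double) p0 * 2.0;
    }

    // Body of one work-item: start at globalId, advance by globalSize.
    static void workItem(int globalId, int globalSize, float[] p0, double[] p1) {
        for (int i = globalId; i < p0.length; i += globalSize) {
            p1[i] = lambda0(p0[i]);
        }
    }

    // Run every work-item in turn; on a device they would run in parallel.
    static double[] run(float[] input, int globalSize) {
        double[] output = new double[input.length];
        for (int id = 0; id < globalSize; ++id) {
            workItem(id, globalSize, input, output);
        }
        return output;
    }

    public static void main(String[] args) {
        double[] out = run(new float[] {1f, 2f, 3f, 4f, 5f}, 2);
        System.out.println(Arrays.toString(out)); // prints [2.0, 4.0, 6.0, 8.0, 10.0]
    }
}
```

With this pattern the kernel stays correct when the number of work-items is smaller than the input, since each work-item simply handles several elements.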
Main Components
User Code
Accelerator
Runtime
Java Threads
Deoptimisation(*)
Skeleton/Pattern rewriting
Runtime
Buffers Management
Code Cache Management
Graal Bridge
Code Generator
Optimizer
OCL JIT Compilation
API: parallel pattern composition
Investigation of runtime for Black-Scholes
Black-Scholes benchmark: Float[] ⇒ Tuple2<Float, Float>[]
[Figure: share of total runtime (0.0 to 1.0) spent in each phase: Java overhead, Marshaling, CopyToGPU, GPU Execution, CopyToCPU, Unmarshaling]
• Marshaling and unmarshaling the data take up to 90% of the time
• The computation step should be dominant
This is not acceptable. Can we do better?
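To show what the marshaling and unmarshaling phases involve, here is a minimal sketch, assuming a boxed Float[] input and results that come back as interleaved pairs of floats. The Pair class is a hypothetical stand-in for Tuple2<Float, Float>, and the interleaved layout is an assumption made for illustration only.

```java
// Hypothetical sketch of the conversions done around a device call.
final class MarshalSketch {

    // Stand-in for Tuple2<Float, Float>.
    static final class Pair {
        final float first;
        final float second;
        Pair(float first, float second) {
            this.first = first;
            this.second = second;
        }
    }

    // Marshal: unbox every Float into a flat primitive array that can be copied to a device.
    static float[] marshal(Float[] boxed) {
        float[] flat = new float[boxed.length];
        for (int i = 0; i < boxed.length; ++i) flat[i] = boxed[i];
        return flat;
    }

    // Unmarshal: rebuild one Java object per result from interleaved primitives.
    static Pair[] unmarshal(float[] flat) {
        Pair[] out = new Pair[flat.length / 2];
        for (int i = 0; i < out.length; ++i) {
            out[i] = new Pair(flat[2 * i], flat[2 * i + 1]);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] device = marshal(new Float[] {1.0f, 2.0f});
        Pair[] results = unmarshal(new float[] {1.0f, 2.0f, 3.0f, 4.0f});
        System.out.println(device.length + " inputs, " + results.length + " results");
    }
}
```

Both loops touch every element and the second allocates one object per result, which is consistent with these phases dominating when the kernel itself is cheap.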
Custom Array Type
[Figure: layout of PArray<Tuple2<Float,Double>>]
Programmer's View: an array of Tuple2 elements, each holding a float and a double, indexed 0, 1, 2, ..., n-1
Graal-OCL VM: a FloatBuffer holding all the float fields and a DoubleBuffer holding all the double fields, each indexed 0, 1, 2, ..., n-1
With this layout, un/marshal operations are not necessary
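The layout can be sketched with the standard java.nio buffers: one buffer per tuple field, written at the same index. This is an illustration under assumptions, not the authors' PArray: the class name TuplePArraySketch and its methods are hypothetical, and heap buffers are used for simplicity where a real runtime might prefer direct buffers.

```java
import java.nio.DoubleBuffer;
import java.nio.FloatBuffer;

// Hypothetical sketch of an array of (float, double) tuples stored as two primitive buffers.
final class TuplePArraySketch {
    private final FloatBuffer firsts;    // all float fields, contiguous
    private final DoubleBuffer seconds;  // all double fields, contiguous
    private final int size;

    TuplePArraySketch(int size) {
        this.size = size;
        this.firsts = FloatBuffer.allocate(size);
        this.seconds = DoubleBuffer.allocate(size);
    }

    // Store one tuple by writing each field into its own buffer at the same index.
    void put(int index, float first, double second) {
        firsts.put(index, first);
        seconds.put(index, second);
    }

    float first(int index) { return firsts.get(index); }

    double second(int index) { return seconds.get(index); }

    int size() { return size; }

    public static void main(String[] args) {
        TuplePArraySketch array = new TuplePArraySketch(4);
        array.put(2, 2.0f, 4.0);
        System.out.println(array.first(2) + ", " + array.second(2)); // prints 2.0, 4.0
    }
}
```

Since each buffer already holds primitives in contiguous order, it can be handed to the device as it is, and no per-element conversion is needed on either side of the call.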
Example of JPAI
ArrayFunc<Tuple2<Float, Double>, Double> f = new MapFunction<>(t -> 2.5f * t._1() + t._2());

PArray<Tuple2<Float, Double>> input = new PArray<>(size);

for (int i = 0; i < size; ++i) {
    input.put(i, new Tuple2<>((float) i, (double) i + 2));
}

PArray<Double> output = f.apply(input);

pArray.put(2, new Tuple2<>(2.0f, 4.0));
[Figure: the call writes 2.0f into slot 2 of the FloatBuffer (0.0f, 1.0f, 2.0f, ...) and 4.0 into slot 2 of the DoubleBuffer (2.0, 3.0, 4.0, ...)]
Setup
• 5 Applications
• Comparison with:
• Java Sequential - Graal compiled code
• AMD and Nvidia GPUs
• Java Array vs. Custom PArray
• Java threads
Java Threads Execution
[Figure: speedup vs. Java sequential (y-axis, 0 to 6) for Saxpy, K-Means, Black-Scholes, N-Body, and Monte Carlo, each with small and large inputs, using 1, 2, 4, 8, and 16 Java threads]
CPU: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
OpenCL GPU Execution
[Figure: speedup vs. Java sequential (log scale, 0.1 to 1000) for Saxpy, K-Means, Black-Scholes, N-Body, and Monte Carlo, each with small and large inputs. Series: Nvidia Marshalling, Nvidia Optimized, AMD Marshalling, AMD Optimized. Two values of 0.004 are labelled in the chart.]
AMD Radeon R9 295
NVIDIA Geforce GTX Titan Black
OpenCL GPU Execution
[Figure: speedup vs. Java sequential (log scale, 0.1 to 1000) for Saxpy, K-Means, Black-Scholes, N-Body, and Monte Carlo, each with small and large inputs. Series: Nvidia Marshalling, Nvidia Optimized, AMD Marshalling, AMD Optimized. Two values of 0.004 are labelled in the chart, and the chart is annotated with speedups of 10x, 12x, and 70x.]
AMD Radeon R9 295
NVIDIA Geforce GTX Titan Black
.zip(Conclusions).map(Future)
Present
• We have presented an API to enable heterogeneous
computing in Java
• Custom array type to reduce overheads when transferring data
• Runtime system to run heterogeneous applications within
Java
Future
• Runtime data type specialization
• Code generation for multiple devices
• Runtime scheduling (Where is the best place to run the
code?)
Thanks so much for your attention
This work was supported by
a grant from:
Juan Fumero
Email: juan.fumero@ed.ac.uk
Webpage: http://homepages.inf.ed.ac.uk/s1369892/
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Runtime Code Generation and Data Management for Heterogeneous Computing in Java