This document discusses runtime and data management techniques for heterogeneous computing in Java. It presents an approach that uses three levels of abstraction: parallel skeletons API based on functional programming, a high-level optimizing library that rewrites operations to target specific hardware, and OpenCL code generation and runtime with data management for heterogeneous architectures. It describes how the runtime performs type inference, IR generation, optimizations, and kernel generation to compile Java code into OpenCL kernels. It also discusses how custom array types are used to reduce data marshaling overhead between the Java and OpenCL runtimes.