FAST MAP PROJECTION ON CUDA.ppt

FAST MAP PROJECTION ON CUDA
Yanwei Zhao
Institute of Computing Technology, Chinese Academy of Sciences
July 29, 2011

Outline Institute of Computing Technology, Chinese Academy of Sciences

Map Projection
Establishes the relationship between two different coordinate systems: geographic coordinates -> planar Cartesian map coordinates.
Involves complicated and time-consuming arithmetic operations.
Goal: a fast answer with the desired accuracy rather than a slow exact answer.
Needs to be accelerated for interactive GIS scenarios.

GPGPU (general-purpose computing on graphics processing units)
GPGPU is a young area of research.
Advantages of the GPU: flexibility, processing power, low cost.
GPGPU is used in applications other than 3D graphics: the GPU accelerates the critical path of the application.

CUDA (Compute Unified Device Architecture)
NVIDIA's parallel computing architecture: a C-based programming language and development toolkit.
Advantages: programmers can focus on the important issues rather than on an unfamiliar language, and can write efficient parallel code without using graphics APIs.

The characteristics of Map Projection
A huge number of coordinates to handle
Complex arithmetic operations
The requirement of a real-time response

Our proposal
Use CUDA on the GPU, taking the Universal Transverse Mercator (UTM) projection as an example.
Performance: 6x to 8x improvement including data transfer time; 70x to 90x speedup excluding transfer time.
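The slides do not show the projection code itself. As a rough CPU reference for the per-coordinate work each GPU thread would do, here is a minimal C++ sketch of the forward UTM projection using Snyder's series formulas for the ellipsoidal transverse Mercator projection. The WGS84 ellipsoid, the function name `utm_forward` and the `UtmXY` type are our assumptions. The talk does not state which datum or formula variant was used.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type: UTM easting/northing in metres.
struct UtmXY { double easting; double northing; };

// Forward UTM on the WGS84 ellipsoid (assumed), Snyder series form.
// lat_deg/lon_deg in degrees; zone in 1..60.
UtmXY utm_forward(double lat_deg, double lon_deg, int zone) {
    assert(zone >= 1 && zone <= 60);
    const double kPi = 3.14159265358979323846;
    const double a   = 6378137.0;            // WGS84 semi-major axis (m)
    const double f   = 1.0 / 298.257223563;  // WGS84 flattening
    const double k0  = 0.9996;               // UTM scale on the central meridian
    const double e2  = f * (2.0 - f);        // first eccentricity squared
    const double ep2 = e2 / (1.0 - e2);      // second eccentricity squared
    const double e4 = e2 * e2, e6 = e4 * e2;

    const double deg  = kPi / 180.0;
    const double lat  = lat_deg * deg;
    const double lon  = lon_deg * deg;
    const double lon0 = (zone * 6.0 - 183.0) * deg;  // zone central meridian

    const double s = std::sin(lat), c = std::cos(lat), t = std::tan(lat);
    const double N = a / std::sqrt(1.0 - e2 * s * s);  // prime-vertical radius
    const double T = t * t;
    const double C = ep2 * c * c;
    const double A = (lon - lon0) * c;

    // Meridian arc length from the equator to lat.
    const double M = a * ((1.0 - e2 / 4.0 - 3.0 * e4 / 64.0 - 5.0 * e6 / 256.0) * lat
                   - (3.0 * e2 / 8.0 + 3.0 * e4 / 32.0 + 45.0 * e6 / 1024.0) * std::sin(2.0 * lat)
                   + (15.0 * e4 / 256.0 + 45.0 * e6 / 1024.0) * std::sin(4.0 * lat)
                   - (35.0 * e6 / 3072.0) * std::sin(6.0 * lat));

    const double A2 = A * A, A3 = A2 * A, A4 = A3 * A, A5 = A4 * A, A6 = A5 * A;
    double x = k0 * N * (A + (1.0 - T + C) * A3 / 6.0
               + (5.0 - 18.0 * T + T * T + 72.0 * C - 58.0 * ep2) * A5 / 120.0);
    double y = k0 * (M + N * t * (A2 / 2.0
               + (5.0 - T + 9.0 * C + 4.0 * C * C) * A4 / 24.0
               + (61.0 - 58.0 * T + T * T + 600.0 * C - 330.0 * ep2) * A6 / 720.0));

    x += 500000.0;                      // UTM false easting
    if (lat_deg < 0.0) y += 10000000.0; // false northing in the southern hemisphere
    return {x, y};
}
```

For example, `utm_forward(0.0, -3.0, 30)` returns easting 500000 m and northing 0 m, because (0°, 3°W) lies on the central meridian of zone 30 at the equator. Each call is independent of every other coordinate, which is what makes the projection step data-parallel.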

Algorithm framework
Striped partitioning
Matrix distribution

Striped partitioning
Define the number of blocks and threads: Block_num, Thread_num
CUDA built-in parameters: gridDim, blockDim
Number of geographic features: fn
Features handled by each block: fn/gridDim.x

Striped partitioning
Outer loop (blocks and features):
Block -> Feature[i]: i = blockIdx.x * (fn/gridDim.x)   (1)
Block -> next Feature[k]: k = i + fn/gridDim.x   (2)
Inner loop (threads and coordinates):
thread -> coord[j]: j = threadIdx.x   (1)
thread -> next coord[k]: k = j + Thread_num   (2)
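The striped index arithmetic can be checked on the host before writing a kernel. The C++ sketch below emulates the CUDA built-ins with plain loop variables and counts how often each coordinate is visited. It rests on several assumptions of our own, not taken from the slides:

* a hypothetical CSR-style layout (`offsets`) in which feature f owns coordinates offsets[f] to offsets[f+1]-1;
* each block walks its own contiguous chunk of fn/gridDim.x features one at a time;
* the last block also takes any remainder features.

```cpp
#include <cassert>
#include <vector>

// Hypothetical layout: feature f owns coordinates [offsets[f], offsets[f + 1]).
struct Features {
    std::vector<int> offsets;  // size fn + 1 (prefix sums of coordinate counts)
};

// Body run by one emulated thread. In CUDA, block_idx, thread_idx, grid_dim
// and block_dim would be blockIdx.x, threadIdx.x, gridDim.x, blockDim.x.
// visit(c) stands in for projecting coordinate c.
template <typename Visit>
void striped_thread(const Features& fs, int block_idx, int thread_idx,
                    int grid_dim, int block_dim, Visit visit) {
    const int fn = static_cast<int>(fs.offsets.size()) - 1;
    const int per_block = fn / grid_dim;          // fn / gridDim.x
    const int first = block_idx * per_block;      // outer eq. (1)
    // Assumption: the last block also handles the fn % gridDim.x leftovers.
    const int last = (block_idx == grid_dim - 1) ? fn : first + per_block;
    for (int i = first; i < last; ++i) {
        // Inner eq. (1): j = threadIdx.x, relative to the feature start;
        // inner eq. (2): next j = j + Thread_num (block_dim).
        for (int j = fs.offsets[i] + thread_idx; j < fs.offsets[i + 1]; j += block_dim)
            visit(j);
    }
}

// Runs every (block, thread) pair sequentially; returns visits per coordinate.
std::vector<int> emulate_striped(const Features& fs, int grid_dim, int block_dim) {
    assert(grid_dim > 0 && block_dim > 0);
    std::vector<int> hits(fs.offsets.back(), 0);
    for (int b = 0; b < grid_dim; ++b)
        for (int t = 0; t < block_dim; ++t)
            striped_thread(fs, b, t, grid_dim, block_dim, [&](int c) { ++hits[c]; });
    return hits;
}
```

Calling `emulate_striped(fs, 64, 512)` mirrors the Block_num = 64, Thread_num = 512 configuration from the experiments. For any valid `offsets`, every entry of the returned vector is 1, meaning each coordinate is projected exactly once.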

Matrix distribution
Define the number of blocks and threads: grid(br, bc), block(tr, tc)
Each block runs k features, where: (1)
Feature[i]: (2) (3)

Matrix distribution
Each block runs s coordinates, where: (1)
coord[j]:
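The slides define k, Feature[i], s and coord[j] with equations that are not reproduced here. The following is therefore only a hypothetical host-side sketch of one 2-D assignment that uses the slide's names grid(br, bc) and block(tr, tc):

* blocks are numbered row-major;
* each block takes k = ceil(fn / (br*bc)) consecutive features;
* within a feature, each thread row owns a contiguous slice of s coordinates, and the threads of that row stride through it.

These formulas are our assumption, not the author's.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Rows x columns of a 2-D launch shape: grid(br, bc) or block(tr, tc).
struct Dim2 { int rows, cols; };

// Hypothetical emulation of a matrix (2-D) task assignment.
// offsets has size fn + 1; feature f owns coordinates [offsets[f], offsets[f+1]).
// Returns how many times each coordinate is visited.
std::vector<int> emulate_matrix(const std::vector<int>& offsets, Dim2 grid, Dim2 block) {
    assert(grid.rows > 0 && grid.cols > 0 && block.rows > 0 && block.cols > 0);
    const int fn = static_cast<int>(offsets.size()) - 1;
    const int nblocks = grid.rows * grid.cols;
    const int k = (fn + nblocks - 1) / nblocks;  // features per block, ceil(fn / (br*bc))
    std::vector<int> hits(offsets.back(), 0);

    for (int bi = 0; bi < grid.rows; ++bi)
        for (int bj = 0; bj < grid.cols; ++bj) {
            const int bid = bi * grid.cols + bj;  // row-major block number
            const int f_end = std::min(fn, (bid + 1) * k);
            for (int f = bid * k; f < f_end; ++f) {
                const int n = offsets[f + 1] - offsets[f];
                // s: coordinates owned by one thread row for this feature, ceil(n / tr).
                const int s = (n + block.rows - 1) / block.rows;
                for (int ti = 0; ti < block.rows; ++ti)
                    for (int tj = 0; tj < block.cols; ++tj) {
                        // Thread row ti owns [ti*s, (ti+1)*s); its threads
                        // stride through that slice by the row width tc.
                        const int row_end = std::min(n, (ti + 1) * s);
                        for (int j = ti * s + tj; j < row_end; j += block.cols)
                            ++hits[offsets[f] + j];
                    }
            }
        }
    return hits;
}
```

Under this mapping, threads within one row touch consecutive coordinates. Each new row, however, starts s coordinates further along, so the block as a whole does not read one contiguous run.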

Experiment Environment
Hardware:
CPU: Intel Core2 Duo E8500 at 3.18GHz with 2GB of main memory
GPU: NVIDIA GeForce 9800 GTX+ graphics card with 512MB of memory, 128 CUDA cores and 16 multiprocessors
Software:
Microsoft Windows XP Pro SP2
Microsoft Visual Studio 2005
NVIDIA driver 2.2, CUDA SDK 2.2 and CUDA toolkit 2.2

The degree of data parallelism
Total CPU time is split into initialization and file reading time and serial projection time.
Map projection can achieve a degree of parallelism above 90 percent.
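One way to read the 90 percent figure is as the parallelizable fraction p of the run time (our interpretation). Amdahl's law then bounds the overall speedup when that fraction is accelerated by a factor s. A minimal C++ helper, with a hypothetical function name:

```cpp
#include <cassert>

// Amdahl's law: overall speedup when a fraction p of the original run time
// is accelerated by a factor s and the remaining (1 - p) stays serial.
double amdahl_speedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

With p = 0.9, the bound approaches 10x as s grows, however fast the parallel portion becomes. This is why the serial part (initialization, file reading) matters once the projection itself runs on the GPU.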

Comparing with CPU
Block_num = 64, Thread_num = 512

Comparing with CPU
Total time = map projection time + data transfer time

Comparing with CPU
Considering the total time, the GPU version achieves a 6x to 8x speedup.

Comparing with CPU
Comparing only the map projection time, we obtain 70x to 90x speedups.
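The two speedup figures differ only in what is counted as GPU time. A minimal sketch of both ratios, in which the `GpuTiming` type and the function names are hypothetical and all timings are assumed to be in the same unit:

```cpp
#include <cassert>

// Hypothetical timing record; all values in the same unit (e.g. ms).
struct GpuTiming {
    double kernel;    // map projection time on the GPU
    double transfer;  // host <-> device data transfer time
};

// Speedup when only the map projection time is compared.
double kernel_speedup(double cpu_projection, const GpuTiming& g) {
    assert(g.kernel > 0.0);
    return cpu_projection / g.kernel;
}

// Speedup when total time = map projection time + data transfer time.
double total_speedup(double cpu_projection, const GpuTiming& g) {
    assert(g.kernel + g.transfer > 0.0);
    return cpu_projection / (g.kernel + g.transfer);
}
```

The transfer term appears only in the denominator of `total_speedup`, so the total-time speedup can never exceed the projection-only speedup. The larger the copy time is relative to the kernel time, the wider the gap between the two.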

The performance of different task assignments
Striped partitioning: Block_num = 64, Thread_num = 512
Matrix distribution: dim_grid(32,32) = 32*32 blocks, dim_block(256,256) = 256*256 threads
Speedup: striped 6x to 8x; matrix 4x to 6x

The performance of different task assignments (Matrix vs. Striped)
Striped: all threads in the block access consecutive memory.
Matrix: it can only ensure that each row of threads in the block handles consecutive data.
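The access-pattern difference can be illustrated in three steps. First, list the first coordinate each thread touches, in hardware thread order (threadIdx.x varies fastest). Second, count the places where neighboring threads stop touching neighboring coordinates. Third, compare the two layouts. The matrix case uses a hypothetical row-slice mapping in which thread (ti, tj) starts at ti*s + tj, where s is the number of coordinates owned by each thread row. This mapping and the function names are our assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Striped: 1-D block, thread t starts at coordinate t.
std::vector<int> first_access_striped(int threads) {
    std::vector<int> idx(threads);
    for (int t = 0; t < threads; ++t) idx[t] = t;
    return idx;
}

// Matrix (hypothetical row-slice mapping): thread (ti, tj) starts at ti*s + tj.
// Threads are listed row by row, i.e. threadIdx.x (tj) fastest.
std::vector<int> first_access_matrix(int rows, int cols, int s) {
    std::vector<int> idx;
    for (int ti = 0; ti < rows; ++ti)
        for (int tj = 0; tj < cols; ++tj) idx.push_back(ti * s + tj);
    return idx;
}

// Counts breaks in an otherwise contiguous access run: positions where
// thread i does not touch the coordinate right after thread i-1's.
int contiguity_breaks(const std::vector<int>& idx) {
    int breaks = 0;
    for (std::size_t i = 1; i < idx.size(); ++i)
        if (idx[i] != idx[i - 1] + 1) ++breaks;
    return breaks;
}
```

For a 4 x 8 block with s = 100, `contiguity_breaks` reports 3, one break at each of the three row boundaries. The striped 1-D block of 512 threads reports 0.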

Conclusion and Future work
We implement a fast map projection method on CUDA-enabled GPUs, achieving a high speedup compared to the CPU-based method.
The power of modern GPUs can considerably speed up applications in geoscience, such as DEM-based spatial interpolation and raster-based spatial analysis.
Future work: GPU implementation of other GIS applications.

Thank you! Q & A
Yanwei Zhao, Institute of Computing Technology
Contact: zhaoyanwei@ict.ac.cn
