FAST MAP PROJECTION ON CUDA Yanwei Zhao Institute of Computing Technology Chinese Academy of Sciences July 29, 2011
Map Projection
  Establishes the relationship between two coordinate systems: geographical coordinates -> planar Cartesian map coordinates.
  Involves complicated, time-consuming arithmetic operations.
  A fast answer with the desired accuracy is preferable to a slow exact answer.
  It needs to be accelerated for interactive GIS scenarios.
GPGPU (general-purpose computing on graphics processing units)
  GPGPU is a young area of research.
  Advantages of the GPU: flexibility, processing power, low cost.
  GPGPU uses the GPU in applications other than 3D graphics.
  The GPU accelerates the critical path of the application.
CUDA (Compute Unified Device Architecture)
  NVIDIA's parallel computing architecture.
  C-based programming language and development toolkit.
  Advantages: the programmer can focus on the important issues rather than on an unfamiliar language, and can write efficient parallel code without graphics APIs.
Characteristics of Map Projection
  A huge number of coordinates to handle.
  Complex arithmetic operations.
  A real-time response requirement.
Our proposal
  Use CUDA on the GPU to accelerate map projection.
  Take the Universal Transverse Mercator (UTM) projection as an example.
  Performance: 6x to 8x speedup including data transfer time; 70x to 90x speedup excluding data transfer time.
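To make the per-coordinate arithmetic concrete, here is a minimal serial sketch of a UTM forward projection in Python. It is not the authors' implementation: it assumes the WGS84 ellipsoid (the slides do not name a datum) and uses the standard truncated-series transverse Mercator formulas. The function name utm_forward and all constant names are hypothetical. Each input point is projected independently of every other point, which is what makes the problem a natural fit for one GPU thread per coordinate.

```python
import math

# Assumed constants: WGS84 ellipsoid and the usual UTM parameters.
A_AXIS = 6378137.0                 # semi-major axis in metres
FLATTENING = 1 / 298.257223563
K0 = 0.9996                        # scale factor on the central meridian
FALSE_EASTING = 500000.0
FALSE_NORTHING_SOUTH = 10000000.0


def utm_forward(lat_deg, lon_deg):
    """Project (lat, lon) in degrees to (zone, easting, northing) in metres.

    Valid for -180 <= lon_deg < 180; polar regions are not handled.
    """
    e2 = FLATTENING * (2 - FLATTENING)       # first eccentricity squared
    ep2 = e2 / (1 - e2)                      # second eccentricity squared
    zone = int((lon_deg + 180) // 6) + 1
    lon0 = math.radians((zone - 1) * 6 - 180 + 3)   # central meridian of the zone
    phi = math.radians(lat_deg)
    lam = math.radians(lon_deg)

    n = A_AXIS / math.sqrt(1 - e2 * math.sin(phi) ** 2)
    t = math.tan(phi) ** 2
    c = ep2 * math.cos(phi) ** 2
    a = (lam - lon0) * math.cos(phi)

    # Meridian arc length from the equator to latitude phi.
    m = A_AXIS * (
        (1 - e2 / 4 - 3 * e2**2 / 64 - 5 * e2**3 / 256) * phi
        - (3 * e2 / 8 + 3 * e2**2 / 32 + 45 * e2**3 / 1024) * math.sin(2 * phi)
        + (15 * e2**2 / 256 + 45 * e2**3 / 1024) * math.sin(4 * phi)
        - (35 * e2**3 / 3072) * math.sin(6 * phi)
    )

    easting = FALSE_EASTING + K0 * n * (
        a
        + (1 - t + c) * a**3 / 6
        + (5 - 18 * t + t**2 + 72 * c - 58 * ep2) * a**5 / 120
    )
    northing = K0 * (
        m
        + n * math.tan(phi) * (
            a**2 / 2
            + (5 - t + 9 * c + 4 * c**2) * a**4 / 24
            + (61 - 58 * t + t**2 + 600 * c - 330 * ep2) * a**6 / 720
        )
    )
    if lat_deg < 0:
        northing += FALSE_NORTHING_SOUTH     # southern-hemisphere offset
    return zone, easting, northing
```

A point on the equator at the central meridian of its zone, for example utm_forward(0.0, 3.0), maps to zone 31 with easting 500000 and northing 0.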
Algorithm framework
  Two task-assignment schemes: striped partitioning and matrix distribution.
Striped partitioning
  Define the number of blocks and threads: Block_num, Thread_num.
  CUDA built-in variables: gridDim, blockDim.
  Number of geographic features: fn.
  Features handled by each block: fn/gridDim.x.
Striped partitioning (continued)
  Outer loop (blocks over features):
    Block -> Feature[i], i = blockIdx.x * (fn/gridDim.x)   (1)
    Block -> next Feature[k], k = i + fn/gridDim.x   (2)
  Inner loop (threads over coordinates):
    thread -> coord[j], j = threadIdx.x
    thread -> next coord[k], k = j + Thread_num
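The index arithmetic of striped partitioning can be checked on the CPU with a small Python emulation. This is a sketch under stated assumptions, not the authors' kernel: it reads k = i + fn/gridDim.x as the exclusive end of the contiguous feature range owned by a block, it reads k = j + Thread_num as a stride loop over the coordinates of one feature, and it assumes fn is a multiple of the block count. The function name striped_assignment and the feature_sizes argument (number of coordinates per feature) are hypothetical.

```python
def striped_assignment(feature_sizes, block_num, thread_num):
    """Map each (block, thread) pair to the (feature, coord) items it handles.

    feature_sizes[f] is the number of coordinates in feature f.
    """
    fn = len(feature_sizes)
    if fn % block_num != 0:
        # Assumption: the slides divide fn by the block count without a remainder.
        raise ValueError("fn must be a multiple of block_num in this sketch")
    per_block = fn // block_num

    work = {}
    for block in range(block_num):           # plays the role of blockIdx.x
        first = block * per_block            # i = blockIdx.x * (fn/gridDim.x)
        end = first + per_block              # k = i + fn/gridDim.x, exclusive here
        for thread in range(thread_num):     # plays the role of threadIdx.x
            items = []
            for feature in range(first, end):
                j = thread                   # j = threadIdx.x
                while j < feature_sizes[feature]:
                    items.append((feature, j))
                    j += thread_num          # next coord: j + Thread_num
            work[(block, thread)] = items
    return work
```

With feature_sizes = [5, 3, 4, 6], two blocks and two threads per block, block 1 owns features 2 and 3, and its thread 0 handles coordinates 0 and 2 of feature 2 followed by coordinates 0, 2 and 4 of feature 3. Every coordinate is handled exactly once.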
Matrix distribution
  Define the number of blocks and threads: grid(br, bc), block(tr, tc).
  Each block handles k features, where k and Feature[i] are given by equations (1) to (3) [equations not preserved in the slide text].
  Each block handles s coordinates, where s and coord[j] are given by equation (1) [equation not preserved in the slide text].
Experiment Environment
  Hardware:
    CPU: Intel Core2 Duo E8500 at 3.18GHz with 2GB of main memory
    GPU: NVIDIA GeForce 9800 GTX+ graphics card with 512MB of memory, 128 CUDA cores and 16 multiprocessors
  Software:
    Microsoft Windows XP Pro SP2
    Microsoft Visual Studio 2005
    NVIDIA driver 2.2, CUDA SDK 2.2 and CUDA toolkit 2.2
The data parallel degree
  Total CPU time consists of initialization and file reading time plus serial projection time.
  Map projection can reach a parallel fraction of more than 90 percent.
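A parallel fraction bounds the achievable overall speedup through Amdahl's law. The Python sketch below is illustrative only and is not taken from the slides: the function name amdahl_speedup is hypothetical, 0.9 is the lower bound quoted above, and the speedup of the parallel part is a free parameter.

```python
def amdahl_speedup(parallel_fraction, parallel_speedup):
    """Overall speedup when only parallel_fraction of the run time is accelerated."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)
```

For a parallel fraction of 0.9, the overall speedup stays below 1 / (1 - 0.9) = 10 however fast the parallel part becomes; a higher parallel fraction raises that ceiling.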
Comparing with CPU
  Configuration: Block_num = 64, Thread_num = 512.
  Total time = map projection time + data transfer time.
  Counting total time, the GPU version achieves a 6x to 8x speedup.
  Counting map projection time only, it achieves a 70x to 90x speedup.
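The gap between the two figures follows directly from the total-time formula. The short Python sketch below shows the arithmetic; the function name speedups and the timings in the usage note are hypothetical and are not measurements from the experiment.

```python
def speedups(cpu_ms, gpu_kernel_ms, transfer_ms):
    """Return (kernel-only speedup, end-to-end speedup) over the CPU version."""
    kernel_only = cpu_ms / gpu_kernel_ms
    end_to_end = cpu_ms / (gpu_kernel_ms + transfer_ms)   # total = projection + transfer
    return kernel_only, end_to_end
```

With made-up timings of 800 ms on the CPU, 10 ms for the GPU kernel and 90 ms for data transfer, speedups(800.0, 10.0, 90.0) returns (80.0, 8.0): the transfer time, not the kernel, dominates the end-to-end figure.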
The performance of different task assignments
  Striped partitioning: Block_num = 64, Thread_num = 512.
  Matrix distribution: dim_grid(32,32) = 32*32 blocks, dim_block(256,256) = 256*256 threads.
  Speedup: striped 6x to 8x; matrix 4x to 6x.
The performance of different task assignments (continued)
  Striped: all threads in a block access consecutive memory.
  Matrix: only the threads within each row of a block are guaranteed to handle consecutive data.
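The difference in access pattern can be illustrated with a small Python sketch. The data layout here is an assumption, since the slides do not show one: all coordinates are stored in one flat array with feature f starting at offsets[f], a striped block works on one feature at a time, and in the matrix scheme each thread row of a block works on a different feature. All function names are hypothetical, and bounds checks against the feature length are omitted for brevity.

```python
from itertools import accumulate


def feature_offsets(feature_sizes):
    """Start index of each feature in a flat coordinate array."""
    return [0] + list(accumulate(feature_sizes))[:-1]


def striped_addresses(offsets, feature, thread_num):
    """Addresses read by a 1D block: thread t reads coord t of one feature."""
    return [offsets[feature] + t for t in range(thread_num)]


def matrix_addresses(offsets, first_feature, rows, cols):
    """Addresses read by a 2D block: thread (r, c) reads coord c of feature first_feature + r."""
    return [
        [offsets[first_feature + r] + c for c in range(cols)]
        for r in range(rows)
    ]


def is_consecutive(addresses):
    """True when every address is exactly one past the previous one."""
    return all(b - a == 1 for a, b in zip(addresses, addresses[1:]))
```

For features of sizes 100, 40 and 70, a striped block of 8 threads on feature 1 reads addresses 100 to 107, one consecutive run. A 2 by 4 matrix block starting at feature 0 reads 0 to 3 in its first row and 100 to 103 in its second: each row is consecutive, but the block as a whole is not.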
Conclusion and Future work
  Implemented a fast map projection method on CUDA-enabled GPUs, with a high speedup compared to the CPU-based method.
  The power of modern GPUs can considerably speed up computation in the field of geoscience, for example DEM-based spatial interpolation and raster-based spatial analysis.
  Future work: GPU implementations of other GIS applications.
Thank you! Q & A
  Yanwei Zhao, Institute of Computing Technology
  Contact: zhaoyanwei@ict.ac.cn

