VPU TECHNOLOGY & GPGPU COMPUTING
Arka Ghosh (9007900477a@gmail.com), B.Tech, Computer Science & Engineering
Delivered at Seacom Engineering College, CSE Dept.
Date: 7th April 2011

What Is a VPU?
A VPU (Visual Processing Unit) is more generally known as a Graphics Processing Unit, or GPU. A GPU is a massively parallel, massively multithreaded microprocessor.

Hybrid Solutions: NVIDIA SLI, ATI Radeon CrossFireX

Why GPU?
GPUs are used for high-performance computing. Originally the GPU's job was to offload graphics rendering from the CPU and accelerate it, but that has changed. A GPU can now take on general-purpose work, and in some complex computational cases it beats the CPU.

GPU Solutions:-
A GPU comes in two forms:
1. Integrated GPU
It is integrated into the motherboard chipset. It has lower memory bandwidth and much higher latency than a dedicated GPU. E.g. the NVIDIA 730a chipset provides an 8200GT GPU with a 540 MHz core.
2. Discrete or Dedicated GPU
It is the most powerful form of GPU. It is generally installed in a PCIe or AGP slot on the motherboard and has its own memory module. E.g. the ATI Radeon HD 5970 X2 has a compute power of 4.64 TeraFLOPS with 3200 stream processors and a 1 GHz core.
© Arka Ghosh 2011

What Is a PPU?
A PPU is a Physics Processing Unit, a processor specialized for calculating rigid body dynamics, soft body dynamics, collision detection, fluid dynamics, hair and clothing simulation, finite element analysis, and fracturing of objects.

The leading PPU is the AGEIA PhysX. It consists of a general-purpose RISC core controlling an array of custom SIMD floating-point VLIW processors working in local banked memories, with a switch fabric to manage transfers between them. There is no cache hierarchy as in a CPU or GPU.

GPUs vs PPUs:-
The drive toward GPGPU is making GPUs more and more suitable for the job of a PPU.

ULTIMATE FATE OF GPU:-
1. Intel's Larrabee
2. AMD's Fusion

-:INTO THE ARCHITECTURE:-
Use of SPM:-
SPM, or scratchpad memory, is a high-speed internal memory used for temporary storage of calculations, data, and other work in progress. In reference to a microprocessor ("CPU"), scratchpad refers to a special high-speed memory circuit used to hold small items of data for rapid retrieval.
EXAMPLE:- NVIDIA's 8800 GPU running under CUDA provides 16 KiB of scratchpad per thread bundle when used for GPGPU tasks.

STREAM PROCESSING:
The stream processing paradigm simplifies parallel software and hardware by restricting the parallel computation that can be performed.
1. Uniform streaming: one kernel function is applied to every element of the stream.
Applications:- Stream processing suits applications with:
Compute intensity
Data parallelism
Data locality

Conventional, sequential paradigm:
    for (int i = 0; i < 100 * 4; i++)
        result[i] = source0[i] + source1[i];

Parallel SIMD paradigm, packed registers (SWAR):
    for (int el = 0; el < 100; el++)  // for each vector
        vector_sum(result[el], source0[el], source1[el]);
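
The two loop styles above can be compared in a small, self-contained C sketch. This is only an illustration under stated assumptions: the `vec4` type and the bodies of `vector_sum`, `stream_add_scalar`, and `stream_add_vec4` are hypothetical stand-ins written for this example (the slide does not define them), and the "packed register" is modelled as a plain struct of four floats rather than a real SIMD register.

```c
#include <stddef.h>

/* Conventional, sequential paradigm: one addition per loop iteration. */
void stream_add_scalar(float *result, const float *source0,
                       const float *source1, size_t n)
{
    for (size_t i = 0; i < n; i++)
        result[i] = source0[i] + source1[i];
}

/* Hypothetical 4-wide vector standing in for one packed register. */
typedef struct { float lane[4]; } vec4;

/* The uniform kernel: the same operation is applied to every lane. */
static vec4 vector_sum(vec4 a, vec4 b)
{
    vec4 r;
    for (int k = 0; k < 4; k++)
        r.lane[k] = a.lane[k] + b.lane[k];
    return r;
}

/* SIMD-style paradigm: one kernel call per 4-element vector. */
void stream_add_vec4(vec4 *result, const vec4 *source0,
                     const vec4 *source1, size_t n_vectors)
{
    for (size_t el = 0; el < n_vectors; el++)
        result[el] = vector_sum(source0[el], source1[el]);
}
```

Both functions compute the same sums. The second one simply exposes the work in units of four elements, which is the shape a SIMD unit or a stream processor can execute in one step.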

Graphics Pipeline
The graphics pipeline typically accepts some representation of a three-dimensional scene as input and produces a 2D raster image as output. OpenGL and Direct3D are two notable graphics pipeline models accepted as widespread industry standards.

Stages of the graphics pipeline:-
1. Transformation
2. Per-vertex lighting
3. Viewing transformation
4. Primitives generation
5. Projection transformation
6. Clipping
7. Viewport transformation
8. Scan conversion or rasterization
9. Texturing, fragment shading
10. Display

Shader
Shaders are used to program the GPU's programmable rendering pipeline, which has mostly superseded the fixed-function pipeline that allowed only common geometry transformation and pixel-shading functions. With shaders, customized effects can be used.

Types of Shader:-
Vertex shaders
Pixel shaders
Geometry shaders

USEFULNESS OF SHADERS:-
1. Simplified graphics processing unit pipeline
2. Parallel processing

Programming shaders
Shaders can be programmed using the OpenGL Shading Language (GLSL), Cg, and Microsoft HLSL.
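
Two of the stages listed above, the projection transformation and the viewport transformation, reduce to a few lines of arithmetic. The following C sketch is an assumption-laden illustration, not the OpenGL or Direct3D definition: the type names, the function names `project` and `to_viewport`, the simple pinhole projection onto the plane z = d, and the choice of a top-left pixel origin are all made up for this example.

```c
/* A point in normalized device coordinates: x and y lie in [-1, 1]. */
typedef struct { float x, y; } ndc_point;

/* A point in pixel coordinates, origin assumed at the top-left corner. */
typedef struct { float x, y; } pixel_point;

/* Projection transformation (simplified pinhole model): project a
 * camera-space point (x, y, z) onto the plane z = d. Assumes z != 0. */
ndc_point project(float x, float y, float z, float d)
{
    ndc_point p = { x * d / z, y * d / z };
    return p;
}

/* Viewport transformation: map [-1, 1] x [-1, 1] onto a width x height
 * window. The y axis is flipped so that +1 maps to the top row. */
pixel_point to_viewport(ndc_point p, float width, float height)
{
    pixel_point q = { (p.x + 1.0f) * 0.5f * width,
                      (1.0f - p.y) * 0.5f * height };
    return q;
}
```

For instance, the centre of the view volume, `(0, 0)`, lands at pixel `(320, 240)` in a 640 x 480 window.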

GPU CLUSTER
What is a Cluster?
GPU CLUSTER: Each node of the cluster has a GPU.
Types:
1. Homogeneous
2. Heterogeneous

Components
Hardware (Other):- Interconnect
Software:-
1. Operating system
2. GPU driver for each type of GPU present in each cluster node
3. Clustering API (such as the Message Passing Interface, MPI)

Algorithm mapping

GPU SWITCHING
Means switching from one cluster node to another.
Windows switching
Linux switching
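
Algorithm mapping on a cluster usually starts with deciding which node handles which part of the data. A minimal sketch of one common choice, block decomposition, is shown below in plain C. It is an illustrative assumption rather than anything prescribed by the slide: `work_range` and `partition_work` are hypothetical names, and in an MPI program the `rank` and `n_nodes` arguments would typically come from the process rank and communicator size.

```c
#include <stddef.h>

/* Contiguous slice of the work assigned to one cluster node. */
typedef struct { size_t start; size_t count; } work_range;

/* Split n_items as evenly as possible across n_nodes (n_nodes > 0).
 * The first (n_items % n_nodes) nodes receive one extra item each. */
work_range partition_work(size_t n_items, size_t n_nodes, size_t rank)
{
    size_t base  = n_items / n_nodes;
    size_t extra = n_items % n_nodes;
    work_range r;
    r.count = base + (rank < extra ? 1 : 0);
    r.start = rank * base + (rank < extra ? rank : extra);
    return r;
}
```

With 10 items on 3 nodes, the nodes receive 4, 3, and 3 items starting at offsets 0, 4, and 7. In a heterogeneous cluster an even split like this may be a poor fit, since faster GPUs would sit idle while slower ones finish.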

What Is GPGPU?
GPGPU stands for general-purpose computing on graphics processing units. Using a GPU for computation traditionally handled by the CPU is GPGPU computing.

NVIDIA CUDA:-
It is a GPGPU computing architecture. It provides a heterogeneous computing environment.

Why GPU Computing?
To achieve high-performance computing.
Minimize error.
Low power consumption: go green.

NVIDIA FLEXES TESLA MUSCLE

CUDA Kernels and Threads
Parallel portions of an application are executed on the device as kernels.
One kernel is executed at a time.
Many threads execute each kernel.

Differences between CUDA and CPU threads
CUDA threads are extremely lightweight.
CUDA uses 1000s of threads to achieve efficiency.
Multi-core CPUs can use only a few.

Definitions
Device = GPU
Host = CPU
Kernel = function that runs on the device

Data Movement Example
    #include <assert.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        float *a_h, *b_h;   // host data
        float *a_d, *b_d;   // device data
        int N = 14, nBytes, i;

        nBytes = N * sizeof(float);
        a_h = (float *)malloc(nBytes);
        b_h = (float *)malloc(nBytes);
        cudaMalloc((void **)&a_d, nBytes);
        cudaMalloc((void **)&b_d, nBytes);

        for (i = 0; i < N; i++) a_h[i] = 100.f + i;

        // host -> device, device -> device, device -> host
        cudaMemcpy(a_d, a_h, nBytes, cudaMemcpyHostToDevice);
        cudaMemcpy(b_d, a_d, nBytes, cudaMemcpyDeviceToDevice);
        cudaMemcpy(b_h, b_d, nBytes, cudaMemcpyDeviceToHost);

        // the round trip must return the original values
        for (i = 0; i < N; i++) assert(a_h[i] == b_h[i]);

        free(a_h); free(b_h); cudaFree(a_d); cudaFree(b_d);
        return 0;
    }
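
The statement that many threads execute each kernel can be made concrete with a host-side emulation in plain C. This is a sketch under assumptions, not CUDA code: `scale_kernel` and `launch_scale` are hypothetical names, and the threads run one after another in ordinary loops. In real CUDA C the kernel would be marked `__global__`, launched with the `<<<grid, block>>>` syntax, and would read the built-in `blockIdx`, `blockDim`, and `threadIdx` variables instead of receiving them as arguments.

```c
#include <stddef.h>

/* Body of a hypothetical kernel: each logical thread handles one element. */
static void scale_kernel(size_t block_idx, size_t block_dim,
                         size_t thread_idx,
                         float *data, size_t n, float factor)
{
    size_t i = block_idx * block_dim + thread_idx;  /* global thread index */
    if (i < n)            /* guard: the last block may have idle threads */
        data[i] *= factor;
}

/* Host-side stand-in for a kernel launch (block_dim > 0). Runs every
 * thread of every block sequentially and returns the number of blocks. */
size_t launch_scale(float *data, size_t n, float factor, size_t block_dim)
{
    size_t grid_dim = (n + block_dim - 1) / block_dim;  /* round up */
    for (size_t b = 0; b < grid_dim; b++)
        for (size_t t = 0; t < block_dim; t++)
            scale_kernel(b, block_dim, t, data, n, factor);
    return grid_dim;
}
```

With the 14-element array from the data movement example and 4 threads per block, the launch needs 4 blocks, so 16 logical threads are created and the bounds check leaves the last 2 idle.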

10-Series Architecture
240 thread processors execute kernel threads.
30 multiprocessors, each containing:
8 thread processors
One double-precision unit
Shared memory, which enables thread cooperation
[Diagram: a multiprocessor with its thread processors, double-precision unit, and shared memory]

Execution Model
Software / Hardware
Thread / Thread Processor: Threads are executed by thread processors.
Thread Block / Multiprocessor: Thread blocks are executed on multiprocessors. Thread blocks do not migrate. Several concurrent thread blocks can reside on one multiprocessor, limited by multiprocessor resources (shared memory and register file).
Grid / Device: A kernel is launched as a grid of thread blocks. Only one kernel can execute on a device at one time.
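
How shared memory and the register file limit the number of thread blocks resident on one multiprocessor can be estimated with simple integer arithmetic. The C sketch below is a simplified model under stated assumptions: `resident_blocks` is a hypothetical helper, every hardware limit is passed in as a parameter rather than taken from a datasheet, and real hardware applies further constraints (allocation granularity, a maximum thread count per multiprocessor) that are ignored here.

```c
/* Estimate how many thread blocks fit on one multiprocessor at once.
 * All limits are caller-supplied assumptions, not vendor figures. */
unsigned resident_blocks(unsigned shared_per_block,   /* bytes per block     */
                         unsigned regs_per_thread,
                         unsigned threads_per_block,
                         unsigned sm_shared_bytes,    /* shared memory size  */
                         unsigned sm_registers,       /* register file size  */
                         unsigned sm_max_blocks)      /* hard cap on blocks  */
{
    unsigned limit = sm_max_blocks;

    /* Shared memory limit: each resident block needs its own allocation. */
    if (shared_per_block > 0) {
        unsigned by_shared = sm_shared_bytes / shared_per_block;
        if (by_shared < limit) limit = by_shared;
    }

    /* Register file limit: every thread of every block holds registers. */
    unsigned regs_per_block = regs_per_thread * threads_per_block;
    if (regs_per_block > 0) {
        unsigned by_regs = sm_registers / regs_per_block;
        if (by_regs < limit) limit = by_regs;
    }
    return limit;
}
```

As an example with assumed limits of 16384 bytes of shared memory, 16384 registers, and at most 8 blocks, a kernel using 8192 bytes of shared memory per block can keep only 2 blocks resident, whatever its register usage.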

Tesla Architecture

GigaThread Hardware Thread Scheduler
Concurrent kernel execution + faster context switch
[Figure: timelines comparing serial kernel execution with parallel kernel execution for Kernels 1 to 5]

EXAMPLE:-> MATLAB CODE FOR A SIMPLE FFT

CPU host mode:
    clear all;
    t1 = cputime;
    x = rand(2^20, 1);
    f = fft(x);
    t2 = cputime;
    t3 = t2 - t1;
Here t3 = 0.4056

Device (NVIDIA Quadro FX 5200*2):
    clear all;
    t1 = cputime;
    x = rand(2^20, 1);
    gx = gpuArray(x);
    f = fft(gx);
    t2 = cputime;
    t3 = t2 - t1;
Here t3 = 0.006056

MATLAB code for a simple ANN:
    clear all
    t1 = cputime;
    x = rand(50);
    y = rand(50);
    z = rand(50);
    a = 10; b = 20; c = 30; d = 40;
    f = a*(x^2) + b*(x*y) + c*(y^3) + d*(z^4);
    net = feedforwardnet(800);
    net = trainlm(net, x, f);
    t2 = cputime;
    t3 = t2 - t1;
For CPU t3 = 250.2154
For GPU t3 = 122.25
So on this test the GPU is about 2.05 times as fast as the CPU (250.2154 / 122.25 ≈ 2.05).
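
The comparison of the timings above is a simple ratio, spelled out here as a small C sketch. The helper names `speedup` and `percent_time_saved` are made up for this example, and the only inputs are the t3 values reported on the slide.

```c
/* Speedup = CPU time divided by GPU time (both in the same unit, > 0). */
double speedup(double cpu_seconds, double gpu_seconds)
{
    return cpu_seconds / gpu_seconds;
}

/* Share of the CPU run time that the GPU run saves, in percent. */
double percent_time_saved(double cpu_seconds, double gpu_seconds)
{
    return 100.0 * (cpu_seconds - gpu_seconds) / cpu_seconds;
}
```

Applied to the reported figures, `speedup(250.2154, 122.25)` is roughly 2.05 for the ANN test and `speedup(0.4056, 0.006056)` is roughly 67 for the FFT test. The ANN run on the GPU saves about 51% of the CPU run time.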

CONCLUSION
C for the GPU
Multi-GPU computing
Massively multi-threaded computing architecture
Compatible with industry-standard architectures

WHERE IS GPGPU USED?
MIT: educational and scientific research
Stanford University: educational and scientific research
NCSA (National Center for Supercomputing Applications)
NASA
Machine learning and AI
Machine vision (mainly robot vision)
Tablets

Acknowledgement
Mriganka Chakraborty (Prof., Seacom Engineering College)
Saibal Chakraborty
Dr. Nicolas Pinto (Prof., MIT, Advanced Supercomputing Dept.)
T. Halfhill, NVIDIA Corp
Developer Guide
Google
THANK YOU
Vpu technology &gpgpu computing

  • 1. VPU TECHNOLOGY &GPGPU COMPUTING Arka Ghosh([email protected]) B.Tech Computer Science & Engineering DELIVERED AT Seacom Engineering College,CSE Dept DATE 7 th April’2011
  • 2. What Is VPU? VPU is Visual Processing Unit it is more generally known as Graphics Processing Unit or GPU. The Graphics Processing Unit is a MASSIVELY PARALAL & MASSIVELY MULTITHREADED microprocessor. HyBrid Solutions NVIDIA SLI ATI Raedon CROSSFIREX Why GPU? GPU is used for high performance Computing . Long time ago work of GPU was to offload & accelerate graphics rendering from the CPU, but now a days the scene has changed.GPU has capability to work like a CPU,in some complex computational cases it beats the CPU. GPU Solutions:- We can get GPU in two forms 1.Integrated GPU It is integrated on the chipset of MotherBoard.It has low memory bandwidth & its latency time is much more than Dedicated ones. i.e-NVIDIA 730a Chipset provides 8200GT GPU with 540Mhz core. 2.Discrete or Dedicated GPU It is the most power full form of GPU.it is generally installed on PCIe or AGP port of MotherBoard.It has its own memory module. i.e-ATI Raedon HD 5970 X2 has Compute power of 4.64 TeraFlops with 3200 Stream Processors & 1 Ghz core © Arka Ghosh 2011
  • 3. What is PPU? PPU is physics processing unit. which specialized for calculation of rigid body dynamics, soft body dynamics, collision detection, fluid dynamics, hair and clothing simulation, finite element analysis, and fracturing of objects. LARRABEE FUSION The Main Leader of PPU is AGIA PhysX. It consists of a general purpose RISC core controlling an array of custom SIMD floating point VLIW processors working in local banked memories, with a switch-fabric to manage transfers between them. There is no cache-hierarchy as in a CPU or GPU. GPUs vs PPUs:- The drive toward GPGPU is making GPUs more and more suitable for the job of a PPU. ULTIMATE FATE OF GPU:- 1.Intel’s LARRABEE 2.AMD’s FUSION © Arka Ghosh 2011
  • 4. -:INTO THE ARCHITECTURE:- Use of SPM:- SPM or SCRATCHPAD MEMORY is a high-speed internal memory used for temporary storage of calculations, data, and other work in progress.Inreference to a microprocessor (&quot;CPU&quot;), scratchpad refers to a special high-speed memory circuit used to hold small items of data for rapid retrieval. EXAMPLE:-• NVIDIA's 8800 GPU running under CUDA provides 16KiB of Scratchpad per thread-bundle when being used for gpgpu tasks. STREAM PROCESSING:  The stream processing paradigm simplifies parallel software and hardware by restricting the parallel computation that can be performed. 1.Uniform Stream. Applications:- Compute Intensity Data Parallelism Data Locality Conventional, sequential paradigm Parallel SIMD paradigm, packed registers (SWAR) for(int el = 0; el < 100; el++) // for each vector vector_sum(result[el], source0[el], source1[el]); for(int i = 0; i < 100 * 4; i++) result[i] = source0[i] + source1[i]; © Arka Ghosh 2011
  • 5. Graphics Pipeline: The graphics pipeline typically accepts some representation of a three-dimensional scene as input and produces a 2D raster image as output. OpenGL and Direct3D are two notable graphics pipeline models accepted as widespread industry standards.
    Stages of the graphics pipeline: 1. Transformation 2. Per-vertex lighting 3. Viewing transformation 4. Primitives generation 5. Projection transformation 6. Clipping 7. Viewport transformation 8. Scan conversion or rasterization 9. Texturing, fragment shading 10. Display
    Shader: Shaders are used to program the GPU's programmable rendering pipeline, which has mostly superseded the fixed-function pipeline that allowed only common geometry transformation and pixel-shading functions; with shaders, customized effects can be used.
    Types of shader: vertex shaders, pixel shaders, geometry shaders.
    Usefulness of shaders: 1. Simplified GPU pipeline 2. Parallel processing
    Programming shaders: Shaders can be programmed using OpenGL (GLSL), Cg, and Microsoft HLSL.
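As a rough illustration of two of the stages listed above (projection transformation and viewport transformation), the following C sketch carries a single camera-space point to pixel coordinates. It assumes a simple pinhole projection with a hypothetical `focal` parameter; real pipelines use 4x4 matrices and homogeneous coordinates, and the type and function names here are made up for the example.

```c
#include <assert.h>

typedef struct { float x, y, z; } vec3;   /* point in camera space */
typedef struct { float x, y; } pixel;     /* position in the raster image */

/* Projection transformation (pinhole model, an assumption of this sketch):
 * divide by depth to get normalized coordinates, nominally in [-1, 1].
 * The point must lie in front of the camera (p.z > 0). */
vec3 project(vec3 p, float focal)
{
    assert(p.z > 0.0f);
    vec3 ndc = { focal * p.x / p.z, focal * p.y / p.z, p.z };
    return ndc;
}

/* Viewport transformation: map [-1, 1] to pixel coordinates,
 * flipping y because raster rows grow downward. */
pixel to_viewport(vec3 ndc, int width, int height)
{
    pixel px = { (ndc.x + 1.0f) * 0.5f * (float)width,
                 (1.0f - ndc.y) * 0.5f * (float)height };
    return px;
}
```

A point on the optical axis, such as (0, 0, 5), lands at the center of the image: (320, 240) for a 640 x 480 viewport.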
  • 6. GPU Cluster
    What is a cluster? In a GPU cluster, each node of the cluster is equipped with a GPU. A cluster may be: 1. Homogeneous 2. Heterogeneous
    Components. Hardware (other): interconnect. Software: 1. Operating system 2. GPU driver for each type of GPU present in each cluster node 3. Clustering API (such as the Message Passing Interface, MPI)
    Algorithm mapping.
    GPU switching: switching from one cluster node to another. Windows switching. Linux switching.
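The slide names algorithm mapping without detail. One common way to map work onto cluster nodes is a block distribution, sketched below in plain C. This is an assumed scheme, not something the slides specify; `range` and `node_range` are hypothetical names. In an MPI program the `rank` and `nodes` values would typically come from `MPI_Comm_rank` and `MPI_Comm_size`.

```c
#include <assert.h>

typedef struct { long begin, end; } range;   /* half-open interval [begin, end) */

/* Block distribution: split n work items over `nodes` cluster nodes as
 * evenly as possible. The first (n % nodes) ranks get one extra item. */
range node_range(long n, int nodes, int rank)
{
    assert(n >= 0 && nodes > 0 && rank >= 0 && rank < nodes);
    long base  = n / nodes;   /* items every node receives */
    long extra = n % nodes;   /* leftover items to hand out */
    range r;
    r.begin = rank * base + (rank < extra ? rank : extra);
    r.end   = r.begin + base + (rank < extra ? 1 : 0);
    return r;
}
```

For 10 items on 4 nodes this yields the ranges [0, 3), [3, 6), [6, 8), and [8, 10). In a heterogeneous cluster the split would more likely be weighted by each node's GPU throughput.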
  • 7. What Is GPGPU? GPGPU stands for general-purpose computing on graphics processing units: using the GPU for work traditionally done by the CPU.
    NVIDIA CUDA: a GPGPU computing architecture. It provides a heterogeneous computing environment.
    Why GPU computing? To achieve high-performance computing. To minimize error. Low power consumption: go green.
    NVIDIA flexes Tesla muscle.
  • 8. CUDA Kernels and Threads
    Parallel portions of an application are executed on the device as kernels. One kernel is executed at a time. Many threads execute each kernel.
    Differences between CUDA and CPU threads: CUDA threads are extremely lightweight. CUDA uses thousands of threads to achieve efficiency; multi-core CPUs can use only a few.
    Definitions: Device = GPU. Host = CPU. Kernel = function that runs on the device.
    Data movement example:
      #include <assert.h>
      #include <stdlib.h>
      #include <cuda_runtime.h>

      int main(void)
      {
          float *a_h, *b_h;   // host data
          float *a_d, *b_d;   // device data
          int N = 14, nBytes, i;

          nBytes = N * sizeof(float);
          a_h = (float *)malloc(nBytes);
          b_h = (float *)malloc(nBytes);
          cudaMalloc((void **)&a_d, nBytes);
          cudaMalloc((void **)&b_d, nBytes);

          for (i = 0; i < N; i++) a_h[i] = 100.f + i;

          // host -> device, device -> device, device -> host
          cudaMemcpy(a_d, a_h, nBytes, cudaMemcpyHostToDevice);
          cudaMemcpy(b_d, a_d, nBytes, cudaMemcpyDeviceToDevice);
          cudaMemcpy(b_h, b_d, nBytes, cudaMemcpyDeviceToHost);

          // the data must survive the round trip unchanged
          for (i = 0; i < N; i++) assert(a_h[i] == b_h[i]);

          free(a_h); free(b_h);
          cudaFree(a_d); cudaFree(b_d);
          return 0;
      }
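The slide says many threads execute each kernel but does not show how a thread finds its own data. The sketch below emulates that on the host in plain C, under the assumption of a 1-D launch. In real CUDA the two loops do not exist: the hardware runs the body once per thread, and the index is computed as `blockIdx.x * blockDim.x + threadIdx.x`. The function names are hypothetical.

```c
#include <assert.h>

/* "Kernel" body: the work one CUDA thread would do for element i. */
static void add_one_element(float *out, const float *a, const float *b, int i)
{
    out[i] = a[i] + b[i];
}

/* Host-side emulation of launching a 1-D grid of thread blocks.
 * The loops only stand in for the hardware scheduler. */
void launch_emulated(float *out, const float *a, const float *b,
                     int n, int block_dim)
{
    assert(n >= 0 && block_dim > 0);
    int grid_dim = (n + block_dim - 1) / block_dim;   /* round up */
    for (int block_idx = 0; block_idx < grid_dim; block_idx++) {
        for (int thread_idx = 0; thread_idx < block_dim; thread_idx++) {
            /* Global index, as a CUDA thread would compute it. */
            int i = block_idx * block_dim + thread_idx;
            if (i < n)   /* guard: the last block may be partly idle */
                add_one_element(out, a, b, i);
        }
    }
}
```

With N = 14 and 4 threads per block, 4 blocks are launched and the last two threads of the final block do nothing, which is why the bounds check is needed.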
  • 9. 10-Series Architecture: 240 thread processors execute kernel threads. There are 30 multiprocessors, each containing 8 thread processors, one double-precision unit, and shared memory that enables thread cooperation. [Figure: a multiprocessor with its thread processors, double-precision unit, and shared memory]
  • 10. Execution Model (software mapped to hardware)
    Thread, thread processor: threads are executed by thread processors.
    Thread block, multiprocessor: thread blocks are executed on multiprocessors. Thread blocks do not migrate. Several concurrent thread blocks can reside on one multiprocessor, limited by multiprocessor resources (shared memory and register file).
    Grid, device: a kernel is launched as a grid of thread blocks. Only one kernel can execute on a device at one time.
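The resource limit mentioned on this slide can be put into rough numbers. The C sketch below estimates how many thread blocks fit on one multiprocessor given its shared memory and register file. It is a simplified model under stated assumptions: real hardware also applies allocation granularity and per-multiprocessor thread limits, and every parameter name and value is hypothetical rather than taken from a specific GPU.

```c
#include <assert.h>

static int min_int(int a, int b) { return a < b ? a : b; }

/* Estimate the number of thread blocks resident on one multiprocessor.
 * smem_* are in bytes, regs_* in registers; max_blocks is a hardware cap. */
int resident_blocks(int smem_per_sm, int regs_per_sm,
                    int smem_per_block, int regs_per_thread,
                    int threads_per_block, int max_blocks)
{
    assert(smem_per_block >= 0 && regs_per_thread > 0);
    assert(threads_per_block > 0 && max_blocks > 0);

    /* Limit from shared memory (no limit if the block uses none). */
    int by_smem = smem_per_block > 0 ? smem_per_sm / smem_per_block
                                     : max_blocks;
    /* Limit from the register file. */
    int by_regs = regs_per_sm / (regs_per_thread * threads_per_block);

    return min_int(max_blocks, min_int(by_smem, by_regs));
}
```

For example, with 16384 bytes of shared memory and 16384 registers per multiprocessor, a block of 256 threads using 32 registers per thread and 1024 bytes of shared memory is limited by registers to 2 resident blocks.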
  • 11. Tesla Architecture [figure]
  • 12. GigaThread Hardware Thread Scheduler: concurrent kernel execution plus faster context switch. [Figure: timelines comparing serial kernel execution with parallel kernel execution for Kernels 1 to 5; horizontal axis: time]
  • 13. Example: MATLAB code for a simple FFT
    CPU (host) mode:
      clear all;
      t1=cputime; x=rand(2^20,1); f=fft(x); t2=cputime; t3=t2-t1;
    Here t3 = 0.4056.
    Device mode (NVIDIA Quadro FX 5200 x2):
      clear all;
      t1=cputime; x=rand(2^20,1); gx=gpuArray(x); f=fft(gx); t2=cputime; t3=t2-t1;
    Here t3 = 0.006056.
    MATLAB code for a simple ANN:
      clear all;
      t1=cputime;
      x=rand(50); y=rand(50); z=rand(50);
      a=10; b=20; c=30; d=40;
      f=a*(x^2)+b*(x*y)+c*(y^3)+d*(z^4);
      net=feedforwardnet(800);
      net=trainlm(net,x,f);
      t2=cputime; t3=t2-t1;
    For the CPU, t3 = 250.2154. For the GPU, t3 = 122.25.
    So for the ANN, the GPU is about 2.05 times as fast as the CPU (250.2154 / 122.25 ≈ 2.05).
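The comparison between the CPU and GPU timings on this slide comes down to a ratio. As a small worked check, the C sketch below computes the speedup and the percentage of time saved from two measured times; the function names are hypothetical and the inputs are the t3 values reported on the slide.

```c
#include <assert.h>

/* Speedup: how many times faster the GPU run is than the CPU run. */
double speedup(double t_cpu, double t_gpu)
{
    assert(t_cpu > 0.0 && t_gpu > 0.0);
    return t_cpu / t_gpu;
}

/* Share of the CPU run time that the GPU run saves, in percent. */
double percent_time_saved(double t_cpu, double t_gpu)
{
    assert(t_cpu > 0.0 && t_gpu > 0.0);
    return 100.0 * (1.0 - t_gpu / t_cpu);
}
```

Using the slide's numbers, `speedup(250.2154, 122.25)` is about 2.05 for the ANN and `speedup(0.4056, 0.006056)` is about 67 for the FFT, while the ANN run saves roughly 51% of the CPU time.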
  • 14. Conclusion: C for the GPU. Multi-GPU computing. Massively multithreaded computing architecture. Compatible with industry-standard architectures.
    Where is GPGPU used? MIT, for educational and scientific research. Stanford University, for educational and scientific research. NCSA (National Center for Supercomputing Applications). NASA. Machine learning and AI. Machine vision (mainly robot vision). Tablets.
  • 15. Acknowledgement: Mriganka Chakraborty (Prof., Seacom Engineering College); Saibal Chakraborty; Dr. Nicolas Pinto (Prof., MIT, Advanced Supercomputing Dept); T. Halfhill, NVIDIA Corp; Developer Guide; Google.