PPoPP 2010, Bangalore, India.
Basic Architecture Concepts
CPU Architecture
4 stages of instruction execution
Too many cycles per instruction (CPI)
To reduce the CPI, introduce pipelined execution
Needs buffers to store results across stages.
A cache to handle slow memory access times
Multilevel caches, out-of-order execution, branch prediction, ...
[Figure: pipeline timing diagram. Four instructions, each passing through Fetch, Decode, Execute, Write; time axis t = 1 to 5; a Cache block.]
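The effect of pipelining on CPI can be illustrated with a small calculation. This is a minimal sketch under idealized assumptions that are not stated in the slides: every stage takes exactly one cycle, and there are no stalls, hazards, or cache misses. The function names are hypothetical.

```python
def cycles_unpipelined(num_instructions: int, num_stages: int = 4) -> int:
    """Each instruction runs through all stages before the next one starts."""
    return num_instructions * num_stages


def cycles_pipelined(num_instructions: int, num_stages: int = 4) -> int:
    """Ideal pipeline: the first instruction needs num_stages cycles to fill
    the pipeline, and every later instruction completes one cycle after the
    previous one."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


def cpi(total_cycles: int, num_instructions: int) -> float:
    """Average cycles per instruction."""
    return total_cycles / num_instructions


if __name__ == "__main__":
    for n in (1, 4, 1000):
        print(n, cpi(cycles_unpipelined(n), n), cpi(cycles_pipelined(n), n))
```

Under these assumptions the unpipelined CPI stays at 4, while the pipelined CPI approaches 1 as the instruction count grows. Real pipelines fall short of this because of the stalls the sketch ignores.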
Basic Architecture Concepts
CPU architecture getting too complex.
Not translating to equivalent performance benefits
Need a rethink on traditional CPU architectures.
Basic Architecture Concepts
Coupled with this is the new wisdom in computer architecture:
Memory Wall: memory latencies far higher than processor cycle times
ILP Wall: diminishing benefits from instruction-level parallelism
Power Wall: power consumption increases with clock rate
Multi-core is the way forward
Ex: GPUs, Cell, Intel Quad core, ...
Predicted that 100+ core computers would be a reality soon.
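The Power Wall argument can be made concrete with the common first-order approximation for dynamic switching power, P = a * C * V^2 * f. The slides do not give this formula; it is a standard textbook model. Every number below is a hypothetical value chosen only for illustration, including the assumed 1.3x voltage increase needed to double the clock.

```python
def dynamic_power(capacitance, voltage, frequency, activity=1.0):
    """Approximate dynamic switching power in watts: activity * C * V^2 * f."""
    return activity * capacitance * voltage ** 2 * frequency


# Hypothetical baseline: 1 nF switched capacitance, 1.0 V supply, 2 GHz clock.
C, V, F = 1e-9, 1.0, 2e9

one_core = dynamic_power(C, V, F)

# Option A: one core at twice the clock. A faster clock often needs a higher
# supply voltage as well; the 1.3x factor here is an assumption.
fast_core = dynamic_power(C, 1.3 * V, 2 * F)

# Option B: two cores at the baseline clock and voltage.
two_cores = 2 * one_core

if __name__ == "__main__":
    print("fast core / baseline:", fast_core / one_core)
    print("two cores / baseline:", two_cores / one_core)
```

With these assumed numbers, doubling the clock costs about 3.4 times the baseline power, while doubling the core count costs 2 times. Both options offer at most twice the peak throughput, and the second one only if the workload can be parallelized.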
Multicore and Manycore Processors
IBM Cell
NVidia GeForce 8800, which includes 128 scalar processors, and Tesla
Sun T1 and T2
Tilera Tile64
Picochip combines 430 simple RISC cores
Cisco 188
TRIPS
The Case for the GPUs
GPUs are now common. They also have high computing power per dollar, compared to the CPU
Today's computer system has a CPU and a GPU, with the GPU being used primarily for graphics.
GPUs are good at some tasks and not so good at others.
They are especially good at processing large data such as images.
Let us use the right processor for the right task.
Goal: Increase the overall throughput of the computer system on the given task. Use CPU and GPU synergistically.
Evolution of GPUs
Graphics: a few hundred triangles/vertices map to a few hundred thousand pixels
Process pixels in parallel. Do the same thing on a large number of different items.
Data parallel model: parallelism provided by the data
Thousands to millions of data elements
Same program/instruction on all of them
Hardware: 8-16 cores to process vertices and 64-128 to process pixels by 2005
Less versatile than CPU cores
SIMD mode of computations. Less hardware for instruction issue
No caching, branch prediction, out-of-order execution, etc.
Can pack more cores in same silicon die area
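The data parallel model described above can be sketched in a few lines. This is an illustration only: `brighten` and `launch` are hypothetical names, and the "launch" runs sequentially on the CPU as a stand-in for hardware that would run one logical thread per element.

```python
def brighten(pixel: int, amount: int = 40) -> int:
    """The per-element program: the same instructions for every pixel."""
    return min(255, pixel + amount)


def launch(kernel, data):
    """Stand-in for a data-parallel launch, one logical thread per element.

    The loop is sequential here, but no element depends on any other, so the
    elements could be processed in any order or all at once.
    """
    return [kernel(element) for element in data]


if __name__ == "__main__":
    pixels = [0, 100, 250]
    print(launch(brighten, pixels))
```

The parallelism comes entirely from the size of `data`: the program stays the same whether there are three pixels or three million.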
GPUs as a Case Study
GPGPU: General Purpose Programming on GPUs
OpenGL extensions
Very difficult to program
Recently, manufacturers started supporting C-like programming abstractions for GPUs
CUDA from NVidia
Other benefits of GPGPU
Affordable cost, easy availability, computational power
GPUs as a Case Study
GPUs suited for routines with high arithmetic intensity.
One feature is high memory latency, depending on the nature of access.
Should overlap memory accesses with arithmetic.
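Arithmetic intensity is not defined in the slides. A common definition, assumed here, is arithmetic operations performed per byte of memory traffic. The sketch below compares two routines under simplifying assumptions: single-precision (4-byte) values, and for matrix multiplication the best case in which each matrix crosses the memory interface exactly once. The function names are hypothetical.

```python
BYTES_PER_FLOAT = 4  # assumption: single-precision values


def vector_add_intensity(n: int) -> float:
    """c[i] = a[i] + b[i] for n elements."""
    flops = n                               # one add per element
    bytes_moved = 3 * n * BYTES_PER_FLOAT   # read a[i], read b[i], write c[i]
    return flops / bytes_moved


def matmul_intensity(n: int) -> float:
    """C = A * B for n x n matrices."""
    flops = 2 * n ** 3                          # n^3 multiply-add pairs
    bytes_moved = 3 * n * n * BYTES_PER_FLOAT   # best case: each matrix moved once
    return flops / bytes_moved


if __name__ == "__main__":
    print("vector add:", vector_add_intensity(1_000_000))
    print("matmul n=600:", matmul_intensity(600))
```

Vector addition stays at 1/12 operation per byte regardless of size, so it is limited by memory. Matrix multiplication grows as n/6 in this best case, which leaves plenty of arithmetic to overlap with memory accesses.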
CPU Vs GPU
Few powerful cores Vs. lots of small cores
GPUs: For good performance, applications need high arithmetic intensity
GPUs: No system managed cache.
GPGPU as a Case Study
Regular algorithms
Map well to data parallel model of GPUs
Each work item operates by itself or with a few neighbors
Example settings: image processing.
Threads can share data, e.g., apron pixels in an image processing kernel.
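The apron idea can be sketched on the CPU. In the version below, each tile of the image is copied into a local buffer together with a one-pixel apron, and every pixel of the tile then reads its 3x3 neighborhood from that shared local copy. This is an illustrative assumption about how such a kernel is organized, not code from the slides; the function names, the 3x3 mean filter, the tile size, and the clamped borders are all hypothetical choices.

```python
def box_blur_direct(img):
    """Reference 3x3 mean filter with clamped borders (integer division)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total // 9
    return out


def load_tile_with_apron(img, y0, x0, tile, apron=1):
    """Copy a tile plus its apron into a local buffer.

    The local buffer plays the role of memory shared by the threads of a tile.
    """
    h, w = len(img), len(img[0])
    size = tile + 2 * apron
    local = [[0] * size for _ in range(size)]
    for ly in range(size):
        for lx in range(size):
            yy = min(max(y0 + ly - apron, 0), h - 1)
            xx = min(max(x0 + lx - apron, 0), w - 1)
            local[ly][lx] = img[yy][xx]
    return local


def box_blur_tiled(img, tile=4):
    """Same filter, computed tile by tile from the shared local copies."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y0 in range(0, h, tile):                # one "thread block" per tile
        for x0 in range(0, w, tile):
            local = load_tile_with_apron(img, y0, x0, tile)
            for ty in range(min(tile, h - y0)):      # one "thread" per pixel
                for tx in range(min(tile, w - x0)):
                    total = sum(local[ty + dy][tx + dx]
                                for dy in range(3) for dx in range(3))
                    out[y0 + ty][x0 + tx] = total // 9
    return out
```

Each apron pixel is loaded once per tile but read by several neighboring work items, which is the data sharing the slide refers to.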
GPU as a Case Study
Irregular algorithms
Applications with data accesses that are not regular in nature.
Occur in settings such as graph algorithms, building data structures, etc.
Difficult to get high efficiency due to high memory latency of accesses.
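A small graph example shows what an irregular access pattern looks like. The sketch below stores a graph in compressed sparse row (CSR) form, a representation chosen here for illustration and not named in the slides, and sums the values of each vertex's neighbors. The function name and the example graph are hypothetical.

```python
def neighbor_sums(row_ptr, col_idx, values):
    """For each vertex, add up the values stored at its neighbors."""
    sums = []
    for v in range(len(row_ptr) - 1):
        total = 0
        # The number of iterations differs per vertex (uneven work per item).
        for k in range(row_ptr[v], row_ptr[v + 1]):
            # Indirect read: the location touched depends on the data in col_idx.
            total += values[col_idx[k]]
        sums.append(total)
    return sums


if __name__ == "__main__":
    # Edges: 0 -> 1, 3; 1 -> 2; 2 -> 0, 1, 3; vertex 3 has no outgoing edges.
    row_ptr = [0, 2, 3, 6, 6]
    col_idx = [1, 3, 2, 0, 1, 3]
    values = [10, 20, 30, 40]
    print(neighbor_sums(row_ptr, col_idx, values))
```

Unlike the image case, adjacent work items here read scattered locations and do different amounts of work, so the memory system cannot serve them with a few wide, predictable accesses.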
GPGPU Tools and APIs
OpenGL
CUDA
OpenCL
Brook

Basic Computer Architecture and the Case for GPUs : NOTES