Parallel Programming Course Project Proposal

The proposed project aims to implement real-time image reconstruction using QR factorization to solve large systems of linear equations for medical imaging applications. The work will investigate load balancing between the CPU and GPU to process streaming data. It will schedule computations dynamically based on input size, using both multiple processes and CUDA threads. Performance will be measured against GPU-only and multi-process only implementations.

Uploaded by

Akhil Gupta
Parallel Programming Course Project Proposal

Team: Ankit Kumar [11455], Akhil Gupta [11595]

AIM
Real-time image reconstruction using large systems of linear equations for medical imaging

Introduction to the problem
In medicine, computed tomographic images are reconstructed from a large number of X-ray measurements of the patient (projection data) by solving large systems of equations. For real-time imaging, the computation needs to be fast enough to match the speed of the incoming streaming data in order to obtain a correct image reconstruction.

Related Work
QR factorization is a direct method in matrix algebra which involves the decomposition of a matrix $M$ of dimensions $A \times B$ into the product of an orthogonal matrix $Q$ ($Q^T = Q^{-1}$) and an upper triangular matrix $R$. Tourino et al. [2] presented a parallel algorithm for the QR factorization with column pivoting of a sparse matrix by means of Givens rotations and applied it to the least squares problem using PVM on multiprocessors.
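
To make the Givens-based factorization concrete, the following is a minimal dense sketch in plain Python. It is not the algorithm of [2]: it assumes a small dense matrix with full column rank, applies no column pivoting, and ignores sparsity. The function names `givens_qr` and `solve_least_squares` are illustrative only.

```python
import math


def givens_qr(M):
    """Factor M (list of rows) as M = Q R using Givens rotations.

    Q is orthogonal (rows x rows) and R is upper triangular (rows x cols).
    Dense, unpivoted sketch; assumes a small matrix.
    """
    rows, cols = len(M), len(M[0])
    R = [[float(v) for v in row] for row in M]
    # Qt accumulates the product of all rotations, so that Qt * M = R.
    Qt = [[1.0 if i == j else 0.0 for j in range(rows)] for i in range(rows)]
    for j in range(min(rows, cols)):
        # Zero column j below the diagonal, working from the bottom row up.
        for i in range(rows - 1, j, -1):
            a, b = R[i - 1][j], R[i][j]
            if b == 0.0:
                continue
            r = math.hypot(a, b)
            c, s = a / r, b / r
            # Apply the same rotation to rows i-1 and i of both R and Qt.
            for A in (R, Qt):
                upper, lower = A[i - 1], A[i]
                A[i - 1] = [c * u + s * v for u, v in zip(upper, lower)]
                A[i] = [-s * u + c * v for u, v in zip(upper, lower)]
    Q = [list(col) for col in zip(*Qt)]  # Q is the transpose of Qt
    return Q, R


def solve_least_squares(M, y):
    """Minimize ||M x - y|| via QR; assumes rows >= cols and full column rank."""
    Q, R = givens_qr(M)
    n = len(M[0])
    # First n entries of Q^T y.
    qty = [sum(Q[k][i] * y[k] for k in range(len(y))) for i in range(n)]
    # Back substitution on the top n x n block of R.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = qty[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))
        x[i] = acc / R[i][i]
    return x
```

For example, `solve_least_squares([[1, 1], [1, 2], [1, 3]], [1, 2, 3])` fits the line through the points (1, 1), (2, 2), (3, 3) and should return values close to `[0.0, 1.0]` up to rounding.
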
Volkov and Demmel [4] presented strong performance results for a general matrix product in the context of a QR algorithm. However, they have done part of the computation on the CPU and then transferred it to the GPU, without accounting for the transfer times.
Kerr et al. [3] have demonstrated an implementation of QR decomposition that runs entirely on the GPU, favoring the register file over shared memory for performance gains. They achieved a nearly 5x speedup for large matrices over the native QR algorithm in Intel's MKL.
However, the above works have been done only on static computational loads. In our work, we intend to take them one step further by investigating load balancing between the CPU and the GPU, using the strengths of both multiple processes and CUDA threads to apply the above work to real-time streaming data.

Proposed Work
The system of linear equations can be solved sequentially or in parallel depending on the rate and size of the incoming data. Since the data arrives at some rate, the objective is to strike a balance between spawning new processes and launching CUDA kernels so as to achieve the best rate of processing.

For this, we need:
i) a real-time predictor which can analyze the size of the incoming input and correspondingly schedule computations on either the host or the device (it will act as an in-node scheduler).
ii) a hybrid MPI-CUDA strategy, so that the current computations run in parallel on multiple nodes of GPUs.
iii) a control system which monitors a sliding computation window and tries to keep its size within a certain limit. The sliding window acts as a worklist for allocating work to different nodes, which will also be effective in balancing load among the different nodes of GPUs.
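
As a rough illustration of how items i) and iii) could fit together, the sketch below pairs a size-based predictor with a bounded worklist. Everything in it is hypothetical: the names `choose_target` and `SlidingWindow`, the threshold of 1024 equations, and the least-loaded-node policy are assumptions made for the example, not design decisions taken from the proposal. The MPI and CUDA parts of item ii) are deliberately left out.

```python
from collections import deque

HOST, DEVICE = "host", "device"


def choose_target(num_equations, device_threshold=1024):
    """Hypothetical predictor: small systems stay on the CPU, large ones go to the GPU.

    The threshold is a placeholder; in practice it would be tuned by measurement.
    """
    return DEVICE if num_equations >= device_threshold else HOST


class SlidingWindow:
    """Bounded worklist that hands jobs to the least-loaded node."""

    def __init__(self, max_size, num_nodes):
        self.max_size = max_size
        self.pending = deque()        # jobs waiting inside the window
        self.load = [0] * num_nodes   # equations currently assigned to each node

    def submit(self, job_id, num_equations):
        """Admit a job, or return False when the window is already full."""
        if len(self.pending) >= self.max_size:
            return False
        self.pending.append((job_id, num_equations))
        return True

    def dispatch(self):
        """Pop the oldest job and return (job_id, node, target), or None if empty."""
        if not self.pending:
            return None
        job_id, num_equations = self.pending.popleft()
        node = min(range(len(self.load)), key=self.load.__getitem__)
        self.load[node] += num_equations
        return job_id, node, choose_target(num_equations)

    def complete(self, node, num_equations):
        """Record that a node has finished a job of the given size."""
        self.load[node] -= num_equations
```

A `False` return from `submit` is the signal a control system could use to throttle the source or to add workers, which is one way to keep the window size within its limit.
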

We will measure the performance of our implementation against the GPU-only version by Kerr [3] as well as the multi-process-only version by Tourino [2], by supplying these algorithms with data at the same rate and volume as our implementation.
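
One way to reason about such a comparison before running it is to replay the same arrival schedule against each implementation's per-job solve times. The sketch below is a simplified single-server model with a simulated clock; the function name `replay` and the numbers in the usage note are illustrative assumptions, not measurements.

```python
def replay(arrival_interval, service_times):
    """Feed jobs at a fixed interval to one solver and return per-job latencies.

    arrival_interval: time between consecutive job arrivals
    service_times:    time the solver needs for each job, in arrival order
    Latency is measured from a job's arrival to its completion, so it grows
    without bound when the solver cannot keep up with the stream.
    """
    finish = 0.0
    latencies = []
    for k, service in enumerate(service_times):
        arrival = k * arrival_interval
        start = max(arrival, finish)   # wait for the previous job if still busy
        finish = start + service
        latencies.append(finish - arrival)
    return latencies
```

With jobs arriving every 1.0 time units, a solver that needs 0.5 per job gives `replay(1.0, [0.5, 0.5, 0.5]) == [0.5, 0.5, 0.5]`, while one that needs 2.0 per job falls behind: `replay(1.0, [2.0, 2.0]) == [2.0, 3.0]`.
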

References
[1] María José Rodríguez-Alvarez, Filomeno Sánchez, Antonio Soriano, Amadeo Iborra. Sparse Givens resolution of large system of linear equations: Applications to image reconstruction. Mathematical and Computer Modelling 52 (7-8), 2010, 1258-1264.

[2] Juan Tourino, Ramon Doallo, Emilio L. Zapata. Sparse Givens QR Factorization on a Multiprocessor. 4th Euromicro Workshop on Parallel and Distributed Processing (PDP'96), January 24-26, 1996, Portugal.

[3] Andrew Kerr, Dan Campbell, Mark Richards. QR decomposition on GPUs. Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, 2009, Pages 71-78.

[4] Vasily Volkov, James W. Demmel. Benchmarking GPUs to tune dense linear algebra. Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No. 31.
