Parallel Programming Course Project Proposal
Team: Ankit Kumar [11455], Akhil Gupta [11595]
AIM
Real-time Image Reconstruction using large systems of linear equations for
Medical Imaging
Introduction to the problem
In medicine, computed tomographic images are reconstructed from a large number
of X-ray measurements of the patient (projection data), using large systems of
equations. For real-time imaging, computation needs to be fast enough to match
the speed of the incoming streaming data to obtain correct image
reconstruction.
Related Work
QR factorization is a direct method in matrix algebra which involves the
decomposition of a matrix M of dimensions A x B into the product of an
orthogonal matrix Q (Q^T = Q^-1) and an upper triangular matrix R. Tourino et
al [2] presented a parallel algorithm for the QR factorization with column
pivoting of a sparse matrix by means of Givens rotations and applied it to the
least squares problem using PVM on multiprocessors.
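
To make the Givens-rotation approach concrete, the following is a minimal sequential sketch in Python. It is not the sparse, column-pivoted parallel algorithm of [2]: it works on a small dense matrix stored as a list of rows, assumes full column rank, and the function names givens_qr and solve_least_squares are chosen here purely for illustration.

```python
import math

def givens_qr(a):
    """Return (q, r) with a = q * r, q orthogonal and r upper triangular."""
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]  # working copy that becomes R
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):                      # column being cleared
        for i in range(m - 1, j, -1):       # zero r[i][j], bottom row first
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue                    # entry is already zero
            h = math.hypot(x, y)
            c, s = x / h, y / h
            # Apply the rotation G = [[c, s], [-s, c]] to rows i-1 and i of R.
            for k in range(n):
                u, v = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * u + s * v
                r[i][k] = -s * u + c * v
            # Accumulate Q = Q * G^T by rotating columns i-1 and i of Q.
            for k in range(m):
                u, v = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * u + s * v
                q[k][i] = -s * u + c * v
    return q, r

def solve_least_squares(a, b):
    """Minimise ||a x - b|| via QR (assumes a has full column rank)."""
    q, r = givens_qr(a)
    m, n = len(a), len(a[0])
    # First n entries of Q^T b.
    qtb = [sum(q[k][i] * b[k] for k in range(m)) for i in range(n)]
    # Back substitution on the top n x n block of R.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = qtb[i] - sum(r[i][k] * x[k] for k in range(i + 1, n))
        x[i] = acc / r[i][i]
    return x
```

For example, solve_least_squares([[1, 1], [1, 2], [1, 3]], [1, 2, 2]) fits a line through three points and should return approximately [0.667, 0.5]. Each rotation touches only two rows, which is what makes the method attractive for sparse systems and for parallel execution.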
Volkov and Demmel [4] presented strong performance results for a general
matrix product in the context of a QR algorithm. However, they performed part
of the computation on the CPU and then transferred it to the GPU, without
accounting for the transfer times.
Kerr et al [3] have demonstrated an implementation of QR decomposition that
runs entirely on the GPU, utilizing the register file over shared memory for
performance gains. They achieved nearly 5x speedup for large matrices over
Intel's MKL native QR algorithm.
However, the above works have been done only on static computational loads.
In our work, we intend to take their work one step further by investigating
load balancing between CPU and GPU, and by utilizing the strengths of both
multiple processes and CUDA threads to apply the above work on real-time
streaming data.
Proposed Work
The system of linear equations can be solved sequentially or in parallel
depending on the rate and size of the incoming data. Since the data arrives at
some rate, the objective is to strike a balance between spawning new processes
and launching CUDA kernels so as to achieve the optimum rate of processing.
For this, we need:
i) a real-time predictor which can analyze the size of the incoming input and
correspondingly schedule computations on either host or device (it will act as
an in-node scheduler).
ii) the current computations to run in parallel on multiple nodes of GPUs
using a hybrid MPI-CUDA strategy.
iii) a control system which monitors a sliding computation window and tries to
keep its size within a certain limit. The sliding window acts as a work list
to allocate work to different nodes, which will also be effective in balancing
load among the different nodes of GPUs.
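
Items i) and iii) can be illustrated together with a small single-node sketch in Python. Everything here is an assumption made for illustration: the class name SlidingWindowScheduler, the fixed row-count cutoff device_threshold standing in for the predictor, and the max_window limit are hypothetical, and a real implementation would dispatch to MPI ranks and CUDA kernels rather than return labels.

```python
from collections import deque

class SlidingWindowScheduler:
    """Toy in-node scheduler: bounded work list plus size-based routing."""

    def __init__(self, max_window, device_threshold):
        self.max_window = max_window              # assumed cap on pending blocks
        self.device_threshold = device_threshold  # assumed size cutoff, in rows
        self.window = deque()                     # pending (block_id, rows, target)

    def route(self, rows):
        # Assumption: small systems stay on the host, large ones go to the device.
        return "device" if rows >= self.device_threshold else "host"

    def submit(self, block_id, rows):
        """Queue a block; return its target, or None if the window is full."""
        if len(self.window) >= self.max_window:
            return None  # back-pressure: caller must wait or add workers
        target = self.route(rows)
        self.window.append((block_id, rows, target))
        return target

    def take(self, target):
        """Hand the oldest pending block for `target` to an idle worker."""
        for i, item in enumerate(self.window):
            if item[2] == target:
                del self.window[i]
                return item
        return None
```

A None result from submit is the signal the control system would react to, for instance by launching more workers or shifting the cutoff between host and device. The fixed threshold is only a placeholder for a predictor that would also take the arrival rate into account.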
We will compare the performance of our implementation against the GPU-only
version by Kerr [3] as well as the multiprocess-only version by Tourino [2],
by supplying these algorithms with data at the same rate and volume as our
implementation.
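
One way to keep such a comparison fair is to record a single input stream and replay it against each solver. The sketch below, in Python, simulates this rather than timing real hardware; the names make_stream and replay, the exponential inter-arrival gaps, the block sizes of 100 to 1000 rows, and the single-server queueing model are all assumptions for illustration.

```python
import random

def make_stream(n_blocks, mean_gap, seed=0):
    """Return a reproducible list of (arrival_time, rows) records."""
    rng = random.Random(seed)
    t = 0.0
    stream = []
    for _ in range(n_blocks):
        t += rng.expovariate(1.0 / mean_gap)  # assumed random arrival gaps
        stream.append((t, rng.randint(100, 1000)))
    return stream

def replay(stream, service_time):
    """Replay a stream through one solver; return (worst latency, finish time).

    service_time(rows) is the solver's processing time for a block, which
    would come from measurements in a real experiment.
    """
    free_at = 0.0   # time at which the solver next becomes idle
    worst = 0.0     # largest arrival-to-completion delay seen so far
    for arrival, rows in stream:
        start = max(arrival, free_at)
        free_at = start + service_time(rows)
        worst = max(worst, free_at - arrival)
    return worst, free_at
```

Passing the same stream to replay with each solver's measured service_time gives directly comparable worst-case latencies, since every solver sees identical arrival times and block sizes.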
References
[1] María José Rodríguez-Alvarez, Filomeno Sánchez, Antonio Soriano, Amadeo
Iborra. Sparse Givens resolution of large system of linear equations:
Applications to image reconstruction. Mathematical and Computer Modelling
52 (7-8), 2010, 1258-1264
[2] Juan Tourino, Ramon Doallo, Emilio L. Zapata. Sparse Givens QR
Factorization on a Multiprocessor, 4th Euromicro Workshop on Parallel and
Distributed Processing (PDP'96), January 24-26, 1996, Portugal
[3] Andrew Kerr, Dan Campbell, Mark Richards. QR decomposition on GPUs,
Proceedings of 2nd Workshop on General Purpose Processing on Graphics
Processing Units, 2009, Pages 71-78
[4] Vasily Volkov, James W. Demmel. Benchmarking GPUs to tune dense linear
algebra, Proceedings of the 2008 ACM/IEEE conference on Supercomputing,
Article No. 31