Distributed Hash Tables
DHTs
Like it sounds: a distributed hash table
Put(Key, Value)
Get(Key) -> Value
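A minimal sketch of what that interface looks like to an application; the DHTClient name is invented, and an in-process dict stands in for the actual distributed store:

# Illustrative only: a local dict plays the role of the distributed store.
class DHTClient:
    def __init__(self):
        self._store = {}

    def put(self, key, value):
        # In a real DHT this would be routed to the node responsible for key.
        self._store[key] = value

    def get(self, key):
        # Likewise resolved via lookup(key) in a real system.
        return self._store.get(key)

dht = DHTClient()
dht.put("movie:1234", b"...")
print(dht.get("movie:1234"))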
Interface vs. Implementation
Put/Get is an abstract interface
Very convenient to program to
Doesn't require a DHT in today's sense of the word.
e.g., Amazon's S3 storage service
/bucketname/objectid -> data
We'll mostly focus on the backend log(n) lookup systems like Chord
But researchers have proposed alternate architectures that may work better, depending on assumptions!
Last time: Unstructured Lookup
Pure flooding (Gnutella), TTL-limited
Send message to all nodes
Supernodes (KaZaA)
Flood to supernodes only
Adaptive supernodes and other tricks (GIA)
None of these scales well for searching for needles
Alternate Lookups
Keep in mind contrasts to...
Flooding (unstructured) from last time
Hierarchical lookups
DNS
Properties? The root is critical. Today's DNS root is widely replicated, run in serious, secure datacenters, etc. Load is asymmetric.
Not always bad: DNS works pretty well
But not fully decentralized, if that's your goal
P2P Goal (general)
Harness storage & computation across (hundreds, thousands, millions) of nodes across the Internet
In particular:
Can we use them to create a gigantic, hugely scalable DHT?
P2P Requirements
Scale to those sizes...
Be robust to faults and malice
Specific challenges:
Node arrival and departure: system stability
Freeloading participants
Malicious participants
Understanding bounds of what systems can and cannot be built on top of p2p frameworks
DHTs
Two options:
lookup(key) -> node ID
lookup(key) -> data
When you know the node ID, you can ask it directly for the data, but specifying the interface as -> data provides more opportunities for caching and computation at intermediaries
Different systems do either. We'll focus on the problem of locating the node responsible for the data. The solutions are basically the same.
Algorithmic Requirements
Every node can find the answer
Keys are load-balanced among nodes
Note: We're not talking about popularity of keys, which may be wildly different. Addressing this is a further challenge...
Routing tables must adapt to node failures and arrivals
How many hops must lookups take?
Trade-off possible between state/maintenance traffic and number of lookup hops...
Consistent Hashing
How can we map a key to a node?
Consider ordinary hashing
func(key) % N -> node ID
What happens if you add/remove a node?
Consistent hashing:
Map node IDs to a (large) circular space
Map keys to the same circular space
Key belongs to the nearest node
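A small sketch of that circular mapping, assuming SHA-1 as the hash; the Ring class and the node names are illustrative, not from any particular system:

import hashlib
from bisect import bisect_left, insort

def h(x):
    # Map a string into a large circular ID space (160-bit, via SHA-1).
    return int(hashlib.sha1(x.encode()).hexdigest(), 16)

class Ring:
    def __init__(self):
        self.ids = []      # sorted node positions on the circle
        self.nodes = {}    # position -> node name

    def add_node(self, name):
        pos = h(name)
        insort(self.ids, pos)
        self.nodes[pos] = name

    def lookup(self, key):
        # The key belongs to the first node at or after hash(key), wrapping around.
        i = bisect_left(self.ids, h(key)) % len(self.ids)
        return self.nodes[self.ids[i]]

ring = Ring()
for n in ("node-a", "node-b", "node-c"):
    ring.add_node(n)
print(ring.lookup("some-key"))
# Adding or removing one node only moves the keys between it and its predecessor.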
DHT: Consistent Hashing
[Figure: identifier ring with nodes (N90, N105, ...) and keys (K5, K20, K80) hashed onto the same circular space.]
A key is stored at its successor: the node with the next-higher ID
Consistent Hashing
Very useful algorithmic trick outside of DHTs, too
Any time you don't want the object distribution to change greatly upon bucket arrival/departure
Detail:
To have good load balance
Must represent each bucket by log(N) virtual buckets
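Continuing the same sketch, virtual buckets just mean hashing each node to several positions on the ring; the count below (16) is an illustrative stand-in for roughly log(N):

import hashlib
from bisect import bisect_left, insort
from collections import Counter

def h(x): return int(hashlib.sha1(x.encode()).hexdigest(), 16)

ids, owner = [], {}

def add_node(name, vbuckets=16):
    # One ring position per virtual bucket; several positions map back to one physical node.
    for v in range(vbuckets):
        pos = h(f"{name}#{v}")
        insort(ids, pos)
        owner[pos] = name

def lookup(key):
    i = bisect_left(ids, h(key)) % len(ids)
    return owner[ids[i]]

for n in ("a", "b", "c", "d"):
    add_node(n)
print(Counter(lookup(f"key-{i}") for i in range(10000)))  # shares come out roughly even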
DHT: Chord Basic Lookup
[Figure: ring with nodes N10, N60, N90, N105, N120. Node N10 asks "Where is key 80?" and the lookup proceeds around the ring to N90, which stores K80.]
DHT: Chord Finger Table
[Figure: node N80's fingers point 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring.]
Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
In other words, the ith finger points 1/2^(n-i) of the way around the ring
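A toy sketch of building a finger table from that rule; the 7-bit identifier space and the node list are made up, and a real Chord node would learn its fingers via lookups rather than from a global membership list:

M = 7                                          # bits in the identifier space (IDs 0..127)
nodes = sorted([1, 12, 32, 60, 80, 96, 112])   # toy ring membership

def successor(ident):
    # First node whose ID is >= ident, wrapping around the ring.
    return next((n for n in nodes if n >= ident), nodes[0])

def finger_table(n):
    # Entry i points to successor(n + 2^i), for i = 0 .. M-1.
    return [(i, (n + 2**i) % 2**M, successor((n + 2**i) % 2**M)) for i in range(M)]

for i, start, succ in finger_table(80):
    print(f"finger {i}: start {start} -> node {succ}")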
DHT: Chord Join
Assume an identifier space [0..8]
Node n1 joins
[Figure: ring with identifiers 0-7. n1's successor table: entry i holds (id + 2^i, succ); here (2, succ 1), (3, succ 1), (5, succ 1).]
DHT: Chord Join
Node n2 joins
[Figure: ring 0-7 with nodes n1 and n2. n1's successor table becomes (2, succ 2), (3, succ 1), (5, succ 1); n2's successor table is (3, succ 1), (4, succ 1), (6, succ 1).]
DHT: Chord Join
Nodes n0 and n6 join
[Figure: ring 0-7 with nodes n0, n1, n2, n6 and their updated successor tables: n0: (1, succ 1), (2, succ 2), (4, succ 6); n1: (2, succ 2), (3, succ 6), (5, succ 6); n2: (3, succ 6), (4, succ 6), (6, succ 6); n6: (7, succ 0), (0, succ 0), (2, succ 2).]
DHT: Chord Join
Nodes: n1, n2, n0, n6
Items: f7, f2
[Figure: the ring with each node's successor table; each item is stored at the successor of its key.]
DHT: Chord Routing
Upon receiving a query for item id, a node:
Checks whether it stores the item locally
If not, forwards the query to the largest node in its successor table that does not exceed id
[Figure: an example query(7) forwarded along successor-table entries until it reaches the node storing item 7; each node's successor table is shown on the ring.]
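A sketch of that forwarding rule on the same kind of toy ring as before; the global knowledge of membership is a simulation shortcut, since a real node only consults its own finger/successor table and sends a message to the chosen next hop:

M = 7
nodes = sorted([1, 12, 32, 60, 80, 96, 112])

def successor(ident):
    return next((n for n in nodes if n >= ident), nodes[0])

def fingers(n):
    return [successor((n + 2**i) % 2**M) for i in range(M)]

def ring_dist(a, b):
    # Clockwise distance from a to b on the identifier circle.
    return (b - a) % 2**M

def lookup(start, key_id):
    n, path = start, [start]
    while successor(key_id) != n:              # stop at the node responsible for key_id
        # Forward to the table entry that gets closest to key_id without passing it.
        cands = [f for f in fingers(n) if 0 < ring_dist(n, f) <= ring_dist(n, key_id)]
        n = max(cands, key=lambda f: ring_dist(n, f)) if cands else successor((n + 1) % 2**M)
        path.append(n)
    return path

print(lookup(1, 90))   # e.g. [1, 80, 96]: node 96 is the successor of key 90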
DHT: Chord Summary
Routing table size?
log N fingers
Routing time?
Each hop is expected to halve the distance to the desired id => expect O(log N) hops.
Alternate structures
Chord is like a skip list: each hop takes you (roughly) half of the remaining way toward the destination. Other topologies do this too...
Tree-like structures
Pastry, Tapestry, Kademlia
Pastry:
Nodes maintain a Leaf Set of size |L|
|L|/2 nodes above & below the node's ID
(Like Chord's successors, but bidirectional)
Pointers to log_2(N) nodes: one at each level i of bit-prefix sharing with the node, with bit i+1 different
e.g., node id 01100101 stores pointers to neighbors with prefixes 1..., 00..., 010..., 0111..., ...
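A small sketch of which neighbor "slots" a Pastry-style node keeps under that rule, over plain binary IDs; real Pastry uses base-2^b digits and also keeps the leaf set described above:

def routing_slots(node_id):
    # For each level i, keep a neighbor that shares the first i bits of our ID
    # but differs in bit i+1.
    slots = []
    for i, bit in enumerate(node_id):
        flipped = "1" if bit == "0" else "0"
        slots.append(node_id[:i] + flipped)    # required prefix of the level-i neighbor
    return slots

print(routing_slots("01100101"))
# ['1', '00', '010', '0111', '01101', '011000', '0110011', '01100100']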
Hypercubes
The CAN DHT
Each node has an ID
Maintains pointers to neighbors that differ in one bit position
Only one possible neighbor in each direction
But can route to the receiver by changing any bit
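A sketch of that bit-fixing style of routing; the bit width and the choice to fix the lowest differing bit first are arbitrary, which is exactly the forwarding flexibility discussed on the next slides:

def hypercube_route(src, dst):
    # Each hop moves to a neighbor differing in exactly one bit, fixing one
    # of the bits in which the current node and the destination still differ.
    path, cur = [src], src
    while cur != dst:
        diff = cur ^ dst          # bits that still differ
        cur ^= diff & -diff       # flip one of them (here: the lowest set bit)
        path.append(cur)
    return path

print([format(n, "08b") for n in hypercube_route(0b00101100, 0b10100001)])
# 4 differing bits -> 4 hops: 00101100 -> 00101101 -> 00101001 -> 00100001 -> 10100001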
So many DHTs...
Compare along two axes:
How many neighbors can you choose from when forwarding? (Forwarding Selection)
How many nodes can you choose from when selecting neighbors? (Neighbor Selection)
Failure resilience: forwarding choices help
Picking low-latency neighbors: both help
Proximity
Ring:
Forwarding: log(N) choices for next hop when going around the ring
Neighbor selection: pick from 2^i nodes at level i (great flexibility)
Tree:
Forwarding: 1 choice
Neighbor: 2^(i-1) choices for the ith neighbor
Hypercube
Neighbors: 1 choice
(neighbors who differ in one bit)
Forwarding:
Can fix any bit you want.
(log N)/2 (expected) ways to forward
So:
Neighbors: hypercube 1, others 2^i
Forwarding: tree 1, hypercube (log N)/2, ring log N
How much does it matter?
Failure resilience without rerunning the routing protocol
Tree is much worse; ring appears best
But all protocols can use multiple neighbors at various levels to improve these numbers
Proximity
Neighbor selection is more important than route selection for proximity, and it draws from a large space in everything but the hypercube
Other approaches
Instead of log(N), can do:
Direct routing (everyone knows the full routing table)
Can scale to tens of thousands of nodes
Lookups may fail and be retried to recover from failures/additions
One-hop routing with sqrt(N) state instead of log(N) state
What's best for real applications? Still up in the air.
DHT: Discussion
Pros:
Guaranteed lookup
O(log N) per-node state and search scope
(or otherwise)
Cons:
Hammer in search of a nail? Now becoming popular in p2p: the BitTorrent distributed tracker. But still waiting for massive uptake. Or not.
Many services (like Google) are scaling to huge numbers without DHT-like log(N)
Further Information
We didn't talk about Kademlia's XOR structure (like a generalized hypercube)
See "The Impact of DHT Routing Geometry on Resilience and Proximity" for more detail about DHT comparisons
No silver bullet: DHTs are very nice for exact match, but not for everything (next few slides)
Writable, persistent p2p
Do you trust your data to 100,000 monkeys?
Node availability hurts
Ex: Store 5 copies of data on different nodes
When someone goes away, you must replicate the data they held
Hard drives are *huge*, but cable modem upload bandwidth is tiny: perhaps 10 GB/day
Takes many days to upload the contents of a 200 GB hard drive. Very expensive leave/replication situation!
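Worked example with the numbers above: 200 GB / (10 GB per day) = 20 days of continuous uploading just to push one departed node's data back out.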
When are p2p/DHTs useful?
Caching and soft-state data
Works well! BitTorrent, KaZaA, etc., all use peers as caches for hot data
Finding read-only data
Limited flooding finds hay
DHTs find needles
BUT...
A Peer-to-peer Google?
Complex intersection queries ("the" + "who")
Billions of hits for each term alone
Sophisticated ranking
Must compare many results before returning a subset to the user
Very, very hard for a DHT/p2p system
Need high inter-node bandwidth
(This is exactly what Google does: massive clusters)