0% found this document useful (0 votes)
7 views

03_P2P_Lecture

This document discusses Peer-to-Peer (P2P) computing, focusing on its architecture, including unstructured and structured overlays like Napster, Gnutella, and BitTorrent. It explains the functionality of these networks, their historical context, and the challenges they face, such as flooding and legal issues. Additionally, it covers the evolution of structured overlays, particularly the Chord DHT and Pastry, highlighting their mechanisms for data storage and retrieval.

Uploaded by

Pig happy
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views

03_P2P_Lecture

This document discusses Peer-to-Peer (P2P) computing, focusing on its architecture, including unstructured and structured overlays like Napster, Gnutella, and BitTorrent. It explains the functionality of these networks, their historical context, and the challenges they face, such as flooding and legal issues. Additionally, it covers the evolution of structured overlays, particularly the Chord DHT and Pastry, highlighting their mechanisms for data storage and retrieval.

Uploaded by

Pig happy
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 78

Peer-to-Peer Computing: DHTs and Overlay

Networks

Distributed Systems

Ahmed Ali-Eldin
One of the best places to read about systems
http://highscalability.com/
What is P2P?
● It is (used to be) every ISP's nightmare
– In 2007, estimations between 50% to 70% of
Internet traffic is P2P [1,2,3].
– Look at the nice picture from 2002 [4].
What is P2P again?
●“A distributed network architecture may be called
a Peer-to-Peer (P-to-P, P2P) network, if the
participants share a part of their own hardware
resources (processing power, storage
capacity,network link capacity, printers).”
●“These shared resources are necessary to
provide the Service and content offered by the
network (e.g. file sharing or shared workspaces
for collaboration).
“They are accessible by other peers directly,

without passing intermediary entities. The


Why do they use P2P?
●Go beyond services offered by client-server
systems by having symmetry in roles where a
client may also be a server.
●Unlike Grid systems, P2P systems do not arise
from the collaboration between established and
connected groups of systems and without a more
reliable set of resources to share.
– Grid was an idea that computing should be
like the power grid
What is a P2P Overlay?
Distributed systems, without any hierarchical

organization or centralized control.


Peers form self-organizing overlay networks that

are overlayed on the Internet Protocol (IP)


networks.
– They are virtual networks
Today’s lecture

● Unstructured Overlays
– Napster
– Gnutella

– BitTorrent

● Structured Overlays
Unstructured overlays
History: Napster

Figure Courtesy of Sameh El-Ansary's and Seif Haridi's course at SICS/KTH


History: Basic operations in Napster
● Join
– Connect to the Napster name server
● Leave/Fail
– Server detects you are not there any more
– Data removed from directory
● Share (Publish/Insert)
– Hi server, I have these files I want to share
● Search
– Hi server, does anyone online have these
History: The end
History: End of Napster
Napster was brought down when Metallica and

Dr. Dre filed a lawsuit for copyright infringement.


● Its name was sold at a bankruptcy auction
– https://us.napster.com/
Reflections on Napster
● Why could it be sued?
● Can we build something that is un-sueable?
Rise of the Unstructured overlays
History:Random (Unstructured)
Overlay Networks Distributed
Directory + Distributed Storage
12

10

Column 1
6
Column 2
Column 3

0
Row 1 Row 2 Row 3 Row 4

Figure Courtesy of Sameh El-Ansary's and Seif Haridi's course at SICS/KTH


History: Gnutella
●This is still a functional network with updates
rolled out as we speak
●Peers join network by connecting to one of the
(always there) peers
– Usedto be available from a link that is now a
porn web page
History: Gnutella messages
● Group Membership
– Ping: “Hi I am here, add me to the network”
● Broadcasted to neighbors, neighbors
forward the message
● Pong: “Cool, my IP-address is x.x.x.x, etc..
– From all peers who get the Ping
History: Gnutella messages
● Search
– Query: “Does any one have 'foo.mp3'?”
● Contains a Time-to-Live (TTL)
● Floods the network until TTL
– Queryresponse: “ I do, you can get the file
through...” or no response
● “Maybe it is just the TTL, let me increase it
and try again”
History: Gnutella messages
● File Transfer
– GET and PUSH: is the actual protocol used for
downloading the files
– GET when no peer is behind a firewall
– PUSH when the file owner is behind a firewall
(or similar)
Example for Gnutella (Historical)

Where is A?
A
History: Gnutella
a peer periodically PINGs its neighbors to

discover other participating peers.


– And that was problematic
History: Gnutella
a peer periodically PINGs its neighbors to

discover other participating peers.


– And that was problematic
● Flooding
– Too much traffic
– Impossible
to break the network without taking
down everyone (or at least the always on
peers)
– Ortargeting the people who originally wrote
the software
Design question
● How to reduce flooding?
Design question
● How to reduce flooding?
Bittorrent: Content distribution with a
twist
●Large number of peers want to download a
certain file as fast as possible.
● All previous techniques discussed do not do that
– Has to wait for a peer already downloading the
file
– Then another peer can start retrieving the file
from one of the two peers
Bittorent
● Philosophy:
– Use the bandwidth of everyone in the network
to the maximum
Bittorrent: Layman's example
● One file at Peer1 of size 10 GB on a 1 Mb/s link.
● 100 peers who want that file at the same time
● A few options to distribute the data
Layman's example: Option 1
● Send it to Peer2 in 21 hours (at least)
● Then send it to Peer3 in another 21 hours.
To distribute the file to all peers it will take 2100

hours (actually more), i.e. roughly 3 months.


Even with all peers acting like sources once they

get the file, you need 6*21 hours=126 hours,


almost 5.25 days. (How did I get this number?)
Layman's example: Option 2
● Divide the file in to 100 distinct chunks
●Send to each peer out of the 100 peers one
chunk of the file
– Try
to be fair with bandwidth allocated to each
peer
●Each peer now has a small part of the file after
the first 21 hours.
– But the whole file is in the network
– Peers can then fetch from each other the parts
they are missing
In reality
“Piece length maps to the number of bytes in

each piece the file is split into.”


●“For the purposes of transfer, files are split into
fixed-size pieces which are all the same length
except for possibly the last one which may be
truncated.”
●“piece length is almost always a power of two,
most commonly 2^18 = 256 K (BitTorrent prior to
version 3.2 uses 2 20 = 1 M as default).”

Layman's example: Option 2
● That means even faster downloads
– Smaller chunks will be sent in a few seconds.
– Then
each peer in the network will have
something to share.
– Making the optimal transfer length just a little
over 21 hours.
In reality
●You almost never have 100 peers asking for the
same file at the exact same instance
Some will start the download earlier than the

others
●The early downloaders will, with high probability,
eat up the upload bandwidth of the file source
since they establish the connection ahead
– Andeach uploader/downloader can set the
maximum number of simultaneous
connections
Bittorent in action

Courtesy of Sameh El-Ansary


Seeders
● Machines that have a complete copy of the file
They are usually selfish and do not want to wait

after they get the file


●At least the first seed has to stay to serve one
complete copy of the file
In general at all times all pieces of a file must be

around or the process fails


Leechers
● Those who do not have a full copy of the file.
●Once they have the full file, they become
seeders.

A swarm is the set of peers


that are participating in distributing the same file
Tracker (Membership server)
●A peer that keeps track of the seeds and the
leechers
●Connected to when you download a .torrent file,
which contains information about the file, its
length, name, and hashing information, and URL
of a tracker
● It is a simple protocol layered on top of HTTP
– Downloader sends information about the file it
is downloading and the port number.
– Tracker responds with a random list of
contact information about the peers that are
Torrent File

Courtesy of Sameh El-Ansary


Handshaking and exchanging
Bitfields
●Send handshake to all
peers received from the
tracker
● Exchange bitfields
Choking algorithm: One sided
relationship
Choking algorithm: Mutual love

Choked=false Choked=false
Interested=true Interested=true

A B

Tit for Tat


Chocking algorithm: Philanthropy

Choked=false Choked=false
Interested=false Interested=true

A B

A is typically a seeder in this case


Which piece to get first
● Rarest-first
– Choose the piece that is rarest among your
peers
– If possible
● Random-first piece
– As we already know, the first piece is crucial
for someone to be useful to the community
– So, rarest-first is not applied
– Getfirst piece as fast as possible, even if you
download subpieces from different peers
BitTorrent legal uses
https://www.makeuseof.com/tag/8-legal-uses-for-

bittorrent-youd-be-surprised/
Structured Overlays
Premise

The ID of a data item determines the machine on


which it is going to be stored
Simplest Example: Consistent Hashing
14 12 2

bye.cs.kth.se hej.cs.upp.se me.cs.lu.se Berke.cs.umu.se

12 3 7 0

- 4 machines that need to share data


- Each machine will have an id based on the hash of its IP address
- Data will also have ids using the same function (SHA, MD5sum..)
Simple Example: Build a ring
● Original structure
Simple Example: update the ring
● Policy
– A doc with id y,
would be stored at
Succ(y)
● Lookup
– ask node n which
you know about to
find the successor of
id.
Handling Joins
●You need to know the
successor
●Your predecessor
needs to know about
you
New example: Handling Joins
● A node 9 joins the ring
Handling Joins: data redistribution
●Can you think what
happens on data
leaves?
Chord DHT
● Developed at MIT
● Very popular since it is very simple to understand
In addition to the successor pointer, every node

has a predecessor pointer as well


Chord Join
9 can join through any

other node, take 0 for


example.
9 will set its

predecessor to nil
●0 will help 9 to find its
successor
●0 will tell 9 that its
successor should be 12
Chord Join
12

10

Column 1
6
Column 2
Column 3

0
Row 1 Row 2 Row 3 Row 4

- 12 changes its predecessor to 9


- 7 will learn that 9 is it successor when it runs stablize
-12 will tell
Can their be collisions in the peer
names?
●Can the hash function produce two peers having
the same hash ID?
– H(a) = H(b), and a ≠ b?
– Yes?

– No?
Can their be collisions in the peer
names?
●Can the hash function produce two peers having
the same hash ID?
– No, if you use a good function
● Called Provably secure hash functions
– MD5 and SHA-I are not Provably secure hash
functions
– Still
we use non provably secure hash
functions and seldom use the provably secure
ones
● Can you think why?
Can their be collisions in the peer
names?
●Can the hash function produce two peers having
the same hash ID?
– No, if you use a good function
● Called Provably secure hash functions
– MD5 and SHA-I are not Provably secure hash
functions
– Still
we use non provably secure hash
functions and seldom use the provably secure
ones
● Can you think why?
Routing: Fingers table
Keep a routing table of

M peers at each node


– Where N=2M
– Maximum distance
to any other node is
then log2(N), i.e., M
hops
Routing: Fingers table
Chord: Routing
In fact, if nodes are uniformly distributed, the

maximum is log (# of nodes), i.e. log (8) hops


between any two nodes
● The average complexity is: 1⁄2 log(#nodes)
Chord: Handling failures
● If one successor fails, total ring collapses (Why?)
– Each node maintains a successor list of length
r
– Ifa node’s immediate successor does not
respond, it uses the second entry in its
successor list
● Update the successors' list
Chord Algorithm
Chord Algorithm
Chord Algorithm
Practicalities
●If we were storing files, replication can be costly,
also access time might be high, however, it might
be needed in certain applications like distributed
files systems
●For file sharing and other application it would be
more suitable to store addresses of files rather
than files
– Stale addresses could be a problem.
– One solution is to let them values expire and
the publisher refreshes them periodically.
Would not it be nice to have
something with locality?
●A huge issue with chord is its inability to take
locality in account
Pastry
•From Microsoft Research
•Aims to achieving
logarithmic diameter with •Was used later as a basis
a logarithmic node state for the Bamboo DHT
•Targets the issue of which inspired Amazon’s
locality: Dynamo, the backbone of
–2 machines in one country, the S3 and EC2
could communicate through a infrastructure
machine in another continent
because the hash of their ids
will be far apart in the id
space.
Pastry – Mapping items onto nodes
•An item in Pastry is
stored at the node that is
numerically closest to the
id of the item
•Naturally, such a node •Can we create an
will have the longest example together?
matching prefix.
•More here:
https://www.freepastry.or
g/
Why did research on P2P computing
almost die?
● Major drawbacks with P2P systems [7]
– Churn and performance guarantees
– The power of cloud computing
● Skype abandoned their P2P architecture
– Decreasing cost of content delivery
– Mostof the low-hanging fruits have been
harvested
● To actually push the field you need to put so
much effort
– You need to be a superb mathematician,
Security
We have mostly talked about robustness and

efficiency
● P2P adds a new dimension to security
– With newer attacks and newer challenges
● Example:
– Sybil attacks
– File Poisoning
– Rational Attacks

● Plus the good old ones like DdoS, man-in-the-


Security: Sybil attacks
●Adversary introduces a large number of peers
that he controls to the P2P system
– Subverting the reputation system
– Corrupting the network
– Tricking the users
– Controlling/influencing the network

“One can have, some claim, as many electronic


personas as one has time and energy to
create.”
Security: Sybil attacks
●“without a logically centralized authority, Sybil
attacks are always possible except under extreme
and unrealistic assumptions of resource parity and
coordination among entities.” [6]
Example: Amazon's Dynamo
Huge parts of Amazon's infrastructure depend on

Dynamo
●It is a system that integrates a wide range of
concepts you will learn in the distributed systems
course
– Consistent hashing
– Versioning

– Vector clocks
– Quorum

– And a few more


Where does P2P live today
● Internal within companies and datacenters
– Dynamo at Amazon
● Blockchains
The next big thing and the killer of P2P computing
research

Cloud Computing
Some extra extra extra readings :)
[1] http://spin2013.cs.sunysb.edu/~phillipa/CSE390/choffnes08.pdf

[2]https://www.net.t-labs.tu-berlin.de/papers/AFS-CIP2PCIP-07.pdf

[3] https://www.usenix.org/legacy/event/imc05/tech/full_papers/karagiannis/karagiannis.pdf

[4] http://research.cs.washington.edu/networking/websys/pubs/osdi_2002/osdi.pdf

[5] http://www.computer.org/csdl/proceedings/p2p/2001/1503/00/15030101.pdf

[6] http://www.cs.rice.edu/Conferences/IPTPS02/101.pdf

[7] Li, Baochun, Yuan Feng, and Bo Li. "Rise and fall of the peer-to-peer empire." Tsinghua Science

and Technology 17.1 (2012): 1-


16.http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=A744210E174E5775327F6B8F844D91
E8?doi=10.1.1.296.1725&rep=rep1&type=pdf

[8] https://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

You might also like