An Overview of Gnutella
History
The Gnutella network is a fully distributed alternative to the centralized Napster. Initial popularity of the network was spurred on by Napster's threatened legal demise in early 2001.
A generic view
[Figure: a generic view, peers holding object1 and object2]
No central authority.
What is Gnutella?
Gnutella is a protocol for distributed search. It operates in two stages:
1. Join the network.
2. Use the network, i.e., discover and search other peers.
Gnutella Jargon
Servent: A Gnutella node. Each servent is both a server and a client.
[Figure: neighbors 1 hop and 2 hops away from a servent]
TTL (time to live): the number of hops a packet can travel before it is discarded (the default setting in Gnutella is 7).
Gnutella Scenario
Step 0: Join the network.
Step 1: Determine who is on the network.
- A "Ping" packet is used to announce your presence on the network.
- Other peers respond with a "Pong" packet, and also forward your Ping to their connected peers.
- A Pong packet also contains an IP address, a port number, and the amount of data that peer is sharing.
- Pong packets come back via the same route the Ping took.
Step 2: Search.
- A Gnutella "Query" asks other peers whether they have the file you want. A Query packet might ask, "Do you have any content that matches the string 'Double Helix'?"
- Peers check whether they have matches, respond if they do, and forward the packet to their connected peers.
- Forwarding continues until the TTL expires.
Step 3: Download.
- Peers with matches respond with a QueryHit, which contains their contact information.
- File transfers use a direct connection between the two peers, via the HTTP GET method.
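The Query step can be sketched as a breadth-first flood over a graph of peers. This is a minimal sketch, assuming an in-memory adjacency dict and a hypothetical `flood_query` helper; the `seen` set stands in for the duplicate detection a real servent would perform, and each hit records the reverse path a QueryHit would travel back to the source.

```python
from collections import deque


def flood_query(graph, source, holders, ttl=7):
    """Hypothetical sketch of Gnutella-style query flooding.

    graph:   dict mapping each peer to a list of neighbor peers
    holders: set of peers that have a match for the query
    ttl:     maximum number of hops a Query may travel (Gnutella default: 7)

    Returns a dict mapping each responding peer to the reverse path
    its QueryHit would travel back to the source.
    """
    # Each entry: (peer, remaining TTL, path from source to this peer)
    queue = deque([(source, ttl, [source])])
    seen = {source}  # a peer drops duplicate copies of the same Query
    hits = {}
    while queue:
        peer, remaining, path = queue.popleft()
        if peer in holders and peer != source:
            # The QueryHit travels back along the same route, reversed
            hits[peer] = list(reversed(path))
        if remaining == 0:
            continue  # TTL exhausted: the packet is not forwarded further
        for neighbor in graph[peer]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1, path + [neighbor]))
    return hits


# Usage: a chain A - B - C - D where only D holds the file.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(flood_query(chain, "A", {"D"}, ttl=2))  # D is 3 hops away: not reached
print(flood_query(chain, "A", {"D"}, ttl=3))  # D answers via D, C, B, A
```

The chain example also illustrates the TTL limitation noted below: an object three hops away is invisible to a query sent with TTL 2.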
Remarks
A simple idea, but it lacks scalability, since query flooding wastes bandwidth. Also, an existing object may not be located because of the limited TTL. Subsequently, various improved search strategies have been proposed.
Searching in Gnutella
The topology is dynamic, i.e., constantly changing. How do we model a constantly changing topology? Usually, we begin with a static topology and later account for the effect of churn. Candidate topology models (measurements provide useful inputs):
- Random graph: each pair of nodes is connected by an edge with probability p.
- Power-law graph
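The random-graph model can be sketched directly from its definition. This is a minimal sketch of the classic G(n, p) construction, assuming a hypothetical `random_graph` helper that returns an adjacency dict of sets; it is meant only to show what "connected with probability p" produces, not how Gnutella peers actually connect.

```python
import random


def random_graph(n, p, seed=None):
    """Sketch of a G(n, p) random graph: each of the n*(n-1)/2 possible
    edges is included independently with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


g = random_graph(200, 0.05, seed=1)
degrees = [len(nbrs) for nbrs in g.values()]
print(sum(degrees) / len(degrees))  # close to p * (n - 1) = 9.95
print(max(degrees))                 # no node is far above the mean
```

In this model each node's degree is Binomial(n - 1, p), so degrees cluster tightly around the mean and high-degree hubs are very rare.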
Q. Is Gnutella topology a random graph?
Gnutella topology
Gnutella topology is actually a power-law graph (also called a scale-free graph). What is a power-law graph? The number of nodes with degree k is $c \cdot k^{-r}$. (Contrast this with an exponentially decaying distribution, where the number of nodes with degree k is $c \cdot 2^{-k}$.)
Many graphs in nature exhibit power-law characteristics. Examples: the World Wide Web (the number of pages that have k in-links is proportional to $k^{-2}$), and the fraction of scientific papers that receive k citations (proportional to $k^{-3}$).
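Because $\log(c \cdot k^{-r}) = \log c - r \log k$, a power law appears as a straight line of slope $-r$ on a log-log plot. The sketch below, using a hypothetical `fit_power_law_exponent` helper, recovers r with an ordinary least-squares line fit; this is a rough diagnostic under the assumption of clean counts, and maximum-likelihood estimators are generally preferred for real measurement data.

```python
import math


def fit_power_law_exponent(counts):
    """Estimate r in N(k) = c * k**(-r) by fitting a least-squares line to
    log N(k) versus log k. counts maps degree k (>= 1) to the number of
    nodes with that degree; zero counts are skipped since log(0) is undefined."""
    pts = [(math.log(k), math.log(n)) for k, n in counts.items() if k >= 1 and n > 0]
    m = len(pts)
    mean_x = sum(x for x, _ in pts) / m
    mean_y = sum(y for _, y in pts) / m
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    slope = sxy / sxx
    return -slope  # log N(k) = log c - r * log k, so r = -slope


# Synthetic counts following the web in-link example, r = 2
web_like = {k: 1000 * k ** -2 for k in range(1, 50)}
print(fit_power_law_exponent(web_like))  # approximately 2.0
```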
How many telephone numbers receive calls from k different telephone numbers?
[Figure: Gnutella network, distribution of the number of neighbors per node]
A possible explanation
Nodes join at different times. The more connections a node has, the more likely it is to acquire new connections (Rich gets richer). Popular webpages attract new pointers.
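The "rich get richer" explanation can be simulated by growing a graph in which each newcomer prefers well-connected nodes. This is a minimal sketch in the style of the Barabási-Albert preferential-attachment model, an assumed stand-in for how peers actually pick neighbors; `preferential_attachment` and its parameter `m` (links per new node) are hypothetical names.

```python
import random


def preferential_attachment(n, m=1, seed=None):
    """Grow a graph to n nodes; each new node links to m distinct existing
    nodes chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a small complete graph on m + 1 nodes
    degree = {v: m for v in range(m + 1)}
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # A node appears in 'ends' once per incident edge, so a uniform pick
    # from 'ends' is a degree-proportional pick of a node.
    ends = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        degree[new] = 0
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
            degree[new] += 1
            degree[t] += 1
    return degree, edges


degree, edges = preferential_attachment(2000, m=2, seed=7)
print(len(edges), max(degree.values()))  # a few hubs far exceed the mean degree of about 4
```

Early joiners accumulate links fastest, which is exactly the "nodes join at different times" effect described above.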
Search strategies
Flooding
Random walk
- Biased random walk
- Multiple walker random walk
(Combined with)
- One-hop replication
- Two-hop replication
- k-hop replication
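One common reading of k-hop replication is that every node keeps an index of the objects held by all nodes within k hops, so a search that reaches a node can answer from that node's index. The sketch below builds such an index with a bounded breadth-first search; `k_hop_index` and the `contents` mapping are hypothetical names, and index maintenance under churn is not modeled.

```python
from collections import deque


def k_hop_index(graph, contents, k):
    """Hypothetical sketch of k-hop replication: each node indexes the objects
    held by every node within k hops of itself (k = 1 gives one-hop
    replication). Returns node -> {object: nearest holder}."""
    index = {}
    for node in graph:
        dist = {node: 0}
        queue = deque([node])
        entries = {}
        while queue:
            u = queue.popleft()
            for obj in contents.get(u, ()):
                entries.setdefault(obj, u)  # BFS order keeps the nearest holder
            if dist[u] == k:
                continue  # do not look beyond k hops
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        index[node] = entries
    return index


line = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(k_hop_index(line, {"C": ["song"]}, k=1)["A"])  # {}: C is 2 hops from A
print(k_hop_index(line, {"C": ["song"]}, k=2)["A"])  # {'song': 'C'}
```

Larger k lets each visited node answer for more of the network, at the cost of more index storage and update traffic.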
On Random walk
Let p(d) be the probability that a random walk on a d-dimensional lattice returns to the origin. In 1921, Pólya proved that (1) $p(1) = p(2) = 1$, but (2) $p(d) < 1$ for $d > 2$. There are similar results on two walkers meeting each other via random walks.
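Pólya's result can be checked numerically with a Monte Carlo simulation. This is a minimal sketch with a hypothetical `return_fraction` helper: it estimates how often a simple random walk on the d-dimensional integer lattice returns to the origin within `max_steps` steps. Because walks are cut off at `max_steps`, the quantity being estimated can only be smaller than p(d).

```python
import random


def return_fraction(d, trials=500, max_steps=2000, seed=0):
    """Monte Carlo estimate of the fraction of simple random walks on the
    d-dimensional lattice that revisit the origin within max_steps steps."""
    rng = random.Random(seed)
    returned = 0
    for _ in range(trials):
        pos = [0] * d
        for _ in range(max_steps):
            axis = rng.randrange(d)            # pick one coordinate
            pos[axis] += rng.choice((-1, 1))   # step +1 or -1 along it
            if not any(pos):                   # back at the origin
                returned += 1
                break
    return returned / trials


print(return_fraction(1))  # close to 1
print(return_fraction(3))  # roughly a third: the walk often escapes for good
```

In two dimensions the walk also returns with probability 1, but convergence is so slow that a truncated simulation like this one stays well below 1.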
Delay = number of hops until the object is found. Overhead = total distance covered by the walker(s). Both should be as small as possible. For a single random walker, these are equal. Using k random walkers is a compromise. For search by flooding, if delay = h then overhead = $d + d^2 + \cdots + d^h$, where d = degree of a node.
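The flooding overhead formula is a geometric series, and a few lines of arithmetic show how quickly it grows compared with a single walker, whose overhead equals its delay. This sketch uses the same simplified model (every node forwards to d new nodes, duplicates ignored); `flooding_overhead` is a hypothetical helper name.

```python
def flooding_overhead(d, h):
    """Messages sent by flooding to depth h when each node forwards to d
    new nodes: d + d**2 + ... + d**h."""
    return sum(d ** i for i in range(1, h + 1))


def single_walker_overhead(h):
    """A single random walker covers exactly one hop per step."""
    return h


print(flooding_overhead(4, 3))       # 4 + 16 + 64 = 84
print(flooding_overhead(4, 7))       # 21844 messages at the default TTL of 7
print(single_walker_overhead(7))     # 7
```

Of course, the walker inspects far fewer nodes in those 7 hops, which is why its delay to find a rare object is much larger.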
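Probability that a single random walker first finds the object at hop h, assuming each visited node holds it independently with probability p:
$$P(h) = (1-p)^{h-1}\, p, \qquad h = 1, 2, 3, \ldots, T$$
so hop 3 succeeds with probability $(1-p)^2 p$ and hop T with probability $(1-p)^{T-1} p$.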
With a large TTL, E(h) = 1/p, which is intuitive. With a small TTL, there is a risk that search will time out before an existing object is located.
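The two claims above can be checked by summing the geometric distribution $P(h) = (1-p)^{h-1} p$ up to the TTL T. This sketch uses hypothetical helper names: `success_probability` gives the chance the walker succeeds within T hops, and `expected_hops` gives the mean hop count given that it succeeds, which approaches 1/p as T grows.

```python
def success_probability(p, T):
    """Probability that a single walker finds the object within T hops:
    the sum of (1-p)**(h-1) * p for h = 1..T, which equals 1 - (1-p)**T."""
    return 1 - (1 - p) ** T


def expected_hops(p, T):
    """Expected hop count at which the object is found, conditioned on
    the search succeeding within T hops."""
    found = sum(h * (1 - p) ** (h - 1) * p for h in range(1, T + 1))
    return found / success_probability(p, T)


print(expected_hops(0.1, 1000))       # about 10 = 1/p for a large TTL
print(success_probability(0.1, 10))   # about 0.651: a small TTL often times out
```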
K random walkers
Assume all k walkers start in unison. The probability that none finds the object after one hop is $(1-p)^k$. The probability that none has succeeded after T hops is $(1-p)^{kT}$. So the probability that at least one walker succeeds is $1-(1-p)^{kT}$. A typical assumption is that the search is abandoned as soon as at least one walker succeeds.
As k increases, the overhead increases, but the delay decreases. There is a tradeoff.
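The tradeoff can be made concrete under the same independence assumption. With a large TTL, at least one of the k walkers succeeds on a given hop with probability $q = 1-(1-p)^k$, so the expected delay is $1/q$ hops; if all walkers stop as soon as one succeeds, the expected overhead is $k/q$. The helper names below are hypothetical.

```python
def k_walker_success(p, k, T):
    """Probability that at least one of k walkers succeeds within T hops."""
    return 1 - (1 - p) ** (k * T)


def k_walker_delay_and_overhead(p, k):
    """Expected delay (hops) and overhead (total hops by all walkers),
    assuming a large TTL and that all walkers stop at the first success."""
    q = 1 - (1 - p) ** k  # per-hop chance that some walker succeeds
    delay = 1 / q
    return delay, k * delay


for k in (1, 2, 4, 8):
    delay, overhead = k_walker_delay_and_overhead(0.1, k)
    print(k, round(delay, 2), round(overhead, 2))  # delay falls, overhead rises
```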
[Figure: biased random walk example, P = 3/10]
Each node records the degrees of its neighboring nodes. The search therefore gravitates toward high-degree nodes, which hold more clues.
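A degree-biased walk can be sketched by choosing each next hop with probability proportional to the neighbor's degree. This rule is an assumption (another common variant always picks the highest-degree neighbor), and `next_hop_probabilities` and `biased_walk` are hypothetical names. In the usage example, a node whose neighbors have degrees 3, 5 and 2 picks the degree-3 neighbor with probability 3/10.

```python
import random


def next_hop_probabilities(graph, node):
    """Assumed rule: neighbor v is chosen with probability
    deg(v) / (sum of the degrees of node's neighbors)."""
    nbrs = graph[node]
    total = sum(len(graph[v]) for v in nbrs)
    return {v: len(graph[v]) / total for v in nbrs}


def biased_walk(graph, start, target, ttl, seed=None):
    """Walk until the target is reached or ttl hops are used.
    Returns the visited nodes; every node is assumed to have a neighbor."""
    rng = random.Random(seed)
    node = start
    path = [start]
    for _ in range(ttl):
        if node == target:
            break
        probs = next_hop_probabilities(graph, node)
        node = rng.choices(list(probs), weights=list(probs.values()))[0]
        path.append(node)
    return path


g = {
    "S": ["A", "B", "C"],
    "A": ["S", "a1", "a2"],               # degree 3
    "B": ["S", "b1", "b2", "b3", "b4"],   # degree 5
    "C": ["S", "c1"],                     # degree 2
    "a1": ["A"], "a2": ["A"], "c1": ["C"],
    "b1": ["B"], "b2": ["B"], "b3": ["B"], "b4": ["B"],
}
print(next_hop_probabilities(g, "S"))  # {'A': 0.3, 'B': 0.5, 'C': 0.2}
print(biased_walk(g, "S", "b3", ttl=20, seed=1))
```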
[Figure: number of nodes found on a power-law graph]
[Figure: clients A, B, C attached to supernodes]
Powerful nodes (supernodes) act as local index servers, and client queries are propagated to other supernodes. The result is a two-layered architecture.
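The two-layer design can be sketched as supernodes that index their clients' files and forward queries only among themselves. This is a minimal sketch under stated assumptions: the `Supernode` class, its `register` and `query` methods, and the TTL on the supernode layer are all hypothetical, and the breadth-first forwarding reuses the flooding idea from earlier, restricted to the upper layer.

```python
from collections import deque


class Supernode:
    """Hypothetical supernode: indexes files of attached clients and
    forwards queries to neighboring supernodes."""

    def __init__(self, name):
        self.name = name
        self.index = {}      # file name -> set of client addresses
        self.neighbors = []  # other supernodes

    def register(self, client, files):
        """A client uploads its file list to its supernode."""
        for f in files:
            self.index.setdefault(f, set()).add(client)

    def query(self, filename, ttl=2):
        """Answer from the local index, then propagate breadth-first to
        other supernodes for up to ttl hops; clients are never flooded."""
        hits = set()
        seen = {self.name}
        frontier = deque([(self, ttl)])
        while frontier:
            sn, remaining = frontier.popleft()
            hits |= sn.index.get(filename, set())
            if remaining > 0:
                for nb in sn.neighbors:
                    if nb.name not in seen:
                        seen.add(nb.name)
                        frontier.append((nb, remaining - 1))
        return hits


s1, s2 = Supernode("S1"), Supernode("S2")
s1.neighbors, s2.neighbors = [s2], [s1]
s1.register("A", ["x.mp3"])
s2.register("B", ["x.mp3"])
s2.register("C", ["y.mp3"])
print(s1.query("x.mp3"))  # clients A and B
```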