0% found this document useful (0 votes)
26 views26 pages

An Overview of Gnutella

The Gnutella network is a decentralized peer-to-peer file sharing network established in 2001 as an alternative to the centralized Napster. It uses a protocol that allows peers to connect directly and search for files without a central index. Searches are conducted through query flooding within a time-to-live limit, though this lacks scalability. The network topology has been shown to follow a power-law or scale-free structure, and various search strategies using concepts like random walks, biased routing, and supernodes have been proposed to improve efficiency.

Uploaded by

nishasaiyed2304
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
26 views26 pages

An Overview of Gnutella

The Gnutella network is a decentralized peer-to-peer file sharing network established in 2001 as an alternative to the centralized Napster. It uses a protocol that allows peers to connect directly and search for files without a central index. Searches are conducted through query flooding within a time-to-live limit, though this lacks scalability. The network topology has been shown to follow a power-law or scale-free structure, and various search strategies using concepts like random walks, biased routing, and supernodes have been proposed to improve efficiency.

Uploaded by

nishasaiyed2304
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 26

An Overview of Gnutella

History
The Gnutella network is a fully distributed alternative to the centralized Napster. Initial popularity of the network was spurred on by Napster's threatened legal demise in early 2001.

A generic view

object1

peer

object2

peer

No central authority.
3

What is Gnutella?
Gnutella is a protocol for distributed search

peer-to-peer comm decentralized model

Two stages: 1. Join Network later 2. Use Network, I.e discover / search other peers
4

Gnutella Jargon
Servent: A Gnutella node. Each servent is both a server and a client.
2 Hops

1 Hop

Hops: a hop is a pass through an intermediate node


client

TTL: how many hops a packet can go before it dies (default setting is 7 in Gnutella)

Gnutella Scenario
Step 0: Join the network Step 1: Determining who is on the network "Ping" packet is used to announce your presence on the network. Other peers respond with a "Pong" packet. Also forwards your Ping to other connected peers A Pong packet also contains: an IP address port number amount of data that peer is sharing Pong packets come back via same route Step 2: Searching Gnutella "Query" ask other peers if they have the file you desire A Query packet might ask, "Do you have any content that matches the string Double Helix"? Peers check to see if they have matches & respond (if they have any matches) & send packet to connected peers Continues for TTL Step 3: Downloading Peers respond with a QueryHit (contains contact info) File transfers use direct connection using HTTP protocols GET method
6

Remarks
Simple idea , but lacks scalability, since query flooding wastes bandwidth. Sometimes, existing objects may not be located due to limited TTL. Subsequently, various improved search strategies have been proposed.
7

Searching in Gnutella
The topology is dynamic, I.e. constantly changing. How do we model a constantly changing topology? Usually, we begin with a static topology, and later account for the effect of churn. Modeling topology (measurements provide useful inputs) Random graph Power law graph

Random graph: Erds-Rnyi model


A random graph G(n, p) is constructed by starting with
a set of n vertices, and adding edges between pairs of nodes at random. Every possible edge occurs independently

with probability p.
Q. Is Gnutella topology a random graph?

Gnutella topology
Gnutella topology is actually a power-law graph. (Also called scale-free graph) What is a power-law graph? The number of nodes with degree k = c.k - r (Contrast this with Gaussian distribution where the number of nodes with degree k = c. 2 - k. )

________________
Many graphs in the nature exhibit power-law characteristics. Examples, world-wide web (the number of pages that have k in-links is proportional to k - 2), The fraction of scientific papers that receive k citations is k -3 etc.
10

AT&T Call Graph

# of telephone numbers from which calls were made

How many telephone numbers receive calls from k different telephone numbers?

# of telephone numbers called


11 4

Gnutella network

power-law link distribution


10 proportion of nodes
2

data power-law fit t = 2.07

10

10

10

10 number of neighbors

summer 2000, data provided by Clip2


12 5

A possible explanation
Nodes join at different times. The more connections a node has, the more likely it is to acquire new connections (Rich gets richer). Popular webpages attract new pointers.

It has been mathematically shown that such a growth


process produces power-law network
13 7

Search strategies
Flooding Random walk / - Biased random walk/ - Multiple walker random walk (Combined with) One-hop replication /

Two-hop replication
k-hop replication
14

On Random walk
Let p(d) be the probability that a random walk on a dD lattice returns to the origin. In 1921, Plya proved that, (1) p(1)=p(2)=1, but (2) p(d)<1 for d>2 There are similar results on two walkers meeting each other via random walk
15

Search via random walk


Existence of a path does not necessarily mean that such a path can be discovered

16

Search via Random Walk


Search metrics Delay = discovery time in hops

Overhead = total distance covered by the walker Both should be as small as possible. For a single random walker, these are equal. K random walkers is a compromise. For search by flooding, if delay = h then overhead = d + d2 + + dh where d = degree of a node.
17

A simple analysis of random walk


Let p = Population of the object. i.e. the fraction of nodes hosting the object T = TTL (time to live) Hop count h 1 2 Probability of success p (1-p).p

3
T

(1-p)2.p
(1-p)T-1.p
18

A simple analysis of random walk


Expected hop count E(h) =

1.p + 2.(1-p).p + 3(1-p)2.p + + T.(1-p)T-1.p


= 1/p. (1-(1-p)T) - T(1-p)T

With a large TTL, E(h) = 1/p, which is intuitive. With a small TTL, there is a risk that search will time out before an existing object is located.

19

K random walkers
Assume they all k walkers start in unison. Probability that none could find the object after one hop = (1-p)k. The probability. that none succeeded after T hops = (1-p)kT. So the probability that at least one walker succeeded is 1-(1-p)kT. A typical assumption is that the search is abandoned as soon as at least one walker succeeds

As k increases, the overhead increases, but the delay decreases. There is a tradeoff.

20

Increasing search efficiency


Major strategies

1. Biased walk utilizing node degree heterogeneity.


2. Utilizing structural properties like random graph, power-law graphs, or small-world properties

3. Topology adaptation for faster search


4. Introducing two layers in the graph structure using supernodes

21

One hop replication


Each node keeps track of the indices of the files belonging to its immediate neighbors. As a result, high capacity / high degree nodes can provide useful clues to a large number of search queries.

Where is

22

Biased random walk


P=2/10 P=5/10

P=3/10

Each node records the degree of the neighboring nodes. Search easily gravitates towards high degree nodes that hold more clues.
23

power-law graph
number of nodes found
94

67 63 54

6 1

Deterministic biased walk


24 9

The next step


This growing surge in popularity revealed the limits of the initial protocol's scalability. In early 2001, variations on the protocol improved the scalability. Instead of treating every user as client and server, some users were treated as "ultrapeers or supernodes, routing search requests and responses for users connected to them.

25

The KaZaA approach


Where is ABC?

ABC

Supernode Supernode

Powerful nodes (supernodes) act as local index servers, and client queries are propagated to other supernodes. Two-layered architecture.
26

You might also like