Networkx: Network Analysis With Python: Salvatore Scellato
1. Introduction to NetworkX
• Tool to study the structure and dynamics of social, biological, and infrastructure
networks
>>> import networkx as nx
>>> g = nx.Graph()
>>> g.add_edge('a', 'b', weight=0.1)
>>> g.add_edge('b', 'c', weight=1.5)
>>> g.add_edge('a', 'c', weight=1.0)
>>> g.add_edge('c', 'd', weight=2.2)
>>> print(nx.shortest_path(g, 'b', 'd'))
['b', 'c', 'd']
>>> print(nx.shortest_path(g, 'b', 'd', weight='weight'))
['b', 'a', 'c', 'd']
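The two answers differ because the first call counts hops and the second sums edge weights. As a small sketch on the same four edges, the snippet below checks the arithmetic with `nx.shortest_path_length`; the variable names are illustrative only.

```python
import networkx as nx

g = nx.Graph()
g.add_weighted_edges_from([("a", "b", 0.1), ("b", "c", 1.5),
                           ("a", "c", 1.0), ("c", "d", 2.2)])

# Without a weight argument every edge counts as one hop.
hops = nx.shortest_path_length(g, "b", "d")
# With weight="weight" the edge weights along the path are summed.
cost = nx.shortest_path_length(g, "b", "d", weight="weight")
# Total weight of the fewest-hops route b-c-d, for comparison.
direct_cost = g["b"]["c"]["weight"] + g["c"]["d"]["weight"]

print(hops, round(cost, 2), round(direct_cost, 2))
```

The route through 'a' has one more hop but a smaller total weight than b-c-d, which is why the weighted query prefers it.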
Introduction to NetworkX - online resources
Online documentation and active mailing list with helpful developers and
contributors.
Introduction to NetworkX - Python’s Holy Trinity
• It is possible to draw within NetworkX, or to export network data and draw
it with other programs (e.g., GraphViz, matplotlib)
Introduction to NetworkX - official website
http://networkx.lanl.gov/
2. Getting started with Python and NetworkX.
Getting started - import NetworkX
There are different Graph classes for undirected and directed networks. Let’s
import NetworkX and create an instance of the basic Graph class
>>> import networkx as nx
>>> g = nx.Graph()
The graph g can be grown in several ways. NetworkX includes many graph
generator functions and facilities to read and write graphs in many formats.
Getting started - add nodes
# A list of nodes
>>> g.add_nodes_from([2, 3])
# A container of nodes
>>> h = nx.path_graph(10)
>>> g.add_nodes_from(h) # g now contains the nodes of h
A node can be any hashable object such as strings, numbers, files, functions,
and more. This provides important flexibility for all your projects.
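As a minimal sketch of what "any hashable object" means in practice, the snippet below mixes a string, a number, a tuple and a function as nodes of one graph. The node choices are purely illustrative; an unhashable object such as a list is rejected.

```python
import math
import networkx as nx

g = nx.Graph()
g.add_node("spam")      # a string
g.add_node(42)          # a number
g.add_node((0, 1))      # a tuple (hashable because its items are)
g.add_node(math.cos)    # a function object
g.add_edge("spam", (0, 1))

print(g.number_of_nodes())

# A list is mutable and therefore not hashable, so it cannot be a node.
try:
    g.add_node([1, 2])
    list_accepted = True
except TypeError:
    list_accepted = False
print(list_accepted)
```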
# Single edge
>>> g.add_edge(1, 2)
>>> e = (2, 3)
>>> g.add_edge(*e) # unpack edge tuple
# List of edges
>>> g.add_edges_from([(1, 2), (1, 3)])
# Container of edges
>>> g.add_edges_from(h.edges())
Any NetworkX graph behaves like a Python dictionary with nodes as primary keys.
The special edge attribute 'weight' should always be numeric and holds values used by
algorithms requiring weighted edges.
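The dictionary analogy can be sketched as follows, on a tiny made-up graph: membership, length and indexing all work as they would on a dict keyed by node, and the 'weight' attribute is an ordinary entry in the per-edge dictionary.

```python
import networkx as nx

g = nx.Graph()
g.add_edge(1, 2, weight=1.5)
g.add_edge(1, 3, weight=2.5)

print(1 in g)          # membership test, as with dict keys
print(len(g))          # number of nodes
print(sorted(g[1]))    # neighbours of node 1
print(g[1][2])         # attribute dictionary of edge (1, 2)

# Edge attributes are ordinary dictionary entries and can be updated in place.
g[1][2]["weight"] += 1.0
total = sum(attrs["weight"] for _, _, attrs in g.edges(data=True))
print(total)
```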
Many applications require iteration over nodes or over edges: this is simple and easy in
NetworkX.
>>> g = nx.Graph() # start from an empty graph
>>> g.add_edge(1, 2)
>>> for node in g.nodes():
...     print(node, g.degree(node))
1 1
2 1
>>> g.add_edge(1, 3, weight=2.5)
>>> g[1][2]['weight'] = 1.5
>>> for n1, n2, attr in g.edges(data=True):
...     print(n1, n2, attr['weight'])
1 2 1.5
1 3 2.5
Getting started - directed graphs
>>> dg = nx.DiGraph()
>>> dg.add_weighted_edges_from([(1, 4, 0.5), (3, 1, 0.75)])
>>> dg.out_degree(1, weight='weight')
0.5
>>> dg.degree(1, weight='weight')
1.25
>>> list(dg.successors(1))
[4]
>>> list(dg.predecessors(1))
[3]
Some algorithms work only for directed graphs and others are not well defined for
directed graphs. If you want to treat a directed graph as undirected for some
measurement you should probably convert it using Graph.to_undirected()
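A short sketch of that conversion, reusing the two weighted edges from the directed example: connected components are one measurement that NetworkX defines only for undirected graphs, so the graph is converted first.

```python
import networkx as nx

dg = nx.DiGraph()
dg.add_weighted_edges_from([(1, 4, 0.5), (3, 1, 0.75)])

# to_undirected() returns a new Graph; edge attributes are copied over.
ug = dg.to_undirected()
components = list(nx.connected_components(ug))

print(ug.is_directed())
print(ug[1][3]["weight"])
print(len(components))
```

Note that if both (u, v) and (v, u) exist in the directed graph, only one undirected edge survives, so attributes that differ between the two directions need to be reconciled by hand.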
Getting started - multigraphs
NetworkX provides classes for graphs which allow multiple edges between any pair
of nodes, MultiGraph and MultiDiGraph.
This can be powerful for some applications, but many algorithms are not well
defined on such graphs: shortest path is one example.
Where results are not well defined you should convert to a standard graph in a way
that makes the measurement well defined.
>>> mg = nx.MultiGraph()
>>> mg.add_weighted_edges_from([(1, 2, .5), (1, 2, .75),
...                             (2, 3, .5)])
>>> dict(mg.degree(weight='weight'))
{1: 1.25, 2: 1.75, 3: 0.5}
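One possible way to convert a multigraph into a standard graph is sketched below. It keeps the smallest weight among parallel edges, which is an assumed rule that suits shortest-path questions; summing the weights would be the natural choice for flows or contact counts.

```python
import networkx as nx

mg = nx.MultiGraph()
mg.add_weighted_edges_from([(1, 2, 0.5), (1, 2, 0.75), (2, 3, 0.5)])

# Collapse parallel edges, keeping the smallest weight for each node pair
# (an assumed rule, chosen here with shortest paths in mind).
g = nx.Graph()
for u, v, w in mg.edges(data="weight"):
    if g.has_edge(u, v):
        g[u][v]["weight"] = min(g[u][v]["weight"], w)
    else:
        g.add_edge(u, v, weight=w)

print(g.number_of_edges())
print(g[1][2]["weight"])
print(nx.shortest_path_length(g, 1, 3, weight="weight"))
```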
Getting started - graph generators
# classic graphs
>>> K_5 = nx.complete_graph(5)
>>> K_3_5 = nx.complete_bipartite_graph(3, 5)
>>> barbell = nx.barbell_graph(10, 10)
>>> lollipop = nx.lollipop_graph(10, 20)
# random graphs
>>> er = nx.erdos_renyi_graph(100, 0.15)
>>> ws = nx.watts_strogatz_graph(30, 3, 0.1)
>>> ba = nx.barabasi_albert_graph(100, 5)
>>> red = nx.random_lobster(100, 0.9, 0.9)
Getting started - graph I/O
NetworkX is able to read/write graphs from/to files using common graph formats:
• edge lists
• adjacency lists
• GML
• GEXF
• Python pickle
• GraphML
• Pajek
• LEDA
• YAML
Formats
• Node pairs with no data:
1 2
• Python dictionary as data:
1 2 {'weight':7, 'color':'green'}
• Arbitrary data:
1 2 7 green
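As a sketch of the "Python dictionary as data" edge-list format, the snippet below writes a two-edge graph to a file and reads it back. The file name and the temporary directory are hypothetical.

```python
import os
import tempfile
import networkx as nx

g = nx.Graph()
g.add_edge(1, 2, weight=7, color="green")
g.add_edge(2, 3, weight=1, color="red")

# Hypothetical file name inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example.edgelist")

# data=True writes the whole attribute dictionary after each node pair.
nx.write_edgelist(g, path, data=True)
with open(path) as f:
    lines = f.read().splitlines()
print(lines[0])

# nodetype=int converts the node labels back from strings to integers.
h = nx.read_edgelist(path, nodetype=int)
print(sorted(h.nodes()))
print(h[1][2])
```

Without `nodetype=int` the nodes would come back as the strings '1', '2' and '3', which is a common source of confusing lookup errors.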
Getting started - draw a graph
NetworkX is not primarily a graph drawing package but it provides basic drawing
capabilities by using Matplotlib. For more complex visualization techniques it
provides an interface to use the open source Graphviz software package.
Note that, when these slides were written, the drawing package in NetworkX was not
yet compatible with Python versions 3.0 and above; current releases no longer have
this limitation.
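A minimal sketch of the Matplotlib route is shown below, assuming Matplotlib is installed. The graph, the layout function and the output file name are arbitrary choices for illustration; the non-interactive "Agg" backend is selected so the figure is written to a file without opening a window.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window
import matplotlib.pyplot as plt
import networkx as nx

g = nx.lollipop_graph(5, 3)

fig, ax = plt.subplots(figsize=(5, 5))
# circular_layout places the nodes on a circle; any layout function would do.
pos = nx.circular_layout(g)
nx.draw(g, pos=pos, ax=ax, with_labels=True, node_color="lightblue")

# Hypothetical output file inside a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "lollipop.png")
fig.savefig(out_path)
plt.close(fig)
print(os.path.exists(out_path))
```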
3. Basic network analysis.
Basic network analysis - graph properties
Let’s load the Hartford drug users network: it’s a directed graph with integers as
nodes.
hartford = nx.read_edgelist('hartford.txt',
                            create_using=nx.DiGraph(), nodetype=int)
NetworkX takes advantage of Python dictionaries to store node and edge measures.
The dict type is a data structure that represents a key-value mapping.
Let’s compute in- and out-degree distribution of the graph and plot them. Don’t try this method
with massive graphs, it’s slow...!
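The plotting code uses `in_values`, `in_hist`, `out_values` and `out_hist`, whose computation does not appear in the text. The sketch below is one plausible way to build them, assuming they hold the sorted distinct degrees and the number of nodes with each degree. It runs on a small stand-in graph because the Hartford data file is not bundled here, and `degree_histogram` is a hypothetical helper name.

```python
import networkx as nx

# Small stand-in for the Hartford network.
dg = nx.DiGraph([(1, 2), (1, 3), (2, 3), (3, 1), (4, 1)])

def degree_histogram(degrees):
    """Return (values, counts): the sorted distinct degrees and the number
    of nodes having each of them."""
    degree_list = list(degrees.values())
    values = sorted(set(degree_list))
    # list.count rescans the whole list for every distinct value, which is
    # why this approach becomes slow on large graphs.
    counts = [degree_list.count(v) for v in values]
    return values, counts

in_values, in_hist = degree_histogram(dict(dg.in_degree()))
out_values, out_hist = degree_histogram(dict(dg.out_degree()))
print(in_values, in_hist)
print(out_values, out_hist)
```

For large graphs, `collections.Counter(degree_list)` produces the same counts in a single pass.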
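import matplotlib.pyplot as plt

# in_values/out_values: sorted distinct degrees; in_hist/out_hist: number of
# nodes with each degree (assumed to have been computed beforehand)
plt.figure()
plt.plot(in_values, in_hist, 'ro-')    # in-degree
plt.plot(out_values, out_hist, 'bv-')  # out-degree
plt.legend(['In-degree', 'Out-degree'])
plt.xlabel('Degree')
plt.ylabel('Number of nodes')
plt.title('Hartford drug users network')
plt.savefig('hartford_degree_distribution.pdf')
plt.close()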
Basic network analysis - degree distribution
Basic network analysis - node centralities
Now, we will convert the graph to an undirected network and extract the main connected component;
then we will compute node centrality measures.
hartford_ud = hartford.to_undirected()
# Keep the largest connected component (the node set with the most members)
largest_cc = max(nx.connected_components(hartford_ud), key=len)
hartford_mc = hartford_ud.subgraph(largest_cc).copy()
# Betweenness centrality
bet_cen = nx.betweenness_centrality(hartford_mc)
# Closeness centrality
clo_cen = nx.closeness_centrality(hartford_mc)
# Eigenvector centrality
eig_cen = nx.eigenvector_centrality(hartford_mc)
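Since the Hartford data file is not included here, the sketch below runs centrality functions on Krackhardt's kite, a small graph bundled with NetworkX in which different nodes score highest under different measures. Degree centrality is added for contrast and eigenvector centrality is left out to keep the example short.

```python
import networkx as nx

# Krackhardt's kite: a 10-node graph built so that different nodes score
# highest under different centrality measures.
kite = nx.krackhardt_kite_graph()

deg_cen = nx.degree_centrality(kite)
bet_cen = nx.betweenness_centrality(kite)
clo_cen = nx.closeness_centrality(kite)

# Each result is a plain dict, so max() with a key function finds the top node.
top_degree = max(deg_cen, key=deg_cen.get)
top_betweenness = max(bet_cen, key=bet_cen.get)
print(top_degree, top_betweenness)
print(round(clo_cen[5], 3), round(clo_cen[3], 3))
```

The best-connected node is not the one that bridges the most shortest paths, which is the reason for computing more than one measure.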
Basic network analysis - most central nodes
To find the most central nodes we will use Python’s list comprehension technique to do
basic data manipulation on our centrality dictionaries.
def highest_centrality(cent_dict):
    """Return a tuple (node, value) for the node with the largest
    value in a NetworkX centrality dictionary."""
    # Create a list of (value, node) tuples so that sorting orders by value
    cent_items = [(b, a) for (a, b) in cent_dict.items()]
    # Sort in descending order so that the largest value comes first
    cent_items.sort(reverse=True)
    return tuple(reversed(cent_items[0]))
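The same kind of dictionary manipulation extends to ranking several nodes at once. The sketch below uses made-up centrality scores and a hypothetical helper, `top_nodes`, to return the k highest-scoring nodes, then uses a list comprehension to filter by a threshold.

```python
# Hypothetical centrality scores for five nodes.
cent_dict = {"a": 0.10, "b": 0.45, "c": 0.30, "d": 0.45, "e": 0.05}

def top_nodes(cent_dict, k=3):
    """Return the k (node, value) pairs with the largest values,
    breaking ties by node label."""
    ranked = sorted(cent_dict.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]

print(top_nodes(cent_dict))

# A list comprehension then filters the nodes at or above a threshold.
above = [node for node, value in cent_dict.items() if value >= 0.3]
print(sorted(above))
```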
Basic network analysis - plotting results
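import numpy as np
import matplotlib.pyplot as plt
def centrality_scatter(dict1, dict2, path="",
                       ylab="", xlab="", title="", line=False):
    # Create figure and drawing axis
    fig = plt.figure(figsize=(7, 7))
    ax1 = fig.add_subplot(111)
    # Assumption: the slide omits how xdata and ydata are built; here both
    # dictionaries are read in the same sorted node order.
    nodes = sorted(dict1)
    xdata = [dict1[n] for n in nodes]
    ydata = [dict2[n] for n in nodes]
    ax1.scatter(xdata, ydata)
    if line:
        # Use NumPy to calculate the best linear fit
        slope, yint = np.polyfit(xdata, ydata, 1)
        xline = ax1.get_xticks()
        yline = [slope * x + yint for x in xline]
        ax1.plot(xline, yline, ls='--', color='b')
    # Label the axes and save the figure (assumed use of the remaining arguments)
    ax1.set(title=title, xlabel=xlab, ylabel=ylab)
    fig.savefig(path)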
Getting results out of NetworkX will, in most cases, entail exporting either the raw
network data or the metrics from some network analysis:
1. NetworkX can write out network data in as many formats as it can read them,
and the process is equally straightforward.
2. When you want to export metrics, you can use Python’s built-in XML and CSV
libraries.
Basic network analysis - write results to file
Let’s export a CSV file with a node ID and its centrality values on each line:
this can then be used for plotting without recomputing all the centrality measures.
results = [(k, bet_cen[k], clo_cen[k], eig_cen[k])
           for k in hartford_mc]
# The with statement closes the file even if a write fails
with open('hartford_results.txt', 'w') as f:
    for item in results:
        f.write(','.join(map(str, item)))
        f.write('\n')
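The built-in `csv` module offers an alternative to joining the fields by hand: it takes care of quoting and can write a header row. The sketch below uses made-up centrality values and a hypothetical file name in a temporary directory.

```python
import csv
import os
import tempfile

# Hypothetical centrality dictionaries keyed by node ID.
bet_cen = {1: 0.50, 2: 0.00, 3: 0.25}
clo_cen = {1: 0.80, 2: 0.40, 3: 0.60}

path = os.path.join(tempfile.mkdtemp(), "results.csv")

# newline="" lets the csv module control line endings itself.
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["node", "betweenness", "closeness"])
    for node in sorted(bet_cen):
        writer.writerow([node, bet_cen[node], clo_cen[node]])

# Read the file back; every field comes back as a string.
with open(path, newline="") as f:
    rows = list(csv.DictReader(f))
print(rows[0]["node"], float(rows[0]["betweenness"]))
```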
4. Writing your own code.
Write your own code - BFS
With Python and NetworkX it’s easy to write any graph-based algorithm
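As an illustration of that claim, breadth-first search can be written in about a dozen lines. This is a sketch that relies only on the dictionary-like access `g[n]`; it is not necessarily the version shown on the original slide, and the sample graph is made up.

```python
from collections import deque

import networkx as nx

def breadth_first_search(g, source):
    """Yield (parent, node) pairs in breadth-first order from source."""
    queue = deque([(None, source)])
    enqueued = {source}
    while queue:
        parent, n = queue.popleft()
        yield parent, n
        # Enqueue each unseen neighbour exactly once (sorted for determinism).
        for child in sorted(set(g[n]) - enqueued):
            enqueued.add(child)
            queue.append((n, child))

g = nx.Graph([(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)])
order = [node for _, node in breadth_first_search(g, 1)]
parents = {node: parent for parent, node in breadth_first_search(g, 1)}
print(order)
print(parents)
```

NetworkX also ships its own traversal functions, such as `nx.bfs_edges`, which are preferable outside of teaching examples.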
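Write your own code - network triangles
def get_triangles(g):
    for n1 in g.nodes():
        neighbors1 = set(g[n1])
        # Only neighbours n2 > n1 are considered, so n1-n2 is always an edge
        # and each triangle is reported exactly once
        for n2 in (x for x in neighbors1 if x > n1):
            neighbors2 = set(g[n2])
            common = neighbors1 & neighbors2
            for n3 in (x for x in common if x > n2):
                yield n1, n2, n3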
Write your own code - average neighbours’ degree
Compute the average degree of each node’s neighbours (long and one-liner version).
# Long version
def avg_neigh_degree(g):
    data = {}
    for n in g.nodes():
        if g.degree(n):
            data[n] = (float(sum(g.degree(i) for i in g[n])) /
                       g.degree(n))
    return data

# One-liner version
def avg_neigh_degree(g):
    return dict((n, float(sum(g.degree(i) for i in g[n])) /
                 g.degree(n)) for n in g.nodes() if g.degree(n))
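A quick way to check hand-written code like this is to compare it against the library. The sketch below recomputes the measure with a dict comprehension on a four-node path and compares it with the built-in `nx.average_neighbor_degree`; the test graph is an arbitrary choice.

```python
import networkx as nx

g = nx.path_graph(4)  # 0 - 1 - 2 - 3

# Hand-written version: mean degree of the neighbours of each node.
manual = {n: sum(g.degree(i) for i in g[n]) / g.degree(n)
          for n in g.nodes() if g.degree(n)}
# Library version for comparison.
builtin = nx.average_neighbor_degree(g)

print(manual)
print(all(abs(manual[n] - builtin[n]) < 1e-12 for n in manual))
```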
5. You are ready for your project!
What you have learnt today about NetworkX
• How to create graphs from scratch, with generators and by loading local data
• How to compute basic network measures, how they are stored in NetworkX
and how to manipulate them with list comprehension
• How to use matplotlib to visualize and plot results (useful for final report!)
• How to use and include NetworkX features to design your own algorithms/
analysis
Useful links
Thanks!
Questions?
Salvatore Scellato
Web: www.cl.cam.ac.uk/~ss824/
Twitter: @thetarro
Foursquare: thetarro