MEAP Edition
Manning Early Access Program
Visualizing Graph Data
Version 7
https://forums.manning.com/forums/visualizing-graph-data
welcome
Thank you for purchasing the MEAP for Visualizing Graph Data! The book is an introductory
level book aimed at data scientists who want to take better advantage of graphs by visualizing
them, and at web developers who want to create applications that include graph visualization.
Although many of the examples use JavaScript for creating interactive user interfaces around
graphs, knowledge of the language is not strictly required and I expect that a non-
programmer would get value from this book as well.
I am releasing the first three chapters to start.
Chapter 1 covers graphs more generally and will be helpful to someone interested in
understanding graph technology and how it might be useful in the context of software
applications.
Chapter 2 goes through some concrete examples of graphs and graph visualizations in
different industries to show the value of visualizations with more detail.
Chapter 3 introduces some of the tools that we use in the rest of the book, KeyLines and
Gephi, and explains when they might be helpful. I will also have a more technical appendix that
will cover detailed implementations using some of the other popular visualization tools like D3 or
Sigma.js.
Please feel free to take advantage of the Author Online forum. I’ll be reviewing all the
feedback there and responding when I can. Your feedback is helpful in making sure the book
is accessible and useful during the development process.
—Corey Lanum
brief contents
PART 1: GRAPH VISUALIZATION BASICS
1 Introduction to Graph Visualization
2 Case Studies in Graph Visualization
3 An Introduction to Gephi and KeyLines
PART 2: VISUALIZE YOUR OWN DATA
4 Data Modeling
5 Engage Your Audience: How To Build Graph Visualizations
6 Creating Interactive Visualizations
7 Graph Layouts: How to Organize a Chart
8 Big Data: Using Graphs When There is Too Much Data
9 Dynamic Graphs: How to Show Data Over
10 Graphs on Maps: The Where of Graph Visualization
APPENDIXES: ALTERNATIVE VISUALIZATION TOOLS
A Visualize Scientific Data with Cytoscape
B Build your own visualizations using D3.js
©Manning Publications Co. We welcome reader comments about anything in the manuscript - other than typos and
other simple mistakes. These will be cleaned up during production of the book by copyeditors and proofreaders.
https://forums.manning.com/forums/visualizing-graph-data
1
Introduction to Graph Visualization
In December 2001, the Enron Corporation filed for what was at the time the largest-ever
corporate bankruptcy. Its stock had fallen from a high of $90 per share the previous year to
$0.61, decimating its employees’ pensions and shareholders’ investments in it. The FBI’s
investigation into this collapse became the largest white-collar criminal investigation in history
as they seized over 3000 boxes of documents and 4 terabytes of data. Among the information
seized were about 600,000 e-mails between key executives at the organization. Although the
FBI took pains to read every e-mail individually, the investigators recognized that they were
unlikely to find a smoking gun – people committing complex financial fraud don’t often
disclose their actions in a written form. And in 2001, e-mails were only starting to become the
primary means of internal communications; lots of information was still exchanged via phone
calls.
In addition to looking at the text of individual e-mails, the FBI also wanted to uncover
patterns in the communications, perhaps in an attempt to better understand who the decision
makers were within Enron and who had access to much of the company's internal information.
To do this, they modeled the e-mails in Enron as a graph.
Before we get into how the FBI used graph visualization in their investigation leading to the
conviction of 24 Enron executives, it’s worth mentioning there are two types of people who
can benefit by reading this book. The first are web developers interested in building a graph-
visualization application or adding graph-visualization capability to an existing web application.
The included JavaScript samples will be particularly relevant for this group.
The second type is data scientists or data engineers who wish to learn more about graph-
visualization technology and how it can enhance their research. The coding samples may not
add much for this group; those sections can be skipped for these readers, who may find the
Gephi examples of prime interest.
Back to the FBI. A graph is a model of data that consists of nodes, which are discrete data
elements, and edges, which are relationships between nodes. The graph model brings to the
forefront relationships that may be hidden in tabular views of the same data and illustrates
what is most important. By making the relationships between data elements a core part
of the data structure, it helps you identify patterns in the data that wouldn’t otherwise be
apparent. But building a graph data structure is only half the solution to pattern recognition.
This book will teach you how to visualize graphs using interactive node-link
diagrams, and by the end, you’ll be able to create your own dynamic, interactive visualizations
using a variety of tools available today.
In this chapter, we’ll go a little deeper into the concept of a graph, its history and uses,
and talk about various techniques used to visualize graph data. Subsequent chapters will build
on this framework by introducing concrete examples of graph visualizations and the data they
are based on, and by discussing various techniques for creating useful visualizations.
directed – the relationship has a direction. Stella owns the car, but it doesn’t make
sense to say the car owns Stella.
undirected – the two items are linked without the concept of direction; the relationship
inherently goes both ways. If Stella is linked to Roger because they committed a crime
together, saying Stella was arrested with Roger means the same thing as saying
Roger was arrested with Stella.
Figure 1.1: A property graph of a single e-mail between Enron executives. The two nodes are the sender and
recipient of the e-mail, and the edge is the e-mail
Both nodes and edges can have properties: key-value pairs that describe either the data
element itself or the relationship. Below is a simple property
graph showing that Stella bought a 2008 Volkswagen Jetta in September 2007 and sold it in
October 2013. Modeling this as a graph, as in figure 1.2, highlights that Stella had a
relationship with this car, albeit a temporary one.
Figure 1.2: A simple property graph with two nodes and an edge. Stella (the first node) bought a 2008
Volkswagen Jetta (the second node) in September 2007 and sold it in October 2013. By modeling it as a graph,
it highlights that Stella had a relationship with this car (the edge).
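To make the property graph idea concrete, here is a minimal sketch of the Stella and Jetta example as plain Python data. The key names ("id", "label", "properties", "type") and the edge type "OWNED" are assumptions chosen for illustration; they are not a format defined by this book or by any particular tool.

```python
# A minimal property graph: nodes and edges are plain dictionaries.
# All key names here are illustrative assumptions, not a standard format.
nodes = [
    {"id": "stella", "label": "Person", "properties": {"name": "Stella"}},
    {"id": "jetta", "label": "Car",
     "properties": {"make": "Volkswagen", "model": "Jetta", "year": 2008}},
]

edges = [
    {
        "source": "stella",
        "target": "jetta",
        "type": "OWNED",
        "directed": True,  # Stella owned the car, not the other way around
        "properties": {"bought": "2007-09", "sold": "2013-10"},
    },
]


def describe(edge, nodes):
    """Return a one-line, human-readable summary of an ownership edge."""
    by_id = {n["id"]: n for n in nodes}
    src = by_id[edge["source"]]["properties"]
    dst = by_id[edge["target"]]["properties"]
    return (f'{src["name"]} {edge["type"]} {dst["year"]} {dst["make"]} '
            f'{dst["model"]} from {edge["properties"]["bought"]} '
            f'to {edge["properties"]["sold"]}')


print(describe(edges[0], nodes))
```

The dates live on the edge rather than on either node because they describe the relationship, not Stella or the car.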
An e-mail is a relationship, too, between the sender and the recipient. The properties of the
nodes are things like e-mail address, name, and title, and the properties of the relationship are
the date/time it was sent, its subject line, and the text of the e-mail.
To prove conspiracy, the FBI was interested in all the e-mails sent among the Enron
executives, not just a single one, so let’s add some more nodes to represent a larger number
of e-mails sent during a specified period of time as you can see in figure 1.3.
Figure 1.3: A graph of some of the Enron executives' e-mail communications. You can easily see that Timothy
Belden is a hub of communication in this segment of Enron, sending and receiving email from many other
executives
This is called a directed graph because it matters whether Kevin Presto sent an e-mail to
Timothy Belden or received one – there’s a big difference between sending and receiving
information when you’re investigating who knew what when. The arrowheads on the edges
show that directionality: Kevin Presto sent an e-mail to Timothy Belden, but Timothy Belden
did not reply, indicating they may not have been close associates, or they may have spoken
offline. As we start to add more data to our graph, you can see the value of graphs – patterns
become apparent. In the example above, you can easily see that Timothy Belden is a hub of
communication in this segment of Enron, sending e-mail to and receiving e-mail from many other
executives.
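The same observation can be made without a picture by counting edges. The sketch below is only an illustration: the Presto-to-Belden e-mail comes from the text, while "Executive A", "Executive B", and "Executive C" are hypothetical placeholders, and treating the person with the most sent-plus-received e-mails as the hub is one simple assumption among many possible measures.

```python
from collections import Counter

# Each tuple is (sender, recipient). Only the Presto -> Belden e-mail is
# taken from the text; the remaining rows are hypothetical placeholders.
emails = [
    ("Kevin Presto", "Timothy Belden"),
    ("Timothy Belden", "Executive A"),
    ("Timothy Belden", "Executive B"),
    ("Executive A", "Timothy Belden"),
    ("Executive C", "Timothy Belden"),
    ("Executive B", "Executive C"),
]

sent = Counter(sender for sender, _ in emails)            # out-degree
received = Counter(recipient for _, recipient in emails)  # in-degree

people = set(sent) | set(received)
total = {person: sent[person] + received[person] for person in people}
hub = max(total, key=total.get)


def exchanged_both_ways(a, b):
    """True only if a e-mailed b and b e-mailed a (direction matters)."""
    return (a, b) in emails and (b, a) in emails


print(hub, "sent", sent[hub], "received", received[hub])
print(exchanged_both_ways("Kevin Presto", "Timothy Belden"))
```

Because the graph is directed, sent and received counts are kept separately; in an undirected graph there would be a single count per person.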
Figure 1.4: The Seven Bridges of Königsberg problem. Using this map of the bridges of Königsberg, Prussia, try
to draw a route that crosses every bridge exactly once.
Leonhard Euler proved this problem impossible by abstracting the regions of the city into
individual points and the bridges as paths between those points, as you can see in figure 1.5
below.
Figure 1.5: 7 Bridges and 4 Land Areas of Königsberg as a Graph. In this graph, nodes denote the land masses
bordering the Pregel River and two islands in its middle. Edges represent the bridges connecting the two islands
and two shorelines.
Each land area of Königsberg is indicated by a point, and the bridges are the lines that connect
the points. This is a graph, just like the Enron one. The graph model in figure 1.5 makes it
easier to see that nodes with an even number of links can be passed through easily (you can enter
and exit using two different links), but nodes with an odd number of links can only be either
the beginning or the end of a path (this is obvious for a node with only one link, but you can
also see it applies to 3, 5, etc.). The number of links from a node is called that node’s degree.
A route that crosses every bridge exactly once is therefore possible only if either no nodes or
exactly two nodes have an odd degree and the rest have an even degree. In the above diagram,
all four nodes have an odd degree, so the condition isn’t satisfied and it’s impossible to cross
each bridge only once. Graph theory has answered a problem previously considered intractable.
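The degree argument is easy to check mechanically. The following sketch lists the seven bridges as pairs of land areas and counts each node's degree. The node names are labels invented for this example, and the test assumes the graph is connected, which it is here.

```python
from collections import Counter

# The seven bridges as an undirected multigraph. Node names are labels
# chosen for this sketch: the two river banks and the two islands.
bridges = [
    ("north_bank", "island_1"), ("north_bank", "island_1"),
    ("south_bank", "island_1"), ("south_bank", "island_1"),
    ("north_bank", "island_2"), ("south_bank", "island_2"),
    ("island_1", "island_2"),
]

# Every bridge adds one to the degree of both land areas it touches.
degree = Counter()
for a, b in bridges:
    degree[a] += 1
    degree[b] += 1

odd_nodes = sorted(node for node, d in degree.items() if d % 2 == 1)

# In a connected graph, a walk using every edge exactly once exists only
# when zero or exactly two nodes have an odd degree.
walk_possible = len(odd_nodes) in (0, 2)

print(dict(degree))
print(walk_possible)
```

One island has degree 5 and the other three land areas have degree 3, so all four nodes are odd and the check fails.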
DEFINITION: A visualization is any method of using imagery to convey a point. When working with
computer graphics, it typically means finding a way to show large amounts of data in a single view. Creating
images to show graph data is called graph visualization.
In this case, we may want to consider the e-mail itself a node. Figure 1.6 shows the sender
and the recipients of an e-mail with subject line “Let’s Talk,” sent among Enron executives.
The nodes are executives from Enron, and the edges represent how they received the e-mail
(whether they were in the To:, CC:, or BCC: fields).
Consider the following very simple table. It consists of only two columns, a list of names and
the countries where that name is used. In the United States, the Social Security
Administration releases a similar table each year showing the popularity of first names given
to babies, as measured by new applications for Social Security numbers.
Name      Country
Juan      Mexico
João      Brazil
Jean      France
Antoine   France
Ignacio   Mexico
João      Portugal
This can be modeled as a graph – take each name as a node and also each country. The link
between them exists if the name and country appear in the same row. The result looks like the
following:
Figure 1.7: Using a graph to illustrate a table. The graph view of these name/country pairs
Looking at this graph tells us more than is immediately obvious from the table, namely that
France and Mexico are each linked to two names. It also shows that João is used in Brazil and
Portugal, but no other names are associated with those countries. As you can see, you can
generate graph models from even the simplest data set, but typically, you’re going to want
properties on the nodes, the links, or both. In this case, we may want the frequency of that
name’s use in the country as a property of the link, and perhaps whether that name is
typically male or female.
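As a rough sketch of this table-to-graph conversion, the rows of the table can be turned into links between name nodes and country nodes with a few lines of Python. The variable names are my own, and the two dictionaries are just one convenient way to hold an undirected graph with two kinds of nodes.

```python
from collections import defaultdict

# The rows of the name/country table above.
rows = [
    ("Juan", "Mexico"), ("João", "Brazil"), ("Jean", "France"),
    ("Antoine", "France"), ("Ignacio", "Mexico"), ("João", "Portugal"),
]

# Each row becomes one undirected link; store it from both ends so we can
# start from either a name node or a country node.
countries_of = defaultdict(set)
names_in = defaultdict(set)
for name, country in rows:
    countries_of[name].add(country)
    names_in[country].add(name)

# Names linked to more than one country, and countries linked to more
# than one name, are exactly the patterns the drawing makes visible.
shared_names = {n: sorted(c) for n, c in countries_of.items() if len(c) > 1}
busy_countries = {c: sorted(n) for c, n in names_in.items() if len(n) > 1}

print(shared_names)
print(busy_countries)
```

Adding a property such as frequency of use would mean storing a value per (name, country) pair instead of a bare set membership.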
VALUE OF GRAPHS
Graphs can be incredibly useful, but there’s a danger in overusing them. Many people, when
first exposed to graph concepts, begin to see graphs everywhere, in every data set, but forcing
a graph model onto data can sometimes obscure its meaning.
Graphs are a good choice when:
The links between items are not obvious. For example, linking someone’s first name
and last name together is unlikely to be useful unless you’re specifically looking at
the relationship of the names independently. “How many ‘Coreys’ drive black cars?”
There’s a structure embedded in the data. If every link has unique end points with no
other links, then the graph is a bunch of disconnected links, which doesn’t answer any
interesting question.
There are at least some properties on the nodes. If a data set doesn’t have any
properties, then it may create a pretty picture when drawn, but it’s impossible to tell
what you’re looking at.
Figure 1.8 below shows a graph model that is not very helpful. It represents the data found in
the back of a road atlas that shows the mileage and driving times between city pairs.
Figure 1.8: A graph of driving times between North American cities. A graph is not an effective way of presenting
this type of data because everything is connected to everything else.
In reality, every city in North America is connected to every other city via roads, and it’s
unlikely that the atlas browser wants to bother adding up the various segments and city pairs
necessary to drive from Richmond to Buffalo; they just want to know how far it is and how long
it takes to get there. In this case, they might prefer an association matrix, which is a
different way of representing graphs:
Association Matrix
An association matrix is a table displaying the node names as both the columns and rows. Then for
each pair of nodes that has a link between them, the cell between those names is either blocked with
a mark or filled in with a property value. In an undirected graph, the values will be repeated, as each
node pairing appears twice: once with the first node as a row and the second as a column, and then
vice versa.
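A small sketch may help show why the values repeat. It builds an association matrix (often called an adjacency matrix elsewhere) from a list of undirected links. Richmond and Buffalo appear in the text; "Springfield" and all three numbers are placeholders invented for the example, not real mileages.

```python
cities = ["Buffalo", "Richmond", "Springfield"]

# Undirected links with one property value each. The numbers are
# placeholders for illustration, not distances taken from any atlas.
links = {
    ("Buffalo", "Richmond"): 100,
    ("Buffalo", "Springfield"): 200,
    ("Richmond", "Springfield"): 300,
}

index = {city: i for i, city in enumerate(cities)}
matrix = [[None] * len(cities) for _ in cities]

# Undirected graph: write each value into both (row, column) positions.
for (a, b), value in links.items():
    matrix[index[a]][index[b]] = value
    matrix[index[b]][index[a]] = value


def lookup(a, b):
    """Read the property value for a pair of nodes straight from the matrix."""
    return matrix[index[a]][index[b]]


# Print the matrix with node names as both column and row headers.
print(" " * 12 + "".join(f"{city:>12}" for city in cities))
for city, row in zip(cities, matrix):
    cells = ["-" if value is None else str(value) for value in row]
    print(f"{city:<12}" + "".join(f"{cell:>12}" for cell in cells))
```

A lookup is a single cell read, which is what the atlas reader wants; the price is that the table grows with the square of the number of nodes.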
This table -- taken from a 2008 Rand McNally atlas -- is an association matrix presenting
similar data on city pairs.
Figure 1.9: The back page of my road atlas. City names are represented as both columns and rows and the cell
indicates the distance between them.
Sets of data where the relationships between the data elements are the most important
feature are the most useful to model as a graph. Graphs work best when those relationships are
a key component of the analysis. In this section, you’ve had a brief introduction to the graph data
model, how tabular data can be represented as a graph in the “Graph Data Model Introduced”
subsection, and when graphs might be useful. In the next section, we will discuss how and
when to visualize graphs, that is, draw pictures of this data model on paper or on computer
screens.
sort of model – and not picture how it looks. Ernest Rutherford developed the model of the
atom in 1911 that we’re all familiar with, a nucleus of protons and neutrons with electrons
whizzing in orbits around the nucleus. This model was quickly replaced by Schrödinger’s more
accurate model based around quantum mechanics, but the Rutherford model is still, more than a
century later, the one that the public embraces. Why? Because it can be pictured. The
Schrödinger model is more accurate, but it’s a mathematical concept, not a visual one, and
therefore has never acquired broad public appeal. Data is the same way; unless you can show
your audience what you’re talking about, they won’t recall it. Visualization helps bridge that
gap and gets the understanding of the data to the decision maker.
I described the property graph model, with nodes, edges, and properties in section 1.1,
but the topic of this book is graph visualization. The entire reason to collect data is to make
better informed decisions based on the data, so it’s important to not just collect data with no
useful way of accessing it. And with graph data, that typically means drawing the graph.
Although there are many different methods of graph visualization, and I’ll briefly discuss
many of them, the focus of this book is going to be on node-link visualization. This is not to
say that other visualizations are never useful, but node-link visualizations tend to have the
broadest appeal regardless of the data source and require the least technical background
for viewers to understand what they are seeing. We’ve been using node-link visualization
so far in this chapter, and it’s just what it sounds like. Nodes are points or polygons or icons,
and links are lines connecting those points. Node-link diagrams are almost always drawn on a
2D plane and almost never in three dimensions. An important aspect of the node-link diagram is
that a node’s location doesn’t tell you anything interesting about the node. Nodes are placed solely
for convenience and readability, which makes this quite different from a Cartesian scatterplot,
for example. An effect of this is that layout, or how nodes are arranged on the chart, becomes
much more important. Two charts with identical data but different layouts can imply different
things to the human eye.
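To see that position is a drawing decision rather than data, consider the sketch below. It places the same four nodes and three links with two different, deliberately simple layout functions. The function names and the tiny graph are invented for illustration; real tools offer far more sophisticated layouts.

```python
import math

# One small graph; the data never changes between the two layouts.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]


def circular_layout(nodes, radius=1.0):
    """Place nodes evenly around a circle, in list order."""
    step = 2 * math.pi / len(nodes)
    return {n: (radius * math.cos(i * step), radius * math.sin(i * step))
            for i, n in enumerate(nodes)}


def line_layout(nodes, spacing=1.0):
    """Place nodes left to right on a horizontal line, in list order."""
    return {n: (i * spacing, 0.0) for i, n in enumerate(nodes)}


def total_edge_length(layout, edges):
    """Sum of the drawn lengths of all links under a given layout."""
    return sum(math.dist(layout[a], layout[b]) for a, b in edges)


circle = circular_layout(nodes)
line = line_layout(nodes)

# Same nodes, same links, different pictures.
print(round(total_edge_length(circle, edges), 3))
print(round(total_edge_length(line, edges), 3))
```

On the circle, A and D are drawn next to each other even though no link joins them, while the line puts them at opposite ends. Neither picture changes the data, but a viewer may read them differently.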
1. The first is to better understand the structure of the data that you have.
2. The second purpose of visualization is to expose a broader audience to the data
connections.
This visualization, figure 1.10 below, shows the structure of the sales database and how its
elements are connected to one another, but not the connections between individual employees
and products.
Figure 1.10: A sales database showing connections between different data types
With regard to the first purpose, understanding the structure: In a dataset, what sorts of
things are linked to other things? Many of the diagrams you’ll see from graph databases are
designed to illuminate this structure.
In the above example, the structure of a sales database is visualized. Suppliers supply
products that are members of categories. Employees take orders that consist of products. To a
data scientist or an application engineer, this view is very important: it helps define the model
of the data, how it’s stored, and how users will interact with it. Get this wrong, and it can be a
very time-consuming and expensive process to fix.
The second purpose is to visualize the data inside your dataset. In this case, we’re not
interested in the categories of data, but the actual relationships among the data elements
themselves. Instead of saying “Employees sell orders which consist of products”, we can get
specific and say “Brad from Home Depot sold me a chainsaw from Husqvarna.” Then we can
get more specific – who else is buying Husqvarna chainsaws from Home Depot? What other
products are included in those orders? The visualization allows us to better understand the
connections embedded within the data itself.
A key reason that graph visualization is important is that it gives a visual interface to data
discovery. While much of the big data revolution of the past decade has been in understanding
trends in the aggregate data, it is equally important to discover connections and relationships
in the individual data that may not have been known. A dashboard is unlikely to show
this, but a graph allows the user to explore the data and discover these patterns
visually. We’ll discuss this more in chapter 6.
CIRCLE PLOT
If the primary goal is to show aggregate links between groups of nodes, as opposed to individual
data, something like the circle plot may make more sense. A good example is from
http://www.global-migration.info/ where they are showing global migration between
and within the 6 populated continents. Their data is a graph of migration patterns from each
country to each other country, but a node-link diagram of all this data would be busy and
wouldn’t show aggregate patterns as well as the circle plot, so it was a good choice.
Figure 1.11: A graph from Global-Migration.info. This stylish graph illustrates migration patterns among people from
all six inhabited continents.
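The aggregation behind a plot like this can be sketched in a few lines: country-to-country links are rolled up into group-to-group totals before anything is drawn. The flow numbers below are made up purely for illustration and are not taken from the Global-Migration.info data; the variable names are also my own.

```python
from collections import Counter

# Which group (continent) each node (country) belongs to.
continent_of = {
    "Mexico": "North America",
    "United States": "North America",
    "Brazil": "South America",
    "France": "Europe",
    "Portugal": "Europe",
}

# Directed links with a weight: (origin, destination, people).
# These counts are invented placeholders, not real migration figures.
flows = [
    ("Mexico", "United States", 50),
    ("Brazil", "Portugal", 20),
    ("Portugal", "France", 30),
    ("France", "United States", 10),
    ("Portugal", "Brazil", 5),
    ("Brazil", "France", 15),
]

# Roll individual links up into totals between (and within) groups.
aggregate = Counter()
for origin, destination, people in flows:
    key = (continent_of[origin], continent_of[destination])
    aggregate[key] += people

for (src, dst), people in sorted(aggregate.items()):
    print(f"{src} -> {dst}: {people}")
```

Six individual links collapse into five group-level totals here. With every country in the world the reduction is far larger, which is what makes the aggregate view readable.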
HIVE PLOT
Another example of an alternative to the node-link diagram is called the hive plot. As
mentioned above, node-link graph visualizations focus on individual data elements and the
connections between them. While useful, they can fail to identify and communicate
connections among different types or groups of data elements. Seeing those group-level
connections is especially useful when trying to understand structure in very large networks,
with tens or hundreds of
thousands of nodes or links. The hive plot divides nodes into three or more types and
places each type along its own axis radiating from the center of the chart. Links between
elements of different types are drawn as curved lines around the center of the chart. This
allows a viewer to visually distinguish types that are tightly linked from types that are
weakly linked, but it doesn’t display links between elements of the same type, and it makes
drilling down to look at subsets of the data much more difficult.
Figure 1.12: Hive plot of the E. coli bacterium. Notice the large number of connections of the leftmost group but the
fewer connections between the top and right.
There are lots of benefits to using node-link diagrams to better understand data, but there are
instances where other visualizations are more helpful. In general, node-link diagrams are valuable
when you want to focus on specifics, and are less helpful when you’re interested in
aggregates, as we’ve seen above.
1.3 Summary
In this chapter, I’ve introduced you to the definition of a graph, and discussed the benefits of
thinking about data in terms of nodes and links. I’ve also emphasized when this is beneficial
and when you’d see more benefit from a tabular view of data. I’ve also touched on the history
of graph visualization and the reason that drawing a picture of your data can be enlightening.
Although most of this book will focus on node-link visualizations, I mentioned a few other
styles of visualization that can be helpful in certain circumstances.
2
Case Studies in Graph Visualization
Over the past dozen years, interest in graphs has exploded beyond academia and into
industry. The intelligence failures that allowed the September 11, 2001 attacks were portrayed
in the media as a “failure to connect the dots,” as individual government agencies had
suspicions about individual terrorists, but no one was collecting and analyzing the big picture.
Well, connecting dots means understanding the relationships between individual data sets,
and although it still has a lot of room for improvement, the US Intelligence community was
one of the first adopters of graph visualization, specifically among anti-terrorism analysts and
investigators. The application of graphs to understand flows across a network also appealed to
the investigators looking at money laundering. Often, discovering money laundering involves
looking at financial flows between people and companies and identifying areas that don’t make
sense, either because they have more inflows than outflows, or because they appear central in
a network when they shouldn’t be. Fraud isn’t limited to financial fraud – any sort of
misrepresentation for gain is fraud, and a relatively recent fraudulent practice is review fraud,
or submitting fake positive reviews of products or services in which one has a financial stake,
or fake negative reviews of one’s competitors. As an increasing sector of the economy is
comprised of these middlemen who match services or products with their customers (think
OpenTable for restaurants, Uber for taxicabs, or Yelp for local businesses), the need to ensure
the integrity of these reviews is critical.
I’ve been using “networks” thus far in the book in the generic mathematical sense: any
collection of nodes and the edges connecting them.
Most people, however, think of computer networks, either a basic local area network, or
something as complex as the global Internet. As the world continues to evolve toward the
Internet being the primary communications infrastructure, not just for people but for devices
as well (the Internet of things, or IoT), understanding how all these things may be connected
to one another gains much higher importance. Cybersecurity has become tremendously
important over the last few years, as more critical functions of businesses and individuals are
conducted over the Internet. Graphs can be used both to identify weaknesses in computer
network infrastructure and to visualize a cyber attack to determine how to stop one in
progress or prevent a future one.
We’ll be looking at all the above examples in this chapter, including how graph
visualizations aid law-enforcement investigations at both the national and local level, and help
businesses weed out fraudulent behavior by their customers or online reviewers. These case
studies will illustrate several different ways that graph visualization has been used successfully
in real life. Here are some other industries where graph visualization may be useful:
Table 2.1: More industries and data where graph visualization can be used
The data from each of these case studies is real-life data from these domains; however, some of
it has been anonymized to protect confidentiality. I’d encourage you to download it from
<here (provide URL)> to play with visualizations yourself.
A table version of Al Qaeda members and sympathizers and their whereabouts, taken from Marc Sageman’s
book Understanding Terror Networks. In a graph visualization, these people will be the nodes.
And here’s the data showing the relationships between these people as a matrix.
The relationship of the people from Figure 2.2, in matrix format. These relationships will be represented by
edges in a graph visualization.
Now we’ll expand this to represent all 172 terrorists in a node-link visualization using the fact
that two people know one another as a link between them in the chart. We’ve made a couple
of design choices here – one is to use the flag of the country that the person lives in (or lived
in; this is 2004 data) as the node icon. This is helpful because it allows us to tell at a glance
where people are from. We’re going to draw all links identically. Normally we’d want to use
visual properties of the links, like width and color, to indicate something substantive about the
data, but our data records only whether a link exists, without many properties on those
links. We’ll also run a force-directed layout, which pushes the nodes apart and
attempts to make the chart more readable. The result is below:
A graph visualization illustrating the relationships among 172 people in the worldwide Al Qaeda network.
Nodes are people, and edges show who knows whom.
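To give a feel for what a force-directed layout does, here is a minimal Python sketch in the style of the Fruchterman-Reingold algorithm: every pair of nodes repels, linked nodes attract, and the allowed movement shrinks each step. It is not the layout used to produce the figure; the function name `force_layout`, its parameters, and the sample nodes are all assumptions made for illustration.

```python
import math
import random

def force_layout(nodes, edges, iterations=200, size=1.0, seed=42):
    """Return {node: [x, y]} after a simple force-directed simulation.

    `nodes` is a non-empty list; `edges` is a list of (a, b) pairs.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    k = size / math.sqrt(len(nodes))  # preferred distance between nodes
    temp = size / 10                  # maximum movement per step

    def offset(a, b):
        dx = pos[a][0] - pos[b][0]
        dy = pos[a][1] - pos[b][1]
        return dx, dy, math.hypot(dx, dy) or 1e-9  # avoid dividing by zero

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes apart.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy, dist = offset(a, b)
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force
        # Attraction: linked nodes pull together.
        for a, b in edges:
            dx, dy, dist = offset(a, b)
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force
        # Move each node, but never farther than the current temperature.
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy) or 1e-9
            step = min(length, temp)
            pos[n][0] += dx / length * step
            pos[n][1] += dy / length * step
        temp *= 0.95  # cool down so the layout settles
    return pos

# Hypothetical sample graph, not the Al Qaeda data set.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("C", "D")]
layout = force_layout(nodes, edges)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

The pairwise repulsion loop makes each step quadratic in the number of nodes, which is fine for a few hundred nodes but is one reason production libraries use faster approximations.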
As you can see, a static view of 172 nodes isn’t likely to be very helpful. There is a temptation
in graph visualizations to show more and more data all at once, and that can be
counterproductive, as your diagram becomes less and less readable. Zooming in on
subsections of the chart can give some better insights, as seen below.
Zooming in allows you to see in better detail the relationships between Encep Nurjaman, a Malaysian
member of Al Qaeda, and an international group of his associates.
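Zooming is a viewport operation, but a related way to focus on one person in code is to filter the data down to that person's immediate neighborhood before drawing it. The Python sketch below does this for an edge list; the function name `neighborhood`, the `hops` parameter, and the sample edges are hypothetical and are not taken from the book's data or tooling.

```python
def neighborhood(edges, focus, hops=1):
    """Return the nodes within `hops` steps of `focus` and the edges among them."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    keep = {focus}
    frontier = {focus}
    for _ in range(hops):
        # Expand one step outward, ignoring nodes we already kept.
        frontier = {n for f in frontier for n in adjacency.get(f, ())} - keep
        keep |= frontier

    sub_edges = [(a, b) for a, b in edges if a in keep and b in keep]
    return keep, sub_edges

# Hypothetical sample edges.
sample_edges = [("A", "B"), ("A", "C"), ("C", "D"), ("D", "E")]
kept_nodes, kept_edges = neighborhood(sample_edges, "C")
print(sorted(kept_nodes))  # ['A', 'C', 'D']
print(kept_edges)          # [('A', 'C'), ('C', 'D')]
```

Raising `hops` to 2 would also pull in friends of friends, at the cost of a busier picture.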
Now we see Encep Nurjaman, from the table above, represented by a Malaysian flag, as the
key connection point between a group of mostly Malaysians below and a mix of countries
above. And in fact, this is true: Nurjaman was known as the “Osama bin Laden of Southeast
Asia” and was the main link between Al Qaeda’s Middle East arm and its Southeast Asian
operations. With no information other than who knows whom, we’ve
identified some key people in this network.
So even without diving in deeper with more properties for each node, the graph
visualization helps identify who knows whom on an international scale. Merely displaying the
data graphically enables you to identify key patterns from a data set that would be impossible
to see in tabular form. From there, you can start to look for hubs – focal points where one
cluster meets another, indicating someone with broad reach within a network.
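One common way to look for such hubs numerically is betweenness centrality, which counts how many shortest paths between other pairs of nodes pass through each node. This is a standard measure rather than something the text above prescribes. The Python sketch below follows Brandes' algorithm for an unweighted, undirected graph; the node names and edges are invented for illustration.

```python
from collections import deque

def betweenness(nodes, edges):
    """Betweenness centrality for an unweighted, undirected graph."""
    adjacency = {n: [] for n in nodes}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)

    score = {n: 0.0 for n in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {n: [] for n in nodes}
        sigma = {n: 0 for n in nodes}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {n: 0.0 for n in nodes}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted from both ends, so halve the totals.
    return {n: value / 2 for n, value in score.items()}

# Hypothetical sample: two tight groups joined only through "hub".
nodes = ["a1", "a2", "hub", "b1", "b2"]
edges = [("a1", "a2"), ("a1", "hub"), ("a2", "hub"),
         ("hub", "b1"), ("hub", "b2"), ("b1", "b2")]
scores = betweenness(nodes, edges)
print(max(scores, key=scores.get))  # hub
```

In this sample, every shortest path between the two groups runs through `hub`, so it scores 4.0 while the other nodes score 0.0.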