Monalisa Thesis
by
Monalisa
2009EE50484
Contents
1 Introduction
  1.1 Systems Biology
  1.2 Networks
    1.2.1 Real World Networks
    1.2.2 Generative Model
  1.3 Features
  1.4 Literature Survey
    1.4.1 The Inferelator
    1.4.2 Inductive Logic Programming (ILP)
    1.4.3 Comparative Network Analysis
3 Results
  3.1 Inter-class Variance
  3.2 SVM
  3.3 Dimensionality Reduction
  3.4 K-core and Number of edges
4 Conclusion
References
List of Tables
List of Figures
1.1 Obtaining the empirical error prior from a given target data set
Chapter 1
Introduction
There have been many attempts to understand the intricacies of the human body and the biological interactions going on inside it all the time. Even a simple activity depends on various interactions, factors and moderators, which are in turn controlled by other factors or moderators. For example, p53 (a 393-amino-acid protein sometimes called the guardian of the genome) acts as a tumour suppressor because of its position within a network of transcription factors. However, p53 is activated, inhibited and degraded by modifications such as phosphorylation, dephosphorylation and proteolytic degradation, while its targets are selected by the different modification patterns that exist [3]. So neither p53 nor the network acts as a tumour suppressor by itself; there is a symbiotic kind of relation between them [3]. The challenge now is to use these kinds of data (i.e. gene expression microarrays, protein-protein interactions) to understand the underlying systems and mechanisms. Systems biology aims to tackle this by using the mathematical abstraction of a graph to represent a system of interacting components. The interactions and the proteins or genes (or other participating molecules) are viewed as the edges and nodes of a network (graph) to help understand them better. This is the focus of this project.
1.1 Systems Biology
Essentially, we are trying to understand biology at the level of systems. In this approach, understanding the structure and dynamics of a biological system is as important as understanding the interacting genes and proteins themselves. Knowledge of 'what' is not enough; we also need to know 'how' an interaction happens if we want to understand how a gene or protein affects the system. We need a dynamic picture. This understanding comes from insight into four key features [2]:
• System structures - These include genes and proteins, biological pathways and their interactions.
• System dynamics - The behaviour of the system over time and under different conditions. This mainly deals with the 'how' part of systems biology.
• Control methods - The mechanisms by which cell functions can be modulated, so that we can in some sense control the environment and prevent malfunctioning of the cell.
• Design methods - Using this information to design and modify biological systems, instead of proceeding by trial and error.
Insight into any of the above features would help us understand biological networks, which has its own benefits: we can find out where exactly a problem is and what caused it, and we can design new drugs and medicines that are more precise and effective.
An accurate model of a biological network is therefore highly desirable. We are looking for patterns or similarities between biological networks which can guide our inference or reconstruction process; in other words, we are trying to find structural signatures of biological networks. Consider architecture as an analogy: by looking at a building or its structure, an architect can easily tell what kind of building it is and can list some salient features of that type of architecture. We want this kind of understanding of biological networks, and we want the structural signature to be incorporated into the reconstruction process. To look at this in more detail, we try to find features which distinguish biological networks from non-biological ones, and then use those structural differences as a prior (or given) in the development of an algorithm.
1.2 Networks
1.2.1 Real World Networks
This project is mainly concerned with real-world networks, their structural differences and their similarities. They can be broadly divided into two types:
• Interaction - These networks show the interactions between their nodes. Examples include social networks such as Facebook and protein interaction networks.
Biological Networks
Proteins interact with each other to carry out various bodily and cellular functions. Protein-protein interactions refer to intentional physical contacts established between two or more proteins as a result of biochemical events and/or electrostatic forces. These interactions combine to form a protein interaction network. We have 67 biological networks in our database [4].
1.3 Features
In this section, we define some of the features (or network diagnostics) which we use to understand the networks. These form the columns of the design matrix (defined in a later section). There are 253 features in total; some that recur in this report are listed below (a short computational sketch follows the list):
• Density - The ratio of the number of edges in the network to the maximum possible number of edges.
• Eigenvector centrality - A centrality measure which scores each node according to the importance of the nodes it is connected to.
• Degree centrality - The ratio of a node's degree to the maximum possible degree (i.e. n−1, if we ignore self-edges).
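To make the diagnostics just listed concrete, the following is a minimal sketch (not the original feature-extraction code of [4] referenced in Appendix A) of how a few of them could be computed for one network with the Python networkx library; the function name and example graph are illustrative only.

```python
# Minimal sketch (not the original code from [4]): computing a few of the
# diagnostics above for one network with networkx.
import networkx as nx

def basic_diagnostics(G: nx.Graph) -> dict:
    return {
        # ratio of existing edges to the maximum possible number of edges
        "density": nx.density(G),
        # largest eigenvector-centrality score over all nodes
        "evectorCentrality_max": max(nx.eigenvector_centrality_numpy(G).values()),
        # largest degree centrality, i.e. degree / (n - 1)
        "degreeCentrality_max": max(nx.degree_centrality(G).values()),
    }

if __name__ == "__main__":
    print(basic_diagnostics(nx.karate_club_graph()))  # small example network
```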
1.4 Literature Survey
1.4.1 The Inferelator
The Inferelator [9] models the expression level y of a target gene (or gene cluster) in terms of a set of potential regulators X:

τ dy/dt = −y + g(β·Z)    (1.1)
Here, Z = (z1(X), z2(X), ..., zm(X)) is a set of functions on X. The coefficients β = (β1, β2, ..., βm) denote the influence of Z on y: a positive coefficient means the corresponding regulator acts as an inducer, while a negative coefficient means it acts as a repressor. τ is the time constant of the level y in the absence of external influence. The function g acts as an activation function and takes the form of a sigmoidal (logistic) function:
g(β·Z) = 1 / (1 + e^(−β·Z))    (1.2)
Multivariate regression is used to find β. For model selection, LASSO is used.
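As a rough illustration of this fitting step (this is not the Inferelator implementation itself), the sketch below assumes the sigmoid g is dropped in favour of a linear approximation, that dy/dt is estimated by finite differences, and that τ is known; all variable names are illustrative.

```python
# Hedged sketch of an Inferelator-style fit: tau * dy/dt = -y + beta . Z,
# with g dropped (linear approximation) and LASSO doing the model selection.
import numpy as np
from sklearn.linear_model import LassoCV

def fit_influences(y, Z, t, tau=1.0):
    """y: (T,) target expression; Z: (T, m) regulator functions; t: (T,) times."""
    dydt = np.gradient(y, t)                      # finite-difference estimate of dy/dt
    target = tau * dydt + y                       # so that target ~ beta . Z
    return LassoCV(cv=5).fit(Z, target).coef_     # positive -> inducer, negative -> repressor

# Tiny synthetic usage example (illustrative only)
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 50)
Z = rng.normal(size=(50, 4))
y = 0.8 * Z[:, 0] - 0.5 * Z[:, 2] + 0.1 * rng.normal(size=50)
print(fit_influences(y, Z, t).round(2))
```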
1.4.2 Inductive Logic Programming (ILP)
An ILP system builds a logical model of the system from the following ingredients:
• Background knowledge - Statements which define qualitative constraints. This is the rule book for the model, so that we do not have to search every path to see whether it is feasible or not.
• Examples - The observations (or data points). Given the definitions in the background knowledge, a model is said to be an explanation of an example if it yields true for that example.
• Refinement operator - This operator defines how the descendants of a node in the search tree are generated. The descendants are defined by whether they are generalisations or specialisations of the node [11]. Generalisation refers to removing one or more components or disconnecting them, while specialisation means adding new components or connecting existing ones (a small illustrative sketch follows below).
ILP can be incremental: we can create a model of a bigger, more complex system by breaking it into subsystems, creating a model for each, and then feeding these into another ILP step which specialises towards the complex model.
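As a toy illustration of the refinement operator described above (this is not the ILP system of [11]), the sketch below generates the specialisations and generalisations of a candidate model represented as a set of connections; the gene names are hypothetical.

```python
# Toy refinement operator: specialisation adds a connection,
# generalisation removes one (cf. the definition above).
from itertools import combinations

def specialisations(nodes, edges):
    """Yield candidate models with one extra connection."""
    for u, v in combinations(sorted(nodes), 2):
        if (u, v) not in edges and (v, u) not in edges:
            yield edges | {(u, v)}

def generalisations(edges):
    """Yield candidate models with one connection removed."""
    for e in edges:
        yield edges - {e}

nodes = {"geneA", "geneB", "geneC"}          # hypothetical components
model = frozenset({("geneA", "geneB")})      # current candidate model
print(list(specialisations(nodes, model)))
print(list(generalisations(model)))
```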
1.4.3 Comparative Network Analysis
• Hardness regression - This aims to identify the features of a network which make the travelling salesman problem (TSP) easier to compute for that graph, exploiting the fact that some graphs can be solved more easily than others. Some features, which are themselves cheap to compute, correlate highly with the solution length or the runtime of the solver, and can therefore be used to estimate them.
• Phylogeny regression - This identifies whether there are any signs of evolution, i.e. a phylogenetic signal, in the features, since features evolve along with the organisms.
Figure 1.1: Obtaining the empirical error prior from a given target data set. (Image courtesy of an unpublished paper by Dr. S. Agarwal, G. Villar and N. S. Jones [5].)
Chapter 2
After normalisation, the features (columns) having less than 80% of their entries filled are
discarded. This reduces our number of features to 211. For the columns with missing entries,
the NaN values are replaced by the average of that column.
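A minimal sketch of this preprocessing step is given below, assuming the raw design matrix is available as a pandas DataFrame called design (networks as rows, features as columns); the names and the use of pandas are assumptions, not part of the original pipeline.

```python
# Sketch of the preprocessing described above: normalise, drop sparse
# feature columns, then impute the remaining missing entries.
import pandas as pd

def preprocess(design: pd.DataFrame, min_filled: float = 0.8) -> pd.DataFrame:
    # column-wise z-score normalisation (NaNs are ignored by mean/std)
    norm = (design - design.mean()) / design.std()
    # discard features with fewer than 80% of their entries filled
    norm = norm.loc[:, norm.notna().mean() >= min_filled]
    # replace remaining NaNs with the average of that column
    return norm.fillna(norm.mean())
```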
Since the number of features is large, this (normalised) matrix is reduced using Isomap and PCA, and the 10 principal components of each reduced matrix are taken into account for analysis.
A separate analysis is done for GRNs: a new matrix is created by adding them to the list of networks already taken. After normalisation and discarding of features, we are left with 210 features. This matrix is also reduced using Isomap and PCA and 10 principal components are considered. The Euclidean distance from the 3 GRNs is then calculated for all the networks.
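The sketch below shows how this reduction and distance computation could look with scikit-learn and SciPy, assuming X is the normalised design matrix as a NumPy array and grn_rows indexes the three GRNs; the choice to measure the distances in the reduced PCA space is an assumption, since this extract does not state the space used.

```python
# Sketch of the Isomap/PCA reduction and the Euclidean distances to the GRNs.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap
from scipy.spatial.distance import cdist

def reduce_and_grn_distances(X: np.ndarray, grn_rows, n_components: int = 10):
    X_pca = PCA(n_components=n_components).fit_transform(X)
    X_iso = Isomap(n_components=n_components).fit_transform(X)
    # Euclidean distance of every network from each GRN (here in PCA space)
    d_grn = cdist(X_pca, X_pca[grn_rows])
    return X_pca, X_iso, d_grn
```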
We tried to find the most significant feature(s) using various techniques. The features were ranked according to their inter-class variances to see which feature is most diverse across the two classes. An SVM was used to obtain a linear classifier for biological and non-biological networks, and the features were then ranked according to their weights. Finally, the features that were maximally correlated with the reduced matrix (described above) were found and ranked accordingly.
(The code for this was obtained from the unpublished paper by Dr. S. Agarwal, G. Villar and N. S. Jones [5].)
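For the first of these rankings, the sketch below shows one plausible reading of "inter-class variance" (the size-weighted variance of the class means around the overall mean, per feature); the exact definition is not spelled out in this extract, so treat this as an assumption.

```python
# Hedged sketch: rank features by between-class (inter-class) variance.
import numpy as np

def interclass_variance(X: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """X: networks x features; labels: 1 = biological, 0 = non-biological."""
    overall = X.mean(axis=0)
    scores = np.zeros(X.shape[1])
    for c in np.unique(labels):
        Xc = X[labels == c]
        scores += (len(Xc) / len(X)) * (Xc.mean(axis=0) - overall) ** 2
    return scores

# ranking, most discriminative feature first (illustrative usage):
# order = np.argsort(-interclass_variance(X, labels))
```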
2.2 Relation between the k-core of a graph and its number of edges
As discussed earlier, there exists a relation between the number of edges and the emergence of a k-core in an Erdős-Rényi random graph. It has been shown that the probability of emergence of a giant k-core (for k > 2) becomes high when the number of edges reaches a constant multiple (ck/2) of the number of nodes. This constant varies with the value of k and can be computed [17]:
ck = min_{λ>0} λ / πk(λ)    (2.2)

and

πk(λ) = P{Poisson(λ) ≥ k − 1}    (2.3)
Here, ck/2 is the required constant and k is the minimum degree of the nodes in the core; ck n/2 is the minimum number of edges for which a k-core exists in G(n, m), and below this value it does not exist. As shown in the result by Erdős and Rényi, for large n a giant component is born when m reaches n/2 [7], so ck > 1. The value of this constant is 3.35 for k = 3 and 4.88 for k = 4. We took the ratio of the number of edges (m) to ck n/2, i.e.
ρk = m / (ck n/2)    (2.4)
The value of ρk was calculated for each network for k = 3, 4 and was added to the network-feature matrix as an extra feature.
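Equations (2.2)-(2.4) can be evaluated numerically as in the sketch below (using SciPy for the minimisation and networkx for the graph); this is an illustrative reimplementation, not the code used in the thesis.

```python
# Sketch: compute c_k from (2.2)-(2.3) and rho_k from (2.4) for a graph.
import networkx as nx
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

def c_k(k: int) -> float:
    # pi_k(lambda) = P(Poisson(lambda) >= k - 1); minimise lambda / pi_k(lambda)
    objective = lambda lam: lam / poisson.sf(k - 2, lam)
    return minimize_scalar(objective, bounds=(1e-6, 50), method="bounded").fun

def rho_k(G: nx.Graph, k: int) -> float:
    m, n = G.number_of_edges(), G.number_of_nodes()
    return m / (c_k(k) * n / 2)

print(round(c_k(3), 2))   # approx. 3.35, the value quoted in the text for k = 3
```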
Chapter 3
Results
3.1 Inter-class Variance
Features Variance
fraction2core 8.006
fraction3core 7.944
fraction4core 7.870
clusteringCoeff trimmean10 6.965
clusteringCoeff mean 6.925
evectorCentrality max 6.869
As is apparent from Figure 3.1, if we approximate the distribution of networks over these feature values by a Gaussian distribution, we can see that the two classes of networks are quite distinguishable. We then constrained the size and density of the networks to 1000 nodes and 0.25 (density here being measured on a scale of 0 to 0.5), respectively, and found that the top contenders remained the same.
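The Gaussian approximation mentioned above can be sketched as follows, assuming values holds a single feature column and labels the biological/non-biological class labels (both names are illustrative):

```python
# Sketch: fit a Gaussian to each class's values of one feature and compare.
import numpy as np
from scipy.stats import norm

def class_gaussians(values: np.ndarray, labels: np.ndarray):
    """Return {class: (mean, std)} for a single feature column."""
    return {c: norm.fit(values[labels == c]) for c in np.unique(labels)}

# Little overlap between the two fitted Gaussians means the classes are
# well separated on this feature.
```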
Figure 3.2: Histograms of the features with maximum variance after some constraints. (a) Without any distribution approximation.
3.2 SVM
A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane: given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples [14].
As described earlier, an SVM with a linear kernel and cross-validation was applied to the design matrix. It classified the normalised design matrix with 91.88% accuracy, the Isomap-reduced design matrix with 74.36% accuracy and the PCA-reduced matrix with 85.47% accuracy.
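A minimal version of this classification step, using scikit-learn, is sketched below; X and y (the design matrix and the biological/non-biological labels) are assumed to be available, and the 5-fold split is an illustrative choice since this extract does not state the fold count.

```python
# Sketch: linear-kernel SVM with cross-validation on the design matrix,
# plus the hyperplane weights used to rank features (cf. Table 3.2).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def svm_accuracy(X: np.ndarray, y: np.ndarray) -> float:
    return cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()

def svm_feature_weights(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    # absolute weights of the separating hyperplane, one per feature
    return np.abs(SVC(kernel="linear").fit(X, y).coef_.ravel())
```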
Features Weight
clusteringCoeff medad 1.192
assortativeCoefficient snowball100 0.996
fraction2core 0.879
betweenCentrality max 0.878
betweenCentrality range 0.878
Table 3.2: Features with maximal weights after applying SVM on normalised design matrix
3.3 Dimensionality Reduction
Table 3.3: Maximum correlated features with Isomap reduced features (without GRN)
As we can see, the first feature is highly correlated with degree centrality (which is a measure of how the degrees of the nodes are spread), while the second feature is correlated with fraction-2-core and the number of nodes. So we can say that the size and the node variability of a network capture most of the variance.
Dimension Feature1 r1 Feature2 r2
1 clusteringCoeff min -0.9409 transitivity -0.9304
2 evectorCentrality posrms -0.8413 numNodes snowball100 0.7671
3 closeness meanad 0.7969 closeness medad 0.7901
4 degree fit gamma -0.6231 degree fit wbl -0.6230
5 betweenCentrality posrms -0.4997 degreeCentrality range -0.4886
Table 3.4: Maximum correlated features with PCA reduced features (without GRN)
The first two principal components of the PCA capture 56% of the variance between them. Since PCA is a linear technique, the variance captured is not that high, as the features themselves can be quite correlated.
When we reduced the design matrix containing the 3 GRNs using Isomap and PCA, respectively, we found the GRNs placed among the biological networks. The correlation of the reduced design matrix with the normalised design matrix was computed for both the Isomap and the PCA reductions. We find that the results are not very different from those for the design matrix that did not include the gene regulatory networks; this is easily accounted for by the fact that we did not have many GRNs to work with.
Figure 3.3: Network clustering via Isomap dimensionality reduction. Here, GRNs are shown
in black color, biological networks in green and non-biological in black
Figure 3.4: Network clustering via PCA dimensionality reduction. Here, GRNs are shown in
black color, biological networks in green and non-biological in black
We also computed the distance between the GRNs and the networks in our database, using the Euclidean distance measure. As can be seen, the biological networks are the closest to them.
3.4 K-core and Number of edges
We saw that the fraction-k-core features are the most significant, so we delved deeper into them. We found that there are some networks which have low density but a high fraction-k-core. In Figures 3.5 and 3.6, we can see a significant number of networks in the top right corner of the plot, including a few biological networks.
So we applied the method defined for the generative models to our real-world networks, and found that this property was partially able to explain this behaviour of the real-world networks. In Figures 3.7 and 3.8, we see that for ρk = 10 (which means the number of edges is as low as about 4 times the number of nodes) the fraction-k-core can be as high as 1.
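The sudden-emergence result of [17] that underlies this can be checked numerically, as in the hedged sketch below: an Erdős-Rényi graph with mean degree just below the c3 threshold has an empty 3-core, while one just above it already has a 3-core containing a substantial fraction of the nodes (the graph size and mean degrees are illustrative).

```python
# Sketch: fraction of nodes in the 3-core of G(n, m) just below and just
# above the threshold mean degree c_3 = 3.35 from [17].
import networkx as nx

def fraction_k_core(G: nx.Graph, k: int) -> float:
    return nx.k_core(G, k).number_of_nodes() / G.number_of_nodes()

n = 2000
for mean_degree in (3.0, 3.6):                     # below / above c_3
    G = nx.gnm_random_graph(n, int(mean_degree * n / 2), seed=1)
    print(mean_degree, round(fraction_k_core(G, 3), 2))
```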
Figure 3.6: Network clustering in density and fraction-4-core space
Figure 3.8: Network clustering in ρ4 (log scale) and fraction-4-core space
Chapter 4
Conclusion
So far, we have seen that the fraction-k-core features are the most significant features distinguishing biological from non-biological networks; for biological networks, the value of this feature is generally low. Next comes the clustering coefficient, which also has low values for biological networks, reflecting their more scattered and sparse nature. Then we have eigenvector centrality (max), which takes large values for biological networks, although the mean eigenvector centrality seems to have a similar distribution for biological and non-biological networks. From this we can conclude that biological networks may have a few important nodes, i.e. a few proteins or molecules that affect the whole network the most. Applying an SVM to the design matrix gives the clustering coefficient as the most prominent feature. The methods we tried for finding the prominent features, such as the SVM and the inter-class variance, gave somewhat inconsistent results, but we can say that a distinguishing factor is that biological networks are sparse and small in size compared to non-biological networks.
Due to the limited data available on GRNs, we cannot conclude much. From what we have, we can say that B. subtilis and E. coli behave similarly, while Dream4 seems to deviate from them. Network clustering using Isomap and PCA on the design matrix with GRNs does not show much difference from what we observed for the 234 networks. The features maximally correlated with the first few principal components of the Isomap-reduced matrix seem to capture the degree distribution of nodes, the fraction-k-core and the size of the networks, while those of PCA seem to give more weight to the clustering coefficient of a network. Since the number of GRNs is just 3, we cannot really say anything definitive.
We observed an interesting property of these networks: they had a high fraction-k-core even though their density was low. This can be partially explained by the paper of Pittel et al. [17]. This information can also be helpful in thinking about what kind of networks they are and how we can develop an algorithm to generate models of them.
Keeping the structural results obtained here in mind, we can develop algorithms that construct networks subject to these structural constraints. We can use ABC or other methods such as Bayesian networks [18] or ARACNE [19], where this information can be used as a prior or can even act as a check on the generated networks.
Appendix A
Following is the list of all the network diagnostics (features) used in this project. For each feature, the short name is the one used in the text and code. For statistics of a feature, the short name is suffixed with the statistic's name (for example, clusteringCoeff min means the minimum clustering coefficient of the graph). The code for evaluating these features has been obtained from the thesis of Dr. Sumeet Agarwal [4].
Short Name Full Name
modularity Spectrally optimized modularity
modularityFast Louvain optimized modularity
greedyPartitionEntropy Entropy of Louvain partition
spectral Newman's spectral community detection
greedyComm Louvain community detection
pottsModel Potts model community detection
infomap Infomap community detection
Clustering
transitivity Transitivity
clusteringCoeff Clustering coefficient
clustSofferGlobalMean Global mean Soffer clustering coefficient
clustSofferLocalMean Local mean Soffer clustering coefficient
Distance
diameter Graph diameter
radius Graph radius
szegedIndex Szeged index
cyclicCoefficient Cyclic coefficient
geodesicDistanceMean Mean geodesic distance
geodesicDistanceVar Variance of geodesic distance
harmonicMeanGeoDist Harmonic mean geodesic distance
Complexity
cyclomaticNumber Cyclomatic number
edgeFraction Edge fraction
connectivity Connectivity
logNumSpanningTrees log(number of spanning trees)
graphIndexComplexity Graph index complexity
mediumArticulation Medium articulation
efficiency Efficiency
efficiencyComplexity Efficiency complexity
offDiagonalComplexity Off-diagonal complexity
chromaticNumber Chromatic number
tspl TSP length from cross-entropy algorithm
tsplga TSP length from genetic algorithm
tsplsa TSP length from simulated annealing
Spectral
largestEigenvalue Largest eigenvalue
spectralScalingDeviations Deviations from perfect spectral scaling
algebraicConnectivity Algebraic connectivity
algebraicConnectivityVector Algebraic connectivity vector
fiedlerValue Fiedler value
Statistical physics
energy Energy
entropy Entropy
Motif
fraction3motifs Fraction of 3-motifs
fraction4motifs Fraction of 4-motifs
Size
numNodes Number of nodes
numEdges Number of edges
totStrength Sum of all link weights
Model
ergm edges Exponential random graph model for edges
fitPowerLawAlpha Fitted power law exponent for degrees
fitPowerLawP p-value of power law fit to degrees
Table A.1: List of features
Appendix B
This is the list of the 192 real-world networks used in this project, obtained from the thesis of Dr. Sumeet Agarwal [4].
Name Category
Human brain cortex: participant A1 Brain
Human brain cortex: participant A2 Brain
Human brain cortex: participant B Brain
Human brain cortex: participant D Brain
Human brain cortex: participant E Brain
Human brain cortex: participant C Brain
Cat brain: cortical Brain
Cat brain: cortical/thalmic Brain
Macaque brain: cortical Brain
Macaque brain: visual/sensory cortex Brain
Macaque brain: visual cortex 1 Brain
Macaque brain: visual cortex 2 Brain
Co-authorship: astrophysics Collaboration
Co-authorship: comp. geometry Collaboration
Co-authorship: condensed matter Collaboration
Co-authorship: Erdos Collaboration
Co-authorship: high energy theory Collaboration
Co-authorship: network science Collaboration
Hollywood film music Collaboration
Jazz collaboration Collaboration
Facebook: Caltech Facebook
Facebook: Cornell Facebook
Facebook: Dartmouth Facebook
Facebook: Georgetown Facebook
Facebook: Harvard Facebook
Facebook: Indiana Facebook
Facebook: MIT Facebook
Facebook: NYU Facebook
Facebook: Oklahoma Facebook
Facebook: Texas80 Facebook
Facebook: Trinity Facebook
Facebook: UCSD Facebook
Facebook: UNC Facebook
Facebook: USF Facebook
Facebook: Wesleyan Facebook
NYSE: 1980-1999 Financial
NYSE: 1980-1983 Financial
NYSE: 1984-1987 Financial
NYSE: 1988-1991 Financial
NYSE: 1992-1995 Financial
NYSE: 1996-1999 Financial
Phanerochaete velutina control11-2 Fungal
Phanerochaete velutina control11-5 Fungal
Phanerochaete velutina control11-8 Fungal
Phanerochaete velutina control11-1 Fungal
Phanerochaete velutina control17-2 Fungal
Phanerochaete velutina control17-5 Fungal
Phanerochaete velutina control17-8 Fungal
Phanerochaete velutina control17-11 Fungal
Phanerochaete velutina control14-2 Fungal
Phanerochaete velutina control14-5 Fungal
Phanerochaete velutina control14-8 Fungal
Phanerochaete velutina control14-11 Fungal
Online Dictionary of Computing Language
Online Dictionary Of Information Science Language
Reuters 9/11 news Language
Roget’s thesaurus Language
Word adjacency: English Language
Word adjacency: French Language
Word adjacency: Japanese Language
Word adjacency: Spanish Language
Metabolic: CE Metabolic
Metabolic: CL Metabolic
Metabolic: CQ Metabolic
Metabolic: CT Metabolic
Metabolic: DR Metabolic
Metabolic: HI Metabolic
Metabolic: NM Metabolic
Metabolic: OS Metabolic
Metabolic: PA Metabolic
Metabolic: PG Metabolic
Metabolic: PH Metabolic
Metabolic: PN Metabolic
Metabolic: SC Metabolic
Metabolic: ST Metabolic
Metabolic: TP Metabolic
Bill cosponsorship: U.S. House 96 Political: cosponsorship
Bill cosponsorship: U.S. House 97 Political: cosponsorship
Bill cosponsorship: U.S. House 98 Political: cosponsorship
Bill cosponsorship: U.S. House 99 Political: cosponsorship
Bill cosponsorship: U.S. House 100 Political: cosponsorship
Bill cosponsorship: U.S. House 101 Political: cosponsorship
Bill cosponsorship: U.S. House 102 Political: cosponsorship
Bill cosponsorship: U.S. House 103 Political: cosponsorship
Bill cosponsorship: U.S. House 104 Political: cosponsorship
Bill cosponsorship: U.S. House 105 Political: cosponsorship
Bill cosponsorship: U.S. House 106 Political: cosponsorship
Bill cosponsorship: U.S. House 107 Political: cosponsorship
Bill cosponsorship: U.S. House 108 Political: cosponsorship
Bill cosponsorship: U.S. Senate 96 Political: cosponsorship
Bill cosponsorship: U.S. Senate 97 Political: cosponsorship
Bill cosponsorship: U.S. Senate 98 Political: cosponsorship
Bill cosponsorship: U.S. Senate 99 Political: cosponsorship
Bill cosponsorship: U.S. Senate 100 Political: cosponsorship
Bill cosponsorship: U.S. Senate 101 Political: cosponsorship
Bill cosponsorship: U.S. Senate 102 Political: cosponsorship
Bill cosponsorship: U.S. Senate 103 Political: cosponsorship
Bill cosponsorship: U.S. Senate 104 Political: cosponsorship
Bill cosponsorship: U.S. Senate 105 Political: cosponsorship
Bill cosponsorship: U.S. Senate 106 Political: cosponsorship
Bill cosponsorship: U.S. Senate 107 Political: cosponsorship
Bill cosponsorship: U.S. Senate 108 Political: cosponsorship
Committees: U.S. House 101, comms. Political: committee
Committees: U.S. House 102, comms. Political: committee
Committees: U.S. House 103, comms. Political: committee
Committees: U.S. House 104, comms. Political: committee
Committees: U.S. House 105, comms. Political: committee
Committees: U.S. House 106, comms. Political: committee
Committees: U.S. House 107, comms. Political: committee
Committees: U.S. House 108, comms. Political: committee
Committees: U.S. House 101, Reps. Political: committee
Committees: U.S. House 102, Reps. Political: committee
Committees: U.S. House 103, Reps. Political: committee
Committees: U.S. House 104, Reps. Political: committee
Committees: U.S. House 105, Reps. Political: committee
Committees: U.S. House 106, Reps. Political: committee
Committees: U.S. House 107, Reps. Political: committee
Committees: U.S. House 108, Reps. Political: committee
Roll call: U.S. House 101 Political: voting
Roll call: U.S. House 102 Political: voting
Roll call: U.S. House 103 Political: voting
Roll call: U.S. House 104 Political: voting
Roll call: U.S. House 105 Political: voting
Roll call: U.S. House 106 Political: voting
Roll call: U.S. House 107 Political: voting
Roll call: U.S. House 108 Political: voting
Roll call: U.S. Senate 101 Political: voting
Roll call: U.S. Senate 102 Political: voting
Roll call: U.S. Senate 103 Political: voting
Roll call: U.S. Senate 104 Political: voting
Roll call: U.S. Senate 105 Political: voting
Roll call: U.S. Senate 106 Political: voting
Roll call: U.S. Senate 107 Political: voting
Roll call: U.S. Senate 108 Political: voting
U.K. House of Commons voting: 1992-1997 Political: voting
U.K. House of Commons voting: 1997-2001 Political: voting
U.K. House of Commons voting: 2001-2005 Political: voting
U.N. resolutions 59 Political: voting
U.N. resolutions 60 Political: voting
U.N. resolutions 61 Political: voting
U.N. resolutions 62 Political: voting
Biogrid: A. thaliana Protein interaction
Biogrid: C. elegans Protein interaction
Biogrid: D. melanogaster Protein interaction
Biogrid: H. sapiens Protein interaction
Biogrid: M. musculus Protein interaction
Biogrid: R. norvegicus Protein interaction
Biogrid: S. cerevisiae Protein interaction
Biogrid: S. pombe Protein interaction
DIP: H. pylori Protein interaction
DIP: H. sapiens Protein interaction
DIP: M. musculus Protein interaction
DIP: C. elegans Protein interaction
Human: CCSB Protein interaction
Human: OPHID Protein interaction
Protein: serine protease inhibitor (1EAW) Protein interaction
Protein: immunoglobulin (1A4J) Protein interaction
Protein: oxidoreductase (1AOR) Protein interaction
STRING: C. elegans Protein interaction
STRING: S. cerevisiae Protein interaction
Yeast: Oxford Statistics Protein interaction
Yeast: DIP Protein interaction
Yeast: DIPC Protein interaction
Yeast: FHC Protein interaction
Yeast: FYI Protein interaction
Yeast: PCA Protein interaction
Corporate directors in Scotland (1904-1905) Social
Corporate ownership (EVA) Social
Dolphins Social
Family planning in Korea Social
Unionization in a hi-tech firm Social
Communication within a sawmill on strike Social
Leadership course Social
Les Miserables Social
Marvel comics Social
Mexican political elite Social
Pretty-good-privacy algorithm users Social
Prisoners Social
Bernard and Killworth fraternity: observed Social
Bernard and Killworth fraternity: recalled Social
Bernard and Killworth HAM radio: observed Social
Bernard and Killworth HAM radio: recalled Social
Bernard and Killworth office: observed Social
Bernard and Killworth office: recalled Social
Bernard and Killworth technical: observed Social
Bernard and Killworth technical: recalled Social
Kapferer tailor shop: instrumental (t1) Social
Kapferer tailor shop: instrumental (t2) Social
Kapferer tailor shop: associational (t1) Social
Kapferer tailor shop: associational (t2) Social
University Rovira i Virgili (Tarragona) e-mail Social
Zachary karate club Social
Appendix C
Name Category
Anopheles gambiae Protein interaction
Arabidopsis thaliana Protein interaction
Aspergillus nidulans Protein interaction
Bacillus subtilis Protein interaction
Bos taurus Protein interaction
Caenorhabditis elegans Protein interaction
Candida albicans SC5314 Protein interaction
Canis familiaris Protein interaction
Cavia porcellus Protein interaction
Chlamydomonas reinhardtii Protein interaction
Cricetulus griseus Protein interaction
Danio rerio Protein interaction
Dictyostelium discoideum AX4 Protein interaction
Drosophila melanogaster Protein interaction
Equus caballus Protein interaction
Escherichia coli Protein interaction
Gallus gallus Protein interaction
Hepatitis C Virus Protein interaction
Homo sapiens Protein interaction
Human Herpesvirus 1 Protein interaction
Human Herpesvirus 2 Protein interaction
Human Herpesvirus 3 Protein interaction
Human Herpesvirus 4 Protein interaction
Human Herpesvirus 5 Protein interaction
Human Herpesvirus 6 Protein interaction
Human Herpesvirus 8 Protein interaction
Human Immunodeficiency Virus 1 Protein interaction
Human Immunodeficiency Virus 2 Protein interaction
Leishmania major Protein interaction
Macaca mulatta Protein interaction
Mus musculus Protein interaction
Neurospora crassa Protein interaction
Oryctolagus cuniculus Protein interaction
Oryza sativa Protein interaction
Pan troglodytes Protein interaction
Plasmodium falciparum 3D7 Protein interaction
Rattus norvegicus Protein interaction
Ricinus communis Protein interaction
Saccharomyces cerevisiae Protein interaction
Schizosaccharomyces pombe Protein interaction
Simian-Human Immunodeficiency Virus Protein interaction
Sus scrofa Protein interaction
Ustilago maydis Protein interaction
Xenopus laevis Protein interaction
Zea mays Protein interaction
Bibliography
[1] T. Ideker, T. Galitski, and L. Hood. A New Approach to Decoding Life: Systems Biology. Annu. Rev. Genomics Hum. Genet. 2001. 2:343-372.
[2] H. Kitano. Systems Biology: A Brief Overview. Science 295, 1662 (2002); DOI:
10.1126/science.1069492
[3] H. Kitano. Computational Systems Biology. Nature 420, 206-210 (14 November 2002)
[4] S. Agarwal. Networks in Nature: Dynamics, Evolution and Modularity. Ph.D. thesis,
University of Oxford (2012).
[8] Albert-László Barabási and Zoltán N. Oltvai. Network biology: Understanding the cell’s
functional organization. Nature Reviews Genetics, 5:101-113, 2004.
[9] R. Bonneau. Learning biological networks: from modules to dynamics. Nature Chemical
Biology 4:658-664 (2008).
[12] en.wikipedia.org/wiki/Gene
[16] J. B. Tenenbaum, V. Silva, and J. C. Langford. A global geometric framework for non-
linear dimensionality reduction. Science, 290(5500):2319-2323 (2000).
[17] B. Pittel, J. Spencer, N. Wormald. Sudden Emergence of a Giant k-core in a Random
Graph. Journal of Combinatorial Theory, Series B 67, 111-151 (1996).