BMC Systems Biology BioMed Central

Research article Open Access

Interactome and Gene Ontology provide congruent yet subtly
different views of a eukaryotic cell
Antonio Marco1 and Ignacio Marín*2

Address: 1Center for Evolutionary Functional Genomics, The Biodesign Institute, Tempe, Arizona State University, USA and 2Instituto de
Biomedicina de Valencia, Consejo Superior de Investigaciones Científicas (IBV-CSIC), Valencia, Spain
Email: Antonio Marco - [email protected]; Ignacio Marín* - [email protected]
* Corresponding author

Received: 19 December 2008
Accepted: 15 July 2009
Published: 15 July 2009
BMC Systems Biology 2009, 3:69 doi:10.1186/1752-0509-3-69
This article is available from: http://www.biomedcentral.com/1752-0509/3/69
© 2009 Marco and Marín; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract
Background: The characterization of the global functional structure of a cell is a major goal in
bioinformatics and systems biology. Gene Ontology (GO) and the protein-protein interaction
network offer alternative views of that structure.
Results: This study presents a comparison of the global structures of the Gene Ontology and the
interactome of Saccharomyces cerevisiae. Sensitive, unsupervised clustering methods applied to a
large fraction of the proteome allowed us to establish a GO-interactome correlation value of +0.47 for a
general dataset that contains both high- and low-confidence interactions and +0.58 for a smaller,
high-confidence dataset.
Conclusion: The structures of the yeast cell deduced from GO and interactome are substantially
congruent. However, some significant differences were also detected, which may contribute to a
better understanding of cell function and also to a refinement of the current ontologies.

Background

Gene Ontology (GO) is "a set of structured vocabularies for specific biological domains that can be used to describe gene products in any organism" [1]. GO attempts to summarize the current knowledge of the basic components that shape cell function in a given organism. However, the current GO is still limited, given that we understand only part of the functions of any cell. Moreover, our current views are biased by the concentration of research efforts on some aspects of cell metabolism and function to the detriment of others. This bias arises because most data used to assign GO terms derive from hypothesis-driven approaches.

In recent years, large protein-protein interaction (PPI) datasets have been characterized in several organisms using non-directed, massive approaches (reviewed in references [2-4]). This accumulation of knowledge is of fundamental importance, because the set of all PPIs (known as the PPI graph, PPI network or interactome) may be envisaged as a functional map of the cell [3,5,6]. The fact that most interactome data have been obtained by non-directed approaches avoids the bias just described for GO. However, PPI data also have their own significant biases and shortcomings. One intrinsic problem is unavoidable: some aspects of cell metabolism may require few or no PPIs and therefore will not be reflected in the interactome. A second problem is that so far, even in the best analyzed species, data are still partial. In addition, some protein interactions (e.g. those that occur over brief periods of time) are difficult to detect with the current methods. Finally, there is some controversy over the quality of


the PPI data generated in massive, high-throughput experiments [7-11].

GO and interactome provide alternative views of how an organism is structured and functions. It is thus logical to explore whether they are congruent. This is however problematic, because GO and PPI data are very different. On the one hand, gene products may be either annotated or not with GO terms. Thus, from the point of view of each GO term, the classification is dichotomous. On the other hand, PPI data are best expressed as a graph or network of units (proteins) connected by edges (known interactions). How, then, can these two very different types of information be compared? The simplest way to collate GO and interactome data is to characterize from PPI results groups of densely connected units, i.e. modules [12-15], and then to establish whether modules are statistically enriched for particular GO terms. This strategy has been followed with success by several groups [12,15-18]. Discussions currently center on the best way to define modules so that they make sense from either the mathematical or the biological point of view (e.g. refs. [18-20]), but it is generally accepted that modules are often enriched for particular GO terms. This congruence between GO and PPI data has led to works in which proteins are assigned functions according to the GO annotations of their interaction partners [21-23]. Similarity in GO annotations has also been used to predict interactions among pairs of proteins [24,25].

It is important to point out that those results imply just local congruence, but not necessarily global similarity, between the interactome and GO structures. GO and interactome could be congruent if we focus on highly connected and well-known sets of proteins, but still be very different in their global structures. In fact, in a deep sense, it is trivial to find that proteins in a particular module often share GO annotations, if only because many of the modules detected correspond to, or at least include, protein complexes, which contain units that work together in the cell. Thus, all analyses performed so far fall short of addressing the general question of whether GO and PPI data offer compatible views of an organism.

It is also clear that, to characterize the level of global similarity between GO and interactome, the analysis of modules has important methodological limitations. First, proteins excluded from modules are not analyzed, so a fully global, statistical estimation of congruence is intrinsically impossible. Second, the interactome graph structure has small world properties, meaning that many units/proteins are connected to other proteins and that the distances among all of them, measured as their shortest path lengths, are very small [26,27]. These problems suggest that a novel type of approach is needed. In recent works, we described novel strategies of graph analysis and we showed their usefulness to explore the structures of different complex biological graphs, such as the interactome or protein domain graphs [15,28-30]. Our methods generate hierarchical structures, dendrograms, based on the average strength of the connections among the units of a graph, and then establish whether clusters in the dendrograms are enriched for units with particular features. These procedures open the way for a global comparison of interactome and GO. In particular, they avoid the need to select modules to compare with GO. In interactome-based dendrograms, it is possible to include all proteins that we wish to analyze – without dividing them into those highly connected, included in modules, and those excluded from them – and to establish whether any cluster of proteins, no matter the number of direct interactions among its members, is enriched for GO terms. As we will show, this allows for a precise mathematical determination of the similarity between the GO-based and the interactome-based classifications.

In this study, we obtained a hierarchical representation of large fragments of the interactome of Saccharomyces cerevisiae. Then, we determined and quantified the global similarity between a significant part of the structures of interactome and GO in the yeast. Our results greatly enrich our knowledge of the relationships between the alternative views of the yeast cell that its gene ontology and interactome provide.

Results

A strategy to compare interactome and GO
Saccharomyces cerevisiae has by far the best characterized interactome of any eukaryote. We thus decided to focus our research on this species. Our goal was to explore the yeast data and to determine whether the hierarchical structure of the GO is reflected in the interactome. We chose a simple design, based on analyzing large parent GO terms which are subdivided into several child GO terms. The question that we wanted to answer is whether we were able to detect clusters corresponding to the child terms in a dendrogram, generated from PPI data, which included all the proteins of a parent GO term. If we were able to do so, it would mean that GO and interactome have similar structures.

Therefore, our general strategy to establish the level of congruence between interactome and GO had two steps (Figure 1). First, trees were generated, using UVCLUSTER (ref. [15]; see Methods), for proteins encoded by genes included in a general, parent GO term. As indicated above, these trees are based on the relative strength of the connections among proteins, based on interactome data. Second, TreeTracker [30] was used to determine whether groups of proteins which appeared clustered together in


Figure 1
Overview of the strategy used to compare GO and the interactome. For a given parent GO term, we extracted the
proteins annotated with it and determined their primary distances (shortest path length) in the protein interaction network.
The resulting graph was transformed into a dendrogram with UVCLUSTER. We then retrieved the proteins annotated with
each child GO term and labeled them in the tree. We finally detected, using the program TreeTracker, the clusters in the tree
significantly enriched for each child GO term.
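
The first step summarized in Figure 1 (primary distances as shortest path lengths) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the UVCLUSTER implementation: the function name `shortest_path_lengths`, the toy edge list and the protein labels are hypothetical, and the sketch assumes that paths may pass through proteins outside the parent term, which the caption does not specify.

```python
from collections import deque

def shortest_path_lengths(edges, proteins):
    """Pairwise shortest path lengths among `proteins`, by breadth-first search.

    `edges` is an iterable of undirected (protein, protein) interactions.
    Unreachable pairs get None.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    distances = {}
    for source in proteins:
        seen = {source: 0}          # protein -> distance from source
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency.get(node, ()):
                if neighbor not in seen:
                    seen[neighbor] = seen[node] + 1
                    queue.append(neighbor)
        for target in proteins:
            if target != source:
                distances[(source, target)] = seen.get(target)
    return distances

# Hypothetical toy network: A-B-C-D is a chain, E-F is a separate pair.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
parent_term_proteins = ["A", "C", "D", "E"]   # B and F lie outside the parent term
primary = shortest_path_lengths(toy_edges, parent_term_proteins)
```

In this toy case A and C are two steps apart because the path runs through B, a protein outside the parent term, while E cannot be reached from the other three proteins.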

those trees were significantly enriched for some child GO terms, hierarchically situated just below the parent term in the GO structure. If interactome and GO are congruent, we would expect to detect in a tree clusters of units enriched for the child GO terms. A significant technical point is that, because we use each parent term in isolation, we avoid the analytical problems which would derive from the fact that sometimes a GO term has several parent terms.

Table 1 summarizes the data for the nine parent terms selected for this study (see Methods for the criteria used for choosing them). Interactome data were obtained from two different databases. First, we used all the information available for S. cerevisiae at the Database of Interacting Proteins (DIP; http://dip.doe-mbi.ucla.edu). This dataset contains both low- and high-throughput data, although about 80% of the interactions derive from massive experiments. Second, we used the "Binary gold standard dataset" (which we will call from now on "GOLD dataset"), a set of 1318 high-confidence binary interactions selected by Yu et al. [31]. The comparison between the results obtained with the DIP dataset and those obtained with the GOLD dataset will allow us to determine whether using massive data creates biases that may affect our general conclusions.

About 79% of the proteins annotated with the nine selected parent terms were included in the interactome dataset that we obtained from the DIP database. The final groups of proteins included in both the GO and the DIP interactome dataset contained from 230 to 632 units (average: 354 units; Table 1). This means that each comparison included from 4 to 11% of all S. cerevisiae pro-


Table 1: Parent GO terms selected for the analysis, and number of elements included.

GO term                                      Level (1)  Genes (2)  ORFs (3)  Prot. DIP (4)  Prot. DIP/ORFs (%)  Prot. GOLD (5)  Prot. GOLD/ORFs (%)
Developmental process (BP)                   1          768        757       632            83.5%               257             34.0%
Reproduction (BP)                            1          299        298       245            82.2%               111             37.3%
Establishment of cellular localization (BP)  1          573        568       452            79.6%               188             33.1%
Response to stimulus (BP)                    1          670        657       514            78.2%               207             31.5%
Ribonucleoprotein complex (CC)               2          556        459       318            69.3%               96              20.9%
Organelle envelope (CC)                      2          346        345       230            66.7%               69              20.0%
Transcription regulator activity (MF)        1          307        303       276            91.1%               107             35.3%
Structural molecule activity (MF)            1          307        286       231            80.8%               75              26.2%
Transporter activity (MF)                    1          380        377       297            78.8%               63              16.7%
Average                                                                                     78.9%                               28.3%

BP: Biological Process; CC: Cellular Component; MF: Molecular Function. 1: Levels of the parent GO terms. Level 1 terms are hierarchically located
just below the three main categories (BP, CC and MF) while Level 2 terms are below a Level 1 term. 2: Number of genes selected for the analysis,
i. e. those ascribed to the parent GO term which are also included in one of the selected child GO terms. 3: Genes among those in the previous
column that contain ORFs and therefore encode for proteins. 4: Number of products among those in the selected ORFs for which interactions
were compiled in the DIP database. 5: Same as 4, but for the GOLD dataset.
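
As a worked check of how the "Prot. DIP/ORFs (%)" column relates to the ORFs and Prot. DIP columns, the short Python sketch below recomputes the percentages from the counts in Table 1. The variable names are illustrative only; the numbers are copied from the table, and nothing here is taken from the authors' own scripts.

```python
# (ORFs, proteins with DIP interactions) for each parent GO term, from Table 1.
table1 = {
    "Developmental process": (757, 632),
    "Reproduction": (298, 245),
    "Establishment of cellular localization": (568, 452),
    "Response to stimulus": (657, 514),
    "Ribonucleoprotein complex": (459, 318),
    "Organelle envelope": (345, 230),
    "Transcription regulator activity": (303, 276),
    "Structural molecule activity": (286, 231),
    "Transporter activity": (377, 297),
}

# Percentage of ORF products for which DIP interactions were compiled.
dip_pct = {term: 100 * dip / orfs for term, (orfs, dip) in table1.items()}

# Unweighted average over the nine parent terms, as in the last row of the table.
average_dip_pct = sum(dip_pct.values()) / len(dip_pct)

for term, pct in dip_pct.items():
    print(f"{term}: {pct:.1f}%")
print(f"Average: {average_dip_pct:.1f}%")
```

The average is unweighted, so each parent term counts equally regardless of its size; this is the reading that reproduces the 78.9% given in the table.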

teins. The nine comparisons together included about 44% of the proteins present in the yeast (percentages derived from [32]; notice that a protein may be annotated with multiple terms). The GOLD dataset is much smaller. Only 28% of the proteins annotated with one of the nine parent GO terms were found in that dataset. The average size of the groups analyzed was correspondingly much smaller than those found in DIP, including on average just 130 proteins (range 63 – 257; Table 1). In the next sections, we will first discuss the results obtained for the DIP dataset and, later, we will show that our main findings are confirmed with the smaller, high-confidence GOLD dataset.

Interactome and GO structures are substantially congruent: DIP data
The nine selected parent GO terms were subdivided into child terms, which are detailed in Table 2. Using DIP data, we found that each child GO term included an average of 96.7 proteins. Table 2 also shows an important preliminary point, namely that interactome and GO data are largely independent. Less than 5% of the proteins analyzed in the DIP dataset were assigned to a particular GO term because of PPI data in the absence of other evidence (i.e. assignations annotated as "inferred from physical interaction" in GO databases). Moreover, this percentage diminishes to only 3% if two exceptional child GO terms (Small nucleolar ribonucleoprotein complex and Structural constituent of cytoskeleton) are excluded and is 0.0% for 19 of the 46 child GO terms. Therefore, we can confidently assume that, if we find evidence for global congruence between the GO and interactome structures, this will not be caused by PPI being systematically used to define to which GO terms the proteins are assigned.

Once the data had been chosen, UVCLUSTER was used to obtain dendrograms, one for each of the nine parent GO terms (see Methods). Then, we searched for clusters of units significantly enriched for child GO terms using TreeTracker (see again the Methods section for the details). In Table 3 and Additional File 1, we describe the results obtained. Table 3 contains the summary of results for parent GO terms and Additional File 1, the details for child GO terms. We used four parameters (coverage, purity, ambiguity and Φ coefficient; see Methods for precise definitions) to quantify the results obtained. The summary of the results detailed in Table 3 is as follows: 1) Confirming that our methodology indeed detects clusters highly enriched for the corresponding GO terms, the purity of the clusters (i.e. the percentage of proteins included in a positive cluster, detected as significantly enriched for a given GO term, which indeed belong to that GO term) was high (62 – 96%, average: 80.1%). This is good evidence for our approach being very sensitive, in agreement with our previous work [30]; 2) Coverage (a measure of the extent to which a given GO term is detected in the interactome data) was quite complete, ranging from 34 to 67%, with a global average of 51.2%. This means that a significant fraction of proteins in the examined GO classes are recovered in the interactome-based clusters. Interestingly, GO terms in the Biological Process category had higher coverages (average: 61.2%) than those in the Cellular Component (average: 49.7%) or Molecular Function (average: 39.0%) categories; 3) Ambiguity, which measures cluster overlap, was variable, ranging from 0 to 20% (average: 7.7%); and 4) Finally, Phi coefficients (Φ), a precise measure of correlation between GO and interactome data (see Methods), are all positive and quite high (+0.39 to +0.64), with an average of +0.47 ± 0.03. This last


Table 2: Summary of the GO terms used in this study.

GO term (identifier)                                            N (P) DIP   N (P) GOLD

Developmental process (32502)                                   632 (16)    257 (8)
  Reproductive developmental process (3006)                     26 (0)      13 (0)
  Anatomical structure development (48856)                      186 (15)    94 (8)
  Cellular developmental process (48869)                        450 (1)     169 (0)
  Aging (7568)                                                  40 (0)      22 (0)

Reproduction (3)                                                245 (7)     111 (4)
  Sexual reproduction (19953)                                   95 (0)      41 (0)
  Asexual reproduction (19954)                                  74 (6)      44 (4)
  Reproductive process (22414)                                  207 (7)     88 (4)
  Rep. of a single-celled organism (32505)                      220 (7)     99 (4)

Establishment of cellular localization (51649)                  452 (21)    188 (10)
  Secretion by cell (32940)                                     206 (9)     84 (3)
  Establishment of nucleus localization (40023)                 17 (0)      ---
  Intracellular transport (46907)                               409 (21)    175 (10)

Response to stimulus (50896)                                    514 (3)     207 (0)
  Response to endogenous stimulus (9719)                        197 (3)     101 (0)
  Cellular response to stimulus (51716)                         13 (0)      ---
  Response to abiotic stimulus (9628)                           83 (0)      32 (0)
  Response to external stimulus (9605)                          27 (0)      13 (0)
  Response to biotic stimulus (6907)                            19 (0)      ---
  Response to chemical stimulus (42221)                         212 (0)     65 (0)
  Response to stress (6950)                                     370 (3)     159 (0)

Ribonucleoprotein complex (30529)                               318 (64)    96 (12)
  Small nuclear ribonucleoprotein complex (30532)               58 (2)      24 (0)
  Preribosome (30684)                                           12 (4)      ---
  Spliceosome (5681)                                            74 (12)     33 (2)
  Small nucleolar ribonucleoprotein complex (5732)              49 (43)     10 (9)
  Ribosome (5840)                                               156 (5)     45 (1)
  Polysome (5844)                                               11 (0)      ---

Organelle envelope (31967)                                      230 (12)    69 (2)
  Organelle inner membrane (19866)                              105 (8)     27 (2)
  Organelle outer membrane (31968)                              24 (0)      ---
  Organelle envelope lumen (31970)                              25 (0)      ---
  Nuclear envelope (5635)                                       86 (3)      35 (0)
  Mitochondrial envelope (5740)                                 148 (9)     34 (2)

Transcription regulator activity (30528)                        276 (14)    107 (5)
  Transcriptional activator activity (16563)                    50 (0)      24 (0)
  Transcriptional repressor activity (16564)                    35 (2)      13 (1)
  Transcription factor activity (3700)                          45 (2)      13 (1)
  RNA polymerase II transcription factor activity (3702)        112 (4)     44 (1)
  Transcriptional elongation regulator activity (3711)          14 (6)      ---
  Transcription cofactor activity (3712)                        36 (1)      16 (0)

Structural molecule activity (5198)                             231 (29)    75 (18)
  Structural constituent of ribosome (3735)                     115 (0)     21 (0)
  Structural constituent of cytoskeleton (5200)                 50 (29)     31 (18)

Transporter Activity (5215)                                     297 (8)     63 (1)
  Ion transport activity (15075)                                111 (5)     16 (0)
  Carbohydrate transporter activity (15144)                     26 (0)      ---
  ATPase activity, coupled to movement of substances (43492)    41 (2)      ---
  Amine transporter activity (5275)                             27 (0)      ---
  Organic acid transporter activity (5342)                      32 (0)      ---
  Carrier activity (5386)                                       67 (0)      13 (0)
  Intracellular transporter activity (5478)                     28 (0)      17 (0)
  Protein transporter activity (8565)                           48 (1)      29 (1)
  Lipid transporter activity (5319)                             11 (2)      ---

Results for both the DIP and GOLD datasets are indicated. Parent GO terms are indicated in bold and, below them, the child GO terms are
detailed. The numbers in parentheses adjacent to the names refer to the numerical identifiers of the GO terms. N: number of proteins for which
we obtained PPI data and whose genes were annotated to the GO term. (P): in parentheses, number of proteins among those N that are annotated
with the GO term based exclusively on PPI evidence. The child GO terms with less than 10 proteins found when analyzing the GOLD dataset were
not further examined (dashes).
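
The two summary figures quoted in the text for the DIP child terms (an average of 96.7 proteins per child term, and 19 of the 46 child terms with no annotation based exclusively on PPI evidence) can be recomputed from the N (P) values of Table 2. The Python sketch below does so; the dictionary layout and variable names are illustrative assumptions, and only the numbers come from the table.

```python
# (N, P) pairs for the DIP dataset, one per child GO term, grouped by parent term.
# N: proteins with PPI data annotated to the term; P: those annotated only from PPI evidence.
dip_children = {
    "Developmental process": [(26, 0), (186, 15), (450, 1), (40, 0)],
    "Reproduction": [(95, 0), (74, 6), (207, 7), (220, 7)],
    "Establishment of cellular localization": [(206, 9), (17, 0), (409, 21)],
    "Response to stimulus": [(197, 3), (13, 0), (83, 0), (27, 0), (19, 0), (212, 0), (370, 3)],
    "Ribonucleoprotein complex": [(58, 2), (12, 4), (74, 12), (49, 43), (156, 5), (11, 0)],
    "Organelle envelope": [(105, 8), (24, 0), (25, 0), (86, 3), (148, 9)],
    "Transcription regulator activity": [(50, 0), (35, 2), (45, 2), (112, 4), (14, 6), (36, 1)],
    "Structural molecule activity": [(115, 0), (50, 29)],
    "Transporter activity": [(111, 5), (26, 0), (41, 2), (27, 0), (32, 0),
                             (67, 0), (28, 0), (48, 1), (11, 2)],
}

children = [pair for pairs in dip_children.values() for pair in pairs]

n_child_terms = len(children)
mean_child_size = sum(n for n, _ in children) / n_child_terms
terms_without_ppi_only = sum(1 for _, p in children if p == 0)

print(n_child_terms, round(mean_child_size, 1), terms_without_ppi_only)
```

Because a protein can be annotated with several child terms, these per-term counts cannot simply be summed to obtain the overall percentage of proteins annotated from PPI evidence alone.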

result demonstrates that the GO and interactome classifications are, when globally considered, significantly similar.

Additional File 1 details the results for all child terms. In addition to the purity, coverage and Φ coefficient values, that table also details how many significant, non-overlapping clusters were detected for each GO term and how many proteins corresponding to the GO child term were present on average in each cluster. The summary is that positive clusters were detected for 45 of the 46 child GO terms. Purities larger than 70% were observed for 31 out of those 45 child GO terms and 22 of the 46 child GO terms had coverages larger than 50%. Φ values were positive for all 45 child GO terms for which we found significant clusters. Setting aside the two already mentioned child GO terms with a high number of assignments based on PPI data, which may therefore be spuriously significant


Table 3: General results for the parent GO terms. Analyses using the DIP dataset.

GO term                                          Coverage           Purity (average)   Ambiguity          Φ (average ± s.e.m.)
Developmental process (32502)                    63.6% (402/632)    62.2%              13.0% (74/570)     0.46 ± 0.02
Reproduction (3)                                 58.4% (142/245)    94.1%              0% (0/25)          0.38 ± 0.11
Establishment of cellular localization (51649)   66.8% (302/452)    88.4%              1.1% (3/264)       0.43 ± 0.10
Response to stimulus (50896)                     56.4% (290/514)    77.5%              19.5% (32/164)     0.46 ± 0.05
Ribonucleoprotein complex (30529)                59.7% (190/318)    77.8%              12.8% (31/242)     0.64 ± 0.06
Organelle envelope (31967)                       39.6% (91/230)     84.9%              1.2% (1/83)        0.47 ± 0.09
Transcription regulator activity (30528)         43.5% (120/276)    67.6%              15.0% (30/200)     0.40 ± 0.08
Structural molecule activity (5198)              39.8% (92/231)     95.6%              0% (0/165)         0.53
Transporter Activity (5215)                      33.7% (100/297)    72.5%              6.4% (12/186)      0.43 ± 0.06
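
The global averages quoted in the text for the DIP analyses can be approximated from the nine rows of Table 3. The Python sketch below is one way to do this; the tuple layout and names are illustrative, and because it averages the rounded values printed in the table, a recomputed figure may differ from the reported one in the last digit (this is the case for coverage).

```python
from statistics import mean

# term: (coverage %, purity %, ambiguity %, phi), copied from Table 3.
table3 = {
    "Developmental process": (63.6, 62.2, 13.0, 0.46),
    "Reproduction": (58.4, 94.1, 0.0, 0.38),
    "Establishment of cellular localization": (66.8, 88.4, 1.1, 0.43),
    "Response to stimulus": (56.4, 77.5, 19.5, 0.46),
    "Ribonucleoprotein complex": (59.7, 77.8, 12.8, 0.64),
    "Organelle envelope": (39.6, 84.9, 1.2, 0.47),
    "Transcription regulator activity": (43.5, 67.6, 15.0, 0.40),
    "Structural molecule activity": (39.8, 95.6, 0.0, 0.53),
    "Transporter activity": (33.7, 72.5, 6.4, 0.43),
}

# Transpose the rows into one tuple per parameter, then average each.
coverage, purity, ambiguity, phi = zip(*table3.values())

mean_coverage = mean(coverage)
mean_purity = mean(purity)
mean_ambiguity = mean(ambiguity)
mean_phi = mean(phi)

print(f"coverage {mean_coverage:.1f}%, purity {mean_purity:.1f}%, "
      f"ambiguity {mean_ambiguity:.1f}%, phi {mean_phi:+.2f}")
```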

(Small nucleolar ribonucleoprotein complex and Structural constituent of cytoskeleton; see above), we determined the significance level for the other 43 child GO terms using a chi square test and Bonferroni's correction (see Methods). Φ was highly significant for 41 of those 43 terms (Additional File 1). These results further confirm that GO and interactome are notably congruent.

Figures 2 and 3 graphically show typical results. Figure 2 depicts the UVCLUSTER-based dendrogram of the parent GO term Ribonucleoprotein complex, which includes well-known cellular components such as the ribosome or the spliceosome. Significant clusters for its six child terms are indicated. Interestingly, significant clusters for four out of the six child GO terms (Spliceosome, Ribosome, Small nucleolar ribonucleoprotein complex and Preribosome) were almost completely independent, while significant clusters for the other two (Small nuclear ribonucleoprotein complex and Polysome) appeared included in more comprehensive clusters positive for other child GO terms (Spliceosome and Preribosome, respectively). This overlap explains the relatively high ambiguity of the Ribonucleoprotein complex term (12.8%; Table 3). In Figure 3, the graph with all the known direct PPI among the proteins in the parent GO term is shown. The color codes allow visualizing why the Spliceosome and Small nuclear ribonucleoprotein complex terms overlap in the UVCLUSTER analyses: a large number of proteins are annotated with both GO terms (shown in Figure 3 as blue/yellow dots). The high degree of purity (77.8%) for the Ribonucleoprotein complex GO term can also be easily visualized in this representation: notice the very few dots with a color different from that of the clusters (surrounded by the polygons). Those correspond to the few proteins included in a cluster but not annotated with the corresponding child GO term.

Analyses of the GOLD dataset: confirming the congruence between GO and interactome
While the results shown in the previous section provide the general picture of the congruence between the GO and interactome classifications that we were interested in determining, we performed additional analyses using the GOLD dataset in order not only to validate those results, but also to check for the potential effects of low-confidence interactions on our conclusions. First, we repeated the screening for assignations to GO terms based only on PPI data, again finding that only 5.6% of the proteins included in our parent GO terms according to the GOLD dataset were in that class and that the percentage again went down to 2.7% when we excluded the same two exceptional terms, Structural constituent of cytoskeleton and Small nucleolar ribonucleoprotein complex, mentioned above. Having demonstrated the almost complete independence of the GO and interactome data, we performed the same analyses that we did before for the DIP dataset. In this case, there were just 33 child GO terms containing 10 or more units. We again focused our analyses on determining whether those 33 groups appeared in the general dendrograms generated with all the proteins annotated to the parent GO terms. Table 4 shows the average results for the nine parent GO terms using the GOLD dataset. They are in general quite similar to those shown before for the DIP dataset (Table 3). As happened in the DIP analyses, both the purity (average: 76.9%; range 64.7% – 93.6%) and coverage (average: 78.9%; range 39.3% – 96.4%) were high. Ambiguity was higher than in the DIP analyses (average 28.1%; range 0% – 56.2%). This result was however expected, considering that the number of proteins in the GOLD-based trees is much smaller than in the DIP-based trees, favoring the overlap of the significant clusters. Finally, the positive correlation between GO and interactome measured by the Φ coefficient was also highly significant and a bit higher than in the DIP-based analyses, with an average of +0.58 ± 0.06 (range: +0.37 – +0.91). This difference in average Φ coefficients for the two datasets is however not statistically significant (t test). The results for all child GO terms are detailed in Additional File 2. They were very similar to those shown before for the DIP dataset (Additional File 1). We detected significant clusters for all (n = 33) the child GO terms of size ≥ 10. Both purities above 70% and coverages larger than 50% were found in 24 of those 33 terms. After eliminating the two terms with


Figure 2
Hierarchical representation of the protein interaction network for the Ribonucleoprotein complex term. On the
left, tree based on secondary distances. The tree on the right is shown to make the topology easier to visualize. At the bottom,
"Unconnected proteins" are those with no direct interactions, which are separated from the rest by UVCLUSTER. Numbers
refer to different clusters found for the same child GO term, which are again shown in Figure 3. snoRNP complex: Small nucle-
olar ribonucleoprotein complex; snRNP complex: Small nuclear ribonucleoprotein complex. NMD: nonsense-mediated mRNA
decay. LSM: like-SM protein complex.


Figure 3
Ribonucleoprotein complex protein interaction network. All the proteins (dots) in this parent GO term that have at
least one direct connection are shown. Colors refer to the child GO terms to which the proteins are annotated. White dots
are proteins that do not belong to any of the analyzed child GO terms. The clusters detected in our analyses are framed with
colored polygons. Color codes and cluster numbers as in Figure 2.
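
The quantities that Figure 3 lets the reader judge by eye (how many dots inside a polygon carry the color of the corresponding child term) can be made concrete with a small Python sketch. It is a sketch under stated assumptions rather than the authors' procedure: the precise definitions are in Methods, which are not reproduced here. Purity and coverage follow the verbal descriptions in the text, Φ is taken to be the standard phi coefficient of a 2×2 contingency table, the chi square statistic is computed as N·Φ² with one degree of freedom and no continuity correction, and the 0.05 family-wise threshold is an assumption. The function name `cluster_metrics` and the toy protein sets are hypothetical.

```python
from math import erfc, sqrt

def cluster_metrics(universe, term, cluster):
    """Purity, coverage and phi for one cluster against one child GO term.

    All arguments are sets of protein names; `term` and `cluster` are assumed
    to be subsets of `universe` (the proteins of the parent GO term).
    """
    both = len(term & cluster)            # in the cluster and annotated with the term
    only_cluster = len(cluster - term)    # in the cluster, not annotated
    only_term = len(term - cluster)       # annotated, outside the cluster
    neither = len(universe) - both - only_cluster - only_term

    purity = both / len(cluster)          # fraction of the cluster that is annotated
    coverage = both / len(term)           # fraction of the term recovered by the cluster
    phi = (both * neither - only_cluster * only_term) / sqrt(
        len(cluster) * (len(universe) - len(cluster))
        * len(term) * (len(universe) - len(term))
    )
    return purity, coverage, phi

# Hypothetical toy data: 20 proteins, 8 annotated, a 7-protein cluster with 6 annotated.
universe = {f"p{i}" for i in range(20)}
term = {f"p{i}" for i in range(8)}
cluster = {f"p{i}" for i in range(6)} | {"p8"}

purity, coverage, phi = cluster_metrics(universe, term, cluster)

# Chi square test with 1 degree of freedom: chi2 = N * phi^2, p = erfc(sqrt(chi2 / 2)).
chi_square = len(universe) * phi ** 2
p_value = erfc(sqrt(chi_square / 2))

# Bonferroni correction for 43 tests (the number of child terms tested with DIP data);
# the 0.05 family-wise level is an assumption, not a value given in the text.
bonferroni_alpha = 0.05 / 43
significant = p_value < bonferroni_alpha
```

With these toy sets the correlation is clearly positive, yet it does not pass the corrected threshold. This shows how demanding the Bonferroni correction is for small groups of proteins.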

Table 4: General results for the parent GO terms. Analyses using the GOLD dataset.

GO term                                          Coverage           Purity (average)   Ambiguity          Φ (average ± s.e.m.)
Developmental process (32502)                    83.3% (214/257)    82.0%              7.2% (16/222)      0.51 ± 0.06
Reproduction (3)                                 96.4% (107/111)    82.5%              8.3% (1/12)        0.45 ± 0.03
Establishment of cellular localization (51649)   86.7% (163/188)    76.8%              46.2% (49/106)     0.37 ± 0.02
Response to stimulus (50896)                     78.3% (162/207)    73.2%              32.1% (18/56)      0.48 ± 0.07
Ribonucleoprotein complex (30529)                82.3% (79/96)      70.7%              56.2% (41/73)      0.72 ± 0.03
Organelle envelope (31967)                       87.0% (60/69)      79.5%              26.5% (9/34)       0.70 ± 0.05
Transcription regulator activity (30528)         39.3% (42/107)     64.7%              33.8% (26/77)      0.42 ± 0.03
Structural molecule activity (5198)              69.3% (52/75)      68.8%              42.3% (22/52)      0.91
Transporter Activity (5215)                      87.3% (55/63)      93.6%              0.0% (0/50)        0.63 ± 0.13
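
The comparison of the average Φ coefficients of the two datasets can be illustrated from the parent-term values in Tables 3 and 4. The Python sketch below computes the mean and standard error of each set and a Welch t statistic. It is an assumption-laden illustration: the text does not state here whether the authors' t test was run on parent-term or child-term values, nor which variant of the test was used, so the statistic below is not necessarily the one they obtained.

```python
from math import sqrt
from statistics import mean, stdev

# Average phi per parent GO term, copied from Table 3 (DIP) and Table 4 (GOLD).
phi_dip = [0.46, 0.38, 0.43, 0.46, 0.64, 0.47, 0.40, 0.53, 0.43]
phi_gold = [0.51, 0.45, 0.37, 0.48, 0.72, 0.70, 0.42, 0.91, 0.63]

def mean_and_sem(values):
    """Mean and standard error of the mean (sample standard deviation / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))

mean_dip, sem_dip = mean_and_sem(phi_dip)
mean_gold, sem_gold = mean_and_sem(phi_gold)

# Welch's t statistic for two independent samples with unequal variances.
welch_t = (mean_gold - mean_dip) / sqrt(sem_dip ** 2 + sem_gold ** 2)

print(f"DIP:  {mean_dip:+.2f} ± {sem_dip:.2f}")
print(f"GOLD: {mean_gold:+.2f} ± {sem_gold:.2f}")
print(f"Welch t = {welch_t:.2f}")
```

With nine values per group the test has at most 16 degrees of freedom, for which the two-sided 5% critical value of Student's t is above 2.1. A statistic below that value is compatible with the statement that the difference between the two averages is not statistically significant.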


a high assignment based solely on PPI data, we found that 29 of the 31 remaining child GO terms had significant Φ coefficients. All these results confirm the major findings obtained by analyzing the DIP dataset.

Differences between the interactome and GO structures
In spite of the clear general congruence between GO and interactome described in the previous sections, some significant structural differences were also detected in our analyses. We will base the following description mainly on results obtained from the DIP dataset, but similar considerations arose when considering the GOLD data (see some details below).

First of all, several GO terms had low coverages, meaning that PPI data connecting the proteins annotated with those terms are limited or absent. The fact that PPI data are still partial obviously contributes to this problem. For example, the GO term Ribonucleoprotein complex had quite a high coverage (59.7% using DIP data; 82.3% using GOLD data), largely because it included several large multiprotein complexes (e.g. both subunits of the mitochondrial ribosome; spliceosome) for which interactome information is abundant. However, coverage could have been even higher had PPI data for proteins of the cytoplasmic ribosome not been scarce. In fact, no clusters for the cytoplasmic ribosome subunits were detected (Figure 2). Even so, lack of PPI data does not explain all cases of low coverage. Often, proteins were annotated with a particular term for reasons unrelated to their collaborating in the cell. This explains the especially low coverage values for some terms in the Molecular Function category, which group proteins with related biochemical properties even if their functions are, from a biological point of view, totally unrelated. Typical in this sense were our results for the child GO term Transcription activator activity. In the DIP dataset, this term included 50 proteins, but only 4 proteins were detected in the UVCLUSTER dendrograms (Additional File 1). Coverage was thus one of the lowest in the whole DIP dataset, a mere 8.0%. When we searched for direct interactions among the 50 proteins annotated with this GO term, we found that just 23 interacted, and only loosely (none of them had more than 2 interactions with other proteins in the set). It is extremely unlikely that this is solely due to PPI data for all these proteins having been missed so far. The simplest explanation is that proteins included in this GO term function alone or at most in small groups; they do not form any functional module.

A second significant difference between GO and interactome structures is that most child GO terms were fragmented into multiple significant PPI clusters. For the DIP dataset, we detected on average 4.1 significant clusters for each child GO term, with 14.9 proteins per cluster (Additional File 1). Similar results were obtained for the GOLD dataset (Additional File 2). This fragmentation may have three different causes. The first is a lack of PPI data connecting the clusters, due to the incompleteness of the current PPI information. Alternatively, it could be due to an artifactual division into clusters caused by methodological limitations. Finally, it could also be caused by the lumping of several independent cellular modules into single GO terms. Results shown in Figures 2, 3, 4 and 5 for the Ribonucleoprotein complex GO term, using the DIP dataset, suggest an important role for lumping (similar results were obtained for other GO terms). The GO term in those figures for which fragmentation is largest (Ribosome, 5 clusters) is composed of groups of proteins that belong to as many independent functional units: translation initiation factors, ribosome stalk, elongation factors and small and large mitochondrial ribosomal subunits. These functional units are largely independent according to PPI data (Figures 2 and 3). The structure deduced from the interactome is summarized in Figure 4, in which the relationships among the significant clusters of size ≥ 5 are detailed. Five of them correspond to the Ribosome GO term. When we then determined which GO terms among those included in the general GO term Ribonucleoprotein complex contained a significant number of proteins belonging to the five detected Ribosome clusters (see Methods), we found the results summarized in Figure 5. The fact that four clusters (nos. 1, 2, 3, 5) are detected as significantly enriched in different low-level GO terms demonstrates that the detection of multiple clusters is not spurious, but caused by real heterogeneity among the functions of the proteins included in different clusters. The appearance of multiple clusters may thus be ascribed to the fact that the general Ribosome GO term indeed includes independent functional units.

Figure 4 also shows the third main characteristic discrepancy that we have observed between interactome and GO: some clusters (snRNP, snoRNP 1, Ribosome 2) are included within others. This is due to multiple proteins being annotated with two or more GO terms (Figure 3). The high degree of overlap among GO terms is best detected when we again determine the GO terms to which the proteins in the clusters are annotated (Figures 5 and 6). In some cases (Figure 5), the degree of overlap is limited. However, in others the overlap is very considerable. For example, to generate Figure 6 we took the clusters of size ≥ 5 detected for the GO terms Spliceosome, snRNP and snoRNP shown in Figures 2 and 3 (a total of 4 clusters; DIP dataset) and we determined all the GO terms for which a significant enrichment of proteins in those clusters was present. Notably, all 11 GO terms detected as carrying a higher than expected number of proteins present in those clusters were actually significant for proteins included in two or even three of them (Figure 6). Similar results were found for some other GO terms.
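The direct-interaction check described above for Transcription activator activity can be illustrated with a minimal Python sketch. It assumes that the interactome is available as a list of undirected protein pairs and that a GO term is given as a list of protein identifiers; the function names and the toy data are hypothetical and are not part of UVCLUSTER or of the original analysis.

```python
from collections import defaultdict


def within_set_degrees(interactions, proteins):
    """For each protein annotated with a term, count its direct
    interaction partners that are annotated with the same term."""
    members = set(proteins)
    partners = defaultdict(set)
    for a, b in interactions:
        # Ignore self-interactions and pairs with a protein outside the set.
        if a != b and a in members and b in members:
            partners[a].add(b)
            partners[b].add(a)
    return {p: len(partners[p]) for p in members}


def summarize(degrees):
    """Summarize how many annotated proteins interact within the set."""
    connected = [d for d in degrees.values() if d > 0]
    return {
        "annotated": len(degrees),
        "with_partner": len(connected),
        "max_degree": max(connected, default=0),
    }


# Toy data (hypothetical): five annotated proteins, one external protein X.
toy_interactions = [("A", "B"), ("B", "C"), ("D", "X"), ("E", "E")]
toy_term = ["A", "B", "C", "D", "E"]
print(summarize(within_set_degrees(toy_interactions, toy_term)))
```

A term whose summary shows few proteins with a partner and a small maximum degree, as reported above for Transcription activator activity, is unlikely to correspond to a single functional module.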


Figure 4
Interactome-based structure of the GO term Ribonucleoprotein complex, as deduced from Figure 2. For simplicity, significant clusters of size < 5 are omitted. This eliminates the term Polysome, for which only one cluster of size = 3 was found.

Discussion

In this study, we quantified for the first time the global congruence between the structures of the GO and interactome of a eukaryotic species. We used a simple scheme of analysis, which only considers large parent GO terms with multiple child GO terms. This allowed us to analyze large numbers of proteins with minimal design problems, which could be caused by using smaller groups (e.g. those lower in the GO hierarchy) or by the intrinsic directed acyclic graph structure of the GO (which would have influenced the results in more complex designs, e.g. when using multiple GO levels). In spite of this intrinsic simplicity of design and the fact that we have not analyzed the complete GO or the whole interactome of S. cerevisiae, it is reasonable to expect that our results can be extrapolated to the cell as a whole. Most especially, our main conclusion, that the congruence between the structures deduced from GO and PPI is high, seems inescapable. This result goes well beyond previous efforts, which simply characterized whether groups of highly connected proteins (modules) were enriched for GO terms.

These results have important implications. A first conclusion is that GO classifications often have a strong structural basis: proteins annotated with the same GO term often interact, or at least they are sufficiently close in the interactome graph to be detected in statistically significant clusters. Second, we have shown that the analyses of large PPI datasets, even those that include low-confidence interactions, provide robust results. It is true that using the GOLD dataset has led to the detection of a higher level of congruence between GO and interactome than that found using the DIP dataset (Φ coefficient for the DIP dataset: +0.47 ± 0.03; Φ coefficient for the GOLD dataset: +0.58 ± 0.06). However, this difference is not statistically significant. Therefore, the improvement obtained by excluding low-confidence interactions is small.

Figure 5
GO terms for which a significant enrichment was found for proteins in the clusters detected when analyzing the Ribosome child GO term. Notice how this structure, taken directly from the GO, differs from that shown in Figure 4. Numbers refer to the five clusters shown also in the other figures (1: Translation initiation factors; 2: Ribosome stalk; 3: Large mitochondrial subunit; 4: Elongation factors; 5: Small mitochondrial subunit).

Figure 6
GO terms for which a significant enrichment for proteins in the clusters detected for the child GO terms snRNP, snoRNP and Spliceosome was detected. The names below the boxes refer to the child GO terms from which the clusters of proteins detected as significant derive. Notice the obvious overlap due to many proteins belonging to two or even all three child GO terms.

On the other hand, our results may also contribute to revising the current ontologies. For example, the results in Figures 2, 3 and 4, in which we showed that the Ribosome term is divided into five interactome-based units, each of them inherently logical from a functional point of view, suggest a division of this term slightly different from the one currently available. At present, only the two mitochondrial subunits have their own GO terms (Figure 5). Our results suggest, however, that it may be better to establish terms for the five clusters detected. Another significant point to consider is why a substantial number of GO terms have low coverages. Although this can be explained in part by lack of PPI data, there are GO terms defined for groups of proteins that most likely do not interact (see results described for the Transcription activator activity term, above). We think that annotating with a GO term proteins that do not work together in the cell may be acceptable for terms in the Molecular Function category, useful just for obtaining a biochemical classification of gene products. In fact, terms in that category generally had the lowest coverages (see Tables 3, 4). However, low coverages for terms in the Biological Process or Cellular Component categories should be regarded with suspicion. A careful reconsideration of these GO terms in the light of the PPI data may generate a more natural classification. Finally, a third significant discrepancy between GO and interactome concerns the overlaps and the relative hierarchical position of terms. The knowledge of biological networks may be very useful to define the levels in biological ontologies. One of the first goals may be to avoid, as much as possible, establishing at the same level two terms that
contain many common proteins (e.g. Figure 6). Also, as we have seen (Figures 2 and 4), according to PPI data, a cluster for one GO term often contains a smaller cluster for another GO term of the same level. Those two terms may be based, at least in part, on just one functional module, being thus substantially redundant. This situation should also be avoided as much as possible.

Conclusion
In summary, in Saccharomyces cerevisiae, GO and the global structure of the interactome show a substantial degree of congruence. This is comforting, given that both classifications have been obtained almost independently. We conclude that our current "curated" view of the yeast cell, as schematized in the GO, is globally confirmed by the unsupervised type of analysis developed here. However, the discrepancies detected mean that the current development of the Saccharomyces Gene Ontology is still incomplete and a better integration of PPI data may contribute to its improvement.

Methods
We searched the GO annotations compiled in the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) for large parent GO terms including 200–1000 proteins and with at least 4 child GO terms, each with 10 or more proteins. All proteins not included in a child GO term (i.e. annotated only with the parent GO term) were excluded from the cluster analyses. The UVCLUSTER program [15] (see http://www.uv.es/~genomica/UVCLUSTER) was then used to obtain the hierarchical structure of the graphs for each set of proteins annotated with a GO term. The starting point for obtaining the hierarchical trees with UVCLUSTER is the set of "primary distances" among the proteins (shortest path lengths in the interactome graph). They were obtained from two sources. The first was the Database of Interacting Proteins (DIP; http://dip.doe-mbi.ucla.edu). We used the full S. cerevisiae dataset in DIP, which compiles information from multiple sources, although about 80% of the included protein-protein interactions derive from high-throughput experiments, using either the yeast two-hybrid method or affinity purification of protein complexes. The second source was the "Binary gold standard set" described by Yu et al. [31], which includes only high-confidence data, mostly based on direct physical interactions characterized by the two-hybrid method. For UVCLUSTER analyses, 10000 iterations, generating as many alternative topologies, and an affinity coefficient of 100 were used to estimate the "secondary distances" that are used to build the final dendrograms (see [15] for details on these parameters). Secondary distances, obtained by weighting the 10000 alternative trees, have clear advantages over primary distances [15]. Dendrograms using secondary distances were obtained using the UPGMA routine in Mega 3 [33].

UVCLUSTER analyses are very time consuming when the number of units is higher than 1000 [15]. That is why we selected parent GO terms with at most 1000 annotated proteins. Moreover, we selected parent GO terms subdivided into multiple child GO terms to speed up the collection and analysis of the data. We finally centered our analysis on the child GO terms containing at least 10 proteins for which interactome data were available, discarding smaller child GO terms, to avoid biases that could be caused by a few missing or a few false positive links in small groups of proteins. Some child GO terms were excluded specifically from the GOLD analyses, given that in the GOLD dataset they contained fewer than 10 proteins.

GO is divided into three main categories: Biological Process, Cellular Component and Molecular Function. The first of these reflects the known information about the cellular functions in which gene products are involved, the second refers to the locations (subcellular structures, macromolecular complexes) in which those products act and the third refers to the biochemical task that the products perform (e.g. they have a certain enzymatic activity, act as receptors, etc.). We retrieved four parent GO terms from the Biological Process category and three more from the Molecular Function category that comply with our selection criteria and were hierarchically located just below these two main categories (these are often called "level 1 GO terms"). However, none of the level 1 GO terms of the Cellular Component category matched our criteria of size and number of child terms. We thus selected as parents two level 2 GO terms of that category that do comply with those criteria. The selected parent GO terms are summarized in Table 1.

Explorations of the dendrograms to estimate the enrichment for GO terms were performed as described in [30]. This highly sensitive method, implemented in the TreeTracker program, compares the enrichments for child GO terms in the observed tree with those in random simulations based on the same tree topology. Whenever the probability of finding a particular enrichment by chance was sufficiently low (in this study, p < 0.001; i.e. only 1/1000 of the significant clusters detected are expected to be false positives) and provided that the cluster contained 2 or more units belonging to the analyzed GO term, the cluster was labeled as positive.

To quantify the congruence between GO and interactome, we used four parameters. The first one is the coverage,
which measures to which extent a GO term is recovered by analyzing the structure of the interactome. For a parent GO term, coverage is defined as the percentage of the proteins annotated with that parent GO term that appear in the statistically significant clusters characterized for its child GO terms. For a child GO term, the definition is slightly different: coverage is defined as the percentage of proteins annotated with the child GO term that are included in significant clusters detected specifically for that term. The second parameter is the purity of the clusters, defined as the percentage of proteins contained in clusters significant for a given GO term which indeed are annotated with that term. The third parameter, which we called ambiguity, is defined as the percentage of proteins annotated with a single child GO term that nevertheless appear in significant clusters for two or more child GO terms. Ambiguity thus indicates the degree of overlap among child GO terms according to the interactome structure. However, none of these three informative parameters (coverage, purity, ambiguity) by itself fully measures the global congruence of the two structures. To do so, we used a fourth parameter, the Phi correlation coefficient (Φ; [34] p. 741), defined as:

Φ = (TP · TN − FP · FN) / √[(TP + FN)(TN + FP)(TP + FP)(TN + FN)]

The four parameters (TP, TN, FN, FP) refer to a particular GO term. TP (true positives) are the proteins in the clusters detected as positive for a GO term which are indeed annotated to that term. TN (true negatives) are the proteins excluded from the clusters that are not annotated to the term. FN (false negatives) are proteins annotated to the GO term which are not included in any significant cluster for that term. Finally, FP (false positives) are proteins included in the significant clusters that are not annotated to the GO term. Significance of Φ can be simply estimated: Φ²n, where n is the total sample size (n = FP + FN + TP + TN), follows a chi-square distribution with one degree of freedom [34-36]. Notice also that, for child GO terms, the parameters coverage and purity, explained above, can be respectively calculated as TP/(TP + FN) and TP/(TP + FP).

Finally, to generate Figures 5 and 6, we took each of the significant clusters (size ≥ 5 elements) that we wanted to analyze and we searched for GO terms that contained more proteins included in each cluster than expected by chance (p < 0.01) using High-Throughput GoMiner [37].

Authors' contributions
Both authors devised this research. AM performed all the analyses of the paper and contributed to the text. IM wrote the manuscript. Both authors read and approved the final manuscript.

Additional material

Additional file 1
Supplementary table 1. Detailed results for DIP interaction network.
[http://www.biomedcentral.com/content/supplementary/1752-0509-3-69-S1.doc]

Additional file 2
Supplementary table 2. Detailed results for GOLD interaction network.
[http://www.biomedcentral.com/content/supplementary/1752-0509-3-69-S2.doc]

Acknowledgements
Research supported by grant BIO2008-05067 (Programa Nacional de Biotecnología; Ministerio de Ciencia e Innovación, Spain), awarded to IM. AM was an FPI fellow from Ministerio de Educación y Ciencia (Spain).

References
1. The Gene Ontology Consortium: Creating the gene ontology resource: design and implementation. Genome Res 2001, 11:1425-1433.
2. Bork P, Jensen LJ, von Mering C, Ramani AK, Lee I, Marcotte EM: Protein interaction networks from yeast to human. Curr Opin Struct Biol 2004, 14:292-299.
3. Cusick ME, Klitgord N, Vidal M, Hill DE: Interactome: gateway into systems biology. Hum Mol Genet 2005, 14:R171-R181.
4. Stelzl U, Wanker EE: The value of high quality protein-protein interaction networks for systems biology. Curr Opin Chem Biol 2006, 10:551-558.
5. Xia Y, Yu H, Jansen R, Seringhaus M, Baxter S, Greenbaum D, Zhao H, Gerstein M: Analyzing cellular biochemistry in terms of molecular networks. Annu Rev Biochem 2004, 73:1051-1087.
6. Uetz P, Finley RLJ: From protein networks to biological systems. FEBS Lett 2005, 579:1821-1827.
7. Mrowka R, Patzak A, Herzel H: Is there a bias in proteome research? Genome Res 2001, 11:1971-1973.
8. von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P: Comparative assessment of large-scale data sets of protein-protein interactions. Nature 2002, 417:399-403.
9. Bader GD, Hogue CW: An automated method for finding molecular complexes in large protein interaction networks. Nat Biotechnol 2002, 20(10):991-7.
10. Deane CM, Salwiñski Ł, Xenarios I, Eisenberg D: Protein interactions: two methods for assessment of the reliability of high throughput observations. Mol Cell Proteomics 2002, 1:349-356.
11. Hart GT, Ramani AK, Marcotte EM: How complete are current yeast and human protein-interaction networks? Genome Biol 2006, 7:120.
12. Bader GD, Hogue CWV: An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 2003, 4:2.
13. Rives AW, Galitski T: Modular organization of cellular networks. Proc Natl Acad Sci USA 2003, 100:1128-1133.
14. Spirin V, Mirny LA: Protein complexes and functional modules in molecular networks. Proc Natl Acad Sci USA 2003, 100:12123-12128.
15. Arnau V, Mars S, Marín I: Iterative cluster analysis of protein interaction data. Bioinformatics 2005, 21:364-378.
16. Sen TZ, Kloczkowski A, Jernigan RL: Functional clustering of yeast proteins from the protein-protein interaction network. BMC Bioinformatics 2006, 7:355.
17. Hirsh E, Sharan R: Identification of conserved protein complexes based on a model of protein network evolution. Bioinformatics 2007, 23:e170-e176.
18. Luo F, Yang Y, Chen CF, Chang R, Zhou J, Scheuermann RH: Modular organization of protein interaction networks. Bioinformatics 2007, 23:207-214.
19. Brohée S, van Helden J: Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics 2006, 7:488.
20. Marín I, Hoyas S: Basic networks: definition and applications. Journal of Theoretical Biology 2009, 258:53-59.
21. Deng M, Zhang K, Mehta S, Chen T, Sun F: Prediction of protein function using protein-protein interaction data. J Comput Biol 2003, 10:947-960.
22. Letovsky S, Kasif S: Predicting protein function from protein/protein interaction data: a probabilistic approach. Bioinformatics 2003, 19 Suppl 1:i197-204.
23. Karaoz U, Murali TM, Letovsky S, Zheng Y, Ding C, Cantor CR, Kasif S: Whole-genome annotation by using evidence integration in functional-linkage networks. Proc Natl Acad Sci USA 2004, 101:2888-2893.
24. Lu LJ, Xia Y, Paccanaro A, Yu H, Gerstein M: Assessing the limits of genomic data integration for predicting protein networks. Genome Res 2005, 15:945-953.
25. Wu X, Zhu L, Guo J, Zhang DY, Lin K: Prediction of yeast protein-protein interaction network: insights from the Gene Ontology and annotations. Nucleic Acids Res 2006, 34:2137-2150.
26. Barabási A, Oltvai ZN: Network biology: understanding the cell's functional organization. Nat Rev Genet 2004, 5:101-113.
27. Albert R: Scale-free networks in cell biology. J Cell Sci 2005, 118:4947-4957.
28. Arnau V, Marín I: A hierarchical clustering strategy and its application to proteomic interaction data. Lec Notes Comp Sci 2003, 2652:62-69 [http://www.springerlink.com/content/mdne0nbmtypjjl6j/].
29. Lucas JI, Arnau V, Marín I: Comparative genomics and protein domain graph analyses link ubiquitination and RNA metabolism. J Mol Biol 2006, 357:9-17.
30. Marco A, Marin I: A general strategy to determine the congruence between a hierarchical and a non-hierarchical classification. BMC Bioinformatics 2007, 8:442.
31. Yu H, Braun P, Yildirim MA, Lemmens I, Venkatesan K, Sahalie J, Hirozane-Kishikawa T, Gebreab F, Li N, Simonis N, Hao T, Rual JF, Dricot A, Vazquez A, Murray RR, Simon C, Tardivo L, Tam S, Svrzikapa N, Fan C, de Smet AS, Motyl A, Hudson ME, Park J, Xin X, Cusick ME, Moore T, Boone C, Snyder M, Roth FP, Barabási AL, Tavernier J, Hill DE, Vidal M: High-quality binary protein interaction map of the yeast interactome network. Science 2008, 322:104-110.
32. Dolinski K, Botstein D: Changing perspectives in yeast research nearly a decade after the genome sequence. Genome Res 2005, 15:1611-1619.
33. Kumar S, Tamura K, Nei M: MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform 2004, 5(2):150-63.
34. Sokal RR, Rohlf FJ: Biometry: the principles and practice of statistics in biological research. New York: WH Freeman and Co; 1995.
35. Burset M, Guigó R: Evaluation of gene structure prediction programs. Genomics 1996, 34:353-367.
36. Tompa M, Li N, Bailey TL, Church GM, De Moor B, Eskin E, Favorov AV, Frith MC, Fu Y, Kent WJ, Makeev VJ, Mironov AA, Noble WS, Pavesi G, Pesole G, Régnier M, Simonis N, Sinha S, Thijs G, van Helden J, Vandenbogaert M, Weng Z, Workman C, Ye C, Zhu Z: Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol 2005, 23:137-144.
37. Zeeberg BR, Qin H, Narasimhan S, Sunshine M, Cao H, Kane DW, Reimers M, Stephens RM, Bryant D, Burt SK, Elnekave E, Hari DM, Wynn TA, Cunningham-Rundles C, Stewart DM, Nelson D, Weinstein JN: High-Throughput GOMiner, an 'industrial-strength' integrative Gene Ontology tool for interpretation of multiple-microarray experiments, with application to studies of Common Variable Immune Deficiency (CVID). BMC Bioinformatics 2005, 6:168.
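As an illustration of the "primary distances" described in Methods (shortest path lengths in the interactome graph), the following Python sketch computes them by breadth-first search. It is only a sketch under stated assumptions: the interactome is given as a list of undirected protein pairs, paths are allowed to pass through any protein of the interactome (whether UVCLUSTER restricts them is not stated here), and the function name and toy data are hypothetical.

```python
from collections import deque


def primary_distances(interactions, proteins):
    """Return shortest path lengths, counted in interactions, between
    every ordered pair of the given proteins. Unreachable pairs are omitted."""
    # Build an undirected adjacency map of the whole interactome.
    graph = {}
    for a, b in interactions:
        if a == b:
            continue  # self-interactions do not change path lengths
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    targets = set(proteins)
    distances = {}
    for source in targets:
        # Standard breadth-first search from each protein of interest.
        seen = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph.get(node, ()):
                if neighbour not in seen:
                    seen[neighbour] = seen[node] + 1
                    queue.append(neighbour)
        for target in targets:
            if target != source and target in seen:
                distances[(source, target)] = seen[target]
    return distances


# Toy interactome (hypothetical): a chain A-B-C-D plus an isolated pair E-F.
toy = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
for pair, dist in sorted(primary_distances(toy, ["A", "C", "D", "E"]).items()):
    print(pair, dist)
```

In this sketch a pair of proteins with no connecting path simply has no entry; how such pairs should be treated before clustering is a design choice that the text above does not specify.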

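The congruence parameters defined in Methods for a child GO term (coverage, purity, Φ and the chi-square test on Φ²n) can be sketched in Python as follows. The function name and the example counts are hypothetical. The p-value uses the standard identity that, for one degree of freedom, the upper tail of the chi-square distribution equals erfc(√(x/2)), so only the standard library is needed.

```python
import math


def congruence(tp, tn, fp, fn):
    """Coverage, purity, Phi coefficient and its chi-square p-value
    (one degree of freedom) from the four counts for a GO term."""
    n = tp + tn + fp + fn
    denominator = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    # Phi is undefined when a marginal total is zero; 0.0 is returned
    # here as a convention of this sketch, not of the original method.
    phi = (tp * tn - fp * fn) / denominator if denominator else 0.0
    chi_square = phi * phi * n
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    coverage = tp / (tp + fn) if (tp + fn) else 0.0
    purity = tp / (tp + fp) if (tp + fp) else 0.0
    return {
        "coverage": coverage,
        "purity": purity,
        "phi": phi,
        "chi_square": chi_square,
        "p_value": p_value,
    }


# Hypothetical counts for one child GO term.
result = congruence(tp=40, tn=900, fp=10, fn=20)
for name, value in result.items():
    print(f"{name}: {value:.4g}")
```

Coverage and purity describe two separate aspects of the match, whereas Φ combines all four counts into a single value, which is why it is the parameter used above to measure global congruence.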

No ratings yet
Experimental and Bioinformatic Approaches For Interrogating Protein-Protein Interactions To Determine Protein Function
18 pages
Slides 3
No ratings yet
Slides 3
53 pages
Applications of Multi-Omics: Fundamentals of Integrating Biological Data for Precision Medicine and Research
From Everand
Applications of Multi-Omics: Fundamentals of Integrating Biological Data for Precision Medicine and Research
Richard Skiba
No ratings yet
Polymer Protein Hybrid: Advancements in Bioengineering for Autonomous Reproductive Systems
From Everand
Polymer Protein Hybrid: Advancements in Bioengineering for Autonomous Reproductive Systems
Fouad Sabry
No ratings yet
Gene Control: Unlocking Genetic Secrets
From Everand
Gene Control: Unlocking Genetic Secrets
Deevakar Asan
No ratings yet
Relating Whole-Genome Expression Data With Protein-Protein Interactions
No ratings yet
Relating Whole-Genome Expression Data With Protein-Protein Interactions
30 pages
Galperin 2019 The COG Approach
No ratings yet
Galperin 2019 The COG Approach
8 pages
ppiGReMLIN A Graph Mining Based
No ratings yet
ppiGReMLIN A Graph Mining Based
25 pages
Bioinformatics 25 8 1091
No ratings yet
Bioinformatics 25 8 1091
3 pages
Tmp1a96 TMP
No ratings yet
Tmp1a96 TMP
80 pages
tmpF178 TMP
No ratings yet
tmpF178 TMP
15 pages
tmpE3C0 TMP
No ratings yet
tmpE3C0 TMP
17 pages
Tmp75a7 TMP
No ratings yet
Tmp75a7 TMP
8 pages
tmp998 TMP
No ratings yet
tmp998 TMP
9 pages
tmp3656 TMP
No ratings yet
tmp3656 TMP
14 pages
tmp96F2 TMP
No ratings yet
tmp96F2 TMP
4 pages
tmpA7D0 TMP
No ratings yet
tmpA7D0 TMP
9 pages
tmp97C8 TMP
No ratings yet
tmp97C8 TMP
9 pages
Exam Oriented Questions - GP
No ratings yet
Exam Oriented Questions - GP
97 pages
EUEE from Genetics
No ratings yet
EUEE from Genetics
5 pages
Andrology
No ratings yet
Andrology
93 pages
Mechanism of Respiration
No ratings yet
Mechanism of Respiration
15 pages
Laemmli Buffer
No ratings yet
Laemmli Buffer
2 pages
Nasal Trauma
No ratings yet
Nasal Trauma
5 pages
Serangan Jantung
No ratings yet
Serangan Jantung
13 pages
Histology Practical PDF Bytom Cnaan
No ratings yet
Histology Practical PDF Bytom Cnaan
65 pages
Urinary Elimination
No ratings yet
Urinary Elimination
9 pages
Biology Notes Chpter 10
No ratings yet
Biology Notes Chpter 10
8 pages
Medical Nutrition Therapy For Renal Disease 2016
No ratings yet
Medical Nutrition Therapy For Renal Disease 2016
39 pages
Lesson Plan in Biology 1 2 Quarter
No ratings yet
Lesson Plan in Biology 1 2 Quarter
3 pages
No Nama Diagnosa Kode Icd X: Kecelakaan) Kecelakaan Kerja)
No ratings yet
No Nama Diagnosa Kode Icd X: Kecelakaan) Kecelakaan Kerja)
2 pages
GRADE 10 SCIENCE-REVIEWER-3rd-Quarter
No ratings yet
GRADE 10 SCIENCE-REVIEWER-3rd-Quarter
3 pages
Carbonic Anhydrase-Smaranika
100% (1)
Carbonic Anhydrase-Smaranika
19 pages
A Case of Swelling: Presented by - Divya Mundada Navodaya Dental College
No ratings yet
A Case of Swelling: Presented by - Divya Mundada Navodaya Dental College
23 pages
Dna and Rna Powerpoint 2
No ratings yet
Dna and Rna Powerpoint 2
46 pages
Chapter 074
No ratings yet
Chapter 074
7 pages
Powerpoint Acupuncture Works
0% (1)
Powerpoint Acupuncture Works
31 pages
9700 BIOLOGY: MARK SCHEME For The October/November 2015 Series
No ratings yet
9700 BIOLOGY: MARK SCHEME For The October/November 2015 Series
4 pages
Download Complete Biological Psychology 1st Edition Kelly G. Lambert PDF for All Chapters
100% (1)
Download Complete Biological Psychology 1st Edition Kelly G. Lambert PDF for All Chapters
52 pages
Retdem Guide
No ratings yet
Retdem Guide
3 pages
Carbohydrate Metabolism in Dairy Cows
100% (2)
Carbohydrate Metabolism in Dairy Cows
4 pages
Summary of Pathways
100% (1)
Summary of Pathways
1 page
Biology Module 5 - Animal Organ Systems
No ratings yet
Biology Module 5 - Animal Organ Systems
34 pages
Nutrition During The Life Cycle
100% (2)
Nutrition During The Life Cycle
335 pages
009 Neuromodulation Vs Neurotransmission V5
100% (1)
009 Neuromodulation Vs Neurotransmission V5
1 page
ANATOMY SHORT ANSWER QUESTIONS BANK September 2009
No ratings yet
ANATOMY SHORT ANSWER QUESTIONS BANK September 2009
4 pages