Relating Whole-Genome Expression Data With Protein-Protein Interactions
In general, our goal was to integrate and cross-correlate existing data from
different sources and to find general trends in them. This is an exploratory study prior to any
type of prediction. In a sense, it can be understood as an exploration of the
knowledge that is already implicit in the current data but not yet obvious because the data
have not previously been integrated in this way.
Results
In our survey of existing data, we have used two different approaches to analyze the two
different types of expression data available: the computation of normalized differences
for absolute expression levels and a more standard analysis of the correlation of profiles
of relative expression levels (expression ratios). We explain these two approaches in
more detail in the following two sections.
where Ei and Ej are the mRNA expression levels of subunits i and j. This quantity defines
the difference as a fraction of the sum of the expression levels, thus allowing for a
It should also be noted that there are obviously many limitations in treating GeneChip
and SAGE data as absolute measurements of mRNA expression (Schadt et al. 2000).
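The normalized difference can be illustrated with a short sketch. It assumes the form D_ij = |E_i - E_j| / (E_i + E_j), which is what the description "difference as a fraction of the sum of the expression levels" suggests; the function names and the example expression levels are hypothetical.

```python
from itertools import combinations
from statistics import median


def normalized_difference(e_i, e_j):
    """Absolute difference of two expression levels as a fraction of their sum.

    Assumed form: D_ij = |E_i - E_j| / (E_i + E_j), so 0 <= D_ij <= 1.
    """
    total = e_i + e_j
    if e_i < 0 or e_j < 0 or total <= 0:
        raise ValueError("expression levels must be non-negative with a positive sum")
    return abs(e_i - e_j) / total


def complex_normalized_differences(levels):
    """Normalized differences for all (N^2 - N)/2 subunit pairs of one complex."""
    return [normalized_difference(a, b) for a, b in combinations(levels, 2)]


# Hypothetical absolute expression levels (arbitrary units) for four subunits.
example_levels = [10.0, 20.0, 30.0, 60.0]
example_diffs = complex_normalized_differences(example_levels)
example_median = median(example_diffs)
```

The median of such a list is the kind of summary statistic that is compared against the median for randomly selected protein pairs.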
As the input for our procedure we use the expression vectors or profiles of all the
subunits of a complex and then compute their pair-wise correlations. As for the
normalized difference, we compute the correlation coefficients for all protein pairs in a
complex, thus obtaining a distribution of correlation coefficients. If the complex consists
of N subunits, this yields $(N^2 - N)/2$ different combinations of protein pairs and thus
correlation coefficients. To summarize these distributions, we calculate the “average
correlation” (by which we mean the average of all pair-wise correlations within a
complex). As a suitable control to assess statistical significance, we use the distributions
of correlation coefficients for random groups of proteins and their averages (see
Methods). We would expect correlations close to 1 for subunits in a tight complex.
However, as we show in the Methods section, this will not be exactly the case due to the
relationship between mRNA and protein abundances.
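A minimal sketch of this procedure is given here, assuming that each subunit is represented by a list of expression ratios of equal length and that the correlation is the Pearson coefficient. The helper names and the toy profiles are hypothetical and only illustrate the pair-wise and average correlations.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(profiles):
    """Correlations for all (N^2 - N)/2 pairs of subunit profiles."""
    return [pearson(p, q) for p, q in combinations(profiles, 2)]


def average_correlation(profiles):
    """Average of all pair-wise correlations within one complex."""
    values = pairwise_correlations(profiles)
    return sum(values) / len(values)


# Hypothetical profiles for a three-subunit complex.
toy_profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with the first profile
    [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with the first profile
]
toy_average = average_correlation(toy_profiles)
```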
Results
We first outline some results obtained for specific protein complexes, then we proceed to
a more general overview of complexes.
Specific Complexes
Ribosome
It has long been known that the mRNA expression levels of the ribosomal proteins are
strongly correlated with one another (Johannes et al. 1999). Figure 1 shows the observed
distribution of normalized differences for protein pairs in the large subunit of the
cytoplasmic ribosome. The median of this distribution is 0.23, much lower than the
median of 0.5 for randomly selected protein pairs. While there is a wide range of
normalized differences (which may partially result from the fact that many proteins in the
Similar observations can be made for the proteins in the small cytoplasmic ribosome.
Key statistics are summarized in figure 3 in comparison to those for other protein
complexes. Furthermore, the two separate ribosome particles are strongly co-regulated.
In fact, the large and the small ribosomal particles cannot be differentiated by our
measures of expression similarity.
Proteasome
A second example of a complex whose individual subunits are strongly co-regulated is
the proteasome, which is involved in protein degradation and responsible for the rapid
breakdown of ubiquitinated proteins. Like the ribosome, the 26S proteasome can be
divided into two sub-particles: the 20S and the 19S (or 19S/22S regulatory particle). The
20S particle is present as a dimer in the center of the complex structure and contains the
catalytic core, whereas two 19S particles are attached to both ends of the 20S particle
dimer (Coux et al. 1996; Wilkinson et al. 1999).
The distribution of the normalized differences for all possible protein pairs in the 20S
proteasome is shown in figure 1. Like the ribosome, it is clearly skewed towards zero,
compared to the control, with a median of 0.29. Figure 2 shows the distribution of
correlation coefficients, which is strongly shifted to the right of the control, though to a
lesser extent than that for the ribosome. An investigation of the crystal structure of the 20S
particle (Whitby et al. 2000) did not reveal any relationship with the gene expression
differences (e.g. proteins with slightly more random correlations tending to be more on
the surface of the particle).
One subunit, Doa4p, exhibits a very low average correlation (-0.02). Biochemical studies
have previously shown that not all proteasomes have Doa4p bound and that the Doa4p-
proteasome interaction is more likely to be transitory (Papa and Hochstrasser 1993; Papa
et al. 1999).
It is known that, unlike the RNA polymerase II core enzyme, the SRB complex and the
other holoenzyme components are only needed for the transcription of a fraction of genes
(Holstege et al. 1998). In other words, the holoenzyme is an example of a complex of
transitory nature with a permanent core. This permanent-and-transitory structure is
clearly evident in the gene expression analysis. For the core enzyme, the average
correlations in both the cell cycle and Rosetta data sets are significantly higher than for the
random control (Figure 3). However, for the SRB complex and a variety of other, smaller
components (e.g. the TAFIIs) the average correlations are virtually indistinguishable from
the random control.
Replication Complex
As a whole, the replication complex exhibits a low average correlation not significantly
different from that of the random control (figures 3 and 4). However, figure 4 shows how
the entire complex breaks into subcomponents in terms of correlations in the cell-cycle
experiment. The individual correlations for each of the subcomponents are much higher
than that of the complex as a whole. This indicates that the replication complex is
composed of independent units in terms of expression regulation. Using the permanent-
transient terminology, each subcomponent behaves similarly to an independent
permanent complex, whereas the replication complex as a whole can be characterized as
transient. The permanent sub-components can be seen to come together to form a
transient functional entity. (Note, this effect is more evident in the cell cycle experiment
than the Rosetta data, as it should only be observable in a synchronized population of
cells, not those averaged across the cell cycle.)
Figure 1 shows the distribution of normalized differences and figure 2 the distributions of
correlation coefficients between interacting proteins in the aggregated data sets. The
distributions of normalized differences are relatively similar to those of the transient
protein complexes. The physical interactions show the smallest median normalized
difference while the yeast two-hybrid interactions have a median normalized difference
closest to the random control (~0.5). Figure 2 shows that the correlation distributions for
Jansen et al. - 10 -
the aggregated data sets are fairly similar among themselves and only slightly shifted
towards the right of the distribution curve for random protein pairs. This, again, is very
similar to the behavior of transient protein complexes.
Thus, overall, it seems fair to conclude that the aggregated protein-protein interactions
are related to mRNA expression in a similar fashion as the transient protein complexes.
Of course, even for the complexes that we do classify, the terms "transient" and
"permanent" are somewhat of an over-simplification. In particular, our detailed
discussions of the RNA polymerase II holoenzyme and the replication complex above are
precisely two examples where our simplified terminology fails to completely explain the
situation since these complexes are somewhere between fully "transient" and
"permanent".
One can think about the distinction between permanent and transient in terms of the
mathematical model introduced in the Methods section. Whenever a complex is formed,
its subunits tend to be expressed at equimolar protein concentrations: $P_i \approx P_j$ and
$dP_i/dt \approx dP_j/dt$ (where $P_i$ and $P_j$ are the protein concentrations of two
subunits i and j). If the complex is "permanent", then these conditions should be
approximately or vaguely met. If the complex is "transient", then these conditions can be
relaxed in those situations where the complex is not formed. There are some complexes
that are always formed ("permanent") whereas the "transient" complexes are only formed
under particular conditions. There can be different degrees of being transient: for
instance, complexes that are formed under 80% of conditions or those that are formed
under 20% of conditions. The transient complex formed under 80% of conditions
behaves almost like "permanent" (i.e., 100% of conditions), whereas the transient
complex formed only 20% of the time would be expected to show less significant
normalized differences and correlations.
If one goes as far as to accept the premise that the subunits in a complex should be
present at equimolar amounts, then it is perhaps circular reasoning to say that they should
also be co-expressed.
Noise in the expression and interaction data
In general, the interactions in the aggregated datasets exhibited surprisingly little
deviation from randomness in terms of the co-expression of interaction pairs. This was
most strongly observed for the yeast two-hybrid data. It is true that, overall, this
deviation from randomness is statistically significant. All the same, the gene expression
data and the aggregated protein interaction data do not reinforce each other strongly and
it seems that predicting this type of interaction from expression data would be of
little benefit.
Perhaps the most optimistic view of this situation is that the strong degree of
independence of the two types of data makes both of them suitable for use in machine-
learning approaches to characterize genes of unknown function: if they were strongly
correlated, then one type of data could perhaps well replace the other since it represents
very similar information. A negative view would be that the reason for the surprisingly
weak relationship between the aggregated interactions and mRNA expression is to be
found in problems with either the expression or the interaction data.
We feel confident that our results are robust to the noise in the expression data for the
following reasons. With respect to the correlation analysis of expression profiles, roughly
the same results (in terms of statistical significances) can be obtained for two independent
data sets (the cell-cycle timecourse and the Rosetta knockout series). The normalized
difference analysis is perhaps more sensitive to problems with the data, in particular,
considering that the measurement of absolute expression levels with gene chips is
problematic to start with. However, we have looked at an integrated dataset from various
chip experiments and the SAGE data, thus averaging out errors to some degree (see
Methods). In addition, for both the correlation and the normalized difference analysis, we
have concentrated on the statistical significance of distributions rather than relying on the
error-prone data for individual protein pairs, thus observing more robust, aggregate trends
for whole complexes and groups of proteins.
Part of the aggregated data, in particular the yeast two-hybrid data, represents a relatively
new approach to studying protein-protein interactions, and it is interesting to note that it
includes some of the interactions implied by the complexes. However, the degree of
intersection with the possible within-complex interactions ranges from 35% for the physical
interactions to only approximately 6% for the yeast two-hybrid data (as a fraction of the
number of interactions in the aggregated datasets). This is surprisingly low, given that
the yeast two-hybrid data are from experiments that covered the complete genome (Uetz et
al. 2000; Ito et al. 2001). Independently, Ito et al. (2001) have reported that only a small
fraction of the previous yeast two-hybrid data (Uetz et al. 2000) overlapped with their
own yeast two-hybrid results. (Although Ito and colleagues assumed that their core data
were similar in quality to the Uetz data, the fraction of interactions present in both datasets
was only 16.8% for the Ito core and 20.4% for the Uetz data.)
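The degree of intersection discussed above can be sketched as a simple set computation. The sketch assumes that a complex implies every possible pairing of its subunits and that the overlap is reported as a fraction of the interactions in the aggregated dataset; the function names and the placeholder ORF labels are hypothetical.

```python
from itertools import combinations


def canonical_pair(a, b):
    """Order-independent representation of an interaction between two ORFs."""
    return tuple(sorted((a, b)))


def pairs_implied_by_complexes(complexes):
    """All within-complex protein pairs, i.e. every possible subunit pairing."""
    implied = set()
    for members in complexes.values():
        # Sorting the members makes each generated pair canonical.
        for a, b in combinations(sorted(set(members)), 2):
            implied.add((a, b))
    return implied


def overlap_fraction(interactions, reference_pairs):
    """Fraction of `interactions` that also occur in `reference_pairs`."""
    observed = {canonical_pair(a, b) for a, b in interactions}
    if not observed:
        raise ValueError("no interactions given")
    return len(observed & reference_pairs) / len(observed)


# Placeholder labels, not real ORF names.
toy_complexes = {"complex_1": ["A", "B", "C"], "complex_2": ["D", "E"]}
toy_interactions = [("B", "A"), ("A", "D"), ("E", "D"), ("C", "F")]
toy_overlap = overlap_fraction(
    toy_interactions, pairs_implied_by_complexes(toy_complexes)
)
```

Dividing by the size of the interaction dataset rather than by the number of implied pairs matches the normalization stated in the text.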
Methods
Rosetta data contains genome-wide expression ratios for 300 stationary cell states, which
are derived from 280 gene deletions and the 20 drug interaction experiments.
relation $X_k = (x_k - \bar{x}) / \sigma_x$, where $\bar{x}$ denotes the average and $\sigma_x$ the standard deviation of the
values in x, and $X_k$ and $x_k$ are the kth components of their respective profiles.
Given a group of N genes we can compute the correlation coefficient matrix R, where
each element $R_{ij}$ of the matrix denotes the Pearson correlation coefficient between genes i
and j. We can then compute the average correlation coefficient $\bar{\rho}$ by averaging the
matrix elements (excluding the main diagonal). This statistic gives an idea of the overall
similarity of the expression profiles in a group of genes. Although there are $O(N^2)$
elements in R, the computation time for $\bar{\rho}$ can be kept proportional to N by using the
linearity of the correlation to calculate $\bar{\rho}$ as follows:

$$\bar{\rho} = \frac{1}{N^2 - N} \sum_{i \neq j} R_{ij} = \frac{1}{N^2 - N} \left( \frac{1}{M - 1} X_T \cdot X_T - N \right),$$

where $X_T = \sum_{n=1}^{N} X_n$ is the sum of all normalized expression profiles in the group of N genes
and M is the number of components in each profile.
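The shortcut can be checked numerically with a small sketch. It assumes that each profile is standardized with the sample standard deviation (divisor M - 1), so that the Pearson correlation of two profiles equals their dot product divided by M - 1; the function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def standardize(profile):
    """Z-score a profile using the sample standard deviation (divisor M - 1)."""
    m = len(profile)
    mean = sum(profile) / m
    sd = sqrt(sum((v - mean) ** 2 for v in profile) / (m - 1))
    if sd == 0:
        raise ValueError("a constant profile cannot be standardized")
    return [(v - mean) / sd for v in profile]


def average_correlation_fast(profiles):
    """Average off-diagonal correlation from the summed standardized profile."""
    n = len(profiles)
    m = len(profiles[0])
    standardized = [standardize(p) for p in profiles]
    total = [sum(column) for column in zip(*standardized)]  # X_T
    dot = sum(t * t for t in total)                         # X_T . X_T
    # Sum of all R_ij is dot / (M - 1); subtract the N diagonal ones.
    return (dot / (m - 1) - n) / (n * n - n)


def average_correlation_naive(profiles):
    """Reference implementation that visits every pair explicitly."""
    m = len(profiles[0])
    standardized = [standardize(p) for p in profiles]
    pair_values = [
        sum(a * b for a, b in zip(p, q)) / (m - 1)
        for p, q in combinations(standardized, 2)
    ]
    return sum(pair_values) / len(pair_values)


# Hypothetical expression profiles for four genes over four conditions.
toy_group = [
    [1.0, 2.0, 3.0, 5.0],
    [2.0, 1.0, 4.0, 3.0],
    [5.0, 3.0, 2.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
]
```

The fast version touches each profile once (cost proportional to N times M), whereas the naive version loops over all $(N^2 - N)/2$ pairs; both should agree up to rounding.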
$dP_i/dt \approx dP_j/dt$. Using a simple model for the relationship between mRNA and protein
concentrations, we can see how even under these ideal conditions similarity measures
based on the mRNA concentrations would deviate from perfect results. For instance, a
linear kinetic model for the protein concentration $P_i$ and the mRNA concentration $R_i$ of a
subunit i in a complex is given by:

$$\frac{dP_i}{dt} = k_{Ri} R_i - k_{Pi} P_i$$

where $k_{Ri}$ is an mRNA translation rate constant and $k_{Pi}$ is a protein degradation constant.
It is clear that only under the strong assumption that the two protein degradation
constants are equal ($k_{Pi} = k_{Pj}$) do we obtain

$$\frac{R_i(t)}{R_j(t)} = \frac{k_{Rj}}{k_{Ri}} = \mathrm{const}$$
Thus, the two mRNA expression levels are only expected to be equal if the ratios of the
rate constants for translation and degradation are the same for both proteins. This is not
necessarily the case for the subunits of a complex and therefore normalized differences
should not be expected to be zero.
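The consequence of this model for the normalized difference can be made concrete with a small sketch. It assumes equimolar subunits with equal protein degradation constants, so that the mRNA ratio is fixed by the two translation rate constants, and it again assumes the normalized difference has the form |R_i - R_j| / (R_i + R_j). The function names and rate values are hypothetical.

```python
def protein_rate(k_r, k_p, mrna, protein):
    """Right-hand side of the linear model: dP/dt = k_R * R - k_P * P."""
    return k_r * mrna - k_p * protein


def mrna_ratio(k_ri, k_rj):
    """R_i / R_j implied by equimolar subunits with equal degradation constants."""
    if k_ri <= 0 or k_rj <= 0:
        raise ValueError("translation rate constants must be positive")
    return k_rj / k_ri


def implied_normalized_difference(k_ri, k_rj):
    """|R_i - R_j| / (R_i + R_j) under the same assumptions.

    The absolute mRNA scale cancels, so only the ratio of the two
    translation rate constants matters.
    """
    ratio = mrna_ratio(k_ri, k_rj)
    return abs(ratio - 1.0) / (ratio + 1.0)


# Hypothetical rate constants: subunit j is translated three times faster.
example_difference = implied_normalized_difference(1.0, 3.0)
```

Even for a perfectly stoichiometric complex, a threefold difference in translation rate constants would in this sketch place the pair at the random-control median rather than at zero.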
It is clear that the arguments above are based on a variety of simplifying assumptions. In
reality, there are additional factors (such as the noise in the expression data, the stochastic
nature of gene expression) that add even more difficulty to the analysis of mRNA levels.
Figure Captions
Figure 1
Distributions of normalized differences for various groups of proteins in boxplot
representation. The normalized difference Dij is a measure of the relative similarity of
two absolute gene expression levels Ei and Ej. The middle panel shows the distribution
for two protein complexes (the large ribosomal subunit and the 20S proteasome). Note
that we considered all theoretically possible protein pairs within the protein complex (as
indicated in the schematic drawing above the panel). The right panel shows the
distribution for the aggregated datasets of protein-protein interactions (Y2H is yeast two-
hybrid) (Bader and Hogue 2000; Cagney et al. 2000; Fellenberg et al. 2000; Ito et al.
2000; Schwikowski et al. 2000; Uetz et al. 2000; Uetz and Hughes 2000; Xenarios 2000;
Ito et al. 2001). Unlike in the complexes, where we consider interactions among a whole
group of proteins, the interactions in the aggregated datasets are specific to individual
protein pairs (see schematic drawing). The left panel shows two control distributions of
the normalized difference, on the left for pairs of nuclear and cytoplasmic proteins --
which presumably, because of spatial separation, do not interact -- and on the right for
any random protein pair ("all transcripts") in yeast. The distribution of nuclear versus
cytoplasmic proteins is strongly skewed towards one (the maximum value of the
normalized difference), which is partially explained by the fact that cytoplasmic proteins
tend to have higher expression levels than nuclear ones (Drawid et al. 2000; Drawid and
Gerstein 2000). The distribution of all transcripts is nearly uniform (with a median of
0.5) -- see Methods. The distributions for the complexes are clearly skewed towards zero with
medians between 0.2 and 0.3. The medians of the distributions of the aggregated datasets
are still somewhat smaller than the control median, most notably for the physical
interactions dataset; on the other hand, there is virtually no difference between the control
and the distribution of the yeast two-hybrid dataset.
The aggregated data include some interactions implied by the complexes,
with the degree of intersection ranging from 35% for the physical interactions to
approximately 6% for Y2H.
Figure 2
Distributions of correlation coefficients between expression profiles. In part A we show
distributions of the average correlation $\bar{\rho}_N$ of N genes for the cell cycle experiments.
The gray curve in the background represents the case N = 2 (i.e., simply the distribution
of pair-wise correlations). In the case of N > 2, $\bar{\rho}_N$ is defined as the average of all
possible $(N^2 - N)/2$ pairwise correlations among the N genes. We show here, as examples,
the distributions for N = 3 and N = 5. The distributions obviously become narrower,
reflecting the fact that it becomes more unlikely to find large groups of strongly
correlated genes at random as N increases.
These distributions provide a suitable control for the observed correlations between pairs
of genes (N = 2) or for the average correlations among the subunits of a complex (N > 2).
This P-value then represents the chance that a group of N randomly selected genes could
exhibit an average correlation greater than or equal to that of a complex with N proteins
(see figure 3).
Parts B and C show the distribution of pair-wise correlations for both the cell cycle and
the Rosetta experiments in two protein complexes (the ribosome and the proteasome) as
well as for the aggregated datasets (genetic, physical and Y2H). The gray curves in the
background are the control distributions for N = 2 as explained above. The distributions
for the ribosome and the proteasome are strongly shifted to the right of the control; this
effect is much weaker for the datasets of aggregated interactions.
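The P-value described in this caption can be approximated by random sampling, as in the following sketch. The sampling scheme, the number of samples, the function names, and the synthetic profiles are assumptions made for illustration and are not taken from the Methods.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_correlation(profiles):
    """Average of all pair-wise correlations within a group of profiles."""
    values = [pearson(p, q) for p, q in combinations(profiles, 2)]
    return sum(values) / len(values)


def empirical_p_value(all_profiles, group_size, observed_average,
                      n_samples=1000, seed=0):
    """Fraction of random groups whose average correlation is at least
    as large as the observed average of a complex of the same size."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        group = rng.sample(all_profiles, group_size)
        if average_correlation(group) >= observed_average:
            hits += 1
    return hits / n_samples


# Synthetic stand-in for genome-wide profiles (40 genes, 8 conditions).
_rng = random.Random(1)
toy_genome = [[_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(40)]
toy_complex = toy_genome[:5]
toy_p = empirical_p_value(
    toy_genome, 5, average_correlation(toy_complex), n_samples=200
)
```

With a finite number of samples the smallest non-zero estimate is 1/n_samples, so very small P-values would need many more samples or a fitted control distribution.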
Figure 3
Part A consolidates various key statistics shown in figures 1 and 2 for the ribosome and
proteasome as well as for a large number of protein complexes. We list all protein
complexes from the MIPS catalog having at least 10 ORFs. The complexes are divided
into three classes: permanent, transient or "other" (see below). Some complexes can be
divided into smaller sub-complexes (e.g., the ribosomes) as indicated. The table lists
(from left to right) the average expression level of the complex, the median normalized
difference (see figure 1A), the average correlation for the cell cycle and Rosetta
experiments (see figure 2), the negative logarithm of the P-value of the average
correlations in both experiments (see figure 2), and the size of the complex in terms of
the number of ORFs.
In general, the P-values for the average correlations are very low for most of the
permanent protein complexes (accordingly, -log10(P) is very high), indicating that these
averages are significantly greater than for random groups of proteins of the same size.
The same cannot be observed for the transient protein complexes, for which the
correlation averages are usually much smaller.
The section "other" at the bottom of part A contains complexes that are either difficult to
classify as permanent/transient or for which, due to very small turnover rates, down-
regulations of mRNA levels take a very long time to affect protein abundance. The H+-
transporting ATPase can be thought of as containing a mixture of permanent and transient
components at the same time (Kane 2001). The nuclear pore complex (NPC) and the
TRAPP complex are known to have low turnover rates (Bucci and Wente 1997; Winey et
al. 1997; Sacher et al. 1998; Barrowman et al. 2000). The NPC has relatively small
average correlations, but this still yields P-values of $10^{-3}$ (cell cycle) and $<10^{-4}$ (Rosetta)
because the nuclear pore complex is a relatively large aggregation of proteins, and even
these weak average correlations are very unlikely to occur for random groups of proteins
of this size. The TRAPP protein complex, while existing throughout the cell cycle, has a
low turnover rate and as such its mRNA expression data would not be sufficient for our
analysis.
The RNA polymerase holoenzyme is composed of both permanent and transient
components. Note that the MIPS complexes catalog does not include the SWI/SNF
chromatin-remodeling complex and a subset of basal transcription factors (Wilson et al.
1996) as part of the holoenzyme, thus we list them separately here.
The list does not include those categories from the MIPS complexes catalog that do not
really represent protein complexes per se but rather aggregations of disparate proteins
that are involved in similar types of complex interactions, such as the "actin-associated"
and "tubulin-associated" protein groups.
Part B shows a graphical representation of part of the protein complex statistics from part
A. The abscissa and ordinate represent the average correlations in the cell cycle and the
Rosetta data, while the bubble sizes are a function of the normalized differences (larger
bubbles represent larger normalized differences). In general, the permanent complexes
tend to be located in the upper right region of the plot, whereas transient complexes are
closer to the random control in the lower left.
Figure 4
Part A of the figure shows a representation of the replication complex and its components
on the same coordinates as the protein complexes in figure 3B. The transient replication
complex can be decomposed into smaller complexes: the origin recognition complex, the
MCM proteins, and the DNA polymerases and . Whereas the whole replication
complex exhibits an average correlation close to zero (in both the cell cycle and the
Rosetta data), the four smaller complexes show greater correlations in the cell cycle
experiment. The four sub-complexes behave more like permanent complexes than the
replication complex as a whole.
Part B shows the correlation coefficient matrix for the subunits of the replication
complex derived from the cell cycle data. The upper triangle of the correlation matrix
shows the individual correlation coefficients for particular gene pairs (with darker colors
indicating higher correlations). The lower triangle shows the average correlations for
subgroups of proteins (representing the MCM proteins, the two DNA polymerases, and
the origin recognition complex) within the complex as a whole. The table on the
right side shows which genes belong to which subgroups in different colors. The genes
were ordered with unsupervised clustering (average linkage) without regard to their
classification according to the three subgroups. It can be seen that this order reflects the
separation according to the subgroups very well (only the proteins in the two DNA
polymerases cannot be separated into two groups). An exception is the CDC45 protein
that belongs to the MCM proteins but tends to cluster with the DNA polymerases.
References
Anderson, L. and J. Seilhamer. 1997. A comparison of selected mRNA and protein
abundances in human liver. Electrophoresis 18: 533-7.
Aparicio, O.M., D.M. Weinstein, and S.P. Bell. 1997. Components and dynamics of DNA
replication complexes in S. cerevisiae: redistribution of MCM proteins and
Cdc45p during S phase. Cell 91: 59-69.
Ashburner, M., C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis, K.
Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris, D.P. Hill, L. Issel-Tarver, A.
Kasarskis, S. Lewis, J.C. Matese, J.E. Richardson, M. Ringwald, G.M. Rubin, and
G. Sherlock. 2000. Gene ontology: tool for the unification of biology. The Gene
Ontology Consortium. Nat Genet 25: 25-9.
Bader, G.D. and C.W. Hogue. 2000. BIND--a data specification for storing and
describing biomolecular interactions, molecular complexes and pathways.
Bioinformatics 16: 465-77.
Barrowman, J., M. Sacher, and S. Ferro-Novick. 2000. TRAPP stably associates with the
Golgi and is required for vesicle docking. Embo J 19: 862-9.
Brown, M.P., W.N. Grundy, D. Lin, N. Cristianini, C.W. Sugnet, T.S. Furey, M. Ares, Jr.,
and D. Haussler. 2000. Knowledge-based analysis of microarray gene expression
data by using support vector machines. Proc Natl Acad Sci U S A 97: 262-7.
Bucci, M. and S.R. Wente. 1997. In vivo dynamics of nuclear pore complexes in yeast. J
Cell Biol 136: 1185-99.
Cagney, G., P. Uetz, and S. Fields. 2000. High-throughput screening for protein-protein
interactions using two-hybrid assay. Methods Enzymol 328: 3-14.
Califano, A., G. Stolovitzky, and Y. Tu. 2000. Analysis of gene expression microarrays
for phenotype classification. Proc Int Conf Intell Syst Mol Biol 8: 75-85.
Cho, R.J., M.J. Campbell, E.A. Winzeler, L. Steinmetz, A. Conway, L. Wodicka, T.G.
Wolfsberg, A.E. Gabrielian, D. Landsman, D.J. Lockhart, and R.W. Davis. 1998.
A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 2: 65-
73.
Christendat, D., A. Yee, A. Dharamsi, Y. Kluger, M. Gerstein, C.H. Arrowsmith, and
A.M. Edwards. 2000a. Structural proteomics: prospects for high throughput
sample preparation. Prog Biophys Mol Biol 73: 339-45.
Christendat, D., A. Yee, A. Dharamsi, Y. Kluger, A. Savchenko, J.R. Cort, V. Booth, C.D.
Mackereth, V. Saridakis, I. Ekiel, G. Kozlov, K.L. Maxwell, N. Wu, L.P.
McIntosh, K. Gehring, M.A. Kennedy, A.R. Davidson, E.F. Pai, M. Gerstein,
A.M. Edwards, and C.H. Arrowsmith. 2000b. Structural proteomics of an
archaeon. Nat Struct Biol 7: 903-9.
Coux, O., K. Tanaka, and A.L. Goldberg. 1996. Structure and functions of the 20S and
26S proteasomes. Annu Rev Biochem 65: 801-47.
D'haeseleer, P., X. Wen, S. Fuhrman, and R. Somogyi. 1997. Mining the gene expression
matrix: inferring gene relationships from large scale gene expression data.
Information processing in cells and tissues. In Plenum (ed. P. M. Holcombe, R),
pp. 203-212.
Drawid, A., R. Jansen, and M. Gerstein. 2000. Genome-wide analysis relating expression
level with protein subcellular localization. Trends in Genetics 16: 426-430.
Drawid, A. and M. Gerstein. 2000. A Bayesian system integrating expression data with
sequence patterns for localizing proteins: comprehensive application to the yeast
genome. J Mol Biol 301: 1059-75.
Eisenberg, D., E.M. Marcotte, I. Xenarios, and T.O. Yeates. 2000. Protein function in the
post-genomic era. Nature 405: 823-6.
Emili, A.Q. and G. Cagney. 2000. Large-scale functional analysis using peptide or protein
arrays. Nat Biotechnol 18: 393-7.
Fellenberg, M., K. Albermann, A. Zollner, H.W. Mewes, and J. Hani. 2000. Integrative
analysis of protein interaction data. Proc Int Conf Intell Syst Mol Biol 8: 152-61.
Futcher, B., G.I. Latter, P. Monardo, C.S. McLaughlin, and J.I. Garrels. 1999. A sampling
of the yeast proteome. Mol Cell Biol 19: 7357-68.
Gaasterland, T. and S. Bekiranov. 2000. Making the most of microarray data. Nat Genet
24: 204-6.
Gerstein, M. and R. Jansen. 2000. The current excitement in bioinformatics-analysis of
whole-genome expression data: how does it relate to protein structure and
function? Curr Opin Struct Biol 10: 574-84.
Greenbaum, D., R. Jansen and M. Gerstein. 2002. Analysis of mRNA expression and
protein abundance data: An approach for the comparison of the enrichment of
features in the cellular population of proteins and transcripts. Bioinformatics. (in
press).
Golub, T.R., D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller,
M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander.
1999. Molecular classification of cancer: class discovery and class prediction by
gene expression monitoring. Science 286: 531-7.
Gygi, S.P., Y. Rochon, B.R. Franza, and R. Aebersold. 1999. Correlation between protein
and mRNA abundance in yeast. Mol Cell Biol 19: 1720-30.
Heyer, L.J., S. Kruglyak, and S. Yooseph. 1999. Exploring expression data: identification
and analysis of coexpressed genes. Genome Res 9: 1106-15.
Hishigaki, H., K. Nakai, T. Ono, A. Tanigami, and T. Takagi. 2001. Assessment of
prediction accuracy of protein function from protein-protein interaction data.
Yeast 18: 523-31.
Hochstrasser, M. 2001. Personal Communication.
Holstege, F.C., E.G. Jennings, J.J. Wyrick, T.I. Lee, C.J. Hengartner, M.R. Green, T.R.
Golub, E.S. Lander, and R.A. Young. 1998. Dissecting the regulatory circuitry of
a eukaryotic genome. Cell 95: 717-28.
Hughes, T.R., M.J. Marton, A.R. Jones, C.J. Roberts, R. Stoughton, C.D. Armour, H.A.
Bennett, E. Coffey, H. Dai, Y.D. He, M.J. Kidd, A.M. King, M.R. Meyer, D.
Slade, P.Y. Lum, S.B. Stepaniants, D.D. Shoemaker, D. Gachotte, K.
Chakraburtty, J. Simon, M. Bard, and S.H. Friend. 2000. Functional discovery via
a compendium of expression profiles. Cell 102: 109-26.
Ito, T., T. Chiba, R. Ozawa, M. Yoshida, M. Hattori, and Y. Sakaki. 2001. A
comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc
Natl Acad Sci U S A 98: 4569-74.
Ito, T., K. Tashiro, S. Muta, R. Ozawa, T. Chiba, M. Nishizawa, K. Yamamoto, S. Kuhara,
and Y. Sakaki. 2000. Toward a protein-protein interaction map of the budding
yeast: A comprehensive system to examine two-hybrid interactions in all possible
combinations between the yeast proteins. Proc Natl Acad Sci U S A 97: 1143-7.
Jansen, R. and M. Gerstein. 2000. Analysis of the yeast transcriptome with structural and
functional categories: characterizing highly expressed proteins. Nucleic Acids Res
28: 1481-8.
Jelinsky, S.A. and L.D. Samson. 1999. Global response of Saccharomyces cerevisiae to
an alkylating agent. Proc Natl Acad Sci U S A 96: 1486-91.
Johannes, G., M.S. Carter, M.B. Eisen, P.O. Brown, and P. Sarnow. 1999. Identification
of eukaryotic mRNAs that are translated at reduced cap binding complex eIF4F
concentrations using a cDNA microarray. Proc Natl Acad Sci U S A 96: 13118-23.
Kane, P. 2001. Personal Communication.
Kruiswijk, T., R.J. Planta, and W.H. Mager. 1978. Quantitative analysis of the protein
composition of yeast ribosomes. Eur J Biochem 83: 245-52.
Li, B., C.R. Nierras, J.R. Warner. 1999. Mol Cell Biol 19: 5393-5404.
Lian, Z., L. Wang, S. Yamaga, W. Bonds, Y. Beazer-Barclay, Y. Kluger, M. Gerstein,
P.E. Newburger, N. Berliner, and S.M. Weissman. 2001. Genomic and proteomic
analysis of the myeloid differentiation program. Blood 98: 513-24.
Luscombe, N.M., R.A. Laskowski, D.R. Westhead, D. Milburn, S. Jones, M.
Karmirantzou, and J.M. Thornton. 1998. New tools and resources for analysing
protein structures and their interactions. Acta Crystallogr D Biol Crystallogr 54:
1132-8.
Mewes, H.W., D. Frishman, C. Gruber, B. Geier, D. Haase, A. Kaps, K. Lemcke, G.
Mannhaupt, F. Pfeiffer, C. Schuller, S. Stocker, and B. Weil. 2000. MIPS: a
database for genomes and protein sequences. Nucleic Acids Res 28: 37-40.
Nomura, M. Regulation of Ribosome Biosynthesis in Escherichia coli and
Saccharomyces cerevisiae: Diversity and Common Principles. Journal of
Bacteriology 181: 6857-6864.
Papa, F.R., A.Y. Amerik, and M. Hochstrasser. 1999. Interaction of the Doa4
deubiquitinating enzyme with the yeast 26S proteasome. Mol Biol Cell 10: 741-
56.
Papa, F.R. and M. Hochstrasser. 1993. The yeast DOA4 gene encodes a deubiquitinating
enzyme related to a product of the human tre-2 oncogene. Nature 366: 313-9.
Pitman, J. 1993. Probability. Springer-Verlag, New York.
Planta, R.J. 1997. Yeast 13: 1505-1518.
Qian, J., M. Dolled-Filhart, J. Lin, and M. Gerstein. 2002. Beyond synexpression
relationships: Clustering of time shifted and inverted gene expression profiles
identifies new biologically relevant interactions. J Mol Biol (in press).
Raychaudhuri, S., P.D. Sutphin, J.T. Chang, and R.B. Altman. 2001. Basic microarray
analysis: grouping and feature reduction. Trends Biotechnol 19: 189-93.
Roth, F.P., J.D. Hughes, P.W. Estep, and G.M. Church. 1998. Finding DNA regulatory
motifs within unaligned noncoding sequences clustered by whole-genome mRNA
quantitation. Nat Biotechnol 16: 939-45.
Sacher, M., Y. Jiang, J. Barrowman, A. Scarpa, J. Burston, L. Zhang, D. Schieltz, J.R.
Yates, 3rd, H. Abeliovich, and S. Ferro-Novick. 1998. TRAPP, a highly conserved
novel complex on the cis-Golgi that mediates vesicle docking and fusion. EMBO J
17: 2494-503.
Schadt, E.E., C. Li, C. Su, and W.H. Wong. 2000. Analyzing high-density oligonucleotide
gene expression array data. J Cell Biochem 80: 192-202.
Schwikowski, B., P. Uetz, and S. Fields. 2000. A network of protein-protein interactions
in yeast. Nat Biotechnol 18: 1257-61.
Subrahmanyam, Y.V., S. Yamaga, Y. Prashar, H.H. Lee, N.P. Hoe, Y. Kluger, M. Gerstein,
J.D. Goguen, P.E. Newburger, and S.M. Weissman. 2001. RNA expression
patterns change dramatically in human neutrophils exposed to bacteria. Blood 97:
2457-68.
Teichmann, S.A., A.G. Murzin, and C. Chothia. 2001. Determination of protein function,
evolution and interactions by structural genomics. Curr Opin Struct Biol 11: 354-
63.
Uetz, P., L. Giot, G. Cagney, T.A. Mansfield, R.S. Judson, J.R. Knight, D. Lockshon, V.
Narayan, M. Srinivasan, P. Pochart, A. Qureshi-Emili, Y. Li, B. Godwin, D.
Conover, T. Kalbfleisch, G. Vijayadamodar, M. Yang, M. Johnston, S. Fields, and
J.M. Rothberg. 2000. A comprehensive analysis of protein-protein interactions in
Saccharomyces cerevisiae. Nature 403: 623-7.
Uetz, P. and R.E. Hughes. 2000. Systematic and large-scale two-hybrid screens. Curr
Opin Microbiol 3: 303-8.
Velculescu, V.E., L. Zhang, W. Zhou, J. Vogelstein, M.A. Basrai, D.E. Bassett, Jr.,
P. Hieter, B. Vogelstein, and K.W. Kinzler. 1997. Characterization of the yeast
transcriptome. Cell 88: 243-251.
Walhout, A.J. and M. Vidal. 2001. High-throughput yeast two-hybrid assays for large-
scale protein interaction mapping. Methods 24: 297-306.
Wen, X., S. Fuhrman, G.S. Michaels, D.B. Carr, S. Smith, J.L. Barker, and R. Somogyi.
1998. Large-scale temporal gene expression mapping of central nervous system
development. Proc Natl Acad Sci U S A 95: 334-9.
Westhead, D.R., T.W. Slidel, T.P. Flores, and J.M. Thornton. 1999. Protein structural
topology: Automated analysis and diagrammatic representation. Protein Sci 8:
897-904.
Whitby, F.G., E.I. Masters, L. Kramer, J.R. Knowlton, Y. Yao, C.C. Wang, and C.P. Hill.
2000. Structural basis for the activation of 20S proteasomes by 11S regulators.
Nature 408: 115-20.
Wilkinson, C.R., M. Penney, G. McGurk, M. Wallace, and C. Gordon. 1999. The 26S
proteasome of the fission yeast Schizosaccharomyces pombe. Philos Trans R Soc
Lond B Biol Sci 354: 1523-32.
Wilson, C.J., D.M. Chao, A.N. Imbalzano, G.R. Schnitzler, R.E. Kingston, and R.A.
Young. 1996. RNA polymerase II holoenzyme contains SWI/SNF regulators
involved in chromatin remodeling. Cell 84: 235-44.
Winey, M., D. Yarar, T.H. Giddings, Jr., and D.N. Mastronarde. 1997. Nuclear pore
complex number and distribution throughout the Saccharomyces cerevisiae cell
cycle by three-dimensional reconstruction from electron micrographs of nuclear
envelopes. Mol Biol Cell 8: 2119-32.
Woolford, J.L. and J.R. Warner. 1991. In The Molecular and Cellular Biology of the
Yeast Saccharomyces: Genome Dynamics, Protein Synthesis, and Energetics (eds.
J.R. Broach, J.R. Pringle, and E.W. Jones), pp. 587-626. Cold Spring Harbor
Laboratory Press.
Xenarios, I., L. Salwinski, M.K. Baron, E.M. Marcotte, and D. Eisenberg. 2000. DIP:
the Database of Interacting Proteins. Nucleic Acids Res 28: 289-291.