
2008 - Assessing Conservation of Disordered Regions in Proteins


46 The Open Proteomics Journal, 2008, 1, 46-53

Open Access
Assessing Conservation of Disordered Regions in Proteins

Ágnes Tóth-Petróczy¹, Bálint Mészáros¹, István Simon¹, A. Keith Dunker², Vladimir N. Uversky²,³,⁴,* and Monika Fuxreiter¹,*

¹Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, H-1518 Budapest, Hungary
²Department of Biochemistry and Molecular Biology, Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, IN 46202-2111 Indianapolis, USA
³Institute for Biological Instrumentation, Russian Academy of Sciences, 142290 Pushchino, Moscow Region, Russia
⁴Institute for Intrinsically Disordered Protein Research, Indiana University School of Medicine, IN 46202-2111 Indianapolis, USA

Abstract: Intrinsically disordered regions (IDRs) are highly abundant in eukaryotic proteomes and serve pivotal, mostly regulatory functions. Many IDRs appear to be functionally conserved, and analysis of protein domains indicates a high propensity of conserved regions to be predicted as disordered. Nevertheless, it is difficult to assess the conservation of IDRs in general because of their fast evolution and low sequence similarity. We propose three measures to evaluate conservation of IDRs: i) similarity of the disorder profiles obtained under different prediction conditions; ii) conservation of amino acids with propensities for promoting either disorder or order; and iii) the overlap between disordered/ordered regions. These measures are computed on multiple sequence alignments that also include low-complexity regions of proteins. Using three subunits of the Mediator complex of transcription regulation from Homo sapiens and Drosophila melanogaster as an example, we show that despite their sequence dissimilarity IDRs can be conserved and likely carry out the same function in different organisms.

*Address correspondence to these authors U.V.N. at the Department of Biochemistry and Molecular Biology, Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, IN 46202-2111 Indianapolis, USA; E-mail: [email protected] and M.F. at the Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, H-1518 Budapest, Hungary; E-mail: [email protected]

1875-0397/08 2008 Bentham Open

INTRODUCTION

The wealth of recent experimental and theoretical evidence indicates that proteins or protein segments may exist as a rapidly fluctuating ensemble of conformations both in vitro and in vivo [1, 2]. These intrinsically disordered proteins (IDPs) or proteins with long intrinsically disordered regions (IDRs; >30 aa) can adopt a continuum of structural states such as completely unstructured, molten globules or locally disordered tails and linkers [3, 4]. The variety of disordered states can be beneficial, even prerequisite, for various biological roles [5-9].

IDRs can act as entropic chains (linkers, clocks, bristles); for example, the Nup2p FG repeat region of the nuclear pore complex is responsible for regulation of gating [10]. IDRs often serve as target sites for post-translational modifications (display sites), such as the KID domain of CREB, whose phosphorylation induces its binding to the KIX domain of CBP [11]. Binding of IDPs can also modulate the effect of the partner (effectors). For example, p27Kip1 regulates the cell cycle by binding to cyclin-dependent kinases and inhibiting their activity [12]. Intriguingly, the malleability of IDPs enables binding in different conformations, leading to unrelated, even opposite functions [13]. Activation and inhibition of the ryanodine receptor can result from the binding of the disordered C fragment of the dihydropyridine receptor (DHPR) in two conformations [14]. IDPs frequently participate in folding of proteins (like heat-shock proteins, Hsps [15]) or RNA, partly by unfolding the incorrect structures and facilitating formation of new contacts (chaperones) [16]. Formation of the scrapie form of prions is also critically dependent on the intermediate disordered state [17]. Large multiprotein complexes also take advantage of IDPs that assist assembly of these organizations (assemblers). The RNA polymerase II disordered C-terminal domain provides a platform for the mRNA processing machinery [18]. Alternatively, IDPs can capture and store small ligands (scavengers). This underlies the response to dehydration stress in plants achieved by water retention by Desiccation stress protein (Dsp) 16 [19].

IDPs or proteins with long IDRs (> 30 aa) are highly abundant in eukaryotic proteomes [20, 21] and are often associated with regulatory functions such as signal transduction or transcription [5]. Analysis of several sets of proteins related to various diseases revealed that IDRs are highly abundant in proteins associated with cancer [22], cardiovascular disease [23], Parkinson's disease and other synucleinopathies, Alzheimer's, prion diseases and diabetes. Additional confirmation of the high prevalence of intrinsically disordered proteins in human diseases came from the functional annotation over the entire Swiss Protein database from a structured-versus-disordered point of view [24]. Thus, intrinsic disorder is very common in disease-associated proteins, giving rise to the disorder in disorders concept, which we are calling the "D2 concept" [25].

In spite of their biological importance, it is very difficult to assess the conservation of IDRs based on simple sequence comparisons. IDRs in general are located in low-complexity regions, which are depleted in aliphatic (Ile, Leu, and Val) and aromatic amino acid residues (Trp, Tyr, and Phe) [26], which hampers formation of hydrophobic cores that promote folding of globular structures. Instead, they are enriched in charged and polar amino acid residues: Arg, Gln, Ser, Glu,

Lys and structure breaking residues (Gly, Pro) designated as disorder-promoting residues [27]. Furthermore, as has been shown for 26 protein families, intrinsically disordered regions evolve faster (KA/KS = 0.4–0.8) than globular protein domains (KA/KS = 0.1–0.2) [28]. Nevertheless, in multiple sequence alignments that underlie identification of domains a high percentage of positions with conserved predicted disorder was found [29, 30]. These results suggest conservation of IDRs even in the absence of apparent sequence similarity. This calls for a novel measure that can be used to assess conservation of IDRs.

Recently we have carried out a bioinformatics analysis on proteins of the Mediator complex of transcription regulation (Fig. 1) (Tóth-Petróczy et al., submitted). These proteins play a role in transmitting regulatory information from activators/repressors to the basal transcription machinery [31]. They exhibit low sequence conservation and lack globular domains that are usually present in transcriptional proteins. Using two independent predictors, IUPred [32] and PONDR-VSL [33], we have shown the abundance of disordered regions in the Mediator proteins, especially in those that participate in regulatory signal transfer. We have also found that in spite of the low sequence conservation of IDRs in Mediator proteins, they exhibit fairly similar location and distribution in different organisms. Motivated by these observations we propose to assess conservation of intrinsically disordered regions based on i) similarity of the IDRs predicted in different conditions; ii) conservation of propensities of amino acid residues promoting order and disorder; and iii) an overlap between ordered/disordered regions. These measures reflect different aspects of intrinsic disorder/order properties of a given system and their combination provides comprehensive characteristics. We demonstrate the application of these measures on Med4, Med9 and Med12 proteins of the Mediator complex of transcription regulation to assess the structural and thus the possible functional conservation of IDRs.

Fig. (1). The Mediator of transcription regulation. The Tail (yellow) interacts with activators bound at enhancers, the Middle (green) transmits regulatory signals, while the Head (orange) interacts with RNA polymerase II. The Cdk module (blue) usually dissociates prior to initiation of transcription. The analyzed subunits: Med4 and Med9 of the Middle and Med12 of Cdk are labelled individually.

ALIGNMENT OF PROTEINS WITH INTRINSICALLY DISORDERED REGIONS

We designed an iterative, PSI-BLAST-based [34] alignment scheme to align full sequences of Mediator proteins containing low-complexity regions. First we generated profiles based on groups of sequences that fulfilled the E < 10⁻⁵ threshold (> 30 bits). Then, the PSI-BLAST search is repeated using these profiles and the resulting groups are compared to the previously identified ones, from which the profiles have been extracted. If the two PSI-BLAST searches result in identical groups, a multiple sequence alignment can be carried out. Otherwise, the PSI-BLAST searches are continued with profiles updated in each cycle until the sequences in the resulting sequence groups converge. These sequences are subjected to a multiple sequence alignment using the profile that has been extracted from the last PSI-BLAST run. These multiple sequence alignments were performed using the CLUSTALW algorithm [35].

The alignments for the Med4 and Med9 subunits of the Middle module and for the Med12 of the Cdk module of the Mediator complex from Drosophila melanogaster and Homo sapiens generated by the iterative alignment algorithm are shown in Fig. (2). In the case of Med4 and Med9 these alignments show a fair agreement with previous results [36], as indicated by the superposition between the marked ordered segments. Advantages of using an iterative algorithm became apparent when more sequences were considered. For example, in the case of Med9, sequences from Saccharomyces cerevisiae (Sc), Schizosaccharomyces pombe (Sp), Caenorhabditis elegans (Ce), Drosophila melanogaster (Dm), Homo sapiens (Hs) were found to be homologous based on ordered motifs [36], while by our alignment only Ce, Dm, Hs sequences could be aligned (Tóth-Petróczy et al., submitted). A higher value of sequence conservation for the Ce, Dm, Hs organisms can also be obtained on the alignment generated by the iterative algorithm as compared to the alignment based on ordered motifs.

Fig. (2). Alignments of sequences of A) Med4, B) Med9 and C) Med12 proteins from Homo sapiens (Hs) and Drosophila melanogaster (Dm) generated by the PSI-BLAST based iterative algorithm. Intrinsically disordered regions predicted by the IUPred algorithm using 25- and 100-residue windows are shown by green and brown bars above the sequence. Consensus predictions are marked on the corresponding sequences by yellow. Conserved ordered regions predicted by previous alignments are shown by black boxes. Groups of similar amino acid residues are shown by cyan (K/R/H), green (A/S/T), blue (I/L/V/M/C/F/Y/W), magenta (G/P), red (E/D/N/Q) [36]. Plots were generated by the Alscript program [48].

ASSIGNMENT OF INTRINSICALLY DISORDERED REGIONS

Intrinsically disordered regions were predicted based on the unfavorable contact energies according to the IUPred algorithm (http://iupred.enzim.hu), which exploits the principle that IDPs cannot fold as their amino acids cannot form sufficient inter-residue interactions to overcome the entropic penalty of folding [32, 37]. The overall interaction energy is estimated by pairwise inter-residue potentials, approximated by a low-resolution force field. In general, we distinguish two types of disorder: short and long disorder. Short disorder is associated with loops that are missing from crystal structures and usually are 5-12 residues in length. In contrast, long disordered regions can span several hundred residues in length, and may be functional on their own, even separated from the rest of the sequence. The sequence window along which the inter-residue potentials are computed can discriminate between these two types of disorder. In general, long disordered regions are identified using a 100-residue window, while for short disorder a 25-residue window is used. Changes in the predicted disorder profile upon altering the sequence window size can discriminate far-lying and proximal sequence effects and may provide a general, inherent feature of the sequence. In Fig. (2A and B) we indicated the predicted short and long disorder (predicted with 25- and 100-residue windows, respectively) for human and Drosophila Med4 and Med9 proteins. Disordered regions (IDRs) were defined as continuous segments of residues with score

> 0.5 with a minimum length of 5 residues. For disordered segments longer than 10 residues, an ordered gap of 3 residues was allowed. Interestingly, using a larger window in disorder prediction does not necessarily increase the size of the predicted IDRs. In the case of Med4, long disorder prediction identifies shorter IDRs in both organisms than short disorder prediction, while in Med9 the opposite trend is observed (longer disordered segments were predicted in both sequences using a 100-residue window). Overall, the propensities of residues belonging to disordered segments vary with the size of the sequence window, as shown in Table 1 (for 25- and 100-residue windows, respectively). The trends are rather similar for the same protein from different organisms, as they mostly depend on the preference for a certain type of disorder. This observation suggests that variation in disordered segments upon altering the sequence context (over which the prediction was performed) is an inherent property of the overall sequence.

Based on the propensities of residues predicted to be disordered using different windows for prediction, Med9 can be considered a disordered protein and Med12 an ordered protein that contains some long IDRs. The level of disorder in Med4 is between that of Med9 and Med12, with an ordered N-terminal and a disordered C-terminal region (Fig. 2C).

Table 1. Propensities of Amino Acid Residues in Intrinsically Disordered Segments of Med4, Med9 and Med12 Proteins from Homo Sapiens (Hs) and Drosophila Melanogaster (Dm). The Disordered Regions were Predicted by the IUPred Algorithm [32], with 100 Residue and 25 Residue Windows, Respectively. O/D Ratio Designates Ratio of the Propensities of Disorder- and Order-Promoting Amino Acid Residues in Exclusively Disordered Regions

              Disorder (w=100)   Disorder (w=25)   O/D Ratio
Med4 Hs       0.44               0.47              0.43
Med4 Dm       0.35               0.35              0.42
Med9 Hs       0.57               0.48              0.25
Med9 Dm       0.66               0.63              0.31
Med12 Hs      0.32               0.37              0.30
Med12 Dm      0.40               0.43              0.32

SEQUENCE CONSERVATION OF INTRINSICALLY DISORDERED REGIONS

Sequence conservation of the human and drosophila Med4, Med9 and Med12 proteins was computed separately for disordered and ordered regions, over individual amino acid residues (Table 2A). As expected, the sequence similarity of disordered regions is considerably lower than that of the ordered regions. This holds especially for Med9 and Med12, which contain IDRs over 400 residues long. Please note that since conservation in disordered and ordered regions refers only to overlapping segments (cf. Fig. 4), the total conservation is not an average of the conservations obtained separately for these two regions. Conservation of the groups of similar amino acid residues (defined as R/K/H, A/S/T, I/L/V/M/C/F/Y/W, G/P and E/D/N/Q [36]) exhibits higher values, in agreement with the low complexity of IDR sequences (Table 2B).

Table 2. Conservation of A) Individual Amino Acid Residues (AA_CONS) and B) Groups of Similar Amino Acid Residues (GAA_CONS; Defined as R/K/H, A/S/T, I/L/V/M/C/F/Y/W, G/P and E/D/N/Q [36]) in the Disordered Segments (DIS), Ordered Segments (ORD) and in the Whole Med4, Med9 and Med12 Proteins (TOT) in Homo Sapiens and Drosophila Melanogaster. For Disordered and Ordered Segments Only the Overlapping Regions were Considered, Whereas for the Total Conservations Only the Gaps have been Excluded. The Conservation Scores have been Computed by Using a Simple Sum-of-Pairs Formula on the Alignment Generated by the Iterative Algorithm Described in the Text

A) AA_CONS    DIS      ORD      TOT
Med4          36.90    42.07    42.26
Med9          13.9     51.36    26.99
Med12         14.50    43.00    33.43

B) GAA_CONS   DIS      ORD      TOT
Med4          58.33    62.07    61.92
Med9          27.19    69.55    42.05
Med12         31.19    63.96    52.37

CONSERVATION OF AMINO ACID COMPOSITION OF INTRINSICALLY DISORDERED REGIONS

The amino acid compositions of disordered regions in Med4, Med9 and Med12 proteins, as compared to an average composition of globular proteins [6], are shown in Fig. (3). All proteins are enriched in Arg, Gly, Gln, Ser, Pro, Glu, and Lys residues that are generally abundant in IDPs [27] (referred to as disorder-promoting residues) and are depleted in hydrophobic amino acids (Ile, Leu, Val, Trp, Tyr, and Phe) (referred to as order-promoting residues) in a similar manner [26] as inferred from the analysis of the DisProt database [38]. Although compositions in human and drosophila proteins are biased for intrinsic disorder, remarkable deviations can be observed (e.g. in propensities of Gly in Med9 and Pro in Med4 that show an opposite deviation from the composition of globular proteins), and also in the composition of charged residues (K, E). Compositions of order-promoting and disorder-promoting residues in disordered regions, however, are considerably more stable, as shown in column three of Table 1 by the ratios of the percentages of the two types of residues

(referred to as the O/D ratio). The O/D ratios exhibit good agreement between the two organisms and, in accord with the propensities of residues involved in disordered segments, the O/D ratios reflect a similar level of disorder of the protein in different organisms. These results are further corroborated by previous studies on Med4 from 15 organisms that exhibit negligible variation in the composition of disorder- and order-promoting amino acid residues (Tóth-Petróczy et al., submitted).

SEGMENTAL OVERLAP BETWEEN INTRINSICALLY DISORDERED REGIONS

In spite of their low sequence conservation, IDRs align well in human and drosophila Med4, Med9 and Med12 sequences, even by visual inspection (cf. yellow regions in Fig. 2). To assess the similarity of the IDR patterns, we quantified the overlap between ordered/disordered regions predicted in different sequences. Multiple alignments generated by the ClustalW algorithm [35] were converted into a binary code with two states (D and O, for disordered and ordered regions, respectively) defined based on the IUPred predictions. Gaps were excluded, as they reflect variations in the size of a given disordered/ordered segment. Estimating the disorder/order properties in gapped regions of the alignment is in progress in our laboratory. Similarly to the assessment of the quality of secondary structure predictions, we compared the overlap between residues predicted to be ordered/disordered in different sequence pairs.
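
The conversion of an alignment into D/O states can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names `states_from_scores` and `aligned_states` are hypothetical, the per-residue scores are assumed to be IUPred-like values between 0 and 1, and only the score cutoff (> 0.5) and the minimum segment length (5 residues) are applied; the bridging of short ordered gaps is omitted for brevity.

```python
def states_from_scores(scores, threshold=0.5, min_len=5):
    """Turn per-residue disorder scores into a string of 'D'/'O' states.

    Assumption: a residue is disordered when its score exceeds `threshold`,
    and disordered runs shorter than `min_len` are relabelled as ordered.
    """
    raw = ['D' if s > threshold else 'O' for s in scores]
    states = raw[:]
    i = 0
    while i < len(raw):
        j = i
        while j < len(raw) and raw[j] == raw[i]:
            j += 1  # j ends one past the current run of identical states
        if raw[i] == 'D' and j - i < min_len:
            states[i:j] = ['O'] * (j - i)
        i = j
    return ''.join(states)


def aligned_states(aln_a, aln_b, states_a, states_b, gap='-'):
    """Project per-residue states onto the gap-free columns of a pairwise alignment.

    `aln_a` and `aln_b` are aligned sequences of equal length; `states_a` and
    `states_b` hold one state per ungapped residue. Columns with a gap in
    either sequence are skipped.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    ia = ib = 0
    out_a, out_b = [], []
    for ca, cb in zip(aln_a, aln_b):
        if ca != gap and cb != gap:
            out_a.append(states_a[ia])
            out_b.append(states_b[ib])
        if ca != gap:
            ia += 1
        if cb != gap:
            ib += 1
    return ''.join(out_a), ''.join(out_b)
```

The two strings returned by `aligned_states` have equal length and can be compared position by position.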
The accuracy matrix M was built from the number of
residues that were predicted to conform to identical disor-
dered (D) or ordered (O) states. The two-state overall accu-
racy is defined as [39]:

$$Q_2 = \frac{100}{N} \sum_{i=1}^{2} M_{ii}, \qquad i \in \{D, O\} \tag{1}$$
where N is the total number of residues and i runs over the
two conformational states.
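
As an illustration of eq. 1, the following Python sketch builds the 2x2 accuracy matrix from two equal-length state strings and returns Q2. The function names `accuracy_matrix` and `q2` are hypothetical, and the inputs are assumed to contain only the characters D and O, with gapped columns already removed.

```python
def accuracy_matrix(states_a, states_b):
    """Count residue pairs by (state in sequence A, state in sequence B)."""
    if len(states_a) != len(states_b):
        raise ValueError("state strings must have equal length")
    matrix = {(a, b): 0 for a in "DO" for b in "DO"}
    for a, b in zip(states_a, states_b):
        matrix[(a, b)] += 1
    return matrix


def q2(states_a, states_b):
    """Two-state overall accuracy: percentage of positions in identical states."""
    matrix = accuracy_matrix(states_a, states_b)
    n = len(states_a)
    if n == 0:
        raise ValueError("no residues to compare")
    # Sum of the diagonal elements M_DD and M_OO, normalised by N.
    return 100.0 * (matrix[("D", "D")] + matrix[("O", "O")]) / n
```

For example, `q2("DDOO", "DOOO")` compares four positions, three of which agree, and therefore gives 75.0.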
In addition to the per-residue based evaluation the actual
overlap between patterns of disordered and ordered segments
can be computed using the so-called segmental overlap
(SOV) measure [40]. The advantage of using SOV is that it
effectively captures the segmental characteristics of the se-
quence that is schematically illustrated in Fig. (4). For M
conformational states, SOV is defined as:
$$\mathrm{SOV} = \frac{100}{N} \sum_{i=1}^{M} \sum_{S_i} \frac{\mathrm{minov}(S_1; S_2) + \delta(S_1; S_2)}{\mathrm{maxov}(S_1; S_2)} \, \mathrm{len}(S_1) \tag{2}$$

where S1 and S2 stand for segments in two distinct sequences, respectively, minov(S1; S2) is the length of the overlap between S1 and S2, maxov(S1; S2) is the total extent of S1 and S2 in the given conformational state and len(S1) is the length of the segment in the reference sequence. δ(S1; S2) is the minimum of [maxov(S1; S2) - minov(S1; S2); minov(S1; S2); int(len(S1)/2); int(len(S2)/2)]. The normalization factor N is given by the number of residues in conformational state i and the summation over i runs over all M conformational states. Note that computing SOV separately for disordered and ordered segments is also meaningful.
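
The segmental overlap can be sketched in Python as follows. This is a hedged illustration rather than the code used here: the names `segments` and `sov` are hypothetical, the inputs are equal-length D/O strings with gaps removed, and the normalization follows one common reading of the SOV definition [40], in which N sums len(S1) over every overlapping segment pair plus the lengths of reference segments that have no overlapping partner.

```python
from itertools import groupby


def segments(states):
    """Split a state string into (state, start, end) runs, end exclusive."""
    segs, pos = [], 0
    for state, run in groupby(states):
        n = len(list(run))
        segs.append((state, pos, pos + n))
        pos += n
    return segs


def sov(ref, other, alphabet="DO"):
    """Segmental overlap (in percent) of `other` against the reference `ref`."""
    if len(ref) != len(other):
        raise ValueError("state strings must have equal length")
    ref_segs, oth_segs = segments(ref), segments(other)
    total, norm = 0.0, 0
    for state in alphabet:
        for _, a0, a1 in (s for s in ref_segs if s[0] == state):
            len1 = a1 - a0
            # Segments of the same state in the other sequence that overlap S1.
            partners = [(b0, b1) for st, b0, b1 in oth_segs
                        if st == state and b0 < a1 and a0 < b1]
            if not partners:
                norm += len1  # unmatched reference segment only adds to N
                continue
            for b0, b1 in partners:
                len2 = b1 - b0
                minov = min(a1, b1) - max(a0, b0)
                maxov = max(a1, b1) - min(a0, b0)
                delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
                total += (minov + delta) / maxov * len1
                norm += len1
    return 100.0 * total / norm if norm else 0.0
```

Passing `alphabet="D"` or `alphabet="O"` restricts the calculation to disordered or ordered segments alone. In this sketch a one-residue shift of a segment boundary, such as `sov("D" * 10 + "O" * 10, "D" * 9 + "O" * 11)`, is absorbed by the δ term.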

Fig. (3). Amino acid compositions in disordered segments of A) Med4, B) Med9 and C) Med12 from Homo sapiens (red) and Drosophila melanogaster (green), relative to the set of globular proteins. Composition of IDPs of the DisProt database [38] is shown by blue bars. The amino acids are arranged from left to right in order of their increasing propensity to promote disorder.

Fig. (4). Schematic representation of the segmental overlap analysis. Disordered regions are colored yellow, ordered regions white, and gaps black. The actual overlap is marked by dashed boxes.

The overall accuracy values obtained for Med4, Med9 and Med12 proteins are shown in Table 3 for the full sequences as well as separately for disordered and ordered regions. The high Q values indicate that more than 70% of the residues in total belong to identical regions in the human and drosophila sequences of the same protein. In the more ordered Med4 and Med12 proteins the match between disordered residues is lower than that between ordered residues, while in Med9 the opposite behavior is observed, likely due to the abundance of disordered regions. Please note that by definition, the total Q is not an average of the values obtained in disordered and ordered regions separately (cf. Fig. 4). The SOV values are expected to be smaller than the accuracy measures due to the variations in the length of IDRs. This is indeed the case for Med9 and Med12, where the predicted IDR in the drosophila sequence is significantly longer than in human. For Med4, the SOV value is 100%, reflecting a very small deviation between the predicted regions, which is compensated by the δ term in eq. 2. The statistical significance of the accuracy measures and segmental overlap values has been assessed by comparing them to the corresponding Q and SOV values obtained on shuffled sequences (randomizing the sequence 50 times). The resulting RQ and RSOV values are considerably lower than the Q and SOV values obtained on the actual sequences, corroborating the significance of the results. Differences between the SOV and RSOV values are lower than those between the Q and RQ values, reflecting that disordered regions are primarily determined by the amino acid composition rather than the actual sequence.

Table 3. Overlap between Ordered (ORD) and Disordered (DIS) Regions in Med4, Med9 and Med12 Drosophila and Human Proteins. For Disordered and Ordered Segments Only the Overlapping Regions were Considered, Whereas for the Full Proteins (TOT) Only the Gaps have been Excluded. A) Q is the Overall Two-State Accuracy Computed Based on the Number of Residues in Identical Ordered or Disordered States. B) SOV is the Segmental Overlap [40] that is Obtained Based on the Actual Correspondence between the Disordered/Ordered Segments in Different Sequences. The Reference Values RQ and RSOV were Obtained by Shuffling the Sequences (50 Times). For the Total Conservations Only the Gaps have been Excluded (so it is not an Average of the Q and SOV Values Obtained on Disordered and Ordered Segments, Separately)

A)        Q                          RQ
          DIS      ORD      TOT      DIS      ORD      TOT
Med4      94.7     96.8     95.8     43.5     57.1     43.5
Med9      79.8     72.9     75.5     70.9     28.8     54.8
Med12     79.1     89.9     86.4     34.9     66.5     55.8

B)        SOV                        RSOV
          DIS      ORD      TOT      DIS      ORD      TOT
Med4      100.0    100.0    100.0    32.2     29.9     29.9
Med9      55.1     52.4     54.2     33.5     17.9     29.1
Med12     44.4     52.3     49.4     16.8     28.7     25.1

DISCUSSION

Proteins with intrinsically disordered regions are ubiquitous in eukaryotic proteomes: almost 33% of proteins have long (>30 aa) disordered regions [20, 21]. Depending on their actual amino acid compositions, IDRs may conform to different categories of structural disorder such as a conformational ensemble, a pre-molten or molten globule or a disordered tail or linker of an otherwise globular protein [3, 4, 8, 41]. These properties are generally determined by the amino acid composition rather than the actual sequence, as also indicated by their separation in the charge-hydropathy space [3]. Hence, despite their low sequence conservation (as shown in Table 2 for Med4, Med9 and Med12 proteins) IDRs can have similar structural characteristics and thus can carry out homologous functions in different organisms.

Thus, even in the absence of apparent sequence homology, functional information can be inferred from conserved IDRs. Therefore, instead of the conventional sequence conservation we propose three measures to assess conservation of IDRs in different organisms: i) the similarities between disorder patterns using different prediction conditions, ii) the conservation of the propensities of disorder- and order-promoting residues and iii) the overlap between ordered/disordered patterns.

We demonstrated the application of these measures for three Mediator proteins. All were shown to contain long IDRs that in the case of Med9 and Med12 span several hundred residues. Studies on these three proteins illustrate that despite sequence dissimilarities, IDRs in Mediator proteins can be aligned well. A similar level of disorder in different organisms was evidenced by the similar disorder/order-promoting amino acid ratios. The agreement between patterns of disordered/ordered regions was quantified using the segmental overlap measure (SOV) adopted for IDRs. High values of SOV and overall accuracy (Q) and their significant deviation from the corresponding measures obtained on shuffled sequences corroborate the conservation of IDRs in Med4, Med9 and Med12 proteins.

All three proteins are involved in complex regulatory pathways of the Mediator complex. Med4 and Med9 belong to the Middle module of the Mediator complex, which transmits regulatory signals for transcription from the Tail module to the Head [42], which in turn interacts directly with the RNA polymerase II-TFIIF for pre-initiation complex formation [43]. The Middle module also receives repression signals from the CDK module, which dissociates prior to transcription [44]. Med9 was shown to physically and genetically interact with Cdk8 and CycC of the CDK module. Based on the abundance and conservation of IDRs in Med9 we propose that these IDRs play a critical role in mediating these interactions and induce large-scale conformational rearrangements of the complex that accompany transcription [45]. The conserved IDR in Med4 contains a phosphorylation site (T237 in yeast) that plays a role in enhancement of

CTD phosphorylation by TFIIH [46]. Med12 belongs to the CDK module, which can inhibit, but also activate transcription [47]. Such complex functioning with opposite outcomes, often termed moonlighting, is also facilitated by the malleability of IDRs [13]. Thus it is likely that such complex functioning of Med12 is also linked to the conserved, long IDRs in this protein.

In conclusion, we find that important functional information can be inferred from identifying conserved IDRs. As sequence similarities of IDRs are generally low, we propose to apply alternative measures such as disorder pattern similarity and segmental overlap between disordered regions to evaluate the conservation of these regions.

ACKNOWLEDGEMENTS

This work was supported in part by the grants MCRTN 2005-09566 of European FP6 (to M.F.), R01 LM007688-01A1 (to A.K.D. and V.N.U.) and GM071714-01A2 (to A.K.D. and V.N.U.) from the National Institutes of Health and the Bolyai fellowship (M.F.). We gratefully acknowledge the support of the IUPUI Signature Centers Initiative.

REFERENCES

[1] Romero, P.; Obradovic, Z.; Kissinger, C.R.; Villafranca, J.E.; Garner, E.; Guillot, S.; Dunker, A.K. Pac. Symp. Biocomputing, 1998, 3437.
[2] Wright, P.E.; Dyson, H.J. J. Mol. Biol., 1999, 2932, 321.
[3] Uversky, V.N.; Gillespie, J.R.; Fink, A.L. Proteins, 2000, 413, 415.
[4] Dyson, H.J.; Wright, P.E. Nat. Rev. Mol. Cell Biol., 2005, 63, 197.
[5] Xie, H.; Vucetic, S.; Iakoucheva, L.M.; Oldfield, C.J.; Dunker, A.K.; Uversky, V.N.; Obradovic, Z. J. Proteome Res., 2007, 65, 1882.
[6] Tompa, P. Trends Biochem. Sci., 2002, 2710, 527.
[7] Tompa, P. FEBS Lett., 2005, 57915, 3346.
[8] Dunker, A.K.; Obradovic, Z. Nat Biotechnol., 2001, 199, 805.
[9] Dunker, A.K.; Brown, C.J.; Lawson, J.D.; Iakoucheva, L.M.; Obradovic, Z. Biochemistry, 2002, 4121, 6573.
[10] Denning, D.P.; Patel, S.S.; Uversky, V.N.; Fink, A.L.; Rexach, M. Proc. Natl. Acad. Sci. U. S. A., 2003, 1005, 2450.
[11] Parker, D.; Ferreri, K.; Nakajima, T.; LaMorte, V.J.; Evans, R.; Koerber, S.C.; Hoeger, C.; Montminy, M.R. Mol. Cell Biol., 1996, 162, 694.
[12] Kriwacki, R.W.; Hengst, L.; Tennant, L.; Reed, S.I.; Wright, P.E. Proc. Natl. Acad. Sci. U. S. A., 1996, 9321, 11504.
[13] Tompa, P.; Szasz, C.; Buday, L. Trends Biochem. Sci., 2005, 309, 484.
[14] Haarmann, C.S.; Green, D.; Casarotto, M.G.; Laver, D.R.; Dulhunty, A.F. Biochem. J., 2003, 372, 305.
[15] van Montfort, R.L.; Basha, E.; Friedrich, K.L.; Slingsby, C.; Vierling, E. Nat. Struct. Biol., 2001, 812, 1025.
[16] Tompa, P.; Csermely, P. FASEB J., 2004, 1811, 1169.
[17] Pierce, M.M.; Baxa, U.; Steven, A.C.; Bax, A.; Wickner, R.B. Biochemistry, 2005, 441, 321.
[18] Proudfoot, N.J.; Furger, A.; Dye, M.J. Cell, 2002, 1084, 501.
[19] Chakrabortee, S.; Boschetti, C.; Walton, L.J.; Sarkar, S.; Rubinsztein, D.C.; Tunnacliffe, A. Proc. Natl. Acad. Sci. U. S. A., 2007, 10446, 18073.
[20] Dunker, A.K.; Obradovic, Z.; Romero, P.; Garner, E.C.; Brown, C.J. Genome Inform. Ser. Workshop Genome Inform., 2000, 11161.
[21] Tompa, P.; Dosztanyi, Z.; Simon, I. J. Proteome Res., 2006, 58, 1996.
[22] Iakoucheva, L.; Brown, C.; Lawson, J.; Obradovic, Z.; Dunker, A.K. J. Mol. Biol., 2002, 3233, 573.
[23] Cheng, Y.; LeGall, T.; Oldfield, C.J.; Dunker, A.K.; Uversky, V.N. Biochemistry, 2006, 4535, 10448.
[24] Xie, H.; Vucetic, S.; Iakoucheva, L.M.; Oldfield, C.J.; Dunker, A.K.; Obradovic, Z.; Uversky, V.N. J. Proteome Res., 2007, 65, 1917.
[25] Uversky, V.N.; Oldfield, C.J.; Dunker, A.K. Ann. Rev. Biophys. Mol. Biol., 2008, 37, 215.
[26] Romero, P.; Obradovic, Z.; Li, X.; Garner, E.C.; Brown, C.J.; Dunker, A.K. Proteins, 2001, 421, 38.
[27] Vacic, V.; Uversky, V.N.; Dunker, A.K.; Lonardi, S. BMC Bioinformatics, 2007, 8211.
[28] Brown, C.J.; Takayama, S.; Campen, A.M.; Vise, P.; Marshall, T.W.; Oldfield, C.J.; Williams, C.J.; Dunker, A.K. J. Mol. Evol., 2002, 551, 104.
[29] Chen, J.W.; Romero, P.; Uversky, V.N.; Dunker, A.K. J. Proteome Res., 2006, 54, 888.
[30] Chen, J.W.; Romero, P.; Uversky, V.N.; Dunker, A.K. J. Proteome Res., 2006, 54, 879.
[31] Kornberg, R.D. Trends Biochem. Sci., 2005, 305, 235.
[32] Dosztanyi, Z.; Csizmok, V.; Tompa, P.; Simon, I. J. Mol. Biol., 2005, 3474, 827.
[33] Obradovic, Z.; Peng, K.; Vucetic, S.; Radivojac, P.; Brown, C.J.; Dunker, A.K. Proteins, 2003, 53, 6566.
[34] Altschul, S.F.; Madden, T.L.; Schaffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Nucleic Acids Res., 1997, 2517, 3389.
[35] Chenna, R.; Sugawara, H.; Koike, T.; Lopez, R.; Gibson, T.J.; Higgins, D.G.; Thompson, J.D. Nucleic Acids Res., 2003, 3113, 3497.
[36] Boube, M.; Joulia, L.; Cribbs, D.L.; Bourbon, H.M. Cell, 2002, 1102, 143.
[37] Dosztanyi, Z.; Csizmok, V.; Tompa, P.; Simon, I. Bioinformatics, 2005, 21, 3433.
[38] Sickmeier, M.; Hamilton, J.A.; LeGall, T.; Vacic, V.; Cortese, M.S.; Tantos, A.; Szabo, B.; Tompa, P.; Chen, J.; Uversky, V.N.; Obradovic, Z.; Dunker, A.K. Nucleic Acids Res., 2007, 35, D786.
[39] Rost, B.; Sander, C. Proc. Natl. Acad. Sci. U. S. A., 1993, 9016, 7558.
[40] Zemla, A.; Venclovas, C.; Fidelis, K.; Rost, B. Proteins, 1999, 342, 220.
[41] Uversky, V.N.; Oldfield, C.J.; Dunker, A.K. J. Mol. Recognit., 2005, 185, 343.
[42] Kang, J.S.; Kim, S.H.; Hwang, M.S.; Han, S.J.; Lee, Y.C.; Kim, Y.J. J. Biol. Chem., 2001, 27645, 42003.
[43] Takagi, Y.; Calero, G.; Komori, H.; Brown, J.A.; Ehrensberger, A.H.; Hudmon, A.; Asturias, F.; Kornberg, R.D. Mol. Cell, 2006, 233, 355.
[44] Elmlund, H.; Baraznenok, V.; Lindahl, M.; Samuelsen, C.O.; Koeck, P.J.; Holmberg, S.; Hebert, H.; Gustafsson, C.M. Proc. Natl. Acad. Sci. U. S. A., 2006, 10343, 15788.
[45] Davis, J.A.; Takagi, Y.; Kornberg, R.D.; Asturias, F.A. Mol. Cell, 2002, 102, 409.
[46] Guidi, B.W.; Bjornsdottir, G.; Hopkins, D.C.; Lacomis, L.; Erdjument-Bromage, H.; Tempst, P.; Myers, L.C. J. Biol. Chem., 2004, 27928, 29114.
[47] Andrau, J.C.; van de Pasch, L.; Lijnzaad, P.; Bijma, T.; Koerkamp, M.G.; van de Peppel, J.; Werner, M.; Holstege, F.C. Mol. Cell, 2006, 222, 179.
[48] Barton, G.J. Protein Eng., 1993, 61, 37.

Received: March 28, 2008 Revised: April 15, 2008 Accepted: April 15, 2008

© Tóth-Petróczy et al.; Licensee Bentham Open.


This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/license/by/2.5/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
