
2017 - MFIB - a repository of protein complexes with mutual folding induced by binding


Bioinformatics, 33(22), 2017, 3682–3684

doi: 10.1093/bioinformatics/btx486

Downloaded from https://academic.oup.com/bioinformatics/article-abstract/33/22/3682/4061276 by Europaisches Laboratorium fuer Molekularbiologie, Bibliothek user on 14 September 2018
Advance Access Publication Date: 3 August 2017
Applications Note

Databases and ontologies

MFIB: a repository of protein complexes with mutual folding induced by binding
Erzsébet Fichó1, István Reményi2, István Simon1,* and Bálint Mészáros1,*
1Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest H-1117, Hungary and 2Institute of Enzymology, RCNS, Hungarian Academy of Sciences, 'Momentum' Membrane Protein Bioinformatics Research Group, Budapest H-1117, Hungary
*To whom correspondence should be addressed.
Associate Editor: Alfonso Valencia
Received on March 22, 2017; revised on June 26, 2017; editorial decision on July 26, 2017; accepted on August 2, 2017

Abstract
Motivation: It is widely recognized that intrinsically disordered proteins (IDPs) are involved in crucial interactions in the living cell. However, the study of protein complexes formed exclusively by IDPs is hindered by the lack of data, and such analyses remain sporadic. Systematic studies have benefited other types of protein–protein interactions, paving the way from basic science to therapeutics; yet these efforts require reliable datasets that are currently lacking for synergistically folding complexes of IDPs.
Results: Here we present the Mutual Folding Induced by Binding (MFIB) database, the first systematic collection of complexes formed exclusively by IDPs. MFIB contains an order of magnitude more data than any dataset used in corresponding studies and offers a wide coverage of known IDP complexes in terms of flexibility, oligomeric composition and protein function from all domains of life. The included complexes are grouped using a hierarchical classification and are complemented with structural and functional annotations. MFIB is backed by a firm development team and infrastructure, and together with possible future community collaboration it will provide the cornerstone for structural and functional studies of IDP complexes.
Availability and implementation: MFIB is freely accessible at http://mfib.enzim.ttk.mta.hu/. The MFIB application is hosted by Apache web server and was implemented in PHP. To enrich querying features and to enhance backend performance a MySQL database was also created.
Contact: [email protected], [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.

1 Introduction
Intrinsically disordered proteins (IDPs) do not have a stable structure under native conditions (Wright and Dyson, 1999), yet they perform crucial biological roles, being deeply embedded in regulatory and signaling pathways, amongst others (Dyson and Wright, 2005; Wright and Dyson, 2015). Despite the lack of intrinsic tertiary structure of IDPs, many critical biological processes require them to interact with molecular partners, most often other proteins. During the vast majority of these interactions IDPs do adopt a stable bound structure, hence their folding is coupled to binding (Sugase et al., 2007), giving rise to weak, transient, yet highly specific interactions. In accord, IDPs often represent hubs of protein–protein interaction networks (Haynes et al., 2006), presenting promising therapeutic targets (Joshi and Vendruscolo, 2015).
In line with their biological importance, IDPs are heavily studied. The resulting information is collected in disorder-specific databases (such as DisProt, Piovesan et al., 2016 or IDEAL, Fukuchi et al., 2014) and is disseminated as various levels of annotation in core biology databases, such as UniProt (Pundir et al., 2017). The majority of this information pertains to the establishment of which

© The Author 2017. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/),
which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact
[email protected]

protein regions are disordered and which have intrinsic structure, with some additional information about the detailed structural properties of IDPs (Varadi et al., 2014). These data are in turn used to develop prediction algorithms that enable the in silico identification of IDP regions (Oates et al., 2013) and functional sites (Dosztanyi et al., 2010; Malhis et al., 2016), which aids experimental verification, creating an iterative synergistic workflow.
This targeted research and synergy can be seen in the identification of IDPs; other areas of unstructural biology still lack this kind of focus. The identification of the interactions of IDPs in structural detail seems to be much more sporadic, lacking systematic targeted efforts. While no specific IDP interaction database exists, a subset of such interactions has been studied in detail (Mészáros et al., 2007; Mohan et al., 2006). The interactions between IDPs and ordered proteins are often mediated by short linear motifs (SLiMs) residing in the IDP partner (Fuxreiter et al., 2007), and in accord, SLiM databases, such as the Eukaryotic Linear Motif database (Dinkel et al., 2016), can provide a starting point for structural studies of IDP–ordered protein interactions.
In contrast to the study of IDP–ordered protein interactions, protein complexes formed exclusively by IDPs are far less understood from both structural and functional points of view. The primary reason behind the lack of systematic research of IDP-only complexes is the lack of well-organized and accessible data. While several such complexes are known (and some have been studied in detail, see for example, Demarest et al., 2002), no specific database exists, and the majority of corresponding data are scattered in various databases. Yet, a targeted database often proves to be not only beneficial, but vital for the development of research areas in biology (Baxevanis and Bateman, 2015).
Our current work lays this missing foundation of the systematic structural/functional studies of IDP complexes by assembling Mutual Folding Induced by Binding (MFIB). MFIB is constructed by integrating information from a range of databases and a wealth of literature to assemble by far the largest repository of protein complexes, where the interacting chains mutually fold as a result of the interaction.

2 Database assembly
MFIB aims to serve as a starting point for the functional and structural analysis of interactions between IDPs. In accord, the existence of a solved complex structure of the interacting protein partners was a prerequisite for inclusion in the dataset. The existence of a solved structure also serves as verification of the interaction and proof that the proteins involved in fact adopt a stable structure upon interacting. Accordingly, the PDB (version March 28, 2017) was taken as a starting point, and was filtered and annotated using various criteria and information from other databases to derive a high-quality set of interacting IDPs.
Structures that contain at least two protein chains in interaction were selected and were filtered for structure quality (keeping only nuclear magnetic resonance structures, and X-ray structures with a resolution better than 5 Å to discard poor quality structures) and biological relevance (discarding chimeras and other structures containing non-biological polypeptide chains). Complexes where non-protein chains (typically DNA and RNA) participate in the interaction were also discarded. The remaining set of candidate complexes was annotated based on experimental evidence in various annotation databases (see Fig. 1). Disorder annotations were taken from DisProt (version 7 v0.4) (Piovesan et al., 2016) and IDEAL (version March 29, 2017) (Fukuchi et al., 2014). Using these manually curated information, protein chains in the candidate PDB complexes were annotated using three different approaches. First, some candidate protein chains had direct disorder annotations, meaning that they cover the same region in the corresponding UniProt protein sequence as referenced in disorder databases. Second, annotations were transferred to close homologues, considering proteins that share at least 90% sequence identity (i.e. they belong to the same UniRef90 sequence cluster). As the third level of annotations, disorder information was transferred through Pfam (release 31.0, Bateman, 2000) objects (families, domains, motifs or repeats). If a Pfam object covered at least 70% of both an interacting chain and a disorder annotation, then the disordered status was also assigned to the interacting chain.
Fig. 1. Workflow of the construction of MFIB. The figure shows the annotation steps of a hypothetical example of three interacting disordered protein regions, where the three chains are annotated through direct, UniRef90-transfer and Pfam-transfer of annotations (marked A, B and C, respectively). Light grey boxes represent disordered protein regions. Smaller black boxes mark regions that are present in the candidate PDB structure. Boxes with dashed outline represent Pfam objects. Arrows show the transfer of annotations either with direct sequence comparisons (direct annotations between UniProt sequences) or with mapping (using Pfam, UniRef90 clusters, or BLAST in the case of transfer between UniRef90 sequences and between UniProt and the PDB candidate proteins)
Taking all three types of annotations (direct, UniRef90-transferred and Pfam-transferred) into account, all candidate complexes were categorized. Complexes containing only disordered chains were kept; and complexes with both disordered chains and chains without annotations were further inspected. If evidence uncovered using literature searches indicated that the unknown chains were in fact disordered, the complex was also kept. The database-based annotations coupled with information from the literature resulted in a set of 1406 complexes that all exclusively contain protein chains that are disordered in their monomeric form. Each complex is manually inspected by database curators with a focus on the validity of the experimental evidence for disorder to assure the reliability of the database. Curators also check the true biological assemblies of the complexes using PISA (Proteins, Interfaces, Structures and Assemblies) to avoid the inclusion of non-biological contacts due to crystallization. These manually curated protein complexes together comprise MFIB.
To reduce redundancy, complexes in MFIB were clustered based on sequence similarities of their constituent chains. Protein chains were considered to be similar if they belong to the same UniRef90 cluster and show at least 70% overlap. Two complexes are deemed related if they contain the same number of proteins, and the proteins from the two structures show pairwise similarity. Related complexes were grouped into clusters forming the entries in MFIB. This clustering grouped the 1406 structures into 205 MFIB entries. Furthermore, each entry in MFIB is assigned a class and a subclass during the manual annotation and curation step. Supplementary Table S1 shows the 8 classes and 33 subclasses currently defined in MFIB.
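The structure-level filter described in the Database assembly section can be illustrated with a minimal Python sketch. The `Structure` record, its field names and the method labels "NMR" and "X-RAY" are hypothetical and are not taken from the MFIB pipeline. The sketch also simplifies two criteria: it rejects any structure that contains a nucleic acid chain (the text only rejects those where such chains participate in the interaction), and it does not test whether the protein chains are actually in contact.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Structure:
    """Hypothetical summary of one PDB entry (field names are assumptions)."""
    pdb_id: str
    method: str                   # "NMR" or "X-RAY" (labels assumed here)
    resolution: Optional[float]   # in Angstrom; None for NMR structures
    chain_types: list[str]        # one of "protein", "DNA", "RNA" per chain
    has_chimeric_chain: bool = False


def is_candidate(s: Structure, max_resolution: float = 5.0) -> bool:
    """Return True if the structure passes the quality and relevance filters."""
    n_protein = sum(t == "protein" for t in s.chain_types)
    if n_protein < 2:
        return False              # need at least two protein chains
    if any(t != "protein" for t in s.chain_types):
        return False              # simplification: drop any DNA/RNA-containing entry
    if s.has_chimeric_chain:
        return False              # chimeras / non-biological polypeptides
    if s.method == "NMR":
        return True               # NMR structures are kept regardless of resolution
    return (
        s.method == "X-RAY"
        and s.resolution is not None
        and s.resolution < max_resolution   # "better than 5 Å"
    )
```

In this sketch a hypothetical two-chain X-ray entry at 2.1 Å passes, while the same entry at 6.0 Å or with an added DNA chain is rejected.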
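The Pfam-based transfer rule (70% coverage of both the interacting chain and the disorder annotation) and the subsequent categorization of complexes can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: coverage is assumed to mean the fraction of the target region's residues that lie inside the Pfam match, the `Region` type and all function names are hypothetical, and the "discard" outcome for complexes without any disordered chain is inferred rather than stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Closed residue interval [start, end] on one protein sequence."""
    start: int
    end: int

    def length(self) -> int:
        return self.end - self.start + 1


def coverage(pfam: Region, target: Region) -> float:
    """Fraction of `target` residues that fall inside `pfam` (assumed definition)."""
    overlap = min(pfam.end, target.end) - max(pfam.start, target.start) + 1
    return max(overlap, 0) / target.length()


def pfam_transfers_disorder(
    pfam_on_chain: Region,       # Pfam match on the protein of the PDB chain
    chain: Region,               # region of that protein present in the structure
    pfam_on_annotated: Region,   # match of the same Pfam object on the annotated protein
    disorder: Region,            # experimentally annotated disordered region
    min_cov: float = 0.70,
) -> bool:
    """Apply the 70% rule to both the interacting chain and the annotation."""
    return (
        coverage(pfam_on_chain, chain) >= min_cov
        and coverage(pfam_on_annotated, disorder) >= min_cov
    )


def categorize(chain_is_disordered: list[bool]) -> str:
    """Classify a complex from per-chain annotation status (any of the three routes)."""
    if len(chain_is_disordered) < 2:
        raise ValueError("a complex needs at least two protein chains")
    if all(chain_is_disordered):
        return "keep"
    if any(chain_is_disordered):
        return "inspect"   # unannotated chains go to manual literature search
    return "discard"       # assumption: no disordered chain at all
```

Keeping the coverage computation separate from the decision rule makes it easy to swap in a different definition of coverage if the assumed one does not match the original procedure.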

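The redundancy reduction step, in which related complexes are grouped into MFIB entries, can be sketched in Python as below. Several details are assumptions because the text does not specify them: the 70% overlap is measured relative to the longer of the two chain regions, chain coordinates are taken to be in a common numbering within a UniRef90 cluster, "pairwise similarity" is read as a one-to-one matching of chains, and related complexes are merged by single linkage (connected components). All identifiers in the example are hypothetical.

```python
from itertools import permutations
from typing import NamedTuple


class Chain(NamedTuple):
    uniref90: str   # UniRef90 cluster identifier
    start: int      # first residue (common numbering within the cluster, assumed)
    end: int        # last residue, inclusive


def chains_similar(a: Chain, b: Chain, min_overlap: float = 0.70) -> bool:
    """Same UniRef90 cluster and at least 70% overlap of the longer region."""
    if a.uniref90 != b.uniref90:
        return False
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    longer = max(a.end - a.start, b.end - b.start) + 1
    return shared / longer >= min_overlap


def complexes_related(x: list[Chain], y: list[Chain]) -> bool:
    """Same number of chains and some one-to-one pairing of similar chains."""
    if len(x) != len(y):
        return False
    # Brute force over pairings is acceptable for small oligomers.
    return any(
        all(chains_similar(a, b) for a, b in zip(x, perm))
        for perm in permutations(y)
    )


def cluster_complexes(complexes: dict[str, list[Chain]]) -> list[set[str]]:
    """Group related complexes into entries using union-find (single linkage)."""
    parent = {cid: cid for cid in complexes}

    def find(i: str) -> str:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    ids = list(complexes)
    for n, i in enumerate(ids):
        for j in ids[n + 1:]:
            if complexes_related(complexes[i], complexes[j]):
                parent[find(i)] = find(j)

    groups: dict[str, set[str]] = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


# Hypothetical input: c1 and c2 contain the same two proteins in different chain order.
example = {
    "c1": [Chain("U1", 1, 100), Chain("U2", 1, 50)],
    "c2": [Chain("U2", 5, 50), Chain("U1", 10, 100)],
    "c3": [Chain("U3", 1, 80), Chain("U3", 1, 80)],
    "c4": [Chain("U1", 1, 100)],
}
entries = cluster_complexes(example)
```

With this input, `c1` and `c2` end up in one entry, while `c3` and `c4` each form their own entry.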
3 Web interface
MFIB is made available through a dedicated website at http://mfib.enzim.ttk.mta.hu/. The 205 entries representing interactions of IDPs form the core of MFIB. Accordingly, each entry is assigned a unique accession and has a separate page that details information about the given complex. Furthermore, the MFIB server also includes features to ease searching and navigating through the database.
The ‘Home’ page describes the basis and purpose of the database for users unfamiliar with MFIB. The ‘Statistics’ page shows basic statistics about MFIB. The ‘Help’ page answers questions connected to the conception, assembly, design and usability of the database and the server. MFIB also offers several ways of structured access to the database including browsing, searching and multiple ways of downloading data in XML and text formats for local use.

4 Discussion
The construction of MFIB presents the first systematic collection of data concerning complexes formed by IDPs. It is based on the integration of structural and sequence annotation databases coupled with the results of an extensive manual literature survey. Previous studies of complexes of mutually folding IDPs were typically based on 10–35 structures (Gunasekaran et al., 2004; Nussinov et al., 1998; Rumfeldt et al., 2008). In contrast, MFIB contains over 1400 complex structures organized into 205 entries. These data provide the missing cornerstone of future structural and functional studies of the synergistic folding of IDPs.
The data contained in MFIB not only far surpass the number of complexes used in previous analyses but also provide a wide coverage of possible IDP–IDP interactions in many ways. Entries in MFIB cover all three domains of life and also include complexes from viral proteins, shedding light on the importance of synergistic folding in host–pathogen interactions. MFIB entries also cover the majority of possible oligomeric compositions from dimers to hexamers, including both hetero- and homo-oligomers. Most importantly, entries in MFIB also cover the known spectrum of protein disorder. Protein disorder is a highly heterogeneous property with various IDPs exhibiting markedly different levels of flexibility in their unbound form. MFIB contains complexes of IDP regions from near random coil proteins (such as the CBP (CREB Binding Protein)-interacting region of ACTR, Demarest et al., 2002), through molten globules (such as the Arc repressor, Peng et al., 1993) to near-ordered structures, where a monomeric structure can be stabilized with a limited number of mutations (such as the nucleoside diphosphate kinase, Giartosio et al., 1996).
The MFIB database is currently by far the largest collection of interactions between IDPs; yet there is undoubtedly much more information scattered in the PDB and the literature that is not currently incorporated. In accord, we consider the present version of MFIB as a stepping stone and plan to constantly update, expand and revise the database. This process will rely on the past experience of the authors in database maintenance, the firm technical and infrastructural background of the initiative, and the encouragement of a community effort to contribute to MFIB.

Acknowledgements
The authors would like to thank Zsófia Béky for her help in the MFIB graphical design and Gábor E. Tusnády for his help with setting up the MFIB server. The critical comments of Katalin Paréj, László Dobson and Karolina Fichó concerning server functionality are greatly appreciated.

Funding
This work was supported by the postdoctoral fellowship of the Hungarian Academy of Sciences, the Hungarian Research and Developments Fund [OTKA K115698 and OTKA K104586], and the Momentum Grant of the Hungarian Academy of Sciences [LP2012-35]. Project no. FIEK_16-1-2016-0005 has been implemented with the support provided from the National Research, Development and Innovation Fund of Hungary, financed under the FIEK_16 funding scheme.
Conflict of Interest: none declared.

References
Bateman,A. (2000) The Pfam protein families database. Nucleic Acids Res., 28, 263–266.
Baxevanis,A.D. and Bateman,A. (2015) The importance of biological databases in biological discovery. Curr. Protoc. Bioinformatics, 50, 1–8, Unit 1.1.
Demarest,S.J. et al. (2002) Mutual synergistic folding in recruitment of CBP/p300 by p160 nuclear receptor coactivators. Nature, 415, 549–553.
Dinkel,H. et al. (2016) ELM 2016–data update and new functionality of the eukaryotic linear motif resource. Nucleic Acids Res., 44, D294–D300.
Dosztanyi,Z. et al. (2010) Bioinformatical approaches to characterize intrinsically disordered/unstructured proteins. Brief. Bioinform., 11, 225–243.
Dyson,H.J. and Wright,P.E. (2005) Intrinsically unstructured proteins and their functions. Nat. Rev. Mol. Cell Biol., 6, 197–208.
Fukuchi,S. et al. (2014) IDEAL in 2014 illustrates interaction networks composed of intrinsically disordered proteins and their binding partners. Nucleic Acids Res., 42, D320–D325.
Fuxreiter,M. et al. (2007) Local structural disorder imparts plasticity on linear motifs. Bioinformatics, 23, 950–956.
Giartosio,A. et al. (1996) Thermal stability of hexameric and tetrameric nucleoside diphosphate kinases. Effect of subunit interaction. J. Biol. Chem., 271, 17845–17851.
Gunasekaran,K. et al. (2004) Analysis of ordered and disordered protein complexes reveals structural features discriminating between stable and unstable monomers. J. Mol. Biol., 341, 1327–1341.
Haynes,C. et al. (2006) Intrinsic disorder is a common feature of hub proteins from four eukaryotic interactomes. PLoS Comput. Biol., 2, e100.
Joshi,P. and Vendruscolo,M. (2015) Druggability of intrinsically disordered proteins. Adv. Exp. Med. Biol., 870, 383–400.
Malhis,N. et al. (2016) MoRFchibi SYSTEM: software tools for the identification of MoRFs in protein sequences. Nucleic Acids Res., 44, W488–W493.
Mészáros,B. et al. (2007) Molecular principles of the interactions of disordered proteins. J. Mol. Biol., 372, 549–561.
Mohan,A. et al. (2006) Analysis of molecular recognition features (MoRFs). J. Mol. Biol., 362, 1043–1059.
Nussinov,R. et al. (1998) Mechanism and evolution of protein dimerization. Protein Sci., 7, 533–544.
Oates,M.E. et al. (2013) D2P2: database of disordered protein predictions. Nucleic Acids Res., 41, D508–D516.
Peng,X. et al. (1993) Molten-globule conformation of Arc repressor monomers determined by high-pressure 1H NMR spectroscopy. Proc. Natl. Acad. Sci., USA, 90, 1776–1780.
Piovesan,D. et al. (2016) DisProt 7.0: a major update of the database of disordered proteins. Nucleic Acids Res., 45, D219–D227.
Pundir,S. et al. (2017) UniProt protein knowledgebase. Methods Mol. Biol., 1558, 41–55.
Rumfeldt,J.A.O. et al. (2008) Conformational stability and folding mechanisms of dimeric proteins. Prog. Biophys. Mol. Biol., 98, 61–84.
Sugase,K. et al. (2007) Mechanism of coupled folding and binding of an intrinsically disordered protein. Nature, 447, 1021–1025.
Varadi,M. et al. (2014) pE-DB: a database of structural ensembles of intrinsically disordered and of unfolded proteins. Nucleic Acids Res., 42, D326–D335.
Wright,P.E. and Dyson,H.J. (1999) Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. J. Mol. Biol., 293, 321–331.
Wright,P.E. and Dyson,H.J. (2015) Intrinsically disordered proteins in cellular signalling and regulation. Nat. Rev. Mol. Cell Biol., 16, 18–29.
