https://doi.org/10.1007/s42485-021-00057-y
SHORT COMMUNICATION
Received: 10 October 2020 / Revised: 23 January 2021 / Accepted: 27 January 2021 / Published online: 13 February 2021
© The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. part of Springer Nature 2021
Abstract
The emergence of the novel coronavirus SARS-CoV-2 is responsible for coronavirus disease-19 (COVID-19), which poses a serious threat to global public health. Infection of the host cell by SARS-CoV-2 is characterized by direct translation of the positive single-stranded (+ss) RNA to form the large polyprotein polymerase 1ab (pp1ab), which acts as a precursor for a number of nonstructural and structural proteins that play vital roles in replication of the viral genome and biosynthesis of new virus particles. The maintenance of viral protein homeostasis is essential for continuation of the viral life cycle in the host cell. To test whether the protein homeostasis of SARS-CoV-2 can be disrupted by inducing specific protein aggregation, we examined whether the viral proteome contains any aggregation prone regions (APRs) that can be exploited to induce toxic protein aggregation specifically in viral proteins without affecting the host cell. This led to the identification of several (> 70) potential APRs in the SARS-CoV-2 proteome. The length of the APRs ranges from 5 to 25 amino acid residues. Nearly 70% of the APRs investigated are relatively short, in the range of 5–10 amino acids. The maximum number of APRs (> 50) was observed in pp1ab. The structural proteins, namely the spike (S), nucleoprotein (N), membrane (M) and envelope (E) proteins, also possess APRs in their primary structures, which altogether constitute 30% of the total APRs identified. Our findings may provide new windows of opportunity to design specific peptide-based, anti-SARS-CoV-2 therapeutic molecules against COVID-19.
2 Journal of Proteins and Proteomics (2021) 12:1–13
these vaccines (Kim et al. 2021). Therefore, in addition to the available options, it becomes imperative to search for novel therapeutic targets to curtail virus infection and multiplication. The replication cycle of SARS-CoV-2 in the host cell is marked by highly synchronized processes of protein expression, protein folding, and assembly of the viral genome with structural proteins, leading to the formation of new virus particles (Sims et al. 2008; Fehr and Perlman 2015; Chen et al. 2020; Lukassen et al. 2020; Lung et al. 2020; Malik 2020). Maintenance of protein homeostasis in a eukaryotic cell is achieved by an integrated mechanism of protein biosynthesis, folding and attainment of native structure, and the degradation of misfolded proteins (Balchin et al. 2016; Chiti and Dobson 2017; Klaips et al. 2018; Zhong et al. 2019). After entry into the host cell, viruses employ various strategies to hijack and regulate biochemical and molecular activities of the host cell, such as the transcription and translation machineries, to produce new viral proteins and enzymes essential for multiplication of the virus (Chen et al. 2020; Malik 2020; Salvi and Patankar 2020). Translation of the viral genome represents a key event required for the establishment of infection and multiplication of SARS-CoV-2. We started our prediction using the primary structures of proteins emerging from all the known open reading frames (ORFs) of SARS-CoV-2. The genome of SARS-CoV-2 contains at least six ORFs. The first ORF (known as ORF1a/b) constitutes approximately two-thirds of the total genome length and encodes 16 nonstructural proteins (NSP1–16) (Gordon et al. 2020; Malik 2020). There is a −1 frameshift between ORF1a and ORF1b, leading to production of two polypeptides: polypeptide 1a (pp1a) and polypeptide 1ab (pp1ab), the latter having 7096 amino acid residues. These polypeptides are proteolytically cleaved to form 16 polypeptide segments that ultimately give rise to the nonstructural proteins (NSPs). The virally encoded chymotrypsin-like protease (3CLpro) acts at specific sites and helps in the formation of the NSPs. The other ORFs, situated at the 3′-terminus of ORF1, constitute just one-third of the viral genome and encode four major structural proteins, namely the spike (S), membrane (M), envelope (E), and nucleocapsid (N) proteins. The NSPs play specific roles during infection, such as degradation of host cell mRNA, inhibition of interferon (IFN) signaling, blocking the host innate immune response, promoting cytokine expression, etc. These biochemical functions of the NSPs are crucial for establishment of viral infection and multiplication. The four structural proteins are vital for virion assembly and formation of new viral particles. The S protein forms a homotrimer that forms the spikes on the viral surface responsible for initial attachment to the host receptors (Pillay 2020). The M protein has three transmembrane domains; it shapes the virions, promotes membrane curvature, and binds to the nucleocapsid. The E protein plays a role in virus assembly and release, and it is involved in viral pathogenesis. The N protein contains two domains, which bind the viral RNA genome through an integrated action of the S, E and M proteins.

It has often been observed that protein aggregation disrupts protein homeostasis, leading to the development of various disease conditions. Protein aggregation is generally driven by specific amino acid sequences interspersed within the primary structure of proteins and polypeptides, known as aggregation-prone regions (APRs). Synthetic analogs of such APR sequences can self-assemble to form aggregates rich in β-sheet structures. Further, these APRs have been shown to interact with similar sequences present in the parent proteins and peptides through homologous interaction and to induce aggregation. Hence, APRs have been successfully explored for the targeted disruption of protein homeostasis. Several recent studies have confirmed that synthetic analogs of these sequence stretches (i.e., APRs) effectively block the folding of the original proteins and render them susceptible to degradation by the proteasomal degradation machinery of the host cell (Beerten et al. 2012; De Baets et al. 2014; Gallardo et al. 2016; Ganesan et al. 2016; Khodaparast et al. 2018). To explore the possibility of targeted protein aggregation to curtail SARS-CoV-2 infection, we screened the viral proteome for the presence of APRs. Our initial studies suggest that the primary structures of many of the key proteins, such as polyprotein polymerase 1ab (pp1ab), envelope protein (E), nucleoprotein (N), membrane (M) protein, etc., are marked by the presence of short amino acid sequence stretches possessing high aggregation propensity. Moreover, it has been observed that peptide (APR)-induced protein aggregation is a highly ordered and specific process. Since these APRs form essential elements of the native proteins, in the unfolded state (immediately after translation) they can interact with synthetic analogs of the APRs, which induces aggregation of the entire protein and finally subjects the protein molecules to degradation rather than folding into functional proteins.

Methods

Prediction of the potential aggregation prone regions (APRs) in the SARS-CoV-2 proteome

The complete genome of Wuhan-Hu-1 (NC_045512.2) was downloaded from the NCBI nucleotide database. The
aggregation propensity of the primary structures of all the SARS-CoV-2 proteins was assessed using in silico predictions. The primary structures of the proteins were sequentially submitted to different computational algorithms, namely FoldAmyloid (http://bioinfo.protres.ru/FoldAmyloid/) (Garbuzynskiy et al. 2010), TANGO (http://tango.crg.es/) (Fernandez-Escamilla et al. 2004), AGGRESCAN (http://bioinf.uab.es/aggrescan/) (Conchillo-Sole et al. 2007), and AMYLPRED (http://aias.biol.uoa.gr/AMYLPRED/input.php) (Frousios et al. 2009), with the default settings. The scores were compared with those of a classical aggregating peptide, i.e., the amyloid beta (Aβ) peptide.

Result and discussion

Prediction of APRs in different proteins emerging from different ORFs of SARS-CoV-2

Table 1 summarizes the locations of the predicted APRs in different structural and nonstructural proteins of SARS-CoV-2. The APRs are found to be asymmetrically distributed in the different regions of all the proteins investigated. As mentioned earlier, pp1a and pp1ab are the two large polypeptides formed by direct translation of the virus genome after its entry into host cells. Given that two-thirds of the total virus genome is utilized for the synthesis of the NSPs, they are crucial for the continuation of the virus replication cycle (Masters 2006; Chen et al. 2020). NSP-1, the first nonstructural protein formed from pp1ab, obstructs translation of host mRNA by interfering with the 40S ribosomal subunit (Raj 2021). The primary structure of NSP-1 contains 180 residues and was found to be free of APRs. Similarly, the region corresponding to NSP-13, spanning residues 5325 to 5925, does not contain any APRs. The polypeptide segments corresponding to NSP-2 to NSP-12 and NSP-14 to NSP-16 contain several APRs. NSP-2, with 637 residues in its primary structure, is the second nonstructural protein and was found to have 6 potential APRs ranging from 6 to 13 residues in length. The maximum number of APRs was found in the segment spanning residues 3570 to 3859, which corresponds to NSP-6. The APRs in this region together constitute > 35% of the residues of the protein. Along with NSP-3 and NSP-4, NSP-6 plays a vital role in the creation of cytoplasmic double-membrane vesicles essential for viral replication. In addition, NSP-6 plays an important role in preventing delivery of the viral components to lysosomes of the host cell and hence protects the virus from lysosomal inactivation (Gordon et al. 2020). Hence, truncating NSP-6 would be helpful in enhancing host-mediated destruction of the virus. Similarly, the polypeptide regions corresponding to NSP-3 and NSP-4 contain a large number of APRs (Fig. 1).

Apart from pp1ab, the structural proteins also contain sequence stretches with significantly high aggregation scores. Six potential APRs were identified in the S protein; however, these APRs are relatively short, except for the APR present at the N-terminus of the protein. The E-protein, the smallest structural protein (75 residues), which is known to play an essential role in virus morphogenesis (Liu et al. 2007), contains a single potential APR. The M-protein constitutes an essential component of the virus along with other structural proteins and plays a central role in virus morphogenesis and assembly via its interactions with other viral proteins (Neuman et al. 2011). It contains five APRs ranging from 7 to 18 amino acid residues. The N-protein contains a relatively small number of shorter APRs compared to the other structural proteins. Among the four structural proteins, the S- and M-proteins are comparatively richer in APRs than the N- and E-proteins. The scores of individual APRs range from 20 to 100. However, most of the APRs have an aggregation score above 50, indicating a lower chance of false positives.

The lengths of most of the APRs identified in the SARS-CoV-2 proteome are in the range of 5–8 residues (Table 1). Most of the APRs in pp1ab are relatively shorter than those observed in the structural proteins. Shorter APR peptides (of ≈ 6 residues) are observed to give better prediction reliability than longer ones; conversely, it has been established that longer APRs have a greater tendency to produce false positives. It has also been observed that shorter APRs possess high aggregation propensity and interact more efficiently with identical sequences in large peptides or proteins than longer APRs do.

The legitimacy of the predicted APRs rests on the reliability of the mathematical and statistical basis of the algorithms. The computational algorithm TANGO uses a statistical mechanics approach to predict the different secondary structures present in different regions of a given protein (Pande 2004). The algorithm assumes that a particular amino acid sequence (of at least five consecutive residues) is aggregation-prone if it has a high propensity to form β-sheet structure and that, when this sequence forms an aggregate, all the residues of the β-region are buried in the hydrophobic interior. It predicts the aggregation propensity in a sequence-specific manner and presents the data in the form of a β-aggregation score whose value ranges from 1 to 100. It is reported that the TANGO score of 5 per residue gives a Matthews
correlation coefficient between prediction and experiment of 0.92 (Fernandez-Escamilla et al. 2004). Further, it has been shown that the false-positive rate of TANGO is below 5% for a TANGO score of more than 15 (Bednarska et al. 2016). Most of the classical amyloidogenic peptides possess an aggregation score above 50, and hence we gathered all the sequence stretches displaying a score above this value. An overall score above 90 suggests high aggregation propensity with a low probability of a false positive. The data obtained from TANGO were further analyzed using other analogous algorithms, namely AGGRESCAN, AmylPred and FoldAmyloid. In all the predictions we used the amyloid beta (Aβ1-42) peptide as a reference due to its ability to form classical aggregates rich in β-structures. The AGGRESCAN program predicts the aggregation-prone regions in a protein as “hot spot” sequences of 5 to 11 residues that can nucleate aggregation in peptides and proteins. The aggregation propensity of the hot spots is determined largely by amino acid composition, based on an experimentally determined aggregation propensity scale for individual amino acids. The FoldAmyloid program predicts aggregation-prone regions as short amino acid sequences (≥ 5 residues) based on contacts, packing density, and backbone H-bonds of acceptors or donors. AmylPred combines the data from SecStr, a secondary structure prediction tool, to predict amino acid sequences in a protein that can act as potential conformational switches. As shown in Fig. 2, the APRs identified in different viral proteins by all four algorithms are in agreement.

Mechanistic outlook of APR-induced disruption of SARS-CoV-2 protein homeostasis

The mechanism of APR-induced disruption of protein homeostasis was first proposed by Balch et al. (2008). They showed that the disruption of bacterial protein homeostasis can be induced by small aggregating peptides, resulting in the formation of toxic protein aggregates in the bacterial cell. Generally, ordered protein aggregation is facilitated by the formation of intermolecular β-structures by short polypeptide sequences with high aggregation propensity. The presence of such sequences defines the basis of amyloid formation in various disease conditions, particularly the debilitating Alzheimer’s and Parkinson’s diseases. Similar sequences are commonly present in various globular proteins, where they constitute the hydrophobic core and confer structural stability. They also assist oligomeric proteins by forming protein–protein interfaces.

Despite the fact that these sequences participate in providing stability to the native proteins, they can self-assemble with identical sequences to form β-structured aggregates in the unfolded state. While forming the β-structured aggregates, it is often observed that their interactions with identical sequences in denatured proteins tend to be more efficient than
Table 1 Location of newly identified aggregation prone regions in different proteins of SARS-CoV-2
Amino acid sequences Positions Residues Amino acid Sequence of APRs Length of
APRs
Fig. 1 Identification of aggregation prone regions (APRs) in the major proteins of SARS-CoV-2. The aggregation score and propensity of the predicted APRs were found to be equivalent to those of the Aβ peptide, which serves as a classical β-structured aggregate
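
The segment-calling rule described in the text (a stretch of at least five consecutive residues with a high β-aggregation score) can be illustrated with a minimal Python sketch. This is not the TANGO implementation or the authors' pipeline. The function names `call_aprs` and `apr_coverage`, the per-residue reading of the score cutoff of 50, and the toy score profile are all assumptions made for illustration.

```python
from typing import List, Tuple


def call_aprs(scores: List[float],
              threshold: float = 50.0,
              min_len: int = 5) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) positions of candidate APRs.

    A candidate APR is a run of at least `min_len` consecutive residues
    whose score is strictly above `threshold`. Both defaults are
    illustrative assumptions taken from the cutoffs quoted in the text.
    """
    segments = []
    run_start = None  # 0-based index where the current run began
    for i, score in enumerate(scores):
        if score > threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                segments.append((run_start + 1, i))
            run_start = None
    # Close a run that extends to the end of the sequence.
    if run_start is not None and len(scores) - run_start >= min_len:
        segments.append((run_start + 1, len(scores)))
    return segments


def apr_coverage(segments: List[Tuple[int, int]], protein_length: int) -> float:
    """Fraction of residues in the protein that fall inside the called APRs."""
    covered = sum(end - start + 1 for start, end in segments)
    return covered / protein_length


# Toy per-residue score profile (hypothetical values, not real predictor output).
toy_scores = [0] * 3 + [80] * 6 + [10] * 2 + [90] * 4 + [0] + [60] * 5
toy_aprs = call_aprs(toy_scores)  # the 4-residue run is too short to be called
toy_cov = apr_coverage(toy_aprs, len(toy_scores))
```

On this toy profile the two runs of six and five residues are reported, and the four-residue run is discarded because it is shorter than the minimum length. The coverage helper corresponds to the kind of figure quoted for NSP-6, where the APRs make up more than 35% of the residues.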
acid metabolism constitute viable therapeutic targets for COVID-19 (Bojkova et al. 2020). Hence, the development of an exclusive, multi-target strategy to disrupt protein homeostasis represents an attractive potential anti-SARS-CoV-2 strategy. At present, we are actively engaged in synthesizing peptides analogous to all the identified APRs and characterizing their biophysical properties, and we hope that APR-induced proteostatic disruption will provide an innovative approach to fight COVID-19.
Fig. 2 (continued)
Fig. 3 Schematic representation of APR peptide-based inhibition of viral replication. The events in region 1 represent the usual cycle of infection: release of viral +ssRNA and its direct translation to form pp1ab, which subsequently forms all the nonstructural proteins (NSPs). The NSPs are used in amplification of viral genomic +ssRNA and formation of structural and other accessory proteins. At the end, genomic +ssRNA assembles with structural proteins to form new viral particles. The events depicted in region 2 (left side) lead to APR peptide-based targeting of proteins formed from ORF1a/ORF1ab (pp1ab). Addition of APR peptides will interfere with the folding of viral proteins and subject them to proteasomal degradation in the host cell. Depletion of essential viral proteins will lead to a complete halt of viral replication and formation of new viral particles
Conchillo-Sole O, de Groot NS, Aviles FX, Vendrell J, Daura X, Ventura S (2007) AGGRESCAN: a server for the prediction and evaluation of “hot spots” of aggregation in polypeptides. BMC Bioinformatics 27(8):65
Cucinotta D, Vanelli M (2020) WHO declares COVID-19 a pandemic. Acta Bio-med Atenei Parmensis 91(1):157–160
De Baets G, Schymkowitz J, Rousseau F (2014) Predicting aggregation-prone sequences in proteins. Essays Biochem 56:41–52
Fehr AR, Perlman S (2015) Coronaviruses: an overview of their replication and pathogenesis. Methods Mol Biol 1282:1–23
Fernandez-Escamilla AM, Rousseau F, Schymkowitz J, Serrano L (2004) Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nat Biotechnol 22(10):1302–1306
Frousios KK, Iconomidou VA, Karletidi C-M, Hamodrakas SJ (2009) Amyloidogenic determinants are usually not buried. BMC Struct Biol 9(1):44
Gallardo R, Ramakers M, De Smet F, Claes F, Khodaparast L, Khodaparast L et al (2016) De novo design of a biologically active amyloid. Science 354(6313):18
Ganesan A, Siekierska A, Beerten J, Brams M, Van Durme J, De Baets G et al (2016) Structural hot spots for the solubility of globular proteins. Nature Commun 7:10816
Garbuzynskiy SO, Lobanov MY, Galzitskaya OV (2010) FoldAmyloid: a method of prediction of amyloidogenic regions from protein sequence. Bioinformatics (Oxford, England) 26(3):326–332
Gordon DE, Jang GM, Bouhaddou M, Xu J, Obernier K, White KM et al (2020) A SARS-CoV-2 protein interaction map reveals targets for drug repurposing. Nature 10:26
Hung IF, Lung KC, Tso EY, Liu R, Chung TW, Chu MY et al (2020) Triple combination of interferon beta-1b, lopinavir-ritonavir, and ribavirin in the treatment of patients admitted to hospital with COVID-19: an open-label, randomised, phase 2 trial. Lancet (London, England) 395(10238):1695–1704
Jean SS, Lee PI, Hsueh PR (2020) Treatment options for COVID-19: the reality and challenges. J Microbiol Immunol Infect 53(3):436–443
Khodaparast L, Khodaparast L, Gallardo R, Louros NN, Michiels E, Ramakrishnan R et al (2018) Aggregating sequences that occur in many proteins constitute weak spots of bacterial proteostasis. Nat Commun 9(1):866
Kim JH, Marks F, Clemens JD (2021) Looking beyond COVID-19 vaccine phase 3 trials. Nat Med. https://doi.org/10.1038/s41591-021-01230-y
Klaips CL, Jayaraj GG, Hartl FU (2018) Pathways of cellular proteostasis in aging and disease. J Cell Biol 217(1):51–63
Liu DX, Yuan Q, Liao Y (2007) Coronavirus envelope protein: a small membrane protein with multiple functions. Cell Mole life Sci CMLS 64(16):2043–2048
Lukassen S, Chua RL, Trefzer T, Kahn NC, Schneider MA, Muley T et al (2020) SARS-CoV-2 receptor ACE2 and TMPRSS2 are primarily expressed in bronchial transient secretory cells. EMBO J 4:e105114
Lung J, Lin YS, Yang YH, Chou YL, Shu LH, Cheng YC et al (2020) The potential chemical structure of anti-SARS-CoV-2 RNA-dependent RNA polymerase. J Med Virol 92(6):693–697
Malik YA (2020) Properties of coronavirus and SARS-CoV-2. Malaysian J Pathol 42(1):3–11
Masters PS (2006) The molecular biology of coronaviruses. Adv Virus Res 66:193–292
Neuman BW, Kiss G, Kunding AH, Bhella D, Baksh MF, Connelly S et al (2011) A structural analysis of M protein in coronavirus assembly and morphology. J Struct Biol 174(1):11–22
Nicola M, Alsafi Z, Sohrabi C, Kerwan A, Al-Jabir A, Iosifidis C et al (2020) The socio-economic implications of the coronavirus and COVID-19 pandemic: a review. Internat J Surg 78:185–193
Pande VS (2004) A universal TANGO? Nat Biotechnol 22(10):1240–1241
Pillay TS (2020) Gene of the month: the 2019-nCoV/SARS-CoV-2 novel coronavirus spike protein. J Clin Pathol 73(7):366–369
Raj R (2021) Analysis of non-structural proteins, NSPs of SARS-CoV-2 as targets for computational drug designing. Biochem Biophys Rep 25:100847
Salvi R, Patankar P (2020) Emerging pharmacotherapies for COVID-19. Biomed Pharma 14:110267
Sanders JM, Monogue ML, Jodlowski TZ, Cutrell JB (2020) Pharmacologic treatments for coronavirus disease 2019 (COVID-19): a review. JAMA 323(18):1824–1836
Scarabel L, Guardascione M, Dal Bo M, Toffoli G (2021) Pharmacological strategies to prevent SARS-CoV-2 infection and to treat the early phases of COVID-19 disease. Internat Soc Infect Dis 10:26
Sims AC, Burkett SE, Yount B, Pickles RJ (2008) SARS-CoV replication and pathogenesis in an in vitro model of the human conducting airway epithelium. Virus Res 133(1):33–44
Srinivas P, Sacha G, Koval C (2020) Antivirals for COVID-19. Cleveland Clinic J Med 10:18–24
Sternberg A, McKee DL, Naujokat C (2020) Novel drugs targeting the SARS-CoV-2/COVID-19 machinery. Curr Top Med Chem 10:25
Tandon PN (2020) COVID-19: Impact on health of people & wealth of nations. Indian J Med Res 151(2 & 3):121–123
Wang Y, Zhang D, Du G, Du R, Zhao J, Jin Y et al (2020) Remdesivir in adults with severe COVID-19: a randomised, double-blind, placebo-controlled, multicentre trial. Lancet (London, England) 395(10236):1569–1578
Zhang J, Zeng H, Gu J, Li H, Zheng L, Zou Q (2020) Progress and Prospects on Vaccine Development against SARS-CoV-2. Vaccines 8(2):9
Zhong M, Lee GM, Sijbesma E, Ottmann C, Arkin MR (2019) Modulating protein-protein interaction networks in protein homeostasis. Curr Opin Chem Biol 50:55–65

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.