Predicting sRNAs and Their Targets in Bacteria
GENOMICS PROTEOMICS & BIOINFORMATICS
Genomics Proteomics Bioinformatics 10 (2012) 276–284
www.elsevier.com/locate/gpb
Review
Abstract
Bacterial small RNAs (sRNAs) are an emerging class of regulatory RNAs of about 40–500 nucleotides in
length. By binding to their target mRNAs or proteins, they take part in many biological processes such
as sensing environmental changes and regulating gene expression. Thus, identification of bacterial
sRNAs and their targets has become an important part of sRNA biology. Current strategies for discovery
of sRNAs and their targets usually involve bioinformatics prediction followed by experimental
validation, which gives bioinformatics prediction a key role. Here we provide an overview of prediction
methods, focusing on the merits and limitations of each class of models. Finally, we present our
thoughts on developing related bioinformatics models in the future.
1672-0229/$ - see front matter © 2012 Beijing Institute of Genomics, Chinese Academy of Sciences and
Genetics Society of China. Published by Elsevier Ltd and Science Press. All rights reserved.
http://dx.doi.org/10.1016/j.gpb.2012.09.004
Li W et al / Predicting sRNAs and Their Targets in Bacteria 277
the sRNA IstR and its target mRNA tisB [5] (see sRNATarBase for detailed information,
http://ccb.bmi.ac.cn/srnatarbase/). The imperfect base pairing makes target mRNAs difficult to detect,
so experimental validation remains essential after computational prediction. Nevertheless,
computational methods provide a time-saving and less labor-intensive way to identify sRNA targets.
To this end, several prediction models have been developed [28–36].

Taken together, bioinformatics prediction plays an important role in discovering sRNAs and their
targets, as pointed out by several reviews on bioinformatics prediction and experimental discovery
[37–40]. In the current review, we focus on the merits and limitations of each class of models and
provide some perspective on future development in this field.

In essence, developing a bioinformatics model means learning rules from known samples and then
applying those rules to new samples, which are subsequently validated experimentally. Therefore,
understanding the characteristics of bacterial sRNAs is vital in developing sRNA prediction models.
The available literature indicates that sRNAs possess the following features [37–40]. First, sRNAs
are widespread and each bacterium is assumed to contain sRNA genes. Second, sRNAs are heterogeneous
in sequence length and secondary structure as mentioned previously. The sequence of sRNAs ranges from
40 to 500 nucleotides in length. Third, unlike tRNAs with the conserved cloverleaf secondary structure
pattern, or eukaryotic microRNAs with similar sequence lengths and hairpin-structured precursors [41],
different sRNAs often have different secondary structures. Fourth, sRNAs are involved in many
biological processes, such as posttranscriptional regulation of gene expression, RNA processing, mRNA
stability and translation, protein degradation, plasmid replication and bacterial virulence [42–47].
The above features, on the one hand, reflect the importance of sRNAs, and on the other hand, make it
difficult to develop general models for sRNA prediction. Although many empirical models have been
developed for sRNA discovery [17–26] (Table 1), there is little overlap between the prediction results
from different models. We are still a long way from developing a perfect model for sRNA prediction.

Comparative genomics-based models are a class of commonly-used methods for sRNA prediction at present.
The basic assumption is that an sRNA gene should have a certain conservation of both sequence and
secondary structure among a group of closely-related genomes. Therefore, choosing the right set of
closely-related genomes plays a key role in the success of comparative genomics-based models for sRNA
prediction, and the choice usually depends on the research purposes and models employed. For example,
to find the sRNA genes in the intergenic regions of Escherichia coli [46], Argaman et al. applied the
BLAST program
to compare potential sRNA regions against the genomes of Salmonella typhi, S. paratyphi and
S. typhimurium and identified 24 putative sRNA genes. In addition, Rivas and Eddy applied the
WUBLASTN program to compare 2367 intergenic sequences of E. coli against the complete genome of
S. typhi [17]. The 11,509 generated alignments were scanned using the QRNA model and finally, 33 out
of 115 known ncRNAs were identified. The E. coli genome was also used to test the performance of the
sRNAPredict program. Using sequence conservation between E. coli intergenic regions and Shigella
flexneri, Livny et al. identified 50 out of 55 known sRNAs [21]. Therefore, it is very difficult to
provide a general rule for how many genomes and which genomes should be included in studies of
comparative genomics-based sRNA prediction.

The main steps for comparative genomics-based models to predict sRNA genes are as follows. The first
step is to find genomes closely related to a given bacterial genome. The second step is to extract
intergenic regions from the selected genomes and to apply the BLAST program to compare intergenic
regions pairwise. Then, the pairwise BLAST hits are gathered into clusters of two or more sequences,
and these sequence clusters are aligned using ClustalW or ncDNAlign [48]. Finally, the resulting
alignments are scored using RNAz [18] or EvoFold [19]. The third step is to carry out structural
conservation analysis for the intergenic regions using the above alignment. Here structural
conservation means that, for some positions in each sequence, even though there is no perfect
conservation of nucleotides, the base pairing information is kept. The fourth step is to predict
whether the conserved intergenic regions contain signals of promoters, transcription factor binding
sites or Rho-independent terminators.

Based on some or all of the steps above, several programs, including QRNA [17], RNAz [18], EvoFold
[19], SIPHT [20] and sRNAPredict [21], have been developed and successfully applied to finding
bacterial sRNA genes. QRNA takes a BLAST alignment of two sequences as the input, while RNAz and
EvoFold take a multiple sequence alignment as input, before structure analysis such as conservation
and thermodynamic stability is performed to predict potential sRNA genes. Unlike these tools,
sRNAPredict and SIPHT only use information from BLAST alignments and Rho-independent terminator
signals without considering structural information.

Four comparative genomics-based methods, QRNA, RNAz, sRNAPredict/SIPHT and NAPP (nucleic acid
phylogenetic profiling) [22], were systematically compared using 10 sets of benchmark data in a recent
evaluation paper [49]. The authors found that sRNAPredict provided the best performance when multiple
factors were considered together, such as low false positive rates, ability to identify the correct
strand of sRNAs and speed of execution.

There are limitations for this class of methods. First, the aforementioned models are only applicable
to the discovery of evolutionarily-conserved sRNA genes rather than the genes unique to a given
genome. Second, these models are of no use if there are no closely-related genomes available for a
given genome. Third, the conserved intergenic regions may contain other gene structures such as
transcription factor binding sites or untranslated regions of mRNAs rather than sRNA genes. Therefore,
the comparative genomics-based models can identify only some sRNA genes.

Machine learning-based models for sRNA prediction

The basic assumption of this class of models is that a given genome is composed of two parts, i.e.,
sRNA genes and the remaining part of the genome. If we take sRNA genes as signal, the remaining part
of the genome will be viewed as the background. The first step in developing machine learning-based
models is to construct a training dataset including positive and negative samples. The known sRNA
genes are often used as positive samples, while randomly-selected DNA sequences from the given genome
are taken as negative samples. The second step is to extract features describing the samples, which
is a key step in developing models. Only suitable features can improve the model performance. In
addition, feature selection is also important in machine learning-based model construction. For
example, in their model for sRNA prediction [26], Tran et al. first constructed a training dataset
including 936 non-redundant ncRNA sequences as the positive set and the shuffled sequences of those
positive samples as the negative samples. Then, they applied a t-test to find a set of features with
statistical significance (P < 0.05) for neural network-based model construction. In fact, many
feature selection methods have been applied in gene expression profile-based sample classification
studies such as the Tclass system developed by our laboratory [50]. All those feature selection
methods can be applied to select proper feature sets for sRNA prediction. Third, machine learning
methods such as neural networks and support vector machines are applied to develop the models. Fourth,
the models developed are applied to genome-wide discovery of sRNA genes for experimental validation.
If the number of predicted sRNA genes is very large, the comparative genomics-based models can be
further applied to reduce the number of the genes. The main challenge in developing machine
learning-based models lies in constructing training samples and features. For example, in the neural
network-based model presented by Carter [23], the genetic algorithm-based model presented by Saetrom
[24] and the model presented by Wang [25], the number of positive samples was enlarged by
incorporating the tRNA and rRNA sequences into the training dataset.

Compared to the comparative genomics-based models, machine learning-based models for sRNA gene
prediction have some advantages. For example, these models can be applied to find sRNA genes unique
to a given genome. However, when we apply these models to genome-wide discovery of sRNA genes, we
often divide the genome into fragments of a certain length and run prediction on each fragment
separately.
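
The fragmentation step described above can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed tools: the function names, the toy sequence, and the window, step and overlap values are all assumptions chosen for illustration, and a real pipeline would pass each fragment to a trained classifier.

```python
from typing import Dict, Iterator, List, Tuple


def sliding_windows(genome: str, window: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, fragment) pairs of fixed-length windows along a sequence."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if len(genome) <= window:
        # Sequence no longer than one window: return it whole.
        yield 0, genome
        return
    last_start = len(genome) - window
    start = 0
    while start < last_start:
        yield start, genome[start:start + window]
        start += step
    # The final window is anchored at the end so the tail is always covered.
    yield last_start, genome[last_start:]


def windows_by_size(genome: str, sizes: List[int],
                    overlap: float = 0.5) -> Dict[int, List[Tuple[int, str]]]:
    """Fragment the same sequence with several window sizes (hypothetical helper)."""
    result: Dict[int, List[Tuple[int, str]]] = {}
    for size in sizes:
        # Step derived from the requested overlap fraction; at least 1 nt.
        step = max(1, int(size * (1.0 - overlap)))
        result[size] = list(sliding_windows(genome, size, step))
    return result


if __name__ == "__main__":
    toy_genome = "ACGTACGTAC"  # toy sequence, not real genomic data
    for start, fragment in sliding_windows(toy_genome, window=4, step=3):
        print(start, fragment)
```

Keeping the window length and overlap as explicit parameters makes it easy to run the same classifier over several fragment lengths, which matters because sRNA genes vary widely in length.
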
If the fragment is too short, it might not contain enough information for sRNA genes. Conversely, if
the fragment is too large, it might contain noise. Therefore, it is very difficult to choose the
optimal window size for machine learning-based models due to the length heterogeneity of sRNA genes.
Because of this, Tran et al. constructed different models using different window sizes. This might be
the reason why the positive prediction value of machine learning-based models is lower than that of
comparative genomics-based models [26].

Prediction of sRNA targets

Developing prediction models for sRNA targets is very important. The strategy, which combines
bioinformatics prediction and experimental validation for sRNA gene discovery, can also be applied to
sRNA target identification [51]. To do this, understanding the features of sRNA-target interactions is
the initial key step. sRNAs exert their functions in the following two ways: (1) imperfect
base-pairing with their target mRNAs; and (2) binding proteins and altering their activity [43].
Imperfect base-pairing with mRNAs represents the major regulatory mechanism, which can lead to
translational repression, translational activation or mRNA degradation [52]. This mechanism is the
focus of current studies on sRNA-target interactions. We review the related prediction models below.
To date, two categories of methods, prediction models for general RNA–RNA interaction [53–65]
(Table 1) and models specifically designed for sRNA-target mRNA interactions in bacteria [28–36]
(Table 1), have been utilized in sRNA target discovery.

Prediction models for general RNA–RNA interactions

In essence, the sRNA-target mRNA interactions in bacteria fall into the class of RNA–RNA interactions.
Therefore, the models for general RNA–RNA interaction prediction (RIP) can also be applied to
investigate sRNA-target mRNA interaction.

The earliest methods for RIP find the hybridization structure with the minimum binding free energy
for two RNA molecules, using the program RNAfold [53,54] or Mfold [66] to fold the two concatenated
RNA sequences. Hybridization artifacts can arise from folding the concatenation of two RNA sequences.
To prevent such artifacts, many programs such as RNAcofold [54], RNAhybrid [55,56] and RNAplex [57]
were presented by extending the classical RNA secondary structure prediction algorithm to two
sequences. For instance, RNAhybrid [55,56] was a modification of the classic RNA secondary structure
prediction method, by neglecting intra-molecular base-pairings and multi-loops. This method was
originally proposed for miRNA target prediction, but it was also applied to sRNA target prediction by
Sharma et al. [67]. Compared to RNAhybrid, RNAplex [57] used a slightly different energy model to
reduce computational time. RNAplex performed 10–27 times faster than RNAhybrid [57].

The methods mentioned above ignore the secondary structures of two RNA molecules before they
interact. To improve the prediction performance, Muckstein et al. applied a dynamic programming
algorithm to search the minimum extended hybridization energy, which was defined as the sum of
hybridization energy and the energy for making the binding sites accessible [68].

Since pseudo-knots were not considered in either the classical RNA secondary structure prediction
algorithms or their extensions, the aforementioned programs cannot find loop–loop interactions
(kissing complex) between two RNA molecules. To address this problem, Alkan et al. presented inteRNA
[59] based on joint structure of two RNA molecules. When applied to CopA-CopT and OxyS-fhlA
interactions, inteRNA detected the loop–loop interactions successfully. Thereafter, multiple programs
such as piRNA [60], inRNA [61], rip [62], RactIP [63], ripalign [64] and PETcofold [65] have been
presented based on joint structure of two RNA molecules.

Although many programs for general RIP have been presented, most programs only provide the potential
binding sites between two RNA molecules rather than determine whether two RNA sequences interact or
not. In fact, two randomly selected RNA sequences can present many potential binding sites, which
cannot guarantee that two RNA sequences interact. These programs are only suitable for searching
binding sites given the interaction between an sRNA and a target mRNA. Therefore, it is impractical to
apply these models for genome-wide prediction of sRNA targets. It is necessary to develop specific
prediction models for sRNA targets.

Table 1 Main computational tools for prediction of bacterial sRNAs and their target mRNAs
Type | Tool | Availability | Main features | References
Comparative genomics-based models for sRNA prediction | QRNA | ftp://ftp.genetics.wustl.edu/pub/eddy/software/qrna.tar.Z | Sequence and secondary structure; suitable for two sequence alignment | [17]
Comparative genomics-based models for sRNA prediction | RNAz | http://www.tbi.univie.ac.at/~wash/RNAz | Sequence and secondary structure; suitable for multiple sequence alignment | [18]
Comparative genomics-based models for sRNA prediction | EvoFold | http://www.cbse.ucsc.edu/jsp/EvoFold | Sequence, structure and evolution; suitable for multiple sequence alignment | [19]
Comparative genomics-based models for sRNA prediction | SIPHT | http://bio.cs.wisc.edu/sRNA | Sequence and Rho-independent terminators | [20]
Comparative genomics-based models for sRNA prediction | sRNAPredict | http://www.tufts.edu/sackler/waldorlab/sRNAPredict.html | Sequence and Rho-independent terminators | [21]
Comparative genomics-based models for sRNA prediction | NAPP | – | Phylogenetic profiling of nucleic acid fragments; cluster analysis | [22]
Machine learning-based models for sRNA prediction | Carter et al. | http://rnagene.lbl.gov/ | Nucleotide compositions and secondary structure; neural networks and support vector machines | [23]

Prediction models specifically designed for sRNA-target mRNA interactions

The first prediction model specific to sRNA-target mRNA interaction was presented by Zhang et al.
[28]. They incorporated the following five features into the model: (1) Hfq-binding sites in both
sRNA and target mRNA sequences; (2) flanking sequence –35 to +15 nt around the translation initiation
sites in target mRNA sequences; (3) Hfq-binding sRNA structures; (4) extension alignment based on the
center of loop or bulge regions from sRNA secondary structure; and (5) conservation profiles of the
sRNAs and their targets among 8 closely-related organisms of E. coli K-12. For a given sRNA, this
model scores each potential sRNA–mRNA interaction based on a modified Smith–Waterman local sequence
alignment algorithm (a reward for a match and a penalty for a mismatch) and takes the mRNAs with top
10 or 50 scores as the potential targets. Among 10 experimentally-validated sRNA-target interactions,
there are 7 pairs ranked in the top 50 scores. However, this model has not been applied widely for
the following reasons. First, this model was designed specifically for the E. coli genome. For
example, the conservation
profile associated with E. coli was considered, which hinders people from applying the model in other
organisms. Second, the model only considers secondary structures of sRNAs rather than the joint
structures of two RNA sequences, which makes the model less competitive in comparison with the models
presented later. Third, there is no program provided for sRNA biologists.

The second model, termed TargetRNA, was presented by Tjaden et al. [29,30]. TargetRNA included an
individual base pair model and a stacked base pair model for calculating hybridization score for
sRNA-target interactions. The individual base pair model was based on a modified Smith–Waterman local
sequence alignment algorithm, and the stacked base pair model was a straightforward extension of RNA
folding approaches with intra-molecular base-pairing prohibited, which is very similar to the
statistical idea from RNAhybrid [55,56]. However, TargetRNA was optimized on a training dataset
containing 12 experimentally-verified sRNA-target mRNA interactions. The optimal translational
initiation region was –30 to +20 nt and seed length was 9 nt. For each potential sRNA-target mRNA
interaction, the model calculates the hybridization score, which was assumed to follow an extreme
value distribution. The extreme value distribution was obtained by considering a large number of
randomly-generated sRNA-target mRNA interactions. Therefore, for a given sRNA, all potential
sRNA-target mRNA interactions will be considered and the interactions with the top 10 or 50 smallest
P values will be taken as the putative interactions. As a result, TargetRNA can recover 8 of the 12
interactions among the predictions with the 10 smallest P values.

Mandin et al. proposed a model for sRNA target prediction by searching for strong sRNA-mRNA duplexes
[31]. Each sRNA-mRNA duplex was scored as a sum of both positive contributions and negative
contributions, which correspond to pairing nucleotides and bulges/internal loops, respectively. The
cost of bulges and internal loops was empirically gauged using four validated sRNA-mRNA interactions.
The statistical significance of the duplex was used as the criterion for interaction, which was
assessed by comparing to an ensemble of random sequences. During prediction, the flanking regions,
–140 to +90 nt around the translation initiation sites and –60 to +90 nt around the translation stop
sites in target mRNA sequences, were considered.

All of the aforementioned models only take a certain number of top predictions (with the largest
comparison scores, smallest free energies or smallest P values) as potential targets. To determine
clearly whether a given sRNA-mRNA pair interacts or not, our group systematically collected 46
positive samples (true interactions) and 86 negative samples (no interaction) as the training dataset.
Then, according to the positions of mRNA binding sites from the validated sRNA-target mRNA
interactions at that time, sub-sequences located within –30 to +30 nt of the initial start codons of
targets were selected as core binding regions. Based on the hypothesis that sequences flanking the
core binding regions are also likely to influence the interactions, we also extracted these flanking
sequences using sliding windows. For each sub-sequence, 10 features were computed, including the
percent composition of bases in interior loops, the minimum free energy (MFE) of hybridization, and
the difference in the MFE values before and after hybridization. Each sRNA-target mRNA interaction
was described by 10,000 features. Third, we applied the Tclass system [50] and support vector
machines to construct prediction models sRNATargetNB and sRNATargetSVM, respectively [33,34]. The
main difference between sRNATargetNB and sRNATargetSVM is that the former only takes six features,
which were selected from 10,000 initial features using the Tclass system [50], to determine whether a
given pair of sRNA and mRNA interacts or not, whereas the latter needs 10,000 features. Therefore,
sRNATargetNB runs faster. Finally, the performance of the two models above was evaluated on an
independent test set containing 22 positive samples and 1700 randomly-generated negative samples.
Prediction accuracies are 93.03% and 80.55%, respectively.

IntaRNA, presented by Busch et al. [32], incorporated accessibility of binding sites of two RNA
molecules and a user-definable seed. Similar to RNAup [53,58], IntaRNA searched for the optimal
interaction with the minimum extended hybridization energy, which was defined as the sum of
hybridization energy and the energy to make the binding sites accessible. The difference between
IntaRNA and RNAup is that MFE values for seed regions are also included in the calculation of the
minimum extended hybridization energy in IntaRNA. Three factors make IntaRNA outperform other simpler
programs like RNAhybrid: (i) finding the optimal structure with the MFE; (ii) summing the energy for
opening original structures of binding sites; and (iii) involving the MFE of seed regions. IntaRNA
provides the binding sites of two RNA molecules and the energy of the hybridization, rather than a
judgment of whether they interact.

From these models, we can see that different potential binding regions are considered in different
models. So, which regions are suitable for sRNA target prediction? To address this problem, we
continued our efforts to collect sRNA targets in peer-reviewed papers and constructed the database
sRNATarBase [5], which contains 138 sRNA–target interactions and 252 non-interaction entries. Using
this database, we found that binding regions of 95.79% of the targets (91 of 95 entries containing
binding regions) are located in the region –150 to +100 nt around the initial start codon of the
targets. We therefore proposed another method termed sTarPicker to improve the performance of sRNA
target prediction [36].

The sTarPicker method was based on a two-step model for hybridization between an sRNA and an mRNA
target. The model first selects stable duplexes after screening all possible duplexes between the
sRNA and the potential mRNA target. Next, hybridization between the sRNA and the target is extended
to span the entire binding site. Finally, quantitative predictions are produced with an ensemble
classifier generated using the Tclass system,
originally developed for gene expression profile-based sample classification by our laboratory [50].
In determining the hybridization energies of seed regions and binding regions, both thermodynamic
stability and site accessibility of the sRNAs and targets were considered. The major difference
between the hybridization model in sTarPicker and the one used in IntaRNA lies in the filtering of
seed regions. IntaRNA does not filter any seed regions and instead searches for the optimal
hybridization of two RNA molecules with the minimum extended hybridization energy over the whole
length of the two RNAs. sTarPicker first finds all possible seed regions, then removes the seed
regions with high hybridization energy. Here we assume that only stable seed hybridization results in
stable hybridization between two RNA molecules, which was verified by the real sRNA-target mRNA
interactions from sRNATarBase [5].

Compared to IntaRNA, sRNATarget and TargetRNA, sTarPicker performed best in both target prediction
and accuracy of the predicted binding sites on 17 non-redundant validated sRNA-target pairs [36].

Recently, Eggenhofer et al. developed a web server termed RNApredator specifically for prediction of
sRNA targets [35]. RNApredator predicts sRNA targets using RNAplex [57]. To improve the prediction
specificity, RNApredator also takes into account the accessibility of the target. To enable fast
computation, the accessibility is pre-computed using RNAplfold [69,70]. During prediction, the web
server considers the regions –200 to +200 nt of both 5′ and 3′ UTR (default) as the potential binding
regions and the top 100 predictions as the potential interactions.

Future thinking in developing bioinformatics models for bacterial sRNAs and their targets

Here we have briefly presented an overview of prediction models for bacterial sRNAs and their
targets, and pointed out the advantages and disadvantages of each class of models. Although these
models have provided much support for experimental discovery of sRNAs and their targets, they are not
perfect. Here we want to emphasize three future directions in developing bioinformatics models.

The first direction is to improve the existing prediction models. Compared to methods for open
reading frame identification, the prediction accuracy for sRNAs is still very low. For example,
sTarPicker has the highest positive prediction value on the independent test dataset [36]; however, a
large number of false positive samples were included in the prediction results. Therefore, developing
better models for sRNAs and their targets is still necessary. From the perspective of statistics, we
first need more samples. At present, some databases, such as sRNAMap [1] and Rfam [71] for sRNAs and
sRNATarBase [5] for sRNA targets, have been developed. These databases provide a data source for
model development. The key point is to construct suitable features to describe the bacterial sRNA
gene and sRNA-target mRNA interaction. To this end, before new features are explored, it might be
better to comprehensively integrate all features currently available to describe sRNAs or sRNA-target
mRNA interactions. Then, different strategies for feature selection in machine learning-based model
construction can be applied to search for suitable features or their combinations.

The considerations mentioned above can also be applied to the second direction, i.e., developing
prediction models for sRNA target proteins. To our knowledge, there is no prediction model
specifically for sRNA target proteins. Although the general prediction model for RNA-protein
interaction can be applied here [72], we believe that models based on the sRNA-protein interaction in
bacteria will provide better support for the discovery of sRNA target proteins. To this end, we have
been collecting the validated sRNA-protein interactions in the database sRNATarBase [5]. However, the
number of samples is so low that we are not able to develop a reliable model yet.

The third direction involves developing comprehensive bioinformatics pipelines for the discovery of
sRNAs and sRNA-target interactions using high throughput sequencing technology (HTS). With the
application of HTS, a large number of short reads will be generated. How to efficiently manage these
short reads and to find potential sRNAs has become an important bioinformatics topic in HTS-based
sRNA discovery. For example, in their recent paper [73], Pellin and his colleagues presented a
bioinformatics pipeline for sRNA discovery in Mycobacterium tuberculosis using RNA-seq and
conservation analysis, and a list of 1948 candidate sRNAs was found. Currently, HTS has been widely
applied in molecular biology, resulting in the discovery of sRNA transcripts [74–81], identification
of human miRNA-mRNA [82] or RNA-protein interactions [83–85] and determination of mRNA secondary
structure [86–88]. However, HTS has not been applied to investigate the interactions of sRNA-protein
and sRNA-mRNA in bacteria. We predict that HTS will soon have widespread application in sRNA biology.

Competing interests

The authors have no competing interests to declare.

Acknowledgements

This work was supported by grants from National Key Basic Research and Development Program (Grant
No. 2010CB912801), and National Natural Science Foundation of China (Grant No. 31071157 and 31271404).

References

[1] Huang HY, Chang HY, Chou CH, Tseng CP, Ho SY, Yang CD, et al. sRNAMap: genomic maps for small non-coding RNAs, their regulators and their targets in microbial genomes. Nucleic Acids Res 2009;37:D150–4.
[2] Livny J, Waldor MK. Identification of small RNAs in diverse bacterial species. Curr Opin Microbiol 2007;10:96–101.
[3] Gottesman S, Storz G. Bacterial small RNA regulators: versatile roles and rapidly evolving variations. Cold Spring Harb Perspect Biol 2011;3.
[4] Vanderpool CK, Balasubramanian D, Lloyd CR. Dual-function RNA regulators in bacteria. Biochimie 2011;93:1943–9.
[5] Cao Y, Wu J, Liu Q, Zhao Y, Ying X, Cha L, et al. sRNATarBase: a comprehensive database of bacterial sRNA targets verified by experiments. RNA 2010;16:2051–7.
[6] Guillier M, Gottesman S. Remodelling of the Escherichia coli outer membrane by two small regulatory RNAs. Mol Microbiol 2006;59:231–47.
[7] Valentin-Hansen P, Johansen J, Rasmussen AA. Small RNAs controlling outer membrane porins. Curr Opin Microbiol 2007;10:152–5.
[8] Massé E, Vanderpool CK, Gottesman S. Effect of RyhB small RNA on global iron use in Escherichia coli. J Bacteriol 2005;187:6962–71.
[9] Massé E, Salvail H, Desnoyers G, Arguin M. Small RNAs controlling iron metabolism. Curr Opin Microbiol 2007;10:140–5.
[10] Večerek B, Moll I, Bläsi U. Control of Fur synthesis by the non-coding RNA RyhB and iron-responsive decoding. EMBO J 2007;26:965–75.
[11] Lenz DH, Miller MB, Zhu J, Kulkarni RV, Bassler BL. CsrA and three redundant small RNAs regulate quorum sensing in Vibrio cholerae. Mol Microbiol 2005;58:1186–202.
[12] Tu KC, Bassler BL. Multiple small RNAs act additively to integrate sensory information and control quorum sensing in Vibrio harveyi. Genes Dev 2007;21:221–33.
[13] Romby P, Vandenesch F, Wagner EGH. The role of RNAs in the regulation of virulence-gene expression. Curr Opin Microbiol 2006;9:229–36.
[14] Toledo-Arana A, Repoila F, Cossart P. Small noncoding RNAs controlling pathogenesis. Curr Opin Microbiol 2007;10:182–8.
[15] Voss B, Georg J, Schön V, Ude S, Hess WR. Biocomputational prediction of non-coding RNAs in model cyanobacteria. BMC Genomics 2009;10:123.
[16] Acebo P, Martin-Galiano AJ, Navarro S, Zaballos Á, Amblar M. Identification of 88 regulatory small RNAs in the TIGR4 strain of the human pathogen Streptococcus pneumoniae. RNA 2012;18:530–46.
[17] Rivas E, Eddy S. Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics 2001;2:8.
[18] Washietl S, Hofacker IL, Stadler PF. Fast and reliable prediction of
[28] Zhang Y, Sun S, Wu T, Wang J, Liu C, Chen L, et al. Identifying Hfq-binding small RNA targets in Escherichia coli. Biochem Biophys Res Commun 2006;343:950–5.
[29] Tjaden B, Goodwin SS, Opdyke JA, Guillier M, Fu DX, Gottesman S, et al. Target prediction for small, noncoding RNAs in bacteria. Nucleic Acids Res 2006;34:2791–802.
[30] Tjaden B. TargetRNA: a tool for predicting targets of small RNA action in bacteria. Nucleic Acids Res 2008;36:W109–13.
[31] Mandin P, Repoila F, Vergassola M, Geissmann T, Cossart P. Identification of new noncoding RNAs in Listeria monocytogenes and prediction of mRNA targets. Nucleic Acids Res 2007;35:962–74.
[32] Busch A, Richter AS, Backofen R. IntaRNA: efficient prediction of bacterial sRNA targets incorporating target site accessibility and seed regions. Bioinformatics 2008;24:2849–56.
[33] Zhao Y, Li H, Hou Y, Cha L, Cao Y, Wang L, et al. Construction of two mathematical models for prediction of bacterial sRNA targets. Biochem Biophys Res Commun 2008;372:346–50.
[34] Cao Y, Zhao Y, Cha L, Ying X, Wang L, Shao N, et al. sRNATarget: a web server for prediction of bacterial sRNA targets. Bioinformation 2009;3:364–6.
[35] Eggenhofer F, Tafer H, Stadler PF, Hofacker IL. RNApredator: fast accessibility-based prediction of sRNA targets. Nucleic Acids Res 2011;39:W149–54.
[36] Ying X, Cao Y, Wu J, Liu Q, Cha L, Li W. sTarPicker: a method for efficient prediction of bacterial sRNA targets based on a two-step model for hybridization. PLoS One 2011;6:e22705.
[37] Vogel J, Wagner EGH. Target identification of small noncoding RNAs in bacteria. Curr Opin Microbiol 2007;10:262–70.
[38] Pichon C, Felden B. Small RNA gene identification and mRNA target predictions in bacteria. Bioinformatics 2008;24:2807–13.
[39] Backofen R, Hess WR. Computational prediction of sRNAs and their targets in bacteria. RNA Biol 2010;7:33–42.
[40] Sharma CM, Vogel J. Experimental approaches for the discovery and characterization of regulatory small RNA. Curr Opin Microbiol 2009;12:536–46.
[41] Kozomara A, Griffiths-Jones S. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res 2011;39:D152–7.
noncoding RNAs. Proc Natl Acad Sci U S A 2005;102:2454–9. [42] Eddy SR. Computational genomics of noncoding RNA genes. Cell
[19] Pedersen JS, Bejerano G, Siepel A, Rosenbloom K, Lindblad-Toh K, 2002;109:137–40.
Lander ES, et al. Identification and classification of conserved RNA [43] Storz G, Altuvia S, Wassarman KM. An abundance of RNA
secondary structures in the human genome. PLoS Comput Biol regulators. Annu Rev Biochem 2005;74:199–217.
2006;2:e33. [44] Hershberg R, Altuvia S, Margalit H. A survey of small RNA-
[20] Livny J, Teonadi H, Livny M, Waldor MK. High-throughput, encoding genes in Escherichia coli. Nucleic Acids Res 2003;31:
kingdom-wide prediction and annotation of bacterial non-coding 1813–20.
RNAs. PLoS One 2008;3:e3197. [45] Vogel J, Sharma CM. How to find small non-coding RNAs in
[21] Livny J, Fogel MA, Davis BM, Waldor MK. SRNAPredict: an bacteria. Biol Chem 2005;386:1219–38.
integrative computational approach to identify sRNAs in bacterial [46] Argaman L, Hershberg R, Vogel J, Bejerano G, Wagner EGH,
genomes. Nucleic Acids Res 2005;33:4096–105. Margalit H, et al. Novel small RNA-encoding genes in the intergenic
[22] Marchais A, Naville M, Bohn C, Bouloc P, Gautheret D. Single-pass regions of Escherichia coli. Curr Biol 2001;11:941–50.
classification of all noncoding sequences in a bacterial genome using [47] Rivas E, Klein RJ, Jones TA, Eddy SR. Computational identification
phylogenetic profiles. Genome Res 2009;19:1084–92. of noncoding RNAs in E. coli by comparative genomics. Curr Biol
[23] Carter RJ, Dubchak I, Holbrook SR. A computational approach to 2001;11:1369–73.
identify genes for functional RNAs in genomic sequences. Nucleic [48] Rose D, Hertel J, Reiche K, Stadler PF, Hackermüller J. NcDN-
Acids Res 2001;29:3928–38. Align: plausible multiple alignments of non-protein-coding genomic
[24] Strom P, Sneve R, Kristiansen KI, Snøve O, Grünfeld T, Rognes T, sequences. Genomics 2008;92:65–74.
et al. Predicting non-coding RNA genes in Escherichia coli with [49] Lu X, Goodrich-Blair H, Tjaden B. Assessing computational tools for
boosted genetic programming. Nucleic Acids Res 2005;33:3263–70. the discovery of small RNA genes in bacteria. RNA 2011;17:1635–47.
[25] Wang C, Ding C, Meraz RF, Holbrook SR. PSoL: a positive sample [50] Li W, Xiong MM. Tclass: tumor classification system based on gene
only learning algorithm for finding non-coding RNA genes. Bioin- expression profile. Bioinformatics 2002;18:325–6.
formatics 2006;22:2590–6. [51] Richter AS, Schleberger C, Backofen R, Steglich C. Seed-based
[26] Tran TT, Zhou F, Marshburn S, Stead M, Kushner SR, Xu Y. De IntaRNA prediction combined with GFP-reporter system identifies
novo computational prediction of non-coding RNA genes in mRNA targets of the small RNA Yfr1. Bioinformatics 2010;26:1–5.
prokaryotic genomes. Bioinformatics 2009;25:2897–905. [52] Storz G, Opdyke JA, Zhang A. Controlling mRNA stability and
[27] Wagner EGH. Kill the messenger: bacterial antisense RNA promotes translation with small, noncoding RNAs. Curr Opin Microbiol
mRNA decay. Nat Struct Mol Biol 2009;16:804–6. 2004;7:140–4.
[53] Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P. Fast folding and comparison of RNA secondary structures. Monatsh Chem 1994;125:167–88.
[54] Lorenz R, Bernhart SH, Höner Zu Siederdissen C, Tafer H, Flamm C, Stadler PF, et al. ViennaRNA package 2.0. Algorithms Mol Biol 2011;6:26.
[55] Rehmsmeier M, Steffen P, Höchsmann M, Giegerich R. Fast and effective prediction of microRNA/target duplexes. RNA 2004;10:1507–17.
[56] Kruger J, Rehmsmeier M. RNAhybrid: microRNA target prediction easy, fast and flexible. Nucleic Acids Res 2006;34:W451–4.
[57] Tafer H, Hofacker IL. RNAplex: a fast tool for RNA–RNA interaction search. Bioinformatics 2008;24:2657–63.
[58] Muckstein U, Tafer H, Hackermuller J, Bernhart SH, Stadler PF, Hofacker IL. Thermodynamics of RNA–RNA binding. Bioinformatics 2006;22:1177–82.
[59] Alkan C, Karakoc E, Nadeau JH, Sahinalp SC, Zhang K. RNA–RNA interaction prediction and antisense RNA target search. J Comput Biol 2006;13:267–82.
[60] Chitsaz H, Salari R, Sahinalp SC, Backofen R. A partition function algorithm for interacting nucleic acid strands. Bioinformatics 2009;25:i365–73.
[61] Salari R, Backofen R, Sahinalp SC. Fast prediction of RNA-RNA interaction. Algorithms Mol Biol 2010;5:5.
[62] Huang FWD, Qin J, Reidys CM, Stadler PF. Partition function and base pairing probabilities for RNA–RNA interaction prediction. Bioinformatics 2009;25:2646–54.
[63] Kato Y, Sato K, Hamada M, Watanabe Y, Asai K, Akutsu T. RactIP: fast and accurate prediction of RNA–RNA interaction using integer programming. Bioinformatics 2010;26:i460–6.
[64] Li AX, Marz M, Qin J, Reidys CM. RNA–RNA interaction prediction based on multiple sequence alignments. Bioinformatics 2011;27:456–63.
[65] Seemann SE, Richter AS, Gesell T, Backofen R, Gorodkin J. PETcofold: predicting conserved interactions and structures of two multiple alignments of RNA sequences. Bioinformatics 2011;27:211–9.
[66] Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 2003;31:3406–15.
[67] Sharma CM, Darfeuille F, Plantinga TH, Vogel J. A small RNA regulates multiple ABC transporter mRNAs by targeting C/A-rich elements inside and upstream of ribosome-binding sites. Genes Dev 2007;21:2804–17.
[68] Muckstein U, Tafer H, Bernhart SH, Hernandez-Rosales M, Vogel J, Stadler PF, et al. Translational control by RNA–RNA interaction: improved computation of RNA–RNA binding thermodynamics. In: Elloumi M, Küng J, Linial M, Murphy RF, Schneider K, Toma C, editors. Bioinformatics research and development, vol. 13. Berlin, Heidelberg: Springer; 2008. p. 114–27.
[69] Bompfünewerer AF, Backofen R, Bernhart SH, Hertel J, Hofacker IL, Stadler PF, et al. Variations on RNA folding and alignment: lessons from Benasque. J Math Biol 2008;56:129–44.
[70] Stephan B, Ullrike M, Ivo H. RNA accessibility in cubic time. Algorithms Mol Biol 2011;6:3.
[71] Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S, et al. Rfam: updates to the RNA families database. Nucleic Acids Res 2009;37:D136–40.
[72] Muppirala UK, Honavar VG, Dobbs D. Predicting RNA-protein interactions using only sequence information. BMC Bioinformatics 2011;12:489.
[73] Pellin D, Miotto P, Ambrosi A, Cirillo DM, Di Serio C. A genome-wide identification analysis of small regulatory RNAs in Mycobacterium tuberculosis by RNA-Seq and conservation analysis. PLoS One 2012;7:e32723.
[74] Oliver HF, Orsi RH, Ponnala L, Keich U, Wang W, Sun Q, et al. Deep RNA sequencing of L. monocytogenes reveals overlapping and extensive stationary phase and sigma B-dependent transcriptomes, including multiple highly transcribed noncoding RNAs. BMC Genomics 2009;10:641.
[75] Yoder-Himes DR, Chain PSG, Zhu Y, Wurtzel O, Rubin EM, Tiedje JM, et al. Mapping the Burkholderia cenocepacia niche response via high-throughput sequencing. Proc Natl Acad Sci U S A 2009;106:3976.
[76] Camarena L, Bruno V, Euskirchen G, Poggio S, Snyder M. Molecular mechanisms of ethanol-induced pathogenesis revealed by RNA-sequencing. PLoS Pathog 2010;6:e1000834.
[77] Kolev NG, Franklin JB, Carmi S, Shi H, Michaeli S, Tschudi C. The transcriptome of the human pathogen Trypanosoma brucei at single-nucleotide resolution. PLoS Pathog 2010;6:e1001090.
[78] Sharma CM, Hoffmann S, Darfeuille F, Reignier J, Findeiss S, Sittka A, et al. The primary transcriptome of the major human pathogen Helicobacter pylori. Nature 2010;464:250–5.
[79] Raghavan R, Groisman EA, Ochman H. Genome-wide detection of novel regulatory RNAs in E. coli. Genome Res 2011;21:1487–97.
[80] Atsuko S, Motomu M, Kiriko H, Wataru N, Reiko H, Kenji N, et al. Deep sequencing reveals as-yet-undiscovered small RNAs in Escherichia coli. BMC Genomics 2011;12:428.
[81] Kumar R, Lawrence ML, Watt J, Cooksey AM, Burgess SC, Nanduri B. RNA-Seq based transcriptional map of bovine respiratory disease pathogen “Histophilus somni 2336”. PLoS One 2012;7:e29435.
[82] Chi SW, Zang JB, Mele A, Darnell RB. Argonaute HITS-CLIP decodes microRNA–mRNA interaction maps. Nature 2009;460:479–86.
[83] Sittka A, Lucchini S, Papenfort K, Sharma CM, Rolle K, Binnewies TT, et al. Deep sequencing analysis of small noncoding RNA and mRNA targets of the global post-transcriptional regulator, Hfq. PLoS Genet 2008;4:e1000163.
[84] Hafner M, Landthaler M, Burger L, Khorshid M, Hausser J, Berninger P, et al. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 2010;141:129–41.
[85] Zhang C, Darnell RB. Mapping in vivo protein-RNA interactions at single-nucleotide resolution from HITS-CLIP data. Nat Biotechnol 2011;29:607–14.
[86] Mauger DM, Weeks KM. Toward global RNA structure analysis. Nat Biotechnol 2010;28:1178–9.
[87] Underwood JG, Uzilov AV, Katzman S, Onodera CS, Mainzer JE, Mathews DH, et al. FragSeq: transcriptome-wide RNA structure probing using high-throughput sequencing. Nat Methods 2010;7:995–1001.
[88] Kertesz M, Wan Y, Mazor E, Rinn JL, Nutter RC, Chang HY, et al. Genome-wide measurement of RNA secondary structure in yeast. Nature 2010;467:103–7.