Computational_Characterization_of_Transc
Volume 9, Issue 8, August -2021, Impact Factor: 7.429, Available online at: www.ijaresm.com
ABSTRACT
The completion of the sequencing of the human, mouse, and other genomes has allowed efforts to annotate these
genomes extensively using a combination of computational and experimental methodologies. A thorough list of
transcripts, together with basic information on where the various transcripts are expressed, is a vital initial step
toward annotating a genome after it has been fully sequenced. The process of finding the transcribed areas of a
sequenced genome is made more difficult by the fact that transcripts are made up of several small exons that are
spread throughout considerably larger amounts of genomic DNA. The wildly disparate predictions about the
number of genes in the human genome highlight this difficulty.
Key words: Human genome, Computational characterization, Gene transcription, Gene expression
INTRODUCTION
The link between gene expression dynamics and biological function has long been studied (Davidson et al., 1963; Hydén and
Lange, 1965). While it is obvious that measuring gene expression cannot capture all of the information content of a cell, the
ease with which nucleic acids can be manipulated has led to the widespread use of gene expression measures in many fields
of biology. Recent advances in microarray (Schena, 1996) and sequencing technology (Wang et al., 2009) have
significantly reduced the cost and boosted the throughput of detecting RNA gene expression, to the point that a search for
the terms “differential gene expression” on NCBI PubMed yielded 68,519 hits. RNA sequencing (RNA-Seq) has emerged
as a dominant technology for determining gene expression levels (Kukurba and Montgomery, 2015). Indeed, RNA-Seq
technology for assessing gene expression has become nearly ubiquitous in biomedical research, with studies now routinely
sequencing hundreds of samples (Carithers and Moore, 2015). However, several known and unknown biases exist in RNA
quantification, and efforts have been made to reduce these effects (Love et al., 2016). In many circumstances, the
researcher's choice of analysis technique dictates which of these biases are significant and which may be safely ignored.
Only a few long noncoding RNAs have been functionally characterized. They have been shown to modulate gene regulation by
interacting with other macromolecules in the cell, including DNA, RNA, and protein (Jalali et al., 2017). Advances in
experimental and computational methods have substantially aided genomic research over the last two decades.
Next-generation sequencing technologies have reduced the cost of de novo sequencing of large genomes, and powerful
computational tools have permitted precise annotation of genomic DNA sequences. Noncoding RNAs, repetitive elements, chromatin
states, epigenetic changes, and gene regulatory elements must all be considered when mapping functional regions in genomes.
This genuinely worldwide initiative has relied heavily on a combination of comparative genomics, high-throughput biology
investigations, and machine learning technologies (Taher et al., 2015).
For more than 15 years, researchers have been looking for innovative ways to identify genes in anonymous genomic
sequences. During this time, the research has progressed from developing systems to detect protein coding regions in
compact mitochondrial or bacterial genomes to the problem of predicting the precise organization of multi-exon
vertebrate genes. The best program currently available locates more than 80% of the internal coding exons perfectly, and
only 5% of the predictions do not overlap an actual exon. Given this precision, computational approaches are clearly
useful, yet they do not eliminate the need for experimental validation. Although performance in identifying
the coding portion of genes (internal coding exons) is excellent, determining the full extent of the transcript (5′
and 3′ extremities of the gene) and the location of promoter regions remains uncertain (Claverie, 1997).
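The two exon-level figures quoted above (exons located perfectly, and predictions that overlap no real exon) can be illustrated with a minimal sketch. The function names, the inclusive coordinate convention, and the toy exon lists are assumptions made for illustration; they are not taken from Claverie (1997) or from any particular gene-finding program.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive coordinates on one strand


def overlaps(a: Exon, b: Exon) -> bool:
    """True if two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def exon_level_accuracy(predicted: List[Exon],
                        annotated: List[Exon]) -> Tuple[float, float]:
    """Return (fraction of annotated exons predicted with both boundaries exact,
    fraction of predicted exons that overlap no annotated exon)."""
    annotated_set = set(annotated)
    exact = sum(1 for exon in set(predicted) if exon in annotated_set)
    wrong = sum(1 for p in predicted
                if not any(overlaps(p, a) for a in annotated))
    sensitivity = exact / len(annotated) if annotated else 0.0
    wrong_fraction = wrong / len(predicted) if predicted else 0.0
    return sensitivity, wrong_fraction


# Hypothetical toy data, for illustration only.
annotated = [(100, 200), (300, 400), (500, 600), (700, 800), (900, 1000)]
predicted = [(100, 200), (300, 400), (500, 600), (700, 790), (1200, 1300)]
sensitivity, wrong_fraction = exon_level_accuracy(predicted, annotated)
print(f"exact exons: {sensitivity:.0%}, wrong exons: {wrong_fraction:.0%}")
```

In this toy case the prediction (700, 790) overlaps a real exon but misses its 3′ boundary, so it counts neither as an exact exon nor as a wrong one. This is why the two figures need not sum to 100%.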
Transcription Termination
Transcription termination occurs when the polymerase is released following a transcription event, thereby delimiting
transcription units; nevertheless, the functional significance of termination extends beyond the simple defining of gene
borders. Transcription termination pathways influence the transcriptome by deciding the cellular fate of the produced
transcripts. Recent studies have highlighted the importance of these pathways in controlling the degree of widespread
transcription, which has sparked interest in post-initiation processes in gene expression control. Transcription termination
pathways involved in non-coding RNA generation, such as the Nrd1–Nab3–Sen1 (NNS) system in yeast and the cap-binding
complex (CBC)–ARS2 pathway in humans, are important determinants of transcription quality control (Porrua and Libri, 2015).
Multiple transcript isoforms can be generated from a single gene via alternative splicing, alternative promoter use, and
alternative polyadenylation (de Klerk and 't Hoen, 2015). At least 70% of genes in mammalian genomes have multiple
polyadenylation sites, >50% of genes have alternative transcription start sites, and virtually all genes undergo alternative
splicing (Tian and Manley, 2016). As a result, these molecular processes have the potential to significantly expand the
repertoire of transcripts, proteins, and activities encoded by mammalian genomes (Forrest et al., 2014). Alternative
transcript isoforms regulate critical biological processes (Gabut et al., 2011), and their misexpression is linked to diseases
such as cancer (Sendoel et al., 2017). Alternative transcripts produce alternative proteins with distinct protein interactions,
subcellular localization, stability, DNA-binding characteristics, lipid-binding properties, or enzymatic activity for dozens of
genes (Yang et al., 2016).
Although DNA sequencing has (nearly) entirely revealed the sequence of the human genome, it is still not fully understood.
Although most (if not all) genes have been found using a combination of high throughput experimental and bioinformatics
approaches, significant work remains to be done to clarify the biological activities of their protein and RNA products.
Recent findings reveal that the great majority of the genome's noncoding DNA is involved in biochemical activities such as
gene expression regulation, chromosome architecture organization, and signals influencing epigenetic inheritance.
Before whole-genome sequencing, estimates of the number of human genes ranged from 50,000 to 140,000 (often without
making clear whether these numbers included non-protein-coding genes) (IHGSC, 2004). As the quality of genome sequences and
gene-finding methods improved, the number of recognized protein-coding genes fell to 19,000-20,000
(Ezkurdia et al., 2014; Saey, 2018). However, a better understanding of the importance of sequences
that do not encode proteins but instead express regulatory RNA has increased the overall number of genes to at least
46,831 (Alles et al., 2019), plus an additional 2300 micro-RNA genes (Pennisi, 2012). Functional DNA elements that do not
encode RNA or proteins were reported in 2012 (Zhang, 2018). A 2018 population study discovered an additional 300
million bases of human genome that were not included in the reference sequence (IHGSC, 2001).
Some noncoding DNA contains genes that code for RNA molecules that perform crucial biological functions (noncoding
RNA, for example ribosomal RNA and transfer RNA). An essential objective of modern genome research, such as the
ENCODE project, is to investigate the complete human genome using a range of experimental methods whose findings
reveal molecular activity. Because non-coding DNA outnumbers coding DNA, the concept of the sequenced
genome has evolved into a more concentrated analytical concept than the traditional concept of the DNA-coding gene
(Waters, 2007; Gannett, 2008).
Common Mechanism of Transcription Termination at Coding and Noncoding RNA Genes in Fission Yeast
The synthesis of ncRNAs requires mRNA 3′ end processing factors.
A 2018 study by Larochelle et al. reported that findings at the 3′ end of ncRNA genes indicated that the mRNA 3′ end
processing machinery can play an active role in snoRNA production. Conditional strains were created to investigate the
role of Rna14, Pcf11, Ysh1, and Dhp1 in the synthesis of ncRNAs, as these proteins are all encoded by essential genes.
The authors created strains with the endogenous pcf11 and dhp1 genes under the control of the thiamine-repressible nmt81
promoter, an approach previously used to deplete essential core exosome subunits and Seb1.
Notably, the levels of Seb1, Pcf11, and Dhp1 expressed from the nmt promoter were comparable to those expressed from
their endogenous promoters under non-repressive conditions, but were significantly reduced upon thiamine
supplementation (Fig. 1a-c). Pnmt-pcf11 and Pnmt-dhp1 cells cultivated in thiamine-supplemented
media displayed growth arrest, indicating that Pcf11 and Dhp1 are required for viability (Fig. 1d). Pcf11 depletion in S.
pombe also reduced mRNA production, as expected for a protein required for mRNA synthesis. The
rapamycin-dependent anchor-away system (Fig. 1e) was used to construct conditional strains for rna14 and ysh1, as
the nmt promoter did not provide sufficient Rna14 and Ysh1 depletion to trigger growth arrest. Importantly,
rapamycin-dependent nuclear depletion of Ysh1 and Rna14 impaired mRNA synthesis (Fig. 1h, i) and resulted in the
production of read-through transcripts (Fig. 1j), consistent with 3′ end processing defects at protein-coding genes.
Larochelle et al. (2018) employed an RNase H cleavage assay that can detect both mature snoRNAs and 3′-extended
polyadenylated snoRNA precursors to investigate the contribution of Pcf11, Dhp1, Ysh1, and Rna14 to snoRNA 3′ end
processing (Fig. 1a). A pab2 mutant, which accumulates 3′-extended polyadenylated precursors and shows 45 and 34 percent
reductions in snR3 and snR99 snoRNA levels, respectively, served as a control (Fig. 1b, c, compare lanes 3–4 to 1–2).
Depletion of Pcf11, Ysh1, or Rna14 also decreased the levels of snR3 and snR99 snoRNAs: by 39 and 42 percent for Pcf11
(Fig. 1b, c, lanes 5–6; Fig. 1k), by 45 and 46 percent for Ysh1 (Fig. 1d, e, lanes 3–4), and by 41 and 33 percent for Rna14.
In addition to the polyadenylated pre-snoRNAs revealed by RNase H assays (Fig. 1b, e), long snR3 read-through products
were detected by northern blot analysis in cells depleted of Pcf11, Ysh1, and Rna14 (Fig. 1f, lanes 2–3 and 5).
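The percent decreases in snoRNA levels reported above are relative measurements. One plausible way to compute such a figure is to normalize each band to a loading control (the 5S rRNA is named as the loading control in Fig. 1) and express the mutant as a fraction of wild type. The sketch below assumes this calculation. The function name and the band intensities are hypothetical and are not taken from Larochelle et al. (2018), whose exact quantification procedure is not described here.

```python
def percent_decrease(mutant_signal: float, mutant_loading: float,
                     wt_signal: float, wt_loading: float) -> float:
    """Percent drop of a loading-normalized band in a mutant relative to wild type."""
    if min(mutant_loading, wt_signal, wt_loading) <= 0:
        raise ValueError("loading controls and wild-type signal must be positive")
    mutant_norm = mutant_signal / mutant_loading  # e.g. snoRNA band / 5S rRNA band
    wt_norm = wt_signal / wt_loading
    return 100.0 * (1.0 - mutant_norm / wt_norm)


# Hypothetical band intensities (arbitrary units), not values from the study.
decrease = percent_decrease(mutant_signal=610.0, mutant_loading=1000.0,
                            wt_signal=1000.0, wt_loading=1000.0)
print(f"{decrease:.0f}% decrease")
```

Dividing by the loading control first means that a lane loaded with less total RNA is not mistaken for a genuine reduction in snoRNA abundance.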
In Dhp1-deficient cells, by contrast, 3′-extended snoRNAs did not accumulate (Fig. 1b, c, lanes 9–10; Fig. 1k; and
Fig. 1f, lane 7), which suggests that Dhp1 functions in transcription termination downstream of 3′ end processing. As
previously demonstrated, Seb1 deficiency did not impair RNA production, as evidenced by comparable levels of
mature snoRNA in wild-type and Seb1-deficient cells, but altered cleavage site selection, as evidenced by the
lengthening of the 3′-extended snoRNA precursors in RNase H assays (Fig. 1b, c, compare lanes 1–2 and 7–8; Fig. 1k).
These findings imply that mRNA 3′ end processing factors are necessary for the synthesis of independently transcribed
snoRNAs. Together with evidence indicating the recruitment of 3′ end processing components downstream of ncRNA genes
(Fig. 1), they imply that ncRNA 3′ end formation involves cleavage and polyadenylation.
Fig. 1: In fission yeast, there is a common mechanism for transcription termination at both coding and noncoding
RNA genes.
snoRNA production requires the mRNA cleavage and polyadenylation complex. a Schematic of the RNase H cleavage assay
used in b–e. After RNase H cleavage of the snoRNA at the RNA:DNA hybrid formed with a sequence-specific DNA
oligonucleotide, the 3′ fragment (mature or 3′-extended) is detected by northern blotting (NB). When oligo d(T) is added
to the RNase H reaction, it removes heterogeneous poly(A) tails, resulting in discrete products. b, c Total RNA from the
indicated strains, cultured in thiamine-supplemented medium for 15 hours to deplete Pcf11, Seb1, and Dhp1, was treated
with RNase H in the presence of DNA oligonucleotides corresponding to snR3 (b) and snR99 (c). The RNase H
reactions were carried out in the presence (+) or absence (−) of oligo d(T). The top panel shows a prolonged exposure of
the middle panel to reveal 3′-extended (3′-ext) cleavage products. The 5S rRNA was used as a loading control. d, e As
in b and c, but with cells that had previously been treated with rapamycin to deplete Ysh1 and Rna14 from the nucleus. f
Northern blot analysis of total RNA isolated from the indicated strains after treatment with rapamycin (lanes 1–3) or
thiamine (lanes 4–8). The blot was hybridized with snR3- and 18S rRNA-specific DNA probes. On the right, the positions of
mature snR3, 18S, and 25S rRNAs, as well as snR3 read-through (RT) products, are indicated (Larochelle et al., 2018).
Other features of transcription termination at coding and noncoding RNA genes in fission yeast include:
RNAPII termination does not require Nab3 or Sen1.
Fig. 2: At the 3′ end of independently transcribed snoRNA and snRNA genes, S. pombe mRNA 3′ end processing and
transcription termination factors are recruited. a, b Average ChIP-seq profiles of total RNAPII (Rpb1) relative to the
annotated 3′ end of independently transcribed monocistronic snoRNAs (b, n = 31) in WT (solid line, orange) and NNS
mutant strains (dotted lines, orange) cultured in rich medium (YES). Average ChIP-seq profiles of total RNAPII
(Rpb1) in wild-type (WT) and Seb1-deficient (Pnmt-seb1) cells cultured for 15 hours in thiamine-supplemented
minimal medium (EMM) are also displayed (gold). Supplementary Data 1–2 include gene coordinates. c–e Normalized
ChIP-seq signal of RNAPII (Rpb1) and the indicated mRNA 3′ end processing factors across the fba1 mRNA (c),
snR99 snoRNA (d), and snu5 snRNA (e) genes. f, g Average ChIP-seq profiles of the indicated mRNA 3′ end
processing factors across the same mRNA (f) and snoRNA (g) gene groups as in a, b (Larochelle et al., 2018).
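The average profiles described in this legend are built by aligning a per-base signal at the annotated 3′ end of each gene and averaging across the gene group. The following is a minimal sketch of that averaging step only. The data layout (a per-base list per chromosome), the function name, and the toy values are assumptions for illustration, and the signal normalization used by Larochelle et al. (2018) is not reproduced.

```python
from typing import Dict, List, Tuple


def average_profile(coverage: Dict[str, List[float]],
                    genes: List[Tuple[str, int, str]],
                    flank: int) -> List[float]:
    """Average per-base signal in a window of +/- flank around each gene's 3' end.

    coverage: chromosome name -> per-base signal (0-based positions).
    genes: (chromosome, 3' end position, strand) for each gene.
    Genes whose window runs off the chromosome are skipped.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for chrom, end, strand in genes:
        signal = coverage[chrom]
        if end - flank < 0 or end + flank >= len(signal):
            continue
        window = signal[end - flank:end + flank + 1]
        if strand == "-":
            window = window[::-1]  # orient so transcription runs left to right
        for i, value in enumerate(window):
            totals[i] += value
        used += 1
    return [t / used for t in totals] if used else totals


# Hypothetical toy signal and gene list, for illustration only.
coverage = {"chrI": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}
genes = [("chrI", 3, "+"), ("chrI", 6, "-")]
profile = average_profile(coverage, genes, flank=2)
print(profile)
```

Reversing the window for minus-strand genes keeps upstream on the left and downstream on the right for every gene, so that a drop in RNAPII signal after the 3′ end is not blurred by mixing orientations.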
CONCLUSION
Upstream intrinsic termination is used to control the expression of diverse operons via mechanisms that allow or prevent
transcription termination in a leader sequence or the first region of translation. Transcription is terminated via a reaction
that is linked to RNA 3′-end processing. Most eukaryotic mRNA precursors are site-specifically cleaved at the 3′-
untranslated region, followed by polyadenylation of the upstream cleavage product. A large number of proteins are
involved in these processes. It is unknown what mechanism connects 3′-end processing with transcription termination.
Dephosphorylation of the pol II CTD occurs concurrently with termination, but the precise timing of pol II
dephosphorylation is unknown. Advances in experimental and computational methods have substantially aided genomic
research over the last two decades. Next-generation sequencing technologies have reduced the cost of de novo sequencing of
large genomes, and powerful computational tools have permitted precise annotation of genomic DNA sequences. Noncoding
RNAs, repetitive elements, chromatin states, epigenetic changes, and gene regulatory elements must all be considered when
mapping functional regions in genomes.
To create a complete transcript index for the human genome, computational and transcription-based experimental
methodologies were examined. Oligonucleotide probes derived from a large number of known and predicted human
genome transcript sequences were used to scan transcription from a varied selection of eukaryotes.
REFERENCES
[1] Abecasis, G. R., Auton, A., Brooks, L. D., DePristo, M. A., Durbin, R. M., Handsaker, R. E., Kang, H. M., Marth, G.
T. and McVean, G. A. (2012). An integrated map of genetic variation from 1,092 human
genomes. Nature. 491 (7422): 56–65.
[2] Alles, J., Fehlmann, T., Fischer, U., Backes, C., Galata, V., Minet, M., Hart, M., Abu-Halima, M., Grasser, F. A.,
Lenhof, H., Keller, A. and Meese, E. (2019). An estimate of the total number of true human miRNAs. Nucleic Acids
Research. 47 (7): 3353–3364.
[3] Auton, A., Brooks, L. D., Durbin, R. M., Garrison, E. P., Kang, H. M., Korbel, J. O., et al. (2015). A global reference
for human genetic variation. Nature. 526 (7571): 68–74.
Carithers, L. J. and Moore, H. M. (2015). The genotype-tissue expression (GTEx) project. Biopreserv Biobank, 13 (5): 307-308.
[4] Claverie, J. (1997). Computational Methods for the Identification of Genes in Vertebrate Genomic Sequences. Human
Molecular Genetics, 6(10): 1735–1744.
Chimpanzee Sequencing Analysis Consortium (2005). Initial sequence of the
chimpanzee genome and comparison with the human genome. Nature. 437(7055): 69–87.
[5] Davidson, E. H., Allfrey, V. G. and Mirsky, A. E. (1963). Gene expression in differentiated cells. Proc Natl Acad Sci
U S A, 49 (1): 53-60.
[6] de Klerk, E. and 't Hoen, P. A. (2015). Alternative mRNA transcription, processing, and translation: insights from
RNA sequencing. Trends Genet. 31(3):128-39.
[7] Ezkurdia, I., Juan, D., Rodriguez, J. M., Frankish, A., Diekhans, M., Harrow, J., Vazquez, J., Valencia, A. and Tress,
M. L. (2014). Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding
genes. Human Molecular Genetics. 23 (22): 5866–78.
[8] Forrest, A. R. R., Kawaji, H., Rehli, M., Kenneth-Baillie, J., de Hoon, M. J. L., Haberle, V., Lassmann, T.,
Kulakovskiy, I. V., Lizio, M., Itoh, M. (2014). A promoter-level mammalian expression atlas. Nature. 507:462–470.
[9] Gabut, M., Samavarchi-Tehrani, P., Wang, X., Slobodeniuc, V., O'Hanlon, D., Sung, H. K., Alvarez, M., Talukder, S.,
Pan, Q., Mazzoni, E. O., Nedelec, S., Wichterle, H., Woltjen, K., Hughes, T. R., Zandstra, P. W., Nagy, A., Wrana, J.
L. and Blencowe, B. J. (2011). An alternative splicing switch regulates embryonic stem cell pluripotency and
reprogramming. Cell. 147(1): 132-46.
[10] Gannett, L. (2008). The Human Genome Project. Stanford Encyclopedia of Philosophy. Retrieved 18 July 2021.
[11] Hydén, H. and Lange, P. W. (1965). A differentiation in RNA response in neurons early and late during learning. Proc
Natl Acad Sci U S A, 53 (5): 946-952.
[12] International Human Genome Sequencing Consortium (2001). Initial sequencing and analysis of the human
genome. Nature. 409 (6822): 860–921.
[13] International Human Genome Sequencing Consortium (2004). Finishing the euchromatic sequence of the human
genome. Nature. 431 (7011): 931–45.
[14] Jalali, S., Singh, A., Maiti, S. and Scaria, V. (2017). Genome-wide computational analysis of potential long noncoding
RNA mediated DNA:DNA:RNA triplexes in the human genome. J Transl Med. 15(1): 186.
[15] Kukurba, K. R. and Montgomery, S. B. (2015). RNA sequencing and analysis. Cold Spring Harb Protoc, 11: 951-969.
[16] Larochelle, M., Robert, M. A., Hébert, J. N., Liu, X., Matteau, D., Rodrigue, S., Tian, B., Jacques, P. and Bachand, F.
(2018). Common mechanism of transcription termination at coding and noncoding RNA genes in fission yeast. Nat
Commun 9, 4364.
[17] Love, M. I., Hogenesch, J. B. and Irizarry, R. A. (2016). Modeling of RNA-seq fragment sequence bias reduces
systematic errors in transcript abundance estimation. Nat Biotechnol, 34 (12): 1287-1291.
[18] Pennisi, E. (2012). Genomics. ENCODE project writes eulogy for junk DNA. Science. 337 (6099): 1159–1161.
[19] Porrua, O. and Libri, D. (2015). Transcription termination and the control of the transcriptome: why, where and how
to stop. Nat Rev Mol Cell Biol 16, 190–202.
[20] Saey, T. H. (2018). A recount of human genes ups the number to at least 46,831. Science News.
[21] Schena, M. (1996). Genome analysis with gene expression microarrays. BioEssays, 18 (5): 427-431.
[22] Sendoel, A., Dunn, J. G., Rodriguez, E. H., Naik, S., Gomez, N. C., Hurwitz, B., Levorse, J., Dill, B. D., Schramek,
D. and Molina, H. (2017). Translation from unconventional 5′ start sites drives tumour initiation. Nature. 541: 494–499.
[23] Sudheer Menon (2020) “Preparation and computational analysis of Bisulphite sequencing in Germfree Mice”
International Journal for Science and Advance Research In Technology, 6(9) PP (557-565).
[24] Sudheer Menon (2021) “Bioinformatics approaches to understand gene looping in human genome” EPRA
International Journal of Research & Development (IJRD), Vol (6) Issue (7) July 2021, PP (170-173).
[25] Sudheer Menon (2021) “Computational analysis of Histone modification and TFBs that mediates gene looping”
Bioinformatics, Pharmaceutical, and Chemical Sciences (RJLBPCS), June 2021, 7(3) PP (53-70).
[26] Sudheer Menon (2021) “Insilico analysis of terpenoids in Saccharomyces Cerevisiae” International Journal of
Engineering Applied Sciences and Technology, 2021 Vol. 6, Issue 1, ISSN No. 2455-2143, PP (43-52).
[27] Sudheer Menon, Shanmughavel Piramanayakam and Gopal Prasad Agarwal (2021) “FPMD-Fungal promoter motif
database: A database for the Promoter motifs regions in fungal genomes” EPRA International Journal of
Multidisciplinary research, 7(7) PP (620-623).
[28] Sudheer Menon, Shanmughavel Piramanayakam and Gopal Agarwal (2021) “Computational identification of
promoter regions in prokaryotes and Eukaryotes” EPRA International Journal of Agriculture and Rural Economic
Research (ARER), Vol (9) Issue (7) July 2021, PP (21-28).
[29] Sudheer Menon, Shanmughavel Piramanayakam and Gopal Agarwal (2021) Computational Identification of promoter
regions in fungal genomes, International Journal of Advance Research, Ideas and Innovations in Technology, 7(4) PP
(908-914).
[30] Sudheer Menon, Vincent Chi Hang Lui and Paul Kwong Hang Tam (2021) Bioinformatics methods for identifying
hirschsprung disease genes, International Journal for Research in Applied Science & Engineering Technology
(IJRASET), Volume 9 Issue VII July, PP (2974-2978).
[31] Taher, L., Narlikar, L. and Ovcharenko, I. (2015). Identification and computational analysis of gene regulatory
elements. Cold Spring Harbor protocols, (1), pdb.top083642.
[32] Tian, B. and Manley, J. L. (2016). Alternative polyadenylation of mRNA precursors. Nat. Rev. Mol. Cell Biol. 18: 18–30.
[33] Varki, A. and Altheide, T. K. (2005). Comparing the human and chimpanzee genomes: searching for needles in a
haystack. Genome Research. 15 (12): 1746–58.
[34] Wade, N. (1999). Number of human genes is put at 140,000, a significant gain. The New York Times.
[35] Wang, Z., Gerstein, M. and Snyder, M. (2009). RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev
Genet, 10 (1): 57-63.
[36] Waters, K. (2007). "Molecular Genetics". Stanford Encyclopedia of Philosophy. Retrieved 18 July 2021.
[37] Yang, X., Coulombe-Huntington, J., Kang, S., Sheynkman, G., Hao, T., Richardson, A., Sun, S., Yang, F., Shen, Y.,
and Murray, R. (2016). Widespread expansion of protein interaction capabilities by alternative splicing. Cell.
164:805–817.