EST

Expressed Sequence Tags (ESTs) are short DNA sequences used to identify expressed genes in cells, with approximately 65.9 million ESTs available in public databases. They serve as a cost-effective alternative to whole genome sequencing, particularly for organisms with large genomes, and are instrumental in gene discovery, transcript identification, and understanding gene regulation. ESTs are generated from cDNA of mRNA and can be analyzed for various applications, including mapping gene structures and assessing gene expression patterns.


Expressed Sequence Tag

Dr. Sujoy Ghosh

7/07/2011
Definition
• ESTs are short (200–500 nucleotides) DNA
sequences that can be used to identify a gene
that is being expressed in a cell at a particular
time.
• They may be used to identify gene transcripts,
and are instrumental in gene discovery and
gene sequence determination. The identification
of ESTs has proceeded rapidly, with
approximately 65.9 million ESTs now available
in public databases.
• Whole genome sequencing is currently
impractical and expensive for organisms
with large genome sizes. Such an
approach is unlikely to be applied
extensively, irrespective of the significance
of such genome data in human and animal
health, agriculture, ecology and evolution.
In addition, genome expansion, as a result
of retrotransposon repeats, makes whole
genome sequencing less attractive for
plants such as maize [6].
• In this scenario, EST data sets have been
utilized to complement genome sequencing or
as an alternative to the genome sequencing of
many organisms, earning the label ‘poor
man's genome’. It must be noted that ESTs are
subject to sampling bias, which results in the
under-representation of rare transcripts, so
that an EST collection often accounts for only
about 60% of an organism's genes.
However, ESTs in combination with reduced
representation sequencing strategies, such as
methylation filtration and high Cot selection,
have enabled the successful examination of the
gene pool in plants like maize.
• Expressed Sequence Tags are generated from cDNA cloned from
mRNA of a particular species (refer to Figure 1). As the cDNA
used is complementary to mRNA, the ESTs represent portions of
expressed genes.
• ESTs can be generated by the following steps:
• Transcription of genomic DNA: Genomic DNA is first transcribed
to generate nascent mRNA, which is then spliced to produce
mature mRNA.
• Reverse transcription of mRNA: mRNA can be isolated directly
from the species using commercial kits (e.g. RNAgent
Promega). The isolated mRNA is reverse transcribed into cDNA,
which forms the cDNA library.
• Generation of ESTs: 5' or 3' ESTs are generated from the cDNA
library by sequencing cDNA ends. A 5' EST comes from the region
of the transcript that tends to encode protein, whereas a 3' EST
comes from the ending portion of the cDNA.
• Assembly and organization of ESTs: The resulting ESTs can
then be assembled and organized into multi-member sequence
assemblies, bridged sequence assemblies and small clusters,
according to the size of the ESTs.
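The end-sequencing step above can be illustrated with a minimal Python sketch. It assumes the cDNA insert is given as its sense strand and that the 3' read is taken from the opposite strand, a common convention that the slides do not state. The names `reverse_complement` and `end_reads`, the read length and the example sequence are all hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def end_reads(cdna: str, read_length: int) -> tuple[str, str]:
    """Simulate single-pass 5' and 3' end reads of one cDNA insert.

    The 5' read is the first `read_length` bases of the sense strand.
    The 3' read is assumed to come from the opposite strand, so it is the
    reverse complement of the last `read_length` bases.
    """
    five_prime = cdna[:read_length]
    three_prime = reverse_complement(cdna[-read_length:])
    return five_prime, three_prime


# Toy cDNA; real ESTs are typically 200-500 nucleotides long.
cdna = "ATGGCCAAGTTTGACCTGAGCTAAGGCATTAAA"
five, three = end_reads(cdna, read_length=8)
print("5' EST:", five)
print("3' EST:", three)
```

In a real project the reads come from the sequencer rather than from a known cDNA; the sketch only shows which part of the insert each kind of EST samples.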
EST GENERATION
Alternate strategy
• Simpson and co-workers [2005] have developed a novel
cost-effective method for generating high-throughput
ESTs called ORESTES (open reading frame expressed
sequence tags). This method differs from conventional
EST generation by providing sequence data from the
central protein coding region, and thus the most
informative and desired portion, of transcripts.
ORESTES representing highly, moderately and rarely
expressed transcripts have been derived from several
species with more than a million human sequences and
thousands from other species such as cow and honey
bee deposited in the Expressed Sequence Tags
database, dbEST.
Characteristics of EST sequences.

Nagaraj S H et al. Brief Bioinform 2007;8:6-21

© The Author 2006. Published by Oxford University Press. For Permissions, please email:
[email protected]
Errors Associated with EST generation
• A typical EST sequence is only a very short
copy of the mRNA itself and is highly error
prone, especially at the ends. As ESTs are
sequenced only once, they are susceptible to
errors. Generally, the quality of base reads in
individual EST sequences is initially poor (up to
20% or ∼50–100 bp), gradually improves and
then diminishes once again towards the end.
The overall sequence quality is usually
significantly better in the middle (‘highly
informative length’, Figure 1B). Vector and
repeat sequences, found at the ends or rarely in
the middle, are excised during EST
pre-processing.
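The quality profile described above suggests trimming the unreliable ends before further analysis. The sketch below is a minimal illustration, assuming that each base carries a Phred-style quality score and that a score of 20 is an acceptable cut-off. Neither assumption comes from the slides, and `trim_low_quality_ends` is a hypothetical name.

```python
def trim_low_quality_ends(
    seq: str, quals: list[int], min_q: int = 20
) -> tuple[str, list[int]]:
    """Strip bases from both ends until a base with quality >= min_q is met.

    Low-quality bases in the interior are left in place; only the ends
    are trimmed.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must have equal length")
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


# Toy read with poor quality at both ends and one weak interior base.
seq = "NNACGTACGTTN"
quals = [5, 8, 30, 35, 40, 38, 12, 36, 33, 25, 21, 9]
trimmed_seq, trimmed_quals = trim_low_quality_ends(seq, quals)
print(trimmed_seq, trimmed_quals)
```

Vector and repeat removal is a separate pre-processing step and is not covered by this sketch.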
EST and Untranslated Regions (UTRs)
• The 5′ and 3′ UTRs of eukaryotic mRNA have been
experimentally shown to contain sequence elements
essential for gene regulation, expression and translation.
In this context, EST data has proven to be important for
mining UTRs as both 5′ and 3′ ESTs contain significant
sections of the UTRs along with protein coding regions.
The CORG (COmparative Regulatory Genomics)
resource supports promoter analysis using assembled
ESTs, while more than half of the Eukaryotic Promoter
Database entries are based on 5′ EST sequences. Mach
has developed the PRESTA (PRomoter EST
Association) algorithm for promoter verification and
identification of the first exon, by mapping EST 5′ ends.
• The COmparative Regulatory Genomics
(CORG) database and annotation project aims
at providing insights into gene regulation at the
level of transcription. Now that several
genomes of higher eukaryotes are available,
sequence elements can be studied on a
comparative basis. Comparative sequence
analysis has become a powerful tool for
a variety of problems ranging from gene
finding to the identification of regulatory
elements.
• The CORG project systematically applies
comparative sequence analysis methods to
non-coding genomic DNA. The working
hypothesis underlying the CORG project is
that local sequence conservation points to
functional importance. The CORG project is a
resource for the genome-wide annotation of
conserved sequence elements in non-coding
genomic DNA. These elements are called
‘conserved non-coding blocks’ (CNBs).
PRESTA
• Large sets of well-characterized promoter
sequences are required to facilitate the
understanding of promoter architecture. The
major sequence databases are a prospective
source of upstream regulatory regions, but
suffer from inaccurate annotation. The
software tool PRESTA (PRomoter EST
Association) presented in this study is
designed for efficient recovery of
characterized and partially verified promoters
from GenBank and EMBL libraries.
• The PRESTA algorithm examines the
putative GenBank/EMBL promoters and
automatically removes most of the poorly
annotated entries. The remaining records
are connected to expressed sequence
tags (ESTs) through a high-stringency
BLAST search.
• The frequency and source of recovered
ESTs provide an estimate of the activity
and expression pattern of the promoter,
and the ESTs' 5' ends assist in
transcription start-site verification. The
PRESTA database provides easy access
to non-redundant upstream regulatory
regions recently extracted by the PRESTA
algorithm.
• The current size of this resource is 552
human and 241 mouse promoters.
Surprisingly, no overlap between the
PRESTA database and the Eukaryotic
Promoter Database (EPD) was detected
by sequence comparison.
EST contigs
• Because of the way ESTs are sequenced,
many distinct expressed sequence tags
are often partial sequences that
correspond to the same mRNA of an
organism. In an effort to reduce the
number of expressed sequence tags for
downstream gene discovery analyses,
several groups assembled expressed
sequence tags into EST contigs.
Databases for ESTs
• Diatom EST database http://avesthagen.sznbowler.com
• ESTree http://www.itb.cnr.it/estree/
• Fungal genomics project
https://fungalgenomics.concordia.ca/home/index.php
• Honey bee brain EST project
http://titan.biotec.uiuc.edu/bee/honeybee_project.htm
• Nematode ESTs at the Sanger Institute
ftp://ftp.sanger.ac.uk/pub/pathogens/nem_ests/
• NEMBASE - parasitic nematode ESTs http://www.nematodes.org
• Parasitic and free-living nematode EST resource
http://www.nematode.net/
EST sequence analysis
• An individual raw EST has negligible biological
information. Analysis using different
combinations of computational tools augments
this weak signal and when a multitude of ESTs
are analysed, the results enable the
reconstruction of transcriptome of that organism.
While diverse research groups have used
different combinations of tools for extraction of
data from specific databases followed by
analyses [32–37], a generic protocol of the
different steps in the analysis of EST data sets is
shown in Figure 2.
Generic steps involved in EST analysis.

Nagaraj S H et al. Brief Bioinform 2007;8:6-21

EST clustering and assembly
• The purpose of EST clustering is to collect overlapping
ESTs from the same transcript of a single gene into a
unique cluster to reduce redundancy. EST data are
fragmented; they can be consolidated and
indexed using gene sequence information, such that all
the expressed data arising from a single gene are grouped
into a single index class, and each index class contains
information for only that particular gene. A simple way to
cluster ESTs is by measuring the pair-wise sequence
similarity between them. These similarities are then
converted into binary values, depending on whether
there is a significant match or not, such that the
sequence pair can be accepted into or rejected from the
cluster being assembled.
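The pair-wise approach described above can be sketched in a few lines of Python. This is a toy illustration, not the method of any of the programs listed on the next slide. It assumes that a "significant match" means a shared exact substring of at least `min_overlap` bases, and it groups matching pairs by single linkage with a small union-find structure. The function names, the threshold and the example sequences are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def significant_match(a: str, b: str, min_overlap: int = 12) -> bool:
    """Binary decision: do two ESTs share an exact stretch >= min_overlap?"""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size >= min_overlap


def cluster_ests(ests: dict[str, str], min_overlap: int = 12) -> list[set[str]]:
    """Single-linkage clustering of ESTs from pair-wise binary matches."""
    parent = {name: name for name in ests}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(ests, 2):
        if significant_match(ests[a], ests[b], min_overlap):
            parent[find(a)] = find(b)  # accept the pair into one cluster

    clusters: dict[str, set[str]] = {}
    for name in ests:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Two overlapping fragments of one toy transcript plus an unrelated read.
gene_a = "ATGGCGTACGTTAGCCTAGGATCCGATTACAGGCTTAACGGATCCTAGTTGACC"
gene_b = "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTT"
ests = {"est1": gene_a[:30], "est2": gene_a[15:45], "est3": gene_b[:30]}
clusters = cluster_ests(ests)
print(clusters)
```

Comparing every pair scales quadratically with the number of ESTs and ignores sequencing errors and reverse-strand reads, so it is only suitable for illustrating the idea on small inputs.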
Programs for EST sequence assembly
• CAP3 http://genome.cs.mtu.edu/cap/cap3.html
• CLOBB http://zeldia.cap.ed.ac.uk/CLOBB/
• CLU http://compbio.pbrc.edu/pti
• ESTate http://www.ebi.ac.uk/~guy/estate/
• ESTs aSSEmbly using Malig
http://alggen.lsi.upc.es/recerca/essem/frame-essem.html
• megaBLAST ftp://ftp.ncbi.nih.gov/blast/
• miraEST http://www.chevreux.org/projects_mira.html
• Paracel Transcript Assembler http://www.paracel.com/
• Phrap http://www.phrap.org/
• stackPACK http://www.sanbi.ac.za/Dbases.html#stackpack
• Xsact and Xtract http://www.ii.uib.no/~ketil/bioinformatics/
Database similarity searches
• Once consensus sequences (putative genes) are
obtained from assembled ESTs, possible functions can
be assigned through downstream annotation, achieved
via database similarity searches, employing familiar
freely available tools and databases.
• Different flavours of BLAST programs from NCBI serve
as universal tools for database similarity searches.
BLASTN can be used to search ESTs against nucleotide
sequence database and BLASTX to search against
protein databases. BLASTX translates a consensus EST
sequence (query) into protein products in six reading
frames followed by comparisons with protein databases.
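The six-frame translation that BLASTX performs on the query can be illustrated with a short Python sketch. It assumes the standard genetic code and marks any codon containing an ambiguous base as `X`. It is not BLAST's own implementation, and the names `translate` and `six_frame_translation` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; '*' is a stop, 'X' an unknown codon."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> dict[str, str]:
    """Translate a nucleotide sequence in three forward and three reverse frames."""
    seq = seq.upper()
    rev = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


frames = six_frame_translation("ATGGCCAAGTAA")
for name, protein in sorted(frames.items()):
    print(name, protein)
```

Each of the six protein strings would then be compared with the protein database; because an EST may come from either strand and start in any codon position, all six frames have to be tried.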
Programs for EST alignment to genomic DNA
• BLAT http://genome.ucsc.edu/cgi-bin/hgBlat
• Est2genome http://bioweb.pasteur.fr/seqanal/interfaces/est2genome.html
• GMAP http://www.gene.com/share/gmap/
• MGAlign http://origin.bic.nus.edu.sg/mgalign
• SSAHA http://www.sanger.ac.uk/Software/analysis/SSAHA/
• Sim4 http://globin.cse.psu.edu/html/docs/sim4.html
• Splign http://www.ncbi.nlm.nih.gov/sutils/splign/splign.cgi
Application of EST
• ESTs were first used to construct maps of
the human genome
• assessment of the gene coverage from
EST sequencing
• mapping of gene-based site markers
• EST databases are used for gene
structure prediction
• to investigate alternative splicing
• to discriminate between genes exhibiting
tissue or disease-specific expression
• for the discovery and characterization of
candidate SNPs
• EST-based gene expression protocols
have been used in the identification and
analysis of coexpressed genes on a large
scale
