Bioinformatics Week 1

Based on the information provided in the document, the biological process this gene is involved with according to the Gene Ontology terms is "regulation of transcription, DNA-templated".
BIOINFORMATICS WEEK 1

Introduction
Instructor: Nicholas Provart

What is bioinformatics? 
Basically, it's the use of computational tools to manage all kinds of biological data. 
Here we use computers to store, retrieve, manipulate, and distribute information 
related to biological molecules, such as DNA, RNA, proteins, and metabolites. 
Here, we're generally talking about sequence information, structural information, 
functional analysis of genes and genomes, and their corresponding products such as
transcripts, so gene expression levels. 
It's sometimes called computational molecular biology. 
This field has really developed in the past 10 years due to the efforts of genome
sequencing projects, such as the human genome sequencing project which you may have
heard of ;-) 
How do we deal with 
three billion pieces of sequence information? 
So why do we need bioinformatics? 
Well, if you can imagine 
three billion letters in the human genome, 
three billion nucleotides, how do you really 
make sense of that without using computers? 
So this is just a small section of 
a human genome encompassing the human globin gene. 
We would like to know about 
which parts of the genome are important, 
that code for proteins for instance. 
Without using computers, we would 
never know that this region here or 
this region here actually 
comprise an exon of the globin gene. 
That is a piece of the gene that 
actually codes for protein. 
The other thing that bioinformatics is 
about is biological databases, 
how we can store these biological data. 
We'll talk a little bit about what a database is, 
data structures, flat file 
databases versus relational databases. 
We'll talk also about accession numbers and identifiers, 
and we'll go over the GenBank flat file format, 
and we'll just touch briefly on a practical example of 
utility using NCBI's Entrez / GQuery / Search.


So if we look at the planet Earth from afar, 
we see a lot of green, which means that life's present, 
and this background that I've placed here 
is actually an output 
from a next generation sequencing machine, 
you see small spots, clusters in an Illumina flow cell, 
and it's possible now to generate 
a universe of information about the organisms 
that are on our planet. 
We might be interested in genome and genomic sequences, 
gene sequences and mutations, 
gene regulation, 
where a given gene is expressed and when. 
That can tell us about the function of the gene, 
what happens when introns aren't spliced 
properly, or when they are 
spliced properly but create variants. 
We can think about 
protein sequences and 
some post-translational modifications, 
such as phosphorylation of proteins, 
and look at how the proteins fold up 
to create small machines, 
basically that do the things 
that we need them to do inside our bodies. 
These machines don't operate in isolation, 
often they operate in networks so we're interested 
in how proteins function together in networks, 
where the proteins are localized, 
the kinetics of enzymes, which are a sub-class of proteins, 
the metabolites that some of these proteins produce, 
and when things go awry, 
what kinds of diseases are caused by 
defects in genes and proteins. 
Of course, we would like to tie 
all of these together with some academic framework, 
so we want to have access to the literature. 
So basically, we need databases 
to archive accumulated knowledge 
and to provide scientists 
with easy access to biological data. 
How can we store this data? 
You can store them in a flat-file format 
with the fields separated by some kind of a delimiter. 
So here we've got four records of professors, 
University of Toronto in 
this case, some former professors. 
Basically, that's the first name, 
separated by a pipe character, 
last name and then the department, 
the university, and the address in this case. 
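
As a minimal sketch of the pipe-delimited flat file described above, the following Python reads such records into dictionaries. The field order follows the description in the lecture; the sample records, the `FIELDS` list, and the `parse_flat_file` helper are invented placeholders for illustration, not the records shown on the slide.

```python
# Field order as described for the flat file; the records are hypothetical.
FIELDS = ["first_name", "last_name", "department", "university", "address"]

FLAT_FILE = """\
Ada|Example|Botany|University of Toronto|25 Example St.
Ben|Sample|Botany|University of Toronto|25 Example St.
Cat|Placeholder|Zoology|University of Toronto|40 Sample Ave.
"""


def parse_flat_file(text, delimiter="|"):
    """Split each non-empty line on the delimiter and label the fields."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        values = line.split(delimiter)
        if len(values) != len(FIELDS):
            raise ValueError(
                f"expected {len(FIELDS)} fields, got {len(values)}: {line!r}"
            )
        records.append(dict(zip(FIELDS, values)))
    return records


records = parse_flat_file(FLAT_FILE)
# The same department and address are stored again in every matching record.
botany = [r for r in records if r["department"] == "Botany"]
print(len(records), len(botany))
```

Note that a plain `split` only works while the delimiter never appears inside a field; real flat-file formats usually need an escaping rule for that case.
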
We could store those data in a spreadsheet, 
so this is maybe a more familiar way for 
you to think about 
storing data and you're all familiar with Excel, I'm sure. 
Here we've got a column that contains the first_name, 
the last_name, the institution, 
the department, and the address. 
There are problems with this kind of flat file format, 
this kind of database. 
One of these problems is that there's some redundancy. 
So that for instance if we look at 
this record here and this record here, 
we've got two entries, which is taking 
up extra storage space. 
If the physical building where 
these professors are housed changes, 
we'll have to update all of 
the records in this flat file database. 
If we miss one of them, that would be an error. 
So relational databases actually offer a solution, 
and they are commonly used in biology. 
What we've got is a series of tables, 
relations, that contain attributes, 
which are fields or columns of the table, 
and each row in a table is known as a tuple or a record, 
and the information in these tables should 
be normalized so that it's non-redundant. 
So we can do this in a couple of different ways. 
One common way to do this is to use 
a foreign key to link tables. 
In the first table here, 
the table of Professors, 
we've got a link to 
another table of Contacts down here by 
a foreign key to the primary key of the Contacts table. 
So in fact here we would only represent 
the Department of Botany once in the table of Contacts, 
instead of having it entered multiple 
times as we did in the flat file. 
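
The Professors and Contacts layout described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the table and column names (`professors`, `contacts`, `contact_id`) and the sample rows are hypothetical, not the schema from the lecture slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE contacts (
    contact_id INTEGER PRIMARY KEY,
    department TEXT NOT NULL,
    address    TEXT NOT NULL
);
CREATE TABLE professors (
    professor_id INTEGER PRIMARY KEY,
    first_name   TEXT NOT NULL,
    last_name    TEXT NOT NULL,
    -- foreign key pointing at the primary key of contacts
    contact_id   INTEGER NOT NULL REFERENCES contacts(contact_id)
);
""")

# The department is stored once; both professors point at it.
conn.execute(
    "INSERT INTO contacts VALUES (1, 'Department of Botany', '25 Example St.')"
)
conn.executemany(
    "INSERT INTO professors (first_name, last_name, contact_id) VALUES (?, ?, ?)",
    [("Ada", "Example", 1), ("Ben", "Sample", 1)],
)

# If the building changes, a single row is updated instead of every record.
conn.execute(
    "UPDATE contacts SET address = '99 New Building Rd.' WHERE contact_id = 1"
)

rows = conn.execute("""
    SELECT p.first_name, c.address
    FROM professors AS p
    JOIN contacts AS c ON p.contact_id = c.contact_id
    ORDER BY p.professor_id
""").fetchall()
print(rows)
```

Because the address lives in one row of `contacts`, the update cannot leave one professor's record out of date, which is the error the flat file was prone to.
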
SQL can be used to query relational databases, 
and there's a very large body 
of research and development on SQL databases, 
how to index things efficiently 
and query these databases efficiently. 
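
To make the point about indexing concrete, here is a small sketch using Python's `sqlite3` module. The `records` table, the `idx_records_accession` index, and the `ACC…` values are made up for illustration; the exact wording of the query plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (accession TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [(f"ACC{i:05d}", f"hypothetical record {i}") for i in range(1000)],
)
# An index lets the database look a value up without scanning every row.
conn.execute("CREATE INDEX idx_records_accession ON records (accession)")

query = "SELECT description FROM records WHERE accession = ?"
description = conn.execute(query, ("ACC00042",)).fetchone()[0]

# EXPLAIN QUERY PLAN reports how SQLite intends to run the query;
# the last column of each row is a human-readable description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("ACC00042",)).fetchall()
plan_details = [row[-1] for row in plan]

print(description)
print(plan_details)
```
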
When we create biological databases, 
often we use different identifiers to index records. 
A couple of different ways of identifying records in 
a database, in GenBank for 
instance, are using identifiers or accession codes. 
In the case of identifiers, 
these are typically a string of letters and digits 
that's understandable in some meaningful way by a human. 
They're not as stable as accession numbers, mainly 
because they can be 
changed by curators if the 
presumed function of the protein 
is updated as research advances. 
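
As a small illustration of how an accession can be handled in code: sequence records at NCBI are commonly written in an `accession.version` form, such as the `NC_003071.7` that appears in the quiz answers further down. The sketch below splits such a string; the `split_accession` helper is hypothetical and does not validate accessions against any official pattern.

```python
def split_accession(accession_version):
    """Split 'ACCESSION.VERSION' into (accession, version).

    The version is returned as an int, or None when no '.version'
    suffix is present.
    """
    accession, sep, version = accession_version.strip().partition(".")
    if not accession:
        raise ValueError("empty accession")
    return accession, (int(version) if sep else None)


print(split_accession("NC_003071.7"))
print(split_accession("NC_003071"))
```
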
In the case of GenBank,

The Answer for the Quiz

1. 1
2. 65 points …
3. Xm ..621 (already tried), Xm… 721, xm..521 bv (done)

a. What is the taxonomic lineage of your organism?

LINEAGE

cellular organisms; Eukaryota; Viridiplantae; Streptophyta; Streptophytina; Embryophyta;
Tracheophyta; Euphyllophyta; Spermatophyta; Magnoliopsida; Mesangiospermae;
eudicotyledons; Gunneridae; Pentapetalae; rosids; malvids; Brassicales;
Brassicaceae; Camelineae; Arabidopsis
b. Has the genome of this organism been sequenced, i.e., is there a Genome Project?

Yes, there is a genome project. When I click BioProject for this organism, I get 6,937 results.

c. If so, can you find the accession for the full sequence or one of the chromosomes?

Yes, I can. For example, the accession PRJNA795329.

a. Where did this take you or what happened when you did this?
This link took me to a view focused on the origin; as you can see, the origin of this organism is
highlighted in brown. The origin runs from position 1 to 541. Furthermore,
we can see details about this organism such as gene synonym, inference (similar to
RNA sequence), domain, etc.
a. Where is your gene’s location in the genome? (Tip: hover with your cursor over the green
bars in the “Genomic regions, transcripts, and products” section; the green bars represent
the gene in the sequence viewer)
Location: chromosome 2
Location: complement(12,368,220..12,370,420)

b. How many exons do you see in this gene? Tip: how many green boxes are there?
Exon count: 4
c. What are the names of the genes surrounding it (i.e. what is its “Genomic context”)?
NC_003071.7
d. Does it have any conserved domains? What are they called? (Tip: use the “Related
Information” link to Conserved Domains on the right of the Gene page)
Yes, it does. There are 50 results for conserved domains in this organism.
e. After exploring conserved domains go back to the Gene page. What biological process (Gene
Ontology terms) is this gene involved with (scroll down!)?
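
The gene location reported earlier in these answers, complement(12,368,220..12,370,420), can be unpacked with a few lines of Python. This is a rough sketch that handles only a single start..end span with an optional complement(...) wrapper; the `parse_location` helper is hypothetical and is not a full parser for feature locations.

```python
import re


def parse_location(text):
    """Parse 'start..end' or 'complement(start..end)' with optional commas.

    Returns (start, end, strand, length), where strand is '-' for a
    complement location and '+' otherwise. Coordinates are 1-based and
    inclusive, so the length is end - start + 1.
    """
    text = text.strip()
    strand = "+"
    if text.startswith("complement(") and text.endswith(")"):
        strand = "-"
        text = text[len("complement("):-1]
    match = re.fullmatch(r"([\d,]+)\.\.([\d,]+)", text)
    if match is None:
        raise ValueError(f"unsupported location: {text!r}")
    start = int(match.group(1).replace(",", ""))
    end = int(match.group(2).replace(",", ""))
    return start, end, strand, end - start + 1


print(parse_location("complement(12,368,220..12,370,420)"))
```
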
