Practical Lab Exercise for Intro Bioinf II (2)
Introduction to Bioinformatics
Practical Assignment II
1. Introduction
2. Identification of the Unknown Sequences
3. Gene Expression
4. Genomics Information
5. Protein Domain Architecture
6. Protein Motif and Modification analysis (Post-translational modification)
7. Protein Secondary Structure
8. Tertiary or 3D structures
9. Protein-Protein Interactions
10. PubMed Bibliography Search
11. Molecular Phylogeny on IRG gene and proteins
1. Introduction
The growing volume of biological and sequencing data enables scientists to retrieve
large amounts of information about their research interests from public repositories
(databases). In particular, the growing number of high-throughput analysis
techniques expands these databases almost daily. Here, we
demonstrate how to acquire a set of properties that are useful in gene, genome,
and protein analysis.
2. Identification of the Unknown Sequences
Open the file ZZZ.fa, then copy and paste the sequence into the BLAT search box.
Look at the best hit for your DNA query. This should be your target DNA. What is it?
Take a closer look by clicking on the “browser” link. Zoom out in the browser. Can you see
the gene name? What is the gene name? Click on the gene name. Read the
description and the instructions. This is called the information page. On this page you can
find almost all the information about the gene and protein.
Please write down the Gene, Chromosome, Strand Name: Survival of motor neuron 2,
centromeric (SMN2), Chromosome 5, +
3. Gene Expression
Although you can see the expression directly on the information page, it is always better to
check the expression profile in its dedicated web portal. Please use the GTEx Portal, which is
available at “https://www.gtexportal.org/home/”.
Enter the gene name or gene ID to start a search. In which tissue is the gene
most highly expressed?
Cervix-Ectocervix
What could be the reason for this high expression in specific tissues or cells?
Because women give birth by contracting the muscles around the cervix, and as a result
of this contraction the gene is most likely highly expressed in the cervix.
4. Genomics Information
Please now go to GenBank “https://www.ncbi.nlm.nih.gov/genbank/” and enter
the accession number from the RefSeq database (above). When you open the RefSeq
record, you will see it in the GenBank data format.
This is an example. Please use the information above to get the GenBank data
for your gene of interest.
Please click on FASTA to get the nucleotide data in FASTA format. Copy the sequence,
go back to the UCSC Genome Browser (same as above), and BLAT it against the human
genome (hg38). Click on “browser” to come to the information page. If you cannot
see the information, you can make each track (Conservation, expression, etc.)
visible from the controls at the bottom and refresh.
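The FASTA retrieval step can also be scripted. The following is a minimal sketch, assuming the NCBI E-utilities `efetch` endpoint is used in place of the web page; the function names and the accession string in the usage note are placeholders, not part of the assignment. It only builds the request URL and splits a FASTA record into its title and sequence, so the sequence can be pasted into BLAT without line breaks.

```python
from typing import Tuple
from urllib.parse import urlencode

# NCBI E-utilities endpoint for fetching records (assumed alternative to the web page).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fasta_url(accession: str) -> str:
    """Build a URL that returns a nucleotide record as plain-text FASTA."""
    params = {
        "db": "nuccore",          # nucleotide database
        "id": accession.strip(),  # accession number, e.g. from RefSeq
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


def split_fasta(record: str) -> Tuple[str, str]:
    """Split one FASTA record into (title, sequence without line breaks)."""
    lines = [ln.strip() for ln in record.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record: first line must start with '>'")
    return lines[0][1:], "".join(lines[1:])
```

For example, `fasta_url("YOUR_ACCESSION")` gives a link that can be opened in a browser, and `split_fasta(text)[1]` gives the bare sequence to paste into the BLAT box.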
Also check the RepeatMasker content of the region at the bottom of the information page.
Does the region contain any repetitive elements? If yes, please click on each repetitive
element and write down its name.
1-Repeat L2
2-AluJb
3-AluYm1
4-Repeat (AAAT)n
5-AluJo
6-Repeat L2
7-FRAM
8-Repeat L2
9-AluSx1
10-Repeat L2
11-AluSx4
12-AluJb
13-Repeat L2
14-MSTA
15-Repeat L2
16-AluJb
17-Repeat L2
18-AluY
19-Repeat L2
20-MLT1F
21-L1ME1
22-AluJb
23-L1ME1
24-MER57B2
25-L1ME1
26-AluJo
27-L1ME1
28-LTR19C
29-L1ME1
30-MER4B
31-L1ME1
32-L1ME1
33-AluSq
34-L1ME1
35-L1ME1
36-AluSz
37-MIR
38-AluJb
39-AluSx
40-L1MB2
41-AluSp
42-L1MB2
43-AluJb
44-AluSg
45-AluJb
46-(TTGTT)n
47-AluSg
48-(TTTTG)n
49-(ATT)n
50-AluJo
51-AluJr
52-GA-rich
53-(TAA)n
54-AluSx
55-LTR104_Mam
56-AluJo
57-(AGTTTG)n
58-AluSg
59-L1ME3D
60-MIRb
61-AluY
62-AluSx1
63-(CCA)n
64-AluSq
65-(TG)n
66-FLAM-C
67-AluY
68-AluYc3
69-AluSz
70-L1MB8
71-AluYj4
72-L1MB8
73-AluY
74-AluSp
75-AluSp
76-AluSp
77-AluY
78-AluSp
79-AluJr
80-AluSx1
81-AluJr
82-L1MC5a
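A long list like the one above is easier to read once it is summarised. The following is a minimal sketch that groups repeat names purely by how the name is written (for instance names starting with "Alu" or "L1", or simple repeats written as "(XXX)n"). The grouping rules are an assumption made for illustration and are not RepeatMasker's own class/family annotation, so names such as FRAM or MSTA simply fall into "other".

```python
import re
from collections import Counter


def repeat_group(name: str) -> str:
    """Assign a repeat name to a coarse group based only on its spelling."""
    name = name.strip()
    if name.startswith("Repeat "):        # the list writes some entries as "Repeat L2"
        name = name[len("Repeat "):]
    if re.fullmatch(r"\([ACGT]+\)n", name):
        return "simple repeat"            # e.g. (AAAT)n, (TG)n
    if name.endswith("-rich"):
        return "low complexity"           # e.g. GA-rich
    for prefix in ("Alu", "L1", "L2", "MIR"):
        if name.startswith(prefix):
            return prefix
    return "other"


def tally(names):
    """Count how many listed elements fall into each group."""
    return Counter(repeat_group(n) for n in names)


# A few names copied from the list above; extend with the full list as needed.
example = ["Repeat L2", "AluJb", "AluYm1", "Repeat (AAAT)n", "AluJo",
           "FRAM", "L1ME1", "MIR", "GA-rich"]
for group, count in tally(example).most_common():
    print(group, count)
```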
5. Protein Domain Architecture
Use the Normal mode for now and select all the options.
How many different domains or motifs were discovered? What are the names of the “Confidently
predicted domains, repeats, motifs and features”?
(Please copy and paste the screenshot (whole browser including the date and time))
6. Protein Motif and Modification analysis (Post-translational modification):
Since we have detected the known domains of our protein, we can now check whether
any predicted modifications or motifs can be found in it. Please go to the
Motif Scan software at “https://myhits.isb-sib.ch/cgi-bin/motif_scan”, paste
your protein sequence into the search box, and select all the options.
(Please copy and paste the screenshot (whole browser including the date and time))
7. Protein Secondary Structure
It is possible to predict the secondary structure of a protein from its sequence. This is
especially important if the 3D structure of the protein is not available and you want to
use the protein for targeted functional analysis. For example, you have to check the
secondary and, if possible, 3D structure before designing target peptides for
antibody generation against your target protein. PSIPRED is a program that predicts
secondary structure from the amino acid sequence. You can find it at
“http://bioinf.cs.ucl.ac.uk/psipred/”.
Before processing, the program may report an invalid-character error because it does
not accept spaces or other non-sequence characters. You can easily clean your protein or DNA
sequence at
“http://www.cellbiol.com/scripts/cleaner/dna_protein_sequence_cleaner.php”.
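The cleaning step can also be done locally. The following is a minimal sketch of what such a cleaner does, written under the assumption that only the 20 standard amino acid letters (or A, C, G, T and N for DNA) should be kept; it is not the code behind the web tool, and the function and constant names are made up for this illustration.

```python
DNA_ALPHABET = "ACGTN"                     # N stands for an unknown base
PROTEIN_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def clean_sequence(raw: str, alphabet: str) -> str:
    """Drop FASTA title lines and every character that is not in `alphabet`."""
    # Title lines start with '>' and must not be mixed into the sequence.
    body = "".join(ln for ln in raw.splitlines() if not ln.startswith(">"))
    allowed = set(alphabet)
    # Upper-case first so that lower-case residues are kept, then filter out
    # spaces, digits, line numbers and any other stray characters.
    return "".join(ch for ch in body.upper() if ch in allowed)
```

For example, `clean_sequence(text, PROTEIN_ALPHABET)` returns one continuous upper-case string. Note that this sketch silently discards ambiguous letters such as X or B, so compare the length before and after cleaning if your sequence might contain them.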
Predict the secondary structure of your protein. Which secondary structure elements
are predicted, and which one is the most frequent in your protein? Depending on the
options you have selected, this process may take a while. Go on with the next
exercises and come back to this point when the calculation is finished.
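Once the prediction is finished, the question of which element is most frequent comes down to counting letters. The following is a minimal sketch, assuming the prediction has been copied out as a three-state string in which H means helix, E means strand and C means coil; the function names and the short string in the usage note are invented for illustration.

```python
from collections import Counter

# Three-state secondary structure codes (assumed format of the copied prediction).
LABELS = {"H": "helix", "E": "strand", "C": "coil"}


def ss_composition(ss: str) -> dict:
    """Return the fraction of residues predicted in each state."""
    counts = Counter(ch for ch in ss.upper() if ch in LABELS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no H/E/C characters found in the prediction string")
    return {LABELS[code]: counts[code] / total for code in LABELS}


def most_frequent(ss: str) -> str:
    """Name of the state with the largest fraction (first one wins on a tie)."""
    comp = ss_composition(ss)
    return max(comp, key=comp.get)
```

With the made-up string `"CCHHHHHHHHCCEEEECC"`, `most_frequent` reports helix, because 8 of the 18 positions are H.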
https://predictprotein.org
With PredictProtein, you can try to determine many properties of a protein, including
secondary structure, tertiary structure, topology, DNA-binding domains, etc.
(Please copy and paste the screenshot (whole browser including the date and time))
8. Tertiary, or 3D structures
For many proteins, the three-dimensional structure has been determined, e.g. by
X-ray crystallography or NMR spectroscopy. The Protein Data Bank (PDB) stores this information.
Go to “https://www.rcsb.org” and look for your protein. You can type the name of
the protein (above) or its PDB ID in the search box. Please remember that not all protein
structures have been determined. Can you find your protein there?
(Please copy and paste the screenshot (whole browser including the date and time))
9. Protein-Protein Interactions
Proteins interact with other proteins in the living cell to perform various tasks. The
BioGRID and GeneMANIA databases collect interactions between proteins described
in published studies and provide them with annotations. Please go to GeneMANIA
“https://genemania.org” and search for your protein. You can also enter a list of proteins
to see how many of them interact or are co-expressed.
(Please copy and paste the screenshot (whole browser including the date and time))
10. PubMed Bibliography Search
We are using information that is based on publications. All these databases use
publications as their resource. Therefore, it is always best to search for the original
publications to answer questions such as “Who discovered this gene?” and
“What other functions might it have that are not indicated in the databases?”
Please write down the title and authors of the most recent publication about the
gene that you have found.
1. Comley LH, Kline RA, Thomson AK, Woschitz V, Landeros EV, Osman EY,
Lorson CL, Murray LM. Motor Unit Recovery Following Smn Restoration in
Mouse Models of Spinal Muscular Atrophy. Hum Mol Genet. 2022 May
12:ddac097. doi: 10.1093/hmg/ddac097. Epub ahead of print. PMID:
35551393.
2. Du LL, Sun JJ, Chen ZH, Shao YX, Wu LC. NOVA1 promotes SMN2 exon 7
splicing by binding the UCAC motif and increases SMN protein expression.
Neural Regen Res. 2022 Nov;17(11):2530-2536. doi:
10.4103/1673-5374.339005. PMID: 35535907.
3. Nagy ZF, Pál M, Salamon A, Kafui Esi Zodanu G, Füstös D, Klivényi P,
Széll M. Re-analysis of the Hungarian amyotrophic lateral sclerosis
population and evaluation of novel ALS genetic risk variants. Neurobiol
Aging. 2022 Apr 9;116:1-11. doi: 10.1016/j.neurobiolaging.2022.04.002.
Epub ahead of print. PMID: 35525134.
11. Please download the protein sequences for IRGM from Itslearning and
perform the multiple alignment (ClustalW). Please remember that you need
to edit the file first (remove the extra information in the title lines before doing the alignment).
(Please copy and paste the screenshot (whole browser including the date and time))
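The title-editing step before the alignment can be scripted instead of done by hand. The following is a minimal sketch under one reading of the instruction, namely that each FASTA title line is shortened to its first word so that the alignment shows short names. The function name is hypothetical, and you should adjust the rule if your titles need a different part kept (for example the species name).

```python
def simplify_headers(fasta_text: str) -> str:
    """Shorten every FASTA title line to its first whitespace-delimited word."""
    out = []
    for line in fasta_text.splitlines():
        line = line.rstrip()
        if not line:
            continue                      # skip blank lines between records
        if line.startswith(">"):
            words = line[1:].split()
            # Keep only the first word of the title; fall back to a dummy name.
            out.append(">" + (words[0] if words else "unnamed"))
        else:
            out.append(line)              # sequence lines are left unchanged
    return "\n".join(out) + "\n"
```

Paste the returned text into the alignment tool. Check that the shortened names are still unique, since two titles that begin with the same word would otherwise become indistinguishable.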
12. Please download the protein and nucleotide sequences for IRGM from
Itslearning and perform the phylogenetic analysis for both the nucleotide and
protein sequences using MEGA11. Copy and paste the screenshots of
both phylogenetic trees below (the date and time should be visible).
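To get a feeling for what a tree is built from, it can help to compute pairwise distances by hand. The following is a minimal sketch of the p-distance (the proportion of aligned positions at which two sequences differ), one of the simplest distance measures used by distance-based tree methods. It is an illustration only and does not reproduce the models or settings chosen in MEGA11; the function names and the toy sequences in the usage note are made up.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return diffs / compared


def distance_table(aligned: dict) -> dict:
    """p-distance for every pair of named sequences in an alignment."""
    return {(m, n): p_distance(aligned[m], aligned[n])
            for m, n in combinations(aligned, 2)}
```

For the toy alignment `{"s1": "ACGT-A", "s2": "ACGTTA", "s3": "ATGT-C"}`, s1 and s2 have distance 0.0 and each of them differs from s3 at 2 of 5 compared positions (0.4), so s1 and s2 would be grouped together in a tree.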
Protein Sequence
Nucleotide Sequence
13. Conclusion and Discussion about IRGM gene
Please write your comments about the IRGM gene: what its function is
and whether it is involved in Crohn's disease, with proper citations
from a PubMed literature search. (Please also download the file called
“using a reference” to learn how to cite articles within the text.)
Please list the 3 most recent publications about the IRGM gene in humans.
1-Wang LL, Jin XH, Cai MY, Li HG, Chen JW, Wang FW, Wang CY, Hu
WW, Liu F, Xie D. Corrigendum to "AGBL2 promotes cancer cell
growth through IRGM-regulated autophagy and enhanced Aurora
A activity in hepatocellular carcinoma" [Canc. Lett. 414 (2018) 71-80].
Cancer Lett. 2022 May 4:215700. doi:
10.1016/j.canlet.2022.215700. Epub ahead of print. Erratum for:
Cancer Lett. 2018 Feb 1;414:71-80. PMID: 35525812.
2-Olivieri G, Ceccarelli F, Perricone C, Ciccacci C, Pirone C,
Natalucci F, Spinelli FR, Alessandri C, Borgiani P, Conti F. Fever in
systemic lupus erythematosus: associated clinical features and
genetic factors. Clin Exp Rheumatol. 2022 Mar 23. doi:
10.55563/clinexprheumatol/7x37pf. Epub ahead of print. PMID:
35349414.
3-Liang C, Fan J, Liang C, Guo J. Identification and Validation of a
Pyroptosis-Related Prognostic Model for Gastric Cancer. Front
Genet. 2022 Feb 25;12:699503. doi: 10.3389/fgene.2021.699503.
PMID: 35280928; PMCID: PMC8916103.