This document provides an introduction to bioinformatics. It defines bioinformatics as the use of computers to study and manage biological data, especially using algorithms and software. It explains that bioinformatics integrates biology, computer science, and information technology. It also gives examples of key applications of bioinformatics like sequence analysis, structure prediction, gene function prediction, and analyzing genetic mutations and diseases.
Dr. Shayaq Ul Abeer Rasool
Lecturer, Government Degree College Boys Baramulla
Mobile: 9622599672
What is Bioinformatics?
• Bioinformatics is the use of computers to study biology
• Bioinformatics is the science of using information to understand biology
• Bioinformatics is the integration of information technology (IT) and biology
• Bioinformatics is the development of computational methods for studying the structure, function and evolution of genes, proteins and whole genomes
Biological Data + Computer Calculations = Bioinformatics
What is Bioinformatics?
• Bioinformatics is “the study of the information content and information flow in biological systems and processes”
• Bioinformatics is a relatively new interdisciplinary field that integrates computer science, mathematics, biology, and information technology to manage, analyze, and understand biological, biochemical and biophysical information
• Bioinformatics deals with the development of methods and algorithms to organize, integrate, analyze and interpret biological data

• Between 1958 and 1962, Margaret Dayhoff worked alongside Robert Ledley at the National Biomedical Research Foundation to develop the computer program known as COMPROTEIN
• In 1970, Needleman and Wunsch introduced a dynamic programming algorithm for the pairwise alignment of protein sequences
• Paulien Hogeweg and Ben Hesper coined the term “Bioinformatics” in 1970 to refer to the study of information processes in biotic systems
• In 1977, Sanger's chain-termination method was published and became the first widely adopted DNA sequencing method
• In 1979, Roger Staden created the first available software to analyze DNA sequences produced by Sanger sequencing
• In 1990, the Human Genome Project was launched; in 2003, an accurate and essentially complete human genome sequence was published

Biological Information in Genome
• Early estimates put the human genome at ~35,000-50,000 genes coding for functional proteins in the body; current estimates are closer to 20,000 protein-coding genes
• Includes non-coding sequences located between genes, which make up the vast majority of the DNA in the genome (~95%)
• The particular order of nucleotide bases (As, Gs, Cs, and Ts) determines the amino acid sequence of proteins
• Information about DNA variations (polymorphisms) among individuals can lend insight into new technologies for diagnosing, treating, and preventing diseases
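The link between nucleotide order and protein sequence can be illustrated with a small translation routine. This is only a sketch: it assumes the standard genetic code, reads the coding strand in frame 0, and stops at the first stop codon. The names `CODON_TABLE` and `translate` are chosen here for illustration and do not come from the slides.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Codons containing unknown bases (e.g. N) are reported as X.
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # MA
```

Changing a single base can change the encoded amino acid, which is why the exact order of bases matters.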
Genome size and length of various organisms (table columns: DNA, Length, Weight (Da))
Aims of bioinformatics
➢ Store biological data in organized databases
• Provide easy access to existing information and allow submission of new entries
• Annotate data and assign functional characteristics
➢ Develop tools and resources that aid in the analysis of data
• Find similar nucleotide/amino-acid sequences: BLAST
• Align multiple nucleotide/amino-acid sequences: ClustalW
• Design primers and probes for PCR: Primer3
➢ Analyze the biological data and interpret the results in a biologically meaningful manner
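To give a feel for what sequence comparison tools compute, here is a minimal sketch of a global alignment score using Needleman-Wunsch style dynamic programming. It is not how BLAST or ClustalW are implemented (both use heuristics and richer scoring matrices). The function name `nw_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions made for this example.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(nw_score("ACGT", "ACGT"))  # 4
print(nw_score("ACGT", "AGT"))   # 2 (three matches, one gap)
```

Only two rows of the scoring matrix are kept in memory, which is enough for the score; recovering the alignment itself would require storing the full matrix or traceback pointers.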
Applications of Bioinformatics
Bioinformatics applications can be grouped into the following categories:
➢ Sequence Analysis: applications that analyze and compare various types of sequence information
➢ Function Analysis: applications that analyze the function encoded within sequences and help predict functional interactions between proteins or genes
➢ Structure Analysis: applications that predict the structure of biomolecules

Function Analysis
• Gene Expression Profiling
• Metabolic Pathway Modeling
• Protein Interaction Prediction
• Protein Subcellular Localization

Structure Analysis
• Nucleic Acid Structure Prediction
• Effects of Variants on Structure
• Protein Structure Prediction
• Protein Structure Classification
• Protein Structure Comparison
Genome annotation
• Annotation is the process of marking the genes and other biological features in a DNA sequence
• Annotation is made possible by the fact that genes have recognizable start and stop regions
• Gene finding is a chief aspect of nucleotide-level annotation
• Gene annotation includes promoter searching, signal peptide searching, and marking exons, introns and coding regions in a gene
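The idea that genes have recognizable start and stop signals can be sketched as a naive open reading frame (ORF) scan. This is a simplification under stated assumptions: it only scans the forward strand, treats ATG as the only start codon, and ignores introns, so it is closer to prokaryotic gene finding than to real eukaryotic annotation. The name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=1):
    """Return (start, end) index pairs of forward-strand ORFs.

    start is the index of the A in ATG; end is one past the stop codon.
    min_codons counts the codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


seq = "CCATGAAATAGCC"
for begin, end in find_orfs(seq):
    print(begin, end, seq[begin:end])  # 2 11 ATGAAATAG
```

Real annotation pipelines combine such signals with statistical models, homology evidence and expression data.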
Gene Function Prediction
• The properties of sequences can be used to predict the function of genes
• Most gene function prediction methods focus on protein sequences
• The distribution of hydrophobic amino acids predicts transmembrane segments in proteins
• Bioinformatics tools can also use external information such as gene/protein expression data, protein structure, or protein-protein interactions to predict protein function
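The use of hydrophobicity to flag possible transmembrane segments can be sketched with a sliding-window average over the Kyte-Doolittle hydropathy scale. The window of 19 residues and the threshold of 1.6 are commonly quoted conventions for this scale and are assumptions here, not values from the slides; the function names are illustrative, and dedicated predictors use more elaborate models.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Average hydropathy of every full window along the sequence."""
    # Non-standard residue letters raise KeyError in this sketch.
    scores = [KD[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(protein, window=19, threshold=1.6):
    """Start positions of windows whose average meets the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i for i, avg in enumerate(profile) if avg >= threshold]


toy = "K" * 5 + "L" * 19 + "K" * 5  # artificial hydrophobic stretch
print(5 in candidate_tm_windows(toy))   # True
print(candidate_tm_windows("K" * 25))   # []
```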
Evolutionary biology
• Evolutionary biology is the study of the origin and descent of species, as well as their change over time
• Bioinformatics can help to trace the evolution of a large number of organisms by measuring changes in their DNA
• We can compare entire genomes, which permits the study of more complex evolutionary events
• Bioinformatics makes it possible to track and share information on an increasingly large number of species and organisms
• Complex computational population genetics models can be built to predict the outcome of the system over time

Genetics of disease
• Bioinformatics can help to map the genes of complex polygenic diseases like infertility, breast cancer or Alzheimer's disease
• Genome-wide association studies are a useful approach to pinpoint the mutations associated with such complex diseases
• Thousands of DNA variants have been identified that are associated with complex diseases and traits
• Developing better tools for diagnosis or treatment is one of the most essential applications of bioinformatics for complex disorders
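At the level of a single variant, a genome-wide association study asks whether an allele is more frequent in cases than in controls. A minimal sketch of that idea is the Pearson chi-square statistic on a 2x2 table of allele counts. The function name and the counts below are invented for illustration; real studies also correct for multiple testing, population structure and other confounders.

```python
def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 allele table."""
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical allele counts: identical frequencies give no signal.
print(allelic_chi_square(10, 10, 10, 10))  # 0.0
# Hypothetical extreme case: the alternative allele appears only in cases.
print(allelic_chi_square(20, 0, 0, 20))    # 40.0
```

A larger statistic indicates a stronger departure from equal allele frequencies in the two groups.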
Analysis of mutations
• The genomes of disease-affected cells are rearranged in complex or even unpredictable ways (mutations)
• Massive sequencing efforts are used to identify previously unknown point mutations in a variety of disease-associated genes
• Detection methods using bioinformatics tools simultaneously measure several hundred thousand sites throughout the genome to find new mutations

Gene & Protein Expression Analysis
• The expression of genes can be determined by measuring mRNA levels
• Measuring gene expression across the whole genome (transcriptomics) produces large amounts of data that can be analysed by bioinformatics
• The measurement techniques are noise-prone and/or subject to bias in the biological measurement, so bioinformatics helps to separate signal from noise
• Protein microarrays and high-throughput (HT) mass spectrometry (MS) can provide a snapshot of the proteins present in a biological sample
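Two basic steps in expression analysis can be sketched in a few lines: scaling raw read counts to counts per million (CPM) so that samples of different sequencing depth are comparable, and expressing a change as a log2 fold change. The function names, the gene labels and the pseudocount of 1 are assumptions for this example; dedicated statistical packages handle noise and bias far more carefully.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts (gene -> count) to counts per million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of two expression values; the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


sample = {"geneA": 1, "geneB": 3}  # hypothetical raw counts
print(counts_per_million(sample))  # {'geneA': 250000.0, 'geneB': 750000.0}
print(log2_fold_change(3, 1))      # 1.0
```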
NCBI National Center for Biotechnology Information ncbi.nlm.nih.gov
• Created in 1988 as part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH)
• NCBI houses a series of databases relevant to biotechnology and biomedicine
• Major databases include GenBank for DNA sequences and PubMed, a bibliographic database
• Other databases include Gene, Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (3D protein structures), dbSNP (a database of single-nucleotide polymorphisms), the Reference Sequence Collection, and a map of the human genome

EBI European Bioinformatics Institute www.ebi.ac.uk
• Created in 1992 as part of the European Molecular Biology Laboratory (EMBL)
• EMBL-EBI indexes and maintains biological data in a set of databases, including Ensembl (housing whole genome sequence data), UniProt (protein sequence and annotation database) and the Protein Data Bank in Europe (PDBe; protein and nucleic acid tertiary structure database)
ExPASy (Expert Protein Analysis System)
• Created in 1993, ExPASy is an online bioinformatics resource operated by the SIB Swiss Institute of Bioinformatics
• Provides access to over 160 databases and software tools and supports a range of life science and clinical research areas
• Originally acted as a proteomics server for analyzing protein sequences and structures and two-dimensional gel electrophoresis (2-D PAGE) data
Entrez
• Entrez is a molecular biology database search system: a search engine that allows users to search many health sciences databases at the NCBI
• Entrez Global Query is a search and retrieval system that provides access to all databases simultaneously with a single query and user interface
• Entrez covers over 20 databases, including PIR-International, Swiss-Prot, and PDB, as well as nucleotide sequence data from GenBank, which includes information from EMBL and DDBJ

PDB Protein Data Bank www.wwpdb.org
• A database for the three-dimensional structural data of large biological molecules, such as proteins and nucleic acids
• The Protein Data Bank was announced in October 1971 as a joint venture between the Cambridge Crystallographic Data Centre, UK and Brookhaven National Laboratory, US
• The PDB is a key resource in structural biology; many journals and funding agencies require scientists to deposit their structure data in the PDB
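Entrez can also be queried programmatically through NCBI's E-utilities web service. The sketch below only builds an ESearch request URL and does not contact the server; the helper name `esearch_url` and the example query are assumptions for illustration. NCBI's usage guidelines (request rate limits, optional API key) should be checked before sending real requests.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL that returns matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


print(esearch_url("protein", "hemoglobin AND human[Organism]", retmax=5))
```

Opening the printed URL in a browser, or fetching it with `urllib.request.urlopen`, returns the list of matching identifiers for the chosen database.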
UniProt
• UniProt provides the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information
• UniProt is comprised of four components, each optimized for different uses:
UniProt Knowledgebase (UniProtKB)
• UniProtKB is the central access point for extensive curated protein information, including function, classification, and cross-references
• UniProtKB comprises two sections: UniProtKB/Swiss-Prot and UniProtKB/TrEMBL

UniProtKB/Swiss-Prot (1986)
• Swiss-Prot, a curated protein sequence data bank, contains not only sequence data but also annotation relevant to a particular sequence
• The annotation added to each entry is done by a team of biologists and comes primarily from journal articles reporting the actual sequencing and, sometimes, characterisation
• Review articles and collaboration with external experts also play a role, along with the use of secondary databases like PROSITE and Pfam and a variety of feature prediction methods

UniProtKB/TrEMBL (Translation of the EMBL database)
• TrEMBL consists of entries derived from the translation of all coding sequences in the EMBL/GenBank/DDBJ nucleotide sequence databases, and also protein sequences extracted from the literature or submitted to UniProtKB/Swiss-Prot
• TrEMBL supplements Swiss-Prot
• Unlike Swiss-Prot entries, those in TrEMBL are awaiting manual annotation. However, rather than just representing basic sequence and source information, steps have been taken to add features and annotation automatically
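The two UniProtKB sections can be told apart in sequence files: in UniProt FASTA headers the prefix `sp` marks a Swiss-Prot entry and `tr` a TrEMBL entry. The sketch below parses such a header; the function name `parse_uniprot_header` is an assumption for illustration, and the second header in the usage example is a made-up TrEMBL-style entry.

```python
SECTION = {"sp": "Swiss-Prot", "tr": "TrEMBL"}


def parse_uniprot_header(header):
    """Return (section, accession, entry_name) from a UniProt FASTA header."""
    # The identifier is the first whitespace-delimited token: db|accession|name
    identifier = header.lstrip(">").split(None, 1)[0]
    db, accession, entry_name = identifier.split("|")
    return SECTION.get(db, db), accession, entry_name


reviewed = ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2"
print(parse_uniprot_header(reviewed))  # ('Swiss-Prot', 'P69905', 'HBA_HUMAN')

# Hypothetical unreviewed entry, for illustration only.
unreviewed = ">tr|A0A000XXX0|A0A000XXX0_EXAMPLE Uncharacterized protein"
print(parse_uniprot_header(unreviewed)[0])  # TrEMBL
```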