SHES2201 Lecture 3 - Data Mining in Bioinformatics

The document discusses various data mining techniques used in bioinformatics including classification and prediction, clustering, data summarization, dependency modeling, and change and deviation detection. It provides examples of how these techniques are applied to problems like identifying coregulated genes and predicting protein structures.

Uploaded by

Koo Xue Ying

SHES2201

Lecture 3 – Data-mining in Bioinformatics

Profesor Madya Khairuddin Itam

Room B20, Bioinformatics Division
[email protected]
03-79676738
Data Mining The Data
Inference from biological data
• Goal is to move from raw data to meaningful
conclusions.
• Examples: detecting remote homologues,
identifying coregulated genes, predicting binding
affinities
• Broadly applicable computational techniques:
clustering, discrimination/regression & density
estimation
Data Mining Techniques

• Common Techniques
– Classification and prediction
– Clustering
– Data summarization
– Dependency modeling
– Change and deviation detection
Data Mining Techniques
• Dependency modeling.
– The aim is to derive some causal structure within the data.
– One example is functional dependency between predicates.

Given: a sample (query) object and a database containing a set of objects,
• Find the objects within the database that are within a user-defined
distance of the queried object
• Find all pairs within some distance of each other.
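The two queries above can be sketched in a few lines of Python. This is a minimal illustration, assuming objects are numeric feature vectors compared with Euclidean distance; the function names and the sample data are hypothetical and not part of the lecture.

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+)


def range_query(query, database, radius):
    """Return every object in `database` within `radius` of `query`."""
    return [obj for obj in database if dist(query, obj) <= radius]


def close_pairs(database, radius):
    """Return all unordered pairs of objects within `radius` of each other."""
    return [(a, b) for a, b in combinations(database, 2) if dist(a, b) <= radius]


# Hypothetical 2-D feature vectors (illustrative values only).
database = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
neighbours = range_query((0.0, 1.0), database, radius=1.5)
pairs = close_pairs(database, radius=1.0)
```

Both functions scan the whole database, which is fine for small sets; large sequence or structure databases would normally need an index or heuristic search instead.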
Data Mining Techniques
Clustering.
– The aim is to partition or segment the set of data items into
smaller subsets.
– The elements of one subset are similar to each other (high
intra-group similarity) and significantly different from elements
in other subsets (small inter-group similarity).
– Also called unsupervised learning
Clustering
• Begin with a set of instances (e.g. gene
sequences, protein structures) and a distance
metric.
• Create a collection of groups of the instances
which are more similar to each other than they
are to instances in other groups. Groups can be
hierarchically clustered themselves
• Examples:
– Building taxonomic trees from aligned sequences
– Identifying coregulated genes from expression arrays
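As a rough sketch of the idea, the code below groups instances by repeatedly merging the two closest groups (single-linkage agglomerative clustering, one of many possible clustering methods). The Hamming distance, the function names and the toy aligned sequences are assumptions made for illustration.

```python
def single_linkage(points, distance, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain."""
    clusters = [[p] for p in points]  # start with one cluster per instance
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(distance(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    return clusters


def hamming(a, b):
    """Count mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical aligned sequences (illustrative only).
seqs = ["ACGTAC", "ACGTAA", "TTGCAC", "TTGCAA", "ACGAAC"]
groups = single_linkage(seqs, hamming, n_clusters=2)
```

Recording the order of the merges instead of stopping at a fixed number of clusters would give the hierarchy (tree) mentioned above.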
Data Mining Techniques
• Classification and prediction
– The aim is to predict the value of some database field based on
the values of other fields.
– The field to predict is sometimes called class.
– If the class takes discrete values, then it is a classification
problem.
– If the class takes continuous numerical values it is a regression
problem.
– Also called supervised learning
Discrimination/Regression
• Induce a predictor of some aspect (the label) of an
instance from other aspects. Numeric predictions
are regression, class predictions discrimination.
• Beginning with a training set of labeled instances
• Produce a model which accurately predicts the labels
of other (unlabeled) instances.
• Examples:
– Protein secondary structure prediction
– Prediction of drug response from gene expression
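The difference between discrimination and regression can be illustrated with two very small predictors: a nearest-neighbour classifier for class labels and a least-squares line for numeric labels. Both are stand-ins chosen for brevity, not methods prescribed by the lecture, and the training data are hypothetical.

```python
from math import dist
from statistics import mean


def nearest_neighbour_label(x, training_set):
    """Discrimination: copy the label of the closest training instance."""
    _, label = min(training_set, key=lambda item: dist(x, item[0]))
    return label


def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope*x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


# Hypothetical labelled instances: (expression of two genes, drug response).
training = [
    ((0.2, 0.1), "non-responder"),
    ((0.3, 0.2), "non-responder"),
    ((2.1, 1.9), "responder"),
    ((1.8, 2.2), "responder"),
]
predicted = nearest_neighbour_label((2.0, 2.0), training)

# Hypothetical numeric labels lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

In both cases the model is induced from labelled instances and then applied to an instance whose label is unknown.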
Discrimination & Regression
• Clusters of co-expressed genes are interesting,
but just a first step. Really want predictive
models, gene networks, etc.
• Biology tells us that the predictors are likely to
involve interactions more than linear effects.
• Traditional statistics is not strong in non-linear
models, high order interactions, large
datasets.
Data Mining Techniques

• Data summarization.
– The aim is
• to discover patterns that describe subsets of the data
(attribute focusing), and
• to extract rules from the data telling us how a subset
of data influences the presence of another subset
– Association Rules Mining (ARM) relate to an
undirected/unsupervised data mining technique.
– Usually produces clear and understandable results

Detect sets of attributes that frequently occur together, and also the
rules among them. Example: 60% of the population with a credit card also
has a charge card (40% of shoppers have both)
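The two percentages in the example correspond to what association rule mining usually calls confidence (60%) and support (40%). The sketch below computes both for the rule "credit card implies charge card". The records are invented so that the numbers match the example, and the function names are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions with the antecedent that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical records, one set of attributes per shopper:
# 6 hold both cards, 4 hold only a credit card, 5 hold neither.
shoppers = (
    [{"credit_card", "charge_card"}] * 6
    + [{"credit_card"}] * 4
    + [set()] * 5
)

rule_support = support(shoppers, {"credit_card", "charge_card"})        # about 0.4
rule_confidence = confidence(shoppers, {"credit_card"}, {"charge_card"})  # about 0.6
```

A rule is normally reported only when both values exceed user-chosen thresholds.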
Density estimation

• Produces a method for assessing the probability of an observation.
Like a histogram
• Uses a set of observations (and, optionally, a
distance function)
• Examples
– Recognition of members of protein families
– Evaluation of diversity of compound libraries
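One way to make the "like a histogram" idea concrete is a kernel density estimate, which smooths the observations with a small Gaussian bump around each one. This is a sketch under assumed one-dimensional data; the bandwidth and the scores are hypothetical.

```python
from math import exp, pi, sqrt


def gaussian_kde(observations, bandwidth):
    """Return a function estimating the probability density at a point x."""
    n = len(observations)

    def density(x):
        # Sum one Gaussian kernel per observation, then normalise.
        total = sum(exp(-0.5 * ((x - o) / bandwidth) ** 2) for o in observations)
        return total / (n * bandwidth * sqrt(2 * pi))

    return density


# Hypothetical scores of known family members against a family model.
family_scores = [9.8, 10.1, 10.4, 9.9, 10.0]
density = gaussian_kde(family_scores, bandwidth=0.5)
typical, atypical = density(10.0), density(14.0)
```

A new observation with a very low estimated density, such as the score of 14.0 here, is unlikely to belong to the same family.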
Particular applications

• Those broad computational approaches have many particular
instantiations and applications
• Two examples
– Hidden Markov models for multiple sequence
alignment and homologous family discrimination
– Analysis of gene expression array data
• Finding genes that vary significantly
• Estimating the number of clusters
• Finding high-order discriminators
Hidden Markov models

• Technique from speech understanding is now widely used in sequence
analysis
• Good software and tutorials on the web
http://www.cse.ucsc.edu/research/compbio/ismb99.tutorial.html
• HMMs infer unobserved states that influence the
probability distribution of observed states
• Most common use is to model sequence families.
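To show how hidden states shape the probability of what is observed, the sketch below implements the standard forward algorithm, which sums the probability of an observed sequence over all hidden state paths. The two-state model, its state names and all probabilities are hypothetical and chosen only for illustration.

```python
def forward_probability(observations, states, start_p, trans_p, emit_p):
    """Probability of the observed sequence, summed over all hidden state paths."""
    # alpha[s] = P(observations so far, current hidden state is s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for symbol in observations[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical model: "conserved" positions favour hydrophobic residues (H),
# "variable" positions emit hydrophobic (H) and polar (P) equally often.
states = ("conserved", "variable")
start_p = {"conserved": 0.5, "variable": 0.5}
trans_p = {
    "conserved": {"conserved": 0.8, "variable": 0.2},
    "variable": {"conserved": 0.3, "variable": 0.7},
}
emit_p = {
    "conserved": {"H": 0.9, "P": 0.1},
    "variable": {"H": 0.5, "P": 0.5},
}
p = forward_probability("HHP", states, start_p, trans_p, emit_p)
```

Scoring a sequence against a model trained on a family, and comparing that score with a background model, is the basis of family discrimination.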
Data Mining Techniques
• Change and deviation detection.
– Data has a sequential structure, either temporal,
physical or other.
– The aim is to find patterns assuming an ordering of
the observations.

Find the record(s) that is (are) the most different from the other
records; i.e., find all outliers. These outliers may be thrown away as
noise or may be the “interesting” ones.
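A simple way to flag such records in one numeric field is the modified z-score, which measures distance from the median in units of the median absolute deviation. The constant 0.6745 and the cut-off of 3.5 are a common convention rather than anything stated in the lecture, and the measurements are hypothetical.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # no spread to measure against
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical repeated measurements with one suspicious value.
measurements = [5.1, 4.9, 5.0, 5.2, 4.8, 9.7]
outliers = mad_outliers(measurements)
```

Whether a flagged value is noise or the interesting result still has to be decided by looking at the experiment, not by the statistic.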
Expression Array Analysis

• Gene expression arrays are a popular new technique for assaying the
expression level of tens of thousands of genes simultaneously
• Many problems arise in analyzing this data
• Collaborative groups are now developing tools
and procedures for such analysis
Expression Analysis Issues

• Identifying genes that changed significantly over a set of
observations. How much change is enough? What's wrong with 2-fold?
• Estimating the number of expression clusters.
How many groups of genes are there?
• Finding discriminators based on expression
levels (e.g. for response to drugs)
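One answer to "What's wrong with 2-fold?" is that a fixed fold-change cut-off ignores how noisy the replicates are. The sketch below compares the log2 fold change with Welch's t statistic, which scales the difference in means by the replicate variability. The intensity values are invented to make the contrast visible.

```python
from math import log2, sqrt
from statistics import mean, variance


def log2_fold_change(control, treated):
    """log2 of the ratio of mean intensities (1.0 corresponds to 2-fold)."""
    return log2(mean(treated) / mean(control))


def welch_t(control, treated):
    """Welch's t statistic: difference in means scaled by replicate noise."""
    se = sqrt(variance(control) / len(control) + variance(treated) / len(treated))
    return (mean(treated) - mean(control)) / se


# Hypothetical replicate intensities: (control replicates, treated replicates).
noisy_gene = ([20, 180, 100], [400, 60, 200])           # above 2-fold, very noisy
steady_gene = ([1000, 1010, 990], [1500, 1490, 1510])   # only 1.5-fold, very consistent

for name, gene in (("noisy", noisy_gene), ("steady", steady_gene)):
    print(name, round(log2_fold_change(*gene), 2), round(welch_t(*gene), 1))
```

The noisy gene passes a 2-fold filter despite a weak t statistic, while the steady gene fails the filter even though its change is far more reproducible.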
Data Mining Techniques
• Other techniques
– Three-dimensional visualisation
– Decision trees
– Neural networks
– Genetic algorithms
– Hidden Markov models
– Time series
– Bayesian networks
– Soft computing : rough and fuzzy sets
– Graphical models
– Density estimation
Some Bioinformatics
Data Mining Perspectives
Taking advantage of public data

• An enormous amount of high quality data is available free.
How to find public data
• Start with http://www.ncbi.nlm.nih.gov
• Consult the Jan 2001 issue of Nucleic Acids
Research
http://nar.oupjournals.org/content/vol28/issue1
• Metadata sites (listings of databases)
– The NAR issue has an associated site
– http://www.genome.ad.jp/kegg/kegg4.html
– http://bioinformatics.weizmann.ac.il/mb/molecular_biol_databases.html
• Commercial portals, e.g. www.biolinks.com,
www.doubletwist.com
NCBI: Ground Zero
• The National Center for Biotechnology
Information is the first place to go. Sequences,
structures, PubMed, taxonomy, medical genetics,
etc.
• Spend some time learning all it has to offer.
There are good online tutorials at
http://www.ncbi.nlm.nih.gov/Education/. Look at
the site map, not just the front page!
• Check out PROW (Protein Reviews on the Web), a
journal/reference source at NCBI.
An Abundance of Specialized Data
• Gene sequences and protein structures are not
all there is!
• Metabolic, regulatory and signaling pathway data
is growing rapidly
• Carbohydrates, drugs, lipids, diseases, organisms,
etc. all have their own public databases
Integrated data sources
• Like the data, the sheer volume of databases can
be overwhelming.
• Integrated systems offer organized summaries of
diverse datasets.
• An excellent starting place for information about
human genes is GeneCards:
http://bioinformatics.weizmann.ac.il/cards/
• And ENTREZ at NCBI.
• Biozon at Stanford
Definition and scope
• In the computational sense, bioinformatics is the
– systematic development and
– application of computing systems and
– computational solution techniques,
– analyzing biological datasets obtained by experiments,
– modeling,
– database search and
– instrumentation.
Computational Perspective
• A sampling of the areas of research carried out by
biocomputing scientists from the computational
perspective is discussed next.
Computational Perspective
Neural networks
• Development and application of novel computational
techniques based on neural networks.
• First proposed by McCulloch and Pitts in 1943.
• Neural nets comprise a set of interconnected nodes,
modelled on natural nervous systems, with various
mechanisms of interconnection.
• Neural network architectures are usually designed to
complete specific tasks through some sort of learning
procedure or mechanism.
• Neural networks and genetic algorithms are utilised to
classify DNA sequences, predict protein structures from
sequence and optimise molecular
structures (Anonymous, 1996).
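The node at the heart of such a network can be sketched as a threshold unit in the spirit of the McCulloch and Pitts model: it fires when the weighted sum of its inputs reaches a threshold. The weights and thresholds below are hand-picked for illustration rather than learned.

```python
def threshold_unit(inputs, weights, threshold):
    """Fire (return 1) when the weighted input sum reaches the threshold."""
    return int(sum(x * w for x, w in zip(inputs, weights)) >= threshold)


def logical_and(a, b):
    # Both inputs must be active to reach a threshold of 2.
    return threshold_unit((a, b), weights=(1, 1), threshold=2)


def logical_or(a, b):
    # A single active input is enough to reach a threshold of 1.
    return threshold_unit((a, b), weights=(1, 1), threshold=1)


truth_table = [(a, b, logical_and(a, b), logical_or(a, b))
               for a in (0, 1) for b in (0, 1)]
```

A learning procedure replaces the hand-picked weights with values adjusted from training examples, and practical networks connect many such units in layers.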
Computational Perspective
Evolutionary algorithms
• Based on observations on biological processes of natural
selection and includes genetic algorithms, evolutionary
strategies, evolutionary programming and genetic
programming.
• Applications developed from these algorithms are such
as: routing and scheduling, time tabling, financial
models, data analysis and data mining (Langdon, 1995,
Abramson and Abela, 1991).
• Neural networks are also used in research to classify
nucleic acid sequences and sequence-based prediction
of protein structure (Notredame and Higgins, 1995),
while genetic algorithms are used in molecular structure
optimisation and protein and RNA folding (Ogata et al.,
1995, Shapiro, 1996).
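The selection, crossover and mutation loop shared by genetic algorithms can be sketched as follows. This is a toy example, assuming bit-string individuals and a fitness function that simply counts 1-bits; real applications such as structure optimisation would need a problem-specific encoding and fitness. All parameter values are hypothetical.

```python
import random


def evolve(fitness, n_bits=16, pop_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    """Evolve bit strings towards higher fitness; return (best, best-fitness history)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]   # selection: keep the fitter half
        children = [population[0]]              # elitism: best survives unchanged
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)      # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


# Toy fitness: the number of 1-bits in the individual.
best, history = evolve(fitness=sum)
```

Because the best individual is always copied into the next generation, the best fitness recorded in `history` can never decrease from one generation to the next.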
Computational Perspective
Molecular computing
• The first DNA based computer was developed by
Adleman (1994) to solve the Hamiltonian Path problem.
• The goal of the Hamiltonian Path problem is to find a
path from one city to another city that passes through every
city exactly once.
• It took the DNA-computer one week to process and
complete the operation for a seven city problem (which
can be solved with a pen and paper within an hour).
• As the number of cities increases to more than 70,
conventional (serial logic) computers (including
supercomputers) are unable to solve the problem
completely and efficiently.
• The DNA-computer operates in a massively parallel
fashion, and solved the complex problem within
the same period of time!
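On a conventional computer the problem can be attacked by trying every ordering of the intermediate cities, as in the brute-force sketch below. The five-city road map is hypothetical and is not Adleman's original instance.

```python
from itertools import permutations


def hamiltonian_path(cities, roads, start, end):
    """Return a start-to-end path visiting every city exactly once, or None."""
    middle = [c for c in cities if c not in (start, end)]
    for order in permutations(middle):
        path = (start, *order, end)
        # Accept the ordering only if every consecutive step is a one-way road.
        if all((a, b) in roads for a, b in zip(path, path[1:])):
            return path
    return None


# Hypothetical one-way roads between five cities.
cities = ["A", "B", "C", "D", "E"]
roads = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"),
         ("A", "C"), ("B", "D"), ("C", "B")}
path = hamiltonian_path(cities, roads, "A", "E")
```

The number of orderings grows factorially with the number of cities, which is why exhaustive serial search quickly becomes impractical and why massively parallel approaches are attractive.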
Computational Perspective
Molecular computing

Another example
• Bacteriorhodopsin (bR) from the bacterium
Halobacterium halobium is now being used by
scientists to produce bioelectronic switches a thousand
times smaller and faster than current semiconductor
technologies.
• Hong, Birge and others (in Vitaliano, 1996) are
researching electronic photo-active bR systems to
develop massively parallel and massively distributed
biocomputers.
Biological Perspective

• A sampling of research activities carried out by
biocomputing scientists from the biological
perspective is listed in the next paragraphs.
• Complete discussion of individual research is
beyond the scope of this paper.
Biological Perspective
Gene expression and genetic networks
• Large scale gene expression identification and data
analysis using micro-array, Expressed Sequence Tags
(ESTs), SAGE, DNA chip, etc.
• Identification of coordinated gene expression and
regulatory sequences and their functional
characteristics.
• Expression profile or sequence motif identification and
classification using novel pattern recognition methods.
• Forward modeling of genetic networks based on
Boolean, continuous and stochastic nets
• Development of reverse-engineering algorithm to
extract information from noisy sequence data.
Biological Perspective
Distributed and intelligent databases
• Developing a robust, high-speed network to cater for
the needs of the scientific community - Asia Pacific
Bioinformatics Network
• Developing integrated database search engines and
retrieval systems - BioXML Project, KRIS Program Suite.
• Developing an ontology (or middleware) to bridge
the different notions used in various databases
Biological Perspective
Visualisation and interactive molecular modeling
tools
• The study of structure, energetics and dynamics of
proteins and their interaction with ligands.
• Using Virtual Reality Modeling Language (VRML) to
develop models of the substrate channels in
cytochrome P450 - German Cancer Research Center.
• Developing musical algorithms to provide a different
perspective into the structure of DNA - The Nucleic Acid
Database Project
• High throughput graphics library for molecular structure
viewer RASMOL - Electrotechnical Laboratory Japan.
Biological Perspective
Analysis, management and application of single
nucleotide polymorphisms (SNP) data
• Automation of large scale SNP genotyping
• Tools for high throughput SNP discovery and screening
• Visualization and analysis of SNP data
Biological Perspective
Computer-aided drug design
• Development of large and high throughput
combinatorial libraries - ECLiPS™ from Pharmacopeia
• Protein evolution and structural genomics
• Developing an information-theoretic DNA compression
scheme for new gene discovery and studying DNA
compression
• Developing software that supports semi-automated
annotation of uncharacterized sequence data - GAIA,
Univ. Pennsylvania
Biological Perspective
Natural language processing for biology
• GenEng: A dialogue-based natural language user
interface to GenBank - Center for High
Performance Computing, Univ. Texas
• A logic-based syntactic pattern recognition system for
DNA sequences - GenLang, Univ. Pennsylvania.
• Protein structure prediction in biology and medicine
• Probable "folding cluster" role of non-functional
conserved residues of proteins - Laboratory of
Experimental & Computational Biology, National Cancer
Institute, NIH
Biological Perspective
Application of information theory to biology
• Coincident Detection Method to detect functional and
immunological sites of the highly variable HIV V3 Loop -
Molecular Mining Corporation
• Building predictive prototypes of the immune system
function with information theory models
• Using Minimum Message Length from Information
Theory on Structural Building Blocks (SBB) to identify
different distributions of rotamer classes amongst the
SBB's
Biological Perspective
Data mining and discovery in molecular databases
• Investigating the motif rules that predict T cell
activation, from peptide databases with high binding
affinity to the same MHC class I molecule - BONSAI,
Medical Institute of Bioregulation, Kyushu University.
Biological Perspective
Internet tools for computational biology
• Developing tools to access high volume, heterogeneous
and geographically dispersed biological databases in an
integrated manner - KRIS, NUS Bioinformatics Centre
• Understanding genomic and protein structures on the
WWW - CHIME & RASMOL
• Virtual Reality meeting place for biologists - BioMOO,
Weizmann Institute.
Biological Perspective
Educational topics in biocomputing
• M.Sc. in medical informatics with biomedical computing
skills - Biomedical Information and Communication
Center, Oregon Health Sciences University.
• Computational biology as an instructional tool for
graduates - Center of Bioengineering, University of
Washington
Some Advice
• A little bioinformatics is good for you!
– Know how to use web data resources
– Know the kinds of analyses that are possible
• Sequence and structure computations are
widespread and (fairly) easy. Finding exons, remote
homologies, structural domains, fold families, etc.
are routine.
• Generic clustering, discrimination/regression and
density estimation tools exist (neural networks...)
• Collaboration with bioinformaticians is no
worse than with statisticians.... :-)
