Cluster Analysis for Gene Expression Data

Jiong Yang
EECS
Case Western Reserve University
Outline
• Introduction
• Clustering Algorithms
• Cluster Validation
• Current and Future Research Directions

Clusters & Clustering
• Clustering is the process of grouping data objects into a set of disjoint classes, called clusters, so that objects within a class are highly similar to each other, while objects in separate classes are more dissimilar.
Categories of gene expression data clustering
• Gene expression profile: a matrix of expression values for a set of genes across a set of samples.
• Gene-based clustering: the genes are treated as the objects, while the samples are the features.
• Sample-based clustering: the samples are treated as the objects and the genes as the features.
• Subspace clustering: captures clusters formed by a subset of genes across a subset of samples.
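Proximity measurement for gene expression data
Vectors: $O_i = \{o_{ij} \mid 1 \le j \le p\}$, where $o_{ij}$ is the expression value of the $i$th gene under the $j$th sample and $p$ is the number of samples.
• Euclidean distance (Wang et al., 2002)
$$\mathrm{Euclidean}(O_i, O_j) = \sqrt{\sum_{d=1}^{p} (o_{id} - o_{jd})^2}$$
• Pearson’s correlation coefficient (Tang et al., 2002)
$$\mathrm{Pearson}(O_i, O_j) = \frac{\sum_{d=1}^{p} (o_{id} - \mu_{o_i})(o_{jd} - \mu_{o_j})}{\sqrt{\sum_{d=1}^{p} (o_{id} - \mu_{o_i})^2}\,\sqrt{\sum_{d=1}^{p} (o_{jd} - \mu_{o_j})^2}}$$
where $\mu_{o_i}$ and $\mu_{o_j}$ are the means of $O_i$ and $O_j$ over the $p$ samples.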
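The two proximity measures can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the gene vectors `gene_1` and `gene_2` are made-up values, not data from the cited papers.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson(a, b):
    """Pearson correlation coefficient between two expression vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = sum((x - mean_a) ** 2 for x in a)
    spread_b = sum((y - mean_b) ** 2 for y in b)
    return covariance / math.sqrt(spread_a * spread_b)

# Two hypothetical genes with the same shape but different magnitude.
gene_1 = [1.0, 2.0, 3.0, 4.0]
gene_2 = [10.0, 20.0, 30.0, 40.0]

dist = euclidean(gene_1, gene_2)  # large: the magnitudes differ
corr = pearson(gene_1, gene_2)    # close to 1: the shapes agree
```

The example shows why the choice of measure matters: the two profiles are far apart in Euclidean terms yet perfectly correlated, so a correlation-based measure would group them while a distance-based one would not.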
Gene-based clustering
• The purpose of clustering gene expression data is to reveal the natural data structures and gain some initial insights regarding data distribution.
• Clustering algorithms for gene expression data should be capable of extracting useful information from a high level of background noise.
• A good clustering algorithm may not only partition the data set but also provide some graphical representation of the cluster structure.
K-means (Tavazoie et al., 1999)
• Given a pre-specified number K, the algorithm partitions the data set into K disjoint subsets that minimize the following objective function:
$$E = \sum_{i=1}^{K} \sum_{O \in C_i} \lVert O - \mu_i \rVert^2$$
where $O$ is a data object in cluster $C_i$ and $\mu_i$ is the centroid (mean of the objects) of $C_i$.
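As a rough sketch of how the objective E is minimized in practice, the block below implements the usual alternating assign-and-update iteration (Lloyd's algorithm) in plain Python. The data points and the choice of initial centroids are illustrative assumptions; the slides do not specify an initialization.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(points) for column in zip(*points)]

def kmeans(data, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update; return (labels, centroids, E)."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda i: squared_distance(o, centroids[i]))
            for o in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for i in range(len(centroids)):
            members = [o for o, lab in zip(data, labels) if lab == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = mean_vector(members)
    error = sum(squared_distance(o, centroids[lab]) for o, lab in zip(data, labels))
    return labels, centroids, error

# Hypothetical expression vectors forming two obvious groups.
data = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [5.0, 5.0], [5.2, 4.8], [4.8, 5.2]]
labels, centroids, error = kmeans(data, [data[0], data[3]])
```

The iteration only finds a local minimum of E, so results depend on the starting centroids; running it from several starts and keeping the lowest E is a common workaround.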
Hierarchical clustering (Eisen et al., 1998)
• Generates a hierarchical series of nested clusters, which can be represented graphically by a tree called a dendrogram.
• Agglomerative approaches (bottom-up): single link, complete link and minimum-variance.
• Divisive algorithms (top-down): deterministic annealing algorithm, graph-theoretical methods.
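The agglomerative (bottom-up) idea can be sketched as follows, here with the single-link criterion. This is a deliberately naive version for illustration, not the implementation used by Eisen et al.; the one-dimensional `points` are made-up values.

```python
import math

def single_link(points, num_clusters):
    """Merge the two closest clusters until num_clusters remain.

    Returns the final clusters (lists of point indices) and the merge
    heights, which are the levels a dendrogram would be drawn at.
    """
    clusters = [[i] for i in range(len(points))]
    heights = []

    def link(a, b):
        # Single link: distance between the closest pair across clusters.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > num_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = link(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        heights.append(d)
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters, heights

# Hypothetical 1-D expression values: two close pairs and one outlier.
points = [[0.0], [1.0], [5.0], [6.5], [20.0]]
clusters, heights = single_link(points, num_clusters=2)
```

Replacing `min` with `max` inside `link` would give the complete-link variant; cutting the merge sequence at a chosen height yields a flat partition.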
Self-organizing map (Tamayo et al., 1999)
• Based on a single-layer neural network.
• Each gene is mapped to a point in a high-dimensional space.
• A number of “virtual centers” are chosen.
• An iterative process: for each gene, each center is moved toward the gene. The closer a center is to the gene, the larger the distance it travels.
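A single SOM update can be sketched as below. The slide gives only a simplified description, so this follows the standard formulation as an assumption: the centers sit on a one-dimensional grid, the center nearest to the gene is the winner, and every center moves toward the gene by a fraction that decays with its grid distance from the winner. The Gaussian neighborhood, `learning_rate` and `sigma` are illustrative choices.

```python
import math

def som_step(centers, gene, learning_rate=0.5, sigma=1.0):
    """Apply one SOM update for one gene; return (winner index, new centers)."""
    # The winner is the center closest to the gene in expression space.
    winner = min(range(len(centers)), key=lambda k: math.dist(centers[k], gene))
    updated = []
    for k, center in enumerate(centers):
        # Influence is 1 for the winner and decays with grid distance.
        influence = math.exp(-((k - winner) ** 2) / (2 * sigma ** 2))
        step = learning_rate * influence
        updated.append([c + step * (g - c) for c, g in zip(center, gene)])
    return winner, updated

# Three hypothetical centers on a 1-D grid and one gene profile.
centers = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
gene = [0.9, 1.2]
winner, updated = som_step(centers, gene)
```

In a full training run this step would be repeated over all genes for many passes while `learning_rate` and `sigma` shrink, so that the map settles.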
Graph-theoretical approaches
• Given a dataset X, a proximity matrix P, where $P[i, j] = \mathrm{proximity}(O_i, O_j)$, and a weighted graph $G(V, E)$ in which each data point corresponds to a vertex, the problem of clustering the dataset can be converted into finding minimum cuts or maximal cliques in the graph.
• CLICK (Shamir et al., 2000) & CAST (Ben-Dor et al., 1999)
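To make the graph view concrete, the sketch below turns a proximity matrix into a graph and reads clusters off it. Note that it uses a much simpler stand-in than the methods named above: edges below a similarity threshold are dropped and the connected components are taken as clusters. It is neither CLICK nor CAST, and the matrix and threshold are made-up values.

```python
def threshold_graph(proximity, threshold):
    """Adjacency lists keeping only edges with similarity >= threshold."""
    n = len(proximity)
    return {
        i: [j for j in range(n) if j != i and proximity[i][j] >= threshold]
        for i in range(n)
    }

def connected_components(graph):
    """Return the connected components as sorted lists of vertices."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            component.append(v)
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(sorted(component))
    return components

# Hypothetical similarity matrix for four genes.
proximity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
clusters = connected_components(threshold_graph(proximity, threshold=0.5))
```

Minimum-cut and clique-based methods refine this idea by deciding which edges to remove from the graph structure itself rather than from a fixed threshold.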
Model-based clustering (Yeung et al., 2001)
• The data set is assumed to come from a finite mixture of underlying probability distributions, with each component corresponding to a different cluster.
• The probabilistic feature of model-based clustering is particularly suitable for gene expression data.
• The assumption that the data set fits a specific distribution may not be true in many cases.
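As an illustration of the mixture idea, the following sketch fits a two-component, one-dimensional Gaussian mixture with the EM algorithm. The Gaussian form, the fixed number of components, the initialization at the minimum and maximum value, and the data are all assumptions made for this example rather than details taken from Yeung et al.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(values, iterations=50):
    """Fit a two-component 1-D Gaussian mixture; return weights, means, variances, labels."""
    means = [min(values), max(values)]
    overall = sum(values) / len(values)
    spread = sum((v - overall) ** 2 for v in values) / len(values)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            dens = [weights[k] * normal_pdf(v, means[k], variances[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate the parameters from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(var, 1e-6)  # guard against a collapsing component
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return weights, means, variances, labels

# Hypothetical expression values drawn around two levels.
values = [0.9, 1.0, 1.1, 1.2, 0.8, 4.9, 5.0, 5.1, 5.2, 4.8]
weights, means, variances, labels = em_two_gaussians(values)
```

Unlike K-means, the fit yields a probability of membership for every object, which is the "probabilistic feature" the slide refers to.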
A density-based hierarchical approach: DHC (Jiang et al., 2003)
• The basic idea is to consider a cluster as a high-dimensional dense area, where data objects are “attracted” to each other.
• DHC effectively separates co-expressed genes from noise, and is thus robust in noisy environments.
Summary of gene-based clustering
• Some conventional clustering algorithms, such as K-means, SOM and hierarchical approaches (UPGMA), were applied in the early stages and have proven useful.
• Several new clustering algorithms, such as CLICK, CAST and DHC, have been proposed specifically for gene expression data.
• The performance of each clustering algorithm may vary greatly with different data sets, and there is no absolute “winner” among the clustering algorithms.
Sample-based clustering
• The goal of sample-based clustering is to find the phenotype structures or substructures of the samples.
• Applying conventional clustering methods to cluster samples using all the genes as features may seriously degrade the quality and reliability of the clustering results.
Clustering based on supervised informative gene selection
• Training sample selection: a subset of samples (fewer than 100) is selected to form the training set.
• Informative gene selection: pick out those genes whose expression patterns can distinguish different phenotypes of samples.
• Sample clustering and classification: the whole set of samples is clustered using only the informative genes as features. Conventional clustering algorithms are usually applied to cluster the samples.
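The informative gene selection step can be sketched with a simple two-class score. The slides do not name a scoring rule, so the signal-to-noise ratio used here (difference of class means divided by the sum of class standard deviations) is an assumption chosen for illustration, as are the gene names, values and labels.

```python
import statistics

def signal_to_noise(values, labels):
    """Difference of class means over the sum of class standard deviations."""
    class_0 = [v for v, lab in zip(values, labels) if lab == 0]
    class_1 = [v for v, lab in zip(values, labels) if lab == 1]
    gap = statistics.mean(class_0) - statistics.mean(class_1)
    return gap / (statistics.stdev(class_0) + statistics.stdev(class_1))

def select_informative_genes(expression, labels, top_n):
    """Rank genes by absolute score on the training samples; keep the top_n."""
    scores = {gene: abs(signal_to_noise(vals, labels)) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical training set: six samples with known phenotype labels.
training_labels = [0, 0, 0, 1, 1, 1]
expression = {
    "gene_a": [1.0, 1.1, 0.9, 5.0, 5.2, 4.8],  # separates the phenotypes
    "gene_b": [2.0, 3.0, 2.5, 2.4, 2.9, 2.1],  # uninformative
    "gene_c": [7.0, 6.5, 7.5, 1.0, 1.5, 0.5],  # separates the phenotypes
}
informative = select_informative_genes(expression, training_labels, top_n=2)
```

The selected genes would then serve as the only features when the whole sample set is clustered.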
Unsupervised clustering and informative gene selection
• Unsupervised sample-based clustering assumes that no phenotype information is assigned to any sample.
• Unsupervised gene selection: first the gene (feature) dimension is reduced, then conventional clustering algorithms are applied. PCA (Alter et al., 2000) & F-statistic (Ding et al., 2002)
• Interrelated clustering: the relationship between the genes and samples is dynamically maintained, and a clustering process and a gene selection process are iteratively combined. CLIFF (Xing et al., 2001)
Summary of sample-based clustering
• Supervised informative gene selection techniques are widely applied, and it is relatively easy to achieve a high clustering accuracy with them.
• Unsupervised sample-based clustering converges to an accurate partition of the samples together with a set of informative genes. One drawback of these approaches is that the gene filtering process is non-invertible, that is, filtered-out genes cannot be restored later.
• Two further issues affect the quality of sample-based clustering techniques: the choice of the number of clusters K, and the time complexity of the techniques.
Subspace clustering
• Only a small subset of the genes participates in any cellular process of interest, and any cellular process takes place only in a subset of the samples.
• A single gene may participate in multiple pathways that may or may not be coactive under all conditions.
• Subspace clustering methods capture coherence exhibited by the “blocks” within gene expression matrices. A “block” is a sub-matrix defined by a subset of genes on a subset of samples.
Coupled two-way clustering (Getz et al., 2000)
• CTWC provides a heuristic to avoid brute-force enumeration of all possible combinations. Only subsets of genes or samples that are identified as stable clusters in previous iterations are candidates for the next iteration. The iteration continues until no new clusters are found that satisfy some criterion, such as stability or critical size.
• CTWC searches for blocks in a deterministic manner, and the clustering results are therefore sensitive to the initial clustering settings.
Plaid model (Lazzeroni et al., 2002)
• The plaid model regards gene expression data as a sum of multiple “layers”, where each layer may represent the presence of a particular biological process with only a subset of genes and a subset of samples involved.
• The plaid model is based on the questionable assumption that, if a gene participates in several cellular processes, then its expression level is the aggregation (sum) of the terms involved in the individual processes.
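The additive assumption can be made concrete with a toy construction. The sketch below builds an expression matrix as a background level plus overlapping layers; for simplicity each layer contributes one constant effect to its block, which is a simplification of the full plaid model (where a layer can also carry gene-specific and sample-specific terms). All sizes and values are made up.

```python
def plaid_sum(num_genes, num_samples, background, layers):
    """Build a matrix as background plus the sum of the given layers.

    Each layer is (gene_indices, sample_indices, effect) and adds its
    effect to every cell of the block those indices define.
    """
    matrix = [[background] * num_samples for _ in range(num_genes)]
    for genes, samples, effect in layers:
        for i in genes:
            for j in samples:
                matrix[i][j] += effect
    return matrix

# Two hypothetical processes; gene 1 under sample 1 takes part in both.
layers = [
    ([0, 1], [0, 1], 2.0),
    ([1, 2], [1, 2], 3.0),
]
matrix = plaid_sum(num_genes=3, num_samples=3, background=1.0, layers=layers)
```

The cell shared by both layers receives the sum of both effects, which is exactly the assumption the second bullet calls questionable.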
Biclustering (Cheng et al., 2000) and δ-Clusters (Yang et al., 2002)
• Biclustering finds a block together with a score, called the mean-squared residue, that measures the coherence of the genes and conditions in the block. A low mean-squared residue combined with a large variation from a constant is a good criterion for identifying a block.
• δ-Clusters use the average residue across every entry in the sub-matrix to measure the coherence within the sub-matrix. A heuristic move-based method called FLOC (Flexible Overlapped Clustering) is applied to search for K embedded subspace clusters.
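The mean-squared residue can be computed directly from its definition: for each entry of the block, subtract the row mean and the column mean, add back the block mean, square the result, and average over the block. The sketch below does this in plain Python; the matrix is a made-up example containing one perfectly additive 3×3 block.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean-squared residue of the block given by the row and column indices."""
    rows, cols = list(rows), list(cols)
    row_mean = {i: sum(matrix[i][j] for j in cols) / len(cols) for i in rows}
    col_mean = {j: sum(matrix[i][j] for i in rows) / len(rows) for j in cols}
    block_mean = sum(row_mean.values()) / len(rows)
    total = sum(
        (matrix[i][j] - row_mean[i] - col_mean[j] + block_mean) ** 2
        for i in rows
        for j in cols
    )
    return total / (len(rows) * len(cols))

# Rows 0-2 on columns 0-2 differ only by a per-row shift (coherent block).
matrix = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [4.0, 5.0, 6.0, 7.0],
    [0.0, 9.0, 1.0, 5.0],
]
coherent = mean_squared_residue(matrix, rows=[0, 1, 2], cols=[0, 1, 2])
whole = mean_squared_residue(matrix, rows=range(4), cols=range(4))
```

A block whose rows differ only by additive shifts has residue zero, while the full matrix scores much higher. A search procedure then looks for large blocks that stay under a residue threshold.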
Summary of subspace clustering
• The genes in a “block” exhibit coherent expression patterns under the conditions within the same “block”.
• Different approaches adopt different greedy heuristics to approximate the optimal solution and make the problem tractable.
Cluster validation
• Different clustering algorithms, or even a single clustering algorithm using different parameters, generally result in different sets of clusters. Therefore, it is important to compare various clustering results and select the one that best fits the “true” data distribution.
• Three aspects are assessed: the quality of the clusters, their agreement with a given “ground truth”, and the reliability of the clusters.
Homogeneity and separation
• The homogeneity of a cluster C can be measured by the average pairwise object similarity within C:
$$H_1(C) = \frac{\sum_{O_i, O_j \in C,\; O_i \neq O_j} \mathrm{Similarity}(O_i, O_j)}{|C| \cdot (|C| - 1)}$$
• The homogeneity with respect to the “centroid” of the cluster C:
$$H_2(C) = \frac{1}{|C|} \sum_{O_i \in C} \mathrm{Similarity}(O_i, \bar{O})$$
where $\bar{O}$ is the “centroid” of C.
• Cluster separation is analogously defined from various perspectives to measure the dissimilarity between two clusters $C_1$ and $C_2$:
$$S_1(C_1, C_2) = \frac{\sum_{O_i \in C_1,\; O_j \in C_2} \mathrm{Similarity}(O_i, O_j)}{|C_1| \cdot |C_2|} \quad \text{and} \quad S_2(C_1, C_2) = \mathrm{Similarity}(\bar{O}_1, \bar{O}_2)$$
where $\bar{O}_1$ and $\bar{O}_2$ are the centroids of $C_1$ and $C_2$.
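The four measures above can be sketched as small functions that take the similarity function as a parameter. The particular similarity used in the example, 1 / (1 + Euclidean distance), and the two clusters are assumptions made for illustration; any similarity measure could be passed in instead.

```python
import math

def centroid(cluster):
    """Component-wise mean of the objects in a cluster."""
    return [sum(column) / len(cluster) for column in zip(*cluster)]

def homogeneity_pairwise(cluster, similarity):
    """H1: average similarity over all ordered pairs of distinct objects."""
    n = len(cluster)
    total = sum(
        similarity(cluster[i], cluster[j])
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return total / (n * (n - 1))

def homogeneity_centroid(cluster, similarity):
    """H2: average similarity between each object and the cluster centroid."""
    center = centroid(cluster)
    return sum(similarity(o, center) for o in cluster) / len(cluster)

def separation_pairwise(c1, c2, similarity):
    """S1: average similarity over all cross-cluster pairs."""
    return sum(similarity(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

def separation_centroid(c1, c2, similarity):
    """S2: similarity between the two cluster centroids."""
    return similarity(centroid(c1), centroid(c2))

def sim(a, b):
    """Illustrative similarity: 1 for identical vectors, tending to 0 far apart."""
    return 1.0 / (1.0 + math.dist(a, b))

# Two hypothetical, well-separated clusters.
tight = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0]]
far = [[8.0, 8.0], [9.0, 8.0], [8.0, 9.0]]

h1 = homogeneity_pairwise(tight, sim)
h2 = homogeneity_centroid(tight, sim)
s1 = separation_pairwise(tight, far, sim)
s2 = separation_centroid(tight, far, sim)
```

For a good partition the within-cluster values (`h1`, `h2`) should be clearly higher than the between-cluster values (`s1`, `s2`), as they are here.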
Agreement with reference partition (Halkidi et al., 2001)
• For a clustering result, a matrix C can be constructed with $C_{ij} = 1$ if $O_i$ and $O_j$ belong to the same cluster and $C_{ij} = 0$ otherwise. Given a “ground truth” matrix P built in the same way:
$n_{11}$ is the number of object pairs $(O_i, O_j)$ where $C_{ij} = 1$ and $P_{ij} = 1$
$n_{10}$ is the number of object pairs $(O_i, O_j)$ where $C_{ij} = 1$ and $P_{ij} = 0$
$n_{01}$ is the number of object pairs $(O_i, O_j)$ where $C_{ij} = 0$ and $P_{ij} = 1$
$n_{00}$ is the number of object pairs $(O_i, O_j)$ where $C_{ij} = 0$ and $P_{ij} = 0$

Rand index: $$\mathrm{Rand} = \frac{n_{11} + n_{00}}{n_{11} + n_{10} + n_{01} + n_{00}}$$
Jaccard coefficient: $$\mathrm{JC} = \frac{n_{11}}{n_{11} + n_{10} + n_{01}}$$
Minkowski measure: $$\mathrm{Minkowski} = \sqrt{\frac{n_{10} + n_{01}}{n_{11} + n_{01}}}$$
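The pair counts and the three indices can be computed from two label vectors without building the matrices explicitly, since $C_{ij}$ and $P_{ij}$ only ask whether two objects share a label. The sketch below is a minimal version; the label vectors are made up.

```python
import math
from itertools import combinations

def pair_counts(clustering, truth):
    """Count object pairs by agreement; returns (n11, n10, n01, n00)."""
    n11 = n10 = n01 = n00 = 0
    for i, j in combinations(range(len(clustering)), 2):
        same_cluster = clustering[i] == clustering[j]  # C_ij = 1
        same_class = truth[i] == truth[j]              # P_ij = 1
        if same_cluster and same_class:
            n11 += 1
        elif same_cluster:
            n10 += 1
        elif same_class:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00

def agreement(clustering, truth):
    """Rand index, Jaccard coefficient and Minkowski measure."""
    n11, n10, n01, n00 = pair_counts(clustering, truth)
    return {
        "rand": (n11 + n00) / (n11 + n10 + n01 + n00),
        "jaccard": n11 / (n11 + n10 + n01),
        "minkowski": math.sqrt((n10 + n01) / (n11 + n01)),
    }

# Hypothetical result: object 2 is placed in the wrong cluster.
clustering = [0, 0, 0, 1, 1, 1]
truth = [0, 0, 1, 1, 1, 1]
counts = pair_counts(clustering, truth)
scores = agreement(clustering, truth)
```

Higher Rand and Jaccard values indicate better agreement, whereas a lower Minkowski measure is better; the Jaccard and Minkowski measures ignore $n_{00}$, which otherwise dominates when there are many small clusters.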
Reliability of clusters
• P-value
$$P = 1 - \sum_{i=0}^{k-1} \frac{\binom{f}{i} \binom{g - f}{n - i}}{\binom{g}{n}}$$
where f is the total number of genes within a functional category and g is the total number of genes. In this hypergeometric form, n is the size of the cluster and k is the number of genes in the cluster that belong to the category.
• Prediction strength (Yeung et al., 2001): the generated clusters are assessed by repeatedly measuring the prediction strength with one or a few of the data objects left out in turn as “test samples”, while the remaining data objects are used for clustering.
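The P-value above is the upper tail of a hypergeometric distribution and can be evaluated exactly with integer binomial coefficients. In the sketch below, the parameter names follow the formula, with `n` taken as the cluster size and `k` as the number of cluster genes in the category; the numbers in the example are made up.

```python
from math import comb

def enrichment_p_value(k, n, f, g):
    """Probability of seeing at least k category genes in a cluster of n genes.

    f is the number of genes in the functional category and g is the
    total number of genes.
    """
    below_k = sum(comb(f, i) * comb(g - f, n - i) for i in range(k))
    return 1.0 - below_k / comb(g, n)

# Hypothetical case: 10 genes in total, 4 in the category,
# and a cluster of 3 genes containing 2 category members.
p = enrichment_p_value(k=2, n=3, f=4, g=10)
```

A small P-value means the cluster contains more genes from the category than would be expected by chance, which supports the biological reliability of the cluster.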
Current and future research directions
• The performance of different clustering algorithms and different validation approaches depends strongly on both the data distribution and the application requirements.
• Gene expression profiles are very noisy. How to handle or remove this noise is an important challenge.
• Integrating different kinds of biological knowledge, e.g., pathways and GO, into the analysis process.
References
• Alter O., Brown P.O. and Botstein D. Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl. Acad. Sci. USA, 97(18):10101–10106, August 2000.
• Ben-Dor A., Shamir R. and Yakhini Z. Clustering gene expression patterns. Journal of Computational Biology, 6(3/4):281–297, 1999.
• Cheng Y. and Church G.M. Biclustering of expression data. Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB), 8:93–103, 2000.
• Ding C. Analysis of gene expression profiles: class discovery and leaf ordering. In Proc. of International Conference on Computational Molecular Biology (RECOMB), pages 127–136, Washington, DC, April 2002.
• Eisen M.B., Spellman P.T., Brown P.O. and Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA, 95(25):14863–14868, December 1998.
• Getz G., Levine E. and Domany E. Coupled two-way clustering analysis of gene microarray data. Proc. Natl. Acad. Sci. USA, 97(22):12079–12084, October 2000.
• Halkidi M., Batistakis Y. and Vazirgiannis M. On clustering validation techniques. Intelligent Information Systems Journal, 2001.
• Jiang D., Pei J. and Zhang A. DHC: A density-based hierarchical clustering method for time series gene expression data. In Proceedings of BIBE 2003: 3rd IEEE International Symposium on Bioinformatics and Bioengineering, Bethesda, Maryland, March 10-12, 2003.
• Lazzeroni L. and Owen A. Plaid models for gene expression data. Statistica Sinica, 12(1):61–86, 2002.
• Shamir R. and Sharan R. CLICK: A clustering algorithm for gene expression analysis. In Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology (ISMB ’00), AAAI Press, 2000.
• Tamayo P., Slonim D., Mesirov J., Zhu Q., Kitareewan S., Dmitrovsky E., Lander E.S. and Golub T.R. Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation. Proc. Natl. Acad. Sci. USA, 96(6):2907–2912, March 1999.
• Tang C., Zhang L., Zhang A. and Ramanathan M. Interrelated two-way clustering: An unsupervised approach for gene expression data analysis. In Proceedings of BIBE 2001: 2nd IEEE International Symposium on Bioinformatics and Bioengineering, pages 41–48, Bethesda, Maryland, November 4-5, 2001.
• Tang C. and Zhang A. An iterative strategy for pattern discovery in high-dimensional data sets. In Proceedings of the 11th International Conference on Information and Knowledge Management (CIKM 02), McLean, VA, November 4-9, 2002.
• Tavazoie S., Hughes D., Campbell M.J., Cho R.J. and Church G.M. Systematic determination of genetic network architecture. Nature Genet., pages 281–285, 1999.
• Wang H., Wang W., Yang J. and Yu P.S. Clustering by pattern similarity in large data sets. In SIGMOD 2002, Proceedings ACM SIGMOD International Conference on Management of Data, pages 394–405, 2002.
• Xing E.P. and Karp R.M. CLIFF: Clustering of high-dimensional microarray data via iterative feature filtering using normalized cuts. Bioinformatics, 17(1):306–315, 2001.
• Yang J., Wang W., Wang H. and Yu P.S. δ-cluster: Capturing subspace correlation in a large data set. In Proceedings of the 18th International Conference on Data Engineering (ICDE 2002), pages 517–528, 2002.
• Yeung K.Y., Fraley C., Murua A., Raftery A.E. and Ruzzo W.L. Model-based clustering and data transformations for gene expression data. Bioinformatics, 17:977–987, 2001.
• Yeung K.Y., Haynor D.R. and Ruzzo W.L. Validating clustering for gene expression data. Bioinformatics, 17(4):309–318, 2001.
Thanks!
