Anomaly Detection in Gene Expression via Stochastic Models of Gene Regulatory Networks

This study presents a method for anomaly detection in gene expression using stochastic models of gene regulatory networks (GRNs). By applying G-Network theory, the authors demonstrate that their approach effectively monitors steady-state behaviors and detects differentially expressed genes, outperforming traditional methods like the t-test. The findings indicate that key regulatory genes can be expressed without certain cyclins, providing insights into GRN dynamics and potential applications in further research.

BMC Genomics BioMed Central

Proceedings Open Access


Anomaly detection in gene expression via stochastic
models of gene regulatory networks
Haseong Kim and Erol Gelenbe*
Address: Intelligent Systems Networks Group, Electrical and Electronic Engineering Department, Imperial College London, UK
E-mail: Haseong Kim - [email protected]; Erol Gelenbe* - [email protected]
*Corresponding author

from Asia Pacific Bioinformatics Network (APBioNet) Eighth International Conference on Bioinformatics (InCoB2009)
Singapore 7-11 September 2009

Published: 3 December 2009


BMC Genomics 2009, 10(Suppl 3):S26 doi: 10.1186/1471-2164-10-S3-S26

This article is available from: http://www.biomedcentral.com/1471-2164/10/S3/S26


© 2009 Kim and Gelenbe; licensee BioMed Central Ltd.
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract
Background: The steady-state behaviour of gene regulatory networks (GRNs) can provide crucial evidence for detecting disease-causing genes. However, monitoring the dynamics of GRNs is particularly difficult because biological data only reflect a snapshot of the dynamical behaviour of the living organism. In addition, most GRN data and methods provide only limited structural inferences.
Results: In this study, the theory of stochastic GRNs, derived from G-Networks, is applied to GRNs in order to monitor their steady-state behaviour. The approach is applied to a simulation dataset generated with a stochastic gene expression model, and we observe that the G-Network properly detects the abnormally expressed genes in the simulation study. In the analysis of real data from a cell cycle microarray of budding yeast, our approach finds that the steady-state probability of CLB2 is lower than that of other agents, while most of the genes have similar steady-state probabilities. These results lead to the conclusion that the key regulatory genes of the cell cycle can be expressed in the absence of CLB-type cyclins, which was also the conclusion of the original microarray experiment study.
Conclusion: G-Networks provide an efficient way to monitor the steady state of GRNs. Our method produces more reliable results than the conventional t-test in detecting differentially expressed genes. G-Networks are also successfully applied to the yeast GRNs. This study will be the basis of further studies of GRN dynamics in combination with conventional GRN inference algorithms.

Background
Identifying the key features and dynamics of gene regulatory networks (GRNs) is an important step towards understanding the behaviour of biological systems. Thanks to the development of high-throughput technology, the amount of microarray gene expression data has greatly increased, and numerous mathematical models attempt to explain gene regulation using gene networks [1,2]. Once a network structure is inferred, its dynamics need to be considered. However, most methods focus on the inference of network structure, which only provides a snapshot of a given dataset. Probabilistic Boolean Networks (PBNs) represent the dynamics of GRNs [3], but PBNs are limited by the computational complexity of the related algorithms [4].

In [5], a new approach to the steady-state analysis of GRNs based on G-Network theory [6,7] is proposed, while G-Networks were first applied to GRNs with simplifying assumptions concerning gene expression in [8]. However, the G-Network approach also exhibits specific difficulties because of the large number of parameters that are needed to compute the steady-state solution. Thus, in this study we reduce the number of model parameters on the basis of biological assumptions and focus on estimating two parameters in particular: the total input rate and the steady-state probability of a gene.

A G-Network is a probabilistic queueing network having special customers which include positive and negative "customers", signals and triggers [6,7]. It was also originally developed as a model of stochastic neuronal networks [9] with "negative and positive signals or spikes" which represent inhibition and excitation. In terms of GRNs, a queue is a "place" in which mRNAs are stored, and an mRNA can be considered to be a "customer" of the G-Network. The positive and negative signals are interpreted as protein activities such as those of transcription factors, inducers and repressors. Note that the customers or signals of the G-Network can be any biological molecules. However, in our study, we focus on the behaviour of mRNAs because the available GRN data are usually mRNA expression levels. Each queue has an input rate and a service rate, which represent the transcription and degradation processes, respectively. Our interest is to estimate the steady-state probability that a queue is busy, which corresponds to the probability that an mRNA is present, and we are also interested in the total mRNA input rate of each queue. To evaluate the accuracy of the proposed method, we generated a simple simulation dataset by using stochastic gene expression models processed with the widely accepted Gillespie algorithm [10,11]. We also examine a real biological dataset obtained from the cell cycle of budding yeast [12].

Although queueing theory is a common computational tool, G-Networks are an essential departure from queueing theory; in particular, conventional queues cannot be applied to GRNs because the notion of inhibition does not exist in queueing theory but was introduced by G-Network theory. There are two other essential novelties in our work. First, our approach enables us to obtain the steady state of GRNs with only polynomial computational complexity, due to the product form solution of G-Networks; large memory requirements and non-polynomial computational complexity are basic limitations of conventional methods such as PBNs. Second, our method can provide more reliable measures to detect differentially expressed genes in microarray analysis (as shown in our simulation study).

G-networks and gene regulatory networks
The GRN model used in this study is the probabilistic gene regulatory model introduced in [5]. In this model, let $K_i(t)$ be integer-valued random variables which represent the quantity (of mRNA) of gene $i$ at time $t$. If $K_i(t)$ is zero, gene $i$ cannot interact with other genes. Then we have the following probabilities,

$$\Pr(K_i(t+\Delta t) = k+1 \mid K_i(t) = k) = \Lambda_i \Delta t + o(\Delta t)$$

$$\Pr(K_i(t+\Delta t) = k-1 \mid K_i(t) = k) = \mu_i \Delta t + o(\Delta t)$$

where $\Lambda_i$ is the total input rate (the sum of the transcription rate, $\lambda_i$, and the rate at which mRNAs arrive from outside the system, $I_i$), $\mu_i$ is the service rate (e.g. the degradation rate of mRNAs), and $o(\Delta t)/\Delta t \to 0$ as $\Delta t \to 0$. Let $r_i$ denote the activity (signal process) rate of each gene $i$. Then $1/r_i$ is the average time between successive interactions of gene $i$ with other genes. If the $i$th gene interacts with other genes, the following events occur:

• With probability $P^+(i,j)$, gene $i$ activates gene $j$; when this happens, $K_i(t)$ is depleted by 1 and $K_j(t)$ is increased by 1
• With probability $P^-(i,j)$, gene $i$ inhibits gene $j$; when this happens, both $K_i(t)$ and $K_j(t)$ are depleted by 1
• With probability $Q(i,j,l)$, gene $i$ joins with gene $j$ to act upon gene $l$ in excitatory mode, as a result of which both $K_i(t)$ and $K_j(t)$ are reduced by 1, while $K_l(t)$ is increased by 1
• With probability $d_i$, which is defined by

$$d_i + \sum_{j=1}^{n} \left( P^+(i,j) + P^-(i,j) + \sum_{l=1}^{n} Q(i,j,l) \right) = 1,$$

the signal of gene $i$ exits the system, so $K_i(t)$ is depleted by 1

Let us define a random process $K(t) = [K_1(t), ..., K_n(t)]$, $t \ge 0$, and an $n$-vector of non-negative integers $k = [k_1, ..., k_n]$. $P(k, t)$ is the probability that $K(t)$ takes the value $k$ at time $t$, $P(k, t) = P(K(t) = k)$. Then the probability that $K(t)$ takes the value $k$ at time $t + \Delta t$ is defined by

$$
\begin{aligned}
P(k, t+\Delta t) = \sum_{i=1}^{n} \Big[ & (\Lambda_i \Delta t + o(\Delta t))\, P(k_i^-, t)\, I(k_i > 0) + (\mu_i \Delta t + o(\Delta t))\, P(k_i^+, t) \\
& + (d_i r_i \Delta t + o(\Delta t))\, P(k_i^+, t) \\
& + \sum_{j=1}^{n} \Big\{ (P^+(i,j)\, r_i \Delta t + o(\Delta t))\, P(k_{ij}^{+-}, t)\, I(k_j > 0) \\
& \qquad + (P^-(i,j)\, r_i \Delta t + o(\Delta t))\, P(k_{ij}^{++}, t) \\
& \qquad + (P^-(i,j)\, r_i \Delta t + o(\Delta t))\, P(k_i^+, t)\, I(k_j = 0) \\
& \qquad + \sum_{l=1}^{n} \big( (Q(i,j,l)\, r_i \Delta t + o(\Delta t))\, P(k_{ijl}^{++-}, t) \\
& \qquad\qquad + (Q(j,i,l)\, r_j \Delta t + o(\Delta t))\, P(k_{ijl}^{++-}, t) \big)\, I(k_l > 0) \Big\} \\
& + (1 - \Lambda_i \Delta t - \mu_i \Delta t - r_i \Delta t + o(\Delta t))\, P(k, t)\, I(k_i > 0) \Big]
\end{aligned}
\tag{1}
$$

where $k_i^+$ ($k_i^-$) is the vector whose $i$th element is $k_i + 1$ ($k_i - 1$), and $I(x)$ is the indicator function, which is 1 if the condition $x$ is satisfied and 0 otherwise. The first and second terms describe the increment and decrement of the length of queue $i$, respectively. The third term is the probability that gene $i$ is activated but nothing happens except that queue $i$ loses one mRNA. The fourth to sixth terms are the probabilities that gene $i$ is activated and interacts with gene $j$. The remaining terms of (1) represent the probabilities that the interaction of gene $i$ and gene $j$ affects gene $l$ (the length of the $l$th queue). Dividing (1) by $\Delta t$ and introducing the equilibrium probability distribution of the system, $P(k) = \lim_{t \to \infty} P(k, t)$, we obtain the following dynamic behaviour,

$$
\begin{aligned}
\frac{\partial P(k)}{\partial t} = \sum_{i=1}^{n} \Big[ & \Lambda_i P(k_i^-)\, I(k_i > 0) + (\mu_i + d_i r_i)\, P(k_i^+) \\
& + \sum_{j=1}^{n} \Big\{ P^+(i,j)\, r_i\, P(k_{ij}^{+-})\, I(k_j > 0) + P^-(i,j)\, r_i \big( P(k_{ij}^{++}) + P(k_i^+)\, I(k_j = 0) \big) \\
& \qquad + \sum_{l=1}^{n} \big( r_i Q(i,j,l) + r_j Q(j,i,l) \big)\, P(k_{ijl}^{++-})\, I(k_l > 0) \Big\} \\
& - (\Lambda_i + \mu_i + r_i)\, P(k)\, I(k_i > 0) \Big]
\end{aligned}
\tag{2}
$$

Now, let us consider the following equations for $\Lambda_i^+$ and $\Lambda_i^-$,

$$
\begin{aligned}
\Lambda_i^+ &= \Lambda_i + \sum_{j=1}^{n} q_j r_j P^+(j,i) + \sum_{j,l=1,\, l \ne j}^{n} q_j q_l r_j Q(j,l,i) \\
\Lambda_i^- &= \mu_i + \sum_{j=1}^{n} q_j r_j P^-(j,i) + \sum_{j,l=1,\, l \ne j}^{n} q_l q_i r_l Q(l,i,j)
\end{aligned}
\tag{3}
$$

where $q_i = \Lambda_i^+ / (r_i + \Lambda_i^-)$ represents the probability that gene $i$ is expressed in steady state. Using (2) and (3), E. Gelenbe showed that the following product form is satisfied [5,7]:

$$
\begin{aligned}
P(K_m = k_m) &= q_m^{k_m} (1 - q_m), \\
P\big( (K_{m_1}, ..., K_{m_{|I|}}) = (k_{m_1}, ..., k_{m_{|I|}}) \big) &= \prod_{i=1}^{|I|} q_{m_i}^{k_{m_i}} (1 - q_{m_i})
\end{aligned}
\tag{4}
$$

for any subset $I = \{m_1, ..., m_{|I|}\} \subset \{1, ..., n\}$ such that $q_m < 1$ for each $m \in I$.

Results and discussion
Simple gene regulatory networks using stochastic gene expression model
In order to assess our G-Network model, we construct a simple GRN structure and generate the expression data using a synthetic stochastic gene expression model [13,14]. This stochastic gene expression model has several important features such as protein dimerization [15] and time delay for protein signalling [13]. Figure 1 shows the simulated network structure, which is based on the following basic principle: the number of proteins per cell chases the number of mRNAs, which in turn chases the number of active genes [14]. Figure 2 depicts the assumptions of our model and (5)-(11) give the corresponding processes (RPo: RNA open complex, Pro: promoter, R: mRNA, P: protein monomer, PP: protein dimer, 0: degradation, t: time, and Δt: time increment):

$$\mathrm{RPo}_i(t) + \mathrm{Pro}_i(t) \xrightarrow{\lambda_2} \mathrm{RPo}_i(t) + \mathrm{Pro}_i(t) + R_i(t) \tag{5}$$

$$R_i(t) \xrightarrow{\lambda_3} R_i(t) + P_i(t) \tag{6}$$

$$P_i(t) + P_j(t) \xrightarrow{k_{a2}} PP_{ij}(t) \tag{7}$$

$$PP_{ij}(t) \xrightarrow{k_{d2}} P_i(t) + P_j(t) \tag{8}$$

$$\mathrm{Pro}_l(t) + PP_{ij}(t) \xrightarrow{k_{a1}} \mathrm{Pro}_l PP_{ij}(t + \Delta t) \tag{9}$$

$$\mathrm{Pro}_l PP_{ij}(t) + PP_{mn}(t) \xrightarrow{k_{d1}} \mathrm{Pro}_l(t + \Delta t) + PP_{ij}(t + \Delta t) + PP_{mn}(t + \Delta t) \tag{10}$$

$$R_i(t) \xrightarrow{\gamma_2} 0(t), \qquad P_i(t) \xrightarrow{\gamma_3} 0(t), \qquad PP_{ij}(t) \xrightarrow{\gamma_4} 0(t) \tag{11}$$

where i, j ∈ {A, B, C, D} in Figure 1. In addition, we assume that proteins such as transcription factors and repressors require accumulation times for their activation [11,13], and use the modified Gillespie algorithm to generate the expression data [10,11]. The cell growth rate and cell volume are fixed, and we consider five cells. Detailed parameters are summarized in Table 1 with their references.
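
The data-generation step can be illustrated with the plain (unmodified) Gillespie direct method. The sketch below is only an illustration under strong simplifying assumptions: it follows the mRNA count of a single gene with constant transcription and first-order degradation, using the transcription initiation and mRNA degradation rates listed in Table 1, and it leaves out translation, dimerization, DNA-protein binding and the protein accumulation delay of the full model. The function name and its arguments are hypothetical and are not taken from the paper.

```python
import random


def gillespie_birth_death(k_tx, k_deg, t_end, seed=0):
    """Plain Gillespie direct method for one mRNA species (illustrative only).

    k_tx  : transcription rate (molecules per second), assumed constant
    k_deg : first-order degradation rate (per second, per molecule)
    t_end : simulated time span in seconds
    """
    rng = random.Random(seed)
    t, n = 0.0, 0
    times, counts = [0.0], [0]
    while True:
        a_tx = k_tx              # propensity of producing one mRNA
        a_deg = k_deg * n        # propensity of degrading one mRNA
        a_total = a_tx + a_deg
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Pick the reaction in proportion to its propensity.
        if rng.random() * a_total < a_tx:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


# Rates from Table 1; 3000 s matches the simulated time span in the text.
times, counts = gillespie_birth_death(k_tx=0.0025, k_deg=0.00578,
                                      t_end=3000.0, seed=1)
print(len(times) - 1, "reaction events; final mRNA count:", counts[-1])
```

For this reduced birth-death model the long-run mean mRNA count is the ratio of the two rates, 0.0025/0.00578 ≈ 0.43, so individual trajectories mostly alternate between zero and one molecule.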

Figure 1
Simple gene regulatory network structure. The simulation study is performed with the four-gene GRN structure. Each gene inhibits its neighbor gene.

The transcription process in (5) follows an exponential distribution with transcription initiation rate λ2 [16]. The translation processes are given in (6) and include direct competition between the ribosome binding site and the RNase E binding site, which degrades the mRNAs. Thus the translation process follows a geometric distribution with probability p and burst size b = p/(1 - p) [13,16]. T_D is the average time interval between successive competitions, and the number of surviving mRNAs n2 in the population after transcription is blocked is n2 = n2,0 p^(t/T_D). This corresponds to an mRNA half-life T_half = -(log(2)/log(p)) T_D [13]. Thus the translation initiation rate, λ3 = 1/T_D, can be computed. The protein dimer association and dissociation rates are ka2 and kd2, respectively, as shown in (7) and (8) [17]. We also consider the DNA-protein association and dissociation rates (ka1 and kd1 in (9) and (10), respectively) [18]. The degradation rates of mRNA and of proteins are obtained by using the half-life of each molecule (11) [16,17].
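
As a quick consistency check on these rate relations, the sketch below converts the mRNA degradation rate of Table 1 into a half-life using the first-order decay relation T_half = ln 2 / γ, and then recovers the survival probability p from T_half = -(log 2 / log p) T_D with T_D = 1/λ3. Treating the burst size as p/(1 - p), the mean of a geometric burst, is an assumption of this sketch, and the helper names are illustrative only.

```python
import math


def half_life(rate):
    """Half-life of a first-order decay process with the given rate (1/s)."""
    return math.log(2) / rate


def survival_probability(t_d, t_half):
    """Solve T_half = -(log 2 / log p) * T_D for p."""
    return 2.0 ** (-t_d / t_half)


# Values taken from Table 1.
lambda3 = 0.0612   # translation initiation rate, 1/s
gamma2 = 0.00578   # mRNA degradation rate, 1/s

t_d = 1.0 / lambda3                  # mean time between competitions
t_half_mrna = half_life(gamma2)      # mRNA half-life in seconds
p = survival_probability(t_d, t_half_mrna)
burst = p / (1.0 - p)                # assumed form of the burst size

print(f"mRNA half-life: {t_half_mrna:.1f} s")
print(f"survival probability p: {p:.4f}")
print(f"implied burst size: {burst:.2f}")
```

Under these assumptions the mRNA half-life is about 120 s and the implied burst size is close to the value b = 10 listed in Table 1.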

We generate three sets of expression data (Datasets 1, 2, and 3); each dataset has two groups, the normal group and the case group. These groups are obtained with the same parameter values, except that the transcription initiation rate of GA in the case group is 0.0012 sec^-1, which is half of the transcription rate in the normal group, 0.0025 sec^-1. Both groups are simulated for 3000 seconds. In order to compare these two groups, we perform not only the G-Network analysis but also the t-test, which is widely used to find differentially expressed genes in microarray analysis. Datasets 1 and 2 consist of 50 samples each

Table 1: Parameters of stochastic gene expression model

Parameter                       Symbol   Value            References
Transcription initiation        λ2       0.0025 sec^-1    [16,22]
Translation initiation          λ3       0.0612 sec^-1    [14,16]
mRNA degradation                γ2       0.00578 sec^-1   [16]
Monomer degradation             γ3       0.00077 sec^-1   [16,17]
Dimer degradation               γ4       0.00057 sec^-1   [16,17]
Dimer association               ka1      0.1              [17]
Dimer dissociation              kd1      0.01             [17]
DNA-protein association         ka2      0.189            [18]
DNA-protein dissociation        kd2      0.0157           [18]
Burst size                      b        10               [14,16]
Accumulation time of proteins   Δt       0.1              [11]

Figure 2
Assumptions for the stochastic gene expressions. There are a total of 10 processes (Transcription, Translation, mRNA degradation, Dimerization, Monomerization, Monomer degradation, Dimer degradation, Time delay for protein activation, DNA-protein association/disassociation) for the stochastic gene expression modeling. Cell growth rate and the cell volume are fixed.

Table 2: Steady-state probability and total input rate of dataset showing significant p-value of GA

Dataset                   Gene   Normal q   Normal Λ   Case q   Case Λ   qC/qN   ΛC/ΛN   t-test p-value
Dataset 1 (50 samples)    GA     0.512      0.765      0.296    0.465    0.57    0.60    0.000
                          GB     0.517      0.785      0.595    0.775    1.15    0.98    0.123
                          GC     0.502      0.765      0.546    0.875    1.08    1.14    0.311
                          GD     0.487      0.735      0.563    0.875    1.15    1.19    0.127
Dataset 2 (50 samples)    GA     0.445      0.675      0.369    0.565    0.82    0.83    0.202
                          GB     0.423      0.615      0.556    0.765    1.31    1.24    0.016
                          GC     0.472      0.675      0.432    0.675    0.91    1.00    0.439
                          GD     0.510      0.755      0.525    0.755    1.02    1.00    0.661
Dataset 3 (500 samples)   GA     0.474      0.725      0.319    0.495    0.67    0.68    0.000
                          GB     0.503      0.745      0.584    0.775    1.16    1.04    0.000
                          GC     0.460      0.695      0.443    0.705    0.96    1.01    0.304
                          GD     0.521      0.765      0.541    0.785    1.03    1.02    0.122

which are drawn from all the data points. In Dataset 1, the expression of GA is significantly different (t-test p-value < 0.01 in Table 2), while the difference in GA expression in Dataset 2 is not significant. The third dataset consists of 500 samples which are randomly chosen from the original observations.

Table 2 summarizes the results of the three datasets. In the case groups of Datasets 1 and 2, both qA and ΛA have the lowest values among the four nodes, while the t-test of the GA expression in Dataset 2 shows that it is not significant (p-value = 0.202). In the small sample results (Datasets 1 and 2), our method provides results consistent with the large sample analysis (Dataset 3). The ratios (case/normal) also show that qA and ΛA in the case group are smaller than one, while the other ratios stay around one. In Dataset 3, the p-value of GB is significant along with that of GA because the expression of GA directly affects the expression of GB. However, GB is not the causal gene in this study. Our G-Network analysis reveals that only GA has lower q and Λ values than the other nodes, including GB. All these results concur with the simulation data generated with one half of the normal transcription rate.

Modeling cell cycle gene regulatory networks in budding yeast
The cell cycle regulated transcription and its overall controls have been studied in detail for budding yeast [19]. Recent developments in high-throughput microarray techniques help to reveal many of the yeast genes controlling the cell cycle [20], which consists of four distinct phases: Gap1 (G1), Synthesis (S), Gap2 (G2), and Mitosis (M). The cells grow during their G1 and G2 phases and their DNA is replicated during the S phase. In the M phase, cell growth stops and the cell divides into two daughter cells, which includes nuclear division. Many genes are involved with specific cell cycle phases, but the number of key regulators that are responsible for the control of the cell cycle process is much smaller. Thus, based on published information, we build a cell cycle GRN with the key regulators in budding yeast as shown in Figure 3, although the relationships that contribute to the true regulatory network structure of the cell cycle still remain uncertain. Therefore we simplify the cell cycle network structure by selecting thirteen key regulatory genes (the gray circles in Figure 3) and connect the genes without regard to the transcriptional and post-transcriptional processes. Figure 4 shows the reconstructed regulatory network structure.

Figure 3
Cell cycle regulatory network structure in budding yeast. The genes are represented by circles. Complex molecules consisting of two or more proteins are represented by white rectangles. The gray and black boxes are transcription and post-transcription processes, respectively. Activation processes are depicted by the solid lines and inhibitions or repressions are shown by the dashed lines. The genes with gray circles are used to model the G-Networks.

Figure 4
Cell cycle regulatory network structure with selected 13 genes. Each node represents a queue. Signals are transferred through the edges. Solid and dashed lines are positive and negative interactions, respectively.

The activity of cyclin-dependent kinases (CDKs) plays an important role in controlling periodic events during the cell cycle. Some studies of the cell cycle with high-throughput technologies have suggested alternative regulation models of periodic transcription [20]. Orlando et al. [12] measured the transcription levels of cell cycle related genes with the use of the Yeast 2.0 oligonucleotide array and determined the manner in which transcription factor networks contribute to CDKs and to global regulation of the cell-cycle transcription process. This microarray dataset is used in our study with the cell cycle network structure of Figure 4; it consists of two groups: one group is obtained from wild-type (WT) cells and the other is from cyclin-mutant (CM) cells, which are disrupted for all S-phase and mitotic cyclins (mutant for clb1, 2, 3, 4, 5, and 6).

The microarray data consist of a total of 30 data points taken over 270 minutes. We subdivide them into five states (groups), each consisting of 6 data points. The expression levels are transformed by taking the natural logarithm. Figure 5 depicts the transformed expression profiles of the 13 genes with 5 states. The black and gray solid lines are the expression profiles from WT and CM cells, respectively, and S1, S2, ..., S5 represent the five states. It is obvious that the profiles of CLB2 are different between WT and CM cells because the CM dataset is designed to monitor the cell cycle processes without the clb cyclins.

Table 3 summarizes the steady-state probabilities of the 13 genes in the cell cycle GRN. All genes have similar steady-state probabilities in the WT and CM cell groups except for CLB2 in the CM group, which has a lower steady-state probability than in the WT group: as shown in Table 3, the CM/WT ratio is smaller than one (bold in the original table). This smaller probability can be explained by considering the experimental design of the CM dataset, which is obtained without clb cyclins. Also, the original study of this dataset [12] suggested alternative cell cycle regulatory pathways and revealed that over 70% of the cell cycle related genes were expressed periodically without the clb cyclins. In our results, the steady-state probabilities of the CM group are consistent with those of the WT group. These results draw the same conclusion as the original study, i.e. that the steady state of the 12 genes does not entirely depend on the expression of CLB2. Table 4 shows the estimated total input rate of the 13 genes. These results also show that only the input rates of CLB2 decrease in the CM group.

Conclusion
This paper has used the G-Network approach [5-8] to model GRNs. Two model parameters, the steady-state probability, $q_i$, and the total input rate, $\Lambda_i$, are estimated by determining the bounds of $\Lambda_i$ and using a grid search. We first use simulated gene expression data generated on the basis of a stochastic gene expression model. Two groups (normal and case) of expression data are examined. These two groups are exactly the same except for one parameter, the transcription initiation rate. We have observed that the G-Network based method is able to detect the abnormally expressed genes, while the t-test produces false positives. Then, using real data, we have observed that the steady-state probability of CLB2 is lower than that of other agents and concluded that the key genes of cell cycle regulation can be expressed without the clb cyclins; this result is consistent with the original experimental study.

However, the unchanged steady-state probabilities in all the five states may need to be considered, because the cell cycle has four phases (G1, S, G2, M) and the expression of genes involved with a specific phase is expected to be different from that in other phases. Also, the small rate of decrease and the relatively large total input rates of CLB2 may require a more careful analysis of the G-Network approach in relation to the cell cycle GRN structure. The manner in which we have used G-Network models in this paper does not currently include simultaneous interactions among three or more nodes. However, this is not really a limitation of the model, since it suffices to include chain representations of dependencies in the G-Network model, as has been done for neuronal networks [9], to cover excitatory and inhibitory effects that involve three or more nodes, and in fact random chains of nodes of any length. Although in this study the probabilities that gene $i$ affects gene $j$, $P^+(i,j)$ and $P^-(i,j)$ in (3), are fixed at the value one, we think that the conventional reverse engineering GRN methods using the "Ensemble" method [21] can provide these probabilities more accurately for an improved steady-state analysis of GRNs.

In conclusion, our study has illustrated the use of G-Networks as a new approach for the steady-state analysis of GRNs, and has shown their usefulness in obtaining quantities such as the effective transcription rate and the steady-state probabilities, using them to detect differentially expressed genes, thus introducing a new approach which differs from more conventional microarray analysis methods. Future research will investigate the ensemble approaches to GRNs [21] based on the G-Network methodology in [5], which will make it possible to infer GRN structures and also to monitor their steady-state behaviour.

Methods
Once a GRN structure is determined, it is necessary to estimate the total input rate ($\Lambda_i$) of the $i$th queue and its steady-state probability ($q_i$). For simplicity, the probabilities $P^+(i,j)$, $P^-(i,j)$, and $Q(i,j,l)$ in (3) are set to one. Then, the expression for $q_i$ can be rewritten as follows

$$
\begin{aligned}
q_i &= \frac{\Lambda_i + f_i^+(q_j)}{R_i + f_i^-(q_j)}, \\
\text{where}\quad f_i^+(q_j) &= \sum_{j=1}^{n} q_j r_j + \sum_{j,l=1,\, l \ne j}^{n} q_j q_l r_j, \\
f_i^-(q_j) &= \sum_{j=1}^{n} q_j r_j + \sum_{j,l=1,\, l \ne j}^{n} q_l q_i r_l
\end{aligned}
\tag{12}
$$

In (12), $\Lambda_i$ and $R_i$ are the total input rate ($\Lambda_i = \lambda_i + I_i$) and the total output rate ($R_i = r_i + \mu_i$), respectively. $f_i^+$ is a function of the activation probabilities of the genes which affect gene $i$ positively and $f_i^-$ is a function of the activation probabilities of the genes which affect gene $i$ negatively. We fix $r_i$ as the out-degree of gene $i$ and the degradation rate of mRNA, $\mu_i$, as a constant (Table 1), because the total output rate, $R_i$, is not our interest. Therefore, we need to estimate two parameters, the total input rate, $\Lambda_i$, and the steady-state probability, $q_i$.

Let $\Lambda_i^{lower}$ be the lower bound of $\Lambda_i$, which is larger than zero. The lower bound of the total input is regarded as an initial transcription rate without any external input. In this study, we use $\Lambda_i^{lower} = 0.0025$ [16]. The upper bound of $\Lambda_i$, $\Lambda_i^{upper}$, is obtained by assuming that inputs from other nodes are zero and that the queues work at full capacity. That is

$$\Lambda_i^{upper} = q_i^{*} \left( R_i + f_i^-(q_j^{*}) \right)$$

where the probabilities $q_i^{*}$ and $q_j^{*}$ are one.

Let $q_i^{(0)}$ be the initial value of $q_i$. Then $q_i^{(0)}$ can be obtained as follows,

$$q_i^{(0)} = E_j\left[ q_{ij}^{(0)} \right] = E_j\left[ x_{ij} / \max(x_{ij}) \right]$$
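
Equation (12) can be solved by simple fixed-point iteration once a network structure is fixed. The sketch below is an illustration and not the authors' implementation: it assumes the four-gene ring of Figure 1 with the orientation GA inhibits GB, GB inhibits GC, GC inhibits GD and GD inhibits GA, restricts the sum in f_i^- to the single gene that inhibits gene i (so f_i^+ is zero), and uses r_i = 1 (one outgoing edge) together with the mRNA degradation rate 0.00578 from Table 1 as μ_i. The function name, the starting value 0.5 and the stopping tolerance are hypothetical choices.

```python
def solve_ring_steady_state(total_input, r=1.0, mu=0.00578,
                            q0=0.5, tol=1e-10, max_iter=1000):
    """Fixed-point iteration for q_i = Lambda_i / (R_i + f_i^-) on a ring.

    total_input : list of total input rates Lambda_i, ordered around the ring
    r           : activity rate of every gene (assumed out-degree of 1)
    mu          : mRNA degradation rate, so that R_i = r + mu
    """
    n = len(total_input)
    q = [q0] * n
    for _ in range(max_iter):
        new_q = []
        for i in range(n):
            inhibitor = (i - 1) % n          # assumed: previous gene inhibits gene i
            f_minus = q[inhibitor] * r       # only one inhibitory edge into gene i
            value = total_input[i] / (r + mu + f_minus)
            new_q.append(min(value, 1.0))    # a probability cannot exceed one
        if max(abs(a - b) for a, b in zip(new_q, q)) < tol:
            return new_q
        q = new_q
    return q


# Total input rates of the normal group of Dataset 1 (Table 2), order GA..GD.
lambdas = [0.765, 0.785, 0.765, 0.735]
for gene, q in zip(["GA", "GB", "GC", "GD"], solve_ring_steady_state(lambdas)):
    print(gene, round(q, 3))
```

Under these assumptions the iteration returns probabilities within about 0.001 of the normal-group q values of Dataset 1 in Table 2 (0.512, 0.517, 0.502, 0.487), which suggests that this reading of (12) is at least compatible with the reported numbers.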

Figure 5
Expression profiles of selected 13 genes. The black and gray lines represent the wild-type (WT) and clb-mutant (CM)
groups' expression levels.

Table 3: Steady-state probability of the 13 genes in cell cycle GRNs

State Cells CLN3 WHI5 SWI4 MBP1 CLB2 YOX1 YHP1 HCM1 FKH2 NDD1 SWI5 ACE2 SIC1

S1 WT 0.880 0.813 0.829 0.839 0.784 0.99 0.803 0.843 0.855 0.836 0.799 0.99 0.99
CM 0.878 0.814 0.818 0.848 0.770 0.99 0.802 0.842 0.864 0.839 0.787 0.99 0.99
C/W 0.998 1.001 0.987 1.011 0.981 1.00 0.999 0.999 1.011 1.004 0.986 1.00 1.00

S2 WT 0.882 0.845 0.845 0.840 0.847 0.99 0.850 0.870 0.863 0.863 0.825 0.99 0.99
CM 0.876 0.837 0.846 0.847 0.769 0.99 0.853 0.873 0.865 0.861 0.807 0.99 0.99
C/W 0.994 0.990 1.000 1.008 0.909 1.00 1.004 1.004 1.002 0.998 0.978 1.00 1.00

S3 WT 0.890 0.840 0.826 0.846 0.886 0.99 0.844 0.855 0.863 0.854 0.871 0.99 0.99
CM 0.880 0.846 0.820 0.849 0.751 0.99 0.863 0.863 0.869 0.870 0.840 0.99 0.99
C/W 0.989 1.008 0.993 1.003 0.847 1.00 1.022 1.010 1.007 1.019 0.964 1.00 1.00

S4 WT 0.890 0.841 0.837 0.845 0.866 0.99 0.839 0.870 0.862 0.853 0.857 0.99 0.99
CM 0.879 0.835 0.821 0.849 0.757 0.99 0.864 0.864 0.859 0.863 0.845 0.99 0.99
C/W 0.988 0.993 0.982 1.005 0.874 1.00 1.029 0.994 0.996 1.012 0.986 1.00 1.00

S5 WT 0.891 0.850 0.837 0.846 0.877 0.99 0.839 0.869 0.862 0.856 0.865 0.99 0.99
CM 0.869 0.830 0.823 0.842 0.756 0.99 0.862 0.862 0.857 0.861 0.845 0.99 0.99
C/W 0.976 0.977 0.983 0.995 0.862 1.00 1.027 0.991 0.994 1.006 0.976 1.00 1.00
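
One way to read Table 3 programmatically is to look for the gene whose CM/WT ratio is smallest in each state. The sketch below copies the C/W rows of Table 3 and reports that gene. The variable and function names are illustrative, and taking the minimum ratio is a simple reading aid assumed here, not the statistical procedure of the paper.

```python
GENES = ["CLN3", "WHI5", "SWI4", "MBP1", "CLB2", "YOX1", "YHP1",
         "HCM1", "FKH2", "NDD1", "SWI5", "ACE2", "SIC1"]

# C/W (CM over WT) rows copied from Table 3, one list per state.
CM_OVER_WT = {
    "S1": [0.998, 1.001, 0.987, 1.011, 0.981, 1.00, 0.999,
           0.999, 1.011, 1.004, 0.986, 1.00, 1.00],
    "S2": [0.994, 0.990, 1.000, 1.008, 0.909, 1.00, 1.004,
           1.004, 1.002, 0.998, 0.978, 1.00, 1.00],
    "S3": [0.989, 1.008, 0.993, 1.003, 0.847, 1.00, 1.022,
           1.010, 1.007, 1.019, 0.964, 1.00, 1.00],
    "S4": [0.988, 0.993, 0.982, 1.005, 0.874, 1.00, 1.029,
           0.994, 0.996, 1.012, 0.986, 1.00, 1.00],
    "S5": [0.976, 0.977, 0.983, 0.995, 0.862, 1.00, 1.027,
           0.991, 0.994, 1.006, 0.976, 1.00, 1.00],
}


def lowest_ratio_gene(ratios):
    """Return (gene, ratio) for the smallest CM/WT ratio in one state."""
    index = min(range(len(ratios)), key=ratios.__getitem__)
    return GENES[index], ratios[index]


for state, ratios in CM_OVER_WT.items():
    gene, ratio = lowest_ratio_gene(ratios)
    print(state, gene, ratio)
```

With the tabulated values, CLB2 has the smallest ratio in every state, although in S1 the margin over SWI5 and SWI4 is small.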

Table 4: Estimated total input rate of the 13 genes in cell cycle GRNs

State Cells CLN3 WHI5 SWI4 MBP1 CLB2 YOX1 YHP1 HCM1 FKH2 NDD1 SWI5 ACE2 SIC1

S1 WT 4.127 2.248 5.309 0.763 1.278 2.006 0.914 0.995 0.884 1.015 1.783 1.006 1.006
CM 4.127 2.248 5.238 0.793 1.217 2.006 0.914 0.995 0.914 1.036 1.702 1.006 1.006
C/W 1.000 1.000 0.987 1.040 0.953 1.000 1.000 1.000 1.034 1.020 0.955 1.000 1.000

S2 WT 4.187 2.339 5.521 0.763 1.430 2.006 0.995 1.036 0.854 0.995 1.945 1.006 1.006
CM 4.187 2.309 5.521 0.793 1.187 2.006 0.995 1.036 0.854 1.056 1.743 1.006 1.006
C/W 1.000 0.987 1.000 1.040 0.830 1.000 1.000 1.000 1.000 1.061 0.896 1.000 1.000

S3 WT 4.187 2.339 5.379 0.763 1.551 2.006 0.995 1.015 0.884 0.955 2.187 1.006 1.006
CM 4.187 2.339 5.379 0.793 1.127 2.006 1.036 1.036 0.884 1.096 1.824 1.006 1.006
C/W 1.000 1.000 1.000 1.040 0.726 1.000 1.041 1.020 1.000 1.148 0.834 1.000 1.000

S4 WT 4.187 2.339 5.450 0.763 1.490 2.006 0.975 1.036 0.854 0.955 2.106 1.006 1.006
CM 4.187 2.309 5.379 0.793 1.157 2.006 1.036 1.036 0.854 1.076 1.864 1.006 1.006
C/W 1.000 0.987 0.987 1.040 0.776 1.000 1.062 1.000 1.000 1.127 0.885 1.000 1.000

S5 WT 4.187 2.369 5.450 0.763 1.521 2.006 0.975 1.036 0.854 0.955 2.147 1.006 1.006
CM 4.127 2.278 5.379 0.793 1.157 2.006 1.036 1.036 0.854 1.076 1.864 1.006 1.006
C/W 0.986 0.962 0.987 1.040 0.761 1.000 1.062 1.000 1.000 1.127 0.868 1.000 1.000

where $x_{ij}$ is the observed expression level (number of mRNAs) of the $i$th gene at the $j$th observation and $\max(x_{ij})$ is the maximum value among all observed values of the $i$th gene. Let $\Lambda_{iu}$ be a value of the total input rate between the lower bound and the upper bound of $\Lambda_i$ ($\Lambda_i^{lower} \le \Lambda_{iu} \le \Lambda_i^{upper}$). Then the steady-state probability $q_i$ can be obtained numerically by solving (12) with $q_i^{(0)}$ and $\Lambda_{iu}$. Once the steady-state probability is determined, the log-likelihood of the given model can be computed using (4), which has the same form as the likelihood of a geometric distribution. The log-likelihood of a geometric distribution is concave, so we choose the value of $\Lambda_i$ that maximizes the log-likelihood function.

For each value of the total input rate, $\Lambda_{iu}$ ($\Lambda_i^{lower} \le \Lambda_{iu} \le \Lambda_i^{upper}$), we compute the steady-state probability, $q_{iu}$, with initial value $q_i^{(0)}$, and obtain the log-likelihood score, $\log L_{iu}$, which is used to choose the optimal total input rate, $\Lambda_i^{(0)}$:

$$\Lambda_i^{(0)} = \arg\max_{\Lambda_{iu}} \left(\log L_{iu}\right), \quad \text{where } \log L_{iu} = \log\left(\prod_{i=1}^{n} q_{iu}^{k_i}\,(1 - q_{iu})\right) \qquad (13)$$

Note that $q_{iu}$ is a numerical solution of (12) with initial value $q_i^{(0)}$ and total input rate $\Lambda_{iu}$. In order to compute $\Lambda_i^{(1)}$, the initial value $q_i^{(0)}$ in (13) is replaced by $q_i^{(1)}$, which is a numerical solution of (12) with initial value $q_i^{(0)}$ and total input rate $\Lambda_i^{(0)}$. The steady-state probability of gene $i$, $q_i$, is then obtained by updating its value iteratively until $d(t) < \delta$, where $d(t)$ is the difference between $q_i^{(t)}$ and $q_i^{(t-1)}$ at the $t$th iteration. In this study, $\delta$ is 0.0001.

Competing interests
The authors declare that they have no competing interests.

Authors’ contributions
Haseong Kim developed the data analysis techniques including synthetic data generation and tested the models on the data. He wrote the first draft of the paper. E. Gelenbe developed the G-Network models and the specific application of these models to GRNs. He rewrote the paper for submission, and then finalised the accepted paper in preparation for its publication.

Note
Other papers from the meeting have been published as part of BMC Bioinformatics Volume 10 Supplement 15, 2009: Eighth International Conference on Bioinformatics (InCoB2009): Bioinformatics, available online at http://www.biomedcentral.com/1471-2105/10?issue=S15.

Acknowledgements
Some of this research has been supported by the EU FP7 DIESIS Project.

This article has been published as part of BMC Genomics Volume 10 Supplement 3, 2009: Eighth International Conference on Bioinformatics (InCoB2009): Computational Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2164/10?issue=S3.

References
1. Friedman N, Linial M, Nachman I and Pe’er D: Using Bayesian networks to analyze expression data. Journal of computational biology 2000, 7(3-4):601–620.
2. Schafer J and Strimmer K: An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics 2005, 21(6):754–764.


3. Shmulevich I, Gluhovsky I, Hashimoto R, Dougherty E and Zhang W: Steady-state analysis of genetic regulatory networks modelled by probabilistic Boolean networks. Comparative and Functional Genomics 2003, 4(6):601–608.
4. Ching W, Zhang S, Ng M and Akutsu T: An approximation method for solving the steady-state probability distribution of probabilistic Boolean networks. Bioinformatics 2007, 23(12):1511.
5. Gelenbe E: Steady-state solution of probabilistic gene regulatory networks. J Theor Biol Phys Rev E 2007, 76:031903.
6. Gelenbe E: Product-form queueing networks with negative and positive customers. Journal of Applied Probability 1991, 656–663.
7. Gelenbe E: G-networks with triggered customer movement. Journal of Applied Probability 1993, 742–748.
8. Arazi A, Ben-Jacob E and Yechiali U: Bridging genetic networks and queueing theory. Physica A: Statistical Mechanics and its Applications 2004, 332:585–616.
9. Gelenbe E and Timotheou S: Random neural networks with synchronized interactions. Neural Computation 2008, 20(9):2308–2324.
10. Gillespie D, et al: Exact stochastic simulation of coupled chemical reactions. The journal of physical chemistry 1977, 81(25):2340–2361.
11. Bratsun D, Volfson D, Tsimring L and Hasty J: Delay-induced stochastic oscillations in gene regulation. Proc Natl Acad Sci U S A 2005, 102(41):14593–14598.
12. Orlando D, Lin C, Bernard A, Wang J, Socolar J, Iversen E, Hartemink A and Haase S: Global control of cell-cycle transcription by coupled CDK and network oscillators. Nature 2008, 453(7197):944–947.
13. McAdams H and Arkin A: Stochastic mechanisms in gene expression. 1997.
14. Paulsson J: Models of stochastic gene expression. Physics of life reviews 2005, 2(2):157–175.
15. Ribeiro A, Zhu R and Kauffman S: A general modeling strategy for gene regulatory networks with stochastic dynamics. Journal of Computational Biology 2006, 13(9):1630–1639.
16. Thattai M and van Oudenaarden A: Intrinsic noise in gene regulatory networks. Proc Natl Acad Sci U S A 2001, 98(15):8614–8619.
17. Buchler N, Gerland U and Hwa T: Nonlinear protein degradation and the function of genetic circuits. Proc Natl Acad Sci U S A 2005, 102(27):9559–9564.
18. Goeddel D, Yansura D and Caruthers M: Binding of synthetic lactose operator DNAs to lactose repressors. Proc Natl Acad Sci U S A 1977, 74(8):3292–3296.
19. Bloom J and Cross F: Multiple levels of cyclin specificity in cell-cycle control. Nature Reviews Molecular Cell Biology 2007, 8(2):149–160.
20. Fink G, Spellman P, Sherlock G, Zhang M, Iyer V, Anders K, Eisen M, Brown P, Botstein D and Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular biology of the cell 1998, 9(12):3273–3297.
21. Kim H, Lee J and Park T: Inference of Large Scale Gene Regulatory Networks Using Regression-based Approach. Journal of Bioinformatics and Computational Biology 2009 in press.
22. Golding I, Paulsson J, Zawilski S and Cox E: Real-time kinetics of gene activity in individual bacteria. Cell 2005, 123(6):1025–1036.
