This document discusses how canalization, a biological process where developmental trajectories are robust to perturbations, reduces nonlinearity in gene regulatory networks. The study finds that increased canalization in biological networks can explain their high approximability by linear models. Canalizing functions, which are common in gene networks, are more redundant, biased, and approximable than random functions. Additionally, a network's approximability depends on its dynamical regime, with ordered networks being more approximable than chaotic ones. Canalization thus plays a key role in the linearizability of biological network dynamics.

Canalization reduces the nonlinearity of regulation in biological networks


Claus Kadelka1* and David Murrugarra2
1* Department of Mathematics, Iowa State University, 411 Morrill Rd,
Ames, 50011, IA, United States.
2 Department of Mathematics, University of Kentucky, 719 Patterson
Office Tower, Lexington, 40506, KY, United States.

*Corresponding author(s). E-mail(s): [email protected];
Contributing authors: [email protected];

arXiv:2402.09703v1 [q-bio.MN] 15 Feb 2024

Abstract
Biological networks such as gene regulatory networks possess desirable properties.
They are more robust and controllable than random networks. This motivates the
search for structural and dynamical features that evolution has incorporated in
biological networks. A recent meta-analysis of published, expert-curated Boolean
biological network models has revealed several such features, often referred to as
design principles. Among others, the biological networks are enriched for certain
recurring network motifs, the dynamic update rules are more redundant, more
biased and more canalizing than expected, and the dynamics of biological net-
works are better approximable by linear and lower-order approximations than
those of comparable random networks. Since most of these features are inter-
related, it is paramount to disentangle cause and effect, that is, to understand
which features evolution actively selects for, and thus truly constitute evolution-
ary design principles. Here, we show that approximability is strongly dependent
on the dynamical robustness of a network, and that increased canalization in
biological networks can almost completely explain their recently postulated high
approximability.

Keywords: Boolean networks, systems biology, nonlinear dynamics, canalization,
stability, approximation

1 Introduction
Biological systems are frequently represented as networks, which describe the interac-
tions between different biological entities such as genes, proteins or metabolites. For
instance, gene regulatory networks (GRNs) describe how a collection of genes governs
key processes within a cell. A static biological network is completely described by a
wiring diagram, which contains nodes (e.g., genes) and edges between nodes, which
can be undirected (e.g., in protein-protein interaction networks), directed and even
signed (e.g., in gene regulatory networks). Static networks are, however, insufficient to
obtain accurate insights into the often complex, non-linear dynamics of biological
networks [1]. Dynamic biological networks additionally specify how each
node is regulated by the set of its regulators. Popular dynamic modeling frameworks
include differential equation models and discrete models. While the former harbors
the potential for quantitative predictions, it requires a substantial amount of data
for an accurate inference of its many kinetic parameters. Therefore, many modelers
prefer discrete models and their qualitative predictions. Boolean networks constitute
the simplest type of discrete model. Here, each node takes on only two values and
time is discretized as well. The two values can be interpreted as low and high con-
centration, unexpressed and expressed genes or proteins, etc. Particularly for GRNs,
Boolean networks have become increasingly popular. Over 160 Boolean GRN models
have been curated by experts in their respective fields - most over the course of the
last twelve years [2]. These models range in size from 3 to 302 nodes, and describe
various processes in many species and kingdoms of life.
Over the last few decades, a number of interesting features of biological networks
have been identified. At the structural, “wiring diagram” level, biological networks
are sparsely connected with an average degree of about 2.5, and are enriched for cer-
tain network motifs such as coherent feed-forward loops and complex feedback loops,
particularly those that contain many negative interactions [2, 3]. Dynamically, most
biological networks operate at the critical edge between order and chaos [2, 4, 5]. For
random N − K Kauffman networks, it is well-established that the network dynamics
are generally ordered whenever 2Kp(1 − p) < 1 and chaotic whenever 2Kp(1 − p) > 1;
at 2Kp(1 − p) = 1, a phase transition happens [6, 7]. Here, K is the average degree of
the network, while p describes the bias of picking a one in the Boolean function’s truth
table; the unbiased case corresponds to p = 0.5, and the absolute bias can be quanti-
fied by 2|0.5 − p| ∈ [0, 1], or alternatively by 1 − 4p(1 − p) ∈ [0, 1], with 0 corresponding
in both cases to the unbiased case. Networks with ordered dynamics typically possess
few and short attractors, while chaotic dynamics are characterized by the presence of
many long attractors [8].
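
The criterion above can be sketched in a few lines of Python. This is only an illustration of the stated inequality and of the two absolute-bias measures; the function names `kauffman_regime` and `absolute_bias` are hypothetical and do not come from the paper or from any library.

```python
def kauffman_regime(K, p, tol=1e-12):
    """Classify the expected regime of a random N-K network via 2Kp(1-p)."""
    value = 2 * K * p * (1 - p)
    if abs(value - 1) <= tol:
        return "critical"
    return "ordered" if value < 1 else "chaotic"


def absolute_bias(p):
    """Return both absolute-bias measures: 2|0.5 - p| and 1 - 4p(1 - p)."""
    return 2 * abs(0.5 - p), 1 - 4 * p * (1 - p)


print(kauffman_regime(2, 0.5))  # critical: 2 * 2 * 0.25 = 1
print(kauffman_regime(3, 0.5))  # chaotic:  2 * 3 * 0.25 = 1.5
print(kauffman_regime(3, 0.1))  # ordered:  2 * 3 * 0.09 = 0.54
print(absolute_bias(0.5))       # (0.0, 0.0), the unbiased case
```

Both bias measures vanish at p = 0.5 and approach 1 as p approaches 0 or 1, which is why either can serve as the absolute bias.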
The dynamic update rules of Boolean biological network models are also remark-
able. They are highly canalizing, redundant and biased [2, 9, 10]. Canalization is a
widely used term in biology. First coined by developmental geneticist Waddington in
the 1940s [11], it refers to the tendency of developmental processes to follow partic-
ular trajectories, despite internal and external perturbations [12]. In other words, it
refers to low variation in phenotypes despite potentially high variation in genotypes
and the environment [13]. Correspondingly, Kauffman introduced Boolean canalizing
functions as suitable update rules to describe the gene regulatory logic [14]. A
canalizing function possesses a canalizing variable, which, when it receives its canalizing
input, determines the output of the function, irrespective of all other inputs. If the
subfunction that is evaluated when the canalizing variable does not receive its
canalizing input is also canalizing, the function is 2-canalizing, etc. [15]. If all n variables
of a function eventually become canalizing, the function is n-canalizing, also known
as nested canalizing [16]. The number of variables that eventually become canalizing
is known as the canalizing depth [15]. Every non-zero Boolean function possesses
a unique standard monomial form, from which the canalizing depth and the num-
ber of variables in each “layer” of canalization can be directly derived [17, 18]. As
the number of variables increases, canalizing and especially nested canalizing func-
tions become increasingly rare [19–21]. It is therefore very surprising that almost all
rules in published Boolean biological network models are canalizing and even nested
canalizing [2, 9].
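
To make the recursive definition concrete, the following Python sketch computes the canalizing depth of a function given as a truth table. It is a hedged illustration, not the authors' implementation: the helper names (`restrict`, `canalizing_depth`, `table_of`) are hypothetical, and the greedy recursion assumes, as implied by the uniqueness results cited above [17, 18], that the depth does not depend on which canalizing variable is peeled off first.

```python
from itertools import product


def restrict(table, n, i, a):
    """Truth table of f with input i fixed to a (a function of n - 1 inputs)."""
    inputs = product([0, 1], repeat=n)
    return [out for x, out in zip(inputs, table) if x[i] == a]


def canalizing_depth(table, n):
    """Number of variables that eventually become canalizing (greedy recursion)."""
    if n == 0 or len(set(table)) == 1:  # constant function: nothing to canalize
        return 0
    for i in range(n):
        for a in (0, 1):
            if len(set(restrict(table, n, i, a))) == 1:  # x_i = a fixes the output
                rest = restrict(table, n, i, 1 - a)      # subfunction for x_i != a
                return 1 + canalizing_depth(rest, n - 1)
    return 0


def table_of(f, n):
    """Truth table of f, with inputs in lexicographic order."""
    return [int(f(*x)) for x in product([0, 1], repeat=n)]


and3 = table_of(lambda x, y, z: x and y and z, 3)     # nested canalizing
xor2 = table_of(lambda x, y: x != y, 2)               # not canalizing
mixed = table_of(lambda x, y, z: x and (y != z), 3)   # canalizing in x only

print(canalizing_depth(and3, 3), canalizing_depth(xor2, 2), canalizing_depth(mixed, 3))
```

For these three examples the depths are 3, 0 and 1: the conjunction is nested canalizing, XOR has no canalizing variable, and the mixed function stops being canalizing once x is fixed to 1.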
Another recently discovered feature of biological Boolean network models is the
high approximability of their dynamics by linear and low-order continuous Taylor
approximations of the Boolean update rules [22]. Here, the mean approximation error
(MAE) is defined as follows: Each update rule of a given Boolean network is replaced
by a continuous Taylor approximation of a defined order. The MAE describes the mean
squared error between the long-term state of the Boolean network and the long-term
state of the continuous approximation when starting from a random initial state (see
Methods for details). Manicka et al. found that biological networks were consistently
more approximable (i.e., had lower MAE values) than random networks with the same
wiring diagram (i.e., matching degree distribution) and matching update rule bias [22].
Many of the described remarkable features of biological networks are interrelated
and correlated. For instance, canalizing Boolean functions are on average more
redundant and biased than random functions [2]. In this paper, we show that the described
increased approximability of biological networks can be almost entirely explained by the
abundance of canalization, which was not considered in [22]. We further show that the
approximability of a Boolean network depends mostly on its dynamic regime, which
in turn depends on its update rules (that is, average degree, bias and amount of canal-
ization) [2, 23]. A network with ordered dynamics (i.e., few and short attractors) tends
to possess much more approximable dynamics than a network with chaotic dynamics.

Results and discussion


To test the hypothesis that the increased canalization in biological networks explains
their increased approximability, we compared the approximability of published expert-
curated biological networks with several ensembles of random null models, similar
to [22]. All random networks possessed the same wiring diagram as the respective bio-
logical network. The authors in [22] considered an “unconstrained” null model, where
each biological update rule was replaced by a non-constant random Boolean function
(of the same degree), and a “constrained” model (null model type 1 in this study),
which additionally matched the bias of each biological update rule. Neither model
accounted for the high degree of canalization in biological networks. We therefore

Fig. 1 Canalization explains the high approximability of biological networks. The distribution
of mean approximation errors is shown for the biological networks (orange) and three different
types of random null networks (shades of blue), which match different characteristics (bias and/or
canalizing depth) of the biological network. Each box depicts the interquartile range (IQR), each
whisker extends to the most extreme value within 1.5 * IQR from the box, and each horizontal line
within a box depicts the median. For a fixed approximation order (1-3, x-axis), differences between
the MAE distribution of the biological and the random networks are assessed using the two-sided
Wilcoxon signed-rank test. Fig. 2 contains scatterplots showing the MAE values of all biological net-
works and their random null models.

considered two additional null models, one which matches the degree and canaliz-
ing depth of each biological update rule (null model type 2), and one which matches
degree, canalizing depth and bias (null model type 3; see Methods for details). After
excluding highly similar biological models and those with a maximal degree of eleven
or more, we compared the approximability of 110 published expert-curated biologi-
cal Boolean network models [2] and the three different ensembles of null models. As
in [22], we found that random networks of type 1 were less approximable than the biological networks (Fig. 1).
However, random networks that accounted for the increased canalization (null mod-
els of type 2 and type 3) exhibited similar levels of approximability as the biological
networks. Interestingly, the higher the order of employed approximation the more sig-
nificant were the differences in the MAE distributions between biological and random
networks (Fig. 2). Third-order Taylor approximations recovered the dynamics of bio-
logical networks slightly better than those of random networks with matched degree,
bias and canalizing depth. Overall, these results show that the approximability of bio-
logical networks can be almost entirely explained by their high degree of canalization,
measured by the canalizing depth.
However, a related question, which has implications for the control of Boolean net-
works [24], remains: Why can the dynamics of biological networks be approximated so
well by low-order and even linear continuous Taylor approximations? We hypothesized

[Figure 2: 3 x 3 grid of scatterplots. Columns: Order 1 MAE, Order 2 MAE, Order 3 MAE.
Rows (y-axis): random null models of type 1, type 2 and type 3. x-axis: biological networks.
Spearman ρ per panel (orders 1, 2, 3): type 1: 0.76, 0.79, 0.87; type 2: 0.78, 0.82, 0.84;
type 3: 0.77, 0.82, 0.84.]

Fig. 2 Mean approximation errors of biological networks and their random null mod-
els. For a fixed approximation order (1-3, columns), differences between the MAE values of the 110
biological and the different random networks (rows) are shown, in addition to the Spearman correla-
tion coefficient, ρ. A summary of this data is shown in Fig. 1.

that the approximability of a Boolean network is strongly correlated with its dynami-
cal robustness, which is typically measured by the average sensitivity [7] and Derrida
values [25, 26]. These metrics describe how a small perturbation affects the network
over time. If the perturbation gets on average smaller after each node has been syn-
chronously updated once, the system operates in the ordered regime; if, on average, it
increases in size, the system is in the chaotic regime, and if it remains, on average, of
similar size, the system exhibits criticality. All biological systems that have thus far
been modeled as Boolean networks operate close to the critical edge between order
and chaos [2, 4, 5]. This is likely because most update rules in biological networks are
nested canalizing, with biological networks even particularly enriched for
insensitive nested canalizing functions (NCFs) [2], and because the expected average sensitivity of
an NCF in any number of variables is 1. In contrast, the average sensitivity of
random Boolean functions with degree k and bias p is 2kp(1 − p). That is, it increases
as the number of inputs increases and decreases as the function becomes more biased
(where p = 0.5 corresponds to the unbiased case). Boolean networks governed by such
random functions thus exhibit a phase transition at 2kp(1 − p) = 1 [6, 7].
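
As a small sanity check of the expression 2kp(1 − p), the Python sketch below computes the average sensitivity of a truth table and averages it over all 16 two-input Boolean functions, which corresponds to the unbiased case p = 0.5 with k = 2. The function name `average_sensitivity` and the truth-table encoding are illustrative assumptions, not taken from the paper.

```python
from itertools import product


def average_sensitivity(table, n):
    """Mean number of single-input flips that change the output, over all inputs."""
    inputs = list(product([0, 1], repeat=n))
    index = {x: j for j, x in enumerate(inputs)}
    changes = 0
    for x in inputs:
        for i in range(n):
            flipped = x[:i] + (1 - x[i],) + x[i + 1:]
            changes += table[index[x]] != table[index[flipped]]
    return changes / len(inputs)


k = 2
all_tables = list(product([0, 1], repeat=2 ** k))  # every 2-input Boolean function
mean_sensitivity = sum(average_sensitivity(t, k) for t in all_tables) / len(all_tables)

print(mean_sensitivity)           # 1.0
print(2 * k * 0.5 * (1 - 0.5))    # 1.0, the value of 2kp(1 - p) at k = 2, p = 0.5
```

Individual functions deviate from this mean (XOR has average sensitivity 2, AND has 1, constants have 0); the formula describes the expectation over random functions, not any single rule.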
To test which features of a biological network make it highly approximable, we
computed Spearman correlations (ρ) between the mean approximation errors of the
110 biological networks and several dynamics-related properties (Fig. 3). Highly con-
nected networks proved less approximable (ρ > 0.6). This is likely due to the fact that
a continuous Taylor approximation of order n matches a Boolean function with k ≤ n
variables perfectly everywhere. Thus, the higher the average degree, ⟨K⟩, of a Boolean
network, the lower is the chance for perfect matches. Across the three approximation
orders, the average degree was even slightly more negatively correlated with network
approximability than the average effective degree, ⟨Ke ⟩, defined in [10]. This is some-
what surprising because the latter, which takes into account the importance of Boolean
inputs, is a much stronger predictor of the dynamical robustness of a Boolean net-
work, measured by its mean average sensitivity [2, 23]. In line with this, the strongest
predictor of the mean average sensitivity of a Boolean network, ⟨Ke ⟩⟨p(1 − p)⟩, as
well as the mean average sensitivity itself were both not strongly correlated with the
approximability of a Boolean network, with the correlation becoming insignificant for
higher-order approximations. In contrast, the proportion of Boolean rules in a
biological network that are nested canalizing was fairly strongly correlated with
approximability for all orders of approximation (|ρ| > 0.4): the higher this
proportion, the more approximable the network. Canalizing rules, especially those with
a low sensitivity, are typically fairly biased. In line with the result on the propor-
tion of NCFs, more biased networks proved more approximable (|ρ| > 0.5). Biological
Boolean rules with a higher number of inputs tend to be more biased [5]. Interestingly,
the covariance between p(1 − p) and the in-degree was the only property that became
more correlated with approximability at higher approximation orders.
Metrics that explicitly describe dynamic aspects of a Boolean network also exhib-
ited interesting correlations with the approximability. Assuming, as in the computation
of approximability [22], a synchronous update of all nodes, we obtained, through sim-
ulation, for each biological network a lower bound of the number of attractors, as
well as the approximate mean length of the attractors, the proportion of steady state
attractors and the entropy of the basin sizes (see Methods). While the third-order
approximability was not correlated with any of these metrics, networks with more
attractors, a lower proportion of steady state attractors and higher entropy possessed
dynamics that were less approximable at first and second order. This is surprising,
since the presence of many long attractors, and concomitant high entropy, is associated
with Boolean networks that operate in the chaotic regime [27].
To rule out potential confounders such as differences in network size, average degree
as well as degree distribution, we considered modified N − K Kauffman networks,
first defined in [28]. In these random networks of size N , each node has constant

[Figure 3: bar chart. y-axis: Spearman correlation, annotated with a p-value scale; colors:
MAE of order 1, 2 and 3. Network properties shown: proportion steady states, entropy basin
sizes, proportion NCFs, mean average sensitivity, mean length attractors, network size,
minimal number attractors, <K>, <K><p(1-p)>, Cov(p(1-p), K), <K><p(1-p)> + Cov,
<Ke><p(1-p)>, <p(1-p)>, <Ke>.]

Fig. 3 Predictors of approximability of biological networks. Pairwise Spearman correlation
between the first-, second- and third-order mean approximation errors and various network properties
across the 110 published biological networks, ordered by the mean correlation. < · > denotes the
mean, p = output bias, K = number of variables, Ke = effective connectivity, Cov = covariance of
p(1 − p) and K. The pairwise Spearman correlations between all shown properties are in Fig. S1.

degree K. The Boolean update rule of each node is generated by drawing $2^K$ times
randomly with replacement from {0, 1}, where 0 is drawn with probability 1 − p and 1 with probability p (the bias).
We further required the wiring diagram of each network to be strongly connected since
the dynamics decouple otherwise [29]. Networks with a higher absolute bias exhibited
more approximable dynamics (Fig. 4). Moreover, sparse networks (i.e., with low in-
degree) were on average more approximable. Interestingly, the MAE did not always
decrease as the approximation order increased. For unbiased networks with high
in-degree (e.g., K = 5, p = 0.5), the MAE was very close to the maximally observed
value of 0.25, even when using fourth-order Taylor approximations.
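
The network construction described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure of [28]: regulators are drawn uniformly without repetition, self-regulation is allowed, strong connectivity is enforced by rejection sampling, and all function names are hypothetical.

```python
import random


def random_rule(K, p, rng):
    """Truth table with 2**K entries, each equal to 1 with probability p."""
    return [int(rng.random() < p) for _ in range(2 ** K)]


def is_strongly_connected(wiring):
    """wiring[j] lists the regulators of node j; check reachability both ways."""
    N = len(wiring)
    forward = [[] for _ in range(N)]          # regulator -> regulated nodes
    for target, regulators in enumerate(wiring):
        for r in regulators:
            forward[r].append(target)

    def reaches_all(adjacency):
        seen, stack = {0}, [0]
        while stack:
            for v in adjacency[stack.pop()]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return len(seen) == N

    # Node 0 must reach every node along edges and against edges.
    return reaches_all(forward) and reaches_all(wiring)


def random_network(N, K, p, seed=0):
    """Strongly connected wiring with constant in-degree K plus biased random rules."""
    rng = random.Random(seed)
    while True:                               # rejection sampling (an assumption)
        wiring = [rng.sample(range(N), K) for _ in range(N)]
        if is_strongly_connected(wiring):
            break
    rules = [random_rule(K, p, rng) for _ in range(N)]
    return wiring, rules


wiring, rules = random_network(N=15, K=3, p=0.3)
print(len(wiring), len(rules[0]))  # 15 nodes, truth tables of length 2**3 = 8
```

Note that rules drawn this way may contain non-essential inputs, so the effective in-degree of a node can be smaller than K.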
Since a Boolean function with K inputs is perfectly matched everywhere by a
continuous Taylor approximation of order K, the MAE values were zero whenever the
approximation order was at least K. A Boolean input is non-essential if a change in
this input never changes the output of the function; for example, f (x, y) = x has a
non-essential input y. If only J < K of the inputs of a Boolean function are essential,
then the Jth order Taylor
approximation already provides a perfect match. To rule out a potentially confounding
effect created by perfect matches, we required, in a sensitivity analysis, all update rules
to be non-degenerate, i.e., to contain only essential variables (Fig. S2). Most MAE

[Figure 4: four heatmap panels (order 1, order 2, order 3, order 4). x-axis: bias (0.1 to 0.5);
y-axis: constant in-degree (2 to 5); color scale: mean approximation error (0.00 to 0.25).]

Fig. 4 Effect of bias and in-degree on the approximability of the dynamics of Boolean
networks. For strongly connected 15-node Boolean networks with a constant in-degree (y-axis)
governed by random update functions generated with a certain bias (x-axis), the mean error is shown
when approximating their dynamics using different order Taylor polynomials (subplots). Each cell
depicts the MAE across 50 networks, and the same networks were used to estimate the MAE using
first-order to fourth-order Taylor polynomials. Results from an equivalent analysis in which each function
is required to be essential in all its variables are shown in Fig. S2.

values were slightly higher, likely due to the higher effective degree. Qualitatively, the
results were, however, very similar.
Combining all 2000 random networks (100 each for combinations of constant in-
degree K ∈ {2, 3, 4, 5} and bias p ∈ {0.1, 0.2, 0.3, 0.4, 0.5}), we computed, as before, the
Spearman correlation between MAE values and metrics that explicitly describe net-
work dynamics. The dynamical robustness of a network, measured by the mean average
sensitivity, was strongly positively correlated with first-, second- and third-order MAE
values (ρ > 0.75; Fig. 5). Given that the average sensitivity of random Kauffman net-
works is 2Kp(1 − p) [7], this agrees qualitatively with the results from Fig. 4. Also in
line is the finding that random networks are more approximable if they have few and
short attractors, a high proportion of steady states, and low entropy in the distribution
of the basin sizes. These four properties characterize networks that operate mostly in
the ordered and critical dynamical regime. As observed for the biological networks,
the correlations were consistently weaker when considering higher-order approxima-
tions. Note however that, by design of the computational experiment, 25% (50%) of
the networks perfectly match their second-order (third-order) approximation, which
certainly contributed to weaker correlations.

[Figure 5: bar chart. y-axis: Spearman correlation (-0.75 to 0.75); colors: MAE of order 1, 2
and 3. Network properties shown: proportion steady states, number attractors, entropy basin
sizes, mean length attractors, mean average sensitivity.]

Fig. 5 Predictors of approximability of random networks. Pairwise Spearman correlation
between the first-, second- and third-order mean approximation errors and network properties related
explicitly to dynamics, across 2000 random strongly-connected Boolean networks with fixed degree
K ∈ {2, 3, 4, 5} and bias p ∈ {0.1, 0.2, 0.3, 0.4, 0.5} (100 for each combination).

To study the effect of canalization on the nonlinearity of regulation in more detail,
we modified the random networks such that the update rules were restricted to
specific classes of functions. First, we compared the approximability of random networks
governed by 4-variable functions with different minimal canalizing depth (see Meth-
ods). While networks without required canalization were hardly approximable (MAE
≈ 0.25), the restriction to canalizing update rules gave rise to more approximable
dynamics (Fig. 6a). Functions with a higher canalizing depth are however on average
also less sensitive [30] and exhibit a higher absolute bias.
While the canalizing depth provides a crude measure of the amount of canalization
in a Boolean function, more detailed information is contained in the canalizing layer
structure [17, 18, 30]. To investigate this, we compared the approximability of random
networks, each governed entirely by 4-variable NCFs but with different layer structure.
Networks governed by NCFs with layer structure k1 = 4, e.g., an AND-NOT function
x1 ∧ x̄2 ∧ x̄3 ∧ x4 , are highly approximable (Fig. 6b). On the other hand, networks
governed by NCFs with layer structure k1 = 1, k2 = 3, e.g., functions such as x1 ∨
(x2 ∧ x3 ∧ x4 ), are much less approximable. Again, as the approximability of these
networks decreases, the sensitivity of the underlying NCFs increases and the absolute
bias decreases [30].
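
The two example functions can be compared directly. The Python sketch below computes the absolute bias 1 − 4p(1 − p) and the average sensitivity for both 4-variable NCFs; the helper name `bias_and_sensitivity` is hypothetical, and the code is an illustration rather than the analysis pipeline used for Fig. 6.

```python
from itertools import product


def bias_and_sensitivity(f, n):
    """Return (absolute bias 1 - 4p(1 - p), average sensitivity) of f."""
    inputs = list(product([0, 1], repeat=n))
    p = sum(f(*x) for x in inputs) / len(inputs)   # fraction of ones in the truth table
    changes = sum(
        f(*x) != f(*(x[:i] + (1 - x[i],) + x[i + 1:]))
        for x in inputs
        for i in range(n)
    )
    return 1 - 4 * p * (1 - p), changes / len(inputs)


# Layer structure k1 = 4: x1 AND (NOT x2) AND (NOT x3) AND x4
and_not = lambda x1, x2, x3, x4: int(x1 and not x2 and not x3 and x4)
# Layer structure k1 = 1, k2 = 3: x1 OR (x2 AND x3 AND x4)
or_and = lambda x1, x2, x3, x4: int(x1 or (x2 and x3 and x4))

print(bias_and_sensitivity(and_not, 4))  # (0.765625, 0.5)
print(bias_and_sensitivity(or_and, 4))   # (0.015625, 1.25)
```

The single-layer function is strongly biased (one 1 among 16 outputs) and insensitive, whereas the two-layer function is nearly unbiased (nine 1s among 16 outputs) and more than twice as sensitive, in line with the trend described above.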

Conclusion
The idea of a probabilistic generalization of Boolean logic dates back all the way to
George Boole [31]. In this manuscript, we study in depth a recent implementation

[Figure 6: two boxplot panels, (a) and (b).]

Fig. 6 The approximability of Boolean network dynamics depends on canalization. Each
boxplot shows the distribution of the mean approximation error for 50 strongly connected
N = 15-node Boolean networks with a fixed in-degree of K = 4 and a variable degree of canalization,
characterized by (a) the minimal canalizing depth of each update rule (x-axis). In (b), all functions
are nested canalizing (i.e., have canalizing depth 4) but the canalizing layer structure differs. The
order of the Taylor polynomial used for the approximation is depicted by color. Because fourth-order Taylor
polynomials match the functions perfectly, their mean approximation error is 0. Each box extends
across the interquartile range (IQR), whiskers extend to the lowest data point still within 1.5 IQR of
the lower quartile and the highest data point still within 1.5 IQR of the upper quartile, and black
circles show outliers.

of this idea: using continuous Taylor approximations of Boolean functions to
approximate the dynamics of a Boolean network. We show that the high approximability
of biological networks, first postulated in [22], can be almost entirely explained by
the abundance of canalization in biological networks. We conjecture that the remain-
ing higher approximability of biological networks is due to the reported increased
occurrence of insensitive canalizing rules in biological networks [2]. Through a com-
putational analysis of random networks, we show that the dynamical robustness of
a network strongly influences its approximability: Networks with low mean average
sensitivity, operating in the ordered and critical dynamical regime and characterized
by few and short attractors, possess generally more approximable dynamics. In line
with this, networks governed by canalizing or even nested canalizing functions that
are highly biased and insensitive to perturbations proved more approximable.
Fully disentangling the relative contributions of the related properties canalization,
bias, and sensitivity to approximability constitutes one of several open questions.
Moreover, it remains to be investigated how well non-perfect continuous approxima-
tions of Boolean networks perform in the context of predicting control targets or
specific dynamical features. A more technical question is whether Boolean functions
that can be well approximated by low-order continuous extensions give rise to more
approximable Boolean networks.

Methods
Boolean networks
A Boolean network $F$ in variables $x_1, \ldots, x_n$ can be viewed as a function on binary
strings of length $n$, which can be described coordinate-wise by $n$ Boolean update
functions $f_i : \{0, 1\}^n \to \{0, 1\}$. Every Boolean network defines a canonical map, where the
functions are synchronously updated:

$$F : \{0, 1\}^n \to \{0, 1\}^n, \quad F(x_1, \ldots, x_n) = (f_1(x), \ldots, f_n(x)).$$

In this paper, we only consider this canonical map, i.e., we only consider synchronously
updated Boolean networks.
Although an update function may depend on all $n$ variables, most update functions in a
Boolean network do not. The wiring diagram describes the dependencies. It contains $n$ nodes,
corresponding to the $x_i$, and a directed edge from $x_i$ to $x_j$ if $f_j$ depends on $x_i$
(that is, if $f_j(x_1, \ldots, x_i = 0, \ldots, x_n) \neq f_j(x_1, \ldots, x_i = 1, \ldots, x_n)$ for at least one
$(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) \in \{0, 1\}^{n-1}$). If $f_j$ depends on $x_i$, then $x_i$ is an essential variable of $f_j$.
Otherwise, it is non-essential. From the wiring diagram, the degree of each node can
be derived.
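
The definition of essential variables translates directly into code. The following Python sketch derives the wiring diagram of a small, made-up 3-node network from its update functions; the names `essential_variables` and `wiring_diagram` and the example network are illustrative assumptions, and nodes are indexed from 0.

```python
from itertools import product


def essential_variables(f, n):
    """Indices i for which flipping x_i changes f for at least one input."""
    essential = []
    for i in range(n):
        for x in product([0, 1], repeat=n):
            if x[i] == 0 and f(*x) != f(*(x[:i] + (1,) + x[i + 1:])):
                essential.append(i)
                break
    return essential


def wiring_diagram(update_functions):
    """Directed edges (i, j), meaning that f_j depends on x_i."""
    n = len(update_functions)
    return [(i, j)
            for j, f in enumerate(update_functions)
            for i in essential_variables(f, n)]


# Hypothetical example: f_0 = x_1, f_1 = x_0 AND x_2, f_2 = x_0
network = [
    lambda a, b, c: b,
    lambda a, b, c: a and c,
    lambda a, b, c: a,
]
edges = wiring_diagram(network)
in_degree = [sum(1 for _, j in edges if j == node) for node in range(3)]

print(edges)      # [(1, 0), (0, 1), (2, 1), (0, 2)]
print(in_degree)  # [1, 2, 1]
```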

Metrics describing Boolean network dynamics


A second graph associated with a synchronously updated Boolean network $F$, the state
space, contains as nodes the $2^n$ binary strings and a directed edge from $x \in \{0, 1\}^n$ to
$y \in \{0, 1\}^n$ if $F(x) = y$. Each connected component of the state space corresponds to a
basin of attraction, consisting of a directed loop, the attractor, as well as trees feeding
into the attractor. Attractors can be steady states (also known as fixed points) or limit
cycles. Because the state space is finite, every state of a Boolean network eventually transitions to
an attractor. Every attractor in a biological network model typically corresponds to a
distinct phenotype [32].
Since the number of nodes, $n$, in the investigated biological Boolean network models
ranges from 3 to 302, some of the state spaces are huge (size $2^n$). We therefore used
the following procedure to approximate several dynamics-related metrics. For each
biological network $F$, we randomly picked 1000 different initial values $x_0 \in \{0, 1\}^n$. For
each $x_0$, we synchronously updated $F$ until a repeated state was reached, indicating
the arrival at an attractor. The number of updates between first and second transition
to the repeated state corresponds to the length of the attractor. This process yields a
non-empty list of attractors $\{A_1, \ldots, A_s\}$ of lengths $\{L_1, \ldots, L_s\}$ with corresponding
basin sizes $\{B_1, \ldots, B_s\}$. We used $\frac{1}{s} \sum_i L_i$ as the approximate mean length of the
attractors and $\frac{1}{s} \sum_i \mathbb{1}(L_i = 1)$ as the approximate proportion of steady state attractors.
We considered an alternative version of these two measures, weighted by the relative
basin sizes (that is, $\frac{1}{1000} \sum_i B_i L_i$ and $\frac{1}{1000} \sum_i B_i \mathbb{1}(L_i = 1)$). Since the alternative
versions differed barely from the respective base versions (Spearman correlations of
ρ > 0.95 across the 110 investigated biological networks), we decided to only use the
base versions in the analysis. We approximated the entropy of the basin sizes as

$$-\frac{1}{1000} \sum_i \ln\left(\frac{B_i}{1000}\right) B_i \in [0, \infty).$$

Finally, we used $s$ as the lower bound of the number of attractors. In a network
with many attractors, we almost certainly fail to discover all attractors when starting
from only 1000 random states. However, all attractors with a large basin size are
discovered with high probability.
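
The sampling procedure above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: initial states are drawn with replacement (the text uses 1000 different states), each attractor is identified by its smallest state, and the function name `sample_attractors` and the toy network `swap` are hypothetical.

```python
import math
import random


def sample_attractors(F, n, n_init=1000, seed=0):
    """Estimate attractor statistics of the synchronous map F by simulation."""
    rng = random.Random(seed)
    found = {}  # smallest state of an attractor -> [length, number of hits]
    for _ in range(n_init):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        visited = set()
        while x not in visited:        # update until a state repeats
            visited.add(x)
            x = F(x)
        cycle = [x]                    # x lies on the attractor; walk it once
        y = F(x)
        while y != x:
            cycle.append(y)
            y = F(y)
        entry = found.setdefault(min(cycle), [len(cycle), 0])
        entry[1] += 1                  # one more initial state in this basin
    lengths = [length for length, _ in found.values()]
    basins = [hits for _, hits in found.values()]
    s = len(found)
    return {
        "n_attractors": s,             # lower bound on the true number
        "mean_length": sum(lengths) / s,
        "prop_steady": sum(length == 1 for length in lengths) / s,
        "entropy": -sum(b / n_init * math.log(b / n_init) for b in basins),
        "basins": basins,
    }


# Toy 3-node network: nodes 0 and 1 copy each other, node 2 keeps its value.
swap = lambda x: (x[1], x[0], x[2])
stats = sample_attractors(swap, 3)
print(stats["n_attractors"], stats["mean_length"], stats["prop_steady"])
```

The toy network has four steady states and two limit cycles of length 2, so with 1000 samples over only 8 states the sketch recovers all six attractors; for large networks the returned count is only a lower bound, as noted above.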
For the random Boolean networks of fixed size n = 15, analyzed in Figs. 4, 5, 6 and
S2, we computed the entire state space. All dynamics-related metrics, including the
number of network attractors, are therefore exact in these analyses.

Continuous extensions of Boolean functions


To compute the approximability of Boolean networks, we use the same approach as
in [22]. We start by defining continuous extensions of Boolean functions. Any Boolean
function f : {0, 1}n → {0, 1} is defined in the corners of the n-dimensional hypercube,
{0, 1}n , and can be extended to the entire hypercube [0, 1]n by defining a function
fˆ : [0, 1]n → [0, 1] such that fˆ(x) = f (x) for all x ∈ {0, 1}n . Specifically, we employ a
probabilistic generalization of Boolean logic, already introduced by George Booole [31].
We consider random variables Xi : {0, 1} → [0, 1] with Bernoulli distributions and set
pi = Prob(Xi = 1). Let X = X1 × · · · × Xn be the product of random variables. Then,
we define
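$$\hat{f}(p_1, \ldots, p_n) = \sum_{\substack{x \in X: \\ f(x) = 1}} \prod_{i=1}^{n} \hat{p}_i,$$
where
$$\hat{p}_i = \begin{cases} p_i & \text{if } x_i = 1, \\ 1 - p_i & \text{if } x_i = 0. \end{cases}$$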
By this definition, fˆ : [0, 1]n → [0, 1] is a continuous function that satisfies fˆ(x) = f (x)
for all x ∈ {0, 1}n [22].
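
As an illustration of this definition, the following sketch evaluates the continuous extension directly from a truth table. The function name `continuous_extension` and the convention that the truth table lists outputs in lexicographic order of the inputs are assumptions of this example, not part of the original method description.

```python
from itertools import product


def continuous_extension(truth_table, p):
    """Evaluate the continuous extension of a Boolean function at p in [0, 1]^n.

    truth_table lists f(x) for x in lexicographic order:
    (0,...,0), (0,...,0,1), ..., (1,...,1).
    """
    n = len(p)
    total = 0.0
    for x, fx in zip(product((0, 1), repeat=n), truth_table):
        if fx == 1:
            # Probability of input x when Prob(X_i = 1) = p_i, independently.
            weight = 1.0
            for xi, pi in zip(x, p):
                weight *= pi if xi == 1 else 1.0 - pi
            total += weight
    return total


AND = [0, 0, 0, 1]
XOR = [0, 1, 1, 0]
and_at_half = continuous_extension(AND, [0.5, 0.5])
xor_at_half = continuous_extension(XOR, [0.5, 0.5])
```

For the AND function the extension reduces to the product p1 p2, and on the corners of the hypercube it returns the Boolean output.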

Taylor polynomials of Boolean functions


Since fˆ is a continuous-variable function, we can consider different orders of approx-
imation for fˆ using its Taylor expansion. As described in [22], fˆ is a square-free
polynomial and its Taylor expansion is finite. More specifically, the nth order approx-
imation will match fˆ perfectly, and if only m < n inputs of f are essential, then the
mth order approximation already matches fˆ perfectly.
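For a given $\alpha = (\alpha_1, \ldots, \alpha_n) \in \{0, 1\}^n$ and $x \in [0, 1]^n$, we define

$$|\alpha| = \alpha_1 + \cdots + \alpha_n,$$
$$x^{\alpha} = x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n},$$
$$\partial^{\alpha} \hat{f} = \partial_1^{\alpha_1} \partial_2^{\alpha_2} \cdots \partial_n^{\alpha_n} \hat{f} = \frac{\partial^{|\alpha|} \hat{f}}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \cdots \partial x_n^{\alpha_n}},$$

with the convention that $\partial_i^0 \hat{f} \equiv \hat{f}$. For $p \in [0, 1]^n$, we have

$$\hat{f}(x) = \sum_{\alpha \in \{0,1\}^n} \frac{\partial^{\alpha} \hat{f}(p)}{|\alpha|!} (x - p)^{\alpha} = \hat{f}(p) + \sum_{\substack{\alpha \in \{0,1\}^n \\ 0 < |\alpha| \le n}} \frac{\partial^{\alpha} \hat{f}(p)}{|\alpha|!} (x - p)^{\alpha}. \tag{1}$$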

If $p = (\frac{1}{2}, \ldots, \frac{1}{2})$, which represents the unbiased selection of each variable, then
$\hat{f}(p)$ equals the output bias of f, as shown in [22]. The Taylor decomposition yields
different approximations of a Boolean function by restricting the sum in Equation 1
to α with |α| ≤ m ≤ n. The Taylor polynomial of order m is given by

$$\hat{f}^{(m)}(x) = \sum_{\substack{\alpha \in \{0,1\}^n \\ |\alpha| \le m}} \frac{\partial^{\alpha} \hat{f}(p)}{|\alpha|!} (x - p)^{\alpha}. \tag{2}$$
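
A small worked sketch may help to see how the truncation behaves. It relies on the fact that the continuous extension is affine in each variable, so a mixed partial derivative can be computed as an iterated difference of the extension with the differentiated coordinates set to 0 or 1. All function names are hypothetical. Note one assumption of the sketch: each term is normalized by the multi-index factorial α1! ··· αn!, which equals 1 for binary α, chosen so that the order-n polynomial reproduces the extension exactly, as stated in the text.

```python
from itertools import product


def f_hat(truth_table, p):
    """Continuous extension of a Boolean function (lexicographic truth table)."""
    total = 0.0
    for x, fx in zip(product((0, 1), repeat=len(p)), truth_table):
        if fx:
            weight = 1.0
            for xi, pi in zip(x, p):
                weight *= pi if xi else 1.0 - pi
            total += weight
    return total


def partial_derivative(truth_table, p, alpha):
    """Mixed partial derivative of f_hat at p for a binary multi-index alpha.

    Because f_hat is affine in every variable, the derivative with respect to
    x_i equals f_hat(x_i = 1) - f_hat(x_i = 0); mixed derivatives iterate this.
    """
    idx = [i for i, a in enumerate(alpha) if a == 1]
    total = 0.0
    for bits in product((0, 1), repeat=len(idx)):
        q = list(p)
        for i, b in zip(idx, bits):
            q[i] = float(b)
        sign = (-1) ** (len(idx) - sum(bits))  # minus sign per coordinate set to 0
        total += sign * f_hat(truth_table, q)
    return total


def taylor_polynomial(truth_table, p, x, m):
    """Order-m Taylor polynomial of f_hat around p, evaluated at x.

    Assumption: terms are divided by alpha_1! * ... * alpha_n! = 1.
    """
    value = 0.0
    for alpha in product((0, 1), repeat=len(p)):
        if sum(alpha) <= m:
            term = partial_derivative(truth_table, p, alpha)
            for a, xi, pi in zip(alpha, x, p):
                if a:
                    term *= xi - pi
            value += term
    return value


AND = [0, 0, 0, 1]
p = (0.5, 0.5)
orders = [taylor_polynomial(AND, p, (1.0, 1.0), m) for m in range(3)]
```

For the AND function expanded around the unbiased point, the order-0 polynomial is the output bias 1/4, and the order-2 polynomial already returns the Boolean output at the corner (1, 1).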

Approximability of a Boolean network by continuous extensions


Let F = (f1 , · · · , fn ) : {0, 1}n → {0, 1}n be a Boolean network. We define the mth
order approximation of F to be
 
$$\hat{F}^{(m)} = \left( \max(0, \min(1, \hat{f}_1^{(m)})), \ldots, \max(0, \min(1, \hat{f}_n^{(m)})) \right) : [0, 1]^n \to [0, 1]^n,$$

where the update functions of $\hat{F}^{(m)}$ are the mth order Taylor approximations $\hat{f}_i^{(m)}$ of the
update functions of F, as defined in Equation 2, clipped to the interval [0, 1].
With this, we can define the mean approximation error (MAE) as the mean squared
error between the long-term state of the Boolean network and the long-term state of
its continuous approximation. That is,

$$\mathrm{MAE}(F, m) = \frac{1}{2^n} \sum_{x_0 \in \{0,1\}^n} \left\| F^{\infty}(x_0) - \hat{F}^{(m),\infty}(x_0) \right\|^2 \tag{3}$$

where F ∞ (x0 ) and F̂ (m),∞ (x0 ) describe the long-term state of the Boolean network F
and its mth order approximation, respectively. In practice, we approximated the MAE,
using the Python library boolion [22], by updating both F and F̂ (m) synchronously
25 times and using 1000 random initial values.
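
The computation can be sketched for a very small network as follows. This is not the boolion implementation and does not use its API; it is a self-contained illustration under several assumptions: a network is a list of (regulators, truth table) pairs, the expansion point is 1/2 for every variable, Taylor terms are normalized by the multi-index factorial (which is 1 for binary multi-indices), the error is the squared Euclidean distance, and all 2^n initial states are enumerated instead of sampling 1000 of them.

```python
from itertools import product


def f_hat(table, p):
    """Continuous extension of a Boolean function (lexicographic truth table)."""
    total = 0.0
    for x, fx in zip(product((0, 1), repeat=len(p)), table):
        if fx:
            w = 1.0
            for xi, pi in zip(x, p):
                w *= pi if xi else 1.0 - pi
            total += w
    return total


def taylor(table, x, m, p0=0.5):
    """Order-m Taylor polynomial of f_hat around (p0, ..., p0), evaluated at x."""
    k = len(x)
    value = 0.0
    for alpha in product((0, 1), repeat=k):
        if sum(alpha) > m:
            continue
        idx = [i for i in range(k) if alpha[i]]
        deriv = 0.0
        for bits in product((0, 1), repeat=len(idx)):
            q = [p0] * k
            for i, b in zip(idx, bits):
                q[i] = float(b)
            deriv += (-1) ** (len(idx) - sum(bits)) * f_hat(table, q)
        for i in idx:
            deriv *= x[i] - p0
        value += deriv
    return value


def lookup(table, inputs):
    """Truth-table lookup with the first input as the most significant bit."""
    index = 0
    for v in inputs:
        index = 2 * index + v
    return table[index]


def step_boolean(net, state):
    return tuple(lookup(table, [state[r] for r in regs]) for regs, table in net)


def step_approx(net, state, m):
    # Clip each Taylor approximation to [0, 1].
    return tuple(min(1.0, max(0.0, taylor(table, [state[r] for r in regs], m)))
                 for regs, table in net)


def mae(net, m, n_updates=25):
    """Mean squared distance between Boolean and approximated long-term states."""
    n = len(net)
    total = 0.0
    for x0 in product((0, 1), repeat=n):
        xb, xc = x0, tuple(float(v) for v in x0)
        for _ in range(n_updates):
            xb = step_boolean(net, xb)
            xc = step_approx(net, xc, m)
        total += sum((b - c) ** 2 for b, c in zip(xb, xc))
    return total / 2 ** n


# Toy networks (assumptions): x1' = x2, x2' = x1, and a three-node example
# with x1' = x2 AND x3, x2' = x1, x3' = NOT x1.
swap_net = [((1,), [0, 1]), ((0,), [0, 1])]
and_net = [((1, 2), [0, 0, 0, 1]), ((0,), [0, 1]), ((0,), [1, 0])]
errors = (mae(swap_net, 0), mae(swap_net, 1), mae(and_net, 2))
```

In the swap network, the order-0 approximation is the constant state (1/2, 1/2), which gives an error of 1/2, while the order-1 approximation is exact. Once the order reaches the largest in-degree, the approximation coincides with the Boolean dynamics on Boolean initial states.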

Canalization
This study employs several mathematical concepts related to canalization. By [14],
a Boolean function $f(x_1, \ldots, x_n) : \{0, 1\}^n \to \{0, 1\}$ is canalizing if there exists a
canalizing variable $x_i$, a canalizing input $a \in \{0, 1\}$ and a canalized output $b \in \{0, 1\}$
such that

$$f(x_1, \ldots, x_n) = \begin{cases} b & \text{if } x_i = a, \\ g(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) \not\equiv b & \text{otherwise.} \end{cases} \tag{4}$$

If the subfunction g is also canalizing, then f is 2-canalizing, etc. More generally, f
is k-canalizing, where 1 ≤ k ≤ n, with respect to the permutation $\sigma \in S_n$, inputs
$a_1, \ldots, a_k$, and outputs $b_1, \ldots, b_k$ if

$$f(x_1, \ldots, x_n) = \begin{cases}
b_1 & x_{\sigma(1)} = a_1, \\
b_2 & x_{\sigma(1)} \neq a_1,\ x_{\sigma(2)} = a_2, \\
b_3 & x_{\sigma(1)} \neq a_1,\ x_{\sigma(2)} \neq a_2,\ x_{\sigma(3)} = a_3, \\
\vdots & \vdots \\
b_k & x_{\sigma(1)} \neq a_1, \ldots, x_{\sigma(k-1)} \neq a_{k-1},\ x_{\sigma(k)} = a_k, \\
f_C \not\equiv b_k & x_{\sigma(1)} \neq a_1, \ldots, x_{\sigma(k-1)} \neq a_{k-1},\ x_{\sigma(k)} \neq a_k.
\end{cases} \tag{5}$$

Here, $f_C = f_C(x_{\sigma(k+1)}, \ldots, x_{\sigma(n)})$ is the core function, a Boolean function on n − k
variables. When $f_C$ is not canalizing, then the integer k is the canalizing depth of f [15].
If k = n (i.e., if all variables eventually become canalizing), then f is a nested
canalizing function (NCF) [16]. By [17], every nonzero Boolean function $f(x_1, \ldots, x_n)$
can be uniquely written as

$$f(x_1, \ldots, x_n) = M_1(M_2(\cdots(M_{r-1}(M_r p_C + 1) + 1) \cdots) + 1) + q,$$

where each $M_i = \prod_{j=1}^{k_i} (x_{i_j} + a_{i_j})$ is a non-constant extended monomial, $p_C$ is the core
polynomial of f, and $k = \sum_{i=1}^{r} k_i$ is the canalizing depth. Each $x_i$ appears in exactly
one of $\{M_1, \ldots, M_r, p_C\}$. The layer structure of f is the vector $(k_1, k_2, \ldots, k_r)$ and
describes the number of variables in each layer $M_i$ [18, 30].
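
The recursive structure of Equations 4 and 5 suggests a simple way to compute the canalizing depth from a truth table: repeatedly look for a canalizing variable, fix it to its non-canalizing value, and continue with the resulting subfunction. The sketch below follows this idea. It is an illustration only and not the algorithm of [18] or the implementation in the authors' toolbox; the function names are hypothetical, truth tables are assumed to be in lexicographic order, and the function is assumed to depend on all of its variables.

```python
from itertools import product


def restrict(table, n, i, value):
    """Truth table of f with x_i fixed to value (a function of n - 1 variables)."""
    return [fx for x, fx in zip(product((0, 1), repeat=n), table)
            if x[i] == value]


def canalizing_depth(table):
    """Greedy canalizing depth of a Boolean function given by its truth table."""
    n = len(table).bit_length() - 1  # len(table) == 2 ** n
    depth = 0
    while n > 0:
        for i, a in product(range(n), (0, 1)):
            on = restrict(table, n, i, a)        # outputs when x_i = a
            off = restrict(table, n, i, 1 - a)   # subfunction g when x_i != a
            # Equation 4: f is constant (= b) on x_i = a, and g is not
            # identically equal to b.
            if len(set(on)) == 1 and any(v != on[0] for v in off):
                table, n, depth = off, n - 1, depth + 1
                break
        else:
            break  # no canalizing variable found: remaining function is the core
    return depth


AND = [0, 0, 0, 1]
XOR = [0, 1, 1, 0]
# x1 AND (x2 XOR x3), with inputs ordered (x1, x2, x3)
MIXED = [0, 0, 0, 0, 0, 1, 1, 0]
depths = [canalizing_depth(t) for t in (AND, XOR, MIXED)]
```

In this example, AND has depth 2 and is therefore nested canalizing, XOR has depth 0, and the third function has depth 1 with XOR as its core function.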

Random null models of biological networks


We compared biological Boolean network models to three ensembles of null models
that matched different characteristics of the biological networks, as shown in Fig. 1.
All null models matched the in-degree of the biological networks. Null models 1 and 3
matched, in addition, the bias of each biological update rule, while null models 2 and
3 matched the canalizing depth.
Let F = (f1 , . . . , fn ) be a biological Boolean network model. For each fi , we first
simplified the function to only include essential variables, yielding f˜i : {0, 1}k → {0, 1},
where k is the number of essential variables, i.e., the in-degree. While this step was
omitted in [22], it appears important for an unbiased comparison given that close
to 2% of regulators in biological networks are non-essential [2]. We then computed
the number of ones in the truth table of f˜i , denoted q, and f˜i ’s canalizing depth d,
following [18].

To obtain a random Boolean function g (for null model 1) with the same bias as
f˜i and arbitrary canalizing depth, we simply selected a random subset Ω ⊆ {0, 1}k of
size |Ω| = q, and set

$$g(x) = \begin{cases} 1 & \text{if } x \in \Omega, \\ 0 & \text{if } x \notin \Omega. \end{cases}$$
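
In code, this construction amounts to choosing q positions of the truth table uniformly at random. The following lines are a minimal sketch with a hypothetical function name, not the implementation used for the analysis.

```python
import random


def random_function_with_bias(k, q, rng):
    """Truth table of a random k-input Boolean function with exactly q ones."""
    omega = set(rng.sample(range(2 ** k), q))  # random subset of size q
    return [1 if index in omega else 0 for index in range(2 ** k)]


g = random_function_with_bias(k=3, q=3, rng=random.Random(0))
```

The resulting function has the same number of ones as the biological rule by construction, whereas its canalizing depth is not controlled.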
To obtain a random Boolean function g (for null model 2) with exact canalizing
depth d and arbitrary bias, we randomly selected d out of f˜i ’s k essential variables,
arranged them in a random order, and randomly selected for each of the d variables
a canalizing input value a ∈ {0, 1} and a canalized output value b ∈ {0, 1} (see
Equations 4, 5). Finally, we randomly selected a core function gC : {0, 1}k−d → {0, 1},
ensuring that gC depends on all k − d variables and that gC is not canalizing, by
repeating this random selection process until both conditions were met. We then filled
the truth table of g, as outlined in Equation 5. This entire procedure has already
been implemented in the Python library canalizing function toolbox, published along
with [2].
To obtain a random Boolean function g (for null model 3) with the same bias as
f˜i and the same canalizing depth d, we followed the same procedure as for null model
2, with two exceptions. First, we did not randomly select the canalized output values
b1 , . . . , bd but instead used the canalized output values of f˜i . Otherwise, it is impossible
to obtain the same bias. Second, we randomly selected a core function gC of g that
has the same number of ones as the core function of f˜i (following the same approach
as for null model 1).

Random Boolean networks


To generate a random Boolean network F = (f1 , . . . , fN ) (modified N-K Kauffman
network), we first generated a random directed graph of N nodes (the wiring diagram),
where each node has K incoming edges. We ensured the graph is simple (i.e., does
not contain self-edges/auto-regulations). We further ensured the graph is strongly
connected since the dynamics decouple otherwise [29].
To obtain the random Boolean update rules f1 , . . . , fN , we randomly selected,
for the networks analyzed in Figs. 4, 5, any Boolean function g : {0, 1}K → {0, 1}.
In a sensitivity analysis, reported in Fig. S2, we ensured that g is non-degenerated,
i.e., that all variables of g are essential, by repeating the random selection until this
condition was met. For the random Boolean networks with constant degree 4 and
minimal canalizing depth d ∈ {0, 1, 2, 4}, analyzed in Fig. 6a, we followed a very
similar procedure as for null model 2 (see above), with one exception: We allowed
the core function to be canalizing so that the realized canalizing depth may be larger
than d. For the random nested canalizing Boolean networks with constant degree 4
and different layer structure, analyzed in Fig. 6b, we followed again a very similar
procedure as for null model 2, with the exception that the layer structure determines
the canalized output values, b1 , . . . , b4 [30].
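
The generation of the wiring diagram can be sketched with rejection sampling: draw K distinct regulators for every node, excluding the node itself, and repeat until the graph is strongly connected. The function names and the representation of the wiring diagram as a list of regulator lists are assumptions of this example; the original code may proceed differently.

```python
import random


def random_wiring(N, K, rng):
    """For each node, K distinct regulators chosen among the other N - 1 nodes."""
    return [rng.sample([j for j in range(N) if j != i], K) for i in range(N)]


def is_strongly_connected(regulators):
    """Check that every node reaches node 0 and is reachable from node 0."""
    N = len(regulators)
    targets = [[] for _ in range(N)]  # reverse of the regulator lists
    for i, regs in enumerate(regulators):
        for j in regs:
            targets[j].append(i)

    def reaches_all(adjacency):
        seen, stack = {0}, [0]
        while stack:
            u = stack.pop()
            for v in adjacency[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return len(seen) == N

    return reaches_all(regulators) and reaches_all(targets)


def random_strongly_connected_wiring(N, K, rng):
    """Resample the wiring diagram until it is strongly connected."""
    while True:
        wiring = random_wiring(N, K, rng)
        if is_strongly_connected(wiring):
            return wiring


wiring = random_strongly_connected_wiring(N=15, K=3, rng=random.Random(1))
```

Each entry `wiring[i]` lists the regulators of node i, so random update rules with K inputs can be assigned node by node afterwards.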

DATA AVAILABILITY
[2] contains standardized update rules of the 110 investigated published, expert-
curated Boolean biological network models.

CODE AVAILABILITY
Original code to compute the approximability of Boolean networks, published along
with [22], is available at https://gitlab.com/smanicka/boolion.
Code to analyze the 110 investigated published, expert-curated
Boolean biological network models, as well as the Python library
canalizing function toolbox, published along with [2], is available at
https://github.com/ckadelka/DesignPrinciplesGeneNetworks.
New code underlying the analyses described in this manuscript is available at
https://github.com/ckadelka/ApproximabilityBooleanNetworks.

References
[1] Barrat, A., Barthelemy, M. & Vespignani, A. Dynamical processes on complex
networks (Cambridge university press, 2008).

[2] Kadelka, C. et al. A meta-analysis of Boolean network models reveals design
principles of gene regulatory networks. Science Advances 10, eadj0822 (2024).

[3] Shen-Orr, S. S., Milo, R., Mangan, S. & Alon, U. Network motifs in the tran-
scriptional regulation network of Escherichia coli. Nature genetics 31, 64–68
(2002).

[4] Balleza, E. et al. Critical dynamics in genetic regulatory networks: examples from
four kingdoms. PLoS One 3, e2456 (2008).

[5] Daniels, B. C. et al. Criticality distinguishes the ensemble of biological regulatory
networks. Physical review letters 121, 138102 (2018).

[6] Luque, B. & Solé, R. V. Lyapunov exponents in random Boolean networks.
Physica A: Statistical Mechanics and its Applications 284, 33–45 (2000).

[7] Shmulevich, I. & Kauffman, S. A. Activities and sensitivities in Boolean network
models. Physical review letters 93, 048701 (2004).

[8] Chandrasekhar, K., Kadelka, C., Laubenbacher, R. & Murrugarra, D. Stability of
linear Boolean networks. Physica D: Nonlinear Phenomena 451, 133775 (2023).

[9] Harris, S. E., Sawhill, B. K., Wuensche, A. & Kauffman, S. A model of tran-
scriptional regulatory networks based on biases in the observed regulation rules.
Complexity 7, 23–40 (2002).

[10] Gates, A. J., Brattig Correia, R., Wang, X. & Rocha, L. M. The effective graph
reveals redundancy, canalization, and control pathways in biochemical regulation
and signaling. Proceedings of the National Academy of Sciences 118, e2022598118
(2021).

[11] Waddington, C. H. Canalization of development and the inheritance of acquired
characters. Nature 150, 563–565 (1942).

[12] Hallgrímsson, B., Willmore, K. & Hall, B. K. Canalization, developmental stabil-
ity, and morphological integration in primate limbs. American Journal of Physical
Anthropology: The Official Publication of the American Association of Physical
Anthropologists 119, 131–158 (2002).

[13] Flatt, T. The evolutionary genetics of canalization. The Quarterly review of
biology 80, 287–316 (2005).

[14] Kauffman, S. The large scale structure and dynamics of gene control circuits: an
ensemble approach. Journal of Theoretical Biology 44, 167–190 (1974).

[15] Layne, L., Dimitrova, E. & Macauley, M. Nested canalyzing depth and network
stability. Bulletin of mathematical biology 74, 422–433 (2012).

[16] Kauffman, S., Peterson, C., Samuelsson, B. & Troein, C. Random Boolean net-
work models and the yeast transcriptional network. Proceedings of the National
Academy of Sciences 100, 14796–14799 (2003).

[17] He, Q. & Macauley, M. Stratification and enumeration of Boolean functions by
canalizing depth. Physica D: Nonlinear Phenomena 314, 1–8 (2016).

[18] Dimitrova, E., Stigler, B., Kadelka, C. & Murrugarra, D. Revealing the canalizing
structure of Boolean functions: Algorithms and applications. Automatica 146,
110630 (2022).

[19] Just, W., Shmulevich, I. & Konvalina, J. The number and probability of canalizing
functions. Physica D: Nonlinear Phenomena 197, 211–221 (2004).

[20] Li, Y., Adeyeye, J. O., Murrugarra, D., Aguilar, B. & Laubenbacher, R. Boolean
nested canalizing functions: A comprehensive analysis. Theoretical Computer
Science 481, 24–36 (2013).

[21] Kadelka, C., Li, Y., Kuipers, J., Adeyeye, J. O. & Laubenbacher, R. Multistate
nested canalizing functions and their networks. Theoretical Computer Science
675, 1–14 (2017).

[22] Manicka, S., Johnson, K., Levin, M. & Murrugarra, D. The nonlinearity of reg-
ulation in biological networks. NPJ Systems Biology and Applications 9, 10
(2023).

[23] Manicka, S., Marques-Pita, M. & Rocha, L. M. Effective connectivity deter-
mines the critical dynamics of biochemical networks. Journal of the Royal Society
Interface 19, 20210659 (2022).

[24] Borriello, E. & Daniels, B. C. The basis of easy controllability in Boolean
networks. Nature communications 12, 5227 (2021).

[25] Derrida, B. & Weisbuch, G. Evolution of overlaps between configurations in
random Boolean networks. Journal de Physique 47, 1297–1303 (1986).

[26] Derrida, B. & Pomeau, Y. Random networks of automata: a simple annealed
approximation. Europhysics Letters 1, 45 (1986).

[27] Drossel, B. Random boolean networks. Reviews of nonlinear dynamics and
complexity 69–110 (2008).

[28] Kauffman, S. A. Metabolic stability and epigenesis in randomly constructed
genetic nets. Journal of theoretical biology 22, 437–467 (1969).

[29] Kadelka, C., Wheeler, M., Veliz-Cuba, A., Murrugarra, D. & Laubenbacher, R.
Modularity of biological systems: a link between structure and function. Journal
of the Royal Society Interface 20, 20230505 (2023).

[30] Kadelka, C., Kuipers, J. & Laubenbacher, R. The influence of canalization on the
robustness of Boolean networks. Physica D: Nonlinear Phenomena 353, 39–47
(2017).

[31] Boole, G. Studies in Logic and Probability (Dover Publications, 2012).

[32] Schwab, J. D., Kühlwein, S. D., Ikonomi, N., Kühl, M. & Kestler, H. A. Con-
cepts in Boolean network modeling: What do they all mean? Computational and
Structural Biotechnology Journal (2020).

ACKNOWLEDGEMENTS
C.K. and D.M. were both partially supported by travel grants from the Simons
Foundation (grant numbers 712537 and 850896, respectively).
The authors thank Iowa State University for making high-performance computing
freely available to C.K.

AUTHOR CONTRIBUTIONS
Conceptualization: C.K.; Methodology: C.K. and D.M.; Software & Visualization:
C.K.; Formal analysis: C.K.; Writing - Original Draft: C.K. and D.M.; Writing - Review
& Editing: C.K. and D.M.

COMPETING INTERESTS
The authors declare no competing interests.

Supplementary information.

[Figure S1: clustered heatmap of pairwise Spearman correlations (color scale from -1.0 to 1.0). Rows and columns: mean average sensitivity, <Ke><p(1−p)>, Cov(p(1−p), K), mean length attractors, order 1 MAE, <Ke>, order 2 MAE, <K>, <K><p(1−p)> + Cov, <K><p(1−p)>, order 3 MAE, entropy basin sizes, minimal number attractors, network size, proportion NCFs, <p(1−p)>, proportion steady states.]

Fig. S1 Pairwise Spearman correlation between properties of the 110 published bio-
logical networks. Clusters are defined using average linkage hierarchical clustering and Euclidean
distance. < · > denotes the mean, p = output bias, K = number of variables, Ke = effective connec-
tivity, Cov = covariance of p(1 − p) and K.

[Figure S2: four heatmap panels (order 1, order 2, order 3, order 4). x-axis: bias (0.1 to 0.5); y-axis: constant in-degree (2 to 5); color scale: mean approximation error (0.00 to 0.25).]
Fig. S2 Effect of bias and in-degree on the approximability of the dynamics of non-
degenerated Boolean networks. For strongly connected 15-node Boolean networks with a constant
in-degree (y-axis) governed by random non-degenerated update functions generated with a certain
bias (x-axis), the mean error is shown when approximating their dynamics using different order Taylor
polynomials (subplots). Each cell depicts the mean approximation error across 50 networks, and the
same networks were used to estimate the mean approximation error using first-order to fourth-order
Taylor polynomials. Results from an equivalent analysis where the functions are allowed to contain
non-essential variables are shown in Fig. 4.
