Identifiability and Regression Analysis of Biological Systems Models: Statistical and Mathematical Foundations and R Scripts, 2nd Edition


Preface

The idea behind this book was to provide a practical, concise but mathematically
rigorous guide to regression procedures for experimental data. Finding the right
balance between practicality, conciseness, and rigour is the greatest challenge facing
those who teach or do outreach work. It is increasingly necessary for those who
work in data science and those who simply use its methods and results to have
a sufficiently robust theoretical basis to enable a reasoned, conscious, and correct
use of the statistical tools that are applied in data science. I wrote the book
with the needs of students on Master's courses in Data Science at universities
around the world in mind, as well as those in scientific fields who have to
process and interpret data (e.g. doctors, biologists, physicists, sociologists).
At a time when Artificial Intelligence techniques are often used as black boxes
for learning from data and then making predictions based on them, the need for
theoretical foundations and clear practical instructions is ever greater, and
statistics is becoming an increasingly important part of a data scientist's
background. Indeed, statistical analysis improves prediction, pattern analysis,
and the drawing and interpretation of conclusions from data. The two core
statistical concepts that are important in data science are descriptive statistics
and inferential statistics.
This book deals with two branches of inferential statistics: (i) the analysis of
identifiability and (ii) regression analysis. The application domain in which the
techniques of model identifiability and regression analysis are presented is that
of dynamical models in biochemistry and systems biology, but the methodologies
and the computational techniques described therein are also of interest and practical
use in other disciplines.
By definition, a model is identifiable if it is theoretically possible to learn the true
values of its parameters from an infinite number of observations of it. If a model is
identifiable from a given set of experimental data, then there exists a unique set
of parameters returning the observed data. Equivalently, if a model is identifiable,
different values of its parameters must generate different probability distributions of
the observable variables. Regression analysis is a predictive modelling technique
that investigates the relationship (expressed as a mathematical model) between a
dependent (target) variable and one or more independent variables (predictors).
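The definition of identifiability can be made concrete with a small experiment. The following is a minimal sketch in base R, assuming a hypothetical toy model y = a·b·x + ε of our own (it is not one of the book's models): the parameters a and b enter the model only through their product, so different pairs with the same product generate the same distribution of the observations, and regression can recover only the combination a·b.

```r
# Hypothetical toy model (for illustration only): y = a * b * x + noise.
# a and b are not separately identifiable; their product a * b is.
model_mean <- function(a, b, x) a * b * x

x <- seq(0, 10, by = 0.5)

# Two different parameter sets with the same product a * b = 6
m1 <- model_mean(a = 2, b = 3, x)
m2 <- model_mean(a = 1, b = 6, x)
identical_outputs <- isTRUE(all.equal(m1, m2))  # the data cannot tell them apart

# Simulate noisy observations and fit a linear regression through the origin
set.seed(1)
y <- m1 + rnorm(length(x), sd = 0.5)
fit <- lm(y ~ 0 + x)

# The fitted slope estimates the identifiable combination a * b
ab_hat <- unname(coef(fit)["x"])
print(identical_outputs)  # TRUE
print(ab_hat)             # an estimate close to a * b = 6
```

Any attempt to estimate a and b individually from such data would return an arbitrary point on the curve a·b = constant, which is exactly the situation the identifiability analysis is meant to detect before regression is attempted.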


Chemical and biological systems of realistic size and complexity often exhibit
stiff and non-linear dynamics whose parameters are not guaranteed to be identifiable
and/or for which the most commonly used regression algorithms do not converge.
Consequently, biochemical and biological systems are a suitable benchmark for
identifiability and regression analysis techniques. A unique solution for the
unknown parameters that links any set of inputs to a set of outputs is a critical
requirement for any model-based analysis, and it may be particularly hard to
achieve for dynamical models of biochemical and, more generally, biological networks.
The size of such systems in terms of the number of interacting agents, the number
and type of interactions among them, the stiffness and non-linearity of their
dynamics, and a suboptimal sample size of the experimental observations
(due to objective limitations of the experimental investigation of living matter)
all challenge the identifiability of putative models. In turn, parameters in the model
that are not identifiable pose challenges during the regression analysis, leading to
imprecise parameter estimates and misleading conclusions and, ultimately, to
the failure of the modelling process.
The book presents the concepts of complexity of dynamical systems and
knowledge inference (Chap. 1); deterministic and stochastic dynamical models,
stiff dynamical systems, and hybrid stochastic/deterministic simulation algorithms
(Chap. 2); the theoretical and algorithmic treatment of observability, identifiability,
and distinguishability of models of complex systems (Chap. 3); and the theoretical
principles and practical formulas of multilinear regression, non-linear regression,
robust and Bayesian regression, along with the methods of predictor selection,
regression diagnostics, and outlier analysis (Chap. 4). As the spread of artificial
intelligence techniques in data science requires an increasing understanding of, and
familiarity with, regression approaches based on neural networks, this edition
includes a new chapter (Chap. 5), which explains at a basic introductory level
the concepts of neural networks and their use for parameter estimation in both
multilinear systems and differential equation systems describing the dynamics of
real-world systems (e.g. physical, biological, chemical, and social systems). The
book also provides R scripts illustrating the implementation of unsupervised model
selection and regression analysis, multilinear regression, non-linear regression,
an example of neural Bayesian regression, and an example of a neural network for
data regression (Chap. 6). Within the chapters, we also point the reader to other
sources (websites, blogs, and posts) where implementation solutions to regression
problems can be found. As in the first edition, at the end of each chapter we offer a
number of practical and theoretical exercises through which the reader can test his
or her understanding of the concepts. Some of the exercises also invite the reader
to go deeper into what is presented in the chapters.
The book is addressed to (i) university students in the last years of their
degree courses in scientific disciplines such as chemistry, mathematics, engineering,
and physics; (ii) doctoral students in bioinformatics, bioengineering, systems
biology, biophysics, biochemistry, environmental sciences, experimental physics,
and numerical analysis; and (iii) researchers, modellers, and practitioners in these
fields. The prerequisites for understanding the contents of the book are a
knowledge of the fundamentals of mathematical analysis, probability, and statistics,
as provided in the first years of university courses in scientific subjects, and a
basic knowledge of the R programming language.

Bolzano-Bozen, Italy, July 2024
Paola Lecca

Acknowledgements

There are many people and work contexts that inspired the content of this book and
made its realization possible. A book is the result of study and cultural exchange
with colleagues, collaborators, and students. I am very grateful to my colleagues
of the Faculty of Computer Engineering of the University of Bolzano-Bozen (Italy)
for their advice and for their outstanding commitment to teaching and dissemination
activities. I thank them for being an example and a guide and for creating a pleasant
and productive working environment around me. I also thank my collaborators at
the Department of Medicine (Division of Pathology) of the University of Verona (Italy),
with whom I have worked fruitfully and with great pleasure for several years and
who have helped me to understand the needs of doctors and biologists in the use
and understanding of statistics. I thank my students at the University of
Bolzano-Bozen very much, as their enthusiasm and their questions have always been
the motivation and inspiration for my work.
Finally, I cannot fail to thank my family, to whom I owe a great deal for always
encouraging me and creating a family environment conducive to study, discussion,
and learning.

Contents

1 Complex Systems, Data, and Inference
1.1 The Definition of Complex System: Size, Stiffness, Non-linearity
1.2 Biological Systems as Graphs and Hypergraphs
1.2.1 Chemical Reaction and Metabolic Networks
1.2.2 Protein Complex Networks
1.3 Models Construction: Objectives and Challenges
2 Dynamic Models
2.1 Chemical Networks as Dynamical Systems
2.1.1 Properties of the Process-Time
2.1.2 Properties of State Space
2.2 Nature of Determination of a Dynamical System
2.2.1 Deterministic Systems
2.2.2 Stochastic Systems
2.3 Formalism and Algorithms for Stochastic Dynamical System
2.3.1 The Master Equation
2.3.2 Stochastic Simulation Algorithm
2.4 Stiff Dynamical Systems and Hybrid Stochastic Algorithms
3 Model Identifiability
3.1 Parameter Identifiability: The State of the Art of the Problem
3.2 Observability, Distinguishability, and Identifiability
3.3 Observability Rank Test
3.4 Biochemical Networks and Identifiability
4 Regression and Variable Selection
4.1 Multilinear Regression
4.1.1 Ordinary Least Square (OLS) Regression
4.2 Regression Diagnostics
4.2.1 Partial Residual Plots
4.2.2 Variance Inflation Factor
4.2.3 A Case Study: A Multilinear Model of Sweat Secretion Volumes in Cystic Fibrosis (CF) Patients
4.3 Nonlinear Least Square Regression
4.4 Outliers
4.4.1 Interquantile Distance Method
4.4.2 Grubbs' Test
4.4.3 Tietjen-Moore Test
4.5 Variable Selection
4.6 Robust Regression
4.7 Bayesian Linear Regression
5 Parameter Estimation Using Artificial Intelligence
5.1 Neural Networks and Regression
5.2 Machine Learning Approaches to Regression
5.2.1 Recurrent Neural Network
5.2.2 Bayesian Neural Network
5.2.3 Generalized Regression Neural Networks
5.2.4 Deep Learning for Parameter Estimation
6 R Scripts
6.1 Code 1: Multilinear Regression
6.2 Code 2: Unsupervised Regression
6.3 Code 3: Bi-exponential Regression
6.4 Code 4: Bayesian Regression
6.5 Code 5: A Neural Network Solving a Regression Problem

References

Index

Chapter 1
Complex Systems, Data, and Inference

Abstract The concepts of complexity and networks are recurrent in modern
systems biology. They are intimately linked to the very nature of biological
processes, which are governed by mathematically complex laws and orchestrated by
thousands of interactions among thousands of molecular components. In this chapter,
we explain what it means for a system to be complex, which mathematical tools and
abstract data structures we can use to describe a complex system, and, finally, what
challenges the scientific community faces today in deducing a mathematical or
computational model from experimental observations.

1.1 The Definition of Complex System: Size, Stiffness, Non-linearity

We usually call a phenomenon complex when it is an expression of the
dynamics of a system made up of many components whose individual behaviour
and interactions depend on, or are influenced by, many factors. Because of the large
number of variables involved and of the relationships that bind them, the
phenomenology of a complex system is difficult to understand and therefore often
hardly predictable. Experimental investigation of complex systems can shed light
on a number, even a large number, of the variables that describe the components of
these systems and their interactions, but it cannot be exhaustive. In particular, for
open systems, it is very rare to gain complete knowledge of their phenomenology
and evolution. Mathematical investigation of a complex system from experimental
data can help to identify the possible presence of latent variables or to establish
whether the number of variables considered in the experimental investigation is
insufficient, but above all it can help to identify the reasons for the complexity
inherent in the interactions between the components of the system in question.
A system can be complex because its components are many, because they interact
strongly with each other, and/or because (i) the dynamics of the interactions is
non-linear, (ii) the system is highly sensitive to initial conditions, or (iii) the system
is stiff, that is, the values assumed by its variables vary in intervals whose
amplitudes differ by many orders of magnitude.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
P. Lecca, Identifiability and Regression Analysis of Biological Systems Models,
SpringerBriefs in Statistics, https://doi.org/10.1007/978-3-031-74748-9_1

The mathematics used for the analysis of complex systems does not necessarily have to be complex.
The most promising way to keep the mathematical treatment simple and practical
for users from various disciplines is to organize knowledge about a complex system
in a graph or in a hypergraph. In recent years, the biological sciences have made
extensive use of graphs and networks to represent complex interacting systems
composed of sets of genes, proteins, metabolites, and functional chemical compounds
of various natures and functions [1–3]. In a graph-like representation, the agents
are the vertices and the interactions are indicated by arcs connecting interacting
agents. The topology of the graph is usually derived from qualitative and quantitative
experimental observations. This type of representation implies a new way of
investigating a phenomenology, one that takes the whole system into account and
not just its individual components. In fact, a system is not just a set of components:
it carries two kinds of information, about its components and about their
interactions, and the graph that represents it encodes both. The graph also
facilitates the construction of a mathematical model of the dynamics of a system,
since it is a data structure that can be translated into a set of equations or
computational procedures. A graph or hypergraph representation not only guides
the construction of the mathematical or computational specification of a model and
the analysis of its properties, but also makes it possible to identify the possible
controls of its complexity. A sensitivity and robustness analysis of the graph allows
us to identify the driver nodes of the dynamics, as well as the clusters of driver
nodes of the hybrid stochastic/deterministic dynamics that arise from stiffness.
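The translation of a graph into a set of equations, and the stiffness mentioned in Sect. 1.1, can be illustrated together in a minimal base-R sketch. We assume a hypothetical chain A → B → C with mass-action kinetics and made-up rate constants (k1 = 1000, k2 = 1): the stoichiometric matrix S is read off the graph, the dynamics is dx/dt = S v(x), and, as one common numerical-analysis indicator of stiffness (complementary to the description given above, not the book's own procedure), we compute the ratio between the largest and smallest non-zero eigenvalue magnitudes of the Jacobian.

```r
# Hypothetical reaction chain A -> B -> C with made-up mass-action rate
# constants: R1 is fast, R2 is slow
species <- c("A", "B", "C")
k <- c(R1 = 1000, R2 = 1)

# Stoichiometric matrix read off the graph: rows = species, columns = reactions
S <- matrix(c(-1,  1, 0,   # R1: A -> B
               0, -1, 1),  # R2: B -> C
            nrow = 3, dimnames = list(species, names(k)))

# Mass-action rates (first order in the single reactant of each reaction)
rates <- function(x) c(R1 = k[["R1"]] * x[["A"]], R2 = k[["R2"]] * x[["B"]])

# Right-hand side of the ODE system dx/dt = S v(x)
rhs <- function(x) drop(S %*% rates(x))

# Partial derivatives of the rates with respect to the species (2 x 3)
dv_dx <- matrix(c(k[["R1"]], 0,         0,
                  0,         k[["R2"]], 0),
                nrow = 2, byrow = TRUE, dimnames = list(names(k), species))

# Jacobian of the right-hand side and its eigenvalues (-1000, -1, 0 here)
J <- S %*% dv_dx
lambda <- eigen(J)$values
nonzero <- abs(lambda) > 1e-8 * max(abs(lambda))  # discard the zero eigenvalue
stiffness_ratio <- max(abs(lambda[nonzero])) / min(abs(lambda[nonzero]))

x0 <- c(A = 1, B = 0, C = 0)
print(rhs(x0))          # A: -1000, B: 1000, C: 0
print(stiffness_ratio)  # about 1000: the time scales differ by three orders of magnitude
```

The same pattern scales to larger networks: each new reaction adds a column to S and an entry to the rate vector, so the graph remains the single source from which both the equations and the stiffness diagnosis are derived.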

1.2 Biological Systems as Graphs and Hypergraphs

The terms graph and network are frequently used interchangeably, although their
meanings are very different. We will not give formal mathematical definitions of
"graph" and "network" here, since they can be found in numerous good books and
articles in the literature. Instead, we highlight the differences between a graph
and a network from the point of view of the processes that we want to represent
graphically and mathematically. To emphasize from the outset that graphs and
networks are two different objects, we will use the terms vertex and arc when we
talk about graphs, and node and edge when we talk about networks.
Graphs are combinatorial models representing relationships (arcs) between
certain agents (vertices). In biology, the vertices typically describe proteins,
metabolites, genes, or other molecular complexes, whereas the arcs represent
functional relationships or interactions between the vertices, such as "activates",
"binds to", "catalyses", or "is converted to" [4]. Furthermore, the activation
performed by a vertex is very often represented as an arc coming out of the vertex
and pointing to another arc.
Strictly speaking, in a graph every arc connects exactly two vertices, and no arc
points to another arc. Many biological processes, however, involve more than
two participating partners. Klamt et al. [4] give as examples a metabolic reaction
involving four species, such as A + B → C + D, and a protein complex consisting of
more than two proteins. Such multi-partner physico-chemical interactions between
biological entities therefore cannot be faithfully represented by a graph. As
illustrated in Fig. 1.1,¹ an attempt to force them into a graph-like representation
may cause a loss of information that can later lead to wrong interpretations. A
hypergraph is a generalization of a graph that helps to overcome these conceptual
limitations [6]. For this reason, many databases and interaction storage formats
support hyperedges of different types, either explicitly or implicitly [6–8]. In a
hypergraph, an arc can join any number of vertices. What is commonly called a
"network" is, in fact, a hypergraph.

Fig. 1.1 Multi-reactant binding reactions, as well as reactions creating multiple products, are
not representable as graphs, where only pairwise relations between nodes are contemplated

¹ The clip-art objects of "Thinking man" are taken from the free image databases publicly
available at the free Clipart Library [5].

Klamt et al. [4] noted that, although hypergraphs occur ubiquitously when dealing
with cellular networks, the notion is less well known than that of a graph. As a
result, the expressive potential of hypergraphs is underused. From the online
Encyclopedia of Mathematics [9], we learn that a hypergraph is defined by a set of
vertices V and a set of arcs that are subsets of vertices. We also learn that "a
hypergraph may be represented in a plane by identifying its nodes with points of the
plane and by identifying the edges with connected domains containing the vertices
incident with these edges". For example, it is possible to represent a hypergraph H
with the set
of nodes

$$V = \{v_1, v_2, v_3, v_4\}$$

and the family of edges

$$E = \{E_1 = \{v_1\},\ E_2 = \{v_2, v_3, v_4\},\ E_3 = \{v_2, v_4\}\}$$

as in Fig. 1.2. A hypergraph H may also be represented by a bipartite graph G as
follows: the sets V and E are the two partitions of G, and $v_i$ and $E_j$ are connected
with an edge if and only if vertex $v_i$ is contained in edge $E_j$ in H (Fig. 1.2).
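The bipartite representation just described amounts to reading the incidence matrix of H. As a minimal sketch in base R (the object names are our own, and no hypergraph package is assumed), the matrix has one row per vertex and one column per hyperedge, and each non-zero entry yields one edge between a vertex $v_i$ and a hyperedge $E_j$ of the bipartite graph G.

```r
# Vertices and hyperedges of the hypergraph H defined above
V <- c("v1", "v2", "v3", "v4")
E <- list(E1 = c("v1"), E2 = c("v2", "v3", "v4"), E3 = c("v2", "v4"))

# Incidence matrix: rows = vertices, columns = hyperedges,
# entry 1 if the vertex belongs to the hyperedge, 0 otherwise
incidence <- sapply(E, function(edge) as.integer(V %in% edge))
rownames(incidence) <- V

# Bipartite graph G: one edge (v_i, E_j) for every non-zero entry
idx <- which(incidence == 1, arr.ind = TRUE)
bipartite_edges <- data.frame(vertex    = V[idx[, "row"]],
                              hyperedge = names(E)[idx[, "col"]])

print(incidence)
print(bipartite_edges)
```

Reading the same matrix by columns gives back the hyperedges, which is why the incidence matrix is a convenient common format between the hypergraph and its bipartite graph.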
From a set of points and lines on a plane, we can draw a graph with one vertex
per point, one vertex per line, and an edge for every incidence between a point and
a line. Indeed, a hypergraph is an incidence structure. An incidence structure is a
triple (P, L, I), where P is a set whose elements are called points, L is a distinct set
whose elements are called lines, and I ⊆ P × L is the incidence relation. We say
that p ∈ P is in incidence relation with l ∈ L, i.e. (p, l) ∈ I, if p lies on line l.
There is a bipartite (or Levi) graph corresponding to every hypergraph. The
converse, however, does not hold: not all bipartite graphs can be regarded as
incidence graphs of hypergraphs.

Fig. 1.2 An example of a hypergraph. A hypergraph may be represented by a bipartite graph, and
conversely

In Table 1.1, we show a script in the R language [10] for the construction and
visualization of the hypergraph H of Fig. 1.2; the resulting plot is shown in Fig. 1.3.

Table 1.1 An example of R script to build and visualize the hypergraph H of Fig. 1.2
# hypergraph and hyperdraw are Bioconductor packages; if needed, install them with
# BiocManager::install(c("hypergraph", "hyperdraw"))
library(hypergraph)
library(hyperdraw)

# Vertices v1..v4 of H, plus one node for each hyperedge E1..E3
nodes <- c("v1", "v2", "v3", "v4", "E1", "E2", "E3")

# Each directed hyperedge links the vertices of a hyperedge of H to the node
# standing for it; each gets a distinct label (h1..h3), since graphBPH() is
# assumed here to turn every hyperedge into a separately named node
dh1 <- DirectedHyperedge("v1", "E1", label = "h1")
dh2 <- DirectedHyperedge(c("v2", "v3", "v4"), "E2", label = "h2")
dh3 <- DirectedHyperedge(c("v2", "v4"), "E3", label = "h3")
hg <- Hypergraph(nodes, list(dh1, dh2, dh3))

# Convert to a bipartite graph object and draw it
plot(graphBPH(hg))

Fig. 1.3 The hypergraph H obtained from the code in Table 1.1
In the next sections, we give examples of hypergraph models of biochemical
and biological interaction networks.

1.2.1 Chemical Reaction and Metabolic Networks

A chemical reaction is a process in which a set of chemical compounds, known as
reactants $\{R_i\}$, react in certain stoichiometric proportions $r_i$ and are transformed
into a set of other chemical compounds, named products $\{P_i\}$, which are produced
in certain stoichiometric quantities $p_i$:

$$r_1 R_1 + r_2 R_2 + \dots \longrightarrow p_1 P_1 + p_2 P_2 + \dots$$
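In code, a reaction of this form can be stored as a weighted directed hyperedge: a tail of reactants carrying the coefficients $r_i$ and a head of products carrying the coefficients $p_i$. The base-R sketch below uses the illustrative reaction 2 H2 + O2 → 2 H2O (not taken from the book) and a hypothetical helper, `net_stoichiometry()`, which returns the net coefficient $p_i - r_i$ of every species, i.e. the column of a stoichiometric matrix that corresponds to the reaction.

```r
# A reaction as a weighted directed hyperedge: tail = reactants with
# coefficients r_i, head = products with coefficients p_i.
# Illustrative reaction: 2 H2 + O2 -> 2 H2O
reaction <- list(
  reactants = c(H2 = 2, O2 = 1),
  products  = c(H2O = 2)
)

# Hypothetical helper: net stoichiometric coefficient p_i - r_i of each
# species (negative if consumed, positive if produced, 0 if not involved)
net_stoichiometry <- function(rx, species) {
  nu <- setNames(numeric(length(species)), species)
  nu[names(rx$reactants)] <- nu[names(rx$reactants)] - rx$reactants
  nu[names(rx$products)]  <- nu[names(rx$products)]  + rx$products
  nu
}

species <- c("H2", "O2", "H2O")
nu <- net_stoichiometry(reaction, species)
print(nu)  # H2: -2, O2: -1, H2O: 2
```

Because reactants and products are accumulated separately, a species appearing on both sides (e.g. a catalyst) correctly ends up with its net coefficient.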

Temkin et al. [11] showed that a chemical reaction can be described as a weighted
directed hyperedge in a directed hypergraph whose nodes are the chemicals and whose
hyperedges are the reactions. However, Estrada et al. [12] noted that the lack of
a mature, well-founded theory for the structural analysis of directed hypergraphs
has led to the coexistence of two commonly used alternative representations of a
chemical reaction. In the first representation, a chemical reaction is modelled as
a bipartite graph, in which one set of nodes represents the reactants and products
and the other set represents the reactions themselves. The second representation is
the substrate graph, in which reactants and products are nodes, and two nodes
are connected if the corresponding chemical compounds take part in the same
reaction. As sets of chemical reactions, metabolic pathways can be represented as
hypergraphs as well. As an example of a metabolic pathway modelled as a
hypergraph, we consider the amphibolic citric acid cycle (Krebs cycle) [13–15],
which involves the set of reactions reported in Table 1.2 and is converted into a
graph structure by the script in Table 1.3. In Tables 1.4 and 1.5, we then present
two R scripts that can be used to generate and visualize the hypergraph of the 25
reactions of the Krebs cycle, whereas in Table 1.6 we present the R script to
generate the hypergraph of a bimolecular reaction. Although the network of the
citric acid cycle considered in this example has only 25 reactions and 24 nodes, its
graphical representation as a hypergraph turns out to be complex and not
immediately understandable, especially compared with the graph representation in
Fig. 1.4 (obtained with the R script in Table 1.4). However, the graph is missing
important information, for instance, about the formation of citrate, which occurs
through the reaction

$$\text{acetyl-CoA} + \text{oxaloacetate} + \mathrm{H_2O} \longrightarrow \text{citrate} + \text{CoA-SH},$$

which is represented by an edge from acetyl-CoA to oxaloacetate. In many
network-like pictures of the citric acid cycle, this reaction is represented as an
edge pointing to the edge connecting acetyl-CoA to citrate (Fig. 1.5). In Fig. 1.6,
we see a hypergraph representation of the association reaction between acetyl-CoA
and oxaloacetate, obtained from the code in Table 1.6. The arcs leaving the two
reactants converge on the product, citrate.
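To make the two alternative representations discussed by Estrada et al. [12] concrete, here is a minimal base-R sketch (object names are our own) that builds both the bipartite graph and the substrate graph from reactions stored as reactant and product sets. It uses the citrate formation reaction above, under the hypothetical label CS, and reaction R4 of Table 1.2. The bipartite edge list keeps track of which compounds react together and in which direction, whereas the substrate graph links, for example, H2O to CoA-SH and no longer records that acetyl-CoA, oxaloacetate, and H2O are consumed jointly.

```r
# Reactions stored as reactant and product sets
# (CS is a hypothetical label for the citrate formation reaction)
reactions <- list(
  CS = list(reactants = c("acetyl-CoA", "oxaloacetate", "H2O"),
            products  = c("citrate", "CoA-SH")),
  R4 = list(reactants = "citrate", products = "Cis-aconitate")
)

# (a) Bipartite graph: compound nodes on one side, reaction nodes on the
# other; arcs go reactant -> reaction and reaction -> product
bipartite <- do.call(rbind, lapply(names(reactions), function(r) {
  rx <- reactions[[r]]
  rbind(data.frame(from = rx$reactants, to = r),
        data.frame(from = r, to = rx$products))
}))

# (b) Substrate graph: compounds are nodes, and two compounds are linked
# whenever they take part in the same reaction (undirected pairs)
substrate <- unique(do.call(rbind, lapply(reactions, function(rx) {
  compounds <- c(rx$reactants, rx$products)
  pairs <- t(combn(compounds, 2))
  data.frame(a = pairs[, 1], b = pairs[, 2])
})))

print(bipartite)  # 7 arcs
print(substrate)  # 11 undirected links
```

Only the bipartite form can be turned back into the original reactions, which is why it is the one closest to the directed-hypergraph view.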

Table 1.2 The citric acid cycle, known as the Krebs cycle, is amphibolic. An amphibolic pathway
is both anabolic and catabolic, i.e. it takes part in both degradative (catabolic) and biosynthetic
(anabolic) reactions (the Greek prefix "amphi" means "both"). The citric acid cycle is a series
of reactions that degrade acetyl coenzyme A to yield carbon dioxide and energy [13–15]
Reaction                                Reaction index
Pyruvate → Acetyl-CoA                   R1
Acetyl-CoA → Oxaloacetate               R2
Oxaloacetate → Citrate                  R3
Citrate → Cis-aconitate                 R4
Cis-aconitate → Isocitrate              R5
Isocitrate → Oxalosuccinate             R6
Oxalosuccinate → alpha-ketoglutarate    R7
alpha-ketoglutarate → Succinyl-CoA      R8
Succinyl-CoA → Succinate                R9
Succinate → Fumarate                    R10
Fumarate → Malate                       R11
Malate → Oxaloacetate                   R12
Malate → Glucose                        R13
Citrate → Cholesterol                   R14
Citrate → Fatty-Acids                   R15
Amino-acids → alpha-ketoglutarate       R16
alpha-ketoglutarate → Amino-acids       R17
Odd_Chains-Fatty-Acids → Succinyl-CoA   R18
Isoleucine → Succinyl-CoA               R19
Methionine → Succinyl-CoA               R20
Valine → Succinyl-CoA                   R21
Succinyl-CoA → Porphyrins               R22
Aspartate → Malate                      R23
Phenylalanine → Malate                  R24
Tyrosine → Malate                       R25
Oxalosuccinate → Amino-acids            R26
Amino-acids → Oxalosuccinate            R27

1.2.2 Protein Complex Networks

The systematic characterization of multi-protein complexes in the whole proteome
of an organism requires the data to be organized in the form of lists of protein
membership in protein complexes. The most common forms of this organization are
protein-protein interaction networks and complex intersection graphs. In the
first organization, the nodes of the network represent proteins and an edge links two
proteins that interact with each other.
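Both organizations of protein-complex data can be derived from membership lists. The base-R sketch below uses hypothetical complexes C1 to C3; it links two proteins whenever they share a complex (the so-called matrix model, an assumption here, since real interaction networks are built from measured pairwise interactions) and links two complexes in the intersection graph whenever they share at least one protein. Note that the protein-level edge list no longer tells us which proteins form a complex together, which is the limitation pointed out by Estrada et al. [16].

```r
# Hypothetical membership lists: each complex is a set of proteins
complexes <- list(
  C1 = c("P1", "P2", "P3"),
  C2 = c("P3", "P4"),
  C3 = c("P5", "P6")
)

# Protein-level network (matrix model, an assumption): link every pair of
# proteins that belong to the same complex
protein_edges <- unique(do.call(rbind, lapply(complexes, function(members) {
  if (length(members) < 2) return(NULL)
  p <- t(combn(sort(members), 2))
  data.frame(a = p[, 1], b = p[, 2])
})))

# Complex intersection graph: complexes are nodes, linked if they share
# at least one protein; 'shared' counts the proteins in common
pairs <- t(combn(names(complexes), 2))
shared <- apply(pairs, 1, function(pr)
  length(intersect(complexes[[pr[1]]], complexes[[pr[2]]])))
intersection_edges <- data.frame(a = pairs[shared > 0, 1],
                                 b = pairs[shared > 0, 2],
                                 shared = shared[shared > 0])

print(protein_edges)       # 5 protein-protein links
print(intersection_edges)  # C1 and C2 share protein P3
```

The two outputs are complementary: the first is indexed by proteins, the second by complexes, and neither alone preserves the full membership lists they were built from.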
Estrada et al. [16] noted that the characterization of multi-protein complexes in
the whole proteome of an organism requires the data to be organized in lists
of protein membership in protein complexes. These lists are usually represented in
two ways. The first is the protein-protein interaction network, in which the nodes
represent proteins and an edge links two proteins that interact with each other. This
representation, however, does not take the multi-protein complexes into account
[16]. The second is an intersection graph, whose nodes represent complexes,
and in which a link exists between two nodes (complexes) if they have one or more
proteins in common.
