
Journal of Mathematical Biology
https://doi.org/10.1007/s00285-019-01410-y

Algebraic expressions of conditional expectations in gene regulatory networks

Vikram Sunkara 1,2
1 Computational Medicine, Zuse Institute Berlin, 14195 Berlin, Germany
2 Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
Correspondence: Vikram Sunkara, sunkara@mi.fu-berlin.de

Received: 12 September 2018
© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Abstract
Gene Regulatory Networks are powerful models for describing the mechanisms and dynamics inside a cell. These networks are generally large in dimension and seldom yield analytical formulations. It was shown that studying the conditional expectations between dimensions (interactions or species) of a network could lead to drastic dimension reduction. These conditional expectations were classically given by solving equations of motion derived from the Chemical Master Equation. In this paper we deviate from this convention and take an algebraic approach instead. That is, we explore the consequences of conditional expectations being described by a polynomial function. There are two main results in this work. Firstly, if the conditional expectation can be described by a polynomial function, then the coefficients of this polynomial function can be reconstructed using the classical moments. Secondly, there are dimensions in Gene Regulatory Networks which inherently have conditional expectations with algebraic forms. We demonstrate through examples that the theory derived in this work can be used to develop new and effective numerical schemes for forward simulation and parameter inference. The algebraic line of investigation of conditional expectations has considerable scope to be applied to many different aspects of Gene Regulatory Networks; this paper serves as a preliminary commentary in this direction.

Keywords Markov chains · Chemical Master Equation · Dimension reduction

Mathematics Subject Classification 65C40 · 60G20 · 92B05

1 Introduction

An average human being is made up of 37 trillion cells; each of these cells belongs to one of a hundred different phenotypes, and each of the phenotypes has differentiated
from a single zygote. Each cell has an identical copy of DNA tracing back to that
first zygote. Nevertheless, it performs its specialised function meticulously by using
only specific segments of the DNA. It is natural to assume that a change in pheno-
type is permanent; that some physical change occurs which fixes the state of the cell
to its purpose—a hammer doesn’t turn into a screwdriver. This intuition is true in
human engineered systems, but far from reality in emergent complex systems like
the cell. A cell is constantly active; at any moment several genes are being activated
and deactivated, resulting in what appears as a fixed phenotype on long time scales.
These interactions/dynamics inside a cell can be described through the framework
of networks (directed graphs), where the vertices of the network represent molecular
structures such as genes, RNA, and proteins, and the edges represent reaction chan-
nels which describe the interactions between the respective vertices (Wilkinson 2006;
Higham 2008; Karlebach and Shamir 2008; MacArthur et al. 2009). These networks
are referred to as Gene Regulatory Networks (GRNs). Every cellular process can
be described by a GRN, where a certain group of genes performs a function for the cell
through transcription, translation, and regulation. One example is the circadian rhythm
in mammals, where the internal cellular clock is maintained by two genes activating
and deactivating each other to produce a sinusoidal profile in mCRY and mPER pro-
tein concentrations through time (Barkai and Leibler 2000; Vilar et al. 2002). Another
example is the lactose digestion in bacteria, where a single lactose molecule binds
to the repressor protein near the lactase gene and frees the promoter region, which
then starts the production of lactase needed for digesting the lactose (Choudhary et al.
2014). Modelling a cellular process as a network helps in understanding its underlying
mechanisms, which in turn can aid in predicting its behaviour.
Mathematically speaking, studying a GRN entails studying the various paths which
can be traversed on that network. It was shown experimentally in the early 1990s that
the internal processes of the cell are inherently stochastic (Blake et al. 2003; Srinivastiv
et al. 2002). Hence, every path over the network is possible and each path has a
probability of being realised. Biological processes inherently have many interacting
genes and molecules involved in complex interaction schemes, hence, investigating all
possible paths over such networks is a real mathematical challenge. The class of GRNs
which can be studied analytically is very small (Jahnke and Huisinga 2007; Grima
et al. 2012). In the majority of cases, numerical schemes have to be applied to study
GRNs. These numerical schemes are covered under two overarching principles: time
scale separation and volume size expansion. The idea behind time scale separation is
that if there are sub-networks which are traversed faster than the rest of the network,
these sub-networks can be reduced, which in turn would lead to a smaller network to
study (Rao and Arkin 2003; Haseltine and Rawlings 2002; Goutsias 2005; Ball et al.
2006; Burrage et al. 2006; Anderson 2007; MacNamara et al. 2008; Jahnke and Kreim
2012; Pájaro et al. 2017). The second principle is that of volume size expansion, where
in essence, the rate at which one traverses the network is adjusted such that the rate of
traversing between any two connected vertices is of the same order. This in turn reduces
the influence of stochasticity, which leads to an overall deterministic behaviour in the
network (Vilar et al. 2002; Singh and Hespanha 2005; Van Kampen 2007; Cardelli et al.
2016; Thomas et al. 2014; Pájaro et al. 2017). Both these principles are highly effective
at reducing the computational complexity of many GRNs. However, they do not cover

123
Algebraic expressions of conditional expectations in GRNs

the whole range of dynamics, in particular, they have difficulties in accurately capturing
GRNs which have cycles inside them (Engblom 2006; Hellander and Lötstedt 2007).
Cycles in GRNs imply that the system never reaches some particular steady state;
rather, it is in a permanent transient state through time. There are no fixed time scales
or fixed volume sizes in such dynamic behaviour; these assumptions have to be updated
in each phase of the cycle. Hence, a larger framework is needed, which encompasses
the GRNs currently studied via the principles of time scale separation and volume size
expansion, and further extends to include GRNs with cyclical dynamics.
In this paper, we formulate a new algebraic conditional expectation framework
for studying GRNs. That is, we explore the consequences of conditional expectations
being described by a polynomial function. We will begin by motivating the essential
role of conditional expectations in dimension reduction for complex high-dimensional
GRNs. We then introduce the principle of algebraic conditional expectation forms; we
begin with the linear case to build intuition and then extend it to the
general polynomial case. Then we will prove that the algebraic forms naturally arise
in GRNs. We finish the paper by demonstrating how the properties of algebraic condi-
tional expectations can be utilised to build new numerical schemes for studying GRNs.

2 Mathematical background

In this section, we will introduce the key processes and notations which are to be used
in this paper. We begin by defining a Kurtz process, a jump Markov process, which is
commonly used for modelling GRNs. As the study is focused on investigating condi-
tional expectations, we present the necessary assumptions to guarantee their existence
and computability. After formulating the underlying assumptions, we will introduce
the dimension reduction framework, and introduce a new algebraic perspective on the
topic.

2.1 Kurtz process

Let S_1, ..., S_{N_s} be population counts of N_s different species which can interact with each other.¹ A reaction channel is a transformation of a set of species into another. Mathematically, a reaction channel (R) is described by the following mapping:

$$ R := \sum_{i=1}^{N_s} \chi_i^{\mathrm{in}} S_i \;\longrightarrow\; \sum_{i=1}^{N_s} \chi_i^{\mathrm{out}} S_i, $$

where χ_i^{in/out} are the numbers of species i entering and exiting the reaction channel, respectively. The stoichiometry of the reaction channel (R) is the vector describing the net change in population after the reaction channel has fired. We denote this by

1 The terms species, dimensions, and vertices originate from different fields of study but refer to the same
concept. Hence, we interchange between the terms to match the context.


$$ v_R := \left( \chi_i^{\mathrm{out}} - \chi_i^{\mathrm{in}} \right)_{i=1}^{N_s}. $$

The propensity/intensity at which a reaction channel fires is given by the function:

$$ f_R(S_1, \ldots, S_{N_s}) := c_R \prod_{i=1}^{N_s} \binom{S_i}{\chi_i^{\mathrm{in}}}, \qquad (2.1) $$

where c_R is a single event rate and the round brackets denote binomial coefficients.
Functions in the form of f R are referred to as mass action propensities (Gillespie
1977). That is, the rate of firing of a reaction channel is proportional to the product
of the populations of the species involved in starting the reaction. This formalism
emerged from thermodynamical description of the probability of molecules colliding
and forming new molecules. We are interested in systems with Ns ∈ N species under-
going Nr ∈ N reactions via mass action propensities. The stochastic process of such
a system can be modelled by:


Nr  t 
Z (t) = Z (0) + P f j (Z (s))ds v j , with Z (t) ∈ N0Ns , (2.2)
j=1 0

where 𝒫 is an inhomogeneous Poisson process (Kurtz 1972). As a shorthand, processes
of the form (2.2) are referred to as Kurtz processes.
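For intuition, a path of a Kurtz process (2.2) can be sampled with Gillespie's stochastic simulation algorithm (Gillespie 1977). The following is a minimal sketch, assuming mass action propensities supplied as plain Python callables; the two-species network and rate constants in the usage example are illustrative placeholders, not parameters from this paper.

```python
import numpy as np

def ssa(z0, stoichiometries, propensities, t_final, rng=np.random.default_rng(0)):
    """Sample one path of a Kurtz process (2.2) with Gillespie's direct method."""
    t, z = 0.0, np.array(z0, dtype=int)
    path = [(t, z.copy())]
    while t < t_final:
        rates = np.array([f(z) for f in propensities])
        total = rates.sum()
        if total == 0.0:                              # absorbing state: no channel can fire
            break
        t += rng.exponential(1.0 / total)             # waiting time to the next event
        j = rng.choice(len(rates), p=rates / total)   # which channel fires
        z = z + stoichiometries[j]
        path.append((t, z.copy()))
    return path

# Illustrative two-species network: X <-> 0, X -> Y + X, Y -> 0 (cf. Model 2 in Example 3.1);
# the rate constants below are made up purely for demonstration.
stoich = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])
props = [lambda z: 2.0,          # 0 -> X
         lambda z: 0.1 * z[0],   # X -> 0
         lambda z: 0.5 * z[0],   # X -> X + Y
         lambda z: 0.2 * z[1]]   # Y -> 0
path = ssa([0, 0], stoich, props, t_final=50.0)
```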
The probability of observing Z(t) in a state z ∈ Ω ⊂ ℕ_0^{N_s} at time t ∈ [0, ∞) is described by the Chemical Master Equation,

$$ \frac{\partial p(Z=z;t)}{\partial t} = \sum_{j=1}^{N_r} f_j(z - v_j)\, p(Z = z - v_j; t) - f_j(z)\, p(Z = z; t). \qquad (2.3) $$

For each reaction indexed by j ∈ {1, ..., N_r}, we denote the corresponding propensity function by f_j : Ω → ℝ_+ and its stoichiometric vector by v_j ∈ ℤ^{N_s}. For simplicity of notation, we write the right-hand side of equation (2.3) as a shift minus an identity operator:

$$ \frac{\partial p(z;t)}{\partial t} = \sum_{j=1}^{N_r} (\mathcal{S}_j - I)\, f_j(z)\, p(z;t), \qquad (2.4) $$

where 𝒮_j is the shift operator of the jth reaction. We define p_t to be the vector (p(Z = z; t))_{z ∈ Ω}, and dp_t/dt as the vector (∂p(Z = z; t)/∂t)_{z ∈ Ω}. The solution of the CME (2.3) is found by solving the initial value problem:

$$ \begin{cases} \dfrac{dp_t}{dt} = A\, p_t, & t > 0,\\[4pt] p_0 \in \ell^1(\Omega), & t = 0, \end{cases} \qquad (2.5) $$

where A is an infinitesimal generator (Wilkinson 2006; Khammash and Munsky 2006; Sunkara 2013) with the properties:



$$ A_{k,k} \le 0, \qquad A_{k,l} \ge 0 \;\text{ for } k \ne l, \qquad \text{and} \qquad \sum_{l} A_{k,l} = 0, $$

for all k, l ∈ {1, ..., |Ω|}. Solving the system of equations in (2.5) gives the full joint probability distribution of the system at a time point. Kurtz processes which yield analytical solutions to the equations in (2.5) are a small class (Jahnke and Huisinga 2007). In most cases, numerical schemes have to be considered. We now introduce the necessary assumptions for guaranteeing the existence and computability of the solution of (2.5).

Assumption 2.1 We assume the following:

1. the state space is finite, |Ω| < ∞;
2. the operator e^{At} is honest (Banasiak 2014);
3. the joint probability is positive over the state space, p(Z = · ; t) > 0, at all time points t.

Assumptions (1) and (2) guarantee that the probabilities of interest exist and are computable. Assumption (3) is crucial for applying Bayes' Theorem, the critical tool for conditional probability, to help formulate and compute conditional moments.
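As a concrete illustration of (2.5), the sketch below assembles the generator A for a single-species birth–death system on a truncated state space and propagates p_t with a matrix exponential; the truncation size and rate constants are assumptions made purely for illustration, not parameters from this paper.

```python
import numpy as np
from scipy.linalg import expm

# Truncated state space {0, ..., N} for one species; birth at rate c, death at rate gamma * n.
N, c, gamma = 60, 5.0, 0.5
A = np.zeros((N + 1, N + 1))
for n in range(N + 1):
    if n < N:                      # birth: n -> n + 1
        A[n + 1, n] += c
        A[n, n] -= c
    if n > 0:                      # death: n -> n - 1
        A[n - 1, n] += gamma * n
        A[n, n] -= gamma * n

# Probability mass is conserved under dp/dt = A p: every column of A sums to zero.
assert np.allclose(A.sum(axis=0), 0.0)

p0 = np.zeros(N + 1)
p0[0] = 1.0                        # start with zero molecules
t = 4.0
pt = expm(A * t) @ p0              # solution of dp/dt = A p at time t
print(pt.sum(), (np.arange(N + 1) * pt).sum())   # total mass and E[Z(t)]
```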

2.2 Dimension reduction

Any method which solves the CME (2.5) with fewer equations than |Ω| is technically
a dimension reduction method. Within dimension reduction, we want to focus on a
particular subclass of methods, that is, methods which reduce dimension by partition-
ing the species into two groups, a stochastic group and a deterministic group, and
study their dynamics separately (Jahnke and Sunkara 2014; Hasenauer et al. 2013;
Menz et al. 2012; Jahnke 2011; Henzinger et al. 2010; Haseltine and Rawlings 2002;
Jahnke and Kreim 2012; Thomas et al. 2014; Vilar et al. 2002). This class of methods
is referred to as hybrid methods. In essence, the stochastic process Z (t) is split into
two sub-processes:
Z (t) = (X (t), Y (t)). (2.6)

A hybrid method achieves dimension reduction by evolving X (t) stochastically cou-


pled with the statistics of Y (t). With the splitting in (2.6) in mind, we now derive the
general decomposition of a hybrid method for the CME. It is important to note that
when we refer to a species as “deterministic,” we do not model these species determin-
istically, but we are rather referring to their statistics, which evolve deterministically
(e.g. expectation, variance, or higher moments).
Let N_s^Y denote the number of species Y that are described deterministically, and N_s^X the number of species X described stochastically. Then the total number of species, N_s = N_s^Y + N_s^X, is equal to the sum of the two parts. Corresponding to (2.6), the state z is written as a tuple:

$$ z = (x, y), \quad \text{where } x \in \mathbb{N}_0^{N_s^X},\; y \in \mathbb{N}_0^{N_s^Y}. $$


Similarly, the stoichiometric vector of the jth reaction is written as a tuple v_j = (ν_j, μ_j), where ν_j is the restriction of v_j to the stochastically considered species and μ_j to the deterministic ones. The state space of the stochastically considered species is called the hybrid state space and is denoted by Ω_X ⊂ ℕ_0^{N_s^X}. Similarly, Ω_Y ⊂ ℕ_0^{N_s^Y} is the state space restricted to the deterministically considered species. The mass action propensities, (2.1), naturally split into the product structure:

$$ f_j(z) = g_j(x)\, h_j(y), \qquad \forall z \in \Omega, \qquad (2.7) $$

for non-negative functions g_j and h_j. The above splitting of species, stoichiometry and propensities is referred to as the hybrid framework. Given there exists a solution to the CME (2.3) at a given time t > 0 and the solution is non-zero on the state space, then, using Bayes' theorem, there exists a marginal probability distribution,

$$ p(X = \cdot\,; t) : \Omega_X \to [0, 1], $$

and a family of conditional probability distributions,

$$ \{\, p(Y = \cdot \mid X = x; t) : \Omega_Y \to [0, 1] \;\text{ for } x \in \Omega_X \,\}, $$

such that

$$ p(z; t) = p(X = x; t)\, p(Y = y \mid X = x; t). \qquad (2.8) $$
Furthermore, by the law of conservation of probability, the marginal and conditional probability distributions satisfy:

$$ \sum_{x} \frac{\partial p(X = x; t)}{\partial t} = 0 \quad \text{and} \quad \forall x \in \Omega_X : \; \sum_{y} \frac{\partial p(Y = y \mid X = x; t)}{\partial t} = 0. \qquad (2.9) $$

A hybrid method tries to reconstruct the marginal distribution p(X = ·; t), using
the statistics of the conditional distribution p(Y = · | X = ·; t). The equations of
motion for p(X = ·; t) are derived by substituting (2.8) and (2.9) into the CME and
performing some simple algebra. We now derive the time derivatives of the marginal
distribution and the conditional expectation.
Firstly, for the time derivative of the marginal distribution of X (t), we sum the
CME (2.4) over all the states y. The derivative with respect to time of the product
form in (2.8) is:

∂t p(Z = z; t) = p(Y = y | X = x; t) ∂t p(X = x; t)


+ p(X = x; t) ∂t p(Y = y | X = x; t).

Then, summing the above expression over y and applying the second condition in (2.9) gives ∂_t p(X = x; t) = Σ_y ∂_t p(Z = z; t). Substituting (2.3) into this sum and expanding, we obtain the time derivative of the marginal distribution p(X = ·; t):

$$ \begin{aligned} \frac{\partial p(X=x;t)}{\partial t} &= \sum_y \frac{\partial p(Z=z;t)}{\partial t} \\ &= \sum_y \sum_{j=1}^{N_r} (\mathcal{S}_j - I)\, g_j(x)\, h_j(y)\, p(X=x;t)\, p(Y=y \mid X=x;t) \\ &= \sum_{j=1}^{N_r} (\mathcal{S}_j - I) \underbrace{\Big[ \sum_y h_j(y)\, p(Y=y \mid X=x;t) \Big]}_{(\ast)} g_j(x)\, p(X=x;t). \end{aligned} \qquad (2.10) $$

We have derived the formula of the time derivative of the marginal distribution in the form of the CME. Firstly, we should observe that the propensity of the deterministically considered species, the function preceding the probability term (∗), has become time dependent. For x ∈ Ω_X, if we denote by Y_x(t) the process distributed according to the conditional distribution p(Y = · | X = x; t), then the term (∗) in (2.10) is, by definition, the expectation of the propensity of the conditional process Y_x(t):

$$ \sum_{y} h_j(y)\, p(Y = y \mid X = x; t) := \mathbb{E}\big[ h_j(Y_x(t)) \big]. \qquad (2.11) $$

Hence, (2.10) can be rewritten as

$$ \frac{\partial p(X = x; t)}{\partial t} = \sum_{j=1}^{N_r} (\mathcal{S}_j - I)\, \underbrace{\mathbb{E}\big[ h_j(Y_x(t)) \big]}_{(\ast)}\, g_j(x)\, p(X = x; t). \qquad (2.12) $$

In this form, it is clear that to study or evolve the sub-processes of a high-dimensional


process, conditional moments are needed. If these were known a priori, then any
sub-process could be studied independently of the full process. This highlights the
importance of conditional moments in the process of dimension reduction. The key
principle behind every hybrid method is to harness the conditional expectation struc-
ture. To understand how the conditional moments are computed, let us consider the
following equation for the conditional expectation given by the method of conditional
moments [the derivation for the equations can be found in the works of Jahnke (2011) and Hasenauer et al. (2013)]:

$$ \begin{aligned} \frac{\partial\big(\mathbb{E}[Y_x(t)]\, p(X=x;t)\big)}{\partial t} = \sum_{j=1}^{N_r} \Bigg( & \underbrace{\sum_y h_j(y)\, p(Y=y \mid X=x-\nu_j; t)}_{\mathbb{E}[h_j(Y_{x-\nu_j}(t))]}\; \mu_j\, g_j(x-\nu_j)\, p(X=x-\nu_j;t) \\ + & \underbrace{\sum_y y\, h_j(y)\, p(Y=y \mid X=x-\nu_j; t)}_{\mathbb{E}[Y_{x-\nu_j}(t)\, h_j(Y_{x-\nu_j}(t))]}\; g_j(x-\nu_j)\, p(X=x-\nu_j;t) \\ - & \underbrace{\sum_y y\, h_j(y)\, p(Y=y \mid X=x; t)}_{\mathbb{E}[Y_x(t)\, h_j(Y_x(t))]}\; g_j(x)\, p(X=x;t) \Bigg). \qquad (2.13) \end{aligned} $$

The above equation solves for the conditional expectation. The first thing to notice is
that the time derivative on the left-hand side contains the marginal probability, and
a large time scale separation between the conditional expectation and the marginal
distribution would be needed to uncouple these terms in the derivative. Secondly, if
h j is a polynomial of degree greater than zero, then there are higher order conditional
moments in the square brackets on the right-hand side. Computing these would lead
to solving an infinite system of non-linear differential equations, which is clearly not
feasible. Hence, approximations like moment truncation and moment closures have to
be applied to make (2.13) computable (Jahnke and Sunkara 2014; Jahnke and Kreim
2012; Hasenauer et al. 2013). In summary, while conditional expectations are criti-
cal for dimension reduction, computing conditional expectations is computationally
challenging and highly non-trivial.
Let us return to (2.12). Here is where we wish to deviate slightly from convention.
Instead of writing down the derivatives of the conditional moments, as done in (2.13),
and imposing closures and truncations, we want to pursue a more algebraic approach.
We begin by asking the following questions:
• What dynamics would be observed if the conditional moments (∗) had a polynomial
form?
• If the conditional expectation structure between two dimensions was a linear func-
tion, could the equations of motion for the slope and the intercept be derived?
• Do certain reactions or stoichiometries of GRNs guarantee that the conditional
moments are embedded in some curve/manifold?
Giving a rigorous answer to the latter question is out of the scope of this work, but we
begin by exploring the preliminary question: “If we were to consider some polyno-
mial ansatz for the conditional moments, then which structures and properties would
emerge?”

3 Algebraic conditional expectation

In this section we will study the relationship between conditional moments and
classical moments.2 Firstly, we will investigate the consequences of the conditional
2 In our context the results can be reformulated to be raw moments, factorial moments, or central moments.
For this reason we say classical moments to encompass it all.


expectation having a linear algebraic form. After gaining intuition from the linear
case, we will extend the results to a generalised polynomial form. That is, we will
prove that the coefficients of a polynomial which describes the conditional moments
can be computed by solving a linear system of equations containing the joint and
marginal moments of the random variables. Furthermore, we show that the existence
of an algebraic conditional expectation inherently guarantees moment closure.
The notation from Sect. 2.2 is carried through and used in the rest of this section. Here the interest is strictly in processes at a fixed time point; hence, the time variable is omitted from the notation. We quickly recall that X, Y are random variables over their respective state spaces Ω_X ⊂ ℕ_0^{N_s^X} and Ω_Y ⊂ ℕ_0^{N_s^Y}. We assume that the random variables Y conditioned on x exist for all x ∈ Ω_X; these random variables are denoted by Y_x, for x ∈ Ω_X. Henceforth, whenever we refer to "conditional expectation" or "conditional moments", the conditional variable is the dimension Y and the conditioning variable is the dimension X.

Remark 3.1 To be rigorous, since the state space Ω_X is not continuous, the conditional expectation cannot be a continuous function; rather, there is a smooth manifold in which the conditional expectation, E[Y_x], is embedded. For brevity we will say that the conditional expectation has the form of the manifold in which it is embedded.

3.1 Linear conditional expectation

Lemma 3.1 Let α ∈ ℝ^{N_s^Y × N_s^X} and β ∈ ℝ^{N_s^Y × 1} be fixed. If the expectation of Y conditioned on x ∈ Ω_X has a linear form, that is, for all x ∈ Ω_X,

$$ \mathbb{E}[Y_x] = \alpha\, x + \beta, \qquad (3.1) $$

then

1. E[Y] = α E[X] + β,
2. cov(Y, X) = α cov(X, X),
3. E[cov(Y_x, Y_x)] = cov(Y, Y) − α cov(X, X) α^T.

Proof Fix α ∈ ℝ^{N_s^Y × N_s^X} and β ∈ ℝ^{N_s^Y × 1}. We will prove the three statements separately.

Statement 1 We multiply (3.1) by the marginal probability of x and then sum over all x:

$$ \begin{aligned} \mathbb{E}[Y_x] &= \alpha x + \beta, \\ \sum_{x \in \Omega_X} \mathbb{E}[Y_x]\, p(X = x) &= \sum_{x \in \Omega_X} (\alpha x + \beta)\, p(X = x), \\ \mathbb{E}[Y] &= \alpha\, \mathbb{E}[X] + \beta. \end{aligned} $$


Statement 2 We begin with the definition of the covariance of X and Y,

$$ \operatorname{cov}(Y, X) := \sum_{x \in \Omega_X,\, y \in \Omega_Y} (y - \mathbb{E}[Y])\, (x - \mathbb{E}[X])^T\, p(X = x, Y = y). $$

Applying Bayes' Theorem to the joint distribution and then collating the y-related terms gives us:

$$ \begin{aligned} &= \sum_{x \in \Omega_X,\, y \in \Omega_Y} (y - \mathbb{E}[Y])\, (x - \mathbb{E}[X])^T\, p(X = x)\, p(Y = y \mid X = x) \\ &= \sum_{x \in \Omega_X} \Big[ \sum_{y \in \Omega_Y} (y - \mathbb{E}[Y])\, p(Y = y \mid X = x) \Big] (x - \mathbb{E}[X])^T\, p(X = x), \end{aligned} $$

then expanding the square brackets gives

$$ = \sum_{x \in \Omega_X} (\mathbb{E}[Y_x] - \mathbb{E}[Y])\, (x - \mathbb{E}[X])^T\, p(X = x). $$

Lastly, we substitute in the linear form (3.1) for the conditional expectation and then reduce:

$$ \begin{aligned} &= \sum_{x \in \Omega_X} (\alpha x + \beta - \mathbb{E}[Y])\, (x - \mathbb{E}[X])^T\, p(X = x) \\ &= \sum_{x \in \Omega_X} \big( \alpha x x^T - \alpha x\, \mathbb{E}[X]^T + \beta x^T - \beta\, \mathbb{E}[X]^T - \mathbb{E}[Y] x^T + \mathbb{E}[Y]\, \mathbb{E}[X]^T \big)\, p(X = x) \\ &= \alpha\, \mathbb{E}[X X^T] - \alpha\, \mathbb{E}[X]\, \mathbb{E}[X]^T \\ &= \alpha \operatorname{cov}(X, X). \end{aligned} $$

Statement 3 Proof in “Appendix A”. The idea of the proof is to substitute the linear
conditional form (3.1) into Eve’s Law and then reduce.

Lemma 3.1-1 states that the conditional expectation form intersects the point (E[X], E[Y]). Furthermore, Lemma 3.1-2 states that the covariance of X and Y is a scaling of the variance of X. Writing these two equations together gives the following system of equations:

$$ \begin{pmatrix} \operatorname{cov}(X, X) & 0 \\ \mathbb{E}[X] & 1 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} \operatorname{cov}(Y, X) \\ \mathbb{E}[Y] \end{pmatrix}. $$


We notice that solving the equation above gives the formula for the gradient and the intercept of the conditional expectation in terms of the moments:

$$ \alpha = \frac{\operatorname{cov}(Y, X)}{\operatorname{cov}(X, X)} \quad \text{and} \quad \beta = -\alpha\, \mathbb{E}[X] + \mathbb{E}[Y]. \qquad (3.2) $$

We have deduced that if the conditional expectation has a linear form, then the gradient and the intercept can be calculated using classical moments. We will now extend this observation to the case of general polynomial forms.
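As a quick check of (3.2), the following sketch computes α and β from the classical moments of a discrete joint distribution; the joint distribution used here is an arbitrary stand-in constructed so that E[Y_x] = 2x + 1, and is not one of the models in this paper.

```python
import numpy as np

def linear_coefficients(joint, x_vals, y_vals):
    """Compute alpha, beta of (3.2) from a joint distribution p(X = x, Y = y).

    joint[i, j] = p(X = x_vals[i], Y = y_vals[j]); one-dimensional X and Y.
    """
    px = joint.sum(axis=1)
    EX = (x_vals * px).sum()
    EY = (y_vals * joint.sum(axis=0)).sum()
    EXY = (np.outer(x_vals, y_vals) * joint).sum()
    EX2 = (x_vals ** 2 * px).sum()
    alpha = (EXY - EX * EY) / (EX2 - EX ** 2)   # cov(Y, X) / cov(X, X)
    beta = EY - alpha * EX                      # -alpha E[X] + E[Y]
    return alpha, beta

# Stand-in joint distribution whose conditional expectation is exactly E[Y_x] = 2x + 1.
x_vals = np.arange(5)
y_vals = np.arange(12)
joint = np.zeros((5, 12))
for i, x in enumerate(x_vals):
    joint[i, 2 * x + 1] = 0.2      # all conditional mass sits at y = 2x + 1
alpha, beta = linear_coefficients(joint, x_vals, y_vals)
print(alpha, beta)                 # approximately 2.0 and 1.0
```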

3.2 Polynomial conditional expectation

For brevity, we fix N_s^Y = N_s^X = 1.


Theorem 3.1 Let m ∈ ℕ_0 with m ≤ |Ω_X|. If the expectation of Y conditioned on x ∈ Ω_X has a degree-m polynomial form, that is, for all x ∈ Ω_X,

$$ \mathbb{E}[Y_x] = \kappa_m x^m + \kappa_{m-1} x^{m-1} + \cdots + \kappa_1 x + \kappa_0, \qquad (3.3) $$

then for n ∈ ℕ_0,

$$ \mathbb{E}[Y X^n] = \sum_{i=0}^{m} \kappa_i\, \mathbb{E}[X^{i+n}]. \qquad (3.4) $$
Proof Fix n ∈ ℕ_0, m ∈ ℕ_0. We prove the statement by expanding the definition of E[Y X^n], then substituting in the polynomial form and reducing.

$$ \mathbb{E}[Y X^n] := \sum_{x \in \Omega_X,\, y \in \Omega_Y} y\, x^n\, p(X = x, Y = y), $$

applying Bayes' theorem to the joint distribution gives

$$ = \sum_{x \in \Omega_X,\, y \in \Omega_Y} y\, x^n\, p(Y = y \mid X = x)\, p(X = x), $$

then collating the y terms reduces the expression to

$$ = \sum_{x \in \Omega_X} \underbrace{\Big( \sum_{y \in \Omega_Y} y\, p(Y = y \mid X = x) \Big)}_{\mathbb{E}[Y_x]}\, x^n\, p(X = x), $$

substituting (3.3) for the conditional expectation gives

$$ = \sum_{x \in \Omega_X} \Big( \sum_{i=0}^{m} \kappa_i x^i \Big)\, x^n\, p(X = x). $$

Lastly, interchanging the summations gives:

$$ = \sum_{i=0}^{m} \kappa_i \Big( \sum_{x \in \Omega_X} x^{i+n}\, p(X = x) \Big) = \sum_{i=0}^{m} \kappa_i\, \mathbb{E}[X^{i+n}]. $$

Remark 3.2 It can be seen that if m = |Ω_X|, then the conditional expectation E[Y_x] would naturally admit a polynomial representation of degree at most m. This is simply due to interpolation theory, which states that given m points there exists a unique polynomial of degree at most m − 1 which goes through the m points.

Corollary 3.1 Let

• κ := (κ_i)_{i=0}^{m} ∈ ℝ^{(m+1)×1},
• Φ := [E[X^{i+j}]]_{i,j} ∈ 𝕄^{(m+1)×(m+1)},
• μ := (E[Y X^i])_{i=0}^{m} ∈ ℝ^{(m+1)×1}.

If Φ is invertible, then

$$ \kappa = \Phi^{-1} \mu. \qquad (3.5) $$

Furthermore, if Φ is invertible and [Φ^{-1}]_{1,m+1} ≠ 0, then

$$ \mathbb{E}[Y X^m] = \Big( \kappa_m - \sum_{i=0}^{m-1} [\Phi^{-1}]_{1,i+1}\, \mathbb{E}[Y X^i] \Big) \Big/ [\Phi^{-1}]_{1,m+1}. \qquad (3.6) $$

Proof The linear system of equations in (3.5) arises by simply iterating the term E[Y X^n], as defined in (3.4), for n = 0, ..., m:

$$ \underbrace{\begin{pmatrix} \mathbb{E}[X^m] & \mathbb{E}[X^{m-1}] & \cdots & \mathbb{E}[X] & 1 \\ \mathbb{E}[X^{m+1}] & \mathbb{E}[X^m] & \cdots & \mathbb{E}[X^2] & \mathbb{E}[X] \\ \vdots & & & & \vdots \\ \mathbb{E}[X^{2m}] & \mathbb{E}[X^{2m-1}] & \cdots & \mathbb{E}[X^{m+1}] & \mathbb{E}[X^m] \end{pmatrix}}_{\Phi} \underbrace{\begin{pmatrix} \kappa_m \\ \kappa_{m-1} \\ \vdots \\ \kappa_0 \end{pmatrix}}_{\kappa} = \underbrace{\begin{pmatrix} \mathbb{E}[Y] \\ \mathbb{E}[Y X] \\ \vdots \\ \mathbb{E}[Y X^m] \end{pmatrix}}_{\mu} $$

Since Φ is invertible, it follows that the coefficients in (3.3) can be computed by evaluating Φ^{-1} μ.

The second statement can be proved by taking the dot product of the first row of Φ^{-1} and μ and rearranging to make E[Y X^m] the subject. We see that if κ_m = 0, then E[Y X^m] has a natural moment closure.


Definition 3.1 Let η^m_{Y|X}(x) denote the m-th degree polynomial approximation of E[Y_x] : Ω_X → ℝ_+, where

$$ \eta^m_{Y|X}(x) := \sum_{i=0}^{m} \kappa_i x^i, \qquad (3.7) $$

with the coefficients κ_i, for i = 0, ..., m, defined in (3.5).

Remark 3.3 The idea of fitting polynomial structures to stochastic data was originally
investigated by the data driven sciences; where functions were fitted to high dimen-
sional point clouds to unravel the dynamics which generated that data set (Seber
and Lee 2003; Nagel and Steyer 2017). In the data science framework, polynomial
approximation—which we formulated in Definition 3.1—is equivalent to a polyno-
mial regression. That is, if the moments were replaced with empirical moments of a
dataset, the mth degree polynomial approximation reduces to the mth degree polyno-
mial regression.
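To illustrate Theorem 3.1 and Corollary 3.1, the sketch below assembles the linear system directly from (3.4), with rows indexed by n = 0, ..., m and the unknowns ordered (κ_0, ..., κ_m); this is the system (3.5) up to column ordering. The moments are assumed to be given, for instance from a CME solution, or as empirical moments of a dataset, in which case the result coincides with an m-th degree polynomial regression (Remark 3.3). The quadratic example at the bottom is a made-up stand-in, not one of the models in this paper.

```python
import numpy as np

def polynomial_coefficients(m, moments_x, moments_yx):
    """Solve (3.5): kappa such that E[Y X^n] = sum_i kappa_i E[X^{i+n}], n = 0..m.

    moments_x[k]  = E[X^k]   for k = 0, ..., 2m
    moments_yx[n] = E[Y X^n] for n = 0, ..., m
    Returns kappa = (kappa_0, ..., kappa_m).
    """
    Phi = np.array([[moments_x[i + n] for i in range(m + 1)] for n in range(m + 1)])
    mu = np.array(moments_yx[: m + 1])
    return np.linalg.solve(Phi, mu)

# Stand-in example with a known quadratic conditional expectation E[Y_x] = 3 + 0.5 x + 0.25 x^2.
x = np.arange(1, 7)
px = np.full(6, 1.0 / 6.0)
kappa_true = np.array([3.0, 0.5, 0.25])
m = 2
moments_x = [(x ** k * px).sum() for k in range(2 * m + 1)]
cond_exp = kappa_true[0] + kappa_true[1] * x + kappa_true[2] * x ** 2
moments_yx = [(cond_exp * x ** n * px).sum() for n in range(m + 1)]   # E[Y X^n] via the tower property
print(polynomial_coefficients(m, moments_x, moments_yx))              # ~ [3.0, 0.5, 0.25]
```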

Example 3.1 So far, the hypotheses in the statements above have begun with "If the conditional expectation has the form..." A natural question that arises is whether Kurtz processes can even satisfy this part of the hypotheses. To give some intuition into this question, let us consider the following three simple mRNA translation models:

(Model 1)  X ↔ ∅,  Y ↔ ∅,
(Model 2)  X ↔ ∅,  X → Y + X,  Y → ∅,
(Model 3)  X ↔ ∅,  2X → Y + 2X,  Y → ∅.

All three models contain two species, X and Y; the specific model parameters are given in "Appendix B". In the first model, both species are produced via a constant propensity function and decay via a linear propensity function, with their respective rate constants. The two species do not interact with each other. The second model is a simple mRNA translation model. In this model, X is produced and decays as in Model 1. Species Y, on the other hand, is produced by species X via an autocatalytic reaction. That is, X produces Y and preserves its population in the process. The third model extends the second model, with the autocatalytic reaction needing two copies of X to perform the reaction.
We start all the models with an initial population of (0, 0) and compute to a finite
time horizon using the Optimal Finite State Projection method. The left column plots
in Fig. 1 show the joint distribution of these three models, respectively. In the right
column of Fig. 1, we plot their conditional expectations. We can see that these models
exhibit algebraic conditional expectation forms.
The proof that Model 2 has linear conditional expectation structure is given in
“Appendix C”. Also, further analysis showing that Model 3 exhibits quadratic condi-
tional expectation structure at different time points is given in “Appendix D”.

Fig. 1 a–c (Left) Contour plots describing the joint probability distributions with constant, linear, and quadratic conditional expectation forms, respectively. (Right) The conditional expectations of the distributions in the left figure (see "Appendix B" for system parameters)

In summary, Theorem 3.1 states that if the conditional expectation has a polynomial form, then the coefficients of this polynomial form can be computed using the classical moments. In essence, we have reduced the information of the conditional moments to the classical moments. Furthermore, Corollary 3.1 shows that if the coefficient of the lead term of the polynomial is small, then natural moment closures arise. In Sect. 5, we will develop some simple numerical schemes (as a proof of concept) which exploit these results. Before starting with applications, we need to explore and study the switching behaviour, which is inherent in all GRNs, more rigorously.

4 Switching behaviour

We define a switch to be a random variable which has only two states in its state space.
Naturally, we interpret the two states as “on” and “off”. The biological dynamics
of a gene are accurately modelled as a switch, hence, all GRNs inherently contain
switches. When genes are activated (transcription), they start producing some particles
(RNAs/proteins), depending on which tier the GRN is modelling. It was shown that
this switching behaviour of the genes is inherently stochastic and furthermore, induces
multi-modal behaviour (Grima et al. 2012; Thomas et al. 2014). In this section, we
want to unravel the consequences of the switching behaviour using the conditional
expectation forms derived earlier. We will construct the simplest possible GRN, a
network containing one switch and one particle type, with non-zero covariance. We
then derive the forms of the conditional expectation of particles from the perspective of
the switch and vice versa. We also show that the derived results extend to all GRNs, as
a general GRN can be decomposed into blocks of this simple switch-particles network.

Definition 4.1 We define S to be a random variable over the state space Ω_S := {0, 1}. Intuitively we refer to S as a "switch" and write 0 as off and 1 as on. We define Y to be a random variable over the state space Ω_Y ⊂ ℕ_0. Since samples of Y are non-negative integers, we refer to Y as "particles". We couple these two random variables by imposing that cov(S, Y) ≠ 0. Furthermore, Assumptions 2.1 hold for the joint distribution of S and Y.

4.1 A switch’s perspective

Lemma 4.1 Let S be a switch and Y be particles as in Definition 4.1. Then the following statements hold:

1. E[S] = p(S = on),
2. cov(S, S) = E[S] − E[S]^2,
3. E[SY] = E[Y_on] p(S = on),
4. cov(Y, S) = (E[Y_on] − E[Y]) p(S = on),
5. E[S^2 Y] = E[SY].

Proof We will prove the statements in order.

Statement 1

$$ \mathbb{E}[S] := 0 \cdot p(S = \text{off}) + 1 \cdot p(S = \text{on}) = p(S = \text{on}). $$

Statement 2

$$ \begin{aligned} \operatorname{cov}(S, S) &:= (0 - \mathbb{E}[S])^2\, p(S = \text{off}) + (1 - \mathbb{E}[S])^2\, p(S = \text{on}) \\ &= \mathbb{E}[S]^2 (1 - p(S = \text{on})) + (1 - 2\,\mathbb{E}[S] + \mathbb{E}[S]^2)\, p(S = \text{on}) \\ &= \mathbb{E}[S]^2 - \mathbb{E}[S]^2 p(S = \text{on}) + p(S = \text{on}) - 2\,\mathbb{E}[S]\, p(S = \text{on}) + \mathbb{E}[S]^2 p(S = \text{on}) \\ &= \mathbb{E}[S]^2 - \mathbb{E}[S]^3 + \mathbb{E}[S] - 2\,\mathbb{E}[S]^2 + \mathbb{E}[S]^3 \\ &= \mathbb{E}[S] - \mathbb{E}[S]^2. \end{aligned} $$

Statement 3

$$ \begin{aligned} \mathbb{E}[SY] &:= \sum_{y \in \Omega_Y} \big( 0 \cdot y\, p(Y = y, S = \text{off}) + 1 \cdot y\, p(Y = y, S = \text{on}) \big) \\ &= \sum_{y \in \Omega_Y} y\, p(Y = y, S = \text{on}) \\ &= \sum_{y \in \Omega_Y} y\, p(Y = y \mid S = \text{on})\, p(S = \text{on}) \\ &= \mathbb{E}[Y_{\text{on}}]\, p(S = \text{on}). \end{aligned} $$

Statement 4

$$ \operatorname{cov}(Y, S) := \mathbb{E}[SY] - \mathbb{E}[S]\, \mathbb{E}[Y], $$

applying statements 3 and 1 reduces the right-hand side terms to:

$$ = \mathbb{E}[Y_{\text{on}}]\, p(S = \text{on}) - \mathbb{E}[Y]\, p(S = \text{on}) = (\mathbb{E}[Y_{\text{on}}] - \mathbb{E}[Y])\, p(S = \text{on}). $$

Statement 5 Since S = S^2, the value of E[S^2 Y] is the same as E[SY].


Theorem 4.1 Let S be a switch and Y be particles as in Definition 4.1, then

$$ \mathbb{E}[Y_{\text{off}}] = -\frac{\operatorname{cov}(Y, S)}{\operatorname{cov}(S, S)}\, \mathbb{E}[S] + \mathbb{E}[Y], \qquad \mathbb{E}[Y_{\text{on}}] = \frac{\operatorname{cov}(Y, S)}{\operatorname{cov}(S, S)}\, \big(1 - \mathbb{E}[S]\big) + \mathbb{E}[Y]. $$

Proof We first prove the conditional expectation of the particles given the switch is off. We begin by rearranging the definition E[Y] := E[Y_on] p(S = on) + E[Y_off] p(S = off), and then reduce using the statements of Lemma 4.1:

$$ \mathbb{E}[Y_{\text{off}}] := \frac{-\mathbb{E}[Y_{\text{on}}]\, p(S = \text{on}) + \mathbb{E}[Y]}{p(S = \text{off})}. $$

Rearranging Lemma 4.1-4 for E[Y_on] and substituting it in gives:

$$ \begin{aligned} &= \frac{-\big(\operatorname{cov}(Y, S)/p(S = \text{on}) + \mathbb{E}[Y]\big)\, p(S = \text{on}) + \mathbb{E}[Y]}{1 - \mathbb{E}[S]} \\ &= \frac{-\operatorname{cov}(Y, S) - \mathbb{E}[Y]\, p(S = \text{on}) + \mathbb{E}[Y]}{1 - \mathbb{E}[S]} \\ &= \frac{-\operatorname{cov}(Y, S)}{1 - \mathbb{E}[S]} + \frac{(1 - \mathbb{E}[S])\, \mathbb{E}[Y]}{1 - \mathbb{E}[S]}, \end{aligned} $$

multiplying the top and bottom by E[S] and applying Lemma 4.1-2 gives us

$$ = -\frac{\operatorname{cov}(Y, S)}{\operatorname{cov}(S, S)}\, \mathbb{E}[S] + \mathbb{E}[Y]. $$

Now we prove the conditional expectation of the particles given the switch is on. We begin by rearranging Lemma 4.1-4 and reducing:

$$ \mathbb{E}[Y_{\text{on}}] = \frac{\operatorname{cov}(Y, S) + \mathbb{E}[S]\, \mathbb{E}[Y]}{\mathbb{E}[S]}, $$

multiplying top and bottom by 1 − E[S] gives us

$$ = \frac{\operatorname{cov}(Y, S)\, (1 - \mathbb{E}[S])}{\mathbb{E}[S]\, (1 - \mathbb{E}[S])} + \mathbb{E}[Y], $$

then substituting Lemma 4.1-2 into the denominator gives us

$$ = \frac{\operatorname{cov}(Y, S)}{\operatorname{cov}(S, S)}\, (1 - \mathbb{E}[S]) + \mathbb{E}[Y]. $$

If we compare the components of the conditional expectation of the switch to the


coefficients of the linear conditional expectation form, (3.2), we see that the conditional
expectation of particles with respect to a switch has a linear form.

Theorem 4.1 states that the gradient of the line which intersects the points
(off, E[Yoff ]) and (on, E[Yon ]) is given by cov(Y , S)/cov(S, S). By extending this
further with the observations on linear conditional expectation forms in Lemma 3.1,
we can show that the line which goes through (off, E[Yoff ]) and (on, E[Yon ]) also goes
through (E[S], E[Y ]) (see Fig. 2a). Hence, if any three out of the four terms: E[S],
E[Y ], E[Yoff ], and E[Yon ] are known, then the fourth term—which is unknown—can
be reconstructed using Theorem 4.1.
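The reconstruction just described is a short computation. The sketch below takes E[S], E[Y] and cov(Y, S) as given, uses Lemma 4.1-2 for cov(S, S), and recovers E[Y_off] and E[Y_on] via Theorem 4.1; the numbers are placeholders, not values from any model in this paper.

```python
def switch_conditional_expectations(ES, EY, cov_YS):
    """E[Y_off] and E[Y_on] from Theorem 4.1; cov(S, S) = E[S] - E[S]^2 by Lemma 4.1-2."""
    cov_SS = ES - ES ** 2
    slope = cov_YS / cov_SS
    EY_off = -slope * ES + EY
    EY_on = slope * (1.0 - ES) + EY
    return EY_off, EY_on

# Placeholder numbers, purely for illustration.
ES, EY, cov_YS = 0.3, 40.0, 6.3
EY_off, EY_on = switch_conditional_expectations(ES, EY, cov_YS)
# Consistency checks: the line through (0, E[Y_off]) and (1, E[Y_on]) passes through (E[S], E[Y]),
# and E[Y] = E[Y_on] p(S=on) + E[Y_off] p(S=off).
assert abs((EY_off + (EY_on - EY_off) * ES) - EY) < 1e-9
assert abs(EY_on * ES + EY_off * (1 - ES) - EY) < 1e-9
```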

Remark 4.1 We have only given results for a single switch case. Nevertheless, it was shown by Ruess (2015) that the results in Lemma 4.1 can be extended to multiple switch cases. This is due to the fact that a multiple switch problem can be reformulated into coupled single switch problems by increasing the number of dimensions (see Fig. 2b).

Fig. 2 a Cartoon illustrating the consequence of Theorem 4.1. b Cartoon illustrating the decomposition of a coupled two switch problem into an uncoupled four switch problem

4.2 A particle’s perspective

Theorem 4.2 Let S be a switch and Y be particles as in Definition 4.1. Then for all y ∈ Ω_Y,

$$ \mathbb{E}[S_y] = \frac{p(Y = y \mid S = \text{on})\, \mathbb{E}[S]}{p(Y = y \mid S = \text{on})\, \mathbb{E}[S] + p(Y = y \mid S = \text{off})\, (1 - \mathbb{E}[S])}. $$

Furthermore, let y_* ∈ Ω_Y, then

$$ p(Y = y_* \mid S = \text{on}) = p(Y = y_* \mid S = \text{off}) \iff \mathbb{E}[S_{y_*}] = \mathbb{E}[S]. $$

Proof We can prove the first statement using Bayes' theorem.

$$ \mathbb{E}[S_y] := 0 \cdot p(S = \text{off} \mid Y = y) + 1 \cdot p(S = \text{on} \mid Y = y) = \frac{p(S = \text{on}, Y = y)}{p(Y = y)}, $$

applying Bayes' Theorem to the numerator and denominator gives us

$$ = \frac{p(Y = y \mid S = \text{on})\, p(S = \text{on})}{p(Y = y \mid S = \text{on})\, p(S = \text{on}) + p(Y = y \mid S = \text{off})\, p(S = \text{off})}, $$

then simply substituting p(S = off) = 1 − E[S] gives us

$$ = \frac{p(Y = y \mid S = \text{on})\, \mathbb{E}[S]}{p(Y = y \mid S = \text{on})\, \mathbb{E}[S] + p(Y = y \mid S = \text{off})\, (1 - \mathbb{E}[S])}. $$

The conditional expectation form of the switch conditioned on the particles seems to take a hyperbolic form. We now prove the second statement.

Case (→) Let p(Y = y_* | S = on) = p(Y = y_* | S = off). We begin with the result of the previous statement and then reduce:

$$ \begin{aligned} \mathbb{E}[S_{y_*}] &= \frac{p(Y = y_* \mid S = \text{on})\, \mathbb{E}[S]}{p(Y = y_* \mid S = \text{on})\, \mathbb{E}[S] + p(Y = y_* \mid S = \text{off})\, (1 - \mathbb{E}[S])} \\ &= \frac{p(Y = y_* \mid S = \text{on})\, \mathbb{E}[S]}{p(Y = y_* \mid S = \text{on})\, \mathbb{E}[S] + p(Y = y_* \mid S = \text{on})\, (1 - \mathbb{E}[S])} \\ &= \frac{p(Y = y_* \mid S = \text{on})\, \mathbb{E}[S]}{p(Y = y_* \mid S = \text{on})} \\ &= \mathbb{E}[S]. \end{aligned} $$

Case (←) Let E[S_{y_*}] = E[S].

$$ \mathbb{E}[S_{y_*}] = \frac{p(Y = y_* \mid S = \text{on})\, \mathbb{E}[S]}{p(Y = y_* \mid S = \text{on})\, \mathbb{E}[S] + p(Y = y_* \mid S = \text{off})\, (1 - \mathbb{E}[S])}, $$

dividing both sides by E[S] gives

$$ 1 = \frac{p(Y = y_* \mid S = \text{on})}{p(Y = y_* \mid S = \text{on})\, \mathbb{E}[S] + p(Y = y_* \mid S = \text{off})\, (1 - \mathbb{E}[S])}, $$

flipping the fraction and then multiplying top and bottom by the denominator gives

$$ p(Y = y_* \mid S = \text{on}) = p(Y = y_* \mid S = \text{on})\, \mathbb{E}[S] + p(Y = y_* \mid S = \text{off})\, (1 - \mathbb{E}[S]). $$

Finally, collating the like terms reduces the expression to

$$ 0 = (1 - \mathbb{E}[S])\, \big( p(Y = y_* \mid S = \text{off}) - p(Y = y_* \mid S = \text{on}) \big). $$

Since E[S] cannot be one, we can conclude that p(Y = y_* | S = off) = p(Y = y_* | S = on). This completes the proof in both directions.

Corollary 4.1 If the conditional probabilities p(Y = y | S = on) and p(Y = y | S = off) are Poisson distributed, then

$$ \mathbb{E}[S_y] = \frac{e^{-\lambda_1}\, \lambda_1^{y}\, \mathbb{E}[S]}{e^{-\lambda_1}\, \lambda_1^{y}\, \mathbb{E}[S] + e^{-\lambda_2}\, \lambda_2^{y}\, (1 - \mathbb{E}[S])}, $$

and

$$ y_* = \frac{\lambda_1 - \lambda_2}{\log(\lambda_1) - \log(\lambda_2)}, $$

where λ_1 := E[Y_on] and λ_2 := E[Y_off].

Fig. 3 a, b Cartoons illustrating the consequence of Theorem 4.2

Idea of Proof The statement is proved by substituting the corresponding formulas of


the Poisson distribution into Theorem 4.2 and then reducing.

From Theorem 4.2, it is clear that the conditional expectation of the switch with
respect to particles takes a hyperbolic form. In the case where the conditional probabil-
ities are Poisson distributed, the conditional expectation takes the shape of a sigmoid
function (see Fig. 3). Furthermore, like in the case of the conditional expectation of
the particles with respect to the switch, if three out of four terms: p(Y = y | S = on),
p(Y = y | S = off), E[S y ], and E[S] are known, then the fourth term can be recon-
structed using Theorem 4.2.
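Under the Poisson assumption of Corollary 4.1, the particle-side reconstruction is equally short. The sketch below evaluates E[S_y] over a range of particle counts and the crossing point y_* at which E[S_{y_*}] = E[S]; the parameter values are placeholders, not values from any model in this paper.

```python
import numpy as np

def switch_given_particles(y, ES, lam_on, lam_off):
    """E[S_y] from Theorem 4.2 with Poisson conditionals p(Y=y | S=on/off) (Corollary 4.1)."""
    num = np.exp(-lam_on) * lam_on ** y * ES
    den = num + np.exp(-lam_off) * lam_off ** y * (1.0 - ES)
    return num / den

ES, lam_on, lam_off = 0.4, 30.0, 10.0          # placeholders for E[S], E[Y_on], E[Y_off]
y = np.arange(0, 60)
ESy = switch_given_particles(y, ES, lam_on, lam_off)   # sigmoid-shaped curve in y
y_star = (lam_on - lam_off) / (np.log(lam_on) - np.log(lam_off))
# At y_star the two Poisson weights coincide, so E[S_{y_star}] = E[S].
print(y_star, switch_given_particles(y_star, ES, lam_on, lam_off))
```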
In this section, we have proven some fundamental results regarding the conditional
expectation forms from the perspective of the switch and that of the particles. We have
shown that there is an inherent algebraic structure in a switch-particle coupled system.
To gain further intuition into the theorems given in this section, we now introduce some
simple numerical schemes which use these theorems. We will apply those schemes to
some toy models to see how the conditional expectation forms behave.

5 Application

In the previous two sections we pursued a theoretical exploration in which we derived


the relationship between classical moments and conditional expectations. From our
theoretical results, a natural question arises: “Can the insights from Sects. 3 and 4
be used to help improve existing numerical methods used for simulating GRNs?”
Answering such a question rigorously is beyond the scope of this work. However, as
a preliminary step, we propose a new numerical solver which we call ACE-Ansatz, and a probability distribution reconstruction method based purely on moments, which
we refer to as ACE-Reconstruction. We will introduce these methods and apply them
to some simple examples. It is important to state that the aim is not to perform a
comparative study, but rather to study examples which use the theorems derived in
this paper, and through these examples gain further intuition into the structures between
dimensions in GRNs.

5.1 ACE-Ansatz

We propose a new hybrid scheme which exploits the property that polynomial con-
ditional expectation forms can be derived from classical moments. In essence, our
new method is analogous to the Method of Conditional Moments (MCM), with an
alternative method for deriving the conditional expectations. That is, in the MCM,
the equations of motion of the conditional expectations are derived from the CME
(Sect. 2.2), whereas in our new method, we will propose a polynomial ansatz for
the derivation of the conditional expectations. Then, utilising Theorem 3.1, we will
simply solve for the corresponding classical moments. We refer to this method as the
Algebraic Conditional Expectation Ansatz method (ACE-Ansatz), which we now demonstrate on the well-studied simple gene switch model (Grima et al. 2012).

5.1.1 A simple gene switch model

The model describes a system which consists of a gene interacting with a well mixed pool of mRNA and proteins. At any point in time only the following three variables of the system can be observed: the state of the gene, the population counts of mRNA, and the population counts of proteins. We denote this as the process X(t) := (G(t), M(t), A(t)), where
• G(t) has a binary state space {on, off}, describing the state of the gene at time t,
• M(t) has a positive integer state space, describing the counts of mRNA at time t,
• A(t) has a positive integer state space, describing the counts of proteins at time t.
The system can undergo seven reactions which alter its state (see Table 1). Verbosely, reactions one and two describe the background basal switching of the gene in the system. Reaction three describes the transcription process, where the gene in the "on" state starts producing mRNA. Reaction five describes the translation process, where the mRNA is translated to produce a protein. Reactions four and six describe the degradation of the mRNA and proteins, respectively. Reaction seven describes the activation of the promoter region of the gene by the protein.

5.1.2 Linear ACE-Ansatz approximation

We will use Theorem 3.1 to perform dimension reduction on the simple gene switch
model. In previous literature, where the simple gene switch model was introduced for
dimension reduction, the authors demonstrated that the marginal distribution of the
gene and the proteins could be well approximated by numerous dimension reduction

Table 1 Simple gene switch model reactions, propensities and stoichiometries

#   Reaction                      Coefficient       Stoichiometry   Description
1   off → on                      τ_on = 0.1        (1, 0, 0)       Basal activation
2   on → off                      τ_off = 0.05      (−1, 0, 0)      Basal inactivation
3   on → on + mRNA                k_1 = 10.0        (0, 1, 0)       Transcription
4   mRNA → ∅                      γ_1 = 1.0         (0, −1, 0)      mRNA degradation
5   mRNA → mRNA + protein         k_2 = 4.0         (0, 0, 1)       Translation
6   protein → ∅                   γ_2 = 0.5         (0, 0, −1)      Protein degradation
7   off + protein → on            τ̂_on = 0.015      (1, 0, −1)      Promoter activation

The system is initialised at G(0) = off, M(0) = 8, and A(0) = 80.

schemes (Pájaro et al. 2017; Thomas et al. 2014; Grima et al. 2012; Hasenauer et al.
2013). We will keep the same setting as in the previous literature to help contrast
the application of our theorems. We will first derive the derivative of the marginal
distribution of genes and proteins. Then, as in (2.10), we will highlight the conditional
expectation terms needed to solve for the marginal distribution. We then apply a
linear form ansatz to the conditional expectations. We know from Sect. 3.1, that the
coefficients of the linear form are given by the first and second moments. Hence, we
will derive the equations for the moments up to degree two and then close the higher
order terms.
We begin by deriving the derivative of the marginal distribution of G and A with respect to time. The steps from the CME to the marginal distribution are given in "Appendix E". For a fixed a ∈ Ω_A, the following two equations describe the marginal distributions for the states off and on, respectively:

$$ \begin{aligned} \frac{d\, p(G=\text{off}, A=a; t)}{dt} ={}& \tau_{\text{off}}\, p(G=\text{on}, A=a; t) \\ &+ k_2\, \underbrace{\mathbb{E}[M_{\text{off},a-1}(t)]}_{(\ast)}\, p(G=\text{off}, A=a-1; t) \\ &+ \gamma_2\, (a+1)\, p(G=\text{off}, A=a+1; t) \\ &- \Big( \tau_{\text{on}} + k_2\, \underbrace{\mathbb{E}[M_{\text{off},a}(t)]}_{(\ast)} + (\gamma_2 + \hat{\tau}_{\text{on}})\, a \Big)\, p(G=\text{off}, A=a; t), \end{aligned} \qquad (5.1) $$

$$ \begin{aligned} \frac{d\, p(G=\text{on}, A=a; t)}{dt} ={}& \tau_{\text{on}}\, p(G=\text{off}, A=a; t) \\ &+ k_2\, \underbrace{\mathbb{E}[M_{\text{on},a-1}(t)]}_{(\ast)}\, p(G=\text{on}, A=a-1; t) \\ &+ \gamma_2\, (a+1)\, p(G=\text{on}, A=a+1; t) \\ &+ \hat{\tau}_{\text{on}}\, (a+1)\, p(G=\text{off}, A=a+1; t) \\ &- \Big( \tau_{\text{off}} + k_2\, \underbrace{\mathbb{E}[M_{\text{on},a}(t)]}_{(\ast)} + \gamma_2\, a \Big)\, p(G=\text{on}, A=a; t). \end{aligned} \qquad (5.2) $$

We see in the equations above that to solve for the marginal distribution, the terms marked by (∗), that is, the expectations of the mRNA counts conditioned on the protein count a and the gene state on or off, need to be estimated. We approximate the conditional expectation by a linear conditional expectation form from Lemma 3.1. That is, for g ∈ {off : 0, on : 1} and a ∈ ℤ_+, with

$$ \alpha := \underbrace{\big[\, \operatorname{cov}(G(t), M(t)) \;\; \operatorname{cov}(A(t), M(t)) \,\big]}_{(\ast\ast)} \begin{pmatrix} \operatorname{cov}(G(t), G(t)) & \operatorname{cov}(G(t), A(t)) \\ \operatorname{cov}(G(t), A(t)) & \operatorname{cov}(A(t), A(t)) \end{pmatrix}^{-1}, $$

we approximate

$$ \mathbb{E}[M_{g,a}(t)] \approx \alpha \left( \begin{pmatrix} g \\ a \end{pmatrix} - \begin{pmatrix} \mathbb{E}[G(t)] \\ \mathbb{E}[A(t)] \end{pmatrix} \right) + \underbrace{\mathbb{E}[M(t)]}_{(\ast\ast)}. \qquad (5.3) $$

The terms cov(G(t), A(t)), cov(G(t), G(t)), cov(A(t), A(t)), E[A(t)], and E[G(t)] can be computed using the marginal distribution p(G = ·, A = ·; t). We note that after substituting the conditional expectation by a linear form in (5.3), all the terms we solve for in this section become approximations. However, for brevity we keep the same notation. A formal notational derivation is given in "Appendix F". Furthermore, the joint moments, for example E[G(t) M(t)], will be abbreviated to E[GM(t)] to match the notation in Sect. 3. We now estimate the terms marked by (∗∗); their time derivatives are derived using Lemma 2.1 from Engblom (2006) or Equation 11 in Smadbeck and Kaznessis (2012):

$$ \frac{d\, \mathbb{E}[M(t)]}{dt} = k_1\, \mathbb{E}[G(t)] - \gamma_1\, \mathbb{E}[M(t)]. \qquad (5.4) $$

$$ \frac{d\, \mathbb{E}[GM(t)]}{dt} = \tau_{\text{on}} \big( \mathbb{E}[M(t)] - \mathbb{E}[GM(t)] \big) - \tau_{\text{off}}\, \mathbb{E}[GM(t)] + k_1\, \mathbb{E}[G(t)] - \gamma_1\, \mathbb{E}[GM(t)] + \hat{\tau}_{\text{on}} \Big( \mathbb{E}[MA(t)] - \underbrace{\mathbb{E}[GMA(t)]}_{(\ast\ast\ast)} \Big). \qquad (5.5) $$

$$ \frac{d\, \mathbb{E}[MA(t)]}{dt} = k_1\, \mathbb{E}[GA(t)] - (\gamma_1 + \gamma_2)\, \mathbb{E}[MA(t)] + k_2\, \mathbb{E}[M^2(t)] - \hat{\tau}_{\text{on}} \Big( \mathbb{E}[MA(t)] - \underbrace{\mathbb{E}[GMA(t)]}_{(\ast\ast\ast)} \Big). \qquad (5.6) $$

$$ \frac{d\, \mathbb{E}[M^2(t)]}{dt} = k_1 \big( 2\, \mathbb{E}[GM(t)] + \mathbb{E}[G(t)] \big) + \gamma_1 \big( -2\, \mathbb{E}[M^2(t)] + \mathbb{E}[M(t)] \big). \qquad (5.7) $$

To close the equations above, an estimate for the term E[GMA(t)], (∗∗∗), is needed. Hence, we construct a closure using the equations already known:

$$ \begin{aligned} \mathbb{E}[GMA(t)] &= \sum_{a, m \in \mathbb{Z}_+} a\, m\, p(G=\text{on}, M=m, A=a; t) \\ &= \sum_{a, m \in \mathbb{Z}_+} a\, m\, p(M=m \mid G=\text{on}, A=a; t)\, p(G=\text{on}, A=a; t) \\ &= \sum_{a \in \mathbb{Z}_+} \Big( \sum_{m \in \mathbb{Z}_+} m\, p(M=m \mid G=\text{on}, A=a; t) \Big)\, a\, p(G=\text{on}, A=a; t) \\ &= \sum_{a \in \mathbb{Z}_+} \underbrace{\mathbb{E}[M_{\text{on},a}(t)]}_{(5.3)}\, a\, p(G=\text{on}, A=a; t). \end{aligned} \qquad (5.8) $$

Equations (5.1)–(5.8) form a closed system of equations. Since the ansatz was
linear, we refer to the approximation found by solving these equations as a Linear
ACE-Ansatz approximation. Note that approximations with a higher degree ansatz can
also be constructed by simply increasing the number of moment equations. We will
compare the quality of the Linear ACE-Ansatz approximation to the approximation
constructed by the MCM method described in Hasenauer et al. (2013). That is, the
MCM approximation in this example only approximates the conditional expectation
and all higher order terms are set to zero. It must be noted that having a higher order
MCM approximation would naturally increase the accuracy of the approximation.
However, higher order moments imply more equations to solve.
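For concreteness, the following sketch shows the two algebraic ingredients of a linear ACE-Ansatz time step: forming α from the tracked first and second moments as in (5.3), and closing E[GMA(t)] via (5.8). The marginal p(G = ·, A = ·; t) and the moments E[M(t)], E[GM(t)], E[MA(t)] are assumed to be available from the current state of the solver; the arrays and numbers below are placeholders, not values from the model above.

```python
import numpy as np

def ace_linear_step(p_ga, EM, EGM, EMA):
    """alpha of (5.3) and the closure E[GMA] of (5.8) from the marginal p(G = g, A = a; t).

    p_ga[g, a] = p(G = g, A = a; t) with g in {0: off, 1: on}; EM, EGM, EMA are the
    tracked moments E[M(t)], E[GM(t)], E[MA(t)].
    """
    a_vals = np.arange(p_ga.shape[1])
    EG = p_ga[1, :].sum()
    EA = (a_vals * p_ga.sum(axis=0)).sum()
    EGA = (a_vals * p_ga[1, :]).sum()
    EA2 = (a_vals ** 2 * p_ga.sum(axis=0)).sum()
    cov_X = np.array([[EG - EG ** 2, EGA - EG * EA],
                      [EGA - EG * EA, EA2 - EA ** 2]])
    cov_MX = np.array([EGM - EG * EM, EMA - EA * EM])
    alpha = cov_MX @ np.linalg.inv(cov_X)

    def EM_ga(g, a):               # linear ansatz (5.3) for E[M_{g,a}(t)]
        return alpha @ (np.array([g, a]) - np.array([EG, EA])) + EM

    EGMA = sum(EM_ga(1, a) * a * p_ga[1, a] for a in a_vals)   # closure (5.8)
    return alpha, EM_ga, EGMA

# Placeholder marginal and moments, purely for illustration.
p_ga = np.random.default_rng(1).random((2, 120)); p_ga /= p_ga.sum()
alpha, EM_ga, EGMA = ace_linear_step(p_ga, EM=9.0, EGM=8.5, EMA=800.0)
```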
The joint distribution of the simple switch model can be solved using the Optimal
Finite State Projection method (OFSP). 3 We will use the OFSP approximation as a
reference distribution for comparing both approximation methods. We see in Fig. 4b–d that the ACE-Ansatz's marginal distribution approximation is fairly accurate in capturing the shape and bimodality of the reference distribution. The MCM method captures the bimodality but does not capture the shape of the marginal distribution in the on state. The linear ACE-Ansatz only used five equations to estimate the conditional expectations, whereas in the MCM, 277 equations were needed (see Table 2). Even though using more equations did not provide gains in the quality of the marginal distribution approximation, we see in Fig. 4a, c that the MCM approximates the expectation of mRNA and genes better than the ACE-Ansatz. Nevertheless, utilising the structure of a polynomial ansatz, we were able to construct an approximation which was nearly as accurate as that of the MCM, using only five equations.

3 The PyME implementation of the OFSP method and the MCM were used in this work (Sunkara 2017;
Sunkara and Hegland 2010). It must be noted that the MCM module in PyME is not optimised for speed.
All code was run on an Intel i7 2.5 GHz with 16GB of RAM.

Fig. 4 Comparison of different methods' approximations of the simple gene switch model (Sect. 5.1.1). a–d Dashed green line is representative of the ACE-Ansatz approximation, solid blue line is representative of the MCM approximation, and the crossed red line is representative of the OFSP reference solution. a The expectation of the gene being in the on state at time t. b The probability distribution of being in an on state at time t = 10. c The expectation of the population of mRNA in the system at time t. d The probability distribution of being in an off state at time t = 10 (color figure online)

Table 2 Performance comparison between the ACE-Ansatz and the MCM at t = 10. The number of equations needed to build the approximation (stochastic, deterministic)

Method   Num. equations   ℓ1 error in dist.   ℓ1 error in moments (G, M, A)   Comp. time (s)
ACE      344–5            0.0396 + 10^−6      (0.014, 0.134, 1.135)           4
MCM      277–277          0.526 + 10^−6       (0.005, 0.043, 0.267)           156
OFSP     39876–0          10^−6               –                               62

The error in distribution is the ℓ1 error between the respective approximation and the OFSP solution. The error in moments is the ℓ1 error between the respective moment approximation and the OFSP solution's moments. The entries in every line show errors in the approximation of the species (G, M, A), respectively

5.1.3 A two gene toggle switch

We now consider a system containing two genes which repress each other's expression through their corresponding proteins. The two genes are denoted as G_0 and G_1, and the proteins they express are labeled P and M, respectively. The stochastic process which evolves the population of the system over time is denoted by X(t) = (G_0(t), G_1(t), P(t), M(t)), where
• G_0(t) has a binary state space {G_0^on, G_0^off}, describing the gene's repression at time t,
• G_1(t) has a binary state space {G_1^on, G_1^off}, describing the gene's repression at time t,
• P(t) has a positive integer state space, describing the counts of protein made by gene G_0 at time t,
• M(t) has a positive integer state space, describing the counts of protein made by gene G_1 at time t.
The system undergoes ten reactions which evolve the system through time (see Table 3). Verbosely, protein P binds to the promoter region of G_1 and decreases its expression. Similarly, protein M binds to the promoter region of G_0 and decreases its expression. Protein P is made by G_0 and protein M is made by G_1. That is, both genes act as the respective other gene's antagonist. We know from Gardner et al. (2000) and Cao and Grima (2018) that this system exhibits bimodality in the count of protein P. Using the linear ACE-Ansatz and the MCM, with G_0 and P as stochastic species, we approximate the marginal distribution of P.

Table 3 Two gene toggle switch model reactions, propensities and stoichiometries

#    Reaction                   Coefficient     Stoichiometry     Description
1    G_0^off + M → G_0^on      σ_1 = 0.004     (1, 0, 0, −1)     Protein repressing the promoter region
2    G_0^on → G_0^off          σ_2 = 0.25      (−1, 0, 0, 0)     Protein dissociating from promoter region
3    G_0^off → P               ρ_1 = 60        (0, 0, 1, 0)      Transcription–Translation
4    G_0^on → P                ρ_2 = 25        (0, 0, 1, 0)      Transcription–Translation
5    P → ∅                     k = 1.0         (0, 0, −1, 0)     Protein degradation
6    G_1^off + P → G_1^on      σ_3 = 0.004     (0, 1, −1, 0)     Protein repressing the promoter region
7    G_1^on → G_1^off          σ_4 = 0.25      (0, −1, 0, 0)     Protein dissociating from promoter region
8    G_1^off → M               ρ_3 = 45        (0, 0, 0, 1)      Transcription–Translation
9    G_1^on → M                ρ_4 = 30        (0, 0, 0, 1)      Transcription–Translation
10   M → ∅                     k = 1.0         (0, 0, 0, −1)     Protein degradation

The system is initialised at G_0(0) = G_0^off, G_1(0) = G_1^off, P(0) = 0, and M(0) = 0.

Fig. 5 Comparison of different methods' approximations of the two gene toggle switch model (Sect. 5.1.3). a–d Dashed green line is representative of the ACE-Ansatz approximation, the solid blue line is representative of the MCM approximation, and the crossed red line is representative of the OFSP reference solution. a The marginal distribution of protein P at time t = 5. b The expectation of the gene G_1 at time t. c The expectation of the population of M in the system at time t. d The Hellinger distance between the marginal distributions, p(G_0 = ·, P = ·; t = ·), of the ACE-Ansatz versus the OFSP approximation, and the MCM versus the OFSP, over time t (color figure online)

The equations of motion corresponding to the linear ACE-Ansatz are given in "Appendix G". The configuration of the MCM
method is as mentioned in Sect. 5.1.
Upon visual inspection of Fig. 5a, we see that both the linear ACE-Ansatz and the MCM reproduce the bimodality of P. Similarly, the expectations of G_1 and M over time are well approximated by both methods (Fig. 5b, c). To further understand the difference between the two methods, we consider calculating the Hellinger distance (Fig. 5d) of the two approximations to the OFSP approximation. The Hellinger distance is sensitive to differences in the probability values away from one; this distance measure will help us highlight differences away from the peak of the distributions. We observe that the linear ACE-Ansatz approximation is more accurate with respect to the Hellinger distance than the MCM approximation. Also, we find that the MCM is far smoother than the ACE-Ansatz, which could be due to the MCM using many equations to capture the transient phase more accurately.
123
V. Sunkara

Table 4 Performance comparison between the ACE-Ansatz and the MCM at t = 5

Method   Num. equations   ℓ1 error in dist.   ℓ1 error in moments (G_0, G_1, P, M)   Comp. time (s)
ACE      224–8            0.0008 + 10^−6      (0.0004, 0.0003, 0.008, 0.003)         2
MCM      218–436          0.0027 + 10^−6      (0.0013, 0.0007, 0.033, 0.008)         53
OFSP     52816–0          10^−6               –                                      48

The number of equations needed to build the approximation (stochastic, deterministic). The error in distribution is the ℓ1 error between the respective approximation and the OFSP solution. The error in moments is the ℓ1 error between the respective moment approximation and the OFSP solution's moments. The entries in every line show errors in the approximation of the species (G_0, G_1, P, M), respectively

error in Table 4 shows that the ACE-Ansatz approximation is more accurate than the
MCM approximation. We see in Table 4 that the linear ACE-Ansatz only used eight
equations, in contrast to 436 equations used by MCM. These results hint towards the
two gene toggle switch model having some linear algebraic conditional expectation
structure, which the ACE-Ansatz is utilising. Recent work by Cao and Grima (2018)
showed that this model can also be well approximated by rational functions derived
from imposing steady state assumptions.
In summary, different polynomial ansätze can be used to approximate the con-
ditional expectation structures between dimensions. As an example, we performed
dimension reduction using the linear ansatz. By applying Theorem 3.1, we could
observe that the equations of motion needed to solve for the conditional expecta-
tion are simply the classical moment equations. The simple single and double toggle
switch models computed in this section do not give a definitive answer on how numerical
schemes using algebraic conditional expectation ansätze translate to systems with a
higher number of dimensions (more than ten). However, in these examples,
where we could explore and reconstruct all structural properties, we could see that the
gain in accuracy was due to exploiting the inherent structure which exists in the system.
The natural future step would be to apply the scheme to higher dimensional models to
explore if they also have similar inherent algebraic structures. Another future research
direction, in terms of numerics, could be to investigate whether the structures we see
in the polynomial forms can be extended to general basis functions, like radial basis
functions, wavelets, etc.

5.2 ACE-Reconstruction

In the previous section, we demonstrated that the ACE-Ansatz could be used to accu-
rately approximate the marginal distributions. We were able to demonstrate that the
linear polynomial forms were sufficient approximations of the conditional expecta-
tions. This leads to the next question: “Can the ACE-Ansatz be used to estimate the
conditional variance? If so, can both the conditional expectation and the conditional
variance be used to reconstruct/approximate the conditional probability?” While this
question warrants its own paper, we will give some preliminary insights using the
theorems already established in this paper.


[Figure 6: panel A shows a contour plot of the joint distribution (axes X and Y); panel B shows the conditional expectation E[Yx] (red crosses) with its linear ACE approximation (dashed green); panel C shows E[Yx²] (red crosses) with its quadratic ACE approximation (dashed green).]

Fig. 6 a Joint probability distribution of Model 2 (Example 3.1) plotted via a contour plot. b The conditional
expectation of the distribution in a illustrated with a red crossed line and the linear ACE approximation of
the conditional expectation illustrated with a dashed green line. c The squared conditional expectation of the
distribution in a illustrated with a red crossed line and the quadratic ACE approximation of the conditional
expectation illustrated with a dashed green line (color figure online)

In this section, we will reconstruct two 2D distributions, both mono-modal with
non-zero covariance, only using the marginal distribution and moments of the joint
distribution.

5.2.1 Linear ACE-Reconstruction

Let us consider the distribution in Fig. 6a. We see that the distribution is mono-
modal and has non-zero covariance. We continue referring to the dimension which is
conditioned on as X and to the dimension being conditioned as Y. We see in Fig. 6b
that E[Yx] has a linear form, and Fig. 6c shows that E[Yx²] has a quadratic form.
We first compute the linear and quadratic polynomial approximations for E[Yx] and
E[Yx²], respectively (see Definition 3.1). We begin by solving for the coefficients of
the linear approximation of the conditional expectation,

η^1_{Y|X}(x) = κ11 x + κ10,

where the coefficients are found by solving

⎡ E[X]     1    ⎤   ⎡ κ11 ⎤   ⎡ E[Y]  ⎤
⎣ E[X²]    E[X] ⎦ • ⎣ κ10 ⎦ = ⎣ E[XY] ⎦ .

Substituting in the terms from Table 5 and solving the above linear system of equations
gives that: κ11 = 2.343 and κ10 = 16.462. For the expectation of Y² conditioned on
X, we can use the same machinery. Let us now consider a quadratic algebraic form
approximation:

η^2_{Y²|X}(x) = κ22 x² + κ21 x + κ20,


Table 5 Moments of the distribution in Fig. 6a

Moments of X           Moments of Y          Moments of mixed X and Y
E[X]  = 33.25          E[Y]  = 94.26         E[XY]   = 3203.24
E[X²] = 1133.69        E[Y²] = 9211.15       E[XY²]  = 318830.26
E[X³] = 39573.27                             E[X²Y²] = 1129342.88
E[X⁴] = 1412921.52

where the coefficients are found by solving

⎡ E[X²]    E[X]    1     ⎤   ⎡ κ22 ⎤   ⎡ E[Y²]   ⎤
⎢ E[X³]    E[X²]   E[X]  ⎥ • ⎢ κ21 ⎥ = ⎢ E[XY²]  ⎥ .
⎣ E[X⁴]    E[X³]   E[X²] ⎦   ⎣ κ20 ⎦   ⎣ E[X²Y²] ⎦

Substituting in the terms from Table 5 and solving the above linear system of equations
gives: κ22 = 5.210, κ21 = 99.111 and κ20 = 9.251. We have just derived an algebraic
form for the first and second conditional moments. Hence, for each x ∈ Ω_X, we have
approximations of the first two moments of the conditional probability p(Y = · | X = x).
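Both linear systems above are small enough to solve directly. The following minimal NumPy sketch (an illustration only, not part of the original computation) assembles the two systems from the raw moments listed in Table 5 and solves for the κ coefficients; since the tabulated moments are rounded, the recovered values can differ slightly from those quoted in the text.

```python
import numpy as np

# Raw moments of the distribution in Fig. 6a, as listed in Table 5 (rounded).
EX, EX2, EX3, EX4 = 33.25, 1133.69, 39573.27, 1412921.52
EY, EY2 = 94.26, 9211.15
EXY, EXY2, EX2Y2 = 3203.24, 318830.26, 1129342.88

# Linear approximation of E[Y_x]: solve for (kappa11, kappa10).
A1 = np.array([[EX,  1.0],
               [EX2, EX]])
b1 = np.array([EY, EXY])
kappa11, kappa10 = np.linalg.solve(A1, b1)

# Quadratic approximation of E[Y_x^2]: solve for (kappa22, kappa21, kappa20).
A2 = np.array([[EX2, EX,  1.0],
               [EX3, EX2, EX],
               [EX4, EX3, EX2]])
b2 = np.array([EY2, EXY2, EX2Y2])
kappa22, kappa21, kappa20 = np.linalg.solve(A2, b2)

print("linear ACE coefficients:   ", kappa11, kappa10)
print("quadratic ACE coefficients:", kappa22, kappa21, kappa20)
```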
Now, we want to fit a distribution which approximates the conditional probability.
To do this, we use the method of Maximal Entropy, which fits a distribution to
some prescribed set of moments, such that the fitted distribution has maximal entropy
among all candidate distributions with matching moments. [For further reading on the
Maximal Entropy method please see Smadbeck and Kaznessis (2013), Andreychenko
et al. (2015) and Andreychenko et al. (2017)]. In our case, we wish to fit only the first
two moments. Denoting the Maximal Entropy fit by ℳ, we write:

p(Y = y | X = x) ≈ ℳ(η^1_{Y|X}(x), η^2_{Y²|X}(x))(Y = y).     (5.9)

Then the approximation of the joint distribution using the conditional moment approximations is given by

p(X = x, Y = y) ≈ ℳ(η^1_{Y|X}(x), η^2_{Y²|X}(x))(Y = y) p(X = x).     (5.10)
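For concreteness, one possible way to realise the Maximal Entropy fit numerically is sketched below. It is only an illustration under the standard dual formulation: on a discrete support the MaxEnt distribution matching two moments has the exponential-family form p(y) ∝ exp(λ1 y + λ2 y²), and the multipliers are found by minimising the convex dual. The support bound y_max is an assumption made for the example.

```python
import numpy as np
from scipy.optimize import minimize

def maxent_two_moments(m1, m2, y_max):
    """Discrete MaxEnt distribution on {0,...,y_max} with E[Y]=m1 and E[Y^2]=m2."""
    y = np.arange(y_max + 1, dtype=float)

    def dual(lam):
        # Convex dual: log-partition function minus the moment constraints.
        expo = lam[0] * y + lam[1] * y ** 2
        c = expo.max()
        log_z = c + np.log(np.exp(expo - c).sum())
        return log_z - lam[0] * m1 - lam[1] * m2

    # Start from the Gaussian guess lam1 = mean/var, lam2 = -1/(2 var).
    var = max(m2 - m1 ** 2, 1e-8)
    res = minimize(dual, x0=np.array([m1 / var, -0.5 / var]), method="Nelder-Mead")
    expo = res.x[0] * y + res.x[1] * y ** 2
    p = np.exp(expo - expo.max())
    return p / p.sum()

# One column of the joint reconstruction (5.10) for a given x would then be
# maxent_two_moments(eta1_of_x, eta2_of_x, y_max=250) * p_X_of_x.
```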

If we compare the true joint distribution to the ACE-Reconstruction distribution (see
Fig. 7), we see that the covariance structure and the range of probabilities are similar
in both cases. In Fig. 7c, we see that the ACE-Reconstruction has maximal error
near the mode of the distribution and it captures the tails of the distribution well.
For comparison, we also fitted a Gaussian distribution to the mean and covariance
of the original distribution⁴ (see Fig. 7c). We can see that the Gaussian also captures
the covariance structure and the probability range well. However, in contrast to the
ACE-Reconstruction, the Gaussian reconstruction captures the mode of the original
distribution well and loses accuracy in the tails. We see in Table 6 that, for this example,

4 A Gaussian reconstruction in this context involves computing the Gaussian distribution over the discrete
state space and then normalising to make the total mass one.
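A minimal sketch of such a Gaussian reconstruction follows (an illustration only; the grid bounds are assumptions made for the example, and the mean and covariance would be assembled from the raw moments, e.g. those in Table 5).

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_reconstruction(mean, cov, x_max, y_max):
    """Evaluate a bivariate Gaussian on the grid {0..x_max} x {0..y_max}
    and renormalise so that the total mass is one."""
    xs, ys = np.meshgrid(np.arange(x_max + 1), np.arange(y_max + 1), indexing="ij")
    points = np.stack([xs, ys], axis=-1)
    w = multivariate_normal(mean=mean, cov=cov).pdf(points)
    return w / w.sum()

# mean = [E[X], E[Y]],  cov = [[V[X], cov(X,Y)], [cov(X,Y), V[Y]]].
```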


[Figure 7: top row shows heat maps of the Original, ACE, and Gaussian reconstructed distributions (axes X and Y, shared colour bar); bottom row shows the pointwise absolute difference (DIFF) between each reconstruction and the original.]
Fig. 7 (Top row) Heat maps showing the original distribution to be approximated on the left, then the ACE-Reconstruction and Gaussian reconstruction in the adjacent heat maps. The intensities correspond to the colour bar given on the right. (Bottom row) The heat maps show the pointwise absolute difference between the reconstructed and the original distribution; the reconstruction method is given in the column heading. The intensities correspond to the colour bar on the right

Table 6 Comparison between ACE-Reconstruction and Gaussian reconstruction

Method     Number of moments     ℓ1 error in distribution
ACE        9                     0.0497
Gaussian   5                     0.0787

Error in distribution refers to the ℓ1 error between the respective reconstruction and the true distribution

the gain in accuracy of the ACE-Reconstruction over a Gaussian reconstruction does
not merit the increased computational effort of the ACE-Reconstruction.

5.2.2 Higher degree ACE-Reconstruction

We saw in the simple linear case that ACE could capture the distribution very accu-
rately. However, a simple Gaussian reconstruction was also comparably accurate and
significantly cheaper computationally. Now we will consider a slightly more complex
example. The distribution in Fig. 8a commonly occurs when the system’s dynam-
ics moves its distribution into the edge of the state space. In this particular case, the
distribution was taken from an SIR model, at a time point where nearly all suscepti-
ble individuals are converted into infected individuals (see “Appendix H” for system
parameters). We follow similar steps as in the linear example case, but extend a bit fur-
ther by constructing a cubic polynomial- and a quartic polynomial approximation of
the conditional moments. We start by writing down the moments which are necessary


[Figure 8: panel A shows a contour plot of the joint distribution of the SIR system (axes X and Y); panel B shows the conditional expectation E[Yx] (red crosses) with its cubic ACE approximation (dashed green); panel C shows E[Yx²] (red crosses) with its quartic ACE approximation (dashed green).]

Fig. 8 a Joint probability distribution of the SIR system (“Appendix H”) plotted via a contour plot. b
The conditional expectation of the distribution in a illustrated with a red crossed line and the cubic ACE
approximation of the conditional expectation illustrated with a dashed green line. c The squared conditional
expectation of the distribution in a illustrated with a red crossed line and the quartic ACE approximation
of the conditional expectation illustrated with a dashed green line (color figure online)

to approximate the conditional expectations. For a quartic polynomial form, moments
up to degree eight are needed (see Table 7).
We now consider cubic and quartic polynomial approximations of the first two
conditional moments, respectively:

η^1_{Y|X}(x) = κ13 x³ + κ12 x² + κ11 x + κ10,                         (5.11)

η^2_{Y²|X}(x) = κ24 x⁴ + κ23 x³ + κ22 x² + κ21 x + κ20.               (5.12)

The coefficients in the equations above can be found by solving the following system
of equations:
⎡ E[X³]   E[X²]   E[X]    1     ⎤   ⎡ κ13 ⎤   ⎡ E[Y]   ⎤
⎢ E[X⁴]   E[X³]   E[X²]   E[X]  ⎥ • ⎢ κ12 ⎥ = ⎢ E[XY]  ⎥ ,
⎢ E[X⁵]   E[X⁴]   E[X³]   E[X²] ⎥   ⎢ κ11 ⎥   ⎢ E[X²Y] ⎥
⎣ E[X⁶]   E[X⁵]   E[X⁴]   E[X³] ⎦   ⎣ κ10 ⎦   ⎣ E[X³Y] ⎦

and

⎡ E[X⁴]   E[X³]   E[X²]   E[X]    1     ⎤   ⎡ κ24 ⎤   ⎡ E[Y²]   ⎤
⎢ E[X⁵]   E[X⁴]   E[X³]   E[X²]   E[X]  ⎥   ⎢ κ23 ⎥   ⎢ E[XY²]  ⎥
⎢ E[X⁶]   E[X⁵]   E[X⁴]   E[X³]   E[X²] ⎥ • ⎢ κ22 ⎥ = ⎢ E[X²Y²] ⎥ .
⎢ E[X⁷]   E[X⁶]   E[X⁵]   E[X⁴]   E[X³] ⎥   ⎢ κ21 ⎥   ⎢ E[X³Y²] ⎥
⎣ E[X⁸]   E[X⁷]   E[X⁶]   E[X⁵]   E[X⁴] ⎦   ⎣ κ20 ⎦   ⎣ E[X⁴Y²] ⎦
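The same Hankel-type structure appears for every polynomial degree, so these systems need not be written out by hand. A small helper (a sketch; the argument names are ours) that assembles and solves the degree-n system from precomputed raw moments could look as follows.

```python
import numpy as np

def ace_coefficients(x_moments, mixed_moments, degree):
    """Coefficients (kappa_n, ..., kappa_0) of a degree-n polynomial ACE fit.

    x_moments[k]     must hold E[X^k] for k = 0,...,2*degree (x_moments[0] = 1).
    mixed_moments[k] must hold E[X^k Y^r] for k = 0,...,degree, with r = 1 for
    the first conditional moment and r = 2 for the second.
    """
    n = degree
    A = np.array([[x_moments[n - j + i] for j in range(n + 1)]
                  for i in range(n + 1)])
    b = np.array(mixed_moments[: n + 1])
    return np.linalg.solve(A, b)

# For the quartic fit of E[Y_x^2] in (5.12) one would pass the moments of
# Table 7, e.g. x_moments = [1, E[X], ..., E[X^8]] and
# mixed_moments = [E[Y^2], E[X Y^2], ..., E[X^4 Y^2]].
```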

We see in Fig. 8 that the conditional moment approximations do not fit as tightly
as in the previous example; however, they do capture the right trend.
As in the linear case, we use the Maximum Entropy method ℳ to approximate
the conditional probability using the first two conditional moment approximations
given in (5.11) and (5.12),

p(Y = y | X = x) ≈ ℳ(η^1_{Y|X}(x), η^2_{Y²|X}(x))(Y = y).     (5.13)

Table 7 Moments of the distribution in Fig. 8a

Moments of X               Moments of Y          Moments of mixed X and Y
E[X]  = 6.28               E[Y]  = 131.17        E[XY]   = 821.69
E[X²] = 73.09              E[Y²] = 17277.97      E[X²Y]  = 8815.75
E[X³] = 2453.18                                  E[X³Y]  = 187486.58
E[X⁴] = 248985.45                                E[XY²]  = 108453.02
E[X⁵] = 38864736.15                              E[X²Y²] = 1141745.56
E[X⁶] = 6861261393.8                             E[X³Y²] = 21874741.71
E[X⁷] ≈ 126.94 × 10¹⁰                            E[X⁴Y²] = 843134378.72
E[X⁸] ≈ 240.45 × 10¹²

[Figure 9: top row shows heat maps of the Original, ACE, and Gaussian reconstructed distributions (axes X and Y, shared colour bar); bottom row shows the pointwise absolute difference (DIFF) between each reconstruction and the original.]
Fig. 9 (Top row) Heat maps showing the original distribution to be approximated on the left, then the ACE-Reconstruction and Gaussian reconstruction in the adjacent heat maps. The intensities correspond to the colour bar given on the right. (Bottom row) The heat maps show the pointwise absolute difference between the reconstructed and the original distribution; the reconstruction method is given in the column heading. The intensities correspond to the colour bar on the right

Table 8 Comparison between ACE-Reconstruction and Gaussian reconstruction

Method     Number of moments     ℓ1 error in distribution
ACE        16                    0.0378
Gaussian   5                     0.4599

Error in distribution refers to the ℓ1 error between the respective reconstruction and the true distribution

Then the reconstruction of the joint distribution is given by

p(X = x, Y = y) ≈ ℳ(η^1_{Y|X}(x), η^2_{Y²|X}(x))(Y = y) p(X = x).     (5.14)

We see in Fig. 9 that the ACE-Reconstruction performs much better than the Gaussian
reconstruction. This is illustrated in Table 8, which shows that the expected error of the
ACE-Reconstruction is approximately twelve times smaller than that of the Gaussian
reconstruction. In general, distributions close to boundaries are difficult to compute,
because the boundary forces the distributions to bend. This makes their approximation
computationally challenging. The ACE-Reconstruction, which didn’t give a perfect
fit, nevertheless captured some of these dynamics by approximating the underlying
conditional moments. It must be noted that the Gaussian reconstruction was done with
far fewer moments than the ACE-Reconstruction, so it is not a fair comparison to only
look at the shape. Nevertheless, the aim of this example was to demonstrate that the
reconstruction of complex distribution shapes is possible; given that there is a smooth
underlying manifold in which the conditional moments are embedded.


We have given some preliminary evidence that distributions can be reconstructed
using the conditional moments. However, there is much scope to improve this
approach. For example, when the first and second conditional moments are approximated
separately, two independent linear systems of equations are solved. While this
is computationally simple, in many cases the positivity condition of the variance could
be violated due to fitting errors. Furthermore, the matrix with the marginal moments
is ill-conditioned, making the approximation sensitive to cascading errors. A better
strategy which fixes this problem would be to set up a non-linear system of equations
using Eve's law. That is, we would trade a linear system of equations for a non-linear
system of equations to preserve interpretability of the approximation.
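Both diagnostics mentioned above are cheap to monitor. A minimal sketch follows; the polynomial coefficients and the state range are placeholders inserted only for illustration.

```python
import numpy as np

# Fitted polynomials for the first two conditional moments (coefficients in
# numpy's highest-degree-first order; the values below are placeholders).
eta1 = np.poly1d([2.3, 16.5])            # approximates E[Y_x]
eta2 = np.poly1d([5.2, 99.1, 9.3])       # approximates E[Y_x^2]

xs = np.arange(10, 56)
cond_var = eta2(xs) - eta1(xs) ** 2      # Var[Y_x] = E[Y_x^2] - E[Y_x]^2
if np.any(cond_var < 0):
    print("positivity of the conditional variance violated at x =", xs[cond_var < 0])

# The conditioning of the marginal-moment matrix can be checked directly,
# where A is the matrix of the linear system used to fit the coefficients:
# print(np.linalg.cond(A))
```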

6 Discussion

When presented with a high dimensional GRN, we are quick to reach for dimension
reduction methods focused around principles of time scale separation and volume
size expansion. These methods are effective at exploiting particular structures in the
network, but are not an overarching framework to help decompose and understand
GRNs. In this paper, we introduced an algebraic framework to describe the rela-
tionship between species in a general Kurtz process, and showed that conditional
expectations are the key to understanding these relationships. We then proved that if
the conditional expectation has an algebraic form, then the form can be inferred from
the classical moments. In short, conditional expectations decompose the dimensional
relationships, and the moments decompose the conditional expectations, elucidating
that the moments contain all critical information about the network. We then proved
that GRNs inherently have algebraic forms between dimensions. Hence, one can trans-
late the theory which we have developed to any general GRN.
To show that there are potential applications for the theorems developed in this work,
we touched on two new methods: one to simulate/evolve GRNs using a polynomial
conditional expectation ansatz; and one to reconstruct complex distribution shapes
using conditional moments. Both cases gave positive preliminary results in favour of
developing new numerical schemes using an algebraic conditional expectation ansatz.
This algebraic line of investigation can be extended into many aspects of GRNs.
For example, we could investigate mappings which project the network onto a domain
which yields lower degree conditional expectation forms (like the concept of lineari-
sation in numerics). Conditional expectation could also be applied in model selection,
where similarity metrics can be designed using the algebraic forms of the conditional
expectations between dimensions.
In summary, conditional expectations are critical for understanding and decompos-
ing GRNs. We proved that the algebraic perspective is a robust and intuitive framework
for studying such networks. Future research down this line of thought is imperative.
Funding V. Sunkara was supported by the BMBF (Germany) project PrevOp-OVERLOAD, grant number
01EC1408H.


A Proofs

Proof of Lemma 3.1-3 We substitute the conditional expectation form into Eve’s law
(Law of Total Variance) and then reduce.
Eve’s Law states that

cov(Y , Y ) = E[cov(Yx , Yx )] + cov(E[Yx ], E[Yx ]).

Verbosely, the total variation of Y is the sum of the expectation of the conditional
variances and the variance of the conditional expectation. We begin by reducing the
covariance of the conditional expectations:

cov(E[Yx], E[Yx]) := Σ_{x∈Ω_X} (E[Yx] − E[Y]) (E[Yx] − E[Y])^T p(X = x),

substituting the linear conditional expectation form and then expanding gives us

= Σ_{x∈Ω_X} (α x + β − E[Y]) (α x + β − E[Y])^T p(X = x),
= Σ_{x∈Ω_X} (α x + E[Y] − α E[X] − E[Y]) (α x + E[Y] − α E[X] − E[Y])^T p(X = x),
= Σ_{x∈Ω_X} (α x − α E[X]) (α x − α E[X])^T p(X = x),
= Σ_{x∈Ω_X} α (x − E[X]) (x − E[X])^T α^T p(X = x),

= α [ Σ_{x∈Ω_X} (x − E[X]) (x − E[X])^T p(X = x) ] α^T,

substituting the definition of a covariance gives

= α cov(X, X) α^T.

Substituting this term above into Eve's law gives us that

E[cov(Yx, Yx)] = cov(Y, Y) − α cov(X, X) α^T.
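The identity above is easy to probe numerically. The short Monte Carlo sketch below is only an illustration; the Poisson construction is an assumption chosen so that E[Yx] is linear by design.

```python
import numpy as np

rng = np.random.default_rng(0)

alpha, beta = 2.0, 5.0
X = rng.poisson(30.0, size=500_000)
Y = rng.poisson(alpha * X + beta)      # E[Y_x] = alpha*x + beta, Var(Y_x) = alpha*x + beta

# Eve's law rearranged, as in the proof: E[Var(Y_x)] = Var(Y) - alpha * Var(X) * alpha.
lhs = Y.var() - alpha ** 2 * X.var()
rhs = alpha * X.mean() + beta          # exact E[Var(Y_x)] for this construction
print(lhs, rhs)                        # agree up to Monte Carlo error
```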

B Parameters of the three models

See Tables 9, 10 and 11.


Table 9 Model 1 system parameters. T_final = 5.0

#   Reaction    Coefficient   Stoichiometry   Description
1   ∅ → X       c1 = 50.0     (1, 0)          Birth of X
2   X → ∅       c2 = 1.0      (−1, 0)         Death of X
3   ∅ → Y       c3 = 20.0     (0, 1)          Birth of Y
4   Y → ∅       c4 = 1.0      (0, −1)         Death of Y

Table 10 Model 2 system parameters. T_final = 3.0

#   Reaction     Coefficient   Stoichiometry   Description
1   ∅ → X        c1 = 30.0     (1, 0)          Birth of X
2   X → ∅        c2 = 1.0      (−1, 0)         Death of X
3   X → Y + X    c3 = 4.0      (0, 1)          Autocatalytic production of Y using X
4   Y → ∅        c4 = 1.0      (0, −1)         Death of Y

Table 11 Model 3 system parameters. T_final = 0.6

#   Reaction       Coefficient   Stoichiometry   Description
1   ∅ → X          c1 = 50.0     (1, 0)          Birth of X
2   X → ∅          c2 = 0.5      (−1, 0)         Death of X
3   2X → Y + 2X    c3 = 0.1      (0, 1)          Autocatalytic production of Y using two X
4   Y → ∅          c4 = 0.5      (0, −1)         Death of Y
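For reference, these models can be simulated directly with Gillespie's stochastic simulation algorithm (Gillespie 1977). Below is a minimal sketch for Model 2 (Table 10); the zero initial state and the sample size are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Model 2 (Table 10): birth/death of X and autocatalytic production of Y.
c1, c2, c3, c4 = 30.0, 1.0, 4.0, 1.0
stoich = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])

def ssa(t_final=3.0, x0=0, y0=0):
    t, state = 0.0, np.array([x0, y0])
    while True:
        a = np.array([c1, c2 * state[0], c3 * state[0], c4 * state[1]])
        a0 = a.sum()
        t += rng.exponential(1.0 / a0)
        if t > t_final:
            return state
        state = state + stoich[rng.choice(4, p=a / a0)]

samples = np.array([ssa() for _ in range(2000)])

# Empirical conditional expectation E[Y_x] at t = T_final, per observed x.
for x in np.unique(samples[:, 0]):
    print(int(x), samples[samples[:, 0] == x, 1].mean())
```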

C Proof that the simple mRNA translation model has a linear conditional expectation structure

The idea and outline for this proof were given by one of the anonymous reviewers of
this paper. The author is grateful to the reviewer and the peer-review process for this
contribution.
We prove that the simple mRNA translation model has linear conditional expectation
structure by using the notion of generating functions. We begin by first deriving
the definition of the conditional expectation in terms of the generating function.

C.1 Conditional expectation in terms of the generating function

Let X and Y be two coupled random variables whose state spaces are the natural
numbers including zero. The generating function of the joint distribution p(X = ·, Y = ·)
is given by

φ(t, s) := Σ_{x̃∈Ω_X, y∈Ω_Y} t^x̃ s^y p(X = x̃, Y = y),   for t, s ∈ C.     (C.1)

It is well known that taking the nth derivative of φ and setting t or s to zero gives
the nth degree classical moment of the random variables X and Y , respectively. We
aim to similarly formulate the conditional expectation in terms of derivatives of the
generating function.
For x ∈ Ω_X, we define

g_x(s) := ∂^x φ(t, s)/∂t^x |_{t=0} = x! Σ_{y∈Ω_Y} s^y p(X = x, Y = y).     (C.2)

Verbosely, the function g_x(s) is the xth derivative of φ with respect to t, evaluated at
t = 0. We take the natural logarithm of g_x(s) to get

log(g_x(s)) = Σ_{n=1}^{x} log(n) + log( Σ_{y∈Ω_Y} s^y p(X = x, Y = y) ).

Taking the derivative of the expression above with respect to s gives us

d log(g_x(s))/ds = [ Σ_{y∈Ω_Y} y s^(y−1) p(X = x, Y = y) ] / [ Σ_{y∈Ω_Y} s^y p(X = x, Y = y) ].     (C.3)

Then evaluating the function at s = 1 gives us

d log(g_x(s))/ds |_{s=1} = [ Σ_{y∈Ω_Y} y p(X = x, Y = y) ] / [ Σ_{y∈Ω_Y} p(X = x, Y = y) ]
                         = Σ_{y∈Ω_Y} y p(Y = y | X = x)
                         := E[Yx].     (C.4)

We have derived the definition of the conditional expectation as a function of the
derivatives of the generating function. Naturally, if the generating function is known,
one can evaluate the terms in (C.4) and determine the corresponding conditional
expectation structure.

C.2 Linear conditional expectation form of the simple mRNA transcription model

We prove that the simple mRNA transcription model has a linear conditional expectation
form by using the generating function given by Bokes et al. (2012) and substituting


it into (C.4). We begin by establishing some notation in order to align with the work
by Bokes et al.
Let M, N be the random variables corresponding to the mRNA population and
protein population, respectively. Let the reaction channels be given as follows:

R1 : ∅ → M (rate k1),   R2 : M → ∅ (rate γ1),   R3 : M → M + N (rate k2),   R4 : N → ∅ (rate γ2).

We are investigating the dynamics of the stationary distribution, hence we omit the
time component. It was shown by Bokes et al. that the stationary moments of the
simple mRNA translation model are as follows:

E[M] = k1/γ1,   E[N] = k1 k2/(γ1 γ2),     (C.5)

V[M] = k1/γ1,   V[N] = (k1 k2/(γ1 γ2)) (1 + k2/(γ1 + γ2)),   cov(M, N) = k1 k2/(γ1 (γ1 + γ2)).     (C.6)

Then the generating function of the stationary distribution is given by

φ(t, s) = e^{a(s) + (t−1) b(s)},     (C.7)

where

a(s) := α β ∫₀^s K(1, 1 + λ, β(r − 1)) dr   and   b(s) := α K(1, 1 + λ, β(s − 1)),

with K(·, ·, ·) being the Kummer's function and

λ = γ1/γ2,   α = k1/γ1,   β = k2/γ2.     (C.8)

To find the conditional expectation of the simple mRNA translation model, we will
substitute its generating function (C.7) into (C.2) and reduce:

g_m(s) := ∂^m φ(t, s)/∂t^m |_{t=0}
        = e^{a(s) + (t−1) b(s)} b(s)^m |_{t=0}
        = e^{a(s) − b(s)} b(s)^m.

Taking the natural log gives us

log g_m(s) = a(s) − b(s) + m log(b(s)).

Then taking the derivative with respect to s gives us

d log g_m(s)/ds = da(s)/ds − db(s)/ds + (m/b(s)) db(s)/ds.     (C.9)


By the fundamental theorem of calculus we have that

da(s)/ds = α β K(1, 1 + λ, β(s − 1)),

and by the property of the derivative of the Kummer's function,
d/dc K(a, b, f(c)) = (a f′(c)/b) K(a + 1, b + 1, f(c)), we have that

db(s)/ds = (α β/(1 + λ)) K(2, 2 + λ, β(s − 1)).

Substituting these terms into (C.9), then evaluating at s = 1 and applying the property
that K(·, ·, 0) = 1, gives us

d log g_m(s)/ds |_{s=1} = α β − (α β)/(1 + λ) + m (1/α) (α β)/(1 + λ).

By the definition given in (C.4), we have that

E[Nm] = α β − (α β)/(1 + λ) + m (1/α) (α β)/(1 + λ).

After substituting in the terms from (C.8), the conditional expectation in terms of the reaction
rates is given to be

E[Nm] = k1 k2/(γ1 γ2) − k1 k2/(γ1 (γ1 + γ2)) + m k2/(γ1 + γ2).     (C.10)
γ1 γ2 γ1 (γ1 + γ2 ) γ1 + γ2

Hence, the conditional expectation of the simple mRNA translation model has a
linear form. We now cross-validate the coefficients by linking the terms above to the raw
moments using Lemma 3.1.

C.3 Cross-validation

Using (3.2) we know that the linear conditional expectation of the protein conditioned on
the mRNA should have the form:

E[Nm] = (cov(M, N)/V[M]) (m − E[M]) + E[N].

Substituting in (C.5) and (C.6) for the moments gives us

= (k2/(γ1 + γ2)) (m − k1/γ1) + k1 k2/(γ1 γ2),


expanding the terms gives us

= m k2/(γ1 + γ2) − k1 k2/(γ1 (γ1 + γ2)) + k1 k2/(γ1 γ2).     (C.11)

Both the terms in (C.10) and (C.11) match.
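This match can also be confirmed mechanically; a short SymPy sketch of the comparison is given below.

```python
import sympy as sp

k1, k2, g1, g2, m = sp.symbols("k1 k2 gamma1 gamma2 m", positive=True)

# Stationary moments (C.5)-(C.6).
EM, EN = k1 / g1, k1 * k2 / (g1 * g2)
VM = k1 / g1
covMN = k1 * k2 / (g1 * (g1 + g2))

# Linear form from Lemma 3.1 / (3.2).
lemma_form = covMN / VM * (m - EM) + EN

# Generating-function result (C.10).
gf_form = k1 * k2 / (g1 * g2) - k1 * k2 / (g1 * (g1 + g2)) + m * k2 / (g1 + g2)

print(sp.simplify(lemma_form - gf_form))   # prints 0
```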

D Model 3: Conditional expectation through time

In this section we evaluate Model 3 at different time points to observe if the conditional
expectation's quadratic structure is present through time. Since there are no analytical
solutions for the model known to date, we use an OFSP approximation as the reference
solution and see how close this approximation's conditional expectation is to the
conditional expectation ansatz. The OFSP approximation was set to have a global ℓ1
error of 10⁻⁷.
In Fig. 10a–c, the joint distribution is rendered in a contour plot, evaluated at time
points T = 0.15, 0.3, and 1.2. Below the joint distributions, in Fig. 10d–f, the
corresponding conditional expectation and the quadratic ACE ansatz are given. We
see that the conditional expectation and the ansatz are fairly similar. There are some
mismatches at the boundary, but this is to be expected since the OFSP does produce
artefacts at the boundary due to truncation criteria.
To further investigate the resolution at which conditional expectations and the ACE
ansatz differ, we study the differences between them through time using three different
metrics: the ℓ∞ norm, to study the maximum error at a particular time point; the ℓ2
norm, to study the difference over the entire state space; and lastly, the relative error
in ℓ2, to see how the error is changing with respect to the change in the conditional
expectation. In Fig. 10g, we see that the ℓ∞ norm is of the order 10⁻² in the interval
of interest and the error is increasing with time. Then in Fig. 10h, we notice that the
ℓ2 norm has a similar trend as the ℓ∞. However, interestingly, the total error over the
state space in the ℓ2 norm is only twice as much as that of the ℓ∞ norm, implying
that there are only a few states which are contributing most of the error. Lastly, in
Fig. 10i, we study the relative error over time. We notice that this error falls to roughly
10⁻⁴, implying that the error between the ACE ansatz and the conditional expectation
is roughly ten thousand times smaller than the conditional expectation. This suggests
that the model likely does exhibit a quadratic conditional expectation structure.


[Figure 10: panels A–C show contour plots of the joint distribution of Model 3 at T = 0.15, 0.3, and 1.2 (axes X and Y); panels D–F show the conditional expectations E[Yx(T)] (red crosses) with the quadratic ACE ansatz at the same time points; panels G–I show the ℓ∞ error, the ℓ2 error, and the relative ℓ2 error between the OFSP conditional expectation and the ACE ansatz over time.]

Fig. 10 Model 3 evaluated at time points T = 0.15, 0.3, and 1.2 (T = 0.6 is given in Fig. 1c). a–c Contour plots describing the joint probability distributions generated using the OFSP method with a global error of 10⁻⁷. The distributions corresponding to time points T = 0.15, 0.3, and 1.2, are given from left to right respectively. d–f The conditional expectation of the joint probability distribution is marked with red crosses. The ACE polynomial fit of order two is drawn as a solid blue line. The conditional expectations evaluated at time points T = 0.15, 0.3, and 1.2, are given from left to right respectively. g–h The ℓ∞ and ℓ2 norm of the difference between the OFSP conditional expectation and the ACE quadratic ansatz through time, respectively. i Relative error with respect to the ℓ2 norm showing how the error in the conditional expectation evolves with respect to the conditional expectation (color figure online)

E Simple gene switch derivations

E.1 Chemical master equation

d p(G = off, M = m, A = a; t)
dt
= τoff p(G = on, M = m, A = a; t)
+ γ1 (m + 1) p(G = off, M = m + 1, A = a; t)


+ κ2 m p(G = off, M = m, A = a − 1; t)
+ γ2 (a + 1) p(G = off, M = m, A = a + 1; t)
 
− τon + (γ1 + κ2 ) m + (γ2 + τ̂on ) a p(G = off, M = m, A = a; t).
d p(G = on, M = m, A = a; t)
dt
= τon p(G = off, M = m, A = a; t)
+ κ1 p(G = on, M = m − 1, A = a; t)
+ γ1 (m + 1) p(G = on, M = m + 1, A = a; t)
+ κ2 m p(G = on, M = m, A = a − 1; t)
+ γ2 (a + 1) p(G = on, M = m, A = a + 1; t)
+ τ̂on (a + 1) p(G = off, M = m, A = a + 1; t)
− {τoff + κ1 + (γ1 + κ2 ) m + γ2 a} p(G = on, M = m, A = a; t)
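A direct, truncated transcription of this CME into code can be useful for checking the reduced models against it. The sketch below is only an illustration: the truncation bounds and the parameter tuple are assumptions, and probability mass leaving the truncated grid is simply discarded. It evaluates the right-hand side above on an (M, A) grid for each gene state.

```python
import numpy as np

def cme_rhs(p_off, p_on, params, M_max, A_max):
    """Right-hand side of the CME above on a truncated grid, where
    p_off[m, a] = p(G=off, M=m, A=a; t) and p_on[m, a] = p(G=on, M=m, A=a; t)."""
    tau_on, tau_off, tau_on_hat, k1, k2, g1, g2 = params
    m = np.arange(M_max + 1)[:, None]
    a = np.arange(A_max + 1)[None, :]

    def shift(p, dm, da):
        # Returns q with q[m, a] = p[m + dm, a + da]; zero outside the grid.
        q = np.zeros_like(p)
        dst_m = slice(max(0, -dm), p.shape[0] - max(0, dm))
        src_m = slice(max(0, dm), p.shape[0] - max(0, -dm))
        dst_a = slice(max(0, -da), p.shape[1] - max(0, da))
        src_a = slice(max(0, da), p.shape[1] - max(0, -da))
        q[dst_m, dst_a] = p[src_m, src_a]
        return q

    d_off = (tau_off * p_on
             + g1 * (m + 1) * shift(p_off, +1, 0)
             + k2 * m * shift(p_off, 0, -1)
             + g2 * (a + 1) * shift(p_off, 0, +1)
             - (tau_on + (g1 + k2) * m + (g2 + tau_on_hat) * a) * p_off)

    d_on = (tau_on * p_off
            + k1 * shift(p_on, -1, 0)
            + g1 * (m + 1) * shift(p_on, +1, 0)
            + k2 * m * shift(p_on, 0, -1)
            + g2 * (a + 1) * shift(p_on, 0, +1)
            + tau_on_hat * (a + 1) * shift(p_off, 0, +1)
            - (tau_off + k1 + (g1 + k2) * m + g2 * a) * p_on)
    return d_off, d_on
```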

E.2 Marginal distributions

We follow the same steps as in the generalised form (see Sect. 2.2). Deriving the CME
for the marginal distribution of the gene and the proteins involves the following two
steps:
• substituting p(G = ·, M = ·, A = ·; t) = p(M = · | G = ·, A = ·; t) p(G =
·, A = ·; t),
• summing over all m ∈ Ω_M and then collating all conditional probability terms.

E.2.1 Step 1

d p(G = off, M = m, A = a; t)
dt
= τoff p(M = m | G = on, A = a; t) p(G = on, A = a; t)
+ γ1 (m + 1) p(M = m + 1 | G = off, A = a; t) p(G = off, A = a; t)
+ κ2 m p(M = m | G = off, A = a − 1; t) p(G = off, A = a − 1; t)
+ γ2 (a + 1) p(M = m | G = off, A = a + 1; t) p(G = off, A = a + 1; t)
 
− τon + (γ1 + κ2 ) m + (γ2 + τ̂on ) a p(M = m | G = off, A = a; t)
× p(G = ·, A = ·; t).
d p(G = on, M = m, A = a; t)
dt
= τon p(M = m | G = off, A = a; t) p(G = off, A = a; t)
+ κ1 p(M = m − 1 | G = on, A = a; t) p(G = on, A = a; t)
+ γ1 (m + 1) p(M = m + 1 | G = on, A = a; t) p(G = on, A = a; t)
+ κ2 m p(M = m | G = on, A = a − 1; t) p(G = on, A = a − 1; t)
+ γ2 (a + 1) p(M = m | G = on, A = a + 1; t) p(G = on, A = a + 1; t)
+ τ̂on (a + 1) p(M = m | G = off, A = a + 1; t) p(G = off, A = a + 1; t)


− {τoff + κ1 + (γ1 + κ2) m + γ2 a} p(M = m | G = on, A = a; t)
× p(G = on, A = a; t)

E.2.2 Step 2

Σ_m d p(G = off, M = m, A = a; t)/dt
= τoff Σ_m p(M = m | G = on, A = a; t) p(G = on, A = a; t)
+ γ1 Σ_m (m + 1) p(M = m + 1 | G = off, A = a; t) p(G = off, A = a; t)
+ κ2 Σ_m m p(M = m | G = off, A = a − 1; t) p(G = off, A = a − 1; t)
+ γ2 (a + 1) Σ_m p(M = m | G = off, A = a + 1; t) p(G = off, A = a + 1; t)
− {τon + (γ1 + κ2) Σ_m m p(M = m | G = off, A = a; t) + (γ2 + τ̂on) a}
× p(G = off, A = a; t).

Σ_m d p(G = on, M = m, A = a; t)/dt
= τon Σ_m p(M = m | G = off, A = a; t) p(G = off, A = a; t)
+ κ1 Σ_m p(M = m − 1 | G = on, A = a; t) p(G = on, A = a; t)
+ γ1 Σ_m (m + 1) p(M = m + 1 | G = on, A = a; t) p(G = on, A = a; t)
+ κ2 Σ_m m p(M = m | G = on, A = a − 1; t) p(G = on, A = a − 1; t)
+ γ2 (a + 1) Σ_m p(M = m | G = on, A = a + 1; t) p(G = on, A = a + 1; t)
+ τ̂on (a + 1) Σ_m p(M = m | G = off, A = a + 1; t) p(G = off, A = a + 1; t)
− {τoff + κ1 + (γ1 + κ2) Σ_m m p(M = m | G = on, A = a; t) + γ2 a}
× p(G = on, A = a; t)


F Formal ACE-Ansatz approximation derivation

Before we begin the derivation, it is important to discuss Assumption 2.1-3. We state
that the joint distribution needs to have non-zero probability over all of the state space
through all time. We can easily violate this condition by starting the Kurtz process with
an initial probability distribution which is non-zero over only a subset of the entire
state space (e.g. a single state). However, the CME generator (2.4) has the feature that,
regardless of the initial condition, in an infinitesimal time all the states have non-zero
probability. Hence, numerically, if the process does start at a single state, we can
evolve it forward by a small time step using OFSP, and then use this time point as the
initial condition in the dimension reduction methods. In the case of the Simple Gene
Switch example in Sect. 5.1.2, we used t = 1 as the starting point for all dimension
reduction methods.
We use the following notational convention: the approximation of the probability
measure p(G = g, A = a; t) is denoted by the function w(g, a, t), furthermore, the
approximation for the expectation operator E[•(t)] is denoted by the function η• (t).
Then the formal derivations of Eqs. (5.1)–(5.8) are given by Eqs. (F.1)–(F.12).

d w(off, a, t)/dt = τoff w(on, a, t)
                  + k2 η_M|(off, a − 1, t) w(off, a − 1, t)
                  + γ2 (a + 1) w(off, a + 1, t)
                  − {τon + k2 η_M|(off, a, t) + (γ2 + τ̂on) a} w(off, a, t),         (F.1)

d w(on, a, t)/dt = τon w(off, a, t)
                 + k2 η_M|(on, a − 1, t) w(on, a − 1, t)
                 + γ2 (a + 1) w(on, a + 1, t)
                 + τ̂on (a + 1) w(off, a + 1, t)
                 − {τoff + k2 η_M|(on, a, t) + γ2 a} w(on, a, t).                    (F.2)

η_M|(g, a, t) = α • ( [g, a]^T − [ηG(t), ηA(t)]^T ) + η_M(t)                         (F.3)

d η_M(t)/dt = k1 ηG(t) − γ1 η_M(t).                                                  (F.4)

d η_GM(t)/dt = τon (η_M(t) − η_GM(t)) − τoff η_GM(t) + k1 ηG(t)
             − γ1 η_GM(t) + τ̂on (η_MA(t) − η_GMA(t)).                               (F.5)

d η_MA(t)/dt = k1 η_GA(t) − (γ1 + γ2) η_MA(t) + k2 η_M²(t)
             − τ̂on (η_MA(t) − η_GMA(t))                                             (F.6)

d η_M²(t)/dt = k1 (2 η_GM(t) + ηG(t)) + γ1 (−2 η_M²(t) + η_M(t)).                    (F.7)


ηG M A (t) = η M| (on, a, t) a w(on, a, t). (F.8)
a∈Z+

η A (t) = a [w(on, a, t) + w(off, a, t)] . (F.9)
a∈Z+

η A2 (t) = a 2 [w(on, a, t) + w(off, a, t)] . (F.10)
a∈Z+
ηG 2 (t) = ηG (t). (F.11)
 
α := ηG M (t) − ηG (t) η M (t) η M A (t) − η M (t) η A (t)
 −1
ηG 2 (t) − ηG (t)2 ηG A (t) − ηG (t) η A (t)
. (F.12)
ηG A (t) − ηG (t) η A (t) η A2 (t) − η A (t)2

G Two gene toggle switch derivations

We use the following notational convention: the approximation of the probability
measure prob(G0 = g, P = p; t) is denoted by the function w(g, p, t); furthermore,
the approximation for the expectation operator E[•(t)] is denoted by the function
η•(t). Like in the simple gene switch case, the approximation is started at t = 0.35
to satisfy Assumption 2.1-3. We introduce the equations of motion in the following
order: marginal distributions, moments, higher order moment closures, and the linear
ACE-Ansatz approximations.

G.1 Marginal distribution

d w(G0^on, p, t)/dt = σ1 η_M|(G0^off, p, t) w(G0^off, p, t)
                    + ρ2 w(G0^on, p − 1, t)
                    + k (p + 1) w(G0^on, p + 1, t)
                    + σ3 (1.0 − η_G1|(G0^on, p + 1, t)) (p + 1) w(G0^on, p + 1, t)
                    − σ2 w(G0^on, p, t)
                    − ρ2 w(G0^on, p, t)
                    − k p w(G0^on, p, t)
                    − σ3 (1.0 − η_G1|(G0^on, p, t)) p w(G0^on, p, t)                 (G.1)

d w(G0^off, p, t)/dt = σ2 w(G0^on, p, t)
                     + ρ1 w(G0^off, p − 1, t)
                     + k (p + 1) w(G0^off, p + 1, t)
                     + σ3 (1.0 − η_G1|(G0^off, p + 1, t)) (p + 1) w(G0^off, p + 1, t)
                     − σ1 η_M|(G0^off, p, t) w(G0^off, p, t)
                     − ρ1 w(G0^off, p, t)
                     − k p w(G0^off, p, t)
                     − σ3 (1.0 − η_G1|(G0^off, p, t)) p w(G0^off, p, t)              (G.2)

G.2 Moments

We derive the equations of motion for the following eight moments: E[G1(t)],
E[M(t)], E[G0 G1(t)], E[G0 M(t)], E[G1 P(t)], E[G1 M(t)], E[P M(t)], and E[M²(t)].
Let μ(t) := [ηG1(t), ηM(t), ηG0G1(t), ηG0M(t), ηG1P(t), ηG1M(t), ηPM(t), ηM²(t)], then
the equation of motion for the approximation of the moments has the form:

dμ(t)/dt = A μ(t) + A*,

where

     ⎡ −σ4         0             0          0              −σ3           0              0               0         ⎤
     ⎢ −ρ3 + ρ4    −k − σ1       0          σ1             0             0              0               0         ⎥
     ⎢ 0           0             −σ2 − σ4   0              0             σ1             0               0         ⎥
A := ⎢ 0           −σ1           −ρ3 + ρ4   −k + σ1 − σ2   0             0              0               σ1        ⎥
     ⎢ ρ1          0             −ρ1 + ρ2   0              −k + σ3 − σ4  0              0               0         ⎥
     ⎢ ρ4          0             0          0              0             −k − σ1 − σ4   σ3              0         ⎥
     ⎢ 0           ρ1            0          −ρ1 + ρ2       −ρ3 + ρ4      0              −2k − σ1 − σ3   0         ⎥
     ⎣ −ρ3 + ρ4    k + 2ρ3 + σ1  0          −σ1            0             −2ρ3 + 2ρ4     0               −2k − 2σ1 ⎦

and

      ⎡ σ3 ηP(t)                                      ⎤
      ⎢ ρ3                                            ⎥
      ⎢ −σ1 ηG0G1M(t) + σ3 ηG0P(t) − σ3 ηG0G1P(t)     ⎥
      ⎢ −σ1 ηG0M²(t) + ρ3 ηG0(t)                      ⎥
A* := ⎢ σ3 ηP²(t) − σ3 ηP(t) − σ3 ηG1P²(t)            ⎥ .
      ⎢ σ1 ηG0G1M(t) − σ3 ηG1PM(t)                    ⎥
      ⎢ ρ3 ηP(t) + σ1 ηG0PM(t) + σ3 ηG1PM(t)          ⎥
      ⎣ 2 σ1 ηG0M²(t) + ρ3                            ⎦

G.3 Higher order moment closures



Let w(G on
0 , t) := p w(G 0 , p, t), and w( p, t) := w(G 0 , p, t) + w(G 0 , p, t).
on on off

We apply the follow moment closers:

ηG 0 M 2 (t) = (η M| (G on
0 , t) + η M| (G 0 , t)) w(G 0 , t),
2 on on
(G.3)

ηG 1 P 2 (t) = ηG 1 | ( p, t) p 2 w( p, t), (G.4)
p
ηG 0 G 1 M (t) = ηG 1 | (G on
0 , t) η M| (G 0 , t) w(G 0 , t),
on on
(G.5)

ηG 1 P M = ηG 1 | ( p, t) η M| ( p, t) p w( p, t), (G.6)
p

123
V. Sunkara


ηG 0 G 1 P (t) = ηG 1 | (G on
0 , p, t) p w(G 0 , p, t),
on
(G.7)
p

ηG 0 P M (t) = η M| (G on
0 , p, t) p w(G 0 , p, t).
on
(G.8)
p

Similarly, we can use the marginal distribution, w(G0^on, p, t), to generate the
corresponding moments:

η_P(t) = Σ_p p w(p, t),                                                              (G.9)

η_P²(t) = Σ_p p² w(p, t),                                                            (G.10)

η_G0(t) = Σ_p w(G0^on, p, t),                                                        (G.11)

η_G0P(t) = Σ_p p w(G0^on, p, t).                                                     (G.12)

G.4 Linear ACE-Ansatz approximations

We approximate the conditional expectations with the linear ACE ansatz:

η_M|(g, p, t) = α_{M|G0,P} • ( [g, p]^T − [ηG0(t), ηP(t)]^T ) + η_M(t),              (G.13)
η_G1|(g, p, t) = α_{G1|G0,P} • ( [g, p]^T − [ηG0(t), ηP(t)]^T ) + η_G1(t),           (G.14)
η_G1|(p, t) = α_{G1|P} (p − ηP(t)) + η_G1(t),                                        (G.15)
η_G1|(g, t) = α_{G1|G0} (g − ηG0(t)) + η_G1(t),                                      (G.16)
η_M|(p, t) = α_{M|P} (p − ηP(t)) + η_M(t),                                           (G.17)
η_M|(g, t) = α_{M|G0} (g − ηG0(t)) + η_M(t).                                         (G.18)

Where the gradients are given by:

α_{M|G0,P} := [ η_G0M(t) − ηG0(t) η_M(t),   η_MP(t) − η_M(t) ηP(t) ] •
              ⎡ ηG0(t) − ηG0(t)²           η_G0P(t) − ηG0(t) ηP(t) ⎤ ⁻¹
              ⎣ η_G0P(t) − ηG0(t) ηP(t)    η_P²(t) − ηP(t)²        ⎦ ,

α_{G1|G0,P} := [ η_G0G1(t) − ηG0(t) η_G1(t),   η_G1P(t) − η_G1(t) ηP(t) ] •
               ⎡ ηG0(t) − ηG0(t)²           η_G0P(t) − ηG0(t) ηP(t) ⎤ ⁻¹
               ⎣ η_G0P(t) − ηG0(t) ηP(t)    η_P²(t) − ηP(t)²        ⎦ ,

α_{G1|P} := ( η_G1P(t) − η_G1(t) ηP(t) ) / ( η_P²(t) − ηP(t)² ),

α_{G1|G0} := ( η_G1G0(t) − η_G1(t) ηG0(t) ) / ( ηG0(t) − ηG0(t)² ),

α_{M|P} := ( η_MP(t) − η_M(t) ηP(t) ) / ( η_P²(t) − ηP(t)² ),

α_{M|G0} := ( η_MG0(t) − η_M(t) ηG0(t) ) / ( ηG0(t) − ηG0(t)² ).

Table 12 SIR system parameters

#   Reaction      Coefficient   Stoichiometry   Description
1   S + I → 2I    c1 = 0.3      (−1, 1)         Susceptible person becomes an infected person
2   I → ∅         c2 = 5.5      (0, −1)         Infected person leaves the system

H SIR system parameters

The initial population was set to (S(0) = 200, I(0) = 4). The OFSP method
was configured to have a global error of 10⁻⁶, with compression performed every 10
steps, where each time step was of length 0.002. The distribution is the snapshot of
the system at t = 0.15. We also omit the recovered state since the total population is
conserved, that is, S(t) + I(t) + R(t) = 204 for all time (see Table 12).

References
Anderson D (2007) A modified next reaction method for simulating chemical systems with time dependent
propensities and delays. J Chem Phys 127(21):214107. https://doi.org/10.1063/1.2799998
Andreychenko A, Mikeev L, Wolf V (2015) Reconstruction of multimodal distributions for hybrid moment-
based chemical kinetics. J Coupled Syst Multiscale Dyn 3(2):156–163. https://doi.org/10.1166/jcsmd.
2015.1073
Andreychenko A, Bortolussi L, Grima R, Thomas P, Wolf V (2017) Distribution approximations for the
chemical master equation: comparison of the method of moments and the system size expansion.
In: Graw F, Matthäus F, Pahle J (eds) Modeling cellular systems. Contributions in mathematical and
computational sciences. Springer, Cham, pp 39–66. https://doi.org/10.1007/978-3-319-45833-5_2
Ball K, Kurtz TG, Popovic L, Rempala G (2006) Asymptotic analysis of multiscale approximations to
reaction networks. Ann Appl Probab 16(4):1925–1961
Banasiak J (2014) Positive semigroups with applications. PhD thesis, University of KwaZulu-Natal, Durban,
South Africa
Barkai N, Leibler S (2000) Biological rhythms: circadian clocks limited by noise. Nature 403(6767):267–
268. https://doi.org/10.1038/35002258
Blake WJ, Kærn M, Cantor CR, Collins JJ (2003) Noise in eukaryotic gene expression. Nature 422(6932):633–
637. https://doi.org/10.1038/nature01546
Bokes P, King JR, Wood ATA, Loose M (2012) Exact and approximate distributions of protein and mRNA
levels in the low-copy regime of gene expression. J Math Biol 64(5):829–854. https://doi.org/10.1007/
s00285-011-0433-5


Burrage K, MacNamara S, Tian TH (2006) Accelerated leap methods for simulating discrete stochastic
chemical kinetics. Posit Syst Proc 341:359–366. https://doi.org/10.1007/3-540-34774-7_46
Cao Z, Grima R (2018) Linear mapping approximation of gene regulatory networks with stochastic dynam-
ics. Nat Commun. https://doi.org/10.1038/s41467-018-05822-0
Cardelli L, Kwiatkowska M, Laurenti L (2016) Stochastic analysis of chemical reaction networks using lin-
ear noise approximation. BioSystems 149:26–33. https://doi.org/10.1016/j.biosystems.2016.09.004
Choudhary K, Oehler S, Narang A (2014) Protein distributions from a stochastic model of the lac operon of
E. coli with DNA looping: analytical solution and comparison with experiments. PLoS ONE. https://
doi.org/10.1371/journal.pone.0102580
Engblom S (2006) Computing the moments of high dimensional solutions of the master equation. Appl
Math Comput 180(2):498–515. https://doi.org/10.1016/j.amc.2005.12.032
Gardner TS, Cantor CR, Collins JJ (2000) Construction of a genetic toggle switch in Escherichia coli.
Nature 403(6767):339–342. https://doi.org/10.1038/35002131
Gillespie DT (1977) Exact stochastic simulation of coupled chemical reactions. J Phys Chem 81(25):2340–
2361. https://doi.org/10.1021/j100540a008
Goutsias J (2005) Quasiequilibrium approximation of fast reaction kinetics in stochastic biochemical sys-
tems. J Chem Phys 122(18):184102. https://doi.org/10.1063/1.1889434
Grima R, Schmidt DR, Newman TJ (2012) Steady-state fluctuations of a genetic feedback loop: an exact
solution. J Chem Phys. https://doi.org/10.1063/1.4736721
Haseltine EL, Rawlings JB (2002) Approximate simulation of coupled fast and slow reactions for stochastic
chemical kinetics. J Chem Phys 117(15):6959–6969. https://doi.org/10.1063/1.1505860
Hasenauer J, Wolf V, Kazeroonian A, Theis FJ (2013) Method of conditional moments (MCM) for the
Chemical Master Equation. J Math Biol. https://doi.org/10.1007/s00285-013-0711-5
Hellander A, Lötstedt P (2007) Hybrid method for the chemical master equation. J Comput Phys 227(1):100–
122. https://doi.org/10.1016/j.jcp.2007.07.020
Henzinger TA, Mikeev L, Mateescu M, Wolf V (2010) Hybrid numerical solution of the chemical master
equation. In: Proceedings of the 8th international conference on computational methods in systems
biology. ACM, Trento, pp 55–65. https://doi.org/10.1145/1839764.1839772
Higham DJ (2008) Modeling and simulating chemical reactions. SIAM Rev 50(2):347–368. https://doi.
org/10.1137/060666457
Jahnke T (2011) On reduced models for the chemical master equation. Multiscale Model Simul 9(4):1646–
1676. https://doi.org/10.1137/110821500
Jahnke T, Huisinga W (2007) Solving the chemical master equation for monomolecular reaction systems
analytically. J Math Biol 54:1–26
Jahnke T, Kreim M (2012) Error bound for piecewise deterministic processes modeling stochastic reaction
systems. SIAM Multiscale Model Simul 10(4):1119–1147. https://doi.org/10.1137/120871894
Jahnke T, Sunkara V (2014) Error bound for hybrid models of two-scaled stochastic reaction systems. In:
Dahlke S, Dahmen W, Griebel M, Hackbusch W, Ritter K, Schneider R, Schwab C, Yserentant H (eds)
Extraction of quantifiable information from complex systems: lecture notes in computational science
and engineering, vol 102. Springer, Berlin, pp 303–319. https://doi.org/10.1007/978-3-319-08159-
5_15
Karlebach G, Shamir R (2008) Modelling and analysis of gene regulatory networks. Nat Rev Mol Cell Biol
9(10):770–780. https://doi.org/10.1038/nrm2503
Khammash M, Munsky B (2006) The finite state projection algorithm for the solution of the chemical
master equation. J Chem Phys 124(044104):1–12. https://doi.org/10.1063/1.2145882
Kurtz TG (1972) Relationship between stochastic and deterministic models for chemical reactions. J Chem
Phys 57(7):2976–2978. https://doi.org/10.1063/1.1678692
MacArthur BD, Ma’ayan A, Lemischka IR (2009) Systems biology of stem cell fate and cellular repro-
gramming. Nat Rev Mol Cell Biol 10(10):672–681. https://doi.org/10.1038/nrm2766
MacNamara S, Bersani AM, Burrage K, Sidje RB (2008) Stochastic chemical kinetics and the total
quasi-steady-state assumption: application to the stochastic simulation algorithm and chemical master
equation. J Chem Phys 129(095105):1–13. https://doi.org/10.1063/1.2971036
Menz S, Latorre J, Schütte C, Huisinga W (2012) Hybrid stochastic-deterministic solution of the chemical
master equation. Multiscale Model Simul 10(4):1232–1262. https://doi.org/10.1137/110825716
Nagel W, Steyer R (2017) Probability and conditional expectation. Wiley series in probability and statistics.
Wiley, Oxford. https://doi.org/10.1002/9781119243496


Pájaro M, Alonso AA, Otero-Muras I, Vázquez C (2017) Stochastic modeling and numerical simulation
of gene regulatory networks with protein bursting. J Theor Biol 421:51–70. https://doi.org/10.1016/
j.jtbi.2017.03.017
Rao CV, Arkin AP (2003) Stochastic chemical kinetics and the quasi-steady-state assumption: application
to the Gillespie algorithm. J Chem Phys 118(11):4999–5010. https://doi.org/10.1063/1.1545446
Ruess J (2015) Minimal moment equations for stochastic models of biochemical reaction networks with
partially finite state space. J Chem Phys 143(24):244103. https://doi.org/10.1063/1.4937937
Seber GAF, Lee AJ (2003) Linear regression analysis. Wiley, Hoboken. https://doi.org/10.1002/
9780471722199
Singh A, Hespanha JP (2005) Models for multi-specie chemical reactions using polynomial stochastic
hybrid systems. In: IEEE conference on decision and control, pp 2969–2974. https://doi.org/10.1109/
CDC.2005.1582616
Smadbeck P, Kaznessis YN (2012) Efficient moment matrix generation for arbitrary chemical networks.
Chem Eng Sci 84:612–618. https://doi.org/10.1016/j.ces.2012.08.031
Smadbeck P, Kaznessis YN (2013) A closure scheme for chemical master equations. Proc Natl Acad Sci
110(35):14261–14265. https://doi.org/10.1073/pnas.1306481110
Srivastava R, You L, Summers J, Yin J (2002) Stochastic vs. deterministic modeling of intracellular viral
kinetics. J Theor Biol 218(3):309–321. https://doi.org/10.1006/jtbi.2002.3078
Sunkara V (2013) Analysis and numerics of the chemical master equation. PhD thesis, Australian National
University
Sunkara V (2017) PyME (Python solver for the chemical master equation). https://github.com/
vikramsunkara/PyME. Accessed 1 Aug 2019
Sunkara V, Hegland M (2010) An optimal finite state projection method. Procedia Comput Sci 1(1):1579–
1586. https://doi.org/10.1016/j.procs.2010.04.177
Thomas P, Popović N, Grima R (2014) Phenotypic switching in gene regulatory networks. Proc Natl Acad
Sci 111(19):6994–6999. https://doi.org/10.1073/pnas.1400049111
Van Kampen NG (2007) Stochastic processes in physics and chemistry, 3rd edn. North Holland, Amsterdam
Vilar JMG, Kueh HY, Barkai N, Leibler S (2002) Mechanisms of noise-resistance in genetic oscillators.
Proc Natl Acad Sci 99(9):5988–5992. https://doi.org/10.1073/pnas.092133899
Wilkinson DJ (2006) Stochastic modelling for systems biology. Mathematical and computational biology
series. Chapman & Hall, CRC, Boca Raton

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.
