
Computational Statistics and Data Analysis 78 (2014) 82–99


Parameter estimation via stochastic variants of the ECM algorithm with applications to plant growth modeling

S. Trevezas (a,*), S. Malefaki (b), P.-H. Cournède (a)

(a) Laboratory of Mathematics Applied to Systems, École Centrale Paris, Grande Voie des Vignes, 92290 Châtenay-Malabry, France
(b) Department of Mechanical Engineering & Aeronautics, University of Patras, GR 26500 Rio Patras, Greece

Article info

Article history:
Received 20 February 2013
Received in revised form 7 April 2014
Accepted 9 April 2014
Available online 18 April 2014
Keywords:
Plant growth model
Hidden Markov model
Monte Carlo ECM-type algorithm
Metropolis-within-Gibbs
Automated Monte Carlo EM algorithm
Sequential importance sampling with resampling

Abstract

Mathematical modeling of plant growth has gained increasing interest in recent years due to its potential applications. A general family of models, known as functional–structural plant models (FSPMs) and formalized as dynamic systems, serves as the basis for the current study. Modeling, parameterization and estimation are very challenging problems due to the complicated mechanisms involved in plant evolution. A specific type of non-homogeneous hidden Markov model has been proposed as an extension of the GreenLab FSPM to study a certain class of plants with known organogenesis. In such a model, the maximum likelihood estimator cannot be derived explicitly. Thus, a stochastic version of an expectation conditional maximization (ECM) algorithm was adopted, where the E-step was approximated by sequential importance sampling with resampling (SISR). The complexity of the E-step creates the need for the design and comparison of different simulation methods for its approximation. In this direction, three variants of SISR and a Markov chain Monte Carlo (MCMC) approach are compared for their efficiency in parameter estimation on simulated and real sugar beet data, where observations are taken by censoring the plant's evolution (destructive measurements). The MCMC approach seems to be more efficient for this particular application context and also for a large variety of crop plants. Moreover, a data-driven automated MCMC–ECM algorithm for finding an appropriate sample size in each ECM step and also an appropriate number of ECM steps is proposed. Based on the available real dataset, some competing models are compared via model selection techniques.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Mathematical modeling of plant development and growth has gained increasing interest in the last twenty years, with potential applications in agricultural sciences, plant genetics or ecology. Functional–structural plant models (FSPMs, Sievänen et al., 2000) combine the description of plant architectural development and ecophysiological functioning, and offer the most promising perspectives for a better understanding of plant growth (Vos et al., 2010). However, the parameterization of FSPMs is generally impeded by several difficulties: the complex and interacting mechanisms which guide plant evolution are generally translated into strongly nonlinear models involving a large number of equations and parameters; experimental protocols to collect detailed data are heavy and often inaccurate; and plant models are generally developed without an appropriate statistical framework. As a consequence, plant growth models often remain
descriptive without a real predictive capacity. Efforts have thus been undertaken in recent years to develop methods for parameter estimation and uncertainty assessment adapted to complex models of plant growth (Ford and Kennedy, 2011; Cournède et al., 2011; Trevezas and Cournède, 2013). In this paper, a certain class of plants with known organogenesis (in plants, organogenesis is the process of creation of new organs) is studied, whose growth is modeled by the GreenLab FSPM (de Reffye and Hu, 2003). Many agronomic plants can be modeled in this way, like maize (Guo et al., 2006), rapeseed (Jullien et al., 2011), grapevine (Pallas et al., 2011) or even trees (Mathieu et al., 2009). The parameters of the model are related to plant functioning. The vector of observations consists of organ masses, measured only once by censoring the plant's evolution at a given observation time (destructive measurements).
In Cournède et al. (2011), a first approach for parameter estimation was introduced, but based on the rather restrictive assumption of an underlying deterministic model of biomass production and uncorrelated errors in the mass measurements of different organs in the plant structure. In Trevezas and Cournède (2013), the authors proposed a more general framework for statistical analysis which can potentially be applied to a large variety of plant species by taking into account process and measurement errors. They provided a frequentist-based statistical methodology for parameter estimation in plants with deterministic organogenesis rules. This framework can also serve as the basis for statistical analysis in plant models with stochastic organogenesis (see Kang et al., 2008 for the description of GreenLab with stochastic organogenesis). The basic idea consists in describing the data (organ mass measurements) as resulting from the evolution of a non-homogeneous hidden Markov model (HMM), where the hidden states of the model correspond to the sequence of unknown biomasses (masses measured for living organisms) produced during successive growth cycles. In such a complex model, the maximum likelihood estimator (MLE) cannot be derived explicitly, and for this reason a Monte Carlo ECM-type (Expectation Conditional Maximization) algorithm (Dempster et al., 1977; Meng and Rubin, 1993; Jank, 2005b; McLachlan and Krishnan, 2008) was adopted to compensate for the non-explicit E-step and also the non-explicit M-step. The authors used sequential importance sampling with resampling (SISR) to simulate from the hidden states given the observed data. The M-step is performed with a conditional maximization approach (see ECM in Meng and Rubin, 1993), in which the parameters of the model are separated into two groups: one for which explicit updates can be derived by fixing the parameters of the other group, and one for which updates are derived via numerical maximization.
Due to the typically large number of equations and time steps to consider in plant growth models, the computational load is an important factor to take into account. Consequently, the efficiency of the estimation algorithms is a key issue, especially when the final objective is decision-aid in agriculture. Likewise, as one of the objectives of FSPMs is to be able to differentiate between genotypes (two different genotypes should be characterized by two different vectors in the parameter space; see Letort, 2008 and Yin and Struik, 2010), the accuracy of the estimation algorithms has to be assessed. In this context, it is very important to profit from advanced simulation techniques in order to reduce the Monte Carlo error associated with a given estimation algorithm. For this reason, we focus on the comparison of different simulation techniques performed in the E-step. The resulting approximation of the so-called Q-function (computed in the E-step) is crucial to the quality of parameter estimation. The most efficient algorithm can subsequently be used to calibrate agronomic plants with the method of MLE and then make model comparison and selection. An example of this type is presented in the current paper based on a dataset from the sugar beet plant. Moreover, in order to enhance computational efficiency, the design of automated and data-driven algorithms should help by making an efficient use of Monte Carlo resources, for the benefit of the users. The above arguments motivate the current study.
In this paper, we compare different simulation techniques to perform the E-step: three variants of sequential importance sampling with resampling (SISR) and a Markov chain Monte Carlo (MCMC) algorithm. The three variants concern: (i) the SISR presented in Trevezas and Cournède (2013), where the resampling step is multinomial (Gordon et al., 1993), (ii) a modification of the previous algorithm by performing the resampling step with a combination of residual and stratified resampling (see, e.g., Cappé et al., 2005 and references therein) and (iii) a sequential importance sampling algorithm with partial rejection control (see, e.g., Liu et al., 1998, 2001). The variant of the MCMC algorithm that we developed is a hybrid Gibbs sampler (Geman and Geman, 1984; Gelfand and Smith, 1990), where simulations from the full conditional distributions of the hidden states were replaced by Metropolis–Hastings (MH) steps (Metropolis et al., 1953; Hastings, 1970). In order to further optimize the MCMC approach and to simplify a routine use of the method, a data-driven automated algorithm is proposed. The main benefits of such an algorithm in the EM context concern the automatic determination of the Monte Carlo sample size in each EM step and of the total number of EM steps. One of the most commonly used algorithms of this type in practice, which is efficient and computationally cheap, is the one presented in Caffo et al. (2005). We adapted this algorithm to our context to find an appropriate sample size in each ECM step and also an appropriate number of ECM steps. The details can be found in Section 4. Simulation studies from a synthetic example and also a real dataset from the sugar-beet plant were used to illustrate the performance of the competing algorithms.

The MCMC–ECM algorithm proved to be the most efficient in parameter estimation for the plant growth model that we study in this paper. In addition to the significant reduction of the Monte Carlo error, the MCMC algorithm revealed another advantage compared to SISR in the specific context of this study. Since plant organs generally have long expansion durations, censoring the plant's evolution at a given time leaves a whole batch of organs that have not completed their expansion (immature organs). As will be explained in Section 3, the MCMC algorithm can better handle this type of asymmetry. The automated version of MCMC–ECM was thus selected for further statistical inference. In particular, two different types of hidden Markov models were described and tested on a real dataset for their fitting quality.


The rest of this paper is organized as follows. In Section 2, we review the basic assumptions of the GreenLab FSPM and we give a short description of the non-homogeneous hidden Markov model developed in Trevezas and Cournède (2013). We also describe a new competing model which operates in the log-scale, and we give the framework for making MLE feasible within the framework of EM-type algorithms. In Section 3, we describe the MCMC approximation to the Q-function of the E-step and compare the current approach based on MCMC with the one proposed in Trevezas and Cournède (2013) based on three variants of SISR. Automated Monte Carlo EM algorithms are reviewed in Section 4, and the adaptation of the automated algorithm of Caffo et al. (2005) to our context is also provided. The resulting automated MCMC–ECM is compared with the non-automated one in synthetic examples. In Section 5, the performance of the aforementioned algorithms is tested on a real dataset from the sugar-beet plant and a model comparison is also carried out. Finally, in the last section an extended discussion is provided.
2. Description of the plant growth model

In this section we recall the basic assumptions of the GreenLab model and its formulation as an HMM given in Trevezas and Cournède (2013). Additionally, we propose a new candidate model and describe an appropriate version of an ECM-algorithm for MLE. Starting with the initial mass of the seed $q_0$, plant development is considered to be the result of a cyclic procedure. The cycle duration is determined by the thermal time needed to set up new organs in the plant and is called a Growth Cycle (GC). At each GC the available biomass is allocated to organs in expansion, and at the same time new biomass is produced by the green (non-senescent) leaves and will be available for allocation at the next GC. The set of different classes (types) of organs of a plant is denoted by $O$. In our application context with the sugar-beet plant, the set of organs consists of blade $b$, petiole $p$ and the root $r$, i.e. $O = \{b, p, r\}$. Let us now give the assumptions concerning biomass production and allocation.
2.1. Modeling assumptions

In the rest of the paper, we use the compact notation $x_{i:j}$ for vectors $(x_i, \ldots, x_j)$, where $i \le j$.
Assumption 1. (i) At the n-th GC, denoted by GC(n), the produced biomass $q_n$ is fully available for allocation to all expanding (preexisting + newly created) organs and is distributed proportionally to the class-dependent empirical sink functions given by

$$ s_o(i; p_o^{al}) = p_o\, c(a_o, b_o) \left( \frac{i + 0.5}{t_o} \right)^{a_o - 1} \left( 1 - \frac{i + 0.5}{t_o} \right)^{b_o - 1}, \qquad i \in \{0, 1, \ldots, t_o - 1\}, \qquad (1) $$

where $p_o^{al} = (p_o, a_o, b_o) \in \mathbb{R}^+ \times [1, +\infty)^2$ is the class-specific parameter vector, with $(p_o)_{o \in O}$ a vector of proportionality constants representing the sink strength of each class (by convention $p_b = 1$), $t_o$ the expansion time period for organs belonging to the class $o \in O$, and $c(a_o, b_o)$ the normalizing constant of a discrete Beta$(a_o, b_o)$ function, whose unnormalized generic term is given by the product of the two last factors of (1).

(ii) As in Lemaire et al. (2008), we suppose that expansion durations are the same for blades and petioles and $T$ denotes their common value: $t_b = t_p = T$.

We denote by $p^{al} \triangleq (p_o^{al})_{o \in O}$ the vector of all allocation parameters and by $(N_n^o)_{o \in O}$ the vector of organs preformed at GC(n), for all $n \in \mathbb{N}$ (determined by plant organogenesis, and deterministic in this study).
Definition 1. The total biomass demand at GC(n), denoted by $d_n$, is the quantity expressing the sum of sink values of all expanding organs at GC(n).

Since we consider that there is only one root compartment, and since an organ is in its i-th expansion stage if and only if it has been preformed at GC(n−i) (see Assumption 1), we have that

$$ d_n(p^{al}) = \sum_{o \in O \setminus \{r\}} \sum_{i=0}^{\min(n,\, T-1)} N_{n-i}^o\, s_o(i; p_o^{al}) + s_r(n; p_r^{al}). \qquad (2) $$

Except for the initial mass of the seed $q_0$, subsequent biomasses $\{q_n\}_{n \ge 1}$ are the result of photosynthesis by leaf blades.

Definition 2. (i) The photosynthetically active blade surface at GC(n+1), denoted by $s_n^{act}$, is the quantity expressing the total surface area of all leaf blades that have been preformed until GC(n) and will be photosynthetically active at GC(n+1);
(ii) the ratio (percentage) of the allocated $q_l$ which contributes to $s_n^{act}$ will be denoted by $\phi_{l,n}^{act}$.

Assumption 2. (i) The initial mass of the seed $q_0$ is assumed to be fixed and known;
(ii) the leaf blades have a common photosynthetically active period which equals $T$;
(iii) the leaf blades have a common surfacic mass denoted by $e_b$.

Now, we describe how the biomasses $\{q_n\}_{n \ge 1}$ are obtained.


Assumption 3. In the absence of modeling errors, the sequence of produced biomasses $\{q_n\}_{n \ge 1}$ is determined by the following recurrence relation, known as the empirical Beer–Lambert law (see Guo et al., 2006):

$$ q_{n+1} = F_n(q_{(n-T+1)^+:n}, u_{n+1}; p) = u_{n+1}\, \mu\, s^{pr} \left( 1 - \exp\left( -k_B\, \frac{s_n^{act}(q_{(n-T+1)^+:n}; p^{al})}{s^{pr}} \right) \right), \qquad (3) $$

where $x^+ = \max(0, x)$, $u_n$ denotes the product of the photosynthetically active radiation during GC(n) modulated by a function of the soil water content, $p \triangleq (\mu, s^{pr}, k_B, p^{al})$, $\mu$ is the radiation use efficiency, $s^{pr}$ is a characteristic surface that represents the two-dimensional projection on the ground of the space potentially occupied by the plant, $k_B$ is the extinction coefficient in the Beer–Lambert extinction law, and $s_n^{act}$ is given by

$$ s_n^{act}(q_{(n-T+1)^+:n}; p^{al}) = e_b^{-1} \sum_{l=(n-T+1)^+}^{n} \phi_{l,n}^{act}(p^{al})\, q_l, \qquad (4) $$

and

$$ \phi_{l,n}^{act}(p^{al}) = \frac{1}{d_l(p^{al})} \sum_{j=0}^{\min(l,\, l+T-n-1)} N_{l-j}^b\, s_b(j; p_b^{al}), \qquad (n-T+1)^+ \le l \le n, \qquad (5) $$

where $d_l$ is given by (2), $s_b$ by (1), and $N_n^b$ is the number of blades preformed at GC(n).

Note that $q_{n+1}$ also depends on $p^{al}$, but only through $s_n^{act}$, and that $p$ could have lower dimension if some of the aforementioned parameters are fixed or calibrated in the field.
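To make the allocation-production loop of Eqs. (1)-(5) concrete, here is a minimal Python sketch (ours, not the authors' code). The sink and organogenesis constants follow Table 1 in Section 3.2, while the environmental input u_n and the radiation use efficiency mu are illustrative placeholders.

```python
import numpy as np

# Illustrative constants; sink/organogenesis values follow Table 1 (Section 3.2),
# u and mu are placeholders, not calibrated values.
T, t_r = 10, 100                 # expansion durations: blades/petioles, root
e_b, k_B, s_pr, mu, u = 0.0083, 0.7, 500.0, 0.01, 1.0
N_b = N_p = 1                    # one blade and one petiole preformed per GC

def sink_profile(p_o, a_o, b_o, t_o):
    """Discrete-Beta sink values s_o(i; p_o^al), i = 0..t_o-1, Eq. (1)."""
    x = (np.arange(t_o) + 0.5) / t_o
    s = x ** (a_o - 1) * (1 - x) ** (b_o - 1)
    return p_o * s / s.sum()     # division by s.sum() realizes c(a_o, b_o)

s_b = sink_profile(1.0, 3.0, 2.0, T)       # blade (p_b = 1 by convention)
s_p = sink_profile(0.8165, 3.0, 2.0, T)    # petiole
s_r = sink_profile(400.0, 5.5, 2.0, t_r)   # single root compartment

def demand(n):
    """Total biomass demand d_n, Eq. (2)."""
    i = np.arange(min(n, T - 1) + 1)
    return (N_b * s_b[i] + N_p * s_p[i]).sum() + s_r[n]

def produce(q, n):
    """One Beer-Lambert production step q_{n+1} = F_n(...), Eqs. (3)-(5)."""
    l = np.arange(max(0, n - T + 1), n + 1)
    # phi_{l,n}^{act}: share of q_l allocated to blades still active at GC(n+1)
    phi = np.array([N_b * s_b[: min(li, li + T - n - 1) + 1].sum() / demand(li)
                    for li in l])
    s_act = (phi * q[l]).sum() / e_b                           # Eq. (4)
    return u * mu * s_pr * (1 - np.exp(-k_B * s_act / s_pr))   # Eq. (3)

q = np.zeros(51)
q[0] = 0.003                     # seed mass q_0
for n in range(50):
    q[n + 1] = produce(q, n)
```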
In Trevezas and Cournède (2013) the available data Y were rearranged sequentially into sub-vectors $Y_n$ by taking into account the preformation time (one GC before their first appearance) of all available organs, except for the root mass which was excluded from the data vector. In this paper we use the same data decomposition and we also indicate a way to take into account the root mass. Each sub-vector $Y_n$ contains the masses of the organs which are preformed at GC(n). Whenever the root mass is included, it is contained in the last sub-vector. If we denote by $G_n$ the vector-valued function that expresses the theoretical masses of all the different classes of organs which started their development at GC(n), then by summing the allocated biomass at each expansion stage and using Assumption 1 we obtain directly

$$ G_n(q_{n:(n+T-1)}; p^{al}) = \left( \sum_{j=0}^{T-1} \frac{q_{j+n}}{d_{j+n}(p^{al})}\, s_o(j; p_o^{al}) \right)_{o \in O \setminus \{r\}}. \qquad (6) $$

The theoretical root mass, whenever included, is given by

$$ G^r(q_{0:N}; p^{al}) = \sum_{j=0}^{N} \frac{q_j}{d_j(p^{al})}\, s_r(j; p_r^{al}). \qquad (7) $$

The following assumptions determine the stochastic nature of the model.

Assumption 4. Let $(W_n)_{n \in \mathbb{N}}$ and $(V_n)_{n \in \mathbb{N}}$ be two mutually independent sequences of i.i.d. random variables and random vectors respectively, independent of $Q_0$, where $W_n \sim \mathcal{N}(0, \sigma^2)$ and $V_n \sim \mathcal{N}_d(0, \Sigma)$, with $\Sigma$ an unknown covariance matrix and $d$ the cardinality of $O \setminus \{r\}$. By setting $N_n^o = 1$, $o \in \{b, p\}$, two types of model equations will be assumed:

(a) model M1: for $n \ge 0$,

$$ Q_{n+1} = F_n(Q_{(n-T+1)^+:n}; p)(1 + W_n), \qquad (8) $$

$$ Y_n = G_n(Q_{n:(n+T-1)}; p^{al}) + V_n, \qquad (9) $$

(b) model M2: for $n \ge 0$,

$$ Q_{n+1} = F_n(Q_{(n-T+1)^+:n}; p)\, e^{W_n}, \qquad (10) $$

$$ Y_n = G_n(Q_{n:(n+T-1)}; p^{al}) \odot e^{V_n}, \qquad (11) $$

where $F_n$ is given by (3), $G_n$ is given by (6), $e^{x} \triangleq (e^{x_1}, \ldots, e^{x_d})$ for a d-dimensional vector $x = (x_1, \ldots, x_d)$, and $x \odot y$ is the Hadamard (entrywise) product of two vectors.

Remark 1. (i) Assumption 4(a) corresponds to the model equations adopted in Trevezas and Cournède (2013).
(ii) When a dataset $Y_{0:N}$ is available and the root mass is included, then the dimension of $Y_N$, $G_N$ and $V_N$ given in (9) or (11) is increased by one to incorporate the root mass given by (7), observed with error $V_{N,d+1} \sim \mathcal{N}(0, \sigma_r^2)$.

Both models given above correspond to state-space models with state sequence Q, satisfy Assumptions 1–3, and differ in the state and observation equations given by Assumptions 4(a) or 4(b).

Now, we give their equivalent formulation as hidden Markov models (HMMs), see Cappé et al. (2005). The proof is direct and will be omitted.


Proposition 1. Under Assumptions 1–4, the bivariate stochastic process (Q, Y) defined on a probability space $(\Omega, \mathcal{F}, P_\theta)$, where $\theta = (p, \Sigma)$ or $(p, \sigma^2, \Sigma)$, can be represented as an HMM, where

(i) the hidden sequence Q, with values in $\mathbb{R}^+$, evolves as a time-inhomogeneous T-th order Markov chain with initial distribution $P_\theta(Q_0 \in \cdot) = \delta_{q_0}(\cdot)$ (Dirac at $q_0$), where $q_0 \in \mathbb{R}^+$, and transition dynamics given, for model M1 (Assumption 4(a)), by

$$ P_\theta(Q_{n+1} \in \cdot \mid Q_{(n-T+1)^+:n}) = \mathcal{N}\big( F_n(Q_{(n-T+1)^+:n}; p),\ \sigma^2 F_n^2(Q_{(n-T+1)^+:n}; p) \big), \qquad (12) $$

and, for model M2 (Assumption 4(b)), by

$$ P_\theta(Q_{n+1} \in \cdot \mid Q_{(n-T+1)^+:n}) = \log\mathcal{N}\big( \log F_n(Q_{(n-T+1)^+:n}; p),\ \sigma^2 \big), \qquad (13) $$

where $\log\mathcal{N}$ stands for the log-normal distribution;

(ii) the observable sequence Y, with values in $(\mathbb{R}^+)^d$, conditioned on Q forms a sequence of conditionally independent random vectors, and each $Y_n$ given Q depends only on $Q_{n:(n+T-1)}$, with conditional distribution given, for model M1 (Assumption 4(a)), by

$$ P_\theta(Y_n \in \cdot \mid Q_{n:(n+T-1)}) = \mathcal{N}_d\big( G_n(Q_{n:(n+T-1)}; p^{al}),\ \Sigma \big), \qquad (14) $$

and, for model M2 (Assumption 4(b)), by

$$ P_\theta(Y_n \in \cdot \mid Q_{n:(n+T-1)}) = \log\mathcal{N}_d\big( \log G_n(Q_{n:(n+T-1)}; p^{al}),\ \Sigma \big), \qquad (15) $$

where $\log x \triangleq (\log x_1, \ldots, \log x_d)$ for a d-dimensional vector $x = (x_1, \ldots, x_d)$.

Remark 2. The model M1 is the one assumed in Trevezas and Cournède (2013), and normality in (12) and (14) is only valid approximately (with small variances) since we deal with positive random variables.
2.2. Maximum likelihood estimation

The available data $y_{0:N}$ contain organ masses, measured at a given GC(N) by censoring the plant's evolution (destructive measurements). In Cournède et al. (2011) a parameter identification method was proposed for the GreenLab model in the absence of modeling errors in biomass production ($\sigma^2 = 0$) and of correlation (diagonal covariance matrix $\Sigma$) in the mass measurements. In Trevezas and Cournède (2013) the method was extended to cover the case of a special type of modeling errors and to introduce correlation in the mass measurements. The authors managed to estimate the parameter $\theta$ based on an appropriate stochastic variant of a generalized EM-algorithm (Dempster et al., 1977; Meng and Rubin, 1993; Jank, 2005b; McLachlan and Krishnan, 2008). Each iteration of an EM algorithm consists of an expectation step (E-step) and a maximization step (M-step). The E-step involves the computation of the conditional expectation of the complete data log-likelihood given the observed data under the current parameter value (called the Q-function). In the M-step, the parameters are updated by maximizing the Q-function of the E-step. When the integral involved in the E-step is analytically intractable, then the Q-function, denoted here by $Q(\theta; \theta')$, should be approximated. Several efforts have been made in this direction, e.g., the Stochastic EM (SEM) (Celeux and Diebolt, 1985), the Monte Carlo EM (MCEM) (Wei and Tanner, 1990), the Stochastic Approximation EM (SAEM) (Delyon et al., 1999), as well as the Quasi-Monte Carlo EM (Jank, 2005a). The common characteristic of the aforementioned variants is the approximation of the Q-function by simulating the hidden state sequence from its conditional distribution given the observed data (see Jank, 2005b and Jank, 2006). In the context of hidden Markov models (Cappé et al., 2005) the two most popular and efficient simulation mechanisms are SISR, see Gordon et al. (1993), Doucet et al. (2001), Cappé et al. (2005), and MCMC, see Metropolis et al. (1953), Hastings (1970), Geman and Geman (1984), Gelfand and Smith (1990). The resulting algorithms will be referred to as the SISR–EM and MCMC–EM algorithms. In order to perform the E-step for the HMM M1, the authors in Trevezas and Cournède (2013) approximate the Q-function via an SISR estimate $\hat{Q}(\theta; \theta')$. In the next section we propose an approximation of the Q-function based on MCMC.
The estimate of the Q-function can be expressed in a unified way as:

$$ \hat{Q}(\theta; \theta') = \sum_{i=1}^{M} w_i \log p_\theta\big(q_{0:N}^{(i)}, y_{0:N}\big), \qquad (16) $$

where $p_\theta(q_{0:N}^{(i)}, y_{0:N})$ is the density function of the complete model when the true value is $\theta$, and $\{w_i, q_{0:N}^{(i)}\}$ is a weighted M-sample ($w_i$, $q_{0:N}^{(i)}$ and M depend on $\theta'$) from the conditional distribution of the hidden states $q_{0:N}$ given the observed data $y_{0:N}$ when the true parameter is $\theta'$. In the case of an MCMC estimate, the weights $w_i$ are equal to 1/M.
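In code, (16) is simply a weighted average of complete-data log-likelihoods over the simulated hidden paths; a minimal sketch (ours), where log_p_complete is a hypothetical helper returning $\log p_\theta(q_{0:N}, y_{0:N})$:

```python
import numpy as np

def Q_hat(theta, paths, weights, log_p_complete):
    """Unified estimate (16) of the Q-function. For an MCMC sample pass
    weights = np.full(len(paths), 1.0 / len(paths)); for SISR pass the
    normalized importance weights."""
    return sum(w * log_p_complete(theta, q) for w, q in zip(weights, paths))
```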

Very often in real-life applications the M-step is analytically intractable as well. Unfortunately, any stochastic EM-type algorithm that can be designed for the HMMs M1 and M2 given by Proposition 1 leads to a non-explicit M-step as well. For this reason, a numerical maximization procedure of quasi-Newton type could be implemented at each iteration of a stochastic EM algorithm (see Trevezas and Cournède, 2013) in the same way as it is implemented in a deterministic EM algorithm (see Lange, 1995). Nevertheless, it is certainly desirable, whenever possible, for computational cost and accuracy reasons, to reduce the number of parameters to be updated via numerical maximization. A way to overcome a complicated M-step was proposed in Meng and Rubin (1993) with the so-called ECM (Expectation Conditional Maximization) algorithm, where the authors separated the intractable M-step into smaller tractable conditional M-steps and updated the parameters of the model in a cyclic fashion. In order to perform the M-step for the HMM M1, the authors in Trevezas and Cournède (2013)


combined conditional and numerical maximization. First, they updated explicitly, in a conditional maximization step, the parameters which have explicit updates given fixed values of the rest, and then updated the rest of the parameters by the Broyden–Fletcher–Goldfarb–Shanno (BFGS) quasi-Newton algorithm. This approach is also adopted here for both models M1 and M2. Let $\hat{Q}(\theta; \theta^{(t)})$ denote the approximation of $Q(\theta; \theta^{(t)})$ given by (16) in the t-th EM-iteration ($\theta' = \theta^{(t)}$), and let $(\theta_1, \theta_2)$ be a decomposition of $\theta$ into two sub-vectors, where $\theta_1$ can be explicitly updated given $\theta_2$. The maximization of $\hat{Q}(\theta; \theta^{(t)})$ with respect to $\theta = (\theta_1, \theta_2)$ is described by the following two steps:

$$ \theta_1^{(t+1)} = \arg\max_{\theta_1} \hat{Q}\big(\theta_1, \theta_2^{(t)};\ \theta_1^{(t)}, \theta_2^{(t)}\big), \qquad (17) $$

$$ \theta_2^{(t+1)} = \operatorname{BFGSmax}_{\theta_2} \hat{Q}\big(\theta_1^{(t+1)}, \theta_2;\ \theta_1^{(t)}, \theta_2^{(t)}\big), $$

where the notation BFGSmax corresponds to the solution of the maximization problem with the BFGS quasi-Newton algorithm. The explicit step (17) corresponding to model M1 can be found in Trevezas and Cournède (2013). The solution to (17) for the model M2 is given below. The proof is provided in the Appendix.
Proposition 2. Let $\theta = (\theta_1, \theta_2)$, where $\theta_1 = (\mu, \Sigma)$ and $\theta_2$ contains all the parameters of $p$ except for $\mu$. The update equations for $\theta_1$ are given as follows:

$$ \hat{\mu}_N(\theta_2; \theta') = \exp\left\{ N^{-1} \sum_{n=1}^{N} E_{\theta'}\big[ \log Q_n - \log F^{1}_{n-1}(\theta_2) \,\big|\, y_{0:N} \big] \right\}, \qquad (18) $$

$$ \hat{\Sigma}_N(\theta_2; \theta') = (N+1)^{-1} \sum_{n=0}^{N} E_{\theta'}\Big[ \big( \log y_n - \log G_n(\theta_2) \big) \big( \log y_n - \log G_n(\theta_2) \big)^{\top} \,\Big|\, y_{0:N} \Big], \qquad (19) $$

where $F^{1}_{n-1}(\theta_2)$ denotes the production function (3) evaluated with $\mu = 1$. If $\sigma^2$ is estimated as well, then its update equation is given by:

$$ \hat{\sigma}^2_N(\theta_2; \theta') = N^{-1} \sum_{n=1}^{N} E_{\theta'}\Big[ \big( \log Q_n - \log F^{1}_{n-1}(\theta_2) \big)^2 \,\Big|\, y_{0:N} \Big] - \big( \log \hat{\mu}_N(\theta_2; \theta') \big)^2. \qquad (20) $$
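The smoothing expectations in (18)-(20) are not available in closed form; in practice they are approximated with the same weighted sample used in (16). A minimal sketch (ours) under that assumption, where log_F1 and log_G are hypothetical helpers evaluating $\log F^1_{n-1}(\theta_2)$, n = 1..N, and $\log G_n(\theta_2)$, n = 0..N, along a hidden path:

```python
import numpy as np

def cm_updates_m2(q_paths, weights, log_y, log_F1, log_G):
    """Monte Carlo evaluation of the explicit CM updates (18)-(20) for model M2.

    q_paths : (M, N+1) weighted sample of hidden biomass paths q_0:N
    weights : (M,) normalized weights (np.full(M, 1/M) for an MCMC sample)
    log_y   : (N+1, d) log organ-mass observations log y_0:N
    log_F1  : hypothetical helper, log_F1(path) -> (N,) values of
              log F^1_{n-1}(theta_2), n = 1..N (production with mu = 1)
    log_G   : hypothetical helper, log_G(path) -> (N+1, d) values of
              log G_n(theta_2), n = 0..N
    """
    M, Np1 = q_paths.shape
    N = Np1 - 1
    X = np.stack([np.log(p[1:]) - log_F1(p) for p in q_paths])    # (M, N)
    R = np.stack([log_y - log_G(p) for p in q_paths])             # (M, N+1, d)

    log_mu = np.sum(weights[:, None] * X) / N                     # Eq. (18)
    sigma2 = np.sum(weights[:, None] * X ** 2) / N - log_mu ** 2  # Eq. (20)
    Sigma = np.einsum('m,mnd,mne->de', weights, R, R) / Np1       # Eq. (19)
    return np.exp(log_mu), Sigma, sigma2
```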

3. MCMC approximation of the Q-function

In this section we propose a suitable approximation of the Q-function by using an MCMC algorithm, and we compare this approach with the one based on SISR developed in Trevezas and Cournède (2013), as well as with two improvements of the latter algorithm that will be briefly described in Section 3.2.
3.1. E-step via Markov Chain Monte Carlo
At each iteration of the EM-algorithm, the basic problem is to sample in the most effective way from $p_{\theta'}(q_{1:N} \mid q_0, y_{0:N})$, where $\theta'$ is the current value of the model parameters. For the rest of this paper, we drop the index $\theta'$, since we focus on the general sampling problem ($\theta'$ is known and fixed at each iteration). Thus, conditionally on $Q_0 = q_0$ and $Y_{0:N} = y_{0:N}$, the hidden states are sampled from:

$$ Q_{1:N} \sim p(q_{1:N} \mid q_0, y_{0:N}) \propto p(q_{1:N}, y_{0:N} \mid q_0). \qquad (21) $$

One of the most important MCMC algorithms for sampling from a multidimensional distribution (such as (21)) is the Gibbs sampler (Geman and Geman, 1984; Gelfand and Smith, 1990). The Gibbs sampler uses only the full conditional distributions in order to sample a Markov chain whose stationary distribution is the corresponding multidimensional target. The full conditional distribution of $Q_n$ given all the other variables, denoted by $\pi_n(q_n \mid q_{0:n-1}, q_{n+1:N}) \triangleq p(q_n \mid q_{0:n-1}, q_{n+1:N}, y_{0:N})$ and corresponding to model M1, defined by Eqs. (12) and (14), can be written in the form:

$$ \pi_n(q_n \mid q_{0:n-1}, q_{n+1:N}) \propto \prod_{i=n+1}^{(n+T) \wedge N} \big[ 1 - \exp\big( -\eta\, s^{act}_{i-1}(q_n) \big) \big]^{-1} \times \exp\left\{ -\frac{1}{2\sigma^2} \left[ \left( \frac{q_n}{F_{n-1}(q_n)} - 1 \right)^2 + \sum_{i=n+1}^{(n+T) \wedge N} \left( \frac{q_i}{F_{i-1}(q_n)} - 1 \right)^2 \right] \right\} \times \exp\left\{ -\frac{1}{2} \sum_{i=(n-T+1)^+}^{n} \big( y_i - G_i(q_n) \big)^{\top} \Sigma^{-1} \big( y_i - G_i(q_n) \big) \right\}, \qquad (22) $$

where $\eta = k_B / s^{pr} > 0$, see (3), and all the other quantities that appear in this expression are explained in Section 2 and are expressed here only as functions of $q_n$. The full conditional distributions corresponding to model M2 can be defined in a similar manner.


Clearly, direct simulation from (22) is impossible. For this reason, alternative sampling techniques are required, such as a hybrid Gibbs sampler. The hybrid Gibbs sampler is a Gibbs sampler where at least one of the simulations from the full conditional distributions is replaced by a Metropolis–Hastings (MH) step (Metropolis et al., 1953; Hastings, 1970). Let $\pi_n$, $n = 1, \ldots, N$, be the densities of the unnormalized full conditional distributions given by (22), and let $f_n(z_n \mid q_{1:n-1}, q_{n+1:N})$, $n = 1, \ldots, N$, be the densities of the proposal distributions. Let also

$$ \alpha(q_n^{t-1}, z_n) = \min\left\{ 1,\ \frac{\pi_n\big(z_n \mid q_{1:n-1}^{t}, q_{n+1:N}^{t-1}\big)}{\pi_n\big(q_n^{t-1} \mid q_{1:n-1}^{t}, q_{n+1:N}^{t-1}\big)} \cdot \frac{f_n\big(q_n^{t-1} \mid q_{1:n-1}^{t}, z_n, q_{n+1:N}^{t-1}\big)}{f_n\big(z_n \mid q_{1:n-1}^{t}, q_n^{t-1}, q_{n+1:N}^{t-1}\big)} \right\} $$

denote the acceptance probability of the MH-step. The hybrid Gibbs sampler can be described as follows:

    Initialize q_{1:N}^0
    For t = 1 to M
        For n = 1 to N
            Draw z_n ~ f_n( . | q_{1:n-1}^t, q_{n+1:N}^{t-1})
            Set q_n^t = z_n with probability alpha(q_n^{t-1}, z_n);
            otherwise set q_n^t = q_n^{t-1}
        End
    End
The proposal distribution can be chosen arbitrarily, subject to the conditions that ensure convergence to the target distribution (Meyn and Tweedie, 1993; Robert and Casella, 2004). Nevertheless, the proposal distribution significantly affects the convergence rate of the Markov chain to the target distribution. The convergence is faster if the proposal is closer to the target. Moreover, it should be easy to sample from and not computationally expensive. In this paper, we used as a proposal distribution for the hidden states the one resulting from the prior transition kernel of the hidden chain under the current parameter values, given by (12) for model M1 and by (13) for model M2. In the next subsection we give numerical evidence that the MCMC–ECM algorithm based on this type of proposal distribution was generally more efficient than all the versions of the SISR–ECM. Even if the latter versions involve more informative proposal distributions (Trevezas and Cournède, 2013, p. 258) than the prior transition kernel, this seeming advantage is not enough to outweigh the gain from the smoothing property of the Gibbs sampler, where at each conditional simulation (for a given sweep of the algorithm) of a hidden state, all data are taken into account. For this reason, we kept the proposals in the Metropolis–Hastings step as simple as possible. We also tried a random-walk Metropolis–Hastings with different variances, and the results that we obtained were worse.
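For concreteness, a minimal sketch (ours, not the authors' implementation) of one sweep of this sampler: log_pi evaluates the log of the unnormalized full conditional (22), and propose draws from the prior transition kernel (12) or (13); both are hypothetical model-specific callables.

```python
import numpy as np

def mwg_sweep(q, log_pi, propose, rng):
    """One sweep of the hybrid Gibbs sampler: each hidden state q[n] is updated
    by a Metropolis-Hastings step targeting its full conditional (22).

    q       : (N+1,) current hidden path; q[0] is the (fixed) seed mass q0
    log_pi  : log_pi(n, q) -> unnormalized log full conditional of q[n]
    propose : propose(n, q) -> (z, log_fwd, log_bwd), a draw z from the
              proposal f_n together with log f_n(z | rest) and
              log f_n(q[n] | rest with z in place)
    """
    for n in range(1, len(q)):
        z, log_fwd, log_bwd = propose(n, q)
        q_old, lp_old = q[n], log_pi(n, q)
        q[n] = z
        log_alpha = log_pi(n, q) - lp_old + log_bwd - log_fwd
        if np.log(rng.uniform()) >= log_alpha:  # reject: restore old state
            q[n] = q_old
    return q
```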
3.2. MCMC versus sequential importance sampling using simulated datasets

In order to evaluate the effect of the MCMC approximation of the Q-function on parameter estimation, and to compare this approach with the one proposed by Trevezas and Cournède (2013) using SISR, we performed several tests with simulated data. Moreover, we add to this comparison two improved versions of the original SISR–ECM (which was based on multinomial resampling), obtained by improving the resampling step.
The first one consists in replacing multinomial resampling with a combination of residual and stratified resampling, as sketched below. Residual sampling is a variance reduction technique, useful whenever resampling should be performed, and is also known as remainder stochastic sampling (Whitley, 1994). Stratified sampling is based on the idea of stratification (Kitagawa, 1996), where the support (0, 1] of the uniform distribution is partitioned into different strata. This method is based on an alternative way of generating the uniform random variables needed for the inversion method in the resampling step, namely from the strata instead of just (0, 1]. Both resampling techniques dominate multinomial resampling in the sense that the conditional variances of both residual and stratified sampling are always lower than that of multinomial sampling (see, e.g., Cappé et al., 2005). It is also known that the combination of these two sampling mechanisms can only decrease the conditional variance (Cappé et al., 2005, p. 248, Remark 7.4.5). The details of the proposed implementation can be found in Cappé et al. (2005).
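A minimal sketch of the combined residual/stratified resampling step (our reading of Cappé et al., 2005, not the paper's code):

```python
import numpy as np

def stratified_draws(weights, n, rng):
    """n stratified draws from a discrete distribution: one uniform per stratum
    ((k + u_k)/n), inverted through the CDF of the weights."""
    u = (np.arange(n) + rng.uniform(size=n)) / n
    return np.searchsorted(np.cumsum(weights), u)

def residual_stratified_resample(weights, rng):
    """Residual resampling with the remainder drawn by stratified resampling:
    particle i is first copied floor(M * w_i) times (deterministic part);
    the remaining slots are filled by stratified draws from the residuals."""
    w = np.asarray(weights, dtype=float)
    M = w.size
    counts = np.floor(M * w).astype(int)
    idx = np.repeat(np.arange(M), counts)
    r = M - counts.sum()
    if r > 0:
        resid = (M * w - counts) / r           # normalized residual weights
        idx = np.concatenate([idx, stratified_draws(resid, r, rng)])
    return idx

# example: resample 5 particles
rng = np.random.default_rng(1)
print(residual_stratified_resample([0.05, 0.1, 0.15, 0.3, 0.4], rng))
```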
Table 1
Parameter values used to generate the data (for σ ∈ {0.02, 0.1}), where σ_b, σ_p and ρ are the standard deviations and the correlation coefficient of the measurement error model. The explanation of the other parameters is given in Section 2. The parameters that should be estimated are given in the first column.

    unknown:  a_b = 3,  a_p = 3,  p_p = 0.8165,  μ⁻¹ = 100,  σ_b = 0.05,  σ_p = 0.05,  ρ = 0.8
    known:    q_0 = 0.003,  a_r = 5.5,  p_r = 400,  k_B = 0.7,  t_r = 100,  T = 10,  s^{pr} = 500
    known:    e_b = 0.0083,  b_r = 2,  b_b = 2,  b_p = 2

Fig. 1. Parameter estimation during 100 ECM iterations for four model parameters, μ⁻¹ (leaf resistance), p_p (sink petiole), a_b (alpha blade), and a_p (alpha petiole), by using three independent realizations of the MCMC–ECM algorithm (σ = 0.1). In every iteration of the ECM, the sample size was fixed at 150,000. The dotted lines correspond to the parameters used to generate the data.

The second one is sequential importance sampling with partial rejection control (Liu et al., 2001), which is a combination of rejection control (Liu et al., 1998) and resampling and can be seen as a delayed resampling, as explained in Liu et al. (2001). The rejection control method enables the rejuvenation of the particles at appropriately chosen check-points. These points can be chosen statically (in advance) or dynamically (according to a control threshold). In the variant that we implemented, the check-points are selected dynamically, and rejection control takes place when the effective sample size drops below a first threshold, exactly as in the case of multinomial resampling. In particular, when a check-point is encountered, a second control threshold is computed, which in our case is the median of the weights, and particles with weights inferior to the control threshold are only accepted with probability equal to the ratio of their weights to the control threshold. When a rejection takes place in the classical rejection control algorithm, the particle is totally discarded and a new one is proposed. The procedure is repeated until the proposed particle survives all the previous check-points. Partial rejection control overcomes the computational inefficiency of this procedure by introducing a delayed resampling step. Instead of starting from scratch when a rejection takes place, new states for the current particle are only simulated from the previous check-point. The previous states of the current particle are selected by drawing one particle from those that had survived at the previous check-point. A residual and stratified resampling mechanism can also be chosen in this case for more efficiency. Further details on the proposed implementation can be found in Liu et al. (2001).
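A minimal sketch (our reading of Liu et al., 2001) of the acceptance step at a check-point, with the median of the weights as the control threshold, as in the variant implemented here:

```python
import numpy as np

def partial_rejection_control(weights, rng):
    """One rejection-control step at a check-point. A particle with weight w_i
    below the control threshold c (here the median of the weights) is kept with
    probability w_i / c; survivors get adjusted weight max(w_i, c). Rejected
    particles are to be regrown from the previous check-point (the delayed
    resampling step of partial rejection control)."""
    w = np.asarray(weights, dtype=float)
    c = np.median(w)
    keep = (w >= c) | (rng.uniform(size=w.size) < w / c)
    return np.flatnonzero(keep), np.maximum(w, c)[keep]
```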
In the following synthetic example, we present a comparison of the aforementioned competing algorithms. All the tests were run on one core of a Xeon X5650 (2.67 GHz). We generated data vectors $y_{0:N}$ from the model M1 for N = 50 and for several values of σ, and we present here the cases where σ ∈ {0.02, 0.1}. The parameter values that we used to simulate the data are presented in Table 1.
As a stopping criterion for the EM algorithm we used a predefined number of EM steps (100 EM steps), just for comparison purposes. For each independent run of the algorithm, the sample size was increased piecewise linearly (with increasing slope) for the first 50 iterations (starting from 250, then increased by 10 for the first 25 iterations and by 20 for the subsequent 25 iterations), and for the last 50 iterations we used a quadratic increase until we reached a size of 10,000. In the next section a more sophisticated method for the specification of the number of EM steps and of the Monte Carlo sample size in each EM step is proposed. The burn-in period for the MCMC was fixed at 500 iterations. For a similar type of simulation schedule and some discussion of alternatives, see Cappé et al. (2005).

In such a model the theoretical solution of the MLE is unknown, but several runs from different initial values can be performed and, if more than one solution exists, the estimated log-likelihood values corresponding to the different solutions are compared. Moreover, the convergence properties of these algorithms have only been established for likelihood functions that belong to (curved) exponential families (Fort and Moulines, 2003), which is not generally the case here. Nevertheless, a good approximation of the theoretical solution can be obtained in a preliminary stage by simulating (any stochastic version can be used) with a very large sample size (Cappé et al., 2005). In Fig. 1, we give an example of this procedure for one of the tests presented here. Notice that the parameter paths which result from 3 independent realizations of the algorithm are almost identical.


Fig. 2. Boxplots of the estimates of six parameters (row-wise: μ⁻¹, p_p, a_b, a_p, σ_b and ρ) based on the synthetic example of the paper when σ = 0.02 for the four competing algorithms: sequential importance sampling with (i) multinomial resampling (SISmR), (ii) residual and stratified resampling (SISrsR), (iii) partial rejection control (SISprc) and finally Markov chain Monte Carlo (MCMC). The estimates are based on 200 independent runs for each algorithm and averaging is used for the last 25 ECM iterations of each run.
Table 2
Mean CPU execution times (time/run) for all the competing algorithms and both values of σ ∈ {0.02, 0.1} corresponding to the synthetic example presented in Figs. 2 and 3.

                SISmR      SISrsR     SISprc     MCMC
    σ = 0.02    5 m 27 s   5 m 50 s   6 m 24 s   7 m 19 s
    σ = 0.10    6 m 43 s   6 m 35 s   7 m 48 s   8 m 35 s

In order to judge the effectiveness of parameter estimation for the different algorithms, we performed 200 independent runs of the algorithms. The sample distribution of the resulting estimates is a good index of the Monte Carlo error associated with the solution obtained from a single run of each algorithm under a given simulation schedule. We present in Figs. 2 and 3 the boxplots of the solutions from the independent runs, for σ = 0.02 and σ = 0.1 respectively. The ranking of the algorithms w.r.t. their concentration around the mean value (the one obtained by averaging the independent realizations) was the same when different datasets were generated from the same parameter. In all our tests the worst performance was that of the SISR with multinomial resampling and the best performance was that of the MCMC. The other types of SISR (residual and stratified resampling versus partial rejection control) had similar performance. All algorithms give similar means for both values of σ, and the means were closer when σ was smaller. We also noticed in some supplementary tests that as σ increases the superiority of MCMC–ECM becomes clearer, that is, as compared to the other algorithms, it gives much more concentrated estimates of the structural parameters (the first four) over independent runs of the algorithms. In these examples, for very small values of σ, i.e., σ < 0.01, all the algorithms achieved comparable results.

Notice also that the mean estimates that we obtain for the structural parameters with both algorithms are closer to the true ones (see also Table 1) when σ = 0.02, and this could be expected since, as σ (directly related to the model uncertainty) increases, the uncertainty in the values of the structural parameters becomes larger.

The mean CPU time per run is given for all the algorithms and both values of σ in Table 2. We remark that the MCMC–ECM is slightly more computationally expensive, but since it has significantly less Monte Carlo error, its overall performance is the best one.
We also tested the effect of the averaging technique developed by Fort and Moulines (2003) (see also Cappé et al., 2005, p. 407). The authors proposed to smooth parameter estimates during the last iterations of the ECM algorithm by using a weighted moving average of the last ECM-updates. The averaging starts near the convergence region at iteration $t_0$, and the subsequent averaged parameter estimates $\tilde{\theta}^{(t)}$ are given by $\tilde{\theta}^{(t)} = \sum_{u=t_0}^{t} w_u \theta^{(u)}$, $t_0 < t \le t_f$, where the $\theta^{(u)}$ are the ECM-updates, the $w_u$ their corresponding weights (proportional to the Monte Carlo sample size used at iteration $u$), and $t_f$ is the total number of ECM iterations. This technique is typically used when the simulation noise at convergence is still significant. We tested three different scenarios: no averaging, and averaging over the last 25 or 50 iterations, for both values of σ ∈ {0.02, 0.1}. The best results were obtained for the majority of the parameters by averaging the solutions from the 25 last iterations. Averaging also acts differently for the different values of σ. When σ = 0.02, averaging improved most of the estimators of all the algorithms with respect to the standard error. Finally, if we increase the size of the averaging window too much (from 25 to 50), then, although the effect is not significant, the improvement decreases. This is reasonable since averaging should be used near the convergence region and not too early.

Fig. 3. Boxplots of the estimates of six parameters (row-wise: μ⁻¹, p_p, a_b, a_p, σ_b and ρ) based on the synthetic example of the paper when σ = 0.1 for the four competing algorithms: sequential importance sampling with (i) multinomial resampling (SISmR), (ii) residual and stratified resampling (SISrsR), (iii) partial rejection control (SISprc) and finally Markov chain Monte Carlo (MCMC). The estimates are based on 200 independent runs for each algorithm and averaging is used for the last 25 ECM iterations of each run.
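A minimal sketch (ours) of this averaging scheme, with weights proportional to the Monte Carlo sample sizes:

```python
import numpy as np

def averaged_estimates(theta_updates, sample_sizes, t0):
    """Weighted moving average of ECM updates (Fort and Moulines, 2003):
    from iteration t0 on, replace theta^(t) by the average of theta^(t0..t)
    weighted by the Monte Carlo sample sizes m_u."""
    theta = np.asarray(theta_updates, dtype=float)   # (t_f, dim) ECM updates
    m = np.asarray(sample_sizes, dtype=float)        # (t_f,) MC sample sizes
    out = theta.copy()
    for t in range(t0, len(theta)):
        w = m[t0:t + 1] / m[t0:t + 1].sum()          # normalized weights w_u
        out[t] = w @ theta[t0:t + 1]
    return out
```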
The two approaches (with and without averaging) have been tested on several sets of parameters and, in all of them, both returned similar mean estimates over independent runs of the algorithms. Nevertheless, their standard errors depend on the value of σ and on the algorithm employed. In the examples that we ran, the MCMC–ECM gave smaller standard errors than the other algorithms except for very small σ, but such values are not appropriate for the model fitting with the real dataset, as will be explained in Section 5.
Another advantage of the MCMC approach concerns the number of data taken into account for the estimation. For a large value of T, the SISR versions that we presented can generally take into account only some (and not all) of the organs that had not reached their full expansion stage when the plant was cut (the immature members). The reason behind this is that the underlying hidden Markov process is T-dependent, and consequently the last weights associated with the particles in a sequential implementation could degenerate before taking into account all the data. We refer to Trevezas and Cournède (2013) for further details of this implementation. In Eq. (4.5) of the above reference, the following result holds for the final weights of the improved filter:

$$ w_{N-T+1}^{(i)} = w_{N-T}^{(i)}\, p_\theta\big( y_{N-T+1} \mid q_{N-T:N-1}^{(i)} \big) \prod_{n=N-T+2}^{N} p_\theta\big( y_n \mid q_{n:N-1}^{(i)}, \tilde{q}_N^{(i)} \big), $$

where $\{w_{N-T}^{(i)}, q_{N-T:N-1}^{(i)}\}_{i=1}^{M}$ stands for the available weighted sample one iteration before the last update, and the $\tilde{q}_N^{(i)}$ are the final proposed particle positions. It is clear that, since the last product has T − 1 factors, a practical implementation of this filter needs to stop the algorithm when the effective sample size (ESS) falls below a threshold for the first time (see Trevezas and Cournède, 2013 for the explanation of the ESS). This is the reason why some data may be lost, and this could be a serious problem for large values of T. When the MCMC algorithm is used, this problem does not exist. In this example we excluded from the data vector all the immature members in order to compare both algorithms on the same data.
In order to evaluate the aggregated performance of the MCMC–ECM algorithm, we also generated multiple datasets from the same model M1 for the two different values of σ ∈ {0.02, 0.1}. Since the variability of parameter estimates between different datasets, which is related to the distribution of the MLE, is expected to be much larger than the one that we obtain within the same dataset (due to the Monte Carlo error), the general performance of all the algorithms presented in this section should be comparable. For this reason, in this test, together with the performance of the MCMC–ECM, we only present the results from one representative of the class of SISR–ECM algorithms, the one which uses partial rejection control. Note that if we assume that the solution that we obtain with any of these algorithms is a good approximation of the MLE corresponding to each dataset, then this test also gives an idea of the properties of the MLE. In Fig. 4, we present the results for both algorithms based on 200 datasets.

The results showed that both algorithms are identical in their mean performance for all the parameters, without any atypical behavior. Also, on the one hand, the mean estimates were very close to the true ones and no significant bias for any of the parameters was detected; on the other hand, the uncertainty in the parameter estimates was relatively small. This test also revealed that the increase in the noise level from 0.02 to 0.1 increased the uncertainty in the parameter estimation of three parameters: the leaf resistance μ⁻¹ and those involved in the allocation, that is, a_b and a_p.

Fig. 4. Boxplots of the estimates of six parameters (μ⁻¹, p_p, a_b, a_p, σ_b and ρ) based on 200 independent datasets generated from the same model M1 for the two different values of σ: σ = 0.02 in the first column and σ = 0.1 in the second column. For each subfigure, the left boxplot corresponds to the estimates from the SISR–ECM with partial rejection control and the right boxplot to the MCMC–ECM.
4. Ascent-based MCMC–ECM algorithm

In the previous section we did not emphasize the specification of the Monte Carlo sample size in each ECM step and/or the number of ECM steps. It is known that, if the MC sample size remains constant across EM iterations, the MCEM algorithm will not converge due to a persistent Monte Carlo error, unless this size is unnecessarily large. Moreover, it is inefficient to start with a large sample size, since the initial guess of the MLE may be far from the true value (Wei and Tanner, 1990). Many authors use a deterministic increase in the MC sample size (independently of the data) and stop the algorithm after a predefined number of EM steps (McCulloch, 1994, 1997; Chan and Kuk, 1998). Nevertheless, these are not the most effective ways to tackle these problems. Thus, several data-driven strategies have been proposed recently in the literature.
4.1. Data-driven automated stochastic EM algorithms

Over the last decades, data-driven strategies have been proposed in the literature to control the Monte Carlo sample size and the number of EM steps. In Booth and Hobert (1999), an automated procedure was proposed to determine at each EM step whether the Monte Carlo sample size should be increased or not. This procedure concerns those Monte Carlo EM algorithms for which the random sample in the E-step is simulated either by exact draws from the corresponding conditional distribution or by importance sampling from a proposal distribution close enough to the exact one. Based on the random sample of each step t, an asymptotic confidence interval for the current estimate of the parameter $\theta^{(t)}$ is constructed. If the past value $\theta^{(t-1)}$ lies in it, then the EM step is said to be swamped by the Monte Carlo error and the sample size is increased. The size of the additional sample is arbitrary (e.g. $m_t \leftarrow m_t + m_t/c$, $c = 2, 3, \ldots$). Moreover, Booth and Hobert (1999) proposed to stop the MCEM algorithm when a stopping rule is satisfied for three consecutive iterations. The most commonly used stopping criterion is a sufficiently small relative change in the parameter values.
The automated Monte Carlo EM algorithm of Booth and Hobert (1999) was generalized from independent to dependent samples by Levine and Casella (2001). One basic difficulty which arises with dependent samples is how to determine the aforementioned confidence interval. In this direction, the authors in Levine and Casella (2001) evoke a central limit theorem (see Theorem 1, Levine and Casella, 2001) on the basis of the subsampling scheme of Robert et al. (1999). In particular, the Monte Carlo sample size is increased if at least one of the estimated partial derivatives of the Q-function with respect to $\theta^{(t-1)}$, computed on the basis of the subsample, lies in the appropriately designed confidence interval.

Following the steps of Booth and Hobert (1999) and Levine and Casella (2001), the authors in Levine and Fan (2004) proposed an alternative automated MCMC–EM algorithm. The method of increasing the sample size is based as well on the construction of an appropriate confidence interval. The main innovation of this paper is that the authors give a specific formula for quantifying the increase in the MC sample size. In this approach, the EM procedure should be applied twice at each iteration, once for the complete sample and once for the subsample. This is not an issue when the overall implementation of the EM algorithm is not time consuming, but if, for example, a numerical maximization is needed for the M-step, this method could be computationally expensive.

In the rest of this subsection we present the data-driven automated MCEM algorithm proposed by Caffo et al. (2005), which is computationally cheap and can be easily adapted to our case, where numerical maximization is involved as well.
Now, we give a short description of the basic ideas of the algorithm. Let

$$ \Delta Q = Q(\theta^{(t)}; \theta^{(t-1)}) - Q(\theta^{(t-1)}; \theta^{(t-1)}), \qquad (23) $$

$$ \Delta\hat{Q} = \hat{Q}(\theta^{(t)}; \theta^{(t-1)}) - \hat{Q}(\theta^{(t-1)}; \theta^{(t-1)}), \qquad (24) $$

where $Q$ corresponds to the true Q-function of the model and $\hat{Q}$ to its estimate given by (16), the approximation being based on the $m_t$-sample generated at the t-th iteration of the EM. The most important feature of this algorithm is that it is an ascent-based Monte Carlo EM algorithm, since the main goal is to recover with high probability the ascent property of the EM. This means that the MC sample size should be chosen throughout the iterations in such a way that $\Delta Q > 0$ with high probability. The authors claim that $\Delta\hat{Q}$ is a strongly consistent estimator of $\Delta Q$, and by evoking the appropriate central limit theorem the following asymptotic result holds true:

$$ \sqrt{m_t}\,\big( \Delta\hat{Q} - \Delta Q \big) \xrightarrow{\ \mathcal{D}\ } \mathcal{N}(0, \sigma_Q^2), \qquad m_t \to \infty, \qquad (25) $$

where the regularity conditions and the asymptotic variance $\sigma_Q^2$ depend on the sampling mechanism employed. A sketch of the proof is given in the case where simulations result from i.i.d. draws, and a remark is made that if an MCMC algorithm is employed, then (25) holds true under stringent regularity conditions. With the help of (25) and a consistent estimator $\hat\sigma_Q^2$ of $\sigma_Q^2$, the following asymptotic lower bound (ALB) with confidence level $1 - \alpha$ can be given for $\Delta Q$:

$$ \mathrm{ALB} = \Delta\hat{Q} - \frac{\hat\sigma_Q}{\sqrt{m_t}}\, z_\alpha, \qquad (26) $$

where $z_\alpha$ is the upper $\alpha$-quantile of the standard normal distribution. In the same way, an asymptotic upper bound (AUB) with confidence level $1 - \gamma$ can also be obtained for $\Delta Q$:

$$ \mathrm{AUB} = \Delta\hat{Q} + \frac{\hat\sigma_Q}{\sqrt{m_t}}\, z_\gamma. \qquad (27) $$

The authors use (26) to decide if the current update based on the $m_t$-sample will be accepted or not. In particular:

- if ALB > 0, then with high probability $\Delta Q > 0$ and $\theta^{(t)}$ is accepted as the new update;
- if ALB ≤ 0, then $\hat{Q}$ is said to be swamped with MC error, and a new sample is appended to the existing one to obtain a new parameter estimate. A geometric increase is recommended (e.g., $m_t \leftarrow m_t + m_t/k$, $k = 2, 3, \ldots$). The process is repeated until ALB > 0 for the first time.

After the acceptance of $\theta^{(t)}$, the MC sample size for the next MCEM step is determined by using the approximation

$$ \Delta\hat{Q}_{t+1} \sim \mathcal{N}\Big( \Delta Q_{t+1},\ \frac{\sigma_Q^2}{m_{t+1}} \Big), \qquad (28) $$

where $\Delta\hat{Q}_t$ is given by (24) and $\Delta\hat{Q}_{t+1}$ corresponds to the same quantity by letting $t \to t+1$. Indeed, the size $m_{t+1}$ is chosen in such a way as to prespecify the probability of rejecting the estimate $\theta^{(t+1)}$ (ALB < 0) when $\Delta Q > 0$ (type-II error). If we set this probability equal to $\beta$ and add the logical requirement $m_t \le m_{t+1}$, then it can easily be shown from (28) that

$$ m_{t+1} = \max\big\{ m_t,\ \hat\sigma_Q^2 (z_\alpha + z_\beta)^2 / (\Delta\hat{Q}_t)^2 \big\}, \qquad (29) $$

Table 3
Parameter estimation results with the automated MCMC–ECM algorithm for the synthetic example when σ ∈ {0.1, 0.02}. Means and standard deviations
of the estimates are based on 50 independent runs. The results are obtained for different values of α, β and γ (see Eqs. (26) and (27)) and for geometric
increase in the sample size (m_t ← m_t + ⌊m_t/3⌋) when ALB ≤ 0.

σ = 0.1:

(α, β, γ)   (0.25, 0.25, 0.25)    (0.1, 0.25, 0.1)      (0.1, 0.1, 0.1)
a_b         2.8961 (7.73×10⁻³)    2.9004 (5.74×10⁻³)    2.9057 (7.31×10⁻³)
a_p         2.8996 (7.67×10⁻³)    2.9039 (5.71×10⁻³)    2.9092 (7.28×10⁻³)
P_p         0.8153 (5.00×10⁻⁶)    0.8153 (6.00×10⁻⁶)    0.8153 (5.00×10⁻⁶)
μ⁻¹         100.6717 (0.0583)     100.6421 (0.0494)     100.6004 (0.0559)
σ_b         0.0505 (1.48×10⁻⁴)    0.0506 (0.74×10⁻⁴)    0.0506 (0.64×10⁻⁴)
σ_p         0.0537 (2.02×10⁻⁴)    0.0537 (0.86×10⁻⁴)    0.0537 (0.71×10⁻⁴)
ρ           0.8536 (1.47×10⁻³)    0.8540 (0.62×10⁻³)    0.8540 (0.52×10⁻³)

σ = 0.02:

(α, β, γ)   (0.25, 0.25, 0.25)    (0.1, 0.25, 0.1)      (0.1, 0.1, 0.1)
a_b         2.9728 (1.61×10⁻³)    2.9744 (1.43×10⁻³)    2.9749 (1.39×10⁻³)
a_p         2.9755 (1.28×10⁻³)    2.9736 (1.45×10⁻³)    2.9750 (1.23×10⁻³)
P_p         0.8153 (5.00×10⁻⁶)    0.8153 (7.00×10⁻⁶)    0.8153 (5.00×10⁻⁶)
μ⁻¹         100.2758 (0.0094)     100.2671 (0.0087)     100.2637 (0.0088)
σ_b         0.0477 (2.00×10⁻⁵)    0.0477 (4.80×10⁻⁵)    0.0477 (2.50×10⁻⁵)
σ_p         0.0504 (2.90×10⁻⁵)    0.0504 (3.70×10⁻⁵)    0.0504 (2.21×10⁻⁵)
ρ           0.8284 (2.25×10⁻³)    0.8283 (2.96×10⁻³)    0.8284 (1.60×10⁻³)

The last requirement is the stopping criterion: the MCEM algorithm stops
if AUB < δ, where AUB is given in (27) and δ is a predefined small constant. If this criterion is satisfied, then the change in
the Q-function is acceptably small. The adaptation of this approach to the case of the MCMC–ECM that we propose in this
paper is straightforward, as long as a method for estimating the variance σ_Q² is available.
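
To make the above decision rules concrete, the following sketch (in Python) outlines one iteration of such an ascent-based MCMC–ECM step. It is only an illustration under our own naming conventions: `sample`, `cm_update`, `estimate_Q` and `estimate_sigma2` are hypothetical placeholders for the model-specific MCMC simulation of the hidden states, the conditional M-steps, the Monte Carlo approximation of the Q-function and an estimator of σ_Q² (e.g., the CBM estimator discussed next); they are not part of the original algorithm's interface.

    import math
    from scipy.stats import norm

    def ascent_ecm_step(theta_prev, m0, alpha, beta, gamma, tol,
                        sample, cm_update, estimate_Q, estimate_sigma2, k=3):
        """One ascent-based MCMC-ECM iteration (sketch after Caffo et al., 2005).

        Returns the accepted update, the MC sample size for the next
        iteration (Eq. (29)) and a flag for the stopping rule AUB < tol."""
        z_a, z_b, z_g = norm.ppf(1 - alpha), norm.ppf(1 - beta), norm.ppf(1 - gamma)
        draws = sample(theta_prev, m0)        # list of MCMC draws of hidden states
        while True:
            m = len(draws)
            theta_new = cm_update(draws, theta_prev)      # conditional M-steps
            # estimated increase of the Q-function, Eq. (24)
            dQ = (estimate_Q(theta_new, theta_prev, draws)
                  - estimate_Q(theta_prev, theta_prev, draws))
            sigma = math.sqrt(estimate_sigma2(draws, theta_new, theta_prev))
            if dQ - z_a * sigma / math.sqrt(m) > 0:       # ALB > 0, Eq. (26)
                break                                     # ascent recovered w.h.p.
            draws += sample(theta_prev, m // k)           # append: m <- m + m/k
        # sample size controlling the type-II error at level beta, Eq. (29)
        m_next = max(m0, math.ceil(sigma ** 2 * (z_a + z_b) ** 2 / dQ ** 2))
        stop = dQ + z_g * sigma / math.sqrt(len(draws)) < tol  # AUB < tol, Eq. (27)
        return theta_new, m_next, stop

Iterating this step until the stop flag is raised reproduces the overall scheme; the tolerance plays the role of the predefined constant δ in the stopping rule.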
There are several methods for estimating σ_Q² (see, e.g., Geyer, 1992). One of the most well known relies on the spectral
estimator, which involves the estimation of autocorrelations weighted by a prespecified function; a presentation of different
choices of weight functions can be found in Priestley (1981). Another popular method is batch means (BM) (Bratley et al.,
1987), which is based on the division of the MC sample into a predefined number of batches of equal size. The batch means
are treated as independent, which is only approximately true if the length of each batch is much longer than the characteristic
mixing time of the chain. If the batch size is allowed to increase with respect to the sample size m, then this method is referred
to as consistent batch means (CBM). Usually, the batch size is set equal to ⌊m^l⌋, where l = 1/2 or l = 1/3. An alternative method
for variance estimation is based on regenerative simulation (RS) (see Mykland et al., 1995), where random times at which the
Markov chain probabilistically restarts itself are identified. In fact, the CBM can be viewed as an ad hoc version of the RS method
(see Jones and Hobert, 2001). Both methods split the sample into pieces, with the difference that the RS method guarantees that
the pieces are truly independent. Nevertheless, the conditions of RS are hard to verify. The different variance estimation methods
are compared in several papers (see Jones and Hobert, 2001, Jones et al., 2006 and Flegal and Jones, 2010). In Jones and
Hobert (2001), the authors concluded that CBM and RS give similar results. Despite the theoretical advantages of RS and of the
spectral estimator, we adopt in the proposed algorithm the CBM method, which is significantly simpler and faster in practice.
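
As an illustration of the variance estimator we adopt, a minimal CBM computation might look as follows. This is a sketch under the assumption that the per-draw contributions to ΔQ̂ are available as a numeric sequence; the function name and interface are ours, not from the paper.

    import numpy as np

    def cbm_variance(x, l=0.5):
        """Consistent batch means (CBM) estimate of the asymptotic variance
        sigma^2 in a Markov chain CLT for the mean of x, with batch size
        floor(m**l) and l = 1/2 or 1/3, as discussed above."""
        x = np.asarray(x, dtype=float)
        m = x.size
        b = max(1, int(m ** l))                  # batch size grows with m
        a = m // b                               # number of batches
        if a < 2:
            raise ValueError("too few batches for a variance estimate")
        batch_means = x[:a * b].reshape(a, b).mean(axis=1)
        # b times the sample variance of the batch means estimates sigma^2
        return b * np.sum((batch_means - batch_means.mean()) ** 2) / (a - 1)

In the proposed algorithm, σ_Q² would then be estimated by applying such an estimator to the (correlated) MCMC sequence underlying ΔQ̂.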
4.2. The automated MCMC–ECM algorithm in simulated datasets

In order to evaluate the performance of the automated MCMC–ECM, we performed the same synthetic tests as the ones
presented in Section 3.2. The augmentation rule for the Monte Carlo sample size is given by (29). As a stopping criterion
we used AUB < 10⁻³; see (27). The initial sample size was fixed at 250 and the burn-in period at 500. Our final estimates
for all the tests were based on means from 50 independent runs. In Table 3, the parameter estimates and the corresponding
standard errors are presented for different combinations of the asymptotic levels α, β and γ, for σ = 0.1 and σ = 0.02;
see (26) and (27). For each such combination, we compared two rates of geometric increase in the sample size
(m_t ← m_t + ⌊m_t/k⌋, for k = 2, 3) when ALB ≤ 0, see (26); the results are quite similar, so we present here only the case
where m_t ← m_t + ⌊m_t/3⌋.
Moreover, in Table 4, the corresponding descriptive statistics for the total sample size (TSS), the final sample size (FSS)
and the number of ECM iterations (Iter) until convergence are given. Note that, since the algorithm is automated, the final
sample size and the number of iterations until convergence differ among independent realizations. For this reason we
also present in Table 5 the effect of weighting the estimates from independent runs with weights proportional to their final
sample size.
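
The weighting used in Table 5 is elementary; as a sketch (the function name and argument names are ours):

    import numpy as np

    def fss_weighted_mean(estimates, final_sizes):
        """Combine per-run parameter estimates (one row per run) with
        weights proportional to the final MC sample size (FSS) of each run."""
        w = np.asarray(final_sizes, dtype=float)
        return (w / w.sum()) @ np.asarray(estimates, dtype=float)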
For all the tested values of α, β and γ, the best results with respect to the standard errors were obtained in the majority
of the cases for α = β = γ = 0.1 and then for (α, β, γ) = (0.1, 0.25, 0.1), as expected (with some exceptions). This is better
reflected in the parameters of the measurement error.


Table 4
Descriptive statistics for the total sample size (TSS), the final sample size (FSS) and the number of iterations until convergence (Iter) corresponding to the
tests given in Table 3. The mean execution times are also given.

             σ = 0.1                                               σ = 0.02
(α, β, γ)    (0.25,0.25,0.25)  (0.1,0.25,0.1)  (0.1,0.1,0.1)      (0.25,0.25,0.25)  (0.1,0.25,0.1)  (0.1,0.1,0.1)
Min TSS      49 247            90 505          137 264            3685              9895            7193
Mean TSS     250 800           593 939         794 573            22 413            41 078          59 441
Max TSS      637 305           1 401 011       1 936 174          89 009            159 998         235 068
Min FSS      3405              8508            13 309             1066              2988            3861
Mean FSS     16 495            48 803          65 199             6314              12 106          17 465
Max FSS      56 985            164 524         169 807            27 098            40 842          66 197
Min Iter     46                45              28                 12                10              8
Max Iter     86                74              66                 27                18              18
Mean time    10 m 42 s         27 m 05 s       28 m 13 s          1 m 15 s          3 m 06 s        3 m 28 s

Table 5
Parameter estimation results with the automated MCMC–ECM algorithm for the synthetic example when σ ∈ {0.1, 0.02}. Means and standard deviations
of the weighted estimates based on 50 independent runs, where the estimates have weights proportional to the final sample size. The results are obtained
for different values of α, β and γ (see Eqs. (26) and (27)). The sample size was increased as m_t ← m_t + ⌊m_t/3⌋ when ALB ≤ 0.

σ = 0.1:

(α, β, γ)   (0.25, 0.25, 0.25)    (0.1, 0.25, 0.1)      (0.1, 0.1, 0.1)
a_b         2.8949 (6.46×10⁻³)    2.8988 (4.85×10⁻³)    2.9039 (5.31×10⁻³)
a_p         2.8984 (6.41×10⁻³)    2.9023 (4.82×10⁻³)    2.9074 (5.30×10⁻³)
P_p         0.8153 (5.00×10⁻⁶)    0.8153 (5.00×10⁻⁶)    0.8153 (5.00×10⁻⁶)
μ⁻¹         100.6825 (0.0481)     100.6528 (0.0372)     100.6140 (0.0393)
σ_b         0.0505 (1.45×10⁻⁴)    0.0505 (6.70×10⁻⁵)    0.0506 (6.30×10⁻⁵)
σ_p         0.0537 (1.91×10⁻⁴)    0.0537 (0.83×10⁻⁴)    0.0537 (0.67×10⁻⁴)
ρ           0.8537 (1.38×10⁻³)    0.8538 (5.97×10⁻⁴)    0.8541 (4.77×10⁻⁴)

σ = 0.02:

(α, β, γ)   (0.25, 0.25, 0.25)    (0.1, 0.25, 0.1)      (0.1, 0.1, 0.1)
a_b         2.9726 (1.44×10⁻³)    2.9742 (1.27×10⁻³)    2.9745 (1.14×10⁻³)
a_p         2.9751 (1.15×10⁻³)    2.9734 (1.29×10⁻³)    2.9748 (1.02×10⁻³)
P_p         0.8153 (5.00×10⁻⁶)    0.8153 (6.00×10⁻⁶)    0.8153 (5.00×10⁻⁶)
μ⁻¹         100.2767 (0.0080)     100.2685 (0.0078)     100.2660 (0.0067)
σ_b         0.0477 (1.70×10⁻⁵)    0.0477 (3.60×10⁻⁵)    0.0478 (2.20×10⁻⁵)
σ_p         0.0504 (2.30×10⁻⁵)    0.0504 (3.10×10⁻⁵)    0.0504 (1.90×10⁻⁵)
ρ           0.8284 (1.77×10⁻³)    0.8283 (2.47×10⁻³)    0.8284 (1.54×10⁻³)

However, if we run the algorithm with all the levels set to 0.1, a great computational cost is involved (see Table 4), which is
not compensated by the gain in precision. For this reason it could be wiser to decrease the asymptotic confidence levels
(by increasing α, β and γ) to obtain a rapid algorithm with an acceptable precision.
Furthermore, the weighted averages (see Table 5) with respect to the final sample size generally decreased the standard
deviations, independently of the values of the asymptotic levels.
It is noteworthy that the automated algorithm gives mean estimates which are closer to the real values than the original
MCMC–ECM algorithm. On the other hand, even the best automated algorithm gives more variable estimates than the
non-automated one with independent runs of the algorithm. This could be expected due to the variability in the final sample
size and in the number of iterations until convergence of the automated algorithm. The main point here is that the resulting
estimators are of acceptable accuracy in significantly fewer ECM steps, and thus in less CPU time, provided that the asymptotic
levels are not set too low. Combined with the fact that the automated algorithm uses the computational resources
efficiently, this is very important for a routine use of the algorithm.
5. Application to a real dataset and model comparison
In this section, we present an application of our method to experimental data from the sugar-beet. The experimental
protocol is presented in detail in Lemaire et al. (2008). This real-data case was presented in Trevezas and Cournède (2013)
to motivate the use of a hidden Markov model as the best choice among competing models. The current data contain
mass measurements from 42 blades and petioles, assumed to have expansion durations T = 10. With this assumption,
all measurements correspond to leaves which have completed their expansion when the plant was cut. The measurements
are given for reference in Table 6. The parameters are divided into two categories: those which were calibrated directly in
the field and the unknown parameters that have to be estimated. In Table 7 we give the values of the fixed parameters and
the initial values that we used for the parameters that have to be estimated (determined in a preliminary searching stage).
In Table 8 we present the parameter estimation results that we obtained for the four competing algorithms, sequential
importance sampling with (i) multinomial resampling (SISmR), (ii) residual and stratified resampling (SISrsR), (iii) partial
rejection control (SISprc), and (iv) Markov chain Monte Carlo (MCMC), by fitting the real data with the model M1. The
corresponding mean CPU times are given as well. The details of the implementation are given in Section 3.


Table 6
A dataset from the sugar-beet plant. Mass measurements from 42 blades (bl) and petioles (pe).

leaf   1      2      3      4      5      6      7      8      9      10     11     12     13     14
bl     0.021  0.069  0.084  0.138  0.246  0.414  0.604  0.85   0.892  0.99   1.398  1.627  1.568  1.774
pe     0.01   0.014  0.023  0.045  0.079  0.29   0.475  0.529  0.537  0.649  0.857  0.988  1.059  1.216

leaf   15     16     17     18     19     20     21     22     23     24     25     26     27     28
bl     1.728  1.625  1.349  1.297  1.212  1.184  1.097  1.028  0.943  0.856  0.744  0.615  0.555  0.476
pe     1.317  1.263  1.154  1.204  1.134  1.106  1.056  0.964  0.904  0.889  0.797  0.687  0.655  0.532

leaf   29     30     31     32     33     34     35     36     37     38     39     40     41     42
bl     0.422  0.361  0.326  0.277  0.238  0.191  0.179  0.15   0.124  0.117  0.079  0.089  0.106  0.095
pe     0.52   0.471  0.392  0.365  0.296  0.241  0.242  0.186  0.167  0.126  0.091  0.094  0.094  0.083

Table 7
Initial values for both unknown and fixed parameters used to initialize the algorithms in the real data case, where σ_b, σ_p and ρ are the standard
deviations and the correlation coefficient of the measurement error model (see Section 2 for the explanation of the other parameters).

param. (unknown)   value     param. (known)   value     param. (known)   value
a_b                2.829     σ                0.1       e_b              0.0083
a_p                1.813     a_r              3.1       s_pr             500
p_p                0.8139    p_r              329.48    b_b              2
μ⁻¹                97.95     k_B              0.7       b_p              2
σ_b                0.076     t_r              60        b_r              2
σ_p                0.059     T                10
ρ                  0.136     q_0              0.003

Table 8
Parameter estimation results based on the real dataset. Means and standard deviations of the estimates are based on 50 independent runs for the four
competing algorithms: sequential importance sampling with (i) multinomial resampling (SISmR), (ii) residual and stratified resampling (SISrsR), (iii) partial
rejection control (SISprc), and (iv) Markov chain Monte Carlo (MCMC). Averaging is used for the last 25 ECM iterations of each run of the algorithm. The mean
CPU times are also provided.

param.   Mean                                         Standard deviation
         SISmR     SISrsR    SISprc    MCMC           SISmR        SISrsR       SISprc       MCMC
a_b      2.8296    2.8306    2.8219    2.8379         0.0296       0.0239       0.0274       0.0128
a_p      1.8037    1.8241    1.8173    1.8297         0.0208       0.0175       0.0196       0.0091
P_p      0.8150    0.8147    0.8147    0.8147         1.24×10⁻⁴    1.13×10⁻⁴    1.18×10⁻⁴    0.25×10⁻⁴
μ⁻¹      98.2648   98.0964   98.2490   97.9714        0.545        0.4286       0.5001       0.2336
σ_b      0.0752    0.0762    0.0763    0.0761         2.77×10⁻⁴    2.18×10⁻⁴    2.18×10⁻⁴    1.12×10⁻⁴
σ_p      0.0588    0.0589    0.0588    0.0589         1.18×10⁻⁴    1.07×10⁻⁴    1.09×10⁻⁴    0.52×10⁻⁴
ρ        0.1373    0.1258    0.1275    0.1246         3.02×10⁻³    2.82×10⁻³    2.64×10⁻³    1.23×10⁻³

Mean time   6 m 22 s   6 m 28 s   7 m 21 s   8 m 43 s

The parameter σ² represents a standard level of uncertainty for the mean biophysical model given by (3). The value σ = 0.1
corresponds to the model which best fits the data, as shown in Trevezas and Cournède (2013). In Table 9 we present the parameter
estimation results that we obtained with the automated MCMC–ECM algorithm. The details of the implementation are given
in Section 4. Moreover, in Table 10 the corresponding descriptive statistics for the total sample size (TSS), the final sample
size (FSS) and the number of ECM iterations (Iter) until convergence are given.

We remark that the mean parameter estimates that we obtained with the different variants of SISR–ECM, the MCMC–ECM
and the automated MCMC–ECM algorithm are similar. We reach the same conclusion even if we use the averaging
techniques. Nevertheless, notice in Table 8 that the standard deviations of the estimates among independent
realizations are roughly two to six times smaller with the MCMC–ECM than with any variant of SISR–ECM. The gain in
precision is an important advantage of the MCMC–ECM, since the CPU time needed for a single run is only slightly larger for
the MCMC–ECM (by a mean factor of 1.2–1.4 in our tests) than for the other algorithms. The results of the automated
MCMC–ECM algorithm are presented in Table 9. The choice α = β = γ = 0.25 results in significantly smaller standard
deviations than the SISR variants, and slightly smaller than the non-automated MCMC. When the parameters α, β and γ
decrease, then, as expected, the standard deviations decrease, since the final and the total Monte Carlo sample sizes increase.
Notice also in Table 10 that smaller values of α, β and γ decrease the total number of ECM steps until convergence. The
advantages of the automated algorithm cannot be counterbalanced by using averaging in the non-automated algorithm, as
we can see in Table 8. Consequently, the choice of a single run of an automated MCMC–ECM is very reasonable, even with
the choice α = β = γ = 0.25. Nevertheless, depending on the desired accuracy, it is always possible to combine a small
number of independent runs to obtain weighted mean estimates. Furthermore, the automated algorithm indeed makes an
efficient use of Monte Carlo resources, and there is no need to determine a priori the total number of ECM steps or how the
Monte Carlo sample size should be increased.


Table 9
Parameter estimation results based on the real dataset with the automated MCMC–ECM algorithm. Means and standard deviations of the estimates are
based on 50 independent runs. The results are obtained for different values of the asymptotic levels α, β and γ; see relations (26) and (27). Averaging is
used for the last 25 ECM iterations of each run of the algorithm.

         Mean                                               Standard deviation
(α,β,γ)  (0.25,0.25,0.25)  (0.1,0.25,0.1)  (0.1,0.1,0.1)    (0.25,0.25,0.25)  (0.1,0.25,0.1)  (0.1,0.1,0.1)
a_b      2.8372            2.8345          2.8346           0.0120            0.0078          0.0068
a_p      1.8290            1.8268          1.8264           0.0087            0.0054          0.0045
P_p      0.8147            0.8147          0.8147           0.57×10⁻⁴         0.49×10⁻⁴       0.47×10⁻⁴
μ⁻¹      97.9824           98.0311         98.0220          0.2076            0.1473          0.1307
σ_b      0.0761            0.0761          0.0761           2.03×10⁻⁴         1.18×10⁻⁴       1.43×10⁻⁴
σ_p      0.0589            0.0589          0.0588           0.87×10⁻⁴         0.49×10⁻⁴       0.63×10⁻⁴
ρ        0.1253            0.1257          0.1267           2.62×10⁻³         1.74×10⁻³       1.86×10⁻³

Table 10
Descriptive statistics for the total sample size (TSS), the final sample size (FSS) and the number of iterations
until convergence (Iter) corresponding to the tests given in Table 9. The mean CPU times are also provided.

(α, β, γ)    (0.25, 0.25, 0.25)   (0.1, 0.25, 0.1)   (0.1, 0.1, 0.1)
Min TSS      7376                 4504               4530
Mean TSS     30 173               41 645             36 744
Max TSS      66 311               105 438            230 250
Min FSS      2675                 3158               3561
Mean FSS     9570                 19 354             22 487
Max FSS      20 820               47 450             160 257
Min Iter     6                    4                  3
Max Iter     21                   15                 8
Mean time    2 m 24 s             6 m 22 s           4 m 18 s

Table 11
MLE obtained with the models M1 and M2, in each case with the correlation coefficient ρ either null or treated as a free parameter, for the sugar-beet
dataset. In the last two columns the corrected Akaike information criterion (AICc) and the Bayesian information criterion (BIC) are estimated based on
100 samples of 5×10⁵ independent evaluations. The standard deviations are given in parentheses. The above criteria are given by
AICc = −2(log L − d) + 2d(d + 1)/(n − d − 1) and BIC = −2 log L + d log n, where d is the number of free parameters and n the sample size.

Model          a_b    a_p    P_p     μ⁻¹    σ_b     σ_p     ρ       AICc             BIC
M1 (ρ = 0)     2.836  1.852  0.8142  98.48  0.0750  0.0591  0.0000  −344.13 (0.03)   −330.63 (0.03)
M1 (ρ free)    2.837  1.829  0.8147  97.98  0.0761  0.0589  0.1253  −342.17 (0.02)   −326.63 (0.02)
M2 (ρ = 0)     3.019  2.044  0.8031  98.76  0.1585  0.2119  0.0000  −334.72 (0.03)   −321.23 (0.03)
M2 (ρ free)    3.139  2.172  0.8051  96.83  0.1647  0.2114  0.3380  −336.18 (0.04)   −320.64 (0.04)

In the last part of this section we present the results of the model comparison when fitting the experimental data
presented in Table 6. Two types of models, referred to as models M1 and M2, were considered in this paper and their hidden
Markov formulation is given by Proposition 1. For each model, we distinguished the cases where the correlation coefficient ρ
between the mass measurement errors of the blade and the petiole either is a free parameter that has to be estimated
or is null; in the latter case we have one parameter less to estimate. We ran the automated
MCMC–ECM for all these models with α = β = γ = 0.25 and the obtained results are presented in Table 11. We also give
the estimated corrected Akaike information criterion (AICc) and the Bayesian information criterion (BIC) for all the models
that we tested (see, e.g., Bengtsson and Cavanaugh, 2006 and the references therein). Since the best model is the one with
the lowest values in both criteria, the additive error in the mass measurements (model M1) is better adapted than
the log-additive one (model M2). Among all of them, the variant of M1 with ρ = 0 attained the lowest values of both criteria
in this dataset. Even though we have restricted ourselves to the comparison between these models, the comparison method
is of course general, and could be applied to other formulations of the error models or of the functional models.
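
For reference, once the maximized log-likelihood of a model has been estimated (here by Monte Carlo, as indicated in the caption of Table 11), the two criteria can be computed as in the following sketch; the value n = 84 in the example is our reading of the dataset (42 blade and 42 petiole masses) and should be treated as an assumption:

    import math

    def aicc_bic(loglik, d, n):
        """Corrected AIC and BIC for a model with d free parameters and
        n observations; the best model minimizes both criteria."""
        aicc = -2.0 * (loglik - d) + 2.0 * d * (d + 1) / (n - d - 1)
        bic = -2.0 * loglik + d * math.log(n)
        return aicc, bic

    # e.g., model M1 with rho = 0 has d = 6 free parameters; with rho free, d = 7.
    # n = 84 mass measurements is an assumption (42 blades + 42 petioles).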
6. Discussion
In this paper we proposed simulation techniques based on MCMC for parameter estimation via an ECM algorithm for a
class of plant growth models which can be characterized by deterministic structural development and include process error
in biomass production dynamics, initially introduced in Trevezas and Cournède (2013). The resulting estimation algorithm
based on MCMC improves the one developed in Trevezas and Cournède (2013), where the authors used SISR to perform
the Monte Carlo E-step, by significantly reducing the variance of the parameter estimates obtained by independent runs of the
algorithm. Another important advantage of this algorithm as compared to the one proposed in Trevezas and Cournède (2013)
is that the organ masses of the last immature members can all be taken into account even for large expansion durations, which
could be very important for improving the quality of parameter estimation. Moreover, the adaptation of the data-driven
automated algorithm of Caffo et al. (2005) to our algorithm was shown to be a good solution for an intelligent use of Monte
Carlo resources. Simulation studies on a synthetic example and a real dataset from the sugar-beet plant were used to
illustrate the performance of the proposed algorithm. Two different types of hidden Markov models were described and
tested on a real dataset for their fitting quality.
The resulting algorithm is very promising and can be further exploited for decision aid in agricultural science. In this
direction, further effort is needed to adapt the algorithm to other crop plants with deterministic organogenesis
and for model comparison and validation. Furthermore, despite the interest of individual plant growth modeling, the genetic
variability of plants, even within the same variety, can be very important and, if we add locally varying climatic effects, the
development of two plants in the same field can differ substantially. Consequently, a population-based model could be
more appropriate to describe the population dynamics and the inter-individual variability (de Reffye et al., 2009). We are
currently studying an extension to the population level by coupling with a nonlinear mixed effects model (Kuhn and Lavielle,
2005). Another interesting perspective is to broaden the applicability of the proposed statistical methodology to plants with
stochastic organogenesis (e.g., trees), where the total number of organs of each class at each growth cycle is a random variable
(see, e.g., Loi and Cournède, 2008).
Acknowledgments
We would like to thank the anonymous reviewers and the associate editor for their suggestions, which helped us to
improve the quality of this paper.
Appendix
Proof of Proposition 2. In order to simplify the proof we change the state variables of the model M2. By setting
R_n = log Q_n and Z_n = log Y_n we can rewrite Eqs. (10) and (11) as follows:

    R_{n+1} = \log F_n\big(R_{(n-T+1)^+ : n};\ \mu, p_{al}\big) + W_n,    (30)

    Z_n = \log G_n\big(R_{n : (n+T-1)};\ p_{al}\big) + V_n.    (31)

Now, let us analyze the Q-function of the model. Let us also write F_n, given by (3), as F_n = μK_n. In the rest, we identify the
functions K_n and G_n (see (6)) with the induced random variable K_n(θ₂) and the induced random vector G_n(θ₂) respectively,
for an arbitrary θ₂ ∈ Θ₂, where Θ₂ is an appropriate Euclidean subset. By the assumptions of the model M2 and Eqs. (30)
and (31) we have:

    Q(\theta; \theta') = E_{\theta'}\big[\log p_\theta(R_{0:N}, z_{0:N}) \mid z_{0:N}\big]
                       = \sum_{n=1}^{N} E_{\theta'}\big[\log p_\theta(R_n \mid R_{(n-T)^+ : n-1}) \mid z_{0:N}\big]
                         + \sum_{n=0}^{N} E_{\theta'}\big[\log p_\theta(z_n \mid R_{n : (n+T-1) \wedge N}) \mid z_{0:N}\big]
                       = C(\theta_2; \theta') + Q_1(\mu, \sigma^2, \theta_2; \theta') + Q_2(\Sigma, \theta_2; \theta'),    (32)

where

    Q_1(\mu, \sigma^2, \theta_2; \theta') = -\frac{N}{2}\log\sigma^2
        - \frac{1}{2\sigma^2}\sum_{n=1}^{N} E_{\theta'}\big[(R_n - \log K_{n-1}(\theta_2) - \log\mu)^2 \mid z_{0:N}\big],

    Q_2(\Sigma, \theta_2; \theta') = -\frac{N+1}{2}\log(\det\Sigma)
        - \frac{1}{2}\sum_{n=0}^{N} E_{\theta'}\big[(z_n - \log G_n(\theta_2))^\top \Sigma^{-1} (z_n - \log G_n(\theta_2)) \mid z_{0:N}\big],

and C(θ₂; θ′) is independent of θ₁.

Note that for fixed θ₂ the initial maximization problem of Q w.r.t. θ₁ can be separated into two distinct maximization
problems, of Q₁ and Q₂ w.r.t. (μ, σ²) and Σ respectively. By maximizing Q₁ we easily get (18) and (20), and by maximizing
Q₂ we get (19). In the latter case the proof is the same as in the case of an additive measurement error model (with the
transformed variables) and a detailed proof can be found in Trevezas and Cournède (2013), Web Appendix C.

References
Bengtsson, T., Cavanaugh, J.E., 2006. An improved Akaike information criterion for state-space model selection. Comput. Statist. Data Anal. 50 (10), 2635–2654.
Booth, J.G., Hobert, J.P., 1999. Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 (1), 265–285.
Bratley, P., Fox, B.L., Schrage, L.E., 1987. A Guide to Simulation. Springer-Verlag, New York.
Caffo, B.S., Jank, W., Jones, G.L., 2005. Ascent-based Monte Carlo expectation–maximization. J. R. Stat. Soc. Ser. B Stat. Methodol. 67, 235–251.
Cappé, O., Moulines, E., Rydén, T., 2005. Inference in Hidden Markov Models. Springer, New York.
Celeux, G., Diebolt, J., 1985. The SEM algorithm: a probabilistic teacher algorithm derived from the EM algorithm for the mixture problem. Comput. Stat. Q. 2, 73–82.
Chan, J.S.K., Kuk, A.Y.C., 1998. Maximum likelihood estimation for probit-linear mixed models with correlated random effects. Biometrics 53, 86–97.
Cournède, P.-H., Letort, V., Mathieu, A., Kang, M.-Z., Lemaire, S., Trevezas, S., Houllier, F., de Reffye, P., 2011. Some parameter estimation issues in functional–structural plant modelling. Math. Model. Nat. Phenom. 6 (2), 133–159.
Delyon, B., Lavielle, M., Moulines, E., 1999. Convergence of a stochastic approximation version of the EM algorithm. Ann. Statist. 27, 94–128.
Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B Stat. Methodol. 39, 1–38.
de Reffye, P., Hu, B.G., 2003. Relevant choices in botany and mathematics for building efficient dynamic plant growth models: the GreenLab case. In: Hu, B.G., Jaeger, M. (Eds.), Plant Growth Models and Applications. Tsinghua University Press and Springer, pp. 87–107.
de Reffye, P., Lemaire, S., Srivastava, N., Maupas, F., Cournède, P.-H., 2009. Modeling inter-individual variability in sugar beet populations. In: Li, B.G., Jaeger, M., Guo, Y. (Eds.), 3rd International Symposium on Plant Growth and Applications (PMA09), Beijing, China, November 9–12. IEEE.
Doucet, A., De Freitas, N., Gordon, N., 2001. Sequential Monte Carlo Methods in Practice. Springer-Verlag, New York.
Flegal, J.M., Jones, G.L., 2010. Batch means and spectral variance estimators in Markov chain Monte Carlo. Ann. Statist. 38 (2), 1034–1070.
Ford, E.D., Kennedy, M.C., 2011. Assessment of uncertainty in functional–structural plant models. Ann. Bot. 108 (6), 1043–1053.
Fort, G., Moulines, E., 2003. Convergence of the Monte Carlo expectation maximization for curved exponential families. Ann. Statist. 31, 1220–1259.
Gelfand, A.E., Smith, A.F.M., 1990. Sampling based approaches to calculating marginal densities. J. Amer. Statist. Assoc. 85, 398–409.
Geman, S., Geman, D., 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721–741.
Geyer, C.J., 1992. Practical Markov chain Monte Carlo. Statist. Sci. 7 (4), 473–483.
Gordon, N., Salmond, D., Smith, A.F., 1993. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. F 140 (2), 107–113.
Guo, Y., Ma, Y.T., Zhan, Z.G., Li, B.G., Dingkuhn, M., Luquet, D., de Reffye, P., 2006. Parameter optimization and field validation of the functional–structural model GREENLAB for maize. Ann. Bot. 97, 217–230.
Hastings, W.K., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 (1), 97–109.
Jank, W., 2005a. Quasi-Monte Carlo sampling to improve the efficiency of Monte Carlo EM. Comput. Statist. Data Anal. 48 (4), 685–701.
Jank, W., 2005b. Stochastic variants of EM: Monte Carlo, quasi-Monte Carlo and more. In: Proceedings of the American Statistical Association.
Jank, W., 2006. The EM algorithm, its stochastic implementation and global optimization: some challenges and opportunities for OR. In: Alt, F., Fu, M., Golden, B. (Eds.), Topics in Modeling, Optimization and Decision Technologies: Honoring Saul Gass' Contributions to Operations Research. Springer-Verlag, pp. 367–392.
Jones, G.L., Haran, M., Caffo, B.S., Neath, R., 2006. Fixed-width output analysis for Markov chain Monte Carlo. J. Amer. Statist. Assoc. 101 (476), 1537–1547.
Jones, G.L., Hobert, J.P., 2001. Honest exploration of intractable probability distributions via Markov chain Monte Carlo. Statist. Sci. 16 (4), 312–334.
Jullien, A., Mathieu, A., Allirand, J.-M., Pinet, A., de Reffye, P., Cournède, P.-H., Ney, B., 2011. Characterisation of the interactions between architecture and source:sink relationships in winter oilseed rape (Brassica napus L.) using the GreenLab model. Ann. Bot. 107 (5), 765–779.
Kang, M.Z., Cournède, P.-H., de Reffye, P., Auclair, D., Hu, B.G., 2008. Analytical study of a stochastic plant growth model: application to the GreenLab model. Math. Comput. Simul. 78 (1), 57–75.
Kitagawa, G., 1996. Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. J. Comput. Graph. Statist. 5 (1), 1–25.
Kuhn, E., Lavielle, M., 2005. Maximum likelihood estimation in nonlinear mixed effects models. Comput. Statist. Data Anal. 49, 1020–1038.
Lange, K., 1995. A gradient algorithm locally equivalent to the EM algorithm. J. Roy. Statist. Soc. Ser. B 57 (2), 425–437.
Lemaire, S., Maupas, F., Cournède, P.-H., de Reffye, P., 2008. A morphogenetic crop model for sugar-beet (Beta vulgaris L.). In: International Symposium on Crop Modeling and Decision Support: ISCMDS 2008, April 19–22, 2008, Nanjing, China.
Letort, V., 2008. Multi-scale analysis of source–sink relationships in plant growth models for parameter identification. Case of the GreenLab model. Ph.D. Thesis, Ecole Centrale Paris.
Levine, R.A., Casella, G., 2001. Implementations of the Monte Carlo EM algorithm. J. Comput. Graph. Statist. 10 (3), 422–439.
Levine, R.A., Fan, J., 2004. An automated (Markov chain) Monte Carlo EM algorithm. J. Stat. Comput. Simul. 74 (5), 349–360.
Liu, J.S., Chen, R., Logvinenko, T., 2001. A theoretical framework for sequential importance sampling with resampling. In: Doucet, A., Freitas, N., Gordon, N. (Eds.), Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science. Springer, New York, pp. 225–246.
Liu, J.S., Chen, R., Wong, W.H., 1998. Rejection control and sequential importance sampling. J. Amer. Statist. Assoc. 93 (443), 1022–1031.
Loi, C., Cournède, P.-H., 2008. Generating functions of stochastic L-systems and application to models of plant development. Discrete Math. Theor. Comput. Sci. Proc. AI, 325–338.
Mathieu, A., Cournède, P.-H., Letort, V., Barthélémy, D., de Reffye, P., 2009. A dynamic model of plant growth with interactions between development and functional mechanisms to study plant structural plasticity related to trophic competition. Ann. Bot. 103 (8), 1173–1186.
McCulloch, C.E., 1994. Maximum likelihood variance components estimation for binary data. J. Amer. Statist. Assoc. 89, 330–335.
McCulloch, C.E., 1997. Maximum likelihood algorithms for generalized linear mixed models. J. Amer. Statist. Assoc. 92, 162–170.
McLachlan, G.J., Krishnan, T., 2008. The EM Algorithm and Extensions. John Wiley & Sons, Inc.
Meng, X.-L., Rubin, D.B., 1993. Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80, 267–278.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E., 1953. Equation of state calculations by fast computing machines. J. Chem. Phys. 21 (6), 1087–1092.
Meyn, S.P., Tweedie, R.L., 1993. Markov Chains and Stochastic Stability. Springer-Verlag.
Mykland, P., Tierney, L., Yu, B., 1995. Regeneration in Markov chain samplers. J. Amer. Statist. Assoc. 90, 233–241.
Pallas, B., Loi, C., Christophe, A., Cournède, P.-H., Lecoeur, J., 2011. Comparison of three approaches to model grapevine organogenesis in conditions of fluctuating temperature, solar radiation and soil water content. Ann. Bot. 107 (5), 729–745.
Priestley, M.B., 1981. Spectral Analysis and Time Series. Academic Press, London.
Robert, C.P., Casella, G., 2004. Monte Carlo Statistical Methods. Springer.
Robert, C.P., Rydén, T., Titterington, D.M., 1999. Convergence controls for MCMC algorithms, with applications to hidden Markov chains. J. Stat. Comput. Simul. 64, 327–355.
Sievänen, R., Nikinmaa, E., Nygren, P., Ozier-Lafontaine, H., Perttunen, J., Hakula, H., 2000. Components of a functional–structural tree model. Ann. For. Sci. 57, 399–412.
Trevezas, S., Cournède, P.-H., 2013. A sequential Monte Carlo approach for MLE in a plant growth model. J. Agric. Biol. Environ. Stat. 18 (2), 250–270.
Vos, J., Evers, J.B., Buck-Sorlin, G.H., Andrieu, B., Chelle, M., De Visser, P.H.B., 2010. Functional–structural plant modelling: a new versatile tool in crop science. J. Exp. Bot. 61 (8), 2101–2115.
Wei, G., Tanner, M., 1990. A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. J. Amer. Statist. Assoc. 85, 699–704.
Whitley, D., 1994. A genetic algorithm tutorial. Stat. Comput. 4 (2), 65–85.
Yin, X., Struik, P.C., 2010. Modelling the crop: from system dynamics to systems biology. J. Exp. Bot. 61 (8), 2171–2183.
