Вы находитесь на странице: 1из 11

Neurocomputing 335 (2019) 48–58

Contents lists available at ScienceDirect

Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

Linear mixed-effects model for multivariate longitudinal compositional


data
Zhichao Wang a,b, Huiwen Wang a,b,c, Shanshan Wang a,c,∗
a
School of Economics and Management, Beihang University, Beijing 100191, China
b
Beijing Key Laboratory of Emergence Support Simulation Technologies for City Operations, Beijing 100191, China
c
Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing 100191, China

a r t i c l e i n f o a b s t r a c t

Article history: Compositional data analysis is becoming increasing important in economic research, where the variables
Received 13 June 2018 of interest may be structural indicators, such as the investment proportion of industries. In many appli-
Revised 10 November 2018
cations, the measurements of these indicators are collected from countries/regions on a yearly/monthly
Accepted 19 January 2019
basis, which falls into the paradigm of longitudinal data. Typically, data from the same individual may
Available online 24 January 2019
show potential association due to unobserved shared factors. To incorporate the dependence within the
Communicated by Dr Zhanyu Ma individual, we investigate the linear mixed-effects model for multivariate longitudinal compositional data.
We develop and implement a maximum likelihood estimation procedure through the expectation max-
Keywords:
Longitudinal compositional data imization algorithm. We also investigate the statistical inferences of fixed effects coefficients and the
Linear mixed-effects model selection of random effects via a proposed Bayesian information criterion. The proposed method shows
Expectation maximization algorithm desirable properties and performs well in finite samples, as comprehensive numerical studies indicate.
Structural economic indicator We further illustrate the practical utility of the proposed method in a real data study based on China’s
industrial structure, and show that it can improve the performance and enhance the interpretability of
the regression on multivariate compositional data.
© 2019 Elsevier B.V. All rights reserved.

1. Introduction Given the constraints of compositional data, most of the con-


ventional statistical approaches, including regression analysis, can-
Compositional data (or compositions) describe the intrinsic not be directly applied when the response and/or covariates in-
structure of an integrated system by proportion or percentage [1,2]. volve compositional structures. A common approach in such cases
In economic data analysis, they are used to express indicators that is to eliminate these constraints completely/partly through trans-
contain only relative information, such as the water consumption formation before modeling. This idea leads to the development
structure [24,25], the investment/employment structure of indus- of a family of logratio transformations, including the additive lo-
trial sectors [4,22], and the industrial raw material consumption gratio, centered logratio [2], and isometric logratio [ilr; 7]. More
proportions across regions [23]. recent methods of this idea follow the hyperspherical [21] and
In mathematical notation, any composition with D inner parts power transformations [19]. Another advanced solution is to de-

can be represented as a vector x = [x1 , x2 , . . . , xD ] that satisfies scribe compositional data as particular distributions in the simplex,
positive and constant-sum constraints; that is, such as the simplicial normal [16], beta [15], and Dirichlet ones
[13,14,20], which motivates direct modeling on compositions with-

D
0 < xi < 1 ( i = 1, 2, . . . , D ) and xi = 1. out data preprocessing.
i=1
Only a few studies in the literature have considered the re-
gression model with compositional response and covariates. Wang
All the D-part compositions constitute the D-dimensional simplex et al. [22] considered the whole composition as an integrated unit
space, denoted by SD . For further discussion on compositional data, and conducted the linear model for multivariate compositional
see [17]. data (CoLM). Chen et al. [4] presented another modeling strategy,
considering each part within the compositions as a variable in the

Corresponding author at: School of Economics and Management, Beihang Uni-
regression. These methods are investigated under the independent
versity, Beijing 100191, China. identically distributed (IID) assumption on compositional errors
E-mail address: sswang@buaa.edu.cn (S. Wang). and implemented using ordinary least squares (OLS) estimation.

https://doi.org/10.1016/j.neucom.2019.01.043
0925-2312/© 2019 Elsevier B.V. All rights reserved.
Z. Wang, H. Wang and S. Wang / Neurocomputing 335 (2019) 48–58 49

However, this IID assumption is not appropriate in many practi- generalize the longitudinal analysis technique to multivariate com-
cal situations; structural economic indicators can also show defin- positional data, which may provide an instructive solution for deal-
ing features of longitudinal data, which we call longitudinal compo- ing with longitudinal characteristics in compositional data model-
sitional data. For example, as discussed in Section 5, the gross re- ing.
gional domestic product proportions of the three main industries The remainder of this paper is organized as follows.
in China are measured considering dozens of provinces over sev- Section 2 introduces some mathematical notations and related
eral years. The inertia of economic activities results in high depen- preliminaries of compositional data analysis. Section 3 investigates
dence of the collected compositional data on the same province, CoLMM and presents the parameter estimation procedure through
with the regional disparity in China leading to diverse correlations the EM algorithm. Some necessary definitions and related proper-
across provinces. In this case, the conventional compositional data ties are also given in this section. Various numerical studies are
analysis based on the IID assumption can be biased, and the sta- then conducted to assess the performance of the proposed EM
tistical inferences can be misleading when such longitudinal char- estimation in Section 4, including the accuracy of the parameter
acteristics are ignored. Thus, we need to analyze multivariate com- estimation and model selection, the influence on performance
positional data within the longitudinal framework. from different initializations, and the empirical computational
To accommodate the within-subject dependence, [11] proposed complexity of the algorithm, respectively. Section 5 carries out a
the linear mixed-effects model (LMM) incorporating random ef- real data application about China’s industrial structure to illustrate
fects. In LMM, a subset of regression coefficients varies randomly the practical value of the proposed method. Some discussions are
from on individual to another, which accounts for the natural het- presented in Section 6.
erogeneity in the population. As a fundamental approach in longi-
tudinal analysis, LMM has been adopted in numerous applications 2. Preliminaries
[see 10], but those studies considered scalar data only. In this pa-
per, we investigate the LMM for multivariate compositional data We first briefly review the Aitchison geometry and isometric lo-
named CoLMM to deal with longitudinal characteristics. Specifi- gratio (ilr) transformation for compositional data. Some matrix ex-
cally, we assume that the data are observed from N individuals, pressions of multivariate compositions are introduced, in addition
with ni samples in the ith individual (i = 1, 2, . . . , N), then CoLMM to the multivariate normal distribution in the multivariate simplex.
for the jth measurement takes the form For distinction, we use [·] and (·) to denote the compositions (or
multivariate compositions) in the simplex (or multivariate simplex)
and scalar vectors in the real space, respectively. Unless otherwise
yi j = xi j  β  zi j  bi  εi j ( j = 1, 2, . . . , ni ). (1) noted, we use the fixed ilr transformation and its inverse through-
out the paper.

Here, yij is the compositional response, xij and zij are constituted
2.1. Aitchison geometry and ilr transformation
from part of the compositional covariates with dimensions p and
q, respectively, εij is the random compositional error, β is the
Perturbation and powering, denoted respectively by  and ,
unknown p-vector, and bi is the random q-vector. As shown in
are two fundamental operations for compositional data in the
Sections 2 and 3,  and  are respectively the linear combination 
Aitchison geometry. For any compositions x = [x1 , x2 , . . . , xD ] and
and perturbation operations for compositional data. In Model (1), 
y = [y1 , y2 , . . . , xD ] in SD , and α ∈ R, they are defined as
yij is explained by a combination of xi j  β shared by all the indi-
  
viduals and zi j  bi unique to the particular individual i. They are x  y = C ( x1 y1 , x2 y2 , . . . , xD yD ) and α  x = C xα1 , xα2 , . . . , xαD ,
referred to as the fixed and random effects, where β and bi are the
where C (· ) denotes the closure operation, that is, C (z ) =
corresponding coefficients. Specifically, Model (1) reduces to CoLM z z z  
[ D 1 , D 2 , . . . , DD ] with z = (z1 , z2 , . . . , zD ) and zi > 0 for
when there are no random effects. Compared with CoLM, the in- i=1 zi i=1 zi i=1 zi
troduction of random effects in CoLMM measures the regression i = 1, 2, . . . , D. Perturbation and powering operations construct a
characteristics of all the individuals in the population, and also linear structure on SD , with the null element nD = C (1D ) and 1D
makes it possible to describe the characteristics of the particular constituted by 1 in RD . The Aitchison inner product, denoted by
individuals. It distinguishes the between- and within-subject vari- (·,·)a , is defined as
ability of longitudinal compositional responses with relatively little
1 
D D
loss of degrees of freedom, thus improving the performance of the xi yi
( x, y )a = log log ,
regression further. 2D xj yj
i=1 j=1
In this paper, we focus on the parameter estimation for
Model (1). For the reduced model, the estimates are derived by where the subscript a represents the Aitchison geometry. The re-
minimizing the sum of the squared norm of compositional errors lated norm and distance, denoted by ·a and da (·,·), are then
in [22]. However, this OLS estimation method is not available for defined as xa = (x, x )1a /2 and da (x, y ) = x  ya , respectively,
general Model (1). Here, we develop a maximum likelihood (ML) where  denotes the subtraction operation as x  y = x  (−1 ) 
approach to estimate the regression parameters and the variance- y.
covariates of random effects. The estimation procedure is imple- The ilr transformation is conducted via the orthonormal basis of
mented via an expectation maximization (EM) algorithm. To pro- the simplex, under which the compositions in SD are represented
ceed with the EM algorithm, we define the joint distribution of as their coordinates in RD−1 . Specifically, as suggested by [6], x is

the compositional response and random effects coefficients, and transformed to ilr(x ) = x∗ = (x∗1 , x∗2 , . . . , x∗D−1 ) , where the compo-
formulate the full-sample log-likelihood function as well as some nent x∗i is
conditional distributions (for details, see Lemmas 1–3). Moreover, D−i

1  D−i
we investigate the statistical inferences of the fixed effects coeffi- x∗i =  log x j − log xD−i+1 (2)
cients and develop the selection of random effects by a Bayesian (D − i )(D − i + 1 ) D−i+1
j=1
information criterion (BIC). The proposed CoLMM improves the re-
−1
gression on multivariate compositional data and enhances its inter- for i = 1, 2, . . . , D − 1. Then, the inverse is expressed as ilr (x∗ ) =

pretability. To the best of our knowledge, this is the first study to C (exp (w ) ), where exp (w ) = (exp {w1 }, exp {w2 }, . . . , exp {wD } )
50 Z. Wang, H. Wang and S. Wang / Neurocomputing 335 (2019) 48–58

and abbreviated as

D−i
 x∗j i−1 ∗
p

wi =  − xD−i+1 xi  λ = λ  x i = λ j  xi j
(D − j )(D − j + 1 ) i j=1
j=0

( i = 1, 2, . . . , D − 1 ) = λ1  xi1  λ2  xi2  · · ·  λ p  xip ∈ SD ,


   
with x∗0 = x∗D = 0. As the isometry between the simplex and real x  λ = [[x1  λ] , [x2  λ] , . . . , [xn  λ] ] ∈ SDp .
space, for all the ilr transformations, it holds that
Note that these two types of linear combinations share the same
ilr(α  x  β  y ) = α · ilr(x ) + β · ilr(y ) (3) notation as the powering operation and are distinguished by the
dimensions of the related composition and scalar vector.
with any α and β ∈ R.
By using the contrast matrix  ∈ R(D−1 )×D , we can ex-
press the ilr transformation and its inverse as ilr(x ) =  log (x ) 2.3. Multivariate normal distribution in the multivariate simplex
and ilr (x∗ ) = C (exp ( x∗ ) ), respectively, where log (x ) =
−1

(log x1 , log x2 , . . . , log xD ) . The particular form of the contrast The real random vector is defined as a multivariate random
matrix is determined on the given orthonormal basis. Forexample, composition if all its possible values are taken in the multivariate
simplex. Thus, the following multivariate normal distribution can
 corresponding to (2) consists of the elements φi j = D−i
δ
D−i+1 i j be developed in the multivariate simplex.
(i = 1, 2, . . . , D − 1; j = 1, 2, . . . , D), where the subscripts indicate
the row and column, and δi j = (D − i )
−1
if j < D − i + 1, δi j = −1 Definition 1. The multivariate random composition X is said to
if j = D − i + 1, and δi j = 0 otherwise. For any contrast matrix, it follow the multivariate normal distribution in SDp , denoted by X ∼
holds that   ≡ E (D ) = ID − D1 1D 1D , where ID denotes the unit NS (μ, ) with center parameter μ ∈ R(D−1 ) p and covariance ma-
matrix with the dimension D. trix parameter  ∈ R(D−1 ) p×(D−1 ) p if the probability density func-
tion (PDF) of X is
1

1
2.2. Multivariate composition
f (x ) = (D−1 ) p/2
exp − (ilr(x ) − μ ) −1 (ilr(x ) − μ )
( 2π ) | | 1 / 2 2
For any np compositions xi j (i = 1, 2, . . . , n; j = 1, 2, . . . , p) in SD ,
 for any x ∈ SDp .
we refer to the vector xi = [xi1 , xi2 , . . . , xip ] as a p-dimensional D-
part composition in SDp . Here, SDp is the Cartesian product of p sim- As the generalization of the normal distribution in the simplex
plex spaces SD , which is called the multivariate D-part simplex (when p = 1) proposed by [16], Definition 1 considers more than
space with the dimension p. one random composition and their association based on the ilr co-
Componentwise, the Aitchison geometry and ilr transformation ordinates of all the random compositions.
in SDp can be derived from those in SD [9,23]. Specifically, the per-
turbation and powering operations in SDp can be defined as 3. LMM for multivariate compositional data
   
xi  xi = [[xi1  xi 1 ] , [xi2  xi 2 ] , . . . , [xip  xi p ] ] , In this section, we investigate the LMM for multivariate com-
    positional data with longitudinal characteristics. Some definitions
α  xi = [[α  xi1 ] , [α  xi2 ] , . . . , [α  xip ] ]
and related properties for the joint use of random compositions
for any α ∈ R. The analogous definitions of ilr transformation and and real variables are discussed, since they provide theoretical jus-
its inverse can be formulated as tification for the proposed parameter estimation using the EM al-
    gorithm.
ilr(xi ) = x∗i = (ilr(xi1 ) , ilr(xi2 ) , . . . , ilr(xip ) )
   
= (x∗i1 , x∗i2 , . . . , x∗ip ) , 3.1. Model
−1 −1  −1  −1  
ilr ( ) = [ilr (
x∗i )
x∗i1 , ilr ( )
x∗i2 , . . . , ilr ( )
x∗ip ] . Considering samples from the same individual jointly, we pile
To simplify the notations, we introduce some matrix ex- up ni samples from the i-th individual by row. Now, Model (1) can
pressions of multivariate compositional data. Consider the be rewritten in matrix form as
composition matrix piled up in n multivariate composi- yi = xi  β  zi  b i  εi ( i = 1, 2, . . . , N ) (5)

tions by row, i.e., x = [x1 , x2 , . . . , xn ] ; its ilr transformation  
is defined as ilr(x ) = x = (ilr(x(1 ) ), ilr(x(2 ) ), . . . , ilr(x( p) )) ∈
∗ with yi = [yi1 , yi2 , . . . , yi,n
 ] ∈ SD ,
ni xi = [xi1 , xi2 , . . . , xi,ni ] ,
i

R(D−1 )n×p with x( j ) = [x1 j , x2 j , . . . , xn j ] ( j = 1, 2, . . . , p). This

zi = [zi1 , zi2 , . . . , zi,ni ] , and εi = [εi1 , εi2 , . . . , εi,n ] ∈ SnDi . Here,
i
can be equivalently written as ilr(x ) = n () log (x ), where εi is assumed to satisfy the multivariate normal distribution in
n () ∈ R(D−1)n×Dn is blocked diagonal consisting of n con- SnDi , that is, εi ∼ NS (0i∗ , σ 2 Ii∗ ), where 0i∗ denotes the i∗ -vector
trast matrices, that is, n () = diag(, , . . . , ), and log (x ) = consisting of 0 and i∗ is the abbreviation of (D − 1 )ni , and bi is
(log (x(1) ), log (x(2) ), . . . , log (x( p) ) ) ∈ RDn×p . The corresponding assumed to be independent of εi and normally distributed in Rq
inverse is then expressed as ilr
−1
(x∗ ) = C (exp (n ()x∗ )). For any with mean zero and covariance matrix G, where G is positive
n (), we have defined and constant for all individuals. Thus, the parameters to
be estimated in CoLMM contain θ = (β, G, σ 2 ).
n ()n () ≡ En (D ) = diag(E (D ), E (D ), . . . , E (D )) ∈ RDn×Dn . Under the aforementioned assumptions, the ML method can be
(4) used to estimate the parameters in Model (5). Before we proceed
with the EM algorithm in Section 3.2, we define some conditional

Moreover, for any λ = (λ1 , λ2 , . . . , λ p ) ∈ the perturbation- R p, distributions related to random compositions and real variables. As
based linear combinations of xi and x with respect to λ can be implied in Definition 1, the distributions of random (or multivari-
Z. Wang, H. Wang and S. Wang / Neurocomputing 335 (2019) 48–58 51

ate random) compositions are defined by their ilr coordinates. The 3.2. Parameter estimation
conditional PDFs related to the simplex and real space can be de-
rived directly as follows. When there are no random effects in Model (5), that is, CoLM
presented in [22], only β and σ 2 need to be identified. These es-
Lemma 1. In Model (5), yi satisfies the multivariate normal distri- and σˆ 2 , that
timates have the explicit solutions, denoted by β
bution in SnDi for the given θ , that is, yi |θ ∼ NS (μyi , yi ), where
LM LM
is,
μyi = ilr(xi )β and yi = ilr(zi )Gilr(zi ) + σ 2 Ii∗ . Specifically, the PDF −1 
of yi |θ is 
N 
N

β =

ilr(xi ) ilr(xi )

ilr(xi ) ilr(yi ) , (7)
LM
1
g( y i | θ ) = i=1 i=1
(2π )(D−1)ni /2 |yi |1/2

1
1
N

exp − (ilr(yi ) − μyi ) −1 yi ( ilr ( y i ) − μy ) σˆ LM
2
= ) (ilr(y ) − ilr(x )β
(ilr(yi ) − ilr(xi )β ), (8)
2 i LM i i LM
K
i=1
for any yi ∈ SnDi . 
where K = (D − 1 ) N i=1 ni .
Lemma 2. In Model (5), yi conditional on bi satisfies the multivari- To implement the ML estimation for the parameters of general
ate normal distribution in SnDi for the given θ , that is, yi |(bi , θ ) ∼ CoLMM, we rely on the EM algorithm. For a pair of known G and
NS (μyi |bi , σ 2 Ii∗ ) with μyi |bi = ilr(xi )β + ilr(zi )bi . Specifically, the PDF σ 2 (or their estimates), maximizing the log-likelihood function of
of yi |(bi , θ ) is the compositional responses yields the ML estimate of β, that is

1 
N
h ( yi |bi , θ ) = = arg max
β log g(yi |θ )
2 (D−1 )ni /2
( 2π σ ) β ∈R p

1

i=1
−1 
exp − (ilr(yi ) − μyi |bi ) (ilr(yi ) − μyi |bi ) 
N 
N
2σ 2  
= ilr(xi )  −1
yi ilr ( xi ) ilr(xi )  −1
yi ilr ( yi ) . (9)
for any yi ∈ SnDi . i=1 i=1

Lemma 3. In Model (5), bi conditional on yi satisfies the mul- When G and σ 2 are unknown but their estimates G and σˆ 2 are
tivariate normal distribution in Rq for the given θ , that is,
available, we can also obtain β by replacing yi with its estimate

bi |(yi , θ ) ∼ N (μbi |yi , bi ), where μbi |yi = Gilr(zi ) −1
y i r i , b i = G − y in (9). Next, it suffices to estimate G and σ 2 .
 i

yi ilr (zi )G and ri = ilr (yi ) − ilr (xi )β.
Gilr(zi ) −1 Assume that we were to observe bi , in addition to yi , then the
full-sample log-likelihood function based on the data {yi , bi }ni=1 ,
Finally, we introduce the joint PDF of yi and bi for the given
denoted by L(y, b) (i.e., L(yi , bi ; i = 1, 2, . . . , N ) precisely), can be
θ , denoted by f(yi , bi |θ ). By using the equivalent definition of formulated as
the ilr coordinates, we can express this as f (yi , bi |θ ) = h(yi |bi , θ ) ·
φ (bi |G ), where φ is the particular normal PDF of bi for the given 
N 
N 
N
L(y, b) = log f (yi , bi |θ ) = log h(yi |bi , θ ) + log φ (bi |G )
G.
i=1 i=1 i=1
Remark 1. The simplicial normal assumption on εi indicates the
1   −1 1  
N N
K N
particular normality of εij , that is, εi j ∼ NS (0D−1 , σ 2 ID−1 ). Without ∼− log σ 2 − log |G| − bi G bi − ei ei ,
2 2 2 2σ 2
loss of generality, we make this normal assumption and thereby i=1 i=1
deduce the simplicial normality of yi (see Lemmas 1–3), which where ei = ilr(yi ) − μyi |bi and the terms irrelevant to G and σ 2
is a common and widely used assumption for compositional data.
have been dropped. Therefore, the estimates of G and σ 2 are ob-
Moreover, the particular covariance structure indicates that εij can
tained by the first order necessary condition, that is
be further decomposed into D − 1 independent identical sub-errors
corresponding to its ilr coordinates. As the components of the dif- 
N
1 
N
= 1 bi bi σˆ 2 = ei ei .
ferent ilr coordinates of εij ( j = 1, 2, . . . , ni ) are assumed to be un- G
N
and
K
correlated, with equal variance, εij can be considered sampling or i=i i=1

measurement errors, as well as their ilr coordinates. In principle, Following the EM algorithm idea, when a group of estimates θ is
εij can be assumed to satisfy any simplicial distribution with null (t ) = (β
available, denoted by θ (t ) , G
(t ) , (σˆ (t ) )2 ), where the super-
center. However, it would be far complicated to obtain the expres- script t indicates the iterations and t = 0 denotes the initial values
sions of the related PDFs under the other non-normal assumptions of the estimates, G and σˆ 2 can be modified by the corresponding
in the simplex. For this, one should use a Monte Carlo EM algo- conditional expectations, that is
rithm.

N   N 
 
(t+1) = 1 (t ) = 1 
Remark 2. The independence of the compositional error and the G E bi bi |yi , θ b(i t ) (t ) ,
b(i t ) +  (10)
N N b i
random effects coefficients is an important assumption in the the- i=1 i=1
oretical framework of LMM. The covariance between εi ∈ SiD∗ and
bi ∈ Rq in Model (5) is defined as 1   
N
(σˆ (t+1) ) =
2 (t )
E ei ei |yi , θ
cov(εi , bi ) = cov(ilr(εi ), bi ). (6) K
i=1

1   (t ) (t ) 
Here, the ilr transformation in (6) does not affect the independence N

ei
2 2
(t ) )−1 ) ,
ei + (σˆ (t ) ) tr(Ii∗ − (σˆ (t ) ) (
assumption that cov(ilr(εi ), bi ) = 0i∗ 0q . In fact, for any two spec- = yi
K
ified ilr transformations ilrs associated with the contrast matrix i=1
s (s = 1 or 2), there exists an orthogonal matrix A ∈ R(D−1)×(D−1) (11)
such that 2 = A1 . Therefore, the covariance using ilr2 can be
b(i t ) , 
where tr(·) denotes the trace of a matrix. Here, (t ) ,
ei(t ) , and
formulated as 2 E[log (εi )bi ] = A1 E[log (εi )bi ], which will also bi
be equal to 0i∗ 0q if the independence assumption using ilr1 holds. (t ) are the corresponding estimates computed through θ
 (t ) at the
yi
52 Z. Wang, H. Wang and S. Wang / Neurocomputing 335 (2019) 48–58

t-th iteration. Specifically, bi is estimated by its conditional expec- can be drawn for σ 2 that σˆ 2 ≡ σˆ LM
2 . These results verify that CoLM

tation given yi , that is actually works as the pooled method for longitudinal multivariate
 −1 compositional data.

b(i t ) = μ (t ) ilr(zi ) 
(t ) = G (t ) ri(t ) ,
(12)
b |y i i
yi
To summarize, we describe the computation algorithm of the
where (t ) for i = 1, 2, . . . , N. Thus,
ri(t ) is also computed through θ proposed parameter estimation procedure for CoLMM as follows:
( )
by (9), β can be updated as
t The computational complexity of the above parameter esti-
−1 mation procedure relates to three aspects of parameters: the
  (t+1 ) −1

N data scale (e.g., N, ni , and D), model pattern (e.g., p and q),
(t+1) =
β ilr(xi ) yi ilr(xi ) and iteration number T. Let n denote the maximum of ni
i=1 (i = 1, 2, . . . , N). Theoretically, the time complexity may reach

O (T ND3 n3 + T ND2 n2 p2 + T ND2 n2 q2 + T N p3 ) from the calculation
  (t+1 ) −1

N
ilr(xi ) yi ilr(yi ) , (13)   (t+1 ) −1 (t+1 ) with di-
of the inverse of N ilr(x ) (
i=1 i ) ilr(x ) and 
yi i yi
i=1 mensions p and (D − 1 )ni , respectively. Note that, to make sure the
(t+1 ) is the related estimate at the (t + 1)th iteration. identification of CoLMM, p and q are not expected to increase ar-
where  yi
bitrarily, and that D is usually small in practice. The algorithm can

When β (t ) , G (t ) , and σˆ (t ) are substituted by their updates, the
also be accelerated through some optimizations. As verified in nu-
present estimate θ (t ) achieves the modification, that is, θ
(t+1 ) .
merical studies, most of the parameters in CoLMM enjoy the linear
The typical EM algorithm converges to a local optimum, not time complexity in terms of overall computational time.
necessarily the global one. Fortunately, the proposed EM algo-
rithm for CoLMM is always convergent in the proposed frame-
work because the quadratic convex optimization is involved. The 3.3. Some issues
convergence criterion is that the maximum norm of the differ-
ence between the present and previous parameter estimates is In this section, we discuss the selection of random effects in
smaller than a certain boundary. Specifically, the algorithm con- CoLMM, that is, the constitution of zi in Model (5). Many solu-
verges if and only if the estimated parameters from the present tions have been presented for the model selection in conventional
(say β (t ) and G (t ) ) and the previous (say β (t−1 ) and G
(t−1 ) ) itera- LMM [5,8,18]. This paper adopts the BIC-type selector and proposes

tions achieve the following BIC as BIC(zi ) = −2 N i=1 log g(yi |θ ) + log N · qzi (qzi +
 (t ) (t−1)  1 )/2, that is,
max β − β ∞ , G (t ) − G (t−1) ∞ <
for a given convergence threshold > 0, where ·∞ denotes the 
N
BIC(zi ) = −1 (z )
ri (zi )
yi i ri ( zi )
maximum norm of a vector or matrix. We will also stop the algo-
i=1
rithm if it exceeds the iteration limit l. In this paper, we specifically
set = 10−4 and l = 100. Although there is no proof of conver- 
N
+ y (z )| + log N · 1 qz (qz + 1 ),
log | (16)
i
gence for the proposed procedure, our simulation studies demon- i
2 i i
i=1
strate that this choice performs satisfactorily and the algorithm
converges to the true parameters consistently. where y (z ) denote the related estimates of the given
ri (zi ) and  i i
Inspired by the initialization for the conventional LMM with zi , and qzi denotes the dimension of zi . The BIC value of CoLM is
scalar variables in [12], we choose the initial value β (0 ) as β
.
LM simplified to K + K log σˆ 2 . The optional zi that minimizes (16) will
Therefore, β (0 ) and σˆ (0 ) can be computed through β (0 ) as
be selected as the random effects in CoLMM, and the related θ
N  will be used as the final parameter estimates.
 −1 
(0) = 1


b(i 0)
2
G b(i 0) − (σˆ (0) ) (ilr(zi ) ilr(zi )) , (14) Next, we consider the statistical inferences of fixed effects coef-
N ficients. Because the covariances of the ilr coordinates are equiv-
i=1
alent to those of their corresponding compositions, from using
 2 Lemma 1, we have cov(ilr(yi )) = yi . Thus, we derive the covari-
1   (0)
N
σˆ (0) = ilr(yi ) − ilr(zi )
b(i 0) ri , (15) ance of β in (13) as follows:
M
i=1
−1
where M = K − (N − 1 )q − p and
 −1  (0 )
b(i 0 ) = (ilr(zi ) ilr(zi )) ilr(zi ) 
N
ri . ) =
cov(β
 −1
ilr(xi ) yi ilr(xi ) , (17)
As will be verified in numerical studies, the proposed parameter
i=1
estimation procedure for CoLMM is insensitive to initialization of
the algorithm. Compared to the other settings of the initial values where the diagonal elements are related variances of
of parameters, the aforementioned initialization is most effective the components, denoted by var(βˆi ) (i = 1, 2, . . . , p) with
in convergence, which is probably due to utilization of the result = (βˆ , βˆ , . . . , βˆ ) . We then use these variances to construct the
β 1 2 p
by CoLM. test statistics, that is,
For a group of estimated parameters θ = (β , G
, σˆ ), the fitted
compositional response, for example, yi j in Model (1), can be ex- βˆi
 z  for the reduced Z (βˆi ) =  ( i = 1, 2, . . . , N ),
pressed as y = x  β b (or y = x  β
ij ij ij i ij ij var(βˆi )
model), where the subject-specific characteristics
bi can also be
measured with its conditional expectation. which satisfies the standard normal distribution under the null hy-
pothesis, βi = 0. In practice, one can also achieve the interval esti-
Remark 3. We can directly verify that the proposed EM estimation mation of β via the bootstrap method.
procedure for CoLMM is consistent with the aforementioned solu- As shown above, the proposed EM estimation and statistical in-
tions for CoLM. In fact, when there are no random effects, that is, ferences of β are carried out via a given ilr transformation. How-
no zi , bi , and G in Model (5), yi is σ 2 proportional to Ii∗ , implying ever, these estimates and statistical inferences are independent of
that (9) and (13) are identically equal to β . Similar conclusion the specified ilr transformation, or equivalently, the contrast matrix
LM
Z. Wang, H. Wang and S. Wang / Neurocomputing 335 (2019) 48–58 53

as
. For example, by using (4), we can reformulate β are extremely larger than the corresponding true values. More-
LM
−1  over, CoLM almost fails in fitting compositional responses, since

N 
N most of the CoMAPE values exceed 85% and the CoGOF values

β =

log (xi ) En (D ) log (xi )

log (xi ) En (D ) log (yi )
LM hover around 0.4 only. In contrast, CoLMM fits the compositional
i=1 i=1 responses well, where the CoMAPE values are relatively low and
with constant matrix En (D). Similarly, the parameter estimates, BIC, the CoGOF values are significantly high, approaching 1.
and the covariances in (10)–(17) can also be independent of . Furthermore, for the proposed MoLMM, we report the average
biases and the empirical standard derivations of the estimated co-
variance matrix for the random effects in Table 2. As shown in
4. Numerical studies
Table 2, the average biases approach 0 as the number of individ-
uals (N) or the sample size of each individual (ni ) increases; the
In this section, we report the empirical results on three aspects:
corresponding standard derivations become smaller as N increases.
(1) the accuracy of the parameter estimation and model selection;
This result indicates that the proposed estimate of G is consistent.
(2) the influence on performance from different initializations; and
Next, we focus on the performance of the BIC-type selector
(3) the empirical computational complexity of the algorithm. Two
in model selection. Specifically, in the aforementioned simulation
measures are introduced to evaluate the performance of CoLMM,
experiments, where N = 30, ni = 5, and σ = 0.5, we considered
that is, the mean absolute percentage error (CoMAPE) and good-
the CoLMM with random effects consisting of all the subsets of
ness of fit (CoGOF) for compositional data, which are defined re-
compositional covariates as the alternatives (i.e., 25 = 32 models in
spectively as follows, and given as example in Model (1):
total). All the BIC values are computed with over 500 replications,
D − 1 
N ni with the number of correctly determined random effects reaching
da ( yi j ,
yi j )
CoMAPE = × 100%, 498 (99.6%). See Fig. 1 for the related boxplot giving the BIC values
K yi j a
i=1 j=1 of the total alternative models.
N n i From Fig. 1, the BIC values of CoLM (in grey) are generally larger
i=1 j=1
yi j 
yi j 2a
CoGOF = 1 − N n i . than those of CoLMM. The model with all/part of the true three co-
i=1 j=1
yi j 2a variates have relatively small BIC values, and the one with exactly
the true random effects setting, that is, {1, 2, 3}, realizes the min-
Specifically, CoMAPE is nonnegative; a lower CoMAPE indicates a
imum value. Moreover, having redundant covariates may lead to a
better group of fitted compositional responses with lower compo-
large BIC value or an unstable one.
sitional errors. A CoGOF approaching 1 indicates a better regression
model with more interpretable information.
4.2. Different initializations
4.1. Parameter estimation and model selection We follow the simulation experiments in Section 3.2 and, for
simplification, set N = 60, ni ≡ 5 (i = 1, 2, . . . , N), and σ = 0.5. Let
We generate the data from Model (5), where D = 4, p = 5, and θ δ(0) = (β
(0 ) , G
(0 ) , σˆ (0 ) ) denote the varied initial value of parame-
β = (2, 1, −1, −2, 0 ) . For the i-th individual, xi is piled up by ni δ δ δ
ters for the proposed EM algorithm, with δ ≥ 0. In each simulation,
random samples xij ( j = 1, 2, . . . , ni ) independently taken from the ( ) (0 ) ± δ 1 , that
we generate β δ uniformly from the hypercube β
0
p
multivariate normal distribution NS (015 , I15 ) in S54 . εi satisfies the is,
normal distribution NS (03 , σ 2 I3 ) in S4 . zi consists of the first three
covariates within xi ; therefore, q = 3. bi is generated from the mul- [βˆ1(0) − δ, βˆ1(0) + δ ] × [βˆ2(0) − δ, βˆ2(0) + δ ] × · · · × [βˆ p(0) − δ, βˆ p(0) + δ ],
tivariate normal distribution N (03 , G ) in R3 , where G is set
(0 ) and σδ(0) through β (0 ) analogously to (14) and
  and compute G δ δ
g11 g12 g13 9 4.8 0.6 (15). Here, the radius δ ranging from 0 to 104 denotes the dis-
G= g21 g22 g23 = 4.8 4 1 . tance between θ and θ
( 0 ) (0 ) specified in Section 3.2 from near to
g31 g32 g33 0.6 1 1 δ
far. Specifically, δ = 0 indicates θ (0 ) , and δ = 104 is approximately
Moreover, the number of individuals in the population and the the case where θ (0 ) is randomly selected. The simulation is repli-
δ
sample size for each individual are set at (N, ni ) = (30, 5 ), (60, 5), cated independently 500 times. Table 3 reports the iteration results
and (60, 10). Three values of σ , 0.5, 1 and 1.5, are used to re- in the proposed EM algorithm for CoLMM with different initializa-
flect the signal-to-noise ratio from strong to weak. For each setting, tions.
we independently replicate the simulation 500 times and conduct As illustrated in Table 3, the proposed EM algorithm converges
CoLMM with true random effects; we also replicate the baseline in all the simulations. This implies that the estimated results do
model CoLM. Table 1 summarizes the average and the empirical not vary with the different initializations. The use of θ (0 ) gives
standard derivations of estimated coefficients (β and σˆ ), and two the best iteration performance, taking the least computational time
measures (CoMAPE and CoGOF). and average iteration number, and approaches the performance of
The simulation results in Table 1 are encouraging. In general, the oracle case. As θ (0 ) deviates from θ
(0 ) , the iteration number
δ
both CoLM and CoLMM can realize efficient estimates of β, and the of the algorithm increases gradually with the computational time.
parameter estimates obtained from CoLMM are more stable with However, the cost of time is limited; the algorithm beginning with
smaller empirical standard derivations than those obtained from approximately randomly selected initial values of parameters takes
CoLM. For the significant hypothesis test H0 : β j = 0, the absolute only 0.67 (≈ 0.316/0.189 − 1) time longer to converge. From these
values of the test statistics for β j , |Z (βˆ j )| ( j = 1, 2, 3, 4), are clearly simulation results, we conclude that initialization does not have
larger than 1.96, which implies that the first four covariates (i.e., significant influence on the proposed EM algorithm.
xj , j = 1, 2, 3, 4) are significant at 0.05 level. But for β 5 , |Z (βˆ5 )| is
smaller than 1.96, implying that the corresponding covariate x5 is 4.3. Empirical computational complexity
non-significant at 0.05 level. These results coincide with the true
setting that β 5 is equal to 0 but the others are not. For differ- To illustrate the empirical computational complexity of the pro-
ent levels of signal-to-noise ratios, CoLMM succeeds in estimating posed EM algorithm, we vary the values of the parameters inter-
σ 2 , whereas CoLM performs badly, since the estimates from CoLM ested in Model (5) separately and record the corresponding overall
54 Z. Wang, H. Wang and S. Wang / Neurocomputing 335 (2019) 48–58

Table 1
and σˆ , CoMAPE (unit, %), and CoGOF of CoLMM
Means and standard derivations (in brackets) of the estimated parameters β
and CoLM in Section 4.1.

(N, ni ) Model βˆ1 βˆ2 βˆ3 βˆ4 βˆ5 σˆ CoMAPE CoGOF

σ = 0.5
(30, 5) CoLM 2.018 1.027 −0.999 −2.002 −0.015 3.647 84.7 0.441
(0.615 ) (0.419 ) (0.25 ) (0.176 ) (0.172 ) (0.437 ) ( 8.3 ) (0.091 )
CoLMM 2.006 1.009 −0.995 −2.001 0.001 0.499 14.6 0.991
(0.549 ) (0.363 ) (0.185 ) (0.026 ) (0.026 ) (0.018 ) ( 1.5 ) (0.002 )
(60, 5) CoLM 1.996 1.004 −0.999 −2.001 0.001 3.703 85.2 0.427
(0.399 ) (0.283 ) (0.179 ) (0.119 ) (0.124 ) (0.314 ) ( 5.8 ) (0.064 )
CoLMM 2 0.996 −1.005 −2.002 −0.001 0.499 14.6 0.992
(0.387 ) (0.25 ) (0.129 ) (0.018 ) (0.019 ) (0.012 ) (1 ) (0.001 )
(60, 10) CoLM 2.025 1.004 −1.007 −2 −0.001 3.75 84.8 0.424
(0.412 ) (0.274 ) (0.162 ) (0.087 ) (0.09 ) (0.314 ) ( 4.6 ) (0.061 )
CoLMM 2.024 1.013 −1.003 −2 0 0.5 15.3 0.991
(0.396 ) (0.254 ) (0.127 ) (0.013 ) (0.012 ) (0.009 ) (1 ) (0.001 )
σ =1
(30, 5) CoLM 2.018 1.027 −0.999 −2.003 −0.014 3.749 86.6 0.428
(0.616 ) (0.419 ) (0.253 ) (0.181 ) (0.175 ) (0.427 ) ( 7.5 ) (0.089 )
CoLMM 2.007 1.01 −0.995 −2.002 0.002 0.997 28 0.966
(0.551 ) (0.366 ) (0.192 ) (0.052 ) (0.052 ) (0.036 ) ( 2.7 ) (0.007 )
(60, 5) CoLM 1.997 1.004 −1 −2.002 0 3.801 87.2 0.414
(0.401 ) (0.285 ) (0.182 ) (0.122 ) (0.126 ) (0.307 ) ( 5.3 ) (0.053 )
CoLMM 2 0.996 −1.006 −2.004 −0.003 0.999 28.2 0.967
(0.388 ) (0.253 ) (0.134 ) (0.036 ) (0.037 ) (0.025 ) ( 2,1 ) (0.004 )
(60, 10) CoLM 2.025 1.003 −1.006 −2 −0.001 3.849 87 0.411
(0.413 ) (0.275 ) (0.162 ) (0.089 ) (0.093 ) (0.307 ) ( 4.4 ) (0.06 )
CoLMM 2.023 1.012 −1.003 −2 0 1 29.4 0.964
(0.398 ) (0.255 ) (0.129 ) (0.026 ) (0.024 ) (0.017 ) ( 1.7 ) (0.005 )
σ = 1.5
(30, 5) CoLM 2.018 1.028 −0.998 −2.004 −0.013 3.912 88.9 0.408
(0.617 ) (0.421 ) (0.258 ) (0.19 ) (0.182 ) (0.412 ) (7 ) (0.086 )
CoLMM 2.009 1.011 −0.996 −2.004 0.002 1.499 40 0.926
(0.554 ) (0.371 ) (0.202 ) (0.077 ) (0.079 ) (0.081 ) ( 3.8 ) (0.019 )
(60, 5) CoLM 1.998 1.005 −1.001 −2.004 0 3.961 89.4 0.395
(0.403 ) (0.289 ) (0.185 ) (0.126 ) (0.13 ) (0.297 ) ( 4.9 ) (0.06 )
CoLMM 2 0.996 −1.006 −2.006 −0.004 1.498 39.9 0.927
(0.389 ) (0.257 ) (0.141 ) (0.054 ) (0.055 ) (0.037 ) ( 2.5 ) (0.009 )
(60, 10) CoLM 2.024 1.003 −1.006 −2 −0.002 4.009 89.3 0.391
(0.415 ) (0.277 ) (0.164 ) (0.093 ) (0.098 ) (0.295 ) ( 3.8 ) (0.057 )
CoLMM 2.022 1.011 −1.002 −2 0 1.499 41.6 0.922
( 0.4 ) (0.256 ) (0.132 ) (0.039 ) (0.036 ) (0.026 ) ( 2.5 ) (0.01 )

Table 2 Table 3
Means and standard derivations (in brackets) of the biases of the estimated co- Computational time (in s) and iteration performance of the proposed EM algorithm
in CoLMM in Section 4.1. ri j = gi j − gˆi j (i = 1, 2, 3; 1 ≤ j ≤ i)
variance matrix G with different initializations indicated by the radius of the hyper-cubic domain of
denotes the biases of the corresponding elements of G . random initial values. The row “Oracle” indicates the case where the algorithm be-
gins with the true parameter setting in the generation model.
σ (N, ni ) r11 r21 r22 r31 r32 r33
Iteration number
σ = 0.5 (30, 5) 0.394 0.188 0.023 0.13 0.024 0.03
(2.248 ) (1.399 ) (0.573 ) (1.032 ) (0.439 ) (0.273 )
Radius Time Min. Average Max. Histogram
(60, 5) 0.194 0.119 0.033 0.089 0.025 0.018 (δ = )
(1.578 ) (0.934 ) (0.375 ) (0.675 ) (0.271 ) (0.18 )
(60, 10) 0.01 −0.003 −0.013 0.016 0.004 0.022
(1.65 ) (0.999 ) (0.417 ) (0.733 ) (0.304 ) (0.191 ) 0 0.189 3 5.436 7
(0.028 )
σ =1 (30, 5) 0.398 0.188 0.022 0.13 0.022 0.029
(2.266 ) (1.407 ) (0.595 ) (1.048 ) (0.457 ) (0.292 )
(60, 5) 0.2 0.124 0.03 0.093 0.026 0.022 5 0.238 7 10.614 16
( 1.6 ) (0.945 ) (0.387 ) (0.686 ) (0.279 ) (0.19 ) (0.033 )
(60, 10) 0.01 −0.004 −0.014 0.018 0.004 0.022
(1.658 ) (1.001 ) (0.422 ) (0.738 ) (0.306 ) (0.195 ) 10 0.253 8 11.944 16
σ = 1.5 (30, 5) 0.417 0.196 0.025 0.137 0.022 0.03 (0.042 )
(2.322 ) (1.429 ) (0.63 ) (1.076 ) (0.486 ) (0.324 )
(60, 5) 0.206 0.128 0.029 0.098 0.029 0.027 50 0.269 12 13.94 19
(1.631 ) (0.962 ) (0.407 ) (0.705 ) (0.292 ) (0.208 ) (0.04 )
(60, 10) 0.009 −0.005 −0.014 0.02 0.003 0.023
(1.67 ) (1.005 ) (0.43 ) (0.746 ) (0.311 ) (0.203 )
100 0.277 12 14.624 19
(0.021 )

500 0.291 14 16.57 20


(0.022 )
computational time of CoLMM as well as that of CoLM for com-
parison. The specified generation model settings for the different 10 0 0 0.295 13 17.178 21
(0.024 )
parameters are summarized in Table 4, where the influence on
time from the five parameters is considered separately. We repli- 50 0 0 0.305 14 18.956 23
(0.034 )
cate the simulation for each parameter independently 200 times,
considering both CoLMM and CoLM. Specifically, N along with n 10 0 0 0 0.316 17 19.854 24
 (0.028 )
(ni ) in CoLMM denotes the corresponding CoLM with N i=1 ni ob-
Oracle 0.189 4 4.554 6
servations (or Nn for balanced samples). Table 5 reports the effi- (0.034 )
ciency of CoLMM and CoLM with separately varying parameters in
terms of overall computational time.
As illustrated in Table 5, the overall computational time for N, n
(ni ), p, and q enjoys the linear increase with the related parameter, CoGOF increased sharply from approximately 0.42 in CoLM to 0.99
while that for D is approximately quadratic. Compared with CoLM, in CoLMM. Moreover, the iteration numbers T for N, D, and n (ni ) in
CoLMM takes approximately double time to compute. Specifically, CoLMM decrease when the related parameter increases separately,
the extra time cost for N and n (ni ) is relatively large, and that whereas those for p and q increase along with T. Here, the increase
for D and p is relatively small. This is reasonable, considering the of N, D, and n (ni ) implies the enrichment of data, thus reducing
significant improvement in regression, for example, the values of the estimation procedure for the convergent result. In contrast, an
Z. Wang, H. Wang and S. Wang / Neurocomputing 335 (2019) 48–58 55

Fig. 1. Boxplots for the BIC values of all the 32 alternative CoLMM N = 30, ni = 5, and σ = 0.5. The labels in the horizontal axis indicate the constitution of random effects
in CoLMM, and “Null” denotes CoLM specifically.

Table 4 the existing regression for compositional data, that is, CoLM, for
Description of the specified generation model settings in Section 4.2. G0 and β0
comparison.
denote the corresponding values in the simulation experiments in Section 4.1, Gq
denotes the covariance matrix constituted by gi j = ρ |i− j| (i, j = 1, 2, . . . , q) with From the integrated perspective of the three main industries,
ρ = 0.6, and β p = (β0 , 0, . . . , 0 ) ∈ R p . we regressed the gross regional domestic product (GRDP) against
three economic indicators in China, including the total investment
Value of parameters
in fixed assets (INVT), urban unit employment (UrEm), and wage
Parameter interested N D n (ni ≡ ) p q G β σ of urban unit employment (WaUrEm). Related data for the 31
N – 3 5 5 3 G0 β0 0.5 provinces of China during 2011–2016 are available from the Na-
D 60 – 5 5 3 G0 β0 0.5 tional Bureau of Statistics of China (http://data.stats.gov.cn/english/
n (ni ) 60 3 – 5 3 G0 β0 0.5 easyquery.htm?cn=E0103). These data are then processed into the
p 150 3 5 – 3 G0 βp 0.5 compositional variables with three inner parts that correspond
q 150 3 5 15 – Gq 1p 0.5 with the proportions of the primary, secondary, and tertiary indus-
tries.
We carry out a series of CoLMM with all possible constitutions
of these compositional data, including CoLM. Table 6 reports the
increase in p and q complicates the pattern of CoLMM, account-
results of all the eight alternatives, where the significance is based
ing for increased iterations. From these simulation results, we con-
on the standard derivations obtained using the bootstrap method.
clude that, in terms of overall computation time, the proposed EM
From Table 6, the value of CoGOF in CoLM reaches 0.857, with
algorithm is efficient in estimating the parameters in CoLMM. Our
relatively high corresponding values of σˆ and CoMAPE. However,
method greatly improves the performance of the pooled regression
the introduction of random effects increases the value of CoGOF
model CoLM with a reasonable extra computational time cost.
from 0.857 to at least 0.97, along with the values of σˆ from 0.424
to 0.12 and CoMAPE from 37.9% to 10.9%. In all the CoLMM and
5. Application CoLM, most of the estimated fixed effects coefficients pass the sig-
nificance test. These coefficients estimated with CoLMM were gen-
In this section, we apply CoLMM to real data and illustrate the erally more stable than those with CoLM, and had lower standard
practicability and usefulness of the proposed method. We then use derivations.
56 Z. Wang, H. Wang and S. Wang / Neurocomputing 335 (2019) 48–58

Table 5
CoGOF, iteration numbers, and overall computational time (in s) of the proposed EM algorithm with different model settings in
Section 4.3. The column “Iteration” reports the means of iteration numbers in related simulations, and the sub-column “Ratio”
presents the ratios of the overall computational time of CoLMM to that of CoLM.

Computational time

Parameter CoGOF Iteration CoLMM CoLM Ratio Bar plot


(CoLMM / CoLM ) (T = )

N= 60 0.992 / 0.424 6.75 0.214 0.082 2.61


120 0.993 / 0.42 6.185 0.413 0.165 2.5
180 0.993 / 0.418 5.805 0.593 0.239 2.48
240 0.993 / 0.416 5.555 0.772 0.299 2.58
300 0.993 / 0.417 5.34 0.902 0.374 2.41

D= 6 0.991 / 0.418 4.555 0.255 0.129 1.98


12 0.99 / 0.419 3.775 0.463 0.26 1.78
18 0.99 / 0.425 3.31 0.843 0.48 1.76
24 0.99 / 0.425 3.13 1.365 0.764 1.79
30 0.99 / 0.421 3.025 2.109 1.142 1.85

n= 10 0.991 / 0.424 4.755 0.35 0.161 2.17


( ni ≡ )
20 0.99 / 0.422 3.905 0.498 0.245 2.03
30 0.99 / 0.425 3.64 0.712 0.364 1.96
40 0.99 / 0.415 3.405 0.952 0.463 2.06
50 0.99 / 0.424 3.13 1.229 0.569 2.16

p= 5 0.991 / 0.41 4.295 0.72 0.341 2.11


10 0.991 / 0.421 5.095 0.825 0.439 1.88
15 0.991 / 0.421 5.395 0.903 0.551 1.64
20 0.991 / 0.417 5.755 1.033 0.648 1.59
25 0.991 / 0.424 5.96 1.148 0.789 1.46
30 0.991 / 0.424 5.985 1.251 0.907 1.38

q= 3 0.988 / 0.823 4.725 0.902 0.548 1.65


6 0.992 / 0.708 5.61 0.995 0.552 1.8
9 0.994 / 0.624 6.605 1.105 0.542 2.04
12 0.996 / 0.554 7.315 1.194 0.552 2.16
15 0.997 / 0.498 7.18 1.282 0.546 2.35

Table 6
Results for all the alternative models in the real data study. These models are distinguished by their random effects, and “Null”
denotes CoLM. The columns “INVT”, “UrEm” and “WaUrEm” denote the corresponding estimated fixed effects coefficients with
their standard derivations (in brackets) and significance.

Random effects INVT UrEm WaUrEm σˆ BIC CoMAPE (%) CoGOF

Null 0.378∗∗∗ 0.348∗∗∗ −0.147∗∗∗ 0.424 −904.299 37.9 0.857


(0.015 ) (0.025 ) (0.028 )
{x1 } 0.056∗∗∗ 0.039∗ 0.314∗∗∗ 0.188 −751.915 17.7 0.974
(0.011 ) (0.019 ) (0.023 )
{x2 } 0.152∗∗∗ 0 0.292∗∗∗ 0.181 −779.345 18.7 0.976
(0.007 ) (0.017 ) (0.018 )
{x3 } 0.154∗∗∗ 0.053∗∗ 0.246∗∗∗ 0.176 −796.838 17.9 0.977
(0.007 ) (0.018 ) (0.019 )
{x1 , x2 } 0.193∗∗∗ 0.115∗∗∗ 0.146∗∗∗ 0.145 −864.087 13.6 0.986
(0.007 ) (0.021 ) (0.022 )
{x1 , x3 } 0.189∗∗∗ 0.072∗∗∗ 0.187∗∗∗ 0.143 −871.297 13.7 0.986
(0.007 ) (0.021 ) (0.022 )
{x2 , x3 } 0.184∗∗∗ 0.288∗∗∗ −0.007 0.148 −850.319 13.4 0.985
(0.007 ) (0.023 ) (0.024 )
{x1 , x2 , x3 } 0.22 ∗∗∗ 0.253∗∗∗ 0.006 0.12 −931.457 10.9 0.991
(0.006 ) (0.019 ) (0.019 )
∗∗∗ ∗∗
, and ∗ indicate significance at the significance levels 0.001, 0.01 and 0.05, respectively.

indicates the selected model with minimum BIC value.
Z. Wang, H. Wang and S. Wang / Neurocomputing 335 (2019) 48–58 57

Algorithm 1 Parameter estimation procedure for CoLM.M


N,ni
Input:The data set {(xi j , yi j , zi j )}i, j=1 (0 ) , with the intermediate 
; the initial value of the parameter θ (0 ) ; the convergence threshold ; the
yi
iteration limit l.
= (β
Output:θ , G
, σˆ 2 ) and
bi (i = 1, 2, . . ., N).
1: Set t = 0;
2: repeat
3: ri(t+1 ) ,
compute b(i t+1 ) , and
ei(t+1 ) for i = 1, 2, . . ., N:

(t ) ,
ri(t+1) = ilr(yi ) − ilr(xi )β

 −1
(t ) ilr(zi ) 
b(i t+1) = G (t ) ri(t+1) ,

yi

ri(t+1) − ilr(zi )
ei(t+1) = b(i t+1) ;

4: (t+1 ) and (σˆ (t+1 ) )2 :


compute G

N  (t ) −1
(t+1) = 1

G (
b(i t+1) (t ) (Iq − ilr(zi ) 
b(i t+1) + G (t ) )),
ilr(zi )G
yi
N
i=1

1  (t+1) (t+1) 2  (t ) −1


N
2
(σˆ (t+1) ) = (
ei
ei
2
+ (σˆ (t ) ) tr(Ii∗ − (σˆ (t ) )  ));
yi
K
i=1

5: (t+1 ) for i = 1, 2, . . ., N:
compute  yi

 i
(t+1) ilr(zi ) + (σˆ (t+1) ) Ii∗ ;
(t+1) = ilr(z )G 2
yi

6:
(t+1 ) :
compute β
−1 

N
 (t+1 ) −1

N
 (t+1 ) −1
(t+1) =
β ilr(xi ) ( ) ilr(xi ) ilr(xi ) ( ) ilr(yi ) ;
yi yi
i=1 i=1

7: let t = t + 1;
8: until

(t ) − β
max {β (t−1)  , G
(t ) − G
(t−1) ∞ } < or t > l;

9: return

θ = (β
(t ) , G
(t ) , (σˆ (t ) )2 ) and
bi =
b(i t ) (i = 1, 2, . . ., N ).

Table 7 contrast, INVT shows a relatively stable influence on GRDP across


Estimated covariance matrix of the random
all provinces; its variance of random effects is only 0.211.
effects for the final regression model.
In conclusion, this case study illustrates the application value
INVT UrEm WaUrEm of the proposed LMM for multivariate longitudinal compositional
INVT 0.211 0.472 −0.54 data. By introducing three random effects that vary with individ-
UrEm 1.621 −1.745 uals, our method improves the fitting performance of the com-
WaUrEm 1.906 positional responses and enhances the interpretability of the re-
gression. However, as a heuristic study, it remains many economic
questions to be discussed, such as the choice of compositional
covariates and the exhaustive explanation for estimated parame-
ters. Some problems for compositional data modeling have also to
Since the model with random effects containing all the three
be solved; for instance, the modeling under other distribution as-
covariates achieves the minimum BIC value, for which σˆ , CoMAPE,
sumptions in the simplex, such as the Dirichlet distribution. These
and CoGOF also approach the best values, it is selected as the final
issues need to be considered in further research.
regression model. Specifically, in this model, both INVT and UrEm
have a significantly positive correlation with the industrial struc-
ture of GRDP for all the provinces of China in general. This implies 6. Discussion
that an increase in investment/employment will promote a cor-
responding increase in the proportion of GRDP. However, there is This paper investigated the LMM for multivariate compositional
insufficient evidence to show that the wage income structure con- data with longitudinal characteristics, namely CoLMM, where both
tributes to the GRDP structure. Table 7 gives the estimated covari- the response and covariates are compositions. Incorporating ran-
ance matrix of the random effects for this best-performing model. dom effects that consider the difference across individuals, CoLMM
From Table 7, UrEm and WaUrEm show large variation in in- can extract more information from the compositional residuals
fluence on different provinces, with the variances of the corre- obtained by CoLM and then improve the fitting performance and
sponding random effects being 1.621 and 1.906, respectively. In the regression interpretability for compositional data. The related
58 Z. Wang, H. Wang and S. Wang / Neurocomputing 335 (2019) 48–58

parameter estimation procedure, developed through the EM algo- [13] Z. Ma, J.H. Xue, A. Leijon, Z.H. Tan, Z. Yang, J. Guo, Decorrelation of neutral
rithm, enjoys the property of global convergence. Compared with vector variables: theory and applications, IEEE Trans. Neural Netw. Learn. Syst.
29 (2017) 129–143.
the existing pooled method, CoLMM improves the performance of [14] Z. Ma, Y. Lai, W.B. Kleijn, Y.Z. Song, L. Wang, J. Guo, Variational Bayesian learn-
the regression significantly, with a reasonable extra computational ing for Dirichlet process mixture of inverted Dirichlet distributions in non–
time. The longitudinal analysis technique is then generalized to gaussian image feature modeling, IEEE Trans. Neural Netw. Learn. Syst. 99
(2018) 1–15.
multivariate compositional data, and this may provide an al- [15] Z. Ma, A.E. Teschendorff, A. Leijon, Y. Qiao, Variational Bayesian matrix fac-
ternative approach for modeling compositions. Moreover, some torization for bounded support data, Pattern Anal. Mach. Intell. IEEE Trans. 37
definitions and related properties for the joint use of random com- (2015) 876–889.
[16] G. Mateu-Figueras, V. Pawlowsky-Glahn, J.J. Egozcue, The normal distribution
positions and real variables are proposed in theoretical support of
in some constrained sample spaces, Sort Stat. Oper. Res. Trans. 37 (2013)
the proposed method, such as some joint/conditional PDFs related 29–56.
to both the simplex and real space. As summarized in the numer- [17] V. Pawlowsky-Glahn, J.J. Egozcue, R. Tolosana-Delgado, Modeling and Analysis
of Compositional Data, John Wiley & Sons, 2015.
ical studies and real data application, the proposed CoLMM is able
[18] H. Peng, Y. Lu, Models selection in linear mixed effect models, J. Multivar. Anal.
to deal with longitudinal compositional data and obtain efficient 109 (2012) 109–129.
parameter estimates and accurate compositional response fittings. [19] J.L. Scealy, P.D. Caritat, E.C. Grunsky, M.T. Tsagris, A.H. Welsh, Robust princi-
Our main focus is on the parameter estimation for CoLMM. pal component analysis for power transformed compositional data, J. Am. Stat.
Assoc. 110 (2015) 136–148.
However, model selection is also an important and challenging [20] M. Tsagris, Zero adjusted Dirichlet regression for compositional data with zero
issue for conventional LMM. In this paper, we proposed the sta- values present, Statistics 31 (2015) 1–11.
tistical inferences of fixed effects coefficients and the BIC-type [21] H. Wang, Q. Liu, H.M.K. Mok, L. Fu, W.M. Tse, A hyperspherical transforma-
tion forecasting model for compositional data, Eur. J. Oper. Res. 179 (2007)
selector for the selection of random effects. Some more advanced 459–468.
approaches deserve to be studied in the future, such as the joint [22] H. Wang, L. Shangguan, J. Wu, R. Guan, Multiple linear regression modeling for
selection method for fixed and random effects presented by [3]. compositional data, Neurocomputing 122 (2013) 490–500.
[23] H. Wang, L. Shangguan, R. Guan, L. Billard, Principal component analysis for
We also need to adapt CoLMM under more complicate distribu- compositional data vectors, Comput. Stat. 30 (2015) 1079–1096.
tion assumptions in the simplex and investigate more effective [24] Y. Wei, Z. Wang, H. Wang, T. Yao, Y. Li, Promoting inclusive water governance
estimation methods. Moreover, some other longitudinal analysis and forecasting the structure of water consumption based on compositional
data: a case study of Beijing, Sci. Total Environ. 634 (2018) 407–416.
techniques are worth introducing to compositional data modeling
[25] Y. Yun, Z. Zou, H. Wang, A regression model based on the compositional data
in a future study. of Beijing’s water consumed structure and industrial structure, Syst. Eng. 26
(2008) 67–71. In Chinese.
Acknowledgment Zhichao Wang is a Ph.D. candidate from the School of
Economics and Management at Beihang University, China.
The authors are grateful to the Associate Editor and two anony- He received his B.S. degree Applied Mathematics from the
School of Mathematics and System Science at the same
mous referees for their very constructive comments and sugges-
university in 2016. His research interests are the area of
tions which helped improve the paper greatly. This research was computational statistics and data analysis. He currently
financially supported by the Natural Science Foundation of China focuses on multivariate analysis on multiple longitudinal
(Grant nos. 7142017025, 11701023). compositional data.

Supplementary material

Supplementary material associated with this article can be


found, in the online version, at doi:10.1016/j.neucom.2019.01.043. Huiwen Wang received her B.Sc. degree from Beihang
University (BHU), China, in 1982, DEA MASE, from Paris
XI, France, in 1989, and Ph.D. degree in Engineering Sys-
References tem from BHU in 1992. She is currently a professor in
Management Science and engineering Department, School
[1] J. Aitchison, The statistical analysis of compositional data, J. R. Stat. Soc. 44 of Economics and Management (SEM), BHU. Also, she is
(1982) 139–177. director of SEM Academic Degrees Committee, and direc-
[2] J. Aitchison, The Statistical Analysis of Compositional Data, Chapman and Hall, tor of Research Center of Complex Data Analysis in BHU.
London, 1986. Prof. Wang received National Science Fund for Distin-
[3] H.D. Bondell, A. Krishna, S.K. Ghosh, Joint variable selection for fixed and ran- guished Young Scholars http://www.nsfc.gov.cn/english/
dom effects in linear mixed-effects models, Biometrics 66 (2010) 1069–1077. 06gp/pdf/2011/041.pdf Her general area of research is
[4] J. Chen, X. Zhang, S. Li, Multiple linear regression with compositional response statistics and data analysis, with a recent focus on mul-
and covariates, J. Appl. Stat. 44 (2016) 1–16. tivariate analysis for high-dimension complex data. She is
[5] R.B. Dimova, M. Markatou, A.H. Talal, Information methods for model selection an IASC member, a member of National Statistics Teaching Materials Review Com-
in linear mixed effects models with application to HCV data, Comput. Stat. mittee, executive director of China Marketing Association, editorial member of Jour-
Data Anal. 55 (2011) 2677–2697. nal of Symbolic Data Analysis.
[6] J.J. Egozcue, V. Pawlowsky-Glahn, Groups of parts and their balances in com-
positional data analysis, Math. Geol. 37 (2005) 795–828. Shanshan Wang received her B.S. in Mathematics from
[7] J.J. Egozcue, V. Pawlowsky-Glahn, G. Mateu-Figueras, C. Barceló-Vidal, Isometric Qingdao University in 2008, M.S. in Statistics and Prob-
logratio transformations for compositional data analysis, Math. Geol. 35 (2003) ability from Beijing Normal University in 2011, and Ph.D.
279–300. in Statistics from Beijing Normal University in 2014. She is
[8] G.M. Fitzmaurice, N.M. Laird, J.H. Ware, Applied Longitudinal Analysis, John currently assistant professor at Management Science and
Wiley & Sons, 2011. engineering Department, School of Economics and Man-
[9] M. Gallo, Coda in three-way arrays and relative sample spaces, Electron. J. agement (SEM), Beihang University, Beijing China. Her
Appl. Stat. Anal. 5 (2012) 401–406. main research interests are High-dimensional data anal-
[10] C. Hsiao, Analysis of Panel Data, Cambridge University Press, 2014. ysis, Quantile Regression, non/semi-parametric modeling,
[11] N.M. Laird, J.H. Ware, Random-effects models for longitudinal data, Biometrics and survival data analysis.
38 (1982) 963–974.
[12] N.M. Laird, N. Lange, D. Stram, Maximum likelihood computations with re-
peated measures: application of the EM algorithm, J. Am. Stat. Assoc. 82 (1987)
97–105.

Вам также может понравиться