
Marques, F., Coelho, C. A., and de Carvalho, M. (2015), On the Distribution of Linear Combinations of Independent Gumbel Random Variables, Statistics and Computing, 25, 683–701.

Stat Comput
DOI 10.1007/s11222-014-9453-5

On the distribution of linear combinations of independent Gumbel random variables

Filipe J. Marques · Carlos A. Coelho · Miguel de Carvalho

Received: 8 August 2012 / Accepted: 7 February 2014
© Springer Science+Business Media New York 2014

Abstract The distribution of linear combinations of independent Gumbel random variables is of great interest for modeling risk and extremes in the most different areas of application. In this paper we develop near-exact approximations for the distribution of linear combinations of independent Gumbel random variables based on a shifted generalized near-integer gamma distribution and on the distribution of the difference of two independent generalized integer gamma distributions. These near-exact distributions are computationally appealing and numerical studies confirm their accuracy, as assessed by a proximity measure used in related studies. We illustrate the proposed approximations on applied problems in networks engineering, computational biology, and flood risk management.

Keywords Generalized integer gamma distribution · Generalized near-integer gamma distribution · Gumbel distribution · Near-exact distribution · Phase type distributions · Risk

Electronic supplementary material The online version of this article (doi:10.1007/s11222-014-9453-5) contains supplementary material, which is available to authorized users.

F. J. Marques (B) · C. A. Coelho
DMCMAFCT, Universidade Nova de Lisboa, Lisbon, Portugal
e-mail: fjm@fct.unl.pt

M. de Carvalho
Pontificia Universidad Católica de Chile, Santiago, Chile

M. de Carvalho
CMA-FCT, Universidade Nova de Lisboa, Lisbon, Portugal

1 Introduction

The Gumbel distribution is a particular case of the Generalized Extreme Value distribution and it has been widely used for modeling risk and extremes (Gumbel 1941; Tiago de Oliveira 1963; Hosking et al. 1985; Balakrishnan et al. 1992; Wang 1995; Arnold et al. 1998; Castillo et al. 2005; Antal et al. 2009). Linear combinations of Gumbel related random variates arise naturally in applications whenever there is the need to model the combination of extremes of several variables, and this has been a topic of considerable attention in diverse applications (Bailey and Gribskov 1997; Cetinkaya et al. 2001; Loaiciga and Leipnik 1999; Burda et al. 2012). Despite the wide range of applications in which the distribution of linear combinations of independent Gumbel random variables may be useful, few results are available on this distribution. Nadarajah (2008) presents the exact distribution of the linear combination of p independent Gumbel random variables, using Fox H and Meijer G functions, but the computational investment required by these functions limits the practical usefulness of this result. This was already remarked by Burda et al. (2012, p. 189), who claimed that the exact distribution proposed by Nadarajah is extremely complicated to be used.

In this paper we propose three accurate, manageable, and computationally appealing near-exact distributions for the linear combination of independent Gumbel random variables; the first one for positive linear combinations, and the second and third ones can be applied regardless of the sign of the coefficients of the linear combination. Our near-exact distributions have links with phase-type approximations (Aldous and Shepp 1987; O'Cinneide 1990) and, as we discuss below, their accuracy can be controlled effectively through a precision parameter. Our first near-exact


distribution is based on the generalized integer gamma (GIG) and generalized near-integer gamma (GNIG) distributions, which have a wealth of applications in multivariate analysis (Marques and Coelho 2008; Coelho and Marques 2010, 2012; Marques et al. 2011; Coelho et al. 2013). The GIG distribution corresponds to the distribution of the sum of independent Gamma random variables with integer shape parameters (Amari and Misra 1997; Coelho 1998), while the GNIG distribution corresponds to the distribution of the sum of a GIG random variable with an independent Gamma random variable with a non-integer shape parameter (Coelho 2004); further details on the GIG and GNIG distributions are given in Appendix 1. We show that the exact distribution of a positive linear combination of independent Gumbel random variables can be decomposed as the sum of two independent random variables: the first corresponding to a linear combination of independent logarithmized Gamma random variables, and the second to a shifted GIG (SGIG) random variable. Our second near-exact distribution is based on the so-called SDGIG distribution, which corresponds to the distribution of the shifted difference of two independent GIG distributions (Coelho and Mexia 2010, Chap. 2). Hence, in the context of our second near-exact distribution, we show that the linear combination of independent Gumbel random variables can be decomposed as the sum of two independent random variables: the first corresponding to a linear combination of independent logarithmized Gamma random variables, and the second corresponding to a SDGIG random variable. The third near-exact distribution is also based on the previous decomposition, and on the fact that the DGIG distribution can be represented as a particular mixture of integer Gamma distributions. These decompositions are extremely useful as they allow us to construct near-exact distributions by using a shifted version of the GNIG distribution, in the case of positive linear combinations, and by using the SDGIG distribution, in the case where the coefficients of the linear combination are arbitrary real numbers.

We illustrate our near-exact approximations by revisiting a problem in network engineering, first addressed by Cetinkaya et al. (2001), a problem in computational biology, earlier considered by Bailey and Gribskov (1997), and by addressing the problem of interval estimation of the location parameter of a Gumbel distribution in a real data application on flood risk management, earlier discussed in Hosking et al. (1985).

The structure of our paper is as follows. In Sect. 2 we introduce the exact and near-exact distributions of interest. In Sect. 3 we conduct numerical experiments to assess the level of accuracy of our near-exact approximations. In Sect. 4 we illustrate our methods in applied modeling issues, and we conclude in Sect. 5.

2 The exact and near-exact distributions

2.1 Exact distribution

Let $X_1, \ldots, X_p$ be $p$ independent Gumbel random variables, with location parameter $\mu_j \in \mathbb{R}$ and scale parameter $\sigma_j \in \mathbb{R}^+$, i.e.

$$X_j \overset{\text{ind.}}{\sim} \text{Gumbel}(\mu_j, \sigma_j), \qquad F_{X_j}(x) = \exp[-\exp\{-(x - \mu_j)/\sigma_j\}], \quad x \in \mathbb{R}, \tag{1}$$

for $j = 1, \ldots, p$. Here and below we use the notations $\mathbb{R}^+$ and $A$ to respectively denote the sets $\{x \in \mathbb{R} : x > 0\}$ and $\{n \in \mathbb{N} : n \geq 2\}$. The characteristic functions of $X_j$ and $W = \sum_{j=1}^{p} \alpha_j X_j$, for $\alpha_j \in \mathbb{R}$, are respectively defined as

$$\Phi_{X_j}(t) = \Gamma(1 - \mathrm{i}t\sigma_j) \exp\{\mathrm{i}t\mu_j\}, \qquad \Phi_W(t) = \prod_{j=1}^{p} \Gamma(1 - \mathrm{i}t\alpha_j\sigma_j) \exp\{\mathrm{i}t\alpha_j\mu_j\}, \quad t \in \mathbb{R}.$$

The next theorem provides a characterization of the exact distribution of the linear combination of independent Gumbel random variables.

Theorem 1 Let $X_j \overset{\text{ind.}}{\sim} \text{Gumbel}(\mu_j, \sigma_j)$, with $\mu_j \in \mathbb{R}$ and $\sigma_j \in \mathbb{R}^+$. The exact characteristic function of $W = \sum_{j=1}^{p} \alpha_j X_j$, with $\alpha_j \in \mathbb{R}$, $j = 1, \ldots, p$, can be written as $\Phi_W(t) = \Phi_{W_1}(t)\Phi_{W_2}(t)$, where for any $\gamma \in A$,

$$\Phi_{W_1}(t) = \prod_{j=1}^{p} \frac{\Gamma(\gamma - \mathrm{i}t\alpha_j\sigma_j)}{\Gamma(\gamma)}, \quad t \in \mathbb{R}, \tag{2}$$

and

$$\Phi_{W_2}(t) = \left\{ \prod_{j=1}^{p} \prod_{k=0}^{\gamma-2} \left(\frac{1+k}{\alpha_j\sigma_j}\right) \left(\frac{1+k}{\alpha_j\sigma_j} - \mathrm{i}t\right)^{-1} \right\} \exp\left\{\mathrm{i}t \sum_{j=1}^{p} \alpha_j\mu_j\right\}, \quad t \in \mathbb{R}. \tag{3}$$

Proof The proof follows by noticing that we can write the characteristic function of $W$ as

$$\begin{aligned}
\Phi_W(t) &= \prod_{j=1}^{p} \Gamma(1 - \mathrm{i}t\alpha_j\sigma_j) \exp\{\mathrm{i}t\alpha_j\mu_j\} \\
&= \left\{ \prod_{j=1}^{p} \frac{\Gamma(\gamma - \mathrm{i}t\alpha_j\sigma_j)}{\Gamma(\gamma)} \, \frac{\Gamma(\gamma)}{\Gamma(\gamma - \mathrm{i}t\alpha_j\sigma_j)} \, \frac{\Gamma(1 - \mathrm{i}t\alpha_j\sigma_j)}{\Gamma(1)} \right\} \exp\left\{\mathrm{i}t \sum_{j=1}^{p} \alpha_j\mu_j\right\} \\
&= \prod_{j=1}^{p} \frac{\Gamma(\gamma - \mathrm{i}t\alpha_j\sigma_j)}{\Gamma(\gamma)} \left\{ \prod_{j=1}^{p} \prod_{k=0}^{\gamma-2} (1+k) \left(1 + k - \mathrm{i}t\alpha_j\sigma_j\right)^{-1} \right\} \exp\left\{\mathrm{i}t \sum_{j=1}^{p} \alpha_j\mu_j\right\} \\
&= \prod_{j=1}^{p} \frac{\Gamma(\gamma - \mathrm{i}t\alpha_j\sigma_j)}{\Gamma(\gamma)} \left\{ \prod_{j=1}^{p} \prod_{k=0}^{\gamma-2} \left(\frac{1+k}{\alpha_j\sigma_j}\right) \left(\frac{1+k}{\alpha_j\sigma_j} - \mathrm{i}t\right)^{-1} \right\} \exp\left\{\mathrm{i}t \sum_{j=1}^{p} \alpha_j\mu_j\right\}.
\end{aligned}$$

Some comments are in order.

(i) We can write $W = \sum_{j=1}^{p} X_j^*$, where $X_j^* = \alpha_j X_j \sim \text{Gumbel}(\alpha_j\mu_j, \alpha_j\sigma_j)$, and hence an alternative parameterization can be considered by taking $(\mu_j^*, \sigma_j^*) = (\alpha_j\mu_j, \alpha_j\sigma_j)$ and setting the corresponding coefficients of the linear combination as $\alpha_j = 1$; in addition, to simplify the expressions we can consider $\mu_j = 0$, in which case we would be working with a similar distribution apart from a shift.

(ii) Our results can be readily applied to the product of powers of independent Weibull and Fréchet random variables, through simple transformations; actually if $X_j \sim \text{Gumbel}(\mu_j, \sigma_j)$, then $Y_j = \exp\{-X_j\} \sim \text{Weibull}(\exp\{-\mu_j\}, \sigma_j^{-1})$, with distribution function

$$F_{Y_j}(y) = 1 - \exp\left[ -\left( \frac{y}{\exp(-\mu_j)} \right)^{1/\sigma_j} \right],$$

and thus $\prod_{j=1}^{p} (Y_j)^{\alpha_j} = \exp\{-\sum_{j=1}^{p} \alpha_j X_j\}$. If $Y_j = \exp\{X_j\}$ then $Y_j \sim \text{Fréchet}(\exp\{\mu_j\}, \sigma_j^{-1})$, with distribution function

$$F_{Y_j}(y) = \exp\left[ -\left( \frac{y}{\exp(\mu_j)} \right)^{-1/\sigma_j} \right],$$

and thus $\prod_{j=1}^{p} (Y_j)^{\alpha_j} = \exp\{\sum_{j=1}^{p} \alpha_j X_j\}$. Using similar transformations it is also possible to apply our results to more complex distributions, such as the Generalized Gamma distribution (Marques 2012).

2.1.1 Positive linear combinations ($\alpha_j > 0$)

If all $\alpha_j$ are positive, the exact distribution of $W$ is the same as that of the sum of two independent random variables, $W_1$ and $W_2$, where

$$W_1 = -\sum_{j=1}^{p} \alpha_j\sigma_j \log Z_j, \qquad Z_j \overset{\text{ind.}}{\sim} \text{Gamma}(\gamma, 1), \tag{4}$$

with $\gamma \in A$, is a linear combination of $p$ independent logarithmized Gamma random variables and $W_2$ is distributed according to a shifted sum of $p(\gamma - 1)$ independent Exponential distributions with parameters $(1 + k)/(\alpha_j\sigma_j)$, for $j = 1, \ldots, p$ and $k = 0, \ldots, \gamma - 2$, with shift parameter $\sum_{j=1}^{p} \alpha_j\mu_j$.

If some of the Exponential distributions in (3) have the same parameter we can sum them, obtaining in this way Gamma distributions, so that equation (3) can be written as

$$\Phi_{W_2}(t) = \left\{ \prod_{j=1}^{\eta} (\lambda_j)^{r_j} (\lambda_j - \mathrm{i}t)^{-r_j} \right\} \exp\left\{\mathrm{i}t \sum_{j=1}^{p} \alpha_j\mu_j\right\}, \tag{5}$$

where $\eta$ is the number of Exponential distributions with different rate parameters, $\lambda_j$ are the parameters of these distributions, and $r_j$ is the number of such distributions with the same rate parameter $\lambda_j$, for $j = 1, \ldots, \eta$. We have thus established the following corollary to Theorem 1.

Corollary 1 Let $X_j \overset{\text{ind.}}{\sim} \text{Gumbel}(\mu_j, \sigma_j)$, with $\mu_j \in \mathbb{R}$ and $\sigma_j \in \mathbb{R}^+$. If $W = \sum_{j=1}^{p} \alpha_j X_j$, with $\alpha_j \in \mathbb{R}^+$, $j = 1, \ldots, p$, then it holds that $W \overset{d}{=} W_1 + W_2$, with $W_1$ as in (4) and

$$W_2 \sim \text{SGIG}\left( r, \lambda, \eta, \sum_{j=1}^{p} \alpha_j\mu_j \right),$$

where $r = (r_1, \ldots, r_\eta)$ and $\lambda = (\lambda_1, \ldots, \lambda_\eta)$.

Here and below we use the letter S to denote a shifted distribution, and we follow the convention that the last parameter in a shifted distribution is the shift parameter; see Appendix 1 for further details.

It is instructive to consider the case of the sum of $p$ independent Gumbel random variables when $\sigma_j = \sigma$, $j = 1, \ldots, p$, for which simple expressions of the characteristic functions are readily available, as a consequence of Corollary 1,

$$\Phi_{W_1}(t) = \left( \frac{\Gamma(\gamma - \mathrm{i}t\sigma)}{\Gamma(\gamma)} \right)^{p}, \qquad \Phi_{W_2}(t) = \left\{ \prod_{j=0}^{\gamma-2} (\lambda_j)^{r_j} (\lambda_j - \mathrm{i}t)^{-r_j} \right\} \exp\left\{\mathrm{i}t \sum_{j=1}^{p} \mu_j\right\}, \tag{6}$$

with $r_j = p$, $\lambda_j = (1 + j)/\sigma$, for $j = 0, \ldots, \gamma - 2$; this implies that in such case

$$W_2 \sim \text{SGIG}\left( p\,\mathbf{1}_{\gamma-1}^{T}, \; \sigma^{-1}(1, \ldots, \gamma - 1), \; \gamma - 1, \; \sum_{j=1}^{p} \mu_j \right),$$

where $\mathbf{1}_{\gamma-1}$ denotes a $(\gamma - 1) \times 1$ vector of ones. The parameter $\gamma$ is related with the depth of the SGIG distribution and it may be used as a precision parameter, since, as we will see in Sect. 3, larger values of $\gamma$ lead to more accurate near-exact approximations.
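As a sanity check on the decomposition of W into W1 plus W2, the first two moments on both sides can be compared numerically. The sketch below is an illustration under assumptions, not part of the paper: all function names are hypothetical, and it relies on standard facts (a Gumbel variable with location m and scale s has mean m + s times Euler's constant and variance pi squared times s squared over 6; for an integer argument n the digamma function equals minus Euler's constant plus the harmonic number of order n - 1, and the trigamma function equals pi squared over 6 minus the sum of 1/k squared for k below n).

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def digamma_int(n):
    """Digamma function at a positive integer n."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))


def trigamma_int(n):
    """Trigamma function at a positive integer n."""
    return math.pi ** 2 / 6 - sum(1.0 / k ** 2 for k in range(1, n))


def gumbel_moments(alpha, sigma, mu):
    """Mean and variance of sum_j alpha_j X_j for independent Gumbel X_j."""
    mean = sum(a * (m + s * EULER_GAMMA) for a, s, m in zip(alpha, sigma, mu))
    var = math.pi ** 2 / 6 * sum((a * s) ** 2 for a, s in zip(alpha, sigma))
    return mean, var


def decomposition_moments(alpha, sigma, mu, gamma):
    """Mean and variance of W1 + W2 for an integer precision parameter gamma >= 2."""
    c = [a * s for a, s in zip(alpha, sigma)]
    # W1: minus a weighted sum of log Gamma(gamma, 1) variables.
    mean_w1 = -digamma_int(gamma) * sum(c)
    var_w1 = trigamma_int(gamma) * sum(x * x for x in c)
    # W2: shifted sum of Exponentials with rates (1 + k) / c_j.
    shift = sum(a * m for a, m in zip(alpha, mu))
    mean_w2 = shift + sum(x / (1 + k) for x in c for k in range(gamma - 1))
    var_w2 = sum((x / (1 + k)) ** 2 for x in c for k in range(gamma - 1))
    return mean_w1 + mean_w2, var_w1 + var_w2
```

For any integer precision parameter of at least 2, `decomposition_moments` should agree with `gumbel_moments` up to floating-point error, since the harmonic-type sums contributed by W2 cancel those hidden in the polygamma values of W1.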

2.1.2 General linear combinations ($\alpha_j \in \mathbb{R}$)

If we have $q$ positive $\alpha_j$ and $p - q$ negative $\alpha_j$, the characteristic function in (3) can be written as

$$\Phi_{W_2}(t) = \left\{ \prod_{\{j:\,\alpha_j > 0\}} \prod_{k=0}^{\gamma-2} \left(\frac{1+k}{\alpha_j\sigma_j}\right) \left(\frac{1+k}{\alpha_j\sigma_j} - \mathrm{i}t\right)^{-1} \right\} \left\{ \prod_{\{j:\,\alpha_j < 0\}} \prod_{k=0}^{\gamma-2} \left(\frac{1+k}{-\alpha_j\sigma_j}\right) \left(\frac{1+k}{-\alpha_j\sigma_j} + \mathrm{i}t\right)^{-1} \right\} \exp\left\{\mathrm{i}t \sum_{j=1}^{p} \alpha_j\mu_j\right\},$$

so that similarly to (5), we obtain

$$\Phi_{W_2}(t) = \left\{ \prod_{j=1}^{\eta^+} (\lambda_j^+)^{r_j^+} (\lambda_j^+ - \mathrm{i}t)^{-r_j^+} \right\} \left\{ \prod_{j=1}^{\eta^-} (\lambda_j^-)^{r_j^-} (\lambda_j^- + \mathrm{i}t)^{-r_j^-} \right\} \exp\left\{\mathrm{i}t \sum_{j=1}^{p} \alpha_j\mu_j\right\}, \tag{7}$$

where $r^+ = (r_1^+, \ldots, r_{\eta^+}^+)$ and $\lambda^+ = (\lambda_1^+, \ldots, \lambda_{\eta^+}^+)$ are respectively the shape and rate parameters corresponding to the positive $\alpha_j$, and $r^- = (r_1^-, \ldots, r_{\eta^-}^-)$ and $\lambda^- = (\lambda_1^-, \ldots, \lambda_{\eta^-}^-)$ are respectively the shape and rate parameters corresponding to the negative $\alpha_j$. In this case the exact distribution of $W$ is the distribution of the sum of two independent random variables, $W_1$ and $W_2$, where $W_1$ is as in (4) and $W_2$ follows a SDGIG distribution. This gives rise to the following corollary.

Corollary 2 Let $X_j \overset{\text{ind.}}{\sim} \text{Gumbel}(\mu_j, \sigma_j)$, with $\mu_j \in \mathbb{R}$ and $\sigma_j \in \mathbb{R}^+$. If $W = \sum_{j=1}^{p} \alpha_j X_j$, with $\alpha_j \in \mathbb{R}$, $j = 1, \ldots, p$, then it holds that $W \overset{d}{=} W_1 + W_2$, with $W_1$ as in (4) and

$$W_2 \sim \text{SDGIG}\left( r^+, r^-, \lambda^+, \lambda^-, \eta^+, \eta^-, \sum_{j=1}^{p} \alpha_j\mu_j \right), \tag{8}$$

where $r^+ = (r_1^+, \ldots, r_{\eta^+}^+)$ and $\lambda^+ = (\lambda_1^+, \ldots, \lambda_{\eta^+}^+)$ are respectively the shape and rate parameters corresponding to the positive $\alpha_j$ and $r^- = (r_1^-, \ldots, r_{\eta^-}^-)$ and $\lambda^- = (\lambda_1^-, \ldots, \lambda_{\eta^-}^-)$ are respectively the shape and rate parameters corresponding to the negative $\alpha_j$.

2.2 Near-exact distributions

2.2.1 First near-exact distribution ($\alpha_j > 0$)

Our first near-exact distribution is based on replacing $\Phi_{W_1}$ by an asymptotic approximation $\Phi_{W_1^*}$, such that for sufficiently large $\gamma$

$$\Phi_{W^*}(t) = \Phi_{W_1^*}(t)\,\Phi_{W_2}(t),$$

approximates the exact characteristic function $\Phi_W$; the distribution of the random variable $W^*$ is said to be a near-exact distribution of $W$ (Coelho 2004). Based on the characterization of the exact distribution of $W$ in Corollary 1, we take

$$\Phi_{W_1^*}(t) = \left( \frac{l}{l - \mathrm{i}t} \right)^{\delta} \exp\{\mathrm{i}t\theta\}, \tag{9}$$

which is the characteristic function of a random variable $W_1^* \sim \text{SGamma}(\delta, l, \theta)$, and replaces asymptotically $\Phi_{W_1}$ in (2), for increasing values of $\gamma$; see Appendix 1 for details on the shifted Gamma distribution. Our choice is based on the fact that a single logarithmized Gamma random variable may be represented as an infinite sum of independent shifted Exponential random variables (see Appendix 2 for details), and as such the sum of independent logarithmized Gamma random variables, possibly multiplied by a parameter, may be represented as an infinite sum of shifted Gamma distributions. Instead of this infinite sum of shifted Gamma distributions, to avoid computational difficulties, we use a single shifted Gamma distribution, which matches the first three exact moments. Hence, the parameters $\delta$, $l$, and $\theta$ are determined by solving the system of equations

$$\left. \frac{\partial^j \Phi_{W_1^*}(t)}{\partial t^j} \right|_{t=0} = \left. \frac{\partial^j \Phi_{W_1}(t)}{\partial t^j} \right|_{t=0}, \quad j = 1, 2, 3, \tag{10}$$

so as to ensure that the first three moments of the exact and approximating distributions are equal. The solution to (10) is

$$\delta = 4(\psi_1\nu_2)^3 (\psi_2\nu_3)^{-2}, \qquad l = 2(\psi_1\nu_2)\,|\psi_2\nu_3|^{-1}, \qquad \theta = -\psi_0\nu_1 - 2(\psi_1\nu_2)^2\,|\psi_2\nu_3|^{-1}, \tag{11}$$

where we use the following notation throughout the paper

$$\psi_i \equiv \psi_i(\gamma) = \frac{\partial^{i+1}}{\partial \gamma^{i+1}} \log\{\Gamma(\gamma)\}, \qquad \nu_i \equiv \nu_i(\alpha) = \sum_{j=1}^{p} (\alpha_j\sigma_j)^i, \quad \alpha = (\alpha_1, \ldots, \alpha_p). \tag{12}$$

The resulting near-exact distribution is established in the next theorem.

Theorem 2 Let $X_j \overset{\text{ind.}}{\sim} \text{Gumbel}(\mu_j, \sigma_j)$, with $\mu_j \in \mathbb{R}$ and $\sigma_j \in \mathbb{R}^+$. If we use as an asymptotic approximation of $\Phi_{W_1}(t)$ in (2) the characteristic function $\Phi_{W_1^*}(t)$ in (9), we obtain as near-exact distribution for $W = \sum_{j=1}^{p} \alpha_j X_j$, with $\alpha_j \in \mathbb{R}^+$, $j = 1, \ldots, p$, the shifted GNIG distribution

$$\text{SGNIG}\left( r^*, \lambda^*, \eta + 1, \theta + \sum_{j=1}^{p} \alpha_j\mu_j \right),$$

with $r^* = (r_1, \ldots, r_\eta, \delta)$ and $\lambda^* = (\lambda_1, \ldots, \lambda_\eta, l)$, and where the $r_j$, $\lambda_j$, and $\eta$ are given in (5) and $\delta$, $l$, and $\theta$ are given by (11).

Proof It is enough to note that for each $t \in \mathbb{R}$, it holds that

$$\begin{aligned}
\Phi_{W_1^*}(t)\,\Phi_{W_2}(t) &= \left( \frac{l}{l - \mathrm{i}t} \right)^{\delta} \exp\{\mathrm{i}t\theta\} \left\{ \prod_{j=1}^{\eta} (\lambda_j)^{r_j} (\lambda_j - \mathrm{i}t)^{-r_j} \right\} \exp\left\{\mathrm{i}t \sum_{j=1}^{p} \alpha_j\mu_j\right\} \\
&= \left\{ \prod_{j=1}^{\eta} (\lambda_j)^{r_j} (\lambda_j - \mathrm{i}t)^{-r_j} \right\} \left( \frac{l}{l - \mathrm{i}t} \right)^{\delta} \exp\left\{\mathrm{i}t \left( \theta + \sum_{j=1}^{p} \alpha_j\mu_j \right) \right\}.
\end{aligned}$$

It is again instructive to consider the particular case addressed in (6), that is, the sum of independent Gumbel random variables with the same scale parameter. In this case we obtain the near-exact distribution

$$\text{SGNIG}\left( r^*, \lambda^*, \gamma, \theta + \sum_{j=1}^{p} \mu_j \right), \tag{13}$$

where $r^* = (p\,\mathbf{1}_{\gamma-1}^{T}, \delta)$ and $\lambda^* = \sigma^{-1}(1, \ldots, \gamma - 1, l\sigma)$, with $\delta$, $l$, and $\theta$ given by (11).

Using Corollary 2, and following two interesting recommendations made by an anonymous reviewer, we next develop two near-exact distributions for the case where the coefficients need not be positive.

2.2.2 Second near-exact distribution ($\alpha_j \in \mathbb{R}$)

We now develop a near-exact distribution that, although less accurate, is computationally fast and can be applied to the case of arbitrary real $\alpha_j$. To do so, we approximate the distribution of $W = W_1 + W_2$ in Corollary 2 with the distribution of $W^{**} = \mathrm{E}(W_1) + W_2$, where $\mathrm{E}(W_1) = -\psi_0\nu_1$ with $\psi_0$ and $\nu_1$ as defined in (12). The distribution of $W^{**}$ corresponds to our second near-exact approximation, and as described in the next theorem $W^{**}$ follows a SDGIG distribution with shift parameter $\mathrm{E}(W_1) + \sum_{j=1}^{p} \alpha_j\mu_j$.

Theorem 3 Let $X_j \overset{\text{ind.}}{\sim} \text{Gumbel}(\mu_j, \sigma_j)$, with $\mu_j \in \mathbb{R}$ and $\sigma_j \in \mathbb{R}^+$. If we replace $W_1$ by $\mathrm{E}(W_1)$ we obtain as near-exact distribution for $W = \sum_{j=1}^{p} \alpha_j X_j$, with $\alpha_j \in \mathbb{R}$, $j = 1, \ldots, p$, the shifted DGIG distribution

$$\text{SDGIG}\left( r^+, r^-, \lambda^+, \lambda^-, \eta^+, \eta^-, \mathrm{E}(W_1) + \sum_{j=1}^{p} \alpha_j\mu_j \right),$$

where $r^+$, $r^-$, $\lambda^+$, and $\lambda^-$ are as in Corollary 2.

2.2.3 Third near-exact distribution ($\nu_3(\alpha) \neq 0$)

Our third near-exact distribution can be applied when $\nu_3 \neq 0$, where $\nu_3$ is defined in (12); below we assume that $W_1^* \sim \text{SGamma}(\delta, l, \theta)$ and that $\nu_3 \neq 0$, so that either $\text{sign}(\nu_3) = 1$ or $\text{sign}(\nu_3) = -1$, with $\text{sign}(\cdot)$ denoting the sign function. This near-exact distribution is also based on Corollary 2, but here we approximate the distribution of $W_1$ in (4) with the distribution of $\text{sign}(\nu_3)\,W_1^*$, whose characteristic function is $\Phi_{\text{sign}(\nu_3) W_1^*}(t) = \Phi_{W_1^*}(\text{sign}(\nu_3)\,t)$, and where $\Phi_{W_1^*}(t)$ is as in (9). Here, the parameters $\delta$, $l$, and $\theta$ are determined by solving the system of equations

$$\left. \frac{\partial^j \Phi_{W_1^*}(\text{sign}(\nu_3)\,t)}{\partial t^j} \right|_{t=0} = \left. \frac{\partial^j \Phi_{W_1}(t)}{\partial t^j} \right|_{t=0}, \tag{14}$$

for $j = 1, 2, 3$, which has a solution if and only if $\nu_3 \neq 0$, in which case

$$\delta = 4(\psi_1\nu_2)^3 (\psi_2\nu_3)^{-2}, \qquad l = 2(\psi_1\nu_2)\,|\psi_2\nu_3|^{-1}, \qquad \theta = -\text{sign}(\nu_3)\,\psi_0\nu_1 - 2(\psi_1\nu_2)^2\,|\psi_2\nu_3|^{-1}. \tag{15}$$

The following theorem holds.

Theorem 4 Let $X_j \overset{\text{ind.}}{\sim} \text{Gumbel}(\mu_j, \sigma_j)$, with $\mu_j \in \mathbb{R}$ and $\sigma_j \in \mathbb{R}^+$. If we use as an asymptotic approximation of $\Phi_{W_1}(t)$ in (2) the characteristic function $\Phi_{W_1^*}(\text{sign}(\nu_3)\,t)$ in (9), we obtain as near-exact distribution for $W = \sum_{j=1}^{p} \alpha_j X_j$, with $\nu_3 \neq 0$, the distribution of $\text{sign}(\nu_3)\,W_1^* + W_2$ where $W_1^* \sim \text{SGamma}(\delta, l, \theta)$ and $W_2$ is as in (8), and where $\delta$, $l$, and $\theta$ are given by (15).

Technical details on the distribution of $\text{sign}(\nu_3)\,W_1^* + W_2$ can be found in Appendix 1.

3 Numerical studies

3.1 Measuring accuracy

To study the quality of our near-exact approximations we use a measure of proximity between characteristic functions, that is also a measure of the proximity between distribution functions, and which is defined as

$$\Delta = \frac{1}{2\pi} \int_{\mathbb{R}} \left| \frac{\Phi_W(t) - \Phi_{W^*}(t)}{t} \right| \mathrm{d}t. \tag{16}$$

This measure is known to be related with the Berry–Esseen upper bound (Berry 1941; Esseen 1945; Loève 1977; Hwang 1998), and can be shown to verify the inequality

$$\| F_W - F_{W^*} \| \leq \frac{1}{2\pi} \int_{\mathbb{R}} \left| \frac{\Phi_{W_1}(t) - \Phi_{W_1^*}(t)}{t} \right| \mathrm{d}t,$$

where $\| F_W - F_{W^*} \| = \sup_{w \in \mathbb{R}} |F_W(w) - F_{W^*}(w)|$. Here $F_{W^*}$ denotes a near-exact distribution function, which, for example in the case of our first near-exact distribution is

$$F_{W^*}(w) = F_{V}\left( w - \theta - \sum_{j=1}^{p} \alpha_j\mu_j ; \; r^*, \lambda^*, \eta + 1 \right), \tag{17}$$

where $r^*$ and $\lambda^*$ are as in Theorem 2, and $F_V$ denotes the distribution function of the random variable $V$ with a GNIG distribution, as defined in (23) in Appendix 1. For our second near-exact distribution all follows analogously, but $W^*$ needs to be replaced by $W^{**}$, with $W^{**}$ distributed as in Theorem 3, and $F_{W^*}$ in (17) must be accordingly replaced with the distribution function

$$F_{V^{**}}\left( w - \mathrm{E}(W_1) - \sum_{j=1}^{p} \alpha_j\mu_j ; \; r^+, r^-, \lambda^+, \lambda^-, \eta^+, \eta^- \right).$$

Here $r^+$, $r^-$, $\lambda^+$, and $\lambda^-$ are defined as in Theorem 3, and $F_{V^{**}}$ is the distribution function of a random variable $V^{**}$ with a DGIG distribution. For our third near-exact distribution, which can be applied when $\nu_3 \neq 0$, $W_1^*$ should be replaced by $\text{sign}(\nu_3)\,W_1^*$ and $F_{W^*}$ replaced by the distribution function of $\text{sign}(\nu_3)\,W_1^* + W_2$ in Theorem 4 (see expressions (25) and (26) in Appendix 1 for details on this distribution function).

We note that when $\gamma \to \infty$, we have $\Delta \to 0$ and $W^* \xrightarrow{w} W$, where $\xrightarrow{w}$ is used to denote weak convergence. Parenthetically, we further note that to ensure that we accurately approximate the tail of the exact distribution, we need to keep increasing the precision parameter $\gamma$ as we move towards higher quantiles; further details on the measure $\Delta$ can be found in Grilo and Coelho (2007), Marques and Coelho (2008), and Coelho and Marques (2010, 2012).

3.2 Numerical results

3.2.1 First near-exact distribution ($\alpha_j > 0$)

In Tables 1 and 2 we report numerical results conducted according to the following Scenarios:

Scenario i: $\mu_{\mathrm{i}} = (2, 3)$, $\sigma_{\mathrm{i}} = (5, 6)$, and $\alpha_{\mathrm{i}} = \mathbf{1}_2^T$;
Scenario ii: $\mu_{\mathrm{ii}} = (4, 1, 2, 3)$, $\sigma_{\mathrm{ii}} = (0.1, 0.2, 0.3, 0.4)$, and $\alpha_{\mathrm{ii}} = (1, 2, 3, 4)$;
Scenario iii: $\mu_{\mathrm{iii}} = (10, 10, 20, 30, 40)$, $\sigma_{\mathrm{iii}} = (1, 2, 3, 4, 5)$, and $\alpha_{\mathrm{iii}} = (1/2, 1, 3/4, 5, 1)$.

Table 1 Values of $\Delta$ for Scenarios i–iii

γ      Scenario i (p = 2)   Scenario ii (p = 4)   Scenario iii (p = 5)
4      1.4 × 10^-4          1.8 × 10^-4           3.4 × 10^-4
10     8.0 × 10^-6          1.0 × 10^-5           2.0 × 10^-5
15     2.3 × 10^-6          2.9 × 10^-6           5.8 × 10^-6
20     9.4 × 10^-7          1.2 × 10^-6           2.4 × 10^-6
50     5.8 × 10^-8          7.4 × 10^-8           1.5 × 10^-7
100    7.1 × 10^-9          9.1 × 10^-9           1.8 × 10^-8
500    5.6 × 10^-11         7.2 × 10^-11          1.4 × 10^-10

Table 2 Computation time (in seconds) for the near-exact cumulative distribution functions for Scenarios i–iii

       Scenario i (p = 2)      Scenario ii (p = 4)     Scenario iii (p = 5)
       p values                p values                p values
γ      0.10   0.05   0.01      0.10   0.05   0.01      0.10   0.05   0.01
4      0.02   0.02   0.02      0.05   0.03   0.03      0.05   0.03   0.03
10     0.08   0.08   0.08      0.39   0.41   0.33      0.38   0.39   0.41
15     0.17   0.17   0.20      1.17   1.28   1.14      1.48   1.62   1.08
20     0.30   0.31   0.36      2.87   3.09   2.82      2.93   2.90   2.95
50     2.62   2.50   3.00      64.2   70.9   65.8      72.6   68.3   70.7

In Table 1 it can be observed that the values of $\Delta$ are quite low, indicating a good approximation, and that $\Delta$ decreases as the parameter $\gamma$ increases. In addition, it can also be noticed that $\Delta$ is unresponsive to changes in $\mu_j$, and the same happens if we multiply all the $\alpha_j$ by the same constant. The quality of the near-exact approximations is patent from the extremely reduced values of $\Delta$.

The parameter $\gamma$ may be chosen according to the desired precision. Higher values of $\gamma$ entail however a higher computational investment, and hence the selection of this parameter involves a precision–burden tradeoff. In Table 2 we present the computation time, in seconds, for the calculation of the

p values 0.10, 0.05 and 0.01, using the near-exact quantiles. These calculations were done using an Intel i7 2GHz processor; for values of $\gamma$ larger than 50 the computation times start to increase steadily. In most computations below we use the value $\gamma = 10$ as a reference value, as it provides a sensible computation time/$\Delta$ ratio for our first near-exact distribution. It is also possible to observe from Table 2 that, as expected, when we increase $p$ the computation time also grows, this growth being less steep for small to moderate values of $\gamma$.

To compare the exact and near-exact quantiles we present in Fig. 1 QQ-plots for Scenarios i–iii. The extreme closeness between the exact and near-exact quantiles is sustained by the fact that all the points are extremely close to the line of equation $y = x$; the exact quantiles were computed using the Gil-Pelaez (1951) inversion formulas and the bisection method, which is very time consuming, numerically unstable, and hence inappropriate for regular use.

[Fig. 1: three panels (Scenario I, Scenario II, Scenario III); horizontal axis: Exact Quantiles; vertical axis: Near-Exact Quantiles]

Fig. 1 QQ-plots for Scenarios i–iii. The near-exact quantiles, 0.80, 0.85, 0.90, 0.925, 0.95, 0.975, 0.99, 0.995 and 0.999, were computed using the near-exact distribution function in (17) for $\gamma = 10$ and the corresponding exact quantiles were computed using the Gil-Pelaez (1951) inversion formulas and the bisection method

It is also interesting that the near-exact approximations tend to slightly improve with an increasing number of variables, as can be seen from Table 3, where we consider a Scenario a with parameters $\mu_a = ((-1)^j 2j, \; j = 1, \ldots, p)$, $\sigma_a = (5/j, \; j = 1, \ldots, p)$ and $\alpha_a = (2j + 1, \; j = 1, \ldots, p)$ for $p = 2, 10, 20, 30, 50$.

Table 3 Values of $\Delta$ for Scenario a

γ      p = 2          p = 10         p = 20         p = 30         p = 50
4      1.4 × 10^-4    1.5 × 10^-5    6.7 × 10^-6    4.2 × 10^-6    2.4 × 10^-6
10     8.0 × 10^-6    8.1 × 10^-7    3.5 × 10^-7    2.2 × 10^-7    1.3 × 10^-7
15     2.3 × 10^-6    2.3 × 10^-7    9.9 × 10^-8    6.3 × 10^-8    3.6 × 10^-8
20     9.4 × 10^-7    9.4 × 10^-8    4.1 × 10^-8    2.6 × 10^-8    1.5 × 10^-8
50     5.8 × 10^-8    5.8 × 10^-9    2.5 × 10^-9    1.6 × 10^-9    8.9 × 10^-10
100    7.1 × 10^-9    7.1 × 10^-10   3.1 × 10^-10   1.9 × 10^-10   1.1 × 10^-10
500    5.6 × 10^-11   5.6 × 10^-12   2.4 × 10^-12   1.5 × 10^-12   8.8 × 10^-13

3.2.2 Second near-exact distribution ($\alpha_j \in \mathbb{R}$)

In Tables 4 and 5 we report numerical results conducted according to the following scenarios:

Scenario iv: $\mu_{\mathrm{iv}} = \mu_{\mathrm{i}} = (2, 3)$, $\sigma_{\mathrm{iv}} = \sigma_{\mathrm{i}} = (5, 6)$, and $\alpha_{\mathrm{iv}} = (1, 1)$;
Scenario v: $\mu_{\mathrm{v}} = \mu_{\mathrm{ii}} = (4, 1, 2, 3)$, $\sigma_{\mathrm{v}} = \sigma_{\mathrm{ii}} = (0.1, 0.2, 0.3, 0.4)$, and $\alpha_{\mathrm{v}} = (1, 2, 3, 4)$;
Scenario vi: $\mu_{\mathrm{vi}} = \mu_{\mathrm{iii}} = (10, 10, 20, 30, 40)$, $\sigma_{\mathrm{vi}} = \sigma_{\mathrm{iii}} = (1, 2, 3, 4, 5)$, and $\alpha_{\mathrm{vi}} = (1/2, 1, 3/4, 5, 1)$.

Table 4 Values of $\Delta$ for Scenarios iv–vi

γ      Scenario iv (p = 2)   Scenario v (p = 4)   Scenario vi (p = 5)
Second near-exact distribution
4      4.7 × 10^-2           4.6 × 10^-2          5.5 × 10^-2
10     1.5 × 10^-2           1.5 × 10^-2          1.7 × 10^-2
15     9.8 × 10^-3           9.8 × 10^-3          1.1 × 10^-2
20     7.2 × 10^-3           7.2 × 10^-3          8.3 × 10^-3
50     2.8 × 10^-3           2.7 × 10^-3          3.2 × 10^-3
100    1.3 × 10^-3           1.4 × 10^-3          1.6 × 10^-3
500    2.7 × 10^-4           2.7 × 10^-4          3.1 × 10^-4
Third near-exact distribution
4      5.3 × 10^-4           4.0 × 10^-4          3.9 × 10^-4
10     3.0 × 10^-5           2.2 × 10^-5          2.3 × 10^-5
15     8.5 × 10^-6           6.4 × 10^-6          6.6 × 10^-6
20     3.5 × 10^-6           2.6 × 10^-6          2.7 × 10^-6
50     2.1 × 10^-7           1.6 × 10^-7          1.7 × 10^-7
100    2.6 × 10^-8           2.0 × 10^-8          2.1 × 10^-8
500    2.1 × 10^-10          1.6 × 10^-10         1.6 × 10^-10

From Tables 4 and 5 we can observe that the near-exact approximation obtained using the result in Theorem 3 is not as accurate as the one obtained with Theorem 2, although it presents faster computation times for the same values of $\gamma$. To achieve in Scenarios iv–vi similar performances as the ones obtained for Scenarios i–iii we need to consider at least $\gamma = 500$, as can be seen in Table 4. Again, for Scenarios iv–vi, it can be ascertained from Table 5 that for higher values of $p$ we obtain a higher computational cost.

From the QQ-plots in Fig. 2 it can be noticed that, for Scenarios iv–vi, the near-exact quantiles approximate reasonably well the exact ones. In these QQ-plots we consider $\gamma = 100$, given that it provides a reasonable computation time/$\Delta$ ratio for our second near-exact distribution.

From Table 6 it can be ascertained that the accuracy of our second near-exact approximation also tends to improve as the number of variables increases, although in this case the decrements in $\Delta$ occur at a much slower rate; in Table 6 we considered a Scenario b with $\mu_b = (j/2, \; j = 1, \ldots, p)$, $\sigma_b = (3j - 1, \; j = 1, \ldots, p)$, and $\alpha_b = ((-1)^{j+1} 3j, \; j = 1, \ldots, p)$ for $p = 2, 10, 20, 30, 50$.

3.2.3 Third near-exact distribution ($\nu_3(\alpha) \neq 0$)

We assess the performance of the third near-exact distribution on Scenarios iv–vi. Tables 4 and 6 reveal that our third near-exact distribution possesses similar asymptotic properties as our second approach, although, as reflected by its lower values of $\Delta$, it is much more precise. The computation time of our third near-exact distribution increases however as a function of $\gamma$, in some cases beyond the realms of practicality. From Table 5 it is possible to observe that our third near-exact distribution presents higher computing times than our second one, and thus we propose $\gamma = 4$ as a reference, as it provides a sensible computation time/$\Delta$ ratio, to be used in practical applications. Note that for $\gamma = 4$, the value of $\Delta$ is slightly lower than the one for our second approach with $\gamma = 100$, but even with this difference the near-exact quantiles of both approaches would be virtually indistinguishable if plotted simultaneously in Fig. 2.

Our third near-exact distribution can also be applied to the case of positive linear combination coefficients. In practice we have found that although both our first and third near-exact distributions have tantamount precision, the third approach requires a higher computational investment.

4 Examples and illustrations

All examples in this section entail positive linear combinations, and hence for conciseness only our first near-exact approximation is used.

4.1 Network engineering

The real time management of massive data streams in large-scale networks leads to a number of challenging problems in computational statistics (Domingos and Hulten 2003). One such problem entails achieving at least a minimum level of quality-of-service, and a well-known method for achieving this goal is the so-called egress admission control algorithm (Cetinkaya et al. 2001). A full description of this algorithm is beyond the scope of our paper. What is relevant for our purposes is that their algorithm is based on the sum of two independent Gumbel distributed random variables, and quoting the authors (Cetinkaya et al. 2001, p. 76):

    Approximating the sum of two Gumbel distributed random variables by a Gumbel random variable, the admission control test follows.

Thus, Cetinkaya et al. (2001) inadequately use a single Gumbel distribution to approximate the sum of two independent Gumbel distributions, as already remarked in Nadarajah and Kotz (2008). In Fig. 3 we illustrate the reliability of our SGNIG-based near-exact approximation, introduced in Sect. 2.2, and the inadequacy of the approach in Cetinkaya et al. (2001), as assessed by the pointwise difference to the exact density obtained using the inversion formulas in Gil-Pelaez (1951).

Fig. 3 clearly provides evidence to support the claim that the egress admission control algorithm could benefit from using our near-exact approximation. To give a more complete view of the comparison between our approach and the one in Cetinkaya et al. (2001), we revisit Scenario i from Sect. 3, and to assess the performance of both approaches we again use the measure $\Delta$, as defined in (16). The results are reported in Fig. 4, and again provide evidence suggesting that our near-exact approximation would yield more precise and reliable egress admission control algorithms.

Parenthetically, we note that Nadarajah and Kotz (2008) present an expression for the exact distribution of the sum of two independent Gumbel random variables, but only for the cases where the ratio between the scale parameters is a

rational number. However, the expression they use for their function $J(\cdot, \cdot, \cdot, \cdot)$ is not valid when its first and third arguments are symmetrical, which means that their expressions for the cumulative distribution and probability density functions simply do not work.

Table 5 Computation time (in seconds) for the near-exact cumulative distribution functions for Scenarios iv–vi

       Scenario iv (p = 2)        Scenario v (p = 4)         Scenario vi (p = 5)
       p values                   p values                   p values
γ      0.10    0.05    0.01       0.10    0.05    0.01       0.10     0.05     0.01
Second near-exact distribution
4      0.00    0.00    0.00       0.00    0.00    0.00       0.00     0.00     0.02
10     0.00    0.02    0.00       0.02    0.02    0.03       0.03     0.03     0.02
15     0.02    0.02    0.02       0.05    0.05    0.05       0.08     0.06     0.08
20     0.03    0.02    0.02       0.09    0.08    0.09       0.13     0.13     0.14
50     0.16    0.13    0.12       0.75    0.73    0.73       1.22     1.23     1.22
100    0.89    0.87    0.86       5.76    5.77    5.76       12.4     12.5     12.4
500    44.6    44.9    45.2       526.3   628.5   630.2      2540.0   2550.0   2545.0
Third near-exact distribution
4      0.23    0.20    0.17       0.22    0.22    0.19       0.44     0.41     0.34
10     2.56    1.75    1.28       2.62    2.59    2.12       12.50    11.65    8.42
15     7.72    5.69    4.17       13.44   6.37    5.68       46.84    47.11    39.61
20     15.88   13.74   11.29      29.03   12.59   11.45      173.46   190.15   166.89
50     130.96  102.31  72.76      260.52  241.05  110.67     *        *        *
* Above 1 h. The same applies for γ = 100 and γ = 500

[Fig. 2: three panels (Scenario IV, Scenario V, Scenario VI); horizontal axis: Exact Quantiles; vertical axis: Near-Exact Quantiles]

Fig. 2 QQ-plots for Scenarios iv–vi. The near-exact quantiles, 0.80, 0.85, 0.90, 0.925, 0.95, 0.975, 0.99, 0.995 and 0.999, were computed using the near-exact distribution function in (17) for $\gamma = 100$ and the corresponding exact quantiles were computed using the Gil-Pelaez (1951) inversion formulas and the bisection method

4.2 Computational biology

Our second example is on motif discovery in biological sequences. Some interesting computational and statistical issues arising in modeling these problems are documented in Keich and Nagarajan (2006), and the huge literature on the topic is reviewed by Sandve and Drabløs (2006). Our analysis focuses on a method proposed by Bailey and Gribskov (1997) for calculating p values for the test of simultaneous matching of $p$ DNA sequences in a database. More precisely, the authors consider the test statistic

$$W_p(n) = \sum_{i=1}^{p} X_i(n), \tag{18}$$

where for each $i$, $X_i(n)$ is a sequence of random variables converging in distribution to a standard Gumbel distribution $X_i$, as $n \to \infty$; here $n$ should be understood as the number of DNA sequences in the database. Under the assumptions in

Table 6 Values of the proximity measure defined in (16) for scenario b

Precision
parameter    p = 2          p = 10         p = 20         p = 30         p = 50

Second near-exact distribution
4            6.0 × 10^-2    3.7 × 10^-2    3.3 × 10^-2    3.2 × 10^-2    3.1 × 10^-2
10           1.9 × 10^-2    1.3 × 10^-2    1.2 × 10^-2    1.1 × 10^-2    1.1 × 10^-2
15           1.2 × 10^-2    8.2 × 10^-3    7.6 × 10^-3    7.3 × 10^-3    7.1 × 10^-3
20           8.7 × 10^-3    6.1 × 10^-3    5.6 × 10^-3    5.4 × 10^-3    5.3 × 10^-3
50           3.4 × 10^-3    2.4 × 10^-3    2.2 × 10^-3    2.1 × 10^-3    2.1 × 10^-3
100          1.7 × 10^-3    1.2 × 10^-3    1.1 × 10^-3    1.0 × 10^-3    1.0 × 10^-3
500          3.3 × 10^-4    2.3 × 10^-4    2.1 × 10^-4    2.1 × 10^-4    2.0 × 10^-4

Third near-exact distribution
4            4.2 × 10^-4    1.7 × 10^-4    7.7 × 10^-5    4.8 × 10^-5    2.7 × 10^-5
10           2.6 × 10^-5    9.3 × 10^-6    4.0 × 10^-6    2.5 × 10^-6    1.4 × 10^-6
15           7.4 × 10^-6    2.6 × 10^-6    1.1 × 10^-6    7.1 × 10^-7    4.0 × 10^-7
20           3.1 × 10^-6    1.1 × 10^-6    4.7 × 10^-7    2.9 × 10^-7    1.6 × 10^-7
50           1.9 × 10^-7    6.7 × 10^-8    2.9 × 10^-8    1.8 × 10^-8    1.0 × 10^-8
100          2.3 × 10^-8    8.2 × 10^-9    3.6 × 10^-9    2.2 × 10^-9    1.2 × 10^-9
500          1.8 × 10^-10   6.5 × 10^-11   2.8 × 10^-11   1.7 × 10^-11   9.8 × 10^-12
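
How quickly the tabulated values shrink as the precision parameter grows can be summarized by a log-log slope. The sketch below is only illustrative: it uses the p = 2 column of Table 6, reads the exponents as negative powers of ten, and takes the slope between the smallest and largest precision parameter. The helper name `loglog_slope` is an assumption introduced here and is not part of the authors' Mathematica programs.

```python
import math

# p = 2 column of Table 6, keyed by the precision parameter
# (exponents read as negative powers of ten).
second = {4: 6.0e-2, 10: 1.9e-2, 15: 1.2e-2, 20: 8.7e-3,
          50: 3.4e-3, 100: 1.7e-3, 500: 3.3e-4}
third = {4: 4.2e-4, 10: 2.6e-5, 15: 7.4e-6, 20: 3.1e-6,
         50: 1.9e-7, 100: 2.3e-8, 500: 1.8e-10}


def loglog_slope(values):
    """Slope of log(measure) against log(precision parameter), end point to end point."""
    lo, hi = min(values), max(values)
    rise = math.log(values[hi]) - math.log(values[lo])
    run = math.log(hi) - math.log(lo)
    return rise / run


slope_second = loglog_slope(second)
slope_third = loglog_slope(third)
print(f"second near-exact: {slope_second:.2f}, third near-exact: {slope_third:.2f}")
```

Under these assumptions the slope is close to -1 for the second near-exact distribution and close to -3 for the third, so over this range the measure decays roughly like the inverse and the inverse cube of the precision parameter, respectively.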
[Figure 3: plot of Pointwise Difference (vertical axis) against w (horizontal axis)]

Fig. 3 Pointwise difference between the exact density of the sum of two independent Gumbel random variables with $(\mu_1, \sigma_1) = (0, 1)$ and $(\mu_2, \sigma_2) = (0, 10)$ and the densities obtained through our near-exact approximation (gray lines), as well as the difference between the exact density and the approximation in Cetinkaya et al. (2001) (black line). The exact density was obtained using the inversion formulas in Gil-Pelaez (1951), and for our approach we take a precision parameter of 4, 7, 10, respectively corresponding to the dotted, dashed, and solid gray lines

[Figure 4: plot of the logarithm of the measure (vertical axis) against the Precision Parameter (horizontal axis)]

Fig. 4 Comparing our near-exact approximation with the approximation in Cetinkaya et al. (2001), on the basis of the measure defined in (16), over several values of the precision parameter; the solid and dashed lines respectively correspond to our near-exact approximation and the approach in Cetinkaya et al. (2001) for Scenario i

Bailey and Gribskov (1997), $X_1, \dots, X_p$ is thus a sequence of independent standard Gumbel random variables, so that the limiting distribution of the test statistic (18) is

\[ W_p(n) \xrightarrow{d} W = \sum_{i=1}^{p} X_i. \]

The authors then propose

\[ F_W(w) = P(W \le w) \approx \frac{(p-1)! - \exp\{-w\}\, w^{p-1}}{(p-1)!} = \tilde{P}(W \le w), \tag{19} \]

as an approximation to the distribution function of $W$. In opposition, our near-exact approximation for $W$ can be obtained from (13), and it is based on the shifted GNIG distribution

\[ \mathrm{SGNIG}\Big( \boldsymbol{r}^* = (p\,\boldsymbol{1}^T_{\nu-1}, \rho),\ \boldsymbol{\lambda}^* = (1, \dots, \nu-1, \lambda),\ \nu,\ \theta \Big), \]


[Figure 5: two panels, (a) and (b), each plotting Pointwise Differences (vertical axis) against w (horizontal axis); panel (a) covers w from 5 to 30 and panel (b) covers w from 29.80 to 29.95; both panels are headed by the value 1.38 × 10^-6]

Fig. 5 Pointwise difference between the exact distribution function of the sum of six independent Gumbel random variables with $(\mu_1, \sigma_1) = \dots = (\mu_6, \sigma_6) = (0, 1)$ and the distribution function obtained through our near-exact approximation (gray line), as well as the difference between the exact distribution function and the approximation in Bailey and Gribskov (1997) (black line). The exact distribution function was obtained using the inversion formulas in Gil-Pelaez (1951), and for our approach we take a precision parameter of 10. Here a, b correspond to different windows of interest

with $\rho$, $\lambda$, $\theta$ given in (11). In Fig. 5 we consider the case $p = 6$ with a precision parameter of 10. As can be observed, the pointwise differences between the exact distribution function and the distribution function (19) corresponding to the Bailey and Gribskov approximation are much larger in absolute value than the ones provided by our near-exact approximation.

To make direct comparisons with the results obtained by Bailey and Gribskov, we use the percent error, which they define as

\[ \mathrm{err}_{\%}(w) = 100\, \frac{\tilde{P}(W \ge w) - P(W \ge w)}{P(W \ge w)}, \tag{20} \]

and where we replace $\tilde{P}(W \ge w)$ by $P^*(W \ge w)$ corresponding to our near-exact approximation, which is given by

\[ P^*(W \ge w) = 1 - F^*_W(w) = 1 - F_V\big(w - \theta;\ \boldsymbol{r}^*, \boldsymbol{\lambda}^*, \nu\big), \tag{21} \]

where $\boldsymbol{r}^* = (p\,\boldsymbol{1}^T_{\nu-1}, \rho)$ and $\boldsymbol{\lambda}^* = (1, \dots, \nu-1, \lambda)$, and where $F_V$ is the distribution function of a random variable $V = W - \theta$ with a GNIG distribution, as defined in (23) in Appendix 1. To evaluate $P(W \ge w)$ we use the Gil-Pelaez inversion formulas.

The resulting percent error is plotted in Fig. 6. Comparing this figure with Fig. 5 in Bailey and Gribskov (1997) it is possible to observe the differences of scales in the vertical axis which show that the percent errors for the near-exact approximations are extremely low when compared to the ones obtained for the approximation in Bailey and Gribskov (1997). These results reinforce the proximity, already assessed in Sect. 3, between the near-exact distributions developed and the exact distribution.

Although the computation times for the Bailey and Gribskov approximation are comparable with those for the near-exact approximations developed for a precision parameter of 4 or 6, the precision obtained has no comparison.

4.3 Flood risk management

In this section we first show how our results can be used to obtain simple confidence intervals for the location parameter of a Gumbel distribution, and then we apply our results to a real data set of annual maximum floods of the river Nidd in Yorkshire, UK, used by Hosking et al. (1985). Let $(X_1, \dots, X_n)$ be a random sample from a population with Gumbel$(\mu, \sigma)$ distribution, so that

\[ E(X_i) = \mu + \gamma\sigma, \qquad \mathrm{var}(X_i) = \frac{\pi^2 \sigma^2}{6}, \]

where $\gamma$ is the Euler–Mascheroni constant. There are two cases to be considered: i) $\sigma$ is known, and ii) $\sigma$ is unknown.

4.3.1 Case I

If $\sigma$ is known, the classical moment estimator of $\mu$,

\[ \hat{\mu} = \bar{X} - \gamma\sigma, \]

is unbiased and consistent in quadratic mean, and as such a good candidate to build confidence intervals for $\mu$. Based on


[Figure 6: plot of Near-Exact Percent Error (vertical axis) against w (horizontal axis, from 0 to 30)]

Fig. 6 Percent error, as defined in (20), for near-exact distributions. The near-exact distribution functions were obtained using (21), where we take a precision parameter of 10. The solid, dashed, dotted, and dashed-dotted black lines respectively correspond to p = 2, 3, 4, 5; the solid gray line corresponds to p = 6

our near-exact approximations in Sect. 2, we can compute for a given level of confidence $1 - \alpha$, the near-exact quantiles $q_{\alpha/2}$ and $q_{1-\alpha/2}$ of $\hat{\mu} - \mu$, such that

\[ 1 - \alpha = P\big( q_{\alpha/2} < \hat{\mu} - \mu < q_{1-\alpha/2} \big) = P\big( \hat{\mu} - q_{1-\alpha/2} < \mu < \hat{\mu} - q_{\alpha/2} \big), \]

and thus

\[ \big[ \hat{\mu} - q_{1-\alpha/2},\ \hat{\mu} - q_{\alpha/2} \big] \]

is a near-exact level $1 - \alpha$ confidence interval for $\mu$. Note that the quantile of $\hat{\mu} - \mu$ is the quantile of $n^{-1} \sum_{i=1}^{n} X_i^*$, where $X_i^* \sim \mathrm{Gumbel}(-\gamma\sigma, \sigma)$, and a close approximation to this may be obtained through the near-exact quantiles determined using the near-exact distribution function in (17).

4.3.2 Case II

If $\sigma$ is unknown, for $S^2 = (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$, we have

\[ E(S^2) = \mathrm{var}(X_i) = \frac{\pi^2 \sigma^2}{6}, \qquad S^2 \xrightarrow{p} \mathrm{var}(X_i) = \frac{\pi^2 \sigma^2}{6}, \]

so that

\[ \frac{\sqrt{6}}{\pi} \sqrt{S^2} \xrightarrow{p} \sigma. \]

Thus, we propose using the estimator

\[ \hat{U} = \bar{X} - \gamma \frac{\sqrt{6}}{\pi} \sqrt{S^2}, \]

to build confidence intervals for $\mu$, given that $\hat{U}$ is consistent for $\mu$, since

\[ \hat{U} = \underbrace{\bar{X}}_{\xrightarrow{p}\ \mu + \gamma\sigma} - \underbrace{\gamma \frac{\sqrt{6}}{\pi} \sqrt{S^2}}_{\xrightarrow{p}\ \gamma\sigma} \xrightarrow{p} \mu. \]

As such, an approximate confidence interval for $\mu$ is given by

\[ \big[ \hat{U} - q_{1-\alpha/2},\ \hat{U} - q_{\alpha/2} \big], \tag{22} \]

where $q_{\alpha/2}$ and $q_{1-\alpha/2}$ are respectively the $\alpha/2$ and the $1 - \alpha/2$ quantiles of $\hat{U} - \mu$, where we have that $\hat{U} - \mu = n^{-1} \sum_{i=1}^{n} X_i^*$, with

\[ X_i^* \sim \mathrm{Gumbel}\left( -\gamma \frac{\sqrt{6}}{\pi} \sqrt{S^2},\ \frac{\sqrt{6}}{\pi} \sqrt{S^2} \right). \]

Based on a first impression, one could be tempted to infer from (22) that changes in the value of $S^2$ would not affect the width of the confidence interval, but we note that $\sqrt{S^2}$ appears multiplying in both parameters of the Gumbel distribution of $X_i^*$, and hence the larger the $S^2$ the wider the confidence interval.

To show that these confidence intervals yield the due coverage probabilities, we performed some simulation studies for coverage probabilities of 0.90, 0.95 and 0.99; the results are reported in Tables 7–8. For each case we have simulated 50 batches of 100 samples of size 5 for the case of known $\sigma$ and of size 10 for the case of unknown $\sigma$ and we counted the number of times, out of 100, that the true value of $\mu$ fell into the respective confidence interval. The near-exact quantiles, needed to determine the confidence intervals, were calculated using the near-exact distribution function in (17) taking a precision parameter of 4 for both cases of known and unknown $\sigma$.

For the case of known $\sigma$, using $\mu = 5$ and $\sigma = 5.6$, confidence intervals for the proportion of times that the true value of $\mu$ fell into the corresponding confidence interval, based on the asymptotic distribution of the maximum likelihood estimator of the proportion $p^*$ in a Binomial$(100, p^*)$ distribution, for a sample of size 50, gave, respectively for the nominal coverage probabilities of 0.90, 0.95 and 0.99,

[0.8944, 0.9108], [0.9414, 0.9538], [0.9863, 0.9921],

being clear that in each case the nominal coverage probability falls in the respective confidence interval.

For the case of unknown $\sigma$, we also used $\mu = 5$ and $\sigma = 5.6$ to simulate the samples, and then we estimated $\sigma$ as described above. A similar procedure as described above, gave the following confidence intervals for the proportion of times that the true value of $\mu$ fell into the corresponding confidence interval

[0.8975, 0.9137], [0.9482, 0.9598], [0.9875, 0.9929],


Table 7 Number of times, out of 100, that the true value of $\mu$ fell into the corresponding confidence interval, in the case of known $\sigma$
Coverage probability Number of times

0.90 93, 90, 84, 90, 85, 93, 89, 88, 98, 91, 87, 94, 91, 88, 97, 88, 90, 87, 92, 95, 90, 93, 86, 93, 89, 88, 92, 90,
89, 90, 85, 92, 86, 92, 91, 92, 91, 93, 84, 90, 87, 89, 93, 91, 87, 97, 93, 91, 89, 90
0.95 96, 94, 96, 95, 95, 95, 95, 91, 93, 97, 98, 98, 95, 95, 95, 89, 93, 97, 97, 98, 92, 94, 93, 97, 95, 97, 94, 92,
96, 97, 94, 99, 97, 92, 95, 91, 95, 96, 92, 92, 98, 97, 93, 94, 89, 96, 92, 95, 95, 97
0.99 100, 100, 97, 99, 99, 99, 99, 99, 100, 98, 100, 100, 98, 99, 100, 98, 100, 96, 97, 99, 100, 96, 100, 99, 99,
100, 99, 99, 100, 98, 98, 99, 100, 100, 99, 99, 100, 99, 97, 98, 99, 98, 99, 100, 100, 99, 98, 99, 100, 98
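
The known-scale experiment summarized in Table 7 can be imitated with a short simulation. The sketch below is not the authors' procedure: it swaps the near-exact quantiles from (17) for plain Monte Carlo quantiles of the mean of n Gumbel variables with location minus Euler's constant and unit scale, and rescales them by the known scale, which is valid because the Gumbel family is a location-scale family. All function names, the numbers of draws and replications, and the seed are assumptions made for illustration.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant


def gumbel(rng, mu, sigma):
    """Draw from Gumbel(mu, sigma) by inverting F(x) = exp(-exp(-(x - mu) / sigma))."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return mu - sigma * math.log(-math.log(u))


def unit_quantiles(rng, n, alpha, draws=20000):
    """Monte Carlo alpha/2 and 1 - alpha/2 quantiles of the mean of n Gumbel(-gamma, 1) draws."""
    means = sorted(
        statistics.fmean(gumbel(rng, -EULER_GAMMA, 1.0) for _ in range(n))
        for _ in range(draws)
    )
    lower = means[int(alpha / 2 * draws)]
    upper = means[int((1 - alpha / 2) * draws) - 1]
    return lower, upper


def coverage_known_sigma(rng, mu, sigma, n, alpha, reps=2000):
    """Fraction of simulated samples whose interval for mu contains the true value."""
    q_lo, q_hi = unit_quantiles(rng, n, alpha)
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(gumbel(rng, mu, sigma) for _ in range(n))
        mu_hat = xbar - EULER_GAMMA * sigma  # moment estimator for known sigma
        if mu_hat - sigma * q_hi <= mu <= mu_hat - sigma * q_lo:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    rng = random.Random(2014)
    print(coverage_known_sigma(rng, mu=5.0, sigma=5.6, n=5, alpha=0.10))
```

With samples of size 5, location 5 and scale 5.6, the estimated coverage should fall near the nominal 0.90, up to Monte Carlo error in both the quantiles and the coverage count.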

Table 8 Number of times, out of 100, that the true value of $\mu$ fell into the corresponding confidence interval, in the case of unknown $\sigma$
Probability Number of times

0.90 91, 93, 91, 91, 92, 84, 92, 84, 91, 89, 96, 96, 93, 92, 89, 89, 90, 89, 93, 89, 93, 97, 86, 90, 94, 89, 89, 86,
91, 87, 90, 88, 89, 91, 95, 93, 91, 90, 89, 95, 87, 93, 87, 95, 88, 87, 94, 91, 92, 87
0.95 92, 94, 96, 99, 96, 99, 93, 97, 97, 95, 98, 94, 97, 97, 95, 97, 93, 95, 97, 93, 95, 96, 96, 92, 98, 92, 93, 97,
97, 96, 92, 95, 95, 96, 94, 98, 97, 94, 97, 99, 90, 97, 97, 97, 96, 97, 92, 92, 94, 95
0.99 100, 99, 98, 99, 98, 97, 96, 100, 100, 99, 98, 99, 97, 99, 99, 100, 97, 99, 98, 100, 99, 99, 100, 100, 98, 100,
99, 99, 100, 100, 99, 100, 100, 99, 100, 100, 99, 99, 100, 100, 99, 99, 100, 98, 99, 99, 99, 100, 97, 99
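
The confidence intervals quoted in the text for the proportion of covering intervals can be recomputed from the counts in Tables 7 and 8. The sketch below pools the 50 batches of 100 for the nominal level 0.90 and applies the usual normal-approximation (Wald) interval for a binomial proportion. The 95% level is an assumption: it is not stated in the text, but it reproduces the reported limits. The function name `wald_interval` is introduced here for illustration.

```python
from statistics import NormalDist

# Counts for nominal coverage 0.90, transcribed from Table 7 (known scale)
table7_090 = [93, 90, 84, 90, 85, 93, 89, 88, 98, 91, 87, 94, 91, 88, 97, 88, 90,
              87, 92, 95, 90, 93, 86, 93, 89, 88, 92, 90, 89, 90, 85, 92, 86, 92,
              91, 92, 91, 93, 84, 90, 87, 89, 93, 91, 87, 97, 93, 91, 89, 90]
# Counts for nominal coverage 0.90, transcribed from Table 8 (unknown scale)
table8_090 = [91, 93, 91, 91, 92, 84, 92, 84, 91, 89, 96, 96, 93, 92, 89, 89, 90,
              89, 93, 89, 93, 97, 86, 90, 94, 89, 89, 86, 91, 87, 90, 88, 89, 91,
              95, 93, 91, 90, 89, 95, 87, 93, 87, 95, 88, 87, 94, 91, 92, 87]


def wald_interval(counts, trials_per_batch=100, level=0.95):
    """Normal-approximation interval for a binomial proportion pooled over batches."""
    n = trials_per_batch * len(counts)
    p_hat = sum(counts) / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width


for name, counts in (("Table 7", table7_090), ("Table 8", table8_090)):
    lo, hi = wald_interval(counts)
    print(f"{name}: [{lo:.4f}, {hi:.4f}]")
```

Rounded to four decimals, the two intervals agree with the first interval reported for each case, [0.8944, 0.9108] and [0.8975, 0.9137].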

being once again clear that in each case the nominal coverage probability falls indeed in the respective confidence interval. The above results show the adequacy of the confidence intervals proposed even for very small sample sizes.

To illustrate the utility of the interval estimation procedure developed above, we consider 35 annual maximum floods of the river Nidd in Yorkshire, UK, taken from the Natural Environment Research Council NERC (1975, p. 235). As mentioned by Hosking et al. (1985, p. 258) these data may reasonably be assumed to come from a Gumbel distribution. For these data we have as estimates for the parameters $\mu$ and $\sigma$ respectively $\hat{\mu} = 109.33$ and $\hat{\sigma} = 47.34$. The near-exact quantiles, $q_{\alpha/2}$ and $q_{1-\alpha/2}$, were determined using the near-exact distribution function in (17) and taking a precision parameter of 4. Hence for $1 - \alpha = 0.90, 0.95, 0.99$ we have

\[ \begin{aligned} q_{0.05} &= 11.03, & q_{0.95} &= 44.76, \\ q_{0.025} &= 8.15, & q_{0.975} &= 48.38, \\ q_{0.005} &= 2.69, & q_{0.995} &= 55.68, \end{aligned} \]

and thus the confidence intervals for $\mu$, and for $1 - \alpha = 0.90, 0.95, 0.99$, are respectively given by

[91.91, 125.64], [88.29, 128.52], [80.99, 133.98].

The coverage probabilities obtained for samples of size 5 and 10 show that the confidence intervals obtained in this way may be applied even for small sample sizes; these are usually situations in which maximum-likelihood estimation procedures are not always satisfactory, even for moderate sample sizes, as pointed out by Hosking et al. (1985).

5 Discussion

In this paper we develop precise, tractable, and computationally appealing near-exact approximations for the distribution of the linear combination of independent Gumbel random variables. The precision parameter plays a key role in modulating the desired reliability of our approximations, with larger values of the precision parameter leading to a higher accuracy. The value of the precision parameter can hence be chosen according to the targeted level of precision, but this entails a precision–burden tradeoff as a higher value of the precision parameter requires a larger computational investment. Although our illustrations focused mostly on the case of sums of independent Gumbel variates, our approaches are tailored for linear combinations in general, and their accuracy seems to be mildly uniform over a different set of weights and several combinations of shape and scale parameters. From the point of view of modeling extremes, more complex structures of dependence (other than exact independence) are certainly of interest, as well as tails which are heavier than the Gumbel. As discussed by Albrecher et al. (2011), simple and manageable models, such as the Cramér–Lundberg model, are based on restrictive independence assumptions, but still can be used as a natural starting point for modeling.

Although not explored here, our near-exact approximations have the potential to be used as a baseline model, say as a centering distribution in a Bayesian nonparametric setting (Müller and Quintana 2004), and from that point of view it can be understood as a computationally appealing starting point for modeling linear combinations of heavy-tailed data with more complex structures of dependence. In this context, it seems for example natural centering a Dirichlet process $\mathrm{DP}(M, F^*_W)$ at our near-exact approximation $F^*_W$, where $M > 0$ controls the variability of the random distributions


$F$ generated according to the DP prior, such that we have $F \sim \mathrm{Beta}(M F^*_W, M(1 - F^*_W))$. Since $E(F) = F^*_W$, random realizations of the DP process would on average coincide with our near-exact distribution, and the role played by the parameter $M$ can be better understood by noticing that $\mathrm{var}(F) = F^*_W (1 - F^*_W)/(M + 1)$. Hence, by taking a large value of $M$ the contribution to the inference of the parametric model $F^*_W$ would be large, whereas smaller values of $M$ will give more priority to the data, which may hopefully be informative on the tails and on the structure of dependence to be revealed.

6 Supplementary material

The supplemental files include additional numerical reports, and Mathematica programs which can be used to implement the methods described in the article.

Acknowledgments We thank the Editor, the Associate Editor, and two Reviewers for their careful reading and constructive comments. We also thank Anthony Davison, Barry Arnold, and Vanda de Carvalho for helpful discussions and recommendations on an earlier version of this paper. Part of this work was developed during an academic visit of F. Marques and C. Coelho to the École Polytechnique Fédérale de Lausanne where M. de Carvalho was a Post-Doctoral Fellow. This work was partially supported by the Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) through the project PEst-OE/MAT/UI0297/2014 (Centro de Matemática e Aplicações), and under the Fondecyt Project 11121186.

Appendix

Appendix 1: Results and definitions on distributions of interest

Part I: The GIG and GNIG distributions

Let $X_j \overset{\text{ind.}}{\sim} \mathrm{Gamma}(r_j, \lambda_j)$ with shape parameters $r_j \in \mathbb{N}$ and rate parameters $\lambda_j \in \mathbb{R}^+$, all different, for $j = 1, \dots, \ell$. The GIG distribution of depth $\ell \in \mathbb{N}$, introduced by Coelho (1998), is defined as the distribution of $Y = \sum_{j=1}^{\ell} X_j$, and we denote this by $Y \sim \mathrm{GIG}(\boldsymbol{r}, \boldsymbol{\lambda}, \ell)$, for $\boldsymbol{r} = (r_1, \dots, r_\ell)$ and $\boldsymbol{\lambda} = (\lambda_1, \dots, \lambda_\ell)$. The density and distribution functions of $Y$ are

\[ f_Y(y; \boldsymbol{r}, \boldsymbol{\lambda}, \ell) = K \sum_{j=1}^{\ell} P_j(y) \exp\{-\lambda_j y\}, \]

and

\[ F_Y(y; \boldsymbol{r}, \boldsymbol{\lambda}, \ell) = 1 - K \sum_{j=1}^{\ell} P^*_j(y) \exp\{-\lambda_j y\}, \]

where $y > 0$, $K = \prod_{j=1}^{\ell} \lambda_j^{r_j}$,

\[ P_j(y) = \sum_{k=1}^{r_j} c_{j,k}\, y^{k-1}, \qquad P^*_j(y) = \sum_{k=1}^{r_j} c_{j,k}\, (k-1)! \sum_{i=0}^{k-1} \frac{y^i}{i!\, \lambda_j^{k-i}}, \]

and the $c_{j,k}$ are given in (11)–(13) in Coelho (1998). The GNIG distribution of depth $(\ell + 1) \in \mathbb{N}$, introduced by Coelho (2004), is defined as the distribution of $Y^* = X^* + \sum_{j=1}^{\ell} X_j$, where $X^*$ is independent of $\sum_{j=1}^{\ell} X_j$, and $X^* \sim \mathrm{Gamma}(\rho, \lambda)$, with $\rho \in \mathbb{R}^+ \setminus \mathbb{N}$. We denote this by $Y^* \sim \mathrm{GNIG}(\boldsymbol{r}^*, \boldsymbol{\lambda}^*, \ell + 1)$, where $\boldsymbol{r}^* = (\boldsymbol{r}, \rho)$ and $\boldsymbol{\lambda}^* = (\boldsymbol{\lambda}, \lambda)$, and the corresponding density and distribution functions are

\[ f_{Y^*}(y; \boldsymbol{r}^*, \boldsymbol{\lambda}^*, \ell + 1) = K \lambda^{\rho} \sum_{j=1}^{\ell} \exp\{-\lambda_j y\} \sum_{k=1}^{r_j} \left\{ c_{j,k} \frac{\Gamma(k)}{\Gamma(k + \rho)}\, y^{k + \rho - 1}\, {}_1F_1\big(\rho, k + \rho, -(\lambda - \lambda_j) y\big) \right\}, \]

and

\[ F_{Y^*}(y; \boldsymbol{r}^*, \boldsymbol{\lambda}^*, \ell + 1) = \frac{\lambda^{\rho} y^{\rho}}{\Gamma(\rho + 1)}\, {}_1F_1(\rho, \rho + 1, -\lambda y) - K \lambda^{\rho} \sum_{j=1}^{\ell} \exp\{-\lambda_j y\} \sum_{k=1}^{r_j} c^*_{j,k} \sum_{i=0}^{k-1} \frac{y^{\rho + i} \lambda_j^{i}}{\Gamma(\rho + 1 + i)}\, {}_1F_1\big(\rho, \rho + 1 + i, -(\lambda - \lambda_j) y\big), \tag{23} \]

for $y > 0$ and where $c^*_{j,k} = (c_{j,k} / \lambda_j^{k})\, \Gamma(k)$; in the above expressions ${}_1F_1(\cdot)$ denotes the Kummer confluent hypergeometric function.

The random variable $X^* = X + \theta$ has a shifted Gamma distribution with rate $\lambda \in \mathbb{R}^+$, shape $r \in \mathbb{R}^+$, and shift $\theta \in \mathbb{R}$, if $X \sim \mathrm{Gamma}(r, \lambda)$, and we denote this by $X^* \sim \mathrm{SGamma}(r, \lambda, \theta)$; the shifted GIG and GNIG distributions are analogously defined and denoted by $\mathrm{SGIG}(\boldsymbol{r}, \boldsymbol{\lambda}, \ell, \theta)$ and $\mathrm{SGNIG}(\boldsymbol{r}^*, \boldsymbol{\lambda}^*, \ell + 1, \theta)$.

Part II: The DGIG distribution and the sum (and the difference) of a DGIG random variable with an independent Gamma random variable

Let $X_1 \sim \mathrm{GIG}(\boldsymbol{r}_1, \boldsymbol{\lambda}_1, p_1)$, with $\boldsymbol{r}_1 = (r_{11}, \dots, r_{1p_1})$ and $\boldsymbol{\lambda}_1 = (\lambda_{11}, \dots, \lambda_{1p_1})$, and $X_2 \sim \mathrm{GIG}(\boldsymbol{r}_2, \boldsymbol{\lambda}_2, p_2)$, with $\boldsymbol{r}_2 = (r_{21}, \dots, r_{2p_2})$ and $\boldsymbol{\lambda}_2 = (\lambda_{21}, \dots, \lambda_{2p_2})$ be two independent random variables with GIG distributions. Let us then consider the random variable $Y = X_1 - X_2$. $Y$ has a DGIG distribution whose density and distribution functions are given by (2.12) and (2.15) in Coelho and Mexia (2010), and we denote this by $Y \sim \mathrm{DGIG}(\boldsymbol{r}_1, \boldsymbol{r}_2, \boldsymbol{\lambda}_1, \boldsymbol{\lambda}_2, p_1, p_2)$. The shifted SDGIG distribution, with shift $\theta \in \mathbb{R}$, is denoted


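by $Y \sim \mathrm{SDGIG}(\boldsymbol{r}_1, \boldsymbol{r}_2, \boldsymbol{\lambda}_1, \boldsymbol{\lambda}_2, p_1, p_2, \theta)$. Next we obtain results on the distribution of the sum (and the difference) of a DGIG with an independent Gamma random variable; these results are relevant for our third near-exact distribution. One useful way to look at the distribution of $Y$ is to see it as a particular mixture of integer Gamma or Erlang distributions. Indeed, after some rearrangements the density and distribution functions of $Y$ may be respectively written as

\[ f_Y(y) = \begin{cases} \displaystyle \sum_{j=1}^{p_1} \sum_{k=1}^{r_{1j}} \sum_{i=0}^{k-1} p_{jki}\, f_{Y_{jki}}(y), & y \ge 0, \\[2ex] \displaystyle \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki}\, f_{Y^*_{jki}}(-y), & y < 0, \end{cases} \]

and

\[ F_Y(y) = \begin{cases} \displaystyle \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki} + \sum_{j=1}^{p_1} \sum_{k=1}^{r_{1j}} \sum_{i=0}^{k-1} p_{jki}\, F_{Y_{jki}}(y), & y \ge 0, \\[2ex] \displaystyle \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki} - \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki}\, F_{Y^*_{jki}}(-y), & y < 0, \end{cases} \]

where, for $j = 1, \dots, p_1$; $k = 1, \dots, r_{1j}$; $i = 0, \dots, k - 1$,

\[ p_{jki} = c_{jk}\, \frac{K_1 K_2}{\lambda_{1j}^{k-i}}\, \frac{(k-1)!}{i!} \sum_{\eta=1}^{p_2} \sum_{h=1}^{r_{2\eta}} d_{\eta h}\, \frac{(h + i - 1)!}{(\lambda_{1j} + \lambda_{2\eta})^{h+i}} \]

and, for $j = 1, \dots, p_2$; $k = 1, \dots, r_{2j}$; $i = 0, \dots, k - 1$,

\[ p^*_{jki} = d_{jk}\, \frac{K_1 K_2}{\lambda_{2j}^{k-i}}\, \frac{(k-1)!}{i!} \sum_{\eta=1}^{p_1} \sum_{h=1}^{r_{1\eta}} c_{\eta h}\, \frac{(h + i - 1)!}{(\lambda_{1\eta} + \lambda_{2j})^{h+i}}, \]

with

\[ K_1 = \prod_{j=1}^{p_1} \lambda_{1j}^{r_{1j}}, \qquad K_2 = \prod_{j=1}^{p_2} \lambda_{2j}^{r_{2j}}, \]

and $c_{jk}$ ($j = 1, \dots, p_1$; $k = 1, \dots, r_{1j}$) given by (2.9)–(2.11) in Coelho and Mexia (2010), with $p$ replaced by $p_1$ and $r_j$ replaced by $r_{1j}$ and $d_{jk}$ ($j = 1, \dots, p_2$; $k = 1, \dots, r_{2j}$) defined in a similar manner, replacing $p_1$ by $p_2$ and $r_{1j}$ by $r_{2j}$, and where, for $y \ge 0$,

\[ f_{Y_{jki}}(y) = \frac{\lambda_{1j}^{k-i}}{\Gamma(k - i)}\, y^{k-i-1} e^{-\lambda_{1j} y}, \]

and

\[ F_{Y_{jki}}(y) = 1 - \sum_{t=0}^{k-i-1} \frac{\lambda_{1j}^{t}}{t!}\, y^{t} e^{-\lambda_{1j} y}, \tag{24} \]

are respectively the density and distribution functions of $Y_{jki} \sim \mathrm{Gamma}(k - i, \lambda_{1j})$, while $f_{Y^*_{jki}}(\cdot)$ and $F_{Y^*_{jki}}(\cdot)$ are the density and distribution functions of $Y^*_{jki} \sim \mathrm{Gamma}(k - i, \lambda_{2j})$.

The weights $p_{jki}$ and $p^*_{jki}$ verify the relation

\[ \sum_{j=1}^{p_1} \sum_{k=1}^{r_{1j}} \sum_{i=0}^{k-1} p_{jki} + \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki} = 1. \]

Let now $W \sim \mathrm{Gamma}(\rho, \lambda)$, where $\rho$ is a positive non-integer real, be independent of $Y$. We will consider the random variables $Z_1 = Y + W$ and $Z_2 = Y - W$ and derive their distribution functions. The distribution function of $Z_1$ will be given by

\[ F_{Z_1}(z) = \int_0^{+\infty} F_Y(z - w)\, f_W(w)\, dw, \]

which, for $z \ge 0$, using the notation introduced above for the GNIG distribution function, with $\boldsymbol{r}^* = (k - i, \rho)$ and $\boldsymbol{\lambda}^*_1 = (\lambda_{1j}, \lambda)$, may be written as

\begin{align*}
F_{Z_1}(z) &= \int_0^{z} F_Y(\underbrace{z - w}_{\ge 0})\, f_W(w)\, dw + \int_z^{+\infty} F_Y(\underbrace{z - w}_{\le 0})\, f_W(w)\, dw \\
&= \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki} \int_0^{z} f_W(w)\, dw + \sum_{j=1}^{p_1} \sum_{k=1}^{r_{1j}} \sum_{i=0}^{k-1} p_{jki} \underbrace{\int_0^{z} F_{Y_{jki}}(z - w)\, f_W(w)\, dw}_{\text{distribution function of } G_1 \sim \mathrm{GNIG}(\boldsymbol{r}^*, \boldsymbol{\lambda}^*_1, 2)}
\end{align*}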


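\begin{align*}
&\quad + \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki} \left( 1 - \int_0^{z} f_W(w)\, dw \right) - \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki} \underbrace{\int_z^{+\infty} F_{Y^*_{jki}}(w - z)\, f_W(w)\, dw}_{1 - F_{W - Y^*_{jki}}(z)} \\
&= \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki} + \sum_{j=1}^{p_1} \sum_{k=1}^{r_{1j}} \sum_{i=0}^{k-1} p_{jki}\, F_{G_1}(z; \boldsymbol{r}^*, \boldsymbol{\lambda}^*_1, 2) - \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki} \big[ 1 - F_{W - Y^*_{jki}}(z) \big] \\
&= \sum_{j=1}^{p_1} \sum_{k=1}^{r_{1j}} \sum_{i=0}^{k-1} p_{jki}\, F_{G_1}(z; \boldsymbol{r}^*, \boldsymbol{\lambda}^*_1, 2) + \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki}\, F_{W - Y^*_{jki}}(z),
\end{align*}

while for $z < 0$ we have

\begin{align*}
F_{Z_1}(z) &= \int_0^{+\infty} F_Y(\underbrace{z - w}_{\le 0})\, f_W(w)\, dw \\
&= \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki} \underbrace{\int_0^{+\infty} f_W(w)\, dw}_{= 1} - \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki} \underbrace{\int_0^{+\infty} F_{Y^*_{jki}}(w - z)\, f_W(w)\, dw}_{= 1 - F_{W - Y^*_{jki}}(z)} \\
&= \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki}\, F_{W - Y^*_{jki}}(z).
\end{align*}

We thus have

\[ F_{Z_1}(z) = \begin{cases} \displaystyle \sum_{j=1}^{p_1} \sum_{k=1}^{r_{1j}} \sum_{i=0}^{k-1} p_{jki}\, F_{G_1}(z; \boldsymbol{r}^*, \boldsymbol{\lambda}^*_1, 2) + \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki}\, F_{W - Y^*_{jki}}(z), & z \ge 0, \\[2ex] \displaystyle \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki}\, F_{W - Y^*_{jki}}(z), & z < 0. \end{cases} \tag{25} \]

Concerning $Z_2 = Y - W$ we have, for $z < 0$, using the notation introduced above for the GNIG distribution function, with $\boldsymbol{r}^* = (k - i, \rho)$ and $\boldsymbol{\lambda}^*_2 = (\lambda_{2j}, \lambda)$,

\begin{align*}
F_{Z_2}(z) &= P(Y - W \le z) = P(Y \le W + z) \\
&= \int_0^{-z} F_Y(\underbrace{w + z}_{\le 0})\, f_W(w)\, dw + \int_{-z}^{+\infty} F_Y(\underbrace{w + z}_{\ge 0})\, f_W(w)\, dw \\
&= \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki} \int_0^{-z} f_W(w)\, dw - \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki} \underbrace{\int_0^{-z} F_{Y^*_{jki}}(-w - z)\, f_W(w)\, dw}_{\text{distribution function of } G_2 \sim \mathrm{GNIG}(\boldsymbol{r}^*, \boldsymbol{\lambda}^*_2, 2)} \\
&\quad + \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki} \int_{-z}^{+\infty} f_W(w)\, dw + \sum_{j=1}^{p_1} \sum_{k=1}^{r_{1j}} \sum_{i=0}^{k-1} p_{jki} \underbrace{\int_{-z}^{+\infty} F_{Y_{jki}}(w + z)\, f_W(w)\, dw}_{1 - F_{W - Y_{jki}}(-z)} \\
&= \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki} + \sum_{j=1}^{p_1} \sum_{k=1}^{r_{1j}} \sum_{i=0}^{k-1} p_{jki} - \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki}\, F_{G_2}(-z; \boldsymbol{r}^*, \boldsymbol{\lambda}^*_2, 2) - \sum_{j=1}^{p_1} \sum_{k=1}^{r_{1j}} \sum_{i=0}^{k-1} p_{jki}\, F_{W - Y_{jki}}(-z) \\
&= 1 - \sum_{j=1}^{p_1} \sum_{k=1}^{r_{1j}} \sum_{i=0}^{k-1} p_{jki}\, F_{W - Y_{jki}}(-z) - \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki}\, F_{G_2}(-z; \boldsymbol{r}^*, \boldsymbol{\lambda}^*_2, 2),
\end{align*}

while for $z \ge 0$ we have

\begin{align*}
F_{Z_2}(z) &= P(Y - W \le z) = P(Y \le W + z) \\
&= \int_0^{+\infty} F_Y(\underbrace{w + z}_{\ge 0})\, f_W(w)\, dw \\
&= \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki} \int_0^{+\infty} f_W(w)\, dw + \sum_{j=1}^{p_1} \sum_{k=1}^{r_{1j}} \sum_{i=0}^{k-1} p_{jki} \underbrace{\int_0^{+\infty} F_{Y_{jki}}(w + z)\, f_W(w)\, dw}_{1 - F_{W - Y_{jki}}(-z)} \\
&= 1 - \sum_{j=1}^{p_1} \sum_{k=1}^{r_{1j}} \sum_{i=0}^{k-1} p_{jki}\, F_{W - Y_{jki}}(-z),
\end{align*}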



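so that

\[ F_{Z_2}(z) = \begin{cases} \displaystyle 1 - \sum_{j=1}^{p_1} \sum_{k=1}^{r_{1j}} \sum_{i=0}^{k-1} p_{jki}\, F_{W - Y_{jki}}(-z), & z \ge 0, \\[2ex] \displaystyle 1 - \sum_{j=1}^{p_1} \sum_{k=1}^{r_{1j}} \sum_{i=0}^{k-1} p_{jki}\, F_{W - Y_{jki}}(-z) - \sum_{j=1}^{p_2} \sum_{k=1}^{r_{2j}} \sum_{i=0}^{k-1} p^*_{jki}\, F_{G_2}(-z; \boldsymbol{r}^*, \boldsymbol{\lambda}^*_2, 2), & z < 0. \end{cases} \tag{26} \]

It remains now to obtain the distribution function of random variables of the type of $Z = W - Y$, where $W \sim \mathrm{Gamma}(\rho, \lambda)$ and $Y \sim \mathrm{Gamma}(r, \lambda_1)$, where $\rho$, $\lambda$ and $\lambda_1$ are positive reals and $r$ is a positive integer. The distribution function of $Z$ is given by

\begin{align*}
F_Z(z) &= P(W - Y \le z) = P(Y \ge W - z) = 1 - P(Y \le W - z) \\
&= 1 - \int_0^{+\infty} F_Y(w - z)\, f_W(w)\, dw
\end{align*}

which for $z \ge 0$, using the expression in (24) for the distribution function of an integer Gamma or Erlang distribution, yields

\begin{align*}
F_Z(z) &= 1 - \int_0^{z} \underbrace{F_Y(\underbrace{w - z}_{\le 0})}_{= 0} f_W(w)\, dw - \int_z^{+\infty} F_Y(w - z)\, f_W(w)\, dw \\
&= 1 - \int_z^{+\infty} \{ 1 - P(Y > w - z) \}\, f_W(w)\, dw \\
&= 1 - \int_z^{+\infty} f_W(w)\, dw + \int_z^{+\infty} P(Y > w - z)\, f_W(w)\, dw \\
&= F_W(z) + \frac{\lambda^{\rho}}{\Gamma(\rho)}\, e^{\lambda_1 z} \sum_{t=0}^{r-1} \frac{\lambda_1^{t}}{t!} \int_z^{+\infty} (w - z)^{t}\, w^{\rho - 1} e^{-w(\lambda + \lambda_1)}\, dw \\
&= F_W(z) + \frac{\lambda^{\rho}}{\Gamma(\rho)}\, e^{\lambda_1 z} \sum_{t=0}^{r-1} \frac{\lambda_1^{t}}{t!} \sum_{k=0}^{t} \binom{t}{k} (-z)^{k} \int_z^{+\infty} w^{\rho + t - k - 1} e^{-w(\lambda + \lambda_1)}\, dw \\
&= 1 - \frac{\Gamma(\rho, \lambda z)}{\Gamma(\rho)} + \frac{\lambda^{\rho}}{\Gamma(\rho)}\, e^{\lambda_1 z} \sum_{t=0}^{r-1} \frac{\lambda_1^{t}}{t!} \sum_{k=0}^{t} \binom{t}{k} (-z)^{k} (\lambda + \lambda_1)^{-\rho - t + k}\, \Gamma\big(\rho + t - k, (\lambda + \lambda_1) z\big),
\end{align*}

while for $z < 0$ it yields

\begin{align*}
F_Z(z) &= 1 - \int_0^{+\infty} F_Y(w - z)\, f_W(w)\, dw \\
&= 1 - \int_0^{+\infty} \{ 1 - P(Y > w - z) \}\, f_W(w)\, dw \\
&= 1 - \int_0^{+\infty} f_W(w)\, dw + \int_0^{+\infty} P(Y > w - z)\, f_W(w)\, dw \\
&= \frac{\lambda^{\rho}}{\Gamma(\rho)}\, e^{\lambda_1 z} \sum_{t=0}^{r-1} \frac{\lambda_1^{t}}{t!} \sum_{k=0}^{t} \binom{t}{k} (-z)^{k} \int_0^{+\infty} w^{\rho + t - k - 1} e^{-w(\lambda + \lambda_1)}\, dw \\
&= \frac{\lambda^{\rho}}{\Gamma(\rho)}\, e^{\lambda_1 z} \sum_{t=0}^{r-1} \frac{\lambda_1^{t}}{t!} \sum_{k=0}^{t} \binom{t}{k} (-z)^{k} (\lambda + \lambda_1)^{-\rho - t + k}\, \Gamma(\rho + t - k),
\end{align*}

and as such

\[ F_Z(z) = \begin{cases} \displaystyle 1 - \frac{\Gamma(\rho, \lambda z)}{\Gamma(\rho)} + \frac{\lambda^{\rho}}{\Gamma(\rho)}\, e^{\lambda_1 z} \sum_{t=0}^{r-1} \frac{\lambda_1^{t}}{t!} \sum_{k=0}^{t} \binom{t}{k} (-z)^{k} (\lambda + \lambda_1)^{-\rho - t + k}\, \Gamma\big(\rho + t - k, (\lambda + \lambda_1) z\big), & z \ge 0, \\[2ex] \displaystyle \frac{\lambda^{\rho}}{\Gamma(\rho)}\, e^{\lambda_1 z} \sum_{t=0}^{r-1} \frac{\lambda_1^{t}}{t!} \sum_{k=0}^{t} \binom{t}{k} (-z)^{k} (\lambda + \lambda_1)^{-\rho - t + k}\, \Gamma(\rho + t - k), & z < 0. \end{cases} \]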
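
For $z < 0$ the distribution function of $Z = W - Y$ involves only complete Gamma functions, so it can be checked numerically with the standard library. The following is a minimal sketch under stated assumptions, not the authors' Mathematica implementation: the function name and the argument names `rho`, `lam`, `r`, `lam1` (shape and rate of the Gamma variable, integer shape and rate of the Erlang variable) are chosen here for illustration. For an Erlang shape of 1 the double sum collapses to $(\lambda/(\lambda + \lambda_1))^{\rho} e^{\lambda_1 z}$, which gives a simple sanity check.

```python
import math


def cdf_gamma_minus_erlang_neg(z, rho, lam, r, lam1):
    """P(W - Y <= z) for z < 0, with W ~ Gamma(shape=rho, rate=lam) and
    Y ~ Erlang(shape=r, rate=lam1) independent (double-sum form for z < 0)."""
    if z >= 0:
        raise ValueError("this sketch only covers z < 0")
    total = 0.0
    for t in range(r):
        inner = 0.0
        for k in range(t + 1):
            inner += (math.comb(t, k) * (-z) ** k
                      * (lam + lam1) ** (-(rho + t - k))
                      * math.gamma(rho + t - k))
        total += lam1 ** t / math.factorial(t) * inner
    return lam ** rho / math.gamma(rho) * math.exp(lam1 * z) * total


if __name__ == "__main__":
    rho, lam, lam1, z = 2.5, 1.3, 0.7, -1.2
    closed_form = (lam / (lam + lam1)) ** rho * math.exp(lam1 * z)
    print(cdf_gamma_minus_erlang_neg(z, rho, lam, 1, lam1), closed_form)
```

The case $z \ge 0$ additionally needs the upper incomplete Gamma function, which the Python standard library does not provide, so it is left out of this sketch.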


Appendix 2: Representation of a logarithmized Gamma distribution as an infinite sum of shifted exponential distributions

If $X \sim \mathrm{Gamma}(r, \lambda)$ its $h$th moment is given by

\[ E\big(X^h\big) = \frac{\Gamma(r + h)}{\Gamma(r)}\, \lambda^{-h}. \tag{27} \]

Then, the random variable $Y = -\log X$ has what we call a logarithmized Gamma distribution and its characteristic function may be obtained from (27) in the following way

\[ \Phi_Y(t) = E\big(X^{-\mathrm{i}t}\big) = \frac{\Gamma(r - \mathrm{i}t)}{\Gamma(r)}\, \lambda^{\mathrm{i}t}, \qquad t \in \mathbb{R}. \]

Using the equality

\[ \Gamma(z) = \frac{1}{z} \prod_{n=1}^{\infty} \left[ \left( 1 + \frac{1}{n} \right)^{z} \left( 1 + \frac{z}{n} \right)^{-1} \right], \qquad z \in \mathbb{C}, \]

we have

\begin{align*}
\Phi_Y(t) &= \frac{1}{\Gamma(r)} \left\{ \frac{1}{r - \mathrm{i}t} \prod_{n=1}^{\infty} \left( 1 + \frac{1}{n} \right)^{r - \mathrm{i}t} \left( 1 + \frac{r - \mathrm{i}t}{n} \right)^{-1} \right\} \exp\{ \mathrm{i}t \log \lambda \} \\
&= \left( \frac{r}{r - \mathrm{i}t} \exp\{ \mathrm{i}t \log \lambda \} \right) \prod_{n=1}^{\infty} \left[ \frac{n + r}{n + r - \mathrm{i}t} \exp\left\{ -\mathrm{i}t \log\left( 1 + \frac{1}{n} \right) \right\} \right].
\end{align*}

Hence $\Phi_Y$ is also the characteristic function of an infinite sum of independent shifted Exponential distributions. This shows that a logarithmized Gamma random variable may be represented as an infinite sum of independent shifted Exponential random variables.

References

Albrecher, H., Constantinescu, C., Loisel, S.: Explicit ruin formulas for models with dependence among risks. Ins. Math. Econ. 48, 265–270 (2011)
Aldous, D., Shepp, L.: The least variable phase-type distribution is Erlang. Stoch. Mod. 3, 467–473 (1987)
Amari, S.V., Misra, R.B.: Closed-form expressions for distribution of sum of exponential random variables. IEEE Trans. Rel. 46, 519–522 (1997)
Antal, T., Sylos Labini, F., Vasilyev, N.L., Baryshev, Y.V.: Galaxy distribution and extreme-value statistics. Europhy. Lett. 58, 59001 (2009)
Arnold, B.C., Balakrishnan, N., Nagaraja, H.N.: Records. Wiley, New York (1998)
Bailey, T.L., Gribskov, M.: Score distributions for simultaneous matching to multiple motifs. J. Comput. Bio. 4, 45–59 (1997)
Balakrishnan, N., Ahsanullah, M., Chan, P.S.: Relations for single and product moments of record values from Gumbel distribution. Stat. Prob. Lett. 15, 223–227 (1992)
Berry, A.: The accuracy of the Gaussian approximation to the sum of independent variates. Trans. Am. Math. Soc. 49, 122–136 (1941)
Burda, M., Harding, M., Hausman, J.: A Poisson mixture model of discrete choice. J. Econom. 166, 184–203 (2012)
Castillo, E., Hadi, A.S., Balakrishnan, N., Sarabia, J.M.: Extreme Value and Related Models with Applications in Engineering and Science. Wiley, Hoboken (2005)
Cetinkaya, C., Kanodia, V., Knightly, E.W.: Scalable services via egress admission control. IEEE Trans. Multimed. 3, 69–81 (2001)
Coelho, C.A.: The generalized integer Gamma distribution: a basis for distributions in multivariate statistics. J. Mult. Anal. 64, 86–102 (1998)
Coelho, C.A.: The generalized near-integer Gamma distribution: A basis for near-exact approximations to the distribution of statistics which are the product of an odd number of independent Beta random variables. J. Mult. Anal. 89, 191–218 (2004)
Coelho, C.A., Arnold, B.C., Marques, F.J.: The distribution of the product of powers of independent uniform random variables. J. Mult. Anal. 113, 19–36 (2013)
Coelho, C.A., Marques, F.J.: Near-exact distributions for the independence and sphericity likelihood ratio test statistics. J. Mult. Anal. 101, 583–593 (2010)
Coelho, C.A., Marques, F.J.: Near-exact distributions for the likelihood ratio test statistic to test equality of several variance-covariance matrices in elliptically contoured distributions. Comput. Stat. 27, 627–659 (2012)
Coelho, C.A., Mexia, J.T.: Product and Ratio of Generalized Gamma-Ratio Random Variables: Exact and Near-exact Distributions-Applications. Lambert Academic Publishing AG & Co, Saarbrücken (2010)
Domingos, P., Hulten, G.: A general framework for mining massive data streams. J. Comput. Graph. Stat. 12, 945–949 (2003)
Esseen, C.-G.: Fourier analysis of distribution functions, a mathematical study of the Laplace–Gaussian law. Acta Math. 77, 1–125 (1945)
Gil-Pelaez, J.: Note on the inversion theorem. Biometrika 38, 481–482 (1951)
Gumbel, E.J.: The return period of flood flows. Ann. Math. Stat. 12, 163–190 (1941)
Grilo, L.M., Coelho, C.A.: Development and study of two near-exact approximations to the distribution of the product of an odd number of independent Beta random variables. J. Stat. Plann. Infer. 5, 1560–1575 (2007)
Hwang, H.-K.: On convergence rates in the central limit theorems for combinatorial structures. Euro. J. Combin. 19, 329–343 (1998)
Hosking, J.R.M., Wallis, J.R., Wood, E.F.: Estimation of the generalized extreme-value distribution by the method of probability-weighted moments. Technometrics 27, 251–261 (1985)
Keich, U., Nagarajan, N.: A fast and numerically robust method for exact multinomial goodness-of-fit test. J. Comput. Graph. Stat. 15, 779–802 (2006)
Loaiciga, H.A., Leipnik, R.B.: Analysis of extreme hydrology events with Gumbel distributions: marginal and additive cases. Stoch. Environ. Res. Risk Ass. 13, 251–259 (1999)
Loève, M.: Probability Theory, 4th edn. Springer, New York (1977)
Marques, F.J.: On the product of independent Generalized Gamma random variables. Discussion Paper 192012, CMA-FCT-Universidade Nova de Lisboa (2012)
Marques, F.J., Coelho, C.A.: Near-exact distributions for the sphericity likelihood ratio test statistic. J. Stat. Plann. Infer. 138, 726–741 (2008)
Marques, F.J., Coelho, C.A., Arnold, B.C.: A general near-exact distribution theory for the most common likelihood ratio test statistics used in multivariate analysis. Test 20, 180–203 (2011)
Müller, P., Quintana, F.: Nonparametric Bayesian data analysis. Stat. Sci. 19, 95–110 (2004)


Nadarajah, S.: Exact distribution of the linear combination of p Gumbel random variables. Int. J. Comput. Math. 85, 1355–1362 (2008)
Nadarajah, S., Kotz, S.: Comments on scalable services via egress admission control. IEEE Trans. Multimed. 10, 160–161 (2008)
Natural Environment Research Council: Flood Studies Report, Vol. 4, London (1975)
O'Cinneide, C.A.: Characterization of phase-type distributions. Stoch. Models 6, 1–57 (1990)
Sandve, G.K., Drabløs, F.: A survey of motif discovery methods in an integrated framework. Biol. Dir. 1, 11 (2006)
Tiago de Oliveira, J.: Decision results for the parameters of the extreme value (Gumbel) distribution based on the mean and the standard deviation. Trabajos de Estadística y de Investigación Operativa 14, 61–81 (1963)
Wang, J.Z.: Selection of the K largest order statistics for the domain of attraction of the Gumbel distribution. J. Am. Stat. Assoc. 90, 1055–1061 (1995)
