Journal of Econometrics: Christian Conrad, Enno Mammen

Journal of Econometrics 194 (2016) 319–329
Asymptotics for parametric GARCH-in-Mean models

Christian Conrad (a), Enno Mammen (b,c,*)
(a) Faculty of Economics and Social Studies, Heidelberg University, Bergheimer Strasse 58, 69115 Heidelberg, Germany
(b) Institute for Applied Mathematics, Heidelberg University, Im Neuenheimer Feld 294, 69120 Heidelberg, Germany
(c) Laboratory of Stochastic Analysis and its Applications, Higher School of Economics, 26, Ulitsa Shabolovka, Moscow, Russia
ARTICLE INFO

Article history:
Available online 1 June 2016

JEL classification: C13, C22, C51, G12

Keywords: GARCH-in-Mean; Stochastic recurrence equations; Risk–return relationship

ABSTRACT

In this paper we develop an asymptotic theory for the Quasi-Maximum Likelihood Estimator (QMLE) of the parametric GARCH-in-Mean model. The asymptotics is based on a study of the volatility as a process of the model parameters. The proof makes use of stochastic recurrence equations for this random function and uses exponential inequalities to localize the problem. Our results show why the asymptotics for this specification is quite complex although it is a rather standard parametric model. Nevertheless, our theory does not yet treat all standard specifications of the mean function.

© 2016 Elsevier B.V. All rights reserved.
* Corresponding author at: Institute for Applied Mathematics, Heidelberg University, Im Neuenheimer Feld 294, 69120 Heidelberg, Germany.
E-mail addresses: christian.conrad@awi.uni-heidelberg.de (C. Conrad), mammen@math.uni-heidelberg.de (E. Mammen).
http://dx.doi.org/10.1016/j.jeconom.2016.05.010
0304-4076/© 2016 Elsevier B.V. All rights reserved.

1. Introduction

The aim of this paper is to develop an asymptotic theory for the Quasi-Maximum Likelihood Estimator (QMLE) in GARCH-in-Mean (GARCH-M) models for the special case of GARCH(1, 1) innovations. This model was suggested in Engle et al. (1987) and has been frequently used in empirical finance for investigating the risk–return trade-off implied by Merton's (1973) intertemporal CAPM (see, among many others, French et al., 1987 or Lundblad, 2007). Despite its popularity in empirical applications, up to now there is no asymptotic theory for the QMLE of the GARCH-M model. We will explain why the proof of the asymptotic normality of the QMLE is so difficult in this simple classical parametric model. There is also a mathematical motivation for investigating this model, because difficulties in the study of the model arise from nonstationarities of derivatives of the likelihood function that create some nonstandard mathematical problems.

The GARCH(1, 1)-M model is given by

$$Y_t = m_{\theta_0}(\bar h_t(\vartheta_0)) + \bar h_t(\vartheta_0)^{1/2} Z_t, \qquad (1)$$

$$\bar h_t(\vartheta) = \omega + \alpha\,(Y_{t-1} - m_\theta(\bar h_{t-1}(\vartheta)))^2 + \beta\,\bar h_{t-1}(\vartheta) \qquad (2)$$

with i.i.d. mean zero variables $Z_t$ with variance equal to one. Here, $\vartheta = (\theta, \eta)$ is the unknown parameter, consisting of the regression parameter $\theta$ and the GARCH parameter $\eta = (\omega, \alpha, \beta)$. The true parameters are denoted by $\vartheta_0$, $\theta_0$, $\eta_0$, $\omega_0$, $\alpha_0$ and $\beta_0$. We also write $h_t$ for $\bar h_t(\vartheta_0)$ and $m_0$ for $m_{\theta_0}$. The function $\bar h_t(\vartheta)$ is defined as the strictly stationary and ergodic solution of (2). Below we will state conditions under which such a solution exists and is unique (see Lemma 1). The existence and uniqueness of such a solution at the value $\vartheta = \vartheta_0$ is guaranteed if $E[\ln(\alpha_0 Z_t^2 + \beta_0)] < 0$, see Nelson (1990). We also write $\hat h_t(\vartheta)$ for a solution of (2) with fixed starting value $\varsigma_0$, that is

$$\hat h_t(\vartheta) = \omega + \alpha\,(Y_{t-1} - m_\theta(\hat h_{t-1}(\vartheta)))^2 + \beta\,\hat h_{t-1}(\vartheta) \quad \text{with } \hat h_0(\vartheta) \equiv \varsigma_0. \qquad (3)$$

In the following, the quasi-likelihood function will be based on $\hat h_t(\vartheta)$. Lee and Hansen (1994) and Lumsdaine (1996) were the first to derive the distribution theory for the QMLE of the GARCH(1, 1) model. The theory has been extended to the general GARCH(p, q) case by Berkes et al. (2003) and Francq and Zakoïan (2004), among others. For results on GARCH models with heavy-tailed innovations see, e.g., Hall and Yao (2003) or Fan et al. (2014). The result for the GARCH(1, 1) can be easily extended to a GARCH(1, 1) model with a constant mean function $m_\theta$. Also, one can use results from the GARCH(1, 1) literature to study the properties of the GARCH(1, 1)-M model. For example, in Carrasco and Chen (2002) it has been shown that $h_t$ in the GARCH(1, 1) model is $\beta$-mixing with exponentially decaying mixing coefficients. A detailed

discussion of the dependence structure of $Y_t$ and $h_t$ is provided in Conrad and Karanasos (2015). Some properties of the volatility process $h_t$ follow directly from the ARCH($\infty$) representation of $h_t$. Christensen et al. (2012) give a complete asymptotic analysis for a GARCH(1, 1)-M model with modified recurrence equation (2). In their model it is assumed that $h_t(\vartheta) = \omega + \alpha Y_{t-1}^2 + \beta h_{t-1}(\vartheta)$. Then, by construction, the ARCH($\infty$) representation of $h_t$ no longer depends on $m$. This allows them to develop a detailed theory, also for nonparametric $m$. Alternative estimators for a nonparametric $m$ have been studied in Linton and Perron (2003) and Conrad and Mammen (2008).

For a parametric $m$ function, it is standard to assume that the conditional mean can be written as $m_\theta(x) = \theta_1 + \theta_2 g(x)$ with a fixed function $g$. The original specification of Engle et al. (1987) assumes either $g(h_t(\vartheta_0)) = \sqrt{h_t(\vartheta_0)}$ or $g(h_t(\vartheta_0)) = h_t(\vartheta_0)$, while some authors also use $g(h_t(\vartheta_0)) = \ln(h_t(\vartheta_0))$. The linear specification is directly motivated by Merton's (1973) intertemporal CAPM, which suggests that the expected excess return on the market should be proportional to the conditional variance of the market return. As noted by Pagan and Hong (1990), the log specification may be unsatisfactory, since as $h_t(\vartheta_0) \to 0$ the conditional variance in logs takes very large negative values and the relationship between the conditional variance and $Y_t$ may be overstated. In this paper we will develop an asymptotic theory for GARCH-M models. To do so, we will assume that (2) behaves like a stochastic contraction. Our approach will cover the specifications $g(x) = \sqrt{x}$ and $g(x) = \ln(x)$ but will not apply when $g(x) = x$.

2. Asymptotics for GARCH(1, 1)-M models

Our main result is a theorem on the asymptotic normality of the QMLE $\hat\vartheta$. The proof of this result proceeds in several steps where in the first step consistency is shown. In the second step, we derive rates of convergence for $\hat\vartheta$. In the final step, this result is used to get local expansions of the quasi-likelihood function that allow us to establish asymptotic normality.

In the first step, our treatment of the quasi-maximum likelihood estimator is based on a stochastic recurrence equations approach as developed in Straumann (2005) and Straumann and Mikosch (2006). In those papers, stochastic recurrence equations of the quasi-likelihood function and of its derivatives have been used to show that they have a stationary ergodic functional solution. In the GARCH(1, 1)-M model, we can use these arguments to show that the quasi-likelihood function has a stationary ergodic functional solution. But this argument does not apply for the derivatives of the quasi-likelihood function, at least under reasonable assumptions. We argue that the derivatives of the quasi-likelihood functions show exploding behavior in a neighborhood of the true parameter and that they only have a stable behavior in a shrinking neighborhood. For this reason, in a second step we have to show convergence rates for $\hat\vartheta$. Having these rates, we only have to consider the derivatives of the quasi-likelihood functions in shrinking neighborhoods.

We make the following assumptions.

Assumption 1. (a) The parameter set $\Theta$ is compact. The function $(\theta, u) \mapsto m_\theta(u)$ is continuous with respect to $\theta$ and differentiable with respect to $u$. It holds that $\omega > 0$, $\alpha > 0$ and $0 \le \beta \le \bar\beta$ for all $\vartheta \in \Theta$ with some constant $\bar\beta < 1$. For the true parameter $\vartheta_0$ it holds that $\vartheta_0 \in \Theta$. The innovations $Z_t$ are i.i.d. with $E[Z_t] = 0$ and $E[Z_t^2] = 1$.
(b) It holds that $\beta \ge \underline\beta$ for all $\vartheta \in \Theta$ with some constant $\underline\beta > 0$. The true parameter $\vartheta_0$ lies in the interior of $\Theta$.

Assumption 2. It holds that $\alpha_0 + \beta_0 < 1$.

Assumptions 1 and 2 imply that $\sqrt{h_t}\,Z_t$ is a covariance-stationary process with unconditional variance equal to $\omega_0/(1 - \alpha_0 - \beta_0)$, see Bollerslev (1986), and with $E h_t^s < \infty$ for $s > 0$ small enough, see Francq and Zakoïan (2004). Further, they imply that $E[\ln(\alpha_0 Z_t^2 + \beta_0)] < 0$. As mentioned above, this guarantees that (2) has a strictly stationary and ergodic solution $h_t$ for $\vartheta = \vartheta_0$. In the proof of consistency of the quasi-maximum likelihood estimator, we make use of the theory on stochastic recurrence equations. The essential assumption needed in this approach is stated below. In the following, we denote derivatives of functions $m_\theta(u)$ w.r.t. the argument $u$ by $m_\theta'(u), m_\theta''(u), \ldots$ Derivatives w.r.t. the parameter $\theta$ are denoted by $\dot m_\theta(u), \ddot m_\theta(u), \ldots$

Assumption 3. It holds that

$$E[\ln(U_t)] < 0, \quad D_1 < +\infty, \quad D_2 < +\infty,$$

where

$$U_t = \sup_{\vartheta \in \Theta} \left\{ 2\alpha D_1 \left| Z_t \sqrt{h_t} + m_0(h_t) \right| + 2\alpha D_2 + \beta \right\},$$

$$D_1 = \sup_{\theta, u} |m_\theta'(u)|,$$

$$D_2 = \sup_{\theta, u} |m_\theta(u)\, m_\theta'(u)|.$$

Next, we explain why this assumption naturally arises here. For this purpose, we briefly come back to the classical assumption that $E[\ln(\alpha_0 Z_t^2 + \beta_0)] < 0$. We recall why it implies that there exists a stationary ergodic solution $h_t$ of the GARCH equation. We will later explain why Assumption 3 will be useful for similar reasons. Afterwards, we will discuss how restrictive the assumption is. We start with a brief discussion of stochastic recurrence equations. Consider first two sequences $h_t$ and $\tilde h_t$ with different starting values $\varsigma_0 > 0$ and $\tilde\varsigma_0 > 0$ that fulfill the recurrence equation of $h_t = \bar h_t(\vartheta_0)$:

$$h_t = \omega_0 + h_{t-1}(\alpha_0 Z_{t-1}^2 + \beta_0),$$

$$\tilde h_t = \omega_0 + \tilde h_{t-1}(\alpha_0 Z_{t-1}^2 + \beta_0).$$

Then $\tilde h_t - h_t = (\tilde h_{t-1} - h_{t-1})(\alpha_0 Z_{t-1}^2 + \beta_0)$ and the condition $E[\ln(\alpha_0 Z_t^2 + \beta_0)] < 0$ implies that $\tilde h_t - h_t \to 0$ a.s. It can be shown that this result implies that there exists a unique stationary ergodic solution of the recurrence equation of $h_t = \bar h_t(\vartheta_0)$. The approach of stochastic recurrence equations has been generalized w.r.t. two aspects: First, one can consider nonlinear recurrence equations. Then one needs conditions of the type $E[\ln(\Lambda)] < 0$ where $\Lambda$ is the (random) Lipschitz constant of the recurrence equation. Second, instead of real valued random variables one can consider random elements of function spaces.

We use the second approach with the random functions

$$\bar h_t(\vartheta) = \omega + \alpha\,(Y_{t-1} - m_\theta(\bar h_{t-1}(\vartheta)))^2 + \beta\,\bar h_{t-1}(\vartheta).$$

Consider two sequences $\hat h_t(\vartheta)$ and $\tilde h_t(\vartheta)$ again with different starting values $\varsigma_0 > 0$ and $\tilde\varsigma_0 > 0$:

$$\tilde h_t(\vartheta) = \omega + \alpha\,(Y_{t-1} - m_\theta(\tilde h_{t-1}(\vartheta)))^2 + \beta\,\tilde h_{t-1}(\vartheta),$$

$$\hat h_t(\vartheta) = \omega + \alpha\,(Y_{t-1} - m_\theta(\hat h_{t-1}(\vartheta)))^2 + \beta\,\hat h_{t-1}(\vartheta).$$

One can show the following Lipschitz inequality:

$$\| \tilde h_t - \hat h_t \| \le U_t\, \| \tilde h_{t-1} - \hat h_{t-1} \|$$

with $U_t$ defined above and $\|\cdot\|$ equal to the sup-norm. In Assumption 3, we had assumed that $E[\ln(U_t)] < 0$. This assumption implies that the recurrence equation

$$\bar h_t(\vartheta) = \omega + \alpha\,(Y_{t-1} - m_\theta(\bar h_{t-1}(\vartheta)))^2 + \beta\,\bar h_{t-1}(\vartheta)$$

has a stationary ergodic solution $\bar h_t(\vartheta)$. This is stated in the following lemma.
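The contraction argument for the pure GARCH recursion at the true parameter can be checked numerically. The following Python sketch is an illustration only and is not taken from the paper: it assumes standard normal innovations, and the function name `garch_gap`, the two starting values and the sample length are hypothetical choices. The parameter values $\omega_0 = 0.05$, $\alpha_0 = 0.1$ and $\beta_0 = 0.85$ are borrowed from the simulation experiment reported in the paper. The sketch runs the recursion from two starting values with identical shocks, records the gap between the two paths, and estimates $E[\ln(\alpha_0 Z_t^2 + \beta_0)]$ by a sample mean.

```python
import math
import random


def garch_gap(omega0, alpha0, beta0, start_a, start_b, T, seed=0):
    """Run h_t = omega0 + h_{t-1} * (alpha0 * Z_{t-1}^2 + beta0) from two
    starting values driven by the same shocks.

    Returns the absolute gap between the two paths at each step and the
    sample mean of ln(alpha0 * Z^2 + beta0).
    Assumption (not from the paper): Z is standard normal.
    """
    rng = random.Random(seed)
    h_a, h_b = start_a, start_b
    gaps = []
    log_sum = 0.0
    for _ in range(T):
        z = rng.gauss(0.0, 1.0)
        factor = alpha0 * z * z + beta0  # random Lipschitz constant of the map
        h_a = omega0 + h_a * factor
        h_b = omega0 + h_b * factor
        gaps.append(abs(h_a - h_b))
        log_sum += math.log(factor)
    return gaps, log_sum / T


gaps, mean_log = garch_gap(0.05, 0.1, 0.85, start_a=0.1, start_b=10.0, T=2000)
print("sample mean of ln(alpha0 * Z^2 + beta0):", mean_log)
print("gap after the first step:", gaps[0])
print("gap after the last step:", gaps[-1])
```

Because the gap is multiplied by the same random factor at every step, its logarithm is a random walk with drift $E[\ln(\alpha_0 Z_t^2 + \beta_0)]$, so a negative drift forces the two paths together at an exponential rate.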
Lemma 1. Let Assumptions 1(a), 2 and 3 be satisfied. Then (2) has a solution $\bar h_t(\vartheta)$ that is unique, stationary and ergodic. Furthermore, there exists a $\varrho > 1$ such that

$$\varrho^t \sup_{\vartheta \in \Theta} |\hat h_t(\vartheta) - \bar h_t(\vartheta)| \to 0, \quad \text{a.s.} \qquad (4)$$

for the random function $\hat h_t$ that solves (3) for $t \ge 1$ with fixed starting value $\varsigma_0 > 0$.

For the convergence statement in (4), one also says that $\hat h_t(\vartheta)$ converges exponentially fast almost surely to $\bar h_t(\vartheta)$.

Next, the quasi-maximum likelihood estimator is defined as

$$\hat\vartheta = \arg\max_{\vartheta \in \Theta} \hat L_T(\vartheta),$$

where $\hat L_T(\vartheta)$ is the quasi-likelihood function:

$$\hat L_T(\vartheta) = -\frac{1}{2} \sum_{t=1}^{T} \left[ \ln(\hat h_t(\vartheta)) + \hat h_t(\vartheta)^{-1} (Y_t - m_\theta(\hat h_t(\vartheta)))^2 \right]. \qquad (5)$$

For the consistency of $\hat\vartheta$, we need one further assumption.

Assumption 4. The random variable $Z_t^2$ has a non-degenerate distribution. The following identifiability condition holds: $m_\theta(h_0) = m_{\theta_0}(h_0)$ a.s. if and only if $\theta = \theta_0$.

Note that $Z_t^2$ has a non-degenerate distribution if and only if $Z_t$ does not have a two-point distribution, that is, there exist no two points $z_1$ and $z_2$ such that $P(Z_t \in \{z_1, z_2\}) = 1$.

The next theorem states the asymptotic consistency of the QMLE. The proof makes essential use of the ergodicity of the process $\bar h(\vartheta)$. In particular, this also implies that the quasi-likelihood function converges to its expectation.

Theorem 1. For the model given by (1)–(2), let Assumptions 1(a), 2–4 be satisfied. Then it holds that $\hat\vartheta \xrightarrow{P} \vartheta_0$.

Note that Assumption 1(a) allows that $\beta = 0$ for all $\vartheta \in \Theta$ or that only $\beta_0 = 0$ (ARCH(1)-model). These cases are not allowed under Assumption 1(b). We will make Assumption 1(b) in the next step, where we will show the asymptotic normality of the QMLE $\hat\vartheta$.

We now come back to a discussion of Assumption 3. We already outlined that this assumption naturally arises here. We now show that the assumption is rather restrictive. It is always fulfilled if $D_1$ and $D_2$ are small enough because we have that $\bar\beta < 1$ by Assumption 1(a). The assumption $D_2 < +\infty$ states that our function $m$ does not grow faster than $x \mapsto a\sqrt{x}$. The treatment of functions with faster growth would require a different approach. Consider e.g. the recurrence equation for the linear function $m_\theta(x) = \theta_1 + \theta_2 x$. Here, we get that

$$\hat h_t(\vartheta) - \hat h_t(\vartheta_0) = \omega - \omega_0 + \cdots + \alpha\theta_2^2\, [\hat h_{t-1}(\vartheta) - \hat h_{t-1}(\vartheta_0)]^2 + \cdots.$$

It needs a very careful check why the quadratic term in the recurrence equation does not lead to an explosive behavior during $0 \le t \le T$. We expect that the process $\hat h_t(\vartheta)$ is not stationary and that the process explodes for $t \to \infty$. In order to illustrate this behavior we simulate the GARCH(1, 1)-M model with the following parameters: $\alpha_0 = 0.1$, $\beta_0 = 0.85$, $\theta_{01} = 0$ and $\theta_{02} = 0.5$. For the process $\hat h_t(\vartheta)$, all parameters but $\theta_2$ are chosen as in $\hat h_t(\vartheta_0)$, while $\theta_2 \in \{0.6, 0.7, 0.8, 0.9, 1\}$. We choose $T = 3000$ and consider $M = 1000$ replications. For different values of $\Delta\theta_2 = \theta_2 - \theta_{02}$, the following tables show the fraction of cases in which $\hat h_t(\vartheta) - \hat h_t(\vartheta_0)$ is diverging (defined as $\hat h_t(\vartheta) - \hat h_t(\vartheta_0) > 100$) and the average point in time $t$ when this is happening (explosion time). As Tables 1–3 clearly show, the fraction of cases in which $\hat h_t(\vartheta) - \hat h_t(\vartheta_0)$ is diverging (# divergence) is increasing in $\Delta\theta_2$. Also, the larger $\Delta\theta_2$, the earlier the difference $\hat h_t(\vartheta) - \hat h_t(\vartheta_0)$ explodes. Further, by considering different values of $\omega_0$ it becomes evident that divergence occurs more often and earlier the larger the expected value of $h_t = \bar h_t(\vartheta_0)$ is.

Table 1
Behavior of $\hat h_t(\vartheta) - \hat h_t(\vartheta_0)$ when $\omega_0 = 0.05$.

  $\Delta\theta_2$     0.1      0.2      0.3      0.4      0.5
  # divergence         0.02     0.50     0.99     1        1
  Explosion time       1403.8   1316.5   645.96   187.67   82.73

Notes: The table reports the fraction of cases in which $\hat h_t(\vartheta) - \hat h_t(\vartheta_0)$ is diverging (# divergence). For those simulations for which we observe divergence the average explosion time $t$ is reported.

Table 2
Behavior of $\hat h_t(\vartheta) - \hat h_t(\vartheta_0)$ when $\omega_0 = 0.1$.

  $\Delta\theta_2$     0.1      0.2      0.3      0.4      0.5
  # divergence         0.41     1        1        1        1
  Explosion time       1327.5   363.15   90.85    40.91    25.09

Notes: see Table 1.

Table 3
Behavior of $\hat h_t(\vartheta) - \hat h_t(\vartheta_0)$ when $\omega_0 = 0.2$.

  $\Delta\theta_2$     0.1      0.2      0.3      0.4      0.5
  # divergence         0.99     1        1        1        1
  Explosion time       513.39   69.49    27.84    16.78    11.67

Notes: see Table 1.

We now come to the discussion of asymptotic normality of the QMLE $\hat\vartheta$. Unfortunately, as mentioned above, the derivatives of the quasi-likelihood do not behave well in fixed neighborhoods of the true parameter $\vartheta_0$. The basic reason is that under reasonable conditions the derivatives of $\hat h_t$ do not behave well at points $\vartheta \ne \vartheta_0$. Only for $\vartheta$ in a shrinking neighborhood of $\vartheta_0$ can one control the asymptotic behavior of the derivatives. For this reason, we need a stronger result than Theorem 1. In our next theorem, we will show that $\hat\vartheta$ converges to $\vartheta_0$ with nearly parametric rate $O_P(\ln(T)\,T^{-1/2})$. In a next step, we will show that the first two derivatives $\hat h_t^{(l)}(\vartheta)$ ($l \in \{1, 2\}$) of $\hat h_t(\vartheta)$ converge to a stationary ergodic process, uniformly over $\vartheta$ with $\|\vartheta - \vartheta_0\| \le \ln(T)\,T^{-1/2}$. The limiting processes do not depend on $\vartheta$ in this shrinking neighborhood. This can be used to show asymptotic normality of the QMLE. For our next theorem, we need the following additional assumptions:

Assumption 5. For some $D > 0$ it holds for $\|\vartheta - \vartheta_0\| \le \delta$ that for the functions g1 (s) = m2 (s), g1 (s) = m (s)m (s), g1 (s) = m (s)m (s), g2 (s) = m (s), g2 (s) = m (s), g2 (s) = m (s), g2 (s) = m (s), g2 (s) = m (s), g2 (s) = m (s), g3 (s) = m (s), g4 (s) = m (s), g4 (s) = m (s) with some constant $D > 0$

$$|g_1(s) - g_1(s')| \le D\,|s - s'|,$$

$$|g_2(s) - g_2(s')| \le D\,|s - s'|\, s^{-1/2},$$

$$|g_3(s)| \le D \sqrt{s},$$

$$|g_4(s)| \le D\, s^{-1/2}$$

for $s > s'$. Here, $\dot f_\theta(s)$ and $\ddot f_\theta(s)$ denote the first or second order partial derivative of a function $f_\theta(s)$ with respect to $\vartheta$ or to $\theta$. Note that e.g. the vector of derivatives with respect to $\vartheta$ has a dimension three greater than the vector of derivatives with respect to $\theta$ and that it only differs by three additional elements equal to zero. If the meaning of $\dot f_\theta(s)$ and $\ddot f_\theta(s)$ is not clear from the context we will use the explicit notation.
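The divergence experiment summarised in Tables 1–3 can be sketched in Python. This is a rough illustration under stated assumptions, not the authors' code: the innovations are taken to be standard normal, the mean function is the linear one, $m_\theta(x) = \theta_1 + \theta_2 x$, both recursions start at the unconditional variance $\omega_0/(1 - \alpha_0 - \beta_0)$ (so that $\hat h_t(\vartheta_0)$ coincides with the true $h_t$), and the function names `explosion_time` and `divergence_summary` are hypothetical. None of these choices is stated in the paper, so the numbers produced need not match the tables.

```python
import math
import random


def explosion_time(omega0, delta_theta2, T=3000, alpha0=0.1, beta0=0.85,
                   theta1=0.0, theta02=0.5, threshold=100.0, rng=None):
    """Return the first t at which h_hat_t(theta) - h_t exceeds the threshold,
    or None if this does not happen within T steps.

    Assumptions (not from the paper): Gaussian shocks, linear mean function,
    and both recursions started at omega0 / (1 - alpha0 - beta0).
    """
    rng = rng or random.Random()
    theta2 = theta02 + delta_theta2
    h_true = omega0 / (1.0 - alpha0 - beta0)
    h_pert = h_true
    for t in range(1, T + 1):
        z = rng.gauss(0.0, 1.0)
        # Observation generated by eq. (1) with the true linear mean.
        y = theta1 + theta02 * h_true + math.sqrt(h_true) * z
        # Recursion (3) evaluated with the perturbed slope theta2.
        h_pert = omega0 + alpha0 * (y - theta1 - theta2 * h_pert) ** 2 + beta0 * h_pert
        # At the true parameter the residual equals sqrt(h) * z.
        h_true = omega0 + alpha0 * h_true * z * z + beta0 * h_true
        if h_pert - h_true > threshold:
            return t
    return None


def divergence_summary(omega0, delta_theta2, replications=200, seed=1):
    """Fraction of diverging paths and mean explosion time among them."""
    rng = random.Random(seed)
    times = [explosion_time(omega0, delta_theta2, rng=rng)
             for _ in range(replications)]
    hits = [t for t in times if t is not None]
    fraction = len(hits) / replications
    mean_time = sum(hits) / len(hits) if hits else float("nan")
    return fraction, mean_time


for delta in (0.1, 0.3, 0.5):
    print(delta, divergence_summary(0.2, delta, replications=50))
```

The quadratic term $\alpha\theta_2^2[\hat h_{t-1}(\vartheta) - \hat h_{t-1}(\vartheta_0)]^2$ is what drives the explosion in this sketch: once the gap between the two recursions is large enough, the squared term dominates the contracting factor $\beta$.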
It can be easily seen that Assumption 5 holds for the function $m_\theta(s) = \theta_1 + \theta_2 s^{\gamma}$ if $\gamma < 1/2$ or if $\gamma = 1/2$ and $\theta_2$ is small enough. Furthermore, it holds for $m_\theta(s) = \theta_1 + \theta_2 \ln(s)$.

Assumption 6. There exist $\delta > 0$, $D_3 > 0$ such that for

$$V_t = \sup_{\|\vartheta - \vartheta_0\| \le \delta} \frac{4\alpha D_3 |Z_t| + \alpha D_3^2 + 4\alpha D D_3 + 4\beta}{4\,(\alpha_0 Z_t^2 + \beta_0)}$$

it holds that

$$E \ln(V_t) < 0.$$

Here, $D_3$ is chosen such that

$$|m_\theta(s) - m_\theta(s')| \le \frac{D_3\,|s - s'|}{2\sqrt{s}}$$

for $s, s'$ and $\|\vartheta - \vartheta_0\| \le \delta$. If $V_t$ fulfills that $P[V_t > 1] > 0$, then we define $\kappa_1 > 0$ as the solution of the equation $E V_t^{\kappa_1} = 1$. If $P[V_t > 1] = 0$ we set $\kappa_1 = \infty$ and $\kappa_1^{-1} = 0$.

If $V_t$ fulfills that $P[V_t > 1] > 0$, then there exists a unique solution $0 < \kappa_1 < \infty$ of the equation $E V_t^{\kappa_1} = 1$, see Theorem 2.1 in Mikosch and Stărică (2000).

We make the following assumption on the moments of $Z_t$.

Assumption 7. The random variables $Z_t$ fulfill the following moment condition:

$$E|Z_t|^{\kappa_2} < \infty$$

for some $\kappa_2 > 4$ with $8\kappa_1^{-1} + 2\kappa_2^{-1} < 1$ and $2\kappa_1^{-1} + 4\kappa_2^{-1} < 1$.

Assumption 8. The random variables $Z_t$ fulfill the following moment condition:

$$E(\alpha_0 Z_t^2 + \beta_0)^{\kappa_3/2} = 1$$

for some $\kappa_3 > 0$ with $2\kappa_1^{-1} + 2\kappa_2^{-1} + 4\kappa_3^{-1} < 1$ and $10\kappa_1^{-1} + 4\kappa_3^{-1} < 1$.

By application of Theorem 2.1 in Mikosch and Stărică (2000), it follows that under weak conditions there exists a $\kappa_3$ with $2\kappa_3^{-1} < 1$ and $E(\alpha_0 Z_t^2 + \beta_0)^{\kappa_3/2} = 1$. The main point of this assumption is that $\kappa_3$ also fulfills $2\kappa_3^{-1} + 2\kappa_1^{-1} < 1$ where $\kappa_1$ comes from Assumption 6. The assumption is satisfied if, for example, $Z_t \sim N(0, 1)$ with $\alpha_0 + \beta_0 < 1$.

Assumption 9. The matrix

$$E\left[ \dot m_\theta(h_t)\, \dot m_\theta(h_t)^{\top} \right]$$

is non-singular for $\theta = \theta_0$.

Note that Assumption 9 trivially holds for the above discussed class of models where the conditional mean can be written as $m_\theta(x) = \theta_1 + \theta_2 g(x)$ with a fixed function $g$. For the study of other cases note that $h_t$ coincides with the volatility process of a pure GARCH(1, 1)-model. Thus a check of Assumption 9 does not require any knowledge of the properties of GARCH-in-Mean processes.

We now state the following result on the convergence rate of the quasi-maximum likelihood estimator.

Proposition 1. For the model given by (1)–(2), let Assumptions 1–9 with $\delta > 0$ small enough be satisfied. Then it holds that

$$\|\hat\vartheta - \vartheta_0\| = O_P\left( \ln(T)\, T^{-1/2} \right).$$

Proposition 1 allows us to restrict attention to local expansions of the quasi-likelihood and this is the essential step to derive asymptotic normality of the maximum quasi-likelihood estimator as stated in our main theorem.

Theorem 2. For the model (1)–(2), make the Assumptions 1–9 with $\delta > 0$ small enough. Then it holds that

$$\sqrt{n}\,(\hat\vartheta - \vartheta_0) \xrightarrow{d} N\left(0, \Sigma_2^{-1} \Sigma_1 \Sigma_2^{-1}\right),$$

where

$$\Sigma_1 = E\left[ \left( \frac{1}{2}\frac{\dot h_t}{h_t}(Z_t^2 - 1) + h_t^{-1/2}\big(\dot m_0(h_t) + m_0'(h_t)\dot h_t\big) Z_t \right) \left( \frac{1}{2}\frac{\dot h_t}{h_t}(Z_t^2 - 1) + h_t^{-1/2}\big(\dot m_0(h_t) + m_0'(h_t)\dot h_t\big) Z_t \right)^{\top} \right],$$

$$\Sigma_2 = S_1 + S_2,$$

$$S_1 = E\left[ \frac{\dot h_t\, \dot h_t^{\top}}{2 h_t^2} \right],$$

$$S_2 = E\left[ \frac{\big(\dot m_0(h_t) + m_0'(h_t)\dot h_t\big)\big(\dot m_0(h_t) + m_0'(h_t)\dot h_t\big)^{\top}}{h_t} \right].$$

In the statement of Theorem 2, $\dot h_t = \dot{\bar h}_t(\vartheta_0)$, where $\dot{\bar h}_t(\vartheta)$ is the unique stationary solution of $\dot{\bar h}_{t+1}(\vartheta) = \partial_\vartheta g_\vartheta(Y_t, \bar h_t(\vartheta)) + \partial_s g_\vartheta(Y_t, \bar h_t(\vartheta))\, \dot{\bar h}_t(\vartheta)$. Lemma 4 in the appendix states that this equation has a unique stationary solution.

In the proof of the theorem we will show that $\Sigma_2$ is non-singular, see Lemma 15. Note that for Gaussian $Z_t$, we have $\Sigma_1 = \Sigma_2$ and the asymptotic covariance is equal to $\Sigma_1^{-1}$. On the other hand, if there is no mean function, i.e. $m_0(h_t) = 0$, then the terms involving $m_0$ vanish, so that $\Sigma_1$ (the covariance of the score) and $\Sigma_2 = S_1$ reduce to

$$\Sigma_1 = \frac{1}{4}\,(E[Z_t^4] - 1)\, E\left[ \frac{\dot h_t\, \dot h_t^{\top}}{h_t^2} \right] \quad \text{and} \quad \Sigma_2 = \frac{1}{2}\, E\left[ \frac{\dot h_t\, \dot h_t^{\top}}{h_t^2} \right]$$

and the covariance reduces to the one of the standard GARCH(1, 1) (see Theorem 2.2 in Francq and Zakoïan, 2004).

3. Conclusions

Finding sufficient regularity conditions that ensure consistency and asymptotic normality of the QMLE for the GARCH-M model has been a long-standing problem in financial econometrics. We consider the special case of a parametric GARCH-M model with innovations that follow a GARCH(1, 1) process, which is the specification most often used in empirical applications. Following Straumann (2005) and Straumann and Mikosch (2006), we make use of stochastic recurrence equations and employ exponential inequalities to show the consistency and asymptotic normality of the QMLE for certain specifications of the mean function that do not grow too fast.

Acknowledgments

We would like to thank the Editor and three anonymous referees for comments which greatly improved our paper. We would also like to thank Karin Loch and Matthias Hartmann for helpful comments and suggestions. Support by Deutsche Forschungsgemeinschaft through the Research Training Group RTG 1953 is gratefully acknowledged. Research of the second author was prepared within the framework of a subsidy granted to the HSE by the Government of the Russian Federation for the implementation of the Global Competitiveness Program.

Appendix

Proof of Lemma 1. Put

$$g_\vartheta(y, s) = \omega + \alpha\,(y - m_\theta(s))^2 + \beta s \qquad (6)$$
and consider the sequence of random functions t with: Because of Assumptions 1 and 2, we have that E [ht Zt2 ] = E [ht ] <
. Further, ergodicity of ht , ht Zt2 and Zt2 , this implies
[t (s)]( ) = g (Yt , s( )). s because of the
t =1 [1 + ht + ht Zt + Zt ] = OP (1) for s . We apply this
1 2 2
that s
The functions t map continuous functions s : [0, ) onto result with s = T and s = C ln T with C > 0 large enough. With
the class of such functions. Note that this bound, (7) and (4), we get if C is chosen large enough that
ht +1 ( ) = [t (ht )]( ). 1
[L T ( ) L T ( )]
T
Because Yt is a stationary and ergodic sequence, the same holds for
C ln T
t .

C ln T 1
1 + ht + ht Zt2 + Zt2

OP
Consider functions s, s with s s . It holds that T C ln T t =1

|t (s) t (s )|( ) 2| ht Zt + m0 (ht )||m (s) m (s )|
T
1
+ OP C ln T 1 + ht + ht Zt2 + Zt2

+ |m (s)2 m (s )2 | + |s s |( ) T t =1
Ut |s s |( ).
ln T
= OP , (8)
The lemma follows from E [ln(Ut )] < 0 by application of Theorem T
3.1 in Bougerol (1993), see also Proposition 5.2.12 in Straumann
(2005). See also the discussion before the statement of Lemma 1. uniformly over . In particular, we have that T1 [L t ( ) L t ( )]
For the application of Theorem 3.1 in Bougerol (1993), we have also converges almost surely to 0, uniformly over .
checked that E ln+ t (s) s < for some s and E ln+ Ut < , It remains to check the last claim: L( ) < L(0 ) for = 0 . For
where ln+ denotes the function with ln+ (x) = ln(x) if x > 1 and 0 the proof of this claim, it suffices to check that
otherwise. Note that E ln+ Ut < directly follows from E ln Ut < h 0 ( ) = h0 a.s. implies = 0 .
0. Furthermore we get E ln+ t (s) s < for a constant
function s by noting that Yt has a bounded second moment. From h0 ( ) = h0 a.s. and stationarity of (h t ( ), ht ), we get that

h 1 ( ) = h1 a.s. Thus, we have that
Proof of Theorem 1. The theorem can be shown by similar argu-
ments as in Theorem 5.3.1. in Straumann (2005). There, the proof 0 = 0 + ( 0 )Y02 2Y0 ( m (h0 ) 0 m0 (h0 ))
is based on the comparison of L T ( ), L T ( ) and L( ), where L T ( ) is + m2 (h0 ) 0 m20 (h0 ) + ( 0 )h0 a.s.
defined in (5) and
This gives
T
1
L T () = ln(h t ( )) + h t ( )1 (Yt m (h t ( )))2 , a0 + a1 Z0 + a2 Z02 = 0 a.s. (9)
2 t =1
with
1
L() = E ln(h t ( )) + h t ( )1 (Yt m (h t ( )))2 .

2 a0 = 0 + m2 (h0 ) 0 m20 (h0 ) + ( 0 )h0
The proof in Straumann (2005) is based on showing:

2m0 (h0 )( m (h0 ) 0 m0 (h0 )) + ( 0 )m20 (h0 ),
1/2
1 a1 = 2( 0 )h0 m0 (h0 ) 2( m (h0 ) 0 m0 (h0 )),
L T L T 0 (in probability), a2 = ( 0 )h0 .
T
1 We now choose three intervals I1 , I2 and I3 such that three vec-
L T L 0 (in probability), L( ) < L(0 ) for = 0 .

T

tors vj = (vj1 , vj2 , vj3 ) = (P (Z0 Ij ), E [Zj I{Z0 Ij } ], E [Zj2 I{Z0 Ij } ])
(j {1, 2, 3}) are linearly independent. Such a choice is possible
The second claim follows from the fact that h t is a stationary er- because of Assumption 4. We now multiply the terms on the left
godic process (see Straumann, 2005). For the first claim, one uses hand side of (9) by I{Z0 Ij } and take the expectation. This gives
the bound
a0 vj1 + a1 vj2 + a2 vj3 = 0
1
|L T ( ) L T ( )| for j {1, 2, 3}. Because of the linear independence of v1 , v2 and
T
v3 we get that
T
c (Yt m (h t ( )))2 (Yt m (h t ( )))2
t ( ) + +1 a0 = a1 = a2 = 0 .
T t =1 st ( ) st ( )
Now, a2 = 0 implies = 0 . From this equality, a1 = 0 and
c |m (h t ()) m (h t ( ))| |Yt |
T
+ Assumption 4 we get that = 0 . Finally, a0 = 0 implies that
T t =1 st ( ) 0 + ( 0 )h0 which implies = 0 and = 0 because
the distribution of h0 is non-degenerate.
c |m (h t ())2 m (h t ( ))2 |
T
+ , For the proof of Proposition 1, we will make use of the following
T t =1 st ( ) lemmas.
where st ( ) = h t ( ) + h t ( ), t ( ) = |h t ( ) h t ( )| and c > 0 Lemma 2. Make the assumptions of Proposition 1. Then with h t =
is a constant, not depending on . Using Assumption 3, we have h t ( ) and m
= m it holds that
m2 (x) c x, |m (x) m (y)| c |x y|, |m (x)2 m (y)2 |
1 (h t ht )2 (m
(h t ) m(ht ))2

c |x y| with a constant c > 0. Using st ( ) 2 (see Assump-
tion 1) we get with a constant c :
+
2 t =1 h 2t h2t t =1 h t
1 c
T
(h t ht ) 2
[L T ( ) L T ( )] t ( ) 1 + ht + ht Zt2 + Zt2 . (Zt 1)

(7)
T T t =1 t =1 h t

ht and define t as the stationary solution of the recurrence equation
+2 (m
(h t ) m(ht )) Zt + OP (ln T ), (10) t +1 = 1 + Vt t . We have that
t =1

ht
t +1 C 0 t +1 Vt (t C 0 t ).
where a b denotes the maximum of the real numbers a and b.
This implies that (t +1 C 0 t +1 )+ Vt (t C
Proof of Lemma 2. Note that by definition of the quasi-likelihood 0 t )+ , where (x)+ denotes the positive part of x. Because (t
estimator , we have that L T ( ) L T (0 ). Because of (8), this C 0 t )+ is stationary and E ln(Vt ) < 0, we get that
implies that L T ( ) L T (0 ) + OP (ln T ). We make use of the (t C 0 t )+ = 0 a.s. Thus for Wt = C t we have
inequality ln(1 + x) x x2 (1 +(x)+ )1 , where (x)+ is the positive that t 0 Wt a.s. For the proof of the lemma, it remains
part of x. This inequality follows easily from a Taylor expansion to be shown that sup1t T |Wt | = OP (T 1/1 ). If P [Vt > 1] = 0,
around x = 0. From these two inequalities, we get that we can bound Vt by a random variable Vt with P [Vt > 1] = 0
and E (Vt )1 = 1 with 1 as large as we like. Thus, w.l.o.g. we can

1 ht
OP (ln T ) ln assume that P [Vt > 1] > 0. For this case we get from Theorem
2 t =1 h t 4.1 in Goldie (1991) that P (Wt x) cx1 for x for some
constant c > 0. This implies sup1t T |Wt | = OP (T 1/1 ). Note also
1 1 1/2
h t (ht Zt + m0 (ht ) m
(h t ))2 Zt2 that Vt is bounded by definition.
2 t =1
Denote by ht ( ) the solution of

1 (ht h t )2

1 ht h t 1 ht h t
Zt2 h t +1 ( ) = g (Yt , h t ( )) + s g (Yt , h t ( ))h t ( ) (14)
2 t =1 h t 4 t =1 h 2t h2t 2 t =1 h t

1/2
ht with deterministic starting value h 0 ( ) = 1 . The function g was
(m0 (ht ) m
(h t ))Zt defined in (6). Furthermore, g and s g are the partial deriva-
t =1 h t tives of g with respect to or s, respectively. We also define h t ( )

1 1 as the solution of
h t (m0 (ht ) m
(h t ))2 .
2 t =1 h t +1 ( ) = g (Yt , h t ( )) + 2 s g (Yt , h t ( ))h t ( )
The claim of the lemma follows by rearrangement of the terms. + ss g (Yt , h t ( ))h t ( ) (15)
Lemma 3. Make the assumptions of Proposition 1. There exist with deterministic starting value h0 ( ) = 2 . Here g , s g and

random variables Wt with sup1t T |Wt | = OP (T 1/1 ) such that for ss g denote second order partial derivatives of g .
0 The next lemma states that
dt +1 ( ) = g (Yt , h t ( )) + s g (Yt , h t ( ))dt ( ) (16)
h t () ht

0 Wt , (11) has a unique stationary solution dt (). We denote this solution by

ht
h t () = dt ().
h () h ( )
t t 0
0 Wt . (12)
Lemma 4. Make the assumptions of Proposition 1. Eq. (16) has a
ht
unique stationary solution h t () = dt () that is ergodic. For >

0, > 1 small enough it holds that
Proof of Lemma 3. We only show claim (11). Claim (12) follows
by similar arguments. For the proof of claim (11), we show that for sup h t ( ) h t ( ) 0, a.s.
some constant C > 0 for 0 0
h t +1 () ht +1 Furthermore, it holds that h t is identical to the derivative of h t , a.s.,

C 0 + Vt ht ( ) ht .

(13) and that it is continuous.
h
t +1
h
t
Proof of Lemma 4. According to Proposition 5.5.1 of Straumann
For a proof of this claim, write h t = h t ( ) and = | 0 |, (2005), it suffices for the statement of the lemma to verify
= | 0 | and = | 0 |. We get that for some constants that: (i) g (y, s) is continuously differentiable with respect to
C1 > 0 for 0 and s for y fixed. (ii) For some > 0 and a stationary

1/2 process Ct with E [ln+ (Ct )] < it holds that g (Yt , s)
|h t +1 ht +1 | + ht Zt2 + (ht Zt + m0 (ht ) g (Yt , s ) Ct |s s | for s, s , 0 . (iii)
E [ln+ (sup0 g (Y0 , h 0 ( )))] < .
m (h t ))2 ht Zt2 + ht + |h t ht | Here, g (y, s) denotes the vector of the first order derivatives

1/2 of g (y, s) with respect to and s for y fixed. We now check

C1 0 (1 + ht + ht Zt2 ) + 2|m (ht ) m (h t )|ht |Zt | (i)(iii). Condition (i) directly follows from our Assumption 1. For
+ 2|m (ht ) m (h t )| |m (ht ) m0 (ht )| the checking of (ii) we note that from Assumption 1 we get by
direct calculations that (ii) holds with Ct = C (1 + ht + ht |Zt |) if
+ |m (ht ) m (h t )|2 + |h t ht |
the deterministic constant C is chosen large enough. The condition
C1 0 (1 + ht + ht Zt2 ) E [ln+ (Ct )] < follows from EZt2 < and Eht < .

D23
For the proof of (iii), one shows the following bound for 0
+ |ht ht | D3 |Zt | + DD3 +
+ . small enough and C > 0 large enough
4
g (Y0 , h 0 ( )) g0 (Y0 , h 0 (0 ))
If we divide both sides of this inequality by ht +1 , we get Eq. (13),
because of ht +1 (0 Zt2 + 0 )ht . For the proof of the lemma, it C [1 + h0 + h0 |Z0 | + (1 + |Z0 |)|h 0 ( ) h0 |]
remains to show that (13) implies (11). Put t = |h t ht |/ht C [1 + h0 + h0 |Z0 | + (1 + |Z0 |)h0 W0 ],
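where in the last inequality Lemma 3 has been used. Now, by direct calculations with $C > 0$ large enough

\[ g_{\phi_0}(Y_0, \hat h_0(\phi_0)) \le C\,[1 + h_0 + h_0 |Z_0|]. \]

This gives for $\|\phi - \phi_0\|$ small enough and $C > 0$ large enough

\[ g_{\phi}(Y_0, \hat h_0(\phi)) \le C\, h_0\, (1 + Z_0^2)\, W_0. \]

Claim (iii) follows from $E[\ln^+(h_0)] < \infty$, $E[\ln^+(1 + Z_0^2)] < \infty$ and $E[\ln^+(W_0)] < \infty$. This concludes the proof of Lemma 4.

Lemma 5. Make the assumptions of Proposition 1. It holds for $\rho \to 0$ that

\[ \sup_{\|\phi-\phi_0\|\le\rho} \|\phi-\phi_0\|^{-2}\,\Big| E\Big[\frac{(\hat h_t(\phi)-h_t)^2}{\hat h_t(\phi)^2 \vee h_t^2}\Big] - (\phi-\phi_0)^\top S_1 (\phi-\phi_0) \Big| \to 0, \tag{17} \]

\[ \sup_{\|\phi-\phi_0\|\le\rho} \|\phi-\phi_0\|^{-2}\,\Big| E\Big[\frac{(\hat h_t(\phi)-h_t)^2}{h_t^2}\Big] - (\phi-\phi_0)^\top S_1 (\phi-\phi_0) \Big| \to 0, \tag{18} \]

\[ \sup_{\|\phi-\phi_0\|\le\rho} \|\phi-\phi_0\|^{-2}\,\Big| E\Big[\frac{(m_\theta(\hat h_t(\phi))-m_0(h_t))^2}{\hat h_t(\phi)}\Big] - (\phi-\phi_0)^\top S_2 (\phi-\phi_0) \Big| \to 0, \tag{19} \]

\[ \sup_{\|\phi-\phi_0\|\le\rho} \|\phi-\phi_0\|^{-2}\,\Big| E\Big[\frac{(m_\theta(\hat h_t(\phi))-m_0(h_t))^2}{h_t}\Big] - (\phi-\phi_0)^\top S_2 (\phi-\phi_0) \Big| \to 0. \tag{20} \]

Proof of Lemma 5. Choose $\phi_n$ with $\phi_n \to \phi_0$ and $\|\phi_n - \phi_0\|^{-1}(\phi_n - \phi_0) \to e$ for a unit vector $e$. For claim (17), we have to show that

\[ \|\phi_n-\phi_0\|^{-2}\, E\Big[\frac{(\hat h_t(\phi_n)-h_t)^2}{\hat h_t^2(\phi_n) \vee h_t^2}\Big] - e^\top E\Big[\frac{\dot h_t(\phi_0)\dot h_t(\phi_0)^\top}{h_t^2}\Big] e \to 0. \]

For a proof of this claim first note that because of Lemma 4 we have that

\[ \|\phi_n-\phi_0\|^{-2}\,(\hat h_t(\phi_n)-h_t)^2\,(\hat h_t^2(\phi_n) \vee h_t^2)^{-1} \to e^\top \dot h_t(\phi_0)\dot h_t(\phi_0)^\top h_t^{-2}\, e \quad \text{a.s.} \]

Thus, the claim follows by dominated convergence since

\[ \|\phi_n-\phi_0\|^{-2}\,\big[(\hat h_t(\phi_n)-h_t)^2\,(\hat h_t(\phi_n)^2 \vee h_t^2)^{-1}\big] \le W_t^2\, h_t\, \hat h_t(\phi_n)^{-1} \le W_t^2\, h_t\, \underline{\omega}^{-1} \le W_t^4\, \underline{\omega}^{-1} + h_t^2\, \underline{\omega}^{-1}, \tag{21} \]

$E W_t^4 < \infty$ and $E h_t^2 < \infty$, see also Lemma 3.

Claims (18)-(20) can be shown by similar arguments.

Lemma 6. Make the assumptions of Proposition 1. With some constants $c^+ > c^- > 0$, it holds that

\[ c^- \|\phi-\phi_0\|^2 \le \frac{1}{T}\sum_{t=1}^T \frac{(\hat h_t(\phi)-h_t)^2}{\hat h_t^2(\phi) \vee h_t^2} + \frac{1}{T}\sum_{t=1}^T \frac{(m_\theta(\hat h_t(\phi))-m(h_t))^2}{\hat h_t(\phi)} \le c^+ \|\phi-\phi_0\|^2 \]

for all $\|\phi - \phi_0\| \le \rho$ with probability tending to one.

Proof of Lemma 6. Put $\Delta_t(\phi_0) = 0$ and define for $\phi \ne \phi_0$

\[ \Delta_t(\phi) = \|\phi-\phi_0\|^{-2}\Big[ \frac{(\hat h_t(\phi)-h_t)^2}{\hat h_t^2(\phi) \vee h_t^2} + \frac{(m_\theta(\hat h_t(\phi))-m(h_t))^2}{\hat h_t(\phi)} - (\phi-\phi_0)^\top R_t (\phi-\phi_0) \Big], \]

\[ R_t = \frac{\dot h_t(\phi_0)\dot h_t(\phi_0)^\top}{h_t^2} + \frac{(\dot m_0(h_t) + m_0'(h_t)\dot h_t(\phi_0))(\dot m_0(h_t) + m_0'(h_t)\dot h_t(\phi_0))^\top}{h_t}. \]

Because of Lemmas 1 and 4, we have that $\Delta_t$ is ergodic and stationary. Using the bound (21) for the first term of $\Delta_t$ and a similar bound for the second term we get that $E[\sup_{\|\phi-\phi_0\|\le\rho} |\Delta_t(\phi)|] < \infty$. Thus, we have that

\[ \sup_{\|\phi-\phi_0\|\le\rho} \Big| \frac{1}{T}\sum_{t=1}^T \Delta_t(\phi) - E[\Delta_t(\phi)] \Big| = o_P(1). \]

From Lemma 5, we know that $\sup_{\|\phi-\phi_0\|\le\rho} \|E[\Delta_t(\phi)]\| \to 0$ for $\rho \to 0$, i.e. $E[\Delta_t(\phi)]$ is continuous in $\phi = \phi_0$. Here, $\|\cdot\|$ denotes the spectral norm of a matrix. The statement of the lemma now follows from $T^{-1}\sum_{t=1}^T R_t = S + o_P(1)$, see Lemma 4, and our assumption that $S$ is non-singular. Here, we make the assumption that $\rho$ is chosen small enough.

Our next lemma contains an exponential inequality for martingales. This inequality is a modification of e.g. Lemma 8.9 in Van de Geer (2000).

Lemma 7. For random variables $\ldots, e_{-1}, e_0, e_1, \ldots, e_T$ suppose that $e_t$ is $\mathcal F_t$-measurable for an increasing sequence of $\sigma$-fields $\mathcal F_t$, that $E[e_t \mid \mathcal F_{t-1}] = 0$ and that $\sup_t E[\exp(c|e_t|) \mid \mathcal F_{t-1}] < \infty$ (a.s.) for a constant $c > 0$ small enough. Consider a sequence of random variables $r_1, r_2, \ldots$ where $r_t$ is measurable with respect to the $\sigma$-field generated by $\mathcal F_{t-1}$. Assume that $\max_{1\le t\le T} |r_t| \le c/2$ (a.s.). Then it holds that

\[ E\Big[\exp\Big(\sum_{t=1}^T r_t e_t\Big)\Big] \le \Big( E\Big[\exp\Big(C\sum_{t=1}^T r_t^2\Big)\Big] \Big)^{1/2}, \]

where $C$ is a deterministic a.s. bound of $E[2 e_t^2 \exp(c|e_t|) \mid \mathcal F_{t-1}]$.

We will make use of this lemma in the proof of the following lemma. For completeness we will give a proof of Lemma 7, although proofs of related versions of the result must be available elsewhere.

Proof of Lemma 7. We will show that for $0 \le s \le T$

\[ E\Big[\exp\Big(\sum_{t=1}^T r_t e_t\Big)\Big] \le E\Big[\exp\Big(\sum_{t=1}^s r_t e_t\Big) \Big\{ E_{s+1}\Big[\exp\Big(C\sum_{t=s+1}^T r_t^2\Big)\Big] \Big\}^{1/2}\Big], \tag{22} \]

where $E_{s+1}[\ldots]$ denotes the conditional expectation $E[\ldots \mid \mathcal F_s]$. Note that claim (22) with $s = 0$ implies the statement of the lemma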
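because of

\[ E\Big[\Big\{E_1 \exp\Big(C\sum_{t=1}^T r_t^2\Big)\Big\}^{1/2}\Big] \le \Big\{E\, E_1 \exp\Big(C\sum_{t=1}^T r_t^2\Big)\Big\}^{1/2} = \Big\{E \exp\Big(C\sum_{t=1}^T r_t^2\Big)\Big\}^{1/2}. \]

Furthermore, (22) with $s = T$ holds trivially. We will show that (22) for $s = u + 1$ implies that (22) holds for $s = u$, where $u = 0, \ldots, T - 1$. Thus by an induction argument we get (22) with $s = 0$ and this implies the statement of the lemma.

Suppose that (22) holds with $s = u + 1$ for some $u = 0, \ldots, T - 1$. Then we get by application of the Cauchy-Schwarz inequality that

\[
\begin{aligned}
E \exp\Big(\sum_{t=1}^T r_t e_t\Big)
&\le E\Big[\exp\Big(\sum_{t=1}^{u+1} r_t e_t\Big) \Big\{E_{u+2} \exp\Big(C \sum_{t=u+2}^T r_t^2\Big)\Big\}^{1/2}\Big] \\
&= E\Big[E_{u+1}\Big[\exp\Big(\sum_{t=1}^{u+1} r_t e_t\Big) \Big\{E_{u+2} \exp\Big(C \sum_{t=u+2}^T r_t^2\Big)\Big\}^{1/2}\Big]\Big] \\
&\le E\Big[\Big\{E_{u+1} \exp\Big(2\sum_{t=1}^{u+1} r_t e_t\Big)\Big\}^{1/2} \Big\{E_{u+1} E_{u+2} \exp\Big(C \sum_{t=u+2}^T r_t^2\Big)\Big\}^{1/2}\Big] \\
&= E\Big[\exp\Big(\sum_{t=1}^{u} r_t e_t\Big) \big\{E_{u+1}[\exp(2 r_{u+1} e_{u+1})]\big\}^{1/2} \Big\{E_{u+1} \exp\Big(C \sum_{t=u+2}^T r_t^2\Big)\Big\}^{1/2}\Big].
\end{aligned}
\]

We now argue that

\[ E_{u+1}[\exp(2 r_{u+1} e_{u+1})] \le \exp(C r_{u+1}^2). \tag{23} \]

If one plugs this into the last inequality one gets (22) with $s = u$. This shows the statement of the lemma. Thus it remains to show (23). This claim follows by a simple Taylor expansion. One gets with $|\xi_{u+1}| \le |r_{u+1}|\,|e_{u+1}| \le (c/2)\,|e_{u+1}|$ that

\[
\begin{aligned}
E_{u+1}[\exp(2 r_{u+1} e_{u+1})]
&= E_{u+1}\big[1 + 2 r_{u+1} e_{u+1} + 2 r_{u+1}^2 e_{u+1}^2 \exp(2\xi_{u+1})\big] \\
&= E_{u+1}\big[1 + 2 r_{u+1}^2 e_{u+1}^2 \exp(2\xi_{u+1})\big] \\
&\le E_{u+1}\big[1 + C r_{u+1}^2\big] \\
&= 1 + C r_{u+1}^2 \\
&\le \exp(C r_{u+1}^2).
\end{aligned}
\]

Lemma 8. Make the assumptions of Proposition 1. It holds that

\[ \Big| \sum_{t=1}^T \frac{\hat h_t - h_t}{\hat h_t}(Z_t^2 - 1) + \sum_{t=1}^T (\hat m(\hat h_t) - m(h_t)) \frac{h_t^{1/2}}{\hat h_t} Z_t \Big| = O_P(\ln(T)) \Big[ \sum_{t=1}^T \frac{(\hat h_t - h_t)^2}{\hat h_t^2} + \sum_{t=1}^T \frac{h_t (\hat m(\hat h_t) - m(h_t))^2}{\hat h_t^2} \Big]^{1/2}, \tag{24} \]

\[ \Big| \sum_{t=1}^T \frac{\hat h_t - h_t}{\hat h_t}(Z_t^2 - 1) + \sum_{t=1}^T (\hat m(\hat h_t) - m(h_t)) \frac{h_t^{1/2}}{\hat h_t} Z_t \Big| = O_P(\ln(T)\, T^{2/\gamma_3}) \Big[ \sum_{t=1}^T \frac{(\hat h_t - h_t)^2}{\hat h_t^2 \vee h_t^2} + \sum_{t=1}^T \frac{(\hat m(\hat h_t) - m(h_t))^2}{\hat h_t} \Big]^{1/2}. \tag{25} \]

Proof of Lemma 8. We will show that for $\rho > 0$ small enough

\[ \sup_{\|\phi-\phi_0\|\le\rho} \Big[ \sum_{t=1}^T \frac{(\hat h_t(\phi) - h_t)^2}{\hat h_t(\phi)^2} + \sum_{t=1}^T \frac{h_t (m_\theta(\hat h_t(\phi)) - m(h_t))^2}{\hat h_t(\phi)^2} \Big]^{-1/2} \Big| \sum_{t=1}^T \frac{\hat h_t(\phi) - h_t}{\hat h_t(\phi)}(Z_t^2 - 1) + \sum_{t=1}^T (m_\theta(\hat h_t(\phi)) - m(h_t)) \frac{h_t^{1/2}}{\hat h_t(\phi)} Z_t \Big| = O_P(\ln(T)). \tag{26} \]

Because of Lemma 1 and consistency of $\hat\phi$ this implies (24).

For the proof of (26) we will apply Lemma 7 with $e_t = e_t^* - E[e_t^*]$, $e_t^* = (Z_t^2 - 1)\, I[|Z_t| \le T^{1/\gamma_2}]$. We define $r_t(\phi) = r_t^*(\phi)\, I[|r_t^*(\phi)| \le c_r T^{1/\gamma_1}]$, $r_t^*(\phi) = \|\phi - \phi_0\|^{-1} (\hat h_t(\phi) - h_t)/h_t$ for $\phi$ in a $\rho$-neighborhood of $\phi_0$ with a constant $c_r > 0$. Note that with probability tending to one the truncation is not active, that is, $e_t^* = Z_t^2 - 1$ for $t = 1, \ldots, T$. Furthermore, by application of Lemma 3, we have that for all $\delta > 0$ the constant $c_r$ can be chosen such that with probability $\ge 1 - \delta$ it holds that $r_t(\phi) = r_t^*(\phi)$ for $t = 1, \ldots, T$ and for all $\phi$ in a $\rho$-neighborhood of $\phi_0$. We now show that for $\kappa > 0$, one can choose a constant $c^* > 0$ such that

\[ P\Big( \sum_{t=1}^T r_t(\phi) e_t > c^* \ln(T) \Big[ T + \sum_{t=1}^T r_t(\phi)^2 \Big]^{1/2} \Big) \le C\, T^{-\kappa} \tag{27} \]

for $T \ge T_0$ with constant $C$ not depending on $\kappa$ and $\phi$, and $T_0$ not depending on $\phi$. For a proof of (27), we use the inequality

\[ P\Big( \sum_{t=1}^T r_t(\phi) e_t > c^* \ln(T) \Big[ T + \sum_{t=1}^T r_t(\phi)^2 \Big]^{1/2} \Big) \le E \exp\Big( \sum_{t=1}^T \bar r_t(\phi) e_t \Big) \exp(-c^* \ln(T)) \]

with $\bar r_t(\phi) = \big[ T + \sum_{t=1}^T r_t(\phi)^2 \big]^{-1/2} r_t(\phi)$. Application of the last lemma with $r_t = \bar r_t(\phi)$, with $c = 2 c_r T^{-1/2} T^{1/\gamma_1}$ and with $e_t$ as defined above, gives (27). Note that $\max_{1\le t\le T} |r_t| \le c/2$ by definition and that $\sup_t E[\exp(c|e_t|) \mid \mathcal F_{t-1}] = \sup_t E[\exp(c|e_t|)] \le \exp(c\, T^{2/\gamma_2}) \le \exp(2 c_r T^{-1/2} T^{1/\gamma_1} T^{2/\gamma_2}) < \infty$, because of $(2/\gamma_1) + (4/\gamma_2) < 1$, by assumption.

In a second step, we apply Lemma 7 with $e_t = e_t^* - E[e_t^*]$, $e_t^* = Z_t\, I[|Z_t| \le T^{1/\gamma_2}]$ and $r_t = \bar r_t(\phi) = \big[ T + \sum_{t=1}^T r_t(\phi)^2 \big]^{-1/2} r_t(\phi)$, where now $r_t(\phi) = r_t^*(\phi)\, I[|r_t^*(\phi)| \le c_r T^{\gamma_1^{-1} + 2\gamma_3^{-1}}]$, $r_t^*(\phi) = \|\phi - \phi_0\|^{-1} (m_\theta(\hat h_t(\phi)) - m(h_t))\, h_t^{1/2}\, \hat h_t(\phi)^{-1}$ for $\phi$ in a $\rho$-neighborhood of $\phi_0$ with a constant $c_r > 0$. Note that uniformly for $1 \le t \le T$ we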
(h t ht )2
(m
(h t ) m(ht ))2
have that + ht 1,
t =1
h2t t =1 h 2t
ht
|rt ()| = 0 1 |m (h t ( )) m(ht )| in probability and we get by using the above arguments with (24)
h t ( )
instead of (25) that
h
1/2 1/2

t
= OP (1) 0 1
|ht ( ) ht |ht
+ 0 ht
h t ( )

(h t ht )2
(m
(h t ) m(ht ))2
+ = OP (ln(T )2 ).
= OP (1) 0 1 |h t ( ) ht |h h 2t h2t h t
t ht + OP (1)
1
t =1 t =1
1
+231
= OP (T 1 ), Because of Lemma 6, this proves the statement of the proposition.

by application of Lemma 3. Thus, we have again that for all > 0
the constant cr can be chosen such that with probability 1 We now come to the study of the asymptotic behavior of the
it holds that rt ( ) = rt ( ) for t = 1, . . . , T and for all in a derivatives for h t ( ) and h t ( ). By Lemma 3, we get the statement
-neighborhood of 0 . Application of Lemma 7 with rt = rt ( ), of the following lemma.
1 1
with c = 2cr T 1/2 T 1 +23 and with et as defined above, gives
(27). Note that again max1t T |rt | c /2 by definition and that Lemma 9. Make the assumptions of Proposition 1. For ht ( ) =
supt E [exp(c |et |)|Ft 1 ] = supt E [exp(c |et |)] exp(cT 1/2 )
1 1 1 h t ( ) or ht ( ) = h t ( ) it holds that
exp(2cr T 1/2 T 1 +23 +2 ) < , because of (2/1 ) + (2/2 ) +
ht (0 ) = OP (T 1/1 ).

(4/3 ) < 1, by assumption. sup h
t
1
1t T
We now note that it suffices to show (26) with the supremum
running only over a grid of polynomially many -values. This
follows by using rough estimates for neighbored values of . Thus, For the proof of Theorem 2 we will make use of the following
(26) follows from Eq. (27) with the two choices of et and rt ( ). At expansions for functions s( ):
this stage, also Lemma 6 is used. This concludes the proof of (24).
d
Claim (25) follows from (24) by using the bound sup1t T ht /h t g (y, s( )) = s g (y, s( ))s ( ) + g (y, s( ))
2 sup1t T ht = OP (T 2/3 ). Here, the last equality follows from d
1

Assumption 8, see Theorem 2.1 in Mikosch and Stric (2000) and
v 2
the arguments at the end of the proof of Lemma 3. = ( 2v m )s + , (28)

s
Proof of Proposition 1. From Lemmas 2 and 8, we get that 2v m

2
d

(h t ht )2
(m
(h t ) m(ht ))2 g (y, s( ))
+ d d
h 2t h2t h t
t =1 t =1
= ss g (y, s( ))s ( ) + 2 s g (y, s( ))s ( ) + g (y, s( ))

(h t ht ) 2
0

(Zt 1) + 2 (m (h t )
h t 4v m
= ( 2v m )s +

s

t =1 t =1
2
ht 4 m m

4v m

m(ht )) Zt + OP (ln T )
ht
0 0 0 0

1/2

(h t ht )2 0 0 0 2v m
.

+ 0 (29)

2/3
OP (ln(T )T ) 0 0 0
t =1 h 2t h2t
0 2v m
0 2 m
m 2v m

1/2
Here, we define v = y m (s) and we write s instead of s( ). These

(m (h t ) m(ht ))2
+ . equations will be used in the proof of the next lemma.
t =1 h t
This implies that Lemma 10. Make the assumptions of Proposition 1. For a constant
C > 0 it holds for ht ( ) = h t ( ) or ht ( ) = h t ( ) that

(h t ht )2
(m
(h t ) m(ht ))2
= OP (ln(T )2 T 4/3 ).
ht ( ) ht (0 ) = oP (1).

+ sup sup h
t
1
t =1

h 2
t h 2
t t =1

h t 0 C ln(T )T 1/2 1t T
Because of Lemma 6, this shows that
0 2 = OP (ln(T )2 T 1+(4/3 ) ). Proof. From (28) we get for ht ( ) = h t ( ) or ht ( ) = h t ( ) that

for 0
With Lemma 3, we get from this bound that
( ) ht +1 (0 )

h
t +1
h h
C 0 (1 + ht |Zt | + ht + ht (0 )|Zt |)

= OP (ln(T )T (1/2)+(1/1 )+(2/3 ) ) = oP (1).
t t
sup
1t T ht
+ |ht ( ) ht (0 )|(1 + |Zt | + ht (0 )|Zt | + h
t ht (0 ))
1
Thus, + ht ( ) ht (0 ) Vt (0 Zt2 + 0 )

(30)

(h t ht )2
(m
(h t ) m(ht ))2 for some positive constant C . We now divide the left hand side and
+
t =1 h 2t h2t t =1 h t the right hand side of (30) by ht +1 . Because of ht +1 ht (0 Zt2 +0 )
this gives with a new constant C > 0 Lemma 13. Make the assumptions of Proposition 1. It holds for t =
() ht +1 (0 ) h t (0 ) or t = ht that

ht (0 )
h
t +1
C 0 1 + |Zt | 1 1 1 1
ht +1 ht 21 +2
t t = OP (T
sup h 1
+ T 31 +23 ).
+ ht |ht ( ) ht (0 )|(1 + ht (0 ))
1 1 t T

h ( ) h (0 )

+ t t
Vt . (31) Eq. (29) can be used to show that for 0 C ln(T )T 1/2
ht
ht +1 ( ) h t +1 (0 )

Using the bounds of Lemmas 3 and 9 and sup1t T ht = OP (T 2/3 )
this gives for 0 C ln(T )T 1/2 that C 0 (1 + |Zt |ht + ht + h t (0 ) + h t (0 ))
() ht +1 (0 ) h ( ) h (0 )

h
t t + |h t ( ) h t (0 )|(1 + |Zt |)(1 + h t ht (0 ) + ht ht (0 ))
1 1
Vt ,
t +1
Rt + C (32)
ht +1 ht + h t ( ) h t (0 )(1 + Zt )

where

+ h t ( ) h t (0 ) Vt (0 Zt2 + 0 )

1 1 1 1
sup1t T |Rt | = OP (ln(T )[T 1 +2 (1/2) + T 21 +23 (1/2) ]).
Arguing as in the proof of Lemma 3 this gives that for some positive constant C . By application of the bounds in the
h () h (0 ) last lemmas this gives that

1 1
t t
= OP (ln(T )[T 1 +2 (1/2)
ht ht +1 ( ) h t +1 (0 )

ht ( ) h t (0 )

R Vt ,
1
+231 (1/2)
+ T 21 ])Wt . ht +1
t +C
ht
1
Because of sup1t T Wt = OP (T 1 ) this gives that where
1 1 1 1
h () h (0 )

t | = OP (ln T (T
sup1t T |R 31 +2 (1/2)
+ T 41 +23 (1/2) )).
1 1
+21 (1/2)
t t
= OP (ln(T )T 1 [T 1 Using the last inequality, we get the statement of the following
ht lemma because by assumption 811 + 421 < 1 and 1011 +
1
+231 (1/2) 431 < 1.
+ T 21 ]) = oP (1)
because of 41 + 22 1
< 1 and 611 + 431 < 1 by
1
Lemma 14. Make the assumptions of Proposition 1. For constants
Assumptions 7 and 8. This concludes the proof of the lemma.
C > 0 it holds that
By a small modification of the proof of the last lemma we get
t (0 ) ht ( ) ht (0 ) = oP (1).
sup h
that sup 1

0 C ln(T )T 1/2 1t T
Lemma 11. Make the assumptions of Proposition 1. It holds for
ht () = h t ( ) or ht ( ) = h t ( ) that We now show that the matrix 2 is non-singular.
h ( ) h (0 )
1

sup sup ht t t
0 C ln(T )T 1/2 1t T Lemma 15. Make the assumptions of Theorem 2. It holds that the
1 1 1 1 matrix 2 is non-singular.
0 OP (T 21 +2
+ T 31 +2 3
).
Proof. Suppose that 2 is singular. Then there exists a vector =
Proof. Note that (31) implies that (
, ) = 0 with

() ht +1 (0 ) h ( ) h (0 ) 2 = 0.

h
t t
t +1
0 Rt + C Vt ,
ht +1 ht Because 2 is defined as 2 = S1 + S2 where S1 and S2 are positive
1
1 +2 1 1
21 +23
1 semi-definite, we get that S1 = 0 and S2 = 0. This implies
where sup1t T |Rt | = OP (T
+T ).
that
The next lemma states that
ht = 0 a.s.
d
t +1 = g0 (Yt , h t (0 )) + 2 s g0 (Yt , h t (0 ))h t (0 )

m (ht ) + m0 (ht )ht = 0 a.s.

+ ss g0 (Yt , h t (0 ))d
t +1 ( ) (33) 0

has a unique stationary solution dt . We denote this solution by
Thus we have that m 0 ( ht ) =
m0 (ht ) = 0 a.s. Now,
ht = d
t . Note that this is a random value and not a random func- Assumption 9 implies that = 0.
tion. We now make use of the recurrence equation of ht . Denote the
vector that contains the first three elements of ht by h,t . Then we
Lemma 12. Make the assumptions of Proposition 1. Eq. (33) has a have
t that is ergodic. For > 1 small
unique stationary solution h t = d
enough it holds that 1
1/2
h,t +1 = ht Zt2 + (2 ht Zt m0 (ht ) + )h,t .
ht h t (0 ) 0, a.s. ht
Lemma 11 implies the following bound for the second deriva- Note that
h,t = ht = 0 a.s. and because of stationarity of ht

tives of h t ( ) and h t ( ): also

h,t +1 = ht +1 = 0 a.s.
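The recurrence for the derivative of the volatility with respect to the three GARCH parameters, which is used at this point of the proof, can be checked numerically. The following is only a sketch under assumptions that are not taken from the paper: a linear mean function m(h) = lam * h, Gaussian innovations, a fixed starting value h0 (so that the derivative starts at zero instead of at its stationary solution), and hypothetical parameter values and function names. It propagates the gradient along the recurrence and compares it with central finite differences.

```python
import math
import random


def simulate_garch_m(omega, alpha, beta, lam, T, seed=0):
    """Simulate Y_t = lam * h_t + sqrt(h_t) * Z_t with GARCH(1,1) volatility h_t."""
    rng = random.Random(seed)
    h = omega / (1.0 - alpha - beta)  # start at the unconditional mean of h_t
    ys = []
    for _ in range(T):
        z = rng.gauss(0.0, 1.0)
        ys.append(lam * h + math.sqrt(h) * z)
        h = omega + alpha * h * z * z + beta * h
    return ys


def filter_with_gradient(ys, omega, alpha, beta, lam, h0):
    """Return h_T and its gradient in (omega, alpha, beta) for fixed data ys."""
    h = h0
    dh = [0.0, 0.0, 0.0]  # h0 is held fixed, so its derivative is zero
    for y in ys:
        v = y - lam * h                       # residual Y_t - m(h_t)
        direct = [1.0, v * v, h]              # partial derivatives of the update in (omega, alpha, beta)
        chain = beta - 2.0 * alpha * v * lam  # derivative of the update in h_t, using m'(h) = lam
        dh = [d + chain * g for d, g in zip(direct, dh)]
        h = omega + alpha * v * v + beta * h
    return h, dh


def finite_difference(ys, params, lam, h0, eps=1e-6):
    """Central finite differences of h_T in (omega, alpha, beta)."""
    grads = []
    for i in range(3):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        h_up, _ = filter_with_gradient(ys, *up, lam, h0)
        h_down, _ = filter_with_gradient(ys, *down, lam, h0)
        grads.append((h_up - h_down) / (2.0 * eps))
    return grads


params = (0.05, 0.05, 0.90)  # hypothetical (omega, alpha, beta)
lam, h0 = 0.1, 1.0           # hypothetical slope of m and starting value
ys = simulate_garch_m(*params, lam, T=200)
h_T, grad = filter_with_gradient(ys, *params, lam, h0)
fd = finite_difference(ys, params, lam, h0)
print(grad)
print(fd)
```

At the true parameter the residual equals the square root of h_t times Z_t, so the factor `chain` reduces to the random coefficient that multiplies the lagged derivative in the recurrence above.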

Thus we have that a.s.

\[ 0 = \lambda_\eta^\top \dot h_{\eta,t+1} = \lambda_\eta^\top \Big[ \begin{pmatrix} 1 \\ h_t Z_t^2 \\ h_t \end{pmatrix} + \big( -2\alpha_0 h_t^{1/2} Z_t\, m_0'(h_t) + \beta_0 \big)\, \dot h_{\eta,t} \Big] = \lambda_\eta^\top \begin{pmatrix} 1 \\ h_t Z_t^2 \\ h_t \end{pmatrix}. \]

For the three components of $\lambda_\eta$ this implies, one after the other, that each of them is equal to zero. Here one uses that $Z_t$ is independent of $h_t$ and has a non-degenerate distribution, and that $h_t$ has a non-degenerate distribution.

In summary, we have that $\lambda = 0$. This contradicts our assumption that $\lambda \ne 0$ and it implies that $\Sigma_2$ is non-singular.

Arguments similar to the ones used in the second part of this proof have also been used in the proof of Theorem 2.2 in Francq and Zakoïan (2004). See the proof of statement (ii) in their proof.

By making use of the derived results, we now get the statement of Theorem 2.

Proof of Theorem 2. We make use of $0 = \hat L_T'(\hat\phi) = \hat L_T'(\phi_0) + \hat L_T''(\phi^*)(\hat\phi - \phi_0)$ for some random $\phi^*$ with $\|\phi^* - \phi_0\| = O_P(\ln(T)\, T^{-1/2})$. This gives:

\[ \sqrt{T}\,(\hat\phi - \phi_0) = -\big(T^{-1} \hat L_T''(\phi^*)\big)^{-1}\, T^{-1/2} \hat L_T'(\phi_0). \]

Using the above discussions, we get that

\[ T^{-1} \hat L_T''(\phi^*) \to -\Sigma_2, \quad \text{in probability,} \]

\[ T^{-1/2} \hat L_T'(\phi_0) = \frac{1}{\sqrt{T}} \sum_{t=1}^T \Big[ \frac{1}{2} \frac{\dot h_t}{h_t} (Z_t^2 - 1) + h_t^{-1/2} \big(\dot m_0(h_t) + m_0'(h_t)\dot h_t\big) Z_t \Big] + o_P(1). \]

The theorem follows by application of a martingale central limit theorem, see Hall and Heyde (1980).

References

Berkes, I., Horváth, L., Kokoszka, P., 2003. GARCH processes: structure and estimation. Bernoulli 9, 201–227.
Bollerslev, T., 1986. Generalized autoregressive conditional heteroskedasticity. J. Econometrics 31, 307–327.
Bougerol, P., 1993. Kalman filtering with random coefficients and contractions. SIAM J. Control Optim. 31, 942–959.
Carrasco, M., Chen, X., 2002. Mixing and moment properties of various GARCH and stochastic volatility models. Econometric Theory 18, 17–39.
Christensen, B.J., Dahl, C.M., Iglesias, E.M., 2012. Semiparametric inference in a GARCH-in-Mean model. J. Econometrics 167, 458–472.
Conrad, C., Karanasos, M., 2015. On the transmission of memory in GARCH-in-Mean models. J. Time Series Anal. 36, 706–720.
Conrad, C., Mammen, E., 2008. Nonparametric regression on a generated covariate with an application to semiparametric GARCH-in-Mean models. Department of Economics, Discussion Paper No. 473, University of Heidelberg.
Engle, R.F., Lilien, D.M., Robins, R.P., 1987. Estimating time varying risk premia in the term structure. J. Bus. Econom. Statist. 9, 345–359.
Fan, J., Qi, L., Xiu, D., 2014. Quasi-maximum likelihood estimation of GARCH models with heavy-tailed likelihoods. J. Bus. Econom. Statist. 32, 178–191.
Francq, C., Zakoïan, J.-M., 2004. Maximum likelihood estimation of pure GARCH and ARMA-GARCH processes. Bernoulli 10, 605–637.
French, K.R., Schwert, G.W., Stambaugh, R.F., 1987. Expected stock returns and volatility. J. Financ. Econ. 19, 3–29.
Goldie, C.M., 1991. Implicit renewal theory and tails of solutions of random equations. Ann. Appl. Probab. 1, 126–166.
Hall, P., Heyde, C.C., 1980. Martingale Limit Theory and its Application. Academic Press, New York.
Hall, P., Yao, Q., 2003. Inference in ARCH and GARCH models with heavy-tailed errors. Econometrica 71, 285–317.
Lee, S.-W., Hansen, B.E., 1994. Asymptotic theory for the GARCH(1, 1) quasi-maximum likelihood estimator. Econometric Theory 10, 29–52.
Linton, O., Perron, B., 2003. The shape of the risk premium: evidence from a semiparametric generalized autoregressive conditional heteroscedasticity model. J. Bus. Econom. Statist. 21, 354–367.
Lumsdaine, R.L., 1996. Consistency and asymptotic normality of the quasi-maximum likelihood estimator in IGARCH(1, 1) and covariance stationary GARCH(1, 1) models. Econometrica 64, 575–596.
Lundblad, C., 2007. The risk return tradeoff in the long run: 1836–2003. J. Financ. Econ. 85, 123–150.
Merton, R.C., 1973. An intertemporal capital asset pricing model. Econometrica 41, 867–886.
Mikosch, T., Stărică, R., 2000. Limit theory for the sample autocorrelations and extremes of a GARCH (1, 1) process. Ann. Statist. 28, 1427–1451.
Nelson, D.B., 1990. Stationarity and persistence in the GARCH (1, 1) model. Econometric Theory 6, 318–334.
Pagan, A.R., Hong, Y.S., 1990. Nonparametric estimation and the risk premium. In: Barnett, W.A., Powell, J., Tauchen, G.E. (Eds.), Nonparametric and Semiparametric Methods in Econometrics and Statistics: Proceedings of the Fifth International Symposium in Economic Theory and Econometrics, pp. 51–75.
Straumann, D., 2005. Estimation in conditionally heteroskedastic time series models. In: Lecture Notes in Statistics, vol. 181. Springer, Berlin.
Straumann, D., Mikosch, T., 2006. Quasi-maximum-likelihood estimation in conditionally heteroskedastic time series: a stochastic recurrence equations approach. Ann. Statist. 34, 2449–2495.
Van de Geer, S., 2000. Empirical Processes in M-estimation. Cambridge University Press, Cambridge.
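
As a numerical footnote to the exponential inequality of Lemma 7, the bound can be verified exactly in a toy case. The sketch below is an illustration only and rests on assumptions that are not in the paper: the martingale differences are fair coin flips with values +1 and -1, the horizon is T = 10, and the predictable weights follow a hypothetical rule that depends on the previous flip. Because all paths can be enumerated, both sides of the inequality are computed without simulation error.

```python
import itertools
import math

T = 10
a = 0.25               # bound on |r_t|, so c = 2 * a in the notation of Lemma 7
c = 2.0 * a
C = 2.0 * math.exp(c)  # equals E[2 e_t^2 exp(c |e_t|)] when e_t takes the values +1 and -1


def predictable_weights(path):
    """r_t depends only on e_1, ..., e_{t-1}: full size after +1, half size after -1."""
    weights = [a]
    for e_prev in path[:-1]:
        weights.append(a if e_prev > 0 else a / 2.0)
    return weights


lhs = 0.0        # accumulates exp(sum_t r_t e_t) over all paths
rhs_inner = 0.0  # accumulates exp(C * sum_t r_t^2) over all paths
for path in itertools.product((-1.0, 1.0), repeat=T):
    r = predictable_weights(path)
    lhs += math.exp(sum(rt * et for rt, et in zip(r, path)))
    rhs_inner += math.exp(C * sum(rt * rt for rt in r))

n_paths = 2 ** T
lhs /= n_paths                        # E exp(sum_t r_t e_t)
rhs = math.sqrt(rhs_inner / n_paths)  # (E exp(C sum_t r_t^2))^(1/2)
print(lhs, rhs)
```

The left-hand side stays below the right-hand side, as the lemma requires. The bound is not tight in this example, which is consistent with its use in the proof of Lemma 8, where only the order of magnitude matters.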
