Nonparametric Vector Autoregression

Journal of Statistical Planning and Inference 68 (1998) 221-245

W. Härdle a, A. Tsybakov b, L. Yang a,*
a Humboldt-Universität zu Berlin, Wirtschaftswissenschaftliche Fakultät, Institut für Statistik und
Ökonometrie, Spandauer Strasse 1, D-10178 Berlin, Germany
b Université Paris VI, Laboratoire de Statistique Théorique et Appliquée, 4, pl. Jussieu, Tour 45-55,
F-75252 Paris, France
Received 1 March 1996; received in revised form 17 December 1996; accepted 6 January 1997
Abstract
We consider a vector conditional heteroscedastic autoregressive nonlinear (CHARN) model in
which both the conditional mean and the conditional variance (volatility) matrix are unknown
functions of the past. Nonparametric estimators of these functions are constructed based on local
polynomial fitting. We examine the rates of convergence of these estimators and give a result on
their asymptotic normality. These results are applied to estimation of volatility matrices in foreign
exchange markets. Estimation of the conditional covariance surface for the Deutsche Mark/US
Dollar (DEM/USD) and Deutsche Mark/British Pound (DEM/GBP) daily returns shows negative
correlation when the two series have opposite lagged values and positive correlation elsewhere.
The relation of our findings to the capital asset pricing model is discussed. © 1998 Elsevier
Science B.V. All rights reserved.
1. Nonparametric vector autoregression

Multivariate time series occur in many scientific disciplines. Their analysis helps in
modeling dynamics over time as well as explaining interdependence among variables.
A common model in this context is vector autoregression where the dynamics over
time are modeled via a linear operation on the past values of the vector time series,
see Lütkepohl (1991). Typically, in these models the conditional covariance is assumed
to be either fixed or of a specific form. Since the beginning of the 1980s the drawback
of fixed linear structures has been stressed by Engle (1982), Robinson (1983, 1984)
and Teräsvirta (1994) in the econometric literature and by Collomb (1984), Tjøstheim
(1994), McKeague and Zhang (1994), and Vieu (1994) in the statistical literature.
Nonlinear time-series models that have been proposed include, e.g., the threshold autoregressive
(TAR) models of Tong (1978, 1983), the exponential autoregressive (EXPAR) models
of Haggan and Ozaki (1981), and the smooth-transition autoregressive (STAR) models of
Chan and Tong (1986) and Granger and Teräsvirta (1992).
* Corresponding author. E-mail: yang@wiwi.hu-berlin.de.
PII S0378-3758(97)00143-2
In the analysis of financial time series, e.g., exchange rates, models for conditional
heteroscedasticity are an important feature. Meese and Rose (1991) state that "it is
now recognized that empirical exchange rate models of the post-Bretton Woods era
are characterized by parameter instability and dismal forecast performance..." This
pessimism about the quality of exchange-rate models became generally accepted after
the publication of the influential papers by Meese and Rogoff (1983) and Diebold and
Nason (1990).
The nonparametric modeling of the mean function and the volatility matrix offers a
way out of this pessimism. It does not depend on specific structures of these quantities and may thus lead to valuable suggestions. In the framework of ARCH models (Engle, 1982), non- and semi-parametric approaches (Gregory, 1989; Engle and
Gonzalez-Rivera, 1991) have been proposed. Engle and Ng (1993) measured the impact of news on volatility and found asymmetric volatility functions. Gouriéroux and
Monfort (1992) model both the conditional mean and the conditional variance in a
flexible nonparametric way. Their model

$$Y_i = \sum_{j=1}^{J} \alpha_j I(X_i \in A_j) + \sum_{j=1}^{J} \beta_j I(X_i \in A_j)\,\xi_i, \quad i = 1, 2, \ldots, \tag{1.1}$$

$$X_i = (Y_{i-1}^T, Y_{i-2}^T, \ldots, Y_{i-m}^T)^T \in \mathbb{R}^{md}, \quad Y_i \in \mathbb{R}^d,$$

is called a qualitative threshold ARCH model. Here $\{A_j\}_{j=1}^{J}$ with fixed $J$ denotes a
partition of the set of lagged values for $Y$, $(\alpha_j)$ and $(\beta_j)$ are unknown parameter
vectors and matrices, respectively, and $\xi_i$ is white noise. It is a generalization of
the threshold model (Tong, 1983) for the conditional mean but shares with it the
drawback of a fixed number $J$ of threshold points.
A generalization of model (1.1) to a wider class of conditional mean and variance
functions can be seen as a limit of (1.1) for $J \to \infty$, thus allowing $J$ to be unknown:

$$Y_i = f(X_i) + \Sigma^{1/2}(X_i)\,\xi_i, \quad i = 1, 2, \ldots, \tag{1.2}$$

$$X_i = (Y_{i-1}^T, Y_{i-2}^T, \ldots, Y_{i-m}^T)^T \in \mathbb{R}^{md}, \quad Y_i \in \mathbb{R}^d.$$
We call (1.2) a conditional heteroskedastic autoregressive nonlinear (CHARN) model.

It is a generalization of an ARCH structure.
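To make the recursion in (1.2) concrete, the following is a minimal simulation sketch for the scalar case $d = 1$, $m = 1$. The functions `f` and `sigma` and the Gaussian noise are illustrative assumptions, not taken from the paper; choosing $f \equiv 0$ and $\sigma^2(x) = a_0 + a_1 x^2$ would give an ARCH(1) process as a special case.

```python
import math
import random


def simulate_charn(n, f, sigma, y0=0.0, seed=0):
    """Simulate Y_i = f(Y_{i-1}) + sigma(Y_{i-1}) * xi_i for d = 1, m = 1.

    The noise xi_i is drawn as standard normal (an assumption of this
    sketch; model (1.2) only needs mean 0 and variance 1).
    """
    rng = random.Random(seed)
    path = [y0]
    for _ in range(n):
        x = path[-1]                  # lagged value X_i = Y_{i-1}
        xi = rng.gauss(0.0, 1.0)      # innovation with mean 0 and variance 1
        path.append(f(x) + sigma(x) * xi)
    return path


# Hypothetical mean and volatility functions, chosen bounded so that the
# simulated path stays stable.
def f(x):
    return 0.3 * math.tanh(x)


def sigma(x):
    return math.sqrt(0.1 + 0.2 * x * x / (1.0 + x * x))


y = simulate_charn(1000, f, sigma)
```

The returned list contains the starting value followed by `n` simulated observations, so it can be fed directly to a lag-one estimator.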
The use of CHARN modeling is motivated by several examples. It has been found
that GARCH(1, 1) processes fit daily and weekly FX (foreign exchange) rates well in
most cases. The situation for intra-daily data is different though, see Guillaume et al.
(1994).
Drost and Nijman (1993) argued that the specific GARCH structure would not allow arbitrary combinations of conditional heteroskedasticity and leptokurtosis, for
example. Typically, for intra-daily data the deviation of the unconditional return density from normality increases when the sampling interval is decreased. The model (1.2)
will not suffer from these effects since it makes neither structural assumptions on $f$
and $\Sigma$ nor distributional assumptions on $\xi$. The situation for the CHARN model is
depicted in Figs. 1-3. All computations and graphics are done in XploRe, see Härdle
et al. (1995).

Fig. 1. The daily returns of the exchange rates of DEM/USD from 2 January 1980 to 30 October 1992. (Rescaled returns plotted against days.)

Fig. 2. The daily returns of the exchange rates of DEM/GBP from 2 January 1980 to 30 October 1992. (Rescaled returns plotted against days.)

Figs. 1 and 2 show the daily returns (differences of log spot rates) of $Y_{i1}$ = DEM/USD
(Deutsche Mark/US Dollar) and of $Y_{i2}$ = DEM/GBP (Deutsche Mark/British Pound)
for the period from 2 January 1980 to 30 October 1992, a total of 3212 observations;
both are rescaled so that the range always has length 1. Fig. 3 shows that the two
returns are highly correlated: the correlation equals 0.34, and the squared returns (i.e.
$Y_{i1}^2$ and $Y_{i2}^2$) also have a correlation of 0.17. Both are statistically significantly different
from zero for a sample size of 3212.

Fig. 3. The daily returns of the exchange rates of both DEM/USD and DEM/GBP from 2 January 1980 to 30 October 1992. (Rescaled scatterplot; horizontal axis: DEM/USD.)
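The significance claim can be checked with a back-of-the-envelope calculation. The sketch below uses the classical $t$-statistic for a sample correlation, which assumes independent observations; this is only a rough guide for dependent time series and is not the test used in the paper. The function name `corr_t_statistic` is hypothetical.

```python
import math


def corr_t_statistic(r, n):
    """Classical t-statistic for testing zero correlation from n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


n = 3212
t_returns = corr_t_statistic(0.34, n)   # correlation of the two return series
t_squared = corr_t_statistic(0.17, n)   # correlation of the squared returns
```

Both statistics lie far above the usual two-sided 5% critical value of about 1.96, which agrees with the statement in the text.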
Figs. 4 and 5 display the conditional covariance function as dependent on one lag.
Thus, in (1.2) we have $d = 2$, $m = 1$ and the task is to estimate

$$f(x) = (f_1, f_2)^T(x) \quad \text{and} \quad \Sigma(x) = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}(x).$$

The correlation is negative when the two returns have opposite lagged values,
which corresponds to the upper left and the lower right corners of the contour plot, or equivalently to
the region below the lowest contour level at about 15.76%, where the values are negative. The
correlation is positive everywhere else. Both the computation and the graphics are done
in XploRe, using the WARPing technique (Härdle et al., 1995). Subsequent work in
Section 4 is done in the same fashion and uses the same single bandwidth.
Härdle and Tsybakov (1996) proposed a general class of joint mean and volatility function estimators based on the local polynomial (LP) method in the case of the one-lag-dependence model (1.2) with one-dimensional $Y_i$. The LP estimator was chosen
in favor of the Nadaraya-Watson (NW) estimator, since the NW estimator does not
achieve good asymptotic convergence rates unless the marginal (stationary) density of
$X_i$ is sufficiently many times differentiable. Sufficient conditions for such a property to
hold in the model (1.2) are not known. The LP method avoids this difficulty, since it
needs only the continuity of the density of $X_i$. A more practical reason to use the LP
method is that it corresponds to a local least-squares problem, and for this problem
easy and efficient algorithms are available. Bossaerts et al. (1996) used this method to
study foreign exchange rates. For large dimension $d$ and many lags $m$, however, the
precision of the estimators of both $f$ and $\Sigma$ will decrease. A structured modelling based
on additive assumptions has therefore been proposed by Chen and Tsay (1993a, b).

Fig. 4. The conditional covariance, using bandwidth h = 0.0536531. (Panel title: DEM/USD, DEM/GBP Covariance; X: lagged DEM/USD returns; Y: lagged DEM/GBP returns; Z: conditional covariance.)

Fig. 5. The contours of the conditional covariance.

The idea of local polynomial estimation goes back to Stone (1977), Cleveland
(1979) and Katkovnik (1979, 1985). The statistical properties of LP estimators in
nonparametric regression (convergence, minimax rate of convergence and pointwise
asymptotic normality) were studied by Tsybakov (1986). The LP estimation method
was later discussed by several authors (see Fan and Gijbels, 1996, for references). For
the multidimensional case, we refer to the work of Ruppert and Wand (1994) who
studied the multivariate local linear regression estimation.
This paper is devoted to estimation of the $f(\cdot)$ and $\Sigma(\cdot)$ functions for the multivariate
CHARN model. We generalize to the vector case the result of Härdle and Tsybakov
(1996) on asymptotic normality of LP estimators. We restrict the study, however,
to the local linear case. This is motivated by the fact that higher-order polynomial
estimation in higher dimension is less attractive computationally, while the expressions
for asymptotic bias and variance are much more technical, and they do not seem to be
of practical use.
Inspection of the proofs in Section 5 shows that the result of the present paper also
holds (with obvious reformulation) for the multivariate nonparametric regression model
with heteroskedastic errors: $Y_i = f(X_i) + \Sigma^{1/2}(X_i)\xi_i$, where $\xi_i$ are as in (1.1), $(X_i, Y_i)$
are i.i.d., and the design points $\{X_i\}$ are independent of $\{\xi_i\}$.
We shall use the work on probabilistic properties of the process (1.2): Doukhan
and Ghindès (1980, 1981), Chan and Tong (1985), Mokkadem (1987), Diebolt and
Guégan (1990), Ango Nze (1992). In these papers the ergodicity, geometric ergodicity
and mixing properties of the process {Yi} are derived under appropriate conditions.
The paper is organized as follows. In Section 2, we present the estimator and in
Section 3 we study the asymptotic properties of this LP technique. In Section 4 we
give an application based on the two-dimensional data of DEM/USD and DEM/GBP
returns. In Section 5, proofs of theorems are given.
2. The estimators
The model we consider is

$$Y_i = f(X_i) + \Sigma^{1/2}(X_i)\,\xi_i, \quad i = m, m+1, \ldots, \tag{2.1}$$

where $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{id})^T \in \mathbb{R}^d$, $\xi_i = (\xi_{i1}, \xi_{i2}, \ldots, \xi_{id})^T \in \mathbb{R}^d$, $i = m, m+1, \ldots, n$, and
$X_i = (Y_{i-1}^T, Y_{i-2}^T, \ldots, Y_{i-m}^T)^T \in \mathbb{R}^{md}$ are random vectors; the $\xi_i$ are i.i.d. with
$E(\xi_{1j}) = 0$ and $E(\xi_{1j}^2) = 1$ for any $1 \le j \le d$. The mean vector function $f : \mathbb{R}^{md} \to \mathbb{R}^d$ and the
volatility matrix function $\Sigma : \mathbb{R}^{md} \to \mathbb{R}^d \times \mathbb{R}^d$ are unknown, $\Sigma(x)$ is symmetric and positive definite for any $x \in \mathbb{R}^{md}$, and the initial value $X_m = (Y_{m-1}^T, Y_{m-2}^T, \ldots, Y_0^T)^T$ is a
random vector independent of $\{\xi_i\}$. We study the problem of estimating the
conditional volatility matrix function $\Sigma(x)$ and the conditional mean vector function
$f(x)$, given a time series $Y_0, \ldots, Y_n$.
The technique we employ here is typical in multivariate problems. Instead of $\Sigma$ and $f$,
we can equivalently estimate the following functions:
The mean function of $v^T Y$, which is $f(x; v) = v^T f(x)$, where $v \in \mathbb{R}^d$ has unit length
and $x \in \mathbb{R}^{md}$;
The covariance function of $v^T Y$ and $s^T Y$, which is $v^T \Sigma(x) s$, where $v, s \in \mathbb{R}^d$ both
have unit length and $x \in \mathbb{R}^{md}$.
For the moment we are implicitly assuming stationarity of $\{Y_i\}$. In fact, only an
approximation is true: $\{X_i\}$ approaches a stationary process for $i \to \infty$, as we shall see
later in Lemma 3.1. The LP method solves the following minimization problems:

$$\begin{aligned}
c_n(x; v, s) &= \arg\min_{c \in \mathbb{R}^{md+1}} \sum_{i=m}^{n} (v^T Y_i Y_i^T s - c^T U_{in})^2 K_h(X_i - x), \\
c_n(x; v) &= \arg\min_{c \in \mathbb{R}^{md+1}} \sum_{i=m}^{n} (v^T Y_i - c^T U_{in})^2 K_h(X_i - x),
\end{aligned} \tag{2.2}$$

where $K : \mathbb{R}^{md} \to \mathbb{R}^1$ is a kernel, $K_h(u) = h^{-md} K(u/h)$, $h = h_n$ is a positive number
(bandwidth), $h_n \to 0$ as $n \to \infty$, and

$$U_{in} = F(u_{in}), \quad u_{in} = \frac{X_i - x}{h}, \tag{2.3}$$

where $F(u) = (1, u^T)^T \in \mathbb{R}^{md+1}$ for $u \in \mathbb{R}^{md}$. The estimator of $f(x; v)$ is defined as

$$\hat f(x; v) = c_n(x; v)^T F(0).$$

The estimator of the function $\sigma(x; v, s) = v^T \Sigma(x) s$ is defined as

$$\hat\sigma(x; v, s) = c_n(x; v, s)^T F(0) - \{c_n(x; v)^T F(0)\}\{c_n(x; s)^T F(0)\}. \tag{2.4}$$
We have dropped reference to the sample size $n$ in $\hat f(x; v)$ and $\hat\sigma(x; v, s)$ for notational simplicity; we will keep this convention in similar situations hereafter. Another
simplification of notation is the use of one single bandwidth in all coordinates of $X$.
The asymptotic results in the next section are easily extendable to the case of a different
bandwidth in each direction, e.g., in a product kernel

$$K_h(u) = \prod_{j=1}^{md} \frac{1}{h_j} K\!\left(\frac{u_j}{h_j}\right),$$

where $h = (h_1, \ldots, h_{md}) \in \mathbb{R}_+^{md}$, see Wand and Jones (1995).
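The estimators of this section can be sketched in a few lines for the scalar case $d = 1$, $m = 1$ with $v = s = 1$. The code below is a minimal illustration under stated assumptions, not the XploRe implementation used in the paper: the Epanechnikov kernel and the function names (`epanechnikov`, `local_linear_intercept`, `charn_estimates`) are choices made here. The intercept of the kernel-weighted least-squares line plays the role of $c_n^T F(0)$.

```python
def epanechnikov(u):
    """Compactly supported, bounded, nonnegative kernel integrating to 1."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def local_linear_intercept(xs, zs, x, h):
    """Intercept c0 of the line minimizing
    sum_i (z_i - c0 - c1 * u_i)^2 * K_h(X_i - x), with u_i = (X_i - x) / h.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, zi in zip(xs, zs):
        u = (xi - x) / h
        w = epanechnikov(u) / h       # K_h(X_i - x) for md = 1
        s0 += w
        s1 += w * u
        s2 += w * u * u
        t0 += w * zi
        t1 += w * u * zi
    det = s0 * s2 - s1 * s1           # determinant of the 2x2 normal equations
    if det <= 0.0:
        raise ValueError("too few observations inside the kernel window")
    return (s2 * t0 - s1 * t1) / det


def charn_estimates(y, x, h):
    """Return (f_hat(x), sigma_hat(x)) from a scalar series y with one lag.

    sigma_hat is the local linear fit of Y_i^2 minus f_hat(x)^2, as in (2.4).
    """
    xs, ys = y[:-1], y[1:]            # pairs (X_i, Y_i) = (Y_{i-1}, Y_i)
    f_hat = local_linear_intercept(xs, ys, x, h)
    second_moment = local_linear_intercept(xs, [v * v for v in ys], x, h)
    return f_hat, second_moment - f_hat * f_hat
```

A local linear fit reproduces exactly linear data without bias, which gives a simple sanity check for the first function before it is applied to a simulated or observed series.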
3. The asymptotic results

Let $|\cdot|$ denote the $L_1$-norm when it is applied to a vector, and the usual matrix norm

$$|A| = \sup_{|x| = 1} |Ax|$$

when it is applied to a matrix $A$. Assume the following:
(A1) The error variables $\xi_{1j}$, $1 \le j \le d$, are i.i.d. The density $p(\cdot)$ of $\xi_1$ exists and
satisfies
$$\inf_{x \in \mathcal{K}} p(x) > 0$$
for any compact $\mathcal{K} \subset \mathbb{R}^d$. Also $E(\xi_{1j}) = E(\xi_{1j}^3) = 0$, $E(\xi_{1j}^2) = 1$, and $E(\xi_{1j}^4) = 1 + m_4 < \infty$.
(A2) There exist constants $C_1 \ge 0$, $C_2 \ge 0$, $r > 0$ such that for $|x| \ge r$
$$|f(x)| \le C_1(1 + |x|), \tag{3.1}$$
$$|\Sigma^{1/2}(x)| \le C_2(1 + |x|). \tag{3.2}$$
(A3) The matrix function $\Sigma(x)$ is symmetric for any $x \in \mathbb{R}^{md}$, and satisfies
$$\inf_{x \in \mathcal{K}} \lambda_{\min}\{\Sigma(x)\} > \lambda_{\mathcal{K}} > 0$$
for any compact $\mathcal{K} \subset \mathbb{R}^{md}$, where $\lambda_{\min}(\Sigma)$ denotes the minimal eigenvalue of a
real symmetric matrix $\Sigma$.
(A4) $C_1 + C_2 E|\xi_1| < 1/m$.
Assumption (A1) is needed for identifiability of the estimation procedure.
Assumptions (A1) and (A3) guarantee that the process $\{X_i\}$ does not die out whereas
(A2) and (A4) are conditions for $\{X_i\}$ not to explode. The following lemma given by
Ango Nze (1992) guarantees ergodicity of the process $\{X_i\}$. It is based on the application of the results of Nummelin and Tuominen (1982) and Tweedie (1975). Note
that (A4) becomes redundant when both $f(x)$ and $\Sigma(x)$ are bounded, in which case
$C_1 = C_2 = 0$.
Lemma 3.1. Under the conditions (A1)-(A4) the Markov chain $\{X_i\}$ is geometrically
ergodic, i.e. it is ergodic, with stationary probability measure $\pi(\cdot)$ such that, for
almost every $x$, as $k \to \infty$,

$$\|P^k(\cdot \mid x) - \pi(\cdot)\|_{TV} = O(\rho^k)$$

for some $0 \le \rho < 1$. Here

$$P^k(B \mid x) = P\{X_k \in B \mid X_m = x\}$$

for a Borel subset $B \subset \mathbb{R}^{md}$, and $\|\cdot\|_{TV}$ is the total variation distance.
Now we state the conditions necessary to derive joint asymptotic normality of $\hat f(x; v)$
and $\hat\sigma(x; v, s)$ at a fixed point $x \in \mathbb{R}^{md}$.
(A5) The functions $f$ and $\Sigma$ are componentwise twice continuously differentiable at
the point $x \in \mathbb{R}^{md}$.
(A6) The density $\mu(\cdot)$ of the stationary distribution $\pi(\cdot)$ exists, is bounded, continuous
and strictly positive in a neighborhood of the point $x$.
(A7) The kernel $K$ is a compactly supported bounded nonnegative function on $\mathbb{R}^{md}$,
such that
$$\int K(u)\,du = 1, \quad \int uK(u)\,du = 0, \quad \int uu^T K(u)\,du = \sigma_K^2 I_{md},$$
where $\sigma_K^2 > 0$, and $I_{md}$ denotes the identity matrix of dimension $md$.
(A8) $h_n = \beta n^{-1/(4+md)}$, where $\beta > 0$.
(A9) The initial value $X_m$ is a fixed vector in $\mathbb{R}^{md}$.
Condition (A5) is a smoothness condition for the functions $f$ and $\Sigma$. Note that it
is related to (A8), which specifies the optimal rate of the bandwidth. Condition (A8) guarantees a balance between bias and variance. A faster decay of $h$ would lead to undersmoothing, so that the standard deviation of the estimator dominates its bias, while
a slower decay would lead to oversmoothing, so that the bias dominates the standard deviation.
Both situations are undesirable since they result in less precise estimation. Condition (A6) is necessary to compute the asymptotic bias and variance, and (A7) is a
typical assumption for kernels. Assumption (A9) supposes that the CHARN model is
started at some fixed vector.
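As a small numerical illustration of (A8), the sketch below evaluates the bandwidth $h_n = \beta n^{-1/(4+md)}$ and the exponent $2/(4+md)$ of the normalizing rate that appears in the theorems. The function names and the default $\beta = 1$ are assumptions made here for illustration only; the paper does not prescribe a value of $\beta$.

```python
def bandwidth(n, m, d, beta=1.0):
    """Bandwidth h_n = beta * n^(-1 / (4 + m*d)) from condition (A8)."""
    return beta * n ** (-1.0 / (4 + m * d))


def rate_exponent(m, d):
    """Exponent a such that n^a * (estimator - target) has a normal limit."""
    return 2.0 / (4 + m * d)


# Setting of the introductory example: 3212 observations, d = 2, m = 1.
h_example = bandwidth(3212, m=1, d=2)
a_example = rate_exponent(m=1, d=2)
```

The exponent shrinks as $md$ grows, which is the usual curse of dimensionality: with $md = 2$ it equals $1/3$, while with $md = 6$ it is only $1/5$.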
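Let $f_j(x)$ and $\sigma_{jk}(x)$, $j, k = 1, 2, \ldots, d$, be the components of the vector function $f(x)$
and the matrix function $\Sigma(x)$, respectively. Denote $\|K\|_2^2 = \int K^2(u)\,du$. Asymptotic
normality results are presented in the following theorems.
Theorem 1. Under the assumptions (A1)-(A9)

$$n^{2/(4+md)}\{\hat f(x; v) - v^T f(x)\} \xrightarrow{D} \mathcal{N}\{b(x; v), V(x; v)\} \tag{3.3}$$

as $n \to \infty$ with

$$b(x; v) = \beta^2 \frac{\sigma_K^2}{2} \operatorname{Tr}[\nabla^2(v^T f(x))]$$

and

$$V(x; v) = \beta^{-md}\, \frac{v^T \Sigma(x) v}{\mu(x)}\, \|K\|_2^2.$$

In particular, if one lets $v$ be the $j$th or the $k$th coordinate vector of $\mathbb{R}^d$, one gets the
following joint asymptotic distribution:

$$n^{2/(4+md)} \begin{pmatrix} \hat f_j(x) - f_j(x) \\ \hat f_k(x) - f_k(x) \end{pmatrix} \xrightarrow{D} \mathcal{N}\left\{ \begin{pmatrix} b_j(x) \\ b_k(x) \end{pmatrix}, \begin{pmatrix} V_j(x) & c_{jk}(x) \\ c_{jk}(x) & V_k(x) \end{pmatrix} \right\} \tag{3.4}$$

as $n \to \infty$ with

$$b_j(x) = \beta^2 \frac{\sigma_K^2}{2} [\operatorname{Tr}(\nabla^2 f_j(x))]$$

and

$$V_j(x) = \beta^{-md}\, \frac{\sigma_{jj}(x)}{\mu(x)}\, \|K\|_2^2, \qquad c_{jk}(x) = \beta^{-md}\, \frac{\sigma_{jk}(x)}{\mu(x)}\, \|K\|_2^2.$$

Denote

$$\operatorname{diag}(a) = \begin{pmatrix} a_1 & 0 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_d \end{pmatrix}$$

for any vector $a = (a_1, \ldots, a_d)^T \in \mathbb{R}^d$.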
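Theorem 2. Under the assumptions (A1)-(A9)

$$n^{2/(4+md)}\{\hat\sigma(x; v, s) - v^T \Sigma(x) s\} \xrightarrow{D} \mathcal{N}\{b(x; v, s), V(x; v, s)\} \tag{3.5}$$

as $n \to \infty$ with

$$b(x; v, s) = \beta^2 \frac{\sigma_K^2}{2}\left[\operatorname{Tr}\{\nabla^2 g(x)\} - \{s^T f(x)\}\operatorname{Tr}\{\nabla^2 f^T(x) v\}\right] - \beta^2 \frac{\sigma_K^2}{2}\left[\{v^T f(x)\}\operatorname{Tr}\{\nabla^2 f^T(x) s\}\right]$$

and

$$V(x; v, s) = \beta^{-md}\, \frac{\|K\|_2^2}{\mu(x)} \left[(m_4 - 2) T^*(x) + \{v^T \Sigma(x) s\}^2\right] + \beta^{-md}\, \frac{\|K\|_2^2}{\mu(x)}\, \{v^T \Sigma(x) v\}\{s^T \Sigma(x) s\},$$

where

$$g(x) = g(x; v, s) = \{v^T f(x)\}\{s^T f(x)\} + \{v^T \Sigma(x) s\},$$

$$T^*(x) = T^*(x; v, s) = \operatorname{Tr}[\operatorname{diag}^2\{v^T \Sigma^{1/2}(x)\}\operatorname{diag}^2\{\Sigma^{1/2}(x) s\}].$$

The covariance of $\hat\sigma(x; v, s)$ and $\hat\sigma(x; v', s')$ is

$$\beta^{-md}\, \frac{\|K\|_2^2}{\mu(x)}\, (m_4 - 2) \operatorname{Tr}[\operatorname{diag}\{v^T \Sigma^{1/2}(x)\}\operatorname{diag}\{\Sigma^{1/2}(x) s\}\operatorname{diag}\{v'^T \Sigma^{1/2}(x)\}\operatorname{diag}\{\Sigma^{1/2}(x) s'\}]$$
$$+\; \beta^{-md}\, \frac{\|K\|_2^2}{\mu(x)} \left[\{v^T \Sigma(x) v'\}\{s^T \Sigma(x) s'\} + \{v^T \Sigma(x) s'\}\{s^T \Sigma(x) v'\}\right].$$

In particular, if one lets $v$ and $s$ be the $j$th and $k$th coordinate vectors of $\mathbb{R}^d$ or the
$j'$th and $k'$th coordinate vectors, one gets

$$n^{2/(4+md)} \begin{pmatrix} \hat\sigma_{jk}(x) - \sigma_{jk}(x) \\ \hat\sigma_{j'k'}(x) - \sigma_{j'k'}(x) \end{pmatrix} \xrightarrow{D} \mathcal{N}\left\{ \begin{pmatrix} b_{jk}(x) \\ b_{j'k'}(x) \end{pmatrix}, \begin{pmatrix} V_{jk}(x) & c_{jk,j'k'}(x) \\ c_{jk,j'k'}(x) & V_{j'k'}(x) \end{pmatrix} \right\} \tag{3.6}$$

as $n \to \infty$ with

$$b_{jk}(x) = \beta^2 \frac{\sigma_K^2}{2} \left[\operatorname{Tr}\{\nabla^2 \sigma_{jk}(x)\} + 2 \nabla^T f_j(x) \nabla f_k(x)\right],$$

$$V_{jk}(x) = c_{jk,jk}(x),$$

where

$$c_{jk,j'k'}(x) = \beta^{-md}\, \frac{\|K\|_2^2}{\mu(x)} \left\{(m_4 - 2) T^*_{jk,j'k'}(x) + \sigma_{jj'}(x)\sigma_{kk'}(x) + \sigma_{jk'}(x)\sigma_{kj'}(x)\right\}$$

and

$$T^*_{jk,j'k'}(x) = \sum_{l=1}^{d} s_{jl}(x)\, s_{j'l}(x)\, s_{kl}(x)\, s_{k'l}(x),$$

in which $s_{jl}(x)$ denotes the $(j, l)$th entry of the matrix $\Sigma^{1/2}(x)$.
Finally, as $n \to \infty$

$$n^{2/(4+md)} \begin{pmatrix} \hat\sigma_{jk}(x) - \sigma_{jk}(x) \\ \hat f_{j'}(x) - f_{j'}(x) \end{pmatrix} \xrightarrow{D} \mathcal{N}\left\{ \begin{pmatrix} b_{jk}(x) \\ b_{j'}(x) \end{pmatrix}, \begin{pmatrix} V_{jk}(x) & 0 \\ 0 & V_{j'}(x) \end{pmatrix} \right\}. \tag{3.7}$$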
The practical use of these results lies in the possibility to check the form of the mean
and volatility functions. For instance, at each point $x$ we can construct a confidence
interval for $\sigma_{jk}(x)$ based on plug-in estimates for $b_{jk}(x)$ and $V_{jk}(x)$. The bias conceivably
can be estimated from a local cubic estimate. The variance can be estimated by first
calculating the squared stochastic innovation term $\hat\xi_{ij}^2 = \{Y_{ij} - \hat f_j(X_i)\}^2 / \hat\sigma_{jj}(X_i)$
and then setting
$\hat m_4 = d^{-1} \sum_{j=1}^{d} n^{-1} \sum_{i=m}^{n} (\hat\xi_{ij}^2 - 1)^2$. The marginal density $\mu$ can be estimated as usual by
a kernel estimator. Since the bias formula is slightly more involved than the variance
formula, some undersmoothing might be recommended.
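The confidence interval mentioned above can be sketched as follows. The code assumes that plug-in values for the asymptotic bias and variance are already available; how to obtain them is not covered here. The function name `pointwise_ci` and the default normal quantile `z = 1.96` (a nominal 95% level) are choices made for this illustration. Under undersmoothing one might simply pass `bias=0.0`.

```python
import math


def pointwise_ci(estimate, bias, variance, n, m, d, z=1.96):
    """Approximate confidence interval from the normal limit in (3.5)/(3.6).

    Uses n^(2/(4+md)) * (estimate - target) ~ N(bias, variance), so that
    target is roughly estimate - rate * bias +/- z * rate * sqrt(variance).
    """
    rate = n ** (-2.0 / (4 + m * d))
    center = estimate - rate * bias       # bias-corrected point estimate
    half_width = z * rate * math.sqrt(variance)
    return center - half_width, center + half_width


# Hypothetical plug-in values, for illustration only.
lower, upper = pointwise_ci(estimate=0.5, bias=0.0, variance=4.0,
                            n=10000, m=1, d=2)
```

The interval is pointwise in $x$; it does not provide simultaneous coverage over a range of lagged values.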
4. Application
The importance of the CHARN model for financial data has been pointed out in the
introduction. In this section we come back to the introductory example of DEM/USD
and DEM/GBP exchange rates. Figs. 6 and 7 show the estimated conditional mean
functions $\hat f_1(x)$ and $\hat f_2(x)$ as functions of the lagged values $x_i = (y_{1,i-1}, y_{2,i-1})^T$. The
surface and the contour plots all show that the mean functions are rather flat and are
around zero. In fact, 80% of the $\hat f_1(x)$ values are in an interval around 0 whose length
is only 0.11 times the range of $y_{1,i}$, while 80% of the $\hat f_2(x)$ values are in an interval
around 0 whose length is only 0.1557 times the range of $y_{2,i}$. The pattern of the
conditional covariance function $\hat\sigma_{12}(x)$ is different, though: it changes from negative to
positive, as shown in Figs. 4 and 5.
Fig. 6. The conditional mean function of the DEM/USD daily returns. (X: lagged DEM/USD returns; Y: lagged DEM/GBP returns; Z: conditional mean.)

Fig. 7. The conditional mean function of the DEM/GBP daily returns. (X: lagged DEM/USD returns; Y: lagged DEM/GBP returns; Z: conditional mean.)

Fig. 8. The conditional variance function of the DEM/USD daily returns. (X: lagged DEM/USD returns; Y: lagged DEM/GBP returns; Z: conditional variance.)

Bollerslev et al. (1988, 1992) studied the capital asset pricing model (CAPM) by
means of the multivariate GARCH model. To illustrate the connection between our
vector CHARN model and their model, consider a random vector $Y_t$ of excess asset
returns with $E(Y_t \mid \mathcal{F}_{t-1}) \equiv \mu_t$ and $\operatorname{Var}(Y_t \mid \mathcal{F}_{t-1}) \equiv \Sigma_t$, where $\mathcal{F}_{t-1}$ is the information
set generated by $Y_{t-i}$, $i = 1, 2, \ldots$. If, for a nonnegative weight vector $w_t$ whose elements
add to 1, $w_t^T Y_t$ is a mean-variance efficient portfolio, then the CAPM is

$$Y_t = \beta_t \mu_t^{w} + \varepsilon_t,$$

where

$$\beta_t \equiv \Sigma_t w_t / w_t^T \Sigma_t w_t,$$

with $E(\varepsilon_t \mid \mathcal{F}_{t-1}) \equiv 0$, $\operatorname{Var}(\varepsilon_t \mid \mathcal{F}_{t-1}) \equiv \Gamma_t$, and $\mu_t^{w} \equiv w_t^T \mu_t$. This is more general than the
ordinary CAPM, which restricts $\Sigma_t$ to be constant. While our CHARN model would
stipulate that $\Sigma_t$ depends nonparametrically on a finite number of past observations,
Bollerslev et al. (1988) used the multivariate GARCH model, which allows $\Sigma_t$ to depend
on an infinite number of past values, but only parametrically. A special form of the
multivariate GARCH model is

$$\Sigma_t = \alpha Y_{t-1} Y_{t-1}^T$$

for some constant $\alpha > 0$, in which case

$$\sigma_{12}(Y_{t-1}) = \alpha\, Y_{t-1,1}\, Y_{t-1,2}.$$

This is a hyperbolic function which exhibits the pattern visible in Figs. 4 and 5. For
such a case, our CHARN model and the multivariate GARCH model would yield
similar results.
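The sign pattern implied by the special form $\Sigma_t = \alpha Y_{t-1} Y_{t-1}^T$ can be seen directly in a short sketch. The function name `garch_special_cov` and the numerical inputs are hypothetical and serve only to illustrate the formula.

```python
def garch_special_cov(y_lag, alpha):
    """Return the matrix alpha * y_lag * y_lag^T as a nested list."""
    d = len(y_lag)
    return [[alpha * y_lag[j] * y_lag[k] for k in range(d)] for j in range(d)]


# Opposite lagged returns give a negative conditional covariance ...
cov_opposite = garch_special_cov([0.1, -0.2], alpha=0.5)
# ... while lagged returns of the same sign give a positive one.
cov_same = garch_special_cov([0.1, 0.2], alpha=0.5)
```

The off-diagonal entry is negative in the first case and positive in the second, which matches the negative regions in the upper left and lower right corners of the contour plot.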
Figs. 8 and 9 show the estimated conditional variance functions $\hat\sigma_{11}(x)$ and $\hat\sigma_{22}(x)$
as functions of the lagged values $x_i = (y_{1,i-1}, y_{2,i-1})^T$. One can see that the variance
function for the DEM/USD returns has a parabolic shape while that for DEM/GBP is
roughly flat and positive.
5. Proofs
The proofs of Theorems 1 and 2 proceed in the following steps. First the normal equations of the LS problems (2.2) for the mean and second-moment functions are solved. All estimators are split into a stochastic part and a systematic bias
part. Lemma 3.1 is essential in controlling the stochastic part. Lemma 5.1 guarantees the strong mixing property of the recursive scheme (1.2). In combination with
Lemmas 5.2-5.5 we then prove the joint asymptotic normality of the mean estimation
as stated in Theorem 1 and that of volatility as stated in Theorem 2.

Fig. 9. The conditional variance function of the DEM/GBP daily returns. (X: lagged DEM/USD returns; Y: lagged DEM/GBP returns; Z: conditional variance.)

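Set the matrices $W = \operatorname{diag}\{n^{-1} K_h(X_i - x)\}_{i=m}^{n}$ and

$$Z = \begin{pmatrix} 1 & \cdots & 1 \\ \dfrac{X_m - x}{h} & \cdots & \dfrac{X_n - x}{h} \end{pmatrix}.$$

Define

$$v^T Y = \begin{pmatrix} v^T Y_m \\ \vdots \\ v^T Y_n \end{pmatrix} = \begin{pmatrix} v^T f(X_m) + v^T \Sigma^{1/2}(X_m)\xi_m \\ \vdots \\ v^T f(X_n) + v^T \Sigma^{1/2}(X_n)\xi_n \end{pmatrix}$$

and also

$$v^T Y Y^T s = \begin{pmatrix} v^T Y_m Y_m^T s \\ \vdots \\ v^T Y_n Y_n^T s \end{pmatrix} = \begin{pmatrix} (v^T f(X_m) + v^T \Sigma^{1/2}(X_m)\xi_m)(s^T f(X_m) + s^T \Sigma^{1/2}(X_m)\xi_m) \\ \vdots \\ (v^T f(X_n) + v^T \Sigma^{1/2}(X_n)\xi_n)(s^T f(X_n) + s^T \Sigma^{1/2}(X_n)\xi_n) \end{pmatrix}.$$

Then

$$\hat f(x; v) = F(0)^T (ZWZ^T)^{-1} ZW[v^T Y] \tag{5.1}$$

and

$$\hat\sigma(x; v, s) = F(0)^T (ZWZ^T)^{-1} ZW[v^T Y Y^T s] - \hat f(x; v)\hat f(x; s) \tag{5.2}$$

by direct calculations.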
First, to have the limit of $(ZWZ^T)^{-1}$, we need an auxiliary result based on Lemma 3.1.
Lemma 5.1 (Davydov, 1973). A geometrically ergodic Markov chain whose initial
variable is distributed with its stationary distribution is geometrically strongly mixing
with the mixing coefficients satisfying $\alpha(n) \le c_0 \rho_0^n$ for some $0 < \rho_0 < 1$, $c_0 > 0$.
Having Lemma 5.1, the next lemma follows:
Lemma 5.2. Under the conditions of Theorem 1 we have

$$\begin{aligned}
n^{-4/(4+md)} \sum_{i=m}^{n} \varphi_1(X_i)\varphi_2(u_{in})K(u_{in}) &\xrightarrow{P} \beta^{md}\mu(x)\,\varphi_1(x)\int \varphi_2(u)K(u)\,du, \\
n^{-4/(4+md)} \sum_{i=m}^{n} E\{\varphi_1(X_i)\varphi_2(u_{in})K(u_{in})\} &\to \beta^{md}\mu(x)\,\varphi_1(x)\int \varphi_2(u)K(u)\,du
\end{aligned} \tag{5.3}$$

as $n \to \infty$, provided $\varphi_1(\cdot)$ is a bounded continuous function in a neighborhood of $x$
and $\varphi_2(\cdot)$ is a bounded measurable function.
Proof. See Härdle and Tsybakov (1996, Lemma 4.3). □
Lemma 5.3. As $n \to \infty$,

$$(ZWZ^T)^{-1} = \frac{1}{\mu(x)} \begin{pmatrix} 1 & 0_{1 \times md} \\ 0_{md \times 1} & \sigma_K^{-2} I_{md} \end{pmatrix} \{1 + o_p(1)\} \tag{5.4}$$

uniformly in a compact neighborhood of $x$.

Proof. The elements of $ZWZ^T$ are all in the form of the left-hand side of (5.3). Using
assumption (A7) and then taking matrix inverse, one gets (5.4). □
Now notice that, in view of Lemma 5.3

$$\begin{aligned}
\hat f(x; v) - v^T f(x) &= F(0)^T (ZWZ^T)^{-1} ZW[v^T Y] - v^T f(x) \\
&= F(0)^T (ZWZ^T)^{-1} ZW[v^T Y] - F(0)^T (ZWZ^T)^{-1} (ZWZ^T) \begin{pmatrix} v^T f(x) \\ h\nabla(v^T f(x)) \end{pmatrix} \\
&= F(0)^T (ZWZ^T)^{-1} ZW \left[ v^T Y - Z^T \begin{pmatrix} v^T f(x) \\ h\nabla(v^T f(x)) \end{pmatrix} \right] \\
&= \frac{1}{\mu(x) n}\{1 + o_p(1)\} \sum_{i=m}^{n} K_h(X_i - x)\left[v^T f(X_i) - v^T f(x) - (X_i - x)^T \nabla\{v^T f(x)\}\right] \\
&\quad + \frac{1}{\mu(x) n}\{1 + o_p(1)\} \sum_{i=m}^{n} K_h(X_i - x)\{v^T \Sigma^{1/2}(X_i)\xi_i\}.
\end{aligned} \tag{5.5}$$

To prove Theorem 1, one separates (5.5) into a bias part and a stochastic part as usual.
The bias part is handled by the following lemma:
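Lemma 5.4. Let $g : \mathbb{R}^{md} \to \mathbb{R}^1$ be a twice continuously differentiable function. Then,
under the assumptions of Theorem 1

$$\frac{1}{\mu(x) n} \sum_{i=m}^{n} K_h(X_i - x)\left[g(X_i) - g(x) - (X_i - x)^T \nabla g(x)\right] = h^2 \frac{\sigma_K^2}{2} \operatorname{Tr}[\nabla^2 g(x)] + o_p(h^2).$$

Proof. Using the Taylor expansion of $g(x)$, we get

$$\begin{aligned}
\frac{1}{\mu(x) n} \sum_{i=m}^{n} K_h(X_i - x)&\left[g(X_i) - g(x) - (X_i - x)^T \nabla g(x)\right] \\
&= \frac{1}{\mu(x) n h^{md}} \sum_{i=m}^{n} K(u_{in})\left[g(X_i) - g(x) - h u_{in}^T \nabla g(x)\right] \\
&= \frac{1}{2\mu(x) n h^{md}} \sum_{i=m}^{n} K(u_{in})\left[h^2 u_{in}^T \nabla^2 g(x) u_{in}\right] + R,
\end{aligned}$$

where

$$|R| \le \frac{1}{2\mu(x) n h^{md}} \sum_{i=m}^{n} K(u_{in})\left[h^2 \varrho^2 \sup_{|w| \le \varrho} |\nabla^2 g(x + hw) - \nabla^2 g(x)|\right] = o(h^2)\, \frac{1}{n h^{md}} \sum_{i=m}^{n} K(u_{in}) = o_p(h^2) \tag{5.6}$$

as $n \to \infty$, where $\varrho = \max\{|w| : w \in \operatorname{supp} K\}$ and the last equality in (5.6) is due to
Lemma 5.2. Again, by Lemma 5.2 one has, as $n \to \infty$

$$\begin{aligned}
\frac{1}{2\mu(x) n h^{md}} \sum_{i=m}^{n} K(u_{in})\left[u_{in}^T \nabla^2 g(x) u_{in}\right] &= \frac{1}{2\mu(x) n h^{md}} \sum_{i=m}^{n} \operatorname{Tr}\left[K(u_{in}) u_{in} u_{in}^T \nabla^2 g(x)\right] \\
&\xrightarrow{P} \frac{1}{2} \int \operatorname{Tr}\left[K(u) u u^T \nabla^2 g(x)\right] du \\
&= \frac{1}{2} \operatorname{Tr}\left[\int K(u) u u^T\,du\; \nabla^2 g(x)\right] = \frac{\sigma_K^2}{2} \operatorname{Tr}[\nabla^2 g(x)].
\end{aligned}$$

Combining this with (5.6) we get the lemma. □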
In particular, if $g(x) = v^T f(x)$, one gets from Lemma 5.4

$$\frac{1}{\mu(x) n} \sum_{i=m}^{n} K_h(X_i - x)\left[v^T f(X_i) - v^T f(x) - (X_i - x)^T \nabla\{v^T f(x)\}\right] = b(x; v)\, n^{-2/(4+md)} + o_p(n^{-2/(4+md)}) \tag{5.7}$$

as $n \to \infty$, where $b(x; v)$ is as given in Theorem 1. This yields the asymptotics of the
bias term in (5.5).
To work out the asymptotics of the variance term, we need another lemma. Denote by
$\mathcal{F}_{k-1} = \sigma(X_k, X_{k-1}, \ldots, X_m)$ the $\sigma$-algebra generated by $X_m, \ldots, X_k$.
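Lemma 5.5 (Liptser and Shirjaev, 1980, Corollary 6). Let $m$ be a fixed integer and
for every $n \ge m$, let the sequence $\eta_n = (\eta_{nk}, \mathcal{F}_k)$ be a square integrable martingale
difference, i.e.

$$E(\eta_{nk} \mid \mathcal{F}_{k-1}) = 0, \quad E(\eta_{nk}^2) < \infty, \quad m \le k \le n, \tag{5.8}$$

and let

$$\sum_{k=m}^{n} E(\eta_{nk}^2) = 1, \quad \forall n \ge n_0 \ge m. \tag{5.9}$$

The conditions

$$\sum_{k=m}^{n} E(\eta_{nk}^2 \mid \mathcal{F}_{k-1}) \xrightarrow{P} 1, \quad \text{as } n \to \infty, \tag{5.10}$$

$$\sum_{k=m}^{n} E(\eta_{nk}^2 I(|\eta_{nk}| > \varepsilon) \mid \mathcal{F}_{k-1}) \xrightarrow{P} 0, \quad \text{as } n \to \infty, \quad \forall \varepsilon > 0, \tag{5.11}$$

are sufficient for convergence

$$\sum_{k=m}^{n} \eta_{nk} \xrightarrow{D} \mathcal{N}(0, 1), \quad \text{as } n \to \infty.$$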
Proof of Theorem 1. Now we apply Lemma 5.5 to the following stochastic term of
(5.5):

$$\sum_{i=m}^{n} \frac{1}{\mu(x) n} K_h(X_i - x)\, v^T \Sigma^{1/2}(X_i)\xi_i, \tag{5.12}$$

and observe that (in view of Lemma 5.2)

$$\begin{aligned}
G_n &= \sum_{i=m}^{n} E\left\{\frac{1}{\mu(x) n} K_h(X_i - x)\, v^T \Sigma^{1/2}(X_i)\xi_i\right\}^2 \\
&= \frac{1}{\mu(x) n h^{md}}\{1 + o(1)\} \int K^2(u)\, v^T \Sigma(x) v\,du \\
&= n^{-4/(4+md)} V(x; v)\{1 + o(1)\}.
\end{aligned} \tag{5.13}$$

Define

$$\eta_{nk} = \frac{K_h(X_k - x)\, v^T \Sigma^{1/2}(X_k)\xi_k}{\mu(x)\, n \sqrt{G_n}}.$$

It is clear from (5.13) and (5.3) that (5.8)-(5.10) hold. It remains to check (5.11) in
order to show that

$$\sum_{k=m}^{n} \eta_{nk} \xrightarrow{D} \mathcal{N}(0, 1), \quad \text{as } n \to \infty. \tag{5.14}$$
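We have

$$\eta_{nk}^2 \le Q_{nk} |\xi_k|^2, \tag{5.15}$$

where

$$Q_{nk} = \frac{h^{md}\beta^{-md}}{V(x; v)\mu^2(x) n}\, K_h(X_k - x)^2\, |v^T \Sigma^{1/2}(X_k)|^2 \{1 + o(1)\}.$$

Note that for some constant $C(x, v)$ depending only on $x$ and $v$

$$Q_{nk} \le \frac{C(x, v)}{n h^{md}}\, K\!\left(\frac{X_k - x}{h}\right)$$

because of the fact that $K$ is compactly supported and bounded and that $\Sigma$ is bounded in a shrinking
neighborhood of $x$. This entails

$$E\left\{|\xi_k|^2 I\left(|\xi_k|^2 \ge \frac{\varepsilon^2}{Q_{nk}}\right) \Bigm| \mathcal{F}_{k-1}\right\} \le C_n(x, v),$$

where

$$C_n(x, v) = E\left\{|\xi_1|^2 I\left(|\xi_1|^2 \ge \frac{\varepsilon^2 n h^{md}}{C(x, v)\sup_u K(u)}\right)\right\}$$

is independent of $k$. Since $n h^{md} \to \infty$ and $E|\xi_1|^2 < \infty$, we have $C_n(x, v) \to 0$. This and (5.15) yield

$$\begin{aligned}
\sum_{k=m}^{n} E\{\eta_{nk}^2 I(|\eta_{nk}| > \varepsilon) \mid \mathcal{F}_{k-1}\} &= \sum_{k=m}^{n} E\{\eta_{nk}^2 I(\eta_{nk}^2 > \varepsilon^2) \mid \mathcal{F}_{k-1}\} \\
&\le \sum_{k=m}^{n} Q_{nk}\, E\left\{|\xi_k|^2 I\left(|\xi_k|^2 \ge \frac{\varepsilon^2}{Q_{nk}}\right) \Bigm| \mathcal{F}_{k-1}\right\} \\
&\le C_n(x, v)\, C(x, v) \sum_{k=m}^{n} \frac{1}{n h^{md}}\, K\!\left(\frac{X_k - x}{h}\right),
\end{aligned}$$

while

$$\sum_{k=m}^{n} \frac{1}{n h^{md}}\, K\!\left(\frac{X_k - x}{h}\right) \xrightarrow{P} \mu(x), \quad \text{as } n \to \infty,$$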
by Lemma 5.2. Thus we have proved (5.14). Now (3.3) is a consequence of (5.5),
(5.7), (5.13) and (5.14). To prove the joint asymptotic normality (3.4), note that, in
view of (5.5) and (5.7),
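$$n^{2/(4+md)} \begin{pmatrix} \hat f_j(x) - f_j(x) \\ \hat f_k(x) - f_k(x) \end{pmatrix} = \begin{pmatrix} b_j(x) \\ b_k(x) \end{pmatrix} \{1 + o_p(1)\} + n^{2/(4+md)} \begin{pmatrix} \zeta_{jn} \\ \zeta_{kn} \end{pmatrix} \{1 + o_p(1)\}$$

as $n \to \infty$, where

$$\zeta_{jn} = \frac{1}{\mu(x) n} \sum_{i=m}^{n} K_h(X_i - x)\, v_j^T \Sigma^{1/2}(X_i)\xi_i$$

and $v_j$ is the $j$th coordinate vector in $\mathbb{R}^d$.

By the Cramér-Wold device, the joint asymptotic normality of $\zeta_{jn}$ and $\zeta_{kn}$ is proved
if one shows that linear combinations of these random variables satisfy

$$n^{2/(4+md)} (\alpha_j \zeta_{jn} + \alpha_k \zeta_{kn}) \xrightarrow{D} \mathcal{N}\left(0,\; \alpha_j^2 V_j(x) + \alpha_k^2 V_k(x) + 2\alpha_j \alpha_k c_{jk}(x)\right) \tag{5.16}$$

as $n \to \infty$, for all $\alpha_j, \alpha_k \in \mathbb{R}^1$.

The proof of (5.16) is quite similar to that of (5.14), and it is based again on the
application of Lemma 5.5. The difference is that instead of $G_n$, one should now use

$$\begin{aligned}
G_n' &= \sum_{i=m}^{n} E\left\{\frac{1}{\mu(x) n} K_h(X_i - x)(\alpha_j v_j + \alpha_k v_k)^T \Sigma^{1/2}(X_i)\xi_i\right\}^2 \\
&= n^{-4/(4+md)}\left[\alpha_j^2 V_j(x) + \alpha_k^2 V_k(x) + 2\alpha_j \alpha_k c_{jk}(x)\right]\{1 + o(1)\},
\end{aligned}$$

where the last equality follows from Lemma 5.2 (cf. (5.13)). □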
Proof of Theorem 2. Similar to (5.5), we write

$$\begin{aligned}
\hat\sigma(x; v, s) - \sigma(x; v, s) &= (v^T f(x))(s^T f(x)) - \hat f(x; v)\hat f(x; s) \\
&\quad + F(0)^T (ZWZ^T)^{-1} ZW[v^T Y Y^T s] - (v^T f(x))(s^T f(x)) - v^T \Sigma(x) s \\
&= (v^T f(x))(s^T f(x)) - \hat f(x; v)\hat f(x; s) \\
&\quad + F(0)^T (ZWZ^T)^{-1} ZW\left[v^T Y Y^T s - Z^T \begin{pmatrix} (v^T f(x))(s^T f(x)) + v^T \Sigma(x) s \\ h\nabla\left((v^T f(x))(s^T f(x)) + v^T \Sigma(x) s\right) \end{pmatrix}\right] \\
&= (v^T f(x))(s^T f(x)) - \hat f(x; v)\hat f(x; s) \\
&\quad + \frac{1}{\mu(x) n}\{1 + o_p(1)\} \sum_{i=m}^{n} K_h(X_i - x)\left[v^T f(X_i) f(X_i)^T s - v^T f(x) f(x)^T s - (X_i - x)^T \nabla\{(v^T f(x))(s^T f(x))\}\right] \\
&\quad + \frac{1}{\mu(x) n}\{1 + o_p(1)\} \sum_{i=m}^{n} K_h(X_i - x)\left[v^T \Sigma(X_i) s - v^T \Sigma(x) s - (X_i - x)^T \nabla\{v^T \Sigma(x) s\}\right] \\
&\quad + \frac{1}{\mu(x) n}\{1 + o_p(1)\} \sum_{i=m}^{n} K_h(X_i - x)\, v^T \Sigma^{1/2}(X_i)(\xi_i \xi_i^T - I_d)\Sigma^{1/2}(X_i) s \\
&\quad + \frac{1}{\mu(x) n}\{1 + o_p(1)\} \sum_{i=m}^{n} K_h(X_i - x)\{s^T f(X_i) v^T + v^T f(X_i) s^T\}\Sigma^{1/2}(X_i)\xi_i,
\end{aligned}$$

which after plugging in the formula for $\hat f(x; v) - v^T f(x)$ and $\hat f(x; s) - s^T f(x)$ (cf. (5.5)) yields

$$\hat\sigma(x; v, s) - \sigma(x; v, s) = \{1 + o_p(1)\} \sum_{j=1}^{8} T_j, \tag{5.17}$$

where

$$\begin{aligned}
T_1 &= \frac{1}{\mu(x) n} \sum_{i=m}^{n} K_h(X_i - x)\left[v^T f(X_i) f(X_i)^T s - v^T f(x) f(x)^T s - (X_i - x)^T \nabla\{(v^T f(x))(s^T f(x))\}\right], \\
T_2 &= \frac{1}{\mu(x) n} \sum_{i=m}^{n} K_h(X_i - x)\left[v^T \Sigma(X_i) s - v^T \Sigma(x) s - (X_i - x)^T \nabla\{v^T \Sigma(x) s\}\right], \\
T_3 &= -\frac{1}{\mu(x) n} \sum_{i=m}^{n} K_h(X_i - x)\, s^T f(x)\left[v^T f(X_i) - v^T f(x) - (X_i - x)^T \nabla\{v^T f(x)\}\right], \\
T_4 &= -\frac{1}{\mu(x) n} \sum_{i=m}^{n} K_h(X_i - x)\, v^T f(x)\left[s^T f(X_i) - s^T f(x) - (X_i - x)^T \nabla\{s^T f(x)\}\right], \\
T_5 &= \frac{1}{\mu(x) n} \sum_{i=m}^{n} K_h(X_i - x)\, v^T \Sigma^{1/2}(X_i)(\xi_i \xi_i^T - I_d)\Sigma^{1/2}(X_i) s, \\
T_6 &= \frac{1}{\mu(x) n} \sum_{i=m}^{n} K_h(X_i - x)\{s^T f(X_i) - s^T f(x)\}\{v^T \Sigma^{1/2}(X_i)\xi_i\}, \\
T_7 &= \frac{1}{\mu(x) n} \sum_{i=m}^{n} K_h(X_i - x)\{v^T f(X_i) - v^T f(x)\}\{s^T \Sigma^{1/2}(X_i)\xi_i\}, \\
T_8 &= -[\hat f(x; v) - v^T f(x)][\hat f(x; s) - s^T f(x)] \\
&= -n^{-4/(4+md)}\left[n^{2/(4+md)}(\hat f(x; v) - v^T f(x))\right]\left[n^{2/(4+md)}(\hat f(x; s) - s^T f(x))\right].
\end{aligned} \tag{5.18}$$
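Using Lemma 5.4, one derives

$$
\begin{aligned}
T_1 &= \beta^2 \frac{\sigma_K^2}{2} \big[\operatorname{Tr}\{\nabla^2((v^T f(x))(s^T f(x)))\}\big] n^{-2/(4+md)} + o_p(n^{-2/(4+md)}), \\
T_2 &= \beta^2 \frac{\sigma_K^2}{2} \big[\operatorname{Tr}\{\nabla^2(v^T \Sigma(x) s)\}\big] n^{-2/(4+md)} + o_p(n^{-2/(4+md)}), \\
T_3 &= -\beta^2 \frac{\sigma_K^2}{2} \{s^T f(x)\} \operatorname{Tr}\{\nabla^2 f^T(x) v\}\, n^{-2/(4+md)} + o_p(n^{-2/(4+md)}), \\
T_4 &= -\beta^2 \frac{\sigma_K^2}{2} \{v^T f(x)\} \operatorname{Tr}\{\nabla^2 f^T(x) s\}\, n^{-2/(4+md)} + o_p(n^{-2/(4+md)}),
\end{aligned} \tag{5.19}
$$

and thus

$$
T_1 + T_2 + T_3 + T_4 = b(x;v,s)\, n^{-2/(4+md)} + o_p(n^{-2/(4+md)}). \tag{5.20}
$$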
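As a brief worked step (not part of the original proof), the leading terms in (5.19) can be combined with the product rule. The shorthand $g_v = v^T f$ and $g_s = s^T f$, and the symbol $c_n$ for the common factor multiplying the traces in (5.19), are notation assumed here for illustration only; all functions are evaluated at $x$, and $\operatorname{Tr}\{\nabla^2 f^T(x) v\}$ is read as $\operatorname{Tr}\{\nabla^2 g_v\}$.

```latex
% Product rule for the Hessian trace of a product of two scalar functions
\operatorname{Tr}\{\nabla^2 (g_v g_s)\}
  = g_s \operatorname{Tr}\{\nabla^2 g_v\}
  + g_v \operatorname{Tr}\{\nabla^2 g_s\}
  + 2 (\nabla g_v)^T \nabla g_s .

% Hence the leading parts of T_1, T_3 and T_4 collapse to a gradient product:
T_1 + T_3 + T_4 = 2 c_n (\nabla g_v)^T \nabla g_s + o_p(n^{-2/(4+md)}) ,

% and adding T_2 gives
T_1 + T_2 + T_3 + T_4
  = c_n \big[ \operatorname{Tr}\{\nabla^2 (v^T \Sigma(x) s)\}
  + 2 (\nabla g_v)^T \nabla g_s \big] + o_p(n^{-2/(4+md)}) .
```

Written this way, the conditional mean enters the bias of the covariance estimator only through its first derivatives.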
Now we calculate $T_6$. Note that

$$
|K_h(z - x)(s^T f(z) - s^T f(x))| \le K_h(z - x) \sup_{w \in \operatorname{supp} K} |f(x + wh) - f(x)| \le C h K_h(z - x),
$$

since $K$ is compactly supported (here $C > 0$ is a constant). Thus,
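$$
\begin{aligned}
E(T_6^2) &= \frac{1}{\mu(x)^2 n^2} \sum_{i=m}^{n} E\big[ K_h^2(X_i - x)\{ s^T f(X_i) - s^T f(x) \}^2 \{ v^T \Sigma(X_i) v \} \big] \\
&\le \frac{C^2 h^2}{\mu(x)^2 n^2} \sum_{i=m}^{n} E\big[ K_h^2(X_i - x)\, v^T \Sigma(X_i) v \big].
\end{aligned}
$$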
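By Lemma 5.2,

$$
\frac{h^{md} \beta^{md}}{n} \sum_{i=m}^{n} E\big[ K_h^2(X_i - x)\, v^T \Sigma(X_i) v \big]
\to \beta^{md} \mu(x)\, v^T \Sigma(x) v \int K^2(u)\, du = O(1), \quad \text{as } n \to \infty,
$$

and therefore

$$
E(T_6^2) = O\!\left( \frac{h^2}{n h^{md}} \right) = o(n^{-4/(4+md)}), \quad \text{as } n \to \infty.
$$

The evaluation of $T_7$ is quite analogous and, hence, we get

$$
T_6 + T_7 = o_p(n^{-2/(4+md)}), \quad \text{as } n \to \infty. \tag{5.21}
$$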
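The order of $E(T_6^2)$ can be made explicit with the bandwidth $h = \beta n^{-1/(4+md)}$ of (A8). The arithmetic below is a sketch that assumes the bound is of order $h^2/(n h^{md})$, as the two preceding displays suggest.

```latex
n h^{md} = \beta^{md} n^{1 - md/(4+md)} = \beta^{md} n^{4/(4+md)}, \qquad
h^2 = \beta^2 n^{-2/(4+md)},

\frac{h^2}{n h^{md}} = \beta^{2-md} n^{-6/(4+md)}, \qquad
n^{4/(4+md)} \cdot \frac{h^2}{n h^{md}} = \beta^{2-md} n^{-2/(4+md)} \to 0 .
```

Hence $E(T_6^2) = o(n^{-4/(4+md)})$, and Chebyshev's inequality gives $T_6 = o_p(n^{-2/(4+md)})$; the same computation applies to $T_7$.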
Also, in view of Theorem 1,

$$
n^{2/(4+md)}\, T_8 \xrightarrow{P} 0, \quad \text{as } n \to \infty. \tag{5.22}
$$
The relations (5.20)-(5.22) show that the sum $\sum_{j=1}^{4} T_j$ in (5.17) yields the correct
asymptotic bias, while the terms $T_6$, $T_7$, and $T_8$ are asymptotically negligible. It remains
to show the asymptotic normality of the term $T_5$:

$$
n^{2/(4+md)}\, T_5 \xrightarrow{D} \mathcal{N}(0, V(x;v,s)), \quad \text{as } n \to \infty.
$$
Again, to prove this, we use Lemma 5.5. We leave out the verification of the conditions
(5.10) and (5.11) of Lemma 5.5, since it is done as in the proof of Theorem 1. We
only deduce the asymptotic expression for the variance of $T_5$, which is given, analogous
to $G_n$ of the proof of Theorem 1, by the asymptotics of

$$
G_n' = \frac{1}{\mu(x)^2 n^2} \sum_{i=m}^{n} E\big[ \big( K_h(X_i - x)\, v^T \Sigma^{1/2}(X_i)(\xi_i \xi_i^T - I_d)\Sigma^{1/2}(X_i) s \big)^2 \big]. \tag{5.23}
$$
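To study this expression, we use the following lemma.

Lemma 5.6. Let $a = (a_1, \ldots, a_d)^T$, $\tilde a = (\tilde a_1, \ldots, \tilde a_d)^T$, $b = (b_1, \ldots, b_d)^T$, and
$\tilde b = (\tilde b_1, \ldots, \tilde b_d)^T$ be vectors in $R^d$. Then under (A1),

$$
\begin{aligned}
&E\big[ (a^T(\xi_1 \xi_1^T - I_d) b)(\tilde a^T(\xi_1 \xi_1^T - I_d) \tilde b) \big] \\
&\quad = (m_4 - 2) \operatorname{Tr}[\operatorname{diag}(a) \operatorname{diag}(b) \operatorname{diag}(\tilde a) \operatorname{diag}(\tilde b)] + (a^T \tilde a)(b^T \tilde b) + (a^T \tilde b)(\tilde a^T b).
\end{aligned}
$$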
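Proof. Denoting by $\delta_{jk}$ the Kronecker delta and using (A1), we get

$$
\begin{aligned}
&E\big[ (a^T(\xi_1 \xi_1^T - I_d) b)(\tilde a^T(\xi_1 \xi_1^T - I_d) \tilde b) \big] \\
&\quad = E\Big[ \sum_{k,j=1}^{d} a_j(\xi_{1j}\xi_{1k} - \delta_{jk}) b_k \sum_{l,m=1}^{d} \tilde a_l(\xi_{1l}\xi_{1m} - \delta_{lm}) \tilde b_m \Big] \\
&\quad = E\Big[ \sum_{j=1}^{d} a_j b_j(\xi_{1j}^2 - 1) \sum_{l=1}^{d} \tilde a_l \tilde b_l(\xi_{1l}^2 - 1) \Big]
   + E\Big[ \sum_{1 \le j < k \le d} (a_j b_k + a_k b_j)\xi_{1j}\xi_{1k} \sum_{1 \le l < m \le d} (\tilde a_l \tilde b_m + \tilde a_m \tilde b_l)\xi_{1l}\xi_{1m} \Big] \\
&\quad = m_4 \sum_{j=1}^{d} a_j b_j \tilde a_j \tilde b_j + \sum_{1 \le j < k \le d} (a_j b_k + a_k b_j)(\tilde a_j \tilde b_k + \tilde a_k \tilde b_j) \\
&\quad = (m_4 - 2) \sum_{j=1}^{d} a_j b_j \tilde a_j \tilde b_j + \frac{1}{2} \sum_{j,k=1}^{d} (a_j b_k + a_k b_j)(\tilde a_j \tilde b_k + \tilde a_k \tilde b_j),
\end{aligned}
$$

which yields the lemma.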
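The identity of Lemma 5.6 can be checked numerically. The sketch below is an illustration only: it assumes that (A1) gives i.i.d. innovation components with zero mean and unit variance and that $m_4 = E(\xi^2 - 1)^2$, and it uses a hypothetical three-point distribution so that the expectation can be computed exactly by enumeration rather than by simulation. The function names are not from the paper.

```python
import itertools
import numpy as np

# Hypothetical three-point law for each innovation component:
# P(xi = -A) = P(xi = +A) = P/2, P(xi = 0) = 1 - P, with A = 1/sqrt(P),
# so that E xi = 0, E xi^2 = 1 and E xi^4 = 1/P.
P = 0.5
A = 1.0 / np.sqrt(P)
VALUES = np.array([-A, 0.0, A])
PROBS = np.array([P / 2, 1 - P, P / 2])
M4 = 1.0 / P - 1.0  # assumed definition: m4 = E(xi^2 - 1)^2 = E xi^4 - 1


def lhs_exact(a, b, a_t, b_t):
    """E[(a'(xi xi' - I)b)(a_t'(xi xi' - I)b_t)] by enumerating all outcomes."""
    d = len(a)
    eye = np.eye(d)
    total = 0.0
    for idx in itertools.product(range(3), repeat=d):
        xi = VALUES[list(idx)]
        prob = np.prod(PROBS[list(idx)])
        centred = np.outer(xi, xi) - eye
        total += prob * (a @ centred @ b) * (a_t @ centred @ b_t)
    return total


def rhs_formula(a, b, a_t, b_t, m4=M4):
    """Right-hand side of Lemma 5.6."""
    # Tr[diag(a) diag(b) diag(a_t) diag(b_t)] is the sum of elementwise products
    diag_term = np.sum(a * b * a_t * b_t)
    return (m4 - 2.0) * diag_term + (a @ a_t) * (b @ b_t) + (a @ b_t) * (a_t @ b)


rng = np.random.default_rng(0)
a, b, a_t, b_t = rng.normal(size=(4, 3))  # four arbitrary vectors in R^3
print(lhs_exact(a, b, a_t, b_t), rhs_formula(a, b, a_t, b_t))
```

The two printed numbers should agree up to floating-point error for any choice of the four vectors.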
Applying Lemma 5.6 with $a = \tilde a = \Sigma^{1/2}(X_i) v$ and $b = \tilde b = \Sigma^{1/2}(X_i) s$, we find

$$
E\big[ (v^T \Sigma^{1/2}(X_i)(\xi_i \xi_i^T - I_d)\Sigma^{1/2}(X_i) s)^2 \,\big|\, X_i \big]
= (m_4 - 2) T^*(X_i) + \{v^T \Sigma(X_i) s\}^2 + \{v^T \Sigma(X_i) v\}\{s^T \Sigma(X_i) s\}.
$$
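This and (5.23) yield

$$
\begin{aligned}
G_n' &= \frac{1}{\mu(x)^2 n^2} \sum_{i=m}^{n} E\big[ K_h^2(X_i - x)(m_4 - 2) T^*(X_i) \big] \\
&\quad + \frac{1}{\mu(x)^2 n^2} \sum_{i=m}^{n} E\big[ K_h^2(X_i - x)\big( \{v^T \Sigma(X_i) s\}^2 + \{v^T \Sigma(X_i) v\}\{s^T \Sigma(X_i) s\} \big) \big]
\end{aligned}
$$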
and, in view of Lemma 5.2,

$$
\begin{aligned}
n^{4/(4+md)}\, G_n' &= \frac{\beta^{-md} \|K\|_2^2}{\mu(x)} \big[ (m_4 - 2) T^*(x) + \{v^T \Sigma(x) s\}^2 + \{v^T \Sigma(x) v\}\{s^T \Sigma(x) s\} \big] \times (1 + o(1)) \\
&= V(x;v,s) + o(1), \quad \text{as } n \to \infty,
\end{aligned}
$$

which is the expression for the asymptotic variance given in Theorem 2. □
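For readers who want to evaluate the limiting variance at a single point, the expression in the last display of the proof can be coded directly. The sketch below is illustrative only: the function and argument names are hypothetical, $\Sigma^{1/2}$ is taken to be the symmetric square root (an assumption, since $T^*$ depends on which root is used), and $m_4$ is read as $E(\xi^2 - 1)^2$.

```python
import numpy as np


def sym_sqrt(sigma):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    eigval, eigvec = np.linalg.eigh(sigma)
    return (eigvec * np.sqrt(np.clip(eigval, 0.0, None))) @ eigvec.T


def asymptotic_variance(sigma, v, s, m4, mu_x, beta, kernel_l2_sq, md):
    """V(x; v, s) from the last display of the proof, evaluated at one point x.

    sigma        : d x d conditional covariance matrix Sigma(x)
    m4           : E(xi^2 - 1)^2 for one innovation component (assumed definition)
    mu_x         : stationary density mu(x) of the lag vector
    beta         : bandwidth constant in h = beta * n^(-1/(4 + md))
    kernel_l2_sq : ||K||_2^2, the integral of K(u)^2
    md           : dimension of the lag vector
    """
    root = sym_sqrt(sigma)
    rv, rs = root @ v, root @ s
    # T*(x) = Tr[diag(rv) diag(rs) diag(rv) diag(rs)]
    t_star = np.sum(rv**2 * rs**2)
    bracket = ((m4 - 2.0) * t_star
               + (v @ sigma @ s) ** 2
               + (v @ sigma @ v) * (s @ sigma @ s))
    return beta ** (-md) * kernel_l2_sq / mu_x * bracket
```

With $\Sigma(x) = I_d$ and $v = s = e_1$, the bracket reduces to $m_4$, which gives a quick sanity check of any implementation.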
To show the joint asymptotic normality (3.6) and (3.7) one proceeds as in the
proof of Theorem 1, by using the Cramér-Wold device and checking the conditions
of Lemma 5.5. The calculations of covariance terms in (3.6) are based on Lemma 5.6
as well.
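The estimator analysed in this proof, a local linear fit of $v^T Y_i Y_i^T s$ minus the product $\hat f(x;v)\hat f(x;s)$, can be sketched in a few lines. This is a minimal illustration inferred from the first display of the proof, not the authors' implementation: the product Epanechnikov kernel, the function names, and the single scalar bandwidth are all assumptions made here.

```python
import numpy as np


def product_epanechnikov(u):
    """Hypothetical kernel choice: product Epanechnikov on [-1, 1]^q."""
    return np.prod(np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0), axis=1)


def local_linear(x, X, response, h, kernel=product_epanechnikov):
    """Intercept of the kernel-weighted linear fit of `response` on X around x."""
    diff = (X - x) / h                             # scaled regressors (X_i - x)/h
    w = kernel(diff) / h ** X.shape[1]             # weights K_h(X_i - x)
    Z = np.column_stack([np.ones(len(X)), diff])   # rows (1, (X_i - x)'/h)
    ZW = Z.T * w
    coef = np.linalg.solve(ZW @ Z, ZW @ response)  # weighted least squares
    return coef[0]                                 # F(0)' selects the intercept


def sigma_hat(x, X, Y, v, s, h):
    """Estimate v' Sigma(x) s as (fit of v'YY's) - f_hat(x; v) * f_hat(x; s)."""
    yv, ys = Y @ v, Y @ s                          # v'Y_i and s'Y_i
    second_moment = local_linear(x, X, yv * ys, h)
    return second_moment - local_linear(x, X, yv, h) * local_linear(x, X, ys, h)
```

Here `X` holds the lag vectors $X_i$ row by row and `Y` the corresponding observations $Y_i$. A local linear fit reproduces linear responses exactly, and for a constant series the estimated covariance is zero; both properties are convenient checks before applying the sketch to data.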
Acknowledgements
We would like to thank Christian Hafner, Helmut Lütkepohl, and Rolf Tschernig for
helpful discussions. We also thank our referees for pointing out several technical errors
to us. This research was supported by Sonderforschungsbereich 373 'Quantifikation und
Simulation Ökonomischer Prozesse', Deutsche Forschungsgemeinschaft.
References
Ango Nze, P., 1992. Critères d'ergodicité de quelques modèles à représentation markovienne. C.R. Acad.
Sci. Paris Sér. I 315, 1301-1304.
Bollerslev, T., Chou, R., Kroner, K., 1992. ARCH modeling in finance: a review of the theory and empirical
evidence. J. Econometrics 52, 5-59.
Bollerslev, T., Engle, R., Wooldridge, J., 1988. A capital asset pricing model with time-varying covariances.
J. Political Economy 96, 116-131.
Bossaerts, P., Härdle, W., Hafner, C., 1996. Foreign exchange-rates have surprising volatility. In:
Robinson, P.M. (Ed.), Athens Conf. on Applied Probability and Time Series, vol. 2, Lecture Notes
in Statistics, vol. 115, pp. 55-72. Springer, Berlin.
Chan, K.S., Tong, H., 1985. On the use of deterministic Lyapunov functions for the ergodicity of stochastic
difference equations. Adv. Appl. Probab. 17, 666-678.
Chan, K.S., Tong, H., 1986. On estimating thresholds in autoregressive models. J. Time Ser. Anal. 7,
179-190.
Chen, R., Tsay, R.S., 1993a. Nonlinear additive ARX models. J. Amer. Statist. Assoc. 88, 955-967.
Chen, R., Tsay, R.S., 1993b. Functional-coefficient autoregressive models. J. Amer. Statist. Assoc. 88,
298-308.
Cleveland, W.S., 1979. Robust locally weighted regression and smoothing scatterplots. J. Amer. Statist.
Assoc. 74, 829-836.
Collomb, G., 1984. Propriétés de convergence presque complète du prédicteur à noyau.
Z. Wahrscheinlichkeitstheorie verwandte Gebiete 66, 441-460.
Davydov, Yu.A., 1973. Mixing conditions for Markov chains. Theory Probab. Appl. 18, 312-328.
Diebolt, J., Gurgan, D., 1993. Tail behaviour of the stationary density of general nonlinear autoregressive
processes of order one. J. Appl. Probab. 30, 315-329.
Diebold, F., Nason, J., 1990. Nonparametric exchange rate prediction. J. Internat. Econom. 28, 315-332.
Doukhan, P., Ghindès, M., 1980. Estimation dans le processus $X_{n+1} = f(X_n) + \varepsilon_{n+1}$. C.R. Acad. Sci. Paris
Sér. A 297, 61-64.
Doukhan, P., Ghindès, M., 1981. Processus autorégressifs non-linéaires. C.R. Acad. Sci. Paris Sér. A 290,
921-923.
Drost, F.C., Nijman, T.E., 1993. Temporal aggregation of GARCH processes. Econometrica 61, 909-927.
Engle, R.F., 1982. Autoregressive conditional heteroscedasticity with estimates of the variance of UK
inflation. Econometrica 50, 987-1008.
Engle, R.F., Gonzalez-Rivera, G., 1991. Semiparametric ARCH models. J. Bus. Econom. Statist. 9, 345-360.
Engle, R.F., Ng, V., 1993. Measuring and testing the impact of news on volatility. J. Finance 48, 1749-1778.
Fan, J., Gijbels, I., 1996. Local Polynomial Modelling and its Applications. Chapman & Hall, London.
Gouriéroux, Ch., Monfort, A., 1992. Qualitative threshold ARCH models. J. Econometrics 52, 159-199.
Granger, C., Teräsvirta, T., 1993. Modelling Nonlinear Dynamic Relationships. Oxford University Press,
Oxford.
Gregory, A.W., 1989. A nonparametric test for autoregressive conditional heteroscedasticity: a Markov chain
approach. J. Bus. Econom. Statist. 7, 107-115.
Guillaume, D.M., Dacorogna, M.M., Davé, R.R., Müller, U.A., Olsen, R.B., Pictet, O.V., 1994. From the
bird's eye to the microscope: a survey of new stylized facts of the intra-daily foreign exchange market,
Olsen Associates Working Paper.
Haggan, V., Ozaki, T., 1981. Modelling nonlinear vibrations using an amplitude-dependent autoregressive
time series model. Biometrika 68, 189-196.
Härdle, W., Klinke, S., Turlach, B., 1995. XploRe - an interactive statistical computing environment.
Springer, Heidelberg.
Härdle, W., Tsybakov, A.B., 1996. Local polynomial estimators of the volatility function in nonparametric
autoregression. J. Econometrics, to appear.
Katkovnik, V.Ya., 1979. Linear and nonlinear methods of nonparametric regression analysis. Automatika
35-46.
Katkovnik, V.Ya., 1985. Nonparametric Identification and Data Smoothing. Nauka, Moscow (in Russian).
Liptser, R.Sh., Shirjaev, A.N., 1980. A functional central limit theorem for martingales. Theory Probab.
Appl. 25, 667-688.
Lütkepohl, H., 1991. Introduction to Multiple Time Series Analysis. Springer, Heidelberg.
McKeague, I.W., Zhang, M.J., 1994. Identification of nonlinear time series from first order cumulative
characteristics. Ann. Statist. 22, 495-514.
Meese, R.A., Rogoff, K., 1983. Empirical exchange rate models of the seventies, do they fit out of sample?
J. Internat. Econom. 14, 3-24.
Meese, R.A., Rose, A., 1991. An empirical assessment of non-linearities in models of exchange rate
determination. Rev. Econom. Studies 58, 601-619.
Mokkadem, A., 1987. Sur un modèle autorégressif non linéaire. Ergodicité et ergodicité géométrique. J. Time
Ser. Anal. 8, 195-204.
Nummelin, E., Tuominen, P., 1982. Geometric ergodicity of Harris-recurrent Markov chains with application
to renewal theory. Stochastic Process. Appl. 12, 187-202.
Robinson, P.M., 1983. Nonparametric Estimators for Time Series. J. Time Ser. Anal. 4, 185-207.
Robinson, P.M., 1984. Robust nonparametric autoregression. In: Franke, Härdle, Martin (Eds.), Robust and
Nonlinear Time Series Analysis, Lecture Notes in Statistics, vol. 26. Springer, Heidelberg.
Ruppert, D., Wand, M.P., 1994. Multivariate locally weighted least squares regression. Ann. Statist. 22,
1346-1370.
Stone, C.J., 1977. Consistent nonparametric regression. Ann. Statist. 5, 595-645.
Teräsvirta, T., 1994. Specification, estimation and evaluation of smooth transition autoregressive models.
J. Amer. Statist. Assoc. 89, 208-218.
Tjøstheim, D., 1994. Non-linear time series analysis: a selective review. Scand. J. Statist. 21, 97-130.
Tong, H., 1978. On a threshold model. In: Chen, C.H. (Ed.), Pattern Recognition and Signal Processing.
Sijthoff & Noordhoff, Alphen a/d Rijn, The Netherlands.
Tong, H., 1983. Threshold Models in Nonlinear Time Series Analysis, Lecture Notes in Statistics, vol. 21.
Springer, Heidelberg.
Tsybakov, A.B., 1986. Robust reconstruction of functions by the local approximation method. Prob. Inform.
Transm. 22, 133-146.
Tweedie, R.L., 1975. Sufficient conditions for ergodicity and recurrence of Markov chains on a general state
space. Stochastic Process. Appl. 3, 385-403.
Vieu, P., 1994. Order choice in nonlinear autoregressive models. Statistics 24, 1-22.
Wand, M.P., Jones, M.C., 1995. Kernel Smoothing. Chapman & Hall, London.
