Math-Net.Ru — All-Russian Mathematical Portal

C. Butucea, A. Tsybakov, Sharp optimality in density deconvolution with dominating bias. I. — Теория вероятн. и ее примен., 2007, vol. 52, no. 1, pp. 111–128.
DOI: https://doi.org/10.4213/tvp7

ТЕОРИЯ ВЕРОЯТНОСТЕЙ И ЕЕ ПРИМЕНЕНИЯ (Theory of Probability and Its Applications) — 2007, Volume 52, Issue 1

© 2007 BUTUCEA C.*, TSYBAKOV A. B.**

SHARP OPTIMALITY IN DENSITY DECONVOLUTION WITH DOMINATING BIAS. I

We consider estimation of the probability density $f$ of independent and identically distributed random variables $X_i$ observed in the presence of independent and identically distributed noise. The unknown density $f$ is assumed to belong to a class of densities whose characteristic functions behave as $\exp(-a|u|^r)$ as $|u| \to \infty$, where $a > 0$, $r > 0$. The probability density of the noise is assumed to be known and such that its characteristic function decreases as $\exp(-\beta|u|^s)$ as $|u| \to \infty$, where $\beta > 0$, $s > 0$. Under the assumption $r < s$, we propose a kernel-type estimator whose variance turns out to be asymptotically negligible with respect to the square of its bias, both for the pointwise and for the $L_2$ risk. For $r < s/2$ we construct a sharp adaptive estimator of $f$.

Key words and phrases: deconvolution, nonparametric density estimation, infinitely differentiable functions, exact constants in nonparametric smoothing, minimax risk, adaptive estimation.

1. Introduction. Assume that one observes $Y_1, \dots, Y_n$ in the model
$$Y_i = X_i + \varepsilon_i, \qquad i = 1, \dots, n,$$
where $X_i$ are i.i.d. random variables with an unknown probability density $f$ with respect to the Lebesgue measure on $\mathbf{R}$, the random variables $\varepsilon_i$ are i.i.d. with known probability density $f^{\varepsilon}$ with respect to the Lebesgue measure on $\mathbf{R}$, and $(\varepsilon_1, \dots, \varepsilon_n)$ is independent of $(X_1, \dots, X_n)$. The deconvolution problem that we consider here is to estimate $f$ from the observations $Y_1, \dots, Y_n$.
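As an aside for readers who wish to experiment, the observation scheme above is straightforward to simulate; the following minimal sketch uses a Cauchy signal and Gaussian noise, which are our illustrative choices and not prescribed by the paper:

```python
import numpy as np

# Simulate the model Y_i = X_i + eps_i.  The signal X_i is standard Cauchy
# (its characteristic function exp(-|u|) decays like exp(-a|u|^r) with
# a = 1, r = 1), and eps_i is centered Gaussian noise, whose characteristic
# function decays like exp(-beta|u|^s) with s = 2.  Only Y is observed.
rng = np.random.default_rng(0)
n = 10_000
X = rng.standard_cauchy(n)        # draws from the unknown density f
eps = rng.normal(0.0, 1.0, n)     # draws from the known noise density f^eps
Y = X + eps                       # the available observations
print(Y.shape)
```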

Denote by $f^Y = f * f^{\varepsilon}$ the density of the variables $Y_i$, where $*$ is the convolution sign. Let $\Phi^Y$, $\Phi^X$, and $\Phi^{\varepsilon}$ be the characteristic functions of

* Laboratoire de Probabilités et Modèles Aléatoires (UMR CNRS 7599), Université Paris VI, and Modal'X, Université Paris X, 200, avenue de la République, 92001 Nanterre Cedex, France; e-mail: butucea@ccr.jussieu.fr
** Laboratoire de Probabilités et Modèles Aléatoires (UMR CNRS 7599), Université Paris VI, 4, pl. Jussieu, Boîte Courrier 188, 75252 Paris, France; e-mail: tsybakov@ccr.jussieu.fr

the random variables $Y_i$, $X_i$, and $\varepsilon_i$, respectively. For an integrable function $g\colon \mathbf{R} \to \mathbf{R}$, define the Fourier transform
$$\Phi^g(u) = \int g(x) \exp(iux)\, dx.$$

We assume that the unknown density $f$ belongs to the class of functions
$$\mathscr{A}_{a,r}(L) = \Big\{ f\colon \mathbf{R} \to \mathbf{R},\ f \ge 0,\ \int f = 1 \ \text{and}\ \int |\Phi^f(u)|^2 \exp(2a|u|^r)\, du \le 2\pi L \Big\},$$
where $a > 0$, $r > 0$, $L > 0$ are finite constants. Classes of densities of this type have been studied by many authors, starting from [13]. For a recent overview see [2] and [1].
We suppose also, in most of the results, that the characteristic function of the noise $\varepsilon_i$ satisfies the following assumption.

Assumption (N). There exist constants $u_0 > 0$, $\beta > 0$, $s > 0$, $b_{\min} > 0$, $b_{\max} > 0$, and $\gamma, \gamma' \in \mathbf{R}$ such that
$$b_{\min} |u|^{\gamma} \exp(-\beta|u|^s) \le |\Phi^{\varepsilon}(u)| \le b_{\max} |u|^{\gamma'} \exp(-\beta|u|^s) \eqno(1)$$
for $|u| \ge u_0$.


Many important probability densities belong to the class $\mathscr{A}_{a,r}(L)$ with some $a$, $r$, $L$, or have a characteristic function satisfying (1). All such densities are infinitely differentiable on $\mathbf{R}$. Examples include the normal, Cauchy, and general stable laws, the Student, logistic, and extreme value distributions, and others, as well as their mixtures and convolutions. Note that in these examples the values of $r$ and/or $s$ are less than or equal to 2. Although densities with $r > 2$, $s > 2$ are in principle conceivable, they are difficult to express in closed form, and the set of such densities does not contain statistically famous representatives. This remark especially concerns the noise density $f^{\varepsilon}$, which should be explicitly known. Therefore, without a meaningful loss, we will sometimes restrict our study to the case $0 < s \le 2$.
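For concreteness, condition (1) is easy to verify numerically in the Gaussian case; the following sketch (our illustration, with an arbitrary $\sigma$) confirms that the modulus of the $N(0, \sigma^2)$ characteristic function coincides with the envelope in (1) for $b_{\min} = b_{\max} = 1$, $\gamma = \gamma' = 0$, $\beta = \sigma^2/2$, $s = 2$:

```python
import numpy as np

# The characteristic function of N(0, sigma^2) noise is exp(-sigma^2 u^2 / 2),
# so (1) holds with b_min = b_max = 1, gamma = gamma' = 0, beta = sigma^2/2, s = 2.
sigma = 1.5
beta, s = sigma**2 / 2.0, 2.0
u = np.linspace(1.0, 10.0, 50)
phi_eps = np.exp(-sigma**2 * u**2 / 2.0)   # |Phi^eps(u)| for Gaussian noise
envelope = np.exp(-beta * np.abs(u)**s)    # the envelope appearing in (1)
gap = np.max(np.abs(phi_eps - envelope))
print(gap)  # the two expressions coincide up to rounding
```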
For any estimator $f_n$ of $f$, define the maximal pointwise risk over the class $\mathscr{A}_{a,r}(L)$, for any fixed $x \in \mathbf{R}$, by
$$R_n(x, f_n, \mathscr{A}_{a,r}(L)) = \sup_{f \in \mathscr{A}_{a,r}(L)} \mathbf{E}_f |f_n(x) - f(x)|^2$$
and the maximal $L_2$-risk by
$$R_n(L_2, f_n, \mathscr{A}_{a,r}(L)) = \sup_{f \in \mathscr{A}_{a,r}(L)} \mathbf{E}_f \big[ \|f_n - f\|_2^2 \big],$$
where $\mathbf{E}_f[\cdot]$ is the expectation with respect to the joint distribution $\mathbf{P}_f$ of $Y_1, \dots, Y_n$ when the underlying probability density of the $X_i$'s is $f$, and $\|\cdot\|_2$

stands for the $L_2(\mathbf{R})$-norm. (In what follows we use the notation $L_p(\mathbf{R})$, in general, for the $L_p$-spaces of complex-valued functions on $\mathbf{R}$.)

The asymptotics of minimax risks differ significantly in the cases $r < s$, $r = s$, and $r > s$. If $r < s$, the «variance» part of the asymptotic minimax risk is asymptotically negligible with respect to the «bias» part, while for $r > s$ the bias contribution is asymptotically negligible with respect to the variance. In this paper we consider the bias dominated case, i.e., we assume that $r < s$. The setting with dominating variance will be treated in another paper.

The problems of density deconvolution with dominating bias were historically the first ones studied in the literature (cf. [16], [18], [3], [21], [9], [10], [14], [6]), motivated by the importance of deconvolution with Gaussian noise. These papers consider, in particular, noise distributions satisfying (1), but densities $f$ belonging to finite smoothness classes, such as Hölder or Sobolev ones, where the estimation of $f$ is harder than for the class $\mathscr{A}_{a,r}(L)$. In this framework they show that the optimal rates of convergence are powers of $\ln n$, which suggests that essentially there is no hope to recover $f$ with a reasonably small error for reasonable sample sizes. This conclusion is often interpreted as a general pessimistic message about the Gaussian deconvolution problem. Note, however, that such minimax results are obtained for the least favorable densities in Hölder or Sobolev classes. Often the underlying density is much nicer (for instance, it belongs to $\mathscr{A}_{a,r}(L)$, as do the popular densities mentioned above), and the estimation can be significantly improved, as we show below: the optimal rates of convergence are in fact faster than any power of $\ln n$.
Pensky and Vidakovic [15] were the first to point out the effect of fast rates in density deconvolution, considering classes of densities that are somewhat smaller than $\mathscr{A}_{a,r}(L)$ (including an additional restriction on the tails of $f$) and with the noise satisfying (1). They analyzed the rates of convergence of wavelet deconvolution estimators, restricting their attention to the $L_2$-risk.

The present work contains two parts. In this first part we obtain upper bounds on the risks of estimators on the classes $\mathscr{A}_{a,r}(L)$. These results in particular imply that the rates achieved by the estimators in [15] are suboptimal on $\mathscr{A}_{a,r}(L)$ and that faster rates can be attained by a simpler and more traditional kernel deconvolution method with suitably chosen parameters. In the forthcoming second part (which will appear in one of the next issues of this journal) we give minimax lower bounds on the risks and show that our method attains not only the optimal rates but also the best asymptotic constants (i.e., it is sharp optimal). Moreover, we prove that the proposed estimator is sharp optimal simultaneously under the $L_2$-risk and under the pointwise risk, and that it is sharp adaptive to the parameters $a$, $r$, $L$ in some cases.

In the forthcoming second part of this paper we develop a new technique for the construction of minimax lower bounds. It might be useful for obtaining lower bounds in similar «2 exponents» type settings in other inverse problems. To our knowledge, except for the case $r = s = 1$ treated in [12], [19], and [4], such lower bounds are not available even for the Gaussian white noise (or sequence space) deconvolution model, although some upper bounds are known (cf. [8], [7]).
Finally, we mention publications on adaptive deconvolution under Assumption (N) or its analogs. They deal with problems that are somewhat different from ours. Efromovich [6] considered the problem of deconvolution where the densities $f$ and $f^{\varepsilon}$ are both periodic on $[0, 2\pi]$, $f^{\varepsilon}$ satisfies an analog of Assumption (N) expressed in terms of Fourier coefficients, and $f$ belongs to a class of periodic functions of Sobolev type. He proposed sharp adaptive estimators with logarithmic rates, which are optimal for that framework, as discussed above. Adaptive deconvolution in a Gaussian white noise model had been studied by Goldenshluger [11]. He worked under Assumption (N) on the Fourier transform of the convolution kernel, or under the assumption that it decreases as a power of $u$ as $|u| \to \infty$, but he assumed that the function $f$ to be estimated belongs to a Sobolev class with unknown parameters. He proposed a rate adaptive estimator under the pointwise risk.

2. The estimator, its bias and variance. Consider the following kernel estimator of $f$:
$$f_n(x) = \frac{1}{n h_n} \sum_{j=1}^{n} K_n\Big(\frac{x - Y_j}{h_n}\Big), \eqno(2)$$
where $h_n > 0$ is a bandwidth and $K_n$ is the function on $\mathbf{R}$ defined as the inverse Fourier transform of
$$\Phi^{K_n}(u) = \frac{I(|u| \le 1)}{\Phi^{\varepsilon}(u/h_n)}. \eqno(3)$$

Here and later $I(\cdot)$ denotes the indicator function. The function $K_n$ is called a kernel but, unlike the usual Parzen–Rosenblatt kernels, it depends on $n$.

For the existence of $K_n$ it is enough that $\Phi^{K_n} \in L_2(\mathbf{R})$ (and then $K_n \in L_2(\mathbf{R})$). This holds under mild assumptions. For example, in view of the continuity property of characteristic functions, the assumption that $\Phi^{\varepsilon}(u) \ne 0$ for all $u \in \mathbf{R}$ is sufficient to have $\Phi^{K_n} \in L_2(\mathbf{R})$. Moreover, the condition $\Phi^{K_n} \in L_2(\mathbf{R})$ implies that the kernel $K_n$ is real-valued. In fact, under this condition we have
$$\Phi^{K_n}(u) = V_n(u)\, \Phi^{\varepsilon}(-u/h_n)$$
for almost all $u \in \mathbf{R}$, where $V_n(u) = I(|u| \le 1)/|\Phi^{\varepsilon}(u/h_n)|^2$ is an even real-valued function belonging to $L_1(\mathbf{R})$, and $\Phi^{\varepsilon}(-u/h_n)$ (the complex conjugate of $\Phi^{\varepsilon}(u/h_n)$) is the Fourier transform of the real-valued function $t \mapsto h_n f^{\varepsilon}(-h_n t)$. This implies that $K_n$ is a convolution of two real-valued functions.
The estimator (2) belongs to the family of kernel deconvolution estimators studied in many papers starting from [18], [3], and [21]. It can also be deduced from a unified approach to the construction of estimators in statistical inverse problems [17].
The following proposition establishes upper bounds on the pointwise and the $L_2$ bias terms, i.e., on the quantities $|\mathbf{E}_f f_n(x) - f(x)|$ and $\|\mathbf{E}_f f_n - f\|_2$.

Proposition 1. Let $f \in \mathscr{A}_{a,r}(L)$, $a > 0$, $r > 0$, $L > 0$, and assume that $\Phi^{K_n} \in L_2(\mathbf{R})$ for any $h_n > 0$. Then the squared bias of $f_n(x)$ is bounded as follows:
$$\sup_{x \in \mathbf{R}} |\mathbf{E}_f f_n(x) - f(x)|^2 \le \frac{L}{2\pi a r}\, h_n^{r-1} \exp\Big(-\frac{2a}{h_n^r}\Big)(1 + o(1))$$
as $h_n \to 0$, while the bias term of the $L_2$-risk satisfies
$$\|\mathbf{E}_f f_n - f\|_2^2 \le L \exp\Big(-\frac{2a}{h_n^r}\Big)$$
for every $h_n > 0$.


Proof. For the pointwise bias we have
$$|\mathbf{E}_f f_n(x) - f(x)|^2 = \Big| \frac{1}{2\pi} \int \big( \Phi^{K_n}(h_n u)\, \Phi^{Y}(u) - \Phi^{f}(u) \big) \exp(-iux)\, du \Big|^2 \le \frac{1}{4\pi^2} \Big( \int I(|u h_n| > 1)\, |\Phi^{f}(u)|\, du \Big)^2.$$
Applying the Cauchy–Schwarz inequality and the assumption that $f$ belongs to $\mathscr{A}_{a,r}(L)$, we get
$$|\mathbf{E}_f f_n(x) - f(x)|^2 \le \frac{1}{4\pi^2} \int_{|u| > 1/h_n} \exp(-2a|u|^r)\, du \times \int_{|u| > 1/h_n} |\Phi^f(u)|^2 \exp(2a|u|^r)\, du \le \frac{L}{2\pi} \int_{|u| > 1/h_n} \exp(-2a|u|^r)\, du, \eqno(4)$$
which together with Lemma 2 (see the Appendix) yields the first inequality of the proposition.

To prove the second inequality, we apply the Plancherel formula and get
$$\|\mathbf{E}_f f_n - f\|_2^2 = \frac{1}{2\pi} \int \big| \Phi^{K_n}(h_n u)\, \Phi^{Y}(u) - \Phi^{f}(u) \big|^2\, du = \frac{1}{2\pi} \int_{|u| > 1/h_n} |\Phi^f(u)|^2\, du \le L \exp\Big(-\frac{2a}{h_n^r}\Big).$$
Proposition 1 is proved.
The next proposition gives upper bounds on the pointwise and the $L_2$ variance terms, defined as
$$\operatorname{Var}_f f_n(x) = \mathbf{E}_f \big[ |f_n(x) - \mathbf{E}_f f_n(x)|^2 \big]$$
and
$$\operatorname{Var}_{f,2}(f_n) = \mathbf{E}_f \big[ \|f_n - \mathbf{E}_f f_n\|_2^2 \big],$$
respectively.

Proposition 2. Let the left inequality in (1) hold and $\Phi^{\varepsilon}(u) \ne 0$ for all $u \in \mathbf{R}$. Then, for any density $f$ such that $\sup_{x \in \mathbf{R}} f(x) \le f^* < \infty$, the pointwise variance of the estimator $f_n(x)$ is bounded as follows:
$$\sup_{x} \operatorname{Var}_f f_n(x) = \sup_{x} \mathbf{E}_f \big[ |f_n(x) - \mathbf{E}_f f_n(x)|^2 \big] \le \min\Big( \frac{f^*\, h_n^{s+2\gamma-1}}{2\pi \beta s\, b_{\min}^2},\ \frac{h_n^{2(s+\gamma-1)}}{\pi^2 \beta^2 s^2 b_{\min}^2} \Big)\, \frac{1}{n}\, \exp\Big(\frac{2\beta}{h_n^s}\Big)(1 + o(1)) \eqno(6)$$
as $h_n \to 0$, and, for an arbitrary density $f$, the variance term of the $L_2$-risk satisfies
$$\operatorname{Var}_{f,2}(f_n) = \mathbf{E}_f \big[ \|f_n - \mathbf{E}_f f_n\|_2^2 \big] \le \frac{h_n^{s+2\gamma-1}}{2\pi \beta s\, b_{\min}^2\, n} \exp\Big(\frac{2\beta}{h_n^s}\Big)(1 + o(1)) \eqno(7)$$
as $h_n \to 0$.

Proof. For the pointwise variance we obtain two separate bounds and then take the minimum of them. To get the first bound, we write
$$\operatorname{Var}_f f_n(x) = \frac{1}{n h_n^2} \operatorname{Var}_f\Big( K_n\Big(\frac{x - Y_1}{h_n}\Big) \Big) \le \frac{1}{n h_n^2}\, \mathbf{E}_f\Big[ K_n^2\Big(\frac{x - Y_1}{h_n}\Big) \Big] \le \frac{f^*}{n h_n} \|K_n\|_2^2, \eqno(8)$$
where we used the fact that the convolution density $f^Y = f * f^{\varepsilon}$ is uniformly bounded by $f^*$. Applying the Plancherel formula and using (1) and (25) of Lemma 2 in the Appendix, we get
$$\|K_n\|_2^2 = \frac{1}{2\pi} \int_{|u| \le 1} |\Phi^{\varepsilon}(u/h_n)|^{-2}\, du \le \frac{h_n}{\pi b_{\min}^2} \int_{u_0}^{1/h_n} u^{-2\gamma} \exp(2\beta u^s)\, du + O(h_n) = \frac{h_n^{s+2\gamma}}{2\pi \beta s\, b_{\min}^2} \exp\Big(\frac{2\beta}{h_n^s}\Big)(1 + o(1)). \eqno(9)$$
This and (8) imply the first bound in (6). For the second bound, note that
$$\operatorname{Var}_f f_n(x) \le \frac{1}{n}\, \mathbf{E}_f \big[ K_{1,n}^2(x - Y_1) \big] \le \frac{1}{n} \|K_{1,n}\|_{\infty}^2 \le \frac{1}{n} \Big( \frac{1}{2\pi} \int |\Phi^{K_{1,n}}(u)|\, du \Big)^2,$$
where $K_{1,n}(\cdot) = K_n(\cdot/h_n)/h_n$. Using that $\Phi^{K_{1,n}}(u) = \Phi^{K_n}(h_n u)$ and then acting similarly to (9), we get
$$\operatorname{Var}_f f_n(x) \le \frac{1}{4\pi^2 n} \Big( \int_{|u| \le 1/h_n} |\Phi^{\varepsilon}(u)|^{-1}\, du \Big)^2 \le \frac{h_n^{2(s+\gamma-1)}}{\pi^2 \beta^2 s^2 b_{\min}^2\, n} \exp\Big(\frac{2\beta}{h_n^s}\Big)(1 + o(1)),$$
which yields the second bound in (6). Finally,
$$\operatorname{Var}_{f,2}(f_n) \le \frac{1}{n h_n^2} \int \mathbf{E}_f\Big[ K_n^2\Big(\frac{x - Y_1}{h_n}\Big) \Big]\, dx = \frac{1}{n h_n} \|K_n\|_2^2,$$
and in view of (9) we obtain (7). Proposition 2 is proved.


Clearly, the bounds of Proposition 2 can be applied to $f \in \mathscr{A}_{a,r}(L)$ with, for example,
$$f^* = \sup_{f \in \mathscr{A}_{a,r}(L)}\, \sup_{x} |f(x)|.$$
This value is finite and can be taken as in Lemma 1 of the Appendix.

Interestingly, inequality (6) shows that the asymptotics of the pointwise variance are different for $0 < s < 1$ and $s > 1$, while this is not the case

for the $L_2$ variance term given by (7). Inequality (6) can be compared to the recent result of van Es and Uh [20]. They studied the asymptotic pointwise variance of the same deconvolution kernel estimator in the particular case of stable noise distributions with $\frac{1}{2} < s \le 2$, and also noticed that $s = 1$ marks a change of behavior. These effects concerning the variance terms will not be crucial in what follows, since we will consider the bias dominated case.
Let us note that for practical implementation we can improve the kernel by smoothing its Fourier transform. One possibility is the following: let $K_n$ be the kernel defined via its Fourier transform
$$\Phi^{K_n}(u) = \frac{W_n(u)}{\Phi^{\varepsilon}(u/h_n)}, \qquad W_n(u) = I(|u| \le 1 - \delta_n) + \frac{1 - |u|}{\delta_n}\, I(1 - \delta_n < |u| \le 1),$$
for some $\delta_n \to 0$ such that $\delta_n/h_n \to 0$ and $\delta_n/h_n^s \to 0$ as $n \to \infty$. Then $|K_n(x)| = O(1/|x|^2)$ as $|x| \to \infty$, $K_n$ is integrable for any $n$, and the resulting estimator is also integrable. Moreover, by further smoothing the function $W_n$ we can obtain a Fourier transform $\Phi^{K_n}$ as smooth as $\Phi^{\varepsilon}$, and the resulting kernel estimator will have as many finite absolute moments as the noise distribution.
3. Optimal bandwidths and upper bounds for the risks. Propositions 1 and 2 lead to upper bounds for the pointwise and $L_2$ risks that can be minimized in $h_n$. In this section we give an asymptotic approximation for the result of such a minimization, assuming that $r < s$. The corresponding solutions $h_n$ will be called optimal bandwidths. Note that here we consider only optimization within a given class of estimators; moreover, we minimize upper bounds on the risks and not the exact risks. However, this turns out to be precise enough in the asymptotic sense: in the next section we will show that the estimator $f_n$ with optimal bandwidth is sharp minimax over all possible estimators.
Decomposition of the mean squared error of the kernel estimator into bias and variance terms and application of Propositions 1 and 2 yield
$$\mathbf{E}_f\big[ |f_n(x) - f(x)|^2 \big] = |\mathbf{E}_f f_n(x) - f(x)|^2 + \operatorname{Var}_f f_n(x) \le \Big[ \frac{L}{2\pi a r}\, h_n^{r-1} \exp\Big(-\frac{2a}{h_n^r}\Big) + \frac{f^*\, h_n^{s+2\gamma-1}}{2\pi \beta s\, b_{\min}^2\, n} \exp\Big(\frac{2\beta}{h_n^s}\Big) \Big](1 + o(1)).$$
We now minimize the last expression in $h_n$. Clearly, the minimizer $h_n$ tends to 0 as $n \to \infty$. Taking derivatives with respect to $h_n$ and neglecting the smaller terms leads us to the equation for the optimal bandwidth
$$c\, n\, h_n^{-2\gamma}(1 + o(1)) = \exp\Big( \frac{2a}{h_n^r} + \frac{2\beta}{h_n^s} \Big) \eqno(10)$$

(asymptotics are taken as $h_n \to 0$, $n \to \infty$), with some constant $c > 0$. Taking logarithms in the above equation, we obtain that the optimal bandwidth $h_n$ is a solution in $h$ of the equation
$$2\gamma \ln h + \frac{2a}{h^r} + \frac{2\beta}{h^s} = \ln n + C(1 + o(1)). \eqno(11)$$
Here and in what follows we denote by $C$ constants with values in $\mathbf{R}$ that can be different on different occasions. For the bandwidth $h = h_n$ satisfying (10) and (11) we can write
$$\frac{h_n^{s+2\gamma-1}}{n} \exp\Big(\frac{2\beta}{h_n^s}\Big) = C(1 + o(1))\, h_n^{s-1} \exp\Big(-\frac{2a}{h_n^r}\Big)$$
with some constant $C > 0$. This proves that, for the optimal bandwidth, the bias term dominates the variance term whenever $r < s$. (Strictly speaking, here we consider upper bounds on the bias and variance terms and not precisely these terms.)
Similarly, for the $L_2$-risk we get
$$\mathbf{E}_f\big[ \|f_n - f\|_2^2 \big] = \|\mathbf{E}_f f_n - f\|_2^2 + \operatorname{Var}_{f,2}(f_n) \le L \exp\Big(-\frac{2a}{h_n^r}\Big) + \frac{h_n^{s+2\gamma-1}}{2\pi \beta s\, b_{\min}^2\, n} \exp\Big(\frac{2\beta}{h_n^s}\Big)(1 + o(1)),$$
and the minimizer $h_n = h_n(L_2)$ of the last expression is a solution in $h$ of the equation
$$(r + 2\gamma - 1) \ln h + \frac{2a}{h^r} + \frac{2\beta}{h^s} = \ln n + C(1 + o(1)). \eqno(12)$$
Now, this equation implies
$$\frac{h_n^{s+2\gamma-1}(L_2)}{n} \exp\Big(\frac{2\beta}{h_n^s(L_2)}\Big) = C(1 + o(1))\, h_n^{s-r}(L_2) \exp\Big(-\frac{2a}{h_n^r(L_2)}\Big)$$
for some constant $C > 0$. This proves that also for the $L_2$-risk the bias term dominates the variance term whenever $r < s$.


Thus we obtain two different equations, (11) and (12), that define optimal bandwidths for the pointwise and $L_2$ risks, respectively, and in both cases the bias terms are asymptotically dominating.

In fact, we can obtain the same results using a single bandwidth defined as follows. Denote by $h_* = h_*(n)$ the unique solution of the equation
$$\frac{2a}{h^r} + \frac{2\beta}{h^s} = \ln n - (\ln \ln n)^2 \eqno(13)$$

(in what follows we will assume without loss of generality that $n \ge 3$, to ensure that $\ln n > (\ln \ln n)^2$). Lemma 4 in the Appendix implies that, both for the pointwise and the $L_2$ loss, the bias terms of the estimator $f_n$ with bandwidth $h_*$ given by (13) are of the same order as those corresponding to the bandwidths $h_n$ and $h_n(L_2)$, while the variance terms corresponding to (13) are asymptotically smaller. Thus, the pointwise risk and the $L_2$-risk of the estimator $f_n$ with bandwidth $h_*$ given by (13) are asymptotically of the same order as those for the estimators $f_n$ with optimal bandwidths $h_n$ and $h_n(L_2)$, respectively.
Note that, in fact, $h_*$ is better than both bandwidths $h_n$ and $h_n(L_2)$ in the variance terms, but these terms are asymptotically negligible with respect to the bias ones (cf. Lemma 4). Therefore, the improvement does not appear in the main term of the asymptotics. Note also that the sequence $(\ln \ln n)^2$ in (13) can be replaced by any sequence $b_n$ satisfying $b_n = o((\ln n)^{1 - r/s})$ and $b_n/\ln \ln n \to \infty$, and the above argument remains valid (cf. the proof of Lemma 4).
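Numerically, the bandwidth defined by (13) is easy to compute, since the left-hand side of (13) is strictly decreasing in $h$; the following sketch (with arbitrary illustrative parameter values) finds it by bisection:

```python
import math

def h_star(n, a, r, beta, s):
    # Solve 2a/h^r + 2*beta/h^s = ln n - (ln ln n)^2 for h by bisection;
    # the left-hand side decreases monotonically from +inf to 0 as h grows.
    target = math.log(n) - math.log(math.log(n)) ** 2
    lhs = lambda h: 2.0 * a / h**r + 2.0 * beta / h**s
    lo, hi = 1e-8, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if lhs(mid) > target:
            lo = mid      # mid is too small: lhs still above the target
        else:
            hi = mid      # mid is too large
    return 0.5 * (lo + hi)

h = h_star(n=10**6, a=1.0, r=1.0, beta=0.5, s=2.0)
print(round(h, 3))  # about 0.551 for these parameters
```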
Calculating the upper bounds for the bias terms of the estimator $f_n$ with bandwidth (13), we get the following asymptotic upper bounds for its pointwise and $L_2$ risks, respectively:
$$\varphi_n^2 = \frac{L}{2\pi a r}\, h_*^{r-1} \exp\Big(-\frac{2a}{h_*^r}\Big)(1 + o(1)) = \frac{L}{2\pi a r} \Big(\frac{\ln n}{2\beta}\Big)^{(1-r)/s} \exp\Big(-\frac{2a}{h_*^r}\Big)(1 + o(1)) \eqno(14)$$
and
$$\varphi_n^2(L_2) = L \exp\Big(-\frac{2a}{h_*^r}\Big). \eqno(15)$$
The above remarks can be summarized as follows.

Theorem 1. Let $a > 0$, $L > 0$, $0 < r < s < \infty$, let the left inequality in (1) hold, and let $\Phi^{\varepsilon}(u) \ne 0$ for all $u \in \mathbf{R}$. Then the kernel estimator $f_n$ with bandwidth defined by (13) satisfies the following pointwise and $L_2$ risk bounds:
$$\limsup_{n \to \infty}\, \sup_{x \in \mathbf{R}} R_n(x, f_n, \mathscr{A}_{a,r}(L))\, \varphi_n^{-2} \le 1, \eqno(16)$$
$$\limsup_{n \to \infty} R_n(L_2, f_n, \mathscr{A}_{a,r}(L))\, \varphi_n^{-2}(L_2) \le 1, \eqno(17)$$
where the rates $\varphi_n$ and $\varphi_n(L_2)$ are given in (14) and (15).


The case $r = 1$ and $s = 2$ is of particular interest. It covers the situation where the noise density $f^{\varepsilon}$ is Gaussian ($s = 2$) and the underlying density $f$ admits an analytic continuation into a strip of the complex plane ($r = 1$), as is the case for the statistically famous densities mentioned in the introduction. This case lies in the zone $r \le s/2$, where we get the following behavior:
$$\varphi_n^2 = \begin{cases} \dfrac{L}{2\pi a r} \Big(\dfrac{\ln n}{2\beta}\Big)^{(1-r)/s} \exp\Big( -2a \Big(\dfrac{\ln n}{2\beta}\Big)^{r/s} \Big)(1 + o(1)) & \text{if } r < \dfrac{s}{2}, \\[3mm] \dfrac{L}{2\pi a r} \Big(\dfrac{\ln n}{2\beta}\Big)^{(1-r)/s} \exp\Big( -2a \sqrt{\dfrac{\ln n}{2\beta}} + \dfrac{a^2}{\beta} \Big)(1 + o(1)) & \text{if } r = \dfrac{s}{2}, \end{cases} \eqno(18)$$
and
$$\varphi_n^2(L_2) = \begin{cases} L \exp\Big( -2a \Big(\dfrac{\ln n}{2\beta}\Big)^{r/s} \Big)(1 + o(1)) & \text{if } r < \dfrac{s}{2}, \\[3mm] L \exp\Big( -2a \sqrt{\dfrac{\ln n}{2\beta}} + \dfrac{a^2}{\beta} \Big)(1 + o(1)) & \text{if } r = \dfrac{s}{2}. \end{cases} \eqno(19)$$
The bandwidth (13) depends on the parameters $a$, $r$ of the class $\mathscr{A}_{a,r}(L)$, which are not known in practice. However, it is possible to construct an adaptive estimator that does not depend on these parameters and attains the same asymptotic behavior as in Theorem 1, both for the pointwise and $L_2$ risks, when $r < s/2$. Define the set of parameters
$$\Theta = \Big\{ (a, L, r)\colon a > 0,\ L > 0,\ 0 < r < \frac{s}{2} \Big\}.$$
Note that the parameters $s$ and $\beta$ are supposed to be known, since they characterize the known noise density $f^{\varepsilon}$.
Theorem 2. Suppose that the left inequality in (1) holds and $\Phi^{\varepsilon}(u) \ne 0$ for all $u \in \mathbf{R}$. Let $f_n^*$ be the kernel estimator defined in (2) with bandwidth $h_n = h^*$ defined by
$$h^* = \Big( \frac{\ln n}{2\beta} - \Big(\frac{\ln n}{2\beta}\Big)^{1/2} \Big)^{-1/s} \eqno(20)$$
for $n$ large enough, so that $(\ln n)/(2\beta) > 1$. Then, for all $(a, L, r) \in \Theta$,
$$\limsup_{n \to \infty}\, \sup_{x \in \mathbf{R}} R_n(x, f_n^*, \mathscr{A}_{a,r}(L))\, \varphi_n^{-2} \le 1$$
and
$$\limsup_{n \to \infty} R_n(L_2, f_n^*, \mathscr{A}_{a,r}(L))\, \varphi_n^{-2}(L_2) \le 1,$$
where the rates $\varphi_n$ and $\varphi_n(L_2)$ are given in (14) and (15) (and, more particularly, satisfy (18) and (19) with $r < s/2$).


Proof. Since $r/s < \frac{1}{2}$, we have
$$\frac{2a}{(h^*)^r} = 2a \Big(\frac{\ln n}{2\beta}\Big)^{r/s} \Big( 1 - \Big(\frac{2\beta}{\ln n}\Big)^{1/2} \Big)^{r/s} = 2a \Big(\frac{\ln n}{2\beta}\Big)^{r/s} + o(1)$$
for $n$ large enough, and thus
$$\exp\Big(-\frac{2a}{(h^*)^r}\Big) = \exp\Big( -2a \Big(\frac{\ln n}{2\beta}\Big)^{r/s} \Big)(1 + o(1)).$$
On the other hand,
$$\frac{1}{n} \exp\Big(\frac{2\beta}{(h^*)^s}\Big) = \exp\Big( \frac{2\beta}{(h^*)^s} - \ln n \Big) = \exp\big( -\sqrt{2\beta \ln n}\, \big).$$
Therefore, the ratio of the bias term of $f_n^*$ to the variance term of $f_n^*$, both for the pointwise risk and for the $L_2$-risk, is bounded from below by
$$c\, (\ln n)^{b} \exp\Big( \sqrt{2\beta \ln n} - 2a \Big(\frac{\ln n}{2\beta}\Big)^{r/s} \Big)$$
for some $b \in \mathbf{R}$ and $c > 0$. This expression tends to $\infty$ as $n \to \infty$, since $r/s < \frac{1}{2}$. Thus, the variance terms are asymptotically negligible with respect to the bias terms. It remains to check that the bias terms of $f_n^*$ for both risks are asymptotically bounded by $\varphi_n^2$ and $\varphi_n^2(L_2)$, respectively.

In view of Proposition 1, for $n$ large enough the bias term of $f_n^*$ for the pointwise risk is bounded from above by
$$\frac{L(1 + o(1))}{2\pi a r} \Big(\frac{\ln n}{2\beta}\Big)^{(1-r)/s} \exp\Big( -2a \Big(\frac{\ln n}{2\beta}\Big)^{r/s} \Big( 1 - \Big(\frac{2\beta}{\ln n}\Big)^{1/2} \Big)^{r/s} \Big) \le \frac{L(1 + o(1))}{2\pi a r} \Big(\frac{\ln n}{2\beta}\Big)^{(1-r)/s} \exp\Big( -2a \Big(\frac{\ln n}{2\beta}\Big)^{r/s} + c \Big(\frac{\ln n}{2\beta}\Big)^{r/s - 1/2} \Big) = \varphi_n^2 (1 + o(1)),$$
where $c > 0$ is a constant and we have used (18) with $r < s/2$ for the last equality. Similarly, for $n$ large enough the bias term of $f_n^*$ for the $L_2$-risk is bounded from above by
$$L \exp\Big( -2a \Big(\frac{\ln n}{2\beta}\Big)^{r/s} + c \Big(\frac{\ln n}{2\beta}\Big)^{r/s - 1/2} \Big) = \varphi_n^2(L_2)(1 + o(1)),$$
where $c > 0$ and we have used (19) with $r < s/2$ for the last equality. Theorem 2 is proved.
If $r = s/2$, adaptation to $(a, L)$ is still possible via a procedure similar to that of Theorem 2, but it does not attain the exact constant, as the following result shows. Introduce the set
$$\Theta_0 = \{ (a, L)\colon 0 < a \le a_0,\ L > 0 \},$$
where $a_0 > 0$ is a constant.

Theorem 3. Suppose that the left inequality in (1) holds and $\Phi^{\varepsilon}(u) \ne 0$ for all $u \in \mathbf{R}$. Let $f_n^*$ be the kernel estimator defined in (2) with bandwidth $h_n = h^*$ defined by
$$h^* = \Big( \frac{\ln n}{2\beta} - \frac{A}{\beta} \Big(\frac{\ln n}{2\beta}\Big)^{1/2} \Big)^{-1/s},$$
where $A > a_0$ and $n$ is large enough, so that $(\ln n)/(2\beta) > (A/\beta)^2$. Then, for $r = s/2$ and for all $(a, L) \in \Theta_0$,
$$\limsup_{n \to \infty}\, \sup_{x \in \mathbf{R}} R_n(x, f_n^*, \mathscr{A}_{a,r}(L))\, \varphi_n^{-2} \le \exp\Big( \frac{aA}{\beta} - \frac{a^2}{\beta} \Big), \eqno(21)$$
$$\limsup_{n \to \infty} R_n(L_2, f_n^*, \mathscr{A}_{a,r}(L))\, \varphi_n^{-2}(L_2) \le \exp\Big( \frac{aA}{\beta} - \frac{a^2}{\beta} \Big), \eqno(22)$$
where the rates $\varphi_n$ and $\varphi_n(L_2)$ are given in (18) and (19).

Proof. It is easily checked that for the bias exponent
$$\exp\Big(-\frac{2a}{(h^*)^r}\Big) = \exp\Big( -2a \sqrt{\frac{\ln n}{2\beta}} + \frac{aA}{\beta} \Big)(1 + o(1)),$$
while for the variance term exponent
$$\frac{1}{n} \exp\Big(\frac{2\beta}{(h^*)^s}\Big) = \exp\Big( -2A \sqrt{\frac{\ln n}{2\beta}} \Big).$$
Since $A > a$, the bias term of $f_n^*$ asymptotically dominates its variance term. Inequalities (21) and (22) now follow from these remarks and the expressions for $\varphi_n^2$, $\varphi_n^2(L_2)$ in (18) and (19) with $r = s/2$. Theorem 3 is proved.

Appendix.

Lemma 1. For $0 < a, r, L < \infty$,
$$\sup_{f \in \mathscr{A}_{a,r}(L)}\, \sup_{x \in \mathbf{R}} f(x) \le L + \pi^{-1} C(r, a),$$
where $C(r, a) = \int_0^{\infty} \exp(-2au^r)\, du$.

Proof. Let $\Phi = \Phi^f$ be the characteristic function of $f$. Clearly,
$$f(x) \le \frac{1}{2\pi} \int |\Phi(u)|\, du \qquad \text{for all } x \in \mathbf{R}. \eqno(23)$$
By Markov's inequality,
$$\int |\Phi(u)|\, I\big( |\Phi(u)| \exp(2a|u|^r) > 1 \big)\, du \le \int \exp(2a|u|^r) |\Phi(u)|^2\, du \le 2\pi L.$$
Also,
$$\int |\Phi(u)|\, I\big( |\Phi(u)| \exp(2a|u|^r) \le 1 \big)\, du \le \int \exp(-2a|u|^r)\, du = 2C(r, a).$$
Combining the last two inequalities with (23) proves the lemma.

Lemma 2. For any positive $a$, $\beta$, $r$, $s$ and for any $A \in \mathbf{R}$ and $B \in \mathbf{R}$, we have
$$\int_v^{\infty} u^A \exp(-a u^r)\, du = \frac{1}{ar}\, v^{A - r + 1} \exp(-a v^r)(1 + o(1)), \qquad v \to \infty, \eqno(24)$$
and
$$\int_0^{v} u^B \exp(\beta u^s)\, du = \frac{1}{\beta s}\, v^{B - s + 1} \exp(\beta v^s)(1 + o(1)), \qquad v \to \infty. \eqno(25)$$
The proof of this lemma is omitted. It is based on integration by parts and standard evaluation of integrals.
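The asymptotics (24) are easy to check numerically; a quick sketch for $A = 0$, $a = 1$, $r = 2$ (our illustrative choice) compares the tail integral with the right-hand side:

```python
import math

def tail_integral(v, upper=30.0, steps=200_000):
    # Midpoint Riemann sum for int_v^upper exp(-u^2) du; the remainder
    # beyond `upper` is negligible at this scale.
    du = (upper - v) / steps
    return sum(math.exp(-((v + (k + 0.5) * du) ** 2)) for k in range(steps)) * du

v = 4.0
exact = tail_integral(v)
approx = math.exp(-v * v) / (2.0 * v)   # (a r)^{-1} v^{A-r+1} exp(-a v^r) with A=0, a=1, r=2
print(exact / approx)  # the 1 + o(1) factor; about 0.97 already at v = 4
```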
Lemma 3. Let $p$ be the density of the stable symmetric distribution with characteristic function $\exp(-|t|^r)$, $1 < r < 2$. Then $p$ is continuous, $p(x) > 0$ for all $x \in \mathbf{R}$, and there exist $c_1 > 0$, $c_2 > 0$ such that
$$p(x) \ge c_1 |x|^{-r-1} \qquad \text{for } |x| \ge c_2.$$

Proof. From [22, Theorem 2.2.3, formula (2.2.18)] we get
$$p(x) = \frac{r\, |x|^{1/(r-1)}}{2(r-1)} \int_0^1 U(\varphi) \exp\big( -|x|^{r/(r-1)} U(\varphi) \big)\, d\varphi, \qquad x \ne 0, \eqno(26)$$
where
$$U(\varphi) = \Big( \frac{\sin(\pi r \varphi/2)}{\cos(\pi \varphi/2)} \Big)^{r/(1-r)}\, \frac{\cos(\pi (r-1) \varphi/2)}{\cos(\pi \varphi/2)}.$$
Clearly, for $\varphi \in [\frac{1}{2}, 1]$ we have
$$1 \ge \cos\Big( \frac{\pi (r-1) \varphi}{2} \Big) \ge \cos\Big( \frac{\pi (r-1)}{2} \Big) > 0, \qquad c_3 \ge \sin\Big( \frac{\pi r \varphi}{2} \Big) \ge c_4 > 0,$$
where $c_3 > 0$ and $c_4 > 0$ are constants. Thus,
$$c_6 \Big( \cos \frac{\pi \varphi}{2} \Big)^{1/(r-1)} \le U(\varphi) \le c_5 \Big( \cos \frac{\pi \varphi}{2} \Big)^{1/(r-1)},$$
$\varphi \in [\frac{1}{2}, 1]$, where $c_5 > 0$, $c_6 > 0$ are constants. Now, if $\varphi \in [\frac{1}{2}, 1]$, then
$$c_7 (1 - \varphi) \le \cos \frac{\pi \varphi}{2} \le c_8 (1 - \varphi)$$
for some $c_7 > 0$, $c_8 > 0$. Finally,
$$c_9 (1 - \varphi)^{1/(r-1)} \le U(\varphi) \le c_{10} (1 - \varphi)^{1/(r-1)}, \qquad \varphi \in \Big[\frac{1}{2}, 1\Big].$$

Using (26) and the fact that $U(\varphi) \ge 0$ for $\varphi \in [0, 1]$, we get
$$p(x) \ge c\, |x|^{1/(r-1)} \int_{1/2}^1 (1 - \varphi)^{1/(r-1)} \exp\big( -c_{10} |x|^{r/(r-1)} (1 - \varphi)^{1/(r-1)} \big)\, d\varphi = c\, |x|^{1/(r-1)} \int_0^{1/2} \varphi^{1/(r-1)} \exp\big( -c_9 (|x|^r \varphi)^{1/(r-1)} \big)\, d\varphi.$$
Here and further on, $c > 0$ denotes constants, possibly different on different occasions. By the change of variables $u = (|x|^r \varphi)^{1/(r-1)}$, we get
$$p(x) \ge c\, |x|^{1/(r-1)} \int_0^{(|x|^r/2)^{1/(r-1)}} \frac{u}{|x|^{r/(r-1)}} \exp(-c_9 u)\, \frac{(r-1) u^{r-2}}{|x|^r}\, du = c\, |x|^{-r-1} \int_0^{(|x|^r/2)^{1/(r-1)}} u^{r-1} \exp(-c_9 u)\, du \ge c\, |x|^{-r-1} \int_0^1 u^{r-1} \exp(-c_9 u)\, du \ge c_1 |x|^{-r-1}$$
for $|x| \ge c_2 > 0$. This also implies that $p(x) > 0$ for all $x \ne 0$, and
$$p(0) = (2\pi)^{-1} \int \exp(-|t|^r)\, dt \ne 0;$$
hence $p$ is positive on $\mathbf{R}$. Lemma 3 is proved.


Lemma 4. Let $0 < r < s < \infty$ and let $h_* = h_*(n)$ be defined by (13), i.e.,
$$\frac{2a}{h_*^r} + \frac{2\beta}{h_*^s} = \ln n - (\ln \ln n)^2.$$
Let $h_n$ satisfy
$$b \ln h_n + \frac{2a}{h_n^r} + \frac{2\beta}{h_n^s} = \ln n + C(1 + o(1)), \qquad n \to \infty,$$
for some $b \in \mathbf{R}$ and $C \in \mathbf{R}$. Then, as $n \to \infty$, we have
$$h_n = \Big(\frac{2\beta}{\ln n}\Big)^{1/s}(1 + o(1)), \qquad h_* = \Big(\frac{2\beta}{\ln n}\Big)^{1/s}(1 + o(1)), \eqno(27)$$
$$h_n^{\alpha} \exp\Big(-\frac{2a}{h_n^r}\Big) = h_*^{\alpha} \exp\Big(-\frac{2a}{h_*^r}\Big)(1 + o(1)), \eqno(28)$$
$$h_n^{\alpha} = h_*^{\alpha}(1 + o(1)) \eqno(29)$$
for any $\alpha \in \mathbf{R}$, and
$$h_*^{s + 2\gamma - 1} \exp\Big(\frac{2\beta}{h_*^s}\Big) \le h_n^{s + 2\gamma - 1} \exp\Big(\frac{2\beta}{h_n^s}\Big) \eqno(30)$$
for $n$ large enough.

Proof. Define $x_* = h_*^{-s}$, $x_n = h_n^{-s}$, and write, for $t > 0$,
$$F(t) = 2\beta t + 2a t^{r/s}, \qquad F_1(t) = -\frac{b}{s} \ln t + 2\beta t + 2a t^{r/s}.$$
Then
$$F(x_*) = \ln n - (\ln \ln n)^2, \eqno(31)$$
$$F_1(x_n) = \ln n + C(1 + o(1)) \eqno(32)$$
for a constant $C \in \mathbf{R}$. We first prove that $x_n$ satisfies
$$F(x_n) = \ln n + C_1 \ln \ln n\, (1 + o(1)) + C_2 (1 + o(1)) \eqno(33)$$
for some constants $C_1, C_2 \in \mathbf{R}$. In fact,
$$F_1'(x) = -\frac{b}{s}\, \frac{1}{x} + 2\beta + \frac{2ar}{s}\, x^{r/s - 1} > 0$$
for $x$ large enough; thus $F_1(t)$ is strictly monotone increasing for large $t$, and a solution $x_n$ of (32) exists for large $n$ (and is unique). Next, clearly, $F_1(x_n) = 2\beta x_n (1 + o(1))$, and therefore $(\ln n)/(2\beta x_n) \to 1$ as $n \to \infty$. Similarly, $(\ln n)/(2\beta x_*) \to 1$ as $n \to \infty$, which yields (27). Thus $(-b/s) \ln x_n = (-b/s) \ln \ln n\, (1 + o(1))$ as $n \to \infty$, and we write $F(x_n) = F_1(x_n) + (b/s) \ln x_n$ to get (33) in view of (32).

We have
$$x_n = F^{-1}(\ln n + a_n), \qquad x_* = F^{-1}(\ln n - b_n),$$
where $a_n = C_1 \ln \ln n\, (1 + o(1)) + C_2(1 + o(1)) = O(\ln \ln n)$, $b_n = (\ln \ln n)^2$, and $F^{-1}(\cdot)$ is the inverse of $F(\cdot)$. Hence, for some $0 < \tau < 1$ and for $n$ large enough,
$$x_n = F^{-1}(\ln n + a_n) = x_* + \big( F^{-1}(\ln n - b_n) \big)' (a_n + b_n) + \frac{1}{2} \big( F^{-1}(\ln n - b_n(1 - \tau) + \tau a_n) \big)'' (a_n + b_n)^2. \eqno(34)$$
The first and the second derivatives of $F^{-1}$ are given by
$$\big( F^{-1}(y) \big)' = \frac{1}{F'(F^{-1}(y))} = \frac{1}{2\beta + (2ar/s)(F^{-1}(y))^{r/s - 1}},$$
$$\big( F^{-1}(y) \big)'' = -\frac{(2ar/s)(r/s - 1)(F^{-1}(y))^{r/s - 2}}{\big( 2\beta + (2ar/s)(F^{-1}(y))^{r/s - 1} \big)^3}.$$
Hence
$$\big( F^{-1}(\ln n - b_n) \big)' = \frac{1}{2\beta + (2ar/s)\, x_*^{r/s - 1}} = \frac{1}{2\beta} + o(1), \qquad n \to \infty. \eqno(35)$$
Next, it is easy to show that there exists $\bar{y} > 0$ such that
$$\big| \big( F^{-1}(y) \big)'' \big| \le C\, y^{r/s - 2} \eqno(36)$$
for $y \ge \bar{y}$. Considering $n$ large enough so that $y_n \stackrel{\mathrm{def}}{=} \ln n - b_n(1 - \tau) + \tau a_n \ge \bar{y}$, and using the above expression for $(F^{-1}(y))''$ and (36), we get
$$\big( F^{-1}(y_n) \big)'' = \big( F^{-1}(\ln n - b_n(1 - \tau) + \tau a_n) \big)'' = O\big( (\ln n)^{r/s - 2} \big)$$
as $n \to \infty$. This and (34), (35) imply
$$x_n - x_* = \frac{1}{2\beta}(1 + o(1))(a_n + b_n) + O\big( (\ln n)^{r/s - 2} (a_n + b_n)^2 \big) = \frac{b_n}{2\beta}(1 + o(1)). \eqno(37)$$
Using this representation we obtain
$$\exp\Big( -\frac{2a}{h_n^r} + \frac{2a}{h_*^r} \Big) = \exp\big( -2a (x_n^{r/s} - x_*^{r/s}) \big) = \exp\Big( -2a\, x_*^{r/s} \Big( \big[ 1 + b_n (2\beta x_*)^{-1} (1 + o(1)) \big]^{r/s} - 1 \Big) \Big) = \exp\big( O(b_n\, x_*^{r/s - 1}) \big) = 1 + o(1),$$
since $b_n = (\ln \ln n)^2$, $x_* = (2\beta)^{-1} \ln n\, (1 + o(1))$, and $r < s$. This and the fact that $(h_n/h_*)^{\alpha} = (x_*/x_n)^{\alpha/s} = 1 + o(1)$ imply (28). Next, (29) follows directly from the definition of $h_*$ and from (27). To prove (30), note that, in view of (37),
$$\Big( \frac{h_*}{h_n} \Big)^{s + 2\gamma - 1} \exp\Big( \frac{2\beta}{h_*^s} - \frac{2\beta}{h_n^s} \Big) = (1 + o(1)) \exp\big( 2\beta (x_* - x_n) \big) = (1 + o(1)) \exp\big( -b_n [1 + o(1)] \big) \le 1$$
for $n$ large enough. Lemma 4 is proved.

Acknowledgment. The results of this paper were presented at the conference «Rencontres de statistiques mathématiques», CIRM, Luminy, 2001. Later, Fabienne Comte and Marie-Luce Taupin suggested a different estimator for the same problem, refraining from studying the optimality of rates issue, in [5]. We would like to thank them for a discussion of the results.

REFERENCES

1. Artiles L. M. Adaptive minimax estimation in classes of smooth functions. PhD Thesis. Utrecht: University of Utrecht, 2001.

2. Belitser E., Levit B. Asymptotically local minimax estimation of infinitely smooth density with censored data. — Ann. Inst. Statist. Math., 2001, v. 53, № 2, p. 289–306.
3. Carroll R. J., Hall P. Optimal rates of convergence for deconvolving a density. —
J. Amer. Statist. Assoc., 1988, v. 83, №404, p. 1184-1186.
4. Cavalier L., Golubev G. K., Lepski O. V., Tsybakov A. B. Block thresholding and sharp adaptive estimation in severely ill-posed inverse problems. — Теория вероятн. и ее примен., 2003, т. 48, в. 3, p. 534–556.
5. Comte F., Taupin M.-L. Penalized contrast estimator for density deconvolution with
mixing variable. Prepublication d'Orsay № 2003-30. Paris: MAP5, Universite Paris V,
2003.
6. Efromovich S. Density estimation in the case of supersmooth measurement error. —
J. Amer. Statist. Assoc., 1997, v. 92, №438, p. 526-535.
7. Efromovich S., Koltchinskii V. On inverse problems with unknown operators. — IEEE Trans. Inform. Theory, 2001, v. 47, № 7, p. 2876–2894.
8. Ermakov M. S. A minimax estimator of the solution of an ill-posed convolution-type problem. — Problemy Peredachi Informatsii, 1989, v. 25, № 3, p. 28–39.
9. Fan J. On the optimal rates of convergence for nonparametric deconvolution prob­
lems. — Ann. Statist., 1991, v. 19, №3, p. 1257-1272.
10. Fan J. Global behavior of deconvolution kernel estimates. — Statist. Sinica, 1991,
v. 1, №2, p. 541-551.
11. Goldenshluger A. On pointwise adaptive nonparametric deconvolution. — Bernoulli,
1999, v. 5, № 5, p. 907-925.
12. Golubev G. K., Khasminskii R. Z. Statistical approach to the Cauchy problem for the
Laplace equation. — State of the Art in Probability and Statistics (Leiden, 1999),
Festschrift for W . R. van Zwet. Ed. by M. de Gunst, C. Klaassen, and A. van der Vaart.
Beachwood: Inst. Math. Statist., 2001, p. 419-433. (IMS Lecture Notes Monogr. Ser.,'
v. 36.)
13. Ibragimov I. A., Khas'minskii R. Z. More on estimation of distribution densities. — Zapiski Nauchn. Sem. LOMI, 1983, v. 108, p. 72–88.
14. Masry Е. Multivariate probability density deconvolution for stationary random pro­
cesses. — IEEE Trans. Inform. Theory, 1991, v. 37, №4, p. 1105-1115.
15. Pensky M., Vidakovic B. Adaptive wavelet estimator for nonparametric density deconvolution. — Ann. Statist., 1999, v. 27, № 6, p. 2033–2053.
16. Ritov Y. On a deconvolution of normal distributions. Preprint. Berkeley: University
of Berkeley, 1987.
17. Ruymgaart F. H. A unified approach to inversion problems in statistics. — Math.
Methods Statist., 1993, v. 2, № 2 , p. 130-146.
18. Stefanski L. A., Carroll R. J. Deconvoluting kernel density estimators. — Statistics,
1990, v. 21, №2, p. 169-184.
19. Tsybakov A. B. On the best rate of adaptive estimation in some inverse problems. —
C . R . Acad. Sci. Paris, 2000, v. 330, №9, p. 835-840.
20. van Es A. J., Uh H.-W. Asymptotic normality of nonparametric kernel type deconvo­
lution density estimators: crossing the Cauchy boundary. — J. Nonparametr. Statist.,
2004, v. 16, № 1-2, p. 261-277.
21. Zhang C. H. Fourier methods for estimating mixing densities and distributions. — Ann. Statist., 1990, v. 18, № 2, p. 806–831.
22. Zolotarev V. M. One-dimensional Stable Distributions. Providence, RI: Amer. Math. Soc., 1986, 284 p. (Transl. Math. Monogr., v. 65.)

Received by the editors 30.VIII.2004; revised version 27.VI.2005.
