THEORY OF PROBABILITY AND ITS APPLICATIONS
Volume 52, Issue 1, 2007
We observe
$$Y_i = X_i + \varepsilon_i, \qquad i = 1, \dots, n.$$
Denote by $f^Y = f * f^\varepsilon$ the density of the variables $Y_i$, where $*$ is the convolution sign. Let $\Phi^Y$, $\Phi^X$, and $\Phi^\varepsilon$ be the characteristic functions of $Y_1$, $X_1$, and $\varepsilon_1$, respectively, where for a density $g$
$$\Phi^g(u) = \int g(x) \exp(ixu)\, dx.$$
We assume that the characteristic function of the noise satisfies
$$b_{\min} |u|^{s'} \exp(-\beta |u|^s) \le |\Phi^\varepsilon(u)| \le b\, |u|^{s'} \exp(-\beta |u|^s) \tag{1}$$
for all $|u|$ large enough, where $\beta > 0$, $s > 0$, $s' \in \mathbf{R}$, and $0 < b_{\min} \le b < \infty$. The pointwise and the $L_2$ risks of an estimator $f_n$ on the class $\mathscr{A}_{\alpha,r}(L)$ are defined as
$$R_n\big(x, f_n, \mathscr{A}_{\alpha,r}(L)\big) = \sup_{f \in \mathscr{A}_{\alpha,r}(L)} \mathbf{E}_f\big[|f_n(x) - f(x)|^2\big],$$
$$R_n\big(L_2, f_n, \mathscr{A}_{\alpha,r}(L)\big) = \sup_{f \in \mathscr{A}_{\alpha,r}(L)} \mathbf{E}_f\big[\|f_n - f\|_2^2\big].$$
The asymptotics of minimax risks differ significantly in the cases $r < s$, $r = s$, and $r > s$. If $r < s$, the "variance" part of the asymptotic minimax risk is negligible with respect to the "bias" part, while for $r > s$ the bias contribution is asymptotically negligible with respect to the variance. In this paper we consider the bias dominated case, i.e., we assume that $r < s$. The setting with dominating variance will be treated in another paper.
The problems of density deconvolution with dominating bias were historically the first ones studied in the literature (cf. [16], [18], [3], [21], [9], [10], [14], [6]), motivated by the importance of deconvolution with Gaussian noise. These papers consider, in particular, noise distributions satisfying (1), but densities $f$ belonging to finite smoothness classes, such as Hölder or Sobolev ones, where the estimation of $f$ is harder than for the class $\mathscr{A}_{\alpha,r}(L)$, so that the optimal rates of convergence are only logarithmic in $n$. The case of densities with exponentially decaying tails of $f$ and with the noise satisfying (1) was considered in [15]. They analyzed the rates of convergence of wavelet deconvolution estimators, restricting their attention to the $L_2$-risk.
The present work contains two parts. In this first part we obtain upper bounds on the risks of estimators on the classes $\mathscr{A}_{\alpha,r}(L)$. These results imply, in particular, that the rates achieved by the estimators in [15] are suboptimal on $\mathscr{A}_{\alpha,r}(L)$ and that faster rates can be attained by a simpler kernel estimator. We consider the estimator
$$f_n(x) = \frac{1}{n h_n} \sum_{i=1}^n K_n\Big(\frac{x - Y_i}{h_n}\Big), \tag{2}$$
where $h_n > 0$ is a bandwidth and the kernel $K_n$ is defined via its Fourier transform:
$$\Phi^{K_n}(u) = \frac{I(|u| \le 1)}{\Phi^\varepsilon(u/h_n)}. \tag{3}$$
Here and later $I(\cdot)$ denotes the indicator function. The function $K_n$ is called a kernel but, unlike the usual Parzen-Rosenblatt kernels, it depends on $n$.
Sharp optimality in density deconvolution. I

For the existence of $K_n$ it is enough that $\Phi^{K_n} \in L_2(\mathbf{R})$ (and thus $K_n \in L_2(\mathbf{R})$). Moreover, the condition $\Phi^{K_n} \in L_2(\mathbf{R})$ implies that the kernel $K_n$ is real-valued. In fact, under this condition we have
$$\Phi^{K_n}(u) = V_n(u)\, \Phi^\varepsilon(-u/h_n)$$
for almost all $u \in \mathbf{R}$, where $V_n(u) = I(|u| \le 1)/|\Phi^\varepsilon(u/h_n)|^2$ is an even real-valued function belonging to $L_1(\mathbf{R})$, and $\Phi^\varepsilon(-u/h_n)$ (the complex conjugate of $\Phi^\varepsilon(u/h_n)$) is the Fourier transform of the real-valued function $t \mapsto h_n f^\varepsilon(-h_n t)$. This implies that $K_n$ is a convolution of two real-valued functions.
The estimator (2) belongs to the family of kernel deconvolution estimators studied in many papers starting from [18], [3], and [21]. It can also be deduced from a unified approach to the construction of estimators in statistical inverse problems [17].
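The construction (2)-(3) is straightforward to implement numerically. The sketch below is not part of the paper: the Gaussian noise level, sample size, bandwidth, and quadrature grid are illustrative assumptions. It evaluates $f_n(x)$ through the equivalent Fourier representation $f_n(x) = (2\pi)^{-1} \int_{|t| \le 1/h_n} e^{-itx}\, \hat\Phi_n(t)/\Phi^\varepsilon(t)\, dt$, where $\hat\Phi_n$ is the empirical characteristic function of $Y_1, \dots, Y_n$:

```python
import cmath
import math
import random

def deconvolution_estimate(y_sample, x, h, phi_eps, grid=400):
    """Deconvolution kernel estimator f_n(x) of (2)-(3), evaluated through
    its Fourier representation
        f_n(x) = (2*pi)^{-1} * int_{|t|<=1/h} e^{-itx} Phi_emp(t)/phi_eps(t) dt,
    where Phi_emp is the empirical characteristic function of the Y_i.
    The integral is computed by a midpoint Riemann sum on `grid` points."""
    n = len(y_sample)
    T = 1.0 / h                        # truncation at |t| = 1/h, cf. (3)
    dt = 2.0 * T / grid
    total = 0.0
    for k in range(grid):
        t = -T + (k + 0.5) * dt
        phi_emp = sum(cmath.exp(1j * t * yj) for yj in y_sample) / n
        # imaginary parts cancel in exact arithmetic (K_n is real-valued)
        total += (cmath.exp(-1j * t * x) * phi_emp / phi_eps(t)).real * dt
    return total / (2.0 * math.pi)

if __name__ == "__main__":
    random.seed(7)
    # X ~ N(0,1) contaminated by Gaussian noise eps ~ N(0, 0.5^2),
    # so phi_eps(t) = exp(-0.125 t^2): condition (1) holds with s = 2.
    ys = [random.gauss(0.0, 1.0) + random.gauss(0.0, 0.5) for _ in range(1500)]
    est = deconvolution_estimate(ys, 0.0, 0.4, lambda t: math.exp(-0.125 * t * t))
    print(est)   # close to the N(0,1) density at 0, about 0.4
```

Decreasing $h$ widens the integration band $|t| \le 1/h$, reducing bias at the price of amplifying the factor $1/\Phi^\varepsilon(t)$; this is exactly the variance blow-up quantified in Proposition 2 below.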
The following proposition establishes upper bounds on the pointwise and the $L_2$ bias terms, i.e., on the quantities $|\mathbf{E}_f f_n(x) - f(x)|^2$ and $\|\mathbf{E}_f f_n - f\|_2^2$.

P r o p o s i t i o n 1. Let $f \in \mathscr{A}_{\alpha,r}(L)$ and assume that $\Phi^{K_n} \in L_2(\mathbf{R})$ for any $h_n > 0$. Then the squared bias of $f_n(x)$ is bounded as follows:
$$\sup_{x \in \mathbf{R}} |\mathbf{E}_f f_n(x) - f(x)|^2 \le \frac{L}{2\pi\alpha r}\, h_n^{r-1} \exp\Big(-\frac{2\alpha}{h_n^r}\Big)(1 + o(1)), \qquad h_n \to 0,$$
and
$$\|\mathbf{E}_f f_n - f\|_2^2 \le L \exp\Big(-\frac{2\alpha}{h_n^r}\Big).$$
P r o o f. Using (3) we get
$$\mathbf{E}_f f_n(x) - f(x) = \frac{1}{2\pi} \int \big(\Phi^{K_n}(h_n u)\,\Phi^Y(u) - \Phi(u)\big) \exp(-iux)\, du = -\frac{1}{2\pi} \int I(|u h_n| > 1)\, \Phi(u) \exp(-iux)\, du,$$
so that
$$|\mathbf{E}_f f_n(x) - f(x)|^2 \le \Big(\frac{1}{2\pi}\Big)^2 \Big(\int I(|u h_n| > 1)\, |\Phi(u)|\, du\Big)^2.$$
By the Cauchy-Schwarz inequality and the definition of $\mathscr{A}_{\alpha,r}(L)$,
$$|\mathbf{E}_f f_n(x) - f(x)|^2 \le \frac{1}{(2\pi)^2} \int_{|u|>1/h_n} \exp(-2\alpha|u|^r)\, du \int_{|u|>1/h_n} |\Phi(u)|^2 \exp(2\alpha|u|^r)\, du \le \frac{L}{2\pi} \int_{|u|>1/h_n} \exp(-2\alpha|u|^r)\, du, \tag{4}$$
which together with Lemma 2 (see the Appendix) yields the first inequality of the proposition.
Butucea C., Tsybakov A. B.
To prove the second inequality, we apply the Plancherel formula and get
$$\|\mathbf{E}_f f_n - f\|_2^2 = \frac{1}{2\pi} \int \big|\Phi^{K_n}(h_n u)\,\Phi^Y(u) - \Phi(u)\big|^2\, du = \frac{1}{2\pi} \int_{|u|>1/h_n} |\Phi(u)|^2\, du \le L \exp\Big(-\frac{2\alpha}{h_n^r}\Big).$$
Proposition 1 is proved.
The next proposition gives upper bounds on the pointwise and the $L_2$ variance terms,
$$\mathrm{Var}\, f_n(x) = \mathbf{E}_f\big[|f_n(x) - \mathbf{E}_f f_n(x)|^2\big] \qquad\text{and}\qquad \mathrm{Var}_{L_2} f_n = \mathbf{E}_f\big[\|f_n - \mathbf{E}_f f_n\|_2^2\big],$$
respectively.

P r o p o s i t i o n 2. Let the left inequality in (1) hold and $\Phi^\varepsilon(u) \ne 0$ for all $u \in \mathbf{R}$. Then, for any density $f$ such that $\sup_{x \in \mathbf{R}} f(x) \le f_* < \infty$, the pointwise variance satisfies
$$\mathrm{Var}\, f_n(x) \le \min\bigg\{ \frac{f_*}{2\pi\beta s\, b_{\min}^2}\, h_n^{2s'+s-1},\ \frac{1}{\pi^2\beta^2 s^2 b_{\min}^2}\, h_n^{2(s'+s-1)} \bigg\} \frac{1}{n} \exp\Big(\frac{2\beta}{h_n^s}\Big)(1 + o(1)) \tag{6}$$
and the $L_2$ variance satisfies
$$\mathrm{Var}_{L_2} f_n \le \frac{1}{2\pi\beta s\, b_{\min}^2}\, \frac{h_n^{2s'+s-1}}{n} \exp\Big(\frac{2\beta}{h_n^s}\Big)(1 + o(1)) \tag{7}$$
as $h_n \to 0$.
P r o o f. Since $Y_1, \dots, Y_n$ are i.i.d., we have
$$\mathrm{Var}\, f_n(x) = \frac{1}{n h_n^2}\, \mathrm{Var}\Big(K_n\Big(\frac{x - Y_1}{h_n}\Big)\Big) \le \frac{1}{n h_n^2}\, \mathbf{E}\Big[K_n^2\Big(\frac{x - Y_1}{h_n}\Big)\Big] \le \frac{f_*}{n h_n}\, \|K_n\|_2^2, \tag{8}$$
where we used the fact that the convolution density $f^Y = f * f^\varepsilon$ is uniformly bounded by $f_*$. Applying the Plancherel formula and using (1) and (25) of Lemma 2 in the Appendix, we get
$$\|K_n\|_2^2 = \frac{1}{2\pi} \int_{|u| \le 1} |\Phi^\varepsilon(u/h_n)|^{-2}\, du \le \frac{h_n}{2\pi b_{\min}^2} \int_{u_0 \le |v| \le 1/h_n} |v|^{-2s'} \exp(2\beta |v|^s)\, dv + O(h_n) = \frac{h_n^{2s'+s}}{2\pi\beta s\, b_{\min}^2} \exp\Big(\frac{2\beta}{h_n^s}\Big)(1 + o(1)), \tag{9}$$
where $u_0$ is chosen so that the left inequality in (1) holds for $|u| \ge u_0$; the integral over $|v| < u_0$ contributes only the $O(h_n)$ term.
This and (8) imply the first bound in (6). For the second bound we again start from the bound $\mathrm{Var}\, f_n(x) \le (n h_n^2)^{-1} \mathbf{E}[K_n^2((x - Y_1)/h_n)]$ in (8), but then we apply the Fourier inversion in a different way. Set $K_{n,1}(\cdot) = K_n(\cdot/h_n)/h_n$, so that $\Phi^{K_{n,1}}(u) = \Phi^{K_n}(h_n u)$. Then
$$\mathrm{Var}\, f_n(x) \le \frac{1}{n}\, \mathbf{E}\big[K_{n,1}^2(x - Y_1)\big] \le \frac{1}{n}\, \|K_{n,1}\|_\infty^2 \le \frac{1}{4\pi^2 n} \Big(\int_{|u| \le 1/h_n} |\Phi^\varepsilon(u)|^{-1}\, du\Big)^2,$$
since $\|K_{n,1}\|_\infty \le (2\pi)^{-1} \int |\Phi^{K_{n,1}}(u)|\, du$. Acting similarly to (9), i.e., applying the left inequality in (1) and Lemma 2 to the last integral, we obtain
$$\mathrm{Var}\, f_n(x) \le \frac{1}{\pi^2 \beta^2 s^2 b_{\min}^2}\, \frac{h_n^{2(s'+s-1)}}{n} \exp\Big(\frac{2\beta}{h_n^s}\Big)(1 + o(1)),$$
which is the second bound in (6). Finally, (7) follows by integrating the second bound in (8) over $x$, which gives $\mathbf{E}_f \|f_n - \mathbf{E}_f f_n\|_2^2 \le (n h_n)^{-1} \|K_n\|_2^2$, and applying (9). Proposition 2 is proved.
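The Laplace-type evaluation of $\|K_n\|_2^2$ used in the proof above can be checked numerically. The following sketch is illustrative and not from the paper; it assumes Gaussian-type noise $\Phi^\varepsilon(u) = e^{-\beta u^2}$, i.e., $s = 2$, $s' = 0$, $b_{\min} = b = 1$ in (1), and compares $\|K_n\|_2^2 = (2\pi)^{-1} \int_{-1}^1 |\Phi^\varepsilon(u/h)|^{-2}\, du$ with the leading term $h^2 (4\pi\beta)^{-1} \exp(2\beta/h^2)$:

```python
import math

def kernel_l2_ratio(h, beta, steps=200000):
    """Ratio of ||K_n||_2^2 = (2*pi)^{-1} * int_{-1}^{1} exp(2*beta*(u/h)**2) du
    (Gaussian-type noise: s = 2, s' = 0, b_min = 1) to the leading-order
    asymptotic h**2/(4*pi*beta) * exp(2*beta/h**2)."""
    du = 1.0 / steps
    a = 2.0 * beta / (h * h)
    integral = 0.0
    for k in range(steps):
        u = (k + 0.5) * du            # midpoint rule on [0, 1]; integrand is even
        integral += math.exp(a * u * u) * du
    l2_norm_sq = integral / math.pi    # = (2/(2*pi)) * int_0^1 ...
    leading = h * h / (4.0 * math.pi * beta) * math.exp(a)
    return l2_norm_sq / leading

if __name__ == "__main__":
    # the ratio approaches 1 as h -> 0, i.e., as 2*beta/h^2 -> infinity
    print(kernel_l2_ratio(0.5, 1.0), kernel_l2_ratio(0.25, 1.0))
```

The ratio exceeds $1$ for every fixed $h$ (the lower-order terms of the Laplace expansion are positive) and decreases toward $1$ as $h \to 0$, in accordance with the $1 + o(1)$ factor in (9).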
The second bound in (6) can be compared with the recent result of van Es and Uh [20]. They studied the asymptotic pointwise variance of the same deconvolution kernel estimator in the particular case of stable noise distributions with $0 < s \le 2$ and also noticed that $s = 1$ marks a change of behavior. These effects concerning the variance terms will not be crucial in what follows, since we consider the bias dominated case.
Let us note that for practical implementation one can improve the kernel by smoothing its Fourier transform. One possibility is the following: let $K_n$ be defined by
$$\Phi^{K_n}(u) = \frac{1}{\Phi^\varepsilon(u/h_n)} \Big[ I(|u| \le 1 - \delta_n) + \frac{1 - |u|}{\delta_n}\, I(1 - \delta_n < |u| \le 1) \Big]$$
for some $\delta_n \to 0$ such that $\delta_n/h_n \to 0$ as $n \to \infty$. Then $|K_n(x)| = O(1/|x|^2)$ as $|x| \to \infty$, $K_n$ is integrable for any $n$, and the resulting kernel can be used in (2) in place of (3) as one of the possible estimators.
Decomposition of the mean squared error of the kernel estimator into bias and variance terms and application of Propositions 1 and 2 yield
$$\mathbf{E}_f\big[|f_n(x) - f(x)|^2\big] = |\mathbf{E}_f f_n(x) - f(x)|^2 + \mathrm{Var}\, f_n(x) \le \bigg[\frac{L}{2\pi\alpha r}\, h_n^{r-1} \exp\Big(-\frac{2\alpha}{h_n^r}\Big) + \frac{f_*}{2\pi\beta s\, b_{\min}^2}\, \frac{h_n^{2s'+s-1}}{n} \exp\Big(\frac{2\beta}{h_n^s}\Big)\bigg] (1 + o(1)).$$
Minimizing the right-hand side over $h_n$ leads to the bandwidth choice (10), for which
$$b \ln h_n + \frac{2\alpha}{h_n^r} + \frac{2\beta}{h_n^s} = \ln n + C(1 + o(1)). \tag{11}$$
Here and in what follows we denote by $C$ constants with values in $\mathbf{R}$ that can be different on different occasions. For the bandwidth $h = h_n$ satisfying (10) and (11) we can write
$$\mathbf{E}_f\big[|f_n(x) - f(x)|^2\big] \le C(1 + o(1))\, h_n^{r-1} \exp\Big(-\frac{2\alpha}{h_n^r}\Big)$$
with some constant $C > 0$. This proves that, for the optimal bandwidth, the bias term dominates the variance term whenever $r < s$. (Strictly speaking, here we consider upper bounds on the bias and variance terms and not precisely these terms.)
Similarly, for the $L_2$-risk we get
$$\mathbf{E}_f\big[\|f_n - f\|_2^2\big] \le C(1 + o(1)) \exp\Big(-\frac{2\alpha}{h_n^r}\Big)$$
for some constant $C > 0$. This proves that also for the $L_2$-risk the bias term dominates, while the variance terms are asymptotically smaller. Thus, the pointwise risk and the $L_2$-risk of the estimator are asymptotically determined by their bias terms, with the corresponding optimal bandwidths $h_n$ and $h_n(L_2)$, respectively.
Note that, in fact, $h_n^*$ is better than both bandwidths $h_n$ and $h_n(L_2)$ in the variance terms, but these terms are asymptotically negligible with respect to the bias ones (cf. Lemma 4). Therefore, the improvement does not appear in the main term of the asymptotics. Note also that the sequence $(\ln\ln n)^2$ in (13) can be replaced by any sequence $b_n$ satisfying $b_n = o\big((\ln n)^{1-r/s}\big)$ and $b_n/\ln\ln n \to \infty$, and the above argument remains valid (cf. the proof of Lemma 4).
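The effect of the $(\ln\ln n)^2$ shift can be seen numerically. The sketch below is illustrative and not from the paper; it works with the leading-order balance function $F(x) = 2\beta x + 2\alpha x^{r/s}$ (log terms dropped) and compares the bias factor $\exp(-2\alpha x^{r/s})$ and the variance factor $\exp(2\beta x)$ at $x = F^{-1}(\ln n)$ and at $x = F^{-1}(\ln n - (\ln\ln n)^2)$. For $r < s$ the shift leaves the bias factor essentially unchanged while multiplying the variance factor by roughly $\exp(-(\ln\ln n)^2)$:

```python
import math

def solve_balance(y, alpha, beta, rho):
    """Solve F(x) = 2*beta*x + 2*alpha*x**rho = y (rho = r/s < 1) by bisection;
    F is increasing on (0, inf), so the root is unique."""
    lo, hi = 0.0, y / (2.0 * beta) + 1.0    # F(hi) >= 2*beta*hi > y
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 2.0 * beta * mid + 2.0 * alpha * mid ** rho < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bias_and_variance_ratios(log_n, alpha=1.0, beta=1.0, rho=0.5):
    """Compare x_n = F^{-1}(ln n) with x_star = F^{-1}(ln n - (ln ln n)^2),
    mimicking the bandwidth correction in (13) and Lemma 4."""
    shift = math.log(log_n) ** 2             # plays the role of (ln ln n)^2
    x_n = solve_balance(log_n, alpha, beta, rho)
    x_star = solve_balance(log_n - shift, alpha, beta, rho)
    bias_ratio = math.exp(-2.0 * alpha * (x_n ** rho - x_star ** rho))
    var_ratio = math.exp(2.0 * beta * (x_star - x_n))
    return bias_ratio, var_ratio
```

With $\alpha = \beta = 1$, $r/s = 1/2$, and $\ln n = 10^6$ (an intentionally extreme value, chosen to sit deep in the asymptotic regime), the bias ratio is close to $1$ while the variance ratio is vanishingly small.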
Calculating the upper bounds for the bias terms of the estimator $f_n$ with bandwidth (13), we get the following asymptotic upper bounds for its pointwise and $L_2$ risks, respectively:
$$\varphi_n^2 = \frac{L}{2\pi\alpha r} \Big(\frac{\ln n}{2\beta}\Big)^{(1-r)/s} \exp\Big(-2\alpha \Big(\frac{\ln n}{2\beta}\Big)^{r/s}\Big)(1 + o(1)) \tag{14}$$
and
$$\varphi_n^2(L_2) = L \exp\Big(-2\alpha \Big(\frac{\ln n}{2\beta}\Big)^{r/s}\Big). \tag{15}$$
T h e o r e m 1. Let $\alpha > 0$, $L > 0$, $0 < r < s < \infty$, let the left inequality in (1) hold, and let $\Phi^\varepsilon(u) \ne 0$ for all $u \in \mathbf{R}$. Then the kernel estimator $f_n$ with bandwidth defined by (13) satisfies the following pointwise and $L_2$ risk bounds:
$$\limsup_{n\to\infty} \sup_{x\in\mathbf{R}} R_n\big(x, f_n, \mathscr{A}_{\alpha,r}(L)\big)\, \varphi_n^{-2} \le 1, \tag{16}$$
$$\limsup_{n\to\infty} R_n\big(L_2, f_n, \mathscr{A}_{\alpha,r}(L)\big)\, \varphi_n^{-2}(L_2) \le 1. \tag{17}$$
The rates $\varphi_n^2$ and $\varphi_n^2(L_2)$ of (14) and (15) have the following explicit asymptotic behavior:
$$\varphi_n^2 = \begin{cases} \dfrac{L}{2\pi\alpha r} \Big(\dfrac{\ln n}{2\beta}\Big)^{(1-r)/s} \exp\Big(-2\alpha\Big(\dfrac{\ln n}{2\beta}\Big)^{r/s}\Big)(1 + o(1)) & \text{if } r < \dfrac{s}{2},\\[2mm] \dfrac{L}{2\pi\alpha r} \Big(\dfrac{\ln n}{2\beta}\Big)^{(1-r)/s} \exp\Big(-2\alpha\sqrt{\dfrac{\ln n}{2\beta}} + \dfrac{\alpha^2}{\beta}\Big)(1 + o(1)) & \text{if } r = \dfrac{s}{2}, \end{cases} \tag{18}$$
and
$$\varphi_n^2(L_2) = \begin{cases} L \exp\Big(-2\alpha\Big(\dfrac{\ln n}{2\beta}\Big)^{r/s}\Big)(1 + o(1)) & \text{if } r < \dfrac{s}{2},\\[2mm] L \exp\Big(-2\alpha\sqrt{\dfrac{\ln n}{2\beta}} + \dfrac{\alpha^2}{\beta}\Big)(1 + o(1)) & \text{if } r = \dfrac{s}{2}. \end{cases} \tag{19}$$
The bandwidth (13) depends on the parameters $\alpha$ and $r$ of the class $\mathscr{A}_{\alpha,r}(L)$. Note that the parameters $s$ and $\beta$ are supposed to be known, since they characterize the known density of the noise $f^\varepsilon$.
T h e o r e m 2. Suppose that the left inequality in (1) holds and $\Phi^\varepsilon(u) \ne 0$ for all $u \in \mathbf{R}$. Let $f_n^*$ be the kernel estimator defined in (2) with bandwidth $h_n = h_n^*$ defined by
$$h_n^* = \Big(\frac{\ln n}{2\beta} - \sqrt{\frac{\ln n}{2\beta}}\Big)^{-1/s} \tag{20}$$
for $n$ large enough so that $(\ln n)/(2\beta) > 1$. Then, for all $(\alpha, L, r) \in \Theta$,
$$\limsup_{n\to\infty} \sup_{x\in\mathbf{R}} R_n\big(x, f_n^*, \mathscr{A}_{\alpha,r}(L)\big)\, \varphi_n^{-2} \le 1$$
and
$$\limsup_{n\to\infty} R_n\big(L_2, f_n^*, \mathscr{A}_{\alpha,r}(L)\big)\, \varphi_n^{-2}(L_2) \le 1,$$
where the rates $\varphi_n^2$ and $\varphi_n^2(L_2)$ are given in (14) and (15) (and, more particularly, in (18) and (19) with $r < s/2$).
P r o o f. Note that, by (20),
$$\exp\Big(\frac{2\beta}{(h_n^*)^s}\Big)\, n^{-1} = \exp\Big(-2\beta\sqrt{\frac{\ln n}{2\beta}}\Big).$$
Therefore, the ratio of the bias term of $f_n^*$ to the variance term of $f_n^*$, both for the pointwise risk and for the $L_2$-risk, is bounded from below by
$$\exp\Big(2\beta\sqrt{\frac{\ln n}{2\beta}} - 2\alpha\Big(\frac{\ln n}{2\beta}\Big)^{r/s}(1 + o(1))\Big) \to \infty$$
as $n \to \infty$, since $r < s/2$. In view of Proposition 1, for $n$ large enough the bias term of $f_n^*$ for the pointwise risk is bounded from above by
$$\frac{L(1 + o(1))}{2\pi\alpha r} \Big(\frac{\ln n}{2\beta}\Big)^{(1-r)/s} \exp\Big(-2\alpha\Big(\frac{\ln n}{2\beta}\Big)^{r/s} \Big[1 - \Big(\frac{\ln n}{2\beta}\Big)^{-1/2}\Big]^{r/s}\Big) = \frac{L(1 + o(1))}{2\pi\alpha r} \Big(\frac{\ln n}{2\beta}\Big)^{(1-r)/s} \exp\Big(-2\alpha\Big(\frac{\ln n}{2\beta}\Big)^{r/s} + c\Big(\frac{\ln n}{2\beta}\Big)^{r/s - 1/2}\Big) = \varphi_n^2(1 + o(1)),$$
where $c > 0$ is a constant and we have used (18) with $r < s/2$ for the last equality. Similarly, for $n$ large enough the bias term of $f_n^*$ for the $L_2$-risk is bounded from above by
$$L \exp\Big(-2\alpha\Big(\frac{\ln n}{2\beta}\Big)^{r/s} + c\Big(\frac{\ln n}{2\beta}\Big)^{r/s - 1/2}\Big) = \varphi_n^2(L_2)(1 + o(1)),$$
where $c > 0$ and we have used (19) with $r < s/2$ for the last equality. Theorem 2 is proved.
If $r = s/2$, adaptation to $(\alpha, L)$ is still possible via a procedure similar to that of Theorem 2, but it does not attain the exact constant, as the following result shows. Introduce the set
$$\Theta_0 = \{(\alpha, L)\colon\ 0 < \alpha \le A_0,\ 0 < L < \infty\},$$
where $A_0 > 0$ is a fixed constant.

T h e o r e m 3. Suppose that the left inequality in (1) holds and $\Phi^\varepsilon(u) \ne 0$ for all $u \in \mathbf{R}$. Let $f_n^*$ be the kernel estimator defined in (2) with bandwidth $h_n = h_n^*$ defined by
$$h_n^* = \Big(\frac{\ln n}{2\beta} - \frac{A}{\beta}\sqrt{\frac{\ln n}{2\beta}}\Big)^{-1/s},$$
where $A > A_0$ and $n$ is large enough so that $(\ln n)/(2\beta) > (A/\beta)^2$. Then for $r = s/2$ and for all $(\alpha, L) \in \Theta_0$,
$$\limsup_{n\to\infty} \sup_{x\in\mathbf{R}} R_n\big(x, f_n^*, \mathscr{A}_{\alpha,r}(L)\big)\, \varphi_n^{-2} \le \exp\Big(\frac{A\alpha - \alpha^2}{\beta}\Big), \tag{21}$$
$$\limsup_{n\to\infty} R_n\big(L_2, f_n^*, \mathscr{A}_{\alpha,r}(L)\big)\, \varphi_n^{-2}(L_2) \le \exp\Big(\frac{A\alpha - \alpha^2}{\beta}\Big), \tag{22}$$
where the rates $\varphi_n^2$ and $\varphi_n^2(L_2)$ are given in (18) and (19).
P r o o f. It is easily checked that for the bias exponent
$$\exp\Big(-\frac{2\alpha}{(h_n^*)^r}\Big) = \exp\Big(-2\alpha\sqrt{\frac{\ln n}{2\beta}} + \frac{A\alpha}{\beta}\Big)(1 + o(1)),$$
while for the variance term exponent
$$\exp\Big(\frac{2\beta}{(h_n^*)^s}\Big)\, n^{-1} = \exp\Big(-2A\sqrt{\frac{\ln n}{2\beta}}\Big).$$
Since $A > \alpha$, the bias term of $f_n^*$ asymptotically dominates its variance term. Inequalities (21) and (22) now follow from these remarks and the expressions for $\varphi_n^2$, $\varphi_n^2(L_2)$ in (18) and (19) with $r = s/2$. Theorem 3 is proved.
Appendix.
L e m m a 1. For $0 < \alpha, r, L < \infty$,
$$\sup_{f \in \mathscr{A}_{\alpha,r}(L)} \sup_{x\in\mathbf{R}} f(x) \le L + \pi^{-1} C(r, \alpha),$$
where $C(r, \alpha) = \int_0^\infty \exp(-2\alpha u^r)\, du$.

P r o o f. For any density $f$ and all $x \in \mathbf{R}$,
$$f(x) \le \frac{1}{2\pi} \int |\Phi(u)|\, du. \tag{23}$$
By Markov's inequality,
$$\int |\Phi(u)|\, I\big(|\Phi(u)| \exp(2\alpha|u|^r) > 1\big)\, du \le \int |\Phi(u)|^2 \exp(2\alpha|u|^r)\, du \le 2\pi L.$$
Also,
$$\int |\Phi(u)|\, I\big(|\Phi(u)| \exp(2\alpha|u|^r) \le 1\big)\, du \le 2 \int_0^\infty \exp(-2\alpha u^r)\, du = 2C(r, \alpha).$$
Combining the last two inequalities with (23) proves the lemma.
L e m m a 2. For any $b \in \mathbf{R}$, $\beta > 0$, and $s > 0$, we have
$$\int_T^\infty u^b \exp(-\beta u^s)\, du = \frac{T^{b-s+1}}{\beta s} \exp(-\beta T^s)(1 + o(1)), \qquad T \to \infty, \tag{24}$$
and
$$\int_1^T u^b \exp(\beta u^s)\, du = \frac{T^{b-s+1}}{\beta s} \exp(\beta T^s)(1 + o(1)), \qquad T \to \infty. \tag{25}$$
The proof of this lemma is omitted. It is based on integration by parts and standard evaluation of integrals.
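Relation (25) is easy to verify numerically. The sketch below is illustrative (the parameter values $b = 0$, $\beta = 1$, $s = 2$ are arbitrary assumptions) and compares the integral, computed by the midpoint rule, with the claimed leading term:

```python
import math

def asymptotic_ratio(b, beta, s, T, steps=20000):
    """Ratio of int_1^T u**b * exp(beta*u**s) du (midpoint rule) to the
    leading-order expression T**(b-s+1)/(beta*s) * exp(beta*T**s) from (25)."""
    h = (T - 1.0) / steps
    integral = 0.0
    for k in range(steps):
        u = 1.0 + (k + 0.5) * h
        integral += u ** b * math.exp(beta * u ** s) * h
    leading = T ** (b - s + 1) / (beta * s) * math.exp(beta * T ** s)
    return integral / leading

if __name__ == "__main__":
    # the ratio tends to 1 as T grows (here b = 0, beta = 1, s = 2)
    print(asymptotic_ratio(0.0, 1.0, 2.0, 3.0), asymptotic_ratio(0.0, 1.0, 2.0, 6.0))
```

The ratio exceeds $1$ and decreases toward $1$ as $T$ grows, in accordance with the $1 + o(1)$ factor in (25).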
L e m m a 3. Let $p$ be the density of the symmetric stable distribution with characteristic function $\exp(-|t|^r)$, $1 < r < 2$. Then $p$ is continuous, $p(x) > 0$ for all $x \in \mathbf{R}$, and there exist $c_1 > 0$, $c_2 > 0$ such that
$$p(x) \ge c_1 |x|^{-r-1} \qquad \text{for } |x| \ge c_2.$$

P r o o f. The proof is based on the representation
$$p(x) = c_r |x|^{1/(r-1)} \int_0^1 u(\varphi) \exp\big(-|x|^{r/(r-1)}\, u(\varphi)\big)\, d\varphi, \tag{26}$$
where $c_r > 0$ depends only on $r$ and
$$u(\varphi) = \left(\frac{\sin(\pi r \varphi/2)}{\cos(\pi \varphi/2)}\right)^{r/(1-r)} \frac{\cos(\pi(r-1)\varphi/2)}{\cos(\pi \varphi/2)}.$$
For $\varphi \in [\frac12, 1]$ we have
$$1 \ge \cos\Big(\frac{\pi(r-1)\varphi}{2}\Big) \ge \cos\Big(\frac{\pi(r-1)}{2}\Big) > 0, \qquad c_3 \ge \sin\Big(\frac{\pi r \varphi}{2}\Big) \ge c_4 > 0,$$
so that
$$c_6 \Big(\cos\frac{\pi\varphi}{2}\Big)^{1/(r-1)} \le u(\varphi) \le c_5 \Big(\cos\frac{\pi\varphi}{2}\Big)^{1/(r-1)},$$
and therefore
$$c_{10}(1-\varphi)^{1/(r-1)} \le u(\varphi) \le c_9 (1-\varphi)^{1/(r-1)}.$$
Using (26) and the fact that $u(\varphi) \ge 0$ for $\varphi \in [0,1]$, we get
$$p(x) \ge c |x|^{1/(r-1)} \int_{1/2}^1 (1-\varphi)^{1/(r-1)} \exp\big(-c_9 |x|^{r/(r-1)} (1-\varphi)^{1/(r-1)}\big)\, d\varphi.$$
The change of variables $u = |x|^{r/(r-1)} (1-\varphi)^{1/(r-1)}$ gives
$$p(x) \ge c |x|^{-r-1} \int_0^{U} u^{r-1} \exp(-c_9 u)\, du \ge c |x|^{-r-1} \int_0^{1} u^{r-1} \exp(-c_9 u)\, du \ge c_1 |x|^{-r-1}$$
for $|x| \ge c_2 > 0$, where $U = (|x|^r/2)^{1/(r-1)} \ge 1$. This also implies that $p(x) > 0$ for all $x \ne 0$, and
$$p(0) = (2\pi)^{-1} \int \exp(-|t|^r)\, dt \ne 0.$$
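The polynomial tail behavior in Lemma 3 can be observed numerically. The sketch below is illustrative and not from the paper; it takes $r = 1.5$ and recovers $p$ by Fourier inversion, $p(x) = \pi^{-1} \int_0^\infty \cos(tx) \exp(-t^r)\, dt$, then checks that $p(x)/p(2x)$ is of the order $2^{r+1}$, as a tail of order $|x|^{-r-1}$ predicts:

```python
import math

def stable_density(x, r, t_max=20.0, steps=100000):
    """Symmetric stable density with characteristic function exp(-|t|**r),
    recovered by Fourier inversion:
        p(x) = (1/pi) * int_0^inf cos(t*x) * exp(-t**r) dt,
    truncated at t_max (where exp(-t_max**r) is negligible) and computed
    by the midpoint rule."""
    h = t_max / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += math.cos(t * x) * math.exp(-t ** r) * h
    return total / math.pi

if __name__ == "__main__":
    r = 1.5
    p10, p20 = stable_density(10.0, r), stable_density(20.0, r)
    # a tail of order |x|^(-r-1) makes p(10)/p(20) close to 2**(r+1)
    print(p10, p20, p10 / p20)
```

At moderate arguments the ratio is somewhat above $2^{r+1} \approx 5.66$, because lower-order tail terms have not yet died out; it approaches $2^{r+1}$ as the evaluation points grow.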
L e m m a 4. Let $\tilde h_n$ satisfy
$$\frac{2\alpha}{\tilde h_n^r} + \frac{2\beta}{\tilde h_n^s} = \ln n - (\ln\ln n)^2, \tag{27}$$
and let $h_n$ satisfy
$$b \ln h_n + \frac{2\alpha}{h_n^r} + \frac{2\beta}{h_n^s} = \ln n + C(1 + o(1)), \qquad n \to \infty,$$
for some $b \in \mathbf{R}$ and $C \in \mathbf{R}$. Then, as $n \to \infty$, we have
$$h_n^a \exp\Big(-\frac{2\alpha}{h_n^r}\Big) = \tilde h_n^a \exp\Big(-\frac{2\alpha}{\tilde h_n^r}\Big)(1 + o(1)) \tag{28}$$
for any $a \in \mathbf{R}$,
$$\tilde h_n = \Big(\frac{\ln n}{2\beta}\Big)^{-1/s}(1 + o(1)), \tag{29}$$
and
$$\tilde h_n^{2\gamma - 1} \exp\Big(\frac{2\beta}{\tilde h_n^s}\Big) \le 2\, h_n^{2\gamma - 1} \exp\Big(\frac{2\beta}{h_n^s}\Big) \tag{30}$$
for any fixed $\gamma \in \mathbf{R}$ and $n$ large enough.
P r o o f. Define $x_* = \tilde h_n^{-s}$, $x_n = h_n^{-s}$, and write, for $t > 0$,
$$F(t) = 2\beta t + 2\alpha t^{r/s}, \qquad F_1(t) = -\frac{b}{s} \ln t + 2\beta t + 2\alpha t^{r/s}.$$
Then
$$F(x_*) = \ln n - (\ln\ln n)^2, \tag{31}$$
$$F_1(x_n) = \ln n + C(1 + o(1)). \tag{32}$$
We have
$$F_1'(t) = -\frac{b}{st} + 2\beta + \frac{2\alpha r}{s}\, t^{r/s - 1} > 0$$
for $t$ large enough; thus $F_1(t)$ is strictly monotone increasing for large $t$, and a solution $x_n$ of (32) exists for large $n$ (and is unique). Next, clearly, $x_n \to \infty$ as $n \to \infty$, and writing $F(x_n) = F_1(x_n) + (b/s)\ln x_n$ we get, in view of (32),
$$F(x_n) = \ln n + O(\ln\ln n). \tag{33}$$
We have
$$x_n = F^{-1}(\ln n + a_n), \qquad x_* = F^{-1}(\ln n - b_n),$$
where $a_n = C_1 \ln\ln n\,(1 + o(1)) + C(1 + o(1)) = O(\ln\ln n)$, $b_n = (\ln\ln n)^2$, and $F^{-1}(\cdot)$ is the inverse of $F(\cdot)$. Hence, for some $0 < \tau < 1$ and for $n$ large enough,
$$x_n = F^{-1}(\ln n + a_n) = x_* + \big(F^{-1}(\ln n - b_n)\big)'(a_n + b_n) + \tfrac12 \big(F^{-1}(\ln n - b_n(1 - \tau) + \tau a_n)\big)''(a_n + b_n)^2. \tag{34}$$
The first and the second derivatives of $F^{-1}$ are given by
$$\big(F^{-1}(y)\big)' = \frac{1}{F'(F^{-1}(y))} = \frac{1}{2\beta + (2\alpha r/s)(F^{-1}(y))^{r/s-1}}, \qquad \big(F^{-1}(y)\big)'' = \frac{-(2\alpha r/s)(r/s - 1)(F^{-1}(y))^{r/s-2}}{\big(2\beta + (2\alpha r/s)(F^{-1}(y))^{r/s-1}\big)^3}. \tag{35}$$
Hence
$$\big|\big(F^{-1}(y)\big)''\big| \le C y^{r/s - 2} \tag{36}$$
for $y \ge \bar y$. Considering $n$ large enough so that $y_n \overset{\mathrm{def}}{=} \ln n - b_n(1 - \tau) + \tau a_n \ge \bar y$ and using the above expression for $(F^{-1}(y))'$ and (36), we get
$$\big(F^{-1}(y_n)\big)'' = \big(F^{-1}(\ln n - b_n(1 - \tau) + \tau a_n)\big)'' = O\big((\ln n)^{r/s - 2}\big),$$
so that, by (34),
$$x_n = x_* + \frac{b_n}{2\beta}\,(1 + o(1)). \tag{37}$$
Therefore,
$$\exp\Big(-\frac{2\alpha}{h_n^r} + \frac{2\alpha}{\tilde h_n^r}\Big) = \exp\big(-2\alpha(x_n^{r/s} - x_*^{r/s})\big) = \exp\Big(-2\alpha x_*^{r/s}\Big(\big[1 + b_n(2\beta x_*)^{-1}(1 + o(1))\big]^{r/s} - 1\Big)\Big) = \exp\big(O(b_n x_*^{r/s - 1})\big) = 1 + o(1),$$
since $b_n = (\ln\ln n)^2$, $x_* = (2\beta)^{-1}\ln n\,(1 + o(1))$, and $r < s$. This and the fact that $(h_n/\tilde h_n)^a = (x_*/x_n)^{a/s} = 1 + o(1)$ imply (28). Next, (29) follows directly from the definition of $\tilde h_n$ and from (27). To prove (30), note that, in view of (37),
$$\exp\Big(\frac{2\beta}{\tilde h_n^s} - \frac{2\beta}{h_n^s}\Big) = \exp\big(2\beta(x_* - x_n)\big) = (1 + o(1)) \exp\big(-b_n[1 + o(1)]\big) \to 0,$$
which, together with $(h_n/\tilde h_n)^a = 1 + o(1)$ for any $a$, yields (30) for $n$ large enough. Lemma 4 is proved.
REFERENCES
Received August 30, 2004; revised version June 27, 2005.