1 U-Statistic
A parameter θ = θ(P) from the distribution P is estimable if there is some function h of a sample of size r, such that
\[ \theta = E_P[h(X_1, \dots, X_r)], \]
where X_1, ..., X_r are iid draws from P.
By definition, the function h does not have to be symmetric, but it can always be made symmetric in the following way:
\[ h_{\mathrm{sym}}(X_1, \dots, X_r) = \frac{1}{r!} \sum_{\pi} h(X_{\pi(1)}, \dots, X_{\pi(r)}), \]
where the sum ranges over all permutations π of {1, ..., r}. Given a symmetric kernel h of degree r, the corresponding U-statistic is
\[ U = \binom{n}{r}^{-1} \sum_{C} h(X_{i_1}, \dots, X_{i_r}), \]
where C ranges over all $\binom{n}{r}$ combinations $\{i_1 < \cdots < i_r\} \subset \{1, \dots, n\}$. If h is not permutation symmetric, then we modify the definition as follows:
\[ U = \frac{(n-r)!}{n!} \sum_{P} h(X_{i_1}, \dots, X_{i_r}), \]
where P ranges over all $n!/(n-r)!$ ordered r-tuples $(i_1, \dots, i_r)$ of distinct indices. In either case, U is an unbiased estimator of θ; moreover, the two definitions agree when h is symmetric.
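As a quick sanity check (a one-line computation not spelled out above), unbiasedness follows from linearity of expectation and the fact that every summand has the same distribution:
\[ E[U] = \binom{n}{r}^{-1} \sum_{C} E[h(X_{i_1}, \dots, X_{i_r})] = \binom{n}{r}^{-1} \binom{n}{r}\, \theta = \theta. \]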
Observe that U-statistics generalize many classical estimators. We now give a few examples:
Examples
1. Sample mean: Let r = 1 and choose $U = \frac{1}{n}\sum_{i=1}^{n} h(X_i)$ with h being the identity function, which yields the sample mean $\bar{X}$.
2. Variance: Use the kernel $h(x_1, x_2) = x_1^2 - x_1 x_2$ and construct the corresponding symmetric kernel $h_{\mathrm{sym}}(x_1, x_2) = \frac{1}{2}(x_1 - x_2)^2$; we then obtain the corresponding U-statistic for estimating the variance (see the numerical sketch after this list):
\[ U = \binom{n}{2}^{-1} \sum_{i<j} \frac{1}{2}(X_i - X_j)^2. \]
3. Signed rank statistic (Wilcoxon): The original signed rank test aims to determine whether P is symmetric about 0. The statistic can be written as
\[ W_n^{+} = \sum_{i} 1\{X_i > 0\} + \sum_{i<j} 1\{X_i + X_j > 0\}, \]
which is (up to normalization) the sum of two U-statistics with kernels $1\{x_1 > 0\}$ and $1\{x_1 + x_2 > 0\}$.
4. Kendall's τ: Call a pair of points $(x_1, y_1), (x_2, y_2)$ concordant if the slope of the line connecting these two points is positive, and discordant if the slope is negative. For iid $(X_1, Y_1), (X_2, Y_2)$, if the probability of them being concordant exceeds the probability of them being discordant, then there is a positive association between X and Y. For continuous distributions, Kendall's τ is $\tau = 2\,P(\text{concordant}) - 1$. Hence, if we let $h((x_1, y_1), (x_2, y_2)) = 2(1\{x_1 < x_2, y_1 < y_2\} + 1\{x_2 < x_1, y_2 < y_1\}) - 1$, it follows that $E[h((X_1, Y_1), (X_2, Y_2))] = \tau$, so the U-statistic with kernel h is an unbiased estimator of τ.
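A minimal numerical sketch of examples 2 and 4, assuming NumPy and SciPy are available (the sample size, seed, and the linear relation between x and y are arbitrary choices): the variance U-statistic coincides with the usual unbiased sample variance, and the Kendall kernel averaged over pairs matches scipy.stats.kendalltau when the data have no ties.

    import numpy as np
    from itertools import combinations
    from scipy.stats import kendalltau

    rng = np.random.default_rng(0)
    n = 200
    x = rng.normal(size=n)
    y = 0.5 * x + rng.normal(size=n)
    pairs = list(combinations(range(n), 2))

    # Example 2: U-statistic with h_sym(a, b) = (a - b)^2 / 2
    u_var = np.mean([(x[i] - x[j]) ** 2 / 2 for i, j in pairs])
    print(u_var, x.var(ddof=1))        # agree up to floating-point error

    # Example 4: U-statistic with the Kendall kernel (continuous data, so no ties)
    u_tau = np.mean([np.sign((x[i] - x[j]) * (y[i] - y[j])) for i, j in pairs])
    tau_scipy, _ = kendalltau(x, y)
    print(u_tau, tau_scipy)            # agree in the absence of ties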
The goal of the discussion below is to derive an asymptotic distribution of the U-statistic. To do so, we project U onto sums of independent terms: we show the asymptotic distribution of the projection, followed by the proof that U converges in probability to its projection (after centering and scaling). Consider the set of statistics
\[ S = \Big\{ \sum_{i=1}^{n} g_i(X_i) \Big\}, \]
where each $g_i$ is any function of $X_i$ that has a finite second moment. For a random variable T, the Hájek projection of T onto S is the element $\hat{S} \in S$ that minimizes $E[(T - S)^2]$ over $S \in S$.
Theorem 4. Suppose we are interested in the limiting distribution of $\{T_n\}$, and relate $\{T_n\}$ to some sequence $\{S_n\}$ for which we know the limit. If $T_n - S_n \xrightarrow{P} 0$, then by Slutsky's theorem, $T_n$ converges in distribution to the same limit as $S_n$.
Now, let S be a linear space of RVs with finite second moments. Then $\hat{S} \in S$ is the projection of T onto S if and only if $E[(T - \hat{S})S] = 0$ for every $S \in S$; when S contains the constants, this is the same as requiring $\mathrm{cov}(T - \hat{S}, S) = 0$. The proof of the last statement in Definition 3 goes by proving $E[(T - \hat{S})S] = 0$ for any $S \in S$.
To apply this to the U-statistic, define, for $1 \le c \le r$,
\[ h_c(x_1, \dots, x_c) = E[h(x_1, \dots, x_c, X_{c+1}, \dots, X_r)] \]
and
\[ \sigma_c^2 = \mathrm{Var}(h_c(X_1, \dots, X_c)). \]
The Hájek projection of $U - \theta$ onto S is $\hat{U} = \frac{r}{n} \sum_{i=1}^{n} (h_1(X_i) - \theta)$. Hence, the variance of $\hat{U}$ is just
\[ \mathrm{Var}(\hat{U}) = \frac{r^2}{n^2} \sum_{i=1}^{n} \mathrm{Var}(h_1(X_i)) = \frac{r^2 \sigma_1^2}{n}. \]
Theorem 6. If $\dfrac{\sigma^2(T_n)}{\sigma^2(S_n)} \to 1$, then
\[ \frac{T_n - E[T_n]}{\sigma(T_n)} - \frac{S_n - E[S_n]}{\sigma(S_n)} \xrightarrow{P} 0, \]
so we have
\[ \frac{\mathrm{Var}(U)}{\mathrm{Var}(\hat{U})} = \frac{\frac{r^2 \sigma_1^2}{n} + O(1/n^2)\,\sigma_2^2 + \cdots + O(1/n^r)\,\sigma_r^2}{\frac{r^2 \sigma_1^2}{n}} \to 1. \]
Theorem 7 (Asymptotic Convergence). If $E[h^2(X_1, \dots, X_r)] < \infty$, then $\sqrt{n}(U - \theta) \xrightarrow{d} N(0, r^2 \sigma_1^2)$.
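As a worked instance of Theorem 7 (this computation is not in the notes above, but uses only the variance kernel from Example 2; here $\mu_4$ denotes the fourth central moment of P):
\[ h_1(x) = E\big[\tfrac{1}{2}(x - X_2)^2\big] = \tfrac{1}{2}\big((x - \mu)^2 + \sigma^2\big), \qquad \sigma_1^2 = \mathrm{Var}(h_1(X_1)) = \tfrac{1}{4}(\mu_4 - \sigma^4), \]
so with r = 2 the variance U-statistic satisfies $\sqrt{n}(U - \sigma^2) \xrightarrow{d} N(0, \mu_4 - \sigma^4)$.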
References
stat210b/notes/7notes.pdf
HoeffdingExp.pdf
2 Tail Bound
Theorem 8 (Markov Inequality). Let X be a nonnegative RV and t > 0. If $E[X] < \infty$, then
\[ P[X \ge t] \le \frac{E[X]}{t}. \]
Applying the Markov inequality to the variable $Y = (X - E[X])^2$, we obtain Chebyshev's inequality:

Theorem 9 (Chebyshev's Inequality). Suppose X has finite variance and let $\mu = E[X]$. Then
\[ P[|X - \mu| \ge t] \le \frac{\mathrm{Var}(X)}{t^2}. \]
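In detail (a one-line derivation, included here for completeness, for t > 0):
\[ P[|X - \mu| \ge t] = P[(X - \mu)^2 \ge t^2] \le \frac{E[(X - \mu)^2]}{t^2} = \frac{\mathrm{Var}(X)}{t^2}. \]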
Definition 10 (Moment). For a one-dimensional RV X with PDF f(x), the k-th moment about c is
\[ \mu_k = \int (x - c)^k f(x)\, dx, \]
whenever it exists. Typical choices of c are 0 and the mean. When c is the mean, the moment is called the central moment.
Suppose the k-th central moment of X exists. Then applying Markov's inequality to $|X - \mu|^k$ gives
\[ P[|X - \mu| \ge t] \le \frac{E[|X - \mu|^k]}{t^k}. \tag{2} \]
(For k = 2 this recovers Chebyshev's inequality.)
Definition 11 (Moment Generating Function). The MGF of a RV X, when it exists, is defined as
\[ M_X(\lambda) = E[e^{\lambda X}]. \]
Unlike the characteristic function, the MGF does not always exist. However, when it exists, it is a useful tool and can assist in, say, extracting the k-th moment.
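For instance (a standard fact, spelled out here for concreteness), when $M_X$ is finite in a neighborhood of 0, expanding the exponential inside the expectation shows that the moments are the derivatives of $M_X$ at zero:
\[ M_X(\lambda) = \sum_{k=0}^{\infty} \frac{\lambda^k\, E[X^k]}{k!}, \qquad \text{so} \qquad E[X^k] = M_X^{(k)}(0). \]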
Assuming that the MGF exists on $[-a, a]$, we can use Markov's inequality to obtain the Chernoff bound:
\[ P(X - \mu \ge t) = P\big(e^{\lambda(X - \mu)} \ge e^{\lambda t}\big) \le \frac{E[e^{\lambda(X - \mu)}]}{e^{\lambda t}}, \qquad t > 0,\ \lambda \in [0, a]. \]
Note that the Chernoff bound is never tighter than the moment bound (2) optimized over k.
2.1 Sub-Gaussian
For a Gaussian RV $X \sim N(\mu, \sigma^2)$, the MGF is
\[ E[e^{\lambda X}] = e^{\lambda \mu + \lambda^2 \sigma^2 / 2}, \]
and the Chernoff exponent is
\[ \inf_{\lambda \in [0, +\infty)} \big( \log E[e^{\lambda(X - \mu)}] - \lambda t \big) = -\frac{t^2}{2\sigma^2}, \]
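The infimum is a simple calculus step (not shown in the original notes): the objective is quadratic in λ,
\[ \log E[e^{\lambda(X - \mu)}] - \lambda t = \frac{\lambda^2 \sigma^2}{2} - \lambda t, \]
which is minimized at $\lambda^\star = t/\sigma^2 \ge 0$, giving the value $-t^2/(2\sigma^2)$.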
and so
\[ P[X \ge \mu + t] \le e^{-t^2/(2\sigma^2)}. \]
This gives a probability bound on the upper tail, and we thus refer to this inequality as the upper deviation inequality. It also motivates the discussion of RVs whose tail probability is no heavier than a Gaussian's: a RV X with mean μ is sub-Gaussian with parameter σ, written $X \in SG(\sigma)$, if there exists σ > 0 such that
\[ E[e^{\lambda(X - \mu)}] \le e^{\lambda^2 \sigma^2 / 2} \qquad \text{for all } \lambda \in \mathbb{R}. \]
For a sub-Gaussian RV X with parameter σ, by symmetry, $-X$ is also $SG(\sigma)$, which gives a matching lower deviation inequality; combining the two tails yields
\[ P[|X - \mu| \ge t] \le 2 e^{-t^2/(2\sigma^2)}. \]
Examples of Sub-Gaussian RVs

1. Rademacher RV: If X takes the values ±1 with probability 1/2 each, then $E[e^{\lambda X}] = \cosh(\lambda) \le e^{\lambda^2/2}$, so $X \in SG(1)$.
2. Bounded RV ([1], Example 2.3): Suppose $X \in [c_1, c_2]$; then X is $SG(\sigma)$, where $\sigma = (c_2 - c_1)/2$, i.e.,
\[ E\big[e^{\lambda(X - E[X])}\big] \le e^{\lambda^2 (c_2 - c_1)^2 / 8} \]
(this is Hoeffding's lemma).
Theorem 14 (Hoeffding bound). Let $X_i \sim P$ be iid with mean μ, where each $X_i \in SG(\sigma)$. Let $\bar{X}$ be the sample mean of $X_1, \dots, X_n$. Then, for all $t \ge 0$,
\[ P[\bar{X} - \mu \ge t] \le \exp\Big( -\frac{n t^2}{2\sigma^2} \Big). \]

Remark. Together with Hoeffding's lemma, one can get a bound for bounded RVs (the classical Hoeffding inequality).
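A minimal Monte Carlo sketch of the remark, assuming NumPy is available (the choice of Uniform[0, 1] data, n, and t is arbitrary): a [0, 1]-valued RV is SG(1/2), so the two-sided version of Theorem 14 gives $P[|\bar{X} - 1/2| \ge t] \le 2 e^{-2nt^2}$.

    import numpy as np

    rng = np.random.default_rng(1)
    n, reps, t = 50, 20000, 0.1
    means = rng.uniform(size=(reps, n)).mean(axis=1)   # 20000 sample means of n Uniform[0, 1] draws

    empirical = np.mean(np.abs(means - 0.5) >= t)      # Monte Carlo estimate of the two-sided tail
    bound = 2 * np.exp(-2 * n * t**2)                  # Hoeffding bound with sigma = 1/2
    print(empirical, bound)                            # the empirical tail lies below the bound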
Theorem 15 (Equivalence; Thm 2.1 in [1]). Suppose $E[X] = 0$. The following are equivalent:

(i) there is σ > 0 such that $E[e^{\lambda X}] \le e^{\lambda^2 \sigma^2/2}$ for all $\lambda \in \mathbb{R}$;

(ii) there are a constant $c \ge 1$ and a Gaussian RV $Z \sim N(0, \tau^2)$ such that $P[|X| \ge t] \le c\, P[|Z| \ge t]$ for all $t \ge 0$;

(iii) there is $\theta > 0$ such that $E[X^{2k}] \le \frac{(2k)!}{2^k k!}\, \theta^{2k}$ for all $k = 1, 2, \dots$
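For instance (a standard computation, added for concreteness), condition (iii) is tight for the Gaussian: if $Z \sim N(0, \sigma^2)$, its even moments are
\[ E[Z^{2k}] = \frac{(2k)!}{2^k k!}\, \sigma^{2k}, \qquad k = 1, 2, \dots, \]
so (iii) holds with $\theta = \sigma$ and with equality.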
A SG RV has tails that decay at least at the rate $e^{-c x^2}$, where x is the deviation from the mean and c is some constant. Such decay rates capture only a small subset of RVs of interest. Hence, we consider a broader class:
Definition 16 (Sub-Exponential RVs). A RV X with mean μ is sub-exponential with non-negative parameters $(\nu, \alpha)$, written $X \in SE(\nu, \alpha)$, if
\[ E[e^{\lambda(X - \mu)}] \le e^{\lambda^2 \nu^2/2} \qquad \text{for all } |\lambda| < 1/\alpha. \]
Comparing this with the definition of SG RVs, one finds that SE RVs bound the MGF only in a neighborhood of zero, whereas SG RVs bound it for every λ.
Example (χ² RV). Let $Z \sim N(0, 1)$ and $X = Z^2$, so $E[X] = 1$. Then
\[ E[e^{\lambda(X - 1)}] = \frac{e^{-\lambda}}{\sqrt{1 - 2\lambda}} \le e^{4\lambda^2/2} \qquad \text{for } |\lambda| < 1/4, \]
so $X \in SE(2, 4)$.
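A quick numerical sanity check of the displayed inequality, assuming NumPy is available (the λ grid is an arbitrary choice staying inside |λ| < 1/4):

    import numpy as np

    lam = np.linspace(-0.24, 0.24, 201)
    mgf = np.exp(-lam) / np.sqrt(1 - 2 * lam)   # exact MGF of chi^2_1 - 1
    bound = np.exp(4 * lam**2 / 2)              # sub-exponential bound with nu = 2
    print(bool(np.all(mgf <= bound)))           # True on this grid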
For an SE(ν, α) RV, the same Chernoff argument as before gives
\[ P(X - \mu \ge t) \le \exp\Big( -\lambda t + \frac{\lambda^2 \nu^2}{2} \Big), \qquad \text{where } 0 \le \lambda < 1/\alpha. \]
Unlike the SG case, where the tail bound can be found by solving an unconstrained optimization problem, here the MGF is bounded only in a neighborhood of 0, so the optimization problem becomes a constrained one.
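Carrying out the constrained optimization over $\lambda \in [0, 1/\alpha)$ (a standard computation, stated here for completeness) gives a bound with two regimes, Gaussian for small t and exponential for large t:
\[ P(X - \mu \ge t) \le \begin{cases} e^{-t^2/(2\nu^2)}, & 0 \le t \le \nu^2/\alpha, \\ e^{-t/(2\alpha)}, & t > \nu^2/\alpha. \end{cases} \]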
Theorem 18 (Bernstein's Condition). Let X be a random variable with mean μ such that the variance σ² exists. The Bernstein condition with parameter b > 0 is: for all integers $k \ge 3$,
\[ \big| E[(X - \mu)^k] \big| \le \frac{1}{2}\, k!\, \sigma^2\, b^{k-2}. \]
This is a sufficient but not necessary condition for sub-exponentiality. In particular, with the parameters above, $X \in SE(\sqrt{2}\,\sigma, 2b)$.
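For example (not from the original notes, but a direct check): a bounded RV satisfies Bernstein's condition. If $|X - \mu| \le B$ almost surely, then for every $k \ge 3$,
\[ \big| E[(X - \mu)^k] \big| \le B^{k-2}\, E[(X - \mu)^2] = \sigma^2 B^{k-2} \le \frac{1}{2}\, k!\, \sigma^2\, B^{k-2}, \]
so the condition holds with b = B.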
Theorem 19 (Bernstein's Inequality). Suppose X satisfies the Bernstein condition with parameter b. Then
\[ E[e^{\lambda(X - \mu)}] \le \exp\Big( \frac{\lambda^2 \sigma^2}{2(1 - b|\lambda|)} \Big) \qquad \text{for all } |\lambda| < 1/b, \]
hence
\[ P[|X - \mu| > t] \le 2 \exp\Big( -\lambda t + \frac{\lambda^2 \sigma^2}{2(1 - b|\lambda|)} \Big), \]
and therefore
\[ P[|X - \mu| > t] \le 2 \exp\Big( -\frac{t^2}{2(bt + \sigma^2)} \Big). \]
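The last display follows from the middle one by a specific (standard) choice of λ, shown here since the notes omit it: take $\lambda = \frac{t}{bt + \sigma^2} \in [0, 1/b)$, so that $1 - b\lambda = \frac{\sigma^2}{bt + \sigma^2}$ and
\[ -\lambda t + \frac{\lambda^2 \sigma^2}{2(1 - b\lambda)} = -\frac{t^2}{bt + \sigma^2} + \frac{t^2}{2(bt + \sigma^2)} = -\frac{t^2}{2(bt + \sigma^2)}. \]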
As an application, consider a map F that projects a set X of n points into $\mathbb{R}^m$; we would like F to nearly preserve pairwise distances, i.e., for any $x_1, x_2 \in X$:
\[ 1 - \delta \le \frac{\|F(x_1) - F(x_2)\|_2^2}{\|x_1 - x_2\|_2^2} \le 1 + \delta. \]
The probabilistic version (the Johnson-Lindenstrauss guarantee) says that, for any of the $\binom{n}{2}$ pairs of points $x_1, x_2$:
\[ P\bigg[ \frac{\|F(x_1) - F(x_2)\|_2^2}{\|x_1 - x_2\|_2^2} \notin [1 - \delta, 1 + \delta] \bigg] \le 2\, e^{-m\delta^2/8}. \]
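A minimal simulation sketch of this guarantee, assuming NumPy is available (the dimensions n, d, m and δ are arbitrary choices, and F is taken to be a scaled Gaussian random matrix, one standard construction):

    import numpy as np

    rng = np.random.default_rng(2)
    n, d, m, delta = 20, 2000, 800, 0.25
    X = rng.normal(size=(n, d))                   # n data points in R^d

    A = rng.normal(size=(m, d))
    Y = X @ A.T / np.sqrt(m)                      # F(x) = A x / sqrt(m)

    i, j = np.triu_indices(n, k=1)                # all n-choose-2 pairs
    ratios = (np.sum((Y[i] - Y[j]) ** 2, axis=1)
              / np.sum((X[i] - X[j]) ** 2, axis=1))
    print(ratios.min(), ratios.max())             # typically inside [1 - delta, 1 + delta]
    print(2 * np.exp(-m * delta**2 / 8))          # per-pair failure probability bound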