# Assignment #4 STA355H1F

## Problems to hand in:

1. In class, we discussed a general class of estimators of $g(x)$ in the non-parametric regression model

$$Y_i = g(x_i) + \varepsilon_i \quad \text{for } i = 1, \dots, n$$

of the form

$$\hat{g}(x) = \sum_{i=1}^n w_i(x) Y_i$$

where we required that $w_1(x) + \cdots + w_n(x) = 1$ for each $x$. However, it is often desirable for the weights $\{w_i(x)\}$ to satisfy other constraints. For example, if $\mathrm{Var}(\varepsilon_i)$ is very small or even 0 and $g(x)$ is a smooth function, then we would hope that $\hat{g}(x_i) \approx g(x_i)$ for $i = 1, \dots, n$.
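(a) Suppose that $Y_i = g(x_i) = \beta_0 + \sum_{k=1}^p \beta_k \phi_k(x_i)$ for $i = 1, \dots, n$ where $\beta_0, \beta_1, \dots, \beta_p$ are some constants and $\phi_k(x_i)$ are some functions. Show that $\hat{g}(x_i) = g(x_i)$ if

$$\sum_{j=1}^n w_j(x_i)\,\phi_k(x_j) = \phi_k(x_i) \quad \text{for } k = 1, \dots, p \text{ and } i = 1, \dots, n. \tag{1}$$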

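(b) If we define an $n \times n$ matrix $A$ with

$$A = \begin{pmatrix}
w_1(x_1) & w_2(x_1) & \cdots & w_n(x_1) \\
w_1(x_2) & w_2(x_2) & \cdots & w_n(x_2) \\
\vdots & \vdots & \ddots & \vdots \\
w_1(x_n) & w_2(x_n) & \cdots & w_n(x_n)
\end{pmatrix}$$

then

$$\begin{pmatrix} \hat{g}(x_1) \\ \vdots \\ \hat{g}(x_n) \end{pmatrix} = A \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}.$$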

If equation (1) holds, show that

$$\begin{pmatrix} \phi_1(x_1) \\ \vdots \\ \phi_1(x_n) \end{pmatrix}, \dots, \begin{pmatrix} \phi_p(x_1) \\ \vdots \\ \phi_p(x_n) \end{pmatrix}$$

are eigenvectors of $A$ with eigenvalues all equal to 1. Show that the condition $w_1(x) + \cdots + w_n(x) = 1$ for all $x$ implies that the vector $(1, \dots, 1)^T$ is also an eigenvector of $A$ with eigenvalue 1.
(c) The matrix $A$ in part (b) is called the smoothing matrix and is the analogue of the hat matrix $H = X(X^T X)^{-1} X^T$ in linear regression. However, $H$ can only have eigenvalues equal to 0 and 1, while the $n$ eigenvalues of $A$ typically satisfy $\lambda_1 = \cdots = \lambda_r = 1$ with $|\lambda_k| < 1$ for $k = r + 1, \dots, n$. Usually, by varying a smoothing parameter, we can make $\lambda_{r+1}, \dots, \lambda_n$ closer to 0 (which will make $\hat{g}(x)$ smoother) or closer to 1 (which can result in $\hat{g}(x)$ being very non-smooth).
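The claim about the hat matrix can be checked numerically. The following is a minimal sketch, assuming a small simulated design matrix `X` (a hypothetical 10 x 3 example, not part of the assignment); it should show three eigenvalues equal to 1 and seven equal to 0, up to rounding error.

```r
# Hypothetical design matrix: intercept plus two simulated covariates.
set.seed(1)
X <- cbind(1, rnorm(10), rnorm(10))

# Hat matrix H = X (X^T X)^{-1} X^T.
H <- X %*% solve(t(X) %*% X) %*% t(X)

# H is symmetric, so its eigenvalues are real.
ev <- eigen(H, symmetric = TRUE)$values
print(round(ev, 6))
```

The number of unit eigenvalues equals the rank of `X`, which is why a smoother with eigenvalues strictly between 0 and 1 behaves differently from a projection.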
The form of the smoothing matrix is typically hidden from the user and in fact, is often never explicitly computed. However, for a given smoothing procedure, we can recover the $j$-th column of the smoothing matrix by applying the smoothing procedure to pseudo-data $(x_1, y_1), \dots, (x_n, y_n)$ where $y_j = 1$ and $y_i = 0$ for $i \ne j$.
Consider estimating $g$ using loess (with locally weighted quadratic regression) in the model

$$Y_i = g(x_i) + \varepsilon_i \quad \text{for } i = 1, \dots, 20$$

where $x_i = i/21$. The following R code computes the smoothing matrix $A$ for a given value of the smoothing parameter span (in this case 0.7) and then computes its eigenvalues and eigenvectors:
> x <- c(1:20)/21
> A <- NULL
> for (i in 1:20) {
+    # pseudo-data: 1 in position i, 0 elsewhere
+    y <- c(rep(0,i-1),1,rep(0,20-i))
+    r <- loess(y~x,span=0.7)
+    # the fitted values form the i-th column of A
+    A <- cbind(A,r$fitted)
+ }
> r <- eigen(A)
> round(Re(r$values),3) # look at real components of the eigenvalues
[1] 1.000 1.000 1.000 0.957 0.715 0.370 0.116 -0.031 -0.013 -0.013
[11] -0.009 0.005 -0.002 0.000 0.000 0.000 0.000 0.000 0.000 0.000
The matrix $A$ is not necessarily perfectly symmetric and so its eigenvalues may contain (usually small) imaginary components; thus for simplicity, we will just look at the real components of the eigenvalues. Note that $A$ has 3 eigenvectors with eigenvalue equal to 1; this is a consequence of using locally weighted quadratic regression, which essentially reproduces constant, linear, and quadratic functions of $x$ exactly.
Repeat the procedure above using different values of span between 0 and 1. What do you notice about the eigenvalues as span varies?
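The three unit eigenvalues can also be checked directly, without calling `eigen`. The sketch below wraps the pseudo-data procedure in a helper, `smoothing_matrix` (a hypothetical name introduced here for illustration), and measures how far $Av$ is from $v$ for the vectors with entries $1$, $x_i$ and $x_i^2$. It assumes the default settings of `loess` other than `span`.

```r
# Build the smoothing matrix column by column from pseudo-data.
# smoothing_matrix is an illustrative helper, not part of the assignment.
smoothing_matrix <- function(x, span) {
  n <- length(x)
  sapply(seq_len(n), function(j) {
    y <- as.numeric(seq_len(n) == j)   # 1 in position j, 0 elsewhere
    loess(y ~ x, span = span)$fitted   # j-th column of A
  })
}

x <- (1:20) / 21
A <- smoothing_matrix(x, span = 0.7)

# Largest absolute difference between A v and v.
max_dev <- function(v) max(abs(A %*% v - v))
devs <- c(constant  = max_dev(rep(1, length(x))),
          linear    = max_dev(x),
          quadratic = max_dev(x^2))
print(signif(devs, 3))
```

Calling `smoothing_matrix(x, span)` with other values of `span` gives the matrices needed to repeat the eigenvalue computation.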
(d) (Optional but recommended) Take a look at the eigenvectors of $A$ in part (c) (for a particular value of span). This can be done using the following R code:
> r <- eigen(A)
> for (i in 1:20) {
+    devAskNewPage(ask = TRUE)
+    plot(x,Re(r$vectors[,i]),type="b")
+ }

(The command devAskNewPage(ask = TRUE) prompts you to move to the next plot.) Note that the first few eigenvectors are quite smooth but the later ones become progressively less smooth.
2. The idea behind the Lilliefors test of exponentiality can also be applied to goodness-of-fit testing for normality; the Lilliefors test of normality uses the test statistic

$$L = \max_{-\infty < x < \infty} \left| \frac{1}{n} \sum_{i=1}^n I(X_i \le x) - \Phi\!\left(\frac{x - \bar{X}}{S}\right) \right|$$

(where $\bar{X}$ and $S^2$ are the sample mean and sample variance), which is simply the Kolmogorov-Smirnov test statistic with estimated parameters, and we reject the hypothesis of normality for large values of $L$. Because we are estimating the parameters, the null distribution of $L$ is not the same as the distribution of the Kolmogorov-Smirnov statistic (for which good approximations exist); the null distribution of $L$ is typically evaluated based on Monte Carlo simulation with some ad hoc approximation.
The ks.test function in R can be used to compute the Lilliefors statistic $L$ and gives a p-value for the Kolmogorov-Smirnov test (which as mentioned above is the wrong p-value for the Lilliefors test). However, using simulation, it is possible to obtain an approximate relationship between the p-values for the Lilliefors and Kolmogorov-Smirnov tests.
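To make the definition of $L$ concrete, here is a minimal sketch that computes the statistic directly from its formula, compares it with the statistic reported by `ks.test`, and approximates a p-value by Monte Carlo. The helper names `lilliefors_stat` and `lilliefors_mc_pvalue`, the seed, and the number of replications `B` are assumptions made for illustration. The simulation uses standard normal samples because $L$ does not change under shifts and rescalings of the data.

```r
# Lilliefors statistic: the maximum is attained at a jump of the
# empirical distribution function, so only the order statistics matter.
lilliefors_stat <- function(x) {
  n <- length(x)
  z <- pnorm(sort(x), mean = mean(x), sd = sd(x))
  max(max(seq_len(n) / n - z), max(z - (seq_len(n) - 1) / n))
}

# Monte Carlo p-value: proportion of simulated null statistics >= observed.
lilliefors_mc_pvalue <- function(x, B = 2000) {
  L_obs <- lilliefors_stat(x)
  L_null <- replicate(B, lilliefors_stat(rnorm(length(x))))
  mean(L_null >= L_obs)
}

set.seed(355)
x <- rnorm(50)
L_direct <- lilliefors_stat(x)
L_ks <- unname(ks.test(x, "pnorm", mean(x), sd(x))$statistic)
p_mc <- lilliefors_mc_pvalue(x)
print(c(direct = L_direct, ks.test = L_ks, mc.pvalue = p_mc))
```

The two statistics should agree, while the Monte Carlo p-value will generally differ from the p-value that `ks.test` reports.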
(a) For n = 50, generate 1000 samples from a normal distribution and compute the p-values
for the Kolmogorov-Smirnov test using the following R code:
> pvalues50 <- NULL
> for (i in 1:1000) {
+    x <- rnorm(50)
+    # now carry out K-S test with estimated parameters
+    r <- ks.test(x,pnorm,mean(x),sqrt(var(x)),alternative="two.sided")
+    pvalues50 <- c(pvalues50,r$p.value)
+ }
> plot(ecdf(pvalues50)) # empirical distribution function of pvalues50
Comment on the results. If we carry out a test for normality at the 0.05 level, what would
the p-value from the Kolmogorov-Smirnov test need to be (approximately) in order to reject
the hypothesis of normality at the 0.05 level?
(b) Repeat part (a) for n = 100 (define a vector pvalues100). Are the results similar to
those in part (a)?
(c) Describe the relationship between the p-values for the Lilliefors test and those obtained
from the Kolmogorov-Smirnov test. (Think about how you might transform the Kolmogorov-Smirnov p-values to get correct p-values for the Lilliefors test.)

## Not to hand in:

3. Suppose that $(X_1, \dots, X_n)$ have a joint density $f(x_1, \dots, x_n)$ where $f$ is either $f_0$ or $f_1$ (where both $f_0$ and $f_1$ have no unknown parameters). We put a prior distribution on the possible densities $\{f_0, f_1\}$: $\pi(f_0) = \pi_0 > 0$ and $\pi(f_1) = \pi_1 > 0$ where $\pi_0 + \pi_1 = 1$. (This is a Bayesian formulation of the Neyman-Pearson hypothesis testing setup.)
(a) Show that the posterior distribution of $\{f_0, f_1\}$ is

$$\pi(f_k | x_1, \dots, x_n) = \tau(x_1, \dots, x_n)\, \pi_k f_k(x_1, \dots, x_n) \quad \text{for } k = 0, 1$$

and give the value of the normalizing constant $\tau(x_1, \dots, x_n)$. (Note that $\pi(f_0 | x_1, \dots, x_n) + \pi(f_1 | x_1, \dots, x_n)$ must equal 1.)
(b) When will $\pi(f_0 | x_1, \dots, x_n) > \pi(f_1 | x_1, \dots, x_n)$? What effect do the prior probabilities $\pi_0$ and $\pi_1$ have?
(c) Suppose now that $X_1, \dots, X_n$ are independent random variables with common density $g$ where $g$ is either $g_0$ or $g_1$ so that

$$f_k(x_1, \dots, x_n) = g_k(x_1) g_k(x_2) \cdots g_k(x_n) \quad \text{for } k = 0, 1.$$

If $g_0$ is the true density of $X_1, \dots, X_n$ and $\pi_0 > 0$, show that

$$\pi(f_0 | x_1, \dots, x_n) \xrightarrow{p} 1 \quad \text{as } n \to \infty.$$

(Hint: Look at $n^{-1} \ln\left(\pi(f_0 | x_1, \dots, x_n) / \pi(f_1 | x_1, \dots, x_n)\right)$ and use the WLLN.)
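The behaviour in part (c) can be illustrated by simulation. The sketch below is only an illustration under assumed choices: $g_0$ is taken to be the N(0, 1) density, $g_1$ the N(0.5, 1) density, and the prior probability of $f_0$ is set to 0.1. The helper `posterior_f0` is a hypothetical name; it works with log-likelihoods to avoid numerical underflow for large $n$.

```r
# Posterior probability of f0 given independent observations x.
# Uses pi0*f0 / (pi0*f0 + pi1*f1) = plogis(log(pi0*f0) - log(pi1*f1)).
posterior_f0 <- function(x, g0, g1, prior0 = 0.5) {
  loglik0 <- sum(log(g0(x)))
  loglik1 <- sum(log(g1(x)))
  plogis(log(prior0) - log(1 - prior0) + loglik0 - loglik1)
}

# Assumed densities for illustration only.
g0 <- function(x) dnorm(x, mean = 0, sd = 1)
g1 <- function(x) dnorm(x, mean = 0.5, sd = 1)

set.seed(3)
x <- rnorm(500)                      # data generated from g0
sizes <- c(5, 50, 500)
post <- sapply(sizes, function(n) posterior_f0(x[1:n], g0, g1, prior0 = 0.1))
print(round(setNames(post, paste0("n=", sizes)), 4))
```

Even with a prior that favours $f_1$, the posterior probability of $f_0$ should approach 1 as more observations from $g_0$ are used.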
4. Suppose that $X_1, \dots, X_n$ are independent Exponential random variables with parameter $\lambda$. Let $X_{(1)} < \cdots < X_{(n)}$ be the order statistics and define the normalized spacings

$$D_1 = n X_{(1)} \quad \text{and} \quad D_k = (n - k + 1)(X_{(k)} - X_{(k-1)}) \quad (k = 2, \dots, n).$$

As stated in class, $D_1, \dots, D_n$ are also independent exponential random variables with parameter $\lambda$.
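As a concrete illustration of the definition, the following sketch computes the normalized spacings for a simulated exponential sample and checks that they sum to $n$ times the sample mean. The helper name `normalized_spacings`, the sample size, and the rate are assumptions made for this example.

```r
# Normalized spacings D_k = (n - k + 1) * (X_(k) - X_(k-1)), with X_(0) = 0.
normalized_spacings <- function(x) {
  n <- length(x)
  (n:1) * diff(c(0, sort(x)))
}

set.seed(4)
x <- rexp(30, rate = 2)              # simulated data, lambda = 2
D <- normalized_spacings(x)

# The spacings telescope, so their sum equals the sum of the observations.
print(c(sum_D = sum(D), n_times_mean = length(x) * mean(x)))
```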
(a) Let $\bar{X}_n$ be the sample mean of $X_1, \dots, X_n$ and define for integers $r \ge 2$

$$T_n = \frac{1}{n \bar{X}_n^r} \sum_{i=1}^n D_i^r.$$

(Hint: Note that $D_1 + \cdots + D_n = n \bar{X}_n$ and so $\bar{X}_n = \bar{D}_n$. You will need to find the joint limiting distribution of

$$\sqrt{n} \left( \frac{1}{n} \sum_{i=1}^n D_i - \lambda^{-1} \right) \quad \text{and} \quad \sqrt{n} \left( \frac{1}{n} \sum_{i=1}^n D_i^r - r! \, \lambda^{-r} \right)$$

and then apply the Delta Method; a note on the Delta Method in two (or higher) dimensions can be found on Blackboard. You can compute the elements of the limiting variance-covariance matrix using the fact that $E(D_i^k) = k!/\lambda^k$ for $k = 1, 2, 3, \dots$, which will allow you to compute $\mathrm{Var}(D_i^r)$ and $\mathrm{Cov}(D_i^r, D_i)$.)
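The statistic is straightforward to compute. The sketch below defines a helper `Tn_stat` (a hypothetical name) and uses simulated Exponential(1) samples of size 200 to check that $T_n$ is centred near $r!$ for $r = 2$; the seed, sample size, and number of replications are arbitrary choices for illustration.

```r
# T_n = (1 / (n * mean(x)^r)) * sum(D_i^r), with D_i the normalized spacings.
Tn_stat <- function(x, r = 2) {
  n <- length(x)
  D <- (n:1) * diff(c(0, sort(x)))
  sum(D^r) / (n * mean(x)^r)
}

set.seed(5)
Tn_exp <- replicate(2000, Tn_stat(rexp(200), r = 2))
print(c(mean_Tn = mean(Tn_exp), r_factorial = factorial(2)))

# T_n does not depend on the scale of the data.
x0 <- rexp(50)
print(c(Tn_stat(x0), Tn_stat(10 * x0)))
```

Because $T_n$ is scale invariant, its null distribution does not depend on $\lambda$, which is what makes it usable as a test statistic for exponentiality.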
Note: We can use the statistic $T_n$ (or $\ln(T_n)$ for which the normal approximation is slightly better) defined in part (a) to test for exponentiality, that is, the null hypothesis that $X_1, \dots, X_n$ come from an exponential distribution; for an $\alpha$-level test, we reject the null hypothesis for $T_n > c$ where we can approximate $c$ using a normal approximation to the distribution of $T_n$ or $\ln(T_n)$. The success of this test depends on $T_n \xrightarrow{p} a(F)$ where $a(F) > r!$ for non-exponential distributions $F$. Assume that $F$ concentrates all of its probability mass on the positive real line and has a density $f$ with $f(x) > 0$ for all $x > 0$; without loss of generality, assume that the mean of $F$ is 1. If $k/n \to t$ then $D_k = (n - k + 1)(X_{(k)} - X_{(k-1)})$ is approximately exponentially distributed with mean $(1 - t)/f(F^{-1}(t))$, which is constant for $0 < t < 1$ if (and only if) $F$ is an exponential distribution. Since the mean of $F$ is 1, $\bar{D}_n = \bar{X}_n \xrightarrow{p} 1$ but $(D_1^r + \cdots + D_n^r)/n \xrightarrow{p} a(F)$ where

$$a(F) = r! \int_0^1 \left( \frac{1 - t}{f(F^{-1}(t))} \right)^r dt.$$

By Hölder's inequality, it follows that

$$\int_0^1 \left( \frac{1 - t}{f(F^{-1}(t))} \right)^r dt \ge \left( \int_0^1 \frac{1 - t}{f(F^{-1}(t))} \, dt \right)^r$$

and

$$\int_0^1 \frac{1 - t}{f(F^{-1}(t))} \, dt = \int_0^\infty (1 - F(x)) \, dx = 1$$

since we assume that the mean of $F$ is 1. Thus $a(F) \ge r!$. Moreover, if $a(F) = r!$ then $(1 - t)/f(F^{-1}(t)) = 1$ for $0 < t < 1$ or $(1 - F(x))/f(x) = 1$ for all $x > 0$, which implies that $F(x) = 1 - \exp(-x)$.
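Two steps in the note are stated without detail: the change of variables that turns the integral over $t$ into the mean of $F$, and the final identification of $F$. A worked sketch of both, writing $S(x) = 1 - F(x)$ (a symbol introduced here for convenience), is:

```latex
% Step 1: substitute t = F(x), so that dt = f(x) dx and F^{-1}(t) = x.
\begin{align*}
\int_0^1 \frac{1-t}{f(F^{-1}(t))}\,dt
  &= \int_0^\infty \frac{1-F(x)}{f(x)}\, f(x)\,dx
   = \int_0^\infty (1-F(x))\,dx
   = E(X) = 1 .
\end{align*}
% Step 2: if (1 - F(x))/f(x) = 1 for all x > 0, then with S(x) = 1 - F(x)
% we have S'(x) = -f(x) = -S(x) and S(0) = 1, so
\begin{align*}
S(x) = \exp(-x) \quad\Longrightarrow\quad F(x) = 1 - \exp(-x).
\end{align*}
```

The identity $E(X) = \int_0^\infty (1 - F(x))\,dx$ used in the first step holds for any non-negative random variable.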
(b) Use the test suggested in part (a) on the air conditioning data (from Assignment #1)
taking r = 2 and r = 3 (using the normal approximation to ln(Tn )) to assess whether an
exponential model is reasonable for these data.