Marco Giacoletti
11/22/2013
Simulate Data
We simulate $(x_i, e_i)$ for $i = 1, \dots, N$ with $N = 5000$, so that $x_i$ is normally distributed
with mean 1 and $e_i$ is normally distributed with mean 0. Their variance-covariance
matrix is:
$$\Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 5 \end{pmatrix}$$
We then define:
$$y_i = x_i + x_i^2 + e_i$$
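A minimal MATLAB sketch of this simulation step. It assumes independent draws (the covariance matrix is diagonal) and an arbitrary seed; the variable names `x`, `e`, and `y` are chosen here for illustration and are not taken from the original script.

```matlab
% Simulate (x_i, e_i) and build y_i = x_i + x_i^2 + e_i.
rng(0);                     % arbitrary seed, for reproducibility only
N = 5000;
x = 1 + randn(N,1);         % x_i ~ N(1, 1)
e = sqrt(5)*randn(N,1);     % e_i ~ N(0, 5), drawn independently of x_i
y = x + x.^2 + e;           % dependent variable
```

Because the off-diagonal covariance is zero, two independent calls to `randn` are enough and no multivariate sampler is needed.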
Problem 1
Estimate the conditional expectation of wages given iq using the Epanechnikov kernel.
First plot the cross-validation criterion and select the optimal bandwidth.
Solutions:
The conditional expectation at observation x can be defined as:
$$\hat g_h(x) = \frac{\sum_{i=1}^{N} Y_i \, K\!\left(\frac{X_i - x}{h}\right)}{\sum_{j=1}^{N} K\!\left(\frac{X_j - x}{h}\right)}$$
The kernel function is the Epanechnikov kernel:
$$K(z) = \frac{3}{4\sqrt{5}}\left(1 - \frac{z^2}{5}\right)\mathbf{1}\{|z| < \sqrt{5}\}$$
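As a quick sanity check, the kernel and the estimator can be sketched in a few lines of MATLAB. The handles `epa` and `nw` are hypothetical names introduced here, and the toy design at the end is an assumption made only to show the estimator at work.

```matlab
% Epanechnikov kernel scaled to have support |z| < sqrt(5).
epa = @(z) (3/(4*sqrt(5)))*(1 - z.^2/5).*(abs(z) < sqrt(5));

% With this scaling the kernel integrates to one and has unit second moment.
mass = integral(epa, -sqrt(5), sqrt(5));
m2   = integral(@(z) z.^2.*epa(z), -sqrt(5), sqrt(5));

% Kernel regression estimate at a point x0 (nw is a hypothetical helper).
nw = @(x0, X, Y, h) sum(Y.*epa((X - x0)/h)) / sum(epa((X - x0)/h));

% Toy check: for Y = 2*X on a design symmetric around 0,
% the weighted average at x0 = 0 is zero up to rounding.
X  = (-1:0.1:1)';
g0 = nw(0, X, 2*X, 0.5);
```

The normalising constant cancels in the ratio that defines the regression estimate, but it matters for the density estimate used later in the confidence bands.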
In order to evaluate the Cross-Validation criterion we first calculate the kernel
estimator based on leaving out the ith observation:
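$$\hat g_{h,(-i)}(x) = \frac{\sum_{j=1,\, j \neq i}^{N} Y_j \, K\!\left(\frac{X_j - x}{h}\right)}{\sum_{k=1,\, k \neq i}^{N} K\!\left(\frac{X_k - x}{h}\right)}$$
The cross-validation criterion evaluates each leave-one-out estimate at the omitted observation:
$$CV(h) = \sum_{i=1}^{N} \left(\hat g_{h,(-i)}(X_i) - Y_i\right)^2$$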
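The leave-one-out estimates can be computed for all observations at once by zeroing the diagonal of the kernel weight matrix. The sketch below assumes a smaller simulated sample (so the N x N matrix stays light) and an illustrative bandwidth; it is a vectorised alternative, not the loop used in the code appendix.

```matlab
% Leave-one-out kernel regression and CV criterion for one bandwidth.
rng(0);                                   % arbitrary seed
N = 300;                                  % assumed smaller sample
X = 1 + randn(N,1);
Y = X + X.^2 + sqrt(5)*randn(N,1);
h = 0.5;                                  % illustrative bandwidth
epa = @(z) (3/(4*sqrt(5)))*(1 - z.^2/5).*(abs(z) < sqrt(5));

W = epa(bsxfun(@minus, X, X')/h);         % W(i,j) = K((X_i - X_j)/h), symmetric
W(1:N+1:end) = 0;                         % zero diagonal: drop own observation
g_loo = (W*Y)./sum(W,2);                  % leave-one-out estimate at each X_i
CV = sum((g_loo - Y).^2);                 % NaN if some X_i has no neighbour in the support
```

An observation with no neighbour inside the kernel support yields 0/0, so the criterion is undefined for that bandwidth; this mostly affects very small values of `h`.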
The following figure reports the criterion. We focus on values of the bandwidth in
the interval between 0.01 and 1.01 and build an equally spaced grid where the distance
between two consecutive points is 0.01.
[Figure: cross-validation criterion (vertical axis, ×10^5) against the bandwidth h (horizontal axis).]
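The bandwidth search described above can be sketched as a loop over the grid. This assumes a smaller simulated sample than in the text and reuses the vectorised leave-one-out trick; `h_grid`, `CV`, and `h_opt` are names introduced here.

```matlab
% Grid search for the CV-optimal bandwidth on 0.01, 0.02, ..., 1.01.
rng(0);                                   % arbitrary seed
N = 300;                                  % assumed smaller sample
X = 1 + randn(N,1);
Y = X + X.^2 + sqrt(5)*randn(N,1);
epa = @(z) (3/(4*sqrt(5)))*(1 - z.^2/5).*(abs(z) < sqrt(5));

D = bsxfun(@minus, X, X');                % pairwise differences X_i - X_j
h_grid = (1:101)'*0.01;                   % equally spaced grid, step 0.01
CV = NaN(size(h_grid));
for m = 1:numel(h_grid)
    W = epa(D/h_grid(m));
    W(1:N+1:end) = 0;                     % leave the own observation out
    CV(m) = sum(((W*Y)./sum(W,2) - Y).^2);
end
[CV_min, m_opt] = min(CV);                % min skips NaN entries
h_opt = h_grid(m_opt);
```

Bandwidths for which some observation has no neighbour give a NaN criterion and are therefore never selected, which seems preferable to dropping those observations from the sum.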
Problem 2
Plot the regression function with 95% confidence bands.
Solutions:
We can now calculate the conditional expectation function using the optimal bandwidth:
$$\hat g_{h_{opt}}(x) = \frac{\sum_{i=1}^{N} Y_i \, K\!\left(\frac{X_i - x}{h_{opt}}\right)}{\sum_{j=1}^{N} K\!\left(\frac{X_j - x}{h_{opt}}\right)}$$
With asymptotic distribution:
$$\sqrt{N h_{opt}}\left(\hat g_{h_{opt}}(x) - g(x)\right) \xrightarrow{d} N\!\left(0, \; \frac{\sigma^2(x)}{f(x)} \int K(u)^2 \, du\right)$$
where we ignore the bias term. The asymptotic variance has several components. The first
is the scaling factor:
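$$
\begin{aligned}
\int K(u)^2 \, du
&= \int \left[\frac{3}{4\sqrt{5}}\left(1 - \frac{u^2}{5}\right)\mathbf{1}\{|u| < \sqrt{5}\}\right]^2 du \\
&= \frac{9}{80}\int_{-\infty}^{+\infty}\left(1 + \frac{1}{25}u^4 - \frac{2}{5}u^2\right)\mathbf{1}\{|u| < \sqrt{5}\} \, du \\
&= \frac{9}{80}\int_{-\sqrt{5}}^{\sqrt{5}}\left(1 + \frac{1}{25}u^4 - \frac{2}{5}u^2\right) du \\
&= \frac{9}{80}\left[u + \frac{1}{125}u^5 - \frac{2}{15}u^3\right]_{-\sqrt{5}}^{\sqrt{5}} \\
&= 2 \cdot \frac{9}{80}\left(\sqrt{5} + \frac{1}{125}\,5^2\sqrt{5} - \frac{2}{15}\,5\sqrt{5}\right) \\
&= \frac{9}{40}\left(1 + \frac{1}{5} - \frac{2}{3}\right)\sqrt{5}
 = \frac{9}{40}\cdot\frac{8}{15}\sqrt{5}
 = \frac{3}{25}\sqrt{5}
 = \frac{3}{5\sqrt{5}}
\end{aligned}
$$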
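The closed form can be cross-checked numerically. The sketch below uses MATLAB's `integral`; the handle `epa` and the variable names are introduced here for illustration.

```matlab
% Numerical check of the integral of the squared Epanechnikov kernel.
epa = @(z) (3/(4*sqrt(5)))*(1 - z.^2/5).*(abs(z) < sqrt(5));
K2_numeric = integral(@(u) epa(u).^2, -sqrt(5), sqrt(5));
K2_closed  = 3/(5*sqrt(5));               % closed form derived above
gap = abs(K2_numeric - K2_closed);        % should be at rounding level
```

Both values are approximately 0.2683.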
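The remaining components are the conditional variance and the density of $X$, which we
estimate as:
$$\hat\sigma^2(x) = \frac{\sum_{i=1}^{N} \left(Y_i - \hat g_{h_{opt}}(x)\right)^2 K\!\left(\frac{X_i - x}{h_{opt}}\right)}{\sum_{j=1}^{N} K\!\left(\frac{X_j - x}{h_{opt}}\right)}$$
$$\hat f(x) = \frac{1}{N h_{opt}} \sum_{i=1}^{N} K\!\left(\frac{X_i - x}{h_{opt}}\right)$$
The estimated asymptotic variance is therefore:
$$\frac{\hat\sigma^2(x)}{\hat f(x)} \cdot \frac{3}{5\sqrt{5}}$$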
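Putting the pieces together at a single evaluation point gives the pointwise band. The sketch below assumes simulated data as in the first section, an illustrative bandwidth `h = 0.3` rather than the cross-validated one, and the normal critical value 1.96 in place of `norminv`.

```matlab
% Pointwise 95% confidence interval for the kernel regression at x0.
rng(0);                                   % arbitrary seed
N = 5000;
X = 1 + randn(N,1);
Y = X + X.^2 + sqrt(5)*randn(N,1);
epa = @(z) (3/(4*sqrt(5)))*(1 - z.^2/5).*(abs(z) < sqrt(5));
K2 = 3/(5*sqrt(5));                       % integral of the squared kernel

h  = 0.3;                                 % illustrative bandwidth (assumption)
x0 = 1;                                   % evaluation point (assumption)
w  = epa((X - x0)/h);                     % kernel weights around x0
g0 = sum(Y.*w)/sum(w);                    % conditional expectation estimate
s2 = sum(((Y - g0).^2).*w)/sum(w);        % conditional variance estimate
f0 = sum(w)/(N*h);                        % density estimate
se = sqrt(K2*s2/f0)/sqrt(N*h);            % asymptotic standard error
CI = [g0 - 1.96*se, g0 + 1.96*se];        % band ignoring the bias term
```

Since the bias term is ignored, the interval is centred on the smoothed estimate rather than on the true regression function, whose value at `x0 = 1` is 2 in this design.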
[Figure: kernel estimate of the regression function with 95% confidence bands. Legend: Expectation, 95% CI.]
Problem 3
Estimate the regression function using polynomial series. Calculate the cross-validation
criterion for K = 0, 1, . . . , 10, and select the optimal number of terms in the series.
Solutions:
We now turn to polynomial series; we define the conditional expectation as a regression function:
$$g(x; K) = \sum_{k=0}^{K} \theta_k x^k$$
We again use cross validation to select the optimal value of K; the criterion is:
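$$CV(K) = \sum_{i=1}^{N} \left(\hat g_{(-i)}(X_i) - Y_i\right)^2$$
With:
$$\hat g_{(-i)}(x) = \sum_{k=0}^{K} \hat\theta_{k,(-i)} \, x^k$$
where $\hat\theta_{k,(-i)}$ are the least-squares coefficients estimated without the $i$th observation.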
We evaluate the criterion for values of K ranging between 0 and 10; as we can see
in the following figure, the criterion is minimized at K = 3 terms (constant, linear, and
quadratic), that is, at a quadratic function, as expected from the simulated model.
[Figure: cross-validation criterion (vertical axis, ×10^5) against K (horizontal axis).]
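For least squares, the leave-one-out prediction error does not require refitting: it equals the full-sample residual divided by one minus the leverage. The sketch below uses this standard identity instead of the explicit loop in the code appendix; it assumes a smaller simulated sample, and here `K` denotes the polynomial degree.

```matlab
% Leave-one-out CV for polynomial series via the leverage identity.
rng(0);                                   % arbitrary seed
N = 500;                                  % assumed smaller sample
x = 1 + randn(N,1);
y = x + x.^2 + sqrt(5)*randn(N,1);

Kmax = 10;
CV = NaN(Kmax+1,1);
for K = 0:Kmax
    XX = bsxfun(@power, x, 0:K);          % N x (K+1) matrix of powers of x
    [Qr, ~] = qr(XX, 0);                  % economy-size QR for stability
    lev = sum(Qr.^2, 2);                  % leverages (diagonal of the hat matrix)
    res = y - Qr*(Qr'*y);                 % full-sample OLS residuals
    CV(K+1) = sum((res./(1 - lev)).^2);   % sum of squared leave-one-out errors
end
[~, idx] = min(CV);
K_opt = idx - 1;                          % degree with the smallest criterion
```

The QR factorisation avoids forming `XX'*XX`, which becomes badly conditioned for the higher powers.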
Problem 4
Plot the regression function with 95% confidence bands.
Solutions:
The model selected in the previous step is:
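$$Y_i = g(X_i) + e_i = \theta_0 + \theta_1 X_i + \theta_2 X_i^2 + e_i = \mathbf{X}_i'\theta + e_i$$
where $\mathbf{X}_i = (1, X_i, X_i^2)'$ and $\theta = (\theta_0, \theta_1, \theta_2)'$. Under the homoskedasticity
assumption (which holds for the simulated data), we can calculate the regression
standard error as:
$$\hat\sigma_e = \sqrt{\frac{\hat e'\hat e}{N - 3}}$$
where $\hat e$ is the vector of residuals and $N - 3$ accounts for the three estimated coefficients,
and the variance-covariance matrix of the coefficients as:
$$\widehat{Var}(\hat\theta) = \hat\sigma_e^2 \left(X'X\right)^{-1}$$
The 95% band at a point $x$ is then $\hat g(x) \pm 1.96\sqrt{\mathbf{x}'\,\widehat{Var}(\hat\theta)\,\mathbf{x}}$ with $\mathbf{x} = (1, x, x^2)'$.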
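These formulas translate directly into a few lines of MATLAB. The sketch below assumes simulated data as in the first section and uses 1.96 as the normal critical value; the variable names are chosen here for illustration.

```matlab
% Quadratic regression with pointwise 95% confidence bands.
rng(0);                                   % arbitrary seed
N = 5000;
x = 1 + randn(N,1);
y = x + x.^2 + sqrt(5)*randn(N,1);

XX     = [ones(N,1), x, x.^2];            % regressors (1, X_i, X_i^2)
theta  = XX\y;                            % least-squares coefficients
res    = y - XX*theta;                    % residuals
s2_e   = (res'*res)/(N - 3);              % residual variance, 3 coefficients
V      = s2_e*((XX'*XX)\eye(3));          % variance-covariance of theta
g_hat  = XX*theta;                        % fitted conditional expectation
se_fit = sqrt(sum((XX*V).*XX, 2));        % sqrt of X_i' V X_i, row by row
CI_025 = g_hat - 1.96*se_fit;
CI_975 = g_hat + 1.96*se_fit;
```

The row-wise product `sum((XX*V).*XX, 2)` gives the same quadratic forms as a loop over observations without building the full N x N matrix.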
[Figure: polynomial series estimate of the regression function with 95% confidence bands. Legend: Expectation, 95% CI.]
Code
function [Q, g_hat, CI_025, CI_975] = CrossVal_ker(X,Y,h)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% NOTE: the name CrossVal_ker is hypothetical, chosen by analogy with
% CrossVal_pol below.
% input:
% X = covariates
% Y = dependent variable
% h = bandwidth
% output:
% Q = CV criterion
% g_hat = conditional expectation
% CI_025, CI_975 = 95% confidence bands
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Epanechnikov kernel function (support |z| < sqrt(5))
epa_kernel = @(z) (3/(4*sqrt(5)))*(1-(1/5)*z.^2).*(abs(z)<sqrt(5));
% integral of the squared kernel
K2 = 3/(5*sqrt(5));
% allocate memory (column vectors, to match X and Y)
N = size(X,1);
g_hat = NaN(N,1);
f_hat = NaN(N,1);
g_hat_i = NaN(N,1);
v_hat = NaN(N,1);
CI_025 = NaN(N,1);
CI_975 = NaN(N,1);
% building blocks of CV criterion
for nn=1:N
    % kernel weights of all observations around X(nn)
    w = epa_kernel((X-X(nn))/h);
    % exclude current observation
    keep = true(N,1);
    keep(nn) = false;
    % nnth element for CV criterion
    g_hat_i(nn) = sum(Y(keep).*w(keep))/sum(w(keep));
    % conditional expectation
    g_hat(nn) = sum(Y.*w)/sum(w);
    % elements for variance of conditional expectation
    f_hat(nn) = (1/(N*h))*sum(w);
    v_hat(nn) = sum(((Y-g_hat(nn)).^2).*w)/sum(w);
    % 95% confidence bands
    se = sqrt(K2*v_hat(nn)/f_hat(nn))/sqrt(N*h);
    CI_025(nn) = g_hat(nn) + norminv(0.025,0,1)*se;
    CI_975(nn) = g_hat(nn) + norminv(0.975,0,1)*se;
end
% CV criterion
Q = sum((g_hat_i-Y).^2);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [Q, g_hat, CI_025, CI_975] = CrossVal_pol(X,Y,K)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% input:
% X = covariates
% Y = dependent variable
% K = degree of the polynomial approximation
% output:
% Q = CV criterion
% g_hat = conditional expectations of Yi
% CI_025, CI_975 = 95% confidence bands
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% set up covariates matrix with columns X.^0, X.^1, ..., X.^K
N = size(X,1);
XX = NaN(N,K+1);
for kk = 1:K+1
    XX(:,kk) = X.^(kk-1);
end
% allocate memory (column vector, to match Y)
g_hat_i = NaN(N,1);
% get building blocks of CV criterion
for nn=1:N
    % exclude the current observation
    keep = true(N,1);
    keep(nn) = false;
    XXn = XX(keep,:);
    Yn = Y(keep);
    % get nnth prediction for CV criterion
    theta_n = pinv(XXn'*XXn)*(XXn'*Yn);
    g_hat_i(nn) = XX(nn,:)*theta_n;
end
% get conditional expectations:
theta = pinv(XX'*XX)*(XX'*Y);
g_hat = XX*theta;
% standard errors of conditional expectations
% (K+1 estimated coefficients, hence N-K-1 degrees of freedom)
e_hat = Y-XX*theta;
s2_e = (e_hat'*e_hat)/(N-K-1);
s2_X = s2_e*pinv(XX'*XX);
s2_f = NaN(N,1);
for nn = 1:N
    s2_f(nn) = XX(nn,:)*s2_X*XX(nn,:)';
end
% 95% confidence interval
CI_025 = g_hat + norminv(0.025,0,1)*sqrt(s2_f);
CI_975 = g_hat + norminv(0.975,0,1)*sqrt(s2_f);
% CV criterion
Q = sum((g_hat_i-Y).^2);
end