Outline
1. Motivation: Why Nonparametric Regression?
2. Kernel Regression I: Basics on Kernel Smoothers
3. Kernel Regression II: Local Linear Regression
4. Kernel Regression III: Local Polynomial Regression
5. Bandwidth Selection
Motivation
In a previous lecture we assumed, given data $\{(x_i, y_i)\}_{i=1}^n$, that a linear specification would give a good approximation for describing the link between the covariates $x_i^T = (x_1, \ldots, x_k)$ and the response variable $y_i$, i.e.

$y_i = x_i^T \beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2), \quad i = 1, \ldots, n.$

This approach is parametric in the sense that it only depends on a finite number of parameters, $\beta \in \mathbb{R}^k$. In many contexts of interest this specification may however be too rigid to uncover any structure in the data, so that generalizations are necessary.
Motivation
To have a more flexible framework we replace $x_i^T \beta$ by a general function $m(x_i)$, and hence move into an infinite-dimensional setting. The model is now based on the specification

$y_i = m(x_i) + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2), \quad i = 1, \ldots, n.$

This approach is nonparametric in the sense that we need an infinite number of parameters to describe the regression function $m(\cdot)$. Our objective: estimate the regression function $m(\cdot)$ from the observations $\{(x_i, y_i)\}_{i=1}^n$ by using kernel-based procedures.
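As a running illustration, data from such a model can be simulated directly; the sketch below uses $m(x) = \sin(4x)$ and the noise level from the simulated example that appears in the bandwidth-selection slides later on.

## Simulate y_i = m(x_i) + eps_i with m(x) = sin(4x) (illustrative choice)
set.seed(123)
x <- runif(200, 0, 1)
y <- sin(4 * x) + rnorm(200, 0, 1/3)
plot(x, y, col = gray(0.5))
lines(seq(0, 1, length = 100), sin(4 * seq(0, 1, length = 100)), lty = 2)  # true m(.)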
Kernel Smoothing
Basics
The estimation procedures to be discussed here are based on kernel functions, i.e. functions $K$ such that

$\int K(x)\,dx = 1.$
Next week the problem of kernel density estimation is revisited to provide more background for those who are unfamiliar with the subject.
Kernel Smoothing
Some technical asides
The best obtainable rate of convergence of kernel estimators, in terms of the mean integrated squared error

$\mathrm{MISE}(\hat f_h) = E_f \int (\hat f_h(x) - f(x))^2\,dx,$

is $O(n^{-4/5})$, but it is possible to improve it slightly, to $O(n^{-8/9})$, if we do not restrict our attention to density functions. Since the price to pay is high in terms of interpretation, and the gains are marginal for sample sizes often found in practice, we nevertheless restrict our attention to kernels which are density functions. Typically $K$ is taken to be symmetric and unimodal, and there is a sense in which kernels which do not obey these requisites are inadmissible (Cline, 1988).
Our first method to estimate the regression function $m(\cdot)$ is the Nadaraya-Watson estimator. This is based on approximating $E(Y \mid X = x) = \int y f(x, y)\,dy / f(x)$ using

$\hat m(x) = \hat E(Y \mid X = x) = \frac{n^{-1} \sum_{i=1}^n K_h(x - x_i) Y_i}{\hat f_h(x)} = \frac{\sum_{i=1}^n K_h(x - x_i) Y_i}{\sum_{i=1}^n K_h(x - x_i)},$

where $K_h(u) = h^{-1} K(u/h)$ and $h > 0$ is the bandwidth. The estimator works as a rolling weighted mean, with the weights defined by the kernel function $K$.
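A minimal R sketch of this estimator with a Gaussian kernel (the helper name nw and the evaluation grid are our own choices):

## Nadaraya--Watson estimator at a point x0, Gaussian kernel
nw <- function(x0, x, y, h) {
  w <- dnorm((x0 - x) / h) / h   # K_h(x0 - x_i)
  sum(w * y) / sum(w)            # locally weighted mean of the Y_i
}
## Evaluate on a grid over the duration range of the faithful data used later
grid <- seq(1.5, 5.1, length = 100)
mhat <- sapply(grid, nw, x = faithful$eruptions, y = faithful$waiting, h = 0.5)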
This local mean estimator can be shown to be the solution to an optimization problem of interest:

$\hat m(x) = \arg\min_{\beta(x)} \sum_{i=1}^n K_h(x - x_i)\,(Y_i - \beta(x))^2.$

We now need to think about the questions: Choice of the kernel? Choice of the bandwidth?
Some Kernels
Uniform: $K(u) = \frac{1}{2}\, I_{\{|u| \le 1\}}$
Triangular: $K(u) = (1 - |u|)\, I_{\{|u| \le 1\}}$
Epanechnikov: $K(u) = \frac{3}{4}(1 - u^2)\, I_{\{|u| \le 1\}}$
Biweight: $K(u) = \frac{15}{16}(1 - u^2)^2\, I_{\{|u| \le 1\}}$
Triweight: $K(u) = \frac{35}{32}(1 - u^2)^3\, I_{\{|u| \le 1\}}$
Gaussian: $K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}$, $u \in \mathbb{R}$
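These kernels are straightforward to code; a quick sketch (function names are our own) that also checks that each integrates to one:

## The six kernels above as R functions of u
kernels <- list(
  uniform      = function(u) 1/2 * (abs(u) <= 1),
  triangular   = function(u) (1 - abs(u)) * (abs(u) <= 1),
  epanechnikov = function(u) 3/4 * (1 - u^2) * (abs(u) <= 1),
  biweight     = function(u) 15/16 * (1 - u^2)^2 * (abs(u) <= 1),
  triweight    = function(u) 35/32 * (1 - u^2)^3 * (abs(u) <= 1),
  gaussian     = dnorm
)
## Each should integrate to (approximately) 1
sapply(kernels, function(K) integrate(K, -5, 5)$value)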
Some Kernels
[Figure: panels plotting the kernels above as functions of u; panel titles include "Uniform kernel", "Triangular kernel", "Epanechnikov kernel", and "Gaussian kernel".]
A Unifying Kernel
Most of the kernels presented above are particular cases of a more general kernel defined as

$K_q(u) = \{2^{2q+1} B(q+1, q+1)\}^{-1} (1 - u^2)^q\, I_{\{|u| \le 1\}},$

where $B(\cdot, \cdot)$ denotes the Beta function, $B(x, y) = \int_0^1 u^{x-1} (1 - u)^{y-1}\,du$. Specifically we have that:
$q = 0$ gives the uniform kernel;
$q = 1$ gives the Epanechnikov kernel;
$q = 2$ gives the biweight kernel;
$q = 3$ gives the triweight kernel;
$q \to \infty$ gives, after rescaling, the Gaussian kernel.
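A one-line R version of this family (the name Kq is our own), with a check against the Epanechnikov kernel:

## Unifying kernel; q = 0, 1, 2, 3 recover the uniform, Epanechnikov,
## biweight and triweight kernels
Kq <- function(u, q) {
  ifelse(abs(u) <= 1, (1 - u^2)^q, 0) / (2^(2 * q + 1) * beta(q + 1, q + 1))
}
Kq(0.3, q = 1)       # 0.6825
3/4 * (1 - 0.3^2)    # the Epanechnikov kernel at 0.3: also 0.6825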
The choice of bandwidth is however critical to the performance of the estimator and far more important than the choice of kernel.
[Figure: four panels titled "Data", "Undersmoothed", "OK", and "Oversmoothed" — the same scatterplot of y against x, with kernel regression fits at increasing bandwidths in the last three panels.]
The problem of bandwidth selection is a complicated one, and will be discussed later. For now it is important to preserve the idea that the selection of $h$ involves a bias-variance tradeoff: if the bandwidth is too small, the estimator will be too rough; but if it is too large, important features will be smoothed out. The selected bandwidth should of course depend on the sample size, and we assume that

$h = h_n = o(1) \quad \text{and} \quad \lim_{n \to \infty} n h_n = \infty, \quad \text{as } n \to \infty.$
Figure: An eruption of the Old Faithful Geyser, Yellowstone National Park, Wyoming, US.
Data: Waiting time to next eruption; Duration of eruption; Observations: 272; Units: minutes; Source: Härdle (1991).
attach(faithful)
## The Nadaraya--Watson estimator over different bandwidths
## (the faithful columns are 'eruptions' and 'waiting')
par(mfrow = c(1, 3))
plot(eruptions, waiting, main = "h=0.1", xlab = "Duration", ylab = "Waiting time")
lines(ksmooth(eruptions, waiting, "normal", 0.1), lwd = 2, col = "red")
plot(eruptions, waiting, main = "h=0.5", xlab = "Duration", ylab = "Waiting time")
lines(ksmooth(eruptions, waiting, "normal", 0.5), lwd = 2, col = "red")
plot(eruptions, waiting, main = "h=2", xlab = "Duration", ylab = "Waiting time")
lines(ksmooth(eruptions, waiting, "normal", 2), lwd = 2, col = "red")
[Figure: three panels titled "h=0.1", "h=0.5", and "h=2" — waiting time (50-90) against duration (1.5-4.5), each with the corresponding Nadaraya-Watson fit.]
In practice we typically observe that the Nadaraya-Watson estimator behaves more poorly near the edges of the region where we observe the data (say for durations close to 1.6 or 5.1 in the faithful data) and in regions where data are sparse (say for durations between 2.8 and 3.3). Asymptotic arguments can be used to show that the optimal MISE of the Nadaraya-Watson estimator, $O(n^{-4/5})$, deteriorates to $O(n^{-2/3})$ close to the boundary of the support of the covariate (Wand & Jones, 1995, §5.5). In the following we present an alternative to this local constant fit estimator which removes the bias exactly to first order.
Instead of taking a rolling weighted mean, we can consider a rolling weighted regression, with the weights defined by the kernel function $K$. This is exactly what is known as local linear regression; the corresponding estimator is now a solution to an optimization problem which extends the one from which we obtained the Nadaraya-Watson estimator:

$(\hat\beta_0(x), \hat\beta_1(x)) = \arg\min_{(\beta_0(x),\, \beta_1(x))} \sum_{i=1}^n K_h(x - x_i)\,(Y_i - \beta_0(x) - \beta_1(x)\, x_i)^2,$

$\hat m(x) = \hat\beta_0(x) + \hat\beta_1(x)\, x.$
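A minimal sketch of local linear regression via weighted least squares in base R (the helper loclin is our own):

## Local linear fit at a point x0, Gaussian kernel weights
loclin <- function(x0, x, y, h) {
  w <- dnorm((x0 - x) / h) / h                        # K_h(x0 - x_i)
  fit <- lm(y ~ x, weights = w)                       # weighted least squares
  unname(predict(fit, newdata = data.frame(x = x0)))  # beta0(x0) + beta1(x0) * x0
}
grid <- seq(1.5, 5.1, length = 100)
mhat <- sapply(grid, loclin, x = faithful$eruptions, y = faithful$waiting, h = 0.5)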
[Figure: local linear fit to the faithful data — waiting time (50-90) against duration (1.5-5.0).]
Local polynomial regression extends the ideas above to the context where we can have further polynomial terms in the regression. Hence we now perform a rolling weighted regression based on

$y_i = \beta_0(x) + \sum_{j=1}^p \beta_j(x)\, x_i^j + \varepsilon_i,$
where the weights are again controlled by the kernel $K$. This estimator is also the solution to an optimization problem of interest, and

$\hat m_p(x) = \hat\beta_0(x) + \sum_{j=1}^p \hat\beta_j(x)\, x^j, \qquad p \in \mathbb{N}_0.$

Particular cases:
Local constant regression ($p = 0$): $\hat m_0(x)$ is the Nadaraya-Watson estimator.
Local linear regression ($p = 1$):

$\hat m_1(x) = \frac{1}{n} \sum_{i=1}^n \frac{\{\hat s_2(x) - \hat s_1(x)(x_i - x)\}\, K_h(x_i - x)}{\hat s_2(x)\, \hat s_0(x) - \hat s_1(x)^2}\, Y_i,$

where $\hat s_r(x) = \frac{1}{n} \sum_{i=1}^n (x_i - x)^r K_h(x_i - x)$.
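A sketch of a degree-p local polynomial fit, again via weighted least squares in base R (KernSmooth::locpoly provides a fast binned alternative; the helper name below is our own):

## Local polynomial fit of degree p at a point x0, Gaussian kernel weights
locpoly_fit <- function(x0, x, y, h, p = 2) {
  w <- dnorm((x0 - x) / h) / h
  fit <- lm(y ~ poly(x, p, raw = TRUE), weights = w)
  unname(predict(fit, newdata = data.frame(x = x0)))
}
mhat2 <- sapply(seq(1.5, 5.1, length = 100), locpoly_fit,
                x = faithful$eruptions, y = faithful$waiting, h = 0.5, p = 2)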
[Figure: local polynomial fit to the faithful data — waiting time (50-90) against duration (1.5-5.0).]
What to Take: p = 0, 1 or 2?

Local intercept regression ($p = 0$) is nothing else than a weighted moving average. Such a simple local model might work well in some situations, but may not always approximate the underlying function well enough. Local linear fits ($p = 1$) might work better in that case. It can be shown that $\hat m_1(\cdot)$ has some nice MSE features and minimax optimality properties (Fan, 1992, 1993). Higher-degree polynomials would work in theory, but may tend to overfit the data in each subset and may be numerically unstable, making accurate computations difficult. Asymptotic arguments suggest that local polynomials of odd degree dominate those of even degree.
The problem of deciding how much to smooth is always of great importance in nonparametric regression. The selection of the smoothing parameter (bandwidth) is always related to a certain interpretation of the smooth. If the purpose of the smoothing is to increase the "signal to noise ratio", or to suggest a simple model, then a slightly "oversmoothed" curve with a subjectively chosen bandwidth might be desirable. On the other hand, when the interest is purely in estimating the regression curve itself, with an emphasis on local structures, then a slightly "undersmoothed" curve may be appropriate.
Bandwidth Selectors
General comments
To characterize a bandwidth selector $h_o$ as optimal, it is necessary to define relevant criteria. The minimization of the average squared error (ASE) defines one such criterion:

$h_o = \arg\min_{h \in \mathbb{R}^+} \mathrm{ASE}(h) = \arg\min_{h \in \mathbb{R}^+} \frac{1}{n} \sum_{i=1}^n (\hat m_h(x_i) - m(x_i))^2.$
Here we mainly focus on discussing a cross-validatory bandwidth selector, because of its simplicity, but many other strategies such as plug-in rules are available (Wand & Jones, 1995, §5.8).
Cross-Validation in Regression
Definition
Cross-validation entails estimating the regression function using the data with the $i$th observation removed; the resulting leave-one-out estimator is, for example, for $p = 0$,

$\hat m_{-i,h}(x_i) = \frac{\sum_{j \ne i} K_h(x_i - x_j)\, Y_j}{\sum_{j \ne i} K_h(x_i - x_j)}.$

The CV function

$\mathrm{CV}(h) = \frac{1}{n} \sum_{i=1}^n (y_i - \hat m_{-i,h}(x_i))^2$

validates the ability to predict $\{y_i\}_{i=1}^n$ across the subsamples $\{(x_j, y_j)\}_{j=1, j \ne i}^n$. The cross-validated bandwidth is

$\hat h_{CV} = \arg\min_{h \in \mathbb{R}^+} \mathrm{CV}(h), \qquad \hat h_{CV} / h_o = 1 + o_p(1).$
Here $o_p(1)$ is the stochastic version of Landau's little-o notation, and is used to denote convergence to zero in probability.
[Figure: the cross-validation function CV(h) plotted against h, over roughly 0.2 to 0.8.]
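The snippet below uses a cross-validated value hopt; a minimal sketch of how such a value could be computed (our own helper, Gaussian kernel; note that ksmooth's bandwidth argument is scaled so that the kernel quartiles sit at ±0.25 * bandwidth, not as a Gaussian standard deviation):

## Leave-one-out CV for the Nadaraya--Watson estimator
cv <- function(h, x, y) {
  mean(sapply(seq_along(x), function(i) {
    w <- dnorm((x[i] - x[-i]) / h)      # weights with the i-th point left out
    (y[i] - sum(w * y[-i]) / sum(w))^2  # squared prediction error at x_i
  }))
}
hs <- seq(0.1, 1, by = 0.005)
hopt <- hs[which.min(sapply(hs, cv, x = eruptions, y = waiting))]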
## Nadaraya--Watson estimator evaluated at the optimal bandwidth by cross-validation
plot(eruptions, waiting, main = "h=0.4243375", xlab = "Duration", ylab = "Waiting time")
lines(ksmooth(eruptions, waiting, "normal", hopt), lwd = 2, col = "red")
[Figure: Nadaraya-Watson fit to the faithful data at the cross-validated bandwidth h = 0.4243375 — waiting time (50-90) against duration (1.5-5.0).]
Bandwidth Selection
R: ksmooth{stats}
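A sketch consistent with the figure on this slide (our own code, reusing the simulated sin(4x) example from the next slide; the bandwidth 0.1 is an illustrative choice):

set.seed(123)
x <- runif(200, 0, 1)
y <- sin(4 * x) + rnorm(200, 0, 1/3)
plot(x, y, col = gray(0.5))
lines(ksmooth(x, y, "normal", 0.1), col = "red")                           # kernel fit
lines(seq(0, 1, length = 100), sin(4 * seq(0, 1, length = 100)), lty = 2)  # true m(.)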
[Figure: ksmooth fit to simulated data — y (about -1.0 to 1.5) against x on [0, 1].]
Bandwidth Selection
R: h.select{sm} + ksmooth{stats}
library(sm)  # for h.select
set.seed(123)
x <- runif(200, 0, 1)
y <- sin(4 * x) + rnorm(200, 0, 1/3)
h1 <- h.select(x, y, method = "cv")
h2 <- h.select(x, y, method = "aicc")
plot(x, y, col = gray(0.5))
lines(ksmooth(x, y, "normal", h1), col = "red")
lines(ksmooth(x, y, "normal", h2), col = "green")
lines(seq(0, 1, length = 100), sin(4 * seq(0, 1, length = 100)), lty = 2)
[Figure: the resulting plot; legend: "CV bandwidth = 0.09", "AIC bandwidth = 0.13", "true".]
Bandwidth Selection
R: (h.select + sm.regression){sm}
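A minimal sketch consistent with this slide's title, combining h.select with sm.regression on the same simulated data (our own code, not the original slide's):

library(sm)
set.seed(123)
x <- runif(200, 0, 1)
y <- sin(4 * x) + rnorm(200, 0, 1/3)
h <- h.select(x, y, method = "cv")
sm.regression(x, y, h = h)  # local linear fit at the CV bandwidth
lines(seq(0, 1, length = 100), sin(4 * seq(0, 1, length = 100)), lty = 2)  # true m(.)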
[Figure: sm.regression fit to the simulated data — y against x on [0, 1].]