Smoothing Spline Models with Large Datasets

Hao Li

University of California
hao li@ucsb.com

December 3, 2020
Overview

1 Introduction

2 D&R Methods

3 Simulation

4 Conclusion
Big Data meets Large Computation

Thanks to big data, the machine learning community tells many success stories.
However, many nonparametric methods, such as smoothing splines, Gaussian
process regression (GPR), and kernel ridge regression (KRR), suffer from cubic
time complexity. This limits the scalability of these methods and makes them
unaffordable for large-scale datasets.
Big Data meets Large Computation

Subsets-of-data: follow the divide-and-conquer idea and
focus on local subsets of the training data.
Sparse kernels: achieve a sparse representation K̃ of K via
a specially designed kernel.
Sparse approximations: approximate the full kernel matrix K
by a low-rank matrix (a sketch of one such approximation follows below).
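As one common instance of the low-rank idea (not part of the D&R approach discussed below), a Nyström-style sketch approximates the n × n kernel matrix K by K_nm K_mm⁻¹ K_nmᵀ built from m ≪ n landmark points; the landmark selection and the kernel shown here are assumptions of this illustration.

import numpy as np

def nystrom_factors(X, kernel, m, rng=np.random.default_rng(0)):
    # Low-rank factors such that K is approximated by K_nm @ pinv(K_mm) @ K_nm.T
    landmarks = X[rng.choice(len(X), size=m, replace=False)]
    K_nm = kernel(X, landmarks)            # n x m cross-kernel
    K_mm = kernel(landmarks, landmarks)    # m x m landmark kernel
    return K_nm, np.linalg.pinv(K_mm)      # store the factors; never form the full n x n matrix

# illustrative Gaussian kernel for 1-D inputs (an assumption of this sketch):
# kernel = lambda A, B: np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * 0.1 ** 2))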
Introduction

A smoothing spline model assumes that

y_i = f(x_i) + ε_i,   i = 1, …, n,   (1)

where the regression function f ∈ H, H is a
reproducing kernel Hilbert space (RKHS) on an
arbitrary set X, and the ε_i are i.i.d. N(0, σ²).
Introduction

Suppose that H = H₀ ⊕ H₁, and P₁ is the orthogonal
projection operator onto H₁. The unique solution of

(1/n) ∑_{i=1}^{n} (y_i − f(x_i))² + λ ‖P₁ f‖²   (2)

can be represented as

f̂(x) = dᵀφ(x) + cᵀξ(x),   (3)

where φ(x) = (φ₁(x), …, φ_p(x))ᵀ is a basis of H₀ and
ξ(x) = (R₁(x₁, x), …, R₁(x_n, x))ᵀ, with R₁ the reproducing kernel of H₁.
Introduction

Computing the coefficients c and d reduces to solving
the following equations:

(Σ + nλI)c + Td = y,
Tᵀc = 0,

where y = (y₁, …, y_n)ᵀ, T is the n × p matrix with entries
φ_ν(x_i), and Σ is the n × n matrix with entries R₁(x_i, x_j).
Solving this system takes O(n³) operations.
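To make the O(n³) cost concrete, here is a minimal Python sketch (not the authors' implementation) that fits a cubic smoothing spline on [0, 1] by solving the block system above with dense linear algebra. The null-space basis {1, x} and the reproducing kernel R₁ below are one standard choice for cubic splines and are assumptions of this illustration.

import numpy as np

def r1_cubic(x, z):
    # Reproducing kernel of H1 for cubic splines on [0,1] with H0 = span{1, x} (one standard choice)
    m = np.minimum.outer(x, z)   # min(x_i, z_j)
    M = np.maximum.outer(x, z)   # max(x_i, z_j)
    return m**2 * M / 2 - m**3 / 6

def fit_smoothing_spline(x, y, lam):
    # Solve (Sigma + n*lam*I)c + Td = y, T^T c = 0; the dense solve costs O(n^3)
    n = len(x)
    T = np.column_stack([np.ones(n), x])          # basis of H0: phi_1 = 1, phi_2 = x
    Sigma = r1_cubic(x, x)                        # n x n matrix of R1(x_i, x_j)
    p = T.shape[1]
    A = np.block([[Sigma + n * lam * np.eye(n), T],
                  [T.T, np.zeros((p, p))]])       # (n+p) x (n+p) block system
    sol = np.linalg.solve(A, np.concatenate([y, np.zeros(p)]))
    c, d = sol[:n], sol[n:]
    def f_hat(x_new):
        # Evaluate f_hat(x) = d^T phi(x) + c^T xi(x) from equation (3)
        x_new = np.asarray(x_new, dtype=float)
        return np.column_stack([np.ones(len(x_new)), x_new]) @ d + r1_cubic(x_new, x) @ c
    return f_hat

The later sketches reuse fit_smoothing_spline and r1_cubic from this block.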
How to reduce the computation

Reference: Danqing Xu & Yuedong Wang (2018): Divide and
Recombine Approaches for Fitting Smoothing Spline Models
with Large Datasets.
Divide and Recombine (D&R) method:
i) divide the whole data into subsets;
ii) fit the spline model to each subset;
iii) recombine the estimated functions from each subset into
an overall estimate (a sketch follows this list).
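A minimal sketch of the three steps, reusing fit_smoothing_spline from the earlier sketch; the random partition and simple averaging shown here correspond to the RDR variant described next, and the single λ shared by all subsets is a simplification.

import numpy as np

def divide_and_recombine(x, y, K, lam, rng=np.random.default_rng(0)):
    # (i) divide the data into K random subsets of roughly equal size
    subsets = np.array_split(rng.permutation(len(x)), K)
    # (ii) fit the spline model to each subset
    fits = [fit_smoothing_spline(x[s], y[s], lam) for s in subsets]
    # (iii) recombine the per-subset estimates into one overall estimate
    def f_tilde(x_new):
        return np.mean([f(x_new) for f in fits], axis=0)
    return f_tilde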
How to reduce the computation

Random divide and recombine (RDR): randomly divide the
whole data into K subsets of approximately equal sizes. The
recombined estimate is

f̂(x) = (1/K) ∑_{k=1}^{K} f̂_k(x),

and the posterior variance is

σ̂²(x) = (1/K²) ∑_{k=1}^{K} σ̂_k²(x).
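Pointwise recombination of the two formulas above, assuming each subset fit also provides a pointwise posterior variance (the earlier fit_smoothing_spline sketch does not compute one, so sigma2_k here is a placeholder for whatever per-subset variance estimate is available):

import numpy as np

def recombine(preds_k, sigma2_k):
    # preds_k:  (K, m) array, f_hat_k evaluated at m points for each of the K subsets
    # sigma2_k: (K, m) array, posterior variance of each subset fit at the same points
    K = preds_k.shape[0]
    f_hat = preds_k.mean(axis=0)              # (1/K) sum_k f_hat_k(x)
    var_hat = sigma2_k.sum(axis=0) / K**2     # (1/K^2) sum_k sigma_k^2(x)
    return f_hat, var_hat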
How to reduce the computation

RDR: it is known that for cubic splines MSE = squared bias +
variance = O(λ) + O(n^{−1} λ^{−1/4}) (Craven and Wahba 1979), and
the optimal MSE ∼ n^{−4/5} is achieved with λ ∼ n^{−4/5}. Suppose
that n = K × s; then λ_k ∼ s^{−4/5} is the optimal rate for subset k.
Consequently, MSE(f̃) = O(K^{4/5} n^{−4/5}) + O(K^{−1/5} n^{−4/5}), so the
recombined estimate f̃(x) has larger bias and smaller variance
than the full-data estimate.
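The rate arithmetic behind this claim: each subset has s = n/K observations and uses λ_k ∼ s^{−4/5}, and averaging K independent subset fits divides the variance by K, so

bias²(f̃) = O(λ_k) = O((n/K)^{−4/5}) = O(K^{4/5} n^{−4/5}),
Var(f̃) = (1/K) · O(s^{−1} λ_k^{−1/4}) = (1/K) · O((n/K)^{−4/5}) = O(K^{−1/5} n^{−4/5}).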
Debiased random divide and recombine (DRDR): to achieve
the optimal rate of MSE, consider new smoothing parameters
λ̃_k = K^{−4/5} λ_k ∼ n^{−4/5}. Then the MSE of the new recombined
estimate f̃_new has the same convergence rate as f̂,
the estimate fitted without dividing into subsets.
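A minimal sketch of the DRDR adjustment, reusing the earlier sketches: each subset is fitted with the rescaled parameter λ̃_k = K^{−4/5} λ_k. A single lam_k shared across subsets is a simplification of this illustration; in practice each subset's λ_k would come from its own selection (e.g. GCV).

import numpy as np

def drdr_fit(x, y, K, lam_k, rng=np.random.default_rng(0)):
    # DRDR: fit each subset with the debiased smoothing parameter K**(-4/5) * lam_k
    subsets = np.array_split(rng.permutation(len(x)), K)
    lam_tilde = K ** (-4 / 5) * lam_k
    fits = [fit_smoothing_spline(x[s], y[s], lam_tilde) for s in subsets]
    return lambda x_new: np.mean([f(x_new) for f in fits], axis=0)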
How to reduce the computation

RDR and DRDR: randomly divide the whole data into K
subsets of approximately equal sizes. The recombined estimate is
f̂(x) = (1/K) ∑_{k=1}^{K} f̂_k(x), and the posterior variance is
σ̂²(x) = (1/K²) ∑_{k=1}^{K} σ̂_k²(x).
How to reduce the computation

Sequential divide and recombine (SDR): instead of dividing the
observations randomly, divide the domain into K disjoint subintervals,
and set f̂(x) = f̂_k(x) when x belongs to the k-th subinterval.

One problem with SDR is that the combined estimate is not
smooth at the joints of the subintervals.
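A minimal sketch of SDR on [0, 1], reusing fit_smoothing_spline; the equal-width partition of the domain is an assumption of this illustration.

import numpy as np

def sdr_fit(x, y, K, lam):
    # Divide the domain [0, 1] into K disjoint subintervals and fit each one separately
    edges = np.linspace(0.0, 1.0, K + 1)
    fits = []
    for k in range(K):
        in_k = (x >= edges[k]) & (x <= edges[k + 1])
        fits.append(fit_smoothing_spline(x[in_k], y[in_k], lam))
    def f_hat(x_new):
        x_new = np.asarray(x_new, dtype=float)
        # use the fit whose subinterval contains each query point
        k_new = np.clip(np.searchsorted(edges, x_new, side="right") - 1, 0, K - 1)
        out = np.empty_like(x_new)
        for k in range(K):
            mask = k_new == k
            if mask.any():
                out[mask] = fits[k](x_new[mask])
        return out
    return f_hat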
How to reduce the computation

Overlapping sequential divide and recombine (OSDR): let
a < c < b < d, and denote the subsets of observations in [a, b]
and [c, d] as S1 and S2. Let g1 and g2 be the estimated
functions on S1 and S2.
The recombined estimate is

g(x) = g1(x),                              a ≤ x < c,
       w(x) g1(x) + (1 − w(x)) g2(x),      c ≤ x ≤ b,      (4)
       g2(x),                              b < x ≤ d.

The weight function w satisfies w(c) = 1 and w(b) = 0, which makes g
continuous on [a, d]. To make g′ continuous on [a, d], we also need
w′(b) = w′(c) = 0; to make g′′ continuous on [a, d], w′′(b) = w′′(c) = 0.
One possible choice is

w(x) = (1/(2π)) sin(2π(x − c)/(b − c)) − (x − b)/(b − c).
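A sketch of the blending step for two overlapping fits g1 on [a, b] and g2 on [c, d], implementing equation (4) with the weight function above:

import numpy as np

def osdr_weight(x, c, b):
    # w(c) = 1 and w(b) = 0, with w' and w'' vanishing at both c and b
    return np.sin(2 * np.pi * (x - c) / (b - c)) / (2 * np.pi) - (x - b) / (b - c)

def osdr_combine(x, g1, g2, c, b):
    # equation (4): g1 left of c, a smooth blend on [c, b], g2 right of b
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    left, mid, right = x < c, (x >= c) & (x <= b), x > b
    out[left] = g1(x[left])
    w = osdr_weight(x[mid], c, b)
    out[mid] = w * g1(x[mid]) + (1 - w) * g2(x[mid])
    out[right] = g2(x[right])
    return out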
Simulation Result

f(x) = sin(2πx) + x² + ε, where ε ∼ N(0, 0.1²); 10000 observations
from [0, 1]; 10 subsets.
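A sketch of this simulation setting, reusing divide_and_recombine from the earlier sketch; the evaluation grid and the fixed λ value are ad hoc assumptions of the illustration (the study selects smoothing parameters by GCV).

import numpy as np

rng = np.random.default_rng(1)
n, K = 10000, 10
x = rng.uniform(0, 1, n)
f_true = lambda t: np.sin(2 * np.pi * t) + t**2
y = f_true(x) + rng.normal(0, 0.1, n)                  # model (1) with sigma = 0.1

f_tilde = divide_and_recombine(x, y, K=K, lam=1e-6)    # lam fixed ad hoc for the sketch
x_grid = np.linspace(0, 1, 201)
mse = np.mean((f_tilde(x_grid) - f_true(x_grid))**2)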
Simulation Result

f(x) = sin(32πx) − 8(x − 0.5)² + ε, where ε ∼ N(0, 0.1²); 10000 observations
from [0, 1]; 10 subsets.
Simulation Result

Doppler function f(x) = √(x(1 − x)) sin(2π(1 + δ)/(x + δ)) + ε, where
δ = 0.05 and ε ∼ N(0, 0.1²); 10000 observations from [0, 1]; 1000 subsets.
Simulation Result using the GCV-estimated smoothing parameter

f1, f2 and f3 denote the three test functions above.

Method MSE(f1) MSE(f2) MSE(f3)


All 1.34 11.5 42.1
RDR 2.1 19.4 99.3
DRDR 1.5 14.0 61.8
SDR 4.6 15.9 14.1
OSDR 3.7 13.5 12.9
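For reference, a minimal (and, at the full sample size, expensive) sketch of the GCV criterion used to select λ, reusing r1_cubic from the earlier sketch; the grid of candidate values is an assumption of the illustration.

import numpy as np

def gcv_score(x, y, lam):
    # Craven-Wahba GCV: V(lam) = n * ||(I - A)y||^2 / tr(I - A)^2, with hat matrix A such that y_hat = A y
    n = len(x)
    T = np.column_stack([np.ones(n), x])
    p = T.shape[1]
    Sigma = r1_cubic(x, x)
    M = np.block([[Sigma + n * lam * np.eye(n), T],
                  [T.T, np.zeros((p, p))]])
    # first n columns of M^{-1}, so that [c; d] = Minv_n @ y
    Minv_n = np.linalg.solve(M, np.vstack([np.eye(n), np.zeros((p, n))]))
    A = np.hstack([Sigma, T]) @ Minv_n
    resid = y - A @ y
    return n * (resid @ resid) / np.trace(np.eye(n) - A) ** 2

# e.g. lam_hat = min(10.0 ** np.arange(-9.0, -2.0), key=lambda l: gcv_score(x, y, l))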
Conclusion

Advantages of D&R:
easy to implement via parallel computing;
reduced computation cost and time.
Random division approaches (RDR & DRDR) have performance
similar to the method that uses the whole dataset when the true
function is spatially homogeneous.
When the true function is spatially inhomogeneous, the
sequential division approaches (SDR & OSDR) are spatially
adaptive and perform better than the method that uses the
whole dataset.
Thank You! The End