Smoothing Spline Models with Large Datasets

Hao Li

University of California
hao li@ucsb.com

December 3, 2020
Overview

1 Introduction

2 D&R Methods

3 Simulation

4 Conclusion
Big Data meets Large Computation

Thanks to big data, the machine learning community tells many success stories.
However, many nonparametric methods, such as smoothing splines, Gaussian
process regression (GPR), and kernel ridge regression (KRR), suffer from cubic
time complexity. This limits the scalability of these methods and makes them
unaffordable for large-scale datasets.
Big Data meets Large Computation

Subsets-of-data: follow the divide-and-conquer idea and
focus on local subsets of the training data.
Sparse kernels: achieve a sparse representation K̃ of K via
a specially designed kernel.
Sparse approximations: approximate the full kernel matrix K
by a low-rank matrix (a sketch of one such approximation follows below).
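As one common instance of the low-rank idea (not part of the D&R approach discussed below), a Nyström-style sketch approximates the n × n kernel matrix K by K_nm K_mm⁻¹ K_nmᵀ built from m ≪ n landmark points; the landmark selection and the kernel shown here are assumptions of this illustration.

import numpy as np

def nystrom_factors(X, kernel, m, rng=np.random.default_rng(0)):
    # Low-rank factors such that K is approximated by K_nm @ pinv(K_mm) @ K_nm.T
    landmarks = X[rng.choice(len(X), size=m, replace=False)]
    K_nm = kernel(X, landmarks)            # n x m cross-kernel
    K_mm = kernel(landmarks, landmarks)    # m x m landmark kernel
    return K_nm, np.linalg.pinv(K_mm)      # store the factors; never form the full n x n matrix

# illustrative Gaussian kernel for 1-D inputs (an assumption of this sketch):
# kernel = lambda A, B: np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * 0.1 ** 2))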
Introduction

A smoothing spline model assumes that

y_i = f(x_i) + ε_i,   i = 1, …, n,   (1)

where the regression function f ∈ H, H is a
reproducing kernel Hilbert space (RKHS) on an
arbitrary set X, and the ε_i are i.i.d. N(0, σ²).
Introduction

Suppose that H = H₀ ⊕ H₁, and P₁ is the orthogonal
projection operator onto H₁. The unique solution of

(1/n) ∑_{i=1}^{n} (y_i − f(x_i))² + λ ‖P₁ f‖²   (2)

can be represented as

f̂(x) = dᵀφ(x) + cᵀξ(x),   (3)

where φ(x) = (φ₁(x), …, φ_p(x))ᵀ is a basis of H₀ and
ξ(x) = (R₁(x₁, x), …, R₁(x_n, x))ᵀ, with R₁ the reproducing kernel of H₁.
Introduction

Computing the coefficients c and d reduces to solving
the following equations:

(Σ + nλI)c + Td = y,
Tᵀc = 0,

where y = (y₁, …, y_n)ᵀ, T is the n × p matrix with entries
φ_ν(x_i), and Σ is the n × n matrix with entries R₁(x_i, x_j).
Solving this system takes O(n³) operations.
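To make the O(n³) cost concrete, here is a minimal Python sketch (not the authors' implementation) that fits a cubic smoothing spline on [0, 1] by solving the block system above with dense linear algebra. The null-space basis {1, x} and the reproducing kernel R₁ below are one standard choice for cubic splines and are assumptions of this illustration.

import numpy as np

def r1_cubic(x, z):
    # Reproducing kernel of H1 for cubic splines on [0,1] with H0 = span{1, x} (one standard choice)
    m = np.minimum.outer(x, z)   # min(x_i, z_j)
    M = np.maximum.outer(x, z)   # max(x_i, z_j)
    return m**2 * M / 2 - m**3 / 6

def fit_smoothing_spline(x, y, lam):
    # Solve (Sigma + n*lam*I)c + Td = y, T^T c = 0; the dense solve costs O(n^3)
    n = len(x)
    T = np.column_stack([np.ones(n), x])          # basis of H0: phi_1 = 1, phi_2 = x
    Sigma = r1_cubic(x, x)                        # n x n matrix of R1(x_i, x_j)
    p = T.shape[1]
    A = np.block([[Sigma + n * lam * np.eye(n), T],
                  [T.T, np.zeros((p, p))]])       # (n+p) x (n+p) block system
    sol = np.linalg.solve(A, np.concatenate([y, np.zeros(p)]))
    c, d = sol[:n], sol[n:]
    def f_hat(x_new):
        # Evaluate f_hat(x) = d^T phi(x) + c^T xi(x) from equation (3)
        x_new = np.asarray(x_new, dtype=float)
        return np.column_stack([np.ones(len(x_new)), x_new]) @ d + r1_cubic(x_new, x) @ c
    return f_hat

The later sketches reuse fit_smoothing_spline and r1_cubic from this block.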
How to reduce the computation

Reference: Danqing Xu & Yuedong Wang (2018): Divide and
Recombine Approaches for Fitting Smoothing Spline Models
with Large Datasets.
Divide and Recombine (D&R) method:
i) divide the whole data into subsets;
ii) fit the spline model to each subset;
iii) recombine the estimated functions from each subset into
an overall estimate (a sketch follows this list).
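A minimal sketch of the three steps, reusing fit_smoothing_spline from the earlier sketch; the random partition and simple averaging shown here correspond to the RDR variant described next, and the single λ shared by all subsets is a simplification.

import numpy as np

def divide_and_recombine(x, y, K, lam, rng=np.random.default_rng(0)):
    # (i) divide the data into K random subsets of roughly equal size
    subsets = np.array_split(rng.permutation(len(x)), K)
    # (ii) fit the spline model to each subset
    fits = [fit_smoothing_spline(x[s], y[s], lam) for s in subsets]
    # (iii) recombine the per-subset estimates into one overall estimate
    def f_tilde(x_new):
        return np.mean([f(x_new) for f in fits], axis=0)
    return f_tilde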
How to reduce the computation

Random divide and recombine (RDR): randomly divide the
whole data into K subsets of approximately equal sizes. The
recombined estimate is

f̂(x) = (1/K) ∑_{k=1}^{K} f̂_k(x),

and the posterior variance is

σ̂²(x) = (1/K²) ∑_{k=1}^{K} σ̂_k²(x).
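Pointwise recombination of the two formulas above, assuming each subset fit also provides a pointwise posterior variance (the earlier fit_smoothing_spline sketch does not compute one, so sigma2_k here is a placeholder for whatever per-subset variance estimate is available):

import numpy as np

def recombine(preds_k, sigma2_k):
    # preds_k:  (K, m) array, f_hat_k evaluated at m points for each of the K subsets
    # sigma2_k: (K, m) array, posterior variance of each subset fit at the same points
    K = preds_k.shape[0]
    f_hat = preds_k.mean(axis=0)              # (1/K) sum_k f_hat_k(x)
    var_hat = sigma2_k.sum(axis=0) / K**2     # (1/K^2) sum_k sigma_k^2(x)
    return f_hat, var_hat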
How to reduce the computation

RDR: it is known that for cubic splines MSE = squared bias +
variance = O(λ) + O(n^{−1} λ^{−1/4}) (Craven and Wahba 1979), and
the optimal MSE ∼ n^{−4/5} is achieved with λ ∼ n^{−4/5}. Suppose
that n = K × s; then λ_k ∼ s^{−4/5} is the optimal rate for subset k.
Consequently, MSE(f̃) = O(K^{4/5} n^{−4/5}) + O(K^{−1/5} n^{−4/5}), so the
recombined estimate f̃(x) has larger bias and smaller variance
than the full-data estimate.
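The rate arithmetic behind this claim: each subset has s = n/K observations and uses λ_k ∼ s^{−4/5}, and averaging K independent subset fits divides the variance by K, so

bias²(f̃) = O(λ_k) = O((n/K)^{−4/5}) = O(K^{4/5} n^{−4/5}),
Var(f̃) = (1/K) · O(s^{−1} λ_k^{−1/4}) = (1/K) · O((n/K)^{−4/5}) = O(K^{−1/5} n^{−4/5}).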
Debiased random divide and recombine (DRDR): to achieve
the optimal rate of MSE, consider new smoothing parameters
λ̃_k = K^{−4/5} λ_k ∼ n^{−4/5}. Then the MSE of the new recombined
estimate f̃_new has the same convergence rate as f̂,
the estimate fitted without dividing into subsets.
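A minimal sketch of the DRDR adjustment, reusing the earlier sketches: each subset is fitted with the rescaled parameter λ̃_k = K^{−4/5} λ_k. A single lam_k shared across subsets is a simplification of this illustration; in practice each subset's λ_k would come from its own selection (e.g. GCV).

import numpy as np

def drdr_fit(x, y, K, lam_k, rng=np.random.default_rng(0)):
    # DRDR: fit each subset with the debiased smoothing parameter K**(-4/5) * lam_k
    subsets = np.array_split(rng.permutation(len(x)), K)
    lam_tilde = K ** (-4 / 5) * lam_k
    fits = [fit_smoothing_spline(x[s], y[s], lam_tilde) for s in subsets]
    return lambda x_new: np.mean([f(x_new) for f in fits], axis=0)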
How to reduce the computation

RDR and DRDR: randomly divide the whole data into K
subsets of approximately equal sizes. The recombined estimate is
f̂(x) = (1/K) ∑_{k=1}^{K} f̂_k(x), and the posterior variance is
σ̂²(x) = (1/K²) ∑_{k=1}^{K} σ̂_k²(x).
How to reduce the computation

Sequential divide and recombine (SDR): instead of dividing the
observations randomly, divide the domain into K disjoint subintervals,
and set f̂(x) = f̂_k(x) when x belongs to the k-th subinterval.

One problem with SDR is that the combined estimate is not
smooth at the joints of the subintervals.
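A minimal sketch of SDR on [0, 1], reusing fit_smoothing_spline; the equal-width partition of the domain is an assumption of this illustration.

import numpy as np

def sdr_fit(x, y, K, lam):
    # Divide the domain [0, 1] into K disjoint subintervals and fit each one separately
    edges = np.linspace(0.0, 1.0, K + 1)
    fits = []
    for k in range(K):
        in_k = (x >= edges[k]) & (x <= edges[k + 1])
        fits.append(fit_smoothing_spline(x[in_k], y[in_k], lam))
    def f_hat(x_new):
        x_new = np.asarray(x_new, dtype=float)
        # use the fit whose subinterval contains each query point
        k_new = np.clip(np.searchsorted(edges, x_new, side="right") - 1, 0, K - 1)
        out = np.empty_like(x_new)
        for k in range(K):
            mask = k_new == k
            if mask.any():
                out[mask] = fits[k](x_new[mask])
        return out
    return f_hat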
How to reduce the computation

Overlapping sequential divide and recombine (OSDR): let
a < c < b < d, and denote the subsets of observations in [a, b]
and [c, d] as S1 and S2. Let g1 and g2 be the estimated
functions on S1 and S2.
The recombined estimate is

g(x) = g1(x),                              a ≤ x < c,
       w(x) g1(x) + (1 − w(x)) g2(x),      c ≤ x ≤ b,      (4)
       g2(x),                              b < x ≤ d.

The weight function w satisfies w(c) = 1 and w(b) = 0, which makes g
continuous on [a, d]. To make g′ continuous on [a, d], we also need
w′(b) = w′(c) = 0; to make g′′ continuous on [a, d], w′′(b) = w′′(c) = 0.
One possible choice is

w(x) = (1/(2π)) sin(2π(x − c)/(b − c)) − (x − b)/(b − c).
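A sketch of the blending step for two overlapping fits g1 on [a, b] and g2 on [c, d], implementing equation (4) with the weight function above:

import numpy as np

def osdr_weight(x, c, b):
    # w(c) = 1 and w(b) = 0, with w' and w'' vanishing at both c and b
    return np.sin(2 * np.pi * (x - c) / (b - c)) / (2 * np.pi) - (x - b) / (b - c)

def osdr_combine(x, g1, g2, c, b):
    # equation (4): g1 left of c, a smooth blend on [c, b], g2 right of b
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    left, mid, right = x < c, (x >= c) & (x <= b), x > b
    out[left] = g1(x[left])
    w = osdr_weight(x[mid], c, b)
    out[mid] = w * g1(x[mid]) + (1 - w) * g2(x[mid])
    out[right] = g2(x[right])
    return out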
Simulation Result

f(x) = sin(2πx) + x² + ε, where ε ∼ N(0, 0.1²); 10000 observations
from [0, 1]; 10 subsets.
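A sketch of this simulation setting, reusing divide_and_recombine from the earlier sketch; the evaluation grid and the fixed λ value are ad hoc assumptions of the illustration (the study selects smoothing parameters by GCV).

import numpy as np

rng = np.random.default_rng(1)
n, K = 10000, 10
x = rng.uniform(0, 1, n)
f_true = lambda t: np.sin(2 * np.pi * t) + t**2
y = f_true(x) + rng.normal(0, 0.1, n)                  # model (1) with sigma = 0.1

f_tilde = divide_and_recombine(x, y, K=K, lam=1e-6)    # lam fixed ad hoc for the sketch
x_grid = np.linspace(0, 1, 201)
mse = np.mean((f_tilde(x_grid) - f_true(x_grid))**2)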
Simulation Result

f(x) = sin(32πx) − 8(x − 0.5)² + ε, where ε ∼ N(0, 0.1²); 10000 observations
from [0, 1]; 10 subsets.
Simulation Result

Doppler function f(x) = √(x(1 − x)) sin(2π(1 + δ)/(x + δ)) + ε, where
δ = 0.05 and ε ∼ N(0, 0.1²); 10000 observations from [0, 1]; 1000 subsets.
Simulation Result using the GCV-estimated smoothing parameter

f1, f2 and f3 denote the three test functions above.

Method MSE(f1) MSE(f2) MSE(f3)


All 1.34 11.5 42.1
RDR 2.1 19.4 99.3
DRDR 1.5 14.0 61.8
SDR 4.6 15.9 14.1
OSDR 3.7 13.5 12.9
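For reference, a minimal (and, at the full sample size, expensive) sketch of the GCV criterion used to select λ, reusing r1_cubic from the earlier sketch; the grid of candidate values is an assumption of the illustration.

import numpy as np

def gcv_score(x, y, lam):
    # Craven-Wahba GCV: V(lam) = n * ||(I - A)y||^2 / tr(I - A)^2, with hat matrix A such that y_hat = A y
    n = len(x)
    T = np.column_stack([np.ones(n), x])
    p = T.shape[1]
    Sigma = r1_cubic(x, x)
    M = np.block([[Sigma + n * lam * np.eye(n), T],
                  [T.T, np.zeros((p, p))]])
    # first n columns of M^{-1}, so that [c; d] = Minv_n @ y
    Minv_n = np.linalg.solve(M, np.vstack([np.eye(n), np.zeros((p, n))]))
    A = np.hstack([Sigma, T]) @ Minv_n
    resid = y - A @ y
    return n * (resid @ resid) / np.trace(np.eye(n) - A) ** 2

# e.g. lam_hat = min(10.0 ** np.arange(-9.0, -2.0), key=lambda l: gcv_score(x, y, l))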
Conclusion

Advantages of D&R:
easy to implement via parallel computing;
reduced computation cost and time.
Random division approaches (RDR & DRDR) have performance
similar to the method that uses the whole dataset when the true
function is spatially homogeneous.
When the true function is spatially inhomogeneous, the
sequential division approaches (SDR & OSDR) are spatially
adaptive and perform better than the method that uses the
whole dataset.
Thank You! The End