Вы находитесь на странице: 1из 101

This tutorial is about the intersection of DL and Time Series

Deep Learning Time Series

Deep Learning for Time Series Forecasting: aka.ms/dlts


Deep Learning for Time Series Forecasting: aka.ms/dlts
Deep Learning for Time Series Forecasting: aka.ms/dlts
Agenda Time (mins)
Tutorial Introduction + Pre-requisite Setup 15
Knowledge Recap 10
Introduction to convolutional neural networks (CNN) 40
Introduction to recurrent neural networks (RNN) 25
Break 10
Encoder-decoder RNN model 25
Hybrid models for time series forecasting 25
Hyper-parameter tuning 20
Conclusion 5
Q&A 5

• Hands-on lab and quizzes


• Q&A during break or in the end
Deep Learning for Time Series Forecasting: aka.ms/dlts
Deep Learning for Time Series Forecasting: aka.ms/dlts
Pre-requisite Setup
aka.ms/dlts/pr
Time Series Forecasting
Date Observation
2018-06-04 60
2018-06-05 64
2018-06-06 66
2018-06-07 65

History 2018-06-08 67
2018-06-09 68
2018-06-10 70
2018-06-11 69
2018-06-12 72
2018-06-13 ?
Future 2018-06-14 ?
2018-06-15 ?

Deep Learning for Time Series Forecasting: aka.ms/dlts


Deep Learning for Time Series Forecasting: aka.ms/dlts
Deep Learning for Time Series Forecasting: aka.ms/dlts
Tao Hong, Pierre Pinson, Shu Fan, Hamidreza
Zareipour, Alberto Troccoli and Rob J. Hyndman,
"Probabilistic energy forecasting: Global Energy
Forecasting Competition 2014 and beyond",
International Journal of Forecasting, vol.32, no.3,
pp 896-913, July-September, 2016.

Deep Learning for Time Series Forecasting: aka.ms/dlts


- Deep learning model has been shown perform well in many scenarios
 2014 Global Energy Forecasting Competition (link)
 2016 CIF International Time Series Competition (link)
 2017 Web Traffic Time Series Forecasting (link)
 2018 Corporación Favorita Grocery Sales Forecasting (link)
 2018 M4-Competition (link)

- Non-parametric
- Flexible and expressive
- Easily inject exogenous features into the model
- Learn from large time series datasets

Deep Learning for Time Series Forecasting: aka.ms/dlts


Feedforward Neural
Networks Recap
Feed-Forward Neural Network and Deep Neural Network
Deep Learning for Time Series Forecasting: aka.ms/dlts
https://en.wikipedia.org/wiki/Activation_function Deep Learning for Time Series Forecasting: aka.ms/dlts
Deep Learning for Time Series Forecasting: aka.ms/dlts
Deep Learning for Time Series Forecasting: aka.ms/dlts
learning rate

Deep Learning for Time Series Forecasting: aka.ms/dlts


Computation of

𝑓𝑓1 𝑤𝑤
𝐿𝐿′ 𝑤𝑤 = 𝑔𝑔′ 𝑓𝑓1 ℎ(𝑤𝑤) � 𝑓𝑓1′ ℎ(𝑤𝑤) � ℎ′ 𝑤𝑤

𝐿𝐿′ 𝑤𝑤 = 𝑔𝑔′ 𝑓𝑓2 ℎ(𝑤𝑤) � 𝑓𝑓2′ ℎ(𝑤𝑤) � ℎ′ 𝑤𝑤

𝑓𝑓1 𝑤𝑤 𝑓𝑓2 𝑤𝑤
𝐿𝐿′ 𝑤𝑤 𝐿𝐿′ 𝑤𝑤

Deep Learning for Time Series Forecasting: aka.ms/dlts


Tutorial from deeplearning.ai

Regularization for deep learning

Dropout: A simple way to prevent neural networks from overfitting Zoneout: Regularizing RNNs by
randomly preserving hidden activations

Batch normalization: Accelerating deep network training by reducing internal


covariate shift Recurrent batch normalization

Learning rate schedules and adaptive learning rate methods for deep learning

An overview of gradient descent optimization algorithms

Deep Learning for Time Series Forecasting: aka.ms/dlts


Deep Learning for Time Series Forecasting: aka.ms/dlts
Deep Learning for Time Series Forecasting: aka.ms/dlts
5x1+2
3 3x0+7
2 x -1 = -4
3

5 3 2 7 1 6 X 1 0 -1 = 3 -4 1 1

Input: 1 x 6 Filter: 1 x 3 Output: 1 x 4

Deep Learning for Time Series Forecasting: aka.ms/dlts


5 3 2 7 1 6 X w1 w2 w3 = 3 4 1 1

Input: 1 x 6 Filter: 1 x 3 Output: 1 x 4

Deep Learning for Time Series Forecasting: aka.ms/dlts


X =

Filter: 1 x 3 Output: 1 x 4

X =

Filter: 1 x 3 Output: 1 x 4 Output: 1 x 4 x 3

Input: 1 x 6 X =

Filter: 1 x 3 Output: 1 x 4

Deep Learning for Time Series Forecasting: aka.ms/dlts


t-9 t-8 t-7 t-6 t-5 t-4 t-3 t-2 t-1 t t+1

Deep Learning for Time Series Forecasting: aka.ms/dlts


Deep Learning for Time Series Forecasting: aka.ms/dlts
Deep Learning for Time Series Forecasting: aka.ms/dlts
Deep Learning for Time Series Forecasting: aka.ms/dlts
Deep Learning for Time Series Forecasting: aka.ms/dlts
Deep Learning for Time Series Forecasting: aka.ms/dlts
=
X
Filter: 1 x 3 x 3 Output: 1 x 4 x 1

Input: 1 x 6 x 3 Output: 1 x 4 x 2
=
X
Filter: 1 x 3 x 3 Output: 1 x 4 x 1

Deep Learning for Time Series Forecasting: aka.ms/dlts


https://github.com/MSRDL/Deep4Cast
https://arxiv.org/pdf/1609.03499.pdf

Deep Learning for Time Series Forecasting: aka.ms/dlts


Deep Learning for Time Series Forecasting: aka.ms/dlts
Deep Learning for Time Series Forecasting: aka.ms/dlts
Deep Learning for Time Series Forecasting: aka.ms/dlts
Deep Learning for Time Series Forecasting: aka.ms/dlts
Y Y

/S
NN RNN

X X

Deep Learning for Time Series Forecasting: aka.ms/dlts


Y Y1 Y2 Y3

RNN RNN RNN RNN

X X1 X2 X3

W
In Keras you will see “timesteps” when set data input shape
Deep Learning for Time Series Forecasting: aka.ms/dlts
Y1 Y2 Y3

RNN RNN RNN

X1 X2 X3
RNN

X
Y1 Y2 Y3
In Keras, the parameter “units” is dimensions of hidden state.
(Think of it as feedforward neural network number of units in hidden layer.)

model =RNN
Sequential() RNN RNN
model.add(RNN(4, input_shape=(timesteps, data_dim)))
Y model.add(Dense(3)) (timesteps, features)

X1 X2 X3
RNN

X
Y Y1 Y2 Y3

RNN RNN RNN RNN

X X1 X2 X3

Forward through entire sequence to compute loss

then backward through entire sequence to compute gradient


Deep Learning for Time Series Forecasting: aka.ms/dlts
tanh

Deep Learning for Time Series Forecasting: aka.ms/dlts


Y Y1 Y2 Y3

RNN tanh tanh tanh

X X1 X2 X3

Deep Learning for Time Series Forecasting: aka.ms/dlts


100 time steps is similar to 100 layers feedforward neural net
Deep Learning for Time Series Forecasting: aka.ms/dlts
Deep Learning for Time Series Forecasting: aka.ms/dlts
Deep Learning for Time Series Forecasting: aka.ms/dlts
1-
tanh
tanh

Deep Learning for Time Series Forecasting: aka.ms/dlts


Y1 Y2 Y3

GRU GRU GRU

X1 X2 X3

Uninterrupted
gradient flow
State runs straight 1- 1- 1-
through the entire chain
with minor linear tanh tanh tanh
interactions which
makes information very
easy to pass.

Deep Learning for Time Series Forecasting: aka.ms/dlts


Hidden state:

1-
Update gates:
tanh
Candidate gates/states:

Reset gates:

GRU [Learning phrase representations using rnn encoderdecoder (0,1) (-1, 1)


for statistical machine translation, Cho et al. 2014]
Deep Learning for Time Series Forecasting: aka.ms/dlts
× +

1- × tanh
GRU LSTM ×
tanh tanh

Hidden cell state:


Hidden state:
Hidden state:

Forget gates:
Update gates:
Input gates:
Reset gates:
Candidate gates/states:
Candidate gates/states:
Output gates:
LSTM: A Search Space Odyssey, Greff et al., 2015 Long Short-Term Memory, Hochreiter & Schmidhuber (1997) Deep Learning for Time Series Forecasting: aka.ms/dlts
Y Y1 Y2 Y3

RNN RNN RNN RNN

RNN RNN RNN RNN

X X1 X2 X3

model = Sequential()
model.add(RNN(4, return_sequences=True, input_shape=(timesteps, data_dim)))
model.add(RNN(4))
Deep Learning for Time Series Forecasting: aka.ms/dlts
RNN for one step time
series forecasting
t-5 t-4 t-3 t-2 t-1 t t+1

Deep Learning for Time Series Forecasting: aka.ms/dlts


Deep Learning for Time Series Forecasting: aka.ms/dlts
RNN for multi-step time
series forecasting
t-5 t-4 t-3 t-2 t-1 t t+1 t+2 t+3

Deep Learning for Time Series Forecasting: aka.ms/dlts


Deep Learning for Time Series Forecasting: aka.ms/dlts
t+1 t+2 t+3

Deep Learning for Time Series Forecasting: aka.ms/dlts




Deep Learning for Time Series Forecasting: aka.ms/dlts


t+1 t+2 t+3

Deep Learning for Time Series Forecasting: aka.ms/dlts



Deep Learning for Time Series Forecasting: aka.ms/dlts


Deep Learning for Time Series Forecasting: aka.ms/dlts
Deep Learning for Time Series Forecasting: aka.ms/dlts
t+1 t+2 t+3

Deep Learning for Time Series Forecasting: aka.ms/dlts


t+1 t+2 t+3

Deep Learning for Time Series Forecasting: aka.ms/dlts





Deep Learning for Time Series Forecasting: aka.ms/dlts
github.com/Arturus/kaggle-web-traffic

Deep Learning for Time Series Forecasting: aka.ms/dlts


Deep Learning for Time Series Forecasting: aka.ms/dlts
Deep Learning for Time Series Forecasting: aka.ms/dlts
Slawek Smyl, 2018
https://eng.uber.com/m4-forecasting-competition/
Winner of M4 competition

Deep Learning for Time Series Forecasting: aka.ms/dlts


𝛼𝛼 𝛼𝛼 1 − 𝛼𝛼 𝛼𝛼(1 − 𝛼𝛼)2

𝛼𝛼 1 − 𝛼𝛼
𝛼𝛼 1 − 𝛼𝛼

Hyndman, R.J., & Athanasopoulos, G. (2018) Forecasting: principles and practice Deep Learning for Time Series Forecasting: aka.ms/dlts
� 𝑠𝑠𝑡𝑡+1

𝑠𝑠𝑡𝑡

𝑦𝑦𝑡𝑡
𝑠𝑠𝑡𝑡+𝑚𝑚 = 𝛾𝛾 + 1 − 𝛾𝛾 𝑠𝑠𝑡𝑡
𝑙𝑙𝑡𝑡

Deep Learning for Time Series Forecasting: aka.ms/dlts


� 𝑠𝑠𝑡𝑡+1 � 𝑅𝑅𝑅𝑅𝑅𝑅(𝑥𝑥𝑡𝑡 )

𝑦𝑦𝑡𝑡
𝑙𝑙𝑡𝑡 = 𝛼𝛼 + 1 − 𝛼𝛼 𝑙𝑙𝑡𝑡−1
𝑠𝑠𝑡𝑡

𝑦𝑦𝑡𝑡
𝑠𝑠𝑡𝑡+𝑚𝑚 = 𝛾𝛾 + 1 − 𝛾𝛾 𝑠𝑠𝑡𝑡
𝑙𝑙𝑡𝑡

𝑦𝑦𝑡𝑡
𝑥𝑥𝑡𝑡 =
𝑙𝑙𝑡𝑡 𝑠𝑠𝑡𝑡

S. Smyl, 2018, Introducing a New Hybrid ES-RNN Model Deep Learning for Time Series Forecasting: aka.ms/dlts
� 𝑠𝑠𝑡𝑡+1 � 𝑅𝑅𝑅𝑅𝑅𝑅(𝑥𝑥𝑡𝑡 )

Dilated Recurrent Neural Networks


Dual-Stage Attention-Based Recurrent
Neural Network
Residual LSTM

S. Smyl, 2018, Introducing a New Hybrid ES-RNN Model Deep Learning for Time Series Forecasting: aka.ms/dlts
t+1 t+2 t+3

Denormalization + Seasonalization 𝑦𝑦�𝑡𝑡+1|𝑡𝑡 = 𝑙𝑙𝑡𝑡 � 𝑠𝑠𝑡𝑡+1 � 𝑅𝑅𝑅𝑅𝑅𝑅(𝑥𝑥𝑡𝑡 )

Dense

𝑅𝑅𝑅𝑅𝑅𝑅(𝑥𝑥𝑡𝑡 )
GRU GRU GRU GRU GRU GRU

𝑦𝑦𝑡𝑡
𝑙𝑙𝑡𝑡 = 𝛼𝛼 + 1 − 𝛼𝛼 𝑙𝑙𝑡𝑡−1
Exponential Smoothing + Normalization + De-seasonalization 𝑠𝑠𝑡𝑡

𝑦𝑦𝑡𝑡
𝑥𝑥𝑡𝑡 =
𝑙𝑙𝑡𝑡 𝑠𝑠𝑡𝑡

S. Smyl, 2018, Introducing a New Hybrid ES-RNN Model Deep Learning for Time Series Forecasting: aka.ms/dlts
t+1 t+2 t+3

Denormalization + Seasonalization

Dense

GRU GRU GRU GRU GRU GRU

Exponential Smoothing + Normalization + De-seasonalization

Fast ES-RNN: A GPU Implementation of the ES-RNN Algorithm Deep Learning for Time Series Forecasting: aka.ms/dlts
Deep Learning for Time Series Forecasting: aka.ms/dlts
Deep Learning for Time Series Forecasting: aka.ms/dlts
Output layer

Number of nodes Learning rate


in layers
Input layer
Batch size

Hidden layer 1 Hidden layer 2


Number of Regularization
hidden layers parameter

Deep Learning for Time Series Forecasting: aka.ms/dlts


Hyper-parameter tuning

Challenges
• Huge search space to explore
• Sparsity of good configurations
• Expensive evaluation
• Limited time and resources

Deep Learning for Time Series Forecasting: aka.ms/dlts


Deep Learning for Time Series Forecasting: aka.ms/dlts
𝒓𝒓𝒓𝒓𝒓𝒓𝟏𝟏 = {learning rate=0.2, #layers=2, …}
𝒓𝒓𝒓𝒓𝒓𝒓𝟐𝟐 = {learning rate=0.5, #layers=4, …}


𝒓𝒓𝒓𝒓𝒓𝒓𝒑𝒑 = {learning rate=0.9, #layers=8, …}

(A) Generate new runs #1 𝑟𝑟𝑟𝑟𝑟𝑟1 𝑟𝑟𝑟𝑟𝑟𝑟𝑗𝑗 ???


• Which parameter
configuration to explore? #2 𝑟𝑟𝑟𝑟𝑟𝑟2 𝑟𝑟𝑟𝑟𝑟𝑟𝑖𝑖


(B) Manage resource usage
#𝑝𝑝
of active runs 𝑟𝑟𝑟𝑟𝑟𝑟𝑝𝑝 𝑟𝑟𝑟𝑟𝑟𝑟𝑘𝑘
• How long to execute a run?

Deep Learning for Time Series Forecasting: aka.ms/dlts


Generating New Runs – Sampling algorithms

{ Sampling
algorithm
“learning_rate”: uniform(0, 1),
“num_layers”: choice(2, 4, 8) Config1= {“learning_rate”: 0.2,
… “num_layers”: 2, …}
}
Config2= {“learning_rate”: 0.5,
“num_layers”: 4, …}

Config3= {“learning_rate”: 0.9,


“num_layers”: 8, …}

Deep Learning for Time Series Forecasting: aka.ms/dlts


Managing Active Jobs

Early terminate poorly


performing runs

Deep Learning for Time Series Forecasting: aka.ms/dlts


User inputs & system outputs

Training script Recommended


+ hyperparameter
Training data configuration & run metrics

Hyperparameter Tuning in Azure ML

Deep Learning for Time Series Forecasting: aka.ms/dlts


• Python SDK for
• creating / cancelling
• obtaining results
• Elastic compute (auto-scaling)
• Remote compute can be CPU / GPU cluster

Deep Learning for Time Series Forecasting: aka.ms/dlts


Multi-horizon quantile-based forecaster

Zoneout: Regularizing RNNs by randomly preserving hidden activations


memory networks

recurrent batch normalization


stateful training

An Empirical Evaluation of Generic Convolutional and


Recurrent Networks for Sequence Modeling
Deep Learning for Time Series Forecasting: aka.ms/dlts
aka.ms/dlts

Deep Learning for Time Series Forecasting: aka.ms/dlts


[1] Dropout: A simple way to prevent neural networks from overfitting
[2] Zoneout: Regularizing RNNs by randomly preserving hidden activations
[3] Batch normalization: Accelerating deep network training by reducing internal covariate shift
[4] Recurrent batch normalization
[5] Learning rate schedules and adaptive learning rate methods for deep learning
[6] An overview of gradient descent optimization algorithms
[7] Learning phrase representations using rnn encoderdecoder for statistical machine translation
[8] LSTM: A Search Space Odyssey
[9] Bayesian optimization with robust Bayesian neural networks
[10] Hyperband: A novel bandit-based approach for hyperparameter optimization
[11] Neural architecture search with reinforcement learning
[12] Multi-horizon quantile-based forecaster
[13] Population based training of neural networks
[14] Memory networks
[15] Stateful training
[16] An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
[17] Winning solution of web traffic forecasting competition
[18] Probabilistic energy forecasting: Global Energy Forecasting Competition 2014 and beyond
Deep Learning for Time Series Forecasting: aka.ms/dlts
[19] S. Smyl, 2018, M4 Forecasting Competition: Introducing a New Hybrid ES-RNN Model
[20] Hyndman, R.J., & Athanasopoulos, G. (2018) Forecasting: principles and practice
[21] Dilated Recurrent Neural Networks
[22] Dual-Stage Attention-Based Recurrent Neural Network
[23] Residual LSTM
[24] Fast ES-RNN: A GPU Implementation of the ES-RNN Algorithm

Deep Learning for Time Series Forecasting: aka.ms/dlts


Deep Learning for Time Series Forecasting: aka.ms/dlts

Вам также может понравиться