[Figure: a daily time series — the history is observed, the future is to be forecast]

History:  2018-06-08  67
          2018-06-09  68
          2018-06-10  70
          2018-06-11  69
          2018-06-12  72
          2018-06-13  ?
Future:   2018-06-14  ?
          2018-06-15  ?
- Non-parametric
- Flexible and expressive
- Easily inject exogenous features into the model
- Learn from large time series datasets
For a nested function L(w) = g(f1(h(w))), the chain rule gives:

L′(w) = g′(f1(h(w))) · f1′(h(w)) · h′(w)

When a weight w is shared across branches f1(w) and f2(w), the total gradient is the sum of the gradients flowing through each branch.
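As a sanity check, the chain rule can be verified numerically. The concrete choices g = sin, f1(u) = u², h = exp below are illustrative assumptions, not from the slides:

```python
import math

# Illustrative choices for the composition L(w) = g(f1(h(w)))
def h(w): return math.exp(w)
def f1(u): return u * u
def g(v): return math.sin(v)

def L(w): return g(f1(h(w)))

def L_prime(w):
    # chain rule: L'(w) = g'(f1(h(w))) * f1'(h(w)) * h'(w)
    return math.cos(f1(h(w))) * (2 * h(w)) * math.exp(w)

# compare against a central-difference approximation
w, eps = 0.3, 1e-6
numeric = (L(w + eps) - L(w - eps)) / (2 * eps)
print(abs(numeric - L_prime(w)) < 1e-6)  # True
```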
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting
- Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
- Learning rate schedules and adaptive learning rate methods for deep learning
1-D convolution example: the input [5, 3, 2, 7, 1, 6] (1 x 6) convolved with the filter [1, 0, -1] (1 x 3) gives the output [3, -4, 1, 1] (1 x 4).

With 3 input channels, a 1 x 3 x 3 filter applied to a 1 x 6 x 3 input produces a 1 x 4 x 1 output; two such filters give a 1 x 4 x 2 output.
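The arithmetic of this example can be reproduced with NumPy's cross-correlation, which is what deep-learning "convolution" layers actually compute (no filter flip):

```python
import numpy as np

# Slide's example: input 1x6, filter 1x3 -> output 1x4
x = np.array([5, 3, 2, 7, 1, 6])
w = np.array([1, 0, -1])

# mode='valid' slides the filter with stride 1, no padding
out = np.correlate(x, w, mode='valid')
print(out)  # [ 3 -4  1  1]
```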
[Figure: a feedforward NN maps a single input X to an output; an RNN applies shared weights W across a sequence X1, X2, X3]
In Keras, you will see "timesteps" when setting the data input shape.
Deep Learning for Time Series Forecasting: aka.ms/dlts
[Figure: RNN unrolled through time — the compact form X → RNN → Y versus the unrolled form with inputs X1, X2, X3 producing outputs Y1, Y2, Y3]
In Keras, the parameter "units" is the dimensionality of the hidden state.
(Think of it as the number of units in the hidden layer of a feedforward neural network.)
model = Sequential()
model.add(RNN(4, input_shape=(timesteps, data_dim)))
model.add(Dense(3))

The input shape is (timesteps, features).
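To make "units" concrete, here is a minimal NumPy sketch of what a simple recurrent layer with 4 units computes per step. This is not Keras internals; the weight names W_x, W_h and the random values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
timesteps, data_dim, units = 3, 5, 4  # mirrors RNN(4, input_shape=(timesteps, data_dim))

# Hypothetical weights; in Keras these are learned during training
W_x = rng.normal(size=(data_dim, units))
W_h = rng.normal(size=(units, units))
b = np.zeros(units)

x = rng.normal(size=(timesteps, data_dim))
h = np.zeros(units)  # hidden state has "units" dimensions
for t in range(timesteps):
    h = np.tanh(x[t] @ W_x + h @ W_h + b)  # one recurrent step

print(h.shape)  # (4,)
```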
[Figure: unrolled RNN variants — many-to-one (a single output Y after consuming X1, X2, X3) and many-to-many (outputs Y1, Y2, Y3, one per input)]
Uninterrupted gradient flow: the cell state runs straight through the entire chain, with only minor linear interactions, which makes it easy for information to pass.
GRU:
- Update gate: z_t = σ(W_z x_t + U_z h_{t−1})
- Reset gate: r_t = σ(W_r x_t + U_r h_{t−1})
- Candidate state: h̃_t = tanh(W_h x_t + U_h (r_t ⊙ h_{t−1}))
- New state: h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t

LSTM:
- Forget gate: f_t = σ(W_f x_t + U_f h_{t−1})
- Input gate: i_t = σ(W_i x_t + U_i h_{t−1})
- Output gate: o_t = σ(W_o x_t + U_o h_{t−1})
- Candidate state: c̃_t = tanh(W_c x_t + U_c h_{t−1})
- Cell state: c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t
- Hidden state: h_t = o_t ⊙ tanh(c_t)
LSTM: A Search Space Odyssey, Greff et al., 2015; Long Short-Term Memory, Hochreiter & Schmidhuber, 1997
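A minimal NumPy sketch of one GRU step following the update/reset/candidate gate structure above. The random weights are illustrative; this is not the Keras or cuDNN implementation:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wc, Uc):
    """One GRU step: update gate, reset gate, candidate state, interpolation."""
    z = sigmoid(x @ Wz + h @ Uz)               # update gate
    r = sigmoid(x @ Wr + h @ Ur)               # reset gate
    h_tilde = np.tanh(x @ Wc + (r * h) @ Uc)   # candidate state
    return (1 - z) * h + z * h_tilde           # mix old state and candidate

rng = np.random.default_rng(1)
d, u = 3, 4  # input and hidden sizes (illustrative)
params = [rng.normal(size=s) for s in [(d, u), (u, u)] * 3]  # Wz, Uz, Wr, Ur, Wc, Uc
h = gru_step(rng.normal(size=d), np.zeros(u), *params)
print(h.shape)  # (4,)
```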
[Figure: stacked RNN — two recurrent layers, inputs X1, X2, X3, outputs Y1, Y2, Y3]
model = Sequential()
model.add(RNN(4, return_sequences=True, input_shape=(timesteps, data_dim)))
model.add(RNN(4))
RNN for one-step time series forecasting: given the window t−5, t−4, …, t, predict t+1.
github.com/Arturus/kaggle-web-traffic
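The sliding-window setup can be sketched as follows; the series values and lookback length here are illustrative, not from the slides:

```python
# Turn a series into (lookback window, next value) pairs for
# one-step-ahead forecasting, as in the t-5 ... t -> t+1 picture.
def make_windows(series, lookback=6):
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])  # inputs t-5 ... t
        y.append(series[i + lookback])    # target t+1
    return X, y

series = list(range(10))  # toy series for illustration
X, y = make_windows(series, lookback=6)
print(X[0], y[0])  # [0, 1, 2, 3, 4, 5] 6
```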
In exponential smoothing, the current observation is weighted by α and the previous smoothed value by 1 − α.

Hyndman, R.J., & Athanasopoulos, G. (2018), Forecasting: Principles and Practice
In Smyl's hybrid ES-RNN model, each series is preprocessed with multiplicative exponential smoothing:

Level: l_t = α · y_t / s_t + (1 − α) · l_{t−1}
Seasonality: s_{t+m} = γ · y_t / l_t + (1 − γ) · s_t
Normalized, de-seasonalized input: x_t = y_t / (l_t · s_t)
S. Smyl, 2018, Introducing a New Hybrid ES-RNN Model
Forecast: ŷ_{t+1} = l_t · ŝ_{t+1} · RNN(x_t), i.e. the RNN output is denormalized by the level and re-seasonalized by the predicted seasonal factor.
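A minimal sketch of the preprocessing step, assuming simple initializations (initial level = first observation, initial seasonal factors = 1) and illustrative smoothing parameters; Smyl's actual model learns α and γ per series:

```python
# Multiplicative exponential-smoothing preprocessing:
# level and seasonality updates, then normalization x_t = y_t / (l_t * s_t).
def es_preprocess(y, m, alpha=0.5, gamma=0.5):
    seas = [1.0] * m   # initial seasonal factors (assumption)
    level = y[0]       # initial level (assumption)
    x = []
    for t, y_t in enumerate(y):
        s_t = seas[t % m]
        level = alpha * y_t / s_t + (1 - alpha) * level        # level update
        x.append(y_t / (level * s_t))                          # normalized input
        seas[t % m] = gamma * y_t / level + (1 - gamma) * s_t  # seasonality update
    return x, level, seas

x, level, seas = es_preprocess([10.0, 12.0, 11.0, 13.0], m=2)
print(len(x), level > 0)
```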
[Figure: ES-RNN architecture — exponential smoothing performs normalization and de-seasonalization (x_t = y_t / (l_t · s_t)); stacked GRU layers process the x_t inputs; a Dense layer emits the forecast horizon t+1, t+2, t+3; denormalization and re-seasonalization recover the final forecasts]
Fast ES-RNN: A GPU Implementation of the ES-RNN Algorithm
Challenges
• Huge search space to explore
• Sparsity of good configurations
• Expensive evaluation
• Limited time and resources
Each run is a hyperparameter configuration sampled from the search space, e.g. run_p = {learning rate=0.9, #layers=8, …}

(B) Manage resource usage:
• How many runs to keep active in parallel (run_p … run_k)?
• How long to execute a run?
Sampling algorithm: the search space declares a distribution per hyperparameter, and the sampler draws concrete configurations from it.

{
  "learning_rate": uniform(0, 1),
  "num_layers": choice(2, 4, 8),
  …
}

Config1 = {"learning_rate": 0.2, "num_layers": 2, …}
Config2 = {"learning_rate": 0.5, "num_layers": 4, …}
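The sampling step can be sketched as follows. The space definition and the two-config draw mirror the example above; the lambda-based encoding of the search space is an assumption of this sketch, not a specific tuning library's API:

```python
import random

random.seed(0)  # deterministic draws for illustration

# Search space: each hyperparameter maps to a draw function
space = {
    "learning_rate": lambda: random.uniform(0, 1),
    "num_layers": lambda: random.choice([2, 4, 8]),
}

def sample_config(space):
    """Draw one concrete configuration from the search space."""
    return {name: draw() for name, draw in space.items()}

configs = [sample_config(space) for _ in range(2)]
print(configs)
```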