
A Tensor Based Deep Learning Technique

for Intelligent Packet Routing


Bomin Mao∗, Zubair Md. Fadlullah∗, Fengxiao Tang∗, Nei Kato∗,
Osamu Akashi†, Takeru Inoue†, and Kimihiro Mizutani†
∗Graduate School of Information Sciences, Tohoku University, Sendai, Japan
Email: {bomin.mao, zubair, fengxiao.tang, kato}@it.is.tohoku.ac.jp
†NTT Corporation, Yokosuka, Japan

Email: {akashi.osamu, inoue.takeru, mizutani.kimihiro}@lab.ntt.co.jp

Abstract—Recently, network operators are confronting the challenge of exploding traffic and more complex network environments due to the increasing number of access terminals having various requirements for delay and packet loss rate. However, traditional routing methods based on the maximum or minimum value of a single metric aim at improving the network quality in only one aspect, which makes them incapable of dealing with the increasingly complicated network traffic. Considering the improvement of deep learning techniques in recent years, in this paper we propose a smart packet routing strategy with Tensor-based Deep Belief Architectures (TDBAs) that considers multiple parameters of network traffic. To better model the data in TDBAs, we use tensors to represent the units in every layer as well as the weights and biases. The proposed TDBAs can be trained to predict the whole paths for every edge router. Simulation results demonstrate that our proposal outperforms the conventional Open Shortest Path First (OSPF) protocol in terms of overall packet loss rate and average delay per hop.

Fig. 1: The future routing strategy needs to consider multiple parameters for traffic control. (In the existing routing paradigm, a router typically uses a fixed, single metric or a few metrics, producing static/dynamic routing paths; in the future paradigm, a router needs to use many different parameters, such as the number of packets, hardware status, reachability, link state vector, packet types, protocol policies, link delay, and distance, producing intelligent packet routing paths.)

I. INTRODUCTION

In recent years, network traffic has been increasing tremendously. The global Internet Protocol (IP) traffic per annum exceeded the ZettaByte (ZB) threshold at the end of 2016, and is expected to increase up to 2.3 ZB by 2020. At the same time, the number of devices connected to IP networks will be three times as high as the global population in 2020 [1]. The packets sent by these devices have different requirements for the Quality of Service (QoS) [2] since the devices belong to heterogeneous networks. Thus, it becomes increasingly complicated to control the network traffic when designing the routing strategy. As shown in Fig. 1, conventional routing protocols based on the maximum or minimum metric value focus on fixed metrics, which cannot cope with the complex traffic environment. In order to control the network traffic in an effective manner, it is necessary to consider multiple parameters and intelligently choose the routing paths [3]–[5]. Moreover, multi-core platforms have seen significant improvement in computing capacity, enabling routers to undertake much more complex computation [6].

Recently, deep learning, a new breed of machine learning technique, has been widely applied in various fields, such as image classification [7], pattern recognition [8], and natural language processing [9]. This technique can be utilized to effectively analyze the complex relationships among multiple inputs through training with example data. The trained deep learning architecture can predict the values of some parameters when we input the necessary information. Since the deep learning technique has exhibited superior performance in extremely difficult applications which have traditionally been dominated by humans, e.g., board games, it may also have interesting applications for network traffic control. Kato et al. [3]–[5] adopted deep learning to predict the routing paths.

If we want to utilize deep learning techniques to relate the routing paths and multiple network parameters, it is critically important to choose a suitable mathematical model to represent the parameters. The tensor, which can be seen as a multi-dimensional matrix, provides a very concise mathematical framework to arrange the values of various parameters [10], leading to its wide application in physics and engineering. While the works in [2] and [10] used the tensor to perform QoS-specific packet transmission, to the best of our knowledge, no prior work has investigated how to use the tensor for intelligent packet routing.

In this vein, we aim to combine the tensor and the deep learning technique. We propose a Tensor-based Deep Belief Architecture (TDBA), which utilizes tensors to model the values of every layer as well as the weights and biases. Since a tensor can carry multi-dimensional information, we use the numbers of inbound packets of all edge routers, together with the corresponding time, source and destination, and remaining buffer size, to form a five-dimensional array as the input tensor. We adopt a vector, which can be seen as a 1-dimensional tensor, as the output to denote the next node. The tensors representing the hidden layers, weights, and biases can be decided through training.

In our considered network, the packets are supposed to be generated in and destined for edge routers, while the inner routers just forward packets. Every edge router obtains the traffic patterns of the other edge routers through a signaling process. Then, the edge routers input the traffic patterns to the TDBA and construct the paths to all other edge routers. The whole paths are attached to the headers of the corresponding packets. Consequently, other routers just forward the packets according to the labeled paths.

The remainder of the paper is organized as follows. The preliminaries on the tensor and deep learning are provided in Sec. II. Sec. III presents our proposed TDBA and how to utilize it for network routing. In Sec. IV, we analyze the performance of our proposal. The paper is concluded in Sec. V.



II. PRELIMINARIES

In this section, we give some preliminaries about the tensor model for routing and the tensor calculations for deep learning.

A. Tensor Model for Routing

Fig. 2: An example of a three-order tensor. (The three axes are the hop number, the packet size, e.g., 500B, and the destination router; the element values are delays in seconds.)

A tensor is an organized multi-dimensional matrix which can represent data in large-scale forms. The order (or rank) of a tensor is the dimensionality of the matrix needed to represent it. Traditional scalars and vectors can be considered as 0-dimensional and 1-dimensional tensors, respectively. We can define X ∈ R^{I_1×I_2×⋯×I_N} as an N-order tensor, where the size of the j-th dimension is I_j (j = 1…N). Every dimension of the tensor X can represent a metric or a parameter. Then the value of x_{j_1 j_2 ⋯ j_N} can denote another parameter's value under various conditions. For example, in Fig. 2, a three-order tensor is depicted. The three dimensions represent the hop number, packet size, and destination, respectively. The elements' values in the tensor denote the delay of a router for sending packets of different sizes to various destination routers via paths having different hops. We can find that the value of x_{5,500,6} is 1.8, meaning that the transmission delay of the 500B packet destined for router 5 via the 6-hop path is 1.8 seconds. From the example, it can be easily found that the tensor is an efficient tool to represent the values of network parameters of a large-scale network.
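To make this model concrete, the following sketch (ours, not from the paper) stores the Fig. 2 delays in a NumPy array; the axis sizes and the 100-byte packet-size granularity are hypothetical illustration choices.

```python
import numpy as np

# Three-order delay tensor: (hop number, packet size, destination router).
# Axis sizes and the 100 B size step are assumptions for illustration.
MAX_HOPS, SIZE_STEPS, N_DEST = 8, 10, 16
delay = np.full((MAX_HOPS, SIZE_STEPS, N_DEST), np.nan)  # seconds

def set_delay(hops, size_bytes, dest, seconds):
    # Map physical values onto 0-based axes (size quantized to 100 B steps).
    delay[hops - 1, size_bytes // 100 - 1, dest - 1] = seconds

# The example from the text: a 500B packet destined for router 5 via the
# 6-hop path experiences a transmission delay of 1.8 seconds.
set_delay(hops=6, size_bytes=500, dest=5, seconds=1.8)
print(delay[5, 4, 4])  # -> 1.8
```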
B. Tensor-Based Deep Learning Model

As depicted in Fig. 3a, the deep learning architecture consists of an input layer, several hidden layers, and an output layer. The units in every two adjacent layers are connected through weighted links, while there is no intra-layer link. It should be noted that a bias unit, which is omitted in the figure, is added for every unit in all layers except the input layer. Usually, we use vectors to represent the values of units in each layer, and a 2-dimensional matrix to represent the weight and bias values of the links between two layers. If the input comprises multi-dimensional data, we can use the tensor to represent the values of units in each layer as well as the weights and biases. Here, we define an operation on tensors to describe the relationship between every two layers in the tensor-based deep learning architecture [11].

Definition 1 (Multi-dot Product, ⊙): Suppose there are three high-order tensors X, Z, and W, and assume that X ∈ R^{I_1×I_2×⋯×I_N}, Z ∈ R^{J_1×J_2×⋯×J_N}, and W ∈ R^{α×J_1×J_2×⋯×J_N} (α = I_1 × I_2 × ⋯ × I_N). Assume that W consists of sub-tensors of N orders. The multi-dot result of X and W is Z, and they satisfy the following equation,

$$ Z = X \odot W, \quad \forall z_{j_1 j_2 \cdots j_N} \in Z,\ z_{j_1 j_2 \cdots j_N} = X \bullet W_{\gamma}, \quad (1) $$

where $\gamma = j_N + \sum_{i=1}^{N-1} (j_i - 1) \prod_{r=i+1}^{N} J_r$ is the linear index of the multi-index $(j_1, j_2, \ldots, j_N)$.
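As a sketch of how Definition 1 can be realized, the NumPy function below (ours, not the authors' code) implements one consistent reading of the multi-dot product: with X flattened, Eq. 1 reduces to an ordinary matrix product, and γ corresponds to NumPy's row-major linearization of (j_1, …, j_N).

```python
import numpy as np

def multi_dot(X, W, out_shape):
    """Multi-dot product Z = X (.) W of Definition 1, read as a fully
    connected map between tensor-shaped layers.

    X:         input tensor of shape (I1, ..., IN)
    W:         weight tensor of shape (alpha, J1, ..., JN), alpha = prod(I)
    out_shape: (J1, ..., JN), the shape of the result Z

    Each entry z_{j1...jN} is the full inner product of X with the
    sub-tensor of W selected by gamma, the row-major linear index
    of (j1, ..., jN).
    """
    alpha = X.size
    assert W.shape == (alpha,) + tuple(out_shape)
    # Flattening in C order matches the gamma formula of Definition 1.
    return (X.reshape(alpha) @ W.reshape(alpha, -1)).reshape(out_shape)

# Toy check: a 2x3 input mapped to a 2x2 output.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 3))
W = rng.normal(size=(6, 2, 2))   # alpha = 2 * 3 = 6
Z = multi_dot(X, W, (2, 2))      # Z has shape (2, 2)
```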
Except for the product operation, another difference of the tensor-based deep learning architecture from the traditional architecture is the method to calculate the training errors. As we know, the training error is defined by the distance between the practical output and the desired output. For two N-order tensors X ∈ R^{I_1×I_2×⋯×I_N} and Y ∈ R^{I_1×I_2×⋯×I_N}, their distance is calculated as follows.

$$ d_{TD} = \sqrt{\sum_{i,j=1}^{I_1 \times I_2 \times \cdots \times I_N} g_{ij}(x_i - y_i)(x_j - y_j)} = \sqrt{(X - Y)^T G (X - Y)}, \quad (2) $$

where G is the metric matrix used to reflect the intrinsic relationships between different coordinates of high-order data. Its element, g_{ij}, is typically defined as:

$$ g_{ij} = \frac{1}{2\pi\delta^2} \exp\left\{-\frac{\|p_i - p_j\|_2^2}{2\delta^2}\right\}, \quad (3) $$

where $\|p_i - p_j\|_2^2$ is the distance between $x_{i_1 i_2 \cdots i_N}$ ($x_i$) and $y_{j_1 j_2 \cdots j_N}$ ($y_j$), and it can be calculated as follows.

$$ \|p_i - p_j\|_2^2 = \sqrt{(i_1 - j_1)^2 + (i_2 - j_2)^2 + \cdots + (i_N - j_N)^2}. \quad (4) $$
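A small sketch of Eqs. 2 to 4 (ours; the bandwidth parameter δ and its default are assumptions), with the tensors flattened so that the quadratic form of Eq. 2 becomes a plain matrix expression:

```python
import numpy as np
from itertools import product

def tensor_distance(X, Y, delta=1.0):
    """Metric-weighted distance d_TD of Eq. 2 between same-shape tensors."""
    assert X.shape == Y.shape
    # All multi-indices p_i of the tensor, one row per flattened entry.
    P = np.array(list(product(*(range(n) for n in X.shape))), dtype=float)
    # Eq. 4, kept as written in the paper: the square root of the sum of
    # squared index differences.
    idx_dist = np.sqrt(((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1))
    # Eq. 3: Gaussian metric matrix G over pairs of coordinates.
    G = np.exp(-idx_dist / (2.0 * delta ** 2)) / (2.0 * np.pi * delta ** 2)
    d = (X - Y).reshape(-1)
    return float(np.sqrt(d @ G @ d))  # Eq. 2

# Example: distance between two random 2x3 tensors.
rng = np.random.default_rng(1)
print(tensor_distance(rng.normal(size=(2, 3)), rng.normal(size=(2, 3))))
```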
III. TENSOR-BASED DEEP LEARNING ROUTING STRATEGY

After introducing the basic knowledge on the tensor and the multi-dot operation, we design the tensor-based deep learning model for routing. For clarity, this section is separated into four parts: the input and output design, the forward propagation process, the backward propagation process, and the procedures of utilizing the structure for routing.

Fig. 3: The considered tensor-based deep learning model. (a) The considered TDBA, with the tensor of traffic patterns (dimensions I_r, I_sr, I_dr, I_tim, I_bs) at the bottom layer, several hidden layers, and the output vector at the top layer. (b) The architecture of the Restricted Boltzmann Machine, with unit i in the visible layer connected to unit j in the hidden layer through the weight w_ji, plus a bias unit.

A. Envisioned Input and Output Design

For better network traffic control, we choose the traffic pattern as the input of the deep learning architecture since it is the direct sign of the traffic situation. We utilize the number of inbound packets of every router as its traffic pattern. Therefore, the time should be divided into many slots, and every time slot is assumed to be Δt seconds. The number of inbound packets in the past β time slots is adopted. To utilize the traffic pattern more effectively, we also record the source and destination nodes for every packet. Additionally, at the end of each time slot, every router also needs to record the size of its remaining buffer. With this information, we can build a five-order tensor, X ∈ R^{I_r×I_sr×I_dr×I_tim×I_bs}, shown in Fig. 3a, to describe the traffic pattern of the network. The tensor orders I_r, I_sr, I_dr, I_tim, and I_bs denote the router, the source router, the destination router, the time, and the remaining buffer size, respectively. For the considered network, the value of I_r depends on the number of routers whose traffic patterns are utilized. The values of I_sr and I_dr depend on the number of routers generating and receiving packets, since some routers in the considered backbone network just play the role of forwarding packets. The value of I_tim is equal to β, which is decided during the training process of the TDBA.

After designing the input, the next step is to design the output tensor for routing. First, we assume that only a single path exists for every source-destination pair. If the output layer represented one whole path or several whole paths, the error rate would be high, as any error in the output layer means that at least one path is wrong. Therefore, in our envisioned design, the output comprises only the next node, which can be denoted by a vector. The number of elements in the vector is the same as the network scale, and all elements are binary. In the vector, only one element has the value of 1, and its position represents which router is the next node. Fig. 3a gives an example of the output layer, where the second element having the value of 1 means that router 2 is the next node. Additionally, the vector can also be seen as a 1-dimensional tensor.
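A minimal sketch of this input and output design (ours; all sizes below are hypothetical placeholders rather than values from the paper):

```python
import numpy as np

# Hypothetical sizes: I_r, I_sr, I_dr edge routers, beta time slots, and a
# discretized remaining-buffer axis with I_bs levels.
I_r, I_sr, I_dr, beta, I_bs = 12, 12, 12, 4, 8
network_scale = 16                      # one output element per router

# Five-order input tensor of Sec. III-A: X[r, s, d, t, b] counts the
# inbound packets at edge router r, sent from s to d, during the t-th of
# the last beta slots, observed at remaining-buffer level b.
X = np.zeros((I_r, I_sr, I_dr, beta, I_bs))

def next_node_vector(next_router):
    """Output design: a binary one-hot vector (a 1-D tensor) whose single 1
    marks the next node, e.g., index 1 set to 1 means router 2 (Fig. 3a)."""
    y = np.zeros(network_scale)
    y[next_router - 1] = 1.0
    return y

print(next_node_vector(2))  # 1 in the second position -> router 2 is next
```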
B. Forward Propagation Process

After designing the input and output layers, we now need to choose the deep learning architecture. In our proposal, we utilize the Deep Belief Architecture (DBA), which is the most common and effective among all deep learning models [3]. Since the units in the DBA are modeled as tensors, it is referred to as the TDBA in the remainder of the paper.

As we know, every DBA is composed of several Restricted Boltzmann Machines (RBMs), and each RBM consists of a visible layer and a hidden layer, denoted by V ∈ R^{I_1×I_2×⋯×I_N} and H ∈ R^{J_1×J_2×⋯×J_N}, respectively. The training of the TDBA consists of two steps: the Greedy Layer-Wise training to initialize the structure and the backpropagation step to fine-tune the structure [12]. For the TDBA, the Greedy Layer-Wise training is to train the RBMs one by one in a bottom-up fashion. As the RBM is an undirected graph model where the units in the visible layer are connected to stochastic hidden units through symmetrically weighted connections, as in Fig. 3b, the training process of the RBM is unsupervised. This implies that sets of unlabeled data are given to the visible layer. Then, the values of the weights and biases are repeatedly adjusted until the hidden layer can reconstruct the visible layer as exactly as possible. To measure the reconstruction error, a log-likelihood function of the visible layer is utilized as follows.

$$ l(W, B, A) = \sum_{t=1}^{m} \log p(V^{(t)}), \quad (5) $$

where W ∈ R^{α×J_1×J_2×⋯×J_N} (α = I_1 × I_2 × ⋯ × I_N), B ∈ R^{J_1×J_2×⋯×J_N}, and A ∈ R^{I_1×I_2×⋯×I_N} denote the tensors of weights, biases of the hidden layer, and biases of the visible layer, respectively. m denotes the number of training data, and V^{(t)} is the t-th training datum, the probability of which is p(V^{(t)}).

To maximize l(W, B, A), gradient ascent on l(W, B, A) is used to adjust the values of W, B, and A, which can be expressed as follows.

$$ \theta := \theta + \eta_1 \frac{\partial l(W, B, A)}{\partial \theta}, \quad (6) $$

where θ represents any element in the tensors W, B, and A, and η₁ denotes the learning rate in the first step.
The value of ∂l(W, B, A)/∂θ can be obtained through the Contrastive Divergence (CD) method and the Gibbs Sampling method [13]. The cores of ∂l(W, B, A)/∂w_{ji}, ∂l(W, B, A)/∂b_j, and ∂l(W, B, A)/∂a_i are given below (i = i_1 i_2 ⋯ i_N, j = j_1 j_2 ⋯ j_N).

$$ \frac{\partial l(W, B, A)}{\partial w_{ji}} = v_i\, p(h_j = 1 \mid V) - \frac{1}{l}\sum_{k=1}^{l} p(h_j = 1 \mid V^k)\, v_i^k, \quad (7) $$

$$ \frac{\partial l(W, B, A)}{\partial b_j} = p(h_j = 1 \mid V) - \frac{1}{l}\sum_{k=1}^{l} p(h_j = 1 \mid V^k), \quad (8) $$

$$ \frac{\partial l(W, B, A)}{\partial a_i} = v_i - \frac{1}{l}\sum_{k=1}^{l} v_i^k, \quad (9) $$

where l is the number of samplings, and V^k is the value after k rounds of sampling.

As the value of every unit in one layer is independent of the other units in the same layer when the other layers are fixed, the conditional distribution probabilities of each layer while the other layer is given, p(V | H; W, B, A) and p(H | V; W, B, A), can be calculated as follows,

$$ p(V \mid H; W, B, A) = \prod_{i}^{I_1 \times I_2 \times \cdots \times I_N} p(v_i \mid H; W, B, A), \quad (10) $$

$$ p(H \mid V; W, B, A) = \prod_{j}^{J_1 \times J_2 \times \cdots \times J_N} p(h_j \mid V; W, B, A). \quad (11) $$

If the values of the units in the visible layer and the hidden layer are all binary, then p(v_i = 1 | H; W, B, A) and p(h_j = 1 | V; W, B, A) are expressed as,

$$ p(v_i = 1 \mid H; W, B, A) = \mathrm{sigm}\Big(\sum_{j}^{J_1 \times J_2 \times \cdots \times J_N} w_{ji} h_j + a_i\Big), \quad (12) $$

$$ p(h_j = 1 \mid V; W, B, A) = \mathrm{sigm}\Big(\sum_{i}^{I_1 \times I_2 \times \cdots \times I_N} w_{ji} v_i + b_j\Big), \quad (13) $$

where sigm represents the sigmoid activation function, i.e., $\mathrm{sigm}(x) = \frac{1}{1 + e^{-x}}$.

By repetitively updating the values of the weights and biases, we can finally train the RBM, and then use the activated output of the hidden layer as the visible layer of the next RBM. After training the RBMs, the TDBA is well initialized. Here, it should be noted that the training data in our proposal are labeled, which means that for every example of the traffic pattern, X, there is a corresponding output Y. For training the final RBM with the labeled data, we usually take the layer third from the top, together with the top layer, Y, as the visible layer of the final RBM.
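The sketch below (ours, not the authors' implementation) trains one RBM with the standard CD-k variant of Eqs. 7 to 13; the tensor-shaped layers are flattened, and the batch average stands in for the sampling average (1/l)Σ_k written in the paper.

```python
import numpy as np

def sigm(x):                       # the sigmoid activation of Eq. 13
    return 1.0 / (1.0 + np.exp(-x))

def cd_step(V, W, B, A, eta1=0.05, k=1, rng=np.random.default_rng(0)):
    """One Contrastive Divergence update (Eqs. 6-9) for a flattened RBM.

    V: batch of visible samples, shape (m, alpha); W: weights (alpha, n_h);
    B: hidden biases (n_h,); A: visible biases (alpha,).
    """
    ph0 = sigm(V @ W + B)          # p(h_j = 1 | V), Eq. 13 (positive phase)
    Vk = V
    for _ in range(k):             # k rounds of Gibbs sampling
        h = (rng.random(ph0.shape) < sigm(Vk @ W + B)).astype(float)
        Vk = sigm(h @ W.T + A)     # p(v_i = 1 | H), Eq. 12 (reconstruction)
    phk = sigm(Vk @ W + B)         # p(h_j = 1 | V^k) (negative phase)
    m = V.shape[0]
    # Gradient ascent on the log-likelihood, Eq. 6, with the CD gradients
    # of Eqs. 7-9 averaged over the batch.
    W += eta1 * (V.T @ ph0 - Vk.T @ phk) / m
    B += eta1 * (ph0 - phk).mean(axis=0)
    A += eta1 * (V - Vk).mean(axis=0)
    return W, B, A
```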
C. Backpropagation Process

Since the RBMs, except the final one, are trained unsupervised in the first step, the values of the weights and biases are not optimal. Thus, we still need to fine-tune the whole architecture. The goal of training is that the output of the deep learning architecture, denoted by Y′, is the same as Y when we input X to the DBA. For calculating Y′, we run a forward propagation after inputting X with the initialized values of the weights and biases. In the forward propagation process, the units in each layer are weighted and then activated in the next layer. We use Z^l ∈ R^{I_1^l×I_2^l×⋯×I_N^l} and U^l ∈ R^{I_1^l×I_2^l×⋯×I_N^l} to represent the weighted value and the activated value of the units in the l-th layer, respectively. The order of the tensor representing the l-th layer is I_1^l × I_2^l × ⋯ × I_N^l. If we assume that the TDBA consists of L layers, then for the input layer and the output layer, Z^1 = X and Y′ = U^L. Since the input layer does not need to be activated, U^1 = Z^1. For the weight values of the links connecting to the l-th layer, we use the tensor W^l ∈ R^{α^{l−1}×I_1^l×I_2^l×⋯×I_N^l} (α^{l−1} = I_1^{l−1} × I_2^{l−1} × ⋯ × I_N^{l−1}). Then, we can utilize the initialized weights and biases to calculate the weighted values and the activated values of every layer according to the following equations,

$$ Z^l = W^l \odot U^{l-1} + B^l, \quad (14) $$

$$ U^l = \mathrm{sigm}(Z^l), \quad (15) $$

where B^l ∈ R^{I_1^l×I_2^l×⋯×I_N^l} denotes the biases of the l-th layer.

After the forward propagation process, the output of the TDBA, Y′ (U^L), can be obtained. To measure the difference between Y′ and Y, we define a loss function as follows,

$$ J = \frac{1}{2m}\sum_{t=1}^{m} (Y'^{(t)} - Y^{(t)})^T G (Y'^{(t)} - Y^{(t)}) + \frac{1}{2\lambda}\sum_{l=2}^{L}\ \sum_{j^l=1}^{I_1^l \times \cdots \times I_N^l}\ \sum_{i^{l-1}=1}^{\alpha^{l-1}} w_{j^l i^{l-1}}^2. \quad (16) $$

The first term on the right side of Eq. 16 is an average sum-of-squares error term. The second term is a weight decay term used for decreasing the magnitude of the weights and helping to prevent overfitting; λ is the weight decay parameter. To minimize the value of J in the backpropagation process, we use the gradient descent of J to update W and B as shown in the equations below,

$$ w_{j^l i^{l-1}}^l := w_{j^l i^{l-1}}^l - \eta_2 \frac{\partial J}{\partial w_{j^l i^{l-1}}^l}, \quad (17) $$

$$ b_{i^l}^l := b_{i^l}^l - \eta_2 \frac{\partial J}{\partial b_{i^l}^l}, \quad (18) $$

where η₂ denotes the learning rate.
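A compact sketch (ours) of the fine-tuning forward pass and loss: with the layers kept flattened, the multi-dot of Eq. 14 is again a matrix product, and Eq. 16 is evaluated directly. Taking G as the identity is a simplifying assumption of this sketch.

```python
import numpy as np

def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(X, weights, biases):
    """Eqs. 14-15: Z^l = W^l (.) U^(l-1) + B^l and U^l = sigm(Z^l),
    with U^1 = Z^1 = X and Y' = U^L (layers kept flattened)."""
    U = X.reshape(-1)
    for W, B in zip(weights, biases):
        U = sigm(U @ W + B)
    return U

def loss_J(Y_pred, Y, weights, lam=10.0, G=None):
    """Eq. 16: metric-weighted average sum-of-squares error plus the
    1/(2*lambda) weight decay term over all layers' weights."""
    m, n = Y.shape
    G = np.eye(n) if G is None else G
    D = Y_pred - Y
    data = np.einsum('ti,ij,tj->', D, G, D) / (2.0 * m)
    decay = sum((W ** 2).sum() for W in weights) / (2.0 * lam)
    return data + decay

# The updates of Eqs. 17-18 then follow as W -= eta2 * dJ/dW and
# B -= eta2 * dJ/dB, with the gradients computed by backpropagation.
```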
After updating the values of the weights and biases, we run the forward propagation again and check whether the value of J satisfies the upper limit. If not, the update and forward propagation processes need to be run repetitively until the value of J is not above the upper limit or the number of loops reaches a definite value.

D. The Procedures of the Proposed Tensor-Based Deep Learning Routing Model

In this part, we explain how the proposed tensor-based deep learning routing strategy works for our considered network consisting of edge routers and inner routers. In the considered network, only the edge routers offer the access service for the terminals, while the inner routers just forward packets for the edge routers. Then, we can suppose that the packets are all generated in and destined for edge routers.

The procedures of the proposed model are presented in Alg. 1. It needs to be mentioned that the training data can be obtained through available network data traces, e.g., the Center for Applied Internet Data Analysis (CAIDA) [14], or from a similar network topology. The data consist of the detailed traffic patterns of all edge routers and the corresponding paths connecting any two edge routers. Every router uses the traffic patterns and the corresponding next-node information for all edge routers to train its TDBAs and then sends the values of the weights and biases to all the edge routers, as shown in Steps 1 to 6. Next, every edge router records its traffic pattern and shares it with the other edge routers. With this information, every edge router can utilize the TDBAs to construct the paths to the other edge routers, as shown in Steps 7 to 11 (see the sketch after Alg. 1). Then, the paths can be labeled in the headers of packets, and the other routers send the packets according to the paths.

Algorithm 1: The procedures of the tensor-based deep learning routing model
Input: number of routers N, number of edge routers E, training data (traffic pattern of edge routers, next router)
1: for each router i ∈ N do
2:    for each edge router j ∈ E do
3:       Train the TDBA[i][j] with corresponding training data according to Eqs. 5 to 18
4:    end for
5:    Send the weight and bias values to the edge routers
6: end for
7: for each edge router j ∈ E do
8:    Record the traffic pattern TP[j] and send it to other edge routers
9:    Execute the Forward Propagation Process using Eqs. 14 and 15, and output the next nodes
10:   Using the next-node information, construct the paths to destination routers
11: end for
12: return the paths to destination routers
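As referenced above, a small runnable sketch (ours; the dictionary stands in for the TDBA outputs) of how Steps 9 and 10 chain the per-router next-hop predictions into a full path:

```python
def build_path(next_hop, src, dst, max_len=16):
    """Chain the predicted next nodes into a path from src to dst, as an
    edge router does in Steps 9-10 of Alg. 1.

    next_hop: dict mapping (current router, destination) to the next router
              predicted by the corresponding TDBA's one-hot output.
    """
    path = [src]
    while path[-1] != dst and len(path) <= max_len:  # guard against loops
        path.append(next_hop[(path[-1], dst)])
    return path

# Toy example: routers 1-2-3-4 in a line, all traffic destined for router 4.
hops = {(1, 4): 2, (2, 4): 3, (3, 4): 4}
print(build_path(hops, 1, 4))  # -> [1, 2, 3, 4]
```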
IV. PERFORMANCE EVALUATION

In this section, we evaluate the effectiveness of our TDBA-based routing strategy in terms of network performance. In our C++-based simulations, since all computations are conducted on a workstation, we choose a medium-sized 4 × 4 wired grid network as our considered network. The sizes of the data packets and signaling packets are both set to 1 Kb. The link capacity is set to 400 Kbps, and the total buffer size of every router is assumed to be 400 Kb. The time slot is set to 0.25 second.

After setting these parameters, we first use the OSPF protocol as the routing strategy to simulate the packet transmission process and record data for training the TDBAs. As the packets are all generated in the edge routers, we utilize the traffic patterns of only the edge routers as the input. The traffic pattern of only one time slot is enough for offering high accuracy, and more time slots mean much more complex deep learning structures. Also, since the source and destination information has no relation with the remaining buffer size, we can construct a 3-dimensional tensor representing the traffic patterns of the edge routers as the input. The three dimensions of the tensor denote the router order, the source node, and the destination node, respectively. Besides the traffic patterns of the edge routers, we add a separate unit for every edge router representing its remaining buffer size in the time slot. Therefore, the input layer has 1740 units while the output layer has 16 units (a sizing check follows Table I below). We have tried different TDBA structures and found that the accuracy rate of the TDBA consisting of 5 layers and 20 units in each hidden layer reaches above 95%. Therefore, we choose this structure in our proposal. After training the TDBAs in each router with the data from OSPF, we can finally get their weights and biases.

Before analyzing the network performance of the TDBA-based routing strategy, we make a comparison between the proposed TDBA and the DBA utilized in our previous work [3], as shown in Table I. Compared with the previous work, the input of every TDBA in this paper contains more information, leading to more units in the input layer. Moreover, the value of the average Mean Square Error (MSE) for the TDBA is much lower than that for the DBA, meaning that the prediction accuracy has been improved a lot due to the additional input information.

To demonstrate the effectiveness of the TDBA-based routing strategy more clearly, we compare the network performance in terms of overall packet loss rate and average delay per hop under different traffic loads, as shown in Figs. 4 and 5.

TABLE I: Comparison of TDBA and DBA

Aspects                   | TDBA                                                         | DBA
Input information         | #inbound packets, source, destination, remaining buffer size | #inbound packets
#units in the input layer | 1740                                                         | 16
Average MSE               | ≈ 10^-4                                                      | ≈ 10^-2
Fig. 4: Comparison of overall packet loss rate for the conventional OSPF and the proposed TDBA system. (Packet loss rate (%) versus packet generating rate, 1.2 to 3.6 Mbps.)

Fig. 5: Comparison of average delay per hop for the conventional OSPF and the proposed TDBA system. (Average delay per hop (s) versus packet generating rate, 1.2 to 3.6 Mbps.)

In Fig. 4, it can be noticed that when the packet generating rate is limited within 2 Mbps, the performance in terms of packet loss rate is the same for OSPF and the TDBA, demonstrating that the training of the TDBA enables it to choose the same paths as OSPF. The packet loss rate is nearly zero, which means that traffic congestion does not happen at this point. However, when the packet generating rate exceeds 2.4 Mbps, the performance of the two systems becomes different, and their gap increases with the increasing network traffic load. In Fig. 4, for OSPF, the packet loss rate goes up quickly, which means that traffic congestion happens and becomes more severe with the rise of the packet generating rate. On the other hand, no traffic congestion happens for the TDBA, as it suffers from no packet loss. This can also be seen in Fig. 5, as the average delay per hop of the TDBA stays unchanged during the whole period, while that of OSPF first surges and then grows slowly. Note that the average delay per hop of OSPF nearly reaches the theoretical maximum delay for the assumed scenario due to the limited buffer size, which will also happen to the TDBA if the packet generating rate continues to grow and exceeds a certain level. Here, it can be concluded that the TDBA-based routing strategy outperforms OSPF since the former can control the network traffic much better.

V. CONCLUSION

In this paper, we proposed a deep learning architecture based on the tensor model to predict the traffic paths by using only traffic patterns. Since a tensor can represent much higher-dimensional data compared with vectors or scalars, our proposal considers multiple networking parameters instead of only one parameter like conventional routing protocols. In our proposal, the TDBA utilizes only the edge routers' traffic patterns, with various conditions considered, to predict the whole paths to other routers, while the inner routers do not need to send their traffic pattern information to the edge routers. Since our proposal has a lower signaling overhead than OSPF, it outperforms OSPF in terms of packet loss rate and average delay per hop when the network experiences a high traffic load. As an increasing number of terminals are accessing networks for various kinds of services, our future direction will be to apply the tensor-based deep learning architecture for routing to meet the QoS requirements of different services. In that case, we need to consider more parameters, e.g., the packet size and the service time, which leads to the tensor model having more dimensions.

REFERENCES

[1] W. Huang, G. Song, H. Hong, and K. Xie, "Deep Architecture for Traffic Flow Prediction: Deep Belief Networks with Multitask Learning," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 5, pp. 2191–2201, Apr. 2014.
[2] O. Yeremenko, "Development of the Dynamic Tensor Model for Traffic Management in a Telecommunication Network with the Support of Different Classes of Service," Eastern-European Journal of Enterprise Technologies, vol. 6, no. 9 (84), pp. 12–19, Dec. 2016.
[3] N. Kato, Z. M. Fadlullah, B. Mao, F. Tang, O. Akashi, T. Inoue, and K. Mizutani, "The Deep Learning Vision for Heterogeneous Network Traffic Control: Proposal, Challenges, and Future Perspective," IEEE Wireless Communications Magazine, vol. 24, no. 3, pp. 146–153, Dec. 2016.
[4] B. Mao, Z. M. Fadlullah, F. Tang, N. Kato, O. Akashi, T. Inoue, and K. Mizutani, "Routing or Computing? The Paradigm Shift Towards Intelligent Computer Network Packet Transmission Based on Deep Learning," IEEE Transactions on Computers, May 2017, available online, doi: 10.1109/TC.2017.2709742.
[5] Z. M. Fadlullah, F. Tang, B. Mao, N. Kato, O. Akashi, T. Inoue, and K. Mizutani, "State-of-the-Art Deep Learning: Evolving Machine Intelligence Toward Tomorrow's Intelligent Network Traffic Control Systems," IEEE Communications Surveys & Tutorials, May 2017, available online, doi: 10.1109/COMST.2017.2707140.
[6] Z. Baker, T. Bhattacharya, M. Dunham, P. Graham, R. Gupta, J. Inman, A. Klein, G. Kunde, A. McPherson, M. Stettler, et al., "The PetaFlops Router: Harnessing FPGAs and Accelerators for High Performance Computing," Links, vol. 12, no. 4, p. 16, Jan. 2009.
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems, pp. 1097–1105, Dec. 2012.
[8] Y. LeCun, Y. Bengio, and G. Hinton, "Deep Learning," Nature, vol. 521, no. 7553, pp. 436–444, May 2015.
[9] R. Sarikaya, G. E. Hinton, and A. Deoras, "Application of Deep Belief Networks for Natural Language Understanding," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 4, pp. 778–784, Feb. 2014.
[10] L. Kuang, L. T. Yang, S. C. Rho, Z. Yan, and K. Qiu, "A Tensor-Based Framework for Software-Defined Cloud Data Center," ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 12, no. 5s, pp. 74:1–74:23, Dec. 2016.
[11] Q. Zhang, L. T. Yang, and Z. Chen, "Deep Computation Model for Unsupervised Feature Learning on Big Data," IEEE Transactions on Services Computing, vol. 9, no. 1, pp. 161–171, Nov. 2015.
[12] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A Fast Learning Algorithm for Deep Belief Nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, Jul. 2006.
[13] G. E. Hinton, "Training Products of Experts by Minimizing Contrastive Divergence," Neural Computation, vol. 14, no. 8, pp. 1771–1800, Aug. 2002.
[14] "Center for Applied Internet Data Analysis (CAIDA)." http://www.caida.org/home/.
