This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2019.2944889, IEEE Internet of Things Journal
Abstract—In the era of the Internet of Things (IoT), watching videos on mobile devices has become a popular activity in our daily life. How to recommend videos to users is one of the problems of greatest concern for Internet Video Service Providers (IVSPs). To provide better recommendation services to users, they deploy cloud servers in a geo-distributed manner, each responsible for analyzing the user data of a local area. These cloud servers therefore form information islands, and the characteristics of the data are non-independent and identically distributed (non-i.i.d.). In this scenario, it is difficult to provide accurate video recommendation services to the minority of users in each area. To tackle this issue, we propose JointRec, a deep learning-based joint-cloud video recommendation framework. JointRec integrates the JointCloud architecture into mobile IoT and achieves federated training among distributed cloud servers. Specifically, we first design a Dual-Convolutional Probabilistic Matrix Factorization (Dual-CPMF) model to conduct video recommendation. Based on this model, each cloud can recommend videos by exploiting users' profiles and the descriptions of the videos they rate, thereby providing more accurate video recommendation services. We then present a federated recommendation algorithm that enables the clouds to share their weights and train a model cooperatively. Furthermore, considering the heavy communication costs in the process of federated training, we combine low-rank matrix factorization and 8-bit quantization to reduce the uplink communication cost and network bandwidth. We validate the proposed approach on a real-world dataset, and the experimental results indicate its effectiveness.

Index Terms—JointCloud Computing, Mobile Internet of Things, Deep Learning, Video Recommendation System, Non-IID Data Setting, Federated Training.

I. INTRODUCTION

... the most relevant video to target audiences, e.g., Youtube, Netflix, IMDB, Movielens and iQIYI [7], [8]. In the era of the Internet of Things (IoT), watching videos on mobile devices has become a popular activity in our daily life. To collect user data, the IVSPs deploy distributed cloud servers in different places [9], [10]. Each server is responsible for storing and analyzing the data generated by users located in specific areas [11], [12], so the user data from different places are non-i.i.d. It is therefore hard to recommend videos precisely to the minority of users. For example, as shown in Fig. 1, there are four cloud servers distributed in different areas. Bob is a ten-year-old child who lives in Area A, where adults account for the majority of users; these users prefer entertainment videos and soap operas. Peter is an old man who lives in Area C, where the majority of users are teenagers whose interests tend toward education videos. Owing to the difference in user distribution between these two areas, the characteristics of the data they produce are quite different, forming information islands between them. If each server makes video recommendation decisions merely based on its local data, it will introduce information deviation, leading to imprecise recommendations. Furthermore, given the distributed nature of this scenario, the centralized method has the following inherent weaknesses. i) Single point of failure: the centralized server may fail due to attacks, which threatens the reliability of the system. ii) All data must be sent to the centralized server and processed centrally, leading to heavy cost and communication overhead for the centralized system [13]. iii) Considering the difference in data distribution, the centralized recommendation results ...
2327-4662 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
B. Distributed Learning and JointCloud Computing

From the perspective of distributed learning, Povey et al. [21] study distributed training by iteratively averaging local training models. Arjevani et al. [22] investigate distributed machine learning based on gradient descent from a theoretical point of view. However, these works consider only a data setting with the unrealistic assumption that the data at different nodes are independent and identically distributed (i.i.d.), whereas the more general non-i.i.d. cases are hard to solve. Google [23] proposes a communication-efficient method for learning from decentralized data called Federated Learning (FL). Each client has a local training dataset and computes an update to the current global model maintained by the server; the experiments demonstrate that FL is robust to unbalanced and non-i.i.d. data distributions. However, the FL approach cannot be directly applied to our learning task, because there the data derive from individual mobile devices and are trained locally. Our work shifts the focus to multiple clouds cooperating with each other, where each cloud data center stores ...

III. SYSTEM ARCHITECTURE AND DUAL-CPMF VIDEO RECOMMENDATION

We consider a distributed JointCloud video recommender system architecture in which cloud servers are distributed in different places, as illustrated in Fig. 2. The video-watching data of users in these regions are collected and stored in specific cloud servers. Each cloud server trains a video recommendation model using its local data. Among these clouds, one is designated as the aggregator, which is responsible for aggregating the training parameter files. Once the other cloud servers receive its requests, they send their parameter files to the aggregator. The aggregator then returns new parameter files to these cloud servers after some processing. Some key notations frequently used in this paper are summarized in Table I.

In the following, we describe the Dual-CPMF model for video recommendation in detail. As shown in Fig. 3, the user and video documents refer to the user's reviews of the videos and to the description and name of the videos, respectively. The user profile includes the user's attributes, ...
S_j = cnn_v(W_2, Y_j)   (6)

where T_i and S_j are the user and video feature vectors, respectively, W_1 and W_2 denote all the weight and bias variables, and X_i and Y_j are the raw input documents of user i and video j.

C. Probabilistic Matrix Factorization

We then adopt the PMF model to generate the ratings. Assuming we have N users and M videos, the ratings are represented by a matrix R ∈ R^(N×M). To the user and video feature vectors T_i and S_j from the CNN architecture, we further add two zero-mean spherical Gaussian noise variables. Hence, the user latent factor U_i for user i and the video latent factor V_j for video j are given by:

U_i = T_i + θ_i   (7)

V_j = S_j + ξ_j   (8)

where θ_i and ξ_j are Gaussian noise matrices: θ_i follows a zero-mean Gaussian distribution with variance σ_U², and ξ_j follows a zero-mean Gaussian distribution with variance σ_V². Namely, θ_i ∼ N(0, σ_U² I_ij) and ξ_j ∼ N(0, σ_V² I_ij):

p(θ | σ_U²) = ∏_{i=1}^{N} N(θ_i | 0, σ_U² I_ij)   (9)

p(ξ | σ_V²) = ∏_{j=1}^{M} N(ξ_j | 0, σ_V² I_ij)   (10)

Furthermore, we denote W_1, W_2 ∼ N(0, σ_W² I_ij) for the training weights. As mentioned above, the conditional distributions over the user and video latent factors are given by:

p(U | W_1, X_i, σ_U²) = ∏_{i=1}^{N} N(u_i | T_i, σ_U² I_ij)   (11)

p(V | W_2, Y_j, σ_V²) = ∏_{j=1}^{M} N(v_j | S_j, σ_V² I_ij)   (12)

where r_ij is the real rating value, N(x | 0, σ²) is the probability density function, and I_ij is an indicator such that I_ij = 1 implies that user i rated video j, and I_ij = 0 otherwise.

In order to optimize the variables in the user and video latent models as well as the weight and bias variables of the CNNs, we can maximize a posterior estimation (MAP) to conduct PMF:

max_{U,V,W_1,W_2} p(U, V, W_1, W_2 | R, X_i, Y_j, σ², σ_U², σ_V², σ_W²)
  = max_{U,V,W_1,W_2} [ p(R | U, V, σ²) p(U | W_1, X_i, σ_U²) × p(V | W_2, Y_j, σ_V²) p(W_1 | σ_W²) p(W_2 | σ_W²) ]   (13)

D. Parameters Optimization

Given a training dataset, there are many parameters to be optimized in Dual-CPMF. We want to find the MAP estimate of U, V, W_1, W_2, then predict the missing values in the rating matrix R and use the predictions to make recommendations. Maximizing the posterior probability is equivalent to minimizing the negative joint log-likelihood. By taking the negative logarithm of formula (13), it can be reformulated as follows:

L(U, V, W_1, W_2) = Σ_{i=1}^{N} Σ_{j=1}^{M} (I_ij / 2)(r_ij − u_i^T v_j)²
  + (λ_U / 2) Σ_{i=1}^{N} ||u_i||² + (λ_V / 2) Σ_{j=1}^{M} ||v_j||² + (λ_W / 2) Σ_k ||W_{1,k} + W_{2,k}||²   (14)

where λ_U is σ²/σ_U², λ_V is σ²/σ_V², and λ_W is σ²/σ_W²; u_i and v_j indicate the updated values of the user and video latent vectors.

In order to obtain the optimal values of U and V, we adopt a coordinate descent algorithm, which iteratively optimizes one latent variable while fixing the remaining variables during the training process. We first fix U (or V), W_1 and W_2, take the derivative of L with respect to U (or V), and set it to zero. Solving the corresponding equations leads to the following update rules:

u_i ← (V I_i V^T + λ_U I_K)^(−1) (V R_i + λ_U T_i)   (15)

v_j ← (U I_j U^T + λ_V I_K)^(−1) (U R_j + λ_V S_j)   (16)
where I_i is a diagonal matrix whose elements are I_ij (j = 1, ..., M), and R_i is a vector (r_ij)_{j=1}^{M} for user i. For video j, I_j and R_j are defined similarly. Given U and V, we can further optimize the parameters W_1 and W_2. However, the weight parameters cannot be optimized analytically as we do for U and V, because they are closely related to the features in the CNN architecture. Noting that the loss function L can be interpreted as an error function with regularized terms, by fixing U and V we have:

E(W_1, W_2) = (λ_U / 2) Σ_i ||u_i − T_i||² + (λ_V / 2) Σ_j ||v_j − S_j||² + (λ_W / 2) Σ_k ||W_{1,k} + W_{2,k}||² + constant   (17)

According to equation (17), we utilize the back-propagation algorithm to optimize W_1 and W_2, respectively. After repeated iterations, the parameters are updated until convergence. With the optimized U, V, W_1, W_2, we can calculate the unknown rating values in R and predict the latent preferences:

r̂_ij ≈ u_i^T v_j = (T_i + θ_i)^T (S_j + ξ_j)   (18)

Recall that T_i and S_j are the feature vectors extracted from the CNN architecture, and θ_i and ξ_j stand for the Gaussian noise matrices for user i and video j. With respect to the cold-start problem, we can use the predicted user-video matrix to conduct video recommendations.

Now we have obtained the training model parameters for video recommendation in each cloud. In the next section, we study the federated recommendation strategy among distributed clouds.

IV. FEDERATED RECOMMENDATION AND EFFICIENT COMMUNICATION STRATEGY

A. Federated Recommendation Strategy

We present a distributed gradient-descent algorithm for the federated recommendation strategy. Suppose there are N clouds distributed in different regions, each with a local dataset of size s_1, ..., s_n, ..., s_N. Each cloud n performs a single batch gradient calculation per communication round t. Specifically, at t = 0, all clouds download the same initial parameters from the aggregator; then each computes the gradient g_n(t) = ∇L_n(w_n(t)), (t = 0, 1, 2, ...) on its local dataset, where w_n(t) is the weight file of cloud n at round t. This step is the local update. After one or more rounds of local training, the aggregator gathers these gradients and updates the parameters for the next iteration by applying a weighted average of all resulting models. We define this process as the global update. For each cloud n, the local update rule is defined as:

w_n(t + 1) = w_n(t) − η g_n(t)   (19)

where L_n is the loss function of each cloud n and η > 0 is the learning rate. Furthermore, the global update on the aggregator is defined as follows:

w(t + 1) = Σ_{n=1}^{N} (d_n / d) w_n(t + 1)   (20)

where d = Σ_{n=1}^{N} d_n.

The federated recommendation (FR) strategy is presented in Algorithm 1; the Compress and Decompress functions will be explained in Section IV-B.

Algorithm 1: Federated Recommendation Strategy
Input: the N clouds, indexed by n; the weight file w_n of each cloud n; the number of mobile devices d_n in each cloud; the local minibatch size B.
{Aggregator}:
  Initialize w_0;
  for each round t = 1, 2, ... do
    Compress(w_0);
    for each cloud n in parallel do
      stransfer(ip, remote, local, usr, psd);
      request(url);
    for each w_n(t) do
      Decompress(w_n(t));
      w_n(t+1), d_n, n ← Cloud(n, w_n(t));
      response();
      d += d_n;
    w(t+1) ← Σ_{n=1}^{N} (d_n / d) w_n(t+1);
{Single Cloud}:
Cloud(n, w_n(t)):
  listen(port);
  Decompress(w_n(t));
  loadweight(w_n(t));
  for each local epoch from 1 to B do
    CNN_Model():
      w_n(t+1) = w_n(t) − η ∇L_n(w_n(t));
  ctransfer();
  Compress(w_n(t+1));
  return w_n(t+1) to the aggregator.

B. Efficient Communication Strategy

As mentioned above, each single cloud independently computes a weight update to the current model based on its local data and communicates the update to an aggregator cloud, where the cloud-side updates are aggregated to compute a new global model. Therefore, communication efficiency is of the utmost importance. The goal of increasing the communication efficiency of FR is to reduce the cost of sending the weight file to the aggregator while learning from the data stored on each cloud over limited Internet connections. In this paper, we propose a weight compression strategy to reduce the uplink communication cost. We compress ...
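The local rule (19) and global rule (20) together amount to a device-count-weighted federated averaging step. A minimal sketch, under the illustrative assumption that each cloud's weights are held as a dict of NumPy arrays keyed by layer name (not the authors' weight-file format):

```python
import numpy as np

def local_update(w, grad, eta):
    """Local rule (19): w_n(t+1) = w_n(t) - eta * g_n(t)."""
    return {name: w[name] - eta * grad[name] for name in w}

def global_update(cloud_weights, device_counts):
    """Global rule (20): w(t+1) = sum_n (d_n / d) * w_n(t+1), with d = sum_n d_n.

    cloud_weights : list of per-cloud weight dicts w_n(t+1)
    device_counts : list of device counts d_n, one per cloud
    """
    d = float(sum(device_counts))
    return {
        name: sum((dn / d) * w[name] for w, dn in zip(cloud_weights, device_counts))
        for name in cloud_weights[0]
    }
```

For example, averaging two clouds with weights 1.0 and 3.0 and device counts 1 and 3 yields (1/4)·1.0 + (3/4)·3.0 = 2.5, so clouds serving more devices dominate the global model.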
cases. In Case 1, each cloud server trains a model on its local data and then conducts federated training according to our approach. In Case 2, each cloud server performs training locally but does not interact with the others. In Case 3, all the data are trained centrally on one cloud server. In the experiment, we test two data splits. Table III shows the data distribution on each cloud.

Table III
The data distribution on each cloud

  Cloud     Users   Proportion   Videos   Majority   Minority
  Distribution 1
  Cloud 1     898     0.149       3286       865        33
  Cloud 2    1337     0.221       3325      1308        29
  Cloud 3    3805     0.630       3470      3791        14
  Distribution 2
  Cloud 1     726     0.120       3354       626       100
  Cloud 2    1325     0.220       3433      1125       200
  Cloud 3    3989     0.660       3649      3689       300

C. Preprocessing and Parameters Setting

We preprocess the dataset as follows: (1) converting the users' ages, genders, occupations and movie genres into equal-length arrays; (2) constructing user-plot data according to users' ratings; (3) calculating the tf-idf score for each word in the user-plot and movie-plot data; (4) selecting the top 8000 distinct words as a vocabulary; (5) splitting the dataset into a training set (80%), a validation set (10%) and a test set (10%). Additionally, we remove the items that do not have plot documents in the whole dataset and the users that have fewer than 3 ratings [19].

We set the initial size of the latent dimension of U and V to 50. The value of λ_U is 100 and λ_V is 10. In the CNN model, 1) we use Adam as the optimizer and each mini-batch consists of 128 training samples; 2) we use three different filters to extract features; 3) we set the dropout rate to 0.1 to reduce overfitting.

D. Competitors

We compare our proposed model Dual-CPMF with four competitors: PMF, CTR, CDL and ConvMF, which are representative works on extending matrix factorization with content information in recommender systems. PMF [33] is a basic rating prediction model that uses only ratings. CTR [34] (Collaborative Topic Regression) combines PMF and LDA to use both ratings and documents. CDL [35] (Collaborative Deep Learning) uses both SDAE and PMF to improve accuracy. ConvMF [19] (Convolutional PMF) is a context-aware recommendation model that integrates a CNN into PMF.

E. Evaluation Metrics

Root mean squared error (RMSE) is an extensively used measurement for the performance evaluation of rating predictions; a smaller value indicates better prediction performance [36].

RMSE = sqrt( (1/n) Σ_{i,j}^{N,M} (r_ij − r̂_ij)² )   (21)

where r̂_ij is the predicted value and n is the number of ratings. As for the top-K recommendations, Recall@K (R@K) and Precision@K (P@K) are selected to evaluate the recommendation accuracy.

R@K = |R(i) ∩ T(i)| / |T(i)|   (22)

P@K = |R(i) ∩ T(i)| / |R(i)|   (23)

where R(i) is the set of top-K videos in the recommended list for user i, T(i) is the set of items that user i likes, and |·| indicates the number of elements in a set. R@K is the fraction of liked items returned in the top-K list; P@K measures how many of the recommended items in the top-K list are accurate.

F. Experimental Result

The experimental results consist of two aspects: 1) for the centralized training, we mainly evaluate the performance of our proposed video recommendation model; 2) for the distributed training, we compare the performance under different cases: federated training among distributed clouds, single-cloud training on the local dataset independently without information fusion, and the centralized baseline.

1) Prediction Accuracy Evaluation. We first compare the results of rating prediction and evaluate the prediction accuracy of our model. Table IV shows the RMSE results of our approach and other models. We can observe that Dual-CPMF outperforms the other methods, proving the effectiveness of the user feature extraction for model fitting. Note that "Improve" indicates the degree of improvement of Dual-CPMF over these methods. The results indicate that users' profiles play an important role in the recommendation model, and extracting features from the user perspective can model users' preferences better. Therefore, our proposed model is more suitable for video recommendation.

Table IV
Overall comparison results of RMSE

  Model     PMF      CTR      CDL      ConvMF   Dual-CPMF
  Value     0.8971   0.8969   0.8879   0.8595   0.8472
  Improve   5.56%    5.54%    4.58%    1.43%    —

2) The Impact of Various Sparseness of the Dataset. As shown in Table V, we compare the RMSE on different sparsenesses of the dataset. Because ConvMF outperforms CTR and CDL, we select only PMF and ConvMF for comparison. Our approach performs better than both of them over all ranges of sparseness. Specifically,
[Fig. 6. Parameter Analysis of λ_U and λ_V.]

we note that the improvement of Dual-CPMF over ConvMF changes from 2.09% to 1.43% as the data density increases from 0.93% to 3.71%. It indicates that Dual-CPMF can obtain more accurate predictions when more ratings are provided.

Table V
The RMSE over various sparseness of training data on the dataset

  Ratio of training set to the entire dataset (density)
  Model         20% (0.93%)   40% (1.86%)   60% (2.78%)   80% (3.71%)
  PMF             1.1137        0.9964        0.9251        0.9097
  ConvMF          0.9908        0.9361        0.8822        0.8595
  Dual-CPMF       0.9700        0.9143        0.8596        0.8472
  Improvement     2.09%         2.32%         2.56%         1.43%

3) Parameter Analysis. To investigate the impact of the parameters on the performance of Dual-CPMF, we conduct a parameter analysis on λ_U and λ_V. We consider five values of λ_U and λ_V, i.e., 0.1, 1, 10, 50 and 100, in this experiment. Fig. 6 shows the RMSE changes of Dual-CPMF under different values of λ_U and λ_V on the dataset. λ_U implies how T_i affects the user latent factor U, and λ_V indicates how S_j affects the video latent factor V. Note that the prediction error is high when both parameters approach 0, while the value decreases as λ_U and λ_V increase. This indicates that increasing the textual information of users and videos is beneficial for the accuracy of rating prediction. We can also see that the best recommendation performance is achieved when (λ_U, λ_V) takes the value (100, 10), the "lowest point" (RMSE = 0.8472) in the figure. It implies that the influence of the users' perspective on the prediction accuracy is greater than that of the videos' perspective. However, when λ_U exceeds a certain threshold, the accuracy begins to decrease; it is likely that putting too much emphasis on the user information has a side effect. Furthermore, we note that the ideal combination of the two parameters is the same for ConvMF and Dual-CPMF, but we achieve a smaller RMSE than the ConvMF model. It indicates that the implementation of Dual- ...

Table VI
The impact of latent dimension D

  Model        D = 10   D = 20   D = 50   D = 100
  PMF          0.8685   0.8665   0.8683   0.8646
  ConvMF       0.8863   0.8770   0.8566   0.8554
  Dual-CPMF    0.8568   0.8526   0.8489   0.8474

4) Performance Evaluation of Distributed Video Recommendation System. In Fig. 7 and Fig. 8, we concentrate on the performance evaluation in different cases. In the figures, the bar in a box is the average recall of the model, the upper and bottom borders of a box represent the 75th and 25th percentiles, and the tips of the upper and bottom whiskers represent the max and min values. Fig. 7 shows the Recall@250 comparison for the minority user groups under the two data distributions. The recall of our method (Case 1) is higher than Case 2 in all clouds on average, validating our intuition that federated training and parameter fusion can improve the performance of the distributed video recommendation system. Although the average recall of Case 1 is slightly lower than Case 3, with an increasing number of users the recall of our method approaches Case 3, sometimes even exceeding it. This is probably because the distributed fusion training can make use of the computation resources of multiple clouds. Furthermore, we can see that the recall of Case 1 shows an increasing trend from Cloud 1 to Cloud 3. This is because the number of users on these three clouds is increasing, and more users bring better recommendation performance. Fig. 8 shows the Recall@250 comparison for the majority user groups under the two data distributions. As shown in the figure, the results of the majority user groups are similar to those of the minority user groups in Fig. 7.

Fig. 9 shows the comparison of different Recall@K values for the minority user groups in the different cases. As shown in the figure, as the value of K increases from 50 to 200, the recall of all cases in Cloud 1 to Cloud 3 increases, and our method (Case 1) outperforms Case 2. Although the performance of Case 3 is better than Case 1 in Cloud 1, as the number of users increases in Cloud 2 and Cloud 3, the recall of Case 1 approaches Case 3.

5) The comparison for the IID Case and Non-IID Case. To evaluate the performance of our approach in the i.i.d. and ...
[Fig. 7. Recall@250 comparison for the minority user groups under two data distributions: (a) Distribution 1; (b) Distribution 2.]

[Fig. 8. Recall@250 comparison for the majority user groups under two data distributions.]

[Fig. 10. The RMSE for the IID Case and Non-IID Case over communication rounds.]

[Fig. 9. Recall@K comparison for the minority user groups under different cases.]

... Cloud 2 and Cloud 3. As we can see in Fig. 12(a), the combination compression method is in the bottom row and the baseline in the top row. This indicates that the combination compression method performs better than the others on Cloud 1, whereas on Cloud 2 and Cloud 3, shown in Fig. 12(b) and Fig. 12(c), the curves of the three compression methods are much closer to the baseline. This is caused by the different distributions of users on the three clouds. Therefore, we can conclude that weight compression not only reduces
Table VII
Comparison of compression ratio

  Compression Method          RMSE (Cloud 1 / Cloud 2 / Cloud 3)   Model Size   Compression Ratio
  Baseline                    0.8885 / 0.9285 / 0.8547             85.58 MB     1x
  Matrix Factorization        0.8854 / 0.9283 / 0.8543             43.7 MB      1.95x
  8-bit Quantification        0.8873 / 0.9368 / 0.8541             21.46 MB     3.98x
  MF + 8-bit Quantification   0.8804 / 0.9281 / 0.8545             6.67 MB      12.83x

[Fig. 12. Comparison of RMSE with matrix factorization, 8-bit quantization and their combination: (a) the RMSE of Cloud 1; (b) the RMSE of Cloud 2; (c) the RMSE of Cloud 3.]
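The two compression techniques compared in Table VII can be sketched as follows: a truncated-SVD low-rank factorization of a weight matrix, and uniform 8-bit quantization of its entries. The rank choice and min-max scaling scheme below are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def low_rank_compress(W, r):
    """Approximate an (m, n) weight matrix by rank-r factors via truncated SVD,
    so that m*r + r*n values are sent instead of m*n."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * s[:r]   # (m, r) left factor, scaled by singular values
    B = Vt[:r, :]          # (r, n) right factor; W ~= A @ B
    return A, B

def quantize_8bit(W):
    """Uniform 8-bit quantization: map [W.min(), W.max()] onto integers 0..255."""
    lo, hi = float(W.min()), float(W.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((W - lo) / scale).astype(np.uint8)
    return q, lo, scale

def dequantize_8bit(q, lo, scale):
    """Recover an approximation of the original float weights."""
    return q.astype(np.float64) * scale + lo
```

Applying quantize_8bit to each factor returned by low_rank_compress would correspond to the combined "MF + 8-bit Quantification" row, which is where the roughly 2x (rank) times 4x (32-bit to 8-bit) size reduction comes from.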
the communication overhead, but also has no impact on the recommendation performance.

8) The comparison of Recall/Precision/F-Measure under compression. In addition to the comparison of RMSE, we also discuss the Recall@250, Precision@250 and F-Measure (α = 1) on these three clouds. The Recall and Precision are defined above; the F-Measure considers the Recall and Precision comprehensively:

F-Measure = (α² + 1) P R / (α² P + R)   (25)

where P is the precision, R is the recall, and α is a weighting variable. In Fig. 13, we can observe that the values of the three metrics under compression are close to the baseline. The results show that the weight compression strategy has little impact on system performance in terms of Recall, Precision and F-Measure.

VI. CONCLUSION

In this paper, we have proposed JointRec, a deep learning-based video recommendation framework for minority mobile users. To this end, we design a Dual-CPMF video recommendation model that can extract the unique latent features of users and videos from users' profiles and the descriptions of videos, thereby achieving more accurate recommendation performance. We then propose a federated recommendation strategy in which each distributed cloud trains a model on its local-area data and the clouds update their training weights cooperatively. Considering the heavy communication cost during weight updates, we develop a weight compression algorithm to reduce network bandwidth and communication overhead. We have evaluated the performance of JointRec on a real-world movie dataset. The experimental results demonstrate that JointRec can recommend accurate videos to both minority and majority users, while the efficient communication strategy achieves a 12.83x compression ratio over the baseline with no loss of performance.

In the future, we plan to investigate edge-end-cloud orchestrated distributed video recommendation, where the recommender system is deployed on edge servers in proximity to mobile users. In such a case, the data generated by mobile users can be processed locally on edge servers rather than being sent to remote clouds. How to handle the cooperative computing among them is a new challenge.

REFERENCES

[1] S. Zhang, L. Yao, A. Sun, and Y. Tay, "Deep learning based recommender system: A survey and new perspectives," ACM Computing Surveys (CSUR), vol. 52, no. 1, p. 5, 2019.
[2] T. Yang, H. Liang, N. Cheng, R. Deng, and X. Shen, "Efficient scheduling for video transmissions in maritime wireless communication networks," IEEE Transactions on Vehicular Technology, vol. 64, no. 9, pp. 4215–4229, 2014.
[3] X. Zhang, H. Chen, Y. Zhao, Z. Ma, Y. Xu, H. Huang, H. Yin, and D. O. Wu, "Improving cloud gaming experience through mobile edge computing," IEEE Wireless Communications, 2019.
[4] X. Zhang, H. Yin, D. O. Wu, G. Min, H. Huang, and Y. Zhang, "SSL: A surrogate-based method for large-scale statistical latency measurement," IEEE Transactions on Services Computing, 2017.
[5] R. Xing, Z. Su, N. Zhang, J. Luo, H. Pu, and Y. Peng, "Trust based intrusion detection and learning aided incentive mechanism for autonomous driving," IEEE Network, in press, 2019.
[6] Y. Wang, Z. Su, Q. Xu, T. Yang, and N. Zhang, "A novel charging scheme for electric vehicles with smart communities in vehicular networks," IEEE Transactions on Vehicular Technology, 2019.
[7] P. Covington, J. Adams, and E. Sargin, "Deep neural networks for YouTube recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 2016, pp. 191–198.
2327-4662 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Fig. 13. The comparison of (a) recall, (b) precision, and (c) F-Measure (α = 1) on Cloud 1, Cloud 2 and Cloud 3 (Baseline vs. MF + 8-bit Quantification).
[8] Z. Zhao, Q. Yang, H. Lu, T. Weninger, D. Cai, X. He, and Y. Zhuang, "Social-aware movie recommendation via multimodal network learning," IEEE Transactions on Multimedia, vol. 20, no. 2, pp. 430–440, 2017.
[9] H. Yin, X. Zhang, H. H. Liu, Y. Luo, C. Tian, S. Zhao, and F. Li, "Edge provisioning with flexible server placement," IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 4, pp. 1031–1045, 2016.
[10] Z. Su, Y. Hui, and T. H. Luan, "Distributed task allocation to enable collaborative autonomous driving with network softwarization," IEEE Journal on Selected Areas in Communications, vol. 36, no. 10, pp. 2175–2189, 2018.
[11] X. Zhang, H. Huang, H. Yin, D. O. Wu, G. Min, and Z. Ma, "Resource provisioning in the edge for iot applications with multi-level services," IEEE Internet of Things Journal, 2018.
[12] Y. Wang, Z. Su, and N. Zhang, "Bsis: Blockchain based secure incentive scheme for energy delivery in vehicular energy network," IEEE Transactions on Industrial Informatics, 2019.
[13] D. Zhang, Y. Qiao, L. She, R. Shen, J. Ren, and Y. Zhang, "Two time-scale resource management for green internet of things networks," IEEE Internet of Things Journal, vol. 6, no. 1, pp. 545–556, 2018.
[14] H. Wang, P. Shi, and Y. Zhang, "Jointcloud: A cross-cloud cooperation architecture for integrated internet service customization," in 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2017, pp. 1846–1855.
[15] T. Yang, Z. Zheng, H. Liang, R. Deng, N. Cheng, and X. Shen, "Green energy and content-aware data transmissions in maritime wireless communication networks," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 751–762, 2014.
[16] D. Zhang, L. Tan, J. Ren, M. K. Awad, S. Zhang, Y. Zhang, and P.-J. Wan, "Near-optimal and truthful online auction for computation offloading in green edge-computing systems," IEEE Transactions on Mobile Computing, 2019.
[17] D. Zhang, Z. Chen, M. K. Awad, N. Zhang, H. Zhou, and X. S. Shen, "Utility-optimal resource management and allocation algorithm for energy harvesting cognitive radio sensor networks," IEEE Journal on Selected Areas in Communications, vol. 34, no. 12, pp. 3552–3565, 2016.
[18] Q. Diao, M. Qiu, C.-Y. Wu, A. J. Smola, J. Jiang, and C. Wang, "Jointly modeling aspects, ratings and sentiments for movie recommendation (jmars)," in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2014, pp. 193–202.
[19] D. Kim, C. Park, J. Oh, S. Lee, and H. Yu, "Convolutional matrix factorization for document context-aware recommendation," in Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 2016, pp. 233–240.
[20] C.-Y. Wu, A. Ahmed, A. Beutel, A. J. Smola, and H. Jing, "Recurrent recommender networks," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM, 2017, pp. 495–503.
[21] D. Povey, X. Zhang, and S. Khudanpur, "Parallel training of deep neural networks with natural gradient and parameter averaging," arXiv preprint arXiv:1410.7455, 2014.
[22] Y. Arjevani and O. Shamir, "Communication complexity of distributed convex learning and optimization," in Advances in Neural Information Processing Systems, 2015, pp. 1756–1764.
[23] H. B. McMahan, E. Moore, D. Ramage, S. Hampson et al., "Communication-efficient learning of deep networks from decentralized data," arXiv preprint arXiv:1602.05629, 2016.
[24] S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, and K. Chan, "When edge meets learning: Adaptive control for resource-constrained distributed machine learning," arXiv preprint arXiv:1804.05271, 2018.
[25] X. Yue, H. Wang, W. Liu, W. Li, P. Shi, and X. Ouyang, "Jcdta: The data trading architecture design in jointcloud computing," in 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS), 2018, pp. 1–6.
[26] W. Chen, M. Ma, Y. Ye, Z. Zheng, and Y. Zhou, "Iot service based on jointcloud blockchain: The case study of smart traveling," in 2018 IEEE Symposium on Service-Oriented System Engineering (SOSE), 2018, pp. 216–221.
[27] X. Fu, H. Wang, P. Shi, Y. Fu, and Y. Wang, "Jcledger: A blockchain based distributed ledger for jointcloud computing," in 2017 IEEE 37th International Conference on Distributed Computing Systems Workshops (ICDCSW), 2017, pp. 289–293.
[28] J. Pennington, R. Socher, and C. Manning, "Glove: Global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543.
[29] D. Tang, B. Qin, Y. Yang, and Y. Yang, "User modeling with neural network for review rating prediction," in International Conference on Artificial Intelligence, 2015, pp. 1340–1346.
[30] Z. Wang, Y. Zhang, H. Chen, Z. Li, and F. Xia, "Deep user modeling for content-based event recommendation in event-based social networks," INFOCOM, 2018.
[31] Y. Gong, L. Liu, M. Yang, and L. Bourdev, "Compressing deep convolutional networks using vector quantization," arXiv preprint arXiv:1412.6115, 2014.
[32] Y. Lin, S. Han, H. Mao, Y. Wang, and W. J. Dally, "Deep gradient compression: Reducing the communication bandwidth for distributed training," arXiv preprint arXiv:1712.01887, 2017.
[33] R. Salakhutdinov and A. Mnih, "Probabilistic matrix factorization," in International Conference on Neural Information Processing Systems, 2007, pp. 1257–1264.
[34] C. Wang and D. M. Blei, "Collaborative topic modeling for recommending scientific articles," in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011, pp. 448–456.
[35] H. Wang, N. Wang, and D. Y. Yeung, "Collaborative deep learning for recommender systems," pp. 1235–1244, 2014.
[36] M. Elahi, Y. Deldjoo, F. Bakhshandegan Moghaddam, L. Cella, S. Cereda, and P. Cremonesi, "Exploring the semantic gap for movie recommendations," in Proceedings of the Eleventh ACM Conference on Recommender Systems. ACM, 2017, pp. 326–330.
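As a concrete illustration of the evaluation metrics discussed above, the following minimal Python sketch computes per-user Precision@K, Recall@K, and the F-Measure of Eq. (25). It is not the paper's code; the function names and the example inputs are ours, chosen only to make the definitions executable.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@K and Recall@K for a single user.

    recommended: ranked list of recommended item ids
    relevant:    the user's ground-truth relevant item ids
    """
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f_measure(p, r, alpha=1.0):
    """F-Measure = (alpha^2 + 1) * P * R / (alpha^2 * P + R), as in Eq. (25)."""
    denom = alpha ** 2 * p + r
    return (alpha ** 2 + 1) * p * r / denom if denom > 0 else 0.0


# Hypothetical example: top-5 recommendations vs. a ground-truth set.
p, r = precision_recall_at_k([3, 1, 7, 9, 2], [1, 2, 4, 8], k=5)
# Two hits (items 1 and 2): p = 2/5 = 0.4, r = 2/4 = 0.5
print(p, r, f_measure(p, r, alpha=1.0))
```

With α = 1 the F-Measure reduces to the usual harmonic mean 2PR/(P + R); larger α weights recall more heavily, matching the role of α as a weighting variable in Eq. (25).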