
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2019.2944889, IEEE Internet of Things Journal

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 1

JointRec: A Deep Learning-based Joint Cloud Video Recommendation Framework for Mobile IoTs

Sijing Duan, Deyu Zhang, Member, IEEE, Yanbo Wang, Lingxiang Li, Member, IEEE, and Yaoxue Zhang, Senior Member, IEEE

Abstract: In the era of the Internet of Things (IoTs), watching videos on mobile devices has become a popular application in our daily life. How to recommend videos to users is one of the most pressing problems for Internet Video Service Providers (IVSPs). To provide better recommendation service to users, they deploy cloud servers in a geo-distributed manner. Each server is responsible for analyzing the user data of a local area. Therefore, these cloud servers form information islands, and the data are non-independent and identically distributed (non-i.i.d.). In this scenario, it is difficult to provide accurate video recommendation service to the minority of users in each area. To tackle this issue, we propose JointRec, a deep learning-based joint cloud video recommendation framework. JointRec integrates the JointCloud architecture into mobile IoTs and achieves federated training among distributed cloud servers. Specifically, we first design a Dual-Convolutional Probabilistic Matrix Factorization (Dual-CPMF) model to conduct video recommendation. Based on this model, each cloud can recommend videos by exploiting the users' profiles and the descriptions of the videos that users rate, thereby providing more accurate video recommendation services. Then we present a federated recommendation algorithm that enables the clouds to share their weights and train a model cooperatively. Furthermore, considering the heavy communication costs of federated training, we combine low-rank matrix factorization and 8-bit quantization to reduce uplink communication costs and network bandwidth. We validate the proposed approach on a real-world dataset, and the experimental results indicate its effectiveness.

Index Terms: JointCloud Computing, Mobile Internet of Things, Deep Learning, Video Recommendation System, Non-IID data setting, Federated Training.

Copyright (c) 2019 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.
2327-4662 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Sijing Duan, Deyu Zhang, Yanbo Wang, Lingxiang Li and Yaoxue Zhang are with the School of Computer Science and Engineering, Central South University, Changsha, 410083, China.
E-mails: {dsjyfd012, zdy876, wangyb, lingxiang.li, zyx}@csu.edu.cn.
Corresponding author: Deyu Zhang

I. INTRODUCTION

Nowadays, the recommendation system has become an effective way to deal with information overload, especially in the ever-growing multimedia applications [1]–[6]. As one kind of online multimedia application, video services have received great attention from Internet Video Service Providers (IVSPs). They aim at providing the most relevant videos to target audiences, e.g., Youtube, Netflix^1, IMDB^2, Movielens^3 and iQIYI [7], [8]. In the era of the Internet of Things (IoTs), watching videos on mobile devices has become a popular application in our daily life. In order to collect user data, the IVSPs deploy distributed cloud servers in different places [9], [10]. Each server is responsible for storing and analyzing the data generated by users who are located in specific areas [11], [12], so the user data from different places are non-i.i.d. Therefore, it is hard to recommend videos to the minority of users precisely. For example, as shown in Fig. 1, there are four cloud servers distributed in different areas. Bob is a ten-year-old child who lives in Area A, where adults account for the majority of users. These users prefer entertainment videos and soap operas. Peter is an old man who lives in Area C, where the majority of users are teenagers whose interests tend toward education videos. Due to the difference in user distribution between these two areas, the characteristics of the data they produce are quite different, forming information islands between them. If each server makes video recommendation decisions merely based on its local data, it will introduce information deviation, leading to imprecise recommendations. Furthermore, given the distributed nature of this scenario, the centralized method has the following inherent weaknesses. i) Single point of failure. The centralized server may fail due to attacks, which threatens the reliability of the system. ii) All data must be sent to the centralized server and processed centrally, leading to heavy cost and communication overhead for the centralized system [13]. iii) Considering the difference in data distribution, the centralized recommendation results cannot be representative of the preferences of individual local areas. Therefore, a distributed strategy is more suitable for such a scenario.

^1 https://en.wikipedia.org/wiki/Netflix.
^2 https://en.wikipedia.org/wiki/IMDb.
^3 http://grouplens.org/datasets/movielens/.

Recently, the emerging form of "shared global economy" requires cloud resources to be collaboratively exploited by distributed cloud providers in a geo-distributed manner [14]. Wang et al. [14] proposed JointCloud, a cross-cloud cooperation architecture for integrated internet service customization. With this architecture, the distributed cloud

servers are able to cooperate with each other to provide video recommendation service for the minority of users in each area. On the other hand, from the perspective of communication, it is a challenge to provide efficient communication between distributed clouds [15]–[17].

Fig. 1. The distributed video recommendation under Non-IID scenario. (Four cloud servers covering Areas A, B, C and D.)

In this paper, our motivation is to provide more accurate video recommendation services to the minority of users under the above-mentioned scenario and to achieve efficient communication. To reach this goal, several challenges must be addressed.

• How to recommend videos when the data distribution is non-i.i.d.? There has been much research on video recommendation, but hardly any studies focus on the impact of non-i.i.d. data distribution on video recommendation. Because the distribution of video-watching data in a specific area depends on the local users, no local dataset is representative of the global distribution. Therefore, the features of user data in different places are non-i.i.d. How to recommend videos under the non-i.i.d. scenario is a challenge.
• How to design a recommendation model that extracts features from non-linear user-video data under the non-i.i.d. scenario? In the non-i.i.d. case, user attributes (e.g., age, gender, occupation) are the key data for distinguishing different types of users. This requires the recommendation algorithm to adapt to the user attributes. Furthermore, much implicit information is contained in various non-linear user-video feedback data, e.g., ratings and reviews [18]. It is necessary to design a recommendation model that extracts deep-level features from these data.
• How to realize cooperation among distributed clouds? Each cloud is in charge of collecting data from the areas it covers, leading to information islands among these clouds. In this case, video recommendations computed by a single cloud from its local data will suffer from information deviation and will be inaccurate for the minority of users. Therefore, how to enable the cooperation of distributed clouds and realize information fusion is another challenge that remains to be solved.
• How to achieve efficient communication and reduce network cost? Distributed training requires significant communication bandwidth for frequent information exchange. Hence, how to reduce the communication cost is another challenge.

In this paper, we adopt the JointCloud architecture in mobile IoTs to recommend videos to the minority of mobile users when the data is non-i.i.d. in the distributed scenario. To improve the performance of recommendation, we propose a Dual-CPMF (Dual-Convolutional Probabilistic Matrix Factorization) model for video recommendation. This model characterizes the latent preferences of users by exploiting the users' profiles and the textual information of the videos that users have rated. Furthermore, we present a federated recommendation algorithm to enable information fusion among distributed clouds. Finally, we also propose an efficient communication strategy to reduce communication overhead while maintaining the performance of the system. In summary, the main contributions of this work include:

1) We propose to model user profiles and video properties by using convolutional neural networks (CNNs), motivated by the superiority of CNNs in extracting complex features. By using the textual information and ratings from users and videos, our model can extract the unique latent factors of users and videos. These latent factors are then used to predict ratings. Notably, unlike existing works that mainly focus on exploiting features from the item's perspective, we shift the focus to the user's perspective. Finally, we incorporate the video and user latent factors into the probabilistic matrix factorization (PMF) model to improve the recommendation accuracy.
2) We integrate the JointCloud architecture into the mobile IoTs scenario. By training the models in distributed clouds, this framework can provide accurate video recommendations to the minority of mobile users when the data distribution is non-i.i.d. Specifically, this process includes single-cloud training steps, where each cloud trains a recommendation model on its local data. It also includes global aggregation steps, where different clouds upload their training weights to an aggregator. The aggregator then aggregates all the parameters by taking a weighted average. After aggregation, it distributes the updated parameters to each single cloud for the next iteration.
3) We design a weight compression algorithm to decrease communication overhead during federated training. Low-rank matrix factorization and 8-bit quantization are combined and achieve a compression ratio of 12.83x without sacrificing accuracy.
4) We comprehensively evaluate the performance of the proposed methods via extensive experiments on a real-world dataset. The experimental results demonstrate that our proposed video recommendation model outperforms existing classic and state-of-the-art models. By federated training, our approach is more applicable


to the Non-IID scenario.

The remainder of this paper is organized as follows. Section II presents related works. In Section III, we describe the system architecture of JointRec and introduce the Dual-CPMF model for video recommendation in detail. The Federated Recommendation (FR) strategy and the efficient communication strategy are given in Section IV. The experimental results are shown in Section V and the conclusion is presented in Section VI.

II. RELATED WORK

A. Deep Learning-based Video Recommendation

Recently, deep learning techniques have been widely studied in video recommender systems. Covington et al. [7] design a two-stage recommendation approach using deep learning, which allows Youtube to make recommendations from a large video corpus. Kim et al. [19] propose a framework called ConvMF that integrates a CNN into PMF to capture the textual information of documents and enhance the rating prediction accuracy. Different from [19], which focuses on exploiting video content from the item's perspective, our work considers both user attributes and video properties, and applies a CNN to user documents to further capture the user preferences. Wu et al. [20] develop a novel recommender system based on recurrent neural networks that can accurately model the user and movie dynamics on Netflix and IMDB datasets. Zhao et al. [8] develop a heterogeneous movie recommendation model that exploits textual descriptions, user ratings and social relationships. However, these approaches are only suitable for traditional centralized recommendation systems, and they rarely pay attention to extracting the user profile features that are essential in distributed JointCloud video recommendation, where the data distribution is Non-IID.

B. Distributed Learning and JointCloud Computing

From the perspective of distributed learning, Povey et al. [21] study distributed training by iteratively averaging local training models. Arjevani et al. [22] investigate distributed machine learning based on gradient descent from a theoretical point of view. However, these works rely on the unrealistic assumption that the data at different nodes is independent and identically distributed (i.i.d.), whereas the more general non-i.i.d. case is hard to solve. Google [23] proposes a communication-efficient learning method for training on decentralized data called Federated Learning (FL). Each client has a local training dataset and computes an update to the current global model maintained by the server; the experiments demonstrate that it is robust to unbalanced and non-i.i.d. data distributions. However, the FL approach cannot be directly applied to our learning task, because in FL the data derives from individual mobile devices and is trained locally. Our work shifts the focus to cooperation among multiple clouds, where each cloud data center stores the user data of several specific regions and the clouds train jointly. Wang et al. [24] address the problem of learning model parameters from data distributed across multiple edge nodes, but they mainly focus on the theoretical analysis of distributed machine learning. The practical application among distributed nodes and joint clouds is not considered. Yue et al. [25] present the JointCloud Computing Data Trading Architecture (JCDTA), which is optimized to solve the data trading problem for cross-stakeholders in the JointCloud environment. Based on the JointCloud Blockchain, Chen et al. [26] achieve a privacy-protected, inter-cloud data fusing platform that satisfies the demand for data mining and analytic activities in IoT. Fu et al. [27] propose JCLedger, a blockchain-based distributed ledger for JointCloud Computing.

In contrast to the above works, our research in this paper emphasizes JointCloud computing in the Non-IID data scenario, where the user features and the data distribution vary among different cloud servers.

III. SYSTEM ARCHITECTURE AND DUAL-CPMF VIDEO RECOMMENDATION

We consider a distributed JointCloud video recommender system architecture where cloud servers are distributed in different places, as illustrated in Fig. 2. The video-watching data of users in these regions are collected and stored in the corresponding cloud server. Each cloud server trains a video recommendation model using its local data. Among these clouds, one is designated as the aggregator, which is responsible for aggregating the training parameter files. Once the other cloud servers receive the requests, they send their parameter files to the aggregator. The aggregator then returns the new parameter files to these cloud servers after some processing. Some key notations frequently used in this paper are summarized in Table I.

Fig. 2. The System Architecture of JointRec. (One aggregator cloud and several clouds, each serving its own mobile devices.)

In the following, we describe the Dual-CPMF model for video recommendation in detail. As shown in Fig. 3, the user document and the video document refer to the user's reviews of the videos and to the description and name of the videos, respectively. The user profile includes the user's attributes,


i.e., user id, gender, age and occupation. The video profile refers to the genres of the videos. The network structures for users and videos are similar: they use CNNs with word embedding to extract the features of users and videos from the textual information. Then the PMF model takes the convolved features as latent factors to predict the rating matrix. With the predicted rating matrix, we can conduct top-N video recommendations to users. $\lambda_U$, $\lambda_V$ and $\lambda_W$ are three constants that will be introduced in the following.

Table I
Key Notations

Notation                          Definition
$\vec{e}_i$                       The embedding vector of the i-th word
$E_i$                             The user word embedding matrix for user i
$l$                               The length of the document
$c_i^j$                           The j-th shared weight of user i
$m_p^i$                           The pooling feature vector for the document of user i
$n_f$                             The number of filters
$W_c$                             The weight matrix of the convolutional layer
$b_c$                             The bias of the convolutional layer
$T_i$, $S_j$                      The feature vectors of user and video
$X_i$, $Y_j$                      The raw input documents of user i and video j
$U$, $V$                          The latent factors of user and video
$\theta_i$, $\xi_j$               The Gaussian noise matrices of user i and video j
$I_{ij}$                          The indicator of whether the user rates the video
$\sigma_U$, $\sigma_V$, $\sigma_W$  The variances of the Gaussian noise matrices of U, V, W
$r_{ij}$                          The real rating value of user i for video j
$u_i$, $v_j$                      The updated values of the latent vectors of user i and video j
$W_1$, $W_2$                      All parameters in User-CNN and Video-CNN
$g_n(t)$                          The gradient of cloud n at time t
$w_n(t)$                          The weight file of cloud n at round t
$H_n^{a \times b}$                The weight update matrix of cloud n
$h_{max}$, $h_{min}$              The maximum and minimum values of $H_n^{a \times b}$

Fig. 3. The Dual-CPMF Model. (A User Network over the user document and user profile and a Video Network over the video profile and video document; each stacks embedding, convolutional, max pooling and fully connected layers with parameters $W_1$ and $W_2$, producing $u_i$ and $v_j$, which predict $r_{ij}$ under $\lambda_U$, $\lambda_V$ and $\lambda_W$.)

A. Word Embedding Layer

As shown in Fig. 4, the embedding layer transforms a raw document into a textual matrix in which each word is represented as a feature vector. In this paper, we adopt the pre-trained word embedding model GloVe [28]. Then, we have:

$$E_i = [\vec{e}_1 \,\|\, \vec{e}_2 \,\|\, \dots \,\|\, \vec{e}_l] \tag{1}$$

where $\vec{e}_i$ represents the embedding vector of the i-th word, $E_i$ denotes the user word embedding matrix for user i, and $l$ is the length of the document.

Fig. 4. Word embedding representation. (Example documents, such as "A little boy named andy love to be in his room play with his toy especially his doll named woody…", are mapped word by word to the rows of an embedding matrix.)

B. CNN Architecture

The CNN extracts the textual features. It includes three layers: a convolutional layer, a pooling layer and a fully-connected layer.

Convolutional Layer. The convolutional layer adopts filters to extract features of the documents [29]. Here, we set the width of these filters to the width of the input matrix, and then slide the filters over full rows of the input embedding matrix to form the feature map [30]. The new features after the convolution operation can be expressed as follows:

$$c_i^j = f(W_c^j \ast E_i + b_c^j) \tag{2}$$

where $W_c^j \in \mathbb{R}^{w \times s}$ is the j-th shared weight ($w$ is the filter size and $s$ is the dimension of the word embedding), $b_c^j$ is the bias, $\ast$ denotes the convolution operator and $f$ is a non-linear activation function. Among the common non-linear activation functions such as sigmoid, tanh and ReLU, we select ReLU to avoid the vanishing gradient problem.

Pooling Layer. After the convolutional layer, each filter generates a feature map. We then apply the max-pooling operation over the corresponding feature map. The pooling layer extracts the most prominent convolved feature of each feature map and assembles them into a fixed-size vector. The feature vector at this layer for user i can be formulated as follows:

$$m_p^i = [\max(c_i^1), \max(c_i^2), \dots, \max(c_i^{n_f})] \tag{3}$$

where $m_p^i$ is the pooling feature vector for the document of user i and $n_f$ is the number of filters.

Fully-connected Layer. In the fully-connected layer, the high-level features extracted by the convolutional and pooling layers are projected by using a nonlinear activation function:

$$T_i = \tanh(W_f \ast m_p^i + b_f) \tag{4}$$

where $W_f$ is the weight matrix of the fully-connected layer and $b_f$ is the bias.

Finally, we take the output of the fully-connected layer as our latent factor for probabilistic matrix factorization (PMF). Therefore, through the above processes, the CNN architecture can be viewed as a function that takes a textual document as input and returns feature vectors of users and videos as follows:

$$T_i = cnn_u(W_1, X_i) \tag{5}$$
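The pipeline of (2) to (4) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `conv_feature_map` and `text_cnn` are hypothetical, the embedding matrix and filters are toy values rather than GloVe vectors or trained weights, the fully-connected layer is taken to be an ordinary matrix-vector product, and the document is assumed to be at least as long as the filter height.

```python
import math


def conv_feature_map(E, W, bias):
    """Slide one filter W (w x s) over the rows of E (l x s), as in (2).

    Returns the l - w + 1 ReLU activations of the resulting feature map.
    Assumes len(E) >= len(W).
    """
    w = len(W)
    feature_map = []
    for start in range(len(E) - w + 1):
        z = bias
        for r in range(w):
            z += sum(W[r][c] * E[start + r][c] for c in range(len(W[r])))
        feature_map.append(max(0.0, z))  # ReLU
    return feature_map


def text_cnn(E, filters, biases, Wf, bf):
    """Map an embedded document E to a feature vector via (2), (3), (4)."""
    # (3): keep the strongest response of every filter.
    pooled = [max(conv_feature_map(E, W, b)) for W, b in zip(filters, biases)]
    # (4): fully-connected projection followed by tanh.
    return [
        math.tanh(sum(wk * m for wk, m in zip(row, pooled)) + b0)
        for row, b0 in zip(Wf, bf)
    ]
```

Because each filter spans the full embedding width, it only slides along the word axis, so a document of any length (at least the filter height) is reduced to a vector with one entry per filter before the projection.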


Similarly, for videos:

$$S_j = cnn_v(W_2, Y_j) \tag{6}$$

where $T_i$ and $S_j$ are the user and video feature vectors, respectively. $W_1$ and $W_2$ denote all the weight and bias variables. $X_i$ and $Y_j$ are the raw input documents of user i and video j.

Fig. 5. The convolutional layer. (Document; input layer with word embedding; convolutional layer with multiple filters; max pooling layer; fully-connected layer.)

C. Probabilistic Matrix Factorization

Then, we adopt the PMF model to generate the ratings. Assuming we have N users and M videos, the ratings are represented by a matrix $R \in \mathbb{R}^{N \times M}$. With the user and video feature vectors $T_i$ and $S_j$ from the CNN architecture, we further add two zero-mean spherical Gaussian noise variables with variance $\sigma^2$. Hence, the user latent factor $U_i$ for user i and the video latent factor $V_j$ for video j are given by:

$$U_i = T_i + \theta_i \tag{7}$$

$$V_j = S_j + \xi_j \tag{8}$$

where $\theta_i$ and $\xi_j$ are Gaussian noise matrices: $\theta_i$ follows a zero-mean Gaussian distribution with variance $\sigma_U^2$, and $\xi_j$ follows a zero-mean Gaussian distribution with variance $\sigma_V^2$. Namely, $\theta_i \sim \mathcal{N}(0, \sigma_U^2 I_{ij})$ and $\xi_j \sim \mathcal{N}(0, \sigma_V^2 I_{ij})$.

$$p(\theta \mid \sigma_U^2) = \prod_{i}^{N} \mathcal{N}(\theta_i \mid 0, \sigma_U^2 I_{ij}) \tag{9}$$

$$p(\xi \mid \sigma_V^2) = \prod_{j}^{M} \mathcal{N}(\xi_j \mid 0, \sigma_V^2 I_{ij}) \tag{10}$$

Furthermore, we denote $W_1, W_2 \sim \mathcal{N}(0, \sigma_W^2 I_{ij})$ for the training weights. As mentioned above, the conditional distributions over the user and video latent factors are given by:

$$p(U \mid W_1, X_i, \sigma_U^2) = \prod_{i}^{N} \mathcal{N}(u_i \mid T_i, \sigma_U^2 I_{ij}) \tag{11}$$

$$p(V \mid W_2, Y_j, \sigma_V^2) = \prod_{j}^{M} \mathcal{N}(v_j \mid S_j, \sigma_V^2 I_{ij}) \tag{12}$$

where $r_{ij}$ is the real rating value, $\mathcal{N}(x \mid 0, \sigma^2)$ is the Gaussian probability density function, and $I_{ij}$ is an indicator that equals 1 if user i rates video j and 0 otherwise.

In order to optimize the user and video latent variables together with the weight and bias variables of the CNNs, we use maximum a posteriori (MAP) estimation:

$$\max_{U,V,W_1,W_2} p(U, V, W_1, W_2 \mid R, X_i, Y_j, \sigma^2, \sigma_U^2, \sigma_V^2, \sigma_W^2) = \max_{U,V,W_1,W_2} \big[ p(R \mid U, V, \sigma^2)\, p(U \mid W_1, X_i, \sigma_U^2)\, p(V \mid W_2, Y_j, \sigma_V^2)\, p(W_1 \mid \sigma_W^2)\, p(W_2 \mid \sigma_W^2) \big] \tag{13}$$

D. Parameters Optimization

Given a training dataset, there are many parameters to be optimized in Dual-CPMF. We want to find the MAP estimate of $U, V, W_1, W_2$, then predict the missing values in the ratings R and use the predictions for recommendation. Maximizing the posterior probability is equivalent to minimizing the negative joint log-likelihood. By taking the negative logarithm of formula (13), it can be reformulated as follows:

$$L(U, V, W_1, W_2) = \sum_{i}^{N} \sum_{j}^{M} \frac{I_{ij}}{2} \| r_{ij} - u_i^T v_j \|^2 + \frac{\lambda_U}{2} \sum_{i}^{N} \| u_i \|^2 + \frac{\lambda_V}{2} \sum_{j}^{M} \| v_j \|^2 + \frac{\lambda_W}{2} \sum_{k} \| W_{1k} + W_{2k} \|^2 \tag{14}$$

where $\lambda_U = \sigma^2 / \sigma_U^2$, $\lambda_V = \sigma^2 / \sigma_V^2$ and $\lambda_W = \sigma^2 / \sigma_W^2$, and $u_i$ and $v_j$ denote the updated values of the user and video latent vectors.

In order to obtain the optimal values of U and V, we adopt the coordinate descent algorithm, which iteratively optimizes one latent variable while fixing the remaining variables during the training process. We first fix V (or U), $W_1$ and $W_2$, take the derivative of L with respect to U (or V) and set it to zero. Solving the corresponding equations leads to the following updating rules:

$$u_i \leftarrow (V I_i V^T + \lambda_U I_K)^{-1} (V R_i + \lambda_U T_i) \tag{15}$$

$$v_j \leftarrow (U I_j U^T + \lambda_V I_K)^{-1} (U R_j + \lambda_V S_j) \tag{16}$$
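For a single user, update (15) amounts to solving a K x K linear system built only from the videos that user has rated, because the diagonal indicator matrix selects the rated columns of V. The sketch below illustrates this in plain Python. It is an assumption-laden illustration rather than the authors' code: `solve` and `update_user` are hypothetical names, unrated videos are assumed to contribute nothing to either side of the system, and a small Gaussian elimination routine stands in for a linear algebra library.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def update_user(video_vecs, ratings, T_i, lam_u):
    """One coordinate-descent step for u_i, following (15).

    video_vecs: {j: v_j} latent vectors (length K) of the rated videos
    ratings:    {j: r_ij} ratings given by user i (rated videos only)
    T_i:        CNN feature vector of user i (length K)
    lam_u:      regularisation constant lambda_U (> 0)
    """
    K = len(T_i)
    # Left-hand side: sum_j v_j v_j^T over rated videos, plus lambda_U * I_K.
    A = [[lam_u if a == c else 0.0 for c in range(K)] for a in range(K)]
    # Right-hand side: sum_j r_ij v_j over rated videos, plus lambda_U * T_i.
    rhs = [lam_u * t for t in T_i]
    for j, r in ratings.items():
        v = video_vecs[j]
        for a in range(K):
            rhs[a] += r * v[a]
            for c in range(K):
                A[a][c] += v[a] * v[c]
    return solve(A, rhs)
```

The video update (16) has the same shape with the roles of users and videos exchanged, so the same routine could be reused with user vectors and $S_j$ in place of video vectors and $T_i$.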


In (15) and (16), $I_i$ is a diagonal matrix whose diagonal elements are $I_{ij}$ $(i = 1, \dots, N)$, and $R_i$ is a vector with $(r_{ij})_{j=1}^{M}$ for user i. For video j, $I_j$ and $R_j$ are defined similarly. Given U and V, we can further optimize the parameters $W_1$ and $W_2$. However, the weight parameters cannot be optimized analytically as we do for U and V, because they are closely related to the features in the CNN architecture. Note that the loss function L can be interpreted as an error function with regularization terms. By fixing U and V, we have:

$$E(W_1, W_2) = \frac{\lambda_U}{2} \sum_{i}^{N} \| u_i - T_i \|^2 + \frac{\lambda_V}{2} \sum_{j}^{M} \| v_j - S_j \|^2 + \frac{\lambda_W}{2} \sum_{k}^{|w_k|} \| w_{1,k} + w_{2,k} \|^2 + \text{constant} \tag{17}$$

According to equation (17), we use the back-propagation algorithm to optimize $W_1$ and $W_2$, respectively.

The parameters are updated iteratively until convergence. With the optimized $U, V, W_1, W_2$, we can calculate the unknown values of the ratings R and predict the latent preferences:

$$\hat{r}_{ij} \approx u_i^T v_j = (T_i + \theta_i)^T (S_j + \xi_j) \tag{18}$$

Recall that $T_i$ and $S_j$ are the feature vectors extracted by the CNN architecture, and $\theta_i$ and $\xi_j$ stand for the Gaussian noise matrices for user i and video j. With respect to the cold-start problem, we can use the predicted user-video matrix to conduct video recommendations.

Now we obtain the trained model parameters for video recommendation in each cloud. In the next section, we will study the federated recommendation strategy among distributed clouds.

IV. FEDERATED RECOMMENDATION AND EFFICIENT COMMUNICATION STRATEGY

A. Federated Recommendation Strategy

We present a distributed gradient-descent algorithm for the federated recommendation strategy. Suppose that there are N clouds distributed in different regions, with local datasets of size $s_1, \dots, s_n, \dots, s_N$. Each cloud n performs a single batch gradient calculation per communication round t. Specifically, at t = 0, all clouds download the same initial parameters from the aggregator, then each computes the gradient $g_n(t) = \nabla L_n(w_n(t))$, $(t = 0, 1, 2, \dots)$ on its local dataset, where $w_n(t)$ is the weight file of cloud n at round t. We refer to this step as the local update. After one or more rounds of local training, the aggregator gathers these updates and computes the parameters for the next iteration by applying a weighted average of all resulting models. We define this process as the global update. For each cloud n, the local update rule is defined as:

$$w_n(t + 1) = w_n(t) - \eta g_n(t) \tag{19}$$

where $L_n$ is the loss function of cloud n and $\eta > 0$ is the learning rate. Furthermore, the global update on the aggregator is defined as follows:

$$w(t + 1) = \sum_{n=1}^{N} \frac{d_n}{d} w_n(t + 1) \tag{20}$$

where $d = \sum_{n=1}^{N} d_n$.

The federated recommendation (FR) strategy is presented in Algorithm 1; the compress and decompress functions will be explained in Section IV-B.

Algorithm 1: Federated Recommendation Strategy
Input: The N clouds are indexed by n; the weight file $w_n$ of each cloud n; the amount of mobile devices $d_n$ in each cloud; the local minibatch size B.
{Aggregator};
Initialize $w_0$;
for each round t = 1, 2, ... do
    Compress($w_0$);
    for each cloud n in parallel do
        stransfer(ip, remote, local, usr, psd);
        request(url);
    for each $w_n(t)$ do
        Decompress($w_n(t)$);
        $w_n(t + 1)$, $d_n$, n ← Cloud(n, $w_n(t)$);
        response();
        d += $d_n$;
    $w(t + 1) \leftarrow \sum_{n=1}^{N} \frac{d_n}{d} w_n(t + 1)$;
{Single Cloud};
Cloud(n, $w_n(t)$):
    listen(port);
    Decompress($w_n(t)$);
    user loadweight($w_n(t)$);
    for each local epoch n from 1 to B do
        CNN Model():
        $w_n(t + 1) = w_n(t) - \eta \nabla L_n(w_n(t))$;
    ctransfer();
    Compress($w_n(t + 1)$);
    return $w_n(t + 1)$ to aggregator.

B. Efficient Communication Strategy

As mentioned above, each single cloud independently computes a weight update to the current model based on its local data and communicates the update to an aggregator cloud, where the updates from the single clouds are aggregated to compute a new global model. Therefore, communication efficiency is of the utmost importance. The goal of increasing the communication efficiency of FR is to reduce the cost of sending the weight file to the aggregator while learning from data stored on single clouds with limited internet connections. In this paper, we propose a weight compression strategy to reduce the uplink communication cost. We compress


the weight update files by using a combination of low-rank matrix factorization (MF) [31] and 8-bit probabilistic quantization [32] before sending them to the aggregator, and then decompress the files before global training.

Low Rank Matrix Factorization: Suppose the weight update matrix of cloud n is $H_n^{a\times b}$, $a \le b$. We express $H_n^{a\times b}$ as the product of two matrices: $H_n^{a\times b} = P_n^{a\times k} Q_n^{k\times b}$, $k = b/N$, where N is a positive integer which affects the compression performance. After matrix factorization, the number of elements is reduced to a fraction $\rho$ of the original. $\rho$ is the compression ratio, where $\rho = (k \times (a + b))/(a \times b)$, $\rho \in [\frac{1}{N}, \frac{2}{N}]$.

8-bit Quantization: After the low-rank matrix factorization, we further compress the updates by 8-bit quantization. Let $h_{max}$ and $h_{min}$ be the maximum and minimum values of the weight matrix $H_n^{a\times b}$. Firstly, we equally divide $[h_{min}, h_{max}]$ into $2^M$ intervals. The length of each interval is $l_{in}$. For any value h in the matrix $H_n^{a\times b}$, we calculate the position pos of h in the interval, $pos = \lfloor h / l_{in} \rfloor$. Finally, we transform pos into an 8-bit value. The weight file can thus be compressed by $\frac{32}{M}\times$.

The aggregator decompresses the weight file when receiving the compressed update from each single cloud. The decompression is the reverse process of compression. The weight compression algorithm is presented in Algorithm 2.

V. EXPERIMENTATION RESULTS

In this section, we evaluate the performance of our approach on a real-world dataset from four perspectives: 1) the accuracy of rating predictions compared with competitors, 2) the parameter analysis, 3) the performance of the JointRec system, 4) the effectiveness of the efficient communication strategy.

A. Dataset Description

We evaluate the performance on MovieLens. It is an extensively used dataset for movie recommender systems. It is composed of users' explicit ratings on the movies, on a scale from 1 to 5. Furthermore, we only select the Movielens-1M dataset because it provides some additional information besides ratings. It contains attribute information of each user, i.e., age, occupation and gender, as well as the movies' genres. These features are significant to the study of distributed JointRec: we can construct the Non-IID data scenario for different regions according to users' age. (Note that we are unable to use the Movielens-10M and Movielens-20M datasets because they do not contain this information, and thus we could not exploit the differences in user profiles among areas.) Furthermore, we enrich this dataset with a plot summary for each movie, provided by IMDB (footnote 4). Table II shows the statistics of the dataset.
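The weight compression scheme of Section IV-B (low-rank factorization followed by 8-bit quantization, summarized in Algorithm 2) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions made here for illustration: all function names are hypothetical; the rank follows Algorithm 2 (k = a/N, rounded down), which for a ≤ b gives an element ratio inside the stated range [1/N, 2/N], whereas the running text writes k = b/N; a truncated SVD stands in for the factorization step, whereas Algorithm 2 starts from a random initialization matrix; and values are shifted by the minimum before binning so that every position fits in one byte.

```python
import numpy as np

def low_rank_factorize(H, N):
    """Approximate H (a x b) by P (a x k) @ Q (k x b) with k = a // N."""
    a, b = H.shape
    k = max(1, a // N)
    # Assumption: truncated SVD is used as a stand-in factorization.
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    P = U[:, :k] * s[:k]   # scale each kept left singular vector
    Q = Vt[:k, :]
    return P, Q

def quantize(X, M=8):
    """Map every entry of X to the index of one of 2**M equal intervals."""
    if not 1 <= M <= 8:
        raise ValueError("this sketch stores positions in one byte, so M <= 8")
    x_min, x_max = float(X.min()), float(X.max())
    l_in = (x_max - x_min) / (2 ** M)
    if l_in == 0.0:        # constant matrix: any positive length works
        l_in = 1.0
    # Assumption: subtract x_min so that positions start at 0.
    pos = np.floor((X - x_min) / l_in)
    pos = np.clip(pos, 0, 2 ** M - 1).astype(np.uint8)
    return pos, x_min, l_in

def dequantize(pos, x_min, l_in):
    """Reverse step: return the centre of the interval each entry fell into."""
    return x_min + (pos.astype(np.float32) + 0.5) * l_in

def compress(H, N=2, M=8):
    """Factorize, then quantize both factors."""
    P, Q = low_rank_factorize(H, N)
    return quantize(P, M), quantize(Q, M)

def decompress(packed):
    """Dequantize both factors and multiply them back together."""
    (p_pos, p_min, p_l), (q_pos, q_min, q_l) = packed
    return dequantize(p_pos, p_min, p_l) @ dequantize(q_pos, q_min, q_l)

def element_ratio(a, b, N):
    """rho = k (a + b) / (a b), the fraction of elements kept."""
    k = max(1, a // N)
    return k * (a + b) / (a * b)
```

For a 64 x 128 matrix and N = 2, `element_ratio(64, 128, 2)` evaluates to 0.75, which lies in [1/N, 2/N] = [0.5, 1]. Storing one byte per entry instead of a 32-bit float corresponds to the 32/M factor with M = 8.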
Algorithm 2: Weight Compression Algorithm

Input: the weight file $w_n$, N, M = 8
Output: the compressed $U_n^{a\times k}$, $V_n^{k\times b}$
for each $H_n^{a\times b}$ from every layer of $w_n$ do
    $h_{max}$ = Max($H_n^{a\times b}$);
    $h_{min}$ = Min($H_n^{a\times b}$);
    $l_{in}$ = ($h_{max}$ - $h_{min}$) / $2^M$;
    if a > 1 then
        // Low Rank Matrix Factorization
        k = a / N;
        $U_n^{a\times k}$ = Random initialization matrix;
        $V_n^{k\times b}$ = $U_n^{k\times a} H_n^{a\times b}$;
        // M-bit Quantization
        for each u in $U_n^{a\times k}$ do
            j = $\lfloor u / l_{in} \rfloor$;
            u = BYTES(j);
        for each v in $V_n^{k\times b}$ do
            j = $\lfloor v / l_{in} \rfloor$;
            v = BYTES(j);
    else
        $U_n^{a\times k} \leftarrow H_n^{a\times b}$;
        for each h in $U_n^{a\times k}$ do
            j = $\lfloor h / l_{in} \rfloor$;
            h = BYTES(j);
return $U_n^{a\times k}$, $V_n^{k\times b}$;

Table II
The statistics of Movielens-1M Dataset

#users    #videos    #ratings    density
6040      3544       993482      4.641%

Footnote 4: http://www.imdb.com/.

B. Setup

We conduct our experiments on a networked prototype system with three cloud servers. They are distributed in three different places and interconnected via the HTTP protocol. Each of these three distributed clouds holds a local IoT dataset. In addition, we assign one server to be the aggregator. The cloud servers run on three Linux workstations with Intel(R) Xeon(R) CPU E5-2683 v3@2.00GHz, Nvidia GTX TITAN X and 32 GB memory.

To investigate the JointRec system, we first partition the dataset into three age groups, i.e., G1 group (sum=876 users): the elders over 50 years old, G2 group (sum=1325 users): the teenagers under 24 years old, G3 group (sum=3839 users): the users between age of 20 to 50. The G1 group is mainly placed in Cloud 1, the G2 group in Cloud 2, and the G3 group in Cloud 3. Then we select some users from each group and add them to the other clouds. In this case, there are two types of data on each cloud server: majority user group data and minority user group data, which makes the data on each cloud non-i.i.d. We make comparisons under three different


cases. In Case 1, each cloud server trains a model on its local data and then conducts federated training according to our approach. In Case 2, each cloud server performs training locally, but the servers do not interact with each other. In Case 3, all the data are trained centrally in one cloud server. In the experiment, we test two data splits. Table III shows the data distribution on each cloud.

Table III
The data distribution on each cloud

Cloud      Users    Proportion    Videos    Majority    Minority
Distribution 1
Cloud 1    898      0.149         3286      865         33
Cloud 2    1337     0.221         3325      1308        29
Cloud 3    3805     0.630         3470      3791        14
Distribution 2
Cloud 1    726      0.120         3354      626         100
Cloud 2    1325     0.220         3433      1125        200
Cloud 3    3989     0.660         3649      3689        300

C. Preprocessing and Parameters Setting

We preprocess the dataset as follows: (1) converting the users' age, genders, occupations and movie genres into equal length arrays, (2) constructing user-plot data according to users' ratings, (3) calculating the tf-idf score for each word in user-plot and movie-plot data, (4) selecting the top 8000 distinct words as a vocabulary, (5) splitting the dataset into the training set (80%), the validation set (10%) and the test set (10%). Additionally, we remove the items which do not have plot documents in the whole dataset and the users that have fewer than 3 ratings [19].

We set the initial size of the latent dimension of U and V to 50. The value of $\lambda_U$ is 100 and $\lambda_V$ is 10. In the CNN model, 1) we use Adam as the optimizer and each mini-batch consists of 128 training samples, 2) we use three different filters to extract features, 3) we set the dropout rate to 0.1 to reduce overfitting.

D. Competitors

We compare our proposed model Dual-CPMF with four competitors: PMF, CTR, CDL and ConvMF, which are representative works on extending matrix factorization with content information in recommender systems. PMF [33] is a basic rating prediction model that only uses ratings. CTR [34] (Collaborative Topic Regression) combines PMF and LDA to use ratings and documents. CDL [35] (Collaborative Deep Learning) uses both SDAE and PMF to improve accuracy. ConvMF [19] (Convolutional PMF) is a context-aware recommendation model that integrates CNN into PMF.

E. Evaluation Metrics

Root mean squared error (RMSE) is an extensively used measurement for the performance evaluation of rating predictions. A smaller value indicates a better performance of predictions [36].

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i,j}^{N,M} (r_{ij} - \hat{r}_{ij})^2} \qquad (21)$$

where $\hat{r}_{ij}$ is the predicted value and n is the number of ratings. As for the top-K recommendations, Recall@K (R@K) and Precision@K (P@K) are selected to evaluate the recommendation accuracy.

$$R@K = \frac{|R(i) \cap T(i)|}{|T(i)|} \qquad (22)$$

$$P@K = \frac{|R(i) \cap T(i)|}{|R(i)|} \qquad (23)$$

where R(i) is the set of top-K videos in the recommended list for user i, T(i) is the set of items that user i likes, and $|\cdot|$ indicates the number of elements in the set. R@K is the fraction of the items that user i likes which are returned in the top-K list. P@K is the fraction of the recommended items in the top-K list that are accurate.

F. Experimental Result

The experimental results consist of two aspects: 1) For the centralized training, we mainly evaluate the performance of our proposed video recommendation model. 2) For the distributed training, we compare the performance under different cases. They include federated training among distributed clouds, single cloud training on the local dataset without information fusion, and the centralized baseline.

1) Prediction Accuracy Evaluation. We first compare the results of rating prediction and evaluate the prediction accuracy of our model. Table IV shows the RMSE results of our approach and other models. We can observe that Dual-CPMF outperforms the other methods, which supports the effectiveness of the user feature extraction for model fitting. Note that "Improve" indicates the degree of improvement of Dual-CPMF over each of these methods. The results indicate that the users' profiles play an important role in the recommendation model, and extracting features from the user perspective can model users' preferences better. Therefore, our proposed model is more suitable for video recommendation.

Table IV
Overall comparison results of RMSE

Model      PMF       CTR       CDL       ConvMF    Dual-CPMF
Value      0.8971    0.8969    0.8879    0.8595    0.8472
Improve    5.56%     5.54%     4.58%     1.43%

2) The Impact of Various Sparseness of the Dataset. As shown in Table V, we compare the RMSE on different sparsenesses of the dataset. Because ConvMF outperforms CTR and CDL, we only select PMF and ConvMF for comparison. Our approach performs better than both of them over all ranges of sparseness.
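The metrics of Section V-E (Eqs. 21 to 23) and the "Improve" row of Table IV can be made concrete with a short sketch. The function names are hypothetical, and the improvement is assumed here to be the relative RMSE reduction of Dual-CPMF over each competitor; the paper does not state the formula, but this reading reproduces the percentages reported in Table IV.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error over paired ratings, as in Eq. (21)."""
    n = len(actual)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(actual, predicted)) / n)

def recall_at_k(ranked, liked, k):
    """|R(i) ∩ T(i)| / |T(i)|, where R(i) is the top-k of `ranked` (Eq. 22)."""
    top_k = set(ranked[:k])
    return len(top_k & set(liked)) / len(liked)

def precision_at_k(ranked, liked, k):
    """|R(i) ∩ T(i)| / |R(i)|, where R(i) is the top-k of `ranked` (Eq. 23)."""
    top_k = set(ranked[:k])
    return len(top_k & set(liked)) / len(top_k)

def relative_improvement(baseline_rmse, model_rmse):
    """Assumed definition of "Improve": relative RMSE reduction in percent."""
    return (baseline_rmse - model_rmse) / baseline_rmse * 100.0

# RMSE values copied from Table IV.
table_iv = {"PMF": 0.8971, "CTR": 0.8969, "CDL": 0.8879, "ConvMF": 0.8595}
dual_cpmf = 0.8472
improve = {name: round(relative_improvement(value, dual_cpmf), 2)
           for name, value in table_iv.items()}
```

With the Table IV values, `improve` comes out as 5.56, 5.54, 4.58 and 1.43 percent for PMF, CTR, CDL and ConvMF respectively, matching the reported row.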


Fig. 6. Parameter Analysis of $\lambda_U$ and $\lambda_V$.

We note that the improvement of Dual-CPMF over ConvMF goes from 2.09% to 1.43% when the data density increases from 0.93% to 3.71%. It indicates that Dual-CPMF can obtain more accurate predictions when more ratings are provided.

Table V
The RMSE over various sparseness of training data on the dataset

               Ratio of training set to the entire dataset (density)
Model          20% (0.93%)    40% (1.86%)    60% (2.78%)    80% (3.71%)
PMF            1.1137         0.9964         0.9251         0.9097
ConvMF         0.9908         0.9361         0.8822         0.8595
Dual-CPMF      0.9700         0.9143         0.8596         0.8472
Improvement    2.09%          2.32%          2.56%          1.43%

3) Parameter Analysis. To investigate the impact of the parameters on the performance of Dual-CPMF, we conduct parameter analysis on $\lambda_U$ and $\lambda_V$. We consider five values of $\lambda_U$ and $\lambda_V$, i.e., 0.1, 1, 10, 50, 100 in this experiment. Fig. 6 shows the RMSE changes of Dual-CPMF under different values of $\lambda_U$ and $\lambda_V$ on the dataset. $\lambda_U$ implies how $T_i$ affects the user latent factor U and $\lambda_V$ indicates how $S_j$ affects the video latent factor V. Note that the prediction error is high when both parameters approach 0, while the value decreases when $\lambda_U$ and $\lambda_V$ increase. It indicates that increasing the weight of the textual information of users and videos is beneficial for the accuracy of rating prediction. We can also see that the best performance of recommendation is achieved when ($\lambda_U$, $\lambda_V$) takes the value (100, 10), the "lowest point" (RMSE=0.8472) in the figure. It implies that the influence of the users' perspective on the prediction accuracy is greater than that of the videos' perspective. However, when $\lambda_U$ exceeds a certain threshold, the accuracy begins to decrease; it is likely that putting too much emphasis on the user information has a side effect. Furthermore, we note that the ideal combination of the two parameters is the same for ConvMF and Dual-CPMF, but we achieve a smaller RMSE than the ConvMF model. It indicates that the Dual-CNN structure and the feature extraction of user profiles are more suitable for matrix factorization, and they are helpful for building a more effective video recommender system.

In addition, we compare the influence of the dimension D of the latent factor on the performance of Dual-CPMF and other competitors. These latent factors contain characteristics of users and movies. A higher value of D implies richer information. As shown in Table VI, Dual-CPMF outperforms all other models in all cases. Furthermore, we can observe that the performance of the models generally improves when the factor dimensionality D grows, which indicates that more textual information brings better prediction accuracy.

Table VI
The impact of latent dimension D

               Latent Dimension D
Model          10        20        50        100
PMF            0.8685    0.8665    0.8683    0.8646
ConvMF         0.8863    0.8770    0.8566    0.8554
Dual-CPMF      0.8568    0.8526    0.8489    0.8474

4) Performance Evaluation of Distributed Video Recommendation System. In Fig. 7 and Fig. 8, we concentrate on performance evaluation in different cases. In the figures, the bar in a box is the average recall of the model. The upper and bottom borders of a box represent the 75th percentile and the 25th percentile. The tips of the upper and bottom whiskers represent the max and min values. Fig. 7 shows the Recall@250 comparison for the minority user groups under two data distributions. The recall of our method (Case 1) is higher than Case 2 in all clouds on average, which supports our intuition that federated training and parameter fusion can improve the performance in the distributed video recommendation system. Although the average recall of Case 1 is slightly lower than Case 3, with the increasing number of users, the recall of our method approaches that of Case 3, and is sometimes even better than Case 3. This is probably because the distributed fusion training can make use of the computation resources of multiple clouds. Furthermore, we can see that the recall of Case 1 shows an increasing trend from Cloud 1 to Cloud 3. This is because the number of users increases across these three clouds, and more users bring better recommendation performance. Fig. 8 shows the Recall@250 comparison for the majority user groups under two data distributions. As shown in the figure, the results of the majority user groups are similar to those of the minority user groups in Fig. 7.

Fig. 9 shows the comparison of different Recall@K for the minority user groups in different cases. As shown in the figure, as the value of K increases from 50 to 200, the recall of all cases in Cloud 1 to Cloud 3 increases, and our method (Case 1) outperforms the one in Case 2. Although the performance of Case 3 is better than Case 1 in Cloud 1, as the number of users increases in Cloud 2 and Cloud 3, the value of recall in Case 1 approaches that of Case 3.

5) The comparison for IID Case and Non-IID Case. To evaluate the performance of our approach in the i.i.d and


Fig. 7. Recall@250 comparison for the minority user groups under two data distribution. (a) Distribution 1. (b) Distribution 2. [Plots of Recall@250 for Case 1 (C1), Case 2 (C2) and Case 3 (C3) on Cloud 1, Cloud 2 and Cloud 3.]

Fig. 8. Recall@250 comparison for the majority user groups under two data distribution. (a) Distribution 1. (b) Distribution 2. [Plots of Recall@250 for Case 1 (C1), Case 2 (C2) and Case 3 (C3) on Cloud 1, Cloud 2 and Cloud 3.]

Fig. 10. The RMSE for IID Case and Non-IID Case. [Curves of Case 1 RMSE versus communication rounds for Cloud 3, Cloud 2 and Cloud 1, IID Case and Non-IID Case.]

Fig. 11. The Recall@250 comparison for IID and Non-IID case under two data distribution. (a) Distribution 1. (b) Distribution 2. [Recall@250 for Case 1, Case 2 and Case 3, IID Case and Non-IID Case.]

non-i.i.d scenarios, we compare RMSE and Recall@250 in Fig. 10 and Fig. 11 respectively. In the IID case, we select the same percentage of users from each age group according to the proportion in Table III and assign them to each cloud. In this case, the age distribution among the clouds is i.i.d. The number of users in each cloud is equal under the IID and Non-IID cases.

As shown in Fig. 10, we can observe that the RMSE under the Non-IID case is lower than under the IID case, indicating that our approach in the Non-IID case achieves better prediction accuracy than in the IID case. In addition, we compare the recall under two data distributions in Fig. 11. We can see that the recall of the IID case under Case 1 is higher than that of the Non-IID case. However, the results in Case 2 and Case 3 are similar. This is because the clouds conduct distributed fusion and federated training in Case 1, whereas Case 2 and Case 3 do not adopt federated training and weight fusion, so their recalls are similar under the IID case and the Non-IID case. This suggests that our method is applicable to the Non-IID scenario.

6) The comparison of compression ratio. We evaluate the reduction in network bandwidth using the weight compression ratio, defined as follows:

$$\text{Compression Ratio} = \frac{\mathrm{size}[H_n^{a\times b}]}{\mathrm{size}[\mathrm{compress}(H_n^{a\times b})]} \qquad (24)$$

Table VII shows the results of RMSE and compression ratio on Cloud 1-3 for several compression methods. The matrix factorization, 8-bit quantization, and the combination compression method give 1.95x, 3.98x, and 12.83x compression over the baseline with almost no increase of RMSE.
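As a small worked check of Eq. (24), the ratios quoted above can be recomputed from the model sizes listed in Table VII. The sketch below only does that arithmetic; the function name is hypothetical, and the observation that the reported figures correspond to truncation (rather than rounding) to two decimals is an inference from the numbers, not something the paper states.

```python
import math

def compression_ratio(original_size, compressed_size):
    """Eq. (24): size of the original file over size of the compressed one."""
    return original_size / compressed_size

# Model sizes in MB, copied from Table VII.
baseline_mb = 85.58
sizes_mb = {
    "Matrix Factorization": 43.7,
    "8-bit Quantification": 21.46,
    "MF + 8-bit Quantification": 6.67,
}

ratios = {name: compression_ratio(baseline_mb, size)
          for name, size in sizes_mb.items()}

# Truncate to two decimals for comparison with the reported values.
truncated = {name: math.floor(r * 100) / 100 for name, r in ratios.items()}
```

The 8-bit result is close to the 32/M = 4x factor expected when 32-bit values are stored in 8 bits, and the combined ratio is larger than either single method.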
Fig. 9. Recall@K comparison for the minority user groups under different cases. [Panels for Cloud 1, Cloud 2 and Cloud 3; x-axis: K from 50 to 200; y-axis: minority Recall@K; curves for Case 1, Case 2 and Case 3.]

7) The comparison of RMSE under compression. In Fig. 12, we compare the three types of compression method introduced in Section IV-B. Fig. 12(a), Fig. 12(b) and Fig. 12(c) show the RMSE convergence curves on Cloud 1, Cloud 2 and Cloud 3. As we can see in Fig. 12(a), the curve of the combination compression method lies at the bottom and the baseline curve at the top. It indicates that the combination compression method performs better than the others on Cloud 1, whereas on Cloud 2 and Cloud 3, shown in Fig. 12(b) and Fig. 12(c), the curves of the three compression methods are much closer to the baseline. This is caused by the different distribution of users on the three clouds. Therefore, we can conclude that weight compression not only reduces


Table VII
Comparison of compression ratio

                               RMSE
Compression Method             Cloud 1    Cloud 2    Cloud 3    Model Size    Compression Ratio
Baseline                       0.8885     0.9285     0.8547     85.58MB       1x
Matrix Factorization           0.8854     0.9283     0.8543     43.7MB        1.95x
8-bit Quantification           0.8873     0.9368     0.8541     21.46MB       3.98x
MF + 8-bit Quantification      0.8804     0.9281     0.8545     6.67MB        12.83x

Fig. 12. Comparison of RMSE with matrix factorization, 8-bit quantization and combination. (a) The RMSE of Cloud 1. (b) The RMSE of Cloud 2. (c) The RMSE of Cloud 3. [Curves of RMSE versus rounds for Baseline, MF, 8-bit Quantification and MF + 8-bit Quantification.]

the communication overhead, but also has no noticeable impact on the recommendation performance.

8) The comparison of Recall/Precision/F-Measure under the compression. In addition to the comparison of RMSE, we also discuss the Recall@250, Precision@250 and F-Measure ($\alpha = 1$) on these three clouds. The Recall and Precision are defined above. The F-Measure considers the Recall and Precision comprehensively.

$$\text{F-Measure} = \frac{(\alpha^2 + 1) P R}{\alpha^2 P + R} \qquad (25)$$

where P is the precision, R is the recall, and $\alpha$ is a weighting parameter. In Fig. 13, we can observe that the values of the three metrics under compression are close to the baseline. The results show that the weight compression strategy has little impact on system performance in terms of Recall, Precision and F-Measure.

VI. CONCLUSION

In this paper, we have proposed JointRec, a deep learning-based video recommendation framework for minority mobile users. To this end, we design a Dual-CPMF video recommendation model. This model can extract the unique latent features of users and videos from the users' profiles and the descriptions of videos, thereby achieving more accurate recommendation performance. Then we propose a federated recommendation strategy in which each distributed cloud trains a model based on local area data and updates training weights cooperatively. Considering the heavy communication cost during weight update, we develop a weight compression algorithm to reduce network bandwidth and communication overhead. We have evaluated the performance of JointRec on a real-world movie dataset. The experimental results demonstrate that JointRec can recommend accurate videos to both minority and majority users, while the efficient communication strategy can achieve a 12.83x compression ratio over the baseline with no loss of performance.

In the future, we plan to investigate edge-end-cloud orchestrated distributed video recommendation, where the recommender system is deployed in the edge server in proximity to mobile users. In such a case, the data generated by mobile users can be directly processed locally on edge servers rather than being sent to the remote clouds. How to deal with the cooperative computing among them is a new challenge.

REFERENCES

[1] S. Zhang, L. Yao, A. Sun, and Y. Tay, "Deep learning based recommender system: A survey and new perspectives," ACM Computing Surveys (CSUR), vol. 52, no. 1, p. 5, 2019.
[2] T. Yang, H. Liang, N. Cheng, R. Deng, and X. Shen, "Efficient scheduling for video transmissions in maritime wireless communication networks," IEEE Transactions on Vehicular Technology, vol. 64, no. 9, pp. 4215–4229, 2014.
[3] X. Zhang, H. Chen, Y. Zhao, Z. Ma, Y. Xu, H. Huang, H. Yin, and D. O. Wu, "Improving cloud gaming experience through mobile edge computing," IEEE Wireless Communications, 2019.
[4] X. Zhang, H. Yin, D. O. Wu, G. Min, H. Huang, and Y. Zhang, "Ssl: A surrogate-based method for large-scale statistical latency measurement," IEEE Transactions on Services Computing, 2017.
[5] R. Xing, Z. Su, N. Zhang, J. Luo, H. Pu, and Y. Peng, "Trust based intrusion detection and learning aided incentive mechanism for autonomous driving," IEEE Network, vol. in press, 2019.
[6] Y. Wang, Z. Su, Q. Xu, T. Yang, and N. Zhang, "A novel charging scheme for electric vehicles with smart communities in vehicular networks," IEEE Transactions on Vehicular Technology, 2019.
[7] P. Covington, J. Adams, and E. Sargin, "Deep neural networks for youtube recommendations," in Proceedings of the 10th ACM conference on recommender systems. ACM, 2016, pp. 191–198.


Fig. 13. Comparison of Recall@250, Precision@250, F-Measure ($\alpha = 1$). (a) The comparison of recall. (b) The comparison of precision. (c) The comparison of F-Measure ($\alpha = 1$). [Values for Baseline and MF + 8-bit Quantification on Cloud 1, Cloud 2 and Cloud 3.]

[8] Z. Zhao, Q. Yang, H. Lu, T. Weninger, D. Cai, X. He, and Y. Zhuang, "Social-aware movie recommendation via multimodal network learning," IEEE Transactions on Multimedia, vol. 20, no. 2, pp. 430–440, 2017.
[9] H. Yin, X. Zhang, H. H. Liu, Y. Luo, C. Tian, S. Zhao, and F. Li, "Edge provisioning with flexible server placement," IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 4, pp. 1031–1045, 2016.
[10] Z. Su, Y. Hui, and T. H. Luan, "Distributed task allocation to enable collaborative autonomous driving with network softwarization," IEEE Journal on Selected Areas in Communications, vol. 36, no. 10, pp. 2175–2189, 2018.
[11] X. Zhang, H. Huang, H. Yin, D. O. Wu, G. Min, and Z. Ma, "Resource provisioning in the edge for iot applications with multi-level services," IEEE Internet of Things Journal, 2018.
[12] Y. Wang, Z. Su, and N. Zhang, "Bsis: Blockchain based secure incentive scheme for energy delivery in vehicular energy network," IEEE Transactions on Industrial Informatics, 2019.
[13] D. Zhang, Y. Qiao, L. She, R. Shen, J. Ren, and Y. Zhang, "Two time-scale resource management for green internet of things networks," IEEE Internet of Things Journal, vol. 6, no. 1, pp. 545–556, 2018.
[14] H. Wang, P. Shi, and Y. Zhang, "Jointcloud: A cross-cloud cooperation architecture for integrated internet service customization," in 2017 IEEE 37th international conference on distributed computing systems (ICDCS). IEEE, 2017, pp. 1846–1855.
[15] T. Yang, Z. Zheng, H. Liang, R. Deng, N. Cheng, and X. Shen, "Green energy and content-aware data transmissions in maritime wireless communication networks," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 751–762, 2014.
[16] D. Zhang, L. Tan, J. Ren, M. K. Awad, S. Zhang, Y. Zhang, and P.-J. Wan, "Near-optimal and truthful online auction for computation offloading in green edge-computing systems," IEEE Transactions on Mobile Computing, 2019.
[17] D. Zhang, Z. Chen, M. K. Awad, N. Zhang, H. Zhou, and X. S. Shen, "Utility-optimal resource management and allocation algorithm for energy harvesting cognitive radio sensor networks," IEEE Journal on Selected Areas in Communications, vol. 34, no. 12, pp. 3552–3565, 2016.
[18] Q. Diao, M. Qiu, C.-Y. Wu, A. J. Smola, J. Jiang, and C. Wang, "Jointly modeling aspects, ratings and sentiments for movie recommendation (jmars)," in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2014, pp. 193–202.
[19] D. Kim, C. Park, J. Oh, S. Lee, and H. Yu, "Convolutional matrix factorization for document context-aware recommendation," in Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 2016, pp. 233–240.
[20] C.-Y. Wu, A. Ahmed, A. Beutel, A. J. Smola, and H. Jing, "Recurrent recommender networks," in Proceedings of the tenth ACM international conference on web search and data mining. ACM, 2017, pp. 495–503.
[21] D. Povey, X. Zhang, and S. Khudanpur, "Parallel training of deep neural networks with natural gradient and parameter averaging," arXiv preprint arXiv:1410.7455, 2014.
[22] Y. Arjevani and O. Shamir, "Communication complexity of distributed convex learning and optimization," in Advances in neural information processing systems, 2015, pp. 1756–1764.
[23] H. B. McMahan, E. Moore, D. Ramage, S. Hampson et al., "Communication-efficient learning of deep networks from decentralized data," arXiv preprint arXiv:1602.05629, 2016.
[24] S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, and K. Chan, "When edge meets learning: Adaptive control for resource-constrained distributed machine learning," arXiv preprint arXiv:1804.05271, 2018.
[25] X. Yue, H. Wang, W. Liu, W. Li, P. Shi, and X. Ouyang, "Jcdta: The data trading archtecture design in jointcloud computing," in 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS), 2018, pp. 1–6.
[26] W. Chen, M. Ma, Y. Ye, Z. Zheng, and Y. Zhou, "Iot service based on jointcloud blockchain: The case study of smart traveling," in 2018 IEEE Symposium on Service-Oriented System Engineering (SOSE), 2018, pp. 216–221.
[27] X. Fu, H. Wang, P. Shi, Y. Fu, and Y. Wang, "Jcledger: A blockchain based distributed ledger for jointcloud computing," in 2017 IEEE 37th International Conference on Distributed Computing Systems Workshops (ICDCSW), 2017, pp. 289–293.
[28] J. Pennington, R. Socher, and C. Manning, "Glove: Global vectors for word representation," in Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 2014, pp. 1532–1543.
[29] D. Tang, B. Qin, Y. Yang, and Y. Yang, "User modeling with neural network for review rating prediction," in International Conference on Artificial Intelligence, 2015, pp. 1340–1346.
[30] Z. Wang, Y. Zhang, H. Chen, Z. Li, and F. Xia, "Deep user modeling for content-based event recommendation in event-based social networks," INFOCOM, 2018.
[31] Y. Gong, L. Liu, M. Yang, and L. Bourdev, "Compressing deep convolutional networks using vector quantization," arXiv preprint arXiv:1412.6115, 2014.
[32] Y. Lin, S. Han, H. Mao, Y. Wang, and W. J. Dally, "Deep gradient compression: Reducing the communication bandwidth for distributed training," arXiv preprint arXiv:1712.01887, 2017.
[33] R. Salakhutdinov and A. Mnih, "Probabilistic matrix factorization," in International Conference on Neural Information Processing Systems, 2007, pp. 1257–1264.
[34] C. Wang and D. M. Blei, "Collaborative topic modeling for recommending scientific articles," in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011, pp. 448–456.
[35] H. Wang, N. Wang, and D. Y. Yeung, "Collaborative deep learning for recommender systems," pp. 1235–1244, 2014.
[36] M. Elahi, Y. Deldjoo, F. Bakhshandegan Moghaddam, L. Cella, S. Cereda, and P. Cremonesi, "Exploring the semantic gap for movie recommendations," in Proceedings of the Eleventh ACM Conference on Recommender Systems. ACM, 2017, pp. 326–330.
