
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2019.2905101, IEEE


Towards Deep Learning Prospects: Insights for Social Media Analytics
Malik Khizar Hayat1*, Ali Daud2,3, Abdulrahman A. Alshdadi4, Ameen Banjar4, Rabeeh Ayaz Abbasi5,
Yukun Bao6, Hussain Dawood7
1Department of Software Engineering, Foundation University Islamabad, Islamabad, Pakistan
2Department of Computer Science and Software Engineering, International Islamic University, Islamabad, Pakistan
3Department of Information Systems, King Abdulaziz University, Jeddah, Saudi Arabia
4Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia.
5Department of Computer Science, Quaid-i-Azam University, Islamabad, Pakistan
6Huazhong University of Science and Technology, Wuhan, China
7Department of Computer and Network Engineering, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia

*Corresponding author: Malik Khizar Hayat (e-mail: khizar.malik@fui.edu.pk)

ABSTRACT Deep Learning (DL) has attracted increasing attention on account of its significant processing power in tasks such as speech, image, or text processing. Owing to the exponential development and widespread availability of digital Social Media (SM), analyzing these data using traditional tools and technologies is difficult or even intractable. DL has been found to be an appropriate solution to this problem. In this article, we discuss the DL architectures in practice by presenting a taxonomy-oriented summary, following the major efforts made towards Social Media Analytics (SMA). However, instead of a technical description, the article emphasizes describing SMA-oriented problems together with DL-based solutions. To this end, we also highlight DL research challenges (such as scalability, heterogeneity, and multimodality) and future trends.
INDEX TERMS social media data, dynamic network, deep learning, feature learning

2169-3536 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION
With the spread of SM data across the globe, the number of data-intensive problems has also grown. The wide availability and exponential growth of digital data have made it challenging, or even intractable, to visualize, explore, manage, and analyze these data with contemporary software tools and technologies. The increase in data volume, the diversity of data variety, and the incoming/outgoing data velocity (the concept known as the 3Vs) are the most prominent reasons for the escalation of SM data. For instance, more than 1K petabytes of data per day are processed over the Internet, according to a report by the National Security Agency. Between 2006 and 2011, digitized data grew ninefold, and by 2020 it is expected to reach 35 trillion gigabytes [1]. This growth in digital data opens significant research prospects for sectors such as education, health, industry, business, public administration, and scientific research. In addition, the emergence of SM has driven a paradigm shift in current scientific research towards data-driven knowledge discovery.

Even though SM plays a key role in connecting people around the globe [2], it also offers a wide variety of knowledge extraction tasks, as mentioned earlier. Extracting information from data, and knowledge from that information, is still not a trivial problem. Machine learning techniques, together with advances in computing power, have played an important role in leveraging the hidden information in this data. However, its scale and diversity call for a solution that can better uncover the information and knowledge hidden in the data. As an active sub-area of machine learning, DL is believed to be a powerful tool for SMA problems. Together with other SM applications, web-based applications are growing day by day as recent hotspots [3]. These include social computing, such as online communities, reputation systems, question-answering systems, prediction systems, recommender systems, and Heterogeneous Information Network Analysis (HINA) [4]. Moreover, graph theory captures the semantic structure of SM data well, representing users as nodes and the relationships among them as links.

SM data is growing enormously every day, which requires refined pattern and feature extraction for better knowledge discovery. Most conventional learning methods use shallow-structured learning architectures. DL, in contrast, refers to supervised or unsupervised machine learning techniques that automatically learn hierarchical representations for classification. Recently, DL has attracted significant attention from the research community because of its inspiration from biological observations of human brain processing. It has also performed strongly in numerous research areas such as digital image processing, speech-to-text, and collaborative filtering. Likewise, DL has been applied successfully in engineering and manufacturing products that benefit from high-volume digital data. Well-known companies such as Google, Apple, and Facebook have to deal with heaps of data on a daily basis. Moreover, these companies are actively pursuing DL-oriented projects. For instance, Apple's Siri, a virtual assistant application on the iPhone, uses DL to offer a wide range of services, including sports news, answers to users' questions, weather updates, and reminders, while Google applies DL to the large volumes of messy data behind Google Translate.

SMA, in turn, is one of the most prominent and active recent areas of study. Connecting DL with SMA can reveal meaningful insights. A number of earlier reviews [5-7] show that DL is viable and efficient for solving substantial big data problems. However, most of their focus was on DL applications such as image classification and speech recognition. None of them discussed SM, the most prominent and developed platform, in particular. DL methods and applications [8] span application domains including business, education, economics, and health informatics. In this review article, we cover some noteworthy application domains of SM, such as user behavior analysis, business analytics, sentiment analysis, and anomaly detection, where DL has played a striking role in leveraging rich knowledge. In terms of contribution, this study:
1) Provides a contemporary summary of existing DL methods that can serve as a roadmap for extracting useful insights for SMA.
2) Provides a classification scheme that identifies important features for studying the semantics of a particular problem, which may help in designing better future approaches in diverse SMA application domains.
3) Investigates the pros and cons of existing techniques.
4) Highlights the prominent application domains for applying DL.
5) Uncovers the noteworthy research challenges and future directions.
Following this introductory section, Section 2 describes the relevant concepts and terminologies of SM and DL used in the study. Section 3 presents the taxonomy of methods; each taxonomic group is further sub-categorized based on the techniques used in selected application domains. Section 4 describes the DL perspectives for benchmark datasets. Section 5 discusses the performance evaluation measures for DL models. Section 6 highlights how SM is challenging the research community and discusses research challenges and future directions, whereas Section 7 concludes the study.

II. BASICS OF SOCIAL MEDIA AND DEEP LEARNING
Before diving into the details, we begin with an overview of the basic concepts, terminologies, data types, and architectures concerning SM and DL.

A. SOCIAL MEDIA
Here we explain some commonly used terminologies and concepts pertinent to SM.

1) TERMINOLOGIES
a) Social Media Analytics
SM comprises the web-based technologies used to turn communications carried over virtual networks and communities into interactive discussion. Interactive Web 2.0 Internet-based applications are built on user-generated content, such as comments or text posts, videos, and data generated through all online interactions. Generally, users create service-specific SM profiles that are maintained by some organization. SM facilitates the growth of online social networks by linking a user's profile with other people or groups who usually share similar interests. Accordingly, SMA comprises the approaches used to gather data from different SM platforms, for instance Facebook and Twitter, and then evaluate and analyze the data to make business decisions. Significantly, this data is continuously updating, expanding, and evolving [9], which makes it a good way to understand real-time customer experiences, intents, and sentiments.

b) Social Network Analysis
A social network is a subcategory of SM in which users are connected by a common interest. Individuals and groups are the nodes, while the edges show the connections between nodes. Social Network Analysis (SNA) is the mapping and measuring of associations and flows among individuals, groups, organizations, URLs, and other interlinked information entities using networks and graph theory [10]. It also helps to analyze human relationships both graphically and mathematically.
In comparison, SMA refers to Business Intelligence (BI) tools (reporting, searching, visualizing, text mining, and so on) applied to information sourced from SM platforms such as Facebook and Twitter. It helps to answer questions such as how much traffic is being driven and from where, and how influential the messaging is. SNA, however, is explicitly focused on identifying the relationships, connections, interactions, and influence among information entities such as individuals, groups, and organizations. It helps to answer questions such as how closely an individual is linked to a network and how information flows within a network. In particular, we investigate the former here in this study.

c) Big Data
Datasets that are so voluminous, diverse, rapid, and complex that conventional data-processing and computational systems are unable to deal with them are termed Big Data. Storing, processing, and analyzing such data requires suitable tools and techniques that can handle the underlying data. Moreover, SM platforms such as Facebook, Twitter, and YouTube are causing swift growth in data. Big Data from SM can be used to extract trends, patterns, and associations, particularly those pertinent to human behavior, entity interactions, and complex integrations. Big Data is commonly characterized by the 4Vs: Volume, Variety, Velocity, and Veracity [5-7].
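As a rough illustration of the node-and-edge view described under Social Network Analysis, the following sketch stores a small friendship network as an adjacency map and computes degree centrality, one simple way to quantify how closely an individual is linked to a network. The user names, the helper functions `build_graph` and `degree_centrality`, and the choice of degree centrality are assumptions made for illustration only; they are not taken from the surveyed works.

```python
from collections import defaultdict


def build_graph(friendships):
    """Build an undirected adjacency map from (user, user) pairs."""
    graph = defaultdict(set)
    for a, b in friendships:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


# Hypothetical toy network: users are nodes, friendships are edges.
friendships = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]
graph = build_graph(friendships)
centrality = degree_centrality(graph)
```

In this toy network, `ann` is linked to every other user and therefore has the highest centrality, while `dan` has a single link. Real SM graphs are far larger and sparser, so a dedicated graph library would normally replace this hand-rolled structure.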


d) Dynamic Network
Users linked together by friendship links form a network of users on Facebook. This network varies over time through online activities such as adding or removing friends, liking or disliking products, and joining or leaving groups. In a dynamic network, nodes and edges vary over time. Statistical methods, computational analysis, or computer simulations are often needed to discover how these networks grow, shrink, adapt, or deal with external interference.

e) Signed Social Network
People hold both positive and negative sentiments towards each other. With the advancement of SM platforms, individuals frequently prefer to express their sentiments on these platforms. Sentiments such as friend or foe, agreement or disagreement, like or dislike, trust or distrust, and joining or leaving a group can be divided into positive connections (friends, agreements, likes) and negative connections (foes, disagreements, dislikes). Such interactions give rise to Signed Social Networks (SSNs) [11, 12]. SM signed networks are usually noisy and sparse, with a massive number of users and relationships. SM examples of SSNs include friends/foes in Slashdot (http://www.slashdot.org/) [13] and trust/distrust in Epinions (http://www.epinions.com/) [14].

a) Information Object
Real-world networks are well represented as graphs. A graph comprises vertices and the relationships among them. For instance, two people (vertices) are related in a network if they are connected via a friendship link in the Facebook graph. The real-world entities, or nodes, in these graphs are called information objects [15].

b) Domain Adaptation
In business, users buy and sell different products. Some buying and selling depends purely on users' reviews, ratings, etc. However, the domains users are interested in can vary widely. Maintaining a separate sentiment classification system for each domain would be impractical in terms of cost, resource requirements, and processing time. Instead, a single system is learned from a set of different domains, using labeled or unlabeled data. The learned system is then applied to any labeled or unlabeled target domain. Because the system is multi-source, it can only extract patterns that are shared and meaningful across domains. The problem of learning systems across differing domain distributions is termed domain adaptation [16].

c) Sentiment Analysis
The process of finding subjective information hidden in users' content is known as sentiment analysis. Typically, the content is text sourced from multiple SM platforms. It therefore uses Natural Language Processing (NLP), computational linguistics, and other text analysis techniques to categorize users' attitudes. It uses data mining techniques to extract and gather data for analysis. The purpose of this analysis is to determine the subjective knowledge in a piece of text or a corpus such as news articles, reviews, comments, blog posts, SM news feeds, tweets, and status updates [17].

d) Link Prediction
Link prediction is a data-analysis method used to estimate the relationships between nodes in SM networks. The nodes could be people, organizations, or transactions. SM platforms are highly dynamic, with nodes and edges constantly evolving. The problem of predicting new interactions that are expected to occur in an SM network is called link prediction [18].

a) Microblogging
A blog is an informal discussion, often presented in written, image, audio, or audiovisual form on the World Wide Web. A microblog is a specific form of blog that allows smaller content to be shared online than a regular blog does. It lets a user post brief content such as short sentences, images, short videos, or links to web pages. To keep the blog updated, bloggers usually use a number of services such as instant messaging and email. These services are termed microposts, whereas the practice of using them to keep the blog up to date is called microblogging [19]. SM websites such as Weibo, Twitter, and Facebook also provide microblogging services like status updates and tweets [20]. Some microblogging services allow users to adjust privacy settings that control who can read or comment on their microblogs. Microblogging applications span various domains such as e-commerce, online marketing, product search, and advertising sales.

b) Friendships
Two people connected with each other through any online medium form a social platform termed a friendship network. People spend a great deal of their energy staying connected with others on SM platforms. Members of a friendship network may or may not focus on a particular topic; rather, they pay more attention to staying linked to their friends. In [21], the authors considered these networks as friendship-event networks. For instance, in an academic collaboration network, the researchers are actors/friends, collaborations are friendships, and conferences are events. Noteworthy online friendship platforms include, but are not limited to, Facebook (http://www.facebook.com/), MySpace (http://www.myspace.com/), Badoo (http://www.badoo.com/), and Bebo (http://www.bebo.com/).

c) Professional
A professional network, or more specifically a professional network service, is a category of social network service that concentrates solely on communications and associations of a corporate/professional nature [22]. Instead of private, non-corporate, or unprofessional relationships, the focus is on either finding a job or advancing in a professional career [20, 22]. Some substantial professional platforms include LinkedIn (http://www.linkedin.com/), Xing (http://www.xing.com/), and Data.com Connect (http://connect.data.com/).

d) Photos
Photo sharing platforms allow users to share photos online publicly or privately. Sharing refers to viewing the images but not necessarily downloading them [20]. These sites allow users to back up images, make them searchable, share them, and control who can see their shared images. These platforms can also be used for multiple purposes such as image resourcing and repositories, visual literacy, and research. Some substantial photo sharing platforms include Flickr (http://www.flickr.com/), Instagram (http://www.instagram.com/), and Pinterest (http://www.pinterest.com/).

e) Videos
Video sharing platforms help users to share their videos online publicly or privately. Some video sharing websites allow users to upload only short videos, whereas others allow lengthy video content as well. These sites allow users to save, share, comment on, and control the viewership of their shared content. Online SM platforms are among the most widely used media for sharing video content these days. These platforms also offer live streaming features to connect directly with other users. Moreover, they allow users to link their existing SM accounts, such as Facebook, with video-sharing websites to share videos instantly [23]. Some substantial video sharing platforms include YouTube (http://www.youtube.com/), Vimeo (http://www.vimeo.com/), and Veoh (http://www.veoh.com/).

f) Question/Answer Forums
Question/Answer (Q/A) forums answer questions asked by different users across multiple domains. Users can see the questions asked by other users if these help them, and the same holds for the answers, which are usually posted by domain experts. Q/A forums are often run by large, professional corporations and tend to operate as communities that let users in similar domains discuss questions and receive expert answers. In social Q/A services, any user can ask a question and also post an answer. In [24], the authors divided these services into three categories: digital reference services, where users can ask librarians for help; expert services, where organizations offer a Q/A service and experts are expected to answer the questions; and social or community Q/A, where everybody can ask and answer questions. Sharing knowledge in online communities encourages self-presentation, peer recognition, and social learning [25]. Some substantial Q/A forums include StackExchange (http://www.stackexchange.com/), Quora (http://www.quora.com/), and Answers (http://www.answers.com/).

B. DEEP LEARNING
Here we explain commonly used concepts and terminologies pertinent to DL.

a) Neural Network
A Neural Network (NN), or Artificial Neural Network (ANN), is an information-processing paradigm inspired by biological neurons (as in human nervous systems). It attempts to find underlying relationships in datasets by using a series of processors that imitates the way the human brain works. The paradigm comprises numerous highly interconnected processing elements (artificial neurons) working together to solve particular problems. A neural network usually consists of numerous processors arranged in tiers and operating in parallel. Raw input is fed to the first tier, and the output of each tier is fed as input to the next. Eventually, the last tier generates the final output [26]. In contrast to DL, training in neural networks always happens simultaneously for all layers.

b) Convolutional Neural Network
A Convolutional Neural Network (CNN) is composed of numerous layers, a few of which are feature representation layers while the others are as in a typical neural network. It usually starts with two alternating layer types: convolutional and sub-sampling. The convolutional layers perform convolution operations, and the sub-sampling layers reduce the size of the preceding layers' output [1]. Owing to fewer parameters, CNNs are easier to train than fully connected networks. Moreover, with a fixed-size input, CNNs generate absolute outputs, which reduces the cost of preprocessing.

c) Recurrent Neural Network
A Recurrent Neural Network (RNN) is a class of ANN in which the links between components form a directed graph. This lets an RNN exhibit dynamic temporal behavior. Unlike feedforward NNs, RNNs can use their internal memory to process arbitrary input sequences. This makes them suitable for tasks such as connected handwriting recognition and speech-to-text conversion. Moreover, in RNNs, every node is connected to another node by a directed connection, and nodes can be input, output, or hidden nodes that transform the data between input and output nodes [7].

d) Auto-Encoder
An Auto-Encoder (AE), also known as an auto-associator or Diabolo network, is an ANN used for unsupervised learning. The AE aims to learn an encoding (a representation) of a dataset. The learned representation can then be used for dimensionality reduction. Recently, the auto-encoder concept has become widely used for learning generative models. Structurally, the simplest form of an auto-encoder is a feedforward, non-recurrent network with the same number of nodes in the input and output layers.
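The structure of the simplest auto-encoder can be sketched in a few lines. The example below is a minimal, untrained forward pass written under stated assumptions: the layer sizes, the sigmoid activation, the random initial weights, and the helper names (`dense`, `init_layer`, `autoencoder_forward`) are all hypothetical choices for illustration. It only shows that the input is compressed to a shorter code and then expanded back to the input size; an actual auto-encoder would additionally be trained to reduce the reconstruction error.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(weights, bias, x, activation):
    """One fully connected layer: activation(W x + b), computed row by row."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (an arbitrary choice for this sketch)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def autoencoder_forward(x, encoder, decoder):
    """Encode x into a shorter code, then decode the code back to input size."""
    code = dense(encoder[0], encoder[1], x, sigmoid)
    reconstruction = dense(decoder[0], decoder[1], code, sigmoid)
    return code, reconstruction


rng = random.Random(0)
n_input, n_hidden = 6, 2                      # hypothetical layer sizes
encoder = init_layer(n_hidden, n_input, rng)  # input layer -> code
decoder = init_layer(n_input, n_hidden, rng)  # code -> output layer

x = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
code, reconstruction = autoencoder_forward(x, encoder, decoder)

# Mean squared reconstruction error: the quantity that training would reduce.
error = sum((a - b) ** 2 for a, b in zip(x, reconstruction)) / n_input
```

The 2-element `code` is the learned representation that could be reused for dimensionality reduction, and `reconstruction` has the same length as `x`, matching the equal-sized input and output layers described above.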


Accordingly, an auto-encoder is an unsupervised learning model with two parts: an encoder and a decoder [27].

e) Restricted Boltzmann Machines
A Restricted Boltzmann Machine (RBM) is a neural network with two layers: a layer of visible units and a layer of hidden units, with no connections between units within the same layer. The connections between hidden and visible units are symmetric and undirected. Both the hidden and the visible units have bias terms. Hidden and visible units take binary values (0 or 1). Applications of RBMs include classification, dimensionality reduction, feature learning, collaborative filtering, and topic modeling [28].

f) Deep Belief Network
A Deep Belief Network (DBN) is a class of deep NN in machine learning, or rather a generative graphical model, composed of multiple hidden layers of latent variables. Connections exist between the layers but not between units within each layer. When trained unsupervised, a DBN learns to reconstruct its inputs probabilistically. The layers then act as feature extractors. After this learning step, a DBN can be further trained with supervision to perform classification. DBNs can be viewed as a composition of simple unsupervised networks, for instance RBMs or auto-encoders, where the hidden layer of each sub-network serves as the visible layer for the next [29].

2) CONCEPTS
a) Deep Learning
DL is a collection of machine learning methods that train several levels of data representations in deep architectures. The learning can be supervised, semi-supervised, or unsupervised [30]. It uses a multi-layered cascade of nonlinear processing units (neurons) for feature transformation and extraction. The output of each layer is used as input to the next layer, as shown in FIGURE 1, where $x_1$, $x_2$, and so on are the inputs, $w_i$ ($i = 1, 2, \ldots, n$) are the weights, and $y$ is the output. Each gray circle is a neuron, which processes its input based on an activation function (described later in this sub-section). The neurons exchange messages with each other in a complexly interconnected scheme. The links have adjustable numeric weights ($w$) that are set by training. Eventually, these layers form a hierarchy of concepts. DL uses more hidden layers, while conventional neural networks consist of up to 3 layers most of the time. Training occurs in 2 parts: for instance, after layer 1 is trained, its output is forwarded to the next layer. Finally, the output layer produces the learnt representations [31].

FIGURE 1. Deep learning

b) Activation Function
The activation function of a node (neuron) defines its output given an input or a set of inputs. These functions are an extremely important feature of Deep Neural Networks (DNNs) [32]. Fundamentally, they decide whether the neuron should be activated or not, that is, whether the incoming information for the neuron is relevant or can be ignored. The activation function is a non-linear transformation applied to the input signals. The result of this transformation is then sent to the next layer of neurons as input. Mathematically, the activation function can be depicted as:

$$Y = b + \sum_{i=1}^{n} x_i w_i \qquad (1)$$

where $Y$ is the output, $b$ is the bias term, $x_i$ is the $i$-th input, and $w_i$ is the corresponding connection weight; the bias and weights are chosen to minimize the loss function over the training set.

c) Network Embedding
Network embedding is a technique for learning low-dimensional representations of the vertices in a network [33]. It aims to map the data into a low-dimensional hidden space. Each vertex in this space is represented as a low-dimensional vector, which allows direct computational processing of the network. When embedding networks, it is essential to preserve the network structure both locally and globally [34].

d) Word Embedding
Applying DL to NLP problems aims to obtain well-distributed word representations; specifically, the words in a vocabulary are mapped to real-valued vector representations. These vector representations are known as word embeddings [35]. Conceptually, this involves a mathematical embedding from a space with one dimension per word to a continuous vector space of much lower dimension. Used as input, these embeddings can boost the performance of NLP tasks such as sentiment analysis and syntactic parsing.

Connectionism is the philosophy of DL: an individual biological neuron, or an individual feature in a deep model, is not intelligent. However, a large collection of connected units acting together can exhibit intelligent behavior. In this section, we discuss the problem domains of SM where DL has been used as a key technique in the solutions. Next, we discuss the pros and cons of existing methods. Finally, we walk through the taxonomic details of these methods and provide a comprehensive summary of the DL methods for SMA. FIGURE 2 shows the taxonomy of the DL methods in SMA.

FIGURE 2. Taxonomy of DL Methods
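The connectionist view can be made concrete with the unit from Eq. (1): a single neuron only computes a bias plus a weighted sum of its inputs, and richer behavior arises only when many such units are connected. The sketch below is an illustration under stated assumptions: the function names, the sigmoid non-linearity applied to the weighted sum, and all numeric values are hypothetical and are not taken from the surveyed works.

```python
import math


def weighted_sum(x, w, b):
    """Eq. (1): Y = b + sum_i x_i * w_i."""
    return b + sum(xi * wi for xi, wi in zip(x, w))


def sigmoid(z):
    """A common non-linear activation (an assumption of this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b):
    """One unit: the weighted sum of Eq. (1) passed through the activation."""
    return sigmoid(weighted_sum(x, w, b))


# Hypothetical inputs, weights, and bias for a single unit.
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]
b = 0.75

y_linear = weighted_sum(x, w, b)
y = neuron(x, w, b)

# Connecting units: the outputs of two hidden units feed one output unit.
hidden = [neuron(x, w, b), neuron(x, [0.2, 0.1, -0.3], 0.1)]
output = neuron(hidden, [1.5, -2.0], 0.0)
```

Stacking such units, so that the outputs of one layer become the inputs of the next, is what the layered architectures surveyed in this article build on.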

A. USER BEHAVIOR ANALYSIS
Society today is a combination of various entities, of which human beings are one. Intuitively, human behavior can be broadly categorized into individual behavior and group behavior. Both have their own causes and consequences. However, human beings, as users in society, behave differently in different social situations. Social behavior is the outcome of certain atmospheric changes, environmental events, or social influences. To understand the well-being of a society, along with knowing the social changes, it is equally important to become familiar with the social behavior of individuals.
Moreover, it is worthwhile to determine the impact of social influences on users' behavior. As defined earlier, SM is a prominent means of connecting people in society and relies primarily on user-generated content. Accordingly, DL offers compelling techniques for analyzing users' behavior by learning correlations between their past and current characteristics based on SM. Here we go through some categorized tasks performed in SM to analyze users' behavior using DL.

1) PREDICTION USING DL
A number of studies have used DL to predict human behavior in social networks. Heli et al. [36] learned communities using data from Twitter, which is considered one of the prominent tools for information dissemination. DL can handle multi-dimensional data quite effectively. To this end, Qingchen et al. [37] proposed a Tensor Auto-Encoder (TAE), a deep computational model, to learn features from heterogeneous YouTube data. Given a reference basis of vectors, a tensor represents a linear relation between vectors. Arrays are one way to represent tensors in computer memory. The number of dimensions of the array gives the degree (rank) of the tensor. For instance, a 2D array can represent a linear map between vectors, hence a 2nd-order tensor. To represent the input data and the representations in all layers, the TAE extends the conventional DL model to high-order tensor space. In this model, tensors are used to fuse the learned heterogeneous data features inside the hidden layer. This helps the tensor DL model capture the multifaceted relationships in the input data. To train the TAE model, the authors designed a high-order back-propagation algorithm, an important tool for improving prediction accuracy. However, compared with homogeneous data, heterogeneous data took more time on the TAE, as it requires more iterations to train the parameters.
SM data contains a wealth of valuable information that can be used to make noteworthy predictions. However, learning from heterogeneous sources is still a non-trivial task in SM. Yongpo et al. [38] proposed a novel deep model, Fuses sociAl netwoRks uSing dEep lEarnING (FARSEEING), which integrated the useful

2169-3536 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

information from heterogeneous social networks. In particular, this is an information fusion task that uses DL to learn from the complexity of multiple data sources. In this model, the authors used different inner layers to learn complex representations from multiple social networks. First, users are associated by relating their accounts across multiple social forums. Second, the given users are characterized using extracted multi-faceted features such as linguistic, demographic, and behavioral features. The activities of users in SM are unbalanced, which causes missing data. This missing data is inferred using Non-negative Matrix Factorization (NMF) before the extracted features are fed into FARSEEING. NMF is a set of algorithms in multivariate analysis in which a matrix M is factorized into two matrices X and H, with no negative elements in any of the three matrices.
The low-level features are mapped into high-level features using deep layers; then, the high-level features are fused together for learning the task. The users' confidence level and consistency among multiple social forums are measured to gain a comprehensive understanding of a user's interests, behavior, and personality traits. The data sources used as ground truth are Quora, About.me, and LinkedIn.
Social networks are a source of online relationships among users. Labeling these relationships as negative or positive turns these networks into SSNs. Feng et al. [39, 40] proposed deep belief network-based (DBN) techniques to predict links in SSNs. Prediction tasks such as co-authorship, friendship, trust, distrust, and further associations are considered. Social networks are now increasingly used for communication in a number of crisis situations. Mehdi et al. [41] presented a thorough analysis of SM posts, such as text and images, to predict the crisis situations communicated by users. SM posts are typically informal, brief, and heterogeneous (a mixture of languages, acronyms, and misspellings) in nature. Identifying the context of a post is often necessary to infer its underlying meaning. Moreover, posts about other ordinary events are also part of the data, which adds training noise. DL better understands these complex representations to learn the crisis situations. In addition, Xiaoqian et al. [42] used microblog data to predict users' behavior by proposing an unsupervised extraction of the Linguistic Representation Feature Vector (LRFV). This method could describe users' semantic information more comprehensively and objectively.

2) CLASSIFICATION USING DL
SM is one of the most prominent modes of interaction among people, in which they produce, exchange, and share ideas and information in networks and communities. In general, SM data is noisy, diverse, of low quality, large in quantity, and heterogeneous in nature. Users with diverse backgrounds use SM platforms to record routine activities. This makes SM data subjective. It also gives these data a wide collection of attributes such as the resources used, the appearance of entities in a specific context, information diffusion, link analysis, and so on. For instance, SM tasks such as image annotation and classification are non-trivial because of the diversity characteristic.
However, SM is heterogeneous with multi-modal user-generated content, which motivates a joint representation for the data. For instance, a flower image could be associated with a number of textual tags, which makes latent feature learning for image classification quite complicated. A joint representation could better convey the information associated with the content. Zhaoquan et al. [15] proposed a DL-based approach to classify SM data, particularly images, using latent feature learning. The authors used a Flickr dataset and classified the images as linked or not linked with a tag, which frames the task as both an image classification and a link analysis problem. Dealing with such an enormous feature space is a non-trivial task; however, DL could be a worthy tool to handle image data. The reasons include the unsupervised pre-training of diverse social data characteristics, fine-tuning of features, layer-wise learning structure, and the explanation of more abstract and robust semantics.
In [15], the authors presented a Relational Generative Deep Belief Nets (RGDBN) model and investigated links between information objects, which are generated by interactions of latent features. Initially, low-level representations are learned in RGDBN; then, using a deep architecture with more layers, higher-level representations are used to better learn the links between images and associated textual tags. The authors believe that integrating the collective effect of latent feature learning into a deep model could better represent the diverse and heterogeneous data space. With a view to learning useful network representations, Daixin et al. presented a Structural Deep Network Embedding (SDNE) [33] model to efficiently capture the highly non-linear structure of complex networks. In particular, it is a semi-supervised deep model with several layers of non-linear functions. The multiple deep layers in SDNE allowed this model to capture the highly non-linear heterogeneous network structure. The objective of network embeddings is to learn complex representations for heterogeneous networks.
Using social networks, people share multi-typed diverse data. However, users are unlikely to share their personal data, for instance, gender, birth year, demographics, etc. User behavior prediction entails classifying users on the basis of their age groups, which can reveal valuable insights about user behaviors among different age groups. In [43], the authors analyzed 7000 sentences from social networks. They used a Deep Convolutional Neural Network (DCNN) to classify social network post features such as hashtag, retweet, characters in a tweet, number of followers, number of tweets, etc. After extensive experimentation using different machine learning algorithms, for instance, Random Forest, Decision Trees, Support Vector Machine (SVM), etc., they found that DCNN outperformed its counterparts in terms of large-scale data classification. In [43], the authors also proposed an enhanced Sentiment Metric (eSM), which could classify by age the users who restrict their personal information.

3) CLUSTERING USING DL
In SM data, community detection is a realistic solution to determine the intrinsic grouping of information


objects (defined under Social Media Concepts). When grouping information objects, diverse attributes may have varying contributions. The attribute values have pairs of interest that influence the grouping task. For instance, the degree of qualification is a worthy attribute in an attempt to group users with corresponding institutions. In social networks, Thi et al. [44] integrated clustering with ranking and proposed a novel deep model, Deep Learning Cluster Rank (DeepLCRank). This method better illustrates the ranked clusters in social networks. Each item of a cluster is assigned a rank on the basis of the learned features of information objects in the network. The varied information objects in social networks form a very complex representation, which DeepLCRank could handle quite effectively.

4) RANKING USING DL
Individuals tend to use social media to solve their problems by posting queries. Such forums are known as Community Question Answer (CQA) forums. These help users obtain satisfactory information. Nevertheless, users are unlikely to find the desired content quickly because there are many answers to the same question at CQA. This necessitates ranking the answers provided by experts at CQA.
Zheqian et al. [45] proposed an approach to predict users' personalized satisfaction using a multiple instance DL framework. The authors presented a novel model, the Multiple Instance Deep Learning (MIDL) framework, to predict personalized user satisfaction. In CQA, a single question could have a number of answers. What is known is that one of the answers is assigned a satisfied tag; what is unknown is exactly which answer carries the satisfied tag. This situation motivates learning multiple instances for answers. Each answer to a question at CQA is considered an instance in a bag, where each question resolution acquires one satisfactory answer on the Stack Exchange dataset. In terms of the historical behavior of users, a common user space is defined and initialized for the representation of each individual user. Subsequently, after feature extraction, all features are fed into deep recurrent neural networks to rank them as positive or negative.

5) RECOMMENDATION USING DL
SM data is a promising source for the continual recommendation of pertinent content to multi-domain users. The influence of recommendation could be amplified if items from different domains are jointly learned and recommended. Ali et al. [46] proposed a Multi-View Deep Neural Network (MV-DNN), which maps items and users into a shared semantic space and recommends items with maximized similarity. For instance, people who visit espncricinfo.com would most likely want to see news about cricket and like to play cricket-related games on PC or Xbox. The authors used several data sources from Microsoft product logs, such as the Bing search log, Windows Store download history log, or movie view logs from Xbox, to make interest-oriented recommendations.
DNN is used to map the high-dimensional feature space of users and items from different domains to a lower-dimensional one. Social network users often belong to different domains and hence always look for items of interest. MV-DNN is able to recommend items based on categorical features such as movie genre, application category, country or region the item belongs to, and so on.
Collaborative Filtering (CF) is also a well-known approach for recommending appropriate content to users. CF-based methods typically use users' ratings to recommend pertinent items to them. However, the sparsity of ratings causes noteworthy degradation in recommendation performance. Hao et al. [47] proposed a Collaborative Deep Learning (CDL) model, which couples the learning of deep representations for the content information (items) with collaborative filtering for users' ratings. The authors used diverse data domains such as CiteULike, Netflix, and IMDB to recommend items to users.
In addition, users' trust also plays an important role in finding trustworthy recommendations. Shuiguang et al. [48] proposed a Deep Learning based Matrix Factorization (DLMF) model to synthesize the interests of users and their trust links. DLMF performed better in terms of recommendation accuracy with unusual data and cold-start users as well. Using Epinions data, the authors used an autoencoder to learn the initial feature vectors of users and items in the first phase, whereas the final latent feature vectors are learned in the second phase. This method could also work for trusted-community detection. TABLE I shows the summary of DL methods for user behavior modeling.

B. BUSINESS ANALYSIS
With the dawn of SM such as social networks and blogs, review forums, ratings, and recommendations are swiftly thriving. Filtering them automatically is critical for businesses looking to sell their products and recognize new market prospects. However, the large scale [49, 50] of social data makes it tough to classify users' sentiments automatically. For instance, reviews from two different domains would contain different vocabulary, which gives rise to different data distributions for diverse domains. Consequently, domain adaptation can play a transitional role in learning intermediate representations.
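The vocabulary mismatch between review domains that motivates domain adaptation can be made concrete with a toy measurement. The sketch below is only an illustration: the two tiny review lists and the helper names `vocabulary` and `jaccard` are hypothetical and do not come from any of the surveyed works. It computes the Jaccard overlap between the vocabularies of two domains; a low overlap hints at the distribution shift a sentiment classifier would face when moved from one domain to the other.

```python
def vocabulary(reviews):
    """Return the set of lowercase whitespace-separated tokens in a list of reviews."""
    return {token for text in reviews for token in text.lower().split()}


def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b| between two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy reviews from two product domains (illustrative only).
books = [
    "a gripping plot and vivid characters",
    "the plot was slow but the prose was good",
]
kitchen = [
    "the blender is sturdy and easy to clean",
    "good blades but the lid leaks",
]

overlap = jaccard(vocabulary(books), vocabulary(kitchen))
print(f"vocabulary overlap between domains: {overlap:.2f}")
```

Only generic words are shared between the two toy domains here, while the domain-specific sentiment-bearing words differ, which is the gap that intermediate representations are meant to bridge.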
| Model | Year | Main Idea | Findings | Limitations |
|---|---|---|---|---|
| Prediction using DL | | | | |
| TAE [37] | 2016 | • Deep-computation model for learning features on big data | • Tensor DL model is effective to learn features from heterogeneous data. • Higher capability of multimodal features learning | • Tensor-based feature learning takes more time • Heterogeneity in data enhances the complexity of the model |
| FARSEEING [38] | 2016 | • Multiple social network learning | • DL is effective in dealing with multi-sourced SM data • FARSEEING performed better capturing the complex nature of SM by multi-sourced information fusion | • Measuring the confidence level for each information source • Consistency among different information sources • Multi-source multi-task learning |
| DBN-based link prediction model [39] | 2015 | • DBN-based link prediction and feature representation • Signed social | • Positive and negative links prediction with improved accuracy also in • Behavior learning for members in signed social networks | • Incorporate more significant features in the model • Update novel link prediction approaches |
| Unsupervised Linguistic Representation Feature Vector (LRFV) [42] | 2016 | • Feature learning for personality extraction | • Relevance between the personality of users and their behaviors over social networks • Prediction of social linguistic behaviors of users through unsupervised DL • LRFV is more efficient, objective, and complete as compared to the supervised feature | • Consider huge data to learn features • Novel unsupervised feature learning models |
| Clustering using DL | | | | |
| DeepLCRank [44] | 2017 | • DL for enhanced clustering and member ranking | • Integrating DL with the SM data for better ranking • Assimilating ranking with clustering or community is objectively wise | • Synthetic experimentation • Explore real-world datasets |
| Classification using DL | | | | |
| RGDBN [15] | 2013 | • Deep architecture-based latent features learning | • RGDBN-based feature learning process is adequate to retain information about connections both for homogeneous and heterogeneous data • More effective prediction of links between connected media • Non-feature-based learning methods are ineffective in user recommender systems and image annotation tasks | • Consider more sensible and multifaceted forms of latent features • Posterior inference techniques for model learning, and experimentation in different SM platforms |
| SDNE [33] | | • Deep networked embedding-based network structure-preserving | • Multi-layered semi-supervised deep models better preserve the network structure • Robustness to sparse networks | • Learn representations for a non-connected novel vertex |
| DCNN [43] | 2017 | • Classification of teenager and adults using deep CNN | • With the writing style, history, and profile-based users' history, the accuracy of age determination for Twitter users can be enhanced | • Consider other sentiment metrics such as time-based web-log to predict missing information in users' social profiles |
| Ranking using DL | | | | |
| MIDL [45] | 2017 | • User-personalized satisfaction prediction | • Question-diversity increased the necessity of satisfaction prediction • DL-based methods predict with higher accuracy | • Applications to other information retrieval tasks and NLP |
| Recommendation using DL | | | | |
| MV-DNN [46] | 2015 | • Content-based recommendation system • Enhance the quality and scalability of | • Augmented-recommendation with the MV-DNN framework • Incorporate multi-domain data for the recommendation | • Incorporate more users' features to improve recommendation quality, • Avoid dimensionality reduction with enhanced DNN-based • Integrate collaborative filtering with DL for the |
| CDL [47] | 2015 | • Learning user ratings • Deep representation learning for content | • CDL can better understand the tagging and collaborative filtering • Suitable recommendation increases the output precision | • Consider the context and order of words in text corpora |
| DLMF [48] | 2017 | • Trust-based recommendation in social networks | • DLMF predict improved latent features for the enhanced recommendation • Individual characteristics affect the users' decisions • Reliable recommendations by | • Temporal sensitivity to recommend trust-based content in social networks |
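The NMF step mentioned for FARSEEING [38], factorizing a non-negative matrix M into non-negative factors X and H so that missing entries can be inferred, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the standard multiplicative update rules restricted to observed entries, and the function name `masked_nmf`, the rank, the iteration count, and the toy data are all hypothetical choices.

```python
import numpy as np


def masked_nmf(M, observed, rank=2, n_iter=500, eps=1e-9, seed=0):
    """Factorize M ~ X @ H with X, H >= 0, fitting only entries where observed is True.

    Assumption: multiplicative updates for the masked squared error; this is one
    common way to realize NMF with missing data, not necessarily the one in [38].
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = M.shape
    X = rng.random((n_rows, rank)) + eps   # non-negative initialization
    H = rng.random((rank, n_cols)) + eps
    W = observed.astype(float)             # 1.0 for observed entries, 0.0 for missing
    WM = W * M
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so X and H stay non-negative.
        H *= (X.T @ WM) / (X.T @ (W * (X @ H)) + eps)
        X *= (WM @ H.T) / ((W * (X @ H)) @ H.T + eps)
    return X, H


# Hypothetical toy data: a rank-2 non-negative "user x feature" matrix with two gaps.
data_rng = np.random.default_rng(1)
M_true = data_rng.random((6, 2)) @ data_rng.random((2, 5))
observed = np.ones(M_true.shape, dtype=bool)
observed[0, 1] = False
observed[3, 4] = False

M_input = np.where(observed, M_true, 0.0)  # missing entries are zero-filled placeholders
X, H = masked_nmf(M_input, observed)
M_hat = X @ H                              # M_hat[~observed] holds the inferred values
```

The reconstructed matrix `M_hat` supplies estimates for the unobserved entries, which is the role the text assigns to NMF before the features are passed on to the deep model.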


Using SM platforms, people manage customer relationships [51] and make hotel decisions for outings/eating [52]. SM platforms such as Facebook and Twitter are now an extensive source of input and highly valuable for marketing research corporations, public view associations, and other text mining units. This drives such entities to spend more on SM to gain more business [51, 53].
Xavier et al. [16] presented a DL approach for domain adaptation of sentiment classifiers. DL techniques learn the intermediate concepts between the source and target data. Domain adaptation enables DL to learn meaningful intermediate concepts such as product price or quality, customer services, customer reviews about products, and so on. The features are learned level-wise, based on the features revealed at the preceding level. Additionally, Amazon is a widely used platform for business where DL leveraged better learning representations across all domains.
The authors used an Amazon dataset in [16] with reviews from domains including books, kitchen, electronics, and DVDs. For the feature extraction phase, the Stacked Denoising Autoencoder (SDA) is compared with a multi-layer perceptron (MLP). Two variations, SDA-1 with one layer and SDA-3 with three layers, are used in the comparison.
The MLP performance illustrated that non-linearity helps to extract information but is not adequate to gather all necessary information from the data. It is more suitable to use an unsupervised phase, which can incorporate data from diverse domains. On this wide-ranging problem, a single layer does not suffice to reach optimal performance. Stacking three layers together returns the best representation from the data. It is worth noticing that the representation learned by SDAsh3 is significant for diverse domains and is thus well suited to domain adaptation. TABLE II shows the summary of DL methods for business analysis.
Xiao et al. [54] presented a CNN-based model to classify users based on their product needs as expressed on an SM platform. A product consumed by a certain user is more likely to get endorsed subsequently. From a range of products, product consumption makes it important to classify the products that are more likely to be consumed by consumers. The proposed CNN-based product consumption intention model can better classify the words of intention from the text as compared to the SVM with word embeddings or bag-of-words.
Owing to the increased use of SM, people now tend to buy clothes online. Kevin et al. [55] proposed a hierarchical deep CNN framework to recommend better and more efficient clothing options for online customers. The authors used a large-scale image dataset sourced from Yahoo online shopping. The clothing images depict high variability in poses and appearances with significantly noisy backgrounds. For this reason, a clothing-specific tree is generated with categories such as Men, Women, and so on, whereas sub-categories include top, dress, coat, outfit, and so on. The intuition behind the solution is to match the clothing images liked by customers to the images in the dataset. The deep CNN is used to automatically learn discriminative feature representations capable of detecting heterogeneous types of clothing images. Additionally, the DL-based hierarchical search offered immediate retrieval response as compared to a conventional CNN with manually constructed features.
Deep CNN models are becoming pervasive for learning feature representations. M. Hadi et al. [56] proposed a DL-based model to match the precise shopping location with a user's query. Large-scale clothing shop images are used from Tamaraberg¹⁹ and ModCloth²⁰. The problem of matching a user's clothing query with an available and feasible shopping location is framed as computing the cosine similarity between the clothing features from the query and the features of online shop images. To recommend better shopping locations to customers, the retrieved shops are ranked based on the computed similarity. Likewise, Qiang et al. [57] presented a DL-based approach to describe people on the basis of fine-grained clothing features.

C. SENTIMENT ANALYSIS
Sentiment analysis, also referred to as opinion mining, involves predicting the attitude of users who generate massive textual content on multi-typed SM platforms, for instance, Facebook, Twitter, etc. The prevailing intent of analyzing users' sentiments is to classify whether their attitude towards a specific product or topic is positive, negative, or neutral, or even to classify it into some other category. It is commonly applied to product survey answers, customer evaluations, user opinions, and in domains such as education, business, e-commerce, and healthcare. In this section, concerning sentiment analysis, we explore techniques comprising prediction, classification, and ranking of sentiments.
| Model | Year | Main Idea | Findings | Limitations |
|---|---|---|---|---|
| Classification using DL | | | | |
| SDA [16] | 2011 | • Extract unsupervised meaningful representations for sentiments | • Less computation required for cross-domain transformations • Multi-domain data | • Design cost-effective and efficient DL-based model |
| CIMM [54] | | • Model user consumption intention for certain products • CNN model | • Semantic information with word embeddings in sentences, • CNN-based CIMM accomplished better performance than traditional SVM • Mid-level sentence representation by domain adaptation | • Personalized classification of users based on product usage |
| Recommendation using DL | | | | |
| Deep CNN framework [55] | 2015 | • Better and efficient online clothing recommendation | • Learning domain-specific visual features • Supervised pre-training on the image dataset • CNN-based content | • Develop models with unlabeled images • Attribute annotations |
| Deep CNN-based similarity computation [56] | 2015 | • Enhance the similarity between users' queries and online clothing shop images | • Introduced a novel dataset 'Tamaraberg' • Similarity computation for users' queries and shop images. | • More precise configuration between user intended and shop images • Manual evaluation |

¹⁹ http://www.tamaraberg.com/street2shop
²⁰ http://www.ModCloth.com
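The retrieval step summarized above for [56], scoring shop images against a query by cosine similarity and ranking them, can be sketched in a few lines. This is a minimal illustration, assuming the CNN features have already been extracted into fixed-length vectors; the function name `rank_by_cosine` and the toy vectors are hypothetical, and the sketch assumes no feature vector has zero norm.

```python
import numpy as np


def rank_by_cosine(query_vec, shop_vecs):
    """Rank shop feature vectors by cosine similarity to a query feature vector.

    Returns (order, scores): indices of the shops from most to least similar,
    and the similarity values in that same order.
    Assumption: no vector has zero norm.
    """
    q = query_vec / np.linalg.norm(query_vec)
    s = shop_vecs / np.linalg.norm(shop_vecs, axis=1, keepdims=True)
    scores = s @ q                 # cosine similarity of each shop image to the query
    order = np.argsort(-scores)    # descending similarity
    return order, scores[order]


# Hypothetical 3-dimensional features standing in for CNN embeddings.
query = np.array([1.0, 0.0, 0.0])
shops = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [2.0, 0.0, 0.0],   # same direction as the query, different magnitude
    [1.0, 1.0, 0.0],   # partially aligned with the query
])

order, ranked_scores = rank_by_cosine(query, shops)
```

Because both sides are normalized, the ranking depends only on the direction of the feature vectors and not on their magnitude, which is why the second shop ranks first in this toy case despite its larger norm.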

Predicting opinions from SM data is a widely active task. English has been the language most commonly used for opinion mining. However, Changliang et al. [58] predicted sentiment labels for a Chinese sentiment corpus. The authors collected 2270 movie reviews²¹. Subsequently, the movie reviews were filtered based on specific criteria, such as reviews with rude language, with special symbols, with more than one sentence, with typos, with short or long sentences, or with multiple languages. Eventually, they constructed a dataset composed of Chinese sentiments and named it Chinese Sentiment

The sentiments are classified into 5 classes, specifically, very positive, positive, neutral, negative, and very negative. The authors proposed a recursive DL model, namely the Recursive Neural Deep Model (RNDM), to predict the labels for these classified sentiments. This model is compared with three baseline models: Naïve Bayes (NB), Maximum Entropy (ME), and Support Vector Machine (SVM). The RNDM outperformed all of the baselines, as it predicted better sentiment labels for sentences with a contrastive conjunction structure, like “X but Y” in English.
Recently, in order to resolve a wide range of NLP and text mining tasks, DL techniques have developed rapidly and have drawn considerable attention to training deep and complex models on abundant data. Since text is created by humans, it already comprises morphological aspects such as grammatical rules, syntactic knowledge, for instance, Part-Of-Speech (POS) tagging, and also semantic knowledge such as the relationships between words and entities, synonyms, and antonyms. DL can generate fine word embeddings by incorporating this knowledge [59].
Jiang et al. [35] conducted an extensive study to show the injection of knowledge-based DL into word embeddings. Nevertheless, in [60], the authors proposed a DL architecture for Twitter sentiment analysis using pre-trained word embeddings sourced from GloVe embeddings [61]. In [35], the authors used the root, affix, and syllable to infer the semantic meaning from text. In order to enhance the word representations, semantic and syntactic knowledge is used as additional inputs. The Continuous Bag-Of-Words (CBOW) model is used as the baseline method, while Morfessor, Longman, WordNet, and Freebase are the datasets used in evaluating the quality of word embeddings learned with and without fused knowledge. In the same regard, this study also explores three tasks, namely word similarity, analogical reasoning, and sentence completion. After comparison, the DL framework directly generated the embeddings for each root/affix and syllable by aggregating the morphological elements.
Because of the layered nature of DL-based models, they are well equipped to incorporate both semantic and syntactic feature learning. However, the authors concluded that syntactic knowledge delivered significant input information but may be inappropriate for regularized objectives. Nevertheless, semantic knowledge can improve the performance of the sentence completion and word similarity tasks. In addition, though, for the analogical reasoning task, applying the semantic knowledge as additional input is quite

Accordingly, capturing users' expressions and sentiments within Twitter's 140-character limit on tweets is a non-trivial task. In order to clean unnecessary information from the tweet dataset used for experimentation, Stojanovski et al. [60] removed all URLs and HTML entries from the tweets. The pre-trained word embeddings are used to construct lookup tables, where each is linked with the matching feature representation. In terms of DL, the authors fused two NN



models; one is a CNN used for feature extraction from tweets, and the other is a Gated Recurrent Neural Network (GRNN) that processes sequential data in which each input depends on the previous outputs. The noteworthy reasons for using a Gated Recurrent Unit (GRU) are that it has fewer parameters, needs less data to generalize, and enables rapid learning. Architecturally, a GRU contains gating units that control the information flow inside the underlying units. Consequently, the fusion of CNN with GRNN outperformed the existing individual NN models.

In recent years, DL has emerged as an effective means of addressing problems related to sentiment classification. Apart from the labeling phase, a neural network learns a useful representation automatically, without human effort. Nevertheless, the success of DL depends heavily on the availability of extensive training data. With the growth of e-commerce and Web 2.0, people increasingly use SM and post comments about their buying experiences on review or merchant websites. This opinionated content is a valuable resource for merchants seeking to improve their products and service quality, and for prospective customers making purchase decisions.

Soujanya et al. [62] proposed a systematic approach to extract features from short texts, based on the inner-layer activation values of a deep CNN. The authors used the CNN for textual feature extraction; the utterances were first translated from Spanish into English using Google Translate. The CNN is composed of 7 layers and is trained using standard backpropagation, which usually helps improve accuracy. The dataset is composed of 498 short video fragments, in each of which a person utters one sentence. For sentiment polarity, the items are manually tagged as positive, negative, or neutral. After discarding the neutral items, 447 items in total are processed. The combined feature vectors of the visual, audio, and textual modalities are used to train a classifier based on the multiple kernel learning (MKL) algorithm, a well-known method for processing heterogeneous data.

The authors combined the results of feature-level and decision-level fusion. In feature-level fusion, the extracted features are fed into a single supervised classifier, namely an SVM, while in decision-level fusion the extracted features are fed into separate classifiers whose decisions are then combined. The significance of the CNN feature extractor is that it is automatic and does not rely on handcrafted features. In particular, it adapts well to the peculiarities of the specific dataset in a supervised manner.

As a subtask of sentiment analysis, aspect extraction involves identifying the targets of opinions in opinionated text, in particular detecting the specific features of a product or service that the opinion holder is either endorsing or complaining about. Soujanya et al. [63] proposed a novel method for aspect-oriented opinion mining using a 7-layer deep CNN. Traditional methods for feature extraction from text, such as conditional random fields (CRF), have limitations: they require a large number of features to perform well, and the linguistic patterns (LPs) need to be crafted manually and depend on the grammatical accuracy of the sentences. Owing to its automatic feature extraction, a CNN framework can effectively overcome these limitations.

The features of an aspect term are based on its neighboring words. For aspect extraction, a 5-word window is used around each word in a sentence, i.e., ±2 words. The local features of the window are considered as features of the middle word. Next, the feature vector is fed to a CNN. Similar to [35], the CBOW model is used for creating word embeddings. Google and Amazon embeddings are used, particularly in the electronics (e.g., cell phones, laptops) and food-chain (e.g., fast food, restaurants) domains, on which the five LP rules defined in [63] are applied. The performance of CBOW on the food-chain domain was found to be better than on the electronics domain, since the electronics domain contained fewer aspect-oriented terms.

Ziyu et al. [64] presented a novel framework, Weakly-supervised Deep Embedding (WDE), to classify customer reviews. The focus is on the semantic orientation of each sentence. In this framework, a high-level representation is first learned that captures the overall sentiment distribution of sentences using rating information. Next, a classification layer is added on top of the embedding layer for supervised fine-tuning using labeled sentences. Using review ratings for sentiment classification [64] is an early effort in sentiment analysis. A large amount of unlabeled data is used for training with RBMs/auto-encoders. Since ratings are noisy labels, they could mislead the classifier. Hence, the following simple rule on the 5-star rating scale is adopted:

$$l(s) = \begin{cases} pos, & \text{if } s \text{ is in a 4- or 5-star review} \\ neg, & \text{if } s \text{ is in a 1- or 2-star review} \end{cases}$$

where $l(s)$ represents the weak label of sentence $s$. Amazon customer reviews from three domains, namely digital cameras, laptops, and cell phones, are used as the dataset; 3-star reviews are ignored. The proposed WDE method is compared with several baseline methods and outperforms them in terms of accuracy. In a nutshell, WDE trains the DNN effectively by exploiting review rating information that is commonly available on social websites.

Araque et al. [65] proposed a Classifier Ensemble Model (CEM) to demonstrate the significance of multi-sourced information. Such a combination carries more information than its base components. The model aims to enhance the overall performance of sentiment classification by integrating surface (traditional ML classifier) and deep (DL-based) features, which could not be attained by using the classifiers separately. Overall, seven public datasets from the movie review and microblogging domains are used.

On SM platforms such as Twitter, users occasionally post nonsensical content, which can be categorized as hate or abusive speech targeting people such as politicians and celebrities, or products. Detecting such hateful intent of users who belong to a certain group, or of a set


of users who fit in another group, is thus important. Moreover, it is equally important to recommend appropriate content to users. Pinkesh et al. [66] used several DL architectures, namely CNNs, FastText, and Long Short-Term Memory networks (LSTMs), to classify tweets as hate speech or not, using the labels sexist, racist, or neither. Functionally, CNNs are used for hate intent detection; FastText quickly represents a document in the form of word vectors and tunes the word representations using pre-trained GloVe word embeddings, as in [60]; and LSTMs are used to capture long-range dependencies in the tweets. Demonstrating the applicability of DL to detecting hate intent in tweets is the major contribution of [66].

Georgios et al. [67] also presented a deep architecture to classify short text from tweets as hate speech. However, this approach does not rely solely on pre-trained word embeddings. The intuition behind classifying short text is based on the historical tendency of users to post hateful content in the form of offensive messages. Users usually prefer short terms or words to express such intent in slang. Accordingly, tweets of length 30 are used to train the proposed deep model, and word frequency vectors are preferred over pre-trained word embeddings.

Muath et al. [68] proposed a Multi-Layered CNN (MLCNN) to classify tweets on a five-point scale: highly positive, positive, neutral, negative, and highly negative. After empirical evaluation of the proposed model, the authors found a 3-layer CNN to be the best among all combinations of layers.

CQA forums receive many questions from users and answers from experts every day. For instance, when a user posts a question on a CQA forum, the purpose is to seek the best answer to that question. However, a number of experts are inclined to post a quality answer to the same question. This establishes a link between question-answer text pairs. Likewise, in several information retrieval tasks, the links between queries and documents exist as short text pairs. These impose the requirement to rank the question-answer or query-document pairs of text. In addition, feature engineering is a prominent aspect of learning such rankings.

Above all, it has recently been shown that CNNs are quite effective at learning to embed input sentences into a low-dimensional vector space while preserving their important semantic and syntactic characteristics. This also leads to state-of-the-art results in many text processing tasks. Aliaksei et al. [69] proposed a convolutional neural network (ConvNet)-based approach to learning to rank short text pairs. This model is aimed at learning good intermediate representations of the documents and queries, which are later used for computing their semantic matching. The network comprises a wide convolutional layer followed by a non-linearity and simple max pooling, which is used to reduce the dimensionality. Raw words are the input to the network and need to be translated into real-valued feature vectors, which are then processed by the subsequent network layers. The objective of the convolutional layer is to extract significant patterns, specifically discriminative word sequences found within the training input sentences. The authors used two well-known TREC retrieval benchmarks: TREC microblog retrieval and answer sentence selection.

The CNN helped in learning intermediate representations, which in turn improved the learning of high-quality sentence models. The architecture combines intermediate representations of the questions and answers, which together establish a much richer representation. ConvNets also do not require manual feature engineering, preprocessing, or external resources, which might be costly or unavailable. The same architecture has applications in other domains as well.

Ming et al. [70] proposed a biLSTM-based method to select a suitable answer from a pool of answers to a question asked on an SM platform. The TREC QA answer selection dataset is used in this study. The authors used distributed deep representations to match questions in open-domain question answering systems with appropriate answers by considering their semantic structure. The intuition is that a suitable answer has a high cosine similarity with the question, so the answer with the highest cosine similarity is chosen from a list ranked by descending similarity. The biLSTM, a DL-based model, is useful where a question has multiple relations to the words or terms used in the answers. The notable feature of the proposed model is that it does not depend on linguistic feature engineering or tools, so it can be applied to any domain. TABLE III shows the summary of DL methods for sentiment analysis.

D. ANOMALY DETECTION
Anomaly detection is the field of detecting abnormalities in data. When investigating real-life datasets, a common need is to detect instances that stand out from the rest of the data [71]. Such data instances are known as abnormalities, outliers, or anomalies.
| Model | Year | Main Idea | Findings | Limitations |
|---|---|---|---|---|
| Prediction using DL | | | | |
| RNDM [58] | 2014 | Predict sentiment label; recursive DL | Focus on sentiment analysis for the Chinese language; Chinese treebank introduction; improved sentence-level sentiment label prediction; contrastive conjunction structure in sentences | Consider more forms/languages of text with respect to natural language; incorporate more textual contexts |
| CNN-GRNN [60] | | Predict tweets' labels; fused DL models | Fusion of DL models performed better than using the models individually | Pre-train the word embeddings on equally large-scale datasets; use a bi-directional GRNN for enhanced accuracy |
| Classification using DL | | | | |
| CNN-based feature extraction [62] | 2015 | Feature extraction for short texts | Effective extraction of sentiment polarity from speaking video clips using a CNN-based method; facial expressions and words' utterances | Extract more tempting features from the visual modality; feature selection method for extracting key features |
| 7-layered deep CNN [63] | 2016 | Aspect extraction from textual corpora; opinion mining | Better data fitting with non-linear deep CNN; linear models such as Conditional Random Field (CRF) are weak in data fitting; vanishing feature requirement diminishes the time and development cost | Diverse linguistic patterns (LPs) for aspect extraction |
| WDE [64] | 2016 | Review sentiment classification; ratings as weak supervision signals | Learned an embedding space to capture the sentiment distribution of sentences; inference of weak labels from ratings; penalizing relative distances among sentences for improved [...] | Apply weak labels extraction on other types of deep [...] |
| CEM [65] | 2017 | DL-based sentiment classification; word embeddings | Multi-sourced information such as surface features with word vectors; improved sentiment [...] | Apply deep models to aspect-based sentiment analysis; emotion analysis |
| Fusing FastText, CNNs, LSTMs [66] | 2017 | Hate speech detection; Twitter with multiple DL-based techniques | Enhanced performance of DNN methods; effective embeddings' learning with LSTMs | User's network features to detect hate speech |
| RNN-based hate speech detection [67] | 2018 | Hate speech detection; RNN-based ensemble learning | Better short text [...]; count of word frequency vectors | Evaluate and analyze texts in other foreign languages |
| MLCNN [68] | 2018 | Classify tweets into a five-point scale; multi-layered CNN | Three convolutional layers better classify the tweets into a five-point scale; other combinations of layers are not effective | Explore further methods to classify text; devise new pooling strategies other than max and average |
| Ranking using DL | | | | |
| ConvNets [69] | 2015 | Re-ranking pairs of short texts | No requirement of any external resources; no need for manual feature engineering; cost-effective feature [...] | Apply the model to different social networks other than [...] |
| LSTM-based DL framework [70] | 2015 | Answer selection from a pool of answers; TREC QA | Enhanced appropriate answer selection from the pool of answers; traditional IR models are not effective | Answer quality prediction and ranking in community QA |
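
To make the rating-based weak supervision listed for WDE [64] more concrete, the 5-star rule reported for that work (4 or 5 stars gives a positive weak label, 1 or 2 stars a negative one, 3-star reviews are ignored) can be sketched in a few lines of Python. This is only an illustrative sketch: the names `weak_label` and `label_review_sentences`, and the choice of returning `None` for ignored reviews, are assumptions made here and are not taken from [64].

```python
from typing import List, Optional, Tuple


def weak_label(stars: int) -> Optional[str]:
    """Map a 1-5 star review rating to a weak sentiment label.

    Returns "pos" for 4-5 stars, "neg" for 1-2 stars, and None for
    3-star reviews, which the rule ignores (None is an assumption
    of this sketch, not a convention from the surveyed paper).
    """
    if stars in (4, 5):
        return "pos"
    if stars in (1, 2):
        return "neg"
    if stars == 3:
        return None
    raise ValueError(f"rating must be between 1 and 5, got {stars}")


def label_review_sentences(sentences: List[str], stars: int) -> List[Tuple[str, str]]:
    """Give every sentence of a review the weak label of the whole review."""
    label = weak_label(stars)
    if label is None:
        return []  # 3-star reviews contribute no weakly labeled sentences
    return [(sentence, label) for sentence in sentences]


if __name__ == "__main__":
    review = ["Battery life is great.", "The screen scratches easily."]
    print(label_review_sentences(review, 5))
    print(label_review_sentences(review, 3))
```

The second sentence of the example review is arguably negative but still receives the label "pos", which illustrates why such rating-derived labels are described as noisy.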

Usually, anomalies are caused by errors in the underlying data, but occasionally they are produced by a previously unknown underlying process. Hawkins [72] defined an outlier as: "an outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism". Several efforts have been made to detect anomalies using DL. Here we discuss some significant DL-based anomaly detection techniques in SM.
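
One simple way to read Hawkins' definition operationally is to score each observation by how far it deviates from the others and flag the extreme ones. The sketch below is a plain statistical illustration, not one of the DL-based techniques surveyed here: it uses the median and the median absolute deviation (MAD) so that the outlier itself does not distort the scores. The function names, the sample data, and the cutoff of 3.5 (a commonly used rule of thumb) are assumptions chosen for illustration.

```python
from statistics import median
from typing import List, Sequence


def modified_z_scores(values: Sequence[float]) -> List[float]:
    """Score each value by its robust deviation from the median (MAD based)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half of the values equal the median; this simple
        # sketch then gives up and scores everything as 0.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values: Sequence[float], threshold: float = 3.5) -> List[float]:
    """Return the values whose absolute robust score exceeds the threshold."""
    scores = modified_z_scores(values)
    return [v for v, z in zip(values, scores) if abs(z) > threshold]


if __name__ == "__main__":
    # Hypothetical daily post counts for one account.
    posts_per_day = [12, 15, 11, 14, 13, 12, 240, 16]
    print(flag_outliers(posts_per_day))
```

DL-based detectors follow the same pattern at a higher level: a model produces an anomaly score per instance (for example, a reconstruction error), and instances with unusually large scores are reported.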


1) CLASSIFICATION USING DL

Supervised anomaly detection can play a pivotal role in improving security systems. It can also help security organizations and law enforcement agencies to act proactively and detect destructive and predatory actions and conversations in cyberspace. Mohammadreza et al. [73] proposed a CNN-based classifier to efficiently detect such actions in large-volume chat logs. The authors used the public PAN-2012 dataset (footnote 22) for experimentation. The CNN is used as a binary classifier to accomplish the conversational text classification task. In a CNN, the convolutional layer usually operates on several input regions, while the pooling layer is used to sub-sample the higher levels of abstraction in each convolutional layer. Since this is a text classification task, max-pooling is used, as it outperforms the alternative, average-pooling [74]. It was found that using two convolutional layers is less effective than using one in text classification tasks, since the model tended to overfit the data with a greater number of convolutional layers. For image classification tasks, however, several convolutional layers might indeed help.

Manassés et al. [75] proposed a deep Convolutional Auto-Encoder (CAE) for anomaly detection in video data. The authors used publicly available videos, including ones from SM platforms. The proposed CAE model does not require labeled data, since all the training instances belong to the non-anomalous class. Sliding windows are used to sub-sample frames from the video clips, from which both motion and appearance features are extracted. The frame reconstruction error is used as the anomaly score. The Area Under Curve (AUC) is used to measure the performance of the proposed model. It was found that adding high-level information to the raw data can improve the performance of the CAE. In particular, supplementing high-level information is valuable when the types of anomalies to be detected are known beforehand.

2) CLUSTERING USING DL

Detecting anomalous events, particularly in videos, is of utmost significance. Since the scenes are great in number, detecting abnormal scenes in video data can be considered a clustering problem. Feature learning is non-trivial in video surveillance. Dan et al. [76] proposed a DNN-based Appearance and Motion DeepNet (ADMN) to automatically and efficiently learn feature representations. A double fusion framework is presented that combines the benefits of both early and late fusion strategies. In the early fusion strategy, stacked denoising auto-encoders are proposed to separately learn both motion and appearance features. Then, one-class SVMs are used to predict anomaly scores. Finally, the anomaly scores are combined to detect abnormal events. The proposed ADMN model does not rely on prior knowledge for learning feature representations, and it is more powerful than existing hand-crafted video feature representations. The downsides of this approach include its higher computational cost and its neglect of the co-occurrence of several patterns in the videos.

Anomalies are often the result of multiple co-occurring events, or of multiple factors that cause instances to be declared anomalous. One of the drawbacks of [76] is that it does not consider the co-occurrence of patterns in videos. Malik et al. [77] detected co-evolutionary anomalies in heterogeneous bibliographic information networks. A co-evolutionary anomaly (target object) is associated with a number of linked attributes (attribute objects). Correspondingly, the influence of each attribute object on the target objects is calculated, which helps identify the cause of the anomalies in the underlying data.

Likewise, Yachuang et al. [78] detected anomalies in crowded scenes. The authors proposed a deep Gaussian Mixture Model (GMM) with multiple layers stacked on top of each other. The motion and appearance features are extracted using Principal Component Analysis (PCA) and then clustered. Clusters with few members and/or far away from the regular groups are picked as anomalies. It was found that the deep GMM is considerably more effective than hand-crafted feature learning. However, short- and long-term temporal motion features still leave room to explore. TABLE IV shows the summary of DL methods for anomaly detection.

| Model | Year | Main Idea | Findings | Limitations |
|---|---|---|---|---|
| Classification using DL | | | | |
| CNN-based predatory text classifier [73] | | Classification of predatory discussions in chat logs | Single convolutional layer for text classification is more efficient; using several convolutional layers is advantageous in image classification tasks | RNNs and LSTMs for large-scale documents training |
| CAE [75] | 2017 | Anomaly detection in videos; convolutional auto-encoders | Supplementing information to raw data enhances the performance of CAE | Devise methods to imitate the semantic conception of visuals by humans |
| Clustering using DL | | | | |
| ADMN [76] | 2016 | Anomalous event detection in video; DL-based architectures | Prior knowledge is not reliable in DL, especially for feature representation | High computational cost; fuse more multi-modal data using stacked denoising auto-encoders; co-occurrence of several patterns; inclusion of contextual [...] |
| Deep-GMM [78] | 2017 | Anomaly detection from crowded scenes; Deep-GMM | Enhanced learning of motion and appearance features; hand-crafted feature learning is not effective | Build deep models for short-term motion feature learning; long-term motion learning; appearance feature learning |

22 http://pan.webis.de/clef12/pan12-web/
23 http://multiplesocialnetworklearning.azurewebsites.

IV. DATASETS AND BENCHMARKS
In this section, we go through significant SM datasets commonly used as benchmarks in SM tasks.

A. FACEBOOK
In the era of social connectivity, people prefer to connect with their friends online using SM platforms. Facebook [23] is one such platform, which people use to privately express and share their content. However, the SM usage of users may vary due to the wide-ranging services offered by these platforms. A number of platforms are available to download and use publicly accessible SM datasets. In particular, the features of the Facebook dataset comprise demographic features such as gender, education, and relationship status; user topics such as dispositions and professions; and user posting behavior such as slang posts, photo posts, video posts, shares, likes, and so on.

An abundant amount of multimodal data is posted on Facebook every day. However, most of the data is unstructured. To extract meaningful insights from unstructured data, DL can help by representing it in a form that can answer questions such as: how often does a certain company's product appear in pictures for a particular user? Correspondingly, the abundance of multimodal data can be exploited by applying powerful DL models in several noteworthy SMA tasks, such as textual analysis, image analysis, and targeted advertising.

B. TWITTER
People tend to express their opinions about particular topics. Twitter is an SM platform that people use to publicly express and share their personal viewpoints and interpretations of a certain event, personality, organization, or happening. The features of tweets include positive emotions, negative emotions, relationship expressions, and health status.

DL has striking implications for NLP. In terms of SM insights, text is still the dominant type among multi-typed data. Users use Twitter to express their personal viewpoints about social, political, educational, personal, or professional events, and most Twitter data is posted as text. DL can help in a number of NLP tasks such as crisis situation identification, user behavior analysis, and quantifying users' enthusiasm about a certain event.

C. LINKEDIN
Companies seek individuals who are proficient in their field of interest and likely to excel in their future career. LinkedIn is a professional SM platform where people build their profiles to get hired or to hire appropriate candidates. Fundamentally, it is an employment-oriented and business-focused online social service for creating a network of professionals. Usually, access to this dataset is not truly public unless it is requested or subscribed to by a registered entity.

DL can play an important role in analyzing the textual, image, or video data provided by the LinkedIn platform. People usually use this platform for job search; applying DL to scan through candidates' resumes and filter out the eligible ones can be of significant support to human resource managers.

D. FLICKR
Photo sharing is one of the most used online services these days. Flickr is an image and short video sharing platform where people share their experiences and emotions in the form of photos and short videos. Such platforms are available in a number of languages, including English, German, French, Korean, and Spanish. An account is not required to access the content on such SM platforms, but it is required for uploading.

DL can be applied for image and video data analytics. Owing to their recurrent nature, RNN-based models can be used to analyze users' likes and dislikes, recommend the most anticipated content to users, and show users advertisements matching their personal interests.

E. YOUTUBE
Videos and moving objects are the most engaging type of content for users. YouTube is one of the most prominent SM platforms for sharing content in the form of videos. It allows users to upload (with an account), share, rate, or view video posts and manage their account privacy settings. The user-generated content uploaded to YouTube ranges from educational and artistic videos, trailers, and documentaries to official live video streams.

Analyzing dynamic and live streams is a non-trivial problem; however, deep auto-encoders can support dynamic video analysis significantly. The underlying nature of the data calls for more powerful frame representations, which simple ML approaches cannot provide. DL, however, can be used to extract meaningful insights from dynamic video streaming platforms such as YouTube, Metacafe, and Vimeo.

F. STACKEXCHANGE
Individuals may face a number of problems, every day,


for which they seek simple and reliable solutions. Stack Exchange is an SM platform where people can ask questions from varied fields and topics and learn solutions from community question-answer systems. Experts, and sometimes regular users, answer the questions asked by various users. Reputation is earned by posting quality, highly upvoted answers. In particular, the answer with the maximum number of upvotes is chosen as the best answer. Stack Overflow, Super User, and English Language and Usage are some prominent sites used for learning programming, information technology, and English linguistics, respectively.

CQA is a platform where users look for the most relevant, best, and to-the-point answers to their questions and, most importantly, want to find such answers rapidly. Altogether, this constitutes a multimodal search. Deep hidden layers can be used to construct a model that satisfies users' multitude of needs.

V. PERFORMANCE EVALUATION
There are a number of ways to evaluate the performance of DL methods. Recall, precision, and F1-score are noteworthy performance measures for prediction, ranking, and classification tasks [38, 45]. The F1-score balances recall and precision as:

$$F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \quad (2)$$

where precision is the fraction of relevant instances among the retrieved instances, while recall is the ratio of retrieved relevant instances to the total number of relevant instances in the data [79]. Classification tasks are also evaluated using the ROC curve. Based on the ROC curve, the Area Under Curve (AUC) is used as an evaluation metric for each class [39].

To measure the difference between the actual observed values and the predicted values, Root Mean Squared Error (RMSE) is often used to evaluate model performance [42, 48]. Larger RMSE values signify greater performance loss. However, RMSE is highly sensitive to anomalies. It can be represented as:

$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(\hat{z}_i - z_i\right)^2} \quad (3)$$

where $\hat{z}_i$ and $z_i$ are the predicted and real values, respectively, and $N$ is the sample size [80].

Usually, the performance of a proposed method is also evaluated by an empirical comparison of its results with those of existing methods [76]. However, three core performance metrics play a significant role in measuring the performance of a DL model: accuracy, size of the model, and rate of learning.

A. ACCURACY
Predominantly, the core objective of DL-based models is to make precise predictions. The diverse nature of SM data demands either communicating the complete information, with long delays owing to excessive and complex processing, or using data samples to make the predictions. The latter, however, could lead to imprecise diagnostics and predictions.

For instance, consider a DL-based model that predicts voter fraud. A binary classifier processes individual votes and classifies them as legitimate or fraudulent. Obviously, the cost of classifying a fraudulent vote as legitimate is more severe than that of the opposite error, which marks a legitimate vote as potentially fraudulent. If, after processing a million votes, the model fails to capture 100 fraudulent votes, the accuracy will still be high. However, the end result could affect the outcome significantly.

B. MODEL SIZE
The size of the NN matters equally for improving the prediction accuracy of the model, and it is growing exponentially. Therefore, to process abundant data, deep models need to be efficient, scalable, and robust, so that accuracy does not degrade when the model needs enhancement.

C. LEARNING RATE
The learning rate matters a great deal for most DL-based methods, because DL is used to process plentiful data with improved accuracy and rapid processing. Also, DL models are dynamic in nature: they involve training, deployment, and sometimes re-training, which is required when new data arrive. Therefore, it is quite important how fast a deep model can learn representations relative to the speed at which new data arrive.

VI. CHALLENGES AND FUTURE DIRECTIONS
The preceding sections explored different domains of SM where DL methods are applied, described the datasets used as benchmarks for specific research purposes, and discussed some performance evaluation measures used in the literature. However, the features of SM pose challenges in adapting DL methods to solve these problems. In this section, we present some topics where DL needs further investigation for SMA. These include dealing with high-dimensional data, learning with streaming data, scalability of models, and distributed computing.

SM platforms encompass a number of social relations that need trusted recommendations. People always seek to connect through relationships that can be trusted, or to receive information that is reliable. Trust metrics have a significant role to play in recommender systems [81]. Disseminating trusted information can help achieve reliable recommendations. The authors of [48] made a DL-based effort to recommend trust-aware relations to SM users. However, that study is limited to Epinions and Flixster data only.

The trust in a pool of relations may vary with respect to time. Likewise, the content rated by users is also significantly sensitive to time. More importantly, the

ratings which have become obsolete may make the data noisy and unreliable for social recommendations. Efforts are still needed to propose reliable and scalable DL-based techniques to recommend trusted information and trust-aware relations in SM platforms.

B. REFINING DL AND AVOIDING DIMENSIONALITY REDUCTION
Machine learning models use a wide range of parameters to predict, learn, recommend, classify, cluster, or group different data items. These random variables or parameters are often denoted as dimensions. Most machine learning algorithms perform dimensionality reduction before processing a dataset. However, DL provides room to avoid dimensionality reduction through more robust and scalable learning.
The authors of [46] used multiple data sources to train DL methods. However, they suggested making DL more scalable so that one could avoid reducing dimensions, since using an unabridged set of features could yield more robust results when learning a model.
However, it is sometimes also indispensable to reduce dimensions [82], which can enrich the network structure. An enhanced network structure can help in processing the nodes of a DL-based network with greater ease and less complexity.

C. CUT COST AND PUT PRODUCTIVITY
With the exponential growth in SM, the availability of online reviews and recommendations has significantly increased, which has made sentiment classification an interesting topic in academic and industrial research. However, the existence of an ever bigger pool of multi-domain online reviews has made it difficult to gather annotated training data. The authors of [16] proposed a deep model, SDA, based on stacked denoising autoencoders, which use the output of one layer as the input of the next stacked layer, hence improving representation learning.
SDA can use data from different domains, which cuts the computation required to transfer to numerous domains. However, the number of domains adapted is not promising, while there exist several social networks whose data need to be adapted as a single source. In addition, training a DL model on a large dataset requires more powerful resources, which could increase the cost. This shows the scope for building more cost-effective and efficient DL methods to incorporate yet more social network domains [76].

Sentiment analysis is the research area that investigates the opinions of people, individual evaluations, attitudes, sentiments, appraisals, and emotions towards objects such as products, services, individuals, organizations, events, issues, topics, and their attributes. With the rise of Web 2.0 and enormous growth in SM content, sentiment analysis has now developed into a prevalent and challenging research problem. Users are now able to post their opinions and views on a wide range of social websites such as Twitter and Facebook. However, the information in users’ reviews and opinions is of great significance and needs to be explored.
Today, SM users are able to post their reviews and opinions in a wide variety of natural languages such as English, Chinese, Turkish, Spanish, and so on. The authors of [58] made an effort to explore Chinese-language sentiments using recursive DL. The challenges here are two-fold: one is to create data banks for languages other than English, as [58] built the Chinese Sentiment Treebank, and the other is to build and use apposite DL methods to explore SM natural languages beyond English [67]. In addition, there is still considerable room to apply DL methods to a wide range of SM websites.

E. ASPECT EXTRACTION
Being a subtask of sentiment analysis, aspect extraction comprises categorizing the targets of opinions in opinionated text; specifically, distinguishing the particular aspects of a product that the opinion holder is either endorsing or complaining about. This problem needs a set of linguistic patterns (LPs) to classify words in sentences as aspect or non-aspect words.
Accordingly, the authors of [63] presented a 7-layered deep CNN model for efficient aspect extraction from Google and Amazon embeddings. LPs play a prominent role in aspect extraction from SM data. Hence, DL is needed to craft more robust LPs for efficient and precise aspect extraction.
Generally, recommendation systems recommend items on the basis of, for instance, the customer’s purchase history. However, designing deep models that can learn representations for recommendation on the basis of multiple latent aspects [83] or personalized aspects [84] could be of great worth.

F. HETEROGENEOUS ANOMALY DETECTION
Social networks hold a wealth of useful information, and the same user could have accounts on a number of social networking platforms [69]. Integrating users’ information from heterogeneous SM data sources is always intriguing. It can provide a comprehensive understanding of the interests and behaviors of users, which are dynamic, heterogeneous, and multi-contextual. A data instance that is an anomaly in one context may not be an anomaly in another. Also, the same data instance could be anomalous from multiple points of view, which is significant to detect. Implementing DL approaches for joint representation learning can better capture the complexity of networks for heterogeneous anomaly detection [85]. However, maintaining consistency among heterogeneous social networks is challenging [38].

G. FUSING ANOMALY DETECTION AND SOCIAL INFLUENCE
Escalated usage of social media platforms has caused an unprecedented intensification of social data and provides an extraordinary prospect to study the social behaviors of users. However, in the pool of social media users, few are influential while the majority are normal. Similar is the case with anomalies in social networks. A number of earlier studies propose to solve conventional anomaly detection problems using the
activity-based, graph-based, community-based, distance-based, and statistical-based methodologies. Influence analysis and anomaly detection problems are mostly investigated independently, with domain-specific applications. However, rather than studying both problems separately, considering them in combination may capture rich semantics from social networks, with diverse and more efficient applications.
Likewise, fusing two or more problem solutions can give rise to a new valuable solution, reduce the cost of the preceding or individual solutions, and enhance the learning rate of the DL model. Importantly, DL has the ability to learn more complex and heterogeneous representations for networks. Network-diffusion-based embedding methods [86] can address a number of limitations of DL-based methods, including heterogeneity, scalability, and multimodality.

With the rise in SM data such as product reviews, business forums, and so on, analyzing sentiments along with the topics of the text is equally significant to swiftly and efficiently summarize the textual data [87]. The social and public availability of Web 2.0 has led to an unprecedented intensification of SM data, a significant part of which is textual. On SM platforms such as those described in Section II, users are unrestricted in expressing their opinions. Compared to conventional documents, SM data (documents) entail sentiment analysis along with topic-oriented analysis. By exploring the power of DL-based models for NLP tasks, distributed representation learning for words can be used to effectively identify topics and predict sentiments for document summarization and sentiment analysis tasks, respectively [88]. Existing works focused on topic identification and sentiment analysis individually; however, studying them together and developing a unified model could be more effective. It could also support summarizing textual data and CQA platforms.

SMA has gained widespread attention recently. While going through the literature, we found several articles studying diverse aspects of SM problems. However, there are no articles revealing DL prospects using insights from SM analytics. In this article, we address this gap, along with the pertinent models and algorithms, comprehensively. In terms of DL, we present the state-of-the-art research accomplishments in SM analytics. We also present the current research challenges and future directions in this domain.
In conclusion, SM platforms present a number of noteworthy challenges to DL. We provide a detailed depiction of varied SM domains. DL-based methods have significant power to learn valuable data representations from multi-domain SM platforms for tasks such as user behavior analysis, business analysis, sentiment analysis, anomaly detection, and many more. However, aspects including the powerful resources required to deal with large volumes of data, improving productivity and reducing computational costs, learning efficient data representations from heterogeneous social data sources, and so on, still need efficient and reliable DL-based techniques. Using DL, these challenges need to be addressed in a canonical way that benefits the scientific community. We believe that these challenges will bring plentiful research prospects to the DL community. Also, they will deliver key developments in various real-life fields such as education, business, e-commerce, and medicine.

VIII. REFERENCES

[1] X.-W. Chen and X. Lin, "Big Data Deep Learning - Challenges and Perspectives," IEEE Access, vol. 2, pp. 514-525, 2014.
[2] R. Hanna, A. Rohm, and V. L. Crittenden, "We’re all connected: The power of the social media ecosystem," Business Horizons, vol. 54,
[3] C. L. P. Chen and C.-Y. Zhang, "Data-intensive Applications, Challenges, Techniques and Technologies: A Survey on Big Data," Information Sciences, vol. 275, pp. 314-347, 2014.
[4] C. Shi, Y. Li, J. Zhang, Y. Sun, and P. S. Yu, "A survey of heterogeneous information network analysis," IEEE Transactions on Knowledge and Data Engineering, vol. 29, pp. 17-37, 2017.
[5] M. M. Najafabadi, F. Villanustre, T. M. Khoshgoftaar, N. Seliya, R. Wald, and E. Muharemagic, "Deep learning applications and challenges in big data analytics," Journal of Big Data, vol. 2, p. 1, 2015.
[6] S. Pal, Y. Dong, B. Thapa, N. V. Chawla, A. Swami, and R. Ramanathan, "Deep learning for network analysis: Problems, approaches and challenges," in Military Communications Conference, MILCOM 2016-2016 IEEE, 2016, pp. 588-593.
[7] L. Deng, "A tutorial survey of architectures, algorithms, and applications for deep learning," APSIPA Transactions on Signal and Information Processing, vol. 3, 2014.
[8] L. Deng and D. Yu, "Deep learning: methods and applications," Foundations and Trends® in Signal Processing, vol. 7, pp. 197-387, 2014.
[9] J. A. Obar and S. S. Wildman, "Social media definition and the governance challenge: An introduction to the special issue," 2015.
[10] E. Otte and R. Rousseau, "Social network analysis: a powerful strategy, also for the information sciences," Journal of Information Science, vol. 28, pp. 441-453, 2002.
[11] M. Shahriari and R. Klamma, "Signed social networks: Link prediction and overlapping community detection," in Advances in Social Networks Analysis and Mining (ASONAM), 2015 IEEE/ACM International Conference on, 2015, pp. 1608-1609.
[12] N. Girdhar and K. Bharadwaj, "Signed Social Networks: A Survey," in International Conference on Advances in Computing and Data Sciences, 2016, pp. 326-335.
[13] J. Kunegis, A. Lommatzsch, and C. Bauckhage, "The slashdot zoo: mining a social network with negative edges," in Proceedings of the 18th International Conference on World Wide Web, 2009, pp. 741-750.
[14] P. Massa and P. Avesani, "Trust metrics on controversial users: Balancing between tyranny of the majority," International Journal on Semantic Web and Information Systems (IJSWIS), vol. 3, pp. 39-64, 2007.
[15] Z. Yuan, J. Sang, Y. Liu, and C. Xu, "Latent Feature Learning in Social Media Network," in Proceedings of the 21st ACM International Conference on Multimedia, Barcelona, Spain, 2013, pp. 253-262.
[16] X. Glorot, A. Bordes, and Y. Bengio, "Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach," in ICML'11 Proceedings of the 28th International Conference on International Conference on Machine Learning, Bellevue, Washington, USA, 2011, pp. 513-520.
[17] B. Pang and L. Lee, "Opinion mining and sentiment analysis," Foundations and Trends® in Information Retrieval, vol. 2, pp. 1-135, 2008.
[18] D. Liben-Nowell and J. Kleinberg, "The link-prediction problem for social networks," Journal of the Association for Information Science and Technology, vol. 58, pp. 1019-1031, 2007.
[19] A. M. Kaplan and M. Haenlein, "The early bird catches the news: Nine things you should know about micro-blogging," Business Horizons, vol. 54, pp. 105-113, 2011.
[20] T. Aichner and F. Jacob, "Measuring the degree of corporate social media use," International Journal of Market Research, vol. 57, pp. 257-276, 2015.
[21] L. Licamele and L. Getoor, "Social capital in friendship-event networks," in Data Mining, 2006. ICDM'06. Sixth International Conference on, 2006, pp. 959-964.
[22] J. E. Vascellaro, "Social networking goes professional," Wall Street Journal, pp. D1-D2, 2007.
[23] H. Li, H. Wang, J. Liu, and K. Xu, "Video sharing in online social networks: measurement and analysis," in Proceedings of the 22nd International Workshop on Network and Operating System Support for Digital Audio and Video, 2012, pp. 83-88.
[24] C. Shah, S. Oh, and J. S. Oh, "Research agenda for social Q&A," Library & Information Science Research, vol. 31, pp. 205-209, 2009.
[25] J. Jin, Y. Li, X. Zhong, and L. Zhai, "Why users contribute knowledge to online communities: An empirical study of an online social Q&A community," Information & Management, vol. 52, pp. 840-849, 2015.
[26] M. van Gerven and S. Bohte, Artificial neural networks as models of neural information processing: Frontiers Media SA, 2018.
[27] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends® in Machine Learning, vol. 2, pp. 1-127, 2009.
[28] H. Larochelle and Y. Bengio, "Classification using discriminative restricted Boltzmann machines," in Proceedings of the 25th International Conference on Machine Learning, 2008, pp. 536-543.
[29] G. E. Hinton, "Deep belief networks," Scholarpedia, vol. 4, p. 5947, 2009.
[30] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Networks, vol. 61, pp. 85-117, 2015.
[31] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, p. 436, 2015.
[32] F. Agostinelli, M. Hoffman, P. Sadowski, and P. Baldi, "Learning activation functions to improve deep neural networks," arXiv preprint arXiv:1412.6830, 2014.
[33] D. Wang, P. Cui, and W. Zhu, "Structural deep network embedding," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1225-1234.
[34] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, "Line: Large-scale information network embedding," in Proceedings of the 24th International Conference on World Wide Web, 2015, pp. 1067-1077.
[35] J. Bian, B. Gao, and T.-Y. Liu, "Knowledge-powered deep learning for word embedding," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2014, pp. 132-148.
[36] H. Aramo-Immonen, J. Jussila, and J. Huhtamäki, "Exploring co-learning behavior of conference participants with visual network analysis of Twitter data," Computers in Human Behavior, vol. 51, pp. 1154-1162, 2015.
[37] Q. Zhang, L. T. Yang, and Z. Chen, "Deep computation model for unsupervised feature learning on big data," IEEE Transactions on Services Computing, vol. 9, pp. 161-171, 2016.
[38] Y. Jia, X. Song, J. Zhou, L. Liu, L. Nie, and D. S. Rosenblum, "Fusing Social Networks with Deep Learning for Volunteerism Tendency Prediction," in AAAI, 2016, pp. 165-171.
[39] F. Liu, B. Liu, C. Sun, M. Liu, and X. Wang, "Deep belief network-based approaches for link prediction in signed social networks," Entropy, vol. 17, pp. 2140-2169, 2015.
[40] F. Liu, B. Liu, C. Sun, M. Liu, and X. Wang, "Deep learning approaches for link prediction in social network services," in International Conference on Neural Information Processing, 2013, pp. 425-432.
[41] M. B. Lazreg, M. Goodwin, and O.-C. Granmo, "Deep Learning for Social Media Analysis in Crises Situations," in The 29th Annual Workshop of the Swedish Artificial Intelligence Society (SAIS) 2–3 June 2016, Malmö, Sweden, 2016, p. 31.
[42] X. Liu and T. Zhu, "Deep learning for constructing microblog behavior representation
to identify social media user’s personality," PeerJ Computer Science, vol. 2, p. e81, 2016.
[43] R. G. Guimaraes, R. L. Rosa, D. De Gaetano, D. Z. Rodriguez, and G. Bressan, "Age Groups Classification in Social Network Using Deep Learning," IEEE Access, 2017.
[44] T. T. Zin, P. Tin, and H. Hama, "Deep Learning Model for Integration of Clustering with Ranking in Social Networks," in International Conference on Genetic and Evolutionary Computing, 2016, pp. 247-254.
[45] Z. Chen, B. Gao, H. Zhang, Z. Zhao, H. Liu, and D. Cai, "User Personalized Satisfaction Prediction via Multiple Instance Deep Learning," in Proceedings of the 26th International Conference on World Wide Web, 2017, pp. 907-915.
[46] A. M. Elkahky, Y. Song, and X. He, "A multi-view deep learning approach for cross domain user modeling in recommendation systems," in Proceedings of the 24th International Conference on World Wide Web, 2015, pp. 278-288.
[47] H. Wang, N. Wang, and D.-Y. Yeung, "Collaborative deep learning for recommender systems," in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 1235-1244.
[48] S. Deng, L. Huang, G. Xu, X. Wu, and Z. Wu, "On deep learning for trust-aware recommendations in social networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, pp. 1164-1177, 2017.
[49] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, et al., "Large scale distributed deep networks," in Advances in Neural Information Processing Systems, 2012, pp. 1223-1231.
[50] A. Coates, B. Huval, T. Wang, D. Wu, B. Catanzaro, and A. Ng, "Deep learning with COTS HPC systems," in International Conference on Machine Learning, 2013, pp. 1337-1345.
[51] F. Piller, A. Vossen, and C. Ihl, "From social media to social product development: the impact of social media on co-creation of innovation," 2011.
[52] L. McCarthy, D. Stock, and R. Verma, "How travelers use online and social media channels to make hotel-choice decisions," 2010.
[53] B. D. Weinberg and E. Pehlivan, "Social spending: Managing the social media mix," Business Horizons, vol. 54, pp. 275-282, 2011.
[54] X. Ding, T. Liu, J. Duan, and J.-Y. Nie, "Mining User Consumption Intention from Social Media Using Domain Adaptive Convolutional Neural Network," in AAAI, 2015, pp. 2389-2395.
[55] K. Lin, H.-F. Yang, K.-H. Liu, J.-H. Hsiao, and C.-S. Chen, "Rapid clothing retrieval via deep learning of binary codes and hierarchical search," in Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015, pp. 499-502.
[56] M. H. Kiapour, X. Han, S. Lazebnik, A. C. Berg, and T. L. Berg, "Where to Buy It: Matching Street Clothing Photos in Online Shops," in ICCV, 2015, pp. 3343-3351.
[57] Q. Chen, J. Huang, R. Feris, L. M. Brown, J. Dong, and S. Yan, "Deep domain adaptation for describing people based on fine-grained clothing attributes," in Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on, 2015, pp. 5315-5324.
[58] C. Li, B. Xu, G. Wu, S. He, G. Tian, and H. Hao, "Recursive deep learning for sentiment analysis over social data," in Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT)-Volume 02, 2014, pp. 180-185.
[59] D. Tang, F. Wei, N. Yang, M. Zhou, T. Liu, and B. Qin, "Learning sentiment-specific word embedding for twitter sentiment classification," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2014, pp. 1555-1565.
[60] D. Stojanovski, G. Strezoski, G. Madjarov, and I. Dimitrovski, "Finki at SemEval-2016 Task 4: Deep learning architecture for Twitter sentiment analysis," in Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), 2016, pp. 149-154.
[61] J. Pennington, R. Socher, and C. Manning, "Glove: Global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532-1543.
[62] S. Poria, E. Cambria, and A. Gelbukh, "Deep convolutional neural network textual features and multiple kernel learning for utterance-level multimodal sentiment analysis," in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015, pp. 2539-2544.
[63] S. Poria, E. Cambria, and A. Gelbukh, "Aspect extraction for opinion mining with a deep convolutional neural network," Knowledge-Based Systems, vol. 108, pp. 42-49, 2016.
[64] Z. Guan, L. Chen, W. Zhao, Y. Zheng, S. Tan, and D. Cai, "Weakly-Supervised Deep Learning for Customer Review Sentiment Classification," in IJCAI, 2016, pp. 3719-3725.
[65] O. Araque, I. Corcuera-Platas, J. F. Sánchez-Rada, and C. A. Iglesias, "Enhancing deep learning sentiment analysis with ensemble techniques in social applications," Expert Systems with Applications, vol. 77, pp. 236-246, 2017.
[66] P. Badjatiya, S. Gupta, M. Gupta, and V. Varma, "Deep learning for hate speech detection in tweets," in Proceedings of the 26th International Conference on World Wide Web Companion, 2017, pp. 759-760.
[67] G. K. Pitsilis, H. Ramampiaro, and H. Langseth, "Detecting Offensive Language in Tweets Using Deep Learning," arXiv preprint arXiv:1801.04433, 2018.
[68] M. Alali, N. M. Sharef, H. Hamdan, M. A. A. Murad, and N. A. Husin, "Multi-layers Convolutional Neural Network for Twitter Sentiment Ordinal Scale Classification," in International Conference on Soft Computing and Data Mining, 2018, pp. 446-454.
[69] A. Severyn and A. Moschitti, "Learning to rank short text pairs with convolutional deep neural networks," in Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015, pp. 373-382.
[70] M. Tan, C. d. Santos, B. Xiang, and B. Zhou, "LSTM-based deep learning models for non-factoid answer selection," arXiv preprint arXiv:1511.04108, 2015.
[71] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: A survey," ACM Computing Surveys (CSUR), vol. 41, p. 15, 2009.
[72] D. M. Hawkins, Identification of outliers vol. 11: Springer, 1980.
[73] M. Ebrahimi, C. Y. Suen, and O. Ormandjieva, "Detecting predatory conversations in social media by deep convolutional neural networks," Digital Investigation, vol. 18, pp. 33-49, 2016.
[74] Y. Zhang and B. Wallace, "A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification," arXiv preprint arXiv:1510.03820, 2015.
[75] M. Ribeiro, A. E. Lazzaretti, and H. S. Lopes, "A study of deep convolutional auto-encoders for anomaly detection in videos," Pattern Recognition Letters, 2017.
[76] D. Xu, Y. Yan, E. Ricci, and N. Sebe, "Detecting anomalous events in videos by learning deep representations of appearance and motion," Computer Vision and Image Understanding, vol. 156, pp. 117-127, 2017.
[77] M. K. Hayat and A. Daud, "Anomaly detection in heterogeneous bibliographic information networks using co-evolution pattern mining," Scientometrics, vol. 113, pp. 149-175, 2017.
[78] Y. Feng, Y. Yuan, and X. Lu, "Learning deep event models for crowd anomaly detection," Neurocomputing, vol. 219, pp. 548-556, 2017.
[79] D. M. Powers, "Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation," 2011.
[80] A. G. Barnston, "Correspondence among the correlation, RMSE, and Heidke forecast verification measures; refinement of the Heidke score," Weather and Forecasting, vol. 7, pp. 699-709, 1992.
[81] P. Massa and P. Avesani, "Trust metrics in recommender systems," in Computing with social trust, ed: Springer, 2009, pp. 259-285.
[82] L. Qiao, H. Zhao, X. Huang, K. Li, and E. Chen, "A Structure-Enriched Neural Network for network embedding," Expert Systems with Applications, vol. 117, pp. 300-311, 2019.
[83] X. Han, C. Shi, S. Wang, P. S. Yu, and L. Song, "Aspect-Level Deep Collaborative Filtering via Heterogeneous Information Networks," in IJCAI, 2018, pp. 3393-3399.
[84] J. Tang and K. Wang, "Personalized top-n sequential recommendation via convolutional sequence embedding," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 2018, pp. 565-573.
[85] S. Liu, Q. Qu, and S. Wang, "Heterogeneous anomaly detection in social diffusion with discriminative feature discovery," Information Sciences, vol. 439, pp. 1-18, 2018.
[86] Y. Shi, M. Lei, H. Yang, and L. Niu, "Diffusion network embedding," Pattern Recognition, vol. 88, pp. 518-531, 2019.
[87] D. Tang, Z. Zhang, Y. He, C. Lin, and D. Zhou, "Hidden topic–emotion transition model for multi-level social emotion detection," Knowledge-Based Systems, vol. 164, pp. 426-435, 2019.
[88] R. K. Amplayo, S. Lee, and M. Song, "Incorporating product description to sentiment topic models for improved aspect-based sentiment analysis," Information Sciences, vol. 454, pp. 200-215, 2018.

MALIK KHIZAR HAYAT received his master's degree from International Islamic University, Islamabad (2016). He is an active member of the Data Mining and Information Retrieval Group, IIU, Islamabad, Pakistan. He has published his MS work in Scientometrics. His research interests include heterogeneous social networks, anomaly detection, Scientometrics, and deep learning.

ALI DAUD (M'76–SM'81–F'87) obtained his Ph.D. degree from Tsinghua University (July 2010). He is Associate Professor and head of the Data Mining and Information Retrieval Group, IIU, Islamabad, Pakistan. He has published about 70 papers in reputed international Impact Factor journals and conferences. He has taken part in many research projects and is Principal Investigator (PI) of two projects. His research interests include Data Mining, Social Network Analysis and Mining, Probabilistic Models, Scientometrics, and Natural Language

head of Computer Science and Artificial Intelligence Department (CSAI) as well as Vice Dean of College of Computer Science and Engineering (CCSE) at University of Jeddah, Jeddah, Saudi Arabia. He received his Ph.D. in cloud computing from the University of Southampton, Southampton, UK in February 2018. His research interests mainly span issues pertaining to Industry 4.0: Cloud Computing and Fog Computing Security, Internet of Things (IoT), Smart Cities, Intelligent Systems, Deep Learning, Data Science Analytics and Modelling. He has published numerous conference papers, journal papers, and one book chapter.

AMEEN BANJAR received his Ph.D. degree from the Faculty of Engineering and Information Technology, University of Technology, Sydney. Currently, he is working as an assistant professor at the Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Saudi Arabia. His research interests include network management, intelligent systems, machine learning, data science and analytics.

RABEEH AYAZ ABBASI received his Ph.D. in computer science from the University of Koblenz-Landau, Germany in 2010. He received the HEC-DAAD scholarship for his Ph.D. During his Ph.D., he worked on extracting and utilizing semantics in social media. Afterward, he worked as an assistant professor at Quaid-i-Azam University, Islamabad, Pakistan. Currently, he is working at the Department of Information Systems, King Abdulaziz University, Jeddah, Saudi Arabia. His current research interests include social media analytics and social network analysis.

YUKUN BAO (M’04) received the B.Sc., M.Sc., and Ph.D. degrees in management science and engineering from the Huazhong University of Science and Technology, China, in 1996, 1999, and 2002, respectively. Currently, he is an Associate Professor at the Department of Business and Management, School of Management, Huazhong University of Science and Technology, China. He has been the Principal Investigator for two research projects funded by the Natural Science Foundation of China and has served as a referee for several IEEE journals and international journals and as a PC member for several international academic conferences. His current research interests include time series modeling and forecasting, business intelligence, and data mining.

HUSSAIN DAWOOD received his MS and Ph.D. degrees in Computer Application Technology from Beijing Normal University, Beijing, China in 2012 and 2015, respectively. He is currently working as an Assistant Professor at the Department of Computer and Network Engineering, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia. His current research interests include image processing, pattern recognition, machine learning, and feature extraction.