
Abstract:

Deep learning plays a vital role in today's technology. Alongside its success in many other application domains, deep learning has also become popular for sentiment analysis in recent years. Social and video-sharing media are among the platforms most people use to express their feelings, thoughts, suggestions and opinions, via blog posts, status updates, websites, forums, online discussion groups and so on. Especially when sports such as cricket, soccer and football are played, many discussions take place on social media such as Twitter and on the respective forums, using their restricted vocabulary. The opinions expressed by a huge population may be phrased in different manners and notations, and they may carry different polarities, such as positive, negative, or both, regarding the current trend. So simply reading each opinion and drawing a conclusion is very difficult and time consuming. Notable work on opinion mining in different categories has been done for the English language, whereas sentiment analysis of the Bangla language in categories such as cricket is inadequate because of the scarcity of data. We have worked with the reusable "ABSA" dataset and extended it with 25% additional data. In this paper we present an implementation of a deep learning approach called Long Short-Term Memory (LSTM) for sentiment analysis of cricket matches. The accuracy of this approach exceeds that of all previous methods.

Introduction
With the massive improvement of technology, people all over the world express their feelings and opinions through various social media on a wide range of topics. The majority of these topics involve text data such as emails, chats, social media posts, surveys, articles and documents. These texts are usually difficult to analyze, understand and sort through. So it is an urgent task to summarize the unstructured data created by people on social media. In the field of NLP, analyzing such unstructured data has become a topic of enormous interest, known as Sentiment Analysis (SA). Sentiment analysis is the computational study of people's opinions or sentiments towards different entities such as products, services, organizations, individuals, issues, events, topics and so on. It is also a contextual mining of text that conveys the emotions, sentiments or suggestions of an individual, and it extracts subjective information to understand the social sentiment around a brand, product or service while monitoring text on the web. There is a lack of direct interaction between buyers and sellers on social media and web-based businesses, which makes it harder to monitor and analyze buyers' reviews of services or entities. It is also a tedious and time-consuming task to go through the reviews and comments of individual customers and figure out the sentiments. A great deal of research on sentiment analysis in English has been done by many authors, whereas only a little SA work on the Bengali language has been reported. Working on SA in Bengali is therefore a challenging task because of the shortage of resources. Almost 200 million people speak Bangla as their first language, of whom 160 million are Bangladeshi. Thus it is also a significant job to work on Sentiment Analysis in the Bengali language.

In this paper we are motivated to work on sentiment analysis of Bangladeshi cricket. Most people in Bangladesh love the game of cricket as they love their own religion []. The game creates a vibe in people's emotions during a match, and before and after it as well. Because of that, they express their sentiments and emotions on social media such as Facebook, Twitter, YouTube and so on. Hence this is an exciting field in which to analyze real people's sentiments and emotions about cricket. In this way, we are motivated to work on sentiment analysis of the cricket game.
Recurrent neural networks (RNN) are a type of neural network with recurrent connections that form a memory and make information persistent. In an RNN all the inputs are connected to each other, which makes it promising for sequential classification tasks such as sentiment analysis [ref 2]. RNNs have already been applied to speech recognition [], handwriting recognition [], natural language processing [] and so on. For sentiment analysis, RNNs show remarkable accuracy on various languages such as English [], Arabic [], Chinese [], Turkish [], Bengali [] etc. They have also frequently been used to classify sentiments on various topics such as Facebook statuses [], YouTube comments [], product reviews [], hotel reviews [], movie reviews [] and so on. But RNNs have trouble dealing with long-term dependencies. By introducing a memory into the network, Long Short-Term Memory (LSTM) can solve this long-term dependency problem [ref 1]. LSTM is a distinct type of RNN that is capable of learning long-term dependencies [ref 4]. The combined model of RNN and LSTM has also been used in SA where the problem of long-term dependencies arises. Thus we are strongly motivated to classify sentiment about Bangladeshi cricket through Bengali text analysis with an implementation of both RNN and LSTM.

Deep learning is a technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. In this paper we present an automated system based on deep learning to analyze sentiment about Bangladeshi cricket using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). Sentiments of various people have been extracted from different social media and news portals. These sentiments are then categorized, and the overall polarity of each comment is identified as positive, negative or neutral.

Related Work

Sentiment Analysis has been widely applied to the English language as well as other languages. A wide range of research has already been done in this area by applying many effective methods.

A Long Short-Term Memory (LSTM) based sentiment analysis has been proposed in []. The authors set out to show that a combination of RNN and LSTM can achieve better recall and accuracy than a conventional RNN. They used both English and Chinese texts for SA, where the Chinese text had to go through a segmentation process, as there are no word delimiters in Chinese text.

Multi-class sentiment classification based on deep neural networks is evaluated in [], where CNN and LSTM are used to capture the contextual information of the input texts. The authors combined the deep learning model with a one-versus-rest training mechanism in order to apply it to multi-class classification. To extract partial features from the texts they used a two-layer CNN model, and they report an accuracy of 78.42% on their dataset.

Another compelling piece of research was presented by Yaser et al. [] to automate sentiment analysis of tweets, using three datasets named IMDb, AMAZON and Airline. The authors took two approaches to sentiment analysis: machine learning approaches consisting of support vector machine, naive Bayes, decision tree and K-nearest neighbor, and a deep neural network approach consisting of a recurrent neural network with Long Short-Term Memory (LSTM).

For an in-depth analysis of predicting the sentiment of reviews, the authors of [] employed three types of RNN: vanilla RNNs, Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU). These networks were considered in both unidirectional and bidirectional form. The authors evaluated the performance of their proposed networks on the AMAZON dataset and on the sentiment analysis benchmark datasets SST-1 and SST-2, where they found GRU to be the best choice for achieving the highest accuracy.

A deep learning based study analyzing restaurant and movie reviews has been proposed in []. RNN-based Deep-learning Sentiment Analysis (RDSA) is employed to recommend the nearest or most suitable place according to an individual's requirements, by analyzing different reviews and tastes and then computing a score. The authors use a single Z-order to assist users in finding a precise place according to their requirements.

Regarding Sentiment Analysis of the Bengali language, several notable studies have already been carried out by respective authors.

In [], the authors present a model to classify the sentiment of Bengali text using a Recurrent Neural Network (RNN) with bidirectional LSTM (BiLSTM). They employed BiLSTMs to increase the amount of input information usable by the network, because in a bidirectional LSTM information flows both backward-to-forward and forward-to-backward through two hidden layers. 10,000 Facebook status comments were used as the dataset, on which their proposed model achieved 85.67% accuracy.

Bangla and Romanized Bangla Text (BRBT) sentiment analysis has been evaluated by Asif et al. [] using deep recurrent models. They used three kinds of fully connected neural network layers, where one and two of the layers were used to categorize positive and negative sentiments and a third set of output nodes was used for ambiguous sentiments.

To detect multilevel sentiment and emotions in Bengali YouTube comments [], the authors built a deep learning model using both CNN and LSTM architectures. To identify three-label (positive, negative, neutral) and five-label (strongly positive, positive, neutral, negative, strongly negative) sentiment as well as emotions, they considered Support Vector Machine (SVM) and Naive Bayes (NB) as their baseline methods. Term Frequency-Inverse Document Frequency (TF-IDF) with n-gram tokens was used to extract a set of features from each sentence. They obtained 65.97% and 54.24% accuracy for three- and five-label sentiments respectively.

A sentiment polarity detection approach using multinomial Naive Bayes and Support Vector Machine has been investigated by Kamal et al. []. Their proposed model was tested on a Bengali tweet dataset. They used an n-gram feature set and SentiWordNet for feature extraction.

In [], a comprehensive study of sentiment in Bengali text is presented by Md. Al-Amin, Md. Saiful Islam and Shapan Das Uzzal, in which, to detect sentiment more accurately, the cosine similarity between positive, negative and query vectors was computed using TF-IDF. A Naive Bayes model using unigrams and a stemmer was also applied to obtain good performance, and Hellinger PCA was used to determine the actual contexts of the words.

A character-level supervised Recurrent Neural Network approach is taken in [], where an RNN is used to classify Bengali sentiment into positive, negative and neutral categories.

Sentiment analysis of the Bangladeshi cricket game has been done by Shamsul et al. [], who used a Support Vector Machine (SVM) classifier to classify their text dataset. They built a dataset from the ABSA dataset, a Facebook group on Bangladesh cricket, and the sports section of the Prothom-Alo newspaper. After parsing and tokenizing all the words in the dataset, they used a TF-IDF vectorizer to vectorize the text data. Besides SVM they also used Decision Tree and Multinomial Naive Bayes classifiers. They used 10% of their dataset for testing and obtained 64% accuracy.
MODEL

In the previous section it was explained why RNN and LSTM work well for sentiment analysis. In this section a description of these two models is given. Both models are extensions of the standard neural network. The LSTM network is built upon the RNN, so the RNN is discussed first [1].

RNN
Recurrent neural networks (RNN) are a type of network that exploits the sequential nature of its input. Such inputs could be text, speech, time series, or anything else where the occurrence of an element in the sequence depends on the elements that appeared before it [2]. An RNN is a class of network that can predict future values. A recurrent neuron stores the state of a previous output and combines it with the current input, thereby preserving the relationship of the current input to the previous inputs.

Figure 1.1: A single recurrent neuron receiving input x and producing output y (left), and the same neuron unrolled through time over inputs x(t-3)..x(t) and outputs y(t-3)..y(t) (right).


A recurrent neural network looks very much like a feed-forward neural network, except that it also has connections pointing backward. Figure 1.1 (left) illustrates the simplest possible RNN, composed of just one neuron receiving input, producing an output, and sending that output back to itself. At each time step t, this recurrent neuron receives the input x(t) as well as its own output from the previous time step, y(t-1). We can represent this tiny network along the time axis, as illustrated in figure 1.1 (right). This is called unrolling the network through time. When the gradient is passed back through many steps, the network parameters are tuned by backpropagation through time.

Figure 1.2: Unrolled many-to-one RNN. Inputs X<1>..X<Tx> enter through weights Wax, the activations a<0>..a<Tx> are connected through recurrent weights Waa, and a single output Y<Tx> is produced through weights Wya.

Figure 1.2 illustrates the unrolled diagram of a recurrent neural network. Here a<0> represents the initial activation and a<t> is the activation of the network at time step t. We feed the network a sequence of inputs and ignore all outputs except the last one. In other words, this is a sequence-to-vector, or many-to-one, network. The network takes in a sentence (a sequence of words) and outputs a sentiment (positive, negative or neutral).
At time step one, the input X<1> is fed into the first cell, which computes its hidden activation; the output is ignored and the activation is passed on to the second time step. The second cell receives the second input X<2>, but instead of predicting something from X<2> alone, it also receives the information computed at time step one: in particular, the activation value from time step one is passed to time step two. The next cell then takes input X<3>, processes the information and passes it on, and so on up until the last time step X<Tx>, at which point a single output Y<Tx> is predicted.
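
As an illustration only (the paper does not show its implementation), the many-to-one architecture just described can be sketched in a few lines of Keras; the vocabulary size, embedding width, hidden size and the choice of the Keras API are assumptions, not the settings used in the experiments:

    # Minimal sketch of a many-to-one RNN sentiment classifier (illustrative values).
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, SimpleRNN, Dense

    model = Sequential([
        Embedding(input_dim=10000, output_dim=64),  # word indices -> dense vectors
        SimpleRNN(32),                              # returns only the last activation a<Tx>: many-to-one
        Dense(3, activation="softmax"),             # positive / negative / neutral
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

Because SimpleRNN returns only its final activation, the whole input sentence collapses into a single vector from which the three-way sentiment decision is made.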

LSTM
The LSTM is a variant of the RNN that is capable of learning long-term dependencies. LSTMs were first proposed by Hochreiter and Schmidhuber and have been refined by many other researchers. They work well on a large variety of problems and are the most widely used type of RNN.

Training an RNN involves backpropagation. Since the parameters are shared by all time steps, the gradient at each output depends not only on the current step but also on the previous states. This process is known as backpropagation through time (BPTT): the RNN is unfolded into a feed-forward network and trained by ordinary backpropagation. When the gradient is passed back through many steps, it tends to grow or vanish. This is known as the vanishing gradient problem in long-term tasks [3].
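
To see why, consider a toy illustration: a gradient that is scaled by the same sub-unit factor at every unrolled step decays exponentially (the factor 0.8 below is arbitrary, chosen only to demonstrate the effect):

    # Illustration of the vanishing gradient: a gradient repeatedly multiplied
    # by a recurrent weight with magnitude below one decays exponentially.
    g = 1.0
    for step in range(50):
        g *= 0.8       # |recurrent weight| < 1 at every unrolled step
    print(g)           # ~1.4e-5: almost no signal reaches the early time steps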

The LSTM cell is a specially designed unit of logic that reduces the vanishing gradient problem sufficiently to make recurrent neural networks more useful for long-term memory tasks such as text sequence prediction. It does so by maintaining an internal memory state that is simply added to the processed input, which greatly reduces the multiplicative effect of small gradients. The time dependence and the effect of previous inputs are controlled by a concept called the forget gate, which determines which states are remembered or forgotten. Two other gates, the input gate and the output gate, are also features of LSTM cells. Figure 2.1 illustrates the LSTM cell.

Figure 2.1: The LSTM cell. The cell state c runs along the top of the unit and the hidden state h along the bottom; sigmoid and tanh units implement the forget (f), input (i), new-memory (n) and output (o) gates [4].
In the above diagram the line across the top is the cell state c, which represents the internal memory of the unit. The line across the bottom is the hidden state, and f, i, n and o are the forget gate, input gate, new memory state and output gate respectively. This gate mechanism is how the LSTM cell works around the vanishing gradient problem; during training the LSTM learns the parameters of these gates. At time step t, the LSTM first decides what information to dump from or pass along the cell state. The decision is made by a sigmoid function called the "forget gate". The function takes the output from the previous time step, ht-1, and the current input, xt. The sigmoid function modulates the output of these gates between zero and one, so the output vector produced can be multiplied element-wise with another vector to define how much of the second vector passes through the first, where 1 means "completely keep" and 0 means "completely dump", as in the equation below:

ft = σ(Wf xt + Uf ht-1)

Next the LSTM decides what new information to store in the cell state. This has two steps. First a sigmoid function called the "input gate", as in the first equation below, decides which values the LSTM will update. Then a tanh function creates a vector of new candidate values, Ĉt, which could be added to the cell state. The LSTM combines these two to create an update to the state.

it = σ(Wi xt + Ui ht-1)
Ĉt = tanh(Wn xt + Un ht-1)
It is then time to update the old cell state Ct-1 into the new cell state Ct, as in the equation below. Note that the forget gate ft can control the gradient passed through it and allows for explicit "memory" dumps and updates, which helps alleviate the vanishing and exploding gradient problems of the RNN.

Ct = ft * Ct-1 + it * Ĉt
Finally, the LSTM decides the output, which is based on the cell state. The LSTM first runs a sigmoid layer called the "output gate", which decides which parts of the cell state to output, as in the first equation below. Then the LSTM passes the cell state through the tanh function and multiplies it by the output of the sigmoid gate, so that it only outputs the parts it decided to, as in the second equation below.

ot = σ(Wo xt + Uo ht-1)
ht = ot * tanh(Ct)
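
Read together, the four gate equations map directly onto code. The following is a minimal NumPy sketch of a single LSTM time step under the same simplification as the equations above (bias terms omitted); it illustrates the math, not the implementation used in our experiments:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W, U):
        """One LSTM time step following the equations above.
        W and U are dicts of input and recurrent weight matrices
        for the f, i, n and o gates (biases omitted, as in the text)."""
        f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev)    # forget gate
        i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev)    # input gate
        c_hat = np.tanh(W["n"] @ x_t + U["n"] @ h_prev)  # new candidate memory
        c_t = f_t * c_prev + i_t * c_hat                 # updated cell state
        o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev)    # output gate
        h_t = o_t * np.tanh(c_t)                         # new hidden state
        return h_t, c_t

Applying lstm_step over the embedded words of a sentence and keeping only the final h_t reproduces the many-to-one behavior described in the RNN section.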

DATASETS

Creating Data
We have worked with the reusable "ABSA" dataset and added 25% more data to it; this extended version of the previous ABSA dataset [5] is the one used in this paper. The ABSA dataset contains comments about Bangladeshi cricket, which is quite similar to the dataset we wanted to build. It contains 2,979 records with 5 columns. Such a small dataset is not suitable for LSTM-RNN sentiment analysis, because it causes a high-variance problem in deep learning. To overcome this issue, and thereby reduce the high-variance problem, we added more data to the existing ABSA dataset. The additional data was manually picked from various online resources, i.e. Facebook, YouTube, Prothom-Alo, BBC Bangla and Bdnews24.com. In doing so we tried to find a proper way to collect and prepare the main dataset, as data collection can be very time-consuming.

Data Preprocessing
Data preprocessing is an important stage in natural language processing. Real-world data is messy, not well cleaned, and often not suitable to feed to a neural network, so preprocessing is a key step before feeding real-world data into an LSTM-RNN for sentiment analysis. In our dataset the comments obtained from various online sources are noisy and often contain errors, unnecessary information and duplication, which makes them unsuitable for any natural language application as they stand.
Preprocessing of the text data is an essential step, as it is here that we make the text ready for mining. Without it the data would be very inconsistent and could not produce good analytical results. In preprocessing, punctuation and unimportant words are removed; words can be grouped, or stemmed to their roots; missing values can be replaced with defaults; and the text can be converted to a single case, mostly depending on the requirements of the application. We therefore process our data step by step, removing whatever carries little weight in the context of the text.

Stopwords:
Stopwords are the most common words in a language. The most common words, such as এবং, এবার, ওরা, কে, কেউ etc., have no impact on our model's predictions. But some words, such as না, নাই, নেই, নয়, have an important impact on negative sentiment, and words such as হ্যা, করে, কাজ, কাজে have an important impact on positive sentiment. We list these words as a whitelist and keep them out of the stopword list. Some tools deliberately avoid removing such stopwords in order to support neural networks. We removed the remaining stopwords from our extended dataset; there are multiple resources for Bangla stopword removal [1] [2].
[1] https://github.com/stopwords-iso/stopwords-bn
[2] https://www.nltk.org/book/ch02.html
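
A minimal sketch of the whitelist-aware filtering described above; the small stopword set below is only a stand-in for the full stopwords-iso/stopwords-bn list linked in [1]:

    # Stand-in for the stopwords-bn list; in practice it is loaded from the file in [1].
    stopwords = {"এবং", "এবার", "ওরা", "কে", "কেউ", "না", "নাই", "নেই", "নয়", "হ্যা"}

    # Words kept despite appearing in stopword lists, because they carry sentiment.
    whitelist = {"না", "নাই", "নেই", "নয়", "হ্যা", "করে", "কাজ", "কাজে"}

    effective_stopwords = stopwords - whitelist

    def remove_stopwords(tokens):
        """Drop common words while preserving the sentiment-bearing whitelist."""
        return [w for w in tokens if w not in effective_stopwords]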

Text Process:
Links, URLs, user tags, emoticons, mentions, hash-tags and punctuation marks were removed from the comments to give the annotators unbiased text, with only the content on which to make a decision between the three categories: positive, negative and neutral.
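
This cleaning step can be sketched with a few regular expressions; the patterns below are illustrative assumptions rather than the exact rules used:

    import re

    def clean_comment(text):
        """Strip links, mentions, hashtags and punctuation, leaving only
        the content the annotators see."""
        text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # links and URLs
        text = re.sub(r"[@#]\S+", " ", text)                # mentions and hashtags
        text = re.sub(r"[^\w\s\u0980-\u09FF]", " ", text)   # punctuation and emoticon symbols (Bangla block kept)
        return re.sub(r"\s+", " ", text).strip()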

Name Process:
Name processing is another text data compression technique, in which essentially every Bangla proper noun is substituted by a common word. It does not affect the accuracy of the model, but it compresses the dataset. For the sentiment analysis of this dataset we replaced every country name, such as (বাংলাদেশ, ভারত, পাকিস্তান, অস্ট্রেলিয়া), by the single word "দেশ", and every player name, such as (তামিম, মাশরাফি, সাকিব, বিরাট কোহলি), by the single word "সে". We also replaced alternative spellings and nicknames of the players by "সে". Some examples are given below.

Different Name Substitution

সাকিব আল হাসান / সাকিব / শাকিব আল হাছান / শাকিব / ক্যাপ্টেন → সে
মাশরাফি বিন মর্তুজা / মাশরাফি / ম্যাশ / মর্তুজা / ক্যাপ্টেন → সে
তামিম ইকবাল / তামিম → সে
ছাব্বির রুম্মান / ছাব্বির রহমান / সাব্বির রহমান → সে
মুশফিকুর রহিম / মুশফিক → সে
লিটন দাস / লিটন → সে
Consider these examples:

"বাংলাদেশ খুব ভালো খেলে" - positive sentiment → "দেশ খুব ভালো খেলে" - positive sentiment
"লিটন দাস ব্যাটিং স্টাইল খুব সুন্দর" - positive sentiment → "সে ব্যাটিং স্টাইল খুব সুন্দর" - positive sentiment
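
A minimal sketch of this substitution, covering only the names from the table above (a real mapping would be much larger); matching longer names first prevents a nickname from breaking a full name:

    # Illustrative subsets of the name lists; extended in practice.
    player_names = {"সাকিব আল হাসান", "সাকিব", "মাশরাফি", "তামিম ইকবাল", "তামিম",
                    "মুশফিক", "লিটন দাস", "লিটন"}
    country_names = {"বাংলাদেশ", "ভারত", "পাকিস্তান", "অস্ট্রেলিয়া"}

    def substitute_names(text):
        # Longest names first, so "লিটন দাস" is replaced before "লিটন".
        for name in sorted(player_names, key=len, reverse=True):
            text = text.replace(name, "সে")
        for name in country_names:
            text = text.replace(name, "দেশ")
        return text

    print(substitute_names("লিটন দাস ব্যাটিং স্টাইল খুব সুন্দর"))
    # -> "সে ব্যাটিং স্টাইল খুব সুন্দর"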

Manual validation:
The collected data samples were manually annotated into one of three categories: (1) positive, (2) negative and (3) neutral. Each additional data sample was manually annotated by three different native Bangla-speaking individuals, for a total of three validations [6]. Each annotator validated the data without knowing the decisions made by the others, which ensures that the validations are unbiased and personal.
Elongated words often carry extra sentiment information for multiclass categorization. For example, "বাহহহহ অনেক ভালো!!" certainly conveys a stronger positive feeling. Therefore, instead of applying lemmatization, we kept the elongated words intact [7].

Feature Extraction
The dataset requires feature extraction before it can be used with the prediction model. The text must be parsed and tokenized beforehand. Tokenization is the process of splitting a sequence of strings, or a sentence, into smaller parts such as words. In order to implement a deep learning based model, we need to represent each word of a sentence as a vector. To do this we take each sentence of the dataset and tokenize it with the NLTK library. Then we apply the word2vec algorithm to obtain a proper vector representation of each word. Word2vec is an efficient algorithm for learning word embeddings from a text corpus. Word embeddings are a way to transform the words of a text into numerical vectors so that they can be analyzed by standard deep learning algorithms, which require vectors as numerical input. Word2vec takes as input a large corpus of text and produces a vector space of words.
We have implemented both the Continuous Bag of Words (CBOW) and the Skip-gram models from [8]. In the CBOW architecture, the model predicts the center word given a window of surrounding words; thus it predicts the center word from the context words. In contrast, in the Skip-gram architecture the model predicts the surrounding words given the center word. The embedding weights are then extracted from the word weight matrices (CBOW and Skip-gram) to generate the vector of each word, which is fed to the LSTM network.
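
A hedged sketch of this embedding step using gensim and NLTK; the two toy comments and all hyperparameters below are placeholders, not our actual corpus or settings:

    from gensim.models import Word2Vec
    from nltk.tokenize import word_tokenize  # may require nltk.download("punkt")

    # Stand-in for the preprocessed dataset.
    comments = ["দেশ খুব ভালো খেলে", "সে ব্যাটিং স্টাইল খুব সুন্দর"]
    sentences = [word_tokenize(c) for c in comments]

    # sg=0 trains CBOW (predict center word from context);
    # sg=1 trains Skip-gram (predict context from center word).
    cbow = Word2Vec(sentences, vector_size=100, window=5, sg=0, min_count=1)
    skipgram = Word2Vec(sentences, vector_size=100, window=5, sg=1, min_count=1)

    vector = cbow.wv["দেশ"]  # one 100-dimensional word vector, ready for the LSTM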
References

1. Sentiment Analysis with Long Short-Term Memory Networks
2. A Deep Recurrent Neural Network with BiLSTM Model for Sentiment Classification
3. Deep Learning with Keras, by Antonio Gulli and Sujit Pal, page 260
4. Deep Learning for Sentiment Analysis: A Survey
5. Sentiment Analysis on Bangladesh Cricket with Support Vector Machine
6. Sentiment Analysis on Bangla and Romanized Bangla Text (BRBT) using Deep Recurrent Models
7. Detecting Multilabel Sentiment and Emotions from Bangla YouTube Comments
8. Learning Representations of Text using Neural Networks, by T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, J. Dean, Q. Le and T. Strohmann, NIPS 2013
