
A Lexicon-Based Unsupervised Model to Evaluate Product Ratings vs. Reviews

Mahapara G.; Taiba N.; Ramamani V.


RACE Students, REVA University
Bengaluru, India
mahapara.ba05@reva.edu.in, taiba.ba05@reva.edu.in, ramamani.ba05@reva.edu.in

Abstract

With the rapid advancement of web technology, a huge amount of data is available to
internet users. Much of it comes from social media, where millions of people express
their thoughts, sentiments and opinions in their daily interactions. For e-commerce
shoppers in particular, customer reviews are a reliable source of information: a
customer usually searches for online product reviews while evaluating alternative
products. E-commerce websites let customers write product reviews and score the
product from 1 to 5, commonly referred to as the star rating. These data are very
useful for companies seeking to improve the customer experience. By analysing
customer feedback, companies gain better information for strategic decisions and a
more accurate understanding of what the customer actually wants, which results in a
better experience for everyone.

Sentiment analysis is used to automate the analysis of such data. It is a rapidly
emerging research area within Natural Language Processing (NLP) [1]. Sentiment
analysis depends on our ability to identify the sentiment-bearing terms in a corpus
and their orientation [2]. It can provide insights by automatically analysing product
reviews and separating them into tags: Positive, Neutral, and Negative [3].
Essentially, there are two approaches to extracting sentiment automatically. The
first is the classification approach, a supervised method that builds a classifier
from labelled instances of texts or sentences. The second is the lexicon-based
approach, which derives the orientation of a document from the semantics of its words
or phrases and is largely unsupervised [4]. Much work has already been done using
different machine learning techniques. In this work we propose a novel lexicon-based
unsupervised model that differs from existing models in the way it aggregates the
sentiment values of positive and negative words within a message.

In this paper we investigate whether the sentiment of product reviews differs from
the product ratings given by customers on the Amazon website. We use text mining to
summarize users’ reviews and extract the sentiment of the review writers. We then use
sentiment lexicons to mark up all sentiment words and associated entities in our
corpus: a dictionary of words annotated with their semantic orientation, or polarity,
from which we derive the orientation of each review. Ratings of 4 and 5 are treated
as positive sentiment and ratings from 1 to 3 as negative. We then compare the
ratings with the sentiment of the reviews to find out whether the two are associated.
Consequently, a more comprehensive analysis can be undertaken than a simple
positive/negative/neutral classification.

Key Words: Sentiment analysis; Natural Language Processing; Text Mining; Corpus;
Lexicon.

1. Introduction
Sentiment analysis is a line of research that aims to determine people’s attitudes and
opinions about different topics, products, services, events, and their attributes.
The role of sentiment analysis has grown significantly with the rapid spread of social
networks, microblogging applications and forums. Today, almost every web page has a
section where users can leave comments about products or services and share them with
friends on Facebook, Twitter or Pinterest, something that was not possible just a few
years ago. Mining this volume of opinions provides information for understanding
collective human behaviour. A growing body of evidence suggests that analysing the
sentiment of social-media content may make it possible to predict the size of markets,
the results of marketing campaigns and marketing ROI.

2. Research Methodology

2.1 Sentiment Analysis Methodology

The two main methods of sentiment analysis, the lexicon-based method and the machine
learning based approach, both rely on the bag-of-words representation. In the
supervised machine learning method, the classifiers use unigrams as features. In the
lexicon-based method, the unigrams found in the lexicon are assigned a polarity score,
and the overall polarity score of the text is then computed as the sum of the
polarities of its unigrams. In recent years, more advanced algorithms for sentiment
analysis have been developed that take into consideration not only the message itself
but also the context in which it is published: who the author is, who the author’s
friends are, and what the underlying structure of the network is.

The lexicon-based model follows the steps below to extract the sentiment of the texts.
2.1.1 Data Collection
For our work we collected online reviews of the OnePlus 7 Pro mobile phone available
through Amazon.com. Review data on Amazon.com is provided on the product’s page, along
with the general product review and the star rating provided by the customers. We
retrieved the pages containing all customer reviews for this phone. Our first
criterion for selecting this particular product was that it had a relatively large
number of reviews compared with other products in its category. We then scraped the
data using the ParseHub tool, available online, to get the desired dataset.
Figure 1. Flow Diagram of the Work to be Performed
A total of 2000 reviews were collected, along with the respective star ratings given
by the customers who bought the phone. Each web page containing a set of reviews was
parsed to remove the HTML formatting from the text and then transformed into an XML
file that separated the data into records (the reviews) and fields (the data in each
review).
We excluded from the analysis reviews for which no one had voted on whether the review
was helpful.

2.1.2 Data Exploration

Python was used for exploration and understanding of the data. From a cursory look at
the data collected, we found the product to be quite popular: over 81% of the reviews
were positive (4 or 5 out of 5 stars). However, the ratings given may sometimes be
misleading, because the actual sentiment of a review may differ from its star rating.
Our aim is therefore to compare the star ratings with the actual sentiment of the
reviews and find out whether there is a major mismatch between the two. Working
towards this goal, we analysed the star ratings associated with the reviews.
Figure 2. Percentage of Total Star Ratings
Here we categorize reviews with star ratings “>3” as “Positive” and reviews with star
ratings “<3” as “Negative”. The table below shows the rating categories.

Table 1. Reviews and their star ratings, with the rating category (“Positive” or
“Negative”) each falls into
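The rating categories can be sketched as a small mapping function. This is an illustration only: the function name is hypothetical, and because the text leaves 3-star reviews unspecified in this section, the sketch groups them with the negatives as the abstract does.

```python
def rating_category(stars):
    """Map a 1-5 star rating to the category used for comparison."""
    if stars > 3:
        return "Positive"
    # Assumption: 3-star reviews are grouped with negatives, as in the abstract.
    return "Negative"


labels = [rating_category(s) for s in [5, 4, 3, 2, 1]]
print(labels)
```

If 3-star reviews should instead be dropped or treated as neutral, only the final branch needs to change.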

2.1.3 Data Pre-processing

We perform pre-processing steps before the actual sentiment analysis methods are
applied. A typical pre-processing procedure includes the following steps:

Tokenization or Bag-of-Words Creation: Tokenization is the process of splitting a
sentence into words. The incoming string is broken into tokens comprising words and
other elements, for example URL links. The common separator for identifying individual
words is whitespace; however, other symbols can also be used.

Text
Data Science is Fun.

Tokens
“Data”, “Science”, “is”, “Fun”

Figure 3. Tokenization of a Sentence
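A minimal sketch of this tokenization step, using only the Python standard library, is given here. The regular expression and the function name are assumptions made for illustration; the paper does not state which tokenizer was used.

```python
import re

# URLs are matched first so that they survive as single tokens,
# then runs of letters, digits and apostrophes are taken as words.
TOKEN_PATTERN = re.compile(r"https?://\S+|[A-Za-z0-9']+")


def tokenize(text):
    """Split a string into word and URL tokens, dropping punctuation."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("Data Science is Fun."))  # ['Data', 'Science', 'is', 'Fun']
```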


Stemming and lemmatisation: The goal of both stemming and lemmatisation is to reduce
inflectional forms, and sometimes derivationally related forms, of a word to a common
base form. Stemming and lemmatising therefore help us reduce the overall number of
terms to certain “root” terms.
The dimensionality of the BOW is reduced when different words, such as read, reader
and reading, are mapped to the single word read and counted together. However, one
should be careful when applying stemming, since it can introduce bias. For example,
stemming may merge the distinct words experiment and experience into a single term,
while words which ought to be merged (such as “adhere” and “adhesion”) may remain
distinct after stemming. These are examples of over-stemming and under-stemming errors
respectively. Over-stemming lowers precision and under-stemming lowers recall. Python’s
NLTK library provides several stemmers for different languages. For our purpose we
used “PorterStemmer”, one of the oldest and most commonly used English-language
stemmers, which uses suffix stripping to produce stems.
NLTK also provides the WordNet Lemmatizer, which uses the WordNet database to look up
the lemmas of words. We downloaded the WordNet corpus with the NLTK downloader before
using the WordNet Lemmatizer to bring the words in our review texts to their base form.
Once this cleaning of the review text was completed, we joined the words back into
sentences. The resulting data was simple and easy to analyse.
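To make the idea of suffix stripping concrete, here is a deliberately naive sketch. It is not the Porter algorithm and not the NLTK implementation used in this work; the suffix list, the minimum stem length and the function name are all assumptions chosen so that the read/reader/reading example collapses to one term.

```python
def toy_stem(word, suffixes=("ing", "er", "s"), min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len letters."""
    word = word.lower()
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


stems = [toy_stem(w) for w in ["read", "reader", "reading"]]
print(stems)  # ['read', 'read', 'read']
```

A rule set this crude shows both error types quickly, which is why a tested stemmer or a dictionary-based lemmatizer is preferable in practice.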
Stop-words removal: Stop words are words which serve a connecting function in the
sentence, such as prepositions and articles. There is no definitive list of stop
words, but some search engines use the most common short function words, such as the,
is, at, which, and on. They can be removed since they occur with high frequency in the
text but do not affect the final sentiment of the sentence. For our work we downloaded
the stopwords corpus using nltk.download() and used it to strip these unnecessary
words from our review text. We also removed handles (@), numbers, URLs, emojis and any
other special characters so that only text remained. The next step was to strip the
records, create a word list for each text, and remove words that do not contribute to
the analysis, such as ‘star’, ‘phone’, ‘pro’, ‘one’, ‘oneplus’, ‘plus’ and ‘review’.
Lowering the Case of the Letters: All tokens are converted to lower case.
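The cleaning steps described above (lower-casing, removing URLs, handles, numbers and special characters, then dropping stop words and uninformative product words) might look roughly as follows. The stop-word set here is a short hypothetical stand-in for the NLTK stopwords corpus, and the function name is an assumption.

```python
import re

# Hypothetical, deliberately short stop-word list; the study used the NLTK corpus.
STOP_WORDS = {"the", "is", "at", "which", "on", "a", "an", "and", "this", "it"}
# Product-specific words reported as not contributing to the analysis.
DOMAIN_WORDS = {"star", "phone", "pro", "one", "oneplus", "plus", "review"}


def clean_review(text):
    """Return the lower-cased content tokens of a review."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove handles
    text = re.sub(r"[^a-z\s]", " ", text)      # remove digits, emojis, punctuation
    return [t for t in text.split() if t not in STOP_WORDS and t not in DOMAIN_WORDS]


print(clean_review("The OnePlus 7 Pro camera is AMAZING!!! see https://example.com @amazon"))
# ['camera', 'amazing', 'see']
```

URLs and handles are removed before the character filter so that their fragments do not survive as stray tokens.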
Part-of-Speech Tagging (POS): Part-of-speech tagging automatically tags each word of a
text with the part of speech it belongs to: noun, pronoun, adverb, adjective, verb,
interjection, intensifier etc. The goal is to extract patterns by analysing the
frequency distributions of these part-of-speech tags and to use them as features in
the classification process.
The data used for this study is the Amazon product dataset consisting of customer
reviews. The data has three class labels: positive, negative and neutral. A
visualization of the frequency of each class label is as follows:

2.1.4 Bag of Words

A bag-of-words model, or BOW for short, is a way of extracting features from text for
use in modelling, such as with machine learning algorithms. The approach is very
simple and flexible, and can be used in a myriad of ways for extracting features from
documents.

These words are the unique words occurring in our text document and will be used as
the features for our lexicon-based model. The same features are used later for
training our machine learning algorithms.

Figure 4. Occurrence of unique words

The model is only concerned with whether known words occur in the document, not
where in the document.
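A small sketch of this representation is shown here: a vocabulary of unique words is built from tokenised documents, and each document becomes a vector that records only whether each vocabulary word is present. The function names and the two example documents are made up for illustration.

```python
def build_vocabulary(documents):
    """Return the sorted list of unique tokens across all tokenised documents."""
    return sorted({token for doc in documents for token in doc})


def binary_bow(doc, vocabulary):
    """1 if the vocabulary word occurs in the document, else 0; order is ignored."""
    present = set(doc)
    return [1 if word in present else 0 for word in vocabulary]


docs = [["camera", "great", "battery"], ["battery", "poor"]]
vocab = build_vocabulary(docs)
vectors = [binary_bow(d, vocab) for d in docs]
print(vocab)    # ['battery', 'camera', 'great', 'poor']
print(vectors)  # [[1, 1, 1, 0], [1, 0, 0, 1]]
```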

2.1.5 Scoring Words


Once a vocabulary has been chosen, the occurrence of words in the documents needs to
be scored. Some additional simple scoring methods include:

Counts. Count the number of times each word appears in a document.

Frequencies. Calculate the frequency with which each word appears in a document out of
all the words in the document. In this approach we look at the histogram of the words
within the text, i.e. we consider each word count as a feature.

Figure 5. Most Frequent Words in the Text Document
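The two scoring methods can be sketched with the standard library's Counter. The token list is invented for illustration and the function names are assumptions.

```python
from collections import Counter


def word_counts(tokens):
    """Counts: number of times each word appears in the document."""
    return Counter(tokens)


def word_frequencies(tokens):
    """Frequencies: each word's count divided by the total number of words."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


tokens = ["good", "camera", "good", "battery"]
counts = word_counts(tokens)
freqs = word_frequencies(tokens)
print(counts.most_common(1))  # [('good', 2)]
print(freqs["good"])          # 0.5
```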

3. Consistency Assumption Verification


The following factors can affect user and product reviews in terms of sentiment (e.g. 1-5
star ratings) and text, and can be verified on review datasets. We argue that the influences
of users and products include the following four aspects.
• User-sentiment consistency: A user has a specific preference when giving sentiment ratings.
Some users favour higher ratings such as 5 stars, and some tend to give lower ratings. In other
words, sentiment ratings from the same user are more consistent than those from different users.
• Product-sentiment consistency: Similarly to user-sentiment consistency, a product has its own
“preference” for receiving a certain average rating on account of its overall quality. Sentiment
ratings towards the same product are more consistent than those towards different products.
• User-text consistency: A user tends to use personalized sentiment words when expressing
opinion polarity or intensity. For example, a strict user might use “good” to express an excellent
attitude, while a lenient user may use “good” to evaluate an ordinary product.
• Product-text consistency: Similarly to user-text consistency, a product has a collection of
product-specific words suited to evaluating it. For example, people prefer “sleek” and “stable”
when evaluating a smartphone, but “wireless” and “mechanical” when evaluating a keyboard.

4. Lexicon-based classification

Applying a lexicon is one of the two main approaches to sentiment analysis. It involves
calculating the sentiment from the semantic orientation of the words or phrases that
occur in a text. In this unsupervised technique, classification is done by comparing the
features of a given text against sentiment lexicons whose sentiment values are determined
prior to their use. A sentiment lexicon contains lists of words and expressions used to
express people’s subjective feelings and opinions. This approach requires a dictionary of
positive and negative words, with a positive or negative sentiment value assigned to each
word. Generally speaking, in lexicon-based approaches a text message is represented as a
bag of words. Following this representation, sentiment values from the dictionary are
assigned to all positive and negative words or phrases within the message. A combining
function, such as sum or average, is applied to make the final prediction of the overall
sentiment of the message. Apart from the sentiment value, the local context of a word,
such as negation or intensification, is usually taken into consideration.

The lexicon-based technique for sentiment analysis is unsupervised because it does not
require prior training in order to classify the data. The basic steps of the
lexicon-based technique are outlined below [9]:
1. Pre-process each text (i.e. remove HTML tags, noisy characters).
2. Initialize the total text sentiment score: s ← 0.
3. Tokenize the text. For each token, check if it is present in a sentiment dictionary.
   (a) If the token is present in the dictionary,
       i. If the token is positive, then s ← s + w, where w is the token’s sentiment weight.
       ii. If the token is negative, then s ← s − w.
4. Look at the total text sentiment score s,
   (a) If s > threshold, then classify the text as positive.
   (b) If s < threshold, then classify the text as negative.
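The steps above can be sketched in a few lines of Python. The mini-lexicon, its weights, the default threshold of 0 and the function name are all hypothetical and stand in for a real sentiment dictionary; the steps also do not say what happens when s equals the threshold, so the sketch labels that case neutral as an assumption.

```python
import re

# Hypothetical mini-lexicon: word -> (polarity, weight w). Not a real dictionary.
LEXICON = {
    "good": ("positive", 3),
    "great": ("positive", 3),
    "love": ("positive", 3),
    "bad": ("negative", 3),
    "poor": ("negative", 2),
    "terrible": ("negative", 3),
}


def lexicon_sentiment(text, lexicon=LEXICON, threshold=0):
    """Return (score, label) for a text following the four steps listed above."""
    text = re.sub(r"<[^>]+>", " ", text.lower())  # step 1: strip HTML tags
    score = 0                                     # step 2: s <- 0
    for token in re.findall(r"[a-z']+", text):    # step 3: tokenize and look up
        if token in lexicon:
            polarity, weight = lexicon[token]
            score += weight if polarity == "positive" else -weight
    if score > threshold:                         # step 4: compare with threshold
        return score, "positive"
    if score < threshold:
        return score, "negative"
    return score, "neutral"  # assumption: the tie case is not specified in the steps


print(lexicon_sentiment("<p>Great camera, good battery</p>"))  # (6, 'positive')
print(lexicon_sentiment("Terrible screen and poor sound"))     # (-5, 'negative')
```

Summation is used as the combining function here; an average over matched tokens would be an equally valid choice and would reduce the influence of review length.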
In our work we decided to apply a lexicon-based approach in order to avoid the need to
generate a labelled training set. The main disadvantage of machine learning models is
their reliance on labelled data, and it is extremely difficult to ensure that sufficient,
correctly labelled data can be obtained. Besides this, the fact that a lexicon-based
approach can be more easily understood and modified by a human is a significant advantage
for our work. We found it easier to generate an appropriate lexicon than to collect and
label a relevant corpus. The customer review data for the product is pulled from the
shopping site Amazon. For the sentiment analysis we use the dictionary-based approach.
The idea is to first collect a small set of opinion words with known orientations
manually, and then to grow this set by searching in AFINN, a list of words rated for
valence with an integer between minus five (negative) and plus five (positive). Sentiment
analysis is performed by cross-checking the string tokens (words, emojis) against the
AFINN list and retrieving their respective scores. Below is a sample of the sentiment
scores and polarities we obtained:
Table 2. Product reviews with their respective star ratings, sentiment scores and the
category each sentiment falls into (in our work, a sentiment score greater than 3 is
considered “Positive” and a score below 3 is considered “Negative”).
When we reviewed and compared the sentiment scores with the actual reviews and the star
ratings, we saw many discrepancies.
This prompted us to use machine learning techniques such as KNN and Random Forest to see
whether we could analyse the sentiment of the review text better than we had with the
unsupervised method.
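One simple way to quantify the mismatch between star ratings and lexicon scores is the share of reviews on which the two categories agree. The sketch below uses the cutoffs stated in the text; the function names are hypothetical, boundary values (a rating or a score of exactly 3) are assigned to “Negative” as an assumption, and the sample values are invented rather than taken from the study.

```python
def rating_category(stars):
    """Star ratings above 3 are Positive; assumption: 3 and below are Negative."""
    return "Positive" if stars > 3 else "Negative"


def sentiment_category(score, cutoff=3):
    """Sentiment scores above the cutoff are Positive; assumption: the rest Negative."""
    return "Positive" if score > cutoff else "Negative"


def agreement_rate(star_ratings, sentiment_scores):
    """Fraction of reviews whose rating category matches the sentiment category."""
    pairs = list(zip(star_ratings, sentiment_scores))
    if not pairs:
        raise ValueError("need at least one review")
    matches = sum(rating_category(s) == sentiment_category(v) for s, v in pairs)
    return matches / len(pairs)


# Made-up illustrative values, not data from the study.
stars = [5, 5, 1, 4]
scores = [6, 1, -4, 2]
rate = agreement_rate(stars, scores)
print(rate)  # 0.5
```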

5. Machine Learning Algorithms


K-Nearest-Neighbours (KNN) is a non-parametric supervised classification algorithm which
is simple yet effective in many cases. The KNN classifier is among the most popular
classifiers for pattern recognition because of its simplicity and effective performance.
The KNN algorithm classifies by analogy, i.e. by comparing the unknown data point with
the training data points to which it is most similar. Similarity is measured by Euclidean
distance.
To use machine learning models, we had to divide the data into train and test samples. We
checked our results with four different splits of train and test samples (60:40, 70:30,
80:20, 90:10). We then tried the KNN model with different values of K. The performance of
the model was analysed using precision, recall and accuracy. An accuracy of 63% was the
best we could achieve using the KNN model.

Figure 6. Result obtained from the KNN algorithm; the accuracy achieved was 63%.
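The KNN procedure described above (Euclidean distance, majority vote among the K nearest training points, evaluation on a held-out split) can be sketched in pure Python. This is an illustration, not the implementation used in the study: the function names are hypothetical, the feature vectors are invented word counts over a two-word vocabulary, and the split simply takes the last rows as the test set on the assumption that the data has already been shuffled.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def split_train_test(X, y, test_fraction=0.2):
    """Hold out the last test_fraction of the rows (assumes pre-shuffled data)."""
    cut = int(round(len(X) * (1 - test_fraction)))
    return X[:cut], X[cut:], y[:cut], y[cut:]


# Made-up count vectors over the vocabulary ["good", "bad"].
X = [[2, 0], [3, 0], [1, 0], [0, 2], [0, 3], [0, 1], [2, 1], [0, 2], [3, 1], [1, 3]]
y = ["positive", "positive", "positive", "negative", "negative",
     "negative", "positive", "negative", "positive", "negative"]

X_train, X_test, y_train, y_test = split_train_test(X, y, test_fraction=0.2)
predictions = [knn_predict(X_train, y_train, q, k=3) for q in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(predictions, accuracy)
```

Trying several values of k and several split ratios, as described in the text, amounts to calling these functions in a loop and comparing the resulting accuracies.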
Random forests, formally proposed in 2001 by Leo Breiman and Adèle Cutler, are a family
of machine learning techniques. The algorithm combines the concepts of random subspaces
and “bagging”: it trains multiple decision trees on slightly different subsets of the
data. Random forest is one of the best-performing classification algorithms and is able
to classify large amounts of data accurately. It is an ensemble learning method for
classification and regression that constructs a number of decision trees at training
time, each using randomly selected features, and predicts the class of a test instance
as the mode of the classes output by the individual trees, i.e. by voting.
For our study we also adopted Random Forest to see whether we could achieve better
accuracy in predicting the polarity of the review texts. RF is not very sensitive to its
input parameters; thus, we used the default parameters for each classifier. The trained
classifiers return scores between 0 and 1, and these scores are then mapped to ‘negative’
or ‘positive’. For each combination, the existence of the element is considered positive
(P) or negative (N). The classification metrics considered for the sentiment analysis are
Accuracy, Precision, Recall and F-Measure, and these are evaluated based on the
calculated positivity and negativity of reviews by the proposed hybrid approach. With the
Random Forest method we were able to achieve 62% accuracy.
Figure 7. Result obtained from the RandomForest algorithm; the accuracy achieved was 62%.
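The four metrics named above follow directly from the counts of true and false positives and negatives. The sketch below computes them from two label lists; the function name and the example labels are assumptions for illustration and are not results from the study.

```python
def classification_metrics(y_true, y_pred, positive="positive"):
    """Accuracy, precision, recall and F-measure for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Made-up labels for illustration only.
y_true = ["positive", "positive", "positive", "negative", "negative"]
y_pred = ["positive", "positive", "negative", "positive", "negative"]
metrics = classification_metrics(y_true, y_pred)
print(metrics["accuracy"])  # 0.6
```

Reporting precision and recall alongside accuracy matters here because the dataset is imbalanced: with over 81% positive reviews, a classifier that always predicts positive would already reach a high accuracy.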

6. Conclusion
The interest in sentiment analysis as a field of research is growing rapidly. It has been
shown that transforming the huge volume of textual data from the web into meaningful
information can be very useful. However, accurate opinion extraction remains challenging.
Often the sentiment of a review does not match the star rating provided by the customer.
This may affect the business of online shopping sites: online reviews have become a
reference point for buyers across the globe, many people trust them when making
purchasing decisions, and star ratings give the overall picture of customers’ experience.
It is important to address the difference between the sentiment of the reviews and the
star ratings because many customers use rating filters to simplify their searches, so a
product whose average star rating falls below 3 would be considered unimpressive.
Conversely, a review may be unfair or even false and yet carry a rating of 5, in which
case the product would be considered among the best. We undertook this work to address
such fake and biased reviews. The machine learning algorithms KNN and RandomForest
achieved accuracies of 63% and 62% respectively, which suggests that there is an
inconsistency between the sentiment scores of the reviews and the star ratings.
7. Future Work

There is a lot of scope in analysing videos and images on the web. Nowadays, with the
advent of Facebook, Instagram and Vine videos, people express their thoughts with
pictures and videos along with text.

Sentiment analysis will have to keep pace with this change. Tools which help companies
change strategies based on Facebook and Twitter will also have to take into account the
number of likes and re-tweets that a post generates on social media.

Many people follow and unfollow other people and comment threads on social media but
never comment themselves, so there is scope in analysing these aspects of the web as well.

References
1. Maha Al-Ghalibi, Adil Al-Azzawi and Kai Lawonn (2019). NLP based sentiment analysis
for Twitter's opinion mining and visualization.
2. Muhammad Taimoor Khan, Mehr Durrani, Armughan Ali, Irum Inayat, Shehzad Khalid and
Kamran Habib Khan (2016). Sentiment analysis and the complex natural language. Complex
Adapt Syst Model.
3. Xing Fang and Justin Zhan (2015). Sentiment analysis using product review data.
Journal of Big Data.
4. Prabu Palanisamy, Vineet Yadav and Harsha Elchuri (2013). Simple and Practical lexicon
based approach to Sentiment Analysis.
5. Hitesh Parmar, Sanjay Bhanderi and Glory Shah (2014). Sentiment Mining of Movie
Reviews using Random Forest with Tuned Hyperparameters.
6. Yassine Al Amrani, Mohamed Lazaar and Kamal Eddine El Kadiri (2018). Random Forest and
Support Vector Machine based Hybrid Approach to Sentiment Analysis.
7. Soudamini Hota and Sudhir Pathak (2018). KNN classifier-based approach for multi-class
sentiment analysis of twitter data.
8. Olga Kolchyna, Thársis T. P. Souza, Philip C. Treleaven and Tomaso Aste. Twitter
Sentiment Analysis: Lexicon Method, Machine Learning Method and Their Combination.
Department of Computer Science, UCL, Gower Street, London, UK; Systemic Risk Centre,
London School of Economics and Political Sciences, London, UK.
9. Prabu Palanisamy, Vineet Yadav and Harsha Elchuri. Serendio: Simple and Practical
lexicon based approach to Sentiment Analysis. Serendio Software Pvt Ltd, Guindy, Chennai
600032, India.
