Abusive Content Detection Using Sentimental Analysis Final

ABUSIVE CONTENT DETECTION
USING SENTIMENT ANALYSIS

Under the guidance of:
Dr. P. Radhika Raju
Project Team:
Patil Rahul Reddy (16001A0501)
Mummadi Ruthwick Reddy (16001A0550)
Muppidi Snigdha (16001A0554)
INTRODUCTION
Sentiment analysis is a logical evaluation of people’s opinions and
emotions. It is currently an active research area in Natural Language
Processing (NLP) and Text Mining.
Sentiment analysis is used to keep the spread of false news in check,
to remove abusive content, to understand the customer experience
and to monitor social media.
The number of social media users is increasing daily; therefore, the
need for sentiment analysis cannot be overemphasized.
WHY?
Many people use social media nowadays, so abusive content is also
on the rise, and many people are affected by it.
To handle these kinds of scenarios, an application that can detect
abusive content in text data needs to be developed.
SCOPE:
This model works only with emotion-oriented information-seeking systems.
This model works only with text data, not with multimedia data.
WORKING OF SENTIMENT ANALYSIS
IMPLEMENTATION:
Sentiment analysis is treated here as a classification task: a
classifier is fed with data and returns the corresponding category,
such as abusive or non-abusive.
Features are extracted from the data using feature extraction
techniques such as TF-IDF (Term Frequency-Inverse Document
Frequency).
Classification is done by a supervised algorithm such as logistic
regression, which is trained on the extracted features.
Once training is done, the trained model can classify new data into
the corresponding categories.
NORMALIZING AND CLEANING:
In the normalizing stage, we replace short forms of words with
their full forms.
E.g., ‘u’ is replaced with ‘you’.
In the cleaning stage, we remove all stop words and punctuation.
E.g., stop words include ‘you’, ‘is’, ‘they’, etc.
Pre-processing of the data happens in these two stages.
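The two pre-processing stages can be sketched as follows. This is a minimal illustration, not the project's actual code: the lookup tables ABBREVIATIONS and STOP_WORDS and the function names are hypothetical, and a real system would use far larger word lists.

```python
import string

# Hypothetical lookup tables, for illustration only.
ABBREVIATIONS = {"u": "you", "r": "are", "pls": "please"}
STOP_WORDS = {"you", "is", "they", "are", "a", "the"}


def normalize(text):
    """Lower-case the text and expand short forms into their full forms."""
    tokens = text.lower().split()
    # Tokens are split on whitespace only, so a short form with attached
    # punctuation (e.g. "u,") is not expanded in this simple sketch.
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


def clean(text):
    """Strip ASCII punctuation, then drop stop words; returns a token list."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in no_punct.split() if tok not in STOP_WORDS]


def preprocess(text):
    """Run both stages in the order described: normalizing, then cleaning."""
    return clean(normalize(text))
```

For example, preprocess("U r so rude, pls stop!") returns ['so', 'rude', 'please', 'stop']: the short forms are expanded first, and the resulting stop words ‘you’ and ‘are’ are then removed together with the punctuation.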
TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF):
TF-IDF is a statistical measure that evaluates how relevant a word is to a
document in a collection of documents.
This is done by multiplying two metrics: how many times a word appears in a
document, and the inverse document frequency of the word across a set of
documents.
Term frequency is the frequency of the word in the document.
Inverse document frequency measures how rare the word is across the collection:
it is typically the logarithm of the total number of documents divided by the
number of documents that contain the word.
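The product of the two metrics can be sketched in a few lines of Python. This assumes one common textbook variant (term count divided by document length, multiplied by log(N / df)); libraries often use smoothed variants, so the exact weights the project obtained may differ. The function name tf_idf is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes tf = count / document length and idf = log(N / df),
    where df is the number of documents containing the term.
    """
    n_docs = len(docs)
    # Document frequency: count each term once per document it appears in.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

With this variant, a word that occurs in every document gets a weight of zero, while a word confined to a single document gets the highest weight for its frequency.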
CHI-SQUARE TEST:
A Chi-Square test is a test of statistical significance for categorical variables.
The data should be in the form of frequencies or counts of a particular category,
not percentages.
Here it is used to select the features most strongly associated with the class
label.
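One way to use the test for feature selection is to build a 2x2 table of counts for each word (word present or absent versus abusive or non-abusive) and rank words by the resulting statistic. The sketch below assumes this presence/absence formulation; the project applies the test after the TF-IDF transformation and may have computed it differently. All function names are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 table of counts.

                    abusive   non-abusive
    term present       a           b
    term absent        c           d
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty row or column: no association can be measured
    return n * (a * d - b * c) ** 2 / denom


def term_score(term, docs, labels):
    """Score one term against binary labels (1 = abusive, 0 = non-abusive)."""
    a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == 1)
    b = sum(1 for doc, y in zip(docs, labels) if term in doc and y == 0)
    c = sum(1 for doc, y in zip(docs, labels) if term not in doc and y == 1)
    d = sum(1 for doc, y in zip(docs, labels) if term not in doc and y == 0)
    return chi_square_2x2(a, b, c, d)


def select_top_k(docs, labels, k):
    """Return the k terms with the highest chi-square score."""
    vocab = sorted({term for doc in docs for term in doc})
    ranked = sorted(vocab, key=lambda t: term_score(t, docs, labels), reverse=True)
    return ranked[:k]
```

Note that the table holds raw counts, in line with the requirement that the input be frequencies rather than percentages.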
LOGISTIC REGRESSION:
Logistic regression is a classification algorithm used to assign observations to a
discrete set of classes.
Logistic regression comes in several types. One example is binary logistic
regression, where there are only two possible outcomes, such as abusive and
non-abusive.
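A binary logistic regression can be sketched from scratch as below. This is an illustrative implementation trained by plain batch gradient descent, not the project's code; the learning rate, epoch count, and function names are assumptions chosen for the example.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability, avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample: predicted probability - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(m):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(x, w, b, threshold=0.5):
    """Return 1 (e.g. abusive) if the predicted probability reaches the threshold."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= threshold else 0
```

Each row of X would be the feature vector of one comment (for instance its selected TF-IDF weights), and each entry of y its label, with 1 for abusive and 0 for non-abusive.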
WORK FLOW OF ALGORITHM
INPUT
Text data for training and testing
DATA CLEANING
Tokenization, Abbreviation Treatment, Stop Words Removal,
Bad-words Synonyms Mapping, Punctuation Removal
APPLICATION FLOW
TF-IDF Transformation, Chi-Square Feature Selection, Modelling,
Classification
RESULT
INSIGHTS:
53% of comments that contain abusive words are not actually abusive.
In one in five comments, variants of abusive words are used to insult rather
than the abusive words themselves.
Typing errors are a common part of chat but are penalized heavily by the model
when they resemble abusive words.
SUMMARY:
The model has an accuracy of 91.2% on training data and 81% on
cross-validation.
Logistic Regression was found to be the most suitable model in
comparison with the widely used Naïve Bayes and SVM.
1500 relevant features were selected using the Chi-Square test.
[Figures: Common Chat Words; Abusive Word Variants]

CONCLUSION:
This application helps to reduce negativity and the spread of revolutionary
and terror-related thoughts.
THANK YOU
