Академический Документы
Профессиональный Документы
Культура Документы
net/publication/321902238
CITATIONS READS
0 354
2 authors:
Some of the authors of this publication are also working on these related projects:
iRead: personalised reading apps for primary school children View project
All content following this page was uploaded by Marcos Zampieri on 16 March 2018.
OFFENSIVE 0.0
HATE OK
Predicted label
Figure 2: Confusion matrix of the character 4-gram model for our 3 classes. The heatmap represents
the proportion of correctly classified examples in each class (this is normalized as the data distribution
is imbalanced). The raw numbers are also reported within each cell. We note that the H ATE class is the
hardest to classify and is highly confused with the O FFENSIVE class.
Finally, we also examine a confusion matrix for cation setting. In binary classification, Dinakar
the character 4-gram model, as shown in Figure 2. et al. (2011) note that the frequency of offensive
This demonstrates that the greatest degree of con- words helps classifiers to distinguish between hate
fusion lies between hate speech and generally of- speech and socially acceptable texts.
fensive material, with hate speech more frequently We see a few directions in which this work
being confused for offensive content. A substan- could be expanded such as the use of more ro-
tial amount of offensive content is also misclas- bust ensemble classifiers, a linguistic analysis of
sified as being non-offensive. The non-offensive the most informative features, and error analysis
class achieves the best result, with the vast major- of the misclassified instances. These aspects are
ity of samples being correctly classified. presented in more detail in the next section.
Pete Burnap and Matthew L Williams. 2015. Cyber Shervin Malmasi, Mark Dras, and Marcos Zampieri.
hate speech on twitter: An application of machine 2016a. Ltg at semeval-2016 task 11: Complex word
classification and statistical modeling for policy and identification with classifier ensembles. In Proceed-
decision making. Policy & Internet 7(2):223–242. ings of SemEval.