AFFECTIVE COMPUTING AND SENTIMENT ANALYSIS
Editor: Erik Cambria, Nanyang Technological University, Singapore, cambria@ntu.edu.sg
Deep Learning-Based Document Modeling for Personality Detection from Text

Navonil Majumder, Instituto Politécnico Nacional
Soujanya Poria, Nanyang Technological University
Alexander Gelbukh, Instituto Politécnico Nacional
Erik Cambria, Nanyang Technological University
Personality is a combination of an individual's behavior, emotion, motivation, and thought-pattern characteristics. Our personality has a great impact on our lives; it affects our life choices, well-being, health, and numerous other preferences. Automatic detection of a person's personality traits has many important practical applications. In the context of sentiment analysis,1 for example, the products and services recommended to a person should be those that have been positively evaluated by other users with a similar personality type. Personality detection can also be exploited for word polarity disambiguation in sentiment lexicons,2 as the same concept can convey different polarity to different types of people. In mental health diagnosis, certain diagnoses correlate with certain personality traits. In forensics, knowing personality traits helps reduce the circle of suspects. In human resources management, personality traits affect one's suitability for certain jobs.

Personality is typically formally described in terms of the Big Five personality traits,3 which are the following binary (yes/no) values:

- Extroversion (EXT). Is the person outgoing, talkative, and energetic versus reserved and solitary?
- Neuroticism (NEU). Is the person sensitive and nervous versus secure and confident?
- Agreeableness (AGR). Is the person trustworthy, straightforward, generous, and modest versus unreliable, complicated, meager, and boastful?
- Conscientiousness (CON). Is the person efficient and organized versus sloppy and careless?
- Openness (OPN). Is the person inventive and curious versus dogmatic and cautious?

Texts often reflect various aspects of the author's personality. In this article, we present a method to extract personality traits from stream-of-consciousness essays using a convolutional neural network (CNN). We trained five different networks, all with the same architecture, for the five personality traits (see the Previous Work in Personality Detection sidebar for more information). Each network was a binary classifier that predicted the corresponding trait to be positive or negative.

To this end, we developed a novel document-modeling technique based on a CNN feature extractor. Namely, we fed sentences from the essays to convolution filters to obtain the sentence model in the form of n-gram feature vectors. We represented each individual essay by aggregating the vectors of its sentences. We concatenated the obtained vectors with the Mairesse features,4 which were extracted from the texts directly at the preprocessing stage; this improved the method's performance. Discarding emotionally neutral input sentences from the essays further improved the results.

For final classification, we fed this document vector into a fully connected neural network with one hidden layer. Our results outperformed the current state of the art for all five traits. Our implementation is publicly available and can be downloaded freely for research purposes (see http://github.com/senticnet/personality-detection).
Previous Work in Personality Detection

The Big Five, also known as the Five Factor Model, is the most widely accepted model of personality. Initially, it was developed by several independent groups of researchers; it was then advanced by Ernest Tupes and Raymond Christal,1 J.M. Digman made further advancements,2 and Lewis Goldberg later perfected it.3

Some earlier work on automated personality detection from plain text was done by James Pennebaker and Laura King,4 who compiled the essay dataset that we used in our experiments (see http://web.archive.org/web/20160519045708/http://mypersonality.org/wiki/doku.php?id=wcpr13). For this, they collected stream-of-consciousness essays written by volunteers in a controlled environment and then asked the authors of the essays to define their own Big Five personality traits. They used Linguistic Inquiry and Word Count (LIWC) features to determine correlation between the essays and personality.5

François Mairesse and colleagues used, in addition to LIWC, other features, such as imageability, to improve performance.6 Saif Mohammad and Svetlana Kiritchenko performed a thorough study on this essay dataset, as well as the MyPersonality Facebook status dataset, by applying different combinations of feature sets to outperform Mairesse's results, which they called the Mairesse baseline.7

Recently, Fei Liu and colleagues developed a language-independent and compositional model for personality trait recognition for short tweets.8

On the other hand, researchers have successfully used deep convolutional networks for related tasks such as sentiment analysis,9 aspect extraction,10 and multimodal emotion recognition.11

References

1. E. Tupes and R. Christal, "Recurrent Personality Factors Based on Trait Ratings," tech. report ASD-TR-61-97, Lackland Air Force Base, 1961.
2. J. Digman, "Personality Structure: Emergence of the Five-Factor Model," Ann. Rev. Psychology, vol. 41, no. 1, 1990, pp. 417–440.
3. L. Goldberg, "The Structure of Phenotypic Personality Traits," Am. Psychologist, vol. 48, no. 1, 1993, pp. 26–34.
4. J.W. Pennebaker and L.A. King, "Linguistic Styles: Language Use as an Individual Difference," J. Personality and Social Psychology, vol. 77, no. 6, 1999, pp. 1296–1312.
5. J.W. Pennebaker, R.J. Booth, and M.E. Francis, "Linguistic Inquiry and Word Count: LIWC2007," operator's manual, 2007.
6. F. Mairesse et al., "Using Linguistic Cues for the Automatic Recognition of Personality in Conversation and Text," J. Artificial Intelligence Research, vol. 30, 2007, pp. 457–500.
7. S.M. Mohammad and S. Kiritchenko, "Using Hashtags to Capture Fine Emotion Categories from Tweets," Computational Intelligence, vol. 31, no. 2, 2015, pp. 301–326.
8. F. Liu, J. Perez, and S. Nowson, "A Language-Independent and Compositional Model for Personality Trait Recognition from Short Texts," Computing Research Repository (CoRR), 2016; http://arxiv.org/abs/1610.04345.
9. S. Poria et al., "A Deeper Look into Sarcastic Tweets Using Deep Convolutional Neural Networks," Proc. 26th Int'l Conf. Computational Linguistics, 2016, pp. 1601–1612.
10. S. Poria, E. Cambria, and A. Gelbukh, "Aspect Extraction for Opinion Mining with a Deep Convolutional Neural Network," Knowledge-Based Systems, vol. 108, 2016, pp. 42–49.
11. S. Poria et al., "Convolutional MKL Based Multimodal Emotion Recognition and Sentiment Analysis," Proc. IEEE Int'l Conf. Data Mining, 2016, pp. 439–448.
Overview of the Method

Our method includes input data preprocessing and filtering, feature extraction, and classification. We use two types of features: a fixed number of document-level stylistic features, and per-word semantic features that are combined into a variable-length representation of the input text. This variable-length representation is fed into a CNN, where it is processed in a hierarchical manner by combining words into n-grams, n-grams into sentences, and sentences into a whole document. The obtained values are then combined with the document-level stylistic features to form the document representation used for final classification. Specifically, our method includes the following steps (a toy sketch of the whole flow appears after this list):

- Preprocessing. This includes sentence splitting as well as data cleaning and unification, such as reduction to lowercase.
- Document-level feature extraction. We used the Mairesse baseline feature set, which includes such global features as the word count and average sentence length.
- Filtering. Some sentences in an essay may not carry any personality clues. Such sentences can be ignored in semantic feature extraction for two reasons: first, they represent noise that reduces the classifier's performance, and second, removal of those sentences considerably reduces the input size, and thus the training time, without negatively affecting the results. So, we remove such sentences before the next step.
- Word-level feature extraction. We represent individual words by word embeddings in a continuous vector space; specifically, we experimented with the word2vec embeddings.5 This gives a variable-length feature set for the document: the document is represented as a variable number of sentences, which are represented as a variable number of fixed-length word feature vectors.
- Classification. For classification, we use a deep CNN. Its initial layers process the text in a hierarchical manner. Each word is represented in the input as a fixed-length feature vector using word2vec, and sentences are represented as a variable number of word vectors. At some layer, this variable-length vector is reduced to a fixed-length vector for each sentence, which is a kind of sentence embedding in a continuous vector space. At that level, documents are represented as a variable number of such fixed-length sentence embeddings. Finally, at a deeper layer, this variable-length document vector is reduced to a fixed-length document vector. This fixed-length feature vector is then concatenated with the document-level features, giving a fixed-length document vector, which is then used for final classification.

When aggregating word vectors into sentence vectors, we use convolution to form word n-gram features. However, when aggregating sentence vectors into the document vector, we do not use convolution to form sentence n-gram features. We tried this arrangement, but the network did not converge in 75 epochs, so we left this experiment to our future work.
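To make the flow concrete, here is a toy end-to-end sketch in Python. Every helper below is a simplified, hypothetical stand-in with our own names, not the authors' released code; the real system uses word2vec, the 84 Mairesse features, and the CNN described below, whereas here the CNN is replaced by simple averaging just to show the shape of the data flow:

import re
import numpy as np

EMBED_DIM = 300  # length of the word2vec embeddings

def preprocess(text):
    # Sentence splitting at '.' and '?', lowercasing, word splitting.
    return [s.split() for s in re.split(r"[.?]", text.lower()) if s.split()]

def word_vectors(sentence):
    # Stand-in for a word2vec lookup: one 300-d vector per word.
    rng = np.random.default_rng(abs(hash(" ".join(sentence))) % 2**32)
    return rng.normal(size=(len(sentence), EMBED_DIM))

def cnn_document_vector(sentence_matrices):
    # Stand-in for the hierarchical CNN: mean over words, then over sentences.
    return np.mean([m.mean(axis=0) for m in sentence_matrices], axis=0)

def stylistic_features(text):
    # Stand-in for the document-level features: word count and
    # average sentence length only.
    sentences = preprocess(text)
    n_words = sum(len(s) for s in sentences)
    return np.array([n_words, n_words / max(len(sentences), 1)])

text = "I visit India in winter. It is hot in summer."
doc_vector = np.concatenate([
    cnn_document_vector([word_vectors(s) for s in preprocess(text)]),
    stylistic_features(text),
])
print(doc_vector.shape)  # (302,): CNN features plus stylistic features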
Network Architecture

We trained five separate neural classifiers, all with the same architecture, for the Big Five personality traits. The processing flow in our network comprises four main steps:

- word vectorization, in which we use fixed-length word2vec word embeddings as input data;
- sentence vectorization, from sequences of words in each sentence to fixed-length sentence vectors;
- document vectorization, from the sequence of sentence vectors to the document vector; and
- classification, from the document vector to the classification result (yes/no).

Accordingly, the network comprises seven layers: input (word vectorization), convolution (sentence vectorization), max pooling (sentence vectorization), 1-max pooling (document vectorization), concatenation (document vectorization), linear with sigmoid activation (classification), and two-neuron softmax output (classification). Figure 1 depicts the end-to-end network for two sentences; a code sketch of this stack follows the figure caption. In the rest of this article, we discuss these steps and layers in detail.

[Figure 1. Architecture of our network. The network consists of seven layers. The input layer (shown at the bottom) corresponds to the sequence of input sentences (only two are shown). The next two layers include three parts, corresponding to trigrams, bigrams, and unigrams. The dotted lines delimit the area in a previous layer to which a neuron of the next layer is connected; for example, the bottom-right rectangle shows the area comprising three word vectors connected with a trigram neuron. Layer labels, bottom to top: word embedding (vector size: 300), convolution layer (200), max pooling layer, sentence vectors, 1-max pooling layer, concatenation layer (with the Mairesse features), fully connected layer, document vector.]
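The following PyTorch sketch mirrors this seven-layer stack for a single document. It is our reading of the architecture, not the authors' released implementation: the 200 feature maps follow the figure, the hidden-layer width and the ReLU activation inside the convolution stage are assumptions, and word vectorization (the input layer) is assumed to have been done beforehand.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TraitCNN(nn.Module):
    """One binary classifier; five such networks are trained, one per trait."""

    def __init__(self, embed_dim=300, feature_maps=200, n_mairesse=84, hidden=200):
        super().__init__()
        # Convolution layer: unigram, bigram, and trigram filters over word vectors.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, feature_maps, kernel_size=n) for n in (1, 2, 3)]
        )
        self.hidden = nn.Linear(3 * feature_maps + n_mairesse, hidden)
        self.out = nn.Linear(hidden, 2)  # two-neuron softmax output

    def forward(self, doc, mairesse):
        # doc: (sentences, words, embed_dim); mairesse: (n_mairesse,)
        x = doc.transpose(1, 2)  # Conv1d wants (batch=sentences, channels, words)
        # Max pooling within each sentence, per filter size, then concatenation
        # gives one fixed-length vector per sentence.
        sent_vecs = torch.cat(
            [F.relu(conv(x)).max(dim=2).values for conv in self.convs], dim=1
        )
        doc_vec = sent_vecs.max(dim=0).values     # 1-max pooling over sentences
        doc_vec = torch.cat([doc_vec, mairesse])  # concatenation layer
        h = torch.sigmoid(self.hidden(doc_vec))   # linear with sigmoid activation
        return F.softmax(self.out(h), dim=-1)     # yes/no probabilities

model = TraitCNN()
doc = torch.randn(4, 20, 300)  # 4 sentences of 20 words, 300-d word vectors
print(model(doc, torch.randn(84)))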
Input

We represent the dataset as a set of documents: each document d is a sequence of sentences, each sentence s_i is a sequence of words, and each word w_j is a real-valued vector of fixed length known as a word embedding. In our experiments, we used Google's pretrained word2vec embeddings.5

Thus, our input layer is a four-dimensional real-valued array from ℝ^(D×S×W×E), in which D is the number of documents in the dataset, S is the maximum number of sentences in a document across all documents, W is the maximum number of words in a sentence across all documents, and E is the length of the word embeddings.

In implementation, to force all documents to contain the same number of sentences, we padded shorter documents with dummy sentences. Similarly, we padded shorter sentences with dummy words.
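A minimal sketch of this layout, with a stand-in lookup in place of Google's pretrained word2vec model:

import numpy as np

E = 300  # word-embedding length
docs = [
    [["i", "visit", "india", "in", "winter"], ["it", "is", "hot", "in", "summer"]],
    [["a", "short", "essay"]],
]

D = len(docs)                             # number of documents
S = max(len(d) for d in docs)             # max sentences per document
W = max(len(s) for d in docs for s in d)  # max words per sentence

def embed(word):
    # Stand-in for the pretrained word2vec lookup.
    rng = np.random.default_rng(abs(hash(word)) % 2**32)
    return rng.normal(size=E)

X = np.zeros((D, S, W, E))  # dummy (padding) sentences and words stay zero
for i, doc in enumerate(docs):
    for j, sentence in enumerate(doc):
        for k, word in enumerate(sentence):
            X[i, j, k] = embed(word)

print(X.shape)  # (2, 2, 5, 300), an element of R^(D x S x W x E)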
Aggregating Word Vectors into Sentence Vectors

We use three convolutional filters to extract unigram, bigram, and trigram features from each sentence. After max pooling, the sentence vector is a concatenation of the feature vectors obtained from these three convolutional filters.

Convolution. To extract the n-gram features, we apply a convolutional filter.
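As an illustration, the sentence vector for a single 12-word sentence can be assembled as follows. This is a sketch: the 200 feature maps per filter size follow Figure 1, and the ReLU activation is an assumption on our part.

import torch
import torch.nn as nn

words = torch.randn(1, 300, 12)  # one sentence: 12 words, 300-d embeddings

parts = []
for n in (1, 2, 3):  # unigram, bigram, and trigram filters
    conv = nn.Conv1d(in_channels=300, out_channels=200, kernel_size=n)
    feature_maps = torch.relu(conv(words))        # (1, 200, 12 - n + 1)
    parts.append(feature_maps.max(dim=2).values)  # max pooling over positions

sentence_vector = torch.cat(parts, dim=1)
print(sentence_vector.shape)  # torch.Size([1, 600])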
Preprocessing. We split the text into a sequence of sentences at the period and question mark characters. Then we split each sentence into words at whitespace characters. We reduced all letters to lowercase and removed all characters other than ASCII letters, digits, exclamation marks, and single and double quotation marks. Some essays in the dataset contained no periods or had missing periods, resulting in absurdly long sentences. For these cases, we split each obtained sentence that was longer than 150 words into sentences of 20 words each (except the last piece, which could happen to be shorter).
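These rules translate directly into a short Python sketch (our rendering of the stated rules, not the authors' code):

import re

ALLOWED = re.compile(r"[^a-z0-9!'\" ]")  # after lowercasing: letters, digits, !, quotes

def preprocess(text):
    sentences = []
    for raw in re.split(r"[.?]", text.lower()):
        words = ALLOWED.sub(" ", raw).split()
        if not words:
            continue
        if len(words) > 150:  # absurdly long sentence: re-split into 20-word chunks
            sentences.extend(words[i:i + 20] for i in range(0, len(words), 20))
        else:
            sentences.append(words)
    return sentences

print(preprocess("I visit India in winter. It is hot in summer."))
# [['i', 'visit', 'india', 'in', 'winter'], ['it', 'is', 'hot', 'in', 'summer']]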
Extracting document-level features. We used Mairesse and colleagues' library (http://farm2.user.srcf.net/research/personality/recognizer.html) to extract the 84 Mairesse features from each document.4

Sentence filtering. We assumed that a relevant sentence would have at least one emotionally charged word. After extracting the document-level features, but before extracting the word2vec features, we discarded all sentences that had no emotionally charged words. We used the NRC Emotion Lexicon (http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm) to obtain emotionally charged words.9,10 This lexicon contains 14,182 words tagged with 10 attributes: anger, anticipation, disgust, fear, joy, negative, positive, sadness, surprise, and trust. We considered a word to be emotionally charged if it had at least one of these attributes; there are 6,468 such words in the lexicon (most of the words in this lexicon have no attributes). So, if a sentence contained none of the 6,468 words, we removed it before extracting the word2vec features from the text. In our dataset, all es-
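A sketch of this filtering rule; the tiny lexicon below is a hypothetical placeholder for the NRC Emotion Lexicon named above:

def filter_sentences(sentences, lexicon):
    # Keep a sentence only if it contains at least one emotionally charged word.
    return [s for s in sentences if any(word in lexicon for word in s)]

charged_words = {"happy", "angry", "afraid", "hot"}  # placeholder subset
sentences = [["i", "visit", "india", "in", "winter"],
             ["it", "is", "hot", "in", "summer"]]
print(filter_sentences(sentences, charged_words))
# [['it', 'is', 'hot', 'in', 'summer']] -- the neutral sentence is discarded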
Word n-gram baseline. As a baseline feature set, we used 30,000 features: the 10,000 most frequent word unigrams, bigrams, and trigrams (10,000 of each) in our dataset. We used the Scikit-learn library to extract these features from the documents.11

Classification. We experimented with three classification settings. In the variant marked MLP in Table 1, we used the network shown in Figure 1, which is a multiple-layer perceptron (MLP) with one hidden layer, trained together with the CNN. In the variant marked SVM (support vector machine) in the table, we first trained the network shown in Figure 1 to obtain the corresponding document vector d for each document in the dataset, and then used these vectors to train a polynomial SVM of degree 3. In the variant marked sMLP/MP in the table, in a similar manner we used the vectors d (the max pooling layer) to train a stand-alone MLP.
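A sketch of the n-gram baseline with scikit-learn; beyond what the text states (the most frequent n-grams for each n), the exact vectorizer settings are our assumption:

from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer

documents = ["i visit india in winter", "it is hot in summer"]

blocks = []
for n in (1, 2, 3):
    # The 10,000 most frequent n-grams for each n, counted over the corpus.
    vectorizer = CountVectorizer(ngram_range=(n, n), max_features=10000)
    blocks.append(vectorizer.fit_transform(documents))

features = hstack(blocks)  # up to 30,000 columns on a large corpus
print(features.shape)      # (2, n-gram vocabulary size) on this toy corpus

For the SVM variant described above, scikit-learn's SVC(kernel="poly", degree=3) would be a natural stand-in, although the article does not name the specific SVM implementation used.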