1 Introduction
Music emotion has been studied by many researchers in the field of psychology,
such as those surveyed in [1]. The literature mentions three main models of music
emotion: 1) the categorical model, originating in the work of [2], which describes music
in terms of a list of basic emotions [3]; 2) the dimensional model, originating in the
work of [4], who proposed that all emotions can be described in a Cartesian coordinate
system of emotional dimensions [5]; and 3) the component process model, from the
work of [6], which describes emotion as appraised according to the situation of its
occurrence and the listener's current mental (emotional) state.
Computational models for the analysis and retrieval of emotional content in music
have also been studied and developed, in particular by the Music Information Retrieval
(MIR) community, which also maintains a repository of publications in the field
(available at the International Society for MIR: www.ismir.net). To name a few, [7]
studied a computational model for musical genre classification, a task similar to
(although simpler than) emotion retrieval. [8] provided a good example of audio feature
extraction, using multivariate data analysis and behavioral validation of the features.
There are several other examples of computational models for retrieving emotion-related
features from music, such as [9] and [10], which studied the retrieval of high-level
features of music, such as tonality, from a variety of music audio files.
In the study of the continuous development of music emotion, [11] used a two-
dimensional model to measure, along time, the music emotions appraised by listeners
for several music pieces. The emotion dimensions he described are arousal (ranging
from calm to agitated) and valence (ranging from sad to happy). He then proposed a
linear model with five acoustic descriptors to predict these dimensions in a time series
analysis of each music piece. [12] applied the same listeners' mean ratings of [11],
however, to develop and test a general model (that is, one single model for all music
pieces). This model was created with System Identification techniques to predict the
same emotional dimensions.
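To illustrate the general form of such models, the following is a minimal sketch, in Python with NumPy rather than the original implementations of [11] and [12], of a lagged linear (ARX-style) regression of an emotion dimension on acoustic descriptor time series; all data here are synthetic placeholders.

import numpy as np

def fit_lagged_linear(features, target, n_lags=2):
    # Regress target[t] on the current and n_lags past feature frames
    # (ordinary least squares); features has shape (T, n_features).
    T = features.shape[0]
    rows = [features[t - n_lags:t + 1].ravel() for t in range(n_lags, T)]
    X = np.hstack([np.ones((T - n_lags, 1)), np.array(rows)])  # intercept + regressors
    coeffs, *_ = np.linalg.lstsq(X, target[n_lags:], rcond=None)
    return coeffs

def predict_lagged_linear(features, coeffs, n_lags=2):
    T = features.shape[0]
    rows = [features[t - n_lags:t + 1].ravel() for t in range(n_lags, T)]
    X = np.hstack([np.ones((T - n_lags, 1)), np.array(rows)])
    return X @ coeffs

# Synthetic example: five low-level descriptors sampled once per second.
rng = np.random.default_rng(0)
feats = rng.normal(size=(120, 5))        # hypothetical descriptor time series
arousal = feats @ rng.normal(size=5)     # hypothetical mean arousal ratings
coeffs = fit_lagged_linear(feats, arousal)
predicted = predict_lagged_linear(feats, coeffs)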
As seen in the results of [11] and [12], these models successfully predicted the
dimension of arousal, with high correlation with the ground truth. However, the
retrieval of valence has proved to be particularly difficult. This may be due to the fact
that the previous models did not make extensive use of high-level acoustic features.
While low-level features account for basic temporal psychoacoustic properties, such as
loudness, roughness or pitch, the high-level ones account for cognitive musical features.
These are context-based and deliver one prediction for each overall music excerpt.
If this assumption is true, it is understandable why valence, as a highly contextual
dimension of music emotion, is poorly described by models using only low-level
descriptors.
Intuitively, it was expected that valence, as the measurement of happiness in music,
would be mostly related to the high-level descriptors of key clarity (major vs. minor),
harmonic complexity, and pulse clarity. However, as described further below, the
experimental results pointed to other conclusions.
Lately, our research group has been involved in the computational development
of eight high-level musical descriptors. They are: 1) Pulse Clarity (the sensation of
pulse in music); 2) Key Clarity (the sensation of a tonal center in music); 3) Harmonic
Complexity (the sensation of complexity delivered by the musical chords); 4) Articulation
(a music feature going from staccato to legato); 5) Repetition (the presence of repeated
musical patterns); 6) Mode (a music feature going from minor to major tonalities);
7) Event Density (the amount of distinctive and simultaneous musical events); and
8) Brightness (the sensation of how bright the music excerpt is). The design of these
descriptors was done in Matlab. They all involve different techniques and approaches
whose explanations are too extensive to fit in this work and will be thoroughly
described in further publications.
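Since the Matlab implementations are not reproduced here, the sketch below only illustrates, in Python with the librosa library, crude approximations of two of the descriptors (event density via onset rate, brightness via high-frequency energy); the cutoff value and the file name are illustrative assumptions, not the actual design.

import numpy as np
import librosa

def rough_event_density(y, sr):
    # Crude proxy for "event density": detected onsets per second.
    onset_times = librosa.onset.onset_detect(y=y, sr=sr, units="time")
    return len(onset_times) / (len(y) / sr)

def rough_brightness(y, sr, cutoff_hz=1500.0):
    # Crude proxy for "brightness": share of spectral energy above cutoff_hz.
    S = np.abs(librosa.stft(y)) ** 2
    freqs = librosa.fft_frequencies(sr=sr)
    return float(S[freqs >= cutoff_hz].sum() / S.sum())

# y, sr = librosa.load("excerpt.wav", sr=None)   # one five-second excerpt
# print(rough_event_density(y, sr), rough_brightness(y, sr))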
To test and improve the development of these descriptors, behavioral data were
collected from thirty-three listeners who were asked to rate the same features that were
predicted by the descriptors. They rated one hundred short excerpts of music (five
seconds in length each) taken from movie soundtracks. Their mean ratings were then
correlated with the descriptors' predictions. After several experiments and adjustments,
all descriptors presented a correlation with this ground truth above fifty percent.
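As an illustration of this validation step, the Pearson correlation between the listeners' mean ratings and a descriptor's predictions could be computed as sketched below (in Python with SciPy); the file names are hypothetical placeholders for the rating and prediction vectors, one value per excerpt.

import numpy as np
from scipy.stats import pearsonr

# Hypothetical files: one value per excerpt (100 excerpts in the study).
mean_ratings = np.loadtxt("event_density_mean_ratings.txt")   # listener means
predictions = np.loadtxt("event_density_predictions.txt")     # descriptor output

r, p = pearsonr(mean_ratings, predictions)
print(f"correlation with ground truth: r = {r:.2f} (p = {p:.3g})")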
Table 1 shows that the model reported here performed significantly better than the
previous ones for this specific music piece. The last column of Table 1 shows the
performance of the high-level descriptor "event density", the one that presented the
highest correlation with the ground truth. This descriptor alone yields higher
correlations than the previous models. These results seem to suggest that high-level
descriptors can be successfully used to improve the dynamic prediction of valence.
The figure below depicts the comparison between the thirty-three listeners' mean
rating for valence in the Aranjuez concerto and the prediction given by the multiple
regression model using the eight high-level musical descriptors (a minimal sketch of
such a regression is given after Fig. 1).
Fig. 1. Mean rating of behavioral data for valence (continuous line) and model prediction (dashed
line)
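For reference, a multiple linear regression of this kind can be sketched as follows (in Python with scikit-learn rather than the original Matlab code); the input files are hypothetical placeholders for the eight descriptor time series and the mean valence ratings sampled over the piece.

import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression

# Hypothetical data: one row per time frame, one column per descriptor.
descriptors = np.load("aranjuez_descriptors.npy")    # shape (n_frames, 8)
valence = np.load("aranjuez_valence_mean.npy")       # shape (n_frames,)

model = LinearRegression().fit(descriptors, valence)
predicted = model.predict(descriptors)

r, _ = pearsonr(valence, predicted)
print(f"in-sample correlation with mean valence ratings: r = {r:.2f}")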
This work is part of a larger project, "Tuning the Brain for Music", the
BrainTuning project (www.braintuning.fi). An important part of it is the study of the
retrieval of acoustic features from music and their relation to specific emotional
appraisals. Following this goal, the high-level descriptors briefly described here were
designed and implemented. They were initially conceived out of the evident lack of
such descriptors in the literature. In BrainTuning, a fairly large number of studies on
the retrieval of emotional connotations in music were investigated. As seen in
previous models, for the dynamic retrieval of highly contextual emotions such as the
appraisal of happiness (represented by valence), low-level descriptors are not enough,
since they do not take into consideration the contextual aspects of music.
It was interesting to notice that the prediction of the high-level descriptor "event
density" presented the highest correlation with the valence mean ratings, while the
predictions of "key clarity" and "mode" correlated very poorly. This seems to imply
that, at least in this particular case, the musical sensation of a major or minor tonality
(represented by "mode") or of a tonal center ("key clarity") is not related to valence,
as might be intuitively inferred. What counted most here was the amount of
simultaneous musical events (event density). By "event", we understand here any
perceivable rhythmic, melodic or harmonic pattern.
This experiment used the music piece "Aranjuez" because it was the one for which
previous models presented the lowest prediction rate for valence. Of course, more
experiments are needed and are already planned for further studies. Nevertheless, we
believe that this work presents an interesting prospect for the development of better
high-level descriptors and models for the continuous measurement of contextual
musical features such as valence.
References
1. Sloboda, J. A., & Juslin, P. (Eds.): Music and Emotion: Theory and Research. Oxford:
Oxford University Press. ISBN 0-19-263188-8 (2001)
2. Ekman, P.: An argument for basic emotions. Cognition & Emotion, 6 (3/4): 169–200,
(1992).
3. Juslin, P. N., & Laukka, P.: Communication of emotions in vocal expression and music
performance: Different channels, same code? Psychological Bulletin, 129, 770-814 (2003)
4. Russell, J. A.: Core affect and the psychological construction of emotion. Psychological
Review, Vol. 110, No. 1, 145-172 (2003)
5. Laukka, P., Juslin, P. N., & Bresin, R.: A dimensional approach to vocal expression of emo-
tion. Cognition and Emotion, 19, 633-653. (2005)
6. Scherer, K. R., & Zentner, M. R.: Emotional effects of music: production rules. In P. N.
Juslin & J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp. 361-392). Oxford:
Oxford University Press (2001)
7. Tzanetakis, G., & Cook, P.: Musical Genre Classification of Audio Signals. IEEE Transac-
tions on Speech and Audio Processing, 10(5), 293-302. (2002)
8. Leman, M., Vermeulen, V., De Voogdt, L., Moelants, D., & Lesaffre, M.: Correlation of
Gestural Musical Audio Cues. Gesture-Based Communication in Human-Computer Interac-
tion. 5th International Gesture Workshop, GW 2003, 40-54. (2004)
9. Wu, T.-L., & Jeng, S.-K.: Automatic emotion classification of musical segments. Proceed-
ings of the 9th International Conference on Music Perception & Cognition, Bologna, (2006)
10. Gomez, E., & Herrera, P.: Estimating the Tonality of Polyphonic Audio Files: Cognitive
Versus Machine Learning Modelling Strategies. Proceedings of the 5th International
Conference on Music Information Retrieval (ISMIR 2004), Barcelona, Spain (2004)
11. Schubert, E.: Measuring emotion continuously: Validity and reliability of the two-
dimensional emotion space. Aust. J. Psychol., vol. 51, no. 3, pp. 154–165. (1999)
12. Korhonen, M., Clausi, D., Jernigan, M.: Modeling Emotional Content of Music Using
System Identification. IEEE Transactions on Systems, Man and Cybernetics, 36(3),
588-599 (2006)