ISMIR 2008 – Session 2a – Music Recommendation and Organization
tion underlying this approach is that the probability that a

Organizing Map that we use to organize the collection (cf. Section 4).
a newly generated SOM at a deeper hierarchy level. A more detailed explanation of the GHSOM algorithm is given, for example, in [3].

In order to suit our need for playlist generation, we had to modify the original GHSOM algorithm in two regards. First, we redefined the neighborhood function used in the training process. In particular, we defined a cyclic neighborhood relation that extends its range over the map's borders, i.e., considers the last row/column of the map whenever a map unit in the first row/column is modified and vice versa. Second, each time a map unit u is expanded to a new SOM at a deeper hierarchy level, this new SOM is initialized using the model vectors of u's neighboring units.

These two modifications allow for a very simple playlist generation approach: we just have to train a 1-dimensional GHSOM on the input data and ensure the recursive expansion of map units until each track is represented by its own unit. Traversing the resulting GHSOM tree in pre-order while successively storing all tracks represented by the leaf nodes eventually yields a playlist containing all pieces of the collection. As for training the GHSOM, we use the audio similarity matrix, whose creation is described in Subsection 3.1. For performance reasons, however, we apply Principal Components Analysis (PCA) [5] to reduce the dimensionality of the feature space to 30.²

² 30 dimensions seemed appropriate for our test collection of 2,545 pieces of music. Employing the approach on larger collections most probably requires increasing this number.

5 FINDING CLUSTER PROTOTYPES

One of the main goals of this work is determining prototypical tracks that can serve as representatives to describe each cluster, i.e., map unit, of the GHSOM. However, preliminary experiments showed that relying solely on audio-based information to calculate a representative piece of music for each cluster yields unsatisfactory results, as already explained in Section 1. To alleviate this problem, we incorporate artist prototypicality information extracted as described in Subsection 3.2 into an audio-based prototypicality ranking. The resulting ranking function thus combines audio features of all tracks in the cluster under consideration with Web-based artist prototypicality information as detailed in the following.

Denoting the map unit under consideration as u, the ranking function $r_u(p)$ that assigns a prototypicality estimation to the representation in feature space of each³ piece of music p by artist a is calculated as indicated in Equation 1, where $r_s^u(p)$ denotes the audio signal-based part of the ranking function (cf. Equation 2) and $r_w^u(a)$ is the corrected Web-based ranking function from Subsection 3.2. In the signal-based ranking function, $\bar{u}$ denotes the mean of the feature vectors represented by u, $|\cdot|$ is the Euclidean distance, and norm(·) is a normalization function that shifts the range to [1, 2]. Finally, each piece of music p represented by map unit u is assigned a prototypicality value, which is higher the more prototypical p is for u.

$$r_u(p) = r_s^u(p) \cdot r_w^u(a) \tag{1}$$

$$r_s^u(p) = \mathrm{norm}\left(\frac{1}{1 + \ln(1 + |p - \bar{u}|)}\right) \tag{2}$$

6 EVALUATION

We conducted two sets of evaluation experiments. The first aimed at comparing our GHSOM-based playlist generation approach to the ones proposed in [9, 8]. In accordance with [9], the quality of the playlist was evaluated by calculating the genre entropy for a number of consecutive tracks. Second, we assessed the quality of our approach to detecting prototypical tracks via a user study.

For evaluation, we used the same collection as used in [8]. This collection contains 2,545 tracks by 103 artists, grouped in 13 genres. For details on the genre distribution, please see [8].

6.1 Entropy of the Playlist

To evaluate the coherence of the playlists generated by our GHSOM-based approach, we compute the entropy of the genre distribution on sequences of the playlists. More precisely, using each piece of the collection as starting point, we count how many of n consecutive tracks belong to each genre. The result is then normalized and interpreted as a probability distribution, on which the Shannon entropy is calculated according to Equation 3, where $\log_2 p(x) = 0$ if $p(x) = 0$.

$$H(x) = -\sum_{x} p(x) \log_2 p(x) \tag{3}$$

In Figure 1, the entropy values for n = 2 ... 318 are given (plot "audio ghsom"), averaged over the whole playlist, i.e., each track of the playlist is chosen once as the starting track for a sequence of length n.⁴

Comparing the entropy values given by the GHSOM approach to the ones presented in [8] reveals that the GHSOM approach performs very similarly to the SOM-based approach followed in [8]. This is not surprising, but it shows that the automated splitting of clusters by the GHSOM does not harm the entropy of the generated playlists, while at the same time offering the additional benefit of automatically structuring the created playlists in a hierarchical manner. Like those created by the recursive SOM approach in [8], the GHSOM-based playlists show quite good long-term entropy values (for large values of n), but rather poor values for short playlists.
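The genre-entropy measure of Section 6.1 (Equation 3) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and treating the playlist as circular (so that every track can start a full window of n tracks) is an assumption made here for illustration.

```python
import math
from collections import Counter


def genre_entropy(genres):
    """Shannon entropy (base 2) of the genre distribution in a track sequence."""
    counts = Counter(genres)
    total = len(genres)
    # Genres that do not occur are absent from the Counter, which matches the
    # convention that terms with p(x) = 0 contribute nothing to the sum.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mean_playlist_entropy(playlist_genres, n):
    """Average entropy over all windows of n consecutive tracks.

    Each track is used once as the starting track. Assumption: the playlist
    is treated as circular, so windows near the end wrap around to the start.
    """
    size = len(playlist_genres)
    total = 0.0
    for start in range(size):
        window = [playlist_genres[(start + k) % size] for k in range(n)]
        total += genre_entropy(window)
    return total / size
```

Lower values indicate that consecutive tracks tend to share a genre. A playlist that alternates between two genres on every track gives an average of 1 bit for n = 2, while one that groups tracks by genre gives a value closer to 0.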
³ More precisely, "each" refers to those pieces of music that are represented by the map unit u under consideration.

⁴ 318 tracks correspond to an angular extent of 45 degrees on the circular track selector in the Traveller's Sound Player application.
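The prototypicality ranking of Section 5 (Equations 1 and 2) can be sketched in Python as follows. This is a hedged illustration, not the authors' code. The function names are hypothetical. norm(·) is assumed to be a min-max rescaling to [1, 2], since the text only states the target range. The Web-based artist scores from Subsection 3.2 are not computed here but passed in as given.

```python
import math


def signal_scores(track_vectors):
    """Audio-based part of the ranking for all tracks of one map unit.

    Computes 1 / (1 + ln(1 + |p - mean|)) per track, then rescales the
    values to [1, 2] (assumed min-max normalization).
    """
    dim = len(track_vectors[0])
    count = len(track_vectors)
    mean = [sum(v[i] for v in track_vectors) / count for i in range(dim)]
    raw = [1.0 / (1.0 + math.log1p(math.dist(v, mean))) for v in track_vectors]
    lo, hi = min(raw), max(raw)
    if hi == lo:
        # Assumption: if all tracks are equally distant, give each the value 1.
        return [1.0 for _ in raw]
    return [1.0 + (x - lo) / (hi - lo) for x in raw]


def rank_tracks(track_vectors, artist_scores):
    """Return track indices ordered from most to least prototypical.

    artist_scores[i] is the Web-based prototypicality of the artist of
    track i (taken as given); it is multiplied with the audio-based score.
    """
    audio = signal_scores(track_vectors)
    combined = [s * w for s, w in zip(audio, artist_scores)]
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
```

With equal artist scores, the track closest to the mean feature vector of the unit ranks first. A sufficiently high artist score can move a track further from the mean to the top, which is the intended effect of combining both sources.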
all data items.

tracks, but this sample remained constant for all participants
8 CONCLUSIONS AND FUTURE WORK

We presented an approach to automatically cluster the tracks in a music collection in a hierarchical fashion with the aim of generating playlists. Each track is represented by similarities calculated on MFCC features, which are used as input to the clustering algorithm. To this end, we reimplemented and modified the Growing Hierarchical Self-Organizing Map (GHSOM). We further proposed an approach to determine, for each of the clusters given by the GHSOM, the most representative track. This is achieved by combining the MFCC-based audio features with artist-related information extracted from the Web. We finally integrated the two approaches into a music player software called the "Traveller's Sound Player", for which we elaborated an extension to deal with the hierarchical data.

We carried out evaluation experiments in two ways. First, we aimed at assessing the coherence of the playlists generated by the GHSOM approach by measuring their genre entropy. We compared these entropy values to those given by other playlist generation approaches and found that they largely correspond to those of a similar approach based on standard SOMs. In addition, our approach also yields a hierarchical organization of the collection and determines a representative for each cluster, while at the same time the good long-term entropy values of the (GH)SOM-based playlist generation are retained. Second, we conducted a user study to assess the quality of the found cluster prototypes. Even though there is still room for improvement, the results exceed the baseline considerably. Especially when taking into account the challenging setup of the user study, the results are promising.

As for future work, we are thinking of integrating the artist distribution of the clusters into the ranking function, as the user study showed that users tend to prefer representative tracks by artists who occur often in the cluster under consideration. Similar considerations would probably also be suited to filter out outliers. Furthermore, we have to experiment with different user interfaces for hierarchically accessing music collections, as the integration of the cluster representatives into the Traveller's Sound Player is currently done in a quite experimental fashion. Our special concern in this context will be the usability of the user interface for mobile devices.

9 ACKNOWLEDGMENTS

This research is supported by the Austrian Fonds zur Förderung der Wissenschaftlichen Forschung (FWF) under project numbers L112-N04 and L511-N15.

10 REFERENCES

[1] J.-J. Aucouturier and F. Pachet. Improving Timbre Similarity: How High is the Sky? Journal of Negative Results in Speech and Audio Sciences, 1(1), 2004.

[2] J.-J. Aucouturier, F. Pachet, and M. Sandler. "The Way It Sounds": Timbre Models for Analysis and Retrieval of Music Signals. IEEE Transactions on Multimedia, 7(6):1028–1035, 2005.

[3] M. Dittenbach, D. Merkl, and A. Rauber. The Growing Hierarchical Self-Organizing Map. In Proc. of the Int'l Joint Conference on Neural Networks, Como, Italy, 2000. IEEE Computer Society.

[4] M. Dittenbach, A. Rauber, and D. Merkl. Uncovering Hierarchical Structure in Data Using the Growing Hierarchical Self-Organizing Map. Neurocomputing, 48(1–4):199–216, 2002.

[5] H. Hotelling. Analysis of a Complex of Statistical Variables Into Principal Components. Journal of Educational Psychology, 24:417–441 and 498–520, 1933.

[6] T. Kohonen. Self-Organizing Maps, volume 30 of Springer Series in Information Sciences. Springer, Berlin, Germany, 3rd edition, 2001.

[7] B. Logan. Mel Frequency Cepstral Coefficients for Music Modeling. In Proc. of the Int'l Symposium on Music Information Retrieval, Plymouth, MA, USA, 2000.

[8] T. Pohle, P. Knees, M. Schedl, E. Pampalk, and G. Widmer. "Reinventing the Wheel": A Novel Approach to Music Player Interfaces. IEEE Transactions on Multimedia, 9:567–575, 2007.

[9] T. Pohle, E. Pampalk, and G. Widmer. Generating Similarity-based Playlists Using Traveling Salesman Algorithms. In Proc. of the 8th Int'l Conference on Digital Audio Effects, Madrid, Spain, 2005.

[10] M. Schedl, P. Knees, K. Seyerlehner, and T. Pohle. The CoMIRVA Toolkit for Visualizing Music-Related Data. In Proc. of the 9th Eurographics/IEEE VGTC Symposium on Visualization, Norrköping, Sweden, 2007.

[11] M. Schedl, P. Knees, and G. Widmer. Improving Prototypical Artist Detection by Penalizing Exorbitant Popularity. In Proc. of the 3rd Int'l Symposium on Computer Music Modeling and Retrieval, Pisa, Italy, 2005.

[12] M. Schedl, P. Knees, and G. Widmer. Investigating Web-Based Approaches to Revealing Prototypical Music Artists in Genre Taxonomies. In Proc. of the 1st IEEE Int'l Conference on Digital Information Management, Bangalore, India, 2006.

[13] B. Whitman and S. Lawrence. Inferring Descriptions and Similarity for Music from Community Metadata. In Proc. of the 2002 Int'l Computer Music Conference, Göteborg, Sweden, 2002.