
Disclaimer: A report submitted to Dublin City University, School of Computing for the module
CA437: Multimedia Information Retrieval, 2009/2010. I hereby certify that the work presented and
the material contained herein are my own, except where references to other material are
explicitly stated.

Relevance Feedback and Metadata


for Multimedia Information Retrieval

Eoin Costelloe

CA437: Multimedia Information Retrieval


Abstract:

This report explores how new technologies can convert multimedia into index information with
the help of relevance feedback. This index information can then be used to create metadata for
the multimedia. In the case of music, technologies such as Shazam or Tunatic can identify a song
by listening to it. Once the song is identified, additional metadata such as genre, title and
artist can be retrieved. The user can then search this metadata to find the music they want
(Reference 1). Implicit relevance feedback can then be used to generate relevance information
about the music results.

Overview:

The system will take in a database of audio files and generate a large amount of metadata
which the user can then search over. It will first extract the standard information from each
audio file, including length, size and format (MP3, WMA, etc.), together with information about
the file's location, such as the user who uploaded it or the website on which it was found. The
system will then extract the song information by matching the audio to an audio fingerprint in a
database of the most popular songs; this may include referencing an external database of song
information. Once the standard and song information are found, they are stored in a
meta-database.
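As a rough illustration, the "standard information" step above could be sketched in Python. The field names and the helper function are hypothetical; duration is deliberately left to a later audio-parsing step (e.g. the stdlib `wave` module for WAV files, or a tag library for MP3):

```python
import os

def standard_info(path, size_bytes, uploader=None, source_url=None):
    """Collect the 'standard information' fields for one audio file.

    Duration is omitted here: reading it requires parsing the audio
    itself, which is assumed to happen in a later analysis step.
    """
    _, ext = os.path.splitext(path)
    return {
        "filename": os.path.basename(path),
        "size": size_bytes,
        "format": ext.lstrip(".").lower() or "unknown",
        "uploader": uploader,       # user who uploaded the file, if any
        "source_url": source_url,   # website the file was found on, if any
    }

info = standard_info("/files/song.MP3", 4_200_000, uploader="eoin")
```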

The user can now run a simple text retrieval search over this indexed information to find their
music. With Audiorank, audio files which closely match their song information are weighted with
higher ranks. For example, if the song information states that a song is 6 minutes long, then an
audio file which is only 2 minutes long is probably not what the user is searching for. This
ranking can be computed offline, as it is not based on the user's query.
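The duration check described above could be folded into Audiorank as a simple weighting function. The decay formula here is an assumption for illustration, not SkreemR's actual scheme:

```python
def duration_weight(expected_secs, actual_secs):
    """Weight in (0, 1]: 1.0 when the file length matches the song
    information exactly, decaying as the mismatch grows."""
    if expected_secs <= 0:
        return 0.0
    mismatch = abs(expected_secs - actual_secs) / expected_secs
    return 1.0 / (1.0 + mismatch)

# A 6-minute song: a matching file scores 1.0, a 2-minute clip far lower.
full = duration_weight(360, 360)   # exact match
clip = duration_weight(360, 120)   # ~0.6
```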

The system can also use implicit relevance feedback to link music together: a user's session
may include several searches for different types of music which may be associated with one
another, and these associations can be recorded as similar artists or similar tracks. The system
can also be helped by explicit relevance feedback through user-assigned tags, genres, etc. This
user-dependent information will grow over time, may become more valuable than the initial
information, and will be managed accordingly.
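One minimal way to derive "similar tracks" from implicit feedback is to count how often two tracks appear in the same search session; pairs that co-occur often become candidates for linking. The session representation below is an assumption:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(sessions):
    """Count how often two tracks appear in the same user session.
    Frequently co-occurring pairs are candidate 'similar tracks'."""
    counts = Counter()
    for session in sessions:
        # sorted() gives each pair a canonical order; set() drops repeats
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts

sessions = [
    ["trackA", "trackB", "trackC"],
    ["trackA", "trackB"],
    ["trackC"],
]
counts = cooccurrence_counts(sessions)
```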
Functional Description:

The system must be able to deliver the following functionality:

Maintain Database of Audio Files:

Description: It must be able to maintain a database of audio files. This can be a set of links
to external audio files, such as files found on other websites (Reference 2), or a list of
locally saved audio files. Files can be uploaded by users through a GUI once they create a user
account, which is then associated with every file they upload.

Criticality: This is vital to the system as these audio files are what the user is actually looking for.

Technical Issues: Links to external audio files will have to be maintained to make sure they
still point to the correct place and remain valid. The locally saved audio files will also have
to be easily accessible, as these will be downloaded by users of the system.

Dependencies: Requires a connection to the internet and a way of searching for raw audio files.
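A crawler feeding this database needs a cheap filter for candidate links before fetching them. This sketch only inspects the URL; a real deployment would also verify each link with an HTTP request and check its Content-Type, which is why the internet-connection dependency exists:

```python
def is_audio_link(url, audio_exts=(".mp3", ".wma", ".ogg", ".wav")):
    """Cheap crawler-side filter: does the URL look like a raw audio
    file? Strips query strings and fragments before checking the
    extension. The extension list is an assumed configuration value."""
    path = url.split("?", 1)[0].split("#", 1)[0].lower()
    return path.endswith(audio_exts)

candidates = [
    "http://example.com/song.mp3?dl=1",
    "http://example.com/page.html",
]
audio_links = [u for u in candidates if is_audio_link(u)]
```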

Metadata Generation:

Description: It must be able to generate metadata for each audio file, including user
information, file information and music information. Music information is extracted using audio
analysis, and an Audiorank is then generated for the file based on this information. Similarity
between audio files can also be detected here, so a list of alternate download locations can be
created for audio files that are similar.

Criticality: Metadata is required for the text search to work correctly.

Technical Issues: It may not be possible to extract music information from an audio file, as
audio analysis is not perfect. This may lead to faulty information.

Dependencies: Audio analysis requires an internet connection to match audio fingerprints
against the current audio file.
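Since the glossary suggests storing metadata in XML, the merge of standard and song information into one meta-database record might look like this sketch (the element names are assumptions):

```python
import xml.etree.ElementTree as ET

def metadata_xml(file_id, standard, song):
    """Merge standard info and matched song info into one XML record,
    the storage format the glossary suggests for the meta-database."""
    root = ET.Element("audiofile", id=file_id)
    for section, fields in (("standard", standard), ("song", song)):
        node = ET.SubElement(root, section)
        for key, value in fields.items():
            ET.SubElement(node, key).text = str(value)
    return ET.tostring(root, encoding="unicode")

record = metadata_xml(
    "42",
    {"format": "mp3", "size": 4200000},
    {"title": "Example Song", "artist": "Example Artist", "genre": "rock"},
)
```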
User Text Search:

Description: It must allow users to search through the meta-database. This is done through a
GUI that lets users send a text query (Figure 1) and view the result (Figure 2). The result
includes a dynamic way of playing the audio files and an image representing the artist and
album.

Criticality: The search does not need to be graphical, but a GUI improves ease of use. Search
itself is required so that users can efficiently query the database of audio files.

Technical Issues: Some audio files may not be able to play through the GUI.

Dependencies: This requires a meta-database to search through. This is just a standard text retrieval
search through this database.
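The "standard text retrieval search" mentioned here can be sketched as a small inverted index with AND semantics over the meta-database text. The record format is an assumption:

```python
from collections import defaultdict

def build_index(records):
    """Inverted index: term -> set of record ids, built offline
    from the meta-database text fields."""
    index = defaultdict(set)
    for rid, text in records.items():
        for term in text.lower().split():
            index[term].add(rid)
    return index

def search(index, query):
    """AND-semantics keyword search: a record must contain
    every query term to match."""
    term_sets = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*term_sets) if term_sets else set()

records = {
    "1": "Example Song Example Artist rock",
    "2": "Other Song Example Artist pop",
}
index = build_index(records)
```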

Relevance Feedback:

Description: The system must also be able to extract additional information from user queries
to find relevant music. This includes gathering automatic and manual feedback, which creates
additional metadata for audio files such as genre and artist compatibility (Reference 3). This
information can then be used in future queries to find relevant artists.

Criticality: This is not required as genre and artist compatibility may already be generated in the
Metadata generation phase. This is only needed to keep the system up to date.

Technical Issues: User feedback may be incorrect and should be aggregated for full effect before
being applied to the metadata.

Dependencies: This is considered an additional feature for the User Text Search phase.
Figure 1: the GUI text search query.

Figure 2: the search result display, with audio playback and artist/album images.
Implementation Plan:

This system can be implemented as a web interface for general users together with offline
meta-database generation. The web interface will allow users to create text search queries
against the meta-database for the audio files they want, and will also allow them to add their
own audio files to the audio files database. An automatic crawler will search the internet for
raw audio files to be added to the audio files database, and an automatic script will extract
information from the audio files database and populate the meta-database with the results.

The most important interface to this system is the GUI through which the user enters search
queries, as this will be the most used part and is how general users will experience the system.
For the system to stay up to date, metadata generation must also run quickly on newly inserted
audio files. The system should be very easy to use, as its design and functionality for the user
are deliberately simple.

Evaluation Plan:

This system should generate a list of results relevant to users' queries. This means that it
should find metadata relevant to a user's search, and this metadata should in turn be relevant
to the audio file. Relevance feedback helps here: relevant search results for a query are given
a higher rank in later searches. The system should also effectively take in a raw audio file and
generate searchable index information for it, which is then stored in the meta-database.
Finally, it should allow users to add audio files to the system.
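One concrete measure for "a list of relevant results" is precision at k, a standard text-retrieval metric that could drive this evaluation against a set of manually judged relevant files:

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k returned results that are relevant.
    relevant_ids would come from manual relevance judgements."""
    top = ranked_ids[:k]
    if not top:
        return 0.0
    return sum(1 for rid in top if rid in relevant_ids) / len(top)

# Two of the top four results are relevant -> precision 0.5.
p = precision_at_k(["a", "b", "c", "d"], {"a", "c"}, k=4)
```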
System Architecture Diagram:
References:
1. A. Nanopoulos, 'Music search engines: Specifications and challenges', Information
Processing and Management, vol. 45, 2009, pp. 392–396.
http://delab.csd.auth.gr/papers/IPM09nrrm.pdf
2. O. Celma, P. Cano, P. Herrera, 'Search Sounds: An audio crawler focused on weblogs', 7th
International Conference on Music Information Retrieval (ISMIR), 2006.
http://ismir2006.ismir.net/PAPERS/ISMIR06144_Paper.pdf
3. M. Levy, M. Sandler, 'A Semantic Space For Music Derived From Social Tags', 8th
International Conference on Music Information Retrieval (ISMIR), Austrian Computer Society,
2007. http://ismir2007.ismir.net/proceedings/ISMIR2007_p411_levy.pdf

Appendices:
• SkreemR Search Incorporated, Website: http://skreemr.com/, Description: music search
engine
• Sylvain Demongeot, Website: http://www.wildbits.com/tunatic/, Description: music track
identifier
• Shazam Entertainment Limited, Website: http://www.shazam.com/, Description: music track
identifier

Glossary:

Audio Fingerprint: This allows the system to match an audio file with a specific song. This must
contain enough information so as to be unique against most other songs and acoustic models.

Extensible Markup Language (XML): a markup language for structuring and managing information.

Metadata: This is descriptive information about data. In the context of this report, the descriptive
information is user information, file information and music information and the data is an audio file.
This will most likely be stored in XML format.

Meta-database: A list that contains standard and song information about a specific audio file in the
music database.

Graphical User Interface (GUI): a way of interacting with a program in a visual way rather than
simple text.

Audiorank: ensures that significant files appear higher in search results. Significant files
are ones which are saved locally and are likely to be full-length songs, etc. This is based on
SkreemR's Audiorank system (http://skreemr.com/audiorank.jsp).