Semantic Annotation and Retrieval of Audio Content

Work by Douglas Turnbull, Luke Barrington, David Torres, and Prof. Gert Lanckriet


We have developed a computer audition system that can:
1) Annotate novel audio tracks with semantically meaningful words and
2) Use a semantic query to retrieve relevant tracks from a database of unlabeled audio content.
We consider the related tasks of content-based audio annotation and retrieval as one supervised multi-class problem in which we model the joint probability of acoustic features and words.
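As a sketch of what modeling words and acoustic features together can look like, let a track be a set of feature vectors X = {x_1, ..., x_T} and let w_i be a word in the vocabulary V. The notation, the conditional independence of the feature vectors given the word, and the form of the word prior P(w_i) are assumptions made for this illustration and are not stated on this page:

```latex
P(w_i \mid \mathcal{X})
  = \frac{P(\mathcal{X} \mid w_i)\, P(w_i)}
         {\sum_{v=1}^{|\mathcal{V}|} P(\mathcal{X} \mid w_v)\, P(w_v)},
\qquad
P(\mathcal{X} \mid w_i) = \prod_{t=1}^{T} P(x_t \mid w_i)
```

Under this reading, annotation selects the words with the largest posterior for a given track, and retrieval ranks tracks by the posterior of the query word.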

We have collected the CAL-500 data set of 1700 human-generated musical annotations that describe 500 popular Western musical tracks.

For each word in a vocabulary, we use this data to train a Gaussian mixture model (GMM) over an audio feature space. We estimate the parameters of the model using the weighted mixture hierarchies expectation maximization algorithm [ref]. This algorithm is more scalable to large data sets and produces better density estimates than standard parameter estimation techniques.
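To illustrate per-word density estimation, the sketch below fits a small Gaussian mixture with standard EM. It is not the weighted mixture hierarchies algorithm referred to above. It uses one-dimensional toy values in place of real audio features, and the function names, the quantile initialisation, and the variance floor are all assumptions made for the example.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Total log-likelihood of the data under the mixture."""
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def fit_gmm(data, n_components=2, n_iters=50, min_var=1e-3):
    """Fit a 1-D GMM with standard EM (not mixture hierarchies EM)."""
    ordered = sorted(data)
    n = len(ordered)
    # Deterministic initialisation: place the means at evenly spaced quantiles.
    means = [ordered[(2 * k + 1) * n // (2 * n_components)]
             for k in range(n_components)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [max(overall_var, min_var)] * n_components
    weights = [1.0 / n_components] * n_components
    history = []
    for _ in range(n_iters):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means and variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk,
                min_var)
        history.append(log_likelihood(data, weights, means, variances))
    return weights, means, variances, history


# Toy stand-in for the feature vectors of tracks labeled with one word:
# two clusters of values centred on -2 and 3.
toy_features = ([-2.0 + 0.1 * i for i in range(-5, 6)]
                + [3.0 + 0.1 * i for i in range(-5, 6)])
weights, means, variances, history = fit_gmm(toy_features)
```

In a full system, one such model would be trained per vocabulary word, on multi-dimensional features from the tracks that annotators associated with that word.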

The quality of the music annotations produced by our system is comparable with the performance of humans on the same task. Our 'query-by-text' system can retrieve appropriate songs for a large number of musically relevant words. We also show that our audition system is general by learning a model that can annotate and retrieve sound effects.
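The two tasks can be sketched with the same per-word models: score a track under every word model, normalise the scores into a distribution over words, then either read off the top words (annotation) or rank tracks by one word's score (query-by-text retrieval). Everything below is illustrative. The words, mixture parameters, track names, and one-dimensional features are made up, and averaging log-likelihoods over feature vectors is a length normalisation chosen for this sketch.

```python
import math


def mixture_log_pdf(x, components):
    """Log density of 1-D x under a mixture of (weight, mean, variance) triples."""
    return math.log(sum(
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in components))


def word_posteriors(features, word_models):
    """Distribution over words for one track, assuming a uniform word prior."""
    avg_ll = {
        word: sum(mixture_log_pdf(x, comps) for x in features) / len(features)
        for word, comps in word_models.items()
    }
    peak = max(avg_ll.values())  # subtract the max for numerical stability
    unnorm = {word: math.exp(ll - peak) for word, ll in avg_ll.items()}
    total = sum(unnorm.values())
    return {word: u / total for word, u in unnorm.items()}


def annotate(features, word_models, top_k=2):
    """Return the top_k most probable words for a track."""
    post = word_posteriors(features, word_models)
    return sorted(post, key=post.get, reverse=True)[:top_k]


def retrieve(query_word, tracks, word_models):
    """Rank track names by the posterior probability of the query word."""
    scored = {name: word_posteriors(feats, word_models)[query_word]
              for name, feats in tracks.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical per-word models and unlabeled tracks (toy 1-D features).
word_models = {
    "mellow": [(1.0, -2.0, 1.0)],
    "aggressive": [(1.0, 3.0, 1.0)],
    "danceable": [(0.5, 3.0, 1.0), (0.5, 0.5, 1.0)],
}
tracks = {
    "track_a": [-2.1, -1.8, -2.3],
    "track_b": [2.9, 3.2, 3.1],
}

tags_a = annotate(tracks["track_a"], word_models, top_k=2)
ranking = retrieve("aggressive", tracks, word_models)
```

Because both functions read from the same word distribution, improving the per-word density estimates benefits annotation and retrieval together.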

Here is an example of how our system can be used to generate a completely automated record review. We annotate a random song from our dataset, choose the top words from a number of categories (genre, instrumentation, etc.), and place these words in a natural language context. Words in bold are generated by our system. Automatic Rolling Stone!

Eminem - Dead Wrong
This is a hip hop/rap song that also has an electronica feel. It is not tender / soft and arousing / awakening. It features drum machine, samples, synthesizer and a nice male vocal solo. The vocals are rapping and strong. It is a song that is very danceable and with a synthesized texture that you might like to listen to while at a party.
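The review-generation step can be sketched as picking the best-scoring word in each category and dropping it into a fixed sentence template. The category names, candidate words, scores, and template below are hypothetical and much simpler than the example above; they only show the mechanism.

```python
def top_word(scores, candidates):
    """Return the candidate word with the highest score (missing words score 0)."""
    return max(candidates, key=lambda word: scores.get(word, 0.0))


def write_review(title, scores, vocabulary):
    """Fill a fixed natural-language template with the top word per category."""
    genre = top_word(scores, vocabulary["genre"])
    instrument = top_word(scores, vocabulary["instrumentation"])
    usage = top_word(scores, vocabulary["usage"])
    return (f"{title}\n"
            f"This is a {genre} song that features {instrument}. "
            f"You might like to listen to it while {usage}.")


# Hypothetical vocabulary grouped by category.
vocabulary = {
    "genre": ["hip hop/rap", "folk", "jazz"],
    "instrumentation": ["drum machine", "acoustic guitar", "saxophone"],
    "usage": ["at a party", "studying", "driving"],
}

# Made-up annotation scores for a single track.
scores = {
    "hip hop/rap": 0.31, "folk": 0.02, "jazz": 0.05,
    "drum machine": 0.27, "acoustic guitar": 0.03, "saxophone": 0.04,
    "at a party": 0.22, "studying": 0.01, "driving": 0.05,
}

review = write_review("Example Artist - Example Track", scores, vocabulary)
print(review)
```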

CAL-500 semantic vocabulary for music analysis
Automatic annotations of the CAL-500 dataset


Download CAL-500

Researchers interested in using the annotations in the data set for their own computer audition research can request a free copy of our data set by emailing Doug Turnbull (dturnbul -at- cs -dot- ucsd -dot- edu). Please let us know how and why you plan to use this data.

Relevant Publications
Turnbull, Barrington, Torres & Lanckriet (2008) - Semantic Annotation and Retrieval of Music and Sound Effects. IEEE Transactions on Audio, Speech, and Language Processing, February 2008
Turnbull, Barrington, Torres & Lanckriet (2007) - Towards Musical Query-by-Semantic Description using the CAL500 Data Set. ACM SIGIR, Amsterdam, July 2007
Barrington, Chan, Turnbull & Lanckriet (2007) - Audio Information Retrieval Using Semantic Similarity. International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hawaii, April 2007
Turnbull, Barrington, Torres & Lanckriet (2007) - Exploring the Semantic Annotation and Retrieval of Sound. CAL Technical Report CAL-2007-01, San Diego, February 2007
Turnbull, Barrington & Lanckriet (2006) - Modeling Music and Words using a Multi-Class naive Bayes Approach. International Symposium on Music Information Retrieval (ISMIR), Victoria, October 2006


Related Work
Dan Ellis at Columbia's LabROSA.
Brian Whitman, graduate of the MIT Media Lab.
Malcolm Slaney, now at Yahoo! Research.
Nuno Vasconcelos, Gustavo Carneiro, Antoni Chan and Nikhil Rasiwasia at UCSD's Statistical Visual Computing Lab.