We have developed a computer audition system that can (1) annotate novel audio tracks with semantically meaningful words and (2) use a semantic query to retrieve relevant tracks from a database of unlabeled audio content. We treat the related tasks of content-based audio annotation and retrieval as one supervised multi-class problem in which we model the joint probability of acoustic features and words. We have collected the CAL-500 data set of 1700 human-generated musical annotations that describe 500 popular Western music tracks. For each word in a vocabulary, we use this data to train a Gaussian mixture model (GMM) over an audio feature space. We estimate the parameters of each model using the weighted mixture hierarchies expectation maximization algorithm [ref]. This algorithm scales better to large data sets and produces better density estimates than standard parameter estimation techniques. The quality of the music annotations produced by our system is comparable to that of annotations produced by humans on the same task. Our 'query-by-text' system can retrieve appropriate songs for a large number of musically relevant words. We also show that our audition system is general by learning a model that can annotate and retrieve sound effects.
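The way word-level GMMs can drive both annotation and retrieval can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's actual implementation: the GMM parameters are hand-set rather than trained (the weighted mixture hierarchies EM step is not shown), the feature vectors are two-dimensional toy values, the word prior is assumed uniform, and the song likelihood is assumed to be length-normalized. All names (`WORD_MODELS`, `SONGS`, `annotate`, `retrieve`) are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_gmm(x, gmm):
    """Log density of a GMM given as a list of (weight, mean, var) components."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))

def word_posteriors(frames, word_models):
    """P(word | song), assuming a uniform word prior.

    The song score for a word is the average frame log-likelihood under that
    word's GMM (an assumed length normalization), renormalized over all words.
    """
    scores = {
        word: sum(log_gmm(x, gmm) for x in frames) / len(frames)
        for word, gmm in word_models.items()
    }
    top = max(scores.values())
    unnorm = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(unnorm.values())
    return {w: u / total for w, u in unnorm.items()}

def annotate(frames, word_models, k=2):
    """Return the k most probable words for one song."""
    post = word_posteriors(frames, word_models)
    return sorted(post, key=post.get, reverse=True)[:k]

def retrieve(query_word, songs, word_models):
    """Rank song names by the posterior probability of the query word."""
    rank = {
        name: word_posteriors(frames, word_models)[query_word]
        for name, frames in songs.items()
    }
    return sorted(rank, key=rank.get, reverse=True)

# Hypothetical hand-set word models: word -> list of (weight, mean, variance).
WORD_MODELS = {
    "mellow": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "aggressive": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [6.0, 4.0], [1.0, 1.0])],
    "acoustic": [(1.0, [0.0, 2.0], [1.0, 1.0])],
}

# Hypothetical songs, each a short bag of 2-D feature vectors.
SONGS = {
    "song_a": [[0.1, -0.2], [0.3, 0.4], [-0.2, 0.1]],
    "song_b": [[4.2, 3.9], [5.8, 4.1], [5.0, 4.0]],
}

print(annotate(SONGS["song_a"], WORD_MODELS))
print(retrieve("aggressive", SONGS, WORD_MODELS))
```

Because both tasks read from the same per-word posteriors, annotation sorts words for a fixed song while retrieval sorts songs for a fixed word.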
Here is an example of how our system can be used to generate a completely automated record review. We annotate a random song from our data set, choose the top words from a number of categories (genre, instrumentation, etc.), and place these words in a natural language context. Words in bold are generated by our system. Automatic Rolling Stone!
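The review procedure above (pick the top word per category, then slot the words into a sentence) can be sketched as follows. The categories, word scores, sentence template, and the `<b>` markup used to mark generated words are all hypothetical and chosen only for illustration; they are not taken from the CAL-500 vocabulary or from the system's actual templates.

```python
def top_word_per_category(posteriors, categories):
    """Pick the highest-scoring word in each category."""
    return {
        cat: max(words, key=lambda w: posteriors.get(w, 0.0))
        for cat, words in categories.items()
    }

def write_review(title, picks, template):
    """Fill a sentence template, marking system-generated words in bold."""
    bold = {cat: f"<b>{word}</b>" for cat, word in picks.items()}
    return template.format(title=title, **bold)

# Hypothetical categories and per-word annotation scores for one song.
CATEGORIES = {
    "genre": ["rock", "jazz"],
    "instrument": ["electric guitar", "piano"],
    "mood": ["mellow", "aggressive"],
}
POSTERIORS = {
    "rock": 0.30, "jazz": 0.05,
    "electric guitar": 0.25, "piano": 0.10,
    "mellow": 0.08, "aggressive": 0.22,
}
TEMPLATE = "{title} is a {genre} song that sounds {mood} and features {instrument}."

PICKS = top_word_per_category(POSTERIORS, CATEGORIES)
REVIEW = write_review("Example Song", PICKS, TEMPLATE)
print(REVIEW)
```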
CAL-500 semantic vocabulary for music analysis
Automatic annotations of the CAL-500 dataset
Download CAL-500
Researchers interested in using the annotations in the data set for their own computer audition research can request a free copy of our data set by emailing Doug Turnbull (dturnbul -at- cs -dot- ucsd -dot- edu). Please let us know how and why you plan to use this data.
Relevant Publications
Turnbull, Barrington, Torres & Lanckriet (2008) - Semantic Annotation and Retrieval of Music and Sound Effects. IEEE Transactions on Audio, Speech, and Language Processing, February 2008
Turnbull, Barrington, Torres & Lanckriet (2007) - Towards Musical Query-by-Semantic Description using the CAL500 Data Set. ACM SIGIR, Amsterdam, July 2007
Barrington, Chan, Turnbull & Lanckriet (2007) - Audio Information Retrieval Using Semantic Similarity. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hawaii, April 2007
Turnbull, Barrington, Torres & Lanckriet (2007) - Exploring the Semantic Annotation and Retrieval of Sound. CAL Technical Report CAL-2007-01, San Diego, February 2007
Turnbull, Barrington & Lanckriet (2006) - Modeling Music and Words using a Multi-Class Naive Bayes Approach. International Symposium on Music Information Retrieval (ISMIR), Victoria, October 2006
Related Work
Dan Ellis at Columbia's LabROSA.
Brian Whitman, graduate of the MIT Media Lab.
Malcolm Slaney, now at Yahoo! Research.
Nuno Vasconcelos, Gustavo Carneiro, Antoni Chan and Nikhil Rasiwasia at UCSD's Statistical Visual Computing Lab.