Audio Signal Processing Lab

Lab members:

Xavier Serra, Faculty, head of lab
Dmitry Bogdanov, Postdoc
Frederic Font, Postdoc
Gopala Krishna Koduri, PhD student
Sertan Şentürk, PhD student
Sankalp Gulati, Postdoc
Jordi Pons, PhD student
Rafael Caro Repetto, PhD student
Sergio Oramas, PhD student
Georgi Dzhambazov, PhD student
Gong Rong, PhD student
Alastair Porter, researcher
Oriol Romani, researcher
Andrés Ferraro, researcher
Hasan Sercan Atlı, researcher
Swapnil Gupta, researcher
Eduardo Fonseca, PhD student
Xavier Favory, PhD student

The current focus of the Audio Signal Processing Lab of the MTG is to combine audio signal processing methods with machine learning and semantic technologies in order to create large, structured sound and music collections and to extract useful musical knowledge from them. Our current research is partly supported by several EU projects (CompMusic, AudioCommons, CAMUT) and national projects (MINGUS, DTIC-MdM).

In the context of CompMusic, we are interested in developing music description techniques through the study of the art music traditions of India (Hindustani and Carnatic), Turkey (Turkish-makam), the Maghreb (Arab-Andalusian), and China (Beijing Opera) (e.g., Serra, 2012; Serra, 2011). Our approach combines signal-processing and machine-learning methodologies, so a large effort has gone into assembling appropriate research corpora on which to carry out this data-driven work. Using these corpora, we have focused on melodic and rhythmic aspects, with the goal of identifying musically meaningful patterns and developing similarity measures between the relevant data entities. This work has allowed us to develop Dunya, which integrates the music corpora with software tools for browsing them.

In the context of AudioCommons (Font et al., 2016), we are interested in developing technologies that can enhance the reuse potential of openly available audio content. These technologies relate to the semantic description of audio content, such as that available in Freesound. We are working on sound representation approaches and ontologies for particular use cases in the creative industries, and on feature analysis methodologies, using Essentia, that can generate semantically meaningful descriptions for those same use cases.

Most of the core signal processing algorithms being developed and used in our research projects are part of Essentia, an open-source C++ library for audio processing optimised for scalability (Bogdanov et al., 2013).

Since 2005 we have been developing and maintaining Freesound, a platform with which we do research on social computing and semantic web topics (e.g., Font, 2015; Roma et al., 2012). Freesound is an excellent platform on which we have been experimenting with, deploying and evaluating research ideas related to audio description, classification, recommendation, similarity measures, tag propagation and ontologies.

More recently, we have been actively involved in the development of AcousticBrainz, an open platform for crowdsourcing audio analysis data of commercial music recordings. The data is obtained using Essentia and can be of use for a variety of music information research and application tasks.