Back Gerard Roma defends his PhD thesis on June 5th

Gerard Roma defends his PhD thesis on June 5th

26.05.2015

 

5 Jun 2015

Gerard Roma defends his PhD thesis entitled "Algorithms and representations for supporting online music creation
with large-scale audio databases" on Friday June 5th 2015 at 11:00h in room 55.309 of the Communication Campus of the UPF.
 
The jury of the defense is: Sergi Jordà (UPF), Diemo Schwarz (IRCAM), Enric Guaus (ESMUC)
 
Thesis abstract:
The rapid adoption of Internet and web technologies has created an opportunity for making music collaboratively by sharing information online. However, current applications for online music making do not take advantage of the potential of shared information. The goal of this dissertation is to provide and evaluate algorithms and representations for interacting with large audio databases that facilitate music creation by online communities. This work has been developed in the context of Freesound, a large-scale, community-driven database of audio recordings shared under Creative Commons (CC) licenses. The diversity of sounds available through this kind of platform is unprecedented. At the same time, the unstructured nature ofcommunity-driven processes poses new challenges for indexing and retrieving information to support musical creativity. In this dissertation we propose and evaluate algorithms and representations for dealing with the main elements required by online music making applications based on large-scale audio databases: sound files, including time-varying and aggregate representations, taxonomies for retrieving sounds, music representations and community models. As a generic low-level representation for audio signals, we analyze the framework of cepstral coefficients, evaluating their performance with example classification tasks. We found that switching to more recent auditory filter such as gammatone filters improves, at large scales, on traditional representations based on the mel scale. We then consider common types of sounds for obtaining aggregated representations. We show that several time series analysis features computed from the cepstral coefficients complement traditional statistics for improved performance. For interacting with large databases of sounds, we propose a novel unsupervised algorithm that automatically generates taxonomical organizations based on the low-level signal representations. Based on user studies, we show that our approach can be used in place of traditional supervised classification approaches for providing a lexicon of acoustic categories suitable for creative applications. Next, a computational representation is described for music based on audio samples. We demonstrate through a user experiment that it facilitates collaborative creation and supports computational analysis using the lexicons generated by sound taxonomies. Finally, we deal with representation and analysis of user communities. We propose a method for measuring collective creativity in audio sharing. By analyzing the activity of the Freesound community over a period of more than 5 years, we show that the proposed creativity measures can be significantly related to social structure characterized by network analysis.

 

Multimedia

Categories:

SDG - Sustainable Development Goals:

Els ODS a la UPF

Contact