Voice and Audio Processing

Our research in audio signal processing is broad and multidisciplinary, with a strong focus on technology transfer, reflected in dozens of patents and several successful commercial products. Our current interests include singing voice synthesis, voice transformation, source separation, and automatic soundscape generation.

Main research topics

Singing Voice Synthesis

For more than a decade we have been developing models and dedicated approaches for singing voice synthesis based on the concept of performance sampling, with the aim of achieving a natural-sounding singing synthesizer (e.g. Bonada & Serra, 2007; Bonada, 2008). We have collaborated continuously with Yamaha Corp. in this area, and this collaboration has resulted in the popular Vocaloid commercial synthesizer.

More recent research centers on applying deep learning to singing synthesis. This approach allows us, for example, to model a singer's timbre and expression from natural songs (Blaauw & Bonada, 2017 [https://www.mdpi.com/2076-3417/7/12/1313]), or to clone a voice from just a few minutes of recordings (Blaauw & Bonada, 2019 [https://arxiv.org/abs/1902.07292]).

Demos

Most recent demos:

Demo 1 (newer, model trained on data with few annotations)

Demo 2 (older, model trained on data with more annotations)

Demo 3 (clone with a few minutes of recordings)

Older demos:

Source Separation

Musical audio signal separation addresses the problem of isolating particular signals from a musical mixture. We have focused on analyzing and extracting the predominant voice (e.g. Marxer et al., 2012) and the percussion components (e.g. Janer et al., 2012) from polyphonic music. These algorithms have various applications, including music production (e.g. remixes), entertainment (e.g. karaoke), and cultural heritage (e.g. restoration). More recent work addresses score-informed source separation. In the video demo below, we show how different orchestral instruments can be enhanced.

Score-informed separation of orchestral music (2014)

More demos and information on the project's page.

Other research topics

Real-time voice processing algorithms (e.g. Mayor et al., 2011) have been integrated in the licensed Kaleivoicecope technology. We have investigated topics such as voice quality transformation, voice impersonation, speech processing, emotional speech synthesis, voice enhancement, and non-stationary sinusoidal analysis.

Voice transformation demo (2007)

Singer impersonation demo (1999)

We have contributed to the voice analysis field with several methods for automatically transcribing melody and expression, as well as for rating a singing performance (e.g. Mayor et al., 2009). We have also adapted voice conversion strategies used in speech to the specificities of the singing voice, which makes it possible to create singer models when only a limited amount of audio material is available (e.g. Villavicencio & Bonada, 2010).

Processing polyphonic audio has also drawn our interest, including polyphonic time-scaling, tempo detection, rhythm modification (e.g. Janer et al., 2006), tonal analysis and visualization (e.g. Gómez & Bonada, 2005), audio mosaicing (e.g. Coleman et al., 2010), and score following.

Real-time rhythm transformation demo (2006)

Audio mosaicing demo (2009)

In recent years, we have extended the concept of performance sampling to the violin, contributing novel approaches for accurately capturing performer gestures with nearly non-intrusive sensing techniques, and for statistically modeling the temporal contour of those gestures and the timbre they produce (e.g. Maestre et al., 2010). Instrumental gesture modeling has proven to be a natural approach to controlling physical models, and thus to bridging the gap between the high-level controls of a symbolic score and the low-level input of the physical system.

Violinist gesture tracking demo (2010)

Violin synthesis controlled with gestures (2010)

The musical score can be used as supporting information in some audio applications. For example, in audio-to-score alignment the goal is to synchronize a given audio signal with its score. This can be done in real time or offline.
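As an illustration of the offline case, the sketch below aligns two feature sequences with classic dynamic time warping (DTW). It is a generic textbook technique, not necessarily the method used in the work described here. The function name dtw_align, the cosine distance, and the toy one-hot "chroma" features are assumptions made for the example.

```python
import numpy as np

def dtw_align(audio_feats, score_feats):
    """Offline DTW between two feature sequences (frames x dims).

    Returns a list of (audio_frame, score_frame) index pairs.
    Hypothetical helper for illustration only.
    """
    n, m = len(audio_feats), len(score_feats)
    # Cosine distance between every audio frame and every score frame.
    a = audio_feats / (np.linalg.norm(audio_feats, axis=1, keepdims=True) + 1e-9)
    s = score_feats / (np.linalg.norm(score_feats, axis=1, keepdims=True) + 1e-9)
    cost = 1.0 - a @ s.T

    # Accumulated cost with a one-cell border of infinities.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # advance in both sequences
                acc[i - 1, j],      # advance in the audio only
                acc[i, j - 1],      # advance in the score only
            )

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin((acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path

# Toy data: the score has three notes (C, E, G), one frame each;
# the "audio" holds each note for a different number of frames.
C, E, G = np.eye(3)
score = np.array([C, E, G])
audio = np.array([C, C, E, E, E, G])
path = dtw_align(audio, score)
```

In this toy case, path maps audio frames 0-1 to the first note, frames 2-4 to the second, and frame 5 to the third. A real-time variant would have to commit to a path as frames arrive rather than backtracking from the end of the recording.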

Audio to Score Alignment example (2014)

Orchestral music alignment (2014)

Beyond music and voice signals, we have also applied our algorithms to environmental sounds and bioacoustics, for example to the generation of soundscapes (e.g. Janer et al., 2011) and to the analysis and denoising of marine bioacoustic signals.

Soundscape generation demo (2011)

MTG - Music Technology Group
