Voice and Audio Processing
Main research topics
For more than a decade we have been developing models and specific approaches for singing voice synthesis based on the concept of performance sampling, with the aim of achieving a natural-sounding singing synthesizer (e.g., Bonada & Serra, 2007; Bonada, 2008). We have collaborated continuously with Yamaha Corp. in this area, and this collaboration has resulted in the popular commercial synthesizer Vocaloid. More recent research deals with singing style modeling, which learns to imitate a singer's expression from a few of her/his recordings.
|Growl synthesis demo (2013)||Singing style modeling demo (2012)|
Diphone concatenative synthesis with growls
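In concatenative synthesis, recorded units such as diphones are eventually joined end to end. The unit selection, transformation and joining used in Vocaloid are considerably more elaborate than anything shown here. The following is only a minimal generic sketch that assumes the units have already been selected and pitch-matched and are stored as lists of samples. The function name `concatenate_units` and its `overlap` parameter are hypothetical.

```python
def concatenate_units(units, overlap):
    """Join waveform units (lists of floats) end to end, blending each
    boundary with a linear crossfade of up to `overlap` samples.
    Hypothetical helper: assumes units are already pitch- and level-matched."""
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        for i in range(n):
            # Fade-in weight for the incoming unit rises from ~0 to ~1
            w = (i + 1) / (n + 1)
            idx = len(out) - n + i
            out[idx] = (1 - w) * out[idx] + w * unit[i]
        # Append the part of the incoming unit that was not crossfaded
        out.extend(unit[n:])
    return out
```

A crossfade only hides small discontinuities. Real systems also smooth pitch, amplitude and timbre across the joint, which is where most of the difficulty lies.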
Musical Audio Signal Separation addresses the problem of segregating individual signals from a musical mixture. We have focused on the analysis and extraction of the predominant voice from polyphonic music (e.g., Marxer et al., 2012) and of percussion components (e.g., Janer et al., 2012). These algorithms have various applications, including music production (e.g., remixes), entertainment (e.g., karaoke) and cultural heritage (e.g., restoration). More recent work addresses score-informed source separation. In the video demo below, we show how different orchestral instruments can be enhanced.
|Lead vocals separation examples (2012)||Score-informed separation of orchestral music (2014)|
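Many separation methods end with a time-frequency masking step. The sketch below shows a generic Wiener-style soft mask that scales each bin of the mixture magnitude spectrogram by the estimated voice power divided by the total estimated power. The power estimates are assumed to be given. Obtaining them (for example from a pitch-guided model or from a score) is the hard part, and that step is not reproduced here. The function name `soft_mask` is hypothetical.

```python
def soft_mask(mix_mag, voice_pow, accomp_pow, eps=1e-12):
    """Apply a Wiener-style soft mask to a mixture magnitude spectrogram.
    All inputs are frames x bins nested lists. The voice and accompaniment
    power estimates are assumed to come from some separate model."""
    return [
        [m * v / (v + a + eps) for m, v, a in zip(m_row, v_row, a_row)]
        for m_row, v_row, a_row in zip(mix_mag, voice_pow, accomp_pow)
    ]
```

The masked magnitudes would normally be combined with the mixture phase and inverted with an inverse STFT to obtain the separated voice.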
Other research topics
Our real-time voice processing algorithms (e.g., Mayor et al., 2011) have been integrated into the licensed Kaleivoicecope technology. We have also investigated topics such as voice quality transformation, voice impersonation, speech processing, emotional speech synthesis, voice enhancement and non-stationary sinusoidal analysis.
|Voice transformation demo (2007)||Singer impersonation demo (1999)|
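A common textbook approach to voice transformation works on a sinusoidal (partial-based) representation. To shift pitch while keeping the timbre, each partial frequency is multiplied by a transposition ratio, and its amplitude is re-read from the original spectral envelope so that formants stay in place. The sketch below illustrates that idea only. It is an assumption for illustration, not a description of Kaleivoicecope, and the function names are hypothetical.

```python
def interp(x, xs, ys):
    """Piecewise-linear interpolation of ys over ascending xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return ys[-1]  # not reached for ascending xs


def transpose_preserving_envelope(partials, semitones, env_freqs, env_db):
    """Shift (freq_hz, amp_db) partials by `semitones` and read each new
    amplitude from the original spectral envelope (env_freqs in Hz, env_db in dB)."""
    ratio = 2.0 ** (semitones / 12.0)  # equal-tempered transposition ratio
    return [(f * ratio, interp(f * ratio, env_freqs, env_db)) for f, _ in partials]
```

Discarding the original partial amplitudes and re-sampling the envelope is what keeps the vowel quality stable. Transposing the partials together with their amplitudes would instead shift the formants and produce the familiar "chipmunk" effect.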
We have contributed to the field of voice analysis with several methods for automatically transcribing melody and expression, as well as for rating singing performances (e.g., Mayor et al., 2009). We have also adapted voice conversion strategies from speech to the specificities of the singing voice, making it possible to create singer models when only a limited amount of audio material is available (e.g., Villavicencio & Bonada, 2010).
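Rating a sung performance usually involves comparing the sung pitch with a reference melody. The sketch below shows one of the simplest possible intonation measures: the mean absolute deviation in cents between a fundamental-frequency (f0) track and a target MIDI note. It uses the standard conversion midi = 69 + 12·log2(f / 440 Hz). This is a hypothetical illustration, and the rating method of Mayor et al. (2009) is more elaborate. The function names are illustrative, and unvoiced frames are assumed to be marked as 0 Hz.

```python
import math


def hz_to_midi(f_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number, A4 = 440 Hz = 69."""
    return 69 + 12 * math.log2(f_hz / 440.0)


def mean_abs_cents_error(f0_track, target_midi):
    """Mean absolute deviation, in cents, of voiced f0 frames from a target note.
    Frames with f0 <= 0 are treated as unvoiced and skipped (an assumption)."""
    devs = [abs(100 * (hz_to_midi(f) - target_midi)) for f in f0_track if f > 0]
    return sum(devs) / len(devs) if devs else float("nan")
```

One semitone equals 100 cents, so a value of 50 means the singer is on average a quarter tone away from the target.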
Processing polyphonic audio has also drawn our interest, including polyphonic time-scaling, tempo detection, rhythm modification (e.g., Janer et al., 2006), tonal analysis and visualization (e.g., Gómez & Bonada, 2005), audio mosaicing (e.g., Coleman et al., 2010) and score following.
|Real-time rhythm transformation demo (2006)||Audio mosaicing demo (2009)|
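As an illustration of tempo detection, a classic baseline autocorrelates an onset-strength envelope and picks the lag, within a plausible tempo range, that maximizes the autocorrelation. The sketch below assumes the onset envelope has already been computed at a known frame rate. It is a generic textbook method, not the approach of Janer et al. (2006), and the function name and default tempo range are hypothetical.

```python
def estimate_tempo(onset_env, frame_rate, bpm_min=60.0, bpm_max=200.0):
    """Return the tempo (BPM) whose beat period maximizes the autocorrelation
    of an onset-strength envelope sampled at `frame_rate` frames per second."""
    mean = sum(onset_env) / len(onset_env)
    x = [v - mean for v in onset_env]  # remove DC so silence does not dominate
    # Convert the tempo range into a range of lags (in frames)
    lag_min = max(1, int(round(60.0 * frame_rate / bpm_max)))
    lag_max = min(len(x) - 1, int(round(60.0 * frame_rate / bpm_min)))
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return 60.0 * frame_rate / best_lag
```

For example, an envelope at 100 frames per second with a pulse every 50 frames yields 120 BPM. Methods of this kind are prone to octave errors (half or double tempo), which practical systems reduce with tempo priors or by weighting multiples of the period.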
In recent years, we have extended the concept of performance sampling to the violin, contributing novel approaches for accurately capturing performer gestures with nearly non-intrusive sensing techniques and for statistically modeling the temporal contours of those gestures and the timbre they produce (e.g., Maestre et al., 2010). Instrumental gesture modeling has proven to be a natural approach to controlling physical models, bridging the gap between the high-level controls of a symbolic score and the low-level input of the physical system.
|Violinist gesture tracking demo (2010)||Violin synthesis controlled with gestures (2010)|
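Modeling the temporal contour of a gesture statistically requires a compact parametric representation of that contour. One common choice, assumed here purely for illustration, is a cubic Bézier segment per note: its four control values can be learned from recorded bowing data and later sampled to drive a synthesizer. The sketch below only evaluates such a segment. It does not reproduce the statistical model of Maestre et al. (2010), and the function name is hypothetical.

```python
def bezier_contour(p0, p1, p2, p3, n):
    """Sample n >= 2 evenly spaced points of a 1-D cubic Bezier curve with
    control values p0..p3, e.g. a hypothetical bow-velocity contour for one note."""
    if n < 2:
        raise ValueError("n must be at least 2")
    out = []
    for k in range(n):
        t = k / (n - 1)
        u = 1 - t
        # Bernstein form of the cubic Bezier curve
        out.append(u**3 * p0 + 3 * u**2 * t * p1 + 3 * u * t**2 * p2 + t**3 * p3)
    return out
```

The curve starts at `p0` and ends at `p3`, while `p1` and `p2` shape the attack and release. This makes the four values convenient targets for a statistical model conditioned on score context.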
The musical score can be used as supporting information in some audio applications. For example, in audio-to-score alignment the goal is to synchronize a given audio signal with its score. Alignment can be performed either in real time or offline.
|Audio to Score Alignment example (2014)||Orchestral music alignment (2014)|
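A standard technique for offline audio-to-score alignment is dynamic time warping (DTW) over a cost matrix that compares audio frames with score frames. The sketch below assumes this cost matrix has already been computed. In practice it might come from chroma or spectral templates synthesized from the score, but here the toy example uses plain numbers. This is a generic textbook DTW, not necessarily the method behind the demos, and real-time alignment would need an online variant.

```python
import math


def dtw_path(cost):
    """Offline DTW over a cost matrix (audio frames x score frames).
    Returns the minimum-cost warping path as (audio_idx, score_idx) pairs."""
    n, m = len(cost), len(cost[0])
    inf = math.inf
    acc = [[inf] * m for _ in range(n)]  # accumulated cost
    acc[0][0] = cost[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(
                acc[i - 1][j] if i > 0 else inf,            # audio advances
                acc[i][j - 1] if j > 0 else inf,            # score advances
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,  # both advance
            )
            acc[i][j] = cost[i][j] + best
    # Backtrack from the last frame pair to the first
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    return path[::-1]


# Toy usage: audio pitch sequence vs. score pitch sequence, cost = absolute difference
audio, score = [1, 1, 2, 3], [1, 2, 3]
path = dtw_path([[abs(a - s) for s in score] for a in audio])
```

In the toy usage, the repeated first audio frame is mapped to the first score note, giving the path `[(0, 0), (1, 0), (2, 1), (3, 2)]`.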
Beyond music and voice signals, we have also applied our algorithms to environmental sounds and bioacoustics, for example to generate soundscapes (e.g., Janer et al., 2011) and to analyze and denoise marine bioacoustic signals.
|Soundscape generation demo (2011)|
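The text does not specify which denoising method was used for marine bioacoustic signals. Purely as an illustration, the sketch below shows classic magnitude spectral subtraction. A noise spectrum is averaged over frames assumed to contain only background noise, and it is then subtracted from every frame, with a spectral floor that avoids negative magnitudes. The function names and the `alpha` and `floor` parameters are hypothetical.

```python
def estimate_noise(noise_frames_mag):
    """Average magnitude per frequency bin over frames assumed to be noise-only."""
    k = len(noise_frames_mag)
    return [sum(col) / k for col in zip(*noise_frames_mag)]


def spectral_subtract(frames_mag, noise_mag, alpha=1.0, floor=0.01):
    """Subtract alpha * noise estimate from each frame's magnitude spectrum,
    never going below `floor` times the original magnitude."""
    return [
        [max(m - alpha * n, floor * m) for m, n in zip(frame, noise_mag)]
        for frame in frames_mag
    ]
```

The spectral floor trades a little residual noise for fewer "musical noise" artifacts. The cleaned magnitudes would then be recombined with the original phase and inverted with an inverse STFT.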