Expressive Singing Synthesis Based on Unit Selection for the Singing Synthesis Challenge 2016

Title: Expressive Singing Synthesis Based on Unit Selection for the Singing Synthesis Challenge 2016
Publication Type: Conference Paper
Year of Publication: 2016
Conference Name: Interspeech
Authors: Bonada, J., Umbert, M., & Blaauw, M.
Conference Start Date: 13/09/2016
Conference Location: San Francisco, USA
Keywords: expressive synthesis, singing voice synthesis
Abstract: Sample-based and statistically based singing synthesizers typically require a large amount of data to automatically generate expressive synthetic performances. In this paper we present a singing synthesizer that, using two rather small databases, generates expressive synthesis from an input consisting of notes and lyrics. The system is based on unit selection and uses the Wide-Band Harmonic Sinusoidal Model to transform samples. The first database focuses on expression and consists of less than 2 minutes of free expressive singing using solely vowels. The second is the timbre database, which for the English case consists of roughly 35 minutes of monotonic singing of a set of sentences, one syllable per beat. The synthesis is divided into two steps. First, an expressive vowel singing performance of the target song is generated using the expression database. Next, this performance is used as the input control for synthesis with the timbre database and the target lyrics. A selection of synthetic performances has been submitted to the Interspeech Singing Synthesis Challenge 2016, in which they are compared to other competing systems.
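The two-step pipeline described in the abstract can be sketched in code. This is an illustrative toy model only, not the authors' implementation: the database contents, the nearest-pitch selection cost, and all names (`Note`, `select_expression_units`, `synthesize_with_timbre`) are assumptions introduced here to show the flow of step 1 (expression unit selection) feeding step 2 (timbre synthesis controlled by the expressive performance).

```python
# Toy sketch of the paper's two-step synthesis (assumed structure, not the
# actual system): step 1 selects expression units, step 2 pairs the resulting
# expressive performance with timbre units for the target lyrics.
from dataclasses import dataclass


@dataclass
class Note:
    pitch: float     # target pitch as a MIDI note number
    duration: float  # seconds


def select_expression_units(notes, expression_db):
    """Step 1: for each target note, pick the expression unit (a vowel pitch
    contour) with the closest base pitch, then transpose it to the target."""
    performance = []
    for note in notes:
        unit = min(expression_db, key=lambda u: abs(u["pitch"] - note.pitch))
        shift = note.pitch - unit["pitch"]
        performance.append([p + shift for p in unit["contour"]])
    return performance


def synthesize_with_timbre(performance, lyrics, timbre_db):
    """Step 2: use the expressive contours as the control input and attach
    the timbre unit for each syllable; 'synthesis' here is just a pairing."""
    return [
        {"syllable": syl, "contour": contour, "timbre": timbre_db.get(syl)}
        for syl, contour in zip(lyrics, performance)
    ]


# Tiny stand-ins for the expression and timbre databases.
expression_db = [
    {"pitch": 60.0, "contour": [60.0, 60.3, 60.1]},
    {"pitch": 67.0, "contour": [67.0, 66.8, 67.2]},
]
timbre_db = {"la": "unit_la", "da": "unit_da"}

notes = [Note(62.0, 0.5), Note(66.0, 0.5)]
perf = select_expression_units(notes, expression_db)
out = synthesize_with_timbre(perf, ["la", "da"], timbre_db)
```

In this sketch, the note at pitch 62 is rendered with the transposed contour of the nearest expression unit (base pitch 60), so its contour starts exactly at the target pitch; the real system instead transforms recorded samples with the Wide-Band Harmonic Sinusoidal Model.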