Emotional Speech Synthesis for a Radio DJ: Corpus Design and Expression Modeling

Publication Type: Master Thesis
Year of Publication: 2010
Authors: Umbert, M.
Preprint/postprint document: static/media/Umbert-Marti-Master-Thesis-2010.pdf

This master's thesis addresses the design of a corpus for speech synthesis and the modeling of different emotions, in the context of a radio DJ speaker.

For the radio DJ speaker, we designed a corpus representative of the sentences radio DJs use to introduce the songs played in a radio show. A professional speaker was recorded uttering a set of these sentences at different levels of arousal and speed. After labeling the phonemes of the recorded sentences, control parameters were extracted from them so that the sentences can be transformed or synthesized under other emotion and speech-rate conditions by modifying the control parameters accordingly, and so that synthesized keywords, such as a band or song name, can be changed.
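The abstract does not describe how the control parameters are extracted from the labeled recordings. A minimal sketch, assuming each phoneme label carries its syllable index and that frame-level F0 and energy contours are available at a fixed 10 ms hop, could aggregate them into per-syllable duration, energy and pitch. The label format and the names `PhonemeLabel`, `SyllableParams` and `syllable_params` are hypothetical, not taken from the thesis.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class PhonemeLabel:
    # Hypothetical label format; the thesis's annotation scheme may differ.
    phoneme: str
    start: float   # seconds
    end: float     # seconds
    syllable: int  # index of the syllable this phoneme belongs to


@dataclass(frozen=True)
class SyllableParams:
    duration: float  # seconds
    energy: float    # mean frame energy, in the units of the input contour
    pitch: float     # mean F0 over voiced frames in Hz (0.0 if unvoiced)


def syllable_params(labels, f0, energy, hop=0.01):
    """Aggregate frame-level F0/energy contours into per-syllable parameters.

    f0 and energy are sampled every `hop` seconds (assumed 10 ms);
    F0 values of 0 mark unvoiced frames.
    """
    # Merge the time spans of all phonemes belonging to the same syllable.
    spans = {}
    for lab in labels:
        s, e = spans.get(lab.syllable, (lab.start, lab.end))
        spans[lab.syllable] = (min(s, lab.start), max(e, lab.end))

    result = []
    for syl in sorted(spans):
        start, end = spans[syl]
        first = int(round(start / hop))
        last = max(first + 1, int(round(end / hop)))
        frames = range(first, min(last, len(f0)))
        voiced = [f0[i] for i in frames if f0[i] > 0]
        result.append(SyllableParams(
            duration=end - start,
            energy=mean(energy[i] for i in frames) if frames else 0.0,
            pitch=mean(voiced) if voiced else 0.0,
        ))
    return result
```

Averaging F0 only over voiced frames keeps unvoiced consonants from pulling the syllable pitch toward zero.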

More precisely, the aim of this project is to model how different acoustic parameters behave depending on the target emotion. The model considers syllable energy, duration and pitch, which are used to transform (or even synthesize) a recorded sentence into one with a different emotion. The results are compared objectively against the training data and evaluated subjectively in terms of emotional activation and speech rate.
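The abstract names the modeled parameters but not the form of the model. As one simple possibility, not necessarily the approach taken in the thesis, each emotion can be represented by the mean ratio of each syllable parameter relative to a neutral rendition of the same sentences; a neutral sentence is then transformed by scaling, and the result is compared objectively with recorded data using RMSE. The ratio model, the RMSE measure and the names `fit_ratios`, `transform` and `rmse` are assumptions for illustration.

```python
from dataclasses import dataclass, replace
from math import sqrt
from statistics import mean


@dataclass(frozen=True)
class SyllableParams:
    duration: float  # seconds
    energy: float    # mean energy, linear units
    pitch: float     # mean F0 in Hz (0.0 = unvoiced)


FIELDS = ("duration", "energy", "pitch")


def fit_ratios(neutral, target):
    """Hypothetical emotion model: mean target/neutral ratio per parameter,
    computed over syllable-aligned pairs (zero values are skipped)."""
    ratios = {}
    for f in FIELDS:
        pairs = [(getattr(n, f), getattr(t, f))
                 for n, t in zip(neutral, target)
                 if getattr(n, f) > 0 and getattr(t, f) > 0]
        ratios[f] = mean(t / n for n, t in pairs) if pairs else 1.0
    return ratios


def transform(sentence, ratios):
    """Scale every syllable's parameters by the emotion's ratios.
    Unvoiced syllables (pitch 0.0) stay unvoiced."""
    return [replace(s, **{f: getattr(s, f) * ratios[f] for f in FIELDS})
            for s in sentence]


def rmse(predicted, reference, field):
    """Assumed objective measure: RMSE of one parameter against recordings."""
    errs = [(getattr(p, field) - getattr(r, field)) ** 2
            for p, r in zip(predicted, reference)]
    return sqrt(mean(errs))
```

A single global ratio per parameter ignores syllable position and stress; it only illustrates the fit, transform and evaluate loop.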