Emotional Speech Synthesis for a Radio DJ: Corpus Design and Expression Modeling

Umbert, M.

Note: This bibliographic page is archived and will no longer be updated. For an up-to-date list of publications from the Music Technology Group, see the Publications list.

Title	Emotional Speech Synthesis for a Radio DJ: Corpus Design and Expression Modeling
Publication Type	Master Thesis
Year of Publication	2010
Authors	Umbert, M.
preprint/postprint document	static/media/Umbert-Marti-Master-Thesis-2010.pdf
Abstract	This master thesis concerns the design of a corpus for speech synthesis and the modeling of different emotions in the context of a radio DJ speaker. We designed a corpus that represents the sentences radio DJs use to present the songs played in a radio show. A professional speaker was recorded uttering a set of these sentences at different levels of arousal and speed. By labeling the phonemes of the recorded sentences, control parameters were extracted in order to transform or synthesize the sentences in other emotion and speech rate conditions, which involves changing the control parameters accordingly or changing synthesized keywords such as a band or song name. More precisely, the aim of this project is to model how different acoustic parameters behave according to a given emotion. The model considers syllable energy, duration and pitch, which are used to transform (or even synthesize) a recorded sentence into another with a different emotion. The results are compared objectively against the training data and evaluated subjectively in terms of emotion activation and speech rate.