Signal_Processing_Magazine_2015

 

Sound examples for the Signal Processing Magazine 2015 paper entitled "Expression Control in Singing Voice Synthesis: Features, Approaches, Evaluation, and Challenges."

Umbert, M., Bonada, J., Goto, M., Nakano, T., & Sundberg, J.

Music Technology Group, Universitat Pompeu Fabra
Barcelona, Spain

 
 

Reference

Umbert, M., Bonada, J., Goto, M., Nakano, T., & Sundberg, J. (In Press). Expression Control in Singing Voice Synthesis: Features, Approaches, Evaluation, and Challenges. IEEE Signal Processing Magazine. [PDF (coming soon)]

Abstract
 
In the context of singing voice synthesis, expression control manipulates a set of voice features related to a particular emotion, style, or singer. Also known as performance modeling, it has been approached from different perspectives and for different purposes, and different projects have shown a wide extent of applicability. The aim of this article is to provide an overview of approaches to expression control in singing voice synthesis. Section I introduces some musical applications that use singing voice synthesis techniques to justify the need for an accurate control of expression. Then, expression is defined and related to speech and instrument performance modeling. Next, Section II presents the commonly studied set of voice parameters that can change perceptual aspects of synthesized voices. Section III provides, as the main topic of this review, an up-to-date classification, comparison, and description of a selection of approaches to expression control. Then, Section IV describes how these approaches are currently evaluated and discusses the benefits of building a common evaluation framework and adopting perceptually-motivated objective measures. Finally, Section V discusses the challenges that we currently foresee.

Download audio samples

Audio content

  1. Audio samples referred to in the original article.
  2. Classification of Expression Control Methods in Singing Voice Synthesis.
  3. Gathered singing voice synthesis performances with different expression control approaches:
    • Both the URLs from the original publication and the audio are provided where available.
    • * Where the referenced article provides no audio link, the audio is published with the author's permission.

Audio Samples

Description Clara Rockmore’s theremin performance of Vocalise
Sound Original source
Samples

 

Description Expression analysis of a singing voice recording sample
Sound MTG Studio Recording
Samples

 

Classification of Expression Control Methods in Singing Voice Synthesis

 

 

Singing voice synthesis performances with different expression control approaches

  1. Performance-driven
  2. Rule-based
  3. Statistical Modeling
  4. Unit Selection

Performance-driven

Title High Quality Singing Synthesis using the Selection-based Synthesis Scheme
Type Non-iterative
Publication

Y. Meron (1999)

Sound

Original source

Synthesized audio
 
Title Performance-driven control for sample-based singing voice synthesis
Type Non-iterative
Publication

J. Janer et al. (2006)

Sound

Original source

Reference audio
Synthesized audio
 
Title Speech-to-singing synthesis: converting speaking voices to singing voices by controlling acoustic features unique to singing voices
Type Non-iterative
Publication

T. Saitou et al. (2007)

Sound

Original source

Reference audio
Synthesized audio
Reference audio
Synthesized audio
 
Title VocaListener: A Singing-to-Singing Synthesis System Based on Iterative Parameter Estimation
Type Iterative
Publication

T. Nakano and M. Goto (2009)

Sound

Original source

Reference audio
Synthesized audio

Title VocaListener2: A Singing Synthesis System Able to Mimic a User's Singing in Terms of Voice Timbre Changes As Well As Pitch and Dynamics
Type Iterative
Publication

T. Nakano and M. Goto (2011)

Sound

Original source

Reference audio
Synthesized audio


Rule-based

 
Title The KTH synthesis of singing
Type KTH rules
Publication

J. Sundberg (2006)

Sound Original audio links and descriptions are given in the reference
Ex 3: Different pitch change timing
Ex 13: Synthesized tenor voice
Ex 16: Synthesized boy soprano
 
Title Expressive Performance Model for a Singing Voice Synthesizer
Type KTH rules with Vocaloid
Publication

M. Alonso (2005)

Sound  *
Audio: dry
Audio: anger
Audio: fear
Audio: happy
Audio: sad
Audio: tender
 
Title Voice Processing and Synthesis by Performance Sampling and Spectral Models
Type Empirically based
Publication

J. Bonada (2008)

Sound  *
Song 1 (Excerpt 1): Female
Song 1 (Excerpt 2): Male
Song 2 (Excerpt 1): Female 1
Song 2 (Excerpt 1): Female 2
Song 2 (Excerpt 2): Male 1
 

Statistical Modeling

 
Title An HMM-based Singing Voice Synthesis System
Type HMM-based
Publication

K. Saino et al. (2006)

Sound Not available; see Sinsy, a later improvement of this system
 
Title Recent Development of the HMM-based Singing Voice Synthesis System - Sinsy
Type HMM-based
Publication

K. Oura et al. (2010)

Sound

Original source

Audio: dry
Audio: anger
 
Title A singing style modeling system for singing voice synthesizers
Type HMM-based
Publication

K. Saino et al. (2010)

Sound Not available

 

Unit Selection

 
Title Generating Singing Voice Expression Contours Based on Unit Selection
Type Unit selection
Publication

M. Umbert, J. Bonada, and M. Blaauw (2013)

Sound

Original source

Song 1: default synthesis
Song 1: expressive synthesis
Song 2: default synthesis
Song 2: expressive synthesis

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 
