AudioClas: Automatic Classification of Sound Effects

Fact Sheet

Status: FINISHED - 12-Jul-2006

Technological Area, Market Area

Start Date: 01-Jan-2002
End date: 01-Feb-2005



There is currently no method to automatically classify audio and musical sound effects. To digitise a library of 4 million already recorded effects and classify them manually would take two operators 34 years! Human classification of sound effects is also notoriously inaccurate (when is a 'slam' a 'bang', for example?). We therefore wish to develop computer software that will classify these sounds sensibly, reliably and quickly. Once the sounds are classified, we need to be able to search them in three ways: from text (a 'bang'), from microphone input, where the operator says 'bang', or from an already existing effect, asking for similar effects. Finally, when the desired effect, or effects, are found, it may be necessary to 'combine' effects. For example, we may find the 'slam' of a FORD MONDEO car door and the 'slam' of a ROLLS ROYCE car door. To get a MERCEDES car door, we may wish to 'combine' these effects to produce something half way (or three quarters of the way) between them.

Keywords: audio, classification, computer.

MTG contribution

The Music Technology Group will work on the software algorithms that analyse the sounds, extract all the possible parameters from them, and select the sound characteristics that should be stored in the database for future reference. To do that, the SMS (Spectral Modeling Synthesis) techniques, explained below, will be used. A sound model assumes certain characteristics of the sound waveform or the sound generation mechanism.

In general, every analysis/synthesis system has an underlying model. The sounds produced by musical instruments, or by any physical system, can be modeled as the sum of a set of sinusoids plus a noise residual. The sinusoidal, or deterministic, component normally corresponds to the main modes of vibration of the system. The residual comprises the energy produced by the excitation mechanism that is not transformed by the system into stationary vibrations, plus any other energy component that is not sinusoidal in nature.
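
The sinusoids-plus-residual model can be illustrated with a minimal sketch. This is not the SMS analysis itself: it assumes the partial frequencies are already known and that each completes a whole number of cycles in the analysed frame, so that a simple correlation with cosine and sine recovers amplitude and phase. All function names and the example values are hypothetical.

```python
import math
import random

def synthesize(partials, n_samples, sample_rate, noise_level=0.0, seed=0):
    """Sum of sinusoids (deterministic part) plus white noise (stochastic part).

    partials is a list of (frequency_hz, amplitude, phase_rad) tuples.
    """
    rng = random.Random(seed)
    signal = []
    for n in range(n_samples):
        t = n / sample_rate
        s = sum(a * math.cos(2 * math.pi * f * t + p) for f, a, p in partials)
        signal.append(s + noise_level * rng.gauss(0.0, 1.0))
    return signal

def fit_partial(signal, freq, sample_rate):
    """Estimate (amplitude, phase) of one sinusoid by correlation.

    Assumption: freq completes a whole number of cycles in the frame.
    """
    n_samples = len(signal)
    w = 2 * math.pi * freq / sample_rate
    c = sum(x * math.cos(w * n) for n, x in enumerate(signal)) * 2.0 / n_samples
    s = sum(x * math.sin(w * n) for n, x in enumerate(signal)) * 2.0 / n_samples
    return math.hypot(c, s), math.atan2(-s, c)

def decompose(signal, freqs, sample_rate):
    """Split a frame into fitted partials, deterministic part and residual."""
    partials = [(f, *fit_partial(signal, f, sample_rate)) for f in freqs]
    deterministic = synthesize(partials, len(signal), sample_rate)
    residual = [x - d for x, d in zip(signal, deterministic)]
    return partials, deterministic, residual

# Hypothetical example: two partials plus a small amount of noise.
frame = synthesize([(200.0, 1.0, 0.3), (450.0, 0.5, -1.0)],
                   n_samples=800, sample_rate=8000, noise_level=0.05)
partials, deterministic, residual = decompose(frame, [200.0, 450.0], 8000)
```

Here the residual is simply what remains after subtracting the fitted sinusoids, which matches the definition given above. A real analysis would have to detect and track the partials frame by frame rather than take their frequencies as given.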

This method allows better recognition and characterisation of sounds, which improves the efficiency of sound recognition software. Parameters such as pitch, amplitude (attack, steady state and decay analysis), vibrato, peak continuation, spectral shape of the noise part and its time-varying magnitude, etc. are used to describe the sound. After extracting these sound description parameters, an algorithm should be developed to find similarities between sounds. A powerful and robust database organisation must be used to improve the speed and reliability of this algorithm. This technique could also be used in synthesis algorithms, allowing different processing of the sinusoidal and the stochastic parts of the sound, which improves the naturalness of the result. After that, for sound/effects modification or combination, different sound morphing algorithms could be used.
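
One simple way to find similarities between sounds, once each is reduced to a vector of description parameters, is a nearest-neighbour search. The sketch below is an assumption made for illustration, not the project's algorithm: the three descriptors (attack time, decay time, spectral centroid), the sound names and all values are made up, and each descriptor is scaled to zero mean and unit variance so that no single unit dominates the Euclidean distance.

```python
import math

def normalise(library):
    """Scale each descriptor to zero mean and unit variance across the library."""
    names = list(library)
    dims = len(library[names[0]])
    means = [sum(library[n][d] for n in names) / len(names) for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((library[n][d] - means[d]) ** 2 for n in names) / len(names)
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero
    return {n: [(library[n][d] - means[d]) / stds[d] for d in range(dims)]
            for n in names}

def most_similar(query_name, library, k=1):
    """Return the k sounds closest to query_name in normalised descriptor space."""
    scaled = normalise(library)
    query = scaled[query_name]
    ranked = sorted((math.dist(query, vec), name)
                    for name, vec in scaled.items() if name != query_name)
    return [name for _, name in ranked[:k]]

# Hypothetical descriptors: [attack time (s), decay time (s), spectral centroid (Hz)]
library = {
    "door_slam_a": [0.005, 0.30, 1800.0],
    "door_slam_b": [0.006, 0.35, 1500.0],
    "bell":        [0.010, 2.50, 3200.0],
    "wind":        [0.800, 1.50, 900.0],
}
closest = most_similar("door_slam_a", library, k=1)
```

This version compares the query against every entry, which is fine for a handful of sounds. For a library of millions of effects, the database organisation mentioned above would need an index that avoids a full scan.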


Eureka project website

Eureka success story