Acoustic Cues to Beat Induction: A Machine Learning Perspective

Gouyon, F.; Widmer, G.; Serra, X.

Title	Acoustic Cues to Beat Induction: A Machine Learning Perspective
Publication Type	Conference Paper
Year of Publication	2005
Conference Name	10th Rhythm Perception and Production Workshop
Authors	Gouyon, F., Widmer, G., & Serra, X.
Pagination	177-188
Conference Location	Bilzen, Belgium
Abstract	According to Honing (1993), "there seems to be a general consensus on the notion of discrete elements (e.g. notes, sound events or objects) as the primitives of music ... but a detailed discussion and argument for this assumption is missing from the literature." While early computational models of beat induction often process discrete events, such as parsed scores or MIDI events, many recent systems deal directly with acoustic signals. Some of them aim to derive similar note-like representations; others work at a lower level of abstraction and a different timescale: acoustic features computed on consecutive short signal frames (typically 10 ms long). While different musical cues to beat induction have been studied in the context of discrete note representations (note time, duration, pitch, harmony), few lower-level features have been considered, mainly energy variations in several frequency bands. In this study, we address the question of which acoustic features are the most adequate for identifying musical beats computationally. We consider 274 different acoustic features computed on consecutive frames and systematically evaluate the worth of individual features, as well as feature subsets, in the task of providing reliable cues for the presence and localisation of beats in musical signals. Evaluation of features is based on a machine learning methodology involving a large corpus of beat-annotated musical audio pieces covering 10 musical genres (1360 instances, more than 90000 beats).