News and Events

Seminar by Jean-Julien Aucouturier on spectro-temporal receptive fields for MIR
22 Nov 2013
Jean-Julien Aucouturier, from CNRS/IRCAM, gives a seminar on "Spectro-temporal receptive fields (STRFs): a biologically-plausible alternative to MFCCs?" on Friday November 22nd at 15:30h in room 55.410.

Abstract: We describe recent experiments adapting a computational model of the mammalian auditory cortex to Music Information Retrieval tasks. The model, called Spectro-Temporal Receptive Fields (STRFs), simulates the responses of auditory cortical neurons as a filterbank of Gabor functions tuned not only on frequencies, but also on rates (temporal modulations, in Hz) and scales (frequency modulations, in cycles/octave). Off the shelf, it provides a 30,000-dimensional feature space; when these dimensions are integrated, we can derive novel signal representations/features that (1) perform equivalently to or better than e.g. Mel-Frequency Cepstrum Coefficients for an audio similarity task, (2) are somewhat amusing (e.g. dynamic frequency warping instead of DTW), and (3) are more plausible than the usual MIR features from a biological point of view.
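
To make the rate/scale tuning concrete, here is a minimal Python/NumPy sketch of one such two-dimensional Gabor filter over a log-frequency spectrogram patch. It illustrates the idea only, not the exact model from the talk; the function name, axis resolutions and envelope widths are our own choices.

```python
import numpy as np

def strf_gabor(rate_hz, scale_cpo, dur=0.25, octaves=2.0,
               frames_per_s=100, bins_per_oct=24,
               sigma_t=0.05, sigma_f=0.25):
    """Toy spectro-temporal Gabor filter: rate_hz sets the temporal
    modulation (Hz), scale_cpo the spectral modulation (cycles/octave).
    Convolve the result with a log-frequency spectrogram patch."""
    t = np.linspace(-dur / 2, dur / 2, int(dur * frames_per_s))              # time axis (s)
    f = np.linspace(-octaves / 2, octaves / 2, int(octaves * bins_per_oct))  # log-freq axis (oct)
    T, F = np.meshgrid(t, f, indexing='ij')
    envelope = np.exp(-T**2 / (2 * sigma_t**2) - F**2 / (2 * sigma_f**2))
    carrier = np.cos(2 * np.pi * (rate_hz * T + scale_cpo * F))
    return envelope * carrier
```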

18 Nov 2013 - 10:46
Martín Haro defends his PhD thesis on November 22nd
22 Nov 2013

Martín Haro defends his PhD thesis entitled "Statistical Distribution of Common Audio Features" on Friday November 22nd 2013 at 11:00h in room 55.309 of the Communication Campus of the UPF.

The jury of the defense is composed of Josep Lluis Arcos (IIIA-CSIC), Emilia Gómez (UPF) and Jean-Julien Aucouturier (IRCAM).

Abstract: In the last few years, some Music Information Retrieval (MIR) researchers have spotted important drawbacks in applying standard successful-in-monophonic algorithms to polyphonic music classification and similarity assessment. Notably, these so-called “Bag-of-Frames” (BoF) algorithms share a common set of assumptions: that the numerical descriptions extracted from short-time audio excerpts (or frames) are enough to capture the information relevant for the task at hand, that these frame-based audio descriptors are time-independent, and that descriptor frames are well described by Gaussian statistics. Thus, if we want to improve current BoF algorithms we could: i) improve current audio descriptors, ii) include temporal information within algorithms working with polyphonic music, and iii) study and characterize the real statistical properties of these frame-based audio descriptors. From a literature review, we have detected that many works focus on the first two improvements, but, surprisingly, there is a lack of research on the third one. Therefore, in this thesis we analyze and characterize the statistical distribution of common audio descriptors of timbre, tonal and loudness information. Contrary to what is usually assumed, our work shows that the studied descriptors follow heavy-tailed distributions and thus do not belong to a Gaussian universe. This new knowledge led us to propose new algorithms that improve over the BoF approach in current MIR tasks such as genre classification, instrument detection, and automatic tagging of music. Furthermore, we also address new MIR tasks such as measuring the temporal evolution of Western popular music. Finally, we highlight some promising paths for future audio-content MIR research that will inhabit a heavy-tailed universe.
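
As a concrete illustration of the kind of check the thesis performs, the sketch below (our own, not code from the thesis) probes how far frame-based descriptors deviate from Gaussianity, using excess kurtosis and a standard normality test:

```python
import numpy as np
from scipy import stats

def gaussianity_report(frames):
    """frames: (n_frames, n_dims) array of frame-based descriptors,
    e.g. MFCC frames. Excess kurtosis is 0 for a Gaussian; large
    positive values signal heavy tails."""
    for d in range(frames.shape[1]):
        x = frames[:, d]
        k = stats.kurtosis(x)        # Fisher (excess) kurtosis
        _, p = stats.normaltest(x)   # D'Agostino-Pearson test
        print(f"dim {d}: excess kurtosis = {k:6.2f}, normality p = {p:.3g}")
```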

15 Nov 2013 - 14:30
GiantSteps kick-off: creating the next generation of digital musical tools
21 Nov 2013 - 22 Nov 2013

GiantSteps (Seven League Boots for Music Creation and Performance) is a STREP project funded by the European Commission and coordinated by the MTG in collaboration with JCP-Consult; it will run for 36 months starting on the 1st of November 2013.

The project aims to create the "seven-league boots" for music production in the next decade and beyond. Just as the boots of European folklore enable huge leaps, GiantSteps proposes digital music tools for the near future, to empower all musical users, from professionals to casual musicians and even children. By bringing different types of musical knowledge, in the form of musical expert agents (harmonic, melodic, rhythmic and song-structure agents) tuned to different proficiency levels according to the specific needs of the user, the GiantSteps tools seek to stimulate the inspiration of all users: allowing them to create collaboratively, boosting mutual inspiration, helping anyone learn and discover while creating, and enabling professionals to work faster while maintaining their creative flow.

In order to meet this ambitious goal, the GiantSteps project has assembled a strong, balanced consortium representing the whole value chain and including leading agents from research institutions (MTG-UPF and Universität Linz, JKU), industrial partners (Native Instruments and Reactable Systems) and music practitioners (STEIM and Red Bull Music Academy). With this consortium, and with project coordination by JCP-Consult, GiantSteps will be able to combine techniques and technologies in new and unprecedented ways, including the combination of state-of-the-art interfaces and interface-design techniques with advanced methods in Music Information Research that have yet to be applied in a real-time interaction context or with creativity objectives. The consortium's industry organizations will guarantee the alignment of these cutting-edge technologies with existing market requirements, allowing for a smooth integration of research outcomes into real-world systems. The MTG team, coordinated by Sergi Jordà and Perfecto Herrera, will bring its expertise in music information retrieval (MIR) and advanced interaction.

The GiantSteps kick-off meeting will take place at the MTG on the 21st and 22nd of November.

14 Nov 2013 - 18:25
Seminar by Bob Sturm on evaluation in MIR
13 Nov 2013

Bob L. Sturm, from Aalborg University Copenhagen, gives a seminar on "The crisis of evaluation in MIR" on Wednesday November 13th 2013 at 3:30pm in room 55.410.

Abstract: I critically address the "crisis of evaluation" in music information retrieval (MIR), with particular emphasis on music genre recognition, music mood recognition, and autotagging. I demonstrate four things: 1) many published results unknowingly use datasets with faults that render them meaningless; 2) state-of-the-art (“high classification accuracy”) systems are fooled by irrelevant factors; 3) most published results are based upon an invalid evaluation design; and 4) a lot of work has unknowingly built, tuned, tested, compared and advertised "horses" instead of solutions. (The example of the horse Clever Hans provides an apt illustration.) I argue these problems occur because: 1) many researchers assume a dataset is good simply because many others use it; 2) many researchers assume that evaluation practices which are standard in machine learning or information retrieval are useful and relevant for MIR; 3) many researchers mistake systematic, rigorous, and standardized evaluation for scientific evaluation; and 4) problems and success criteria remain ill-defined, and evaluation thus remains poor, because researchers do not define appropriate use cases. I show how this "crisis of evaluation" can be addressed by formalizing evaluation in MIR to make clear its aims, parts, design, execution, interpretation, and assumptions. I also present several alternative evaluation approaches that can separate horses from solutions.
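
One concrete example of such a design fix, sketched here with scikit-learn (our illustration, not a method proposed in the talk): grouping tracks by artist during cross-validation, so that a system cannot score well merely by recognising artists instead of genres.

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

def artist_filtered_accuracy(X, y, artists, n_splits=5):
    """X: (n_tracks, n_features); y: genre labels; artists: artist id per
    track. GroupKFold keeps each artist entirely in train or in test, so a
    classifier cannot succeed by recognising artists instead of genres."""
    scores = []
    for train, test in GroupKFold(n_splits=n_splits).split(X, y, groups=artists):
        clf = KNeighborsClassifier().fit(X[train], y[train])
        scores.append(accuracy_score(y[test], clf.predict(X[test])))
    return float(np.mean(scores))
```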

8 Nov 2013 - 22:37
Participation in ISMIR 2013

Mohamed Sordo, Alastair Porter, Dmitry Bogdanov, Sertan Şentürk and Xavier Serra participate in the International Society for Music Information Retrieval Conference 2013 (ISMIR 2013), which takes place from November 4th to 8th, 2013 in Curitiba (Brazil). These are the research papers that will be presented:

Mohamed gives a tutorial on "Music Autotagging" and chairs the Late-break/Demos session.

30 Oct 2013 - 19:43
Seminar by Lonce Wyse on Audio and Interaction
31 Oct 2013

Lonce Wyse, from the National University of Singapore, gives a talk on "Audio and Interaction through the Browser" on Thursday October 31st, 2013 at 15:30h in room 52.321.

Abstract: Emerging web standards are just beginning to address the vast unrealized potential for sound on the browser platform. The first small steps toward a capable sound synthesis system have been made with the Web Audio API, but there are many shortcomings in its performance and usability. With faster engines and good libraries, JavaScript has become the lingua franca for client-side programming, and is making inroads on the server side as well, though issues remain from the perspective of sound. In this talk, I will discuss recent developments of the web platform for sound, and present several projects from my lab that are moving toward creating an ecosystem of support for sound design, synthesis, interaction, collaboration, and performance on the web.

Biography: Lonce Wyse received his PhD in Cognitive and Neural Systems from Boston University in 1994 with a dissertation on pitch perception. He is now an Associate Professor of Communications and New Media at the National University of Singapore. He also directs the Arts and Creativity Lab and an Art/Science Residency Program at the NUS Interactive and Digital Media Institute. He is currently spending a semester sabbatical visiting the Music Technology Group at UPF.

28 Oct 2013 - 19:01
ESSENTIA wins the Open-Source Software Competition of ACM Multimedia

ESSENTIA, an audio analysis software library developed at the MTG, wins the Open-Source Software Competition of ACM Multimedia 2013, the premier worldwide multimedia conference, which took place in Barcelona from October 21st to 25th, 2013.

The ACM Multimedia Open-Source Software Competition celebrates the invaluable contribution of researchers and software developers who advance the field by providing the community with implementations of codecs, middleware, frameworks, toolkits, libraries, applications, and other multimedia software. The criteria for judging all submissions include broad applicability and potential impact, novelty, technical depth, demo suitability, and other miscellaneous factors (e.g., maturity, popularity, student-led, no dependence on closed source, etc.).

ESSENTIA is an open-source C++ library for audio analysis and audio-based music information retrieval released under the Affero GPLv3 license. It contains an extensive collection of reusable algorithms which implement audio input/output functionality, standard digital signal processing blocks, statistical characterization of data, and a large set of spectral, temporal, tonal and high-level music descriptors. ESSENTIA is behind a number of commercial applications and has been used by many academic research projects. It is responsible for the audio similarity search functionality of Freesound.org.
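
To give a flavour of its usage, here is a minimal sketch with Essentia's Python bindings that computes MFCCs frame by frame; 'song.wav' is a placeholder path, and the exact API should be checked against the official documentation.

```python
import essentia.standard as es

# Load an audio file as a mono 44.1 kHz signal ('song.wav' is a placeholder)
audio = es.MonoLoader(filename='song.wav')()

# Chain standard DSP blocks: windowing -> spectrum -> MFCC
window = es.Windowing(type='hann')
spectrum = es.Spectrum()
mfcc = es.MFCC()

mfccs = []
for frame in es.FrameGenerator(audio, frameSize=1024, hopSize=512):
    bands, coeffs = mfcc(spectrum(window(frame)))
    mfccs.append(coeffs)
```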

For more information on ESSENTIA, read the paper that was presented at ACM Multimedia. For extensive documentation and to download the software, go to its website.

25 Oct 2013 - 10:01
Participation in the IEEE AASP Challenge

Gerard Roma, Waldo Nogueira and Perfecto Herrera have participated in the Challenge for Detection and Classification of Acoustic Scenes and Events, organized by the Audio and Acoustic Signal Processing committee of the IEEE. One of their submitted algorithms obtained the best accuracy in the scene classification task, which involved 10 different acoustic scenes. The MTG algorithm scored well above both the baseline and the other 16 submitted algorithms. The strength of the approach lies in taking advantage of recurrence quantification analysis (RQA) features computed from MFCCs.
This research is being presented this week at the 2013 Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), held at Mohonk Mountain House, New Paltz, NY, USA.
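
For intuition, recurrence quantification analysis starts from a recurrence matrix over descriptor frames. The sketch below (an illustration, not the submitted algorithm) computes the simplest RQA feature, the recurrence rate, from a sequence of MFCC frames; the threshold heuristic is our own assumption.

```python
import numpy as np

def recurrence_rate(mfcc, eps=None):
    """mfcc: (n_frames, n_coeffs) array. Builds the recurrence matrix
    (which pairs of frames lie closer than eps) and returns the fraction
    of recurrent pairs -- the most basic RQA feature."""
    D = np.linalg.norm(mfcc[:, None, :] - mfcc[None, :, :], axis=-1)
    if eps is None:
        eps = np.percentile(D, 10)  # heuristic threshold (our assumption)
    return float((D < eps).mean())
```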

22 Oct 2013 - 13:57
Seminar by George Tzanetakis on Physical Modelling
24 Oct 2013

George Tzanetakis, from the University of Victoria (Canada), gives a talk on "Physical Modelling Beyond Sound Synthesis: three case studies" on Thursday October 24th at 15:30h in room 52.321.

Abstract: Physical modelling synthesis refers to a set of sound synthesis techniques in which the generated sound is computed as the output of a set of equations describing the actual physics of sound production in musical instruments. It can be viewed as a virtual simulation of the physical world. Physical modelling synthesis has many advantages, including rich, realistic sound, meaningful control parameters (such as the length of a string), and computational efficiency. One of the main challenges is that, as with actual acoustic musical instruments, the control of physical models is not trivial: many regions of the parameter space produce undesirable output sound. In this talk I will describe three case studies that combine concepts from different research areas (namely New Interfaces for Musical Expression and Music Information Retrieval) with physical modelling synthesis to address this control issue:

  1. the SoundPlane, a multi-touch, pressure-sensitive surface that provides rich control possibilities;
  2. the hybrid Gyil (an African xylophone from Ghana), in which the buzzing gourds are physically modelled on a computer while the actual wooden bars are retained; and
  3. Vivi, a virtual violinist that is taught how to bow a physical violin model much as a beginner learns to bow an actual acoustic violin.
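
As a taste of how simple a physical model can be, here is a minimal Karplus-Strong plucked-string sketch in Python. It is a classic textbook model, not one of the systems presented in the talk, and the decay constant is an illustrative choice.

```python
import numpy as np

def karplus_strong(f0=220.0, sr=44100, dur=1.0, decay=0.996):
    """Pluck a virtual string: a delay line seeded with noise is
    repeatedly low-pass filtered, mimicking the damping of a real string."""
    n = int(sr / f0)                   # delay-line length sets the pitch
    buf = np.random.uniform(-1, 1, n)  # the 'pluck': broadband excitation
    out = np.empty(int(sr * dur))
    for i in range(len(out)):
        out[i] = buf[i % n]
        # two-point average models frequency-dependent loss in the string
        buf[i % n] = decay * 0.5 * (buf[i % n] + buf[(i + 1) % n])
    return out
```

Writing the output to a WAV file and playing it back yields a convincingly string-like tone from a dozen lines of code, which is what makes the control question, rather than the synthesis itself, the interesting problem.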
21 Oct 2013 - 11:07
Participation in ACM Multimedia 2013

Quite a few members of the MTG participate in ACM Multimedia 2013, which takes place in Barcelona from October 21st to the 25th. Apart from the people attending and others volunteering in the organization, the research presentations from the MTG include:

Additionally, Emilia Gómez gives a tutorial on "Multimedia Information Retrieval: Music and Audio", and Juanjo Bosch and Álvaro Sarasúa take part in the Doctoral Symposium.

17 Oct 2013 - 15:59