News and Events

Seminar by Geraint Wiggins on Computational Creativity
2 Mar 2017

Geraint A. Wiggins, from Queen Mary University of London, gives a seminar on "Creativity, deep symbolic learning, and the information dynamics of thinking" on Thursday, March 2nd 2017, at 15:30h in room 55.309 of the Communication Campus of the UPF.

Abstract: I present a hypothetical theory of cognition which is based on the principle that mind/brains are information processors and compressors, that are sensitive to certain measures of information content, as defined by Shannon (1948). The model is intended to help explicate processes of anticipatory and creative reasoning in humans and other higher animals. The model is motivated by the evolutionary value of prediction in information processing in an information-overloaded world.
The Information Dynamics of Thinking (IDyOT) model brings together symbolic and non-symbolic cognitive architectures, by combining sequential modelling with hierarchical symbolic memory, in which symbols are grounded by reference to their perceptual correlates. This is achieved by a process of chunking, based on boundary entropy, in which each segment of an input signal is broken into chunks, each of which corresponds with a single symbol in a higher level model. Each chunk corresponds with a temporal trajectory in the complex Hilbert space given by a spectral transformation of its signal; each symbol above each chunk corresponds with a point in a higher space which is in turn a spectral representation of the lower space. Norms in the spaces admit measures of similarity, which allow grouping of categories of symbol, so that similar chunks are associated with the same symbol. This chunking process recurses “up” IDyOT’s memory, so that representations become more and more abstract.
It is possible to construct a Markov Model along a layer of this model, or up or down between layers. Thus, predictions may be made from any part of the structure, more or less abstract, and it is in this capacity that IDyOT is claimed to model creativity, at multiple levels, from the construction of sentences in everyday speech to the improvisation of musical melodies.
IDyOT’s learning process is a kind of deep learning, but it differs from the more familiar neural network formulation because it includes symbols that are explicitly grounded in the learned input, and its answers will therefore be explicable in these terms.
In this talk, I will explain and motivate the design of IDyOT with reference to various different aspects of music, language and speech processing, and to animal behaviour.
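
As a rough illustration of chunking by boundary entropy, the following Python sketch segments a symbol sequence at the points where uncertainty about the next symbol rises. It is a simplification under stated assumptions and not IDyOT's implementation: it uses a first-order (bigram) model over discrete symbols rather than spectral representations, and the function names and the toy input are hypothetical.

```python
import math
from collections import Counter, defaultdict


def bigram_model(sequence):
    """Count next-symbol frequencies for each symbol (first-order Markov model)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[prev][nxt] += 1
    return counts


def entropy(counter):
    """Shannon entropy, in bits, of the distribution described by a Counter."""
    total = sum(counter.values())
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def chunk_by_boundary_entropy(sequence):
    """Close a chunk after each symbol whose predictive entropy exceeds that of
    the symbol before it (an assumed, simplified boundary criterion)."""
    model = bigram_model(sequence)
    # h[i]: uncertainty about what follows sequence[i] under the bigram model.
    h = [entropy(model[s]) if model[s] else 0.0 for s in sequence]
    chunks, start = [], 0
    for i in range(1, len(sequence) - 1):
        if h[i] > h[i - 1]:
            chunks.append(sequence[start:i + 1])
            start = i + 1
    chunks.append(sequence[start:])
    return chunks


# Toy input built from the repeated units "abc" and "xy".
print(chunk_by_boundary_entropy("abcxyabcabcxyxyabc"))
# ['abc', 'xy', 'abc', 'abc', 'xy', 'xy', 'abc']
```

On the toy string the boundaries fall after each repeated unit, which is where the next symbol becomes hard to predict.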

24 Feb 2017 - 18:13
YoMo: Presentation of TELMI project in the Mobile World Congress

On February 27th at 1 PM we will present the TELMI project at the Youth Mobile Festival (YoMo), which is part of the GSMA Mobile World Congress. The festival is aimed at students from 10 to 16 years old, and 15,000 students are expected to visit.

The presentation will focus on how interactive technologies can help in instrument learning. We will show some demos, and participants will be able to try some prototypes.

Find here more details of the activity:

21 Feb 2017 - 17:46
Journal paper on the QUARTET dataset and the Repovizz system

Over the past few years, there has been an increasingly active discussion about publishing and accessing datasets for reuse in academic research. Although sometimes driven by concrete needs concerning a particular dataset or project, this topic is not accessory. In a data-driven research community like ours, it is very healthy to exchange ideas and perspectives on how to devise flexible means for making our data and results accessible--a valuable pursuit towards supporting research reproducibility.

The Music Technology Group of UPF hosts and provides free access to a number of datasets for music and audio research. As is normally the case with published datasets, one needs to download the data files and find a way to explore them locally. If limited to audio files or annotations, such a process does not generally bring significant difficulties other than data volume. However, as the number and nature of modalities, extracted descriptors, and annotations increase (think of motion capture, video, physiological signals, time series of different sample rates, etc.), difficulties arise not only in the design or adoption of formatting schemes, but also in the availability of platforms that enable and facilitate exchange by providing simple ways to remotely visualize or explore the data before downloading.

In the context of several recent projects focused on music performance analysis and multimodal interaction, we had to collect, process, and annotate music performance recordings that included dense data streams of different modalities. Envisioning the future release of our dataset for the research community, we realized the need for better means to explore and exchange data. Since then, at UPF we have been developing Repovizz, a remote hosting platform for multimodal data storage, visualization, annotation, and selective retrieval via a web interface and a dedicated API.

By way of the recently published article E. Maestre, P. Papiotis, M. Marchini, Q. Llimona, O. Mayor, A. Pérez, M. Wanderley, "Enriched Multimodal Representations of Music Performances: Online Access and Visualization", IEEE MultiMedia, Vol 24:1, pp. 24-34, 2017, we introduce Repovizz to the MIR community and open access to the QUARTET dataset, a fully annotated collection of string quartet multimodal recordings released through Repovizz.

For a short, unadorned video demonstrating Repovizz, please go to Although still under development, Repovizz can be used by anyone in the academic community.

The QUARTET dataset comprises 96 recordings of string quartet exercises involving solo and ensemble conditions, containing multichannel audio (ambient microphones and piezoelectric pickups), video, motion capture (optical and magnetic) of instrumental gestures and of musician upper bodies, computed bowing gesture signals, extracted audio descriptors, and multitrack score-performance alignment. The dataset, processed and curated over the past years partly in the context of the PhD dissertation work of Panos Papiotis on ensemble interdependence, is now freely available for the research community.

16 Feb 2017 - 16:15
MUTEK'17: Artist in residence at MTG

As in previous years, we have established a collaboration with the MUTEK festival to promote the use of our technologies within the artist community.

This year there is an open call for artists to create a work using technologies developed in the context of the RAPID-MIX project.

The selected artist will work at the MTG and will have support from researchers during the residency. The final work will be presented as part of the program of the MUTEK festival on March 9th at Mazda Space.

Open call: artist-in-residence mutek8

14 Feb 2017 - 18:14
Journal paper on orchestral music source separation along with a new dataset
We are glad to announce the publication of a journal paper on orchestral music source separation along with the PHENICX-Anechoic dataset. The methods were prototyped during the PHENICX project and were used for tasks such as orchestra focus/instrument enhancement. To our knowledge, this is the first time source separation is objectively evaluated in such a complex scenario.

M. Miron, J. Carabias-Orti, J. J. Bosch, E. Gómez and J. Janer, "Score-informed source separation for multi-channel orchestral recordings", Journal of Electrical and Computer Engineering (2016).

Abstract: This paper proposes a system for score-informed audio source separation for multichannel orchestral recordings. The orchestral music repertoire relies on the existence of scores. Thus, a reliable separation requires a good alignment of the score with the audio of the performance. To that extent, automatic score alignment methods are reliable when allowing a tolerance window around the actual onset and offset. Moreover, several factors increase the difficulty of our task: a high reverberant image, large ensembles having rich polyphony, and a large variety of instruments recorded within a distant-microphone setup. To solve these problems, we design context-specific methods such as the refinement of score-following output in order to obtain a more precise alignment. Moreover, we extend a close-microphone separation framework to deal with the distant-microphone orchestral recordings. Then, we propose the first open evaluation dataset in this musical context, including annotations of the notes played by multiple instruments from an orchestral ensemble. The evaluation aims at analyzing the interactions of important parts of the separation framework on the quality of separation. Results show that we are able to align the original score with the audio of the performance and separate the sources corresponding to the instrument sections.

The PHENICX-Anechoic dataset includes audio and annotations useful for tasks such as score-informed source separation, score following, multi-pitch estimation, transcription or instrument detection, in the context of symphonic music. This dataset is based on the anechoic recordings described in this paper:

Pätynen, J., Pulkki, V., and Lokki, T., "Anechoic recording system for symphony orchestra," Acta Acustica united with Acustica, vol. 94, nr. 6, pp. 856-865, November/December 2008.
For more information about the dataset and how to download it, see the PHENICX-Anechoic web page.
14 Feb 2017 - 13:19
SMC students win at the Bulgaria Music Hackathon

Sound and Music Computing Master's students Manaswi Mishra, Kushagra Sharma and Siddharth Bhardwaj won the Music Vision category in the Bulgaria Music Hackathon with the project "Samsara - a virtual interactive soundscape", and earned a sponsored visit to the monthly Music Hackathon in New York City, USA.

This project was a virtual interactive soundscape in which multiple performers can create and manipulate music through their smartphones or touch interfaces. The virtual world is a physics-based system where different types of virtual atoms interact with each other and with the performer. Each performer has a role in the collaborative soundscape: a creator (sound generator atoms), a preserver (filter and fx atoms) or a destroyer (reducing the number of generator (creator) atoms in the system). Performers can create, delete and manipulate their type of atoms in the dynamic system of moving atoms and collisions to create a collaborative soundscape.

More information about the project can be found here: Presentation at Hackathon.

A report about the event and an interview on live Bulgarian national TV: Eurocom TV interview.



13 Feb 2017 - 16:34
Tenure Track position in Machine Learning at UPF

The Unit of Engineering in Information and Communication Technologies (ETIC) at the Pompeu Fabra University in Barcelona is opening an academic position in Machine Learning.

ETIC has a highly interdisciplinary research environment, with faculty interests covering a wide range of topics that can be broadly grouped into four main areas, with data-driven research transversal to all of them:

Multimedia Technologies, covering topics in image processing and cinematography, sound and music computing, human-computer interaction, graphics and educational technologies, and cognition as it relates to multimedia.

Computational Biology and Biomedical Systems including topics such as computational neuroscience, analysis of biomedical data, nonlinear signal analysis in biological systems, instrumentation and biomedical electronics, computational simulation and biomechanics, medical imaging and modeling of complex biomedical systems.

Computation and Intelligent Systems covering foundations of computer science, several aspects of artificial intelligence such as planning, natural language processing, computer vision, machine learning, ubiquitous computing, information retrieval and data mining, and human cognition and its relation to robotics. 

Networking and Communications with topics related to both wired and wireless networks, network science, information theory and coding, as well as policy aspects and strategies related to networking technologies.

Currently ETIC is strengthening its research interests in these four areas, with special interest in Machine Learning. We are looking for a junior scientist with strong research interests in any of these areas but with a special focus on Machine Learning, and we seek applications for a Tenure-Track position. The appointed applicant will progressively lead a new research group, in close collaboration with the existing ones, and will be involved in the undergraduate and postgraduate teaching duties of the Department, in particular with the deployment of the new Degree in Mathematical Engineering of Data Science.

Candidates should hold a PhD degree and have an excellent scientific-technical background in Machine Learning, including (but not restricted to) topics such as Multimedia Technologies, Computational Biology and Biomedical Systems, Computation and Intelligent Systems or Networking and Communications. The candidate should also have interest in confronting and resolving scientific problems/challenges, and leading several research publications in major journals.

A motivation letter, curriculum vitae, list of publications, research plan and addresses of three putative referees should be sent as a single pdf to randp [dot] dtic [at] upf [dot] edu, before March 24th 2017.

9 Feb 2017 - 11:20
Escolab 2017: presentation of 28x28 project

We will participate in Escolab 2017 presenting the project "28x28" developed by the artist Xavier Bonfill and the MTG researcher Frederic Font with the support of Phonos.

28x28 uses Freesound and Essentia to build a system that lets participants create interactive real-time music compositions by playing a game of dominoes.

The activity will be held in the Sala Polivalent where, after an introduction to the technical and artistic work behind the project, the participants will be able to try the installation and create their own musical work.

This project shows the type of interaction between research and artistic creation promoted by the MTG.

Information: Escolab UPF

7 Feb 2017 - 17:24
Gopala K. Koduri and Sertan Şentürk defend their PhD theses
22 Feb 2017

Wednesday, February 22nd 2017 at 11:00h in room 55.309 (Tanger Building, UPF Communication Campus)

Gopala K. Koduri: “Towards a multimodal knowledge base for Indian art music: A case study with melodic intonation”
Thesis director: Xavier Serra
Thesis Committee: Anja Volk (Utrecht University), Baris Bozkurt (Koç University) and George Fazekas (QMUL)
Abstract: This thesis is a result of our research efforts in building a multi-modal knowledge-base for the specific case of Carnatic music. Besides making use of metadata and symbolic notations, we process natural language text and audio data to extract culturally relevant and musically meaningful information and structure it with formal knowledge representations. This process broadly consists of two parts. In the first part, we analyze the audio recordings for intonation description of pitches used in the performances. We conduct a thorough survey and evaluation of the previously proposed pitch distribution based approaches on a common dataset, outlining their merits and limitations. We propose a new data model to describe pitches to overcome the shortcomings identified. This expands the perspective of the note model in-vogue to cater to the conceptualization of melodic space in Carnatic music. We put forward three different approaches to retrieve a compact description of pitches used in a given recording employing our data model. We qualitatively evaluate our approaches by comparing the representations of pitches obtained from our approach with those from a manually labeled dataset, showing that our data model and approaches have resulted in representations that are very similar to the latter. Further, in a raaga classification task on the largest Carnatic music dataset so far, two of our approaches are shown to outperform the state-of-the-art by a statistically significant margin.
In the second part, we develop knowledge representations for various concepts in Carnatic music, with a particular emphasis on the melodic framework. We discuss the limitations of the current semantic web technologies in expressing the order in sequential data that curtails the application of logical inference. We present our use of rule languages to overcome this limitation to a certain extent. We then use open information extraction systems to retrieve concepts, entities and their relationships from natural language text concerning Carnatic music. We evaluate these systems using the concepts and relations from knowledge representations we have developed, and groundtruth curated using Wikipedia data. Thematic domains like Carnatic music have limited volume of data available online. Considering that these systems are built for web-scale data where repetitions are taken advantage of, we compare their performances qualitatively and quantitatively, emphasizing characteristics desired for cases such as this. The retrieved concepts and entities are mapped to those in the metadata. In the final step, using the knowledge representations developed, we publish and integrate the information obtained from different modalities to a knowledge-base. On this resource, we demonstrate how linking information from different modalities allows us to deduce conclusions which otherwise would not have been possible.
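
For readers unfamiliar with the pitch-distribution-based approaches surveyed in the first part, a minimal Python sketch follows. It is a generic octave-folded pitch histogram in cents relative to the tonic, not the data model proposed in the thesis; the function name, the 100-cent default bin width, and the convention that unvoiced frames are marked with 0 Hz are assumptions made for illustration.

```python
import math


def pitch_class_histogram(pitches_hz, tonic_hz, bin_cents=100):
    """Octave-folded, normalized histogram of pitch in cents above the tonic."""
    n_bins = 1200 // bin_cents
    hist = [0] * n_bins
    for f in pitches_hz:
        if f <= 0:  # assumption: unvoiced frames are marked with 0 Hz
            continue
        cents = 1200 * math.log2(f / tonic_hz)
        # Assign to the nearest bin centre and fold all octaves together.
        hist[round(cents / bin_cents) % n_bins] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Hypothetical pitch track: the tonic, its octave, an unvoiced frame, and a fifth.
track = [220.0, 440.0, 0.0, 220.0 * 2 ** (7 / 12)]
print(pitch_class_histogram(track, tonic_hz=220.0))
```

A finer bin width (for example 10 cents) can be passed as bin_cents when a more detailed view of intonation is needed.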

Wednesday, February 22nd 2017 at 16:00h in room 55.309 (Tanger Building, UPF Communication Campus)

Sertan Şentürk: “Computational Analysis of Audio Recordings and Music Scores for the Description and Discovery of Ottoman-Turkish Makam Music”
Thesis director: Xavier Serra
Thesis Committee: Gerhard Widmer (Johannes Kepler University), Baris Bozkurt (Koç University) and Tillman Weyde (City, University of London)
Abstract: This thesis addresses several shortcomings in current state-of-the-art methodologies in music information retrieval (MIR). In particular, it proposes several computational approaches to automatically analyze and describe music scores and audio recordings of Ottoman-Turkish makam music (OTMM). The main contributions of the thesis are the music corpus that has been created to carry out the research and the audio-score alignment methodology developed for the analysis of the corpus. In addition, several novel computational analysis methodologies are presented in the context of common MIR tasks of relevance for OTMM. Some example tasks are predominant melody extraction, tonic identification, tempo estimation, makam recognition, tuning analysis, structural analysis and melodic progression analysis. These methodologies become a part of a complete system called Dunya-makam for the exploration of large corpora of OTMM.
The thesis starts by presenting the created CompMusic Ottoman-Turkish makam music corpus. The corpus includes 2200 music scores, more than 6500 audio recordings, and accompanying metadata. The data has been collected, annotated and curated with the help of music experts. Using criteria such as completeness, coverage and quality, we validate the corpus and show its research potential. In fact, our corpus is the largest and most representative resource of OTMM that can be used for computational research. Several test datasets have also been created from the corpus to develop and evaluate the specific methodologies proposed for different computational tasks addressed in the thesis.
The part focusing on the analysis of music scores is centered on phrase and section level structural analysis. Phrase boundaries are automatically identified using an existing state-of-the-art segmentation methodology. Section boundaries are extracted using heuristics specific to the formatting of the music scores. Subsequently, a novel method based on graph analysis is used to establish similarities across these structural elements in terms of melody and lyrics, and to label the relations semiotically. 
The audio analysis section of the thesis reviews the state-of-the-art for analysing the melodic aspects of performances of OTMM. It proposes adaptations of existing predominant melody extraction methods tailored to OTMM. It also presents improvements over pitch-distribution-based tonic identification and makam recognition methodologies. 
The audio-score alignment methodology is the core of the thesis. It addresses the culture-specific challenges posed by the musical characteristics, music theory related representations and oral praxis of OTMM. Based on several techniques such as subsequence dynamic time warping, Hough transform and variable-length Markov models, the audio-score alignment methodology is designed to handle the structural differences between music scores and audio recordings. The method is robust to the presence of non-notated melodic expressions, tempo deviations within the music performances, and differences in tonic and tuning. The methodology utilizes the outputs of the score and audio analysis, and links the audio and the symbolic data. In addition, the alignment methodology is used to obtain score-informed description of audio recordings. The score-informed audio analysis not only simplifies the audio feature extraction steps that would require sophisticated audio processing approaches, but also substantially improves the performance compared with results obtained from the state-of-the-art methods solely relying on audio data.
The analysis methodologies presented in the thesis are applied to the CompMusic Ottoman-Turkish makam music corpus and integrated into a web application aimed at culture-aware music discovery. Some of the methodologies have already been applied to other music traditions such as Hindustani, Carnatic and Greek music. Following open research best practices, all the created data, software tools and analysis results are openly available. The methodologies, the tools and the corpus itself provide vast opportunities for future research in many fields such as music information retrieval, computational musicology and music education.
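
Subsequence dynamic time warping, one of the techniques named above, can be sketched in a few lines of Python. This is the generic textbook formulation on one-dimensional number sequences with an absolute-difference cost, not the thesis's alignment method, which combines it with other techniques; the function name and the toy data are hypothetical.

```python
def subsequence_dtw(query, reference):
    """Find the stretch of `reference` that best matches `query`.

    Returns (cost, start, end), where reference[start:end + 1] is the match.
    """
    n, m = len(query), len(reference)
    # D[i][j]: cost of the best warping path that aligns query[:i + 1]
    # with some span of the reference ending at index j.
    D = [[float("inf")] * m for _ in range(n)]
    for j in range(m):
        # Free start: the match may begin anywhere, so no cost accumulates here.
        D[0][j] = abs(query[0] - reference[j])
    for i in range(1, n):
        D[i][0] = D[i - 1][0] + abs(query[i] - reference[0])
        for j in range(1, m):
            best_prev = min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
            D[i][j] = abs(query[i] - reference[j]) + best_prev
    # Free end: take the cheapest cell in the last row.
    end = min(range(m), key=lambda j: D[n - 1][j])
    # Backtrack to the first row to recover where the match starts.
    i, j = n - 1, end
    while i > 0:
        if j == 0:
            i -= 1
        else:
            _, i, j = min(
                (D[i - 1][j - 1], i - 1, j - 1),
                (D[i - 1][j], i - 1, j),
                (D[i][j - 1], i, j - 1),
            )
    return D[n - 1][end], j, end


# Toy example: the query appears in the reference with one repeated value.
print(subsequence_dtw([1, 2, 3], [9, 9, 1, 2, 2, 3, 9]))  # (0, 2, 5)
```

Leaving the first row unaccumulated and taking the minimum over the last row is what lets a short fragment be located inside a longer sequence without penalizing the unmatched parts.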
6 Feb 2017 - 12:43
CompMusic Seminar
23 Feb 2017

On Thursday, February 23rd 2017, from 9:30h to 14:00h in room 55.309 of the Communication Campus of the Universitat Pompeu Fabra in Barcelona, we will have a CompMusic seminar. This seminar accompanies the PhD thesis defenses of Gopala Krishna Koduri and Sertan Şentürk, which take place the previous day.

9:30 Gerhard Widmer (Johannes Kepler University, Linz, Austria)
"Con Espressione! - An Update from the Computational Performance Modelling Front"
Computational models of expressive music performance have been a target of considerable research efforts in the past 20 years. Motivated by the desire to gain a deeper understanding of the workings of this complex art, various research groups have proposed different classes of computational models (rule-based, case-based, machine-learning-based) for different parametric dimensions of expressive performance, and it has been demonstrated in various studies that such models can provide interesting new insights into this musical art. In this presentation, I will review recent work that has carried this research further. I will mostly focus on a general modelling framework known as the "Basis Mixer", and show various extensions of this model that have gradually increased the modelling power of the framework. However, it will also become apparent that there are still serious limitations and obstacles on the path to comprehensive models of musical expressivity, and I will briefly report on a new ERC project entitled "Con Espressione", which expressly addresses these challenges. Along the way, we will also hear about a recent musical "Turing Test" that is said to demonstrate that computational performance models have now reached a level where their interpretations of classical piano music are qualitatively indistinguishable from true human performances -- a story that I will quickly try to put into perspective ...
10:30 Tillman Weyde (City, University of London, UK)
"Digital Musicology with Large Datasets"
The increasing availability of music data as well as networks and computing resources has the potential to profoundly change the methodology of musicological research towards a more data-driven empirical approach. However, many questions are still unanswered regarding the technology, data collection and provision, metadata, analysis methods and legal aspects. This talk will report on an effort to address these questions in the Digital Music Lab project, and present achieved outcomes, lessons learnt and challenges that emerged in this process. 
11:30 Coffee break
12:00 Anja Volk (Utrecht University, Netherlands)
"The explication of musical knowledge through automatic pattern finding"
In this talk I will discuss the role of computational modeling for gaining insights into the specifics of a musical style for which there exists no long-standing music theory such as in Western classical music, Carnatic music or Ottoman-Turkish makam music. Specifically, I address the role of automatic pattern search in enabling us to scrutinize what it is that we really know about a specific music style, if we consider ourselves to be musical experts. I elaborate my hypothesis that musical knowledge is often implicit, while computation enables us to make part of this knowledge explicit and evaluate it on a data set. This talk will address the explication of musical knowledge for the question as to when we perceive two folk melodies to be variants of each other for the case of Dutch and Irish folk songs, and when we consider a piece to be a ragtime. With examples from research within my VIDI-project MUSIVA on patterns in these musical styles, I discuss how musical experts and non-experts working together on developing computational methods can gain important insights into the specifics of a musical style, and the implicit knowledge of musical experts. 
13:00 György Fazekas (Queen Mary, University of London, UK)
"Convergence of Technologies to Connect Audio with Meaning: from Semantic Web Ontologies to Semantic Audio Production"
Science and technology play an increasingly vital role in how we experience, how we compose, perform, share and enjoy musical audio. The invention of recording in the late 19th century is a profound example that, for the first time in human history, disconnected music performance from listening and gave rise to a new industry as well as new fields of scientific investigation. But musical experience is not just about listening. Human minds make sense of what we hear by categorising and by making associations, cognitive processes which give rise to meaning or influence our mood. Perhaps the next revolution akin to recording is therefore in audio semantics. Technologies that mimic our abilities and enable interaction with audio on human terms are already changing the way we experience it. The emerging field of Semantic Audio is in the confluence of several key fields, namely, signal processing, machine learning and Semantic Web ontologies that enable knowledge representation and logic-based inference. In my talk, I will put forward that synergies between these fields provide a fruitful, if not necessary, way to account for human interpretation of sound. I will outline music and audio related ontologies and ontology based systems that found applications on the Semantic Web, as well as intelligent audio production tools that enable linking musical concepts with signal processing parameters in audio systems. I will outline my recent work demonstrating how web technologies may be used to create interactive performance systems that enable mood-based audience-performer communication and how semantic audio technologies enable us to link social tags and audio features to better understand the relationship between music and emotions. I will hint at how some principles used in my research also contribute to enhancing scientific protocols, ease experimentation and facilitate reproducibility.
Finally, I will discuss challenges in fusing audio and semantic technologies and outline some future opportunities they may bring about.
1 Feb 2017 - 13:35