|Abstract||Research on how humans categorize music genres is still in its infancy. There is no computational model that explains how musical features are attended to, selected, and weighted in order to yield a genre decision. We also lack a model that explains how new categories are created and integrated into our musical knowledge structures. In contrast, one of the most active areas in Music Information Retrieval (MIR) is the building of automatic genre classification systems. Most of these systems achieve good results (about 80% correct decisions) when the number of genres to be classified is small (i.e., fewer than 10). They usually rely on timbre and rhythmic features that cover neither the whole range of musical facets nor the whole range of conceptual abstractness that humans seem to use when performing this task. The aim of our work is to improve our knowledge of the importance of different musical facets and features in genre decisions. We present a series of listening experiments in which audio was altered in order to preserve some properties of music (rhythm, timbre, harmonics) while degrading others. It was expected that, for specific genre discriminations (e.g., folk versus pop), timbre alterations would be more critical than rhythm alterations. In other words, we try to find out whether a given genre can be identified by a single property of music or by a weighted combination of properties. The pilot experiment we report here used 42 excerpts of modified audio (representing 9 musical genres). Listeners, who had different musical backgrounds, had to identify the genre of each excerpt. The results of this survey were used to build an ensemble of genre-tuned classifiers that improved performance by up to 4 percentage points over a standard generic classifier for the 9 genres. These results show that understanding our perceptual and cognitive constraints and preferences is important when building a MIR system. When this information is included, the accuracy of automatic genre recognition systems improves, and their computational efficiency increases as well.