
Seminar by Bob Sturm on evaluation in MIR

08.11.2013

Bob L. Sturm, from Aalborg University Copenhagen, will give a seminar on "The crisis of evaluation in MIR" on Wednesday, November 13th, 2013, at 3:30pm in room 55.410.

Abstract: I critically address the "crisis of evaluation" in music information retrieval (MIR), with particular emphasis on music genre recognition, music mood recognition, and autotagging. I demonstrate four things: 1) many published results unknowingly use datasets with faults that render them meaningless; 2) state-of-the-art ("high classification accuracy") systems are fooled by irrelevant factors; 3) most published results are based upon an invalid evaluation design; and 4) a lot of work has unknowingly built, tuned, tested, compared and advertised "horses" instead of solutions. (Clever Hans, the horse that appeared to do arithmetic but was in fact responding to the unintentional cues of its questioners, provides an appropriate illustration.) I argue these problems occur because: 1) many researchers assume a dataset is a good dataset because many others use it; 2) many researchers assume that evaluation approaches standard in machine learning or information retrieval are useful and relevant for MIR; 3) many researchers mistake systematic, rigorous, and standardized evaluation for scientific evaluation; and 4) problems and success criteria remain ill-defined, and evaluation thus remains poor, because researchers do not define appropriate use cases. I show how this "crisis of evaluation" can be addressed by formalizing evaluation in MIR to make clear its aims, parts, design, execution, interpretation, and assumptions. I also present several alternative evaluation approaches that can separate horses from solutions.

Bob Sturm's website
