Spyridon Kantarelis

Semantic-Aware Interpretable Multimodal Music Auto-Tagging

May 26, 2025

Andreas Patakis, Vassilis Lyberatos, Spyridon Kantarelis, Edmund Dervakos, Giorgos Stamou

Abstract: Music auto-tagging is essential for organizing and discovering music in extensive digital libraries. While foundation models achieve exceptional performance in this domain, their outputs often lack interpretability, limiting trust and usability for researchers and end-users alike. In this work, we present an interpretable framework for music auto-tagging that leverages groups of musically meaningful multimodal features, derived from signal processing, deep learning, ontology engineering, and natural language processing. To enhance interpretability, we cluster features semantically and employ an expectation-maximization algorithm, assigning distinct weights to each group based on its contribution to the tagging process. Our method achieves competitive tagging performance while offering a deeper understanding of the decision-making process, paving the way for more transparent and user-centric music tagging systems.

* Accepted at Interspeech 2025
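
The group-weighting idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each feature group already has its own fixed predictor, and it uses the textbook expectation-maximization update for mixture weights over those predictors. The function name `em_group_weights` and the toy `likelihoods` values are hypothetical.

```python
def em_group_weights(likelihoods, n_iter=100, tol=1e-9):
    """Estimate one mixture weight per feature group with EM.

    likelihoods[i][g] is the probability that group g's (fixed) model
    assigns to the true tag of sample i. Values are assumed positive.
    This is a generic mixture-weight EM, used here only as an illustration.
    """
    n_samples = len(likelihoods)
    n_groups = len(likelihoods[0])
    weights = [1.0 / n_groups] * n_groups  # start from uniform weights

    for _ in range(n_iter):
        # E-step: responsibility of each group for each sample.
        totals = [0.0] * n_groups
        for row in likelihoods:
            joint = [w * p for w, p in zip(weights, row)]
            norm = sum(joint)
            for g in range(n_groups):
                totals[g] += joint[g] / norm

        # M-step: new weight is the average responsibility of the group.
        new_weights = [t / n_samples for t in totals]
        converged = max(abs(a - b) for a, b in zip(weights, new_weights)) < tol
        weights = new_weights
        if converged:
            break
    return weights


# Hypothetical toy data: 4 samples, 3 feature groups.
likelihoods = [
    [0.9, 0.5, 0.2],
    [0.8, 0.4, 0.3],
    [0.7, 0.6, 0.1],
    [0.9, 0.3, 0.2],
]
weights = em_group_weights(likelihoods)
```

In this toy input the first group explains every sample best, so it receives the largest weight; the resulting weights can be read as a rough measure of each group's contribution.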

CHORDONOMICON: A Dataset of 666,000 Songs and their Chord Progressions

Oct 29, 2024

Spyridon Kantarelis, Konstantinos Thomas, Vassilis Lyberatos, Edmund Dervakos, Giorgos Stamou

Abstract: Chord progressions encapsulate important information about music, pertaining to its structure and conveyed emotions. They serve as the backbone of musical composition, and in many cases, they are the sole information required for a musician to play along and follow the music. Despite their importance, chord progressions as a data domain remain underexplored. There is a lack of large-scale datasets suitable for deep learning applications, and limited research exploring chord progressions as an input modality. In this work, we present Chordonomicon, a dataset of over 666,000 songs and their chord progressions, annotated with structural parts, genre, and release date, created by scraping various sources of user-generated progressions and associated metadata. We demonstrate the practical utility of the Chordonomicon dataset for classification and generation tasks, and discuss its potential to provide valuable insights to the research community. Chord progressions are unique in their ability to be represented in multiple formats (e.g., text, graph) and the wealth of information chords convey in given contexts, such as their harmonic function. These characteristics make the Chordonomicon an ideal testbed for exploring advanced machine learning techniques, including transformers, graph machine learning, and hybrid systems that combine knowledge representation and machine learning.
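
As a small illustration of the text and graph representations mentioned in the abstract above, the sketch below turns a space-separated chord string into a weighted directed graph of chord-to-chord transitions. The chord notation, the example progression, and the helper name `progression_to_graph` are assumptions made for this example and do not reflect the dataset's actual format.

```python
from collections import Counter


def progression_to_graph(chords):
    """Build a weighted directed graph from a chord sequence.

    Nodes are chord symbols; each edge (src, dst) counts how often
    chord `dst` immediately follows chord `src`.
    """
    return Counter(zip(chords, chords[1:]))


# Text form: a hypothetical progression written as space-separated symbols.
progression_text = "C G Am F C G F C"
chords = progression_text.split()

# Graph form: edge weights are transition counts.
graph = progression_to_graph(chords)

for (src, dst), count in sorted(graph.items()):
    print(f"{src} -> {dst}: {count}")
```

The text form suits sequence models such as transformers, while the edge-count form is one possible starting point for graph-based methods.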

Perceptual Musical Features for Interpretable Audio Tagging

Jan 04, 2024

Vassilis Lyberatos, Spyridon Kantarelis, Edmund Dervakos, Giorgos Stamou

Abstract: In the age of music streaming platforms, the task of automatically tagging music audio has garnered significant attention, driving researchers to devise methods aimed at enhancing performance metrics on standard datasets. Most recent approaches rely on deep neural networks, which, despite their impressive performance, are opaque, making it challenging to explain their output for a given input. While the issue of interpretability has been emphasized in other fields like medicine, it has not received attention in music-related tasks. In this study, we explored the relevance of interpretability in the context of automatic music tagging. We constructed a workflow that incorporates three different information extraction techniques: a) leveraging symbolic knowledge, b) utilizing auxiliary deep neural networks, and c) employing signal processing to extract perceptual features from audio files. These features were subsequently used to train an interpretable machine-learning model for tag prediction. We conducted experiments on two datasets, namely the MTG-Jamendo dataset and the GTZAN dataset. Our method surpassed the performance of baseline models in both tasks and, in certain instances, demonstrated competitiveness with the current state-of-the-art. We conclude that there are use cases where the deterioration in performance is outweighed by the value of interpretability.

* GitHub Repository: https://github.com/vaslyb/perceptible-music-tagging

Employing Crowdsourcing for Enriching a Music Knowledge Base in Higher Education

Jun 12, 2023

Vassilis Lyberatos, Spyridon Kantarelis, Eirini Kaldeli, Spyros Bekiaris, Panagiotis Tzortzis, Orfeas Menis-Mastromichalakis, Giorgos Stamou

Abstract: This paper describes the methodology followed and the lessons learned from employing crowdsourcing techniques as part of a homework assignment involving higher education students of computer science. Making use of a platform that supports crowdsourcing in the cultural heritage domain, students were solicited to enrich the metadata associated with a selection of music tracks. The results of the campaign were further analyzed and exploited by students through the use of semantic web technologies. In total, 98 students participated in the campaign, contributing more than 6,400 annotations concerning 854 tracks. The process also led to the creation of an openly available annotated dataset, which can be useful for machine learning models for music tagging. The campaign's results and the comments gathered through an online survey enable us to draw some useful insights about the benefits and challenges of integrating crowdsourcing into computer science curricula and how this can enhance students' engagement in the learning process.

* To be published in The 4th International Conference on Artificial Intelligence in Education Technology (AIET 2023), Berlin, Germany, 31 June-2 July 2023. For the GitHub code for the created music dataset, see https://github.com/vaslyb/MusicCrow

SpotHitPy: A Study For ML-Based Song Hit Prediction Using Spotify

Jan 19, 2023

Ioannis Dimolitsas, Spyridon Kantarelis, Afroditi Fouka

Abstract: In this study, we approached the Hit Song Prediction problem, which aims to predict which songs will become Billboard hits. We gathered a dataset of nearly 18,500 hit and non-hit songs and extracted their audio features using the Spotify Web API. We tested four machine-learning models on our dataset and were able to predict the Billboard success of a song with approximately 86% accuracy. The most successful algorithms were Random Forest and Support Vector Machine.
